Re: Data loading from DB - data sizes and obstacles
On Fri, Aug 7, 2009 at 11:15 AM, Amit Nithian anith...@gmail.com wrote: All, an off-and-on project of mine has been refactoring the way we load data from MySQL into Solr. Our current approach is fairly hard-coded and not as configurable as I would like. I was curious, for people who have used the DIH and/or LuSQL to load data into Solr, how much data do you typically load and what obstacles have you run into along the way? For example, some of our SQL queries are quite complex, with a bunch of joins that can cause headaches for the DB. I am mainly curious about those who use MySQL, for comparison. I am also looking to evaluate DIH vs. LuSQL (the 0.9.3 release, which I read about but haven't seen for download). Has any progress been made on making DIH a separate library?

I haven't seen enough interest in DIH as a library yet, so it has not been taken up.

Sorry for the flurry of questions, but I am interested in everyone's responses! Thanks, Amit

-- Noble Paul | Principal Engineer | AOL | http://aol.com
Re: Data loading from DB - data sizes and obstacles
I have been a satisfied DIH user for a long time. The project I use Solr for runs on MySQL (5.1). There are 6 Solr cores in total with a combined index size of 12G. The database design is as relational as it can get, and writing SQL queries to fetch the data has always been a problem. Thanks to DIH, I have honed my DB concepts and created nice procedures and views to flatten out the data. For DIH itself, I have kept to simple select statements (or procedure calls) with a few entities and pushed the heavy lifting to the database (scripts). I am talking about 4 million records here. I have never tried LuSQL. Cheers, Avlesh
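Avlesh's pattern above (keep DIH to simple selects, push the flattening into database views and procedures) can be sketched as a minimal data-config. All names here (JDBC URL, view, columns) are hypothetical, not from his actual setup:

```xml
<!-- Hedged sketch of the approach described above; every name below
     (datasource URL, view, columns) is made up for illustration. -->
<dataConfig>
  <dataSource driver="com.mysql.jdbc.Driver"
              url="jdbc:mysql://dbhost/catalog"
              user="solr" password="..."/>
  <document>
    <!-- one simple select against a view that already flattens the joins -->
    <entity name="product"
            query="SELECT id, name, category FROM flat_product_view">
      <field column="id" name="id"/>
      <field column="name" name="name"/>
      <field column="category" name="category"/>
    </entity>
  </document>
</dataConfig>
```

The point of the design is that the DB, not DIH, does the join work, so the import loop stays one row per document.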
Documentation for Master-Slave Replication missing for Solr1.3. No mirror site for Solr 1.4 distribution.
Hi, I want to know how to set up a master-slave configuration for Solr 1.3. I can't find documentation for it on the net; I found some for 1.4 but not for 1.3, and the ReplicationHandler is not present in 1.3. Also, I would like to know where I can get the Solr 1.4 distribution; the Solr site lists mirrors only for the 1.3 dist. Regards, Ninad.
Re: Documentation for Master-Slave Replication missing for Solr1.3. No mirror site for Solr 1.4 distribution.
1.4 is not released yet. You can grab a nightly build from http://people.apache.org/builds/lucene/solr/nightly/

-- Noble Paul | Principal Engineer | AOL | http://aol.com
Re: Documentation for Master-Slave Replication missing for Solr1.3. No mirror site for Solr 1.4 distribution.
Most of the documentation on the 1.3 script-based replication is on the wiki at:

http://wiki.apache.org/solr/CollectionDistribution
http://wiki.apache.org/solr/SolrCollectionDistributionScripts
http://wiki.apache.org/solr/SolrCollectionDistributionStatusStats
http://wiki.apache.org/solr/SolrCollectionDistributionOperationsOutline

-- Regards, Shalin Shekhar Mangar.
Re: Documentation for Master-Slave Replication missing for Solr1.3. No mirror site for Solr 1.4 distribution.
Hi Noble, can these builds be used in a production environment? Are they stable? We are not going live now, but we will in a few months. Also, when will 1.4 be officially released?
CorruptIndexException: Unknown format version
Hi, how can this happen? It is a new index, and it is already corrupt. Has anybody else seen something like this?

WARN - 2009-08-07 10:44:54,925 | Solr index directory 'data/solr/index' doesn't exist. Creating new index...
WARN - 2009-08-07 10:44:56,583 | solrconfig.xml uses deprecated admin/gettableFiles, Please update your config to use the ShowFileRequestHandler.
WARN - 2009-08-07 10:44:56,586 | adding ShowFileRequestHandler with hidden files: [XSLT]
ERROR - 2009-08-07 10:44:58,758 | java.lang.RuntimeException: org.apache.lucene.index.CorruptIndexException: Unknown format version: -7
  at org.apache.solr.core.SolrCore.getSearcher(SolrCore.java:433)
  at org.apache.solr.core.SolrCore.init(SolrCore.java:216)
  at org.apache.solr.core.SolrCore.getSolrCore(SolrCore.java:177)
  at org.apache.solr.servlet.SolrDispatchFilter.init(SolrDispatchFilter.java:69)
  at org.mortbay.jetty.servlet.FilterHolder.doStart(FilterHolder.java:99)

Best regards -- Maximilian Hütter, blue elephant systems GmbH, Wollgrasweg 49, D-70599 Stuttgart, Tel: (+49) 0711 - 45 10 17 578, Fax: (+49) 0711 - 45 10 17 573, e-mail: max.huet...@blue-elephant-systems.com, Sitz: Stuttgart, Amtsgericht Stuttgart, HRB 24106, Geschäftsführer: Joachim Hörnle, Thomas Gentsch, Holger Dietrich
Re: mergeFactor / indexing speed
Juhu, great news, guys. I merged my child entity into the root entity and changed the custom EntityProcessor to handle the additional columns correctly. And indexing 160k documents now takes 5 min instead of 1.5 h! (Now I can go on vacation relaxed. :-D) Conclusion: in my case performance was so bad because of constantly querying a database on a different machine (network traffic + one DB query per document). Thanks for all your help! Chantal

Avlesh Singh schrieb: does DIH call commit periodically, or are things done in one big batch? AFAIK, one big batch.

Yes. There is no new index available once the full-import has started (and if the searcher has a cache, it still reads from that). No data is visible (e.g. in the Admin/Luke frontend) until the import has finished correctly.
Re: Language Detection for Analysis?
Otis Gospodnetic wrote: Bradford, if I may: have a look at http://www.sematext.com/products/language-identifier/index.html and/or http://www.sematext.com/products/multilingual-indexer/index.html

... and a Nutch plugin with similar functionality: http://lucene.apache.org/nutch/apidocs-1.0/org/apache/nutch/analysis/lang/LanguageIdentifier.html

-- Best regards, Andrzej Bialecki (Information Retrieval, Semantic Web; Embedded Unix, System Integration) http://www.sigram.com Contact: info at sigram dot com
Re: Language Detection for Analysis?
Hi, On Fri, Aug 7, 2009 at 12:31 PM, Andrzej Bialecki a...@getopt.org wrote: ... and a Nutch plugin with similar functionality: http://lucene.apache.org/nutch/apidocs-1.0/org/apache/nutch/analysis/lang/LanguageIdentifier.html

See also TIKA-209 [1], where I'm currently integrating the Nutch code to work with Tika. Tika 0.5 will have built-in language detection based on this. [1] https://issues.apache.org/jira/browse/TIKA-209 BR, Jukka Zitting
Help creating schema for indexable document
Hi guys, I am struggling to create a schema with a deterministic content model for a set of documents I want to index. My indexable documents will look something like:

<add>
  <doc>
    <field name="id">1</field>
    <field name="code">code1</field>
    <field name="code">code2</field>
    <field name="category">mycategory</field>
  </doc>
</add>

My service will be mission-critical and will accept batch imports from a potentially unreliable source. Are there any XML schema gurus who can help me with creating an XSD that will work with my sample document? Thanks in advance for your help, -- Ross -- View this message in context: http://www.nabble.com/Help-creating-schema-for-indexable-document-tp24862700p24862700.html Sent from the Solr - User mailing list archive at Nabble.com.
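Not a Solr answer, but since the question is about XSD: a minimal schema for the sample above might look like the sketch below. It is a hedged illustration that only validates the add/doc/field shape; it does not enforce which field names appear (doing that would need a stricter content model):

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="add">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="doc" maxOccurs="unbounded">
          <xs:complexType>
            <xs:sequence>
              <!-- field repeats; the name attribute carries id/code/category -->
              <xs:element name="field" maxOccurs="unbounded">
                <xs:complexType>
                  <xs:simpleContent>
                    <xs:extension base="xs:string">
                      <xs:attribute name="name" type="xs:string" use="required"/>
                    </xs:extension>
                  </xs:simpleContent>
                </xs:complexType>
              </xs:element>
            </xs:sequence>
          </xs:complexType>
        </xs:element>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>
```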
Re: mergeFactor / indexing speed
I'm a little late to the party, but you may also want to look at CachedSqlEntityProcessor. -- Regards, Shalin Shekhar Mangar.
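For readers who haven't used it, a hedged sketch of what a CachedSqlEntityProcessor sub-entity can look like (table and column names are invented): the child query is fetched once and cached in memory, instead of issuing one query per parent row, which is exactly the per-document round trip that made the import above slow.

```xml
<entity name="item" query="SELECT id, name FROM item">
  <!-- cached child: all rows are fetched once and joined in memory via the
       where clause, avoiding a DB round trip per parent row -->
  <entity name="feature" processor="CachedSqlEntityProcessor"
          query="SELECT item_id, description FROM feature"
          where="item_id=item.id"/>
</entity>
```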
Re: mergeFactor / indexing speed
Thanks for the tip, Shalin. I'm happy with 6 indexes running in parallel and completing in less than 10 min right now, but I'll have a look anyway.
Solr 1.4 in Production Environment-- Is it stable?
Hi, has anyone used Solr 1.4 in production? There are some really nice features in it, like directly adding POJOs to Solr, the ReplicationHandler, etc. Is 1.4 stable enough to be used in production?
Re: solr v1.4 in production?
On Wed, Jul 1, 2009 at 6:17 PM, Ed Summers e...@pobox.com wrote: Here at the Library of Congress we've got several production Solr instances running v1.3. We've been itching to get at what will be v1.4 and were wondering if anyone else happens to be using it in production yet. Any information you can provide would be most welcome. We're using Solr 1.4 built from r793546 in production, along with the new Java-based replication. -- Regards, Shalin Shekhar Mangar.
Re: Solr 1.4 in Production Environment-- Is it stable?
I know a number of large companies using 1.4-dev. But you could also wait another month or so and get the real 1.4. Otis -- Sematext is hiring -- http://sematext.com/about/jobs.html?mls Lucene, Solr, Nutch, Katta, Hadoop, HBase, UIMA, NLP, NER, IR
Re: Language Detection for Analysis?
There are several free language detection libraries out there, as well as a few commercial ones. I think Karl Wettin has even written one as a plugin for Lucene. Nutch also has one, AIUI. I would just Google "language detection". Also see http://www.lucidimagination.com/search/?q=language+detection, as this has been brought up many times before and I'm sure there are links in the archives.

On Aug 6, 2009, at 3:46 PM, Bradford Stephens wrote: Hey there, we're trying to add foreign language support into our new search engine -- languages like Arabic, Farsi, and Urdu (that don't work with standard analyzers). But our data source doesn't tell us which languages we're actually collecting -- we just get blocks of text. Has anyone here worked on language detection so we can figure out which analyzers to use? Are there commercial solutions? Much appreciated! -- http://www.roadtofailure.com -- The Fringes of Scalability, Social Media, and Computer Science
Re: Item Facet
Thanks Avlesh, but I didn't get it. How would a dynamic field aggregate values at query time?

On Thu, Aug 6, 2009 at 11:14 PM, Avlesh Singh avl...@gmail.com wrote: Dynamic fields might be an answer. If you had a field called product_* and these were populated with the corresponding values during indexing, then faceting on these fields would give you the desired behavior. The only catch here is that the product names have to be known upfront. Wildcard support for field names in facet.fl is still to come in Solr. Here's the issue: https://issues.apache.org/jira/browse/SOLR-247 Cheers Avlesh

On Fri, Aug 7, 2009 at 3:33 AM, David Lojudice Sobrinho dalss...@gmail.com wrote: I can't reindex because the aggregated/grouped result should change as the query changes... in other words, the result must be dynamic. We've been thinking about a new handler for it, something like:

/select?q=laptop&rows=0&itemfacet=on&itemfacet.field=product_name,min(price),max(price)

Does it make sense? Is there something easier, ready to use?

On Thu, Aug 6, 2009 at 6:05 PM, Ge, Yao (Y.) y...@ford.com wrote: If you can reindex, simply rebuild the index with fields replaced by combining existing fields. -Yao

-----Original Message----- From: David Lojudice Sobrinho [mailto:dalss...@gmail.com] Sent: Thursday, August 06, 2009 4:17 PM To: solr-user@lucene.apache.org Subject: Item Facet

Hi... Is there any way to group values like shopping.yahoo.com or shopper.cnet.com do? For instance, I have documents like:

doc1 - product_name1 - value1
doc2 - product_name1 - value2
doc3 - product_name1 - value3
doc4 - product_name2 - value4
doc5 - product_name2 - value5
doc6 - product_name2 - value6

I'd like to have a result grouping by product name, with the value range per product. Something like:

product_name1 - (value1 to value3)
product_name2 - (value4 to value6)

It is not like the current facet because the information is grouped by item, not the entire result. Any idea? Thanks! David Lojudice Sobrinho

-- __ David L. S. dalss...@gmail.com __
Re: Solr 1.4 in Production Environment-- Is it stable?
We also use 1.4, which has gotten hit with load tests of up to 2000 queries/sec. The biggest thing is to make sure you are using the slaves for that kind of load. Other than that, 1.4 is pretty impressive. -- Jeff Newburn, Software Engineer, Zappos.com, jnewb...@zappos.com - 702-943-7562
Re: Item Facet
Are your product_name* fields numeric fields (integer or float)?

-- View this message in context: http://www.nabble.com/Item-Facet-tp24853669p24865535.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: CorruptIndexException: Unknown format version
Wow, that is an interesting one... I bet there is more than one Lucene version kicking around in the classpath somehow. Try removing all of the servlet container's working directories. -Yonik http://www.lucidimagination.com
Re: Item Facet
The behavior I'm expecting is something similar to a GROUP BY in a relational database:

SELECT product_name, model, min(price), max(price), count(*)
FROM t
GROUP BY product_name, model

The current schema: product_name (type: text), model (type: text), price (type: sfloat).

-- __ David L. S. dalss...@gmail.com __
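One option worth checking (hedged: this is Solr 1.4's StatsComponent, which nobody in this thread has confirmed fits): stats.field plus stats.facet returns min/max/count of a numeric field per facet value, which covers part of the GROUP BY above, assuming product_name is indexed as a single token (string type) rather than tokenized text:

```
/select?q=laptop&rows=0&stats=true&stats.field=price&stats.facet=product_name
```

This gives one stats block per product_name value; grouping on two fields at once (product_name, model) would still need something custom.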
Is kill -9 safe or not?
I've seen several threads that are one or two years old saying that performing kill -9 on the Java process running Solr either CAN or CANNOT corrupt your index. The more recent ones seem to say that it CANNOT, but before I bake a kill -9 into my control script (which first tries a normal kill, of course), I'd like to hear the answer straight from the horse's mouth. I'm using a Solr 1.4 nightly from about a month ago. Can I kill -9 without fear of having to rebuild my index? Thanks! Michael
Re: Preserving C++ and other weird tokens
On Thu, Aug 6, 2009 at 11:38 AM, Michael _ solrco...@gmail.com wrote: Hi everyone, I'm indexing several documents that contain words that the StandardTokenizer cannot detect as tokens. These are words like C#, .NET, and C++, which are important for users to be able to search for, but get treated as C, NET, and C. How can I create a list of words that should be understood to be indivisible tokens? Is my only option somehow stringing together a lot of PatternTokenizers? I'd love to do something like <tokenizer class="StandardTokenizer" tokenwhitelist=".NET C++ C#"/>. Thanks in advance!

By the way, in case it wasn't clear: I'm not particularly tied to using the StandardTokenizer. Any tokenizer would be fine, if it did a reasonable job of splitting up the input text while preserving special cases. I'm also not averse to passing in a list of regexes if I had to, but I suspect that would be redoing a lot of the work done by the parser inside the tokenizer. Thanks, Michael
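One possible direction (a sketch, not a recommendation from this thread): Solr's PatternTokenizerFactory can emit whitelisted special cases as whole tokens by listing them in the pattern before the generic word match. The field-type name and regex below are illustrative assumptions:

```xml
<fieldType name="text_code" class="solr.TextField">
  <analyzer>
    <!-- group="0" keeps each whole regex match as a token: a listed
         special case first, otherwise a plain run of word characters -->
    <tokenizer class="solr.PatternTokenizerFactory"
               pattern="C\+\+|C#|\.NET|\w+" group="0"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```

The trade-off is that every whitelisted token has to be baked into the regex, so this is only practical for a short list.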
Re: Is kill -9 safe or not?
Kill -9 will not corrupt your index, but you would lose any uncommitted documents. -Yonik http://www.lucidimagination.com
Re: Preserving C++ and other weird tokens
http://search.lucidimagination.com/search/document/2d325f6178afc00a/how_to_search_for_c -Yonik http://www.lucidimagination.com
Re: Attempt to query for max id failing with exception
I just tried this sample code... it worked fine for me on trunk. -Yonik http://www.lucidimagination.com

On Thu, Aug 6, 2009 at 8:28 PM, Reuben Firmin reub...@benetech.org wrote: I'm using SolrJ. When I attempt to set up a query to retrieve the maximum id in the index, I'm getting an exception. My setup code is:

final SolrQuery params = new SolrQuery();
params.addSortField("id", ORDER.desc);
params.setRows(1);
params.setQuery(queryString);
final QueryResponse queryResponse = server.query(params);

The latter line is blowing up with:

Not Found request: http://solr.xxx.myserver/select?sort=id desc&rows=1&q=*:*&wt=javabin&version=2.2
org.apache.solr.common.SolrException
org.apache.solr.client.solrj.impl.CommonsHttpSolrServer#request(343)
org.apache.solr.client.solrj.impl.CommonsHttpSolrServer#request(183)
org.apache.solr.client.solrj.request.QueryRequest#process(90)
org.apache.solr.client.solrj.SolrServer#query(109)

There are a couple of things to note: there is a space between "id" and "desc" which looks suspicious, but changing wt to XML and leaving the URL otherwise the same causes Solr no grief when queried via a browser; the index is in fact empty; and this particular section of code is bulk-loading our documents, using the max-id query to figure out where to start from. (I can and will try catching the exception and assuming 0, but ideally I wouldn't get an exception just from doing the query.) Am I doing this query in the wrong way? Thanks, Reuben
Re: Is kill -9 safe or not?
Yonik, uncommitted (as in Solr uncommitted) or unflushed? Thanks, Otis
Re: Attempt to query for max id failing with exception
Yep, thanks - this turned out to be a systems configuration error. Our sysadmin hadn't opened up the HTTP port on the server's internal network interface; I could browse to it from outside (i.e. Firefox on my machine), but the Apache landing page was being returned when CommonsHttpSolrServer tried to get at it. Reuben
Re: Is kill -9 safe or not?
On Fri, Aug 7, 2009 at 12:04 PM, Otis Gospodnetic otis_gospodne...@yahoo.com wrote: Yonik, uncommitted (as in Solr uncommitted) or unflushed?

Solr uncommitted. Even if the docs hit the disk via a segment flush, they aren't part of the index until the index descriptor (segments_n) is written pointing to that new segment. -Yonik http://www.lucidimagination.com
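Given Yonik's answer, a control script can safely escalate to -9 after a graceful attempt. A hedged sketch (not an official Solr script; the function name and timeout are arbitrary):

```shell
# Try a clean shutdown first (SIGTERM gives the JVM a chance to run its
# shutdown hooks), then fall back to kill -9, which per this thread cannot
# corrupt the index but does lose any uncommitted documents.
stop_solr() {
  pid=$1
  tries=${2:-10}                       # seconds to wait before escalating
  kill "$pid" 2>/dev/null              # polite: SIGTERM
  i=0
  while kill -0 "$pid" 2>/dev/null; do # still running?
    i=$((i + 1))
    if [ "$i" -ge "$tries" ]; then
      kill -9 "$pid" 2>/dev/null       # last resort: SIGKILL
      break
    fi
    sleep 1
  done
  return 0
}
```

Usage would be something like `stop_solr "$(cat solr.pid)"` from the control script.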
Solr CMS Integration
I've been asked to suggest a framework for managing a website's content and making all that content searchable. I'm comfortable using Solr for search, but I don't know where to start with the content management system. Is anyone using a CMS (open source or commercial) that you've integrated with Solr for search and are happy with? This will be a consumer-facing website with a combination of articles, blogs, white papers, etc. Thanks, Wojtek -- View this message in context: http://www.nabble.com/Solr-CMS-Integration-tp24868462p24868462.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Preserving C++ and other weird tokens
Ach, sorry I didn't find this before posting! - Michael Yonik Seeley-2 wrote: http://search.lucidimagination.com/search/document/2d325f6178afc00a/how_to_search_for_c -Yonik http://www.lucidimagination.com -- View this message in context: http://www.nabble.com/Preserving-%22C%2B%2B%22-and-other-weird-tokens-tp24848968p24868579.html Sent from the Solr - User mailing list archive at Nabble.com.
Question regarding merging Solr indexes
Hello, I have a MultiCore setup with 3 cores. I am trying to merge the indexes of core1 and core2 into core3. I looked at the wiki, but I'm somewhat unclear on what needs to happen. This is what I used:

http://localhost:9085/solr/core3/admin/?action=mergeindexes&core=core3&indexDir=/solrHome/core1/data/index&indexDir=/solrHome/core2/data/index&commit=true

When I hit this I just go to the admin page for core3. Maybe the way I reference the indexes is incorrect? What path goes there, anyway? Thanks -- View this message in context: http://www.nabble.com/Question-regarding-merging-Solr-indexes-tp24868670p24868670.html Sent from the Solr - User mailing list archive at Nabble.com.
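A hedged guess (I can't verify against this setup): on trunk/1.4 the merge is an action of the CoreAdmin handler at /solr/admin/cores, not of the per-core admin page, so the request may need to look more like the following, with the paths URL-escaped if necessary:

```
http://localhost:9085/solr/admin/cores?action=mergeindexes&core=core3&indexDir=/solrHome/core1/data/index&indexDir=/solrHome/core2/data/index
```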
Re: Solr CMS Integration
Hi Wojtek, if you're comfortable with PHP you might want to look at Drupal (http://drupal.org/project/apachesolr), which sounds like a good match for your requirements. Regards, Andre
Re: Solr CMS Integration
lucidimagination.com is powered off of Drupal, and we index it using Solr (but not via the Drupal plugin, as we have non-CMS data as well). It has blogs, articles, white papers, mail archives, JIRA tickets, wikis, etc.

-- Grant Ingersoll http://www.lucidimagination.com/ Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids) using Solr/Lucene: http://www.lucidimagination.com/search
localSolr install
Is there any sort of guide to installing and configuring localSolr into an existing Solr implementation? I'm not extremely well versed in Java applications, but I've managed to cobble together Jetty and Solr multicore fairly reliably. I've downloaded localLucene 2.0 and localSolr 6.1, and this is where the guesswork starts. Any help is greatly appreciated.
Re: localSolr install
Hi All, I also need the same information. I am planning to set up Solr. I have around 20 to 30 million records, all in CSV format. Your help is highly appreciated. Regards, Bhargava S Akula. 2009/8/7 Brian Klippel br...@theport.com: Is there any sort of guide to installing and configuring localSolr into an existing Solr implementation? I'm not extremely well versed in Java applications, but I've managed to cobble together Jetty and Solr multicore fairly reliably. I've downloaded localLucene 2.0 and localSolr 6.1, and this is where the guesswork starts. Any help is greatly appreciated.
Re: Is kill -9 safe or not?
Thanks for the confirmation and reassurance! - Michael Yonik Seeley-2 wrote: On Fri, Aug 7, 2009 at 12:04 PM, Otis Gospodnetic otis_gospodne...@yahoo.com wrote: Yonik, Uncommitted (as in Solr uncommitted) or unflushed? Solr uncommitted. Even if the docs hit the disk via a segment flush, they aren't part of the index until the index descriptor (segments_n) is written pointing to that new segment. -Yonik http://www.lucidimagination.com -- View this message in context: http://www.nabble.com/Is-kill--9-safe-or-not--tp24866506p24869260.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Solr CMS Integration
I would second that and add that you may want to consider acquia.com, as they provide a solid infrastructure to support the Solr instance. On Fri, Aug 7, 2009 at 11:20 AM, Andre Hagenbruch andre.hagenbr...@rub.de wrote: wojtekpia schrieb: I've been asked to suggest a framework for managing a website's content and making all that content searchable. I'm comfortable using Solr for search, but I don't know where to start with the content management system. Is anyone using a CMS (open source or commercial) that you've integrated with Solr for search and are happy with? This will be a consumer-facing website with a combination of articles, blogs, white papers, etc. Hi Wojtek, if you're comfortable with PHP you might want to look at Drupal (http://drupal.org/project/apachesolr), which sounds like a good match for your requirements... Regards, Andre -- Contact me: 801.850.2953 (cell or sms) facebook: http://www.facebook.com/profile.php?id=534661678 LinkedIn: http://www.linkedin.com/profile?viewProfile=key=3902213 website: scanalytix.com
Re: Solr CMS Integration
Thanks for the responses. I'll give Drupal a shot. It sounds like it'll do the trick, and if it doesn't then at least I'll know what I'm looking for. Wojtek -- View this message in context: http://www.nabble.com/Solr-CMS-Integration-tp24868462p24870218.html Sent from the Solr - User mailing list archive at Nabble.com.
PhoneticFilterFactory related questions
Hi, I have a schema with three (relevant to this question) fields: title, author, book_content. I found that if PhoneticFilterFactory is used as a filter on book_content, it was bringing back all kinds of unrelated results, so I have it applied only against title and author. Questions -- 1) I have the filter set up on both the index and query analyzers for the fieldType of title/author. When running against an index which had been built without the phonetic filter, phonetic searches still worked. Is there a performance benefit to applying the phonetic filter to the index analyzer as well as the query analyzer, are there other benefits to doing so, or should I not bother? (I.e. should I just apply the filter to the query analyzer?) 2) Title / author matches are generally boosted, which is fine if it's an exact match (i.e. Shakespeare In Love or by William Shakespeare are more relevant than a book which mentions Shakespeare). However, the phonetic filter put a bit of a spanner in the works - now if I search for bottling, books with the word b*a*ttling in the title show up above books with the non-substituted word in the content. How can I juggle the boosting / field setup to be something like: a) Title/author matches (with exactly matched spelling - stemming etc is fine) b) Content matches (with exactly matched spelling) c) Title/author matches (with phoneme equivalent spelling) Do I need to create separate non-phonetic title/author fields for this, or is there a different way to achieve the same effect? Thanks Reuben
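One common way to get the a/b/c ordering above is to keep title and author as plain text fields, copy them into separate phonetic-only fields, and give those copies the lowest boost. A hypothetical schema sketch (the text_phonetic/title_phonetic names are made up for illustration, not taken from the thread):

```xml
<!-- Sketch only: phonetic sibling fields for title/author. -->
<fieldType name="text_phonetic" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- inject="false" keeps only the phonetic codes in this field,
         so it matches purely on phoneme equivalence -->
    <filter class="solr.PhoneticFilterFactory" encoder="DoubleMetaphone" inject="false"/>
  </analyzer>
</fieldType>

<field name="title_phonetic"  type="text_phonetic" indexed="true" stored="false"/>
<field name="author_phonetic" type="text_phonetic" indexed="true" stored="false"/>
<copyField source="title"  dest="title_phonetic"/>
<copyField source="author" dest="author_phonetic"/>
```

With dismax, something like qf=title^10 author^10 book_content^2 title_phonetic^0.5 author_phonetic^0.5 should then rank exact title/author matches first, exact content matches next, and phoneme-only title/author matches last (the boost values are guesses to tune, not prescriptions).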
Solr Security
Has anyone had experience setting up Solr security? http://wiki.apache.org/solr/SolrSecurity I would like to implement HTTP authentication or path-based authentication. So, in webdefault.xml I set the following:

<security-constraint>
  <web-resource-collection>
    <web-resource-name>Solr authenticated application</web-resource-name>
    <url-pattern>/core1/*</url-pattern>
  </web-resource-collection>
  <auth-constraint>
    <role-name>core1-role</role-name>
  </auth-constraint>
</security-constraint>
<login-config>
  <auth-method>BASIC</auth-method>
  <realm-name>Test Realm</realm-name>
</login-config>

What should I put in url-pattern and web-resource-name? Then I set up realm.properties like this:

guest: guest, core1-role

Francis
Re: Solr CMS Integration
On Aug 7, 2009, at 19:01, wojtekpia wrote: I've been asked to suggest a framework for managing a website's content and making all that content searchable. I'm comfortable using Solr for search, but I don't know where to start with the content management system. Is anyone using a CMS (open source or commercial) that you've integrated with Solr for search and are happy with? This will be a consumer-facing website with a combination of articles, blogs, white papers, etc. Hi Wojtek, Have a look at TYPO3. http://typo3.org/ It is quite powerful. Ingo and I are currently implementing a Solr extension for it. We currently use it at http://www.be-lufthansa.com/ Contact me if you want an insight. Many greetings, Olivier -- Olivier Dobberkau . . . . . . . . . . . . . . Je TYPO3, desto d.k.d d.k.d Internet Service GmbH Kaiserstrasse 73 D 60329 Frankfurt/Main Fon: +49 (0)69 - 247 52 18 - 0 Fax: +49 (0)69 - 247 52 18 - 99 Mail: olivier.dobber...@dkd.de Web: http://www.dkd.de Registergericht: Amtsgericht Frankfurt am Main Registernummer: HRB 45590 Geschäftsführer: Olivier Dobberkau, Søren Schaffstein, Götz Wegenast Aktuelle Projekte: http://bewegung.taz.de - Launch (Ruby on Rails) http://www.hans-im-glueck.de - Relaunch (TYPO3) http://www.proasyl.de - Relaunch (TYPO3)
Re: Solr CMS Integration
Hello Wojtek, I don't want to discourage any of the famous CMSs around, nor Solr uptake, but XWiki is quite a powerful CMS and has a search that is Lucene-based. paul On Aug 7, 2009, at 22:42, Olivier Dobberkau wrote: I've been asked to suggest a framework for managing a website's content and making all that content searchable. I'm comfortable using Solr for search, but I don't know where to start with the content management system. Is anyone using a CMS (open source or commercial) that you've integrated with Solr for search and are happy with? This will be a consumer-facing website with a combination of articles, blogs, white papers, etc. Have a look at TYPO3. http://typo3.org/ It is quite powerful. Ingo and I are currently implementing a Solr extension for it. We currently use it at http://www.be-lufthansa.com/ Contact me if you want an insight.
spellcheck component in 1.4 distributed
I am e-mailing to inquire about the status of the distributed spell-checking component in 1.4. I saw SOLR-785, but it is unreleased and targeted for 1.5. Any help would be much appreciated. Thanks in advance, Mike
Re: solr v1.4 in production?
Pubget has been using 1.4 for a while now to make the replication easier. http://pubget.com We compiled a while back and are thinking of updating to the latest build to start playing with distributed spell checking. On Fri, Aug 7, 2009 at 7:42 AM, Shalin Shekhar Mangar shalinman...@gmail.com wrote: On Wed, Jul 1, 2009 at 6:17 PM, Ed Summers e...@pobox.com wrote: Here at the Library of Congress we've got several production Solr instances running v1.3. We've been itching to get at what will be v1.4 and were wondering if anyone else happens to be using it in production yet. Any information you can provide would be most welcome. We're using Solr 1.4 built from r793546 in production along with the new java based replication. -- Regards, Shalin Shekhar Mangar. -- Regards, Ian Connor
Can multiple Solr webapps access the same lucene index files?
Hello, I have a question I can't find an answer to on the list. Can multiple Solr webapps (for instance in separate cluster nodes) share the same Lucene index files stored on a shared filesystem? We do this with a custom Lucene search application right now; I'm trying to switch to Solr and am curious if we can use the same deployment strategy. Mark
MoreLikeThis: How to get quality terms from html from content stream?
I'm using the MoreLikeThisHandler with a content stream to get documents from my index that match content from an html page like this: http://localhost:8080/solr/mlt?stream.url=http://www.sfgate.com/cgi-bin/article.cgi?f=/c/a/2009/08/06/SP5R194Q13.DTLmlt.fl=bodyrows=4debugQuery=true But, not surprisingly, the query generated is meaningless because a lot of the markup is picked out as terms: str name=parsedquery_toString body:li body:href body:div body:class body:a body:script body:type body:js body:ul body:text body:javascript body:style body:css body:h body:img body:var body:articl body:ad body:http body:span body:prop /str Does anyone know a way to transform the html so that the content can be parsed out of the content stream and processed w/o the markup? Or do I need to write my own HTMLParsingMoreLikeThisHandler? If I parse the content out to a plain text file and point the stream.url param to file:///parsedfile.txt it works great. -Jay
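Short of writing a custom handler, one workaround is to strip the markup client-side and send the plain text to the MLT handler via stream.body instead of stream.url. A minimal sketch using Python's stdlib HTMLParser (the helper names are made up; this is not part of Solr):

```python
import re
from html.parser import HTMLParser

class TextExtractor(HTMLParser):
    """Collects character data, skipping the contents of <script> and <style>."""
    def __init__(self):
        super().__init__()
        self._skip = 0       # depth inside script/style elements
        self._chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip:
            self._chunks.append(data)

def strip_html(html):
    """Return the visible text of an HTML document, whitespace-normalized."""
    parser = TextExtractor()
    parser.feed(html)
    return re.sub(r"\s+", " ", " ".join(parser._chunks)).strip()

# The plain text can then be POSTed to the MLT handler via stream.body
# (instead of stream.url), e.g.:
#   params = urlencode({"stream.body": text, "mlt.fl": "body", "rows": 4})
#   urlopen("http://localhost:8080/solr/mlt", data=params.encode())
```

With the markup, scripts, and styles gone, the generated query should be built from real content terms rather than tag names.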
How to use key with facet.prefix?
I'm trying to facet multiple times on the same field using key. This works fine except when I use prefixes for these facets. What I've got so far (not functional):

facet=true
&facet.field=category
&f.category.facet.prefix=01
&facet.field={!key=subcat}category
&f.subcat.facet.prefix=00

This gives me 2 facets in the results, one named 'category' and another 'subcat', as expected. But the prefix for key 'subcat' is ignored and the other prefix is used for both facets. How do I use key with prefixes, or am I barking up the wrong tree here? Thanks!
Re: Can multiple Solr webapps access the same lucene index files?
Yes, they could all point to an index that lives on a NAS or SAN, for example. You'd still have to make sure only one server is writing to the index at a time. Zookeeper can help with coordination of that. Otis -- Sematext is hiring -- http://sematext.com/about/jobs.html?mls Lucene, Solr, Nutch, Katta, Hadoop, HBase, UIMA, NLP, NER, IR - Original Message From: Mark Diggory mdigg...@gmail.com To: solr-user@lucene.apache.org Sent: Friday, August 7, 2009 8:16:46 PM Subject: Can multiple Solr webapps access the same lucene index files? Hello, I have a question I can't find an answer to on the list. Can multiple Solr webapps (for instance in separate cluster nodes) share the same Lucene index files stored on a shared filesystem? We do this with a custom Lucene search application right now; I'm trying to switch to Solr and am curious if we can use the same deployment strategy. Mark
Re: Question regarding merging Solr indexes
On Fri, Aug 7, 2009 at 10:45 PM, ahammad ahmed.ham...@gmail.com wrote: Hello, I have a MultiCore setup with 3 cores. I am trying to merge the indexes of core1 and core2 into core3. I looked at the wiki but I'm somewhat unclear on what needs to happen. This is what I used: http://localhost:9085/solr/core3/admin/?action=mergeindexes&core=core3&indexDir=/solrHome/core1/data/index&indexDir=/solrHome/core2/data/index&commit=true When I hit this I just go to the admin page for core3. Maybe the way I reference the indexes is incorrect? What path goes there anyway? Look at http://wiki.apache.org/solr/MergingSolrIndexes#head-0befd0949a54b6399ff926062279afec62deb9ce -- Regards, Shalin Shekhar Mangar.
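For reference, the wiki page sends the merge request to the CoreAdmin handler (typically /solr/admin/cores), not to the target core's own admin URL, with one indexDir parameter per source index. A small sketch that just builds such a URL (host and paths are the ones from the question; adjust to your setup):

```python
from urllib.parse import urlencode

def merge_indexes_url(coreadmin_url, target_core, index_dirs):
    # CoreAdmin "mergeindexes" action: the request goes to the CoreAdmin
    # handler, and each source index gets its own indexDir parameter.
    params = [("action", "mergeindexes"), ("core", target_core)]
    params += [("indexDir", d) for d in index_dirs]
    return coreadmin_url + "?" + urlencode(params)

url = merge_indexes_url(
    "http://localhost:9085/solr/admin/cores",
    "core3",
    ["/solrHome/core1/data/index", "/solrHome/core2/data/index"],
)
print(url)
```

Hitting the resulting URL (e.g. with curl) asks Solr to merge the two source indexes into core3; a commit on core3 afterwards makes the merged documents visible.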