Re: Setting termInfosIndexDivisor and Interval?
On Mon, Jul 20, 2009 at 8:04 AM, Jason Rutherglen jason.rutherg...@gmail.com wrote: Are we currently supporting this, or in 1.4? (i.e. IndexReader.open and IndexWriter.setTermIndexInterval) It's useful for trie range, shingles, etc., where many terms are potentially created. No, we don't currently, but we should. Let's open an issue. -- Regards, Shalin Shekhar Mangar.
Help needed with Solr maxBooleanClauses
Hi, We have a scenario where we need to send more than 1024 ids in the Solr URL as an OR condition. I have changed the value of maxBooleanClauses in solrconfig.xml to 2048, but it is still failing after 1024 OR conditions. Solr throws SEVERE: org.apache.solr.common.SolrException: Bad Request whenever I send more than 1024 OR conditions. Is there any way I can change this value in the Solr configuration? Thanks, Dipanjan
How to configure Solr in Glassfish?
I want to use Glassfish as the Solr search server, but I don't know how to configure it. Does anybody know? enzhao...@gmail.com Thanks! -- View this message in context: http://www.nabble.com/How-to-configure-Solr--in-Glassfish---tp24565758p24565758.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Help needed with Solr maxBooleanClauses
On Mon, Jul 20, 2009 at 1:37 PM, dipanjan_pramanick dipanjan_praman...@infosys.com wrote: Hi, We have a scenario where we need to send more than 1024 ids in the Solr URL as an OR condition. I have changed the value of maxBooleanClauses in solrconfig.xml to 2048, but it is still failing after 1024 OR conditions. Solr throws SEVERE: org.apache.solr.common.SolrException: Bad Request whenever I send more than 1024 OR conditions. Is there any way I can change this value in the Solr configuration? The maxBooleanClauses limit is there as a safeguard against extremely slow queries. If you can tell us about the exact problem you are solving, we may be able to suggest an alternative approach; creating such huge boolean clauses may be a bad design choice. As for the exception you are seeing, it seems to me that you may be exceeding the maximum size of a GET request. Using an HTTP POST request may work. -- Regards, Shalin Shekhar Mangar.
Re: Help needed with Solr maxBooleanClauses
Hi Shalin, Thanks for taking the time to respond to this issue. It's true that there is a design flaw, because of which we need to support a huge list of OR conditions through Solr. Still, I would like to know whether there is any configuration, other than <maxBooleanClauses>1024</maxBooleanClauses> in solrConfig.xml, through which we can pass more than 1024 OR conditions. Regarding HTTP POST: Solr 1.3 only accepts the request in URL form, not as a request parameter or request object; that's another issue. Hence we need to send the query in URL form only. Thanks, Dipanjan From: Shalin Shekhar Mangar shalinman...@gmail.com Reply-To: solr-user@lucene.apache.org Date: Mon, 20 Jul 2009 13:58:55 +0530 To: solr-user@lucene.apache.org Subject: Re: Help needed with Solr maxBooleanClauses On Mon, Jul 20, 2009 at 1:37 PM, dipanjan_pramanick dipanjan_praman...@infosys.com wrote: Hi, We have a scenario where we need to send more than 1024 ids in the Solr URL as an OR condition. I have changed the value of maxBooleanClauses in solrconfig.xml to 2048, but it is still failing after 1024 OR conditions. Solr throws SEVERE: org.apache.solr.common.SolrException: Bad Request whenever I send more than 1024 OR conditions. Is there any way I can change this value in the Solr configuration? The maxBooleanClauses limit is there as a safeguard against extremely slow queries. If you can tell us about the exact problem you are solving, we may be able to suggest an alternative approach; creating such huge boolean clauses may be a bad design choice. As for the exception you are seeing, it seems to me that you may be exceeding the maximum size of a GET request. Using an HTTP POST request may work. -- Regards, Shalin Shekhar Mangar. CAUTION - Disclaimer * This e-mail contains PRIVILEGED AND CONFIDENTIAL INFORMATION intended solely for the use of the addressee(s). If you are not the intended recipient, please notify the sender by e-mail and delete the original message.
Further, you are not to copy, disclose, or distribute this e-mail or its contents to any other person and any such actions are unlawful. This e-mail may contain viruses. Infosys has taken every reasonable precaution to minimize this risk, but is not liable for any damage you may sustain as a result of any virus in this e-mail. You should carry out your own virus checks before opening the e-mail or attachment. Infosys reserves the right to monitor and review the content of all messages sent to or from this e-mail address. Messages sent to or from this e-mail address may be stored on the Infosys e-mail system. ***INFOSYS End of Disclaimer INFOSYS***
Re: Help needed with Solr maxBooleanClauses
On Mon, Jul 20, 2009 at 2:12 PM, dipanjan_pramanick dipanjan_praman...@infosys.com wrote: It's true that there is a design flaw, because of which we need to support a huge list of OR conditions through Solr. Still, I would like to know whether there is any configuration, other than <maxBooleanClauses>1024</maxBooleanClauses> in solrConfig.xml, through which we can pass more than 1024 OR conditions. Regarding HTTP POST: Solr 1.3 only accepts the request in URL form, not as a request parameter or request object; that's another issue. Hence we need to send the query in URL form only. Changing the value of maxBooleanClauses in solrconfig.xml is sufficient. The problem here is that you may be exceeding the maximum allowed size of an HTTP GET request (is that 2KB?). You must use a POST request to send such a huge query string. Again, it will help if you can post the complete stack trace of the error. -- Regards, Shalin Shekhar Mangar.
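To make the POST suggestion concrete, here is a minimal sketch (my own, not from the thread; the class and method names are invented) of building the large OR query and encoding it as a form body. POSTing that body to /solr/select with Content-Type application/x-www-form-urlencoded sidesteps the GET URL length limit, since Solr's select handler also accepts form-encoded POST parameters:

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.List;

public class PostQueryBuilder {
    // Join a list of ids into one boolean OR query, e.g. "id:(101 OR 102)".
    static String buildIdQuery(List<String> ids) {
        return "id:(" + String.join(" OR ", ids) + ")";
    }

    // Encode the query as an application/x-www-form-urlencoded body;
    // this string is what gets written to the POST output stream.
    static String buildFormBody(String query) {
        return "q=" + URLEncoder.encode(query, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String q = buildIdQuery(List.of("101", "102", "103"));
        System.out.println(buildFormBody(q)); // q=id%3A%28101+OR+102+OR+103%29
    }
}
```

Note that maxBooleanClauses in solrconfig.xml still has to be raised; the POST only removes the transport-level limit.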
Confusion around Binary/XML in SolrJ
I am using Solr 1.4 dev in a multicore setup. Each core's solrconfig.xml has the following lines: <requestHandler name="/update" class="solr.XmlUpdateRequestHandler" /> <requestHandler name="/update/javabin" class="solr.BinaryUpdateRequestHandler" /> I am using SolrJ as an EmbeddedSolrServer. When I try to add a POJO (with @Field annotations), the data does not get indexed, whereas if I use the SolrInputDocument way, the data gets indexed. PS: Both ways I am adding data using addBean/add and then commit followed by optimize. PPS: The final intention is that all indexing and searching be done in the binary format, since I am running on a single machine. Could someone provide insights on this issue? Thanks!
Re: Confusion around Binary/XML in SolrJ
Another observation: I am even unable to delete documents using the EmbeddedSolrServer (on a specific core). Steps: 1) I have 2 cores (core0, core1); each of them has ~10 records. 2)
System.setProperty("solr.solr.home", "/home/user/projects/solr/example/multi");
File home = new File("/home/user/projects/solr/example/multi");
File f = new File(home, "solr.xml");
CoreContainer coreContainer = new CoreContainer();
coreContainer.load("/home/user/projects/solr/example/multi", f);
SolrServer server = new EmbeddedSolrServer(coreContainer, "core1");
server.deleteByQuery("*:*");
server.commit();
server.optimize();
3) When I check the status using http://localhost:8983/solr/admin/cores?action=STATUS , I still see the same number of numDocs. 4) If I try deleting using CommonsHttpSolrServer, it works fine:
String url = "http://localhost:8983/solr/core1";
CommonsHttpSolrServer server = new CommonsHttpSolrServer(url);
server.setSoTimeout(1000); // socket read timeout
server.setConnectionTimeout(100);
server.setDefaultMaxConnectionsPerHost(100);
server.setMaxTotalConnections(100);
server.setFollowRedirects(false); // defaults to false
server.setAllowCompression(true);
server.setMaxRetries(1); // defaults to 0. 1 not recommended.
server.setRequestWriter(new BinaryRequestWriter());
server.deleteByQuery("*:*");
server.commit();
server.optimize();
Thanks! On Mon, Jul 20, 2009 at 3:26 PM, Code Tester codetester.codetes...@gmail.com wrote: I am using Solr 1.4 dev in a multicore setup. Each core's solrconfig.xml has the following lines: <requestHandler name="/update" class="solr.XmlUpdateRequestHandler" /> <requestHandler name="/update/javabin" class="solr.BinaryUpdateRequestHandler" /> I am using SolrJ as an EmbeddedSolrServer. When I try to add a POJO (with @Field annotations), the data does not get indexed, whereas if I use the SolrInputDocument way, the data gets indexed.
PS: Both ways I am adding data using addBean/add and then commit followed by optimize PPS: The final intention is that all the indexing and searching needs to be done in the binary format since I am running on a single machine. Could someone provide insights on this issue ? Thanks!
Re: Help needed with Solr maxBooleanClauses
Hi Shalin, We just found that there is no limit on the Solr side on the maximum number of boolean conditions. We have set <maxBooleanClauses>2048</maxBooleanClauses> and we are able to send about 1574 OR conditions; over that limit, we get HTTP/1.1 400 Bad Request. You are correct, it's not a Solr issue; HTTP GET is not able to send such a large request. But now the question is: Solr only accepts the request in URL form, not as a request parameter or request object. That's the main issue; hence we need to send the query in URL form only. Can you please suggest whether you have tried passing the query as an object or request entity using POST instead of in the URL? Thanks, Dipanjan From: Shalin Shekhar Mangar shalinman...@gmail.com Reply-To: solr-user@lucene.apache.org Date: Mon, 20 Jul 2009 14:54:04 +0530 To: solr-user@lucene.apache.org Subject: Re: Help needed with Solr maxBooleanClauses On Mon, Jul 20, 2009 at 2:12 PM, dipanjan_pramanick dipanjan_praman...@infosys.com wrote: It's true that there is a design flaw, because of which we need to support a huge list of OR conditions through Solr. Still, I would like to know whether there is any configuration, other than <maxBooleanClauses>1024</maxBooleanClauses> in solrConfig.xml, through which we can pass more than 1024 OR conditions. Regarding HTTP POST: Solr 1.3 only accepts the request in URL form, not as a request parameter or request object; that's another issue. Hence we need to send the query in URL form only. Changing the value of maxBooleanClauses in solrconfig.xml is sufficient. The problem here is that you may be exceeding the maximum allowed size of an HTTP GET request (is that 2KB?). You must use a POST request to send such a huge query string. Again, it will help if you can post the complete stack trace of the error. -- Regards, Shalin Shekhar Mangar.
post error - ERROR:unknown field 'title'
Hi guys. I have two different Solr versions as I am evaluating nightly builds. On a more recent one (I think 15th July) I am getting the following error: ERROR:unknown field 'title' I am posting to 'solr/update/extract' with the following:
curl "http://localhost:8983/solr/update/extract?ext.literal.id=1&ext.literal.code=somecode&ext.literal.url=someurl/file.pdf&ext.literal.category=somecat&ext.literal.updated=2009-06-01T09:10:30.000Z&ext.idx.attr=true&ext.def.fl=text" -F myfi...@1411_9.pdf
My schema does not, and is not intended to, contain a 'title' field. Thanks in advance for your help, -- Ross
Re: Confusion around Binary/XML in SolrJ
Sorry everyone, found the issue. It was a very stupid assumption: my code and Solr were running as two different processes! (The weird part is that when I ran the code using EmbeddedSolrServer, it did not throw any exception even though a server was already running on that port.) Thanks! On Mon, Jul 20, 2009 at 3:41 PM, Code Tester codetester.codetes...@gmail.com wrote: Another observation: I am even unable to delete documents using the EmbeddedSolrServer (on a specific core). Steps: 1) I have 2 cores (core0, core1); each of them has ~10 records. 2)
System.setProperty("solr.solr.home", "/home/user/projects/solr/example/multi");
File home = new File("/home/user/projects/solr/example/multi");
File f = new File(home, "solr.xml");
CoreContainer coreContainer = new CoreContainer();
coreContainer.load("/home/user/projects/solr/example/multi", f);
SolrServer server = new EmbeddedSolrServer(coreContainer, "core1");
server.deleteByQuery("*:*");
server.commit();
server.optimize();
3) When I check the status using http://localhost:8983/solr/admin/cores?action=STATUS , I still see the same number of numDocs. 4) If I try deleting using CommonsHttpSolrServer, it works fine:
String url = "http://localhost:8983/solr/core1";
CommonsHttpSolrServer server = new CommonsHttpSolrServer(url);
server.setSoTimeout(1000); // socket read timeout
server.setConnectionTimeout(100);
server.setDefaultMaxConnectionsPerHost(100);
server.setMaxTotalConnections(100);
server.setFollowRedirects(false); // defaults to false
server.setAllowCompression(true);
server.setMaxRetries(1); // defaults to 0. 1 not recommended.
server.setRequestWriter(new BinaryRequestWriter());
server.deleteByQuery("*:*");
server.commit();
server.optimize();
Thanks! On Mon, Jul 20, 2009 at 3:26 PM, Code Tester codetester.codetes...@gmail.com wrote: I am using Solr 1.4 dev in a multicore setup.
Each core's solrconfig.xml has the following lines: <requestHandler name="/update" class="solr.XmlUpdateRequestHandler" /> <requestHandler name="/update/javabin" class="solr.BinaryUpdateRequestHandler" /> I am using SolrJ as an EmbeddedSolrServer. When I try to add a POJO (with @Field annotations), the data does not get indexed, whereas if I use the SolrInputDocument way, the data gets indexed. PS: Both ways I am adding data using addBean/add and then commit followed by optimize. PPS: The final intention is that all indexing and searching be done in the binary format, since I am running on a single machine. Could someone provide insights on this issue? Thanks!
Solr and UIMA
We are starting to use UIMA as a platform to analyze text. The result of analyzing a document is a UIMA CAS, a generic data structure that can contain different kinds of data. UIMA processes single documents: it gets them from a CAS producer, processes them through a pipe that the user defines, and finally sends the result to a CAS consumer, which saves or stores it. The pipe is a chain of different tools that annotate the text with different information. Different sets of tools are available out there, each defining its own data types that are included in the CAS. To build a pipe, the output and input CAS of the elements to be connected need to be compatible. There is a CAS consumer that feeds a Lucene index, called Lucas, but having looked at it, I would prefer to connect UIMA to Solr. Why? A: I know Solr ;-) and I like it. B: I can configure the fields and their processing in Solr using XML. Once done, I have it ready to use with a set of tools that let me easily explore the data. C: It is easier to use Solr as a web service that can receive docs from different UIMA instances (natural language processing is CPU intensive). D: It breaks things down: the CAS consumer would only produce XML that Solr can process, and different tokenizers can then be used to deal with the data in the CAS. The main point is that the XML carries the doc and field labels of Solr. E: The set of capabilities to process the XML is defined in XML, similar to Lucas for defining the output, and in the Solr schema for defining how it is processed. I want to use this to index something that is common, but that I can't get any tool to do with Solr: indexing a word and encoding, at the same position, its syntactic and semantic information.
I know that in Lucene this is evolving and it will eventually be possible to include such metadata, but for the moment it is not there. So my idea is first to produce a UIMA CAS consumer that POSTs an XML file containing the plain text of the document to Solr, then modify it to include multiple fields and start encoding the semantic information. Before starting, I would like to hear your opinions, and whether anyone is interested in collaborating or has code that could be integrated into this.
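For reference, the XML such a CAS consumer would need to produce is just Solr's standard update format, something like the fragment below (the field names here are only placeholders; they would come from the target schema.xml):

```xml
<add>
  <doc>
    <field name="id">doc-001</field>
    <field name="text">Plain text extracted from the CAS.</field>
  </doc>
</add>
```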
Wildcards at the Beginning of a Search.
Hallo Solr users... I tried to search with a wildcard at the beginning of a search term. For example, I want to search for *est and get test, vogelnest, fest, ... But it doesn't work; I always get an error. Now my big brother Google tells me that it can work, but that a search with a wildcard at the beginning takes a long time. Now I want to test it, but how?
Re: Help needed with Solr maxBooleanClauses
If yours is a Java application stack, I would recommend moving to SolrJ. It is a client API which lets you talk to Solr. Learn more about it here - http://wiki.apache.org/solr/Solrj Client APIs for other languages can be found here - http://wiki.apache.org/solr/#head-ab1768efa59b26cbd30f1acd03b633f1d110ed47 Cheers Avlesh On Mon, Jul 20, 2009 at 3:44 PM, dipanjan_pramanick dipanjan_praman...@infosys.com wrote: Hi Shalin, We just found that there is no limit on the Solr side on the maximum number of boolean conditions. We have set <maxBooleanClauses>2048</maxBooleanClauses> and we are able to send about 1574 OR conditions; over that limit, we get HTTP/1.1 400 Bad Request. You are correct, it's not a Solr issue; HTTP GET is not able to send such a large request. But now the question is: Solr only accepts the request in URL form, not as a request parameter or request object. That's the main issue; hence we need to send the query in URL form only. Can you please suggest whether you have tried passing the query as an object or request entity using POST instead of in the URL? Thanks, Dipanjan From: Shalin Shekhar Mangar shalinman...@gmail.com Reply-To: solr-user@lucene.apache.org Date: Mon, 20 Jul 2009 14:54:04 +0530 To: solr-user@lucene.apache.org Subject: Re: Help needed with Solr maxBooleanClauses On Mon, Jul 20, 2009 at 2:12 PM, dipanjan_pramanick dipanjan_praman...@infosys.com wrote: It's true that there is a design flaw, because of which we need to support a huge list of OR conditions through Solr. Still, I would like to know whether there is any configuration, other than <maxBooleanClauses>1024</maxBooleanClauses> in solrConfig.xml, through which we can pass more than 1024 OR conditions. Regarding HTTP POST: Solr 1.3 only accepts the request in URL form, not as a request parameter or request object; that's another issue. Hence we need to send the query in URL form only.
Changing the value of maxBooleanClauses in solrconfig.xml is sufficient. The problem here is that you may be exceeding the maximum allowed size of an HTTP GET request (is that 2KB?). You must use a POST request to send such a huge query string. Again, it will help if you can post the complete stack trace of the error. -- Regards, Shalin Shekhar Mangar.
Re: Confusion around Binary/XML in SolrJ
On Jul 20, 2009, at 6:11 AM, Code Tester wrote: I am even unable to delete documents using the EmbeddedSolrServer (on a specific core). Steps: 1) I have 2 cores (core0, core1); each of them has ~10 records. 2)
System.setProperty("solr.solr.home", "/home/user/projects/solr/example/multi");
File home = new File("/home/user/projects/solr/example/multi");
File f = new File(home, "solr.xml");
CoreContainer coreContainer = new CoreContainer();
coreContainer.load("/home/user/projects/solr/example/multi", f);
SolrServer server = new EmbeddedSolrServer(coreContainer, "core1");
server.deleteByQuery("*:*");
server.commit();
server.optimize();
3) When I check the status using http://localhost:8983/solr/admin/cores?action=STATUS , I still see the same Assuming both of these point to the same Solr home and data directory, a commit is needed on the HTTP process in order for it to pick up changes made to the underlying index (that occurred without its knowledge). Erik
Re: Wildcards at the Beginning of a Search.
See http://issues.apache.org/jira/browse/SOLR-218 - Solr currently does not have leading wildcard support enabled. Erik On Jul 20, 2009, at 8:09 AM, Jörg Agatz wrote: Hallo Solr Users... I tryed to search with a Wildcard at the beginning from a search. for example, i will search for *est and get test, vogelnest, fest, But it dosent work, i alsways get an error... Now my Big brother GOOGLE tolds me, that it can work but a search with a Wildcad at the beginning need a long time... Now i will test ist. but How?
Posting multiple documents at once - clarification
Hi, When we post a file containing a number of documents in the format shown below to Solr, if there is some error in one of the docs, then all the docs in the file are errored out and not added to the Solr index.
<?xml ... ?>
<add>
  <doc> ... </doc>
  <doc> ... </doc>
</add>
Is there any way by which we can tell Solr to skip only the docs that have the actual error, or should we post each doc in a separate file to achieve granularity in adding? -- Thanks, Vanniarajan
Re: Posting multiple documents at once - clarification
If the error is an XML parsing error, there is no way of continuing from that point. Even otherwise, Solr assumes that if the whole payload is not correct, it is to be discarded. On Mon, Jul 20, 2009 at 6:32 PM, Vannia Rajan kvanniara...@gmail.com wrote: Hi, When we post a file containing a number of documents in the format shown below to Solr, if there is some error in one of the docs, then all the docs in the file are errored out and not added to the Solr index.
<?xml ... ?>
<add>
  <doc> ... </doc>
  <doc> ... </doc>
</add>
Is there any way by which we can tell Solr to skip only the docs that have the actual error, or should we post each doc in a separate file to achieve granularity in adding? -- Thanks, Vanniarajan -- Noble Paul | Principal Engineer | AOL | http://aol.com
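If per-document granularity matters, one workaround (my own sketch, not something Solr provides; all names here are invented) is to wrap each document in its own add envelope and post them one at a time, collecting the indices that fail instead of losing the whole batch:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

public class DocByDocPoster {
    // Wrap a single <doc>...</doc> fragment in its own <add> envelope,
    // so a parse error in one doc cannot invalidate its neighbors.
    static String wrapDoc(String docXml) {
        return "<add>" + docXml + "</add>";
    }

    // Post each doc separately; a failure only skips that one doc.
    // postFn stands in for the actual HTTP POST to /solr/update and
    // returns true on success.
    static List<Integer> postEach(List<String> docs, Predicate<String> postFn) {
        List<Integer> failed = new ArrayList<>();
        for (int i = 0; i < docs.size(); i++) {
            if (!postFn.test(wrapDoc(docs.get(i)))) {
                failed.add(i);
            }
        }
        return failed;
    }
}
```

The tradeoff is one HTTP round trip per document, which is much slower than batching; a middle ground is to post in chunks and fall back to per-doc posting only for a chunk that fails.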
RE: Word frequency count in the index
Hi Wunder, Thanks for your reply! I take your point; it has to be appropriate to your content. In the cases I deal with, using stop words wouldn't be a big deal because the documents we handle are usually proper articles (although titles could still be impacted by it). I based my stop words on the most frequent terms I could find in my index when I indexed my whole database. I'm sure it will change over time, but idf would deal with the rest. I'm inclined to keep it like this, but maybe some tests and real query analysis would be good. I will let you know if any interesting patterns emerge. Cheers, Daniel -Original Message- From: Walter Underwood [mailto:wunderw...@netflix.com] Sent: 16 July 2009 17:15 To: solr-user@lucene.apache.org Subject: Re: Word frequency count in the index I haven't researched old versions of Lucene, but I think it has always been a vector space, tf.idf engine. I don't see any hint of probabilistic scoring. A bit of background about stop words and idf: they are two versions of the same thing. Stop words are a manual, on/off decision about which words are important. That decision is high risk and easy to get wrong. We have a movie titled "To be and to have". Oops. Inverse document frequency (idf) replaces that on/off control with a proportional weight calculated from the index. For Netflix, that means that "weeds: season 2" has a high weight for "weeds" and lower weights for "season" and "2". In my control theory course, my professor told me to only use proportional control when on/off didn't work. Well, stop words don't work and idf does. For a longer list of movie titles entirely made of stop words, go here: http://wunderwood.org/most_casual_observer/2007/05/invisible_titles.html wunder On 7/16/09 8:50 AM, Daniel Alheiros daniel.alhei...@bbc.co.uk wrote: Hi Walter, Has it always been there? Which version of Lucene are we talking about?
Regards, Daniel -Original Message- From: Walter Underwood [mailto:wunderw...@netflix.com] Sent: 16 July 2009 15:04 To: solr-user@lucene.apache.org Subject: Re: Word frequency count in the index Lucene uses a tf.idf relevance formula, so it automatically finds common words (stop words) in your documents and gives them lower weight. I recommend not removing stop words at all and letting Lucene handle the weighting. wunder On 7/16/09 3:29 AM, Pooja Verlani pooja.verl...@gmail.com wrote: Hi, Is there any way in Solr to know the count of each word indexed? I want to find out the different word frequencies to figure out 'application-specific stop words'. Please let me know if it's possible. Thank you, Regards, Pooja http://www.bbc.co.uk/ This e-mail (and any attachments) is confidential and may contain personal views which are not the views of the BBC unless specifically stated. If you have received it in error, please delete it from your system. Do not use, copy or disclose the information in any way nor act in reliance on it and notify the sender immediately. Please note that the BBC monitors e-mails sent or received. Further communication will signify your consent to this.
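Walter's point about idf replacing stop words can be made concrete. A quick sketch of the classic Lucene (DefaultSimilarity-style) idf, which I read as 1 + ln(numDocs / (docFreq + 1)); worth double-checking against the Lucene version in use:

```java
public class IdfDemo {
    // Classic Lucene-style idf: rare terms score high, while terms that
    // occur in almost every document fall toward the minimum weight of 1.
    static double idf(int numDocs, int docFreq) {
        return 1.0 + Math.log((double) numDocs / (docFreq + 1));
    }

    public static void main(String[] args) {
        // A rare title word vs. a near-stop-word in a 1,000,000-doc index
        System.out.println(idf(1_000_000, 120));     // high weight
        System.out.println(idf(1_000_000, 800_000)); // low weight
    }
}
```

This is the proportional control Walter describes: no on/off stop-word list, just a smoothly decreasing weight as document frequency grows.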
method inform of SolrCoreAware called 2 times
Hey there, I have implemented a custom component which extends SearchComponent and implements SolrCoreAware. I have declared it in solrconfig.xml as: <searchComponent name="mycomp" class="solr.MyCustomComponent" /> And added it to my SearchHandler as: <arr name="last-components"> <str>mycomp</str> </arr> I am using multicore with two cores. I have noticed (doing some logging) that the inform method (the one from SolrCoreAware) is being called 2 times per core when I start my Solr instance. As I understood it, the SolrCoreAware inform method should be called just once per core. Am I right, or is it normal that it is called 2 times per core?
Re: Wildcards at the Beginning of a Search.
There is a hacky way to do it if you can pull it off. You can prepend some known prefix to the field, then strip it off when you get the results back. An example would be putting Phone: in front of every value in a phone number field; then, instead of searching like this *-111- (which won't work), you would search (Phone: *-111-). Keep in mind this works syntactically, but it basically turns the index lookup into a linear scan, so you will see a performance dip. -- Jeff Newburn Software Engineer, Zappos.com jnewb...@zappos.com - 702-943-7562 From: Erik Hatcher e...@ehatchersolutions.com Reply-To: solr-user@lucene.apache.org Date: Mon, 20 Jul 2009 08:20:15 -0400 To: solr-user@lucene.apache.org Subject: Re: Wildcards at the Beginning of a Search. See http://issues.apache.org/jira/browse/SOLR-218 - Solr currently does not have leading wildcard support enabled. Erik On Jul 20, 2009, at 8:09 AM, Jörg Agatz wrote: Hallo Solr Users... I tryed to search with a Wildcard at the beginning from a search. for example, i will search for *est and get test, vogelnest, fest, But it dosent work, i alsways get an error... Now my Big brother GOOGLE tolds me, that it can work but a search with a Wildcad at the beginning need a long time... Now i will test ist. but How?
Re: Wildcards at the Beginning of a Search.
Add setAllowLeadingWildcard(true); to the constructor of org.apache.solr.search.SolrQueryParser.java Gr, Reza On Jul 20, 2009, at 4:00 PM, Jeff Newburn wrote: There is a hacky way to do it if you can pull it off. You can prepend some known prefix to the field then strip it off when you get the results back. An example would be putting Phone: in front of every value in a phone number field then instead of searching like this *-111- (which won't work) you would search (Phone: *-111-). Keep in mind this way will work syntactically but basically changes the index into a file sort so you will see a performance dip. -- Jeff Newburn Software Engineer, Zappos.com jnewb...@zappos.com - 702-943-7562 From: Erik Hatcher e...@ehatchersolutions.com Reply-To: solr-user@lucene.apache.org Date: Mon, 20 Jul 2009 08:20:15 -0400 To: solr-user@lucene.apache.org Subject: Re: Wildcards at the Beginning of a Search. See http://issues.apache.org/jira/browse/SOLR-218 - Solr currently does not have leading wildcard support enabled. Erik On Jul 20, 2009, at 8:09 AM, Jörg Agatz wrote: Hallo Solr Users... I tryed to search with a Wildcard at the beginning from a search. for example, i will search for *est and get test, vogelnest, fest, But it dosent work, i alsways get an error... Now my Big brother GOOGLE tolds me, that it can work but a search with a Wildcad at the beginning need a long time... Now i will test ist. but How?
Solr tika and posting .pst files
Hi, I am using Solr with Tika to post various files. When I try to post a .pst file (Outlook), the file is posted but it does not contain any data. I could not find anything useful after googling. Regarding the Solr schema, I use 1) id 2) content (this is the default field). Do I need to configure Tika to be able to handle the .pst format? I would like to hear your suggestions. Note: 1) I use VB.NET as a front-end tool. 2) Other file contents are properly mapped to the content field. -- Yours, S.Selvam
RE: Wildcards at the Beginning of a Search.
Depending on how you are sending docs in for indexing, you could also add an additional field whose value is a string reverse of the primary value, then search that field with a trailing wildcard. -Original Message- From: Jeff Newburn [mailto:jnewb...@zappos.com] Sent: Monday, July 20, 2009 10:00 AM To: solr-user@lucene.apache.org Subject: Re: Wildcards at the Beginning of a Search. There is a hacky way to do it if you can pull it off. You can prepend some known prefix to the field then strip it off when you get the results back. An example would be putting Phone: in front of every value in a phone number field then instead of searching like this *-111- (which won't work) you would search (Phone: *-111-). Keep in mind this way will work syntactically but basically changes the index into a file sort so you will see a performance dip. -- Jeff Newburn Software Engineer, Zappos.com jnewb...@zappos.com - 702-943-7562 From: Erik Hatcher e...@ehatchersolutions.com Reply-To: solr-user@lucene.apache.org Date: Mon, 20 Jul 2009 08:20:15 -0400 To: solr-user@lucene.apache.org Subject: Re: Wildcards at the Beginning of a Search. See http://issues.apache.org/jira/browse/SOLR-218 - Solr currently does not have leading wildcard support enabled. Erik On Jul 20, 2009, at 8:09 AM, Jörg Agatz wrote: Hallo Solr Users... I tryed to search with a Wildcard at the beginning from a search. for example, i will search for *est and get test, vogelnest, fest, But it dosent work, i alsways get an error... Now my Big brother GOOGLE tolds me, that it can work but a search with a Wildcad at the beginning need a long time... Now i will test ist. but How?
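A sketch of that reversed-field trick (helper names are my own): at index time, store the reversed value in the extra field; at query time, reverse the leading-wildcard term and make the wildcard trailing so it can run against the reversed field:

```java
public class ReversedWildcard {
    // Index-time: the value to store in the auxiliary reversed field.
    static String reverseForIndex(String value) {
        return new StringBuilder(value).reverse().toString();
    }

    // Query-time: turn a leading-wildcard term like "*est" into the
    // trailing-wildcard term "tse*" for the reversed field.
    static String toTrailingWildcard(String leadingWildcardTerm) {
        String term = leadingWildcardTerm.substring(1); // drop leading '*'
        return new StringBuilder(term).reverse().toString() + "*";
    }

    public static void main(String[] args) {
        System.out.println(reverseForIndex("vogelnest")); // tsenlegov
        System.out.println(toTrailingWildcard("*est"));   // tse*
    }
}
```

So "*est" becomes a search for tse* against the reversed field, which matches the stored tsenlegov (i.e. vogelnest). If I remember right, Solr 1.4-era trunk also grew a ReversedWildcardFilterFactory that automates exactly this at analysis time.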
Re: Posting multiple documents at once - clarification
2009/7/20 Noble Paul നോബിള് नोब्ळ् noble.p...@corp.aol.com if the error is an XML parsing error there is no way of continuing from that point. Even otherwise, Solr assumes that if the whole payload is not correct, it is to be discarded. Thank you for your response -- Thanks, Vanniarajan
Implementing related tags
Hi, I have a specific requirement for searching and am looking for some help from the community on how to achieve it using Solr: I need to index 1 million+ documents. Each document contains (among other fields) 3 fields representing the category that doc belongs to. For example (a very simplified case to make it easier to explain): Doc 1 Place: NY, Paris, Tokyo Authors: AuthorA, AuthorB, AuthorC, AuthorD Tags: tagA, tagB, ballon Doc 2 Place: Bangkok Authors: AuthorD Tags: tagZ So each doc can contain multiple values for each of the above fields (place, author, tags). Now the searching requirement is that, by constraining on one of the values, I need a search on the related fields. Example: By giving a constraint Author: AuthorD, I need a search on the search space: Place: NY, Paris, Tokyo and Bangkok Author: AuthorA, AuthorB, AuthorC Tags: tagA, tagB and tagZ (The above result is generated by the fact that every item in the result has at least 1 doc in common with AuthorD.) So as I am typing Ba, I need to get Ballon and Bangkok (these Tags and Places have at least 1 doc where it also had AuthorD). Is such a system possible to implement using Solr? Thanks!
Recommended Articles
Does anyone have links or books to recommended reading on search in general. Would like to see some literature on larger search concepts and ideas. -- Jeff Newburn Software Engineer, Zappos.com jnewb...@zappos.com - 702-943-7562
Re: Recommended Articles
http://www.amazon.com/s/ref=nb_ss_gw?url=search-alias%3Dstripbooksfield-keywords=searchx=0y=0 Does anyone have links or books to recommended reading on search in general. Would like to see some literature on larger search concepts and ideas. -- Jeff Newburn Software Engineer, Zappos.com jnewb...@zappos.com - 702-943-7562
Re: Recommended Articles
dar...@ontrenet.com wrote: http://www.amazon.com/s/ref=nb_ss_gw?url=search-alias%3Dstripbooksfield-keywords=searchx=0y=0 Check out: http://wiki.apache.org/lucene-java/InformationRetrieval Some good stuff there, though I don't think it is often updated. My favorite is this free gem: http://www-csli.stanford.edu/~hinrich/information-retrieval-book.html -- - Mark http://www.lucidimagination.com
Solr JMX and Cacti
Hey all, We have several deployments of Solr across our enterprise. Our largest one is several GB, and when enough documents are added an OOM exception occurs. To debug this problem I have enabled JMX. My goal is to write some cacti templates similar to the ones I have done for hadoop: http://www.jointhegrid.com/hadoop/. The only cacti template for Solr I have found is old, broken, and uses curl and PHP to try to read the values off the web interface. I have a few general questions/comments and also would like to know how others are dealing with this. 1) SNMP has counters/gauges. With JMX it is hard to know what a variable is without watching it for a while. Some fields are obvious (total_x, cumulative_x), but it would be worthwhile to add some notes in the MBEAN info saying works like counter or works like gauge. This way a network engineer like me does not have to go code surfing to figure out how to graph them. Has anyone written up a list of what the attributes are, their types, and what they mean? 2) The values that are not counter style I am assuming are sampled; what is the sampling rate and is it adjustable? Any tips are helpful. Thank you,
Re: Solr JMX and Cacti
Check: http://svn.apache.org/repos/asf/lucene/solr/trunk/src/java/org/apache/solr/handler/RequestHandlerBase.java For cacti, you should probably ignore the two 'rate' based calculations as they are just derivatives: lst.add("avgTimePerRequest", (float) totalTime / (float) this.numRequests); lst.add("avgRequestsPerSecond", (float) numRequests*1000 / (float)(System.currentTimeMillis()-handlerStart));
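To make the "just derivatives" point concrete, here is a small sketch (in Python rather than the Java quoted above) of how those two attributes are computed from the raw counters; a Cacti template can therefore graph totalTime and numRequests as counters and let the grapher derive the rates itself:

```python
# The two JMX "rate" attributes are plain derivatives of the counters.

def avg_time_per_request(total_time_ms, num_requests):
    # mirrors: (float) totalTime / (float) numRequests
    return total_time_ms / num_requests

def avg_requests_per_second(num_requests, handler_start_ms, now_ms):
    # mirrors: numRequests * 1000 / (now - handlerStart)
    return num_requests * 1000 / (now_ms - handler_start_ms)
```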
SolrJ embedded server : error while adding document
Hi Solr guys, I'm starting to play with Solr after a few years with classic Lucene. I'm trying to index a single document using the embedded server, but I got a strange error which looks like an XML parsing problem (see trace hereafter). To add details, this is a simple JUnit test which creates a single document then passes it to the server in an ArrayList<SolrInputDocument>. The document only has 2 fields, id and text, as described in the configuration. Jul 20, 2009 5:50:50 PM org.apache.solr.common.SolrException log SEVERE: org.apache.solr.common.SolrException: missing content stream at org.apache.solr.handler.XmlUpdateRequestHandler.handleRequestBody(XmlUpdateRequestHandler.java:114) at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131) at org.apache.solr.core.SolrCore.execute(SolrCore.java:1204) at org.apache.solr.client.solrj.embedded.EmbeddedSolrServer.request(EmbeddedSolrServer.java:147) at org.apache.solr.client.solrj.request.UpdateRequest.process(UpdateRequest.java:217) at org.apache.solr.client.solrj.SolrServer.add(SolrServer.java:48) at org.weblab_project.services.solr.SolrComponent.flushIndexBuffer(SolrComponent.java:132) at org.weblab_project.services.solr.SolrComponentTest.testAddOneDocument(SolrComponentTest.java:66) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at junit.framework.TestCase.runTest(TestCase.java:154) at junit.framework.TestCase.runBare(TestCase.java:127) at junit.framework.TestResult$1.protect(TestResult.java:106) at junit.framework.TestResult.runProtected(TestResult.java:124) at junit.framework.TestResult.run(TestResult.java:109) at junit.framework.TestCase.run(TestCase.java:118) at junit.framework.TestSuite.runTest(TestSuite.java:208) at 
junit.framework.TestSuite.run(TestSuite.java:203) at org.eclipse.jdt.internal.junit.runner.junit3.JUnit3TestReference.run(JUnit3TestReference.java:130) at org.eclipse.jdt.internal.junit.runner.TestExecution.run(TestExecution.java:38) at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.runTests(RemoteTestRunner.java:460) at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.runTests(RemoteTestRunner.java:673) at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.run(RemoteTestRunner.java:386) at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.main(RemoteTestRunner.java:196) Jul 20, 2009 5:50:50 PM org.apache.solr.core.SolrCore execute INFO: [] webapp=null path=/update params={} status=500 QTime=6 Cannot flush the index buffer : Server error while adding documents -- Gérard Dupont Information Processing Control and Cognition (IPCC) - EADS DS http://weblab-project.org Document Learning team - LITIS Laboratory
Re: SolrJ embedded server : error while adding document
My mistake, a problem with a buffer I added. But it raises a question: does Solr (using the embedded server) have its own buffering mechanism when indexing or not? I guess not, but I might be wrong. -- Gérard Dupont Information Processing Control and Cognition (IPCC) - EADS DS http://weblab-project.org Document Learning team - LITIS Laboratory
Exception searching PhoneticFilterFactory field with number
Reposting in hopes of an answer... Hello all, I am getting the following exception whenever a user includes a numeric term in their search and the search includes a field defined with a PhoneticFilterFactory; further, it occurs whether I use the DoubleMetaphone encoder or any other. Has this ever come up before? I can replicate this with no data in the index at all, but if I search the field by hand from the Solr web interface there is no exception. I am running the Lucid Imagination 1.3 certified release in a multicore master/slaves configuration. I will include the field def and the search/exception below; let me know if I can include any more clues... it seems like it's trying to make a field with no name/value:

<fieldType name="spellcheck" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="index_synonyms.txt" ignoreCase="true" expand="false"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
    <!-- <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0"/> -->
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EnglishPorterFilterFactory" protected="protwords.txt"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
    <filter class="solr.PhoneticFilterFactory" encoder="DoubleMetaphone" inject="false"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="query_synonyms.txt" ignoreCase="true" expand="true"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
    <!-- <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0"/> -->
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EnglishPorterFilterFactory" protected="protwords.txt"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
    <filter class="solr.PhoneticFilterFactory" encoder="DoubleMetaphone" inject="false"/>
  </analyzer>
</fieldType>

Jul 17, 2009 2:42:18 PM org.apache.solr.core.SolrCore execute INFO: [10017] webapp=/solr path=/select/ params={f.partitionId.facet.limit=10f.categoryId.facet.missing=falsef.categoryId.facet.zeros=falsefacet=truefacet=truefacet=truefacet=truefacet=truefacet=truef.taxonomyCategoryId.facet.limit=-1f.priceBucketid.facet.limit=-1f.partitionId.facet.zeros=falsef.categoryId.facet.sort=truef.categoryId.facet.limit=-1f.marketplaceIds.facet.limit=10f.mfgId.facet.missing=falsef.priceBucketid.facet.zeros=falsedebugQuery=truef.priceBucketid.facet.sort=truef.partitionId.facet.missing=falsef.taxonomyCategoryId.facet.zeros=falsef.priceBucketid.facet.missing=falsefacet.field=categoryIdfacet.field=taxonomyCategoryIdfacet.field=partitionIdfacet.field=mfgIdfacet.field=marketplaceIdsfacet.field=priceBucketidf.mfgId.facet.zeros=falsef.taxonomyCategoryId.facet.sort=truef.marketplaceIds.facet.missing=falserows=48f.partitionId.facet.sort=truestart=0q=(sku:va+AND+sku:2226+AND+sku:w))+OR+((upc:va+AND+upc:2226+AND+upc:w))+OR+((mfgPartNo:va+AND+mfgPartNo:2226+AND+mfgPartNo:w))+OR+((title_en_uk:va+AND+title_en_uk:2226+AND+title_en_uk:w))^8+OR+((moreWords_en_uk:va+AND+moreWords_en_uk:2226+AND+moreWords_en_uk:w))^2+OR+((allDoublemetaphone:va+AND+allDoublemetaphone:2226+AND+allDoublemetaphone:w))^0.5)+AND+((_val_:sum\(product\(boosted,30\),product\(sales,1000\),product\(views,10\),product\(image,100\)\)f.taxonomyCategoryId.facet.missing=falsef.mfgId.facet.limit=10f.marketplaceIds.facet.sort=truef.marketplaceIds.facet.zeros=falsef.mfgId.facet.sort=true} hits=0 status=500 QTime=84 Jul 17, 2009 2:42:18 PM org.apache.solr.common.SolrException log SEVERE: java.lang.RuntimeException: java.lang.IllegalArgumentException: name and value cannot both be empty at org.apache.solr.search.QueryParsing.toString(QueryParsing.java:470) at org.apache.solr.util.SolrPluginUtils.doStandardDebug(SolrPluginUtils.java:399) at org.apache.solr.handler.component.DebugComponent.process(DebugComponent.java:54) at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:177) at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131) at org.apache.solr.core.SolrCore.execute(SolrCore.java:1205) at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:303) at
Re: Implementing related tags
Have a look at the MoreLikeThis component - http://wiki.apache.org/solr/MoreLikeThis Cheers Avlesh
Re: SolrJ embedded server : error while adding document
Not sure what you mean... yes, I guess... you send a bunch of requests with add(doc/collection) and they are not visible until you send commit(). On Jul 20, 2009, at 9:07 AM, Gérard Dupont wrote: my mistake, a problem with a buffer I added. But it raises a question: does Solr (using the embedded server) have its own buffering mechanism when indexing or not? I guess not but I might be wrong. -- Gérard Dupont Information Processing Control and Cognition (IPCC) - EADS DS http://weblab-project.org Document Learning team - LITIS Laboratory
Re: SolrJ embedded server : error while adding document
On Mon, Jul 20, 2009 at 18:35, Ryan McKinley ryan...@gmail.com wrote: you send a bunch of requests with add( doc/collection ) and they are not visible until you send commit() That's what I meant thanks. -- Gérard Dupont Information Processing Control and Cognition (IPCC) - EADS DS http://weblab-project.org Document Learning team - LITIS Laboratory
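For anyone landing on this thread, Ryan's answer can be made concrete with the XML update messages involved: documents sent in an <add> are buffered by Solr and only become searchable after a <commit/> (in SolrJ, server.commit()). A sketch of building the add message follows; the helper function is illustrative, not part of SolrJ.

```python
import xml.etree.ElementTree as ET

def add_message(doc_fields):
    """Build the <add><doc>...</doc></add> update message for one document."""
    add = ET.Element("add")
    doc = ET.SubElement(add, "doc")
    for name, value in doc_fields.items():
        field = ET.SubElement(doc, "field", {"name": name})
        field.text = value
    return ET.tostring(add, encoding="unicode")

# POST this message to /update, then POST "<commit/>" to /update
# (or call commit() in SolrJ) before the document becomes visible.
```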
Indexing issue with XML control characters
During indexing I will often get this error: SEVERE: com.ctc.wstx.exc.WstxUnexpectedCharException: Illegal character ((CTRL-CHAR, code 3)) at [row,col {unknown-source}]: [2,1] at com.ctc.wstx.sr.StreamScanner.throwInvalidSpace(StreamScanner.java:675) By looking at this list and elsewhere I know that I need to filter out most control characters, so I have been employing this regex: /[\x00-\x08\x0B\x0C\x0E-\x1F]/ But I still get the error. What is strange is that if I re-run my indexing process after a failure it will work on the previously failed node and then error out on another node some time later. That is, it is not deterministic. If I look at the text that is attempted to be indexed, it's as pure as you can get (a bunch of medical keywords like leg bones and nose). Any ideas would be greatly appreciated. The platform is: Solr implementation version: 1.3.0 694707 Lucene implementation version: 2.4-dev 691741 Mac OS X 10.5.7 JVM 1.5.0_19-b02-304 Thanks /Rupert
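For reference, the character class in Rupert's regex does match the control characters that XML 1.0 forbids (everything below U+0020 except tab, newline and carriage return), so the pattern itself looks right. A sketch of applying it; note the filter must run on the exact decoded text that is then serialized into the update message (if a second, unfiltered copy of the text is what actually gets serialized, the error reappears, which is one speculative explanation for the non-determinism described above):

```python
import re

# Control characters illegal in XML 1.0: everything below U+0020
# except tab (\x09), newline (\x0A) and carriage return (\x0D).
ILLEGAL_XML_CHARS = re.compile(r"[\x00-\x08\x0B\x0C\x0E-\x1F]")

def strip_illegal(text):
    """Remove XML-illegal control characters before building the update XML."""
    return ILLEGAL_XML_CHARS.sub("", text)
```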
Re: How to configure Solr in Glassfish ?
What have you tried? Deploying the Solr war should be pretty straightforward. The main issue is likely setting solr.home, and you have a lot of options there. You can set a system property in the startup script, set a system property in the webapp context xml (if you can locate it), or I think Glassfish offers a GUI to set such things. There really shouldn't be much more to it than that, but you should try it and see what you run into. I haven't tried out Glassfish in a couple of years now. -- - Mark http://www.lucidimagination.com
Re: Solr JMX and Cacti
Thanks Ryan. Actually, I typically graph the derivatives directly, as graphing a derivative is usually easier than writing cacti CDEFs, which can be fickle when exporting the templates between versions. But I see your point. However, do you see the point I was getting at? Without MBEAN info stating that these values are derivatives I have to dig through source code. It is not a complaint, just a note that there seems to be so much work on JMX counters, yet just a few words of description in the MBEAN info would eliminate the need to dig through the source tree when it actually comes time for someone to render these counters. [Ryan:] no doubt -- I am unfamiliar with how these get passed to JMX (or where extra docs would be helpful) -- feel free to submit a patch that adds this info, perhaps to the wiki? javadoc? This way it will be easier for the next guy. [Edward:] Also one more question on my mind: how are the JMX objects affected by a multicore deployment? Does each core have its own objects or are they shared? [Ryan:] each core/handler gets its own object -- they are not shared across cores. [Edward:] Ryan, after adding a <jmx/> element in solrconfig.xml and setting some command line -D options, JMX is available on a TCP port. From that point Java tools can read the values directly. I have console programs that output the values so cacti 'data input methods' can read the data in. I just subclass this: http://www.jointhegrid.com/svn/hadoop-cacti-jtg/trunk/src/com/jointhegrid/hadoopjmx/JMXBase.java The jconsole GUI tool then allows you to browse the JMX tree. If the MBEAN info is filled in it is displayed directly to the user. Patching the attributes to have more verbose descriptions would be very helpful. I will open a Jira for that. Thanks, Edward
RE: multi-word synonyms with multiple matches
You haven't given us the full details on how you are using the SynonymFilterFactory (expand true or false?), but in general: yes, the SynonymFilter finds the longest match it can. Sorry - doing expansion at index time:

<filter class="solr.SynonymFilterFactory" synonyms="title_synonyms.txt" ignoreCase="true" expand="true"/>

If every svp is also a vp, then being explicit in your synonyms (when doing index time expansion) should work:

vp,vice president
svp,senior vice president=>vp,svp,senior vice president

That worked - thanks!
Re: Implementing related tags
That does not seem to work. To further simplify the issue, assume there is a multi-valued tags field and the number of docs is 1 million. By constraining on a given tag, I need to search on the related tags. So Doc 1: tags: tagA, tagB, tagC, ball Doc 2: tags: tagA, bat Now constraining on tagA and searching for ba*, I need something like http://localhost:8983/solr/memoir/select?fq=tags:tagA&q=(tags%3Aba*) and to return just the related tags (not the docs where that tag is present). tagA may be present in 20K docs (of 1 million docs), but tagA might have 100 other related tags in total (i.e. those 100 tags appeared with tagA in at least 1 doc). So the search space (by constraining on tagA) is 100 and not 1 million. Hope that helps in explaining the issue better. Thanks!
Re: Implementing related tags
Faceting on tags will give you all the related tags, including the original tag (tagA in your case). You will have to filter out the original tag on the client side if you don't want to show that. With Solar 1.4, you will be able to use localParam to exclude the original tag in the results. If you tags field is analyzed, you will want to facet on a raw copy (using copy field) of the tags. If you want related tags that starts with ba, you can use a facet.prefix; q=tags:tagAfacet=truefacet.mincount=1facet.perfix=ab Bill On Mon, Jul 20, 2009 at 2:40 PM, Avlesh Singh avl...@gmail.com wrote: If I understood your problem correctly, faceting on tags field is what you need. Try this - http://localhost:8983/solr/ goog_1248106219337 memoir/select?fq=tag:tagAq=( goog_1248106219337 tags%3Aba*)facet=truefacet.field=tagsfacet.mincount=1 http://localhost:8983/solr/memoir/select?fq=tag:tagAq=%28tags%3Aba*%29facet=truefacet.field=tagsfacet.mincount=1 Notice the usage of facet parameters. Locate the facet_counts section in your response. If this is what you were looking for, then http://wiki.apache.org/solr/SimpleFacetParameters might be a good read. Cheers Avlesh On Mon, Jul 20, 2009 at 11:37 PM, James T codetester.codetes...@gmail.comwrote: That does not seem to work fine. To further simplify the issue, assuming there is a multi valued tag field and number of docs is 1 million. By constrainting on a given tag, I need to search on the related tags. 
So Doc 1: tags: tagA, tagB, tagC, ball Doc 2: tags: tagA, bat Now constrainting on tagA and searching for ba*, I need something like http://localhost:8983/solr/memoir/select?fq=tag:tagAq=(tags%3Aba*)http://localhost:8983/solr/memoir/select?fq=tag:tagAq=%28tags%3Aba*%29 http://localhost:8983/solr/memoir/select?fq=tag:tagAq=%28tags%3Aba*%29and just return the related tags ( not the docs where that tag is present ) tagA maybe present in 20K docs ( of 1 million docs), but tagA might have totally 100 other related tags ( i.e those 100 tags had appeared with tagA in atleast 1 doc ). So the search space ( by constrainting on tagA ) is 100 and not 1million. Hope that helps in explaining the issue better. Thanks! On Mon, Jul 20, 2009 at 9:51 PM, Avlesh Singh avl...@gmail.com wrote: Have a look at the MoreLikeThis component - http://wiki.apache.org/solr/MoreLikeThis Cheers Avlesh On Mon, Jul 20, 2009 at 8:05 PM, James T codetester.codetes...@gmail.com wrote: Hi, I have a specific requirement for searching and looking for some help from the community on how to achieve it using solr: I need to index 1million + documents. Each document contains ( among other fields ) 3 fields representing the category which that doc belongs to. For example ( a very simplied case to make it easier to explain ) Doc 1 Place : NY, Paris, Tokyo Authors: AuthorA, AuthorB, AuthorC, AuthorD Tags: tagA, tagB, ballon Doc 2 Place : Bangkok Authors: AuthorD Tags: tagZ So each doc can contain multiple values for each of above fields ( place, author, tags ) Now the searching requirements is that, by constrainting on one of the value, I need a search on related fields. 
Example: By giving a constraint Author: AuthorD, I need a search on the search space: Place: NY, Paris, Tokyo and London Author: AuthorA, AuthorB, AuthorC Tags: tagA, tagB and tagZ (The above result is generated by the fact that every item in the result has at least 1 doc in common with AuthorD.) So as I am typing Ba, I need to get Ballon and Bangkok (these Tags and Places have at least 1 doc where AuthorD also appears). Is such a system possible to implement using Solr? Thanks!
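To make the suggested facet.prefix approach concrete, here is a hedged sketch of building the related-tags request. The memoir core name, the tags field, and localhost:8983 are taken from the thread; rows=0 is an added assumption, since only the facet counts (not the matching docs) are wanted:

```shell
# Constrain to docs tagged tagA, return no docs (rows=0), and facet on
# the tags field with facet.prefix so only tags starting with "ba" come back.
SOLR="http://localhost:8983/solr/memoir"
PARAMS="q=*:*&fq=tags:tagA&rows=0"
FACETS="facet=true&facet.field=tags&facet.mincount=1&facet.prefix=ba"
URL="$SOLR/select?$PARAMS&$FACETS"
echo "$URL"
# curl "$URL"   # against a live instance; related tags appear under facet_counts
```

Against the sample docs above, this would count "ball" (it co-occurs with tagA in Doc 1) but skip "bat"-free tags of unrelated docs, since the fq restricts the facet counts to tagA's documents.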
index version on slave
If you ask for the index version of a slave instance, you always get a version number of 0. Is this expected behavior? I am using this url: http://slave_host:8983/solr/replication?command=indexversion This request returns the correct version on the master. If you use the 'details' command, you get the right version number (and the generation number, and more than what you asked for). Thanks, -- J
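The two replication commands from this thread, side by side (slave_host and the port are placeholders carried over from the original message); per the reply, only the details command reliably reports the slave's own version:

```shell
# 'indexversion' reports the replicateable index version -- 0 on a pure slave.
INDEXVERSION_URL="http://slave_host:8983/solr/replication?command=indexversion"

# 'details' reports the slave's actual index version, generation, and more.
DETAILS_URL="http://slave_host:8983/solr/replication?command=details"

echo "$INDEXVERSION_URL"
echo "$DETAILS_URL"
# curl "$DETAILS_URL"   # run against a live slave to see the full report
```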
Re: unable to run the solr in tomcat 5.0
try this: java -Durl=http://localhost:8080/solr/update -jar post.jar filename.xml it should work. HH uday kumar maddigatla wrote: hi you misunderstood my question. When I try to use the command java -jar post.jar *.*, it tries to POST the files to the Solr instance on port 8983. If we use Jetty then the default port number is 8983, but Tomcat uses 8080 as its port. If we use Jetty we can access Solr at http://localhost:8983/solr; if we use Tomcat we can access Solr at http://localhost:8080/solr. So if we use the above command (java -jar post.jar) it clearly shows this kind of message in the command prompt: C:\TestDocumets>java -jar post.jar *.* SimplePostTool: version 1.2 SimplePostTool: WARNING: Make sure your XML documents are encoded in UTF-8, other encodings are not currently supported SimplePostTool: POSTing files to http://localhost:8983/solr/update.. SimplePostTool: POSTing file OIO_INV_579814008_14118.xml SimplePostTool: FATAL: Connection error (is Solr running at http://localhost:8983/solr/update ?): java.net.ConnectException: Connection refused: connect It is trying to post the files to Solr running at http://localhost:8983/solr/update, but in my case Solr is running on port 8080, and I can't change my Tomcat port number just for Solr. Is there any other way to index the documents in Solr rather than the command-line utility? Michael Ludwig-4 wrote: uday kumar maddigatla schrieb: My intention is to use 8080 as port. Is there any other way that Solr will accept the files on port 8080? Solr doesn't post, it listens. Use the curl utility as indicated in the documentation. http://wiki.apache.org/solr/UpdateXmlMessages Michael Ludwig -- View this message in context: http://www.nabble.com/unable-to-run-the-solr-in-tomcat-5.0-tp23400759p24576184.html Sent from the Solr - User mailing list archive at Nabble.com.
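Summarizing the two workarounds from this thread (filename.xml is a placeholder; the curl form follows the UpdateXmlMessages wiki page referenced above). The sketch only prints the commands so it runs without a live server; paste them into a shell to execute them:

```shell
# Solr lives on Tomcat's port 8080 here, not Jetty's default 8983.
SOLR_UPDATE="http://localhost:8080/solr/update"

# Workaround 1: tell post.jar the real update URL via -Durl.
CMD1="java -Durl=$SOLR_UPDATE -jar post.jar filename.xml"

# Workaround 2: POST the update XML directly with curl, then commit.
CMD2="curl $SOLR_UPDATE --data-binary @filename.xml -H 'Content-Type: text/xml'"
CMD3="curl $SOLR_UPDATE --data-binary '<commit/>' -H 'Content-Type: text/xml'"

printf '%s\n' "$CMD1" "$CMD2" "$CMD3"
```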
Re: hierarchical faceting discussion
I was particularly surprised by the SOLR-64 numbers. What makes its response so huge (and thus slow) to return the entire tree of facet counts? Erik On Jul 19, 2009, at 5:35 PM, Erik Hatcher wrote: I've posted the details of some experiments I just did comparing/contrasting two approaches for faceting on documents within hierarchical structures: http://wiki.apache.org/solr/HierarchicalFaceting I'm sure I'm only scratching the surface with the current implementations of both SOLR-64 and SOLR-792. Alternative approaches are welcome! As I said on the wiki page, there won't be any single method that works in all cases - it will depend on how the hierarchical counts are needed. As an entire tree? (not likely in large taxonomic cases!) How are level-pruned counts needed? Implementation-wise, it seems like payloads could be useful for some use cases. What are the use cases? What types and sizes of hierarchies are folks dealing with out there in the real world? Erik
Re: Recommended Articles
I personally love this book: http://www.amazon.com/Building-Search-Applications-Lucene-LingPipe/dp/0615204252 It intermixes search with analysis: sentiment, named entity recognition, NLP pipelines and so on... There's a little Nutch cameo too... On Mon, Jul 20, 2009 at 4:56 PM, Mark Miller markrmil...@gmail.com wrote: dar...@ontrenet.com wrote: http://www.amazon.com/s/ref=nb_ss_gw?url=search-alias%3Dstripbooks&field-keywords=search&x=0&y=0 Does anyone have links or books for recommended reading on search in general? I would like to see some literature on larger search concepts and ideas. -- Jeff Newburn Software Engineer, Zappos.com jnewb...@zappos.com - 702-943-7562 Check out: http://wiki.apache.org/lucene-java/InformationRetrieval Some good stuff there, though I don't think it is often updated. My favorite is this free gem: http://www-csli.stanford.edu/~hinrich/information-retrieval-book.html -- - Mark http://www.lucidimagination.com -- “I may not believe in myself, but I believe in what I'm doing.” -- Jimmy Page
Re: Obtaining SOLR index size on disk
Actually, if you have a server enabled as a replication master, the stats.jsp page reports the index size, so that information is available in some cases. -Peter On Sat, Jul 18, 2009 at 8:14 AM, Erik Hatcher e...@ehatchersolutions.com wrote: On Jul 17, 2009, at 8:45 PM, J G wrote: Is it possible to obtain the SOLR index size on disk through the SOLR API? I've read through the docs and mailing list questions but can't seem to find the answer. No, but it'd be a great addition to the /admin/system handler, which returns lots of other useful trivia like free memory, ulimit, uptime, and such. Erik -- Peter M. Wolanin, Ph.D. Momentum Specialist, Acquia, Inc. peter.wola...@acquia.com
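Until the /admin/system handler grows that feature, a couple of hedged workarounds (the paths are assumptions; data/index is only Solr's default layout): measure the index directory on disk, or ask the replication handler for its details report on a master:

```shell
# Workaround 1: measure the index directory directly on the server.
# SOLR_HOME and data/index are the default layout; adjust to your install.
SOLR_HOME="${SOLR_HOME:-/opt/solr/example/solr}"
INDEX_DIR="$SOLR_HOME/data/index"
du -sh "$INDEX_DIR" 2>/dev/null || echo "no index at $INDEX_DIR"

# Workaround 2: on a replication master, the details report includes index size.
DETAILS_URL="http://localhost:8983/solr/replication?command=details"
echo "$DETAILS_URL"
# curl "$DETAILS_URL"
```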
Re: Exception searching PhoneticFilterFactory field with number
Robert, Can you narrow things down by simplifying the query? For example, I see allDoublemetaphone:2226, which looks suspicious in the "give me the phonetic version of this input" sense, but if you could narrow it down, we would probably be able to help more. Otis -- Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch - Original Message From: Robert Petersen rober...@buy.com To: solr-user@lucene.apache.org Sent: Monday, July 20, 2009 12:11:38 PM Subject: Exception searching PhoneticFilterFactory field with number Reposting in hopes of an answer... Hello all, I am getting the following exception whenever a user includes a numeric term in their search and the search includes a field defined with a PhoneticFilterFactory; it occurs whether I use the DoubleMetaphone encoder or any other. Has this ever come up before? I can replicate this with no data in the index at all, but if I search the field by hand from the Solr web interface there is no exception. I am running the Lucid Imagination 1.3 certified release in a multicore master/slaves configuration. I will include the field def and the search/exception below; let me know if I can include any more clues... it seems like it's trying to make a field with no name/value:

positionIncrementGap=100
class=solr.WhitespaceTokenizerFactory/
synonyms=index_synonyms.txt ignoreCase=true expand=false/
ignoreCase=true words=stopwords.txt/
protected=protwords.txt/
class=solr.RemoveDuplicatesTokenFilterFactory/
encoder=DoubleMetaphone inject=false/
class=solr.WhitespaceTokenizerFactory/
synonyms=query_synonyms.txt ignoreCase=true expand=true/
ignoreCase=true words=stopwords.txt/
protected=protwords.txt/
class=solr.RemoveDuplicatesTokenFilterFactory/
encoder=DoubleMetaphone inject=false/

Jul 17, 2009 2:42:18 PM org.apache.solr.core.SolrCore execute INFO: [10017] webapp=/solr path=/select/ params={f.partitionId.facet.limit=10&f.categoryId.facet.missing=false&f.
categoryId.facet.zeros=false&facet=true&facet=true&facet=true&facet=true&facet=true&facet=true&f.taxonomyCategoryId.facet.limit=-1&f.priceBucketid.facet.limit=-1&f.partitionId.facet.zeros=false&f.categoryId.facet.sort=true&f.categoryId.facet.limit=-1&f.marketplaceIds.facet.limit=10&f.mfgId.facet.missing=false&f.priceBucketid.facet.zeros=false&debugQuery=true&f.priceBucketid.facet.sort=true&f.partitionId.facet.missing=false&f.taxonomyCategoryId.facet.zeros=false&f.priceBucketid.facet.missing=false&facet.field=categoryId&facet.field=taxonomyCategoryId&facet.field=partitionId&facet.field=mfgId&facet.field=marketplaceIds&facet.field=priceBucketid&f.mfgId.facet.zeros=false&f.taxonomyCategoryId.facet.sort=true&f.marketplaceIds.facet.missing=false&rows=48&f.partitionId.facet.sort=true&start=0&q=(sku:va+AND+sku:2226+AND+sku:w))+OR+((upc:va+AND+upc:2226+AND+upc:w))+OR+((mfgPartNo:va+AND+mfgPartNo:2226+AND+mfgPartNo:w))+OR+((title_en_uk:va+AND+title_en_uk:2226+AND+title_en_uk:w))^8+OR+((moreWords_en_uk:va+AND+moreWords_en_uk:2226+AND+moreWords_en_uk:w))^2+OR+((allDoublemetaphone:va+AND+allDoublemetaphone:2226+AND+allDoublemetaphone:w))^0.5)+AND+((_val_:sum\(product\(boosted,30\),product\(sales,1000\),product\(views,10\),product\(image,100\)\)&f.taxonomyCategoryId.facet.missing=false&f.mfgId.facet.limit=10&f.marketplaceIds.facet.sort=true&f.marketplaceIds.facet.zeros=false&f.mfgId.facet.sort=true} hits=0 status=500 QTime=84
Jul 17, 2009 2:42:18 PM org.apache.solr.common.SolrException log
SEVERE: java.lang.RuntimeException: java.lang.IllegalArgumentException: name and value cannot both be empty
at org.apache.solr.search.QueryParsing.toString(QueryParsing.java:470)
at org.apache.solr.util.SolrPluginUtils.doStandardDebug(SolrPluginUtils.java:399)
at org.apache.solr.handler.component.DebugComponent.process(DebugComponent.java:54)
at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:177)
at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:1205)
at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:303)
at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:232)
at
Re: method inform of SolrCoreAware called 2 times
it is not normal to get the inform() called twice for a single object. Which version of Solr are you using? On Mon, Jul 20, 2009 at 7:17 PM, Marc Sturlese marc.sturl...@gmail.com wrote: Hey there, I have implemented a custom component which extends SearchComponent and implements SolrCoreAware. I have declared it in solrconfig.xml as: <searchComponent name="mycomp" class="solr.MyCustomComponent"/> And added it to my SearchHandler as: <arr name="last-components"><str>mycomp</str></arr> I am using multicore with two cores. I have noticed (doing some logging) that the inform method (the one required by SolrCoreAware) is being called 2 times per core when I start my Solr instance. As I understood it, the SolrCoreAware inform method should be called just once per core. Am I right, or is it normal that it is called 2 times per core? -- View this message in context: http://www.nabble.com/method-inform-of-SolrCoreAware-callled-2-times-tp24570221p24570221.html Sent from the Solr - User mailing list archive at Nabble.com. -- - Noble Paul | Principal Engineer| AOL | http://aol.com
DocList Pagination
Hi, I am trying to get the next DocList page in my custom search component. Could I get a code example of this? Cheers. -- View this message in context: http://www.nabble.com/DocList-Pagination-tp24581850p24581850.html Sent from the Solr - User mailing list archive at Nabble.com.