Re: Solr with Auto-suggest
Hi Ryan,

I have gone through your post https://issues.apache.org/jira/browse/SOLR-357 where you mention a prefix filter. Can you tell me how to use that patch? You mentioned using the configuration below:

<fieldType name="prefix_full" class="solr.TextField" positionIncrementGap="1">
  <analyzer type="index">
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EdgeNGramFilterFactory" minGramSize="1" maxGramSize="20"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

<fieldType name="prefix_token" class="solr.TextField" positionIncrementGap="1">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EdgeNGramFilterFactory" minGramSize="1" maxGramSize="20"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
...
<field name="prefix1" type="prefix_full" indexed="true" stored="false"/>
<field name="prefix2" type="prefix_token" indexed="true" stored="false"/>
...
<copyField source="name" dest="prefix1"/>
<copyField source="name" dest="prefix2"/>

For the configuration above, are you using EdgeNGramFilterFactory or PrefixingFilterFactory? Or does it work with EdgeNGramFilterFactory alone? I am not clear about that: can I use the above configuration without applying the PrefixingFilterFactory patch?

And next: is the "name" field used in the copyField of text type or string type?

Waiting for your reply,

Regards,
Rekha
Highlighting not working on a prefix_token field
I have a prefix_token field defined as below in my schema.xml:

<fieldType name="prefix_token" class="solr.TextField" positionIncrementGap="1">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EdgeNGramFilterFactory" minGramSize="1" maxGramSize="20"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

Searches on the field work fine and as expected. However, attempts to highlight on this field do not yield any results. Highlighting on other fields works fine. Any clues? I am using Solr 1.3.

Cheers
Avlesh
Re: Highlighting not working on a prefix_token field
On Wed, Sep 23, 2009 at 12:23 PM, Avlesh Singh avl...@gmail.com wrote:
> I have a prefix_token field defined as underneath in my schema.xml
> [prefix_token fieldType definition snipped]
> Searches on the field work fine and as expected. However, attempts to
> highlight on this field do not yield any results. Highlighting on other
> fields works fine.

Won't work until SOLR-1268 comes along.
http://www.lucidimagination.com/search/document/4da480fe3eb0e7e4/highlighting_in_stemmed_or_n_grammed_fields_possible

--
Regards,
Shalin Shekhar Mangar.
Re: Highlighting not working on a prefix_token field
Hmmm... But n-grams with KeywordTokenizerFactory instead of the WhitespaceTokenizerFactory work just fine. Related issues?

Cheers
Avlesh

On Wed, Sep 23, 2009 at 12:27 PM, Shalin Shekhar Mangar shalinman...@gmail.com wrote:
> On Wed, Sep 23, 2009 at 12:23 PM, Avlesh Singh wrote:
> > [prefix_token fieldType definition and question snipped]
>
> Won't work until SOLR-1268 comes along.
> http://www.lucidimagination.com/search/document/4da480fe3eb0e7e4/highlighting_in_stemmed_or_n_grammed_fields_possible
Re: Solr with Auto-suggest
On Wed, Sep 23, 2009 at 11:30 AM, dharhsana rekha.dharsh...@gmail.com wrote:
> Hi Ryan,
> I have gone through your post https://issues.apache.org/jira/browse/SOLR-357
> where you mention a prefix filter. Can you tell me how to use that patch?
> [prefix_full and prefix_token fieldType definitions snipped]
> For the configuration above, are you using EdgeNGramFilterFactory or
> PrefixingFilterFactory? Can I use the above configuration without applying
> the PrefixingFilterFactory patch?

There is no such thing in Solr as a PrefixingFilterFactory. Use EdgeNGramFilterFactory.

> And next: is the "name" field used in the copyField of text type or string type?

"name" was a field in his schema. Whatever fields' values you want for auto-suggest, copy them over to the field.

--
Regards,
Shalin Shekhar Mangar.
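What EdgeNGramFilterFactory does at index time can be illustrated with a small Python sketch (an illustration only, not Solr's implementation): lowercase the token, then emit every prefix between minGramSize and maxGramSize. The query-side chain only lowercases, so a partial query matches an indexed gram exactly.

```python
def edge_ngrams(token, min_gram=1, max_gram=20):
    # Sketch of the index-time chain: LowerCaseFilterFactory first,
    # then EdgeNGramFilterFactory emitting every prefix from
    # min_gram to max_gram characters.
    token = token.lower()
    return [token[:n] for n in range(min_gram, min(max_gram, len(token)) + 1)]

# Index time: "Solr" is stored as all of its prefixes.
indexed = edge_ngrams("Solr")   # ['s', 'so', 'sol', 'solr']

# Query time: the analyzer only lowercases, so the user's partial
# input "So" becomes the single token "so", which matches the indexed
# gram "so" exactly -- that is what makes prefix auto-suggest work.
assert "So".lower() in indexed
```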
Re: Highlighting not working on a prefix_token field
On Wed, Sep 23, 2009 at 12:31 PM, Avlesh Singh avl...@gmail.com wrote:
> Hmmm... But n-grams with KeywordTokenizerFactory instead of the
> WhitespaceTokenizerFactory work just fine. Related issues?

I'm sorry, I don't understand the question. Do you mean to say that highlighting works with one but not with the other?

--
Regards,
Shalin Shekhar Mangar.
Re: Highlighting not working on a prefix_token field
> I'm sorry, I don't understand the question. Do you mean to say that
> highlighting works with one but not with the other?

Yes.

Cheers
Avlesh

On Wed, Sep 23, 2009 at 12:59 PM, Shalin Shekhar Mangar shalinman...@gmail.com wrote:
> [earlier thread snipped]
Finding near duplicates while searching Documents
Hi,
When we have news content crawled, we face a problem of the same content being repeated in many documents. We want to add a near-duplicate document filter to detect such documents. Is there a way to do that in Solr?
Regards,
Ninad Raut.
Re: Finding near duplicates while searching Documents
On Wed, Sep 23, 2009 at 3:14 PM, Ninad Raut hbase.user.ni...@gmail.com wrote:
> When we have news content crawled, we face a problem of the same content
> being repeated in many documents. We want to add a near-duplicate document
> filter to detect such documents. Is there a way to do that in Solr?

Look at http://wiki.apache.org/solr/Deduplication

--
Regards,
Shalin Shekhar Mangar.
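The Deduplication wiki page describes signature-based duplicate detection (SignatureUpdateProcessorFactory with, e.g., TextProfileSignature for near duplicates). As a rough Python sketch of the near-duplicate idea — an illustration only, not Solr's actual code, and with parameters that are purely assumptions — a fuzzy signature can be built by quantizing token frequencies and hashing the profile:

```python
import hashlib
from collections import Counter

def fuzzy_signature(text, min_token_len=2, quant_rate=0.01):
    # Quantize token counts so that small wording differences vanish,
    # then hash the profile. The parameters are illustrative, not
    # Solr's actual defaults.
    tokens = [t.lower() for t in text.split() if len(t) >= min_token_len]
    counts = Counter(tokens)
    max_freq = max(counts.values(), default=1)
    quant = max(1, int(max_freq * quant_rate))  # quantization step
    profile = sorted((tok, c // quant) for tok, c in counts.items())
    return hashlib.md5(repr(profile).encode()).hexdigest()

# The same words in a different order produce the same signature, so a
# re-ordered copy of a crawled article collides with the original.
```

In Solr itself this would be configured in solrconfig.xml as an update processor; the sketch only shows why near-identical texts can share one signature field for filtering.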
Phrase stopwords
Hi,
Is it possible to have a phrase as a stopword in Solr? If so, please share how to do it.
regards,
Pooja
Re: Finding near duplicates while searching Documents
Is this feature included in Solr 1.4?

On Wed, Sep 23, 2009 at 3:29 PM, Shalin Shekhar Mangar shalinman...@gmail.com wrote:
> Look at http://wiki.apache.org/solr/Deduplication
RE: Oracle incomplete DataImport results
After investigating the log files, the DataImporter was throwing an error from the Oracle DB driver:

java.sql.SQLException: ORA-22835: Buffer too small for CLOB to CHAR or BLOB to RAW conversion (actual: 2890, maximum: 2000)

i.e. there was a problem with the 551st item, where a related item had a text field of type CLOB that was too long and was therefore causing a problem when using the function TO_NCHAR to fix the type.

FIX: Used the Oracle function dbms_lob.substr(FIELD_NAME, MAX_LENGTH, 1) to just trim the string (this also applies an implicit conversion).

-----Original Message-----
From: Shalin Shekhar Mangar [mailto:shalinman...@gmail.com]
Sent: 22 September 2009 19:27
To: solr-user@lucene.apache.org
Subject: Re: Oracle incomplete DataImport results

On Tue, Sep 22, 2009 at 10:53 PM, Daniel Bradley daniel.brad...@adfero.co.uk wrote:
> I appear to be getting only a small number of items imported into Solr when
> doing a full-import against an Oracle data provider. The query I'm running
> is approximately similar to:
>
> SELECT ID, dbms_lob.substr(Text, 4000, 1) Text, Date, LastModified, Type,
>        Created, Available, Parent, Title
> FROM TheTableName
> WHERE Available < CURRENT_DATE
>   AND Available > add_months(current_date, -1)
>
> This retrieves the last month's items from the database (the
> dbms_lob.substr function is used to avoid Solr simply indexing the object
> name, as Text is the Oracle CLOB type). When running this in Oracle SQL
> Developer approximately 5600 rows are returned; however, running a full
> import only imports approximately 550 items. There's no visible memory use
> and no exceptions suggesting any problems with lack of memory. Is there any
> limiting of the number of items you can import in a single request? Any
> other thoughts on this problem would be much appreciated.

What is the uniqueKey in schema.xml? Is it possible that many of those 5600 rows share the same value for Solr's uniqueKey field? There are no limits on the number of items you can import.
The number of documents created should correspond to the number of rows returned by the root level entity's query (assuming the uniqueKey for each of those documents is actually unique).

--
Regards,
Shalin Shekhar Mangar.
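The same guard Daniel applied in SQL can also be sketched on the client side before values are handed to Solr. This is a hypothetical helper, not part of any Solr API, and it assumes the 2000-character buffer limit reported in the ORA-22835 message:

```python
def clamp_clob(value, max_chars=2000):
    # Client-side analogue of dbms_lob.substr(FIELD_NAME, 2000, 1):
    # keep CLOB text under the CHAR-conversion buffer limit that
    # ORA-22835 complains about (actual: 2890, maximum: 2000).
    return value if len(value) <= max_chars else value[:max_chars]
```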
Re: Finding near duplicates while searching Documents
On Wed, Sep 23, 2009 at 3:50 PM, Ninad Raut hbase.user.ni...@gmail.com wrote:
> Is this feature included in Solr 1.4?

Yep.

--
Regards,
Shalin Shekhar Mangar.
Re: Oracle incomplete DataImport results
On Wed, Sep 23, 2009 at 3:53 PM, Daniel Bradley daniel.brad...@adfero.co.uk wrote:
> After investigating the log files, the DataImporter was throwing an error
> from the Oracle DB driver:
> java.sql.SQLException: ORA-22835: Buffer too small for CLOB to CHAR or BLOB
> to RAW conversion (actual: 2890, maximum: 2000)
> FIX: Used the Oracle function dbms_lob.substr(FIELD_NAME, MAX_LENGTH, 1) to
> just trim the string (this also applies an implicit conversion).

Phew, tricky one! Thanks for bringing closure.

--
Regards,
Shalin Shekhar Mangar.
Exact match
Hi,
I am doing exact search in Solr. In the Solr admin page I give the search input string. For example, if I give "channeL12" as the search input string on the Solr home page, it displays search results such as:

<doc>
  <str name="url">http://rediff</str>
  <str name="title">first</str>
  <str name="description">channeL12</str>
</doc>

as there is a matching document for "channeL12". If I give "channel12" as the search input string, with the L in lower case, I do not get any search results. In fact I set ignoreCase="true" in schema.xml:

<analyzer type="query">
  <tokenizer class="solr.WhitespaceTokenizerFactory" ignoreCase="true"/>
</analyzer>

I want my search to be case-insensitive. Please let me know if I need to make changes anywhere else, or what to do to achieve the desired output.

Regards
Bhaskar
Re: Exact match
> [Bhaskar's question about making "channeL12" match "channel12" snipped]

First of all, WhitespaceTokenizerFactory does not have an ignoreCase parameter. You need to add <filter class="solr.LowerCaseFilterFactory"/> to both your query and index analyzers:

<analyzer type="query">
  <tokenizer class="solr.WhitespaceTokenizerFactory"/>
  <filter class="solr.LowerCaseFilterFactory"/>
</analyzer>

But I think it will be more convenient for you to use the same analyzer for index and query.

Hope this helps.
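Why the lowercase filter has to appear in both analyzer chains can be seen in a tiny Python sketch (an illustration, not Solr code): if index-time analysis lowercases but query-time analysis does not, or vice versa, the tokens never line up.

```python
def analyze(text, lowercase):
    # Whitespace tokenizer followed by an optional lowercase filter.
    tokens = text.split()
    return [t.lower() for t in tokens] if lowercase else tokens

indexed = analyze("channeL12", lowercase=True)       # ['channel12']
# A query analyzed the same way matches the indexed token exactly;
# skip the lowercasing on either side and the match is lost.
assert analyze("channel12", lowercase=True) == indexed
assert analyze("channeL12", lowercase=False) != indexed
```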
about url field error
Hello,
I am a newbie with Solr. I have Solr running on Tomcat 6 and all is OK, but when I add data to the Solr server via HTTP POST it causes an error. Below is the code:

SolrInputDocument solrdoc = new SolrInputDocument();
solrdoc.addField("url", request.getParameter("URL"));

2009-9-23 21:18:03 org.apache.solr.update.processor.LogUpdateProcessor finish
info: {add=[http://www.yahoo.com]} 0 1
Invalid version or the data in not in 'javabin' format

The schema.xml content is:

<field name="url" type="url" stored="true" indexed="true" required="true"/>

How do I solve this error? Thank you very much.
Re: Phrase stopwords
> From: Pooja Verlani pooja.verl...@gmail.com
> Subject: Phrase stopwords
> Date: Wednesday, September 23, 2009, 1:15 PM
> Is it possible to have a phrase as a stopword in Solr? If so, please share
> how to do it.

I think that can be implemented using SynonymFilterFactory and StopFilterFactory together:

<filter class="solr.SynonymFilterFactory" synonyms="syn.txt" ignoreCase="true" expand="false"/>
<filter class="solr.StopFilterFactory" words="stopwords.txt" ignoreCase="true"/>

syn.txt will contain lines:

phrase as a stopword => somestupidtoken
phrase stopword => somestupidtoken
three words stopword => somestupidtoken

stopwords.txt will contain the line:

somestupidtoken

IMO it will work since SynonymFilterFactory can handle multi-word synonyms like "a b c d => foo". With expand="false", you can use this filter to reduce your multi-word stopwords to a single token (one that has a low probability of occurring in your documents). Then remove this single token with StopFilter. This combination will remove the multi-word entries in your syn.txt.

Hope this helps.
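The net effect of the synonym-then-stop trick can be sketched in Python (an illustration of the token-stream behaviour, not Solr's implementation): match multi-word stopword phrases in the stream and drop them, which is what mapping each phrase to somestupidtoken and then stopping that token achieves.

```python
def remove_phrase_stopwords(tokens, phrase_stopwords):
    # Greedily match multi-word stopword phrases against the token
    # stream and skip them; everything else passes through.
    out, i = [], 0
    while i < len(tokens):
        for phrase in phrase_stopwords:
            n = len(phrase)
            if tokens[i:i + n] == phrase:
                i += n  # phrase matched: drop it, like StopFilter would
                break
        else:
            out.append(tokens[i])
            i += 1
    return out

tokens = "give me the latest news please".split()
stop_phrases = ["the latest".split(), "please".split()]
remove_phrase_stopwords(tokens, stop_phrases)  # ['give', 'me', 'news']
```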
RE: Parallel requests to Tomcat
For 8-CPU load-stress testing of Tomcat you are probably making a mistake: you should start the load-stress software and wait 5-30 minutes (depending on index size) BEFORE taking measurements, because:

1. The JVM HotSpot compiler needs to compile everything into native code.
2. The Tomcat thread pool needs to warm up.
3. The SOLR caches need to warm up(!)

Etc. Also, 8 parallel requests are too few for a default Tomcat; it uses 150 threads (the default for older versions) and the new concurrency package from Java 5.

You should not test manually; use software such as The Grinder. Also note: there is a difference between mean time and response time, and between average (successful) requests per second and average response time... "Tomcat is serializing the requests" doesn't mean anything for performance; yes, it has a dedicated listener on a dedicated port dispatching requests to worker threads... and the LAN NIC card serializes everything too...

Fuad Efendi
http://www.linkedin.com/in/liferay

-----Original Message-----
From: Michael [mailto:solrco...@gmail.com]
Sent: September-22-09 4:04 PM
To: solr-user@lucene.apache.org
Subject: Parallel requests to Tomcat

Hi,
I have a Solr+Tomcat installation on an 8 CPU Linux box, and I just tried sending parallel requests to it and measuring response time. I would expect that it could handle up to 8 parallel requests without significant slowdown of any individual request. Instead, I found that Tomcat is serializing the requests. For example, the response time for each of 2 parallel requests is nearly 2 times that for a single request, and the time for each of 8 parallel requests is about 4 times that of a single request.

I am pretty sure this is a Tomcat issue, for when I started 8 identical instances of Solr+Tomcat on the machine (on 8 different ports), I could send one request to each in parallel with only a 20% slowdown (compared to 300% in a single Tomcat).

I'm using the stock Tomcat download with minimal configuration changes, except that I disabled all logging (in case the logger was blocking for each request, serializing them). I'm giving 2G RAM to each JVM. Does anyone more familiar with Tomcat know what's wrong? I can't imagine that Tomcat really can't handle parallel requests.
Re: Parallel requests to Tomcat
I'm using a Solr 1.4 nightly from around July. Is that recent enough to have the improved reader implementation?

I'm not sure whether you'd call my operations IO-heavy -- each query has so many terms (~50) that even against a 45K-document index a query takes 130ms, but the entire index is in a ramfs.

- Michael

On Tue, Sep 22, 2009 at 8:08 PM, Yonik Seeley yo...@lucidimagination.com wrote:
> What version of Solr are you using? Solr 1.3 and Lucene 2.4 defaulted to an
> index reader implementation that had to synchronize, so search operations
> that are IO heavy can't proceed in parallel. You shouldn't see this with 1.4
> [original message snipped]
> -Yonik
> http://www.lucidimagination.com
RE: Parallel requests to Tomcat
I have 0-15ms for 50M (million) docs on Tomcat, 8 CPUs: http://www.tokenizer.org
== something is obviously wrong in your case; 130ms is too high. Is it a dedicated server? Disk swapping? Etc.

-----Original Message-----
From: Michael [mailto:solrco...@gmail.com]
Sent: September-23-09 11:17 AM
To: solr-user@lucene.apache.org; yo...@lucidimagination.com
Subject: Re: Parallel requests to Tomcat

> I'm using a Solr 1.4 nightly from around July. Is that recent enough to
> have the improved reader implementation? I'm not sure whether you'd call my
> operations IO-heavy -- each query has so many terms (~50) that even against
> a 45K-document index a query takes 130ms, but the entire index is in a
> ramfs.
> [rest of thread snipped]
Re: Parallel requests to Tomcat
Hi Fuad, thanks for the reply.

My queries are heavy enough that the difference in performance is obvious. I am using a home-grown load-testing script that sends 1000 realistic queries to the server and takes the average response time. My index is on a ramfs, which I've shown makes the queryResult and document caches unnecessary; I warm up the filter and fieldValue caches before beginning the test. There's no appreciable difference between query times at the beginning, middle, or end of the test, so I can't blame HotSpot or the Tomcat thread pool for not being warmed up.

The queries I'm using are complex enough that they take a long time to run. 8 queries against 1 Tomcat average 600ms per query, while 8 queries against 8 Tomcats average 190ms per query (on a dedicated 8-CPU server with 32G RAM). I don't see how to interpret these numbers except that Tomcat is not multithreading as well as it should :)

Your thoughts?
Michael

On Wed, Sep 23, 2009 at 10:48 AM, Fuad Efendi f...@efendi.ca wrote:
> For 8-CPU load-stress testing of Tomcat you are probably making a mistake:
> you should start the load-stress software and wait 5-30 minutes (depending
> on index size) BEFORE taking measurements.
> [rest of message snipped]
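For anyone reproducing this kind of comparison, here is a hedged Python sketch of the measurement loop; the target function is a placeholder (in a real test it would be an HTTP request to each Tomcat's select handler), and the script above is home-grown, so this only mirrors the general shape:

```python
import time
from concurrent.futures import ThreadPoolExecutor

def timed(fn):
    # Wall-clock latency of one request.
    t0 = time.perf_counter()
    fn()
    return time.perf_counter() - t0

def mean_parallel_latency(fn, parallel, repeats=5):
    # Fire `parallel` identical requests at once, several times, and
    # average the per-request latency -- the 600ms-vs-190ms comparison
    # above is this number for 8 requests against 1 vs. 8 Tomcats.
    samples = []
    with ThreadPoolExecutor(max_workers=parallel) as pool:
        for _ in range(repeats):
            samples.extend(pool.map(lambda _: timed(fn), range(parallel)))
    return sum(samples) / len(samples)
```

On an ideal 8-CPU box, mean_parallel_latency(fn, 8) should be close to mean_parallel_latency(fn, 1); a large gap points at serialization somewhere in the stack.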
RE: Parallel requests to Tomcat
Correction: 0-150ms (depending on the size of the query results; 150ms for non-cached (new) queries returning more than 50K docs).

-----Original Message-----
From: Fuad Efendi [mailto:f...@efendi.ca]
Sent: September-23-09 11:26 AM
To: solr-user@lucene.apache.org
Subject: RE: Parallel requests to Tomcat

> I have 0-15ms for 50M (million) docs on Tomcat, 8 CPUs:
> http://www.tokenizer.org
> == something is obviously wrong in your case; 130ms is too high. Is it a
> dedicated server? Disk swapping? Etc.
> [rest of thread snipped]
Re: Parallel requests to Tomcat
On Wed, Sep 23, 2009 at 11:26 AM, Fuad Efendi f...@efendi.ca wrote: - something obviously wrong in your case, 130ms is too high. Is it dedicated server? Disk swapping? Etc. It's that my queries are ridiculously complex. My users are very familiar with boolean searching, and I'm doing a lot of processing outside of Solr that increases the query size by something like 50x. I'm OK with the individual query time -- I can always shave terms off if I must. It's the difference between 1 Tomcat and 8 Tomcats that is the problem: I'd like to be able to harness all 8 CPUs! While my test corpus is 45K docs, my actual corpus will be 30MM, and so I'd like to get all the performance I can out of my box. Michael
RE: Parallel requests to Tomcat
> 8 queries against 1 Tomcat average 600ms per query, while 8 queries against
> 8 Tomcats average 190ms per query (on a dedicated 8-CPU server with 32G
> RAM). I don't see how to interpret these numbers except that Tomcat is not
> multithreading as well as it should :)

Hi Michael, I think it is very natural; 8 single processes not sharing anything are faster than 8 threads sharing something. However, 600ms is too high.

> My index is on a ramfs which I've shown makes the QR and doc caches
> unnecessary;

However, SOLR is faster than pure Lucene; try the SOLR caches! (I am not sure about the current version of SOLR/Lucene, but Lucene always used a synchronized isDeleted() method, which causes 'serialization'.)
Re: Parallel requests to Tomcat
On Wed, Sep 23, 2009 at 11:17 AM, Michael solrco...@gmail.com wrote:
> I'm using a Solr 1.4 nightly from around July. Is that recent enough to
> have the improved reader implementation? I'm not sure whether you'd call my
> operations IO-heavy -- each query has so many terms (~50) that even against
> a 45K-document index a query takes 130ms, but the entire index is in a
> ramfs.

This could well be IO bound - lots of seeks and reads. Perhaps try on a normal filesystem and see if you still see the serialization - someone recently saw some funny results with tmpfs and Lucene, so it would be good to rule that out. If you want to try and rule out Tomcat, throw the webapp in Jetty.

-Yonik
http://www.lucidimagination.com
Re: Parallel requests to Tomcat
Hi Fuad,

On Wed, Sep 23, 2009 at 11:37 AM, Fuad Efendi f...@efendi.ca wrote:
> I think it is very natural; 8 single processes not sharing anything are
> faster than 8 threads sharing something.

8 threads sharing something may have *some* overhead versus 8 processes, but as you say, 410ms overhead points to a different problem.

> However, SOLR is faster than pure Lucene, try SOLR caches!

I have. In a separate test, I verified that the caches that save disk I/O (QR and doc) make no difference to query time, because my index is on a ramfs. The caches that save CPU cycles (filter and fieldValue, because I'm doing heavy faceting) DO help and I do have them turned on.

Michael
Re: Parallel requests to Tomcat
Hi Yonik, On Wed, Sep 23, 2009 at 11:42 AM, Yonik Seeley yo...@lucidimagination.com wrote: This could well be IO bound - lots of seeks and reads. If this were IO bound, wouldn't I see the same results when sending my 8 requests to 8 Tomcats? There's only one disk (well, RAM) whether I'm querying 8 processes or 8 threads in 1 process, right? Michael
Re: Parallel requests to Tomcat
This sure seems like a good time to try LucidGaze for Solr. That would give some Solr-specific profiling data. http://www.lucidimagination.com/Downloads/LucidGaze-for-Solr wunder On Sep 23, 2009, at 8:47 AM, Michael wrote: Hi Yonik, On Wed, Sep 23, 2009 at 11:42 AM, Yonik Seeley yo...@lucidimagination.com wrote: This could well be IO bound - lots of seeks and reads. If this were IO bound, wouldn't I see the same results when sending my 8 requests to 8 Tomcats? There's only one disk (well, RAM) whether I'm querying 8 processes or 8 threads in 1 process, right? Michael
Re: Parallel requests to Tomcat
Thanks for the suggestion, Walter! I've been using Gaze 1.0 for a while now, but when I moved to a multicore approach (which was the impetus behind all of this testing) Gaze failed to start and I had to comment it out of solrconfig.xml to get Solr to start. Are you aware whether Gaze is able to work in a multicore environment? Michael On Wed, Sep 23, 2009 at 11:55 AM, Walter Underwood wun...@wunderwood.org wrote: This sure seems like a good time to try LucidGaze for Solr. That would give some Solr-specific profiling data. http://www.lucidimagination.com/Downloads/LucidGaze-for-Solr wunder On Sep 23, 2009, at 8:47 AM, Michael wrote: Hi Yonik, On Wed, Sep 23, 2009 at 11:42 AM, Yonik Seeley yo...@lucidimagination.com wrote: This could well be IO bound - lots of seeks and reads. If this were IO bound, wouldn't I see the same results when sending my 8 requests to 8 Tomcats? There's only one disk (well, RAM) whether I'm querying 8 processes or 8 threads in 1 process, right? Michael
Re: Parallel requests to Tomcat
On Wed, Sep 23, 2009 at 11:47 AM, Michael solrco...@gmail.com wrote: Hi Yonik, On Wed, Sep 23, 2009 at 11:42 AM, Yonik Seeley yo...@lucidimagination.com wrote: This could well be IO bound - lots of seeks and reads. If this were IO bound, wouldn't I see the same results when sending my 8 requests to 8 Tomcats? There's only one disk (well, RAM) whether I'm querying 8 processes or 8 threads in 1 process, right? Right - I was thinking IO bound at the Lucene Directory level - which was synchronized in the past and led to poor concurrency. But your Solr version is recent enough to use the newer unsynchronized method by default (on non-windows) -Yonik http://www.lucidimagination.com
Re: Parallel requests to Tomcat
On Wed, Sep 23, 2009 at 12:05 PM, Yonik Seeley yo...@lucidimagination.com wrote: On Wed, Sep 23, 2009 at 11:47 AM, Michael solrco...@gmail.com wrote: If this were IO bound, wouldn't I see the same results when sending my 8 requests to 8 Tomcats? There's only one disk (well, RAM) whether I'm querying 8 processes or 8 threads in 1 process, right? Right - I was thinking IO bound at the Lucene Directory level - which was synchronized in the past and led to poor concurrency. But your Solr version is recent enough to use the newer unsynchronized method by default (on non-windows) Ah, OK. So it looks like comparing to Jetty is my only next step. Although I'm not sure what I'm going to do based on the result of that test -- if Jetty behaves differently, then I still don't know why the heck Tomcat is behaving badly! :) Michael
RE: Parallel requests to Tomcat
8 threads sharing something may have *some* overhead versus 8 processes, but as you say, 410ms overhead points to a different problem. - You have a baseline (a single-threaded load-stress script sending requests to SOLR: 1 request in parallel, 8 requests to 8 Tomcats); 200ms still looks extremely high... at least if you are not GETting more than the top-1 doc in a single query (instead of the default top-10).
RE: Parallel requests to Tomcat
I'm not sure whether you'd call my operations IO heavy -- each query has so many terms (~50) that even against a 45K document index a query takes 130ms, but the entire index is in a ramfs. The more terms, the longer it takes to compute docset intersections (one docset per term); something in SOLR/Lucene is still synchronized... Try comparing with smaller 1-term queries, using different terms for the parallel requests...
Re: Solr via ruby
Hi, Thanks for the discussion. We use the distributed option so I am not sure embedded is possible. As you also guessed, we use haproxy for load balancing and failover between replicas of the shards, so giving this up for a minor performance boost is probably not wise. So essentially we have: User -> HTTP Load Balancer -> Mongrel Cluster -> haproxy -> N x Solr Shards, and it looks like that is the standard setup for performance from what you suggest here, and most of the performance tweaks I thought of are already in use. Ian. On Fri, Sep 18, 2009 at 3:09 AM, Erik Hatcher erik.hatc...@gmail.com wrote: On Sep 18, 2009, at 1:09 AM, rajan chandi wrote: We are planning to use the external Solr on tomcat for scalability reasons. We thought that EmbeddedSolrServer uses HTTP too to talk with Ruby and vice versa, as in the acts_as_solr ruby plugin. EmbeddedSolrServer is a way to run Solr as an API (like Lucene) rather than with any web container involved at all. In other words, only Java can use EmbeddedSolrServer (which means JRuby works great). The acts_as_solr plugin uses the solr-ruby library to communicate with Solr. Under solr-ruby, it's HTTP with ruby (wt=ruby) formatted responses for searches, and documents being indexed get converted to Solr's XML format and POSTed to the Solr URL used to open the Solr::Connection. Erik If Ruby is not using HTTP to talk to EmbeddedSolrServer, what is it using? Thanks and Regards Rajan Chandi On Thu, Sep 17, 2009 at 9:44 PM, Erik Hatcher erik.hatc...@gmail.com wrote: On Sep 17, 2009, at 11:40 AM, Ian Connor wrote: Is there any support for connection pooling or a more optimized data exchange format? The solr-ruby library (like other Solr + Ruby libraries) uses the ruby response format and evals it. solr-ruby supports keeping the HTTP connection alive too. We are looking at any further ways to optimize the solr queries so we can possibly make more of them in the one request. 
The JSON like format seems pretty tight but I understand when the distributed search takes place it uses a binary protocol instead of text. I wanted to know if that was available or could be available via the ruby library. Is it possible to host a local shard and skip HTTP between ruby and solr? If you use JRuby you can do some fancy stuff, like use the javabin update and response formats so no XML is involved, and you could also use Solr's EmbeddedSolrServer to avoid HTTP. However, in practice rarely is HTTP the bottleneck and actually offers a lot of advantages, such as easy commodity load balancing and caching. But JRuby + Solr is a very beautiful way to go! If you're using MRI Ruby, though, you don't really have any options other than to go over HTTP. You could use json or ruby formatted responses - I'd be curious to see some performance numbers comparing those two. Erik -- Regards, Ian Connor 1 Leighton St #723 Cambridge, MA 02141 Call Center Phone: +1 (714) 239 3875 (24 hrs) Fax: +1(770) 818 5697 Skype: ian.connor
Multiple DisMax Queries spanning across multiple fields
For a particular requirement we have, we need to do a query that is a combination of multiple dismax queries behind the scenes (using Solr 1.4 nightly). The DisMaxQParser org.apache.solr.search.DisMaxQParser (details at http://wiki.apache.org/solr/DisMaxRequestHandler) takes in the qf parameter, applies the parser to q, and computes relevance based on the same. We need a case where the final query is a combination of { (q = keywords, qf = map of field weights), (q1, qf1), (q2, qf2), ... etc. }, with the individual queries combined by a boolean AND. Creating a custom QParser works right away, as below:

public class MultiTermDisMaxQParser extends DisMaxQParser {
    ...
    @Override
    public Query parse() throws ParseException {
        // true disables coord scoring for the outer boolean combination
        BooleanQuery finalQuery = new BooleanQuery(true);
        Query superQuery = super.parse(); // handles the (q, qf) combination
        ...
        // finalQuery adds superQuery with a weight.
        return finalQuery;
    }
}

Curious to see if we have an alternate method to implement the same / any other alternate suggestions to the problem itself.
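If this approach is adopted, the custom parser also needs a QParserPlugin wrapper registered in solrconfig.xml. A sketch (the plugin class and registered name below are hypothetical, not from the post):

```xml
<!-- Hypothetical wrapper whose createParser() returns a MultiTermDisMaxQParser. -->
<queryParser name="multidismax" class="com.example.MultiTermDisMaxQParserPlugin"/>
```

Requests would then select it with defType=multidismax (or a {!multidismax ...} local-params prefix), passing the extra q1/qf1, q2/qf2 parameters for the parser to read.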
ReversedWildcardFilterFactory (SOLR-1321) and KeywordTokenizerFactory
Hello, Can ReversedWildcardFilterFactory be used with KeywordTokenizerFactory? I get the following error; it looks like Solr expects WhitespaceTokenizerFactory... Can anybody suggest how to rectify it? My schema snippet is also given below. Data is extracted via OpenNLP and indexed into Solr for performing leading wildcard searches for faceting purposes.

ERROR HTTP Status 500 - org.apache.solr.analysis.WhitespaceTokenizerFactory.create(Ljava/io/Reader;)Lorg/apache/lucene/analysis/Tokenizer;

java.lang.AbstractMethodError: org.apache.solr.analysis.WhitespaceTokenizerFactory.create(Ljava/io/Reader;)Lorg/apache/lucene/analysis/Tokenizer;
    at org.apache.solr.analysis.TokenizerChain.getStream(TokenizerChain.java:69)
    at org.apache.solr.analysis.SolrAnalyzer.reusableTokenStream(SolrAnalyzer.java:74)
    at org.apache.solr.schema.IndexSchema$SolrIndexAnalyzer.reusableTokenStream(IndexSchema.java:364)
    at org.apache.lucene.queryParser.QueryParser.getFieldQuery(QueryParser.java:543)
    at org.apache.solr.search.SolrQueryParser.getFieldQuery(SolrQueryParser.java:153)
    at org.apache.solr.util.SolrPluginUtils$DisjunctionMaxQueryParser.getFieldQuery(SolrPluginUtils.java:807)
    at org.apache.solr.util.SolrPluginUtils$DisjunctionMaxQueryParser.getFieldQuery(SolrPluginUtils.java:794)
    at org.apache.lucene.queryParser.QueryParser.Term(QueryParser.java:1425)
    at org.apache.lucene.queryParser.QueryParser.Clause(QueryParser.java:1313)
    at org.apache.lucene.queryParser.QueryParser.Query(QueryParser.java:1241)
    at org.apache.lucene.queryParser.QueryParser.TopLevelQuery(QueryParser.java:1230)
    at org.apache.lucene.queryParser.QueryParser.parse(QueryParser.java:176)
    at org.apache.solr.search.DisMaxQParser.getUserQuery(DisMaxQParser.java:195)
    at org.apache.solr.search.DisMaxQParser.addMainQuery(DisMaxQParser.java:158)
    at org.apache.solr.search.DisMaxQParser.parse(DisMaxQParser.java:74)
    at org.apache.solr.search.QParser.getQuery(QParser.java:131)
    at org.apache.solr.handler.component.QueryComponent.prepare(QueryComponent.java:89)
    at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:174)
    at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
    at org.apache.solr.core.SolrCore.execute(SolrCore.java:1313)
    at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:338)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:241)
    at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:230)
    at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:198)
    at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:297)
    at org.apache.catalina.core.StandardContextValve.invokeInternal(StandardContextValve.java:271)
    at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:202)
    at org.apache.catalina.core.StandardPipeline.doInvoke(StandardPipeline.java:632)
    at org.apache.catalina.core.StandardPipeline.doInvoke(StandardPipeline.java:577)
    at com.sun.enterprise.web.WebPipeline.invoke(WebPipeline.java:94)
    at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:206)
    at org.apache.catalina.core.StandardPipeline.doInvoke(StandardPipeline.java:632)
    at org.apache.catalina.core.StandardPipeline.doInvoke(StandardPipeline.java:577)
    at org.apache.catalina.core.StandardPipeline.invoke(StandardPipeline.java:571)
    at org.apache.catalina.core.ContainerBase.invoke(ContainerBase.java:1080)
    at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:150)
    at org.apache.catalina.core.StandardPipeline.doInvoke(StandardPipeline.java:632)
    at org.apache.catalina.core.StandardPipeline.doInvoke(StandardPipeline.java:577)
    at org.apache.catalina.core.StandardPipeline.invoke(StandardPipeline.java:571)
    at org.apache.catalina.core.ContainerBase.invoke(ContainerBase.java:1080)
    at org.apache.coyote.tomcat5.CoyoteAdapter.service(CoyoteAdapter.java:272)
    at com.sun.enterprise.web.connector.grizzly.DefaultProcessorTask.invokeAdapter(DefaultProcessorTask.java:637)
    at com.sun.enterprise.web.connector.grizzly.DefaultProcessorTask.doProcess(DefaultProcessorTask.java:568)
    at com.sun.enterprise.web.connector.grizzly.DefaultProcessorTask.process(DefaultProcessorTask.java:813)
    at com.sun.enterprise.web.connector.grizzly.DefaultReadTask.executeProcessorTask(DefaultReadTask.java:341)
    at com.sun.enterprise.web.connector.grizzly.DefaultReadTask.doTask(DefaultReadTask.java:263)
    at com.sun.enterprise.web.connector.grizzly.DefaultReadTask.doTask(DefaultReadTask.java:214)
    at com.sun.enterprise.web.connector.grizzly.TaskBase.run(TaskBase.java:265)
    at com.sun.enterprise.web.connector.grizzly.ssl.SSLWorkerThread.run(SSLWorkerThread.java:106)

SCHEMA snippet
---
<fieldType name="keywordText" class="solr.TextField" sortMissingLast="true"
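For reference, the SOLR-1321 ReversedWildcardFilterFactory is declared as an ordinary index-time filter, and nothing in its contract requires WhitespaceTokenizerFactory; the AbstractMethodError above more likely points at mismatched Solr/Lucene jar versions on the classpath than at the tokenizer choice. A hedged sketch of how the keywordText type might declare the filter with KeywordTokenizerFactory (attribute values here are illustrative, not from the post):

```xml
<fieldType name="keywordText" class="solr.TextField" sortMissingLast="true">
  <analyzer type="index">
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- Index reversed tokens alongside the originals so leading wildcards work. -->
    <filter class="solr.ReversedWildcardFilterFactory" withOriginal="true"
            maxPosAsterisk="3" maxPosQuestion="2" maxFractionAsterisk="0.33"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```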
Very big numbers
Hi! I need to index in solr very big numbers. Something like 99,999,999,999,999.99 Right now i'm using an sdouble field type because I need to make range queries on this field. The problem is that the field value is being returned in scientific notation. Is there any way to avoid that? Thanks! Jonathan
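A quick illustration of what is going on (plain Java, not Solr-specific): sdouble stores a Java double, and Java's default double-to-String conversion switches to scientific notation for magnitudes of 10^7 or more. Note also that a double only carries 15-16 significant decimal digits, so the trailing cents of a number this large may not be represented exactly. One client-side workaround is to re-render the value through BigDecimal:

```java
import java.math.BigDecimal;

public class BigNumberDemo {
    public static void main(String[] args) {
        double d = 99999999999999.99;

        // Java's default rendering of a double this large uses scientific notation.
        String sci = Double.toString(d);

        // BigDecimal(String) preserves the value and can print it without an exponent.
        String plain = new BigDecimal(sci).toPlainString();

        System.out.println(sci);   // scientific notation (contains an exponent)
        System.out.println(plain); // plain decimal notation, no exponent
    }
}
```

If exact decimal semantics matter (e.g. currency), storing the value as a scaled long (cents) in an slong field would avoid both the notation and the rounding, while still supporting range queries.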
Re: Solr http post performance seems slow - help?
On Friday 11 September 2009 11:06:20 am Dan A. Dickey wrote: ... Our JBoss expert and I will be looking into why this might be occurring. Does anyone know of any JBoss related slowness with Solr? And does anyone have any other sort of suggestions to speed indexing performance? Thanks for your help all! I'll keep you up to date with further progress. Ok, further progress... just to keep any interested parties up to date and for the record... I'm finding that using the example jetty setup (will be switching very very soon to a real jetty installation) is about the fastest. Using several processes to send posts to Solr helps a lot, and we're seeing about 80 posts a second this way. We also stripped down JBoss to the bare bones and the Solr in it is running nearly as fast - about 50 posts a second. It was our previous JBoss configuration that was making it appear slow for some reason. We will be running more tests and spreading out the pre-index workload across more machines and more processes. In our case we were seeing the bottleneck being one machine running 18 processes. The 2 quad core xeon system is experiencing about a 25% cpu load. And I'm not certain, but I think this may be actually 25% of one of the 8 cores. So, there's *lots* of room for Solr to be doing more work there. -Dan -- Dan A. Dickey | Senior Software Engineer Savvis 10900 Hampshire Ave. S., Bloomington, MN 55438 Office: 952.852.4803 | Fax: 952.852.4951 E-mail: dan.dic...@savvis.net
java doc error local params syntax for dismax
The javadoc for DisMaxQParserPlugin states: {!dismax qf=myfield,mytitle^2}foo creates a dismax query but actually, that gives an error. The correct syntax is {!dismax qf="myfield mytitle^2"}foo (could use single quotes instead of double quotes). - Naomi
Re: java doc error local params syntax for dismax
On Wed, Sep 23, 2009 at 5:59 PM, Naomi Dushay ndus...@stanford.edu wrote: The javadoc for DisMaxQParserPlugin states: {!dismax qf=myfield,mytitle^2}foo creates a dismax query but actually, that gives an error. The correct syntax is {!dismax qf="myfield mytitle^2"}foo (could use single quotes instead of double quotes). Thanks, I always forget that dismax uses space separated, not comma separated lists. -Yonik
Re: Solrj possible deadlock
I had the same problem again yesterday except the process halted after about 20mins this time. pof wrote: Hello, I was running a batch index the other day using the Solrj EmbeddedSolrServer when the process abruptly froze in its tracks after running for about 4-5 hours and indexing ~400K documents. There were no document locks so it would seem likely that there was some kind of thread deadlock. I was hoping someone might be able to tell me some information about the following thread dump taken at the time:

Full thread dump OpenJDK Client VM (1.6.0-b09 mixed mode):

DestroyJavaVM prio=10 tid=0x9322a800 nid=0xcef waiting on condition [0x..0x0018a044]
   java.lang.Thread.State: RUNNABLE

Java2D Disposer daemon prio=10 tid=0x0a28cc00 nid=0xf1c in Object.wait() [0x0311d000..0x0311def4]
   java.lang.Thread.State: WAITING (on object monitor)
    at java.lang.Object.wait(Native Method)
    - waiting on 0x97a96840 (a java.lang.ref.ReferenceQueue$Lock)
    at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:133)
    - locked 0x97a96840 (a java.lang.ref.ReferenceQueue$Lock)
    at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:149)
    at sun.java2d.Disposer.run(Disposer.java:143)
    at java.lang.Thread.run(Thread.java:636)

pool-1-thread-1 prio=10 tid=0x93a26c00 nid=0xcf7 waiting on condition [0x08a6a000..0x08a6b074]
   java.lang.Thread.State: WAITING (parking)
    at sun.misc.Unsafe.park(Native Method)
    - parking to wait for 0x967acfd0 (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
    at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:1978)
    at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:386)
    at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1043)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1103)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
    at java.lang.Thread.run(Thread.java:636)

Low Memory Detector daemon prio=10 tid=0x93a00c00 nid=0xcf5 runnable [0x..0x]
   java.lang.Thread.State: RUNNABLE

CompilerThread0 daemon prio=10 tid=0x09fe9800 nid=0xcf4 waiting on condition [0x..0x096a7af4]
   java.lang.Thread.State: RUNNABLE

Signal Dispatcher daemon prio=10 tid=0x09fe8800 nid=0xcf3 waiting on condition [0x..0x]
   java.lang.Thread.State: RUNNABLE

Finalizer daemon prio=10 tid=0x09fd7000 nid=0xcf2 in Object.wait() [0x005ca000..0x005caef4]
   java.lang.Thread.State: WAITING (on object monitor)
    at java.lang.Object.wait(Native Method)
    - waiting on 0x966e6d40 (a java.lang.ref.ReferenceQueue$Lock)
    at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:133)
    - locked 0x966e6d40 (a java.lang.ref.ReferenceQueue$Lock)
    at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:149)
    at java.lang.ref.Finalizer$FinalizerThread.run(Finalizer.java:177)

Reference Handler daemon prio=10 tid=0x09fd2c00 nid=0xcf1 in Object.wait() [0x00579000..0x00579d74]
   java.lang.Thread.State: WAITING (on object monitor)
    at java.lang.Object.wait(Native Method)
    - waiting on 0x966e6dc8 (a java.lang.ref.Reference$Lock)
    at java.lang.Object.wait(Object.java:502)
    at java.lang.ref.Reference$ReferenceHandler.run(Reference.java:133)
    - locked 0x966e6dc8 (a java.lang.ref.Reference$Lock)

VM Thread prio=10 tid=0x09fcf800 nid=0xcf0 runnable

VM Periodic Task Thread prio=10 tid=0x93a02400 nid=0xcf6 waiting on condition

JNI global references: 1072

Heap
 def new generation total 36288K, used 23695K [0x93f1, 0x9667, 0x9667)
  eden space 32256K, 73% used [0x93f1, 0x95633f60, 0x95e9)
  from space 4032K, 0% used [0x95e9, 0x95e9, 0x9628)
  to space 4032K, 0% used [0x9628, 0x9628, 0x9667)
 tenured generation total 483968K, used 72129K [0x9667, 0xb3f1, 0xb3f1)
  the space 483968K, 14% used [0x9667, 0x9ace04b8, 0x9ace0600, 0xb3f1)
 compacting perm gen total 23040K, used 22983K [0xb3f1, 0xb559, 0xb7f1)
  the space 23040K, 99% used [0xb3f1, 0xb5581ff8, 0xb5582000, 0xb559)
No shared spaces configured.

Cheers. Brett. -- View this message in context: http://www.nabble.com/Solrj-possible-deadlock-tp25530146p25531321.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Solrj possible deadlock
do you have anything custom going on? The fact that the lock is in java2d seems suspicious... On Sep 23, 2009, at 7:01 PM, pof wrote: I had the same problem again yesterday except the process halted after about 20mins this time. pof wrote: Hello, I was running a batch index the other day using the Solrj EmbeddedSolrServer when the process abruptly froze in it's tracks after running for about 4-5 hours and indexing ~400K documents. There were no document locks so it would seem likely that there was some kind of thread deadlock. I was hoping someone might be able to tell me some information about the following thread dump taken at the time: Full thread dump OpenJDK Client VM (1.6.0-b09 mixed mode): DestroyJavaVM prio=10 tid=0x9322a800 nid=0xcef waiting on condition [0x..0x0018a044] java.lang.Thread.State: RUNNABLE Java2D Disposer daemon prio=10 tid=0x0a28cc00 nid=0xf1c in Object.wait() [0x0311d000..0x0311def4] java.lang.Thread.State: WAITING (on object monitor) at java.lang.Object.wait(Native Method) - waiting on 0x97a96840 (a java.lang.ref.ReferenceQueue $Lock) at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java: 133) - locked 0x97a96840 (a java.lang.ref.ReferenceQueue$Lock) at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java: 149) at sun.java2d.Disposer.run(Disposer.java:143) at java.lang.Thread.run(Thread.java:636) pool-1-thread-1 prio=10 tid=0x93a26c00 nid=0xcf7 waiting on condition [0x08a6a000..0x08a6b074] java.lang.Thread.State: WAITING (parking) at sun.misc.Unsafe.park(Native Method) - parking to wait for 0x967acfd0 (a java.util.concurrent.locks.AbstractQueuedSynchronizer $ConditionObject) at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186) at java.util.concurrent.locks.AbstractQueuedSynchronizer $ConditionObject.await(AbstractQueuedSynchronizer.java:1978) at java .util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java: 386) at java .util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java: 1043) at java 
.util .concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java: 1103) at java.util.concurrent.ThreadPoolExecutor $Worker.run(ThreadPoolExecutor.java:603) at java.lang.Thread.run(Thread.java:636) Low Memory Detector daemon prio=10 tid=0x93a00c00 nid=0xcf5 runnable [0x..0x] java.lang.Thread.State: RUNNABLE CompilerThread0 daemon prio=10 tid=0x09fe9800 nid=0xcf4 waiting on condition [0x..0x096a7af4] java.lang.Thread.State: RUNNABLE Signal Dispatcher daemon prio=10 tid=0x09fe8800 nid=0xcf3 waiting on condition [0x..0x] java.lang.Thread.State: RUNNABLE Finalizer daemon prio=10 tid=0x09fd7000 nid=0xcf2 in Object.wait() [0x005ca000..0x005caef4] java.lang.Thread.State: WAITING (on object monitor) at java.lang.Object.wait(Native Method) - waiting on 0x966e6d40 (a java.lang.ref.ReferenceQueue $Lock) at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java: 133) - locked 0x966e6d40 (a java.lang.ref.ReferenceQueue$Lock) at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java: 149) at java.lang.ref.Finalizer $FinalizerThread.run(Finalizer.java:177) Reference Handler daemon prio=10 tid=0x09fd2c00 nid=0xcf1 in Object.wait() [0x00579000..0x00579d74] java.lang.Thread.State: WAITING (on object monitor) at java.lang.Object.wait(Native Method) - waiting on 0x966e6dc8 (a java.lang.ref.Reference$Lock) at java.lang.Object.wait(Object.java:502) at java.lang.ref.Reference$ReferenceHandler.run(Reference.java:133) - locked 0x966e6dc8 (a java.lang.ref.Reference$Lock) VM Thread prio=10 tid=0x09fcf800 nid=0xcf0 runnable VM Periodic Task Thread prio=10 tid=0x93a02400 nid=0xcf6 waiting on condition JNI global references: 1072 Heap def new generation total 36288K, used 23695K [0x93f1, 0x9667, 0x9667) eden space 32256K, 73% used [0x93f1, 0x95633f60, 0x95e9) from space 4032K, 0% used [0x95e9, 0x95e9, 0x9628) to space 4032K, 0% used [0x9628, 0x9628, 0x9667) tenured generation total 483968K, used 72129K [0x9667, 0xb3f1, 0xb3f1) the space 483968K, 14% used [0x9667, 0x9ace04b8, 0x9ace0600, 
0xb3f1) compacting perm gen total 23040K, used 22983K [0xb3f1, 0xb559, 0xb7f1) the space 23040K, 99% used [0xb3f1, 0xb5581ff8, 0xb5582000, 0xb559) No shared spaces configured. Cheers. Brett. -- View this message in context: http://www.nabble.com/Solrj-possible-deadlock-tp25530146p25531321.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: java doc error local params syntax for dismax
It's not just the spaces - it's that the quotes (single or double flavor) are required as well. On Sep 23, 2009, at 3:10 PM, Yonik Seeley wrote: On Wed, Sep 23, 2009 at 5:59 PM, Naomi Dushay ndus...@stanford.edu wrote: The javadoc for DisMaxQParserPlugin states: {!dismax qf=myfield,mytitle^2}foo creates a dismax query but actually, that gives an error. The correct syntax is {!dismax qf="myfield mytitle^2"}foo (could use single quotes instead of double quotes). Thanks, I always forget that dismax uses space separated, not comma separated lists. -Yonik
Re: java doc error local params syntax for dismax
On Wed, Sep 23, 2009 at 8:24 PM, Naomi Dushay ndus...@stanford.edu wrote: It's not just the spaces - it's that the quotes (single or double flavor) are required as well. LocalParams are space delimited, so the original example would have worked if the dismax parser accepted comma delimited fields. -Yonik http://www.lucidimagination.com On Sep 23, 2009, at 3:10 PM, Yonik Seeley wrote: On Wed, Sep 23, 2009 at 5:59 PM, Naomi Dushay ndus...@stanford.edu wrote: The javadoc for DisMaxQParserPlugin states: {!dismax qf=myfield,mytitle^2}foo creates a dismax query but actually, that gives an error. The correct syntax is {!dismax qf="myfield mytitle^2"}foo (could use single quotes instead of double quotes). Thanks, I always forget that dismax uses space separated, not comma separated lists. -Yonik
Can solr build on top of HBase
hi, i use hbase and solr. i now have a large amount of data to index, which means the solr index will be large, and as the data increases it will grow larger still. so, regarding solrconfig.xml's <dataDir>/solrhome/data</dataDir>: can i set it from the api and point it to my distributed hbase data storage? and if the index is too large, will it be slow? thanks.
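For context, the element being asked about is dataDir in solrconfig.xml; it must name a directory Solr can read as an ordinary filesystem path. A minimal sketch using Solr's property-substitution syntax (the default path shown is illustrative):

```xml
<!-- Index location; falls back to /solrhome/data if solr.data.dir is not set. -->
<dataDir>${solr.data.dir:/solrhome/data}</dataDir>
```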
Re: solr caching problem
Is there any way to analyze or see which documents are getting cached by documentCache - <documentCache class="solr.LRUCache" size="512" initialSize="512" autowarmCount="0"/> On Wed, Sep 23, 2009 at 8:10 AM, satya tosatyaj...@gmail.com wrote: First of all, thanks a lot for the clarification. Is there any way to see how this cache is working internally, what objects are being stored, and how much memory it is consuming, so that we can get a clear picture in mind? And how to test the performance through the cache? On Tue, Sep 22, 2009 at 11:19 PM, Fuad Efendi f...@efendi.ca wrote: 1) Then do you mean, if we delete a particular doc, then that is going to be deleted from cache also. When you delete a document and then COMMIT your changes, new caches will be warmed up (and prepopulated by some key-value pairs from old instances), etc: <!-- documentCache caches Lucene Document objects (the stored fields for each document). Since Lucene internal document ids are transient, this cache will not be autowarmed. --> <documentCache class="solr.LRUCache" size="512" initialSize="512" autowarmCount="0"/> - this one won't be 'prepopulated'. 2) In solr, is the cache storing the entire document in memory or only references to documents in memory? There are many different cache instances; DocumentCache should store (ID, Document) pairs, etc
Re: Can solr build on top of HBase
can hbase be mounted on the filesystem? Solr can only read data from a filesystem On Thu, Sep 24, 2009 at 7:27 AM, 梁景明 futur...@gmail.com wrote: hi, i use hbase and solr ,now i have a large data need to index ,it means solr-index will be large, as the data increases,it will be more larger than now. so solrconfig.xml 's dataDir/solrhome/data/dataDir ,can i used it from api ,and point to my distrabuted hbase data storage, and if the index is too large ,will it be slow? thanks. -- - Noble Paul | Principal Engineer| AOL | http://aol.com
Can we point a Solr server to index directory dynamically at runtime..
Hi, Is there any way to dynamically point the Solr servers to index/data directories at run time? We are generating 200 GB worth of index per day and we want to retain the index for approximately 1 month. So our idea is to keep the first week of index available at any time for the users, i.e. have a set of Solr servers up and running to handle requests for the past week of data. But when a user tries to query data which is older than 7 days, we want to dynamically point the existing Solr instances to the inactive/dormant indexes and get the results. The main intention is to limit the number of Solr Slave instances and thereby limit the # of servers required. If the index directory and Solr instances are tightly coupled, then most of the Solr instances are just up and running and hardly used, as most users are mainly interested in the past week of data and not beyond that. Any thoughts or any other approaches to tackle this would be greatly appreciated. Thanks, sS
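One possible way to limit servers (a sketch assuming Solr's multicore setup; core names and paths below are illustrative) is to keep dormant week indexes on disk and attach them to a running Solr only when queried, via solr.xml plus the CoreAdmin handler:

```xml
<!-- solr.xml: cores can point at separate pre-built index directories. -->
<solr persistent="true">
  <cores adminPath="/admin/cores">
    <core name="week-current" instanceDir="shared" dataDir="/indexes/week-current/data"/>
    <!-- Older weeks can be attached on demand:
         /solr/admin/cores?action=CREATE&name=week-35&instanceDir=shared&dataDir=/indexes/week-35/data
         and detached again with action=UNLOAD once no longer queried. -->
  </cores>
</solr>
```

This keeps a small fixed pool of Solr instances while letting the set of live indexes change at run time.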
Re: Can solr build on top of HBase
Would FUSE (http://wiki.apache.org/hadoop/MountableHDFS) be of use? I wonder if you could take the data from HBase and index it into a Lucene index stored on HDFS. 2009/9/23 Noble Paul നോബിള് नोब्ळ् noble.p...@corp.aol.com can hbase be mounted on the filesystem? Solr can only read data from a filesystem On Thu, Sep 24, 2009 at 7:27 AM, 梁景明 futur...@gmail.com wrote: hi, i use hbase and solr ,now i have a large data need to index ,it means solr-index will be large, as the data increases,it will be more larger than now. so solrconfig.xml 's dataDir/solrhome/data/dataDir ,can i used it from api ,and point to my distrabuted hbase data storage, and if the index is too large ,will it be slow? thanks. -- - Noble Paul | Principal Engineer| AOL | http://aol.com
Re: java doc error local params syntax for dismax
Okay, but {!dismax qf="myfield mytitle^2"}foo works; {!dismax qf=myfield mytitle^2}foo does NOT work. - Naomi On Sep 23, 2009, at 5:52 PM, Yonik Seeley wrote: On Wed, Sep 23, 2009 at 8:24 PM, Naomi Dushay ndus...@stanford.edu wrote: It's not just the spaces - it's that the quotes (single or double flavor) are required as well. LocalParams are space delimited, so the original example would have worked if the dismax parser accepted comma delimited fields. -Yonik http://www.lucidimagination.com On Sep 23, 2009, at 3:10 PM, Yonik Seeley wrote: On Wed, Sep 23, 2009 at 5:59 PM, Naomi Dushay ndus...@stanford.edu wrote: The javadoc for DisMaxQParserPlugin states: {!dismax qf=myfield,mytitle^2}foo creates a dismax query but actually, that gives an error. The correct syntax is {!dismax qf="myfield mytitle^2"}foo (could use single quotes instead of double quotes). Thanks, I always forget that dismax uses space separated, not comma separated lists. -Yonik