Re: UIMA Error
Hi, how do I apply the AlchemyAPIAnnotator? Will this help me with the NamedEntityExtractionAnnotator? Thanks a lot, Tommaso, for your time
Re: keepword file with phrases
Hi Bill, quoting in the synonyms file did not produce the correct expansion :-( Looking at Chris's comments now. cheers, lee

On 5 February 2011 23:38, Bill Bell billnb...@gmail.com wrote: OK, that makes sense. If you double-quote the synonyms file, will that help with the whitespace? Bill

On 2/5/11 4:37 PM, Chris Hostetter hossman_luc...@fucit.org wrote:

: You need to switch the order. Do synonyms and expansion first, then
: shingles.

..except then he would be building shingles out of all the permutations of words in his synonyms -- including the multi-word synonyms. i don't *think* that's what he wants based on his example (but i may be wrong)

: Have you tried using analysis.jsp ?

he already mentioned he has, in his original mail, and that's how he can tell it's not working.

lee: based on your followup post about seeing problems in the synonyms output, i suspect the problem you are having is with how the SynonymFilter parses the synonyms file -- by default it assumes it should split on certain characters to create multi-word synonyms -- but in your case the tokens you are feeding the synonym filter (the output of your shingle filter) really do have whitespace in them.

There is a tokenizerFactory option that Koji added a while back to the SynonymFilterFactory that lets you specify the classname of a TokenizerFactory to use when parsing the synonym rules -- that may be what you need to get your synonyms with spaces in them (so they work properly with your shingles). (assuming of course that i really understand your problem) -Hoss
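A minimal sketch of the tokenizerFactory option Hoss describes, assuming an analyzer chain where shingles feed into the synonym filter (the factory classes and the tokenizerFactory attribute are real Solr options; the file name and shingle size are made up for illustration):

```xml
<!-- shingles first, then synonyms whose rules contain spaces -->
<tokenizer class="solr.WhitespaceTokenizerFactory"/>
<filter class="solr.ShingleFilterFactory" maxShingleSize="3"/>
<!-- KeywordTokenizerFactory keeps each side of a synonym rule as one
     token, so multi-word entries survive with their whitespace intact
     and can match the shingled tokens above -->
<filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
        tokenizerFactory="solr.KeywordTokenizerFactory"/>
```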
Re: HTTP ERROR 400 undefined field: *
I *think* that there was a post a while ago saying that if you were using trunk or 3_x, one of the recent changes required re-indexing, but don't quote me on that. Have you tried that? Best, Erick

On Fri, Feb 4, 2011 at 2:04 PM, Jed Glazner jglaz...@beyondoblivion.com wrote:

Sorry for the lack of details. It's all clear in my head.. :) We checked out the head revision from the 3.x branch a few weeks ago (https://svn.apache.org/repos/asf/lucene/dev/branches/branch_3x/). We picked up r1058326. We upgraded from a previous checkout (r960098). I am using our customized schema.xml and the solrconfig.xml from the old revision with the new checkout. After upgrading I just copied the data folders from each core into the new checkout (hoping I wouldn't have to re-index the content, as this takes days). Everything seems to work fine, except that now I can't get the score to return. The stack trace is attached.

I also saw this warning in the logs; not sure exactly what it's talking about:

Feb 3, 2011 8:14:10 PM org.apache.solr.core.Config getLuceneVersion WARNING: the luceneMatchVersion is not specified, defaulting to LUCENE_24 emulation. You should at some point declare and reindex to at least 3.0, because 2.4 emulation is deprecated and will be removed in 4.0. This parameter will be mandatory in 4.0.

Here is my request handler. The actual fields here are different than what is in mine, but I'm a little uncomfortable publishing how our company's search service works to the world:

<requestHandler name="standard" class="solr.SearchHandler" default="true">
  <lst name="defaults">
    <str name="echoParams">explicit</str>
    <str name="defType">edismax</str>
    <bool name="tv">true</bool>
    <!-- standard fields to query on -->
    <str name="qf">field_a^2 field_b^2 field_c^4</str>
    <!-- automatic phrase boosting! -->
    <str name="pf">field_d^10</str>
    <!-- boost function -->
    <!-- we'll comment this out for now because we're passing it to solr
         as a parameter. Once we finalize the exact function we should
         move it here and take it out of the query string. -->
    <!--<str name="bf">log(linear(field_e,0.001,1))^10</str>-->
    <str name="tie">0.1</str>
  </lst>
  <arr name="last-components">
    <str>tvComponent</str>
  </arr>
</requestHandler>

Anyway, hopefully this is enough info; let me know if you need more. Jed.

On 02/03/2011 10:29 PM, Chris Hostetter wrote:

: I was working on a checkout of the 3.x branch from about 6 months ago.
: Everything was working pretty well, but we decided that we should update and
: get what was at the head. However after upgrading, I am now getting this

FWIW: please be specific. Head of what? the 3x branch? or trunk? what revision in svn does that correspond to? (the svnversion command will tell you)

: HTTP ERROR 400 undefined field: *
:
: If I clear the fl parameter (default is set to *, score) then it works fine
: with one big problem, no score data. If I try and set fl=score I get the same
: error except it says undefined field: score?!
:
: This works great in the older version, what changed? I've googled for about
: an hour now and I can't seem to find anything.

i can't reproduce this using either trunk (r1067044) or 3x (r1067045). all of these queries work just fine...

http://localhost:8983/solr/select/?q=*
http://localhost:8983/solr/select/?q=solr&fl=*,score
http://localhost:8983/solr/select/?q=solr&fl=score
http://localhost:8983/solr/select/?q=solr

...you'll have to provide us with a *lot* more details to help understand why you might be getting an error (like: what your configs look like, what the request looks like, what the full stack trace of your error is in the logs, etc...) -Hoss
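The luceneMatchVersion warning Jed quotes points at a one-line fix in solrconfig.xml. A sketch, assuming the 3.x branch he is running (the element name is Solr's; the value to declare depends on the index format you have actually reindexed to):

```xml
<!-- declare the index-format behavior explicitly instead of
     falling back to deprecated LUCENE_24 emulation -->
<luceneMatchVersion>LUCENE_30</luceneMatchVersion>
```

Note that, as the warning says, declaring a newer version only helps after reindexing with it.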
Re: AND operator and dismax request handler
Try attaching &debugQuery=on to your queries. The results will show you exactly what the query is after it gets parsed, and the difference should stand out. About dismax: try looking at the minimum-should-match (mm) parameter; that might do what you're looking for. Or think about edismax if you're on trunk or 3_x... Best, Erick

On Sat, Feb 5, 2011 at 9:47 AM, Bagesh Sharma mail.bag...@gmail.com wrote:

Hi friends, please suggest how I can set the query operator to AND for the dismax request handler case. My problem is that I am searching for the string "water treatment plant" using the dismax request handler. The query formed is of this type:

http://localhost:8884/solr/select/?q=water+treatment+plant&q.alt=*:*&start=0&rows=5&sort=score%20desc&qt=dismax&omitHeader=true

My handling for the dismax request handler in solrconfig.xml is:

<requestHandler name="dismax" class="solr.DisMaxRequestHandler" default="true">
  <lst name="defaults">
    <str name="facet">true</str>
    <str name="echoParams">explicit</str>
    <float name="tie">0.2</float>
    <str name="qf">
      TDR_SUBIND_SUBTDR_SHORT^3 TDR_SUBIND_SUBTDR_DETAILS^2
      TDR_SUBIND_COMP_NAME^1.5 TDR_SUBIND_LOC_STATE^3
      TDR_SUBIND_PROD_NAMES^2.5 TDR_SUBIND_LOC_CITY^3
      TDR_SUBIND_LOC_ZIP^2.5 TDR_SUBIND_NAME^1.5 TDR_SUBIND_TENDER_NO^1
    </str>
    <str name="pf">
      TDR_SUBIND_SUBTDR_SHORT^15 TDR_SUBIND_SUBTDR_DETAILS^10
      TDR_SUBIND_COMP_NAME^20
    </str>
    <str name="qs">1</str>
    <int name="ps">0</int>
    <str name="mm">20%</str>
  </lst>
</requestHandler>

The final parsed query is like:

+((TDR_SUBIND_PROD_NAMES:water^2.5 | TDR_SUBIND_LOC_ZIP:water^2.5 | TDR_SUBIND_COMP_NAME:water^1.5 | TDR_SUBIND_TENDER_NO:water | TDR_SUBIND_SUBTDR_SHORT:water^3.0 | TDR_SUBIND_SUBTDR_DETAILS:water^2.0 | TDR_SUBIND_LOC_CITY:water^3.0 | TDR_SUBIND_LOC_STATE:water^3.0 | TDR_SUBIND_NAME:water^1.5)~0.2 (TDR_SUBIND_PROD_NAMES:treatment^2.5 | TDR_SUBIND_LOC_ZIP:treatment^2.5 | TDR_SUBIND_COMP_NAME:treatment^1.5 | TDR_SUBIND_TENDER_NO:treatment | TDR_SUBIND_SUBTDR_SHORT:treatment^3.0 | TDR_SUBIND_SUBTDR_DETAILS:treatment^2.0 | TDR_SUBIND_LOC_CITY:treatment^3.0 | TDR_SUBIND_LOC_STATE:treatment^3.0 | TDR_SUBIND_NAME:treatment^1.5)~0.2 (TDR_SUBIND_PROD_NAMES:plant^2.5 | TDR_SUBIND_LOC_ZIP:plant^2.5 | TDR_SUBIND_COMP_NAME:plant^1.5 | TDR_SUBIND_TENDER_NO:plant | TDR_SUBIND_SUBTDR_SHORT:plant^3.0 | TDR_SUBIND_SUBTDR_DETAILS:plant^2.0 | TDR_SUBIND_LOC_CITY:plant^3.0 | TDR_SUBIND_LOC_STATE:plant^3.0 | TDR_SUBIND_NAME:plant^1.5)~0.2) (TDR_SUBIND_SUBTDR_DETAILS:"water treatment plant"^10.0 | TDR_SUBIND_COMP_NAME:"water treatment plant"^20.0 | TDR_SUBIND_SUBTDR_SHORT:"water treatment plant"^15.0)~0.2

Now it gives me results if any of the words from the text "water treatment plant" is found. I think the OR operator is working here, which finally combines the results. I want only those results in which the complete text "water treatment plant" matches. 1. I do not want to make any change to the dismax handler in solrconfig.xml. If possible, please suggest another handler to deal with it. 2. Is the OR operator really being applied in the query? Basically, when I query like this:

q=%2Bwater%2Btreatment%2Bplant&q.alt=*:*&q.op=AND&start=0&rows=5&sort=score desc,TDR_SUBIND_SUBTDR_OPEN_DATE asc&omitHeader=true&debugQuery=true&qt=dismax

OR

q=water+AND+treatment+AND+plant&q.alt=*:*&q.op=AND&start=0&rows=5&sort=score desc,TDR_SUBIND_SUBTDR_OPEN_DATE asc&omitHeader=true&debugQuery=true&qt=dismax

then I get different results. Can you explain the difference between the above two queries? Please advise on full-text search for "water treatment plant". Thanks for your response. -- View this message in context: http://lucene.472066.n3.nabble.com/AND-operator-and-dismax-request-handler-tp2431391p2431391.html Sent from the Solr - User mailing list archive at Nabble.com.
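Since the dismax values in the defaults list can be overridden per request, one way to get AND-like behavior without editing solrconfig.xml is to pass mm on the query string. A sketch against Bagesh's handler (100%25 is the URL-encoded form of 100%, meaning every term must match):

```
http://localhost:8884/solr/select/?q=water+treatment+plant&qt=dismax&mm=100%25&debugQuery=on
```

With debugQuery=on the parsed query should then show the three per-word clauses marked as required rather than optional.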
Separating Index Reader and Writer
Hi all, I have set up two indexes, one for reading (R) and the other for writing (W). Index R refers to the same data dir as W (defined in solrconfig via dataDir). To make sure the R index sees the documents indexed by W, I am firing an empty commit on R. With this, I am getting a performance improvement as compared to using the same index for reading and writing. Can anyone help me understand why this performance improvement is taking place even though both indexes are pointing to the same data directory? -- Thanks & Regards, Isan Fulia.
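The "empty commit on R" Isan describes is just a bare commit posted to the read-only instance's update handler, which makes R re-open its searcher over the segments W has written. A sketch (the host, port, and core path are hypothetical):

```xml
<!-- POST this document to http://localhost:8983/solr/core-r/update
     to make the read-only instance re-open its searcher; no documents
     are added, so W's segments are simply picked up -->
<commit/>
```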
Re: Optimize searches; business is progressing with my Solr site
What does &debugQuery=on give you? Second, what optimizations are you doing? What shows up in the analysis page? Does your admin page show the terms you expect in your copyField? Best, Erick

On Sun, Feb 6, 2011 at 2:03 AM, Dennis Gearon gear...@sbcglobal.net wrote: Thanks to LOTS of information from you guys, my site is up and working. It's only an API now; I need to work on my OWN front end, LOL! I have my second customer. My general-purpose repository API is very useful, I'm finding. I will soon be in the business of optimizing the search engine part. For example, I have a copyField that has the words 'boogie woogie ballroom' on lots of records. I cannot find those records using 'boogie/boogi/boog', or the woogie versions of those, but I can with 'ballroom'. For my VERY first lesson in optimization of search, what might be causing that, and where are the places to read about this on the Solr site? All the best on a Sunday, guys and gals. Dennis Gearon Signature Warning It is always a good idea to learn from your own mistakes. It is usually a better idea to learn from others' mistakes, so you do not have to make them yourself. from 'http://blogs.techrepublic.com.com/security/?p=4501&tag=nl.e036' EARTH has a Right To Life, otherwise we all die.
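What the analysis page check comes down to, in schema.xml terms: the fieldType of the copyField target decides which tokens actually get indexed, so whether 'boogi' can match 'boogie' depends on the filter chain. A hypothetical sketch (the factory classes are real Solr ones; the field and type names are made up):

```xml
<fieldType name="text_stemmed" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- stemming reduces inflected forms toward a common stem, which is
         one way variants like 'boogie'/'boogi' can be made to match -->
    <filter class="solr.PorterStemFilterFactory"/>
  </analyzer>
</fieldType>

<field name="all_text" type="text_stemmed" indexed="true" stored="false" multiValued="true"/>
<copyField source="title" dest="all_text"/>
```

Pasting 'boogie woogie ballroom' into the analysis page for the target field shows exactly which tokens survive each filter.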
Re: Separating Index Reader and Writer
Hi, We use this scenario in production, where we have one write-only Solr instance and one read-only, pointing to the same data. We do this so we can optimize caching etc. for each instance for write/read. The main performance gain is in cache warming and associated parameters. For your index W, it's worth turning off cache warming altogether, so commits aren't slowed down by warming. Peter

On Sun, Feb 6, 2011 at 3:25 PM, Isan Fulia isan.fu...@germinait.com wrote: Hi all, I have set up two indexes, one for reading (R) and the other for writing (W). Index R refers to the same data dir as W (defined in solrconfig via dataDir). To make sure the R index sees the documents indexed by W, I am firing an empty commit on R. With this, I am getting a performance improvement as compared to using the same index for reading and writing. Can anyone help me understand why this performance improvement is taking place even though both indexes are pointing to the same data directory? -- Thanks & Regards, Isan Fulia.
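Turning off cache warming on the writer, as Peter suggests, comes down to zeroing the autowarm counts in W's solrconfig.xml (and removing any newSearcher warm-up queries). A sketch with hypothetical cache sizes:

```xml
<!-- writer instance: no autowarming, so commits aren't delayed by
     copying entries from the old searcher's caches -->
<filterCache class="solr.FastLRUCache" size="512" initialSize="512" autowarmCount="0"/>
<queryResultCache class="solr.LRUCache" size="512" initialSize="512" autowarmCount="0"/>
```

The read-only instance keeps its normal autowarmCount values, since that is where cache hits actually pay off.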
Re: Separating Index Reader and Writer
Hi Peter, Can you elaborate a little on where the performance gain from cache warming comes in? I am getting a good improvement in search time.

On 6 February 2011 23:29, Peter Sturge peter.stu...@gmail.com wrote: Hi, We use this scenario in production, where we have one write-only Solr instance and one read-only, pointing to the same data. We do this so we can optimize caching etc. for each instance for write/read. The main performance gain is in cache warming and associated parameters. For your index W, it's worth turning off cache warming altogether, so commits aren't slowed down by warming. Peter

On Sun, Feb 6, 2011 at 3:25 PM, Isan Fulia isan.fu...@germinait.com wrote: Hi all, I have set up two indexes, one for reading (R) and the other for writing (W). Index R refers to the same data dir as W (defined in solrconfig via dataDir). To make sure the R index sees the documents indexed by W, I am firing an empty commit on R. With this, I am getting a performance improvement as compared to using the same index for reading and writing. Can anyone help me understand why this performance improvement is taking place even though both indexes are pointing to the same data directory? -- Thanks & Regards, Isan Fulia. -- Thanks & Regards, Isan Fulia.
Re: Separating Index Reader and Writer
Hi Peter, I must jump into this discussion: from a logical point of view, what you are saying only makes sense if both instances do not run on the same machine, or at least not on the same drive. When both run on the same machine and the same drive, the overall memory used should be equal, plus I do not understand why this setup should affect cache warming etc., since the process of rewarming should be the same. Well, my knowledge of the internals is not very deep, but from a purely logical point of view, to me, the same thing is happening as if I did it in a single Solr instance. So what is the difference; what am I overlooking?

Another thing: while W is committing and writing to the index, is there any inconsistency in R, or isn't there, because W is writing a new segment and so nothing looks different to R until the commit has finished? Are there problems while optimizing an index? How do you inform R about the finished commit? Thank you for your explanation; it's a really interesting topic! Regards, Em

Peter Sturge-2 wrote: Hi, We use this scenario in production, where we have one write-only Solr instance and one read-only, pointing to the same data. We do this so we can optimize caching etc. for each instance for write/read. The main performance gain is in cache warming and associated parameters. For your index W, it's worth turning off cache warming altogether, so commits aren't slowed down by warming. Peter

On Sun, Feb 6, 2011 at 3:25 PM, Isan Fulia isan.fu...@germinait.com wrote: Hi all, I have set up two indexes, one for reading (R) and the other for writing (W). Index R refers to the same data dir as W (defined in solrconfig via dataDir). To make sure the R index sees the documents indexed by W, I am firing an empty commit on R. With this, I am getting a performance improvement as compared to using the same index for reading and writing. Can anyone help me understand why this performance improvement is taking place even though both indexes are pointing to the same data directory? -- Thanks & Regards, Isan Fulia. -- View this message in context: http://lucene.472066.n3.nabble.com/Separating-Index-Reader-and-Writer-tp2437666p2438730.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Optimize searches; business is progressing with my Solr site
Hmmm, my default distance for geospatial was excluding the results, I believe. I have to check whether I was actually looking at the desired return result for 'ballroom' alone. Maybe I wasn't. But I saw there was a lot to learn when I applied the techniques you gave me. Thank you :-) Dennis Gearon Signature Warning It is always a good idea to learn from your own mistakes. It is usually a better idea to learn from others' mistakes, so you do not have to make them yourself. from 'http://blogs.techrepublic.com.com/security/?p=4501&tag=nl.e036' EARTH has a Right To Life, otherwise we all die.

From: Erick Erickson erickerick...@gmail.com To: solr-user@lucene.apache.org Sent: Sun, February 6, 2011 8:21:15 AM Subject: Re: Optimize searches; business is progressing with my Solr site

What does &debugQuery=on give you? Second, what optimizations are you doing? What shows up in the analysis page? Does your admin page show the terms you expect in your copyField? Best, Erick

On Sun, Feb 6, 2011 at 2:03 AM, Dennis Gearon gear...@sbcglobal.net wrote: Thanks to LOTS of information from you guys, my site is up and working. It's only an API now; I need to work on my OWN front end, LOL! I have my second customer. My general-purpose repository API is very useful, I'm finding. I will soon be in the business of optimizing the search engine part. For example, I have a copyField that has the words 'boogie woogie ballroom' on lots of records. I cannot find those records using 'boogie/boogi/boog', or the woogie versions of those, but I can with 'ballroom'. For my VERY first lesson in optimization of search, what might be causing that, and where are the places to read about this on the Solr site? All the best on a Sunday, guys and gals. Dennis Gearon Signature Warning It is always a good idea to learn from your own mistakes. It is usually a better idea to learn from others' mistakes, so you do not have to make them yourself. from 'http://blogs.techrepublic.com.com/security/?p=4501&tag=nl.e036' EARTH has a Right To Life, otherwise we all die.
Re: HTTP ERROR 400 undefined field: *
Yup, here it is, warning about needing to reindex: http://twitter.com/#!/lucene/status/28694113180192768 Otis Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch Lucene ecosystem search :: http://search-lucene.com/

- Original Message From: Erick Erickson erickerick...@gmail.com To: solr-user@lucene.apache.org Sent: Sun, February 6, 2011 9:43:00 AM Subject: Re: HTTP ERROR 400 undefined field: *

I *think* that there was a post a while ago saying that if you were using trunk or 3_x, one of the recent changes required re-indexing, but don't quote me on that. Have you tried that? Best, Erick

On Fri, Feb 4, 2011 at 2:04 PM, Jed Glazner jglaz...@beyondoblivion.com wrote:

Sorry for the lack of details. It's all clear in my head.. :) We checked out the head revision from the 3.x branch a few weeks ago (https://svn.apache.org/repos/asf/lucene/dev/branches/branch_3x/). We picked up r1058326. We upgraded from a previous checkout (r960098). I am using our customized schema.xml and the solrconfig.xml from the old revision with the new checkout. After upgrading I just copied the data folders from each core into the new checkout (hoping I wouldn't have to re-index the content, as this takes days). Everything seems to work fine, except that now I can't get the score to return. The stack trace is attached.

I also saw this warning in the logs; not sure exactly what it's talking about:

Feb 3, 2011 8:14:10 PM org.apache.solr.core.Config getLuceneVersion WARNING: the luceneMatchVersion is not specified, defaulting to LUCENE_24 emulation. You should at some point declare and reindex to at least 3.0, because 2.4 emulation is deprecated and will be removed in 4.0. This parameter will be mandatory in 4.0.

Here is my request handler. The actual fields here are different than what is in mine, but I'm a little uncomfortable publishing how our company's search service works to the world:

<requestHandler name="standard" class="solr.SearchHandler" default="true">
  <lst name="defaults">
    <str name="echoParams">explicit</str>
    <str name="defType">edismax</str>
    <bool name="tv">true</bool>
    <!-- standard fields to query on -->
    <str name="qf">field_a^2 field_b^2 field_c^4</str>
    <!-- automatic phrase boosting! -->
    <str name="pf">field_d^10</str>
    <!-- boost function -->
    <!-- we'll comment this out for now because we're passing it to solr
         as a parameter. Once we finalize the exact function we should
         move it here and take it out of the query string. -->
    <!--<str name="bf">log(linear(field_e,0.001,1))^10</str>-->
    <str name="tie">0.1</str>
  </lst>
  <arr name="last-components">
    <str>tvComponent</str>
  </arr>
</requestHandler>

Anyway, hopefully this is enough info; let me know if you need more. Jed.

On 02/03/2011 10:29 PM, Chris Hostetter wrote:

: I was working on a checkout of the 3.x branch from about 6 months ago.
: Everything was working pretty well, but we decided that we should update and
: get what was at the head. However after upgrading, I am now getting this

FWIW: please be specific. Head of what? the 3x branch? or trunk? what revision in svn does that correspond to? (the svnversion command will tell you)

: HTTP ERROR 400 undefined field: *
:
: If I clear the fl parameter (default is set to *, score) then it works fine
: with one big problem, no score data. If I try and set fl=score I get the same
: error except it says undefined field: score?!
:
: This works great in the older version, what changed? I've googled for about
: an hour now and I can't seem to find anything.

i can't reproduce this using either trunk (r1067044) or 3x (r1067045). all of these queries work just fine...

http://localhost:8983/solr/select/?q=*
http://localhost:8983/solr/select/?q=solr&fl=*,score
http://localhost:8983/solr/select/?q=solr&fl=score
http://localhost:8983/solr/select/?q=solr

...you'll have to provide us with a *lot* more details to help understand why you might be getting an error (like: what your configs look like, what the request looks like, what the full stack trace of your error is in the logs, etc...) -Hoss
Re: AND operator and dismax request handler
Hi Bagesh, I think Hossman and Erick have shown you the path you can choose to get the desired result. Try setting the mm value so that dismax works with your AND, OR, and NOT operators. Thanx: Grijesh Lucid Imagination Inc.

On Sat, Feb 5, 2011 at 8:17 PM, Bagesh Sharma [via Lucene] ml-node+2431391-1089615873-85...@n3.nabble.com wrote:

Hi friends, please suggest how I can set the query operator to AND for the dismax request handler case. My problem is that I am searching for the string "water treatment plant" using the dismax request handler. The query formed is of this type:

http://localhost:8884/solr/select/?q=water+treatment+plant&q.alt=*:*&start=0&rows=5&sort=score%20desc&qt=dismax&omitHeader=true

My handling for the dismax request handler in solrconfig.xml is:

<requestHandler name="dismax" class="solr.DisMaxRequestHandler" default="true">
  <lst name="defaults">
    <str name="facet">true</str>
    <str name="echoParams">explicit</str>
    <float name="tie">0.2</float>
    <str name="qf">
      TDR_SUBIND_SUBTDR_SHORT^3 TDR_SUBIND_SUBTDR_DETAILS^2
      TDR_SUBIND_COMP_NAME^1.5 TDR_SUBIND_LOC_STATE^3
      TDR_SUBIND_PROD_NAMES^2.5 TDR_SUBIND_LOC_CITY^3
      TDR_SUBIND_LOC_ZIP^2.5 TDR_SUBIND_NAME^1.5 TDR_SUBIND_TENDER_NO^1
    </str>
    <str name="pf">
      TDR_SUBIND_SUBTDR_SHORT^15 TDR_SUBIND_SUBTDR_DETAILS^10
      TDR_SUBIND_COMP_NAME^20
    </str>
    <str name="qs">1</str>
    <int name="ps">0</int>
    <str name="mm">20%</str>
  </lst>
</requestHandler>

The final parsed query is like:

+((TDR_SUBIND_PROD_NAMES:water^2.5 | TDR_SUBIND_LOC_ZIP:water^2.5 | TDR_SUBIND_COMP_NAME:water^1.5 | TDR_SUBIND_TENDER_NO:water | TDR_SUBIND_SUBTDR_SHORT:water^3.0 | TDR_SUBIND_SUBTDR_DETAILS:water^2.0 | TDR_SUBIND_LOC_CITY:water^3.0 | TDR_SUBIND_LOC_STATE:water^3.0 | TDR_SUBIND_NAME:water^1.5)~0.2 (TDR_SUBIND_PROD_NAMES:treatment^2.5 | TDR_SUBIND_LOC_ZIP:treatment^2.5 | TDR_SUBIND_COMP_NAME:treatment^1.5 | TDR_SUBIND_TENDER_NO:treatment | TDR_SUBIND_SUBTDR_SHORT:treatment^3.0 | TDR_SUBIND_SUBTDR_DETAILS:treatment^2.0 | TDR_SUBIND_LOC_CITY:treatment^3.0 | TDR_SUBIND_LOC_STATE:treatment^3.0 | TDR_SUBIND_NAME:treatment^1.5)~0.2 (TDR_SUBIND_PROD_NAMES:plant^2.5 | TDR_SUBIND_LOC_ZIP:plant^2.5 | TDR_SUBIND_COMP_NAME:plant^1.5 | TDR_SUBIND_TENDER_NO:plant | TDR_SUBIND_SUBTDR_SHORT:plant^3.0 | TDR_SUBIND_SUBTDR_DETAILS:plant^2.0 | TDR_SUBIND_LOC_CITY:plant^3.0 | TDR_SUBIND_LOC_STATE:plant^3.0 | TDR_SUBIND_NAME:plant^1.5)~0.2) (TDR_SUBIND_SUBTDR_DETAILS:"water treatment plant"^10.0 | TDR_SUBIND_COMP_NAME:"water treatment plant"^20.0 | TDR_SUBIND_SUBTDR_SHORT:"water treatment plant"^15.0)~0.2

Now it gives me results if any of the words from the text "water treatment plant" is found. I think the OR operator is working here, which finally combines the results. I want only those results in which the complete text "water treatment plant" matches. 1. I do not want to make any change to the dismax handler in solrconfig.xml. If possible, please suggest another handler to deal with it. 2. Is the OR operator really being applied in the query? Basically, when I query like this:

q=%2Bwater%2Btreatment%2Bplant&q.alt=*:*&q.op=AND&start=0&rows=5&sort=score desc,TDR_SUBIND_SUBTDR_OPEN_DATE asc&omitHeader=true&debugQuery=true&qt=dismax

OR

q=water+AND+treatment+AND+plant&q.alt=*:*&q.op=AND&start=0&rows=5&sort=score desc,TDR_SUBIND_SUBTDR_OPEN_DATE asc&omitHeader=true&debugQuery=true&qt=dismax

then I get different results. Can you explain the difference between the above two queries? Please advise on full-text search for "water treatment plant". Thanks for your response.

This email was sent by Bagesh Sharma (via Nabble). Your replies will appear at http://lucene.472066.n3.nabble.com/AND-operator-and-dismax-request-handler-tp2431391p2431391.html To receive all replies by email, subscribe to this discussion - Thanx: Grijesh http://lucidimagination.com -- View this message in context: http://lucene.472066.n3.nabble.com/AND-operator-and-dismax-request-handler-tp2431391p2441363.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Performance optimization of Proximity/Wildcard searches
Only a couple of thousand documents are added daily, so the old OS cache should still be useful since the old documents remain the same, right? Also, can you please comment on my other thread related to term vectors? Thanks!

On Sat, Feb 5, 2011 at 8:40 PM, Otis Gospodnetic otis_gospodne...@yahoo.com wrote: Yes, the OS cache mostly remains (obviously index files that are no longer around are going to remain in the OS cache for a while, but will be useless and gradually replaced by new index files). How long warmup takes is not what matters here; what matters is which queries you use to warm up the index and how much you auto-warm the caches. Otis Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch Lucene ecosystem search :: http://search-lucene.com/

- Original Message From: Salman Akram salman.ak...@northbaysolutions.net To: solr-user@lucene.apache.org Sent: Sat, February 5, 2011 4:06:54 AM Subject: Re: Performance optimization of Proximity/Wildcard searches

Correct me if I am wrong: a commit flushes the SOLR caches, but of course the OS cache would still be useful? If an index is updated every hour, then a warm-up that takes less than 5 minutes should be more than enough, right?

On Sat, Feb 5, 2011 at 7:42 AM, Otis Gospodnetic otis_gospodne...@yahoo.com wrote: Salman, Warming up may be useful if your caches are getting decent hit ratios. Plus, you are warming up the OS cache when you warm up. Otis Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch Lucene ecosystem search :: http://search-lucene.com/

- Original Message From: Salman Akram salman.ak...@northbaysolutions.net To: solr-user@lucene.apache.org Sent: Fri, February 4, 2011 3:33:41 PM Subject: Re: Performance optimization of Proximity/Wildcard searches

I know, so we are not really using it for regular warm-ups (in any case the index is updated on an hourly basis). Just tried it a few times to compare results. The issue is I am not even sure if warming up is useful for such regular updates.
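Warm-up queries like the ones being discussed are usually wired in as newSearcher listeners in solrconfig.xml, so they run against each new searcher before it serves traffic; a sketch with a made-up query (QuerySenderListener is Solr's standard listener class):

```xml
<listener event="newSearcher" class="solr.QuerySenderListener">
  <arr name="queries">
    <!-- hypothetical warm-up query mirroring a common user search -->
    <lst>
      <str name="q">water treatment plant</str>
      <str name="start">0</str>
      <str name="rows">10</str>
    </lst>
  </arr>
</listener>
```

Keeping this list short is the point of Otis's warning: if it replays every distinct logged query, each hourly commit pays the full cost before the searcher goes live.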
On Fri, Feb 4, 2011 at 5:16 PM, Otis Gospodnetic otis_gospodne...@yahoo.com wrote: Salman, I only skimmed your email, but wanted to say that this part sounds a little suspicious: "Our warm up script currently executes all distinct queries in our logs having count > 5. It was run yesterday (with all the indexing updates every..." It sounds like this will make warmup take a long time, assuming you have more than a handful of distinct queries in your logs. Otis Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch Lucene ecosystem search :: http://search-lucene.com/

- Original Message From: Salman Akram salman.ak...@northbaysolutions.net To: solr-user@lucene.apache.org; t...@statsbiblioteket.dk Sent: Tue, January 25, 2011 6:32:48 AM Subject: Re: Performance optimization of Proximity/Wildcard searches

By a warmed index do you only mean warming the SOLR cache, or the OS cache too? As I said, our index is updated every hour, so I am not sure how much the SOLR cache would help, but the OS cache should still be helpful, right? I haven't compared the results with a proper script, but from manual testing here are some observations. 'Recent' queries which are in the cache of course return immediately (only if they are exactly the same, even if they took 3-4 minutes the first time). I will need to test how many recent queries stay in the cache, but still, this would work only for very common queries. Users can run different queries, and I want at least those to be at an 'acceptable' level (5-10 secs) even if not very fast. Our warm-up script currently executes all distinct queries in our logs having count > 5. It was run yesterday (with all the indexing updates every hour after that), and today when I executed some of the same queries again their time seemed a little lower (around 15-20%); I am not sure if this means anything. However, their time is still not acceptable. What do you think is the best way to compare results? First run all the warm-up queries and then execute the same ones randomly and compare?

We are using a Windows server; would it make a big difference if we moved to Linux? Our load is not high, but some queries are really complex. Also, I was hoping to move to SSDs last, after trying out all the software options. Is it an agreed fact that on large indexes (which don't fit in RAM)