Re: External File Field eating memory
Hi Apoorva,

This was my master server replication configuration (core/conf/solrconfig.xml):

<requestHandler name="/replication" class="solr.ReplicationHandler">
  <lst name="master">
    <str name="replicateAfter">commit</str>
    <str name="replicateAfter">startup</str>
    <str name="confFiles">../data/external_eff_views</str>
  </lst>
</requestHandler>

It is only configuration files that can be replicated. So, with the above config, the external file was getting replicated into core/conf/data/external_eff_views. But for Solr to read the external file, it looks for it in the core/data/external_eff_views location. So, firstly, the file was not getting replicated to the right place, and therefore I did not opt for replicating the EFF file. And the second thing is that whenever there is a change in the configuration files, the core gets reloaded by itself to reflect the changes. I am not sure if you can disable this reloading. Finally, I decided to create the files on the slaves in a different way.

Thanks, Kamal

On Tue, Jul 15, 2014 at 11:00 AM, Apoorva Gaurav apoorva.gau...@myntra.com wrote: Hey Kamal, What config changes have you made to establish replication of external files, and how have you disabled core reloading?

On Wed, Jul 9, 2014 at 11:30 AM, Kamal Kishore Aggarwal kkroyal@gmail.com wrote: Hi All, It was found that the external file, which was getting replicated every 10 minutes, was reloading the core as well. This was increasing the query time. Thanks, Kamal Kishore

On Thu, Jul 3, 2014 at 12:48 PM, Kamal Kishore Aggarwal kkroyal@gmail.com wrote: With the above replication configuration, the EFF file is getting replicated to core/conf/data/external_eff_views (a new "data" dir is being created in the conf dir), but it is not getting replicated to core/data/external_eff_views on the slave. Please help.

On Thu, Jul 3, 2014 at 12:21 PM, Kamal Kishore Aggarwal kkroyal@gmail.com wrote: Thanks for your guidance, Alexandre Rafalovitch. I am looking into this seriously.
Another question: I am facing an error in replication of the EFF file. This is the master replication configuration (core/conf/solrconfig.xml):

<requestHandler name="/replication" class="solr.ReplicationHandler">
  <lst name="master">
    <str name="replicateAfter">commit</str>
    <str name="replicateAfter">startup</str>
    <str name="confFiles">../data/external_eff_views</str>
  </lst>
</requestHandler>

The EFF file is present at the core/data/external_eff_views location.

On Thu, Jul 3, 2014 at 11:50 AM, Shalin Shekhar Mangar shalinman...@gmail.com wrote: This might be related: https://issues.apache.org/jira/browse/SOLR-3514

On Sat, Jun 28, 2014 at 5:34 PM, Kamal Kishore Aggarwal kkroyal@gmail.com wrote: Hi Team, I have recently implemented an External File Field (EFF) in Solr. There are about 1.5 lakh (150,000) unsorted values in the external file. After this implementation, the server has become slow, and the Solr query time has also increased. Can anybody confirm whether these issues are because of this implementation? Is it memory that the EFF eats up? Regards, Kamal Kishore

-- Regards, Shalin Shekhar Mangar.

-- Thanks & Regards, Apoorva
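For context on why Solr looks in core/data: an ExternalFileField named eff_views is resolved from a file called external_eff_views in the index data directory, not from conf/. A sketch of the schema declaration this thread implies (the field and type names are assumptions inferred from the file name, not taken from Kamal's actual schema):

```xml
<!-- schema.xml: hypothetical declaration matching the file data/external_eff_views -->
<fieldType name="file" class="solr.ExternalFileField"
           keyField="id" defVal="0" valType="pfloat"/>
<field name="eff_views" type="file" indexed="false" stored="false"/>
```

Solr re-reads the external file on commit/reload, which is also why a replicated copy landing in conf/data is never picked up.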
Re: External File Field eating memory
Thanks Kamal.

On Wed, Jul 16, 2014 at 11:43 AM, Kamal Kishore Aggarwal kkroyal@gmail.com wrote: [...]

-- Thanks & Regards, Apoorva
weird drastic query latency during performance testing and DIH import delay after performance testing
Hi,

I built my SolrCloud using Solr 4.6.0 (Java version 1.7.0_45). In my cloud, I have a collection with 30 shards, and each shard has one replica. Each core of a shard contains nearly 50 million docs, which is 15GB in size, as does the replica. Before applying my cloud in the real world, I am doing a performance test with JMeter 2.11. The scenario of my test is simple: 100 threads sending requests for 20 seconds, and these requests are only sent to a specific core of a specific shard. The request is similar to the following:

http://IP:port/solr/tv_201407/select?q=*:*&fq=BEGINTIME:[2014-06-01 00:00:00+TO+*]+AND+(CONTACT:${user})+AND+(TV_STATE:00)&shards=tv_201407&rows=2000&sort=BEGINTIME+desc

I encountered drastic query latency during the performance testing and a DIH import delay after the performance testing. Please help me. I have tested several times, got the same problem each time, and cannot handle it by myself. Any suggestion will be appreciated. The following steps describe what I have done.

Step 1: Before the test, the DIH import job is very fast. As the statistics in [1] show, the DIH import takes only 1s for 10 docs.

[1]---
Indexing completed. Added/Updated: 10 documents. Deleted 0 documents. (Duration: 01s)
Requests: 1 (1/s), Fetched: 10 (10/s), Skipped: 0, Processed: 10 (10/s)
Started: less than a minute ago
---

Step 2: Then I ran the test with the caches cleaned. The summary statistics are in [2]. Although I had cleaned the caches, I never thought the query latency would become so drastic as to be unacceptable in my real application. The aggregate lines in [2] show the latency of the query performance test on the core tv_201407 of the shard tv_201407. So could you experts give some hints about the drastic query latency?
[2]---
[solr@solr2 test]$ ../bin/jmeter.sh -n -t solrCoudKala20140401.jmx -l logfile_solrCloud_20.jtl
Creating summariser aggregate
Created the tree successfully using solrCoudKala20140401.jmx
Starting the test @ Wed Jul 16 15:59:28 CST 2014 (1405497568104)
Waiting for possible shutdown message on port 4445
aggregate +   1 in  8.1s =  0.1/s Avg: 8070 Min: 8070 Max: 8070 Err: 0 (0.00%) Active: 100 Started: 100 Finished: 0
aggregate + 103 in 13.4s =  7.7/s Avg: 8027 Min: 4191 Max: 8434 Err: 0 (0.00%) Active: 97 Started: 100 Finished: 3
aggregate = 104 in 13.4s =  7.7/s Avg: 8027 Min: 4191 Max: 8434 Err: 0 (0.00%)
aggregate +  96 in    7s = 14.5/s Avg: 6160 Min: 5295 Max: 6625 Err: 0 (0.00%) Active: 0 Started: 100 Finished: 100
aggregate = 200 in   15s = 13.6/s Avg: 7131 Min: 4191 Max: 8434 Err: 0 (0.00%)
Tidying up ... @ Wed Jul 16 15:59:43 CST 2014 (1405497583461)
... end of run
[solr@solr2 test]$
---

Step 3: Continuing, after the test I ran the DIH import job again using the same import expression. However, the performance of the DIH became unacceptable: importing the same 10 docs takes 2m 15s [3]! Note that Solr can still fetch the 10 docs fast; it is the processing that is slow.

[3]---
*Indexing completed. Added/Updated: 10 documents. Deleted 0 documents. (Duration: 2m 15s)*
Requests: 1 (0/s), Fetched: 10 (0/s), Skipped: 0, Processed: 10 (0/s)
Started: about an hour ago
---

By the way, JVM GC looks normal, and there is no long full GC during the test. The load on my system (RHEL 6.5) is also normal.

Regards
TrieDateField, precisionStep impact on sorting performance
Hello,

I'd like to sort on a TrieDateField which currently has a precisionStep value of 6. From what I've gathered so far, the precisionStep value only affects range query performance and index size. However, the documentation for TrieDateField says: 'precisionStep=0 enables efficient date sorting and minimizes index size; precisionStep=8 (the default) enables efficient range queries.' Does this mean sorting performance will suffer for precisionStep values other than 0?

Cheers, Dennis
Solr score manager
Hi All,

I need a specific scoring mechanism. I would like to sort my results based on a customized scoring field. Scoring, for example:

1. If this is a new object - 100
2. Edited - 80
3. Recent search - 50
4. Opened - 40

and some more actions... And then, when I execute a new search, the results are sorted based on the score field. Example:

Object 1: opened = 40.
Object 2: new = 100.
Object 3: edited x 2 + recent search x 1 = 210.

Result: Object 3, Object 2, Object 1.

Any good article for this? Examples? I'm using Solr with Java.

Thanks in advance, Shay.
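The arithmetic Shay describes can be sketched client-side like this (an illustration only: the weights and the resulting order come from the email above, while the function names and data layout are invented; in Solr the computed value would live in a custom score field used for sorting, e.g. the my_score field mentioned later in the thread):

```python
# Weights taken from the thread; one entry per tracked action.
ACTION_SCORES = {"new": 100, "edited": 80, "recent_search": 50, "opened": 40}

def custom_score(actions):
    """Sum weight * count over the actions performed on an object."""
    return sum(ACTION_SCORES[action] * count for action, count in actions.items())

objects = {
    "Object 1": {"opened": 1},
    "Object 2": {"new": 1},
    "Object 3": {"edited": 2, "recent_search": 1},
}

# Sort descending by the computed score, as the search results should be.
ranked = sorted(objects, key=lambda name: custom_score(objects[name]), reverse=True)
# ranked == ["Object 3", "Object 2", "Object 1"]  (scores 210, 100, 40)
```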
Re: Solr score manager
How are you storing this information in your documents?

Regards, Alex

On 16/07/2014 5:03 pm, Shay Sofer sha...@checkpoint.com wrote: [...]
Re: questions on Solr WordBreakSolrSpellChecker and WordDelimiterFilterFactory
Hi Jia,

What happens when you use <arr name="last-components"> instead of <arr name="components">?

Ahmet

On Wednesday, July 16, 2014 3:07 AM, j...@ece.ubc.ca j...@ece.ubc.ca wrote:

Hello everyone :) I have a product called xbox indexed, and when the user searches for either x-box or x box, I want the xbox product to be returned. I'm new to Solr, and from reading online, I thought I need to use WordDelimiterFilterFactory for the x-box case, and WordBreakSolrSpellChecker for the x box case. Is this correct?

(1) In my schema file, this is what I changed:

<filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="1" splitOnCaseChange="0" preserveOriginal="1"/>

But I don't see the xbox product returned when the search term is x-box, so I must have missed something.

(2) I tried to use WordBreakSolrSpellChecker together with DirectSolrSpellChecker as shown below, but the WordBreakSolrSpellChecker never got used:

<searchComponent name="wc_spellcheck" class="solr.SpellCheckComponent">
  <str name="queryAnalyzerFieldType">wc_textSpell</str>
  <lst name="spellchecker">
    <str name="name">default</str>
    <str name="field">spellCheck</str>
    <str name="classname">solr.DirectSolrSpellChecker</str>
    <str name="distanceMeasure">internal</str>
    <float name="accuracy">0.3</float>
    <int name="maxEdits">2</int>
    <int name="minPrefix">1</int>
    <int name="maxInspections">5</int>
    <int name="minQueryLength">3</int>
    <float name="maxQueryFrequency">0.01</float>
    <float name="thresholdTokenFrequency">0.004</float>
  </lst>
  <lst name="spellchecker">
    <str name="name">wordbreak</str>
    <str name="classname">solr.WordBreakSolrSpellChecker</str>
    <str name="field">spellCheck</str>
    <str name="combineWords">true</str>
    <str name="breakWords">true</str>
    <int name="maxChanges">10</int>
  </lst>
</searchComponent>

<requestHandler name="/spellcheck" class="org.apache.solr.handler.component.SearchHandler">
  <lst name="defaults">
    <str name="df">spellCheck</str>
    <str name="spellcheck">true</str>
    <str name="spellcheck.dictionary">default</str>
    <str name="spellcheck.dictionary">wordbreak</str>
    <str name="spellcheck.build">true</str>
    <str name="spellcheck.onlyMorePopular">false</str>
    <str name="spellcheck.count">10</str>
    <str name="spellcheck.collate">true</str>
    <str name="spellcheck.collateExtendedResults">false</str>
  </lst>
  <arr name="components">
    <str>wc_spellcheck</str>
  </arr>
</requestHandler>

I tried to build the dictionary this way: http://localhost/solr/coreName/select?spellcheck=true&spellcheck.build=true, but the response returned is this:

<response>
  <lst name="responseHeader">
    <int name="status">0</int>
    <int name="QTime">0</int>
    <lst name="params">
      <str name="spellcheck.build">true</str>
      <str name="spellcheck">true</str>
    </lst>
  </lst>
  <str name="command">build</str>
  <result name="response" numFound="0" start="0"/>
</response>

What's the correct way to build the dictionary? Even though my requestHandler's name is /spellcheck, I wasn't able to use http://localhost/solr/coreName/spellcheck?spellcheck=true&spellcheck.build=true .. is there something wrong with my definition above?

(3) I also tried to use WordBreakSolrSpellChecker without the DirectSolrSpellChecker as shown below:

<searchComponent name="wc_spellcheck" class="solr.SpellCheckComponent">
  <str name="queryAnalyzerFieldType">wc_textSpell</str>
  <lst name="spellchecker">
    <str name="name">default</str>
    <str name="classname">solr.WordBreakSolrSpellChecker</str>
    <str name="field">spellCheck</str>
    <str name="combineWords">true</str>
    <str name="breakWords">true</str>
    <int name="maxChanges">10</int>
  </lst>
</searchComponent>

<requestHandler name="/spellcheck" class="org.apache.solr.handler.component.SearchHandler">
  <lst name="defaults">
    <str name="df">spellCheck</str>
    <str name="spellcheck">true</str>
    <str name="spellcheck.dictionary">default</str>
    <!-- <str name="spellcheck.dictionary">wordbreak</str> -->
    <str name="spellcheck.build">true</str>
    <str name="spellcheck.onlyMorePopular">false</str>
    <str name="spellcheck.count">10</str>
    <str name="spellcheck.collate">true</str>
    <str name="spellcheck.collateExtendedResults">false</str>
  </lst>
  <arr name="components">
    <str>wc_spellcheck</str>
  </arr>
</requestHandler>

And I am still unable to see WordBreakSolrSpellChecker being called anywhere. Would someone kindly help me?

Many thanks, Jia
Re: TrieDateField, precisionStep impact on sorting performance
On Wed, Jul 16, 2014 at 5:51 AM, Kuehn, Dennis dennis.ku...@brands4friends.de wrote: I'd like to sort on a TrieDateField which currently has a precisionStep value of 6. From what I got so far, the precisionStep value only affects range query performance and index size. However, the documentation for TrieDateField says: 'precisionStep=0 enables efficient date sorting and minimizes index size; precisionStep=8 (the default) enables efficient range queries.' Does this mean sorting performance will suffer for precisionStep values other than 0? No, sorting speed is unaffected by precisionStep. That comment looks slightly misleading. -Yonik http://heliosearch.org - native code faceting, facet functions, sub-facets, off-heap data
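For intuition on why index size varies with precisionStep while sorting does not, here is the counting argument behind the trie encoding as a rough sketch (illustration only, not Lucene's actual code; precisionStep=0 stands for the single full-precision term Solr indexes in that case, which is all that sorting ever uses):

```python
import math

def terms_per_value(precision_step, bits=64):
    """Rough count of index terms the trie encoding produces per numeric value.

    precision_step=0 is treated the way Solr treats it: index only the single
    full-precision term (minimal index size; sorting uses just this term).
    """
    if precision_step == 0 or precision_step >= bits:
        return 1
    # One term per shift of 0, step, 2*step, ... below the full bit width.
    return math.ceil(bits / precision_step)

# terms_per_value(0) == 1, terms_per_value(8) == 8, terms_per_value(6) == 11
```

The extra lower-precision terms only exist to make range queries cheap; the sort reads the full-precision values regardless of how many extra terms were indexed.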
Re: Slow inserts when using Solr Cloud
That's useful to know, thanks very much. I'll look into using CloudSolrServer, although I'm using SolrNet at present. That would reduce some of the overhead - but not the extra 200ms I'm getting for forwarding to the replica when the replica is switched on. It does seem a very high overhead. When I consider that it takes 20ms to insert a new document into Solr with replicas disabled (if I route to the correct shard), you might expect it to take two to three times longer if it has to forward to one replica and then wait for a response, but an increase of 200ms seems really high, doesn't it? Is there a forum where I should raise that?

Thanks again for your help, Ian

Shalin Shekhar Mangar wrote: You can use CloudSolrServer (if you're using Java), which will route documents correctly to the leader of the appropriate shard.

On Tue, Jul 15, 2014 at 3:04 PM, ian <Ian.Williams@.nhs> wrote: Hi Mark, Thanks for replying to my post. Would you know whether my findings are consistent with what other people see when using SolrCloud? One thing I want to investigate is whether I can route my updates to the correct shard in the first place, by having my client use the same hashing logic as Solr and working out in advance which shard my inserts should be sent to. Do you know whether that's an approach that others have used? Thanks again, Ian

-- View this message in context: http://lucene.472066.n3.nabble.com/Slow-inserts-when-using-Solr-Cloud-tp4146087p4147183.html Sent from the Solr - User mailing list archive at Nabble.com.

-- Regards, Shalin Shekhar Mangar.

-- View this message in context: http://lucene.472066.n3.nabble.com/Slow-inserts-when-using-Solr-Cloud-tp4146087p4147481.html Sent from the Solr - User mailing list archive at Nabble.com.
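The client-side routing Ian asks about can be sketched like this (a toy illustration only: SolrCloud's compositeId router actually uses MurmurHash3 over the uniqueKey and per-shard hash ranges read from the cluster state; crc32 here is a stand-in, so this will NOT reproduce Solr's real shard assignment - which is exactly why CloudSolrServer, which reads the cluster state for you, is the safer route):

```python
import zlib

def shard_for(doc_id, num_shards):
    """Toy hash-range router: map a 32-bit hash of the document id onto
    num_shards contiguous, equal-sized ranges, the way SolrCloud assigns
    hash ranges to shards.

    NOTE: illustrative only; Solr uses MurmurHash3, not crc32.
    """
    h = zlib.crc32(doc_id.encode("utf-8")) & 0xFFFFFFFF
    range_size = (1 << 32) // num_shards
    # min() guards the top of the hash space when 2^32 % num_shards != 0.
    return min(h // range_size, num_shards - 1)
```

Every id lands deterministically on exactly one shard, so a client that implements the same hash and knows the shard ranges could send each insert straight to the right leader.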
Fwd: Solr score manager
-- Forwarded message --
From: Shay Sofer sha...@checkpoint.com
Date: Wed, Jul 16, 2014 at 6:55 PM

That's my question :-) How should I manage this scoring system? I guess that I need to add a new field (my_score) and update it as I want.

-Original Message-
From: Alexandre Rafalovitch [mailto:arafa...@gmail.com]
Sent: Wednesday, July 16, 2014 1:53 PM
To: solr-user
Subject: Re: Solr score manager

How are you storing this information in your documents?

Regards, Alex

On 16/07/2014 5:03 pm, Shay Sofer sha...@checkpoint.com wrote: [...]

Email secured by Check Point
Re: TrieDateField, precisionStep impact on sorting performance
Thanks for clarifying!

Dennis

On 7/16/14 3:19 PM, Yonik Seeley yo...@heliosearch.com wrote: [...]
RE: Solr score manager
Shay, this presentation I gave at ApacheCon and the DC Solr exchange might be useful to you: http://www.slideshare.net/mobile/o19s/hacking-lucene-for-custom-search-results

Sent from my Windows Phone

From: Shay Sofer
Sent: 7/16/2014 6:03 AM
To: solr-user@lucene.apache.org
Subject: Solr score manager

[...]
Mixing ordinary and nested documents
Hi Solr users,

I would appreciate your input on how to handle a *mix* of *simple* and *nested* documents in the easiest and most flexible way. I need to handle:

- simple documents: webpages, short articles etc. (approx. 90% of the content)
- nested documents: books containing chapters etc. (approx. 10% of the content)

For simple documents I just want to present straightforward search results without any grouping etc. For the nested documents I want to group by book and show the book title, book price etc. AND the individual results within the book. Let's say there is a hit on Chapter 1 and Chapter 7 within Book 1 and a hit on Article 1; I would like to present this:

*Book 1 title*
Book 1 published date
Book 1 description
- *Chapter 1 title* Chapter 1 snippet
- *Chapter 7 title* Chapter 7 snippet

*Article 1 title*
Article 1 published date
Article 1 description
Article 1 snippet

It looks like it is pretty straightforward to use the CollapsingQParser to collapse the book results into one result and not collapse the other results. But how about showing the information about the book (the parent document of the chapters)?

1) Is there a way to do an *optional block join* to a *parent* document and return it together *with* the *child* document - but not to require a parent document?
- or -
2) Do I need to require parent-child documents for everything? This is really not my preferred strategy, as only a small part of the documents is in a real parent-child relationship. This would mean a lot of dummy child documents.
- or -
3) Should I just denormalize the data and include the book information within each chapter document?
- or -
4) ... or is there a smarter way?

Your help is very much appreciated.

Cheers, Bjørn Axelsen
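For the CollapsingQParser route mentioned above, one possible sketch (assuming a book_id field that exists only on chapter documents; nullPolicy=expand keeps documents without the field, i.e. the simple documents, as individual results):

```
fq={!collapse field=book_id nullPolicy=expand}
&expand=true
&expand.rows=5
```

With expand=true the response carries an expanded section per collapsed book_id containing the other matching chapters. Note that collapse returns the top-scoring chapter per book, not a parent document, so the book-level metadata (title, price, ...) would still have to come from denormalized fields on the chapters or a separate lookup.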
RE: questions on Solr WordBreakSolrSpellChecker and WordDelimiterFilterFactory
Jia,

I agree that for the spellcheckers to work, you need <arr name="last-components"> instead of <arr name="components">. But the x-box => xbox example ought to be solved by analyzing with WordDelimiterFilterFactory and catenateWords="1" at query time. Did you re-index after changing your analysis chain (you need to)? Perhaps you can show your full analyzer configuration, and someone here can help you find the problem. Also, the Analysis page on the Solr Admin UI is invaluable for debugging text-field analyzer problems.

Getting x box to analyze to xbox is trickier (but possible). The WordBreakSpellChecker is probably your best option if you have cases like this in your users' queries. Of course, if you have a finite number of products that have spelling variants like this, SynonymFilterFactory might be all you need. I would recommend using index-time synonyms for your case rather than query-time synonyms.

James Dyer
Ingram Content Group
(615) 213-4311

-Original Message-
From: Ahmet Arslan [mailto:iori...@yahoo.com.INVALID]
Sent: Wednesday, July 16, 2014 7:42 AM
To: solr-user@lucene.apache.org; j...@ece.ubc.ca
Subject: Re: questions on Solr WordBreakSolrSpellChecker and WordDelimiterFilterFactory

[...]
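For the synonym route James recommends, a minimal sketch (the file name, analyzer placement, and exact variant list are assumptions; index-time synonyms require a re-index whenever the file changes):

```xml
<!-- in the index-time analyzer chain of the product field;
     synonyms.txt would contain a line like:  x-box, x box, xbox -->
<filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
        ignoreCase="true" expand="true"/>
```

With expand="true", each variant is indexed under every term in the group, so a query for any one of them matches documents containing any other.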
Strange Scoring Results
Hey All - I'm a Solr newbie in need of some help. I'm using Apache Nutch to crawl a site and populate a Solr core, which we then use to query search results. I've got it all up and running, but the Solr scoring results I get don't seem to make any sense. Let's take the following query as an example:

content:devlearn 2014 registration information

I have a page with a title of "DevLearn 2014 Conference Expo - Registration Information" and a URL of www.mydomain.com/DevLearn/content/3426/devlearn-2014-conference--expo--registration-information/ which has multiple instances of all terms in the content field. I would expect this document to be returned at the top of the list, since in addition to being in the content field, all terms are in both the title and the url, which I'm boosting for. Instead, it comes back as number 3320 in the results with a score of 0. Meanwhile, 3319 other pages return with higher scores, and all of these have fewer instances of the terms in the content field, and one or fewer of the terms in the title or url. Below is the /select requestHandler section from my solrconfig.xml, which shows the query defaults. Let me know if I should include more of this file or any other information:

<requestHandler name="/select" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="echoParams">explicit</str>
    <int name="rows">10</int>
    <str name="df">text</str>
    <str name="hl">on</str>
    <str name="hl.fl">content</str>
    <str name="hl.encoder">html</str>
    <str name="hl.simple.pre">&lt;strong&gt;</str>
    <str name="hl.simple.post">&lt;/strong&gt;</str>
    <str name="f.content.hl.snippets">1</str>
    <str name="f.content.hl.fragsize">200</str>
    <str name="f.content.hl.alternateField">content</str>
    <str name="f.content.hl.maxAlternateFieldLength">750</str>
    <str name="defType">edismax</str>
    <str name="qf">content^0.5 url^10.0 title^10.0</str>
    <str name="df">content</str>
    <str name="mm">100%</str>
    <str name="q.alt">*:*</str>
    <str name="rows">10</str>
    <str name="fl">*,score</str>
    <str name="pf">content^0.5 url^10.0 title^10.0</str>
    <str name="ps">100</str>
  </lst>
</requestHandler>
Re: Slow inserts when using Solr Cloud
Hi Ian,

What's the CPU doing on the leader? Have you tried attaching a profiler to the leader while running and seeing if any hotspots show up? Not sure if this is related, but we recently fixed an issue in the area of leader forwarding to replica that used too many CPU cycles inefficiently - see SOLR-6136.

Tim

On Wed, Jul 16, 2014 at 7:49 AM, ian ian.willi...@wales.nhs.uk wrote: [...]
solr-4.9.0 : [OverseerExitThread] but has failed to stop it. This is very likely to create a memory leak
Hi,

When I am starting SolrCloud (4.9) on top of Tomcat, it throws the below error message; the Java runtime reports a likely memory leak. Summary of the error message:

[OverseerExitThread] but has failed to stop it. This is very likely to create a memory leak

Detailed error message here:

16-Jul-2014 15:14:01.044 INFO [Thread-5] com.springsource.tcserver.licensing.LicensingLifecycleListener.setComponentState ComponentState to off
16-Jul-2014 15:14:01.049 INFO [Thread-5] org.apache.coyote.AbstractProtocol.pause Pausing ProtocolHandler [http-bio-8080]
16-Jul-2014 15:14:01.049 INFO [Thread-5] org.apache.catalina.core.StandardService.stopInternal Stopping service Catalina
16-Jul-2014 15:14:01.091 SEVERE [localhost-startStop-2] org.apache.catalina.loader.WebappClassLoader.clearReferencesThreads The web application [/solr-4.9.0] appears to have started a thread named [localhost-startStop-1-SendThread(cpsslrsbx01:2181)] but has failed to stop it. This is very likely to create a memory leak.
16-Jul-2014 15:14:01.091 SEVERE [localhost-startStop-2] org.apache.catalina.loader.WebappClassLoader.clearReferencesThreads The web application [/solr-4.9.0] appears to have started a thread named [localhost-startStop-1-EventThread] but has failed to stop it. This is very likely to create a memory leak.
16-Jul-2014 15:14:01.091 SEVERE [localhost-startStop-2] org.apache.catalina.loader.WebappClassLoader.clearReferencesThreads The web application [/solr-4.9.0] appears to have started a thread named [OverseerExitThread] but has failed to stop it. This is very likely to create a memory leak.
16-Jul-2014 15:14:01.093 INFO [Thread-5] org.apache.coyote.AbstractProtocol.stop Stopping ProtocolHandler [http-bio-8080] 16-Jul-2014 15:14:01.094 INFO [Thread-5] org.apache.coyote.AbstractProtocol.destroy Destroying ProtocolHandler [http-bio-8080] 16-Jul-2014 15:31:40.834 INFO [main] com.springsource.tcserver.security.PropertyDecoder.init tc Runtime property decoder using memory-based key 16-Jul-2014 15:31:41.131 INFO [main] com.springsource.tcserver.security.PropertyDecoder.init tcServer Runtime property decoder has been initialized in 301 ms 16-Jul-2014 15:31:43.978 INFO [main] org.apache.coyote.AbstractProtocol.init Initializing ProtocolHandler [http-bio-8080] 16-Jul-2014 15:31:45.141 INFO [main] com.springsource.tcserver.licensing.LicensingLifecycleListener.setComponentS tate ComponentState to on 16-Jul-2014 15:31:45.345 INFO [main] com.springsource.tcserver.serviceability.rmi.JmxSocketListener.init Started up JMX registry on 127.0.0.1:6969 in 187 ms 16-Jul-2014 15:31:45.370 INFO [main] org.apache.catalina.core.StandardService.startInternal Starting service Catalina 16-Jul-2014 15:31:45.370 INFO [main] org.apache.catalina.core.StandardEngine.startInternal Starting Servlet Engine: VMware vFabric tc Runtime 2.9.2.RELEASE/7.0.39.B.RELEASE 16-Jul-2014 15:31:45.384 INFO [localhost-startStop-1] org.apache.catalina.startup.HostConfig.deployWAR Deploying web application archive /apps/ecps/vfabric-tc-server-standard-2.9.2.RELEASE/cps_8080/webapps/solr-4. 9.0.war 16-Jul-2014 15:31:48.204 INFO [localhost-startStop-1] org.apache.catalina.startup.HostConfig.deployDirectory Deploying web application directory /apps/ecps/vfabric-tc-server-standard-2.9.2.RELEASE/cps_8080/webapps/ROOT 16-Jul-2014 15:31:48.349 INFO [main] org.apache.coyote.AbstractProtocol.start Starting ProtocolHandler [http-bio-8080]
Re: Using hundreds of dynamic fields
Thanks, Jack and Jared, for your input on this. I'm looking into whether parent-child relationships via block join or query-time join will meet my requirements. Jack, I noticed in a bunch of other posts around the web that you've suggested using dynamic fields in moderation. Is this suggestion based on negative performance implications of having to read and rewrite all previous fields for a document when doing atomic updates? Or are there additional inherent negatives to using lots of dynamic fields? Andy

On Fri, Jun 27, 2014 at 11:46 AM, Jared Whiklo jared.whi...@umanitoba.ca wrote: This is probably not the best answer, but my gut says that even if you changed your document to a simple 2 fields and have one as your metric and the other as a TrieDateField, you would speed up and simplify your date range queries. -- Jared Whiklo

On 2014-06-27 10:10 AM, Andy Crossen acros...@gmail.com wrote: Hi folks, My application requires tracking a daily performance metric for all documents. I start tracking for an 18 month window from the time a doc is indexed, so each doc will have ~548 of these fields. I have in my schema a dynamic field to capture this requirement:

<dynamicField name="metric_*" type="int" ... />

Example: metric_2014_06_24 : 15, metric_2014_06_25 : 21, ...

My application then issues a query that: a) sorts documents by the sum of the metrics within a date range that is variable for each query; b) gathers stats on the metrics using the Statistics component. With this design, the app must unfortunately: a) construct the sort as a long list of fields within the spec'd date range to accomplish the sum, e.g. sort=sum(metric_2014_06_24,metric_2014_06_25,...) desc; b) specify each field in the range independently to the Stats component, e.g. stats.field=metric_2014_06_24&stats.field=metric_2014_06_25&... Am I missing a cleaner way to accomplish this given the requirements above? Thanks for any suggestions you may have.
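Whatever the data-model verdict, the long parameter lists described above can at least be generated rather than hand-written. A sketch (Python; the field naming simply follows the metric_YYYY_MM_DD convention from the schema in this thread):

```python
from datetime import date, timedelta

def metric_params(start: date, end: date):
    """Build the sort and stats.field parameters for a variable date
    range, one dynamic field per day."""
    days = []
    d = start
    while d <= end:
        days.append("metric_%04d_%02d_%02d" % (d.year, d.month, d.day))
        d += timedelta(days=1)
    # One sum(...) sort clause plus one stats.field per day.
    params = [("sort", "sum(%s) desc" % ",".join(days))]
    params += [("stats.field", f) for f in days]
    return params

params = metric_params(date(2014, 6, 24), date(2014, 6, 25))
```

The list of (name, value) pairs can then be URL-encoded into the request; it does not remove the underlying cost of sorting on ~548 fields, only the tedium of writing the parameters.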
clearing fieldValueCache in solr 4.6
Hello. We're just starting to use solr in production. We've indexed 18,000 documents or so. We've just implemented faceted search results. We mistakenly stored integer ids in what was meant to be a string field. So, our facet results are showing numbers instead of the textual values. After fixing this oversight, reindexing the documents yields the correct results, but the faceted results still return the integer ids in addition to the enumerated values (the counts with the integer ids are zero). It looks like fieldValueCache is doing this. Is there any way to empty the cache? I've tried reloading the core through the admin, which didn't work, and haven't been able to find (REST-like) API documentation on fieldValueCache. We want to avoid emptying the index, if possible (or if that would even work). thanks! -Matt LeMay
Re: Solr irregularly having QTime 50000ms, stracing solr cures the problem
I know you mentioned you have a single machine at play - but do you have multiple nodes on the machine that talk to one another? Does your problem recur when the load on the system is low? I also faced a similar problem, wherein a 5-second delay (described in detail in my other post) kept happening after a 1.5-minute inactivity interval. This was explained as Solr keeping the HTTP connection for inter-node communication alive for around 1.5 minutes before disconnecting - and if a new request arrives after 1.5 minutes, a new connection is created, which probably suffers latency from a DNS name lookup delay. -- View this message in context: http://lucene.472066.n3.nabble.com/Solr-irregularly-having-QTime-5ms-stracing-solr-cures-the-problem-tp4146047p4147512.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: clearing fieldValueCache in solr 4.6
One thing you could do is:

1. If your current index is called A1, create a new index called A2 with the correct schema.xml / solrconfig.xml.
2. Index your 18,000 documents into A2 afresh.
3. Delete A1 (the bad index).
4. Quickly create an alias with the name A1 pointing to A2.

This way your consumers will still think they are talking to A1, but in fact they will be querying the new index. -- View this message in context: http://lucene.472066.n3.nabble.com/clearing-fieldValueCache-in-solr-4-6-tp4147509p4147514.html Sent from the Solr - User mailing list archive at Nabble.com.
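Step 4 above is a single Collections API call (CREATEALIAS, available since Solr 4.2). A sketch of building that request URL (Python; the base URL is an assumption for illustration):

```python
from urllib.parse import urlencode

def create_alias_url(base_url: str, alias: str, target: str) -> str:
    """Build the Collections API call that points `alias` at the
    collection `target`. base_url is assumed to look like
    http://localhost:8983/solr."""
    qs = urlencode({"action": "CREATEALIAS",
                    "name": alias,
                    "collections": target})
    return "%s/admin/collections?%s" % (base_url.rstrip("/"), qs)

url = create_alias_url("http://localhost:8983/solr", "A1", "A2")
```

Issuing an HTTP GET on that URL (e.g. with curl) performs the alias swap; queries addressed to A1 then resolve to A2.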
Re: Strategies for effective prefix queries?
A copy field does not address my problem, and this has nothing to do with stored fields. This is a query parsing problem, not an indexing problem. Here's the use case. If someone has a username like bob-smith, I would like it to match prefixes of bo and sm. I tokenize the username into the tokens bob and smith. Everything is fine so far. If someone enters bo sm as a search string, I would like bob-smith to be one of the results. The query to do this is straightforward: username:bo* username:sm*. Here's the problem. In order to construct that query, I have to tokenize the search string bo sm **on the client**. I don't want to reimplement tokenization on the client. Is there any way to give Solr the string bo sm, have Solr do the tokenization, then treat each token like a prefix?

On Tue, Jul 15, 2014 at 4:55 PM, Alexandre Rafalovitch arafa...@gmail.com wrote: So copyField it to another field and apply alternative processing there. Use eDismax to search both. No need to store the copied field, just index it. Regards, Alex

On 16/07/2014 2:46 am, Hayden Muhl haydenm...@gmail.com wrote: Both fields? There is only one field here: username.

On Mon, Jul 14, 2014 at 6:17 PM, Alexandre Rafalovitch arafa...@gmail.com wrote: Search against both fields (one split, one not split)? Keep original and tokenized form? I am doing something similar with class name autocompletes here: https://github.com/arafalov/Solr-Javadoc/blob/master/JavadocIndex/JavadocCollection/conf/schema.xml#L24 Regards, Alex. Personal: http://www.outerthoughts.com/ and @arafalov Solr resources: http://www.solr-start.com/ and @solrstart Solr popularizers community: https://www.linkedin.com/groups?gid=6713853

On Tue, Jul 15, 2014 at 8:04 AM, Hayden Muhl haydenm...@gmail.com wrote: I'm working on using Solr for autocompleting usernames. I'm running into a problem with the wildcard queries (e.g. username:al*).
We are tokenizing usernames so that a username like solr-user will be tokenized into solr and user, and will match both sol and use prefixes. The problem is when we get solr-u as a prefix, I'm having to split that up on the client side before I construct a query username:solr* username:u*. I'm basically using a regex as a poor man's tokenizer. Is there a better way to approach this? Is there a way to tell Solr to tokenize a string and use the parts as prefixes? - Hayden
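For reference, the "poor man's tokenizer" described above amounts to something like this sketch (Python; the split pattern is an assumption and only approximates the server-side analyzer, which is exactly the duplication problem being raised):

```python
import re

def prefix_query(field: str, text: str) -> str:
    """Client-side approximation: split the user's input on
    non-alphanumeric characters, lowercase it, and turn each token
    into a prefix clause. Any divergence from the server-side
    analyzer chain will produce mismatched prefixes."""
    tokens = [t for t in re.split(r"[^a-zA-Z0-9]+", text.lower()) if t]
    return " ".join("%s:%s*" % (field, t) for t in tokens)

q = prefix_query("username", "solr-u")   # "username:solr* username:u*"
```

Keeping this in sync with the schema's tokenizer is the maintenance burden the question is trying to avoid; the EdgeNGram approach discussed later in the thread moves that work back to index time.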
Re: Using hundreds of dynamic fields
I guess I'm just a big fan of simpler and cleaner data models! Especially if I were to have to look at somebody's data model and try to make sense out of it, such as how to keep all the fields straight for constructing queries. But atomic update and the need to read and rewrite all the fields is a concern as well. -- Jack Krupansky -Original Message- From: Andy Crossen Sent: Wednesday, July 16, 2014 1:05 PM To: solr-user@lucene.apache.org Subject: Re: Using hundreds of dynamic fields Thanks, Jack and Jared, for your input on this. I'm looking into whether parent-child relationships via block or query time join will meet my requirements. Jack, I noticed in a bunch of other posts around the web that you've suggested to use dynamic fields in moderation. Is this suggestion based on negative performance implications of having to read and rewrite all previous fields for a document when doing atomic updates? Or are there additional inherent negatives to using lots of dynamic fields? Andy On Fri, Jun 27, 2014 at 11:46 AM, Jared Whiklo jared.whi...@umanitoba.ca wrote: This is probably not the best answer, but my gut says that even if you changed your document to a simple 2 fields and have one as your metric and the other as a TrieDateField you would speed up and simplify your date range queries. -- Jared Whiklo On 2014-06-27 10:10 AM, Andy Crossen acros...@gmail.com wrote: Hi folks, My application requires tracking a daily performance metric for all documents. I start tracking for an 18 month window from the time a doc is indexed, so each doc will have ~548 of these fields. I have in my schema a dynamic field to capture this requirement: dynamicField name=“metric_*” type=int …/ Example: metric_2014_06_24 : 15 metric_2014_06_25 : 21 … My application then issues a query that: a) sorts documents by the sum of the metrics within a date range that is variable for each query; b) gathers stats on the metrics using the Statistics component. 
With this design, the app must unfortunately: a) construct the sort as a long list of fields within the spec’d date range to accomplish the sum; e.g. sort=sum(metric_2014_06_24,metric_2014_06_25…) desc b) specify each field in the range independently to the Stats component; e.g. stats.field=metric_2014_06_24stats.field=metric_2014_06_25… Am I missing a cleaner way to accomplish this given the requirements above? Thanks for any suggestions you may have.
Re: questions on Solr WordBreakSolrSpellChecker and WordDelimiterFilterFactory
Which tokenizer are you using? StandardTokenizer will split x-box into x and box, the same as x box. If there aren't too many of these, you could also use PatternReplaceCharFilterFactory to map x box and x-box to xbox before the tokenizer. Diego Fernandez - 爱国 Software Engineer US GSS Supportability - Diagnostics

- Original Message - Jia, I agree that for the spellcheckers to work, you need <arr name="last-components"> instead of <arr name="components">. But the x-box = xbox example ought to be solved by analyzing with WordDelimiterFilterFactory and catenateWords=1 at query time. Did you re-index after changing your analysis chain (you need to)? Perhaps you can show your full analyzer configuration, and someone here can help you find the problem. Also, the Analysis page in the Solr Admin UI is invaluable for debugging text-field analyzer problems. Getting x box to analyze to xbox is trickier (but possible). The WordBreakSpellChecker is probably your best option if you have cases like this in your data / users' queries. Of course, if you have a finite number of products that have spelling variants like this, SynonymFilterFactory might be all you need. I would recommend using index-time synonyms for your case rather than query-time synonyms. James Dyer Ingram Content Group (615) 213-4311

-Original Message- From: Ahmet Arslan [mailto:iori...@yahoo.com.INVALID] Sent: Wednesday, July 16, 2014 7:42 AM To: solr-user@lucene.apache.org; j...@ece.ubc.ca Subject: Re: questions on Solr WordBreakSolrSpellChecker and WordDelimiterFilterFactory Hi Jia, What happens when you use <arr name="last-components"> instead of <arr name="components">? Ahmet

On Wednesday, July 16, 2014 3:07 AM, j...@ece.ubc.ca j...@ece.ubc.ca wrote: Hello everyone :) I have a product called xbox indexed, and when the user searches for either x-box or x box I want the xbox product to be returned.
I'm new to Solr, and from reading online, I thought I needed to use WordDelimiterFilterFactory for the x-box case and WordBreakSolrSpellChecker for the x box case. Is this correct? (1) In my schema file, this is what I changed:

<filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="1" splitOnCaseChange="0" preserveOriginal="1"/>

But I don't see the xbox product returned when the search term is x-box, so I must have missed something. (2) I tried to use WordBreakSolrSpellChecker together with DirectSolrSpellChecker as shown below, but the WordBreakSolrSpellChecker never got used:

<searchComponent name="wc_spellcheck" class="solr.SpellCheckComponent">
  <str name="queryAnalyzerFieldType">wc_textSpell</str>
  <lst name="spellchecker">
    <str name="name">default</str>
    <str name="field">spellCheck</str>
    <str name="classname">solr.DirectSolrSpellChecker</str>
    <str name="distanceMeasure">internal</str>
    <float name="accuracy">0.3</float>
    <int name="maxEdits">2</int>
    <int name="minPrefix">1</int>
    <int name="maxInspections">5</int>
    <int name="minQueryLength">3</int>
    <float name="maxQueryFrequency">0.01</float>
    <float name="thresholdTokenFrequency">0.004</float>
  </lst>
  <lst name="spellchecker">
    <str name="name">wordbreak</str>
    <str name="classname">solr.WordBreakSolrSpellChecker</str>
    <str name="field">spellCheck</str>
    <str name="combineWords">true</str>
    <str name="breakWords">true</str>
    <int name="maxChanges">10</int>
  </lst>
</searchComponent>

<requestHandler name="/spellcheck" class="org.apache.solr.handler.component.SearchHandler">
  <lst name="defaults">
    <str name="df">SpellCheck</str>
    <str name="spellcheck">true</str>
    <str name="spellcheck.dictionary">default</str>
    <str name="spellcheck.dictionary">wordbreak</str>
    <str name="spellcheck.build">true</str>
    <str name="spellcheck.onlyMorePopular">false</str>
    <str name="spellcheck.count">10</str>
    <str name="spellcheck.collate">true</str>
    <str name="spellcheck.collateExtendedResults">false</str>
  </lst>
  <arr name="components">
    <str>wc_spellcheck</str>
  </arr>
</requestHandler>

I tried to build the dictionary this way:
http://localhost/solr/coreName/select?spellcheck=true&spellcheck.build=true, but the response returned is this:

<response>
  <lst name="responseHeader">
    <int name="status">0</int>
    <int name="QTime">0</int>
    <lst name="params">
      <str name="spellcheck.build">true</str>
      <str name="spellcheck">true</str>
    </lst>
  </lst>
  <str name="command">build</str>
  <result name="response" numFound="0" start="0"/>
</response>

What's the correct way to build the dictionary? Even though my requestHandler's name is /spellcheck, I wasn't able to use http://localhost/solr/coreName/spellcheck?spellcheck=true&spellcheck.build=true .. is there something wrong with my definition above? (3) I also tried to use WordBreakSolrSpellChecker without the DirectSolrSpellChecker as shown below: searchComponent
Updating Oracle
Hi, I am new to Solr, so I just want to know if something is possible. I might need some help coding later on, after taking the tutorials. I am taking over a program that uses HTML and JavaScript to display metadata from Solr. They now would like to update one field. The Solr database gets refreshed weekly from an Oracle database, so in order to save the changes, the Oracle database needs to be updated. But to keep the updated field visible to the users, I would like to update Solr with the changes and then update Oracle. Can Solr fire a database trigger on the updated field to update Oracle? What is that called in Solr, and can you point me to an example? The other option is to add/modify the code to update Oracle and Solr from the application, but this would be a lot of work. If this is the only option, can you point to an example of updating a field in Solr? What are my options? Thanks. -- View this message in context: http://lucene.472066.n3.nabble.com/Updating-Oracle-tp4147537.html Sent from the Solr - User mailing list archive at Nabble.com.
Shard Replicas not getting replicated data from leader
Hi, I have set up 4 Solr (4.9.0) nodes into a single shard for a given collection, meaning I should have 4 replicated nodes. I have 3 ZooKeepers in an ensemble managing the configs for this collection. I have a load balancer in front of the 4 nodes to split traffic between them. I start this collection with an empty data/index directory. When I send /update requests to the load balancer, I see these going to all 4 nodes. Also, I can see that all FOLLOWERs forward the requests they receive to the LEADER, as is expected. But for some reason the FOLLOWERs are not getting /replication requests from the LEADER. So the collection on the leader contains many thousands of documents and is on the 8th generation. I see that it's replicable in the admin interface, yet all FOLLOWER nodes have an empty index. Hence, I need your insights please. Thanks, Marc

To note: when I start up my nodes I see the following in solr.log: 1) When ZooKeeper does a clusterstate update, all nodes have their state DOWN - why? This means that in the Solr Admin interface they show up as down. This never updates to active.
2) I have a warning: org.apache.solr.rest.ManagedResource; No registered observers for /rest/managed, which I need to update solrconfig.xml to fix. 3) I have the following error:

ERROR - 2014-07-16 19:49:25.336; org.apache.solr.cloud.SyncStrategy; No UpdateLog found - cannot sync

SOLR.LOG -
[]
INFO - 2014-07-16 19:47:30.870; org.apache.solr.cloud.Overseer$ClusterStateUpdater; Update state numShards=null message={ operation:state, state:down, base_url:http://192.168.150.90:8983/solr, core:collection_name, roles:null, node_name:192.168.150.90:8983_solr, shard:null, collection:collection_name, numShards:null, core_node_name:null}
INFO - 2014-07-16 19:47:30.871; org.apache.solr.cloud.Overseer$ClusterStateUpdater; node=core_node1 is already registered
[]
WARN - 2014-07-16 19:47:34.535; org.apache.solr.rest.ManagedResource; No registered observers for /rest/managed
[]
INFO - 2014-07-16 19:48:25.135; org.apache.solr.common.cloud.ZkStateReader$3; Updating live nodes... (2)
INFO - 2014-07-16 19:48:25.287; org.apache.solr.cloud.DistributedQueue$LatchChildWatcher; LatchChildWatcher fired on path: /overseer/queue state: SyncConnected type NodeChildrenChanged
INFO - 2014-07-16 19:48:25.291; org.apache.solr.common.cloud.ZkStateReader; Updating cloud state from ZooKeeper...
INFO - 2014-07-16 19:48:25.293; org.apache.solr.cloud.Overseer$ClusterStateUpdater; Update state numShards=null message={ operation:state, state:down, base_url:http://192.168.200.90:8983/solr, core:collection_name, roles:null, node_name:192.168.200.90:8983_solr, shard:null, collection:collection_name, numShards:null, core_node_name:null}
INFO - 2014-07-16 19:48:25.293; org.apache.solr.cloud.Overseer$ClusterStateUpdater; node=core_node2 is already registered
INFO - 2014-07-16 19:48:25.293; org.apache.solr.cloud.Overseer$ClusterStateUpdater; shard=shard1 is already registered
[]
INFO - 2014-07-16 19:49:00.188; org.apache.solr.common.cloud.ZkStateReader$3; Updating live nodes... (3)
INFO - 2014-07-16 19:49:00.322; org.apache.solr.cloud.DistributedQueue$LatchChildWatcher; LatchChildWatcher fired on path: /overseer/queue state: SyncConnected type NodeChildrenChanged
INFO - 2014-07-16 19:49:00.335; org.apache.solr.common.cloud.ZkStateReader; Updating cloud state from ZooKeeper...
INFO - 2014-07-16 19:49:00.337; org.apache.solr.cloud.Overseer$ClusterStateUpdater; Update state numShards=null message={ operation:state, state:down, base_url:http://192.168.200.91:8983/solr, core:collection_name, roles:null, node_name:192.168.200.91:8983_solr, shard:null, collection:collection_name, numShards:null, core_node_name:null}
INFO - 2014-07-16 19:49:00.337; org.apache.solr.cloud.Overseer$ClusterStateUpdater; node=core_node3 is already registered
INFO - 2014-07-16 19:49:00.337; org.apache.solr.cloud.Overseer$ClusterStateUpdater; shard=shard1 is already registered
[]
INFO - 2014-07-16 19:49:21.220; org.apache.solr.common.cloud.ZkStateReader$3; Updating live nodes... (4)
INFO - 2014-07-16 19:49:21.350; org.apache.solr.cloud.DistributedQueue$LatchChildWatcher; LatchChildWatcher fired on path: /overseer/queue state: SyncConnected type NodeChildrenChanged
INFO - 2014-07-16 19:49:21.357; org.apache.solr.common.cloud.ZkStateReader; Updating cloud state from ZooKeeper...
INFO - 2014-07-16 19:49:21.359; org.apache.solr.cloud.Overseer$ClusterStateUpdater; Update state numShards=null message={ operation:state, state:down, base_url:http://192.168.150.91:8983/solr, core:collection_name, roles:null, node_name:192.168.150.91:8983_solr, shard:null, collection:collection_name, numShards:null, core_node_name:null}
INFO - 2014-07-16 19:49:21.359; org.apache.solr.cloud.Overseer$ClusterStateUpdater; node=core_node4 is already registered
INFO
Re: Updating Oracle
On 7/16/2014 1:45 PM, Jason Bourne wrote: I am new to Solr, so I just want to know if something is possible. I might need some help coding later on, after taking the tutorials. I am taking over a program that uses HTML and JavaScript to display metadata from Solr. They now would like to update one field. The Solr database gets refreshed weekly from an Oracle database, so in order to save the changes, the Oracle database needs to be updated. But to keep the updated field visible to the users, I would like to update Solr with the changes and then update Oracle. Can Solr fire a database trigger on the updated field to update Oracle? What is that called in Solr, and can you point me to an example? The other option is to add/modify the code to update Oracle and Solr from the application, but this would be a lot of work. If this is the only option, can you point to an example of updating a field in Solr? What are my options? Thanks.

It would be better to have systems outside of Solr manage this: make the change in Oracle and Solr at the same time. If you wanted to manage it from within Solr, you could write a custom update processor that looks at all of the updates that come into Solr, decides which of them require changes in Oracle, and makes those changes. You would then include that custom update processor in Solr. Most likely that would be Java code written against the solr-core API and packaged into a jar file. You would then include that jar in your classpath and reference the class in an update chain configuration. https://wiki.apache.org/solr/UpdateRequestProcessor Thanks, Shawn
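If you go the route of updating both systems from the application, the Solr side of a single-field change is an atomic update (supported since Solr 4.0, provided the schema allows it). A minimal sketch of building the request body (Python; the unique-key field name "id" is an assumption):

```python
import json

def solr_atomic_update(doc_id, field, value):
    """Build the JSON body for a Solr 4.x atomic update that sets one
    field on one document. POST this to /update with
    Content-Type: application/json. The "id" key is assumed to be the
    schema's uniqueKey field."""
    return json.dumps([{"id": doc_id, field: {"set": value}}])

body = solr_atomic_update("prod-42", "title", "New title")
```

The application would send this to Solr and, in the same transaction path, run the corresponding UPDATE against Oracle, so the two stores stay in sync between weekly refreshes.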
Upper or Lower Case
Hi, if I search 'Transmission Flush' I get good match results, but when I use 'transmission flush' I get a different order of results. I looked at the Name column in the schema, and it has the config below for the field type. Any clue what is wrong, or are there any config changes needed to get the same results?

<fieldType name="text_general" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <charFilter class="solr.HTMLStripCharFilterFactory"/>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
    <filter class="solr.KStemFilterFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="0" generateNumberParts="0" splitOnCaseChange="0" splitOnNumerics="0" stemEnglishPossessive="0" catenateWords="1" catenateNumbers="1" catenateAll="1" preserveOriginal="0"/>
    <!-- in this example, we will only use synonyms at query time
    <filter class="solr.SynonymFilterFactory" synonyms="index_synonyms.txt" ignoreCase="true" expand="false"/>
    -->
  </analyzer>
  <analyzer type="query">
    <charFilter class="solr.HTMLStripCharFilterFactory"/>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.KStemFilterFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="0" generateNumberParts="0" splitOnCaseChange="0" splitOnNumerics="0" stemEnglishPossessive="0" catenateWords="1" catenateNumbers="1" catenateAll="1" preserveOriginal="0"/>
  </analyzer>
</fieldType>
Re: Memory leak for debugQuery?
Tom - You could maybe isolate it a little further by using the "debug" parameter with values of timing|query|results. Erik

On May 15, 2014, at 5:50 PM, Tom Burton-West tburt...@umich.edu wrote: Hello all, I'm trying to get relevance scoring information for each of 1,000 docs returned for each of 250 queries. If I run the query (appended below) without debugQuery=on, I have no problem getting all the results with under 4 GB of memory use. If I add the parameter debugQuery=on, memory use goes up continuously, and after about 20 queries (with 1,000 results each), memory use reaches about 29.1 GB and the garbage collector gives up: org.apache.solr.common.SolrException; null:java.lang.RuntimeException: java.lang.OutOfMemoryError: GC overhead limit exceeded. I've attached a jmap -histo, excerpt below. Is this a known issue with debugQuery? Tom

query: q=Abraham+Lincoln&fl=id,score&indent=on&wt=json&start=0&rows=1000&version=2.2&debugQuery=on
without debugQuery=on: q=Abraham+Lincoln&fl=id,score&indent=on&wt=json&start=0&rows=1000&version=2.2

num   #instances   #bytes   Class description
--
1:   585,559   10,292,067,456   byte[]
2:   743,639   18,874,349,592   char[]
3:   53,821   91,936,328   long[]
4:   70,430   69,234,400   int[]
5:   51,348   27,111,744   org.apache.lucene.util.fst.FST$Arc[]
6:   286,357   20,617,704   org.apache.lucene.util.fst.FST$Arc
7:   715,364   17,168,736   java.lang.String
8:   79,561   12,547,792   * ConstMethodKlass
9:   18,909   11,404,696   short[]
10:   345,854   11,067,328   java.util.HashMap$Entry
11:   8,823   10,351,024   * ConstantPoolKlass
12:   79,561   10,193,328   * MethodKlass
13:   228,587   9,143,480   org.apache.lucene.document.FieldType
14:   228,584   9,143,360   org.apache.lucene.document.Field
15:   368,423   8,842,152   org.apache.lucene.util.BytesRef
16:   210,342   8,413,680   java.util.TreeMap$Entry
17:   81,576   8,204,648   java.util.HashMap$Entry[]
18:   107,921   7,770,312   org.apache.lucene.util.fst.FST$Arc
19:   13,020   6,874,560   org.apache.lucene.util.fst.FST$Arc[]

debugQuery_jmap.txt
Re: Upper or Lower Case
Hi, you need to put the LowerCaseFilterFactory before the KStemFilterFactory in your query analyzer. Ahmet

On Wednesday, July 16, 2014 11:55 PM, EXTERNAL Taminidi Ravi (ETI, Automotive-Service-Solutions) external.ravi.tamin...@us.bosch.com wrote: Hi, if I search 'Transmission Flush' I get good match results, but when I use 'transmission flush' I get a different order of results. I looked at the Name column in the schema, and it has the config below for the field type. Any clue what is wrong, or are there any config changes needed to get the same results?
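A toy illustration of why the filter order matters here (Python; `toy_stem` is a hypothetical case-sensitive stemmer standing in for KStem, which operates on lowercase word forms - it is not KStem's actual algorithm):

```python
def toy_stem(token: str) -> str:
    """Hypothetical stand-in for a case-sensitive stemmer: it only
    recognizes lowercase inflections ending in 'es'."""
    return token[:-2] if token.endswith("es") and token.islower() else token

def analyze(text, filters):
    """Apply a chain of token filters to whitespace-split tokens,
    mimicking an analyzer chain."""
    tokens = text.split()
    for f in filters:
        tokens = [f(t) for t in tokens]
    return tokens

lower = str.lower
# Order from the schema in question: stem first, then lowercase.
bad = analyze("Transmission Flushes", [toy_stem, lower])
# Fixed order: lowercase first, then stem.
good = analyze("Transmission Flushes", [lower, toy_stem])
```

With stem-first, the capitalized query produces different tokens than the lowercase query, which is exactly the symptom reported: 'Transmission Flush' and 'transmission flush' end up matching differently.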
Re: Memory leak for debugQuery?
Also, is this trunk? Solr 4.x? Single shard, right?

On Wed, Jul 16, 2014 at 2:24 PM, Erik Hatcher erik.hatc...@gmail.com wrote: Tom - You could maybe isolate it a little further by using the "debug" parameter with values of timing|query|results. Erik

On May 15, 2014, at 5:50 PM, Tom Burton-West tburt...@umich.edu wrote: Hello all, I'm trying to get relevance scoring information for each of 1,000 docs returned for each of 250 queries. If I run the query (appended below) without debugQuery=on, I have no problem getting all the results with under 4 GB of memory use. If I add the parameter debugQuery=on, memory use goes up continuously, and after about 20 queries (with 1,000 results each), memory use reaches about 29.1 GB and the garbage collector gives up: org.apache.solr.common.SolrException; null:java.lang.RuntimeException: java.lang.OutOfMemoryError: GC overhead limit exceeded. I've attached a jmap -histo, excerpt below. Is this a known issue with debugQuery? Tom
Re: Strategies for effective prefix queries?
Your first and last email seem to be contradicting. You said initially you wanted to search for solr-u and match that. Now you are saying you want to search bo sm and match that. Either way, I do have very similar scenario working in the project I sent you a link to. I am breaking on full-stops and case changes for Javadoc names. You can try it live for yourself here: http://www.solr-start.com/javadoc/solr-lucene/index.html (Search for To Fi to match for TokenFilter). Regards, Alex Personal: http://www.outerthoughts.com/ and @arafalov Solr resources: http://www.solr-start.com/ and @solrstart Solr popularizers community: https://www.linkedin.com/groups?gid=6713853 On Thu, Jul 17, 2014 at 1:00 AM, Hayden Muhl haydenm...@gmail.com wrote: A copy field does not address my problem, and this has nothing to do with stored fields. This is a query parsing problem, not an indexing problem. Here's the use case. If someone has a username like bob-smith, I would like it to match prefixes of bo and sm. I tokenize the username into the tokens bob and smith. Everything is fine so far. If someone enters bo sm as a search string, I would like bob-smith to be one of the results. The query to do this is straight forward, username:bo* username:sm*. Here's the problem. In order to construct that query, I have to tokenize the search string bo sm **on the client**. I don't want to reimplement tokenization on the client. Is there any way to give Solr the string bo sm, have Solr do the tokenization, then treat each token like a prefix? On Tue, Jul 15, 2014 at 4:55 PM, Alexandre Rafalovitch arafa...@gmail.com wrote: So copyField it to another and apply alternative processing there. Use eDismax to search both. No need to store the copied field, just index it. Regards, Alex On 16/07/2014 2:46 am, Hayden Muhl haydenm...@gmail.com wrote: Both fields? There is only one field here: username. 
On Mon, Jul 14, 2014 at 6:17 PM, Alexandre Rafalovitch arafa...@gmail.com wrote: Search against both fields (one split, one not split)? Keep original and tokenized form? I am doing something similar with class name autocompletes here: https://github.com/arafalov/Solr-Javadoc/blob/master/JavadocIndex/JavadocCollection/conf/schema.xml#L24 Regards, Alex. Personal: http://www.outerthoughts.com/ and @arafalov Solr resources: http://www.solr-start.com/ and @solrstart Solr popularizers community: https://www.linkedin.com/groups?gid=6713853 On Tue, Jul 15, 2014 at 8:04 AM, Hayden Muhl haydenm...@gmail.com wrote: I'm working on using Solr for autocompleting usernames. I'm running into a problem with the wildcard queries (e.g. username:al*). We are tokenizing usernames so that a username like solr-user will be tokenized into solr and user, and will match both sol and use prefixes. The problem is when we get solr-u as a prefix, I'm having to split that up on the client side before I construct a query username:solr* username:u*. I'm basically using a regex as a poor man's tokenizer. Is there a better way to approach this? Is there a way to tell Solr to tokenize a string and use the parts as prefixes? - Hayden
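The client-side workaround Hayden describes ("a regex as a poor man's tokenizer") can be sketched in a few lines of Python. This is only an illustration of the problem, not a recommendation: prefix_query is a hypothetical helper, and the regex is a rough stand-in for Solr's real analysis chain, which is exactly the duplication he wants to avoid.

```python
import re

def prefix_query(user_input, field="username"):
    # Split on runs of non-alphanumeric characters - a crude approximation
    # of the hyphen/whitespace tokenization the username field uses in Solr.
    tokens = [t for t in re.split(r"[^A-Za-z0-9]+", user_input) if t]
    # Turn each token into a wildcard prefix clause.
    return " ".join("%s:%s*" % (field, t.lower()) for t in tokens)

print(prefix_query("solr-u"))  # username:solr* username:u*
print(prefix_query("bo sm"))   # username:bo* username:sm*
```

Any drift between this regex and the server-side analyzer silently changes which documents match, which is why doing the tokenization in Solr itself is the better answer.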
Re: Strategies for effective prefix queries?
Perhaps what you're trying to do could be addressed by using the EdgeNGramFilterFactory filter? For query suggestions I'm using a very similar approach; this is an extract of the configuration I'm using:

<tokenizer class="solr.StandardTokenizerFactory"/>
<filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
<filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="0" catenateNumbers="0" catenateAll="0" splitOnCaseChange="1"/>
<filter class="solr.LowerCaseFilterFactory"/>
<filter class="solr.EdgeNGramFilterFactory" maxGramSize="10" minGramSize="1"/>

Basically this allows you to get partial matches from any part of the string. Let's say the field gets this content at index time: "A brown fox"; this document will then be matched by the query "bro", for instance. My personal recommendation is to use this in a separate field that gets populated through a copyField; this way you could apply different boosts. Greetings, On Jul 16, 2014, at 2:00 PM, Hayden Muhl haydenm...@gmail.com wrote: A copy field does not address my problem, and this has nothing to do with stored fields. This is a query parsing problem, not an indexing problem. Here's the use case. If someone has a username like bob-smith, I would like it to match the prefixes bo and sm. I tokenize the username into the tokens bob and smith. Everything is fine so far. If someone enters bo sm as a search string, I would like bob-smith to be one of the results. The query to do this is straightforward: username:bo* username:sm*. Here's the problem. In order to construct that query, I have to tokenize the search string bo sm **on the client**. I don't want to reimplement tokenization on the client. Is there any way to give Solr the string bo sm, have Solr do the tokenization, then treat each token like a prefix? On Tue, Jul 15, 2014 at 4:55 PM, Alexandre Rafalovitch arafa...@gmail.com wrote: So copyField it to another field and apply alternative processing there. Use eDismax to search both.
No need to store the copied field, just index it. Regards, Alex On 16/07/2014 2:46 am, Hayden Muhl haydenm...@gmail.com wrote: Both fields? There is only one field here: username. On Mon, Jul 14, 2014 at 6:17 PM, Alexandre Rafalovitch arafa...@gmail.com wrote: Search against both fields (one split, one not split)? Keep original and tokenized form? I am doing something similar with class name autocompletes here: https://github.com/arafalov/Solr-Javadoc/blob/master/JavadocIndex/JavadocCollection/conf/schema.xml#L24 Regards, Alex. Personal: http://www.outerthoughts.com/ and @arafalov Solr resources: http://www.solr-start.com/ and @solrstart Solr popularizers community: https://www.linkedin.com/groups?gid=6713853 On Tue, Jul 15, 2014 at 8:04 AM, Hayden Muhl haydenm...@gmail.com wrote: I'm working on using Solr for autocompleting usernames. I'm running into a problem with the wildcard queries (e.g. username:al*). We are tokenizing usernames so that a username like solr-user will be tokenized into solr and user, and will match both sol and use prefixes. The problem is when we get solr-u as a prefix, I'm having to split that up on the client side before I construct a query username:solr* username:u*. I'm basically using a regex as a poor man's tokenizer. Is there a better way to approach this? Is there a way to tell Solr to tokenize a string and use the parts as prefixes? - Hayden VII Escuela Internacional de Verano en la UCI del 30 de junio al 11 de julio de 2014. Ver www.uci.cu
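To see why the EdgeNGramFilterFactory suggestion removes the need for client-side wildcards, here is a minimal Python sketch of what edge n-grams do at index time. It is a simplification: a whitespace split plus lowercasing stands in for the StandardTokenizer and WordDelimiter filter in the quoted chain.

```python
def edge_ngrams(token, min_gram=1, max_gram=10):
    # Leading-edge n-grams, e.g. "brown" -> b, br, bro, brow, brown.
    return [token[:n] for n in range(min_gram, min(max_gram, len(token)) + 1)]

def analyze(text, min_gram=1, max_gram=10):
    # Simplified index-time analysis: whitespace tokenize, lowercase,
    # then expand every token into its edge n-grams.
    grams = []
    for token in text.lower().split():
        grams.extend(edge_ngrams(token, min_gram, max_gram))
    return grams

terms = analyze("A brown fox")
# "bro" is now an indexed term, so the plain query "bro" matches the
# document with no wildcard needed at query time.
print("bro" in terms)  # True
```

Because every prefix up to maxGramSize is indexed as a real term, the query analyzer can stay simple and the client never has to tokenize anything; the trade-off is a larger index.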
Re: problem with replication/solrcloud - getting 'missing required field' during update intermittently (SOLR-6251)
FYI. We finally tracked down the problem (at least 99.9% sure at this point), and it was staring me in the face the whole time - just never noticed:

[{"id":"4b2c4d09-31e2-4fe2-b767-3868efbdcda1","channel": {"add": "preet"},"channel": {"add": "adam"}}]

Look at the JSON... It's trying to add two channel array elements under duplicate keys. Should have been:

[{"id":"4b2c4d09-31e2-4fe2-b767-3868efbdcda1","channel": {"add": "preet"}}, {"id":"4b2c4d09-31e2-4fe2-b767-3868efbdcda1","channel": {"add": "adam"}}]

I half wonder how it chose to interpret that particular chunk of JSON, but either way, I think the origin of our issue is resolved. From what I'm reading on JSON, this isn't valid syntax at all. I'm guessing that Solr doesn't actually validate the JSON, and its parser is just creating something weird in that situation, like a new request for a whole new document. -- Nathan On 07/15/2014 07:19 PM, Nathan Neulinger wrote: The issue was closed in Jira with a request that it be discussed here first. Looking for any diagnostic assistance on this issue with 4.8.0, since it is intermittent and occurs without warning. The setup is two nodes with an external ZK ensemble. Nodes are accessed round-robin on EC2 behind an ELB. The schema has:

<schema name="hive" version="1.5"> ... <field name="timestamp" type="long" indexed="false" stored="true" required="true" multiValued="false" omitNorms="true"/> ...

Most of the updates are working without issue, but randomly we'll get the above failure, even though searches before and after the update clearly indicate that the document had the timestamp field in it. The error occurs when the second node does its distrib operation against the first node. Diagnostic details are all in the Jira issue. Can provide more as needed, but would appreciate any suggestions on what to try or how to help diagnose this, other than just throwing thousands of requests at it round-robin between the two instances to see if it's possible to reproduce the issue.
-- Nathan Nathan Neulinger nn...@neulinger.org Neulinger Consulting (573) 612-1412 -- Nathan Neulinger nn...@neulinger.org Neulinger Consulting (573) 612-1412
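Nathan's guess about the parser is easy to reproduce outside Solr: most JSON parsers accept duplicate keys and silently keep only the last value, so the first atomic update can vanish before any validation could catch it. A quick illustration with Python's json module, using the payloads from the message above (how Solr's own parser handles duplicates may differ; this only shows the payload is not rejected by a typical parser):

```python
import json

# The malformed payload: two "channel" keys inside one document object.
bad = ('[{"id":"4b2c4d09-31e2-4fe2-b767-3868efbdcda1",'
       '"channel": {"add": "preet"},"channel": {"add": "adam"}}]')

docs = json.loads(bad)  # parses without any complaint
# Only the last duplicate key survives; the "preet" update is gone.
print(docs[0]["channel"])  # {'add': 'adam'}

# The corrected payload: one document object per atomic update.
good = ('[{"id":"4b2c4d09-31e2-4fe2-b767-3868efbdcda1","channel": {"add": "preet"}},'
        ' {"id":"4b2c4d09-31e2-4fe2-b767-3868efbdcda1","channel": {"add": "adam"}}]')
print(len(json.loads(good)))  # 2
```

This silent last-value-wins behavior is why the failure looked intermittent: it depended entirely on which updates happened to be batched into one object.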
Inconsistent results on Solr Cloud 4.8
Hi, We are using Solr Cloud with Solr version 4.8; we have 2 shard / 2 replica servers in the cluster. During two consecutive requests to the Solr Cloud cluster, the total result count varies. 1) As per my understanding, this can happen when the leader and the replica have an inconsistent number of results. 2) This inconsistent number of docs between leader and replica can happen only when the replica is recovering. Should a request be sent to a node which is recovering? Since this is happening on our live setup, we tend to question how much we can rely on Solr. What could be causing this, and what's the fix? Regards
Script Transformer Help
Hi All, I have data-config.xml as below (Script Transformer is omitted):

<dataConfig>
  <dataSource driver="org.hsqldb.jdbcDriver" url="jdbc:hsqldb:/temp/example/ex" user="sa"/>
  <script><![CDATA[function f1(row){ row.put('message', 'Hello World!'); return row; }]]></script>
  <document name="products">
    <entity name="item" query="select NAME,BSIN from items" transformer="script:f1">
      <field column="NAME" name="id"/>
      <field column="BSIN" name="bsin"/>
      <entity name="brands" query="select brandname from brand where bsin='${item.BSIN}'">
        <field name="brand" column="BRAND"/>
        <field name="cname" column="namedesc"/>
      </entity>
    </entity>
  </document>
</dataConfig>

I am able to access NAME and BSIN in the function f1. I am not able to access brand and cname. Is there any way I can access brand and cname from the child entity in the script transformer? Thanks in advance. Regards, Pavan .P.Patharde
Re: Script Transformer Help
Have you tried putting the transformer on the inner entity definition? It's like a nested loop, and you have only put it on the outer loop. Regards, Alex. Personal: http://www.outerthoughts.com/ and @arafalov Solr resources: http://www.solr-start.com/ and @solrstart Solr popularizers community: https://www.linkedin.com/groups?gid=6713853 On Thu, Jul 17, 2014 at 11:29 AM, pavan patharde pathardepa...@gmail.com wrote: Hi All, I have data-config.xml as below (Script Transformer is omitted):

<dataConfig>
  <dataSource driver="org.hsqldb.jdbcDriver" url="jdbc:hsqldb:/temp/example/ex" user="sa"/>
  <script><![CDATA[function f1(row){ row.put('message', 'Hello World!'); return row; }]]></script>
  <document name="products">
    <entity name="item" query="select NAME,BSIN from items" transformer="script:f1">
      <field column="NAME" name="id"/>
      <field column="BSIN" name="bsin"/>
      <entity name="brands" query="select brandname from brand where bsin='${item.BSIN}'">
        <field name="brand" column="BRAND"/>
        <field name="cname" column="namedesc"/>
      </entity>
    </entity>
  </document>
</dataConfig>

I am able to access NAME and BSIN in the function f1. I am not able to access brand and cname. Is there any way I can access brand and cname from the child entity in the script transformer? Thanks in advance. Regards, Pavan .P.Patharde
Re: problem with replication/solrcloud - getting 'missing required field' during update intermittently (SOLR-6251)
Phew, thanks for tracking it down. On Thu, Jul 17, 2014 at 7:50 AM, Nathan Neulinger nn...@neulinger.org wrote: FYI. We finally tracked down the problem (at least 99.9% sure at this point), and it was staring me in the face the whole time - just never noticed:

[{"id":"4b2c4d09-31e2-4fe2-b767-3868efbdcda1","channel": {"add": "preet"},"channel": {"add": "adam"}}]

Look at the JSON... It's trying to add two channel array elements under duplicate keys. Should have been:

[{"id":"4b2c4d09-31e2-4fe2-b767-3868efbdcda1","channel": {"add": "preet"}}, {"id":"4b2c4d09-31e2-4fe2-b767-3868efbdcda1","channel": {"add": "adam"}}]

I half wonder how it chose to interpret that particular chunk of JSON, but either way, I think the origin of our issue is resolved. From what I'm reading on JSON, this isn't valid syntax at all. I'm guessing that Solr doesn't actually validate the JSON, and its parser is just creating something weird in that situation, like a new request for a whole new document. -- Nathan On 07/15/2014 07:19 PM, Nathan Neulinger wrote: The issue was closed in Jira with a request that it be discussed here first. Looking for any diagnostic assistance on this issue with 4.8.0, since it is intermittent and occurs without warning. The setup is two nodes with an external ZK ensemble. Nodes are accessed round-robin on EC2 behind an ELB. The schema has:

<schema name="hive" version="1.5"> ... <field name="timestamp" type="long" indexed="false" stored="true" required="true" multiValued="false" omitNorms="true"/> ...

Most of the updates are working without issue, but randomly we'll get the above failure, even though searches before and after the update clearly indicate that the document had the timestamp field in it. The error occurs when the second node does its distrib operation against the first node. Diagnostic details are all in the Jira issue.
Can provide more as needed, but would appreciate any suggestions on what to try or to help diagnose this other than just trying to throw thousands of requests at it in round-robin between the two instances to see if it's possible to reproduce the issue. -- Nathan Nathan Neulinger nn...@neulinger.org Neulinger Consulting (573) 612-1412 -- Nathan Neulinger nn...@neulinger.org Neulinger Consulting (573) 612-1412 -- Regards, Shalin Shekhar Mangar.
Re: Script Transformer Help
That's a good idea, Alexandre. I will try it and update with the results. Thanks. Pavan .P.Patharde Phone: 9844626450 On Thu, Jul 17, 2014 at 10:08 AM, Alexandre Rafalovitch arafa...@gmail.com wrote: Have you tried putting the transformer on the inner entity definition? It's like a nested loop, and you have only put it on the outer loop. Regards, Alex. Personal: http://www.outerthoughts.com/ and @arafalov Solr resources: http://www.solr-start.com/ and @solrstart Solr popularizers community: https://www.linkedin.com/groups?gid=6713853 On Thu, Jul 17, 2014 at 11:29 AM, pavan patharde pathardepa...@gmail.com wrote: Hi All, I have data-config.xml as below (Script Transformer is omitted):

<dataConfig>
  <dataSource driver="org.hsqldb.jdbcDriver" url="jdbc:hsqldb:/temp/example/ex" user="sa"/>
  <script><![CDATA[function f1(row){ row.put('message', 'Hello World!'); return row; }]]></script>
  <document name="products">
    <entity name="item" query="select NAME,BSIN from items" transformer="script:f1">
      <field column="NAME" name="id"/>
      <field column="BSIN" name="bsin"/>
      <entity name="brands" query="select brandname from brand where bsin='${item.BSIN}'">
        <field name="brand" column="BRAND"/>
        <field name="cname" column="namedesc"/>
      </entity>
    </entity>
  </document>
</dataConfig>

I am able to access NAME and BSIN in the function f1. I am not able to access brand and cname. Is there any way I can access brand and cname from the child entity in the script transformer? Thanks in advance. Regards, Pavan .P.Patharde
Re: Strategies for effective prefix queries?
Thank you, Jorge. I didn't know about that filter. It's just what I was looking for. - Hayden On Wed, Jul 16, 2014 at 4:35 PM, Jorge Luis Betancourt Gonzalez jlbetanco...@uci.cu wrote: Perhaps what you're trying to do could be addressed by using the EdgeNGramFilterFactory filter? For query suggestions I'm using a very similar approach; this is an extract of the configuration I'm using:

<tokenizer class="solr.StandardTokenizerFactory"/>
<filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
<filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="0" catenateNumbers="0" catenateAll="0" splitOnCaseChange="1"/>
<filter class="solr.LowerCaseFilterFactory"/>
<filter class="solr.EdgeNGramFilterFactory" maxGramSize="10" minGramSize="1"/>

Basically this allows you to get partial matches from any part of the string. Let's say the field gets this content at index time: "A brown fox"; this document will then be matched by the query "bro", for instance. My personal recommendation is to use this in a separate field that gets populated through a copyField; this way you could apply different boosts. Greetings, On Jul 16, 2014, at 2:00 PM, Hayden Muhl haydenm...@gmail.com wrote: A copy field does not address my problem, and this has nothing to do with stored fields. This is a query parsing problem, not an indexing problem. Here's the use case. If someone has a username like bob-smith, I would like it to match the prefixes bo and sm. I tokenize the username into the tokens bob and smith. Everything is fine so far. If someone enters bo sm as a search string, I would like bob-smith to be one of the results. The query to do this is straightforward: username:bo* username:sm*. Here's the problem. In order to construct that query, I have to tokenize the search string bo sm **on the client**. I don't want to reimplement tokenization on the client. Is there any way to give Solr the string bo sm, have Solr do the tokenization, then treat each token like a prefix?
On Tue, Jul 15, 2014 at 4:55 PM, Alexandre Rafalovitch arafa...@gmail.com wrote: So copyField it to another and apply alternative processing there. Use eDismax to search both. No need to store the copied field, just index it. Regards, Alex On 16/07/2014 2:46 am, Hayden Muhl haydenm...@gmail.com wrote: Both fields? There is only one field here: username. On Mon, Jul 14, 2014 at 6:17 PM, Alexandre Rafalovitch arafa...@gmail.com wrote: Search against both fields (one split, one not split)? Keep original and tokenized form? I am doing something similar with class name autocompletes here: https://github.com/arafalov/Solr-Javadoc/blob/master/JavadocIndex/JavadocCollection/conf/schema.xml#L24 Regards, Alex. Personal: http://www.outerthoughts.com/ and @arafalov Solr resources: http://www.solr-start.com/ and @solrstart Solr popularizers community: https://www.linkedin.com/groups?gid=6713853 On Tue, Jul 15, 2014 at 8:04 AM, Hayden Muhl haydenm...@gmail.com wrote: I'm working on using Solr for autocompleting usernames. I'm running into a problem with the wildcard queries (e.g. username:al*). We are tokenizing usernames so that a username like solr-user will be tokenized into solr and user, and will match both sol and use prefixes. The problem is when we get solr-u as a prefix, I'm having to split that up on the client side before I construct a query username:solr* username:u*. I'm basically using a regex as a poor man's tokenizer. Is there a better way to approach this? Is there a way to tell Solr to tokenize a string and use the parts as prefixes? - Hayden