RE: COMMERCIAL: RE: SolrCloud: facet range option f.field.facet.mincount=1 omits buckets on response
Hi Chris,

I have created ticket https://issues.apache.org/jira/browse/SOLR-6154

I attached the data.xml to the ticket, along with a PDF with instructions on how to replicate the issue.
Sending different updates to different ports was just how the Confluence tutorial laid out the steps; it does not affect the result of the test.

As soon as I have more information I will post it to the ticket.
I appreciate the interest; please let me know if you have any suggestions or feedback.

Thank you,
Ronald Matamoros

-----Original Message-----
From: Chris Hostetter [mailto:hossman_luc...@fucit.org]
Sent: 06 June 2014 22:00
To: solr-user@lucene.apache.org
Subject: COMMERCIAL: RE: SolrCloud: facet range option f.field.facet.mincount=1 omits buckets on response

Ronald:

I'm having a little trouble understanding the steps to reproduce that you are describing -- in particular Step 1 f ii, because i'm not really sure i understand what exactly you are putting in mem2.xml

Also: Since you don't appear to be using implicit routing, i'm not clear on why you are explicitly sending different updates to different ports in Step 1 f i -- does that affect the results of your test?

If you can reliably reproduce using modified data from the example, could you please open a Jira, outline these steps, and attach the modified data to index directly to that issue?

(FWIW: If it doesn't matter what port you use to send which documents, then you should be able to create a single unified data.xml file containing all the docs to index in a single command)

: Date: Thu, 29 May 2014 18:06:38 +
: From: Ronald Matamoros rmatamo...@searchtechnologies.com
: Reply-To: solr-user@lucene.apache.org
: To: solr-user@lucene.apache.org solr-user@lucene.apache.org
: Subject: RE: SolrCloud: facet range option f.field.facet.mincount=1 omits
:     buckets on response
:
: Hi all,
:
: At the moment I am reviewing the code to determine if this is a legitimate bug that needs to be set as a JIRA ticket.
: Any insight or recommendation is appreciated.
RE: SolrCloud: facet range option f.field.facet.mincount=1 omits buckets on response
Ronald:

I'm having a little trouble understanding the steps to reproduce that you are describing -- in particular Step 1 f ii, because i'm not really sure i understand what exactly you are putting in mem2.xml

Also: Since you don't appear to be using implicit routing, i'm not clear on why you are explicitly sending different updates to different ports in Step 1 f i -- does that affect the results of your test?

If you can reliably reproduce using modified data from the example, could you please open a Jira, outline these steps, and attach the modified data to index directly to that issue?

(FWIW: If it doesn't matter what port you use to send which documents, then you should be able to create a single unified data.xml file containing all the docs to index in a single command)

: Date: Thu, 29 May 2014 18:06:38 +
: From: Ronald Matamoros rmatamo...@searchtechnologies.com
: Reply-To: solr-user@lucene.apache.org
: To: solr-user@lucene.apache.org solr-user@lucene.apache.org
: Subject: RE: SolrCloud: facet range option f.field.facet.mincount=1 omits
:     buckets on response
:
: Hi all,
:
: At the moment I am reviewing the code to determine if this is a legitimate bug that needs to be set as a JIRA ticket.
: Any insight or recommendation is appreciated.
:
: Including the replication steps as text:
:
: -
: Solr versions where issue was replicated.
:   * 4.5.1 (Linux)
:   * 4.8.1 (Windows + Cygwin)
:
: Replicating
:
: 1. Created two-shard environment - no replication
:    https://cwiki.apache.org/confluence/display/solr/Getting+Started+with+SolrCloud
:
:    a. Download Solr distribution from http://lucene.apache.org/solr/downloads.html
:    b. Unzipped solr-4.8.1.zip to a temporary location: SOLR_DIST_HOME
:    c. Ran once so the SolrCloud jars get unpacked: java -jar start.jar
:    d. Create nodes
:       i. cd SOLR_DIST_HOME
:       ii. Via Windows Explorer copied example to node1
:       iii. Via Windows Explorer copied example to node2
:
:    e. Start Nodes
:       i. Start node 1
:
:          cd node1
:          java -DzkRun -DnumShards=2 -Dbootstrap_confdir=./solr/collection1/conf -Dcollection.configName=myconf -jar start.jar
:
:       ii. Start node 2
:
:          cd node2
:          java -Djetty.port=7574 -DzkHost=localhost:9983 -jar start.jar
:
:    f. Fed sample documents
:       i. Out of the box
:
:          curl http://localhost:8983/solr/update?commit=true -H "Content-Type: text/xml" -d @mem.xml
:          curl http://localhost:7574/solr/update?commit=true -H "Content-Type: text/xml" -d @monitor2.xml
:
:       ii. Create a copy of mem.xml to mem2.xml; modified identifiers, names, prices and fed
:
:          curl http://localhost:8983/solr/update?commit=true -H "Content-Type: text/xml" -d @mem2.xml
:
:          <add>
:            <doc>
:              <field name="id">COMPANY1</field>
:              <field name="name">COMPANY1 Device</field>
:              <field name="manu">COMPANY1 Device Mfg</field>
:              .
:              <field name="price">190</field>
:              .
:            </doc>
:            <doc>
:              <field name="id">COMPANY2</field>
:              <field name="name">COMPANY2 flatscreen</field>
:              <field name="manu">COMPANY2 Device Mfg.</field>
:              .
:              <field name="price">200.00</field>
:              .
:            </doc>
:            <doc>
:              <field name="id">COMPANY3</field>
:              <field name="name">COMPANY3 Laptop</field>
:              <field name="manu">COMPANY3 Device Mfg.</field>
:              .
:              <field name="price">800.00</field>
:              .
:            </doc>
:          </add>
:
: 2. Query **without** f.price.facet.mincount=1, counts and buckets are OK
:
:    http://localhost:8983/solr/collection1/select?q=*:*&fl=id,price&sort=id+asc&facet=true&facet.range=price&f.price.facet.range.start=0&f.price.facet.range.end=1000&f.price.facet.range.gap=50&f.price.facet.range.other=all&f.price.facet.range.include=upper&spellcheck=false&hl=false
:
:    Only six documents have prices
:
:    <lst name="facet_ranges">
:      <lst name="price">
:        <lst name="counts">
:          <int name="0.0">0</int>
:          <int name="50.0">1</int>
:          <int name="100.0">0</int>
:          <int name="150.0">3</int>
:          <int name="200.0">0</int>
:          <int name="250.0">1</int>
:          <int name="300.0">0</int>
:          <int name="350.0">0</int>
:          <int name="400.0">0</int>
:          <int name="450.0">0</int>
:          <int name="500.0">0</int>
:          <int name="550.0">0</int>
:          <int name="600.0">0</int>
:          <int name="650.0">0</int>
:          <int
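Chris's suggestion of a single unified data.xml can be sketched as follows. This is only a minimal illustration, assuming every input is a Solr `<add>` payload whose direct children are `<doc>` elements; the helper name `merge_add_documents` and the two inline sample strings are hypothetical, with field values borrowed from the mem2.xml excerpt quoted above.

```python
import xml.etree.ElementTree as ET


def merge_add_documents(xml_sources):
    """Combine the <doc> children of several <add> payloads into one <add> element."""
    merged = ET.Element("add")
    for source in xml_sources:
        root = ET.fromstring(source)
        # Assumption: each payload is an <add> element wrapping <doc> children.
        for doc in root.findall("doc"):
            merged.append(doc)
    return merged


# Hypothetical stand-ins for the contents of two separate update files.
first_file = """<add>
  <doc><field name="id">COMPANY1</field><field name="price">190</field></doc>
  <doc><field name="id">COMPANY2</field><field name="price">200.00</field></doc>
</add>"""
second_file = """<add>
  <doc><field name="id">COMPANY3</field><field name="price">800.00</field></doc>
</add>"""

unified = merge_add_documents([first_file, second_file])
data_xml = ET.tostring(unified, encoding="unicode")
print(data_xml)
```

The resulting string could then be written to data.xml and posted to any one node with a single update request, which removes the question of which port received which documents.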
Re: SolrCloud: facet range option f.field.facet.mincount=1 omits buckets on response
Hi Shawn,

Thanks very much for the feedback.

I have tested using the routing mechanism (compositeId) on a larger scale. Unfortunately, I see the same behaviour.

Regards,
Ronald

-----Original Message-----
From: Shawn Heisey [mailto:s...@elyograg.org]
Sent: 29 May 2014 20:16
To: solr-user@lucene.apache.org
Subject: COMMERCIAL: Re: SolrCloud: facet range option f.field.facet.mincount=1 omits buckets on response

On 5/29/2014 12:06 PM, Ronald Matamoros wrote:
> Hi all,
>
> At the moment I am reviewing the code to determine if this is a legitimate bug that needs to be set as a JIRA ticket.
> Any insight or recommendation is appreciated.

<snip>

> Note: the value in <int name="between"> changes with every other refresh of the query.

Whenever distributed search results change from one query to the next, it's almost always caused by having documents with the same uniqueKey in more than one shard. Solr is able to remove these duplicates from the results, but there are other aspects of distributed searching that cannot be dealt with when there are duplicate documents. This leads to problems like numFound changing from one request to the next.

To avoid these problems with SolrCloud, you'll likely want to create a new collection and set its router to compositeId. This ensures that indexed documents are distributed to shards according to the hash of their uniqueKey, not imported directly into the node where you made the update request.

It's possible that my guess here is completely wrong, but this is usually the problem.

Thanks,
Shawn
RE: SolrCloud: facet range option f.field.facet.mincount=1 omits buckets on response
    <int name="before">0</int>
    <int name="after">0</int>
    <int name="between">4</int>
  </lst>
</lst>

Refresh of the query (may need to do this multiple times with the F5 key in the browser):

<lst name="facet_ranges">
  <lst name="price">
    <lst name="counts">
      <int name="150.0">3</int>
      <int name="250.0">1</int>
    </lst>
    <float name="gap">50.0</float>
    <float name="start">0.0</float>
    <float name="end">1000.0</float>
    <int name="before">0</int>
    <int name="after">0</int>
    <int name="between">2</int>
  </lst>
</lst>

Thank you,
Ronald Matamoros

-----Original Message-----
From: Ronald Matamoros [mailto:rmatamo...@searchtechnologies.com]
Sent: 27 May 2014 16:25
To: solr-user@lucene.apache.org
Subject: COMMERCIAL: SolrCloud: facet range option f.field.facet.mincount=1 omits buckets on response

Good afternoon,

Is the f.field.facet.mincount option supported on a distributed search?
Under SolrCloud, I am seeing some buckets omitted when using the option f.field.facet.mincount=1.

The Solr logs do not indicate any error or warning during execution.
The debug=true option and increasing the log levels for the FacetComponent do not provide any hints about the behaviour.

I replicated the issue on both Solr 4.5.1 and 4.8.1.

I attached a PDF that provides additional details and steps to replicate the behaviour using the out-of-the-box Solr distribution.

Any insight or recommendation to tackle this situation is much appreciated.

Example:

Removing the f.field.facet.mincount=1 option gives the expected list of buckets for the 6 documents matched.

<lst name="facet_ranges">
  <lst name="price">
    <lst name="counts">
      <int name="0.0">0</int>
      <int name="50.0">1</int>
      <int name="100.0">0</int>
      <int name="150.0">3</int>
      <int name="200.0">0</int>
      <int name="250.0">1</int>
      <int name="300.0">0</int>
      <int name="350.0">0</int>
      <int name="400.0">0</int>
      <int name="450.0">0</int>
      <int name="500.0">0</int>
      <int name="550.0">0</int>
      <int name="600.0">0</int>
      <int name="650.0">0</int>
      <int name="700.0">0</int>
      <int name="750.0">1</int>
      <int name="800.0">0</int>
      <int name="850.0">0</int>
      <int name="900.0">0</int>
      <int name="950.0">0</int>
    </lst>
    <float name="gap">50.0</float>
    <float name="start">0.0</float>
    <float name="end">1000.0</float>
    <int name="before">0</int>
    <int name="after">0</int>
    <int name="between">2</int>
  </lst>
</lst>

Using the f.field.facet.mincount=1 option removes the 0-count buckets but also omits the bucket <int name="250.0">:

<lst name="facet_ranges">
  <lst name="price">
    <lst name="counts">
      <int name="50.0">1</int>
      <int name="150.0">3</int>
      <int name="750.0">1</int>
    </lst>
    <float name="gap">50.0</float>
    <float name="start">0.0</float>
    <float name="end">1000.0</float>
    <int name="before">0</int>
    <int name="after">0</int>
    <int name="between">4</int>
  </lst>
</lst>

Refreshing the query using the browser's F5 option renders a different bucket list (you may need to refresh multiple times):

<lst name="facet_ranges">
  <lst name="price">
    <lst name="counts">
      <int name="150.0">3</int>
      <int name="250.0">1</int>
    </lst>
    <float name="gap">50.0</float>
    <float name="start">0.0</float>
    <float name="end">1000.0</float>
    <int name="before">0</int>
    <int name="after">0</int>
    <int name="between">2</int>
  </lst>
</lst>

Regards,
Ronald Matamoros
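The two requests compared in this thread differ only in the per-field mincount parameter. As a rough sketch, both URLs can be built from one parameter list. The parameter names and values are the ones from the replication steps; the function name `range_facet_url` is made up for illustration, and the host, port, and collection assume the two-node example setup described in the thread.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Assumed endpoint of the example SolrCloud node used in the replication steps.
BASE_URL = "http://localhost:8983/solr/collection1/select"


def range_facet_url(mincount=None):
    """Build the price range-facet query, optionally adding f.price.facet.mincount."""
    params = [
        ("q", "*:*"),
        ("fl", "id,price"),
        ("sort", "id asc"),
        ("facet", "true"),
        ("facet.range", "price"),
        ("f.price.facet.range.start", "0"),
        ("f.price.facet.range.end", "1000"),
        ("f.price.facet.range.gap", "50"),
        ("f.price.facet.range.other", "all"),
        ("f.price.facet.range.include", "upper"),
        ("spellcheck", "false"),
        ("hl", "false"),
    ]
    if mincount is not None:
        params.append(("f.price.facet.mincount", str(mincount)))
    return BASE_URL + "?" + urlencode(params)


without_mincount = range_facet_url()
with_mincount = range_facet_url(mincount=1)

# Show which parameters exist only in the second request.
extra = set(parse_qs(urlsplit(with_mincount).query)) - set(parse_qs(urlsplit(without_mincount).query))
print(without_mincount)
print(with_mincount)
print(sorted(extra))
```

Building both URLs from the same list makes it harder for an unrelated parameter difference to creep in when comparing the two responses.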
Re: SolrCloud: facet range option f.field.facet.mincount=1 omits buckets on response
On 5/29/2014 12:06 PM, Ronald Matamoros wrote:
> Hi all,
>
> At the moment I am reviewing the code to determine if this is a legitimate bug that needs to be set as a JIRA ticket.
> Any insight or recommendation is appreciated.

<snip>

> Note: the value in <int name="between"> changes with every other refresh of the query.

Whenever distributed search results change from one query to the next, it's almost always caused by having documents with the same uniqueKey in more than one shard. Solr is able to remove these duplicates from the results, but there are other aspects of distributed searching that cannot be dealt with when there are duplicate documents. This leads to problems like numFound changing from one request to the next.

To avoid these problems with SolrCloud, you'll likely want to create a new collection and set its router to compositeId. This ensures that indexed documents are distributed to shards according to the hash of their uniqueKey, not imported directly into the node where you made the update request.

It's possible that my guess here is completely wrong, but this is usually the problem.

Thanks,
Shawn
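One way to check Shawn's hypothesis is to list the uniqueKey values held by each shard and look for keys that appear in more than one. The sketch below covers only the comparison step and assumes the id lists have already been collected, for example by querying each core separately with distrib=false. The function name `find_duplicate_keys` and the sample shard contents are hypothetical; the thread does not say which ids, if any, were actually duplicated.

```python
from collections import defaultdict


def find_duplicate_keys(ids_by_shard):
    """Return {uniqueKey: sorted shard names} for keys found in more than one shard."""
    shards_by_id = defaultdict(set)
    for shard, ids in ids_by_shard.items():
        for doc_id in ids:
            shards_by_id[doc_id].add(shard)
    return {
        doc_id: sorted(shards)
        for doc_id, shards in shards_by_id.items()
        if len(shards) > 1
    }


# Hypothetical per-shard id lists, invented purely to exercise the function.
ids_by_shard = {
    "shard1": ["COMPANY1", "COMPANY2"],
    "shard2": ["COMPANY2", "COMPANY3"],
}
duplicates = find_duplicate_keys(ids_by_shard)
print(duplicates)
```

An empty result would suggest that duplicate keys are not the cause, which would be consistent with the behaviour persisting under compositeId routing.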
SolrCloud: facet range option f.field.facet.mincount=1 omits buckets on response
Good afternoon,

Is the f.field.facet.mincount option supported on a distributed search?
Under SolrCloud, I am seeing some buckets omitted when using the option f.field.facet.mincount=1.

The Solr logs do not indicate any error or warning during execution.
The debug=true option and increasing the log levels for the FacetComponent do not provide any hints about the behaviour.

I replicated the issue on both Solr 4.5.1 and 4.8.1.

I attached a PDF that provides additional details and steps to replicate the behaviour using the out-of-the-box Solr distribution.

Any insight or recommendation to tackle this situation is much appreciated.

Example:

Removing the f.field.facet.mincount=1 option gives the expected list of buckets for the 6 documents matched.

<lst name="facet_ranges">
  <lst name="price">
    <lst name="counts">
      <int name="0.0">0</int>
      <int name="50.0">1</int>
      <int name="100.0">0</int>
      <int name="150.0">3</int>
      <int name="200.0">0</int>
      <int name="250.0">1</int>
      <int name="300.0">0</int>
      <int name="350.0">0</int>
      <int name="400.0">0</int>
      <int name="450.0">0</int>
      <int name="500.0">0</int>
      <int name="550.0">0</int>
      <int name="600.0">0</int>
      <int name="650.0">0</int>
      <int name="700.0">0</int>
      <int name="750.0">1</int>
      <int name="800.0">0</int>
      <int name="850.0">0</int>
      <int name="900.0">0</int>
      <int name="950.0">0</int>
    </lst>
    <float name="gap">50.0</float>
    <float name="start">0.0</float>
    <float name="end">1000.0</float>
    <int name="before">0</int>
    <int name="after">0</int>
    <int name="between">2</int>
  </lst>
</lst>

Using the f.field.facet.mincount=1 option removes the 0-count buckets but also omits the bucket <int name="250.0">:

<lst name="facet_ranges">
  <lst name="price">
    <lst name="counts">
      <int name="50.0">1</int>
      <int name="150.0">3</int>
      <int name="750.0">1</int>
    </lst>
    <float name="gap">50.0</float>
    <float name="start">0.0</float>
    <float name="end">1000.0</float>
    <int name="before">0</int>
    <int name="after">0</int>
    <int name="between">4</int>
  </lst>
</lst>

Refreshing the query using the browser's F5 option renders a different bucket list (you may need to refresh multiple times):

<lst name="facet_ranges">
  <lst name="price">
    <lst name="counts">
      <int name="150.0">3</int>
      <int name="250.0">1</int>
    </lst>
    <float name="gap">50.0</float>
    <float name="start">0.0</float>
    <float name="end">1000.0</float>
    <int name="before">0</int>
    <int name="after">0</int>
    <int name="between">2</int>
  </lst>
</lst>

Regards,
Ronald Matamoros
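The discrepancy reported above can be made explicit by deriving the buckets that mincount=1 should return from the unfiltered counts and comparing them with what each response actually contained. This is a small sketch using the bucket values from this message; the function names `expected_buckets` and `missing_buckets` are illustrative and not part of Solr.

```python
def expected_buckets(counts, mincount):
    """Buckets that should survive a mincount filter, in their original order."""
    return [name for name, count in counts.items() if count >= mincount]


def missing_buckets(counts, observed, mincount=1):
    """Buckets that should have been returned but are absent from a response."""
    observed_set = set(observed)
    return [name for name in expected_buckets(counts, mincount) if name not in observed_set]


# Counts from the response without mincount: 20 buckets of width 50 from 0 to 1000.
full_counts = {f"{start:.1f}": 0 for start in range(0, 1000, 50)}
full_counts.update({"50.0": 1, "150.0": 3, "250.0": 1, "750.0": 1})

# Bucket names seen in the two mincount=1 responses shown in this message.
first_response = ["50.0", "150.0", "750.0"]
after_refresh = ["150.0", "250.0"]

print(expected_buckets(full_counts, 1))
print(missing_buckets(full_counts, first_response))
print(missing_buckets(full_counts, after_refresh))
```

Neither response contains all four non-empty buckets, and the two responses miss different ones, which matches the report that the bucket list changes between refreshes.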