RE: Distributed grouping issue
Hi Martijn,

I created a JIRA issue and attached a test that fails. It seems to exhibit the same issue that I see on my local box. (If you run it multiple times, you can see that the group value of the top doc changes between runs.) Also, I had to add fixShardCount = true; in the constructor of the TestDistributedGrouping class, which caused another test case to fail. (It's commented out in the patch with a TODO above it.)

Please let me know if you need any other information.

https://issues.apache.org/jira/browse/SOLR-3316

Thanks!!
Cody

-Original Message-
From: martijn.is.h...@gmail.com [mailto:martijn.is.h...@gmail.com] On Behalf Of Martijn v Groningen
Sent: Monday, April 02, 2012 10:49 PM
To: solr-user@lucene.apache.org
Subject: Re: Distributed grouping issue

I tried to reproduce this. However, matches always returns 4 in my case (when using rows=1 and rows=2). In your case, the 2 documents on each core do belong to the same group, right?

I did find something else. If I use rows=0, an error occurs. I think we need to investigate this further. Can you open an issue in Jira? I'm a bit busy today. We can then look into it further in the coming days.

Martijn

On 2 April 2012 23:00, Young, Cody wrote:
> Okay, I've played with this a bit more. Found something interesting:
>
> When the groups returned do not include results from a core, then the
> core is excluded from the count. (I have 1 group, 2 documents per
> core.)
>
> Example:
>
> http://localhost:8983/solr/core0/select/?q=*:*&shards=localhost:8983/solr/core0,localhost:8983/solr/core1&group=true&group.field=group_field&group.limit=10&rows=1
>
> 2
>
> Then, just by changing to rows=2:
>
> http://localhost:8983/solr/core0/select/?q=*:*&shards=localhost:8983/solr/core0,localhost:8983/solr/core1&group=true&group.field=group_field&group.limit=10&rows=2
>
> 4
>
> Let me know if you have any luck reproducing.
>
> Thanks,
> Cody
>
> -Original Message-
> From: martijn.is.h...@gmail.com [mailto:martijn.is.h...@gmail.com] On
> Behalf Of Martijn v Groningen
> Sent: Monday, April 02, 2012 1:48 PM
> To: solr-user@lucene.apache.org
> Subject: Re: Distributed grouping issue
>
> > All documents of a group exist on a single shard; there are no
> > cross-shard groups.
>
> You only have to partition documents by group when the groupCount and
> some other features need to be accurate. For the "matches" this is not
> necessary. The matches are summed up while merging the shard responses.
>
> I can't reproduce the error you are describing on a small local setup
> I have here. I have two Solr cores with a simple schema. Each core has
> 3 documents. When grouping, the matches element returns 6. I'm running
> a trunk build that I updated 30 minutes ago. Can you try to isolate
> the problem by testing with a small subset of your data?
>
> Martijn

--
Kind regards,
Martijn van Groningen
Re: Distributed grouping issue
I tried to reproduce this. However, matches always returns 4 in my case (when using rows=1 and rows=2). In your case, the 2 documents on each core do belong to the same group, right?

I did find something else. If I use rows=0, an error occurs. I think we need to investigate this further. Can you open an issue in Jira? I'm a bit busy today. We can then look into it further in the coming days.

Martijn

On 2 April 2012 23:00, Young, Cody wrote:
> Okay, I've played with this a bit more. Found something interesting:
>
> When the groups returned do not include results from a core, then the core
> is excluded from the count. (I have 1 group, 2 documents per core.)
>
> Example:
>
> http://localhost:8983/solr/core0/select/?q=*:*&shards=localhost:8983/solr/core0,localhost:8983/solr/core1&group=true&group.field=group_field&group.limit=10&rows=1
>
> 2
>
> Then, just by changing to rows=2:
>
> http://localhost:8983/solr/core0/select/?q=*:*&shards=localhost:8983/solr/core0,localhost:8983/solr/core1&group=true&group.field=group_field&group.limit=10&rows=2
>
> 4
>
> Let me know if you have any luck reproducing.
>
> Thanks,
> Cody
>
> -Original Message-
> From: martijn.is.h...@gmail.com [mailto:martijn.is.h...@gmail.com] On
> Behalf Of Martijn v Groningen
> Sent: Monday, April 02, 2012 1:48 PM
> To: solr-user@lucene.apache.org
> Subject: Re: Distributed grouping issue
>
> > All documents of a group exist on a single shard; there are no
> > cross-shard groups.
>
> You only have to partition documents by group when the groupCount and some
> other features need to be accurate. For the "matches" this is not
> necessary. The matches are summed up while merging the shard responses.
>
> I can't reproduce the error you are describing on a small local setup I
> have here. I have two Solr cores with a simple schema. Each core has 3
> documents. When grouping, the matches element returns 6. I'm running a
> trunk build that I updated 30 minutes ago. Can you try to isolate the
> problem by testing with a small subset of your data?
>
> Martijn

--
Kind regards,
Martijn van Groningen
RE: Distributed grouping issue
Okay, I've played with this a bit more. Found something interesting:

When the groups returned do not include results from a core, then the core is excluded from the count. (I have 1 group, 2 documents per core.)

Example:

http://localhost:8983/solr/core0/select/?q=*:*&shards=localhost:8983/solr/core0,localhost:8983/solr/core1&group=true&group.field=group_field&group.limit=10&rows=1

2

Then, just by changing to rows=2:

http://localhost:8983/solr/core0/select/?q=*:*&shards=localhost:8983/solr/core0,localhost:8983/solr/core1&group=true&group.field=group_field&group.limit=10&rows=2

4

Let me know if you have any luck reproducing.

Thanks,
Cody

-Original Message-
From: martijn.is.h...@gmail.com [mailto:martijn.is.h...@gmail.com] On Behalf Of Martijn v Groningen
Sent: Monday, April 02, 2012 1:48 PM
To: solr-user@lucene.apache.org
Subject: Re: Distributed grouping issue

> All documents of a group exist on a single shard; there are no
> cross-shard groups.

You only have to partition documents by group when the groupCount and some other features need to be accurate. For the "matches" this is not necessary. The matches are summed up while merging the shard responses.

I can't reproduce the error you are describing on a small local setup I have here. I have two Solr cores with a simple schema. Each core has 3 documents. When grouping, the matches element returns 6. I'm running a trunk build that I updated 30 minutes ago. Can you try to isolate the problem by testing with a small subset of your data?

Martijn
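Cody's rows=1 / rows=2 numbers can be reproduced by a toy model in which the merged count only includes shards whose groups made it into the top `rows` groups. The sketch below is a hypothetical Python model of the reported symptom, not Solr's merge code; the core names, group names, and scores are made up, and each core is assumed to hold one group of 2 documents.

```python
# Hypothetical model of the symptom reported above; NOT Solr's actual merge logic.
# Each core holds one group, and all of a group's documents live on that core.
SHARDS = {
    "core0": {"matches": 2, "group_scores": {"groupA": 1.0}},
    "core1": {"matches": 2, "group_scores": {"groupB": 0.5}},
}


def expected_matches(shards):
    """What 'matches' should be: the sum over every shard, whatever rows is."""
    return sum(shard["matches"] for shard in shards.values())


def observed_matches(shards, rows):
    """Symptom model: only shards that contributed a top-`rows` group are counted."""
    ranked = sorted(
        (
            (score, name)
            for name, shard in shards.items()
            for score in shard["group_scores"].values()
        ),
        reverse=True,
    )
    contributing = {name for _, name in ranked[:rows]}
    return sum(shards[name]["matches"] for name in contributing)


if __name__ == "__main__":
    for rows in (1, 2):
        print(rows, observed_matches(SHARDS, rows), expected_matches(SHARDS))
```

With rows=1 only core0's group survives, so the model reports 2; with rows=2 both cores contribute and it reports 4, the same numbers as in the message, while the expected total is 4 in both cases.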
Re: Distributed grouping issue
> All documents of a group exist on a single shard; there are no cross-shard
> groups.

You only have to partition documents by group when the groupCount and some other features need to be accurate. For the "matches" this is not necessary. The matches are summed up while merging the shard responses.

I can't reproduce the error you are describing on a small local setup I have here. I have two Solr cores with a simple schema. Each core has 3 documents. When grouping, the matches element returns 6. I'm running a trunk build that I updated 30 minutes ago. Can you try to isolate the problem by testing with a small subset of your data?

Martijn
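Martijn's distinction between "matches" and groupCount can be illustrated with a small Python sketch: summing per-shard match counts is correct however the documents are distributed, but adding up per-shard distinct-group counts overcounts as soon as a group spans shards. This is a simplified model with hypothetical group values, not Solr's implementation; the first layout mirrors his two-core, three-documents-per-core setup.

```python
def merged_matches(shard_groups):
    """'matches' as a plain sum of per-shard match counts."""
    return sum(len(values) for values in shard_groups.values())


def summed_group_count(shard_groups):
    """Adds each shard's distinct-group count; only exact if no group spans shards."""
    return sum(len(set(values)) for values in shard_groups.values())


def true_group_count(shard_groups):
    """Distinct groups across the whole collection."""
    return len({value for values in shard_groups.values() for value in values})


# Each list holds the (hypothetical) group value of every matching document on that core.
partitioned = {"core0": ["a", "a", "b"], "core1": ["c", "c", "c"]}
cross_shard = {"core0": ["a", "a", "b"], "core1": ["a", "c", "c"]}

if __name__ == "__main__":
    for layout in (partitioned, cross_shard):
        print(merged_matches(layout), summed_group_count(layout), true_group_count(layout))
```

Both layouts give 6 matches. The partitioned layout gives 3 groups either way, but with group "a" on both cores the per-shard sum reports 4 groups while only 3 exist, which is why partitioning by group matters for groupCount and not for matches.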
RE: Distributed grouping issue
In the case of group=false: numFound="26"
In the case of group=true: 34000

As a note, the grouped number changes when I hit refresh. It seems to display the count from a single (arbitrary) shard. (The top match also changes.) I haven't tried this in other versions of Solr.

All documents of a group exist on a single shard; there are no cross-shard groups.

Thanks,
Cody

-Original Message-
From: martijn.is.h...@gmail.com [mailto:martijn.is.h...@gmail.com] On Behalf Of Martijn v Groningen
Sent: Monday, April 02, 2012 3:15 AM
To: solr-user@lucene.apache.org
Subject: Re: Distributed grouping issue

The "matches" element in the response should return the number of documents that matched the query, not the number of groups. Did you also encounter this issue with other Solr versions (3.5 or another nightly build)?

Martijn

On 2 April 2012 09:41, fbrisbart wrote:
> Hi,
>
> When you write "I get xxx results", does it come from 'numFound', or do
> you really display xxx results?
> When using both field collapsing and sharding, 'numFound' may be
> wrong. In that case, consider using the 'shards.rows' parameter with a
> high value (be careful, it's bad for performance).
>
> If the problem is really about the returned results, it may be because
> several documents have the same unique key "document_id" in
> different shards.
>
> Hope it helps,
> Franck
>
> On Friday, 30 March 2012 at 23:52 +, Young, Cody wrote:
> > I forgot to mention, I can see the distributed requests happening in the logs:
> >
> > Mar 30, 2012 4:49:13 PM org.apache.solr.core.SolrCore execute
> > INFO: [core2] webapp=/solr path=/select params={group.distributed.first=true&distrib=false&wt=javabin&rows=10&version=2&fl=document_id,score&shard.url=localhost:8086/solr/core2&NOW=1333151353217&start=0&q=*:*&group.field=group_field&group=true&isShard=true} status=0 QTime=2
> > Mar 30, 2012 4:49:13 PM org.apache.solr.core.SolrCore execute
> > INFO: [core4] webapp=/solr path=/select params={group.distributed.first=true&distrib=false&wt=javabin&rows=10&version=2&fl=document_id,score&shard.url=localhost:8086/solr/core4&NOW=1333151353217&start=0&q=*:*&group.field=group_field&group=true&isShard=true} status=0 QTime=1
> > Mar 30, 2012 4:49:13 PM org.apache.solr.core.SolrCore execute
> > INFO: [core1] webapp=/solr path=/select params={group.distributed.first=true&distrib=false&wt=javabin&rows=10&version=2&fl=document_id,score&shard.url=localhost:8086/solr/core1&NOW=1333151353217&start=0&q=*:*&group.field=group_field&group=true&isShard=true} status=0 QTime=1
> > Mar 30, 2012 4:49:13 PM org.apache.solr.core.SolrCore execute
> > INFO: [core3] webapp=/solr path=/select params={group.distributed.first=true&distrib=false&wt=javabin&rows=10&version=2&fl=document_id,score&shard.url=localhost:8086/solr/core3&NOW=1333151353217&start=0&q=*:*&group.field=group_field&group=true&isShard=true} status=0 QTime=1
> > Mar 30, 2012 4:49:13 PM org.apache.solr.core.SolrCore execute
> > INFO: [core0] webapp=/solr path=/select params={group.distributed.first=true&distrib=false&wt=javabin&rows=10&version=2&fl=document_id,score&shard.url=localhost:8086/solr/core0&NOW=1333151353217&start=0&q=*:*&group.field=group_field&group=true&isShard=true} status=0 QTime=1
> > Mar 30, 2012 4:49:13 PM org.apache.solr.core.SolrCore execute
> > INFO: [core6] webapp=/solr path=/select params={group.distributed.first=true&distrib=false&wt=javabin&rows=10&version=2&fl=document_id,score&shard.url=localhost:8086/solr/core6&NOW=1333151353217&start=0&q=*:*&group.field=group_field&group=true&isShard=true} status=0 QTime=0
> > Mar 30, 2012 4:49:13 PM org.apache.solr.core.SolrCore execute
> > INFO: [core7] webapp=/solr path=/select params={group.distributed.first=true&distrib=false&wt=javabin&rows=10&version=2&fl=document_id,score&shard.url=localhost:8086/solr/core7&NOW=1333151353217&start=0&q=*:*&group.field=group_field&group=true&isShard=true} status=0 QTime=3
> > Mar 30, 2012 4:49:13 PM org.apache.solr.core.SolrCore execute
> > INFO: [core5] webapp=/solr path=/select params={group.distributed.first=true&distrib=false&wt=javabin&rows=10&version=2&fl=doc
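Franck's second hypothesis, the same document_id existing on more than one shard, can be checked offline once the unique keys of each core have been exported (for example with a non-distributed query against each core using fl=document_id; that export step is not shown). A minimal Python sketch, with made-up IDs and a hypothetical helper name:

```python
from collections import defaultdict


def duplicate_keys(ids_by_shard):
    """Map each unique-key value found on 2+ shards to the sorted list of those shards."""
    shards_by_id = defaultdict(set)
    for shard, ids in ids_by_shard.items():
        for doc_id in ids:
            shards_by_id[doc_id].add(shard)
    return {
        doc_id: sorted(shards)
        for doc_id, shards in shards_by_id.items()
        if len(shards) > 1
    }


if __name__ == "__main__":
    sample = {"core0": ["doc-1", "doc-2"], "core1": ["doc-2", "doc-3"]}
    print(duplicate_keys(sample))  # {'doc-2': ['core0', 'core1']}
```

An empty result would rule out this explanation for the missing documents and point back at the count merge itself.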
Re: Distributed grouping issue
roups.group_field=3554417600&group.topgroups.group_field=3140802904&fl=document_id,score&shard.url=localhost:8086/solr/core4&NOW=1333151353217&start=0&q=*:*&group.field=group_field&group=true&isShard=true} status=0 QTime=2
> > Mar 30, 2012 4:49:13 PM org.apache.solr.core.SolrCore execute
> > INFO: [core6] webapp=/solr path=/select params={distrib=false&group.distributed.second=true&wt=javabin&version=2&rows=10&group.topgroups.group_field=4183765296&group.topgroups.group_field=4608765424&group.topgroups.group_field=3524954944&group.topgroups.group_field=4182445488&group.topgroups.group_field=4213143392&group.topgroups.group_field=4328299312&group.topgroups.group_field=4206259648&group.topgroups.group_field=3465497912&group.topgroups.group_field=3554417600&group.topgroups.group_field=3140802904&fl=document_id,score&shard.url=localhost:8086/solr/core6&NOW=1333151353217&start=0&q=*:*&group.field=group_field&group=true&isShard=true} status=0 QTime=2
> > Mar 30, 2012 4:49:13 PM org.apache.solr.core.SolrCore execute
> > INFO: [core4] webapp=/solr path=/select params={NOW=1333151353217&shard.url=localhost:8086/solr/core4&ids=4182445488-535180165,3554417600-527549713,4608765424-526014561,3524954944-531590393,4183765296-514134497,4206259648-530219973,3465497912-534955957,4213143392-534186349,3140802904-538688961,4328299312-533482537&q=*:*&distrib=false&group.field=group_field&wt=javabin&isShard=true&version=2&rows=10} status=0 QTime=5
> > Mar 30, 2012 4:49:13 PM org.apache.solr.core.SolrCore execute
> > INFO: [core0] webapp=/solr path=/select/ params={shards=localhost:8086/solr/core0,localhost:8086/solr/core1,localhost:8086/solr/core2,localhost:8086/solr/core3,localhost:8086/solr/core4,localhost:8086/solr/core5,localhost:8086/solr/core6,localhost:8086/solr/core7&q=*:*&group.field=group_field&group=true} status=0 QTime=106
> >
> > -Original Message-
> > From: Young, Cody [mailto:cody.yo...@move.com]
> > Sent: Friday, March 30, 2012 4:35 PM
> > To: solr-user@lucene.apache.org
> > Subject: Distributed grouping issue
> >
> > Hi All,
> >
> > I'm having an issue getting distributed grouping working on trunk (Mar 29, 2012).
> >
> > If I send this query:
> >
> > http://localhost:8086/solr/core0/select/?q=*:*&group=false&shards=localhost:8086/solr/core0,localhost:8086/solr/core1,localhost:8086/solr/core2,localhost:8086/solr/core3,localhost:8086/solr/core4,localhost:8086/solr/core5,localhost:8086/solr/core6,localhost:8086/solr/core7
> >
> > I get 260,000 results. As soon as I change to using grouping:
> >
> > http://localhost:8086/solr/core0/select/?q=*:*&group=true&group.field=group_field&shards=localhost:8086/solr/core0,localhost:8086/solr/core1,localhost:8086/solr/core2,localhost:8086/solr/core3,localhost:8086/solr/core4,localhost:8086/solr/core5,localhost:8086/solr/core6,localhost:8086/solr/core7
> >
> > I only get 32,000 results. (the number of documents in a single core.)
> >
> > The field that I am grouping on is defined as:
> >
> > multiValued="false" />
> >
> > omitNorms="true"/>
> >
> > The document id:
> >
> > required="true" />
> >
> > omitNorms="true"/>
> >
> > document_id
> >
> > Anyone else experiencing this? Any ideas?
> >
> > Thanks,
> > Cody
>

--
Kind regards,
Martijn van Groningen
RE: Distributed grouping issue
&group.topgroups.group_field=4183765296&group.topgroups.group_field=4608765424&group.topgroups.group_field=3524954944&group.topgroups.group_field=4182445488&group.topgroups.group_field=4213143392&group.topgroups.group_field=4328299312&group.topgroups.group_field=4206259648&group.topgroups.group_field=3465497912&group.topgroups.group_field=3554417600&group.topgroups.group_field=3140802904&fl=document_id,score&shard.url=localhost:8086/solr/core6&NOW=1333151353217&start=0&q=*:*&group.field=group_field&group=true&isShard=true} status=0 QTime=2
> Mar 30, 2012 4:49:13 PM org.apache.solr.core.SolrCore execute
> INFO: [core4] webapp=/solr path=/select params={NOW=1333151353217&shard.url=localhost:8086/solr/core4&ids=4182445488-535180165,3554417600-527549713,4608765424-526014561,3524954944-531590393,4183765296-514134497,4206259648-530219973,3465497912-534955957,4213143392-534186349,3140802904-538688961,4328299312-533482537&q=*:*&distrib=false&group.field=group_field&wt=javabin&isShard=true&version=2&rows=10} status=0 QTime=5
> Mar 30, 2012 4:49:13 PM org.apache.solr.core.SolrCore execute
> INFO: [core0] webapp=/solr path=/select/ params={shards=localhost:8086/solr/core0,localhost:8086/solr/core1,localhost:8086/solr/core2,localhost:8086/solr/core3,localhost:8086/solr/core4,localhost:8086/solr/core5,localhost:8086/solr/core6,localhost:8086/solr/core7&q=*:*&group.field=group_field&group=true} status=0 QTime=106
>
> -Original Message-
> From: Young, Cody [mailto:cody.yo...@move.com]
> Sent: Friday, March 30, 2012 4:35 PM
> To: solr-user@lucene.apache.org
> Subject: Distributed grouping issue
>
> Hi All,
>
> I'm having an issue getting distributed grouping working on trunk (Mar 29, 2012).
>
> If I send this query:
>
> http://localhost:8086/solr/core0/select/?q=*:*&group=false&shards=localhost:8086/solr/core0,localhost:8086/solr/core1,localhost:8086/solr/core2,localhost:8086/solr/core3,localhost:8086/solr/core4,localhost:8086/solr/core5,localhost:8086/solr/core6,localhost:8086/solr/core7
>
> I get 260,000 results. As soon as I change to using grouping:
>
> http://localhost:8086/solr/core0/select/?q=*:*&group=true&group.field=group_field&shards=localhost:8086/solr/core0,localhost:8086/solr/core1,localhost:8086/solr/core2,localhost:8086/solr/core3,localhost:8086/solr/core4,localhost:8086/solr/core5,localhost:8086/solr/core6,localhost:8086/solr/core7
>
> I only get 32,000 results. (the number of documents in a single core.)
>
> The field that I am grouping on is defined as:
>
> multiValued="false" />
>
> omitNorms="true"/>
>
> The document id:
>
> required="true" />
>
> omitNorms="true"/>
>
> document_id
>
> Anyone else experiencing this? Any ideas?
>
> Thanks,
> Cody
RE: Distributed grouping issue
45488-535180165,3554417600-527549713,4608765424-526014561,3524954944-531590393,4183765296-514134497,4206259648-530219973,3465497912-534955957,4213143392-534186349,3140802904-538688961,4328299312-533482537&q=*:*&distrib=false&group.field=group_field&wt=javabin&isShard=true&version=2&rows=10} status=0 QTime=5
Mar 30, 2012 4:49:13 PM org.apache.solr.core.SolrCore execute
INFO: [core0] webapp=/solr path=/select/ params={shards=localhost:8086/solr/core0,localhost:8086/solr/core1,localhost:8086/solr/core2,localhost:8086/solr/core3,localhost:8086/solr/core4,localhost:8086/solr/core5,localhost:8086/solr/core6,localhost:8086/solr/core7&q=*:*&group.field=group_field&group=true} status=0 QTime=106

-Original Message-
From: Young, Cody [mailto:cody.yo...@move.com]
Sent: Friday, March 30, 2012 4:35 PM
To: solr-user@lucene.apache.org
Subject: Distributed grouping issue

Hi All,

I'm having an issue getting distributed grouping working on trunk (Mar 29, 2012).

If I send this query:

http://localhost:8086/solr/core0/select/?q=*:*&group=false&shards=localhost:8086/solr/core0,localhost:8086/solr/core1,localhost:8086/solr/core2,localhost:8086/solr/core3,localhost:8086/solr/core4,localhost:8086/solr/core5,localhost:8086/solr/core6,localhost:8086/solr/core7

I get 260,000 results. As soon as I change to using grouping:

http://localhost:8086/solr/core0/select/?q=*:*&group=true&group.field=group_field&shards=localhost:8086/solr/core0,localhost:8086/solr/core1,localhost:8086/solr/core2,localhost:8086/solr/core3,localhost:8086/solr/core4,localhost:8086/solr/core5,localhost:8086/solr/core6,localhost:8086/solr/core7

I only get 32,000 results. (the number of documents in a single core.)

The field that I am grouping on is defined as:

The document id:

document_id

Anyone else experiencing this? Any ideas?

Thanks,
Cody
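The log lines above show several kinds of per-core request for one grouped query. A small Python sketch can tag each line by the parameters it carries. The labels ("first", "second", "get_fields", "top_level") are names chosen here from the parameters visible in the log (group.distributed.first, group.distributed.second, ids, shards), not official Solr terminology, and the regex assumes the exact `INFO: [core] ... params={...}` layout shown in this thread.

```python
import re
from urllib.parse import parse_qs

# Assumes the "INFO: [core] webapp=... path=... params={...} status=..." layout above.
LINE_RE = re.compile(r"\[(?P<core>[^\]]+)\].*?params=\{(?P<params>[^}]*)\}")


def classify(log_line):
    """Return (core, label, topgroups) for a SolrCore INFO line, or None if no match."""
    match = LINE_RE.search(log_line)
    if match is None:
        return None
    params = parse_qs(match.group("params"))
    # Collect every group.topgroups.<field> value, in log order.
    topgroups = [
        value
        for key, values in params.items()
        if key.startswith("group.topgroups.")
        for value in values
    ]
    if params.get("group.distributed.first") == ["true"]:
        label = "first"
    elif params.get("group.distributed.second") == ["true"]:
        label = "second"
    elif "ids" in params:
        label = "get_fields"
    elif "shards" in params:
        label = "top_level"
    else:
        label = "other"
    return match.group("core"), label, topgroups
```

Running it over a full log lets you check, for example, whether every shard received a second-phase request with the same top groups.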
Distributed grouping issue
Hi All,

I'm having an issue getting distributed grouping working on trunk (Mar 29, 2012).

If I send this query:

http://localhost:8086/solr/core0/select/?q=*:*&group=false&shards=localhost:8086/solr/core0,localhost:8086/solr/core1,localhost:8086/solr/core2,localhost:8086/solr/core3,localhost:8086/solr/core4,localhost:8086/solr/core5,localhost:8086/solr/core6,localhost:8086/solr/core7

I get 260,000 results. As soon as I change to using grouping:

http://localhost:8086/solr/core0/select/?q=*:*&group=true&group.field=group_field&shards=localhost:8086/solr/core0,localhost:8086/solr/core1,localhost:8086/solr/core2,localhost:8086/solr/core3,localhost:8086/solr/core4,localhost:8086/solr/core5,localhost:8086/solr/core6,localhost:8086/solr/core7

I only get 32,000 results. (the number of documents in a single core.)

The field that I am grouping on is defined as:

The document id:

document_id

Anyone else experiencing this? Any ideas?

Thanks,
Cody
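The two URLs in this message differ only in the grouping parameters, and the eight-core shards list is easy to mistype by hand. The Python sketch below assembles the same kind of URL with urllib.parse.urlencode. The helper name and its defaults are hypothetical, and shards.rows is included only as the optional knob Franck suggests in this thread (he warns that a high value is bad for performance).

```python
from urllib.parse import urlencode


def grouped_query_url(host, cores, group_field, group=True, rows=None, shards_rows=None):
    """Build a distributed /select URL like the ones in this thread (hypothetical helper)."""
    params = {"q": "*:*", "group": "true" if group else "false"}
    if group:
        params["group.field"] = group_field
    if rows is not None:
        params["rows"] = rows
    if shards_rows is not None:
        params["shards.rows"] = shards_rows  # Franck's suggestion; costly when high
    params["shards"] = ",".join(f"{host}/solr/{core}" for core in cores)
    # Keep '*', ':', '/' and ',' unescaped so the URL matches the hand-written ones.
    query = urlencode(params, safe="*:/,")
    return f"http://{host}/solr/{cores[0]}/select/?{query}"


if __name__ == "__main__":
    cores = [f"core{i}" for i in range(8)]
    print(grouped_query_url("localhost:8086", cores, "group_field"))
```

With the defaults this prints the grouped URL from the message exactly; passing group=False yields the ungrouped one.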