RE: Distributed grouping issue

2012-04-04 Thread Young, Cody
Hi Martijn,

I created a JIRA issue and attached a test that fails. It seems to exhibit the 
same issue that I see on my local box. (If you run it multiple times you can 
see that the group value of the top doc changes between runs.)

Also, I had to add fixShardCount = true; in the constructor of the 
TestDistributedGrouping class, which caused another test case to fail. (It's 
commented out in the patch with a TODO above it.)

Please let me know if you need any other information.

https://issues.apache.org/jira/browse/SOLR-3316

Thanks!!
Cody

-----Original Message-----
From: martijn.is.h...@gmail.com [mailto:martijn.is.h...@gmail.com] On Behalf Of 
Martijn v Groningen
Sent: Monday, April 02, 2012 10:49 PM
To: solr-user@lucene.apache.org
Subject: Re: Distributed grouping issue

I tried to reproduce this. However, matches always returns 4 in my case 
(when using rows=1 and rows=2).
In your case the 2 documents on each core do belong to the same group, right?

I did find something else. If I use rows=0 then an error occurs. I think we 
need to further investigate this.
Can you open an issue in Jira? I'm a bit busy today. We can then further look 
into this in the coming days.

Martijn

On 2 April 2012 23:00, Young, Cody wrote:

> Okay, I've played with this a bit more. Found something interesting:
>
> When the groups returned do not include results from a core, then the 
> core is excluded from the count. (I have 1 group, 2 documents per 
> core)
>
> Example:
>
>
> http://localhost:8983/solr/core0/select/?q=*:*&shards=localhost:8983/solr/core0,localhost:8983/solr/core1&group=true&group.field=group_field&group.limit=10&rows=1
>
> 
> 
> 2
>
> Then, just by changing rows=2
>
>
> http://localhost:8983/solr/core0/select/?q=*:*&shards=localhost:8983/solr/core0,localhost:8983/solr/core1&group=true&group.field=group_field&group.limit=10&rows=2
>
> 
> 
> 4
>
> Let me know if you have any luck reproducing.
>
> Thanks,
> Cody
>
> -Original Message-
> From: martijn.is.h...@gmail.com [mailto:martijn.is.h...@gmail.com] On 
> Behalf Of Martijn v Groningen
> Sent: Monday, April 02, 2012 1:48 PM
> To: solr-user@lucene.apache.org
> Subject: Re: Distributed grouping issue
>
> >
> > All documents of a group exist on a single shard, there are no 
> > cross-shard groups.
> >
> You only have to partition documents by group when the groupCount and 
> some other features need to be accurate. For "matches" this is not 
> necessary. The matches are summed when the shard responses are merged.
>
> I can't reproduce the error you are describing on a small local setup 
> I have here. I have two Solr cores with a simple schema. Each core has 
> 3 documents. When grouping the matches element returns 6. I'm running 
> on a trunk that I have updated 30 minutes ago. Can you try to isolate 
> the problem by testing with a small subset of your data?
>
> Martijn
>



--
Met vriendelijke groet,

Martijn van Groningen


Re: Distributed grouping issue

2012-04-02 Thread Martijn v Groningen
I tried to reproduce this. However, matches always returns 4 in my
case (when using rows=1 and rows=2).
In your case the 2 documents on each core do belong to the same group,
right?

I did find something else. If I use rows=0 then an error occurs. I think we
need to further investigate this.
Can you open an issue in Jira? I'm a bit busy today. We can then further
look into this in the coming days.

Martijn

On 2 April 2012 23:00, Young, Cody wrote:

> Okay, I've played with this a bit more. Found something interesting:
>
> When the groups returned do not include results from a core, then the core
> is excluded from the count. (I have 1 group, 2 documents per core)
>
> Example:
>
>
> http://localhost:8983/solr/core0/select/?q=*:*&shards=localhost:8983/solr/core0,localhost:8983/solr/core1&group=true&group.field=group_field&group.limit=10&rows=1
>
> 
> 
> 2
>
> Then, just by changing rows=2
>
>
> http://localhost:8983/solr/core0/select/?q=*:*&shards=localhost:8983/solr/core0,localhost:8983/solr/core1&group=true&group.field=group_field&group.limit=10&rows=2
>
> 
> 
> 4
>
> Let me know if you have any luck reproducing.
>
> Thanks,
> Cody
>
> -Original Message-
> From: martijn.is.h...@gmail.com [mailto:martijn.is.h...@gmail.com] On
> Behalf Of Martijn v Groningen
> Sent: Monday, April 02, 2012 1:48 PM
> To: solr-user@lucene.apache.org
> Subject: Re: Distributed grouping issue
>
> >
> > All documents of a group exist on a single shard, there are no
> > cross-shard groups.
> >
> You only have to partition documents by group when the groupCount and some
> other features need to be accurate. For "matches" this is not
> necessary. The matches are summed when the shard responses are merged.
>
> I can't reproduce the error you are describing on a small local setup I
> have here. I have two Solr cores with a simple schema. Each core has 3
> documents. When grouping the matches element returns 6. I'm running on a
> trunk that I have updated 30 minutes ago. Can you try to isolate the
> problem by testing with a small subset of your data?
>
> Martijn
>



-- 
Met vriendelijke groet,

Martijn van Groningen


RE: Distributed grouping issue

2012-04-02 Thread Young, Cody
Okay, I've played with this a bit more. Found something interesting:

When the groups returned do not include results from a core, then the core is 
excluded from the count. (I have 1 group, 2 documents per core)

Example:

http://localhost:8983/solr/core0/select/?q=*:*&shards=localhost:8983/solr/core0,localhost:8983/solr/core1&group=true&group.field=group_field&group.limit=10&rows=1



matches: 2

Then, just by changing rows=2

http://localhost:8983/solr/core0/select/?q=*:*&shards=localhost:8983/solr/core0,localhost:8983/solr/core1&group=true&group.field=group_field&group.limit=10&rows=2



matches: 4
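
For anyone who wants to script the comparison, here is a minimal sketch in Python that rebuilds the two URLs above so that they differ only in rows. The helper name grouped_query_url is made up for illustration; the host, cores and parameters are the ones from the example.

```python
from urllib.parse import urlencode

BASE = "http://localhost:8983/solr/core0/select/"
SHARDS = "localhost:8983/solr/core0,localhost:8983/solr/core1"


def grouped_query_url(rows):
    """Build the grouped, distributed query URL for a given rows value."""
    params = [
        ("q", "*:*"),
        ("shards", SHARDS),
        ("group", "true"),
        ("group.field", "group_field"),
        ("group.limit", 10),
        ("rows", rows),
    ]
    # Keep ':', '/', ',' and '*' unescaped so the URL reads like the ones above.
    return BASE + "?" + urlencode(params, safe=":/,*")


url_rows_1 = grouped_query_url(1)
url_rows_2 = grouped_query_url(2)
print(url_rows_1)
print(url_rows_2)
```

Fetching both URLs and comparing the matches value in each response should show whether the count depends on rows.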

Let me know if you have any luck reproducing.

Thanks,
Cody 

-----Original Message-----
From: martijn.is.h...@gmail.com [mailto:martijn.is.h...@gmail.com] On Behalf Of 
Martijn v Groningen
Sent: Monday, April 02, 2012 1:48 PM
To: solr-user@lucene.apache.org
Subject: Re: Distributed grouping issue

>
> All documents of a group exist on a single shard, there are no 
> cross-shard groups.
>
You only have to partition documents by group when the groupCount and some 
other features need to be accurate. For "matches" this is not necessary. 
The matches are summed when the shard responses are merged.

I can't reproduce the error you are describing on a small local setup I have 
here. I have two Solr cores with a simple schema. Each core has 3 documents. 
When grouping the matches element returns 6. I'm running on a trunk that I have 
updated 30 minutes ago. Can you try to isolate the problem by testing with a 
small subset of your data?

Martijn


Re: Distributed grouping issue

2012-04-02 Thread Martijn v Groningen
>
> All documents of a group exist on a single shard, there are no cross-shard
> groups.
>
You only have to partition documents by group when the groupCount and some
other features need to be accurate. For "matches" this is not
necessary. The matches are summed when the shard responses are merged.
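
As a rough illustration of that merge step, here is a sketch in Python. It is not the actual Solr code, and merge_matches is a made-up name; it only shows the expected arithmetic, where the merged matches value is the sum of the per-shard values.

```python
def merge_matches(shard_matches):
    """Return the merged "matches" count: the sum over all shard responses."""
    return sum(shard_matches)


# Two cores with 3 matching documents each.
print(merge_matches([3, 3]))  # 6
```

A merged value equal to a single shard's count would mean the other shards' counts were dropped somewhere before or during this step.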

I can't reproduce the error you are describing on a small local setup I
have here. I have two Solr cores with a simple schema. Each core has 3
documents. When grouping, the matches element returns 6. I'm running on a
trunk that I updated 30 minutes ago. Can you try to isolate the
problem by testing with a small subset of your data?

Martijn


RE: Distributed grouping issue

2012-04-02 Thread Young, Cody
In the case of group=false:

numFound="26"

In the case of group=true:

34000

As a note, the grouped number changes when I hit refresh. It seems to display 
the count from any single shard. (The top match also changes).

I haven't tried this in other versions of solr.

All documents of a group exist on a single shard, there are no cross-shard 
groups.

Thanks,
Cody 

-----Original Message-----
From: martijn.is.h...@gmail.com [mailto:martijn.is.h...@gmail.com] On Behalf Of 
Martijn v Groningen
Sent: Monday, April 02, 2012 3:15 AM
To: solr-user@lucene.apache.org
Subject: Re: Distributed grouping issue

The "matches" element in the response should return the number of documents 
that matched with the query and not the number of groups.
Did you also encounter this issue with other Solr versions (3.5 or another 
nightly build)?

Martijn

On 2 April 2012 09:41, fbrisbart wrote:

> Hi,
>
> When you write "I get xxx results", does that number come from 'numFound', 
> or do you actually display xxx results?
> When using both field collapsing and sharding, the 'numFound' may be 
> wrong. In that case, consider using the 'shards.rows' parameter with a 
> high value (be careful, it's bad for performance).
>
> If the problem is really about the returned results, it may be because 
> of several documents having the same unique key "document_id" in 
> different shards.
>
> Hope it helps,
> Franck
>
>
>
> On Friday, 30 March 2012 at 23:52 +, Young, Cody wrote:
> > I forgot to mention, I can see the distributed requests happening in 
> > the
> logs:
> >
> > Mar 30, 2012 4:49:13 PM org.apache.solr.core.SolrCore execute
> > INFO: [core2] webapp=/solr path=/select
> params={group.distributed.first=true&distrib=false&wt=javabin&rows=10&
> version=2&fl=document_id,score&shard.url=localhost:8086/solr/core2&NOW
> =1333151353217&start=0&q=*:*&group.field=group_field&group=true&isShar
> d=true}
> status=0 QTime=2
> > Mar 30, 2012 4:49:13 PM org.apache.solr.core.SolrCore execute
> > INFO: [core4] webapp=/solr path=/select
> params={group.distributed.first=true&distrib=false&wt=javabin&rows=10&
> version=2&fl=document_id,score&shard.url=localhost:8086/solr/core4&NOW
> =1333151353217&start=0&q=*:*&group.field=group_field&group=true&isShar
> d=true}
> status=0 QTime=1
> > Mar 30, 2012 4:49:13 PM org.apache.solr.core.SolrCore execute
> > INFO: [core1] webapp=/solr path=/select
> params={group.distributed.first=true&distrib=false&wt=javabin&rows=10&
> version=2&fl=document_id,score&shard.url=localhost:8086/solr/core1&NOW
> =1333151353217&start=0&q=*:*&group.field=group_field&group=true&isShar
> d=true}
> status=0 QTime=1
> > Mar 30, 2012 4:49:13 PM org.apache.solr.core.SolrCore execute
> > INFO: [core3] webapp=/solr path=/select
> params={group.distributed.first=true&distrib=false&wt=javabin&rows=10&
> version=2&fl=document_id,score&shard.url=localhost:8086/solr/core3&NOW
> =1333151353217&start=0&q=*:*&group.field=group_field&group=true&isShar
> d=true}
> status=0 QTime=1
> > Mar 30, 2012 4:49:13 PM org.apache.solr.core.SolrCore execute
> > INFO: [core0] webapp=/solr path=/select
> params={group.distributed.first=true&distrib=false&wt=javabin&rows=10&
> version=2&fl=document_id,score&shard.url=localhost:8086/solr/core0&NOW
> =1333151353217&start=0&q=*:*&group.field=group_field&group=true&isShar
> d=true}
> status=0 QTime=1
> > Mar 30, 2012 4:49:13 PM org.apache.solr.core.SolrCore execute
> > INFO: [core6] webapp=/solr path=/select
> params={group.distributed.first=true&distrib=false&wt=javabin&rows=10&
> version=2&fl=document_id,score&shard.url=localhost:8086/solr/core6&NOW
> =1333151353217&start=0&q=*:*&group.field=group_field&group=true&isShar
> d=true}
> status=0 QTime=0
> > Mar 30, 2012 4:49:13 PM org.apache.solr.core.SolrCore execute
> > INFO: [core7] webapp=/solr path=/select
> params={group.distributed.first=true&distrib=false&wt=javabin&rows=10&
> version=2&fl=document_id,score&shard.url=localhost:8086/solr/core7&NOW
> =1333151353217&start=0&q=*:*&group.field=group_field&group=true&isShar
> d=true}
> status=0 QTime=3
> > Mar 30, 2012 4:49:13 PM org.apache.solr.core.SolrCore execute
> > INFO: [core5] webapp=/solr path=/select
> params={group.distributed.first=true&distrib=false&wt=javabin&rows=10&
> version=2&fl=doc

Re: Distributed grouping issue

2012-04-02 Thread Martijn v Groningen
roups.group_field=3554417600&group.topgroups.group_field=3140802904&fl=document_id,score&shard.url=localhost:8086/solr/core4&NOW=1333151353217&start=0&q=*:*&group.field=group_field&group=true&isShard=true}
> status=0 QTime=2
> > Mar 30, 2012 4:49:13 PM org.apache.solr.core.SolrCore execute
> > INFO: [core6] webapp=/solr path=/select
> params={distrib=false&group.distributed.second=true&wt=javabin&version=2&rows=10&group.topgroups.group_field=4183765296&group.topgroups.group_field=4608765424&group.topgroups.group_field=3524954944&group.topgroups.group_field=4182445488&group.topgroups.group_field=4213143392&group.topgroups.group_field=4328299312&group.topgroups.group_field=4206259648&group.topgroups.group_field=3465497912&group.topgroups.group_field=3554417600&group.topgroups.group_field=3140802904&fl=document_id,score&shard.url=localhost:8086/solr/core6&NOW=1333151353217&start=0&q=*:*&group.field=group_field&group=true&isShard=true}
> status=0 QTime=2
> > Mar 30, 2012 4:49:13 PM org.apache.solr.core.SolrCore execute
> > INFO: [core4] webapp=/solr path=/select
> params={NOW=1333151353217&shard.url=localhost:8086/solr/core4&ids=4182445488-535180165,3554417600-527549713,4608765424-526014561,3524954944-531590393,4183765296-514134497,4206259648-530219973,3465497912-534955957,4213143392-534186349,3140802904-538688961,4328299312-533482537&q=*:*&distrib=false&group.field=group_field&wt=javabin&isShard=true&version=2&rows=10}
> status=0 QTime=5
> > Mar 30, 2012 4:49:13 PM org.apache.solr.core.SolrCore execute
> > INFO: [core0] webapp=/solr path=/select/
> params={shards=localhost:8086/solr/core0,localhost:8086/solr/core1,localhost:8086/solr/core2,localhost:8086/solr/core3,localhost:8086/solr/core4,localhost:8086/solr/core5,localhost:8086/solr/core6,localhost:8086/solr/core7&q=*:*&group.field=group_field&group=true}
> status=0 QTime=106
> >
> > -Original Message-
> > From: Young, Cody [mailto:cody.yo...@move.com]
> > Sent: Friday, March 30, 2012 4:35 PM
> > To: solr-user@lucene.apache.org
> > Subject: Distributed grouping issue
> >
> > Hi All,
> >
> > I'm having an issue getting distributed grouping  working on trunk (Mar
> 29, 2012).
> >
> > If I send this query:
> >
> > http://localhost:8086/solr/core0/select/?q=*:*&group=false&shards=localhost:8086/solr/core0,localhost:8086/solr/core1,localhost:8086/solr/core2,localhost:8086/solr/core3,localhost:8086/solr/core4,localhost:8086/solr/core5,localhost:8086/solr/core6,localhost:8086/solr/core7
> >
> > I get 260,000 results. As soon as I change to using grouping:
> >
> >
> http://localhost:8086/solr/core0/select/?q=*:*&group=true&group.field=group_field&shards=localhost:8086/solr/core0,localhost:8086/solr/core1,localhost:8086/solr/core2,localhost:8086/solr/core3,localhost:8086/solr/core4,localhost:8086/solr/core5,localhost:8086/solr/core6,localhost:8086/solr/core7
> >
> > I only get 32,000 results. (the number of documents in a single core.)
> >
> > The field that I am grouping on is defined as:
> >
> >  multiValued="false" />
> >
> >  omitNorms="true"/>
> >
> > The document id:
> >
> >
> >  required="true" />
> >
> >  omitNorms="true"/>
> >
> > document_id
> >
> > Anyone else experiencing this? Any ideas?
> >
> > Thanks,
> > Cody
>
>
>


-- 
Met vriendelijke groet,

Martijn van Groningen


RE: Distributed grouping issue

2012-04-02 Thread fbrisbart
&group.topgroups.group_field=4183765296&group.topgroups.group_field=4608765424&group.topgroups.group_field=3524954944&group.topgroups.group_field=4182445488&group.topgroups.group_field=4213143392&group.topgroups.group_field=4328299312&group.topgroups.group_field=4206259648&group.topgroups.group_field=3465497912&group.topgroups.group_field=3554417600&group.topgroups.group_field=3140802904&fl=document_id,score&shard.url=localhost:8086/solr/core6&NOW=1333151353217&start=0&q=*:*&group.field=group_field&group=true&isShard=true}
>  status=0 QTime=2
> Mar 30, 2012 4:49:13 PM org.apache.solr.core.SolrCore execute
> INFO: [core4] webapp=/solr path=/select 
> params={NOW=1333151353217&shard.url=localhost:8086/solr/core4&ids=4182445488-535180165,3554417600-527549713,4608765424-526014561,3524954944-531590393,4183765296-514134497,4206259648-530219973,3465497912-534955957,4213143392-534186349,3140802904-538688961,4328299312-533482537&q=*:*&distrib=false&group.field=group_field&wt=javabin&isShard=true&version=2&rows=10}
>  status=0 QTime=5
> Mar 30, 2012 4:49:13 PM org.apache.solr.core.SolrCore execute
> INFO: [core0] webapp=/solr path=/select/ 
> params={shards=localhost:8086/solr/core0,localhost:8086/solr/core1,localhost:8086/solr/core2,localhost:8086/solr/core3,localhost:8086/solr/core4,localhost:8086/solr/core5,localhost:8086/solr/core6,localhost:8086/solr/core7&q=*:*&group.field=group_field&group=true}
>  status=0 QTime=106 
> 
> -Original Message-
> From: Young, Cody [mailto:cody.yo...@move.com] 
> Sent: Friday, March 30, 2012 4:35 PM
> To: solr-user@lucene.apache.org
> Subject: Distributed grouping issue
> 
> Hi All,
> 
> I'm having an issue getting distributed grouping  working on trunk (Mar 29, 
> 2012).
> 
> If I send this query:
> 
> http://localhost:8086/solr/core0/select/?q=*:*&group=false&shards=localhost:8086/solr/core0,localhost:8086/solr/core1,localhost:8086/solr/core2,localhost:8086/solr/core3,localhost:8086/solr/core4,localhost:8086/solr/core5,localhost:8086/solr/core6,localhost:8086/solr/core7
> 
> I get 260,000 results. As soon as I change to using grouping:
> 
> http://localhost:8086/solr/core0/select/?q=*:*&group=true&group.field=group_field&shards=localhost:8086/solr/core0,localhost:8086/solr/core1,localhost:8086/solr/core2,localhost:8086/solr/core3,localhost:8086/solr/core4,localhost:8086/solr/core5,localhost:8086/solr/core6,localhost:8086/solr/core7
> 
> I only get 32,000 results. (the number of documents in a single core.)
> 
> The field that I am grouping on is defined as:
> 
>  multiValued="false" />
> 
>  omitNorms="true"/>
> 
> The document id:
> 
> 
>  required="true" />
> 
>  omitNorms="true"/>
> 
> document_id
> 
> Anyone else experiencing this? Any ideas?
> 
> Thanks,
> Cody




RE: Distributed grouping issue

2012-03-30 Thread Young, Cody
45488-535180165,3554417600-527549713,4608765424-526014561,3524954944-531590393,4183765296-514134497,4206259648-530219973,3465497912-534955957,4213143392-534186349,3140802904-538688961,4328299312-533482537&q=*:*&distrib=false&group.field=group_field&wt=javabin&isShard=true&version=2&rows=10}
 status=0 QTime=5
Mar 30, 2012 4:49:13 PM org.apache.solr.core.SolrCore execute
INFO: [core0] webapp=/solr path=/select/ 
params={shards=localhost:8086/solr/core0,localhost:8086/solr/core1,localhost:8086/solr/core2,localhost:8086/solr/core3,localhost:8086/solr/core4,localhost:8086/solr/core5,localhost:8086/solr/core6,localhost:8086/solr/core7&q=*:*&group.field=group_field&group=true}
 status=0 QTime=106 

-----Original Message-----
From: Young, Cody [mailto:cody.yo...@move.com] 
Sent: Friday, March 30, 2012 4:35 PM
To: solr-user@lucene.apache.org
Subject: Distributed grouping issue

Hi All,

I'm having an issue getting distributed grouping working on trunk (Mar 29, 
2012).

If I send this query:

http://localhost:8086/solr/core0/select/?q=*:*&group=false&shards=localhost:8086/solr/core0,localhost:8086/solr/core1,localhost:8086/solr/core2,localhost:8086/solr/core3,localhost:8086/solr/core4,localhost:8086/solr/core5,localhost:8086/solr/core6,localhost:8086/solr/core7

I get 260,000 results. As soon as I change to using grouping:

http://localhost:8086/solr/core0/select/?q=*:*&group=true&group.field=group_field&shards=localhost:8086/solr/core0,localhost:8086/solr/core1,localhost:8086/solr/core2,localhost:8086/solr/core3,localhost:8086/solr/core4,localhost:8086/solr/core5,localhost:8086/solr/core6,localhost:8086/solr/core7

I only get 32,000 results. (the number of documents in a single core.)

The field that I am grouping on is defined as:





The document id:






document_id

Anyone else experiencing this? Any ideas?

Thanks,
Cody


Distributed grouping issue

2012-03-30 Thread Young, Cody
Hi All,

I'm having an issue getting distributed grouping working on trunk (Mar 29, 
2012).

If I send this query:

http://localhost:8086/solr/core0/select/?q=*:*&group=false&shards=localhost:8086/solr/core0,localhost:8086/solr/core1,localhost:8086/solr/core2,localhost:8086/solr/core3,localhost:8086/solr/core4,localhost:8086/solr/core5,localhost:8086/solr/core6,localhost:8086/solr/core7

I get 260,000 results. As soon as I change to using grouping:

http://localhost:8086/solr/core0/select/?q=*:*&group=true&group.field=group_field&shards=localhost:8086/solr/core0,localhost:8086/solr/core1,localhost:8086/solr/core2,localhost:8086/solr/core3,localhost:8086/solr/core4,localhost:8086/solr/core5,localhost:8086/solr/core6,localhost:8086/solr/core7

I only get 32,000 results (the number of documents in a single core).
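
A quick sanity check on these numbers, as a sketch in Python. It rebuilds the shards parameter and both URLs, and assumes for illustration that each of the eight cores holds roughly the 32,000 documents seen in the grouped response. The names HOST, CORES and select_url are made up for this example.

```python
HOST = "localhost:8086"
CORES = 8

# core0 .. core7, comma-separated, as passed in the shards parameter.
shards = ",".join(f"{HOST}/solr/core{i}" for i in range(CORES))


def select_url(group):
    """Build the ungrouped or grouped distributed query URL."""
    if group:
        params = "q=*:*&group=true&group.field=group_field"
    else:
        params = "q=*:*&group=false"
    return f"http://{HOST}/solr/core0/select/?{params}&shards={shards}"


print(select_url(False))
print(select_url(True))

# Assumption: about 32,000 documents per core.
print(CORES * 32000)  # 256000, close to the 260,000 seen without grouping
```

So the ungrouped total is in line with all eight cores being counted, while the grouped total looks like the count from just one of them.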

The field that I am grouping on is defined as:





The document id:






document_id

Anyone else experiencing this? Any ideas?

Thanks,
Cody