Issue with field collapsing in solr 4 while performing distributed search

2012-06-11 Thread Nitesh Nandy
Version: Solr 4.0 (svn build of 30 May 2012) with Solr Cloud (2 slices and
2 shards)

The setup was done as per the wiki: http://wiki.apache.org/solr/SolrCloud

We are doing distributed search. When querying, we use field collapsing
with group.ngroups=true because we need the number of search results.

However, the number of groups in the returned result list differs from
the ngroups value returned.

Example:
http://localhost:8983/solr/select?q=message:blah%20AND%20userid:3&group=true&group.field=id&group.ngroups=true


The response XML looks like:

<response>
  <script/>
  <lst name="responseHeader">
    <int name="status">0</int>
    <int name="QTime">46</int>
    <lst name="params">
      <str name="group.field">id</str>
      <str name="group.ngroups">true</str>
      <str name="group">true</str>
      <str name="q">messagebody:monit AND usergroupid:3</str>
    </lst>
  </lst>
  <lst name="grouped">
    <lst name="id">
      <int name="matches">10</int>
      <int name="ngroups">9</int>
      <arr name="groups">
        <lst>
          <str name="groupValue">320043</str>
          <result name="doclist" numFound="1" start="0">
            <doc>...</doc>
          </result>
        </lst>
        <lst>
          <str name="groupValue">398807</str>
          <result name="doclist" numFound="5" start="0" maxScore="2.4154348">...
          </result>
        </lst>
        <lst>
          <str name="groupValue">346878</str>
          <result name="doclist" numFound="2" start="0">...</result>
        </lst>
        <lst>
          <str name="groupValue">346880</str>
          <result name="doclist" numFound="2" start="0">...</result>
        </lst>
      </arr>
    </lst>
  </lst>
</response>

So you can see that the ngroups value returned is 9, while the actual number
of groups returned is 4.

Why is there a discrepancy between ngroups, matches, and the actual number
of groups? Is this an open issue?

Any kind of help is appreciated.

-- 
Regards,

Nitesh Nandy


Re: Issue with field collapsing in solr 4 while performing distributed search

2012-06-11 Thread Martijn v Groningen
ngroups returns the number of groups that matched the query. However,
if you want ngroups to be correct in a distributed environment, you
need to put all documents belonging to the same group into the same
shard. Groups can't cross shard boundaries. I guess you need to do
some manual document partitioning.
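
To illustrate why a group that spans shards inflates the count, here is a
minimal Python sketch. It assumes the coordinating node obtains ngroups by
summing each shard's own distinct-group count; that merge behaviour is an
assumption made for illustration and is not confirmed in this thread. The
shard contents and group values ("a", "b", "c") are invented.

```python
# Each inner list is one hypothetical shard; each entry is the group value
# of one matching document on that shard.

def summed_ngroups(shards):
    """Sum each shard's distinct-group count (the assumed merge behaviour)."""
    return sum(len(set(groups)) for groups in shards)

def true_ngroups(shards):
    """Count distinct groups across all shards combined."""
    return len(set().union(*shards))

# Group "b" has documents on both shards, so summing counts it twice.
split = [["a", "b", "b"], ["b", "c"]]
# Same documents, but every group is confined to a single shard.
colocated = [["a", "c"], ["b", "b", "b"]]

print(summed_ngroups(split), true_ngroups(split))          # 4 3
print(summed_ngroups(colocated), true_ngroups(colocated))  # 3 3
```

Under that assumption, the summed count only matches the true count when no
group has documents on more than one shard.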

Martijn


-- 
Met vriendelijke groet,

Martijn van Groningen


Re: Issue with field collapsing in solr 4 while performing distributed search

2012-06-11 Thread Jack Krupansky
Is there a Solr wiki page that discusses these issues, such as "Groups can't
cross shard boundaries"? It seems like this should be highlighted prominently,
maybe here:

http://wiki.apache.org/solr/FieldCollapsing

Seems like it should be mentioned on the distributed/SolrCloud wiki(s) as 
well.


Is this a distributed IDF type of issue or something else? Is this an 
outright bug or an (insurmountable?) limitation?


I did notice SOLR-2066, but didn't see mention of the limitation. Are there 
any other limitations for distributed grouping?


-- Jack Krupansky



Re: Issue with field collapsing in solr 4 while performing distributed search

2012-06-11 Thread Nitesh Nandy
Martijn,

How do we add a custom algorithm for distributing documents in Solr Cloud?
According to this discussion:
http://lucene.472066.n3.nabble.com/SolrCloud-how-to-index-documents-into-a-specific-core-and-how-to-search-against-that-core-td3985262.html
Mark discourages users from using a custom distribution mechanism in Solr
Cloud.

Load balancing is not an issue for us at the moment. In that case, how
should we implement a custom partitioning algorithm?



-- 
Regards,

Nitesh Nandy


Re: Issue with field collapsing in solr 4 while performing distributed search

2012-06-11 Thread roz dev
I think that there is no way around doing custom logic in this case.

If the indexing process knows that documents have to be grouped, then it
had better keep them together on the same shard.
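
One way to sketch that custom logic on the client side, in Python: hash the
value of the grouping field and use it to pick a shard, so all documents of a
group land in the same bucket. The shard URLs, the field name "thread_id" and
the helper names are hypothetical. How each bucket is then sent to its shard,
and whether a given SolrCloud build leaves documents where the client put
them, is not covered here.

```python
import zlib

# Hypothetical shard base URLs; substitute the real ones for your cluster.
SHARD_URLS = [
    "http://localhost:8983/solr",
    "http://localhost:7574/solr",
]

def shard_for(group_value, num_shards):
    """Map a group value to a shard index using a stable hash."""
    # zlib.crc32 gives the same result on every run, unlike the built-in
    # hash() for strings, which is randomised per process.
    return zlib.crc32(str(group_value).encode("utf-8")) % num_shards

def partition(docs, group_field, num_shards):
    """Bucket documents per shard so each group ends up on exactly one shard."""
    buckets = [[] for _ in range(num_shards)]
    for doc in docs:
        buckets[shard_for(doc[group_field], num_shards)].append(doc)
    return buckets

# Invented documents; "thread_id" stands in for the field used in group.field.
docs = [
    {"id": "1", "thread_id": "t1"},
    {"id": "2", "thread_id": "t2"},
    {"id": "3", "thread_id": "t1"},
    {"id": "4", "thread_id": "t3"},
]
buckets = partition(docs, "thread_id", len(SHARD_URLS))
for url, bucket in zip(SHARD_URLS, buckets):
    print(url, [d["id"] for d in bucket])
```

Note that changing the number of shards changes the mapping, so documents
indexed earlier would have to be re-indexed to keep groups together.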

-Saroj

