Issue with field collapsing in solr 4 while performing distributed search
Version: Solr 4.0 (SVN build of 30 May 2012) with SolrCloud (2 slices and 2 shards). The setup was done as described on the wiki: http://wiki.apache.org/solr/SolrCloud

We are doing distributed search. While querying, we use field collapsing with group.ngroups set to true, because we need the number of search results. However, the number of groups returned in the result list differs from the ngroups value returned.

Example:
http://localhost:8983/solr/select?q=message:blah%20AND%20userid:3&group=true&group.field=id&group.ngroups=true

The response XML looks like this:

    <response>
      <lst name="responseHeader">
        <int name="status">0</int>
        <int name="QTime">46</int>
        <lst name="params">
          <str name="group.field">id</str>
          <str name="group.ngroups">true</str>
          <str name="group">true</str>
          <str name="q">messagebody:monit AND usergroupid:3</str>
        </lst>
      </lst>
      <lst name="grouped">
        <lst name="id">
          <int name="matches">10</int>
          <int name="ngroups">9</int>
          <arr name="groups">
            <lst>
              <str name="groupValue">320043</str>
              <result name="doclist" numFound="1" start="0"><doc>...</doc></result>
            </lst>
            <lst>
              <str name="groupValue">398807</str>
              <result name="doclist" numFound="5" start="0" maxScore="2.4154348">...</result>
            </lst>
            <lst>
              <str name="groupValue">346878</str>
              <result name="doclist" numFound="2" start="0">...</result>
            </lst>
            <lst>
              <str name="groupValue">346880</str>
              <result name="doclist" numFound="2" start="0">...</result>
            </lst>
          </arr>
        </lst>
      </lst>
    </response>

So you can see that the ngroups value returned is 9, while the actual number of groups returned is 4. Why is there a discrepancy between ngroups, matches, and the actual number of groups? Is this an open issue?

Any kind of help is appreciated.

--
Regards,
Nitesh Nandy
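To make the discrepancy concrete, here is a minimal Python sketch that parses a trimmed copy of the grouped section above (the document bodies were elided in the post, so they are left out) with the standard library's xml.etree.ElementTree. The helper name summarize_grouping is hypothetical. It shows that the four returned groups already account for all 10 matches (1 + 5 + 2 + 2), so for this query ngroups = 9 cannot be the number of distinct groups.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the "grouped" section from the response above.
# The <doc> bodies were elided in the original post, so they are omitted here.
RESPONSE = """
<response>
  <lst name="grouped">
    <lst name="id">
      <int name="matches">10</int>
      <int name="ngroups">9</int>
      <arr name="groups">
        <lst><str name="groupValue">320043</str><result name="doclist" numFound="1" start="0"/></lst>
        <lst><str name="groupValue">398807</str><result name="doclist" numFound="5" start="0"/></lst>
        <lst><str name="groupValue">346878</str><result name="doclist" numFound="2" start="0"/></lst>
        <lst><str name="groupValue">346880</str><result name="doclist" numFound="2" start="0"/></lst>
      </arr>
    </lst>
  </lst>
</response>
"""


def summarize_grouping(xml_text, field):
    """Hypothetical helper: return (matches, ngroups, groups returned, sum of per-group numFound)."""
    root = ET.fromstring(xml_text.strip())
    grouped = root.find(f"./lst[@name='grouped']/lst[@name='{field}']")
    matches = int(grouped.find("./int[@name='matches']").text)
    ngroups = int(grouped.find("./int[@name='ngroups']").text)
    groups = grouped.findall("./arr[@name='groups']/lst")
    # numFound on each group's doclist is the total number of documents in that group.
    docs = sum(int(g.find("./result").get("numFound")) for g in groups)
    return matches, ngroups, len(groups), docs


print(summarize_grouping(RESPONSE, "id"))  # (10, 9, 4, 10)
```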
Re: Issue with field collapsing in solr 4 while performing distributed search
The ngroups value is the number of groups that matched the query. However, if you want ngroups to be correct in a distributed environment, you need to put documents belonging to the same group into the same shard. Groups can't cross shard boundaries. I guess you need to do some manual document partitioning.

Martijn

--
Kind regards,
Martijn van Groningen

On 11 June 2012 14:29, Nitesh Nandy <niteshna...@gmail.com> wrote:
> So you can see that the ngroups value returned is 9 and the actual number of groups returned is 4
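Martijn's point can be illustrated with a simplified model. It assumes, as a simplification rather than a description of Solr's actual code, that each shard counts the distinct group values among its own matching documents and the coordinating node adds those per-shard counts together. Under that assumption, a group whose documents sit on two shards is counted twice. The function names below are hypothetical.

```python
# Simplified model (an assumption, not Solr's implementation): each shard reports
# the number of distinct group values among its own matches, and the
# coordinating node sums those per-shard counts.


def distributed_ngroups(shards):
    """Hypothetical: sum of per-shard distinct-group counts."""
    return sum(len(set(groups)) for groups in shards)


def true_ngroups(shards):
    """Hypothetical: number of distinct group values across all shards."""
    return len({g for groups in shards for g in groups})


# Each inner list holds the group-field value of every matching document on one shard.
split = [["a", "a", "b"], ["a", "b", "c"]]      # groups a and b span both shards
colocated = [["a", "a", "a", "b", "b"], ["c"]]  # every group lives on exactly one shard

print(distributed_ngroups(split), true_ngroups(split))          # 5 3
print(distributed_ngroups(colocated), true_ngroups(colocated))  # 3 3
```

With co-located groups the two counts agree, which is why partitioning documents by group at indexing time fixes ngroups.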
Re: Issue with field collapsing in solr 4 while performing distributed search
Is there a Solr wiki page that discusses these issues, such as "Groups can't cross shard boundaries"? It seems like this should be highlighted prominently, maybe here: http://wiki.apache.org/solr/FieldCollapsing

It should probably be mentioned on the distributed/SolrCloud wiki page(s) as well.

Is this a distributed-IDF type of issue or something else? Is this an outright bug or an (insurmountable?) limitation? I did notice SOLR-2066, but didn't see any mention of the limitation there. Are there any other limitations for distributed grouping?

--
Jack Krupansky

-----Original Message-----
From: Martijn v Groningen
Sent: Monday, June 11, 2012 8:53 AM
To: solr-user@lucene.apache.org
Subject: Re: Issue with field collapsing in solr 4 while performing distributed search

> Groups can't cross shard boundaries. I guess you need to do some manual document partitioning.
Re: Issue with field collapsing in solr 4 while performing distributed search
Martijn,

How do we add a custom algorithm for distributing documents in SolrCloud? According to this discussion, Mark discourages users from using a custom distribution mechanism in SolrCloud:
http://lucene.472066.n3.nabble.com/SolrCloud-how-to-index-documents-into-a-specific-core-and-how-to-search-against-that-core-td3985262.html

Load balancing is not an issue for us at the moment. In that case, how should we implement a custom partitioning algorithm?

--
Regards,
Nitesh Nandy

On Mon, Jun 11, 2012 at 6:23 PM, Martijn v Groningen <martijn.v.gronin...@gmail.com> wrote:
> I guess you need to do some manual document partitioning.
Re: Issue with field collapsing in solr 4 while performing distributed search
I think there is no way around custom logic in this case. If the indexing process knows that documents have to be grouped, then those documents had better be indexed together, in the same shard.

-Saroj

On Mon, Jun 11, 2012 at 6:37 AM, Nitesh Nandy <niteshna...@gmail.com> wrote:
> In that case, how should we implement a custom partitioning algorithm.
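Saroj's suggestion amounts to routing on the group value at indexing time. The sketch below is hypothetical client-side logic. It assumes your indexing code can send each batch to a specific shard, which, as the thread notes, SolrCloud did not make easy at the time. The names shard_for_group, partition and NUM_SHARDS are illustrative only, and the grp field stands in for whatever field you group on.

```python
import zlib
from collections import defaultdict

NUM_SHARDS = 2  # the original post describes a 2-shard setup


def shard_for_group(group_value: str, num_shards: int = NUM_SHARDS) -> int:
    """Hypothetical router: choose a shard from the group value alone.

    zlib.crc32 is used instead of the built-in hash(), which is randomized per
    process for strings and would scatter one group across runs.
    """
    return zlib.crc32(group_value.encode("utf-8")) % num_shards


def partition(docs, group_field):
    """Bucket documents by target shard so that every group stays on one shard."""
    buckets = defaultdict(list)
    for doc in docs:
        buckets[shard_for_group(str(doc[group_field]))].append(doc)
    return dict(buckets)


# Usage: documents sharing a group value always land in the same bucket.
docs = [
    {"id": "d1", "grp": "320043"},
    {"id": "d2", "grp": "398807"},
    {"id": "d3", "grp": "320043"},
    {"id": "d4", "grp": "346878"},
]
for shard, batch in sorted(partition(docs, "grp").items()):
    print(shard, [d["id"] for d in batch])
```

Later Solr 4.x releases added a compositeId router that does this on the server side: documents whose ids share a prefix such as "320043!doc1" are kept on the same shard. Check the reference guide for the version you run.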