This won't work; see my thread on Solr 3.6 field collapsing.

Thanks,
Tirthankar
-----Original Message-----
From: Tom Burton-West <tburt...@umich.edu>
Date: Tue, 21 Aug 2012 18:39:25
To: solr-user@lucene.apache.org <solr-user@lucene.apache.org>
Reply-To: "solr-user@lucene.apache.org" <solr-user@lucene.apache.org>
Cc: William Dueber <dueb...@umich.edu>; Phillip Farber <pfar...@umich.edu>
Subject: Scalability of Solr Result Grouping/Field Collapsing: Millions/Billions of documents?

Hello all,

We are thinking about using Solr Field Collapsing on a rather large scale and wonder if anyone has experience with performance when doing Field Collapsing on millions or billions of documents (details below). Are there performance issues with grouping large result sets?

Details:

We have a collection of the full text of 10 million books/journals. This is spread across 12 shards, with each shard holding about 800,000 documents. When a query matches a journal article, we would like to group all the matching articles from the same journal together. (There is a unique id field identifying the journal.) Similarly, when there is a match in multiple copies of the same book, we would like to group all results for the same book together (again, we have a unique id field we can group on). Sometimes a short query against the OCR field will result in over one million hits. Are there known performance issues when field collapsing result sets containing a million hits?

We currently index the entire book as one Solr document. We would like to investigate the feasibility of indexing each page as a Solr document with a field indicating the book id. We could then offer our users the choice of a list of the most relevant pages, or a list of the books containing the most relevant pages. We have approximately 3 billion pages. Does anyone have experience using field collapsing on this sort of scale?
Tom

Tom Burton-West
Information Retrieval Programmer
Digital Library Production Service
University of Michigan Library
http://www.hathitrust.org/blogs/large-scale-search
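
For anyone following the thread, the kind of grouped request described above can be sketched as below. This is only an illustration and not the configuration used at HathiTrust: the host and path, the field names `ocr` and `journal_id`, and the helper `build_grouped_query` are all assumptions made up for the example. The `group`, `group.field`, `group.limit` and `group.ngroups` parameters are the standard Solr Result Grouping request parameters.

```python
from urllib.parse import urlencode


def build_grouped_query(base_url, query, group_field, rows=10, group_limit=3):
    """Build a Solr select URL that groups (collapses) hits on one field.

    base_url and the field names passed in are hypothetical; only the
    parameter names are Solr's own Result Grouping parameters.
    """
    params = {
        "q": query,
        "rows": rows,                # with grouping on, this is the number of groups returned
        "wt": "json",
        "group": "true",             # turn on result grouping
        "group.field": group_field,  # field whose value identifies the journal or book
        "group.limit": group_limit,  # documents returned inside each group
        "group.ngroups": "true",     # also report how many groups matched
    }
    return base_url + "?" + urlencode(params)


# Hypothetical host, query field, and grouping field.
url = build_grouped_query(
    "http://localhost:8983/solr/select", "ocr:whale", "journal_id"
)
print(url)
```

For the page-level index, the same request would presumably group on the book id field instead. Whether this performs acceptably across a million hits or 3 billion page documents is exactly the open question in this thread; the sketch says nothing about that.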