The data are fine and not duplicated. However, I want to analyze the data and summarize one field (somewhat like faceting) to find out what its largest value is.
For example:

Document 1: label=1A1A1; body="adfasdfadsfasf"
Document 2: label=5A1B1; body="adfaasdfasdfsdfadsfasf"
Document 3: label=1A1A1; body="adasdfasdfasdffaasdfasdfsdfadsfasf"
Document 4: label=7A1A1; body="azxzxcvdfaasdfasdfsdfadsfasf"
Document 5: label=7A1A1; body="azxzxcvdfaasdfasdfsdasdaaaaafadsfasf"
Document 6: label=5A1B1; body="adfaasdfasdfsdfadsfasfzzz"

How do I get back just ONE hit for the largest "label"? In other words, what query will return the 7A1A1 label just once?

If I search for q=* and sort the results, it works, except that I get back multiple hits for each label. If I facet, I can only sort in increasing order, when what I want is decreasing order.

-Pete

On Apr 6, 2011, at 10:22 PM, Otis Gospodnetic wrote:

> Hi,
>
> I think you are saying dupes are the main problem? If so,
> http://wiki.apache.org/solr/Deduplication ?
>
> Otis
> ----
> Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
> Lucene ecosystem search :: http://search-lucene.com/
>
> ----- Original Message ----
>> From: Peter Spam <ps...@mac.com>
>> To: solr-user@lucene.apache.org
>> Sent: Thu, April 7, 2011 1:13:44 AM
>> Subject: Tips for getting unique results?
>>
>> Hi,
>>
>> I have documents with a field that has "1A2B3C" alphanumeric characters. I
>> can query for * and sort results based on this field, however I'd like to
>> "uniq" these results (remove duplicates) so that I can get the 5 largest
>> unique values. I can't use the StatsComponent because my values have
>> letters in them too.
>>
>> Faceting (and ignoring the counts) gets me half of the way there, but I can
>> only sort ascending. If I could also sort facet results descending, I'd be
>> done. I'd rather not return all documents and just parse the last few
>> results to work around this.
>>
>> Any ideas?
>>
>> -Pete
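
To make the desired result concrete, here is a minimal Python sketch of the semantics being asked for, applied to the six sample labels from the message above. It is only a client-side illustration of the expected output, not a Solr query: the function name `top_unique_labels` is hypothetical, and it assumes that "largest" means plain string ordering of the label values.

```python
def top_unique_labels(labels, n):
    """Return the n largest distinct labels, in descending string order."""
    # set() drops duplicate labels; sorted(..., reverse=True) orders the
    # remaining strings from largest to smallest by plain string comparison
    # (an assumption about what "largest" means for these values).
    return sorted(set(labels), reverse=True)[:n]


# Labels of Documents 1-6 from the example.
labels = ["1A1A1", "5A1B1", "1A1A1", "7A1A1", "7A1A1", "5A1B1"]

print(top_unique_labels(labels, 1))  # ['7A1A1']
print(top_unique_labels(labels, 5))  # ['7A1A1', '5A1B1', '1A1A1']
```

With n=1 this returns the 7A1A1 label exactly once; with n=5 it returns all three distinct labels, because the sample contains only three.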