Re: Tips for getting unique results?
Hi Pete,

Still think facets are what you need. We use facets to identify the most common tags for documents in our library. I use them to print the top 25 most common document tags. The sort by count (the default) gives you the one with the highest count first, then the next most common, and so on.

Hope this helps.

Shaun

On 8 April 2011 19:28, Peter Spam wrote:
> Thanks for the note, Shaun, but the documentation indicates that the
> sorting is only in ascending order :-(
>
> facet.sort
>
> This param determines the ordering of the facet field constraints.
>
> • count - sort the constraints by count (highest count first)
> • index - to return the constraints sorted in their index order
>   (lexicographic by indexed term). For terms in the ascii range, this will be
>   alphabetically sorted.
>
> The default is count if facet.limit is greater than 0, index otherwise.
>
> Prior to Solr 1.4, one needed to use true instead of count and false instead
> of index.
>
> This parameter can be specified on a per field basis.
>
> -Pete
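A minimal sketch of the kind of facet request Shaun describes, in Python. The parameter names (`facet`, `facet.field`, `facet.limit`, `facet.sort`) come from the SimpleFacetParameters page cited in this thread; the field name `tags` and the helper `facet_query_params` are assumptions made up for illustration, not anything from Shaun's setup.

```python
from urllib.parse import urlencode


def facet_query_params(field, limit=25):
    """Build query parameters asking for the `limit` most common values of `field`.

    `field` is a hypothetical field name; substitute your own schema's field.
    """
    return {
        "q": "*:*",             # match every document
        "rows": 0,              # only the facet counts are wanted, not the hits
        "facet": "true",
        "facet.field": field,
        "facet.limit": limit,
        "facet.sort": "count",  # highest count first
    }


# Query string to append to a select URL of your own Solr instance.
query_string = urlencode(facet_query_params("tags"))
print(query_string)
```

Swapping `facet.sort` to `index` would return the values in lexicographic order instead, which is the ordering Pete is asking about.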
Re: Tips for getting unique results?
Thanks for the note, Shaun, but the documentation indicates that the sorting is only in ascending order :-(

facet.sort

This param determines the ordering of the facet field constraints.

• count - sort the constraints by count (highest count first)
• index - to return the constraints sorted in their index order (lexicographic by indexed term). For terms in the ascii range, this will be alphabetically sorted.

The default is count if facet.limit is greater than 0, index otherwise.

Prior to Solr 1.4, one needed to use true instead of count and false instead of index.

This parameter can be specified on a per field basis.

-Pete

On Apr 8, 2011, at 2:49 AM, Shaun Campbell wrote:
> Pete
>
> Surely the default sort order for facets is by descending count order. See
> http://wiki.apache.org/solr/SimpleFacetParameters. If your results are
> really sorted in ascending order can't you sort them externally eg Java?
>
> Hope that helps.
>
> Shaun
Re: Tips for getting unique results?
Pete

Surely the default sort order for facets is by descending count order. See http://wiki.apache.org/solr/SimpleFacetParameters. If your results are really sorted in ascending order can't you sort them externally eg Java?

Hope that helps.

Shaun
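A rough sketch of the external sort Shaun suggests, written in Python rather than Java. It assumes the facet values arrive as a flat list alternating term and count (the default layout of Solr's JSON response, as far as I know) and that all values were requested in index order. The function name `largest_facet_values` and the sample data are made up for illustration.

```python
def largest_facet_values(flat_facets, n=5):
    """Return the n largest distinct terms from a flat [term, count, ...] list.

    "Largest" means last in plain string (lexicographic) order, which is
    how index-ordered facet terms compare.
    """
    terms = flat_facets[0::2]
    counts = flat_facets[1::2]
    # Ignore terms that no matching document carries.
    present = [term for term, count in zip(terms, counts) if count > 0]
    return sorted(present, reverse=True)[:n]


# Sample data shaped like a facet_fields entry; the values are illustrative.
sample = ["1A1A1", 2, "5A1B1", 2, "7A1A1", 2]
print(largest_facet_values(sample, n=2))
```

This still transfers every distinct value to the client, so it only makes sense when the number of distinct terms is modest.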
Re: Tips for getting unique results?
I think you can specify the in-group sort, and specify a very small number (perhaps even one) to go in each group. But you'd have to store the length of each body and sort by that. I'm pretty sure grouping is trunk-only.

The problem here is getting something that applies just within the group and not across groups... I'm not sure how to tackle that other than perhaps the grouping idea...

Best
Erick

On Thu, Apr 7, 2011 at 6:36 PM, Peter Spam wrote:
> Would grouping solve this? I'd rather not move to a pre-release solr ...
>
> To clarify the problem:
>
> The data are fine and not duplicated - however, I want to analyze the data,
> and summarize one field (kind of like faceting), to understand what the
> largest value is.
>
> For example:
>
> Document 1: label=1A1A1; body="adfasdfadsfasf"
> Document 2: label=5A1B1; body="adfaasdfasdfsdfadsfasf"
> Document 3: label=1A1A1; body="adasdfasdfasdffaasdfasdfsdfadsfasf"
> Document 4: label=7A1A1; body="azxzxcvdfaasdfasdfsdfadsfasf"
> Document 5: label=7A1A1; body="azxzxcvdfaasdfasdfsdasdafadsfasf"
> Document 6: label=5A1B1; body="adfaasdfasdfsdfadsfasfzzz"
>
> How do I get back just ONE of the largest "label" item?
>
> In other words, what query will return the 7A1A1 label just once? If I
> search for q=* and sort the results, it works, except I get back multiple
> hits for each label. If I do a facet, I can only sort by increasing order,
> when what I want is decreasing order.
>
> -Peter
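To make Erick's idea concrete, here is a small Python sketch that simulates it client-side on Peter's six example documents: group by label, and within each group keep only the document with the longest body. The `id` keys, the function name, and the `body_length` field are assumptions for illustration. The `GROUPING_PARAMS` dictionary shows what a server-side request might look like using Solr's result grouping parameters as I understand them; check the names against the grouping documentation for your version before relying on them.

```python
# Peter's example documents; the "id" key is added here for illustration.
DOCS = [
    {"id": 1, "label": "1A1A1", "body": "adfasdfadsfasf"},
    {"id": 2, "label": "5A1B1", "body": "adfaasdfasdfsdfadsfasf"},
    {"id": 3, "label": "1A1A1", "body": "adasdfasdfasdffaasdfasdfsdfadsfasf"},
    {"id": 4, "label": "7A1A1", "body": "azxzxcvdfaasdfasdfsdfadsfasf"},
    {"id": 5, "label": "7A1A1", "body": "azxzxcvdfaasdfasdfsdasdafadsfasf"},
    {"id": 6, "label": "5A1B1", "body": "adfaasdfasdfsdfadsfasfzzz"},
]


def longest_body_per_label(docs):
    """Keep one document per label: the one with the longest body.

    On a tie, the document seen first is kept.
    """
    best = {}
    for doc in docs:
        current = best.get(doc["label"])
        if current is None or len(doc["body"]) > len(current["body"]):
            best[doc["label"]] = doc
    return best


# Assumed server-side equivalent. "body_length" is a hypothetical stored
# field holding the length of each body, as Erick suggests.
GROUPING_PARAMS = {
    "q": "*:*",
    "group": "true",
    "group.field": "label",
    "group.limit": 1,                  # one document per group
    "group.sort": "body_length desc",  # in-group sort
}

winners = longest_body_per_label(DOCS)
print({label: doc["id"] for label, doc in winners.items()})
```

The in-group sort only decides which document represents each label; ordering the groups themselves (for example, largest label first) is a separate sort.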
Re: Tips for getting unique results?
Would grouping solve this? I'd rather not move to a pre-release solr ...

To clarify the problem:

The data are fine and not duplicated - however, I want to analyze the data, and summarize one field (kind of like faceting), to understand what the largest value is.

For example:

Document 1: label=1A1A1; body="adfasdfadsfasf"
Document 2: label=5A1B1; body="adfaasdfasdfsdfadsfasf"
Document 3: label=1A1A1; body="adasdfasdfasdffaasdfasdfsdfadsfasf"
Document 4: label=7A1A1; body="azxzxcvdfaasdfasdfsdfadsfasf"
Document 5: label=7A1A1; body="azxzxcvdfaasdfasdfsdasdafadsfasf"
Document 6: label=5A1B1; body="adfaasdfasdfsdfadsfasfzzz"

How do I get back just ONE of the largest "label" item?

In other words, what query will return the 7A1A1 label just once? If I search for q=* and sort the results, it works, except I get back multiple hits for each label. If I do a facet, I can only sort by increasing order, when what I want is decreasing order.

-Peter

On Apr 7, 2011, at 10:02 AM, Erick Erickson wrote:
> What version of Solr are you using? And, assuming the version that
> has it in, have you seen grouping?
>
> Which is another way of asking why you want to do this, perhaps it's an
> XY problem
>
> Best
> Erick
Re: Tips for getting unique results?
What version of Solr are you using? And, assuming the version that has it in, have you seen grouping?

Which is another way of asking why you want to do this, perhaps it's an XY problem

Best
Erick

On Thu, Apr 7, 2011 at 1:13 AM, Peter Spam wrote:
> Hi,
>
> I have documents with a field that has "1A2B3C" alphanumeric characters. I
> can query for * and sort results based on this field, however I'd like to
> "uniq" these results (remove duplicates) so that I can get the 5 largest
> unique values. I can't use the StatsComponent because my values have
> letters in them too.
>
> Faceting (and ignoring the counts) gets me half of the way there, but I can
> only sort ascending. If I could also sort facet results descending, I'd be
> done. I'd rather not return all documents and just parse the last few
> results to work around this.
>
> Any ideas?
>
> -Pete
Re: Tips for getting unique results?
The data are fine and not duplicated - however, I want to analyze the data, and summarize one field (kind of like faceting), to understand what the largest value is.

For example:

Document 1: label=1A1A1; body="adfasdfadsfasf"
Document 2: label=5A1B1; body="adfaasdfasdfsdfadsfasf"
Document 3: label=1A1A1; body="adasdfasdfasdffaasdfasdfsdfadsfasf"
Document 4: label=7A1A1; body="azxzxcvdfaasdfasdfsdfadsfasf"
Document 5: label=7A1A1; body="azxzxcvdfaasdfasdfsdasdafadsfasf"
Document 6: label=5A1B1; body="adfaasdfasdfsdfadsfasfzzz"

How do I get back just ONE of the largest "label" item?

In other words, what query will return the 7A1A1 label just once? If I search for q=* and sort the results, it works, except I get back multiple hits for each label. If I do a facet, I can only sort by increasing order, when what I want is decreasing order.

-Pete

On Apr 6, 2011, at 10:22 PM, Otis Gospodnetic wrote:
> Hi,
>
> I think you are saying dupes are the main problem? If so,
> http://wiki.apache.org/solr/Deduplication ?
>
> Otis
>
> Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
> Lucene ecosystem search :: http://search-lucene.com/
Re: Tips for getting unique results?
Hi,

I think you are saying dupes are the main problem? If so, http://wiki.apache.org/solr/Deduplication ?

Otis

Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Lucene ecosystem search :: http://search-lucene.com/

----- Original Message -----
> From: Peter Spam
> To: solr-user@lucene.apache.org
> Sent: Thu, April 7, 2011 1:13:44 AM
> Subject: Tips for getting unique results?
>
> Hi,
>
> I have documents with a field that has "1A2B3C" alphanumeric characters. I
> can query for * and sort results based on this field, however I'd like to
> "uniq" these results (remove duplicates) so that I can get the 5 largest
> unique values. I can't use the StatsComponent because my values have
> letters in them too.
>
> Faceting (and ignoring the counts) gets me half of the way there, but I can
> only sort ascending. If I could also sort facet results descending, I'd be
> done. I'd rather not return all documents and just parse the last few
> results to work around this.
>
> Any ideas?
>
> -Pete