Re: Tips for getting unique results?

2011-04-10 Thread Shaun Campbell
Hi Pete

I still think facets are what you need. We use facets to identify the most
common tags for documents in our library; I use them to print the 25 most
common document tags.  Sorting by count (the default) gives you the value
with the highest count first, then the next most common, and so on.
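
For illustration, a request like the one described above might be built as
follows. This is only a sketch under assumptions: the field name "tags" and
the helper name are hypothetical, while the parameters themselves (facet,
facet.field, facet.limit, facet.sort) are the ones documented on the
SimpleFacetParameters wiki page.

```python
from urllib.parse import urlencode


def top_tags_params(field="tags", limit=25):
    """Build parameters asking for the `limit` most common values of `field`.

    `field` is a hypothetical field name; substitute your own.
    """
    return [
        ("q", "*:*"),            # match every document
        ("rows", "0"),           # only the facet counts are wanted, not documents
        ("wt", "json"),
        ("facet", "true"),
        ("facet.field", field),
        ("facet.limit", str(limit)),
        ("facet.sort", "count"),  # highest count first
    ]


# Query string to append to the select handler URL of your Solr instance.
query_string = urlencode(top_tags_params())
```

Setting rows to 0 keeps the response small, since only the facet section is
of interest here.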

Hope this helps.
Shaun

On 8 April 2011 19:28, Peter Spam wrote:

> Thanks for the note, Shaun, but the documentation indicates that the
> sorting is only in ascending order :-(
>
> facet.sort
>
> This param determines the ordering of the facet field constraints.
>
> • count - sort the constraints by count (highest count first)
> • index - to return the constraints sorted in their index order
>   (lexicographic by indexed term). For terms in the ASCII range, this will be
>   alphabetically sorted.
>
> The default is count if facet.limit is greater than 0, index otherwise.
>
> Prior to Solr 1.4, one needed to use true instead of count and false instead
> of index.
>
> This parameter can be specified on a per field basis.
>
>
> -Pete
>
> On Apr 8, 2011, at 2:49 AM, Shaun Campbell wrote:
>
> > Pete
> >
> > Surely the default sort order for facets is by descending count order.
> > See http://wiki.apache.org/solr/SimpleFacetParameters.  If your results are
> > really sorted in ascending order can't you sort them externally eg Java?
> >
> > Hope that helps.
> >
> > Shaun
>
>


Re: Tips for getting unique results?

2011-04-08 Thread Peter Spam
Thanks for the note, Shaun, but the documentation indicates that the sorting is 
only in ascending order :-(

facet.sort

This param determines the ordering of the facet field constraints.

• count - sort the constraints by count (highest count first)
• index - to return the constraints sorted in their index order
  (lexicographic by indexed term). For terms in the ASCII range, this will be
  alphabetically sorted.

The default is count if facet.limit is greater than 0, index otherwise.

Prior to Solr 1.4, one needed to use true instead of count and false instead of
index.

This parameter can be specified on a per field basis.


-Pete

On Apr 8, 2011, at 2:49 AM, Shaun Campbell wrote:

> Pete
> 
> Surely the default sort order for facets is by descending count order.  See
> http://wiki.apache.org/solr/SimpleFacetParameters.  If your results are
> really sorted in ascending order can't you sort them externally eg Java?
> 
> Hope that helps.
> 
> Shaun



Re: Tips for getting unique results?

2011-04-08 Thread Shaun Campbell
Pete

Surely the default sort order for facets is descending by count.  See
http://wiki.apache.org/solr/SimpleFacetParameters.  If your results really
are sorted in ascending order, can't you sort them externally, e.g. in Java?
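
As a rough sketch of the "sort them externally" idea (in Python here rather
than Java): the function name is hypothetical, and it assumes the facet
values arrive in the flat JSON layout that alternates term and count, e.g.
["1A1A1", 2, "5A1B1", 2]. It also assumes the request asked for every term
(for example with a negative facet.limit), since the largest terms come last
in index order.

```python
def largest_terms(flat_facets, n=5):
    """Return the n largest facet terms, largest first.

    `flat_facets` alternates term and count: [term, count, term, count, ...].
    The counts are ignored; only the distinct terms matter here.
    """
    terms = flat_facets[0::2]            # every other entry is a term
    return sorted(terms, reverse=True)[:n]


# Facet values as they might come back for the "label" field.
example = ["1A1A1", 2, "5A1B1", 2, "7A1A1", 2]
top_two = largest_terms(example, n=2)    # ["7A1A1", "5A1B1"]
```

Python compares strings by code point, which matches the index order for
terms in the ASCII range.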

Hope that helps.

Shaun


Re: Tips for getting unique results?

2011-04-07 Thread Erick Erickson
I think you can specify the in-group sort, and specify a very small number
(perhaps even one) to go in each group. But you'd have to store the length
of each body and sort by that.

I'm pretty sure grouping is trunk-only.

The problem here is getting something that applies just within the group and
not across groups... I'm not sure how to tackle that other than perhaps the
grouping idea...
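
A hedged sketch of what such a grouped request might look like, using the
result grouping parameters (group, group.field, group.limit, group.sort).
The field name "body_length" is hypothetical: it stands for a stored length
of each body, which would have to be indexed separately. As noted above,
this only applies to a Solr build that includes grouping.

```python
from urllib.parse import urlencode


def largest_labels_params(n=5, group_field="label", length_field="body_length"):
    """Parameters for one document per label, largest labels first.

    `length_field` is a hypothetical field holding the length of each body.
    """
    return {
        "q": "*:*",
        "wt": "json",
        "rows": str(n),                          # number of groups to return
        "sort": f"{group_field} desc",           # order of the groups
        "group": "true",
        "group.field": group_field,
        "group.limit": "1",                      # one document per group
        "group.sort": f"{length_field} desc",    # order inside each group
    }


grouped_query = urlencode(largest_labels_params())
```

Here sort orders the groups against each other, while group.sort only decides
which document represents each group.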

Best
Erick

On Thu, Apr 7, 2011 at 6:36 PM, Peter Spam wrote:

> Would grouping solve this?  I'd rather not move to a pre-release solr ...
>
> To clarify the problem:
>
> The data are fine and not duplicated - however, I want to analyze the data,
> and summarize one field (kind of like faceting), to understand what the
> largest value is.
>
> For example:
>
> Document 1:   label=1A1A1; body="adfasdfadsfasf"
> Document 2:   label=5A1B1; body="adfaasdfasdfsdfadsfasf"
> Document 3:   label=1A1A1; body="adasdfasdfasdffaasdfasdfsdfadsfasf"
> Document 4:   label=7A1A1; body="azxzxcvdfaasdfasdfsdfadsfasf"
> Document 5:   label=7A1A1; body="azxzxcvdfaasdfasdfsdasdafadsfasf"
> Document 6:   label=5A1B1; body="adfaasdfasdfsdfadsfasfzzz"
>
> How do I get back just ONE of the largest "label" item?
>
> In other words, what query will return the 7A1A1 label just once?  If I
> search for q=* and sort the results, it works, except I get back multiple
> hits for each label.  If I do a facet, I can only sort by increasing order,
> when what I want is decreasing order.
>
>
> -Peter
>
> On Apr 7, 2011, at 10:02 AM, Erick Erickson wrote:
>
> > What version of Solr are you using? And, assuming the version that
> > has it in, have you seen grouping?
> >
> > Which is another way of asking why you want to do this, perhaps it's an
> > XY problem
> >
> > Best
> > Erick
> >
> > On Thu, Apr 7, 2011 at 1:13 AM, Peter Spam  wrote:
> >
> >> Hi,
> >>
> >> I have documents with a field that has "1A2B3C" alphanumeric characters.
> >> I can query for * and sort results based on this field, however I'd like
> >> to "uniq" these results (remove duplicates) so that I can get the 5 largest
> >> unique values.  I can't use the StatsComponent because my values have
> >> letters in them too.
> >>
> >> Faceting (and ignoring the counts) gets me half of the way there, but I
> >> can only sort ascending.  If I could also sort facet results descending,
> >> I'd be done.  I'd rather not return all documents and just parse the last
> >> few results to work around this.
> >>
> >> Any ideas?
> >>
> >>
> >> -Pete
> >>
>
>


Re: Tips for getting unique results?

2011-04-07 Thread Peter Spam
Would grouping solve this?  I'd rather not move to a pre-release solr ...

To clarify the problem:

The data are fine and not duplicated - however, I want to analyze the data, and 
summarize one field (kind of like faceting), to understand what the largest 
value is.

For example:

Document 1:   label=1A1A1; body="adfasdfadsfasf"
Document 2:   label=5A1B1; body="adfaasdfasdfsdfadsfasf"
Document 3:   label=1A1A1; body="adasdfasdfasdffaasdfasdfsdfadsfasf"
Document 4:   label=7A1A1; body="azxzxcvdfaasdfasdfsdfadsfasf"
Document 5:   label=7A1A1; body="azxzxcvdfaasdfasdfsdasdafadsfasf"
Document 6:   label=5A1B1; body="adfaasdfasdfsdfadsfasfzzz"

How do I get back just ONE hit for the largest "label" value?

In other words, what query will return the 7A1A1 label just once?  If I search 
for q=* and sort the results, it works, except I get back multiple hits for 
each label.  If I do a facet, I can only sort by increasing order, when what I 
want is decreasing order.
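
One client-side way to picture the desired result, sketched with the six
example documents above: walk the hits in descending label order and keep
each label only the first time it appears, stopping once enough distinct
labels have been seen. The function name is hypothetical, and the local
sorted() call merely stands in for results that Solr would already return
sorted by label in descending order.

```python
def distinct_largest(docs, field="label", n=1):
    """Return up to n distinct values of `field`, largest first."""
    seen = []
    # Stand-in for hits already sorted by `field` in descending order.
    for doc in sorted(docs, key=lambda d: d[field], reverse=True):
        if len(seen) >= n:
            break                        # enough distinct values collected
        if doc[field] not in seen:
            seen.append(doc[field])      # first hit for this label
    return seen


docs = [
    {"label": "1A1A1", "body": "adfasdfadsfasf"},
    {"label": "5A1B1", "body": "adfaasdfasdfsdfadsfasf"},
    {"label": "1A1A1", "body": "adasdfasdfasdffaasdfasdfsdfadsfasf"},
    {"label": "7A1A1", "body": "azxzxcvdfaasdfasdfsdfadsfasf"},
    {"label": "7A1A1", "body": "azxzxcvdfaasdfasdfsdasdafadsfasf"},
    {"label": "5A1B1", "body": "adfaasdfasdfsdfadsfasfzzz"},
]

largest = distinct_largest(docs)         # ["7A1A1"]
```

Because the loop stops early, only as many hits as are needed to reach n
distinct labels have to be read, not the whole result set.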


-Peter

On Apr 7, 2011, at 10:02 AM, Erick Erickson wrote:

> What version of Solr are you using? And, assuming the version that
> has it in, have you seen grouping?
> 
> Which is another way of asking why you want to do this, perhaps it's an
> XY problem
> 
> Best
> Erick
> 
> On Thu, Apr 7, 2011 at 1:13 AM, Peter Spam  wrote:
> 
>> Hi,
>> 
>> I have documents with a field that has "1A2B3C" alphanumeric characters.  I
>> can query for * and sort results based on this field, however I'd like to
>> "uniq" these results (remove duplicates) so that I can get the 5 largest
>> unique values.  I can't use the StatsComponent because my values have
>> letters in them too.
>> 
>> Faceting (and ignoring the counts) gets me half of the way there, but I can
>> only sort ascending.  If I could also sort facet results descending, I'd be
>> done.  I'd rather not return all documents and just parse the last few
>> results to work around this.
>> 
>> Any ideas?
>> 
>> 
>> -Pete
>> 



Re: Tips for getting unique results?

2011-04-07 Thread Erick Erickson
What version of Solr are you using? And, assuming the version that
has it in, have you seen grouping?

Which is another way of asking why you want to do this, perhaps it's an
XY problem

Best
Erick

On Thu, Apr 7, 2011 at 1:13 AM, Peter Spam wrote:

> Hi,
>
> I have documents with a field that has "1A2B3C" alphanumeric characters.  I
> can query for * and sort results based on this field, however I'd like to
> "uniq" these results (remove duplicates) so that I can get the 5 largest
> unique values.  I can't use the StatsComponent because my values have
> letters in them too.
>
> Faceting (and ignoring the counts) gets me half of the way there, but I can
> only sort ascending.  If I could also sort facet results descending, I'd be
> done.  I'd rather not return all documents and just parse the last few
> results to work around this.
>
> Any ideas?
>
>
> -Pete
>


Re: Tips for getting unique results?

2011-04-07 Thread Peter Spam
The data are fine and not duplicated - however, I want to analyze the data, and 
summarize one field (kind of like faceting), to understand what the largest 
value is.

For example:

Document 1:   label=1A1A1; body="adfasdfadsfasf"
Document 2:   label=5A1B1; body="adfaasdfasdfsdfadsfasf"
Document 3:   label=1A1A1; body="adasdfasdfasdffaasdfasdfsdfadsfasf"
Document 4:   label=7A1A1; body="azxzxcvdfaasdfasdfsdfadsfasf"
Document 5:   label=7A1A1; body="azxzxcvdfaasdfasdfsdasdafadsfasf"
Document 6:   label=5A1B1; body="adfaasdfasdfsdfadsfasfzzz"

How do I get back just ONE hit for the largest "label" value?

In other words, what query will return the 7A1A1 label just once?  If I search 
for q=* and sort the results, it works, except I get back multiple hits for 
each label.  If I do a facet, I can only sort by increasing order, when what I 
want is decreasing order.


-Pete
 
On Apr 6, 2011, at 10:22 PM, Otis Gospodnetic wrote:

> Hi,
> 
> I think you are saying dupes are the main problem?  If so, 
> http://wiki.apache.org/solr/Deduplication ?
> 
> Otis
> 
> Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
> Lucene ecosystem search :: http://search-lucene.com/
> 
> 
> 
> ----- Original Message -----
>> From: Peter Spam 
>> To: solr-user@lucene.apache.org
>> Sent: Thu, April 7, 2011 1:13:44 AM
>> Subject: Tips for getting unique results?
>> 
>> Hi,
>> 
>> I have documents with a field that has "1A2B3C" alphanumeric characters.  I
>> can query for * and sort results based on this field, however I'd like to
>> "uniq" these results (remove duplicates) so that I can get the 5 largest
>> unique values.  I can't use the StatsComponent because my values have
>> letters in them too.
>>
>> Faceting (and ignoring the counts) gets me half of the way there, but I can
>> only sort ascending.  If I could also sort facet results descending, I'd be
>> done.  I'd rather not return all documents and just parse the last few
>> results to work around this.
>>
>> Any ideas?
>> 
>> 
>> -Pete
>> 



Re: Tips for getting unique results?

2011-04-06 Thread Otis Gospodnetic
Hi,

I think you are saying dupes are the main problem?  If so, 
http://wiki.apache.org/solr/Deduplication ?

Otis

Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Lucene ecosystem search :: http://search-lucene.com/



----- Original Message -----
> From: Peter Spam 
> To: solr-user@lucene.apache.org
> Sent: Thu, April 7, 2011 1:13:44 AM
> Subject: Tips for getting unique results?
> 
> Hi,
> 
> I have documents with a field that has "1A2B3C" alphanumeric characters.  I
> can query for * and sort results based on this field, however I'd like to
> "uniq" these results (remove duplicates) so that I can get the 5 largest
> unique values.  I can't use the StatsComponent because my values have
> letters in them too.
>
> Faceting (and ignoring the counts) gets me half of the way there, but I can
> only sort ascending.  If I could also sort facet results descending, I'd be
> done.  I'd rather not return all documents and just parse the last few
> results to work around this.
>
> Any ideas?
> 
> 
> -Pete
>