Re: Tips for getting unique results?

2011-04-10 Thread Shaun Campbell
Hi Pete

Still think facets are what you need. We use facets to identify the most
common tags for documents in our library.  I use them to print the top 25
most common document tags.  The sort by count (the default) gives you the
one with the highest count first and then the next most common and so on.
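
A minimal sketch of the kind of request described above, not a definitive recipe. The base URL and the field name "tags" are assumptions made for illustration; the facet parameters are the ones documented on the SimpleFacetParameters wiki page.

```python
from urllib.parse import urlencode


def facet_query_url(base_url, field, limit=25):
    """Build a Solr select URL that returns only facet counts for one field."""
    params = [
        ("q", "*:*"),             # match every document
        ("rows", 0),              # no documents needed, only the facet counts
        ("facet", "true"),
        ("facet.field", field),
        ("facet.limit", limit),   # top N constraints
        ("facet.sort", "count"),  # highest count first
        ("wt", "json"),
    ]
    return base_url + "?" + urlencode(params)


# Hypothetical host and field name, for illustration only.
url = facet_query_url("http://localhost:8983/solr/select", "tags")
print(url)
```

Note that this orders the constraints by how often each term occurs, not by the term value itself.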

Hope this helps.
Shaun

On 8 April 2011 19:28, Peter Spam ps...@mac.com wrote:

 Thanks for the note, Shaun, but the documentation indicates that the
 sorting is only in ascending order :-(

 facet.sort

 This param determines the ordering of the facet field constraints.

• count - sort the constraints by count (highest count first)
• index - to return the constraints sorted in their index order
 (lexicographic by indexed term). For terms in the ascii range, this will be
 alphabetically sorted.
 The default is count if facet.limit is greater than 0, index otherwise.

 Prior to Solr 1.4, one needed to use true instead of count and false instead
 of index.

 This parameter can be specified on a per field basis.


 -Pete

 On Apr 8, 2011, at 2:49 AM, Shaun Campbell wrote:

  Pete
 
  Surely the default sort order for facets is descending by count.  See
  http://wiki.apache.org/solr/SimpleFacetParameters.  If your results
  really are sorted in ascending order, can't you sort them externally,
  e.g. in Java?
 
  Hope that helps.
 
  Shaun




Re: Tips for getting unique results?

2011-04-08 Thread Shaun Campbell
Pete

Surely the default sort order for facets is descending by count.  See
http://wiki.apache.org/solr/SimpleFacetParameters.  If your results
really are sorted in ascending order, can't you sort them externally,
e.g. in Java?
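
As a rough sketch of sorting externally (in Python here rather than Java): assuming the request used facet.sort=index with wt=json, and that the facet list comes back in Solr's flat [term, count, term, count, ...] JSON layout, the largest terms can be picked out on the client. The function name and the sample response are made up for illustration; the labels are the ones from the example documents in this thread.

```python
def largest_facet_terms(response, field, n):
    """Return the n largest facet terms (descending) from a parsed Solr JSON response."""
    flat = response["facet_counts"]["facet_fields"][field]
    terms = flat[0::2]   # even positions hold the terms
    counts = flat[1::2]  # odd positions hold the counts
    present = [term for term, count in zip(terms, counts) if count > 0]
    # Plain string comparison; for ASCII terms this matches index order.
    return sorted(present, reverse=True)[:n]


# Hand-written sample response, not captured from a real server.
sample_response = {
    "facet_counts": {
        "facet_fields": {
            "label": ["1A1A1", 2, "5A1B1", 2, "7A1A1", 2],
        }
    }
}

print(largest_facet_terms(sample_response, "label", 1))
```

Each distinct label appears once in the facet list, so no de-duplication is needed, and with rows=0 only the terms travel over the wire rather than the documents.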

Hope that helps.

Shaun


Re: Tips for getting unique results?

2011-04-08 Thread Peter Spam
Thanks for the note, Shaun, but the documentation indicates that the sorting is 
only in ascending order :-(

facet.sort

This param determines the ordering of the facet field constraints.

• count - sort the constraints by count (highest count first)
• index - to return the constraints sorted in their index order 
(lexicographic by indexed term). For terms in the ascii range, this will be 
alphabetically sorted.
The default is count if facet.limit is greater than 0, index otherwise.

Prior to Solr 1.4, one needed to use true instead of count and false instead of 
index.

This parameter can be specified on a per field basis.


-Pete

On Apr 8, 2011, at 2:49 AM, Shaun Campbell wrote:

 Pete
 
 Surely the default sort order for facets is descending by count.  See
 http://wiki.apache.org/solr/SimpleFacetParameters.  If your results
 really are sorted in ascending order, can't you sort them externally,
 e.g. in Java?
 
 Hope that helps.
 
 Shaun



Re: Tips for getting unique results?

2011-04-07 Thread Peter Spam
The data are fine and not duplicated - however, I want to analyze the data, and 
summarize one field (kind of like faceting), to understand what the largest 
value is.

For example:

Document 1:   label=1A1A1; body=adfasdfadsfasf
Document 2:   label=5A1B1; body=adfaasdfasdfsdfadsfasf
Document 3:   label=1A1A1; body=adasdfasdfasdffaasdfasdfsdfadsfasf
Document 4:   label=7A1A1; body=azxzxcvdfaasdfasdfsdfadsfasf
Document 5:   label=7A1A1; body=azxzxcvdfaasdfasdfsdasdafadsfasf
Document 6:   label=5A1B1; body=adfaasdfasdfsdfadsfasfzzz

How do I get back just ONE item for the largest label?

In other words, what query will return the 7A1A1 label just once?  If I search 
for q=* and sort the results, it works, except I get back multiple hits for 
each label.  If I do a facet, I can only sort by increasing order, when what I 
want is decreasing order.


-Pete
 
On Apr 6, 2011, at 10:22 PM, Otis Gospodnetic wrote:

 Hi,
 
 I think you are saying dupes are the main problem?  If so, 
 http://wiki.apache.org/solr/Deduplication ?
 
 Otis
 
 Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
 Lucene ecosystem search :: http://search-lucene.com/
 
 
 
 ----- Original Message -----
 From: Peter Spam ps...@mac.com
 To: solr-user@lucene.apache.org
 Sent: Thu, April 7, 2011 1:13:44 AM
 Subject: Tips for getting unique results?
 
 Hi,
 
 I have documents with a field that has 1A2B3C alphanumeric characters.  I
 can query for * and sort results based on this field, however I'd like to
 uniq these results (remove duplicates) so that I can get the 5 largest
 unique values.  I can't use the StatsComponent because my values have
 letters in them too.
 
 Faceting (and ignoring the counts) gets me half of the way there, but I can
 only sort ascending.  If I could also sort facet results descending, I'd be
 done.  I'd rather not return all documents and just parse the last few
 results to work around this.
 
 Any ideas?
 
 
 -Pete
 



Re: Tips for getting unique results?

2011-04-07 Thread Erick Erickson
What version of Solr are you using? And, assuming your version has it,
have you looked at grouping?

Which is another way of asking why you want to do this; perhaps it's an
XY problem.

Best
Erick

On Thu, Apr 7, 2011 at 1:13 AM, Peter Spam ps...@mac.com wrote:

 Hi,

 I have documents with a field that has 1A2B3C alphanumeric characters.  I
 can query for * and sort results based on this field, however I'd like to
 uniq these results (remove duplicates) so that I can get the 5 largest
 unique values.  I can't use the StatsComponent because my values have
 letters in them too.

 Faceting (and ignoring the counts) gets me half of the way there, but I can
 only sort ascending.  If I could also sort facet results descending, I'd be
 done.  I'd rather not return all documents and just parse the last few
 results to work around this.

 Any ideas?


 -Pete



Re: Tips for getting unique results?

2011-04-07 Thread Peter Spam
Would grouping solve this?  I'd rather not move to a pre-release Solr ...

To clarify the problem:

The data are fine and not duplicated - however, I want to analyze the data, and 
summarize one field (kind of like faceting), to understand what the largest 
value is.

For example:

Document 1:   label=1A1A1; body=adfasdfadsfasf
Document 2:   label=5A1B1; body=adfaasdfasdfsdfadsfasf
Document 3:   label=1A1A1; body=adasdfasdfasdffaasdfasdfsdfadsfasf
Document 4:   label=7A1A1; body=azxzxcvdfaasdfasdfsdfadsfasf
Document 5:   label=7A1A1; body=azxzxcvdfaasdfasdfsdasdafadsfasf
Document 6:   label=5A1B1; body=adfaasdfasdfsdfadsfasfzzz

How do I get back just ONE item for the largest label?

In other words, what query will return the 7A1A1 label just once?  If I search 
for q=* and sort the results, it works, except I get back multiple hits for 
each label.  If I do a facet, I can only sort by increasing order, when what I 
want is decreasing order.


-Peter

On Apr 7, 2011, at 10:02 AM, Erick Erickson wrote:

 What version of Solr are you using? And, assuming the version that
 has it in, have you seen grouping?
 
 Which is another way of asking why you want to do this, perhaps it's an
 XY problem
 
 Best
 Erick
 
 On Thu, Apr 7, 2011 at 1:13 AM, Peter Spam ps...@mac.com wrote:
 
 Hi,
 
 I have documents with a field that has 1A2B3C alphanumeric characters.  I
 can query for * and sort results based on this field, however I'd like to
 uniq these results (remove duplicates) so that I can get the 5 largest
 unique values.  I can't use the StatsComponent because my values have
 letters in them too.
 
 Faceting (and ignoring the counts) gets me half of the way there, but I can
 only sort ascending.  If I could also sort facet results descending, I'd be
 done.  I'd rather not return all documents and just parse the last few
 results to work around this.
 
 Any ideas?
 
 
 -Pete
 



Re: Tips for getting unique results?

2011-04-07 Thread Erick Erickson
I think you can specify the in-group sort, and specify a very small number
(perhaps even one) to go in each group. But you'd have to store the length
of each body and sort by that.

I'm pretty sure grouping is trunk-only.

The problem here is getting something that applies just within the group
and not across groups... I'm not sure how to tackle that other than perhaps
the grouping idea...
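
A hedged sketch of what the grouping idea might look like as request parameters, for a Solr build that includes result grouping. The field name "label" is taken from the example documents in this thread; the function name and the default of five groups are assumptions for illustration, and this has not been run against a server.

```python
from urllib.parse import urlencode


def grouped_query_params(field, groups=5):
    """Parameters asking for one document per distinct value of `field`,
    with the groups ordered by that field, largest first."""
    return {
        "q": "*:*",
        "group": "true",
        "group.field": field,    # one group per distinct value
        "group.limit": 1,        # a single document in each group
        "sort": field + " desc", # order the groups, largest value first
        "rows": groups,          # with grouping on, this caps the number of groups
        "fl": field,
        "wt": "json",
    }


params = grouped_query_params("label")
print(urlencode(params))
```

The in-group sort would be set separately with group.sort; it is left out here because only one document per group is requested and only the label is read back.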

Best
Erick

On Thu, Apr 7, 2011 at 6:36 PM, Peter Spam ps...@mac.com wrote:

 Would grouping solve this?  I'd rather not move to a pre-release solr ...

 To clarify the problem:

 The data are fine and not duplicated - however, I want to analyze the data,
 and summarize one field (kind of like faceting), to understand what the
 largest value is.

 For example:

 Document 1:   label=1A1A1; body=adfasdfadsfasf
 Document 2:   label=5A1B1; body=adfaasdfasdfsdfadsfasf
 Document 3:   label=1A1A1; body=adasdfasdfasdffaasdfasdfsdfadsfasf
 Document 4:   label=7A1A1; body=azxzxcvdfaasdfasdfsdfadsfasf
 Document 5:   label=7A1A1; body=azxzxcvdfaasdfasdfsdasdafadsfasf
 Document 6:   label=5A1B1; body=adfaasdfasdfsdfadsfasfzzz

 How do I get back just ONE item for the largest label?

 In other words, what query will return the 7A1A1 label just once?  If I
 search for q=* and sort the results, it works, except I get back multiple
 hits for each label.  If I do a facet, I can only sort by increasing order,
 when what I want is decreasing order.


 -Peter

 On Apr 7, 2011, at 10:02 AM, Erick Erickson wrote:

  What version of Solr are you using? And, assuming the version that
  has it in, have you seen grouping?
 
  Which is another way of asking why you want to do this, perhaps it's an
  XY problem
 
  Best
  Erick
 
  On Thu, Apr 7, 2011 at 1:13 AM, Peter Spam ps...@mac.com wrote:
 
  Hi,
 
  I have documents with a field that has 1A2B3C alphanumeric characters.  I
  can query for * and sort results based on this field, however I'd like to
  uniq these results (remove duplicates) so that I can get the 5 largest
  unique values.  I can't use the StatsComponent because my values have
  letters in them too.
 
  Faceting (and ignoring the counts) gets me half of the way there, but I can
  only sort ascending.  If I could also sort facet results descending, I'd be
  done.  I'd rather not return all documents and just parse the last few
  results to work around this.
 
  Any ideas?
 
 
  -Pete
 




Tips for getting unique results?

2011-04-06 Thread Peter Spam
Hi,

I have documents with a field that has 1A2B3C alphanumeric characters.  I can 
query for * and sort results based on this field, however I'd like to uniq 
these results (remove duplicates) so that I can get the 5 largest unique 
values.  I can't use the StatsComponent because my values have letters in them 
too.

Faceting (and ignoring the counts) gets me half of the way there, but I can 
only sort ascending.  If I could also sort facet results descending, I'd be 
done.  I'd rather not return all documents and just parse the last few results 
to work around this.

Any ideas?


-Pete


Re: Tips for getting unique results?

2011-04-06 Thread Otis Gospodnetic
Hi,

I think you are saying dupes are the main problem?  If so, 
http://wiki.apache.org/solr/Deduplication ?

Otis

Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Lucene ecosystem search :: http://search-lucene.com/



----- Original Message -----
 From: Peter Spam ps...@mac.com
 To: solr-user@lucene.apache.org
 Sent: Thu, April 7, 2011 1:13:44 AM
 Subject: Tips for getting unique results?
 
 Hi,
 
 I have documents with a field that has 1A2B3C alphanumeric characters.  I
 can query for * and sort results based on this field, however I'd like to
 uniq these results (remove duplicates) so that I can get the 5 largest
 unique values.  I can't use the StatsComponent because my values have
 letters in them too.
 
 Faceting (and ignoring the counts) gets me half of the way there, but I can
 only sort ascending.  If I could also sort facet results descending, I'd be
 done.  I'd rather not return all documents and just parse the last few
 results to work around this.
 
 Any ideas?
 
 
 -Pete