Re: Retrieve documents that contain max value for a field

2008-12-29 Thread Sushil Vegad

This looks useful, but I am not sure how to use the component. Could you
please elaborate?

Also, this is not available in Solr 1.3. Any equivalent of it in 1.3?

Thanks,
Sushil


ryantxu wrote:
 
 not exactly what you are asking for, but check:
 http://wiki.apache.org/solr/StatsComponent
 
 this will at least tell you the max/min versionId...   right now it  
 only works with numeric values, so it won't help for timestamp.
 
 ryan
 

-- 
View this message in context: 
http://www.nabble.com/Retrieve-documents-that-contain-max-value-for-a-field-tp21175643p21203697.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: Retrieve documents that contain max value for a field

2008-12-27 Thread Ryan McKinley


We want to write a single query where the query returns doc1_1, doc2_2 and
so on...that is for documents that have the same id, we want the query to
return the document with highest versionId or the latest timestamp.

Any thoughts how this can be done?



not exactly what you are asking for, but check:
http://wiki.apache.org/solr/StatsComponent

this will at least tell you the max/min versionId...   right now it  
only works with numeric values, so it won't help for timestamp.
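
For illustration only, a minimal sketch of how such a stats request could be put together. The stats=true and stats.field parameters are the ones StatsComponent reads; the host, port, core path, class name and helper name are assumptions made up for this example. Note that the response carries only the min/max values for the field, not the matching documents.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

public class StatsRequestSketch {

    // Builds a /select URL that asks StatsComponent for statistics
    // (min, max, ...) on one numeric field. rows=0 suppresses the
    // document list, since only the stats section is of interest here.
    // baseUrl is a hypothetical Solr location, not taken from the thread.
    static String statsUrl(String baseUrl, String query, String field) {
        return baseUrl + "/select"
                + "?q=" + URLEncoder.encode(query, StandardCharsets.UTF_8)
                + "&rows=0"
                + "&stats=true"
                + "&stats.field=" + URLEncoder.encode(field, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Stats for versionId, restricted to the documents with id 2.
        System.out.println(statsUrl("http://localhost:8983/solr", "id:2", "versionId"));
    }
}
```

The max reported for versionId could then be used in a second query (for example id:2 AND versionId:3) to fetch the document itself, at the cost of two round trips per id.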


ryan


Retrieve documents that contain max value for a field

2008-12-26 Thread Sushil Vegad

Hi,
Can someone please help with how to write a query for the following
scenario?

We index Topics that contain text. A topic can have many versions, and each
version is indexed.  Our schema has topicid, versionId and timestamp fields,
amongst others. Topicid is not a uniqueField because multiple versions of a
topic have the same topicId. Instead the versionId and timestamp differ for
each version as follows.

SolrInputDocument doc1 = new SolrInputDocument();
doc1.addField("id", 1);
doc1.addField("versionId", 1);
doc1.addField("versionDate", "2008-12-23T23:59:59Z");

SolrInputDocument doc1_1 = new SolrInputDocument();
doc1_1.addField("id", 1);
doc1_1.addField("versionId", 2);
doc1_1.addField("versionDate", "2008-12-24T23:59:59Z");

SolrInputDocument doc2 = new SolrInputDocument();
doc2.addField("id", 2);
doc2.addField("versionId", 1);
doc2.addField("versionDate", "2008-12-23T23:59:59Z");

SolrInputDocument doc2_1 = new SolrInputDocument();
doc2_1.addField("id", 2);
doc2_1.addField("versionId", 2);
doc2_1.addField("versionDate", "2008-12-24T23:59:59Z");

SolrInputDocument doc2_2 = new SolrInputDocument();
doc2_2.addField("id", 2);
doc2_2.addField("versionId", 3);
doc2_2.addField("versionDate", "2008-12-25T23:59:59Z");

We want to write a single query where the query returns doc1_1, doc2_2 and
so on...that is for documents that have the same id, we want the query to
return the document with highest versionId or the latest timestamp.

Any thoughts how this can be done?

Thanks,
Sushil
-- 
View this message in context: 
http://www.nabble.com/Retrieve-documents-that-contain-max-value-for-a-field-tp21175643p21175643.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: Retrieve documents that contain max value for a field

2008-12-26 Thread Otis Gospodnetic
Hi,

I'm not sure if Solr is the right tool for the job, if that's all there is to 
your application, but you might be able to get what you want by simply sorting 
on the version field.  Your version field is a very precise timestamp, which 
means the version field will have LOTS of unique values, which means that 
sorting by that field will eat your memory and increase your searchers' warmup 
time.  Please check the mailing lists for more information, or maybe we already 
have this covered in the Solr FAQ?
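
A rough sketch of the sorting idea, assuming the results are small enough to post-process on the client: order the hits by id, then by versionId descending (roughly what sort=id asc,versionId desc would give), and keep only the first hit seen for each id. The Doc class below is a hypothetical stand-in for a result document, and nothing here is a Solr API; it is plain Java collections code.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class LatestVersionSketch {

    // Hypothetical stand-in for one search hit; only the fields needed here.
    static final class Doc {
        final String name;
        final int id;
        final int versionId;

        Doc(String name, int id, int versionId) {
            this.name = name;
            this.id = id;
            this.versionId = versionId;
        }
    }

    // Returns one document per id: the one with the highest versionId.
    static List<Doc> latestPerId(List<Doc> docs) {
        List<Doc> sorted = new ArrayList<>(docs);
        // id ascending, then versionId descending within each id.
        sorted.sort(Comparator.comparingInt((Doc d) -> d.id)
                .thenComparing(Comparator.comparingInt((Doc d) -> d.versionId).reversed()));

        // The first document seen for an id is its newest version;
        // putIfAbsent ignores the older ones that follow.
        Map<Integer, Doc> latest = new LinkedHashMap<>();
        for (Doc d : sorted) {
            latest.putIfAbsent(d.id, d);
        }
        return new ArrayList<>(latest.values());
    }

    public static void main(String[] args) {
        List<Doc> docs = List.of(
                new Doc("doc1", 1, 1), new Doc("doc1_1", 1, 2),
                new Doc("doc2", 2, 1), new Doc("doc2_1", 2, 2), new Doc("doc2_2", 2, 3));
        for (Doc d : latestPerId(docs)) {
            System.out.println(d.name);
        }
    }
}
```

With the five sample documents from the original question this keeps doc1_1 and doc2_2. The trade-off is that every version has to be fetched before the older ones are thrown away, so it only suits modest result sizes.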


Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch



- Original Message 
 From: Sushil Vegad vsus...@serebrum.com
 To: solr-user@lucene.apache.org
 Sent: Friday, December 26, 2008 11:25:10 AM
 Subject: Retrieve documents that contain max value for a field
 
 