Re: Processing a lot of results in Solr

2013-07-25 Thread Otis Gospodnetic
Mikhail, Yes, +1. This question comes up a few times a year. Grant created a JIRA issue for this many moons ago. https://issues.apache.org/jira/browse/LUCENE-2127 https://issues.apache.org/jira/browse/SOLR-1726 Otis -- Solr ElasticSearch Support -- http://sematext.com/ Performance Monitoring

Re: Processing a lot of results in Solr

2013-07-24 Thread Mikhail Khludnev
Roman, Can you disclosure how that streaming writer works? What does it stream docList or docSet? Thanks On Wed, Jul 24, 2013 at 5:57 AM, Roman Chyla roman.ch...@gmail.com wrote: Hello Matt, You can consider writing a batch processing handler, which receives a query and instead of sending

Re: Processing a lot of results in Solr

2013-07-24 Thread Roman Chyla
Mikhail, It is a slightly hacked JSONWriter - actually, while poking around, I have discovered that dumping big hitsets would be possible - the main hurdle right now, is that writer is expecting to receive docuemnts with fields loaded, but if it received something that loads docs lazily, you could

Re: Processing a lot of results in Solr

2013-07-24 Thread Roman Chyla
On Tue, Jul 23, 2013 at 10:05 PM, Matt Lieber mlie...@impetus.com wrote: That sounds like a satisfactory solution for the time being - I am assuming you dump the data from Solr in a csv format? JSON How did you implement the streaming processor ? (what tool did you use for this? Not

Re: Processing a lot of results in Solr

2013-07-24 Thread Mikhail Khludnev
fwiw, i did some prototype with the following differences: - it streams straight to the socket output stream - it streams on-going during collecting, without necessity to store a bitset. It might have some limited extreme usage. Is there anyone interested? On Wed, Jul 24, 2013 at 7:19 PM, Roman

Re: Processing a lot of results in Solr

2013-07-24 Thread Chris Hostetter
: Subject: Processing a lot of results in Solr : Message-ID: d57c2b719b792f428beca7b0096c88e22c0...@mail1.impetus.co.in : In-Reply-To: 1374612243070-4079869.p...@n3.nabble.com https://people.apache.org/~hossman/#threadhijack Thread Hijacking on Mailing Lists When starting a new discussion on a

Re: Processing a lot of results in Solr

2013-07-23 Thread Timothy Potter
Hi Matt, This feature is commonly known as deep paging and Lucene and Solr have issues with it ... take a look at http://solr.pl/en/2011/07/18/deep-paging-problem/ as a potential starting point using filters to bucketize a result set into sets of sub result sets. Cheers, Tim On Tue, Jul 23,

Re: Processing a lot of results in Solr

2013-07-23 Thread Roman Chyla
Hello Matt, You can consider writing a batch processing handler, which receives a query and instead of sending results back, it writes them into a file which is then available for streaming (it has its own UUID). I am dumping many GBs of data from solr in few minutes - your query + streaming

Re: Processing a lot of results in Solr

2013-07-23 Thread Matt Lieber
That sounds like a satisfactory solution for the time being - I am assuming you dump the data from Solr in a csv format? How did you implement the streaming processor ? (what tool did you use for this? Not familiar with that) You say it takes a few minutes only to dump the data - how long does it