On Tue, Jul 23, 2013 at 10:05 PM, Matt Lieber <mlie...@impetus.com> wrote:
> That sounds like a satisfactory solution for the time being.
> I am assuming you dump the data from Solr in a CSV format?
>

JSON

> How did you implement the streaming processor? (What tool did you use for
> this? I'm not familiar with that.)
>

This is what dumps the docs:
https://github.com/romanchyla/montysolr/blob/master/contrib/adsabs/src/java/org/apache/solr/response/JSONDumper.java

It is called by one of our batch processors, which can pass it a bitset of records:
https://github.com/romanchyla/montysolr/blob/master/contrib/adsabs/src/java/org/apache/solr/handler/batch/BatchProviderDumpIndex.java

As far as streaming is concerned, we were all very pleasantly surprised: a file of a few GB (on the local network) took a ridiculously short time. In fact, a colleague of mine assumed it was not working until we looked into the downloaded file ;-). You may want to look at line 463 of
https://github.com/romanchyla/montysolr/blob/master/contrib/adsabs/src/java/org/apache/solr/handler/batch/BatchHandler.java

roman

> You say it takes only a few minutes to dump the data. How long does it take to
> stream it back in, and is performance acceptable (~ within minutes)?
>
> Thanks,
> Matt
>
> On 7/23/13 6:57 PM, "Roman Chyla" <roman.ch...@gmail.com> wrote:
>
> >Hello Matt,
> >
> >You can consider writing a batch processing handler, which receives a
> >query and, instead of sending results back, writes them into a file which
> >is then available for streaming (it has its own UUID). I am dumping many
> >GBs of data from Solr in a few minutes; your query + streaming writer can
> >go a very long way :)
> >
> >roman
> >
> >
> >On Tue, Jul 23, 2013 at 5:04 PM, Matt Lieber <mlie...@impetus.com> wrote:
> >
> >> Hello Solr users,
> >>
> >> Question regarding processing a lot of docs returned from a query: I
> >> potentially have millions of documents returned from a query. What is
> >> the common design to deal with this?
> >>
> >> 2 ideas I have are:
> >> - create a client service that is multithreaded to handle this
> >> - use the Solr "pagination" to retrieve a batch of rows at a time
> >>   ("start, rows" in the Solr Admin console)
> >>
> >> Any other ideas that I may be missing?
> >>
> >> Thanks,
> >> Matt
> >>
> >>
> >> ________________________________
> >>
> >> NOTE: This message may contain information that is confidential,
> >> proprietary, privileged or otherwise protected by law. The message is
> >> intended solely for the named addressee. If received in error, please
> >> destroy and notify the sender. Any use of this email is prohibited when
> >> received in error. Impetus does not represent, warrant and/or guarantee,
> >> that the integrity of this communication has been maintained nor that the
> >> communication is free of errors, virus, interception or interference.
> >>
>
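The "start, rows" pagination idea from the original question can be sketched as a simple loop. This is a minimal Python sketch under stated assumptions: `fetch_page` is a hypothetical stand-in for a Solr request that sends the standard `start` and `rows` parameters and returns that window of documents, and `FAKE_INDEX`/`fake_fetch` are hypothetical in-memory helpers that exist only so the loop runs without a server. It also assumes the index does not change while paging; otherwise documents can shift between windows and be skipped or repeated.

```python
from typing import Callable, Dict, Iterator, List

Doc = Dict[str, str]


def paginate(fetch_page: Callable[[int, int], List[Doc]], rows: int = 1000) -> Iterator[Doc]:
    """Yield every hit by requesting successive (start, rows) windows."""
    if rows <= 0:
        raise ValueError("rows must be positive")
    start = 0
    while True:
        page = fetch_page(start, rows)  # hypothetical call sending start=...&rows=...
        yield from page
        if len(page) < rows:  # a short or empty page means we reached the end
            return
        start += rows


# Hypothetical in-memory "index" so the loop runs without a Solr server.
FAKE_INDEX: List[Doc] = [{"id": str(i)} for i in range(2500)]


def fake_fetch(start: int, rows: int) -> List[Doc]:
    return FAKE_INDEX[start:start + rows]


ids = [doc["id"] for doc in paginate(fake_fetch, rows=1000)]
print(len(ids), ids[0], ids[-1])  # 2500 0 2499
```

Each window is independent, so the multithreaded-client idea could hand different `start` offsets to worker threads. Keep in mind that requests with very large `start` values tend to get slower, because the server still has to rank the first start + rows hits for every page.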
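Roman's approach (a batch handler that writes the query results to a file identified by its own UUID, which the client then streams) can be illustrated with a small, self-contained Python sketch. It is not the montysolr `BatchHandler` or `JSONDumper` code: `BatchDumpStore`, `dump` and `stream` are hypothetical names, the one-JSON-object-per-line layout is an assumption (the real dumper's JSON layout may differ), and a local temporary directory stands in for the server's dump area.

```python
import json
import tempfile
import uuid
from pathlib import Path
from typing import Dict, Iterable, Iterator


class BatchDumpStore:
    """Hypothetical sketch: dump query results to a UUID-named file, stream it later."""

    def __init__(self, directory: Path) -> None:
        self.directory = directory

    def _path(self, job_id: str) -> Path:
        # Parse the id as a UUID so a caller cannot point us at arbitrary paths.
        return self.directory / f"{uuid.UUID(job_id)}.jsonl"

    def dump(self, docs: Iterable[Dict]) -> str:
        """Write docs as JSON lines (an assumed format) and return the new job id."""
        job_id = str(uuid.uuid4())
        with self._path(job_id).open("w", encoding="utf-8") as out:
            for doc in docs:  # docs may be a generator; one document at a time
                out.write(json.dumps(doc) + "\n")
        return job_id

    def stream(self, job_id: str, chunk_size: int = 64 * 1024) -> Iterator[bytes]:
        """Yield the dump in fixed-size byte chunks, as a streaming response might."""
        with self._path(job_id).open("rb") as src:
            while chunk := src.read(chunk_size):
                yield chunk


# Demo: dump 10,000 fake documents, then stream them back and decode.
with tempfile.TemporaryDirectory() as tmp:
    store = BatchDumpStore(Path(tmp))
    job = store.dump({"id": str(i), "title": f"doc {i}"} for i in range(10_000))
    # Joining chunks is only for this check; a real client would write them out.
    payload = b"".join(store.stream(job, chunk_size=4096))

docs = [json.loads(line) for line in payload.decode("utf-8").splitlines()]
print(len(docs), docs[0])  # 10000 {'id': '0', 'title': 'doc 0'}
```

The store itself never holds the full result set: `dump` consumes an iterator one document at a time and `stream` reads fixed-size chunks, which is what makes a write-then-stream design attractive for millions of documents. The demo joins the chunks in memory only to verify the round trip.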