Hi Mikhail,

I can see it is lazy-loading, but I can't judge how complex it becomes (presumably the filter dispatching mechanism also does other things - it is not there only for streaming).
Let me explain better what I found when I dug inside solr: documents (the results of the query) are loaded before they are passed to a writer. The writers expect to receive solr documents, but those documents were already loaded by one of the components before rendering - so it is kinda 'hard-coded'. If solr did NOT load these docs before passing them to a writer, the writer could load them instead (hence lazy loading, but the difference is in the numbers - it could deal with hundreds of thousands of docs, instead of the few thousand it handles now).

I see one crucial point: this could work without any new handler/servlet - solr would just gain a new parameter, something like 'lazy=true' ;) and people could use whatever 'wt' they used before.

Disclaimer: I don't know whether that would break other stuff. I only know that I am using the same idea to dump what I need without breaking things (so far... ;-)) - but obviously, I didn't want to patch solr core.

roman

On Sat, Jul 27, 2013 at 3:52 PM, Mikhail Khludnev <mkhlud...@griddynamics.com> wrote:

> Roman,
>
> Let me briefly explain the design
>
> special RequestParser stores servlet output stream into the context
> https://github.com/m-khl/solr-patches/compare/streaming#L7R22
>
> then special component injects special PostFilter/DelegatingCollector which
> writes right into output
> https://github.com/m-khl/solr-patches/compare/streaming#L2R146
>
> here is how it streams the doc, you see it's lazy enough
> https://github.com/m-khl/solr-patches/compare/streaming#L2R181
>
> I mention that it disables later collectors
> https://github.com/m-khl/solr-patches/compare/streaming#L2R57
> hence, no facets with streaming, yet as well as memory consumption.
>
> This test shows how it works
> https://github.com/m-khl/solr-patches/compare/streaming#L15R115
>
> all other code purposed for distributed search.
>
> On Sat, Jul 27, 2013 at 4:44 PM, Roman Chyla <roman.ch...@gmail.com> wrote:
>
> > Mikhail,
> > If your solution gives lazy loading of solr docs /and thus streaming of
> > huge result lists/ it should be big YES!
> > Roman
> > On 27 Jul 2013 07:55, "Mikhail Khludnev" <mkhlud...@griddynamics.com>
> > wrote:
> >
> > > Otis,
> > > You gave links to 'deep paging' when I asked about response streaming.
> > > Let me understand. From my POV, deep paging is a special case for regular
> > > search scenarios. We definitely need it in Solr. However, if we are talking
> > > about data analytic like problems, when we need to select an "endless"
> > > stream of responses (or store them in file as Roman did), 'deep paging' is
> > > a suboptimal hack.
> > > What's your vision on this?
>
> --
> Sincerely yours
> Mikhail Khludnev
> Principal Engineer,
> Grid Dynamics
>
> <http://www.griddynamics.com>
> <mkhlud...@griddynamics.com>
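
A rough sketch of the lazy-loading idea from the reply at the top of this thread (the writer receives only document ids and loads each document at the moment it writes it). This is plain Java with no Solr dependency: the class name, writeLazily, and the loader function are all hypothetical stand-ins made up for illustration, not Solr APIs, and the tab-separated output is just a placeholder for whatever 'wt' format a real writer would produce.

```java
import java.io.PrintWriter;
import java.io.StringWriter;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.function.IntFunction;

// Illustration only: none of these names are Solr APIs.
class LazyDocWriterSketch {

    // Writes one line per matching doc id. Each document is fetched through
    // `loader` only at the moment it is written, so this method never holds
    // more than one loaded document at a time.
    static int writeLazily(Iterator<Integer> docIds,
                           IntFunction<Map<String, String>> loader,
                           PrintWriter out) {
        int written = 0;
        while (docIds.hasNext()) {
            int id = docIds.next();
            Map<String, String> doc = loader.apply(id); // loaded on demand
            out.print(id + "\t" + doc.get("title") + "\n");
            written++;
        }
        out.flush();
        return written;
    }

    public static void main(String[] args) {
        // Hypothetical stand-in for the stored-field lookup.
        IntFunction<Map<String, String>> loader =
                id -> Map.of("title", "doc-" + id);
        StringWriter buffer = new StringWriter();
        int n = writeLazily(List.of(1, 2, 3).iterator(), loader,
                new PrintWriter(buffer));
        System.out.print(buffer);
        System.out.println(n + " docs written");
    }
}
```

The point of the shape is that memory use depends on the size of one document rather than on the size of the result list; the eager variant would build the full list of loaded documents first and only then hand it to the writer.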