They are for different use cases. Hoss's approach, I believe, focuses on deep paging of ranked search results. SOLR-5244 focuses on the batch export of an entire unranked search result in binary format. It's basically a very efficient bulk extract for Solr.
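To make the distinction concrete, here is a rough Python sketch. It is not Solr code and not how either patch is implemented; it runs against an in-memory list, and the function names (page_by_offset, page_by_cursor, export_all) and the "id" field are made up for illustration. Offset paging has to rank start + rows documents on every request, which is why it slows down the deeper you go. Walking the collection in unique-key order only needs the batch that follows the last id seen, and it does not require maintaining a separate incrementing field.

```python
from typing import Dict, List, Optional, Tuple

Doc = Dict[str, str]  # hypothetical document shape: {"id": "..."}


def page_by_offset(docs: List[Doc], start: int, rows: int) -> List[Doc]:
    """Offset paging: rank the first start + rows docs, then throw away `start` of them."""
    ranked = sorted(docs, key=lambda d: d["id"])[: start + rows]
    return ranked[start:]


def page_by_cursor(
    docs: List[Doc], after_id: Optional[str], rows: int
) -> Tuple[List[Doc], Optional[str]]:
    """Cursor-style paging: return the next `rows` docs whose id sorts after `after_id`.

    This toy version filters and sorts the whole list on each call; a real
    index would seek straight to the position after `after_id` instead.
    """
    remaining = sorted(
        (d for d in docs if after_id is None or d["id"] > after_id),
        key=lambda d: d["id"],
    )
    page = remaining[:rows]
    next_cursor = page[-1]["id"] if page else None
    return page, next_cursor


def export_all(docs: List[Doc], rows: int) -> List[Doc]:
    """Pull every doc in id order, one batch at a time, until a batch comes back empty."""
    exported: List[Doc] = []
    cursor: Optional[str] = None
    while True:
        page, next_cursor = page_by_cursor(docs, cursor, rows)
        if not page:
            break
        exported.extend(page)
        cursor = next_cursor
    return exported
```

Calling export_all(docs, 100) returns every document exactly once in id order, provided the ids are unique and the collection is not modified during the walk.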

On Tue, Dec 17, 2013 at 6:51 PM, Otis Gospodnetic <otis.gospodne...@gmail.com> wrote:

> Joel - can you please elaborate a bit on how this compares with Hoss'
> approach? Complementary?
>
> Thanks,
> Otis
> --
> Performance Monitoring * Log Analytics * Search Analytics
> Solr & Elasticsearch Support * http://sematext.com/
>
>
> On Tue, Dec 17, 2013 at 6:45 PM, Joel Bernstein <joels...@gmail.com> wrote:
>
> > SOLR-5244 is also working in this direction. This focuses on efficient
> > binary extract of entire search results.
> >
> >
> > On Tue, Dec 17, 2013 at 2:33 PM, Otis Gospodnetic <otis.gospodne...@gmail.com> wrote:
> >
> > > Hoss is working on it. Search for deep paging or cursor in JIRA.
> > >
> > > Otis
> > > Solr & ElasticSearch Support
> > > http://sematext.com/
> > > On Dec 17, 2013 12:30 PM, "Petersen, Robert" <robert.peter...@mail.rakuten.com> wrote:
> > >
> > > > Hi solr users,
> > > >
> > > > We have a new use case where we need to make a pile of data
> > > > available as XML to a client, and I was thinking we could easily
> > > > put all this data into a solr collection and the client could just
> > > > do a star search and page through all the results to obtain the
> > > > data we need to give them. Then I remembered we currently don't
> > > > allow deep paging in our current search indexes as performance
> > > > declines the deeper you go. Is this still the case?
> > > >
> > > > If so, is there another approach to make all the data in a
> > > > collection easily available for retrieval? The only thing I can
> > > > think of is to query our DB for all the unique IDs of all the
> > > > documents in the collection and then pull the documents out in
> > > > small groups with successive queries like
> > > > 'UniqueIdField:(id1 OR id2 OR ... OR idn)'
> > > > 'UniqueIdField:(idn+1 OR idn+2 OR ... etc)'
> > > > which doesn't seem like a very good approach because the DB might
> > > > have been updated with new data which hasn't been indexed yet and
> > > > so all the ids might not be in there (which may or may not matter
> > > > I suppose).
> > > >
> > > > Then I was thinking we could have a field with an incrementing
> > > > numeric value which could be used to perform range queries as a
> > > > substitute for paging through everything. I.e., queries like
> > > > 'IncrementalField:[1 TO 100]' 'IncrementalField:[101 TO 200]'
> > > > but this would be difficult to maintain as we update the index
> > > > unless we reindex the entire collection every time we update any
> > > > docs at all.
> > > >
> > > > Is this perhaps not a good use case for solr? Should I use
> > > > something else or is there another approach that would work here
> > > > to allow a client to pull groups of docs in a collection through
> > > > the rest api until the client has gotten them all?
> > > >
> > > > Thanks
> > > > Robi
> > > >
> > >
> >
> >
> > --
> > Joel Bernstein
> > Search Engineer at Heliosearch
>

--
Joel Bernstein
Search Engineer at Heliosearch