SOLR-5244 is also working in this direction. It focuses on efficient binary extraction of entire search results.
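
In the meantime, the two workarounds raised in the quoted message below (batched ID lookups and range queries over an incrementing field) could be sketched roughly as follows. This is only a minimal illustration of generating the query strings, not a recommended export mechanism: the helper names and batch sizes are hypothetical, the field names are the ones used in the original question, and it assumes the IDs contain no characters that need escaping in a Solr query.

```python
from typing import Iterator, Sequence


def id_batch_queries(ids: Sequence[str], batch_size: int,
                     field: str = "UniqueIdField") -> Iterator[str]:
    """Yield one query string per batch of IDs (hypothetical helper)."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(ids), batch_size):
        batch = ids[start:start + batch_size]
        # Assumes IDs are plain tokens that need no query escaping.
        yield f"{field}:({' OR '.join(batch)})"


def range_queries(max_value: int, step: int,
                  field: str = "IncrementalField") -> Iterator[str]:
    """Yield inclusive, non-overlapping ranges covering 1..max_value."""
    if step < 1:
        raise ValueError("step must be at least 1")
    for low in range(1, max_value + 1, step):
        high = min(low + step - 1, max_value)
        yield f"{field}:[{low} TO {high}]"


if __name__ == "__main__":
    for query in range_queries(250, 100):
        print(query)
    for query in id_batch_queries(["id1", "id2", "id3"], 2):
        print(query)
```

Each generated string would then be sent as the q parameter of an ordinary select request. The caveats from the original message still apply: the ID list can drift from what is actually indexed, and the incrementing field has to be kept consistent across updates.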

On Tue, Dec 17, 2013 at 2:33 PM, Otis Gospodnetic <otis.gospodne...@gmail.com> wrote:

> Hoss is working on it. Search for deep paging or cursor in JIRA.
>
> Otis
> Solr & ElasticSearch Support
> http://sematext.com/
>
> On Dec 17, 2013 12:30 PM, "Petersen, Robert" <robert.peter...@mail.rakuten.com> wrote:
>
> > Hi Solr users,
> >
> > We have a new use case where we need to make a pile of data available
> > as XML to a client, and I was thinking we could easily put all this
> > data into a Solr collection and the client could just do a star search
> > and page through all the results to obtain the data we need to give
> > them. Then I remembered we currently don't allow deep paging in our
> > current search indexes, as performance declines the deeper you go. Is
> > this still the case?
> >
> > If so, is there another approach to make all the data in a collection
> > easily available for retrieval? The only thing I can think of is to
> > query our DB for all the unique IDs of all the documents in the
> > collection and then pull the documents out in small groups with
> > successive queries like 'UniqueIdField:(id1 OR id2 OR ... OR idn)'
> > 'UniqueIdField:(idn+1 OR idn+2 OR ... etc)', which doesn't seem like a
> > very good approach because the DB might have been updated with new
> > data which hasn't been indexed yet, and so all the IDs might not be in
> > there (which may or may not matter, I suppose).
> >
> > Then I was thinking we could have a field with an incrementing numeric
> > value which could be used to perform range queries as a substitute for
> > paging through everything, i.e. queries like
> > 'IncrementalField:[1 TO 100]' 'IncrementalField:[101 TO 200]', but
> > this would be difficult to maintain as we update the index unless we
> > reindex the entire collection every time we update any docs at all.
> >
> > Is this perhaps not a good use case for Solr? Should I use something
> > else, or is there another approach that would work here to allow a
> > client to pull groups of docs in a collection through the REST API
> > until the client has gotten them all?
> >
> > Thanks
> > Robi

--
Joel Bernstein
Search Engineer at Heliosearch