Possibly worth mentioning, although it might not be appropriate for your use case: if the fields you're interested in are configured with docValues, you could use streaming expressions (or handle thread-per-shard connections to the /export handler directly) and get everything in a single pass, without paging of any kind. (I'm actually working on something of this nature now; though not quite ready for prime time, it's reliably exporting 68 million records across 24 shards to a 24 GB compressed zip archive in 23 minutes.)
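
For what it's worth, here is a rough Python sketch of the thread-per-shard idea. It is not the tool I mentioned above, just an illustration. The core URLs and field names are made up, and it assumes every field in fl (and the sort field) has docValues, which /export requires. The default fetch_docs loads a whole shard response into memory, which would not survive a large export; a real exporter would stream each response to disk instead.

```python
import json
from concurrent.futures import ThreadPoolExecutor
from urllib.parse import urlencode
from urllib.request import urlopen


def export_url(core_url, fields, sort_field="id", query="*:*"):
    """Build an /export request aimed at one core (one replica of one shard)."""
    params = {
        "q": query,
        "fl": ",".join(fields),       # every field listed must have docValues
        "sort": f"{sort_field} asc",  # /export requires an explicit sort
        "distrib": "false",           # stay on this core, do not fan out
        "wt": "json",
    }
    return f"{core_url.rstrip('/')}/export?{urlencode(params)}"


def fetch_docs(url):
    """Simplified fetch: reads the entire response into memory (small tests only)."""
    with urlopen(url) as resp:
        return json.load(resp)["response"]["docs"]


def export_all(core_urls, fields, fetch=fetch_docs):
    """Run one export per core, one thread each; returns {core_url: result}."""
    urls = [export_url(u, fields) for u in core_urls]
    with ThreadPoolExecutor(max_workers=len(urls)) as pool:
        return dict(zip(core_urls, pool.map(fetch, urls)))


if __name__ == "__main__":
    # Hypothetical core URLs: substitute one replica per shard of your collection.
    cores = [
        "http://localhost:8983/solr/mycoll_shard1_replica_n1",
        "http://localhost:8983/solr/mycoll_shard2_replica_n2",
    ]
    for core in cores:
        print(export_url(core, ["id", "title"]))
```

The fetch argument is there so the per-shard work can be swapped out (for example, to write each shard's stream to its own compressed file) without touching the threading.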

On Mon, Feb 10, 2020 at 6:39 PM Erick Erickson <erickerick...@gmail.com> wrote:
>
> Any field that’s unique per doc would do, but yeah, that’s usually an ID.
>
> Hmmm, I don’t see why separate queries for 0-f are necessary if you’re
> firing at individual replicas. Each replica should have multiple UUIDs
> that start with 0-f.
>
> Unless I misunderstand and you’re just firing off, say, 16 threads at the
> entire collection rather than individual shards which would work too. But
> for individual shards I think you need to look for all possible IDs...
>
> Erick
>
> > On Feb 10, 2020, at 5:37 PM, Walter Underwood <wun...@wunderwood.org> wrote:
> >
> >> On Feb 10, 2020, at 2:24 PM, Walter Underwood <wun...@wunderwood.org> wrote:
> >>
> >> Not sure if range queries work on a UUID field, ...
> >
> > A search for id:0* took 260 ms, so it looks like they work just fine.
> > I’ll try separate queries for 0-f.
> >
> > wunder
> > Walter Underwood
> > wun...@wunderwood.org
> > http://observer.wunderwood.org/ (my blog)