I know that when paging through a big set of results, using cursorMark is
better than using start/rows pagination, because cursorMark copes better
when data is inserted/updated/deleted during pagination and it can perform
better on deep pages.
https://solr.apache.org/guide/solr/latest/query-guide/pagination-of-results.html

I know that there are Response Writers, so that if I want to get my results
in CSV I can, just by changing the wt parameter.
https://solr.apache.org/guide/solr/latest/query-guide/response-writers.html

So my question is: what if I want to combine them, and get a series of CSV
pages nicely paginated with cursorMark?

I can't see any options to do this - are there any?

Are there any good workarounds?

I could just page with start/rows and accept the problems with that. However,
if a row is inserted/deleted/moved above my current position, my data will
shift by one row, so I would skip or repeat a row, and that's not great.
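
To make that shift concrete, here is a toy illustration in Python. A plain
list stands in for the sorted result set; nothing here talks to Solr, and
the page() helper is just something I made up for the example.

```python
def page(results, start, rows):
    """Return one start/rows page from an already sorted result list."""
    return results[start:start + rows]


# Case 1: a row above my position is deleted between page 1 and page 2.
index = ["a", "b", "c", "d", "e", "f"]
seen = page(index, 0, 3)                      # first page: a, b, c
index.remove("a")                             # everything shifts up by one
seen_after_delete = seen + page(index, 3, 3)  # second page starts at "e"

# Case 2: a row is inserted above my position between page 1 and page 2.
index = ["a", "b", "c", "d", "e", "f"]
seen = page(index, 0, 3)                      # first page: a, b, c
index.insert(0, "0")                          # everything shifts down by one
seen_after_insert = seen + page(index, 3, 3)  # second page starts at "c" again

print(seen_after_delete)  # "d" is never seen
print(seen_after_insert)  # "c" is seen twice
```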

I could use cursorMark with 2 queries per page, like:

* Set cursorMark to the last known nextCursorMark, or "*" if it's the start.
* Call the API once with the JSON response writer. Note the value of
nextCursorMark.
* Call the API a second time with the CSV response writer. Save my CSV
result somewhere.
* Maybe pause a second to avoid rate limiting.
* If nextCursorMark is different from the cursorMark I sent, there are more
results, so loop over again.
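
The steps above could be sketched roughly like this in Python. This is only
a sketch under assumptions: the function names (http_fetch,
export_csv_pages) are mine, the request function is passed in so the loop
can be tried without a server, and the sort is assumed to include the
uniqueKey field, which cursorMark requires. One small deviation from the
list: it compares the cursors before the CSV call, so the final empty page
is not fetched a second time.

```python
import json
import time
import urllib.parse
import urllib.request


def http_fetch(base_url, params):
    """GET base_url with the given query parameters; return the body as text."""
    url = base_url + "?" + urllib.parse.urlencode(params)
    with urllib.request.urlopen(url) as resp:
        return resp.read().decode("utf-8")


def export_csv_pages(fetch, query, sort, rows=100, pause=0.0):
    """Yield one CSV body per page, using two requests per page.

    fetch(params) -> str is any callable that performs the request.
    sort must include the uniqueKey field for cursorMark to be accepted.
    """
    cursor = "*"
    while True:
        params = {"q": query, "sort": sort, "rows": rows, "cursorMark": cursor}

        # First request: JSON, only to learn nextCursorMark.
        meta = json.loads(fetch({**params, "wt": "json"}))
        next_cursor = meta["nextCursorMark"]

        # Same cursor back means there is nothing left to read.
        if next_cursor == cursor:
            break

        # Second request: identical parameters, but rendered as CSV.
        yield fetch({**params, "wt": "csv"})

        cursor = next_cursor
        if pause:
            time.sleep(pause)  # optional back-off to avoid rate limiting
```

To run it against a real server I would wrap http_fetch with the select
URL of the collection, for example
functools.partial(http_fetch, "http://localhost:8983/solr/mycollection/select")
(a made-up URL), and write each yielded string to its own file.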

With this system, if a row is inserted/deleted/moved above my current
position, my data will not shift - great. However if a row is
inserted/deleted/moved in my current page between the 2 queries, I may miss
a row or double count a row.

Any better options?

Thank you in advance,
James
