To retrieve large quantities of data from Elasticsearch into NiFi, yes, it's
probably the best way we have.
The processors don't currently use slicing (parallelism) internally for the
Elasticsearch queries, but as you're writing a query for every month, you could
increase the processor's Concu
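
As a rough illustration of the "one query per month" idea, here is a minimal sketch that builds one Elasticsearch range query per calendar month. The field name `@timestamp`, the start year, and the helper names `month_ranges` and `monthly_query` are assumptions for the example, not anything taken from the flow discussed here.

```python
import json
from datetime import date


def month_ranges(start_year, start_month, count):
    """Yield (first_day, first_day_of_next_month) for `count` consecutive months."""
    year, month = start_year, start_month
    for _ in range(count):
        # Roll over to January of the following year after December.
        next_year, next_month = (year + 1, 1) if month == 12 else (year, month + 1)
        yield date(year, month, 1), date(next_year, next_month, 1)
        year, month = next_year, next_month


def monthly_query(field, start, end):
    """Build a range query over the half-open interval [start, end)."""
    # Using gte/lt keeps adjacent months from overlapping.
    return {
        "query": {
            "range": {field: {"gte": start.isoformat(), "lt": end.isoformat()}}
        }
    }


# Hypothetical field name and year, chosen only for illustration.
queries = [monthly_query("@timestamp", s, e) for s, e in month_ranges(2023, 1, 12)]
print(json.dumps(queries[0], indent=2))
```

Each generated query could then be handed to the query processor as its own FlowFile, so that several months can be fetched at the same time.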
Good points - I've done some testing.
About 1-2 minutes for 1 month's data with 1k page sizes and about half that
for 10k. About 8-10 minutes for 1 year's worth of data at 10k pages.
Per month looks like the sweet spot in terms of size - that's about
500-750MB.
In terms of building the upstream t
I'd guess it depends on what you want to achieve downstream, e.g. would setting
the query processor to output per_query and return everything in 1 be
useful? Internally, the processor is still fetching everything in pages from
Elasticsearch, so setting the size higher will reduce the number of netw