Re: opensearch/elasticsearch with pagination

2023-08-19 Thread Chris Sampson
To retrieve large quantities of data from Elasticsearch into NiFi, yes, it's probably the best way we have. The processors don't currently use slicing (parallelism) internally for the Elasticsearch queries, but as you're writing a query for every month, you could increase the processor's Concu
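
For illustration only, here is a rough sketch of what "a query for every month" could look like as Elasticsearch range queries. The field name `@timestamp` and the helper names are assumptions, not something taken from this thread or from the NiFi processors; each generated body would be handed to the query processor as a separate query.

```python
from datetime import date


def month_starts(first: date, count: int):
    """Yield count + 1 month boundaries: the first day of each month, plus the closing bound."""
    year, month = first.year, first.month
    for _ in range(count + 1):
        yield date(year, month, 1)
        month += 1
        if month > 12:
            year, month = year + 1, 1


def monthly_queries(first: date, count: int, field: str = "@timestamp"):
    """Build one range query per month; `field` is a hypothetical date field name."""
    bounds = list(month_starts(first, count))
    return [
        # Half-open interval [lo, hi) so that no document falls into two months.
        {"query": {"range": {field: {"gte": lo.isoformat(), "lt": hi.isoformat()}}}}
        for lo, hi in zip(bounds, bounds[1:])
    ]


if __name__ == "__main__":
    for body in monthly_queries(date(2022, 11, 1), 3):
        print(body)
```

Using `gte` on the first day of a month and `lt` on the first day of the next keeps the monthly windows from overlapping, so the queries can run concurrently without returning the same document twice.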

Re: opensearch/elasticsearch with pagination

2023-08-19 Thread Richard Beare
Good points - I've done some testing. About 1-2 minutes for 1 month's data with 1k page sizes and about half that for 10k. About 8-10 minutes for 1 year's worth of data at 10k pages. Per month looks like the sweet spot in terms of size - that's about 500-750MB. In terms of building the upstream t

Re: opensearch/elasticsearch with pagination

2023-08-19 Thread Chris Sampson
I'd guess it depends on what you want to achieve downstream, e.g. would setting the query processor to output per_query and return everything in 1 be useful? Internally, the processor is still fetching everything in pages from Elasticsearch; setting the size higher will reduce the number of netw
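
As a back-of-the-envelope sketch of why the page size matters: the number of page requests is roughly the hit count divided by the page size, rounded up. The hit count below is a made-up figure for illustration, not a measurement from this thread.

```python
def page_requests(total_hits: int, page_size: int) -> int:
    """Number of page fetches needed to read total_hits documents."""
    if page_size <= 0:
        raise ValueError("page_size must be positive")
    if total_hits < 0:
        raise ValueError("total_hits must not be negative")
    # Ceiling division without floating point.
    return -(-total_hits // page_size)


if __name__ == "__main__":
    hits = 1_000_000  # hypothetical number of documents in one month
    for size in (1_000, 10_000):
        print(size, page_requests(hits, size))
```

With that hypothetical count, 1k pages need 1000 requests and 10k pages need 100, so the larger page cuts the round trips by a factor of ten.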