Re: opensearch/elasticsearch with pagination

2023-08-23 Thread Chris Sampson
Just picking up on a couple of comments/threads within this chain (for clarity). Environment: * NiFi 2.0.0-SNAPSHOT (i.e. current “main” latest, but should be pretty much the same for the Elasticsearch processors as the current 1.23.2 release) * Elasticsearch 8.9.1 If I run the SearchElasticsear

Re: opensearch/elasticsearch with pagination

2023-08-21 Thread Richard Beare
I discovered JsonQueryElasticsearch just before your message arrived, and it looks to be returning aggregates only if I set size: 0, so I think that is the place to start for this problem. On Mon, Aug 21, 2023 at 5:41 PM Chris Sampson wrote: > Using SearchElasticsearch for just an aggregatio

Re: opensearch/elasticsearch with pagination

2023-08-21 Thread Chris Sampson
Using SearchElasticsearch for just an aggregation feels like it might not be the right choice (maybe look at JsonQueryElasticsearch instead), or are the dates constantly changing, i.e. new data is always appearing, so you want to keep triggering the flow, and you want to use this as the starting

Re: opensearch/elasticsearch with pagination

2023-08-21 Thread Richard Beare
I'm repeatedly selecting the min and max date stamp using a SearchElasticsearch processor to begin creating the query generator. The query looks like: { "size" : 0, "aggs" : { "newest" : { "max" : { "field" : "Visit_DateTime"}}, "oldest" : { "min" : { "field" : "Visit_DateTime"}} } } This seems t
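For readers following along, the aggregation-only query quoted above can be built and pretty-printed as in the sketch below. This is only an illustration: the helper name `min_max_agg_query` is hypothetical, and the field name `Visit_DateTime` is taken from the message. Setting `size` to 0 asks Elasticsearch to return the aggregations without any document hits.

```python
import json


def min_max_agg_query(field: str) -> dict:
    """Build an aggregation-only search body; size 0 means no hits are returned."""
    return {
        "size": 0,
        "aggs": {
            "newest": {"max": {"field": field}},
            "oldest": {"min": {"field": field}},
        },
    }


# Field name as used in the thread.
query = min_max_agg_query("Visit_DateTime")
print(json.dumps(query, indent=2))
```

The printed JSON could then be pasted into a processor's Query property or sent to the `_search` endpoint directly.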

Re: opensearch/elasticsearch with pagination

2023-08-19 Thread Chris Sampson
To retrieve large quantities of data from Elasticsearch into NiFi, yes, it's probably the best way we have. The processors don't currently use slicing (parallelism) internally for the Elasticsearch queries, but as you're writing a query for every month, you could increase the processor's Concu

Re: opensearch/elasticsearch with pagination

2023-08-19 Thread Richard Beare
Good points - I've done some testing. About 1-2 minutes for 1 month's data with 1k page sizes and about half that for 10k. About 8-10 minutes for 1 year's worth of data at 10k pages. Per month looks like the sweet spot in terms of size - that's about 500-750MB. In terms of building the upstream t
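The per-month approach discussed here could be sketched roughly as follows. This is a minimal illustration, not the flow actually built in the thread: the helper names `month_windows` and `month_query` are hypothetical, the example dates are made up, and it assumes `Visit_DateTime` is mapped as a date field that accepts ISO-8601 strings. Each query covers one calendar month as a half-open range (`gte` first day, `lt` first day of the next month), with a 10k page size and a sort clause so it can be paginated.

```python
from datetime import date


def month_windows(start: date, end: date):
    """Yield (first_day, first_day_of_next_month) pairs covering start..end."""
    current = date(start.year, start.month, 1)
    while current <= end:
        if current.month == 12:
            nxt = date(current.year + 1, 1, 1)
        else:
            nxt = date(current.year, current.month + 1, 1)
        yield current, nxt
        current = nxt


def month_query(first: date, nxt: date,
                field: str = "Visit_DateTime", page_size: int = 10000) -> dict:
    """Build one search body restricted to a single month window."""
    return {
        "size": page_size,
        "sort": [{field: "asc"}],
        "query": {"range": {field: {"gte": first.isoformat(),
                                    "lt": nxt.isoformat()}}},
    }


# Made-up oldest/newest dates, standing in for the min/max aggregation results.
queries = [month_query(a, b)
           for a, b in month_windows(date(2022, 11, 15), date(2023, 1, 10))]
for q in queries:
    print(q["query"]["range"]["Visit_DateTime"])
```

Half-open ranges avoid fetching a document twice when its timestamp falls exactly on a month boundary.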

Re: opensearch/elasticsearch with pagination

2023-08-19 Thread Chris Sampson
I'd guess it depends on what you want to achieve downstream, e.g. would setting the query processor to output per_query and return everything in 1 be useful? Internally, the processor is still fetching everything in pages from Elasticsearch; setting the size higher will reduce the number of netw

Re: opensearch/elasticsearch with pagination

2023-08-18 Thread Richard Beare
A bit of progress. First up, firing a match_all at my index with 20M documents doesn't work, as you probably expected. Or more precisely, is unlikely to be useful - I left it overnight and nothing appeared to have happened, so I guess it was madly fetching pages and filling up available storage. S

Re: opensearch/elasticsearch with pagination

2023-08-17 Thread Chris Sampson
Ah, so these processors have all been written for Elasticsearch, and use the Elasticsearch low-level REST API library to form connections. They've not been tested against OpenSearch, although hopefully should work for any interactions where the API is the same, but the two products continue to d

Re: opensearch/elasticsearch with pagination

2023-08-17 Thread Richard Beare
I did use the example and got errors. I'll revisit that (perhaps it is an opensearch idiosyncrasy). The per response option is probably my issue. I'll check that out and get back to you. Thanks again On Fri, Aug 18, 2023 at 2:30 PM Chris Sampson wrote: > Check the example in the processor's add

Re: opensearch/elasticsearch with pagination

2023-08-17 Thread Chris Sampson
Check the example in the processor's additional details docs [1] for how you could set size and sort fields for the query - size is used to determine the number of documents returned per page, sort is required if using a "search after" or "point in time" query type. If the Query property is se
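To illustrate why a sort clause matters for "search after" pagination, here is a rough sketch of how the next page's request body is derived from the last hit of the previous page. This is not the NiFi processor's code: the function `next_page_body` is hypothetical, the sample hit is made up, and the field `Visit_DateTime` is borrowed from elsewhere in the thread. In the Elasticsearch search API, each hit of a sorted query carries a `sort` array, and passing that array as `search_after` resumes the query from that point.

```python
import copy


def next_page_body(body: dict, last_hit: dict) -> dict:
    """Return a copy of the search body that resumes after the given hit."""
    if "sort" not in body:
        raise ValueError("search_after pagination needs a 'sort' clause")
    nxt = copy.deepcopy(body)
    # The hit's sort values mark where the previous page ended.
    nxt["search_after"] = last_hit["sort"]
    return nxt


first_page = {
    "size": 1000,                         # documents per page
    "sort": [{"Visit_DateTime": "asc"}],  # required for search_after
    "query": {"match_all": {}},
}

# Made-up last hit from page 1; "sort" holds that hit's sort values.
last_hit = {"_id": "abc", "sort": [1692230400000]}
second_page = next_page_body(first_page, last_hit)
print(second_page)
```

Without a sort clause there are no sort values to resume from, which is why the body without `sort` is rejected in this sketch.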

Re: opensearch/elasticsearch with pagination

2023-08-17 Thread Richard Beare
Thanks, that makes sense. I've had trouble getting a size parameter accepted, but will work on that later. However, I'm unsure what I should expect to see in the following test scenario. A fixed query in the Query parameter - a match all. i.e. nothing dynamic set by upstream processing An empty

Re: opensearch/elasticsearch with pagination

2023-08-17 Thread Chris Sampson
Again, sounds like it's working as documented [1] - an input is required to trigger the PaginatedJsonQueryElasticsearch processor, so something like GenerateFlowFile is a way to achieve that if you want to periodically execute a paginated query, e.g. by setting the Generate processor's schedule

Re: opensearch/elasticsearch with pagination

2023-08-17 Thread Richard Beare
I must be missing something simple. I've copied the parameters and query from the SearchElasticsearch processor and I'm not getting errors, but no flowfiles are produced. I'm forced to add an input connection, despite coding the query in the Query property. I have a GenerateFlowFile processor conn

Re: opensearch/elasticsearch with pagination

2023-08-16 Thread Chris Sampson
Elasticsearch doesn't have a CDC-like capability (it doesn't maintain a transaction log or such), so that approach isn't possible. What I've done previously is to maintain an audit log in a separate index within Elasticsearch to track what data I've previously posted, e.g. this might be the las

Re: opensearch/elasticsearch with pagination

2023-08-16 Thread Richard Beare
One further question - what is the recommended way of checking for updates in an index and fetching new records in a similar manner to GenerateTableFetch for an sql DB? Thanks On Thu, Aug 17, 2023 at 7:21 AM Richard Beare wrote: > Sounds perfect. Thanks > > On Thu, Aug 17, 2023 at 5:11 AM Chris

Re: opensearch/elasticsearch with pagination

2023-08-16 Thread Richard Beare
Sounds perfect. Thanks On Thu, Aug 17, 2023 at 5:11 AM Chris Sampson wrote: > What you describe sounds like the processor is working as designed & > documented, i.e. it will restart the same query once it has reached the end > of the paginated scroll (or search_after, or point-in-time) query. >

Re: opensearch/elasticsearch with pagination

2023-08-16 Thread Chris Sampson
What you describe sounds like the processor is working as designed & documented, i.e. it will restart the same query once it has reached the end of the paginated scroll (or search_after, or point-in-time) query. Instead, it sounds like you want to try using the PaginatedJsonQueryElasticsearch [