What you describe sounds like the processor working as designed and
documented: SearchElasticsearch restarts the same query once it has reached
the end of the paginated (scroll, search_after, or point-in-time) results.

It sounds like you want the PaginatedJsonQueryElasticsearch [1] processor
instead. It executes the query given to it, either as the Query property or
as the body of an incoming FlowFile, outputs the results, and then stops.
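
To make the difference concrete, here is a toy, in-memory Python sketch of the
two behaviours. It is not NiFi or Elasticsearch code: fetch_page, run_once and
run_restarting are hypothetical names, and the "index" is just a list.

```python
from typing import Iterator, List, Tuple


def fetch_page(docs: List[str], cursor: int, size: int) -> Tuple[List[str], int]:
    """Hypothetical stand-in for one paginated request: returns (hits, next cursor)."""
    hits = docs[cursor:cursor + size]
    return hits, cursor + len(hits)


def run_once(docs: List[str], size: int = 2) -> Iterator[str]:
    """Page through the results a single time and stop when a page comes back empty."""
    cursor = 0
    while True:
        hits, cursor = fetch_page(docs, cursor, size)
        if not hits:
            return  # pagination exhausted: nothing more is emitted
        yield from hits


def run_restarting(docs: List[str], size: int = 2, runs: int = 2) -> Iterator[str]:
    """Start the same query again after it finishes, so every document repeats."""
    for _ in range(runs):
        yield from run_once(docs, size)


if __name__ == "__main__":
    index = ["doc-1", "doc-2", "doc-3"]
    print(list(run_once(index)))        # each document appears once
    print(list(run_restarting(index)))  # each document appears once per run
```

run_once corresponds to the "execute, output, stop" behaviour described above,
while run_restarting mirrors the repeated cycles through the index that you
are seeing.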


[1] 
https://nifi.apache.org/docs/nifi-docs/components/org.apache.nifi/nifi-elasticsearch-restapi-nar/1.23.0/org.apache.nifi.processors.elasticsearch.PaginatedJsonQueryElasticsearch/index.html

On 2023/08/16 07:57:43 Richard Beare wrote:
> Hi,
> I am using the SearchElasticSearch (1.20.0) processor to retrieve all
> documents (~20M) from an index, process and eventually return results to a
> new index, although for this test I'm retrieving and processing then
> discarding. I'm using opensearch.
> 
> My problem is that the process restarts after completion - I discovered
> this, and docs confirm, after seeing warnings from my processing code
> (which reformats json ready for other work) being repeated for the same
> document ID.
> 
> How do I configure the processor to stop after completing the first
> query?
> 
> I've tried the following:
> 
> Query: {"query" : {"match_all" :{}}}
> 
> with pagination_type SCROLL
> 
> I haven't found a combination of the properties that doesn't lead to
> repeated cycles through the index.
> 
> I've also tried {"query" : {"match_all" :{}}, "sort" : [{"Visit_DateTime" :
> "asc"}]}
> 
> and SEARCH_AFTER pagination type, with the same problem.
> 
> What am I missing?
> Thanks
> 
