Jens,

In this case it would make perfect sense to use Cron for the ListFile. 
FetchFile would then be Timer Driven. The point here is around CRON driven 
processors in the middle of the flow.

Thanks
Mark


On Feb 22, 2023, at 10:17 AM, Jens M. Kofoed <[email protected]> wrote:


Hi Mark
We have many List/Get processors that run via CRON. Some systems export data 
to disk every hour, but they can't block read access to the files while 
writing them, so NiFi can pull the same file multiple times and try to delete 
it while it is still being written. But we know that the export only takes 10 
minutes. Therefore we use a CRON to get files between 0 0 15-55 * *
We have similar issues with other systems that only provide data, or are only 
accessible, in specific time slots.

To John:
Could you use a Notify/Wait gate? A Wait processor blocks FlowFiles in front 
of the MergeContent processor, and in another flow a GenerateFlowFile and a 
Notify processor open the gate (the Wait processor). After the MergeContent 
you could have another Notify processor to close the gate again.
This way, many FlowFiles would reach the MergeContent processor at the same 
time.
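
The gate idea can be pictured with a toy model. This is only a sketch of the behaviour described above, not how NiFi's Wait and Notify processors are implemented; the `Gate` class and its method names are hypothetical.

```python
class Gate:
    """Toy model of a Wait/Notify gate (hypothetical, not NiFi code)."""

    def __init__(self):
        self.is_open = False
        self.waiting = []

    def offer(self, flowfile):
        # "Wait": hold the FlowFile while the gate is closed,
        # pass it straight through while the gate is open.
        if self.is_open:
            return [flowfile]
        self.waiting.append(flowfile)
        return []

    def open(self):
        # "Notify" from the side flow: release everything held so far.
        self.is_open = True
        released, self.waiting = self.waiting, []
        return released

    def close(self):
        # "Notify" after MergeContent: start holding FlowFiles again.
        self.is_open = False
```

Everything offered while the gate is closed is released in one batch by `open()`, which is what lets MergeContent see all the FlowFiles together.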

Kind regards
Jens M. Kofoed

On 22 Feb 2023 at 15:24, Mark Payne <[email protected]> wrote:

Interesting. Thanks for that feedback Harald. It might make sense to be more 
surgical about this, disabling it for MergeContent, for example, instead of all 
interflow processors.

Thanks
-Mark


On Feb 22, 2023, at 5:42 AM, Dobbernack, Harald (Key-Work) 
<[email protected]> wrote:

Just responding to this part:
"You should not be using CRON driven for any processors in the middle of a 
flow. In fact, we really should probably just disable that altogether."

Please don't disable this! We actually use CRON for some of our PutSFTP 
processors, as these SFTP servers have service times that we are supposed to 
respect by not using them, or the SFTP will actually not be available... Of 
course we could also route to a Wait processor when we arrive at a time where 
the destination should not be called, but it is much simpler to be able to 
tell the processor in the middle of the flow when not to run.

-----Original Message-----
From: Mark Payne <[email protected]>
Sent: Tuesday, 21 February 2023 21:37
To: [email protected]; John McGinn <[email protected]>
Subject: Re: Processor with cron scheduling in middle of flow


John,

You should not be using CRON driven for any processors in the middle of a flow. 
In fact, we really should probably just disable that altogether.
It’s exceedingly rare that you’d want anything other than Timer-Driven with a 
Run Schedule of 0 sec.
MergeContent will not create any merged output on its first iteration after 
it’s scheduled to run. It requires at least a second iteration before anything 
is transferred. Its algorithm has evolved over time, and it may well have 
happened to work previously but it’s really not being configured as intended.

When you say that you’re retrieving data from a few sources and then “merges 
that all back into a single file” - does that mean that you started with one 
FlowFile, split it up, and then want to re-assemble the data after performing 
enrichment? If so you’ll want to use a Merge Strategy of Defragment.
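
As a rough illustration of what Defragment does: MergeContent relies on the `fragment.identifier`, `fragment.index` and `fragment.count` attributes that split processors write. The sketch below is a simplified model of that grouping, not NiFi's implementation; the dictionary layout and the `defragment` function are assumptions made for the example.

```python
from collections import defaultdict


def defragment(flowfiles):
    """Reassemble fragments that share a fragment.identifier.

    Each FlowFile is modelled as {"attributes": {...}, "content": str}.
    Returns {identifier: merged_content} for complete groups only.
    """
    parts = defaultdict(dict)   # identifier -> {index: content}
    expected = {}               # identifier -> fragment.count

    for ff in flowfiles:
        attrs = ff["attributes"]
        fid = attrs["fragment.identifier"]
        parts[fid][int(attrs["fragment.index"])] = ff["content"]
        expected[fid] = int(attrs["fragment.count"])

    merged = {}
    for fid, pieces in parts.items():
        # Only merge once every fragment of the original has arrived.
        if len(pieces) == expected[fid]:
            merged[fid] = "".join(pieces[i] for i in sorted(pieces))
    return merged
```

The point is that the bundle is complete exactly when all `fragment.count` pieces are present, so no timing or cron schedule is needed to know when to merge.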

If you’re trying to just bring in some data and merge it together by 
correlation attribute, then Bin Packing makes sense. Here, you have a few 
properties that you can use to try to get the best bin packing. In short, a bin 
will be merged when any of these conditions is met:

- The Minimum Group Size is reached AND the Minimum Number of Entries is met
- The Maximum Group Size OR the Maximum Number of Entries is met
- A bin has been sitting for “Max Bin Age” amount of time
- If a correlation attribute is used, and a FlowFile comes in that can’t go 
into any bin, the oldest bin is evicted to make room.
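
The conditions above can be written down as a small decision function. This is a simplified model of the list, not MergeContent's actual code; `BinConfig`, `Bin`, `merge_reason` and the default values are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class BinConfig:
    # Hypothetical stand-ins for the MergeContent properties named above.
    min_entries: int = 1
    max_entries: int = 1000
    min_size_bytes: int = 0
    max_size_bytes: Optional[int] = None      # None means no size limit
    max_bin_age_sec: Optional[float] = None   # None means no age limit


@dataclass
class Bin:
    entries: int
    size_bytes: int
    age_sec: float


def merge_reason(b: Bin, cfg: BinConfig, evicted: bool = False) -> Optional[str]:
    """Return why the bin would be merged, or None if it keeps waiting."""
    if b.entries >= cfg.min_entries and b.size_bytes >= cfg.min_size_bytes:
        return "minimums met"
    if b.entries >= cfg.max_entries:
        return "maximum reached"
    if cfg.max_size_bytes is not None and b.size_bytes >= cfg.max_size_bytes:
        return "maximum reached"
    if cfg.max_bin_age_sec is not None and b.age_sec >= cfg.max_bin_age_sec:
        return "max bin age"
    if evicted:
        return "evicted as oldest bin"
    return None
```

With the assumed defaults (one entry minimum, zero bytes minimum) a bin holding a single FlowFile already satisfies the first condition, which is one way to end up with many small merged files.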

If you’re seeing bins smaller than expected, you can look at the Data 
Provenance for the merged FlowFile, and it will tell you exactly which of the 
conditions above triggered the data to be merged. This may help to adjust these 
settings.

Hope this is helpful.

Thanks
-Mark


On Feb 17, 2023, at 1:39 PM, John McGinn via users <[email protected]> 
wrote:

Hello,

NiFi 1.19.0 - I need some help in trying to make my idea work, or figure out 
the better way to do this.

I've got a flow that retrieves data from a few data sources, enhances 
individual flow files, converts attributes to CSV and then merges that all back 
into a single file. It takes roughly 20 minutes for the process to run from 
start to the MergeContent part, so when I do it manually, I stop the 
MergeContent processor until all flowfiles are in the queue waiting, and then I 
start the MergeContent processor. (Run One Time doesn't work for some reason.) 
That works fine, manually.

When I try to put cron scheduling in, it never kicks off. For instance, the 
initial processor in the flow has a cron schedule of the top of the hour 
(0 0 * * * ?). I then put 25 past the hour for MergeContent (0 25 * * * ?). 
When I start the flow, the flowfiles are generated and queue up in front of 
MergeContent by 25 minutes past the hour, but the MergeContent never kicks off.
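
For readers unfamiliar with the format: NiFi's CRON-driven scheduling takes Quartz-style expressions, whose first field is seconds rather than minutes. The helper below is only an illustrative sketch that labels the fields of the two expressions above; `describe_cron` is a hypothetical name and it does not validate or evaluate the schedule.

```python
QUARTZ_FIELDS = ("seconds", "minutes", "hours",
                 "day_of_month", "month", "day_of_week")


def describe_cron(expression: str) -> dict:
    """Label the fields of a Quartz-style cron expression."""
    parts = expression.split()
    if len(parts) not in (6, 7):
        raise ValueError("expected 6 fields plus an optional year field")
    fields = dict(zip(QUARTZ_FIELDS, parts))
    if len(parts) == 7:
        fields["year"] = parts[6]
    return fields


start = describe_cron("0 0 * * * ?")    # second 0, minute 0 of every hour
merge = describe_cron("0 25 * * * ?")   # second 0, minute 25 of every hour
```

Read this way, both schedules fire exactly once per hour, so a processor that needs more than one trigger to produce output would wait a full hour between them.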

I added a correlation attribute recently and removed the cron entry, but the 
MergeContent just creates small bunches of merged files.

I even attempted to put a cron on the AttributesToCSV with a maximum bin age on 
the MergeContent, since it takes less than a minute for the AttributesToCSV to 
process the flowfiles at that point, but the cron didn't kick off there either.

Any ideas on how to get this to work?

Thanks,
John



Harald Dobbernack

Key-Work Consulting GmbH | Kriegsstr. 100 | 76133 | Karlsruhe | Germany | 
https://www.key-work.de | 
Datenschutz: https://www.key-work.de/de/footer/datenschutz.html
Fon: +49-721-78203-264 | E-Mail: [email protected]

Key-Work Consulting GmbH, Karlsruhe, HRB 108695, HRG Mannheim
Geschäftsführung: Petra Wotring
