Interesting. Thanks for that feedback, Harald. It might make sense to be more surgical about this: disabling it for MergeContent, for example, instead of for all mid-flow processors.
Thanks
-Mark

> On Feb 22, 2023, at 5:42 AM, Dobbernack, Harald (Key-Work)
> <[email protected]> wrote:
>
> Just responding to this part:
>> You should not be using CRON driven for any processors in the middle of a
>> flow. In fact, we really should probably just disable that altogether.
> Please don't disable this! We actually use CRON for some of our PutSFTP
> processors, as these SFTP servers have service times that we are supposed
> to respect: we must not use them at those times, or the SFTP will simply
> not be available. Of course we can also route to a wait processor if we
> have arrived at a time when the destination should not be called, but it is
> so much simpler to be able to tell the processor in the middle of the flow
> when not to run.
>
> -----Original Message-----
> From: Mark Payne <[email protected]>
> Sent: Tuesday, February 21, 2023 21:37
> To: [email protected]; John McGinn <[email protected]>
> Subject: Re: Processor with cron scheduling in middle of flow
>
> Key-Work IT Security: This is an external e-mail. Please only click on
> links or attachments if the authenticity of the message is clear.
>
> John,
>
> You should not be using CRON driven for any processors in the middle of a
> flow. In fact, we really should probably just disable that altogether.
> It’s exceedingly rare that you’d want anything other than Timer-Driven
> with a Run Schedule of 0 sec.
>
> MergeContent will not create any merged output on its first iteration after
> it’s scheduled to run. It requires at least a second iteration before
> anything is transferred. Its algorithm has evolved over time, and it may
> well have happened to work previously, but it’s really not being configured
> as intended.
>
> When you say that you’re retrieving data from a few sources and then “merges
> that all back into a single file” - does that mean that you started with one
> FlowFile, split it up, and then want to re-assemble the data after
> performing enrichment? If so, you’ll want to use a Merge Strategy of
> Defragment.
>
> If you’re trying to just bring in some data and merge it together by
> correlation attribute, then Bin Packing makes sense. Here, you have a few
> properties that you can use to try to get the best bin packing. In short, a
> bin will be merged when any of these conditions is met:
>
> - The Minimum Group Size is reached AND the Minimum Number of Entries is met
> - The Maximum Group Size OR the Maximum Number of Entries is met
> - A bin has been sitting for “Max Bin Age” amount of time
> - If a correlation attribute is used, and a FlowFile comes in that can’t go
>   into any bin, it will evict the oldest.
>
> If you’re seeing bins smaller than expected, you can look at the Data
> Provenance for the merged FlowFile, and it will tell you exactly which of
> the conditions above triggered the data to be merged. This may help to
> adjust these settings.
>
> Hope this is helpful.
>
> Thanks
> -Mark
>
>
>> On Feb 17, 2023, at 1:39 PM, John McGinn via users <[email protected]>
>> wrote:
>>
>> Hello,
>>
>> NiFi 1.19.0 - I need some help in trying to make my idea work, or figure
>> out the better way to do this.
>>
>> I've got a flow that retrieves data from a few data sources, enhances
>> individual flow files, converts attributes to CSV and then merges that all
>> back into a single file. It takes roughly 20 minutes for the process to run
>> from start to the MergeContent part, so when I do it manually, I stop the
>> MergeContent processor until all flowfiles are in the queue waiting, and
>> then I start the MergeContent processor. (Run One Time doesn't work for
>> some reason.) That works fine, manually.
>>
>> When I try to put cron scheduling in, it never kicks off. For instance, the
>> initial processor in the flow has a cron schedule of the top of the hour
>> (0 0 * * * ?). I then put 25 past the hour for MergeContent (0 25 * * * ?).
>> When I start the flow, the flowfiles are generated and queue up in front of
>> MergeContent by 25 minutes past the hour, but the MergeContent never kicks
>> off.
>>
>> I added a correlation attribute recently and removed the cron entry, but
>> the MergeContent just creates small bunches of merged files.
>>
>> I even attempted to put a cron on the AttributesToCSV with a maximum bin
>> age on the MergeContent, since it takes less than a minute for the
>> AttributesToCSV to process the flowfiles at that point, but the cron didn't
>> kick off there either.
>>
>> Any ideas on how to get this to work?
>>
>> Thanks,
>> John
>
>
>
> Harald Dobbernack
>
> Key-Work Consulting GmbH | Kriegsstr. 100 | 76133 | Karlsruhe | Germany |
> www.key-work.de<https://www.key-work.de> |
> Privacy policy<https://www.key-work.de/de/footer/datenschutz.html>
> Phone: +49-721-78203-264 | E-Mail: [email protected]
>
> Key-Work Consulting GmbH, Karlsruhe, HRB 108695, HRG Mannheim
> Managing Director: Petra Wotring
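
For anyone reading this thread later: the Bin Packing conditions listed in the quoted message of Feb 21 can be modelled roughly as below. This is only a minimal sketch of that description, not NiFi's actual implementation. The names MergeSettings, Bin, merge_trigger and oldest_bin are hypothetical and simply mirror the MergeContent properties mentioned in the thread. The order in which the conditions are checked is also an assumption.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class MergeSettings:
    # Hypothetical stand-ins for the MergeContent properties named in the thread.
    min_group_size: int               # bytes
    max_group_size: Optional[int]     # bytes; None means "not set"
    min_entries: int
    max_entries: Optional[int]        # None means "not set"
    max_bin_age_sec: Optional[float]  # None means "not set"


@dataclass
class Bin:
    size_bytes: int
    entries: int
    age_sec: float


def merge_trigger(b: Bin, s: MergeSettings) -> Optional[str]:
    """Return the first listed condition that would release the bin, else None."""
    # Condition 1: both minimums must be satisfied (AND).
    if b.size_bytes >= s.min_group_size and b.entries >= s.min_entries:
        return "minimums met"
    # Condition 2: either maximum is enough (OR).
    if s.max_group_size is not None and b.size_bytes >= s.max_group_size:
        return "max group size"
    if s.max_entries is not None and b.entries >= s.max_entries:
        return "max entries"
    # Condition 3: the bin has been sitting for Max Bin Age.
    if s.max_bin_age_sec is not None and b.age_sec >= s.max_bin_age_sec:
        return "max bin age"
    return None


def oldest_bin(bins: List[Bin]) -> Bin:
    """Condition 4: when a FlowFile fits no bin, the oldest bin is evicted."""
    return max(bins, key=lambda b: b.age_sec)
```

Under this simplified model, low minimums let a bin qualify as soon as it holds a little data. That may be one reason for the small bunches of merged files, but the Data Provenance for a merged FlowFile is the place to confirm which condition actually fired.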
