Hi,

I see NiFi as a suitable replacement for both non-open-source and so-called open-source ETL tools :). In a traditional ETL (or ELT) process, we use the process flow below to load data in near real time:
1. spool dir > 2. collect available files > 3. load files into a "staging table" in the DB without transformation > 4. (in the DB) do the final transformation and load into the final table > 5. truncate the "staging table" to avoid duplication on the next load

I think the above process flow is self-explanatory. Steps 3 to 5 should run in sequence (each of them can be parallel individually).

So, I am proposing the following:
- Implement a scheduling option on the process group.
- Start/stop of a "scheduled process group" will be possible at the process group level.
- Each processor under the same "scheduled process group" will keep all its other properties (like parallelism).

Please let me know if anything needs more clarification.

-Obaid

On Thu, Feb 11, 2016 at 8:40 PM, Oleg Zhurakousky < [email protected]> wrote:

> Obaid
>
> Thanks for reaching out!
> Currently it is not possible to wire a flow the way you describe.
> What you are asking is a true Event Driven Consumer paradigm which has been
> discussed internally a lot lately, so it would be very interesting to get
> your perspective as to why do you believe it is important within your use
> case to have that, so please share, since it will help to bring this
> discussion to a resolution.
>
> Having said that in the current NiFi architecture you still get that
> serial processing that you are describing. The only difference is that
> every consumer (Processor) is independent from the internals of another
> since there is a Queue separating each and every one of them allowing each
> processor to be managed independently (e.g., start/stop) while letting flow
> to continue, knowing that messages will queue up if processor is not
> available and processed as soon and as quick as processor becomes available.
>
> Anyway, your thoughts?
>
> Cheers
> Oleg
>
>
> On Feb 11, 2016, at 7:14 AM, obaidul karim <[email protected]> wrote:
> >
> > Hi All,
> >
> > Lets say, I have below processors:
> >
> > 1.listfile > 2.fetchfile > 3.putHDFS > 4.ExecuteSQL
> >
> > I want to run all above processors in sequence.
> > Processor 1 will wait for next run until all 2,3 & 4 completes. Similarly 2 will wait until 3 & 4 completes current run and so on.
> >
> > In other words, can I schedule the entire workflow as single job and under single schedule ?
> >
> >
> > Thanks in advance.
> >
> > -Obaid
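
For reference, the staging-table flow described at the top of this message (steps 2 to 5) can be sketched roughly as below. This is only an illustration of the ordering requirement, not NiFi code: the SQLite database, the `staging` and `final` table names, the two-column CSV layout, and the `run_batch` and `demo` helpers are all assumptions made up for the example.

```python
import csv
import sqlite3
import tempfile
from pathlib import Path


def run_batch(conn: sqlite3.Connection, spool_dir: Path) -> int:
    """Run steps 2-5 once, strictly in order; return rows moved to the final table."""
    # Step 2: collect the files currently available in the spool directory.
    files = sorted(spool_dir.glob("*.csv"))

    # Step 3: load raw rows into the staging table without transformation.
    for path in files:
        with path.open(newline="") as handle:
            rows = [(r[0], r[1]) for r in csv.reader(handle) if len(r) >= 2]
        conn.executemany("INSERT INTO staging (name, amount) VALUES (?, ?)", rows)

    # Step 4: transform inside the database and load the final table.
    loaded = conn.execute("SELECT COUNT(*) FROM staging").fetchone()[0]
    conn.execute(
        "INSERT INTO final (name, amount) "
        "SELECT UPPER(TRIM(name)), CAST(amount AS INTEGER) FROM staging"
    )

    # Step 5: empty the staging table so the next run cannot load duplicates.
    conn.execute("DELETE FROM staging")
    conn.commit()

    # Remove processed files so the next run does not collect them again.
    for path in files:
        path.unlink()
    return loaded


def demo() -> list:
    """Run two consecutive batches against an in-memory database (hypothetical data)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE staging (name TEXT, amount TEXT)")
    conn.execute("CREATE TABLE final (name TEXT, amount INTEGER)")
    with tempfile.TemporaryDirectory() as tmp:
        spool = Path(tmp)
        (spool / "a.csv").write_text(" alice ,10\nbob,20\n")
        first = run_batch(conn, spool)
        second = run_batch(conn, spool)  # no new files, so nothing is reloaded
    rows = conn.execute("SELECT name, amount FROM final ORDER BY name").fetchall()
    return [first, second, rows]


if __name__ == "__main__":
    print(demo())
```

The point of the sketch is that step 5 must not start before step 4 has finished for the same batch, which is why the whole batch runs under a single schedule here.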
