Hi,

As I see it, NiFi is a suitable replacement for closed-source and
so-called open-source ETL tools :).
In a traditional ETL (or ELT) process, we use the following flow to load
data in near real time:

1. spool dir > 2. collect available files > 3. load files into a "staging
table" in the DB without transformation > 4. (in the DB) do the final
transformation and load into the final table > 5. truncate the "staging
table" to avoid duplication on the next load

I think the flow above is self-explanatory. Steps 3 to 5 must run in
sequence (each of them can be parallel individually).
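
To make the ordering concrete, here is a minimal sketch of one run of that
flow. It is only an illustration under my own assumptions: it uses Python
with an in-memory SQLite database, and the names run_batch, make_db, demo
and the tables "staging" and "final" are hypothetical, not part of NiFi or
of any real ETL tool.

```python
import sqlite3
import tempfile
from pathlib import Path


def make_db():
    # Hypothetical schema: one raw staging table, one final table.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE staging(raw TEXT)")
    conn.execute("CREATE TABLE final(value TEXT)")
    return conn


def run_batch(conn, spool_dir):
    """One scheduled run: steps 2 to 5 in strict order."""
    # Step 2: collect the files available right now.
    files = sorted(Path(spool_dir).glob("*.csv"))
    # Step 3: load raw lines into staging, without transformation.
    for f in files:
        rows = [(line,) for line in f.read_text().splitlines() if line]
        conn.executemany("INSERT INTO staging(raw) VALUES (?)", rows)
    # Step 4: transform inside the database and load the final table.
    conn.execute("INSERT INTO final(value) SELECT UPPER(TRIM(raw)) FROM staging")
    # Step 5: empty staging so the next run does not load duplicates.
    conn.execute("DELETE FROM staging")
    conn.commit()
    # Remove the processed files from the spool dir (an assumption of this sketch).
    for f in files:
        f.unlink()
    return len(files)


def demo():
    conn = make_db()
    with tempfile.TemporaryDirectory() as spool:
        Path(spool, "a.csv").write_text(" x \n y\n")
        run_batch(conn, spool)
        run_batch(conn, spool)  # second run finds nothing new to load
    return [r[0] for r in conn.execute("SELECT value FROM final ORDER BY value")]
```

The point of the sketch is that step 5 must not start before step 4 has
finished, otherwise rows are lost or loaded twice.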

So I am proposing the following:
- Implement a scheduling option at the process group level.
- Starting and stopping a "scheduled process group" will be possible at
the process group level.
- Each processor under the same "scheduled process group" will keep all
its other properties (such as parallelism).
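
A rough sketch of the behaviour I have in mind is below. It is purely
hypothetical: ScheduledGroup, run_once and the (function, concurrent_tasks)
pairs are names I made up for illustration, written in Python, and none of
this is existing NiFi API.

```python
from concurrent.futures import ThreadPoolExecutor


class ScheduledGroup:
    """Hypothetical group that runs its steps in order under one schedule."""

    def __init__(self, steps):
        # steps: list of (function, concurrent_tasks) pairs, in run order.
        self.steps = steps
        self.running = True

    def start(self):
        self.running = True

    def stop(self):
        # Stopping the group stops every step in it.
        self.running = False

    def run_once(self, items):
        """One scheduled run; returns None if the group is stopped."""
        if not self.running:
            return None
        for func, concurrent_tasks in self.steps:
            # Each step keeps its own parallelism, but the next step
            # starts only after this one has processed every item.
            with ThreadPoolExecutor(max_workers=concurrent_tasks) as pool:
                items = list(pool.map(func, items))
        return items
```

For example, ScheduledGroup([(load, 4), (transform, 1)]).run_once(files)
would load with four threads, then transform with one, where load and
transform are placeholder functions.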

Please let me know if anything needs more clarification.

-Obaid




On Thu, Feb 11, 2016 at 8:40 PM, Oleg Zhurakousky <
[email protected]> wrote:

> Obaid
>
> Thanks for reaching out!
> Currently it is not possible to wire a flow the way you describe.
> What you are asking for is a true Event Driven Consumer paradigm, which
> has been discussed internally a lot lately. It would be very interesting
> to get your perspective on why you believe it is important for your use
> case, so please share, since it will help bring this discussion to a
> resolution.
>
> Having said that, in the current NiFi architecture you still get the
> serial processing that you are describing. The only difference is that
> every consumer (Processor) is independent of the internals of the others,
> since there is a Queue separating each and every one of them. This allows
> each processor to be managed independently (e.g., start/stop) while the
> flow continues: messages queue up if a processor is not available and are
> processed as soon as the processor becomes available.
>
> Anyway, your thoughts?
>
> Cheers
> Oleg
>
> > On Feb 11, 2016, at 7:14 AM, obaidul karim <[email protected]> wrote:
> >
> > Hi All,
> >
> > Let's say I have the processors below:
> >
> > 1. ListFile > 2. FetchFile > 3. PutHDFS > 4. ExecuteSQL
> >
> > I want to run all of the above processors in sequence.
> > Processor 1 will wait for its next run until 2, 3 & 4 have all completed.
> > Similarly, 2 will wait until 3 & 4 complete the current run, and so on.
> >
> > In other words, can I schedule the entire workflow as a single job
> > under a single schedule?
> >
> >
> > Thanks in advance.
> >
> > -Obaid
> >
> >
>
>
