Manish,

As you have laid it out, I think it would be harder than it should be, in part because it does not take advantage of NiFi's strengths for building something reliable here. The JS being called - it may be possible to implement its logic to pull data directly into NiFi, rather than writing files into a directory and then having NiFi grab them. It would be rare that we could treat pulling data from a directory as a one-time event, such that once the data is gone from it we know we're 'done'. So it is certainly better to avoid that and pull the data into NiFi directly.
Once data is in NiFi:
- Validation against the schema is straightforward using 'ValidateXML'.
- Delivery to HDFS is easy.
- Kicking off some process once data is delivered to HDFS is easy.

However, the part we don't have a good answer for (today) is how to kick off that job only once all items of a correlated group of items have passed the HDFS delivery point. It is definitely solvable, and it is something we should tackle - I totally agree it is a great use case.

That said, what do you think about converting the XML to Avro directly in NiFi itself? We don't have a processor out of the box for it, but you clearly already have the code for it, so putting that into a processor should be quite straightforward.

Thanks
Joe

On Fri, Dec 4, 2015 at 10:25 AM, Manish Gupta 8 <[email protected]> wrote:
> Can someone please provide a workaround for this scenario?
>
> Thanks,
> Manish
>
>
> From: Manish Gupta 8 [mailto:[email protected]]
> Sent: Thursday, December 03, 2015 2:18 PM
> To: [email protected]
> Subject: Trigger a processor if all files in a folder are processed
>
> Hi,
>
> I have a scenario where I want to trigger / execute one processor once
> GetFile has pulled all the files from a folder and the last processor has
> finished its execution. How can I implement this in NiFi?
>
> Basically what I am trying to do is:
>
> ({Execute Process to call some phantomJS script to download a few files in
> a directory}): runs every 1 hour
>
> ({Get File (xml)} -> {Validate with XSD} -> {Put HDFS}): checks for files
> continuously
>
> Now after this flow is complete, i.e. all files are available in HDFS, I
> want to submit my XML to Avro conversion MR job using Oozie REST. How can
> I make sure that my InvokeHTTP processor executes only once, and only
> after all files have successfully landed in HDFS?
>
> Thanks,
> Manish
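For what it's worth, the "kick off the job only once all items of a correlated group have landed" step discussed above can be sketched as plain logic, independent of NiFi: tag each file with a batch id and the expected batch size, count arrivals per batch, and fire the downstream submission (e.g. the Oozie REST call) exactly once, on the last arrival. This is a hypothetical sketch - the BatchBarrier class and its names are illustrative, not a NiFi API - assuming you can attach an expected-count attribute to each item when the batch is produced:

```python
# Hypothetical sketch: fire a callback exactly once per correlation group,
# when the last expected item of that group has been delivered.
from collections import defaultdict

class BatchBarrier:
    """Tracks delivered items per batch and triggers on group completion."""

    def __init__(self, on_complete):
        self._seen = defaultdict(set)   # batch_id -> delivered item names
        self._fired = set()             # batch ids already triggered
        self._on_complete = on_complete

    def item_delivered(self, batch_id, item_name, expected_count):
        """Record one delivered item; fire once when the group is complete."""
        self._seen[batch_id].add(item_name)
        complete = len(self._seen[batch_id]) == expected_count
        if complete and batch_id not in self._fired:
            self._fired.add(batch_id)       # guarantees a single submission
            self._on_complete(batch_id)
            return True
        return False

# Example: a 3-file batch; the submission happens once, on the third file.
submitted = []
barrier = BatchBarrier(on_complete=submitted.append)
barrier.item_delivered("2015-12-04T10:00", "a.xml", 3)
barrier.item_delivered("2015-12-04T10:00", "b.xml", 3)
barrier.item_delivered("2015-12-04T10:00", "c.xml", 3)  # fires here
print(submitted)
```

The same idea maps onto NiFi attributes (a fragment/batch identifier plus a count set at ingest time); the hard part Joe mentions is that stock processors in this era don't maintain that cross-flowfile state for you.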
