Suyog,

If MergeContent is not working out, you could install a Hadoop client on the NiFi node, or run a NiFi instance on the Hadoop cluster. In the latter case, put a Remote Process Group on the edge-node NiFi and an Input Port on the Hadoop-cluster NiFi, then send the files from the edge to the cluster. On the Hadoop NiFi, use PutHDFS to write the small files, then ExecuteStreamCommand to run a "hadoop fs -cat" command that brings all the small files together for more efficient processing. I realize it's not ideal, but it could be a viable workaround until the aforementioned Jiras are resolved.
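As a rough illustration of the "hadoop fs -cat" step, here is a minimal Python sketch of a script that ExecuteStreamCommand could invoke. It assumes the `hadoop` client is on the PATH of the NiFi host. The function names and the example paths are hypothetical, and piping into `hadoop fs -put -` (which reads from stdin) is just one way to write the concatenated output back to HDFS.

```python
import subprocess


def build_merge_commands(src_glob, dest_file):
    """Return argv lists for: hadoop fs -cat <src_glob> | hadoop fs -put - <dest_file>."""
    # The glob is passed through unexpanded; the Hadoop shell resolves it against HDFS.
    cat_cmd = ["hadoop", "fs", "-cat", src_glob]
    # A source of "-" tells "hadoop fs -put" to read the data from stdin.
    put_cmd = ["hadoop", "fs", "-put", "-", dest_file]
    return cat_cmd, put_cmd


def merge_small_files(src_glob, dest_file):
    """Stream every HDFS file matching src_glob into one HDFS file."""
    cat_cmd, put_cmd = build_merge_commands(src_glob, dest_file)
    cat = subprocess.Popen(cat_cmd, stdout=subprocess.PIPE)
    put = subprocess.Popen(put_cmd, stdin=cat.stdout)
    # Close our copy of the pipe so "cat" sees a broken pipe if "put" exits early.
    cat.stdout.close()
    put_rc = put.wait()
    cat_rc = cat.wait()
    if cat_rc != 0 or put_rc != 0:
        raise RuntimeError(
            "merge failed (cat exit %d, put exit %d)" % (cat_rc, put_rc)
        )
```

A call might look like `merge_small_files("/data/sensors/incoming/*", "/data/sensors/merged/2016-09-07.txt")`, with both paths made up for the example. Note that `-put` fails if the destination already exists, and the small source files would still need to be removed separately once the merge succeeds.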
Regards,
Matt

> On Sep 7, 2016, at 12:54 PM, Kulkarni, Suyog <[email protected]> wrote:
>
> Thanks Matt.
> Any recommendation for a workaround to achieve this? We are currently getting
> hundreds of sensor messages/minute that we are ingesting into Hadoop (for
> further analysis) using PutHDFS processor. But instead of creating hundreds
> of small message files in HDFS, we would like to have them saved as one large
> daily or weekly file. We successfully tested the MergeContent processor (to
> merge the message data and periodically write one big file) but the latency
> it introduces is not acceptable. What are some other options that we can try?
>
> Suyog Kulkarni
> [email protected]
>
>
> -----Original Message-----
> From: Matt Burgess [mailto:[email protected]]
> Sent: Wednesday, September 07, 2016 12:30 PM
> To: [email protected]
> Subject: Re: Appending files in Hadoop with PutHDFS ...
>
> Suyog,
>
> PutHDFS does not support appending files at the moment. I believe the Jira
> you mentioned is NIFI-958 [1], which is marked Resolved but should be Closed
> as duplicate. This case was split into two others,
> NIFI-1321 for PutFile [2] and NIFI-1322 for PutHDFS [3]. The latter is not
> resolved or being actively worked on, and the former appears to have been
> abandoned in favor of an AppendLog processor.
>
> Regards,
> Matt
>
> [1] https://issues.apache.org/jira/browse/NIFI-958
> [2] https://issues.apache.org/jira/browse/NIFI-1321
> [3] https://issues.apache.org/jira/browse/NIFI-1322
>
>> On Wed, Sep 7, 2016 at 12:24 PM, Kulkarni, Suyog <[email protected]> wrote:
>> Hi,
>>
>> I just wanted to find out if PutHDFS now supports appending files in
>> HDFS or not. I noticed there was a Jira with status “Resolved” for
>> this, but I wanted to know which version has this feature or if there
>> is any patch available for this. Also would like to know if anyone has
>> tried it successfully or not. We are currently running version 0.6.
>>
>> Thanks,
>>
>> Suyog Kulkarni
>>
>> [email protected]
>>
>> This email transmission and any accompanying attachments may contain
>> CSX privileged and confidential information intended only for the use
>> of the intended addressee. Any dissemination, distribution, copying or
>> action taken in reliance on the contents of this email by anyone other
>> than the intended recipient is strictly prohibited. If you have
>> received this email in error please immediately delete it and notify
>> sender at the above CSX email address. Sender and CSX accept no
>> liability for any damage caused directly or indirectly by receipt of this
>> email.
