WebHDFS could be difficult to build out with InvokeHTTP because of the slightly unusual way Hadoop uses HTTP redirects. The pattern is that you send a request without a payload (to the NameNode), receive a redirect, and then send the request dictated by the redirect, this time with the payload (the redirect points to a DataNode). This is not, strictly speaking, correct HTTP behaviour, so it might be harder to implement with InvokeHTTP. On that basis I would probably vote for a specific processor.
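For concreteness, here is a minimal, hedged sketch of the two-step exchange described above, written outside NiFi with plain HttpURLConnection. The host, port, file path, and user.name value are placeholders, not values from this thread:

```java
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;

public class WebHdfsCreateSketch {
    public static void main(String[] args) throws Exception {
        byte[] payload = "hello webhdfs".getBytes(StandardCharsets.UTF_8);

        // Step 1: ask the NameNode where to write. No payload is sent here.
        // Host, port, path, and user.name are placeholder assumptions.
        URL nameNode = new URL(
            "http://namenode.example.com:50070/webhdfs/v1/tmp/demo.txt"
            + "?op=CREATE&overwrite=true&user.name=nifi");
        HttpURLConnection first = (HttpURLConnection) nameNode.openConnection();
        first.setRequestMethod("PUT");
        first.setInstanceFollowRedirects(false);   // must NOT auto-follow the redirect
        int rc = first.getResponseCode();          // WebHDFS answers 307 Temporary Redirect
        String dataNodeLocation = first.getHeaderField("Location");
        first.disconnect();
        if (rc != 307 || dataNodeLocation == null) {
            throw new IllegalStateException("Expected a redirect, got HTTP " + rc);
        }

        // Step 2: send the actual file content to the DataNode URL from the redirect.
        HttpURLConnection second =
            (HttpURLConnection) new URL(dataNodeLocation).openConnection();
        second.setRequestMethod("PUT");
        second.setDoOutput(true);
        second.setRequestProperty("Content-Type", "application/octet-stream");
        try (OutputStream out = second.getOutputStream()) {
            out.write(payload);
        }
        System.out.println("DataNode responded: " + second.getResponseCode()); // expect 201 Created
        second.disconnect();
    }
}
```

Note that the redirect must not be followed automatically, since the payload belongs only on the second request; that is exactly the part that is awkward to express with a generic HTTP processor.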
Simon

On 22 Apr 2016, at 01:37, Jeremy Dyer <jdy...@gmail.com> wrote:

Yep, all of those reasons make perfect sense to me. Now the question becomes whether this is something where we create new processors or just build out templates using existing processors like InvokeHTTP that we make publicly available. My vote would probably be for making the processors, but I would love to hear arguments for one or the other.

On Thu, Apr 21, 2016 at 8:15 PM, larry mccay <lmc...@apache.org> wrote:

All valid points. Of course, storing credentials in clear text in the definition is less than ideal, but we could figure something out there as well.

On Thu, Apr 21, 2016 at 7:50 PM, Tom Stewart <stewartthom...@yahoo.com> wrote:

I will share what would interest me. The HDFS processor today runs with the authority of the userid that NiFi is running as. Interactions with HDFS happen via that userid, which limits what it can access. Granted, there are two options with the current PutHDFS processor (I believe): if you have a Kerberized cluster, you can use those credentials; if you don't have Kerberos on your cluster, you can make the user running NiFi an HDFS superuser and use the processor properties to set permissions on the files after the fact.

Providing a processor for WebHDFS or Knox would offer several advantages that I can see:

- Not needing the core-site.xml and hdfs-site.xml files would be an advantage for some sites. Coordinating those between all of your Hadoop clusters and NiFi clusters can become cumbersome.
- For target clusters that sit behind firewalls, being able to funnel traffic through the Knox Gateway offers some advantage (although possibly at the cost of performance or scalability).
- For me, the thing I'd like in a Knox Gateway processor is the ability to specify the id/pw in the definition. I have my Knox linked with Active Directory for HDFS REST API calls, so passing credentials from the Put processor would be useful since each NFM could use whatever application credentials made sense for a particular flow (a rough sketch of such a Knox call appears at the end of this thread).

Thanks,
Tom

________________________________
From: larry mccay <lmc...@apache.org>
To: users@nifi.apache.org
Sent: Thursday, April 21, 2016 6:34 PM
Subject: Re: Apache NiFi - WebHDFS

Any WebHDFS processor should make the URL and credentials configurable so that it could go direct to WebHDFS or through the Knox Gateway.

On Thu, Apr 21, 2016 at 6:11 PM, Tom Stewart <stewartthom...@yahoo.com> wrote:

What about Knox Gateway?

> On Apr 21, 2016, at 3:21 PM, Kumiko Yada <kumiko.y...@ds-iq.com> wrote:
>
> Will do.
>
> Thanks
> Kumiko
>
> -----Original Message-----
> From: Joe Witt [mailto:joe.w...@gmail.com]
> Sent: Thursday, April 21, 2016 12:45 PM
> To: users@nifi.apache.org
> Subject: Re: Apache NiFi - WebHDFS
>
> Kumiko,
>
> Not that I am aware of. If you do end up doing so and are interested in
> contributing, please let us know.
>
> Thanks
> Joe
>
>> On Thu, Apr 21, 2016 at 3:43 PM, Kumiko Yada <kumiko.y...@ds-iq.com> wrote:
>>
>> Hello,
>>
>> Has anyone written a custom processor for WebHDFS?
>>
>> Thanks
>>
>> Kumiko
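As referenced in Tom's message above, here is a rough, hedged sketch of what a WebHDFS call routed through a Knox Gateway with per-flow credentials could look like, again outside NiFi. The gateway host and port, the "default" topology name, the HDFS path, and the credentials are all placeholder assumptions, and a real deployment would also need to trust the gateway's TLS certificate:

```java
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.util.Base64;

public class KnoxWebHdfsListSketch {
    public static void main(String[] args) throws Exception {
        // Gateway URL and topology name ("default") are assumptions; adjust for your Knox install.
        URL url = new URL(
            "https://knox.example.com:8443/gateway/default/webhdfs/v1/tmp?op=LISTSTATUS");

        // HTTP Basic auth with the application credentials a flow should run as
        // (Knox validates them against AD/LDAP in a setup like Tom's).
        String creds = "flowUser:flowPassword";
        String authHeader = "Basic "
            + Base64.getEncoder().encodeToString(creds.getBytes(StandardCharsets.UTF_8));

        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        conn.setRequestMethod("GET");
        conn.setRequestProperty("Authorization", authHeader);

        System.out.println("HTTP " + conn.getResponseCode());
        // The JSON FileStatuses response would be read from conn.getInputStream() here.
        conn.disconnect();
    }
}
```

In a processor, the URL and the id/pw would be exposed as configurable properties, so the same code path could hit WebHDFS directly or go through Knox, which is what larry suggests above.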