For WebHDFS it could be difficult to build this out with InvokeHTTP, because of 
the slightly unusual way Hadoop uses HTTP redirects. The principle is that you 
send a request without a payload, get the redirect, then send the request 
dictated by the redirect, but with the payload (the redirect should be to a 
datanode, the original request to the namenode). This is not, strictly 
speaking, correct HTTP behaviour, so it might be harder to implement with 
InvokeHTTP. On that basis I would probably vote for a specific processor.
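
Roughly, the dance looks like this in plain Java (a minimal sketch only: the 
host, port, path and user.name are placeholders, 50070 is just the classic 
WebHDFS port, and real code would check status codes and handle errors):

import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;

public class WebHdfsTwoStepPut {
    public static void main(String[] args) throws Exception {
        byte[] payload = "hello".getBytes(StandardCharsets.UTF_8);

        // Step 1: PUT ...op=CREATE to the namenode with no body, and do
        // NOT auto-follow the redirect -- we need the Location header.
        URL nameNode = new URL("http://namenode.example.com:50070"
                + "/webhdfs/v1/tmp/hello.txt?op=CREATE&user.name=nifi");
        HttpURLConnection step1 = (HttpURLConnection) nameNode.openConnection();
        step1.setRequestMethod("PUT");
        step1.setInstanceFollowRedirects(false);
        String dataNodeUrl = step1.getHeaderField("Location"); // the 307 target
        step1.disconnect();

        // Step 2: replay the PUT against the datanode URL the namenode
        // handed back, this time attaching the payload.
        HttpURLConnection step2 =
                (HttpURLConnection) new URL(dataNodeUrl).openConnection();
        step2.setRequestMethod("PUT");
        step2.setDoOutput(true);
        try (OutputStream out = step2.getOutputStream()) {
            out.write(payload);
        }
        System.out.println("Create returned HTTP " + step2.getResponseCode());
    }
}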

Simon


On 22 Apr 2016, at 01:37, Jeremy Dyer 
<jdy...@gmail.com> wrote:

Yep, all of those reasons make perfect sense to me. Now the question becomes: 
is this something where we create new processors, or do we just build out 
templates using existing processors like InvokeHTTP and make those publicly 
available? My vote would probably be for just making the processors, but I 
would love to hear arguments for one or the other.

On Thu, Apr 21, 2016 at 8:15 PM, larry mccay 
<lmc...@apache.org> wrote:
All valid points.
Of course, storing credentials in clear text in the definition is less than 
ideal, but we could figure something out there as well.

On Thu, Apr 21, 2016 at 7:50 PM, Tom Stewart 
<stewartthom...@yahoo.com> wrote:
I will share what would interest me. The HDFS processor today runs with 
authority matching the userid that NiFi is running as. Interactions with HDFS 
are via that userid, which limits what it can access. Now, granted, there are 
two options with the current PutHDFS processor (I believe). If you have a 
Kerberized cluster, you can use those credentials. However, if you don't have 
Kerberos on your cluster, you can make the user running NiFi an HDFS superuser 
and use the properties to set permissions on the files after the fact.
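
To illustrate the identity point, this is roughly how the effective user gets 
established in Hadoop client code (a sketch only; the principal and keytab 
path are made up). Without Kerberos, Hadoop simply takes the OS user the NiFi 
JVM runs as; with it, the keytab login pins the identity to a principal:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.security.UserGroupInformation;

public class HdfsIdentitySketch {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        conf.set("hadoop.security.authentication", "kerberos");
        UserGroupInformation.setConfiguration(conf);
        // Placeholder principal and keytab path.
        UserGroupInformation.loginUserFromKeytab(
                "nifi@EXAMPLE.COM",
                "/etc/security/keytabs/nifi.keytab");
        System.out.println("Acting as: " + UserGroupInformation.getLoginUser());
    }
}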

Providing a processor for WebHDFS or Knox would offer several advantages, as 
far as I can tell:
  - Not needing the core-site.xml and hdfs-site.xml files would be an 
advantage for some sites.  Coordinating those between all of your Hadoop 
clusters and NiFi clusters could become cumbersome.
  - For target clusters that might sit behind firewalls, being able to funnel 
traffic through the Knox Gateway offers some advantage (although possibly at 
the cost of performance or scalability).
  - For me, the thing I'd like in a Knox Gateway processor is the ability to 
specify the id/pw in the definition. I have my Knox linked with Active 
Directory for HDFS REST API calls, so passing credentials from the Put 
processor would be useful since each NFM could use whatever application 
credentials made sense for a particular flow (sketched below).
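
As a rough sketch of what that last point looks like on the wire (the gateway 
host, topology name and credentials are placeholders for whatever a given 
site uses):

import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.util.Base64;

public class KnoxWebHdfsList {
    public static void main(String[] args) throws Exception {
        String user = "appuser", password = "secret"; // per-flow credentials
        String token = Base64.getEncoder().encodeToString(
                (user + ":" + password).getBytes(StandardCharsets.UTF_8));

        // Only the gateway needs to be reachable through the firewall;
        // Knox proxies the request on to the cluster behind it.
        URL url = new URL("https://knox.example.com:8443"
                + "/gateway/default/webhdfs/v1/tmp?op=LISTSTATUS");
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        conn.setRequestProperty("Authorization", "Basic " + token);
        System.out.println("Gateway returned HTTP " + conn.getResponseCode());
    }
}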

Thanks,
Tom

________________________________
From: larry mccay <lmc...@apache.org>
To: users@nifi.apache.org
Sent: Thursday, April 21, 2016 6:34 PM
Subject: Re: Apache NiFi - WebHDFS

Any WebHDFS processor should make the URL and credentials configurable so that 
it could go directly to WebHDFS or through the Knox Gateway.
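
Something along these lines, say (a sketch; the property names are invented 
for illustration). Marking the password property sensitive would also keep it 
encrypted in the flow definition rather than stored in clear text:

import org.apache.nifi.components.PropertyDescriptor;
import org.apache.nifi.processor.util.StandardValidators;

public class WebHdfsProcessorProperties {
    // Either a direct namenode URL or a Knox gateway URL.
    static final PropertyDescriptor BASE_URL = new PropertyDescriptor.Builder()
            .name("WebHDFS Base URL")
            .required(true)
            .addValidator(StandardValidators.URL_VALIDATOR)
            .build();

    static final PropertyDescriptor USERNAME = new PropertyDescriptor.Builder()
            .name("Username")
            .required(false)
            .addValidator(StandardValidators.NON_EMPTY_VALIDATOR)
            .build();

    // sensitive(true) tells NiFi to encrypt the stored value.
    static final PropertyDescriptor PASSWORD = new PropertyDescriptor.Builder()
            .name("Password")
            .required(false)
            .sensitive(true)
            .addValidator(StandardValidators.NON_EMPTY_VALIDATOR)
            .build();
}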

On Thu, Apr 21, 2016 at 6:11 PM, Tom Stewart 
<stewartthom...@yahoo.com> wrote:
What about Knox Gateway?

> On Apr 21, 2016, at 3:21 PM, Kumiko Yada 
> <kumiko.y...@ds-iq.com> wrote:
>
> Will do.
>
> Thanks
> Kumiko
>
> -----Original Message-----
> From: Joe Witt [mailto:joe.w...@gmail.com]
> Sent: Thursday, April 21, 2016 12:45 PM
> To: users@nifi.apache.org
> Subject: Re: Apache NiFi - WebHDFS
>
> Kumiko,
>
> Not that I am aware of.  If you do end up doing so and are interested in 
> contributing, please let us know.
>
> Thanks
> Joe
>
>> On Thu, Apr 21, 2016 at 3:43 PM, Kumiko Yada 
>> <kumiko.y...@ds-iq.com> wrote:
>> Hello,
>>
>>
>>
>> Has anyone written a custom processor for WebHDFS?
>>
>>
>>
>> Thanks
>>
>> Kumiko
