Hi everyone, I've been thinking about the constraint that a Hadoop client has to know, and have network access to, every single datanode in order to read from or write to HDFS. What happens if there are strong security policies on top of our cluster? I found that HttpFs (and WebHDFS) allows a client to talk to a single machine, which is what I'm looking for, and plain operations on HDFS do work fine that way.
Then I tried to execute a Pig job the same way (Pig 0.12 on top of Hadoop 2.3.0). Here, the FileContext and AbstractFileSystem classes don't allow any FileSystem other than hdfs and local, so WebHDFS is not accepted. That's not a problem until you need to register a jar in your Pig application. For the Load and the Store, prefixing their paths with the webhdfs:// scheme works. But when you register a jar, the PigServer reuses the initial configuration (the one with hdfs://) to ship the jars to the distributed cache, and at that point it fails because the client doesn't have access to the datanodes. Am I right in my understanding of what happens in that case? Has anyone else met this issue already? Any solution? Workaround? Thanks a lot in advance, Rémy.
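To make the behaviour concrete, here is a minimal sketch of the script I'm describing (the gateway host, port, paths, and jar name are made up for illustration):

```pig
-- LOAD/STORE accept an explicit webhdfs:// scheme, and these go
-- through the HttpFs gateway, so they work from behind the firewall:
A = LOAD 'webhdfs://gateway-host:14000/user/remy/input' USING PigStorage(',');
STORE A INTO 'webhdfs://gateway-host:14000/user/remy/output';

-- REGISTER, however, is handled by the PigServer using its initial
-- configuration (fs.defaultFS = hdfs://...), so shipping the jar to
-- the distributed cache tries to reach the datanodes directly and
-- fails when they are not accessible:
REGISTER '/local/path/my-udfs.jar';
```

So the Load/Store half of the job respects the webhdfs:// prefix, but the jar-shipping step does not, which is where it breaks for me.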