Maybe this helps: https://cwiki.apache.org/confluence/display/Hive/WebHCat+Reference+Pig
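For reference, a WebHCat (Templeton) Pig submission is a single HTTP POST to the gateway host, so the client never talks to the datanodes directly. A rough sketch (hostname, user name, and paths are placeholders, not taken from the thread):

```shell
# Submit a Pig job through WebHCat's REST endpoint (default port 50111).
# 'execute' carries an inline Pig script; 'statusdir' is where WebHCat
# writes stdout/stderr/exit status on HDFS.
curl -s -X POST \
  -d user.name=remy \
  -d execute="A = LOAD '/user/remy/input.txt'; DUMP A;" \
  -d statusdir=/user/remy/pig.status \
  'http://gateway-host:50111/templeton/v1/pig'
```

The response is a JSON body containing the job id, which can then be polled via the same REST API, again without any direct datanode connectivity.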
On Fri, Apr 3, 2015 at 5:56 AM, Remy Dubois <[email protected]> wrote:
> Hi everyone,
>
> I used to think about the constraint that a Hadoop client has to know and
> have access to every single datanode in order to read/write from/to HDFS.
> What happens if there are strong security policies on top of our cluster?
> I found HttpFS (and webhdfs), which allows a client to talk to a single
> machine in order to do what I'm looking for. Operations on HDFS do indeed
> work fine.
>
> Then I tried to execute a Pig job (with Pig 0.12 on top of Hadoop 2.3.0)
> the same way. And here, the FileContext and AbstractFileSystem classes
> don't allow any FileSystem other than hdfs and local; webhdfs is therefore
> not accepted.
>
> It's not a problem until you need to register a jar in your Pig
> application. For the Load and the Store, prefixing their paths with the
> webhdfs:// scheme works. But when you register a jar in the Pig
> application, the PigServer reuses the initial configuration (the one with
> hdfs://) to send the jars to the distributed cache, and at that point it
> fails because the client doesn't have access to the datanodes.
>
> Am I right in my understanding of what happens in that case?
> Also, has anyone met this issue already? Any solution? Workaround?
>
> Thanks a lot in advance,
>
> Rémy.
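To make the reported behaviour concrete, a minimal Pig script along these lines (hostname, paths, and jar name are made up for illustration) shows which parts honour the webhdfs:// scheme and which do not:

```
-- LOAD/STORE accept an explicit scheme, so going through the gateway works:
raw = LOAD 'webhdfs://gateway-host:50070/user/remy/input.txt'
      USING PigStorage('\t');
STORE raw INTO 'webhdfs://gateway-host:50070/user/remy/output';

-- REGISTER is where it breaks: PigServer ships the jar to the distributed
-- cache using the job's default filesystem (hdfs://...), so the client
-- must reach the datanodes directly and the call fails behind the firewall.
REGISTER /local/path/to/myudfs.jar;
```

This matches the description above: the per-path scheme override only applies where a full URI is given, while jar shipping falls back to the hdfs:// configuration.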
