Maybe this helps:

https://cwiki.apache.org/confluence/display/Hive/WebHCat+Reference+Pig
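For a quick sketch of what that page describes: you POST the Pig script to WebHCat's REST endpoint and it runs server-side, so the client never needs direct datanode access. Host, user name, and paths below are all hypothetical (50111 is the default WebHCat port):

```shell
# Sketch, not a drop-in command: submit a Pig script through WebHCat
# (Templeton) so the client only talks to one gateway host.
# Host, user and paths are hypothetical; 50111 is the WebHCat default port.
CMD="curl -s -d user.name=remy \
  -d file=/user/remy/wordcount.pig \
  -d statusdir=/user/remy/pig.status \
  http://gateway.example.com:50111/templeton/v1/pig"

# Printed here instead of executed, since it needs a live gateway:
echo "$CMD"
```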



On Fri, Apr 3, 2015 at 5:56 AM, Remy Dubois <[email protected]> wrote:

>  Hi everyone,
>
>
>
> I've been thinking about the constraint that a Hadoop client has to
> know, and have network access to, every single datanode in order to
> read from or write to HDFS. What happens if there are strong security
> policies on top of our cluster?
>
> I found HttpFS (and WebHDFS), which lets a client talk to a single
> machine in order to do what I'm looking for. Plain operations on HDFS
> do indeed work fine that way.
>
>
>
> Then I tried to run a Pig job (Pig 0.12 on top of Hadoop 2.3.0) the
> same way. Here the FileContext and AbstractFileSystem classes accept
> no FileSystem other than hdfs and local, so webhdfs is rejected.
>
> That isn't a problem until you need to register a jar in your Pig
> application: for the Load and the Store, prefixing the path with the
> webhdfs:// scheme works. But when you register a jar, PigServer reuses
> the initial configuration (the hdfs:// one) to ship the jars to the
> distributed cache, and at that point it fails because the client has
> no access to the datanodes.
>
>
>
> Is my understanding of what happens in that case correct?
>
> Also, has anyone met this issue already? Any solution or workaround?
>
>
>
> Thanks a lot in advance,
>
>
>
> Rémy.
>
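In case it helps others hitting this thread, the behaviour described above can be reproduced with a script along these lines (host, port, and paths are hypothetical; plain webhdfs usually listens on the NameNode HTTP port, 50070 by default). The LOAD/STORE paths accept the webhdfs:// scheme, while the registered jar is shipped through the client's default hdfs:// FileSystem, which is the step that fails without datanode access:

```shell
# Sketch with hypothetical host/paths: generate the kind of Pig script the
# question describes. LOAD/STORE take webhdfs:// paths; the REGISTER'ed jar
# is still shipped to the distributed cache over hdfs://.
cat > /tmp/wordcount.pig <<'EOF'
REGISTER /local/path/myudfs.jar;
A = LOAD 'webhdfs://gateway.example.com:50070/data/input' USING PigStorage(',');
STORE A INTO 'webhdfs://gateway.example.com:50070/data/output';
EOF

# pig -x mapreduce /tmp/wordcount.pig   # needs a live cluster, so not run here
```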
