Thanks Owen and Larry for your perspective on this. This information is
helpful. For now, I will explore the alternatives to meet the requirements
of my use case.

On a side note, it was mentioned that a KnoxFS client is planned (or at
least being considered). I have a question on that: when such a client /
API becomes available, will the ORC implementation also have to be enhanced
to support KnoxFS in the ORC API, or will that support come by default? Or
is it too early to discuss that?
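
My rough understanding is that if the KnoxFS client implements
org.apache.hadoop.fs.FileSystem and registers itself for a URI scheme, ORC
should pick it up through the normal FileSystem lookup without any change
to the ORC API. Something along these lines is what I have in mind (the
scheme, config key and class name below are just placeholders I made up,
not real ones):

import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class KnoxFsSketch {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    // Placeholder scheme and impl class -- just my guess at how a KnoxFS
    // client might register itself.
    conf.set("fs.knoxfs.impl", "org.apache.knox.fs.KnoxFileSystem");
    URI uri = URI.create("knoxfs://knox-host:8443/gateway/sandbox");
    // Same lookup my application already does for webhdfs:// today.
    FileSystem fs = FileSystem.get(uri, conf, "guest");
    Path target = new Path("/tmp/example/file.orc");
    // ... hand fs and target to the existing ORC writer code unchanged ...
  }
}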

On Tue, Mar 7, 2017 at 12:20 AM, larry mccay <[email protected]> wrote:

> Thanks for adding the Knox list to this conversation, Owen!
>
> This is an interesting topic and one that we should define an end-to-end
> use case for.
>
> We have considered a number of things to address this at one time or
> another and have encountered one or more roadblocks on some of them:
>
> * A Knox (or proxy) FileSystem implementation that would accommodate the
> additional context needed to route requests through a proxy such as Knox
> by altering the default URLs to match what Knox expects. There was a POC
> of this done a while back and we can try to dust that off.
> * Knox did have a feature for configuring the "default topology", which
> would allow the URLs expected by direct webhdfs access to work; Knox
> would translate those interactions into the context of the configured
> default URLs. Unfortunately, this feature is currently not working and we
> have a JIRA filed to correct that.
> * There may be work needed in the java webhdfs client in order to
> accommodate SPNEGO on the redirected DataNode interactions. Currently,
> the DataNode doesn't expect the hadoop.auth cookie but a block access
> token instead (I believe). So, when the block access token is presented
> to a Knox instance that is configured to use the Hadoop Auth provider,
> Knox doesn't find a hadoop.auth cookie and challenges the client again.
> Existing clients don't expect this and throw an exception. Investigation
> is needed here to find the most efficient way to address this.
>
> Incidentally, you may also consider looking at the KnoxShell client
> classes to write a file to HDFS.
>
> http://knox.apache.org/books/knox-0-11-0/user-guide.html#Client+Details
>
> The example below shows how to use the groovy-based DSL for establishing
> a "session" and deleting, writing, and reading files in HDFS.
> The underlying java classes can also be used directly as an SDK to do the
> same.
>
> Taking up the gateway-shell module is easily done by adding a maven
> dependency for that module to your project.
> Additionally, the 0.12.0 release, which is currently undergoing a release
> VOTE, contains a separate client artifact for download.
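>
> The dependency would look something like this (I'm assuming the
> coordinates org.apache.knox:gateway-shell here - double check the version
> against the release you pick up):
>
> <dependency>
>   <groupId>org.apache.knox</groupId>
>   <artifactId>gateway-shell</artifactId>
>   <version>0.12.0</version>
> </dependency>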
>
> import org.apache.hadoop.gateway.shell.Hadoop
> import org.apache.hadoop.gateway.shell.hdfs.Hdfs
> import groovy.json.JsonSlurper
>
> gateway = "https://localhost:8443/gateway/sandbox";
> username = "guest"
> password = "guest-password"
> dataFile = "README"
>
> session = Hadoop.login( gateway, username, password )
> Hdfs.rm( session ).file( "/tmp/example" ).recursive().now()
> Hdfs.put( session ).file( dataFile ).to( "/tmp/example/README" ).now()
> text = Hdfs.ls( session ).dir( "/tmp/example" ).now().string
> json = (new JsonSlurper()).parseText( text )
> println json.FileStatuses.FileStatus.pathSuffix
> session.shutdown()
> exit
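>
> If you would rather stay in plain java, the same flow looks roughly like
> this (untested sketch; I'm assuming the response accessor is getString(),
> mirroring the groovy .string property above):
>
> import org.apache.hadoop.gateway.shell.Hadoop;
> import org.apache.hadoop.gateway.shell.hdfs.Hdfs;
>
> public class KnoxHdfsClient {
>   public static void main(String[] args) throws Exception {
>     String gateway = "https://localhost:8443/gateway/sandbox";
>     // Establish the "session" just like the groovy script does.
>     Hadoop session = Hadoop.login(gateway, "guest", "guest-password");
>     // Clean up any previous run, then upload a local file.
>     Hdfs.rm(session).file("/tmp/example").recursive().now();
>     Hdfs.put(session).file("README").to("/tmp/example/README").now();
>     // List the directory; the body comes back as webhdfs JSON.
>     String listing = Hdfs.ls(session).dir("/tmp/example").now().getString();
>     System.out.println(listing);
>     session.shutdown();
>   }
> }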
>
>
> On Mon, Mar 6, 2017 at 11:45 AM, Owen O'Malley <[email protected]> wrote:
>
>> Unfortunately, in the short run, you'll need to copy them locally using
>> wget or curl and then read the ORC file using file:/// paths to use the
>> local file system.
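>>
>> For example, after pulling a file down through the gateway with curl, a
>> read along these lines should work (sketch only; the paths and the
>> sandbox topology name are just examples):
>>
>> import org.apache.hadoop.conf.Configuration;
>> import org.apache.hadoop.fs.Path;
>> import org.apache.hadoop.hive.ql.io.orc.OrcFile;
>> import org.apache.hadoop.hive.ql.io.orc.Reader;
>>
>> public class LocalOrcRead {
>>   public static void main(String[] args) throws Exception {
>>     // First, something like:
>>     //   curl -k -u guest:guest-password -L \
>>     //     "https://knox-host:8443/gateway/sandbox/webhdfs/v1/tmp/file.orc?op=OPEN" \
>>     //     -o /tmp/file.orc
>>     Configuration conf = new Configuration();
>>     // file:/// forces the local file system, so no direct cluster access is needed.
>>     Path local = new Path("file:///tmp/file.orc");
>>     Reader reader = OrcFile.createReader(local, OrcFile.readerOptions(conf));
>>     System.out.println("rows: " + reader.getNumberOfRows());
>>   }
>> }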
>>
>> I talked with Larry McCay from the Knox project and he said that they are
>> considering making a KnoxFS Java client, which implements
>> org.apache.hadoop.fs.FileSystem, that would handle this use case.
>>
>> .. Owen
>>
>> On Mon, Mar 6, 2017 at 4:05 AM, Srinivas M <[email protected]> wrote:
>>
>> > Hi
>> >
>> > I have an application that uses the Hive ORC API to write an ORC file
>> > to HDFS. I use the native FileSystem API and pass the WebHDFS URI
>> > (webhdfs://host:port) to create a FileSystem object
>> >
>> > fs = FileSystem.get(hdfsuri, conf, _user);
>> >
>> > While trying to connect through the Knox gateway, is there a way to
>> > still use the native FileSystem API, or should I be using REST API
>> > calls to access the files on HDFS?
>> >
>> > If so, is there any way to read or write an ORC file in such a case,
>> > given that the ORC Reader and Writer need an object of type
>> > "org.apache.hadoop.fs.FileSystem"?
>> >
>>
>
>


-- 
Srinivas
(*-*)
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
You have to grow from the inside out. None can teach you, none can make you
spiritual.
                      -Narendra Nath Dutta(Swamy Vivekananda)
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
