If the Knox team implements the Hadoop FileSystem API, the ORC reader and
writer could use it automatically.

.. Owen
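To make that concrete, here is a rough sketch of what this would mean in
practice. It is hypothetical: no Knox-backed FileSystem implementation exists
today, and the knoxfs:// scheme and the org.example.KnoxFileSystem class below
are placeholders. Because the core ORC API (org.apache.orc) resolves its
FileSystem from the Path's URI scheme and the Configuration, registering such
an implementation via the standard fs.<scheme>.impl setting would be enough
for the reader and writer to use it without any ORC-side changes.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.orc.OrcFile;
import org.apache.orc.Reader;

public class OrcOverKnoxSketch {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    // Hypothetical: register a Knox-backed FileSystem under the knoxfs scheme.
    // No such implementation ships today; the class name is a placeholder.
    conf.set("fs.knoxfs.impl", "org.example.KnoxFileSystem");

    // ORC resolves the FileSystem from the path's URI scheme, so the reader
    // would route all I/O through the registered implementation unchanged.
    Path path = new Path("knoxfs://gateway-host:8443/tmp/example/data.orc");
    Reader reader = OrcFile.createReader(path, OrcFile.readerOptions(conf));
    System.out.println(reader.getSchema());
  }
}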
On Mon, Mar 6, 2017 at 10:17 PM, Srinivas M <[email protected]> wrote:

> Thanks Owen and Larry for your perspective on this. This information is
> helpful. I shall explore the alternatives to meet the requirements of my
> use case for now.
>
> On a side note, it was mentioned that there are plans (or it is being
> considered) to add the knoxFS. I have a question on that. As and when such
> a client / API is made available, would the ORC implementation also have
> to be enhanced to support the knoxFS in the ORC API, or would that come
> in by default? Or is it too early to discuss that?
>
> On Tue, Mar 7, 2017 at 12:20 AM, larry mccay <[email protected]> wrote:
>
>> Thanks for adding the Knox list to this conversation, Owen!
>>
>> This is an interesting topic and one that we should define an end-to-end
>> use case for.
>>
>> We have considered a number of things to address this at one time or
>> another and have encountered one or more roadblocks on some of them:
>>
>> * A Knox (or Proxy) FileSystem implementation that would accommodate the
>> additional context needed to route requests through a proxy such as Knox
>> by altering the default URLs to match what is expected by Knox. There
>> was a POC of this done a while back, and we can try to dust that off.
>> * Knox did have a feature for configuring the "default topology", which
>> would allow URLs in the form expected by direct webhdfs access to work,
>> with Knox translating the interactions into the context of the
>> configured default URLs. Unfortunately, this feature is currently not
>> working, and we have a JIRA filed to correct that.
>> * There may be work needed in the Java webhdfs client in order to
>> accommodate SPNEGO on the redirected DataNode (DN) interactions.
>> Currently, the DN doesn't expect the hadoop.auth cookie but a block
>> access token instead (I believe). So, when the block access token is
>> presented to a Knox instance that is configured to use the Hadoop Auth
>> provider, Knox doesn't find a hadoop.auth cookie and challenges the
>> client again. Existing clients don't expect this and throw an exception.
>> Investigation is needed here to find the most efficient way to address
>> this.
>>
>> Incidentally, you may also consider looking at the KnoxShell client
>> classes to write a file to HDFS.
>>
>> http://knox.apache.org/books/knox-0-11-0/user-guide.html#Client+Details
>>
>> The example below shows how to use the Groovy-based DSL for establishing
>> a "session" and for deleting, writing, and reading files in HDFS. The
>> underlying Java classes can also be used directly, as an SDK, to do the
>> same.
>>
>> Taking up the gateway-shell module is easily done by adding a Maven
>> dependency on that module to your project. Additionally, the 0.12.0
>> release, which is currently undergoing a release VOTE, contains a
>> separate client artifact for download.
>>
>> import org.apache.hadoop.gateway.shell.Hadoop
>> import org.apache.hadoop.gateway.shell.hdfs.Hdfs
>> import groovy.json.JsonSlurper
>>
>> gateway = "https://localhost:8443/gateway/sandbox"
>> username = "guest"
>> password = "guest-password"
>> dataFile = "README"
>>
>> session = Hadoop.login( gateway, username, password )
>> Hdfs.rm( session ).file( "/tmp/example" ).recursive().now()
>> Hdfs.put( session ).file( dataFile ).to( "/tmp/example/README" ).now()
>> text = Hdfs.ls( session ).dir( "/tmp/example" ).now().string
>> json = (new JsonSlurper()).parseText( text )
>> println json.FileStatuses.FileStatus.pathSuffix
>> session.shutdown()
>> exit
>>
>> On Mon, Mar 6, 2017 at 11:45 AM, Owen O'Malley <[email protected]> wrote:
>>
>>> Unfortunately, in the short run, you'll need to copy them locally using
>>> wget or curl and then read the ORC file using file:/// paths to use the
>>> local file system.
>>>
>>> I talked with Larry McCay from the Knox project, and he said that they
>>> are considering making a KnoxFS Java client, which implements
>>> org.apache.hadoop.fs.FileSystem, that would handle this use case.
>>>
>>> .. Owen
>>>
>>> On Mon, Mar 6, 2017 at 4:05 AM, Srinivas M <[email protected]> wrote:
>>>
>>> > Hi
>>> >
>>> > I have an application that uses the Hive ORC API to write an ORC file
>>> > to HDFS. I use the native FileSystem API and pass the WebHDFS URI
>>> > (webhdfs://host:port) to create a FileSystem object:
>>> >
>>> > fs = FileSystem.get(hdfsuri, conf, _user);
>>> >
>>> > While trying to connect through the Knox gateway, is there a way to
>>> > still use the native FileSystem, or should I be using REST API calls
>>> > to access the files on HDFS?
>>> >
>>> > If so, is there any way to read or write an ORC file in such a case,
>>> > given that the ORC Reader or Writer needs an object of type
>>> > "org.apache.hadoop.fs.FileSystem"?
>>> >
>>> > --
>>> > Srinivas
>>> > (*-*)
>>> > You have to grow from the inside out. None can teach you, none can
>>> > make you spiritual.
>>> > -Narendra Nath Dutta(Swamy Vivekananda)
>
> --
> Srinivas
> (*-*)
> You have to grow from the inside out. None can teach you, none can make
> you spiritual.
> -Narendra Nath Dutta(Swamy Vivekananda)
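For the short-run workaround described in the thread, a minimal sketch of the
write path, assuming the core ORC API (org.apache.orc) and placeholder paths
and schema: write the ORC file against the local file system using a file:///
path, so no HDFS FileSystem object is needed, and then push the finished file
through the gateway afterwards, for example with the KnoxShell Hdfs.put(...)
call shown above.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hive.ql.exec.vector.LongColumnVector;
import org.apache.hadoop.hive.ql.exec.vector.VectorizedRowBatch;
import org.apache.orc.OrcFile;
import org.apache.orc.TypeDescription;
import org.apache.orc.Writer;

public class WriteLocalOrcSketch {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    TypeDescription schema = TypeDescription.fromString("struct<x:bigint>");

    // Write to the local file system; no connection to HDFS is involved here.
    Writer writer = OrcFile.createWriter(new Path("file:///tmp/example.orc"),
        OrcFile.writerOptions(conf).setSchema(schema));

    VectorizedRowBatch batch = schema.createRowBatch();
    LongColumnVector x = (LongColumnVector) batch.cols[0];
    for (long row = 0; row < 10; ++row) {
      x.vector[batch.size++] = row;
    }
    writer.addRowBatch(batch);
    writer.close();

    // The finished local file can then be uploaded through the gateway, e.g.
    // with the KnoxShell Hdfs.put( session ).file( ... ).to( ... ) call above.
  }
}

Reading works the same way in reverse, as Owen describes: pull the file down
through the gateway first, then open it with OrcFile.createReader on a
file:/// path and the local file system.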
