I mean supporting Knox either as an additional scheme alongside the likes of
hdfs:// and webhdfs:// (say, knoxdfs://) or by simply pointing the existing
webhdfs:// scheme at Knox. The latter would be preferable.
We did do some early work to provide a "default" topology - the name of
which is configurable. You will notice within {GATEWAY_HOME}/data/deployments
that there is an additional webapp called _default. This topology can be
accessed without the Knox-specific URL path elements of "gateway/default" or
"gateway/{topology_name}". The intent of this was to be able to use the
Hadoop Java client unchanged to access the APIs.
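As a rough sketch of that intent - the host, port and file path below are
placeholders, and the gateway is assumed to be listening on SSL, hence the
swebhdfs:// scheme - the unmodified client would be used like this:

    import java.net.URI;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class KnoxDefaultTopologyExample {
      public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Placeholder gateway host/port; with a working "_default" topology
        // no /gateway/{topology} prefix is needed in the request path.
        FileSystem fs = FileSystem.get(
            URI.create("swebhdfs://knox.example.com:8443/"), conf);
        fs.open(new Path("/user/guest/example.txt")).close();
      }
    }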
Unfortunately, we still have a couple of known issues with that mechanism:
* the Hadoop Java client doesn't provide username/password as basic auth -
which we would need for basic auth against LDAP (a minimal sketch of the
missing piece follows this list)
* our existing integration of the Hadoop auth module for kerberos/SPNEGO
support doesn't yet support the redirect to datanodes with the file access
token instead of a SPNEGO challenge.
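On the first point, the missing piece is essentially the Authorization
header the client would need to send. Purely as a sketch with placeholder
URL and credentials, using plain JDK classes:

    import java.net.HttpURLConnection;
    import java.net.URL;
    import java.nio.charset.StandardCharsets;
    import java.util.Base64;

    public class BasicAuthSketch {
      public static void main(String[] args) throws Exception {
        // Placeholder URL and credentials - this only illustrates the
        // "Authorization: Basic ..." header that LDAP-backed basic auth
        // needs and that the Hadoop Java client doesn't send today.
        URL url = new URL(
            "https://knox.example.com:8443/webhdfs/v1/tmp?op=LISTSTATUS");
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        String creds = Base64.getEncoder().encodeToString(
            "guest:guest-password".getBytes(StandardCharsets.UTF_8));
        conn.setRequestProperty("Authorization", "Basic " + creds);
        System.out.println(conn.getResponseCode());
      }
    }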
So, digging into the requirements of the Hadoop Java client for content
length and transfer encoding isn't something that we have actually gotten to
yet, and this will provide insights that we certainly need.
I think that the idea of treating the "webhdfs://" scheme as a wire
protocol, rather than as explicitly meaning the webhdfs endpoint itself, is
what we should strive for. This could potentially mean our "_default"
topology idea, or some environment variable or command line switch that
mounts a particular "webhdfs://" service to a Knox topology identifier
rather than an actual webhdfs endpoint. That would provide more flexibility
than "_default", since there can only be one of those in a gateway
instance.
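Purely as an illustration of that mount idea - the property names below do
not exist today, they are made up for the sketch:

    import java.net.URI;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;

    public class WebHdfsSchemeMountSketch {
      public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Made-up property names - nothing like this exists yet; they only
        // illustrate mapping the webhdfs:// scheme onto a Knox topology.
        conf.set("fs.webhdfs.knox.gateway", "knox.example.com:8443");
        conf.set("fs.webhdfs.knox.topology", "sandbox");
        // The application keeps using plain webhdfs:// URIs; the mapping
        // above would decide which Knox topology actually serves them.
        FileSystem fs = FileSystem.get(
            URI.create("webhdfs://knox.example.com:8443/"), conf);
        System.out.println(fs.getUri());
      }
    }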
Now...whether 0.5.0 dropped the Content-Length or not, we would have to
look up when that was added, but I know that we had very early discussions
about it in the project's life. Perhaps there was a point where it just
wasn't working properly...
We can certainly consider the Transfer-Encoding header in lieu of the
Content-Length - it seems that would likely be more easily determined than
calculating the rewritten length up front, which would be challenging.
We should definitely understand why one or the other is required, too.
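Based on what you describe - the exception being thrown only when both
headers are absent - the client-side check presumably amounts to something
like the sketch below. This is a paraphrase for discussion, not the actual
Hadoop source:

    import java.io.IOException;
    import java.net.HttpURLConnection;

    public class StreamLengthCheckSketch {
      // Paraphrase of the behavior described above: the client needs either
      // a Content-Length header or "Transfer-Encoding: chunked" on the OPEN
      // response, otherwise it fails with "Content-Length is missing".
      static long resolveStreamLength(HttpURLConnection conn)
          throws IOException {
        String contentLength = conn.getHeaderField("Content-Length");
        if (contentLength != null) {
          return Long.parseLong(contentLength);
        }
        if ("chunked".equalsIgnoreCase(
            conn.getHeaderField("Transfer-Encoding"))) {
          return -1; // length unknown up front; read until the stream ends
        }
        throw new IOException(
            "Content-Length is missing: " + conn.getHeaderFields());
      }
    }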
A JIRA and patch would certainly be welcomed!
On Sat, Nov 28, 2015 at 2:39 AM, Natasha d'silva <[email protected]>
wrote:
> Hi Larry,
> Thanks for getting back to me!
> Is dropping the content length header the result of a recent change or has
> it always been this way?
> That is, I'd like to know if/why it doesn't get dropped in 0.5. The
> "Content-Length is missing" exception is thrown if both the Content-Length
> and Transfer-Encoding headers are absent. So if the Content-Length is
> dropped, could Transfer-Encoding be sent instead?
> I can read the same file using cURL, so I know it is possible; I am just
> not sure what the problem is from within the Java client. Any thoughts as
> to why it works in one case and not the other? I've compared the logs for
> each case and nothing else really stood out, other than that the buffer
> size is specified in the request issued from the Java client and not in the
> request from cURL.
>
> Also, when you say supporting Knox in the Hadoop CLI, do you mean
> supporting Knox in the Hadoop Java API?
> I will open a jira outlining what I've found and try to put a patch
> together.
> On Nov 27, 2015 11:29 PM, "larry mccay" <[email protected]> wrote:
>
>> Hi Natasha -
>>
>> I believe that Knox drops that on the floor because we rewrite the
>> response and therefore invalidate the Content-Length that is received from
>> WebHDFS.
>> Sending an incorrect Content-Length causes other problems.
>>
>> Recalculating the size from scratch would be a performance issue. Perhaps
>> we could calculate it based on the difference between the replaced content
>> and the rewritten content, but we would need to assess that possibility.
>>
>> Would love to work on getting support for Knox in the hadoop CLIs!
>>
>> If you would like to file a JIRA to assess that work and/or contribute a
>> fix for it that would be fantastic.
>>
>> thanks,
>>
>> --larry
>>
>> On Fri, Nov 27, 2015 at 8:32 PM, Natasha d'silva <[email protected]>
>> wrote:
>>
>>>
>>> Using Knox 0.6.0 with Hadoop 2.7.1.
>>> I am using the Hadoop FileSystem API but I have made some very minor
>>> modifications to the WebHdfsFileSystem class:
>>> 1) I hardcoded the username/password credentials for the server such
>>> that the connection to the server succeeds.
>>> 2) I also changed the default path for the connections such that the URL
>>> path is
>>> https://<HOST>:<PORT>/*gateway/default/*webhdfs/v1/<PATH>
>>> instead of
>>> https://<HOST>:<PORT>/webhdfs/v1/<PATH>
>>> With these changes, I am able to successfully write to the server.
>>> However, I cannot read. I get the following exception:
>>>
>>> java.io.IOException: Content-Length is missing: {null=[HTTP/1.1 200
>>> OK], Server=[Jetty(8.1.14.v20131031)], Access-Control-Allow-Origin=[*],
>>> Access-Control-Allow-Methods=[GET], Connection=[close],
>>> Set-Cookie=[JSESSIONID=eknosvics9ou1s6ttv2kbfscl;Path=/gateway/default;Secure;HttpOnly],
>>> Expires=[Thu, 01 Jan 1970 00:00:00 GMT],
>>> Content-Type=[application/octet-stream]}
>>> at
>>> org.apache.hadoop.hdfs.web.ByteRangeInputStream.getStreamLength(ByteRangeInputStream.java:153)
>>> at
>>> org.apache.hadoop.hdfs.web.ByteRangeInputStream.openInputStream(ByteRangeInputStream.java:131)
>>> at
>>> org.apache.hadoop.hdfs.web.ByteRangeInputStream.getInputStream(ByteRangeInputStream.java:105)
>>> at
>>> org.apache.hadoop.hdfs.web.ByteRangeInputStream.<init>(ByteRangeInputStream.java:90)
>>> at
>>> org.apache.hadoop.hdfs.web.WebHdfsFileSystem$OffsetUrlInputStream.<init>(WebHdfsFileSystem.java:1275)
>>> at
>>> org.apache.hadoop.hdfs.web.WebHdfsFileSystem.open(WebHdfsFileSystem.java:1189)
>>> at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:767)
>>>
>>>
>>>
>>> Comparing the headers between Knox 0.6 and Knox 0.5 shows that in v0.5
>>> the Content-Length header *is* sent normally, but it is missing in the
>>> response to the client from Knox 0.6.
>>> I have turned on debugging in the Knox server and what is interesting is
>>> that the trace shows that the file is read successfully and the right
>>> headers seem to be generated, as shown in the following excerpt from
>>> gateway.log:
>>>
>>>
>>> 2015-11-27 23:22:48,757 DEBUG conn.HttpClientConnectionOperator
>>> (HttpClientConnectionOperator.java:connect(129)) - Connection established
>>> 10.143.177.241:57593<->10.143.177.15:50075
>>> 2015-11-27 23:22:48,757 DEBUG execchain.MainClientExec
>>> (MainClientExec.java:execute(238)) - Executing request GET
>>> /webhdfs/v1/user/admin/hdfs_test_2015_20151127_182243.txt?user.name=admin&namenoderpcaddress=ehaascluster&buffersize=4096&offset=0&op=OPEN
>>> HTTP/1.1
>>> 2015-11-27 23:22:48,758 DEBUG execchain.MainClientExec
>>> (MainClientExec.java:execute(243)) - Target auth state: UNCHALLENGED
>>> 2015-11-27 23:22:48,758 DEBUG execchain.MainClientExec
>>> (MainClientExec.java:execute(249)) - Proxy auth state: UNCHALLENGED
>>> 2015-11-27 23:22:48,758 DEBUG http.headers
>>> (LoggingManagedHttpClientConnection.java:onRequestSubmitted(124)) -
>>> http-outgoing-16 >> GET
>>> /webhdfs/v1/user/admin/hdfs_test_2015_20151127_182243.txt?user.name=admin&namenoderpcaddress=ehaascluster&buffersize=4096&offset=0&op=OPEN
>>> HTTP/1.1
>>> 2015-11-27 23:22:48,758 DEBUG http.headers
>>> (LoggingManagedHttpClientConnection.java:onRequestSubmitted(127)) -
>>> http-outgoing-16 >> Accept: text/html, image/gif, image/jpeg, *; q=.2, */*;
>>> q=.2
>>> 2015-11-27 23:22:48,758 DEBUG http.headers
>>> (LoggingManagedHttpClientConnection.java:onRequestSubmitted(127)) -
>>> http-outgoing-16 >> Connection: keep-alive
>>> 2015-11-27 23:22:48,759 DEBUG http.headers
>>> (LoggingManagedHttpClientConnection.java:onRequestSubmitted(127)) -
>>> http-outgoing-16 >> User-Agent: Java/1.8.0
>>> 2015-11-27 23:22:48,759 DEBUG http.headers
>>> (LoggingManagedHttpClientConnection.java:onRequestSubmitted(127)) -
>>> http-outgoing-16 >> Host: sampleserver.com:50075
>>> 2015-11-27 23:22:48,759 DEBUG http.headers
>>> (LoggingManagedHttpClientConnection.java:onRequestSubmitted(127)) -
>>> http-outgoing-16 >> Accept-Encoding: gzip,deflate
>>> 2015-11-27 23:22:48,759 DEBUG http.wire (Wire.java:wire(72)) -
>>> http-outgoing-16 >> "GET
>>> /webhdfs/v1/user/admin/hdfs_test_2015_20151127_182243.txt?user.name=admin&namenoderpcaddress=ehaascluster&buffersize=4096&offset=0&op=OPEN
>>> HTTP/1.1[\r][\n]"
>>> 2015-11-27 23:22:48,759 DEBUG http.wire (Wire.java:wire(72)) -
>>> http-outgoing-16 >> "Accept: text/html, image/gif, image/jpeg, *; q=.2,
>>> */*; q=.2[\r][\n]"
>>> 2015-11-27 23:22:48,759 DEBUG http.wire (Wire.java:wire(72)) -
>>> http-outgoing-16 >> "Connection: keep-alive[\r][\n]"
>>> 2015-11-27 23:22:48,760 DEBUG http.wire (Wire.java:wire(72)) -
>>> http-outgoing-16 >> "User-Agent: Java/1.8.0[\r][\n]"
>>> 2015-11-27 23:22:48,760 DEBUG http.wire (Wire.java:wire(72)) -
>>> http-outgoing-16 >> "Host: sampleserver.com:50075[\r][\n]"
>>> 2015-11-27 23:22:48,760 DEBUG http.wire (Wire.java:wire(72)) -
>>> http-outgoing-16 >> "Accept-Encoding: gzip,deflate[\r][\n]"
>>> 2015-11-27 23:22:48,760 DEBUG http.wire (Wire.java:wire(72)) -
>>> http-outgoing-16 >> "[\r][\n]"
>>> 2015-11-27 23:22:48,765 DEBUG http.wire (Wire.java:wire(72)) -
>>> http-outgoing-16 << "HTTP/1.1 200 OK[\r][\n]"
>>> 2015-11-27 23:22:48,766 DEBUG http.wire (Wire.java:wire(72)) -
>>> http-outgoing-16 << "Access-Control-Allow-Methods: GET[\r][\n]"
>>> 2015-11-27 23:22:48,766 DEBUG http.wire (Wire.java:wire(72)) -
>>> http-outgoing-16 << "Access-Control-Allow-Origin: *[\r][\n]"
>>> 2015-11-27 23:22:48,766 DEBUG http.wire (Wire.java:wire(72)) -
>>> http-outgoing-16 << "Content-Type: application/octet-stream[\r][\n]"
>>> 2015-11-27 23:22:48,766 DEBUG http.wire (Wire.java:wire(72)) -
>>> http-outgoing-16 << "Connection: close[\r][\n]"
>>> 2015-11-27 23:22:48,767 DEBUG http.wire (Wire.java:wire(72)) -
>>> http-outgoing-16 << "Content-Length: 450[\r][\n]"
>>> 2015-11-27 23:22:48,767 DEBUG http.wire (Wire.java:wire(72)) -
>>> http-outgoing-16 << "[\r][\n]"
>>> 2015-11-27 23:22:48,767 DEBUG http.headers
>>> (LoggingManagedHttpClientConnection.java:onResponseReceived(113)) -
>>> http-outgoing-16 << HTTP/1.1 200 OK
>>> 2015-11-27 23:22:48,767 DEBUG http.headers
>>> (LoggingManagedHttpClientConnection.java:onResponseReceived(116)) -
>>> http-outgoing-16 << Access-Control-Allow-Methods: GET
>>> 2015-11-27 23:22:48,767 DEBUG http.headers
>>> (LoggingManagedHttpClientConnection.java:onResponseReceived(116)) -
>>> http-outgoing-16 << Access-Control-Allow-Origin: *
>>> 2015-11-27 23:22:48,768 DEBUG http.headers
>>> (LoggingManagedHttpClientConnection.java:onResponseReceived(116)) -
>>> http-outgoing-16 << Content-Type: application/octet-stream
>>> 2015-11-27 23:22:48,768 DEBUG http.headers
>>> (LoggingManagedHttpClientConnection.java:onResponseReceived(116)) -
>>> http-outgoing-16 << Connection: close
>>> 2015-11-27 23:22:48,768 DEBUG http.headers
>>> (LoggingManagedHttpClientConnection.java:onResponseReceived(116)) -
>>> http-outgoing-16 << Content-Length: 450
>>> 2015-11-27 23:22:48,769 DEBUG hadoop.gateway
>>> (DefaultDispatch.java:executeOutboundRequest(136)) - Dispatch response
>>> status: 200
>>> 2015-11-27 23:22:48,769 DEBUG http.wire (Wire.java:wire(72)) -
>>> http-outgoing-16 << "HDFS and knox read and write test New LINE 0[\n]"
>>> 2015-11-27 23:22:48,769 DEBUG http.wire (Wire.java:wire(72)) -
>>> http-outgoing-16 << "HDFS and knox read and write test New LINE 1[\n]"
>>> 2015-11-27 23:22:48,769 DEBUG http.wire (Wire.java:wire(72)) -
>>> http-outgoing-16 << "HDFS and knox read and write test New LINE 2[\n]"
>>> 2015-11-27 23:22:48,769 DEBUG http.wire (Wire.java:wire(72)) -
>>> http-outgoing-16 << "HDFS and knox read and write test New LINE 3[\n]"
>>> 2015-11-27 23:22:48,770 DEBUG http.wire (Wire.java:wire(72)) -
>>> http-outgoing-16 << "HDFS and knox read and write test New LINE 4[\n]"
>>> 2015-11-27 23:22:48,770 DEBUG http.wire (Wire.java:wire(72)) -
>>> http-outgoing-16 << "HDFS and knox read and write test New LINE 5[\n]"
>>> 2015-11-27 23:22:48,770 DEBUG http.wire (Wire.java:wire(72)) -
>>> http-outgoing-16 << "HDFS and knox read and write test New LINE 6[\n]"
>>> 2015-11-27 23:22:48,770 DEBUG http.wire (Wire.java:wire(72)) -
>>> http-outgoing-16 << "HDFS and knox read and write test New LINE 7[\n]"
>>> 2015-11-27 23:22:48,770 DEBUG http.wire (Wire.java:wire(72)) -
>>> http-outgoing-16 << "HDFS and knox read and write test New LINE 8[\n]"
>>> 2015-11-27 23:22:48,771 DEBUG http.wire (Wire.java:wire(72)) -
>>> http-outgoing-16 << "HDFS and knox read and write test New LINE 9[\n]"
>>> 2015-11-27 23:22:48,771 DEBUG conn.DefaultManagedHttpClientConnection
>>> (LoggingManagedHttpClientConnection.java:shutdown(87)) - http-outgoing-16:
>>> Shutdown connection
>>> 2015-11-27 23:22:48,771 DEBUG execchain.MainClientExec
>>> (ConnectionHolder.java:abortConnection(126)) - Connection discarded
>>> 2015-11-27 23:22:48,771 DEBUG conn.DefaultManagedHttpClientConnection
>>> (LoggingManagedHttpClientConnection.java:close(79)) - http-outgoing-16:
>>> Close connection
>>> 2015-11-27 23:22:48,771 DEBUG conn.PoolingHttpClientConnectionManager
>>> (PoolingHttpClientConnectionManager.java:releaseConnection(287)) -
>>> Connection released: [id: 16][route:
>>> {}->http://sampleserver.com:50075][total
>>> kept alive: 0; route allocated: 0 of 2; total allocated: 0 of 20]
>>> 2015-11-27 23:22:48,772 DEBUG nio.ssl (SslConnection.java:process(347))
>>> - [Session-1, SSL_NULL_WITH_NULL_NULL] SslConnection@172f1603 SSL
>>> NOT_HANDSHAKING i/o/u=0/0/0 ishut=false oshut=false
>>> {AsyncHttpConnection@c8a8734,g=HttpGenerator{s=2,h=307,b=450,c=-1},p=HttpParser{s=-5,l=10,c=0},r=1}
>>> NOT_HANDSHAKING filled=0/0 flushed=0/0
>>> 2015-11-27 23:22:48,772 DEBUG nio.ssl (SslConnection.java:wrap(462)) -
>>> [Session-1, SSL_NULL_WITH_NULL_NULL] wrap OK NOT_HANDSHAKING consumed=307
>>> produced=341
>>> 2015-11-27 23:22:48,772 DEBUG nio.ssl (SslConnection.java:process(347))
>>> - [Session-1, SSL_NULL_WITH_NULL_NULL] SslConnection@172f1603 SSL
>>> NOT_HANDSHAKING i/o/u=0/0/0 ishut=false oshut=false
>>> {AsyncHttpConnection@c8a8734,g=HttpGenerator{s=2,h=0,b=450,c=-1},p=HttpParser{s=-5,l=10,c=0},r=1}
>>> NOT_HANDSHAKING filled=0/0 flushed=341/0
>>> 2015-11-27 23:22:48,773 DEBUG nio.ssl (SslConnection.java:process(347))
>>> - [Session-1, SSL_NULL_WITH_NULL_NULL] SslConnection@172f1603 SSL
>>> NOT_HANDSHAKING i/o/u=0/0/0 ishut=false oshut=false
>>> {AsyncHttpConnection@c8a8734,g=HttpGenerator{s=2,h=0,b=450,c=-1},p=HttpParser{s=-5,l=10,c=0},r=1}
>>> NOT_HANDSHAKING filled=0/0 flushed=0/0
>>> 2015-11-27 23:22:48,773 DEBUG nio.ssl (SslConnection.java:process(347))
>>> - [Session-1, SSL_NULL_WITH_NULL_NULL] SslConnection@172f1603 SSL
>>> NOT_HANDSHAKING i/o/u=0/0/0 ishut=false oshut=false
>>> {AsyncHttpConnection@c8a8734,g=HttpGenerator{s=2,h=0,b=450,c=-1},p=HttpParser{s=-5,l=10,c=0},r=1}
>>> NOT_HANDSHAKING filled=0/0 flushed=0/0
>>> 2015-11-27 23:22:48,773 DEBUG nio.ssl (SslConnection.java:wrap(462)) -
>>> [Session-1, SSL_NULL_WITH_NULL_NULL] wrap OK NOT_HANDSHAKING consumed=450
>>> produced=522
>>> 2015-11-27 23:22:48,773 DEBUG nio.ssl (SslConnection.java:process(347))
>>> - [Session-1, SSL_NULL_WITH_NULL_NULL] SslConnection@172f1603 SSL
>>> NOT_HANDSHAKING i/o/u=0/0/0 ishut=false oshut=false
>>> {AsyncHttpConnection@c8a8734,g=HttpGenerator{s=2,h=0,b=0,c=-1},p=HttpParser{s=-5,l=10,c=0},r=1}
>>> NOT_HANDSHAKING filled=0/0 flushed=522/0
>>> 2015-11-27 23:22:48,774 DEBUG nio.ssl (SslConnection.java:process(347))
>>> - [Session-1, SSL_NULL_WITH_NULL_NULL] SslConnection@172f1603 SSL
>>> NOT_HANDSHAKING i/o/u=0/0/0 ishut=false oshut=false
>>> {AsyncHttpConnection@c8a8734,g=HttpGenerator{s=2,h=0,b=0,c=-1},p=HttpParser{s=-5,l=10,c=0},r=1}
>>> NOT_HANDSHAKING filled=0/0 flushed=0/0
>>> 2015-11-27 23:22:48,774 DEBUG server.Server (Server.java:handle(367)) -
>>> RESPONSE
>>> /gateway/default/webhdfs/data/v1/webhdfs/v1/user/admin/hdfs_test_2015_20151127_182243.txt
>>> 200 handled=true
>>>
>>>
>>>
>>> Is there something I'm missing? How come the final response from Knox
>>> does not contain the Content-Length header when it seems the response from
>>> Hadoop does?
>>>
>>>
>>>
>>> --
>>> -Natasha D'Silva
>>>
>>
>>