Kevin, I did use Knox with HDP 2.4. I can't say whether we saw this issue 
though. Sorry.

Alex

________________________________
From: Kevin Risden [[email protected]]
Sent: 24 May 2017 23:24
To: [email protected]
Subject: Re: Encoding/escaping whitespace in WebHDFS requests

Just saw this as I was submitting a potentially related WebHBase URL-encoding 
email to the knox-user list. Curious whether the two issues are related.

Alex - out of curiosity, did you use Knox with HDP 2.4 or earlier and not see 
this issue?

Kevin Risden

On Wed, May 24, 2017 at 4:08 PM, larry mccay 
<[email protected]<mailto:[email protected]>> wrote:
Thank you, Alex.

Please file a JIRA for this with the above details.
I will try to reproduce and investigate, and see whether we can get it fixed, 
or find a workaround, for the 0.13.0 release.
This is planned for the end of next week.

On Wed, May 24, 2017 at 3:18 PM, Willmer, Alex (UK Defence) 
<[email protected]<mailto:[email protected]>> wrote:
Hi Larry,

The same file does work directly from WebHDFS (see below). Looking more closely 
at the logs I sent previously, it looks like Knox (or something in the chain 
I'm unaware of) is decoding the %20-encoded spaces, then re-encoding them as +, 
i.e.

17/05/24 15:51:05 ||88ce58ea-d7c5-46cd-a87a-c2f96b38130e|audit|WEBHDFS||||access|uri|/gateway/<cluster>/webhdfs/v1/docs/filename with spaces.pdf?op=OPEN|unavailable|Request method: GET
..
17/05/24 15:51:05 ||88ce58ea-d7c5-46cd-a87a-c2f96b38130e|audit|WEBHDFS|<username>|||dispatch|uri|http://<namenode>.<cluster>:50070/webhdfs/v1/docs/filename+with+spaces.pdf?op=OPEN&doAs=<username>|success|Response status: 404
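
For what it's worth, that %20-to-+ swap is exactly what java.net.URLEncoder 
produces: it implements the application/x-www-form-urlencoded rules, where a 
space becomes '+', whereas a URL path segment needs %20. I don't know whether 
that class is actually what Knox uses internally; the following is only a 
minimal sketch of the mismatch (class name and path are illustrative):

import java.net.URLDecoder;
import java.net.URLEncoder;

public class SpaceEncodingDemo {
    public static void main(String[] args) throws Exception {
        // Incoming path as the gateway receives it, with %20-encoded spaces.
        String encodedPath = "/webhdfs/v1/docs/filename%20with%20spaces.pdf";

        // Decoding turns %20 (and '+') back into literal spaces.
        String decoded = URLDecoder.decode(encodedPath, "UTF-8");
        System.out.println(decoded);
        // -> /webhdfs/v1/docs/filename with spaces.pdf

        // Re-encoding with URLEncoder applies form-encoding rules, so the
        // spaces come back as '+' rather than %20 (and '/' becomes %2F).
        System.out.println(URLEncoder.encode(decoded, "UTF-8"));
        // -> %2Fwebhdfs%2Fv1%2Fdocs%2Ffilename+with+spaces.pdf
    }
}

If something along those lines is happening per path segment, the dispatched 
URI ends up with literal '+' characters, which HDFS then treats as part of the 
file name.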

With thanks, Alex


Direct WebHDFS request (hostnames redacted)

# curl -si -u: "http://<namenode>:50070/webhdfs/v1/docs/filename%20with%20spaces.pdf?op=OPEN" --negotiate -L | head -n40
HTTP/1.1 401 Authentication required
Cache-Control: must-revalidate,no-cache,no-store
Date: Wed, 24 May 2017 19:01:41 GMT
Pragma: no-cache
Date: Wed, 24 May 2017 19:01:41 GMT
Pragma: no-cache
X-FRAME-OPTIONS: SAMEORIGIN
WWW-Authenticate: Negotiate
Set-Cookie: hadoop.auth=; Path=/; HttpOnly
Content-Type: text/html; charset=iso-8859-1
Content-Length: 1533
Server: Jetty(6.1.26.hwx)

HTTP/1.1 307 TEMPORARY_REDIRECT
Cache-Control: no-cache
Expires: Wed, 24 May 2017 19:01:42 GMT
Date: Wed, 24 May 2017 19:01:42 GMT
Pragma: no-cache
Expires: Wed, 24 May 2017 19:01:42 GMT
Date: Wed, 24 May 2017 19:01:42 GMT
Pragma: no-cache
X-FRAME-OPTIONS: SAMEORIGIN
WWW-Authenticate: Negotiate 
YGkGCSqGSIb3EgECAgIAb1owWKADAgEFoQMCAQ+iTDBKoAMCARKiQwRBQM/auuLcl2xey6wMp6EjCPJFSqK3snscxMzW7RvfgxOo7182GzD5N9jf+OWGr+tjpvlRX0c/7iTBfYKSetf4ekU=
Set-Cookie: 
hadoop.auth="u=admin&p=admin@CYSAFA&t=kerberos&e=1495688502002&s=b7p35TgaxItAUTkKJuSXuynoq9E=";
 Path=/; HttpOnly
Content-Type: application/octet-stream
Location: 
http://<datanode3>:1022/webhdfs/v1/docs/filename%20with%20spaces.pdf?op=OPEN&delegation=HgAFYWRtaW4FYWRtaW4AigFcO9YJ8ooBXF_ijfJFAxSBYFUnsXY3up11ZNIi4hIi__5RvRJXRUJIREZTIGRlbGVnYXRpb24PMTcyLjE4LjAuOTo4MDIw&namenoderpcaddress=<namenode>:8020&offset=0
Content-Length: 0
Server: Jetty(6.1.26.hwx)

HTTP/1.1 200 OK
Access-Control-Allow-Methods: GET
Access-Control-Allow-Origin: *
Content-Type: application/octet-stream
Connection: close
Content-Length: 13365618

%PDF-1.6
<</Filter/FlateDecode/First 157/Length 5350/N 16/Type/ObjStm>>stream
...


________________________________
From: larry mccay [[email protected]]
Sent: 24 May 2017 18:05
To: [email protected]<mailto:[email protected]>
Subject: Re: Encoding/escaping whitespace in WebHDFS requests

Hi Alex -

I notice from the audit log that the 404 is actually coming from WebHDFS, not 
from Knox.
Can you confirm that direct access to WebHDFS without going through Knox works 
with the same URL?

thanks,

--larry

On Wed, May 24, 2017 at 12:32 PM, Willmer, Alex (UK Defence) 
<[email protected]<mailto:[email protected]>> wrote:
How should I encode space characters in the URL when I make a request to 
WebHDFS through Knox? Or should I be enabling/configuring something in Knox to 
handle them?

I'm making the following request (redacted values in <>) to WebHDFS, through 
Knox:

curl "https://<hostname>:18443/gateway/<cluster>/webhdfs/v1/docs/filename%20with%20spaces.pdf?op=OPEN" \
     -u <username>:<password> -k -s

However, Knox is returning HTTP 404 with the following body 
(whitespace/formatting added by me):

{"exception":"FileNotFoundException",
 "javaClassName":"java.io.FileNotFoundException",
 "message":"File /docs/filename+with+spaces.pdf not found."}

I've tried encoding the spaces as + (same result), and not encoding them at all 
(HTTP 400 Unknown Version). If I request a file whose path does not contain 
spaces, it works.
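
In case it narrows things down: as far as I understand, '+' only stands for a 
space under application/x-www-form-urlencoded rules (HTML forms), not in a URI 
path per RFC 3986, so HDFS presumably looks for a file literally named 
"filename+with+spaces.pdf". For comparison, java.net.URI percent-encodes a 
space in the path as %20, never '+' (a sketch only; the host name is a 
placeholder):

import java.net.URI;

public class PathEncodingDemo {
    public static void main(String[] args) throws Exception {
        // The multi-argument URI constructor quotes characters that are
        // illegal in the path, so the space is emitted as %20, not '+'.
        URI uri = new URI("http", null, "namenode.example", 50070,
                "/webhdfs/v1/docs/filename with spaces.pdf", "op=OPEN", null);
        System.out.println(uri.toASCIIString());
        // -> http://namenode.example:50070/webhdfs/v1/docs/filename%20with%20spaces.pdf?op=OPEN
    }
}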

Any ideas?

With thanks, Alex



PS In anticipation of queries: I'm using Knox 0.11.0 with OpenJDK 1.8.0_131 on 
CentOS 7, with an HDP 2.6 (Hadoop 2.7.x) cluster. Kerberos is enabled in the 
cluster.

The (redacted) response headers for the %20-encoded request

< HTTP/1.1 404 Not Found
< Date: Wed, 24 May 2017 15:34:26 GMT
< Set-Cookie: 
JSESSIONID=15acwo8gt9qr8gdbvk48y9yjh;Path=/gateway/<cluster>;Secure;HttpOnly
< Expires: Thu, 01 Jan 1970 00:00:00 GMT
< Set-Cookie: rememberMe=deleteMe; Path=/gateway/cysafa; Max-Age=0; 
Expires=Tue, 23-May-2017 15:34:26 GMT
< Cache-Control: no-cache
< Expires: Wed, 24 May 2017 15:34:26 GMT
< Date: Wed, 24 May 2017 15:34:26 GMT
< Pragma: no-cache
< Expires: Wed, 24 May 2017 15:34:26 GMT
< Date: Wed, 24 May 2017 15:34:26 GMT
< Pragma: no-cache
< X-FRAME-OPTIONS: SAMEORIGIN
< Content-Type: application/json; charset=UTF-8
< Server: Jetty(6.1.26.hwx)
< Content-Length: 252

The (redacted) Knox logs for the %20-encoded request

==> /var/log/hadoop/knox/gateway-audit.log <==
17/05/24 15:51:05 ||88ce58ea-d7c5-46cd-a87a-c2f96b38130e|audit|WEBHDFS||||access|uri|/gateway/<cluster>/webhdfs/v1/docs/filename with spaces.pdf?op=OPEN|unavailable|Request method: GET
17/05/24 15:51:05 ||88ce58ea-d7c5-46cd-a87a-c2f96b38130e|audit|WEBHDFS|<username>|||authentication|uri|/gateway/<cluster>/webhdfs/v1/docs/filename with spaces.pdf?op=OPEN|success|
17/05/24 15:51:05 ||88ce58ea-d7c5-46cd-a87a-c2f96b38130e|audit|WEBHDFS|<username>|||authentication|uri|/gateway/<cluster>/webhdfs/v1/docs/filename with spaces.pdf?op=OPEN|success|Groups: []
17/05/24 15:51:05 ||88ce58ea-d7c5-46cd-a87a-c2f96b38130e|audit|WEBHDFS|<username>|||authorization|uri|/gateway/<cluster>/webhdfs/v1/docs/filename with spaces.pdf?op=OPEN|success|
17/05/24 15:51:05 ||88ce58ea-d7c5-46cd-a87a-c2f96b38130e|audit|WEBHDFS|<username>|||dispatch|uri|http://<namenode>.<cluster>:50070/webhdfs/v1/docs/filename+with+spaces.pdf?op=OPEN&doAs=<username>|unavailable|Request method: GET
17/05/24 15:51:05 ||88ce58ea-d7c5-46cd-a87a-c2f96b38130e|audit|WEBHDFS|<username>|||dispatch|uri|http://<namenode>.<cluster>:50070/webhdfs/v1/docs/filename+with+spaces.pdf?op=OPEN&doAs=<username>|success|Response status: 404
17/05/24 15:51:05 ||88ce58ea-d7c5-46cd-a87a-c2f96b38130e|audit|WEBHDFS|<username>|||access|uri|/gateway/<cluster>/webhdfs/v1/docs/filename with spaces.pdf?op=OPEN|success|Response status: 404

==> /var/log/hadoop/knox/gateway.log <==
2017-05-24 15:51:05,254 INFO  hadoop.gateway (KnoxLdapRealm.java:getUserDn(691)) - Computed userDn: uid=<username>,cn=users,cn=accounts,dc=<cluster> using dnTemplate for principal: <username>
2017-05-24 15:51:05,259 INFO  hadoop.gateway (AclsAuthorizationFilter.java:doFilter(85)) - Access Granted: true

The (redacted) topology

<topology>
    <gateway>
        <provider>
            <role>authentication</role>
            <name>ShiroProvider</name>
            <enabled>true</enabled>
            <param>
                <name>sessionTimeout</name>
                <value>30</value>
            </param>
            <param>
                <name>main.ldapRealm</name>
                <value>org.apache.hadoop.gateway.shirorealm.KnoxLdapRealm</value>
            </param>
            <param>
                <name>main.ldapContextFactory</name>
                <value>org.apache.hadoop.gateway.shirorealm.KnoxLdapContextFactory</value>
            </param>
            <param>
                <name>main.ldapRealm.contextFactory</name>
                <value>$ldapContextFactory</value>
            </param>
            <param>
                <name>main.ldapRealm.userDnTemplate</name>
                <value>uid={0},cn=users,cn=accounts,dc=<cluster></value>
            </param>
            <param>
                <name>main.ldapRealm.contextFactory.url</name>
                <value>ldap://<freeipa_node>:389</value>
            </param>
            <param>
                <name>main.ldapRealm.contextFactory.authenticationMechanism</name>
                <value>simple</value>
            </param>
            <param>
                <name>urls./**</name>
                <value>authcBasic</value>
            </param>
        </provider>
        <provider>
            <role>authorization</role>
            <name>AclsAuthz</name>
            <enabled>true</enabled>
            <param>
                <name>knox.acl</name>
                <value>admin;*;*</value>
            </param>
        </provider>
        <provider>
            <role>identity-assertion</role>
            <name>Default</name>
            <enabled>true</enabled>
        </provider>
        <provider>
            <role>hostmap</role>
            <name>static</name>
            <enabled>false</enabled>
            <param>
                <name>localhost</name>
                <value>sandbox,sandbox.hortonworks.com</value>
            </param>
        </provider>
    </gateway>

    <service>
        <role>WEBHDFS</role>
        <url>http://<namenode>:50070/webhdfs</url>
    </service>

    <service>
        <role>SOLRAPI</role>
        <url>http://<solrnode>:6083/solr</url>
    </service>
</topology>



