How should I encode space characters in the URL when I make a request to 
WebHDFS through Knox? Or should I be enabling/configuring something in Knox to 
handle them?

I'm making the following request (redacted values in <>) to WebHDFS through 
Knox:

curl "https://<hostname>:18443/gateway/<cluster>/webhdfs/v1/docs/filename%20with%20spaces.pdf?op=OPEN" \
     -u <username>:<password> -k -s

However, Knox is returning HTTP 404 with the following body 
(whitespace/formatting added by me):

{"exception":"FileNotFoundException",
 "javaClassName":"java.io.FileNotFoundException",
 "message":"File /docs/filename+with+spaces.pdf not found."}}

I've also tried encoding the spaces as + (same result), and not encoding them 
at all (HTTP 400 Unknown Version).
If I request a file whose path does not contain spaces, it works.
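
To illustrate the encodings in question, here is a minimal Python sketch using 
only the standard library (urllib.parse). It is not part of the Knox setup; the 
example path is the one from the request above. It shows that %20 is the 
path-style encoding of a space, that + is the form-style encoding, and that a + 
in a path is not turned back into a space when decoded with path rules, which 
would be consistent with the 404 if the backend reads the + literally.

```python
from urllib.parse import quote, quote_plus, unquote, unquote_plus

path = "/docs/filename with spaces.pdf"

# Path-style percent-encoding: spaces become %20, "/" is left alone.
encoded_path = quote(path)

# Form-style encoding (as used for query strings): spaces become "+",
# and "/" is escaped as well because quote_plus has no safe characters
# by default.
encoded_form = quote_plus(path)

# What a server sees if it decodes a "+"-rewritten path with path rules:
# the "+" stays a literal plus sign.
rewritten = "/docs/filename+with+spaces.pdf"
decoded_as_path = unquote(rewritten)

# Only form-style decoding turns "+" back into a space.
decoded_as_form = unquote_plus(rewritten)

print(encoded_path)
print(encoded_form)
print(decoded_as_path)
print(decoded_as_form)
```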

Any ideas?

With thanks, Alex



PS In anticipation of queries: I'm using Knox 0.11.0 with OpenJDK 1.8.0_131 on 
CentOS 7, with an HDP 2.6 (Hadoop 2.7.x) cluster. Kerberos is enabled in the 
cluster.

The (redacted) response headers for the %20 encoded request

< HTTP/1.1 404 Not Found
< Date: Wed, 24 May 2017 15:34:26 GMT
< Set-Cookie: JSESSIONID=15acwo8gt9qr8gdbvk48y9yjh;Path=/gateway/<cluster>;Secure;HttpOnly
< Expires: Thu, 01 Jan 1970 00:00:00 GMT
< Set-Cookie: rememberMe=deleteMe; Path=/gateway/cysafa; Max-Age=0; Expires=Tue, 23-May-2017 15:34:26 GMT
< Cache-Control: no-cache
< Expires: Wed, 24 May 2017 15:34:26 GMT
< Date: Wed, 24 May 2017 15:34:26 GMT
< Pragma: no-cache
< Expires: Wed, 24 May 2017 15:34:26 GMT
< Date: Wed, 24 May 2017 15:34:26 GMT
< Pragma: no-cache
< X-FRAME-OPTIONS: SAMEORIGIN
< Content-Type: application/json; charset=UTF-8
< Server: Jetty(6.1.26.hwx)
< Content-Length: 252

The (redacted) Knox logs for the %20 encoded request

==> /var/log/hadoop/knox/gateway-audit.log <==
17/05/24 15:51:05 ||88ce58ea-d7c5-46cd-a87a-c2f96b38130e|audit|WEBHDFS||||access|uri|/gateway/<cluster>/webhdfs/v1/docs/filename with spaces.pdf?op=OPEN|unavailable|Request method: GET
17/05/24 15:51:05 ||88ce58ea-d7c5-46cd-a87a-c2f96b38130e|audit|WEBHDFS|<username>|||authentication|uri|/gateway/<cluster>/webhdfs/v1/docs/filename with spaces.pdf?op=OPEN|success|
17/05/24 15:51:05 ||88ce58ea-d7c5-46cd-a87a-c2f96b38130e|audit|WEBHDFS|<username>|||authentication|uri|/gateway/<cluster>/webhdfs/v1/docs/filename with spaces.pdf?op=OPEN|success|Groups: []
17/05/24 15:51:05 ||88ce58ea-d7c5-46cd-a87a-c2f96b38130e|audit|WEBHDFS|<username>|||authorization|uri|/gateway/<cluster>/webhdfs/v1/docs/filename with spaces.pdf?op=OPEN|success|
17/05/24 15:51:05 ||88ce58ea-d7c5-46cd-a87a-c2f96b38130e|audit|WEBHDFS|<username>|||dispatch|uri|http://<namenode>.<cluster>:50070/webhdfs/v1/docs/filename+with+spaces.pdf?op=OPEN&doAs=<username>|unavailable|Request method: GET
17/05/24 15:51:05 ||88ce58ea-d7c5-46cd-a87a-c2f96b38130e|audit|WEBHDFS|<username>|||dispatch|uri|http://<namenode>.<cluster>:50070/webhdfs/v1/docs/filename+with+spaces.pdf?op=OPEN&doAs=<username>|success|Response status: 404
17/05/24 15:51:05 ||88ce58ea-d7c5-46cd-a87a-c2f96b38130e|audit|WEBHDFS|<username>|||access|uri|/gateway/<cluster>/webhdfs/v1/docs/filename with spaces.pdf?op=OPEN|success|Response status: 404

==> /var/log/hadoop/knox/gateway.log <==
2017-05-24 15:51:05,254 INFO  hadoop.gateway (KnoxLdapRealm.java:getUserDn(691)) - Computed userDn: uid=<username>,cn=users,cn=accounts,dc=<cluster> using dnTemplate for principal: <username>
2017-05-24 15:51:05,259 INFO  hadoop.gateway (AclsAuthorizationFilter.java:doFilter(85)) - Access Granted: true

The (redacted) topology

<topology>
    <gateway>
        <provider>
            <role>authentication</role>
            <name>ShiroProvider</name>
            <enabled>true</enabled>
            <param>
                <name>sessionTimeout</name>
                <value>30</value>
            </param>
            <param>
                <name>main.ldapRealm</name>
                <value>org.apache.hadoop.gateway.shirorealm.KnoxLdapRealm</value>
            </param>
            <param>
                <name>main.ldapContextFactory</name>
                <value>org.apache.hadoop.gateway.shirorealm.KnoxLdapContextFactory</value>
            </param>
            <param>
                <name>main.ldapRealm.contextFactory</name>
                <value>$ldapContextFactory</value>
            </param>
            <param>
                <name>main.ldapRealm.userDnTemplate</name>
                <value>uid={0},cn=users,cn=accounts,dc=<cluster></value>
            </param>
            <param>
                <name>main.ldapRealm.contextFactory.url</name>
                <value>ldap://<freeipa_node>:389</value>
            </param>
            <param>
                <name>main.ldapRealm.contextFactory.authenticationMechanism</name>
                <value>simple</value>
            </param>
            <param>
                <name>urls./**</name>
                <value>authcBasic</value>
            </param>
        </provider>
        <provider>
            <role>authorization</role>
            <name>AclsAuthz</name>
            <enabled>true</enabled>
            <param>
                <name>knox.acl</name>
                <value>admin;*;*</value>
            </param>
        </provider>
        <provider>
            <role>identity-assertion</role>
            <name>Default</name>
            <enabled>true</enabled>
        </provider>
        <provider>
            <role>hostmap</role>
            <name>static</name>
            <enabled>false</enabled>
            <param><name>localhost</name><value>sandbox,sandbox.hortonworks.com</value></param>
        </provider>
    </gateway>

    <service>
        <role>WEBHDFS</role>
        <url>http://<namenode>:50070/webhdfs</url>
    </service>

    <service>
        <role>SOLRAPI</role>
        <url>http://<solrnode>:6083/solr</url>
    </service>
</topology>
