Just saw this as I was submitting a potentially related WebHBase URL encoding email to the knox-user list. Curious whether the two issues are related.

Alex - out of curiosity, did you use Knox with HDP 2.4 or prior and not see this issue?

Kevin Risden

On Wed, May 24, 2017 at 4:08 PM, larry mccay <[email protected]> wrote:

> Thank you, Alex.
>
> Please file a JIRA for this with the above details.
> I will try and reproduce and investigate and see if we can't get it fixed
> or a workaround for the 0.13.0 release.
> This is planned for the end of next week.
>
> On Wed, May 24, 2017 at 3:18 PM, Willmer, Alex (UK Defence) <
> [email protected]> wrote:
>
>> Hi Larry,
>>
>> The same file does work directly from WebHDFS (see below). Looking more
>> closely at the logs I sent previously, it looks like Knox (or something in
>> the chain I'm unaware of) is decoding the %20 encoded spaces, then
>> reencoding them as + encoded, i.e.
>>
>> 17/05/24 15:51:05 ||88ce58ea-d7c5-46cd-a87a-c2f9
>> 6b38130e|audit|WEBHDFS||||access|uri|/gateway/<cluster>/webhdfs/v1/docs/filename
>> with spaces.pdf?op=OPEN|unavailable|Request method: GET
>> ..
>> 17/05/24 15:51:05 ||88ce58ea-d7c5-46cd-a87a-c2f9
>> 6b38130e|audit|WEBHDFS|<username>|||dispatch|uri|http://<
>> namenode>.<cluster>:50070/webhdfs/v1/docs/filename+with+spaces.pdf
>> ?op=OPEN&doAs=<username>|success|Response status: 404
>>
>> With thanks, Alex
>>
>>
>> Direct WebHDFS request (hostnames redacted)
>>
>> # curl -si -u:
>> "http://<namenode>:50070/webhdfs/v1/docs/filename%20with%20spaces.pdf?op=OPEN"
>> --negotiate -L | head -n40
>> HTTP/1.1 401 Authentication required
>> Cache-Control: must-revalidate,no-cache,no-store
>> Date: Wed, 24 May 2017 19:01:41 GMT
>> Pragma: no-cache
>> Date: Wed, 24 May 2017 19:01:41 GMT
>> Pragma: no-cache
>> X-FRAME-OPTIONS: SAMEORIGIN
>> WWW-Authenticate: Negotiate
>> Set-Cookie: hadoop.auth=; Path=/; HttpOnly
>> Content-Type: text/html; charset=iso-8859-1
>> Content-Length: 1533
>> Server: Jetty(6.1.26.hwx)
>>
>> HTTP/1.1 307 TEMPORARY_REDIRECT
>> Cache-Control: no-cache
>> Expires: Wed, 24 May 2017 19:01:42 GMT
>> Date: Wed, 24 May 2017 19:01:42 GMT
>> Pragma: no-cache
>> Expires: Wed, 24 May 2017 19:01:42 GMT
>> Date: Wed, 24 May 2017 19:01:42 GMT
>> Pragma: no-cache
>> X-FRAME-OPTIONS: SAMEORIGIN
>> WWW-Authenticate: Negotiate YGkGCSqGSIb3EgECAgIAb1owWKADAg
>> EFoQMCAQ+iTDBKoAMCARKiQwRBQM/auuLcl2xey6wMp6EjCPJFSqK3snscxM
>> zW7RvfgxOo7182GzD5N9jf+OWGr+tjpvlRX0c/7iTBfYKSetf4ekU=
>> Set-Cookie: hadoop.auth="u=admin&p=admin@CYSAFA&t=kerberos&e=14956885020
>> 02&s=b7p35TgaxItAUTkKJuSXuynoq9E="; Path=/; HttpOnly
>> Content-Type: application/octet-stream
>> Location: http://<datanode3>:1022/webhdfs/v1/docs/filename%20with%
>> 20spaces.pdf?op=OPEN&delegation=HgAFYWRtaW4FYWRtaW4AigFcO9YJ
>> 8ooBXF_ijfJFAxSBYFUnsXY3up11ZNIi4hIi__5RvRJXRUJIREZTIGRlbGVn
>> YXRpb24PMTcyLjE4LjAuOTo4MDIw&namenoderpcaddress=<namenode>:8020&offset=0
>> Content-Length: 0
>> Server: Jetty(6.1.26.hwx)
>>
>> HTTP/1.1 200 OK
>> Access-Control-Allow-Methods: GET
>> Access-Control-Allow-Origin: *
>> Content-Type: application/octet-stream
>> Connection: close
>> Content-Length: 13365618
>>
>> %����1.6
>> <</Filter/FlateDecode/First 157/Length 5350/N 16/Type/ObjStm>>stream
>> ...
>>
>>
>> ------------------------------
>> *From:* larry mccay [[email protected]]
>> *Sent:* 24 May 2017 18:05
>> *To:* [email protected]
>> *Subject:* Re: Encoding/escaping whitespace in WebHDFS requests
>>
>> Hi Alex -
>>
>> I notice from the audit log that the 404 is actually coming from WebHDFS
>> not from Knox.
>> Can you confirm that direct access to WebHDFS without going through Knox
>> works with the same URL?
>>
>> thanks,
>>
>> --larry
>>
>> On Wed, May 24, 2017 at 12:32 PM, Willmer, Alex (UK Defence) <
>> [email protected]> wrote:
>>
>>> How should I encode spaces characters in the URL when I make a request
>>> to WebHDFS through Knox? Or should be enabling/configuring something in
>>> Knox to handle them?
>>>
>>> I'm making the following (redacted values in <>) request to WebHDFS,
>>> through Knox
>>>
>>> curl "https://<hostname>:18443/gateway/<cluster>/webhdfs/v1/docs/
>>> filename%20with%20spaces.pdf?op=OPEN" \
>>> -<username>:<password> -k -s
>>>
>>> However Knox is returning HTTP 404 with the following body
>>> (whitespace/formatting added by me)
>>>
>>> {"exception":"FileNotFoundException",
>>> "javaClassName":"java.io.FileNotFoundException",
>>> "message":"File /docs/filename+with+spaces.pdf not found."}}
>>>
>>> I've tried encoding the spaces as + (same result), and not encoding them
>>> (HTTP 400 Unknown Version).
>>> If I request a file for which the path does not contain spaces then it
>>> works.
>>>
>>> Any ideas?
>>>
>>> With thanks, Alex
>>>
>>>
>>>
>>> PS In anticipation of queries: I'm using Knox 0.11.0 with OpenJDK
>>> 1.8.0_131 on CentOS 7, with an HDP 2.6 (Hadoop 2.7.x) cluster. Kerberos is
>>> enabled in the cluster.
>>>
>>> The (redacted) response headers for the %20 encoded request
>>>
>>> < HTTP/1.1 404 Not Found
>>> < Date: Wed, 24 May 2017 15:34:26 GMT
>>> < Set-Cookie: JSESSIONID=15acwo8gt9qr8gdbvk4
>>> 8y9yjh;Path=/gateway/<cluster>;Secure;HttpOnly
>>> < Expires: Thu, 01 Jan 1970 00:00:00 GMT
>>> < Set-Cookie: rememberMe=deleteMe; Path=/gateway/cysafa; Max-Age=0;
>>> Expires=Tue, 23-May-2017 15:34:26 GMT
>>> < Cache-Control: no-cache
>>> < Expires: Wed, 24 May 2017 15:34:26 GMT
>>> < Date: Wed, 24 May 2017 15:34:26 GMT
>>> < Pragma: no-cache
>>> < Expires: Wed, 24 May 2017 15:34:26 GMT
>>> < Date: Wed, 24 May 2017 15:34:26 GMT
>>> < Pragma: no-cache
>>> < X-FRAME-OPTIONS: SAMEORIGIN
>>> < Content-Type: application/json; charset=UTF-8
>>> < Server: Jetty(6.1.26.hwx)
>>> < Content-Length: 252
>>>
>>> The (redacted) Knox logs for the %20 encoded request
>>>
>>> ==> /var/log/hadoop/knox/gateway-audit.log <==
>>> 17/05/24 15:51:05 ||88ce58ea-d7c5-46cd-a87a-c2f9
>>> 6b38130e|audit|WEBHDFS||||access|uri|/gateway/<cluster>/webhdfs/v1/docs/filename
>>> with spaces.pdf?op=OPEN|unavailable|Request method: GET
>>> 17/05/24 15:51:05 ||88ce58ea-d7c5-46cd-a87a-c2f9
>>> 6b38130e|audit|WEBHDFS|<username>|||authentication|uri|/gate
>>> way/<cluster>/webhdfs/v1/docs/filename with spaces.pdf?op=OPEN|success|
>>> 17/05/24 15:51:05 ||88ce58ea-d7c5-46cd-a87a-c2f9
>>> 6b38130e|audit|WEBHDFS|<username>|||authentication|uri|/gate
>>> way/<cluster>/webhdfs/v1/docs/filename with
>>> spaces.pdf?op=OPEN|success|Groups: []
>>> 17/05/24 15:51:05 ||88ce58ea-d7c5-46cd-a87a-c2f9
>>> 6b38130e|audit|WEBHDFS|<username>|||authorization|uri|/gatew
>>> ay/<cluster>/webhdfs/v1/docs/filename with spaces.pdf?op=OPEN|success|
>>> 17/05/24 15:51:05 ||88ce58ea-d7c5-46cd-a87a-c2f9
>>> 6b38130e|audit|WEBHDFS|<username>|||dispatch|uri|http://<nam
>>> enode>.<cluster>:50070/webhdfs/v1/docs/filename+with+spaces.
>>> pdf?op=OPEN&doAs=<username>|unavailable|Request method: GET
>>> 17/05/24 15:51:05 ||88ce58ea-d7c5-46cd-a87a-c2f9
>>> 6b38130e|audit|WEBHDFS|<username>|||dispatch|uri|http://<nam
>>> enode>.<cluster>:50070/webhdfs/v1/docs/filename+with+spaces.
>>> pdf?op=OPEN&doAs=<username>|success|Response status: 404
>>> 17/05/24 15:51:05 ||88ce58ea-d7c5-46cd-a87a-c2f9
>>> 6b38130e|audit|WEBHDFS|<username>|||access|uri|/gateway/<cluster>/webhdfs/v1/docs/filename
>>> with spaces.pdf?op=OPEN|success|Response status: 404
>>>
>>> ==> /var/log/hadoop/knox/gateway.log <==
>>> 2017-05-24 15:51:05,254 INFO hadoop.gateway
>>> (KnoxLdapRealm.java:getUserDn(691)) - Computed userDn:
>>> uid=<username>,cn=users,cn=accounts,dc=<cluster> using dnTemplate for
>>> principal: <username>
>>> 2017-05-24 15:51:05,259 INFO hadoop.gateway
>>> (AclsAuthorizationFilter.java:doFilter(85)) - Access Granted: true
>>>
>>> The (redacted) topology
>>>
>>> <topology>
>>> <gateway>
>>> <provider>
>>> <role>authentication</role>
>>> <name>ShiroProvider</name>
>>> <enabled>true</enabled>
>>> <param>
>>> <name>sessionTimeout</name>
>>> <value>30</value>
>>> </param>
>>> <param>
>>> <name>main.ldapRealm</name>
>>> <value>org.apache.hadoop.gatew
>>> ay.shirorealm.KnoxLdapRealm</value>
>>> </param>
>>> <param>
>>> <name>main.ldapContextFactory</name>
>>> <value>org.apache.hadoop.gatew
>>> ay.shirorealm.KnoxLdapContextFactory</value>
>>> </param>
>>> <param>
>>> <name>main.ldapRealm.contextFactory</name>
>>> <value>$ldapContextFactory</value>
>>> </param>
>>> <param>
>>> <name>main.ldapRealm.userDnTemplate</name>
>>> <value>uid={0},cn=users,cn=accounts,dc=<cluster></value>
>>> </param>
>>> <param>
>>> <name>main.ldapRealm.contextFactory.url</name>
>>> <value>ldap://<freeipa_node>:389</value>
>>> </param>
>>> <param>
>>> <name>main.ldapRealm.contextFa
>>> ctory.authenticationMechanism</name>
>>> <value>simple</value>
>>> </param>
>>> <param>
>>> <name>urls./**</name>
>>> <value>authcBasic</value>
>>> </param>
>>> </provider>
>>> <provider>
>>> <role>authorization</role>
>>> <name>AclsAuthz</name>
>>> <enabled>true</enabled>
>>> <param>
>>> <name>knox.acl</name>
>>> <value>admin;*;*</value>
>>> </param>
>>> </provider>
>>> <provider>
>>> <role>identity-assertion</role>
>>> <name>Default</name>
>>> <enabled>true</enabled>
>>> </provider>
>>> <provider>
>>> <role>hostmap</role>
>>> <name>static</name>
>>> <enabled>false</enabled>
>>> <param><name>localhost</name><value>sandbox,sandbox.hortonwo
>>> rks.com</value></param>
>>> </provider>
>>> </gateway>
>>>
>>> <service>
>>> <role>WEBHDFS</role>
>>> <url>http://<namenode>:50070/webhdfs</url>
>>> </service>
>>>
>>> <service>
>>> <role>SOLRAPI</role>
>>> <url>http://<solrnode>:6083/solr</url>
>>> </service>
>>> </topology>
>>>
>>>
>>
>
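
For anyone following along, here is a minimal Python sketch of the encoding mismatch Alex describes in the quoted thread. It only uses Python's standard urllib.parse to illustrate the difference between percent-encoding a URL path (%20) and form-style encoding (+). It is not Knox's code, and where exactly the re-encoding happens inside Knox is an assumption that nothing in this thread confirms. The path is the one from Alex's request; the variable names are mine.

```python
from urllib.parse import quote, quote_plus, unquote, unquote_plus

# Path of the file as requested in this thread.
path = "/docs/filename with spaces.pdf"

# Percent-encoding for a URL path: spaces become %20.
path_encoded = quote(path)

# Form-style (application/x-www-form-urlencoded) encoding: spaces become "+".
# safe="/" keeps the slashes unescaped for this illustration.
form_encoded = quote_plus(path, safe="/")

print(path_encoded)  # /docs/filename%20with%20spaces.pdf
print(form_encoded)  # /docs/filename+with+spaces.pdf

# Decoding as a URL path leaves "+" as a literal plus sign, so the
# form-encoded variant names a different (non-existent) file.
print(unquote(path_encoded) == path)   # True
print(unquote(form_encoded) == path)   # False

# Only a form-style decoder maps "+" back to a space.
print(unquote_plus(form_encoded) == path)  # True
```

That would be consistent with the 404 in the audit log: the dispatched URL carries filename+with+spaces.pdf in the path, and the error message reports the name with literal plus signs.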
