Thanks Larry, yes, I had stumbled upon KNOX-709. There is much more detail in the thread "WebHBase URL Encoding Issue". I didn't want to hijack this thread in case the WebHBase issue isn't related.
Kevin Risden

On Wed, May 24, 2017 at 6:08 PM, larry mccay wrote:
> Hi Kevin -
>
> You may see some change related to https://issues.apache.org/jira/browse/KNOX-709.
>
> thanks,
>
> --larry
>
> On Wed, May 24, 2017 at 6:24 PM, Kevin Risden wrote:
>
>> Just saw this as I was submitting a potentially related WebHBase URL
>> encoding email to the knox-user list. Curious if they are related.
>>
>> Alex - out of curiosity, did you use Knox with HDP 2.4 or earlier and not
>> see this issue?
>>
>> Kevin Risden
>>
>> On Wed, May 24, 2017 at 4:08 PM, larry mccay wrote:
>>
>>> Thank you, Alex.
>>>
>>> Please file a JIRA for this with the above details.
>>> I will try to reproduce and investigate, and see if we can get it
>>> fixed, or find a workaround, for the 0.13.0 release.
>>> This is planned for the end of next week.
>>>
>>> On Wed, May 24, 2017 at 3:18 PM, Willmer, Alex (UK Defence) wrote:
>>>
>>>> Hi Larry,
>>>>
>>>> The same file does work directly from WebHDFS (see below). Looking more
>>>> closely at the logs I sent previously, it looks like Knox (or something in
>>>> the chain I'm unaware of) is decoding the %20-encoded spaces, then
>>>> re-encoding them as +, i.e.
>>>>
>>>> 17/05/24 15:51:05 ||88ce58ea-d7c5-46cd-a87a-c2f96b38130e|audit|WEBHDFS||||access|uri|/gateway/<cluster>/webhdfs/v1/docs/filename with spaces.pdf?op=OPEN|unavailable|Request method: GET
>>>> ..
>>>> 17/05/24 15:51:05 ||88ce58ea-d7c5-46cd-a87a-c2f96b38130e|audit|WEBHDFS|<username>|||dispatch|uri|http://<namenode>.<cluster>:50070/webhdfs/v1/docs/filename+with+spaces.pdf?op=OPEN&doAs=<username>|success|Response status: 404
>>>>
>>>> With thanks, Alex
>>>>
>>>>
>>>> Direct WebHDFS request (hostnames redacted)
>>>>
>>>> # curl -si -u: "http://<namenode>:50070/webhdfs/v1/docs/filename%20with%20spaces.pdf?op=OPEN" --negotiate -L | head -n40
>>>> HTTP/1.1 401 Authentication required
>>>> Cache-Control: must-revalidate,no-cache,no-store
>>>> Date: Wed, 24 May 2017 19:01:41 GMT
>>>> Pragma: no-cache
>>>> Date: Wed, 24 May 2017 19:01:41 GMT
>>>> Pragma: no-cache
>>>> X-FRAME-OPTIONS: SAMEORIGIN
>>>> WWW-Authenticate: Negotiate
>>>> Set-Cookie: hadoop.auth=; Path=/; HttpOnly
>>>> Content-Type: text/html; charset=iso-8859-1
>>>> Content-Length: 1533
>>>> Server: Jetty(6.1.26.hwx)
>>>>
>>>> HTTP/1.1 307 TEMPORARY_REDIRECT
>>>> Cache-Control: no-cache
>>>> Expires: Wed, 24 May 2017 19:01:42 GMT
>>>> Date: Wed, 24 May 2017 19:01:42 GMT
>>>> Pragma: no-cache
>>>> Expires: Wed, 24 May 2017 19:01:42 GMT
>>>> Date: Wed, 24 May 2017 19:01:42 GMT
>>>> Pragma: no-cache
>>>> X-FRAME-OPTIONS: SAMEORIGIN
>>>> WWW-Authenticate: Negotiate YGkGCSqGSIb3EgECAgIAb1owWKADAgEFoQMCAQ+iTDBKoAMCARKiQwRBQM/auuLcl2xey6wMp6EjCPJFSqK3snscxMzW7RvfgxOo7182GzD5N9jf+OWGr+tjpvlRX0c/7iTBfYKSetf4ekU=
>>>> Set-Cookie: hadoop.auth="u=admin&p=admin@CYSAFA&t=kerberos&e=1495688502002&s=b7p35TgaxItAUTkKJuSXuynoq9E="; Path=/; HttpOnly
>>>> Content-Type: application/octet-stream
>>>> Location: http://<datanode3>:1022/webhdfs/v1/docs/filename%20with%20spaces.pdf?op=OPEN&delegation=HgAFYWRtaW4FYWRtaW4AigFcO9YJ8ooBXF_ijfJFAxSBYFUnsXY3up11ZNIi4hIi__5RvRJXRUJIREZTIGRlbGVnYXRpb24PMTcyLjE4LjAuOTo4MDIw&namenoderpcaddress=<namenode>:8020&offset=0
>>>> Content-Length: 0
>>>> Server: Jetty(6.1.26.hwx)
>>>>
>>>> HTTP/1.1 200 OK
>>>> Access-Control-Allow-Methods: GET
>>>> Access-Control-Allow-Origin: *
>>>> Content-Type: application/octet-stream
>>>> Connection: close
>>>> Content-Length: 13365618
>>>>
>>>> %����1.6
>>>> <</Filter/FlateDecode/First 157/Length 5350/N 16/Type/ObjStm>>stream
>>>> ...
>>>>
>>>>
>>>> ------------------------------
>>>> *From:* larry mccay
>>>> *Sent:* 24 May 2017 18:05
>>>> *To:* [email protected]
>>>> *Subject:* Re: Encoding/escaping whitespace in WebHDFS requests
>>>>
>>>> Hi Alex -
>>>>
>>>> I notice from the audit log that the 404 is actually coming from
>>>> WebHDFS, not from Knox.
>>>> Can you confirm that direct access to WebHDFS, without going through
>>>> Knox, works with the same URL?
>>>>
>>>> thanks,
>>>>
>>>> --larry
>>>>
>>>> On Wed, May 24, 2017 at 12:32 PM, Willmer, Alex (UK Defence) wrote:
>>>>
>>>>> How should I encode space characters in the URL when I make a request
>>>>> to WebHDFS through Knox? Or should I be enabling/configuring something in
>>>>> Knox to handle them?
>>>>>
>>>>> I'm making the following request (redacted values in <>) to WebHDFS
>>>>> through Knox:
>>>>>
>>>>> curl "https://<hostname>:18443/gateway/<cluster>/webhdfs/v1/docs/filename%20with%20spaces.pdf?op=OPEN" \
>>>>>     -u <username>:<password> -k -s
>>>>>
>>>>> However, Knox is returning HTTP 404 with the following body
>>>>> (whitespace/formatting added by me):
>>>>>
>>>>> {"exception":"FileNotFoundException",
>>>>>  "javaClassName":"java.io.FileNotFoundException",
>>>>>  "message":"File /docs/filename+with+spaces.pdf not found."}}
>>>>>
>>>>> I've tried encoding the spaces as + (same result), and not encoding
>>>>> them (HTTP 400 Unknown Version).
>>>>> If I request a file whose path does not contain spaces, then it
>>>>> works.
>>>>>
>>>>> Any ideas?
>>>>>
>>>>> With thanks, Alex
>>>>>
>>>>>
>>>>> PS In anticipation of queries: I'm using Knox 0.11.0 with OpenJDK
>>>>> 1.8.0_131 on CentOS 7, with an HDP 2.6 (Hadoop 2.7.x) cluster. Kerberos is
>>>>> enabled in the cluster.
>>>>>
>>>>> The (redacted) response headers for the %20 encoded request
>>>>>
>>>>> < HTTP/1.1 404 Not Found
>>>>> < Date: Wed, 24 May 2017 15:34:26 GMT
>>>>> < Set-Cookie: JSESSIONID=15acwo8gt9qr8gdbvk48y9yjh;Path=/gateway/<cluster>;Secure;HttpOnly
>>>>> < Expires: Thu, 01 Jan 1970 00:00:00 GMT
>>>>> < Set-Cookie: rememberMe=deleteMe; Path=/gateway/cysafa; Max-Age=0; Expires=Tue, 23-May-2017 15:34:26 GMT
>>>>> < Cache-Control: no-cache
>>>>> < Expires: Wed, 24 May 2017 15:34:26 GMT
>>>>> < Date: Wed, 24 May 2017 15:34:26 GMT
>>>>> < Pragma: no-cache
>>>>> < Expires: Wed, 24 May 2017 15:34:26 GMT
>>>>> < Date: Wed, 24 May 2017 15:34:26 GMT
>>>>> < Pragma: no-cache
>>>>> < X-FRAME-OPTIONS: SAMEORIGIN
>>>>> < Content-Type: application/json; charset=UTF-8
>>>>> < Server: Jetty(6.1.26.hwx)
>>>>> < Content-Length: 252
>>>>>
>>>>> The (redacted) Knox logs for the %20 encoded request
>>>>>
>>>>> ==> /var/log/hadoop/knox/gateway-audit.log <==
>>>>> 17/05/24 15:51:05 ||88ce58ea-d7c5-46cd-a87a-c2f96b38130e|audit|WEBHDFS||||access|uri|/gateway/<cluster>/webhdfs/v1/docs/filename with spaces.pdf?op=OPEN|unavailable|Request method: GET
>>>>> 17/05/24 15:51:05 ||88ce58ea-d7c5-46cd-a87a-c2f96b38130e|audit|WEBHDFS|<username>|||authentication|uri|/gateway/<cluster>/webhdfs/v1/docs/filename with spaces.pdf?op=OPEN|success|
>>>>> 17/05/24 15:51:05 ||88ce58ea-d7c5-46cd-a87a-c2f96b38130e|audit|WEBHDFS|<username>|||authentication|uri|/gateway/<cluster>/webhdfs/v1/docs/filename with spaces.pdf?op=OPEN|success|Groups: []
>>>>> 17/05/24 15:51:05 ||88ce58ea-d7c5-46cd-a87a-c2f96b38130e|audit|WEBHDFS|<username>|||authorization|uri|/gateway/<cluster>/webhdfs/v1/docs/filename with spaces.pdf?op=OPEN|success|
>>>>> 17/05/24 15:51:05 ||88ce58ea-d7c5-46cd-a87a-c2f96b38130e|audit|WEBHDFS|<username>|||dispatch|uri|http://<namenode>.<cluster>:50070/webhdfs/v1/docs/filename+with+spaces.pdf?op=OPEN&doAs=<username>|unavailable|Request method: GET
>>>>> 17/05/24 15:51:05 ||88ce58ea-d7c5-46cd-a87a-c2f96b38130e|audit|WEBHDFS|<username>|||dispatch|uri|http://<namenode>.<cluster>:50070/webhdfs/v1/docs/filename+with+spaces.pdf?op=OPEN&doAs=<username>|success|Response status: 404
>>>>> 17/05/24 15:51:05 ||88ce58ea-d7c5-46cd-a87a-c2f96b38130e|audit|WEBHDFS|<username>|||access|uri|/gateway/<cluster>/webhdfs/v1/docs/filename with spaces.pdf?op=OPEN|success|Response status: 404
>>>>>
>>>>> ==> /var/log/hadoop/knox/gateway.log <==
>>>>> 2017-05-24 15:51:05,254 INFO  hadoop.gateway (KnoxLdapRealm.java:getUserDn(691)) - Computed userDn: uid=<username>,cn=users,cn=accounts,dc=<cluster> using dnTemplate for principal: <username>
>>>>> 2017-05-24 15:51:05,259 INFO  hadoop.gateway (AclsAuthorizationFilter.java:doFilter(85)) - Access Granted: true
>>>>>
>>>>> The (redacted) topology
>>>>>
>>>>> <topology>
>>>>>     <gateway>
>>>>>         <provider>
>>>>>             <role>authentication</role>
>>>>>             <name>ShiroProvider</name>
>>>>>             <enabled>true</enabled>
>>>>>             <param>
>>>>>                 <name>sessionTimeout</name>
>>>>>                 <value>30</value>
>>>>>             </param>
>>>>>             <param>
>>>>>                 <name>main.ldapRealm</name>
>>>>>                 <value>org.apache.hadoop.gateway.shirorealm.KnoxLdapRealm</value>
>>>>>             </param>
>>>>>             <param>
>>>>>                 <name>main.ldapContextFactory</name>
>>>>>                 <value>org.apache.hadoop.gateway.shirorealm.KnoxLdapContextFactory</value>
>>>>>             </param>
>>>>>             <param>
>>>>>                 <name>main.ldapRealm.contextFactory</name>
>>>>>                 <value>$ldapContextFactory</value>
>>>>>             </param>
>>>>>             <param>
>>>>>                 <name>main.ldapRealm.userDnTemplate</name>
>>>>>                 <value>uid={0},cn=users,cn=accounts,dc=<cluster></value>
>>>>>             </param>
>>>>>             <param>
>>>>>                 <name>main.ldapRealm.contextFactory.url</name>
>>>>>                 <value>ldap://<freeipa_node>:389</value>
>>>>>             </param>
>>>>>             <param>
>>>>>                 <name>main.ldapRealm.contextFactory.authenticationMechanism</name>
>>>>>                 <value>simple</value>
>>>>>             </param>
>>>>>             <param>
>>>>>                 <name>urls./**</name>
>>>>>                 <value>authcBasic</value>
>>>>>             </param>
>>>>>         </provider>
>>>>>         <provider>
>>>>>             <role>authorization</role>
>>>>>             <name>AclsAuthz</name>
>>>>>             <enabled>true</enabled>
>>>>>             <param>
>>>>>                 <name>knox.acl</name>
>>>>>                 <value>admin;*;*</value>
>>>>>             </param>
>>>>>         </provider>
>>>>>         <provider>
>>>>>             <role>identity-assertion</role>
>>>>>             <name>Default</name>
>>>>>             <enabled>true</enabled>
>>>>>         </provider>
>>>>>         <provider>
>>>>>             <role>hostmap</role>
>>>>>             <name>static</name>
>>>>>             <enabled>false</enabled>
>>>>>             <param><name>localhost</name><value>sandbox,sandbox.hortonworks.com</value></param>
>>>>>         </provider>
>>>>>     </gateway>
>>>>>
>>>>>     <service>
>>>>>         <role>WEBHDFS</role>
>>>>>         <url>http://<namenode>:50070/webhdfs</url>
>>>>>     </service>
>>>>>
>>>>>     <service>
>>>>>         <role>SOLRAPI</role>
>>>>>         <url>http://<solrnode>:6083/solr</url>
>>>>>     </service>
>>>>> </topology>
>>>>>
>>>>
>>>
>>
>
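The audit log in this thread shows the incoming path decoded to "filename with spaces.pdf" and then dispatched to WebHDFS as "filename+with+spaces.pdf". That is consistent with the path being re-encoded using form encoding (application/x-www-form-urlencoded, where a space becomes +) instead of URL path percent-encoding (where a space becomes %20). In a URL path, + has no special meaning, so WebHDFS looks for a file literally named "filename+with+spaces.pdf", which matches the "File /docs/filename+with+spaces.pdf not found." error. The Python sketch below only illustrates that difference with the standard library's urllib.parse; it is not Knox's dispatch code, and the helper hdfs_path_seen is a hypothetical model that assumes the server decodes %XX escapes but leaves + untouched.

```python
from urllib.parse import quote, quote_plus, unquote

# The file path from this thread, as the client sent it (spaces as %20).
client_path = "/docs/filename%20with%20spaces.pdf"

# A gateway typically decodes the incoming path before rewriting it.
decoded = unquote(client_path)  # "/docs/filename with spaces.pdf"

# Form encoding (application/x-www-form-urlencoded): space -> '+'.
# This reproduces the dispatched path seen in the audit log.
form_encoded = quote_plus(decoded, safe="/")

# Path percent-encoding: space -> '%20', '/' kept as a segment separator.
path_encoded = quote(decoded, safe="/")


def hdfs_path_seen(encoded_path: str) -> str:
    """Hypothetical model of the server side (an assumption, not WebHDFS code):
    %XX escapes are decoded, but '+' is kept literally because it is not
    special in a URL path."""
    return unquote(encoded_path)  # unquote(), unlike unquote_plus(), leaves '+' alone


print(form_encoded)                  # /docs/filename+with+spaces.pdf
print(hdfs_path_seen(form_encoded))  # /docs/filename+with+spaces.pdf (no such file, hence the 404)
print(path_encoded)                  # /docs/filename%20with%20spaces.pdf
print(hdfs_path_seen(path_encoded))  # /docs/filename with spaces.pdf (the file that exists)
```

Under that assumption, a proxy that decodes a path and re-encodes it needs path-style encoding (quote with safe="/" here) rather than form encoding. The same pitfall exists in Java, where java.net.URLEncoder.encode implements form encoding and therefore also turns spaces into +.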
