Just saw this as I was submitting a potentially related WebHBase URL
encoding email to the knox-user list. Curious whether the two are related.

Alex - out of curiosity, did you use Knox with HDP 2.4 or prior and not see
this issue?
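
For reference, the %20 versus + difference visible in the quoted audit log
can be reproduced outside Knox. This is only a minimal sketch, assuming the
mismatch is between percent-encoding for a URL path and form-style encoding
meant for query strings. It uses Python's urllib.parse purely for
illustration; it is not the code Knox runs, and only the file path is taken
from the thread.

```python
from urllib.parse import quote, quote_plus, unquote, unquote_plus

# Path taken from the thread; everything else here is illustrative.
path = "/docs/filename with spaces.pdf"

# Percent-encoding suitable for a URL path: a space becomes %20.
path_encoded = quote(path)

# Form-style encoding (intended for query strings): a space becomes '+'.
# safe="/" keeps the slashes so the two results are directly comparable.
form_encoded = quote_plus(path, safe="/")

# Decoding as a path leaves '+' as a literal plus sign, so the decoded
# name no longer matches the file that contains spaces.
decoded_as_path = unquote(form_encoded)

# Only form-style decoding maps '+' back to a space.
decoded_as_form = unquote_plus(form_encoded)

print(path_encoded)
print(form_encoded)
print(decoded_as_path)
print(decoded_as_form)
```

If the backend decodes the path component the first way, a dispatched
"filename+with+spaces.pdf" would be looked up with literal plus signs, which
would be consistent with the FileNotFoundException in the 404 body.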

Kevin Risden

On Wed, May 24, 2017 at 4:08 PM, larry mccay <[email protected]> wrote:

> Thank you, Alex.
>
> Please file a JIRA for this with the above details.
> I will try and reproduce and investigate and see if we can't get it fixed
> or a workaround for the 0.13.0 release.
> This is planned for the end of next week.
>
> On Wed, May 24, 2017 at 3:18 PM, Willmer, Alex (UK Defence) <
> [email protected]> wrote:
>
>> Hi Larry,
>>
>> The same file does work directly from WebHDFS (see below). Looking more
>> closely at the logs I sent previously, it looks like Knox (or something in
>> the chain I'm unaware of) is decoding the %20 encoded spaces, then
>> reencoding them as + encoded, i.e.
>>
>> 17/05/24 15:51:05 ||88ce58ea-d7c5-46cd-a87a-c2f96b38130e|audit|WEBHDFS||||access|uri|/gateway/<cluster>/webhdfs/v1/docs/filename with spaces.pdf?op=OPEN|unavailable|Request method: GET
>> ..
>> 17/05/24 15:51:05 ||88ce58ea-d7c5-46cd-a87a-c2f96b38130e|audit|WEBHDFS|<username>|||dispatch|uri|http://<namenode>.<cluster>:50070/webhdfs/v1/docs/filename+with+spaces.pdf?op=OPEN&doAs=<username>|success|Response status: 404
>>
>> With thanks, Alex
>>
>>
>> Direct WebHDFS request (hostnames redacted)
>>
>> # curl -si -u: "http://<namenode>:50070/webhdfs/v1/docs/filename%20with%20spaces.pdf?op=OPEN" --negotiate -L | head -n40
>> HTTP/1.1 401 Authentication required
>> Cache-Control: must-revalidate,no-cache,no-store
>> Date: Wed, 24 May 2017 19:01:41 GMT
>> Pragma: no-cache
>> Date: Wed, 24 May 2017 19:01:41 GMT
>> Pragma: no-cache
>> X-FRAME-OPTIONS: SAMEORIGIN
>> WWW-Authenticate: Negotiate
>> Set-Cookie: hadoop.auth=; Path=/; HttpOnly
>> Content-Type: text/html; charset=iso-8859-1
>> Content-Length: 1533
>> Server: Jetty(6.1.26.hwx)
>>
>> HTTP/1.1 307 TEMPORARY_REDIRECT
>> Cache-Control: no-cache
>> Expires: Wed, 24 May 2017 19:01:42 GMT
>> Date: Wed, 24 May 2017 19:01:42 GMT
>> Pragma: no-cache
>> Expires: Wed, 24 May 2017 19:01:42 GMT
>> Date: Wed, 24 May 2017 19:01:42 GMT
>> Pragma: no-cache
>> X-FRAME-OPTIONS: SAMEORIGIN
>> WWW-Authenticate: Negotiate YGkGCSqGSIb3EgECAgIAb1owWKADAg
>> EFoQMCAQ+iTDBKoAMCARKiQwRBQM/auuLcl2xey6wMp6EjCPJFSqK3snscxM
>> zW7RvfgxOo7182GzD5N9jf+OWGr+tjpvlRX0c/7iTBfYKSetf4ekU=
>> Set-Cookie: hadoop.auth="u=admin&p=admin@CYSAFA&t=kerberos&e=14956885020
>> 02&s=b7p35TgaxItAUTkKJuSXuynoq9E="; Path=/; HttpOnly
>> Content-Type: application/octet-stream
>> Location: http://<datanode3>:1022/webhdfs/v1/docs/filename%20with%
>> 20spaces.pdf?op=OPEN&delegation=HgAFYWRtaW4FYWRtaW4AigFcO9YJ
>> 8ooBXF_ijfJFAxSBYFUnsXY3up11ZNIi4hIi__5RvRJXRUJIREZTIGRlbGVn
>> YXRpb24PMTcyLjE4LjAuOTo4MDIw&namenoderpcaddress=<namenode>:8020&offset=0
>> Content-Length: 0
>> Server: Jetty(6.1.26.hwx)
>>
>> HTTP/1.1 200 OK
>> Access-Control-Allow-Methods: GET
>> Access-Control-Allow-Origin: *
>> Content-Type: application/octet-stream
>> Connection: close
>> Content-Length: 13365618
>>
>> %����1.6
>> <</Filter/FlateDecode/First 157/Length 5350/N 16/Type/ObjStm>>stream
>> ...
>>
>>
>> ------------------------------
>> *From:* larry mccay [[email protected]]
>> *Sent:* 24 May 2017 18:05
>> *To:* [email protected]
>> *Subject:* Re: Encoding/escaping whitespace in WebHDFS requests
>>
>> Hi Alex -
>>
>> I notice from the audit log that the 404 is actually coming from WebHDFS
>> not from Knox.
>> Can you confirm that direct access to WebHDFS without going through Knox
>> works with the same URL?
>>
>> thanks,
>>
>> --larry
>>
>> On Wed, May 24, 2017 at 12:32 PM, Willmer, Alex (UK Defence) <
>> [email protected]> wrote:
>>
>>> How should I encode space characters in the URL when I make a request
>>> to WebHDFS through Knox? Or should I be enabling/configuring something in
>>> Knox to handle them?
>>>
>>> I'm making the following (redacted values in <>) request to WebHDFS,
>>> through Knox
>>>
>>> curl "https://<hostname>:18443/gateway/<cluster>/webhdfs/v1/docs/filename%20with%20spaces.pdf?op=OPEN" \
>>>      -u <username>:<password> -k -s
>>>
>>> However Knox is returning HTTP 404 with the following body
>>> (whitespace/formatting added by me)
>>>
>>> {"exception":"FileNotFoundException",
>>>  "javaClassName":"java.io.FileNotFoundException",
>>>  "message":"File /docs/filename+with+spaces.pdf not found."}}
>>>
>>> I've tried encoding the spaces as + (same result), and not encoding them
>>> (HTTP 400 Unknown Version).
>>> If I request a file for which the path does not contain spaces then it
>>> works.
>>>
>>> Any ideas?
>>>
>>> With thanks, Alex
>>>
>>>
>>>
>>> PS In anticipation of queries: I'm using Knox 0.11.0 with OpenJDK
>>> 1.8.0_131 on CentOS 7, with an HDP 2.6 (Hadoop 2.7.x) cluster. Kerberos is
>>> enabled in the cluster.
>>>
>>> The (redacted) response headers for the %20 encoded request
>>>
>>> < HTTP/1.1 404 Not Found
>>> < Date: Wed, 24 May 2017 15:34:26 GMT
>>> < Set-Cookie: JSESSIONID=15acwo8gt9qr8gdbvk4
>>> 8y9yjh;Path=/gateway/<cluster>;Secure;HttpOnly
>>> < Expires: Thu, 01 Jan 1970 00:00:00 GMT
>>> < Set-Cookie: rememberMe=deleteMe; Path=/gateway/cysafa; Max-Age=0;
>>> Expires=Tue, 23-May-2017 15:34:26 GMT
>>> < Cache-Control: no-cache
>>> < Expires: Wed, 24 May 2017 15:34:26 GMT
>>> < Date: Wed, 24 May 2017 15:34:26 GMT
>>> < Pragma: no-cache
>>> < Expires: Wed, 24 May 2017 15:34:26 GMT
>>> < Date: Wed, 24 May 2017 15:34:26 GMT
>>> < Pragma: no-cache
>>> < X-FRAME-OPTIONS: SAMEORIGIN
>>> < Content-Type: application/json; charset=UTF-8
>>> < Server: Jetty(6.1.26.hwx)
>>> < Content-Length: 252
>>>
>>> The (redacted) Knox logs for the %20 encoded request
>>>
>>> ==> /var/log/hadoop/knox/gateway-audit.log <==
>>> 17/05/24 15:51:05 ||88ce58ea-d7c5-46cd-a87a-c2f96b38130e|audit|WEBHDFS||||access|uri|/gateway/<cluster>/webhdfs/v1/docs/filename with spaces.pdf?op=OPEN|unavailable|Request method: GET
>>> 17/05/24 15:51:05 ||88ce58ea-d7c5-46cd-a87a-c2f96b38130e|audit|WEBHDFS|<username>|||authentication|uri|/gateway/<cluster>/webhdfs/v1/docs/filename with spaces.pdf?op=OPEN|success|
>>> 17/05/24 15:51:05 ||88ce58ea-d7c5-46cd-a87a-c2f96b38130e|audit|WEBHDFS|<username>|||authentication|uri|/gateway/<cluster>/webhdfs/v1/docs/filename with spaces.pdf?op=OPEN|success|Groups: []
>>> 17/05/24 15:51:05 ||88ce58ea-d7c5-46cd-a87a-c2f96b38130e|audit|WEBHDFS|<username>|||authorization|uri|/gateway/<cluster>/webhdfs/v1/docs/filename with spaces.pdf?op=OPEN|success|
>>> 17/05/24 15:51:05 ||88ce58ea-d7c5-46cd-a87a-c2f96b38130e|audit|WEBHDFS|<username>|||dispatch|uri|http://<namenode>.<cluster>:50070/webhdfs/v1/docs/filename+with+spaces.pdf?op=OPEN&doAs=<username>|unavailable|Request method: GET
>>> 17/05/24 15:51:05 ||88ce58ea-d7c5-46cd-a87a-c2f96b38130e|audit|WEBHDFS|<username>|||dispatch|uri|http://<namenode>.<cluster>:50070/webhdfs/v1/docs/filename+with+spaces.pdf?op=OPEN&doAs=<username>|success|Response status: 404
>>> 17/05/24 15:51:05 ||88ce58ea-d7c5-46cd-a87a-c2f96b38130e|audit|WEBHDFS|<username>|||access|uri|/gateway/<cluster>/webhdfs/v1/docs/filename with spaces.pdf?op=OPEN|success|Response status: 404
>>>
>>> ==> /var/log/hadoop/knox/gateway.log <==
>>> 2017-05-24 15:51:05,254 INFO  hadoop.gateway
>>> (KnoxLdapRealm.java:getUserDn(691)) - Computed userDn:
>>> uid=<username>,cn=users,cn=accounts,dc=<cluster> using dnTemplate for
>>> principal: <username>
>>> 2017-05-24 15:51:05,259 INFO  hadoop.gateway
>>> (AclsAuthorizationFilter.java:doFilter(85)) - Access Granted: true
>>>
>>> The (redacted) topology
>>>
>>> <topology>
>>>     <gateway>
>>>         <provider>
>>>             <role>authentication</role>
>>>             <name>ShiroProvider</name>
>>>             <enabled>true</enabled>
>>>             <param>
>>>                 <name>sessionTimeout</name>
>>>                 <value>30</value>
>>>             </param>
>>>             <param>
>>>                 <name>main.ldapRealm</name>
>>>                 <value>org.apache.hadoop.gateway.shirorealm.KnoxLdapRealm</value>
>>>             </param>
>>>             <param>
>>>                 <name>main.ldapContextFactory</name>
>>>                 <value>org.apache.hadoop.gateway.shirorealm.KnoxLdapContextFactory</value>
>>>             </param>
>>>             <param>
>>>                 <name>main.ldapRealm.contextFactory</name>
>>>                 <value>$ldapContextFactory</value>
>>>             </param>
>>>             <param>
>>>                 <name>main.ldapRealm.userDnTemplate</name>
>>>                 <value>uid={0},cn=users,cn=accounts,dc=<cluster></value>
>>>             </param>
>>>             <param>
>>>                 <name>main.ldapRealm.contextFactory.url</name>
>>>                 <value>ldap://<freeipa_node>:389</value>
>>>             </param>
>>>             <param>
>>>                 <name>main.ldapRealm.contextFactory.authenticationMechanism</name>
>>>                 <value>simple</value>
>>>             </param>
>>>             <param>
>>>                 <name>urls./**</name>
>>>                 <value>authcBasic</value>
>>>             </param>
>>>         </provider>
>>>         <provider>
>>>             <role>authorization</role>
>>>             <name>AclsAuthz</name>
>>>             <enabled>true</enabled>
>>>             <param>
>>>                 <name>knox.acl</name>
>>>                 <value>admin;*;*</value>
>>>             </param>
>>>         </provider>
>>>         <provider>
>>>             <role>identity-assertion</role>
>>>             <name>Default</name>
>>>             <enabled>true</enabled>
>>>         </provider>
>>>         <provider>
>>>             <role>hostmap</role>
>>>             <name>static</name>
>>>             <enabled>false</enabled>
>>>             <param><name>localhost</name><value>sandbox,sandbox.hortonworks.com</value></param>
>>>         </provider>
>>>     </gateway>
>>>
>>>     <service>
>>>         <role>WEBHDFS</role>
>>>         <url>http://<namenode>:50070/webhdfs</url>
>>>     </service>
>>>
>>>     <service>
>>>         <role>SOLRAPI</role>
>>>         <url>http://<solrnode>:6083/solr</url>
>>>     </service>
>>> </topology>
>>>
>>>
>>
>
