Thanks Larry, yeah, I had stumbled upon KNOX-709. Way more detail is in the thread
"WebHBase URL Encoding Issue". I didn't want to hijack this thread in case the
WebHBase issue isn't related.

Kevin Risden

On Wed, May 24, 2017 at 6:08 PM, larry mccay <[email protected]> wrote:

> Hi Kevin -
>
> You may see some changes related to
> https://issues.apache.org/jira/browse/KNOX-709.
>
> thanks,
>
> --larry
>
> On Wed, May 24, 2017 at 6:24 PM, Kevin Risden <[email protected]>
> wrote:
>
>> Just saw this as I was submitting a potentially related WebHBase URL
>> encoding email to the knox-user list. Curious if they are related.
>>
>> Alex - out of curiosity, did you use Knox with HDP 2.4 or earlier and not
>> see this issue?
>>
>> Kevin Risden
>>
>> On Wed, May 24, 2017 at 4:08 PM, larry mccay <[email protected]>
>> wrote:
>>
>>> Thank you, Alex.
>>>
>>> Please file a JIRA for this with these details.
>>> I will try to reproduce and investigate, and see if we can get it
>>> fixed, or find a workaround, for the 0.13.0 release,
>>> which is planned for the end of next week.
>>>
>>> On Wed, May 24, 2017 at 3:18 PM, Willmer, Alex (UK Defence) <
>>> [email protected]> wrote:
>>>
>>>> Hi Larry,
>>>>
>>>> The same file does work directly from WebHDFS (see below). Looking more
>>>> closely at the logs I sent previously, it looks like Knox (or something in
>>>> the chain I'm unaware of) is decoding the %20-encoded spaces and then
>>>> re-encoding them as +, i.e.:
>>>>
>>>> 17/05/24 15:51:05 ||88ce58ea-d7c5-46cd-a87a-c2f96b38130e|audit|WEBHDFS||||access|uri|/gateway/<cluster>/webhdfs/v1/docs/filename with spaces.pdf?op=OPEN|unavailable|Request method: GET
>>>> ..
>>>> 17/05/24 15:51:05 ||88ce58ea-d7c5-46cd-a87a-c2f96b38130e|audit|WEBHDFS|<username>|||dispatch|uri|http://<namenode>.<cluster>:50070/webhdfs/v1/docs/filename+with+spaces.pdf?op=OPEN&doAs=<username>|success|Response status: 404
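A minimal bash sketch of the round trip described above, treated purely as a string transformation: percent-decode the %20-encoded name, then re-encode it with application/x-www-form-urlencoded rules, where a space becomes +. The function names percent_decode and form_encode are hypothetical, and this is not a claim about how Knox's dispatch code is actually written; it only shows why the two encodings disagree.

```shell
#!/usr/bin/env bash
# Hypothetical illustration of the transformation visible in the audit log:
# the client sends %20, the path is percent-decoded, and it is then
# re-encoded with form rules (space -> "+"). Not Knox's actual code.

# Percent-decode: each %XX becomes the byte 0xXX; a "+" stays a literal "+",
# which is how a URL path is interpreted. (Sketch only: a backslash in the
# input would also be treated as an escape by printf %b.)
percent_decode() {
  printf '%b\n' "$(printf '%s' "$1" | sed 's/%/\\x/g')"
}

# Form-encode: space -> "+", keep A-Z a-z 0-9 . - * _ (the set Java's
# URLEncoder leaves unchanged), and %XX-encode every other byte.
form_encode() {
  local LC_ALL=C   # walk the string byte by byte
  local s="$1" out="" c i
  for (( i = 0; i < ${#s}; i++ )); do
    c="${s:i:1}"
    case "$c" in
      ' ') out+="+" ;;
      [A-Za-z0-9.*_-]) out+="$c" ;;
      *) printf -v c '%%%02X' "'$c"; out+="$c" ;;
    esac
  done
  printf '%s\n' "$out"
}

sent="filename%20with%20spaces.pdf"
decoded="$(percent_decode "$sent")"     # filename with spaces.pdf
dispatched="$(form_encode "$decoded")"  # filename+with+spaces.pdf
printf '%s\n' "$dispatched"
```

In a URL path, + has no special meaning, so a request re-encoded this way names a file literally called filename+with+spaces.pdf, which is consistent with the FileNotFoundException in the original report.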
>>>>
>>>> With thanks, Alex
>>>>
>>>>
>>>> Direct WebHDFS request (hostnames redacted)
>>>>
>>>> # curl -si -u: "http://<namenode>:50070/webhdfs/v1/docs/filename%20with%20spaces.pdf?op=OPEN" --negotiate -L | head -n40
>>>> HTTP/1.1 401 Authentication required
>>>> Cache-Control: must-revalidate,no-cache,no-store
>>>> Date: Wed, 24 May 2017 19:01:41 GMT
>>>> Pragma: no-cache
>>>> Date: Wed, 24 May 2017 19:01:41 GMT
>>>> Pragma: no-cache
>>>> X-FRAME-OPTIONS: SAMEORIGIN
>>>> WWW-Authenticate: Negotiate
>>>> Set-Cookie: hadoop.auth=; Path=/; HttpOnly
>>>> Content-Type: text/html; charset=iso-8859-1
>>>> Content-Length: 1533
>>>> Server: Jetty(6.1.26.hwx)
>>>>
>>>> HTTP/1.1 307 TEMPORARY_REDIRECT
>>>> Cache-Control: no-cache
>>>> Expires: Wed, 24 May 2017 19:01:42 GMT
>>>> Date: Wed, 24 May 2017 19:01:42 GMT
>>>> Pragma: no-cache
>>>> Expires: Wed, 24 May 2017 19:01:42 GMT
>>>> Date: Wed, 24 May 2017 19:01:42 GMT
>>>> Pragma: no-cache
>>>> X-FRAME-OPTIONS: SAMEORIGIN
>>>> WWW-Authenticate: Negotiate YGkGCSqGSIb3EgECAgIAb1owWKADAgEFoQMCAQ+iTDBKoAMCARKiQwRBQM/auuLcl2xey6wMp6EjCPJFSqK3snscxMzW7RvfgxOo7182GzD5N9jf+OWGr+tjpvlRX0c/7iTBfYKSetf4ekU=
>>>> Set-Cookie: hadoop.auth="u=admin&p=admin@CYSAFA&t=kerberos&e=1495688502002&s=b7p35TgaxItAUTkKJuSXuynoq9E="; Path=/; HttpOnly
>>>> Content-Type: application/octet-stream
>>>> Location: http://<datanode3>:1022/webhdfs/v1/docs/filename%20with%20spaces.pdf?op=OPEN&delegation=HgAFYWRtaW4FYWRtaW4AigFcO9YJ8ooBXF_ijfJFAxSBYFUnsXY3up11ZNIi4hIi__5RvRJXRUJIREZTIGRlbGVnYXRpb24PMTcyLjE4LjAuOTo4MDIw&namenoderpcaddress=<namenode>:8020&offset=0
>>>> Content-Length: 0
>>>> Server: Jetty(6.1.26.hwx)
>>>>
>>>> HTTP/1.1 200 OK
>>>> Access-Control-Allow-Methods: GET
>>>> Access-Control-Allow-Origin: *
>>>> Content-Type: application/octet-stream
>>>> Connection: close
>>>> Content-Length: 13365618
>>>>
>>>> %����1.6
>>>> <</Filter/FlateDecode/First 157/Length 5350/N 16/Type/ObjStm>>stream
>>>> ...
>>>>
>>>>
>>>> ------------------------------
>>>> *From:* larry mccay [[email protected]]
>>>> *Sent:* 24 May 2017 18:05
>>>> *To:* [email protected]
>>>> *Subject:* Re: Encoding/escaping whitespace in WebHDFS requests
>>>>
>>>> Hi Alex -
>>>>
>>>> I notice from the audit log that the 404 is actually coming from
>>>> WebHDFS, not from Knox.
>>>> Can you confirm that direct access to WebHDFS, without going through
>>>> Knox, works with the same URL?
>>>>
>>>> thanks,
>>>>
>>>> --larry
>>>>
>>>> On Wed, May 24, 2017 at 12:32 PM, Willmer, Alex (UK Defence) <
>>>> [email protected]> wrote:
>>>>
>>>>> How should I encode space characters in the URL when I make a request
>>>>> to WebHDFS through Knox? Or should I be enabling/configuring something in
>>>>> Knox to handle them?
>>>>>
>>>>> I'm making the following request to WebHDFS through Knox (redacted
>>>>> values in <>):
>>>>>
>>>>> curl "https://<hostname>:18443/gateway/<cluster>/webhdfs/v1/docs/filename%20with%20spaces.pdf?op=OPEN" \
>>>>>      -u <username>:<password> -k -s
>>>>>
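For reference, a minimal bash sketch of producing the %20 form programmatically rather than by hand. The helper name encode_path_segment is hypothetical; it follows RFC 3986, keeping only unreserved characters and percent-encoding everything else (including /), so a multi-segment path has to be encoded one segment at a time. It yields exactly the URL used in the request above, which suggests the client-side encoding is not where the problem lies.

```shell
#!/usr/bin/env bash
# Hypothetical helper: percent-encode a single URL path segment per RFC 3986.
# Unreserved characters (A-Z a-z 0-9 - . _ ~) pass through; every other byte
# becomes %XX, so a space is always sent as %20, never "+".
# "/" is encoded too, so build multi-segment paths one segment at a time.
encode_path_segment() {
  local LC_ALL=C   # walk the string byte by byte
  local s="$1" out="" c i
  for (( i = 0; i < ${#s}; i++ )); do
    c="${s:i:1}"
    case "$c" in
      [A-Za-z0-9._~-]) out+="$c" ;;
      *) printf -v c '%%%02X' "'$c"; out+="$c" ;;
    esac
  done
  printf '%s\n' "$out"
}

# Build the request URL from this thread with only the filename encoded.
name="filename with spaces.pdf"
url="https://<hostname>:18443/gateway/<cluster>/webhdfs/v1/docs/$(encode_path_segment "$name")?op=OPEN"
printf '%s\n' "$url"
# https://<hostname>:18443/gateway/<cluster>/webhdfs/v1/docs/filename%20with%20spaces.pdf?op=OPEN
```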
>>>>> However, Knox returns HTTP 404 with the following body
>>>>> (whitespace/formatting added by me):
>>>>>
>>>>> {"RemoteException":
>>>>>  {"exception":"FileNotFoundException",
>>>>>   "javaClassName":"java.io.FileNotFoundException",
>>>>>   "message":"File /docs/filename+with+spaces.pdf not found."}}
>>>>>
>>>>> I've also tried encoding the spaces as + (same result) and leaving them
>>>>> unencoded (HTTP 400 Unknown Version).
>>>>> Requests for files whose paths contain no spaces work.
>>>>>
>>>>> Any ideas?
>>>>>
>>>>> With thanks, Alex
>>>>>
>>>>>
>>>>>
>>>>> PS In anticipation of queries: I'm using Knox 0.11.0 with OpenJDK
>>>>> 1.8.0_131 on CentOS 7, with an HDP 2.6 (Hadoop 2.7.x) cluster. Kerberos is
>>>>> enabled in the cluster.
>>>>>
>>>>> The (redacted) response headers for the %20 encoded request
>>>>>
>>>>> < HTTP/1.1 404 Not Found
>>>>> < Date: Wed, 24 May 2017 15:34:26 GMT
>>>>> < Set-Cookie: JSESSIONID=15acwo8gt9qr8gdbvk48y9yjh;Path=/gateway/<cluster>;Secure;HttpOnly
>>>>> < Expires: Thu, 01 Jan 1970 00:00:00 GMT
>>>>> < Set-Cookie: rememberMe=deleteMe; Path=/gateway/cysafa; Max-Age=0; Expires=Tue, 23-May-2017 15:34:26 GMT
>>>>> < Cache-Control: no-cache
>>>>> < Expires: Wed, 24 May 2017 15:34:26 GMT
>>>>> < Date: Wed, 24 May 2017 15:34:26 GMT
>>>>> < Pragma: no-cache
>>>>> < Expires: Wed, 24 May 2017 15:34:26 GMT
>>>>> < Date: Wed, 24 May 2017 15:34:26 GMT
>>>>> < Pragma: no-cache
>>>>> < X-FRAME-OPTIONS: SAMEORIGIN
>>>>> < Content-Type: application/json; charset=UTF-8
>>>>> < Server: Jetty(6.1.26.hwx)
>>>>> < Content-Length: 252
>>>>>
>>>>> The (redacted) Knox logs for the %20 encoded request
>>>>>
>>>>> ==> /var/log/hadoop/knox/gateway-audit.log <==
>>>>> 17/05/24 15:51:05 ||88ce58ea-d7c5-46cd-a87a-c2f96b38130e|audit|WEBHDFS||||access|uri|/gateway/<cluster>/webhdfs/v1/docs/filename with spaces.pdf?op=OPEN|unavailable|Request method: GET
>>>>> 17/05/24 15:51:05 ||88ce58ea-d7c5-46cd-a87a-c2f96b38130e|audit|WEBHDFS|<username>|||authentication|uri|/gateway/<cluster>/webhdfs/v1/docs/filename with spaces.pdf?op=OPEN|success|
>>>>> 17/05/24 15:51:05 ||88ce58ea-d7c5-46cd-a87a-c2f96b38130e|audit|WEBHDFS|<username>|||authentication|uri|/gateway/<cluster>/webhdfs/v1/docs/filename with spaces.pdf?op=OPEN|success|Groups: []
>>>>> 17/05/24 15:51:05 ||88ce58ea-d7c5-46cd-a87a-c2f96b38130e|audit|WEBHDFS|<username>|||authorization|uri|/gateway/<cluster>/webhdfs/v1/docs/filename with spaces.pdf?op=OPEN|success|
>>>>> 17/05/24 15:51:05 ||88ce58ea-d7c5-46cd-a87a-c2f96b38130e|audit|WEBHDFS|<username>|||dispatch|uri|http://<namenode>.<cluster>:50070/webhdfs/v1/docs/filename+with+spaces.pdf?op=OPEN&doAs=<username>|unavailable|Request method: GET
>>>>> 17/05/24 15:51:05 ||88ce58ea-d7c5-46cd-a87a-c2f96b38130e|audit|WEBHDFS|<username>|||dispatch|uri|http://<namenode>.<cluster>:50070/webhdfs/v1/docs/filename+with+spaces.pdf?op=OPEN&doAs=<username>|success|Response status: 404
>>>>> 17/05/24 15:51:05 ||88ce58ea-d7c5-46cd-a87a-c2f96b38130e|audit|WEBHDFS|<username>|||access|uri|/gateway/<cluster>/webhdfs/v1/docs/filename with spaces.pdf?op=OPEN|success|Response status: 404
>>>>>
>>>>> ==> /var/log/hadoop/knox/gateway.log <==
>>>>> 2017-05-24 15:51:05,254 INFO  hadoop.gateway (KnoxLdapRealm.java:getUserDn(691)) - Computed userDn: uid=<username>,cn=users,cn=accounts,dc=<cluster> using dnTemplate for principal: <username>
>>>>> 2017-05-24 15:51:05,259 INFO  hadoop.gateway (AclsAuthorizationFilter.java:doFilter(85)) - Access Granted: true
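A minimal bash/awk sketch for spotting this symptom in the audit log: it prints each distinct outbound dispatch path (query string stripped) that contains a literal +. It assumes each record sits on one line in the real gateway-audit.log (the excerpt above was wrapped by the mail client) and that fields are |-separated with the action in field 9 and the URI in field 11, as in these records; the function name find_plus_in_dispatch_paths is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print each distinct dispatch path in a Knox audit log
# that contains a literal "+". Assumes one record per line, "|"-separated,
# action in field 9 and URI in field 11 (as in the records in this thread).
# Usage: find_plus_in_dispatch_paths /var/log/hadoop/knox/gateway-audit.log
find_plus_in_dispatch_paths() {
  awk -F'|' '
    $9 == "dispatch" {
      path = $11
      sub(/[?].*/, "", path)   # drop the query string (op=OPEN&doAs=...)
      if (index(path, "+") > 0 && !seen[path]++) print path
    }' "$1"
}
```

Against the records above it would report the dispatched .../filename+with+spaces.pdf path once, even though two dispatch records (unavailable and success) mention it, while the inbound access records for the same request show plain spaces.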
>>>>>
>>>>> The (redacted) topology
>>>>>
>>>>> <topology>
>>>>>     <gateway>
>>>>>         <provider>
>>>>>             <role>authentication</role>
>>>>>             <name>ShiroProvider</name>
>>>>>             <enabled>true</enabled>
>>>>>             <param>
>>>>>                 <name>sessionTimeout</name>
>>>>>                 <value>30</value>
>>>>>             </param>
>>>>>             <param>
>>>>>                 <name>main.ldapRealm</name>
>>>>>                 <value>org.apache.hadoop.gateway.shirorealm.KnoxLdapRealm</value>
>>>>>             </param>
>>>>>             <param>
>>>>>                 <name>main.ldapContextFactory</name>
>>>>>                 <value>org.apache.hadoop.gateway.shirorealm.KnoxLdapContextFactory</value>
>>>>>             </param>
>>>>>             <param>
>>>>>                 <name>main.ldapRealm.contextFactory</name>
>>>>>                 <value>$ldapContextFactory</value>
>>>>>             </param>
>>>>>             <param>
>>>>>                 <name>main.ldapRealm.userDnTemplate</name>
>>>>>                 <value>uid={0},cn=users,cn=accounts,dc=<cluster></value>
>>>>>             </param>
>>>>>             <param>
>>>>>                 <name>main.ldapRealm.contextFactory.url</name>
>>>>>                 <value>ldap://<freeipa_node>:389</value>
>>>>>             </param>
>>>>>             <param>
>>>>>                 <name>main.ldapRealm.contextFactory.authenticationMechanism</name>
>>>>>                 <value>simple</value>
>>>>>             </param>
>>>>>             <param>
>>>>>                 <name>urls./**</name>
>>>>>                 <value>authcBasic</value>
>>>>>             </param>
>>>>>         </provider>
>>>>>         <provider>
>>>>>             <role>authorization</role>
>>>>>             <name>AclsAuthz</name>
>>>>>             <enabled>true</enabled>
>>>>>             <param>
>>>>>                 <name>knox.acl</name>
>>>>>                 <value>admin;*;*</value>
>>>>>             </param>
>>>>>         </provider>
>>>>>         <provider>
>>>>>             <role>identity-assertion</role>
>>>>>             <name>Default</name>
>>>>>             <enabled>true</enabled>
>>>>>         </provider>
>>>>>         <provider>
>>>>>             <role>hostmap</role>
>>>>>             <name>static</name>
>>>>>             <enabled>false</enabled>
>>>>>             <param><name>localhost</name><value>sandbox,sandbox.hortonworks.com</value></param>
>>>>>         </provider>
>>>>>     </gateway>
>>>>>
>>>>>     <service>
>>>>>         <role>WEBHDFS</role>
>>>>>         <url>http://<namenode>:50070/webhdfs</url>
>>>>>     </service>
>>>>>
>>>>>     <service>
>>>>>         <role>SOLRAPI</role>
>>>>>         <url>http://<solrnode>:6083/solr</url>
>>>>>     </service>
>>>>> </topology>
>>>>>
>>>>>
>>>>
>>>
>>
>
