It does seem a bit odd. I was able to get a crude workaround in place by
setting up an NGINX reverse proxy in front of Livy, with some Lua code to
automatically decompress requests if needed. However, it's yet another
layer of complexity, and the irony of having a proxy in place to handle the
proxied requests doesn't escape me ;-)
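
In case it's useful to anyone, here is a rough sketch of the idea (from
memory, not our exact config); it assumes OpenResty, or nginx built with
the Lua module, plus the lua-zlib library:

location / {
    access_by_lua_block {
        local headers = ngx.req.get_headers()
        if headers["Content-Encoding"] == "gzip" then
            -- Read the body into memory (this sketch ignores the case
            -- where nginx buffers large bodies to a temp file).
            ngx.req.read_body()
            local body = ngx.req.get_body_data()
            if body then
                -- window bits 31 = 15 (max window) + 16 (gzip format)
                local zlib = require("zlib")
                local inflated = zlib.inflate(31)(body)
                -- Swap in the plain body (Content-Length is adjusted
                -- for us) and drop the header so Livy sees plain JSON.
                ngx.req.set_body_data(inflated)
                ngx.req.clear_header("Content-Encoding")
            end
        end
    }
    proxy_pass http://myserver:8999;
}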

On Thu, 24 Aug 2017 at 17:34, larry mccay <lmc...@apache.org> wrote:

> So, this is really talking about the dispatched request between Knox and
> the Livy service, not between the browser and Knox.
> This will require some investigation into what is available for HTTPClient
> and whether we need a custom dispatch or a specially configured dispatch
> for Livy through a service param.
>
> Seems really odd that the Livy server can't deal with gzip'd requests; it
> should be investigated on that side as well.
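>
> If a custom dispatch does turn out to be necessary, I assume it would be
> wired into the Livy service definition along these lines (a sketch only;
> LivyDispatch is a hypothetical class, and the route path is illustrative):
>
> <service role="LIVYSERVER" name="livy" version="0.1.0">
>   <dispatch classname="org.apache.hadoop.gateway.livy.LivyDispatch"/>
>   <routes>
>     <route path="/livy/v1/**"/>
>   </routes>
> </service>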
>
> On Thu, Aug 24, 2017 at 11:24 AM, Sandeep More <moresand...@gmail.com>
> wrote:
>
>> Interesting! AFAIK there is no way to disable compression as of now, and
>> I would expect 'gateway.gzip.compress.mime.types' to work [1].
>>
>> So you are looking to turn off gzip compression for the Livy service,
>> correct? Just want to make sure I understand the problem.
>>
>> Best,
>> Sandeep
>>
>> [1]
>> https://github.com/apache/knox/blob/master/gateway-server/src/main/java/org/apache/hadoop/gateway/GatewayServer.java#L422
>>
>>
>> On Thu, Aug 24, 2017 at 3:53 AM, Johan Wärlander <jo...@snowflake.nu>
>> wrote:
>>
>>> Hello,
>>>
>>> A colleague and I have been working on setting up a Knox service for
>>> Livy, so that an external Jupyter setup can manage Spark sessions
>>> without having to handle Kerberos auth itself; we basically followed
>>> this guide:
>>>
>>>
>>> https://community.hortonworks.com/articles/70499/adding-livy-server-as-service-to-apache-knox.html
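>>>
>>> For context, the topology entry we ended up with looks roughly like the
>>> following (the role name must match whatever the Livy service definition
>>> uses; LIVYSERVER and the URL here are illustrative):
>>>
>>> <service>
>>>   <role>LIVYSERVER</role>
>>>   <url>http://myserver:8999</url>
>>> </service>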
>>>
>>> However, Livy doesn't seem to accept the calls coming through Knox,
>>> whereas if we POST directly to Livy using 'curl', all is good.
>>>
>>> From a quick 'tcpdump' session, a difference seems to be that Knox uses
>>> chunked transfers and compression, so I decided to try out some options
>>> (see details further down), and there definitely appears to be a problem
>>> with compressing the request.
>>>
>>> Is there a way to disable compression for a particular service in Knox?
>>>
>>> NOTE: I know about 'gateway.gzip.compress.mime.types', but according to
>>> the docs it only affects compression when sending data to the browser;
>>> we tried it nonetheless, and it didn't seem to help.
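>>>
>>> For reference, the knob is a standard Hadoop-style property in
>>> gateway-site.xml; what we tried was along these lines, with JSON
>>> deliberately left off the list (value illustrative):
>>>
>>> <property>
>>>   <name>gateway.gzip.compress.mime.types</name>
>>>   <value>text/html,text/plain,text/css</value>
>>> </property>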
>>>
>>> TESTING DETAILS
>>>
>>> First, create some JSON to send to Livy:
>>>
>>> $ cat > session_johwar.json
>>> {"proxyUser":"johwar","kind": "pyspark"}
>>> $ gzip -n session_johwar.json
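>>>
>>> As a sanity check, the compressed payload round-trips, and its size is
>>> 61 bytes, which matches the Content-Length in the second test below:
>>>
>>> $ zcat session_johwar.json.gz
>>> {"proxyUser":"johwar","kind": "pyspark"}
>>> $ stat -c %s session_johwar.json.gz
>>> 61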
>>>
>>> Next, try a chunked and compressed POST request to /sessions:
>>>
>>> $ curl -u : --negotiate -v -s --trace-ascii http_trace_chunked_gz.log
>>> --data-binary @session_johwar.json.gz -H "Content-Type: application/json"
>>> -H 'Content-Encoding: gzip' -H 'Transfer-Encoding: chunked'
>>> http://myserver:8999/sessions
>>> "Illegal character ((CTRL-CHAR, code 31)): only regular white space
>>> (\\r, \\n, \\t) is allowed between tokens\n at [Source:
>>> HttpInputOverHTTP@756a5d6c; line: 1, column: 2]"
>>>
>>> Nope. (Notably, CTRL-CHAR code 31 is 0x1f, the first byte of the gzip
>>> magic number 1f 8b, so the body evidently reaches Livy's JSON parser
>>> still compressed.) Log excerpt:
>>>
>>> 040e: User-Agent: curl/7.47.0
>>> 0427: Accept: */*
>>> 0434: Content-Type: application/json
>>> 0454: Content-Encoding: gzip
>>> 046c: Transfer-Encoding: chunked
>>> 0488:
>>> 048a: 3d
>>> => Send data, 68 bytes (0x44)
>>> 0000: ...........V*(....-N-R.R...(O,R.Q...KQ.RP*.,.H,.V.....7..)...
>>> 003f: 0
>>> 0042:
>>> == Info: upload completely sent off: 68 out of 61 bytes
>>> <= Recv header, 26 bytes (0x1a)
>>> 0000: HTTP/1.1 400 Bad Request
>>> <= Recv header, 37 bytes (0x25)
>>> 0000: Date: Thu, 24 Aug 2017 07:20:57 GMT
>>> <= Recv header, 362 bytes (0x16a)
>>> 0000: WWW-Authenticate: Negotiate ...
>>> <= Recv header, 132 bytes (0x84)
>>> 0000: Set-Cookie: hadoop.auth="u=johwar&..."; HttpOnly
>>> <= Recv header, 47 bytes (0x2f)
>>> 0000: Content-Type: application/json; charset=UTF-8
>>> <= Recv header, 21 bytes (0x15)
>>> 0000: Content-Length: 172
>>> <= Recv header, 33 bytes (0x21)
>>> 0000: Server: Jetty(9.2.16.v20160414)
>>> <= Recv header, 2 bytes (0x2)
>>> 0000:
>>> <= Recv data, 172 bytes (0xac)
>>> 0000: "Illegal character ((CTRL-CHAR, code 31)): only regular white sp
>>> 0040: ace (\\r, \\n, \\t) is allowed between tokens\n at [Source: Http
>>> 0080: InputOverHTTP@583564e8; line: 1, column: 2]"
>>>
>>> Ok, so let's try with just compression:
>>>
>>> $ curl -u : --negotiate -v -s --trace-ascii http_trace_gz.log
>>> --data-binary @session_johwar.json.gz -H "Content-Type: application/json"
>>> -H 'Content-Encoding: gzip' http://myserver:8999/sessions
>>> "Illegal character ((CTRL-CHAR, code 31)): only regular white space
>>> (\\r, \\n, \\t) is allowed between tokens\n at [Source:
>>> HttpInputOverHTTP@188893c9; line: 1, column: 2]"
>>>
>>> Ok, no luck. The log is mostly the same, except for no chunking:
>>>
>>> 040e: User-Agent: curl/7.47.0
>>> 0427: Accept: */*
>>> 0434: Content-Type: application/json
>>> 0454: Content-Encoding: gzip
>>> 046c: Content-Length: 61
>>> 0480:
>>> => Send data, 61 bytes (0x3d)
>>> 0000: ...........V*(....-N-R.R...(O,R.Q...KQ.RP*.,.H,.V.....7..)...
>>> == Info: upload completely sent off: 61 out of 61 bytes
>>> <= Recv header, 26 bytes (0x1a)
>>> 0000: HTTP/1.1 400 Bad Request
>>>
>>> Decompress the file again:
>>>
>>> $ gunzip session_johwar.json.gz
>>>
>>> Then: just a plain old request, already known to work:
>>>
>>> $ curl -u : --negotiate -v -s --trace-ascii http_trace.log --data
>>> @session_johwar.json -H "Content-Type: application/json"
>>> http://myserver:8999/sessions
>>>
>>> {"id":5,"appId":null,"owner":"johwar","proxyUser":"johwar","state":"starting","kind":"pyspark","appInfo":{"driverLogUrl":null,"sparkUiUrl":null},"log":[]}
>>>
>>> Yep. Log is looking a lot better:
>>>
>>> 040e: User-Agent: curl/7.47.0
>>> 0427: Accept: */*
>>> 0434: Content-Type: application/json
>>> 0454: Content-Length: 40
>>> 0468:
>>> => Send data, 40 bytes (0x28)
>>> 0000: {"proxyUser":"johwar","kind": "pyspark"}
>>> == Info: upload completely sent off: 40 out of 40 bytes
>>> <= Recv header, 22 bytes (0x16)
>>> 0000: HTTP/1.1 201 Created
>>>
>>> And with chunking?
>>>
>>> $ curl -u : --negotiate -v -s --trace-ascii http_trace_chunked.log
>>> --data @session_johwar.json -H "Content-Type: application/json" -H
>>> 'Transfer-Encoding: chunked' http://myserver:8999/sessions
>>>
>>> {"id":6,"appId":null,"owner":"johwar","proxyUser":"johwar","state":"starting","kind":"pyspark","appInfo":{"driverLogUrl":null,"sparkUiUrl":null},"log":[]}
>>>
>>> Still works. So chunking on its own is fine; it's specifically the
>>> gzip'd request body that Livy rejects.
>>>
>>
>>
>
