Interesting! AFAIK there is no way to disable compression as of now, and I
would expect 'gateway.gzip.compress.mime.types' to work [1].
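For reference, that property is set in gateway-site.xml. A minimal sketch — the value shown here is purely illustrative, not the shipped default, and (as this thread discusses) it is documented to govern response compression toward the client, not request bodies forwarded upstream:

```xml
<!-- gateway-site.xml (illustrative values, not defaults) -->
<property>
  <name>gateway.gzip.compress.mime.types</name>
  <value>text/html,text/xml,text/plain</value>
</property>
```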

So you are looking to turn off gzip compression for the Livy service, correct?
Just want to make sure I understand the problem.

Best,
Sandeep

[1]
https://github.com/apache/knox/blob/master/gateway-server/src/main/java/org/apache/hadoop/gateway/GatewayServer.java#L422
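As a side note, the "Illegal character ((CTRL-CHAR, code 31))" in the 400 response is itself a useful clue: 0x1f (decimal 31) is the first byte of the gzip magic number, so the body is evidently reaching Livy's JSON parser still compressed. You can confirm what those leading bytes look like locally (printf/gzip/od here is just one way to peek at them):

```shell
# Compress the same payload and inspect the first two bytes.
# Every gzip stream begins with the magic bytes 1f 8b; Jackson's
# "CTRL-CHAR, code 31" error is exactly that leading 0x1f byte.
printf '%s' '{"proxyUser":"johwar","kind": "pyspark"}' | gzip -n | od -An -tx1 -N2
# prints: 1f 8b
```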


On Thu, Aug 24, 2017 at 3:53 AM, Johan Wärlander <jo...@snowflake.nu> wrote:

> Hello,
>
> A colleague and I have been working on setting up a Knox service for
> Livy, so that we can allow an external Jupyter setup to manage Spark
> sessions without handling Kerberos auth; basically following this guide:
>
> https://community.hortonworks.com/articles/70499/adding-livy-server-as-service-to-apache-knox.html
>
> However, Livy doesn't seem to accept the calls coming through Knox,
> whereas if we POST directly to Livy using 'curl', all is good.
>
> From a quick 'tcpdump' session, a difference seems to be that Knox uses
> chunked transfers and compression, so I decided to try out some options
> (see details further down), and there definitely appears to be a problem
> with compressing the request.
>
> Is there a way to disable compression for a particular service in Knox?
>
> NOTE: I know about 'gateway.gzip.compress.mime.types', but according to
> docs it only affects compression when sending data to the browser; we tried
> it nonetheless, and it didn't seem to help.
>
> TESTING DETAILS
>
> First, create some JSON to send to Livy:
>
> $ cat > session_johwar.json
> {"proxyUser":"johwar","kind": "pyspark"}
> $ gzip -n session_johwar.json
>
> Next, try a chunked and compressed POST request to /sessions:
>
> $ curl -u : --negotiate -v -s --trace-ascii http_trace_chunked_gz.log --data-binary @session_johwar.json.gz -H "Content-Type: application/json" -H 'Content-Encoding: gzip' -H 'Transfer-Encoding: chunked' http://myserver:8999/sessions
> "Illegal character ((CTRL-CHAR, code 31)): only regular white space (\\r, \\n, \\t) is allowed between tokens\n at [Source: HttpInputOverHTTP@756a5d6c; line: 1, column: 2]"
>
> Nope.. log excerpt:
>
> 040e: User-Agent: curl/7.47.0
> 0427: Accept: */*
> 0434: Content-Type: application/json
> 0454: Content-Encoding: gzip
> 046c: Transfer-Encoding: chunked
> 0488:
> 048a: 3d
> => Send data, 68 bytes (0x44)
> 0000: ...........V*(....-N-R.R...(O,R.Q...KQ.RP*.,.H,.V.....7..)...
> 003f: 0
> 0042:
> == Info: upload completely sent off: 68 out of 61 bytes
> <= Recv header, 26 bytes (0x1a)
> 0000: HTTP/1.1 400 Bad Request
> <= Recv header, 37 bytes (0x25)
> 0000: Date: Thu, 24 Aug 2017 07:20:57 GMT
> <= Recv header, 362 bytes (0x16a)
> 0000: WWW-Authenticate: Negotiate ...
> <= Recv header, 132 bytes (0x84)
> 0000: Set-Cookie: hadoop.auth="u=johwar&..."; HttpOnly
> <= Recv header, 47 bytes (0x2f)
> 0000: Content-Type: application/json; charset=UTF-8
> <= Recv header, 21 bytes (0x15)
> 0000: Content-Length: 172
> <= Recv header, 33 bytes (0x21)
> 0000: Server: Jetty(9.2.16.v20160414)
> <= Recv header, 2 bytes (0x2)
> 0000:
> <= Recv data, 172 bytes (0xac)
> 0000: "Illegal character ((CTRL-CHAR, code 31)): only regular white sp
> 0040: ace (\\r, \\n, \\t) is allowed between tokens\n at [Source: Http
> 0080: InputOverHTTP@583564e8; line: 1, column: 2]"
>
> Ok, so let's try with just compression:
>
> $ curl -u : --negotiate -v -s --trace-ascii http_trace_gz.log --data-binary @session_johwar.json.gz -H "Content-Type: application/json" -H 'Content-Encoding: gzip' http://myserver:8999/sessions
> "Illegal character ((CTRL-CHAR, code 31)): only regular white space (\\r, \\n, \\t) is allowed between tokens\n at [Source: HttpInputOverHTTP@188893c9; line: 1, column: 2]"
>
> Ok, no luck.. log is mostly the same, except for no chunking:
>
> 040e: User-Agent: curl/7.47.0
> 0427: Accept: */*
> 0434: Content-Type: application/json
> 0454: Content-Encoding: gzip
> 046c: Content-Length: 61
> 0480:
> => Send data, 61 bytes (0x3d)
> 0000: ...........V*(....-N-R.R...(O,R.Q...KQ.RP*.,.H,.V.....7..)...
> == Info: upload completely sent off: 61 out of 61 bytes
> <= Recv header, 26 bytes (0x1a)
> 0000: HTTP/1.1 400 Bad Request
>
> Decompress the file again:
>
> $ gunzip session_johwar.json.gz
>
> Then.. just a plain old request, "known" to work already:
>
> $ curl -u : --negotiate -v -s --trace-ascii http_trace.log --data @session_johwar.json -H "Content-Type: application/json" http://myserver:8999/sessions
> {"id":5,"appId":null,"owner":"johwar","proxyUser":"johwar","state":"starting","kind":"pyspark","appInfo":{"driverLogUrl":null,"sparkUiUrl":null},"log":[]}
>
> Yep. Log is looking a lot better:
>
> 040e: User-Agent: curl/7.47.0
> 0427: Accept: */*
> 0434: Content-Type: application/json
> 0454: Content-Length: 40
> 0468:
> => Send data, 40 bytes (0x28)
> 0000: {"proxyUser":"johwar","kind": "pyspark"}
> == Info: upload completely sent off: 40 out of 40 bytes
> <= Recv header, 22 bytes (0x16)
> 0000: HTTP/1.1 201 Created
>
> And with chunking?
>
> $ curl -u : --negotiate -v -s --trace-ascii http_trace_chunked.log --data @session_johwar.json -H "Content-Type: application/json" -H 'Transfer-Encoding: chunked' http://myserver:8999/sessions
> {"id":6,"appId":null,"owner":"johwar","proxyUser":"johwar","state":"starting","kind":"pyspark","appInfo":{"driverLogUrl":null,"sparkUiUrl":null},"log":[]}
>
> Still works.
>
