Sergey Shelukhin created KNOX-755:
-------------------------------------

             Summary: retry logic for replayBuffer limit errors is incorrect.
                 Key: KNOX-755
                 URL: https://issues.apache.org/jira/browse/KNOX-755
             Project: Apache Knox
          Issue Type: Bug
            Reporter: Sergey Shelukhin


Hive receives corrupted Thrift requests when Knox proxies a large Hive query
and the replayBuffer is too small to hold the request body:
{noformat}
org.apache.thrift.transport.TTransportException
        at org.apache.thrift.transport.TIOStreamTransport.read(TIOStreamTransport.java:132)
        at org.apache.thrift.transport.TTransport.readAll(TTransport.java:84)
        at org.apache.thrift.protocol.TBinaryProtocol.readStringBody(TBinaryProtocol.java:354)
        at org.apache.thrift.protocol.TBinaryProtocol.readString(TBinaryProtocol.java:347)
        at org.apache.hive.service.cli.thrift.TExecuteStatementReq$TExecuteStatementReqStandardScheme.read(TExecuteStatementReq.java:618)
...
{noformat}

The retry logic for this error appears to be incorrect, as the following wire
log shows (table and column names have been replaced with generic ones):
{noformat}
2016-10-05 15:25:51,104 DEBUG http.wire (Wire.java:wire(63)) - >> "[0x80][0x1][0x0][0x1][0x0][0x0][0x0][0x10]ExecuteStatement[0x0][0x0][0x0]...![0x88]SELECT 1 AS `number_of_records`,[\n]"
...
2016-10-05 15:25:51,117 DEBUG http.wire (Wire.java:wire(77)) - >> "  `tablename`.`columnn"
2016-10-05 15:25:51,118 DEBUG http.wire (Wire.java:wire(63)) - >> "[\r][\n]"
...
2016-10-05 15:25:51,119 INFO  client.DefaultHttpClient (DefaultRequestDirector.java:tryExecute(726)) - I/O exception (java.io.IOException) caught when processing request: Hit replay buffer max limit
2016-10-05 15:25:51,120 DEBUG client.DefaultHttpClient (DefaultRequestDirector.java:tryExecute(731)) - Hit replay buffer max limit
java.io.IOException: Hit replay buffer max limit
        at org.apache.hadoop.gateway.dispatch.CappedBufferHttpEntity$ReplayStream.read(CappedBufferHttpEntity.java:143)
        at java.io.InputStream.read(InputStream.java:101)
        at org.apache.commons.io.IOUtils.copyLarge(IOUtils.java:1792)
        at org.apache.commons.io.IOUtils.copyLarge(IOUtils.java:1769)
        at org.apache.commons.io.IOUtils.copy(IOUtils.java:1744)
        at org.apache.hadoop.gateway.dispatch.CappedBufferHttpEntity.writeTo(CappedBufferHttpEntity.java:93)
        at org.apache.http.entity.HttpEntityWrapper.writeTo(HttpEntityWrapper.java:98)
{noformat}
However, the client then retries the request:
{noformat}
2016-10-05 15:25:51,121 INFO  client.DefaultHttpClient (DefaultRequestDirector.java:tryExecute(733)) - Retrying request
2016-10-05 15:25:51,121 DEBUG client.DefaultHttpClient (DefaultRequestDirector.java:tryExecute(703)) - Reopening the direct connection.
{noformat}
After authentication (for which the same corrupted request shown below is sent
but not parsed, because the server responds with 401), the client sends the
request again with the correct Authorization header:
{noformat}
2016-10-05 15:25:51,166 DEBUG client.DefaultHttpClient (DefaultRequestDirector.java:tryExecute(713)) - Attempt 3 to execute request
2016-10-05 15:25:51,166 DEBUG conn.DefaultClientConnection (DefaultClientConnection.java:sendRequestHeader(269)) - Sending request: POST /cliservice?doAs=... HTTP/1.1
2016-10-05 15:25:51,167 DEBUG http.wire (Wire.java:wire(63)) - >> "POST /cliservice?doAs=... HTTP/1.1[\r][\n]"
...
2016-10-05 15:25:51,169 DEBUG http.wire (Wire.java:wire(63)) - >> "Authorization: Negotiate ...
2016-10-05 15:25:51,170 DEBUG http.wire (Wire.java:wire(63)) - >> "[\r][\n]"
...
2016-10-05 15:25:51,172 DEBUG http.wire (Wire.java:wire(63)) - >> "1000[\r][\n]"
2016-10-05 15:25:51,173 DEBUG http.wire (Wire.java:wire(63)) - >> "[0x80][0x1][0x0][0x1][0x0][0x0][0x0][0x10]ExecuteStatement[0x0] ... ![0x88]SELECT 1 AS `number_of_records`,[\n]"
...
2016-10-05 15:25:51,186 DEBUG http.wire (Wire.java:wire(77)) - >> "  `tablename`.`columnn"
2016-10-05 15:25:51,187 DEBUG http.wire (Wire.java:wire(63)) - >> "[\r][\n]"
2016-10-05 15:25:51,187 DEBUG http.wire (Wire.java:wire(63)) - >> "1f3[\r][\n]"
2016-10-05 15:25:51,187 DEBUG http.wire (Wire.java:wire(63)) - >> "ther` AS `anothercolumnnameother`,[\n]"
... rest of the query
{noformat}
Note the gap after "columnn": the rest of the column name is missing, and the
next chunk resumes at "ther`".
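For reference, the chunk-size lines in the wire log are hexadecimal: "1000" is
4096 bytes and "1f3" is 499 bytes. One mechanism that would produce this kind
of gap (an assumption for illustration, not a reading of Knox's
CappedBufferHttpEntity source) is a replay stream that pulls a block from the
live request body, finds that the block no longer fits under the cap, and
throws without keeping it. A retry then replays the recorded prefix and
continues from the live stream, which has already moved past the dropped
block. The hypothetical NaiveReplayStream class below models this with plain
java.io streams and a toy body.

{code:java}
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical model of a capped replay stream with the suspected bug.
// This is NOT Knox's CappedBufferHttpEntity; names and behaviour are assumed.
class NaiveReplayStream {
    private final InputStream live;  // original, non-repeatable request body
    private final byte[] replay;     // capped replay buffer
    private int recorded = 0;        // bytes stored in the replay buffer
    private boolean limitHit = false;

    NaiveReplayStream(InputStream live, int cap) {
        this.live = live;
        this.replay = new byte[cap];
    }

    /** Sends one attempt's body to out, reading the live stream in blocks. */
    void writeAttempt(OutputStream out, int blockSize) throws IOException {
        out.write(replay, 0, recorded);  // replay the prefix recorded so far
        byte[] block = new byte[blockSize];
        int n;
        while ((n = live.read(block)) != -1) {
            if (!limitHit && recorded + n > replay.length) {
                // The suspected bug: this block was already consumed from the
                // live stream, and throwing here drops it for good.
                limitHit = true;
                throw new IOException("Hit replay buffer max limit");
            }
            if (!limitHit) {  // still under the cap: record for later replay
                System.arraycopy(block, 0, replay, recorded, n);
                recorded += n;
            }
            out.write(block, 0, n);
        }
    }
}

class NaiveReplayDemo {
    public static void main(String[] args) throws IOException {
        byte[] body = "0123456789ABCDEFGHIJ".getBytes(StandardCharsets.US_ASCII);
        NaiveReplayStream entity = new NaiveReplayStream(new ByteArrayInputStream(body), 8);
        try {
            entity.writeAttempt(new ByteArrayOutputStream(), 4);
        } catch (IOException e) {
            System.out.println("attempt 1 failed: " + e.getMessage());
        }
        ByteArrayOutputStream retry = new ByteArrayOutputStream();
        entity.writeAttempt(retry, 4);  // the retry "succeeds" with a corrupted body
        System.out.println(new String(retry.toByteArray(), StandardCharsets.US_ASCII));
    }
}
{code}

Running NaiveReplayDemo prints the exception message for the first attempt and
then 01234567CDEFGHIJ for the retry: the block 89AB was consumed and dropped,
much like the bytes missing between "columnn" and "ther`" in the wire log.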

Hive then fails with the TTransportException above while reading the request,
and the gateway reports an HTTP 500 error.

I think the retry logic should either be fixed to resend the correct buffer
contents, or be disabled for this type of error.
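A minimal sketch of both options, using hypothetical names (SafeReplayStream,
ReplayLimitExceededException and ReplayAwareRetryPolicy are not Knox or
HttpClient classes): the stream still sends every byte of the current attempt
after the cap is exceeded, but from then on it reports itself as
non-repeatable and refuses any later replay instead of splicing the recorded
prefix onto the live stream. In HttpClient 4.x the natural hook points would
presumably be HttpEntity.isRepeatable() and a custom HttpRequestRetryHandler,
whose retryRequest(IOException, int, HttpContext) method decides whether to
retry; the sketch only mirrors that shape in plain Java.

{code:java}
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

// Hypothetical sketch of the proposed fix; not Knox's or HttpClient's actual code.
class ReplayLimitExceededException extends IOException {
    ReplayLimitExceededException() {
        super("Hit replay buffer max limit");
    }
}

class SafeReplayStream {
    private final InputStream live;      // original, non-repeatable request body
    private final byte[] replay;         // capped replay buffer
    private int recorded = 0;            // bytes stored in the replay buffer
    private boolean overflowed = false;  // more than cap bytes have left the live stream
    private int attempts = 0;

    SafeReplayStream(InputStream live, int cap) {
        this.live = live;
        this.replay = new byte[cap];
    }

    /** Analogous to HttpEntity.isRepeatable(): false once the prefix alone cannot rebuild the body. */
    boolean isRepeatable() {
        return !overflowed;
    }

    void writeAttempt(OutputStream out, int blockSize) throws IOException {
        if (attempts++ > 0 && overflowed) {
            // Refuse to splice the recorded prefix onto a live stream that has moved on.
            throw new ReplayLimitExceededException();
        }
        out.write(replay, 0, recorded);  // replay the prefix (empty on the first attempt)
        byte[] block = new byte[blockSize];
        int n;
        while ((n = live.read(block)) != -1) {
            int keep = Math.min(replay.length - recorded, n);  // bytes that still fit
            System.arraycopy(block, 0, replay, recorded, keep);
            recorded += keep;
            if (keep < n) {
                overflowed = true;  // some bytes can no longer be replayed
            }
            out.write(block, 0, n);  // the current attempt still sends every byte
        }
    }
}

class ReplayAwareRetryPolicy {
    // Mirrors the shape of a retry handler: decline retries once replay is impossible.
    static boolean shouldRetry(IOException e, int executionCount, int maxRetries,
                               SafeReplayStream body) {
        if (e instanceof ReplayLimitExceededException) {
            return false;
        }
        return executionCount <= maxRetries && body.isRepeatable();
    }
}
{code}

With a 20-byte body and an 8-byte cap, the first attempt sends all 20 bytes,
isRepeatable() then returns false, and a second writeAttempt throws
ReplayLimitExceededException; a body that fits under the cap still replays
intact. Failing the request outright is not ideal for large queries, but it is
much easier to diagnose than a silently corrupted Thrift payload.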



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
