John Sherman has posted comments on this change.

Change subject: IMPALA-5394: Set socket timeouts while opening TSaslTransport
......................................................................


Patch Set 2:

> (1 comment)

https://github.com/apache/thrift/blob/0.10.0/lib/cpp/src/thrift/server/TServerFramework.cpp
 lines 147-157
 If I'm reading the Thrift code correctly, fixing the lock doesn't really help 
with the specific issue I've seen. The C++ Thrift server implementation ends up 
calling getTransport on the same thread that accepts the connection. So if the 
getTransport implementation does reads/writes that end up blocking, it prevents 
new connections from being accepted until the read/write completes or times 
out. We ran into this issue when a client connecting to the hs2_port started 
the SASL handshake and then, for some reason, stopped communicating with the 
server (packets from the client stopped reaching impalad, including the RST 
packet that the client sends after retrying). In this case all other connection 
requests to the hs2_port end up being queued until, I think, TCP keepalive 
kicks in (I think the default time is 2 hours) or impalad is restarted. (The 
Java implementation of TThreadPoolServer puts the accepted connection on a 
different thread before calling getTransport: 
https://github.com/apache/thrift/blob/master/lib/java/src/org/apache/thrift/server/TThreadPoolServer.java)
 If the C++ Thrift implementation put the connection on a different thread 
before calling getTransport, fixing the locking would be helpful.
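
To illustrate why a socket timeout bounds the stall described above, here is a 
minimal sketch using plain POSIX sockets rather than Thrift or Impala code. The 
helper names SetRecvTimeoutMs and ReadTimesOutOnSilentPeer are hypothetical, 
and the socketpair stands in for a client that connected and then went silent. 
It assumes Linux, where a recv() on a socket with SO_RCVTIMEO set fails with 
EAGAIN/EWOULDBLOCK once the timeout expires. (Thrift's TSocket exposes 
setRecvTimeout/setSendTimeout for the same purpose.)

```cpp
#include <sys/socket.h>
#include <sys/time.h>
#include <unistd.h>
#include <cerrno>

// Hypothetical helper: bound blocking reads on fd to timeout_ms milliseconds.
bool SetRecvTimeoutMs(int fd, int timeout_ms) {
  timeval tv;
  tv.tv_sec = timeout_ms / 1000;
  tv.tv_usec = (timeout_ms % 1000) * 1000;
  return setsockopt(fd, SOL_SOCKET, SO_RCVTIMEO, &tv, sizeof(tv)) == 0;
}

// Hypothetical demo: returns true if a read from a peer that never sends
// anything gives up with a timeout error instead of blocking indefinitely.
bool ReadTimesOutOnSilentPeer(int timeout_ms) {
  int fds[2];
  // fds[1] plays the client that starts a handshake and then goes quiet.
  if (socketpair(AF_UNIX, SOCK_STREAM, 0, fds) != 0) return false;

  bool timed_out = false;
  if (SetRecvTimeoutMs(fds[0], timeout_ms)) {
    char buf[4];
    // Without the timeout this recv() would block for as long as the peer
    // stays silent; with it, the call fails after roughly timeout_ms.
    ssize_t n = recv(fds[0], buf, sizeof(buf), 0);
    timed_out = (n == -1 && (errno == EAGAIN || errno == EWOULDBLOCK));
  }
  close(fds[0]);
  close(fds[1]);
  return timed_out;
}
```

Calling ReadTimesOutOnSilentPeer(200) should return after about 200 ms. In the 
server case, the equivalent timeout on the accepted socket would let the accept 
thread abandon the stalled handshake and go back to accepting connections.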
 
The backtrace ends up looking something like this (from IMPALA-5268):
#6  0x0000000001b394d2 in apache::thrift::transport::TSSLSocket::read(unsigned char*, unsigned int) ()
No symbol table info available.
#7  0x0000000001b36003 in unsigned int apache::thrift::transport::readAll<apache::thrift::transport::TSocket>(apache::thrift::transport::TSocket&, unsigned char*, unsigned int) ()
No symbol table info available.
#8  0x0000000000b52e15 in apache::thrift::transport::TSaslTransport::receiveSaslMessage(apache::thrift::transport::NegotiationStatus*, unsigned int*) ()
No symbol table info available.
#9  0x0000000000b50733 in apache::thrift::transport::TSaslServerTransport::handleSaslStartMessage() ()
No symbol table info available.
#10 0x0000000000b52f9e in apache::thrift::transport::TSaslTransport::open() ()
No symbol table info available.
#11 0x0000000000b51431 in apache::thrift::transport::TSaslServerTransport::Factory::getTransport(boost::shared_ptr<apache::thrift::transport::TTransport>) ()
No symbol table info available.
#12 0x0000000001b4066d in apache::thrift::server::TThreadPoolServer::serve() ()
No symbol table info available.
#13 0x00000000009f2e60 in impala::ThriftServer::ThriftServerEventProcessor::Supervise() ()
No symbol table info available.

-- 
To view, visit http://gerrit.cloudera.org:8080/7061
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I56a5f3d9cf931cff14eae7f236fea018236a6255
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: John Sherman <[email protected]>
Gerrit-Reviewer: Henry Robinson <[email protected]>
Gerrit-Reviewer: John Sherman <[email protected]>
Gerrit-Reviewer: Matthew Jacobs <[email protected]>
Gerrit-Reviewer: Sailesh Mukil <[email protected]>
Gerrit-HasComments: No
