Henry Robinson has posted comments on this change. Change subject: IMPALA-4135: Thrift threaded server times-out connections during high load ......................................................................
Patch Set 3: (8 comments) Some comments before I head off to Strata. I would run clang-format over the new thrift files. That way they'll be more readable to Impala developers, while keeping the structure and idioms of the thrift code. http://gerrit.cloudera.org:8080/#/c/4519/3/be/src/rpc/TAcceptQueueServer.cpp File be/src/rpc/TAcceptQueueServer.cpp: PS3, Line 204: 1 this is an important enough parameter that I'd make a constant for it, and put your comments about only using thread on that. We don't want someone changing the number of threads without understanding the thread-safety implications. PS3, Line 216: acceptPool.Offer returned false. If this returns false it's because the queue is shut down. Better to say that clearly in the error message so users have an idea what this might mean. Also you can say that it's unexpected - we never shut down our servers so the queue should never be shut down. http://gerrit.cloudera.org:8080/#/c/4519/3/be/src/rpc/TAcceptQueueServer.h File be/src/rpc/TAcceptQueueServer.h: Line 46: * connections will time out while in the OS accept queue. add a reference to IMPALA-4135 here. PS3, Line 105: std::mutex big_lock_; Is this used anywhere? http://gerrit.cloudera.org:8080/#/c/4519/3/be/src/rpc/thrift-server-test.cc File be/src/rpc/thrift-server-test.cc: Line 174: ThreadPool<int64_t> pool("group", "test", 256, 10000, [](int tid, const int64_t& item) { Add a comment about what you're testing and the associated JIRA. PS3, Line 178: ASSERT_OK(status); Does this work in a threaded context - does the test fail if the thread hits an ASSERT? PS3, Line 180: for (int i = 0; i < 1024 * 16; ++i) pool.Offer(i); I'm a bit concerned about failing with ulimit errors, particularly because both client and server should be in the same process. Have you ever seen this repro with fewer concurrent connections - say 8K? PS3, Line 173: TEST(ConcurrencyTest, ConnectTimeout) { : ThreadPool<int64_t> pool("group", "test", 256, 10000, [](int tid, const int64_t& item) { : using Client = ThriftClient<ImpalaInternalServiceClient>; : Client* client = new Client("127.0.0.1", 22000, "", NULL, false); : Status status = client->Open(); : ASSERT_OK(status); : }); : for (int i = 0; i < 1024 * 16; ++i) pool.Offer(i); : pool.DrainAndShutdown(); : } This relies on a running impala internal service - you probably had an Impala process running when you ran this? That won't be true in our builds. Use the same server code that all the other tests use to start a server in this process. -- To view, visit http://gerrit.cloudera.org:8080/4519 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ie50e728974ef31a9d49132a0b3f7cde2a4f3356d Gerrit-PatchSet: 3 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Thomas Tauber-Marshall <[email protected]> Gerrit-Reviewer: Henry Robinson <[email protected]> Gerrit-Reviewer: Juan Yu <[email protected]> Gerrit-Reviewer: Matthew Jacobs <[email protected]> Gerrit-Reviewer: Sailesh Mukil <[email protected]> Gerrit-Reviewer: Thomas Tauber-Marshall <[email protected]> Gerrit-HasComments: Yes
