Mostafa Mokhtar created IMPALA-6285:
---------------------------------------

             Summary: Avoid printing the stack as part of DoTransmitDataRpc as it leads to burning lots of kernel CPU
                 Key: IMPALA-6285
                 URL: https://issues.apache.org/jira/browse/IMPALA-6285
             Project: IMPALA
          Issue Type: Bug
    Affects Versions: Impala 2.11.0
            Reporter: Mostafa Mokhtar
            Assignee: Michael Ho
            Priority: Blocker


When running 32 concurrent TPC-DS queries against 20 r4.8xlarge instances, some of the
RPCs time out but don't fail the query:

{code}
I1206 12:44:14.925405 25274 status.cc:58] RPC recv timed out: Client foo-17.domain.com:22000 timed-out during recv call.
    @           0x957a6a  impala::Status::Status()
    @          0x11dd5fe  impala::DataStreamSender::Channel::DoTransmitDataRpc()
    @          0x11ddcd4  impala::DataStreamSender::Channel::TransmitDataHelper()
    @          0x11de080  impala::DataStreamSender::Channel::TransmitData()
    @          0x11e1004  impala::ThreadPool<>::WorkerThread()
    @           0xd10063  impala::Thread::SuperviseThread()
    @           0xd107a4  boost::detail::thread_data<>::run()
    @          0x128997a  (unknown)
    @     0x7f68c5bc7e25  start_thread
    @     0x7f68c58f534d  __clone
{code}

{code}
I1206 12:44:15.152775 25296 status.cc:58] RPC recv timed out: Client foo-5.domain.com:22000 timed-out during recv call.
    @           0x957a6a  impala::Status::Status()
    @          0x11dd5fe  impala::DataStreamSender::Channel::DoTransmitDataRpc()
    @          0x11ddcd4  impala::DataStreamSender::Channel::TransmitDataHelper()
    @          0x11de080  impala::DataStreamSender::Channel::TransmitData()
    @          0x11e1004  impala::ThreadPool<>::WorkerThread()
    @           0xd10063  impala::Thread::SuperviseThread()
    @           0xd107a4  boost::detail::thread_data<>::run()
    @          0x128997a  (unknown)
    @     0x7f68c5bc7e25  start_thread
    @     0x7f68c58f534d  __clone
{code}

The status could be changed to an expected status, so that no stack trace is printed
for it, but it is worth first verifying that this timeout can actually be tolerated.
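
A minimal C++ sketch of the distinction suggested above, assuming a status type that captures a stack trace only for unexpected errors. Everything in it (`SketchStatus`, `Error()`, `Expected()`, `CaptureStackTraceStub()`, `RecvTimeoutStatus()`) is hypothetical and is not Impala's actual `impala::Status` API; the stack capture is stubbed out with a counter so the two paths can be compared.

```cpp
#include <atomic>
#include <iostream>
#include <string>
#include <utility>

// Hypothetical stand-in for the costly part of logging an error status.
// A real implementation would unwind and symbolize the call stack here;
// this sketch only counts calls so the difference between paths is visible.
std::atomic<int> g_stack_traces_captured{0};

std::string CaptureStackTraceStub() {
  g_stack_traces_captured.fetch_add(1);
  return "    @ <frames elided in this sketch>";
}

// Hypothetical status type (NOT impala::Status).
class SketchStatus {
 public:
  static SketchStatus OK() { return SketchStatus(); }

  // Unexpected error: log the message plus a stack trace (the costly path).
  static SketchStatus Error(std::string msg) {
    SketchStatus s(std::move(msg));
    std::clog << s.msg_ << '\n' << CaptureStackTraceStub() << '\n';
    return s;
  }

  // Expected error: same message, but no stack trace is captured or logged.
  static SketchStatus Expected(std::string msg) {
    return SketchStatus(std::move(msg));
  }

  bool ok() const { return !is_error_; }
  const std::string& msg() const { return msg_; }

 private:
  SketchStatus() = default;
  explicit SketchStatus(std::string msg)
      : is_error_(true), msg_(std::move(msg)) {}

  bool is_error_ = false;
  std::string msg_;
};

// Hypothetical helper: report a recv timeout as expected only when the
// caller has established that the timeout can be tolerated.
SketchStatus RecvTimeoutStatus(const std::string& peer, bool tolerated) {
  std::string msg =
      "RPC recv timed out: Client " + peer + " timed-out during recv call.";
  return tolerated ? SketchStatus::Expected(msg) : SketchStatus::Error(msg);
}
```

The `tolerated` flag stands in for the verification the issue asks for: only a timeout shown to be safe to tolerate would take the cheap `Expected()` path, while any other failure keeps its stack trace for diagnosis.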


