Hi Henry,

 

I was referring specifically to the EXCHANGE_NODE section of the Coordinator 
Fragment – doesn’t that pin it down to the Coordinator Node, i.e. the node to 
which the JDBC Client is connected directly?

 

Also, how can the streaming of records from a simple full table scan query like 
“select * from table” be accelerated so that the SendersBlockedTimer value does 
not represent 95% of the overall time of the query? Basically, imagine you 
have a 3GB parquet table in Impala and a JDBC Driver Client connected to the 
Coordinator ImpalaD, trying to stream out all of the data in the table (3GB) 
as quickly as possible.

The execution part of the query completes blindingly fast and the data is 
streamed out of HDFS within 30 seconds. However, the Fetch phase of the full 
table scan query takes 15 min, and 14 min and 30 sec of that time is the value 
of the SendersBlockedTimer.

 

The JDBC Client uses the latest Cloudera JDBC driver for Impala (which is 
actually the Simba driver) and does nothing but call ResultSet.next(), i.e. 
no parsing or data transformation of the columns of each row, no output to 
screen or disk etc. The network between the JDBC Client and Coordinator is 10 
GB and “hdfs client get” of the csv version of the same table takes only 30 
sec.
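To put the two timings side by side, here is a rough back-of-the-envelope 
calculation. It is only a sketch: it assumes "3GB" means 3 GiB and that the csv 
copy is about the same size as the parquet table, neither of which has been 
measured. The helper name mib_per_s is made up for illustration.

```python
# Rough throughput comparison using the figures quoted in this mail.
# Assumptions (not measured): 3GB == 3 GiB, csv size ~= parquet size.
GIB = 1024 ** 3
MIB = 1024 ** 2

table_bytes = 3 * GIB
jdbc_fetch_seconds = 15 * 60   # Fetch phase over JDBC
hdfs_get_seconds = 30          # "hdfs client get" of the csv version


def mib_per_s(num_bytes, seconds):
    """Average throughput in MiB/s (hypothetical helper)."""
    return num_bytes / seconds / MIB


jdbc_rate = mib_per_s(table_bytes, jdbc_fetch_seconds)
hdfs_rate = mib_per_s(table_bytes, hdfs_get_seconds)

print(f"JDBC fetch: {jdbc_rate:.1f} MiB/s")         # JDBC fetch: 3.4 MiB/s
print(f"hdfs get:   {hdfs_rate:.1f} MiB/s")         # hdfs get:   102.4 MiB/s
print(f"ratio:      {hdfs_rate / jdbc_rate:.0f}x")  # ratio:      30x
```

Under those assumptions the JDBC path averages only a few MiB/s, far below 
what the network link can carry, so raw network bandwidth does not look like 
the limiting factor.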

 

Out of the above 15 min total time, Client Fetch Wait Time is 35% or about 6 
min. Then we also have a SendersBlockedTimer of 14 min and 30 sec – so who is to 
be blamed here for the slow streaming of records compared to hdfs get: a) an 
inefficient implementation of the JDBC Client, or b) the Coordinator Node 
needing more resources, such as more parallel threads and therefore CPU cores?

 

How do we interpret the above two figures and what do they point to - the JDBC 
driver or the Coordinator Node?

 

Regards,

Evo

 

From: Henry Robinson [mailto:[email protected]] 
Sent: Thursday, May 25, 2017 7:23 PM
To: [email protected]; [email protected]
Subject: Re: SendersBlockedTimer

 

Hi Evo - 

 

Just to clarify: the EXCHANGE_NODE is the operator in the plan tree which 
mediates communication between workers, not between the client and the 
coordinator.

 

The SendersBlockedTimer measures the amount of time that senders have row 
batches to deliver to an exchange node, but the exchange is busy delivering a 
previously sent row batch. That is, the senders are sending faster than the 
exchange node (and the upstream plan) processes those rows. 

 

In a select * from table query, there'll be one exchange on the coordinator, 
but that's not generally true - exchanges connect all the fragment instances. 
Having the senders blocked in this case is typical, because there'll be lots of 
senders sending at a high rate, fanning in to a single receiver, serving a 
single client.
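The fan-in effect described above can be illustrated with a toy model. This is 
only a sketch and not Impala's code: it assumes a bounded in-memory queue stands 
in for the exchange, threads stand in for the senders, and a sleep stands in for 
a receiver that drains more slowly than the senders produce. All names and sizes 
(run_exchange, queue_capacity, receiver_delay_s and so on) are made up.

```python
import queue
import threading
import time


def run_exchange(num_senders=4, batches_per_sender=25, queue_capacity=2,
                 receiver_delay_s=0.002):
    """Toy fan-in: many senders, one bounded queue, one slow receiver.

    Returns (batches_received, total_seconds_senders_spent_in_put).
    Hypothetical model only; none of these names come from Impala.
    """
    exchange = queue.Queue(maxsize=queue_capacity)
    blocked = [0.0] * num_senders  # per-sender time spent inside put()

    def sender(idx):
        for batch_no in range(batches_per_sender):
            start = time.perf_counter()
            exchange.put((idx, batch_no))  # blocks while the queue is full
            blocked[idx] += time.perf_counter() - start

    threads = [threading.Thread(target=sender, args=(i,))
               for i in range(num_senders)]
    for t in threads:
        t.start()

    received = 0
    total = num_senders * batches_per_sender
    while received < total:
        exchange.get()
        time.sleep(receiver_delay_s)  # stand-in for a slow consumer
        received += 1

    for t in threads:
        t.join()
    return received, sum(blocked)


received, blocked_s = run_exchange()
print(f"received {received} batches; senders spent {blocked_s:.2f}s in put()")
```

Because the blocked time is summed across all senders, the total can exceed 
the wall-clock duration of the run. In this toy model a large blocked figure 
says only that the single receiver is the slowest stage, not why it is slow.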

 

The delivery of rows to the client is managed by the coordinator fragment 
instance through a different part of the code to the exchange node. 

 

Henry

 

On 25 May 2017 at 08:31, Evo Eftimov <[email protected]> wrote:

What is the purpose of the SendersBlockedTimer attribute in the EXCHANGE_NODE 
section of the Coordinator Fragment – part of the PROFILE of a SQL statement 
executed by Impala?

 

I have reviewed the Impala source code and know that the Exchange Node uses a 
Blocking Queue as part of the “Stream Manager” module, which it instantiates.

 

In the specific context I am interested in, the Exchange Node returns the rows 
of a result set to a JDBC driver client. The result set is produced by a 
simple full-table-scan-only query of the type “select * from table”.

 

The “Sender” Parallel Threads (presumably within the Exchange Node) publish rows 
to the Blocking Queue, also in the Exchange Node, and the JDBC client reads rows 
from the same queue via a remote JDBC session / connection over TCP/IP – is that 
a correct description of how the Exchange Node mediates between the JDBC client 
on the one hand and the ImpalaD workers on the other? Btw, the Exchange Node is 
part of the Coordinator Node in terms of terminology – right?

 

My specific question is: what is the purpose/meaning of SendersBlockedTimer – 
e.g. does it mean that the Sender Threads WITHIN the Exchange Node have been in 
a blocked state for the time shown in the value of the attribute? And if this 
is correct, does that mean they have been blocked because the JDBC 
Client could not keep up with draining the Blocking Queue during the 
aggregated time duration in SendersBlockedTimer?

 

 
