RE: SendersBlockedTimer

Evo Eftimov Sat, 27 May 2017 02:51:49 -0700

Hi Henry,


Yes, we had already tried various fetch batch sizes; there was no improvement.

 

I take your point about the difference between the low-level HDFS -> HDFS
client data transfer and the higher-level HDFS -> Impala -> JDBC client data
transfer. That is why I used the low-level HDFS -> HDFS client transfer only to
establish a rough baseline for comparison, which I am not trying to match
EXACTLY. However, I was still hoping for a smaller difference: 30 sec for the
HDFS-only transfer vs 15 min for the Impala transfer is a roughly 30x gap, which
seems far too wide. Please note that the 15 min for Impala is strictly the Fetch
phase of the query; the preceding execution phase (including the decompression
of the Parquet data) completes in milliseconds.

 

What is puzzling is that in the meantime I loaded the same 2.6 GB dataset (its
uncompressed CSV version) into a MySQL table and, using the MySQL JDBC driver,
managed to fetch it in only 58 seconds. That is reasonably close to the pure
HDFS -> HDFS client baseline of 30 sec, and it is what I was hoping to get when
I was using the Cloudera JDBC driver and impala-shell.
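For a sense of scale, the three timings quoted above can be turned into
effective throughput. This is a quick sketch only: it assumes 1 GB = 1000 MB
(the thread does not say whether GB means 10^9 or 2^30 bytes) and that the full
2.6 GB passes through each path.

```python
# Rough effective throughput for the three measurements quoted in this thread.
# Assumption: 1 GB = 1000 MB, and the whole dataset crosses each path.
DATASET_MB = 2600  # ~2.6 GB uncompressed (CSV file / JDBC result set)

timings_s = {
    "hdfs dfs -get (CSV file)": 30,
    "MySQL JDBC fetch": 58,
    "Impala JDBC fetch": 15 * 60,
}

def throughput_mb_s(size_mb: float, seconds: float) -> float:
    """Average transfer rate in MB/s."""
    return size_mb / seconds

baseline = timings_s["hdfs dfs -get (CSV file)"]
for name, secs in timings_s.items():
    rate = throughput_mb_s(DATASET_MB, secs)
    # Second column: how many times longer than the HDFS baseline this path takes.
    print(f"{name:28s} {rate:7.1f} MB/s  {secs / baseline:5.1f}x baseline time")
```

On these numbers the Impala fetch path runs at roughly 3 MB/s against roughly
87 MB/s for the HDFS copy, a factor of 30, or about 1.5 orders of magnitude.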

 

Regards

Evo

 

From: Henry Robinson [mailto:[email protected]] 
Sent: Friday, May 26, 2017 7:17 PM
To: Evo Eftimov
Cc: [email protected]
Subject: Re: SendersBlockedTimer

 

 

 

On 26 May 2017 at 01:23, Evo Eftimov <[email protected]> wrote:

Hi Henry,

 

The Parquet table being exported in full via JDBC is about 800 MB compressed,
as stored in Parquet files, and about 2.6 GB uncompressed when extracted as CSV
or via JDBC.

 

I had already tried impala-shell too, and it showed more or less identical
performance to the Cloudera/Simba driver. impala-shell was run with the -B
option and its output redirected to /dev/null for maximum performance during
the fetch.

 

If I extract the Parquet table to CSV on HDFS with INSERT ... SELECT from within
Impala, the result is a 2.6 GB CSV file, and downloading that from HDFS with
"hdfs dfs -get" takes only 30 sec. That is a staggering difference compared with
the performance of impala-shell and the Cloudera/Simba JDBC driver. Are these
drivers and tools really that poorly optimized, designed or implemented?

 

 

They're not optimized for large extracts, on the order of GB of data. 'hdfs dfs
-get' has a number of significant advantages: it doesn't have to understand the
results, so there are no serialization steps; it just copies the bytes. Since
it's just copying a file, and not trying to present the abstraction of an
ordered sequence of rows, it can take advantage of the parallelism available in
HDFS and read different blocks from different datanodes. These options aren't
so easily available to a query engine or its client. 
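To illustrate the point about opaque bytes and parallelism, here is a toy
Python sketch, not HDFS client code: a "file" is modeled as (offset, block)
pairs, the blocks are fetched concurrently and may complete in any order, and
the copy is reassembled purely by offset. `BLOCK_SIZE` and `read_block` are
hypothetical stand-ins for real HDFS blocks and datanode reads. A query result,
by contrast, has to be decoded into rows and handed to a single client as one
ordered stream.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

BLOCK_SIZE = 4  # tiny "block" for illustration; real HDFS blocks are far larger

def split_into_blocks(data: bytes, block_size: int) -> list[tuple[int, bytes]]:
    """Model a file as (offset, block) pairs, as a block store would hold it."""
    return [(off, data[off:off + block_size]) for off in range(0, len(data), block_size)]

def read_block(block: tuple[int, bytes]) -> tuple[int, bytes]:
    # Hypothetical stand-in for reading one block from whichever node holds it.
    return block

def parallel_copy(data: bytes, block_size: int = BLOCK_SIZE, workers: int = 4) -> bytes:
    blocks = split_into_blocks(data, block_size)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(read_block, b) for b in blocks]
        # Collect blocks in completion order, which need not match file order.
        fetched = [f.result() for f in as_completed(futures)]
    # Blocks are opaque bytes, so sorting by offset is all the "decoding" needed.
    fetched.sort(key=lambda item: item[0])
    return b"".join(chunk for _, chunk in fetched)

payload = b"id,name\n1,alpha\n2,beta\n"
assert parallel_copy(payload) == payload
```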

 

There are definitely improvements that we can make to the speed of result 
retrieval, but getting close to HDFS speeds would probably require an 
architectural overhaul of the way clients interact with Impala. As I said, 
large data extracts are not a use case that the clients, or Impala, are 
optimized for right now. 

 

You might try experimenting with the 'fetch size' parameter in the JDBC driver 
- larger batches might reduce overheads due to repeated RPC calls.
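In JDBC the batch size is requested with `Statement.setFetchSize(int)`, which
the JDBC specification defines only as a hint to the driver; this thread does
not establish how the Cloudera/Simba driver treats it. The toy cost model below
(all numbers hypothetical) shows why larger batches can help and where the
benefit stops: one round trip per batch, plus a fixed cost per row.

```python
import math

def fetch_time_s(rows: int, fetch_size: int,
                 rpc_latency_s: float, per_row_s: float) -> float:
    """Toy cost model: one client/server round trip per batch, plus a fixed
    per-row cost for serializing, sending and decoding each row."""
    round_trips = math.ceil(rows / fetch_size)
    return round_trips * rpc_latency_s + rows * per_row_s

# Hypothetical workload: 10 million rows, 1 ms per round trip, 1 us per row.
ROWS, RPC_S, ROW_S = 10_000_000, 0.001, 1e-6
for size in (1_000, 10_000, 100_000):
    print(f"fetch size {size:>7,}: {fetch_time_s(ROWS, size, RPC_S, ROW_S):6.2f} s")
```

In this model, growing the batch only helps until the round-trip overhead is
amortized; after that the per-row term dominates and further increases in fetch
size change very little.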

 

Henry

 

 

PS: I run the JDBC client in a JVM with a 4 GB heap to minimize any impact of
garbage collection.

 

Regards,

Evo

 

From: Henry Robinson [mailto:[email protected]] 
Sent: Thursday, May 25, 2017 10:32 PM
To: [email protected]
Subject: Re: SendersBlockedTimer

 

 

 

On 25 May 2017 at 12:19, Evo Eftimov <[email protected]> wrote:

Hi Henry,

 

I was referring specifically to the EXCHANGE_NODE section of the Coordinator
Fragment. Doesn't that pin it down to the coordinator node, i.e. the node to
which the JDBC client is connected directly?

 

Also, how can the streaming of records from a simple full-table-scan query like
"select * from table" be accelerated, so that the SendersBlockedTimer value does
not represent 95% of the overall query time? Basically, imagine you have a 3 GB
Parquet table in Impala and a JDBC client connected to the coordinator impalad,
trying to stream out all of the data in the table (3 GB) as quickly as possible.

The execution part of the query completes blindingly fast, and the data is
streamed out of HDFS within 30 seconds. However, the Fetch phase of the
full-table-scan query takes 15 min, and 14 min 30 sec of that time is the value
of SendersBlockedTimer.

 

The JDBC client uses the latest Cloudera JDBC driver for Impala (which is
actually the Simba driver) and does nothing but call ResultSet.next(), i.e. no
parsing or transformation of the columns of each row, no output to screen or
disk, etc. The network between the JDBC client and the coordinator is 10 Gbit/s,
and an "hdfs dfs -get" of the CSV version of the same table takes only 30 sec.
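The drain-only client described above (a bare `ResultSet.next()` loop) can be
sketched in Python DB-API form as a harness that separates fetch cost from
processing cost. This is an illustration under stated assumptions: an in-memory
SQLite database stands in for the remote Impala connection, and `drain` and
the table `t` are hypothetical names, so the absolute timings say nothing about
Impala.

```python
import sqlite3
import time

def drain(cursor: sqlite3.Cursor, batch_size: int) -> tuple[int, float]:
    """Pull every row without touching its columns; return (row_count, seconds)."""
    start = time.perf_counter()
    count = 0
    while True:
        batch = cursor.fetchmany(batch_size)  # batch size ~ a fetch-size hint
        if not batch:
            break
        count += len(batch)  # no parsing, no output: just consume the rows
    return count, time.perf_counter() - start

# In-memory SQLite stands in for the remote database connection (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER, name TEXT)")
conn.executemany("INSERT INTO t VALUES (?, ?)", ((i, f"row{i}") for i in range(100_000)))
cur = conn.execute("SELECT * FROM t")
rows, secs = drain(cur, batch_size=10_000)
print(f"{rows} rows in {secs:.3f} s")
```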

 

Of the above 15 min total, Client Fetch Wait Time is 35%, or about 6 min. Then
we also have a SendersBlockedTimer of 14 min 30 sec. So what is to blame for the
slow streaming of records compared to hdfs get: (a) an inefficient
implementation of the JDBC client, or (b) the coordinator node needing more
resources, such as more parallel threads and therefore more CPU cores?

 

How do we interpret the above two figures, and what do they point to: the JDBC
driver or the coordinator node?

 

Most likely the driver, as about 6 minutes of the query is spent waiting on the
client, per the Client Fetch Wait Time. SendersBlockedTimer tracks the amount of
time for which at least one sender was blocked. Since it is high, we know that
the coordinator is consuming results more slowly than they are being sent to
it. The coordinator does very little in a SELECT * query, so most likely it is
serving rows to the client as fast as the client can consume them. Therefore
I'd expect the client to be the bottleneck. 

 

Try using impala-shell with -B (and redirecting the output to /dev/null); this
is about as fast as a single client can go right now, and should give you a
feel for a lower bound on the query time.

 

How much data does this query return? The client API and driver are not really 
optimized for large ETL-style retrieval - for that you might be better off 
using INSERT to write some files to HDFS, and then downloading them in parallel 
from HDFS.
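The parallel download step can be sketched as one `hdfs dfs -get` per exported
file, run concurrently. This is a sketch under assumptions: it presumes the
result was already written to HDFS with an INSERT ... SELECT into a text-format
table, the file paths are hypothetical (in practice you would list them first),
and `get_commands`/`download_all` are illustrative helpers, not part of any
Impala or Hadoop tool.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor
from pathlib import PurePosixPath

def get_commands(hdfs_files, local_dir):
    """Build one 'hdfs dfs -get <src> <dst>' command per exported file."""
    return [
        ["hdfs", "dfs", "-get", src, str(PurePosixPath(local_dir) / PurePosixPath(src).name)]
        for src in hdfs_files
    ]

def download_all(hdfs_files, local_dir, workers=8, run=subprocess.run):
    """Run the downloads concurrently; each file is an independent byte copy.

    `run` is injectable so the command plumbing can be tested without Hadoop.
    Example (hypothetical paths):
        download_all(["/user/evo/export/f1", "/user/evo/export/f2"], "/tmp/export")
    """
    commands = get_commands(hdfs_files, local_dir)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda cmd: run(cmd, check=True), commands))
```

Because the files are independent, the number of concurrent copies is bounded
only by `workers`, the network, and the datanodes, rather than by a single
ordered result stream.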

 

Best,

Henry

 

 

Regards,

Evo

 

From: Henry Robinson [mailto:[email protected]] 
Sent: Thursday, May 25, 2017 7:23 PM
To: [email protected]; [email protected]
Subject: Re: SendersBlockedTimer

 

Hi Evo - 

 

Just to clarify: the EXCHANGE_NODE is the operator in the plan tree which 
mediates communication between workers, not between the client and the 
coordinator.

 

The SendersBlockedTimer measures the amount of time that senders have row 
batches to deliver to an exchange node, but the exchange is busy delivering a 
previously sent row batch. That is, the senders are sending faster than the 
exchange node (and the upstream plan) processes those rows. 
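This behavior maps onto a classic bounded producer/consumer queue with fan-in.
Below is a toy Python model, not Impala's implementation; the queue capacity,
delays and function names are made up for illustration. Several sender threads
push batches into a capacity-1 queue drained by one slow receiver, and the
"blocked" figure is the wall-clock time during which at least one sender was
waiting, which is how Henry characterizes SendersBlockedTimer elsewhere in this
thread.

```python
import queue
import threading
import time

def union_length(intervals):
    """Total length covered by possibly overlapping (start, end) intervals."""
    total, cur_start, cur_end = 0.0, None, None
    for start, end in sorted(intervals):
        if cur_end is None or start > cur_end:
            if cur_end is not None:
                total += cur_end - cur_start
            cur_start, cur_end = start, end
        else:
            cur_end = max(cur_end, end)
    if cur_end is not None:
        total += cur_end - cur_start
    return total

def run_exchange(num_senders, batches_per_sender, consume_delay_s, capacity=1):
    """Toy fan-in: senders push row batches into one bounded receiver queue.

    Returns (blocked_s, elapsed_s), where blocked_s is the wall-clock time
    during which at least one sender was inside q.put(). Puts that do not
    block contribute only negligible intervals.
    """
    q = queue.Queue(maxsize=capacity)
    blocked_intervals = []  # list.append is atomic in CPython

    def sender(idx):
        for batch_no in range(batches_per_sender):
            start = time.perf_counter()
            q.put((idx, batch_no))  # blocks while the receiver is still busy
            blocked_intervals.append((start, time.perf_counter()))

    def receiver():
        for _ in range(num_senders * batches_per_sender):
            q.get()
            time.sleep(consume_delay_s)  # slow downstream consumer (e.g. the client)

    t0 = time.perf_counter()
    threads = [threading.Thread(target=receiver)]
    threads += [threading.Thread(target=sender, args=(i,)) for i in range(num_senders)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return union_length(blocked_intervals), time.perf_counter() - t0

blocked, elapsed = run_exchange(num_senders=4, batches_per_sender=10, consume_delay_s=0.005)
print(f"senders blocked for {blocked:.2f} s of {elapsed:.2f} s")
```

With a slow receiver, the blocked time in this model approaches the total run
time, which resembles the profile pattern discussed in this thread: the
timer says the receiving side is the slow part, not the senders.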

 

In a select * from table query, there'll be one exchange on the coordinator,
but that's not generally true; exchanges connect all the fragment instances.
Having the senders blocked in this case is typical, because there'll be lots of
senders sending at a high rate, fanning in to a single receiver that serves a
single client.

 

The delivery of rows to the client is managed by the coordinator fragment 
instance through a different part of the code to the exchange node. 

 

Henry

 

On 25 May 2017 at 08:31, Evo Eftimov <[email protected]> wrote:

What is the purpose of the SendersBlockedTimer attribute in the EXCHANGE_NODE
section of the Coordinator Fragment, which is part of the PROFILE of a SQL
statement executed by Impala?

 

I have reviewed the Impala source code and know that the Exchange Node uses a
blocking queue as part of the "Stream Manager" module, which it instantiates.

 

In the specific context I am interested in, the Exchange Node returns the rows
of a result set to a JDBC driver client. The result set is produced by a simple
full-table-scan-only query of the form "select * from table".

 

The parallel "Sender" threads (presumably within the Exchange Node) publish rows
to the blocking queue, also in the Exchange Node, and the JDBC client reads rows
from the same queue via a remote JDBC session/connection over TCP/IP. Is that a
correct description of how the Exchange Node mediates between the JDBC client on
the one hand and the impalad workers on the other? By the way, in terms of
terminology, the Exchange Node is part of the Coordinator Node, right?

 

My specific question is: what is the purpose/meaning of SendersBlockedTimer?
For example, does it mean that the Sender threads WITHIN the Exchange Node have
been in a blocked state for the time shown in the value of the attribute? And if
so, does that mean they were blocked because the JDBC client couldn't keep up
with draining the blocking queue during the aggregated time in
SendersBlockedTimer?
