Sahil Takiar has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/14129 )

Change subject: IMPALA-8819: BufferedPlanRootSink should handle non-default 
fetch sizes
......................................................................


Patch Set 2:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/14129/2/be/src/exec/buffered-plan-root-sink.cc
File be/src/exec/buffered-plan-root-sink.cc:

http://gerrit.cloudera.org:8080/#/c/14129/2/be/src/exec/buffered-plan-root-sink.cc@143
PS2, Line 143:     // If 'num_results' <= 0 then by default fetch BATCH_SIZE 
rows.
> Another thought about the batch sizes...
I did some profiling, and I think the answer is that it depends on the network 
between the client and the server. I ran a few experiments comparing small vs. 
large fetch sizes:

* JDBC client running on the same host as the minicluster - didn't see much 
perf change
* JDBC client running on a different EC2 machine than the minicluster (both in 
the same region) - depends on the # of rows and # of columns, but a larger 
fetch size makes up to a 20% difference for a full table scan of TPCH 'orders'
* JDBC client running on my laptop against a minicluster running on an EC2 host 
- makes a huge difference

So I think it really depends on how the client is deployed in relation to the 
Impala coordinator. I would guess that most clients (e.g. BI tools) would at 
least run in the same datacenter (or if on AWS same region), but maybe not the 
same host.
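
To illustrate why the network matters, here is a rough sketch of the number of 
fetch round trips a client needs for a given fetch size. It is a latency-only 
model, not something taken from the patch: the function names, the row count, 
and the round-trip times are all hypothetical, and it ignores bandwidth and 
server-side production time.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: number of fetch requests a client issues to read
// 'total_rows' rows when each request returns at most 'fetch_size' rows.
int64_t NumFetchRoundTrips(int64_t total_rows, int64_t fetch_size) {
  assert(total_rows >= 0);
  assert(fetch_size > 0);
  // Integer ceiling division.
  return (total_rows + fetch_size - 1) / fetch_size;
}

// Hypothetical latency-only cost model: every fetch pays one network round
// trip of 'rtt_ms' milliseconds. Bandwidth and server time are ignored.
double RoundTripOverheadMs(int64_t total_rows, int64_t fetch_size,
                           double rtt_ms) {
  return static_cast<double>(NumFetchRoundTrips(total_rows, fetch_size)) *
         rtt_ms;
}
```

Under this model, a 10x larger fetch size cuts the number of round trips by 
roughly 10x, so the saving scales with the round-trip time. That would be 
consistent with seeing little change on the same host and a large change from 
a laptop.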

However, I don't think it hurts performance to increase the default fetch size. 
The only issue, as you mentioned, would be if the client fetches rows faster 
than Impala produces them (maybe that can happen in highly selective scans). 
As you also mentioned below, that might be solvable by IMPALA-7312.

According to 
https://www.simba.com/products/Impala/doc/v2/ODBC_InstallGuide/win/content/odbc/options/rowsfetchedperblock.htm
 "testing has shown that performance gains are marginal beyond the default 
value of 10000 rows" - so maybe around 10 RowBatches is the sweet spot.

So I think setting the default to 10x BATCH_SIZE would be reasonable?
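
A minimal sketch of what that default could look like, assuming the existing 
"'num_results' <= 0 means use the default" convention from line 143. The 
function and constant names are hypothetical and this is not the code in the 
patch:

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical constant: default fetch size expressed in RowBatches.
constexpr int64_t kDefaultFetchBatches = 10;

// Hypothetical helper: returns the number of rows to fetch. If the client
// did not request a positive number of rows, fall back to
// kDefaultFetchBatches * batch_size; otherwise honor the requested size.
int64_t EffectiveFetchSize(int64_t num_results, int64_t batch_size) {
  assert(batch_size > 0);
  if (num_results <= 0) return kDefaultFetchBatches * batch_size;
  return num_results;
}
```

With a batch size of 1024 (assumed here for illustration), the default would 
come out to 10240 rows, close to the 10000-row value quoted above.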



--
To view, visit http://gerrit.cloudera.org:8080/14129
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I8dd4b397ab6457a4f85e635f239b2c67130fcce4
Gerrit-Change-Number: 14129
Gerrit-PatchSet: 2
Gerrit-Owner: Sahil Takiar <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Michael Ho <[email protected]>
Gerrit-Reviewer: Sahil Takiar <[email protected]>
Gerrit-Reviewer: Tim Armstrong <[email protected]>
Gerrit-Comment-Date: Sat, 24 Aug 2019 00:09:27 +0000
Gerrit-HasComments: Yes
