Sahil Takiar has posted comments on this change. ( http://gerrit.cloudera.org:8080/14129 )
Change subject: IMPALA-8819: BufferedPlanRootSink should handle non-default fetch sizes
......................................................................

Patch Set 2:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/14129/2/be/src/exec/buffered-plan-root-sink.cc
File be/src/exec/buffered-plan-root-sink.cc:

http://gerrit.cloudera.org:8080/#/c/14129/2/be/src/exec/buffered-plan-root-sink.cc@143
PS2, Line 143:   // If 'num_results' <= 0 then by default fetch BATCH_SIZE rows.
> Another thought about the batch sizes...

I did some profiling, and I think the answer is that it depends on the network between the client and the server. I ran a few experiments comparing small vs. large fetch sizes:

* JDBC client running on the same host as the minicluster - didn't see much perf change
* JDBC client running on a different EC2 machine from the minicluster (both in the same region) - depends on the # of rows and # of columns, but makes up to a 20% difference for a full table scan of TPCH 'orders'
* JDBC client running on my laptop against a minicluster running on an EC2 host - makes a huge difference

So I think it really depends on how the client is deployed in relation to the Impala coordinator. I would guess that most clients (e.g. BI tools) would at least run in the same datacenter (or, if on AWS, the same region), but maybe not on the same host.

However, I don't think it hurts performance to increase the default fetch size. The only thing that would be an issue, as you mentioned, would be if the client was fetching faster than Impala was producing rows (maybe that can happen in highly selective scans). As you mentioned below, that might be solvable by IMPALA-7312.

According to https://www.simba.com/products/Impala/doc/v2/ODBC_InstallGuide/win/content/odbc/options/rowsfetchedperblock.htm "testing has shown that performance gains are marginal beyond the default value of 10000 rows" - so maybe around 10 RowBatches is the sweet spot.
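
To make the network dependence concrete, here is a rough back-of-the-envelope sketch of how fetch size translates into round trips and latency overhead. It is not Impala code: the helper names are hypothetical, and it only models per-fetch latency, ignoring transfer time and how fast the coordinator produces rows.

```cpp
#include <cstdint>

// Hypothetical helper (not part of Impala): number of fetch requests a client
// needs to drain 'total_rows' rows when each fetch returns at most
// 'fetch_size' rows. Assumes fetch_size > 0.
int64_t NumFetchRoundTrips(int64_t total_rows, int64_t fetch_size) {
  // Integer ceiling division.
  return (total_rows + fetch_size - 1) / fetch_size;
}

// Hypothetical helper: time spent purely on round-trip latency, in seconds,
// assuming fetches are issued one after another and each costs 'rtt_ms'.
double LatencyOverheadSeconds(int64_t total_rows, int64_t fetch_size,
                              double rtt_ms) {
  return NumFetchRoundTrips(total_rows, fetch_size) * rtt_ms / 1000.0;
}
```

As an illustration with assumed numbers (1,500,000 rows, a BATCH_SIZE of 1024, and a 50 ms round trip; none of these were measured in the experiments above): a fetch size of 1024 needs 1465 round trips, about 73 seconds of latency alone, while a fetch size of 10240 needs 147 round trips, about 7 seconds. With a sub-millisecond round trip on the same host, both cases cost well under 2 seconds, which is consistent with seeing little change there.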

So I think setting the default to 10x BATCH_SIZE would be reasonable?


--
To view, visit http://gerrit.cloudera.org:8080/14129
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I8dd4b397ab6457a4f85e635f239b2c67130fcce4
Gerrit-Change-Number: 14129
Gerrit-PatchSet: 2
Gerrit-Owner: Sahil Takiar <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Michael Ho <[email protected]>
Gerrit-Reviewer: Sahil Takiar <[email protected]>
Gerrit-Reviewer: Tim Armstrong <[email protected]>
Gerrit-Comment-Date: Sat, 24 Aug 2019 00:09:27 +0000
Gerrit-HasComments: Yes
