This is an automated email from the ASF dual-hosted git repository.

joemcdonnell pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/impala.git

commit adfa6c83ec8ca371798b17021d49a8eacd954faf
Author: Csaba Ringhofer <csringho...@cloudera.com>
AuthorDate: Thu Jun 1 11:52:23 2023 +0200

    IMPALA-12142: Decrease default fetch_size to 8192 in impala-shell
    
    The previous fetch_size of 10240 turned out to be suboptimal for HS2
    server side, likely because it leads to overallocation in result
    'std::vector's. Changed to the closest power of 2 size (8192).
    
    With this change RowMaterializationTimer decreased from 3.4s to 2.7s
    for "SELECT * FROM tpch_parquet.lineitem".
    
    Change-Id: I34973cb705db53c496b9944c74995b45cf720d46
    Reviewed-on: http://gerrit.cloudera.org:8080/19965
    Reviewed-by: Kurt Deschler <kdesc...@cloudera.com>
    Reviewed-by: Daniel Becker <daniel.bec...@cloudera.com>
    Tested-by: Impala Public Jenkins <impala-public-jenk...@cloudera.com>
---
 shell/option_parser.py | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/shell/option_parser.py b/shell/option_parser.py
index 75ede820a..1fc6fde2d 100755
--- a/shell/option_parser.py
+++ b/shell/option_parser.py
@@ -320,7 +320,7 @@ def get_option_parser(defaults):
                     "enforce any http path for the incoming requests, deployments could "
                     "still put it behind a loadbalancer that can expect the traffic at a "
                     "certain path.")
-  parser.add_option("--fetch_size", type="int", dest="fetch_size", default=10240,
+  parser.add_option("--fetch_size", type="int", dest="fetch_size", default=8192,
                     help="The fetch size when fetching rows from the Impala coordinator. "
                     "The fetch size controls how many rows a single fetch RPC request "
                     "(RPC from the Impala shell to the Impala coordinator) reads at a "
@@ -328,7 +328,7 @@ def get_option_parser(defaults):
                     "('spool_query_results'=true). When result spooling is enabled "
                     "values over the batch_size are honored. When result spooling is "
                     "disabled, values over the batch_size have no affect. By default, "
-                    "the fetch_size is set to 10240 which is equivalent to 10 row "
+                    "the fetch_size is set to 8192 which is equivalent to 8 row "
                     "batches (assuming the default batch size). Note that if result "
                     "spooling is disabled only a single row batch can be fetched at a "
                     "time regardless of the specified fetch_size.")

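The relationship between fetch_size, batch_size, and result spooling described in the help text can be sketched as follows. This is an illustration only, not code from impala-shell: the helper names are hypothetical, and the default batch size of 1024 is an assumption inferred from the help text ("8192 which is equivalent to 8 row batches").

```python
# Sketch only: DEFAULT_BATCH_SIZE is inferred from the help text
# (8192 rows == 8 row batches), not read from Impala itself.
DEFAULT_BATCH_SIZE = 1024


def rows_per_fetch(fetch_size, batch_size=DEFAULT_BATCH_SIZE, spooling=True):
    """Upper bound on rows a single fetch RPC can return (hypothetical helper).

    With result spooling enabled, a fetch_size above batch_size is honored.
    With spooling disabled, only one row batch is fetched at a time, so
    anything above batch_size has no effect.
    """
    if spooling:
        return fetch_size
    return min(fetch_size, batch_size)


def is_power_of_two(n):
    """True when n is a positive power of two, such as the new default."""
    return n > 0 and (n & (n - 1)) == 0


if __name__ == "__main__":
    for size in (10240, 8192):
        print(size,
              size // DEFAULT_BATCH_SIZE,
              is_power_of_two(size),
              rows_per_fetch(size, spooling=False))
```

Running the script prints one line for the old default and one for the new default: the number of default-sized row batches each covers, whether it is a power of two, and the effective limit when result spooling is disabled.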