[Impala-ASF-CR] IMPALA-11120: Fix codec not set in generating ORC tables

2022-02-28 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/18228 )

Change subject: IMPALA-11120: Fix codec not set in generating ORC tables
..


Patch Set 3: Verified-1

Build failed: https://jenkins.impala.io/job/gerrit-verify-dryrun/7888/


--
To view, visit http://gerrit.cloudera.org:8080/18228
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I02bd5d9400864145133ff019a3d076a6cab36fcc
Gerrit-Change-Number: 18228
Gerrit-PatchSet: 3
Gerrit-Owner: Quanlong Huang 
Gerrit-Reviewer: Andrew Sherman 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Comment-Date: Tue, 01 Mar 2022 06:57:43 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-11120: Fix codec not set in generating ORC tables

2022-02-28 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/18228 )

Change subject: IMPALA-11120: Fix codec not set in generating ORC tables
..


Patch Set 3: Code-Review+2


--
To view, visit http://gerrit.cloudera.org:8080/18228
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I02bd5d9400864145133ff019a3d076a6cab36fcc
Gerrit-Change-Number: 18228
Gerrit-PatchSet: 3
Gerrit-Owner: Quanlong Huang 
Gerrit-Reviewer: Andrew Sherman 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Comment-Date: Tue, 01 Mar 2022 02:12:55 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-11120: Fix codec not set in generating ORC tables

2022-02-28 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/18228 )

Change subject: IMPALA-11120: Fix codec not set in generating ORC tables
..


Patch Set 3:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/7888/ 
DRY_RUN=false


--
To view, visit http://gerrit.cloudera.org:8080/18228
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I02bd5d9400864145133ff019a3d076a6cab36fcc
Gerrit-Change-Number: 18228
Gerrit-PatchSet: 3
Gerrit-Owner: Quanlong Huang 
Gerrit-Reviewer: Andrew Sherman 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Comment-Date: Tue, 01 Mar 2022 02:12:56 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-11120: Fix codec not set in generating ORC tables

2022-02-28 Thread Quanlong Huang (Code Review)
Quanlong Huang has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/18228 )

Change subject: IMPALA-11120: Fix codec not set in generating ORC tables
..


Patch Set 2:

Thanks, Andrew!


--
To view, visit http://gerrit.cloudera.org:8080/18228
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I02bd5d9400864145133ff019a3d076a6cab36fcc
Gerrit-Change-Number: 18228
Gerrit-PatchSet: 2
Gerrit-Owner: Quanlong Huang 
Gerrit-Reviewer: Andrew Sherman 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Comment-Date: Tue, 01 Mar 2022 02:12:29 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-11120: Fix codec not set in generating ORC tables

2022-02-28 Thread Andrew Sherman (Code Review)
Andrew Sherman has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/18228 )

Change subject: IMPALA-11120: Fix codec not set in generating ORC tables
..


Patch Set 2: Code-Review+2

LGTM


--
To view, visit http://gerrit.cloudera.org:8080/18228
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I02bd5d9400864145133ff019a3d076a6cab36fcc
Gerrit-Change-Number: 18228
Gerrit-PatchSet: 2
Gerrit-Owner: Quanlong Huang 
Gerrit-Reviewer: Andrew Sherman 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Comment-Date: Tue, 01 Mar 2022 01:57:41 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10049: Include RPC call id in slow RPC logs

2022-02-28 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has submitted this change and it was merged. ( 
http://gerrit.cloudera.org:8080/18243 )

Change subject: IMPALA-10049: Include RPC call_id in slow RPC logs
..

IMPALA-10049: Include RPC call_id in slow RPC logs

KRPC logs a slow RPC trace on the receiver side. The trace log has the
call_id info that matches the sender's. However, our slow RPC logging
on the sender side does not log this call_id, so it is hard to associate
the slow RPC logs between sender and receiver.

With the recent KRPC rebase in IMPALA-10931, we can now log the call_id
on the sender side.

Testing:
I tested this with a low threshold and delays added (the same as we did
in IMPALA-9128):

  start-impala-cluster.py \
  --impalad_args=--impala_slow_rpc_threshold_ms=1 \
  --impalad_args=--debug_actions=END_DATA_STREAM_DELAY:JITTER@3000@1.0

The following is what the logs look like on the sender and receiver sides:

impalad_node1.INFO (sender):
I0217 10:29:36.278754  6606 krpc-data-stream-sender.cc:394] Slow TransmitData RPC (request call id 414) to 127.0.0.1:27002 (fragment_instance_id=d8453c2785c38df4:3473e28b0041): took 343.279ms. Receiver time: 342.780ms Network time: 498.405us

impalad_node2.INFO (receiver):
I0217 10:29:36.278379  6775 rpcz_store.cc:269] Call impala.DataStreamService.TransmitData from 127.0.0.1:39702 (request call id 414) took 342ms. Trace:
I0217 10:29:36.278479  6775 rpcz_store.cc:270] 0217 10:29:35.935586 (+     0us) impala-service-pool.cc:179] Inserting onto call queue
0217 10:29:36.277730 (+342144us) impala-service-pool.cc:278] Handling call
0217 10:29:36.277859 (+   129us) krpc-data-stream-recvr.cc:397] Deserializing batch
0217 10:29:36.278330 (+   471us) krpc-data-stream-recvr.cc:424] Enqueuing deserialized batch
0217 10:29:36.278369 (+    39us) inbound_call.cc:171] Queueing success response
Metrics: {}
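
As an illustration of why the shared call id helps, the following is a minimal sketch (not part of the patch) that pairs sender and receiver log lines by the "request call id" marker seen in the samples above. The function names are hypothetical, and the sketch assumes the call id is unique within the logs being examined; a real tool would probably also want to compare addresses and timestamps.

```python
import re

# Matches the "(request call id N)" marker that appears in both sample logs.
CALL_ID_RE = re.compile(r"request call id (\d+)")


def extract_call_id(log_line):
    """Return the call id found in a log line as an int, or None if absent."""
    match = CALL_ID_RE.search(log_line)
    return int(match.group(1)) if match else None


def correlate(sender_lines, receiver_lines):
    """Pair each sender line with the receiver lines sharing its call id."""
    receiver_by_id = {}
    for line in receiver_lines:
        call_id = extract_call_id(line)
        if call_id is not None:
            receiver_by_id.setdefault(call_id, []).append(line)
    pairs = []
    for line in sender_lines:
        call_id = extract_call_id(line)
        if call_id in receiver_by_id:
            pairs.append((call_id, line, receiver_by_id[call_id]))
    return pairs


# Sample lines copied from the commit message above.
sender_log = [
    "I0217 10:29:36.278754  6606 krpc-data-stream-sender.cc:394] Slow "
    "TransmitData RPC (request call id 414) to 127.0.0.1:27002 "
    "(fragment_instance_id=d8453c2785c38df4:3473e28b0041): took 343.279ms. "
    "Receiver time: 342.780ms Network time: 498.405us",
]
receiver_log = [
    "I0217 10:29:36.278379  6775 rpcz_store.cc:269] Call "
    "impala.DataStreamService.TransmitData from 127.0.0.1:39702 "
    "(request call id 414) took 342ms. Trace:",
]

for call_id, sender_line, receiver_lines in correlate(sender_log, receiver_log):
    print(call_id, len(receiver_lines))
```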

Change-Id: I7fb5746fa0be575745a8e168405d43115c425389
Reviewed-on: http://gerrit.cloudera.org:8080/18243
Reviewed-by: Wenzhe Zhou 
Tested-by: Impala Public Jenkins 
---
M be/src/runtime/krpc-data-stream-sender.cc
1 file changed, 2 insertions(+), 1 deletion(-)

Approvals:
  Wenzhe Zhou: Looks good to me, approved
  Impala Public Jenkins: Verified

--
To view, visit http://gerrit.cloudera.org:8080/18243
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: I7fb5746fa0be575745a8e168405d43115c425389
Gerrit-Change-Number: 18243
Gerrit-PatchSet: 8
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Reviewer: Wenzhe Zhou 


[Impala-ASF-CR] IMPALA-10049: Include RPC call id in slow RPC logs

2022-02-28 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/18243 )

Change subject: IMPALA-10049: Include RPC call_id in slow RPC logs
..


Patch Set 7: Verified+1


--
To view, visit http://gerrit.cloudera.org:8080/18243
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I7fb5746fa0be575745a8e168405d43115c425389
Gerrit-Change-Number: 18243
Gerrit-PatchSet: 7
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Reviewer: Wenzhe Zhou 
Gerrit-Comment-Date: Mon, 28 Feb 2022 23:07:14 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10992 Planner changes for estimate peak memory

2022-02-28 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/18178 )

Change subject: IMPALA-10992 Planner changes for estimate peak memory
..


Patch Set 14:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/10237/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/18178
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I75cf17290be2c64fd4b732a5505bdac31869712a
Gerrit-Change-Number: 18178
Gerrit-PatchSet: 14
Gerrit-Owner: Qifan Chen 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Wenzhe Zhou 
Gerrit-Comment-Date: Mon, 28 Feb 2022 21:17:27 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10992 Planner changes for estimate peak memory

2022-02-28 Thread Qifan Chen (Code Review)
Qifan Chen has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/18178 )

Change subject: IMPALA-10992 Planner changes for estimate peak memory
..


Patch Set 14:

(16 comments)

http://gerrit.cloudera.org:8080/#/c/18178/13//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/18178/13//COMMIT_MSG@9
PS13, Line 9: executor group
> nit: multiple executor group sets.
Done


http://gerrit.cloudera.org:8080/#/c/18178/13//COMMIT_MSG@74
PS13, Line 74:  Almost all FE and BE tests are now run in the artificial two
 : executor setup except a few where a specific cluster configuration
 : is desirable;
> Please see my comment in Frontend.java about how we can ensure re-planning
Done


http://gerrit.cloudera.org:8080/#/c/18178/13/common/thrift/Frontend.thrift
File common/thrift/Frontend.thrift:

http://gerrit.cloudera.org:8080/#/c/18178/13/common/thrift/Frontend.thrift@729
PS13, Line 729: // The optional threshold to determine which executor group set 
t
> nit: can you provide more context here as to what this threshold signifies,
Done


http://gerrit.cloudera.org:8080/#/c/18178/13/fe/src/main/java/org/apache/impala/service/Frontend.java
File fe/src/main/java/org/apache/impala/service/Frontend.java:

http://gerrit.cloudera.org:8080/#/c/18178/13/fe/src/main/java/org/apache/impala/service/Frontend.java@265
PS13, Line 265: // An inner class to capture the state of compilation for auto-scaling.
  : final class AutoScalingCompilationState {
> nit: would it make sense to put this inside a separate
Moved all the new data members and function members into a new inner class 
called AutoScalingCompilationState.


http://gerrit.cloudera.org:8080/#/c/18178/13/fe/src/main/java/org/apache/impala/service/Frontend.java@284
PS13, Line 284:   // Set when the query is compiled against the 1st group set 
inside
> nit: mention when it is set and when can it be reset
Done


http://gerrit.cloudera.org:8080/#/c/18178/13/fe/src/main/java/org/apache/impala/service/Frontend.java@335
PS13, Line 335: le) in next
> nit: not sure auto-scaling is the right term here, since we are not scaling
I used the name AutoScalingCompilationState to capture the data structures and 
methods. The nature of all of it is to facilitate auto-scaling in BE.

So in this sense, the work is still about auto-scaling.


http://gerrit.cloudera.org:8080/#/c/18178/13/fe/src/main/java/org/apache/impala/service/Frontend.java@1757
PS13, Line 1757: to the max_query_mem_limit from the pool
   :* service for the
> nit: would be good to explain why a group can be classified as useless and
Done


http://gerrit.cloudera.org:8080/#/c/18178/13/fe/src/main/java/org/apache/impala/service/Frontend.java@1776
PS13, Line 1776:  else if (test_replan) {
   : ExecutorMembershipSnapshot cluster = ExecutorMembershipSnapshot.getCluster();
   : int num_nodes = cluster.numExecutors();
   : // Form a two-executor group testing environment so that we can exercise
   : // auto-scaling logic (see getTExecRequest() in Frontend.java).
   : TExecutorGroupSet r = new TExecutorGroupSet(num_nodes, num_nodes, "small");
   : r.setThreshold(64*MEGABYTE);
   : result.add(r);
   : TExecutorGroupSet l = new TExecutorGroupSet(e);
   : Preconditions.
> if we want to emulate a 2 exec group set configuration, what if we only add
Done


http://gerrit.cloudera.org:8080/#/c/18178/13/fe/src/main/java/org/apache/impala/service/Frontend.java@1788
PS13, Line 1788:
> nit: can just use 'e' here too.
Done


http://gerrit.cloudera.org:8080/#/c/18178/13/fe/src/main/java/org/apache/impala/service/Frontend.java@1799
PS13, Line 1799: f defined, r
> if we use query_exec_request.query_ctx.request_pool instead of queryOptions
query_exec_request.query_ctx.request_pool is set after we have decided the 
group set to use for the query. Here we establish a list of such group set 
candidates.


http://gerrit.cloudera.org:8080/#/c/18178/13/fe/src/main/java/org/apache/impala/service/Frontend.java@1816
PS13, Line 1816: result.add(new_entry);
   : }
> nit: how about: Request pool:  does not map to any known executo
Done


http://gerrit.cloudera.org:8080/#/c/18178/13/fe/src/main/java/org/apache/impala/service/Frontend.java@1929
PS13, Line 1929:   }
   :
   :   // Find out the per host memory estimated from two possible sources.
   :   per_host_mem_estimate = -1;
   :   if (req.query_exec_request != null) {
   :
> when would either case be used? as in, why would query_exec_request not be
Done


http://gerrit.cloudera.org:8080/#/c/18178/13/fe/src/main/java/org/apache/impala/service/Frontend.java@1937
PS13, Line 

[Impala-ASF-CR] IMPALA-10992 Planner changes for estimate peak memory

2022-02-28 Thread Qifan Chen (Code Review)
Qifan Chen has uploaded a new patch set (#14). ( 
http://gerrit.cloudera.org:8080/18178 )

Change subject: IMPALA-10992 Planner changes for estimate peak memory
..

IMPALA-10992 Planner changes for estimate peak memory

This patch provides replan support for multiple executor group sets.
Each executor group set is associated with a distinct number of nodes
and a threshold for estimated memory per host in bytes that can be
denoted as [:<#nodes>, ].

In the patch, a query of type EXPLAIN, QUERY or DML can be compiled
more than once. In each attempt, per host memory is estimated and
compared with the threshold of an executor group set. If the estimated
memory is no more than the threshold, the iteration process terminates
and the final plan is determined. The executor group set with the
threshold is selected to run the query.
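
The iteration described above can be sketched as follows. This is an illustrative Python model only, not the Frontend.java code: ExecutorGroupSet, select_group_set and the estimate callback are hypothetical names, the node count of 3 is made up, and the behaviour when no threshold fits is not described in this message, so the sketch simply returns None in that case.

```python
from dataclasses import dataclass


@dataclass
class ExecutorGroupSet:
    # Hypothetical stand-in for an executor group set and its memory threshold.
    name: str
    num_nodes: int
    mem_threshold_bytes: int


def select_group_set(group_sets, estimate_per_host_mem):
    """Try each group set in order and stop at the first whose threshold fits.

    estimate_per_host_mem is a hypothetical callback standing in for one
    compilation attempt: it takes a group set and returns the estimated
    per-host memory in bytes. Returns (chosen group set or None, attempts).
    """
    attempts = 0
    for group_set in group_sets:
        attempts += 1
        estimate = estimate_per_host_mem(group_set)
        # "No more than the threshold" ends the iteration.
        if estimate <= group_set.mem_threshold_bytes:
            return group_set, attempts
    # Assumption: the source does not say what happens when nothing fits.
    return None, attempts


MB = 1024 ** 2
PB = 1024 ** 5
# Thresholds mirror the artificial test setup in this commit message.
group_sets = [
    ExecutorGroupSet("regular", 3, 64 * MB),
    ExecutorGroupSet("large", 3, 8 * PB),
]

# A query estimated at 408MB per host does not fit "regular", so it is
# compiled a second time and lands on "large".
chosen, attempts = select_group_set(group_sets, lambda group_set: 408 * MB)
print(chosen.name, attempts)  # prints "large 2"
```

Ordering the group sets from smallest to largest threshold is what lets small queries finish after a single compilation in this model.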

A new query option 'enable_replan', which defaults to 1 (enabled), is
added. It can be set to 0 to disable this patch and to generate the
distributed plan for the default executor group.

To avoid long compilation times, the following enhancements are enabled.
Note that 1) and 2) can be disabled when a relevant metadata change is
detected.

 1. Authorization is performed only for the 1st compilation;
 2. The needed meta-data is fetched into a StmtTableCache in 1st
compilation and reused in subsequent compilations;
 3. openTransaction() is called for transactional queries in 1st
compilation and the saved transactional info is used in
subsequent compilations. Similar logic is applied to Kudu
transactional queries.

To facilitate testing, the patch imposes an artificial two executor
group setup in FE as follows.

 1. [regular:<#nodes>, 64MB]
 2. [large:<#nodes>, 8PB]

This setup is enabled when a new query option 'test_replan' is set
to 1 in backend tests, or RuntimeEnv.INSTANCE.isTestEnv() is true as
in most frontend tests. This query option is set to 0 by default.

Compilation time increases when a query is compiled in several
iterations, as shown below for several TPC-DS queries. The increase
is mostly due to redundant work in either the single node plan creation
or the value transfer graph recomputation phase. For small queries, the
increase can be avoided if they can be compiled in a single iteration
by properly setting the smallest threshold among all executor group
sets. For example, for the set of queries listed below, the smallest
threshold can be set to 320MB to catch both q15 and q21 in one
compilation.

                           Compilation time (ms)
Queries  Estimated Memory  2-iterations  1-iteration  Percentage of increase
q1                  408MB         18.32        13.01                  40.81%
q11                1.37GB        186.17        86.28                 115.77%
q10a                519MB        108.27        53.58                 102.07%
q13                 339MB        118.03        82.43                  43.19%
q14a               3.56GB        628.27       307.24                 104.49%
q14b               2.20GB        518.79       239.05                 117.02%
q15                 314MB         13.12         4.51                 190.91%
q21                 275MB         11.04         6.34                  74.13%
q23a               1.34GB         458.7       227.62                 101.52%
q23b               1.50GB        471.29       224.75                 109.70%
q4                 2.60GB        206.34        98.64                 109.18%
q67                5.16GB        691.45       336.31                 105.60%
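
For reference, the last column appears to be the relative increase of the two-iteration time over the one-iteration time. The short check below recomputes it for three rows of the table; the function name is hypothetical and the formula is inferred from the numbers rather than stated in the patch.

```python
def percent_increase(two_iterations_ms, one_iteration_ms):
    """Relative increase of the 2-iteration time over the 1-iteration time."""
    return (two_iterations_ms - one_iteration_ms) / one_iteration_ms * 100.0


# (2-iterations ms, 1-iteration ms) taken from the table above.
rows = {
    "q1": (18.32, 13.01),
    "q15": (13.12, 4.51),
    "q21": (11.04, 6.34),
}

for query, (two_iter, one_iter) in rows.items():
    print(f"{query}: {percent_increase(two_iter, one_iter):.2f}%")
```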

Testing:
 1. Almost all FE and BE tests are now run in the artificial two
executor setup except a few where a specific cluster configuration
is desirable;
 2. Ran core tests successfully;
 3. Added a new observability test and a test to explicitly ensure
replan takes place among two group sets.

Change-Id: I75cf17290be2c64fd4b732a5505bdac31869712a
---
M be/src/service/query-options.cc
M be/src/service/query-options.h
M common/thrift/Frontend.thrift
M common/thrift/ImpalaService.thrift
M common/thrift/Query.thrift
M fe/src/main/java/org/apache/impala/analysis/AnalysisContext.java
M fe/src/main/java/org/apache/impala/analysis/Analyzer.java
M fe/src/main/java/org/apache/impala/planner/HBaseScanNode.java
M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
M fe/src/main/java/org/apache/impala/planner/KuduScanNode.java
M fe/src/main/java/org/apache/impala/planner/ResourceProfileBuilder.java
M fe/src/main/java/org/apache/impala/service/Frontend.java
M fe/src/main/java/org/apache/impala/util/ClassUtil.java
M fe/src/main/java/org/apache/impala/util/ExecutorMembershipSnapshot.java
M fe/src/test/java/org/apache/impala/common/QueryFixture.java
M fe/src/test/java/org/apache/impala/planner/ClusterSizeTest.java
M tests/common/test_dimensions.py
M tests/custom_cluster/test_admission_controller.py
M tests/custom_cluster/test_coordinators.py
M tests/custom_cluster/test_executor_groups.py
M tests/query_test/test_observability.py
21 files changed, 533 insertions(+), 70 

[Impala-ASF-CR] IMPALA-10049: Include RPC call id in slow RPC logs

2022-02-28 Thread Wenzhe Zhou (Code Review)
Wenzhe Zhou has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/18243 )

Change subject: IMPALA-10049: Include RPC call_id in slow RPC logs
..


Patch Set 7: Code-Review+2


--
To view, visit http://gerrit.cloudera.org:8080/18243
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I7fb5746fa0be575745a8e168405d43115c425389
Gerrit-Change-Number: 18243
Gerrit-PatchSet: 7
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Reviewer: Wenzhe Zhou 
Gerrit-Comment-Date: Mon, 28 Feb 2022 18:21:59 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10049: Include RPC call id in slow RPC logs

2022-02-28 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/18243 )

Change subject: IMPALA-10049: Include RPC call_id in slow RPC logs
..


Patch Set 7:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/7887/ 
DRY_RUN=false


--
To view, visit http://gerrit.cloudera.org:8080/18243
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I7fb5746fa0be575745a8e168405d43115c425389
Gerrit-Change-Number: 18243
Gerrit-PatchSet: 7
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Reviewer: Wenzhe Zhou 
Gerrit-Comment-Date: Mon, 28 Feb 2022 18:23:06 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10049: Include RPC call id in slow RPC logs

2022-02-28 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/18243 )

Change subject: IMPALA-10049: Include RPC call_id in slow RPC logs
..


Patch Set 7:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/10236/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/18243
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I7fb5746fa0be575745a8e168405d43115c425389
Gerrit-Change-Number: 18243
Gerrit-PatchSet: 7
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Reviewer: Wenzhe Zhou 
Gerrit-Comment-Date: Mon, 28 Feb 2022 18:22:26 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10049: Include RPC call id in slow RPC logs

2022-02-28 Thread Riza Suminto (Code Review)
Hello Joe McDonnell, Wenzhe Zhou, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/18243

to look at the new patch set (#7).

Change subject: IMPALA-10049: Include RPC call_id in slow RPC logs
..

IMPALA-10049: Include RPC call_id in slow RPC logs

KRPC logs a slow RPC trace on the receiver side. The trace log has the
call_id info that matches the sender's. However, our slow RPC logging
on the sender side does not log this call_id, so it is hard to associate
the slow RPC logs between sender and receiver.

With the recent KRPC rebase in IMPALA-10931, we can now log the call_id
on the sender side.

Testing:
I tested this with a low threshold and delays added (the same as we did
in IMPALA-9128):

  start-impala-cluster.py \
  --impalad_args=--impala_slow_rpc_threshold_ms=1 \
  --impalad_args=--debug_actions=END_DATA_STREAM_DELAY:JITTER@3000@1.0

The following is what the logs look like on the sender and receiver sides:

impalad_node1.INFO (sender):
I0217 10:29:36.278754  6606 krpc-data-stream-sender.cc:394] Slow TransmitData RPC (request call id 414) to 127.0.0.1:27002 (fragment_instance_id=d8453c2785c38df4:3473e28b0041): took 343.279ms. Receiver time: 342.780ms Network time: 498.405us

impalad_node2.INFO (receiver):
I0217 10:29:36.278379  6775 rpcz_store.cc:269] Call impala.DataStreamService.TransmitData from 127.0.0.1:39702 (request call id 414) took 342ms. Trace:
I0217 10:29:36.278479  6775 rpcz_store.cc:270] 0217 10:29:35.935586 (+     0us) impala-service-pool.cc:179] Inserting onto call queue
0217 10:29:36.277730 (+342144us) impala-service-pool.cc:278] Handling call
0217 10:29:36.277859 (+   129us) krpc-data-stream-recvr.cc:397] Deserializing batch
0217 10:29:36.278330 (+   471us) krpc-data-stream-recvr.cc:424] Enqueuing deserialized batch
0217 10:29:36.278369 (+    39us) inbound_call.cc:171] Queueing success response
Metrics: {}

Change-Id: I7fb5746fa0be575745a8e168405d43115c425389
---
M be/src/runtime/krpc-data-stream-sender.cc
1 file changed, 2 insertions(+), 1 deletion(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/43/18243/7
--
To view, visit http://gerrit.cloudera.org:8080/18243
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I7fb5746fa0be575745a8e168405d43115c425389
Gerrit-Change-Number: 18243
Gerrit-PatchSet: 7
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Reviewer: Wenzhe Zhou 


[Impala-ASF-CR] IMPALA-10049: Include RPC call id in slow RPC logs

2022-02-28 Thread Riza Suminto (Code Review)
Riza Suminto has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/18243 )

Change subject: IMPALA-10049: Include RPC call_id in slow RPC logs
..


Patch Set 6:

> Patch Set 6: Verified-1
>
> Build failed: https://jenkins.impala.io/job/gerrit-verify-dryrun/7886/

This patch failed test_exchange_small_delay.
The call_id() that is being logged is retrieved from the KRPC response message.
However, test_exchange_small_delay depends on the receiver timing out and not 
sending any response message.
We need to revert the logging at LogSlowFailedRpc().


--
To view, visit http://gerrit.cloudera.org:8080/18243
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I7fb5746fa0be575745a8e168405d43115c425389
Gerrit-Change-Number: 18243
Gerrit-PatchSet: 6
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Reviewer: Wenzhe Zhou 
Gerrit-Comment-Date: Mon, 28 Feb 2022 18:00:33 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10992 Planner changes for estimate peak memory - v1

2022-02-28 Thread Qifan Chen (Code Review)
Qifan Chen has abandoned this change. ( http://gerrit.cloudera.org:8080/18143 )

Change subject: IMPALA-10992 Planner changes for estimate peak memory - v1
..


Abandoned

This version is a draft of https://gerrit.cloudera.org/#/c/18178/.
--
To view, visit http://gerrit.cloudera.org:8080/18143
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: abandon
Gerrit-Change-Id: Ibe71f905d6a8c1e42cf951b3a69ff33b81277c24
Gerrit-Change-Number: 18143
Gerrit-PatchSet: 29
Gerrit-Owner: Qifan Chen 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Wenzhe Zhou 


[Impala-ASF-CR] IMPALA-11133 (Addendum): Encode a string in utf8 before printing it

2022-02-28 Thread Laszlo Gaal (Code Review)
Laszlo Gaal has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/18270 )

Change subject: IMPALA-11133 (Addendum): Encode a string in utf8 before 
printing it
..


Patch Set 2: Code-Review+2


--
To view, visit http://gerrit.cloudera.org:8080/18270
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Iad9b1fb0a523e219bc9f40a57ff7335808be283f
Gerrit-Change-Number: 18270
Gerrit-PatchSet: 2
Gerrit-Owner: Fang-Yu Rao 
Gerrit-Reviewer: Fang-Yu Rao 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Laszlo Gaal 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Comment-Date: Mon, 28 Feb 2022 16:34:34 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-9433: Improved caching of HdfsFileHandles

2022-02-28 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/18191 )

Change subject: IMPALA-9433: Improved caching of HdfsFileHandles
..


Patch Set 26:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/10235/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/18191
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I6b5c5e9e2b5db2847ab88c41f667c9ca1b03d51a
Gerrit-Change-Number: 18191
Gerrit-PatchSet: 26
Gerrit-Owner: Gergely Fürnstáhl 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Gergely Fürnstáhl 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Mon, 28 Feb 2022 14:34:10 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-9433: Improved caching of HdfsFileHandles

2022-02-28 Thread Gergely Fürnstáhl (Code Review)
Gergely Fürnstáhl has uploaded a new patch set (#26). ( 
http://gerrit.cloudera.org:8080/18191 )

Change subject: IMPALA-9433: Improved caching of HdfsFileHandles
..

IMPALA-9433: Improved caching of HdfsFileHandles

Separated LRU caching functionality into a templated LruMultiCache class.

Replaced std::multimap with std::unordered_map with std::list for O(1)
lookups and less memory overhead, as it stores each key one time. Added
boost::intrusive::list to handle LRU relations with less overhead.
Added O(1) release method, instead of O(n) with minimal memory overhead.
Implemented RAII Accessor to remove the responsibility of releasing
the objects from the user.
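
To illustrate the shape of the design described above, here is a rough Python model; it is not the C++ LruMultiCache. A dict of deques stands in for the unordered_map of lists, an OrderedDict stands in for the intrusive LRU list, and a context manager plays the role of the RAII accessor. All names are hypothetical, and the eviction and capacity-overshoot behaviour are assumptions made for the sketch.

```python
from collections import OrderedDict, deque


class _Entry:
    """One cached object together with the key it was created for."""

    def __init__(self, key, value):
        self.key = key
        self.value = value


class Accessor:
    """Hands out one object and returns it to the cache when the block exits."""

    def __init__(self, cache, entry):
        self._cache = cache
        self._entry = entry

    def get(self):
        return self._entry.value

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        self._cache._release(self._entry)
        return False  # never swallow exceptions


class LruMultiCacheSketch:
    def __init__(self, capacity, factory):
        self._capacity = capacity
        self._factory = factory      # creates a new object for a key
        self._idle_by_key = {}       # key -> deque of idle entries
        self._lru = OrderedDict()    # id(entry) -> idle entry, oldest first
        self._size = 0               # objects owned: in use plus idle

    def get(self, key):
        idle = self._idle_by_key.get(key)
        if idle:
            entry = idle.pop()       # reuse the most recently released one
            del self._lru[id(entry)]
        else:
            entry = _Entry(key, self._factory(key))
            self._size += 1
            self._evict_if_needed()
        return Accessor(self, entry)

    def _release(self, entry):
        self._idle_by_key.setdefault(entry.key, deque()).append(entry)
        self._lru[id(entry)] = entry
        self._evict_if_needed()

    def _evict_if_needed(self):
        # Only idle objects can be evicted, so the cache may temporarily
        # exceed its capacity while every object is in use.
        while self._size > self._capacity and self._lru:
            _, victim = self._lru.popitem(last=False)
            # The globally oldest idle entry is also the oldest for its key.
            self._idle_by_key[victim.key].popleft()
            self._size -= 1


opened = []


def open_handle(path):
    # Hypothetical factory that records every "open" it performs.
    opened.append(path)
    return f"handle#{len(opened)} for {path}"


cache = LruMultiCacheSketch(capacity=2, factory=open_handle)
with cache.get("/data/a") as accessor:
    first = accessor.get()
with cache.get("/data/a") as accessor:
    second = accessor.get()
print(first == second, len(opened))
```

Because release happens in the accessor's exit hook, callers never call a release method themselves, which is the responsibility the RAII accessor is meant to remove.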

Wrapped cache accessor and related DiskIOManager metrics into a
FileHandleCache::Accessor. Removed Release*() call trees from
FileHandleCache and DiskIOManager, removed scoped exit from
HdfsFileReader as they are handled automatically.

Testing:

Implemented extensive unit testing of the class, including forced
rehashes, collisions, capacity overshoot, explicit/automatic release
and destroy.

Ran tests/custom_cluster/test_hdfs_fd_caching.py to verify
FileHandleCache::Accessor behaviour through metrics.

Ran bin/single_node_perf_run.py with TPCH and TPC-DS on parquet tables,
no visible change in performance:
TPCH   scale=10 iterations=100: Delta(Avg)=-0.67% Delta(GeoMean)=-0.49%
TPC-DS scale=10 iterations= 50: Delta(Avg)=-0.02% Delta(GeoMean)= 0.00%

Tested some manual queries on functional_parquet.widetable_1000_cols
with 64 threads but did not notice significant changes in scan times.

Change-Id: I6b5c5e9e2b5db2847ab88c41f667c9ca1b03d51a
---
M be/src/runtime/io/disk-io-mgr.cc
M be/src/runtime/io/disk-io-mgr.h
M be/src/runtime/io/handle-cache.h
M be/src/runtime/io/handle-cache.inline.h
M be/src/runtime/io/hdfs-file-reader.cc
M be/src/util/CMakeLists.txt
A be/src/util/lru-multi-cache-test.cc
A be/src/util/lru-multi-cache.h
A be/src/util/lru-multi-cache.inline.h
9 files changed, 1,175 insertions(+), 274 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/91/18191/26
--
To view, visit http://gerrit.cloudera.org:8080/18191
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I6b5c5e9e2b5db2847ab88c41f667c9ca1b03d51a
Gerrit-Change-Number: 18191
Gerrit-PatchSet: 26
Gerrit-Owner: Gergely Fürnstáhl 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Gergely Fürnstáhl 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Zoltan Borok-Nagy 


[Impala-ASF-CR] IMPALA-9433: Improved caching of HdfsFileHandles

2022-02-28 Thread Zoltan Borok-Nagy (Code Review)
Zoltan Borok-Nagy has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/18191 )

Change subject: IMPALA-9433: Improved caching of HdfsFileHandles
..


Patch Set 25: Code-Review+1

(5 comments)

The code looks awesome!

http://gerrit.cloudera.org:8080/#/c/18191/25/be/src/runtime/io/handle-cache.inline.h
File be/src/runtime/io/handle-cache.inline.h:

http://gerrit.cloudera.org:8080/#/c/18191/25/be/src/runtime/io/handle-cache.inline.h@80
PS25, Line 80:   if (cache_accessor_.Get())
 : ImpaladMetrics::IO_MGR_NUM_FILE_HANDLES_OUTSTANDING->Increment(1L);
nit: multi-line if stmt needs braces


http://gerrit.cloudera.org:8080/#/c/18191/25/be/src/util/lru-multi-cache.h
File be/src/util/lru-multi-cache.h:

http://gerrit.cloudera.org:8080/#/c/18191/25/be/src/util/lru-multi-cache.h@53
PS25, Line 53: deigned
nit: designed


http://gerrit.cloudera.org:8080/#/c/18191/25/be/src/util/lru-multi-cache.h@61
PS25, Line 61:
nit: extra space


http://gerrit.cloudera.org:8080/#/c/18191/25/be/src/util/lru-multi-cache.h@74
PS25, Line 74: LRU order
Please mention that least recently used elements are at the front.


http://gerrit.cloudera.org:8080/#/c/18191/25/be/src/util/lru-multi-cache.inline.h
File be/src/util/lru-multi-cache.inline.h:

http://gerrit.cloudera.org:8080/#/c/18191/25/be/src/util/lru-multi-cache.inline.h@214
PS25, Line 214: in_use
Do we need 'in_use'? Can't we just use 'member_hook.is_linked()' instead?

Maybe we just need a member function InUse(): return !member_hook.is_linked();



--
To view, visit http://gerrit.cloudera.org:8080/18191
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I6b5c5e9e2b5db2847ab88c41f667c9ca1b03d51a
Gerrit-Change-Number: 18191
Gerrit-PatchSet: 25
Gerrit-Owner: Gergely Fürnstáhl 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Gergely Fürnstáhl 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Mon, 28 Feb 2022 11:21:41 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-10049: Include RPC call id in slow RPC logs

2022-02-28 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/18243 )

Change subject: IMPALA-10049: Include RPC call_id in slow RPC logs
..


Patch Set 6: Verified-1

Build failed: https://jenkins.impala.io/job/gerrit-verify-dryrun/7886/


--
To view, visit http://gerrit.cloudera.org:8080/18243
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I7fb5746fa0be575745a8e168405d43115c425389
Gerrit-Change-Number: 18243
Gerrit-PatchSet: 6
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Reviewer: Wenzhe Zhou 
Gerrit-Comment-Date: Mon, 28 Feb 2022 09:46:52 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-11053: Impala should be able to read migrated partitioned Iceberg tables

2022-02-28 Thread Tamas Mate (Code Review)
Tamas Mate has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/18240 )

Change subject: IMPALA-11053: Impala should be able to read migrated 
partitioned Iceberg tables
..


Patch Set 4: Code-Review+1

Thank you for the update Zoltan, this change looks nice!
LGTM!


--
To view, visit http://gerrit.cloudera.org:8080/18240
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Iac11a02de709d43532056f71359c49d20c1be2b8
Gerrit-Change-Number: 18240
Gerrit-PatchSet: 4
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Gergely Fürnstáhl 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Tamas Mate 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Mon, 28 Feb 2022 08:32:01 +
Gerrit-HasComments: No