[Impala-ASF-CR] IMPALA-10337: Consider MAX ROW SIZE when computing max reservation

2020-11-25 Thread Quanlong Huang (Code Review)
Quanlong Huang has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16765 )

Change subject: IMPALA-10337: Consider MAX_ROW_SIZE when computing max 
reservation
..


Patch Set 3:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/16765/3//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/16765/3//COMMIT_MSG@26
PS3, Line 26:   remain unchanged after lowering the MAX_ROW_SIZE.
A better test is using SET_DENY_RESERVATION_PROBABILITY to verify the 
BufferedPlanRootSink can work in minimal reservation. There are some examples 
in query_test/test_spilling.py::TestSpillingDebugActionDimensions. Maybe we can 
add a test in test_spilling_large_rows.


http://gerrit.cloudera.org:8080/#/c/16765/3/fe/src/main/java/org/apache/impala/planner/PlanRootSink.java
File fe/src/main/java/org/apache/impala/planner/PlanRootSink.java:

http://gerrit.cloudera.org:8080/#/c/16765/3/fe/src/main/java/org/apache/impala/planner/PlanRootSink.java@90
PS3, Line 90:   long minMemReservationBytes = 2 * bufferSize;
We should be able to work using the minimal reservation. So I think the problem 
is minMemReservationBytes should be 2 * maxRowBufferSize here, i.e. reserve mem 
for one large read page and one large write page.

The minMemReservation calculation of the final aggregation node is an example:
https://github.com/apache/impala/blob/8ea49e9b026d48b46e9fbd98dc5286f3e6dfa93d/fe/src/main/java/org/apache/impala/planner/AggregationNode.java#L582.
 The final AggregationNode needs n pages, then the min mem reservation is 
bufferSize * (n-2) + maxRowBufferSize * 2.

Here the BufferedPlanRootSink has a SpillableRowBatchQueue which is backed by a 
BufferedTupleStream. The stream is a read-write stream so may pin one read page 
and one write page at the same time, and they could both be a large page. So we 
should at least reserve 2 * maxRowBufferSize of mem for it. With enough 
reservation, batch_queue_->AddRow won't fail and hit the DCHECK:
https://github.com/apache/impala/blob/eea617b/be/src/runtime/spillable-row-batch-queue.cc#L97



--
To view, visit http://gerrit.cloudera.org:8080/16765
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Id7138e1e034ea5d1cd15cf8de399690e52a9d726
Gerrit-Change-Number: 16765
Gerrit-PatchSet: 3
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Comment-Date: Thu, 26 Nov 2020 07:33:28 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-9382: part 2/3: aggregate profiles sent to coordinator

2020-11-25 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has submitted this change and it was merged. ( 
http://gerrit.cloudera.org:8080/16057 )

Change subject: IMPALA-9382: part 2/3: aggregate profiles sent to coordinator
..

IMPALA-9382: part 2/3: aggregate profiles sent to coordinator

This reworks the status reporting so that serialized
AggregatedRuntimeProfile objects are sent from executors
to coordinators. These profiles are substantially denser
and faster to process for higher mt_dop values. The aggregation
is also done in a single step, merging the aggregated thrift
profile from the executor directly into the final aggregated
profile, instead of converting it to an unaggregated profile
first.

The changes required were:
* A new Update() method for AggregatedRuntimeProfile that
  updates the profile from a serialised AggregateRuntimeProfile
  for a subset of the instances. The code is generalized from the
  existing InitFromThrift() code path.
* Per-fragment reports included in the status report protobuf
  when --gen_experimental_profile=true.
* Logic on the coordinator that either consumes serialized
  AggregatedRuntimeProfile per fragment, when
  --gen_experimental_profile=true, or consumes a serialized
  RuntimeProfile per finstance otherwise.

This also adds support for event sequences and time series
in the aggregated profile, so the amount of information
in the aggregated profile is now on par with the basic profile.

We also finish off support for JSON profile. The JSON profile is
more stripped down because we do not need to round-trip profiles
via JSON and it is a much less dense profile representation.

Part 3 will clean up and improve the display of the profile.

Testing:
* Add sanity tests for aggregated runtime profile.
* Add unit tests to exercise aggregation of the various counter types
* Ran core tests.

Change-Id: Ic680cbfe94c939c2a8fad9d0943034ed058c6bca
Reviewed-on: http://gerrit.cloudera.org:8080/16057
Reviewed-by: Impala Public Jenkins 
Tested-by: Impala Public Jenkins 
---
M be/src/runtime/coordinator-backend-state.cc
M be/src/runtime/coordinator-backend-state.h
M be/src/runtime/fragment-instance-state.cc
M be/src/runtime/fragment-instance-state.h
M be/src/runtime/fragment-state.cc
M be/src/runtime/fragment-state.h
M be/src/runtime/query-state.cc
M be/src/runtime/query-state.h
M be/src/service/impala-server.cc
M be/src/util/runtime-profile-counters.h
M be/src/util/runtime-profile-test.cc
M be/src/util/runtime-profile.cc
M be/src/util/runtime-profile.h
M common/protobuf/control_service.proto
M common/thrift/ImpalaInternalService.thrift
M common/thrift/RuntimeProfile.thrift
A testdata/workloads/tpch/queries/runtime-profile-aggregated.test
A tests/custom_cluster/test_runtime_profile.py
18 files changed, 1,352 insertions(+), 264 deletions(-)

Approvals:
  Impala Public Jenkins: Looks good to me, approved; Verified

--
To view, visit http://gerrit.cloudera.org:8080/16057
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: Ic680cbfe94c939c2a8fad9d0943034ed058c6bca
Gerrit-Change-Number: 16057
Gerrit-PatchSet: 20
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Tim Armstrong 


[Impala-ASF-CR] IMPALA-9382: part 2/3: aggregate profiles sent to coordinator

2020-11-25 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16057 )

Change subject: IMPALA-9382: part 2/3: aggregate profiles sent to coordinator
..


Patch Set 19: Verified+1


--
To view, visit http://gerrit.cloudera.org:8080/16057
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic680cbfe94c939c2a8fad9d0943034ed058c6bca
Gerrit-Change-Number: 16057
Gerrit-PatchSet: 19
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Thu, 26 Nov 2020 06:50:39 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10314: Optimize planning time for simple limits

2020-11-25 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16723 )

Change subject: IMPALA-10314: Optimize planning time for simple limits
..


Patch Set 14: Verified-1

Build failed: https://jenkins.impala.io/job/gerrit-verify-dryrun/6714/


--
To view, visit http://gerrit.cloudera.org:8080/16723
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I9d6a79263bc092e0f3e9a1d72da5618f3cc35574
Gerrit-Change-Number: 16723
Gerrit-PatchSet: 14
Gerrit-Owner: Aman Sinha 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Shant Hovsepian 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Thu, 26 Nov 2020 04:28:08 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10337: Consider MAX ROW SIZE when computing max reservation

2020-11-25 Thread Riza Suminto (Code Review)
Riza Suminto has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16765 )

Change subject: IMPALA-10337: Consider MAX_ROW_SIZE when computing max 
reservation
..


Patch Set 5:

(5 comments)

http://gerrit.cloudera.org:8080/#/c/16765/3//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/16765/3//COMMIT_MSG@10
PS3, Line 10: This happens
: because results are spilled using a SpillableRowBatchQueue which 
needs 2
: buffer (read and write) with at least MAX_ROW_SIZE bytes per 
buffer.
: This patch fixes this by s
> nit: how about :
Done


http://gerrit.cloudera.org:8080/#/c/16765/3/tests/query_test/test_result_spooling.py
File tests/query_test/test_result_spooling.py:

http://gerrit.cloudera.org:8080/#/c/16765/3/tests/query_test/test_result_spooling.py@68
PS3, Line 68: exec_options['max_row_size'] = 16 * 1024
> I think you can remove this change and the one in test_query_retries since
test_spilling and test_query_retries need to force spill by lowering spill 
related query options.
As consequence of this patch, MAX_ROW_SIZE now also need to be lowered in order 
to force spill.

The default MAX_ROW_SIZE is too high (512 KB) for the context of the tests.
With changes introduced in PlanRootSink.java, default MAX_ROW_SIZE will now 
contribute to pushing maxMemReservationBytes to 1 MB, which in turn will fit 
all rows in memory (no spill happens).

I have double check by commenting the MAX_ROW_SIZE option, and these tests 
failed without it.


http://gerrit.cloudera.org:8080/#/c/16765/3/tests/query_test/test_result_spooling.py@106
PS3, Line 106:   assert re.search(plan_root_sink_reservation_limit, profile)
This assertion, however, is OK to delete, because it is not the focus of the 
test.
I just add this to verify that ReservationLimit stay the same as before when 
MAX_ROW_SIZE does not contribute to maxMemReservationBytes.
Similarly with the one I added in test_query_retries.
Let me know what you think of.


http://gerrit.cloudera.org:8080/#/c/16765/3/tests/query_test/test_result_spooling.py@419
PS3, Line 419:   """These tests verify that while calculating max_reservation 
for spooling these query
> nit: you can add a comment here saying that these tests verify that while c
Done


http://gerrit.cloudera.org:8080/#/c/16765/3/tests/query_test/test_result_spooling.py@479
PS3, Line 479: ")
> nit: spills
Done



--
To view, visit http://gerrit.cloudera.org:8080/16765
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Id7138e1e034ea5d1cd15cf8de399690e52a9d726
Gerrit-Change-Number: 16765
Gerrit-PatchSet: 5
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Comment-Date: Thu, 26 Nov 2020 04:22:48 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-10337: Consider MAX ROW SIZE when computing max reservation

2020-11-25 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16765 )

Change subject: IMPALA-10337: Consider MAX_ROW_SIZE when computing max 
reservation
..


Patch Set 4:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/7739/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/16765
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Id7138e1e034ea5d1cd15cf8de399690e52a9d726
Gerrit-Change-Number: 16765
Gerrit-PatchSet: 4
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Comment-Date: Thu, 26 Nov 2020 04:21:53 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10337: Consider MAX ROW SIZE when computing max reservation

2020-11-25 Thread Riza Suminto (Code Review)
Hello Quanlong Huang, Bikramjeet Vig, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/16765

to look at the new patch set (#5).

Change subject: IMPALA-10337: Consider MAX_ROW_SIZE when computing max 
reservation
..

IMPALA-10337: Consider MAX_ROW_SIZE when computing max reservation

PlanRootSink can fail silently if result spooling is enabled and
maxMemReservationBytes is less than 2 * MAX_ROW_SIZE. This happens
because results are spilled using a SpillableRowBatchQueue which needs 2
buffer (read and write) with at least MAX_ROW_SIZE bytes per buffer.
This patch fixes this by setting the PlanRootSink's
maxMemReservationBytes as:

  max(minMemReservationBytes, MAX_RESULT_SPOOLING_MEM, 2 * MAX_ROW_SIZE)

minMemReservationBytes itself remain unchanged as:

  2 * DEFAULT_SPILLABLE_BUFFER_SIZE

Testing:
- Pass exhaustive tests.
- Add e2e TestResultSpoolingMaxReservation.
- Lower MAX_ROW_SIZE on tests where MAX_RESULT_SPOOLING_MEM is set to
  extremely low value. Also verify that PLAN_ROOT_SINK's ReservationLimit
  remain unchanged after lowering the MAX_ROW_SIZE.

Change-Id: Id7138e1e034ea5d1cd15cf8de399690e52a9d726
---
M fe/src/main/java/org/apache/impala/planner/PlanRootSink.java
M tests/custom_cluster/test_query_retries.py
M tests/query_test/test_result_spooling.py
3 files changed, 95 insertions(+), 0 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/65/16765/5
--
To view, visit http://gerrit.cloudera.org:8080/16765
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Id7138e1e034ea5d1cd15cf8de399690e52a9d726
Gerrit-Change-Number: 16765
Gerrit-PatchSet: 5
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Riza Suminto 


[Impala-ASF-CR] IMPALA-10337: Consider MAX ROW SIZE when computing max reservation

2020-11-25 Thread Riza Suminto (Code Review)
Hello Quanlong Huang, Bikramjeet Vig, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/16765

to look at the new patch set (#4).

Change subject: IMPALA-10337: Consider MAX_ROW_SIZE when computing max 
reservation
..

IMPALA-10337: Consider MAX_ROW_SIZE when computing max reservation

PlanRootSink can fail silently if result spooling is enabled and
maxMemReservationBytes is less than 2 * MAX_ROW_SIZE. Underneath, a
SpillableRowBatchQueue need 2 buffer (read and write) each to fit at
least MAX_ROW_SIZE bytes. This patch change the ResourceProfile's
maxMemReservationBytes as:

  max(minMemReservationBytes, MAX_RESULT_SPOOLING_MEM, 2 * MAX_ROW_SIZE)

minMemReservationBytes itself remain unchanged as:

  2 * DEFAULT_SPILLABLE_BUFFER_SIZE

Testing:
- Pass exhaustive tests.
- Add e2e TestResultSpoolingMaxReservation.
- Lower MAX_ROW_SIZE on tests where MAX_RESULT_SPOOLING_MEM is set to
  extremely low value. Also verify that PLAN_ROOT_SINK's ReservationLimit
  remain unchanged after lowering the MAX_ROW_SIZE.

Change-Id: Id7138e1e034ea5d1cd15cf8de399690e52a9d726
---
M fe/src/main/java/org/apache/impala/planner/PlanRootSink.java
M tests/custom_cluster/test_query_retries.py
M tests/query_test/test_result_spooling.py
3 files changed, 95 insertions(+), 0 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/65/16765/4
--
To view, visit http://gerrit.cloudera.org:8080/16765
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Id7138e1e034ea5d1cd15cf8de399690e52a9d726
Gerrit-Change-Number: 16765
Gerrit-PatchSet: 4
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Riza Suminto 


[Impala-ASF-CR] IMPALA-10337: Consider MAX ROW SIZE when computing max reservation

2020-11-25 Thread Bikramjeet Vig (Code Review)
Bikramjeet Vig has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16765 )

Change subject: IMPALA-10337: Consider MAX_ROW_SIZE when computing max 
reservation
..


Patch Set 3:

(4 comments)

http://gerrit.cloudera.org:8080/#/c/16765/3//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/16765/3//COMMIT_MSG@10
PS3, Line 10: Underneath, a
: SpillableRowBatchQueue need 2 buffer (read and write) each to fit 
at
: least MAX_ROW_SIZE bytes. This patch change the ResourceProfile's
: maxMemReservationBytes as:
nit: how about :
This happens because results are spilled using a SpillableRowBatchQueue which 
needs 2 buffer (read and write) with at least MAX_ROW_SIZE bytes per buffer. 
This patch fixes this by setting the PlanRootSink's maxMemReservationBytes as: 
max(minMemReservationBytes, MAX_RESULT_SPOOLING_MEM, 2 * MAX_ROW_SIZE)


http://gerrit.cloudera.org:8080/#/c/16765/3/tests/query_test/test_result_spooling.py
File tests/query_test/test_result_spooling.py:

http://gerrit.cloudera.org:8080/#/c/16765/3/tests/query_test/test_result_spooling.py@68
PS3, Line 68: exec_options['max_row_size'] = 16 * 1024
I think you can remove this change and the one in test_query_retries since you 
have added a self contained test to verify max_row_size's contribution and 
these other tests are more concerned with testing other functionality.


http://gerrit.cloudera.org:8080/#/c/16765/3/tests/query_test/test_result_spooling.py@419
PS3, Line 419:
nit: you can add a comment here saying that these tests verify that while 
calculating max_reservation for spooling these query options are taken into 
account: MAX_ROW_SIZE, MAX_RESULT_SPOOLING_MEM and 
DEFAULT_SPILLABLE_BUFFER_SIZE.


http://gerrit.cloudera.org:8080/#/c/16765/3/tests/query_test/test_result_spooling.py@479
PS3, Line 479: spill
nit: spills



--
To view, visit http://gerrit.cloudera.org:8080/16765
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Id7138e1e034ea5d1cd15cf8de399690e52a9d726
Gerrit-Change-Number: 16765
Gerrit-PatchSet: 3
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Comment-Date: Thu, 26 Nov 2020 02:11:06 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-9382: part 2/3: aggregate profiles sent to coordinator

2020-11-25 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16057 )

Change subject: IMPALA-9382: part 2/3: aggregate profiles sent to coordinator
..


Patch Set 18:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/7738/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/16057
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic680cbfe94c939c2a8fad9d0943034ed058c6bca
Gerrit-Change-Number: 16057
Gerrit-PatchSet: 18
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Thu, 26 Nov 2020 01:34:24 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-9382: part 2/3: aggregate profiles sent to coordinator

2020-11-25 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16057 )

Change subject: IMPALA-9382: part 2/3: aggregate profiles sent to coordinator
..


Patch Set 19: Code-Review+2


--
To view, visit http://gerrit.cloudera.org:8080/16057
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic680cbfe94c939c2a8fad9d0943034ed058c6bca
Gerrit-Change-Number: 16057
Gerrit-PatchSet: 19
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Thu, 26 Nov 2020 01:12:09 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-9382: part 2/3: aggregate profiles sent to coordinator

2020-11-25 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16057 )

Change subject: IMPALA-9382: part 2/3: aggregate profiles sent to coordinator
..


Patch Set 19:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/6715/ 
DRY_RUN=false


--
To view, visit http://gerrit.cloudera.org:8080/16057
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic680cbfe94c939c2a8fad9d0943034ed058c6bca
Gerrit-Change-Number: 16057
Gerrit-PatchSet: 19
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Thu, 26 Nov 2020 01:12:10 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-9382: part 2/3: aggregate profiles sent to coordinator

2020-11-25 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16057 )

Change subject: IMPALA-9382: part 2/3: aggregate profiles sent to coordinator
..


Patch Set 18: Code-Review+2

The test was just too short-running. Made it more substantial.


--
To view, visit http://gerrit.cloudera.org:8080/16057
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic680cbfe94c939c2a8fad9d0943034ed058c6bca
Gerrit-Change-Number: 16057
Gerrit-PatchSet: 18
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Thu, 26 Nov 2020 01:11:35 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-9382: part 2/3: aggregate profiles sent to coordinator

2020-11-25 Thread Tim Armstrong (Code Review)
Hello Joe McDonnell, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/16057

to look at the new patch set (#18).

Change subject: IMPALA-9382: part 2/3: aggregate profiles sent to coordinator
..

IMPALA-9382: part 2/3: aggregate profiles sent to coordinator

This reworks the status reporting so that serialized
AggregatedRuntimeProfile objects are sent from executors
to coordinators. These profiles are substantially denser
and faster to process for higher mt_dop values. The aggregation
is also done in a single step, merging the aggregated thrift
profile from the executor directly into the final aggregated
profile, instead of converting it to an unaggregated profile
first.

The changes required were:
* A new Update() method for AggregatedRuntimeProfile that
  updates the profile from a serialised AggregateRuntimeProfile
  for a subset of the instances. The code is generalized from the
  existing InitFromThrift() code path.
* Per-fragment reports included in the status report protobuf
  when --gen_experimental_profile=true.
* Logic on the coordinator that either consumes serialized
  AggregatedRuntimeProfile per fragment, when
  --gen_experimental_profile=true, or consumes a serialized
  RuntimeProfile per finstance otherwise.

This also adds support for event sequences and time series
in the aggregated profile, so the amount of information
in the aggregated profile is now on par with the basic profile.

We also finish off support for JSON profile. The JSON profile is
more stripped down because we do not need to round-trip profiles
via JSON and it is a much less dense profile representation.

Part 3 will clean up and improve the display of the profile.

Testing:
* Add sanity tests for aggregated runtime profile.
* Add unit tests to exercise aggregation of the various counter types
* Ran core tests.

Change-Id: Ic680cbfe94c939c2a8fad9d0943034ed058c6bca
---
M be/src/runtime/coordinator-backend-state.cc
M be/src/runtime/coordinator-backend-state.h
M be/src/runtime/fragment-instance-state.cc
M be/src/runtime/fragment-instance-state.h
M be/src/runtime/fragment-state.cc
M be/src/runtime/fragment-state.h
M be/src/runtime/query-state.cc
M be/src/runtime/query-state.h
M be/src/service/impala-server.cc
M be/src/util/runtime-profile-counters.h
M be/src/util/runtime-profile-test.cc
M be/src/util/runtime-profile.cc
M be/src/util/runtime-profile.h
M common/protobuf/control_service.proto
M common/thrift/ImpalaInternalService.thrift
M common/thrift/RuntimeProfile.thrift
A testdata/workloads/tpch/queries/runtime-profile-aggregated.test
A tests/custom_cluster/test_runtime_profile.py
18 files changed, 1,352 insertions(+), 264 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/57/16057/18
--
To view, visit http://gerrit.cloudera.org:8080/16057
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ic680cbfe94c939c2a8fad9d0943034ed058c6bca
Gerrit-Change-Number: 16057
Gerrit-PatchSet: 18
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Tim Armstrong 


[Impala-ASF-CR] IMPALA-10314: Optimize planning time for simple limits

2020-11-25 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16723 )

Change subject: IMPALA-10314: Optimize planning time for simple limits
..


Patch Set 14:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/6714/ 
DRY_RUN=false


--
To view, visit http://gerrit.cloudera.org:8080/16723
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I9d6a79263bc092e0f3e9a1d72da5618f3cc35574
Gerrit-Change-Number: 16723
Gerrit-PatchSet: 14
Gerrit-Owner: Aman Sinha 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Shant Hovsepian 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Wed, 25 Nov 2020 22:55:38 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10314: Optimize planning time for simple limits

2020-11-25 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16723 )

Change subject: IMPALA-10314: Optimize planning time for simple limits
..


Patch Set 14: Code-Review+2


--
To view, visit http://gerrit.cloudera.org:8080/16723
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I9d6a79263bc092e0f3e9a1d72da5618f3cc35574
Gerrit-Change-Number: 16723
Gerrit-PatchSet: 14
Gerrit-Owner: Aman Sinha 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Shant Hovsepian 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Wed, 25 Nov 2020 22:55:37 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10314: Optimize planning time for simple limits

2020-11-25 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16723 )

Change subject: IMPALA-10314: Optimize planning time for simple limits
..


Patch Set 13:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/7737/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/16723
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I9d6a79263bc092e0f3e9a1d72da5618f3cc35574
Gerrit-Change-Number: 16723
Gerrit-PatchSet: 13
Gerrit-Owner: Aman Sinha 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Shant Hovsepian 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Wed, 25 Nov 2020 22:50:08 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10314: Optimize planning time for simple limits

2020-11-25 Thread Aman Sinha (Code Review)
Aman Sinha has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16723 )

Change subject: IMPALA-10314: Optimize planning time for simple limits
..


Patch Set 13: Code-Review+2

(1 comment)

Carry forward +2

http://gerrit.cloudera.org:8080/#/c/16723/12/fe/src/main/java/org/apache/impala/analysis/SelectStmt.java
File fe/src/main/java/org/apache/impala/analysis/SelectStmt.java:

http://gerrit.cloudera.org:8080/#/c/16723/12/fe/src/main/java/org/apache/impala/analysis/SelectStmt.java@1211
PS12, Line 1211: strBuilder.append(ToSqlUtils.getPlanHintsSql(options,
> line too long (96 > 90)
Done



--
To view, visit http://gerrit.cloudera.org:8080/16723
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I9d6a79263bc092e0f3e9a1d72da5618f3cc35574
Gerrit-Change-Number: 16723
Gerrit-PatchSet: 13
Gerrit-Owner: Aman Sinha 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Shant Hovsepian 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Wed, 25 Nov 2020 22:30:21 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-10314: Optimize planning time for simple limits

2020-11-25 Thread Aman Sinha (Code Review)
Hello Qifan Chen, Shant Hovsepian, Tim Armstrong, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/16723

to look at the new patch set (#13).

Change subject: IMPALA-10314: Optimize planning time for simple limits
..

IMPALA-10314: Optimize planning time for simple limits

This patch optimizes the planning time for simple limit
queries by only considering a minimal set of partitions
whose file descriptors add up to N (the specified limit).
Each file is conservatively estimated to contain 1 row.

This reduces the number of partitions processed by
HdfsScanNode.computeScanRangeLocations() which, according
to query profiling, has been the main contributor to the
planning time especially for large number of partitions.
Further, within each partition, we only consider the number
of non-empty files that brings the total to N.

This is an opt-in optimization. A new planner option
OPTIMIZE_SIMPLE_LIMIT enables this optimization. Further,
if there's a WHERE clause, it must have an 'always_true'
hint in order for the optimization to be considered. For
example:
  set optimize_simple_limit = true;
  SELECT * FROM T
WHERE /* +always_true */ 
  LIMIT 10;

If there are too many empty files in the partitions, it is
possible that the query may produce fewer rows although
those are still valid rows.

Testing:
 - Added planner tests for the optimization
 - Ran query_test.py tests by enabling the optimize_simple_limit
 - Added an e2e test. Since result rows are non-deterministic,
   only simple count(*) query on top of subquery with limit
   was added.

Change-Id: I9d6a79263bc092e0f3e9a1d72da5618f3cc35574
---
M be/src/service/query-options.cc
M be/src/service/query-options.h
M common/thrift/ImpalaInternalService.thrift
M common/thrift/ImpalaService.thrift
M fe/src/main/cup/sql-parser.cup
M fe/src/main/java/org/apache/impala/analysis/Analyzer.java
M fe/src/main/java/org/apache/impala/analysis/Expr.java
M fe/src/main/java/org/apache/impala/analysis/PartitionSet.java
M fe/src/main/java/org/apache/impala/analysis/Predicate.java
M fe/src/main/java/org/apache/impala/analysis/SelectStmt.java
M fe/src/main/java/org/apache/impala/planner/HdfsPartitionPruner.java
M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
M fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java
M fe/src/test/java/org/apache/impala/analysis/ParserTest.java
M fe/src/test/java/org/apache/impala/planner/PlannerTest.java
A 
testdata/workloads/functional-planner/queries/PlannerTest/optimize-simple-limit.test
M 
testdata/workloads/functional-query/queries/QueryTest/range-constant-propagation.test
17 files changed, 514 insertions(+), 20 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/23/16723/13
--
To view, visit http://gerrit.cloudera.org:8080/16723
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I9d6a79263bc092e0f3e9a1d72da5618f3cc35574
Gerrit-Change-Number: 16723
Gerrit-PatchSet: 13
Gerrit-Owner: Aman Sinha 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Shant Hovsepian 
Gerrit-Reviewer: Tim Armstrong 


[Impala-ASF-CR] IMPALA-10325: Parquet scan should use min/max statistics to skip pages based on equi-join predicate

2020-11-25 Thread Csaba Ringhofer (Code Review)
Csaba Ringhofer has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16720 )

Change subject: IMPALA-10325: Parquet scan should use min/max statistics to 
skip pages based on equi-join predicate
..


Patch Set 18:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/16720/16/be/src/exec/parquet/hdfs-parquet-scanner.cc
File be/src/exec/parquet/hdfs-parquet-scanner.cc:

http://gerrit.cloudera.org:8080/#/c/16720/16/be/src/exec/parquet/hdfs-parquet-scanner.cc@715
PS16, Line 715: }
nit: +2 indent


http://gerrit.cloudera.org:8080/#/c/16720/18/be/src/exec/parquet/hdfs-parquet-scanner.cc
File be/src/exec/parquet/hdfs-parquet-scanner.cc:

http://gerrit.cloudera.org:8080/#/c/16720/18/be/src/exec/parquet/hdfs-parquet-scanner.cc@497
PS18, Line 497:   DCHECK(all_nulls);
  :   *all_nulls = false;
  :   static char dummy[sizeof(ColumnStatsReader)];
  :   SchemaNode* node = nullptr;
  :   Status status = HandlePosAndMissingField(slot_desc, 
missing_field, );
  :
  :   if (!status.ok()) {
  : return std::pair(
  : status, *reinterpret_cast(dummy));
  :   }
  :
  :   if (*missing_field) {
  : return std::pair(
  : Status::OK(), 
*reinterpret_cast(dummy));
  :   }
  :
  :   int col_idx = node->col_idx;
  :   DCHECK_LT(col_idx, row_group.columns.size());
Can't we move this outside of CreateStatsReader to the callsites? 
CreateStatsReader could get 'node' as an input parameter. I think that this 
would simplify code by no longer needing std::pair 
return values



--
To view, visit http://gerrit.cloudera.org:8080/16720
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I379405ee75b14929df7d6b5d20dabc6f51375691
Gerrit-Change-Number: 16720
Gerrit-PatchSet: 18
Gerrit-Owner: Qifan Chen 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Wed, 25 Nov 2020 22:06:25 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-10325: Parquet scan should use min/max statistics to skip pages based on equi-join predicate

2020-11-25 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16720 )

Change subject: IMPALA-10325: Parquet scan should use min/max statistics to 
skip pages based on equi-join predicate
..


Patch Set 18:

Build Failed

https://jenkins.impala.io/job/gerrit-code-review-checks/7736/ : Initial code 
review checks failed. See linked job for details on the failure.


--
To view, visit http://gerrit.cloudera.org:8080/16720
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I379405ee75b14929df7d6b5d20dabc6f51375691
Gerrit-Change-Number: 16720
Gerrit-PatchSet: 18
Gerrit-Owner: Qifan Chen 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Wed, 25 Nov 2020 21:25:30 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10325: Parquet scan should use min/max statistics to skip pages based on equi-join predicate

2020-11-25 Thread Qifan Chen (Code Review)
Qifan Chen has uploaded a new patch set (#18). ( 
http://gerrit.cloudera.org:8080/16720 )

Change subject: IMPALA-10325: Parquet scan should use min/max statistics to 
skip pages based on equi-join predicate
..

IMPALA-10325: Parquet scan should use min/max statistics to skip pages based on 
equi-join predicate

This patch adds the logic to utilize min/max stats for Parquet row
groups or pages to skip these entities when they don't qualify an
equi-join predicate.

A new class of predicates called overlap predicates is introduced to aid
in the determination of whether a Parquet row group or a page overlap
with a range computed from the hash join. If not, then the entire
Parquet row group or the page are skipped.

The new class of predicates co-exist with the existing min/max conjuncts
that are introduced based on the local scan predicates. Both classes of
predicates can work individually or together with each other. The
overlap predicates are evaluated after the existing min/max conjuncts.

TBD:
1. Unit/performance testing;
2. Core testing.

Change-Id: I379405ee75b14929df7d6b5d20dabc6f51375691
---
M be/src/exec/exec-node.h
M be/src/exec/hdfs-scan-node-base.cc
M be/src/exec/hdfs-scan-node-base.h
M be/src/exec/parquet/hdfs-parquet-scanner.cc
M be/src/exec/parquet/hdfs-parquet-scanner.h
M be/src/exec/parquet/parquet-column-stats.cc
M be/src/exec/parquet/parquet-column-stats.h
M be/src/exec/partitioned-hash-join-builder.cc
M be/src/exec/scan-node.cc
M be/src/runtime/coordinator.cc
M be/src/util/min-max-filter.cc
M be/src/util/min-max-filter.h
M common/thrift/PlanNodes.thrift
M fe/src/main/java/org/apache/impala/analysis/TupleDescriptor.java
M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
M fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java
16 files changed, 654 insertions(+), 128 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/20/16720/18
--
To view, visit http://gerrit.cloudera.org:8080/16720
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I379405ee75b14929df7d6b5d20dabc6f51375691
Gerrit-Change-Number: 16720
Gerrit-PatchSet: 18
Gerrit-Owner: Qifan Chen 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Zoltan Borok-Nagy 


[Impala-ASF-CR] IMPALA-10337: Consider MAX ROW SIZE when computing max reservation

2020-11-25 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16765 )

Change subject: IMPALA-10337: Consider MAX_ROW_SIZE when computing max 
reservation
..


Patch Set 3:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/7735/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/16765
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Id7138e1e034ea5d1cd15cf8de399690e52a9d726
Gerrit-Change-Number: 16765
Gerrit-PatchSet: 3
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Comment-Date: Wed, 25 Nov 2020 19:13:02 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10337: Consider MAX ROW SIZE when computing max reservation

2020-11-25 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16765 )

Change subject: IMPALA-10337: Consider MAX_ROW_SIZE when computing max 
reservation
..


Patch Set 2:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/7734/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/16765
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Id7138e1e034ea5d1cd15cf8de399690e52a9d726
Gerrit-Change-Number: 16765
Gerrit-PatchSet: 2
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Comment-Date: Wed, 25 Nov 2020 19:02:44 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10337: Consider MAX ROW SIZE when computing max reservation

2020-11-25 Thread Riza Suminto (Code Review)
Riza Suminto has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16765 )

Change subject: IMPALA-10337: Consider MAX_ROW_SIZE when computing max 
reservation
..


Patch Set 3:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/16765/1/fe/src/main/java/org/apache/impala/planner/PlanRootSink.java
File fe/src/main/java/org/apache/impala/planner/PlanRootSink.java:

http://gerrit.cloudera.org:8080/#/c/16765/1/fe/src/main/java/org/apache/impala/planner/PlanRootSink.java@87
PS1, Line 87: getDefault_spillable_buffer_size
> got it, thanks for the explanation
I was wrong. Large DEFAULT_SPILLABLE_BUFFER_SIZE can wins calculation for 
maxMemReservationBytes by increasing minMemReservationBytes. The test case I 
mention earlier, however, stays true.

I adjust the commit message and add 
TestResultSpoolingMaxReservation::test_high_default_spillable_buffer to reflect 
this.


http://gerrit.cloudera.org:8080/#/c/16765/1/tests/custom_cluster/test_query_retries.py
File tests/custom_cluster/test_query_retries.py:

http://gerrit.cloudera.org:8080/#/c/16765/1/tests/custom_cluster/test_query_retries.py@589
PS1, Line 589: 'max_row_size': 8 * 1024,
> can you add a test that confirms that changing this will have an effect. So
I add verification to check that query ReservationLimit remains unchanged after 
addition of MAX_ROW_SIZE query option.

I also add TestResultSpoolingMaxReservation to verify that all three of 
MAX_ROW_SIZE, MAX_RESULT_SPOOLING_MEM, and DEFAULT_SPILLABLE_BUFFER_SIZE query 
options can contribute towards increasing PLAN_ROOT_SINK's ReservationLimit.



--
To view, visit http://gerrit.cloudera.org:8080/16765
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Id7138e1e034ea5d1cd15cf8de399690e52a9d726
Gerrit-Change-Number: 16765
Gerrit-PatchSet: 3
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Comment-Date: Wed, 25 Nov 2020 19:03:20 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-10337: Consider MAX ROW SIZE when computing max reservation

2020-11-25 Thread Riza Suminto (Code Review)
Hello Quanlong Huang, Bikramjeet Vig, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/16765

to look at the new patch set (#3).

Change subject: IMPALA-10337: Consider MAX_ROW_SIZE when computing max 
reservation
..

IMPALA-10337: Consider MAX_ROW_SIZE when computing max reservation

PlanRootSink can fail silently if result spooling is enabled and
maxMemReservationBytes is less than 2 * MAX_ROW_SIZE. Underneath, a
SpillableRowBatchQueue need 2 buffer (read and write) each to fit at
least MAX_ROW_SIZE bytes. This patch change the ResourceProfile's
maxMemReservationBytes as:

  max(minMemReservationBytes, MAX_RESULT_SPOOLING_MEM, 2 * MAX_ROW_SIZE)

minMemReservationBytes itself remain unchanged as:

  2 * DEFAULT_SPILLABLE_BUFFER_SIZE

Testing:
- Pass exhaustive tests.
- Add e2e TestResultSpoolingMaxReservation.
- Lower MAX_ROW_SIZE on tests where MAX_RESULT_SPOOLING_MEM is set to
  extremely low value. Also verify that PLAN_ROOT_SINK's ReservationLimit
  remain unchanged after lowering the MAX_ROW_SIZE.

Change-Id: Id7138e1e034ea5d1cd15cf8de399690e52a9d726
---
M fe/src/main/java/org/apache/impala/planner/PlanRootSink.java
M tests/custom_cluster/test_query_retries.py
M tests/query_test/test_result_spooling.py
3 files changed, 92 insertions(+), 0 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/65/16765/3
--
To view, visit http://gerrit.cloudera.org:8080/16765
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Id7138e1e034ea5d1cd15cf8de399690e52a9d726
Gerrit-Change-Number: 16765
Gerrit-PatchSet: 3
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Riza Suminto 


[Impala-ASF-CR] IMPALA-10337: Consider MAX ROW SIZE when computing max reservation

2020-11-25 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16765 )

Change subject: IMPALA-10337: Consider MAX_ROW_SIZE when computing max 
reservation
..


Patch Set 2:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/16765/2/tests/query_test/test_result_spooling.py
File tests/query_test/test_result_spooling.py:

http://gerrit.cloudera.org:8080/#/c/16765/2/tests/query_test/test_result_spooling.py@447
PS2, Line 447: q
flake8: F821 undefined name 'query'


http://gerrit.cloudera.org:8080/#/c/16765/2/tests/query_test/test_result_spooling.py@493
PS2, Line 493:
flake8: W391 blank line at end of file



--
To view, visit http://gerrit.cloudera.org:8080/16765
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Id7138e1e034ea5d1cd15cf8de399690e52a9d726
Gerrit-Change-Number: 16765
Gerrit-PatchSet: 2
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Comment-Date: Wed, 25 Nov 2020 18:43:00 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-10337: Consider MAX ROW SIZE when computing max reservation

2020-11-25 Thread Riza Suminto (Code Review)
Hello Quanlong Huang, Bikramjeet Vig, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/16765

to look at the new patch set (#2).

Change subject: IMPALA-10337: Consider MAX_ROW_SIZE when computing max 
reservation
..

IMPALA-10337: Consider MAX_ROW_SIZE when computing max reservation

PlanRootSink can fail silently if result spooling is enabled and
maxMemReservationBytes is less than 2 * MAX_ROW_SIZE. Underneath, a
SpillableRowBatchQueue need 2 buffer (read and write) each to fit at
least MAX_ROW_SIZE bytes. This patch change the ResourceProfile's
maxMemReservationBytes as:

  max(minMemReservationBytes, MAX_RESULT_SPOOLING_MEM, 2 * MAX_ROW_SIZE)

minMemReservationBytes itself remain unchanged as:

  2 * DEFAULT_SPILLABLE_BUFFER_SIZE

Testing:
- Pass exhaustive tests.
- Add e2e TestResultSpoolingMaxReservation.
- Lower MAX_ROW_SIZE on tests where MAX_RESULT_SPOOLING_MEM is set to
  extremely low value. Also verify that PLAN_ROOT_SINK's ReservationLimit
  remain unchanged after lowering the MAX_ROW_SIZE.

Change-Id: Id7138e1e034ea5d1cd15cf8de399690e52a9d726
---
M fe/src/main/java/org/apache/impala/planner/PlanRootSink.java
M tests/custom_cluster/test_query_retries.py
M tests/query_test/test_result_spooling.py
3 files changed, 92 insertions(+), 0 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/65/16765/2
--
To view, visit http://gerrit.cloudera.org:8080/16765
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Id7138e1e034ea5d1cd15cf8de399690e52a9d726
Gerrit-Change-Number: 16765
Gerrit-PatchSet: 2
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Riza Suminto 


[Impala-ASF-CR] IMPALA-10325: Parquet scan should use min/max statistics to skip pages based on equi-join predicate

2020-11-25 Thread Zoltan Borok-Nagy (Code Review)
Zoltan Borok-Nagy has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16720 )

Change subject: IMPALA-10325: Parquet scan should use min/max statistics to 
skip pages based on equi-join predicate
..


Patch Set 17:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/16720/16//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/16720/16//COMMIT_MSG@21
PS16, Line 21: evaluted
nit: evaluated


http://gerrit.cloudera.org:8080/#/c/16720/12/be/src/exec/parquet/hdfs-parquet-scanner.cc
File be/src/exec/parquet/hdfs-parquet-scanner.cc:

http://gerrit.cloudera.org:8080/#/c/16720/12/be/src/exec/parquet/hdfs-parquet-scanner.cc@549
PS12, Line 549:   int64_t tuple_size = min_max_tuple_desc->byte_size();
> That seems a good idea, in that the new logic here can be moved over to the
Was the idea to whenever a min/max filter arrives, we could extend the 
min_max_tuple_ with a pair of slots (min and max filter value) and 
min_max_conjunct_evals_ with two new predicates (filter_min <= data and 
filter_max >= data)?

Creating the slot descriptors dynamically can be cumbersome, or maybe we could 
just create the descriptors in advance like we already do AFAICT, and only 
evaluate the conjuncts that has their filter arrived?

I think it's an interesting idea, worth to investigate this direction. It could 
probably simplify the code a lot because we'd get row group-level, page-level, 
and row-level filtering for free.


http://gerrit.cloudera.org:8080/#/c/16720/17/be/src/exec/parquet/hdfs-parquet-scanner.cc
File be/src/exec/parquet/hdfs-parquet-scanner.cc:

http://gerrit.cloudera.org:8080/#/c/16720/17/be/src/exec/parquet/hdfs-parquet-scanner.cc@493
PS17, Line 493: std::pair
Nit: I think we usually return multiple values in output parameters, and the 
return value is only Status.



--
To view, visit http://gerrit.cloudera.org:8080/16720
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I379405ee75b14929df7d6b5d20dabc6f51375691
Gerrit-Change-Number: 16720
Gerrit-PatchSet: 17
Gerrit-Owner: Qifan Chen 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Wed, 25 Nov 2020 18:11:07 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-10325: Parquet scan should use min/max statistics to skip pages based on equi-join predicate

2020-11-25 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16720 )

Change subject: IMPALA-10325: Parquet scan should use min/max statistics to 
skip pages based on equi-join predicate
..


Patch Set 17:

Build Failed

https://jenkins.impala.io/job/gerrit-code-review-checks/7733/ : Initial code 
review checks failed. See linked job for details on the failure.


--
To view, visit http://gerrit.cloudera.org:8080/16720
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I379405ee75b14929df7d6b5d20dabc6f51375691
Gerrit-Change-Number: 16720
Gerrit-PatchSet: 17
Gerrit-Owner: Qifan Chen 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Wed, 25 Nov 2020 18:01:33 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10325: Parquet scan should use min/max statistics to skip pages based on equi-join predicate

2020-11-25 Thread Qifan Chen (Code Review)
Qifan Chen has uploaded a new patch set (#17). ( 
http://gerrit.cloudera.org:8080/16720 )

Change subject: IMPALA-10325: Parquet scan should use min/max statistics to 
skip pages based on equi-join predicate
..

IMPALA-10325: Parquet scan should use min/max statistics to skip pages based on 
equi-join predicate

This patch adds the logic to utilize min/max stats for Parquet row
groups or pages to skip these entities when they don't qualify an
equi-join predicate.

A new class of predicates called overlap predicates is introduced to aid
in the determination of whether a Parquet row group or a page overlap
with a range computed from the hash join. If not, then the entire
Parquet row group or the page are skipped.

The new class of predicates co-exist with the existing min/max conjuncts
that are introduced based on the local scan predicates. Both classes of
predicates can work individually or together with each other. The overlap
predicates are evaluted after the existing min/max conjuncts.

To be done:
1. Handle STRING, DATE, TIME and DECIMAL data tyes;
2. Unit/performance testing;
3. Core testing.

Change-Id: I379405ee75b14929df7d6b5d20dabc6f51375691
---
M be/src/exec/exec-node.h
M be/src/exec/hdfs-scan-node-base.cc
M be/src/exec/hdfs-scan-node-base.h
M be/src/exec/parquet/hdfs-parquet-scanner.cc
M be/src/exec/parquet/hdfs-parquet-scanner.h
M be/src/exec/parquet/parquet-column-stats.cc
M be/src/exec/parquet/parquet-column-stats.h
M be/src/exec/partitioned-hash-join-builder.cc
M be/src/exec/scan-node.cc
M be/src/runtime/coordinator.cc
M be/src/util/min-max-filter.cc
M be/src/util/min-max-filter.h
M common/thrift/PlanNodes.thrift
M fe/src/main/java/org/apache/impala/analysis/TupleDescriptor.java
M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
M fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java
16 files changed, 619 insertions(+), 128 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/20/16720/17
--
To view, visit http://gerrit.cloudera.org:8080/16720
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I379405ee75b14929df7d6b5d20dabc6f51375691
Gerrit-Change-Number: 16720
Gerrit-PatchSet: 17
Gerrit-Owner: Qifan Chen 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Zoltan Borok-Nagy 


[Impala-ASF-CR] IMPALA-10134: Implement ds hll union f() function.

2020-11-25 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16711 )

Change subject: IMPALA-10134: Implement ds_hll_union_f() function.
..


Patch Set 6: Verified+1


--
To view, visit http://gerrit.cloudera.org:8080/16711
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic06e959ed956af5cedbfc7d4d063141d5babb2a8
Gerrit-Change-Number: 16711
Gerrit-PatchSet: 6
Gerrit-Owner: Fucun Chu 
Gerrit-Reviewer: Fucun Chu 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Comment-Date: Wed, 25 Nov 2020 14:33:09 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10134: Implement ds hll union f() function.

2020-11-25 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has submitted this change and it was merged. ( 
http://gerrit.cloudera.org:8080/16711 )

Change subject: IMPALA-10134: Implement ds_hll_union_f() function.
..

IMPALA-10134: Implement ds_hll_union_f() function.

This function receives two strings that are serialized Apache DataSketches
HLL sketches. Union two sketches and returns the resulting sketch of union.

Example:
select ds_hll_estimate(ds_hll_union_f(i_i, h_i))
from hll_sketches_impala_hive2;
+---+
| ds_hll_estimate(ds_hll_union_f(i_i, h_i)) |
+---+
| 7 |
+---+

Change-Id: Ic06e959ed956af5cedbfc7d4d063141d5babb2a8
Reviewed-on: http://gerrit.cloudera.org:8080/16711
Reviewed-by: Impala Public Jenkins 
Tested-by: Impala Public Jenkins 
---
M be/src/exprs/datasketches-functions-ir.cc
M be/src/exprs/datasketches-functions.h
M common/function-registry/impala_functions.py
M testdata/workloads/functional-query/queries/QueryTest/datasketches-hll.test
4 files changed, 116 insertions(+), 1 deletion(-)

Approvals:
  Impala Public Jenkins: Looks good to me, approved; Verified

--
To view, visit http://gerrit.cloudera.org:8080/16711
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: Ic06e959ed956af5cedbfc7d4d063141d5babb2a8
Gerrit-Change-Number: 16711
Gerrit-PatchSet: 7
Gerrit-Owner: Fucun Chu 
Gerrit-Reviewer: Fucun Chu 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 


[Impala-ASF-CR] IMPALA-10306: [DOCS] remove TZ offset texts from the list

2020-11-25 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16689 )

Change subject: IMPALA-10306: [DOCS] remove TZ offset texts from the list
..


Patch Set 5: Verified+1


--
To view, visit http://gerrit.cloudera.org:8080/16689
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I693d15f1230dd7eebcbf2a16657a3850943749e1
Gerrit-Change-Number: 16689
Gerrit-PatchSet: 5
Gerrit-Owner: Shajini Thayasingh 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Tamas Mate 
Gerrit-Comment-Date: Wed, 25 Nov 2020 13:52:04 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10306: [DOCS] remove TZ offset texts from the list

2020-11-25 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has submitted this change and it was merged. ( 
http://gerrit.cloudera.org:8080/16689 )

Change subject: IMPALA-10306: [DOCS] remove TZ offset texts from the list
..

IMPALA-10306: [DOCS] remove TZ offset texts from the list

added an item in the list that says that TZ offset will not be included
in the output of this function even if the offset is provided as input

Change-Id: I693d15f1230dd7eebcbf2a16657a3850943749e1
Reviewed-on: http://gerrit.cloudera.org:8080/16689
Reviewed-by: Impala Public Jenkins 
Tested-by: Impala Public Jenkins 
---
M docs/topics/impala_datetime_functions.xml
1 file changed, 43 insertions(+), 110 deletions(-)

Approvals:
  Impala Public Jenkins: Looks good to me, approved; Verified

--
To view, visit http://gerrit.cloudera.org:8080/16689
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: I693d15f1230dd7eebcbf2a16657a3850943749e1
Gerrit-Change-Number: 16689
Gerrit-PatchSet: 6
Gerrit-Owner: Shajini Thayasingh 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Tamas Mate 


[Impala-ASF-CR] IMPALA-6671: Skip locked tables from topic updates

2020-11-25 Thread Quanlong Huang (Code Review)
Quanlong Huang has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16549 )

Change subject: IMPALA-6671: Skip locked tables from topic updates
..


Patch Set 10:

(19 comments)

http://gerrit.cloudera.org:8080/#/c/16549/10//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/16549/10//COMMIT_MSG@46
PS10, Line 46: 2
nit: 3


http://gerrit.cloudera.org:8080/#/c/16549/10/be/src/catalog/catalog-server.cc
File be/src/catalog/catalog-server.cc:

http://gerrit.cloudera.org:8080/#/c/16549/10/be/src/catalog/catalog-server.cc@79
PS10, Line 79: the
nit: redundant "the"


http://gerrit.cloudera.org:8080/#/c/16549/10/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java
File fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java:

http://gerrit.cloudera.org:8080/#/c/16549/10/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@25
PS10, Line 25: import java.sql.Time;
nit: unused import


http://gerrit.cloudera.org:8080/#/c/16549/10/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@40
PS10, Line 40: import java.util.concurrent.locks.ReentrantLock;
nit: unused import


http://gerrit.cloudera.org:8080/#/c/16549/10/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@212
PS10, Line 212: maxSkippedUpdatesLockContention
nit: maxSkippedUpdatesLockContention_


http://gerrit.cloudera.org:8080/#/c/16549/10/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@214
PS10, Line 214: topicUpdateTblLockMaxWaitTimeMs
nit: topicUpdateTblLockMaxWaitTimeMs_


http://gerrit.cloudera.org:8080/#/c/16549/10/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@453
PS10, Line 453: lock
It may be useful to know the lock type (read/write) in debugging.


http://gerrit.cloudera.org:8080/#/c/16549/10/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@460
PS10, Line 460: 0, TimeUnit.SECONDS
Can we directly use the specified timeout here? I think the answer is no and 
the reason is we want to release the catalog versionLock immediately. If so, I 
think it's worth a comment here.


http://gerrit.cloudera.org:8080/#/c/16549/10/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@1395
PS10, Line 1395: topicUpdateTblLockMaxWaitTimeMs
I think the timeout should be topicUpdateTblLockMaxWaitTimeMs / maxAttempts


http://gerrit.cloudera.org:8080/#/c/16549/10/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@1399
PS10, Line 1399: break;
nit: we can return true directly here. Then we don't need the lockAcquired var.


http://gerrit.cloudera.org:8080/#/c/16549/10/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@1416
PS10, Line 1416:   // if pendingVersionUpdated is false it means that 
tblVersion has been changed
   :   // and hence we didn't update the pendingVersion. We 
retry once to acquire a read
   :   // lock.
Should we update tblVersion to be hdfsTable.getCatalogVersion() in this case?


http://gerrit.cloudera.org:8080/#/c/16549/10/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@2215
PS10, Line 2215:   
updatedTbl.setCatalogVersion(incrementAndGetCatalogVersion());
If the existingTbl has a pendingVersion and both instances are HdfsTable, we 
loss the pendingVersion in the updatedTbl. But looks like it's ok because 
incrementAndGetCatalogVersion() will always get a version larger than the 
pendingVersion. This is just a corner case that only happens when a table 
loading is triggered due to stale writeIdList. It's worth a comment here.

BTW, what if we use incrementAndGetCatalogVersion() at the end of table 
modifications to set its catalog version? Is it a smipler solution than the 
pendingVersion solution? We just need to avoid deadlocks since 
incrementAndGetCatalogVersion() requires the catalog versionLock but we are 
holding the table lock in table modifications. It seems ok if we always use 
tryLock to acquire the table lock and give up the catalog version lock when it 
fails.


http://gerrit.cloudera.org:8080/#/c/16549/10/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@3383
PS10, Line 3383:   // TODO(todd): consider a read-write lock here.
We can remove this TODO now.


http://gerrit.cloudera.org:8080/#/c/16549/10/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
File fe/src/main/java/org/apache/impala/catalog/HdfsTable.java:

http://gerrit.cloudera.org:8080/#/c/16549/10/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@299
PS10, Line 299: Hdfs
nit: HdfsTable?


http://gerrit.cloudera.org:8080/#/c/16549/10/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@2709
PS10, Line 2709: then
nit: the


http://gerrit.cloudera.org:8080/#/c/16549/10/fe/src/main/java/org/apache/impala/catalog/Table.java
File 

[Impala-ASF-CR] IMPALA-10134: Implement ds hll union f() function.

2020-11-25 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16711 )

Change subject: IMPALA-10134: Implement ds_hll_union_f() function.
..


Patch Set 6:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/6713/ 
DRY_RUN=false


--
To view, visit http://gerrit.cloudera.org:8080/16711
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic06e959ed956af5cedbfc7d4d063141d5babb2a8
Gerrit-Change-Number: 16711
Gerrit-PatchSet: 6
Gerrit-Owner: Fucun Chu 
Gerrit-Reviewer: Fucun Chu 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Comment-Date: Wed, 25 Nov 2020 09:05:39 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10134: Implement ds hll union f() function.

2020-11-25 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16711 )

Change subject: IMPALA-10134: Implement ds_hll_union_f() function.
..


Patch Set 6: Code-Review+2


--
To view, visit http://gerrit.cloudera.org:8080/16711
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic06e959ed956af5cedbfc7d4d063141d5babb2a8
Gerrit-Change-Number: 16711
Gerrit-PatchSet: 6
Gerrit-Owner: Fucun Chu 
Gerrit-Reviewer: Fucun Chu 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Comment-Date: Wed, 25 Nov 2020 09:05:38 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10134: Implement ds hll union f() function.

2020-11-25 Thread Gabor Kaszab (Code Review)
Gabor Kaszab has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16711 )

Change subject: IMPALA-10134: Implement ds_hll_union_f() function.
..


Patch Set 5: Code-Review+2

Great work!


--
To view, visit http://gerrit.cloudera.org:8080/16711
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic06e959ed956af5cedbfc7d4d063141d5babb2a8
Gerrit-Change-Number: 16711
Gerrit-PatchSet: 5
Gerrit-Owner: Fucun Chu 
Gerrit-Reviewer: Fucun Chu 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Comment-Date: Wed, 25 Nov 2020 09:05:05 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10306: [DOCS] remove TZ offset texts from the list

2020-11-25 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16689 )

Change subject: IMPALA-10306: [DOCS] remove TZ offset texts from the list
..


Patch Set 5:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/6712/ 
DRY_RUN=false


--
To view, visit http://gerrit.cloudera.org:8080/16689
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I693d15f1230dd7eebcbf2a16657a3850943749e1
Gerrit-Change-Number: 16689
Gerrit-PatchSet: 5
Gerrit-Owner: Shajini Thayasingh 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Tamas Mate 
Gerrit-Comment-Date: Wed, 25 Nov 2020 08:22:20 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10306: [DOCS] remove TZ offset texts from the list

2020-11-25 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16689 )

Change subject: IMPALA-10306: [DOCS] remove TZ offset texts from the list
..


Patch Set 5: Code-Review+2


--
To view, visit http://gerrit.cloudera.org:8080/16689
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I693d15f1230dd7eebcbf2a16657a3850943749e1
Gerrit-Change-Number: 16689
Gerrit-PatchSet: 5
Gerrit-Owner: Shajini Thayasingh 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Tamas Mate 
Gerrit-Comment-Date: Wed, 25 Nov 2020 08:22:19 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10306: [DOCS] remove TZ offset texts from the list

2020-11-25 Thread Gabor Kaszab (Code Review)
Gabor Kaszab has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16689 )

Change subject: IMPALA-10306: [DOCS] remove TZ offset texts from the list
..


Patch Set 4: Code-Review+2


--
To view, visit http://gerrit.cloudera.org:8080/16689
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I693d15f1230dd7eebcbf2a16657a3850943749e1
Gerrit-Change-Number: 16689
Gerrit-PatchSet: 4
Gerrit-Owner: Shajini Thayasingh 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Tamas Mate 
Gerrit-Comment-Date: Wed, 25 Nov 2020 08:21:44 +
Gerrit-HasComments: No