[Impala-ASF-CR] IMPALA-10337: Consider MAX ROW SIZE when computing max reservation

2020-11-25 Thread Riza Suminto (Code Review)
Riza Suminto has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16765 )

Change subject: IMPALA-10337: Consider MAX_ROW_SIZE when computing max 
reservation
..


Patch Set 5:

(5 comments)

http://gerrit.cloudera.org:8080/#/c/16765/3//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/16765/3//COMMIT_MSG@10
PS3, Line 10: This happens
: because results are spilled using a SpillableRowBatchQueue which 
needs 2
: buffer (read and write) with at least MAX_ROW_SIZE bytes per 
buffer.
: This patch fixes this by s
> nit: how about :
Done


http://gerrit.cloudera.org:8080/#/c/16765/3/tests/query_test/test_result_spooling.py
File tests/query_test/test_result_spooling.py:

http://gerrit.cloudera.org:8080/#/c/16765/3/tests/query_test/test_result_spooling.py@68
PS3, Line 68: exec_options['max_row_size'] = 16 * 1024
> I think you can remove this change and the one in test_query_retries since
test_spilling and test_query_retries need to force spill by lowering spill 
related query options.
As consequence of this patch, MAX_ROW_SIZE now also need to be lowered in order 
to force spill.

The default MAX_ROW_SIZE is too high (512 KB) for the context of the tests.
With changes introduced in PlanRootSink.java, default MAX_ROW_SIZE will now 
contribute to pushing maxMemReservationBytes to 1 MB, which in turn will fit 
all rows in memory (no spill happens).

I have double check by commenting the MAX_ROW_SIZE option, and these tests 
failed without it.


http://gerrit.cloudera.org:8080/#/c/16765/3/tests/query_test/test_result_spooling.py@106
PS3, Line 106:   assert re.search(plan_root_sink_reservation_limit, profile)
This assertion, however, is OK to delete, because it is not the focus of the 
test.
I just add this to verify that ReservationLimit stay the same as before when 
MAX_ROW_SIZE does not contribute to maxMemReservationBytes.
Similarly with the one I added in test_query_retries.
Let me know what you think of.


http://gerrit.cloudera.org:8080/#/c/16765/3/tests/query_test/test_result_spooling.py@419
PS3, Line 419:   """These tests verify that while calculating max_reservation 
for spooling these query
> nit: you can add a comment here saying that these tests verify that while c
Done


http://gerrit.cloudera.org:8080/#/c/16765/3/tests/query_test/test_result_spooling.py@479
PS3, Line 479: ")
> nit: spills
Done



--
To view, visit http://gerrit.cloudera.org:8080/16765
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Id7138e1e034ea5d1cd15cf8de399690e52a9d726
Gerrit-Change-Number: 16765
Gerrit-PatchSet: 5
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Comment-Date: Thu, 26 Nov 2020 04:22:48 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-9865: part 1: basic profile log parser

2020-12-04 Thread Riza Suminto (Code Review)
Riza Suminto has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16821 )

Change subject: IMPALA-9865: part 1: basic profile log parser
..


Patch Set 2:

(2 comments)

> Uploaded patch set 2.

Hi Tim,
Thanks for writing this parser. This is very useful.
I just have 2 comment.

http://gerrit.cloudera.org:8080/#/c/16821/2/be/src/util/impala-profile-tool.cc
File be/src/util/impala-profile-tool.cc:

http://gerrit.cloudera.org:8080/#/c/16821/2/be/src/util/impala-profile-tool.cc@31
PS2, Line 31: // is pretty-printed to standard output.
Add simple usage example in the doc maybe? like

impala-profile-tool < impala_profile_log_1.1-1607057366897


http://gerrit.cloudera.org:8080/#/c/16821/2/be/src/util/impala-profile-tool.cc@59
PS2, Line 59: getline(cin, line);
Tried to run the parser against my local runtime profile log. It seems It 
always hit "Error reading line" when it reach EOF.
What if we move this getline as the loop condition? say

for (std::string line; std::getline(cin, line); ) {



--
To view, visit http://gerrit.cloudera.org:8080/16821
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I6178399ac96e176f7067cc47347e51cda2f3
Gerrit-Change-Number: 16821
Gerrit-PatchSet: 2
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: David Rorke 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Comment-Date: Sat, 05 Dec 2020 02:48:40 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-10337: Consider MAX ROW SIZE when computing max reservation

2020-12-03 Thread Riza Suminto (Code Review)
Hello Quanlong Huang, Bikramjeet Vig, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/16765

to look at the new patch set (#7).

Change subject: IMPALA-10337: Consider MAX_ROW_SIZE when computing max 
reservation
..

IMPALA-10337: Consider MAX_ROW_SIZE when computing max reservation

PlanRootSink can fail silently if result spooling is enabled and
maxMemReservationBytes is less than 2 * MAX_ROW_SIZE. This happens
because results are spilled using a SpillableRowBatchQueue which needs 2
buffer (read and write) with at least MAX_ROW_SIZE bytes per buffer.
This patch fixes this by setting a lower bound of 2 * MAX_ROW_SIZE while
computing the min reservation for the PlanRootSink.

Testing:
- Pass exhaustive tests.
- Add e2e TestResultSpoolingMaxReservation.
- Lower MAX_ROW_SIZE on tests where MAX_RESULT_SPOOLING_MEM is set to
  extremely low value. Also verify that PLAN_ROOT_SINK's ReservationLimit
  remain unchanged after lowering the MAX_ROW_SIZE.

Change-Id: Id7138e1e034ea5d1cd15cf8de399690e52a9d726
---
M be/src/runtime/buffered-tuple-stream.h
M be/src/runtime/spillable-row-batch-queue.cc
M fe/src/main/java/org/apache/impala/planner/PlanRootSink.java
M tests/custom_cluster/test_query_retries.py
M tests/query_test/test_result_spooling.py
5 files changed, 118 insertions(+), 9 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/65/16765/7
--
To view, visit http://gerrit.cloudera.org:8080/16765
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Id7138e1e034ea5d1cd15cf8de399690e52a9d726
Gerrit-Change-Number: 16765
Gerrit-PatchSet: 7
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Riza Suminto 


[Impala-ASF-CR] IMPALA-10337: Consider MAX ROW SIZE when computing max reservation

2020-12-03 Thread Riza Suminto (Code Review)
Riza Suminto has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16765 )

Change subject: IMPALA-10337: Consider MAX_ROW_SIZE when computing max 
reservation
..


Patch Set 7:

(3 comments)

Thanks Bikram,

http://gerrit.cloudera.org:8080/#/c/16765/6//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/16765/6//COMMIT_MSG@14
PS6, Line 14: computing the min reservation for the PlanRootS
> nit: i think this description is slightly opaque since the reader would hav
Done


http://gerrit.cloudera.org:8080/#/c/16765/6/fe/src/main/java/org/apache/impala/planner/PlanRootSink.java
File fe/src/main/java/org/apache/impala/planner/PlanRootSink.java:

http://gerrit.cloudera.org:8080/#/c/16765/6/fe/src/main/java/org/apache/impala/planner/PlanRootSink.java@90
PS6, Line 90: bufferSize, queryOptions.getMax_row_size());
> nit: can you add a small comment on why we need 2 maxRowBufferSize, either
I edited the method documentation a bit to explain this. Hope it is clear.


http://gerrit.cloudera.org:8080/#/c/16765/6/tests/query_test/test_result_spooling.py
File tests/query_test/test_result_spooling.py:

http://gerrit.cloudera.org:8080/#/c/16765/6/tests/query_test/test_result_spooling.py@426
PS6, Line 426: DEBUG_ACTION_V
> nit: switch to ALL_CAPS to be consistent with rest of the codebase.
Done



--
To view, visit http://gerrit.cloudera.org:8080/16765
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Id7138e1e034ea5d1cd15cf8de399690e52a9d726
Gerrit-Change-Number: 16765
Gerrit-PatchSet: 7
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Comment-Date: Fri, 04 Dec 2020 04:55:43 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-9865: part 1: basic profile log parser

2020-12-07 Thread Riza Suminto (Code Review)
Riza Suminto has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16821 )

Change subject: IMPALA-9865: part 1: basic profile log parser
..


Patch Set 4: Code-Review+1


--
To view, visit http://gerrit.cloudera.org:8080/16821
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I6178399ac96e176f7067cc47347e51cda2f3
Gerrit-Change-Number: 16821
Gerrit-PatchSet: 4
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: David Rorke 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Mon, 07 Dec 2020 22:39:57 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10337: Consider MAX ROW SIZE when computing max reservation

2020-11-23 Thread Riza Suminto (Code Review)
Riza Suminto has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16765 )

Change subject: IMPALA-10337: Consider MAX_ROW_SIZE when computing max 
reservation
..


Patch Set 1:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/16765/1/fe/src/main/java/org/apache/impala/planner/PlanRootSink.java
File fe/src/main/java/org/apache/impala/planner/PlanRootSink.java:

http://gerrit.cloudera.org:8080/#/c/16765/1/fe/src/main/java/org/apache/impala/planner/PlanRootSink.java@87
PS1, Line 87: getDefault_spillable_buffer_size
> Is there any relation to this and the max_row_Size?
My understanding is that DEFAULT_SPILLABLE_BUFFER_SIZE control the minimal unit 
of page size when increasing memory reservation. It is intended to tune 
performance in spill operation (larger buffer sizes result in Impala issuing 
larger I/O requests to storage devices, which might result in higher 
throughput) rather than in-memory allocation.
it contribute to calculation of maxRowBufferSize, but not to 
maxMemReservationBytes.

Let say user only set MAX_ROW_SIZE to 256MB, and leave 
DEFAULT_SPILLABLE_BUFFER_SIZE and MAX_RESULT_SPOOLING_MEM as default (2MB and 
100 MB accordingly). Without the patch, the result is the following:
bufferSize = 2MB
maxRowBufferSize = 256MB (rounded up to nearest power of two)
minMemReservationBytes = 4MB
maxMemReservationBytes = 100MB

maxMemReservationBytes should be at least 2 * MAX_ROW_SIZE for reservation of 
read+write page to be satisfied, which is not true in this case.


http://gerrit.cloudera.org:8080/#/c/16765/1/tests/custom_cluster/test_query_retries.py
File tests/custom_cluster/test_query_retries.py:

http://gerrit.cloudera.org:8080/#/c/16765/1/tests/custom_cluster/test_query_retries.py@589
PS1, Line 589: 'max_row_size': 4 * 1024,
> is this having any effect on the query? the max_result_spooling_mem is alre
It intended to keep the test the same, keeping the maxMemReservationBytes to 
8KB.

Default MAX_ROW_SIZE is 512KB. With the proposed changes in 
PlannerRootSink.java and without explicitly lowering MAX_ROW_SIZE to 4KB, 
MAX_RESULT_SPOOLING_MEM will be ignored and maxMemReservationBytes will be set 
to 1MB instead of intended 8KB.



--
To view, visit http://gerrit.cloudera.org:8080/16765
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Id7138e1e034ea5d1cd15cf8de399690e52a9d726
Gerrit-Change-Number: 16765
Gerrit-PatchSet: 1
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Comment-Date: Tue, 24 Nov 2020 01:28:28 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-10337: Consider MAX ROW SIZE when computing max reservation

2020-11-30 Thread Riza Suminto (Code Review)
Riza Suminto has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16765 )

Change subject: IMPALA-10337: Consider MAX_ROW_SIZE when computing max 
reservation
..


Patch Set 6:

(5 comments)

http://gerrit.cloudera.org:8080/#/c/16765/3//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/16765/3//COMMIT_MSG@26
PS3, Line 26:
> A better test is using SET_DENY_RESERVATION_PROBABILITY to verify the Buffe
Added debug action SET_DENY_RESERVATION_PROBABILITY@1.0 in 
TestResultSpoolingMaxReservation.
test_spilling_large_rows already exercise SET_DENY_RESERVATION_PROBABILITY with 
mt_dop=1.


http://gerrit.cloudera.org:8080/#/c/16765/5/be/src/runtime/spillable-row-batch-queue.cc
File be/src/runtime/spillable-row-batch-queue.cc:

http://gerrit.cloudera.org:8080/#/c/16765/5/be/src/runtime/spillable-row-batch-queue.cc@97
PS5, Line 97: // have been set.
> It'd be helpful to log the row size and batch_queue_->DebugString() here.
Added row size and batch_queue_->DebugString() info in the log.

However, this reminds me of IMPALA-9851.
BufferedTupleStream::DebugString() iterate std::list that can
potentially grow very large. While most references to this method are
behind DCHEK logging (thus stripped in release build), one of them is
used to build Status message through call to
GroupingAggregator::Partition::DebugString() at this line:
https://github.com/apache/impala/blob/5530b62/be/src/exec/grouping-aggregator-partition.cc#L155

IMPALA-9851 already cap error message to 128kb at most, but that does
not eliminate the cost to iterate page list in
BufferedTupleStream::DebugString(). Should I file this as separate Jira?


http://gerrit.cloudera.org:8080/#/c/16765/3/fe/src/main/java/org/apache/impala/planner/PlanRootSink.java
File fe/src/main/java/org/apache/impala/planner/PlanRootSink.java:

http://gerrit.cloudera.org:8080/#/c/16765/3/fe/src/main/java/org/apache/impala/planner/PlanRootSink.java@90
PS3, Line 90:   long minMemReservationBytes = 2 * maxRowBufferSize;
> We should be able to work using the minimal reservation. So I think the pro
Make sense.
Bump up minMemReservation instead of maxMemReservationBytes in patch set #6.


http://gerrit.cloudera.org:8080/#/c/16765/3/tests/query_test/test_result_spooling.py
File tests/query_test/test_result_spooling.py:

http://gerrit.cloudera.org:8080/#/c/16765/3/tests/query_test/test_result_spooling.py@68
PS3, Line 68: exec_options['max_row_size'] = 16 * 1024
> test_spilling and test_query_retries need to force spill by lowering spill
Done


http://gerrit.cloudera.org:8080/#/c/16765/3/tests/query_test/test_result_spooling.py@106
PS3, Line 106:   assert re.search(plan_root_sink_reservation_limit, profile)
> Thanks for the context above, makes sense to keep this too.
Done



--
To view, visit http://gerrit.cloudera.org:8080/16765
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Id7138e1e034ea5d1cd15cf8de399690e52a9d726
Gerrit-Change-Number: 16765
Gerrit-PatchSet: 6
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Comment-Date: Mon, 30 Nov 2020 22:32:30 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-10337: Consider MAX ROW SIZE when computing max reservation

2020-11-30 Thread Riza Suminto (Code Review)
Hello Quanlong Huang, Bikramjeet Vig, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/16765

to look at the new patch set (#6).

Change subject: IMPALA-10337: Consider MAX_ROW_SIZE when computing max 
reservation
..

IMPALA-10337: Consider MAX_ROW_SIZE when computing max reservation

PlanRootSink can fail silently if result spooling is enabled and
maxMemReservationBytes is less than 2 * MAX_ROW_SIZE. This happens
because results are spilled using a SpillableRowBatchQueue which needs 2
buffer (read and write) with at least MAX_ROW_SIZE bytes per buffer.
This patch fixes this by setting the PlanRootSink's
minMemReservationBytes as 2 * maxRowBufferSize.

Testing:
- Pass exhaustive tests.
- Add e2e TestResultSpoolingMaxReservation.
- Lower MAX_ROW_SIZE on tests where MAX_RESULT_SPOOLING_MEM is set to
  extremely low value. Also verify that PLAN_ROOT_SINK's ReservationLimit
  remain unchanged after lowering the MAX_ROW_SIZE.

Change-Id: Id7138e1e034ea5d1cd15cf8de399690e52a9d726
---
M be/src/runtime/buffered-tuple-stream.h
M be/src/runtime/spillable-row-batch-queue.cc
M fe/src/main/java/org/apache/impala/planner/PlanRootSink.java
M tests/custom_cluster/test_query_retries.py
M tests/query_test/test_result_spooling.py
5 files changed, 115 insertions(+), 7 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/65/16765/6
--
To view, visit http://gerrit.cloudera.org:8080/16765
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Id7138e1e034ea5d1cd15cf8de399690e52a9d726
Gerrit-Change-Number: 16765
Gerrit-PatchSet: 6
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Riza Suminto 


[Impala-ASF-CR] IMPALA-10337: Consider MAX ROW SIZE when computing max reservation

2020-12-02 Thread Riza Suminto (Code Review)
Riza Suminto has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16765 )

Change subject: IMPALA-10337: Consider MAX_ROW_SIZE when computing max 
reservation
..


Patch Set 6:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/16765/5/be/src/runtime/spillable-row-batch-queue.cc
File be/src/runtime/spillable-row-batch-queue.cc:

http://gerrit.cloudera.org:8080/#/c/16765/5/be/src/runtime/spillable-row-batch-queue.cc@97
PS5, Line 97: // have been set.
> Good point! I didin't realize IMPALA-9851 when adding that DebugString() in
Done.
Filed as IMPALA-10374.



--
To view, visit http://gerrit.cloudera.org:8080/16765
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Id7138e1e034ea5d1cd15cf8de399690e52a9d726
Gerrit-Change-Number: 16765
Gerrit-PatchSet: 6
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Comment-Date: Wed, 02 Dec 2020 18:34:13 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-10390: impala-profile-tool JSON output

2020-12-10 Thread Riza Suminto (Code Review)
Riza Suminto has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16855 )

Change subject: IMPALA-10390: impala-profile-tool JSON output
..


Patch Set 2:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/16855/2/be/src/util/impala-profile-tool.cc
File be/src/util/impala-profile-tool.cc:

http://gerrit.cloudera.org:8080/#/c/16855/2/be/src/util/impala-profile-tool.cc@44
PS2, Line 44: "   json: output as JSON with one profile per line\n"
missing description for prettyjson option?


http://gerrit.cloudera.org:8080/#/c/16855/2/be/src/util/impala-profile-tool.cc@120
PS2, Line 120:   CHECK_EQ("json", profile_format);
For json and prettyjson format, what if we return array of json profile?
So if this parser return multiple profile, the output is still a valid json and 
can be piped to other program that read json like jq.



--
To view, visit http://gerrit.cloudera.org:8080/16855
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I82ae0fe9379b7e3cbe93166adaa4c37212ea0f67
Gerrit-Change-Number: 16855
Gerrit-PatchSet: 2
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Comment-Date: Thu, 10 Dec 2020 21:16:17 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-9856: Enable result spooling by default.

2020-12-10 Thread Riza Suminto (Code Review)
Riza Suminto has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16755 )

Change subject: IMPALA-9856: Enable result spooling by default.
..


Patch Set 3:

(10 comments)

Patch set 3 is a rebase after several commits get in, including IMPALA-10337.
I add more descriptive comments on tests where we need to explicitly disable 
result spooling.

We still need to wait for IMPALA-10371 to be resolved before we can enable 
result spooling by default.

http://gerrit.cloudera.org:8080/#/c/16755/1/tests/custom_cluster/test_admission_controller.py
File tests/custom_cluster/test_admission_controller.py:

http://gerrit.cloudera.org:8080/#/c/16755/1/tests/custom_cluster/test_admission_controller.py@337
PS1, Line 337: _timeout_
> nit: mention briefly what we are trying to assert, that way we know from th
Done


http://gerrit.cloudera.org:8080/#/c/16755/1/tests/custom_cluster/test_admission_controller.py@873
PS1, Line 873: u
> flake8: E203 whitespace before ':'
Done


http://gerrit.cloudera.org:8080/#/c/16755/2/tests/custom_cluster/test_admission_controller.py
File tests/custom_cluster/test_admission_controller.py:

http://gerrit.cloudera.org:8080/#/c/16755/2/tests/custom_cluster/test_admission_controller.py@883
PS2, Line 883: :
> flake8: E203 whitespace before ':'
Done


http://gerrit.cloudera.org:8080/#/c/16755/1/tests/custom_cluster/test_observability.py
File tests/custom_cluster/test_observability.py:

http://gerrit.cloudera.org:8080/#/c/16755/1/tests/custom_cluster/test_observability.py@37
PS1, Line 37: ue to unresolve
> nit: mention why will this cause a crash
Done


http://gerrit.cloudera.org:8080/#/c/16755/1/tests/custom_cluster/test_observability.py@38
PS1, Line 38: l
> flake8: E203 whitespace before ':'
Done


http://gerrit.cloudera.org:8080/#/c/16755/2/tests/custom_cluster/test_observability.py
File tests/custom_cluster/test_observability.py:

http://gerrit.cloudera.org:8080/#/c/16755/2/tests/custom_cluster/test_observability.py@40
PS2, Line 40: :
> flake8: E203 whitespace before ':'
Done


http://gerrit.cloudera.org:8080/#/c/16755/1/tests/query_test/test_udfs.py
File tests/query_test/test_udfs.py:

http://gerrit.cloudera.org:8080/#/c/16755/1/tests/query_test/test_udfs.py@623
PS1, Line 623: l
> flake8: E203 whitespace before ':'
Done


http://gerrit.cloudera.org:8080/#/c/16755/1/tests/query_test/test_udfs.py@630
PS1, Line 630:
> flake8: E501 line too long (98 > 90 characters)
Done


http://gerrit.cloudera.org:8080/#/c/16755/2/tests/query_test/test_udfs.py
File tests/query_test/test_udfs.py:

http://gerrit.cloudera.org:8080/#/c/16755/2/tests/query_test/test_udfs.py@625
PS2, Line 625: :
> flake8: E203 whitespace before ':'
Done


http://gerrit.cloudera.org:8080/#/c/16755/2/tests/query_test/test_udfs.py@632
PS2, Line 632:
> flake8: E501 line too long (98 > 90 characters)
Done



--
To view, visit http://gerrit.cloudera.org:8080/16755
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I9e360c1428676d8f3fab5d95efee18aca085eba4
Gerrit-Change-Number: 16755
Gerrit-PatchSet: 3
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Comment-Date: Thu, 10 Dec 2020 18:38:45 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-9856: Enable result spooling by default.

2020-12-10 Thread Riza Suminto (Code Review)
Hello Quanlong Huang, Bikramjeet Vig, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/16755

to look at the new patch set (#2).

Change subject: IMPALA-9856: Enable result spooling by default.
..

IMPALA-9856: Enable result spooling by default.

Result spooling has been relatively stable since it was introduced, and
it has several benefits described in IMPALA-8656. This patch enable
result spooling (SPOOL_QUERY_RESULTS) query options by default.

Furthermore, some tests need to be adjusted to account for result
spooling by default. The following are the adjustment categories and
list of tests that fall under such category.

Change in assertions:
PlannerTest#testAcidTableScans
PlannerTest#testBloomFilterAssignment
PlannerTest#testConstantFolding
PlannerTest#testFkPkJoinDetection
PlannerTest#testFkPkJoinDetectionWithHDFSNumRowsEstDisabled
PlannerTest#testKuduSelectivity
PlannerTest#testMaxRowSize
PlannerTest#testMinMaxRuntimeFilters
PlannerTest#testMinMaxRuntimeFiltersWithHDFSNumRowsEstDisabled
PlannerTest#testMtDopValidation
PlannerTest#testParquetFiltering
PlannerTest#testParquetFilteringDisabled
PlannerTest#testPartitionPruning
PlannerTest#testPreaggBytesLimit
PlannerTest#testResourceRequirements
PlannerTest#testRuntimeFilterQueryOptions
PlannerTest#testSortExprMaterialization
PlannerTest#testSpillableBufferSizing
PlannerTest#testTableSample
PlannerTest#testTpch
PlannerTest#testKuduTpch
PlannerTest#testTpchNested
PlannerTest#testUnion
TpcdsPlannerTest
custom_cluster/test_admission_controller.py::TestAdmissionController::test_dedicated_coordinator_planner_estimates
custom_cluster/test_admission_controller.py::TestAdmissionController::test_memory_rejection
custom_cluster/test_admission_controller.py::TestAdmissionController::test_pool_mem_limit_configs
metadata/test_explain.py::TestExplain::test_explain_level2
metadata/test_explain.py::TestExplain::test_explain_level3
metadata/test_stats_extrapolation.py::TestStatsExtrapolation::test_stats_extrapolation

Increase BUFFER_POOL_LIMIT:
query_test/test_queries.py::TestQueries::test_analytic_fns
query_test/test_runtime_filters.py::TestRuntimeRowFilters::test_row_filter_reservation
query_test/test_sort.py::TestQueryFullSort::test_multiple_mem_limits_full_output
query_test/test_spilling.py::TestSpillingBroadcastJoins::test_spilling_broadcast_joins
query_test/test_spilling.py::TestSpillingDebugActionDimensions::test_spilling_aggs
query_test/test_spilling.py::TestSpillingDebugActionDimensions::test_spilling_regression_exhaustive
query_test/test_udfs.py::TestUdfExecution::test_mem_limits

Increase MEM_LIMIT:
query_test/test_mem_usage_scaling.py::TestExchangeMemUsage::test_exchange_mem_usage_scaling
query_test/test_mem_usage_scaling.py::TestScanMemLimit::test_hdfs_scanner_thread_mem_scaling

Increase MAX_ROW_SIZE:
custom_cluster/test_parquet_max_page_header.py::TestParquetMaxPageHeader::test_large_page_header_config
query_test/test_insert.py::TestInsertQueries::test_insert_large_string
query_test/test_query_mem_limit.py::TestQueryMemLimit::test_mem_limit
query_test/test_scanners.py::TestTextSplitDelimiters::test_text_split_across_buffers_delimiter
query_test/test_scanners.py::TestWideRow::test_wide_row

Disable result spooling to maintain assertion:
custom_cluster/test_admission_controller.py::TestAdmissionController::test_set_request_pool
custom_cluster/test_admission_controller.py::TestAdmissionController::test_timeout_reason_host_memory
custom_cluster/test_admission_controller.py::TestAdmissionController::test_timeout_reason_pool_memory
custom_cluster/test_admission_controller.py::TestAdmissionController::test_queue_reasons_memory
custom_cluster/test_admission_controller.py::TestAdmissionController::test_pool_config_change_while_queued
custom_cluster/test_query_retries.py::TestQueryRetries::test_retry_fetched_rows
custom_cluster/test_query_retries.py::TestQueryRetries::test_retry_finished_query
custom_cluster/test_scratch_disk.py::TestScratchDir::test_no_dirs
custom_cluster/test_scratch_disk.py::TestScratchDir::test_non_existing_dirs
custom_cluster/test_scratch_disk.py::TestScratchDir::test_non_writable_dirs
query_test/test_insert.py::TestInsertQueries::test_insert_large_string (the 
last query only)
query_test/test_kudu.py::TestKuduMemLimits::test_low_mem_limit_low_selectivity_scan
query_test/test_mem_usage_scaling.py::TestScanMemLimit::test_kudu_scan_mem_usage
query_test/test_queries.py::TestQueriesParquetTables::test_very_large_strings
query_test/test_query_mem_limit.py::TestCodegenMemLimit::test_codegen_mem_limit
shell/test_shell_client.py::TestShellClient::test_fetch_size

Disable result spooling to avoid crash (IMPALA-10371):
custom_cluster/test_observability.py::TestObservability::test_host_profile_jvm_gc_metrics
query_test/test_udfs.py::TestUdfExecution::test_java_udfs
query_test/test_udfs.py::TestUdfTargeted::test_udf_profile


[Impala-ASF-CR] IMPALA-9856: Enable result spooling by default.

2020-12-10 Thread Riza Suminto (Code Review)
Hello Quanlong Huang, Bikramjeet Vig, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/16755

to look at the new patch set (#3).

Change subject: IMPALA-9856: Enable result spooling by default.
..

IMPALA-9856: Enable result spooling by default.

Result spooling has been relatively stable since it was introduced, and
it has several benefits described in IMPALA-8656. This patch enable
result spooling (SPOOL_QUERY_RESULTS) query options by default.

Furthermore, some tests need to be adjusted to account for result
spooling by default. The following are the adjustment categories and
list of tests that fall under such category.

Change in assertions:
PlannerTest#testAcidTableScans
PlannerTest#testBloomFilterAssignment
PlannerTest#testConstantFolding
PlannerTest#testFkPkJoinDetection
PlannerTest#testFkPkJoinDetectionWithHDFSNumRowsEstDisabled
PlannerTest#testKuduSelectivity
PlannerTest#testMaxRowSize
PlannerTest#testMinMaxRuntimeFilters
PlannerTest#testMinMaxRuntimeFiltersWithHDFSNumRowsEstDisabled
PlannerTest#testMtDopValidation
PlannerTest#testParquetFiltering
PlannerTest#testParquetFilteringDisabled
PlannerTest#testPartitionPruning
PlannerTest#testPreaggBytesLimit
PlannerTest#testResourceRequirements
PlannerTest#testRuntimeFilterQueryOptions
PlannerTest#testSortExprMaterialization
PlannerTest#testSpillableBufferSizing
PlannerTest#testTableSample
PlannerTest#testTpch
PlannerTest#testKuduTpch
PlannerTest#testTpchNested
PlannerTest#testUnion
TpcdsPlannerTest
custom_cluster/test_admission_controller.py::TestAdmissionController::test_dedicated_coordinator_planner_estimates
custom_cluster/test_admission_controller.py::TestAdmissionController::test_memory_rejection
custom_cluster/test_admission_controller.py::TestAdmissionController::test_pool_mem_limit_configs
metadata/test_explain.py::TestExplain::test_explain_level2
metadata/test_explain.py::TestExplain::test_explain_level3
metadata/test_stats_extrapolation.py::TestStatsExtrapolation::test_stats_extrapolation

Increase BUFFER_POOL_LIMIT:
query_test/test_queries.py::TestQueries::test_analytic_fns
query_test/test_runtime_filters.py::TestRuntimeRowFilters::test_row_filter_reservation
query_test/test_sort.py::TestQueryFullSort::test_multiple_mem_limits_full_output
query_test/test_spilling.py::TestSpillingBroadcastJoins::test_spilling_broadcast_joins
query_test/test_spilling.py::TestSpillingDebugActionDimensions::test_spilling_aggs
query_test/test_spilling.py::TestSpillingDebugActionDimensions::test_spilling_regression_exhaustive
query_test/test_udfs.py::TestUdfExecution::test_mem_limits

Increase MEM_LIMIT:
query_test/test_mem_usage_scaling.py::TestExchangeMemUsage::test_exchange_mem_usage_scaling
query_test/test_mem_usage_scaling.py::TestScanMemLimit::test_hdfs_scanner_thread_mem_scaling

Increase MAX_ROW_SIZE:
custom_cluster/test_parquet_max_page_header.py::TestParquetMaxPageHeader::test_large_page_header_config
query_test/test_insert.py::TestInsertQueries::test_insert_large_string
query_test/test_query_mem_limit.py::TestQueryMemLimit::test_mem_limit
query_test/test_scanners.py::TestTextSplitDelimiters::test_text_split_across_buffers_delimiter
query_test/test_scanners.py::TestWideRow::test_wide_row

Disable result spooling to maintain assertion:
custom_cluster/test_admission_controller.py::TestAdmissionController::test_set_request_pool
custom_cluster/test_admission_controller.py::TestAdmissionController::test_timeout_reason_host_memory
custom_cluster/test_admission_controller.py::TestAdmissionController::test_timeout_reason_pool_memory
custom_cluster/test_admission_controller.py::TestAdmissionController::test_queue_reasons_memory
custom_cluster/test_admission_controller.py::TestAdmissionController::test_pool_config_change_while_queued
custom_cluster/test_query_retries.py::TestQueryRetries::test_retry_fetched_rows
custom_cluster/test_query_retries.py::TestQueryRetries::test_retry_finished_query
custom_cluster/test_scratch_disk.py::TestScratchDir::test_no_dirs
custom_cluster/test_scratch_disk.py::TestScratchDir::test_non_existing_dirs
custom_cluster/test_scratch_disk.py::TestScratchDir::test_non_writable_dirs
query_test/test_insert.py::TestInsertQueries::test_insert_large_string (the 
last query only)
query_test/test_kudu.py::TestKuduMemLimits::test_low_mem_limit_low_selectivity_scan
query_test/test_mem_usage_scaling.py::TestScanMemLimit::test_kudu_scan_mem_usage
query_test/test_queries.py::TestQueriesParquetTables::test_very_large_strings
query_test/test_query_mem_limit.py::TestCodegenMemLimit::test_codegen_mem_limit
shell/test_shell_client.py::TestShellClient::test_fetch_size

Disable result spooling to avoid crash (IMPALA-10371):
custom_cluster/test_observability.py::TestObservability::test_host_profile_jvm_gc_metrics
query_test/test_udfs.py::TestUdfExecution::test_java_udfs
query_test/test_udfs.py::TestUdfTargeted::test_udf_profile


[Impala-ASF-CR] IMPALA-10390: impala-profile-tool JSON output

2020-12-10 Thread Riza Suminto (Code Review)
Riza Suminto has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16855 )

Change subject: IMPALA-10390: impala-profile-tool JSON output
..


Patch Set 3: Code-Review+1

(1 comment)

Thank you, Tim. LGTM!

http://gerrit.cloudera.org:8080/#/c/16855/2/be/src/util/impala-profile-tool.cc
File be/src/util/impala-profile-tool.cc:

http://gerrit.cloudera.org:8080/#/c/16855/2/be/src/util/impala-profile-tool.cc@120
PS2, Line 120: if (profile_format == "text") {
> The one line per document is pretty common and is a bit easier to handle wi
Ah, didn't now there is a .jsonl format. I learn new thing.



--
To view, visit http://gerrit.cloudera.org:8080/16855
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I82ae0fe9379b7e3cbe93166adaa4c37212ea0f67
Gerrit-Change-Number: 16855
Gerrit-PatchSet: 3
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Fri, 11 Dec 2020 02:10:45 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-10374: Limit iteration at BufferedTupleStream::DebugString

2020-12-21 Thread Riza Suminto (Code Review)
Hello Quanlong Huang, Csaba Ringhofer, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/16884

to look at the new patch set (#2).

Change subject: IMPALA-10374: Limit iteration at 
BufferedTupleStream::DebugString
..

IMPALA-10374: Limit iteration at BufferedTupleStream::DebugString

BufferedTupleStream::DebugString() iterate std::list that can
potentially grow very large. As consequent, the returned string can grow
large as well and cause a problem as previously happen in IMPALA-9851.
With this patch, BufferedTupleStream::DebugString() only include maximum
of 100 first pages of page list.

Testing:
- Add new be test SimpleTupleStreamTest.ShortDebugString in
  buffered-tuple-stream-test.cc
- Pass core tests

Change-Id: I6626c8d54f35f303c01f85be1dd9aa54c8ad9a2d
---
M be/src/runtime/buffered-tuple-stream-test.cc
M be/src/runtime/buffered-tuple-stream.cc
M be/src/runtime/buffered-tuple-stream.h
3 files changed, 73 insertions(+), 4 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/84/16884/2
--
To view, visit http://gerrit.cloudera.org:8080/16884
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I6626c8d54f35f303c01f85be1dd9aa54c8ad9a2d
Gerrit-Change-Number: 16884
Gerrit-PatchSet: 2
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Riza Suminto 


[Impala-ASF-CR] IMPALA-10374: Limit iteration at BufferedTupleStream::DebugString

2020-12-21 Thread Riza Suminto (Code Review)
Riza Suminto has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16884 )

Change subject: IMPALA-10374: Limit iteration at 
BufferedTupleStream::DebugString
..


Patch Set 2:

(5 comments)

http://gerrit.cloudera.org:8080/#/c/16884/1/be/src/runtime/buffered-tuple-stream-test.cc
File be/src/runtime/buffered-tuple-stream-test.cc:

http://gerrit.cloudera.org:8080/#/c/16884/1/be/src/runtime/buffered-tuple-stream-test.cc@1472
PS1, Line 1472: SimpleTupleStreamTest
> nit: To be consistent with other tests, I think the label should be "Simple
Done


http://gerrit.cloudera.org:8080/#/c/16884/1/be/src/runtime/buffered-tuple-stream-test.cc@1485
PS1, Line 1485: bool b = stream.AddRow(batch->GetRow(j), );
> I think we can remove this TODO.
Done


http://gerrit.cloudera.org:8080/#/c/16884/1/be/src/runtime/buffered-tuple-stream-test.cc@1508
PS1, Line 1508:
> Can we calculate 148 out based on the vars? I'm concerning it will introduc
It is not obvious for me to calculate it based on the vars.
So instead, I change it to look up stream.num_pages_ directly and add 
verification that num_pages_ > MAX_PAGE_ITER_DEBUG, given the workload.


http://gerrit.cloudera.org:8080/#/c/16884/2/be/src/runtime/buffered-tuple-stream.h
File be/src/runtime/buffered-tuple-stream.h:

http://gerrit.cloudera.org:8080/#/c/16884/2/be/src/runtime/buffered-tuple-stream.h@563
PS2, Line 563:   friend class ArrayTupleStreamTest_TestArrayDeepCopy_Test;
 :   friend class ArrayTupleStreamTest_TestComputeRowSize_Test;
 :   friend class 
MultiNullableTupleStreamTest_TestComputeRowSize_Test;
I notice that since IMPALA-10337 changed BufferedTupleStream::ComputeRowSize() 
from private to public, these lines are not required anymore.
Can I delete them along with this patch?


http://gerrit.cloudera.org:8080/#/c/16884/1/be/src/runtime/buffered-tuple-stream.cc
File be/src/runtime/buffered-tuple-stream.cc:

http://gerrit.cloudera.org:8080/#/c/16884/1/be/src/runtime/buffered-tuple-stream.cc@189
PS1, Line 189: \
> Comparing to the original codes, we should have "\n" here.
Done. Thanks for catching this!



--
To view, visit http://gerrit.cloudera.org:8080/16884
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I6626c8d54f35f303c01f85be1dd9aa54c8ad9a2d
Gerrit-Change-Number: 16884
Gerrit-PatchSet: 2
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Comment-Date: Mon, 21 Dec 2020 22:14:24 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-10374: Limit iteration at BufferedTupleStream::DebugString

2020-12-16 Thread Riza Suminto (Code Review)
Riza Suminto has uploaded this change for review. ( 
http://gerrit.cloudera.org:8080/16884


Change subject: IMPALA-10374: Limit iteration at 
BufferedTupleStream::DebugString
..

IMPALA-10374: Limit iteration at BufferedTupleStream::DebugString

BufferedTupleStream::DebugString() iterate std::list that can
potentially grow very large. As consequent, the returned string can grow
large as well and cause a problem as previously happen in IMPALA-9851.
With this patch, BufferedTupleStream::DebugString() only include maximum
of 100 first pages of page list.

Testing:
- Add new be test SimpleNullStreamTest.ShortDebugString in
  buffered-tuple-stream-test.cc
- Pass core tests

Change-Id: I6626c8d54f35f303c01f85be1dd9aa54c8ad9a2d
---
M be/src/runtime/buffered-tuple-stream-test.cc
M be/src/runtime/buffered-tuple-stream.cc
M be/src/runtime/buffered-tuple-stream.h
3 files changed, 71 insertions(+), 4 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/84/16884/1
--
To view, visit http://gerrit.cloudera.org:8080/16884
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I6626c8d54f35f303c01f85be1dd9aa54c8ad9a2d
Gerrit-Change-Number: 16884
Gerrit-PatchSet: 1
Gerrit-Owner: Riza Suminto 


[Impala-ASF-CR] IMPALA-10374: Limit iteration at BufferedTupleStream::DebugString

2020-12-21 Thread Riza Suminto (Code Review)
Hello Quanlong Huang, Csaba Ringhofer, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/16884

to look at the new patch set (#3).

Change subject: IMPALA-10374: Limit iteration at 
BufferedTupleStream::DebugString
..

IMPALA-10374: Limit iteration at BufferedTupleStream::DebugString

BufferedTupleStream::DebugString() iterate std::list that can
potentially grow very large. As consequent, the returned string can grow
large as well and cause a problem as previously happen in IMPALA-9851.
With this patch, BufferedTupleStream::DebugString() only include maximum
of 100 first pages of page list.

Testing:
- Add new be test SimpleTupleStreamTest.ShortDebugString in
  buffered-tuple-stream-test.cc
- Pass core tests

Change-Id: I6626c8d54f35f303c01f85be1dd9aa54c8ad9a2d
---
M be/src/runtime/buffered-tuple-stream-test.cc
M be/src/runtime/buffered-tuple-stream.cc
M be/src/runtime/buffered-tuple-stream.h
3 files changed, 73 insertions(+), 7 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/84/16884/3
--
To view, visit http://gerrit.cloudera.org:8080/16884
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I6626c8d54f35f303c01f85be1dd9aa54c8ad9a2d
Gerrit-Change-Number: 16884
Gerrit-PatchSet: 3
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Riza Suminto 


[Impala-ASF-CR] IMPALA-9865: part 2/2: add verbosity to profile tool

2020-12-21 Thread Riza Suminto (Code Review)
Riza Suminto has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16881 )

Change subject: IMPALA-9865: part 2/2: add verbosity to profile tool
..


Patch Set 8: Code-Review+1

(2 comments)

Thanks for the explanation. LGTM.

http://gerrit.cloudera.org:8080/#/c/16881/7/be/src/util/runtime-profile.cc
File be/src/util/runtime-profile.cc:

http://gerrit.cloudera.org:8080/#/c/16881/7/be/src/util/runtime-profile.cc@2723
PS7, Line 2723:   // Legacy profile did not show per-instance stats for 
averaged profile.
> Done
Resolved


http://gerrit.cloudera.org:8080/#/c/16881/6/tests/observability/test_profile_tool.py
File tests/observability/test_profile_tool.py:

http://gerrit.cloudera.org:8080/#/c/16881/6/tests/observability/test_profile_tool.py@36
PS6, Line 36: self._compare_profile_tool_output([],
> Yeah, producing identical output is not necessarily the goal of this tool,
Got it, thanks!



--
To view, visit http://gerrit.cloudera.org:8080/16881
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I82618a813e29af7996dfaed78873b2a73bc0231d
Gerrit-Change-Number: 16881
Gerrit-PatchSet: 8
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Tue, 22 Dec 2020 06:58:34 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-10374: Limit iteration at BufferedTupleStream::DebugString

2020-12-21 Thread Riza Suminto (Code Review)
Riza Suminto has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16884 )

Change subject: IMPALA-10374: Limit iteration at 
BufferedTupleStream::DebugString
..


Patch Set 3:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/16884/2/be/src/runtime/buffered-tuple-stream.h
File be/src/runtime/buffered-tuple-stream.h:

http://gerrit.cloudera.org:8080/#/c/16884/2/be/src/runtime/buffered-tuple-stream.h@563
PS2, Line 563:   friend class SimpleTupleStreamTest_ShortDebugString_Test;
 :
 :   /// Runtime state instance used to check for cancellation. Not 
own
> Yeah, please delete them.
Done



--
To view, visit http://gerrit.cloudera.org:8080/16884
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I6626c8d54f35f303c01f85be1dd9aa54c8ad9a2d
Gerrit-Change-Number: 16884
Gerrit-PatchSet: 3
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Comment-Date: Tue, 22 Dec 2020 03:26:51 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-9865: part 2/2: add verbosity to profile tool

2020-12-21 Thread Riza Suminto (Code Review)
Riza Suminto has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16881 )

Change subject: IMPALA-9865: part 2/2: add verbosity to profile tool
..


Patch Set 7:

(6 comments)

I have 1 follow up question and 1 new finding.

http://gerrit.cloudera.org:8080/#/c/16881/6/be/src/util/impala-profile-tool.cc
File be/src/util/impala-profile-tool.cc:

http://gerrit.cloudera.org:8080/#/c/16881/6/be/src/util/impala-profile-tool.cc@88
PS6, Line 88:   RuntimeProfileBase::Verbosity configured_verbosity =
> We shouldn't actually be using the default value, since we only use if it's
Done


http://gerrit.cloudera.org:8080/#/c/16881/6/be/src/util/impala-profile-tool.cc@136
PS6, Line 136: if (FLAGS_profile_verbosity == "") {
> Yes and no. We can set the verbosity level to any level for any profile, bu
Make sense. Thanks for clarifying this.


http://gerrit.cloudera.org:8080/#/c/16881/7/be/src/util/runtime-profile.cc
File be/src/util/runtime-profile.cc:

http://gerrit.cloudera.org:8080/#/c/16881/7/be/src/util/runtime-profile.cc@2723
PS7, Line 2723:   if (verbosity <= Verbosity::LEGACY) continue;
Looks like this condition can be flipped and merged with the branch bellow?

if (verbosity >= Verbosity::DEFAULT && v.second.second.size() > 1) {


http://gerrit.cloudera.org:8080/#/c/16881/6/bin/rat_exclude_files.txt
File bin/rat_exclude_files.txt:

http://gerrit.cloudera.org:8080/#/c/16881/6/bin/rat_exclude_files.txt@157
PS6, Line 157: 
testdata/impala-profiles/impala_profile_log_tpcds_compute_stats.expected.txt
> Done
Resolved


http://gerrit.cloudera.org:8080/#/c/16881/6/tests/observability/test_profile_tool.py
File tests/observability/test_profile_tool.py:

http://gerrit.cloudera.org:8080/#/c/16881/6/tests/observability/test_profile_tool.py@36
PS6, Line 36: self._compare_profile_tool_output([],
> Leaving --profile_verbosity unset is different from setting it to --legacy
Ok, I understand the difference now. Setting --profile_verbosity=legacy means 
forcing legacy format, regardless of the profile_version.

Since we don't test with --profile_verbosity=legacy, am I correct to assume 
that accurate backward-compatible parsing of v2 profile to v1 representation is 
not the goal of impala-profile-tool?


http://gerrit.cloudera.org:8080/#/c/16881/6/tests/observability/test_profile_tool.py@38
PS6, Line 38: 
get_profile_path('impala_profile_log_tpcds_compute_stats.expected.txt'))
> See above comment - this is actually kinda different from legacy
Got it, thanks!



--
To view, visit http://gerrit.cloudera.org:8080/16881
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I82618a813e29af7996dfaed78873b2a73bc0231d
Gerrit-Change-Number: 16881
Gerrit-PatchSet: 7
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Tue, 22 Dec 2020 03:12:09 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-10374: Limit iteration at BufferedTupleStream::DebugString

2020-12-22 Thread Riza Suminto (Code Review)
Riza Suminto has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16884 )

Change subject: IMPALA-10374: Limit iteration at 
BufferedTupleStream::DebugString
..


Patch Set 4:

> Patch Set 4: Verified-1
>
> Build failed: https://jenkins.impala.io/job/gerrit-verify-dryrun/6800/

Hit by IMPALA-9550.
Ironically, I just close IMPALA-9550 yesterday as not reproducible. I will 
reopen the JIRA again.


--
To view, visit http://gerrit.cloudera.org:8080/16884
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I6626c8d54f35f303c01f85be1dd9aa54c8ad9a2d
Gerrit-Change-Number: 16884
Gerrit-PatchSet: 4
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Comment-Date: Tue, 22 Dec 2020 15:46:14 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-9550: Fix flakiness in TestResultSpoolingFetchSize.test fetch

2020-12-22 Thread Riza Suminto (Code Review)
Riza Suminto has uploaded this change for review. ( 
http://gerrit.cloudera.org:8080/16895


Change subject: IMPALA-9550: Fix flakiness in 
TestResultSpoolingFetchSize.test_fetch
..

IMPALA-9550: Fix flakiness in TestResultSpoolingFetchSize.test_fetch

TestResultSpoolingFetchSize.test_fetch has been flaky in
ubuntu-16.04-dockerised environment for not reaching finished state
within 10 seconds. This patch increase the timeout of the test to 30
seconds.

Testing:
- Looped the test locally.

Change-Id: Id2e8a9db904da5f1e4acc9e18b3987b8a4ec24e5
---
M tests/query_test/test_result_spooling.py
1 file changed, 1 insertion(+), 1 deletion(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/95/16895/1
--
To view, visit http://gerrit.cloudera.org:8080/16895
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: Id2e8a9db904da5f1e4acc9e18b3987b8a4ec24e5
Gerrit-Change-Number: 16895
Gerrit-PatchSet: 1
Gerrit-Owner: Riza Suminto 


[Impala-ASF-CR] IMPALA-10374: Limit iteration at BufferedTupleStream::DebugString

2020-12-22 Thread Riza Suminto (Code Review)
Riza Suminto has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16884 )

Change subject: IMPALA-10374: Limit iteration at 
BufferedTupleStream::DebugString
..


Patch Set 5:

Patch set 5 is a rebase on top of IMPALA-9550 fix.
The flakiness should be gone now.


--
To view, visit http://gerrit.cloudera.org:8080/16884
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I6626c8d54f35f303c01f85be1dd9aa54c8ad9a2d
Gerrit-Change-Number: 16884
Gerrit-PatchSet: 5
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Comment-Date: Wed, 23 Dec 2020 01:33:54 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-9865: part 2/2: add verbosity to profile tool

2020-12-21 Thread Riza Suminto (Code Review)
Riza Suminto has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16881 )

Change subject: IMPALA-9865: part 2/2: add verbosity to profile tool
..


Patch Set 6:

(5 comments)

http://gerrit.cloudera.org:8080/#/c/16881/6/be/src/util/impala-profile-tool.cc
File be/src/util/impala-profile-tool.cc:

http://gerrit.cloudera.org:8080/#/c/16881/6/be/src/util/impala-profile-tool.cc@88
PS6, Line 88:   RuntimeProfileBase::Verbosity configured_verbosity;
Is this meant to be initialized as RuntimeProfileBase::Verbosity::LEGACY ?


http://gerrit.cloudera.org:8080/#/c/16881/6/be/src/util/impala-profile-tool.cc@136
PS6, Line 136:   verbosity = profile_version < 2 ? 
RuntimeProfileBase::Verbosity::LEGACY :
In case of v1 profile input and --profile_verbosity=default, this line does not 
seem to override it back to Verbosity::LEGACY.
Does this implies that both v1 and v2 profile input can be parsed into any 
verbosity level?


http://gerrit.cloudera.org:8080/#/c/16881/6/bin/rat_exclude_files.txt
File bin/rat_exclude_files.txt:

http://gerrit.cloudera.org:8080/#/c/16881/6/bin/rat_exclude_files.txt@157
PS6, Line 157: 
testdata/impala-profiles/impala_profile_log_tpcds_compute_stats.expected
Should we add `.txt` suffix for expected output in text format?


http://gerrit.cloudera.org:8080/#/c/16881/6/tests/observability/test_profile_tool.py
File tests/observability/test_profile_tool.py:

http://gerrit.cloudera.org:8080/#/c/16881/6/tests/observability/test_profile_tool.py@36
PS6, Line 36: self._compare_profile_tool_output([],
Since default verbosity might change in the future, can we add a test using 
argument `--profile_verbosity=legacy` ?
Should we also test between v1 vs v2 profile input when --profile_verbosity is 
not specified?


http://gerrit.cloudera.org:8080/#/c/16881/6/tests/observability/test_profile_tool.py@38
PS6, Line 38: 
get_profile_path('impala_profile_log_tpcds_compute_stats.expected'))
Rename expected file name to 
"impala_profile_log_tpcds_compute_stats_legacy.expected" for clarity?



--
To view, visit http://gerrit.cloudera.org:8080/16881
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I82618a813e29af7996dfaed78873b2a73bc0231d
Gerrit-Change-Number: 16881
Gerrit-PatchSet: 6
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Mon, 21 Dec 2020 19:18:57 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-9856: Enable result spooling by default.

2020-11-20 Thread Riza Suminto (Code Review)
Riza Suminto has uploaded this change for review. ( 
http://gerrit.cloudera.org:8080/16755


Change subject: IMPALA-9856: Enable result spooling by default.
..

IMPALA-9856: Enable result spooling by default.

Result spooling has been relatively stable since it was introduced, and
it has several benefits described in IMPALA-8656. This patch enable
result spooling (SPOOL_QUERY_RESULTS) query options by default.

Furthermore, some tests need to be adjusted to account for result
spooling by default. The following are the adjustment categories and
list of tests that fall under such category.

Change in assertions:
PlannerTest
TpcdsPlannerTest
custom_cluster/test_admission_controller.py::TestAdmissionController::test_dedicated_coordinator_planner_estimates
custom_cluster/test_admission_controller.py::TestAdmissionController::test_memory_rejection
custom_cluster/test_admission_controller.py::TestAdmissionController::test_pool_mem_limit_configs
metadata/test_explain.py::TestExplain::test_explain_level2
metadata/test_explain.py::TestExplain::test_explain_level3
metadata/test_stats_extrapolation.py::TestStatsExtrapolation::test_stats_extrapolation

Increase BUFFER_POOL_LIMIT:
query_test/test_queries.py::TestQueries::test_analytic_fns
query_test/test_runtime_filters.py::TestRuntimeRowFilters::test_row_filter_reservation
query_test/test_spilling.py::TestSpillingBroadcastJoins::test_spilling_broadcast_joins
query_test/test_spilling.py::TestSpillingDebugActionDimensions::test_spilling_aggs
query_test/test_udfs.py::TestUdfExecution::test_mem_limits

Increase MEM_LIMIT:
query_test/test_mem_usage_scaling.py::TestScanMemLimit::test_hdfs_scanner_thread_mem_scaling

Increase MAX_ROW_SIZE:
custom_cluster/test_parquet_max_page_header.py::TestParquetMaxPageHeader::test_large_page_header_config
query_test/test_scanners.py::TestTextSplitDelimiters::test_text_split_across_buffers_delimiter

Disable result spooling to maintain assertion:
custom_cluster/test_admission_controller.py::TestAdmissionController::test_set_request_pool
custom_cluster/test_admission_controller.py::TestAdmissionController::test_timeout_reason_host_memory
custom_cluster/test_admission_controller.py::TestAdmissionController::test_timeout_reason_pool_memory
custom_cluster/test_admission_controller.py::TestAdmissionController::test_queue_reasons_memory
custom_cluster/test_query_retries.py::TestQueryRetries::test_retry_fetched_rows
custom_cluster/test_query_retries.py::TestQueryRetries::test_retry_finished_query
custom_cluster/test_scratch_disk.py::TestScratchDir::test_no_dirs
custom_cluster/test_scratch_disk.py::TestScratchDir::test_non_existing_dirs
custom_cluster/test_scratch_disk.py::TestScratchDir::test_non_writable_dirs
query_test/test_kudu.py::TestKuduMemLimits::test_low_mem_limit_low_selectivity_scan
query_test/test_mem_usage_scaling.py::TestScanMemLimit::test_kudu_scan_mem_usage
query_test/test_query_mem_limit.py::TestCodegenMemLimit::test_codegen_mem_limit
query_test/test_query_mem_limit.py::TestQueryMemLimit::test_mem_limit
query_test/test_sort.py::TestQueryFullSort::test_multiple_mem_limits_full_output
query_test/test_spilling.py::TestSpillingDebugActionDimensions::test_spilling_regression_exhaustive
shell/test_shell_client.py::TestShellClient::test_fetch_size

Disable result spooling to avoid crash / silent error:
custom_cluster/test_observability.py::TestObservability::test_host_profile_jvm_gc_metrics
query_test/test_insert.py::TestInsertQueries::test_insert_large_string
query_test/test_queries.py::TestQueriesParquetTables::test_very_large_strings
query_test/test_scanners.py::TestWideRow::test_wide_row
query_test/test_spilling.py::TestSpillingDebugActionDimensions::test_spilling_large_rows
query_test/test_udfs.py::TestUdfExecution::test_java_udfs
query_test/test_udfs.py::TestUdfTargeted::test_udf_profile

Further investigation need to be done to address the last category.

Testing:
- Pass exhaustive tests.

Change-Id: I9e360c1428676d8f3fab5d95efee18aca085eba4
---
M common/thrift/ImpalaInternalService.thrift
M testdata/workloads/functional-planner/queries/PlannerTest/acid-scans.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/bloom-filter-assignment.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/constant-folding.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/fk-pk-join-detection-hdfs-num-rows-est-enabled.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/fk-pk-join-detection.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/kudu-selectivity.test
M testdata/workloads/functional-planner/queries/PlannerTest/max-row-size.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/min-max-runtime-filters-hdfs-num-rows-est-enabled.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/min-max-runtime-filters.test
M 

[Impala-ASF-CR] IMPALA-10337: Consider MAX ROW SIZE when computing max reservation

2020-11-21 Thread Riza Suminto (Code Review)
Riza Suminto has uploaded this change for review. ( 
http://gerrit.cloudera.org:8080/16765


Change subject: IMPALA-10337: Consider MAX_ROW_SIZE when computing max 
reservation
..

IMPALA-10337: Consider MAX_ROW_SIZE when computing max reservation

PlanRootSink can fail silently if result spooling is enabled and
maxMemReservationBytes is less than 2 * MAX_ROW_SIZE. Underneath, a
SpillableRowBatchQueue need 2 buffer (read and write) each to fit at
least MAX_ROW_SIZE bytes. This patch change the ResourceProfile's
maxMemReservationBytes as:

max(minMemReservationBytes, MAX_RESULT_SPOOLING_MEM, 2 * MAX_ROW_SIZE)

Testing:
- Pass exhaustive tests.
- Lower MAX_ROW_SIZE on tests where MAX_RESULT_SPOOLING_MEM is set to
  extremely low value.

Change-Id: Id7138e1e034ea5d1cd15cf8de399690e52a9d726
---
M fe/src/main/java/org/apache/impala/planner/PlanRootSink.java
M tests/custom_cluster/test_query_retries.py
M tests/query_test/test_result_spooling.py
3 files changed, 4 insertions(+), 0 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/65/16765/1
--
To view, visit http://gerrit.cloudera.org:8080/16765
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: Id7138e1e034ea5d1cd15cf8de399690e52a9d726
Gerrit-Change-Number: 16765
Gerrit-PatchSet: 1
Gerrit-Owner: Riza Suminto 


[Impala-ASF-CR] IMPALA-9856: Enable result spooling by default.

2020-11-21 Thread Riza Suminto (Code Review)
Riza Suminto has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16755 )

Change subject: IMPALA-9856: Enable result spooling by default.
..


Patch Set 1:

I file IMPALA-10337 separately to solve the crash / silent error case related 
to large rows. I think that should get in first before this patch.
I will investigate the other one that involve Udf.


--
To view, visit http://gerrit.cloudera.org:8080/16755
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I9e360c1428676d8f3fab5d95efee18aca085eba4
Gerrit-Change-Number: 16755
Gerrit-PatchSet: 1
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Comment-Date: Sat, 21 Nov 2020 16:24:42 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10266: Identify FileSystem type based on the protocol scheme.

2020-10-22 Thread Riza Suminto (Code Review)
Riza Suminto has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16628 )

Change subject: IMPALA-10266: Identify FileSystem type based on the protocol 
scheme.
..


Patch Set 1:

> Patch Set 1:
>
> It looks like there are some more instanceof usages sprinkled around the code:
>
> $ git grep 'instanceof.*FileSy'
> fe/src/main/java/org/apache/impala/analysis/LoadDataStmt.java:  if (!(fs 
> instanceof DistributedFileSystem) && !(fs instanceof S3AFileSystem) &&
> fe/src/main/java/org/apache/impala/analysis/LoadDataStmt.java:  !(fs 
> instanceof AzureBlobFileSystem) &&
> fe/src/main/java/org/apache/impala/analysis/LoadDataStmt.java:  !(fs 
> instanceof SecureAzureBlobFileSystem) &&
> fe/src/main/java/org/apache/impala/analysis/LoadDataStmt.java:  !(fs 
> instanceof AdlFileSystem) &&
> fe/src/main/java/org/apache/impala/analysis/LoadDataStmt.java:  !(fs 
> instanceof OzoneFileSystem)) {
> fe/src/main/java/org/apache/impala/analysis/LoadDataStmt.java:  boolean 
> shouldCheckPerms = !(fs instanceof AdlFileSystem ||
> fe/src/main/java/org/apache/impala/analysis/LoadDataStmt.java:fs 
> instanceof AzureBlobFileSystem || fs instanceof SecureAzureBlobFileSystem);
> fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java:  
> if (FileSystemUtil.getDefaultFileSystem() instanceof DistributedFileSystem) {
> fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java:  
> if (FileSystemUtil.getDefaultFileSystem() instanceof DistributedFileSystem) {
> fe/src/main/java/org/apache/impala/common/FileSystemUtil.java:
> Preconditions.checkState(fs instanceof DistributedFileSystem);
> fe/src/main/java/org/apache/impala/service/JniFrontend.java:  if (!(fs 
> instanceof DistributedFileSystem ||
> fe/src/main/java/org/apache/impala/service/JniFrontend.java:fs 
> instanceof S3AFileSystem ||
> fe/src/main/java/org/apache/impala/service/JniFrontend.java:fs 
> instanceof AzureBlobFileSystem ||
> fe/src/main/java/org/apache/impala/service/JniFrontend.java:fs 
> instanceof SecureAzureBlobFileSystem ||
> fe/src/main/java/org/apache/impala/service/JniFrontend.java:fs 
> instanceof AdlFileSystem)) {

Ack. Will look into these codes.


--
To view, visit http://gerrit.cloudera.org:8080/16628
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I04492326a6e84895eef369fc11a3ec11f1536b6b
Gerrit-Change-Number: 16628
Gerrit-PatchSet: 1
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Thu, 22 Oct 2020 17:47:54 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10266: Identify FileSystem type based on the protocol scheme.

2020-10-22 Thread Riza Suminto (Code Review)
Riza Suminto has uploaded this change for review. ( 
http://gerrit.cloudera.org:8080/16628


Change subject: IMPALA-10266: Identify FileSystem type based on the protocol 
scheme.
..

IMPALA-10266: Identify FileSystem type based on the protocol scheme.

Frontend identifies the type of FileSystem in two ways. The first is
done using the instanceof operator with subclasses of
org.apache.hadoop.fs.FileSystem. The second is by checking the
FileSystem protocol scheme. This patch standardizes the FileSystem
identification based on the scheme only.

Testing:
- Add testSupportStorageIds and testWriteableByImpala in
  FileSystemUtilTest.
- Run and pass core tests.

Change-Id: I04492326a6e84895eef369fc11a3ec11f1536b6b
---
M fe/src/main/java/org/apache/impala/common/FileSystemUtil.java
M fe/src/test/java/org/apache/impala/common/FileSystemUtilTest.java
2 files changed, 141 insertions(+), 36 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/28/16628/1
--
To view, visit http://gerrit.cloudera.org:8080/16628
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I04492326a6e84895eef369fc11a3ec11f1536b6b
Gerrit-Change-Number: 16628
Gerrit-PatchSet: 1
Gerrit-Owner: Riza Suminto 


[Impala-ASF-CR] IMPALA-10266: Identify FileSystem type based on the protocol scheme.

2020-10-22 Thread Riza Suminto (Code Review)
Hello Tim Armstrong, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/16628

to look at the new patch set (#4).

Change subject: IMPALA-10266: Identify FileSystem type based on the protocol 
scheme.
..

IMPALA-10266: Identify FileSystem type based on the protocol scheme.

Frontend identifies the type of FileSystem in two ways. The first is
done using the instanceof operator with subclasses of
org.apache.hadoop.fs.FileSystem. The second is by checking the
FileSystem protocol scheme. This patch standardizes the FileSystem
identification based on the scheme only.

Testing:
- Add several tests in FileSystemUtilTest to check validity of some
  FileSystemUtil functions.
- Run and pass core tests.

Change-Id: I04492326a6e84895eef369fc11a3ec11f1536b6b
---
M fe/src/main/java/org/apache/impala/analysis/LoadDataStmt.java
M fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java
M fe/src/main/java/org/apache/impala/common/FileSystemUtil.java
M fe/src/main/java/org/apache/impala/service/JniFrontend.java
M fe/src/test/java/org/apache/impala/common/FileSystemUtilTest.java
5 files changed, 261 insertions(+), 67 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/28/16628/4
--
To view, visit http://gerrit.cloudera.org:8080/16628
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I04492326a6e84895eef369fc11a3ec11f1536b6b
Gerrit-Change-Number: 16628
Gerrit-PatchSet: 4
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Reviewer: Tim Armstrong 


[Impala-ASF-CR] IMPALA-10266: Identify FileSystem type based on the protocol scheme.

2020-10-22 Thread Riza Suminto (Code Review)
Hello Tim Armstrong, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/16628

to look at the new patch set (#2).

Change subject: IMPALA-10266: Identify FileSystem type based on the protocol 
scheme.
..

IMPALA-10266: Identify FileSystem type based on the protocol scheme.

Frontend identifies the type of FileSystem in two ways. The first is
done using the instanceof operator with subclasses of
org.apache.hadoop.fs.FileSystem. The second is by checking the
FileSystem protocol scheme. This patch standardizes the FileSystem
identification based on the scheme only.

Testing:
- Add several tests in FileSystemUtilTest to check validity of some
  FileSystemUtil functions.
- Run and pass core tests.

Change-Id: I04492326a6e84895eef369fc11a3ec11f1536b6b
---
M fe/src/main/java/org/apache/impala/analysis/LoadDataStmt.java
M fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java
M fe/src/main/java/org/apache/impala/common/FileSystemUtil.java
M fe/src/main/java/org/apache/impala/service/JniFrontend.java
M fe/src/test/java/org/apache/impala/common/FileSystemUtilTest.java
5 files changed, 220 insertions(+), 60 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/28/16628/2
--
To view, visit http://gerrit.cloudera.org:8080/16628
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I04492326a6e84895eef369fc11a3ec11f1536b6b
Gerrit-Change-Number: 16628
Gerrit-PatchSet: 2
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Reviewer: Tim Armstrong 


[Impala-ASF-CR] IMPALA-10266: Identify FileSystem type based on the protocol scheme.

2020-10-22 Thread Riza Suminto (Code Review)
Hello Tim Armstrong, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/16628

to look at the new patch set (#3).

Change subject: IMPALA-10266: Identify FileSystem type based on the protocol 
scheme.
..

IMPALA-10266: Identify FileSystem type based on the protocol scheme.

Frontend identifies the type of FileSystem in two ways. The first is
done using the instanceof operator with subclasses of
org.apache.hadoop.fs.FileSystem. The second is by checking the
FileSystem protocol scheme. This patch standardizes the FileSystem
identification based on the scheme only.

Testing:
- Add several tests in FileSystemUtilTest to check validity of some
  FileSystemUtil functions.
- Run and pass core tests.

Change-Id: I04492326a6e84895eef369fc11a3ec11f1536b6b
---
M fe/src/main/java/org/apache/impala/analysis/LoadDataStmt.java
M fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java
M fe/src/main/java/org/apache/impala/common/FileSystemUtil.java
M fe/src/main/java/org/apache/impala/service/JniFrontend.java
M fe/src/test/java/org/apache/impala/common/FileSystemUtilTest.java
5 files changed, 222 insertions(+), 65 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/28/16628/3
--
To view, visit http://gerrit.cloudera.org:8080/16628
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I04492326a6e84895eef369fc11a3ec11f1536b6b
Gerrit-Change-Number: 16628
Gerrit-PatchSet: 3
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Reviewer: Tim Armstrong 


[Impala-ASF-CR] IMPALA-10266: Identify FileSystem type based on the protocol scheme.

2020-10-22 Thread Riza Suminto (Code Review)
Riza Suminto has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16628 )

Change subject: IMPALA-10266: Identify FileSystem type based on the protocol 
scheme.
..


Patch Set 4:

(5 comments)

http://gerrit.cloudera.org:8080/#/c/16628/3/fe/src/main/java/org/apache/impala/analysis/LoadDataStmt.java
File fe/src/main/java/org/apache/impala/analysis/LoadDataStmt.java:

http://gerrit.cloudera.org:8080/#/c/16628/3/fe/src/main/java/org/apache/impala/analysis/LoadDataStmt.java@63
PS3, Line 63: Preconditions.checkNotNull(tableName);
> Maybe move this to FileSystemUtil? That way all these lists of capabilities
Done


http://gerrit.cloudera.org:8080/#/c/16628/3/fe/src/main/java/org/apache/impala/analysis/LoadDataStmt.java@161
PS3, Line 161:
> Can we generate this error message from the list of supported schemes? I th
Done


http://gerrit.cloudera.org:8080/#/c/16628/3/fe/src/main/java/org/apache/impala/common/FileSystemUtil.java
File fe/src/main/java/org/apache/impala/common/FileSystemUtil.java:

http://gerrit.cloudera.org:8080/#/c/16628/3/fe/src/main/java/org/apache/impala/common/FileSystemUtil.java@72
PS3, Line 72:   ImmutableSet.builder()
> I like this pattern!
Agree, I feel it is easier to understand by grouping them like this.


http://gerrit.cloudera.org:8080/#/c/16628/2/fe/src/test/java/org/apache/impala/common/FileSystemUtilTest.java
File fe/src/test/java/org/apache/impala/common/FileSystemUtilTest.java:

http://gerrit.cloudera.org:8080/#/c/16628/2/fe/src/test/java/org/apache/impala/common/FileSystemUtilTest.java@100
PS2, Line 100: testFsType(
> line too long (91 > 90)
Done


http://gerrit.cloudera.org:8080/#/c/16628/2/fe/src/test/java/org/apache/impala/common/FileSystemUtilTest.java@186
PS2, Line 186: return 
"adl://dummy-account.azuredatalakestore.net/dummy-part-3";
> line too long (91 > 90)
Done



--
To view, visit http://gerrit.cloudera.org:8080/16628
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I04492326a6e84895eef369fc11a3ec11f1536b6b
Gerrit-Change-Number: 16628
Gerrit-PatchSet: 4
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Thu, 22 Oct 2020 21:13:03 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-10266: Identify FileSystem type based on the protocol scheme.

2020-10-22 Thread Riza Suminto (Code Review)
Hello Tim Armstrong, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/16628

to look at the new patch set (#5).

Change subject: IMPALA-10266: Identify FileSystem type based on the protocol 
scheme.
..

IMPALA-10266: Identify FileSystem type based on the protocol scheme.

Frontend identifies the type of FileSystem in two ways. The first is
done using the instanceof operator with subclasses of
org.apache.hadoop.fs.FileSystem. The second is by checking the
FileSystem protocol scheme. This patch standardizes the FileSystem
identification based on the scheme only.

Testing:
- Add several tests in FileSystemUtilTest to check validity of some
  FileSystemUtil functions.
- Run and pass core tests.

Change-Id: I04492326a6e84895eef369fc11a3ec11f1536b6b
---
M fe/src/main/java/org/apache/impala/analysis/LoadDataStmt.java
M fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java
M fe/src/main/java/org/apache/impala/common/FileSystemUtil.java
M fe/src/main/java/org/apache/impala/service/JniFrontend.java
M fe/src/test/java/org/apache/impala/common/FileSystemUtilTest.java
5 files changed, 261 insertions(+), 67 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/28/16628/5
--
To view, visit http://gerrit.cloudera.org:8080/16628
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I04492326a6e84895eef369fc11a3ec11f1536b6b
Gerrit-Change-Number: 16628
Gerrit-PatchSet: 5
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Reviewer: Tim Armstrong 


[Impala-ASF-CR] IMPALA-10147: Avoid getting a file handle for data cache hits

2021-01-21 Thread Riza Suminto (Code Review)
Riza Suminto has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16963 )

Change subject: IMPALA-10147: Avoid getting a file handle for data cache hits
..


Patch Set 2:

(2 comments)

Thank you for the review.
Patch set 2 move the handle variable and scope exit trigger closer around the 
file handle retrieval.

http://gerrit.cloudera.org:8080/#/c/16963/1/be/src/runtime/io/hdfs-file-reader.cc
File be/src/runtime/io/hdfs-file-reader.cc:

http://gerrit.cloudera.org:8080/#/c/16963/1/be/src/runtime/io/hdfs-file-reader.cc@99
PS1, Line 99: Status status = Status::OK();
:   {
: ScopedTimer req_context_read_timer(
: scan_range_->reader_->read_timer_);
: bool logged_slow_read = false; // True if we already logged 
about a slow read.
:
: // Try reading from the remote data cache if it's enabled for 
the scan range.
: DataCache* remote_data_cache = io_mgr->remote_data_cache();
: bool try_data_cache = scan_range_->UseDataCache() && 
remote_data_cache != nullptr;
: i
> Nit: Since these are now initialized later in the function, I would also mo
Done


http://gerrit.cloudera.org:8080/#/c/16963/1/be/src/runtime/io/hdfs-file-reader.cc@126
PS1, Line 126: hdfsFile hdfs_file;
> For the clang-tidy issue you are seeing here:
Thank you. I didn't read thoroughly to realize that the logic later won't 
execute if all bytes successfully read from data cache.



--
To view, visit http://gerrit.cloudera.org:8080/16963
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Icc68f233518f862454e87bcbbef14d65fcdb7c91
Gerrit-Change-Number: 16963
Gerrit-PatchSet: 2
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Thu, 21 Jan 2021 17:13:52 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-10147: Avoid getting a file handle for data cache hits

2021-01-21 Thread Riza Suminto (Code Review)
Hello Zoltan Borok-Nagy, Joe McDonnell, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/16963

to look at the new patch set (#2).

Change subject: IMPALA-10147: Avoid getting a file handle for data cache hits
..

IMPALA-10147: Avoid getting a file handle for data cache hits

When reading from the data cache, the disk IO thread first gets a file
handle, then it checks the data cache for a hit. The file handle is only
used if there is a data cache miss. It is not used when data cache hit
and in turns becomes an overhead. This patch move the file handle
retrieval later when data cache miss hapens.

Testing:
- Add custom cluster test test_no_fd_caching_on_cached_data.
- Pass core tests.

Change-Id: Icc68f233518f862454e87bcbbef14d65fcdb7c91
---
M be/src/runtime/io/hdfs-file-reader.cc
M tests/custom_cluster/test_hdfs_fd_caching.py
2 files changed, 71 insertions(+), 21 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/63/16963/2
--
To view, visit http://gerrit.cloudera.org:8080/16963
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Icc68f233518f862454e87bcbbef14d65fcdb7c91
Gerrit-Change-Number: 16963
Gerrit-PatchSet: 2
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Reviewer: Zoltan Borok-Nagy 


[Impala-ASF-CR] IMPALA-10147: Avoid getting a file handle for data cache hits

2021-01-19 Thread Riza Suminto (Code Review)
Riza Suminto has uploaded this change for review. ( 
http://gerrit.cloudera.org:8080/16963


Change subject: IMPALA-10147: Avoid getting a file handle for data cache hits
..

IMPALA-10147: Avoid getting a file handle for data cache hits

When reading from the data cache, the disk IO thread first gets a file
handle, then it checks the data cache for a hit. The file handle is only
used if there is a data cache miss. It is not used when data cache hit
and in turns becomes an overhead. This patch move the file handle
retrieval later when data cache miss hapens.

Testing:
- Add custom cluster test test_no_fd_caching_on_cached_data.
- Pass core tests.

Change-Id: Icc68f233518f862454e87bcbbef14d65fcdb7c91
---
M be/src/runtime/io/hdfs-file-reader.cc
M tests/custom_cluster/test_hdfs_fd_caching.py
2 files changed, 59 insertions(+), 10 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/63/16963/1
--
To view, visit http://gerrit.cloudera.org:8080/16963
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: Icc68f233518f862454e87bcbbef14d65fcdb7c91
Gerrit-Change-Number: 16963
Gerrit-PatchSet: 1
Gerrit-Owner: Riza Suminto 


[Impala-ASF-CR] IMPALA-9355: TestExchangeMemUsage.test exchange mem usage scaling doesn't hit the memory limit

2021-06-11 Thread Riza Suminto (Code Review)
Riza Suminto has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17586 )

Change subject: IMPALA-9355: 
TestExchangeMemUsage.test_exchange_mem_usage_scaling doesn't hit the memory 
limit
..


Patch Set 1: Code-Review+1


--
To view, visit http://gerrit.cloudera.org:8080/17586
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ia4ad4508028645b67de419cfdfa2327d2847cfc4
Gerrit-Change-Number: 17586
Gerrit-PatchSet: 1
Gerrit-Owner: Qifan Chen 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Reviewer: Wenzhe Zhou 
Gerrit-Comment-Date: Fri, 11 Jun 2021 23:22:44 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10650: Bailout min/max filters in hash join builder early

2021-05-18 Thread Riza Suminto (Code Review)
Riza Suminto has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17295 )

Change subject: IMPALA-10650: Bailout min/max filters in hash join builder early
..


Patch Set 28:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/17295/27/be/src/exec/partitioned-hash-join-builder.cc
File be/src/exec/partitioned-hash-join-builder.cc:

http://gerrit.cloudera.org:8080/#/c/17295/27/be/src/exec/partitioned-hash-join-builder.cc@337
PS27, Line 337:   for (std::vector::const_iterator it = 
minmax_filter_ctxs_.begin();
> Sounds like a good idea.
Looks good, thanks!


http://gerrit.cloudera.org:8080/#/c/17295/27/be/src/exec/partitioned-hash-join-builder.cc@404
PS27, Line 404:   }
> Partial filter publishing with propagation to scan nodes may be a little bi
Make sense. I suppose execution nodes consuming the minmax filters also still 
need to wait for the remaining filters to arrive before start reading.


http://gerrit.cloudera.org:8080/#/c/17295/28/be/src/exec/partitioned-hash-join-builder.cc
File be/src/exec/partitioned-hash-join-builder.cc:

http://gerrit.cloudera.org:8080/#/c/17295/28/be/src/exec/partitioned-hash-join-builder.cc@335
PS28, Line 335:   if (filter_ctxs_.size() == 0) return;
Looks like we can remove this branch?
In case minmax_filter_ctxs_ is empty, the loop below will stop immediately.



--
To view, visit http://gerrit.cloudera.org:8080/17295
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I193646e7acfdd3023f7c947d8107da58a1f41183
Gerrit-Change-Number: 17295
Gerrit-PatchSet: 28
Gerrit-Owner: Qifan Chen 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Reviewer: Wenzhe Zhou 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Tue, 18 May 2021 18:09:21 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-10650: Bailout min/max filters in hash join builder early

2021-05-17 Thread Riza Suminto (Code Review)
Riza Suminto has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17295 )

Change subject: IMPALA-10650: Bailout min/max filters in hash join builder early
..


Patch Set 27:

(2 comments)

Hi Qifan,
I wonder if we can improve the minmax filter performance from the build side.
I have the following questions and comments.

http://gerrit.cloudera.org:8080/#/c/17295/27/be/src/exec/partitioned-hash-join-builder.cc
File be/src/exec/partitioned-hash-join-builder.cc:

http://gerrit.cloudera.org:8080/#/c/17295/27/be/src/exec/partitioned-hash-join-builder.cc@337
PS27, Line 337:   for (const FilterContext& ctx : filter_ctxs_) {
I wonder if we can speed this up by iterating ONLY the minmax filters.
Maybe copy reference of the minmax filters into separate vector?

This function seems to be called frequently on every PhjBuilder::AddBatch.
I imagine if minmax filter is enabled, only half of filter_ctxs_ elements are 
actually minmax filter.
We can also pop filter out of the vector once it deemed not useful, therefore 
speeding up the next iteration.


http://gerrit.cloudera.org:8080/#/c/17295/27/be/src/exec/partitioned-hash-join-builder.cc@404
PS27, Line 404: PublishRuntimeFilters(num_build_rows);
It seems to me that PublishRuntimeFilters is only called here in FinalizeBuild 
(I assume near the end of the build process).
Since minmax filter can be quickly disabled after reading few early RowBatch, 
shall we consider to publish them as soon as possible?
Say, immediately publish disabled minmax filter from 
PhjBuilder::DetermineUsefulnessForMinmaxFilters()?



--
To view, visit http://gerrit.cloudera.org:8080/17295
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I193646e7acfdd3023f7c947d8107da58a1f41183
Gerrit-Change-Number: 17295
Gerrit-PatchSet: 27
Gerrit-Owner: Qifan Chen 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Reviewer: Wenzhe Zhou 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Tue, 18 May 2021 02:52:50 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-10584: Defer advancing read page if stream only has 2 pages.

2021-04-22 Thread Riza Suminto (Code Review)
Riza Suminto has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17195 )

Change subject: IMPALA-10584: Defer advancing read page if stream only has 2 
pages.
..


Patch Set 7:

> Patch Set 7: Verified-1
>
> Build failed: https://jenkins.impala.io/job/gerrit-verify-dryrun/7091/

Looks like one of the maven dependency download failed

08:31:54 08:31:54 [ERROR] Failed to execute goal on project 
impala-executor-deps: Could not resolve dependencies for project 
org.apache.impala:impala-executor-deps:pom:4.0.0-SNAPSHOT: Failed to collect 
dependencies at com.google.cloud.bigdataoss:gcs-connector:jar:2.1.2.7.2.9.0-146 
-> com.google.cloud.bigdataoss:util:jar:2.1.2.7.2.9.0-146: Failed to read 
artifact descriptor for com.google.cloud.bigdataoss:util:jar:2.1.2.7.2.9.0-146: 
Could not transfer artifact 
com.google.cloud.bigdataoss:util:pom:2.1.2.7.2.9.0-146 from/to impala.cdp.repo 
(https://native-toolchain.s3.amazonaws.com/build/cdp_components/11920537/maven):
 Connection reset -> [Help 1]


--
To view, visit http://gerrit.cloudera.org:8080/17195
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I16137b6e423f190f60c3115a06ccd0f77e9f585a
Gerrit-Change-Number: 17195
Gerrit-PatchSet: 7
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Comment-Date: Thu, 22 Apr 2021 15:58:52 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10497: Fix flakiness in test no fd caching on cached data.

2021-02-09 Thread Riza Suminto (Code Review)
Riza Suminto has uploaded this change for review. ( 
http://gerrit.cloudera.org:8080/17054


Change subject: IMPALA-10497: Fix flakiness in 
test_no_fd_caching_on_cached_data.
..

IMPALA-10497: Fix flakiness in test_no_fd_caching_on_cached_data.

test_no_fd_caching_on_cached_data has been flaky for not having all of
the data fully cached in warm up phase. This patch fix the test by:

1. Reduce the number of rows written to table cachefd.simple.
2. Repeat the warm up query 5 times.
3. Lower the cluster size to 1.

Testing:
- Loop the test manually 100 times and see no more failures.

Change-Id: I774f9dfea7dcc107c3c7f2b76db3aaf4b2dd7952
---
M tests/custom_cluster/test_hdfs_fd_caching.py
1 file changed, 31 insertions(+), 20 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/54/17054/1
--
To view, visit http://gerrit.cloudera.org:8080/17054
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I774f9dfea7dcc107c3c7f2b76db3aaf4b2dd7952
Gerrit-Change-Number: 17054
Gerrit-PatchSet: 1
Gerrit-Owner: Riza Suminto 


[Impala-ASF-CR] IMPALA-10497: Fix flakiness in test no fd caching on cached data.

2021-02-10 Thread Riza Suminto (Code Review)
Riza Suminto has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17054 )

Change subject: IMPALA-10497: Fix flakiness in 
test_no_fd_caching_on_cached_data.
..


Patch Set 1:

(1 comment)

Thanks Joe for the feedback.
I will revert most of these and leave only the extra warm up part.

http://gerrit.cloudera.org:8080/#/c/17054/1/tests/custom_cluster/test_hdfs_fd_caching.py
File tests/custom_cluster/test_hdfs_fd_caching.py:

http://gerrit.cloudera.org:8080/#/c/17054/1/tests/custom_cluster/test_hdfs_fd_caching.py@190
PS1, Line 190: cluster_size=1
> I think this should not impact the caching. Our scheduling algorithm maps f
I will remove this so the cluster size back to default.
Previously, when I kept cluster size as default (3), after several iteration of 
looping this test, one impalad failed with the following error.

I0210 17:50:00.781529 10340 mem_tracker.cc:83] Creating tracker 
/tmp-sharded_lru_cache->root
I0210 17:50:00.782321 10340 status.cc:129] Failed to delete old cache file 
/tmp/impala-cache-file-fc4060e3ab7e45b0:a680f0d3b05005a1: Not found: 
/tmp/impala-cache-file-fc4060e3ab7e45b0:a680f0d3b05005a1: No such file or 
directory (error 2)
@  0x2c008d7  _ZN6impala13GetStackTraceB5cxx11Ev
@  0x1c9003e  impala::Status::Status()
@  0x36016f9  
impala::io::DataCache::Partition::DeleteExistingFiles()
@  0x3601e22  impala::io::DataCache::Partition::Init()
@  0x360975e  impala::io::DataCache::Init()
@  0x35b4db0  impala::io::DiskIoMgr::Init()
@  0x251b229  impala::ExecEnv::Init()
@  0x29a0412  ImpaladMain()
@  0x1bdf4d3  main
@ 0x7faa8e3bebf7  __libc_start_main
@  0x1ae81da  _start
I0210 17:50:00.897989 10340 mem_tracker.cc:87] Destroying tracker 
/tmp-sharded_lru_cache->root

I was guessing that since all three write their data cache to common /tmp/ dir, 
one might accidentally delete cache file that belong to other impalad. Just 
running the test one time doesn't reveal this error.
So in this patchset 1, I follow examples from custom_cluster/test_data_cache.py 
to set custom_cluster=1.



--
To view, visit http://gerrit.cloudera.org:8080/17054
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I774f9dfea7dcc107c3c7f2b76db3aaf4b2dd7952
Gerrit-Change-Number: 17054
Gerrit-PatchSet: 1
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Comment-Date: Thu, 11 Feb 2021 02:18:19 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-10497: Fix flakiness in test no fd caching on cached data.

2021-02-10 Thread Riza Suminto (Code Review)
Hello Joe McDonnell, Bikramjeet Vig, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/17054

to look at the new patch set (#2).

Change subject: IMPALA-10497: Fix flakiness in 
test_no_fd_caching_on_cached_data.
..

IMPALA-10497: Fix flakiness in test_no_fd_caching_on_cached_data.

test_no_fd_caching_on_cached_data has been flaky for not having all of
the data fully cached in the warm-up phase. There is a limit on
concurrency in writing to the cache such that we may fail to cache data
the first time read it. This patch fixes the test by repeating the
warm-up query 5 times. This patch also reduce the test cluster size to
just 1 to avoid potential flakiness from multiple impalad racing to
delete the same old fata cache files.

Testing:
- Loop the test manually 100 times and see no more failures.

Change-Id: I774f9dfea7dcc107c3c7f2b76db3aaf4b2dd7952
---
M tests/custom_cluster/test_hdfs_fd_caching.py
1 file changed, 7 insertions(+), 5 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/54/17054/2
--
To view, visit http://gerrit.cloudera.org:8080/17054
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I774f9dfea7dcc107c3c7f2b76db3aaf4b2dd7952
Gerrit-Change-Number: 17054
Gerrit-PatchSet: 2
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Riza Suminto 


[Impala-ASF-CR] IMPALA-10497: Fix flakiness in test no fd caching on cached data.

2021-02-10 Thread Riza Suminto (Code Review)
Riza Suminto has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17054 )

Change subject: IMPALA-10497: Fix flakiness in 
test_no_fd_caching_on_cached_data.
..


Patch Set 2:

(5 comments)

http://gerrit.cloudera.org:8080/#/c/17054/1/tests/custom_cluster/test_hdfs_fd_caching.py
File tests/custom_cluster/test_hdfs_fd_caching.py:

http://gerrit.cloudera.org:8080/#/c/17054/1/tests/custom_cluster/test_hdfs_fd_caching.py@179
PS1, Line 179: le handles cached
> One thing to watch out for here is if you inherit from TestHdfsFdCaching, y
Done


http://gerrit.cloudera.org:8080/#/c/17054/1/tests/custom_cluster/test_hdfs_fd_caching.py@184
PS1, Line 184: # handle ca
> I believe that the number of rows should not impact the behavior. The diffe
Done


http://gerrit.cloudera.org:8080/#/c/17054/1/tests/custom_cluster/test_hdfs_fd_caching.py@190
PS1, Line 190:
> I will remove this so the cluster size back to default.
I change my mind, I think we should keep cluster_size=1 here.

On start up, impalad delete the old cache file by listing file matching 
CACHE_FILE_PREFIX and delete them one by one.
https://github.com/apache/impala/blob/b42c649/be/src/runtime/io/data-cache.cc#L415
So if cluster_size > 1, multiple impalad might race to delete the same old 
cache file and crash the losing ones.

Also, setting cluster_size=1 does not change the behavior being tested by this 
test, so I think it should be okay.


http://gerrit.cloudera.org:8080/#/c/17054/1/tests/custom_cluster/test_hdfs_fd_caching.py@207
PS1, Line 207:
 :   def cached_handles(self):
 : return self.get_ag
> This makes a lot of sense to me as the explanation. There are times when da
Done


http://gerrit.cloudera.org:8080/#/c/17054/1/tests/custom_cluster/test_hdfs_fd_caching.py@224
PS1, Line 224:
> I'm fine with either value here, but I also don't see this changing the beh
Done



--
To view, visit http://gerrit.cloudera.org:8080/17054
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I774f9dfea7dcc107c3c7f2b76db3aaf4b2dd7952
Gerrit-Change-Number: 17054
Gerrit-PatchSet: 2
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Comment-Date: Thu, 11 Feb 2021 03:09:44 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-10565: Adjust result spooling memory based on scratch limit

2021-03-12 Thread Riza Suminto (Code Review)
Hello Aman Sinha, Bikramjeet Vig, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/17166

to look at the new patch set (#4).

Change subject: IMPALA-10565: Adjust result spooling memory based on 
scratch_limit
..

IMPALA-10565: Adjust result spooling memory based on scratch_limit

IMPALA-9856 enables result spooling by default. Result spooling depends
on the ability to spill its entire BufferedTupleStream to disk once it
hits maximum memory reservation. However, if the query option
scratch_limit is set lower than max_spilled_result_spooling_mem, the
query might fail in the middle of execution due to insufficient scratch
space. This patch adds planner change to consider scratch_limit and
scratch_dirs query option when computing resource used by result
spooling. The algorithm is as follow:

* If scratch_dirs is empty or scratch_limit < minMemReservationBytes
  required to use BufferedPlanRootSink, we set spool_query_results to
  false and fallback to use BlockingPlanRootSink.

* If scratch_limit > minMemReservationBytes but still fairly low, we
  lower the max_result_spooling_mem (default is 100MB) and
  max_spilled_result_spooling_mem (default is 1GB) to fit scratch_limit.

* if scratch_limit > max_spilled_result_spooling_mem, do nothing.

Testing:
- Add TestScratchLimit::test_result_spooling_and_varying_scratch_limit
- Verify that spool_query_results query option is disabled in
  TestScratchDir::test_no_dirs
- Pass exhaustive tests.

Change-Id: I541f46e6911694e14c0fc25be1a6982fd929d3a9
---
M be/src/service/query-options-test.cc
M be/src/util/backend-gflag-util.cc
M common/thrift/BackendGflags.thrift
M fe/src/main/java/org/apache/impala/planner/PlanRootSink.java
M fe/src/main/java/org/apache/impala/service/BackendConfig.java
A testdata/workloads/functional-query/queries/QueryTest/scratch-limit.test
M tests/custom_cluster/test_scratch_disk.py
M tests/query_test/test_scratch_limit.py
8 files changed, 140 insertions(+), 6 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/66/17166/4
--
To view, visit http://gerrit.cloudera.org:8080/17166
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I541f46e6911694e14c0fc25be1a6982fd929d3a9
Gerrit-Change-Number: 17166
Gerrit-PatchSet: 4
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Riza Suminto 


[Impala-ASF-CR] IMPALA-10565: Adjust result spooling memory based on scratch limit

2021-03-12 Thread Riza Suminto (Code Review)
Riza Suminto has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17166 )

Change subject: IMPALA-10565: Adjust result spooling memory based on 
scratch_limit
..


Patch Set 4:

(8 comments)

My exhaustive test run last night reveal that scratch_limit might still get 
violated if maxMemReservationBytes is equal to scratch_limit. This is because 
the content of SpillableRowBatchQueue can be slightly higher than 
maxMemReservationBytes when it decide to spill.

To anticipate that, I lower the spooling mem config a little further here in 
Patch Set 4.

http://gerrit.cloudera.org:8080/#/c/17166/3/be/src/service/query-options.cc
File be/src/service/query-options.cc:

http://gerrit.cloudera.org:8080/#/c/17166/3/be/src/service/query-options.cc@1104
PS3, Line 1104:   // max_spilled_result_spooling_mem (a value of 0 means memory 
is unbounded).
> I just figured out in ParseUtil::ParseMemSpec() that -1 for memory query op
Done


http://gerrit.cloudera.org:8080/#/c/17166/3/fe/src/main/java/org/apache/impala/planner/PlanRootSink.java
File fe/src/main/java/org/apache/impala/planner/PlanRootSink.java:

http://gerrit.cloudera.org:8080/#/c/17166/3/fe/src/main/java/org/apache/impala/planner/PlanRootSink.java@77
PS3, Line 77:* If SPOOL_QUERY_RESULTS is true, then the ResourceProfile 
sets a min/max resevation,
> Some of the method level comment should be updated to reflect the behavior
Done


http://gerrit.cloudera.org:8080/#/c/17166/3/fe/src/main/java/org/apache/impala/planner/PlanRootSink.java@92
PS3, Line 92:
> nit: typo ?
Done


http://gerrit.cloudera.org:8080/#/c/17166/3/fe/src/main/java/org/apache/impala/planner/PlanRootSink.java@110
PS3, Line 110:   long bufferSize = 
queryOptions.getDefault_spillable_buffer_size();
 :   long maxRowBufferSize = 
PlanNode.computeMaxSpillableBufferSize(
> It sounds like an existing bug.  If you can create a test case for it can y
I filed IMPALA-10583. Will work on that next.


http://gerrit.cloudera.org:8080/#/c/17166/3/fe/src/main/java/org/apache/impala/planner/PlanRootSink.java@126
PS3, Line 126:
> Suggest rewording:  'to >=' minMemReservationBytes
Done


http://gerrit.cloudera.org:8080/#/c/17166/3/fe/src/main/java/org/apache/impala/planner/PlanRootSink.java@126
PS3, Line 126:
> nit: 'increasing'
Done


http://gerrit.cloudera.org:8080/#/c/17166/3/fe/src/main/java/org/apache/impala/planner/PlanRootSink.java@142
PS3, Line 142:   maxMemReservationBytes = scratchLimit - 
maxRowBufferSize;
> Would be useful to add a trace level log message here as well.
Done


http://gerrit.cloudera.org:8080/#/c/17166/3/testdata/workloads/functional-query/queries/QueryTest/scratch-limit.test
File testdata/workloads/functional-query/queries/QueryTest/scratch-limit.test:

http://gerrit.cloudera.org:8080/#/c/17166/3/testdata/workloads/functional-query/queries/QueryTest/scratch-limit.test@2
PS3, Line 2:  QUERY
> Could you add 1 tests with empty scratch dirs ?
Since scratch_dirs is a backend flag, I piggy back the test under 
TestScratchDir::test_no_dirs



--
To view, visit http://gerrit.cloudera.org:8080/17166
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I541f46e6911694e14c0fc25be1a6982fd929d3a9
Gerrit-Change-Number: 17166
Gerrit-PatchSet: 4
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Comment-Date: Fri, 12 Mar 2021 20:00:15 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-10565: Adjust result spooling memory based on scratch limit

2021-03-11 Thread Riza Suminto (Code Review)
Hello Aman Sinha, Bikramjeet Vig, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/17166

to look at the new patch set (#3).

Change subject: IMPALA-10565: Adjust result spooling memory based on 
scratch_limit
..

IMPALA-10565: Adjust result spooling memory based on scratch_limit

IMPALA-9856 enables result spooling by default. Result spooling depends
on the ability to spill its entire BufferedTupleStream to disk once it
hits maximum memory reservation. However, if the query option
scratch_limit is set lower than max_spilled_result_spooling_mem, the
query might fail in the middle of execution due to insufficient scratch
space. This patch adds planner change to consider scratch_limit and
scratch_dirs query option when computing resource used by result
spooling. The algorithm is as follow:

* If scratch_dirs is empty or scratch_limit < minMemReservationBytes
  required to use BufferedPlanRootSink, we set spool_query_results to
  false and fallback to use BlockingPlanRootSink.

* If scratch_limit > minMemReservationBytes but still fairly low, we
  lower the max_result_spooling_mem (default is 100MB) and
  max_spilled_result_spooling_mem (default is 1GB) to fit scratch_limit.

* if scratch_limit > max_spilled_result_spooling_mem, do nothing.

This patch also fix validation between max_result_spooling_mem and
max_spilled_result_spooling_mem that should treat both value 0 and -1 as
unbounded.

Testing:
- Add TestScratchLimit::test_result_spooling_and_varying_scratch_limit
- Remove spool_query_results query option in
  custom_cluster/test_scratch_disk.py
- Change be test QueryOptions.ResultSpooling to also test -1 as
  unbounded.
- Pass exhaustive tests.

Change-Id: I541f46e6911694e14c0fc25be1a6982fd929d3a9
---
M be/src/service/query-options-test.cc
M be/src/service/query-options.cc
M be/src/util/backend-gflag-util.cc
M common/thrift/BackendGflags.thrift
M fe/src/main/java/org/apache/impala/planner/PlanRootSink.java
M fe/src/main/java/org/apache/impala/service/BackendConfig.java
A testdata/workloads/functional-query/queries/QueryTest/scratch-limit.test
M tests/custom_cluster/test_scratch_disk.py
M tests/query_test/test_scratch_limit.py
9 files changed, 140 insertions(+), 12 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/66/17166/3
-- 
To view, visit http://gerrit.cloudera.org:8080/17166
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I541f46e6911694e14c0fc25be1a6982fd929d3a9
Gerrit-Change-Number: 17166
Gerrit-PatchSet: 3
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Riza Suminto 


[Impala-ASF-CR] IMPALA-10565: Adjust result spooling memory based on scratch limit

2021-03-11 Thread Riza Suminto (Code Review)
Riza Suminto has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17166 )

Change subject: IMPALA-10565: Adjust result spooling memory based on 
scratch_limit
..


Patch Set 3:

Following up on private discussion, we decide that it is better to override 
result spooling configuration if scratch_limit is too low.

Patch set 3 implement this strategy. I will rerun an exhaustive tests over this 
patch set later tonight.


--
To view, visit http://gerrit.cloudera.org:8080/17166
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I541f46e6911694e14c0fc25be1a6982fd929d3a9
Gerrit-Change-Number: 17166
Gerrit-PatchSet: 3
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Comment-Date: Fri, 12 Mar 2021 00:00:47 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10565: Adjust result spooling memory based on scratch limit

2021-03-12 Thread Riza Suminto (Code Review)
Riza Suminto has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17166 )

Change subject: IMPALA-10565: Adjust result spooling memory based on 
scratch_limit
..


Patch Set 5:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/17166/4/fe/src/main/java/org/apache/impala/planner/PlanRootSink.java
File fe/src/main/java/org/apache/impala/planner/PlanRootSink.java:

http://gerrit.cloudera.org:8080/#/c/17166/4/fe/src/main/java/org/apache/impala/planner/PlanRootSink.java@120
PS4, Line 120:
> ACK.
Done


http://gerrit.cloudera.org:8080/#/c/17166/4/fe/src/main/java/org/apache/impala/planner/PlanRootSink.java@142
PS4, Line 142:   queryOptions.setSpool_query_results(false);
> Yeah, that will make the code cleaner.  Agree about the first point about u
Done



--
To view, visit http://gerrit.cloudera.org:8080/17166
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I541f46e6911694e14c0fc25be1a6982fd929d3a9
Gerrit-Change-Number: 17166
Gerrit-PatchSet: 5
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Comment-Date: Sat, 13 Mar 2021 03:59:21 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-10565: Adjust result spooling memory based on scratch limit

2021-03-12 Thread Riza Suminto (Code Review)
Hello Aman Sinha, Bikramjeet Vig, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/17166

to look at the new patch set (#5).

Change subject: IMPALA-10565: Adjust result spooling memory based on 
scratch_limit
..

IMPALA-10565: Adjust result spooling memory based on scratch_limit

IMPALA-9856 enables result spooling by default. Result spooling depends
on the ability to spill its entire BufferedTupleStream to disk once it
hits maximum memory reservation. However, if the query option
scratch_limit is set lower than max_spilled_result_spooling_mem, the
query might fail in the middle of execution due to insufficient scratch
space. This patch adds planner change to consider scratch_limit and
scratch_dirs query option when computing resource used by result
spooling. The algorithm is as follow:

* If scratch_dirs is empty or scratch_limit < minMemReservationBytes
  required to use BufferedPlanRootSink, we set spool_query_results to
  false and fallback to use BlockingPlanRootSink.

* If scratch_limit > minMemReservationBytes but still fairly low, we
  lower the max_result_spooling_mem (default is 100MB) and
  max_spilled_result_spooling_mem (default is 1GB) to fit scratch_limit.

* if scratch_limit > max_spilled_result_spooling_mem, do nothing.

Testing:
- Add TestScratchLimit::test_result_spooling_and_varying_scratch_limit
- Verify that spool_query_results query option is disabled in
  TestScratchDir::test_no_dirs
- Pass exhaustive tests.

Change-Id: I541f46e6911694e14c0fc25be1a6982fd929d3a9
---
M be/src/service/query-options-test.cc
M be/src/util/backend-gflag-util.cc
M common/thrift/BackendGflags.thrift
M fe/src/main/java/org/apache/impala/planner/PlanRootSink.java
M fe/src/main/java/org/apache/impala/service/BackendConfig.java
A testdata/workloads/functional-query/queries/QueryTest/scratch-limit.test
M tests/custom_cluster/test_scratch_disk.py
M tests/query_test/test_scratch_limit.py
8 files changed, 143 insertions(+), 6 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/66/17166/5
--
To view, visit http://gerrit.cloudera.org:8080/17166
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I541f46e6911694e14c0fc25be1a6982fd929d3a9
Gerrit-Change-Number: 17166
Gerrit-PatchSet: 5
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Riza Suminto 


[Impala-ASF-CR] IMPALA-10565: Adjust result spooling memory based on scratch limit

2021-03-12 Thread Riza Suminto (Code Review)
Hello Aman Sinha, Bikramjeet Vig, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/17166

to look at the new patch set (#6).

Change subject: IMPALA-10565: Adjust result spooling memory based on 
scratch_limit
..

IMPALA-10565: Adjust result spooling memory based on scratch_limit

IMPALA-9856 enables result spooling by default. Result spooling depends
on the ability to spill its entire BufferedTupleStream to disk once it
hits maximum memory reservation. However, if the query option
scratch_limit is set lower than max_spilled_result_spooling_mem, the
query might fail in the middle of execution due to insufficient scratch
space. This patch adds planner change to consider scratch_limit and
scratch_dirs query option when computing resource used by result
spooling. The algorithm is as follow:

* If scratch_dirs is empty or scratch_limit < minMemReservationBytes
  required to use BufferedPlanRootSink, we set spool_query_results to
  false and fallback to use BlockingPlanRootSink.

* If scratch_limit > minMemReservationBytes but still fairly low, we
  lower the max_result_spooling_mem (default is 100MB) and
  max_spilled_result_spooling_mem (default is 1GB) to fit scratch_limit.

* if scratch_limit > max_spilled_result_spooling_mem, do nothing.

Testing:
- Add TestScratchLimit::test_result_spooling_and_varying_scratch_limit
- Verify that spool_query_results query option is disabled in
  TestScratchDir::test_no_dirs
- Pass exhaustive tests.

Change-Id: I541f46e6911694e14c0fc25be1a6982fd929d3a9
---
M be/src/service/query-options-test.cc
M be/src/util/backend-gflag-util.cc
M common/thrift/BackendGflags.thrift
M fe/src/main/java/org/apache/impala/planner/PlanRootSink.java
M fe/src/main/java/org/apache/impala/service/BackendConfig.java
A testdata/workloads/functional-query/queries/QueryTest/scratch-limit.test
M tests/custom_cluster/test_scratch_disk.py
M tests/query_test/test_scratch_limit.py
8 files changed, 143 insertions(+), 6 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/66/17166/6
--
To view, visit http://gerrit.cloudera.org:8080/17166
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I541f46e6911694e14c0fc25be1a6982fd929d3a9
Gerrit-Change-Number: 17166
Gerrit-PatchSet: 6
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Riza Suminto 


[Impala-ASF-CR] IMPALA-10565: Adjust result spooling memory based on scratch limit

2021-03-12 Thread Riza Suminto (Code Review)
Riza Suminto has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17166 )

Change subject: IMPALA-10565: Adjust result spooling memory based on 
scratch_limit
..


Patch Set 6:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/17166/5/fe/src/main/java/org/apache/impala/planner/PlanRootSink.java
File fe/src/main/java/org/apache/impala/planner/PlanRootSink.java:

http://gerrit.cloudera.org:8080/#/c/17166/5/fe/src/main/java/org/apache/impala/planner/PlanRootSink.java@123
PS5, Line 123:  // If maxAllowedScratchLimit < minMemReservat
> nit: Shouldn't this be maxAllowedScratchLimit < minMemReservationBytes ?
Thanks! Sorry for missing this.


http://gerrit.cloudera.org:8080/#/c/17166/5/fe/src/main/java/org/apache/impala/planner/PlanRootSink.java@125
PS5, Line 125: If maxAllowedScratchLimit < maxMemReserva
> nit: Similar update needed here ?
Done



--
To view, visit http://gerrit.cloudera.org:8080/17166
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I541f46e6911694e14c0fc25be1a6982fd929d3a9
Gerrit-Change-Number: 17166
Gerrit-PatchSet: 6
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Comment-Date: Sat, 13 Mar 2021 06:44:48 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-10565: Adjust result spooling memory based on scratch limit

2021-03-12 Thread Riza Suminto (Code Review)
Riza Suminto has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17166 )

Change subject: IMPALA-10565: Adjust result spooling memory based on 
scratch_limit
..


Patch Set 4:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/17166/4/fe/src/main/java/org/apache/impala/planner/PlanRootSink.java
File fe/src/main/java/org/apache/impala/planner/PlanRootSink.java:

http://gerrit.cloudera.org:8080/#/c/17166/4/fe/src/main/java/org/apache/impala/planner/PlanRootSink.java@120
PS4, Line 120: scratch_limit < minMemReservationBytes
> Update this and the one below to account for the extra maxRowBufferSize
ACK.


http://gerrit.cloudera.org:8080/#/c/17166/4/fe/src/main/java/org/apache/impala/planner/PlanRootSink.java@142
PS4, Line 142:   maxMemReservationBytes = scratchLimit - 
maxRowBufferSize;
> For this adjustment for maxRowBufferSize, can we not just do it up front (o
If scratch_limit is unbounded, the maxMemReservationBytes calculation in line 
114 is OK. Little overspill will not fail the query.
In contrary, if scratch_limit is bounded, just a little overspill will 
terminate the query because scratch_limit is strictly enforced.

What if I tidy up the comparison a bit so that it looks simpler? We define

  long maxAllowedScratchLimit = scratchLimit - maxRowBufferSize;

Instead of comparing against scratchLimit, these should compare against 
maxAllowedScratchLimit;



--
To view, visit http://gerrit.cloudera.org:8080/17166
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I541f46e6911694e14c0fc25be1a6982fd929d3a9
Gerrit-Change-Number: 17166
Gerrit-PatchSet: 4
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Comment-Date: Sat, 13 Mar 2021 01:47:52 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-10565: Adjust result spooling memory based on scratch limit

2021-03-13 Thread Riza Suminto (Code Review)
Riza Suminto has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17166 )

Change subject: IMPALA-10565: Adjust result spooling memory based on 
scratch_limit
..


Patch Set 6:

> Yes, let's do that. I presume you will explicitly enable result_spooling only 
> for the following test?
> def test_result_spooling_and_varying_scratch_limit(self, vector):

Yes. My plan is to disable result spooling as a separate patch for IMPALA-10559.
This patch then only have result spooling enabled for 
test_result_spooling_and_varying_scratch_limit.


--
To view, visit http://gerrit.cloudera.org:8080/17166
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I541f46e6911694e14c0fc25be1a6982fd929d3a9
Gerrit-Change-Number: 17166
Gerrit-PatchSet: 6
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Comment-Date: Sat, 13 Mar 2021 17:36:00 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10565: Adjust result spooling memory based on scratch limit

2021-03-13 Thread Riza Suminto (Code Review)
Riza Suminto has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17166 )

Change subject: IMPALA-10565: Adjust result spooling memory based on 
scratch_limit
..


Patch Set 6:

> Patch Set 6: Verified-1
>
> Build failed: https://jenkins.impala.io/job/gerrit-verify-dryrun/6965/

Looks like it hit flaky TestScratchLimit::test_with_unlimited_scratch_limit 
(IMPALA-10559).


--
To view, visit http://gerrit.cloudera.org:8080/17166
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I541f46e6911694e14c0fc25be1a6982fd929d3a9
Gerrit-Change-Number: 17166
Gerrit-PatchSet: 6
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Comment-Date: Sat, 13 Mar 2021 15:49:58 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10565: Adjust result spooling memory based on scratch limit

2021-03-13 Thread Riza Suminto (Code Review)
Riza Suminto has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17166 )

Change subject: IMPALA-10565: Adjust result spooling memory based on 
scratch_limit
..


Patch Set 6:

> Patch Set 6:
>
> > Patch Set 6:
> >
> > > Patch Set 6: Verified-1
> > >
> > > Build failed: https://jenkins.impala.io/job/gerrit-verify-dryrun/6965/
> >
> > Looks like it hit flaky TestScratchLimit::test_with_unlimited_scratch_limit 
> > (IMPALA-10559).
>
> Yeah, that is one failure..but I also see for the dockerized tests bunch of 
> other failures: 
> https://jenkins.impala.io/job/ubuntu-16.04-dockerised-tests/3935/ although it 
> seems the failures are all related to lost connection.
> E   ImpalaBeeswaxException: ImpalaBeeswaxException:
> EINNER EXCEPTION:  'thrift.transport.TTransport.TTransportException'>
> EMESSAGE: TSocket read 0 bytes

TestScratchLimit::test_with_unlimited_scratch_limit crash one of the impalad 
due to DCHECK violation in reservation-tracker.cc:436. It is a violation when 
BufferedTupleStream is spilling and subsequently allocating new page for read.

It is unrelated with this patch. But the combination of sort query + result 
spooling in this test seems to reveal another bug in BufferedTupleStream.
At this point, I think we should disable result spooling for all test in 
TestScratchLimit (per discussion in IMPALA-10559 jira) and investigate it 
separately. What do you think?


--
To view, visit http://gerrit.cloudera.org:8080/17166
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I541f46e6911694e14c0fc25be1a6982fd929d3a9
Gerrit-Change-Number: 17166
Gerrit-PatchSet: 6
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Comment-Date: Sat, 13 Mar 2021 17:10:48 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10565: Check max spilled result spooling mem vs scratch limit

2021-03-10 Thread Riza Suminto (Code Review)
Hello Aman Sinha, Bikramjeet Vig, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/17166

to look at the new patch set (#2).

Change subject: IMPALA-10565: Check max_spilled_result_spooling_mem vs 
scratch_limit
..

IMPALA-10565: Check max_spilled_result_spooling_mem vs scratch_limit

IMPALA-9856 enables result spooling by default. However, if the query
option scratch_limit is set lower than max_spilled_result_spooling_mem,
the query might fail in the middle of execution due to insufficient
scratch space. This patch validation that when result spooling is
enabled, max_spilled_result_spooling_mem <= scratch_limit.

This patch also fix validation between max_result_spooling_mem and
max_spilled_result_spooling_mem that should treat both value 0 and -1 as
unbounded.

Testing:
- Lower max_spilled_result_spooling_mem in test_with_high_scratch_limit
  and test_with_low_scratch_limit.
- Toggle off spool_query_results in tests that set scratch_limit=0 to
  ensure that result spilling will not happen.
- Add test_with_scratch_limit_less_than_max_spilled_result_spooling_mem.
- Add be test QueryOptions.ResultSpoolingWithScratchLimit.
- Pass exhaustive tests.

Change-Id: I541f46e6911694e14c0fc25be1a6982fd929d3a9
---
M be/src/runtime/bufferpool/buffer-pool-test.cc
M be/src/service/query-options-test.cc
M be/src/service/query-options.cc
M 
testdata/workloads/functional-query/queries/QueryTest/spilling-naaj-no-deny-reservation.test
M testdata/workloads/functional-query/queries/QueryTest/spilling-naaj.test
M testdata/workloads/tpch/queries/sort-reservation-usage-single-node.test
M tests/query_test/test_scratch_limit.py
7 files changed, 129 insertions(+), 12 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/66/17166/2
--
To view, visit http://gerrit.cloudera.org:8080/17166
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I541f46e6911694e14c0fc25be1a6982fd929d3a9
Gerrit-Change-Number: 17166
Gerrit-PatchSet: 2
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Riza Suminto 


[Impala-ASF-CR] IMPALA-10565: Check max spilled result spooling mem vs scratch limit

2021-03-10 Thread Riza Suminto (Code Review)
Riza Suminto has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17166 )

Change subject: IMPALA-10565: Check max_spilled_result_spooling_mem vs 
scratch_limit
..


Patch Set 1:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/17166/1/be/src/service/query-options.cc
File be/src/service/query-options.cc:

http://gerrit.cloudera.org:8080/#/c/17166/1/be/src/service/query-options.cc@
PS1, Line : $1
> slightly misleading,does not necessarily have to be the same, could be 0 or
Agree. I will rephrase into this:
"If max_result_spooling_mem is set to unbounded ($0) 
max_spilled_result_spooling_mem must be set to unbounded (0 or -1) as well."


http://gerrit.cloudera.org:8080/#/c/17166/1/be/src/service/query-options.cc@1124
PS1, Line 1124:   if (query_options->spool_query_results && scratch_limit != 
-1) {
> since now we have spilling on by default, spool_query_results is set to 1GB
Good point. Quick git grep show some more places where scratch_limit is set to 
0, like some in spilling-naaj-no-deny-reservation.test. I will also set 
spool_query_results=0 in these places.



--
To view, visit http://gerrit.cloudera.org:8080/17166
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I541f46e6911694e14c0fc25be1a6982fd929d3a9
Gerrit-Change-Number: 17166
Gerrit-PatchSet: 1
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Comment-Date: Wed, 10 Mar 2021 23:28:30 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-10565: Check max spilled result spooling mem vs scratch limit

2021-03-10 Thread Riza Suminto (Code Review)
Riza Suminto has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17166 )

Change subject: IMPALA-10565: Check max_spilled_result_spooling_mem vs 
scratch_limit
..


Patch Set 2:

(2 comments)

Patch set 2 disable result spooling in tests that set scratch_limit > -1.

http://gerrit.cloudera.org:8080/#/c/17166/1/be/src/service/query-options.cc
File be/src/service/query-options.cc:

http://gerrit.cloudera.org:8080/#/c/17166/1/be/src/service/query-options.cc@
PS1, Line : un
> Agree. I will rephrase into this:
Done


http://gerrit.cloudera.org:8080/#/c/17166/1/be/src/service/query-options.cc@1124
PS1, Line 1124:   if (query_options->spool_query_results && scratch_limit != 
-1) {
> Good point. Quick git grep show some more places where scratch_limit is set
Done



--
To view, visit http://gerrit.cloudera.org:8080/17166
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I541f46e6911694e14c0fc25be1a6982fd929d3a9
Gerrit-Change-Number: 17166
Gerrit-PatchSet: 2
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Comment-Date: Thu, 11 Mar 2021 00:53:54 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-10565: Check max spilled result spooling mem vs scratch limit

2021-03-09 Thread Riza Suminto (Code Review)
Riza Suminto has uploaded this change for review. ( 
http://gerrit.cloudera.org:8080/17166


Change subject: IMPALA-10565: Check max_spilled_result_spooling_mem vs 
scratch_limit
..

IMPALA-10565: Check max_spilled_result_spooling_mem vs scratch_limit

IMPALA-9856 enables result spooling by default. However, if the query
option scratch_limit is set lower than max_spilled_result_spooling_mem,
the query might fail in the middle of execution due to insufficient
scratch space. This patch validation that when result spooling is
enabled, max_spilled_result_spooling_mem <= scratch_limit.

This patch also fix validation between max_result_spooling_mem and
max_spilled_result_spooling_mem that should treat both value 0 and -1 as
unbounded.

Testing:
- Lower max_spilled_result_spooling_mem in test_with_high_scratch_limit
  and test_with_low_scratch_limit.
- Toggle off spool_query_results in
  test_with_zero_scratch_limit_no_memory_limit to ensure that spill will
  not happen.
- Add test_with_scratch_limit_less_than_max_spilled_result_spooling_mem.
- Add be test QueryOptions.ResultSpoolingWithScratchLimit.
- Pass exhaustive tests.

Change-Id: I541f46e6911694e14c0fc25be1a6982fd929d3a9
---
M be/src/service/query-options-test.cc
M be/src/service/query-options.cc
M tests/query_test/test_scratch_limit.py
3 files changed, 120 insertions(+), 11 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/66/17166/1
--
To view, visit http://gerrit.cloudera.org:8080/17166
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I541f46e6911694e14c0fc25be1a6982fd929d3a9
Gerrit-Change-Number: 17166
Gerrit-PatchSet: 1
Gerrit-Owner: Riza Suminto 


[Impala-ASF-CR] IMPALA-10583: Fix bug on unbounded result spooling memory config.

2021-03-15 Thread Riza Suminto (Code Review)
Riza Suminto has uploaded this change for review. ( 
http://gerrit.cloudera.org:8080/17187


Change subject: IMPALA-10583: Fix bug on unbounded result spooling memory 
config.
..

IMPALA-10583: Fix bug on unbounded result spooling memory config.

Two bugs happening on result spooling when we set two of its query
options to 0 (unbounded).

The first bug happens if max_spilled_result_spooling_mem =
0 (unbounded). max_unpinned_bytes_ in SpillableRowBatchQueue will be set
to 0, SpillableRowBatchQueue::IsFull() then will always return true, and
the query hang. This patch fix it by setting max_unpinned_bytes_ to
INT64_MAX if max_spilled_result_spooling_mem = 0.

The second bug happens if we set max_result_spooling_mem =
0 (unbounded). PlanRootSink.java will peg maxMemReservationBytes to
always equal to minMemReservationBytes. This patch fixes this by
reverting to the default max_result_spooling_mem (100MB).

Testing:
- Add test_unbounded_result_spooling_mem.
- Pass core tests.

Change-Id: If8f5e3668281bba8813f8082f45b4faa7721530e
---
M be/src/exec/buffered-plan-root-sink.cc
M be/src/runtime/spillable-row-batch-queue.cc
M fe/src/main/java/org/apache/impala/planner/PlanRootSink.java
M tests/query_test/test_result_spooling.py
4 files changed, 43 insertions(+), 6 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/87/17187/1
--
To view, visit http://gerrit.cloudera.org:8080/17187
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: If8f5e3668281bba8813f8082f45b4faa7721530e
Gerrit-Change-Number: 17187
Gerrit-PatchSet: 1
Gerrit-Owner: Riza Suminto 


[Impala-ASF-CR] IMPALA-10565: Adjust result spooling memory based on scratch limit

2021-03-11 Thread Riza Suminto (Code Review)
Riza Suminto has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17166 )

Change subject: IMPALA-10565: Adjust result spooling memory based on 
scratch_limit
..


Patch Set 3:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/17166/3/be/src/service/query-options.cc
File be/src/service/query-options.cc:

http://gerrit.cloudera.org:8080/#/c/17166/3/be/src/service/query-options.cc@1104
PS3, Line 1104:   // (a value of 0 or -1 means memory is unbounded).
I just figured out in ParseUtil::ParseMemSpec() that -1 for memory query option 
is accepted as indicator for infinite memory and translated into 0 return 
value. So testing against -1 here is not necessary. I will revert this change.


http://gerrit.cloudera.org:8080/#/c/17166/3/fe/src/main/java/org/apache/impala/planner/PlanRootSink.java
File fe/src/main/java/org/apache/impala/planner/PlanRootSink.java:

http://gerrit.cloudera.org:8080/#/c/17166/3/fe/src/main/java/org/apache/impala/planner/PlanRootSink.java@110
PS3, Line 110:   long maxMemReservationBytes = Math.max(
 :   queryOptions.getMax_result_spooling_mem(), 
minMemReservationBytes);
This maxMemReservationBytes does not seems to take account of 
max_result_spooling_mem = 0 (unbounded), which can peg maxMemReservationBytes = 
minMemReservationBytes. Im not sure what to set maxMemReservationBytes in this 
case.

Similarly, SpillableRowBatchQueue does not seem to recognize 
max_spilled_result_spooling_mem = 0 as unbounded.



--
To view, visit http://gerrit.cloudera.org:8080/17166
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I541f46e6911694e14c0fc25be1a6982fd929d3a9
Gerrit-Change-Number: 17166
Gerrit-PatchSet: 3
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Comment-Date: Fri, 12 Mar 2021 05:00:07 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-10565: Adjust result spooling memory based on scratch limit

2021-03-13 Thread Riza Suminto (Code Review)
Hello Aman Sinha, Bikramjeet Vig, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/17166

to look at the new patch set (#7).

Change subject: IMPALA-10565: Adjust result spooling memory based on 
scratch_limit
..

IMPALA-10565: Adjust result spooling memory based on scratch_limit

IMPALA-9856 enables result spooling by default. Result spooling depends
on the ability to spill its entire BufferedTupleStream to disk once it
hits maximum memory reservation. However, if the query option
scratch_limit is set lower than max_spilled_result_spooling_mem, the
query might fail in the middle of execution due to insufficient scratch
space. This patch adds planner change to consider scratch_limit and
scratch_dirs query option when computing resource used by result
spooling. The algorithm is as follow:

* If scratch_dirs is empty or scratch_limit < minMemReservationBytes
  required to use BufferedPlanRootSink, we set spool_query_results to
  false and fallback to use BlockingPlanRootSink.

* If scratch_limit > minMemReservationBytes but still fairly low, we
  lower the max_result_spooling_mem (default is 100MB) and
  max_spilled_result_spooling_mem (default is 1GB) to fit scratch_limit.

* if scratch_limit > max_spilled_result_spooling_mem, do nothing.

Testing:
- Add TestScratchLimit::test_result_spooling_and_varying_scratch_limit
- Verify that spool_query_results query option is disabled in
  TestScratchDir::test_no_dirs
- Pass exhaustive tests.

Change-Id: I541f46e6911694e14c0fc25be1a6982fd929d3a9
---
M be/src/service/query-options-test.cc
M be/src/util/backend-gflag-util.cc
M common/thrift/BackendGflags.thrift
M fe/src/main/java/org/apache/impala/planner/PlanRootSink.java
M fe/src/main/java/org/apache/impala/service/BackendConfig.java
A testdata/workloads/functional-query/queries/QueryTest/scratch-limit.test
M tests/custom_cluster/test_scratch_disk.py
M tests/query_test/test_scratch_limit.py
8 files changed, 143 insertions(+), 6 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/66/17166/7
--
To view, visit http://gerrit.cloudera.org:8080/17166
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I541f46e6911694e14c0fc25be1a6982fd929d3a9
Gerrit-Change-Number: 17166
Gerrit-PatchSet: 7
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Riza Suminto 


[Impala-ASF-CR] IMPALA-10559: Fix flakiness in TestScratchLimit.

2021-03-13 Thread Riza Suminto (Code Review)
Riza Suminto has uploaded this change for review. ( 
http://gerrit.cloudera.org:8080/17182


Change subject: IMPALA-10559: Fix flakiness in TestScratchLimit.
..

IMPALA-10559: Fix flakiness in TestScratchLimit.

TestScratchLimit has been flaky in ubuntu-16.04-dockerised-tests
environment since results spooling is enabled by default in IMPALA-9856.
A combination of result spooling, sort query, and low buffer_pool_limit
in TestScratchLimit::test_with_unlimited_scratch_limit seems to reveal a
memory reservation bug in BufferedTutpleStream. This patch disables
result spooling for tests under TestScratchLimit until the underlying
bug is found. We will investigate the bug in a separate JIRA.

Testing:
- Disable result spooling in all tests of TestScratchLimit before
  IMPALA-9856 gets in.
- Run and pass TestScratchLimit locally.

Change-Id: I68736d6bfb0001423fd138000670ac60b2117fbe
---
M tests/query_test/test_scratch_limit.py
1 file changed, 6 insertions(+), 0 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/82/17182/1
--
To view, visit http://gerrit.cloudera.org:8080/17182
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I68736d6bfb0001423fd138000670ac60b2117fbe
Gerrit-Change-Number: 17182
Gerrit-PatchSet: 1
Gerrit-Owner: Riza Suminto 


[Impala-ASF-CR] IMPALA-10565: Adjust result spooling memory based on scratch limit

2021-03-13 Thread Riza Suminto (Code Review)
Riza Suminto has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17166 )

Change subject: IMPALA-10565: Adjust result spooling memory based on 
scratch_limit
..


Patch Set 7:

Patch set 7 is a rebase of patch set 6.


--
To view, visit http://gerrit.cloudera.org:8080/17166
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I541f46e6911694e14c0fc25be1a6982fd929d3a9
Gerrit-Change-Number: 17166
Gerrit-PatchSet: 7
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Comment-Date: Sun, 14 Mar 2021 03:32:06 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10584: Allow UnpinStream to fail if reservation is not enough

2021-03-17 Thread Riza Suminto (Code Review)
Riza Suminto has uploaded this change for review. ( 
http://gerrit.cloudera.org:8080/17195


Change subject: IMPALA-10584: Allow UnpinStream to fail if reservation is not 
enough
..

IMPALA-10584: Allow UnpinStream to fail if reservation is not enough

TestScratchLimit::test_with_unlimited_scratch_limit has been
intermittently crashing in ubuntu-16.04-dockerised-tests environment
after result spooling is enabled by default in IMPALA-9856. DCHECK
violation occurs in ReservationTracker::CheckConsistency() due to
BufferedTupleStream wrongly tries to reclaim memory reservation while
unpinning the stream.

For this bug to surface, all of the following needs to happen:
- Stream is in pinned mode.
- There are only 2 pages in the stream: 1 read and 1 write.
- Stream can not increase reservation anymore either due to memory
  pressure or low buffer/memory limit.
- The stream read page has been fully read and is attached to output
  RowBatch. But the output RowBatch has not cleaned up yet.
- BufferedTupleStream::UnpinStream is invoked.

The bug happens because UnpinStream proceeds to NextReadPage where the
read page buffer was mistakenly assumed as released. default_page_len_
bytes were added into write_page_reservation_ and subsequently violates
the total memory reservation.

This patch fixes the bug by disallowing UnpinStream to advance the read
iterator if the read page is attached to output RowBatch and there is
no more unused reservation. If UnpinStream in turns unable to unpin any
page to reclaim memory reservation, TErrorCode::NO_PAGE_UNPINNED will be
returned and the stream will stay in pinned mode.

In the context of BufferedPlanRootSink, if this error occurs in Send(),
the writer will release the lock to give a chance for the reader to read
some more rows and clean up the output RowBatch (current_batch_) before
retying to add more rows again.

Testing:
- Add BE test UnpinReadPageMinReservation.
- Reenable result spooling in TestScratchLimit that was disabled in
  IMPALA-10559.
- Pass core tests.

Change-Id: I16137b6e423f190f60c3115a06ccd0f77e9f585a
---
M be/src/exec/buffered-plan-root-sink.cc
M be/src/runtime/buffered-tuple-stream-test.cc
M be/src/runtime/buffered-tuple-stream.cc
M be/src/runtime/buffered-tuple-stream.h
M be/src/runtime/spillable-row-batch-queue.cc
M be/src/runtime/spillable-row-batch-queue.h
M common/thrift/generate_error_codes.py
M tests/query_test/test_scratch_limit.py
8 files changed, 200 insertions(+), 47 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/95/17195/1
--
To view, visit http://gerrit.cloudera.org:8080/17195
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I16137b6e423f190f60c3115a06ccd0f77e9f585a
Gerrit-Change-Number: 17195
Gerrit-PatchSet: 1
Gerrit-Owner: Riza Suminto 


[Impala-ASF-CR] IMPALA-10492: Lower default MAX CNF EXPRS query option

2021-02-26 Thread Riza Suminto (Code Review)
Riza Suminto has uploaded this change for review. ( 
http://gerrit.cloudera.org:8080/17132


Change subject: IMPALA-10492: Lower default MAX_CNF_EXPRS query option
..

IMPALA-10492: Lower default MAX_CNF_EXPRS query option

MAX_CNF_EXPRS was set to unlimited by default. A complex query that
container many predicates the CNF rewrite can lead to significant
frontend memory usage and eventually OutOfMemory. We need to lower the
default value to avoid the memory problem while maintaining performance
for our TPC-DS and TPC-H workloads.

We investigate the maximum number of CFN expressions in TPC-DS and TPC-H
by printing out the final value of 'numCnfExprs_' from
ConvertToCNFRule.java to the query profile. We found 5 queries that
applies CNF rewrite rules as follow:

| Query | numCnfExprs_ |
|---+--|
| TPCDS-Q13 |  168 |
| TPCDS-Q85 |  100 |
| TPCDS-Q48 |   34 |
| TPCH-Q19  |  124 |
| TPCH-Q7   |3 |

This patch lower the default value from unlimited to 200 based on the
result above.

Testing:
- Manually verify that MAX_CNF_EXPRS 200 is enough for our TPC-DS and
  TPC-H worloads.
- Pass core tests.

Change-Id: I7ca3d0e094ac01c24a046c25d6a1b56bf134faa8
---
M common/thrift/ImpalaInternalService.thrift
M common/thrift/ImpalaService.thrift
2 files changed, 2 insertions(+), 2 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/32/17132/1
--
To view, visit http://gerrit.cloudera.org:8080/17132
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I7ca3d0e094ac01c24a046c25d6a1b56bf134faa8
Gerrit-Change-Number: 17132
Gerrit-PatchSet: 1
Gerrit-Owner: Riza Suminto 


[Impala-ASF-CR] IMPALA-10492: Lower default MAX CNF EXPRS query option

2021-02-26 Thread Riza Suminto (Code Review)
Hello Aman Sinha, David Rorke, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/17132

to look at the new patch set (#2).

Change subject: IMPALA-10492: Lower default MAX_CNF_EXPRS query option
..

IMPALA-10492: Lower default MAX_CNF_EXPRS query option

MAX_CNF_EXPRS was set to unlimited by default. The CNF rewrite can lead
to significant frontend memory usage and eventually OutOfMemory for a
complex query that contain many predicates. We need to lower the default
value to avoid this memory problem while maintaining performance for our
TPC-DS and TPC-H workloads.

We investigate the maximum number of CFN expressions in TPC-DS and TPC-H
by printing out the final value of 'numCnfExprs_' from
ConvertToCNFRule.java to the query profile. We found 5 queries that
applies CNF rewrite rules as follow:

| Query | numCnfExprs_ |
|---+--|
| TPCDS-Q13 |  168 |
| TPCDS-Q85 |  100 |
| TPCDS-Q48 |   34 |
| TPCH-Q19  |  124 |
| TPCH-Q7   |3 |

This patch lower the default value from unlimited to 200 based on the
result above.

Testing:
- Manually verify that MAX_CNF_EXPRS 200 is enough for our TPC-DS and
  TPC-H worloads.
- Pass core tests.

Change-Id: I7ca3d0e094ac01c24a046c25d6a1b56bf134faa8
---
M common/thrift/ImpalaInternalService.thrift
M common/thrift/ImpalaService.thrift
2 files changed, 2 insertions(+), 2 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/32/17132/2
--
To view, visit http://gerrit.cloudera.org:8080/17132
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I7ca3d0e094ac01c24a046c25d6a1b56bf134faa8
Gerrit-Change-Number: 17132
Gerrit-PatchSet: 2
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: David Rorke 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Riza Suminto 


[Impala-ASF-CR] IMPALA-9856: Enable result spooling by default.

2021-02-24 Thread Riza Suminto (Code Review)
Riza Suminto has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16755 )

Change subject: IMPALA-9856: Enable result spooling by default.
..


Patch Set 4:

(2 comments)

Patch set 4 is a rebase after IMPALA-10371 resolved.

http://gerrit.cloudera.org:8080/#/c/16755/3/tests/custom_cluster/test_parquet_max_page_header.py
File tests/custom_cluster/test_parquet_max_page_header.py:

http://gerrit.cloudera.org:8080/#/c/16755/3/tests/custom_cluster/test_parquet_max_page_header.py@110
PS3, Line 110: # IMPALA-9856: Since this test expect to read a row up to 10
> nit: add comment as to why this query option is needed and why 11mb
Done


http://gerrit.cloudera.org:8080/#/c/16755/3/tests/query_test/test_query_mem_limit.py
File tests/query_test/test_query_mem_limit.py:

http://gerrit.cloudera.org:8080/#/c/16755/3/tests/query_test/test_query_mem_limit.py@99
PS3, Line 99:  # IMPALA-9856: For the group_concat
> nit: add why we need this and why it is 18M
Done



--
To view, visit http://gerrit.cloudera.org:8080/16755
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I9e360c1428676d8f3fab5d95efee18aca085eba4
Gerrit-Change-Number: 16755
Gerrit-PatchSet: 4
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Comment-Date: Wed, 24 Feb 2021 17:11:37 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-9856: Enable result spooling by default.

2021-02-24 Thread Riza Suminto (Code Review)
Hello Quanlong Huang, Bikramjeet Vig, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/16755

to look at the new patch set (#4).

Change subject: IMPALA-9856: Enable result spooling by default.
..

IMPALA-9856: Enable result spooling by default.

Result spooling has been relatively stable since it was introduced, and
it has several benefits described in IMPALA-8656. This patch enable
result spooling (SPOOL_QUERY_RESULTS) query options by default.

Furthermore, some tests need to be adjusted to account for result
spooling by default. The following are the adjustment categories and
list of tests that fall under such category.

Change in assertions:
PlannerTest#testAcidTableScans
PlannerTest#testBloomFilterAssignment
PlannerTest#testConstantFolding
PlannerTest#testFkPkJoinDetection
PlannerTest#testFkPkJoinDetectionWithHDFSNumRowsEstDisabled
PlannerTest#testKuduSelectivity
PlannerTest#testMaxRowSize
PlannerTest#testMinMaxRuntimeFilters
PlannerTest#testMinMaxRuntimeFiltersWithHDFSNumRowsEstDisabled
PlannerTest#testMtDopValidation
PlannerTest#testParquetFiltering
PlannerTest#testParquetFilteringDisabled
PlannerTest#testPartitionPruning
PlannerTest#testPreaggBytesLimit
PlannerTest#testResourceRequirements
PlannerTest#testRuntimeFilterQueryOptions
PlannerTest#testSortExprMaterialization
PlannerTest#testSpillableBufferSizing
PlannerTest#testTableSample
PlannerTest#testTpch
PlannerTest#testKuduTpch
PlannerTest#testTpchNested
PlannerTest#testUnion
TpcdsPlannerTest
custom_cluster/test_admission_controller.py::TestAdmissionController::test_dedicated_coordinator_planner_estimates
custom_cluster/test_admission_controller.py::TestAdmissionController::test_memory_rejection
custom_cluster/test_admission_controller.py::TestAdmissionController::test_pool_mem_limit_configs
metadata/test_explain.py::TestExplain::test_explain_level2
metadata/test_explain.py::TestExplain::test_explain_level3
metadata/test_stats_extrapolation.py::TestStatsExtrapolation::test_stats_extrapolation

Increase BUFFER_POOL_LIMIT:
query_test/test_queries.py::TestQueries::test_analytic_fns
query_test/test_runtime_filters.py::TestRuntimeRowFilters::test_row_filter_reservation
query_test/test_sort.py::TestQueryFullSort::test_multiple_mem_limits_full_output
query_test/test_spilling.py::TestSpillingBroadcastJoins::test_spilling_broadcast_joins
query_test/test_spilling.py::TestSpillingDebugActionDimensions::test_spilling_aggs
query_test/test_spilling.py::TestSpillingDebugActionDimensions::test_spilling_regression_exhaustive
query_test/test_udfs.py::TestUdfExecution::test_mem_limits

Increase MEM_LIMIT:
query_test/test_mem_usage_scaling.py::TestExchangeMemUsage::test_exchange_mem_usage_scaling
query_test/test_mem_usage_scaling.py::TestScanMemLimit::test_hdfs_scanner_thread_mem_scaling

Increase MAX_ROW_SIZE:
custom_cluster/test_parquet_max_page_header.py::TestParquetMaxPageHeader::test_large_page_header_config
query_test/test_insert.py::TestInsertQueries::test_insert_large_string
query_test/test_query_mem_limit.py::TestQueryMemLimit::test_mem_limit
query_test/test_scanners.py::TestTextSplitDelimiters::test_text_split_across_buffers_delimiter
query_test/test_scanners.py::TestWideRow::test_wide_row

Disable result spooling to maintain assertion:
custom_cluster/test_admission_controller.py::TestAdmissionController::test_set_request_pool
custom_cluster/test_admission_controller.py::TestAdmissionController::test_timeout_reason_host_memory
custom_cluster/test_admission_controller.py::TestAdmissionController::test_timeout_reason_pool_memory
custom_cluster/test_admission_controller.py::TestAdmissionController::test_queue_reasons_memory
custom_cluster/test_admission_controller.py::TestAdmissionController::test_pool_config_change_while_queued
custom_cluster/test_query_retries.py::TestQueryRetries::test_retry_fetched_rows
custom_cluster/test_query_retries.py::TestQueryRetries::test_retry_finished_query
custom_cluster/test_scratch_disk.py::TestScratchDir::test_no_dirs
custom_cluster/test_scratch_disk.py::TestScratchDir::test_non_existing_dirs
custom_cluster/test_scratch_disk.py::TestScratchDir::test_non_writable_dirs
query_test/test_insert.py::TestInsertQueries::test_insert_large_string (the 
last query only)
query_test/test_kudu.py::TestKuduMemLimits::test_low_mem_limit_low_selectivity_scan
query_test/test_mem_usage_scaling.py::TestScanMemLimit::test_kudu_scan_mem_usage
query_test/test_queries.py::TestQueriesParquetTables::test_very_large_strings
query_test/test_query_mem_limit.py::TestCodegenMemLimit::test_codegen_mem_limit
shell/test_shell_client.py::TestShellClient::test_fetch_size

Testing:
- Pass exhaustive tests.

Change-Id: I9e360c1428676d8f3fab5d95efee18aca085eba4
---
M common/thrift/ImpalaInternalService.thrift
M testdata/workloads/functional-planner/queries/PlannerTest/acid-scans.test
M 

[Impala-ASF-CR] IMPALA-10492: Lower default MAX CNF EXPRS query option

2021-02-27 Thread Riza Suminto (Code Review)
Hello Aman Sinha, David Rorke, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/17132

to look at the new patch set (#3).

Change subject: IMPALA-10492: Lower default MAX_CNF_EXPRS query option
..

IMPALA-10492: Lower default MAX_CNF_EXPRS query option

MAX_CNF_EXPRS was set to unlimited by default. The CNF rewrite can lead
to significant frontend memory usage and eventually OutOfMemory for a
complex query that contain many predicates. We need to lower the default
value to avoid this memory problem while maintaining performance for our
TPC-DS and TPC-H workloads.

We investigate the maximum number of CNF expressions in TPC-DS and TPC-H
by printing out the final value of 'numCnfExprs_' from
ConvertToCNFRule.java to the query profile. We found 5 queries that
applies CNF rewrite rules as follow:

| Query | numCnfExprs_ |
|---+--|
| TPCDS-Q13 |  168 |
| TPCDS-Q85 |  100 |
| TPCDS-Q48 |   34 |
| TPCH-Q19  |  124 |
| TPCH-Q7   |3 |

This patch lower the default value from unlimited to 200 based on the
result above.

Testing:
- Manually verify that MAX_CNF_EXPRS 200 is enough for our TPC-DS and
  TPC-H worloads.
- Pass core tests.

Change-Id: I7ca3d0e094ac01c24a046c25d6a1b56bf134faa8
---
M common/thrift/ImpalaInternalService.thrift
M common/thrift/ImpalaService.thrift
2 files changed, 2 insertions(+), 2 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/32/17132/3
--
To view, visit http://gerrit.cloudera.org:8080/17132
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I7ca3d0e094ac01c24a046c25d6a1b56bf134faa8
Gerrit-Change-Number: 17132
Gerrit-PatchSet: 3
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: David Rorke 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Riza Suminto 


[Impala-ASF-CR] IMPALA-10492: Lower default MAX CNF EXPRS query option

2021-02-27 Thread Riza Suminto (Code Review)
Riza Suminto has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17132 )

Change subject: IMPALA-10492: Lower default MAX_CNF_EXPRS query option
..


Patch Set 3:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/17132/2//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/17132/2//COMMIT_MSG@15
PS2, Line 15: CNF
> nit: CNF
Done



--
To view, visit http://gerrit.cloudera.org:8080/17132
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I7ca3d0e094ac01c24a046c25d6a1b56bf134faa8
Gerrit-Change-Number: 17132
Gerrit-PatchSet: 3
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: David Rorke 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Comment-Date: Sat, 27 Feb 2021 14:25:59 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-10532: TestOverlapMinMaxFilters.test overlap min max filters seems flaky

2021-04-08 Thread Riza Suminto (Code Review)
Riza Suminto has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17252 )

Change subject: IMPALA-10532: 
TestOverlapMinMaxFilters.test_overlap_min_max_filters seems flaky
..


Patch Set 10:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/17252/10//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/17252/10//COMMIT_MSG@7
PS10, Line 7: IMPALA-10532: 
TestOverlapMinMaxFilters.test_overlap_min_max_filters seems flaky
In my opinion, we should split this into 2 separate changes.
One is focus on fixing the test_overlap_min_max_filters, and the other is for 
the query profile improvement.


http://gerrit.cloudera.org:8080/#/c/17252/10/be/src/exec/scan-node.cc
File be/src/exec/scan-node.cc:

http://gerrit.cloudera.org:8080/#/c/17252/10/be/src/exec/scan-node.cc@236
PS10, Line 236:  Current "
  :"time(ms): $2",
I think this is redundant and potentially confuse users who try to debug query 
profile.
The information from wait_time (Maximum arrival delay) should be sufficient.
As far as I know, MonotonicMilis mostly used to calculate time interval, not to 
represent system time.

Besides, filter arrival time is already displayed under host profile like this:

  impala-executor-000-1:27000:
Filter 16 arrival: 662ms
Filter 14 arrival: 10s454ms
Filter 6 arrival: 10s514ms
Filter 8 arrival: 5m46s
Filter 9 arrival: 5m46s
Filter 10 arrival: 5m46s
Filter 0 arrival: 7m29s
Filter 1 arrival: 7m29s
Filter 2 arrival: 7m29s


http://gerrit.cloudera.org:8080/#/c/17252/10/be/src/runtime/coordinator.cc
File be/src/runtime/coordinator.cc:

http://gerrit.cloudera.org:8080/#/c/17252/10/be/src/runtime/coordinator.cc@1614
PS10, Line 1614: minmaxDisabled_ = true;
I think we should maintain the same behavior as bloom_filter_ in here, making 
the logic consistent for both filter type.

I understand that this change is made to distinguish between 1) an actual 
AlwaysTrue filter that originate from executor, vs 2) AlwaysTrue filter that 
comes from disabling by coordinator.

Instead, what if we have a new boolean variable to track 1). Say, a boolean 
variable 'always_true_minmax_received_'. Coordinator::FilterDebugString() then 
should refer this boolean variable instead of checking 
minmax_filterPB.always_true(). Similarly also for AlwaysFalse filter arrival.



--
To view, visit http://gerrit.cloudera.org:8080/17252
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I326317833979efcbe02ce6c95ad80133dd5c7964
Gerrit-Change-Number: 17252
Gerrit-PatchSet: 10
Gerrit-Owner: Qifan Chen 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Comment-Date: Thu, 08 Apr 2021 16:25:43 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-10647 Improve always-true min/max filter handling in coordinator

2021-04-15 Thread Riza Suminto (Code Review)
Riza Suminto has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17252 )

Change subject: IMPALA-10647 Improve always-true min/max filter handling in 
coordinator
..


Patch Set 21:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/17252/21/testdata/workloads/functional-query/queries/QueryTest/overlap_min_max_filters.test
File 
testdata/workloads/functional-query/queries/QueryTest/overlap_min_max_filters.test:

http://gerrit.cloudera.org:8080/#/c/17252/21/testdata/workloads/functional-query/queries/QueryTest/overlap_min_max_filters.test@285
PS21, Line 285: set minmax_filter_threshold=0.5;
  : select straight_join count(*) from
  : lineitem_orderkey_only a join [SHUFFLE] tpch_parquet.orders b
  : where a.l_orderkey = b.o_orderkey;
It looks like this test behave differently if filter arrived on time vs arrived 
late.
The jenkins failure seemingly because of both filters arrived late.
It is curious that late filter arrival lead to less row from 
lineitem_orderkey_only scan, not more.

In my local machine, if I add query option "set 
runtime_filter_wait_time_ms=5000;", this test pass.
But without setting wait time, an AlwaysFalse minmax filter seems to be 
broadcasted. This is a section of query profile where this test failed:


  lv-desktop:27000:
Filter 1 arrival: 1s367ms
Filter 0 arrival: 1s371ms
...
  Runtime filters: Not all filters arrived (arrived: [], missing [1, 
0]), waited for 785ms. Arrival delay: 1s000ms.
...
  Filter 1 (0):
 - Files processed: 0 (0)
 - Files rejected: 0 (0)
 - Files total: 0 (0)
 - InactiveTotalTime: 0.000ns
 - RowGroups processed: 0 (0)
 - RowGroups rejected: 0 (0)
 - RowGroups total: 1 (1)
 - Rows processed: 1.81M (1810767)
 - Rows rejected: 1.81M (1810767)
 - Rows total: 2.14M (2142543)
 - Splits processed: 0 (0)
 - Splits rejected: 0 (0)
 - Splits total: 0 (0)
 - TotalTime: 0.000ns



--
To view, visit http://gerrit.cloudera.org:8080/17252
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I326317833979efcbe02ce6c95ad80133dd5c7964
Gerrit-Change-Number: 17252
Gerrit-PatchSet: 21
Gerrit-Owner: Qifan Chen 
Gerrit-Reviewer: Fang-Yu Rao 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Reviewer: Wenzhe Zhou 
Gerrit-Comment-Date: Thu, 15 Apr 2021 21:07:29 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-10647 Improve always-true min/max filter handling in coordinator

2021-04-09 Thread Riza Suminto (Code Review)
Riza Suminto has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17252 )

Change subject: IMPALA-10647 Improve always-true min/max filter handling in 
coordinator
..


Patch Set 15:

(2 comments)

Thanks Qifan for adjusting the changes.
I have 2 comments for patch set 15.

http://gerrit.cloudera.org:8080/#/c/17252/15//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/17252/15//COMMIT_MSG@29
PS15, Line 29: Testing:
 :   1. Ran unit tests;
 :   2. Ran core tests.
Can we add a test to verify the content of MIN/MAX column in filter routing 
table?


http://gerrit.cloudera.org:8080/#/c/17252/15/be/src/runtime/coordinator.cc
File be/src/runtime/coordinator.cc:

http://gerrit.cloudera.org:8080/#/c/17252/15/be/src/runtime/coordinator.cc@659
PS15, Line 659: minmax_filterPB.has_always_true() && 
minmax_filterPB.always_true()
After a filter is disabled and broadcasted, isn't this always evaluate to True?
Maybe we should check state.AlwaysTrueFilterReceived() instead?



--
To view, visit http://gerrit.cloudera.org:8080/17252
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I326317833979efcbe02ce6c95ad80133dd5c7964
Gerrit-Change-Number: 17252
Gerrit-PatchSet: 15
Gerrit-Owner: Qifan Chen 
Gerrit-Reviewer: Fang-Yu Rao 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Reviewer: Wenzhe Zhou 
Gerrit-Comment-Date: Sat, 10 Apr 2021 01:19:33 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-10584: Defer advancing read page when reservation is tight.

2021-04-12 Thread Riza Suminto (Code Review)
Hello Quanlong Huang, Joe McDonnell, Bikramjeet Vig, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/17195

to look at the new patch set (#3).

Change subject: IMPALA-10584: Defer advancing read page when reservation is 
tight.
..

IMPALA-10584: Defer advancing read page when reservation is tight.

TestScratchLimit::test_with_unlimited_scratch_limit has been
intermittently crashing in ubuntu-16.04-dockerised-tests environment
after result spooling is enabled by default in IMPALA-9856. DCHECK
violation occurs in ReservationTracker::CheckConsistency() due to
BufferedTupleStream wrongly tries to reclaim memory reservation while
unpinning the stream.

For this bug to surface, all of the following needs to happen:
- Stream is in pinned mode.
- There are only 2 pages in the stream: 1 read and 1 write.
- Stream can not increase reservation anymore either due to memory
  pressure or low buffer/memory limit.
- The stream read page has been fully read and is attached to output
  RowBatch. But the output RowBatch has not cleaned up yet.
- BufferedTupleStream::UnpinStream is invoked.

The memory accounting bug happens because UnpinStream proceeds to
NextReadPage where the read page buffer was mistakenly assumed as
released. default_page_len_ bytes were added into
write_page_reservation_ and subsequently violates the total memory
reservation.

This patch fixes the bug by deferring advancement of the read iterator
in UnpinStream if the read page is attached to output RowBatch and there
is no more unused reservation. This is OK because after UnpinStream
finished, the stream is now in unpinned mode and has_read_write_page is
false. The next AddRow operation is then allowed to unpin the previous
write page first before reusing the reservation to allocate a new write
page.

Testing:
- Loop the TestScratchLimit::test_with_unlimited_scratch_limit in my
  local dev machine and verify that each test passed without triggering
  the DCHECK violation.
- Reenable result spooling in TestScratchLimit that was disabled in
  IMPALA-10559.
- Pass core tests.

Change-Id: I16137b6e423f190f60c3115a06ccd0f77e9f585a
---
M be/src/runtime/buffered-tuple-stream.cc
M tests/query_test/test_scratch_limit.py
2 files changed, 24 insertions(+), 8 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/95/17195/3
--
To view, visit http://gerrit.cloudera.org:8080/17195
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I16137b6e423f190f60c3115a06ccd0f77e9f585a
Gerrit-Change-Number: 17195
Gerrit-PatchSet: 3
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Riza Suminto 


[Impala-ASF-CR] IMPALA-10647 Improve always-true min/max filter handling in coordinator

2021-04-12 Thread Riza Suminto (Code Review)
Riza Suminto has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17252 )

Change subject: IMPALA-10647 Improve always-true min/max filter handling in 
coordinator
..


Patch Set 17:

(5 comments)

http://gerrit.cloudera.org:8080/#/c/17252/15//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/17252/15//COMMIT_MSG@29
PS15, Line 29: row group is processed. By comparing these two timestamps, one
 : can easily diagnose issues related to late arrival of min/max
 : filters.
> Consider adding these three tests for Final table as its content is sending
The test added in Patch set 17 looks good to me, thanks!


http://gerrit.cloudera.org:8080/#/c/17252/17/be/src/runtime/coordinator-filter-state.h
File be/src/runtime/coordinator-filter-state.h:

http://gerrit.cloudera.org:8080/#/c/17252/17/be/src/runtime/coordinator-filter-state.h@178
PS17, Line 178: True value means the always false flag in aggregated filter is 
flipped.
To further clarify the doc, please mention that a True value means the filter 
was flipped from True to False by coordinator.


http://gerrit.cloudera.org:8080/#/c/17252/15/be/src/runtime/coordinator.cc
File be/src/runtime/coordinator.cc:

http://gerrit.cloudera.org:8080/#/c/17252/15/be/src/runtime/coordinator.cc@659
PS15, Line 659:
> If AlwaysTrueFilterReceived() is true, then the accumulated filter is logic
New logic looks good to me. I'll continue my comments on patch set 17.


http://gerrit.cloudera.org:8080/#/c/17252/17/be/src/runtime/coordinator.cc
File be/src/runtime/coordinator.cc:

http://gerrit.cloudera.org:8080/#/c/17252/17/be/src/runtime/coordinator.cc@668
PS17, Line 668: state.AlwaysFalseFlipped()
The method name sounds ambiguous. Was it flipped from true to false, or false 
to true. Seems like the former. The method return true IF filter was an 
AlwaysFalse before being disabled (flipped to an AlwaysTrue) by coordinator.

Maybe "WasAlwaysFalse" is more descriptive?


http://gerrit.cloudera.org:8080/#/c/17252/17/be/src/util/min-max-filter.h
File be/src/util/min-max-filter.h:

http://gerrit.cloudera.org:8080/#/c/17252/17/be/src/util/min-max-filter.h@358
PS17, Line 358:
  : std::string DebugString(const MinMaxFilterPB& filter);
  : bool AlwaysTrue(const MinMaxFilterPB& filter);
  : bool AlwaysFalse(const MinMaxFilterPB& filter);
  : std::string DebugString(const ColumnValuePB& value);
Any reason not making these methods as static methods under MinMaxFilter class?



--
To view, visit http://gerrit.cloudera.org:8080/17252
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I326317833979efcbe02ce6c95ad80133dd5c7964
Gerrit-Change-Number: 17252
Gerrit-PatchSet: 17
Gerrit-Owner: Qifan Chen 
Gerrit-Reviewer: Fang-Yu Rao 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Reviewer: Wenzhe Zhou 
Gerrit-Comment-Date: Mon, 12 Apr 2021 18:59:06 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-10584: Defer advancing read page when reservation is tight.

2021-04-12 Thread Riza Suminto (Code Review)
Riza Suminto has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17195 )

Change subject: IMPALA-10584: Defer advancing read page when reservation is 
tight.
..


Patch Set 3:

It looks like the solution of this problem is much more simple than I thought: 
just don't advance the read page.

The initial reason why we are advancing read page in the first place is 
explained in IMPALA-8890 (review is in https://gerrit.cloudera.org/c/14144/): a 
DCHECK hit in ExpectedPinCount() because we tried to unpin a fully exhausted 
and attached read_page_. IMPALA-8890 choose to advance the read iterator to 
avoid this DCHECK.

Patch set 3 in this review choose to NOT advance the read page, but instead 
skip the first page (that is the read page) when iterating pages_ to call 
UnpinPageIfNeeded.


--
To view, visit http://gerrit.cloudera.org:8080/17195
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I16137b6e423f190f60c3115a06ccd0f77e9f585a
Gerrit-Change-Number: 17195
Gerrit-PatchSet: 3
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Comment-Date: Mon, 12 Apr 2021 20:18:24 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10647 Improve always-true min/max filter handling in coordinator

2021-04-12 Thread Riza Suminto (Code Review)
Riza Suminto has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17252 )

Change subject: IMPALA-10647 Improve always-true min/max filter handling in 
coordinator
..


Patch Set 18: Code-Review+1

(1 comment)

Looks good to me. Just have one more nit.
Please carry my +1 after fix.

http://gerrit.cloudera.org:8080/#/c/17252/18//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/17252/18//COMMIT_MSG@12
PS18, Line 12: always_false_flipped_
nit: always_false_flipped_to_false_



--
To view, visit http://gerrit.cloudera.org:8080/17252
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I326317833979efcbe02ce6c95ad80133dd5c7964
Gerrit-Change-Number: 17252
Gerrit-PatchSet: 18
Gerrit-Owner: Qifan Chen 
Gerrit-Reviewer: Fang-Yu Rao 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Reviewer: Wenzhe Zhou 
Gerrit-Comment-Date: Mon, 12 Apr 2021 21:00:32 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-10611: Fix flakiness in test wide row

2021-04-19 Thread Riza Suminto (Code Review)
Riza Suminto has uploaded this change for review. ( 
http://gerrit.cloudera.org:8080/17324


Change subject: IMPALA-10611: Fix flakiness in test_wide_row
..

IMPALA-10611: Fix flakiness in test_wide_row

test_wide_row has been intermittently failed with "Failed to allocate
row batch" error message. This is due to recent change in IMPALA-9856
that add query option max_row_size=10MB without raising the mem_limit.
This patch fix the flakiness by increasing the mem_limit from 100 MB to
132 MB to account for 32 MB reservation needed by BufferedPlanRootSink.

Testing:
- Loop the test in local dev machine.

Change-Id: Ie1f0b7d4d6b3a875d9b408f057d46fdbdbdf2a34
---
M tests/query_test/test_scanners.py
1 file changed, 4 insertions(+), 2 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/24/17324/1
--
To view, visit http://gerrit.cloudera.org:8080/17324
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: Ie1f0b7d4d6b3a875d9b408f057d46fdbdbdf2a34
Gerrit-Change-Number: 17324
Gerrit-PatchSet: 1
Gerrit-Owner: Riza Suminto 


[Impala-ASF-CR] IMPALA-10647 Improve always-true min/max filter handling in coordinator

2021-04-19 Thread Riza Suminto (Code Review)
Riza Suminto has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17252 )

Change subject: IMPALA-10647 Improve always-true min/max filter handling in 
coordinator
..


Patch Set 22: Code-Review+1


--
To view, visit http://gerrit.cloudera.org:8080/17252
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I326317833979efcbe02ce6c95ad80133dd5c7964
Gerrit-Change-Number: 17252
Gerrit-PatchSet: 22
Gerrit-Owner: Qifan Chen 
Gerrit-Reviewer: Fang-Yu Rao 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Reviewer: Wenzhe Zhou 
Gerrit-Comment-Date: Mon, 19 Apr 2021 18:04:32 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10647 Improve always-true min/max filter handling in coordinator

2021-04-14 Thread Riza Suminto (Code Review)
Riza Suminto has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17252 )

Change subject: IMPALA-10647 Improve always-true min/max filter handling in 
coordinator
..


Patch Set 21:

> Patch Set 21:
>
> Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/7067/ 
> DRY_RUN=false

I think we need to rerun the gerrit-verify-dryrun since jenkins.impala.io down 
yesterday.


--
To view, visit http://gerrit.cloudera.org:8080/17252
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I326317833979efcbe02ce6c95ad80133dd5c7964
Gerrit-Change-Number: 17252
Gerrit-PatchSet: 21
Gerrit-Owner: Qifan Chen 
Gerrit-Reviewer: Fang-Yu Rao 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Reviewer: Wenzhe Zhou 
Gerrit-Comment-Date: Wed, 14 Apr 2021 11:39:03 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10647 Improve always-true min/max filter handling in coordinator

2021-04-13 Thread Riza Suminto (Code Review)
Riza Suminto has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17252 )

Change subject: IMPALA-10647 Improve always-true min/max filter handling in 
coordinator
..


Patch Set 20: Code-Review+1


--
To view, visit http://gerrit.cloudera.org:8080/17252
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I326317833979efcbe02ce6c95ad80133dd5c7964
Gerrit-Change-Number: 17252
Gerrit-PatchSet: 20
Gerrit-Owner: Qifan Chen 
Gerrit-Reviewer: Fang-Yu Rao 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Reviewer: Wenzhe Zhou 
Gerrit-Comment-Date: Tue, 13 Apr 2021 16:01:26 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10584: Defer advancing read page when reservation is tight.

2021-04-13 Thread Riza Suminto (Code Review)
Hello Quanlong Huang, Joe McDonnell, Bikramjeet Vig, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/17195

to look at the new patch set (#4).

Change subject: IMPALA-10584: Defer advancing read page when reservation is 
tight.
..

IMPALA-10584: Defer advancing read page when reservation is tight.

TestScratchLimit::test_with_unlimited_scratch_limit has been
intermittently crashing in ubuntu-16.04-dockerised-tests environment
after result spooling is enabled by default in IMPALA-9856. DCHECK
violation occurs in ReservationTracker::CheckConsistency() due to
BufferedTupleStream wrongly tries to reclaim memory reservation while
unpinning the stream.

For this bug to surface, all of the following needs to happen:
- Stream is in pinned mode.
- There are only 2 pages in the stream: 1 read and 1 write.
- Stream can not increase reservation anymore either due to memory
  pressure or low buffer/memory limit.
- The stream read page has been fully read and is attached to output
  RowBatch. But the output RowBatch has not cleaned up yet.
- BufferedTupleStream::UnpinStream is invoked.

The memory accounting bug happens because UnpinStream proceeds to
NextReadPage where the read page buffer was mistakenly assumed as
released. default_page_len_ bytes were added into
write_page_reservation_ and subsequently violates the total memory
reservation.

This patch fixes the bug by deferring advancement of the read iterator
in UnpinStream if the read page is attached to output RowBatch and there
is no more unused reservation. This is OK because after UnpinStream
finished, the stream is now in unpinned mode and has_read_write_page is
false. The next AddRow operation is then allowed to unpin the previous
write page first before reusing the reservation to allocate a new write
page.

Testing:
- Add be test DeferAdvancingReadPage.
- Loop the TestScratchLimit::test_with_unlimited_scratch_limit in my
  local dev machine and verify that each test passed without triggering
  the DCHECK violation.
- Reenable result spooling in TestScratchLimit that was disabled in
  IMPALA-10559.
- Pass core tests.

Change-Id: I16137b6e423f190f60c3115a06ccd0f77e9f585a
---
M be/src/runtime/buffered-tuple-stream-test.cc
M be/src/runtime/buffered-tuple-stream.cc
M be/src/runtime/buffered-tuple-stream.h
M tests/query_test/test_scratch_limit.py
4 files changed, 97 insertions(+), 8 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/95/17195/4
--
To view, visit http://gerrit.cloudera.org:8080/17195
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I16137b6e423f190f60c3115a06ccd0f77e9f585a
Gerrit-Change-Number: 17195
Gerrit-PatchSet: 4
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Riza Suminto 


[Impala-ASF-CR] IMPALA-10584: Defer advancing read page when reservation is tight.

2021-04-13 Thread Riza Suminto (Code Review)
Riza Suminto has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17195 )

Change subject: IMPALA-10584: Defer advancing read page when reservation is 
tight.
..


Patch Set 4:

(2 comments)

Patch set 4 submitted.

http://gerrit.cloudera.org:8080/#/c/17195/3/be/src/runtime/buffered-tuple-stream-test.cc
File be/src/runtime/buffered-tuple-stream-test.cc:

http://gerrit.cloudera.org:8080/#/c/17195/3/be/src/runtime/buffered-tuple-stream-test.cc@1237
PS3, Line 1237: TEST_F(SimpleTupleStreamTest, UnpinReadPage) {
> It is possible to reproduce the bug with some changes in this test?
Added DeferAdvancingReadPage to test this corner case.


http://gerrit.cloudera.org:8080/#/c/17195/3/be/src/runtime/buffered-tuple-stream.cc
File be/src/runtime/buffered-tuple-stream.cc:

http://gerrit.cloudera.org:8080/#/c/17195/3/be/src/runtime/buffered-tuple-stream.cc@723
PS3, Line 723:
> Does this assume the size of the read page is default_page_len_? Should we
That is a good point, thanks!
However, replacing it with read_it_.read_page_->len() caught this DCHECK 
failure: "buffer-pool.cc:93] Check failed: is_open()".
I think this because the the page handle has been closed in 
AttachBufferToBatch().
I made a work around by memorizing the page len in last_attached_page_len_ 
variable.



--
To view, visit http://gerrit.cloudera.org:8080/17195
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I16137b6e423f190f60c3115a06ccd0f77e9f585a
Gerrit-Change-Number: 17195
Gerrit-PatchSet: 4
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Comment-Date: Tue, 13 Apr 2021 19:07:49 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-10584: Defer advancing read page when reservation is tight.

2021-04-13 Thread Riza Suminto (Code Review)
Riza Suminto has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17195 )

Change subject: IMPALA-10584: Defer advancing read page when reservation is 
tight.
..


Patch Set 5:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/17195/4/be/src/runtime/buffered-tuple-stream.cc
File be/src/runtime/buffered-tuple-stream.cc:

http://gerrit.cloudera.org:8080/#/c/17195/4/be/src/runtime/buffered-tuple-stream.cc@733
PS4, Line 733: dvancing_read_page = true;
> This DCHECK was part of my debugging when investigating the bug.
Done


http://gerrit.cloudera.org:8080/#/c/17195/4/be/src/runtime/buffered-tuple-stream.cc@750
PS4, Line 750:
> Ack
Done. Reword it a bit.



--
To view, visit http://gerrit.cloudera.org:8080/17195
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I16137b6e423f190f60c3115a06ccd0f77e9f585a
Gerrit-Change-Number: 17195
Gerrit-PatchSet: 5
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Comment-Date: Tue, 13 Apr 2021 20:41:01 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-10584: Defer advancing read page when reservation is tight.

2021-04-13 Thread Riza Suminto (Code Review)
Riza Suminto has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17195 )

Change subject: IMPALA-10584: Defer advancing read page when reservation is 
tight.
..


Patch Set 4:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/17195/4/be/src/runtime/buffered-tuple-stream.cc
File be/src/runtime/buffered-tuple-stream.cc:

http://gerrit.cloudera.org:8080/#/c/17195/4/be/src/runtime/buffered-tuple-stream.cc@733
PS4, Line 733: read_it_.read_page_ == pages_.begin()
> I wonder if this assertion is always true, since in pinned mode, there can
This DCHECK was part of my debugging when investigating the bug.
The intent is to match assertion in NextReadPage (buffered-tuple-stream.cc:529).

Agree that this DCHECK most likely will always be true since we check earlier 
that attached_to_output_batch == true.
Will remove this DCHECK in next patch set.


http://gerrit.cloudera.org:8080/#/c/17195/4/be/src/runtime/buffered-tuple-stream.cc@750
PS4, Line 750: unpin this first page
> May add a comment on when this currently unpinned page is finally unpinned.
Ack



--
To view, visit http://gerrit.cloudera.org:8080/17195
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I16137b6e423f190f60c3115a06ccd0f77e9f585a
Gerrit-Change-Number: 17195
Gerrit-PatchSet: 4
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Comment-Date: Tue, 13 Apr 2021 20:06:19 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-10584: Defer advancing read page when reservation is tight.

2021-04-13 Thread Riza Suminto (Code Review)
Hello Quanlong Huang, Qifan Chen, Joe McDonnell, Bikramjeet Vig, Impala Public 
Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/17195

to look at the new patch set (#5).

Change subject: IMPALA-10584: Defer advancing read page when reservation is 
tight.
..

IMPALA-10584: Defer advancing read page when reservation is tight.

TestScratchLimit::test_with_unlimited_scratch_limit has been
intermittently crashing in ubuntu-16.04-dockerised-tests environment
after result spooling is enabled by default in IMPALA-9856. DCHECK
violation occurs in ReservationTracker::CheckConsistency() due to
BufferedTupleStream wrongly tries to reclaim memory reservation while
unpinning the stream.

For this bug to surface, all of the following needs to happen:
- Stream is in pinned mode.
- There are only 2 pages in the stream: 1 read and 1 write.
- Stream can not increase reservation anymore either due to memory
  pressure or low buffer/memory limit.
- The stream read page has been fully read and is attached to output
  RowBatch. But the output RowBatch has not cleaned up yet.
- BufferedTupleStream::UnpinStream is invoked.

The memory accounting bug happens because UnpinStream proceeds to
NextReadPage where the read page buffer was mistakenly assumed as
released. default_page_len_ bytes were added into
write_page_reservation_ and subsequently violates the total memory
reservation.

This patch fixes the bug by deferring advancement of the read iterator
in UnpinStream if the read page is attached to output RowBatch and there
is no more unused reservation. This is OK because after UnpinStream
finished, the stream is now in unpinned mode and has_read_write_page is
false. The next AddRow operation is then allowed to unpin the previous
write page first before reusing the reservation to allocate a new write
page.

Testing:
- Add be test DeferAdvancingReadPage.
- Loop the TestScratchLimit::test_with_unlimited_scratch_limit in my
  local dev machine and verify that each test passed without triggering
  the DCHECK violation.
- Reenable result spooling in TestScratchLimit that was disabled in
  IMPALA-10559.
- Pass core tests.

Change-Id: I16137b6e423f190f60c3115a06ccd0f77e9f585a
---
M be/src/runtime/buffered-tuple-stream-test.cc
M be/src/runtime/buffered-tuple-stream.cc
M be/src/runtime/buffered-tuple-stream.h
M tests/query_test/test_scratch_limit.py
4 files changed, 96 insertions(+), 8 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/95/17195/5
--
To view, visit http://gerrit.cloudera.org:8080/17195
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I16137b6e423f190f60c3115a06ccd0f77e9f585a
Gerrit-Change-Number: 17195
Gerrit-PatchSet: 5
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Riza Suminto 


[Impala-ASF-CR] IMPALA-10584: Defer advancing read page when reservation is tight.

2021-04-14 Thread Riza Suminto (Code Review)
Riza Suminto has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17195 )

Change subject: IMPALA-10584: Defer advancing read page when reservation is 
tight.
..


Patch Set 5:

(1 comment)

s

http://gerrit.cloudera.org:8080/#/c/17195/5/be/src/runtime/buffered-tuple-stream.cc
File be/src/runtime/buffered-tuple-stream.cc:

http://gerrit.cloudera.org:8080/#/c/17195/5/be/src/runtime/buffered-tuple-stream.cc@728
PS5, Line 728: if (buffer_pool_client_->GetUnusedReservation() < 
last_attached_page_len_) {
 :   // Since attached_to_output_batch is true and 
GetUnusedReservation() is less
 :   // than last_attached_page_len_, the reader is most 
likely has not clean up the
 :   // RowBatch where the read page is attached to. We 
defer advancing the read page
 :   // until the next GetNext() call by the reader (see 
IMPALA-10584).
 :   defer_advancing_read_page = true;
 : }
> Your understanding is correct. Ideally, BufferedTupleStream should be 100%
Ok, after some investigation, I'm not confident I can implement solution 2) 
safely. The refactoring will touch several code that I don't fully understand 
yet such as grouping-aggregator, partitioned-hash-join-builder, and 
analytic-eval-node.

What if I change the logic into this instead:

  defer_advancing_read_page = num_pages_ <= 2;

Basically, we avoid getting into situation where we ended up with only 1 
read-write page while unpinning the stream. I think this is also a valid reason 
not to advance the read page (the other reason is that the reader has not freed 
the attached buffer). The "stealing" still won't happen and the DCHECK still 
won't hit.



--
To view, visit http://gerrit.cloudera.org:8080/17195
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I16137b6e423f190f60c3115a06ccd0f77e9f585a
Gerrit-Change-Number: 17195
Gerrit-PatchSet: 5
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Comment-Date: Wed, 14 Apr 2021 21:27:30 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-10584: Defer advancing read page when reservation is tight.

2021-04-14 Thread Riza Suminto (Code Review)
Riza Suminto has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17195 )

Change subject: IMPALA-10584: Defer advancing read page when reservation is 
tight.
..


Patch Set 5:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/17195/5/be/src/runtime/buffered-tuple-stream.cc
File be/src/runtime/buffered-tuple-stream.cc:

http://gerrit.cloudera.org:8080/#/c/17195/5/be/src/runtime/buffered-tuple-stream.cc@728
PS5, Line 728: if (buffer_pool_client_->GetUnusedReservation() < 
last_attached_page_len_) {
 :   // Since attached_to_output_batch is true and 
GetUnusedReservation() is less
 :   // than last_attached_page_len_, the reader is most 
likely has not clean up the
 :   // RowBatch where the read page is attached to. We 
defer advancing the read page
 :   // until the next GetNext() call by the reader (see 
IMPALA-10584).
 :   defer_advancing_read_page = true;
 : }
> Maybe I'm misunderstanding something. This still seems tricky for me. I agr
Your understanding is correct. Ideally, BufferedTupleStream should be 100% 
aware whether its client has freed the attached buffer or not. It raise a 
question though on how to do that, because the stream lost reference to the 
buffer once it attach the buffer to output row batch.
In relation of BufferedPlanRootSink and SpillableRowBatchQueue, I can think of 
2 solution:

1). Track the amount of  both used and unused reservation at the end of 
GetNext() call. We can then check the reservation amount again in UnpinStream() 
to determine whether client has freed the buffer or not. However, this can be 
tricky and lead to more corner cases such as what if we increase/reduce the 
total reservation in between?

2). Add an explicit method in BufferedTupleStream to signal the free. 
SpillableRowBatchQueue then need similar method as well to chain the signal 
down from BufferedPlanRootSink to BufferedTupleStream. The mechanics might 
becomes awkward, but it maintain correctness.

I'll try to implement solution 2) first.



--
To view, visit http://gerrit.cloudera.org:8080/17195
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I16137b6e423f190f60c3115a06ccd0f77e9f585a
Gerrit-Change-Number: 17195
Gerrit-PatchSet: 5
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Comment-Date: Wed, 14 Apr 2021 18:13:35 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-10596: De-flake TestAdmissionControllerStress

2021-04-09 Thread Riza Suminto (Code Review)
Riza Suminto has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17272 )

Change subject: IMPALA-10596: De-flake TestAdmissionControllerStress
..


Patch Set 1: Code-Review+1

Looks good to me, thanks Bikram for digging into this.
Agree that result spooling is not necessary for this test.


--
To view, visit http://gerrit.cloudera.org:8080/17272
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I1594a82f507db8dc094d4441dd8739d037f974ff
Gerrit-Change-Number: 17272
Gerrit-PatchSet: 1
Gerrit-Owner: Bikramjeet Vig 
Gerrit-Reviewer: Andrew Sherman 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Comment-Date: Fri, 09 Apr 2021 17:35:16 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10583: Fix bug on unbounded result spooling memory config.

2021-04-09 Thread Riza Suminto (Code Review)
Riza Suminto has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17187 )

Change subject: IMPALA-10583: Fix bug on unbounded result spooling memory 
config.
..


Patch Set 1:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/17187/1/be/src/exec/buffered-plan-root-sink.cc
File be/src/exec/buffered-plan-root-sink.cc:

http://gerrit.cloudera.org:8080/#/c/17187/1/be/src/exec/buffered-plan-root-sink.cc@55
PS1, Line 55:   if (max_spilled_mem <= 0) max_spilled_mem = INT64_MAX;
> In practice, this would eventually encounter other limits such as running o
The closest limit I find is scratch_dirs startup flag. In this flag, we define 
the path of scratch dirs and capacity quota (maximum bytes allowed to spill) 
for each dirs. But the quota specification itself is optional. So without quota 
being specified, I suppose we depend on OS to tell impala when disk space has 
ran out.
https://impala.apache.org/docs/build/html/topics/impala_disk_space.html



--
To view, visit http://gerrit.cloudera.org:8080/17187
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: If8f5e3668281bba8813f8082f45b4faa7721530e
Gerrit-Change-Number: 17187
Gerrit-PatchSet: 1
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Comment-Date: Fri, 09 Apr 2021 17:26:32 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-10584: Defer advancing read page if stream only has 2 pages.

2021-04-14 Thread Riza Suminto (Code Review)
Riza Suminto has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17195 )

Change subject: IMPALA-10584: Defer advancing read page if stream only has 2 
pages.
..


Patch Set 6:

(8 comments)

Patch set 5 change the condition on when to NOT advance the read page.
Read page will not be advanced in UnpinStream if read page is attached to 
output RowBatch and there are only 2 pages in the stream.

http://gerrit.cloudera.org:8080/#/c/17195/5/be/src/runtime/buffered-tuple-stream-test.cc
File be/src/runtime/buffered-tuple-stream-test.cc:

http://gerrit.cloudera.org:8080/#/c/17195/5/be/src/runtime/buffered-tuple-stream-test.cc@1271
PS5, Line 1271: 
ASSERT_OK(stream.UnpinStream(BufferedTupleStream::UNPIN_ALL_EXCEPT_CURRENT));
> Not related to this patch, but I'm confused how do we verify we have advanc
I think this verify the read page has been advanced by not hitting DCHECK at 
ExpectedPinCount().


http://gerrit.cloudera.org:8080/#/c/17195/5/be/src/runtime/buffered-tuple-stream-test.cc@1300
PS5, Line 1300: // Test that UnpinStream defer advancing the read page when all 
rows from the read page
> Could you add some comments above this? e.g.
Done


http://gerrit.cloudera.org:8080/#/c/17195/5/be/src/runtime/buffered-tuple-stream-test.cc@1333
PS5, Line 1333: ASSERT_TRUE(read_batch.num_rows() < num_rows);
> Could you add a comment here that we are adding rows without releasing the
Done


http://gerrit.cloudera.org:8080/#/c/17195/5/be/src/runtime/buffered-tuple-stream-test.cc@1334
PS5, Line 1334: ASSERT_TRUE(!eos);
> Could you leave a comment about why do we run this twice? I guess we want t
Done


http://gerrit.cloudera.org:8080/#/c/17195/5/be/src/runtime/buffered-tuple-stream-test.cc@1354
PS5, Line 1354:   // Retry inserting this row by decreasing the index.
> Could you add ASSERT_FALSE(stream.is_pinned()) at the end?
Done


http://gerrit.cloudera.org:8080/#/c/17195/5/be/src/runtime/buffered-tuple-stream-test.cc@1356
PS5, Line 1356:   // even if we're not immediately cleaning up the 
read_batch.
> Do we need read_batch.Reset() as well?
Done


http://gerrit.cloudera.org:8080/#/c/17195/5/be/src/runtime/buffered-tuple-stream.cc
File be/src/runtime/buffered-tuple-stream.cc:

http://gerrit.cloudera.org:8080/#/c/17195/5/be/src/runtime/buffered-tuple-stream.cc@728
PS5, Line 728:   // defer advancing the read page until the next 
GetNext() call by the reader
 :   // (see IMPALA-10584).
 :   defer_advancing_read_page = true;
 : }
 :   }
 :
 :   if 
> Ok, after some investigation, I'm not confident I can implement solution 2)
Marking this as Done. Lets continue this discussion in patch set 5.


http://gerrit.cloudera.org:8080/#/c/17195/5/be/src/runtime/buffered-tuple-stream.cc@747
PS5, Line 747:
> May DCHECK() on this assertion here.
Done



--
To view, visit http://gerrit.cloudera.org:8080/17195
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I16137b6e423f190f60c3115a06ccd0f77e9f585a
Gerrit-Change-Number: 17195
Gerrit-PatchSet: 6
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Comment-Date: Wed, 14 Apr 2021 22:22:31 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-10584: Defer advancing read page if stream only has 2 pages.

2021-04-14 Thread Riza Suminto (Code Review)
Hello Quanlong Huang, Qifan Chen, Joe McDonnell, Bikramjeet Vig, Impala Public 
Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/17195

to look at the new patch set (#6).

Change subject: IMPALA-10584: Defer advancing read page if stream only has 2 
pages.
..

IMPALA-10584: Defer advancing read page if stream only has 2 pages.

TestScratchLimit::test_with_unlimited_scratch_limit has been
intermittently crashing in ubuntu-16.04-dockerised-tests environment
after result spooling is enabled by default in IMPALA-9856. DCHECK
violation occurs in ReservationTracker::CheckConsistency() due to
BufferedTupleStream wrongly tries to reclaim memory reservation while
unpinning the stream.

For this bug to surface, all of the following needs to happen:
- Stream is in pinned mode.
- There are only 2 pages in the stream: 1 read and 1 write.
- Stream can not increase reservation anymore either due to memory
  pressure or low buffer/memory limit.
- The stream read page has been fully read and is attached to output
  RowBatch. But the output RowBatch has not cleaned up yet.
- BufferedTupleStream::UnpinStream is invoked.

The memory accounting bug happens because UnpinStream proceeds to
NextReadPage where the read page buffer was mistakenly assumed as
released. default_page_len_ bytes were added into
write_page_reservation_ and subsequently violates the total memory
reservation.

This patch fixes the bug by deferring advancement of the read iterator
in UnpinStream if the read page is attached to output RowBatch and there
are only 2 pages in the stream. This is OK because after UnpinStream
finished, the stream is now in unpinned mode and has_read_write_page is
false. The next AddRow operation is then allowed to unpin the previous
write page first before reusing the reservation to allocate a new write
page. The next GetNext call will be responsible to advance the read
page.

Testing:
- Add be test DeferAdvancingReadPage.
- Loop the TestScratchLimit::test_with_unlimited_scratch_limit in my
  local dev machine and verify that each test passed without triggering
  the DCHECK violation.
- Reenable result spooling in TestScratchLimit that was disabled in
  IMPALA-10559.
- Pass core tests.

Change-Id: I16137b6e423f190f60c3115a06ccd0f77e9f585a
---
M be/src/runtime/buffered-tuple-stream-test.cc
M be/src/runtime/buffered-tuple-stream.cc
M be/src/runtime/buffered-tuple-stream.h
M tests/query_test/test_scratch_limit.py
4 files changed, 100 insertions(+), 8 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/95/17195/6
--
To view, visit http://gerrit.cloudera.org:8080/17195
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I16137b6e423f190f60c3115a06ccd0f77e9f585a
Gerrit-Change-Number: 17195
Gerrit-PatchSet: 6
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Riza Suminto 


[Impala-ASF-CR] IMPALA-10584: Defer advancing read page if stream only has 2 pages.

2021-04-14 Thread Riza Suminto (Code Review)
Riza Suminto has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17195 )

Change subject: IMPALA-10584: Defer advancing read page if stream only has 2 
pages.
..


Patch Set 6:

> Patch Set 6:
>
> (8 comments)
>
> Patch set 5 change the condition on when to NOT advance the read page.
> Read page will not be advanced in UnpinStream if read page is attached to 
> output RowBatch and there are only 2 pages in the stream.

Sorry, I should mention patch set 6 instead of 5.


--
To view, visit http://gerrit.cloudera.org:8080/17195
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I16137b6e423f190f60c3115a06ccd0f77e9f585a
Gerrit-Change-Number: 17195
Gerrit-PatchSet: 6
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Comment-Date: Wed, 14 Apr 2021 22:43:22 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10497: Fix flakiness in test no fd caching on cached data.

2021-02-16 Thread Riza Suminto (Code Review)
Hello Joe McDonnell, Bikramjeet Vig, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/17054

to look at the new patch set (#3).

Change subject: IMPALA-10497: Fix flakiness in 
test_no_fd_caching_on_cached_data.
..

IMPALA-10497: Fix flakiness in test_no_fd_caching_on_cached_data.

test_no_fd_caching_on_cached_data has been flaky for not having all of
the data fully cached in the warm-up phase. There is a limit on
concurrency in writing to the cache such that we may fail to cache data
the first time read it. This patch fixes the test by repeating the
warm-up query 5 times. This patch also reduce the test cluster size to
just 1 to avoid potential flakiness from multiple impalad racing to
delete the same old fata cache files.

Testing:
- Loop the test manually 100 times and see no more failures.

Change-Id: I774f9dfea7dcc107c3c7f2b76db3aaf4b2dd7952
---
M tests/custom_cluster/test_hdfs_fd_caching.py
1 file changed, 8 insertions(+), 5 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/54/17054/3
--
To view, visit http://gerrit.cloudera.org:8080/17054
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I774f9dfea7dcc107c3c7f2b76db3aaf4b2dd7952
Gerrit-Change-Number: 17054
Gerrit-PatchSet: 3
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Riza Suminto 


[Impala-ASF-CR] IMPALA-10497: Fix flakiness in test no fd caching on cached data.

2021-02-16 Thread Riza Suminto (Code Review)
Riza Suminto has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17054 )

Change subject: IMPALA-10497: Fix flakiness in 
test_no_fd_caching_on_cached_data.
..


Patch Set 3:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/17054/2/tests/custom_cluster/test_hdfs_fd_caching.py
File tests/custom_cluster/test_hdfs_fd_caching.py:

http://gerrit.cloudera.org:8080/#/c/17054/2/tests/custom_cluster/test_hdfs_fd_caching.py@168
PS2, Line 168: 500MB",
> Under the covers, custom cluster test uses bin/start-impala-cluster.py and
Done. Thanks for pointing out this solution.



--
To view, visit http://gerrit.cloudera.org:8080/17054
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I774f9dfea7dcc107c3c7f2b76db3aaf4b2dd7952
Gerrit-Change-Number: 17054
Gerrit-PatchSet: 3
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Comment-Date: Tue, 16 Feb 2021 21:10:58 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-10497: Fix flakiness in test no fd caching on cached data.

2021-02-16 Thread Riza Suminto (Code Review)
Riza Suminto has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17054 )

Change subject: IMPALA-10497: Fix flakiness in 
test_no_fd_caching_on_cached_data.
..


Patch Set 4:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/17054/3//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/17054/3//COMMIT_MSG@13
PS3, Line 13: This patch also add a proper start_args to the
: test so that each impalad will write their data cache file in 
their own
: directory.
> Oh, can you revise this statement?
Done. Thanks for catching this!



--
To view, visit http://gerrit.cloudera.org:8080/17054
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I774f9dfea7dcc107c3c7f2b76db3aaf4b2dd7952
Gerrit-Change-Number: 17054
Gerrit-PatchSet: 4
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Comment-Date: Tue, 16 Feb 2021 22:10:50 +
Gerrit-HasComments: Yes


<    1   2   3   4   5   6   7   8   9   10   >