[Impala-ASF-CR] IMPALA-9867: Add Support for Spilling to S3: Milestone 1

Tim Armstrong (Code Review) Fri, 12 Feb 2021 14:33:21 -0800

Tim Armstrong has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16318 )


Change subject: IMPALA-9867: Add Support for Spilling to S3: Milestone 1
......................................................................


Patch Set 35:

(4 comments)

I think we should merge this once a few small issues are fixed. After merging I 
would like to see more stress and perf testing before we consider it 
produciton-ready, but I think at this point the code is best in master.

http://gerrit.cloudera.org:8080/#/c/16318/35/be/src/runtime/tmp-file-mgr.cc
File be/src/runtime/tmp-file-mgr.cc:

http://gerrit.cloudera.org:8080/#/c/16318/35/be/src/runtime/tmp-file-mgr.cc@1795
PS35, Line 1795:         cur_write_range_->io_ctx()->Cancel();
I think this Cancel() is undesirable, because it means there are two different 
code paths that the query failure propagates by, and we may have different 
errors from them. I think passing status to write_callback() below is enough to 
cancel the query, but maybe we need to set an additional flag here to prevent 
different threads from getting blocked in ReserveLocalBufferSpace and instead 
pass the same timeout error to write_callback()


http://gerrit.cloudera.org:8080/#/c/16318/35/be/src/runtime/tmp-file-mgr.cc@1797
PS35, Line 1797:         status = TMP_FILE_BUFFER_POOL_CONTEXT_CANCELLED;
As discussed, this should be a human-readable error message that explains that 
the spilling was cancelled after waiting for x seconds without getting a local 
file buffer.


http://gerrit.cloudera.org:8080/#/c/16318/35/be/src/runtime/tmp-file-mgr.cc@1941
PS35, Line 1941:       timeout = true;
It'd be clearer to just return here, instead of break then return below.


http://gerrit.cloudera.org:8080/#/c/16318/35/tests/custom_cluster/test_scratch_disk.py
File tests/custom_cluster/test_scratch_disk.py:

http://gerrit.cloudera.org:8080/#/c/16318/35/tests/custom_cluster/test_scratch_disk.py@260
PS35, Line 260:   @pytest.mark.execute_serially
Do these tests actually read back the data from remote storage? Or do they read 
from in-memory and the local files.

I.e. did you confirm that they are reading spilled data from HDFS?

If they are reading it back from in memory, maybe set 
--buffer_pool_clean_pages_limit to a lower value so that more clean pages are 
evicted



--
To view, visit http://gerrit.cloudera.org:8080/16318
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I419b1d5dbbfe35334d9f964c4b65e553579fdc89
Gerrit-Change-Number: 16318
Gerrit-PatchSet: 35
Gerrit-Owner: Yida Wu <[email protected]>
Gerrit-Reviewer: Abhishek Rawat <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Sahil Takiar <[email protected]>
Gerrit-Reviewer: Tim Armstrong <[email protected]>
Gerrit-Reviewer: Yida Wu <[email protected]>
Gerrit-Comment-Date: Fri, 12 Feb 2021 22:33:11 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-9867: Add Support for Spilling to S3: Milestone 1

Reply via email to