Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/16318 )
Change subject: IMPALA-9867: Add Support for Spilling to S3: Milestone 1 ...................................................................... Patch Set 35: (4 comments) I think we should merge this once a few small issues are fixed. After merging I would like to see more stress and perf testing before we consider it produciton-ready, but I think at this point the code is best in master. http://gerrit.cloudera.org:8080/#/c/16318/35/be/src/runtime/tmp-file-mgr.cc File be/src/runtime/tmp-file-mgr.cc: http://gerrit.cloudera.org:8080/#/c/16318/35/be/src/runtime/tmp-file-mgr.cc@1795 PS35, Line 1795: cur_write_range_->io_ctx()->Cancel(); I think this Cancel() is undesirable, because it means there are two different code paths that the query failure propagates by, and we may have different errors from them. I think passing status to write_callback() below is enough to cancel the query, but maybe we need to set an additional flag here to prevent different threads from getting blocked in ReserveLocalBufferSpace and instead pass the same timeout error to write_callback() http://gerrit.cloudera.org:8080/#/c/16318/35/be/src/runtime/tmp-file-mgr.cc@1797 PS35, Line 1797: status = TMP_FILE_BUFFER_POOL_CONTEXT_CANCELLED; As discussed, this should be a human-readable error message that explains that the spilling was cancelled after waiting for x seconds without getting a local file buffer. http://gerrit.cloudera.org:8080/#/c/16318/35/be/src/runtime/tmp-file-mgr.cc@1941 PS35, Line 1941: timeout = true; It'd be clearer to just return here, instead of break then return below. http://gerrit.cloudera.org:8080/#/c/16318/35/tests/custom_cluster/test_scratch_disk.py File tests/custom_cluster/test_scratch_disk.py: http://gerrit.cloudera.org:8080/#/c/16318/35/tests/custom_cluster/test_scratch_disk.py@260 PS35, Line 260: @pytest.mark.execute_serially Do these tests actually read back the data from remote storage? Or do they read from in-memory and the local files. I.e. did you confirm that they are reading spilled data from HDFS? If they are reading it back from in memory, maybe set --buffer_pool_clean_pages_limit to a lower value so that more clean pages are evicted -- To view, visit http://gerrit.cloudera.org:8080/16318 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I419b1d5dbbfe35334d9f964c4b65e553579fdc89 Gerrit-Change-Number: 16318 Gerrit-PatchSet: 35 Gerrit-Owner: Yida Wu <[email protected]> Gerrit-Reviewer: Abhishek Rawat <[email protected]> Gerrit-Reviewer: Impala Public Jenkins <[email protected]> Gerrit-Reviewer: Sahil Takiar <[email protected]> Gerrit-Reviewer: Tim Armstrong <[email protected]> Gerrit-Reviewer: Yida Wu <[email protected]> Gerrit-Comment-Date: Fri, 12 Feb 2021 22:33:11 +0000 Gerrit-HasComments: Yes
