Yida Wu has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16318 )

Change subject: IMPALA-9867: Add Support for Spilling to S3: Milestone 1
......................................................................


Patch Set 35:

(4 comments)

http://gerrit.cloudera.org:8080/#/c/16318/35/be/src/runtime/tmp-file-mgr.cc
File be/src/runtime/tmp-file-mgr.cc:

http://gerrit.cloudera.org:8080/#/c/16318/35/be/src/runtime/tmp-file-mgr.cc@1795
PS35, Line 1795:         cur_write_range_->io_ctx()->Cancel();
> I think this Cancel() is undesirable, because it means there are two differ
Removed the Cancel() logic. I think it is okay to use the current mechanism 
only, which is to cancel the query by passing the cancel status to 
write_callback(). Because to cancel a query would be fast and may return the 
buffer that the query owns to the pool (the returned buffer could be in other 
servers) to make some progress for other queries in the system, in this case, 
the next query to wait for the buffer may not be blocked in 
ReserveLocalBufferSpace for a long time.


http://gerrit.cloudera.org:8080/#/c/16318/35/be/src/runtime/tmp-file-mgr.cc@1797
PS35, Line 1797:         status = TMP_FILE_BUFFER_POOL_CONTEXT_CANCELLED;
> As discussed, this should be a human-readable error message that explains t
Done. The new error msg would be like "ERROR: Cancelled in TmpFileBufferPool 
because: Timeout waiting for a local buffer in 60 seconds".


http://gerrit.cloudera.org:8080/#/c/16318/35/be/src/runtime/tmp-file-mgr.cc@1941
PS35, Line 1941:       timeout = true;
> It'd be clearer to just return here, instead of break then return below.
Done


http://gerrit.cloudera.org:8080/#/c/16318/35/tests/custom_cluster/test_scratch_disk.py
File tests/custom_cluster/test_scratch_disk.py:

http://gerrit.cloudera.org:8080/#/c/16318/35/tests/custom_cluster/test_scratch_disk.py@260
PS35, Line 260:   @pytest.mark.execute_serially
> Do these tests actually read back the data from remote storage? Or do they
Added limits to the size of local buffer dir, so that, at least part of the 
data would be read from the remote storage because the total size of spilled 
data is bigger than the local buffer limit, some of the local buffers must be 
evicted during the query.
Added a jira IMPALA-10508 to add metrics to identify the amount of data reading 
from the local buffer and the remote filesystem. The metrics could help to give 
more accurate numbers.



--
To view, visit http://gerrit.cloudera.org:8080/16318
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I419b1d5dbbfe35334d9f964c4b65e553579fdc89
Gerrit-Change-Number: 16318
Gerrit-PatchSet: 35
Gerrit-Owner: Yida Wu <[email protected]>
Gerrit-Reviewer: Abhishek Rawat <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Sahil Takiar <[email protected]>
Gerrit-Reviewer: Tim Armstrong <[email protected]>
Gerrit-Reviewer: Yida Wu <[email protected]>
Gerrit-Comment-Date: Mon, 15 Feb 2021 14:59:56 +0000
Gerrit-HasComments: Yes

Reply via email to