Tim Armstrong has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16318 )

Change subject: IMPALA-9867: Add Support for Spilling to S3: Milestone 1
......................................................................


Patch Set 30:

(10 comments)

I think I'm pretty close to approving it - the big thing is that I wanted to 
understand why the deadlock you were mentioning could occur. That one makes me 
think that we might be missing a flaw in how the local buffer directories work, 
or similar.

Otherwise looking pretty good :)

http://gerrit.cloudera.org:8080/#/c/16318/30//COMMIT_MSG
Commit Message:

PS30:
Following on from our discussion about a potential deadlock, I do think there 
is a limitation that, if there are N local buffer files, only N 
spilling-to-remote queries can make progress at the same time. I think these 
queries almost always will have a partially-written file with a local buffer 
associated, so they may block other queries from spilling until they have 
completed and released that buffer. I think a query can be in this state 
indefinitely. E.g. if it was spilling but is now off doing some other work.

This isn't ideal but is maybe OK if N is high enough, at least for milestone 1.

I *think* that each query should be able to make progress independent of other 
queries. We would have a deadlock if query A depended on B to release a local 
buffer file to make progress and B depended directly or indirectly on A 
releasing some other resource to make progress. I haven't seen anything that 
looks like that yet.


http://gerrit.cloudera.org:8080/#/c/16318/30//COMMIT_MSG@112
PS30, Line 112: * Ran pre-review-test
Have you tested in combination with disk spill compression?


http://gerrit.cloudera.org:8080/#/c/16318/30/be/src/runtime/io/disk-io-mgr.cc
File be/src/runtime/io/disk-io-mgr.cc:

http://gerrit.cloudera.org:8080/#/c/16318/30/be/src/runtime/io/disk-io-mgr.cc@328
PS30, Line 328: 0
I think we should probably set the replication to 1 for this use case - that 
would reduce space requirement for HDFS spilling by 3x.


http://gerrit.cloudera.org:8080/#/c/16318/30/be/src/runtime/io/local-file-system.h
File be/src/runtime/io/local-file-system.h:

http://gerrit.cloudera.org:8080/#/c/16318/30/be/src/runtime/io/local-file-system.h@42
PS30, Line 42: uint8
nit: uint8_t


http://gerrit.cloudera.org:8080/#/c/16318/30/be/src/runtime/io/local-file-system.cc
File be/src/runtime/io/local-file-system.cc:

http://gerrit.cloudera.org:8080/#/c/16318/30/be/src/runtime/io/local-file-system.cc@98
PS30, Line 98: uint8
nit: uint8_t


http://gerrit.cloudera.org:8080/#/c/16318/30/be/src/runtime/io/local-file-writer.cc
File be/src/runtime/io/local-file-writer.cc:

http://gerrit.cloudera.org:8080/#/c/16318/30/be/src/runtime/io/local-file-writer.cc@45
PS30, Line 45:   int written = write(fileno(file_), range->data(), 
range->len());
I missed this, I think we should use LocalFileSystem::Fwrite which converts the 
error into a status so that the error is handled correctly.


http://gerrit.cloudera.org:8080/#/c/16318/30/be/src/runtime/tmp-file-mgr.h
File be/src/runtime/tmp-file-mgr.h:

http://gerrit.cloudera.org:8080/#/c/16318/30/be/src/runtime/tmp-file-mgr.h@420
PS30, Line 420:   ///
This is the main public API for TmpFileMgr writing. I notice that it doesn't 
clearly document anything about when the write finishes.

We should probably say something like:

  The write may take some time to complete. It may be queued behind other I/O 
operations. If remote scratch is enabled, it may also need to wait for other 
queries to make progress and release space in the local buffer directory.


http://gerrit.cloudera.org:8080/#/c/16318/27/be/src/runtime/tmp-file-mgr.h
File be/src/runtime/tmp-file-mgr.h:

http://gerrit.cloudera.org:8080/#/c/16318/27/be/src/runtime/tmp-file-mgr.h@194
PS27, Line 194:     std::unique_ptr<TmpFileBufferPool> tmp_file_pool_;
> The pool only works for remote spilling. It would be null when there is no
Can you also put this in the comment for tmp_file_pool_?


http://gerrit.cloudera.org:8080/#/c/16318/30/be/src/runtime/tmp-file-mgr.cc
File be/src/runtime/tmp-file-mgr.cc:

http://gerrit.cloudera.org:8080/#/c/16318/30/be/src/runtime/tmp-file-mgr.cc@517
PS30, Line 517:   if (HasRemoteDir()) {
This doesn't seem right if write_range is just writing to a local scratch 
directory (not just the local buffer directory). Shouldn't it check to see if 
write_range is jsut targeted at local scratch? I guess I would expect spilling 
to a local directory to use the same code path regardless of whether remote 
scratch is used or not.


http://gerrit.cloudera.org:8080/#/c/16318/30/be/src/runtime/tmp-file-mgr.cc@1820
PS30, Line 1820:         DCHECK(status.ok());
nit: use DCHECK_OK - we have a custom version that automatically prints the 
error message if the check fails.



--
To view, visit http://gerrit.cloudera.org:8080/16318
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I419b1d5dbbfe35334d9f964c4b65e553579fdc89
Gerrit-Change-Number: 16318
Gerrit-PatchSet: 30
Gerrit-Owner: Yida Wu <[email protected]>
Gerrit-Reviewer: Abhishek Rawat <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Sahil Takiar <[email protected]>
Gerrit-Reviewer: Tim Armstrong <[email protected]>
Gerrit-Reviewer: Yida Wu <[email protected]>
Gerrit-Comment-Date: Wed, 20 Jan 2021 06:42:44 +0000
Gerrit-HasComments: Yes

Reply via email to