Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/16318 )
Change subject: IMPALA-9867: Add Support for Spilling to S3: Milestone 1 ...................................................................... Patch Set 30: (10 comments) I think I'm pretty close to approving it - the big thing is that I wanted to understand why the deadlock you were mentioning could occur. That one makes me think that we might be missing a flaw in how the local buffer directories work, or similar. Otherwise looking pretty good :) http://gerrit.cloudera.org:8080/#/c/16318/30//COMMIT_MSG Commit Message: PS30: Following on from our discussion about a potential deadlock, I do think there is a limitation that, if there are N local buffer files, only N spilling-to-remote queries can make progress at the same time. I think these queries almost always will have a partially-written file with a local buffer associated, so they may block other queries from spilling until they have completed and released that buffer. I think a query can be in this state indefinitely. E.g. if it was spilling but is now off doing some other work. This isn't ideal but is maybe OK if N is high enough, at least for milestone 1. I *think* that each query should be able to make progress independent of other queries. We would have a deadlock if query A depended on B to release a local buffer file to make progress and B depended directly or indirectly on A releasing some other resource to make progress. I haven't seen anything that looks like that yet. http://gerrit.cloudera.org:8080/#/c/16318/30//COMMIT_MSG@112 PS30, Line 112: * Ran pre-review-test Have you tested in combination with disk spill compression? http://gerrit.cloudera.org:8080/#/c/16318/30/be/src/runtime/io/disk-io-mgr.cc File be/src/runtime/io/disk-io-mgr.cc: http://gerrit.cloudera.org:8080/#/c/16318/30/be/src/runtime/io/disk-io-mgr.cc@328 PS30, Line 328: 0 I think we should probably set the replication to 1 for this use case - that would reduce space requirement for HDFS spilling by 3x. http://gerrit.cloudera.org:8080/#/c/16318/30/be/src/runtime/io/local-file-system.h File be/src/runtime/io/local-file-system.h: http://gerrit.cloudera.org:8080/#/c/16318/30/be/src/runtime/io/local-file-system.h@42 PS30, Line 42: uint8 nit: uint8_t http://gerrit.cloudera.org:8080/#/c/16318/30/be/src/runtime/io/local-file-system.cc File be/src/runtime/io/local-file-system.cc: http://gerrit.cloudera.org:8080/#/c/16318/30/be/src/runtime/io/local-file-system.cc@98 PS30, Line 98: uint8 nit: uint8_t http://gerrit.cloudera.org:8080/#/c/16318/30/be/src/runtime/io/local-file-writer.cc File be/src/runtime/io/local-file-writer.cc: http://gerrit.cloudera.org:8080/#/c/16318/30/be/src/runtime/io/local-file-writer.cc@45 PS30, Line 45: int written = write(fileno(file_), range->data(), range->len()); I missed this, I think we should use LocalFileSystem::Fwrite which converts the error into a status so that the error is handled correctly. http://gerrit.cloudera.org:8080/#/c/16318/30/be/src/runtime/tmp-file-mgr.h File be/src/runtime/tmp-file-mgr.h: http://gerrit.cloudera.org:8080/#/c/16318/30/be/src/runtime/tmp-file-mgr.h@420 PS30, Line 420: /// This is the main public API for TmpFileMgr writing. I notice that it doesn't clearly document anything about when the write finishes. We should probably say something like: The write may take some time to complete. It may be queued behind other I/O operations. If remote scratch is enabled, it may also need to wait for other queries to make progress and release space in the local buffer directory. http://gerrit.cloudera.org:8080/#/c/16318/27/be/src/runtime/tmp-file-mgr.h File be/src/runtime/tmp-file-mgr.h: http://gerrit.cloudera.org:8080/#/c/16318/27/be/src/runtime/tmp-file-mgr.h@194 PS27, Line 194: std::unique_ptr<TmpFileBufferPool> tmp_file_pool_; > The pool only works for remote spilling. It would be null when there is no Can you also put this in the comment for tmp_file_pool_? http://gerrit.cloudera.org:8080/#/c/16318/30/be/src/runtime/tmp-file-mgr.cc File be/src/runtime/tmp-file-mgr.cc: http://gerrit.cloudera.org:8080/#/c/16318/30/be/src/runtime/tmp-file-mgr.cc@517 PS30, Line 517: if (HasRemoteDir()) { This doesn't seem right if write_range is just writing to a local scratch directory (not just the local buffer directory). Shouldn't it check to see if write_range is jsut targeted at local scratch? I guess I would expect spilling to a local directory to use the same code path regardless of whether remote scratch is used or not. http://gerrit.cloudera.org:8080/#/c/16318/30/be/src/runtime/tmp-file-mgr.cc@1820 PS30, Line 1820: DCHECK(status.ok()); nit: use DCHECK_OK - we have a custom version that automatically prints the error message if the check fails. -- To view, visit http://gerrit.cloudera.org:8080/16318 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I419b1d5dbbfe35334d9f964c4b65e553579fdc89 Gerrit-Change-Number: 16318 Gerrit-PatchSet: 30 Gerrit-Owner: Yida Wu <[email protected]> Gerrit-Reviewer: Abhishek Rawat <[email protected]> Gerrit-Reviewer: Impala Public Jenkins <[email protected]> Gerrit-Reviewer: Sahil Takiar <[email protected]> Gerrit-Reviewer: Tim Armstrong <[email protected]> Gerrit-Reviewer: Yida Wu <[email protected]> Gerrit-Comment-Date: Wed, 20 Jan 2021 06:42:44 +0000 Gerrit-HasComments: Yes
