Sahil Takiar has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16318 )

Change subject: IMPALA-9867: Add Support for Spilling to S3: Milestone 1
......................................................................


Patch Set 13:

(5 comments)

http://gerrit.cloudera.org:8080/#/c/16318/13/be/src/runtime/hdfs-fs-cache.cc
File be/src/runtime/hdfs-fs-cache.cc:

http://gerrit.cloudera.org:8080/#/c/16318/13/be/src/runtime/hdfs-fs-cache.cc@103
PS13, Line 103:       if (options != nullptr && !options->empty()) {
              :         for (auto option : *options) {
              :           hdfsBuilderConfSetStr(
              :               hdfs_builder, option.first.c_str(), 
option.second.c_str());
              :         }
              :       }
does this actually work? were you able to confirm that non-default configs set 
this way are actually picked up by the returned hdfsFS object?

according to the comment above, hdfsBuilderSetForceNewInstance has to be called 
in order for hdfsBuilderConfSetStr to have an affect.


http://gerrit.cloudera.org:8080/#/c/16318/13/be/src/runtime/io/disk-io-mgr.h
File be/src/runtime/io/disk-io-mgr.h:

http://gerrit.cloudera.org:8080/#/c/16318/13/be/src/runtime/io/disk-io-mgr.h@370
PS13, Line 370:     REMOTE_DFS_DISK_FILE_OPER_OFFSET,
              :     REMOTE_S3_DISK_FILE_OPER_OFFSET,
you cover this a bit in the commit message, but is the reason they were added 
to de-couple HDFS/S3 scans / writes from HDFS/S3 tmp-file-mgr operations? I 
guess that benefit is isolate between operations that spill vs. operations that 
read / write data on behalf of queries?

it would be good to add some documentation here explaining why these two queues 
are necessary and when to, for example, use REMOTE_S3_DISK_FILE_OPER_OFFSET vs. 
REMOTE_S3_DISK_OFFSET


http://gerrit.cloudera.org:8080/#/c/16318/13/be/src/runtime/io/disk-io-mgr.cc
File be/src/runtime/io/disk-io-mgr.cc:

http://gerrit.cloudera.org:8080/#/c/16318/13/be/src/runtime/io/disk-io-mgr.cc@30
PS13, Line 30: #include "runtime/tmp-file-mgr-internal.h"
Why are we including the tmp-file-mgr here? I see a lot of references to the 
tmp-file-mgr, but ideally I would think the disk-io-mgr and tmp-file-mgr would 
remain separate systems. I see a lot of code referring to tmp_files in this 
class as well, which I concerning.

I would think that the disk-io-mgr should just provide an interface to the 
tmp-file-mgr.


http://gerrit.cloudera.org:8080/#/c/16318/13/be/src/runtime/io/disk-io-mgr.cc@241
PS13, Line 241: goto end;
I think we generally try to avoid 'goto' statements because they make the 
control flow harder to follow. in this case the 'goto end' is only used once so 
I don't think it is necessary.


http://gerrit.cloudera.org:8080/#/c/16318/13/be/src/runtime/io/request-ranges.h
File be/src/runtime/io/request-ranges.h:

http://gerrit.cloudera.org:8080/#/c/16318/13/be/src/runtime/io/request-ranges.h@113
PS13, Line 113:     READ,
              :     WRITE,
              :     FETCH,
              :     UPLOAD,
not sure I understand the difference between all of these. fetch seems like it 
would be the same thing as read, and upload seems like the same thing as write.



-- 
To view, visit http://gerrit.cloudera.org:8080/16318
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I419b1d5dbbfe35334d9f964c4b65e553579fdc89
Gerrit-Change-Number: 16318
Gerrit-PatchSet: 13
Gerrit-Owner: Yida Wu <[email protected]>
Gerrit-Reviewer: Abhishek Rawat <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Sahil Takiar <[email protected]>
Gerrit-Reviewer: Yida Wu <[email protected]>
Gerrit-Comment-Date: Wed, 30 Sep 2020 21:55:55 +0000
Gerrit-HasComments: Yes

Reply via email to