Abhishek Rawat has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16318 )

Change subject: WIP: IMPALA-9867: Add Support for Spilling to S3: Milestone 1
......................................................................


Patch Set 10:

(17 comments)

http://gerrit.cloudera.org:8080/#/c/16318/10//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/16318/10//COMMIT_MSG@43
PS10, Line 43:    A local buffer file can be evicted if it is in status REMOTE 
or it
typo REMOTE -> UPLOADED
Also, maybe better to say "its all pages have been pinned"


http://gerrit.cloudera.org:8080/#/c/16318/10//COMMIT_MSG@51
PS10, Line 51: Remote scratch space uses the highest priority local scratch dir
             :    as its buffer.
We could look into enabling the multiple directories per device startup option 
by default (allow_multiple_scratch_dirs_per_device) so users can configure 
multiple directories in the local file system.

This option is disabled by default, but it might make sense to enable it since 
we want to reserve a local directory as a buffer for remote scratch dirs.

Also, in case no local scratch directory has been configured, then that should 
be a configuration error.
If, only one local scratch directory is configured then we should use that as a 
buffer for remote.


http://gerrit.cloudera.org:8080/#/c/16318/10/be/src/runtime/hdfs-fs-cache.h
File be/src/runtime/hdfs-fs-cache.h:

http://gerrit.cloudera.org:8080/#/c/16318/10/be/src/runtime/hdfs-fs-cache.h@57
PS10, Line 57:
Maybe use a typedef for the vector<std::pair<string, string>> to improve the 
readability.


http://gerrit.cloudera.org:8080/#/c/16318/10/be/src/runtime/hdfs-fs-cache.cc
File be/src/runtime/hdfs-fs-cache.cc:

http://gerrit.cloudera.org:8080/#/c/16318/10/be/src/runtime/hdfs-fs-cache.cc@103
PS10, Line 103:       if (options != nullptr && !options->empty()) {
Should we also use the local cache for these connections?
Also, do you also need to create a new instance of the filesystem object like 
we do in L99 above? Seems like you would have to?


http://gerrit.cloudera.org:8080/#/c/16318/7/be/src/runtime/io/disk-io-mgr.cc
File be/src/runtime/io/disk-io-mgr.cc:

http://gerrit.cloudera.org:8080/#/c/16318/7/be/src/runtime/io/disk-io-mgr.cc@110
PS7, Line 110: DEFINE_int32(num_remote_hdfs_file_oper_io_threads, 2,
I think as a starting point, we should probably use the same number of threads 
as num_remote_hdfs_io_threads


http://gerrit.cloudera.org:8080/#/c/16318/7/be/src/runtime/io/disk-io-mgr.cc@122
PS7, Line 122: DEFINE_int32(num_s3_file_oper_io_threads, 2, "Number of S3 file 
operations I/O threads");
Probably good to use the same default number of threads as num_s3_io_threads.


http://gerrit.cloudera.org:8080/#/c/16318/10/be/src/runtime/io/disk-io-mgr.cc
File be/src/runtime/io/disk-io-mgr.cc:

http://gerrit.cloudera.org:8080/#/c/16318/10/be/src/runtime/io/disk-io-mgr.cc@110
PS10, Line 110: DEFINE_int32(num_remote_hdfs_file_oper_io_threads, 2,
Add a comment for the new startup flag.


http://gerrit.cloudera.org:8080/#/c/16318/10/be/src/runtime/io/disk-io-mgr.cc@240
PS10, Line 240: ScopedHistogramTimer
Should the write timer also involve the lock acquisition delays?


http://gerrit.cloudera.org:8080/#/c/16318/10/be/src/runtime/io/disk-io-mgr.cc@285
PS10, Line 285:   offset_ = file_offset;
              :   disk_id_ = disk_id;
              :   tmp_file_ = tmp_file;
              :   io_mgr_ = io_mgr;
We could move all these to the initializer's list in the constructor.


http://gerrit.cloudera.org:8080/#/c/16318/10/be/src/runtime/io/disk-io-mgr.cc@294
PS10, Line 294:   disk_id_ = disk_id;
              :   file_path_ = file_path;
              :   io_mgr_ = io_mgr;
Same here, better to move to initializer's list.


http://gerrit.cloudera.org:8080/#/c/16318/10/be/src/runtime/io/request-context.h
File be/src/runtime/io/request-context.h:

http://gerrit.cloudera.org:8080/#/c/16318/10/be/src/runtime/io/request-context.h@296
PS10, Line 296:   void RemoteOperDone(RemoteOperRange* oper_range, const 
Status& write_status);
Would be good to add comments here.


http://gerrit.cloudera.org:8080/#/c/16318/10/be/src/runtime/io/request-ranges.h
File be/src/runtime/io/request-ranges.h:

http://gerrit.cloudera.org:8080/#/c/16318/10/be/src/runtime/io/request-ranges.h@372
PS10, Line 372: bool
Adding comments here for the new functions would be good.


http://gerrit.cloudera.org:8080/#/c/16318/10/be/src/runtime/tmp-file-mgr-internal.h
File be/src/runtime/tmp-file-mgr-internal.h:

http://gerrit.cloudera.org:8080/#/c/16318/10/be/src/runtime/tmp-file-mgr-internal.h@38
PS10, Line 38: DUMPED
Not sure what's the best name for the new states. I do find myself thinking 
about this all the time I come across this code. Ideally we want something 
which is self explanatory:

This is just a suggestion, but something like following maybe?

INWRITING       -> WRITING_LOCAL  / SPILLING_LOCAL
DUMPED          -> WRITTEN_LOCAL  / SPILLED_LOCAL
INFETCHING      -> READING_REMOTE / LOADING_REMOTE
UPLOADED        -> WRITTEN_REMOTE / SPILLED_REMOTE
DUMPED_UPLOADED -> WRITTEN_LOCAL_REMOTE / SPILLED_LOCAL_REMOTE
DELETED         -> DELETED


http://gerrit.cloudera.org:8080/#/c/16318/10/be/src/runtime/tmp-file-mgr-internal.h@198
PS10, Line 198:   void SetDumped(bool dumped = true);
Do we need multiple setters? Since they are updating the same file_status_ 
variable we could simply pass type TmpFileStatus as a param and set 
file_status_. We could do a special case for Delete in the common setter.


http://gerrit.cloudera.org:8080/#/c/16318/10/be/src/runtime/tmp-file-mgr.cc
File be/src/runtime/tmp-file-mgr.cc:

http://gerrit.cloudera.org:8080/#/c/16318/10/be/src/runtime/tmp-file-mgr.cc@134
PS10, Line 134: const
If this is a test only const then we should move it to the test file.


http://gerrit.cloudera.org:8080/#/c/16318/10/be/src/runtime/tmp-file-mgr.cc@152
PS10, Line 152:
Don't think we're using this variable?


http://gerrit.cloudera.org:8080/#/c/16318/10/be/src/runtime/tmp-file-mgr.cc@391
PS10, Line 391: s3a_options_
Should we set s3a_options_ only for S3 and not for hdfs?



--
To view, visit http://gerrit.cloudera.org:8080/16318
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I419b1d5dbbfe35334d9f964c4b65e553579fdc89
Gerrit-Change-Number: 16318
Gerrit-PatchSet: 10
Gerrit-Owner: Yida Wu <[email protected]>
Gerrit-Reviewer: Abhishek Rawat <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Comment-Date: Fri, 11 Sep 2020 19:52:11 +0000
Gerrit-HasComments: Yes

Reply via email to