Yida Wu has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16318 )

Change subject: IMPALA-9867: Add Support for Spilling to S3: Milestone 1
......................................................................


Patch Set 24:

(62 comments)

http://gerrit.cloudera.org:8080/#/c/16318/24/be/src/runtime/io/disk-file.h
File be/src/runtime/io/disk-file.h:

http://gerrit.cloudera.org:8080/#/c/16318/24/be/src/runtime/io/disk-file.h@113
PS24, Line 113:     DCHECK(status_lock.owns_lock());
> nit: can you check here and and GetFileStatusLocked() that status_lock.mute
Done


http://gerrit.cloudera.org:8080/#/c/16318/24/be/src/runtime/io/disk-file.h@167
PS24, Line 167:   SpinLock status_lock_;
> What about the lock order between 'status_lock_' and 'lock_'. It looks like
Added comments.


http://gerrit.cloudera.org:8080/#/c/16318/24/be/src/runtime/io/disk-file.h@175
PS24, Line 175:   boost::shared_mutex lock_;
> Maybe call this deletion_lock_ or physical_file_lock_ to make it clear that
Done


http://gerrit.cloudera.org:8080/#/c/16318/24/be/src/runtime/io/disk-file.h@185
PS24, Line 185:   /// Specify if the file's space is reserved.
> Can you add another sentence or two explaining what reserved means?
Done


http://gerrit.cloudera.org:8080/#/c/16318/24/be/src/runtime/io/disk-file.h@186
PS24, Line 186:   AtomicBool space_reserved_{false};
> What's the concurrency protocol around this value? It's pretty hard to use
Added some detailed comments.


http://gerrit.cloudera.org:8080/#/c/16318/24/be/src/runtime/io/disk-file.cc
File be/src/runtime/io/disk-file.cc:

http://gerrit.cloudera.org:8080/#/c/16318/24/be/src/runtime/io/disk-file.cc@37
PS24, Line 37: Not
> nit: No
Done


http://gerrit.cloudera.org:8080/#/c/16318/24/be/src/runtime/io/disk-file.cc@39
PS24, Line 39:     if (!is_deleted(status_l)) {
> Restructure this conditional to do an early exit, it'll reduce nesting and
Done


http://gerrit.cloudera.org:8080/#/c/16318/24/be/src/runtime/io/disk-file.cc@40
PS24, Line 40:       status = FileSystemUtil::RemovePaths({path_});
> I think we can wrap RemovePaths() here with RETURN_IF_ERROR() here and remo
Done


http://gerrit.cloudera.org:8080/#/c/16318/24/be/src/runtime/io/disk-file.cc@55
PS24, Line 55:   file_writer_.reset(new LocalFileWriter(io_mgr, path.c_str()));
> nit: you can initialize file_writer_ in the initializer list above instead
Done


http://gerrit.cloudera.org:8080/#/c/16318/24/be/src/runtime/io/disk-file.cc@56
PS24, Line 56:   space_reserved_.Store(true);
> nit: initialize this in the initializer list.
Done


http://gerrit.cloudera.org:8080/#/c/16318/24/be/src/runtime/io/disk-io-mgr.h
File be/src/runtime/io/disk-io-mgr.h:

http://gerrit.cloudera.org:8080/#/c/16318/24/be/src/runtime/io/disk-io-mgr.h@81
PS24, Line 81: funtion
> nit: function
Done


http://gerrit.cloudera.org:8080/#/c/16318/24/be/src/runtime/io/disk-io-mgr.cc
File be/src/runtime/io/disk-io-mgr.cc:

http://gerrit.cloudera.org:8080/#/c/16318/24/be/src/runtime/io/disk-io-mgr.cc@268
PS24, Line 268: Status
> I think this should be a const&
Done


http://gerrit.cloudera.org:8080/#/c/16318/24/be/src/runtime/io/disk-io-mgr.cc@339
PS24, Line 339:     if (disk_file_src_->is_to_delete()) {
> I think we should simplify the concurrency by removing the is_to_delete() l
removed.


http://gerrit.cloudera.org:8080/#/c/16318/24/be/src/runtime/io/disk-io-mgr.cc@390
PS24, Line 390: Status RemoteOperRange::DoFetch(uint8_t* buffer, int64_t 
buffer_size) {
> Yida says this is dead code, we can delete it.
Done


http://gerrit.cloudera.org:8080/#/c/16318/24/be/src/runtime/io/disk-io-mgr.cc@850
PS24, Line 850: uint8_t*)
> nit: use static_cast instead of c-style cast.
Done


http://gerrit.cloudera.org:8080/#/c/16318/24/be/src/runtime/io/file-writer.h
File be/src/runtime/io/file-writer.h:

http://gerrit.cloudera.org:8080/#/c/16318/24/be/src/runtime/io/file-writer.h@39
PS24, Line 39:   FileWriter(DiskIoMgr* io_mgr, const char* file_path, const 
int64_t file_size,
> What happens if max_page_size is > file_size?
max_page_size is removed, because the file size is allowed to be a little over 
the default size, no longer needs the max_page_size to prevent this happen.


http://gerrit.cloudera.org:8080/#/c/16318/24/be/src/runtime/io/local-file-writer.h
File be/src/runtime/io/local-file-writer.h:

http://gerrit.cloudera.org:8080/#/c/16318/24/be/src/runtime/io/local-file-writer.h@45
PS24, Line 45:   /// Internal function for writing the padding data to a file. 
It is used when the
> Why do we need padding? Can't we just upload what we have?
padding is removed, because the file size is allowed to be a little over the 
default size, no longer needs the padding to have a strict default file size.


http://gerrit.cloudera.org:8080/#/c/16318/24/be/src/runtime/io/local-file-writer.cc
File be/src/runtime/io/local-file-writer.cc:

http://gerrit.cloudera.org:8080/#/c/16318/24/be/src/runtime/io/local-file-writer.cc@34
PS24, Line 34:   if (file_ == nullptr) {
> I'm suspicious about accessing file_ without holding lock, it's not obvious
Done.


http://gerrit.cloudera.org:8080/#/c/16318/24/be/src/runtime/io/request-context.cc
File be/src/runtime/io/request-context.cc:

http://gerrit.cloudera.org:8080/#/c/16318/24/be/src/runtime/io/request-context.cc@264
PS24, Line 264:   // Execute the callback before decrementing the thread count. 
Otherwise
> Please factor this out into a helper function instead of duplicating the co
Done


http://gerrit.cloudera.org:8080/#/c/16318/24/be/src/runtime/io/request-context.cc@682
PS24, Line 682: tmporary
> nit: temporary
Done


http://gerrit.cloudera.org:8080/#/c/16318/24/be/src/runtime/io/request-ranges.h
File be/src/runtime/io/request-ranges.h:

http://gerrit.cloudera.org:8080/#/c/16318/24/be/src/runtime/io/request-ranges.h@279
PS24, Line 279: guranttee
> guarantee
Done


http://gerrit.cloudera.org:8080/#/c/16318/24/be/src/runtime/query-state.cc
File be/src/runtime/query-state.cc:

http://gerrit.cloudera.org:8080/#/c/16318/24/be/src/runtime/query-state.cc@345
PS24, Line 345:             query_id(), query_options().scratch_limit, 
query_options().max_row_size));
> The calculation of the maximum isn't quite right, because the maximum page
max_page_size is removed, because the file size is allowed to be a little over 
the default size, no longer need the max_page_size to prevent this happen.


http://gerrit.cloudera.org:8080/#/c/16318/24/be/src/runtime/tmp-file-mgr-internal.h
File be/src/runtime/tmp-file-mgr-internal.h:

http://gerrit.cloudera.org:8080/#/c/16318/24/be/src/runtime/tmp-file-mgr-internal.h@170
PS24, Line 170: /// TmpFileRemote is a derived class of TmpFile to provide 
methods to handle a
> I really like this class comment.
Thanks.


http://gerrit.cloudera.org:8080/#/c/16318/24/be/src/runtime/tmp-file-mgr-internal.h@274
PS24, Line 274: done_
> Maybe rename to full_ or at_capacity_? I think that is more specific than "
Done


http://gerrit.cloudera.org:8080/#/c/16318/24/be/src/runtime/tmp-file-mgr-internal.h@311
PS24, Line 311:   /// Function called when a TmpFileGroup is closing to remove 
the write ranges
> Can you describe more precisely what this does with its input argument. E.g
Done


http://gerrit.cloudera.org:8080/#/c/16318/24/be/src/runtime/tmp-file-mgr-internal.h@375
PS24, Line 375: destorying
> nit: destroying
Done


http://gerrit.cloudera.org:8080/#/c/16318/24/be/src/runtime/tmp-file-mgr-internal.h@386
PS24, Line 386:       std::vector<TmpFileMgr::WriteDoneCallback>& 
write_callbacks, bool is_cancelled);
> Can you pass this by pointer instead of by reference? That's the convention
Done


http://gerrit.cloudera.org:8080/#/c/16318/24/be/src/runtime/tmp-file-mgr.h
File be/src/runtime/tmp-file-mgr.h:

http://gerrit.cloudera.org:8080/#/c/16318/24/be/src/runtime/tmp-file-mgr.h@162
PS24, Line 162:   /// A configuration for the control parameters of remote 
temporary directories.
> Can you mention the ownership lifecycle of this? It looks like it's owned b
Yes, added comments.


http://gerrit.cloudera.org:8080/#/c/16318/24/be/src/runtime/tmp-file-mgr.h@167
PS24, Line 167: tmporary
> nit: temporary. I think I see a couple more instances of this typo so it's
Done


http://gerrit.cloudera.org:8080/#/c/16318/24/be/src/runtime/tmp-file-mgr.cc
File be/src/runtime/tmp-file-mgr.cc:

http://gerrit.cloudera.org:8080/#/c/16318/24/be/src/runtime/tmp-file-mgr.cc@100
PS24, Line 100: Should
> nit: should
Done


http://gerrit.cloudera.org:8080/#/c/16318/24/be/src/runtime/tmp-file-mgr.cc@132
PS24, Line 132: const string TMP_DIR_HDFS_LOCAL_DEFAULT = 
"hdfs://localhost:20500/tmp";
> Why do we need this? Outside of tests, shouldn't we be always getting the H
HDFS scratch space is test-only and uses a fixed path due to some difficulties 
of parsing the HDFS path. Added a Jira IMPALA-10429 to track it.


http://gerrit.cloudera.org:8080/#/c/16318/24/be/src/runtime/tmp-file-mgr.cc@300
PS24, Line 300:       if (tmp_dirs_remote_ != nullptr) continue;
> In this case, add a WARNING log that the temporary directory is being ignor
Done


http://gerrit.cloudera.org:8080/#/c/16318/24/be/src/runtime/tmp-file-mgr.cc@305
PS24, Line 305: TMP_DIR_HDFS_LOCAL_DEFAULT
> why wouldn't we use tmp_path here?
Added comments on TMP_DIR_HDFS_LOCAL_DEFAULT.


http://gerrit.cloudera.org:8080/#/c/16318/24/be/src/runtime/tmp-file-mgr.cc@357
PS24, Line 357:       // Later we might allow to use s3 fast upload directly 
without
> nit: the can fit on one line within the 90 char limit
Done


http://gerrit.cloudera.org:8080/#/c/16318/24/be/src/runtime/tmp-file-mgr.cc@405
PS24, Line 405:     vector<bool>& is_tmp_dir_on_disk, bool& has_remote_dir, int 
disk_id) {
> Our convention is to use pointers for output arguments
Done


http://gerrit.cloudera.org:8080/#/c/16318/24/be/src/runtime/tmp-file-mgr.cc@411
PS24, Line 411:       has_remote_dir = false;
> The naming is confusing - in this case we still have a remote dir, but don'
Changed the name to "need_local_buffer_dir".


http://gerrit.cloudera.org:8080/#/c/16318/24/be/src/runtime/tmp-file-mgr.cc@427
PS24, Line 427:       return Status::OK();
> Do we want to exit early in this case? Don't want want to still use the loc
Currently local buffer directory won't be inserted into tmp_dirs_, the 
directories in tmp_dirs_ are used for Spill to Local FS, but local buffer 
directory can't. However the local buffer can be read during pin process, so it 
can provide some functions of spill to local if the file hasn't been evicted.


http://gerrit.cloudera.org:8080/#/c/16318/24/be/src/runtime/tmp-file-mgr.cc@507
PS24, Line 507:   if (tmp_file->disk_type() == io::DiskFileType::DUMMY) {
> nit: just move this to the top (line 504) and make it a one-liner so that i
Done


http://gerrit.cloudera.org:8080/#/c/16318/24/be/src/runtime/tmp-file-mgr.cc@515
PS24, Line 515:   auto tmp_file_remote = static_cast<TmpFileRemote*>(tmp_file);
> nit: I'd prefer to declare with the full type instead of auto.
Done


http://gerrit.cloudera.org:8080/#/c/16318/24/be/src/runtime/tmp-file-mgr.cc@520
PS24, Line 520: boost
> nit: use std::unique_lock instead of boost, we switch over a while ago. In
Done


http://gerrit.cloudera.org:8080/#/c/16318/24/be/src/runtime/tmp-file-mgr.cc@524
PS24, Line 524:     if (status.ok()) {
> nit: could remove one layer of nesting here, e.g.
Done


http://gerrit.cloudera.org:8080/#/c/16318/24/be/src/runtime/tmp-file-mgr.cc@561
PS24, Line 561:   if (tmp_file == nullptr) {
> nit: I'd prefer if this was on a single line to make code denser.
Done


http://gerrit.cloudera.org:8080/#/c/16318/24/be/src/runtime/tmp-file-mgr.cc@571
PS24, Line 571: Evction
> nit: Eviction
Done


http://gerrit.cloudera.org:8080/#/c/16318/24/be/src/runtime/tmp-file-mgr.cc@717
PS24, Line 717:   DCHECK(local_buffer_path != nullptr);
> Pass local_buffer_path by const& if it's always non-null
Done


http://gerrit.cloudera.org:8080/#/c/16318/24/be/src/runtime/tmp-file-mgr.cc@722
PS24, Line 722:     hdfs_conn_ = hdfsConnect("default", 0);
> I think this connects to the wrong filesystem if hdfs_url references a diff
HDFS path is a test-only feature, and using fixed path to avoid some 
difficulties to parse the HDFS path. Added IMPALA-10429 to track the limitation.


http://gerrit.cloudera.org:8080/#/c/16318/24/be/src/runtime/tmp-file-mgr.cc@774
PS24, Line 774:   SetToDelete();
> I was a little suspicious of this because it's not signalling a conditional
Removed.


http://gerrit.cloudera.org:8080/#/c/16318/24/be/src/runtime/tmp-file-mgr.cc@778
PS24, Line 778:   boost::unique_lock<shared_mutex> 
buffer_file_lock(*(disk_buffer_file_->GetFileLock()));
> nit: mentioned this elsewhere, but drop boost:: so we're using std::unique_
Done


http://gerrit.cloudera.org:8080/#/c/16318/24/be/src/runtime/tmp-file-mgr.cc@789
PS24, Line 789:     } else {
> nit: can merge the else and if into an "else if" and eliminate a level of i
Done


http://gerrit.cloudera.org:8080/#/c/16318/24/be/src/runtime/tmp-file-mgr.cc@807
PS24, Line 807:   // Return the file to the pool if it hasn't been enqueued.
> nit: could fit this condition onto one line, we've usually favoured that be
Done


http://gerrit.cloudera.org:8080/#/c/16318/24/be/src/runtime/tmp-file-mgr.cc@906
PS24, Line 906: the
> nit: extra the
Done


http://gerrit.cloudera.org:8080/#/c/16318/24/be/src/runtime/tmp-file-mgr.cc@1025
PS24, Line 1025:     // It is not supposed to have a remote directory other 
than HDFS or S3.
> We can remove a layer of indentation by removing the if and just having a D
Done


http://gerrit.cloudera.org:8080/#/c/16318/24/be/src/runtime/tmp-file-mgr.cc@1100
PS24, Line 1100:     Status status = AllocateLocalSpace(
> If there's an error status here, it looks like we're dropping it. I think t
Currently it should be okay, because if alloc_full is set true in 
AllocateLocalSpace, the status won't be an error status. Added !status.ok() 
then return the status.


http://gerrit.cloudera.org:8080/#/c/16318/24/be/src/runtime/tmp-file-mgr.cc@1134
PS24, Line 1134:  due to 
> nit: because instead of "due to"
Done


http://gerrit.cloudera.org:8080/#/c/16318/24/be/src/runtime/tmp-file-mgr.cc@1194
PS24, Line 1194:     auto tmp_file = static_cast<TmpFileRemote*>(handle->file_);
> nit: i'd prefer spelling out the type instead of auto, so long as it's not
Done


http://gerrit.cloudera.org:8080/#/c/16318/24/be/src/runtime/tmp-file-mgr.cc@1466
PS24, Line 1466:   if (!tmp_file->GetWriteFile()->IsSpaceReserved()) {
> Need to come back and understand this logic better.
Optimized the logic and added some comments.


http://gerrit.cloudera.org:8080/#/c/16318/24/be/src/runtime/tmp-file-mgr.cc@1545
PS24, Line 1545:   if (upload_status.ok()) {
> Should we log a warning if it failed?
Done


http://gerrit.cloudera.org:8080/#/c/16318/24/be/src/runtime/tmp-file-mgr.cc@1731
PS24, Line 1731: Status TmpFileBufferPool::AddWriteRangesHelper(DiskFile* 
disk_file,
> Maybe rename this to MoveWriteRangesHelper, since it's really moving the ra
Done


http://gerrit.cloudera.org:8080/#/c/16318/24/be/src/runtime/tmp-file-mgr.cc@1774
PS24, Line 1774:     if (range->offset() == 0) {
> nit: could fit on 1 line
Done


http://gerrit.cloudera.org:8080/#/c/16318/24/be/src/runtime/tmp-file-mgr.cc@1818
PS24, Line 1818:       if (tmp_file_remote->is_enqueued()) {
> nit: the usual style in the codebase is to put these short conditionals on
Done


http://gerrit.cloudera.org:8080/#/c/16318/24/be/src/runtime/tmp-file-mgr.cc@1841
PS24, Line 1841:   while (tmp_files_avail_pool_.empty()) {
> If remote spilling is enabled, we should add a timer that tracks the time s
Added a HistogramMetric tmp-file-mgr.tmp-file-buff-pool-dequeue-durations.


http://gerrit.cloudera.org:8080/#/c/16318/24/be/src/util/hdfs-util.h
File be/src/util/hdfs-util.h:

http://gerrit.cloudera.org:8080/#/c/16318/24/be/src/util/hdfs-util.h@28
PS24, Line 28: #define FILESYS_PREFIX_HDFS "hdfs://"
> Define these as proper variables, we prefer not to use preprocessor macros
Done


http://gerrit.cloudera.org:8080/#/c/16318/24/tests/custom_cluster/test_scratch_disk.py
File tests/custom_cluster/test_scratch_disk.py:

http://gerrit.cloudera.org:8080/#/c/16318/24/tests/custom_cluster/test_scratch_disk.py@260
PS24, Line 260:   def test_scratch_dirs_remote_spill(self, vector):
> I think we probably need to add some @SkipIf markers so that this only runs
Done



--
To view, visit http://gerrit.cloudera.org:8080/16318
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I419b1d5dbbfe35334d9f964c4b65e553579fdc89
Gerrit-Change-Number: 16318
Gerrit-PatchSet: 24
Gerrit-Owner: Yida Wu <[email protected]>
Gerrit-Reviewer: Abhishek Rawat <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Sahil Takiar <[email protected]>
Gerrit-Reviewer: Tim Armstrong <[email protected]>
Gerrit-Reviewer: Yida Wu <[email protected]>
Gerrit-Comment-Date: Tue, 12 Jan 2021 20:53:11 +0000
Gerrit-HasComments: Yes

Reply via email to