Sahil Takiar has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16318 )

Change subject: IMPALA-9867: Add Support for Spilling to S3: Milestone 1
......................................................................


Patch Set 17:

(13 comments)

http://gerrit.cloudera.org:8080/#/c/16318/17/be/src/runtime/hdfs-fs-cache.h
File be/src/runtime/hdfs-fs-cache.h:

http://gerrit.cloudera.org:8080/#/c/16318/17/be/src/runtime/hdfs-fs-cache.h@59
PS17, Line 59: NULL
nit: use nullptr instead of NULL


http://gerrit.cloudera.org:8080/#/c/16318/17/be/src/runtime/io/disk-file.h
File be/src/runtime/io/disk-file.h:

http://gerrit.cloudera.org:8080/#/c/16318/17/be/src/runtime/io/disk-file.h@96
PS17, Line 96:   /// Return the lock of the file.
             :   boost::shared_mutex* GetFileLock() { return &lock_; }
             :
             :   /// Return the status lock of the file.
             :   SpinLock* GetStatusLock() { return &status_lock_; }
I think we typically try to avoid returning internal locks publicly. is there a 
reason these locks need to be exposed to other classes? can the code be 
re-factored so that any code that needs to acquire these locks lives inside 
this class?


http://gerrit.cloudera.org:8080/#/c/16318/17/be/src/runtime/io/disk-file.h@123
PS17, Line 123: volatile
why volatile? I've generally found the semantics of volatile in C++ to be quite 
complex and hard to get right (e.g. see 
https://sites.google.com/site/kjellhedstrom2/stay-away-from-volatile-in-threaded-code)


http://gerrit.cloudera.org:8080/#/c/16318/17/be/src/runtime/io/disk-io-mgr.cc
File be/src/runtime/io/disk-io-mgr.cc:

http://gerrit.cloudera.org:8080/#/c/16318/17/be/src/runtime/io/disk-io-mgr.cc@253
PS17, Line 253:         // The file handle is closed, set the file to persisted 
status.
              :         lock_guard<SpinLock> l(disk_file_->status_lock_);
              :         disk_file_->SetPersisted();
bit of an anti-pattern to acquire a external classes internal locks. its 
probably cleaner to just have the SetPersisted method acquire the lock and then 
sate the state to PERSISTED.


http://gerrit.cloudera.org:8080/#/c/16318/17/be/src/runtime/io/file-writer.h
File be/src/runtime/io/file-writer.h:

http://gerrit.cloudera.org:8080/#/c/16318/17/be/src/runtime/io/file-writer.h@52
PS17, Line 52:   virtual Status WriteOne(WriteRange*) = 0;
not sure I understand the difference between WriteOne and Write


http://gerrit.cloudera.org:8080/#/c/16318/17/be/src/runtime/io/file-writer.h@68
PS17, Line 68:   std::mutex lock_;
would be nice to explain the thread-safety semantics of this class


http://gerrit.cloudera.org:8080/#/c/16318/17/be/src/runtime/io/file-writer.h@67
PS17, Line 67:   DiskIoMgr* io_mgr_ = nullptr;
             :   std::mutex lock_;
             :   int64_t written_bytes_ = 0;
             :   const char* file_path_;
             :   const int64_t file_size_;
             :   const int64_t max_page_size_;
nit: would be nice to have some docs on each of these


http://gerrit.cloudera.org:8080/#/c/16318/17/be/src/runtime/io/hdfs-file-reader.cc
File be/src/runtime/io/hdfs-file-reader.cc:

http://gerrit.cloudera.org:8080/#/c/16318/17/be/src/runtime/io/hdfs-file-reader.cc@98
PS17, Line 98:   Status status = Status::OK();
nit: is this change necessary?


http://gerrit.cloudera.org:8080/#/c/16318/17/be/src/runtime/tmp-file-mgr-internal.h
File be/src/runtime/tmp-file-mgr-internal.h:

http://gerrit.cloudera.org:8080/#/c/16318/17/be/src/runtime/tmp-file-mgr-internal.h@46
PS17, Line 46:   /// Runtime Error during creation of the TmpFile.
             :   class ResourceError : public std::runtime_error {
             :    public:
             :     explicit ResourceError(const Status& status)
             :       : runtime_error(status.msg().msg()), status_(status) {}
             :     virtual ~ResourceError() {}
             :     Status& GetStatus() { return status_; }
             :
             :    private:
             :     Status status_;
             :   };
see other comment about C++ errors, probably should be removed. should be able 
to use 'Status' objects instead.


http://gerrit.cloudera.org:8080/#/c/16318/13/be/src/runtime/tmp-file-mgr.cc
File be/src/runtime/tmp-file-mgr.cc:

http://gerrit.cloudera.org:8080/#/c/16318/13/be/src/runtime/tmp-file-mgr.cc@235
PS13, Line 235:    // Initialize the S3 options for later getting S3 connection.
              :       s3a_options_ = {make_pair("fs.s3a.fast.upload", "tru
probably don't want to hardcode these


http://gerrit.cloudera.org:8080/#/c/16318/17/be/src/runtime/tmp-file-mgr.cc
File be/src/runtime/tmp-file-mgr.cc:

http://gerrit.cloudera.org:8080/#/c/16318/17/be/src/runtime/tmp-file-mgr.cc@236
PS17, Line 236:       s3a_options_ = {make_pair("fs.s3a.fast.upload", "true"),
              :           make_pair("fs.s3a.fast.upload.buffer", "disk")};
is the implementation of this feature dependent on setting these values? I 
don't think we want to set these values in the code itself, these types of 
configs are usually configured externally in an xml file (e.g. hadoop-site.xml 
or hdfs-site.xml)


http://gerrit.cloudera.org:8080/#/c/16318/17/be/src/runtime/tmp-file-mgr.cc@728
PS17, Line 728:      throw TmpFile::ResourceError(
              :           Status(GetHdfsErrorMsg("Failed to connect to FS: ", 
hdfs_url)));
we try to avoid C++ errors. Impala uses 'Status' objects instead.


http://gerrit.cloudera.org:8080/#/c/16318/17/be/src/util/hdfs-util.h
File be/src/util/hdfs-util.h:

http://gerrit.cloudera.org:8080/#/c/16318/17/be/src/util/hdfs-util.h@31
PS17, Line 31: ABFS_1
nit: ABFSS or ABFS_SECURE would probably be a better name



--
To view, visit http://gerrit.cloudera.org:8080/16318
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I419b1d5dbbfe35334d9f964c4b65e553579fdc89
Gerrit-Change-Number: 16318
Gerrit-PatchSet: 17
Gerrit-Owner: Yida Wu <[email protected]>
Gerrit-Reviewer: Abhishek Rawat <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Sahil Takiar <[email protected]>
Gerrit-Reviewer: Yida Wu <[email protected]>
Gerrit-Comment-Date: Fri, 23 Oct 2020 16:59:02 +0000
Gerrit-HasComments: Yes

Reply via email to