Hung Tran created GOBBLIN-1057:
----------------------------------

             Summary: Optimize unnecessary RPCs in distcp-ng
                 Key: GOBBLIN-1057
                 URL: https://issues.apache.org/jira/browse/GOBBLIN-1057
             Project: Apache Gobblin
          Issue Type: Improvement
            Reporter: Hung Tran


Gobblin distcp-ng invokes several FileSystem RPCs once per file, and some of 
them are unnecessary.

These per-file calls make the file discovery phase slow: it can take hours for 
a few thousand files.

The RPCs that can be removed are:

getFileChecksum() - the value doesn't appear to be used.

getFileStatus() - this is called to get the modification time in 
ModTimeDataFileVersionStrategy.getVersion(). The modification time is already 
available from listStatus(), so that value can be reused instead.
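To illustrate the RPC savings, here is a minimal sketch that counts calls against a fake file system. FileMeta, CountingFs and ModTimeVersionSketch are hypothetical stand-ins for Hadoop's FileStatus and FileSystem, reduced to the two calls discussed above; this is not Gobblin's actual ModTimeDataFileVersionStrategy code. The first method fetches each file's status again after listing (one extra RPC per file). The second reuses the modification time already carried by the listing entry.

```java
import java.util.ArrayList;
import java.util.List;

public class ModTimeVersionSketch {
    // Hypothetical stand-in for Hadoop's FileStatus: path plus modification time.
    record FileMeta(String path, long modificationTime) {}

    // Hypothetical stand-in for a FileSystem; every method call counts as one RPC.
    static class CountingFs {
        private final List<FileMeta> files;
        int rpcCount = 0;

        CountingFs(List<FileMeta> files) { this.files = files; }

        // One RPC returns metadata (including modification time) for all files.
        List<FileMeta> listStatus() {
            rpcCount++;
            return files;
        }

        // One RPC per call, returning metadata for a single file.
        FileMeta getFileStatus(String path) {
            rpcCount++;
            for (FileMeta f : files) {
                if (f.path().equals(path)) return f;
            }
            throw new IllegalArgumentException("no such file: " + path);
        }
    }

    // Before: the listing is used only for paths; each version lookup
    // re-fetches the file's status, costing one extra RPC per file.
    static List<Long> versionsWithExtraRpc(CountingFs fs) {
        List<Long> versions = new ArrayList<>();
        for (FileMeta listed : fs.listStatus()) {
            versions.add(fs.getFileStatus(listed.path()).modificationTime());
        }
        return versions;
    }

    // After: reuse the modification time already returned by listStatus().
    static List<Long> versionsFromListing(CountingFs fs) {
        List<Long> versions = new ArrayList<>();
        for (FileMeta listed : fs.listStatus()) {
            versions.add(listed.modificationTime());
        }
        return versions;
    }

    public static void main(String[] args) {
        List<FileMeta> files = List.of(
            new FileMeta("/data/a", 100L),
            new FileMeta("/data/b", 200L),
            new FileMeta("/data/c", 300L));

        CountingFs before = new CountingFs(files);
        CountingFs after = new CountingFs(files);
        // prints "[100, 200, 300] rpcs=4"
        System.out.println(versionsWithExtraRpc(before) + " rpcs=" + before.rpcCount);
        // prints "[100, 200, 300] rpcs=1"
        System.out.println(versionsFromListing(after) + " rpcs=" + after.rpcCount);
    }
}
```

Both variants return the same versions, but the RPC count drops from N + 1 to 1 for N files in a directory, which is where the per-file discovery cost goes away.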



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
