Hung Tran created GOBBLIN-1057:
----------------------------------

             Summary: Optimize unnecessary RPCs in distcp-ng
                 Key: GOBBLIN-1057
                 URL: https://issues.apache.org/jira/browse/GOBBLIN-1057
             Project: Apache Gobblin
          Issue Type: Improvement
            Reporter: Hung Tran

Gobblin distcp-ng invokes several FileSystem RPCs per file. As a result, the file discovery phase is slow and can take hours for a few thousand files.

The RPCs that can be removed are:

- getFileChecksum(): the returned value does not appear to be used.
- getFileStatus(): this is called to get the modification time in ModTimeDataFileVersionStrategy.getVersion(). The modification time is already available from listStatus(), so that value should be used instead.
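
The getFileStatus() point can be illustrated with a small, self-contained sketch. It does not use Gobblin or Hadoop classes: `Status` and `CountingFs` are hypothetical stand-ins that only mimic the shape of Hadoop's FileStatus and FileSystem, and the RPC counter is an assumption added to make the cost visible. The idea is that a single listing call already returns each file's modification time, so looking it up again per file adds one round trip per file for no new information.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Sketch only: stand-in types, not the real Hadoop or Gobblin classes. */
class ModTimeSketch {

    /** Hypothetical stand-in for a file status entry (path + modification time). */
    static final class Status {
        final String path;
        final long modificationTime;

        Status(String path, long modificationTime) {
            this.path = path;
            this.modificationTime = modificationTime;
        }
    }

    /** Hypothetical in-memory file system that counts simulated RPCs. */
    static final class CountingFs {
        private final Map<String, Long> files = new LinkedHashMap<>();
        int rpcCount = 0;

        CountingFs(int fileCount) {
            for (int i = 0; i < fileCount; i++) {
                files.put("/data/part-" + i, 1000L + i);
            }
        }

        /** One simulated RPC returns the status of every file. */
        List<Status> listStatus() {
            rpcCount++;
            List<Status> out = new ArrayList<>();
            for (Map.Entry<String, Long> e : files.entrySet()) {
                out.add(new Status(e.getKey(), e.getValue()));
            }
            return out;
        }

        /** One simulated RPC per call, for a single file. */
        Status getFileStatus(String path) {
            rpcCount++;
            return new Status(path, files.get(path));
        }
    }

    /** Before: list the files, then re-fetch each status to read its mod time. */
    static Map<String, Long> versionsWithExtraRpc(CountingFs fs) {
        Map<String, Long> versions = new LinkedHashMap<>();
        for (Status s : fs.listStatus()) {
            versions.put(s.path, fs.getFileStatus(s.path).modificationTime);
        }
        return versions;
    }

    /** After: reuse the modification time already present in the listing. */
    static Map<String, Long> versionsFromListing(CountingFs fs) {
        Map<String, Long> versions = new LinkedHashMap<>();
        for (Status s : fs.listStatus()) {
            versions.put(s.path, s.modificationTime);
        }
        return versions;
    }
}
```

In this sketch both methods produce the same path-to-version map, but the first issues one simulated call for the listing plus one per file, while the second issues only the listing call. How the listing result would be threaded through to ModTimeDataFileVersionStrategy.getVersion() in the actual code is not covered here.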