[ https://issues.apache.org/jira/browse/GOBBLIN-1057?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Hung Tran updated GOBBLIN-1057:
-------------------------------
    Summary: Remove unnecessary RPCs in distcp-ng  (was: Optimize unnecessary RPCs in distcp-ng)

> Remove unnecessary RPCs in distcp-ng
> ------------------------------------
>
>                 Key: GOBBLIN-1057
>                 URL: https://issues.apache.org/jira/browse/GOBBLIN-1057
>             Project: Apache Gobblin
>          Issue Type: Improvement
>            Reporter: Hung Tran
>            Priority: Major
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> There are some per-file FileSystem RPCs being invoked in Gobblin distcp-ng.
> This results in a long file discovery phase that can be hours for a few
> thousand files.
> The RPCs that can be removed are:
> getFileChecksum() - the value doesn't appear to be used.
> getFileStatus() - this is called to get the modification time in
> ModTimeDataFileVersionStrategy.getVersion(). The modification time is already
> available from listStatus(), so use that value.
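
To illustrate why the per-file getFileStatus() call is redundant, here is a minimal, self-contained sketch. It does not use Gobblin or Hadoop classes: CountingFs, Entry, versionsWithExtraRpc, and versionsFromListing are hypothetical names made up for this example, and the "RPC" is only a counter on an in-memory map. The assumption is that a listing call already returns each file's modification time, as the description says of listStatus().

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class ModTimeSketch {

    /** Hypothetical stand-in for the per-file metadata returned by a listing. */
    static final class Entry {
        final String path;
        final long modificationTime;

        Entry(String path, long modificationTime) {
            this.path = path;
            this.modificationTime = modificationTime;
        }
    }

    /** Hypothetical in-memory file system that counts simulated RPCs. */
    static final class CountingFs {
        private final Map<String, Long> modTimes = new LinkedHashMap<>();
        int rpcCount = 0;

        void addFile(String path, long modTime) {
            modTimes.put(path, modTime);
        }

        /** One simulated RPC that returns metadata for every file. */
        List<Entry> listStatus() {
            rpcCount++;
            List<Entry> out = new ArrayList<>();
            for (Map.Entry<String, Long> e : modTimes.entrySet()) {
                out.add(new Entry(e.getKey(), e.getValue()));
            }
            return out;
        }

        /** One simulated RPC per file. */
        Entry getFileStatus(String path) {
            rpcCount++;
            return new Entry(path, modTimes.get(path));
        }
    }

    /** Pattern to avoid: re-fetch the status of every file that was just listed. */
    static List<Long> versionsWithExtraRpc(CountingFs fs) {
        List<Long> versions = new ArrayList<>();
        for (Entry listed : fs.listStatus()) {
            versions.add(fs.getFileStatus(listed.path).modificationTime);
        }
        return versions;
    }

    /** Suggested pattern: reuse the modification time carried by the listing. */
    static List<Long> versionsFromListing(CountingFs fs) {
        List<Long> versions = new ArrayList<>();
        for (Entry listed : fs.listStatus()) {
            versions.add(listed.modificationTime);
        }
        return versions;
    }

    public static void main(String[] args) {
        CountingFs fs = new CountingFs();
        for (int i = 0; i < 1000; i++) {
            fs.addFile("/data/file-" + i, 1_000L + i);
        }

        versionsWithExtraRpc(fs);
        System.out.println("per-file lookups: " + fs.rpcCount + " simulated RPCs");

        fs.rpcCount = 0;
        versionsFromListing(fs);
        System.out.println("listing only: " + fs.rpcCount + " simulated RPCs");
    }
}
```

With 1000 files in a single directory, the first variant issues 1001 simulated calls and the second issues 1, while both return the same modification times. Real listings may span several directories, so the actual saving in distcp-ng would depend on the dataset layout.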