[jira] [Work logged] (GOBBLIN-759) DistCP files modified in last n days within a look back period
[ https://issues.apache.org/jira/browse/GOBBLIN-759?focusedWorklogId=400864&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-400864 ] ASF GitHub Bot logged work on GOBBLIN-759: -- Author: ASF GitHub Bot Created on: 10/Mar/20 17:15 Start Date: 10/Mar/20 17:15 Worklog Time Spent: 10m Work Description: codecov-io commented on issue #2633: GOBBLIN-759: Added feature to support DistCP to copy files that were … URL: https://github.com/apache/incubator-gobblin/pull/2633#issuecomment-586633838 # [Codecov](https://codecov.io/gh/apache/incubator-gobblin/pull/2633?src=pr&el=h1) Report > :exclamation: No coverage uploaded for pull request base (`master@bca2e1f`). [Click here to learn what that means](https://docs.codecov.io/docs/error-reference#section-missing-base-commit). > The diff coverage is `0%`. [![Impacted file tree graph](https://codecov.io/gh/apache/incubator-gobblin/pull/2633/graphs/tree.svg?width=650&token=4MgURJ0bGc&height=150&src=pr)](https://codecov.io/gh/apache/incubator-gobblin/pull/2633?src=pr&el=tree) ```diff @@ Coverage Diff@@ ## master #2633 +/- ## Coverage ? 4.13% Complexity? 751 Files ?1937 Lines ? 72988 Branches ?8051 Hits ?3017 Misses? 69652 Partials ? 
319 ``` | [Impacted Files](https://codecov.io/gh/apache/incubator-gobblin/pull/2633?src=pr&el=tree) | Coverage Δ | Complexity Δ | | |---|---|---|---| | [...sion/finder/HdfsModifiedTimeHiveVersionFinder.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2633/diff?src=pr&el=tree#diff-Z29iYmxpbi1kYXRhLW1hbmFnZW1lbnQvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2dvYmJsaW4vZGF0YS9tYW5hZ2VtZW50L3ZlcnNpb24vZmluZGVyL0hkZnNNb2RpZmllZFRpbWVIaXZlVmVyc2lvbkZpbmRlci5qYXZh) | `23.07% <ø> (ø)` | `1 <0> (?)` | | | [...writer/partitioner/TimeBasedWriterPartitioner.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2633/diff?src=pr&el=tree#diff-Z29iYmxpbi1jb3JlL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9nb2JibGluL3dyaXRlci9wYXJ0aXRpb25lci9UaW1lQmFzZWRXcml0ZXJQYXJ0aXRpb25lci5qYXZh) | `0% <ø> (ø)` | `0 <0> (?)` | | | [...he/gobblin/cluster/TaskRunnerSuiteThreadModel.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2633/diff?src=pr&el=tree#diff-Z29iYmxpbi1jbHVzdGVyL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9nb2JibGluL2NsdXN0ZXIvVGFza1J1bm5lclN1aXRlVGhyZWFkTW9kZWwuamF2YQ==) | `0% <ø> (ø)` | `0 <0> (?)` | | | [.../java/org/apache/gobblin/hive/HiveLockFactory.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2633/diff?src=pr&el=tree#diff-Z29iYmxpbi1oaXZlLXJlZ2lzdHJhdGlvbi9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvZ29iYmxpbi9oaXZlL0hpdmVMb2NrRmFjdG9yeS5qYXZh) | `0% <ø> (ø)` | `0 <0> (?)` | | | [...lin/hive/metastore/HiveMetaStoreBasedRegister.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2633/diff?src=pr&el=tree#diff-Z29iYmxpbi1oaXZlLXJlZ2lzdHJhdGlvbi9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvZ29iYmxpbi9oaXZlL21ldGFzdG9yZS9IaXZlTWV0YVN0b3JlQmFzZWRSZWdpc3Rlci5qYXZh) | `0% <ø> (ø)` | `0 <0> (?)` | | | [...pache/gobblin/configuration/ConfigurationKeys.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2633/diff?src=pr&el=tree#diff-Z29iYmxpbi1hcGkvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2dvYmJsaW4vY29uZmlndXJhdGlvbi9Db25maWd1cmF0aW9uS2V5cy5qYXZh) | `0% <ø> (ø)` | `0 <0> (?)` | | | 
[.../org/apache/gobblin/hive/HiveRegistrationUnit.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2633/diff?src=pr&el=tree#diff-Z29iYmxpbi1oaXZlLXJlZ2lzdHJhdGlvbi9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvZ29iYmxpbi9oaXZlL0hpdmVSZWdpc3RyYXRpb25Vbml0LmphdmE=) | `0% <ø> (ø)` | `0 <0> (?)` | | | [.../org/apache/gobblin/service/ServiceConfigKeys.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2633/diff?src=pr&el=tree#diff-Z29iYmxpbi1hcGkvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2dvYmJsaW4vc2VydmljZS9TZXJ2aWNlQ29uZmlnS2V5cy5qYXZh) | `0% <ø> (ø)` | `0 <0> (?)` | | | [...ain/java/org/apache/gobblin/writer/DataWriter.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2633/diff?src=pr&el=tree#diff-Z29iYmxpbi1hcGkvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2dvYmJsaW4vd3JpdGVyL0RhdGFXcml0ZXIuamF2YQ==) | `0% <ø> (ø)` | `0 <0> (?)` | | | [...ain/java/org/apache/gobblin/hive/HiveLockImpl.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2633/diff?src=pr&el=tree#diff-Z29iYmxpbi1oaXZlLXJlZ2lzdHJhdGlvbi9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvZ29iYmxpbi9oaXZlL0hpdmVMb2NrSW1wbC5qYXZh) | `0% <ø> (ø)` | `0 <0> (?)` | | | ... and [129 more](https://codecov.io/gh/apache/incubator-gobblin/pull
[jira] [Work logged] (GOBBLIN-759) DistCP files modified in last n days within a look back period
[ https://issues.apache.org/jira/browse/GOBBLIN-759?focusedWorklogId=399988&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-399988 ] ASF GitHub Bot logged work on GOBBLIN-759: -- Author: ASF GitHub Bot Created on: 09/Mar/20 06:36 Start Date: 09/Mar/20 06:36 Worklog Time Spent: 10m Work Description: amarnathkarthik commented on pull request #2633: GOBBLIN-759: Added feature to support DistCP to copy files that were … URL: https://github.com/apache/incubator-gobblin/pull/2633#discussion_r389482730 ## File path: gobblin-data-management/src/main/java/org/apache/gobblin/data/management/copy/TimestampBasedCopyableDataset.java ## @@ -73,6 +71,7 @@ private final VersionSelectionPolicy versionSelectionPolicy; private final ExecutorService executor; private final FileSystem srcFs; + private final CopyableFileFilter copyableFileFilter; Review comment: Thanks for the reference. AndPathFilter and CopyableFileFilter are two different interfaces and did not find a way to merge. AndPathFilter implements accept(..) whereas CopyableFileFilter implements filter(..). Please advise. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 399988) Time Spent: 7h 40m (was: 7.5h) > DistCP files modified in last n days within a look back period > -- > > Key: GOBBLIN-759 > URL: https://issues.apache.org/jira/browse/GOBBLIN-759 > Project: Apache Gobblin > Issue Type: Improvement >Reporter: Karthik Amarnath >Priority: Major > Time Spent: 7h 40m > Remaining Estimate: 0h > > *Feature Request:* > # DistCP only the files modified in last n days within the look back window. > # DistCP will copy only the files modified even when the source file which > were NOT modified in last n days in the destination directory. 
-- This message was sent by Atlassian Jira (v8.3.4#803005)
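A note on the accept(..) versus filter(..) question raised in the review comment above: one common way to bridge a per-item predicate and a collection-level filter is a small adapter. The sketch below is only an illustration under stated assumptions. `ItemFilter` and `BatchFilter` are hypothetical stand-ins defined here for the example; they are not the real AndPathFilter or CopyableFileFilter interfaces, whose actual signatures should be checked in the Gobblin source.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.List;

public class PerItemToBatchFilter {

  /** Hypothetical per-item filter, in the style of an accept(..) method. */
  public interface ItemFilter<T> {
    boolean accept(T item);
  }

  /** Hypothetical collection-level filter, in the style of a filter(..) method. */
  public interface BatchFilter<T> {
    Collection<T> filter(Collection<T> items);
  }

  /** Combines two per-item filters; an item passes only if both accept it. */
  public static <T> ItemFilter<T> and(ItemFilter<T> first, ItemFilter<T> second) {
    return item -> first.accept(item) && second.accept(item);
  }

  /** Adapts a per-item filter into a collection-level filter. */
  public static <T> BatchFilter<T> asBatchFilter(ItemFilter<T> itemFilter) {
    return items -> {
      List<T> kept = new ArrayList<>();
      for (T item : items) {
        if (itemFilter.accept(item)) {
          kept.add(item);
        }
      }
      return kept;
    };
  }

  public static void main(String[] args) {
    ItemFilter<String> isAvro = name -> name.endsWith(".avro");
    ItemFilter<String> notHidden = name -> !name.startsWith(".");
    BatchFilter<String> batch = asBatchFilter(and(isAvro, notHidden));
    // prints [file1.avro]
    System.out.println(batch.filter(Arrays.asList("file1.avro", ".tmp.avro", "notes.txt")));
  }
}
```

Whether an adapter like this fits the PR depends on what each real interface receives (paths versus copyable files, plus source and target file systems), so treat it as a shape to compare against rather than a drop-in.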
[jira] [Work logged] (GOBBLIN-759) DistCP files modified in last n days within a look back period
[ https://issues.apache.org/jira/browse/GOBBLIN-759?focusedWorklogId=399987&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-399987 ] ASF GitHub Bot logged work on GOBBLIN-759: -- Author: ASF GitHub Bot Created on: 09/Mar/20 06:27 Start Date: 09/Mar/20 06:27 Worklog Time Spent: 10m Work Description: amarnathkarthik commented on pull request #2633: GOBBLIN-759: Added feature to support DistCP to copy files that were … URL: https://github.com/apache/incubator-gobblin/pull/2633#discussion_r389480809 ## File path: gobblin-data-management/src/main/java/org/apache/gobblin/data/management/copy/TimestampBasedCopyableDataset.java ## @@ -121,8 +119,8 @@ public TimestampBasedCopyableDataset(FileSystem fs, Properties props, Path datas ConcurrentLinkedQueue copyableFileList = new ConcurrentLinkedQueue<>(); List> futures = Lists.newArrayList(); for (TimestampedDatasetVersion copyableVersion : copyableVersions) { - futures.add(this.executor.submit(this.getCopyableFileGenetator(targetFs, configuration, copyableVersion, - copyableFileList))); + futures.add(this.executor.submit( + this.getCopyableFileGenetator(targetFs, configuration, copyableVersion, copyableFileList))); Review comment: Its existing code, but fixed. Issue Time Tracking --- Worklog Id: (was: 399987) Time Spent: 7.5h (was: 7h 20m)
[jira] [Work logged] (GOBBLIN-759) DistCP files modified in last n days within a look back period
[ https://issues.apache.org/jira/browse/GOBBLIN-759?focusedWorklogId=399986&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-399986 ] ASF GitHub Bot logged work on GOBBLIN-759: -- Author: ASF GitHub Bot Created on: 09/Mar/20 06:26 Start Date: 09/Mar/20 06:26 Worklog Time Spent: 10m Work Description: amarnathkarthik commented on pull request #2633: GOBBLIN-759: Added feature to support DistCP to copy files that were … URL: https://github.com/apache/incubator-gobblin/pull/2633#discussion_r389480643 ## File path: gobblin-data-management/src/test/java/org/apache/gobblin/data/management/copy/TimestampBasedCopyableDatasetTest.java ## @@ -91,12 +110,82 @@ public void testConfigOptions() { TimeBasedCopyPolicyForTest.class.getName()); } + @Test + public void testCopyWithFilter() throws IOException { + +/** source setup **/ +Path srcRoot = new Path(this.testTempPath, "src/data/dataset1/daily"); + +if (this.localFs.exists(srcRoot)) { + this.localFs.delete(srcRoot, true); +} + +List dateTimeList = Lists.newArrayList(); +IntStream.range(0, 4) +.forEach( +i -> dateTimeList.add(new DateTime(DateTimeZone.forID(ConfigurationKeys.PST_TIMEZONE_NAME)).minusDays(i))); + +String datePattern = "/MM/dd"; +DateTimeFormatter formatter = DateTimeFormat.forPattern(datePattern); + +for (DateTime dt : dateTimeList) { + String srcVersionPathStr = formatter.print(dt); + Path srcVersionPath = new Path(srcRoot, srcVersionPathStr); + this.localFs.mkdirs(srcVersionPath); + + Path srcfile = new Path(srcVersionPath, "file1.avro"); + this.localFs.create(srcfile); +} + +/** destination setup **/ +Path destRoot = new Path(this.testTempPath, "dest/data/dataset1"); +if (this.localFs.exists(destRoot)) { + this.localFs.delete(destRoot, true); +} +this.localFs.mkdirs(destRoot); + +Properties props = new Properties(); +props.setProperty(TimestampBasedCopyableDataset.COPY_POLICY, SelectBetweenTimeBasedPolicy.class.getName()); 
+props.setProperty(TimestampBasedCopyableDataset.DATASET_VERSION_FINDER, +DateTimeDatasetVersionFinder.class.getName()); + props.setProperty(SelectBetweenTimeBasedPolicy.TIME_BASED_SELECTION_MIN_LOOK_BACK_TIME_KEY, "1d"); + props.setProperty(SelectBetweenTimeBasedPolicy.TIME_BASED_SELECTION_MAX_LOOK_BACK_TIME_KEY, "6d"); +props.setProperty(DateTimeDatasetVersionFinder.DATE_TIME_PATTERN_KEY, "/MM/dd"); +props.setProperty("gobblin.dataset.copyable.file.filter.class", Review comment: org.apache.gobblin.data.management.dataset.DatasetUtils and org.apache.gobblin.data.management.copy.TimestampBasedCopyableDatasetTest are in different package, will change the access modifier to public. Issue Time Tracking --- Worklog Id: (was: 399986) Time Spent: 7h 20m (was: 7h 10m)
[jira] [Work logged] (GOBBLIN-759) DistCP files modified in last n days within a look back period
[ https://issues.apache.org/jira/browse/GOBBLIN-759?focusedWorklogId=398151&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-398151 ] ASF GitHub Bot logged work on GOBBLIN-759: -- Author: ASF GitHub Bot Created on: 05/Mar/20 05:32 Start Date: 05/Mar/20 05:32 Worklog Time Spent: 10m Work Description: sv2000 commented on pull request #2633: GOBBLIN-759: Added feature to support DistCP to copy files that were … URL: https://github.com/apache/incubator-gobblin/pull/2633#discussion_r388085632 ## File path: gobblin-data-management/src/test/java/org/apache/gobblin/data/management/copy/TimestampBasedCopyableDatasetTest.java ## @@ -91,12 +110,82 @@ public void testConfigOptions() { TimeBasedCopyPolicyForTest.class.getName()); } + @Test + public void testCopyWithFilter() throws IOException { + +/** source setup **/ +Path srcRoot = new Path(this.testTempPath, "src/data/dataset1/daily"); + +if (this.localFs.exists(srcRoot)) { + this.localFs.delete(srcRoot, true); +} + +List dateTimeList = Lists.newArrayList(); +IntStream.range(0, 4) +.forEach( +i -> dateTimeList.add(new DateTime(DateTimeZone.forID(ConfigurationKeys.PST_TIMEZONE_NAME)).minusDays(i))); + +String datePattern = "/MM/dd"; +DateTimeFormatter formatter = DateTimeFormat.forPattern(datePattern); + +for (DateTime dt : dateTimeList) { + String srcVersionPathStr = formatter.print(dt); + Path srcVersionPath = new Path(srcRoot, srcVersionPathStr); + this.localFs.mkdirs(srcVersionPath); + + Path srcfile = new Path(srcVersionPath, "file1.avro"); + this.localFs.create(srcfile); +} + +/** destination setup **/ +Path destRoot = new Path(this.testTempPath, "dest/data/dataset1"); +if (this.localFs.exists(destRoot)) { + this.localFs.delete(destRoot, true); +} +this.localFs.mkdirs(destRoot); + +Properties props = new Properties(); +props.setProperty(TimestampBasedCopyableDataset.COPY_POLICY, SelectBetweenTimeBasedPolicy.class.getName()); 
+props.setProperty(TimestampBasedCopyableDataset.DATASET_VERSION_FINDER, +DateTimeDatasetVersionFinder.class.getName()); + props.setProperty(SelectBetweenTimeBasedPolicy.TIME_BASED_SELECTION_MIN_LOOK_BACK_TIME_KEY, "1d"); + props.setProperty(SelectBetweenTimeBasedPolicy.TIME_BASED_SELECTION_MAX_LOOK_BACK_TIME_KEY, "6d"); +props.setProperty(DateTimeDatasetVersionFinder.DATE_TIME_PATTERN_KEY, "/MM/dd"); +props.setProperty("gobblin.dataset.copyable.file.filter.class", Review comment: Make DatasetUtils.COPYABLE_FILE_FILTER_KEY package private and use it here? Issue Time Tracking --- Worklog Id: (was: 398151)
[jira] [Work logged] (GOBBLIN-759) DistCP files modified in last n days within a look back period
[ https://issues.apache.org/jira/browse/GOBBLIN-759?focusedWorklogId=398150&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-398150 ] ASF GitHub Bot logged work on GOBBLIN-759: -- Author: ASF GitHub Bot Created on: 05/Mar/20 05:32 Start Date: 05/Mar/20 05:32 Worklog Time Spent: 10m Work Description: sv2000 commented on pull request #2633: GOBBLIN-759: Added feature to support DistCP to copy files that were … URL: https://github.com/apache/incubator-gobblin/pull/2633#discussion_r388083153 ## File path: gobblin-data-management/src/main/java/org/apache/gobblin/data/management/copy/TimestampBasedCopyableDataset.java ## @@ -121,8 +119,8 @@ public TimestampBasedCopyableDataset(FileSystem fs, Properties props, Path datas ConcurrentLinkedQueue copyableFileList = new ConcurrentLinkedQueue<>(); List> futures = Lists.newArrayList(); for (TimestampedDatasetVersion copyableVersion : copyableVersions) { - futures.add(this.executor.submit(this.getCopyableFileGenetator(targetFs, configuration, copyableVersion, - copyableFileList))); + futures.add(this.executor.submit( + this.getCopyableFileGenetator(targetFs, configuration, copyableVersion, copyableFileList))); Review comment: Typo: this.getCopyableFileGenerator Issue Time Tracking --- Worklog Id: (was: 398150) Time Spent: 7h 10m (was: 7h)
[jira] [Work logged] (GOBBLIN-759) DistCP files modified in last n days within a look back period
[ https://issues.apache.org/jira/browse/GOBBLIN-759?focusedWorklogId=398149&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-398149 ] ASF GitHub Bot logged work on GOBBLIN-759: -- Author: ASF GitHub Bot Created on: 05/Mar/20 05:32 Start Date: 05/Mar/20 05:32 Worklog Time Spent: 10m Work Description: sv2000 commented on pull request #2633: GOBBLIN-759: Added feature to support DistCP to copy files that were … URL: https://github.com/apache/incubator-gobblin/pull/2633#discussion_r388081914 ## File path: gobblin-data-management/src/main/java/org/apache/gobblin/data/management/copy/TimestampBasedCopyableDataset.java ## @@ -73,6 +71,7 @@ private final VersionSelectionPolicy versionSelectionPolicy; private final ExecutorService executor; private final FileSystem srcFs; + private final CopyableFileFilter copyableFileFilter; Review comment: Please take a look at AndPathFilter in gobblin-utility and UnixTimestampRecursiveCopyableDataset for an example usage. Issue Time Tracking --- Worklog Id: (was: 398149) Time Spent: 7h (was: 6h 50m)
[jira] [Work logged] (GOBBLIN-759) DistCP files modified in last n days within a look back period
[ https://issues.apache.org/jira/browse/GOBBLIN-759?focusedWorklogId=395366&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-395366 ] ASF GitHub Bot logged work on GOBBLIN-759: -- Author: ASF GitHub Bot Created on: 29/Feb/20 01:29 Start Date: 29/Feb/20 01:29 Worklog Time Spent: 10m Work Description: arjun4084346 commented on issue #2633: GOBBLIN-759: Added feature to support DistCP to copy files that were … URL: https://github.com/apache/incubator-gobblin/pull/2633#issuecomment-592804300 +1 LGTM Issue Time Tracking --- Worklog Id: (was: 395366) Time Spent: 6h 50m (was: 6h 40m)
[jira] [Work logged] (GOBBLIN-759) DistCP files modified in last n days within a look back period
[ https://issues.apache.org/jira/browse/GOBBLIN-759?focusedWorklogId=394646&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-394646 ] ASF GitHub Bot logged work on GOBBLIN-759: -- Author: ASF GitHub Bot Created on: 28/Feb/20 05:19 Start Date: 28/Feb/20 05:19 Worklog Time Spent: 10m Work Description: arjun4084346 commented on pull request #2633: GOBBLIN-759: Added feature to support DistCP to copy files that were … URL: https://github.com/apache/incubator-gobblin/pull/2633#discussion_r385511737 ## File path: gobblin-data-management/src/main/java/org/apache/gobblin/data/management/copy/DateRangeBasedFileFilter.java ## @@ -0,0 +1,109 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ +package org.apache.gobblin.data.management.copy; + +import com.google.common.base.Strings; +import com.google.common.collect.ImmutableList; +import java.util.Collection; +import java.util.Iterator; +import lombok.extern.slf4j.Slf4j; +import org.apache.gobblin.configuration.ConfigurationKeys; +import org.apache.hadoop.fs.FileSystem; +import org.joda.time.DateTime; +import org.joda.time.DateTimeZone; +import org.joda.time.Period; +import org.joda.time.format.PeriodFormatter; +import org.joda.time.format.PeriodFormatterBuilder; + + +/** + * A {@link CopyableFileFilter} that drops a {@link CopyableFile} if file modification time not within the lookback + * window + * sourceFs + */ +@Slf4j +public class DateRangeBasedFileFilter implements CopyableFileFilter { + + private Period minLookBackPeriod; + private Period maxLookBackPeriod; + private String timezone; + private DateTime currentTime; + private DateTime minLookBackTime; + private DateTime maxLookBackTime; + + public static final String CONFIGURATION_KEY_PREFIX = "gobblin.dataset.filter."; + public static final String DEFAULT_DATE_PATTERN_TIMEZONE = ConfigurationKeys.PST_TIMEZONE_NAME; + + public DateRangeBasedFileFilter(Period minLookback, Period maxLookback, String timezone) { +this.minLookBackPeriod = minLookback; +this.maxLookBackPeriod = maxLookback; +this.timezone = timezone; +this.currentTime = !Strings.isNullOrEmpty(this.timezone) ? 
DateTime.now(DateTimeZone.forID(this.timezone)) +: DateTime.now(DateTimeZone.forID(DEFAULT_DATE_PATTERN_TIMEZONE)); +this.minLookBackTime = this.currentTime.minus(this.minLookBackPeriod); +this.maxLookBackTime = this.currentTime.minus(this.maxLookBackPeriod); + } + + /** + * For every {@link CopyableFile} in copyableFiles checks if a + * {@link CopyableFile#getOrigin()#getPath()#getModificationTime()} + * + date between the min and max look back window on sourceFs {@inheritDoc} + * + * @see CopyableFileFilter#filter(FileSystem, + * FileSystem, Collection) + */ + @Override + public Collection filter(FileSystem sourceFs, FileSystem targetFs, + Collection copyableFiles) { +Iterator iterator = copyableFiles.iterator(); + +ImmutableList.Builder filtered = ImmutableList.builder(); + +while (iterator.hasNext()) { + CopyableFile file = iterator.next(); + if (isFileModifiedWithinLookBackPeriod(file.getOrigin().getModificationTime())) { +filtered.add(file); + } +} + +return filtered.build(); + } + + /** + * + * @param modTime file modification time in long. + * @return true if the file modification time within lookback window; + * false if file modification time not within lookback window. + * + */ + private boolean isFileModifiedWithinLookBackPeriod(long modTime) { +DateTime modifiedTime = +!Strings.isNullOrEmpty(this.timezone) ? new DateTime(modTime, DateTimeZone.forID(this.timezone)) +: new DateTime(modTime, DateTimeZone.forID(DEFAULT_DATE_PATTERN_TIMEZONE)); Review comment: I found this more readable Strings.isNullOrEmpty(this.timezone) ? new DateTime(modTime, DateTimeZone.forID(DEFAULT_DATE_PATTERN_TIMEZONE)) : new DateTime(modTime, DateTimeZone.forID(this.timezone)); just saying :D ---
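For readers following the look back logic in the diff above, here is a minimal, self-contained sketch of the same idea using only the JDK's java.time classes instead of Joda-Time. It is not the PR's implementation. The class name `LookbackWindow`, the inclusive bounds, and the `resolveZone` helper are assumptions made for illustration; `resolveZone` also shows the positive-condition-first ternary that the reviewer found more readable.

```java
import java.time.Duration;
import java.time.Instant;
import java.time.ZoneId;

public class LookbackWindow {

  private final Instant earliest; // now minus the max look back
  private final Instant latest;   // now minus the min look back

  public LookbackWindow(Instant now, Duration minLookback, Duration maxLookback) {
    if (minLookback.compareTo(maxLookback) > 0) {
      throw new IllegalArgumentException("min look back must not exceed max look back");
    }
    this.latest = now.minus(minLookback);
    this.earliest = now.minus(maxLookback);
  }

  /** Returns true if the modification time falls inside the window (bounds inclusive, by assumption). */
  public boolean contains(long modTimeMillis) {
    Instant modified = Instant.ofEpochMilli(modTimeMillis);
    return !modified.isBefore(earliest) && !modified.isAfter(latest);
  }

  /** Falls back to a default zone when none is configured; the empty/null case is tested first. */
  public static ZoneId resolveZone(String timezone, ZoneId fallback) {
    return (timezone == null || timezone.isEmpty()) ? fallback : ZoneId.of(timezone);
  }

  public static void main(String[] args) {
    Instant now = Instant.parse("2020-03-10T00:00:00Z");
    LookbackWindow window = new LookbackWindow(now, Duration.ofDays(1), Duration.ofDays(6));
    // prints true: two days old is between 1 and 6 days back
    System.out.println(window.contains(now.minus(Duration.ofDays(2)).toEpochMilli()));
    // prints false: seven days old is older than the max look back
    System.out.println(window.contains(now.minus(Duration.ofDays(7)).toEpochMilli()));
  }
}
```

One design point worth noting: comparing two instants does not depend on a time zone, so in a sketch like this the zone only matters if the window edges are derived from calendar fields (for example, start of day) rather than from fixed durations.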
[jira] [Work logged] (GOBBLIN-759) DistCP files modified in last n days within a look back period
[ https://issues.apache.org/jira/browse/GOBBLIN-759?focusedWorklogId=394647&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-394647 ] ASF GitHub Bot logged work on GOBBLIN-759: -- Author: ASF GitHub Bot Created on: 28/Feb/20 05:19 Start Date: 28/Feb/20 05:19 Worklog Time Spent: 10m Work Description: arjun4084346 commented on pull request #2633: GOBBLIN-759: Added feature to support DistCP to copy files that were … URL: https://github.com/apache/incubator-gobblin/pull/2633#discussion_r385511940 ## File path: gobblin-data-management/src/main/java/org/apache/gobblin/data/management/copy/ModifiedDateRangeBasedFileFilter.java ## @@ -0,0 +1,55 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.gobblin.data.management.copy; + +import java.util.Properties; + +import lombok.extern.slf4j.Slf4j; + +import org.joda.time.DateTime; +import org.joda.time.Period; + + +/** + * A {@link CopyableFileFilter} that drops a {@link CopyableFile} if file modification time not within the lookback Review comment: same. "in" This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
Issue Time Tracking --- Worklog Id: (was: 394647) Time Spent: 6h 40m (was: 6.5h)
[jira] [Work logged] (GOBBLIN-759) DistCP files modified in last n days within a look back period
[ https://issues.apache.org/jira/browse/GOBBLIN-759?focusedWorklogId=394645&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-394645 ] ASF GitHub Bot logged work on GOBBLIN-759: -- Author: ASF GitHub Bot Created on: 28/Feb/20 05:10 Start Date: 28/Feb/20 05:10 Worklog Time Spent: 10m Work Description: arjun4084346 commented on pull request #2633: GOBBLIN-759: Added feature to support DistCP to copy files that were … URL: https://github.com/apache/incubator-gobblin/pull/2633#discussion_r385509959 ## File path: gobblin-data-management/src/main/java/org/apache/gobblin/data/management/copy/DateRangeBasedFileFilter.java ## @@ -0,0 +1,109 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ +package org.apache.gobblin.data.management.copy; + +import com.google.common.base.Strings; +import com.google.common.collect.ImmutableList; +import java.util.Collection; +import java.util.Iterator; +import lombok.extern.slf4j.Slf4j; +import org.apache.gobblin.configuration.ConfigurationKeys; +import org.apache.hadoop.fs.FileSystem; +import org.joda.time.DateTime; +import org.joda.time.DateTimeZone; +import org.joda.time.Period; +import org.joda.time.format.PeriodFormatter; +import org.joda.time.format.PeriodFormatterBuilder; + + +/** + * A {@link CopyableFileFilter} that drops a {@link CopyableFile} if file modification time not within the lookback Review comment: time is not .. Issue Time Tracking --- Worklog Id: (was: 394645) Time Spent: 6h 20m (was: 6h 10m)
[jira] [Work logged] (GOBBLIN-759) DistCP files modified in last n days within a look back period
[ https://issues.apache.org/jira/browse/GOBBLIN-759?focusedWorklogId=387991&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-387991 ] ASF GitHub Bot logged work on GOBBLIN-759: -- Author: ASF GitHub Bot Created on: 15/Feb/20 20:00 Start Date: 15/Feb/20 20:00 Worklog Time Spent: 10m Work Description: amarnathkarthik commented on issue #2633: GOBBLIN-759: Added feature to support DistCP to copy files that were … URL: https://github.com/apache/incubator-gobblin/pull/2633#issuecomment-586636155 @sv2000 Please review Issue Time Tracking --- Worklog Id: (was: 387991) Time Spent: 6h 10m (was: 6h)
[jira] [Work logged] (GOBBLIN-759) DistCP files modified in last n days within a look back period
[ https://issues.apache.org/jira/browse/GOBBLIN-759?focusedWorklogId=387989&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-387989 ] ASF GitHub Bot logged work on GOBBLIN-759: -- Author: ASF GitHub Bot Created on: 15/Feb/20 19:33 Start Date: 15/Feb/20 19:33 Worklog Time Spent: 10m Work Description: codecov-io commented on issue #2633: GOBBLIN-759: Added feature to support DistCP to copy files that were … URL: https://github.com/apache/incubator-gobblin/pull/2633#issuecomment-586633838

# [Codecov](https://codecov.io/gh/apache/incubator-gobblin/pull/2633?src=pr&el=h1) Report
> :exclamation: No coverage uploaded for pull request base (`master@bca2e1f`). [Click here to learn what that means](https://docs.codecov.io/docs/error-reference#section-missing-base-commit).
> The diff coverage is `84.61%`.

[![Impacted file tree graph](https://codecov.io/gh/apache/incubator-gobblin/pull/2633/graphs/tree.svg?width=650&token=4MgURJ0bGc&height=150&src=pr)](https://codecov.io/gh/apache/incubator-gobblin/pull/2633?src=pr&el=tree)

```diff
@@            Coverage Diff            @@
##             master    #2633   +/-   ##
=========================================
  Coverage          ?   45.85%
  Complexity        ?     9161
=========================================
  Files             ?     1932
  Lines             ?    72659
  Branches          ?     7998
=========================================
  Hits              ?    33316
  Misses            ?    36302
  Partials          ?     3041
```

| [Impacted Files](https://codecov.io/gh/apache/incubator-gobblin/pull/2633?src=pr&el=tree) | Coverage Δ | Complexity Δ | |
|---|---|---|---|
| [.../copy/TimestampBasedCopyableGlobDatasetFinder.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2633/diff?src=pr&el=tree#diff-Z29iYmxpbi1kYXRhLW1hbmFnZW1lbnQvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2dvYmJsaW4vZGF0YS9tYW5hZ2VtZW50L2NvcHkvVGltZXN0YW1wQmFzZWRDb3B5YWJsZUdsb2JEYXRhc2V0RmluZGVyLmphdmE=) | `0% <0%> (ø)` | `0 <0> (?)` | |
| [.../gobblin/data/management/dataset/DatasetUtils.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2633/diff?src=pr&el=tree#diff-Z29iYmxpbi1kYXRhLW1hbmFnZW1lbnQvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2dvYmJsaW4vZGF0YS9tYW5hZ2VtZW50L2RhdGFzZXQvRGF0YXNldFV0aWxzLmphdmE=) | `55.88% <100%> (ø)` | `6 <0> (?)` | |
| [...agement/copy/ModifiedDateRangeBasedFileFilter.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2633/diff?src=pr&el=tree#diff-Z29iYmxpbi1kYXRhLW1hbmFnZW1lbnQvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2dvYmJsaW4vZGF0YS9tYW5hZ2VtZW50L2NvcHkvTW9kaWZpZWREYXRlUmFuZ2VCYXNlZEZpbGVGaWx0ZXIuamF2YQ==) | `75% <75%> (ø)` | `4 <4> (?)` | |
| [...management/copy/TimestampBasedCopyableDataset.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2633/diff?src=pr&el=tree#diff-Z29iYmxpbi1kYXRhLW1hbmFnZW1lbnQvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2dvYmJsaW4vZGF0YS9tYW5hZ2VtZW50L2NvcHkvVGltZXN0YW1wQmFzZWRDb3B5YWJsZURhdGFzZXQuamF2YQ==) | `83.52% <88.23%> (ø)` | `11 <0> (?)` | |
| [...data/management/copy/DateRangeBasedFileFilter.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2633/diff?src=pr&el=tree#diff-Z29iYmxpbi1kYXRhLW1hbmFnZW1lbnQvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2dvYmJsaW4vZGF0YS9tYW5hZ2VtZW50L2NvcHkvRGF0ZVJhbmdlQmFzZWRGaWxlRmlsdGVyLmphdmE=) | `89.28% <89.28%> (ø)` | `8 <8> (?)` | |
| [...anagement/policy/SelectBetweenTimeBasedPolicy.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2633/diff?src=pr&el=tree#diff-Z29iYmxpbi1kYXRhLW1hbmFnZW1lbnQvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2dvYmJsaW4vZGF0YS9tYW5hZ2VtZW50L3BvbGljeS9TZWxlY3RCZXR3ZWVuVGltZUJhc2VkUG9saWN5LmphdmE=) | `93.93% <90.47%> (ø)` | `9 <4> (?)` | |

--

[Continue to review full report at Codecov](https://codecov.io/gh/apache/incubator-gobblin/pull/2633?src=pr&el=continue).
> **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta)
> `Δ = absolute (impact)`, `ø = not affected`, `? = missing data`
> Powered by [Codecov](https://codecov.io/gh/apache/incubator-gobblin/pull/2633?src=pr&el=footer). Last update [bca2e1f...67344bb](https://codecov.io/gh/apache/incubator-gobblin/pull/2633?src=pr&el=lastupdated). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).

This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 387989) Time Spent: 6h (was: 5h 50m) > Di
[jira] [Work logged] (GOBBLIN-759) DistCP files modified in last n days within a look back period
[ https://issues.apache.org/jira/browse/GOBBLIN-759?focusedWorklogId=387982&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-387982 ] ASF GitHub Bot logged work on GOBBLIN-759: -- Author: ASF GitHub Bot Created on: 15/Feb/20 18:59 Start Date: 15/Feb/20 18:59 Worklog Time Spent: 10m Work Description: amarnathkarthik commented on pull request #2633: GOBBLIN-759: Added feature to support DistCP to copy files that were … URL: https://github.com/apache/incubator-gobblin/pull/2633#discussion_r379849411 ## File path: gobblin-data-management/src/main/java/org/apache/gobblin/data/management/copy/ModifiedDateRangeBasedFileFilter.java ## @@ -0,0 +1,116 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ +package org.apache.gobblin.data.management.copy; + +import java.util.Collection; +import java.util.Iterator; +import java.util.Properties; + +import lombok.extern.slf4j.Slf4j; + +import org.apache.hadoop.fs.FileSystem; +import org.joda.time.DateTime; +import org.joda.time.DateTimeZone; +import org.joda.time.Period; +import org.joda.time.format.PeriodFormatter; +import org.joda.time.format.PeriodFormatterBuilder; + +import com.google.common.collect.ImmutableList; + +import org.apache.gobblin.configuration.ConfigurationKeys; + + +/** + * A {@link CopyableFileFilter} that drops a {@link CopyableFile} if file modified time not between the loop back window + * sourceFs + */ +@Slf4j +public class ModifiedDateRangeBasedFileFilter implements CopyableFileFilter { + + private final Properties props; + private Period minLookBackPeriod; + private Period maxLookBackPeriod; + private DateTime currentTime; + private DateTime minLookBackTime; + private DateTime maxLookBackTime; + + public static final String CONFIGURATION_KEY_PREFIX = "gobblin.dataset.filter."; + public static final String MODIFIED_MIN_LOOK_BACK_TIME_KEY = + CONFIGURATION_KEY_PREFIX + "selection.modified.min.lookbackTime"; + public static final String MODIFIED_MAX_LOOK_BACK_TIME_KEY = + CONFIGURATION_KEY_PREFIX + "selection.modified.max.lookbackTime"; + public static final String DEFAULT_DATE_PATTERN_TIMEZONE = ConfigurationKeys.PST_TIMEZONE_NAME; + public static final String DATE_PATTERN_TIMEZONE_KEY = CONFIGURATION_KEY_PREFIX + "datetime.timezone"; + + public ModifiedDateRangeBasedFileFilter(Properties properties) { +this.props = properties; +PeriodFormatter periodFormatter = +new PeriodFormatterBuilder().appendDays().appendSuffix("d").appendHours().appendSuffix("h").toFormatter(); +this.minLookBackPeriod = props.containsKey(MODIFIED_MIN_LOOK_BACK_TIME_KEY) ? 
periodFormatter.parsePeriod( +props.getProperty(MODIFIED_MIN_LOOK_BACK_TIME_KEY)) : new Period(DateTime.now().getMillis()); +this.maxLookBackPeriod = props.containsKey(MODIFIED_MAX_LOOK_BACK_TIME_KEY) ? periodFormatter.parsePeriod( +props.getProperty(MODIFIED_MAX_LOOK_BACK_TIME_KEY)) : new Period(DateTime.now().minusDays(1).getMillis()); +this.currentTime = properties.containsKey(DATE_PATTERN_TIMEZONE_KEY) ? DateTime.now( +DateTimeZone.forID(props.getProperty(DATE_PATTERN_TIMEZONE_KEY))) +: DateTime.now(DateTimeZone.forID(DEFAULT_DATE_PATTERN_TIMEZONE)); +this.minLookBackTime = this.currentTime.minus(minLookBackPeriod); +this.maxLookBackTime = this.currentTime.minus(maxLookBackPeriod); + } + + /** + * For every {@link CopyableFile} in copyableFiles checks if a {@link CopyableFile#getOrigin()#getPath()#getModificationTime()} + * + date between the min and max look back window on sourceFs {@inheritDoc} + * + * @see CopyableFileFilter#filter(FileSystem, + * FileSystem, Collection) + */ + @Override + public Collection filter(FileSystem sourceFs, FileSystem targetFs, + Collection copyableFiles) { +Iterator iterator = copyableFiles.iterator(); + +ImmutableList.Builder filtered = ImmutableList.builder(); + +while (iterator.hasNext()) { + CopyableFile file = iterator.next(); + boolean fileWithInModWindow = isFileModifiedBtwLookBackPeriod(file.getOrigin().ge
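The look-back settings in the constructor above are written as a day part with suffix `d` followed by an hour part with suffix `h`. A minimal sketch of parsing that shape with only the JDK is shown below. `LookBackParserSketch` is a hypothetical name, java.time stands in for Joda-Time, and whether Joda-Time's `PeriodFormatter` accepts exactly the same inputs (for example a lone `3d`) is not verified here.

```java
import java.time.Duration;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical parser, not part of Gobblin: accepts "<N>d", "<N>h" or "<N>d<M>h".
final class LookBackParserSketch {
  private static final Pattern LOOK_BACK = Pattern.compile("(?:(\\d+)d)?(?:(\\d+)h)?");

  static Duration parse(String text) {
    Matcher m = LOOK_BACK.matcher(text);
    // The pattern also matches the empty string, so reject that explicitly.
    if (text.isEmpty() || !m.matches()) {
      throw new IllegalArgumentException("Expected a value such as 2d, 5h or 2d5h but got: " + text);
    }
    long days = m.group(1) == null ? 0 : Long.parseLong(m.group(1));
    long hours = m.group(2) == null ? 0 : Long.parseLong(m.group(2));
    // A day is treated as exactly 24 hours here; a calendar-aware period may differ across DST changes.
    return Duration.ofDays(days).plusHours(hours);
  }
}
```

Treating a day as a fixed 24 hours keeps the sketch simple; a period-based type like the one in the patch resolves days against a time zone instead.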
[jira] [Work logged] (GOBBLIN-759) DistCP files modified in last n days within a look back period
[ https://issues.apache.org/jira/browse/GOBBLIN-759?focusedWorklogId=387986&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-387986 ] ASF GitHub Bot logged work on GOBBLIN-759: -- Author: ASF GitHub Bot Created on: 15/Feb/20 18:59 Start Date: 15/Feb/20 18:59 Worklog Time Spent: 10m Work Description: amarnathkarthik commented on pull request #2633: GOBBLIN-759: Added feature to support DistCP to copy files that were … URL: https://github.com/apache/incubator-gobblin/pull/2633#discussion_r379849421 ## File path: gobblin-data-management/src/main/java/org/apache/gobblin/data/management/copy/ModifiedDateRangeBasedFileFilter.java ## @@ -0,0 +1,116 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ +package org.apache.gobblin.data.management.copy; + +import java.util.Collection; +import java.util.Iterator; +import java.util.Properties; + +import lombok.extern.slf4j.Slf4j; + +import org.apache.hadoop.fs.FileSystem; +import org.joda.time.DateTime; +import org.joda.time.DateTimeZone; +import org.joda.time.Period; +import org.joda.time.format.PeriodFormatter; +import org.joda.time.format.PeriodFormatterBuilder; + +import com.google.common.collect.ImmutableList; + +import org.apache.gobblin.configuration.ConfigurationKeys; + + +/** + * A {@link CopyableFileFilter} that drops a {@link CopyableFile} if file modified time not between the loop back window Review comment: Fixed Issue Time Tracking --- Worklog Id: (was: 387986) Time Spent: 5h 50m (was: 5h 40m)
[jira] [Work logged] (GOBBLIN-759) DistCP files modified in last n days within a look back period
[ https://issues.apache.org/jira/browse/GOBBLIN-759?focusedWorklogId=387980&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-387980 ] ASF GitHub Bot logged work on GOBBLIN-759: -- Author: ASF GitHub Bot Created on: 15/Feb/20 18:58 Start Date: 15/Feb/20 18:58 Worklog Time Spent: 10m Work Description: amarnathkarthik commented on pull request #2633: GOBBLIN-759: Added feature to support DistCP to copy files that were … URL: https://github.com/apache/incubator-gobblin/pull/2633#discussion_r379849362 ## File path: gobblin-data-management/src/main/java/org/apache/gobblin/data/management/copy/TimestampBasedCopyableDataset.java ## @@ -120,9 +118,10 @@ public TimestampBasedCopyableDataset(FileSystem fs, Properties props, Path datas Collection copyableVersions = this.versionSelectionPolicy.listSelectedVersions(versions); ConcurrentLinkedQueue copyableFileList = new ConcurrentLinkedQueue<>(); List> futures = Lists.newArrayList(); +//this.copyableFileFilter.filter(this.fs, targetFs, copyableFiles) Review comment: Fixed Issue Time Tracking --- Worklog Id: (was: 387980) Time Spent: 4h 50m (was: 4h 40m)
[jira] [Work logged] (GOBBLIN-759) DistCP files modified in last n days within a look back period
[ https://issues.apache.org/jira/browse/GOBBLIN-759?focusedWorklogId=387981&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-387981 ] ASF GitHub Bot logged work on GOBBLIN-759: -- Author: ASF GitHub Bot Created on: 15/Feb/20 18:58 Start Date: 15/Feb/20 18:58 Worklog Time Spent: 10m Work Description: amarnathkarthik commented on pull request #2633: GOBBLIN-759: Added feature to support DistCP to copy files that were … URL: https://github.com/apache/incubator-gobblin/pull/2633#discussion_r379849405 ## File path: gobblin-data-management/src/main/java/org/apache/gobblin/data/management/copy/ModifiedDateRangeBasedFileFilter.java ## @@ -0,0 +1,116 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ +package org.apache.gobblin.data.management.copy; + +import java.util.Collection; +import java.util.Iterator; +import java.util.Properties; + +import lombok.extern.slf4j.Slf4j; + +import org.apache.hadoop.fs.FileSystem; +import org.joda.time.DateTime; +import org.joda.time.DateTimeZone; +import org.joda.time.Period; +import org.joda.time.format.PeriodFormatter; +import org.joda.time.format.PeriodFormatterBuilder; + +import com.google.common.collect.ImmutableList; + +import org.apache.gobblin.configuration.ConfigurationKeys; + + +/** + * A {@link CopyableFileFilter} that drops a {@link CopyableFile} if file modified time not between the loop back window + * sourceFs + */ +@Slf4j +public class ModifiedDateRangeBasedFileFilter implements CopyableFileFilter { Review comment: Agree. Created DataRangeFileFilter as you suggested. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 387981) Time Spent: 5h (was: 4h 50m) > DistCP files modified in last n days within a look back period > -- > > Key: GOBBLIN-759 > URL: https://issues.apache.org/jira/browse/GOBBLIN-759 > Project: Apache Gobblin > Issue Type: Improvement >Reporter: Karthik Amarnath >Priority: Major > Time Spent: 5h > Remaining Estimate: 0h > > *Feature Request:* > # DistCP only the files modified in last n days within the look back window. > # DistCP will copy only the files modified even when the source file which > were NOT modified in last n days in the destination directory. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (GOBBLIN-759) DistCP files modified in last n days within a look back period
[ https://issues.apache.org/jira/browse/GOBBLIN-759?focusedWorklogId=387979&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-387979 ]

ASF GitHub Bot logged work on GOBBLIN-759:
--
Author: ASF GitHub Bot
Created on: 15/Feb/20 18:58
Start Date: 15/Feb/20 18:58
Worklog Time Spent: 10m
Work Description: amarnathkarthik commented on pull request #2633: GOBBLIN-759: Added feature to support DistCP to copy files that were …
URL: https://github.com/apache/incubator-gobblin/pull/2633#discussion_r379849356

## File path: gobblin-data-management/src/main/java/org/apache/gobblin/data/management/copy/TimestampBasedCopyableDataset.java ##
@@ -134,7 +133,11 @@ public TimestampBasedCopyableDataset(FileSystem fs, Properties props, Path datas
 } finally {
 ExecutorsUtils.shutdownExecutorService(executor, Optional.of(log));
 }
-return copyableFileList;
+
+ConcurrentLinkedQueue copyableFilesFilteredList = new ConcurrentLinkedQueue<>();

Review comment: Existing contract returns ConcurrentLinkedQueue object, therefore did not change the object type.

This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org

Issue Time Tracking
---
Worklog Id: (was: 387979)
Time Spent: 4h 40m (was: 4.5h)

> DistCP files modified in last n days within a look back period
> --
>
> Key: GOBBLIN-759
> URL: https://issues.apache.org/jira/browse/GOBBLIN-759
> Project: Apache Gobblin
> Issue Type: Improvement
> Reporter: Karthik Amarnath
> Priority: Major
> Time Spent: 4h 40m
> Remaining Estimate: 0h
>
> *Feature Request:*
> # DistCP only the files modified in last n days within the look back window.
> # DistCP will copy only the files modified even when the source file which
> were NOT modified in last n days in the destination directory.

--
This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (GOBBLIN-759) DistCP files modified in last n days within a look back period
[ https://issues.apache.org/jira/browse/GOBBLIN-759?focusedWorklogId=387977&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-387977 ] ASF GitHub Bot logged work on GOBBLIN-759: -- Author: ASF GitHub Bot Created on: 15/Feb/20 18:56 Start Date: 15/Feb/20 18:56 Worklog Time Spent: 10m Work Description: amarnathkarthik commented on pull request #2633: GOBBLIN-759: Added feature to support DistCP to copy files that were … URL: https://github.com/apache/incubator-gobblin/pull/2633#discussion_r379849293 ## File path: gobblin-data-management/src/main/java/org/apache/gobblin/data/management/policy/SelectBetweenTimeBasedPolicy.java ## @@ -94,17 +100,25 @@ public SelectBetweenTimeBasedPolicy(Optional minLookBackPeriod, Optional public boolean apply(TimestampedDatasetVersion version) { return version.getDateTime() .plus(SelectBetweenTimeBasedPolicy.this.maxLookBackPeriod.or(new Period(DateTime.now().getMillis( -.isAfterNow() -&& version.getDateTime().plus(SelectBetweenTimeBasedPolicy.this.minLookBackPeriod.or(new Period(0))) -.isBeforeNow(); +.isAfterNow() && version.getDateTime() +.plus(SelectBetweenTimeBasedPolicy.this.minLookBackPeriod.or(new Period(0))) +.isBeforeNow(); } }; } protected static Period getLookBackPeriod(String lookbackTime) { Review comment: For better readability, prefer to have this reformatting. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 387977) Time Spent: 4.5h (was: 4h 20m) > DistCP files modified in last n days within a look back period > -- > > Key: GOBBLIN-759 > URL: https://issues.apache.org/jira/browse/GOBBLIN-759 > Project: Apache Gobblin > Issue Type: Improvement >Reporter: Karthik Amarnath >Priority: Major > Time Spent: 4.5h > Remaining Estimate: 0h > > *Feature Request:* > # DistCP only the files modified in last n days within the look back window. > # DistCP will copy only the files modified even when the source file which > were NOT modified in last n days in the destination directory. -- This message was sent by Atlassian Jira (v8.3.4#803005)
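The reformatted predicate in the diff above keeps a version only when adding the maximum look-back period puts it after the current time and adding the minimum look-back period still leaves it before the current time. A minimal sketch of that window check follows, using `java.time` instead of Joda-Time so it runs without extra dependencies. The class `LookBackWindowSketch` and its `contains` method are hypothetical names for illustration and are not part of Gobblin.

```java
import java.time.Duration;
import java.time.Instant;

/** Hypothetical helper for illustration only; not a Gobblin class. */
final class LookBackWindowSketch {
  private LookBackWindowSketch() {}

  /**
   * Mirrors the shape of the predicate in the diff: the timestamp is accepted when
   * (timestamp + maxLookBack) is after now and (timestamp + minLookBack) is before now,
   * i.e. the timestamp lies strictly between (now - maxLookBack) and (now - minLookBack).
   */
  static boolean contains(Instant timestamp, Instant now, Duration minLookBack, Duration maxLookBack) {
    return timestamp.plus(maxLookBack).isAfter(now)
        && timestamp.plus(minLookBack).isBefore(now);
  }
}
```

Passing `now` explicitly, rather than reading the clock inside the method, is a choice made here to keep the check deterministic in tests; the code under review calls `isAfterNow()` and `isBeforeNow()` directly.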
[jira] [Work logged] (GOBBLIN-759) DistCP files modified in last n days within a look back period
[ https://issues.apache.org/jira/browse/GOBBLIN-759?focusedWorklogId=387975&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-387975 ]

ASF GitHub Bot logged work on GOBBLIN-759:
--
Author: ASF GitHub Bot
Created on: 15/Feb/20 18:56
Start Date: 15/Feb/20 18:56
Worklog Time Spent: 10m
Work Description: amarnathkarthik commented on pull request #2633: GOBBLIN-759: Added feature to support DistCP to copy files that were …
URL: https://github.com/apache/incubator-gobblin/pull/2633#discussion_r379849244

## File path: gobblin-data-management/src/main/java/org/apache/gobblin/data/management/copy/TimestampBasedCopyableDataset.java ##
@@ -73,6 +71,7 @@
 private final VersionSelectionPolicy versionSelectionPolicy;
 private final ExecutorService executor;
 private final FileSystem srcFs;
+ private final CopyableFileFilter copyableFileFilter;

Review comment: PathFilter interfaces do not support operation to merge the filter, also before filtering through the data range filter, hidden files are removed from the list by the existing control flow.

This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org

Issue Time Tracking
---
Worklog Id: (was: 387975)
Time Spent: 4h 10m (was: 4h)

> DistCP files modified in last n days within a look back period
> --
>
> Key: GOBBLIN-759
> URL: https://issues.apache.org/jira/browse/GOBBLIN-759
> Project: Apache Gobblin
> Issue Type: Improvement
> Reporter: Karthik Amarnath
> Priority: Major
> Time Spent: 4h 10m
> Remaining Estimate: 0h
>
> *Feature Request:*
> # DistCP only the files modified in last n days within the look back window.
> # DistCP will copy only the files modified even when the source file which
> were NOT modified in last n days in the destination directory.

--
This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (GOBBLIN-759) DistCP files modified in last n days within a look back period
[ https://issues.apache.org/jira/browse/GOBBLIN-759?focusedWorklogId=387976&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-387976 ] ASF GitHub Bot logged work on GOBBLIN-759: -- Author: ASF GitHub Bot Created on: 15/Feb/20 18:56 Start Date: 15/Feb/20 18:56 Worklog Time Spent: 10m Work Description: amarnathkarthik commented on pull request #2633: GOBBLIN-759: Added feature to support DistCP to copy files that were … URL: https://github.com/apache/incubator-gobblin/pull/2633#discussion_r379849249 ## File path: gobblin-data-management/src/test/java/org/apache/gobblin/data/management/copy/TimestampBasedCopyableDatasetTest.java ## @@ -91,12 +110,82 @@ public void testConfigOptions() { TimeBasedCopyPolicyForTest.class.getName()); } + @Test + public void testCopyWithFilter() throws IOException { + +/** source setup **/ +Path srcRoot = new Path(this.testTempPath, "src/slt/eqp/daily"); + +if (this.localFs.exists(srcRoot)) { + this.localFs.delete(srcRoot, true); +} + +List dateTimeList = Lists.newArrayList(); +IntStream.range(0, 4) +.forEach( +i -> dateTimeList.add(new DateTime(DateTimeZone.forID(ConfigurationKeys.PST_TIMEZONE_NAME)).minusDays(i))); + +String datePattern = "/MM/dd"; +DateTimeFormatter formatter = DateTimeFormat.forPattern(datePattern); + +for (DateTime dt : dateTimeList) { + String srcVersionPathStr = formatter.print(dt); + Path srcVersionPath = new Path(srcRoot, srcVersionPathStr); + this.localFs.mkdirs(srcVersionPath); + + Path srcfile = new Path(srcVersionPath, "file1.avro"); + this.localFs.create(srcfile); +} + +/** destination setup **/ +Path destRoot = new Path(this.testTempPath, "dest/slt/eqp"); Review comment: Fixed This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 387976) Time Spent: 4h 20m (was: 4h 10m) > DistCP files modified in last n days within a look back period > -- > > Key: GOBBLIN-759 > URL: https://issues.apache.org/jira/browse/GOBBLIN-759 > Project: Apache Gobblin > Issue Type: Improvement >Reporter: Karthik Amarnath >Priority: Major > Time Spent: 4h 20m > Remaining Estimate: 0h > > *Feature Request:* > # DistCP only the files modified in last n days within the look back window. > # DistCP will copy only the files modified even when the source file which > were NOT modified in last n days in the destination directory. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (GOBBLIN-759) DistCP files modified in last n days within a look back period
[ https://issues.apache.org/jira/browse/GOBBLIN-759?focusedWorklogId=312734&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-312734 ] ASF GitHub Bot logged work on GOBBLIN-759: -- Author: ASF GitHub Bot Created on: 15/Sep/19 22:47 Start Date: 15/Sep/19 22:47 Worklog Time Spent: 10m Work Description: sv2000 commented on pull request #2633: GOBBLIN-759: Added feature to support DistCP to copy files that were … URL: https://github.com/apache/incubator-gobblin/pull/2633#discussion_r324483538 ## File path: gobblin-data-management/src/main/java/org/apache/gobblin/data/management/copy/ModifiedDateRangeBasedFileFilter.java ## @@ -0,0 +1,116 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ +package org.apache.gobblin.data.management.copy; + +import java.util.Collection; +import java.util.Iterator; +import java.util.Properties; + +import lombok.extern.slf4j.Slf4j; + +import org.apache.hadoop.fs.FileSystem; +import org.joda.time.DateTime; +import org.joda.time.DateTimeZone; +import org.joda.time.Period; +import org.joda.time.format.PeriodFormatter; +import org.joda.time.format.PeriodFormatterBuilder; + +import com.google.common.collect.ImmutableList; + +import org.apache.gobblin.configuration.ConfigurationKeys; + + +/** + * A {@link CopyableFileFilter} that drops a {@link CopyableFile} if file modified time not between the loop back window + * sourceFs + */ +@Slf4j +public class ModifiedDateRangeBasedFileFilter implements CopyableFileFilter { Review comment: Looks like most of the logic inside this class can be moved to a parent class that implements a "DateRangeFileFilter". ModTimeDateRangeFileFilter can extend this class and pass modification time to filter files. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 312734) > DistCP files modified in last n days within a look back period > -- > > Key: GOBBLIN-759 > URL: https://issues.apache.org/jira/browse/GOBBLIN-759 > Project: Apache Gobblin > Issue Type: Improvement >Reporter: Karthik Amarnath >Priority: Major > Time Spent: 3.5h > Remaining Estimate: 0h > > *Feature Request:* > # DistCP only the files modified in last n days within the look back window. > # DistCP will copy only the files modified even when the source file which > were NOT modified in last n days in the destination directory. -- This message was sent by Atlassian Jira (v8.3.2#803003)
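The review comment above suggests moving the range logic into a parent "DateRangeFileFilter" and letting a modification-time subclass supply the timestamp. A rough sketch of that shape is given below, under stated assumptions: every class name here (`DateRangeFilterSketch`, `FileInfoSketch`, `ModTimeDateRangeFilterSketch`) is hypothetical, `java.time` stands in for Joda-Time, a plain value class stands in for Gobblin's `CopyableFile`, and the inclusive window bounds are an assumption rather than something the thread specifies.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;

/** Hypothetical parent class: owns the window arithmetic, knows nothing about which timestamp is used. */
abstract class DateRangeFilterSketch<T> {
  private final Instant earliest; // now - maxLookBack
  private final Instant latest;   // now - minLookBack

  protected DateRangeFilterSketch(Instant now, Duration minLookBack, Duration maxLookBack) {
    this.earliest = now.minus(maxLookBack);
    this.latest = now.minus(minLookBack);
  }

  /** Subclasses decide which timestamp of the item is compared against the window. */
  protected abstract Instant timestampOf(T item);

  /** Keeps items whose timestamp lies in [earliest, latest]; inclusive bounds are an assumption. */
  public List<T> filter(Collection<T> items) {
    List<T> kept = new ArrayList<>();
    for (T item : items) {
      Instant ts = timestampOf(item);
      if (!ts.isBefore(earliest) && !ts.isAfter(latest)) {
        kept.add(item);
      }
    }
    return kept;
  }
}

/** Stand-in for a copyable file; only carries what the sketch needs. */
final class FileInfoSketch {
  final String path;
  final long modificationTimeMillis;

  FileInfoSketch(String path, long modificationTimeMillis) {
    this.path = path;
    this.modificationTimeMillis = modificationTimeMillis;
  }
}

/** Hypothetical subclass: filters on the file's modification time. */
final class ModTimeDateRangeFilterSketch extends DateRangeFilterSketch<FileInfoSketch> {
  ModTimeDateRangeFilterSketch(Instant now, Duration minLookBack, Duration maxLookBack) {
    super(now, minLookBack, maxLookBack);
  }

  @Override
  protected Instant timestampOf(FileInfoSketch file) {
    return Instant.ofEpochMilli(file.modificationTimeMillis);
  }
}
```

With this split, a filter on some other timestamp would only need to override `timestampOf`, which appears to be the reuse the reviewer has in mind.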
[jira] [Work logged] (GOBBLIN-759) DistCP files modified in last n days within a look back period
[ https://issues.apache.org/jira/browse/GOBBLIN-759?focusedWorklogId=312736&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-312736 ]

ASF GitHub Bot logged work on GOBBLIN-759:
--
Author: ASF GitHub Bot
Created on: 15/Sep/19 22:47
Start Date: 15/Sep/19 22:47
Worklog Time Spent: 10m
Work Description: sv2000 commented on pull request #2633: GOBBLIN-759: Added feature to support DistCP to copy files that were …
URL: https://github.com/apache/incubator-gobblin/pull/2633#discussion_r324484636

## File path: gobblin-data-management/src/main/java/org/apache/gobblin/data/management/copy/TimestampBasedCopyableDataset.java ##
@@ -73,6 +71,7 @@
 private final VersionSelectionPolicy versionSelectionPolicy;
 private final ExecutorService executor;
 private final FileSystem srcFs;
+ private final CopyableFileFilter copyableFileFilter;

Review comment: This class already has a method copyableFileFilter() that returns a HiddenFilter. You can use AndPathFilter to merge this filter with the filter specified in member variable copyableFileFilter.

This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org

Issue Time Tracking
---
Worklog Id: (was: 312736)

> DistCP files modified in last n days within a look back period
> --
>
> Key: GOBBLIN-759
> URL: https://issues.apache.org/jira/browse/GOBBLIN-759
> Project: Apache Gobblin
> Issue Type: Improvement
> Reporter: Karthik Amarnath
> Priority: Major
> Time Spent: 3h 40m
> Remaining Estimate: 0h
>
> *Feature Request:*
> # DistCP only the files modified in last n days within the look back window.
> # DistCP will copy only the files modified even when the source file which
> were NOT modified in last n days in the destination directory.

--
This message was sent by Atlassian Jira (v8.3.2#803003)
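The suggestion above amounts to AND-ing two filters so that a file is kept only when both accept it. The sketch below illustrates that idea with the standard `java.util.function.Predicate` rather than Gobblin's own filter classes, whose exact signatures are not shown in this thread. The class `CombinedFilterSketch`, both predicates, and the rule that hidden names start with "." or "_" are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

/** Hypothetical illustration of AND-combining two filters; not Gobblin code. */
final class CombinedFilterSketch {
  private CombinedFilterSketch() {}

  /** Stand-in for a hidden-file filter: rejects names starting with "." or "_" (assumed convention). */
  static final Predicate<String> NOT_HIDDEN =
      name -> !name.startsWith(".") && !name.startsWith("_");

  /** Hypothetical second filter: accepts only Avro files. */
  static final Predicate<String> IS_AVRO = name -> name.endsWith(".avro");

  /** Keeps only the names accepted by both filters (logical AND). */
  static List<String> keepAccepted(List<String> names, Predicate<String> first, Predicate<String> second) {
    Predicate<String> both = first.and(second);
    List<String> kept = new ArrayList<>();
    for (String name : names) {
      if (both.test(name)) {
        kept.add(name);
      }
    }
    return kept;
  }
}
```

This only works when both filters share one per-item interface. A filter that operates on a whole collection at once would need to be applied as a separate step instead of being merged this way.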
[jira] [Work logged] (GOBBLIN-759) DistCP files modified in last n days within a look back period
[ https://issues.apache.org/jira/browse/GOBBLIN-759?focusedWorklogId=312732&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-312732 ] ASF GitHub Bot logged work on GOBBLIN-759: -- Author: ASF GitHub Bot Created on: 15/Sep/19 22:47 Start Date: 15/Sep/19 22:47 Worklog Time Spent: 10m Work Description: sv2000 commented on pull request #2633: GOBBLIN-759: Added feature to support DistCP to copy files that were … URL: https://github.com/apache/incubator-gobblin/pull/2633#discussion_r324483057 ## File path: gobblin-data-management/src/main/java/org/apache/gobblin/data/management/copy/ModifiedDateRangeBasedFileFilter.java ## @@ -0,0 +1,116 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ +package org.apache.gobblin.data.management.copy; + +import java.util.Collection; +import java.util.Iterator; +import java.util.Properties; + +import lombok.extern.slf4j.Slf4j; + +import org.apache.hadoop.fs.FileSystem; +import org.joda.time.DateTime; +import org.joda.time.DateTimeZone; +import org.joda.time.Period; +import org.joda.time.format.PeriodFormatter; +import org.joda.time.format.PeriodFormatterBuilder; + +import com.google.common.collect.ImmutableList; + +import org.apache.gobblin.configuration.ConfigurationKeys; + + +/** + * A {@link CopyableFileFilter} that drops a {@link CopyableFile} if file modified time not between the loop back window Review comment: "if file modified time..." -> "if file modification time is not within the lookback window"? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 312732) Time Spent: 3h 20m (was: 3h 10m) > DistCP files modified in last n days within a look back period > -- > > Key: GOBBLIN-759 > URL: https://issues.apache.org/jira/browse/GOBBLIN-759 > Project: Apache Gobblin > Issue Type: Improvement >Reporter: Karthik Amarnath >Priority: Major > Time Spent: 3h 20m > Remaining Estimate: 0h > > *Feature Request:* > # DistCP only the files modified in last n days within the look back window. > # DistCP will copy only the files modified even when the source file which > were NOT modified in last n days in the destination directory. -- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Work logged] (GOBBLIN-759) DistCP files modified in last n days within a look back period
[ https://issues.apache.org/jira/browse/GOBBLIN-759?focusedWorklogId=312742&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-312742 ] ASF GitHub Bot logged work on GOBBLIN-759: -- Author: ASF GitHub Bot Created on: 15/Sep/19 22:47 Start Date: 15/Sep/19 22:47 Worklog Time Spent: 10m Work Description: sv2000 commented on pull request #2633: GOBBLIN-759: Added feature to support DistCP to copy files that were … URL: https://github.com/apache/incubator-gobblin/pull/2633#discussion_r324483235 ## File path: gobblin-data-management/src/main/java/org/apache/gobblin/data/management/copy/ModifiedDateRangeBasedFileFilter.java ## @@ -0,0 +1,116 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ +package org.apache.gobblin.data.management.copy; + +import java.util.Collection; +import java.util.Iterator; +import java.util.Properties; + +import lombok.extern.slf4j.Slf4j; + +import org.apache.hadoop.fs.FileSystem; +import org.joda.time.DateTime; +import org.joda.time.DateTimeZone; +import org.joda.time.Period; +import org.joda.time.format.PeriodFormatter; +import org.joda.time.format.PeriodFormatterBuilder; + +import com.google.common.collect.ImmutableList; + +import org.apache.gobblin.configuration.ConfigurationKeys; + + +/** + * A {@link CopyableFileFilter} that drops a {@link CopyableFile} if file modified time not between the loop back window + * sourceFs + */ +@Slf4j +public class ModifiedDateRangeBasedFileFilter implements CopyableFileFilter { + + private final Properties props; + private Period minLookBackPeriod; + private Period maxLookBackPeriod; + private DateTime currentTime; + private DateTime minLookBackTime; + private DateTime maxLookBackTime; + + public static final String CONFIGURATION_KEY_PREFIX = "gobblin.dataset.filter."; + public static final String MODIFIED_MIN_LOOK_BACK_TIME_KEY = + CONFIGURATION_KEY_PREFIX + "selection.modified.min.lookbackTime"; + public static final String MODIFIED_MAX_LOOK_BACK_TIME_KEY = + CONFIGURATION_KEY_PREFIX + "selection.modified.max.lookbackTime"; + public static final String DEFAULT_DATE_PATTERN_TIMEZONE = ConfigurationKeys.PST_TIMEZONE_NAME; + public static final String DATE_PATTERN_TIMEZONE_KEY = CONFIGURATION_KEY_PREFIX + "datetime.timezone"; + + public ModifiedDateRangeBasedFileFilter(Properties properties) { +this.props = properties; +PeriodFormatter periodFormatter = +new PeriodFormatterBuilder().appendDays().appendSuffix("d").appendHours().appendSuffix("h").toFormatter(); +this.minLookBackPeriod = props.containsKey(MODIFIED_MIN_LOOK_BACK_TIME_KEY) ? 
periodFormatter.parsePeriod( +props.getProperty(MODIFIED_MIN_LOOK_BACK_TIME_KEY)) : new Period(DateTime.now().getMillis()); +this.maxLookBackPeriod = props.containsKey(MODIFIED_MAX_LOOK_BACK_TIME_KEY) ? periodFormatter.parsePeriod( +props.getProperty(MODIFIED_MAX_LOOK_BACK_TIME_KEY)) : new Period(DateTime.now().minusDays(1).getMillis()); +this.currentTime = properties.containsKey(DATE_PATTERN_TIMEZONE_KEY) ? DateTime.now( +DateTimeZone.forID(props.getProperty(DATE_PATTERN_TIMEZONE_KEY))) +: DateTime.now(DateTimeZone.forID(DEFAULT_DATE_PATTERN_TIMEZONE)); +this.minLookBackTime = this.currentTime.minus(minLookBackPeriod); +this.maxLookBackTime = this.currentTime.minus(maxLookBackPeriod); + } + + /** + * For every {@link CopyableFile} in copyableFiles checks if a {@link CopyableFile#getOrigin()#getPath()#getModificationTime()} + * + date between the min and max look back window on sourceFs {@inheritDoc} + * + * @see CopyableFileFilter#filter(FileSystem, + * FileSystem, Collection) + */ + @Override + public Collection filter(FileSystem sourceFs, FileSystem targetFs, + Collection copyableFiles) { +Iterator iterator = copyableFiles.iterator(); + +ImmutableList.Builder filtered = ImmutableList.builder(); + +while (iterator.hasNext()) { + CopyableFile file = iterator.next(); + boolean fileWithInModWindow = isFileModifiedBtwLookBackPeriod(file.getOrigin().getModifica
[jira] [Work logged] (GOBBLIN-759) DistCP files modified in last n days within a look back period
[ https://issues.apache.org/jira/browse/GOBBLIN-759?focusedWorklogId=312738&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-312738 ] ASF GitHub Bot logged work on GOBBLIN-759: -- Author: ASF GitHub Bot Created on: 15/Sep/19 22:47 Start Date: 15/Sep/19 22:47 Worklog Time Spent: 10m Work Description: sv2000 commented on pull request #2633: GOBBLIN-759: Added feature to support DistCP to copy files that were … URL: https://github.com/apache/incubator-gobblin/pull/2633#discussion_r324484386 ## File path: gobblin-data-management/src/test/java/org/apache/gobblin/data/management/copy/TimestampBasedCopyableDatasetTest.java ## @@ -91,12 +110,82 @@ public void testConfigOptions() { TimeBasedCopyPolicyForTest.class.getName()); } + @Test + public void testCopyWithFilter() throws IOException { + +/** source setup **/ +Path srcRoot = new Path(this.testTempPath, "src/slt/eqp/daily"); + +if (this.localFs.exists(srcRoot)) { + this.localFs.delete(srcRoot, true); +} + +List dateTimeList = Lists.newArrayList(); +IntStream.range(0, 4) +.forEach( +i -> dateTimeList.add(new DateTime(DateTimeZone.forID(ConfigurationKeys.PST_TIMEZONE_NAME)).minusDays(i))); + +String datePattern = "/MM/dd"; +DateTimeFormatter formatter = DateTimeFormat.forPattern(datePattern); + +for (DateTime dt : dateTimeList) { + String srcVersionPathStr = formatter.print(dt); + Path srcVersionPath = new Path(srcRoot, srcVersionPathStr); + this.localFs.mkdirs(srcVersionPath); + + Path srcfile = new Path(srcVersionPath, "file1.avro"); + this.localFs.create(srcfile); +} + +/** destination setup **/ +Path destRoot = new Path(this.testTempPath, "dest/slt/eqp"); Review comment: Change "dest/slt/eqp" pathname to some other dummy path. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 312738) > DistCP files modified in last n days within a look back period > -- > > Key: GOBBLIN-759 > URL: https://issues.apache.org/jira/browse/GOBBLIN-759 > Project: Apache Gobblin > Issue Type: Improvement >Reporter: Karthik Amarnath >Priority: Major > Time Spent: 3h 40m > Remaining Estimate: 0h > > *Feature Request:* > # DistCP only the files modified in last n days within the look back window. > # DistCP will copy only the files modified even when the source file which > were NOT modified in last n days in the destination directory. -- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Work logged] (GOBBLIN-759) DistCP files modified in last n days within a look back period
[ https://issues.apache.org/jira/browse/GOBBLIN-759?focusedWorklogId=312737&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-312737 ] ASF GitHub Bot logged work on GOBBLIN-759: -- Author: ASF GitHub Bot Created on: 15/Sep/19 22:47 Start Date: 15/Sep/19 22:47 Worklog Time Spent: 10m Work Description: sv2000 commented on pull request #2633: GOBBLIN-759: Added feature to support DistCP to copy files that were … URL: https://github.com/apache/incubator-gobblin/pull/2633#discussion_r324483226 ## File path: gobblin-data-management/src/main/java/org/apache/gobblin/data/management/copy/ModifiedDateRangeBasedFileFilter.java ## @@ -0,0 +1,116 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ +package org.apache.gobblin.data.management.copy; + +import java.util.Collection; +import java.util.Iterator; +import java.util.Properties; + +import lombok.extern.slf4j.Slf4j; + +import org.apache.hadoop.fs.FileSystem; +import org.joda.time.DateTime; +import org.joda.time.DateTimeZone; +import org.joda.time.Period; +import org.joda.time.format.PeriodFormatter; +import org.joda.time.format.PeriodFormatterBuilder; + +import com.google.common.collect.ImmutableList; + +import org.apache.gobblin.configuration.ConfigurationKeys; + + +/** + * A {@link CopyableFileFilter} that drops a {@link CopyableFile} if file modified time not between the loop back window + * sourceFs + */ +@Slf4j +public class ModifiedDateRangeBasedFileFilter implements CopyableFileFilter { + + private final Properties props; + private Period minLookBackPeriod; + private Period maxLookBackPeriod; + private DateTime currentTime; + private DateTime minLookBackTime; + private DateTime maxLookBackTime; + + public static final String CONFIGURATION_KEY_PREFIX = "gobblin.dataset.filter."; + public static final String MODIFIED_MIN_LOOK_BACK_TIME_KEY = + CONFIGURATION_KEY_PREFIX + "selection.modified.min.lookbackTime"; + public static final String MODIFIED_MAX_LOOK_BACK_TIME_KEY = + CONFIGURATION_KEY_PREFIX + "selection.modified.max.lookbackTime"; + public static final String DEFAULT_DATE_PATTERN_TIMEZONE = ConfigurationKeys.PST_TIMEZONE_NAME; + public static final String DATE_PATTERN_TIMEZONE_KEY = CONFIGURATION_KEY_PREFIX + "datetime.timezone"; + + public ModifiedDateRangeBasedFileFilter(Properties properties) { +this.props = properties; +PeriodFormatter periodFormatter = +new PeriodFormatterBuilder().appendDays().appendSuffix("d").appendHours().appendSuffix("h").toFormatter(); +this.minLookBackPeriod = props.containsKey(MODIFIED_MIN_LOOK_BACK_TIME_KEY) ? 
periodFormatter.parsePeriod( +props.getProperty(MODIFIED_MIN_LOOK_BACK_TIME_KEY)) : new Period(DateTime.now().getMillis()); +this.maxLookBackPeriod = props.containsKey(MODIFIED_MAX_LOOK_BACK_TIME_KEY) ? periodFormatter.parsePeriod( +props.getProperty(MODIFIED_MAX_LOOK_BACK_TIME_KEY)) : new Period(DateTime.now().minusDays(1).getMillis()); +this.currentTime = properties.containsKey(DATE_PATTERN_TIMEZONE_KEY) ? DateTime.now( +DateTimeZone.forID(props.getProperty(DATE_PATTERN_TIMEZONE_KEY))) +: DateTime.now(DateTimeZone.forID(DEFAULT_DATE_PATTERN_TIMEZONE)); +this.minLookBackTime = this.currentTime.minus(minLookBackPeriod); +this.maxLookBackTime = this.currentTime.minus(maxLookBackPeriod); + } + + /** + * For every {@link CopyableFile} in copyableFiles checks if a {@link CopyableFile#getOrigin()#getPath()#getModificationTime()} + * + date between the min and max look back window on sourceFs {@inheritDoc} + * + * @see CopyableFileFilter#filter(FileSystem, + * FileSystem, Collection) + */ + @Override + public Collection filter(FileSystem sourceFs, FileSystem targetFs, + Collection copyableFiles) { +Iterator iterator = copyableFiles.iterator(); + +ImmutableList.Builder filtered = ImmutableList.builder(); + +while (iterator.hasNext()) { + CopyableFile file = iterator.next(); + boolean fileWithInModWindow = isFileModifiedBtwLookBackPeriod(file.getOrigin().getModifica
[jira] [Work logged] (GOBBLIN-759) DistCP files modified in last n days within a look back period
[ https://issues.apache.org/jira/browse/GOBBLIN-759?focusedWorklogId=312735&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-312735 ] ASF GitHub Bot logged work on GOBBLIN-759: -- Author: ASF GitHub Bot Created on: 15/Sep/19 22:47 Start Date: 15/Sep/19 22:47 Worklog Time Spent: 10m Work Description: sv2000 commented on pull request #2633: GOBBLIN-759: Added feature to support DistCP to copy files that were … URL: https://github.com/apache/incubator-gobblin/pull/2633#discussion_r324483733 ## File path: gobblin-data-management/src/main/java/org/apache/gobblin/data/management/copy/TimestampBasedCopyableDataset.java ## @@ -134,7 +133,11 @@ public TimestampBasedCopyableDataset(FileSystem fs, Properties props, Path datas } finally { ExecutorsUtils.shutdownExecutorService(executor, Optional.of(log)); } -return copyableFileList; + +ConcurrentLinkedQueue copyableFilesFilteredList = new ConcurrentLinkedQueue<>(); Review comment: Do we need ConcurrentLinkedQueue? Seems like List should suffice? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 312735) Time Spent: 3h 40m (was: 3.5h) > DistCP files modified in last n days within a look back period > -- > > Key: GOBBLIN-759 > URL: https://issues.apache.org/jira/browse/GOBBLIN-759 > Project: Apache Gobblin > Issue Type: Improvement >Reporter: Karthik Amarnath >Priority: Major > Time Spent: 3h 40m > Remaining Estimate: 0h > > *Feature Request:* > # DistCP only the files modified in last n days within the look back window. > # DistCP will copy only the files modified even when the source file which > were NOT modified in last n days in the destination directory. -- This message was sent by Atlassian Jira (v8.3.2#803003)
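The reviewer's question above can be illustrated with a minimal sketch. It assumes, as the quoted diff suggests, that the post-filtering step runs only after the executor has been shut down, so a single thread touches the result collection and a plain `List` is enough. The class `FilterAfterShutdownSketch`, the `FileEntry` type, and the numeric window bounds are hypothetical stand-ins, not Gobblin classes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class FilterAfterShutdownSketch {

  /** Hypothetical stand-in for a copyable file: a path plus a modification time in epoch millis. */
  public static final class FileEntry {
    public final String path;
    public final long modifiedMillis;

    public FileEntry(String path, long modifiedMillis) {
      this.path = path;
      this.modifiedMillis = modifiedMillis;
    }
  }

  public static List<FileEntry> collectAndFilter(List<FileEntry> candidates, long minMillis, long maxMillis) {
    // Worker threads add concurrently, so this collection must be thread-safe.
    Queue<FileEntry> collected = new ConcurrentLinkedQueue<>();
    ExecutorService executor = Executors.newFixedThreadPool(4);
    try {
      for (FileEntry candidate : candidates) {
        executor.execute(() -> collected.add(candidate));
      }
    } finally {
      executor.shutdown();
    }
    try {
      if (!executor.awaitTermination(1, TimeUnit.MINUTES)) {
        throw new IllegalStateException("Workers did not finish in time");
      }
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException("Interrupted while waiting for workers", e);
    }

    // From here on only the calling thread runs, so an ordinary ArrayList suffices.
    List<FileEntry> filtered = new ArrayList<>();
    for (FileEntry entry : collected) {
      if (entry.modifiedMillis >= minMillis && entry.modifiedMillis <= maxMillis) {
        filtered.add(entry);
      }
    }
    return filtered;
  }
}
```

The concurrent queue stays where several threads write; the filtered result is built on one thread and needs no synchronization.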
[jira] [Work logged] (GOBBLIN-759) DistCP files modified in last n days within a look back period
[ https://issues.apache.org/jira/browse/GOBBLIN-759?focusedWorklogId=312741&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-312741 ] ASF GitHub Bot logged work on GOBBLIN-759: -- Author: ASF GitHub Bot Created on: 15/Sep/19 22:47 Start Date: 15/Sep/19 22:47 Worklog Time Spent: 10m Work Description: sv2000 commented on pull request #2633: GOBBLIN-759: Added feature to support DistCP to copy files that were … URL: https://github.com/apache/incubator-gobblin/pull/2633#discussion_r324483606 ## File path: gobblin-data-management/src/main/java/org/apache/gobblin/data/management/copy/TimestampBasedCopyableDataset.java ## @@ -120,9 +118,10 @@ public TimestampBasedCopyableDataset(FileSystem fs, Properties props, Path datas Collection copyableVersions = this.versionSelectionPolicy.listSelectedVersions(versions); ConcurrentLinkedQueue copyableFileList = new ConcurrentLinkedQueue<>(); List> futures = Lists.newArrayList(); +//this.copyableFileFilter.filter(this.fs, targetFs, copyableFiles) Review comment: Remove this comment. Issue Time Tracking --- Worklog Id: (was: 312741) Time Spent: 3h 50m (was: 3h 40m)
[jira] [Work logged] (GOBBLIN-759) DistCP files modified in last n days within a look back period
[ https://issues.apache.org/jira/browse/GOBBLIN-759?focusedWorklogId=312740&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-312740 ] ASF GitHub Bot logged work on GOBBLIN-759: -- Author: ASF GitHub Bot Created on: 15/Sep/19 22:47 Start Date: 15/Sep/19 22:47 Worklog Time Spent: 10m Work Description: sv2000 commented on pull request #2633: GOBBLIN-759: Added feature to support DistCP to copy files that were … URL: https://github.com/apache/incubator-gobblin/pull/2633#discussion_r324483336 ## File path: gobblin-data-management/src/main/java/org/apache/gobblin/data/management/copy/ModifiedDateRangeBasedFileFilter.java ## @@ -0,0 +1,116 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ +package org.apache.gobblin.data.management.copy; + +import java.util.Collection; +import java.util.Iterator; +import java.util.Properties; + +import lombok.extern.slf4j.Slf4j; + +import org.apache.hadoop.fs.FileSystem; +import org.joda.time.DateTime; +import org.joda.time.DateTimeZone; +import org.joda.time.Period; +import org.joda.time.format.PeriodFormatter; +import org.joda.time.format.PeriodFormatterBuilder; + +import com.google.common.collect.ImmutableList; + +import org.apache.gobblin.configuration.ConfigurationKeys; + + +/** + * A {@link CopyableFileFilter} that drops a {@link CopyableFile} if file modified time not between the loop back window + * sourceFs + */ +@Slf4j +public class ModifiedDateRangeBasedFileFilter implements CopyableFileFilter { + + private final Properties props; + private Period minLookBackPeriod; + private Period maxLookBackPeriod; + private DateTime currentTime; + private DateTime minLookBackTime; + private DateTime maxLookBackTime; + + public static final String CONFIGURATION_KEY_PREFIX = "gobblin.dataset.filter."; + public static final String MODIFIED_MIN_LOOK_BACK_TIME_KEY = + CONFIGURATION_KEY_PREFIX + "selection.modified.min.lookbackTime"; + public static final String MODIFIED_MAX_LOOK_BACK_TIME_KEY = + CONFIGURATION_KEY_PREFIX + "selection.modified.max.lookbackTime"; + public static final String DEFAULT_DATE_PATTERN_TIMEZONE = ConfigurationKeys.PST_TIMEZONE_NAME; + public static final String DATE_PATTERN_TIMEZONE_KEY = CONFIGURATION_KEY_PREFIX + "datetime.timezone"; + + public ModifiedDateRangeBasedFileFilter(Properties properties) { +this.props = properties; +PeriodFormatter periodFormatter = +new PeriodFormatterBuilder().appendDays().appendSuffix("d").appendHours().appendSuffix("h").toFormatter(); +this.minLookBackPeriod = props.containsKey(MODIFIED_MIN_LOOK_BACK_TIME_KEY) ? 
periodFormatter.parsePeriod( +props.getProperty(MODIFIED_MIN_LOOK_BACK_TIME_KEY)) : new Period(DateTime.now().getMillis()); +this.maxLookBackPeriod = props.containsKey(MODIFIED_MAX_LOOK_BACK_TIME_KEY) ? periodFormatter.parsePeriod( +props.getProperty(MODIFIED_MAX_LOOK_BACK_TIME_KEY)) : new Period(DateTime.now().minusDays(1).getMillis()); +this.currentTime = properties.containsKey(DATE_PATTERN_TIMEZONE_KEY) ? DateTime.now( +DateTimeZone.forID(props.getProperty(DATE_PATTERN_TIMEZONE_KEY))) +: DateTime.now(DateTimeZone.forID(DEFAULT_DATE_PATTERN_TIMEZONE)); +this.minLookBackTime = this.currentTime.minus(minLookBackPeriod); +this.maxLookBackTime = this.currentTime.minus(maxLookBackPeriod); + } + + /** + * For every {@link CopyableFile} in copyableFiles checks if a {@link CopyableFile#getOrigin()#getPath()#getModificationTime()} + * + date between the min and max look back window on sourceFs {@inheritDoc} + * + * @see CopyableFileFilter#filter(FileSystem, + * FileSystem, Collection) + */ + @Override + public Collection filter(FileSystem sourceFs, FileSystem targetFs, + Collection copyableFiles) { +Iterator iterator = copyableFiles.iterator(); + +ImmutableList.Builder filtered = ImmutableList.builder(); + +while (iterator.hasNext()) { + CopyableFile file = iterator.next(); + boolean fileWithInModWindow = isFileModifiedBtwLookBackPeriod(file.getOrigin().getModifica
[jira] [Work logged] (GOBBLIN-759) DistCP files modified in last n days within a look back period
[ https://issues.apache.org/jira/browse/GOBBLIN-759?focusedWorklogId=312743&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-312743 ] ASF GitHub Bot logged work on GOBBLIN-759: -- Author: ASF GitHub Bot Created on: 15/Sep/19 22:47 Start Date: 15/Sep/19 22:47 Worklog Time Spent: 10m Work Description: sv2000 commented on pull request #2633: GOBBLIN-759: Added feature to support DistCP to copy files that were … URL: https://github.com/apache/incubator-gobblin/pull/2633#discussion_r324484757 ## File path: gobblin-data-management/src/main/java/org/apache/gobblin/data/management/copy/TimestampBasedCopyableDataset.java ## @@ -134,7 +133,11 @@ public TimestampBasedCopyableDataset(FileSystem fs, Properties props, Path datas } finally { ExecutorsUtils.shutdownExecutorService(executor, Optional.of(log)); } -return copyableFileList; + +ConcurrentLinkedQueue copyableFilesFilteredList = new ConcurrentLinkedQueue<>(); Review comment: Also, see the comment earlier about returning a merged path filter in TimeStampBasedCopyableDataset#copyFileFilter() method. That way, you can remove this filtering logic in the end. Issue Time Tracking --- Worklog Id: (was: 312743) Time Spent: 4h (was: 3h 50m)
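One way to read the "merged path filter" suggestion is sketched below: several filters are combined into a single one that accepts a path only when every delegate accepts it, so no separate filtering pass is needed at the end. The interface `SimplePathFilter` and the class `MergedFilterSketch` are hypothetical and only mimic the shape of a path filter; they are not Gobblin or Hadoop types, and a real modification-time filter would also need access to file status, which this sketch leaves out.

```java
import java.util.Arrays;
import java.util.List;

public class MergedFilterSketch {

  /** Hypothetical stand-in for a path filter: accepts or rejects a path given as a string. */
  public interface SimplePathFilter {
    boolean accept(String path);
  }

  /** Returns a filter that accepts a path only if every delegate accepts it. */
  public static SimplePathFilter merge(SimplePathFilter... delegates) {
    // Copy the array so later changes by the caller cannot alter the merged filter.
    List<SimplePathFilter> all = Arrays.asList(delegates.clone());
    return path -> {
      for (SimplePathFilter filter : all) {
        if (!filter.accept(path)) {
          return false;
        }
      }
      return true;
    };
  }
}
```

A caller could then hand one merged filter to the listing code, for example `MergedFilterSketch.merge(p -> !p.startsWith("."), p -> p.endsWith(".avro"))`.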
[jira] [Work logged] (GOBBLIN-759) DistCP files modified in last n days within a look back period
[ https://issues.apache.org/jira/browse/GOBBLIN-759?focusedWorklogId=312739&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-312739 ] ASF GitHub Bot logged work on GOBBLIN-759: -- Author: ASF GitHub Bot Created on: 15/Sep/19 22:47 Start Date: 15/Sep/19 22:47 Worklog Time Spent: 10m Work Description: sv2000 commented on pull request #2633: GOBBLIN-759: Added feature to support DistCP to copy files that were … URL: https://github.com/apache/incubator-gobblin/pull/2633#discussion_r324484290 ## File path: gobblin-data-management/src/main/java/org/apache/gobblin/data/management/policy/SelectBetweenTimeBasedPolicy.java ## @@ -94,17 +100,25 @@ public SelectBetweenTimeBasedPolicy(Optional minLookBackPeriod, Optional public boolean apply(TimestampedDatasetVersion version) { return version.getDateTime() .plus(SelectBetweenTimeBasedPolicy.this.maxLookBackPeriod.or(new Period(DateTime.now().getMillis( -.isAfterNow() -&& version.getDateTime().plus(SelectBetweenTimeBasedPolicy.this.minLookBackPeriod.or(new Period(0))) -.isBeforeNow(); +.isAfterNow() && version.getDateTime() +.plus(SelectBetweenTimeBasedPolicy.this.minLookBackPeriod.or(new Period(0))) +.isBeforeNow(); } }; } protected static Period getLookBackPeriod(String lookbackTime) { Review comment: Looks like this is just reformatting. Unless there is a reason to reformat, leave it as is. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 312739) > DistCP files modified in last n days within a look back period > -- > > Key: GOBBLIN-759 > URL: https://issues.apache.org/jira/browse/GOBBLIN-759 > Project: Apache Gobblin > Issue Type: Improvement >Reporter: Karthik Amarnath >Priority: Major > Time Spent: 3h 40m > Remaining Estimate: 0h > > *Feature Request:* > # DistCP only the files modified in last n days within the look back window. > # DistCP will copy only the files modified even when the source file which > were NOT modified in last n days in the destination directory. -- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Work logged] (GOBBLIN-759) DistCP files modified in last n days within a look back period
[ https://issues.apache.org/jira/browse/GOBBLIN-759?focusedWorklogId=312733&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-312733 ] ASF GitHub Bot logged work on GOBBLIN-759: -- Author: ASF GitHub Bot Created on: 15/Sep/19 22:47 Start Date: 15/Sep/19 22:47 Worklog Time Spent: 10m Work Description: sv2000 commented on pull request #2633: GOBBLIN-759: Added feature to support DistCP to copy files that were … URL: https://github.com/apache/incubator-gobblin/pull/2633#discussion_r324483254 ## File path: gobblin-data-management/src/main/java/org/apache/gobblin/data/management/copy/ModifiedDateRangeBasedFileFilter.java ## @@ -0,0 +1,116 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ +package org.apache.gobblin.data.management.copy; + +import java.util.Collection; +import java.util.Iterator; +import java.util.Properties; + +import lombok.extern.slf4j.Slf4j; + +import org.apache.hadoop.fs.FileSystem; +import org.joda.time.DateTime; +import org.joda.time.DateTimeZone; +import org.joda.time.Period; +import org.joda.time.format.PeriodFormatter; +import org.joda.time.format.PeriodFormatterBuilder; + +import com.google.common.collect.ImmutableList; + +import org.apache.gobblin.configuration.ConfigurationKeys; + + +/** + * A {@link CopyableFileFilter} that drops a {@link CopyableFile} if file modified time not between the loop back window + * sourceFs + */ +@Slf4j +public class ModifiedDateRangeBasedFileFilter implements CopyableFileFilter { + + private final Properties props; + private Period minLookBackPeriod; + private Period maxLookBackPeriod; + private DateTime currentTime; + private DateTime minLookBackTime; + private DateTime maxLookBackTime; + + public static final String CONFIGURATION_KEY_PREFIX = "gobblin.dataset.filter."; + public static final String MODIFIED_MIN_LOOK_BACK_TIME_KEY = + CONFIGURATION_KEY_PREFIX + "selection.modified.min.lookbackTime"; + public static final String MODIFIED_MAX_LOOK_BACK_TIME_KEY = + CONFIGURATION_KEY_PREFIX + "selection.modified.max.lookbackTime"; + public static final String DEFAULT_DATE_PATTERN_TIMEZONE = ConfigurationKeys.PST_TIMEZONE_NAME; + public static final String DATE_PATTERN_TIMEZONE_KEY = CONFIGURATION_KEY_PREFIX + "datetime.timezone"; + + public ModifiedDateRangeBasedFileFilter(Properties properties) { +this.props = properties; +PeriodFormatter periodFormatter = +new PeriodFormatterBuilder().appendDays().appendSuffix("d").appendHours().appendSuffix("h").toFormatter(); +this.minLookBackPeriod = props.containsKey(MODIFIED_MIN_LOOK_BACK_TIME_KEY) ? 
periodFormatter.parsePeriod( +props.getProperty(MODIFIED_MIN_LOOK_BACK_TIME_KEY)) : new Period(DateTime.now().getMillis()); +this.maxLookBackPeriod = props.containsKey(MODIFIED_MAX_LOOK_BACK_TIME_KEY) ? periodFormatter.parsePeriod( +props.getProperty(MODIFIED_MAX_LOOK_BACK_TIME_KEY)) : new Period(DateTime.now().minusDays(1).getMillis()); +this.currentTime = properties.containsKey(DATE_PATTERN_TIMEZONE_KEY) ? DateTime.now( +DateTimeZone.forID(props.getProperty(DATE_PATTERN_TIMEZONE_KEY))) +: DateTime.now(DateTimeZone.forID(DEFAULT_DATE_PATTERN_TIMEZONE)); +this.minLookBackTime = this.currentTime.minus(minLookBackPeriod); +this.maxLookBackTime = this.currentTime.minus(maxLookBackPeriod); + } + + /** + * For every {@link CopyableFile} in copyableFiles checks if a {@link CopyableFile#getOrigin()#getPath()#getModificationTime()} + * + date between the min and max look back window on sourceFs {@inheritDoc} + * + * @see CopyableFileFilter#filter(FileSystem, + * FileSystem, Collection) + */ + @Override + public Collection filter(FileSystem sourceFs, FileSystem targetFs, + Collection copyableFiles) { +Iterator iterator = copyableFiles.iterator(); + +ImmutableList.Builder filtered = ImmutableList.builder(); + +while (iterator.hasNext()) { + CopyableFile file = iterator.next(); + boolean fileWithInModWindow = isFileModifiedBtwLookBackPeriod(file.getOrigin().getModifica
[jira] [Work logged] (GOBBLIN-759) DistCP files modified in last n days within a look back period
[ https://issues.apache.org/jira/browse/GOBBLIN-759?focusedWorklogId=259196&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-259196 ] ASF GitHub Bot logged work on GOBBLIN-759: -- Author: ASF GitHub Bot Created on: 13/Jun/19 00:52 Start Date: 13/Jun/19 00:52 Worklog Time Spent: 10m Work Description: amarnathkarthik commented on issue #2633: GOBBLIN-759: Added feature to support DistCP to copy files that were … URL: https://github.com/apache/incubator-gobblin/pull/2633#issuecomment-501507352 @jhsenjaliya Pushed the changes, please review This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 259196) Time Spent: 3h 10m (was: 3h) > DistCP files modified in last n days within a look back period > -- > > Key: GOBBLIN-759 > URL: https://issues.apache.org/jira/browse/GOBBLIN-759 > Project: Apache Gobblin > Issue Type: Improvement >Reporter: Karthik Amarnath >Priority: Major > Time Spent: 3h 10m > Remaining Estimate: 0h > > *Feature Request:* > # DistCP only the files modified in last n days within the look back window. > # DistCP will copy only the files modified even when the source file which > were NOT modified in last n days in the destination directory. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (GOBBLIN-759) DistCP files modified in last n days within a look back period
[ https://issues.apache.org/jira/browse/GOBBLIN-759?focusedWorklogId=257969&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-257969 ] ASF GitHub Bot logged work on GOBBLIN-759: -- Author: ASF GitHub Bot Created on: 11/Jun/19 17:42 Start Date: 11/Jun/19 17:42 Worklog Time Spent: 10m Work Description: amarnathkarthik commented on pull request #2633: GOBBLIN-759: Added feature to support DistCP to copy files that were … URL: https://github.com/apache/incubator-gobblin/pull/2633#discussion_r292580464 ## File path: gobblin-data-management/src/main/java/org/apache/gobblin/data/management/copy/SelectBtwModDataTimeBasedCopyableFileFilter.java ## @@ -0,0 +1,116 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ +package org.apache.gobblin.data.management.copy; + +import java.util.Collection; +import java.util.Iterator; +import java.util.Properties; + +import lombok.extern.slf4j.Slf4j; + +import org.apache.hadoop.fs.FileSystem; +import org.joda.time.DateTime; +import org.joda.time.DateTimeZone; +import org.joda.time.Period; +import org.joda.time.format.PeriodFormatter; +import org.joda.time.format.PeriodFormatterBuilder; + +import com.google.common.collect.ImmutableList; + +import org.apache.gobblin.configuration.ConfigurationKeys; + + +/** + * A {@link CopyableFileFilter} that drops a {@link CopyableFile} if file modified time not between the loop back window + * sourceFs + */ +@Slf4j +public class SelectBtwModDataTimeBasedCopyableFileFilter implements CopyableFileFilter { + + private final Properties props; + private Period minLookBackPeriod; + private Period maxLookBackPeriod; + private DateTime currentTime; + private DateTime minLookBackTime; + private DateTime maxLookBackTime; + + public static final String CONFIGURATION_KEY_PREFIX = "gobblin.dataset."; + public static final String MODIFIED_MIN_LOOK_BACK_TIME_KEY = + CONFIGURATION_KEY_PREFIX + "selection.modified.min.lookbackTime"; + public static final String MODIFIED_MAX_LOOK_BACK_TIME_KEY = + CONFIGURATION_KEY_PREFIX + "selection.modified.max.lookbackTime"; + public static final String DEFAULT_DATE_PATTERN_TIMEZONE = ConfigurationKeys.PST_TIMEZONE_NAME; + public static final String DATE_PATTERN_TIMEZONE_KEY = CONFIGURATION_KEY_PREFIX + "datetime.timezone"; + + public SelectBtwModDataTimeBasedCopyableFileFilter(Properties properties) { +this.props = properties; +PeriodFormatter periodFormatter = +new PeriodFormatterBuilder().appendDays().appendSuffix("d").appendHours().appendSuffix("h").toFormatter(); +this.minLookBackPeriod = props.containsKey(MODIFIED_MIN_LOOK_BACK_TIME_KEY) ? 
periodFormatter.parsePeriod( Review comment: I would like to follow the convention used in Gobblin to have min and max look back. Issue Time Tracking --- Worklog Id: (was: 257969) Time Spent: 3h (was: 2h 50m)
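The min and max look-back convention discussed above can be sketched as follows. This is an illustration under stated assumptions, not the code from the pull request: it uses `java.time` instead of Joda-Time so that it runs without extra dependencies, it approximates the quoted `PeriodFormatterBuilder` pattern (days with suffix `d`, hours with suffix `h`) with a regular expression, and it treats both window bounds as inclusive, which the truncated diff does not confirm. The names `LookBackWindowSketch`, `parseLookBack`, and `isWithinWindow` are hypothetical.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LookBackWindowSketch {

  // Accepts values such as "2d", "6h" or "1d12h" (assumed format, modeled on the quoted formatter).
  private static final Pattern LOOK_BACK = Pattern.compile("(?:(\\d+)d)?(?:(\\d+)h)?");

  public static Duration parseLookBack(String text) {
    Matcher matcher = LOOK_BACK.matcher(text);
    if (text.isEmpty() || !matcher.matches()) {
      throw new IllegalArgumentException("Expected a value like 2d, 6h or 1d12h but got: " + text);
    }
    long days = matcher.group(1) == null ? 0 : Long.parseLong(matcher.group(1));
    long hours = matcher.group(2) == null ? 0 : Long.parseLong(matcher.group(2));
    // Note: a Duration day is always 24 hours; a Joda-Time Period day follows the calendar.
    return Duration.ofDays(days).plusHours(hours);
  }

  /**
   * Returns true when the modification time lies between now - maxLookBack (oldest allowed)
   * and now - minLookBack (newest allowed), both bounds inclusive (an assumption).
   */
  public static boolean isWithinWindow(Instant modified, Instant now, Duration minLookBack, Duration maxLookBack) {
    Instant newestAllowed = now.minus(minLookBack);
    Instant oldestAllowed = now.minus(maxLookBack);
    return !modified.isBefore(oldestAllowed) && !modified.isAfter(newestAllowed);
  }
}
```

With a min look-back of `1d` and a max look-back of `7d`, a file modified three days ago is kept, while files modified twelve hours ago or eight days ago are dropped.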
[jira] [Work logged] (GOBBLIN-759) DistCP files modified in last n days within a look back period
[ https://issues.apache.org/jira/browse/GOBBLIN-759?focusedWorklogId=257964&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-257964 ] ASF GitHub Bot logged work on GOBBLIN-759: -- Author: ASF GitHub Bot Created on: 11/Jun/19 17:41 Start Date: 11/Jun/19 17:41 Worklog Time Spent: 10m Work Description: amarnathkarthik commented on pull request #2633: GOBBLIN-759: Added feature to support DistCP to copy files that were … URL: https://github.com/apache/incubator-gobblin/pull/2633#discussion_r292580001 ## File path: gobblin-data-management/src/main/java/org/apache/gobblin/data/management/copy/SelectBtwModDataTimeBasedCopyableFileFilter.java ## @@ -0,0 +1,116 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ +package org.apache.gobblin.data.management.copy; + +import java.util.Collection; +import java.util.Iterator; +import java.util.Properties; + +import lombok.extern.slf4j.Slf4j; + +import org.apache.hadoop.fs.FileSystem; +import org.joda.time.DateTime; +import org.joda.time.DateTimeZone; +import org.joda.time.Period; +import org.joda.time.format.PeriodFormatter; +import org.joda.time.format.PeriodFormatterBuilder; + +import com.google.common.collect.ImmutableList; + +import org.apache.gobblin.configuration.ConfigurationKeys; + + +/** + * A {@link CopyableFileFilter} that drops a {@link CopyableFile} if file modified time not between the loop back window + * sourceFs + */ +@Slf4j +public class SelectBtwModDataTimeBasedCopyableFileFilter implements CopyableFileFilter { Review comment: yep it looks to me more appropriate, will change, thanks This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 257964) Time Spent: 2h 40m (was: 2.5h) > DistCP files modified in last n days within a look back period > -- > > Key: GOBBLIN-759 > URL: https://issues.apache.org/jira/browse/GOBBLIN-759 > Project: Apache Gobblin > Issue Type: Improvement >Reporter: Karthik Amarnath >Priority: Major > Time Spent: 2h 40m > Remaining Estimate: 0h > > *Feature Request:* > # DistCP only the files modified in last n days within the look back window. > # DistCP will copy only the files modified even when the source file which > were NOT modified in last n days in the destination directory. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (GOBBLIN-759) DistCP files modified in last n days within a look back period
[ https://issues.apache.org/jira/browse/GOBBLIN-759?focusedWorklogId=257965&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-257965 ] ASF GitHub Bot logged work on GOBBLIN-759: -- Author: ASF GitHub Bot Created on: 11/Jun/19 17:41 Start Date: 11/Jun/19 17:41 Worklog Time Spent: 10m Work Description: amarnathkarthik commented on pull request #2633: GOBBLIN-759: Added feature to support DistCP to copy files that were … URL: https://github.com/apache/incubator-gobblin/pull/2633#discussion_r292580156 ## File path: gobblin-data-management/src/main/java/org/apache/gobblin/data/management/copy/SelectBtwModDataTimeBasedCopyableFileFilter.java ## @@ -0,0 +1,116 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ +package org.apache.gobblin.data.management.copy; + +import java.util.Collection; +import java.util.Iterator; +import java.util.Properties; + +import lombok.extern.slf4j.Slf4j; + +import org.apache.hadoop.fs.FileSystem; +import org.joda.time.DateTime; +import org.joda.time.DateTimeZone; +import org.joda.time.Period; +import org.joda.time.format.PeriodFormatter; +import org.joda.time.format.PeriodFormatterBuilder; + +import com.google.common.collect.ImmutableList; + +import org.apache.gobblin.configuration.ConfigurationKeys; + + +/** + * A {@link CopyableFileFilter} that drops a {@link CopyableFile} if file modified time not between the loop back window + * sourceFs + */ +@Slf4j +public class SelectBtwModDataTimeBasedCopyableFileFilter implements CopyableFileFilter { + + private final Properties props; + private Period minLookBackPeriod; + private Period maxLookBackPeriod; + private DateTime currentTime; + private DateTime minLookBackTime; + private DateTime maxLookBackTime; + + public static final String CONFIGURATION_KEY_PREFIX = "gobblin.dataset."; Review comment: sure This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 257965) Time Spent: 2h 50m (was: 2h 40m) > DistCP files modified in last n days within a look back period > -- > > Key: GOBBLIN-759 > URL: https://issues.apache.org/jira/browse/GOBBLIN-759 > Project: Apache Gobblin > Issue Type: Improvement >Reporter: Karthik Amarnath >Priority: Major > Time Spent: 2h 50m > Remaining Estimate: 0h > > *Feature Request:* > # DistCP only the files modified in last n days within the look back window. > # DistCP will copy only the files modified even when the source file which > were NOT modified in last n days in the destination directory. 
[jira] [Work logged] (GOBBLIN-759) DistCP files modified in last n days within a look back period
[ https://issues.apache.org/jira/browse/GOBBLIN-759?focusedWorklogId=249886&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-249886 ] ASF GitHub Bot logged work on GOBBLIN-759: -- Author: ASF GitHub Bot Created on: 29/May/19 06:49 Start Date: 29/May/19 06:49 Worklog Time Spent: 10m Work Description: jhsenjaliya commented on issue #2633: GOBBLIN-759: Added feature to support DistCP to copy files that were … URL: https://github.com/apache/incubator-gobblin/pull/2633#issuecomment-496808201 will continue review tomorrow Issue Time Tracking --- Worklog Id: (was: 249886) Time Spent: 2.5h (was: 2h 20m)
[jira] [Work logged] (GOBBLIN-759) DistCP files modified in last n days within a look back period
[ https://issues.apache.org/jira/browse/GOBBLIN-759?focusedWorklogId=249880&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-249880 ] ASF GitHub Bot logged work on GOBBLIN-759: -- Author: ASF GitHub Bot Created on: 29/May/19 06:44 Start Date: 29/May/19 06:44 Worklog Time Spent: 10m Work Description: jhsenjaliya commented on pull request #2633: GOBBLIN-759: Added feature to support DistCP to copy files that were … URL: https://github.com/apache/incubator-gobblin/pull/2633#discussion_r288414241 ## File path: gobblin-data-management/src/main/java/org/apache/gobblin/data/management/copy/SelectBtwModDataTimeBasedCopyableFileFilter.java ## @@ -0,0 +1,116 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ +package org.apache.gobblin.data.management.copy; + +import java.util.Collection; +import java.util.Iterator; +import java.util.Properties; + +import lombok.extern.slf4j.Slf4j; + +import org.apache.hadoop.fs.FileSystem; +import org.joda.time.DateTime; +import org.joda.time.DateTimeZone; +import org.joda.time.Period; +import org.joda.time.format.PeriodFormatter; +import org.joda.time.format.PeriodFormatterBuilder; + +import com.google.common.collect.ImmutableList; + +import org.apache.gobblin.configuration.ConfigurationKeys; + + +/** + * A {@link CopyableFileFilter} that drops a {@link CopyableFile} if file modified time not between the loop back window + * sourceFs + */ +@Slf4j +public class SelectBtwModDataTimeBasedCopyableFileFilter implements CopyableFileFilter { + + private final Properties props; + private Period minLookBackPeriod; + private Period maxLookBackPeriod; + private DateTime currentTime; + private DateTime minLookBackTime; + private DateTime maxLookBackTime; + + public static final String CONFIGURATION_KEY_PREFIX = "gobblin.dataset."; + public static final String MODIFIED_MIN_LOOK_BACK_TIME_KEY = + CONFIGURATION_KEY_PREFIX + "selection.modified.min.lookbackTime"; + public static final String MODIFIED_MAX_LOOK_BACK_TIME_KEY = + CONFIGURATION_KEY_PREFIX + "selection.modified.max.lookbackTime"; + public static final String DEFAULT_DATE_PATTERN_TIMEZONE = ConfigurationKeys.PST_TIMEZONE_NAME; + public static final String DATE_PATTERN_TIMEZONE_KEY = CONFIGURATION_KEY_PREFIX + "datetime.timezone"; + + public SelectBtwModDataTimeBasedCopyableFileFilter(Properties properties) { +this.props = properties; +PeriodFormatter periodFormatter = +new PeriodFormatterBuilder().appendDays().appendSuffix("d").appendHours().appendSuffix("h").toFormatter(); +this.minLookBackPeriod = props.containsKey(MODIFIED_MIN_LOOK_BACK_TIME_KEY) ? periodFormatter.parsePeriod( Review comment: i initially thought `minLookBackPeriod` as what `minLookBackPeriod` is. 
If it helps, how about using startDate-endDate or since-by terminology? like `this.modifiedSince = props.containsKey("gobblin.dataset.filter.modified.since")` or `this.modifiedStartDate = props.containsKey("gobblin.dataset.filter.modified.startDate")` This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 249880) Time Spent: 2h 20m (was: 2h 10m) > DistCP files modified in last n days within a look back period > -- > > Key: GOBBLIN-759 > URL: https://issues.apache.org/jira/browse/GOBBLIN-759 > Project: Apache Gobblin > Issue Type: Improvement >Reporter: Karthik Amarnath >Priority: Major > Time Spent: 2h 20m > Remaining Estimate: 0h > > *Feature Request:* > # DistCP only the files modified in last n days within the look back window. > # DistCP will copy only the files modified even when the source file which > were NOT mod
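A minimal sketch of the look-back parsing discussed in the review comment above, assuming a value format of an optional day part followed by an optional hour part (for example `7d`, `12h`, or `7d12h`). The PR builds its formatter with Joda-Time's `PeriodFormatterBuilder`; this sketch swaps in `java.time.Duration` and a regular expression so it runs without extra dependencies. The class name `LookBackPeriods` is hypothetical and is not part of the PR, and a day is treated as exactly 24 hours here, which a Joda-Time `Period` does not guarantee.

```java
import java.time.Duration;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of the PR: parses look-back strings such as "7d12h".
final class LookBackPeriods {
  // Optional day part ("7d") followed by an optional hour part ("12h").
  private static final Pattern LOOK_BACK = Pattern.compile("(?:(\\d+)d)?(?:(\\d+)h)?");

  private LookBackPeriods() {}

  static Duration parse(String text) {
    String trimmed = text.trim();
    Matcher m = LOOK_BACK.matcher(trimmed);
    // The pattern also matches the empty string, so reject that case explicitly.
    if (trimmed.isEmpty() || !m.matches()) {
      throw new IllegalArgumentException("Expected a look-back such as 7d, 12h or 7d12h: " + text);
    }
    long days = m.group(1) == null ? 0 : Long.parseLong(m.group(1));
    long hours = m.group(2) == null ? 0 : Long.parseLong(m.group(2));
    // Assumption: one day is exactly 24 hours (no time-zone or DST handling).
    return Duration.ofDays(days).plusHours(hours);
  }

  public static void main(String[] args) {
    System.out.println(parse("7d12h"));
  }
}
```

With the since/by naming suggested in the comment, the "since" bound would be the current time minus the larger look-back and the "by" bound the current time minus the smaller one.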
[jira] [Work logged] (GOBBLIN-759) DistCP files modified in last n days within a look back period
[ https://issues.apache.org/jira/browse/GOBBLIN-759?focusedWorklogId=249872&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-249872 ] ASF GitHub Bot logged work on GOBBLIN-759: -- Author: ASF GitHub Bot Created on: 29/May/19 06:38 Start Date: 29/May/19 06:38 Worklog Time Spent: 10m Work Description: jhsenjaliya commented on pull request #2633: GOBBLIN-759: Added feature to support DistCP to copy files that were … URL: https://github.com/apache/incubator-gobblin/pull/2633#discussion_r288412642 ## File path: gobblin-data-management/src/main/java/org/apache/gobblin/data/management/copy/SelectBtwModDataTimeBasedCopyableFileFilter.java ## @@ -0,0 +1,116 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ +package org.apache.gobblin.data.management.copy; + +import java.util.Collection; +import java.util.Iterator; +import java.util.Properties; + +import lombok.extern.slf4j.Slf4j; + +import org.apache.hadoop.fs.FileSystem; +import org.joda.time.DateTime; +import org.joda.time.DateTimeZone; +import org.joda.time.Period; +import org.joda.time.format.PeriodFormatter; +import org.joda.time.format.PeriodFormatterBuilder; + +import com.google.common.collect.ImmutableList; + +import org.apache.gobblin.configuration.ConfigurationKeys; + + +/** + * A {@link CopyableFileFilter} that drops a {@link CopyableFile} if file modified time not between the loop back window + * sourceFs + */ +@Slf4j +public class SelectBtwModDataTimeBasedCopyableFileFilter implements CopyableFileFilter { + + private final Properties props; + private Period minLookBackPeriod; + private Period maxLookBackPeriod; + private DateTime currentTime; + private DateTime minLookBackTime; + private DateTime maxLookBackTime; + + public static final String CONFIGURATION_KEY_PREFIX = "gobblin.dataset."; Review comment: how about "gobblin.dataset.filter" to indicate all other properties to be specific to this filtering process? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 249872) Time Spent: 2h 10m (was: 2h) > DistCP files modified in last n days within a look back period > -- > > Key: GOBBLIN-759 > URL: https://issues.apache.org/jira/browse/GOBBLIN-759 > Project: Apache Gobblin > Issue Type: Improvement >Reporter: Karthik Amarnath >Priority: Major > Time Spent: 2h 10m > Remaining Estimate: 0h > > *Feature Request:* > # DistCP only the files modified in last n days within the look back window. 
> # DistCP will copy only the files modified even when the source file which > were NOT modified in last n days in the destination directory.
[jira] [Work logged] (GOBBLIN-759) DistCP files modified in last n days within a look back period
[ https://issues.apache.org/jira/browse/GOBBLIN-759?focusedWorklogId=249871&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-249871 ] ASF GitHub Bot logged work on GOBBLIN-759: -- Author: ASF GitHub Bot Created on: 29/May/19 06:38 Start Date: 29/May/19 06:38 Worklog Time Spent: 10m Work Description: jhsenjaliya commented on pull request #2633: GOBBLIN-759: Added feature to support DistCP to copy files that were … URL: https://github.com/apache/incubator-gobblin/pull/2633#discussion_r288412642 ## File path: gobblin-data-management/src/main/java/org/apache/gobblin/data/management/copy/SelectBtwModDataTimeBasedCopyableFileFilter.java ## @@ -0,0 +1,116 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ +package org.apache.gobblin.data.management.copy; + +import java.util.Collection; +import java.util.Iterator; +import java.util.Properties; + +import lombok.extern.slf4j.Slf4j; + +import org.apache.hadoop.fs.FileSystem; +import org.joda.time.DateTime; +import org.joda.time.DateTimeZone; +import org.joda.time.Period; +import org.joda.time.format.PeriodFormatter; +import org.joda.time.format.PeriodFormatterBuilder; + +import com.google.common.collect.ImmutableList; + +import org.apache.gobblin.configuration.ConfigurationKeys; + + +/** + * A {@link CopyableFileFilter} that drops a {@link CopyableFile} if file modified time not between the loop back window + * sourceFs + */ +@Slf4j +public class SelectBtwModDataTimeBasedCopyableFileFilter implements CopyableFileFilter { + + private final Properties props; + private Period minLookBackPeriod; + private Period maxLookBackPeriod; + private DateTime currentTime; + private DateTime minLookBackTime; + private DateTime maxLookBackTime; + + public static final String CONFIGURATION_KEY_PREFIX = "gobblin.dataset."; Review comment: should you use "gobblin.dataset.filter" to indicate all other properties to be specific to this filtering process? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 249871) Time Spent: 2h (was: 1h 50m) > DistCP files modified in last n days within a look back period > -- > > Key: GOBBLIN-759 > URL: https://issues.apache.org/jira/browse/GOBBLIN-759 > Project: Apache Gobblin > Issue Type: Improvement >Reporter: Karthik Amarnath >Priority: Major > Time Spent: 2h > Remaining Estimate: 0h > > *Feature Request:* > # DistCP only the files modified in last n days within the look back window. 
> # DistCP will copy only the files modified even when the source file which > were NOT modified in last n days in the destination directory.
[jira] [Work logged] (GOBBLIN-759) DistCP files modified in last n days within a look back period
[ https://issues.apache.org/jira/browse/GOBBLIN-759?focusedWorklogId=249870&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-249870 ] ASF GitHub Bot logged work on GOBBLIN-759: -- Author: ASF GitHub Bot Created on: 29/May/19 06:34 Start Date: 29/May/19 06:34 Worklog Time Spent: 10m Work Description: jhsenjaliya commented on pull request #2633: GOBBLIN-759: Added feature to support DistCP to copy files that were … URL: https://github.com/apache/incubator-gobblin/pull/2633#discussion_r288411603 ## File path: gobblin-data-management/src/main/java/org/apache/gobblin/data/management/copy/SelectBtwModDataTimeBasedCopyableFileFilter.java ## @@ -0,0 +1,116 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ +package org.apache.gobblin.data.management.copy; + +import java.util.Collection; +import java.util.Iterator; +import java.util.Properties; + +import lombok.extern.slf4j.Slf4j; + +import org.apache.hadoop.fs.FileSystem; +import org.joda.time.DateTime; +import org.joda.time.DateTimeZone; +import org.joda.time.Period; +import org.joda.time.format.PeriodFormatter; +import org.joda.time.format.PeriodFormatterBuilder; + +import com.google.common.collect.ImmutableList; + +import org.apache.gobblin.configuration.ConfigurationKeys; + + +/** + * A {@link CopyableFileFilter} that drops a {@link CopyableFile} if file modified time not between the loop back window + * sourceFs + */ +@Slf4j +public class SelectBtwModDataTimeBasedCopyableFileFilter implements CopyableFileFilter { Review comment: should this be named `ModifiedDateRangeBasedFileFilter` ? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 249870) Time Spent: 1h 50m (was: 1h 40m) > DistCP files modified in last n days within a look back period > -- > > Key: GOBBLIN-759 > URL: https://issues.apache.org/jira/browse/GOBBLIN-759 > Project: Apache Gobblin > Issue Type: Improvement >Reporter: Karthik Amarnath >Priority: Major > Time Spent: 1h 50m > Remaining Estimate: 0h > > *Feature Request:* > # DistCP only the files modified in last n days within the look back window. > # DistCP will copy only the files modified even when the source file which > were NOT modified in last n days in the destination directory. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (GOBBLIN-759) DistCP files modified in last n days within a look back period
[ https://issues.apache.org/jira/browse/GOBBLIN-759?focusedWorklogId=249869&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-249869 ] ASF GitHub Bot logged work on GOBBLIN-759: -- Author: ASF GitHub Bot Created on: 29/May/19 06:34 Start Date: 29/May/19 06:34 Worklog Time Spent: 10m Work Description: jhsenjaliya commented on pull request #2633: GOBBLIN-759: Added feature to support DistCP to copy files that were … URL: https://github.com/apache/incubator-gobblin/pull/2633#discussion_r288411603 ## File path: gobblin-data-management/src/main/java/org/apache/gobblin/data/management/copy/SelectBtwModDataTimeBasedCopyableFileFilter.java ## @@ -0,0 +1,116 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ +package org.apache.gobblin.data.management.copy; + +import java.util.Collection; +import java.util.Iterator; +import java.util.Properties; + +import lombok.extern.slf4j.Slf4j; + +import org.apache.hadoop.fs.FileSystem; +import org.joda.time.DateTime; +import org.joda.time.DateTimeZone; +import org.joda.time.Period; +import org.joda.time.format.PeriodFormatter; +import org.joda.time.format.PeriodFormatterBuilder; + +import com.google.common.collect.ImmutableList; + +import org.apache.gobblin.configuration.ConfigurationKeys; + + +/** + * A {@link CopyableFileFilter} that drops a {@link CopyableFile} if file modified time not between the loop back window + * sourceFs + */ +@Slf4j +public class SelectBtwModDataTimeBasedCopyableFileFilter implements CopyableFileFilter { Review comment: should be named `DateRangeBasedFileFilter` ? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 249869) Time Spent: 1h 40m (was: 1.5h) > DistCP files modified in last n days within a look back period > -- > > Key: GOBBLIN-759 > URL: https://issues.apache.org/jira/browse/GOBBLIN-759 > Project: Apache Gobblin > Issue Type: Improvement >Reporter: Karthik Amarnath >Priority: Major > Time Spent: 1h 40m > Remaining Estimate: 0h > > *Feature Request:* > # DistCP only the files modified in last n days within the look back window. > # DistCP will copy only the files modified even when the source file which > were NOT modified in last n days in the destination directory. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (GOBBLIN-759) DistCP files modified in last n days within a look back period
[ https://issues.apache.org/jira/browse/GOBBLIN-759?focusedWorklogId=241531&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-241531 ] ASF GitHub Bot logged work on GOBBLIN-759: -- Author: ASF GitHub Bot Created on: 14/May/19 05:30 Start Date: 14/May/19 05:30 Worklog Time Spent: 10m Work Description: amarnathkarthik commented on issue #2633: GOBBLIN-759: Added feature to support DistCP to copy files that were … URL: https://github.com/apache/incubator-gobblin/pull/2633#issuecomment-492084494 @sv2000 @htran1 @jhsenjaliya created New PR. Please review This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 241531) Time Spent: 1.5h (was: 1h 20m) > DistCP files modified in last n days within a look back period > -- > > Key: GOBBLIN-759 > URL: https://issues.apache.org/jira/browse/GOBBLIN-759 > Project: Apache Gobblin > Issue Type: Improvement >Reporter: Karthik Amarnath >Priority: Major > Time Spent: 1.5h > Remaining Estimate: 0h > > *Feature Request:* > # DistCP only the files modified in last n days within the look back window. > # DistCP will copy only the files modified even when the source file which > were NOT modified in last n days in the destination directory. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (GOBBLIN-759) DistCP files modified in last n days within a look back period
[ https://issues.apache.org/jira/browse/GOBBLIN-759?focusedWorklogId=241417&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-241417 ] ASF GitHub Bot logged work on GOBBLIN-759: -- Author: ASF GitHub Bot Created on: 14/May/19 00:36 Start Date: 14/May/19 00:36 Worklog Time Spent: 10m Work Description: amarnathkarthik commented on pull request #2623: [GOBBLIN-759] Added feature to support DistCP to copy files modified in last n days URL: https://github.com/apache/incubator-gobblin/pull/2623 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 241417) Time Spent: 1h 10m (was: 1h) > DistCP files modified in last n days within a look back period > -- > > Key: GOBBLIN-759 > URL: https://issues.apache.org/jira/browse/GOBBLIN-759 > Project: Apache Gobblin > Issue Type: Improvement >Reporter: Karthik Amarnath >Priority: Major > Time Spent: 1h 10m > Remaining Estimate: 0h > > *Feature Request:* > # DistCP only the files modified in last n days within the look back window. > # DistCP will copy only the files modified even when the source file which > were NOT modified in last n days in the destination directory. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (GOBBLIN-759) DistCP files modified in last n days within a look back period
[ https://issues.apache.org/jira/browse/GOBBLIN-759?focusedWorklogId=241418&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-241418 ] ASF GitHub Bot logged work on GOBBLIN-759: -- Author: ASF GitHub Bot Created on: 14/May/19 00:36 Start Date: 14/May/19 00:36 Worklog Time Spent: 10m Work Description: amarnathkarthik commented on pull request #2633: GOBBLIN-759: Added feature to support DistCP to copy files that were … URL: https://github.com/apache/incubator-gobblin/pull/2633 …modified in last n days Dear Gobblin maintainers, Please accept this PR. I understand that it will not be reviewed until I have checked off all the steps below! ### JIRA - [ ] My PR addresses the following [Gobblin JIRA](https://issues.apache.org/jira/browse/GOBBLIN/) issues and references them in the PR title. For example, "[GOBBLIN-759] My Added feature to support DistCP to copy files modified in last n days" - https://issues.apache.org/jira/browse/GOBBLIN-759 ### Description - [ ] Here are some details about my PR, including screenshots (if applicable): 1. Added feature to DistCP the files which were modified in last n days within the lookback period. 2. This feature allows to copy only the modified files even when non modified files not at the destination. 3. Leverage existing TimestampBasedCopyableDataset to find the dataset and uses SelectBtwModDataTimeBasedCopyableFileFilter CopyableFilter implementation to filter the files that were modified in last n days. ### Tests - [ ] My PR adds the following unit tests __OR__ does not need testing for this extremely good reason: 1. Added TimestampBasedCopyableDatasetTest.testCopyWithFilter test case to test 1 modified and 1 non-modified scenario. ### Commits - [ ] My commits all reference JIRA issues in their subject lines, and I have squashed multiple commits if they address the same issue. 
In addition, my commits follow the guidelines from "[How to write a good git commit message](http://chris.beams.io/posts/git-commit/)": 1. Subject is separated from body by a blank line 2. Subject is limited to 50 characters 3. Subject does not end with a period 4. Subject uses the imperative mood ("add", not "adding") 5. Body wraps at 72 characters 6. Body explains "what" and "why", not "how" This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 241418) Time Spent: 1h 20m (was: 1h 10m) > DistCP files modified in last n days within a look back period > -- > > Key: GOBBLIN-759 > URL: https://issues.apache.org/jira/browse/GOBBLIN-759 > Project: Apache Gobblin > Issue Type: Improvement >Reporter: Karthik Amarnath >Priority: Major > Time Spent: 1h 20m > Remaining Estimate: 0h > > *Feature Request:* > # DistCP only the files modified in last n days within the look back window. > # DistCP will copy only the files modified even when the source file which > were NOT modified in last n days in the destination directory. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
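The filtering behavior described in the PR description above (copy only files whose modification time falls inside the look-back window) can be sketched as follows. This is an illustration under stated assumptions, not the PR's implementation: `ModifiedTimeWindowFilter` and `FileEntry` are hypothetical names standing in for the filter and for `CopyableFile`, `java.time` is used in place of Joda-Time, and both window bounds are treated as inclusive.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.List;

// Hypothetical sketch: keeps only files modified inside [now - maxLookBack, now - minLookBack].
final class ModifiedTimeWindowFilter {

  // Hypothetical stand-in for a copyable file: a path plus its modification time.
  static final class FileEntry {
    final String path;
    final Instant modifiedTime;

    FileEntry(String path, Instant modifiedTime) {
      this.path = path;
      this.modifiedTime = modifiedTime;
    }
  }

  private final Instant windowStart; // oldest modification time that is still accepted
  private final Instant windowEnd;   // newest modification time that is still accepted

  ModifiedTimeWindowFilter(Instant now, Duration minLookBack, Duration maxLookBack) {
    if (minLookBack.compareTo(maxLookBack) > 0) {
      throw new IllegalArgumentException("minLookBack must not exceed maxLookBack");
    }
    this.windowStart = now.minus(maxLookBack);
    this.windowEnd = now.minus(minLookBack);
  }

  // Returns the files whose modification time lies inside the window (bounds inclusive, by assumption).
  List<FileEntry> filter(Collection<FileEntry> files) {
    List<FileEntry> kept = new ArrayList<>();
    for (FileEntry file : files) {
      boolean inWindow = !file.modifiedTime.isBefore(windowStart) && !file.modifiedTime.isAfter(windowEnd);
      if (inWindow) {
        kept.add(file);
      }
    }
    return kept;
  }

  public static void main(String[] args) {
    Instant now = Instant.now();
    ModifiedTimeWindowFilter filter = new ModifiedTimeWindowFilter(now, Duration.ZERO, Duration.ofDays(7));
    List<FileEntry> kept = filter.filter(Arrays.asList(
        new FileEntry("/data/recent", now.minus(Duration.ofDays(2))),
        new FileEntry("/data/stale", now.minus(Duration.ofDays(30)))));
    for (FileEntry file : kept) {
      System.out.println(file.path);
    }
  }
}
```

The `main` method mirrors the one-modified, one-unmodified scenario that the PR's test description mentions: only the file modified within the last seven days passes the filter.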
[jira] [Work logged] (GOBBLIN-759) DistCP files modified in last n days within a look back period
[ https://issues.apache.org/jira/browse/GOBBLIN-759?focusedWorklogId=240879&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-240879 ] ASF GitHub Bot logged work on GOBBLIN-759: -- Author: ASF GitHub Bot Created on: 13/May/19 05:40 Start Date: 13/May/19 05:40 Worklog Time Spent: 10m Work Description: amarnathkarthik commented on pull request #2623: [GOBBLIN-759] Added feature to support DistCP to copy files modified in last n days URL: https://github.com/apache/incubator-gobblin/pull/2623 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 240879) Time Spent: 50m (was: 40m) > DistCP files modified in last n days within a look back period > -- > > Key: GOBBLIN-759 > URL: https://issues.apache.org/jira/browse/GOBBLIN-759 > Project: Apache Gobblin > Issue Type: Improvement >Reporter: Karthik Amarnath >Priority: Major > Time Spent: 50m > Remaining Estimate: 0h > > *Feature Request:* > # DistCP only the files modified in last n days within the look back window. > # DistCP will copy only the files modified even when the source file which > were NOT modified in last n days in the destination directory. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (GOBBLIN-759) DistCP files modified in last n days within a look back period
[ https://issues.apache.org/jira/browse/GOBBLIN-759?focusedWorklogId=240881&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-240881 ] ASF GitHub Bot logged work on GOBBLIN-759: -- Author: ASF GitHub Bot Created on: 13/May/19 05:41 Start Date: 13/May/19 05:41 Worklog Time Spent: 10m Work Description: amarnathkarthik commented on pull request #2623: [GOBBLIN-759] Added feature to support DistCP to copy files modified in last n days URL: https://github.com/apache/incubator-gobblin/pull/2623 Dear Gobblin maintainers, Please accept this PR. I understand that it will not be reviewed until I have checked off all the steps below! ### JIRA - [x] My PR addresses the following [Gobblin JIRA](https://issues.apache.org/jira/browse/GOBBLIN/) issues and references them in the PR title. For example, "[GOBBLIN-759] My Added feature to support DistCP to copy files modified in last n days" - https://issues.apache.org/jira/browse/GOBBLIN-759 ### Description - [x] Here are some details about my PR, including screenshots (if applicable): 1. Added feature to DistCP the files which were modified in last n days within the lookback period. 2. This feature allows to copy only the modified files even when non modified files not at the destination. 3. Leverage existing TimestampBasedCopyableDataset to find the dataset and uses SelectBtwModDataTimeBasedCopyableFileFilter CopyableFilter implementation to filter the files that were modified in last n days. ### Tests - [x] My PR adds the following unit tests __OR__ does not need testing for this extremely good reason: 1. Added TimestampBasedCopyableDatasetTest.testCopyWithFilter test case to test 1 modified and 1 non-modified scenario. ### Commits - [ ] My commits all reference JIRA issues in their subject lines, and I have squashed multiple commits if they address the same issue. In addition, my commits follow the guidelines from "[How to write a good git commit message](http://chris.beams.io/posts/git-commit/)": 1. 
Subject is separated from body by a blank line 2. Subject is limited to 50 characters 3. Subject does not end with a period 4. Subject uses the imperative mood ("add", not "adding") 5. Body wraps at 72 characters 6. Body explains "what" and "why", not "how" This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 240881) Time Spent: 1h (was: 50m) > DistCP files modified in last n days within a look back period > -- > > Key: GOBBLIN-759 > URL: https://issues.apache.org/jira/browse/GOBBLIN-759 > Project: Apache Gobblin > Issue Type: Improvement >Reporter: Karthik Amarnath >Priority: Major > Time Spent: 1h > Remaining Estimate: 0h > > *Feature Request:* > # DistCP only the files modified in last n days within the look back window. > # DistCP will copy only the files modified even when the source file which > were NOT modified in last n days in the destination directory. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (GOBBLIN-759) DistCP files modified in last n days within a look back period
[ https://issues.apache.org/jira/browse/GOBBLIN-759?focusedWorklogId=240533&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-240533 ] ASF GitHub Bot logged work on GOBBLIN-759: -- Author: ASF GitHub Bot Created on: 11/May/19 01:21 Start Date: 11/May/19 01:21 Worklog Time Spent: 10m Work Description: jhsenjaliya commented on issue #2623: [GOBBLIN-759] Added feature to support DistCP to copy files modified in last n days URL: https://github.com/apache/incubator-gobblin/pull/2623#issuecomment-491467992 @amarnathkarthik, can you please squash these commits? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 240533) Time Spent: 40m (was: 0.5h) > DistCP files modified in last n days within a look back period > -- > > Key: GOBBLIN-759 > URL: https://issues.apache.org/jira/browse/GOBBLIN-759 > Project: Apache Gobblin > Issue Type: Improvement >Reporter: Karthik Amarnath >Priority: Major > Time Spent: 40m > Remaining Estimate: 0h > > *Feature Request:* > # DistCP only the files modified in last n days within the look back window. > # DistCP will copy only the files modified even when the source file which > were NOT modified in last n days in the destination directory. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (GOBBLIN-759) DistCP files modified in last n days within a look back period
[ https://issues.apache.org/jira/browse/GOBBLIN-759?focusedWorklogId=234781&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-234781 ] ASF GitHub Bot logged work on GOBBLIN-759: -- Author: ASF GitHub Bot Created on: 29/Apr/19 18:16 Start Date: 29/Apr/19 18:16 Worklog Time Spent: 10m Work Description: amarnathkarthik commented on issue #2623: [GOBBLIN-759] Added feature to support DistCP to copy files modified in last n days URL: https://github.com/apache/incubator-gobblin/pull/2623#issuecomment-487687405 @sv2000 Build successful, please review This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 234781) Time Spent: 0.5h (was: 20m) > DistCP files modified in last n days within a look back period > -- > > Key: GOBBLIN-759 > URL: https://issues.apache.org/jira/browse/GOBBLIN-759 > Project: Apache Gobblin > Issue Type: Improvement >Reporter: Karthik Amarnath >Priority: Major > Time Spent: 0.5h > Remaining Estimate: 0h > > *Feature Request:* > # DistCP only the files modified in last n days within the look back window. > # DistCP will copy only the files modified even when the source file which > were NOT modified in last n days in the destination directory. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (GOBBLIN-759) DistCP files modified in last n days within a look back period
[ https://issues.apache.org/jira/browse/GOBBLIN-759?focusedWorklogId=234024&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-234024 ] ASF GitHub Bot logged work on GOBBLIN-759: -- Author: ASF GitHub Bot Created on: 27/Apr/19 21:19 Start Date: 27/Apr/19 21:19 Worklog Time Spent: 10m Work Description: amarnathkarthik commented on pull request #2623: [GOBBLIN-759] Added feature to support DistCP to copy files modified in last n days URL: https://github.com/apache/incubator-gobblin/pull/2623

Dear Gobblin maintainers,

Please accept this PR. I understand that it will not be reviewed until I have checked off all the steps below!

### JIRA
- [ ] My PR addresses the following [Gobblin JIRA](https://issues.apache.org/jira/browse/GOBBLIN/) issues and references them in the PR title. For example, "[GOBBLIN-759] My Added feature to support DistCP to copy files modified in last n days"
    - https://issues.apache.org/jira/browse/GOBBLIN-759

### Description
- [ ] Here are some details about my PR, including screenshots (if applicable):
    1. Added a feature to DistCP the files that were modified in the last n days within the lookback period.
    2. This feature allows copying only the modified files even when the unmodified files are not present at the destination.
    3. Leverages the existing TimestampBasedCopyableDataset to find the dataset and uses the SelectBtwModDataTimeBasedCopyableFileFilter CopyableFilter implementation to filter the files that were modified in the last n days.

### Tests
- [ ] My PR adds the following unit tests __OR__ does not need testing for this extremely good reason:
    1. Added the TimestampBasedCopyableDatasetTest.testCopyWithFilter test case to test a scenario with 1 modified and 1 non-modified file.

### Commits
- [ ] My commits all reference JIRA issues in their subject lines, and I have squashed multiple commits if they address the same issue. In addition, my commits follow the guidelines from "[How to write a good git commit message](http://chris.beams.io/posts/git-commit/)":
    1. Subject is separated from body by a blank line
    2. Subject is limited to 50 characters
    3. Subject does not end with a period
    4. Subject uses the imperative mood ("add", not "adding")
    5. Body wraps at 72 characters
    6. Body explains "what" and "why", not "how"

Issue Time Tracking --- Worklog Id: (was: 234024) Time Spent: 10m Remaining Estimate: 0h
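
The selection rule described in the PR above can be illustrated with a minimal sketch. It is not the code from the PR and does not use Gobblin's `SelectBtwModDataTimeBasedCopyableFileFilter`; the class `ModifiedWithinDaysSketch`, the `FileInfo` type, and the parameters `lookbackDays` and `modifiedWithinDays` are hypothetical names chosen for illustration. It also assumes one reading of the feature request: a file is copied only if its dataset version falls inside the lookback window and the file itself was modified in the last n days.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical illustration only; not part of Gobblin. */
final class ModifiedWithinDaysSketch {

    /** Hypothetical stand-in for a file inside a timestamped dataset version. */
    static final class FileInfo {
        final String path;
        final Instant versionTime;   // timestamp of the dataset version the file belongs to
        final Instant modifiedTime;  // last modification time of the file itself

        FileInfo(String path, Instant versionTime, Instant modifiedTime) {
            this.path = path;
            this.versionTime = versionTime;
            this.modifiedTime = modifiedTime;
        }
    }

    /**
     * Returns the paths of files whose dataset version is inside the lookback
     * window AND whose modification time is within the last modifiedWithinDays.
     * Both window boundaries are treated as inclusive (an assumption).
     */
    static List<String> select(List<FileInfo> files, Instant now,
                               int lookbackDays, int modifiedWithinDays) {
        Instant lookbackStart = now.minus(Duration.ofDays(lookbackDays));
        Instant modifiedStart = now.minus(Duration.ofDays(modifiedWithinDays));
        List<String> selected = new ArrayList<>();
        for (FileInfo f : files) {
            boolean inLookback = !f.versionTime.isBefore(lookbackStart);
            boolean recentlyModified = !f.modifiedTime.isBefore(modifiedStart);
            if (inLookback && recentlyModified) {
                selected.add(f.path);
            }
        }
        return selected;
    }
}
```

Note that a file's presence or absence at the destination plays no part in this rule, which matches the second point of the description: unmodified files are skipped even if they were never copied.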
[jira] [Work logged] (GOBBLIN-759) DistCP files modified in last n days within a look back period
[ https://issues.apache.org/jira/browse/GOBBLIN-759?focusedWorklogId=234025&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-234025 ] ASF GitHub Bot logged work on GOBBLIN-759: -- Author: ASF GitHub Bot Created on: 27/Apr/19 21:19 Start Date: 27/Apr/19 21:19 Worklog Time Spent: 10m Work Description: amarnathkarthik commented on issue #2623: [GOBBLIN-759] Added feature to support DistCP to copy files modified in last n days URL: https://github.com/apache/incubator-gobblin/pull/2623#issuecomment-487320699 @sv2000 Please review. Thanks Issue Time Tracking --- Worklog Id: (was: 234025) Time Spent: 20m (was: 10m)