[jira] [Work logged] (GOBBLIN-759) DistCP files modified in last n days within a look back period

2020-03-10 Thread ASF GitHub Bot (Jira)


 [ https://issues.apache.org/jira/browse/GOBBLIN-759?focusedWorklogId=400864&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-400864 ]

ASF GitHub Bot logged work on GOBBLIN-759:
--

Author: ASF GitHub Bot
Created on: 10/Mar/20 17:15
Start Date: 10/Mar/20 17:15
Worklog Time Spent: 10m 
  Work Description: codecov-io commented on issue #2633: GOBBLIN-759: Added 
feature to support DistCP to copy files that were …
URL: 
https://github.com/apache/incubator-gobblin/pull/2633#issuecomment-586633838
 
 
   # [Codecov](https://codecov.io/gh/apache/incubator-gobblin/pull/2633?src=pr&el=h1) Report
   > :exclamation: No coverage uploaded for pull request base (`master@bca2e1f`). [Click here to learn what that means](https://docs.codecov.io/docs/error-reference#section-missing-base-commit).
   > The diff coverage is `0%`.
   
   
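   ```diff
   @@           Coverage Diff            @@
   ##             master    #2633   +/-   ##
   =========================================
     Coverage          ?    4.13%
     Complexity        ?      751
   =========================================
     Files             ?     1937
     Lines             ?    72988
     Branches          ?     8051
   =========================================
     Hits              ?     3017
     Misses            ?    69652
     Partials          ?      319
   ```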
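The counters in this summary are internally consistent: 3017 hits + 69652 misses + 319 partials = 72988 lines, and 3017 / 72988 is about 4.13%. Below is a small Java check of that arithmetic. It assumes Codecov's usual definition of coverage as hits divided by all tracked lines, with partially covered lines counting against coverage. The class and method names are illustrative only.

```java
import java.util.Locale;

/**
 * Recomputes the Codecov summary from its raw counters.
 * Assumption: coverage = hits / (hits + misses + partials).
 */
public final class CoverageMath {
  private CoverageMath() {}

  /** Every tracked line is exactly one of: hit, missed, or partially covered. */
  static long totalLines(long hits, long misses, long partials) {
    return hits + misses + partials;
  }

  /** Coverage percentage; partial lines count against coverage, just like misses. */
  static double coveragePercent(long hits, long misses, long partials) {
    return 100.0 * hits / totalLines(hits, misses, partials);
  }

  public static void main(String[] args) {
    long lines = totalLines(3017, 69652, 319);
    // Locale.ROOT keeps the decimal separator a '.' regardless of the JVM default locale.
    System.out.printf(Locale.ROOT, "lines=%d coverage=%.2f%%%n", lines, coveragePercent(3017, 69652, 319));
    // prints lines=72988 coverage=4.13%
  }
}
```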
   
   
   | [Impacted Files](https://codecov.io/gh/apache/incubator-gobblin/pull/2633?src=pr&el=tree) | Coverage Δ | Complexity Δ | |
   |---|---|---|---|
   | [...sion/finder/HdfsModifiedTimeHiveVersionFinder.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2633/diff?src=pr&el=tree#diff-Z29iYmxpbi1kYXRhLW1hbmFnZW1lbnQvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2dvYmJsaW4vZGF0YS9tYW5hZ2VtZW50L3ZlcnNpb24vZmluZGVyL0hkZnNNb2RpZmllZFRpbWVIaXZlVmVyc2lvbkZpbmRlci5qYXZh) | `23.07% <ø> (ø)` | `1 <0> (?)` | |
   | [...writer/partitioner/TimeBasedWriterPartitioner.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2633/diff?src=pr&el=tree#diff-Z29iYmxpbi1jb3JlL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9nb2JibGluL3dyaXRlci9wYXJ0aXRpb25lci9UaW1lQmFzZWRXcml0ZXJQYXJ0aXRpb25lci5qYXZh) | `0% <ø> (ø)` | `0 <0> (?)` | |
   | [...he/gobblin/cluster/TaskRunnerSuiteThreadModel.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2633/diff?src=pr&el=tree#diff-Z29iYmxpbi1jbHVzdGVyL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9nb2JibGluL2NsdXN0ZXIvVGFza1J1bm5lclN1aXRlVGhyZWFkTW9kZWwuamF2YQ==) | `0% <ø> (ø)` | `0 <0> (?)` | |
   | [.../java/org/apache/gobblin/hive/HiveLockFactory.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2633/diff?src=pr&el=tree#diff-Z29iYmxpbi1oaXZlLXJlZ2lzdHJhdGlvbi9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvZ29iYmxpbi9oaXZlL0hpdmVMb2NrRmFjdG9yeS5qYXZh) | `0% <ø> (ø)` | `0 <0> (?)` | |
   | [...lin/hive/metastore/HiveMetaStoreBasedRegister.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2633/diff?src=pr&el=tree#diff-Z29iYmxpbi1oaXZlLXJlZ2lzdHJhdGlvbi9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvZ29iYmxpbi9oaXZlL21ldGFzdG9yZS9IaXZlTWV0YVN0b3JlQmFzZWRSZWdpc3Rlci5qYXZh) | `0% <ø> (ø)` | `0 <0> (?)` | |
   | [...pache/gobblin/configuration/ConfigurationKeys.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2633/diff?src=pr&el=tree#diff-Z29iYmxpbi1hcGkvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2dvYmJsaW4vY29uZmlndXJhdGlvbi9Db25maWd1cmF0aW9uS2V5cy5qYXZh) | `0% <ø> (ø)` | `0 <0> (?)` | |
   | [.../org/apache/gobblin/hive/HiveRegistrationUnit.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2633/diff?src=pr&el=tree#diff-Z29iYmxpbi1oaXZlLXJlZ2lzdHJhdGlvbi9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvZ29iYmxpbi9oaXZlL0hpdmVSZWdpc3RyYXRpb25Vbml0LmphdmE=) | `0% <ø> (ø)` | `0 <0> (?)` | |
   | [.../org/apache/gobblin/service/ServiceConfigKeys.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2633/diff?src=pr&el=tree#diff-Z29iYmxpbi1hcGkvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2dvYmJsaW4vc2VydmljZS9TZXJ2aWNlQ29uZmlnS2V5cy5qYXZh) | `0% <ø> (ø)` | `0 <0> (?)` | |
   | [...ain/java/org/apache/gobblin/writer/DataWriter.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2633/diff?src=pr&el=tree#diff-Z29iYmxpbi1hcGkvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2dvYmJsaW4vd3JpdGVyL0RhdGFXcml0ZXIuamF2YQ==) | `0% <ø> (ø)` | `0 <0> (?)` | |
   | [...ain/java/org/apache/gobblin/hive/HiveLockImpl.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2633/diff?src=pr&el=tree#diff-Z29iYmxpbi1oaXZlLXJlZ2lzdHJhdGlvbi9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvZ29iYmxpbi9oaXZlL0hpdmVMb2NrSW1wbC5qYXZh) | `0% <ø> (ø)` | `0 <0> (?)` | |
   | ... and [129 more](https://codecov.io/gh/apache/incubator-gobblin/pull

[jira] [Work logged] (GOBBLIN-759) DistCP files modified in last n days within a look back period

2020-03-08 Thread ASF GitHub Bot (Jira)


 [ https://issues.apache.org/jira/browse/GOBBLIN-759?focusedWorklogId=399988&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-399988 ]

ASF GitHub Bot logged work on GOBBLIN-759:
--

Author: ASF GitHub Bot
Created on: 09/Mar/20 06:36
Start Date: 09/Mar/20 06:36
Worklog Time Spent: 10m 
  Work Description: amarnathkarthik commented on pull request #2633: 
GOBBLIN-759: Added feature to support DistCP to copy files that were …
URL: https://github.com/apache/incubator-gobblin/pull/2633#discussion_r389482730
 
 

 ##
 File path: 
gobblin-data-management/src/main/java/org/apache/gobblin/data/management/copy/TimestampBasedCopyableDataset.java
 ##
 @@ -73,6 +71,7 @@
   private final VersionSelectionPolicy versionSelectionPolicy;
   private final ExecutorService executor;
   private final FileSystem srcFs;
+  private final CopyableFileFilter copyableFileFilter;
 
 Review comment:
   Thanks for the reference. AndPathFilter and CopyableFileFilter are two different interfaces, and I could not find a way to merge them: AndPathFilter implements accept(..), whereas CopyableFileFilter implements filter(..). Please advise.
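One way to reconcile the two shapes discussed here uses two pieces. The first is an adapter that lifts a per-item accept(..) predicate into a collection-level filter(..). The second is an AND combinator that chains collection filters. The sketch below is not Gobblin's API: CollectionFilter, fromPredicate and and are hypothetical names, and plain path strings stand in for CopyableFile. The real CopyableFileFilter#filter, as quoted in this thread, also receives the source and target FileSystem.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.List;
import java.util.function.Predicate;

public final class FilterComposition {
  private FilterComposition() {}

  /** Hypothetical stand-in for a collection-level filter in the style of CopyableFileFilter#filter(..). */
  interface CollectionFilter<T> {
    Collection<T> filter(Collection<T> items);
  }

  /** Lifts a per-item accept(..) check (PathFilter style) into a collection-level filter. */
  static <T> CollectionFilter<T> fromPredicate(Predicate<? super T> accept) {
    return items -> {
      List<T> kept = new ArrayList<>();
      for (T item : items) {
        if (accept.test(item)) {
          kept.add(item);
        }
      }
      return kept;
    };
  }

  /** AND semantics: each filter only sees what the previous filters kept. */
  @SafeVarargs
  static <T> CollectionFilter<T> and(CollectionFilter<T>... filters) {
    return items -> {
      Collection<T> current = items;
      for (CollectionFilter<T> f : filters) {
        current = f.filter(current);
      }
      return current;
    };
  }

  public static void main(String[] args) {
    CollectionFilter<String> avroOnly = fromPredicate(p -> p.endsWith(".avro"));
    CollectionFilter<String> noHidden = fromPredicate(p -> !p.startsWith("_"));
    Collection<String> kept = and(avroOnly, noHidden)
        .filter(Arrays.asList("file1.avro", "_tmp.avro", "notes.txt"));
    System.out.println(kept); // prints [file1.avro]
  }
}
```

Applying the filters sequentially, instead of evaluating each against the full input and intersecting the results, means a cheap, highly selective filter placed first shrinks the work for the ones after it.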
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 399988)
Time Spent: 7h 40m  (was: 7.5h)

> DistCP files modified in last n days within a look back period
> --
>
> Key: GOBBLIN-759
> URL: https://issues.apache.org/jira/browse/GOBBLIN-759
> Project: Apache Gobblin
>  Issue Type: Improvement
>Reporter: Karthik Amarnath
>Priority: Major
>  Time Spent: 7h 40m
>  Remaining Estimate: 0h
>
> *Feature Request:*
>  # DistCP only the files modified in the last n days within the look back window.
>  # DistCP copies only the modified files, even when source files that were NOT modified in the last n days are missing from the destination directory.
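Read literally, the two requirements select source files by modification time alone, without consulting what already exists at the destination. Below is a minimal Java sketch of that selection rule. It makes three assumptions:

- The window is [now - max lookback, now - min lookback] with inclusive bounds, matching the "1d"/"6d" min/max settings used in the PR's test.
- A hypothetical map from path to modification time replaces a real FileSystem listing.
- The class name is illustrative only.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public final class ModifiedWindowSelection {
  private ModifiedWindowSelection() {}

  /**
   * Returns source paths whose modification time lies in [now - maxLookback, now - minLookback].
   * The destination is intentionally not consulted: an unmodified file is skipped even if the
   * destination directory does not contain it yet.
   */
  static List<String> select(Map<String, Instant> sourceModTimes, Instant now,
                             Duration minLookback, Duration maxLookback) {
    Instant newest = now.minus(minLookback);
    Instant oldest = now.minus(maxLookback);
    List<String> selected = new ArrayList<>();
    for (Map.Entry<String, Instant> e : sourceModTimes.entrySet()) {
      Instant modified = e.getValue();
      if (!modified.isBefore(oldest) && !modified.isAfter(newest)) {
        selected.add(e.getKey());
      }
    }
    return selected;
  }

  public static void main(String[] args) {
    Instant now = Instant.parse("2020-03-09T12:00:00Z");
    Map<String, Instant> source = new TreeMap<>();
    source.put("a.avro", now.minus(Duration.ofDays(2)));  // inside the window
    source.put("b.avro", now.minus(Duration.ofDays(10))); // older than the max lookback
    source.put("c.avro", now.minus(Duration.ofHours(1))); // newer than the min lookback
    System.out.println(select(source, now, Duration.ofDays(1), Duration.ofDays(6))); // prints [a.avro]
  }
}
```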



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (GOBBLIN-759) DistCP files modified in last n days within a look back period

2020-03-08 Thread ASF GitHub Bot (Jira)


 [ https://issues.apache.org/jira/browse/GOBBLIN-759?focusedWorklogId=399987&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-399987 ]

ASF GitHub Bot logged work on GOBBLIN-759:
--

Author: ASF GitHub Bot
Created on: 09/Mar/20 06:27
Start Date: 09/Mar/20 06:27
Worklog Time Spent: 10m 
  Work Description: amarnathkarthik commented on pull request #2633: 
GOBBLIN-759: Added feature to support DistCP to copy files that were …
URL: https://github.com/apache/incubator-gobblin/pull/2633#discussion_r389480809
 
 

 ##
 File path: 
gobblin-data-management/src/main/java/org/apache/gobblin/data/management/copy/TimestampBasedCopyableDataset.java
 ##
 @@ -121,8 +119,8 @@ public TimestampBasedCopyableDataset(FileSystem fs, Properties props, Path datas
 ConcurrentLinkedQueue copyableFileList = new ConcurrentLinkedQueue<>();
 List> futures = Lists.newArrayList();
 for (TimestampedDatasetVersion copyableVersion : copyableVersions) {
-  futures.add(this.executor.submit(this.getCopyableFileGenetator(targetFs, configuration, copyableVersion,
-  copyableFileList)));
+  futures.add(this.executor.submit(
+  this.getCopyableFileGenetator(targetFs, configuration, copyableVersion, copyableFileList)));
 
 Review comment:
   It's existing code, but fixed.
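The hunk under review does three things: it submits one task per selected version to an ExecutorService, has each task append to a shared ConcurrentLinkedQueue, and keeps the returned Futures. Below is a self-contained sketch of that pattern. The version names and the "/file1.avro" suffix are hypothetical stand-ins for whatever getCopyableFileGenerator(..) produces. Waiting on every Future is one reasonable way to surface task failures; it is not necessarily what TimestampedCopyableDataset does.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public final class ParallelVersionListing {
  private ParallelVersionListing() {}

  static List<String> listAll(List<String> versions, int threads)
      throws InterruptedException, ExecutionException {
    ExecutorService executor = Executors.newFixedThreadPool(threads);
    // Thread-safe sink shared by all tasks, like copyableFileList in the hunk.
    ConcurrentLinkedQueue<String> files = new ConcurrentLinkedQueue<>();
    List<Future<?>> futures = new ArrayList<>();
    try {
      for (String version : versions) {
        // Hypothetical per-version work standing in for getCopyableFileGenerator(..).
        futures.add(executor.submit(() -> {
          files.add(version + "/file1.avro");
        }));
      }
      for (Future<?> future : futures) {
        future.get(); // blocks until done; rethrows a task failure as ExecutionException
      }
    } finally {
      executor.shutdown();
    }
    List<String> result = new ArrayList<>(files);
    Collections.sort(result); // completion order is nondeterministic
    return result;
  }

  public static void main(String[] args) throws Exception {
    System.out.println(listAll(Arrays.asList("v2", "v1"), 2));
    // prints [v1/file1.avro, v2/file1.avro]
  }
}
```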
 


Issue Time Tracking
---

Worklog Id: (was: 399987)
Time Spent: 7.5h  (was: 7h 20m)



[jira] [Work logged] (GOBBLIN-759) DistCP files modified in last n days within a look back period

2020-03-08 Thread ASF GitHub Bot (Jira)


 [ https://issues.apache.org/jira/browse/GOBBLIN-759?focusedWorklogId=399986&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-399986 ]

ASF GitHub Bot logged work on GOBBLIN-759:
--

Author: ASF GitHub Bot
Created on: 09/Mar/20 06:26
Start Date: 09/Mar/20 06:26
Worklog Time Spent: 10m 
  Work Description: amarnathkarthik commented on pull request #2633: 
GOBBLIN-759: Added feature to support DistCP to copy files that were …
URL: https://github.com/apache/incubator-gobblin/pull/2633#discussion_r389480643
 
 

 ##
 File path: 
gobblin-data-management/src/test/java/org/apache/gobblin/data/management/copy/TimestampBasedCopyableDatasetTest.java
 ##
 @@ -91,12 +110,82 @@ public void testConfigOptions() {
 TimeBasedCopyPolicyForTest.class.getName());
   }
 
+  @Test
+  public void testCopyWithFilter() throws IOException {
+
+/** source setup **/
+Path srcRoot = new Path(this.testTempPath, "src/data/dataset1/daily");
+
+if (this.localFs.exists(srcRoot)) {
+  this.localFs.delete(srcRoot, true);
+}
+
+List dateTimeList = Lists.newArrayList();
+IntStream.range(0, 4)
+.forEach(
+i -> dateTimeList.add(new DateTime(DateTimeZone.forID(ConfigurationKeys.PST_TIMEZONE_NAME)).minusDays(i)));
+
+String datePattern = "/MM/dd";
+DateTimeFormatter formatter = DateTimeFormat.forPattern(datePattern);
+
+for (DateTime dt : dateTimeList) {
+  String srcVersionPathStr = formatter.print(dt);
+  Path srcVersionPath = new Path(srcRoot, srcVersionPathStr);
+  this.localFs.mkdirs(srcVersionPath);
+
+  Path srcfile = new Path(srcVersionPath, "file1.avro");
+  this.localFs.create(srcfile);
+}
+
+/** destination setup **/
+Path destRoot = new Path(this.testTempPath, "dest/data/dataset1");
+if (this.localFs.exists(destRoot)) {
+  this.localFs.delete(destRoot, true);
+}
+this.localFs.mkdirs(destRoot);
+
+Properties props = new Properties();
+props.setProperty(TimestampBasedCopyableDataset.COPY_POLICY, SelectBetweenTimeBasedPolicy.class.getName());
+props.setProperty(TimestampBasedCopyableDataset.DATASET_VERSION_FINDER,
+DateTimeDatasetVersionFinder.class.getName());
+props.setProperty(SelectBetweenTimeBasedPolicy.TIME_BASED_SELECTION_MIN_LOOK_BACK_TIME_KEY, "1d");
+props.setProperty(SelectBetweenTimeBasedPolicy.TIME_BASED_SELECTION_MAX_LOOK_BACK_TIME_KEY, "6d");
+props.setProperty(DateTimeDatasetVersionFinder.DATE_TIME_PATTERN_KEY, "/MM/dd");
+props.setProperty("gobblin.dataset.copyable.file.filter.class",
 
 Review comment:
   org.apache.gobblin.data.management.dataset.DatasetUtils and org.apache.gobblin.data.management.copy.TimestampBasedCopyableDatasetTest are in different packages, so I will change the access modifier to public.
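For reference, the test creates day directories for today and the previous three days, and configures a min lookback of "1d" and a max lookback of "6d". This sketch rests on two assumptions, not taken from Gobblin code:

- SelectBetweenTimeBasedPolicy keeps versions whose timestamp falls between now minus the max lookback and now minus the min lookback.
- A day directory's version time is that day's midnight in the configured zone.

Under those assumptions, today's directory is excluded and the other three are selected. The java.time sketch below reproduces that reasoning with illustrative names, for example at 10:00 on 2020-03-09 in America/Los_Angeles.

```java
import java.time.LocalDate;
import java.time.Period;
import java.time.ZoneId;
import java.time.ZonedDateTime;
import java.time.temporal.ChronoUnit;
import java.util.ArrayList;
import java.util.List;

public final class VersionWindowExample {
  private VersionWindowExample() {}

  /**
   * Returns the dates of the day directories (today and the previous daysToCreate - 1 days)
   * whose midnight lies in [now - maxLookback, now - minLookback], bounds inclusive.
   */
  static List<LocalDate> selectDays(ZonedDateTime now, int daysToCreate,
                                    Period minLookback, Period maxLookback) {
    ZonedDateTime newest = now.minus(minLookback);
    ZonedDateTime oldest = now.minus(maxLookback);
    List<LocalDate> selected = new ArrayList<>();
    for (int i = 0; i < daysToCreate; i++) {
      // Assumed version time: local midnight of the directory's day.
      ZonedDateTime versionTime = now.minusDays(i).truncatedTo(ChronoUnit.DAYS);
      if (!versionTime.isBefore(oldest) && !versionTime.isAfter(newest)) {
        selected.add(versionTime.toLocalDate());
      }
    }
    return selected;
  }

  public static void main(String[] args) {
    ZonedDateTime now = ZonedDateTime.of(2020, 3, 9, 10, 0, 0, 0, ZoneId.of("America/Los_Angeles"));
    System.out.println(selectDays(now, 4, Period.ofDays(1), Period.ofDays(6)));
    // prints [2020-03-08, 2020-03-07, 2020-03-06]
  }
}
```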
 


Issue Time Tracking
---

Worklog Id: (was: 399986)
Time Spent: 7h 20m  (was: 7h 10m)



[jira] [Work logged] (GOBBLIN-759) DistCP files modified in last n days within a look back period

2020-03-04 Thread ASF GitHub Bot (Jira)


 [ https://issues.apache.org/jira/browse/GOBBLIN-759?focusedWorklogId=398151&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-398151 ]

ASF GitHub Bot logged work on GOBBLIN-759:
--

Author: ASF GitHub Bot
Created on: 05/Mar/20 05:32
Start Date: 05/Mar/20 05:32
Worklog Time Spent: 10m 
  Work Description: sv2000 commented on pull request #2633: GOBBLIN-759: 
Added feature to support DistCP to copy files that were …
URL: https://github.com/apache/incubator-gobblin/pull/2633#discussion_r388085632
 
 

 ##
 File path: 
gobblin-data-management/src/test/java/org/apache/gobblin/data/management/copy/TimestampBasedCopyableDatasetTest.java
 ##
 @@ -91,12 +110,82 @@ public void testConfigOptions() {
 TimeBasedCopyPolicyForTest.class.getName());
   }
 
+  @Test
+  public void testCopyWithFilter() throws IOException {
+
+/** source setup **/
+Path srcRoot = new Path(this.testTempPath, "src/data/dataset1/daily");
+
+if (this.localFs.exists(srcRoot)) {
+  this.localFs.delete(srcRoot, true);
+}
+
+List dateTimeList = Lists.newArrayList();
+IntStream.range(0, 4)
+.forEach(
+i -> dateTimeList.add(new DateTime(DateTimeZone.forID(ConfigurationKeys.PST_TIMEZONE_NAME)).minusDays(i)));
+
+String datePattern = "/MM/dd";
+DateTimeFormatter formatter = DateTimeFormat.forPattern(datePattern);
+
+for (DateTime dt : dateTimeList) {
+  String srcVersionPathStr = formatter.print(dt);
+  Path srcVersionPath = new Path(srcRoot, srcVersionPathStr);
+  this.localFs.mkdirs(srcVersionPath);
+
+  Path srcfile = new Path(srcVersionPath, "file1.avro");
+  this.localFs.create(srcfile);
+}
+
+/** destination setup **/
+Path destRoot = new Path(this.testTempPath, "dest/data/dataset1");
+if (this.localFs.exists(destRoot)) {
+  this.localFs.delete(destRoot, true);
+}
+this.localFs.mkdirs(destRoot);
+
+Properties props = new Properties();
+props.setProperty(TimestampBasedCopyableDataset.COPY_POLICY, SelectBetweenTimeBasedPolicy.class.getName());
+props.setProperty(TimestampBasedCopyableDataset.DATASET_VERSION_FINDER,
+DateTimeDatasetVersionFinder.class.getName());
+props.setProperty(SelectBetweenTimeBasedPolicy.TIME_BASED_SELECTION_MIN_LOOK_BACK_TIME_KEY, "1d");
+props.setProperty(SelectBetweenTimeBasedPolicy.TIME_BASED_SELECTION_MAX_LOOK_BACK_TIME_KEY, "6d");
+props.setProperty(DateTimeDatasetVersionFinder.DATE_TIME_PATTERN_KEY, "/MM/dd");
+props.setProperty("gobblin.dataset.copyable.file.filter.class",
 
 Review comment:
   Make DatasetUtils.COPYABLE_FILE_FILTER_KEY package private and use it here?
 


Issue Time Tracking
---

Worklog Id: (was: 398151)
Time Spent: 7h 10m


[jira] [Work logged] (GOBBLIN-759) DistCP files modified in last n days within a look back period

2020-03-04 Thread ASF GitHub Bot (Jira)


 [ https://issues.apache.org/jira/browse/GOBBLIN-759?focusedWorklogId=398150&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-398150 ]

ASF GitHub Bot logged work on GOBBLIN-759:
--

Author: ASF GitHub Bot
Created on: 05/Mar/20 05:32
Start Date: 05/Mar/20 05:32
Worklog Time Spent: 10m 
  Work Description: sv2000 commented on pull request #2633: GOBBLIN-759: 
Added feature to support DistCP to copy files that were …
URL: https://github.com/apache/incubator-gobblin/pull/2633#discussion_r388083153
 
 

 ##
 File path: 
gobblin-data-management/src/main/java/org/apache/gobblin/data/management/copy/TimestampBasedCopyableDataset.java
 ##
 @@ -121,8 +119,8 @@ public TimestampBasedCopyableDataset(FileSystem fs, Properties props, Path datas
 ConcurrentLinkedQueue copyableFileList = new ConcurrentLinkedQueue<>();
 List> futures = Lists.newArrayList();
 for (TimestampedDatasetVersion copyableVersion : copyableVersions) {
-  futures.add(this.executor.submit(this.getCopyableFileGenetator(targetFs, configuration, copyableVersion,
-  copyableFileList)));
+  futures.add(this.executor.submit(
+  this.getCopyableFileGenetator(targetFs, configuration, copyableVersion, copyableFileList)));
 
 Review comment:
   Typo: getCopyableFileGenetator should be this.getCopyableFileGenerator.
 


Issue Time Tracking
---

Worklog Id: (was: 398150)
Time Spent: 7h 10m  (was: 7h)



[jira] [Work logged] (GOBBLIN-759) DistCP files modified in last n days within a look back period

2020-03-04 Thread ASF GitHub Bot (Jira)


 [ https://issues.apache.org/jira/browse/GOBBLIN-759?focusedWorklogId=398149&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-398149 ]

ASF GitHub Bot logged work on GOBBLIN-759:
--

Author: ASF GitHub Bot
Created on: 05/Mar/20 05:32
Start Date: 05/Mar/20 05:32
Worklog Time Spent: 10m 
  Work Description: sv2000 commented on pull request #2633: GOBBLIN-759: 
Added feature to support DistCP to copy files that were …
URL: https://github.com/apache/incubator-gobblin/pull/2633#discussion_r388081914
 
 

 ##
 File path: 
gobblin-data-management/src/main/java/org/apache/gobblin/data/management/copy/TimestampBasedCopyableDataset.java
 ##
 @@ -73,6 +71,7 @@
   private final VersionSelectionPolicy versionSelectionPolicy;
   private final ExecutorService executor;
   private final FileSystem srcFs;
+  private final CopyableFileFilter copyableFileFilter;
 
 Review comment:
   Please take a look at AndPathFilter in gobblin-utility and 
UnixTimestampRecursiveCopyableDataset for an example usage.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 398149)
Time Spent: 7h  (was: 6h 50m)



[jira] [Work logged] (GOBBLIN-759) DistCP files modified in last n days within a look back period

2020-02-28 Thread ASF GitHub Bot (Jira)


 [ https://issues.apache.org/jira/browse/GOBBLIN-759?focusedWorklogId=395366&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-395366 ]

ASF GitHub Bot logged work on GOBBLIN-759:
--

Author: ASF GitHub Bot
Created on: 29/Feb/20 01:29
Start Date: 29/Feb/20 01:29
Worklog Time Spent: 10m 
  Work Description: arjun4084346 commented on issue #2633: GOBBLIN-759: 
Added feature to support DistCP to copy files that were …
URL: 
https://github.com/apache/incubator-gobblin/pull/2633#issuecomment-592804300
 
 
   +1 LGTM
 


Issue Time Tracking
---

Worklog Id: (was: 395366)
Time Spent: 6h 50m  (was: 6h 40m)



[jira] [Work logged] (GOBBLIN-759) DistCP files modified in last n days within a look back period

2020-02-27 Thread ASF GitHub Bot (Jira)


 [ https://issues.apache.org/jira/browse/GOBBLIN-759?focusedWorklogId=394646&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-394646 ]

ASF GitHub Bot logged work on GOBBLIN-759:
--

Author: ASF GitHub Bot
Created on: 28/Feb/20 05:19
Start Date: 28/Feb/20 05:19
Worklog Time Spent: 10m 
  Work Description: arjun4084346 commented on pull request #2633: 
GOBBLIN-759: Added feature to support DistCP to copy files that were …
URL: https://github.com/apache/incubator-gobblin/pull/2633#discussion_r385511737
 
 

 ##
 File path: 
gobblin-data-management/src/main/java/org/apache/gobblin/data/management/copy/DateRangeBasedFileFilter.java
 ##
 @@ -0,0 +1,109 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.gobblin.data.management.copy;
+
+import com.google.common.base.Strings;
+import com.google.common.collect.ImmutableList;
+import java.util.Collection;
+import java.util.Iterator;
+import lombok.extern.slf4j.Slf4j;
+import org.apache.gobblin.configuration.ConfigurationKeys;
+import org.apache.hadoop.fs.FileSystem;
+import org.joda.time.DateTime;
+import org.joda.time.DateTimeZone;
+import org.joda.time.Period;
+import org.joda.time.format.PeriodFormatter;
+import org.joda.time.format.PeriodFormatterBuilder;
+
+
+/**
+ * A {@link CopyableFileFilter} that drops a {@link CopyableFile} if file modification time not within the lookback
+ * window
+ *  sourceFs
+ */
+@Slf4j
+public class DateRangeBasedFileFilter implements CopyableFileFilter {
+
+  private Period minLookBackPeriod;
+  private Period maxLookBackPeriod;
+  private String timezone;
+  private DateTime currentTime;
+  private DateTime minLookBackTime;
+  private DateTime maxLookBackTime;
+
+  public static final String CONFIGURATION_KEY_PREFIX = "gobblin.dataset.filter.";
+  public static final String DEFAULT_DATE_PATTERN_TIMEZONE = ConfigurationKeys.PST_TIMEZONE_NAME;
+
+  public DateRangeBasedFileFilter(Period minLookback, Period maxLookback, String timezone) {
+this.minLookBackPeriod = minLookback;
+this.maxLookBackPeriod = maxLookback;
+this.timezone = timezone;
+this.currentTime = !Strings.isNullOrEmpty(this.timezone) ? DateTime.now(DateTimeZone.forID(this.timezone))
+: DateTime.now(DateTimeZone.forID(DEFAULT_DATE_PATTERN_TIMEZONE));
+this.minLookBackTime = this.currentTime.minus(this.minLookBackPeriod);
+this.maxLookBackTime = this.currentTime.minus(this.maxLookBackPeriod);
+  }
+
+  /**
+   * For every {@link CopyableFile} in copyableFiles checks if a
+   * {@link CopyableFile#getOrigin()#getPath()#getModificationTime()}
+   * + date between the min and max look back window on sourceFs {@inheritDoc}
+   *
+   * @see CopyableFileFilter#filter(FileSystem,
+   *  FileSystem, Collection)
+   */
+  @Override
+  public Collection filter(FileSystem sourceFs, FileSystem targetFs,
+  Collection copyableFiles) {
+Iterator iterator = copyableFiles.iterator();
+
+ImmutableList.Builder filtered = ImmutableList.builder();
+
+while (iterator.hasNext()) {
+  CopyableFile file = iterator.next();
+  if (isFileModifiedWithinLookBackPeriod(file.getOrigin().getModificationTime())) {
+filtered.add(file);
+  }
+}
+
+return filtered.build();
+  }
+
+  /**
+   *
+   * @param modTime file modification time in long.
+   * @return true if the file modification time within lookback window;
+   * false if file modification time not within lookback window.
+   *
+   */
+  private boolean isFileModifiedWithinLookBackPeriod(long modTime) {
+DateTime modifiedTime =
+!Strings.isNullOrEmpty(this.timezone) ? new DateTime(modTime, DateTimeZone.forID(this.timezone))
+: new DateTime(modTime, DateTimeZone.forID(DEFAULT_DATE_PATTERN_TIMEZONE));
 
 Review comment:
   I found this more readable:
   Strings.isNullOrEmpty(this.timezone) ? new DateTime(modTime, DateTimeZone.forID(DEFAULT_DATE_PATTERN_TIMEZONE)) : new DateTime(modTime, DateTimeZone.forID(this.timezone));
   
   just saying :D
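The reviewer's point can be taken one step further. The same null-or-empty fallback appears in both the constructor and isFileModifiedWithinLookBackPeriod(..), so a single helper would remove the duplicated ternaries.

The sketch below uses java.time instead of Joda-Time so it runs with only the JDK. The names resolveZone and lookbackBoundary are hypothetical. The default zone string assumes ConfigurationKeys.PST_TIMEZONE_NAME resolves to "America/Los_Angeles".

It also shows why the zone matters in java.time at all. In that zone, subtracting a calendar Period of one day across the 2020-03-08 DST change moves the boundary by 23 hours, not 24.

```java
import java.time.Duration;
import java.time.Instant;
import java.time.Period;
import java.time.ZoneId;
import java.time.ZonedDateTime;

public final class ZonedLookback {
  private ZonedLookback() {}

  // Assumption: stands in for ConfigurationKeys.PST_TIMEZONE_NAME.
  static final String DEFAULT_TIMEZONE = "America/Los_Angeles";

  /** Falls back to the default zone when the configured timezone is null or empty. */
  static ZoneId resolveZone(String timezone) {
    return (timezone == null || timezone.isEmpty())
        ? ZoneId.of(DEFAULT_TIMEZONE)
        : ZoneId.of(timezone);
  }

  /** Subtracts a calendar period in the given zone, so "1 day" follows local wall-clock time. */
  static Instant lookbackBoundary(Instant now, Period period, ZoneId zone) {
    return ZonedDateTime.ofInstant(now, zone).minus(period).toInstant();
  }

  public static void main(String[] args) {
    Instant now = Instant.parse("2020-03-08T12:00:00Z"); // 05:00 PDT, a few hours after the DST switch
    Instant boundary = lookbackBoundary(now, Period.ofDays(1), resolveZone(null));
    System.out.println(Duration.between(boundary, now).toHours()); // prints 23
  }
}
```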
 
---

[jira] [Work logged] (GOBBLIN-759) DistCP files modified in last n days within a look back period

2020-02-27 Thread ASF GitHub Bot (Jira)


 [ https://issues.apache.org/jira/browse/GOBBLIN-759?focusedWorklogId=394647&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-394647 ]

ASF GitHub Bot logged work on GOBBLIN-759:
--

Author: ASF GitHub Bot
Created on: 28/Feb/20 05:19
Start Date: 28/Feb/20 05:19
Worklog Time Spent: 10m 
  Work Description: arjun4084346 commented on pull request #2633: 
GOBBLIN-759: Added feature to support DistCP to copy files that were …
URL: https://github.com/apache/incubator-gobblin/pull/2633#discussion_r385511940
 
 

 ##
 File path: 
gobblin-data-management/src/main/java/org/apache/gobblin/data/management/copy/ModifiedDateRangeBasedFileFilter.java
 ##
 @@ -0,0 +1,55 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.gobblin.data.management.copy;
+
+import java.util.Properties;
+
+import lombok.extern.slf4j.Slf4j;
+
+import org.joda.time.DateTime;
+import org.joda.time.Period;
+
+
+/**
+ * A {@link CopyableFileFilter} that drops a {@link CopyableFile} if file modification time not within the lookback
 
 Review comment:
   same. "in"
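ModifiedDateRangeBasedFileFilter imports java.util.Properties and Joda's Period, and the test in this thread sets the lookbacks as strings such as "1d" and "6d". That suggests the filter builds its window from configuration. Below is a minimal sketch of turning such strings into periods, assuming a format of a whole number followed by d (days) or w (weeks). Gobblin's actual accepted units are defined by its own PeriodFormatter setup and may differ. The property key in the usage lines is hypothetical.

```java
import java.time.Period;
import java.util.Properties;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public final class LookbackParsing {
  private LookbackParsing() {}

  // Assumed format: a whole number followed by 'd' (days) or 'w' (weeks), e.g. "1d" or "6d".
  private static final Pattern LOOKBACK = Pattern.compile("(\\d+)([dw])");

  static Period parseLookback(String value) {
    Matcher m = LOOKBACK.matcher(value.trim());
    if (!m.matches()) {
      throw new IllegalArgumentException("Unsupported lookback: " + value);
    }
    int amount = Integer.parseInt(m.group(1));
    return "d".equals(m.group(2)) ? Period.ofDays(amount) : Period.ofWeeks(amount);
  }

  /** Reads a lookback from configuration, using defaultValue when the key is absent. */
  static Period readLookback(Properties props, String key, String defaultValue) {
    return parseLookback(props.getProperty(key, defaultValue));
  }

  public static void main(String[] args) {
    Properties props = new Properties();
    props.setProperty("example.max.lookback", "6d"); // hypothetical key
    System.out.println(readLookback(props, "example.max.lookback", "1d")); // prints P6D
    System.out.println(readLookback(props, "example.min.lookback", "1d")); // prints P1D
  }
}
```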
 


Issue Time Tracking
---

Worklog Id: (was: 394647)
Time Spent: 6h 40m  (was: 6.5h)



[jira] [Work logged] (GOBBLIN-759) DistCP files modified in last n days within a look back period

2020-02-27 Thread ASF GitHub Bot (Jira)


 [ https://issues.apache.org/jira/browse/GOBBLIN-759?focusedWorklogId=394645&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-394645 ]

ASF GitHub Bot logged work on GOBBLIN-759:
--

Author: ASF GitHub Bot
Created on: 28/Feb/20 05:10
Start Date: 28/Feb/20 05:10
Worklog Time Spent: 10m 
  Work Description: arjun4084346 commented on pull request #2633: 
GOBBLIN-759: Added feature to support DistCP to copy files that were …
URL: https://github.com/apache/incubator-gobblin/pull/2633#discussion_r385509959
 
 

 ##
 File path: 
gobblin-data-management/src/main/java/org/apache/gobblin/data/management/copy/DateRangeBasedFileFilter.java
 ##
 @@ -0,0 +1,109 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.gobblin.data.management.copy;
+
+import com.google.common.base.Strings;
+import com.google.common.collect.ImmutableList;
+import java.util.Collection;
+import java.util.Iterator;
+import lombok.extern.slf4j.Slf4j;
+import org.apache.gobblin.configuration.ConfigurationKeys;
+import org.apache.hadoop.fs.FileSystem;
+import org.joda.time.DateTime;
+import org.joda.time.DateTimeZone;
+import org.joda.time.Period;
+import org.joda.time.format.PeriodFormatter;
+import org.joda.time.format.PeriodFormatterBuilder;
+
+
+/**
+ * A {@link CopyableFileFilter} that drops a {@link CopyableFile} if file modification time not within the lookback
 
 Review comment:
   time is not ..
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 394645)
Time Spent: 6h 20m  (was: 6h 10m)



[jira] [Work logged] (GOBBLIN-759) DistCP files modified in last n days within a look back period

2020-02-15 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/GOBBLIN-759?focusedWorklogId=387991&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-387991
 ]

ASF GitHub Bot logged work on GOBBLIN-759:
--

Author: ASF GitHub Bot
Created on: 15/Feb/20 20:00
Start Date: 15/Feb/20 20:00
Worklog Time Spent: 10m 
  Work Description: amarnathkarthik commented on issue #2633: GOBBLIN-759: 
Added feature to support DistCP to copy files that were …
URL: 
https://github.com/apache/incubator-gobblin/pull/2633#issuecomment-586636155
 
 
   @sv2000 Please review
 



Issue Time Tracking
---

Worklog Id: (was: 387991)
Time Spent: 6h 10m  (was: 6h)



[jira] [Work logged] (GOBBLIN-759) DistCP files modified in last n days within a look back period

2020-02-15 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/GOBBLIN-759?focusedWorklogId=387989&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-387989
 ]

ASF GitHub Bot logged work on GOBBLIN-759:
--

Author: ASF GitHub Bot
Created on: 15/Feb/20 19:33
Start Date: 15/Feb/20 19:33
Worklog Time Spent: 10m 
  Work Description: codecov-io commented on issue #2633: GOBBLIN-759: Added 
feature to support DistCP to copy files that were …
URL: 
https://github.com/apache/incubator-gobblin/pull/2633#issuecomment-586633838
 
 
   # [Codecov](https://codecov.io/gh/apache/incubator-gobblin/pull/2633?src=pr&el=h1) Report
   > :exclamation: No coverage uploaded for pull request base (`master@bca2e1f`). [Click here to learn what that means](https://docs.codecov.io/docs/error-reference#section-missing-base-commit).
   > The diff coverage is `84.61%`.
   
   [![Impacted file tree graph](https://codecov.io/gh/apache/incubator-gobblin/pull/2633/graphs/tree.svg?width=650&token=4MgURJ0bGc&height=150&src=pr)](https://codecov.io/gh/apache/incubator-gobblin/pull/2633?src=pr&el=tree)
   
   ```diff
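   @@            Coverage Diff            @@
   ##             master    #2633   +/-   ##
   =========================================
     Coverage          ?   45.85%
     Complexity        ?     9161
   =========================================
     Files             ?     1932
     Lines             ?    72659
     Branches          ?     7998
   =========================================
     Hits              ?    33316
     Misses            ?    36302
     Partials          ?     3041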
   ```
   
   
   | [Impacted Files](https://codecov.io/gh/apache/incubator-gobblin/pull/2633?src=pr&el=tree) | Coverage Δ | Complexity Δ | |
   |---|---|---|---|
   | [.../copy/TimestampBasedCopyableGlobDatasetFinder.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2633/diff?src=pr&el=tree#diff-Z29iYmxpbi1kYXRhLW1hbmFnZW1lbnQvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2dvYmJsaW4vZGF0YS9tYW5hZ2VtZW50L2NvcHkvVGltZXN0YW1wQmFzZWRDb3B5YWJsZUdsb2JEYXRhc2V0RmluZGVyLmphdmE=) | `0% <0%> (ø)` | `0 <0> (?)` | |
   | [.../gobblin/data/management/dataset/DatasetUtils.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2633/diff?src=pr&el=tree#diff-Z29iYmxpbi1kYXRhLW1hbmFnZW1lbnQvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2dvYmJsaW4vZGF0YS9tYW5hZ2VtZW50L2RhdGFzZXQvRGF0YXNldFV0aWxzLmphdmE=) | `55.88% <100%> (ø)` | `6 <0> (?)` | |
   | [...agement/copy/ModifiedDateRangeBasedFileFilter.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2633/diff?src=pr&el=tree#diff-Z29iYmxpbi1kYXRhLW1hbmFnZW1lbnQvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2dvYmJsaW4vZGF0YS9tYW5hZ2VtZW50L2NvcHkvTW9kaWZpZWREYXRlUmFuZ2VCYXNlZEZpbGVGaWx0ZXIuamF2YQ==) | `75% <75%> (ø)` | `4 <4> (?)` | |
   | [...management/copy/TimestampBasedCopyableDataset.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2633/diff?src=pr&el=tree#diff-Z29iYmxpbi1kYXRhLW1hbmFnZW1lbnQvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2dvYmJsaW4vZGF0YS9tYW5hZ2VtZW50L2NvcHkvVGltZXN0YW1wQmFzZWRDb3B5YWJsZURhdGFzZXQuamF2YQ==) | `83.52% <88.23%> (ø)` | `11 <0> (?)` | |
   | [...data/management/copy/DateRangeBasedFileFilter.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2633/diff?src=pr&el=tree#diff-Z29iYmxpbi1kYXRhLW1hbmFnZW1lbnQvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2dvYmJsaW4vZGF0YS9tYW5hZ2VtZW50L2NvcHkvRGF0ZVJhbmdlQmFzZWRGaWxlRmlsdGVyLmphdmE=) | `89.28% <89.28%> (ø)` | `8 <8> (?)` | |
   | [...anagement/policy/SelectBetweenTimeBasedPolicy.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2633/diff?src=pr&el=tree#diff-Z29iYmxpbi1kYXRhLW1hbmFnZW1lbnQvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2dvYmJsaW4vZGF0YS9tYW5hZ2VtZW50L3BvbGljeS9TZWxlY3RCZXR3ZWVuVGltZUJhc2VkUG9saWN5LmphdmE=) | `93.93% <90.47%> (ø)` | `9 <4> (?)` | |
   
   --
   
   [Continue to review full report at Codecov](https://codecov.io/gh/apache/incubator-gobblin/pull/2633?src=pr&el=continue).
   > **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta)
   > `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
   > Powered by [Codecov](https://codecov.io/gh/apache/incubator-gobblin/pull/2633?src=pr&el=footer). Last update [bca2e1f...67344bb](https://codecov.io/gh/apache/incubator-gobblin/pull/2633?src=pr&el=lastupdated). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).
   
 



Issue Time Tracking
---

Worklog Id: (was: 387989)
Time Spent: 6h  (was: 5h 50m)


[jira] [Work logged] (GOBBLIN-759) DistCP files modified in last n days within a look back period

2020-02-15 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/GOBBLIN-759?focusedWorklogId=387982&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-387982
 ]

ASF GitHub Bot logged work on GOBBLIN-759:
--

Author: ASF GitHub Bot
Created on: 15/Feb/20 18:59
Start Date: 15/Feb/20 18:59
Worklog Time Spent: 10m 
  Work Description: amarnathkarthik commented on pull request #2633: 
GOBBLIN-759: Added feature to support DistCP to copy files that were …
URL: https://github.com/apache/incubator-gobblin/pull/2633#discussion_r379849411
 
 

 ##
 File path: 
gobblin-data-management/src/main/java/org/apache/gobblin/data/management/copy/ModifiedDateRangeBasedFileFilter.java
 ##
 @@ -0,0 +1,116 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.gobblin.data.management.copy;
+
+import java.util.Collection;
+import java.util.Iterator;
+import java.util.Properties;
+
+import lombok.extern.slf4j.Slf4j;
+
+import org.apache.hadoop.fs.FileSystem;
+import org.joda.time.DateTime;
+import org.joda.time.DateTimeZone;
+import org.joda.time.Period;
+import org.joda.time.format.PeriodFormatter;
+import org.joda.time.format.PeriodFormatterBuilder;
+
+import com.google.common.collect.ImmutableList;
+
+import org.apache.gobblin.configuration.ConfigurationKeys;
+
+
+/**
+ * A {@link CopyableFileFilter} that drops a {@link CopyableFile} if file modified time not between the loop back window
+ *  sourceFs
+ */
+@Slf4j
+public class ModifiedDateRangeBasedFileFilter implements CopyableFileFilter {
+
+  private final Properties props;
+  private Period minLookBackPeriod;
+  private Period maxLookBackPeriod;
+  private DateTime currentTime;
+  private DateTime minLookBackTime;
+  private DateTime maxLookBackTime;
+
+  public static final String CONFIGURATION_KEY_PREFIX = "gobblin.dataset.filter.";
+  public static final String MODIFIED_MIN_LOOK_BACK_TIME_KEY =
+      CONFIGURATION_KEY_PREFIX + "selection.modified.min.lookbackTime";
+  public static final String MODIFIED_MAX_LOOK_BACK_TIME_KEY =
+      CONFIGURATION_KEY_PREFIX + "selection.modified.max.lookbackTime";
+  public static final String DEFAULT_DATE_PATTERN_TIMEZONE = ConfigurationKeys.PST_TIMEZONE_NAME;
+  public static final String DATE_PATTERN_TIMEZONE_KEY = CONFIGURATION_KEY_PREFIX + "datetime.timezone";
+
+  public ModifiedDateRangeBasedFileFilter(Properties properties) {
+    this.props = properties;
+    PeriodFormatter periodFormatter =
+        new PeriodFormatterBuilder().appendDays().appendSuffix("d").appendHours().appendSuffix("h").toFormatter();
+    this.minLookBackPeriod = props.containsKey(MODIFIED_MIN_LOOK_BACK_TIME_KEY) ? periodFormatter.parsePeriod(
+        props.getProperty(MODIFIED_MIN_LOOK_BACK_TIME_KEY)) : new Period(DateTime.now().getMillis());
+    this.maxLookBackPeriod = props.containsKey(MODIFIED_MAX_LOOK_BACK_TIME_KEY) ? periodFormatter.parsePeriod(
+        props.getProperty(MODIFIED_MAX_LOOK_BACK_TIME_KEY)) : new Period(DateTime.now().minusDays(1).getMillis());
+    this.currentTime = properties.containsKey(DATE_PATTERN_TIMEZONE_KEY) ? DateTime.now(
+        DateTimeZone.forID(props.getProperty(DATE_PATTERN_TIMEZONE_KEY)))
+        : DateTime.now(DateTimeZone.forID(DEFAULT_DATE_PATTERN_TIMEZONE));
+    this.minLookBackTime = this.currentTime.minus(minLookBackPeriod);
+    this.maxLookBackTime = this.currentTime.minus(maxLookBackPeriod);
+  }
+
+  /**
+   * For every {@link CopyableFile} in copyableFiles checks if a {@link CopyableFile#getOrigin()#getPath()#getModificationTime()}
+   * + date between the min and max look back window on sourceFs {@inheritDoc}
+   *
+   * @see CopyableFileFilter#filter(FileSystem,
+   *  FileSystem, Collection)
+   */
+  @Override
+  public Collection<CopyableFile> filter(FileSystem sourceFs, FileSystem targetFs,
+      Collection<CopyableFile> copyableFiles) {
+    Iterator<CopyableFile> iterator = copyableFiles.iterator();
+
+    ImmutableList.Builder<CopyableFile> filtered = ImmutableList.builder();
+
+    while (iterator.hasNext()) {
+      CopyableFile file = iterator.next();
+      boolean fileWithInModWindow = isFileModifiedBtwLookBackPeriod(file.getOrigin().ge
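The constructor above reads two period strings with a Joda `PeriodFormatter` built from `appendDays().appendSuffix("d").appendHours().appendSuffix("h")`, so configured values look like `7d`, `12h` or `1d12h`. A rough java.time equivalent, sketched under assumptions: `LookBackConfig` is a hypothetical class, its regex only approximates Joda's parser, a day is treated as exactly 24 hours, and the fallback value is passed in by the caller rather than copied from the diff. Only the two key names come from the diff.

```java
import java.time.Duration;
import java.util.Properties;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical sketch (not Gobblin code) of reading the two look-back keys from the diff. */
final class LookBackConfig {
  static final String PREFIX = "gobblin.dataset.filter.";
  static final String MIN_KEY = PREFIX + "selection.modified.min.lookbackTime";
  static final String MAX_KEY = PREFIX + "selection.modified.max.lookbackTime";

  // Optional "<n>d" followed by optional "<n>h"; an empty value is rejected in parse().
  private static final Pattern PERIOD = Pattern.compile("(?:(\\d+)d)?(?:(\\d+)h)?");

  /** Parses "7d", "12h" or "1d12h" into a Duration, counting each day as exactly 24 hours. */
  static Duration parse(String value) {
    Matcher m = PERIOD.matcher(value.trim());
    if (!m.matches() || (m.group(1) == null && m.group(2) == null)) {
      throw new IllegalArgumentException("Expected e.g. 7d, 12h or 1d12h but got: " + value);
    }
    long days = m.group(1) == null ? 0 : Long.parseLong(m.group(1));
    long hours = m.group(2) == null ? 0 : Long.parseLong(m.group(2));
    return Duration.ofDays(days).plusHours(hours);
  }

  /** Reads a look-back key, returning the caller-supplied fallback when the key is absent. */
  static Duration read(Properties props, String key, Duration fallback) {
    String raw = props.getProperty(key);
    return raw == null ? fallback : parse(raw);
  }
}
```

Passing the fallback explicitly keeps the default window visible at the call site, for example `read(props, MIN_KEY, Duration.ZERO)` and `read(props, MAX_KEY, Duration.ofDays(1))`; those particular defaults are illustrative choices only.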

[jira] [Work logged] (GOBBLIN-759) DistCP files modified in last n days within a look back period

2020-02-15 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/GOBBLIN-759?focusedWorklogId=387984&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-387984
 ]

ASF GitHub Bot logged work on GOBBLIN-759:
--

Author: ASF GitHub Bot
Created on: 15/Feb/20 18:59
Start Date: 15/Feb/20 18:59
Worklog Time Spent: 10m 
  Work Description: amarnathkarthik commented on pull request #2633: 
GOBBLIN-759: Added feature to support DistCP to copy files that were …
URL: https://github.com/apache/incubator-gobblin/pull/2633#discussion_r379849417
 
 

 ##
 File path: 
gobblin-data-management/src/main/java/org/apache/gobblin/data/management/copy/ModifiedDateRangeBasedFileFilter.java
 ##
 @@ -0,0 +1,116 @@

[jira] [Work logged] (GOBBLIN-759) DistCP files modified in last n days within a look back period

2020-02-15 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/GOBBLIN-759?focusedWorklogId=387985&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-387985
 ]

ASF GitHub Bot logged work on GOBBLIN-759:
--

Author: ASF GitHub Bot
Created on: 15/Feb/20 18:59
Start Date: 15/Feb/20 18:59
Worklog Time Spent: 10m 
  Work Description: amarnathkarthik commented on pull request #2633: 
GOBBLIN-759: Added feature to support DistCP to copy files that were …
URL: https://github.com/apache/incubator-gobblin/pull/2633#discussion_r379849419
 
 

 ##
 File path: 
gobblin-data-management/src/main/java/org/apache/gobblin/data/management/copy/ModifiedDateRangeBasedFileFilter.java
 ##
 @@ -0,0 +1,116 @@

[jira] [Work logged] (GOBBLIN-759) DistCP files modified in last n days within a look back period

2020-02-15 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/GOBBLIN-759?focusedWorklogId=387983&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-387983
 ]

ASF GitHub Bot logged work on GOBBLIN-759:
--

Author: ASF GitHub Bot
Created on: 15/Feb/20 18:59
Start Date: 15/Feb/20 18:59
Worklog Time Spent: 10m 
  Work Description: amarnathkarthik commented on pull request #2633: 
GOBBLIN-759: Added feature to support DistCP to copy files that were …
URL: https://github.com/apache/incubator-gobblin/pull/2633#discussion_r379849413
 
 

 ##
 File path: 
gobblin-data-management/src/main/java/org/apache/gobblin/data/management/copy/ModifiedDateRangeBasedFileFilter.java
 ##
 @@ -0,0 +1,116 @@

[jira] [Work logged] (GOBBLIN-759) DistCP files modified in last n days within a look back period

2020-02-15 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/GOBBLIN-759?focusedWorklogId=387986&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-387986
 ]

ASF GitHub Bot logged work on GOBBLIN-759:
--

Author: ASF GitHub Bot
Created on: 15/Feb/20 18:59
Start Date: 15/Feb/20 18:59
Worklog Time Spent: 10m 
  Work Description: amarnathkarthik commented on pull request #2633: 
GOBBLIN-759: Added feature to support DistCP to copy files that were …
URL: https://github.com/apache/incubator-gobblin/pull/2633#discussion_r379849421
 
 

 ##
 File path: 
gobblin-data-management/src/main/java/org/apache/gobblin/data/management/copy/ModifiedDateRangeBasedFileFilter.java
 ##
 @@ -0,0 +1,116 @@
+ * A {@link CopyableFileFilter} that drops a {@link CopyableFile} if file modified time not between the loop back window
 
 Review comment:
   Fixed
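The filter's constructor computes `currentTime` in a configurable zone (`DATE_PATTERN_TIMEZONE_KEY`, defaulting to `ConfigurationKeys.PST_TIMEZONE_NAME`) and subtracts the parsed day/hour `Period`. Across a daylight-saving change, "N calendar days back" and "N x 24 hours back" differ by an hour. The sketch below shows this with java.time, assuming `America/Los_Angeles` as the Pacific zone (US DST began on 8 March 2020); `ZonedLookBack` is a hypothetical helper. Joda's `DateTime.minus(Period)` should apply a days field as calendar days in the DateTime's zone, similar to `minusDays` here.

```java
import java.time.Duration;
import java.time.ZoneId;
import java.time.ZonedDateTime;

/** Hypothetical helper contrasting two ways of computing "now minus N days" in a DST-observing zone. */
final class ZonedLookBack {
  /** Example zone; assumed to be what a Pacific ("PST") default resolves to. */
  static final ZoneId PACIFIC = ZoneId.of("America/Los_Angeles");

  /** Calendar days: keeps the local wall-clock time, so a day may span 23 or 25 real hours. */
  static ZonedDateTime calendarDaysBack(ZonedDateTime now, int days) {
    return now.minusDays(days);
  }

  /** Exact days: subtracts days * 24 hours on the time line, so the local time may shift by an hour. */
  static ZonedDateTime exactDaysBack(ZonedDateTime now, int days) {
    return now.minus(Duration.ofDays(days));
  }
}
```

For noon on 9 March 2020 in Los Angeles, two calendar days back is noon on 7 March, while 48 hours back is 11:00 on 7 March, one hour earlier. Either choice is defensible; it mainly matters that window bounds are computed consistently.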
 



Issue Time Tracking
---

Worklog Id: (was: 387986)
Time Spent: 5h 50m  (was: 5h 40m)



[jira] [Work logged] (GOBBLIN-759) DistCP files modified in last n days within a look back period

2020-02-15 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/GOBBLIN-759?focusedWorklogId=387980&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-387980
 ]

ASF GitHub Bot logged work on GOBBLIN-759:
--

Author: ASF GitHub Bot
Created on: 15/Feb/20 18:58
Start Date: 15/Feb/20 18:58
Worklog Time Spent: 10m 
  Work Description: amarnathkarthik commented on pull request #2633: 
GOBBLIN-759: Added feature to support DistCP to copy files that were …
URL: https://github.com/apache/incubator-gobblin/pull/2633#discussion_r379849362
 
 

 ##
 File path: 
gobblin-data-management/src/main/java/org/apache/gobblin/data/management/copy/TimestampBasedCopyableDataset.java
 ##
 @@ -120,9 +118,10 @@ public TimestampBasedCopyableDataset(FileSystem fs, Properties props, Path datas
 Collection copyableVersions = this.versionSelectionPolicy.listSelectedVersions(versions);
 ConcurrentLinkedQueue copyableFileList = new ConcurrentLinkedQueue<>();
 List> futures = Lists.newArrayList();
+//this.copyableFileFilter.filter(this.fs, targetFs, copyableFiles)
 
 Review comment:
   Fixed
 



Issue Time Tracking
---

Worklog Id: (was: 387980)
Time Spent: 4h 50m  (was: 4h 40m)



[jira] [Work logged] (GOBBLIN-759) DistCP files modified in last n days within a look back period

2020-02-15 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/GOBBLIN-759?focusedWorklogId=387981&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-387981
 ]

ASF GitHub Bot logged work on GOBBLIN-759:
--

Author: ASF GitHub Bot
Created on: 15/Feb/20 18:58
Start Date: 15/Feb/20 18:58
Worklog Time Spent: 10m 
  Work Description: amarnathkarthik commented on pull request #2633: 
GOBBLIN-759: Added feature to support DistCP to copy files that were …
URL: https://github.com/apache/incubator-gobblin/pull/2633#discussion_r379849405
 
 

 ##
 File path: 
gobblin-data-management/src/main/java/org/apache/gobblin/data/management/copy/ModifiedDateRangeBasedFileFilter.java
 ##
 @@ -0,0 +1,116 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.gobblin.data.management.copy;
+
+import java.util.Collection;
+import java.util.Iterator;
+import java.util.Properties;
+
+import lombok.extern.slf4j.Slf4j;
+
+import org.apache.hadoop.fs.FileSystem;
+import org.joda.time.DateTime;
+import org.joda.time.DateTimeZone;
+import org.joda.time.Period;
+import org.joda.time.format.PeriodFormatter;
+import org.joda.time.format.PeriodFormatterBuilder;
+
+import com.google.common.collect.ImmutableList;
+
+import org.apache.gobblin.configuration.ConfigurationKeys;
+
+
+/**
+ * A {@link CopyableFileFilter} that drops a {@link CopyableFile} if file 
modified time not between the loop back window
+ *  sourceFs
+ */
+@Slf4j
+public class ModifiedDateRangeBasedFileFilter implements CopyableFileFilter {
 
 Review comment:
   Agree. Created DataRangeFileFilter as you suggested.
 



Issue Time Tracking
---

Worklog Id: (was: 387981)
Time Spent: 5h  (was: 4h 50m)



[jira] [Work logged] (GOBBLIN-759) DistCP files modified in last n days within a look back period

2020-02-15 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/GOBBLIN-759?focusedWorklogId=387979&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-387979
 ]

ASF GitHub Bot logged work on GOBBLIN-759:
--

Author: ASF GitHub Bot
Created on: 15/Feb/20 18:58
Start Date: 15/Feb/20 18:58
Worklog Time Spent: 10m 
  Work Description: amarnathkarthik commented on pull request #2633: 
GOBBLIN-759: Added feature to support DistCP to copy files that were …
URL: https://github.com/apache/incubator-gobblin/pull/2633#discussion_r379849356
 
 

 ##
 File path: 
gobblin-data-management/src/main/java/org/apache/gobblin/data/management/copy/TimestampBasedCopyableDataset.java
 ##
 @@ -134,7 +133,11 @@ public TimestampBasedCopyableDataset(FileSystem fs, Properties props, Path datas
 } finally {
   ExecutorsUtils.shutdownExecutorService(executor, Optional.of(log));
 }
-return copyableFileList;
+
+ConcurrentLinkedQueue copyableFilesFilteredList = new ConcurrentLinkedQueue<>();
 
 Review comment:
   The existing contract returns a ConcurrentLinkedQueue, so I did not change the return type.
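A minimal sketch of how filtered results can still be returned as a `ConcurrentLinkedQueue`, so callers that rely on the existing return type are unaffected. It assumes a plain `java.util.function.Predicate` in place of the `CopyableFileFilter` used in the PR; `QueueFilterSketch` and `filterToQueue` are hypothetical names, not Gobblin APIs.

```java
import java.util.Collection;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.function.Predicate;
import java.util.stream.Collectors;

/** Hypothetical helper: filter a collection while keeping the ConcurrentLinkedQueue return type. */
public class QueueFilterSketch {

  /** Keeps only the items accepted by {@code keep} and collects them into a new ConcurrentLinkedQueue. */
  static <T> ConcurrentLinkedQueue<T> filterToQueue(Collection<T> items, Predicate<? super T> keep) {
    return items.stream()
        .filter(keep)
        .collect(Collectors.toCollection(ConcurrentLinkedQueue::new));
  }
}
```

For an ordered input such as a `ConcurrentLinkedQueue`, the sequential stream keeps the original element order in the result.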
 



Issue Time Tracking
---

Worklog Id: (was: 387979)
Time Spent: 4h 40m  (was: 4.5h)



[jira] [Work logged] (GOBBLIN-759) DistCP files modified in last n days within a look back period

2020-02-15 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/GOBBLIN-759?focusedWorklogId=387977&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-387977
 ]

ASF GitHub Bot logged work on GOBBLIN-759:
--

Author: ASF GitHub Bot
Created on: 15/Feb/20 18:56
Start Date: 15/Feb/20 18:56
Worklog Time Spent: 10m 
  Work Description: amarnathkarthik commented on pull request #2633: 
GOBBLIN-759: Added feature to support DistCP to copy files that were …
URL: https://github.com/apache/incubator-gobblin/pull/2633#discussion_r379849293
 
 

 ##
 File path: 
gobblin-data-management/src/main/java/org/apache/gobblin/data/management/policy/SelectBetweenTimeBasedPolicy.java
 ##
 @@ -94,17 +100,25 @@ public SelectBetweenTimeBasedPolicy(Optional minLookBackPeriod, Optional
   public boolean apply(TimestampedDatasetVersion version) {
 return version.getDateTime()
 .plus(SelectBetweenTimeBasedPolicy.this.maxLookBackPeriod.or(new Period(DateTime.now().getMillis()))))
-.isAfterNow()
-&& version.getDateTime().plus(SelectBetweenTimeBasedPolicy.this.minLookBackPeriod.or(new Period(0)))
-.isBeforeNow();
+.isAfterNow() && version.getDateTime()
+.plus(SelectBetweenTimeBasedPolicy.this.minLookBackPeriod.or(new Period(0)))
+.isBeforeNow();
   }
 };
   }
 
   protected static Period getLookBackPeriod(String lookbackTime) {
 
 Review comment:
   I'd prefer to keep this reformatting for better readability.
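The predicate in this hunk selects a version when `versionTime + maxLookBack` is still after now and `versionTime + minLookBack` is already before now, i.e. when the version's age lies strictly between the minimum and maximum look-back periods. A minimal sketch of that check, assuming `java.time` instead of Joda-Time and a hypothetical `LookBackWindow` class; `now` is passed in explicitly so the result is deterministic.

```java
import java.time.Duration;
import java.time.Instant;

/** Hypothetical sketch of the SelectBetweenTimeBasedPolicy window check using java.time. */
public class LookBackWindow {
  private final Duration minLookBack;
  private final Duration maxLookBack;

  public LookBackWindow(Duration minLookBack, Duration maxLookBack) {
    this.minLookBack = minLookBack;
    this.maxLookBack = maxLookBack;
  }

  /** True when versionTime + maxLookBack is after now and versionTime + minLookBack is before now. */
  public boolean isSelected(Instant versionTime, Instant now) {
    return versionTime.plus(maxLookBack).isAfter(now)
        && versionTime.plus(minLookBack).isBefore(now);
  }
}
```

With `Duration.ZERO` as the minimum, the second condition reduces to `versionTime.isBefore(now)`, which mirrors the `new Period(0)` fallback in the diff.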
 



Issue Time Tracking
---

Worklog Id: (was: 387977)
Time Spent: 4.5h  (was: 4h 20m)



[jira] [Work logged] (GOBBLIN-759) DistCP files modified in last n days within a look back period

2020-02-15 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/GOBBLIN-759?focusedWorklogId=387975&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-387975
 ]

ASF GitHub Bot logged work on GOBBLIN-759:
--

Author: ASF GitHub Bot
Created on: 15/Feb/20 18:56
Start Date: 15/Feb/20 18:56
Worklog Time Spent: 10m 
  Work Description: amarnathkarthik commented on pull request #2633: 
GOBBLIN-759: Added feature to support DistCP to copy files that were …
URL: https://github.com/apache/incubator-gobblin/pull/2633#discussion_r379849244
 
 

 ##
 File path: 
gobblin-data-management/src/main/java/org/apache/gobblin/data/management/copy/TimestampBasedCopyableDataset.java
 ##
 @@ -73,6 +71,7 @@
   private final VersionSelectionPolicy versionSelectionPolicy;
   private final ExecutorService executor;
   private final FileSystem srcFs;
+  private final CopyableFileFilter copyableFileFilter;
 
 Review comment:
   PathFilter interfaces do not provide an operation for merging filters. Also, by the time the date range filter runs, hidden files have already been removed from the list by the existing control flow.
 



Issue Time Tracking
---

Worklog Id: (was: 387975)
Time Spent: 4h 10m  (was: 4h)



[jira] [Work logged] (GOBBLIN-759) DistCP files modified in last n days within a look back period

2020-02-15 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/GOBBLIN-759?focusedWorklogId=387976&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-387976
 ]

ASF GitHub Bot logged work on GOBBLIN-759:
--

Author: ASF GitHub Bot
Created on: 15/Feb/20 18:56
Start Date: 15/Feb/20 18:56
Worklog Time Spent: 10m 
  Work Description: amarnathkarthik commented on pull request #2633: 
GOBBLIN-759: Added feature to support DistCP to copy files that were …
URL: https://github.com/apache/incubator-gobblin/pull/2633#discussion_r379849249
 
 

 ##
 File path: 
gobblin-data-management/src/test/java/org/apache/gobblin/data/management/copy/TimestampBasedCopyableDatasetTest.java
 ##
 @@ -91,12 +110,82 @@ public void testConfigOptions() {
 TimeBasedCopyPolicyForTest.class.getName());
   }
 
+  @Test
+  public void testCopyWithFilter() throws IOException {
+
+/** source setup **/
+Path srcRoot = new Path(this.testTempPath, "src/slt/eqp/daily");
+
+if (this.localFs.exists(srcRoot)) {
+  this.localFs.delete(srcRoot, true);
+}
+
+List dateTimeList = Lists.newArrayList();
+IntStream.range(0, 4)
+.forEach(
+i -> dateTimeList.add(new DateTime(DateTimeZone.forID(ConfigurationKeys.PST_TIMEZONE_NAME)).minusDays(i)));
+
+String datePattern = "/MM/dd";
+DateTimeFormatter formatter = DateTimeFormat.forPattern(datePattern);
+
+for (DateTime dt : dateTimeList) {
+  String srcVersionPathStr = formatter.print(dt);
+  Path srcVersionPath = new Path(srcRoot, srcVersionPathStr);
+  this.localFs.mkdirs(srcVersionPath);
+
+  Path srcfile = new Path(srcVersionPath, "file1.avro");
+  this.localFs.create(srcfile);
+}
+
+/** destination setup **/
+Path destRoot = new Path(this.testTempPath, "dest/slt/eqp");
 
 Review comment:
   Fixed
 



Issue Time Tracking
---

Worklog Id: (was: 387976)
Time Spent: 4h 20m  (was: 4h 10m)



[jira] [Work logged] (GOBBLIN-759) DistCP files modified in last n days within a look back period

2019-09-15 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/GOBBLIN-759?focusedWorklogId=312734&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-312734
 ]

ASF GitHub Bot logged work on GOBBLIN-759:
--

Author: ASF GitHub Bot
Created on: 15/Sep/19 22:47
Start Date: 15/Sep/19 22:47
Worklog Time Spent: 10m 
  Work Description: sv2000 commented on pull request #2633: GOBBLIN-759: 
Added feature to support DistCP to copy files that were …
URL: https://github.com/apache/incubator-gobblin/pull/2633#discussion_r324483538
 
 

 ##
 File path: 
gobblin-data-management/src/main/java/org/apache/gobblin/data/management/copy/ModifiedDateRangeBasedFileFilter.java
 ##
 @@ -0,0 +1,116 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.gobblin.data.management.copy;
+
+import java.util.Collection;
+import java.util.Iterator;
+import java.util.Properties;
+
+import lombok.extern.slf4j.Slf4j;
+
+import org.apache.hadoop.fs.FileSystem;
+import org.joda.time.DateTime;
+import org.joda.time.DateTimeZone;
+import org.joda.time.Period;
+import org.joda.time.format.PeriodFormatter;
+import org.joda.time.format.PeriodFormatterBuilder;
+
+import com.google.common.collect.ImmutableList;
+
+import org.apache.gobblin.configuration.ConfigurationKeys;
+
+
+/**
+ * A {@link CopyableFileFilter} that drops a {@link CopyableFile} if file 
modified time not between the loop back window
+ *  sourceFs
+ */
+@Slf4j
+public class ModifiedDateRangeBasedFileFilter implements CopyableFileFilter {
 
 Review comment:
   It looks like most of the logic in this class could be moved to a parent class that implements a "DateRangeFileFilter". ModTimeDateRangeFileFilter could then extend that class and supply the modification time used to filter files.
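A minimal sketch of the suggested split: a parent class owns the date-window logic and a subclass only supplies the timestamp to compare, here the modification time. `FileEntry`, `DateRangeFileFilter`, and `ModTimeDateRangeFileFilter` below are hypothetical stand-ins, not the classes in the PR; `java.time.Instant` replaces Joda-Time, and treating the window as inclusive at both ends is an assumption.

```java
import java.time.Instant;
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;

/** Hypothetical stand-in for a copyable file: a path plus a modification time in epoch millis. */
class FileEntry {
  final String path;
  final long modificationTimeMillis;

  FileEntry(String path, long modificationTimeMillis) {
    this.path = path;
    this.modificationTimeMillis = modificationTimeMillis;
  }
}

/** Parent class holding the shared window logic; subclasses decide which timestamp is tested. */
abstract class DateRangeFileFilter {
  private final Instant windowStart; // oldest accepted instant, e.g. now - maxLookBack
  private final Instant windowEnd;   // newest accepted instant, e.g. now - minLookBack

  DateRangeFileFilter(Instant windowStart, Instant windowEnd) {
    this.windowStart = windowStart;
    this.windowEnd = windowEnd;
  }

  /** The timestamp a concrete filter compares against the window. */
  protected abstract Instant timestampOf(FileEntry file);

  /** Keeps files whose timestamp lies in [windowStart, windowEnd]. */
  public Collection<FileEntry> filter(Collection<FileEntry> files) {
    List<FileEntry> kept = new ArrayList<>();
    for (FileEntry file : files) {
      Instant t = timestampOf(file);
      if (!t.isBefore(windowStart) && !t.isAfter(windowEnd)) {
        kept.add(file);
      }
    }
    return kept;
  }
}

/** Subclass that filters on modification time only. */
class ModTimeDateRangeFileFilter extends DateRangeFileFilter {
  ModTimeDateRangeFileFilter(Instant windowStart, Instant windowEnd) {
    super(windowStart, windowEnd);
  }

  @Override
  protected Instant timestampOf(FileEntry file) {
    return Instant.ofEpochMilli(file.modificationTimeMillis);
  }
}
```

Another subclass could return a timestamp parsed from the path instead, without duplicating the window logic.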
 



Issue Time Tracking
---

Worklog Id: (was: 312734)

> DistCP files modified in last n days within a look back period
> --
>
> Key: GOBBLIN-759
> URL: https://issues.apache.org/jira/browse/GOBBLIN-759
> Project: Apache Gobblin
>  Issue Type: Improvement
>Reporter: Karthik Amarnath
>Priority: Major
>  Time Spent: 3.5h
>  Remaining Estimate: 0h
>
> *Feature Request:*
>  # DistCP only the files modified in last n days within the look back window.
>  # DistCP will copy only the files modified even when the source file which 
> were NOT modified in last n days in the destination directory.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Work logged] (GOBBLIN-759) DistCP files modified in last n days within a look back period

2019-09-15 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/GOBBLIN-759?focusedWorklogId=312736&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-312736
 ]

ASF GitHub Bot logged work on GOBBLIN-759:
--

Author: ASF GitHub Bot
Created on: 15/Sep/19 22:47
Start Date: 15/Sep/19 22:47
Worklog Time Spent: 10m 
  Work Description: sv2000 commented on pull request #2633: GOBBLIN-759: 
Added feature to support DistCP to copy files that were …
URL: https://github.com/apache/incubator-gobblin/pull/2633#discussion_r324484636
 
 

 ##
 File path: 
gobblin-data-management/src/main/java/org/apache/gobblin/data/management/copy/TimestampBasedCopyableDataset.java
 ##
 @@ -73,6 +71,7 @@
   private final VersionSelectionPolicy versionSelectionPolicy;
   private final ExecutorService executor;
   private final FileSystem srcFs;
+  private final CopyableFileFilter copyableFileFilter;
 
 Review comment:
   This class already has a method, copyableFileFilter(), that returns a HiddenFilter. You can use AndPathFilter to merge it with the filter held in the member variable copyableFileFilter.
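To illustrate the AND-combination being suggested, here is a sketch that uses `java.util.function.Predicate<String>` over path strings as a stand-in for Hadoop's `PathFilter` (whose single method is `accept(Path)`). It does not use Gobblin's `AndPathFilter` or `HiddenFilter`; the hidden-file rule (names starting with `.` or `_`) and every name below are assumptions for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

/** Hypothetical sketch: combine path filters so a path is kept only if every filter accepts it. */
public class CombinedPathFilterSketch {

  /** Assumed hidden-file rule: reject file names starting with "." or "_". */
  static final Predicate<String> NOT_HIDDEN = path -> {
    String name = path.substring(path.lastIndexOf('/') + 1);
    return !name.startsWith(".") && !name.startsWith("_");
  };

  /** Logical AND of all given filters, evaluated left to right with short-circuiting. */
  @SafeVarargs
  static Predicate<String> and(Predicate<String>... filters) {
    Predicate<String> combined = p -> true;
    for (Predicate<String> f : filters) {
      combined = combined.and(f);
    }
    return combined;
  }

  /** Returns the paths accepted by the filter, in input order. */
  static List<String> select(List<String> paths, Predicate<String> filter) {
    List<String> kept = new ArrayList<>();
    for (String p : paths) {
      if (filter.test(p)) {
        kept.add(p);
      }
    }
    return kept;
  }
}
```

Because the hidden-file check comes first, the second filter never sees hidden paths, which matches the ordering described in the reply to this comment.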
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 312736)

> DistCP files modified in last n days within a look back period
> --
>
> Key: GOBBLIN-759
> URL: https://issues.apache.org/jira/browse/GOBBLIN-759
> Project: Apache Gobblin
>  Issue Type: Improvement
>Reporter: Karthik Amarnath
>Priority: Major
>  Time Spent: 3h 40m
>  Remaining Estimate: 0h
>
> *Feature Request:*
>  # DistCP only the files modified in last n days within the look back window.
>  # DistCP will copy only the files modified even when the source file which 
> were NOT modified in last n days in the destination directory.





[jira] [Work logged] (GOBBLIN-759) DistCP files modified in last n days within a look back period

2019-09-15 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/GOBBLIN-759?focusedWorklogId=312732&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-312732
 ]

ASF GitHub Bot logged work on GOBBLIN-759:
--

Author: ASF GitHub Bot
Created on: 15/Sep/19 22:47
Start Date: 15/Sep/19 22:47
Worklog Time Spent: 10m 
  Work Description: sv2000 commented on pull request #2633: GOBBLIN-759: 
Added feature to support DistCP to copy files that were …
URL: https://github.com/apache/incubator-gobblin/pull/2633#discussion_r324483057
 
 

 ##
 File path: 
gobblin-data-management/src/main/java/org/apache/gobblin/data/management/copy/ModifiedDateRangeBasedFileFilter.java
 ##
 @@ -0,0 +1,116 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.gobblin.data.management.copy;
+
+import java.util.Collection;
+import java.util.Iterator;
+import java.util.Properties;
+
+import lombok.extern.slf4j.Slf4j;
+
+import org.apache.hadoop.fs.FileSystem;
+import org.joda.time.DateTime;
+import org.joda.time.DateTimeZone;
+import org.joda.time.Period;
+import org.joda.time.format.PeriodFormatter;
+import org.joda.time.format.PeriodFormatterBuilder;
+
+import com.google.common.collect.ImmutableList;
+
+import org.apache.gobblin.configuration.ConfigurationKeys;
+
+
+/**
+ * A {@link CopyableFileFilter} that drops a {@link CopyableFile} if file 
modified time not between the loop back window
 
 Review comment:
   "if file modified time..." -> "if file modification time is not within the 
lookback window"?
 



Issue Time Tracking
---

Worklog Id: (was: 312732)
Time Spent: 3h 20m  (was: 3h 10m)



[jira] [Work logged] (GOBBLIN-759) DistCP files modified in last n days within a look back period

2019-09-15 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/GOBBLIN-759?focusedWorklogId=312742&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-312742
 ]

ASF GitHub Bot logged work on GOBBLIN-759:
--

Author: ASF GitHub Bot
Created on: 15/Sep/19 22:47
Start Date: 15/Sep/19 22:47
Worklog Time Spent: 10m 
  Work Description: sv2000 commented on pull request #2633: GOBBLIN-759: 
Added feature to support DistCP to copy files that were …
URL: https://github.com/apache/incubator-gobblin/pull/2633#discussion_r324483235
 
 

 ##
 File path: 
gobblin-data-management/src/main/java/org/apache/gobblin/data/management/copy/ModifiedDateRangeBasedFileFilter.java
 ##
 @@ -0,0 +1,116 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.gobblin.data.management.copy;
+
+import java.util.Collection;
+import java.util.Iterator;
+import java.util.Properties;
+
+import lombok.extern.slf4j.Slf4j;
+
+import org.apache.hadoop.fs.FileSystem;
+import org.joda.time.DateTime;
+import org.joda.time.DateTimeZone;
+import org.joda.time.Period;
+import org.joda.time.format.PeriodFormatter;
+import org.joda.time.format.PeriodFormatterBuilder;
+
+import com.google.common.collect.ImmutableList;
+
+import org.apache.gobblin.configuration.ConfigurationKeys;
+
+
+/**
+ * A {@link CopyableFileFilter} that drops a {@link CopyableFile} if file 
modified time not between the loop back window
+ *  sourceFs
+ */
+@Slf4j
+public class ModifiedDateRangeBasedFileFilter implements CopyableFileFilter {
+
+  private final Properties props;
+  private Period minLookBackPeriod;
+  private Period maxLookBackPeriod;
+  private DateTime currentTime;
+  private DateTime minLookBackTime;
+  private DateTime maxLookBackTime;
+
+  public static final String CONFIGURATION_KEY_PREFIX = "gobblin.dataset.filter.";
+  public static final String MODIFIED_MIN_LOOK_BACK_TIME_KEY =
+  CONFIGURATION_KEY_PREFIX + "selection.modified.min.lookbackTime";
+  public static final String MODIFIED_MAX_LOOK_BACK_TIME_KEY =
+  CONFIGURATION_KEY_PREFIX + "selection.modified.max.lookbackTime";
+  public static final String DEFAULT_DATE_PATTERN_TIMEZONE = ConfigurationKeys.PST_TIMEZONE_NAME;
+  public static final String DATE_PATTERN_TIMEZONE_KEY = CONFIGURATION_KEY_PREFIX + "datetime.timezone";
+
+  public ModifiedDateRangeBasedFileFilter(Properties properties) {
+this.props = properties;
+PeriodFormatter periodFormatter =
+new PeriodFormatterBuilder().appendDays().appendSuffix("d").appendHours().appendSuffix("h").toFormatter();
+this.minLookBackPeriod = props.containsKey(MODIFIED_MIN_LOOK_BACK_TIME_KEY) ? periodFormatter.parsePeriod(
+props.getProperty(MODIFIED_MIN_LOOK_BACK_TIME_KEY)) : new Period(DateTime.now().getMillis());
+this.maxLookBackPeriod = props.containsKey(MODIFIED_MAX_LOOK_BACK_TIME_KEY) ? periodFormatter.parsePeriod(
+props.getProperty(MODIFIED_MAX_LOOK_BACK_TIME_KEY)) : new Period(DateTime.now().minusDays(1).getMillis());
+this.currentTime = properties.containsKey(DATE_PATTERN_TIMEZONE_KEY) ? DateTime.now(
+DateTimeZone.forID(props.getProperty(DATE_PATTERN_TIMEZONE_KEY)))
+: DateTime.now(DateTimeZone.forID(DEFAULT_DATE_PATTERN_TIMEZONE));
+this.minLookBackTime = this.currentTime.minus(minLookBackPeriod);
+this.maxLookBackTime = this.currentTime.minus(maxLookBackPeriod);
+  }
+
+  /**
+   * For every {@link CopyableFile} in copyableFiles checks if a {@link CopyableFile#getOrigin()#getPath()#getModificationTime()}
+   * + date between the min and max look back window on sourceFs {@inheritDoc}
+   *
+   * @see CopyableFileFilter#filter(FileSystem,
+   *  FileSystem, Collection)
+   */
+  @Override
+  public Collection<CopyableFile> filter(FileSystem sourceFs, FileSystem targetFs,
+  Collection<CopyableFile> copyableFiles) {
+Iterator<CopyableFile> iterator = copyableFiles.iterator();
+
+ImmutableList.Builder<CopyableFile> filtered = ImmutableList.builder();
+
+while (iterator.hasNext()) {
+  CopyableFile file = iterator.next();
+  boolean fileWithInModWindow = isFileModifiedBtwLookBackPeriod(file.getOrigin().getModifica

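The constructor in this hunk builds its Joda-Time `PeriodFormatter` from a days field with suffix `d` followed by an hours field with suffix `h`, so look-back values take a form like `2d12h`, and each window bound is the current time minus a look-back period. A dependency-free sketch of that idea using `java.time`; the regex only approximates the Joda formatter, `LookBackConfigSketch` and its methods are hypothetical, and it handles only explicit values, applying no defaults.

```java
import java.time.Duration;
import java.time.ZonedDateTime;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical sketch: parse "<days>d<hours>h" look-back values and derive a window bound. */
public class LookBackConfigSketch {
  // Optional "<days>d" followed by optional "<hours>h", e.g. "2d", "12h", "2d12h".
  private static final Pattern DAYS_HOURS = Pattern.compile("^(?:(\\d+)d)?(?:(\\d+)h)?$");

  /** Parses a value such as "2d12h" into a Duration; rejects empty or malformed input. */
  static Duration parseLookBack(String value) {
    String v = value.trim();
    Matcher m = DAYS_HOURS.matcher(v);
    if (v.isEmpty() || !m.matches()) {
      throw new IllegalArgumentException("Expected a value like 2d12h, got: " + value);
    }
    long days = m.group(1) == null ? 0 : Long.parseLong(m.group(1));
    long hours = m.group(2) == null ? 0 : Long.parseLong(m.group(2));
    return Duration.ofDays(days).plusHours(hours);
  }

  /** Window bound = current time (in the caller's zone) minus the look-back duration. */
  static ZonedDateTime lookBackTime(ZonedDateTime now, String value) {
    return now.minus(parseLookBack(value));
  }
}
```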
[jira] [Work logged] (GOBBLIN-759) DistCP files modified in last n days within a look back period

2019-09-15 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/GOBBLIN-759?focusedWorklogId=312738&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-312738
 ]

ASF GitHub Bot logged work on GOBBLIN-759:
--

Author: ASF GitHub Bot
Created on: 15/Sep/19 22:47
Start Date: 15/Sep/19 22:47
Worklog Time Spent: 10m 
  Work Description: sv2000 commented on pull request #2633: GOBBLIN-759: 
Added feature to support DistCP to copy files that were …
URL: https://github.com/apache/incubator-gobblin/pull/2633#discussion_r324484386
 
 

 ##
 File path: 
gobblin-data-management/src/test/java/org/apache/gobblin/data/management/copy/TimestampBasedCopyableDatasetTest.java
 ##
 @@ -91,12 +110,82 @@ public void testConfigOptions() {
 TimeBasedCopyPolicyForTest.class.getName());
   }
 
+  @Test
+  public void testCopyWithFilter() throws IOException {
+
+/** source setup **/
+Path srcRoot = new Path(this.testTempPath, "src/slt/eqp/daily");
+
+if (this.localFs.exists(srcRoot)) {
+  this.localFs.delete(srcRoot, true);
+}
+
+List dateTimeList = Lists.newArrayList();
+IntStream.range(0, 4)
+.forEach(
+i -> dateTimeList.add(new DateTime(DateTimeZone.forID(ConfigurationKeys.PST_TIMEZONE_NAME)).minusDays(i)));
+
+String datePattern = "/MM/dd";
+DateTimeFormatter formatter = DateTimeFormat.forPattern(datePattern);
+
+for (DateTime dt : dateTimeList) {
+  String srcVersionPathStr = formatter.print(dt);
+  Path srcVersionPath = new Path(srcRoot, srcVersionPathStr);
+  this.localFs.mkdirs(srcVersionPath);
+
+  Path srcfile = new Path(srcVersionPath, "file1.avro");
+  this.localFs.create(srcfile);
+}
+
+/** destination setup **/
+Path destRoot = new Path(this.testTempPath, "dest/slt/eqp");
 
 Review comment:
   Change "dest/slt/eqp" pathname to some other dummy path.
 



Issue Time Tracking
---

Worklog Id: (was: 312738)

> DistCP files modified in last n days within a look back period
> --
>
> Key: GOBBLIN-759
> URL: https://issues.apache.org/jira/browse/GOBBLIN-759
> Project: Apache Gobblin
>  Issue Type: Improvement
>Reporter: Karthik Amarnath
>Priority: Major
>  Time Spent: 3h 40m
>  Remaining Estimate: 0h
>
> *Feature Request:*
>  # DistCP only the files modified in last n days within the look back window.
>  # DistCP will copy only the files modified even when the source file which 
> were NOT modified in last n days in the destination directory.





[jira] [Work logged] (GOBBLIN-759) DistCP files modified in last n days within a look back period

2019-09-15 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/GOBBLIN-759?focusedWorklogId=312737&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-312737
 ]

ASF GitHub Bot logged work on GOBBLIN-759:
--

Author: ASF GitHub Bot
Created on: 15/Sep/19 22:47
Start Date: 15/Sep/19 22:47
Worklog Time Spent: 10m 
  Work Description: sv2000 commented on pull request #2633: GOBBLIN-759: 
Added feature to support DistCP to copy files that were …
URL: https://github.com/apache/incubator-gobblin/pull/2633#discussion_r324483226
 
 

 ##
 File path: 
gobblin-data-management/src/main/java/org/apache/gobblin/data/management/copy/ModifiedDateRangeBasedFileFilter.java
 ##
 @@ -0,0 +1,116 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.gobblin.data.management.copy;
+
+import java.util.Collection;
+import java.util.Iterator;
+import java.util.Properties;
+
+import lombok.extern.slf4j.Slf4j;
+
+import org.apache.hadoop.fs.FileSystem;
+import org.joda.time.DateTime;
+import org.joda.time.DateTimeZone;
+import org.joda.time.Period;
+import org.joda.time.format.PeriodFormatter;
+import org.joda.time.format.PeriodFormatterBuilder;
+
+import com.google.common.collect.ImmutableList;
+
+import org.apache.gobblin.configuration.ConfigurationKeys;
+
+
+/**
+ * A {@link CopyableFileFilter} that drops a {@link CopyableFile} if file modified time not between the loop back window
+ *  sourceFs
+ */
+@Slf4j
+public class ModifiedDateRangeBasedFileFilter implements CopyableFileFilter {
+
+  private final Properties props;
+  private Period minLookBackPeriod;
+  private Period maxLookBackPeriod;
+  private DateTime currentTime;
+  private DateTime minLookBackTime;
+  private DateTime maxLookBackTime;
+
+  public static final String CONFIGURATION_KEY_PREFIX = "gobblin.dataset.filter.";
+  public static final String MODIFIED_MIN_LOOK_BACK_TIME_KEY =
+      CONFIGURATION_KEY_PREFIX + "selection.modified.min.lookbackTime";
+  public static final String MODIFIED_MAX_LOOK_BACK_TIME_KEY =
+      CONFIGURATION_KEY_PREFIX + "selection.modified.max.lookbackTime";
+  public static final String DEFAULT_DATE_PATTERN_TIMEZONE = ConfigurationKeys.PST_TIMEZONE_NAME;
+  public static final String DATE_PATTERN_TIMEZONE_KEY = CONFIGURATION_KEY_PREFIX + "datetime.timezone";
+
+  public ModifiedDateRangeBasedFileFilter(Properties properties) {
+    this.props = properties;
+    PeriodFormatter periodFormatter =
+        new PeriodFormatterBuilder().appendDays().appendSuffix("d").appendHours().appendSuffix("h").toFormatter();
+    this.minLookBackPeriod = props.containsKey(MODIFIED_MIN_LOOK_BACK_TIME_KEY) ? periodFormatter.parsePeriod(
+        props.getProperty(MODIFIED_MIN_LOOK_BACK_TIME_KEY)) : new Period(DateTime.now().getMillis());
+    this.maxLookBackPeriod = props.containsKey(MODIFIED_MAX_LOOK_BACK_TIME_KEY) ? periodFormatter.parsePeriod(
+        props.getProperty(MODIFIED_MAX_LOOK_BACK_TIME_KEY)) : new Period(DateTime.now().minusDays(1).getMillis());
+    this.currentTime = properties.containsKey(DATE_PATTERN_TIMEZONE_KEY) ? DateTime.now(
+        DateTimeZone.forID(props.getProperty(DATE_PATTERN_TIMEZONE_KEY)))
+        : DateTime.now(DateTimeZone.forID(DEFAULT_DATE_PATTERN_TIMEZONE));
+    this.minLookBackTime = this.currentTime.minus(minLookBackPeriod);
+    this.maxLookBackTime = this.currentTime.minus(maxLookBackPeriod);
+  }
+
+  /**
+   * For every {@link CopyableFile} in copyableFiles checks if a {@link CopyableFile#getOrigin()#getPath()#getModificationTime()}
+   * + date between the min and max look back window on sourceFs {@inheritDoc}
+   *
+   * @see CopyableFileFilter#filter(FileSystem,
+   *  FileSystem, Collection)
+   */
+  @Override
+  public Collection<CopyableFile> filter(FileSystem sourceFs, FileSystem targetFs,
+      Collection<CopyableFile> copyableFiles) {
+    Iterator<CopyableFile> iterator = copyableFiles.iterator();
+
+    ImmutableList.Builder<CopyableFile> filtered = ImmutableList.builder();
+
+    while (iterator.hasNext()) {
+      CopyableFile file = iterator.next();
+      boolean fileWithInModWindow = isFileModifiedBtwLookBackPeriod(file.getOrigin().getModifica

[jira] [Work logged] (GOBBLIN-759) DistCP files modified in last n days within a look back period

2019-09-15 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/GOBBLIN-759?focusedWorklogId=312735&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-312735
 ]

ASF GitHub Bot logged work on GOBBLIN-759:
--

Author: ASF GitHub Bot
Created on: 15/Sep/19 22:47
Start Date: 15/Sep/19 22:47
Worklog Time Spent: 10m 
  Work Description: sv2000 commented on pull request #2633: GOBBLIN-759: 
Added feature to support DistCP to copy files that were …
URL: https://github.com/apache/incubator-gobblin/pull/2633#discussion_r324483733
 
 

 ##
 File path: 
gobblin-data-management/src/main/java/org/apache/gobblin/data/management/copy/TimestampBasedCopyableDataset.java
 ##
 @@ -134,7 +133,11 @@ public TimestampBasedCopyableDataset(FileSystem fs, Properties props, Path datas
 } finally {
   ExecutorsUtils.shutdownExecutorService(executor, Optional.of(log));
 }
-return copyableFileList;
+
+ConcurrentLinkedQueue<CopyableFile> copyableFilesFilteredList = new ConcurrentLinkedQueue<>();
 
 Review comment:
   Do we need ConcurrentLinkedQueue? Seems like a List should suffice?
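Here is the reasoning behind the question. The thread-safe queue is only needed while listing tasks are still writing. After every task has finished, the filtering step runs on a single thread. The sketch below shows that split and assumes the listing work goes through an `ExecutorService`, as the executor shutdown in the hunk above suggests. `FilterAfterJoin`, `listThenFilter` and the string paths are hypothetical stand-ins for the dataset's `CopyableFile` handling.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.function.Predicate;
import java.util.stream.Collectors;

/** Hypothetical sketch: concurrent collection phase, then a single-threaded filter phase. */
public final class FilterAfterJoin {

  private FilterAfterJoin() {}

  public static List<String> listThenFilter(List<String> paths, Predicate<String> keep) {
    // Worker threads add concurrently, so this shared collection must be thread-safe.
    Queue<String> listed = new ConcurrentLinkedQueue<>();
    ExecutorService executor = Executors.newFixedThreadPool(4);
    try {
      for (String path : paths) {
        executor.submit(() -> {
          listed.add(path); // stands in for the per-version file listing
        });
      }
    } finally {
      executor.shutdown();
    }
    try {
      if (!executor.awaitTermination(1, TimeUnit.MINUTES)) {
        throw new IllegalStateException("listing tasks did not finish in time");
      }
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException("interrupted while waiting for listing tasks", e);
    }
    // Every writer has finished, so filtering runs on one thread and a plain List is enough.
    return listed.stream().filter(keep).collect(Collectors.toList());
  }

  public static void main(String[] args) {
    List<String> kept = listThenFilter(Arrays.asList("a.avro", "b.txt", "c.avro"), p -> p.endsWith(".avro"));
    System.out.println(kept.size()); // 2 (element order may vary)
  }
}
```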
 



Issue Time Tracking
---

Worklog Id: (was: 312735)
Time Spent: 3h 40m  (was: 3.5h)



[jira] [Work logged] (GOBBLIN-759) DistCP files modified in last n days within a look back period

2019-09-15 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/GOBBLIN-759?focusedWorklogId=312741&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-312741
 ]

ASF GitHub Bot logged work on GOBBLIN-759:
--

Author: ASF GitHub Bot
Created on: 15/Sep/19 22:47
Start Date: 15/Sep/19 22:47
Worklog Time Spent: 10m 
  Work Description: sv2000 commented on pull request #2633: GOBBLIN-759: 
Added feature to support DistCP to copy files that were …
URL: https://github.com/apache/incubator-gobblin/pull/2633#discussion_r324483606
 
 

 ##
 File path: 
gobblin-data-management/src/main/java/org/apache/gobblin/data/management/copy/TimestampBasedCopyableDataset.java
 ##
 @@ -120,9 +118,10 @@ public TimestampBasedCopyableDataset(FileSystem fs, Properties props, Path datas
 Collection<TimestampedDatasetVersion> copyableVersions = this.versionSelectionPolicy.listSelectedVersions(versions);
 ConcurrentLinkedQueue<CopyableFile> copyableFileList = new ConcurrentLinkedQueue<>();
 List<Future<?>> futures = Lists.newArrayList();
+//this.copyableFileFilter.filter(this.fs, targetFs, copyableFiles)
 
 Review comment:
   Remove this comment.
 



Issue Time Tracking
---

Worklog Id: (was: 312741)
Time Spent: 3h 50m  (was: 3h 40m)



[jira] [Work logged] (GOBBLIN-759) DistCP files modified in last n days within a look back period

2019-09-15 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/GOBBLIN-759?focusedWorklogId=312740&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-312740
 ]

ASF GitHub Bot logged work on GOBBLIN-759:
--

Author: ASF GitHub Bot
Created on: 15/Sep/19 22:47
Start Date: 15/Sep/19 22:47
Worklog Time Spent: 10m 
  Work Description: sv2000 commented on pull request #2633: GOBBLIN-759: 
Added feature to support DistCP to copy files that were …
URL: https://github.com/apache/incubator-gobblin/pull/2633#discussion_r324483336
 
 

 ##
 File path: 
gobblin-data-management/src/main/java/org/apache/gobblin/data/management/copy/ModifiedDateRangeBasedFileFilter.java
 ##

[jira] [Work logged] (GOBBLIN-759) DistCP files modified in last n days within a look back period

2019-09-15 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/GOBBLIN-759?focusedWorklogId=312743&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-312743
 ]

ASF GitHub Bot logged work on GOBBLIN-759:
--

Author: ASF GitHub Bot
Created on: 15/Sep/19 22:47
Start Date: 15/Sep/19 22:47
Worklog Time Spent: 10m 
  Work Description: sv2000 commented on pull request #2633: GOBBLIN-759: 
Added feature to support DistCP to copy files that were …
URL: https://github.com/apache/incubator-gobblin/pull/2633#discussion_r324484757
 
 

 ##
 File path: 
gobblin-data-management/src/main/java/org/apache/gobblin/data/management/copy/TimestampBasedCopyableDataset.java
 ##
 @@ -134,7 +133,11 @@ public TimestampBasedCopyableDataset(FileSystem fs, Properties props, Path datas
 } finally {
   ExecutorsUtils.shutdownExecutorService(executor, Optional.of(log));
 }
-return copyableFileList;
+
+ConcurrentLinkedQueue<CopyableFile> copyableFilesFilteredList = new ConcurrentLinkedQueue<>();
 
 Review comment:
   Also, see the earlier comment about returning a merged path filter from the TimestampBasedCopyableDataset#copyFileFilter() method. That way, you can remove this filtering logic at the end.
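The reviewer suggests that the dataset return one merged filter instead of filtering a second time at the end. The sketch below shows one way such a merge could work. It assumes a simplified, hypothetical `FileFilter` interface. Gobblin's `CopyableFileFilter#filter` also receives the source and target `FileSystem`s, and this sketch leaves them out. `MergedFilterSketch` and `merge` are illustrative names only.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.List;
import java.util.stream.Collectors;

public final class MergedFilterSketch {

  private MergedFilterSketch() {}

  /** Hypothetical, simplified stand-in for a copyable-file filter (no FileSystem arguments). */
  public interface FileFilter<T> {
    Collection<T> filter(Collection<T> files);
  }

  /** Applies each delegate in order, feeding the output of one into the next. */
  public static <T> FileFilter<T> merge(List<FileFilter<T>> delegates) {
    List<FileFilter<T>> chain = new ArrayList<>(delegates); // defensive copy
    return files -> {
      Collection<T> current = files;
      for (FileFilter<T> delegate : chain) {
        current = delegate.filter(current);
      }
      return current;
    };
  }

  public static void main(String[] args) {
    FileFilter<String> avroOnly =
        files -> files.stream().filter(f -> f.endsWith(".avro")).collect(Collectors.toList());
    FileFilter<String> noTmp =
        files -> files.stream().filter(f -> !f.contains("_tmp")).collect(Collectors.toList());
    FileFilter<String> merged = merge(Arrays.asList(avroOnly, noTmp));
    System.out.println(merged.filter(Arrays.asList("a.avro", "b.txt", "c_tmp.avro"))); // [a.avro]
  }
}
```

With the filters merged this way, the dataset can apply a single filter while it builds the copy list.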
 



Issue Time Tracking
---

Worklog Id: (was: 312743)
Time Spent: 4h  (was: 3h 50m)



[jira] [Work logged] (GOBBLIN-759) DistCP files modified in last n days within a look back period

2019-09-15 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/GOBBLIN-759?focusedWorklogId=312739&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-312739
 ]

ASF GitHub Bot logged work on GOBBLIN-759:
--

Author: ASF GitHub Bot
Created on: 15/Sep/19 22:47
Start Date: 15/Sep/19 22:47
Worklog Time Spent: 10m 
  Work Description: sv2000 commented on pull request #2633: GOBBLIN-759: 
Added feature to support DistCP to copy files that were …
URL: https://github.com/apache/incubator-gobblin/pull/2633#discussion_r324484290
 
 

 ##
 File path: 
gobblin-data-management/src/main/java/org/apache/gobblin/data/management/policy/SelectBetweenTimeBasedPolicy.java
 ##
 @@ -94,17 +100,25 @@ public SelectBetweenTimeBasedPolicy(Optional<Period> minLookBackPeriod, Optional
   public boolean apply(TimestampedDatasetVersion version) {
 return version.getDateTime()
 .plus(SelectBetweenTimeBasedPolicy.this.maxLookBackPeriod.or(new Period(DateTime.now().getMillis(
-.isAfterNow()
-&& version.getDateTime().plus(SelectBetweenTimeBasedPolicy.this.minLookBackPeriod.or(new Period(0)))
-.isBeforeNow();
+.isAfterNow() && version.getDateTime()
+.plus(SelectBetweenTimeBasedPolicy.this.minLookBackPeriod.or(new Period(0)))
+.isBeforeNow();
   }
 };
   }
 
   protected static Period getLookBackPeriod(String lookbackTime) {
 
 Review comment:
   Looks like this is just reformatting. Unless there is a reason to reformat, 
leave it as is. 
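The predicate in this hunk keeps a version only if the version is both younger than the max look-back and older than the min look-back. The sketch below expresses the same check with `java.time` and makes two assumptions. First, `now` is passed in so the check can be tested. Second, an absent max look-back counts as unbounded directly, instead of going through the `new Period(DateTime.now().getMillis())` fallback. `BetweenPolicySketch` and `isSelected` are hypothetical names.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.Optional;

public final class BetweenPolicySketch {

  private BetweenPolicySketch() {}

  public static boolean isSelected(Instant versionTime, Instant now,
      Optional<Duration> minLookBack, Optional<Duration> maxLookBack) {
    // Absent max look-back: no age limit, so any past version is young enough.
    boolean youngEnough = !maxLookBack.isPresent()
        || versionTime.plus(maxLookBack.get()).isAfter(now);
    // Absent min look-back defaults to zero, mirroring or(new Period(0)) with isBeforeNow().
    boolean oldEnough = versionTime.plus(minLookBack.orElse(Duration.ZERO)).isBefore(now);
    return youngEnough && oldEnough;
  }

  public static void main(String[] args) {
    Instant now = Instant.parse("2019-09-15T00:00:00Z");
    Optional<Duration> min = Optional.of(Duration.ofDays(1));
    Optional<Duration> max = Optional.of(Duration.ofDays(3));
    System.out.println(isSelected(now.minus(Duration.ofDays(2)), now, min, max));   // true
    System.out.println(isSelected(now.minus(Duration.ofHours(12)), now, min, max)); // false: too recent
    System.out.println(isSelected(now.minus(Duration.ofDays(4)), now, min, max));   // false: too old
  }
}
```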
 



Issue Time Tracking
---

Worklog Id: (was: 312739)



[jira] [Work logged] (GOBBLIN-759) DistCP files modified in last n days within a look back period

2019-09-15 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/GOBBLIN-759?focusedWorklogId=312733&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-312733
 ]

ASF GitHub Bot logged work on GOBBLIN-759:
--

Author: ASF GitHub Bot
Created on: 15/Sep/19 22:47
Start Date: 15/Sep/19 22:47
Worklog Time Spent: 10m 
  Work Description: sv2000 commented on pull request #2633: GOBBLIN-759: 
Added feature to support DistCP to copy files that were …
URL: https://github.com/apache/incubator-gobblin/pull/2633#discussion_r324483254
 
 

 ##
 File path: 
gobblin-data-management/src/main/java/org/apache/gobblin/data/management/copy/ModifiedDateRangeBasedFileFilter.java
 ##

[jira] [Work logged] (GOBBLIN-759) DistCP files modified in last n days within a look back period

2019-06-12 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/GOBBLIN-759?focusedWorklogId=259196&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-259196
 ]

ASF GitHub Bot logged work on GOBBLIN-759:
--

Author: ASF GitHub Bot
Created on: 13/Jun/19 00:52
Start Date: 13/Jun/19 00:52
Worklog Time Spent: 10m 
  Work Description: amarnathkarthik commented on issue #2633: GOBBLIN-759: 
Added feature to support DistCP to copy files that were …
URL: 
https://github.com/apache/incubator-gobblin/pull/2633#issuecomment-501507352
 
 
   @jhsenjaliya Pushed the changes, please review
 



Issue Time Tracking
---

Worklog Id: (was: 259196)
Time Spent: 3h 10m  (was: 3h)




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (GOBBLIN-759) DistCP files modified in last n days within a look back period

2019-06-11 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/GOBBLIN-759?focusedWorklogId=257969&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-257969
 ]

ASF GitHub Bot logged work on GOBBLIN-759:
--

Author: ASF GitHub Bot
Created on: 11/Jun/19 17:42
Start Date: 11/Jun/19 17:42
Worklog Time Spent: 10m 
  Work Description: amarnathkarthik commented on pull request #2633: 
GOBBLIN-759: Added feature to support DistCP to copy files that were …
URL: https://github.com/apache/incubator-gobblin/pull/2633#discussion_r292580464
 
 

 ##
 File path: 
gobblin-data-management/src/main/java/org/apache/gobblin/data/management/copy/SelectBtwModDataTimeBasedCopyableFileFilter.java
 ##
 @@ -0,0 +1,116 @@
+package org.apache.gobblin.data.management.copy;
+
+import java.util.Collection;
+import java.util.Iterator;
+import java.util.Properties;
+
+import lombok.extern.slf4j.Slf4j;
+
+import org.apache.hadoop.fs.FileSystem;
+import org.joda.time.DateTime;
+import org.joda.time.DateTimeZone;
+import org.joda.time.Period;
+import org.joda.time.format.PeriodFormatter;
+import org.joda.time.format.PeriodFormatterBuilder;
+
+import com.google.common.collect.ImmutableList;
+
+import org.apache.gobblin.configuration.ConfigurationKeys;
+
+
+/**
+ * A {@link CopyableFileFilter} that drops a {@link CopyableFile} if file modified time not between the loop back window
+ *  sourceFs
+ */
+@Slf4j
+public class SelectBtwModDataTimeBasedCopyableFileFilter implements CopyableFileFilter {
+
+  private final Properties props;
+  private Period minLookBackPeriod;
+  private Period maxLookBackPeriod;
+  private DateTime currentTime;
+  private DateTime minLookBackTime;
+  private DateTime maxLookBackTime;
+
+  public static final String CONFIGURATION_KEY_PREFIX = "gobblin.dataset.";
+  public static final String MODIFIED_MIN_LOOK_BACK_TIME_KEY =
+      CONFIGURATION_KEY_PREFIX + "selection.modified.min.lookbackTime";
+  public static final String MODIFIED_MAX_LOOK_BACK_TIME_KEY =
+      CONFIGURATION_KEY_PREFIX + "selection.modified.max.lookbackTime";
+  public static final String DEFAULT_DATE_PATTERN_TIMEZONE = ConfigurationKeys.PST_TIMEZONE_NAME;
+  public static final String DATE_PATTERN_TIMEZONE_KEY = CONFIGURATION_KEY_PREFIX + "datetime.timezone";
+
+  public SelectBtwModDataTimeBasedCopyableFileFilter(Properties properties) {
+    this.props = properties;
+    PeriodFormatter periodFormatter =
+        new PeriodFormatterBuilder().appendDays().appendSuffix("d").appendHours().appendSuffix("h").toFormatter();
+    this.minLookBackPeriod = props.containsKey(MODIFIED_MIN_LOOK_BACK_TIME_KEY) ? periodFormatter.parsePeriod(
 
 Review comment:
   I would like to follow the convention used in Gobblin of having a min and max look-back.
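The hunk above parses both look-back keys with a days/hours `PeriodFormatter`. The sketch below is a rough stand-in that does not use Joda-Time. It accepts strings such as `7d`, `12h` or `1d12h` and returns a `java.time.Duration`. The accepted grammar is an assumption: days come before hours, each part is optional, and whitespace is not allowed. It may not match Joda's parser in every edge case. `LookBackParser` is a hypothetical name.

```java
import java.time.Duration;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public final class LookBackParser {

  // Days then hours, each optional: "7d", "12h", "1d12h".
  private static final Pattern DAYS_HOURS = Pattern.compile("(?:(\\d+)d)?(?:(\\d+)h)?");

  private LookBackParser() {}

  public static Duration parse(String text) {
    Matcher m = DAYS_HOURS.matcher(text);
    // The empty string matches the all-optional pattern, so reject it explicitly.
    if (text.isEmpty() || !m.matches()) {
      throw new IllegalArgumentException("Expected <days>d<hours>h, got: " + text);
    }
    long days = m.group(1) == null ? 0 : Long.parseLong(m.group(1));
    long hours = m.group(2) == null ? 0 : Long.parseLong(m.group(2));
    return Duration.ofDays(days).plusHours(hours);
  }

  public static void main(String[] args) {
    System.out.println(parse("1d12h")); // PT36H
  }
}
```

One such value would be parsed for the min key and one for the max key. Those two values give the window's recent and old edges.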
 



Issue Time Tracking
---

Worklog Id: (was: 257969)
Time Spent: 3h  (was: 2h 50m)



[jira] [Work logged] (GOBBLIN-759) DistCP files modified in last n days within a look back period

2019-06-11 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/GOBBLIN-759?focusedWorklogId=257964&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-257964
 ]

ASF GitHub Bot logged work on GOBBLIN-759:
--

Author: ASF GitHub Bot
Created on: 11/Jun/19 17:41
Start Date: 11/Jun/19 17:41
Worklog Time Spent: 10m 
  Work Description: amarnathkarthik commented on pull request #2633: 
GOBBLIN-759: Added feature to support DistCP to copy files that were …
URL: https://github.com/apache/incubator-gobblin/pull/2633#discussion_r292580001
 
 

 ##
 File path: 
gobblin-data-management/src/main/java/org/apache/gobblin/data/management/copy/SelectBtwModDataTimeBasedCopyableFileFilter.java
 ##
 @@ -0,0 +1,116 @@
+package org.apache.gobblin.data.management.copy;
+
+import java.util.Collection;
+import java.util.Iterator;
+import java.util.Properties;
+
+import lombok.extern.slf4j.Slf4j;
+
+import org.apache.hadoop.fs.FileSystem;
+import org.joda.time.DateTime;
+import org.joda.time.DateTimeZone;
+import org.joda.time.Period;
+import org.joda.time.format.PeriodFormatter;
+import org.joda.time.format.PeriodFormatterBuilder;
+
+import com.google.common.collect.ImmutableList;
+
+import org.apache.gobblin.configuration.ConfigurationKeys;
+
+
+/**
+ * A {@link CopyableFileFilter} that drops a {@link CopyableFile} if file modified time not between the loop back window
+ *  sourceFs
+ */
+@Slf4j
+public class SelectBtwModDataTimeBasedCopyableFileFilter implements CopyableFileFilter {
 
 Review comment:
   Yep, that looks more appropriate to me. Will change, thanks.
 



Issue Time Tracking
---

Worklog Id: (was: 257964)
Time Spent: 2h 40m  (was: 2.5h)

> DistCP files modified in last n days within a look back period
> --
>
> Key: GOBBLIN-759
> URL: https://issues.apache.org/jira/browse/GOBBLIN-759
> Project: Apache Gobblin
>  Issue Type: Improvement
>Reporter: Karthik Amarnath
>Priority: Major
>  Time Spent: 2h 40m
>  Remaining Estimate: 0h
>
> *Feature Request:*
>  # DistCP only the files modified in the last n days within the look back window.
>  # DistCP copies only the modified files, even when source files that were NOT modified in the last n days are missing from the destination directory.
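A minimal sketch of the selection rule in the feature request. It assumes (the thread does not confirm this) that the min and max look-back values are subtracted from the current time to form an inclusive window, and that the arithmetic happens in a configured time zone, as the `DateTime`/`DateTimeZone` imports in the diff suggest. `LookBackWindow` and its parameters are hypothetical names, and the sketch uses `java.time` instead of Joda-Time so it runs without extra dependencies.

```java
import java.time.Instant;
import java.time.ZoneId;
import java.time.ZonedDateTime;

/**
 * Hypothetical sketch, not the Gobblin implementation: a file qualifies when its
 * modification time lies between (now - maxDays) and (now - minDays), inclusive.
 */
final class LookBackWindow {
  private final Instant earliest; // oldest modification time that is still copied
  private final Instant latest;   // newest modification time that is still copied

  LookBackWindow(ZonedDateTime now, int minDays, int maxDays) {
    if (minDays < 0 || minDays > maxDays) {
      throw new IllegalArgumentException("require 0 <= minDays <= maxDays");
    }
    // Calendar-day arithmetic in the zone of `now`: each day step keeps the same
    // local wall-clock time, even across DST changes.
    this.earliest = now.minusDays(maxDays).toInstant();
    this.latest = now.minusDays(minDays).toInstant();
  }

  /** Returns true if a file last modified at {@code modifiedAt} falls inside the window. */
  boolean contains(Instant modifiedAt) {
    return !modifiedAt.isBefore(earliest) && !modifiedAt.isAfter(latest);
  }

  public static void main(String[] args) {
    ZonedDateTime now = ZonedDateTime.of(2019, 5, 28, 12, 0, 0, 0, ZoneId.of("America/Los_Angeles"));
    LookBackWindow window = new LookBackWindow(now, 0, 7);
    System.out.println(window.contains(now.minusDays(2).toInstant()));  // true
    System.out.println(window.contains(now.minusDays(10).toInstant())); // false
  }
}
```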



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (GOBBLIN-759) DistCP files modified in last n days within a look back period

2019-06-11 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/GOBBLIN-759?focusedWorklogId=257965&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-257965
 ]

ASF GitHub Bot logged work on GOBBLIN-759:
--

Author: ASF GitHub Bot
Created on: 11/Jun/19 17:41
Start Date: 11/Jun/19 17:41
Worklog Time Spent: 10m 
  Work Description: amarnathkarthik commented on pull request #2633: 
GOBBLIN-759: Added feature to support DistCP to copy files that were …
URL: https://github.com/apache/incubator-gobblin/pull/2633#discussion_r292580156
 
 

 ##
 File path: 
gobblin-data-management/src/main/java/org/apache/gobblin/data/management/copy/SelectBtwModDataTimeBasedCopyableFileFilter.java
 ##
 @@ -0,0 +1,116 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.gobblin.data.management.copy;
+
+import java.util.Collection;
+import java.util.Iterator;
+import java.util.Properties;
+
+import lombok.extern.slf4j.Slf4j;
+
+import org.apache.hadoop.fs.FileSystem;
+import org.joda.time.DateTime;
+import org.joda.time.DateTimeZone;
+import org.joda.time.Period;
+import org.joda.time.format.PeriodFormatter;
+import org.joda.time.format.PeriodFormatterBuilder;
+
+import com.google.common.collect.ImmutableList;
+
+import org.apache.gobblin.configuration.ConfigurationKeys;
+
+
+/**
+ * A {@link CopyableFileFilter} that drops a {@link CopyableFile} if file modified time not between the loop back window
+ *  sourceFs
+ */
+@Slf4j
+public class SelectBtwModDataTimeBasedCopyableFileFilter implements CopyableFileFilter {
+
+  private final Properties props;
+  private Period minLookBackPeriod;
+  private Period maxLookBackPeriod;
+  private DateTime currentTime;
+  private DateTime minLookBackTime;
+  private DateTime maxLookBackTime;
+
+  public static final String CONFIGURATION_KEY_PREFIX = "gobblin.dataset.";
 
 Review comment:
   sure
 



Issue Time Tracking
---

Worklog Id: (was: 257965)
Time Spent: 2h 50m  (was: 2h 40m)



[jira] [Work logged] (GOBBLIN-759) DistCP files modified in last n days within a look back period

2019-05-28 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/GOBBLIN-759?focusedWorklogId=249886&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-249886
 ]

ASF GitHub Bot logged work on GOBBLIN-759:
--

Author: ASF GitHub Bot
Created on: 29/May/19 06:49
Start Date: 29/May/19 06:49
Worklog Time Spent: 10m 
  Work Description: jhsenjaliya commented on issue #2633: GOBBLIN-759: 
Added feature to support DistCP to copy files that were …
URL: 
https://github.com/apache/incubator-gobblin/pull/2633#issuecomment-496808201
 
 
   Will continue the review tomorrow.
 



Issue Time Tracking
---

Worklog Id: (was: 249886)
Time Spent: 2.5h  (was: 2h 20m)



[jira] [Work logged] (GOBBLIN-759) DistCP files modified in last n days within a look back period

2019-05-28 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/GOBBLIN-759?focusedWorklogId=249880&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-249880
 ]

ASF GitHub Bot logged work on GOBBLIN-759:
--

Author: ASF GitHub Bot
Created on: 29/May/19 06:44
Start Date: 29/May/19 06:44
Worklog Time Spent: 10m 
  Work Description: jhsenjaliya commented on pull request #2633: 
GOBBLIN-759: Added feature to support DistCP to copy files that were …
URL: https://github.com/apache/incubator-gobblin/pull/2633#discussion_r288414241
 
 

 ##
 File path: 
gobblin-data-management/src/main/java/org/apache/gobblin/data/management/copy/SelectBtwModDataTimeBasedCopyableFileFilter.java
 ##
 @@ -0,0 +1,116 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.gobblin.data.management.copy;
+
+import java.util.Collection;
+import java.util.Iterator;
+import java.util.Properties;
+
+import lombok.extern.slf4j.Slf4j;
+
+import org.apache.hadoop.fs.FileSystem;
+import org.joda.time.DateTime;
+import org.joda.time.DateTimeZone;
+import org.joda.time.Period;
+import org.joda.time.format.PeriodFormatter;
+import org.joda.time.format.PeriodFormatterBuilder;
+
+import com.google.common.collect.ImmutableList;
+
+import org.apache.gobblin.configuration.ConfigurationKeys;
+
+
+/**
+ * A {@link CopyableFileFilter} that drops a {@link CopyableFile} if file modified time not between the loop back window
+ *  sourceFs
+ */
+@Slf4j
+public class SelectBtwModDataTimeBasedCopyableFileFilter implements CopyableFileFilter {
+
+  private final Properties props;
+  private Period minLookBackPeriod;
+  private Period maxLookBackPeriod;
+  private DateTime currentTime;
+  private DateTime minLookBackTime;
+  private DateTime maxLookBackTime;
+
+  public static final String CONFIGURATION_KEY_PREFIX = "gobblin.dataset.";
+  public static final String MODIFIED_MIN_LOOK_BACK_TIME_KEY =
+      CONFIGURATION_KEY_PREFIX + "selection.modified.min.lookbackTime";
+  public static final String MODIFIED_MAX_LOOK_BACK_TIME_KEY =
+      CONFIGURATION_KEY_PREFIX + "selection.modified.max.lookbackTime";
+  public static final String DEFAULT_DATE_PATTERN_TIMEZONE = ConfigurationKeys.PST_TIMEZONE_NAME;
+  public static final String DATE_PATTERN_TIMEZONE_KEY = CONFIGURATION_KEY_PREFIX + "datetime.timezone";
+
+  public SelectBtwModDataTimeBasedCopyableFileFilter(Properties properties) {
+    this.props = properties;
+    PeriodFormatter periodFormatter =
+        new PeriodFormatterBuilder().appendDays().appendSuffix("d").appendHours().appendSuffix("h").toFormatter();
+    this.minLookBackPeriod = props.containsKey(MODIFIED_MIN_LOOK_BACK_TIME_KEY) ? periodFormatter.parsePeriod(
 
 Review comment:
   I initially thought `minLookBackPeriod` as what `minLookBackPeriod` is.
   If it helps, how about using start-date/end-date or since/by terminology? For example,
   `this.modifiedSince = props.containsKey("gobblin.dataset.filter.modified.since")` or
   `this.modifiedStartDate = props.containsKey("gobblin.dataset.filter.modified.startDate")`
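The constructor in the diff parses look-back values with a Joda-Time `PeriodFormatter` built from `appendDays().appendSuffix("d").appendHours().appendSuffix("h")`, which appears to accept values such as `7d`, `12h` or `7d12h`. The sketch below approximates that format with a regular expression and `java.time.Duration`. `LookBackParser` and `fromProps` are hypothetical names, and unlike a Joda `Period`, a `Duration` treats a day as exactly 24 hours. Whichever naming is chosen (min/max look-back or since/until), only the property keys change; the parsing stays the same.

```java
import java.time.Duration;
import java.util.Properties;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical java.time stand-in for the Joda "<days>d<hours>h" period format. */
final class LookBackParser {
  // Optional "<n>d" followed by optional "<n>h", e.g. "7d", "12h" or "7d12h".
  private static final Pattern PERIOD = Pattern.compile("(?:(\\d+)d)?(?:(\\d+)h)?");

  static Duration parse(String text) {
    String trimmed = text.trim();
    Matcher m = PERIOD.matcher(trimmed);
    // The empty string matches the all-optional pattern, so reject it explicitly.
    if (trimmed.isEmpty() || !m.matches()) {
      throw new IllegalArgumentException("expected a period such as 7d, 12h or 7d12h: " + text);
    }
    long days = m.group(1) == null ? 0 : Long.parseLong(m.group(1));
    long hours = m.group(2) == null ? 0 : Long.parseLong(m.group(2));
    return Duration.ofDays(days).plusHours(hours);
  }

  /** Reads a look-back value from job properties, falling back to a default when the key is absent. */
  static Duration fromProps(Properties props, String key, Duration defaultValue) {
    String raw = props.getProperty(key);
    return raw == null ? defaultValue : parse(raw);
  }
}
```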
 



Issue Time Tracking
---

Worklog Id: (was: 249880)
Time Spent: 2h 20m  (was: 2h 10m)


[jira] [Work logged] (GOBBLIN-759) DistCP files modified in last n days within a look back period

2019-05-28 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/GOBBLIN-759?focusedWorklogId=249872&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-249872
 ]

ASF GitHub Bot logged work on GOBBLIN-759:
--

Author: ASF GitHub Bot
Created on: 29/May/19 06:38
Start Date: 29/May/19 06:38
Worklog Time Spent: 10m 
  Work Description: jhsenjaliya commented on pull request #2633: 
GOBBLIN-759: Added feature to support DistCP to copy files that were …
URL: https://github.com/apache/incubator-gobblin/pull/2633#discussion_r288412642
 
 

 ##
 File path: 
gobblin-data-management/src/main/java/org/apache/gobblin/data/management/copy/SelectBtwModDataTimeBasedCopyableFileFilter.java
 ##
 @@ -0,0 +1,116 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.gobblin.data.management.copy;
+
+import java.util.Collection;
+import java.util.Iterator;
+import java.util.Properties;
+
+import lombok.extern.slf4j.Slf4j;
+
+import org.apache.hadoop.fs.FileSystem;
+import org.joda.time.DateTime;
+import org.joda.time.DateTimeZone;
+import org.joda.time.Period;
+import org.joda.time.format.PeriodFormatter;
+import org.joda.time.format.PeriodFormatterBuilder;
+
+import com.google.common.collect.ImmutableList;
+
+import org.apache.gobblin.configuration.ConfigurationKeys;
+
+
+/**
+ * A {@link CopyableFileFilter} that drops a {@link CopyableFile} if file modified time not between the loop back window
+ *  sourceFs
+ */
+@Slf4j
+public class SelectBtwModDataTimeBasedCopyableFileFilter implements CopyableFileFilter {
+
+  private final Properties props;
+  private Period minLookBackPeriod;
+  private Period maxLookBackPeriod;
+  private DateTime currentTime;
+  private DateTime minLookBackTime;
+  private DateTime maxLookBackTime;
+
+  public static final String CONFIGURATION_KEY_PREFIX = "gobblin.dataset.";
 
 Review comment:
   How about "gobblin.dataset.filter", to indicate that all the other properties are specific to this filtering process?
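A sketch of what the reviewer's suggestion buys, assuming the prefix becomes `gobblin.dataset.filter.`: with a dedicated prefix, every filter setting can be pulled out of the job properties in one pass. `FilterKeys`, `scoped`, and the key names below are hypothetical and are not the keys that were eventually merged.

```java
import java.util.Properties;

/** Hypothetical key layout following the "gobblin.dataset.filter" prefix suggestion. */
final class FilterKeys {
  static final String PREFIX = "gobblin.dataset.filter.";
  static final String MODIFIED_MIN_LOOK_BACK_TIME_KEY = PREFIX + "modified.min.lookbackTime";
  static final String MODIFIED_MAX_LOOK_BACK_TIME_KEY = PREFIX + "modified.max.lookbackTime";
  static final String DATE_PATTERN_TIMEZONE_KEY = PREFIX + "datetime.timezone";

  /** Returns only the properties under PREFIX, with the prefix stripped from each name. */
  static Properties scoped(Properties all) {
    Properties out = new Properties();
    for (String name : all.stringPropertyNames()) {
      if (name.startsWith(PREFIX)) {
        out.setProperty(name.substring(PREFIX.length()), all.getProperty(name));
      }
    }
    return out;
  }
}
```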
 



Issue Time Tracking
---

Worklog Id: (was: 249872)
Time Spent: 2h 10m  (was: 2h)



[jira] [Work logged] (GOBBLIN-759) DistCP files modified in last n days within a look back period

2019-05-28 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/GOBBLIN-759?focusedWorklogId=249871&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-249871
 ]

ASF GitHub Bot logged work on GOBBLIN-759:
--

Author: ASF GitHub Bot
Created on: 29/May/19 06:38
Start Date: 29/May/19 06:38
Worklog Time Spent: 10m 
  Work Description: jhsenjaliya commented on pull request #2633: 
GOBBLIN-759: Added feature to support DistCP to copy files that were …
URL: https://github.com/apache/incubator-gobblin/pull/2633#discussion_r288412642
 
 

 ##
 File path: 
gobblin-data-management/src/main/java/org/apache/gobblin/data/management/copy/SelectBtwModDataTimeBasedCopyableFileFilter.java
 ##
 @@ -0,0 +1,116 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.gobblin.data.management.copy;
+
+import java.util.Collection;
+import java.util.Iterator;
+import java.util.Properties;
+
+import lombok.extern.slf4j.Slf4j;
+
+import org.apache.hadoop.fs.FileSystem;
+import org.joda.time.DateTime;
+import org.joda.time.DateTimeZone;
+import org.joda.time.Period;
+import org.joda.time.format.PeriodFormatter;
+import org.joda.time.format.PeriodFormatterBuilder;
+
+import com.google.common.collect.ImmutableList;
+
+import org.apache.gobblin.configuration.ConfigurationKeys;
+
+
+/**
+ * A {@link CopyableFileFilter} that drops a {@link CopyableFile} if file modified time not between the loop back window
+ *  sourceFs
+ */
+@Slf4j
+public class SelectBtwModDataTimeBasedCopyableFileFilter implements CopyableFileFilter {
+
+  private final Properties props;
+  private Period minLookBackPeriod;
+  private Period maxLookBackPeriod;
+  private DateTime currentTime;
+  private DateTime minLookBackTime;
+  private DateTime maxLookBackTime;
+
+  public static final String CONFIGURATION_KEY_PREFIX = "gobblin.dataset.";
 
 Review comment:
   Should you use "gobblin.dataset.filter" to indicate that all the other properties are specific to this filtering process?
 



Issue Time Tracking
---

Worklog Id: (was: 249871)
Time Spent: 2h  (was: 1h 50m)



[jira] [Work logged] (GOBBLIN-759) DistCP files modified in last n days within a look back period

2019-05-28 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/GOBBLIN-759?focusedWorklogId=249870&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-249870
 ]

ASF GitHub Bot logged work on GOBBLIN-759:
--

Author: ASF GitHub Bot
Created on: 29/May/19 06:34
Start Date: 29/May/19 06:34
Worklog Time Spent: 10m 
  Work Description: jhsenjaliya commented on pull request #2633: 
GOBBLIN-759: Added feature to support DistCP to copy files that were …
URL: https://github.com/apache/incubator-gobblin/pull/2633#discussion_r288411603
 
 

 ##
 File path: 
gobblin-data-management/src/main/java/org/apache/gobblin/data/management/copy/SelectBtwModDataTimeBasedCopyableFileFilter.java
 ##
 @@ -0,0 +1,116 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.gobblin.data.management.copy;
+
+import java.util.Collection;
+import java.util.Iterator;
+import java.util.Properties;
+
+import lombok.extern.slf4j.Slf4j;
+
+import org.apache.hadoop.fs.FileSystem;
+import org.joda.time.DateTime;
+import org.joda.time.DateTimeZone;
+import org.joda.time.Period;
+import org.joda.time.format.PeriodFormatter;
+import org.joda.time.format.PeriodFormatterBuilder;
+
+import com.google.common.collect.ImmutableList;
+
+import org.apache.gobblin.configuration.ConfigurationKeys;
+
+
+/**
+ * A {@link CopyableFileFilter} that drops a {@link CopyableFile} if file modified time not between the loop back window
+ *  sourceFs
+ */
+@Slf4j
+public class SelectBtwModDataTimeBasedCopyableFileFilter implements CopyableFileFilter {
 
 Review comment:
   Should this be named `ModifiedDateRangeBasedFileFilter`?
 



Issue Time Tracking
---

Worklog Id: (was: 249870)
Time Spent: 1h 50m  (was: 1h 40m)



[jira] [Work logged] (GOBBLIN-759) DistCP files modified in last n days within a look back period

2019-05-28 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/GOBBLIN-759?focusedWorklogId=249869&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-249869
 ]

ASF GitHub Bot logged work on GOBBLIN-759:
--

Author: ASF GitHub Bot
Created on: 29/May/19 06:34
Start Date: 29/May/19 06:34
Worklog Time Spent: 10m 
  Work Description: jhsenjaliya commented on pull request #2633: 
GOBBLIN-759: Added feature to support DistCP to copy files that were …
URL: https://github.com/apache/incubator-gobblin/pull/2633#discussion_r288411603
 
 

 ##
 File path: 
gobblin-data-management/src/main/java/org/apache/gobblin/data/management/copy/SelectBtwModDataTimeBasedCopyableFileFilter.java
 ##
 @@ -0,0 +1,116 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.gobblin.data.management.copy;
+
+import java.util.Collection;
+import java.util.Iterator;
+import java.util.Properties;
+
+import lombok.extern.slf4j.Slf4j;
+
+import org.apache.hadoop.fs.FileSystem;
+import org.joda.time.DateTime;
+import org.joda.time.DateTimeZone;
+import org.joda.time.Period;
+import org.joda.time.format.PeriodFormatter;
+import org.joda.time.format.PeriodFormatterBuilder;
+
+import com.google.common.collect.ImmutableList;
+
+import org.apache.gobblin.configuration.ConfigurationKeys;
+
+
+/**
+ * A {@link CopyableFileFilter} that drops a {@link CopyableFile} if file modified time not between the loop back window
+ *  sourceFs
+ */
+@Slf4j
+public class SelectBtwModDataTimeBasedCopyableFileFilter implements CopyableFileFilter {
 
 Review comment:
   Should this be named `DateRangeBasedFileFilter`?
 



Issue Time Tracking
---

Worklog Id: (was: 249869)
Time Spent: 1h 40m  (was: 1.5h)



[jira] [Work logged] (GOBBLIN-759) DistCP files modified in last n days within a look back period

2019-05-13 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/GOBBLIN-759?focusedWorklogId=241531&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-241531
 ]

ASF GitHub Bot logged work on GOBBLIN-759:
--

Author: ASF GitHub Bot
Created on: 14/May/19 05:30
Start Date: 14/May/19 05:30
Worklog Time Spent: 10m 
  Work Description: amarnathkarthik commented on issue #2633: GOBBLIN-759: 
Added feature to support DistCP to copy files that were …
URL: 
https://github.com/apache/incubator-gobblin/pull/2633#issuecomment-492084494
 
 
   @sv2000 @htran1 @jhsenjaliya I created a new PR. Please review.
 



Issue Time Tracking
---

Worklog Id: (was: 241531)
Time Spent: 1.5h  (was: 1h 20m)



[jira] [Work logged] (GOBBLIN-759) DistCP files modified in last n days within a look back period

2019-05-13 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/GOBBLIN-759?focusedWorklogId=241417&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-241417
 ]

ASF GitHub Bot logged work on GOBBLIN-759:
--

Author: ASF GitHub Bot
Created on: 14/May/19 00:36
Start Date: 14/May/19 00:36
Worklog Time Spent: 10m 
  Work Description: amarnathkarthik commented on pull request #2623: 
[GOBBLIN-759] Added feature to support DistCP to copy files modified in last n 
days
URL: https://github.com/apache/incubator-gobblin/pull/2623
 
 
   
 



Issue Time Tracking
---

Worklog Id: (was: 241417)
Time Spent: 1h 10m  (was: 1h)



[jira] [Work logged] (GOBBLIN-759) DistCP files modified in last n days within a look back period

2019-05-13 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/GOBBLIN-759?focusedWorklogId=241418&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-241418
 ]

ASF GitHub Bot logged work on GOBBLIN-759:
--

Author: ASF GitHub Bot
Created on: 14/May/19 00:36
Start Date: 14/May/19 00:36
Worklog Time Spent: 10m 
  Work Description: amarnathkarthik commented on pull request #2633: 
GOBBLIN-759: Added feature to support DistCP to copy files that were …
URL: https://github.com/apache/incubator-gobblin/pull/2633
 
 
   …modified in last n days
   
   Dear Gobblin maintainers,
   
   Please accept this PR. I understand that it will not be reviewed until I have checked off all the steps below!
   
   
   ### JIRA
   - [ ] My PR addresses the following [Gobblin JIRA](https://issues.apache.org/jira/browse/GOBBLIN/) issues and references them in the PR title. For example, "[GOBBLIN-759] Added feature to support DistCP to copy files modified in last n days"
   - https://issues.apache.org/jira/browse/GOBBLIN-759
   
   
   ### Description
   - [ ] Here are some details about my PR, including screenshots (if applicable):
   1. Added a feature to DistCP the files that were modified in the last n days within the lookback period.
   2. This feature copies only the modified files, even when the non-modified files are not present at the destination.
   3. Leverages the existing TimestampBasedCopyableDataset to find the dataset and uses the SelectBtwModDataTimeBasedCopyableFileFilter CopyableFileFilter implementation to filter the files that were modified in the last n days.
   
   
   ### Tests
   - [ ] My PR adds the following unit tests __OR__ does not need testing for this extremely good reason:
   1. Added the TimestampBasedCopyableDatasetTest.testCopyWithFilter test case, which covers a scenario with one modified and one non-modified file.
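The unit test described above exercises one modified and one non-modified file. A self-contained approximation of that scenario follows; it uses a hypothetical `SourceFile` type in place of Gobblin's `CopyableFile` and plain `Instant` bounds in place of the filter's computed look-back times, so it is an illustration of the idea rather than the PR's code. The destination is never consulted: a file outside the window is skipped even if it is missing at the target.

```java
import java.time.Instant;
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;

/** Hypothetical sketch of filtering source files by modification time. */
final class ModifiedWindowFilter {
  /** Minimal stand-in for a source file: a path and its modification time. */
  static final class SourceFile {
    final String path;
    final Instant modifiedAt;

    SourceFile(String path, Instant modifiedAt) {
      this.path = path;
      this.modifiedAt = modifiedAt;
    }
  }

  private final Instant earliest;
  private final Instant latest;

  ModifiedWindowFilter(Instant earliest, Instant latest) {
    this.earliest = earliest;
    this.latest = latest;
  }

  /** Keeps files modified within [earliest, latest]; the destination is not consulted. */
  List<SourceFile> filter(Collection<SourceFile> files) {
    List<SourceFile> kept = new ArrayList<>();
    for (SourceFile f : files) {
      if (!f.modifiedAt.isBefore(earliest) && !f.modifiedAt.isAfter(latest)) {
        kept.add(f);
      }
    }
    return kept;
  }

  public static void main(String[] args) {
    Instant now = Instant.parse("2019-05-14T00:00:00Z");
    ModifiedWindowFilter filter = new ModifiedWindowFilter(now.minusSeconds(7L * 24 * 3600), now);
    List<SourceFile> files = new ArrayList<>();
    files.add(new SourceFile("/data/modified.avro", now.minusSeconds(3600)));
    files.add(new SourceFile("/data/stale.avro", now.minusSeconds(30L * 24 * 3600)));
    for (SourceFile f : filter.filter(files)) {
      System.out.println(f.path); // prints only /data/modified.avro
    }
  }
}
```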
   
   
   ### Commits
   - [ ] My commits all reference JIRA issues in their subject lines, and I have squashed multiple commits if they address the same issue. In addition, my commits follow the guidelines from "[How to write a good git commit message](http://chris.beams.io/posts/git-commit/)":
   1. Subject is separated from body by a blank line
   2. Subject is limited to 50 characters
   3. Subject does not end with a period
   4. Subject uses the imperative mood ("add", not "adding")
   5. Body wraps at 72 characters
   6. Body explains "what" and "why", not "how"
   
   
 



Issue Time Tracking
---

Worklog Id: (was: 241418)
Time Spent: 1h 20m  (was: 1h 10m)

> DistCP files modified in last n days within a look back period
> --
>
> Key: GOBBLIN-759
> URL: https://issues.apache.org/jira/browse/GOBBLIN-759
> Project: Apache Gobblin
>  Issue Type: Improvement
>Reporter: Karthik Amarnath
>Priority: Major
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> *Feature Request:*
>  # DistCP only the files modified in the last n days within the lookback window.
>  # DistCP copies only the modified files, even when source files that were NOT modified in the last n days are missing from the destination directory.
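Point 2 of the request is easiest to see by contrasting two copy plans. In the Java sketch below, `modifiedOnly` follows the requested behavior and never inspects the destination, while `modifiedOrMissing` is an assumed sync-style alternative, included only for contrast, that would also back-fill unmodified files missing from the destination. `CopyPlanner` and both method names are hypothetical, not Gobblin or DistCp APIs, and paths are assumed to be relative to the dataset root so that source and destination entries can be compared directly.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical illustration of the requested semantics; not Gobblin or DistCp code.
// Paths are assumed to be relative to the dataset root on both sides.
final class CopyPlanner {
  private CopyPlanner() {}

  /** Requested behavior: copy a file iff it was modified within the last `days` days; the destination is ignored. */
  static List<String> modifiedOnly(Map<String, Instant> sourceMtimes, Instant now, int days) {
    Instant lowerBound = now.minus(Duration.ofDays(days));
    List<String> toCopy = new ArrayList<>();
    for (Map.Entry<String, Instant> entry : sourceMtimes.entrySet()) {
      Instant mtime = entry.getValue();
      if (!mtime.isBefore(lowerBound) && !mtime.isAfter(now)) {
        toCopy.add(entry.getKey());
      }
    }
    return toCopy;
  }

  /** Sync-style contrast (assumed): additionally copy any source file absent from the destination. */
  static List<String> modifiedOrMissing(
      Map<String, Instant> sourceMtimes, Set<String> destinationPaths, Instant now, int days) {
    List<String> toCopy = modifiedOnly(sourceMtimes, now, days);
    for (String path : sourceMtimes.keySet()) {
      if (!destinationPaths.contains(path) && !toCopy.contains(path)) {
        toCopy.add(path);
      }
    }
    return toCopy;
  }
}
```

With an empty destination and two source files modified 1 and 60 days ago, a 7-day `modifiedOnly` plan copies just the recent file, whereas the sync-style plan would copy both.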



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (GOBBLIN-759) DistCP files modified in last n days within a look back period

2019-05-12 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/GOBBLIN-759?focusedWorklogId=240879&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-240879
 ]

ASF GitHub Bot logged work on GOBBLIN-759:
--

Author: ASF GitHub Bot
Created on: 13/May/19 05:40
Start Date: 13/May/19 05:40
Worklog Time Spent: 10m 
  Work Description: amarnathkarthik commented on pull request #2623: 
[GOBBLIN-759] Added feature to support DistCP to copy files modified in last n 
days
URL: https://github.com/apache/incubator-gobblin/pull/2623
 
 
   
 



Issue Time Tracking
---

Worklog Id: (was: 240879)
Time Spent: 50m  (was: 40m)

> DistCP files modified in last n days within a look back period
> --
>
> Key: GOBBLIN-759
> URL: https://issues.apache.org/jira/browse/GOBBLIN-759
> Project: Apache Gobblin
>  Issue Type: Improvement
>Reporter: Karthik Amarnath
>Priority: Major
>  Time Spent: 50m
>  Remaining Estimate: 0h
>





[jira] [Work logged] (GOBBLIN-759) DistCP files modified in last n days within a look back period

2019-05-12 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/GOBBLIN-759?focusedWorklogId=240881&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-240881
 ]

ASF GitHub Bot logged work on GOBBLIN-759:
--

Author: ASF GitHub Bot
Created on: 13/May/19 05:41
Start Date: 13/May/19 05:41
Worklog Time Spent: 10m 
  Work Description: amarnathkarthik commented on pull request #2623: 
[GOBBLIN-759] Added feature to support DistCP to copy files modified in last n 
days
URL: https://github.com/apache/incubator-gobblin/pull/2623
 
 
   Dear Gobblin maintainers,
   
   Please accept this PR. I understand that it will not be reviewed until I 
have checked off all the steps below!
   
   
   ### JIRA
   - [x] My PR addresses the following [Gobblin 
JIRA](https://issues.apache.org/jira/browse/GOBBLIN/) issues and references 
them in the PR title. For example, "[GOBBLIN-759] Added feature to support 
DistCP to copy files modified in last n days"
   - https://issues.apache.org/jira/browse/GOBBLIN-759
   
   
   ### Description
   - [x] Here are some details about my PR, including screenshots (if 
applicable):
   1. Added a feature to DistCP only the files that were modified in the last n days within the lookback period.
   2. This feature copies only the modified files, even when non-modified files are not present at the destination.
   3. Leverages the existing TimestampBasedCopyableDataset to find the dataset, and uses the SelectBtwModDataTimeBasedCopyableFileFilter CopyableFilter implementation to filter the files that were modified in the last n days.
   
   ### Tests
   - [x] My PR adds the following unit tests __OR__ does not need testing for 
this extremely good reason:
   1. Added the TimestampBasedCopyableDatasetTest.testCopyWithFilter test case, which covers a scenario with one modified file and one non-modified file.
   
   
   ### Commits
   - [ ] My commits all reference JIRA issues in their subject lines, and I 
have squashed multiple commits if they address the same issue. In addition, my 
commits follow the guidelines from "[How to write a good git commit 
message](http://chris.beams.io/posts/git-commit/)":
   1. Subject is separated from body by a blank line
   2. Subject is limited to 50 characters
   3. Subject does not end with a period
   4. Subject uses the imperative mood ("add", not "adding")
   5. Body wraps at 72 characters
   6. Body explains "what" and "why", not "how"
   
   
 



Issue Time Tracking
---

Worklog Id: (was: 240881)
Time Spent: 1h  (was: 50m)

> DistCP files modified in last n days within a look back period
> --
>
> Key: GOBBLIN-759
> URL: https://issues.apache.org/jira/browse/GOBBLIN-759
> Project: Apache Gobblin
>  Issue Type: Improvement
>Reporter: Karthik Amarnath
>Priority: Major
>  Time Spent: 1h
>  Remaining Estimate: 0h
>





[jira] [Work logged] (GOBBLIN-759) DistCP files modified in last n days within a look back period

2019-05-10 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/GOBBLIN-759?focusedWorklogId=240533&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-240533
 ]

ASF GitHub Bot logged work on GOBBLIN-759:
--

Author: ASF GitHub Bot
Created on: 11/May/19 01:21
Start Date: 11/May/19 01:21
Worklog Time Spent: 10m 
  Work Description: jhsenjaliya commented on issue #2623: [GOBBLIN-759] 
Added feature to support DistCP to copy files modified in last n days
URL: 
https://github.com/apache/incubator-gobblin/pull/2623#issuecomment-491467992
 
 
   @amarnathkarthik, can you please squash these commits? 
 



Issue Time Tracking
---

Worklog Id: (was: 240533)
Time Spent: 40m  (was: 0.5h)

> DistCP files modified in last n days within a look back period
> --
>
> Key: GOBBLIN-759
> URL: https://issues.apache.org/jira/browse/GOBBLIN-759
> Project: Apache Gobblin
>  Issue Type: Improvement
>Reporter: Karthik Amarnath
>Priority: Major
>  Time Spent: 40m
>  Remaining Estimate: 0h
>





[jira] [Work logged] (GOBBLIN-759) DistCP files modified in last n days within a look back period

2019-04-29 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/GOBBLIN-759?focusedWorklogId=234781&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-234781
 ]

ASF GitHub Bot logged work on GOBBLIN-759:
--

Author: ASF GitHub Bot
Created on: 29/Apr/19 18:16
Start Date: 29/Apr/19 18:16
Worklog Time Spent: 10m 
  Work Description: amarnathkarthik commented on issue #2623: [GOBBLIN-759] 
Added feature to support DistCP to copy files modified in last n days
URL: 
https://github.com/apache/incubator-gobblin/pull/2623#issuecomment-487687405
 
 
   @sv2000 The build succeeded; please review.
 



Issue Time Tracking
---

Worklog Id: (was: 234781)
Time Spent: 0.5h  (was: 20m)

> DistCP files modified in last n days within a look back period
> --
>
> Key: GOBBLIN-759
> URL: https://issues.apache.org/jira/browse/GOBBLIN-759
> Project: Apache Gobblin
>  Issue Type: Improvement
>Reporter: Karthik Amarnath
>Priority: Major
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>





[jira] [Work logged] (GOBBLIN-759) DistCP files modified in last n days within a look back period

2019-04-27 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/GOBBLIN-759?focusedWorklogId=234024&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-234024
 ]

ASF GitHub Bot logged work on GOBBLIN-759:
--

Author: ASF GitHub Bot
Created on: 27/Apr/19 21:19
Start Date: 27/Apr/19 21:19
Worklog Time Spent: 10m 
  Work Description: amarnathkarthik commented on pull request #2623: 
[GOBBLIN-759] Added feature to support DistCP to copy files modified in last n 
days
URL: https://github.com/apache/incubator-gobblin/pull/2623
 
 
   Dear Gobblin maintainers,
   
   Please accept this PR. I understand that it will not be reviewed until I 
have checked off all the steps below!
   
   
   ### JIRA
   - [ ] My PR addresses the following [Gobblin 
JIRA](https://issues.apache.org/jira/browse/GOBBLIN/) issues and references 
them in the PR title. For example, "[GOBBLIN-759] Added feature to support 
DistCP to copy files modified in last n days"
   - https://issues.apache.org/jira/browse/GOBBLIN-759
   
   
   ### Description
   - [ ] Here are some details about my PR, including screenshots (if 
applicable):
   1. Added a feature to DistCP only the files that were modified in the last n days within the lookback period.
   2. This feature copies only the modified files, even when non-modified files are not present at the destination.
   3. Leverages the existing TimestampBasedCopyableDataset to find the dataset, and uses the SelectBtwModDataTimeBasedCopyableFileFilter CopyableFilter implementation to filter the files that were modified in the last n days.
   
   ### Tests
   - [ ] My PR adds the following unit tests __OR__ does not need testing for 
this extremely good reason:
   1. Added the TimestampBasedCopyableDatasetTest.testCopyWithFilter test case, which covers a scenario with one modified file and one non-modified file.
   
   
   ### Commits
   - [ ] My commits all reference JIRA issues in their subject lines, and I 
have squashed multiple commits if they address the same issue. In addition, my 
commits follow the guidelines from "[How to write a good git commit 
message](http://chris.beams.io/posts/git-commit/)":
   1. Subject is separated from body by a blank line
   2. Subject is limited to 50 characters
   3. Subject does not end with a period
   4. Subject uses the imperative mood ("add", not "adding")
   5. Body wraps at 72 characters
   6. Body explains "what" and "why", not "how"
   
   
 



Issue Time Tracking
---

Worklog Id: (was: 234024)
Time Spent: 10m
Remaining Estimate: 0h

> DistCP files modified in last n days within a look back period
> --
>
> Key: GOBBLIN-759
> URL: https://issues.apache.org/jira/browse/GOBBLIN-759
> Project: Apache Gobblin
>  Issue Type: Improvement
>Reporter: Karthik Amarnath
>Priority: Major
>  Time Spent: 10m
>  Remaining Estimate: 0h
>





[jira] [Work logged] (GOBBLIN-759) DistCP files modified in last n days within a look back period

2019-04-27 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/GOBBLIN-759?focusedWorklogId=234025&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-234025
 ]

ASF GitHub Bot logged work on GOBBLIN-759:
--

Author: ASF GitHub Bot
Created on: 27/Apr/19 21:19
Start Date: 27/Apr/19 21:19
Worklog Time Spent: 10m 
  Work Description: amarnathkarthik commented on issue #2623: [GOBBLIN-759] 
Added feature to support DistCP to copy files modified in last n days
URL: 
https://github.com/apache/incubator-gobblin/pull/2623#issuecomment-487320699
 
 
   @sv2000 Please review. Thanks
 



Issue Time Tracking
---

Worklog Id: (was: 234025)
Time Spent: 20m  (was: 10m)

> DistCP files modified in last n days within a look back period
> --
>
> Key: GOBBLIN-759
> URL: https://issues.apache.org/jira/browse/GOBBLIN-759
> Project: Apache Gobblin
>  Issue Type: Improvement
>Reporter: Karthik Amarnath
>Priority: Major
>  Time Spent: 20m
>  Remaining Estimate: 0h
>


