[jira] [Resolved] (GOBBLIN-721) Gobblin streaming recipe is broken

2019-04-01 Thread Hung Tran (JIRA)


 [ 
https://issues.apache.org/jira/browse/GOBBLIN-721?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hung Tran resolved GOBBLIN-721.
---
   Resolution: Fixed
Fix Version/s: 0.15.0

Issue resolved by pull request #2588
[https://github.com/apache/incubator-gobblin/pull/2588]

> Gobblin streaming recipe is broken
> --
>
> Key: GOBBLIN-721
> URL: https://issues.apache.org/jira/browse/GOBBLIN-721
> Project: Apache Gobblin
>  Issue Type: Bug
>  Components: gobblin-core
>Reporter: Shirshanka Das
>Assignee: Shirshanka Das
>Priority: Major
> Fix For: 0.15.0
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Running the simple streaming pull file results in a "negative acks" problem 
> because the internal pipeline has been re-architected to ack automatically. 
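
The "negative acks" symptom can be pictured as a counter of outstanding records that is decremented twice for the same record: once by the pipeline's automatic ack and once by an explicit ack left over in the recipe. The sketch below only illustrates that arithmetic. `AckCounter` and its methods are hypothetical names, not Gobblin classes.

```java
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical stand-in for a tracker of records awaiting acknowledgement. */
final class AckCounter {
  private final AtomicLong outstanding = new AtomicLong();

  /** Called when a record enters the pipeline. */
  void recordEmitted() {
    outstanding.incrementAndGet();
  }

  /** Called on every ack, whether automatic or explicit. */
  void ack() {
    outstanding.decrementAndGet();
  }

  long outstanding() {
    return outstanding.get();
  }

  /** Simulates one record being acked ackCount times and returns the resulting balance. */
  static long balanceAfter(int ackCount) {
    AckCounter counter = new AckCounter();
    counter.recordEmitted();
    for (int i = 0; i < ackCount; i++) {
      counter.ack();
    }
    return counter.outstanding();
  }
}
```

With a single ack the balance returns to zero. A second ack for the same record pushes it below zero, which is why removing the redundant ack is a plausible fix.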



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (GOBBLIN-721) Gobblin streaming recipe is broken

2019-04-01 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/GOBBLIN-721?focusedWorklogId=221607&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-221607
 ]

ASF GitHub Bot logged work on GOBBLIN-721:
--

Author: ASF GitHub Bot
Created on: 02/Apr/19 04:20
Start Date: 02/Apr/19 04:20
Worklog Time Spent: 10m 
  Work Description: asfgit commented on pull request #2588: GOBBLIN-721: 
Remove additional ack. Simplify watermark manager (remov…
URL: https://github.com/apache/incubator-gobblin/pull/2588
 
 
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 221607)
Time Spent: 20m  (was: 10m)

> Gobblin streaming recipe is broken
> --
>
> Key: GOBBLIN-721
> URL: https://issues.apache.org/jira/browse/GOBBLIN-721
> Project: Apache Gobblin
>  Issue Type: Bug
>  Components: gobblin-core
>Reporter: Shirshanka Das
>Assignee: Shirshanka Das
>Priority: Major
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Running the simple streaming pull file results in a "negative acks" problem 
> because the internal pipeline has been re-architected to ack automatically. 





[GitHub] [incubator-gobblin] asfgit closed pull request #2588: GOBBLIN-721: Remove additional ack. Simplify watermark manager (remov…

2019-04-01 Thread GitBox
asfgit closed pull request #2588: GOBBLIN-721: Remove additional ack. Simplify 
watermark manager (remov…
URL: https://github.com/apache/incubator-gobblin/pull/2588
 
 
   




With regards,
Apache Git Services


[jira] [Work logged] (GOBBLIN-721) Gobblin streaming recipe is broken

2019-04-01 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/GOBBLIN-721?focusedWorklogId=221429&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-221429
 ]

ASF GitHub Bot logged work on GOBBLIN-721:
--

Author: ASF GitHub Bot
Created on: 01/Apr/19 19:11
Start Date: 01/Apr/19 19:11
Worklog Time Spent: 10m 
  Work Description: shirshanka commented on pull request #2588: 
GOBBLIN-721: Remove additional ack. Simplify watermark manager (remov…
URL: https://github.com/apache/incubator-gobblin/pull/2588
 
 
   …e multiwriter)
   
   Dear Gobblin maintainers,
   
   Please accept this PR. I understand that it will not be reviewed until I 
have checked off all the steps below!
   
   
   ### JIRA
   - [X] My PR addresses the following [Gobblin 
JIRA](https://issues.apache.org/jira/browse/GOBBLIN/) issues and references 
them in the PR title. For example, "[GOBBLIN-XXX] My Gobblin PR"
   - https://issues.apache.org/jira/browse/GOBBLIN-721
   
   
   ### Description
   - [X] Here are some details about my PR, including screenshots (if 
applicable):
 Removed additional acking now that inner pipeline acks automatically. 
   
   ### Tests
   - [X] My PR adds the following unit tests __OR__ does not need testing for 
this extremely good reason:
   TaskContinuousTest.java updated 
   
   ### Commits
   - [X] My commits all reference JIRA issues in their subject lines, and I 
have squashed multiple commits if they address the same issue. In addition, my 
commits follow the guidelines from "[How to write a good git commit 
message](http://chris.beams.io/posts/git-commit/)":
   1. Subject is separated from body by a blank line
   2. Subject is limited to 50 characters
   3. Subject does not end with a period
   4. Subject uses the imperative mood ("add", not "adding")
   5. Body wraps at 72 characters
   6. Body explains "what" and "why", not "how"
   
   
 



Issue Time Tracking
---

Worklog Id: (was: 221429)
Time Spent: 10m
Remaining Estimate: 0h

> Gobblin streaming recipe is broken
> --
>
> Key: GOBBLIN-721
> URL: https://issues.apache.org/jira/browse/GOBBLIN-721
> Project: Apache Gobblin
>  Issue Type: Bug
>  Components: gobblin-core
>Reporter: Shirshanka Das
>Assignee: Shirshanka Das
>Priority: Major
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Running the simple streaming pull file results in a "negative acks" problem 
> because the internal pipeline has been re-architected to ack automatically. 





[GitHub] [incubator-gobblin] shirshanka opened a new pull request #2588: GOBBLIN-721: Remove additional ack. Simplify watermark manager (remov…

2019-04-01 Thread GitBox
shirshanka opened a new pull request #2588: GOBBLIN-721: Remove additional ack. 
Simplify watermark manager (remov…
URL: https://github.com/apache/incubator-gobblin/pull/2588
 
 


[jira] [Created] (GOBBLIN-721) Gobblin streaming recipe is broken

2019-04-01 Thread Shirshanka Das (JIRA)
Shirshanka Das created GOBBLIN-721:
--

 Summary: Gobblin streaming recipe is broken
 Key: GOBBLIN-721
 URL: https://issues.apache.org/jira/browse/GOBBLIN-721
 Project: Apache Gobblin
  Issue Type: Bug
  Components: gobblin-core
Reporter: Shirshanka Das
Assignee: Shirshanka Das


Running the simple streaming pull file results in a "negative acks" problem 
because the internal pipeline has been re-architected to ack automatically. 





[jira] [Work logged] (GOBBLIN-712) Add version strategy for configbased dataset copy

2019-04-01 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/GOBBLIN-712?focusedWorklogId=221420&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-221420
 ]

ASF GitHub Bot logged work on GOBBLIN-712:
--

Author: ASF GitHub Bot
Created on: 01/Apr/19 18:52
Start Date: 01/Apr/19 18:52
Worklog Time Spent: 10m 
  Work Description: ibuenros commented on pull request #2579: [GOBBLIN-712] 
Add version strategy pickup for ConfigBasedDataset distcp workflow
URL: https://github.com/apache/incubator-gobblin/pull/2579#discussion_r271003747
 
 

 ##
 File path: 
gobblin-data-management/src/main/java/org/apache/gobblin/data/management/copy/replication/ConfigBasedDataset.java
 ##
 @@ -128,6 +175,17 @@ public String datasetURN() {
   return copyableFiles;
 }
 
+if (!this.srcDataFileVersionStrategy.isPresent() || 
!this.dstDataFileVersionStrategy.isPresent()) {
+  log.warn("Version strategy doesn't exist, cannot handle copy");
+  return copyableFiles;
+}
+
+if (!this.srcDataFileVersionStrategy.get().getClass().getName()
+.equals(this.dstDataFileVersionStrategy.get().getClass().getName())) {
+  log.warn("Version strategy doesn't match, cannot handle copy");
 
 Review comment:
   Let's print out the src and target strategies for easier debuggability.
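
A minimal sketch of what the reviewer is asking for: include both strategy class names in the warning so a mismatch can be diagnosed from the log alone. `StrategyMismatchMessage` is a hypothetical helper, not part of the PR. It uses plain `String.format` so the example stays self-contained.

```java
/** Hypothetical helper that builds the mismatch warning with both strategy class names. */
final class StrategyMismatchMessage {
  private StrategyMismatchMessage() {}

  /** Formats the warning text; the arguments stand in for the source and target strategies. */
  static String build(Object srcStrategy, Object dstStrategy) {
    return String.format(
        "Version strategy doesn't match, cannot handle copy: source=%s, target=%s",
        srcStrategy.getClass().getName(),
        dstStrategy.getClass().getName());
  }
}
```

In the reviewed method the resulting string would presumably be passed to `log.warn`. With a logger that supports `{}` placeholders, the two class names could be passed as arguments instead.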
 



Issue Time Tracking
---

Worklog Id: (was: 221420)
Time Spent: 2h 10m  (was: 2h)

> Add version strategy for configbased dataset copy
> -
>
> Key: GOBBLIN-712
> URL: https://issues.apache.org/jira/browse/GOBBLIN-712
> Project: Apache Gobblin
>  Issue Type: Improvement
>Reporter: Kuai Yu
>Priority: Major
>  Time Spent: 2h 10m
>  Remaining Estimate: 0h
>






[GitHub] [incubator-gobblin] ibuenros commented on a change in pull request #2579: [GOBBLIN-712] Add version strategy pickup for ConfigBasedDataset distcp workflow

2019-04-01 Thread GitBox
ibuenros commented on a change in pull request #2579: [GOBBLIN-712] Add version 
strategy pickup for ConfigBasedDataset distcp workflow
URL: https://github.com/apache/incubator-gobblin/pull/2579#discussion_r271003747
 
 



[jira] [Work logged] (GOBBLIN-712) Add version strategy for configbased dataset copy

2019-04-01 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/GOBBLIN-712?focusedWorklogId=221421&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-221421
 ]

ASF GitHub Bot logged work on GOBBLIN-712:
--

Author: ASF GitHub Bot
Created on: 01/Apr/19 18:52
Start Date: 01/Apr/19 18:52
Worklog Time Spent: 10m 
  Work Description: ibuenros commented on pull request #2579: [GOBBLIN-712] 
Add version strategy pickup for ConfigBasedDataset distcp workflow
URL: https://github.com/apache/incubator-gobblin/pull/2579#discussion_r271003600
 
 

 ##
 File path: 
gobblin-data-management/src/main/java/org/apache/gobblin/data/management/copy/replication/ConfigBasedDataset.java
 ##
 @@ -94,6 +107,40 @@ public ConfigBasedDataset(ReplicationConfiguration rc, 
Properties props, CopyRou
 this.pathFilter = DatasetUtils.instantiatePathFilter(this.props);
 this.applyFilterToDirectories =
 
Boolean.parseBoolean(this.props.getProperty(CopyConfiguration.APPLY_FILTER_TO_DIRECTORIES,
 "false"));
+this.srcDataFileVersionStrategy = 
getDataFileVersionStrategy(this.copyRoute.getCopyFrom(), rc, props);
+this.dstDataFileVersionStrategy = 
getDataFileVersionStrategy(this.copyRoute.getCopyTo(), rc, props);
+this.enforceFileLengthMatch = 
Boolean.parseBoolean(this.props.getProperty(CopyConfiguration.ENFORCE_FILE_LENGTH_MATCH,
 "true"));
+  }
+
+  /**
+   * Get the version strategy that can retrieve the data file version from the 
end point.
+   *
+   * @return the version strategy. Empty value when the version is not 
supported for this end point.
+   */
+  private Optional<DataFileVersionStrategy> 
getDataFileVersionStrategy(EndPoint endPoint, ReplicationConfiguration rc, 
Properties props) {
+if (!(endPoint instanceof HadoopFsEndPoint)) {
+  log.warn("Data file version currently only handle the Hadoop Fs EndPoint 
replication");
+  return Optional.absent();
+}
+Configuration conf = HadoopUtils.newConfiguration();
+try {
+  HadoopFsEndPoint hEndpoint = (HadoopFsEndPoint) endPoint;
+  FileSystem fs = FileSystem.get(hEndpoint.getFsURI(), conf);
+
+  // IF configStore doesn't contain the strategy, check from job 
properties.
+  // If no strategy is found, default to the modification time strategy.
+  Optional<String> versionStrategy = 
rc.getVersionStrategyFromConfigStore();
+  Config versionStrategyConfig = ConfigFactory.parseMap(ImmutableMap.of(
+  DataFileVersionStrategy.DATA_FILE_VERSION_STRATEGY_KEY, 
versionStrategy.isPresent()? versionStrategy.get() :
+  
props.getProperty(DataFileVersionStrategy.DATA_FILE_VERSION_STRATEGY_KEY,
+  ModTimeDataFileVersionStrategy.Factory.class.getName())));
+
+  DataFileVersionStrategy strategy = 
DataFileVersionStrategy.instantiateDataFileVersionStrategy(fs, 
versionStrategyConfig);
+  log.debug("{} has version strategy {}", hEndpoint.getClusterName());
+  return Optional.of(strategy);
+} catch (IOException e) {
+  return Optional.absent();
 
 Review comment:
   Should we log the exception?
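
One way to act on this comment is to log the `IOException` before returning the empty value, so the failure is not silently swallowed. The sketch below uses assumptions throughout: it uses `java.util.Optional` and `java.util.logging` in place of the Guava `Optional` and the logger in the diff, and `StrategyLoader` and `StrategyFactory` are hypothetical names.

```java
import java.io.IOException;
import java.util.Optional;
import java.util.logging.Level;
import java.util.logging.Logger;

final class StrategyLoader {
  private static final Logger LOG = Logger.getLogger(StrategyLoader.class.getName());

  private StrategyLoader() {}

  /** Hypothetical stand-in for the file-system lookup that may fail. */
  interface StrategyFactory {
    String create() throws IOException;
  }

  /** Returns the strategy, or empty after logging the failure together with its cause. */
  static Optional<String> load(String endpointName, StrategyFactory factory) {
    try {
      return Optional.of(factory.create());
    } catch (IOException e) {
      // Passing the exception to the logger preserves the stack trace.
      LOG.log(Level.WARNING, "Cannot create version strategy for " + endpointName, e);
      return Optional.empty();
    }
  }
}
```

The caller still receives an empty value, as in the reviewed code. The only change is that the cause is recorded.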
 



Issue Time Tracking
---

Worklog Id: (was: 221421)
Time Spent: 2h 10m  (was: 2h)

> Add version strategy for configbased dataset copy
> -
>
> Key: GOBBLIN-712
> URL: https://issues.apache.org/jira/browse/GOBBLIN-712
> Project: Apache Gobblin
>  Issue Type: Improvement
>Reporter: Kuai Yu
>Priority: Major
>  Time Spent: 2h 10m
>  Remaining Estimate: 0h
>






[GitHub] [incubator-gobblin] ibuenros commented on a change in pull request #2579: [GOBBLIN-712] Add version strategy pickup for ConfigBasedDataset distcp workflow

2019-04-01 Thread GitBox
ibuenros commented on a change in pull request #2579: [GOBBLIN-712] Add version 
strategy pickup for ConfigBasedDataset distcp workflow
URL: https://github.com/apache/incubator-gobblin/pull/2579#discussion_r271003600
 
 



[jira] [Work logged] (GOBBLIN-720) delete the state store whenever a flow is deleted

2019-04-01 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/GOBBLIN-720?focusedWorklogId=221409&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-221409
 ]

ASF GitHub Bot logged work on GOBBLIN-720:
--

Author: ASF GitHub Bot
Created on: 01/Apr/19 18:44
Start Date: 01/Apr/19 18:44
Worklog Time Spent: 10m 
  Work Description: sv2000 commented on pull request #2587: [GOBBLIN-720 
Always delete state store
URL: https://github.com/apache/incubator-gobblin/pull/2587#discussion_r271001196
 
 

 ##
 File path: 
gobblin-runtime/src/main/java/org/apache/gobblin/runtime/job_monitor/KafkaJobMonitor.java
 ##
 @@ -115,4 +118,17 @@ protected void processMessage(MessageAndMetadata message) {
 }
   }
 
+  private void deleteStateStore(URI jobSpecUri) throws IOException {
+String[] uriTokens = jobSpecUri.getPath().split("/");
+if (null == this.datasetStateStore) {
+  log.warn("Job state store deletion failed as datasetstore is not 
initialized.");
+  return;
+} if (uriTokens.length != 3) {
+  log.error("Invalid URI {}.", jobSpecUri);
+  return;
+}
+String jobName = uriTokens[2];
 
 Review comment:
   Maybe, String jobName = uriTokens[uriTokens.length - 1]?
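
The suggestion can be sketched as follows: take the last path token rather than a fixed index, so the lookup does not depend on the exact number of tokens. `JobNameFromUri` is a hypothetical wrapper, and the example URI shape is an assumption.

```java
import java.net.URI;

final class JobNameFromUri {
  private JobNameFromUri() {}

  /** Returns the last path segment of the URI, treated here as the job name. */
  static String jobName(URI jobSpecUri) {
    String[] uriTokens = jobSpecUri.getPath().split("/");
    return uriTokens[uriTokens.length - 1];
  }
}
```

A path of just `/` splits into an empty array, so a length check like the one in the diff should still run before indexing.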
 



Issue Time Tracking
---

Worklog Id: (was: 221409)
Time Spent: 0.5h  (was: 20m)

> delete the state store whenever a flow is deleted
> -
>
> Key: GOBBLIN-720
> URL: https://issues.apache.org/jira/browse/GOBBLIN-720
> Project: Apache Gobblin
>  Issue Type: Improvement
>Reporter: Arjun Singh Bora
>Priority: Major
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>






[jira] [Work logged] (GOBBLIN-720) delete the state store whenever a flow is deleted

2019-04-01 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/GOBBLIN-720?focusedWorklogId=221411&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-221411
 ]

ASF GitHub Bot logged work on GOBBLIN-720:
--

Author: ASF GitHub Bot
Created on: 01/Apr/19 18:44
Start Date: 01/Apr/19 18:44
Worklog Time Spent: 10m 
  Work Description: sv2000 commented on pull request #2587: [GOBBLIN-720 
Always delete state store
URL: https://github.com/apache/incubator-gobblin/pull/2587#discussion_r271001383
 
 

 ##
 File path: 
gobblin-runtime/src/main/java/org/apache/gobblin/runtime/job_monitor/KafkaJobMonitor.java
 ##
 @@ -115,4 +118,17 @@ protected void processMessage(MessageAndMetadata message) {
 }
   }
 
+  private void deleteStateStore(URI jobSpecUri) throws IOException {
+String[] uriTokens = jobSpecUri.getPath().split("/");
+if (null == this.datasetStateStore) {
+  log.warn("Job state store deletion failed as datasetstore is not 
initialized.");
+  return;
+} if (uriTokens.length != 3) {
 
 Review comment:
   Perhaps define a constant EXPECTED_NUM_URI_TOKENS = 3 ?
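
A sketch of the named constant, with a comment explaining where the value 3 comes from. The `/group/name` path shape is an assumption inferred from the length check in the diff. `UriTokenCheck` is a hypothetical class.

```java
import java.net.URI;

final class UriTokenCheck {
  /** "/group/name".split("/") yields ["", "group", "name"], hence three tokens. */
  static final int EXPECTED_NUM_URI_TOKENS = 3;

  private UriTokenCheck() {}

  /** True when the URI path splits into exactly the expected number of tokens. */
  static boolean hasExpectedShape(URI jobSpecUri) {
    return jobSpecUri.getPath().split("/").length == EXPECTED_NUM_URI_TOKENS;
  }
}
```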
 



Issue Time Tracking
---

Worklog Id: (was: 221411)
Time Spent: 40m  (was: 0.5h)

> delete the state store whenever a flow is deleted
> -
>
> Key: GOBBLIN-720
> URL: https://issues.apache.org/jira/browse/GOBBLIN-720
> Project: Apache Gobblin
>  Issue Type: Improvement
>Reporter: Arjun Singh Bora
>Priority: Major
>  Time Spent: 40m
>  Remaining Estimate: 0h
>






[GitHub] [incubator-gobblin] sv2000 commented on a change in pull request #2587: [GOBBLIN-720 Always delete state store

2019-04-01 Thread GitBox
sv2000 commented on a change in pull request #2587: [GOBBLIN-720 Always delete 
state store
URL: https://github.com/apache/incubator-gobblin/pull/2587#discussion_r271001383
 
 



[GitHub] [incubator-gobblin] ibuenros commented on issue #2582: UnitTest for KafkaSource

2019-04-01 Thread GitBox
ibuenros commented on issue #2582: UnitTest for KafkaSource
URL: 
https://github.com/apache/incubator-gobblin/pull/2582#issuecomment-478697677
 
 
   @htran1 can you merge?




[jira] [Work logged] (GOBBLIN-720) delete the state store whenever a flow is deleted

2019-04-01 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/GOBBLIN-720?focusedWorklogId=221407&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-221407
 ]

ASF GitHub Bot logged work on GOBBLIN-720:
--

Author: ASF GitHub Bot
Created on: 01/Apr/19 18:42
Start Date: 01/Apr/19 18:42
Worklog Time Spent: 10m 
  Work Description: sv2000 commented on pull request #2587: [GOBBLIN-720 
Always delete state store
URL: https://github.com/apache/incubator-gobblin/pull/2587#discussion_r271000502
 
 

 ##
 File path: 
gobblin-runtime/src/main/java/org/apache/gobblin/runtime/job_monitor/KafkaJobMonitor.java
 ##
 @@ -115,4 +118,17 @@ protected void processMessage(MessageAndMetadata message) {
 }
   }
 
+  private void deleteStateStore(URI jobSpecUri) throws IOException {
 
 Review comment:
   Can you add Javadoc to explain the behavior of deleteStateStore? Also, 
provide an example of what jobSpecUri should look like? 
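
The requested Javadoc might look roughly like the sketch below. The example URI path `/myGroup/myJob` is an assumption inferred from the three-token check elsewhere in this PR. `JobSpecUriDoc` and `jobNameOf` are hypothetical names, and `java.util.Optional` stands in for whatever return convention the real method uses.

```java
import java.net.URI;
import java.util.Optional;

final class JobSpecUriDoc {
  private JobSpecUriDoc() {}

  /**
   * Extracts the name of the job whose state store should be deleted.
   *
   * <p>Assumed example: a URI with path {@code /myGroup/myJob} splits on "/" into
   * {@code ["", "myGroup", "myJob"]}; the third token, {@code myJob}, is the job name.
   *
   * @param jobSpecUri URI of the job spec being deleted
   * @return the job name, or empty if the path does not have exactly three tokens
   */
  static Optional<String> jobNameOf(URI jobSpecUri) {
    String[] uriTokens = jobSpecUri.getPath().split("/");
    return uriTokens.length == 3 ? Optional.of(uriTokens[2]) : Optional.empty();
  }
}
```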
 



Issue Time Tracking
---

Worklog Id: (was: 221407)
Time Spent: 20m  (was: 10m)

> delete the state store whenever a flow is deleted
> -
>
> Key: GOBBLIN-720
> URL: https://issues.apache.org/jira/browse/GOBBLIN-720
> Project: Apache Gobblin
>  Issue Type: Improvement
>Reporter: Arjun Singh Bora
>Priority: Major
>  Time Spent: 20m
>  Remaining Estimate: 0h
>






[GitHub] [incubator-gobblin] sv2000 commented on a change in pull request #2587: [GOBBLIN-720 Always delete state store

2019-04-01 Thread GitBox
sv2000 commented on a change in pull request #2587: [GOBBLIN-720 Always delete 
state store
URL: https://github.com/apache/incubator-gobblin/pull/2587#discussion_r271001196
 
 



[jira] [Work logged] (GOBBLIN-720) delete the state store whenever a flow is deleted

2019-04-01 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/GOBBLIN-720?focusedWorklogId=221353&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-221353
 ]

ASF GitHub Bot logged work on GOBBLIN-720:
--

Author: ASF GitHub Bot
Created on: 01/Apr/19 16:54
Start Date: 01/Apr/19 16:54
Worklog Time Spent: 10m 
  Work Description: arjun4084346 commented on pull request #2587: 
[GOBBLIN-720 Always delete state store
URL: https://github.com/apache/incubator-gobblin/pull/2587
 
 
   Dear Gobblin maintainers,
   
   Please accept this PR. I understand that it will not be reviewed until I 
have checked off all the steps below! @sv2000 please review
   
   
   ### JIRA
   - [x] My PR addresses the following [Gobblin 
JIRA](https://issues.apache.org/jira/browse/GOBBLIN/) issues and references 
them in the PR title. For example, "[GOBBLIN-XXX] My Gobblin PR"
   - https://issues.apache.org/jira/browse/GOBBLIN-XXX
   
   
   ### Description
   - [x] Here are some details about my PR, including screenshots (if 
applicable):
   It is a partial rollback of 
https://github.com/apache/incubator-gobblin/commit/ccd7ba769308e720db33ea800d964df43df4e878.
   This PR cleans the state store whenever a delete-flow request is received 
by the cluster manager.
   
   ### Tests
   - [x] My PR adds the following unit tests __OR__ does not need testing for 
this extremely good reason:
   tests already in place
   
   ### Commits
   - [x] My commits all reference JIRA issues in their subject lines, and I 
have squashed multiple commits if they address the same issue. In addition, my 
commits follow the guidelines from "[How to write a good git commit 
message](http://chris.beams.io/posts/git-commit/)":
   1. Subject is separated from body by a blank line
   2. Subject is limited to 50 characters
   3. Subject does not end with a period
   4. Subject uses the imperative mood ("add", not "adding")
   5. Body wraps at 72 characters
   6. Body explains "what" and "why", not "how"
   
   
 



Issue Time Tracking
---

Worklog Id: (was: 221353)
Time Spent: 10m
Remaining Estimate: 0h

> delete the state store whenever a flow is deleted
> -
>
> Key: GOBBLIN-720
> URL: https://issues.apache.org/jira/browse/GOBBLIN-720
> Project: Apache Gobblin
>  Issue Type: Improvement
>Reporter: Arjun Singh Bora
>Priority: Major
>  Time Spent: 10m
>  Remaining Estimate: 0h
>






[GitHub] [incubator-gobblin] arjun4084346 opened a new pull request #2587: [GOBBLIN-720 Always delete state store

2019-04-01 Thread GitBox
arjun4084346 opened a new pull request #2587: [GOBBLIN-720 Always delete state 
store
URL: https://github.com/apache/incubator-gobblin/pull/2587
 
 


[jira] [Created] (GOBBLIN-720) delete the state store whenever a flow is deleted

2019-04-01 Thread Arjun Singh Bora (JIRA)
Arjun Singh Bora created GOBBLIN-720:


 Summary: delete the state store whenever a flow is deleted
 Key: GOBBLIN-720
 URL: https://issues.apache.org/jira/browse/GOBBLIN-720
 Project: Apache Gobblin
  Issue Type: Improvement
Reporter: Arjun Singh Bora





