[jira] [Resolved] (GOBBLIN-721) Gobblin streaming recipe is broken
[ https://issues.apache.org/jira/browse/GOBBLIN-721?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hung Tran resolved GOBBLIN-721. --- Resolution: Fixed Fix Version/s: 0.15.0 Issue resolved by pull request #2588 [https://github.com/apache/incubator-gobblin/pull/2588] > Gobblin streaming recipe is broken > -- > > Key: GOBBLIN-721 > URL: https://issues.apache.org/jira/browse/GOBBLIN-721 > Project: Apache Gobblin > Issue Type: Bug > Components: gobblin-core >Reporter: Shirshanka Das >Assignee: Shirshanka Das >Priority: Major > Fix For: 0.15.0 > > Time Spent: 20m > Remaining Estimate: 0h > > Running the simple streaming pull file results in a "negative acks" problem > because the internal pipeline has been re-architected to ack automatically. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (GOBBLIN-721) Gobblin streaming recipe is broken
[ https://issues.apache.org/jira/browse/GOBBLIN-721?focusedWorklogId=221607=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-221607 ] ASF GitHub Bot logged work on GOBBLIN-721: -- Author: ASF GitHub Bot Created on: 02/Apr/19 04:20 Start Date: 02/Apr/19 04:20 Worklog Time Spent: 10m Work Description: asfgit commented on pull request #2588: GOBBLIN-721: Remove additional ack. Simplify watermark manager (remov… URL: https://github.com/apache/incubator-gobblin/pull/2588 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 221607) Time Spent: 20m (was: 10m) > Gobblin streaming recipe is broken > -- > > Key: GOBBLIN-721 > URL: https://issues.apache.org/jira/browse/GOBBLIN-721 > Project: Apache Gobblin > Issue Type: Bug > Components: gobblin-core >Reporter: Shirshanka Das >Assignee: Shirshanka Das >Priority: Major > Time Spent: 20m > Remaining Estimate: 0h > > Running the simple streaming pull file results in a "negative acks" problem > because the internal pipeline has been re-architected to ack automatically. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[GitHub] [incubator-gobblin] asfgit closed pull request #2588: GOBBLIN-721: Remove additional ack. Simplify watermark manager (remov…
asfgit closed pull request #2588: GOBBLIN-721: Remove additional ack. Simplify watermark manager (remov… URL: https://github.com/apache/incubator-gobblin/pull/2588 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[jira] [Work logged] (GOBBLIN-721) Gobblin streaming recipe is broken
[ https://issues.apache.org/jira/browse/GOBBLIN-721?focusedWorklogId=221429=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-221429 ] ASF GitHub Bot logged work on GOBBLIN-721: -- Author: ASF GitHub Bot Created on: 01/Apr/19 19:11 Start Date: 01/Apr/19 19:11 Worklog Time Spent: 10m Work Description: shirshanka commented on pull request #2588: GOBBLIN-721: Remove additional ack. Simplify watermark manager (remov… URL: https://github.com/apache/incubator-gobblin/pull/2588 …e multiwriter) Dear Gobblin maintainers, Please accept this PR. I understand that it will not be reviewed until I have checked off all the steps below! ### JIRA - [X] My PR addresses the following [Gobblin JIRA](https://issues.apache.org/jira/browse/GOBBLIN/) issues and references them in the PR title. For example, "[GOBBLIN-XXX] My Gobblin PR" - https://issues.apache.org/jira/browse/GOBBLIN-721 ### Description - [X] Here are some details about my PR, including screenshots (if applicable): Removed additional acking now that inner pipeline acks automatically. ### Tests - [X] My PR adds the following unit tests __OR__ does not need testing for this extremely good reason: TaskContinuousTest.java updated ### Commits - [X] My commits all reference JIRA issues in their subject lines, and I have squashed multiple commits if they address the same issue. In addition, my commits follow the guidelines from "[How to write a good git commit message](http://chris.beams.io/posts/git-commit/)": 1. Subject is separated from body by a blank line 2. Subject is limited to 50 characters 3. Subject does not end with a period 4. Subject uses the imperative mood ("add", not "adding") 5. Body wraps at 72 characters 6. Body explains "what" and "why", not "how" This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 221429) Time Spent: 10m Remaining Estimate: 0h > Gobblin streaming recipe is broken > -- > > Key: GOBBLIN-721 > URL: https://issues.apache.org/jira/browse/GOBBLIN-721 > Project: Apache Gobblin > Issue Type: Bug > Components: gobblin-core >Reporter: Shirshanka Das >Assignee: Shirshanka Das >Priority: Major > Time Spent: 10m > Remaining Estimate: 0h > > Running the simple streaming pull file results in a "negative acks" problem > because the internal pipeline has been re-architected to ack automatically. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[GitHub] [incubator-gobblin] shirshanka opened a new pull request #2588: GOBBLIN-721: Remove additional ack. Simplify watermark manager (remov…
shirshanka opened a new pull request #2588: GOBBLIN-721: Remove additional ack. Simplify watermark manager (remov… URL: https://github.com/apache/incubator-gobblin/pull/2588 …e multiwriter) Dear Gobblin maintainers, Please accept this PR. I understand that it will not be reviewed until I have checked off all the steps below! ### JIRA - [X] My PR addresses the following [Gobblin JIRA](https://issues.apache.org/jira/browse/GOBBLIN/) issues and references them in the PR title. For example, "[GOBBLIN-XXX] My Gobblin PR" - https://issues.apache.org/jira/browse/GOBBLIN-721 ### Description - [X] Here are some details about my PR, including screenshots (if applicable): Removed additional acking now that inner pipeline acks automatically. ### Tests - [X] My PR adds the following unit tests __OR__ does not need testing for this extremely good reason: TaskContinuousTest.java updated ### Commits - [X] My commits all reference JIRA issues in their subject lines, and I have squashed multiple commits if they address the same issue. In addition, my commits follow the guidelines from "[How to write a good git commit message](http://chris.beams.io/posts/git-commit/)": 1. Subject is separated from body by a blank line 2. Subject is limited to 50 characters 3. Subject does not end with a period 4. Subject uses the imperative mood ("add", not "adding") 5. Body wraps at 72 characters 6. Body explains "what" and "why", not "how" This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[jira] [Created] (GOBBLIN-721) Gobblin streaming recipe is broken
Shirshanka Das created GOBBLIN-721: -- Summary: Gobblin streaming recipe is broken Key: GOBBLIN-721 URL: https://issues.apache.org/jira/browse/GOBBLIN-721 Project: Apache Gobblin Issue Type: Bug Components: gobblin-core Reporter: Shirshanka Das Assignee: Shirshanka Das Running the simple streaming pull file results in a "negative acks" problem because the internal pipeline has been re-architected to ack automatically. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (GOBBLIN-712) Add version strategy for configbased dataset copy
[ https://issues.apache.org/jira/browse/GOBBLIN-712?focusedWorklogId=221420=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-221420 ] ASF GitHub Bot logged work on GOBBLIN-712: -- Author: ASF GitHub Bot Created on: 01/Apr/19 18:52 Start Date: 01/Apr/19 18:52 Worklog Time Spent: 10m Work Description: ibuenros commented on pull request #2579: [GOBBLIN-712] Add version strategy pickup for ConfigBasedDataset distcp workflow URL: https://github.com/apache/incubator-gobblin/pull/2579#discussion_r271003747 ## File path: gobblin-data-management/src/main/java/org/apache/gobblin/data/management/copy/replication/ConfigBasedDataset.java ## @@ -128,6 +175,17 @@ public String datasetURN() { return copyableFiles; } +if (!this.srcDataFileVersionStrategy.isPresent() || !this.dstDataFileVersionStrategy.isPresent()) { + log.warn("Version strategy doesn't exist, cannot handle copy"); + return copyableFiles; +} + +if (!this.srcDataFileVersionStrategy.get().getClass().getName() +.equals(this.dstDataFileVersionStrategy.get().getClass().getName())) { + log.warn("Version strategy doesn't match, cannot handle copy"); Review comment: Let's print out the src and target strategies for easier debuggability. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 221420) Time Spent: 2h 10m (was: 2h) > Add version strategy for configbased dataset copy > - > > Key: GOBBLIN-712 > URL: https://issues.apache.org/jira/browse/GOBBLIN-712 > Project: Apache Gobblin > Issue Type: Improvement >Reporter: Kuai Yu >Priority: Major > Time Spent: 2h 10m > Remaining Estimate: 0h > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[GitHub] [incubator-gobblin] ibuenros commented on a change in pull request #2579: [GOBBLIN-712] Add version strategy pickup for ConfigBasedDataset distcp workflow
ibuenros commented on a change in pull request #2579: [GOBBLIN-712] Add version strategy pickup for ConfigBasedDataset distcp workflow URL: https://github.com/apache/incubator-gobblin/pull/2579#discussion_r271003747 ## File path: gobblin-data-management/src/main/java/org/apache/gobblin/data/management/copy/replication/ConfigBasedDataset.java ## @@ -128,6 +175,17 @@ public String datasetURN() { return copyableFiles; } +if (!this.srcDataFileVersionStrategy.isPresent() || !this.dstDataFileVersionStrategy.isPresent()) { + log.warn("Version strategy doesn't exist, cannot handle copy"); + return copyableFiles; +} + +if (!this.srcDataFileVersionStrategy.get().getClass().getName() +.equals(this.dstDataFileVersionStrategy.get().getClass().getName())) { + log.warn("Version strategy doesn't match, cannot handle copy"); Review comment: Let's print out the src and target strategies for easier debuggability. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[jira] [Work logged] (GOBBLIN-712) Add version strategy for configbased dataset copy
[ https://issues.apache.org/jira/browse/GOBBLIN-712?focusedWorklogId=221421=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-221421 ] ASF GitHub Bot logged work on GOBBLIN-712: -- Author: ASF GitHub Bot Created on: 01/Apr/19 18:52 Start Date: 01/Apr/19 18:52 Worklog Time Spent: 10m Work Description: ibuenros commented on pull request #2579: [GOBBLIN-712] Add version strategy pickup for ConfigBasedDataset distcp workflow URL: https://github.com/apache/incubator-gobblin/pull/2579#discussion_r271003600 ## File path: gobblin-data-management/src/main/java/org/apache/gobblin/data/management/copy/replication/ConfigBasedDataset.java ## @@ -94,6 +107,40 @@ public ConfigBasedDataset(ReplicationConfiguration rc, Properties props, CopyRou this.pathFilter = DatasetUtils.instantiatePathFilter(this.props); this.applyFilterToDirectories = Boolean.parseBoolean(this.props.getProperty(CopyConfiguration.APPLY_FILTER_TO_DIRECTORIES, "false")); +this.srcDataFileVersionStrategy = getDataFileVersionStrategy(this.copyRoute.getCopyFrom(), rc, props); +this.dstDataFileVersionStrategy = getDataFileVersionStrategy(this.copyRoute.getCopyTo(), rc, props); +this.enforceFileLengthMatch = Boolean.parseBoolean(this.props.getProperty(CopyConfiguration.ENFORCE_FILE_LENGTH_MATCH, "true")); + } + + /** + * Get the version strategy that can retrieve the data file version from the end point. + * + * @return the version strategy. Empty value when the version is not supported for this end point. + */ + private Optional getDataFileVersionStrategy(EndPoint endPoint, ReplicationConfiguration rc, Properties props) { +if (!(endPoint instanceof HadoopFsEndPoint)) { + log.warn("Data file version currently only handle the Hadoop Fs EndPoint replication"); + return Optional.absent(); +} +Configuration conf = HadoopUtils.newConfiguration(); +try { + HadoopFsEndPoint hEndpoint = (HadoopFsEndPoint) endPoint; + FileSystem fs = FileSystem.get(hEndpoint.getFsURI(), conf); + + // IF configStore doesn't contain the strategy, check from job properties. + // If no strategy is found, default to the modification time strategy. + Optional versionStrategy = rc.getVersionStrategyFromConfigStore(); + Config versionStrategyConfig = ConfigFactory.parseMap(ImmutableMap.of( + DataFileVersionStrategy.DATA_FILE_VERSION_STRATEGY_KEY, versionStrategy.isPresent()? versionStrategy.get() : + props.getProperty(DataFileVersionStrategy.DATA_FILE_VERSION_STRATEGY_KEY, + ModTimeDataFileVersionStrategy.Factory.class.getName(; + + DataFileVersionStrategy strategy = DataFileVersionStrategy.instantiateDataFileVersionStrategy(fs, versionStrategyConfig); + log.debug("{} has version strategy {}", hEndpoint.getClusterName()); + return Optional.of(strategy); +} catch (IOException e) { + return Optional.absent(); Review comment: Should we log the exception? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 221421) Time Spent: 2h 10m (was: 2h) > Add version strategy for configbased dataset copy > - > > Key: GOBBLIN-712 > URL: https://issues.apache.org/jira/browse/GOBBLIN-712 > Project: Apache Gobblin > Issue Type: Improvement >Reporter: Kuai Yu >Priority: Major > Time Spent: 2h 10m > Remaining Estimate: 0h > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[GitHub] [incubator-gobblin] ibuenros commented on a change in pull request #2579: [GOBBLIN-712] Add version strategy pickup for ConfigBasedDataset distcp workflow
ibuenros commented on a change in pull request #2579: [GOBBLIN-712] Add version strategy pickup for ConfigBasedDataset distcp workflow URL: https://github.com/apache/incubator-gobblin/pull/2579#discussion_r271003600 ## File path: gobblin-data-management/src/main/java/org/apache/gobblin/data/management/copy/replication/ConfigBasedDataset.java ## @@ -94,6 +107,40 @@ public ConfigBasedDataset(ReplicationConfiguration rc, Properties props, CopyRou this.pathFilter = DatasetUtils.instantiatePathFilter(this.props); this.applyFilterToDirectories = Boolean.parseBoolean(this.props.getProperty(CopyConfiguration.APPLY_FILTER_TO_DIRECTORIES, "false")); +this.srcDataFileVersionStrategy = getDataFileVersionStrategy(this.copyRoute.getCopyFrom(), rc, props); +this.dstDataFileVersionStrategy = getDataFileVersionStrategy(this.copyRoute.getCopyTo(), rc, props); +this.enforceFileLengthMatch = Boolean.parseBoolean(this.props.getProperty(CopyConfiguration.ENFORCE_FILE_LENGTH_MATCH, "true")); + } + + /** + * Get the version strategy that can retrieve the data file version from the end point. + * + * @return the version strategy. Empty value when the version is not supported for this end point. + */ + private Optional getDataFileVersionStrategy(EndPoint endPoint, ReplicationConfiguration rc, Properties props) { +if (!(endPoint instanceof HadoopFsEndPoint)) { + log.warn("Data file version currently only handle the Hadoop Fs EndPoint replication"); + return Optional.absent(); +} +Configuration conf = HadoopUtils.newConfiguration(); +try { + HadoopFsEndPoint hEndpoint = (HadoopFsEndPoint) endPoint; + FileSystem fs = FileSystem.get(hEndpoint.getFsURI(), conf); + + // IF configStore doesn't contain the strategy, check from job properties. + // If no strategy is found, default to the modification time strategy. + Optional versionStrategy = rc.getVersionStrategyFromConfigStore(); + Config versionStrategyConfig = ConfigFactory.parseMap(ImmutableMap.of( + DataFileVersionStrategy.DATA_FILE_VERSION_STRATEGY_KEY, versionStrategy.isPresent()? versionStrategy.get() : + props.getProperty(DataFileVersionStrategy.DATA_FILE_VERSION_STRATEGY_KEY, + ModTimeDataFileVersionStrategy.Factory.class.getName(; + + DataFileVersionStrategy strategy = DataFileVersionStrategy.instantiateDataFileVersionStrategy(fs, versionStrategyConfig); + log.debug("{} has version strategy {}", hEndpoint.getClusterName()); + return Optional.of(strategy); +} catch (IOException e) { + return Optional.absent(); Review comment: Should we log the exception? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[jira] [Work logged] (GOBBLIN-720) delete the state store whenever a flow is deleted
[ https://issues.apache.org/jira/browse/GOBBLIN-720?focusedWorklogId=221409=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-221409 ] ASF GitHub Bot logged work on GOBBLIN-720: -- Author: ASF GitHub Bot Created on: 01/Apr/19 18:44 Start Date: 01/Apr/19 18:44 Worklog Time Spent: 10m Work Description: sv2000 commented on pull request #2587: [GOBBLIN-720 Always delete state store URL: https://github.com/apache/incubator-gobblin/pull/2587#discussion_r271001196 ## File path: gobblin-runtime/src/main/java/org/apache/gobblin/runtime/job_monitor/KafkaJobMonitor.java ## @@ -115,4 +118,17 @@ protected void processMessage(MessageAndMetadata message) { } } + private void deleteStateStore(URI jobSpecUri) throws IOException { +String[] uriTokens = jobSpecUri.getPath().split("/"); +if (null == this.datasetStateStore) { + log.warn("Job state store deletion failed as datasetstore is not initialized."); + return; +} if (uriTokens.length != 3) { + log.error("Invalid URI {}.", jobSpecUri); + return; +} +String jobName = uriTokens[2]; Review comment: Maybe, String jobName = uriTokens[uriTokens.length - 1]? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 221409) Time Spent: 0.5h (was: 20m) > delete the state store whenever a flow is deleted > - > > Key: GOBBLIN-720 > URL: https://issues.apache.org/jira/browse/GOBBLIN-720 > Project: Apache Gobblin > Issue Type: Improvement >Reporter: Arjun Singh Bora >Priority: Major > Time Spent: 0.5h > Remaining Estimate: 0h > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (GOBBLIN-720) delete the state store whenever a flow is deleted
[ https://issues.apache.org/jira/browse/GOBBLIN-720?focusedWorklogId=221411=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-221411 ] ASF GitHub Bot logged work on GOBBLIN-720: -- Author: ASF GitHub Bot Created on: 01/Apr/19 18:44 Start Date: 01/Apr/19 18:44 Worklog Time Spent: 10m Work Description: sv2000 commented on pull request #2587: [GOBBLIN-720 Always delete state store URL: https://github.com/apache/incubator-gobblin/pull/2587#discussion_r271001383 ## File path: gobblin-runtime/src/main/java/org/apache/gobblin/runtime/job_monitor/KafkaJobMonitor.java ## @@ -115,4 +118,17 @@ protected void processMessage(MessageAndMetadata message) { } } + private void deleteStateStore(URI jobSpecUri) throws IOException { +String[] uriTokens = jobSpecUri.getPath().split("/"); +if (null == this.datasetStateStore) { + log.warn("Job state store deletion failed as datasetstore is not initialized."); + return; +} if (uriTokens.length != 3) { Review comment: Perhaps define a constant EXPECTED_NUM_URI_TOKENS = 3 ? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 221411) Time Spent: 40m (was: 0.5h) > delete the state store whenever a flow is deleted > - > > Key: GOBBLIN-720 > URL: https://issues.apache.org/jira/browse/GOBBLIN-720 > Project: Apache Gobblin > Issue Type: Improvement >Reporter: Arjun Singh Bora >Priority: Major > Time Spent: 40m > Remaining Estimate: 0h > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[GitHub] [incubator-gobblin] sv2000 commented on a change in pull request #2587: [GOBBLIN-720 Always delete state store
sv2000 commented on a change in pull request #2587: [GOBBLIN-720 Always delete state store URL: https://github.com/apache/incubator-gobblin/pull/2587#discussion_r271001383 ## File path: gobblin-runtime/src/main/java/org/apache/gobblin/runtime/job_monitor/KafkaJobMonitor.java ## @@ -115,4 +118,17 @@ protected void processMessage(MessageAndMetadata message) { } } + private void deleteStateStore(URI jobSpecUri) throws IOException { +String[] uriTokens = jobSpecUri.getPath().split("/"); +if (null == this.datasetStateStore) { + log.warn("Job state store deletion failed as datasetstore is not initialized."); + return; +} if (uriTokens.length != 3) { Review comment: Perhaps define a constant EXPECTED_NUM_URI_TOKENS = 3 ? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [incubator-gobblin] ibuenros commented on issue #2582: UnitTest for KafkaSource
ibuenros commented on issue #2582: UnitTest for KafkaSource URL: https://github.com/apache/incubator-gobblin/pull/2582#issuecomment-478697677 @htran1 can you merge? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[jira] [Work logged] (GOBBLIN-720) delete the state store whenever a flow is deleted
[ https://issues.apache.org/jira/browse/GOBBLIN-720?focusedWorklogId=221407=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-221407 ] ASF GitHub Bot logged work on GOBBLIN-720: -- Author: ASF GitHub Bot Created on: 01/Apr/19 18:42 Start Date: 01/Apr/19 18:42 Worklog Time Spent: 10m Work Description: sv2000 commented on pull request #2587: [GOBBLIN-720 Always delete state store URL: https://github.com/apache/incubator-gobblin/pull/2587#discussion_r271000502 ## File path: gobblin-runtime/src/main/java/org/apache/gobblin/runtime/job_monitor/KafkaJobMonitor.java ## @@ -115,4 +118,17 @@ protected void processMessage(MessageAndMetadata message) { } } + private void deleteStateStore(URI jobSpecUri) throws IOException { Review comment: Can you add Javadoc to explain the behavior of deleteStateStore? Also, provide an example of what jobSpecUri should look like? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 221407) Time Spent: 20m (was: 10m) > delete the state store whenever a flow is deleted > - > > Key: GOBBLIN-720 > URL: https://issues.apache.org/jira/browse/GOBBLIN-720 > Project: Apache Gobblin > Issue Type: Improvement >Reporter: Arjun Singh Bora >Priority: Major > Time Spent: 20m > Remaining Estimate: 0h > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[GitHub] [incubator-gobblin] sv2000 commented on a change in pull request #2587: [GOBBLIN-720 Always delete state store
sv2000 commented on a change in pull request #2587: [GOBBLIN-720 Always delete state store URL: https://github.com/apache/incubator-gobblin/pull/2587#discussion_r271001196 ## File path: gobblin-runtime/src/main/java/org/apache/gobblin/runtime/job_monitor/KafkaJobMonitor.java ## @@ -115,4 +118,17 @@ protected void processMessage(MessageAndMetadata message) { } } + private void deleteStateStore(URI jobSpecUri) throws IOException { +String[] uriTokens = jobSpecUri.getPath().split("/"); +if (null == this.datasetStateStore) { + log.warn("Job state store deletion failed as datasetstore is not initialized."); + return; +} if (uriTokens.length != 3) { + log.error("Invalid URI {}.", jobSpecUri); + return; +} +String jobName = uriTokens[2]; Review comment: Maybe, String jobName = uriTokens[uriTokens.length - 1]? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[jira] [Work logged] (GOBBLIN-720) delete the state store whenever a flow is deleted
[ https://issues.apache.org/jira/browse/GOBBLIN-720?focusedWorklogId=221353=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-221353 ] ASF GitHub Bot logged work on GOBBLIN-720: -- Author: ASF GitHub Bot Created on: 01/Apr/19 16:54 Start Date: 01/Apr/19 16:54 Worklog Time Spent: 10m Work Description: arjun4084346 commented on pull request #2587: [GOBBLIN-720 Always delete state store URL: https://github.com/apache/incubator-gobblin/pull/2587 Dear Gobblin maintainers, Please accept this PR. I understand that it will not be reviewed until I have checked off all the steps below! @sv2000 please review ### JIRA - [x] My PR addresses the following [Gobblin JIRA](https://issues.apache.org/jira/browse/GOBBLIN/) issues and references them in the PR title. For example, "[GOBBLIN-XXX] My Gobblin PR" - https://issues.apache.org/jira/browse/GOBBLIN-XXX ### Description - [x] Here are some details about my PR, including screenshots (if applicable): it is a partial rollback of https://github.com/apache/incubator-gobblin/commit/ccd7ba769308e720db33ea800d964df43df4e878. this pr will clean the state store whenever delete flow request is received by cluster manager ### Tests - [x] My PR adds the following unit tests __OR__ does not need testing for this extremely good reason: tests already in place ### Commits - [x] My commits all reference JIRA issues in their subject lines, and I have squashed multiple commits if they address the same issue. In addition, my commits follow the guidelines from "[How to write a good git commit message](http://chris.beams.io/posts/git-commit/)": 1. Subject is separated from body by a blank line 2. Subject is limited to 50 characters 3. Subject does not end with a period 4. Subject uses the imperative mood ("add", not "adding") 5. Body wraps at 72 characters 6. Body explains "what" and "why", not "how" This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 221353) Time Spent: 10m Remaining Estimate: 0h > delete the state store whenever a flow is deleted > - > > Key: GOBBLIN-720 > URL: https://issues.apache.org/jira/browse/GOBBLIN-720 > Project: Apache Gobblin > Issue Type: Improvement >Reporter: Arjun Singh Bora >Priority: Major > Time Spent: 10m > Remaining Estimate: 0h > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[GitHub] [incubator-gobblin] arjun4084346 opened a new pull request #2587: [GOBBLIN-720 Always delete state store
arjun4084346 opened a new pull request #2587: [GOBBLIN-720 Always delete state store URL: https://github.com/apache/incubator-gobblin/pull/2587 Dear Gobblin maintainers, Please accept this PR. I understand that it will not be reviewed until I have checked off all the steps below! @sv2000 please review ### JIRA - [x] My PR addresses the following [Gobblin JIRA](https://issues.apache.org/jira/browse/GOBBLIN/) issues and references them in the PR title. For example, "[GOBBLIN-XXX] My Gobblin PR" - https://issues.apache.org/jira/browse/GOBBLIN-XXX ### Description - [x] Here are some details about my PR, including screenshots (if applicable): it is a partial rollback of https://github.com/apache/incubator-gobblin/commit/ccd7ba769308e720db33ea800d964df43df4e878. this pr will clean the state store whenever delete flow request is received by cluster manager ### Tests - [x] My PR adds the following unit tests __OR__ does not need testing for this extremely good reason: tests already in place ### Commits - [x] My commits all reference JIRA issues in their subject lines, and I have squashed multiple commits if they address the same issue. In addition, my commits follow the guidelines from "[How to write a good git commit message](http://chris.beams.io/posts/git-commit/)": 1. Subject is separated from body by a blank line 2. Subject is limited to 50 characters 3. Subject does not end with a period 4. Subject uses the imperative mood ("add", not "adding") 5. Body wraps at 72 characters 6. Body explains "what" and "why", not "how" This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[jira] [Created] (GOBBLIN-720) delete the state store whenever a flow is deleted
Arjun Singh Bora created GOBBLIN-720: Summary: delete the state store whenever a flow is deleted Key: GOBBLIN-720 URL: https://issues.apache.org/jira/browse/GOBBLIN-720 Project: Apache Gobblin Issue Type: Improvement Reporter: Arjun Singh Bora -- This message was sent by Atlassian JIRA (v7.6.3#76005)