[jira] [Work logged] (GOBBLIN-707) combine & standardize all gobblin scripts into one master script & restructure configs accordingly
[ https://issues.apache.org/jira/browse/GOBBLIN-707?focusedWorklogId=231807=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-231807 ]

ASF GitHub Bot logged work on GOBBLIN-707:
------------------------------------------

Author: ASF GitHub Bot
Created on: 24/Apr/19 00:45
Start Date: 24/Apr/19 00:45
Worklog Time Spent: 10m
Work Description: jhsenjaliya commented on issue #2578: [GOBBLIN-707] rewrite gobblin script to combine all modes and command
URL: https://github.com/apache/incubator-gobblin/pull/2578#issuecomment-486024787

@autumnust , updated the docs and also added new info in the doc regarding the usage of gobblin.sh, please take a look when you get a chance. Thanks

This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at: us...@infra.apache.org

Issue Time Tracking
-------------------

Worklog Id: (was: 231807)
Time Spent: 5h 20m (was: 5h 10m)

> combine & standardize all gobblin scripts into one master script & restructure configs accordingly
> --------------------------------------------------------------------------------------------------
>
> Key: GOBBLIN-707
> URL: https://issues.apache.org/jira/browse/GOBBLIN-707
> Project: Apache Gobblin
> Issue Type: Improvement
> Reporter: Jay Sen
> Priority: Major
> Time Spent: 5h 20m
> Remaining Estimate: 0h
>
> Gobblin supports multiple modes of execution (CLI, standalone, cluster-master, cluster-worker, AWS, YARN, MR) and various command-line utilities to run CLI and admin commands. There is an individual script for each of them.
> Having individual scripts introduces a lot of issues:
> # all scripts handle gobblin variables and user parameters differently, and the handling is highly inconsistent across the gobblin scripts
> # functionality around start, stop, status checking and PID handling, among many other things, varies widely with the implementation of each script.
> # features like GC & JVM params, log4j file selection, classpath calculation, etc. exist in some gobblin scripts but not all, adding to an inconsistent user experience.
> # maintaining a total of 13 scripts would be too much effort.
> Also, all the gobblin scripts share a lot of common code to handle params, start and stop services, status checks, PID handling, etc. Combining all the scripts into one not only makes maintenance easier but also brings clarity and consistency.
>
> Solution:
> 1. there can be one gobblin.sh script to handle all gobblin commands and deployment options as per the following signature. NOTE: This
> {{gobblin.sh }}
> {{gobblin.sh }}
> {{commands values: admin, cli, statestore-check, statestore-clean, historystore-manager, classpath}}
> {{service values: standalone, cluster-master, cluster-worker, aws, yarn, mr, service}}
> with the above change, the following become valid commands.
> {code:java}
> # all under GobblinCli class
> gobblin run listQuickApps -> gobblin cli run listQuickApps
> gobblin run listQuickApps -> gobblin cli run listQuickApps
> gobblin run -> gobblin cli run
> # class: JobStateToJsonConverter
> statestore-checker.sh -> gobblin statestore-checker
> # class: StateStoreCleaner
> statestore-clean.sh -> gobblin statestore-clean
> # class: DatabaseJobHistoryStoreSchemaManager
> historystore-manager.sh -> gobblin historystore-manager
> # class: Cli
> gobblin-admin.sh -> gobblin admin
> # all gobblin deployment modes
> gobblin-cluster-master.sh -> gobblin cluster-master start|stop|status
> gobblin-cluster-worker.sh -> gobblin cluster-master start|stop|status
> gobblin-compaction.sh -> gobblin cluster-master start|stop|status
> gobblin-env.sh -> gobblin cluster-master start|stop|status
> gobblin-mapreduce.sh -> gobblin cluster-master start|stop|status
> gobblin-service.sh -> gobblin cluster-master start|stop|status
> gobblin-standalone.sh -> gobblin cluster-master start|stop|status
> gobblin-yarn.sh -> gobblin cluster-master start|stop|status
> {code}
>
> 2. Also, configs need to be structured and deduplicated accordingly to make it clear which config will be picked up for which execution mode.
>
> {color:#FF}
> NOTE: this refactoring to gobblin.sh changes the way all gobblin commands were run before{color}

--
This message was sent by Atlassian JIRA (v7.6.3#76005)
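The command/service split proposed in GOBBLIN-707 can be pictured with a small routing sketch. This is only an illustration in Java, not the actual gobblin.sh logic: the class name `GobblinDispatchSketch` and the `route` helper are hypothetical, and the only values taken from the issue are the listed command names, service names, and the `start|stop|status` actions.

```java
import java.util.Arrays;
import java.util.List;

public class GobblinDispatchSketch {
    // Command and service values as listed in the issue description.
    static final List<String> COMMANDS = Arrays.asList(
        "admin", "cli", "statestore-check", "statestore-clean", "historystore-manager", "classpath");
    static final List<String> SERVICES = Arrays.asList(
        "standalone", "cluster-master", "cluster-worker", "aws", "yarn", "mr", "service");
    static final List<String> ACTIONS = Arrays.asList("start", "stop", "status");

    /** Hypothetical helper: describes how a single entry point might route its arguments. */
    static String route(String... args) {
        if (args.length == 0) {
            return "usage";
        }
        String first = args[0];
        if (COMMANDS.contains(first)) {
            // Commands pass the remaining arguments through to the tool.
            return "command:" + first;
        }
        if (SERVICES.contains(first) && args.length >= 2 && ACTIONS.contains(args[1])) {
            // Services require one of start|stop|status as the second argument.
            return "service:" + first + ":" + args[1];
        }
        return "usage";
    }

    public static void main(String[] args) {
        System.out.println(route("cli", "run", "listQuickApps"));
        System.out.println(route("cluster-master", "start"));
        System.out.println(route("yarn"));
    }
}
```

Keeping the two value lists in one place is what would let every mode share the same argument validation, which is the consistency the issue asks for.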
[jira] [Work logged] (GOBBLIN-753) Refactor HiveRegistrationPolicyBase to make ConfigStore object available in extending class
[ https://issues.apache.org/jira/browse/GOBBLIN-753?focusedWorklogId=231809=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-231809 ] ASF GitHub Bot logged work on GOBBLIN-753: -- Author: ASF GitHub Bot Created on: 24/Apr/19 00:47 Start Date: 24/Apr/19 00:47 Worklog Time Spent: 10m Work Description: autumnust commented on issue #2618: [GOBBLIN-753] Refactor HiveRegistrationPolicyBase to surface configStore object URL: https://github.com/apache/incubator-gobblin/pull/2618#issuecomment-486025154 @ibuenros @htran1 Can you take a look ? Thanks. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 231809) Time Spent: 20m (was: 10m) > Refactor HiveRegistrationPolicyBase to make ConfigStore object available in > extending class > --- > > Key: GOBBLIN-753 > URL: https://issues.apache.org/jira/browse/GOBBLIN-753 > Project: Apache Gobblin > Issue Type: Improvement >Reporter: Lei Sun >Priority: Major > Time Spent: 20m > Remaining Estimate: 0h > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[GitHub] [incubator-gobblin] autumnust commented on issue #2618: [GOBBLIN-753] Refactor HiveRegistrationPolicyBase to surface configStore object
autumnust commented on issue #2618: [GOBBLIN-753] Refactor HiveRegistrationPolicyBase to surface configStore object URL: https://github.com/apache/incubator-gobblin/pull/2618#issuecomment-486025154 @ibuenros @htran1 Can you take a look ? Thanks. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[jira] [Work logged] (GOBBLIN-753) Refactor HiveRegistrationPolicyBase to make ConfigStore object available in extending class
[ https://issues.apache.org/jira/browse/GOBBLIN-753?focusedWorklogId=231808=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-231808 ]

ASF GitHub Bot logged work on GOBBLIN-753:
------------------------------------------

Author: ASF GitHub Bot
Created on: 24/Apr/19 00:47
Start Date: 24/Apr/19 00:47
Worklog Time Spent: 10m
Work Description: autumnust commented on pull request #2618: [GOBBLIN-753] Refactor HiveRegistrationPolicyBase to surface configStore object
URL: https://github.com/apache/incubator-gobblin/pull/2618

Dear Gobblin maintainers,

Please accept this PR. I understand that it will not be reviewed until I have checked off all the steps below!

Some refactoring in `HiveRegistrationPolicyBase` to make topic-specific configStore object available in extension class

### JIRA
- [x] My PR addresses the following [Gobblin JIRA]
  - https://issues.apache.org/jira/browse/GOBBLIN-753

### Description
- [x] Here are some details about my PR, including screenshots (if applicable):

### Tests
- [x] My PR adds the following unit tests __OR__ does not need testing for this extremely good reason:

### Commits
- [x] My commits all reference JIRA issues in their subject lines, and I have squashed multiple commits if they address the same issue. In addition, my commits follow the guidelines from "[How to write a good git commit message](http://chris.beams.io/posts/git-commit/)":
  1. Subject is separated from body by a blank line
  2. Subject is limited to 50 characters
  3. Subject does not end with a period
  4. Subject uses the imperative mood ("add", not "adding")
  5. Body wraps at 72 characters
  6. Body explains "what" and "why", not "how"

This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at: us...@infra.apache.org

Issue Time Tracking
-------------------

Worklog Id: (was: 231808)
Time Spent: 10m
Remaining Estimate: 0h

> Refactor HiveRegistrationPolicyBase to make ConfigStore object available in extending class
> -------------------------------------------------------------------------------------------
>
> Key: GOBBLIN-753
> URL: https://issues.apache.org/jira/browse/GOBBLIN-753
> Project: Apache Gobblin
> Issue Type: Improvement
> Reporter: Lei Sun
> Priority: Major
> Time Spent: 10m
> Remaining Estimate: 0h
>

--
This message was sent by Atlassian JIRA (v7.6.3#76005)
[GitHub] [incubator-gobblin] autumnust opened a new pull request #2618: [GOBBLIN-753] Refactor HiveRegistrationPolicyBase to surface configStore object
autumnust opened a new pull request #2618: [GOBBLIN-753] Refactor HiveRegistrationPolicyBase to surface configStore object
URL: https://github.com/apache/incubator-gobblin/pull/2618

Dear Gobblin maintainers,

Please accept this PR. I understand that it will not be reviewed until I have checked off all the steps below!

Some refactoring in `HiveRegistrationPolicyBase` to make topic-specific configStore object available in extension class

### JIRA
- [x] My PR addresses the following [Gobblin JIRA]
  - https://issues.apache.org/jira/browse/GOBBLIN-753

### Description
- [x] Here are some details about my PR, including screenshots (if applicable):

### Tests
- [x] My PR adds the following unit tests __OR__ does not need testing for this extremely good reason:

### Commits
- [x] My commits all reference JIRA issues in their subject lines, and I have squashed multiple commits if they address the same issue. In addition, my commits follow the guidelines from "[How to write a good git commit message](http://chris.beams.io/posts/git-commit/)":
  1. Subject is separated from body by a blank line
  2. Subject is limited to 50 characters
  3. Subject does not end with a period
  4. Subject uses the imperative mood ("add", not "adding")
  5. Body wraps at 72 characters
  6. Body explains "what" and "why", not "how"

This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at: us...@infra.apache.org

With regards,
Apache Git Services
[jira] [Created] (GOBBLIN-753) Refactor HiveRegistrationPolicyBase to make ConfigStore object available in extending class
Lei Sun created GOBBLIN-753: --- Summary: Refactor HiveRegistrationPolicyBase to make ConfigStore object available in extending class Key: GOBBLIN-753 URL: https://issues.apache.org/jira/browse/GOBBLIN-753 Project: Apache Gobblin Issue Type: Improvement Reporter: Lei Sun -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[GitHub] [incubator-gobblin] jhsenjaliya commented on issue #2578: [GOBBLIN-707] rewrite gobblin script to combine all modes and command
jhsenjaliya commented on issue #2578: [GOBBLIN-707] rewrite gobblin script to combine all modes and command URL: https://github.com/apache/incubator-gobblin/pull/2578#issuecomment-486024787 @autumnust , updated docs and also added new info in doc regarding the usage of gobblin.sh, please take a look when you get chance. Thanks This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[jira] [Work logged] (GOBBLIN-752) Throttling server incorrectly marks permit numbers as unsatisfiable
[ https://issues.apache.org/jira/browse/GOBBLIN-752?focusedWorklogId=231791=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-231791 ] ASF GitHub Bot logged work on GOBBLIN-752: -- Author: ASF GitHub Bot Created on: 24/Apr/19 00:27 Start Date: 24/Apr/19 00:27 Worklog Time Spent: 10m Work Description: asfgit commented on pull request #2617: [GOBBLIN-752] Fix a bug in QPS throttling policy where it was incorrectly indicatin… URL: https://github.com/apache/incubator-gobblin/pull/2617 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 231791) Time Spent: 20m (was: 10m) > Throttling server incorrectly marks permit numbers as unsatisfiable > --- > > Key: GOBBLIN-752 > URL: https://issues.apache.org/jira/browse/GOBBLIN-752 > Project: Apache Gobblin > Issue Type: Improvement >Reporter: Issac Buenrostro >Assignee: Issac Buenrostro >Priority: Major > Fix For: 0.15.0 > > Time Spent: 20m > Remaining Estimate: 0h > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (GOBBLIN-746) Loading FlowSpecs asynchronously while initializing GobblinServiceManager
[ https://issues.apache.org/jira/browse/GOBBLIN-746?focusedWorklogId=231787=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-231787 ]

ASF GitHub Bot logged work on GOBBLIN-746:
------------------------------------------

Author: ASF GitHub Bot
Created on: 24/Apr/19 00:22
Start Date: 24/Apr/19 00:22
Worklog Time Spent: 10m
Work Description: sv2000 commented on pull request #2611: [GOBBLIN-746] Async loading FlowSpec
URL: https://github.com/apache/incubator-gobblin/pull/2611#discussion_r277917952

## File path: gobblin-service/src/main/java/org/apache/gobblin/service/modules/scheduler/GobblinServiceJobScheduler.java ##
@@ -136,6 +136,46 @@ public synchronized void setActive(boolean isActive) {
     }
   }
 
+  /**
+   * Load all {@link FlowSpec}s from {@link FlowCatalog} as one of the initialization step,
+   * and make schedulers be aware of that.
+   *
+   */
+  private void scheduleSpecsFromCatalog() {
+    Iterator specUris = null;
+    long startTime = System.currentTimeMillis();
+
+    try {
+      specUris = this.flowCatalog.get().getSpecURIs();
+    } catch (SpecSerDeException ssde) {
+      throw new RuntimeException("Failed to get the iterator of all Spec URIS", ssde);
+    }
+
+
+    try {
+      while (specUris.hasNext()) {
+        Spec spec = null;
+        try {
+          spec = this.flowCatalog.get().getSpec(specUris.next());
+        } catch (SpecNotFoundException snfe) {
+          _log.error(String.format("The URI %s discovered in SpecStore is missing in FlowCatlog"
+              + ", suspecting current modification on SpecStore", specUris.next()), snfe);
+        }
+
+        //Disable FLOW_RUN_IMMEDIATELY on service startup or leadership change
+        if (spec instanceof FlowSpec) {
+          Spec modifiedSpec = disableFlowRunImmediatelyOnStart((FlowSpec) spec);
+          onAddSpec(modifiedSpec);
+        } else {
+          onAddSpec(spec);
+        }
+      }
+    } finally {
+      flowSpecInitFinished.countDown();

Review comment:
Is this countdown latch being used only in the test case? Your test case is waiting for at most 2 secs anyway. Can you simply do an AssertWithBackoff in your test case?

This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at: us...@infra.apache.org

Issue Time Tracking
-------------------

Worklog Id: (was: 231787)
Time Spent: 0.5h (was: 20m)

> Loading FlowSpecs asynchronously while initializing GobblinServiceManager
> -------------------------------------------------------------------------
>
> Key: GOBBLIN-746
> URL: https://issues.apache.org/jira/browse/GOBBLIN-746
> Project: Apache Gobblin
> Issue Type: Improvement
> Reporter: Lei Sun
> Priority: Major
> Time Spent: 0.5h
> Remaining Estimate: 0h
>

--
This message was sent by Atlassian JIRA (v7.6.3#76005)
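The reviewer's suggestion, polling in the test instead of exposing a latch from production code, can be sketched roughly as follows. This is a plain-Java stand-in written for illustration, not Gobblin's `AssertWithBackoff` utility, whose API is not shown in this thread. The names `BackoffAssertSketch` and `awaitWithBackoff` are hypothetical, and the 2-second budget simply mirrors the wait mentioned in the review comment.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.function.BooleanSupplier;

public class BackoffAssertSketch {
    /**
     * Polls the condition until it holds or timeoutMillis elapses, doubling the
     * sleep between attempts. Returns true if the condition held in time.
     */
    static boolean awaitWithBackoff(BooleanSupplier condition, long timeoutMillis, long initialSleepMillis) {
        long deadline = System.currentTimeMillis() + timeoutMillis;
        long sleep = Math.max(1, initialSleepMillis);
        while (true) {
            if (condition.getAsBoolean()) {
                return true;
            }
            long remaining = deadline - System.currentTimeMillis();
            if (remaining <= 0) {
                return false;
            }
            try {
                Thread.sleep(Math.min(sleep, remaining));
            } catch (InterruptedException e) {
                // Restore the interrupt flag and give up on the wait.
                Thread.currentThread().interrupt();
                return false;
            }
            sleep *= 2;
        }
    }

    public static void main(String[] args) throws InterruptedException {
        // Stand-in for "all specs have been scheduled"; set by a background thread.
        AtomicBoolean specsLoaded = new AtomicBoolean(false);
        Thread loader = new Thread(() -> {
            try {
                Thread.sleep(50);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            specsLoaded.set(true);
        });
        loader.start();
        boolean ok = awaitWithBackoff(specsLoaded::get, 2000, 10);
        loader.join();
        if (!ok) {
            throw new AssertionError("specs were not loaded within 2 seconds");
        }
    }
}
```

With this shape the test observes state the scheduler already exposes, so no test-only synchronization field has to live in the production class.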
[jira] [Work logged] (GOBBLIN-751) Make enforced file size matching to be configurable
[ https://issues.apache.org/jira/browse/GOBBLIN-751?focusedWorklogId=231786=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-231786 ] ASF GitHub Bot logged work on GOBBLIN-751: -- Author: ASF GitHub Bot Created on: 24/Apr/19 00:20 Start Date: 24/Apr/19 00:20 Worklog Time Spent: 10m Work Description: ibuenros commented on issue #2616: [GOBBLIN-751] Make enforced file size matching to be configurable URL: https://github.com/apache/incubator-gobblin/pull/2616#issuecomment-486020826 +1 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 231786) Time Spent: 50m (was: 40m) > Make enforced file size matching to be configurable > --- > > Key: GOBBLIN-751 > URL: https://issues.apache.org/jira/browse/GOBBLIN-751 > Project: Apache Gobblin > Issue Type: Improvement >Reporter: Kuai Yu >Priority: Major > Time Spent: 50m > Remaining Estimate: 0h > > Make enforced file size matching to be configurable -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Resolved] (GOBBLIN-752) Throttling server incorrectly marks permit numbers as unsatisfiable
[ https://issues.apache.org/jira/browse/GOBBLIN-752?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Issac Buenrostro resolved GOBBLIN-752. -- Resolution: Fixed Fix Version/s: 0.15.0 Issue resolved by pull request #2617 [https://github.com/apache/incubator-gobblin/pull/2617] > Throttling server incorrectly marks permit numbers as unsatisfiable > --- > > Key: GOBBLIN-752 > URL: https://issues.apache.org/jira/browse/GOBBLIN-752 > Project: Apache Gobblin > Issue Type: Improvement >Reporter: Issac Buenrostro >Assignee: Issac Buenrostro >Priority: Major > Fix For: 0.15.0 > > Time Spent: 20m > Remaining Estimate: 0h > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[GitHub] [incubator-gobblin] asfgit closed pull request #2617: [GOBBLIN-752] Fix a bug in QPS throttling policy where it was incorrectly indicatin…
asfgit closed pull request #2617: [GOBBLIN-752] Fix a bug in QPS throttling policy where it was incorrectly indicatin… URL: https://github.com/apache/incubator-gobblin/pull/2617 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[jira] [Work logged] (GOBBLIN-746) Loading FlowSpecs asynchronously while initializing GobblinServiceManager
[ https://issues.apache.org/jira/browse/GOBBLIN-746?focusedWorklogId=231789=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-231789 ]

ASF GitHub Bot logged work on GOBBLIN-746:
------------------------------------------

Author: ASF GitHub Bot
Created on: 24/Apr/19 00:22
Start Date: 24/Apr/19 00:22
Worklog Time Spent: 10m
Work Description: sv2000 commented on pull request #2611: [GOBBLIN-746] Async loading FlowSpec
URL: https://github.com/apache/incubator-gobblin/pull/2611#discussion_r277899864

## File path: gobblin-runtime/src/main/java/org/apache/gobblin/runtime/api/SpecStore.java ##
@@ -105,4 +107,16 @@
    * @throws IOException Exception in retrieving {@link Spec}s.
    */
   Collection getSpecs() throws IOException;
+
+  /**
+   * Return an iterator of Spec's URI(Spec's identifier)

Review comment:
Modify Spec's URI to Spec URIs (Spec identifiers)?

This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at: us...@infra.apache.org

Issue Time Tracking
-------------------

Worklog Id: (was: 231789)
Time Spent: 50m (was: 40m)

> Loading FlowSpecs asynchronously while initializing GobblinServiceManager
> -------------------------------------------------------------------------
>
> Key: GOBBLIN-746
> URL: https://issues.apache.org/jira/browse/GOBBLIN-746
> Project: Apache Gobblin
> Issue Type: Improvement
> Reporter: Lei Sun
> Priority: Major
> Time Spent: 50m
> Remaining Estimate: 0h
>

--
This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (GOBBLIN-746) Loading FlowSpecs asynchronously while initializing GobblinServiceManager
[ https://issues.apache.org/jira/browse/GOBBLIN-746?focusedWorklogId=231788=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-231788 ]

ASF GitHub Bot logged work on GOBBLIN-746:
------------------------------------------

Author: ASF GitHub Bot
Created on: 24/Apr/19 00:22
Start Date: 24/Apr/19 00:22
Worklog Time Spent: 10m
Work Description: sv2000 commented on pull request #2611: [GOBBLIN-746] Async loading FlowSpec
URL: https://github.com/apache/incubator-gobblin/pull/2611#discussion_r277892394

## File path: gobblin-runtime/src/main/java/org/apache/gobblin/runtime/api/SpecSerDeException.java ##
@@ -0,0 +1,52 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.gobblin.runtime.api;
+
+import java.net.URI;
+
+/**
+ * An exception when {@link Spec} cannot be correctly serialized/deserialized from underlying storage.
+ */
+public class SpecSerDeException extends Exception{

Review comment:
Minor nit. Should there be a space between Exception and "{"?

This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at: us...@infra.apache.org

Issue Time Tracking
-------------------

Worklog Id: (was: 231788)
Time Spent: 40m (was: 0.5h)

> Loading FlowSpecs asynchronously while initializing GobblinServiceManager
> -------------------------------------------------------------------------
>
> Key: GOBBLIN-746
> URL: https://issues.apache.org/jira/browse/GOBBLIN-746
> Project: Apache Gobblin
> Issue Type: Improvement
> Reporter: Lei Sun
> Priority: Major
> Time Spent: 40m
> Remaining Estimate: 0h
>

--
This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (GOBBLIN-752) Throttling server incorrectly marks permit numbers as unsatisfiable
Issac Buenrostro created GOBBLIN-752: Summary: Throttling server incorrectly marks permit numbers as unsatisfiable Key: GOBBLIN-752 URL: https://issues.apache.org/jira/browse/GOBBLIN-752 Project: Apache Gobblin Issue Type: Improvement Reporter: Issac Buenrostro Assignee: Issac Buenrostro -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[GitHub] [incubator-gobblin] sv2000 commented on a change in pull request #2611: [GOBBLIN-746] Async loading FlowSpec
sv2000 commented on a change in pull request #2611: [GOBBLIN-746] Async loading FlowSpec
URL: https://github.com/apache/incubator-gobblin/pull/2611#discussion_r277892394

## File path: gobblin-runtime/src/main/java/org/apache/gobblin/runtime/api/SpecSerDeException.java ##
@@ -0,0 +1,52 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.gobblin.runtime.api;
+
+import java.net.URI;
+
+/**
+ * An exception when {@link Spec} cannot be correctly serialized/deserialized from underlying storage.
+ */
+public class SpecSerDeException extends Exception{

Review comment:
Minor nit. Should there be a space between Exception and "{"?

This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at: us...@infra.apache.org

With regards,
Apache Git Services
[GitHub] [incubator-gobblin] sv2000 commented on a change in pull request #2611: [GOBBLIN-746] Async loading FlowSpec
sv2000 commented on a change in pull request #2611: [GOBBLIN-746] Async loading FlowSpec
URL: https://github.com/apache/incubator-gobblin/pull/2611#discussion_r277917952

## File path: gobblin-service/src/main/java/org/apache/gobblin/service/modules/scheduler/GobblinServiceJobScheduler.java ##
@@ -136,6 +136,46 @@ public synchronized void setActive(boolean isActive) {
     }
   }
 
+  /**
+   * Load all {@link FlowSpec}s from {@link FlowCatalog} as one of the initialization step,
+   * and make schedulers be aware of that.
+   *
+   */
+  private void scheduleSpecsFromCatalog() {
+    Iterator specUris = null;
+    long startTime = System.currentTimeMillis();
+
+    try {
+      specUris = this.flowCatalog.get().getSpecURIs();
+    } catch (SpecSerDeException ssde) {
+      throw new RuntimeException("Failed to get the iterator of all Spec URIS", ssde);
+    }
+
+
+    try {
+      while (specUris.hasNext()) {
+        Spec spec = null;
+        try {
+          spec = this.flowCatalog.get().getSpec(specUris.next());
+        } catch (SpecNotFoundException snfe) {
+          _log.error(String.format("The URI %s discovered in SpecStore is missing in FlowCatlog"
+              + ", suspecting current modification on SpecStore", specUris.next()), snfe);
+        }
+
+        //Disable FLOW_RUN_IMMEDIATELY on service startup or leadership change
+        if (spec instanceof FlowSpec) {
+          Spec modifiedSpec = disableFlowRunImmediatelyOnStart((FlowSpec) spec);
+          onAddSpec(modifiedSpec);
+        } else {
+          onAddSpec(spec);
+        }
+      }
+    } finally {
+      flowSpecInitFinished.countDown();

Review comment:
Is this countdown latch being used only in the test case? Your test case is waiting for at most 2 secs anyway. Can you simply do an AssertWithBackoff in your test case?

This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at: us...@infra.apache.org

With regards,
Apache Git Services
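A side observation on the loop quoted in this comment: `specUris.next()` is called a second time inside the `catch` block, so the log line would report the following URI rather than the one that failed, that following element would be skipped, and the call would throw `NoSuchElementException` if the iterator were already exhausted. The sketch below shows the usual remedy of reading the element once per iteration. It is a stand-alone illustration in which string URIs and a `Map` stand in for the real `FlowCatalog`, and the names `IteratorOnceSketch` and `findMissing` are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Map;

public class IteratorOnceSketch {
    /** Returns the URIs that were listed but have no entry in the catalog. */
    static List<String> findMissing(Iterator<String> uris, Map<String, String> catalog) {
        List<String> missing = new ArrayList<>();
        while (uris.hasNext()) {
            // Advance the iterator exactly once and reuse the value below.
            String uri = uris.next();
            String spec = catalog.get(uri);
            if (spec == null) {
                // The same uri that failed the lookup is the one reported.
                missing.add(uri);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        Map<String, String> catalog = new HashMap<>();
        catalog.put("flow:/a", "specA");
        catalog.put("flow:/c", "specC");
        List<String> listed = Arrays.asList("flow:/a", "flow:/b", "flow:/c");
        System.out.println(findMissing(listed.iterator(), catalog));
    }
}
```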
[GitHub] [incubator-gobblin] sv2000 commented on a change in pull request #2611: [GOBBLIN-746] Async loading FlowSpec
sv2000 commented on a change in pull request #2611: [GOBBLIN-746] Async loading FlowSpec
URL: https://github.com/apache/incubator-gobblin/pull/2611#discussion_r277899864

## File path: gobblin-runtime/src/main/java/org/apache/gobblin/runtime/api/SpecStore.java ##
@@ -105,4 +107,16 @@
    * @throws IOException Exception in retrieving {@link Spec}s.
    */
   Collection getSpecs() throws IOException;
+
+  /**
+   * Return an iterator of Spec's URI(Spec's identifier)

Review comment:
Modify Spec's URI to Spec URIs (Spec identifiers)?

This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at: us...@infra.apache.org

With regards,
Apache Git Services
[GitHub] [incubator-gobblin] ibuenros commented on issue #2616: [GOBBLIN-751] Make enforced file size matching to be configurable
ibuenros commented on issue #2616: [GOBBLIN-751] Make enforced file size matching to be configurable URL: https://github.com/apache/incubator-gobblin/pull/2616#issuecomment-486020826 +1 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[jira] [Work logged] (GOBBLIN-751) Make enforced file size matching to be configurable
[ https://issues.apache.org/jira/browse/GOBBLIN-751?focusedWorklogId=231780=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-231780 ]

ASF GitHub Bot logged work on GOBBLIN-751:
------------------------------------------

Author: ASF GitHub Bot
Created on: 24/Apr/19 00:06
Start Date: 24/Apr/19 00:06
Worklog Time Spent: 10m
Work Description: yukuai518 commented on pull request #2616: [GOBBLIN-751] Make enforced file size matching to be configurable
URL: https://github.com/apache/incubator-gobblin/pull/2616#discussion_r277915613

## File path: gobblin-utility/src/main/java/org/apache/gobblin/util/filesystem/DataFileVersionStrategy.java ##
@@ -65,12 +65,13 @@
   }

   String DATA_FILE_VERSION_STRATEGY_KEY = "org.apache.gobblin.dataFileVersionStrategy";
+  String DEFAULT_DATA_FILE_VERSION_STAREGY = "modtime";

Review comment:
Fixed.

This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at: us...@infra.apache.org

Issue Time Tracking
-------------------

Worklog Id: (was: 231780)
Time Spent: 40m (was: 0.5h)

> Make enforced file size matching to be configurable
> ---------------------------------------------------
>
> Key: GOBBLIN-751
> URL: https://issues.apache.org/jira/browse/GOBBLIN-751
> Project: Apache Gobblin
> Issue Type: Improvement
> Reporter: Kuai Yu
> Priority: Major
> Time Spent: 40m
> Remaining Estimate: 0h
>
> Make enforced file size matching to be configurable

--
This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (GOBBLIN-751) Make enforced file size matching to be configurable
[ https://issues.apache.org/jira/browse/GOBBLIN-751?focusedWorklogId=231777=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-231777 ] ASF GitHub Bot logged work on GOBBLIN-751: -- Author: ASF GitHub Bot Created on: 23/Apr/19 23:56 Start Date: 23/Apr/19 23:56 Worklog Time Spent: 10m Work Description: yukuai518 commented on issue #2616: [GOBBLIN-751] Make enforced file size matching to be configurable URL: https://github.com/apache/incubator-gobblin/pull/2616#issuecomment-486016085 @ibuenros please help review this. This will help us onboard a few datasets for some validation while the rest of datasets are untouched. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 231777) Time Spent: 20m (was: 10m) > Make enforced file size matching to be configurable > --- > > Key: GOBBLIN-751 > URL: https://issues.apache.org/jira/browse/GOBBLIN-751 > Project: Apache Gobblin > Issue Type: Improvement >Reporter: Kuai Yu >Priority: Major > Time Spent: 20m > Remaining Estimate: 0h > > Make enforced file size matching to be configurable -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[GitHub] [incubator-gobblin] yukuai518 commented on a change in pull request #2616: [GOBBLIN-751] Make enforced file size matching to be configurable
yukuai518 commented on a change in pull request #2616: [GOBBLIN-751] Make enforced file size matching to be configurable
URL: https://github.com/apache/incubator-gobblin/pull/2616#discussion_r277915613

## File path: gobblin-utility/src/main/java/org/apache/gobblin/util/filesystem/DataFileVersionStrategy.java ##
@@ -65,12 +65,13 @@
   }

   String DATA_FILE_VERSION_STRATEGY_KEY = "org.apache.gobblin.dataFileVersionStrategy";
+  String DEFAULT_DATA_FILE_VERSION_STAREGY = "modtime";

Review comment:
Fixed.

This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at: us...@infra.apache.org

With regards,
Apache Git Services
[jira] [Work logged] (GOBBLIN-751) Make enforced file size matching to be configurable
[ https://issues.apache.org/jira/browse/GOBBLIN-751?focusedWorklogId=231779=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-231779 ] ASF GitHub Bot logged work on GOBBLIN-751: -- Author: ASF GitHub Bot Created on: 24/Apr/19 00:02 Start Date: 24/Apr/19 00:02 Worklog Time Spent: 10m Work Description: ibuenros commented on pull request #2616: [GOBBLIN-751] Make enforced file size matching to be configurable URL: https://github.com/apache/incubator-gobblin/pull/2616#discussion_r277914931 ## File path: gobblin-utility/src/main/java/org/apache/gobblin/util/filesystem/DataFileVersionStrategy.java ## @@ -65,12 +65,13 @@ } String DATA_FILE_VERSION_STRATEGY_KEY = "org.apache.gobblin.dataFileVersionStrategy"; + String DEFAULT_DATA_FILE_VERSION_STAREGY = "modtime"; Review comment: can you fix the spelling of the key name? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 231779) Time Spent: 0.5h (was: 20m) > Make enforced file size matching to be configurable > --- > > Key: GOBBLIN-751 > URL: https://issues.apache.org/jira/browse/GOBBLIN-751 > Project: Apache Gobblin > Issue Type: Improvement >Reporter: Kuai Yu >Priority: Major > Time Spent: 0.5h > Remaining Estimate: 0h > > Make enforced file size matching to be configurable -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[GitHub] [incubator-gobblin] ibuenros commented on a change in pull request #2616: [GOBBLIN-751] Make enforced file size matching to be configurable
ibuenros commented on a change in pull request #2616: [GOBBLIN-751] Make enforced file size matching to be configurable URL: https://github.com/apache/incubator-gobblin/pull/2616#discussion_r277914931 ## File path: gobblin-utility/src/main/java/org/apache/gobblin/util/filesystem/DataFileVersionStrategy.java ## @@ -65,12 +65,13 @@ } String DATA_FILE_VERSION_STRATEGY_KEY = "org.apache.gobblin.dataFileVersionStrategy"; + String DEFAULT_DATA_FILE_VERSION_STAREGY = "modtime"; Review comment: can you fix the spelling of the key name? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[jira] [Work logged] (GOBBLIN-751) Make enforced file size matching to be configurable
[ https://issues.apache.org/jira/browse/GOBBLIN-751?focusedWorklogId=231776=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-231776 ] ASF GitHub Bot logged work on GOBBLIN-751: -- Author: ASF GitHub Bot Created on: 23/Apr/19 23:51 Start Date: 23/Apr/19 23:51 Worklog Time Spent: 10m Work Description: yukuai518 commented on pull request #2616: [GOBBLIN-751] Make enforced file size matching to be configurable URL: https://github.com/apache/incubator-gobblin/pull/2616 Make enforced file size matching to be configurable. Dear Gobblin maintainers, Please accept this PR. I understand that it will not be reviewed until I have checked off all the steps below! ### JIRA - [x] My PR addresses the following [Gobblin JIRA](https://issues.apache.org/jira/browse/GOBBLIN/) issues and references them in the PR title. For example, "[GOBBLIN-XXX] My Gobblin PR" - https://issues.apache.org/jira/browse/GOBBLIN-751 ### Description - [x] Here are some details about my PR, including screenshots (if applicable): This PR makes 'enforced file size matching' to be configurable when we copy data files. This PR also make the dataFileVersionStrategy to be configurable for different dataset during the publisher phase. ### Tests - [x] My PR adds the following unit tests __OR__ does not need testing for this extremely good reason: ### Commits - [ ] My commits all reference JIRA issues in their subject lines, and I have squashed multiple commits if they address the same issue. In addition, my commits follow the guidelines from "[How to write a good git commit message](http://chris.beams.io/posts/git-commit/)": 1. Subject is separated from body by a blank line 2. Subject is limited to 50 characters 3. Subject does not end with a period 4. Subject uses the imperative mood ("add", not "adding") 5. Body wraps at 72 characters 6. Body explains "what" and "why", not "how" This is an automated message from the Apache Git Service. 
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 231776) Time Spent: 10m Remaining Estimate: 0h > Make enforced file size matching to be configurable > --- > > Key: GOBBLIN-751 > URL: https://issues.apache.org/jira/browse/GOBBLIN-751 > Project: Apache Gobblin > Issue Type: Improvement >Reporter: Kuai Yu >Priority: Major > Time Spent: 10m > Remaining Estimate: 0h > > Make enforced file size matching to be configurable -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (GOBBLIN-752) Throttling server incorrectly marks permit numbers as unsatisfiable
[ https://issues.apache.org/jira/browse/GOBBLIN-752?focusedWorklogId=231778=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-231778 ] ASF GitHub Bot logged work on GOBBLIN-752: -- Author: ASF GitHub Bot Created on: 23/Apr/19 23:57 Start Date: 23/Apr/19 23:57 Worklog Time Spent: 10m Work Description: ibuenros commented on pull request #2617: [GOBBLIN-752] Fix a bug in QPS throttling policy where it was incorrectly indicatin… URL: https://github.com/apache/incubator-gobblin/pull/2617 …g permits were impossible to satisfy. Dear Gobblin maintainers, Please accept this PR. I understand that it will not be reviewed until I have checked off all the steps below! ### JIRA - [ ] My PR addresses the following [Gobblin JIRA](https://issues.apache.org/jira/browse/GOBBLIN/) issues and references them in the PR title. For example, "[GOBBLIN-XXX] My Gobblin PR" - https://issues.apache.org/jira/browse/GOBBLIN-XXX ### Description - [ ] Here are some details about my PR, including screenshots (if applicable): ### Tests - [ ] My PR adds the following unit tests __OR__ does not need testing for this extremely good reason: ### Commits - [ ] My commits all reference JIRA issues in their subject lines, and I have squashed multiple commits if they address the same issue. In addition, my commits follow the guidelines from "[How to write a good git commit message](http://chris.beams.io/posts/git-commit/)": 1. Subject is separated from body by a blank line 2. Subject is limited to 50 characters 3. Subject does not end with a period 4. Subject uses the imperative mood ("add", not "adding") 5. Body wraps at 72 characters 6. Body explains "what" and "why", not "how" This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 231778) Time Spent: 10m Remaining Estimate: 0h > Throttling server incorrectly marks permit numbers as unsatisfiable > --- > > Key: GOBBLIN-752 > URL: https://issues.apache.org/jira/browse/GOBBLIN-752 > Project: Apache Gobblin > Issue Type: Improvement >Reporter: Issac Buenrostro >Assignee: Issac Buenrostro >Priority: Major > Time Spent: 10m > Remaining Estimate: 0h > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[GitHub] [incubator-gobblin] ibuenros opened a new pull request #2617: [GOBBLIN-752] Fix a bug in QPS throttling policy where it was incorrectly indicatin…
ibuenros opened a new pull request #2617: [GOBBLIN-752] Fix a bug in QPS throttling policy where it was incorrectly indicatin… URL: https://github.com/apache/incubator-gobblin/pull/2617 …g permits were impossible to satisfy. Dear Gobblin maintainers, Please accept this PR. I understand that it will not be reviewed until I have checked off all the steps below! ### JIRA - [ ] My PR addresses the following [Gobblin JIRA](https://issues.apache.org/jira/browse/GOBBLIN/) issues and references them in the PR title. For example, "[GOBBLIN-XXX] My Gobblin PR" - https://issues.apache.org/jira/browse/GOBBLIN-XXX ### Description - [ ] Here are some details about my PR, including screenshots (if applicable): ### Tests - [ ] My PR adds the following unit tests __OR__ does not need testing for this extremely good reason: ### Commits - [ ] My commits all reference JIRA issues in their subject lines, and I have squashed multiple commits if they address the same issue. In addition, my commits follow the guidelines from "[How to write a good git commit message](http://chris.beams.io/posts/git-commit/)": 1. Subject is separated from body by a blank line 2. Subject is limited to 50 characters 3. Subject does not end with a period 4. Subject uses the imperative mood ("add", not "adding") 5. Body wraps at 72 characters 6. Body explains "what" and "why", not "how" This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [incubator-gobblin] yukuai518 commented on issue #2616: [GOBBLIN-751] Make enforced file size matching to be configurable
yukuai518 commented on issue #2616: [GOBBLIN-751] Make enforced file size matching to be configurable URL: https://github.com/apache/incubator-gobblin/pull/2616#issuecomment-486016085 @ibuenros please help review this. This will help us onboard a few datasets for some validation while the rest of datasets are untouched. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [incubator-gobblin] yukuai518 opened a new pull request #2616: [GOBBLIN-751] Make enforced file size matching to be configurable
yukuai518 opened a new pull request #2616: [GOBBLIN-751] Make enforced file size matching to be configurable URL: https://github.com/apache/incubator-gobblin/pull/2616 Make enforced file size matching to be configurable. Dear Gobblin maintainers, Please accept this PR. I understand that it will not be reviewed until I have checked off all the steps below! ### JIRA - [x] My PR addresses the following [Gobblin JIRA](https://issues.apache.org/jira/browse/GOBBLIN/) issues and references them in the PR title. For example, "[GOBBLIN-XXX] My Gobblin PR" - https://issues.apache.org/jira/browse/GOBBLIN-751 ### Description - [x] Here are some details about my PR, including screenshots (if applicable): This PR makes 'enforced file size matching' to be configurable when we copy data files. This PR also make the dataFileVersionStrategy to be configurable for different dataset during the publisher phase. ### Tests - [x] My PR adds the following unit tests __OR__ does not need testing for this extremely good reason: ### Commits - [ ] My commits all reference JIRA issues in their subject lines, and I have squashed multiple commits if they address the same issue. In addition, my commits follow the guidelines from "[How to write a good git commit message](http://chris.beams.io/posts/git-commit/)": 1. Subject is separated from body by a blank line 2. Subject is limited to 50 characters 3. Subject does not end with a period 4. Subject uses the imperative mood ("add", not "adding") 5. Body wraps at 72 characters 6. Body explains "what" and "why", not "how" This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[jira] [Created] (GOBBLIN-751) Make enforced file size matching to be configurable
Kuai Yu created GOBBLIN-751: --- Summary: Make enforced file size matching to be configurable Key: GOBBLIN-751 URL: https://issues.apache.org/jira/browse/GOBBLIN-751 Project: Apache Gobblin Issue Type: Improvement Reporter: Kuai Yu Make enforced file size matching to be configurable -- This message was sent by Atlassian JIRA (v7.6.3#76005)
Data Lineage in Gobblin
Hi, I see there is a concept of gathering and storing lineage info in WorkStates, but I can't find how one can use the lineage info from the stored state. Can someone please shed more light on the overall lineage feature? Thanks, Jay
[GitHub] [incubator-gobblin] jhsenjaliya opened a new pull request #2615: Gobblin 750
jhsenjaliya opened a new pull request #2615: Gobblin 750 URL: https://github.com/apache/incubator-gobblin/pull/2615 Dear Gobblin maintainers, Please accept this PR. I understand that it will not be reviewed until I have checked off all the steps below! ### JIRA - [ ] My PR addresses the following [Gobblin JIRA](https://issues.apache.org/jira/browse/GOBBLIN/) issues and references them in the PR title. For example, "[GOBBLIN-XXX] My Gobblin PR" - https://issues.apache.org/jira/browse/GOBBLIN-750 ### Description - [x] Here are some details about my PR, including screenshots (if applicable): DatasetResolver and DatasetResolverFactory both are marked as deprecated. should remove the usage for next version, before there are more such resolver added for lineage functionality. ### Tests - [x] My PR adds the following unit tests __OR__ does not need testing for this extremely good reason: existing test case are updated with updated class usage but there no additional tests added ### Commits - [x] My commits all reference JIRA issues in their subject lines, and I have squashed multiple commits if they address the same issue. In addition, my commits follow the guidelines from "[How to write a good git commit message](http://chris.beams.io/posts/git-commit/)": 1. Subject is separated from body by a blank line 2. Subject is limited to 50 characters 3. Subject does not end with a period 4. Subject uses the imperative mood ("add", not "adding") 5. Body wraps at 72 characters 6. Body explains "what" and "why", not "how" This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[jira] [Updated] (GOBBLIN-750) remove usage of depricated DatasetResolver and DatasetResolverFactory
[ https://issues.apache.org/jira/browse/GOBBLIN-750?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jay Sen updated GOBBLIN-750: Description: {{DatasetResolver}} and {{DatasetResolverFactory}} both are marked as deprecated. should remove the usage for next version, before there are more such resolver added for lineage functionality. was: {{DatasetResolver}} and {{DatasetResolverFactory}} both are depricated. should remove the usage for next version > remove usage of depricated DatasetResolver and DatasetResolverFactory > - > > Key: GOBBLIN-750 > URL: https://issues.apache.org/jira/browse/GOBBLIN-750 > Project: Apache Gobblin > Issue Type: Improvement >Affects Versions: 0.15.0 >Reporter: Jay Sen >Priority: Minor > Fix For: 0.15.0 > > > {{DatasetResolver}} and {{DatasetResolverFactory}} both are marked as > deprecated. > should remove the usage for next version, before there are more such resolver > added for lineage functionality. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Resolved] (GOBBLIN-747) Set expected schema when creating workunits
[ https://issues.apache.org/jira/browse/GOBBLIN-747?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hung Tran resolved GOBBLIN-747. --- Resolution: Fixed Fix Version/s: 0.15.0 Issue resolved by pull request #2612 [https://github.com/apache/incubator-gobblin/pull/2612] > Set expected schema when creating workunits > --- > > Key: GOBBLIN-747 > URL: https://issues.apache.org/jira/browse/GOBBLIN-747 > Project: Apache Gobblin > Issue Type: Improvement >Reporter: Zihan Li >Priority: Major > Fix For: 0.15.0 > > > Set the property of gobblin.copy.expectedSchema when creating the workunit to > enable schema check in distcp. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (GOBBLIN-747) Set expected schema when creating workunits
[ https://issues.apache.org/jira/browse/GOBBLIN-747?focusedWorklogId=231753=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-231753 ] ASF GitHub Bot logged work on GOBBLIN-747: -- Author: ASF GitHub Bot Created on: 23/Apr/19 22:51 Start Date: 23/Apr/19 22:51 Worklog Time Spent: 10m Work Description: asfgit commented on pull request #2612: [GOBBLIN-747]Check schema URL: https://github.com/apache/incubator-gobblin/pull/2612 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 231753) Time Spent: 10m Remaining Estimate: 0h > Set expected schema when creating workunits > --- > > Key: GOBBLIN-747 > URL: https://issues.apache.org/jira/browse/GOBBLIN-747 > Project: Apache Gobblin > Issue Type: Improvement >Reporter: Zihan Li >Priority: Major > Fix For: 0.15.0 > > Time Spent: 10m > Remaining Estimate: 0h > > Set the property of gobblin.copy.expectedSchema when creating the workunit to > enable schema check in distcp. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (GOBBLIN-750) remove usage of depricated DatasetResolver and DatasetResolverFactory
Jay Sen created GOBBLIN-750: --- Summary: remove usage of depricated DatasetResolver and DatasetResolverFactory Key: GOBBLIN-750 URL: https://issues.apache.org/jira/browse/GOBBLIN-750 Project: Apache Gobblin Issue Type: Improvement Affects Versions: 0.15.0 Reporter: Jay Sen Fix For: 0.15.0 {{DatasetResolver}} and {{DatasetResolverFactory}} both are depricated. should remove the usage for next version -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[GitHub] [incubator-gobblin] asfgit closed pull request #2612: [GOBBLIN-747]Check schema
asfgit closed pull request #2612: [GOBBLIN-747]Check schema URL: https://github.com/apache/incubator-gobblin/pull/2612 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[jira] [Work logged] (GOBBLIN-748) Craftsmanship code cleaning in GaaS
[ https://issues.apache.org/jira/browse/GOBBLIN-748?focusedWorklogId=231744=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-231744 ] ASF GitHub Bot logged work on GOBBLIN-748: -- Author: ASF GitHub Bot Created on: 23/Apr/19 22:10 Start Date: 23/Apr/19 22:10 Worklog Time Spent: 10m Work Description: sv2000 commented on pull request #2613: [GOBBLIN-748]Craftsmanship code cleaning in Gobblin Service Code URL: https://github.com/apache/incubator-gobblin/pull/2613#discussion_r277525100 ## File path: gobblin-service/src/main/java/org/apache/gobblin/service/modules/orchestration/DagManager.java ## @@ -357,10 +357,9 @@ private void initialize(Dag dag) } /** - * Poll the statuses of running jobs. - * @return List of {@link JobStatus}es. + * Proceed the execution of each dag node based on job status. */ -private void pollJobStatuses() +private void proceedDagExecutionOnDagNodeStatus() Review comment: Don't like this name. pollAndAdvanceDag() maybe? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 231744) Time Spent: 1h 10m (was: 1h) > Craftsmanship code cleaning in GaaS > > > Key: GOBBLIN-748 > URL: https://issues.apache.org/jira/browse/GOBBLIN-748 > Project: Apache Gobblin > Issue Type: Improvement >Reporter: Lei Sun >Priority: Major > Time Spent: 1h 10m > Remaining Estimate: 0h > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[GitHub] [incubator-gobblin] sv2000 commented on a change in pull request #2613: [GOBBLIN-748]Craftsmanship code cleaning in Gobblin Service Code
sv2000 commented on a change in pull request #2613: [GOBBLIN-748]Craftsmanship code cleaning in Gobblin Service Code URL: https://github.com/apache/incubator-gobblin/pull/2613#discussion_r277524878 ## File path: gobblin-service/src/main/java/org/apache/gobblin/service/modules/template_catalog/FSFlowTemplateCatalog.java ## @@ -71,7 +74,7 @@ * @param sysConfig that must contain the fully qualified path of the flow template catalog * @throws IOException */ - public FSFlowCatalog(Config sysConfig) + public FSFlowTemplateCatalog(Config sysConfig) Review comment: +1 on this change. Was on my list. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [incubator-gobblin] sv2000 commented on a change in pull request #2613: [GOBBLIN-748]Craftsmanship code cleaning in Gobblin Service Code
sv2000 commented on a change in pull request #2613: [GOBBLIN-748]Craftsmanship code cleaning in Gobblin Service Code URL: https://github.com/apache/incubator-gobblin/pull/2613#discussion_r277525100 ## File path: gobblin-service/src/main/java/org/apache/gobblin/service/modules/orchestration/DagManager.java ## @@ -357,10 +357,9 @@ private void initialize(Dag dag) } /** - * Poll the statuses of running jobs. - * @return List of {@link JobStatus}es. + * Proceed the execution of each dag node based on job status. */ -private void pollJobStatuses() +private void proceedDagExecutionOnDagNodeStatus() Review comment: Don't like this name. pollAndAdvanceDag() maybe? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [incubator-gobblin] sv2000 commented on a change in pull request #2613: [GOBBLIN-748]Craftsmanship code cleaning in Gobblin Service Code
sv2000 commented on a change in pull request #2613: [GOBBLIN-748]Craftsmanship code cleaning in Gobblin Service Code URL: https://github.com/apache/incubator-gobblin/pull/2613#discussion_r277525235 ## File path: gobblin-service/src/main/java/org/apache/gobblin/service/modules/template_catalog/FSFlowTemplateCatalog.java ## @@ -167,4 +159,21 @@ private Config loadHoconFileAtPath(Path filePath, boolean allowUnresolved) return ConfigFactory.parseReader(new InputStreamReader(is, Charsets.UTF_8)).resolve(options); } } + + /** + * Determine if an URI of a jobTemplate or a FlowTemplate is valid. + * @param flowURI The given job/flow template + * @return true to continue on loading. Review comment: Change to "true if the URI is valid." This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[jira] [Work logged] (GOBBLIN-748) Craftsmanship code cleaning in GaaS
[ https://issues.apache.org/jira/browse/GOBBLIN-748?focusedWorklogId=231743=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-231743 ] ASF GitHub Bot logged work on GOBBLIN-748: -- Author: ASF GitHub Bot Created on: 23/Apr/19 22:10 Start Date: 23/Apr/19 22:10 Worklog Time Spent: 10m Work Description: sv2000 commented on pull request #2613: [GOBBLIN-748]Craftsmanship code cleaning in Gobblin Service Code URL: https://github.com/apache/incubator-gobblin/pull/2613#discussion_r277524878 ## File path: gobblin-service/src/main/java/org/apache/gobblin/service/modules/template_catalog/FSFlowTemplateCatalog.java ## @@ -71,7 +74,7 @@ * @param sysConfig that must contain the fully qualified path of the flow template catalog * @throws IOException */ - public FSFlowCatalog(Config sysConfig) + public FSFlowTemplateCatalog(Config sysConfig) Review comment: +1 on this change. Was on my list. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 231743) Time Spent: 1h (was: 50m) > Craftsmanship code cleaning in GaaS > > > Key: GOBBLIN-748 > URL: https://issues.apache.org/jira/browse/GOBBLIN-748 > Project: Apache Gobblin > Issue Type: Improvement >Reporter: Lei Sun >Priority: Major > Time Spent: 1h > Remaining Estimate: 0h > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (GOBBLIN-748) Craftsmanship code cleaning in GaaS
[ https://issues.apache.org/jira/browse/GOBBLIN-748?focusedWorklogId=231742=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-231742 ] ASF GitHub Bot logged work on GOBBLIN-748: -- Author: ASF GitHub Bot Created on: 23/Apr/19 22:10 Start Date: 23/Apr/19 22:10 Worklog Time Spent: 10m Work Description: sv2000 commented on pull request #2613: [GOBBLIN-748]Craftsmanship code cleaning in Gobblin Service Code URL: https://github.com/apache/incubator-gobblin/pull/2613#discussion_r277525235 ## File path: gobblin-service/src/main/java/org/apache/gobblin/service/modules/template_catalog/FSFlowTemplateCatalog.java ## @@ -167,4 +159,21 @@ private Config loadHoconFileAtPath(Path filePath, boolean allowUnresolved) return ConfigFactory.parseReader(new InputStreamReader(is, Charsets.UTF_8)).resolve(options); } } + + /** + * Determine if an URI of a jobTemplate or a FlowTemplate is valid. + * @param flowURI The given job/flow template + * @return true to continue on loading. Review comment: Change to "true if the URI is valid." This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 231742) Time Spent: 50m (was: 40m) > Craftsmanship code cleaning in GaaS > > > Key: GOBBLIN-748 > URL: https://issues.apache.org/jira/browse/GOBBLIN-748 > Project: Apache Gobblin > Issue Type: Improvement >Reporter: Lei Sun >Priority: Major > Time Spent: 50m > Remaining Estimate: 0h > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Resolved] (GOBBLIN-744) Support cancellation of a Helix workflow via a DELETE Spec
[ https://issues.apache.org/jira/browse/GOBBLIN-744?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sudarshan Vasudevan resolved GOBBLIN-744. - Resolution: Fixed > Support cancellation of a Helix workflow via a DELETE Spec > -- > > Key: GOBBLIN-744 > URL: https://issues.apache.org/jira/browse/GOBBLIN-744 > Project: Apache Gobblin > Issue Type: Improvement > Components: gobblin-cluster >Affects Versions: 0.15.0 >Reporter: Sudarshan Vasudevan >Assignee: Hung Tran >Priority: Major > Fix For: 0.15.0 > > Time Spent: 6h 10m > Remaining Estimate: 0h > > This task supports the ability to interrupt and cancel a running job on a > Gobblin Helix cluster via a DELETE Spec submitted to the > JobConfigurationManager. The DELETE Spec should have > "gobblin.cluster.shouldCancelRunningJobOnDelete" set to true for cancelling a > running job. The default behavior is to simply delete the corresponding > JobSpec from the JobCatalog. > > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
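The behavior described in this issue can be sketched roughly as follows. Only the property name `gobblin.cluster.shouldCancelRunningJobOnDelete` and the default (delete the JobSpec without cancelling) come from the issue text. The class `DeleteSpecSketch`, its methods, and the use of `java.util.Properties` to stand in for a DELETE Spec are hypothetical and do not reflect the actual JobConfigurationManager code.

```java
import java.util.Properties;

/** Hypothetical illustration; not Gobblin's actual DELETE Spec handling. */
final class DeleteSpecSketch {
  // Property name quoted in the issue description.
  static final String CANCEL_ON_DELETE_KEY = "gobblin.cluster.shouldCancelRunningJobOnDelete";

  /** Returns true only when the DELETE Spec explicitly sets the flag to true. */
  static boolean shouldCancelRunningJob(Properties deleteSpecProps) {
    return Boolean.parseBoolean(deleteSpecProps.getProperty(CANCEL_ON_DELETE_KEY, "false"));
  }

  /** Describes which of the two behaviors from the issue a handler would choose. */
  static String describeAction(Properties deleteSpecProps) {
    return shouldCancelRunningJob(deleteSpecProps)
        ? "cancel running job"
        : "delete JobSpec from JobCatalog only";
  }

  public static void main(String[] args) {
    Properties deleteSpec = new Properties();
    System.out.println(describeAction(deleteSpec)); // flag unset: default behavior

    deleteSpec.setProperty(CANCEL_ON_DELETE_KEY, "true");
    System.out.println(describeAction(deleteSpec)); // flag set: cancellation requested
  }
}
```

Treating an absent flag as false keeps existing DELETE Specs on the default path, which matches the issue's statement that cancellation must be requested explicitly.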
[jira] [Updated] (GOBBLIN-744) Support cancellation of a Helix workflow via a DELETE Spec
[ https://issues.apache.org/jira/browse/GOBBLIN-744?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sudarshan Vasudevan updated GOBBLIN-744: Issue resolved by PR: [https://github.com/apache/incubator-gobblin/pull/2609] > Support cancellation of a Helix workflow via a DELETE Spec > -- > > Key: GOBBLIN-744 > URL: https://issues.apache.org/jira/browse/GOBBLIN-744 > Project: Apache Gobblin > Issue Type: Improvement > Components: gobblin-cluster >Affects Versions: 0.15.0 >Reporter: Sudarshan Vasudevan >Assignee: Hung Tran >Priority: Major > Fix For: 0.15.0 > > Time Spent: 6h 10m > Remaining Estimate: 0h > > This task supports the ability to interrupt and cancel a running job on a > Gobblin Helix cluster via a DELETE Spec submitted to the > JobConfigurationManager. The DELETE Spec should have > "gobblin.cluster.shouldCancelRunningJobOnDelete" set to true for cancelling a > running job. The default behavior is to simply delete the corresponding > JobSpec from the JobCatalog. > > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Closed] (GOBBLIN-744) Support cancellation of a Helix workflow via a DELETE Spec
[ https://issues.apache.org/jira/browse/GOBBLIN-744?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sudarshan Vasudevan closed GOBBLIN-744. --- > Support cancellation of a Helix workflow via a DELETE Spec > -- > > Key: GOBBLIN-744 > URL: https://issues.apache.org/jira/browse/GOBBLIN-744 > Project: Apache Gobblin > Issue Type: Improvement > Components: gobblin-cluster >Affects Versions: 0.15.0 >Reporter: Sudarshan Vasudevan >Assignee: Hung Tran >Priority: Major > Fix For: 0.15.0 > > Time Spent: 6h 10m > Remaining Estimate: 0h > > This task supports the ability to interrupt and cancel a running job on a > Gobblin Helix cluster via a DELETE Spec submitted to the > JobConfigurationManager. The DELETE Spec should have > "gobblin.cluster.shouldCancelRunningJobOnDelete" set to true for cancelling a > running job. The default behavior is to simply delete the corresponding > JobSpec from the JobCatalog. > > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (GOBBLIN-744) Support cancellation of a Helix workflow via a DELETE Spec
[ https://issues.apache.org/jira/browse/GOBBLIN-744?focusedWorklogId=231723=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-231723 ] ASF GitHub Bot logged work on GOBBLIN-744: -- Author: ASF GitHub Bot Created on: 23/Apr/19 21:53 Start Date: 23/Apr/19 21:53 Worklog Time Spent: 10m Work Description: asfgit commented on pull request #2609: GOBBLIN-744: Support cancellation of a Helix workflow via a DELETE Spec. URL: https://github.com/apache/incubator-gobblin/pull/2609 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 231723) Time Spent: 6h 10m (was: 6h) > Support cancellation of a Helix workflow via a DELETE Spec > -- > > Key: GOBBLIN-744 > URL: https://issues.apache.org/jira/browse/GOBBLIN-744 > Project: Apache Gobblin > Issue Type: Improvement > Components: gobblin-cluster >Affects Versions: 0.15.0 >Reporter: Sudarshan Vasudevan >Assignee: Hung Tran >Priority: Major > Fix For: 0.15.0 > > Time Spent: 6h 10m > Remaining Estimate: 0h > > This task supports the ability to interrupt and cancel a running job on a > Gobblin Helix cluster via a DELETE Spec submitted to the > JobConfigurationManager. The DELETE Spec should have > "gobblin.cluster.shouldCancelRunningJobOnDelete" set to true for cancelling a > running job. The default behavior is to simply delete the corresponding > JobSpec from the JobCatalog. > > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[GitHub] [incubator-gobblin] asfgit closed pull request #2609: GOBBLIN-744: Support cancellation of a Helix workflow via a DELETE Spec.
asfgit closed pull request #2609: GOBBLIN-744: Support cancellation of a Helix workflow via a DELETE Spec.
URL: https://github.com/apache/incubator-gobblin/pull/2609
[jira] [Work logged] (GOBBLIN-744) Support cancellation of a Helix workflow via a DELETE Spec
[ https://issues.apache.org/jira/browse/GOBBLIN-744?focusedWorklogId=231637&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-231637 ]

ASF GitHub Bot logged work on GOBBLIN-744:
--
                Author: ASF GitHub Bot
            Created on: 23/Apr/19 18:55
            Start Date: 23/Apr/19 18:55
    Worklog Time Spent: 10m
      Work Description: htran1 commented on issue #2609: GOBBLIN-744: Support cancellation of a Helix workflow via a DELETE Spec.
URL: https://github.com/apache/incubator-gobblin/pull/2609#issuecomment-485930900

+1

Issue Time Tracking
---
    Worklog Id:     (was: 231637)
    Time Spent: 6h  (was: 5h 50m)
[GitHub] [incubator-gobblin] ibuenros commented on issue #2582: UnitTest for KafkaSource
ibuenros commented on issue #2582: UnitTest for KafkaSource
URL: https://github.com/apache/incubator-gobblin/pull/2582#issuecomment-485887447

+1
[GitHub] [incubator-gobblin] ibuenros commented on a change in pull request #2612: Check schema
ibuenros commented on a change in pull request #2612: Check schema
URL: https://github.com/apache/incubator-gobblin/pull/2612#discussion_r26259

## File path: gobblin-data-management/src/main/java/org/apache/gobblin/data/management/copy/CopySource.java ##
@@ -357,6 +358,9 @@ public Void call() {
         WorkUnit workUnit = new WorkUnit(extract);
         workUnit.addAll(this.state);
+        if(this.copyableDataset instanceof ConfigBasedDataset) {

Review comment:
Do you also want to check that the expected schema is not null?
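The review question above can be sketched as follows. The body of the `if` block is not visible in the quoted diff, so everything here is an assumption: the stand-in `CopyableDataset` and `ConfigBasedDataset` types, the `getExpectedSchema()` accessor, and the property key are all hypothetical and only illustrate combining the `instanceof` test with a null check.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative stand-ins only; none of these types are Gobblin's real classes.
final class SchemaNullCheckSketch {
  interface CopyableDataset {}

  // Stand-in for ConfigBasedDataset; getExpectedSchema() is an assumed accessor.
  static final class ConfigBasedDataset implements CopyableDataset {
    private final String expectedSchema;
    ConfigBasedDataset(String expectedSchema) { this.expectedSchema = expectedSchema; }
    String getExpectedSchema() { return expectedSchema; }
  }

  // Hypothetical property key, chosen for this sketch.
  static final String EXPECTED_SCHEMA_KEY = "sketch.expected.schema";

  // Copies the schema into the work-unit properties only when the dataset is
  // config-based AND actually carries a schema.
  static void addSchemaIfPresent(CopyableDataset dataset, Map<String, String> workUnitProps) {
    if (dataset instanceof ConfigBasedDataset) {
      String schema = ((ConfigBasedDataset) dataset).getExpectedSchema();
      if (schema != null) {
        workUnitProps.put(EXPECTED_SCHEMA_KEY, schema);
      }
    }
  }

  public static void main(String[] args) {
    Map<String, String> withSchema = new HashMap<>();
    addSchemaIfPresent(new ConfigBasedDataset("{\"type\":\"record\"}"), withSchema);
    System.out.println(withSchema.containsKey(EXPECTED_SCHEMA_KEY));

    Map<String, String> withoutSchema = new HashMap<>();
    addSchemaIfPresent(new ConfigBasedDataset(null), withoutSchema);
    System.out.println(withoutSchema.isEmpty());
  }
}
```

Skipping the property entirely when the schema is null avoids storing a null value that a later reader would have to guard against.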
[jira] [Resolved] (GOBBLIN-749) Better access logging for throttling server
[ https://issues.apache.org/jira/browse/GOBBLIN-749?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Issac Buenrostro resolved GOBBLIN-749.
--
       Resolution: Fixed
    Fix Version/s: 0.15.0

Issue resolved by pull request #2614
[https://github.com/apache/incubator-gobblin/pull/2614]

> Better access logging for throttling server
> ---
>
>                 Key: GOBBLIN-749
>                 URL: https://issues.apache.org/jira/browse/GOBBLIN-749
>             Project: Apache Gobblin
>          Issue Type: Improvement
>            Reporter: Issac Buenrostro
>            Assignee: Issac Buenrostro
>            Priority: Major
>             Fix For: 0.15.0
>
>          Time Spent: 20m
>  Remaining Estimate: 0h
[jira] [Work logged] (GOBBLIN-749) Better access logging for throttling server
[ https://issues.apache.org/jira/browse/GOBBLIN-749?focusedWorklogId=231489&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-231489 ]

ASF GitHub Bot logged work on GOBBLIN-749:
--
                Author: ASF GitHub Bot
            Created on: 23/Apr/19 16:19
            Start Date: 23/Apr/19 16:19
    Worklog Time Spent: 10m
      Work Description: asfgit commented on pull request #2614: [GOBBLIN-749] Add logging to limiter server.
URL: https://github.com/apache/incubator-gobblin/pull/2614

Issue Time Tracking
---
    Worklog Id:     (was: 231489)
    Time Spent: 20m  (was: 10m)
[GitHub] [incubator-gobblin] asfgit closed pull request #2614: [GOBBLIN-749] Add logging to limiter server.
asfgit closed pull request #2614: [GOBBLIN-749] Add logging to limiter server.
URL: https://github.com/apache/incubator-gobblin/pull/2614
[jira] [Work logged] (GOBBLIN-744) Support cancellation of a Helix workflow via a DELETE Spec
[ https://issues.apache.org/jira/browse/GOBBLIN-744?focusedWorklogId=231456&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-231456 ]

ASF GitHub Bot logged work on GOBBLIN-744:
--
                Author: ASF GitHub Bot
            Created on: 23/Apr/19 15:26
            Start Date: 23/Apr/19 15:26
    Worklog Time Spent: 10m
      Work Description: shirshanka commented on pull request #2609: GOBBLIN-744: Support cancellation of a Helix workflow via a DELETE Spec.
URL: https://github.com/apache/incubator-gobblin/pull/2609#discussion_r277737062

## File path: gobblin-cluster/src/main/java/org/apache/gobblin/cluster/SleepingTask.java ##
@@ -17,24 +17,52 @@
 package org.apache.gobblin.cluster;

+import java.io.File;
+import java.io.IOException;
+
+import com.google.common.io.Files;
+
 import avro.shaded.com.google.common.base.Throwables;

Review comment:
spotted a faulty import from before

Issue Time Tracking
---
    Worklog Id:     (was: 231456)
    Time Spent: 5h 20m  (was: 5h 10m)
[jira] [Work logged] (GOBBLIN-744) Support cancellation of a Helix workflow via a DELETE Spec
[ https://issues.apache.org/jira/browse/GOBBLIN-744?focusedWorklogId=231464&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-231464 ]

ASF GitHub Bot logged work on GOBBLIN-744:
--
                Author: ASF GitHub Bot
            Created on: 23/Apr/19 15:34
            Start Date: 23/Apr/19 15:34
    Worklog Time Spent: 10m
      Work Description: shirshanka commented on pull request #2609: GOBBLIN-744: Support cancellation of a Helix workflow via a DELETE Spec.
URL: https://github.com/apache/incubator-gobblin/pull/2609#discussion_r277741264

## File path: gobblin-runtime/src/main/java/org/apache/gobblin/runtime/api/FsSpecConsumer.java ##
@@ -74,6 +78,8 @@ public FsSpecConsumer(Config config) {
       return null;
     }

+    Arrays.sort(fileStatuses, Comparator.comparingLong(FileStatus::getModificationTime));

Review comment:
add a comment for why you're doing it and what you're expecting (ascending sort?)

Issue Time Tracking
---
    Worklog Id:     (was: 231464)
    Time Spent: 5h 50m  (was: 5h 40m)
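On the "ascending sort?" question above: `Comparator.comparingLong` orders by ascending key, so the sort in the quoted diff puts the oldest file first. The sketch below shows this with a minimal stand-in for Hadoop's `FileStatus`; the `FileEntry` class, its names, and the timestamps are invented for illustration. The stated motivation (consuming specs in the order they were written) is an assumption, since the diff itself gives no reason.

```java
import java.util.Arrays;
import java.util.Comparator;

// Illustrative sketch; FileEntry is a stand-in for org.apache.hadoop.fs.FileStatus.
final class ModificationTimeSortSketch {
  static final class FileEntry {
    private final String name;
    private final long modificationTime;

    FileEntry(String name, long modificationTime) {
      this.name = name;
      this.modificationTime = modificationTime;
    }

    String getName() { return name; }
    long getModificationTime() { return modificationTime; }
  }

  // Returns a copy sorted by modification time, oldest first (ascending),
  // so that entries are presumably processed in the order they were written.
  static FileEntry[] oldestFirst(FileEntry[] entries) {
    FileEntry[] sorted = entries.clone();
    Arrays.sort(sorted, Comparator.comparingLong(FileEntry::getModificationTime));
    return sorted;
  }

  public static void main(String[] args) {
    FileEntry[] entries = {
        new FileEntry("delete", 300L),
        new FileEntry("add", 100L),
        new FileEntry("update", 200L)
    };
    for (FileEntry entry : oldestFirst(entries)) {
      System.out.println(entry.getName()); // prints add, update, delete (one per line)
    }
  }
}
```

For newest-first order, the same comparator could be wrapped with `.reversed()`.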
[GitHub] [incubator-gobblin] shirshanka commented on a change in pull request #2609: GOBBLIN-744: Support cancellation of a Helix workflow via a DELETE Spec.
shirshanka commented on a change in pull request #2609: GOBBLIN-744: Support cancellation of a Helix workflow via a DELETE Spec.
URL: https://github.com/apache/incubator-gobblin/pull/2609#discussion_r277740598

## File path: gobblin-cluster/src/test/java/org/apache/gobblin/cluster/ClusterIntegrationTest.java ##
@@ -82,6 +101,105 @@ public void testJobShouldComplete()
     suite.waitForAndVerifyOutputFiles();
   }

+  /**
+   * An integration test for restarting a Helix workflow via a JobSpec. This test case starts a Helix cluster with
+   * a {@link FsScheduledJobConfigurationManager}. The test case does the following:
+   *
+   *add a {@link org.apache.gobblin.runtime.api.JobSpec} that uses a {@link org.apache.gobblin.cluster.SleepingCustomTaskSource})
+   * to {@link IntegrationJobRestartViaSpecSuite#FS_SPEC_CONSUMER_DIR}. which is picked by the JobConfigurationManager.
+   *the JobConfigurationManager sends a notification to the GobblinHelixJobScheduler which schedules the job for execution. The JobSpec is
+   * also added to the JobCatalog for persistence. Helix starts a Workflow for this JobSpec.
+   *We then add a {@link org.apache.gobblin.runtime.api.JobSpec} with UPDATE Verb to {@link IntegrationJobRestartViaSpecSuite#FS_SPEC_CONSUMER_DIR}.
+   * This signals GobblinHelixJobScheduler (and, Helix) to first cancel the running job (i.e., Helix Workflow) started in the previous step.
+   *We inspect the state of the zNode corresponding to the Workflow resource in Zookeeper to ensure that its {@link org.apache.helix.task.TargetState}
+   * is STOP.
+   *Once the cancelled job from the previous steps is completed, the job will be re-launched for execution by the GobblinHelixJobScheduler.
+   * We confirm the execution by again inspecting the zNode and ensuring its TargetState is START.
+   *
+   */
+  @Test (dependsOnMethods = { "testJobShouldGetCancelled" })
+  public void testJobRestartViaSpec() throws Exception {
+    this.suite = new IntegrationJobRestartViaSpecSuite();
+    HelixManager helixManager = getHelixManager();
+
+    IntegrationJobRestartViaSpecSuite restartViaSpecSuite = (IntegrationJobRestartViaSpecSuite) this.suite;
+
+    //Add a new JobSpec to the path monitored by the SpecConsumer
+    restartViaSpecSuite.addJobSpec(IntegrationJobRestartViaSpecSuite.JOB_ID, SpecExecutor.Verb.ADD.name());
+
+    //Start the cluster
+    restartViaSpecSuite.startCluster();
+
+    helixManager.connect();
+
+    AssertWithBackoff asserter1 = AssertWithBackoff.create().timeoutMs(3).maxSleepMs(1000).backoffFactor(1);
+    asserter1.assertTrue(isTaskStarted(helixManager, IntegrationJobRestartViaSpecSuite.JOB_ID),
+        "Waiting for the job to start...");
+
+    AssertWithBackoff asserter2 = AssertWithBackoff.create().maxSleepMs(100).timeoutMs(2000).backoffFactor(1);
+    asserter2.assertTrue(isTaskRunning(IntegrationJobRestartViaSpecSuite.TASK_STATE_FILE),"Waiting for the task to enter running state");
+
+    ZkClient zkClient = new ZkClient(this.zkConnectString);
+    PathBasedZkSerializer zkSerializer = ChainedPathZkSerializer.builder(new ZNRecordStreamingSerializer()).build();
+    zkClient.setZkSerializer(zkSerializer);
+
+    String clusterName = getHelixManager().getClusterName();
+    String zNodePath = Paths.get("/", clusterName, "CONFIGS", "RESOURCE", IntegrationJobRestartViaSpecSuite.JOB_ID).toString();
+
+    //Ensure that the Workflow is started
+    ZNRecord record = zkClient.readData(zNodePath);
+    String targetState = record.getSimpleField("TargetState");
+    Assert.assertEquals(targetState, TargetState.START.name());
+
+    //Add a JobSpec with UPDATE verb signalling the Helix cluster to restart the workflow
+    restartViaSpecSuite.addJobSpec(IntegrationJobRestartViaSpecSuite.JOB_ID, SpecExecutor.Verb.UPDATE.name());
+
+    AssertWithBackoff asserter3 = AssertWithBackoff.create().maxSleepMs(1000).timeoutMs(5000).backoffFactor(1);
+    asserter3.assertTrue(input -> {
+      //Inspect the zNode at the path corresponding to the Workflow resource. Ensure the target state of the resource is in
+      // the STOP state or that the zNode has been deleted.
+      ZNRecord recordNew = zkClient.readData(zNodePath, true);
+      String targetStateNew = null;
+      if (recordNew != null) {
+        targetStateNew = recordNew.getSimpleField("TargetState");
+      }
+      return recordNew == null || targetStateNew.equals(TargetState.STOP.name());
+    }, "Waiting for Workflow TargetState to be STOP");
+
+    //Ensure that the SleepingTask did not terminate normally i.e. it was interrupted. We check this by ensuring
+    // that the line "Hello World!" is not present in the logged output.
+    suite.waitForAndVerifyOutputFiles();
+
+    AssertWithBackoff asserter4 = AssertWithBackoff.create().maxSleepMs(1000).timeoutMs(12).backoffFactor(1);
+    asserter4.assertTrue(input -> {
+      //Inspect the
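One observation on the predicate quoted above: if the record exists but carries no "TargetState" field, `targetStateNew` stays null and `targetStateNew.equals(...)` would throw a NullPointerException instead of returning false. Whether that can happen in practice depends on Helix, which the quoted code does not show. A minimal sketch of a null-safe variant follows; the `Map` standing in for `ZNRecord` and the local `TargetState` enum are assumptions for illustration, not the Helix types.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative sketch; a Map of simple fields stands in for Helix's ZNRecord.
final class TargetStateCheckSketch {
  // Local stand-in for org.apache.helix.task.TargetState (only the values used here).
  enum TargetState { START, STOP }

  // True when the zNode is gone (null record) or its TargetState is STOP.
  // Calling equals on the constant keeps a missing field from throwing.
  static boolean isStoppedOrDeleted(Map<String, String> simpleFields) {
    if (simpleFields == null) {
      return true;
    }
    return TargetState.STOP.name().equals(simpleFields.get("TargetState"));
  }

  public static void main(String[] args) {
    Map<String, String> stopped = new HashMap<>();
    stopped.put("TargetState", "STOP");

    System.out.println(isStoppedOrDeleted(null));            // deleted zNode
    System.out.println(isStoppedOrDeleted(stopped));         // STOP state
    System.out.println(isStoppedOrDeleted(new HashMap<>())); // field missing, no exception
  }
}
```

With this shape the polling predicate simply keeps returning false until the field appears, rather than failing the test with an unrelated exception.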
[jira] [Work logged] (GOBBLIN-744) Support cancellation of a Helix workflow via a DELETE Spec
[ https://issues.apache.org/jira/browse/GOBBLIN-744?focusedWorklogId=231461&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-231461 ]

ASF GitHub Bot logged work on GOBBLIN-744:
--
                Author: ASF GitHub Bot
            Created on: 23/Apr/19 15:31
            Start Date: 23/Apr/19 15:31
    Worklog Time Spent: 10m
      Work Description: shirshanka commented on pull request #2609: GOBBLIN-744: Support cancellation of a Helix workflow via a DELETE Spec.
URL: https://github.com/apache/incubator-gobblin/pull/2609#discussion_r277739662

## File path: gobblin-cluster/src/test/java/org/apache/gobblin/cluster/ClusterIntegrationTest.java ##
@@ -51,28 +68,30 @@ public void testJobShouldComplete()
     runAndVerify();
   }

-  @Test void testJobShouldGetCancelled() throws Exception {
-    this.suite =new IntegrationJobCancelSuite();
+  private HelixManager getHelixManager() {
     Config helixConfig = this.suite.getManagerConfig();
     String clusterName = helixConfig.getString(GobblinClusterConfigurationKeys.HELIX_CLUSTER_NAME_KEY);
     String instanceName = ConfigUtils.getString(helixConfig, GobblinClusterConfigurationKeys.HELIX_INSTANCE_NAME_KEY,
         GobblinClusterManager.class.getSimpleName());
-    String zkConnectString = helixConfig.getString(GobblinClusterConfigurationKeys.ZK_CONNECTION_STRING_KEY);
+    this.zkConnectString = helixConfig.getString(GobblinClusterConfigurationKeys.ZK_CONNECTION_STRING_KEY);
     HelixManager helixManager = HelixManagerFactory.getZKHelixManager(clusterName, instanceName, InstanceType.CONTROLLER, zkConnectString);
+    return helixManager;
+  }
+  @Test void testJobShouldGetCancelled() throws Exception {
+    this.suite =new IntegrationJobCancelSuite();
+    HelixManager helixManager = getHelixManager();
     suite.startCluster();
-
     helixManager.connect();
     TaskDriver taskDriver = new TaskDriver(helixManager);
-    while (TaskDriver.getWorkflowContext(helixManager, IntegrationJobCancelSuite.JOB_ID) == null) {
-      log.warn("Waiting for the job to start...");
-      Thread.sleep(1000L);
-    }
+    AssertWithBackoff asserter1 = AssertWithBackoff.create().maxSleepMs(1000).backoffFactor(1);
+    asserter1.assertTrue(isTaskStarted(helixManager, IntegrationJobCancelSuite.JOB_ID),

Review comment:
you could chain the entire call without needing the local variable asserter1

Issue Time Tracking
---
    Worklog Id:     (was: 231461)
    Time Spent: 5.5h  (was: 5h 20m)
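The chaining suggested in the review comment above can be sketched with a small stand-in builder. `BackoffAssertSketch` below is a hypothetical class written for this example; it only mimics the fluent shape of the `AssertWithBackoff` calls visible in the diff and is not Gobblin's implementation. The initial 10 ms sleep, the default values, and the `BooleanSupplier` parameter are likewise assumptions.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.BooleanSupplier;

// Hypothetical stand-in that mimics the fluent style of AssertWithBackoff.
final class BackoffAssertSketch {
  private long maxSleepMs = 1000;
  private long timeoutMs = 5000;
  private double backoffFactor = 2;

  static BackoffAssertSketch create() { return new BackoffAssertSketch(); }

  // Each setter returns this, which is what makes the chained call possible.
  BackoffAssertSketch maxSleepMs(long value) { this.maxSleepMs = value; return this; }
  BackoffAssertSketch timeoutMs(long value) { this.timeoutMs = value; return this; }
  BackoffAssertSketch backoffFactor(double value) { this.backoffFactor = value; return this; }

  // Polls the condition until it holds or the timeout elapses, sleeping between
  // polls and growing the sleep by backoffFactor up to maxSleepMs.
  void assertTrue(BooleanSupplier condition, String message) {
    long deadline = System.currentTimeMillis() + timeoutMs;
    long sleepMs = Math.min(10, maxSleepMs);
    while (!condition.getAsBoolean()) {
      if (System.currentTimeMillis() >= deadline) {
        throw new AssertionError("Timed out: " + message);
      }
      try {
        Thread.sleep(sleepMs);
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
        throw new AssertionError("Interrupted: " + message);
      }
      sleepMs = Math.min(maxSleepMs, (long) Math.ceil(sleepMs * backoffFactor));
    }
  }

  public static void main(String[] args) {
    AtomicInteger polls = new AtomicInteger();
    // Chained form: no local variable such as asserter1 is needed.
    BackoffAssertSketch.create().maxSleepMs(50).timeoutMs(2000).backoffFactor(1)
        .assertTrue(() -> polls.incrementAndGet() >= 3, "Waiting for the job to start...");
    System.out.println("polls=" + polls.get()); // prints polls=3
  }
}
```

Compared with the removed `while` loop and `Thread.sleep(1000L)`, a polling helper of this kind bounds the wait with a timeout, so a job that never starts fails the test instead of hanging it.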