[jira] [Work logged] (GOBBLIN-707) combine & standardize all gobblin scripts into one master script & restructure configs accordingly

2019-04-23 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/GOBBLIN-707?focusedWorklogId=231807&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-231807
 ]

ASF GitHub Bot logged work on GOBBLIN-707:
--

Author: ASF GitHub Bot
Created on: 24/Apr/19 00:45
Start Date: 24/Apr/19 00:45
Worklog Time Spent: 10m 
  Work Description: jhsenjaliya commented on issue #2578: [GOBBLIN-707] 
rewrite gobblin script to combine all modes and command
URL: 
https://github.com/apache/incubator-gobblin/pull/2578#issuecomment-486024787
 
 
   @autumnust, I updated the docs and also added new info regarding the usage 
of gobblin.sh. Please take a look when you get a chance. Thanks.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 231807)
Time Spent: 5h 20m  (was: 5h 10m)

> combine & standardize all gobblin scripts into one master script & 
> restructure configs accordingly
> --
>
> Key: GOBBLIN-707
> URL: https://issues.apache.org/jira/browse/GOBBLIN-707
> Project: Apache Gobblin
>  Issue Type: Improvement
>Reporter: Jay Sen
>Priority: Major
>  Time Spent: 5h 20m
>  Remaining Estimate: 0h
>
> Gobblin supports multiple execution modes (CLI, standalone, 
> cluster-master, cluster-worker, AWS, YARN, MR) and various command-line 
> utilities to run CLI and admin commands. There is an individual script for 
> each of them.
> Having individual scripts introduces a lot of issues:
>  # The scripts handle Gobblin variables and user parameters differently, and 
> the handling is highly inconsistent across scripts.
>  # Functionality around start, stop, status checking, and PID handling, among 
> many other things, varies widely with each script's implementation.
>  # Features such as GC and JVM params, log4j file selection, and classpath 
> calculation exist in some Gobblin scripts but not all, adding to an 
> inconsistent user experience.
>  # Maintaining a total of 13 scripts takes too much effort.
> The Gobblin scripts also share a lot of common code to handle params, 
> starting and stopping services, status checks, PID handling, etc. Combining 
> all the scripts into one not only makes maintenance easier but also brings 
> clarity and consistency.
>  
> Solution:
> 1. There can be one gobblin.sh script to handle all Gobblin commands and 
> deployment options, as per the following signature. NOTE: This
> {{gobblin.sh   }}
>  {{gobblin.sh   }}
> {{commands values: admin, cli, statestore-check, statestore-clean, 
> historystore-manager, classpath}}
>  {{service values: standalone, cluster-master, cluster-worker, aws, yarn, mr, 
> service}}
> With the above change, the following become valid commands.
> {code:java}
> # all under GobblinCli class
> gobblin run listQuickApps  -> gobblin cli run listQuickApps
> gobblin run listQuickApps  -> gobblin cli run listQuickApps
> gobblin run  -> gobblin cli run 
> # class: JobStateToJsonConverter
> statestore-checker.sh  -> gobblin statestore-checker 
> # class: StateStoreCleaner
> statestore-clean.sh  -> gobblin statestore-clean 
> # class: DatabaseJobHistoryStoreSchemaManager
> historystore-manager.sh  -> gobblin historystore-manager 
> # class: Cli
> gobblin-admin.sh-> gobblin admin 
> # all gobblin deployment modes
> gobblin-cluster-master.sh   -> gobblin cluster-master start|stop|status
> gobblin-cluster-worker.sh   -> gobblin cluster-master start|stop|status
> gobblin-compaction.sh       -> gobblin cluster-master start|stop|status
> gobblin-env.sh              -> gobblin cluster-master start|stop|status
> gobblin-mapreduce.sh        -> gobblin cluster-master start|stop|status
> gobblin-service.sh          -> gobblin cluster-master start|stop|status
> gobblin-standalone.sh       -> gobblin cluster-master start|stop|status
> gobblin-yarn.sh             -> gobblin cluster-master start|stop|status
> {code}
>  
> 2. Configs also need to be structured and deduplicated accordingly, to make 
> it clear which config will be picked up for which execution mode.
>  
>  {color:#FF}
>  NOTE: this refactoring to gobblin.sh changes the way all Gobblin commands 
> were run before{color}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (GOBBLIN-753) Refactor HiveRegistrationPolicyBase to make ConfigStore object available in extending class

2019-04-23 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/GOBBLIN-753?focusedWorklogId=231809&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-231809
 ]

ASF GitHub Bot logged work on GOBBLIN-753:
--

Author: ASF GitHub Bot
Created on: 24/Apr/19 00:47
Start Date: 24/Apr/19 00:47
Worklog Time Spent: 10m 
  Work Description: autumnust commented on issue #2618: [GOBBLIN-753] 
Refactor HiveRegistrationPolicyBase to surface configStore object
URL: 
https://github.com/apache/incubator-gobblin/pull/2618#issuecomment-486025154
 
 
   @ibuenros @htran1 Can you take a look? Thanks.
 



Issue Time Tracking
---

Worklog Id: (was: 231809)
Time Spent: 20m  (was: 10m)

> Refactor HiveRegistrationPolicyBase to make ConfigStore object available in 
> extending class
> ---
>
> Key: GOBBLIN-753
> URL: https://issues.apache.org/jira/browse/GOBBLIN-753
> Project: Apache Gobblin
>  Issue Type: Improvement
>Reporter: Lei Sun
>Priority: Major
>  Time Spent: 20m
>  Remaining Estimate: 0h
>






[GitHub] [incubator-gobblin] autumnust commented on issue #2618: [GOBBLIN-753] Refactor HiveRegistrationPolicyBase to surface configStore object

2019-04-23 Thread GitBox
autumnust commented on issue #2618: [GOBBLIN-753] Refactor 
HiveRegistrationPolicyBase to surface configStore object
URL: 
https://github.com/apache/incubator-gobblin/pull/2618#issuecomment-486025154
 
 
   @ibuenros @htran1 Can you take a look? Thanks.




With regards,
Apache Git Services


[jira] [Work logged] (GOBBLIN-753) Refactor HiveRegistrationPolicyBase to make ConfigStore object available in extending class

2019-04-23 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/GOBBLIN-753?focusedWorklogId=231808&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-231808
 ]

ASF GitHub Bot logged work on GOBBLIN-753:
--

Author: ASF GitHub Bot
Created on: 24/Apr/19 00:47
Start Date: 24/Apr/19 00:47
Worklog Time Spent: 10m 
  Work Description: autumnust commented on pull request #2618: 
[GOBBLIN-753] Refactor HiveRegistrationPolicyBase to surface configStore object
URL: https://github.com/apache/incubator-gobblin/pull/2618
 
 
   Dear Gobblin maintainers,
   
   Please accept this PR. I understand that it will not be reviewed until I 
have checked off all the steps below!
   
   Some refactoring in `HiveRegistrationPolicyBase` to make the topic-specific 
configStore object available in extending classes
   
   
   ### JIRA
   - [x] My PR addresses the following [Gobblin JIRA]
   - https://issues.apache.org/jira/browse/GOBBLIN-753
   
   
   ### Description
   - [x] Here are some details about my PR, including screenshots (if 
applicable):
   
   
   ### Tests
   - [x] My PR adds the following unit tests __OR__ does not need testing for 
this extremely good reason:
   
   
   ### Commits
   - [x] My commits all reference JIRA issues in their subject lines, and I 
have squashed multiple commits if they address the same issue. In addition, my 
commits follow the guidelines from "[How to write a good git commit 
message](http://chris.beams.io/posts/git-commit/)":
   1. Subject is separated from body by a blank line
   2. Subject is limited to 50 characters
   3. Subject does not end with a period
   4. Subject uses the imperative mood ("add", not "adding")
   5. Body wraps at 72 characters
   6. Body explains "what" and "why", not "how"
   
   
 



Issue Time Tracking
---

Worklog Id: (was: 231808)
Time Spent: 10m
Remaining Estimate: 0h

> Refactor HiveRegistrationPolicyBase to make ConfigStore object available in 
> extending class
> ---
>
> Key: GOBBLIN-753
> URL: https://issues.apache.org/jira/browse/GOBBLIN-753
> Project: Apache Gobblin
>  Issue Type: Improvement
>Reporter: Lei Sun
>Priority: Major
>  Time Spent: 10m
>  Remaining Estimate: 0h
>






[GitHub] [incubator-gobblin] autumnust opened a new pull request #2618: [GOBBLIN-753] Refactor HiveRegistrationPolicyBase to surface configStore object

2019-04-23 Thread GitBox
autumnust opened a new pull request #2618: [GOBBLIN-753] Refactor 
HiveRegistrationPolicyBase to surface configStore object
URL: https://github.com/apache/incubator-gobblin/pull/2618
 
 
   Dear Gobblin maintainers,
   
   Please accept this PR. I understand that it will not be reviewed until I 
have checked off all the steps below!
   
   Some refactoring in `HiveRegistrationPolicyBase` to make the topic-specific 
configStore object available in extending classes
   
   
   ### JIRA
   - [x] My PR addresses the following [Gobblin JIRA]
   - https://issues.apache.org/jira/browse/GOBBLIN-753
   
   
   ### Description
   - [x] Here are some details about my PR, including screenshots (if 
applicable):
   
   
   ### Tests
   - [x] My PR adds the following unit tests __OR__ does not need testing for 
this extremely good reason:
   
   
   ### Commits
   - [x] My commits all reference JIRA issues in their subject lines, and I 
have squashed multiple commits if they address the same issue. In addition, my 
commits follow the guidelines from "[How to write a good git commit 
message](http://chris.beams.io/posts/git-commit/)":
   1. Subject is separated from body by a blank line
   2. Subject is limited to 50 characters
   3. Subject does not end with a period
   4. Subject uses the imperative mood ("add", not "adding")
   5. Body wraps at 72 characters
   6. Body explains "what" and "why", not "how"
   
   




[jira] [Created] (GOBBLIN-753) Refactor HiveRegistrationPolicyBase to make ConfigStore object available in extending class

2019-04-23 Thread Lei Sun (JIRA)
Lei Sun created GOBBLIN-753:
---

 Summary: Refactor HiveRegistrationPolicyBase to make ConfigStore 
object available in extending class
 Key: GOBBLIN-753
 URL: https://issues.apache.org/jira/browse/GOBBLIN-753
 Project: Apache Gobblin
  Issue Type: Improvement
Reporter: Lei Sun








[GitHub] [incubator-gobblin] jhsenjaliya commented on issue #2578: [GOBBLIN-707] rewrite gobblin script to combine all modes and command

2019-04-23 Thread GitBox
jhsenjaliya commented on issue #2578: [GOBBLIN-707] rewrite gobblin script to 
combine all modes and command
URL: 
https://github.com/apache/incubator-gobblin/pull/2578#issuecomment-486024787
 
 
   @autumnust, I updated the docs and also added new info regarding the usage 
of gobblin.sh. Please take a look when you get a chance. Thanks.




[jira] [Work logged] (GOBBLIN-752) Throttling server incorrectly marks permit numbers as unsatisfiable

2019-04-23 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/GOBBLIN-752?focusedWorklogId=231791&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-231791
 ]

ASF GitHub Bot logged work on GOBBLIN-752:
--

Author: ASF GitHub Bot
Created on: 24/Apr/19 00:27
Start Date: 24/Apr/19 00:27
Worklog Time Spent: 10m 
  Work Description: asfgit commented on pull request #2617: [GOBBLIN-752] 
Fix a bug in QPS throttling policy where it was incorrectly indicatin…
URL: https://github.com/apache/incubator-gobblin/pull/2617
 
 
   
 



Issue Time Tracking
---

Worklog Id: (was: 231791)
Time Spent: 20m  (was: 10m)

> Throttling server incorrectly marks permit numbers as unsatisfiable
> ---
>
> Key: GOBBLIN-752
> URL: https://issues.apache.org/jira/browse/GOBBLIN-752
> Project: Apache Gobblin
>  Issue Type: Improvement
>Reporter: Issac Buenrostro
>Assignee: Issac Buenrostro
>Priority: Major
> Fix For: 0.15.0
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>






[jira] [Work logged] (GOBBLIN-746) Loading FlowSpecs asynchronously while initializing GobblinServiceManager

2019-04-23 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/GOBBLIN-746?focusedWorklogId=231787&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-231787
 ]

ASF GitHub Bot logged work on GOBBLIN-746:
--

Author: ASF GitHub Bot
Created on: 24/Apr/19 00:22
Start Date: 24/Apr/19 00:22
Worklog Time Spent: 10m 
  Work Description: sv2000 commented on pull request #2611: [GOBBLIN-746] 
Async loading FlowSpec
URL: https://github.com/apache/incubator-gobblin/pull/2611#discussion_r277917952
 
 

 ##
 File path: 
gobblin-service/src/main/java/org/apache/gobblin/service/modules/scheduler/GobblinServiceJobScheduler.java
 ##
 @@ -136,6 +136,46 @@ public synchronized void setActive(boolean isActive) {
 }
   }
 
+  /**
+   * Load all {@link FlowSpec}s from {@link FlowCatalog} as one of the initialization step,
+   * and make schedulers be aware of that.
+   *
+   */
+  private void scheduleSpecsFromCatalog() {
+Iterator specUris = null;
+long startTime = System.currentTimeMillis();
+
+try {
+  specUris = this.flowCatalog.get().getSpecURIs();
+} catch (SpecSerDeException ssde) {
+  throw new RuntimeException("Failed to get the iterator of all Spec URIS", ssde);
+}
+
+
+try {
+  while (specUris.hasNext()) {
+Spec spec = null;
+try {
+  spec = this.flowCatalog.get().getSpec(specUris.next());
+} catch (SpecNotFoundException snfe) {
+  _log.error(String.format("The URI %s discovered in SpecStore is missing in FlowCatlog"
+  + ", suspecting current modification on SpecStore", specUris.next()), snfe);
+}
+
+//Disable FLOW_RUN_IMMEDIATELY on service startup or leadership change
+if (spec instanceof FlowSpec) {
+  Spec modifiedSpec = disableFlowRunImmediatelyOnStart((FlowSpec) spec);
+  onAddSpec(modifiedSpec);
+} else {
+  onAddSpec(spec);
+}
+  }
+} finally {
+  flowSpecInitFinished.countDown();
 
 Review comment:
   Is this countdown latch being used only in the test case? Your test case is 
waiting for at most 2 secs anyway. Can you simply do an AssertWithBackoff in 
your test case?
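
A minimal sketch of the polling alternative suggested in the review comment above, written in plain Java rather than against Gobblin's own `AssertWithBackoff` helper, whose exact API is not shown in this thread. The class name `BackoffAssert`, the method `awaitTrue`, and the 10 ms starting sleep are all hypothetical.

```java
import java.util.concurrent.TimeoutException;
import java.util.function.BooleanSupplier;

// Hypothetical stand-in for a test-side "assert with backoff" helper.
final class BackoffAssert {
  private BackoffAssert() {}

  /**
   * Polls {@code condition} until it holds or {@code timeoutMs} elapses,
   * doubling the sleep between attempts up to {@code maxSleepMs}.
   */
  static void awaitTrue(BooleanSupplier condition, long timeoutMs, long maxSleepMs)
      throws InterruptedException, TimeoutException {
    long deadline = System.currentTimeMillis() + timeoutMs;
    long sleepMs = 10; // assumed starting interval
    while (!condition.getAsBoolean()) {
      long remaining = deadline - System.currentTimeMillis();
      if (remaining <= 0) {
        throw new TimeoutException("Condition not met within " + timeoutMs + " ms");
      }
      // Never sleep past the deadline.
      Thread.sleep(Math.min(sleepMs, remaining));
      sleepMs = Math.min(sleepMs * 2, maxSleepMs);
    }
  }
}
```

A test could pass a condition that checks whatever observable state the scheduler exposes once the specs are loaded, which would remove the need for a latch that exists only for the test.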
 



Issue Time Tracking
---

Worklog Id: (was: 231787)
Time Spent: 0.5h  (was: 20m)

> Loading FlowSpecs asynchronously while initializing GobblinServiceManager
> -
>
> Key: GOBBLIN-746
> URL: https://issues.apache.org/jira/browse/GOBBLIN-746
> Project: Apache Gobblin
>  Issue Type: Improvement
>Reporter: Lei Sun
>Priority: Major
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>






[jira] [Work logged] (GOBBLIN-751) Make enforced file size matching to be configurable

2019-04-23 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/GOBBLIN-751?focusedWorklogId=231786&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-231786
 ]

ASF GitHub Bot logged work on GOBBLIN-751:
--

Author: ASF GitHub Bot
Created on: 24/Apr/19 00:20
Start Date: 24/Apr/19 00:20
Worklog Time Spent: 10m 
  Work Description: ibuenros commented on issue #2616: [GOBBLIN-751] Make 
enforced file size matching to be configurable
URL: 
https://github.com/apache/incubator-gobblin/pull/2616#issuecomment-486020826
 
 
   +1
 



Issue Time Tracking
---

Worklog Id: (was: 231786)
Time Spent: 50m  (was: 40m)

> Make enforced file size matching to be configurable
> ---
>
> Key: GOBBLIN-751
> URL: https://issues.apache.org/jira/browse/GOBBLIN-751
> Project: Apache Gobblin
>  Issue Type: Improvement
>Reporter: Kuai Yu
>Priority: Major
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> Make enforced file size matching to be configurable





[jira] [Resolved] (GOBBLIN-752) Throttling server incorrectly marks permit numbers as unsatisfiable

2019-04-23 Thread Issac Buenrostro (JIRA)


 [ 
https://issues.apache.org/jira/browse/GOBBLIN-752?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Issac Buenrostro resolved GOBBLIN-752.
--
   Resolution: Fixed
Fix Version/s: 0.15.0

Issue resolved by pull request #2617
[https://github.com/apache/incubator-gobblin/pull/2617]

> Throttling server incorrectly marks permit numbers as unsatisfiable
> ---
>
> Key: GOBBLIN-752
> URL: https://issues.apache.org/jira/browse/GOBBLIN-752
> Project: Apache Gobblin
>  Issue Type: Improvement
>Reporter: Issac Buenrostro
>Assignee: Issac Buenrostro
>Priority: Major
> Fix For: 0.15.0
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>






[GitHub] [incubator-gobblin] asfgit closed pull request #2617: [GOBBLIN-752] Fix a bug in QPS throttling policy where it was incorrectly indicatin…

2019-04-23 Thread GitBox
asfgit closed pull request #2617: [GOBBLIN-752] Fix a bug in QPS throttling 
policy where it was incorrectly indicatin…
URL: https://github.com/apache/incubator-gobblin/pull/2617
 
 
   




[jira] [Work logged] (GOBBLIN-746) Loading FlowSpecs asynchronously while initializing GobblinServiceManager

2019-04-23 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/GOBBLIN-746?focusedWorklogId=231789&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-231789
 ]

ASF GitHub Bot logged work on GOBBLIN-746:
--

Author: ASF GitHub Bot
Created on: 24/Apr/19 00:22
Start Date: 24/Apr/19 00:22
Worklog Time Spent: 10m 
  Work Description: sv2000 commented on pull request #2611: [GOBBLIN-746] 
Async loading FlowSpec
URL: https://github.com/apache/incubator-gobblin/pull/2611#discussion_r277899864
 
 

 ##
 File path: 
gobblin-runtime/src/main/java/org/apache/gobblin/runtime/api/SpecStore.java
 ##
 @@ -105,4 +107,16 @@
* @throws IOException Exception in retrieving {@link Spec}s.
*/
   Collection getSpecs() throws IOException;
+
+  /**
+   * Return an iterator of Spec's URI(Spec's identifier)
 
 Review comment:
   Modify Spec's URI to Spec URIs (Spec identifiers)?
 



Issue Time Tracking
---

Worklog Id: (was: 231789)
Time Spent: 50m  (was: 40m)

> Loading FlowSpecs asynchronously while initializing GobblinServiceManager
> -
>
> Key: GOBBLIN-746
> URL: https://issues.apache.org/jira/browse/GOBBLIN-746
> Project: Apache Gobblin
>  Issue Type: Improvement
>Reporter: Lei Sun
>Priority: Major
>  Time Spent: 50m
>  Remaining Estimate: 0h
>






[jira] [Work logged] (GOBBLIN-746) Loading FlowSpecs asynchronously while initializing GobblinServiceManager

2019-04-23 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/GOBBLIN-746?focusedWorklogId=231788&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-231788
 ]

ASF GitHub Bot logged work on GOBBLIN-746:
--

Author: ASF GitHub Bot
Created on: 24/Apr/19 00:22
Start Date: 24/Apr/19 00:22
Worklog Time Spent: 10m 
  Work Description: sv2000 commented on pull request #2611: [GOBBLIN-746] 
Async loading FlowSpec
URL: https://github.com/apache/incubator-gobblin/pull/2611#discussion_r277892394
 
 

 ##
 File path: 
gobblin-runtime/src/main/java/org/apache/gobblin/runtime/api/SpecSerDeException.java
 ##
 @@ -0,0 +1,52 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.gobblin.runtime.api;
+
+import java.net.URI;
+
+/**
+ * An exception when {@link Spec} cannot be correctly serialized/deserialized from underlying storage.
+ */
+public class SpecSerDeException extends Exception{
 
 Review comment:
   Minor nit. Should there be a space between Exception and "{"?
 



Issue Time Tracking
---

Worklog Id: (was: 231788)
Time Spent: 40m  (was: 0.5h)

> Loading FlowSpecs asynchronously while initializing GobblinServiceManager
> -
>
> Key: GOBBLIN-746
> URL: https://issues.apache.org/jira/browse/GOBBLIN-746
> Project: Apache Gobblin
>  Issue Type: Improvement
>Reporter: Lei Sun
>Priority: Major
>  Time Spent: 40m
>  Remaining Estimate: 0h
>






[jira] [Created] (GOBBLIN-752) Throttling server incorrectly marks permit numbers as unsatisfiable

2019-04-23 Thread Issac Buenrostro (JIRA)
Issac Buenrostro created GOBBLIN-752:


 Summary: Throttling server incorrectly marks permit numbers as 
unsatisfiable
 Key: GOBBLIN-752
 URL: https://issues.apache.org/jira/browse/GOBBLIN-752
 Project: Apache Gobblin
  Issue Type: Improvement
Reporter: Issac Buenrostro
Assignee: Issac Buenrostro








[GitHub] [incubator-gobblin] sv2000 commented on a change in pull request #2611: [GOBBLIN-746] Async loading FlowSpec

2019-04-23 Thread GitBox
sv2000 commented on a change in pull request #2611: [GOBBLIN-746] Async loading 
FlowSpec
URL: https://github.com/apache/incubator-gobblin/pull/2611#discussion_r277892394
 
 

 ##
 File path: 
gobblin-runtime/src/main/java/org/apache/gobblin/runtime/api/SpecSerDeException.java
 ##
 @@ -0,0 +1,52 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.gobblin.runtime.api;
+
+import java.net.URI;
+
+/**
+ * An exception when {@link Spec} cannot be correctly serialized/deserialized from underlying storage.
+ */
+public class SpecSerDeException extends Exception{
 
 Review comment:
   Minor nit. Should there be a space between Exception and "{"?




[GitHub] [incubator-gobblin] sv2000 commented on a change in pull request #2611: [GOBBLIN-746] Async loading FlowSpec

2019-04-23 Thread GitBox
sv2000 commented on a change in pull request #2611: [GOBBLIN-746] Async loading 
FlowSpec
URL: https://github.com/apache/incubator-gobblin/pull/2611#discussion_r277917952
 
 

 ##
 File path: 
gobblin-service/src/main/java/org/apache/gobblin/service/modules/scheduler/GobblinServiceJobScheduler.java
 ##
 @@ -136,6 +136,46 @@ public synchronized void setActive(boolean isActive) {
 }
   }
 
+  /**
+   * Load all {@link FlowSpec}s from {@link FlowCatalog} as one of the initialization step,
+   * and make schedulers be aware of that.
+   *
+   */
+  private void scheduleSpecsFromCatalog() {
+Iterator specUris = null;
+long startTime = System.currentTimeMillis();
+
+try {
+  specUris = this.flowCatalog.get().getSpecURIs();
+} catch (SpecSerDeException ssde) {
+  throw new RuntimeException("Failed to get the iterator of all Spec URIS", ssde);
+}
+
+
+try {
+  while (specUris.hasNext()) {
+Spec spec = null;
+try {
+  spec = this.flowCatalog.get().getSpec(specUris.next());
+} catch (SpecNotFoundException snfe) {
+  _log.error(String.format("The URI %s discovered in SpecStore is missing in FlowCatlog"
+  + ", suspecting current modification on SpecStore", specUris.next()), snfe);
+}
+
+//Disable FLOW_RUN_IMMEDIATELY on service startup or leadership change
+if (spec instanceof FlowSpec) {
+  Spec modifiedSpec = disableFlowRunImmediatelyOnStart((FlowSpec) spec);
+  onAddSpec(modifiedSpec);
+} else {
+  onAddSpec(spec);
+}
+  }
+} finally {
+  flowSpecInitFinished.countDown();
 
 Review comment:
   Is this countdown latch being used only in the test case? Your test case is 
waiting for at most 2 secs anyway. Can you simply do an AssertWithBackoff in 
your test case?
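
Separately from the latch question, the catch block in the hunk above calls `specUris.next()` a second time when building the log message. As written, that appears to log the following URI rather than the one that failed, skip that following spec, and risk a `NoSuchElementException` on the last element; the `else` branch would also pass a null spec to `onAddSpec`. A minimal sketch of reading the iterator once per pass, using plain strings and a map as hypothetical stand-ins for `URI`, `Spec`, and `FlowCatalog`:

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; none of these names come from Gobblin.
final class SpecLoadSketch {
  private SpecLoadSketch() {}

  /**
   * Returns the specs found in {@code catalog}. URIs with no entry are
   * appended to {@code missing} and skipped instead of being passed on.
   */
  static List<String> loadAll(Iterator<String> uris, Map<String, String> catalog,
      List<String> missing) {
    List<String> loaded = new ArrayList<>();
    while (uris.hasNext()) {
      String uri = uris.next(); // advance exactly once per iteration
      String spec = catalog.get(uri);
      if (spec == null) {
        missing.add(uri); // report the same URI that failed to resolve
        continue;         // do not hand a null spec to later steps
      }
      loaded.add(spec);
    }
    return loaded;
  }
}
```

Holding the URI in a local variable keeps the error report and the lookup tied to the same element.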




[GitHub] [incubator-gobblin] sv2000 commented on a change in pull request #2611: [GOBBLIN-746] Async loading FlowSpec

2019-04-23 Thread GitBox
sv2000 commented on a change in pull request #2611: [GOBBLIN-746] Async loading 
FlowSpec
URL: https://github.com/apache/incubator-gobblin/pull/2611#discussion_r277899864
 
 

 ##
 File path: 
gobblin-runtime/src/main/java/org/apache/gobblin/runtime/api/SpecStore.java
 ##
 @@ -105,4 +107,16 @@
* @throws IOException Exception in retrieving {@link Spec}s.
*/
   Collection getSpecs() throws IOException;
+
+  /**
+   * Return an iterator of Spec's URI(Spec's identifier)
 
 Review comment:
   Modify Spec's URI to Spec URIs (Spec identifiers)?




[GitHub] [incubator-gobblin] ibuenros commented on issue #2616: [GOBBLIN-751] Make enforced file size matching to be configurable

2019-04-23 Thread GitBox
ibuenros commented on issue #2616: [GOBBLIN-751] Make enforced file size 
matching to be configurable
URL: 
https://github.com/apache/incubator-gobblin/pull/2616#issuecomment-486020826
 
 
   +1




[jira] [Work logged] (GOBBLIN-751) Make enforced file size matching to be configurable

2019-04-23 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/GOBBLIN-751?focusedWorklogId=231780&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-231780
 ]

ASF GitHub Bot logged work on GOBBLIN-751:
--

Author: ASF GitHub Bot
Created on: 24/Apr/19 00:06
Start Date: 24/Apr/19 00:06
Worklog Time Spent: 10m 
  Work Description: yukuai518 commented on pull request #2616: 
[GOBBLIN-751] Make enforced file size matching to be configurable
URL: https://github.com/apache/incubator-gobblin/pull/2616#discussion_r277915613
 
 

 ##
 File path: 
gobblin-utility/src/main/java/org/apache/gobblin/util/filesystem/DataFileVersionStrategy.java
 ##
 @@ -65,12 +65,13 @@
   }
 
   String DATA_FILE_VERSION_STRATEGY_KEY = "org.apache.gobblin.dataFileVersionStrategy";
+  String DEFAULT_DATA_FILE_VERSION_STAREGY = "modtime";
 
 Review comment:
   Fixed.
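
For context, a minimal sketch of how a key with a default value like the pair in the hunk above is typically resolved, using `java.util.Properties`. How Gobblin itself reads this key is not shown in this thread, so the class `VersionStrategyConfig` and the method `resolveStrategy` are hypothetical, and the constant name is written here as `DEFAULT_DATA_FILE_VERSION_STRATEGY` for readability.

```java
import java.util.Properties;

// Hypothetical illustration of a key-with-default lookup.
final class VersionStrategyConfig {
  static final String DATA_FILE_VERSION_STRATEGY_KEY =
      "org.apache.gobblin.dataFileVersionStrategy";
  static final String DEFAULT_DATA_FILE_VERSION_STRATEGY = "modtime";

  private VersionStrategyConfig() {}

  /** Returns the configured strategy name, or the default when the key is absent. */
  static String resolveStrategy(Properties props) {
    return props.getProperty(DATA_FILE_VERSION_STRATEGY_KEY,
        DEFAULT_DATA_FILE_VERSION_STRATEGY);
  }
}
```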
 



Issue Time Tracking
---

Worklog Id: (was: 231780)
Time Spent: 40m  (was: 0.5h)

> Make enforced file size matching to be configurable
> ---
>
> Key: GOBBLIN-751
> URL: https://issues.apache.org/jira/browse/GOBBLIN-751
> Project: Apache Gobblin
>  Issue Type: Improvement
>Reporter: Kuai Yu
>Priority: Major
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> Make enforced file size matching to be configurable





[jira] [Work logged] (GOBBLIN-751) Make enforced file size matching to be configurable

2019-04-23 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/GOBBLIN-751?focusedWorklogId=231777&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-231777
 ]

ASF GitHub Bot logged work on GOBBLIN-751:
--

Author: ASF GitHub Bot
Created on: 23/Apr/19 23:56
Start Date: 23/Apr/19 23:56
Worklog Time Spent: 10m 
  Work Description: yukuai518 commented on issue #2616: [GOBBLIN-751] Make 
enforced file size matching to be configurable
URL: 
https://github.com/apache/incubator-gobblin/pull/2616#issuecomment-486016085
 
 
   @ibuenros please help review this. This will help us onboard a few datasets 
for some validation while the rest of the datasets are untouched.
 



Issue Time Tracking
---

Worklog Id: (was: 231777)
Time Spent: 20m  (was: 10m)

> Make enforced file size matching to be configurable
> ---
>
> Key: GOBBLIN-751
> URL: https://issues.apache.org/jira/browse/GOBBLIN-751
> Project: Apache Gobblin
>  Issue Type: Improvement
>Reporter: Kuai Yu
>Priority: Major
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Make enforced file size matching to be configurable





[GitHub] [incubator-gobblin] yukuai518 commented on a change in pull request #2616: [GOBBLIN-751] Make enforced file size matching to be configurable

2019-04-23 Thread GitBox
yukuai518 commented on a change in pull request #2616: [GOBBLIN-751] Make 
enforced file size matching to be configurable
URL: https://github.com/apache/incubator-gobblin/pull/2616#discussion_r277915613
 
 

 ##
 File path: 
gobblin-utility/src/main/java/org/apache/gobblin/util/filesystem/DataFileVersionStrategy.java
 ##
 @@ -65,12 +65,13 @@
   }
 
   String DATA_FILE_VERSION_STRATEGY_KEY = 
"org.apache.gobblin.dataFileVersionStrategy";
+  String DEFAULT_DATA_FILE_VERSION_STAREGY = "modtime";
 
 Review comment:
   Fixed.




[jira] [Work logged] (GOBBLIN-751) Make enforced file size matching to be configurable

2019-04-23 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/GOBBLIN-751?focusedWorklogId=231779&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-231779
 ]

ASF GitHub Bot logged work on GOBBLIN-751:
--

Author: ASF GitHub Bot
Created on: 24/Apr/19 00:02
Start Date: 24/Apr/19 00:02
Worklog Time Spent: 10m 
  Work Description: ibuenros commented on pull request #2616: [GOBBLIN-751] 
Make enforced file size matching to be configurable
URL: https://github.com/apache/incubator-gobblin/pull/2616#discussion_r277914931
 
 

 ##
 File path: 
gobblin-utility/src/main/java/org/apache/gobblin/util/filesystem/DataFileVersionStrategy.java
 ##
 @@ -65,12 +65,13 @@
   }
 
   String DATA_FILE_VERSION_STRATEGY_KEY = 
"org.apache.gobblin.dataFileVersionStrategy";
+  String DEFAULT_DATA_FILE_VERSION_STAREGY = "modtime";
 
 Review comment:
   can you fix the spelling of the key name?
 



Issue Time Tracking
---

Worklog Id: (was: 231779)
Time Spent: 0.5h  (was: 20m)

> Make enforced file size matching to be configurable
> ---
>
> Key: GOBBLIN-751
> URL: https://issues.apache.org/jira/browse/GOBBLIN-751
> Project: Apache Gobblin
>  Issue Type: Improvement
>Reporter: Kuai Yu
>Priority: Major
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> Make enforced file size matching to be configurable





[GitHub] [incubator-gobblin] ibuenros commented on a change in pull request #2616: [GOBBLIN-751] Make enforced file size matching to be configurable

2019-04-23 Thread GitBox
ibuenros commented on a change in pull request #2616: [GOBBLIN-751] Make 
enforced file size matching to be configurable
URL: https://github.com/apache/incubator-gobblin/pull/2616#discussion_r277914931
 
 

 ##
 File path: 
gobblin-utility/src/main/java/org/apache/gobblin/util/filesystem/DataFileVersionStrategy.java
 ##
 @@ -65,12 +65,13 @@
   }
 
   String DATA_FILE_VERSION_STRATEGY_KEY = 
"org.apache.gobblin.dataFileVersionStrategy";
+  String DEFAULT_DATA_FILE_VERSION_STAREGY = "modtime";
 
 Review comment:
   can you fix the spelling of the key name?
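
A minimal sketch of the rename being requested, for illustration only. It assumes the corrected constant is spelled DEFAULT_DATA_FILE_VERSION_STRATEGY (the thread only says "Fixed.") and that the default is used as a fallback when the key is unset. The class VersionStrategyDemo, the nested interface Keys, and the helper resolveStrategy are hypothetical names that are not part of the pull request.

```java
import java.util.Properties;

class VersionStrategyDemo {
  // Hypothetical stand-in for the interface touched by the diff; only the
  // two constants under discussion are reproduced here.
  interface Keys {
    String DATA_FILE_VERSION_STRATEGY_KEY = "org.apache.gobblin.dataFileVersionStrategy";
    // Assumed corrected spelling (STRATEGY rather than STAREGY).
    String DEFAULT_DATA_FILE_VERSION_STRATEGY = "modtime";
  }

  // Returns the configured strategy name, falling back to the default
  // when the key is absent (an assumption about how the default is used).
  static String resolveStrategy(Properties props) {
    return props.getProperty(
        Keys.DATA_FILE_VERSION_STRATEGY_KEY,
        Keys.DEFAULT_DATA_FILE_VERSION_STRATEGY);
  }

  public static void main(String[] args) {
    System.out.println(resolveStrategy(new Properties()));
  }
}
```

Keeping the key and its default next to each other makes a misspelled constant easier to spot in review, since both names share the same DATA_FILE_VERSION_STRATEGY stem.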




[jira] [Work logged] (GOBBLIN-751) Make enforced file size matching to be configurable

2019-04-23 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/GOBBLIN-751?focusedWorklogId=231776&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-231776
 ]

ASF GitHub Bot logged work on GOBBLIN-751:
--

Author: ASF GitHub Bot
Created on: 23/Apr/19 23:51
Start Date: 23/Apr/19 23:51
Worklog Time Spent: 10m 
  Work Description: yukuai518 commented on pull request #2616: 
[GOBBLIN-751] Make enforced file size matching to be configurable
URL: https://github.com/apache/incubator-gobblin/pull/2616
 
 
   Make enforced file size matching to be configurable.
   
   Dear Gobblin maintainers,
   
   Please accept this PR. I understand that it will not be reviewed until I 
have checked off all the steps below!
   
   
   ### JIRA
   - [x] My PR addresses the following [Gobblin 
JIRA](https://issues.apache.org/jira/browse/GOBBLIN/) issues and references 
them in the PR title. For example, "[GOBBLIN-XXX] My Gobblin PR"
   - https://issues.apache.org/jira/browse/GOBBLIN-751
   
   
   ### Description
   - [x] Here are some details about my PR, including screenshots (if 
applicable):
  This PR makes 'enforced file size matching' configurable when we 
copy data files.
  This PR also makes the dataFileVersionStrategy configurable for 
different datasets during the publisher phase.
   
   ### Tests
   - [x] My PR adds the following unit tests __OR__ does not need testing for 
this extremely good reason:
   
   
   ### Commits
   - [ ] My commits all reference JIRA issues in their subject lines, and I 
have squashed multiple commits if they address the same issue. In addition, my 
commits follow the guidelines from "[How to write a good git commit 
message](http://chris.beams.io/posts/git-commit/)":
   1. Subject is separated from body by a blank line
   2. Subject is limited to 50 characters
   3. Subject does not end with a period
   4. Subject uses the imperative mood ("add", not "adding")
   5. Body wraps at 72 characters
   6. Body explains "what" and "why", not "how"
   
   
 



Issue Time Tracking
---

Worklog Id: (was: 231776)
Time Spent: 10m
Remaining Estimate: 0h

> Make enforced file size matching to be configurable
> ---
>
> Key: GOBBLIN-751
> URL: https://issues.apache.org/jira/browse/GOBBLIN-751
> Project: Apache Gobblin
>  Issue Type: Improvement
>Reporter: Kuai Yu
>Priority: Major
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Make enforced file size matching to be configurable





[jira] [Work logged] (GOBBLIN-752) Throttling server incorrectly marks permit numbers as unsatisfiable

2019-04-23 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/GOBBLIN-752?focusedWorklogId=231778&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-231778
 ]

ASF GitHub Bot logged work on GOBBLIN-752:
--

Author: ASF GitHub Bot
Created on: 23/Apr/19 23:57
Start Date: 23/Apr/19 23:57
Worklog Time Spent: 10m 
  Work Description: ibuenros commented on pull request #2617: [GOBBLIN-752] 
Fix a bug in QPS throttling policy where it was incorrectly indicatin…
URL: https://github.com/apache/incubator-gobblin/pull/2617
 
 
   …g permits were impossible to satisfy.
   
   Dear Gobblin maintainers,
   
   Please accept this PR. I understand that it will not be reviewed until I 
have checked off all the steps below!
   
   
   ### JIRA
   - [ ] My PR addresses the following [Gobblin 
JIRA](https://issues.apache.org/jira/browse/GOBBLIN/) issues and references 
them in the PR title. For example, "[GOBBLIN-XXX] My Gobblin PR"
   - https://issues.apache.org/jira/browse/GOBBLIN-XXX
   
   
   ### Description
   - [ ] Here are some details about my PR, including screenshots (if 
applicable):
   
   
   ### Tests
   - [ ] My PR adds the following unit tests __OR__ does not need testing for 
this extremely good reason:
   
   
   ### Commits
   - [ ] My commits all reference JIRA issues in their subject lines, and I 
have squashed multiple commits if they address the same issue. In addition, my 
commits follow the guidelines from "[How to write a good git commit 
message](http://chris.beams.io/posts/git-commit/)":
   1. Subject is separated from body by a blank line
   2. Subject is limited to 50 characters
   3. Subject does not end with a period
   4. Subject uses the imperative mood ("add", not "adding")
   5. Body wraps at 72 characters
   6. Body explains "what" and "why", not "how"
   
   
 



Issue Time Tracking
---

Worklog Id: (was: 231778)
Time Spent: 10m
Remaining Estimate: 0h

> Throttling server incorrectly marks permit numbers as unsatisfiable
> ---
>
> Key: GOBBLIN-752
> URL: https://issues.apache.org/jira/browse/GOBBLIN-752
> Project: Apache Gobblin
>  Issue Type: Improvement
>Reporter: Issac Buenrostro
>Assignee: Issac Buenrostro
>Priority: Major
>  Time Spent: 10m
>  Remaining Estimate: 0h
>






[GitHub] [incubator-gobblin] ibuenros opened a new pull request #2617: [GOBBLIN-752] Fix a bug in QPS throttling policy where it was incorrectly indicatin…

2019-04-23 Thread GitBox
ibuenros opened a new pull request #2617: [GOBBLIN-752] Fix a bug in QPS 
throttling policy where it was incorrectly indicatin…
URL: https://github.com/apache/incubator-gobblin/pull/2617
 
 
   …g permits were impossible to satisfy.
   
   Dear Gobblin maintainers,
   
   Please accept this PR. I understand that it will not be reviewed until I 
have checked off all the steps below!
   
   
   ### JIRA
   - [ ] My PR addresses the following [Gobblin 
JIRA](https://issues.apache.org/jira/browse/GOBBLIN/) issues and references 
them in the PR title. For example, "[GOBBLIN-XXX] My Gobblin PR"
   - https://issues.apache.org/jira/browse/GOBBLIN-XXX
   
   
   ### Description
   - [ ] Here are some details about my PR, including screenshots (if 
applicable):
   
   
   ### Tests
   - [ ] My PR adds the following unit tests __OR__ does not need testing for 
this extremely good reason:
   
   
   ### Commits
   - [ ] My commits all reference JIRA issues in their subject lines, and I 
have squashed multiple commits if they address the same issue. In addition, my 
commits follow the guidelines from "[How to write a good git commit 
message](http://chris.beams.io/posts/git-commit/)":
   1. Subject is separated from body by a blank line
   2. Subject is limited to 50 characters
   3. Subject does not end with a period
   4. Subject uses the imperative mood ("add", not "adding")
   5. Body wraps at 72 characters
   6. Body explains "what" and "why", not "how"
   
   




[GitHub] [incubator-gobblin] yukuai518 commented on issue #2616: [GOBBLIN-751] Make enforced file size matching to be configurable

2019-04-23 Thread GitBox
yukuai518 commented on issue #2616: [GOBBLIN-751] Make enforced file size 
matching to be configurable
URL: 
https://github.com/apache/incubator-gobblin/pull/2616#issuecomment-486016085
 
 
   @ibuenros please help review this. This will help us onboard a few datasets 
for some validation while the rest of the datasets are untouched.




[GitHub] [incubator-gobblin] yukuai518 opened a new pull request #2616: [GOBBLIN-751] Make enforced file size matching to be configurable

2019-04-23 Thread GitBox
yukuai518 opened a new pull request #2616: [GOBBLIN-751] Make enforced file 
size matching to be configurable
URL: https://github.com/apache/incubator-gobblin/pull/2616
 
 
   Make enforced file size matching to be configurable.
   
   Dear Gobblin maintainers,
   
   Please accept this PR. I understand that it will not be reviewed until I 
have checked off all the steps below!
   
   
   ### JIRA
   - [x] My PR addresses the following [Gobblin 
JIRA](https://issues.apache.org/jira/browse/GOBBLIN/) issues and references 
them in the PR title. For example, "[GOBBLIN-XXX] My Gobblin PR"
   - https://issues.apache.org/jira/browse/GOBBLIN-751
   
   
   ### Description
   - [x] Here are some details about my PR, including screenshots (if 
applicable):
  This PR makes 'enforced file size matching' configurable when we 
copy data files.
  This PR also makes the dataFileVersionStrategy configurable for 
different datasets during the publisher phase.
   
   ### Tests
   - [x] My PR adds the following unit tests __OR__ does not need testing for 
this extremely good reason:
   
   
   ### Commits
   - [ ] My commits all reference JIRA issues in their subject lines, and I 
have squashed multiple commits if they address the same issue. In addition, my 
commits follow the guidelines from "[How to write a good git commit 
message](http://chris.beams.io/posts/git-commit/)":
   1. Subject is separated from body by a blank line
   2. Subject is limited to 50 characters
   3. Subject does not end with a period
   4. Subject uses the imperative mood ("add", not "adding")
   5. Body wraps at 72 characters
   6. Body explains "what" and "why", not "how"
   
   




[jira] [Created] (GOBBLIN-751) Make enforced file size matching to be configurable

2019-04-23 Thread Kuai Yu (JIRA)
Kuai Yu created GOBBLIN-751:
---

 Summary: Make enforced file size matching to be configurable
 Key: GOBBLIN-751
 URL: https://issues.apache.org/jira/browse/GOBBLIN-751
 Project: Apache Gobblin
  Issue Type: Improvement
Reporter: Kuai Yu


Make enforced file size matching to be configurable





Data Lineage in Gobblin

2019-04-23 Thread Jay Sen
Hi,

I see there is a concept of gathering and storing lineage info in
WorkStates, but I can't find how one can use the lineage info from the
stored state.

Can someone please shed more light on the overall lineage feature?

Thanks
Jay


[GitHub] [incubator-gobblin] jhsenjaliya opened a new pull request #2615: Gobblin 750

2019-04-23 Thread GitBox
jhsenjaliya opened a new pull request #2615: Gobblin 750
URL: https://github.com/apache/incubator-gobblin/pull/2615
 
 
   Dear Gobblin maintainers,
   
   Please accept this PR. I understand that it will not be reviewed until I 
have checked off all the steps below!
   
   
   ### JIRA
   - [ ] My PR addresses the following [Gobblin 
JIRA](https://issues.apache.org/jira/browse/GOBBLIN/) issues and references 
them in the PR title. For example, "[GOBBLIN-XXX] My Gobblin PR"
   - https://issues.apache.org/jira/browse/GOBBLIN-750
   
   
   ### Description
   - [x] Here are some details about my PR, including screenshots (if 
applicable):
   DatasetResolver and DatasetResolverFactory are both marked as deprecated.
   We should remove their usage for the next version, before more such 
resolvers are added for lineage functionality.
   
   ### Tests
   - [x] My PR adds the following unit tests __OR__ does not need testing for 
this extremely good reason:
   Existing test cases are updated with the new class usage, but no 
additional tests are added.
   
   ### Commits
   - [x] My commits all reference JIRA issues in their subject lines, and I 
have squashed multiple commits if they address the same issue. In addition, my 
commits follow the guidelines from "[How to write a good git commit 
message](http://chris.beams.io/posts/git-commit/)":
   1. Subject is separated from body by a blank line
   2. Subject is limited to 50 characters
   3. Subject does not end with a period
   4. Subject uses the imperative mood ("add", not "adding")
   5. Body wraps at 72 characters
   6. Body explains "what" and "why", not "how"
   
   




[jira] [Updated] (GOBBLIN-750) remove usage of depricated DatasetResolver and DatasetResolverFactory

2019-04-23 Thread Jay Sen (JIRA)


 [ 
https://issues.apache.org/jira/browse/GOBBLIN-750?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jay Sen updated GOBBLIN-750:

Description: 
{{DatasetResolver}} and {{DatasetResolverFactory}} are both marked as 
deprecated.

We should remove their usage for the next version, before more such resolvers 
are added for lineage functionality.

  was:
{{DatasetResolver}} and {{DatasetResolverFactory}} are both deprecated.

We should remove their usage for the next version.


> remove usage of depricated DatasetResolver and DatasetResolverFactory
> -
>
> Key: GOBBLIN-750
> URL: https://issues.apache.org/jira/browse/GOBBLIN-750
> Project: Apache Gobblin
>  Issue Type: Improvement
>Affects Versions: 0.15.0
>Reporter: Jay Sen
>Priority: Minor
> Fix For: 0.15.0
>
>
> {{DatasetResolver}} and {{DatasetResolverFactory}} are both marked as 
> deprecated.
> We should remove their usage for the next version, before more such resolvers 
> are added for lineage functionality.





[jira] [Resolved] (GOBBLIN-747) Set expected schema when creating workunits

2019-04-23 Thread Hung Tran (JIRA)


 [ 
https://issues.apache.org/jira/browse/GOBBLIN-747?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hung Tran resolved GOBBLIN-747.
---
   Resolution: Fixed
Fix Version/s: 0.15.0

Issue resolved by pull request #2612
[https://github.com/apache/incubator-gobblin/pull/2612]

> Set expected schema when creating workunits
> ---
>
> Key: GOBBLIN-747
> URL: https://issues.apache.org/jira/browse/GOBBLIN-747
> Project: Apache Gobblin
>  Issue Type: Improvement
>Reporter: Zihan Li
>Priority: Major
> Fix For: 0.15.0
>
>
> Set the property of gobblin.copy.expectedSchema when creating the workunit to 
> enable schema check in distcp.





[jira] [Work logged] (GOBBLIN-747) Set expected schema when creating workunits

2019-04-23 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/GOBBLIN-747?focusedWorklogId=231753&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-231753
 ]

ASF GitHub Bot logged work on GOBBLIN-747:
--

Author: ASF GitHub Bot
Created on: 23/Apr/19 22:51
Start Date: 23/Apr/19 22:51
Worklog Time Spent: 10m 
  Work Description: asfgit commented on pull request #2612: 
[GOBBLIN-747]Check schema
URL: https://github.com/apache/incubator-gobblin/pull/2612
 
 
   
 



Issue Time Tracking
---

Worklog Id: (was: 231753)
Time Spent: 10m
Remaining Estimate: 0h

> Set expected schema when creating workunits
> ---
>
> Key: GOBBLIN-747
> URL: https://issues.apache.org/jira/browse/GOBBLIN-747
> Project: Apache Gobblin
>  Issue Type: Improvement
>Reporter: Zihan Li
>Priority: Major
> Fix For: 0.15.0
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Set the property of gobblin.copy.expectedSchema when creating the workunit to 
> enable schema check in distcp.





[jira] [Created] (GOBBLIN-750) remove usage of depricated DatasetResolver and DatasetResolverFactory

2019-04-23 Thread Jay Sen (JIRA)
Jay Sen created GOBBLIN-750:
---

 Summary: remove usage of depricated DatasetResolver and 
DatasetResolverFactory
 Key: GOBBLIN-750
 URL: https://issues.apache.org/jira/browse/GOBBLIN-750
 Project: Apache Gobblin
  Issue Type: Improvement
Affects Versions: 0.15.0
Reporter: Jay Sen
 Fix For: 0.15.0


{{DatasetResolver}} and {{DatasetResolverFactory}} are both deprecated.

We should remove their usage for the next version.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[GitHub] [incubator-gobblin] asfgit closed pull request #2612: [GOBBLIN-747]Check schema

2019-04-23 Thread GitBox
asfgit closed pull request #2612: [GOBBLIN-747]Check schema
URL: https://github.com/apache/incubator-gobblin/pull/2612
 
 
   




[jira] [Work logged] (GOBBLIN-748) Craftsmanship code cleaning in GaaS

2019-04-23 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/GOBBLIN-748?focusedWorklogId=231744&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-231744
 ]

ASF GitHub Bot logged work on GOBBLIN-748:
--

Author: ASF GitHub Bot
Created on: 23/Apr/19 22:10
Start Date: 23/Apr/19 22:10
Worklog Time Spent: 10m 
  Work Description: sv2000 commented on pull request #2613: 
[GOBBLIN-748]Craftsmanship code cleaning in Gobblin Service Code
URL: https://github.com/apache/incubator-gobblin/pull/2613#discussion_r277525100
 
 

 ##
 File path: 
gobblin-service/src/main/java/org/apache/gobblin/service/modules/orchestration/DagManager.java
 ##
 @@ -357,10 +357,9 @@ private void initialize(Dag dag)
 }
 
 /**
- * Poll the statuses of running jobs.
- * @return List of {@link JobStatus}es.
+ * Proceed the execution of each dag node based on job status.
  */
-private void pollJobStatuses()
+private void proceedDagExecutionOnDagNodeStatus()
 
 Review comment:
   Don't like this name. pollAndAdvanceDag() maybe?
 



Issue Time Tracking
---

Worklog Id: (was: 231744)
Time Spent: 1h 10m  (was: 1h)

> Craftsmanship code cleaning in GaaS 
> 
>
> Key: GOBBLIN-748
> URL: https://issues.apache.org/jira/browse/GOBBLIN-748
> Project: Apache Gobblin
>  Issue Type: Improvement
>Reporter: Lei Sun
>Priority: Major
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>






[GitHub] [incubator-gobblin] sv2000 commented on a change in pull request #2613: [GOBBLIN-748]Craftsmanship code cleaning in Gobblin Service Code

2019-04-23 Thread GitBox
sv2000 commented on a change in pull request #2613: [GOBBLIN-748]Craftsmanship 
code cleaning in Gobblin Service Code
URL: https://github.com/apache/incubator-gobblin/pull/2613#discussion_r277524878
 
 

 ##
 File path: 
gobblin-service/src/main/java/org/apache/gobblin/service/modules/template_catalog/FSFlowTemplateCatalog.java
 ##
 @@ -71,7 +74,7 @@
* @param sysConfig that must contain the fully qualified path of the flow 
template catalog
* @throws IOException
*/
-  public FSFlowCatalog(Config sysConfig)
+  public FSFlowTemplateCatalog(Config sysConfig)
 
 Review comment:
   +1 on this change. Was on my list.




[GitHub] [incubator-gobblin] sv2000 commented on a change in pull request #2613: [GOBBLIN-748]Craftsmanship code cleaning in Gobblin Service Code

2019-04-23 Thread GitBox
sv2000 commented on a change in pull request #2613: [GOBBLIN-748]Craftsmanship 
code cleaning in Gobblin Service Code
URL: https://github.com/apache/incubator-gobblin/pull/2613#discussion_r277525100
 
 

 ##
 File path: 
gobblin-service/src/main/java/org/apache/gobblin/service/modules/orchestration/DagManager.java
 ##
 @@ -357,10 +357,9 @@ private void initialize(Dag dag)
 }
 
 /**
- * Poll the statuses of running jobs.
- * @return List of {@link JobStatus}es.
+ * Proceed the execution of each dag node based on job status.
  */
-private void pollJobStatuses()
+private void proceedDagExecutionOnDagNodeStatus()
 
 Review comment:
   Don't like this name. pollAndAdvanceDag() maybe?




[GitHub] [incubator-gobblin] sv2000 commented on a change in pull request #2613: [GOBBLIN-748]Craftsmanship code cleaning in Gobblin Service Code

2019-04-23 Thread GitBox
sv2000 commented on a change in pull request #2613: [GOBBLIN-748]Craftsmanship 
code cleaning in Gobblin Service Code
URL: https://github.com/apache/incubator-gobblin/pull/2613#discussion_r277525235
 
 

 ##
 File path: 
gobblin-service/src/main/java/org/apache/gobblin/service/modules/template_catalog/FSFlowTemplateCatalog.java
 ##
 @@ -167,4 +159,21 @@ private Config loadHoconFileAtPath(Path filePath, boolean 
allowUnresolved)
   return ConfigFactory.parseReader(new InputStreamReader(is, 
Charsets.UTF_8)).resolve(options);
 }
   }
+
+  /**
+   * Determine if an URI of a jobTemplate or a FlowTemplate is valid.
+   * @param flowURI The given job/flow template
+   * @return true to continue on loading.
 
 Review comment:
   Change to "true if the URI is valid."
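
A sketch of how the Javadoc might read with the suggested wording applied. The method name isValidTemplateUri and the validity rule (a URI counts as valid when it has both a scheme and a non-empty path) are assumptions made for illustration; the diff excerpt does not show the real method signature or its checks.

```java
import java.net.URI;

class TemplateUriSketch {
  /**
   * Determine if a URI of a job template or a flow template is valid.
   * @param flowURI The given job/flow template
   * @return true if the URI is valid.
   */
  static boolean isValidTemplateUri(URI flowURI) {
    // Assumed rule, for illustration only: require a scheme and a non-empty path.
    return flowURI != null
        && flowURI.getScheme() != null
        && flowURI.getPath() != null
        && !flowURI.getPath().isEmpty();
  }

  public static void main(String[] args) {
    System.out.println(isValidTemplateUri(URI.create("fs:///templates/flow.template")));
    System.out.println(isValidTemplateUri(URI.create("templates/flow.template")));
  }
}
```

Describing the return value in terms of what the method checks, rather than what the caller does next, keeps the Javadoc accurate if the method is later reused outside the loading path.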




[jira] [Work logged] (GOBBLIN-748) Craftsmanship code cleaning in GaaS

2019-04-23 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/GOBBLIN-748?focusedWorklogId=231743&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-231743
 ]

ASF GitHub Bot logged work on GOBBLIN-748:
--

Author: ASF GitHub Bot
Created on: 23/Apr/19 22:10
Start Date: 23/Apr/19 22:10
Worklog Time Spent: 10m 
  Work Description: sv2000 commented on pull request #2613: 
[GOBBLIN-748]Craftsmanship code cleaning in Gobblin Service Code
URL: https://github.com/apache/incubator-gobblin/pull/2613#discussion_r277524878
 
 

 ##
 File path: 
gobblin-service/src/main/java/org/apache/gobblin/service/modules/template_catalog/FSFlowTemplateCatalog.java
 ##
 @@ -71,7 +74,7 @@
* @param sysConfig that must contain the fully qualified path of the flow 
template catalog
* @throws IOException
*/
-  public FSFlowCatalog(Config sysConfig)
+  public FSFlowTemplateCatalog(Config sysConfig)
 
 Review comment:
   +1 on this change. Was on my list.
 



Issue Time Tracking
---

Worklog Id: (was: 231743)
Time Spent: 1h  (was: 50m)

> Craftsmanship code cleaning in GaaS 
> 
>
> Key: GOBBLIN-748
> URL: https://issues.apache.org/jira/browse/GOBBLIN-748
> Project: Apache Gobblin
>  Issue Type: Improvement
>Reporter: Lei Sun
>Priority: Major
>  Time Spent: 1h
>  Remaining Estimate: 0h
>






[jira] [Work logged] (GOBBLIN-748) Craftsmanship code cleaning in GaaS

2019-04-23 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/GOBBLIN-748?focusedWorklogId=231742&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-231742
 ]

ASF GitHub Bot logged work on GOBBLIN-748:
--

Author: ASF GitHub Bot
Created on: 23/Apr/19 22:10
Start Date: 23/Apr/19 22:10
Worklog Time Spent: 10m 
  Work Description: sv2000 commented on pull request #2613: 
[GOBBLIN-748]Craftsmanship code cleaning in Gobblin Service Code
URL: https://github.com/apache/incubator-gobblin/pull/2613#discussion_r277525235
 
 

 ##
 File path: 
gobblin-service/src/main/java/org/apache/gobblin/service/modules/template_catalog/FSFlowTemplateCatalog.java
 ##
 @@ -167,4 +159,21 @@ private Config loadHoconFileAtPath(Path filePath, boolean 
allowUnresolved)
   return ConfigFactory.parseReader(new InputStreamReader(is, 
Charsets.UTF_8)).resolve(options);
 }
   }
+
+  /**
+   * Determine if an URI of a jobTemplate or a FlowTemplate is valid.
+   * @param flowURI The given job/flow template
+   * @return true to continue on loading.
 
 Review comment:
   Change to "true if the URI is valid."
 



Issue Time Tracking
---

Worklog Id: (was: 231742)
Time Spent: 50m  (was: 40m)

> Craftsmanship code cleaning in GaaS 
> 
>
> Key: GOBBLIN-748
> URL: https://issues.apache.org/jira/browse/GOBBLIN-748
> Project: Apache Gobblin
>  Issue Type: Improvement
>Reporter: Lei Sun
>Priority: Major
>  Time Spent: 50m
>  Remaining Estimate: 0h
>






[jira] [Resolved] (GOBBLIN-744) Support cancellation of a Helix workflow via a DELETE Spec

2019-04-23 Thread Sudarshan Vasudevan (JIRA)


 [ 
https://issues.apache.org/jira/browse/GOBBLIN-744?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sudarshan Vasudevan resolved GOBBLIN-744.
-
Resolution: Fixed

> Support cancellation of a Helix workflow via a DELETE Spec
> --
>
> Key: GOBBLIN-744
> URL: https://issues.apache.org/jira/browse/GOBBLIN-744
> Project: Apache Gobblin
>  Issue Type: Improvement
>  Components: gobblin-cluster
>Affects Versions: 0.15.0
>Reporter: Sudarshan Vasudevan
>Assignee: Hung Tran
>Priority: Major
> Fix For: 0.15.0
>
>  Time Spent: 6h 10m
>  Remaining Estimate: 0h
>
> This task supports the ability to interrupt and cancel a running job on a 
> Gobblin Helix cluster via a DELETE Spec submitted to the 
> JobConfigurationManager. The DELETE Spec should have 
> "gobblin.cluster.shouldCancelRunningJobOnDelete" set to true for cancelling a 
> running job. The default behavior is to simply delete the corresponding 
> JobSpec from the JobCatalog. 
>  
>  
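As a rough illustration of the description above, the sketch below builds the properties a DELETE Spec might carry. Only the key "gobblin.cluster.shouldCancelRunningJobOnDelete" comes from the issue text; the class and method names are hypothetical, treating an absent key as false is an assumption about how the default is expressed, and attaching the properties to a Spec and submitting it to the JobConfigurationManager is not shown.

```java
import java.util.Properties;

// Hypothetical helper; only the property key below is taken from the issue description.
final class DeleteSpecProps {
  static final String CANCEL_ON_DELETE_KEY = "gobblin.cluster.shouldCancelRunningJobOnDelete";

  // Builds the properties for a DELETE Spec, optionally requesting cancellation.
  static Properties forDelete(boolean cancelRunningJob) {
    Properties props = new Properties();
    if (cancelRunningJob) {
      props.setProperty(CANCEL_ON_DELETE_KEY, Boolean.toString(true));
    }
    return props;
  }

  // Assumption: an absent key means "only delete the JobSpec", i.e. false.
  static boolean shouldCancel(Properties props) {
    return Boolean.parseBoolean(props.getProperty(CANCEL_ON_DELETE_KEY, "false"));
  }
}
```

With this sketch, shouldCancel(forDelete(false)) is false, which mirrors the default behavior described in the issue.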





[jira] [Updated] (GOBBLIN-744) Support cancellation of a Helix workflow via a DELETE Spec

2019-04-23 Thread Sudarshan Vasudevan (JIRA)


 [ 
https://issues.apache.org/jira/browse/GOBBLIN-744?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sudarshan Vasudevan updated GOBBLIN-744:


Issue resolved by PR: [https://github.com/apache/incubator-gobblin/pull/2609]






[jira] [Closed] (GOBBLIN-744) Support cancellation of a Helix workflow via a DELETE Spec

2019-04-23 Thread Sudarshan Vasudevan (JIRA)


 [ 
https://issues.apache.org/jira/browse/GOBBLIN-744?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sudarshan Vasudevan closed GOBBLIN-744.
---






[jira] [Work logged] (GOBBLIN-744) Support cancellation of a Helix workflow via a DELETE Spec

2019-04-23 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/GOBBLIN-744?focusedWorklogId=231723=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-231723
 ]

ASF GitHub Bot logged work on GOBBLIN-744:
--

Author: ASF GitHub Bot
Created on: 23/Apr/19 21:53
Start Date: 23/Apr/19 21:53
Worklog Time Spent: 10m 
  Work Description: asfgit commented on pull request #2609: GOBBLIN-744: 
Support cancellation of a Helix workflow via a DELETE Spec.
URL: https://github.com/apache/incubator-gobblin/pull/2609
 
 
   
 



Issue Time Tracking
---

Worklog Id: (was: 231723)
Time Spent: 6h 10m  (was: 6h)

> Support cancellation of a Helix workflow via a DELETE Spec
> --
>
> Key: GOBBLIN-744
> URL: https://issues.apache.org/jira/browse/GOBBLIN-744
> Project: Apache Gobblin
>  Issue Type: Improvement
>  Components: gobblin-cluster
>Affects Versions: 0.15.0
>Reporter: Sudarshan Vasudevan
>Assignee: Hung Tran
>Priority: Major
> Fix For: 0.15.0
>
>  Time Spent: 6h 10m
>  Remaining Estimate: 0h
>
> This task supports the ability to interrupt and cancel a running job on a 
> Gobblin Helix cluster via a DELETE Spec submitted to the 
> JobConfigurationManager. The DELETE Spec should have 
> "gobblin.cluster.shouldCancelRunningJobOnDelete" set to true for cancelling a 
> running job. The default behavior is to simply delete the corresponding 
> JobSpec from the JobCatalog. 
>  
>  





[GitHub] [incubator-gobblin] asfgit closed pull request #2609: GOBBLIN-744: Support cancellation of a Helix workflow via a DELETE Spec.

2019-04-23 Thread GitBox
asfgit closed pull request #2609: GOBBLIN-744: Support cancellation of a Helix 
workflow via a DELETE Spec.
URL: https://github.com/apache/incubator-gobblin/pull/2609
 
 
   




With regards,
Apache Git Services


[jira] [Work logged] (GOBBLIN-744) Support cancellation of a Helix workflow via a DELETE Spec

2019-04-23 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/GOBBLIN-744?focusedWorklogId=231637=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-231637
 ]

ASF GitHub Bot logged work on GOBBLIN-744:
--

Author: ASF GitHub Bot
Created on: 23/Apr/19 18:55
Start Date: 23/Apr/19 18:55
Worklog Time Spent: 10m 
  Work Description: htran1 commented on issue #2609: GOBBLIN-744: Support 
cancellation of a Helix workflow via a DELETE Spec.
URL: 
https://github.com/apache/incubator-gobblin/pull/2609#issuecomment-485930900
 
 
   +1
 



Issue Time Tracking
---

Worklog Id: (was: 231637)
Time Spent: 6h  (was: 5h 50m)

> Support cancellation of a Helix workflow via a DELETE Spec
> --
>
> Key: GOBBLIN-744
> URL: https://issues.apache.org/jira/browse/GOBBLIN-744
> Project: Apache Gobblin
>  Issue Type: Improvement
>  Components: gobblin-cluster
>Affects Versions: 0.15.0
>Reporter: Sudarshan Vasudevan
>Assignee: Hung Tran
>Priority: Major
> Fix For: 0.15.0
>
>  Time Spent: 6h
>  Remaining Estimate: 0h
>
> This task supports the ability to interrupt and cancel a running job on a 
> Gobblin Helix cluster via a DELETE Spec submitted to the 
> JobConfigurationManager. The DELETE Spec should have 
> "gobblin.cluster.shouldCancelRunningJobOnDelete" set to true for cancelling a 
> running job. The default behavior is to simply delete the corresponding 
> JobSpec from the JobCatalog. 
>  
>  





[GitHub] [incubator-gobblin] htran1 commented on issue #2609: GOBBLIN-744: Support cancellation of a Helix workflow via a DELETE Spec.

2019-04-23 Thread GitBox
htran1 commented on issue #2609: GOBBLIN-744: Support cancellation of a Helix 
workflow via a DELETE Spec.
URL: 
https://github.com/apache/incubator-gobblin/pull/2609#issuecomment-485930900
 
 
   +1




With regards,
Apache Git Services


[GitHub] [incubator-gobblin] ibuenros commented on issue #2582: UnitTest for KafkaSource

2019-04-23 Thread GitBox
ibuenros commented on issue #2582: UnitTest for KafkaSource
URL: 
https://github.com/apache/incubator-gobblin/pull/2582#issuecomment-485887447
 
 
   +1




With regards,
Apache Git Services


[GitHub] [incubator-gobblin] ibuenros commented on a change in pull request #2612: Check schema

2019-04-23 Thread GitBox
ibuenros commented on a change in pull request #2612: Check schema
URL: https://github.com/apache/incubator-gobblin/pull/2612#discussion_r26259
 
 

 ##
 File path: 
gobblin-data-management/src/main/java/org/apache/gobblin/data/management/copy/CopySource.java
 ##
 @@ -357,6 +358,9 @@ public Void call() {
 
   WorkUnit workUnit = new WorkUnit(extract);
   workUnit.addAll(this.state);
+  if(this.copyableDataset instanceof ConfigBasedDataset) {
 
 Review comment:
   Do you also want to check that the expected schema is not null?
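The combined check the review comment asks about could look roughly like the sketch below. All types and the getExpectedSchema() accessor are hypothetical stand-ins for illustration, not the Gobblin classes in the diff; the point is only that the type test and the null test can be made together before the schema is used.

```java
// Hypothetical types for illustration; not the Gobblin classes from the diff.
interface DatasetLike {}

final class SchemaAwareDataset implements DatasetLike {
  private final String expectedSchema; // may be null when no schema was configured

  SchemaAwareDataset(String expectedSchema) {
    this.expectedSchema = expectedSchema;
  }

  String getExpectedSchema() {
    return expectedSchema;
  }
}

final class SchemaCheck {
  // True only when the dataset is schema-aware AND actually carries a schema.
  static boolean hasExpectedSchema(DatasetLike dataset) {
    return dataset instanceof SchemaAwareDataset
        && ((SchemaAwareDataset) dataset).getExpectedSchema() != null;
  }
}
```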




With regards,
Apache Git Services


[jira] [Work logged] (GOBBLIN-744) Support cancellation of a Helix workflow via a DELETE Spec

2019-04-23 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/GOBBLIN-744?focusedWorklogId=231462=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-231462
 ]

ASF GitHub Bot logged work on GOBBLIN-744:
--

Author: ASF GitHub Bot
Created on: 23/Apr/19 15:33
Start Date: 23/Apr/19 15:33
Worklog Time Spent: 10m 
  Work Description: shirshanka commented on pull request #2609: 
GOBBLIN-744: Support cancellation of a Helix workflow via a DELETE Spec.
URL: https://github.com/apache/incubator-gobblin/pull/2609#discussion_r277740598
 
 

 ##
 File path: 
gobblin-cluster/src/test/java/org/apache/gobblin/cluster/ClusterIntegrationTest.java
 ##
 @@ -82,6 +101,105 @@ public void testJobShouldComplete()
 suite.waitForAndVerifyOutputFiles();
   }
 
+  /**
+   * An integration test for restarting a Helix workflow via a JobSpec. This 
test case starts a Helix cluster with
+   * a {@link FsScheduledJobConfigurationManager}. The test case does the 
following:
+   * 
+   *add a {@link org.apache.gobblin.runtime.api.JobSpec} that uses a 
{@link org.apache.gobblin.cluster.SleepingCustomTaskSource})
+   *   to {@link IntegrationJobRestartViaSpecSuite#FS_SPEC_CONSUMER_DIR}.  
which is picked by the JobConfigurationManager. 
+   *the JobConfigurationManager sends a notification to the 
GobblinHelixJobScheduler which schedules the job for execution. The JobSpec is
+   *   also added to the JobCatalog for persistence. Helix starts a Workflow 
for this JobSpec. 
+   *We then add a {@link org.apache.gobblin.runtime.api.JobSpec} with 
UPDATE Verb to {@link IntegrationJobRestartViaSpecSuite#FS_SPEC_CONSUMER_DIR}.
+   *   This signals GobblinHelixJobScheduler (and, Helix) to first cancel the 
running job (i.e., Helix Workflow) started in the previous step.
+   *We inspect the state of the zNode corresponding to the Workflow 
resource in Zookeeper to ensure that its {@link 
org.apache.helix.task.TargetState}
+   *   is STOP. 
+   *Once the cancelled job from the previous steps is completed, the 
job will be re-launched for execution by the GobblinHelixJobScheduler.
+   *   We confirm the execution by again inspecting the zNode and ensuring its 
TargetState is START. 
+   * 
+   */
+  @Test (dependsOnMethods = { "testJobShouldGetCancelled" })
+  public void testJobRestartViaSpec() throws Exception {
+this.suite = new IntegrationJobRestartViaSpecSuite();
+HelixManager helixManager = getHelixManager();
+
+IntegrationJobRestartViaSpecSuite restartViaSpecSuite = 
(IntegrationJobRestartViaSpecSuite) this.suite;
+
+//Add a new JobSpec to the path monitored by the SpecConsumer
+restartViaSpecSuite.addJobSpec(IntegrationJobRestartViaSpecSuite.JOB_ID, 
SpecExecutor.Verb.ADD.name());
+
+//Start the cluster
+restartViaSpecSuite.startCluster();
+
+helixManager.connect();
+
+AssertWithBackoff asserter1 = 
AssertWithBackoff.create().timeoutMs(3).maxSleepMs(1000).backoffFactor(1);
+asserter1.assertTrue(isTaskStarted(helixManager, 
IntegrationJobRestartViaSpecSuite.JOB_ID),
+"Waiting for the job to start...");
+
+AssertWithBackoff asserter2 = 
AssertWithBackoff.create().maxSleepMs(100).timeoutMs(2000).backoffFactor(1);
+
asserter2.assertTrue(isTaskRunning(IntegrationJobRestartViaSpecSuite.TASK_STATE_FILE),"Waiting
 for the task to enter running state");
+
+ZkClient zkClient = new ZkClient(this.zkConnectString);
+PathBasedZkSerializer zkSerializer = ChainedPathZkSerializer.builder(new 
ZNRecordStreamingSerializer()).build();
+zkClient.setZkSerializer(zkSerializer);
+
+String clusterName = getHelixManager().getClusterName();
+String zNodePath = Paths.get("/", clusterName, "CONFIGS", "RESOURCE", 
IntegrationJobRestartViaSpecSuite.JOB_ID).toString();
+
+//Ensure that the Workflow is started
+ZNRecord record = zkClient.readData(zNodePath);
+String targetState = record.getSimpleField("TargetState");
+Assert.assertEquals(targetState, TargetState.START.name());
+
+//Add a JobSpec with UPDATE verb signalling the Helix cluster to restart 
the workflow
+restartViaSpecSuite.addJobSpec(IntegrationJobRestartViaSpecSuite.JOB_ID, 
SpecExecutor.Verb.UPDATE.name());
+
+AssertWithBackoff asserter3 = 
AssertWithBackoff.create().maxSleepMs(1000).timeoutMs(5000).backoffFactor(1);
+asserter3.assertTrue(input -> {
+  //Inspect the zNode at the path corresponding to the Workflow resource. 
Ensure the target state of the resource is in
+  // the STOP state or that the zNode has been deleted.
+  ZNRecord recordNew = zkClient.readData(zNodePath, true);
+  String targetStateNew = null;
+  if (recordNew != null) {
+targetStateNew = recordNew.getSimpleField("TargetState");
+  }
+  return recordNew == null || 
targetStateNew.equals(TargetState.STOP.name());
+}, "Waiting for Workflow TargetState to be 

[jira] [Resolved] (GOBBLIN-749) Better access logging for throttling server

2019-04-23 Thread Issac Buenrostro (JIRA)


 [ 
https://issues.apache.org/jira/browse/GOBBLIN-749?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Issac Buenrostro resolved GOBBLIN-749.
--
   Resolution: Fixed
Fix Version/s: 0.15.0

Issue resolved by pull request #2614
[https://github.com/apache/incubator-gobblin/pull/2614]

> Better access logging for throttling server
> ---
>
> Key: GOBBLIN-749
> URL: https://issues.apache.org/jira/browse/GOBBLIN-749
> Project: Apache Gobblin
>  Issue Type: Improvement
>Reporter: Issac Buenrostro
>Assignee: Issac Buenrostro
>Priority: Major
> Fix For: 0.15.0
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>






[jira] [Work logged] (GOBBLIN-749) Better access logging for throttling server

2019-04-23 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/GOBBLIN-749?focusedWorklogId=231489=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-231489
 ]

ASF GitHub Bot logged work on GOBBLIN-749:
--

Author: ASF GitHub Bot
Created on: 23/Apr/19 16:19
Start Date: 23/Apr/19 16:19
Worklog Time Spent: 10m 
  Work Description: asfgit commented on pull request #2614: [GOBBLIN-749] 
Add logging to limiter server.
URL: https://github.com/apache/incubator-gobblin/pull/2614
 
 
   
 



Issue Time Tracking
---

Worklog Id: (was: 231489)
Time Spent: 20m  (was: 10m)

> Better access logging for throttling server
> ---
>
> Key: GOBBLIN-749
> URL: https://issues.apache.org/jira/browse/GOBBLIN-749
> Project: Apache Gobblin
>  Issue Type: Improvement
>Reporter: Issac Buenrostro
>Assignee: Issac Buenrostro
>Priority: Major
> Fix For: 0.15.0
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>






[GitHub] [incubator-gobblin] asfgit closed pull request #2614: [GOBBLIN-749] Add logging to limiter server.

2019-04-23 Thread GitBox
asfgit closed pull request #2614: [GOBBLIN-749] Add logging to limiter server.
URL: https://github.com/apache/incubator-gobblin/pull/2614
 
 
   




With regards,
Apache Git Services


[jira] [Work logged] (GOBBLIN-744) Support cancellation of a Helix workflow via a DELETE Spec

2019-04-23 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/GOBBLIN-744?focusedWorklogId=231456=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-231456
 ]

ASF GitHub Bot logged work on GOBBLIN-744:
--

Author: ASF GitHub Bot
Created on: 23/Apr/19 15:26
Start Date: 23/Apr/19 15:26
Worklog Time Spent: 10m 
  Work Description: shirshanka commented on pull request #2609: 
GOBBLIN-744: Support cancellation of a Helix workflow via a DELETE Spec.
URL: https://github.com/apache/incubator-gobblin/pull/2609#discussion_r277737062
 
 

 ##
 File path: 
gobblin-cluster/src/main/java/org/apache/gobblin/cluster/SleepingTask.java
 ##
 @@ -17,24 +17,52 @@
 
 package org.apache.gobblin.cluster;
 
+import java.io.File;
+import java.io.IOException;
+
+import com.google.common.io.Files;
+
 import avro.shaded.com.google.common.base.Throwables;
 
 Review comment:
   spotted a faulty import from before
 



Issue Time Tracking
---

Worklog Id: (was: 231456)
Time Spent: 5h 20m  (was: 5h 10m)

> Support cancellation of a Helix workflow via a DELETE Spec
> --
>
> Key: GOBBLIN-744
> URL: https://issues.apache.org/jira/browse/GOBBLIN-744
> Project: Apache Gobblin
>  Issue Type: Improvement
>  Components: gobblin-cluster
>Affects Versions: 0.15.0
>Reporter: Sudarshan Vasudevan
>Assignee: Hung Tran
>Priority: Major
> Fix For: 0.15.0
>
>  Time Spent: 5h 20m
>  Remaining Estimate: 0h
>
> This task supports the ability to interrupt and cancel a running job on a 
> Gobblin Helix cluster via a DELETE Spec submitted to the 
> JobConfigurationManager. The DELETE Spec should have 
> "gobblin.cluster.shouldCancelRunningJobOnDelete" set to true for cancelling a 
> running job. The default behavior is to simply delete the corresponding 
> JobSpec from the JobCatalog. 
>  
>  





[jira] [Work logged] (GOBBLIN-744) Support cancellation of a Helix workflow via a DELETE Spec

2019-04-23 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/GOBBLIN-744?focusedWorklogId=231464=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-231464
 ]

ASF GitHub Bot logged work on GOBBLIN-744:
--

Author: ASF GitHub Bot
Created on: 23/Apr/19 15:34
Start Date: 23/Apr/19 15:34
Worklog Time Spent: 10m 
  Work Description: shirshanka commented on pull request #2609: 
GOBBLIN-744: Support cancellation of a Helix workflow via a DELETE Spec.
URL: https://github.com/apache/incubator-gobblin/pull/2609#discussion_r277741264
 
 

 ##
 File path: 
gobblin-runtime/src/main/java/org/apache/gobblin/runtime/api/FsSpecConsumer.java
 ##
 @@ -74,6 +78,8 @@ public FsSpecConsumer(Config config) {
   return null;
 }
 
+Arrays.sort(fileStatuses, Comparator.comparingLong(FileStatus::getModificationTime));
 
 Review comment:
   add a comment for why you're doing it and what you're expecting (ascending 
sort?)
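To answer the ordering question concretely: Comparator.comparingLong sorts in ascending order of the extracted key, so the line under review puts the oldest file first. The sketch below shows this with a hypothetical FileEntry class standing in for Hadoop's FileStatus (only getModificationTime() is modeled); the reason given in the code comment is a guess at the intent, not something stated in the pull request.

```java
import java.util.Arrays;
import java.util.Comparator;

// Hypothetical stand-in for Hadoop's FileStatus; only the modification time is modeled.
final class FileEntry {
  private final String name;
  private final long modificationTime;

  FileEntry(String name, long modificationTime) {
    this.name = name;
    this.modificationTime = modificationTime;
  }

  String getName() {
    return name;
  }

  long getModificationTime() {
    return modificationTime;
  }
}

final class SortByModTime {
  // Returns the file names ordered oldest first (ascending modification time).
  // Assumed intent: consume specs in the order in which they were written.
  static String[] namesOldestFirst(FileEntry[] entries) {
    FileEntry[] sorted = entries.clone(); // leave the caller's array untouched
    Arrays.sort(sorted, Comparator.comparingLong(FileEntry::getModificationTime));
    String[] names = new String[sorted.length];
    for (int i = 0; i < sorted.length; i++) {
      names[i] = sorted[i].getName();
    }
    return names;
  }
}
```

Appending .reversed() to the comparator would give newest first instead.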
 



Issue Time Tracking
---

Worklog Id: (was: 231464)
Time Spent: 5h 50m  (was: 5h 40m)

> Support cancellation of a Helix workflow via a DELETE Spec
> --
>
> Key: GOBBLIN-744
> URL: https://issues.apache.org/jira/browse/GOBBLIN-744
> Project: Apache Gobblin
>  Issue Type: Improvement
>  Components: gobblin-cluster
>Affects Versions: 0.15.0
>Reporter: Sudarshan Vasudevan
>Assignee: Hung Tran
>Priority: Major
> Fix For: 0.15.0
>
>  Time Spent: 5h 50m
>  Remaining Estimate: 0h
>
> This task supports the ability to interrupt and cancel a running job on a 
> Gobblin Helix cluster via a DELETE Spec submitted to the 
> JobConfigurationManager. The DELETE Spec should have 
> "gobblin.cluster.shouldCancelRunningJobOnDelete" set to true for cancelling a 
> running job. The default behavior is to simply delete the corresponding 
> JobSpec from the JobCatalog. 
>  
>  





[GitHub] [incubator-gobblin] shirshanka commented on a change in pull request #2609: GOBBLIN-744: Support cancellation of a Helix workflow via a DELETE Spec.

2019-04-23 Thread GitBox
shirshanka commented on a change in pull request #2609: GOBBLIN-744: Support 
cancellation of a Helix workflow via a DELETE Spec.
URL: https://github.com/apache/incubator-gobblin/pull/2609#discussion_r277741264
 
 

 ##
 File path: 
gobblin-runtime/src/main/java/org/apache/gobblin/runtime/api/FsSpecConsumer.java
 ##
 @@ -74,6 +78,8 @@ public FsSpecConsumer(Config config) {
   return null;
 }
 
+Arrays.sort(fileStatuses, Comparator.comparingLong(FileStatus::getModificationTime));
 
 Review comment:
   add a comment for why you're doing it and what you're expecting (ascending 
sort?)




With regards,
Apache Git Services


[GitHub] [incubator-gobblin] shirshanka commented on a change in pull request #2609: GOBBLIN-744: Support cancellation of a Helix workflow via a DELETE Spec.

2019-04-23 Thread GitBox
shirshanka commented on a change in pull request #2609: GOBBLIN-744: Support 
cancellation of a Helix workflow via a DELETE Spec.
URL: https://github.com/apache/incubator-gobblin/pull/2609#discussion_r277740598
 
 

 ##
 File path: 
gobblin-cluster/src/test/java/org/apache/gobblin/cluster/ClusterIntegrationTest.java
 ##
 @@ -82,6 +101,105 @@ public void testJobShouldComplete()
 suite.waitForAndVerifyOutputFiles();
   }
 
+  /**
+   * An integration test for restarting a Helix workflow via a JobSpec. This 
test case starts a Helix cluster with
+   * a {@link FsScheduledJobConfigurationManager}. The test case does the 
following:
+   * 
+   *add a {@link org.apache.gobblin.runtime.api.JobSpec} that uses a 
{@link org.apache.gobblin.cluster.SleepingCustomTaskSource})
+   *   to {@link IntegrationJobRestartViaSpecSuite#FS_SPEC_CONSUMER_DIR}.  
which is picked by the JobConfigurationManager. 
+   *the JobConfigurationManager sends a notification to the 
GobblinHelixJobScheduler which schedules the job for execution. The JobSpec is
+   *   also added to the JobCatalog for persistence. Helix starts a Workflow 
for this JobSpec. 
+   *We then add a {@link org.apache.gobblin.runtime.api.JobSpec} with 
UPDATE Verb to {@link IntegrationJobRestartViaSpecSuite#FS_SPEC_CONSUMER_DIR}.
+   *   This signals GobblinHelixJobScheduler (and, Helix) to first cancel the 
running job (i.e., Helix Workflow) started in the previous step.
+   *We inspect the state of the zNode corresponding to the Workflow 
resource in Zookeeper to ensure that its {@link 
org.apache.helix.task.TargetState}
+   *   is STOP. 
+   *Once the cancelled job from the previous steps is completed, the 
job will be re-launched for execution by the GobblinHelixJobScheduler.
+   *   We confirm the execution by again inspecting the zNode and ensuring its 
TargetState is START. 
+   * 
+   */
+  @Test (dependsOnMethods = { "testJobShouldGetCancelled" })
+  public void testJobRestartViaSpec() throws Exception {
+this.suite = new IntegrationJobRestartViaSpecSuite();
+HelixManager helixManager = getHelixManager();
+
+IntegrationJobRestartViaSpecSuite restartViaSpecSuite = 
(IntegrationJobRestartViaSpecSuite) this.suite;
+
+//Add a new JobSpec to the path monitored by the SpecConsumer
+restartViaSpecSuite.addJobSpec(IntegrationJobRestartViaSpecSuite.JOB_ID, 
SpecExecutor.Verb.ADD.name());
+
+//Start the cluster
+restartViaSpecSuite.startCluster();
+
+helixManager.connect();
+
+AssertWithBackoff asserter1 = 
AssertWithBackoff.create().timeoutMs(3).maxSleepMs(1000).backoffFactor(1);
+asserter1.assertTrue(isTaskStarted(helixManager, 
IntegrationJobRestartViaSpecSuite.JOB_ID),
+"Waiting for the job to start...");
+
+AssertWithBackoff asserter2 = 
AssertWithBackoff.create().maxSleepMs(100).timeoutMs(2000).backoffFactor(1);
+
asserter2.assertTrue(isTaskRunning(IntegrationJobRestartViaSpecSuite.TASK_STATE_FILE),"Waiting
 for the task to enter running state");
+
+ZkClient zkClient = new ZkClient(this.zkConnectString);
+PathBasedZkSerializer zkSerializer = ChainedPathZkSerializer.builder(new 
ZNRecordStreamingSerializer()).build();
+zkClient.setZkSerializer(zkSerializer);
+
+String clusterName = getHelixManager().getClusterName();
+String zNodePath = Paths.get("/", clusterName, "CONFIGS", "RESOURCE", 
IntegrationJobRestartViaSpecSuite.JOB_ID).toString();
+
+//Ensure that the Workflow is started
+ZNRecord record = zkClient.readData(zNodePath);
+String targetState = record.getSimpleField("TargetState");
+Assert.assertEquals(targetState, TargetState.START.name());
+
+//Add a JobSpec with UPDATE verb signalling the Helix cluster to restart 
the workflow
+restartViaSpecSuite.addJobSpec(IntegrationJobRestartViaSpecSuite.JOB_ID, 
SpecExecutor.Verb.UPDATE.name());
+
+AssertWithBackoff asserter3 = 
AssertWithBackoff.create().maxSleepMs(1000).timeoutMs(5000).backoffFactor(1);
+asserter3.assertTrue(input -> {
+  //Inspect the zNode at the path corresponding to the Workflow resource. 
Ensure the target state of the resource is in
+  // the STOP state or that the zNode has been deleted.
+  ZNRecord recordNew = zkClient.readData(zNodePath, true);
+  String targetStateNew = null;
+  if (recordNew != null) {
+targetStateNew = recordNew.getSimpleField("TargetState");
+  }
+  return recordNew == null || 
targetStateNew.equals(TargetState.STOP.name());
+}, "Waiting for Workflow TargetState to be STOP");
+
+//Ensure that the SleepingTask did not terminate normally i.e. it was 
interrupted. We check this by ensuring
+// that the line "Hello World!" is not present in the logged output.
+suite.waitForAndVerifyOutputFiles();
+
+AssertWithBackoff asserter4 = 
AssertWithBackoff.create().maxSleepMs(1000).timeoutMs(12).backoffFactor(1);
+asserter4.assertTrue(input -> {
+  //Inspect the 

[jira] [Work logged] (GOBBLIN-744) Support cancellation of a Helix workflow via a DELETE Spec

2019-04-23 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/GOBBLIN-744?focusedWorklogId=231461=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-231461
 ]

ASF GitHub Bot logged work on GOBBLIN-744:
--

Author: ASF GitHub Bot
Created on: 23/Apr/19 15:31
Start Date: 23/Apr/19 15:31
Worklog Time Spent: 10m 
  Work Description: shirshanka commented on pull request #2609: 
GOBBLIN-744: Support cancellation of a Helix workflow via a DELETE Spec.
URL: https://github.com/apache/incubator-gobblin/pull/2609#discussion_r277739662
 
 

 ##
 File path: 
gobblin-cluster/src/test/java/org/apache/gobblin/cluster/ClusterIntegrationTest.java
 ##
 @@ -51,28 +68,30 @@ public void testJobShouldComplete()
 runAndVerify();
   }
 
-  @Test void testJobShouldGetCancelled() throws Exception {
-this.suite =new IntegrationJobCancelSuite();
+  private HelixManager getHelixManager() {
 Config helixConfig = this.suite.getManagerConfig();
 String clusterName = helixConfig.getString(GobblinClusterConfigurationKeys.HELIX_CLUSTER_NAME_KEY);
 String instanceName = ConfigUtils.getString(helixConfig, GobblinClusterConfigurationKeys.HELIX_INSTANCE_NAME_KEY,
 GobblinClusterManager.class.getSimpleName());
-String zkConnectString = helixConfig.getString(GobblinClusterConfigurationKeys.ZK_CONNECTION_STRING_KEY);
+this.zkConnectString = helixConfig.getString(GobblinClusterConfigurationKeys.ZK_CONNECTION_STRING_KEY);
 HelixManager helixManager = HelixManagerFactory.getZKHelixManager(clusterName, instanceName, InstanceType.CONTROLLER, zkConnectString);
+return helixManager;
+  }
 
+  @Test void testJobShouldGetCancelled() throws Exception {
+this.suite =new IntegrationJobCancelSuite();
+HelixManager helixManager = getHelixManager();
 suite.startCluster();
-
 helixManager.connect();
 
 TaskDriver taskDriver = new TaskDriver(helixManager);
 
-while (TaskDriver.getWorkflowContext(helixManager, IntegrationJobCancelSuite.JOB_ID) == null) {
-  log.warn("Waiting for the job to start...");
-  Thread.sleep(1000L);
-}
+AssertWithBackoff asserter1 = AssertWithBackoff.create().maxSleepMs(1000).backoffFactor(1);
+asserter1.assertTrue(isTaskStarted(helixManager, IntegrationJobCancelSuite.JOB_ID),
 
 Review comment:
   you could chain the entire call without needing the local variable asserter1
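The chaining suggested here can be sketched as follows. PollAssert is a hypothetical, much simplified fluent helper written only for this example; it is not Gobblin's AssertWithBackoff, and it sleeps a fixed interval instead of backing off. What it shows is that when every configuration method returns the builder, the assertion can be invoked at the end of the chain without an intermediate local variable.

```java
import java.util.function.BooleanSupplier;

// Hypothetical fluent helper standing in for AssertWithBackoff; not the Gobblin class.
final class PollAssert {
  private long maxSleepMs = 1000L;
  private long timeoutMs = 5000L;

  static PollAssert create() {
    return new PollAssert();
  }

  PollAssert maxSleepMs(long value) {
    this.maxSleepMs = value;
    return this; // returning this is what makes chaining possible
  }

  PollAssert timeoutMs(long value) {
    this.timeoutMs = value;
    return this;
  }

  // Polls the condition at a fixed interval until it holds or the timeout elapses.
  void assertTrue(BooleanSupplier condition, String message) {
    long deadline = System.currentTimeMillis() + timeoutMs;
    while (!condition.getAsBoolean()) {
      if (System.currentTimeMillis() >= deadline) {
        throw new AssertionError("Timed out: " + message);
      }
      try {
        Thread.sleep(maxSleepMs);
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
        throw new AssertionError("Interrupted: " + message);
      }
    }
  }
}

final class ChainedCallExample {
  static boolean run() {
    long start = System.currentTimeMillis();
    // The whole call is chained; no local variable such as asserter1 is needed.
    PollAssert.create().maxSleepMs(5).timeoutMs(2000)
        .assertTrue(() -> System.currentTimeMillis() - start >= 20, "Waiting for the job to start...");
    return true;
  }
}
```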
 



Issue Time Tracking
---

Worklog Id: (was: 231461)
Time Spent: 5.5h  (was: 5h 20m)

> Support cancellation of a Helix workflow via a DELETE Spec
> --
>
> Key: GOBBLIN-744
> URL: https://issues.apache.org/jira/browse/GOBBLIN-744
> Project: Apache Gobblin
>  Issue Type: Improvement
>  Components: gobblin-cluster
>Affects Versions: 0.15.0
>Reporter: Sudarshan Vasudevan
>Assignee: Hung Tran
>Priority: Major
> Fix For: 0.15.0
>
>  Time Spent: 5.5h
>  Remaining Estimate: 0h
>
> This task supports the ability to interrupt and cancel a running job on a 
> Gobblin Helix cluster via a DELETE Spec submitted to the 
> JobConfigurationManager. The DELETE Spec should have 
> "gobblin.cluster.shouldCancelRunningJobOnDelete" set to true for cancelling a 
> running job. The default behavior is to simply delete the corresponding 
> JobSpec from the JobCatalog. 
>  
>  





[GitHub] [incubator-gobblin] shirshanka commented on a change in pull request #2609: GOBBLIN-744: Support cancellation of a Helix workflow via a DELETE Spec.

2019-04-23 Thread GitBox
shirshanka commented on a change in pull request #2609: GOBBLIN-744: Support 
cancellation of a Helix workflow via a DELETE Spec.
URL: https://github.com/apache/incubator-gobblin/pull/2609#discussion_r277739662
 
 

 ##
 File path: 
gobblin-cluster/src/test/java/org/apache/gobblin/cluster/ClusterIntegrationTest.java
 ##
 @@ -51,28 +68,30 @@ public void testJobShouldComplete()
     runAndVerify();
   }
 
-  @Test void testJobShouldGetCancelled() throws Exception {
-    this.suite =new IntegrationJobCancelSuite();
+  private HelixManager getHelixManager() {
     Config helixConfig = this.suite.getManagerConfig();
     String clusterName = helixConfig.getString(GobblinClusterConfigurationKeys.HELIX_CLUSTER_NAME_KEY);
     String instanceName = ConfigUtils.getString(helixConfig, GobblinClusterConfigurationKeys.HELIX_INSTANCE_NAME_KEY,
         GobblinClusterManager.class.getSimpleName());
-    String zkConnectString = helixConfig.getString(GobblinClusterConfigurationKeys.ZK_CONNECTION_STRING_KEY);
+    this.zkConnectString = helixConfig.getString(GobblinClusterConfigurationKeys.ZK_CONNECTION_STRING_KEY);
     HelixManager helixManager = HelixManagerFactory.getZKHelixManager(clusterName, instanceName, InstanceType.CONTROLLER, zkConnectString);
+    return helixManager;
+  }
 
+  @Test void testJobShouldGetCancelled() throws Exception {
+    this.suite =new IntegrationJobCancelSuite();
+    HelixManager helixManager = getHelixManager();
     suite.startCluster();
-
     helixManager.connect();
 
     TaskDriver taskDriver = new TaskDriver(helixManager);
 
-    while (TaskDriver.getWorkflowContext(helixManager, IntegrationJobCancelSuite.JOB_ID) == null) {
-      log.warn("Waiting for the job to start...");
-      Thread.sleep(1000L);
-    }
+    AssertWithBackoff asserter1 = AssertWithBackoff.create().maxSleepMs(1000).backoffFactor(1);
+    asserter1.assertTrue(isTaskStarted(helixManager, IntegrationJobCancelSuite.JOB_ID),
 
 Review comment:
   you could chain the entire call without needing the local variable asserter1
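
   A minimal sketch of what chaining the call could look like. The diff does
   not show the signature of AssertWithBackoff.assertTrue, so the class below
   is a hypothetical stand-in (BackoffAsserter, timeoutMs and the
   BooleanSupplier parameter are assumptions for illustration, not Gobblin's
   API). It only shows the builder being configured and used in a single
   expression, with no intermediate local variable:

```java
import java.util.function.BooleanSupplier;

/** Hypothetical stand-in for a backoff-based asserter; NOT Gobblin's AssertWithBackoff. */
public class BackoffAsserter {
  private long maxSleepMs = 1000L;
  private double backoffFactor = 2.0;
  private long timeoutMs = 5000L;

  public static BackoffAsserter create() {
    return new BackoffAsserter();
  }

  // Each setter returns this, which is what makes the call chainable.
  public BackoffAsserter maxSleepMs(long value) {
    this.maxSleepMs = value;
    return this;
  }

  public BackoffAsserter backoffFactor(double value) {
    this.backoffFactor = value;
    return this;
  }

  public BackoffAsserter timeoutMs(long value) {
    this.timeoutMs = value;
    return this;
  }

  /** Polls the condition until it holds, or fails once the timeout has passed. */
  public void assertTrue(BooleanSupplier condition, String message) throws InterruptedException {
    long deadline = System.currentTimeMillis() + timeoutMs;
    long sleepMs = 1L;
    while (!condition.getAsBoolean()) {
      if (System.currentTimeMillis() >= deadline) {
        throw new AssertionError(message);
      }
      Thread.sleep(sleepMs);
      // Grow the sleep by the backoff factor, capped at maxSleepMs.
      sleepMs = Math.min(maxSleepMs, Math.max(1L, (long) (sleepMs * backoffFactor)));
    }
  }

  public static void main(String[] args) throws InterruptedException {
    final int[] polls = {0};
    // Chained form: configure and assert in one expression, no local asserter variable.
    BackoffAsserter.create().maxSleepMs(10).backoffFactor(1).timeoutMs(1000)
        .assertTrue(() -> ++polls[0] >= 3, "condition never became true");
    System.out.println("condition met after " + polls[0] + " polls");
  }
}
```

   In the test under review, the equivalent change would presumably be to
   call assertTrue directly on the result of
   AssertWithBackoff.create().maxSleepMs(1000).backoffFactor(1).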




[GitHub] [incubator-gobblin] shirshanka commented on a change in pull request #2609: GOBBLIN-744: Support cancellation of a Helix workflow via a DELETE Spec.

2019-04-23 Thread GitBox
shirshanka commented on a change in pull request #2609: GOBBLIN-744: Support 
cancellation of a Helix workflow via a DELETE Spec.
URL: https://github.com/apache/incubator-gobblin/pull/2609#discussion_r277737062
 
 

 ##
 File path: 
gobblin-cluster/src/main/java/org/apache/gobblin/cluster/SleepingTask.java
 ##
 @@ -17,24 +17,52 @@
 
 package org.apache.gobblin.cluster;
 
+import java.io.File;
+import java.io.IOException;
+
+import com.google.common.io.Files;
+
 import avro.shaded.com.google.common.base.Throwables;
 
 Review comment:
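   spotted a faulty import from before: Throwables is being pulled from the Guava copy shaded inside Avro (avro.shaded.com.google.common.base) rather than from com.google.common.base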

