[jira] [Created] (GOBBLIN-1205) restarting gobblin on yarn fails with error

2020-06-20 Thread Jay Sen (Jira)
Jay Sen created GOBBLIN-1205:


 Summary: restarting gobblin on yarn fails with error
 Key: GOBBLIN-1205
 URL: https://issues.apache.org/jira/browse/GOBBLIN-1205
 Project: Apache Gobblin
  Issue Type: Bug
Affects Versions: 0.15.0
Reporter: Jay Sen
 Fix For: 0.15.0


restarting gobblin deployed on yarn mode occasionally fails starting up with 
following error, may be the path is still on hold by the previous process, it 
may just need bit time between stop/start.
{code:java}
WARN [ZKHelixAdmin] Root directory exists.Cleaning the root 
directory:/GobblinYarnHelixAppWARN [ZKHelixAdmin] Root directory 
exists.Cleaning the root directory:/GobblinYarnHelixAppWARN [ZkClient] Failed 
to delete path /GobblinYarnHelixApp/CONTROLLER! 
org.I0Itec.zkclient.exception.ZkException: 
org.apache.zookeeper.KeeperException$NotEmptyException: KeeperErrorCode = 
Directory not empty for /GobblinYarnHelixApp/CONTROLLERERROR [ZkClient] Failed 
to delete 
/GobblinYarnHelixApp/CONTROLLERorg.I0Itec.zkclient.exception.ZkException: 
org.apache.zookeeper.KeeperException$NotEmptyException: KeeperErrorCode = 
Directory not empty for /GobblinYarnHelixApp/CONTROLLER at 
org.I0Itec.zkclient.exception.ZkException.create(ZkException.java:68) at 
org.apache.helix.manager.zk.zookeeper.ZkClient.retryUntilConnected(ZkClient.java:1160)
 at org.apache.helix.manager.zk.zookeeper.ZkClient.delete(ZkClient.java:1215) 
at 
org.apache.helix.manager.zk.zookeeper.ZkClient.deleteRecursively(ZkClient.java:949)
 at 
org.apache.helix.manager.zk.zookeeper.ZkClient.deleteRecursively(ZkClient.java:942)
 at org.apache.helix.manager.zk.ZKHelixAdmin.addCluster(ZKHelixAdmin.java:698) 
at org.apache.helix.tools.ClusterSetup.addCluster(ClusterSetup.java:162) at 
org.apache.gobblin.cluster.HelixUtils.createGobblinHelixCluster(HelixUtils.java:96)
 at 
org.apache.gobblin.yarn.GobblinYarnAppLauncher.launch(GobblinYarnAppLauncher.java:337)
 at 
org.apache.gobblin.yarn.GobblinYarnAppLauncher.main(GobblinYarnAppLauncher.java:1067)Caused
 by: org.apache.zookeeper.KeeperException$NotEmptyException: KeeperErrorCode = 
Directory not empty for /GobblinYarnHelixApp/CONTROLLER at 
org.apache.zookeeper.KeeperException.create(KeeperException.java:128) at 
org.apache.zookeeper.KeeperException.create(KeeperException.java:54) at 
org.apache.zookeeper.ZooKeeper.delete(ZooKeeper.java:882) at 
org.apache.helix.manager.zk.zookeeper.ZkConnection.delete(ZkConnection.java:119)
 at org.apache.helix.manager.zk.zookeeper.ZkClient$9.call(ZkClient.java:1219) 
at 
org.apache.helix.manager.zk.zookeeper.ZkClient.retryUntilConnected(ZkClient.java:1150)
 ... 8 more
==> logs/yarn.err <==Exception in thread "main" 
org.apache.helix.HelixException: Failed to delete 
/GobblinYarnHelixApp/CONTROLLER at 
org.apache.helix.manager.zk.zookeeper.ZkClient.deleteRecursively(ZkClient.java:952)
 at 
org.apache.helix.manager.zk.zookeeper.ZkClient.deleteRecursively(ZkClient.java:942)
 at org.apache.helix.manager.zk.ZKHelixAdmin.addCluster(ZKHelixAdmin.java:698) 
at org.apache.helix.tools.ClusterSetup.addCluster(ClusterSetup.java:162) at 
org.apache.gobblin.cluster.HelixUtils.createGobblinHelixCluster(HelixUtils.java:96)
 at 
org.apache.gobblin.yarn.GobblinYarnAppLauncher.launch(GobblinYarnAppLauncher.java:337)
 at 
org.apache.gobblin.yarn.GobblinYarnAppLauncher.main(GobblinYarnAppLauncher.java:1067)Caused
 by: org.I0Itec.zkclient.exception.ZkException: 
org.apache.zookeeper.KeeperException$NotEmptyException: KeeperErrorCode = 
Directory not empty for /GobblinYarnHelixApp/CONTROLLER at 
org.I0Itec.zkclient.exception.ZkException.create(ZkException.java:68) at 
org.apache.helix.manager.zk.zookeeper.ZkClient.retryUntilConnected(ZkClient.java:1160)
 at org.apache.helix.manager.zk.zookeeper.ZkClient.delete(ZkClient.java:1215) 
at 
org.apache.helix.manager.zk.zookeeper.ZkClient.deleteRecursively(ZkClient.java:949)
 ... 6 moreCaused by: org.apache.zookeeper.KeeperException$NotEmptyException: 
KeeperErrorCode = Directory not empty for /GobblinYarnHelixApp/CONTROLLER at 
org.apache.zookeeper.KeeperException.create(KeeperException.java:128) at 
org.apache.zookeeper.KeeperException.create(KeeperException.java:54) at 
org.apache.zookeeper.ZooKeeper.delete(ZooKeeper.java:882) at 
org.apache.helix.manager.zk.zookeeper.ZkConnection.delete(ZkConnection.java:119)
 at org.apache.helix.manager.zk.zookeeper.ZkClient$9.call(ZkClient.java:1219) 
at 
org.apache.helix.manager.zk.zookeeper.ZkClient.retryUntilConnected(ZkClient.java:1150)
 ... 8 moreException in thread "Thread-6" org.apache.helix.HelixException: 
HelixManager (ZkClient) is not connected. Call HelixManager#connect() at 
org.apache.helix.manager.zk.ZKHelixManager.checkConnected(ZKHelixManager.java:363)
 at 
org.apache.helix.manager.zk.ZKHelixManager.getClusterManagmentTo

[jira] [Resolved] (GOBBLIN-1099) Handle orphaned Yarn containers in Gobblin-on-Yarn clusters

2020-04-03 Thread Sudarshan Vasudevan (Jira)


 [ 
https://issues.apache.org/jira/browse/GOBBLIN-1099?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sudarshan Vasudevan resolved GOBBLIN-1099.
--
Resolution: Fixed

> Handle orphaned Yarn containers in Gobblin-on-Yarn clusters
> ---
>
>     Key: GOBBLIN-1099
> URL: https://issues.apache.org/jira/browse/GOBBLIN-1099
> Project: Apache Gobblin
>  Issue Type: Improvement
>  Components: gobblin-yarn
>Affects Versions: 0.15.0
>Reporter: Sudarshan Vasudevan
>Assignee: Abhishek Tiwari
>Priority: Major
> Fix For: 0.15.0
>
>  Time Spent: 2h 40m
>  Remaining Estimate: 0h
>
> A Yarn application may leave behind orphaned containers, which can happen due 
> to lost node managers. The orphaned containers however can continue to run 
> (potentially forever) as participants in the Helix cluster. This can cause 
> the following problems for a Gobblin-on-Yarn application:
>  # Double publish of data and commit of state
>  # Task failures and partition starvation during application restarts, as 
> Helix may assign tasks to the orphaned containers which have a stale state 
> and configuration
>  # Container failures on application restarts due to Helix instance name 
> collisions with orphaned containers
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Closed] (GOBBLIN-1099) Handle orphaned Yarn containers in Gobblin-on-Yarn clusters

2020-04-03 Thread Sudarshan Vasudevan (Jira)


 [ 
https://issues.apache.org/jira/browse/GOBBLIN-1099?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sudarshan Vasudevan closed GOBBLIN-1099.


> Handle orphaned Yarn containers in Gobblin-on-Yarn clusters
> ---
>
>     Key: GOBBLIN-1099
> URL: https://issues.apache.org/jira/browse/GOBBLIN-1099
> Project: Apache Gobblin
>  Issue Type: Improvement
>  Components: gobblin-yarn
>Affects Versions: 0.15.0
>Reporter: Sudarshan Vasudevan
>Assignee: Abhishek Tiwari
>Priority: Major
> Fix For: 0.15.0
>
>  Time Spent: 2h 40m
>  Remaining Estimate: 0h
>
> A Yarn application may leave behind orphaned containers, which can happen due 
> to lost node managers. The orphaned containers however can continue to run 
> (potentially forever) as participants in the Helix cluster. This can cause 
> the following problems for a Gobblin-on-Yarn application:
>  # Double publish of data and commit of state
>  # Task failures and partition starvation during application restarts, as 
> Helix may assign tasks to the orphaned containers which have a stale state 
> and configuration
>  # Container failures on application restarts due to Helix instance name 
> collisions with orphaned containers
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (GOBBLIN-1099) Handle orphaned Yarn containers in Gobblin-on-Yarn clusters

2020-03-27 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/GOBBLIN-1099?focusedWorklogId=411607=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-411607
 ]

ASF GitHub Bot logged work on GOBBLIN-1099:
---

Author: ASF GitHub Bot
Created on: 28/Mar/20 04:26
Start Date: 28/Mar/20 04:26
Worklog Time Spent: 10m 
  Work Description: asfgit commented on pull request #2940: GOBBLIN-1099: 
Handle orphaned Yarn containers in Gobblin-on-Yarn clus…
URL: https://github.com/apache/incubator-gobblin/pull/2940
 
 
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 411607)
Time Spent: 2h 40m  (was: 2.5h)

> Handle orphaned Yarn containers in Gobblin-on-Yarn clusters
> ---
>
>     Key: GOBBLIN-1099
> URL: https://issues.apache.org/jira/browse/GOBBLIN-1099
> Project: Apache Gobblin
>  Issue Type: Improvement
>  Components: gobblin-yarn
>Affects Versions: 0.15.0
>Reporter: Sudarshan Vasudevan
>Assignee: Abhishek Tiwari
>Priority: Major
> Fix For: 0.15.0
>
>  Time Spent: 2h 40m
>  Remaining Estimate: 0h
>
> A Yarn application may leave behind orphaned containers, which can happen due 
> to lost node managers. The orphaned containers however can continue to run 
> (potentially forever) as participants in the Helix cluster. This can cause 
> the following problems for a Gobblin-on-Yarn application:
>  # Double publish of data and commit of state
>  # Task failures and partition starvation during application restarts, as 
> Helix may assign tasks to the orphaned containers which have a stale state 
> and configuration
>  # Container failures on application restarts due to Helix instance name 
> collisions with orphaned containers
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [incubator-gobblin] asfgit closed pull request #2940: GOBBLIN-1099: Handle orphaned Yarn containers in Gobblin-on-Yarn clus…

2020-03-27 Thread GitBox
asfgit closed pull request #2940: GOBBLIN-1099: Handle orphaned Yarn containers 
in Gobblin-on-Yarn clus…
URL: https://github.com/apache/incubator-gobblin/pull/2940
 
 
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[jira] [Work logged] (GOBBLIN-1099) Handle orphaned Yarn containers in Gobblin-on-Yarn clusters

2020-03-27 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/GOBBLIN-1099?focusedWorklogId=411595=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-411595
 ]

ASF GitHub Bot logged work on GOBBLIN-1099:
---

Author: ASF GitHub Bot
Created on: 28/Mar/20 03:19
Start Date: 28/Mar/20 03:19
Worklog Time Spent: 10m 
  Work Description: codecov-io commented on issue #2940: GOBBLIN-1099: 
Handle orphaned Yarn containers in Gobblin-on-Yarn clus…
URL: 
https://github.com/apache/incubator-gobblin/pull/2940#issuecomment-604751963
 
 
   # 
[Codecov](https://codecov.io/gh/apache/incubator-gobblin/pull/2940?src=pr=h1)
 Report
   > Merging 
[#2940](https://codecov.io/gh/apache/incubator-gobblin/pull/2940?src=pr=desc)
 into 
[master](https://codecov.io/gh/apache/incubator-gobblin/commit/a63461257c3fcea8f4ff67087f8cb29be25d6baf=desc)
 will **decrease** coverage by `40.49%`.
   > The diff coverage is `0.00%`.
   
   [![Impacted file tree 
graph](https://codecov.io/gh/apache/incubator-gobblin/pull/2940/graphs/tree.svg?width=650=150=pr=4MgURJ0bGc)](https://codecov.io/gh/apache/incubator-gobblin/pull/2940?src=pr=tree)
   
   ```diff
   @@ Coverage Diff  @@
   ## master   #2940   +/-   ##
   
   - Coverage 44.60%   4.10%   -40.50% 
   + Complexity 8980 750 -8230 
   
 Files  19361936   
 Lines 73234   73292   +58 
 Branches   80838088+5 
   
   - Hits  326693012-29657 
   - Misses37515   69960+32445 
   + Partials   3050 320 -2730 
   ```
   
   
   | [Impacted 
Files](https://codecov.io/gh/apache/incubator-gobblin/pull/2940?src=pr=tree) 
| Coverage Δ | Complexity Δ | |
   |---|---|---|---|
   | 
[.../apache/gobblin/cluster/GobblinClusterManager.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2940/diff?src=pr=tree#diff-Z29iYmxpbi1jbHVzdGVyL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9nb2JibGluL2NsdXN0ZXIvR29iYmxpbkNsdXN0ZXJNYW5hZ2VyLmphdmE=)
 | `0.00% <0.00%> (-53.92%)` | `0.00 <0.00> (-26.00)` | |
   | 
[.../org/apache/gobblin/cluster/GobblinTaskRunner.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2940/diff?src=pr=tree#diff-Z29iYmxpbi1jbHVzdGVyL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9nb2JibGluL2NsdXN0ZXIvR29iYmxpblRhc2tSdW5uZXIuamF2YQ==)
 | `0.00% <0.00%> (-64.16%)` | `0.00 <0.00> (-27.00)` | |
   | 
[...in/java/org/apache/gobblin/cluster/HelixUtils.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2940/diff?src=pr=tree#diff-Z29iYmxpbi1jbHVzdGVyL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9nb2JibGluL2NsdXN0ZXIvSGVsaXhVdGlscy5qYXZh)
 | `0.00% <0.00%> (-38.27%)` | `0.00 <0.00> (-14.00)` | |
   | 
[.../apache/gobblin/yarn/GobblinApplicationMaster.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2940/diff?src=pr=tree#diff-Z29iYmxpbi15YXJuL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9nb2JibGluL3lhcm4vR29iYmxpbkFwcGxpY2F0aW9uTWFzdGVyLmphdmE=)
 | `0.00% <0.00%> (-18.06%)` | `0.00 <0.00> (-3.00)` | |
   | 
[...rg/apache/gobblin/yarn/GobblinYarnAppLauncher.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2940/diff?src=pr=tree#diff-Z29iYmxpbi15YXJuL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9nb2JibGluL3lhcm4vR29iYmxpbllhcm5BcHBMYXVuY2hlci5qYXZh)
 | `0.00% <0.00%> (-21.17%)` | `0.00 <0.00> (-8.00)` | |
   | 
[...main/java/org/apache/gobblin/yarn/YarnService.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2940/diff?src=pr=tree#diff-Z29iYmxpbi15YXJuL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9nb2JibGluL3lhcm4vWWFyblNlcnZpY2UuamF2YQ==)
 | `0.00% <0.00%> (-15.28%)` | `0.00 <0.00> (-4.00)` | |
   | 
[...c/main/java/org/apache/gobblin/util/FileUtils.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2940/diff?src=pr=tree#diff-Z29iYmxpbi11dGlsaXR5L3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9nb2JibGluL3V0aWwvRmlsZVV0aWxzLmphdmE=)
 | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-4.00%)` | |
   | 
[...n/java/org/apache/gobblin/fork/CopyableSchema.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2940/diff?src=pr=tree#diff-Z29iYmxpbi1jb3JlL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9nb2JibGluL2ZvcmsvQ29weWFibGVTY2hlbWEuamF2YQ==)
 | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-2.00%)` | |
   | 
[...java/org/apache/gobblin/stream/ControlMessage.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2940/diff?src=pr=tree#diff-Z29iYmxpbi1hcGkvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2dvYmJsaW4vc3RyZWFtL0NvbnRyb2xNZXNzYWdlLmphdmE=)
 | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-1.00%)` | |
   | 
[...va/org/apache/gobblin/dataset/DatasetResolver.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2940/diff?src=pr=tree#diff-Z29

[GitHub] [incubator-gobblin] codecov-io edited a comment on issue #2940: GOBBLIN-1099: Handle orphaned Yarn containers in Gobblin-on-Yarn clus…

2020-03-27 Thread GitBox
codecov-io edited a comment on issue #2940: GOBBLIN-1099: Handle orphaned Yarn 
containers in Gobblin-on-Yarn clus…
URL: 
https://github.com/apache/incubator-gobblin/pull/2940#issuecomment-604751963
 
 
   # 
[Codecov](https://codecov.io/gh/apache/incubator-gobblin/pull/2940?src=pr=h1)
 Report
   > Merging 
[#2940](https://codecov.io/gh/apache/incubator-gobblin/pull/2940?src=pr=desc)
 into 
[master](https://codecov.io/gh/apache/incubator-gobblin/commit/a63461257c3fcea8f4ff67087f8cb29be25d6baf=desc)
 will **decrease** coverage by `40.49%`.
   > The diff coverage is `0.00%`.
   
   [![Impacted file tree 
graph](https://codecov.io/gh/apache/incubator-gobblin/pull/2940/graphs/tree.svg?width=650=150=pr=4MgURJ0bGc)](https://codecov.io/gh/apache/incubator-gobblin/pull/2940?src=pr=tree)
   
   ```diff
   @@ Coverage Diff  @@
   ## master   #2940   +/-   ##
   
   - Coverage 44.60%   4.10%   -40.50% 
   + Complexity 8980 750 -8230 
   
 Files  19361936   
 Lines 73234   73292   +58 
 Branches   80838088+5 
   
   - Hits  326693012-29657 
   - Misses37515   69960+32445 
   + Partials   3050 320 -2730 
   ```
   
   
   | [Impacted 
Files](https://codecov.io/gh/apache/incubator-gobblin/pull/2940?src=pr=tree) 
| Coverage Δ | Complexity Δ | |
   |---|---|---|---|
   | 
[.../apache/gobblin/cluster/GobblinClusterManager.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2940/diff?src=pr=tree#diff-Z29iYmxpbi1jbHVzdGVyL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9nb2JibGluL2NsdXN0ZXIvR29iYmxpbkNsdXN0ZXJNYW5hZ2VyLmphdmE=)
 | `0.00% <0.00%> (-53.92%)` | `0.00 <0.00> (-26.00)` | |
   | 
[.../org/apache/gobblin/cluster/GobblinTaskRunner.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2940/diff?src=pr=tree#diff-Z29iYmxpbi1jbHVzdGVyL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9nb2JibGluL2NsdXN0ZXIvR29iYmxpblRhc2tSdW5uZXIuamF2YQ==)
 | `0.00% <0.00%> (-64.16%)` | `0.00 <0.00> (-27.00)` | |
   | 
[...in/java/org/apache/gobblin/cluster/HelixUtils.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2940/diff?src=pr=tree#diff-Z29iYmxpbi1jbHVzdGVyL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9nb2JibGluL2NsdXN0ZXIvSGVsaXhVdGlscy5qYXZh)
 | `0.00% <0.00%> (-38.27%)` | `0.00 <0.00> (-14.00)` | |
   | 
[.../apache/gobblin/yarn/GobblinApplicationMaster.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2940/diff?src=pr=tree#diff-Z29iYmxpbi15YXJuL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9nb2JibGluL3lhcm4vR29iYmxpbkFwcGxpY2F0aW9uTWFzdGVyLmphdmE=)
 | `0.00% <0.00%> (-18.06%)` | `0.00 <0.00> (-3.00)` | |
   | 
[...rg/apache/gobblin/yarn/GobblinYarnAppLauncher.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2940/diff?src=pr=tree#diff-Z29iYmxpbi15YXJuL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9nb2JibGluL3lhcm4vR29iYmxpbllhcm5BcHBMYXVuY2hlci5qYXZh)
 | `0.00% <0.00%> (-21.17%)` | `0.00 <0.00> (-8.00)` | |
   | 
[...main/java/org/apache/gobblin/yarn/YarnService.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2940/diff?src=pr=tree#diff-Z29iYmxpbi15YXJuL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9nb2JibGluL3lhcm4vWWFyblNlcnZpY2UuamF2YQ==)
 | `0.00% <0.00%> (-15.28%)` | `0.00 <0.00> (-4.00)` | |
   | 
[...c/main/java/org/apache/gobblin/util/FileUtils.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2940/diff?src=pr=tree#diff-Z29iYmxpbi11dGlsaXR5L3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9nb2JibGluL3V0aWwvRmlsZVV0aWxzLmphdmE=)
 | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-4.00%)` | |
   | 
[...n/java/org/apache/gobblin/fork/CopyableSchema.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2940/diff?src=pr=tree#diff-Z29iYmxpbi1jb3JlL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9nb2JibGluL2ZvcmsvQ29weWFibGVTY2hlbWEuamF2YQ==)
 | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-2.00%)` | |
   | 
[...java/org/apache/gobblin/stream/ControlMessage.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2940/diff?src=pr=tree#diff-Z29iYmxpbi1hcGkvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2dvYmJsaW4vc3RyZWFtL0NvbnRyb2xNZXNzYWdlLmphdmE=)
 | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-1.00%)` | |
   | 
[...va/org/apache/gobblin/dataset/DatasetResolver.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2940/diff?src=pr=tree#diff-Z29iYmxpbi1hcGkvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2dvYmJsaW4vZGF0YXNldC9EYXRhc2V0UmVzb2x2ZXIuamF2YQ==)
 | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-3.00%)` | |
   | ... and [1150 
more](https://codecov.io/gh/apache/incubator-gobblin/pull/2940/diff?src=pr=tree-more)
 | |
   
   --
   
   [Continue to review full report at 
Codecov](https://codecov.io/gh/apache/incubator-gobblin/pull/2940?src=pr=continue

[GitHub] [incubator-gobblin] sv2000 commented on a change in pull request #2940: GOBBLIN-1099: Handle orphaned Yarn containers in Gobblin-on-Yarn clus…

2020-03-27 Thread GitBox
sv2000 commented on a change in pull request #2940: GOBBLIN-1099: Handle 
orphaned Yarn containers in Gobblin-on-Yarn clus…
URL: https://github.com/apache/incubator-gobblin/pull/2940#discussion_r399594383
 
 

 ##
 File path: 
gobblin-cluster/src/main/java/org/apache/gobblin/cluster/GobblinTaskRunner.java
 ##
 @@ -366,20 +377,74 @@ boolean isStopped() {
   }
 
   @VisibleForTesting
-  void connectHelixManager() {
-try {
-  this.jobHelixManager.connect();
-  this.jobHelixManager.getMessagingService()
-  
.registerMessageHandlerFactory(GobblinHelixConstants.SHUTDOWN_MESSAGE_TYPE,
-  new ParticipantShutdownMessageHandlerFactory());
-  this.jobHelixManager.getMessagingService()
-  
.registerMessageHandlerFactory(Message.MessageType.USER_DEFINE_MSG.toString(),
-  getUserDefinedMessageHandlerFactory());
-  if (this.taskDriverHelixManager.isPresent()) {
-this.taskDriverHelixManager.get().connect();
+  void connectHelixManager() throws Exception {
+this.jobHelixManager.connect();
+//Ensure the instance is enabled.
+this.jobHelixManager.getClusterManagmentTool().enableInstance(clusterName, 
helixInstanceName, true);
+this.jobHelixManager.getMessagingService()
+
.registerMessageHandlerFactory(GobblinHelixConstants.SHUTDOWN_MESSAGE_TYPE,
+new ParticipantShutdownMessageHandlerFactory());
+this.jobHelixManager.getMessagingService()
+
.registerMessageHandlerFactory(Message.MessageType.USER_DEFINE_MSG.toString(),
+getUserDefinedMessageHandlerFactory());
+if (this.taskDriverHelixManager.isPresent()) {
+  this.taskDriverHelixManager.get().connect();
+  //Ensure the instance is enabled.
+  
this.taskDriverHelixManager.get().getClusterManagmentTool().enableInstance(this.taskDriverHelixManager.get().getClusterName(),
 helixInstanceName, true);
+}
+  }
+
+  /**
+   * A method to handle failures joining Helix cluster. The method will 
perform the following steps before attempting
+   * to re-join the cluster:
+   * 
+   *   Disconnect from Helix cluster, which would close any open 
clients
+   *   Drop instance from Helix cluster, to remove any partial instance 
structure from Helix
+   *   Re-construct helix manager instances, used to re-join the 
cluster
+   * 
+   */
+  private void onClusterJoinFailure() {
+logger.warn("Disconnecting Helix manager..");
+disconnectHelixManager();
+
+HelixAdmin admin = new 
ZKHelixAdmin(clusterConfig.getString(GobblinClusterConfigurationKeys.ZK_CONNECTION_STRING_KEY));
+//Drop the helix Instance
+logger.warn("Dropping instance: {} from cluster: {}", helixInstanceName, 
clusterName);
+HelixUtils.dropInstanceIfExists(admin, clusterName, helixInstanceName);
 
 Review comment:
   Changed it to return void. The dropInstanceIfExists method is intended to 
swallow HelixException that can only occur due to instance path not being 
present i.e instance does not exist. 


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[jira] [Work logged] (GOBBLIN-1099) Handle orphaned Yarn containers in Gobblin-on-Yarn clusters

2020-03-27 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/GOBBLIN-1099?focusedWorklogId=411565=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-411565
 ]

ASF GitHub Bot logged work on GOBBLIN-1099:
---

Author: ASF GitHub Bot
Created on: 28/Mar/20 00:34
Start Date: 28/Mar/20 00:34
Worklog Time Spent: 10m 
  Work Description: sv2000 commented on pull request #2940: GOBBLIN-1099: 
Handle orphaned Yarn containers in Gobblin-on-Yarn clus…
URL: https://github.com/apache/incubator-gobblin/pull/2940#discussion_r399594383
 
 

 ##
 File path: 
gobblin-cluster/src/main/java/org/apache/gobblin/cluster/GobblinTaskRunner.java
 ##
 @@ -366,20 +377,74 @@ boolean isStopped() {
   }
 
   @VisibleForTesting
-  void connectHelixManager() {
-try {
-  this.jobHelixManager.connect();
-  this.jobHelixManager.getMessagingService()
-  
.registerMessageHandlerFactory(GobblinHelixConstants.SHUTDOWN_MESSAGE_TYPE,
-  new ParticipantShutdownMessageHandlerFactory());
-  this.jobHelixManager.getMessagingService()
-  
.registerMessageHandlerFactory(Message.MessageType.USER_DEFINE_MSG.toString(),
-  getUserDefinedMessageHandlerFactory());
-  if (this.taskDriverHelixManager.isPresent()) {
-this.taskDriverHelixManager.get().connect();
+  void connectHelixManager() throws Exception {
+this.jobHelixManager.connect();
+//Ensure the instance is enabled.
+this.jobHelixManager.getClusterManagmentTool().enableInstance(clusterName, 
helixInstanceName, true);
+this.jobHelixManager.getMessagingService()
+
.registerMessageHandlerFactory(GobblinHelixConstants.SHUTDOWN_MESSAGE_TYPE,
+new ParticipantShutdownMessageHandlerFactory());
+this.jobHelixManager.getMessagingService()
+
.registerMessageHandlerFactory(Message.MessageType.USER_DEFINE_MSG.toString(),
+getUserDefinedMessageHandlerFactory());
+if (this.taskDriverHelixManager.isPresent()) {
+  this.taskDriverHelixManager.get().connect();
+  //Ensure the instance is enabled.
+  
this.taskDriverHelixManager.get().getClusterManagmentTool().enableInstance(this.taskDriverHelixManager.get().getClusterName(),
 helixInstanceName, true);
+}
+  }
+
+  /**
+   * A method to handle failures joining Helix cluster. The method will 
perform the following steps before attempting
+   * to re-join the cluster:
+   * 
+   *   Disconnect from Helix cluster, which would close any open 
clients
+   *   Drop instance from Helix cluster, to remove any partial instance 
structure from Helix
+   *   Re-construct helix manager instances, used to re-join the 
cluster
+   * 
+   */
+  private void onClusterJoinFailure() {
+logger.warn("Disconnecting Helix manager..");
+disconnectHelixManager();
+
+HelixAdmin admin = new 
ZKHelixAdmin(clusterConfig.getString(GobblinClusterConfigurationKeys.ZK_CONNECTION_STRING_KEY));
+//Drop the helix Instance
+logger.warn("Dropping instance: {} from cluster: {}", helixInstanceName, 
clusterName);
+HelixUtils.dropInstanceIfExists(admin, clusterName, helixInstanceName);
 
 Review comment:
   Changed it to return void. The dropInstanceIfExists method is intended to 
swallow HelixException that can only occur due to instance path not being 
present i.e instance does not exist. 
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 411565)
Time Spent: 2h 20m  (was: 2h 10m)

> Handle orphaned Yarn containers in Gobblin-on-Yarn clusters
> ---
>
> Key: GOBBLIN-1099
> URL: https://issues.apache.org/jira/browse/GOBBLIN-1099
> Project: Apache Gobblin
>  Issue Type: Improvement
>  Components: gobblin-yarn
>Affects Versions: 0.15.0
>Reporter: Sudarshan Vasudevan
>Assignee: Abhishek Tiwari
>Priority: Major
> Fix For: 0.15.0
>
>  Time Spent: 2h 20m
>  Remaining Estimate: 0h
>
> A Yarn application may leave behind orphaned containers, which can happen due 
> to lost node managers. The orphaned containers however can continue to run 
> (potentially forever) as participants in the Helix cluster. This can cause 
> the following problems for a Gobblin-on-Yarn application:
>  # Double publish of data and commit of state
>  # Task failures and partition starvation during application restarts, as 
> Helix may assign tasks to t

[jira] [Work logged] (GOBBLIN-1099) Handle orphaned Yarn containers in Gobblin-on-Yarn clusters

2020-03-27 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/GOBBLIN-1099?focusedWorklogId=411560=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-411560
 ]

ASF GitHub Bot logged work on GOBBLIN-1099:
---

Author: ASF GitHub Bot
Created on: 28/Mar/20 00:28
Start Date: 28/Mar/20 00:28
Worklog Time Spent: 10m 
  Work Description: sv2000 commented on pull request #2940: GOBBLIN-1099: 
Handle orphaned Yarn containers in Gobblin-on-Yarn clus…
URL: https://github.com/apache/incubator-gobblin/pull/2940#discussion_r399593342
 
 

 ##
 File path: gobblin-yarn/src/main/java/org/apache/gobblin/yarn/YarnService.java
 ##
 @@ -625,7 +634,7 @@ private boolean shouldStickToTheSameNode(int 
containerExitStatus) {
*/
   protected void handleContainerCompletion(ContainerStatus containerStatus) {
 Map.Entry completedContainerEntry = 
this.containerMap.remove(containerStatus.getContainerId());
-String completedInstanceName = completedContainerEntry == null? "unknown" 
: completedContainerEntry.getValue();
+String completedInstanceName = completedContainerEntry == null?  
UNKNOWN_HELIX_INSTANCE : completedContainerEntry.getValue();
 
 Review comment:
   Added a comment.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 411560)
Time Spent: 2h 10m  (was: 2h)

> Handle orphaned Yarn containers in Gobblin-on-Yarn clusters
> ---
>
>     Key: GOBBLIN-1099
> URL: https://issues.apache.org/jira/browse/GOBBLIN-1099
> Project: Apache Gobblin
>  Issue Type: Improvement
>  Components: gobblin-yarn
>Affects Versions: 0.15.0
>Reporter: Sudarshan Vasudevan
>Assignee: Abhishek Tiwari
>Priority: Major
> Fix For: 0.15.0
>
>  Time Spent: 2h 10m
>  Remaining Estimate: 0h
>
> A Yarn application may leave behind orphaned containers, which can happen due 
> to lost node managers. The orphaned containers however can continue to run 
> (potentially forever) as participants in the Helix cluster. This can cause 
> the following problems for a Gobblin-on-Yarn application:
>  # Double publish of data and commit of state
>  # Task failures and partition starvation during application restarts, as 
> Helix may assign tasks to the orphaned containers which have a stale state 
> and configuration
>  # Container failures on application restarts due to Helix instance name 
> collisions with orphaned containers
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [incubator-gobblin] sv2000 commented on a change in pull request #2940: GOBBLIN-1099: Handle orphaned Yarn containers in Gobblin-on-Yarn clus…

2020-03-27 Thread GitBox
sv2000 commented on a change in pull request #2940: GOBBLIN-1099: Handle 
orphaned Yarn containers in Gobblin-on-Yarn clus…
URL: https://github.com/apache/incubator-gobblin/pull/2940#discussion_r399593342
 
 

 ##
 File path: gobblin-yarn/src/main/java/org/apache/gobblin/yarn/YarnService.java
 ##
 @@ -625,7 +634,7 @@ private boolean shouldStickToTheSameNode(int 
containerExitStatus) {
*/
   protected void handleContainerCompletion(ContainerStatus containerStatus) {
 Map.Entry completedContainerEntry = 
this.containerMap.remove(containerStatus.getContainerId());
-String completedInstanceName = completedContainerEntry == null? "unknown" 
: completedContainerEntry.getValue();
+String completedInstanceName = completedContainerEntry == null?  
UNKNOWN_HELIX_INSTANCE : completedContainerEntry.getValue();
 
 Review comment:
   Added a comment.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[jira] [Work logged] (GOBBLIN-1099) Handle orphaned Yarn containers in Gobblin-on-Yarn clusters

2020-03-27 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/GOBBLIN-1099?focusedWorklogId=411537=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-411537
 ]

ASF GitHub Bot logged work on GOBBLIN-1099:
---

Author: ASF GitHub Bot
Created on: 28/Mar/20 00:11
Start Date: 28/Mar/20 00:11
Worklog Time Spent: 10m 
  Work Description: sv2000 commented on pull request #2940: GOBBLIN-1099: 
Handle orphaned Yarn containers in Gobblin-on-Yarn clus…
URL: https://github.com/apache/incubator-gobblin/pull/2940#discussion_r399590474
 
 

 ##
 File path: 
gobblin-cluster/src/main/java/org/apache/gobblin/cluster/GobblinTaskRunner.java
 ##
 @@ -82,6 +91,7 @@
 import org.apache.gobblin.util.JvmUtils;
 import org.apache.gobblin.util.reflection.GobblinConstructorUtils;
 
+
 
 Review comment:
   Fixed.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 411537)
Time Spent: 2h  (was: 1h 50m)

> Handle orphaned Yarn containers in Gobblin-on-Yarn clusters
> ---
>
>     Key: GOBBLIN-1099
> URL: https://issues.apache.org/jira/browse/GOBBLIN-1099
> Project: Apache Gobblin
>  Issue Type: Improvement
>  Components: gobblin-yarn
>Affects Versions: 0.15.0
>Reporter: Sudarshan Vasudevan
>Assignee: Abhishek Tiwari
>Priority: Major
> Fix For: 0.15.0
>
>  Time Spent: 2h
>  Remaining Estimate: 0h
>
> A Yarn application may leave behind orphaned containers, which can happen due 
> to lost node managers. The orphaned containers however can continue to run 
> (potentially forever) as participants in the Helix cluster. This can cause 
> the following problems for a Gobblin-on-Yarn application:
>  # Double publish of data and commit of state
>  # Task failures and partition starvation during application restarts, as 
> Helix may assign tasks to the orphaned containers which have a stale state 
> and configuration
>  # Container failures on application restarts due to Helix instance name 
> collisions with orphaned containers
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (GOBBLIN-1099) Handle orphaned Yarn containers in Gobblin-on-Yarn clusters

2020-03-27 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/GOBBLIN-1099?focusedWorklogId=411535=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-411535
 ]

ASF GitHub Bot logged work on GOBBLIN-1099:
---

Author: ASF GitHub Bot
Created on: 28/Mar/20 00:11
Start Date: 28/Mar/20 00:11
Worklog Time Spent: 10m 
  Work Description: sv2000 commented on pull request #2940: GOBBLIN-1099: 
Handle orphaned Yarn containers in Gobblin-on-Yarn clus…
URL: https://github.com/apache/incubator-gobblin/pull/2940#discussion_r399590389
 
 

 ##
 File path: 
gobblin-cluster/src/main/java/org/apache/gobblin/cluster/GobblinTaskRunner.java
 ##
 @@ -366,20 +377,74 @@ boolean isStopped() {
   }
 
   @VisibleForTesting
-  void connectHelixManager() {
-try {
-  this.jobHelixManager.connect();
-  this.jobHelixManager.getMessagingService()
-  
.registerMessageHandlerFactory(GobblinHelixConstants.SHUTDOWN_MESSAGE_TYPE,
-  new ParticipantShutdownMessageHandlerFactory());
-  this.jobHelixManager.getMessagingService()
-  
.registerMessageHandlerFactory(Message.MessageType.USER_DEFINE_MSG.toString(),
-  getUserDefinedMessageHandlerFactory());
-  if (this.taskDriverHelixManager.isPresent()) {
-this.taskDriverHelixManager.get().connect();
+  void connectHelixManager() throws Exception {
+this.jobHelixManager.connect();
+//Ensure the instance is enabled.
+this.jobHelixManager.getClusterManagmentTool().enableInstance(clusterName, 
helixInstanceName, true);
+this.jobHelixManager.getMessagingService()
+
.registerMessageHandlerFactory(GobblinHelixConstants.SHUTDOWN_MESSAGE_TYPE,
+new ParticipantShutdownMessageHandlerFactory());
+this.jobHelixManager.getMessagingService()
+
.registerMessageHandlerFactory(Message.MessageType.USER_DEFINE_MSG.toString(),
+getUserDefinedMessageHandlerFactory());
+if (this.taskDriverHelixManager.isPresent()) {
+  this.taskDriverHelixManager.get().connect();
+  //Ensure the instance is enabled.
+  
this.taskDriverHelixManager.get().getClusterManagmentTool().enableInstance(this.taskDriverHelixManager.get().getClusterName(),
 helixInstanceName, true);
+}
+  }
+
+  /**
+   * A method to handle failures joining Helix cluster. The method will 
perform the following steps before attempting
+   * to re-join the cluster:
+   * 
+   *   Disconnect from Helix cluster, which would close any open 
clients
+   *   Drop instance from Helix cluster, to remove any partial instance 
structure from Helix
+   *   Re-construct helix manager instances, used to re-join the 
cluster
+   * 
+   */
+  private void onClusterJoinFailure() {
+logger.warn("Disconnecting Helix manager..");
+disconnectHelixManager();
+
+HelixAdmin admin = new 
ZKHelixAdmin(clusterConfig.getString(GobblinClusterConfigurationKeys.ZK_CONNECTION_STRING_KEY));
+//Drop the helix Instance
+logger.warn("Dropping instance: {} from cluster: {}", helixInstanceName, 
clusterName);
+HelixUtils.dropInstanceIfExists(admin, clusterName, helixInstanceName);
+
+if (this.taskDriverHelixManager.isPresent()) {
+  String taskDriverCluster = 
clusterConfig.getString(GobblinClusterConfigurationKeys.TASK_DRIVER_CLUSTER_NAME_KEY);
+  logger.warn("Dropping instance: {} from task driver cluster: {}", 
helixInstanceName, taskDriverCluster);
+  HelixUtils.dropInstanceIfExists(admin, taskDriverCluster, 
helixInstanceName);
+}
+admin.close();
+
+logger.warn("Reinitializing Helix manager..");
+initHelixManager();
+  }
+
+  @VisibleForTesting
+  void connectHelixManagerWithRetry() {
+Callable connectHelixManagerCallable = () -> {
+  try {
+logger.info("Instance: {} attempting to join cluster: {}", 
helixInstanceName, clusterName);
+connectHelixManager();
+  } catch (HelixException e) {
+logger.error("Exception encountered when joining cluster", e);
+onClusterJoinFailure();
+throw e;
   }
-} catch (Exception e) {
-  logger.error("HelixManager failed to connect", e);
+  return null;
+};
+
+Retryer retryer = RetryerBuilder.newBuilder()
+.retryIfException()
+.withStopStrategy(StopStrategies.stopAfterAttempt(5)).build();
+
+try {
+  retryer.call(connectHelixManagerCallable);
+} catch (ExecutionException | RetryException e) {
+  logger.error("Connecting to Helix manager failed", e);
 
 Review comment:
   Thanks! Will fix it.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this serv

[jira] [Work logged] (GOBBLIN-1099) Handle orphaned Yarn containers in Gobblin-on-Yarn clusters

2020-03-27 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/GOBBLIN-1099?focusedWorklogId=411534=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-411534
 ]

ASF GitHub Bot logged work on GOBBLIN-1099:
---

Author: ASF GitHub Bot
Created on: 28/Mar/20 00:11
Start Date: 28/Mar/20 00:11
Worklog Time Spent: 10m 
  Work Description: sv2000 commented on pull request #2940: GOBBLIN-1099: 
Handle orphaned Yarn containers in Gobblin-on-Yarn clus…
URL: https://github.com/apache/incubator-gobblin/pull/2940#discussion_r399590340
 
 

 ##
 File path: 
gobblin-yarn/src/main/java/org/apache/gobblin/yarn/GobblinYarnAppLauncher.java
 ##
 @@ -479,6 +488,26 @@ void connectHelixManager() {
 }
   }
 
+  /**
+   * A method to disable pre-existing live instances in a Helix cluster. This 
can happen when a previous Yarn application
+   * leaves behind orphaned Yarn worker processes. Since Helix does not 
provide an API to drop a live instance, we use
+   * the disable instance API to fence off these orphaned instances and 
prevent them from becoming participants in the
+   * new cluster.
+   *
+   * NOTE: this is a workaround for an existing YARN bug. Once YARN has a fix 
to guarantee container kills on application
+   * completion, this method should be removed.
+   */
+  void disableLiveHelixInstances() {
+String clusterName = this.helixManager.getClusterName();
+HelixAdmin helixAdmin = this.helixManager.getClusterManagmentTool();
+List liveInstances = 
HelixUtils.getLiveInstances(this.helixManager);
+LOGGER.warn("Found {} live instances in the cluster.", 
liveInstances.size());
+for (String instanceName: liveInstances) {
+  LOGGER.warn("Disabling instance: {}", instanceName);
+  helixAdmin.enableInstance(clusterName, instanceName, false);
 
 Review comment:
   No there is none. The instance is part of the cluster, but disabled. It 
needs to re-enabled to start getting task assignments from Helix.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 411534)
Time Spent: 1h 40m  (was: 1.5h)

> Handle orphaned Yarn containers in Gobblin-on-Yarn clusters
> ---
>
> Key: GOBBLIN-1099
> URL: https://issues.apache.org/jira/browse/GOBBLIN-1099
> Project: Apache Gobblin
>  Issue Type: Improvement
>  Components: gobblin-yarn
>Affects Versions: 0.15.0
>Reporter: Sudarshan Vasudevan
>Assignee: Abhishek Tiwari
>Priority: Major
> Fix For: 0.15.0
>
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> A Yarn application may leave behind orphaned containers, which can happen due 
> to lost node managers. The orphaned containers however can continue to run 
> (potentially forever) as participants in the Helix cluster. This can cause 
> the following problems for a Gobblin-on-Yarn application:
>  # Double publish of data and commit of state
>  # Task failures and partition starvation during application restarts, as 
> Helix may assign tasks to the orphaned containers which have a stale state 
> and configuration
>  # Container failures on application restarts due to Helix instance name 
> collisions with orphaned containers
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [incubator-gobblin] sv2000 commented on a change in pull request #2940: GOBBLIN-1099: Handle orphaned Yarn containers in Gobblin-on-Yarn clus…

2020-03-27 Thread GitBox
sv2000 commented on a change in pull request #2940: GOBBLIN-1099: Handle 
orphaned Yarn containers in Gobblin-on-Yarn clus…
URL: https://github.com/apache/incubator-gobblin/pull/2940#discussion_r399590340
 
 

 ##
 File path: 
gobblin-yarn/src/main/java/org/apache/gobblin/yarn/GobblinYarnAppLauncher.java
 ##
 @@ -479,6 +488,26 @@ void connectHelixManager() {
 }
   }
 
+  /**
+   * A method to disable pre-existing live instances in a Helix cluster. This 
can happen when a previous Yarn application
+   * leaves behind orphaned Yarn worker processes. Since Helix does not 
provide an API to drop a live instance, we use
+   * the disable instance API to fence off these orphaned instances and 
prevent them from becoming participants in the
+   * new cluster.
+   *
+   * NOTE: this is a workaround for an existing YARN bug. Once YARN has a fix 
to guarantee container kills on application
+   * completion, this method should be removed.
+   */
+  void disableLiveHelixInstances() {
+String clusterName = this.helixManager.getClusterName();
+HelixAdmin helixAdmin = this.helixManager.getClusterManagmentTool();
+List liveInstances = 
HelixUtils.getLiveInstances(this.helixManager);
+LOGGER.warn("Found {} live instances in the cluster.", 
liveInstances.size());
+for (String instanceName: liveInstances) {
+  LOGGER.warn("Disabling instance: {}", instanceName);
+  helixAdmin.enableInstance(clusterName, instanceName, false);
 
 Review comment:
   No there is none. The instance is part of the cluster, but disabled. It 
needs to re-enabled to start getting task assignments from Helix.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [incubator-gobblin] sv2000 commented on a change in pull request #2940: GOBBLIN-1099: Handle orphaned Yarn containers in Gobblin-on-Yarn clus…

2020-03-27 Thread GitBox
sv2000 commented on a change in pull request #2940: GOBBLIN-1099: Handle 
orphaned Yarn containers in Gobblin-on-Yarn clus…
URL: https://github.com/apache/incubator-gobblin/pull/2940#discussion_r399590474
 
 

 ##
 File path: 
gobblin-cluster/src/main/java/org/apache/gobblin/cluster/GobblinTaskRunner.java
 ##
 @@ -82,6 +91,7 @@
 import org.apache.gobblin.util.JvmUtils;
 import org.apache.gobblin.util.reflection.GobblinConstructorUtils;
 
+
 
 Review comment:
   Fixed.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [incubator-gobblin] sv2000 commented on a change in pull request #2940: GOBBLIN-1099: Handle orphaned Yarn containers in Gobblin-on-Yarn clus…

2020-03-27 Thread GitBox
sv2000 commented on a change in pull request #2940: GOBBLIN-1099: Handle 
orphaned Yarn containers in Gobblin-on-Yarn clus…
URL: https://github.com/apache/incubator-gobblin/pull/2940#discussion_r399590389
 
 

 ##
 File path: 
gobblin-cluster/src/main/java/org/apache/gobblin/cluster/GobblinTaskRunner.java
 ##
 @@ -366,20 +377,74 @@ boolean isStopped() {
   }
 
   @VisibleForTesting
-  void connectHelixManager() {
-try {
-  this.jobHelixManager.connect();
-  this.jobHelixManager.getMessagingService()
-  
.registerMessageHandlerFactory(GobblinHelixConstants.SHUTDOWN_MESSAGE_TYPE,
-  new ParticipantShutdownMessageHandlerFactory());
-  this.jobHelixManager.getMessagingService()
-  
.registerMessageHandlerFactory(Message.MessageType.USER_DEFINE_MSG.toString(),
-  getUserDefinedMessageHandlerFactory());
-  if (this.taskDriverHelixManager.isPresent()) {
-this.taskDriverHelixManager.get().connect();
+  void connectHelixManager() throws Exception {
+this.jobHelixManager.connect();
+//Ensure the instance is enabled.
+this.jobHelixManager.getClusterManagmentTool().enableInstance(clusterName, 
helixInstanceName, true);
+this.jobHelixManager.getMessagingService()
+
.registerMessageHandlerFactory(GobblinHelixConstants.SHUTDOWN_MESSAGE_TYPE,
+new ParticipantShutdownMessageHandlerFactory());
+this.jobHelixManager.getMessagingService()
+
.registerMessageHandlerFactory(Message.MessageType.USER_DEFINE_MSG.toString(),
+getUserDefinedMessageHandlerFactory());
+if (this.taskDriverHelixManager.isPresent()) {
+  this.taskDriverHelixManager.get().connect();
+  //Ensure the instance is enabled.
+  
this.taskDriverHelixManager.get().getClusterManagmentTool().enableInstance(this.taskDriverHelixManager.get().getClusterName(),
 helixInstanceName, true);
+}
+  }
+
+  /**
+   * A method to handle failures joining Helix cluster. The method will 
perform the following steps before attempting
+   * to re-join the cluster:
+   * 
+   *   Disconnect from Helix cluster, which would close any open 
clients
+   *   Drop instance from Helix cluster, to remove any partial instance 
structure from Helix
+   *   Re-construct helix manager instances, used to re-join the 
cluster
+   * 
+   */
+  private void onClusterJoinFailure() {
+logger.warn("Disconnecting Helix manager..");
+disconnectHelixManager();
+
+HelixAdmin admin = new 
ZKHelixAdmin(clusterConfig.getString(GobblinClusterConfigurationKeys.ZK_CONNECTION_STRING_KEY));
+//Drop the helix Instance
+logger.warn("Dropping instance: {} from cluster: {}", helixInstanceName, 
clusterName);
+HelixUtils.dropInstanceIfExists(admin, clusterName, helixInstanceName);
+
+if (this.taskDriverHelixManager.isPresent()) {
+  String taskDriverCluster = 
clusterConfig.getString(GobblinClusterConfigurationKeys.TASK_DRIVER_CLUSTER_NAME_KEY);
+  logger.warn("Dropping instance: {} from task driver cluster: {}", 
helixInstanceName, taskDriverCluster);
+  HelixUtils.dropInstanceIfExists(admin, taskDriverCluster, 
helixInstanceName);
+}
+admin.close();
+
+logger.warn("Reinitializing Helix manager..");
+initHelixManager();
+  }
+
+  @VisibleForTesting
+  void connectHelixManagerWithRetry() {
+Callable connectHelixManagerCallable = () -> {
+  try {
+logger.info("Instance: {} attempting to join cluster: {}", 
helixInstanceName, clusterName);
+connectHelixManager();
+  } catch (HelixException e) {
+logger.error("Exception encountered when joining cluster", e);
+onClusterJoinFailure();
+throw e;
   }
-} catch (Exception e) {
-  logger.error("HelixManager failed to connect", e);
+  return null;
+};
+
+Retryer retryer = RetryerBuilder.newBuilder()
+.retryIfException()
+.withStopStrategy(StopStrategies.stopAfterAttempt(5)).build();
+
+try {
+  retryer.call(connectHelixManagerCallable);
+} catch (ExecutionException | RetryException e) {
+  logger.error("Connecting to Helix manager failed", e);
 
 Review comment:
   Thanks! Will fix it.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[jira] [Work logged] (GOBBLIN-1099) Handle orphaned Yarn containers in Gobblin-on-Yarn clusters

2020-03-27 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/GOBBLIN-1099?focusedWorklogId=411531=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-411531
 ]

ASF GitHub Bot logged work on GOBBLIN-1099:
---

Author: ASF GitHub Bot
Created on: 28/Mar/20 00:09
Start Date: 28/Mar/20 00:09
Worklog Time Spent: 10m 
  Work Description: sv2000 commented on pull request #2940: GOBBLIN-1099: 
Handle orphaned Yarn containers in Gobblin-on-Yarn clus…
URL: https://github.com/apache/incubator-gobblin/pull/2940#discussion_r39958
 
 

 ##
 File path: gobblin-yarn/src/main/java/org/apache/gobblin/yarn/YarnService.java
 ##
 @@ -188,6 +193,7 @@
   // into the queue if the container running the instance completes. Unused 
Helix
   // instance names get picked up when replacement containers get allocated.
   private final ConcurrentLinkedQueue unusedHelixInstanceNames = 
Queues.newConcurrentLinkedQueue();
+  private boolean reuseUnusedHelixInstanceNames = true;
 
 Review comment:
   Thanks! Removed this variable.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 411531)
Time Spent: 1.5h  (was: 1h 20m)

> Handle orphaned Yarn containers in Gobblin-on-Yarn clusters
> ---
>
>     Key: GOBBLIN-1099
> URL: https://issues.apache.org/jira/browse/GOBBLIN-1099
> Project: Apache Gobblin
>  Issue Type: Improvement
>  Components: gobblin-yarn
>Affects Versions: 0.15.0
>Reporter: Sudarshan Vasudevan
>Assignee: Abhishek Tiwari
>Priority: Major
> Fix For: 0.15.0
>
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> A Yarn application may leave behind orphaned containers, which can happen due 
> to lost node managers. The orphaned containers however can continue to run 
> (potentially forever) as participants in the Helix cluster. This can cause 
> the following problems for a Gobblin-on-Yarn application:
>  # Double publish of data and commit of state
>  # Task failures and partition starvation during application restarts, as 
> Helix may assign tasks to the orphaned containers which have a stale state 
> and configuration
>  # Container failures on application restarts due to Helix instance name 
> collisions with orphaned containers
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [incubator-gobblin] sv2000 commented on a change in pull request #2940: GOBBLIN-1099: Handle orphaned Yarn containers in Gobblin-on-Yarn clus…

2020-03-27 Thread GitBox
sv2000 commented on a change in pull request #2940: GOBBLIN-1099: Handle 
orphaned Yarn containers in Gobblin-on-Yarn clus…
URL: https://github.com/apache/incubator-gobblin/pull/2940#discussion_r39958
 
 

 ##
 File path: gobblin-yarn/src/main/java/org/apache/gobblin/yarn/YarnService.java
 ##
 @@ -188,6 +193,7 @@
   // into the queue if the container running the instance completes. Unused 
Helix
   // instance names get picked up when replacement containers get allocated.
   private final ConcurrentLinkedQueue unusedHelixInstanceNames = 
Queues.newConcurrentLinkedQueue();
+  private boolean reuseUnusedHelixInstanceNames = true;
 
 Review comment:
   Thanks! Removed this variable.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[jira] [Work logged] (GOBBLIN-1099) Handle orphaned Yarn containers in Gobblin-on-Yarn clusters

2020-03-27 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/GOBBLIN-1099?focusedWorklogId=411528=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-411528
 ]

ASF GitHub Bot logged work on GOBBLIN-1099:
---

Author: ASF GitHub Bot
Created on: 28/Mar/20 00:04
Start Date: 28/Mar/20 00:04
Worklog Time Spent: 10m 
  Work Description: sv2000 commented on pull request #2940: GOBBLIN-1099: 
Handle orphaned Yarn containers in Gobblin-on-Yarn clus…
URL: https://github.com/apache/incubator-gobblin/pull/2940#discussion_r399589040
 
 

 ##
 File path: 
gobblin-cluster/src/main/java/org/apache/gobblin/cluster/GobblinTaskRunner.java
 ##
 @@ -366,20 +377,74 @@ boolean isStopped() {
   }
 
   @VisibleForTesting
-  void connectHelixManager() {
-try {
-  this.jobHelixManager.connect();
-  this.jobHelixManager.getMessagingService()
-  
.registerMessageHandlerFactory(GobblinHelixConstants.SHUTDOWN_MESSAGE_TYPE,
-  new ParticipantShutdownMessageHandlerFactory());
-  this.jobHelixManager.getMessagingService()
-  
.registerMessageHandlerFactory(Message.MessageType.USER_DEFINE_MSG.toString(),
-  getUserDefinedMessageHandlerFactory());
-  if (this.taskDriverHelixManager.isPresent()) {
-this.taskDriverHelixManager.get().connect();
+  void connectHelixManager() throws Exception {
+this.jobHelixManager.connect();
+//Ensure the instance is enabled.
+this.jobHelixManager.getClusterManagmentTool().enableInstance(clusterName, 
helixInstanceName, true);
+this.jobHelixManager.getMessagingService()
+
.registerMessageHandlerFactory(GobblinHelixConstants.SHUTDOWN_MESSAGE_TYPE,
+new ParticipantShutdownMessageHandlerFactory());
+this.jobHelixManager.getMessagingService()
+
.registerMessageHandlerFactory(Message.MessageType.USER_DEFINE_MSG.toString(),
+getUserDefinedMessageHandlerFactory());
+if (this.taskDriverHelixManager.isPresent()) {
+  this.taskDriverHelixManager.get().connect();
+  //Ensure the instance is enabled.
+  
this.taskDriverHelixManager.get().getClusterManagmentTool().enableInstance(this.taskDriverHelixManager.get().getClusterName(),
 helixInstanceName, true);
+}
+  }
+
+  /**
+   * A method to handle failures joining Helix cluster. The method will 
perform the following steps before attempting
+   * to re-join the cluster:
+   * 
+   *   Disconnect from Helix cluster, which would close any open 
clients
+   *   Drop instance from Helix cluster, to remove any partial instance 
structure from Helix
+   *   Re-construct helix manager instances, used to re-join the 
cluster
+   * 
+   */
+  private void onClusterJoinFailure() {
+logger.warn("Disconnecting Helix manager..");
+disconnectHelixManager();
+
+HelixAdmin admin = new 
ZKHelixAdmin(clusterConfig.getString(GobblinClusterConfigurationKeys.ZK_CONNECTION_STRING_KEY));
 
 Review comment:
   The intent was to avoid a state where the HelixManager is in an inconsistent 
state where the underlying ZkClient is connected but there is a failure later. 
By disconnecting before retrying, we ensure any partial state during the 
previous connect attempt is cleaned up. 
   
   The reason we are instantiating a new HelixAdmin instance (separate from 
HelixManager) is that getClusterManagementTool() requires HelixManager to be 
connected first. 
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 411528)
Time Spent: 1h 20m  (was: 1h 10m)

> Handle orphaned Yarn containers in Gobblin-on-Yarn clusters
> ---
>
>     Key: GOBBLIN-1099
> URL: https://issues.apache.org/jira/browse/GOBBLIN-1099
> Project: Apache Gobblin
>  Issue Type: Improvement
>  Components: gobblin-yarn
>Affects Versions: 0.15.0
>Reporter: Sudarshan Vasudevan
>Assignee: Abhishek Tiwari
>Priority: Major
> Fix For: 0.15.0
>
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> A Yarn application may leave behind orphaned containers, which can happen due 
> to lost node managers. The orphaned containers however can continue to run 
> (potentially forever) as participants in the Helix cluster. This can cause 
> the following problems for a Gobblin-on-Yarn application:
>  # Double publish of data and commit of state
>  # Task failures and partition starvation during application resta

[GitHub] [incubator-gobblin] sv2000 commented on a change in pull request #2940: GOBBLIN-1099: Handle orphaned Yarn containers in Gobblin-on-Yarn clus…

2020-03-27 Thread GitBox
sv2000 commented on a change in pull request #2940: GOBBLIN-1099: Handle 
orphaned Yarn containers in Gobblin-on-Yarn clus…
URL: https://github.com/apache/incubator-gobblin/pull/2940#discussion_r399589040
 
 

 ##
 File path: 
gobblin-cluster/src/main/java/org/apache/gobblin/cluster/GobblinTaskRunner.java
 ##
 @@ -366,20 +377,74 @@ boolean isStopped() {
   }
 
   @VisibleForTesting
-  void connectHelixManager() {
-try {
-  this.jobHelixManager.connect();
-  this.jobHelixManager.getMessagingService()
-  
.registerMessageHandlerFactory(GobblinHelixConstants.SHUTDOWN_MESSAGE_TYPE,
-  new ParticipantShutdownMessageHandlerFactory());
-  this.jobHelixManager.getMessagingService()
-  
.registerMessageHandlerFactory(Message.MessageType.USER_DEFINE_MSG.toString(),
-  getUserDefinedMessageHandlerFactory());
-  if (this.taskDriverHelixManager.isPresent()) {
-this.taskDriverHelixManager.get().connect();
+  void connectHelixManager() throws Exception {
+this.jobHelixManager.connect();
+//Ensure the instance is enabled.
+this.jobHelixManager.getClusterManagmentTool().enableInstance(clusterName, 
helixInstanceName, true);
+this.jobHelixManager.getMessagingService()
+
.registerMessageHandlerFactory(GobblinHelixConstants.SHUTDOWN_MESSAGE_TYPE,
+new ParticipantShutdownMessageHandlerFactory());
+this.jobHelixManager.getMessagingService()
+
.registerMessageHandlerFactory(Message.MessageType.USER_DEFINE_MSG.toString(),
+getUserDefinedMessageHandlerFactory());
+if (this.taskDriverHelixManager.isPresent()) {
+  this.taskDriverHelixManager.get().connect();
+  //Ensure the instance is enabled.
+  
this.taskDriverHelixManager.get().getClusterManagmentTool().enableInstance(this.taskDriverHelixManager.get().getClusterName(),
 helixInstanceName, true);
+}
+  }
+
+  /**
+   * A method to handle failures joining Helix cluster. The method will 
perform the following steps before attempting
+   * to re-join the cluster:
+   * 
+   *   Disconnect from Helix cluster, which would close any open 
clients
+   *   Drop instance from Helix cluster, to remove any partial instance 
structure from Helix
+   *   Re-construct helix manager instances, used to re-join the 
cluster
+   * 
+   */
+  private void onClusterJoinFailure() {
+logger.warn("Disconnecting Helix manager..");
+disconnectHelixManager();
+
+HelixAdmin admin = new 
ZKHelixAdmin(clusterConfig.getString(GobblinClusterConfigurationKeys.ZK_CONNECTION_STRING_KEY));
 
 Review comment:
   The intent was to avoid a state where the HelixManager is in an inconsistent 
state where the underlying ZkClient is connected but there is a failure later. 
By disconnecting before retrying, we ensure any partial state during the 
previous connect attempt is cleaned up. 
   
   The reason we are instantiating a new HelixAdmin instance (separate from 
HelixManager) is that getClusterManagementTool() requires HelixManager to be 
connected first. 


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[jira] [Work logged] (GOBBLIN-1099) Handle orphaned Yarn containers in Gobblin-on-Yarn clusters

2020-03-27 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/GOBBLIN-1099?focusedWorklogId=411517=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-411517
 ]

ASF GitHub Bot logged work on GOBBLIN-1099:
---

Author: ASF GitHub Bot
Created on: 27/Mar/20 23:41
Start Date: 27/Mar/20 23:41
Worklog Time Spent: 10m 
  Work Description: autumnust commented on pull request #2940: 
GOBBLIN-1099: Handle orphaned Yarn containers in Gobblin-on-Yarn clus…
URL: https://github.com/apache/incubator-gobblin/pull/2940#discussion_r399583366
 
 

 ##
 File path: gobblin-yarn/src/main/java/org/apache/gobblin/yarn/YarnService.java
 ##
 @@ -635,10 +644,26 @@ protected void handleContainerCompletion(ContainerStatus 
containerStatus) {
   containerStatus.getContainerId(), containerStatus.getDiagnostics()));
 }
 
-if 
(this.releasedContainerCache.getIfPresent(containerStatus.getContainerId()) != 
null) {
-  LOGGER.info("Container release requested, so not spawning a replacement 
for containerId {}",
-  containerStatus.getContainerId());
-  return;
+if (containerStatus.getExitStatus() == ContainerExitStatus.ABORTED) {
+  if 
(this.releasedContainerCache.getIfPresent(containerStatus.getContainerId()) != 
null) {
+LOGGER.info("Container release requested, so not spawning a 
replacement for containerId {}", containerStatus.getContainerId());
+return;
+  } else {
+LOGGER.info("Container {} aborted due to lost NM", 
containerStatus.getContainerId());
+   // Container release was not requested. Likely, the container was 
running on a node on which the NM died.
+   // In this case, RM assumes that the containers are "lost", even though 
the container process may still be
+// running on the node. We need to ensure that the Helix instances 
running on the orphaned containers
+// are fenced off from the Helix cluster to avoid double publishing 
and state being committed by the
+// instances.
+if (!UNKNOWN_HELIX_INSTANCE.equals(completedInstanceName)) {
+  String clusterName = this.helixManager.getClusterName();
+  //Disable the orphaned instance.
+  if (HelixUtils.isInstanceLive(helixManager, completedInstanceName)) {
+LOGGER.info("Disabling the Helix instance {}", 
completedInstanceName);
+
this.helixManager.getClusterManagmentTool().enableInstance(clusterName, 
completedInstanceName, false);
 
 Review comment:
   enableInstance, This is a very good API :) 
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 411517)
    Time Spent: 1h 10m  (was: 1h)

> Handle orphaned Yarn containers in Gobblin-on-Yarn clusters
> -------
>
> Key: GOBBLIN-1099
> URL: https://issues.apache.org/jira/browse/GOBBLIN-1099
> Project: Apache Gobblin
>  Issue Type: Improvement
>  Components: gobblin-yarn
>Affects Versions: 0.15.0
>Reporter: Sudarshan Vasudevan
>Assignee: Abhishek Tiwari
>Priority: Major
> Fix For: 0.15.0
>
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> A Yarn application may leave behind orphaned containers, which can happen due 
> to lost node managers. The orphaned containers however can continue to run 
> (potentially forever) as participants in the Helix cluster. This can cause 
> the following problems for a Gobblin-on-Yarn application:
>  # Double publish of data and commit of state
>  # Task failures and partition starvation during application restarts, as 
> Helix may assign tasks to the orphaned containers which have a stale state 
> and configuration
>  # Container failures on application restarts due to Helix instance name 
> collisions with orphaned containers
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (GOBBLIN-1099) Handle orphaned Yarn containers in Gobblin-on-Yarn clusters

2020-03-27 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/GOBBLIN-1099?focusedWorklogId=411515=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-411515
 ]

ASF GitHub Bot logged work on GOBBLIN-1099:
---

Author: ASF GitHub Bot
Created on: 27/Mar/20 23:41
Start Date: 27/Mar/20 23:41
Worklog Time Spent: 10m 
  Work Description: autumnust commented on pull request #2940: 
GOBBLIN-1099: Handle orphaned Yarn containers in Gobblin-on-Yarn clus…
URL: https://github.com/apache/incubator-gobblin/pull/2940#discussion_r399580058
 
 

 ##
 File path: 
gobblin-cluster/src/main/java/org/apache/gobblin/cluster/GobblinTaskRunner.java
 ##
 @@ -366,20 +377,74 @@ boolean isStopped() {
   }
 
   @VisibleForTesting
-  void connectHelixManager() {
-try {
-  this.jobHelixManager.connect();
-  this.jobHelixManager.getMessagingService()
-  
.registerMessageHandlerFactory(GobblinHelixConstants.SHUTDOWN_MESSAGE_TYPE,
-  new ParticipantShutdownMessageHandlerFactory());
-  this.jobHelixManager.getMessagingService()
-  
.registerMessageHandlerFactory(Message.MessageType.USER_DEFINE_MSG.toString(),
-  getUserDefinedMessageHandlerFactory());
-  if (this.taskDriverHelixManager.isPresent()) {
-this.taskDriverHelixManager.get().connect();
+  void connectHelixManager() throws Exception {
+this.jobHelixManager.connect();
+//Ensure the instance is enabled.
+this.jobHelixManager.getClusterManagmentTool().enableInstance(clusterName, 
helixInstanceName, true);
+this.jobHelixManager.getMessagingService()
+
.registerMessageHandlerFactory(GobblinHelixConstants.SHUTDOWN_MESSAGE_TYPE,
+new ParticipantShutdownMessageHandlerFactory());
+this.jobHelixManager.getMessagingService()
+
.registerMessageHandlerFactory(Message.MessageType.USER_DEFINE_MSG.toString(),
+getUserDefinedMessageHandlerFactory());
+if (this.taskDriverHelixManager.isPresent()) {
+  this.taskDriverHelixManager.get().connect();
+  //Ensure the instance is enabled.
+  
this.taskDriverHelixManager.get().getClusterManagmentTool().enableInstance(this.taskDriverHelixManager.get().getClusterName(),
 helixInstanceName, true);
+}
+  }
+
+  /**
+   * A method to handle failures joining Helix cluster. The method will 
perform the following steps before attempting
+   * to re-join the cluster:
+   * 
+   *   Disconnect from Helix cluster, which would close any open 
clients
+   *   Drop instance from Helix cluster, to remove any partial instance 
structure from Helix
+   *   Re-construct helix manager instances, used to re-join the 
cluster
+   * 
+   */
+  private void onClusterJoinFailure() {
+logger.warn("Disconnecting Helix manager..");
+disconnectHelixManager();
+
+HelixAdmin admin = new 
ZKHelixAdmin(clusterConfig.getString(GobblinClusterConfigurationKeys.ZK_CONNECTION_STRING_KEY));
+//Drop the helix Instance
+logger.warn("Dropping instance: {} from cluster: {}", helixInstanceName, 
clusterName);
+HelixUtils.dropInstanceIfExists(admin, clusterName, helixInstanceName);
 
 Review comment:
   The return value of this method is ignored, is that intentional? Or shall we 
leave the handling of exception to caller of this static method ? 
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 411515)
Time Spent: 1h 10m  (was: 1h)

> Handle orphaned Yarn containers in Gobblin-on-Yarn clusters
> ---
>
> Key: GOBBLIN-1099
> URL: https://issues.apache.org/jira/browse/GOBBLIN-1099
> Project: Apache Gobblin
>  Issue Type: Improvement
>  Components: gobblin-yarn
>Affects Versions: 0.15.0
>Reporter: Sudarshan Vasudevan
>Assignee: Abhishek Tiwari
>Priority: Major
> Fix For: 0.15.0
>
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> A Yarn application may leave behind orphaned containers, which can happen due 
> to lost node managers. The orphaned containers however can continue to run 
> (potentially forever) as participants in the Helix cluster. This can cause 
> the following problems for a Gobblin-on-Yarn application:
>  # Double publish of data and commit of state
>  # Task failures and partition starvation during application restarts, as 
> Helix may assign tasks to the orphaned containers which have a stale state 
&g

[jira] [Work logged] (GOBBLIN-1099) Handle orphaned Yarn containers in Gobblin-on-Yarn clusters

2020-03-27 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/GOBBLIN-1099?focusedWorklogId=411514=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-411514
 ]

ASF GitHub Bot logged work on GOBBLIN-1099:
---

Author: ASF GitHub Bot
Created on: 27/Mar/20 23:41
Start Date: 27/Mar/20 23:41
Worklog Time Spent: 10m 
  Work Description: autumnust commented on pull request #2940: 
GOBBLIN-1099: Handle orphaned Yarn containers in Gobblin-on-Yarn clus…
URL: https://github.com/apache/incubator-gobblin/pull/2940#discussion_r399582595
 
 

 ##
 File path: gobblin-yarn/src/main/java/org/apache/gobblin/yarn/YarnService.java
 ##
 @@ -625,7 +634,7 @@ private boolean shouldStickToTheSameNode(int 
containerExitStatus) {
*/
   protected void handleContainerCompletion(ContainerStatus containerStatus) {
 Map.Entry completedContainerEntry = 
this.containerMap.remove(containerStatus.getContainerId());
-String completedInstanceName = completedContainerEntry == null? "unknown" 
: completedContainerEntry.getValue();
+String completedInstanceName = completedContainerEntry == null?  
UNKNOWN_HELIX_INSTANCE : completedContainerEntry.getValue();
 
 Review comment:
   Just curious, What kind of situation result in a containerId is not in 
containerMap? can we also add javadoc here
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 411514)
Time Spent: 1h 10m  (was: 1h)

> Handle orphaned Yarn containers in Gobblin-on-Yarn clusters
> ---
>
>     Key: GOBBLIN-1099
> URL: https://issues.apache.org/jira/browse/GOBBLIN-1099
> Project: Apache Gobblin
>  Issue Type: Improvement
>  Components: gobblin-yarn
>Affects Versions: 0.15.0
>Reporter: Sudarshan Vasudevan
>Assignee: Abhishek Tiwari
>Priority: Major
> Fix For: 0.15.0
>
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> A Yarn application may leave behind orphaned containers, which can happen due 
> to lost node managers. The orphaned containers however can continue to run 
> (potentially forever) as participants in the Helix cluster. This can cause 
> the following problems for a Gobblin-on-Yarn application:
>  # Double publish of data and commit of state
>  # Task failures and partition starvation during application restarts, as 
> Helix may assign tasks to the orphaned containers which have a stale state 
> and configuration
>  # Container failures on application restarts due to Helix instance name 
> collisions with orphaned containers
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (GOBBLIN-1099) Handle orphaned Yarn containers in Gobblin-on-Yarn clusters

2020-03-27 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/GOBBLIN-1099?focusedWorklogId=411512=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-411512
 ]

ASF GitHub Bot logged work on GOBBLIN-1099:
---

Author: ASF GitHub Bot
Created on: 27/Mar/20 23:41
Start Date: 27/Mar/20 23:41
Worklog Time Spent: 10m 
  Work Description: autumnust commented on pull request #2940: 
GOBBLIN-1099: Handle orphaned Yarn containers in Gobblin-on-Yarn clus…
URL: https://github.com/apache/incubator-gobblin/pull/2940#discussion_r399578539
 
 

 ##
 File path: 
gobblin-cluster/src/main/java/org/apache/gobblin/cluster/GobblinTaskRunner.java
 ##
 @@ -366,20 +377,74 @@ boolean isStopped() {
   }
 
   @VisibleForTesting
-  void connectHelixManager() {
-try {
-  this.jobHelixManager.connect();
-  this.jobHelixManager.getMessagingService()
-  
.registerMessageHandlerFactory(GobblinHelixConstants.SHUTDOWN_MESSAGE_TYPE,
-  new ParticipantShutdownMessageHandlerFactory());
-  this.jobHelixManager.getMessagingService()
-  
.registerMessageHandlerFactory(Message.MessageType.USER_DEFINE_MSG.toString(),
-  getUserDefinedMessageHandlerFactory());
-  if (this.taskDriverHelixManager.isPresent()) {
-this.taskDriverHelixManager.get().connect();
+  void connectHelixManager() throws Exception {
+this.jobHelixManager.connect();
+//Ensure the instance is enabled.
+this.jobHelixManager.getClusterManagmentTool().enableInstance(clusterName, 
helixInstanceName, true);
+this.jobHelixManager.getMessagingService()
+
.registerMessageHandlerFactory(GobblinHelixConstants.SHUTDOWN_MESSAGE_TYPE,
+new ParticipantShutdownMessageHandlerFactory());
+this.jobHelixManager.getMessagingService()
+
.registerMessageHandlerFactory(Message.MessageType.USER_DEFINE_MSG.toString(),
+getUserDefinedMessageHandlerFactory());
+if (this.taskDriverHelixManager.isPresent()) {
+  this.taskDriverHelixManager.get().connect();
+  //Ensure the instance is enabled.
+  
this.taskDriverHelixManager.get().getClusterManagmentTool().enableInstance(this.taskDriverHelixManager.get().getClusterName(),
 helixInstanceName, true);
+}
+  }
+
+  /**
+   * A method to handle failures joining Helix cluster. The method will 
perform the following steps before attempting
+   * to re-join the cluster:
+   * 
+   *   Disconnect from Helix cluster, which would close any open 
clients
+   *   Drop instance from Helix cluster, to remove any partial instance 
structure from Helix
+   *   Re-construct helix manager instances, used to re-join the 
cluster
+   * 
+   */
+  private void onClusterJoinFailure() {
+logger.warn("Disconnecting Helix manager..");
+disconnectHelixManager();
+
+HelixAdmin admin = new 
ZKHelixAdmin(clusterConfig.getString(GobblinClusterConfigurationKeys.ZK_CONNECTION_STRING_KEY));
 
 Review comment:
   I couldn't fully get the purpose of disconnect from existing helixManager, 
creating a HelixAdmin and use the newly-created admin to conduct 
instance-dropping. Isn't `helixManager.getClusterManagmentTool()` essentially a 
same HelixAdmin that we can use to drop instance ? What's the gain of 
disconnecting helixManager? 
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 411512)
Time Spent: 1h  (was: 50m)

> Handle orphaned Yarn containers in Gobblin-on-Yarn clusters
> ---
>
>     Key: GOBBLIN-1099
> URL: https://issues.apache.org/jira/browse/GOBBLIN-1099
> Project: Apache Gobblin
>  Issue Type: Improvement
>  Components: gobblin-yarn
>Affects Versions: 0.15.0
>Reporter: Sudarshan Vasudevan
>Assignee: Abhishek Tiwari
>Priority: Major
> Fix For: 0.15.0
>
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> A Yarn application may leave behind orphaned containers, which can happen due 
> to lost node managers. The orphaned containers however can continue to run 
> (potentially forever) as participants in the Helix cluster. This can cause 
> the following problems for a Gobblin-on-Yarn application:
>  # Double publish of data and commit of state
>  # Task failures and partition starvation during application restarts, as 
> Helix may assign tasks to the orphaned containers which have a stale state 
> and configuration
>  # Container

[GitHub] [incubator-gobblin] autumnust commented on a change in pull request #2940: GOBBLIN-1099: Handle orphaned Yarn containers in Gobblin-on-Yarn clus…

2020-03-27 Thread GitBox
autumnust commented on a change in pull request #2940: GOBBLIN-1099: Handle 
orphaned Yarn containers in Gobblin-on-Yarn clus…
URL: https://github.com/apache/incubator-gobblin/pull/2940#discussion_r399582595
 
 

 ##
 File path: gobblin-yarn/src/main/java/org/apache/gobblin/yarn/YarnService.java
 ##
 @@ -625,7 +634,7 @@ private boolean shouldStickToTheSameNode(int 
containerExitStatus) {
*/
   protected void handleContainerCompletion(ContainerStatus containerStatus) {
 Map.Entry completedContainerEntry = 
this.containerMap.remove(containerStatus.getContainerId());
-String completedInstanceName = completedContainerEntry == null? "unknown" 
: completedContainerEntry.getValue();
+String completedInstanceName = completedContainerEntry == null?  
UNKNOWN_HELIX_INSTANCE : completedContainerEntry.getValue();
 
 Review comment:
   Just curious, What kind of situation result in a containerId is not in 
containerMap? can we also add javadoc here


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [incubator-gobblin] autumnust commented on a change in pull request #2940: GOBBLIN-1099: Handle orphaned Yarn containers in Gobblin-on-Yarn clus…

2020-03-27 Thread GitBox
autumnust commented on a change in pull request #2940: GOBBLIN-1099: Handle 
orphaned Yarn containers in Gobblin-on-Yarn clus…
URL: https://github.com/apache/incubator-gobblin/pull/2940#discussion_r399575480
 
 

 ##
 File path: 
gobblin-cluster/src/main/java/org/apache/gobblin/cluster/GobblinTaskRunner.java
 ##
 @@ -82,6 +91,7 @@
 import org.apache.gobblin.util.JvmUtils;
 import org.apache.gobblin.util.reflection.GobblinConstructorUtils;
 
+
 
 Review comment:
   Empty line.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[jira] [Work logged] (GOBBLIN-1099) Handle orphaned Yarn containers in Gobblin-on-Yarn clusters

2020-03-27 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/GOBBLIN-1099?focusedWorklogId=411513=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-411513
 ]

ASF GitHub Bot logged work on GOBBLIN-1099:
---

Author: ASF GitHub Bot
Created on: 27/Mar/20 23:41
Start Date: 27/Mar/20 23:41
Worklog Time Spent: 10m 
  Work Description: autumnust commented on pull request #2940: 
GOBBLIN-1099: Handle orphaned Yarn containers in Gobblin-on-Yarn clus…
URL: https://github.com/apache/incubator-gobblin/pull/2940#discussion_r399583642
 
 

 ##
 File path: gobblin-yarn/src/main/java/org/apache/gobblin/yarn/YarnService.java
 ##
 @@ -188,6 +193,7 @@
   // into the queue if the container running the instance completes. Unused 
Helix
   // instance names get picked up when replacement containers get allocated.
   private final ConcurrentLinkedQueue unusedHelixInstanceNames = 
Queues.newConcurrentLinkedQueue();
+  private boolean reuseUnusedHelixInstanceNames = true;
 
 Review comment:
   what's the usage of this?
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 411513)
Time Spent: 1h 10m  (was: 1h)

> Handle orphaned Yarn containers in Gobblin-on-Yarn clusters
> ---
>
>     Key: GOBBLIN-1099
> URL: https://issues.apache.org/jira/browse/GOBBLIN-1099
> Project: Apache Gobblin
>  Issue Type: Improvement
>  Components: gobblin-yarn
>Affects Versions: 0.15.0
>Reporter: Sudarshan Vasudevan
>Assignee: Abhishek Tiwari
>Priority: Major
> Fix For: 0.15.0
>
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> A Yarn application may leave behind orphaned containers, which can happen due 
> to lost node managers. The orphaned containers however can continue to run 
> (potentially forever) as participants in the Helix cluster. This can cause 
> the following problems for a Gobblin-on-Yarn application:
>  # Double publish of data and commit of state
>  # Task failures and partition starvation during application restarts, as 
> Helix may assign tasks to the orphaned containers which have a stale state 
> and configuration
>  # Container failures on application restarts due to Helix instance name 
> collisions with orphaned containers
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (GOBBLIN-1099) Handle orphaned Yarn containers in Gobblin-on-Yarn clusters

2020-03-27 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/GOBBLIN-1099?focusedWorklogId=411510=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-411510
 ]

ASF GitHub Bot logged work on GOBBLIN-1099:
---

Author: ASF GitHub Bot
Created on: 27/Mar/20 23:41
Start Date: 27/Mar/20 23:41
Worklog Time Spent: 10m 
  Work Description: autumnust commented on pull request #2940: 
GOBBLIN-1099: Handle orphaned Yarn containers in Gobblin-on-Yarn clus…
URL: https://github.com/apache/incubator-gobblin/pull/2940#discussion_r399575480
 
 

 ##
 File path: 
gobblin-cluster/src/main/java/org/apache/gobblin/cluster/GobblinTaskRunner.java
 ##
 @@ -82,6 +91,7 @@
 import org.apache.gobblin.util.JvmUtils;
 import org.apache.gobblin.util.reflection.GobblinConstructorUtils;
 
+
 
 Review comment:
   Empty line.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 411510)
Time Spent: 50m  (was: 40m)

> Handle orphaned Yarn containers in Gobblin-on-Yarn clusters
> ---
>
>     Key: GOBBLIN-1099
> URL: https://issues.apache.org/jira/browse/GOBBLIN-1099
> Project: Apache Gobblin
>  Issue Type: Improvement
>  Components: gobblin-yarn
>Affects Versions: 0.15.0
>Reporter: Sudarshan Vasudevan
>Assignee: Abhishek Tiwari
>Priority: Major
> Fix For: 0.15.0
>
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> A Yarn application may leave behind orphaned containers, which can happen due 
> to lost node managers. The orphaned containers however can continue to run 
> (potentially forever) as participants in the Helix cluster. This can cause 
> the following problems for a Gobblin-on-Yarn application:
>  # Double publish of data and commit of state
>  # Task failures and partition starvation during application restarts, as 
> Helix may assign tasks to the orphaned containers which have a stale state 
> and configuration
>  # Container failures on application restarts due to Helix instance name 
> collisions with orphaned containers
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [incubator-gobblin] autumnust commented on a change in pull request #2940: GOBBLIN-1099: Handle orphaned Yarn containers in Gobblin-on-Yarn clus…

2020-03-27 Thread GitBox
autumnust commented on a change in pull request #2940: GOBBLIN-1099: Handle 
orphaned Yarn containers in Gobblin-on-Yarn clus…
URL: https://github.com/apache/incubator-gobblin/pull/2940#discussion_r399583366
 
 

 ##
 File path: gobblin-yarn/src/main/java/org/apache/gobblin/yarn/YarnService.java
 ##
 @@ -635,10 +644,26 @@ protected void handleContainerCompletion(ContainerStatus 
containerStatus) {
   containerStatus.getContainerId(), containerStatus.getDiagnostics()));
 }
 
-if 
(this.releasedContainerCache.getIfPresent(containerStatus.getContainerId()) != 
null) {
-  LOGGER.info("Container release requested, so not spawning a replacement 
for containerId {}",
-  containerStatus.getContainerId());
-  return;
+if (containerStatus.getExitStatus() == ContainerExitStatus.ABORTED) {
+  if 
(this.releasedContainerCache.getIfPresent(containerStatus.getContainerId()) != 
null) {
+LOGGER.info("Container release requested, so not spawning a 
replacement for containerId {}", containerStatus.getContainerId());
+return;
+  } else {
+LOGGER.info("Container {} aborted due to lost NM", 
containerStatus.getContainerId());
+   // Container release was not requested. Likely, the container was 
running on a node on which the NM died.
+   // In this case, RM assumes that the containers are "lost", even though 
the container process may still be
+// running on the node. We need to ensure that the Helix instances 
running on the orphaned containers
+// are fenced off from the Helix cluster to avoid double publishing 
and state being committed by the
+// instances.
+if (!UNKNOWN_HELIX_INSTANCE.equals(completedInstanceName)) {
+  String clusterName = this.helixManager.getClusterName();
+  //Disable the orphaned instance.
+  if (HelixUtils.isInstanceLive(helixManager, completedInstanceName)) {
+LOGGER.info("Disabling the Helix instance {}", 
completedInstanceName);
+
this.helixManager.getClusterManagmentTool().enableInstance(clusterName, 
completedInstanceName, false);
 
 Review comment:
   enableInstance, This is a very good API :) 


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[jira] [Work logged] (GOBBLIN-1099) Handle orphaned Yarn containers in Gobblin-on-Yarn clusters

2020-03-27 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/GOBBLIN-1099?focusedWorklogId=411511=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-411511
 ]

ASF GitHub Bot logged work on GOBBLIN-1099:
---

Author: ASF GitHub Bot
Created on: 27/Mar/20 23:41
Start Date: 27/Mar/20 23:41
Worklog Time Spent: 10m 
  Work Description: autumnust commented on pull request #2940: 
GOBBLIN-1099: Handle orphaned Yarn containers in Gobblin-on-Yarn clus…
URL: https://github.com/apache/incubator-gobblin/pull/2940#discussion_r399579077
 
 

 ##
 File path: 
gobblin-cluster/src/main/java/org/apache/gobblin/cluster/GobblinTaskRunner.java
 ##
 @@ -366,20 +377,74 @@ boolean isStopped() {
   }
 
   @VisibleForTesting
-  void connectHelixManager() {
-try {
-  this.jobHelixManager.connect();
-  this.jobHelixManager.getMessagingService()
-  
.registerMessageHandlerFactory(GobblinHelixConstants.SHUTDOWN_MESSAGE_TYPE,
-  new ParticipantShutdownMessageHandlerFactory());
-  this.jobHelixManager.getMessagingService()
-  
.registerMessageHandlerFactory(Message.MessageType.USER_DEFINE_MSG.toString(),
-  getUserDefinedMessageHandlerFactory());
-  if (this.taskDriverHelixManager.isPresent()) {
-this.taskDriverHelixManager.get().connect();
+  void connectHelixManager() throws Exception {
+this.jobHelixManager.connect();
+//Ensure the instance is enabled.
+this.jobHelixManager.getClusterManagmentTool().enableInstance(clusterName, 
helixInstanceName, true);
+this.jobHelixManager.getMessagingService()
+
.registerMessageHandlerFactory(GobblinHelixConstants.SHUTDOWN_MESSAGE_TYPE,
+new ParticipantShutdownMessageHandlerFactory());
+this.jobHelixManager.getMessagingService()
+
.registerMessageHandlerFactory(Message.MessageType.USER_DEFINE_MSG.toString(),
+getUserDefinedMessageHandlerFactory());
+if (this.taskDriverHelixManager.isPresent()) {
+  this.taskDriverHelixManager.get().connect();
+  //Ensure the instance is enabled.
+  
this.taskDriverHelixManager.get().getClusterManagmentTool().enableInstance(this.taskDriverHelixManager.get().getClusterName(),
 helixInstanceName, true);
+}
+  }
+
+  /**
+   * A method to handle failures joining Helix cluster. The method will 
perform the following steps before attempting
+   * to re-join the cluster:
+   * 
+   *   Disconnect from Helix cluster, which would close any open 
clients
+   *   Drop instance from Helix cluster, to remove any partial instance 
structure from Helix
+   *   Re-construct helix manager instances, used to re-join the 
cluster
+   * 
+   */
+  private void onClusterJoinFailure() {
+logger.warn("Disconnecting Helix manager..");
+disconnectHelixManager();
+
+HelixAdmin admin = new 
ZKHelixAdmin(clusterConfig.getString(GobblinClusterConfigurationKeys.ZK_CONNECTION_STRING_KEY));
+//Drop the helix Instance
+logger.warn("Dropping instance: {} from cluster: {}", helixInstanceName, 
clusterName);
+HelixUtils.dropInstanceIfExists(admin, clusterName, helixInstanceName);
+
+if (this.taskDriverHelixManager.isPresent()) {
+  String taskDriverCluster = 
clusterConfig.getString(GobblinClusterConfigurationKeys.TASK_DRIVER_CLUSTER_NAME_KEY);
+  logger.warn("Dropping instance: {} from task driver cluster: {}", 
helixInstanceName, taskDriverCluster);
+  HelixUtils.dropInstanceIfExists(admin, taskDriverCluster, 
helixInstanceName);
+}
+admin.close();
+
+logger.warn("Reinitializing Helix manager..");
+initHelixManager();
+  }
+
+  @VisibleForTesting
+  void connectHelixManagerWithRetry() {
+Callable connectHelixManagerCallable = () -> {
+  try {
+logger.info("Instance: {} attempting to join cluster: {}", 
helixInstanceName, clusterName);
+connectHelixManager();
+  } catch (HelixException e) {
+logger.error("Exception encountered when joining cluster", e);
+onClusterJoinFailure();
+throw e;
   }
-} catch (Exception e) {
-  logger.error("HelixManager failed to connect", e);
+  return null;
+};
+
+Retryer retryer = RetryerBuilder.newBuilder()
+.retryIfException()
+.withStopStrategy(StopStrategies.stopAfterAttempt(5)).build();
+
+try {
+  retryer.call(connectHelixManagerCallable);
+} catch (ExecutionException | RetryException e) {
+  logger.error("Connecting to Helix manager failed", e);
 
 Review comment:
   let's avoid log.error and having an unchecked exception propagated at the 
same time: It doubled stack traces which is not very necessary and confused 
reader on how many times the exception actually happen. 
 

This is an automated mess

[GitHub] [incubator-gobblin] autumnust commented on a change in pull request #2940: GOBBLIN-1099: Handle orphaned Yarn containers in Gobblin-on-Yarn clus…

2020-03-27 Thread GitBox
autumnust commented on a change in pull request #2940: GOBBLIN-1099: Handle 
orphaned Yarn containers in Gobblin-on-Yarn clus…
URL: https://github.com/apache/incubator-gobblin/pull/2940#discussion_r399583642
 
 

 ##
 File path: gobblin-yarn/src/main/java/org/apache/gobblin/yarn/YarnService.java
 ##
 @@ -188,6 +193,7 @@
   // into the queue if the container running the instance completes. Unused 
Helix
   // instance names get picked up when replacement containers get allocated.
   private final ConcurrentLinkedQueue unusedHelixInstanceNames = 
Queues.newConcurrentLinkedQueue();
+  private boolean reuseUnusedHelixInstanceNames = true;
 
 Review comment:
   what's the usage of this?


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [incubator-gobblin] autumnust commented on a change in pull request #2940: GOBBLIN-1099: Handle orphaned Yarn containers in Gobblin-on-Yarn clus…

2020-03-27 Thread GitBox
autumnust commented on a change in pull request #2940: GOBBLIN-1099: Handle 
orphaned Yarn containers in Gobblin-on-Yarn clus…
URL: https://github.com/apache/incubator-gobblin/pull/2940#discussion_r399579077
 
 

 ##
 File path: 
gobblin-cluster/src/main/java/org/apache/gobblin/cluster/GobblinTaskRunner.java
 ##
 @@ -366,20 +377,74 @@ boolean isStopped() {
   }
 
   @VisibleForTesting
-  void connectHelixManager() {
-try {
-  this.jobHelixManager.connect();
-  this.jobHelixManager.getMessagingService()
-  
.registerMessageHandlerFactory(GobblinHelixConstants.SHUTDOWN_MESSAGE_TYPE,
-  new ParticipantShutdownMessageHandlerFactory());
-  this.jobHelixManager.getMessagingService()
-  
.registerMessageHandlerFactory(Message.MessageType.USER_DEFINE_MSG.toString(),
-  getUserDefinedMessageHandlerFactory());
-  if (this.taskDriverHelixManager.isPresent()) {
-this.taskDriverHelixManager.get().connect();
+  void connectHelixManager() throws Exception {
+this.jobHelixManager.connect();
+//Ensure the instance is enabled.
+this.jobHelixManager.getClusterManagmentTool().enableInstance(clusterName, 
helixInstanceName, true);
+this.jobHelixManager.getMessagingService()
+
.registerMessageHandlerFactory(GobblinHelixConstants.SHUTDOWN_MESSAGE_TYPE,
+new ParticipantShutdownMessageHandlerFactory());
+this.jobHelixManager.getMessagingService()
+
.registerMessageHandlerFactory(Message.MessageType.USER_DEFINE_MSG.toString(),
+getUserDefinedMessageHandlerFactory());
+if (this.taskDriverHelixManager.isPresent()) {
+  this.taskDriverHelixManager.get().connect();
+  //Ensure the instance is enabled.
+  
this.taskDriverHelixManager.get().getClusterManagmentTool().enableInstance(this.taskDriverHelixManager.get().getClusterName(),
 helixInstanceName, true);
+}
+  }
+
+  /**
+   * A method to handle failures joining Helix cluster. The method will 
perform the following steps before attempting
+   * to re-join the cluster:
+   * 
+   *   Disconnect from Helix cluster, which would close any open 
clients
+   *   Drop instance from Helix cluster, to remove any partial instance 
structure from Helix
+   *   Re-construct helix manager instances, used to re-join the 
cluster
+   * 
+   */
+  private void onClusterJoinFailure() {
+logger.warn("Disconnecting Helix manager..");
+disconnectHelixManager();
+
+HelixAdmin admin = new 
ZKHelixAdmin(clusterConfig.getString(GobblinClusterConfigurationKeys.ZK_CONNECTION_STRING_KEY));
+//Drop the helix Instance
+logger.warn("Dropping instance: {} from cluster: {}", helixInstanceName, 
clusterName);
+HelixUtils.dropInstanceIfExists(admin, clusterName, helixInstanceName);
+
+if (this.taskDriverHelixManager.isPresent()) {
+  String taskDriverCluster = 
clusterConfig.getString(GobblinClusterConfigurationKeys.TASK_DRIVER_CLUSTER_NAME_KEY);
+  logger.warn("Dropping instance: {} from task driver cluster: {}", 
helixInstanceName, taskDriverCluster);
+  HelixUtils.dropInstanceIfExists(admin, taskDriverCluster, 
helixInstanceName);
+}
+admin.close();
+
+logger.warn("Reinitializing Helix manager..");
+initHelixManager();
+  }
+
+  @VisibleForTesting
+  void connectHelixManagerWithRetry() {
+Callable connectHelixManagerCallable = () -> {
+  try {
+logger.info("Instance: {} attempting to join cluster: {}", 
helixInstanceName, clusterName);
+connectHelixManager();
+  } catch (HelixException e) {
+logger.error("Exception encountered when joining cluster", e);
+onClusterJoinFailure();
+throw e;
   }
-} catch (Exception e) {
-  logger.error("HelixManager failed to connect", e);
+  return null;
+};
+
+Retryer retryer = RetryerBuilder.newBuilder()
+.retryIfException()
+.withStopStrategy(StopStrategies.stopAfterAttempt(5)).build();
+
+try {
+  retryer.call(connectHelixManagerCallable);
+} catch (ExecutionException | RetryException e) {
+  logger.error("Connecting to Helix manager failed", e);
 
 Review comment:
   let's avoid log.error and having an unchecked exception propagated at the 
same time: It doubled stack traces which is not very necessary and confused 
reader on how many times the exception actually happen. 


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [incubator-gobblin] autumnust commented on a change in pull request #2940: GOBBLIN-1099: Handle orphaned Yarn containers in Gobblin-on-Yarn clus…

2020-03-27 Thread GitBox
autumnust commented on a change in pull request #2940: GOBBLIN-1099: Handle 
orphaned Yarn containers in Gobblin-on-Yarn clus…
URL: https://github.com/apache/incubator-gobblin/pull/2940#discussion_r399578539
 
 

 ##
 File path: 
gobblin-cluster/src/main/java/org/apache/gobblin/cluster/GobblinTaskRunner.java
 ##
 @@ -366,20 +377,74 @@ boolean isStopped() {
   }
 
   @VisibleForTesting
-  void connectHelixManager() {
-try {
-  this.jobHelixManager.connect();
-  this.jobHelixManager.getMessagingService()
-  
.registerMessageHandlerFactory(GobblinHelixConstants.SHUTDOWN_MESSAGE_TYPE,
-  new ParticipantShutdownMessageHandlerFactory());
-  this.jobHelixManager.getMessagingService()
-  
.registerMessageHandlerFactory(Message.MessageType.USER_DEFINE_MSG.toString(),
-  getUserDefinedMessageHandlerFactory());
-  if (this.taskDriverHelixManager.isPresent()) {
-this.taskDriverHelixManager.get().connect();
+  void connectHelixManager() throws Exception {
+this.jobHelixManager.connect();
+//Ensure the instance is enabled.
+this.jobHelixManager.getClusterManagmentTool().enableInstance(clusterName, 
helixInstanceName, true);
+this.jobHelixManager.getMessagingService()
+
.registerMessageHandlerFactory(GobblinHelixConstants.SHUTDOWN_MESSAGE_TYPE,
+new ParticipantShutdownMessageHandlerFactory());
+this.jobHelixManager.getMessagingService()
+
.registerMessageHandlerFactory(Message.MessageType.USER_DEFINE_MSG.toString(),
+getUserDefinedMessageHandlerFactory());
+if (this.taskDriverHelixManager.isPresent()) {
+  this.taskDriverHelixManager.get().connect();
+  //Ensure the instance is enabled.
+  
this.taskDriverHelixManager.get().getClusterManagmentTool().enableInstance(this.taskDriverHelixManager.get().getClusterName(),
 helixInstanceName, true);
+}
+  }
+
+  /**
+   * A method to handle failures joining Helix cluster. The method will 
perform the following steps before attempting
+   * to re-join the cluster:
+   * 
+   *   Disconnect from Helix cluster, which would close any open 
clients
+   *   Drop instance from Helix cluster, to remove any partial instance 
structure from Helix
+   *   Re-construct helix manager instances, used to re-join the 
cluster
+   * 
+   */
+  private void onClusterJoinFailure() {
+logger.warn("Disconnecting Helix manager..");
+disconnectHelixManager();
+
+HelixAdmin admin = new 
ZKHelixAdmin(clusterConfig.getString(GobblinClusterConfigurationKeys.ZK_CONNECTION_STRING_KEY));
 
 Review comment:
   I couldn't fully get the purpose of disconnect from existing helixManager, 
creating a HelixAdmin and use the newly-created admin to conduct 
instance-dropping. Isn't `helixManager.getClusterManagmentTool()` essentially a 
same HelixAdmin that we can use to drop instance ? What's the gain of 
disconnecting helixManager? 


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [incubator-gobblin] autumnust commented on a change in pull request #2940: GOBBLIN-1099: Handle orphaned Yarn containers in Gobblin-on-Yarn clus…

2020-03-27 Thread GitBox
autumnust commented on a change in pull request #2940: GOBBLIN-1099: Handle 
orphaned Yarn containers in Gobblin-on-Yarn clus…
URL: https://github.com/apache/incubator-gobblin/pull/2940#discussion_r399584405
 
 

 ##
 File path: 
gobblin-yarn/src/main/java/org/apache/gobblin/yarn/GobblinYarnAppLauncher.java
 ##
 @@ -479,6 +488,26 @@ void connectHelixManager() {
 }
   }
 
+  /**
+   * A method to disable pre-existing live instances in a Helix cluster. This 
can happen when a previous Yarn application
+   * leaves behind orphaned Yarn worker processes. Since Helix does not 
provide an API to drop a live instance, we use
+   * the disable instance API to fence off these orphaned instances and 
prevent them from becoming participants in the
+   * new cluster.
+   *
+   * NOTE: this is a workaround for an existing YARN bug. Once YARN has a fix 
to guarantee container kills on application
+   * completion, this method should be removed.
+   */
+  void disableLiveHelixInstances() {
+String clusterName = this.helixManager.getClusterName();
+HelixAdmin helixAdmin = this.helixManager.getClusterManagmentTool();
+List liveInstances = 
HelixUtils.getLiveInstances(this.helixManager);
+LOGGER.warn("Found {} live instances in the cluster.", 
liveInstances.size());
+for (String instanceName: liveInstances) {
+  LOGGER.warn("Disabling instance: {}", instanceName);
+  helixAdmin.enableInstance(clusterName, instanceName, false);
 
 Review comment:
   Are there any mechanism from Helix side to sort of retry-joining cluster 
after being disabled? Just want to make sure


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [incubator-gobblin] autumnust commented on a change in pull request #2940: GOBBLIN-1099: Handle orphaned Yarn containers in Gobblin-on-Yarn clus…

2020-03-27 Thread GitBox
autumnust commented on a change in pull request #2940: GOBBLIN-1099: Handle 
orphaned Yarn containers in Gobblin-on-Yarn clus…
URL: https://github.com/apache/incubator-gobblin/pull/2940#discussion_r399580058
 
 

 ##
 File path: 
gobblin-cluster/src/main/java/org/apache/gobblin/cluster/GobblinTaskRunner.java
 ##
 @@ -366,20 +377,74 @@ boolean isStopped() {
   }
 
   @VisibleForTesting
-  void connectHelixManager() {
-try {
-  this.jobHelixManager.connect();
-  this.jobHelixManager.getMessagingService()
-  
.registerMessageHandlerFactory(GobblinHelixConstants.SHUTDOWN_MESSAGE_TYPE,
-  new ParticipantShutdownMessageHandlerFactory());
-  this.jobHelixManager.getMessagingService()
-  
.registerMessageHandlerFactory(Message.MessageType.USER_DEFINE_MSG.toString(),
-  getUserDefinedMessageHandlerFactory());
-  if (this.taskDriverHelixManager.isPresent()) {
-this.taskDriverHelixManager.get().connect();
+  void connectHelixManager() throws Exception {
+this.jobHelixManager.connect();
+//Ensure the instance is enabled.
+this.jobHelixManager.getClusterManagmentTool().enableInstance(clusterName, 
helixInstanceName, true);
+this.jobHelixManager.getMessagingService()
+
.registerMessageHandlerFactory(GobblinHelixConstants.SHUTDOWN_MESSAGE_TYPE,
+new ParticipantShutdownMessageHandlerFactory());
+this.jobHelixManager.getMessagingService()
+
.registerMessageHandlerFactory(Message.MessageType.USER_DEFINE_MSG.toString(),
+getUserDefinedMessageHandlerFactory());
+if (this.taskDriverHelixManager.isPresent()) {
+  this.taskDriverHelixManager.get().connect();
+  //Ensure the instance is enabled.
+  
this.taskDriverHelixManager.get().getClusterManagmentTool().enableInstance(this.taskDriverHelixManager.get().getClusterName(),
 helixInstanceName, true);
+}
+  }
+
+  /**
+   * A method to handle failures joining Helix cluster. The method will 
perform the following steps before attempting
+   * to re-join the cluster:
+   * 
+   *   Disconnect from Helix cluster, which would close any open 
clients
+   *   Drop instance from Helix cluster, to remove any partial instance 
structure from Helix
+   *   Re-construct helix manager instances, used to re-join the 
cluster
+   * 
+   */
+  private void onClusterJoinFailure() {
+logger.warn("Disconnecting Helix manager..");
+disconnectHelixManager();
+
+HelixAdmin admin = new 
ZKHelixAdmin(clusterConfig.getString(GobblinClusterConfigurationKeys.ZK_CONNECTION_STRING_KEY));
+//Drop the helix Instance
+logger.warn("Dropping instance: {} from cluster: {}", helixInstanceName, 
clusterName);
+HelixUtils.dropInstanceIfExists(admin, clusterName, helixInstanceName);
 
 Review comment:
   The return value of this method is ignored, is that intentional? Or shall we 
leave the handling of exception to caller of this static method ? 


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[jira] [Work logged] (GOBBLIN-1099) Handle orphaned Yarn containers in Gobblin-on-Yarn clusters

2020-03-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/GOBBLIN-1099?focusedWorklogId=410741=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-410741
 ]

ASF GitHub Bot logged work on GOBBLIN-1099:
---

Author: ASF GitHub Bot
Created on: 27/Mar/20 00:35
Start Date: 27/Mar/20 00:35
Worklog Time Spent: 10m 
  Work Description: codecov-io commented on issue #2940: GOBBLIN-1099: 
Handle orphaned Yarn containers in Gobblin-on-Yarn clus…
URL: 
https://github.com/apache/incubator-gobblin/pull/2940#issuecomment-604751963
 
 
   # 
[Codecov](https://codecov.io/gh/apache/incubator-gobblin/pull/2940?src=pr=h1)
 Report
   > Merging 
[#2940](https://codecov.io/gh/apache/incubator-gobblin/pull/2940?src=pr=desc)
 into 
[master](https://codecov.io/gh/apache/incubator-gobblin/commit/a63461257c3fcea8f4ff67087f8cb29be25d6baf=desc)
 will **decrease** coverage by `0.00%`.
   > The diff coverage is `41.57%`.
   
   [![Impacted file tree 
graph](https://codecov.io/gh/apache/incubator-gobblin/pull/2940/graphs/tree.svg?width=650=150=pr=4MgURJ0bGc)](https://codecov.io/gh/apache/incubator-gobblin/pull/2940?src=pr=tree)
   
   ```diff
   @@ Coverage Diff  @@
   ## master#2940  +/-   ##
   
   - Coverage 44.60%   44.60%   -0.01% 
   - Complexity 8980 8981   +1 
   
 Files  1936 1936  
 Lines 7323473296  +62 
 Branches   8083 8089   +6 
   
   + Hits  3266932695  +26 
   - Misses3751537550  +35 
   - Partials   3050 3051   +1 
   ```
   
   
   | [Impacted 
Files](https://codecov.io/gh/apache/incubator-gobblin/pull/2940?src=pr=tree) 
| Coverage Δ | Complexity Δ | |
   |---|---|---|---|
   | 
[.../apache/gobblin/yarn/GobblinApplicationMaster.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2940/diff?src=pr=tree#diff-Z29iYmxpbi15YXJuL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9nb2JibGluL3lhcm4vR29iYmxpbkFwcGxpY2F0aW9uTWFzdGVyLmphdmE=)
 | `17.80% <0.00%> (-0.25%)` | `3.00 <0.00> (ø)` | |
   | 
[...rg/apache/gobblin/yarn/GobblinYarnAppLauncher.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2940/diff?src=pr=tree#diff-Z29iYmxpbi15YXJuL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9nb2JibGluL3lhcm4vR29iYmxpbllhcm5BcHBMYXVuY2hlci5qYXZh)
 | `20.66% <0.00%> (-0.51%)` | `8.00 <0.00> (ø)` | |
   | 
[...in/java/org/apache/gobblin/cluster/HelixUtils.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2940/diff?src=pr=tree#diff-Z29iYmxpbi1jbHVzdGVyL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9nb2JibGluL2NsdXN0ZXIvSGVsaXhVdGlscy5qYXZh)
 | `34.12% <18.18%> (-4.14%)` | `14.00 <1.00> (ø)` | |
   | 
[...main/java/org/apache/gobblin/yarn/YarnService.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2940/diff?src=pr=tree#diff-Z29iYmxpbi15YXJuL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9nb2JibGluL3lhcm4vWWFyblNlcnZpY2UuamF2YQ==)
 | `15.71% <22.22%> (+0.44%)` | `4.00 <0.00> (ø)` | |
   | 
[.../org/apache/gobblin/cluster/GobblinTaskRunner.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2940/diff?src=pr=tree#diff-Z29iYmxpbi1jbHVzdGVyL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9nb2JibGluL2NsdXN0ZXIvR29iYmxpblRhc2tSdW5uZXIuamF2YQ==)
 | `66.01% <75.00%> (+1.85%)` | `29.00 <4.00> (+2.00)` | |
   | 
[.../apache/gobblin/cluster/GobblinClusterManager.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2940/diff?src=pr=tree#diff-Z29iYmxpbi1jbHVzdGVyL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9nb2JibGluL2NsdXN0ZXIvR29iYmxpbkNsdXN0ZXJNYW5hZ2VyLmphdmE=)
 | `53.91% <100.00%> (ø)` | `26.00 <1.00> (ø)` | |
   | 
[...ava/org/apache/gobblin/fsm/FiniteStateMachine.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2940/diff?src=pr=tree#diff-Z29iYmxpbi11dGlsaXR5L3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9nb2JibGluL2ZzbS9GaW5pdGVTdGF0ZU1hY2hpbmUuamF2YQ==)
 | `73.48% <0.00%> (-3.04%)` | `18.00% <0.00%> (-3.00%)` | |
   | 
[...lin/restli/throttling/ZookeeperLeaderElection.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2940/diff?src=pr=tree#diff-Z29iYmxpbi1yZXN0bGkvZ29iYmxpbi10aHJvdHRsaW5nLXNlcnZpY2UvZ29iYmxpbi10aHJvdHRsaW5nLXNlcnZpY2Utc2VydmVyL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9nb2JibGluL3Jlc3RsaS90aHJvdHRsaW5nL1pvb2tlZXBlckxlYWRlckVsZWN0aW9uLmphdmE=)
 | `70.00% <0.00%> (-2.23%)` | `13.00% <0.00%> (ø%)` | |
   | ... and [3 
more](https://codecov.io/gh/apache/incubator-gobblin/pull/2940/diff?src=pr=tree-more)
 | |
   
   --
   
   [Continue to review full report at 
Codecov](https://codecov.io/gh/apache/incubator-gobblin/pull/2940?src=pr=continue).
   > **Legend** - [Click here to learn 
more](https://docs.codecov.io/docs/codecov-delta)
   > `Δ = absolut

[GitHub] [incubator-gobblin] codecov-io edited a comment on issue #2940: GOBBLIN-1099: Handle orphaned Yarn containers in Gobblin-on-Yarn clus…

2020-03-26 Thread GitBox
codecov-io edited a comment on issue #2940: GOBBLIN-1099: Handle orphaned Yarn 
containers in Gobblin-on-Yarn clus…
URL: 
https://github.com/apache/incubator-gobblin/pull/2940#issuecomment-604751963
 
 
   # 
[Codecov](https://codecov.io/gh/apache/incubator-gobblin/pull/2940?src=pr=h1)
 Report
   > Merging 
[#2940](https://codecov.io/gh/apache/incubator-gobblin/pull/2940?src=pr=desc)
 into 
[master](https://codecov.io/gh/apache/incubator-gobblin/commit/a63461257c3fcea8f4ff67087f8cb29be25d6baf=desc)
 will **decrease** coverage by `0.00%`.
   > The diff coverage is `41.57%`.
   
   [![Impacted file tree 
graph](https://codecov.io/gh/apache/incubator-gobblin/pull/2940/graphs/tree.svg?width=650=150=pr=4MgURJ0bGc)](https://codecov.io/gh/apache/incubator-gobblin/pull/2940?src=pr=tree)
   
   ```diff
   @@ Coverage Diff  @@
   ## master#2940  +/-   ##
   
   - Coverage 44.60%   44.60%   -0.01% 
   - Complexity 8980 8981   +1 
   
 Files  1936 1936  
 Lines 7323473296  +62 
 Branches   8083 8089   +6 
   
   + Hits  3266932695  +26 
   - Misses3751537550  +35 
   - Partials   3050 3051   +1 
   ```
   
   
   | [Impacted 
Files](https://codecov.io/gh/apache/incubator-gobblin/pull/2940?src=pr=tree) 
| Coverage Δ | Complexity Δ | |
   |---|---|---|---|
   | 
[.../apache/gobblin/yarn/GobblinApplicationMaster.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2940/diff?src=pr=tree#diff-Z29iYmxpbi15YXJuL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9nb2JibGluL3lhcm4vR29iYmxpbkFwcGxpY2F0aW9uTWFzdGVyLmphdmE=)
 | `17.80% <0.00%> (-0.25%)` | `3.00 <0.00> (ø)` | |
   | 
[...rg/apache/gobblin/yarn/GobblinYarnAppLauncher.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2940/diff?src=pr=tree#diff-Z29iYmxpbi15YXJuL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9nb2JibGluL3lhcm4vR29iYmxpbllhcm5BcHBMYXVuY2hlci5qYXZh)
 | `20.66% <0.00%> (-0.51%)` | `8.00 <0.00> (ø)` | |
   | 
[...in/java/org/apache/gobblin/cluster/HelixUtils.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2940/diff?src=pr=tree#diff-Z29iYmxpbi1jbHVzdGVyL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9nb2JibGluL2NsdXN0ZXIvSGVsaXhVdGlscy5qYXZh)
 | `34.12% <18.18%> (-4.14%)` | `14.00 <1.00> (ø)` | |
   | 
[...main/java/org/apache/gobblin/yarn/YarnService.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2940/diff?src=pr=tree#diff-Z29iYmxpbi15YXJuL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9nb2JibGluL3lhcm4vWWFyblNlcnZpY2UuamF2YQ==)
 | `15.71% <22.22%> (+0.44%)` | `4.00 <0.00> (ø)` | |
   | 
[.../org/apache/gobblin/cluster/GobblinTaskRunner.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2940/diff?src=pr=tree#diff-Z29iYmxpbi1jbHVzdGVyL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9nb2JibGluL2NsdXN0ZXIvR29iYmxpblRhc2tSdW5uZXIuamF2YQ==)
 | `66.01% <75.00%> (+1.85%)` | `29.00 <4.00> (+2.00)` | |
   | 
[.../apache/gobblin/cluster/GobblinClusterManager.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2940/diff?src=pr=tree#diff-Z29iYmxpbi1jbHVzdGVyL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9nb2JibGluL2NsdXN0ZXIvR29iYmxpbkNsdXN0ZXJNYW5hZ2VyLmphdmE=)
 | `53.91% <100.00%> (ø)` | `26.00 <1.00> (ø)` | |
   | 
[...ava/org/apache/gobblin/fsm/FiniteStateMachine.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2940/diff?src=pr=tree#diff-Z29iYmxpbi11dGlsaXR5L3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9nb2JibGluL2ZzbS9GaW5pdGVTdGF0ZU1hY2hpbmUuamF2YQ==)
 | `73.48% <0.00%> (-3.04%)` | `18.00% <0.00%> (-3.00%)` | |
   | 
[...lin/restli/throttling/ZookeeperLeaderElection.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2940/diff?src=pr=tree#diff-Z29iYmxpbi1yZXN0bGkvZ29iYmxpbi10aHJvdHRsaW5nLXNlcnZpY2UvZ29iYmxpbi10aHJvdHRsaW5nLXNlcnZpY2Utc2VydmVyL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9nb2JibGluL3Jlc3RsaS90aHJvdHRsaW5nL1pvb2tlZXBlckxlYWRlckVsZWN0aW9uLmphdmE=)
 | `70.00% <0.00%> (-2.23%)` | `13.00% <0.00%> (ø%)` | |
   | ... and [3 
more](https://codecov.io/gh/apache/incubator-gobblin/pull/2940/diff?src=pr=tree-more)
 | |
   
   --
   
   [Continue to review full report at 
Codecov](https://codecov.io/gh/apache/incubator-gobblin/pull/2940?src=pr=continue).
   > **Legend** - [Click here to learn 
more](https://docs.codecov.io/docs/codecov-delta)
   > `Δ = absolute  (impact)`, `ø = not affected`, `? = missing data`
   > Powered by 
[Codecov](https://codecov.io/gh/apache/incubator-gobblin/pull/2940?src=pr=footer).
 Last update 
[a634612...2915df6](https://codecov.io/gh/apache/incubator-gobblin/pull/2940?src=pr=lastupdated).
 Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).
   


This 

[jira] [Work logged] (GOBBLIN-1099) Handle orphaned Yarn containers in Gobblin-on-Yarn clusters

2020-03-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/GOBBLIN-1099?focusedWorklogId=410737=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-410737
 ]

ASF GitHub Bot logged work on GOBBLIN-1099:
---

Author: ASF GitHub Bot
Created on: 27/Mar/20 00:19
Start Date: 27/Mar/20 00:19
Worklog Time Spent: 10m 
  Work Description: codecov-io commented on issue #2940: GOBBLIN-1099: 
Handle orphaned Yarn containers in Gobblin-on-Yarn clus…
URL: 
https://github.com/apache/incubator-gobblin/pull/2940#issuecomment-604751963
 
 
   # 
[Codecov](https://codecov.io/gh/apache/incubator-gobblin/pull/2940?src=pr=h1)
 Report
   > Merging 
[#2940](https://codecov.io/gh/apache/incubator-gobblin/pull/2940?src=pr=desc)
 into 
[master](https://codecov.io/gh/apache/incubator-gobblin/commit/a63461257c3fcea8f4ff67087f8cb29be25d6baf=desc)
 will **decrease** coverage by `40.49%`.
   > The diff coverage is `0.00%`.
   
   [![Impacted file tree 
graph](https://codecov.io/gh/apache/incubator-gobblin/pull/2940/graphs/tree.svg?width=650=150=pr=4MgURJ0bGc)](https://codecov.io/gh/apache/incubator-gobblin/pull/2940?src=pr=tree)
   
   ```diff
   @@ Coverage Diff  @@
   ## master   #2940   +/-   ##
   
   - Coverage 44.60%   4.11%   -40.50% 
   + Complexity 8980 750 -8230 
   
 Files  19361936   
 Lines 73234   73296   +62 
 Branches   80838089+6 
   
   - Hits  326693016-29653 
   - Misses37515   69960+32445 
   + Partials   3050 320 -2730 
   ```
   
   
   | [Impacted 
Files](https://codecov.io/gh/apache/incubator-gobblin/pull/2940?src=pr=tree) 
| Coverage Δ | Complexity Δ | |
   |---|---|---|---|
   | 
[.../apache/gobblin/cluster/GobblinClusterManager.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2940/diff?src=pr=tree#diff-Z29iYmxpbi1jbHVzdGVyL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9nb2JibGluL2NsdXN0ZXIvR29iYmxpbkNsdXN0ZXJNYW5hZ2VyLmphdmE=)
 | `0.00% <0.00%> (-53.92%)` | `0.00 <0.00> (-26.00)` | |
   | 
[.../org/apache/gobblin/cluster/GobblinTaskRunner.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2940/diff?src=pr=tree#diff-Z29iYmxpbi1jbHVzdGVyL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9nb2JibGluL2NsdXN0ZXIvR29iYmxpblRhc2tSdW5uZXIuamF2YQ==)
 | `0.00% <0.00%> (-64.16%)` | `0.00 <0.00> (-27.00)` | |
   | 
[...in/java/org/apache/gobblin/cluster/HelixUtils.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2940/diff?src=pr=tree#diff-Z29iYmxpbi1jbHVzdGVyL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9nb2JibGluL2NsdXN0ZXIvSGVsaXhVdGlscy5qYXZh)
 | `0.00% <0.00%> (-38.27%)` | `0.00 <0.00> (-14.00)` | |
   | 
[...n/compaction/action/CompactionWatermarkAction.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2940/diff?src=pr=tree#diff-Z29iYmxpbi1jb21wYWN0aW9uL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9nb2JibGluL2NvbXBhY3Rpb24vYWN0aW9uL0NvbXBhY3Rpb25XYXRlcm1hcmtBY3Rpb24uamF2YQ==)
 | `0.00% <0.00%> (-77.05%)` | `0.00 <0.00> (-11.00)` | |
   | 
[...che/gobblin/salesforce/ResultChainingIterator.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2940/diff?src=pr=tree#diff-Z29iYmxpbi1zYWxlc2ZvcmNlL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9nb2JibGluL3NhbGVzZm9yY2UvUmVzdWx0Q2hhaW5pbmdJdGVyYXRvci5qYXZh)
 | `0.00% <0.00%> (ø)` | `0.00 <0.00> (ø)` | |
   | 
[.../apache/gobblin/yarn/GobblinApplicationMaster.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2940/diff?src=pr=tree#diff-Z29iYmxpbi15YXJuL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9nb2JibGluL3lhcm4vR29iYmxpbkFwcGxpY2F0aW9uTWFzdGVyLmphdmE=)
 | `0.00% <0.00%> (-18.06%)` | `0.00 <0.00> (-3.00)` | |
   | 
[...rg/apache/gobblin/yarn/GobblinYarnAppLauncher.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2940/diff?src=pr=tree#diff-Z29iYmxpbi15YXJuL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9nb2JibGluL3lhcm4vR29iYmxpbllhcm5BcHBMYXVuY2hlci5qYXZh)
 | `0.00% <0.00%> (-21.17%)` | `0.00 <0.00> (-8.00)` | |
   | 
[...main/java/org/apache/gobblin/yarn/YarnService.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2940/diff?src=pr=tree#diff-Z29iYmxpbi15YXJuL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9nb2JibGluL3lhcm4vWWFyblNlcnZpY2UuamF2YQ==)
 | `0.00% <0.00%> (-15.28%)` | `0.00 <0.00> (-4.00)` | |
   | 
[...c/main/java/org/apache/gobblin/util/FileUtils.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2940/diff?src=pr=tree#diff-Z29iYmxpbi11dGlsaXR5L3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9nb2JibGluL3V0aWwvRmlsZVV0aWxzLmphdmE=)
 | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-4.00%)` | |
   | 
[...n/java/org/apach

[GitHub] [incubator-gobblin] codecov-io commented on issue #2940: GOBBLIN-1099: Handle orphaned Yarn containers in Gobblin-on-Yarn clus…

2020-03-26 Thread GitBox
codecov-io commented on issue #2940: GOBBLIN-1099: Handle orphaned Yarn 
containers in Gobblin-on-Yarn clus…
URL: 
https://github.com/apache/incubator-gobblin/pull/2940#issuecomment-604751963
 
 
   # 
[Codecov](https://codecov.io/gh/apache/incubator-gobblin/pull/2940?src=pr=h1)
 Report
   > Merging 
[#2940](https://codecov.io/gh/apache/incubator-gobblin/pull/2940?src=pr=desc)
 into 
[master](https://codecov.io/gh/apache/incubator-gobblin/commit/a63461257c3fcea8f4ff67087f8cb29be25d6baf=desc)
 will **decrease** coverage by `40.49%`.
   > The diff coverage is `0.00%`.
   
   [![Impacted file tree 
graph](https://codecov.io/gh/apache/incubator-gobblin/pull/2940/graphs/tree.svg?width=650=150=pr=4MgURJ0bGc)](https://codecov.io/gh/apache/incubator-gobblin/pull/2940?src=pr=tree)
   
   ```diff
   @@ Coverage Diff  @@
   ## master   #2940   +/-   ##
   
   - Coverage 44.60%   4.11%   -40.50% 
   + Complexity 8980 750 -8230 
   
 Files  19361936   
 Lines 73234   73296   +62 
 Branches   80838089+6 
   
   - Hits  326693016-29653 
   - Misses37515   69960+32445 
   + Partials   3050 320 -2730 
   ```
   
   
   | [Impacted 
Files](https://codecov.io/gh/apache/incubator-gobblin/pull/2940?src=pr=tree) 
| Coverage Δ | Complexity Δ | |
   |---|---|---|---|
   | 
[.../apache/gobblin/cluster/GobblinClusterManager.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2940/diff?src=pr=tree#diff-Z29iYmxpbi1jbHVzdGVyL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9nb2JibGluL2NsdXN0ZXIvR29iYmxpbkNsdXN0ZXJNYW5hZ2VyLmphdmE=)
 | `0.00% <0.00%> (-53.92%)` | `0.00 <0.00> (-26.00)` | |
   | 
[.../org/apache/gobblin/cluster/GobblinTaskRunner.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2940/diff?src=pr=tree#diff-Z29iYmxpbi1jbHVzdGVyL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9nb2JibGluL2NsdXN0ZXIvR29iYmxpblRhc2tSdW5uZXIuamF2YQ==)
 | `0.00% <0.00%> (-64.16%)` | `0.00 <0.00> (-27.00)` | |
   | 
[...in/java/org/apache/gobblin/cluster/HelixUtils.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2940/diff?src=pr=tree#diff-Z29iYmxpbi1jbHVzdGVyL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9nb2JibGluL2NsdXN0ZXIvSGVsaXhVdGlscy5qYXZh)
 | `0.00% <0.00%> (-38.27%)` | `0.00 <0.00> (-14.00)` | |
   | 
[...n/compaction/action/CompactionWatermarkAction.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2940/diff?src=pr=tree#diff-Z29iYmxpbi1jb21wYWN0aW9uL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9nb2JibGluL2NvbXBhY3Rpb24vYWN0aW9uL0NvbXBhY3Rpb25XYXRlcm1hcmtBY3Rpb24uamF2YQ==)
 | `0.00% <0.00%> (-77.05%)` | `0.00 <0.00> (-11.00)` | |
   | 
[...che/gobblin/salesforce/ResultChainingIterator.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2940/diff?src=pr=tree#diff-Z29iYmxpbi1zYWxlc2ZvcmNlL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9nb2JibGluL3NhbGVzZm9yY2UvUmVzdWx0Q2hhaW5pbmdJdGVyYXRvci5qYXZh)
 | `0.00% <0.00%> (ø)` | `0.00 <0.00> (ø)` | |
   | 
[.../apache/gobblin/yarn/GobblinApplicationMaster.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2940/diff?src=pr=tree#diff-Z29iYmxpbi15YXJuL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9nb2JibGluL3lhcm4vR29iYmxpbkFwcGxpY2F0aW9uTWFzdGVyLmphdmE=)
 | `0.00% <0.00%> (-18.06%)` | `0.00 <0.00> (-3.00)` | |
   | 
[...rg/apache/gobblin/yarn/GobblinYarnAppLauncher.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2940/diff?src=pr=tree#diff-Z29iYmxpbi15YXJuL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9nb2JibGluL3lhcm4vR29iYmxpbllhcm5BcHBMYXVuY2hlci5qYXZh)
 | `0.00% <0.00%> (-21.17%)` | `0.00 <0.00> (-8.00)` | |
   | 
[...main/java/org/apache/gobblin/yarn/YarnService.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2940/diff?src=pr=tree#diff-Z29iYmxpbi15YXJuL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9nb2JibGluL3lhcm4vWWFyblNlcnZpY2UuamF2YQ==)
 | `0.00% <0.00%> (-15.28%)` | `0.00 <0.00> (-4.00)` | |
   | 
[...c/main/java/org/apache/gobblin/util/FileUtils.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2940/diff?src=pr=tree#diff-Z29iYmxpbi11dGlsaXR5L3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9nb2JibGluL3V0aWwvRmlsZVV0aWxzLmphdmE=)
 | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-4.00%)` | |
   | 
[...n/java/org/apache/gobblin/fork/CopyableSchema.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2940/diff?src=pr=tree#diff-Z29iYmxpbi1jb3JlL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9nb2JibGluL2ZvcmsvQ29weWFibGVTY2hlbWEuamF2YQ==)
 | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-2.00%)` | |
   | ... and [1150 
more](https://codecov.io/gh/apache/incubator-gobblin/pull/2940/diff?src=pr=tree-more)
 | |
   
   --
   
   [Continue to review full report at 
Codecov](https://codecov.io/gh/apache/incubato

[GitHub] [incubator-gobblin] sv2000 commented on issue #2940: GOBBLIN-1099: Handle orphaned Yarn containers in Gobblin-on-Yarn clus…

2020-03-26 Thread GitBox
sv2000 commented on issue #2940: GOBBLIN-1099: Handle orphaned Yarn containers 
in Gobblin-on-Yarn clus…
URL: 
https://github.com/apache/incubator-gobblin/pull/2940#issuecomment-604735451
 
 
   @autumnust @htran1 please review.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[jira] [Work logged] (GOBBLIN-1099) Handle orphaned Yarn containers in Gobblin-on-Yarn clusters

2020-03-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/GOBBLIN-1099?focusedWorklogId=410697=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-410697
 ]

ASF GitHub Bot logged work on GOBBLIN-1099:
---

Author: ASF GitHub Bot
Created on: 26/Mar/20 23:18
Start Date: 26/Mar/20 23:18
Worklog Time Spent: 10m 
  Work Description: sv2000 commented on issue #2940: GOBBLIN-1099: Handle 
orphaned Yarn containers in Gobblin-on-Yarn clus…
URL: 
https://github.com/apache/incubator-gobblin/pull/2940#issuecomment-604735451
 
 
   @autumnust @htran1 please review.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 410697)
Time Spent: 20m  (was: 10m)

> Handle orphaned Yarn containers in Gobblin-on-Yarn clusters
> ---
>
>     Key: GOBBLIN-1099
> URL: https://issues.apache.org/jira/browse/GOBBLIN-1099
> Project: Apache Gobblin
>  Issue Type: Improvement
>  Components: gobblin-yarn
>Affects Versions: 0.15.0
>Reporter: Sudarshan Vasudevan
>Assignee: Abhishek Tiwari
>Priority: Major
> Fix For: 0.15.0
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> A Yarn application may leave behind orphaned containers, which can happen due 
> to lost node managers. The orphaned containers however can continue to run 
> (potentially forever) as participants in the Helix cluster. This can cause 
> the following problems for a Gobblin-on-Yarn application:
>  # Double publish of data and commit of state
>  # Task failures and partition starvation during application restarts, as 
> Helix may assign tasks to the orphaned containers which have a stale state 
> and configuration
>  # Container failures on application restarts due to Helix instance name 
> collisions with orphaned containers
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (GOBBLIN-1099) Handle orphaned Yarn containers in Gobblin-on-Yarn clusters

2020-03-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/GOBBLIN-1099?focusedWorklogId=410696=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-410696
 ]

ASF GitHub Bot logged work on GOBBLIN-1099:
---

Author: ASF GitHub Bot
Created on: 26/Mar/20 23:18
Start Date: 26/Mar/20 23:18
Worklog Time Spent: 10m 
  Work Description: sv2000 commented on pull request #2940: GOBBLIN-1099: 
Handle orphaned Yarn containers in Gobblin-on-Yarn clus…
URL: https://github.com/apache/incubator-gobblin/pull/2940
 
 
   …ters
   
   Dear Gobblin maintainers,
   
   Please accept this PR. I understand that it will not be reviewed until I 
have checked off all the steps below!
   
   
   ### JIRA
   - [x] My PR addresses the following [Gobblin 
JIRA](https://issues.apache.org/jira/browse/GOBBLIN/) issues and references 
them in the PR title. For example, "[GOBBLIN-XXX] My Gobblin PR"
   - https://issues.apache.org/jira/browse/GOBBLIN-1099
   
   
   ### Description
   - [x] Here are some details about my PR, including screenshots (if 
applicable):
   A Yarn application may leave behind orphaned containers, which can happen 
due to lost node managers. The orphaned containers however can continue to run 
(potentially forever) as participants in the Helix cluster. This can cause the 
following problems for a Gobblin-on-Yarn application:
   
   Double publish of data and commit of state
   Task failures and partition starvation during application restarts, as Helix 
may assign tasks to the orphaned containers which have a stale state and 
configuration
   Container failures on application restarts due to Helix instance name 
collisions with orphaned containers
   
   This PR incorporates the following changes to handle orphaned containers:
   1. Disables live instances during Yarn application start up and shutdown in 
GobblinYarnAppLauncher. This ensures that any orphaned Helix instances are 
disabled from joining the cluster on restart and any tasks running on these 
instances are cancelled.
   2. The GobblinApplicationMaster (inside YarnService) ensures that Helix 
instance name assigned to a newly allocated container is not a currently live 
instance to avoid instance name collisions. We also handle NM failures/restarts 
that result in Yarn RM "aborting" the container (i.e. container is deemed dead 
from Yarn's point of view, even though the container may physically be alive). 
In this case, we disable the instance to ensure it is fenced off from the Helix 
cluster.
   3. Gobblin workers (i.e. GobblinYarnTaskRunner) retry in case of failure in 
joining a Helix cluster. The retry logic disconnects from the Helix cluster and 
drops the instance before reattempting to join the cluster. 
   
   ### Tests
   - [x] My PR adds the following unit tests __OR__ does not need testing for 
this extremely good reason:
   Added unit test in GobblinTaskRunnerTest
   
   ### Commits
   - [x] My commits all reference JIRA issues in their subject lines, and I 
have squashed multiple commits if they address the same issue. In addition, my 
commits follow the guidelines from "[How to write a good git commit 
message](http://chris.beams.io/posts/git-commit/)":
   1. Subject is separated from body by a blank line
   2. Subject is limited to 50 characters
   3. Subject does not end with a period
   4. Subject uses the imperative mood ("add", not "adding")
   5. Body wraps at 72 characters
   6. Body explains "what" and "why", not "how"
   
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 410696)
Remaining Estimate: 0h
    Time Spent: 10m

> Handle orphaned Yarn containers in Gobblin-on-Yarn clusters
> ---
>
> Key: GOBBLIN-1099
> URL: https://issues.apache.org/jira/browse/GOBBLIN-1099
> Project: Apache Gobblin
>  Issue Type: Improvement
>  Components: gobblin-yarn
>Affects Versions: 0.15.0
>Reporter: Sudarshan Vasudevan
>Assignee: Abhishek Tiwari
>Priority: Major
> Fix For: 0.15.0
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> A Yarn application may leave behind orphaned containers, which can happen due 
> to lost node managers. The orphaned containers however can continue to run 
> (potentially forever) as participants in the Helix cluster. Th

[GitHub] [incubator-gobblin] sv2000 opened a new pull request #2940: GOBBLIN-1099: Handle orphaned Yarn containers in Gobblin-on-Yarn clus…

2020-03-26 Thread GitBox
sv2000 opened a new pull request #2940: GOBBLIN-1099: Handle orphaned Yarn 
containers in Gobblin-on-Yarn clus…
URL: https://github.com/apache/incubator-gobblin/pull/2940
 
 
   …ters
   
   Dear Gobblin maintainers,
   
   Please accept this PR. I understand that it will not be reviewed until I 
have checked off all the steps below!
   
   
   ### JIRA
   - [x] My PR addresses the following [Gobblin 
JIRA](https://issues.apache.org/jira/browse/GOBBLIN/) issues and references 
them in the PR title. For example, "[GOBBLIN-XXX] My Gobblin PR"
   - https://issues.apache.org/jira/browse/GOBBLIN-1099
   
   
   ### Description
   - [x] Here are some details about my PR, including screenshots (if 
applicable):
   A Yarn application may leave behind orphaned containers, which can happen 
due to lost node managers. The orphaned containers however can continue to run 
(potentially forever) as participants in the Helix cluster. This can cause the 
following problems for a Gobblin-on-Yarn application:
   
   Double publish of data and commit of state
   Task failures and partition starvation during application restarts, as Helix 
may assign tasks to the orphaned containers which have a stale state and 
configuration
   Container failures on application restarts due to Helix instance name 
collisions with orphaned containers
   
   This PR incorporates the following changes to handle orphaned containers:
   1. Disables live instances during Yarn application start up and shutdown in 
GobblinYarnAppLauncher. This ensures that any orphaned Helix instances are 
disabled from joining the cluster on restart and any tasks running on these 
instances are cancelled.
   2. The GobblinApplicationMaster (inside YarnService) ensures that Helix 
instance name assigned to a newly allocated container is not a currently live 
instance to avoid instance name collisions. We also handle NM failures/restarts 
that result in Yarn RM "aborting" the container (i.e. container is deemed dead 
from Yarn's point of view, even though the container may physically be alive). 
In this case, we disable the instance to ensure it is fenced off from the Helix 
cluster.
   3. Gobblin workers (i.e. GobblinYarnTaskRunner) retry in case of failure in 
joining a Helix cluster. The retry logic disconnects from the Helix cluster and 
drops the instance before reattempting to join the cluster. 
   
   ### Tests
   - [x] My PR adds the following unit tests __OR__ does not need testing for 
this extremely good reason:
   Added unit test in GobblinTaskRunnerTest
   
   ### Commits
   - [x] My commits all reference JIRA issues in their subject lines, and I 
have squashed multiple commits if they address the same issue. In addition, my 
commits follow the guidelines from "[How to write a good git commit 
message](http://chris.beams.io/posts/git-commit/)":
   1. Subject is separated from body by a blank line
   2. Subject is limited to 50 characters
   3. Subject does not end with a period
   4. Subject uses the imperative mood ("add", not "adding")
   5. Body wraps at 72 characters
   6. Body explains "what" and "why", not "how"
   
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[jira] [Created] (GOBBLIN-1099) Handle orphaned Yarn containers in Gobblin-on-Yarn clusters

2020-03-26 Thread Sudarshan Vasudevan (Jira)
Sudarshan Vasudevan created GOBBLIN-1099:


 Summary: Handle orphaned Yarn containers in Gobblin-on-Yarn 
clusters
 Key: GOBBLIN-1099
 URL: https://issues.apache.org/jira/browse/GOBBLIN-1099
 Project: Apache Gobblin
  Issue Type: Improvement
  Components: gobblin-yarn
Affects Versions: 0.15.0
Reporter: Sudarshan Vasudevan
Assignee: Abhishek Tiwari
 Fix For: 0.15.0


A Yarn application may leave behind orphaned containers, which can happen due 
to lost node managers. The orphaned containers however can continue to run 
(potentially forever) as participants in the Helix cluster. This can cause the 
following problems for a Gobblin-on-Yarn application:
 # Double publish of data and commit of state
 # Task failures and partition starvation during application restarts, as Helix 
may assign tasks to the orphaned containers which have a stale state and 
configuration
 # Container failures on application restarts due to Helix instance name 
collisions with orphaned containers

 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [incubator-gobblin] asfgit closed pull request #2873: [GOBBLIN-1031]Gobblin-on-Yarn locally running Azkaban job skeleton

2020-02-22 Thread GitBox
asfgit closed pull request #2873: [GOBBLIN-1031]Gobblin-on-Yarn locally running 
Azkaban job skeleton
URL: https://github.com/apache/incubator-gobblin/pull/2873
 
 
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [incubator-gobblin] autumnust commented on a change in pull request #2873: [GOBBLIN-1031]Gobblin-on-Yarn locally running Azkaban job skeleton

2020-02-12 Thread GitBox
autumnust commented on a change in pull request #2873: 
[GOBBLIN-1031]Gobblin-on-Yarn locally running Azkaban job skeleton
URL: https://github.com/apache/incubator-gobblin/pull/2873#discussion_r378448071
 
 

 ##
 File path: 
gobblin-modules/gobblin-azkaban/src/main/java/org/apache/gobblin/azkaban/EmbeddedGobblinYarnAppLauncher.java
 ##
 @@ -0,0 +1,135 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.gobblin.azkaban;
+
+import java.io.File;
+import java.io.FileOutputStream;
+import java.io.OutputStream;
+import java.io.PrintWriter;
+import java.lang.reflect.Field;
+import java.util.Map;
+
+import org.apache.gobblin.testing.AssertWithBackoff;
+import org.apache.hadoop.yarn.conf.YarnConfiguration;
+import org.apache.hadoop.yarn.server.MiniYARNCluster;
+import org.testng.collections.Lists;
+
+import com.google.common.base.Preconditions;
+import com.google.common.base.Predicate;
+import com.google.common.collect.ImmutableMap;
+import com.google.common.io.Closer;
+
+import lombok.extern.slf4j.Slf4j;
+
+
+/**
+ * Given a set up Azkaban job configuration, launch the Gobblin-on-Yarn job in 
a semi-embedded mode:
+ * - Uses external Kafka cluster and requires external Zookeeper(Non-embedded 
TestingServer) to be set up.
+ * The Kafka Cluster was intentionally set to be external due to the data 
availability. External ZK was unintentional
+ * as the helix version (0.9) being used cannot finish state transition in the 
Embedded ZK.
+ * TODO: Adding embedded Kafka cluster and set golden datasets for 
data-validation.
+ * - Uses MiniYARNCluster so YARN components don't have to be installed.
+ */
+@Slf4j
+public class EmbeddedGobblinYarnAppLauncher extends AzkabanJobRunner {
+  public static final String DYNAMIC_CONF_PATH = "dynamic.conf";
+  public static final String YARN_SITE_XML_PATH = "yarn-site.xml";
+  private static String zkString = "";
+  private static String fileAddress = "";
+
+  private static void setup(String[] args)
+  throws Exception {
+// Parsing zk-string
+Preconditions.checkArgument(args.length == 1);
+zkString = args[0];
+
+// Initialize necessary external components: Yarn and Helix
+Closer closer = Closer.create();
+
+// Set java home in environment since it isn't set on some systems
+String javaHome = System.getProperty("java.home");
+setEnv("JAVA_HOME", javaHome);
+
+final YarnConfiguration clusterConf = new YarnConfiguration();
+clusterConf.set("yarn.resourcemanager.connect.max-wait.ms", "1");
+clusterConf.set("yarn.nodemanager.resource.memory-mb", "512");
+clusterConf.set("yarn.scheduler.maximum-allocation-mb", "1024");
+
+MiniYARNCluster miniYARNCluster = closer.register(new 
MiniYARNCluster("TestCluster", 1, 1, 1));
+miniYARNCluster.init(clusterConf);
+miniYARNCluster.start();
+
+// YARN client should not be started before the Resource Manager is up
+AssertWithBackoff.create().logger(log).timeoutMs(1).assertTrue(new 
Predicate() {
+  @Override
+  public boolean apply(Void input) {
+return !clusterConf.get(YarnConfiguration.RM_ADDRESS).contains(":0");
+  }
+}, "Waiting for RM");
+
+try (PrintWriter pw = new PrintWriter(DYNAMIC_CONF_PATH, "UTF-8")) {
+  File dir = new File("target/dummydir");
+
+  // dummy directory specified in configuration
+  if (!dir.mkdir()) {
+log.error("The dummy folder's creation is not successful");
+  }
+  dir.deleteOnExit();
+
+  pw.println("gobblin.cluster.zk.connection.string=\"" + zkString + "\"");
+  pw.println("jobconf.fullyQualifiedPath=\"" + dir.getAbsolutePath() + 
"\"");
+}
+
+// YARN config is dynamic and needs to be passed to other processes
+try (OutputStream os = new FileOutputStream(new File(YARN_SITE_XML_PATH))) 
{
+  clusterConf.writeXml(os);
+}
+
+/**

[GitHub] [incubator-gobblin] sv2000 commented on a change in pull request #2873: [GOBBLIN-1031]Gobblin-on-Yarn locally running Azkaban job skeleton

2020-02-06 Thread GitBox
sv2000 commented on a change in pull request #2873: 
[GOBBLIN-1031]Gobblin-on-Yarn locally running Azkaban job skeleton
URL: https://github.com/apache/incubator-gobblin/pull/2873#discussion_r376171137
 
 

 ##
 File path: 
gobblin-yarn/src/main/java/org/apache/gobblin/yarn/GobblinYarnAppLauncher.java
 ##
 @@ -687,9 +691,13 @@ private void addAppLocalFiles(String localFilePathList, 
Optional

[GitHub] [incubator-gobblin] sv2000 commented on a change in pull request #2873: [GOBBLIN-1031]Gobblin-on-Yarn locally running Azkaban job skeleton

2020-02-06 Thread GitBox
sv2000 commented on a change in pull request #2873: 
[GOBBLIN-1031]Gobblin-on-Yarn locally running Azkaban job skeleton
URL: https://github.com/apache/incubator-gobblin/pull/2873#discussion_r376166058
 
 

 ##
 File path: 
gobblin-modules/gobblin-azkaban/src/main/java/org/apache/gobblin/azkaban/EmbeddedGobblinYarnAppLauncher.java
 ##
 @@ -0,0 +1,135 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.gobblin.azkaban;
+
+import java.io.File;
+import java.io.FileOutputStream;
+import java.io.OutputStream;
+import java.io.PrintWriter;
+import java.lang.reflect.Field;
+import java.util.Map;
+
+import org.apache.gobblin.testing.AssertWithBackoff;
+import org.apache.hadoop.yarn.conf.YarnConfiguration;
+import org.apache.hadoop.yarn.server.MiniYARNCluster;
+import org.testng.collections.Lists;
+
+import com.google.common.base.Preconditions;
+import com.google.common.base.Predicate;
+import com.google.common.collect.ImmutableMap;
+import com.google.common.io.Closer;
+
+import lombok.extern.slf4j.Slf4j;
+
+
+/**
+ * Given a set up Azkaban job configuration, launch the Gobblin-on-Yarn job in 
a semi-embedded mode:
+ * - Uses external Kafka cluster and requires external Zookeeper(Non-embedded 
TestingServer) to be set up.
+ * The Kafka Cluster was intentionally set to be external due to the data 
availability. External ZK was unintentional
+ * as the helix version (0.9) being used cannot finish state transition in the 
Embedded ZK.
+ * TODO: Adding embedded Kafka cluster and set golden datasets for 
data-validation.
+ * - Uses MiniYARNCluster so YARN components don't have to be installed.
+ */
+@Slf4j
+public class EmbeddedGobblinYarnAppLauncher extends AzkabanJobRunner {
+  public static final String DYNAMIC_CONF_PATH = "dynamic.conf";
+  public static final String YARN_SITE_XML_PATH = "yarn-site.xml";
+  private static String zkString = "";
+  private static String fileAddress = "";
+
+  private static void setup(String[] args)
+  throws Exception {
+// Parsing zk-string
+Preconditions.checkArgument(args.length == 1);
+zkString = args[0];
+
+// Initialize necessary external components: Yarn and Helix
+Closer closer = Closer.create();
+
+// Set java home in environment since it isn't set on some systems
+String javaHome = System.getProperty("java.home");
+setEnv("JAVA_HOME", javaHome);
+
+final YarnConfiguration clusterConf = new YarnConfiguration();
+clusterConf.set("yarn.resourcemanager.connect.max-wait.ms", "1");
+clusterConf.set("yarn.nodemanager.resource.memory-mb", "512");
+clusterConf.set("yarn.scheduler.maximum-allocation-mb", "1024");
+
+MiniYARNCluster miniYARNCluster = closer.register(new 
MiniYARNCluster("TestCluster", 1, 1, 1));
+miniYARNCluster.init(clusterConf);
+miniYARNCluster.start();
+
+// YARN client should not be started before the Resource Manager is up
+AssertWithBackoff.create().logger(log).timeoutMs(1).assertTrue(new 
Predicate() {
+  @Override
+  public boolean apply(Void input) {
+return !clusterConf.get(YarnConfiguration.RM_ADDRESS).contains(":0");
+  }
+}, "Waiting for RM");
+
+try (PrintWriter pw = new PrintWriter(DYNAMIC_CONF_PATH, "UTF-8")) {
+  File dir = new File("target/dummydir");
+
+  // dummy directory specified in configuration
+  if (!dir.mkdir()) {
+log.error("The dummy folder's creation is not successful");
+  }
+  dir.deleteOnExit();
+
+  pw.println("gobblin.cluster.zk.connection.string=\"" + zkString + "\"");
+  pw.println("jobconf.fullyQualifiedPath=\"" + dir.getAbsolutePath() + 
"\"");
+}
+
+// YARN config is dynamic and needs to be passed to other processes
+try (OutputStream os = new FileOutputStream(new File(YARN_SITE_XML_PATH))) 
{
+  clusterConf.writeXml(os);
+}
+
+/** Have to pass the 

[GitHub] [incubator-gobblin] codecov-io edited a comment on issue #2873: [GOBBLIN-1031]Gobblin-on-Yarn locally running Azkaban job skeleton

2020-02-04 Thread GitBox
codecov-io edited a comment on issue #2873: [GOBBLIN-1031]Gobblin-on-Yarn 
locally running Azkaban job skeleton
URL: 
https://github.com/apache/incubator-gobblin/pull/2873#issuecomment-575821726
 
 
   # 
[Codecov](https://codecov.io/gh/apache/incubator-gobblin/pull/2873?src=pr=h1)
 Report
   > Merging 
[#2873](https://codecov.io/gh/apache/incubator-gobblin/pull/2873?src=pr=desc)
 into 
[master](https://codecov.io/gh/apache/incubator-gobblin/commit/ef3003c85531aaeacb760a4e30fca2d3069001aa?src=pr=desc)
 will **decrease** coverage by `0.04%`.
   > The diff coverage is `3.5%`.
   
   [![Impacted file tree 
graph](https://codecov.io/gh/apache/incubator-gobblin/pull/2873/graphs/tree.svg?width=650=4MgURJ0bGc=150=pr)](https://codecov.io/gh/apache/incubator-gobblin/pull/2873?src=pr=tree)
   
   ```diff
   @@ Coverage Diff  @@
   ## master#2873  +/-   ##
   
   - Coverage 45.73%   45.68%   -0.05% 
 Complexity 9112 9112  
   
 Files  1921 1924   +3 
 Lines 7239172495 +104 
 Branches   7967 7977  +10 
   
   + Hits  3310633120  +14 
   - Misses3626236353  +91 
   + Partials   3023 3022   -1
   ```
   
   
   | [Impacted 
Files](https://codecov.io/gh/apache/incubator-gobblin/pull/2873?src=pr=tree) 
| Coverage Δ | Complexity Δ | |
   |---|---|---|---|
   | 
[.../apache/gobblin/cluster/GobblinClusterManager.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2873/diff?src=pr=tree#diff-Z29iYmxpbi1jbHVzdGVyL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9nb2JibGluL2NsdXN0ZXIvR29iYmxpbkNsdXN0ZXJNYW5hZ2VyLmphdmE=)
 | `54.75% <ø> (ø)` | `28 <0> (ø)` | :arrow_down: |
   | 
[...n/java/org/apache/gobblin/util/logs/LogCopier.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2873/diff?src=pr=tree#diff-Z29iYmxpbi11dGlsaXR5L3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9nb2JibGluL3V0aWwvbG9ncy9Mb2dDb3BpZXIuamF2YQ==)
 | `69.56% <ø> (ø)` | `18 <0> (ø)` | :arrow_down: |
   | 
[...che/gobblin/yarn/GobblinYarnConfigurationKeys.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2873/diff?src=pr=tree#diff-Z29iYmxpbi15YXJuL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9nb2JibGluL3lhcm4vR29iYmxpbllhcm5Db25maWd1cmF0aW9uS2V5cy5qYXZh)
 | `66.66% <ø> (ø)` | `1 <0> (ø)` | :arrow_down: |
   | 
[.../org/apache/gobblin/yarn/GobblinYarnLogSource.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2873/diff?src=pr=tree#diff-Z29iYmxpbi15YXJuL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9nb2JibGluL3lhcm4vR29iYmxpbllhcm5Mb2dTb3VyY2UuamF2YQ==)
 | `23.07% <ø> (ø)` | `3 <0> (ø)` | :arrow_down: |
   | 
[...e/gobblin/yarn/AbstractYarnAppSecurityManager.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2873/diff?src=pr=tree#diff-Z29iYmxpbi15YXJuL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9nb2JibGluL3lhcm4vQWJzdHJhY3RZYXJuQXBwU2VjdXJpdHlNYW5hZ2VyLmphdmE=)
 | `48.27% <ø> (ø)` | `6 <0> (ø)` | :arrow_down: |
   | 
[...gobblin/azkaban/AzkabanGobblinYarnAppLauncher.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2873/diff?src=pr=tree#diff-Z29iYmxpbi1tb2R1bGVzL2dvYmJsaW4tYXprYWJhbi9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvZ29iYmxpbi9hemthYmFuL0F6a2FiYW5Hb2JibGluWWFybkFwcExhdW5jaGVyLmphdmE=)
 | `30.55% <0%> (-2.78%)` | `2 <0> (ø)` | |
   | 
[...obblin/azkaban/EmbeddedGobblinYarnAppLauncher.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2873/diff?src=pr=tree#diff-Z29iYmxpbi1tb2R1bGVzL2dvYmJsaW4tYXprYWJhbi9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvZ29iYmxpbi9hemthYmFuL0VtYmVkZGVkR29iYmxpbllhcm5BcHBMYXVuY2hlci5qYXZh)
 | `0% <0%> (ø)` | `0 <0> (?)` | |
   | 
[...in/source/extractor/extract/kafka/KafkaSource.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2873/diff?src=pr=tree#diff-Z29iYmxpbi1tb2R1bGVzL2dvYmJsaW4ta2Fma2EtY29tbW9uL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9nb2JibGluL3NvdXJjZS9leHRyYWN0b3IvZXh0cmFjdC9rYWZrYS9LYWZrYVNvdXJjZS5qYXZh)
 | `0% <0%> (ø)` | `0 <0> (ø)` | :arrow_down: |
   | 
[...in/azkaban/AzkabanGobblinLocalYarnAppLauncher.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2873/diff?src=pr=tree#diff-Z29iYmxpbi1tb2R1bGVzL2dvYmJsaW4tYXprYWJhbi9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvZ29iYmxpbi9hemthYmFuL0F6a2FiYW5Hb2JibGluTG9jYWxZYXJuQXBwTGF1bmNoZXIuamF2YQ==)
 | `0% <0%> (ø)` | `0 <0> (?)` | |
   | 
[...rg/apache/gobblin/yarn/GobblinYarnAppLauncher.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2873/diff?src=pr=tree#diff-Z29iYmxpbi15YXJuL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9nb2JibGluL3lhcm4vR29iYmxpbllhcm5BcHBMYXVuY2hlci5qYXZh)
 | `21.18% <0%> (-0.22%)` | `8 <0> (ø)` | |
   | ... and [10 
more](https://codecov.io/gh/apache/incubator-gobblin/pull/2873/diff?src=pr=tree-more)
 | |
   
   --
   
   [Continue to revie

[GitHub] [incubator-gobblin] autumnust commented on a change in pull request #2873: [GOBBLIN-1031]Gobblin-on-Yarn locally running Azkaban job skeleton

2020-02-04 Thread GitBox
autumnust commented on a change in pull request #2873: 
[GOBBLIN-1031]Gobblin-on-Yarn locally running Azkaban job skeleton
URL: https://github.com/apache/incubator-gobblin/pull/2873#discussion_r374996548
 
 

 ##
 File path: 
gobblin-modules/gobblin-azkaban/src/main/java/org/apache/gobblin/azkaban/EmbeddedGobblinYarnAppLauncher.java
 ##
 @@ -0,0 +1,132 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.gobblin.azkaban;
+
+import java.io.File;
+import java.io.FileOutputStream;
+import java.io.OutputStream;
+import java.io.PrintWriter;
+import java.lang.reflect.Field;
+import java.util.Map;
+
+import org.apache.gobblin.testing.AssertWithBackoff;
+import org.apache.hadoop.yarn.conf.YarnConfiguration;
+import org.apache.hadoop.yarn.server.MiniYARNCluster;
+import org.testng.collections.Lists;
+
+import com.google.common.base.Preconditions;
+import com.google.common.base.Predicate;
+import com.google.common.collect.ImmutableMap;
+import com.google.common.io.Closer;
+
+import lombok.extern.slf4j.Slf4j;
+
+
+/**
+ * Given a set up Azkaban job configuration, launch the Gobblin-on-Yarn job in 
a semi-embedded mode:
+ * - Uses external Kafka cluster and requires external Zookeeper(Non-embedded 
TestingServer) to be set up.
 
 Review comment:
   Yes, Kafka was intentional from very beginning, as we don't have quick 
set-up script for kafka and a good way to create a golden datasets there. 
   
   ZK was kind of intentional, but briefly the Helix behaves unexpectedly 
unstable on embedded ZK server so I abandoned that eventually. Will add these 
into comments.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [incubator-gobblin] sv2000 commented on a change in pull request #2873: [GOBBLIN-1031]Gobblin-on-Yarn locally running Azkaban job skeleton

2020-01-20 Thread GitBox
sv2000 commented on a change in pull request #2873: 
[GOBBLIN-1031]Gobblin-on-Yarn locally running Azkaban job skeleton
URL: https://github.com/apache/incubator-gobblin/pull/2873#discussion_r368765996
 
 

 ##
 File path: gobblin-yarn/src/main/java/org/apache/gobblin/yarn/YarnService.java
 ##
 @@ -533,6 +533,7 @@ private void addRemoteAppFiles(String hdfsFileList, 
Map r
 }
   }
 
+  // TODO: Not fully understand the purpose of this method.
 
 Review comment:
   What is the TODO here? Drop this comment if unintentional.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [incubator-gobblin] sv2000 commented on a change in pull request #2873: [GOBBLIN-1031]Gobblin-on-Yarn locally running Azkaban job skeleton

2020-01-20 Thread GitBox
sv2000 commented on a change in pull request #2873: 
[GOBBLIN-1031]Gobblin-on-Yarn locally running Azkaban job skeleton
URL: https://github.com/apache/incubator-gobblin/pull/2873#discussion_r368764748
 
 

 ##
 File path: 
gobblin-cluster/src/main/java/org/apache/gobblin/cluster/GobblinHelixJobLauncher.java
 ##
 @@ -93,8 +93,8 @@
  * 
  *
  * 
- *   This class runs in the {@link GobblinClusterManager}. The actual task 
execution happens in the in the
- *   {@link GobblinTaskRunner}.
+ *   The instance of this class is instantiated and runing along with the 
{@link GobblinHelixJobScheduler}.
 
 Review comment:
   Rewording: This class is instantiated by the {@link 
GobblinHelixJobScheduler} on every job submission to launch the Gobblin job.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [incubator-gobblin] sv2000 commented on a change in pull request #2873: [GOBBLIN-1031]Gobblin-on-Yarn locally running Azkaban job skeleton

2020-01-20 Thread GitBox
sv2000 commented on a change in pull request #2873: 
[GOBBLIN-1031]Gobblin-on-Yarn locally running Azkaban job skeleton
URL: https://github.com/apache/incubator-gobblin/pull/2873#discussion_r368766539
 
 

 ##
 File path: 
gobblin-modules/gobblin-azkaban/src/main/resources/conf/properties/local.properties
 ##
 @@ -0,0 +1,23 @@
+# Misc. deployment/cluster specific variables
+user.name=kafkaetl
+logs.user.name=kafkaetlstream
+root.project.name=gobblin-kafka-streaming-local
+grid.name=ltx1-holdem
 
 Review comment:
   Change this config.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [incubator-gobblin] sv2000 commented on a change in pull request #2873: [GOBBLIN-1031]Gobblin-on-Yarn locally running Azkaban job skeleton

2020-01-20 Thread GitBox
sv2000 commented on a change in pull request #2873: 
[GOBBLIN-1031]Gobblin-on-Yarn locally running Azkaban job skeleton
URL: https://github.com/apache/incubator-gobblin/pull/2873#discussion_r368766504
 
 

 ##
 File path: 
gobblin-modules/gobblin-azkaban/src/main/resources/conf/properties/local.properties
 ##
 @@ -0,0 +1,23 @@
+# Misc. deployment/cluster specific variables
+user.name=kafkaetl
+logs.user.name=kafkaetlstream
+root.project.name=gobblin-kafka-streaming-local
+grid.name=ltx1-holdem
+
+
+# Cluster specific directory configurations
+# home.dir and root.data.location should be unique per cluster deployment
+
+home.dir=/jobs/${user.name}
+logs.dir=/jobs/${logs.user.name}
+root.data.location=/data
+
+fs.uri=file:///
+
+# Yarn Configuration
+gobblin.yarn.app.queue=default
+
+# Gobblin Cluster
+#Add a suffix "-1" to the app name becuase of a corrupted Helix cluster znode 
(LIZK-619). This made the helix cluster
 
 Review comment:
   Drop this comment.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [incubator-gobblin] sv2000 commented on a change in pull request #2873: [GOBBLIN-1031]Gobblin-on-Yarn locally running Azkaban job skeleton

2020-01-20 Thread GitBox
sv2000 commented on a change in pull request #2873: 
[GOBBLIN-1031]Gobblin-on-Yarn locally running Azkaban job skeleton
URL: https://github.com/apache/incubator-gobblin/pull/2873#discussion_r368766370
 
 

 ##
 File path: 
gobblin-modules/gobblin-azkaban/src/main/java/org/apache/gobblin/azkaban/EmbeddedGobblinYarnAppLauncher.java
 ##
 @@ -0,0 +1,132 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.gobblin.azkaban;
+
+import java.io.File;
+import java.io.FileOutputStream;
+import java.io.OutputStream;
+import java.io.PrintWriter;
+import java.lang.reflect.Field;
+import java.util.Map;
+
+import org.apache.gobblin.testing.AssertWithBackoff;
+import org.apache.hadoop.yarn.conf.YarnConfiguration;
+import org.apache.hadoop.yarn.server.MiniYARNCluster;
+import org.testng.collections.Lists;
+
+import com.google.common.base.Preconditions;
+import com.google.common.base.Predicate;
+import com.google.common.collect.ImmutableMap;
+import com.google.common.io.Closer;
+
+import lombok.extern.slf4j.Slf4j;
+
+
+/**
+ * Given a set up Azkaban job configuration, launch the Gobblin-on-Yarn job in 
a semi-embedded mode:
+ * - Uses external Kafka cluster and requires external Zookeeper(Non-embedded 
TestingServer) to be set up.
+ * - Uses MiniYARNCluster so YARN components don't have to be installed.
+ */
+@Slf4j
+public class EmbeddedGobblinYarnAppLauncher extends AzkabanJobRunner {
+  public static final String DYNAMIC_CONF_PATH = "dynamic.conf";
+  public static final String YARN_SITE_XML_PATH = "yarn-site.xml";
+  private static String zkString = "zk-ltx1-gobblin.stg.linkedin.com:6312";
 
 Review comment:
   Change this line. Shouldn't expose LI internal stuff outside.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [incubator-gobblin] sv2000 commented on a change in pull request #2873: [GOBBLIN-1031]Gobblin-on-Yarn locally running Azkaban job skeleton

2020-01-20 Thread GitBox
sv2000 commented on a change in pull request #2873: 
[GOBBLIN-1031]Gobblin-on-Yarn locally running Azkaban job skeleton
URL: https://github.com/apache/incubator-gobblin/pull/2873#discussion_r368765618
 
 

 ##
 File path: 
gobblin-modules/gobblin-azkaban/src/main/java/org/apache/gobblin/azkaban/EmbeddedGobblinYarnAppLauncher.java
 ##
 @@ -0,0 +1,132 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.gobblin.azkaban;
+
+import java.io.File;
+import java.io.FileOutputStream;
+import java.io.OutputStream;
+import java.io.PrintWriter;
+import java.lang.reflect.Field;
+import java.util.Map;
+
+import org.apache.gobblin.testing.AssertWithBackoff;
+import org.apache.hadoop.yarn.conf.YarnConfiguration;
+import org.apache.hadoop.yarn.server.MiniYARNCluster;
+import org.testng.collections.Lists;
+
+import com.google.common.base.Preconditions;
+import com.google.common.base.Predicate;
+import com.google.common.collect.ImmutableMap;
+import com.google.common.io.Closer;
+
+import lombok.extern.slf4j.Slf4j;
+
+
+/**
+ * Given a set up Azkaban job configuration, launch the Gobblin-on-Yarn job in 
a semi-embedded mode:
+ * - Uses external Kafka cluster and requires external Zookeeper(Non-embedded 
TestingServer) to be set up.
 
 Review comment:
   Can you add why we are using an external ZK/Kafka server? And a TODO to set 
these up in embedded mode if that is the intention?


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [incubator-gobblin] sv2000 commented on a change in pull request #2873: [GOBBLIN-1031]Gobblin-on-Yarn locally running Azkaban job skeleton

2020-01-20 Thread GitBox
sv2000 commented on a change in pull request #2873: 
[GOBBLIN-1031]Gobblin-on-Yarn locally running Azkaban job skeleton
URL: https://github.com/apache/incubator-gobblin/pull/2873#discussion_r368766551
 
 

 ##
 File path: 
gobblin-modules/gobblin-azkaban/src/main/resources/conf/properties/local.properties
 ##
 @@ -0,0 +1,23 @@
+# Misc. deployment/cluster specific variables
+user.name=kafkaetl
 
 Review comment:
   Change this config.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [incubator-gobblin] sv2000 commented on a change in pull request #2873: [GOBBLIN-1031]Gobblin-on-Yarn locally running Azkaban job skeleton

2020-01-20 Thread GitBox
sv2000 commented on a change in pull request #2873: 
[GOBBLIN-1031]Gobblin-on-Yarn locally running Azkaban job skeleton
URL: https://github.com/apache/incubator-gobblin/pull/2873#discussion_r368764662
 
 

 ##
 File path: 
gobblin-cluster/src/main/java/org/apache/gobblin/cluster/GobblinClusterManager.java
 ##
 @@ -228,6 +228,8 @@ private void stopAppLauncherAndServices() {
 
   /**
* Configure Helix quota-based task scheduling
+   * This Task-quota indicates how many helix-tasks can be running 
concurrently within a Helix Job.
 
 Review comment:
   More precisely: this config controls the number of tasks that are 
concurrently assigned to a single Helix instance.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [incubator-gobblin] sv2000 commented on a change in pull request #2873: [GOBBLIN-1031]Gobblin-on-Yarn locally running Azkaban job skeleton

2020-01-20 Thread GitBox
sv2000 commented on a change in pull request #2873: 
[GOBBLIN-1031]Gobblin-on-Yarn locally running Azkaban job skeleton
URL: https://github.com/apache/incubator-gobblin/pull/2873#discussion_r368764148
 
 

 ##
 File path: 
gobblin-cluster/src/main/java/org/apache/gobblin/cluster/GobblinHelixJobLauncher.java
 ##
 @@ -93,8 +93,8 @@
  * 
  *
  * 
- *   This class runs in the {@link GobblinClusterManager}. The actual task 
execution happens in the in the
- *   {@link GobblinTaskRunner}.
+ *   The instance of this class is instantiated and runing along with the 
{@link GobblinHelixJobScheduler}.
+ *   The actual task execution happens in the in the {@link 
GobblinTaskRunner}, usually in a different process.
 
 Review comment:
   typo: "in the in the" -> "in the".


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [incubator-gobblin] codecov-io edited a comment on issue #2873: [GOBBLIN-1031]Gobblin-on-Yarn locally running Azkaban job skeleton

2020-01-17 Thread GitBox
codecov-io edited a comment on issue #2873: [GOBBLIN-1031]Gobblin-on-Yarn 
locally running Azkaban job skeleton
URL: 
https://github.com/apache/incubator-gobblin/pull/2873#issuecomment-575821726
 
 
   # 
[Codecov](https://codecov.io/gh/apache/incubator-gobblin/pull/2873?src=pr=h1)
 Report
   > Merging 
[#2873](https://codecov.io/gh/apache/incubator-gobblin/pull/2873?src=pr=desc)
 into 
[master](https://codecov.io/gh/apache/incubator-gobblin/commit/adb0efcc20a306d16b2f59d89c189539bd20dc2d?src=pr=desc)
 will **decrease** coverage by `0.05%`.
   > The diff coverage is `3.5%`.
   
   [![Impacted file tree 
graph](https://codecov.io/gh/apache/incubator-gobblin/pull/2873/graphs/tree.svg?width=650=4MgURJ0bGc=150=pr)](https://codecov.io/gh/apache/incubator-gobblin/pull/2873?src=pr=tree)
   
   ```diff
   @@ Coverage Diff  @@
   ## master#2873  +/-   ##
   
   - Coverage  45.8%   45.74%   -0.06% 
   - Complexity 9108 9110   +2 
   
 Files  1915 1918   +3 
 Lines 7226672389 +123 
 Branches   7969 7982  +13 
   
   + Hits  3309833112  +14 
   - Misses3614336250 +107 
   - Partials   3025 3027   +2
   ```
   
   
   | [Impacted 
Files](https://codecov.io/gh/apache/incubator-gobblin/pull/2873?src=pr=tree) 
| Coverage Δ | Complexity Δ | |
   |---|---|---|---|
   | 
[.../apache/gobblin/cluster/GobblinClusterManager.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2873/diff?src=pr=tree#diff-Z29iYmxpbi1jbHVzdGVyL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9nb2JibGluL2NsdXN0ZXIvR29iYmxpbkNsdXN0ZXJNYW5hZ2VyLmphdmE=)
 | `53.91% <ø> (ø)` | `27 <0> (ø)` | :arrow_down: |
   | 
[...n/java/org/apache/gobblin/util/logs/LogCopier.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2873/diff?src=pr=tree#diff-Z29iYmxpbi11dGlsaXR5L3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9nb2JibGluL3V0aWwvbG9ncy9Mb2dDb3BpZXIuamF2YQ==)
 | `69.56% <ø> (ø)` | `18 <0> (ø)` | :arrow_down: |
   | 
[...main/java/org/apache/gobblin/yarn/YarnService.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2873/diff?src=pr=tree#diff-Z29iYmxpbi15YXJuL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9nb2JibGluL3lhcm4vWWFyblNlcnZpY2UuamF2YQ==)
 | `15.16% <ø> (ø)` | `4 <0> (ø)` | :arrow_down: |
   | 
[...che/gobblin/yarn/GobblinYarnConfigurationKeys.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2873/diff?src=pr=tree#diff-Z29iYmxpbi15YXJuL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9nb2JibGluL3lhcm4vR29iYmxpbllhcm5Db25maWd1cmF0aW9uS2V5cy5qYXZh)
 | `66.66% <ø> (ø)` | `1 <0> (ø)` | :arrow_down: |
   | 
[.../org/apache/gobblin/yarn/GobblinYarnLogSource.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2873/diff?src=pr=tree#diff-Z29iYmxpbi15YXJuL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9nb2JibGluL3lhcm4vR29iYmxpbllhcm5Mb2dTb3VyY2UuamF2YQ==)
 | `23.07% <ø> (ø)` | `3 <0> (ø)` | :arrow_down: |
   | 
[...e/gobblin/yarn/AbstractYarnAppSecurityManager.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2873/diff?src=pr=tree#diff-Z29iYmxpbi15YXJuL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9nb2JibGluL3lhcm4vQWJzdHJhY3RZYXJuQXBwU2VjdXJpdHlNYW5hZ2VyLmphdmE=)
 | `48.27% <ø> (ø)` | `6 <0> (ø)` | :arrow_down: |
   | 
[...org/apache/gobblin/yarn/GobblinYarnTaskRunner.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2873/diff?src=pr=tree#diff-Z29iYmxpbi15YXJuL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9nb2JibGluL3lhcm4vR29iYmxpbllhcm5UYXNrUnVubmVyLmphdmE=)
 | `0% <ø> (ø)` | `0 <0> (ø)` | :arrow_down: |
   | 
[...a/org/apache/gobblin/azkaban/AzkabanJobRunner.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2873/diff?src=pr=tree#diff-Z29iYmxpbi1tb2R1bGVzL2dvYmJsaW4tYXprYWJhbi9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvZ29iYmxpbi9hemthYmFuL0F6a2FiYW5Kb2JSdW5uZXIuamF2YQ==)
 | `0% <0%> (ø)` | `0 <0> (?)` | |
   | 
[...obblin/azkaban/EmbeddedGobblinYarnAppLauncher.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2873/diff?src=pr=tree#diff-Z29iYmxpbi1tb2R1bGVzL2dvYmJsaW4tYXprYWJhbi9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvZ29iYmxpbi9hemthYmFuL0VtYmVkZGVkR29iYmxpbllhcm5BcHBMYXVuY2hlci5qYXZh)
 | `0% <0%> (ø)` | `0 <0> (?)` | |
   | 
[...in/source/extractor/extract/kafka/KafkaSource.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2873/diff?src=pr=tree#diff-Z29iYmxpbi1tb2R1bGVzL2dvYmJsaW4ta2Fma2EtY29tbW9uL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9nb2JibGluL3NvdXJjZS9leHRyYWN0b3IvZXh0cmFjdC9rYWZrYS9LYWZrYVNvdXJjZS5qYXZh)
 | `0% <0%> (ø)` | `0 <0> (ø)` | :arrow_down: |
   | ... and [16 
more](https://codecov.io/gh/apache/incubator-gobblin/pull/2873/diff?src=pr=tree-more)
 | |
   
   --
   
   [Continue to review full report at 
Codecov](https://codecov.io/gh/apache/incubato

[GitHub] [incubator-gobblin] codecov-io commented on issue #2873: [GOBBLIN-1031]Gobblin-on-Yarn locally running Azkaban job skeleton

2020-01-17 Thread GitBox
codecov-io commented on issue #2873: [GOBBLIN-1031]Gobblin-on-Yarn locally 
running Azkaban job skeleton
URL: 
https://github.com/apache/incubator-gobblin/pull/2873#issuecomment-575821726
 
 
   # 
[Codecov](https://codecov.io/gh/apache/incubator-gobblin/pull/2873?src=pr=h1)
 Report
   > Merging 
[#2873](https://codecov.io/gh/apache/incubator-gobblin/pull/2873?src=pr=desc)
 into 
[master](https://codecov.io/gh/apache/incubator-gobblin/commit/adb0efcc20a306d16b2f59d89c189539bd20dc2d?src=pr=desc)
 will **decrease** coverage by `0.05%`.
   > The diff coverage is `3.5%`.
   
   [![Impacted file tree 
graph](https://codecov.io/gh/apache/incubator-gobblin/pull/2873/graphs/tree.svg?width=650=4MgURJ0bGc=150=pr)](https://codecov.io/gh/apache/incubator-gobblin/pull/2873?src=pr=tree)
   
   ```diff
   @@ Coverage Diff  @@
   ## master#2873  +/-   ##
   
   - Coverage  45.8%   45.74%   -0.06% 
   - Complexity 9108 9110   +2 
   
 Files  1915 1918   +3 
 Lines 7226672389 +123 
 Branches   7969 7982  +13 
   
   + Hits  3309833112  +14 
   - Misses3614336250 +107 
   - Partials   3025 3027   +2
   ```
   
   
   | [Impacted 
Files](https://codecov.io/gh/apache/incubator-gobblin/pull/2873?src=pr=tree) 
| Coverage Δ | Complexity Δ | |
   |---|---|---|---|
   | 
[.../apache/gobblin/cluster/GobblinClusterManager.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2873/diff?src=pr=tree#diff-Z29iYmxpbi1jbHVzdGVyL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9nb2JibGluL2NsdXN0ZXIvR29iYmxpbkNsdXN0ZXJNYW5hZ2VyLmphdmE=)
 | `53.91% <ø> (ø)` | `27 <0> (ø)` | :arrow_down: |
   | 
[...n/java/org/apache/gobblin/util/logs/LogCopier.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2873/diff?src=pr=tree#diff-Z29iYmxpbi11dGlsaXR5L3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9nb2JibGluL3V0aWwvbG9ncy9Mb2dDb3BpZXIuamF2YQ==)
 | `69.56% <ø> (ø)` | `18 <0> (ø)` | :arrow_down: |
   | 
[...main/java/org/apache/gobblin/yarn/YarnService.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2873/diff?src=pr=tree#diff-Z29iYmxpbi15YXJuL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9nb2JibGluL3lhcm4vWWFyblNlcnZpY2UuamF2YQ==)
 | `15.16% <ø> (ø)` | `4 <0> (ø)` | :arrow_down: |
   | 
[...che/gobblin/yarn/GobblinYarnConfigurationKeys.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2873/diff?src=pr=tree#diff-Z29iYmxpbi15YXJuL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9nb2JibGluL3lhcm4vR29iYmxpbllhcm5Db25maWd1cmF0aW9uS2V5cy5qYXZh)
 | `66.66% <ø> (ø)` | `1 <0> (ø)` | :arrow_down: |
   | 
[.../org/apache/gobblin/yarn/GobblinYarnLogSource.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2873/diff?src=pr=tree#diff-Z29iYmxpbi15YXJuL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9nb2JibGluL3lhcm4vR29iYmxpbllhcm5Mb2dTb3VyY2UuamF2YQ==)
 | `23.07% <ø> (ø)` | `3 <0> (ø)` | :arrow_down: |
   | 
[...e/gobblin/yarn/AbstractYarnAppSecurityManager.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2873/diff?src=pr=tree#diff-Z29iYmxpbi15YXJuL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9nb2JibGluL3lhcm4vQWJzdHJhY3RZYXJuQXBwU2VjdXJpdHlNYW5hZ2VyLmphdmE=)
 | `48.27% <ø> (ø)` | `6 <0> (ø)` | :arrow_down: |
   | 
[...org/apache/gobblin/yarn/GobblinYarnTaskRunner.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2873/diff?src=pr=tree#diff-Z29iYmxpbi15YXJuL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9nb2JibGluL3lhcm4vR29iYmxpbllhcm5UYXNrUnVubmVyLmphdmE=)
 | `0% <ø> (ø)` | `0 <0> (ø)` | :arrow_down: |
   | 
[...a/org/apache/gobblin/azkaban/AzkabanJobRunner.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2873/diff?src=pr=tree#diff-Z29iYmxpbi1tb2R1bGVzL2dvYmJsaW4tYXprYWJhbi9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvZ29iYmxpbi9hemthYmFuL0F6a2FiYW5Kb2JSdW5uZXIuamF2YQ==)
 | `0% <0%> (ø)` | `0 <0> (?)` | |
   | 
[...obblin/azkaban/EmbeddedGobblinYarnAppLauncher.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2873/diff?src=pr=tree#diff-Z29iYmxpbi1tb2R1bGVzL2dvYmJsaW4tYXprYWJhbi9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvZ29iYmxpbi9hemthYmFuL0VtYmVkZGVkR29iYmxpbllhcm5BcHBMYXVuY2hlci5qYXZh)
 | `0% <0%> (ø)` | `0 <0> (?)` | |
   | 
[...in/source/extractor/extract/kafka/KafkaSource.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2873/diff?src=pr=tree#diff-Z29iYmxpbi1tb2R1bGVzL2dvYmJsaW4ta2Fma2EtY29tbW9uL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9nb2JibGluL3NvdXJjZS9leHRyYWN0b3IvZXh0cmFjdC9rYWZrYS9LYWZrYVNvdXJjZS5qYXZh)
 | `0% <0%> (ø)` | `0 <0> (ø)` | :arrow_down: |
   | ... and [16 
more](https://codecov.io/gh/apache/incubator-gobblin/pull/2873/diff?src=pr=tree-more)
 | |
   
   --
   
   [Continue to review full report at 
Codecov](https://codecov.io/gh/apache/incubator-gobblin

[GitHub] [incubator-gobblin] autumnust opened a new pull request #2873: [GOBBLIN-1031]Gobblin-on-Yarn locally running Azkaban job skeleton

2020-01-17 Thread GitBox
autumnust opened a new pull request #2873: [GOBBLIN-1031]Gobblin-on-Yarn 
locally running Azkaban job skeleton
URL: https://github.com/apache/incubator-gobblin/pull/2873
 
 
   Dear Gobblin maintainers,
   
   Please accept this PR. I understand that it will not be reviewed until I 
have checked off all the steps below!
   
   
   ### JIRA
   - [ ] https://issues.apache.org/jira/browse/GOBBLIN-1031
   
   
   ### Description
   - [ ] Here are some details about my PR, including screenshots (if 
applicable):
   - The major part of this PR is `AzkabanJobRunner.java` which could power 
gobblin job to be running using the same Azkaban setting that Linkedin is using 
internally for job-scheduling, locally. 
   - On top of that, is an Embedded Gobblin JobLauncher that starts a 
Gobblin-Streaming job and submit ApplictionMaster to `MiniYARNCluster` spun up 
in the same process. 
   - Also provide a skeleton of configurations that can be used for running the 
whole job locally. The skeleton has many class-definition that have not been 
open-sourced or only serve Linkedin-specific purpose, the general-purpose job 
configuration is still under development.
   
   
   ### Tests
   - [ ] My PR adds the following unit tests __OR__ does not need testing for 
this extremely good reason:
   
   
   ### Commits
   - [ ] My commits all reference JIRA issues in their subject lines, and I 
have squashed multiple commits if they address the same issue. In addition, my 
commits follow the guidelines from "[How to write a good git commit 
message](http://chris.beams.io/posts/git-commit/)":
   1. Subject is separated from body by a blank line
   2. Subject is limited to 50 characters
   3. Subject does not end with a period
   4. Subject uses the imperative mood ("add", not "adding")
   5. Body wraps at 72 characters
   6. Body explains "what" and "why", not "how"
   
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


gobblin on yarn - job stuck without any error message logged

2019-12-27 Thread Krishnan Nair, Praveen
Hi,

I am new to gobblin and trying to use gobblin on yarn to import data from 
postgres to store it in hdfs as avro files (daily partitioned).

(Initially I faced an issue related to jodattime not handling  postgres 
timestamp with microsecond precision in 
JsonElementConversionFactory.DateConverter
which I am yet to find a solution but using watermark type simple with user 
defined partition to test out remaining configuration.)

Right now I am facing another issue:

The job gets stuck after completing more than half of workunits without any 
error logs.
The helix debug logs indicate the tasks (for those partition files that are 
missing in taskoutput dir) changed from INIT to RUNNINg in one task runner - 
say task runner 1,
And then later I can see same transition in another task runner say task runner 
4 - to RUNNING.
In between there are logs indicating task changed from RUNNING to COMPLETED. - 
But task output file is missing.
And job never finishes it gets stuck. Helix logs suggests it doesn't see any 
pending tasks and no task assigned to any task runner.
The config works for lesser volume of data (less number of partitions.

Not sure how can I troubleshoot this one. Appreciate your suggestions.

Thanks & Regards,
Praveen



CONFIDENTIALITY NOTICE: This message is the property of International Game 
Technology PLC and/or its subsidiaries and may contain proprietary, 
confidential or trade secret information. This message is intended solely for 
the use of the addressee. If you are not the intended recipient and have 
received this message in error, please delete this message from your system. 
Any unauthorized reading, distribution, copying, or other use of this message 
or its attachments is strictly prohibited.


[jira] [Closed] (GOBBLIN-996) Add support for managed Helix clusters for Gobblin-on-Yarn applications

2019-12-05 Thread Sudarshan Vasudevan (Jira)


 [ 
https://issues.apache.org/jira/browse/GOBBLIN-996?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sudarshan Vasudevan closed GOBBLIN-996.
---

> Add support for managed Helix clusters for Gobblin-on-Yarn applications
> ---
>
>     Key: GOBBLIN-996
> URL: https://issues.apache.org/jira/browse/GOBBLIN-996
> Project: Apache Gobblin
>  Issue Type: Improvement
>  Components: gobblin-cluster
>Affects Versions: 0.15.0
>Reporter: Sudarshan Vasudevan
>Assignee: Hung Tran
>Priority: Major
> Fix For: 0.15.0
>
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> For managed Helix clusters, the GobblinApplicationMaster should skip 
> connecting to the cluster as CONTROLLER and we should skip cluster creation 
> from GobblinYarnApplicationLauncher.
>  
> This PR also bumps up Helix dependencies to 0.9.x version. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (GOBBLIN-996) Add support for managed Helix clusters for Gobblin-on-Yarn applications

2019-12-05 Thread Sudarshan Vasudevan (Jira)


 [ 
https://issues.apache.org/jira/browse/GOBBLIN-996?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sudarshan Vasudevan resolved GOBBLIN-996.
-
Resolution: Fixed

> Add support for managed Helix clusters for Gobblin-on-Yarn applications
> ---
>
>     Key: GOBBLIN-996
> URL: https://issues.apache.org/jira/browse/GOBBLIN-996
> Project: Apache Gobblin
>  Issue Type: Improvement
>  Components: gobblin-cluster
>Affects Versions: 0.15.0
>Reporter: Sudarshan Vasudevan
>Assignee: Hung Tran
>Priority: Major
> Fix For: 0.15.0
>
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> For managed Helix clusters, the GobblinApplicationMaster should skip 
> connecting to the cluster as CONTROLLER and we should skip cluster creation 
> from GobblinYarnApplicationLauncher.
>  
> This PR also bumps up Helix dependencies to 0.9.x version. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (GOBBLIN-996) Add support for managed Helix clusters for Gobblin-on-Yarn applications

2019-12-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/GOBBLIN-996?focusedWorklogId=354756=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-354756
 ]

ASF GitHub Bot logged work on GOBBLIN-996:
--

Author: ASF GitHub Bot
Created on: 05/Dec/19 22:42
Start Date: 05/Dec/19 22:42
Worklog Time Spent: 10m 
  Work Description: asfgit commented on pull request #2841: GOBBLIN-996: 
Add support for managed Helix clusters for Gobblin-on-Ya…
URL: https://github.com/apache/incubator-gobblin/pull/2841
 
 
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 354756)
Time Spent: 50m  (was: 40m)

> Add support for managed Helix clusters for Gobblin-on-Yarn applications
> ---
>
>     Key: GOBBLIN-996
> URL: https://issues.apache.org/jira/browse/GOBBLIN-996
> Project: Apache Gobblin
>  Issue Type: Improvement
>  Components: gobblin-cluster
>Affects Versions: 0.15.0
>Reporter: Sudarshan Vasudevan
>Assignee: Hung Tran
>Priority: Major
> Fix For: 0.15.0
>
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> For managed Helix clusters, the GobblinApplicationMaster should skip 
> connecting to the cluster as CONTROLLER and we should skip cluster creation 
> from GobblinYarnApplicationLauncher.
>  
> This PR also bumps up Helix dependencies to 0.9.x version. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (GOBBLIN-996) Add support for managed Helix clusters for Gobblin-on-Yarn applications

2019-12-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/GOBBLIN-996?focusedWorklogId=354739=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-354739
 ]

ASF GitHub Bot logged work on GOBBLIN-996:
--

Author: ASF GitHub Bot
Created on: 05/Dec/19 22:18
Start Date: 05/Dec/19 22:18
Worklog Time Spent: 10m 
  Work Description: codecov-io commented on issue #2841: GOBBLIN-996: Add 
support for managed Helix clusters for Gobblin-on-Ya…
URL: 
https://github.com/apache/incubator-gobblin/pull/2841#issuecomment-562340792
 
 
   # 
[Codecov](https://codecov.io/gh/apache/incubator-gobblin/pull/2841?src=pr=h1)
 Report
   > Merging 
[#2841](https://codecov.io/gh/apache/incubator-gobblin/pull/2841?src=pr=desc)
 into 
[master](https://codecov.io/gh/apache/incubator-gobblin/commit/1092a891af7eda5732c948695b92e34540751613?src=pr=desc)
 will **increase** coverage by `0.01%`.
   > The diff coverage is `11.11%`.
   
   [![Impacted file tree 
graph](https://codecov.io/gh/apache/incubator-gobblin/pull/2841/graphs/tree.svg?width=650=4MgURJ0bGc=150=pr)](https://codecov.io/gh/apache/incubator-gobblin/pull/2841?src=pr=tree)
   
   ```diff
   @@ Coverage Diff  @@
   ## master#2841  +/-   ##
   
   + Coverage 45.66%   45.68%   +0.01% 
   - Complexity 8988 8991   +3 
   
 Files  1903 1903  
 Lines 7132771331   +4 
 Branches   7871 7873   +2 
   
   + Hits  3257232584  +12 
   + Misses3575435748   -6 
   + Partials   3001 2999   -2
   ```
   
   
   | [Impacted 
Files](https://codecov.io/gh/apache/incubator-gobblin/pull/2841?src=pr=tree) 
| Coverage Δ | Complexity Δ | |
   |---|---|---|---|
   | 
[...bblin/cluster/GobblinClusterConfigurationKeys.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2841/diff?src=pr=tree#diff-Z29iYmxpbi1jbHVzdGVyL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9nb2JibGluL2NsdXN0ZXIvR29iYmxpbkNsdXN0ZXJDb25maWd1cmF0aW9uS2V5cy5qYXZh)
 | `0% <ø> (ø)` | `0 <0> (ø)` | :arrow_down: |
   | 
[...rg/apache/gobblin/yarn/GobblinYarnAppLauncher.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2841/diff?src=pr=tree#diff-Z29iYmxpbi15YXJuL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9nb2JibGluL3lhcm4vR29iYmxpbllhcm5BcHBMYXVuY2hlci5qYXZh)
 | `19.94% <0%> (-0.11%)` | `7 <0> (ø)` | |
   | 
[...ache/gobblin/cluster/GobblinHelixMultiManager.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2841/diff?src=pr=tree#diff-Z29iYmxpbi1jbHVzdGVyL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9nb2JibGluL2NsdXN0ZXIvR29iYmxpbkhlbGl4TXVsdGlNYW5hZ2VyLmphdmE=)
 | `54.16% <100%> (-0.29%)` | `19 <0> (ø)` | |
   | 
[...apache/gobblin/salesforce/SalesforceExtractor.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2841/diff?src=pr=tree#diff-Z29iYmxpbi1zYWxlc2ZvcmNlL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9nb2JibGluL3NhbGVzZm9yY2UvU2FsZXNmb3JjZUV4dHJhY3Rvci5qYXZh)
 | `0% <0%> (ø)` | `0% <0%> (ø)` | :arrow_down: |
   | 
[...lin/restli/throttling/ZookeeperLeaderElection.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2841/diff?src=pr=tree#diff-Z29iYmxpbi1yZXN0bGkvZ29iYmxpbi10aHJvdHRsaW5nLXNlcnZpY2UvZ29iYmxpbi10aHJvdHRsaW5nLXNlcnZpY2Utc2VydmVyL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9nb2JibGluL3Jlc3RsaS90aHJvdHRsaW5nL1pvb2tlZXBlckxlYWRlckVsZWN0aW9uLmphdmE=)
 | `72.22% <0%> (+2.22%)` | `13% <0%> (ø)` | :arrow_down: |
   | 
[.../limiter/RedirectAwareRestClientRequestSender.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2841/diff?src=pr=tree#diff-Z29iYmxpbi1yZXN0bGkvZ29iYmxpbi10aHJvdHRsaW5nLXNlcnZpY2UvZ29iYmxpbi10aHJvdHRsaW5nLXNlcnZpY2UtY2xpZW50L3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9nb2JibGluL3V0aWwvbGltaXRlci9SZWRpcmVjdEF3YXJlUmVzdENsaWVudFJlcXVlc3RTZW5kZXIuamF2YQ==)
 | `87.2% <0%> (+2.32%)` | `18% <0%> (+1%)` | :arrow_up: |
   | 
[...he/gobblin/writer/FineGrainedWatermarkTracker.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2841/diff?src=pr=tree#diff-Z29iYmxpbi1jb3JlLWJhc2Uvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2dvYmJsaW4vd3JpdGVyL0ZpbmVHcmFpbmVkV2F0ZXJtYXJrVHJhY2tlci5qYXZh)
 | `84.67% <0%> (+2.41%)` | `29% <0%> (+1%)` | :arrow_up: |
   | 
[...in/java/org/apache/gobblin/cluster/HelixUtils.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2841/diff?src=pr=tree#diff-Z29iYmxpbi1jbHVzdGVyL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9nb2JibGluL2NsdXN0ZXIvSGVsaXhVdGlscy5qYXZh)
 | `39.25% <0%> (+3.73%)` | `13% <0%> (+1%)` | :arrow_up: |
   | 
[...lin/util/filesystem/FileSystemInstrumentation.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2841/diff?src=pr=tree#diff-Z29iYmxpbi11dGlsaXR5L3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9nb2JibGluL

[jira] [Work logged] (GOBBLIN-996) Add support for managed Helix clusters for Gobblin-on-Yarn applications

2019-12-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/GOBBLIN-996?focusedWorklogId=354734=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-354734
 ]

ASF GitHub Bot logged work on GOBBLIN-996:
--

Author: ASF GitHub Bot
Created on: 05/Dec/19 22:08
Start Date: 05/Dec/19 22:08
Worklog Time Spent: 10m 
  Work Description: codecov-io commented on issue #2841: GOBBLIN-996: Add 
support for managed Helix clusters for Gobblin-on-Ya…
URL: 
https://github.com/apache/incubator-gobblin/pull/2841#issuecomment-562340792
 
 
   # 
[Codecov](https://codecov.io/gh/apache/incubator-gobblin/pull/2841?src=pr=h1)
 Report
   > Merging 
[#2841](https://codecov.io/gh/apache/incubator-gobblin/pull/2841?src=pr=desc)
 into 
[master](https://codecov.io/gh/apache/incubator-gobblin/commit/1092a891af7eda5732c948695b92e34540751613?src=pr=desc)
 will **decrease** coverage by `41.51%`.
   > The diff coverage is `0%`.
   
   [![Impacted file tree 
graph](https://codecov.io/gh/apache/incubator-gobblin/pull/2841/graphs/tree.svg?width=650=4MgURJ0bGc=150=pr)](https://codecov.io/gh/apache/incubator-gobblin/pull/2841?src=pr=tree)
   
   ```diff
   @@ Coverage Diff  @@
   ## master   #2841   +/-   ##
   
   - Coverage 45.66%   4.14%   -41.52% 
   + Complexity 8988 747 -8241 
   
 Files  19031903   
 Lines 71327   71331+4 
 Branches   78717873+2 
   
   - Hits  325722960-29612 
   - Misses35754   68052+32298 
   + Partials   3001 319 -2682
   ```
   
   
   | [Impacted 
Files](https://codecov.io/gh/apache/incubator-gobblin/pull/2841?src=pr=tree) 
| Coverage Δ | Complexity Δ | |
   |---|---|---|---|
   | 
[...bblin/cluster/GobblinClusterConfigurationKeys.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2841/diff?src=pr=tree#diff-Z29iYmxpbi1jbHVzdGVyL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9nb2JibGluL2NsdXN0ZXIvR29iYmxpbkNsdXN0ZXJDb25maWd1cmF0aW9uS2V5cy5qYXZh)
 | `0% <ø> (ø)` | `0 <0> (ø)` | :arrow_down: |
   | 
[...ache/gobblin/cluster/GobblinHelixMultiManager.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2841/diff?src=pr=tree#diff-Z29iYmxpbi1jbHVzdGVyL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9nb2JibGluL2NsdXN0ZXIvR29iYmxpbkhlbGl4TXVsdGlNYW5hZ2VyLmphdmE=)
 | `0% <0%> (-54.46%)` | `0 <0> (-19)` | |
   | 
[...rg/apache/gobblin/yarn/GobblinYarnAppLauncher.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2841/diff?src=pr=tree#diff-Z29iYmxpbi15YXJuL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9nb2JibGluL3lhcm4vR29iYmxpbllhcm5BcHBMYXVuY2hlci5qYXZh)
 | `0% <0%> (-20.06%)` | `0 <0> (-7)` | |
   | 
[...n/converter/AvroStringFieldDecryptorConverter.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2841/diff?src=pr=tree#diff-Z29iYmxpbi1tb2R1bGVzL2dvYmJsaW4tY3J5cHRvLXByb3ZpZGVyL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9nb2JibGluL2NvbnZlcnRlci9BdnJvU3RyaW5nRmllbGREZWNyeXB0b3JDb252ZXJ0ZXIuamF2YQ==)
 | `0% <0%> (-100%)` | `0% <0%> (-2%)` | |
   | 
[...he/gobblin/cluster/TaskRunnerSuiteThreadModel.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2841/diff?src=pr=tree#diff-Z29iYmxpbi1jbHVzdGVyL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9nb2JibGluL2NsdXN0ZXIvVGFza1J1bm5lclN1aXRlVGhyZWFkTW9kZWwuamF2YQ==)
 | `0% <0%> (-100%)` | `0% <0%> (-5%)` | |
   | 
[...n/mapreduce/avro/AvroKeyCompactorOutputFormat.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2841/diff?src=pr=tree#diff-Z29iYmxpbi1jb21wYWN0aW9uL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9nb2JibGluL2NvbXBhY3Rpb24vbWFwcmVkdWNlL2F2cm8vQXZyb0tleUNvbXBhY3Rvck91dHB1dEZvcm1hdC5qYXZh)
 | `0% <0%> (-100%)` | `0% <0%> (-3%)` | |
   | 
[...apache/gobblin/fork/CopyNotSupportedException.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2841/diff?src=pr=tree#diff-Z29iYmxpbi1hcGkvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2dvYmJsaW4vZm9yay9Db3B5Tm90U3VwcG9ydGVkRXhjZXB0aW9uLmphdmE=)
 | `0% <0%> (-100%)` | `0% <0%> (-1%)` | |
   | 
[.../gobblin/kafka/writer/KafkaWriterCommonConfig.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2841/diff?src=pr=tree#diff-Z29iYmxpbi1tb2R1bGVzL2dvYmJsaW4ta2Fma2EtY29tbW9uL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9nb2JibGluL2thZmthL3dyaXRlci9LYWZrYVdyaXRlckNvbW1vbkNvbmZpZy5qYXZh)
 | `0% <0%> (-100%)` | `0% <0%> (-7%)` | |
   | 
[...ker/task/TaskLevelPolicyCheckerBuilderFactory.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2841/diff?src=pr=tree#diff-Z29iYmxpbi1jb3JlL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9nb2JibGluL3F1YWxpdHljaGVja2VyL3Rhc2svVGFza0xldmVsUG9saWN5Q2hlY2tlckJ1aWxkZXJGYWN0b3J5LmphdmE=)
 | `0% <0%> (-100%)` | `0% <0%> (-2%)` | |
   | 
[...bblin/data/

[jira] [Work logged] (GOBBLIN-996) Add support for managed Helix clusters for Gobblin-on-Yarn applications

2019-12-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/GOBBLIN-996?focusedWorklogId=354469=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-354469
 ]

ASF GitHub Bot logged work on GOBBLIN-996:
--

Author: ASF GitHub Bot
Created on: 05/Dec/19 17:04
Start Date: 05/Dec/19 17:04
Worklog Time Spent: 10m 
  Work Description: sv2000 commented on issue #2841: GOBBLIN-996: Add 
support for managed Helix clusters for Gobblin-on-Ya…
URL: 
https://github.com/apache/incubator-gobblin/pull/2841#issuecomment-562220963
 
 
   @htran1 @autumnust please review this PR.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 354469)
Time Spent: 20m  (was: 10m)

> Add support for managed Helix clusters for Gobblin-on-Yarn applications
> ---
>
>     Key: GOBBLIN-996
> URL: https://issues.apache.org/jira/browse/GOBBLIN-996
> Project: Apache Gobblin
>  Issue Type: Improvement
>  Components: gobblin-cluster
>Affects Versions: 0.15.0
>Reporter: Sudarshan Vasudevan
>Assignee: Hung Tran
>Priority: Major
> Fix For: 0.15.0
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> For managed Helix clusters, the GobblinApplicationMaster should skip 
> connecting to the cluster as CONTROLLER and we should skip cluster creation 
> from GobblinYarnApplicationLauncher.
>  
> This PR also bumps up Helix dependencies to 0.9.x version. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (GOBBLIN-996) Add support for managed Helix clusters for Gobblin-on-Yarn applications

2019-12-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/GOBBLIN-996?focusedWorklogId=354467=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-354467
 ]

ASF GitHub Bot logged work on GOBBLIN-996:
--

Author: ASF GitHub Bot
Created on: 05/Dec/19 17:04
Start Date: 05/Dec/19 17:04
Worklog Time Spent: 10m 
  Work Description: sv2000 commented on pull request #2841: GOBBLIN-996: 
Add support for managed Helix clusters for Gobblin-on-Ya…
URL: https://github.com/apache/incubator-gobblin/pull/2841
 
 
   …rn applications
   
   Dear Gobblin maintainers,
   
   Please accept this PR. I understand that it will not be reviewed until I 
have checked off all the steps below!
   
   
   ### JIRA
   - [x] My PR addresses the following [Gobblin 
JIRA](https://issues.apache.org/jira/browse/GOBBLIN/) issues and references 
them in the PR title. For example, "[GOBBLIN-XXX] My Gobblin PR"
   - https://issues.apache.org/jira/browse/GOBBLIN-996
   
   
   ### Description
   - [x] Here are some details about my PR, including screenshots (if 
applicable):
   Add support for managed Helix clusters for Gobblin-on-Yarn applications
   
   ### Tests
   - [x] My PR adds the following unit tests __OR__ does not need testing for 
this extremely good reason:
   End-to-end test.
   
   ### Commits
   - [x] My commits all reference JIRA issues in their subject lines, and I 
have squashed multiple commits if they address the same issue. In addition, my 
commits follow the guidelines from "[How to write a good git commit 
message](http://chris.beams.io/posts/git-commit/)":
   1. Subject is separated from body by a blank line
   2. Subject is limited to 50 characters
   3. Subject does not end with a period
   4. Subject uses the imperative mood ("add", not "adding")
   5. Body wraps at 72 characters
   6. Body explains "what" and "why", not "how"
   
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 354467)
Remaining Estimate: 0h
    Time Spent: 10m

> Add support for managed Helix clusters for Gobblin-on-Yarn applications
> -------
>
> Key: GOBBLIN-996
> URL: https://issues.apache.org/jira/browse/GOBBLIN-996
> Project: Apache Gobblin
>  Issue Type: Improvement
>  Components: gobblin-cluster
>Affects Versions: 0.15.0
>Reporter: Sudarshan Vasudevan
>Assignee: Hung Tran
>Priority: Major
> Fix For: 0.15.0
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> For managed Helix clusters, the GobblinApplicationMaster should skip 
> connecting to the cluster as CONTROLLER and we should skip cluster creation 
> from GobblinYarnApplicationLauncher.
>  
> This PR also bumps up Helix dependencies to 0.9.x version. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (GOBBLIN-996) Add support for managed Helix clusters for Gobblin-on-Yarn applications

2019-12-05 Thread Sudarshan Vasudevan (Jira)
Sudarshan Vasudevan created GOBBLIN-996:
---

 Summary: Add support for managed Helix clusters for 
Gobblin-on-Yarn applications
 Key: GOBBLIN-996
 URL: https://issues.apache.org/jira/browse/GOBBLIN-996
 Project: Apache Gobblin
  Issue Type: Improvement
  Components: gobblin-cluster
Affects Versions: 0.15.0
Reporter: Sudarshan Vasudevan
Assignee: Hung Tran
 Fix For: 0.15.0


For managed Helix clusters, the GobblinApplicationMaster should skip connecting 
to the cluster as CONTROLLER and we should skip cluster creation from 
GobblinYarnApplicationLauncher.

 

This PR also bumps up Helix dependencies to 0.9.x version. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (GOBBLIN-836) Expose container logs location via system property to be used in log4j configuration for Gobblin-on-Yarn applications

2019-07-26 Thread Sudarshan Vasudevan (JIRA)


 [ 
https://issues.apache.org/jira/browse/GOBBLIN-836?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sudarshan Vasudevan resolved GOBBLIN-836.
-
Resolution: Fixed

> Expose container logs location via system property to be used in log4j 
> configuration for Gobblin-on-Yarn applications
> -
>
>         Key: GOBBLIN-836
> URL: https://issues.apache.org/jira/browse/GOBBLIN-836
> Project: Apache Gobblin
>  Issue Type: Improvement
>      Components: gobblin-yarn
>Affects Versions: 0.15.0
>Reporter: Sudarshan Vasudevan
>Assignee: Abhishek Tiwari
>Priority: Major
> Fix For: 0.15.0
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> This task exposes container log directory and container log file name via 
> Java System properties, so that they can be referenced in log4j properties. 
> This in turn allows configuring log rotation of container logs based on size 
> and maximum number of back ups.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Work logged] (GOBBLIN-836) Expose container logs location via system property to be used in log4j configuration for Gobblin-on-Yarn applications

2019-07-26 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/GOBBLIN-836?focusedWorklogId=283724=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-283724
 ]

ASF GitHub Bot logged work on GOBBLIN-836:
--

Author: ASF GitHub Bot
Created on: 27/Jul/19 04:19
Start Date: 27/Jul/19 04:19
Worklog Time Spent: 10m 
  Work Description: asfgit commented on pull request #2694: GOBBLIN-836: 
Expose container logs location via system property to be…
URL: https://github.com/apache/incubator-gobblin/pull/2694
 
 
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 283724)
Time Spent: 0.5h  (was: 20m)

> Expose container logs location via system property to be used in log4j 
> configuration for Gobblin-on-Yarn applications
> -
>
>         Key: GOBBLIN-836
> URL: https://issues.apache.org/jira/browse/GOBBLIN-836
> Project: Apache Gobblin
>  Issue Type: Improvement
>      Components: gobblin-yarn
>Affects Versions: 0.15.0
>Reporter: Sudarshan Vasudevan
>Assignee: Abhishek Tiwari
>Priority: Major
> Fix For: 0.15.0
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> This task exposes container log directory and container log file name via 
> Java System properties, so that they can be referenced in log4j properties. 
> This in turn allows configuring log rotation of container logs based on size 
> and maximum number of back ups.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Closed] (GOBBLIN-836) Expose container logs location via system property to be used in log4j configuration for Gobblin-on-Yarn applications

2019-07-26 Thread Sudarshan Vasudevan (JIRA)


 [ 
https://issues.apache.org/jira/browse/GOBBLIN-836?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sudarshan Vasudevan closed GOBBLIN-836.
---

> Expose container logs location via system property to be used in log4j 
> configuration for Gobblin-on-Yarn applications
> -
>
>         Key: GOBBLIN-836
> URL: https://issues.apache.org/jira/browse/GOBBLIN-836
> Project: Apache Gobblin
>  Issue Type: Improvement
>      Components: gobblin-yarn
>Affects Versions: 0.15.0
>Reporter: Sudarshan Vasudevan
>Assignee: Abhishek Tiwari
>Priority: Major
> Fix For: 0.15.0
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> This task exposes container log directory and container log file name via 
> Java System properties, so that they can be referenced in log4j properties. 
> This in turn allows configuring log rotation of container logs based on size 
> and maximum number of back ups.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Work logged] (GOBBLIN-836) Expose container logs location via system property to be used in log4j configuration for Gobblin-on-Yarn applications

2019-07-26 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/GOBBLIN-836?focusedWorklogId=283680=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-283680
 ]

ASF GitHub Bot logged work on GOBBLIN-836:
--

Author: ASF GitHub Bot
Created on: 26/Jul/19 23:16
Start Date: 26/Jul/19 23:16
Worklog Time Spent: 10m 
  Work Description: codecov-io commented on issue #2694: GOBBLIN-836: 
Expose container logs location via system property to be…
URL: 
https://github.com/apache/incubator-gobblin/pull/2694#issuecomment-515626110
 
 
   # 
[Codecov](https://codecov.io/gh/apache/incubator-gobblin/pull/2694?src=pr=h1)
 Report
   > Merging 
[#2694](https://codecov.io/gh/apache/incubator-gobblin/pull/2694?src=pr=desc)
 into 
[master](https://codecov.io/gh/apache/incubator-gobblin/commit/7a74579c05eb5c0bf8965bfd96c3c1036c529ffa?src=pr=desc)
 will **increase** coverage by `0.04%`.
   > The diff coverage is `50%`.
   
   [![Impacted file tree 
graph](https://codecov.io/gh/apache/incubator-gobblin/pull/2694/graphs/tree.svg?width=650=4MgURJ0bGc=150=pr)](https://codecov.io/gh/apache/incubator-gobblin/pull/2694?src=pr=tree)
   
   ```diff
   @@ Coverage Diff  @@
   ## master#2694  +/-   ##
   
   + Coverage 44.79%   44.84%   +0.04% 
   - Complexity 8686 8691   +5 
   
 Files  1878 1878  
 Lines 7005470058   +4 
 Branches   7701 7701  
   
   + Hits  3138431417  +33 
   + Misses3576435738  -26 
   + Partials   2906 2903   -3
   ```
   
   
   | [Impacted 
Files](https://codecov.io/gh/apache/incubator-gobblin/pull/2694?src=pr=tree) 
| Coverage Δ | Complexity Δ | |
   |---|---|---|---|
   | 
[...che/gobblin/yarn/GobblinYarnConfigurationKeys.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2694/diff?src=pr=tree#diff-Z29iYmxpbi15YXJuL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9nb2JibGluL3lhcm4vR29iYmxpbllhcm5Db25maWd1cmF0aW9uS2V5cy5qYXZh)
 | `0% <ø> (ø)` | `0 <0> (ø)` | :arrow_down: |
   | 
[...main/java/org/apache/gobblin/yarn/YarnService.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2694/diff?src=pr=tree#diff-Z29iYmxpbi15YXJuL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9nb2JibGluL3lhcm4vWWFyblNlcnZpY2UuamF2YQ==)
 | `14.64% <0%> (-0.09%)` | `3 <0> (ø)` | |
   | 
[...rg/apache/gobblin/yarn/GobblinYarnAppLauncher.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2694/diff?src=pr=tree#diff-Z29iYmxpbi15YXJuL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9nb2JibGluL3lhcm4vR29iYmxpbllhcm5BcHBMYXVuY2hlci5qYXZh)
 | `19.89% <100%> (+0.42%)` | `7 <0> (ø)` | :arrow_down: |
   | 
[...lin/restli/throttling/ZookeeperLeaderElection.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2694/diff?src=pr=tree#diff-Z29iYmxpbi1yZXN0bGkvZ29iYmxpbi10aHJvdHRsaW5nLXNlcnZpY2UvZ29iYmxpbi10aHJvdHRsaW5nLXNlcnZpY2Utc2VydmVyL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9nb2JibGluL3Jlc3RsaS90aHJvdHRsaW5nL1pvb2tlZXBlckxlYWRlckVsZWN0aW9uLmphdmE=)
 | `70% <0%> (-2.23%)` | `13% <0%> (ø)` | |
   | 
[...in/java/org/apache/gobblin/cluster/HelixUtils.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2694/diff?src=pr=tree#diff-Z29iYmxpbi1jbHVzdGVyL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9nb2JibGluL2NsdXN0ZXIvSGVsaXhVdGlscy5qYXZh)
 | `41.83% <0%> (ø)` | `14% <0%> (+1%)` | :arrow_up: |
   | 
[.../org/apache/gobblin/cluster/GobblinTaskRunner.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2694/diff?src=pr=tree#diff-Z29iYmxpbi1jbHVzdGVyL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9nb2JibGluL2NsdXN0ZXIvR29iYmxpblRhc2tSdW5uZXIuamF2YQ==)
 | `66.19% <0%> (+0.46%)` | `29% <0%> (ø)` | :arrow_down: |
   | 
[.../apache/gobblin/runtime/api/JobExecutionState.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2694/diff?src=pr=tree#diff-Z29iYmxpbi1ydW50aW1lL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9nb2JibGluL3J1bnRpbWUvYXBpL0pvYkV4ZWN1dGlvblN0YXRlLmphdmE=)
 | `80.37% <0%> (+0.93%)` | `24% <0%> (ø)` | :arrow_down: |
   | 
[...ache/gobblin/cluster/GobblinHelixJobScheduler.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2694/diff?src=pr=tree#diff-Z29iYmxpbi1jbHVzdGVyL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9nb2JibGluL2NsdXN0ZXIvR29iYmxpbkhlbGl4Sm9iU2NoZWR1bGVyLmphdmE=)
 | `40.52% <0%> (+1.3%)` | `6% <0%> (ø)` | :arrow_down: |
   | 
[...pache/gobblin/cluster/GobblinHelixJobLauncher.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2694/diff?src=pr=tree#diff-Z29iYmxpbi1jbHVzdGVyL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9nb2JibGluL2NsdXN0ZXIvR29iYmxpbkhlbGl4Sm9iTGF1bmNoZXIuamF2YQ==)
 | `83.33% <0%> (+2.08%)` | `28% <0%> (+2%)` | :arrow_up: |
   | 
[.../gobblin/cluster

[jira] [Work logged] (GOBBLIN-836) Expose container logs location via system property to be used in log4j configuration for Gobblin-on-Yarn applications

2019-07-26 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/GOBBLIN-836?focusedWorklogId=283666=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-283666
 ]

ASF GitHub Bot logged work on GOBBLIN-836:
--

Author: ASF GitHub Bot
Created on: 26/Jul/19 22:42
Start Date: 26/Jul/19 22:42
Worklog Time Spent: 10m 
  Work Description: sv2000 commented on pull request #2694: GOBBLIN-836: 
Expose container logs location via system property to be…
URL: https://github.com/apache/incubator-gobblin/pull/2694
 
 
   … used in log4j configuration for Gobblin-on-Yarn applications
   
   Dear Gobblin maintainers,
   
   Please accept this PR. I understand that it will not be reviewed until I 
have checked off all the steps below!
   
   
   ### JIRA
   - [x] My PR addresses the following [Gobblin 
JIRA](https://issues.apache.org/jira/browse/GOBBLIN/) issues and references 
them in the PR title. For example, "[GOBBLIN-XXX] My Gobblin PR"
   - https://issues.apache.org/jira/browse/GOBBLIN-836
   Expose container logs location via system property to be used in log4j 
configuration for Gobblin-on-Yarn applications
   
   ### Description
   - [x] Here are some details about my PR, including screenshots (if 
applicable):
   This task exposes container log directory and container log file name via 
Java System properties, so that they can be referenced in log4j properties. 
This in turn allows configuring log rotation of container logs based on size 
and maximum number of back ups.
   
   
   
   ### Tests
   - [x] My PR adds the following unit tests __OR__ does not need testing for 
this extremely good reason:
   Tested by deploying a Gobblin-on-Yarn application
   
   ### Commits
   - [x] My commits all reference JIRA issues in their subject lines, and I 
have squashed multiple commits if they address the same issue. In addition, my 
commits follow the guidelines from "[How to write a good git commit 
message](http://chris.beams.io/posts/git-commit/)":
   1. Subject is separated from body by a blank line
   2. Subject is limited to 50 characters
   3. Subject does not end with a period
   4. Subject uses the imperative mood ("add", not "adding")
   5. Body wraps at 72 characters
   6. Body explains "what" and "why", not "how"
   
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 283666)
Time Spent: 10m
Remaining Estimate: 0h

> Expose container logs location via system property to be used in log4j 
> configuration for Gobblin-on-Yarn applications
> -----
>
> Key: GOBBLIN-836
> URL: https://issues.apache.org/jira/browse/GOBBLIN-836
>     Project: Apache Gobblin
>  Issue Type: Improvement
>  Components: gobblin-yarn
>Affects Versions: 0.15.0
>Reporter: Sudarshan Vasudevan
>Assignee: Abhishek Tiwari
>Priority: Major
> Fix For: 0.15.0
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> This task exposes container log directory and container log file name via 
> Java System properties, so that they can be referenced in log4j properties. 
> This in turn allows configuring log rotation of container logs based on size 
> and maximum number of back ups.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Created] (GOBBLIN-836) Expose container logs location via system property to be used in log4j configuration for Gobblin-on-Yarn applications

2019-07-26 Thread Sudarshan Vasudevan (JIRA)
Sudarshan Vasudevan created GOBBLIN-836:
---

 Summary: Expose container logs location via system property to be 
used in log4j configuration for Gobblin-on-Yarn applications
 Key: GOBBLIN-836
 URL: https://issues.apache.org/jira/browse/GOBBLIN-836
 Project: Apache Gobblin
  Issue Type: Improvement
  Components: gobblin-yarn
Affects Versions: 0.15.0
Reporter: Sudarshan Vasudevan
Assignee: Abhishek Tiwari
 Fix For: 0.15.0


This task exposes container log directory and container log file name via Java 
System properties, so that they can be referenced in log4j properties. This in 
turn allows configuring log rotation of container logs based on size and 
maximum number of back ups.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Work logged] (GOBBLIN-834) Provide config for setting ACLs to control visibility of Gobblin-on-Yarn application logs

2019-07-26 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/GOBBLIN-834?focusedWorklogId=283517=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-283517
 ]

ASF GitHub Bot logged work on GOBBLIN-834:
--

Author: ASF GitHub Bot
Created on: 26/Jul/19 17:41
Start Date: 26/Jul/19 17:41
Worklog Time Spent: 10m 
  Work Description: asfgit commented on pull request #2693: GOBBLIN-834: 
Provide config for setting ACLs to control visibility of…
URL: https://github.com/apache/incubator-gobblin/pull/2693
 
 
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 283517)
Time Spent: 20m  (was: 10m)

> Provide config for setting ACLs to control visibility of Gobblin-on-Yarn 
> application logs 
> --
>
> Key: GOBBLIN-834
> URL: https://issues.apache.org/jira/browse/GOBBLIN-834
> Project: Apache Gobblin
>  Issue Type: Improvement
>      Components: gobblin-yarn
>Affects Versions: 0.15.0
>Reporter: Sudarshan Vasudevan
>Assignee: Abhishek Tiwari
>Priority: Major
> Fix For: 0.15.0
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> This change provides a config to set ACLs that control the visibility of the 
> AM and container logs for Gobblin-on-Yarn applications.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Resolved] (GOBBLIN-834) Provide config for setting ACLs to control visibility of Gobblin-on-Yarn application logs

2019-07-26 Thread Sudarshan Vasudevan (JIRA)


 [ 
https://issues.apache.org/jira/browse/GOBBLIN-834?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sudarshan Vasudevan resolved GOBBLIN-834.
-
Resolution: Fixed

> Provide config for setting ACLs to control visibility of Gobblin-on-Yarn 
> application logs 
> --
>
> Key: GOBBLIN-834
> URL: https://issues.apache.org/jira/browse/GOBBLIN-834
> Project: Apache Gobblin
>  Issue Type: Improvement
>      Components: gobblin-yarn
>Affects Versions: 0.15.0
>Reporter: Sudarshan Vasudevan
>Assignee: Abhishek Tiwari
>Priority: Major
> Fix For: 0.15.0
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> This change provides a config to set ACLs that control the visibility of the 
> AM and container logs for Gobblin-on-Yarn applications.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Closed] (GOBBLIN-834) Provide config for setting ACLs to control visibility of Gobblin-on-Yarn application logs

2019-07-26 Thread Sudarshan Vasudevan (JIRA)


 [ 
https://issues.apache.org/jira/browse/GOBBLIN-834?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sudarshan Vasudevan closed GOBBLIN-834.
---

> Provide config for setting ACLs to control visibility of Gobblin-on-Yarn 
> application logs 
> --
>
> Key: GOBBLIN-834
> URL: https://issues.apache.org/jira/browse/GOBBLIN-834
> Project: Apache Gobblin
>  Issue Type: Improvement
>      Components: gobblin-yarn
>Affects Versions: 0.15.0
>Reporter: Sudarshan Vasudevan
>Assignee: Abhishek Tiwari
>Priority: Major
> Fix For: 0.15.0
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> This change provides a config to set ACLs that control the visibility of the 
> AM and container logs for Gobblin-on-Yarn applications.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Work logged] (GOBBLIN-834) Provide config for setting ACLs to control visibility of Gobblin-on-Yarn application logs

2019-07-26 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/GOBBLIN-834?focusedWorklogId=283475=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-283475
 ]

ASF GitHub Bot logged work on GOBBLIN-834:
--

Author: ASF GitHub Bot
Created on: 26/Jul/19 16:52
Start Date: 26/Jul/19 16:52
Worklog Time Spent: 10m 
  Work Description: sv2000 commented on pull request #2693: GOBBLIN-834: 
Provide config for setting ACLs to control visibility of…
URL: https://github.com/apache/incubator-gobblin/pull/2693
 
 
   … Gobblin-on-Yarn application logs
   
   Dear Gobblin maintainers,
   
   Please accept this PR. I understand that it will not be reviewed until I 
have checked off all the steps below!
   
   
   ### JIRA
   - [x] My PR addresses the following [Gobblin 
JIRA](https://issues.apache.org/jira/browse/GOBBLIN/) issues and references 
them in the PR title. For example, "[GOBBLIN-XXX] My Gobblin PR"
   - https://issues.apache.org/jira/browse/GOBBLIN-834
   
   
   ### Description
   - [x] Here are some details about my PR, including screenshots (if 
applicable):
   This change provides a config to set ACLs that control the visibility of the 
AM and container logs for Gobblin-on-Yarn applications.
   
   
   ### Tests
   - [x] My PR adds the following unit tests __OR__ does not need testing for 
this extremely good reason:
   Test logs visible via UI by running a test application.
   
   ### Commits
   - [x] My commits all reference JIRA issues in their subject lines, and I 
have squashed multiple commits if they address the same issue. In addition, my 
commits follow the guidelines from "[How to write a good git commit 
message](http://chris.beams.io/posts/git-commit/)":
   1. Subject is separated from body by a blank line
   2. Subject is limited to 50 characters
   3. Subject does not end with a period
   4. Subject uses the imperative mood ("add", not "adding")
   5. Body wraps at 72 characters
   6. Body explains "what" and "why", not "how"
   
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 283475)
Time Spent: 10m
Remaining Estimate: 0h

> Provide config for setting ACLs to control visibility of Gobblin-on-Yarn 
> application logs 
> --
>
> Key: GOBBLIN-834
> URL: https://issues.apache.org/jira/browse/GOBBLIN-834
>     Project: Apache Gobblin
>  Issue Type: Improvement
>  Components: gobblin-yarn
>Affects Versions: 0.15.0
>Reporter: Sudarshan Vasudevan
>Assignee: Abhishek Tiwari
>Priority: Major
> Fix For: 0.15.0
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> This change provides a config to set ACLs that control the visibility of the 
> AM and container logs for Gobblin-on-Yarn applications.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Created] (GOBBLIN-834) Provide config for setting ACLs to control visibility of Gobblin-on-Yarn application logs

2019-07-26 Thread Sudarshan Vasudevan (JIRA)
Sudarshan Vasudevan created GOBBLIN-834:
---

 Summary: Provide config for setting ACLs to control visibility of 
Gobblin-on-Yarn application logs 
 Key: GOBBLIN-834
 URL: https://issues.apache.org/jira/browse/GOBBLIN-834
 Project: Apache Gobblin
  Issue Type: Improvement
  Components: gobblin-yarn
Affects Versions: 0.15.0
Reporter: Sudarshan Vasudevan
Assignee: Abhishek Tiwari
 Fix For: 0.15.0


This change provides a config to set ACLs that control the visibility of the AM 
and container logs for Gobblin-on-Yarn applications.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Resolved] (GOBBLIN-762) Add automatic scaling for Gobblin on YARN

2019-05-14 Thread Hung Tran (JIRA)


 [ 
https://issues.apache.org/jira/browse/GOBBLIN-762?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hung Tran resolved GOBBLIN-762.
---
   Resolution: Fixed
Fix Version/s: 0.15.0

Issue resolved by pull request #2626
[https://github.com/apache/incubator-gobblin/pull/2626]

> Add automatic scaling for Gobblin on YARN
> -
>
>     Key: GOBBLIN-762
> URL: https://issues.apache.org/jira/browse/GOBBLIN-762
> Project: Apache Gobblin
>  Issue Type: Task
>Reporter: Hung Tran
>Priority: Major
> Fix For: 0.15.0
>
>  Time Spent: 3.5h
>  Remaining Estimate: 0h
>
> Gobblin on YARN needs a way to scale up and down the containers based on the 
> workload.
> Added `YarnAutoScalingManager` which can be started by the 
> `GobblinApplicationMaster` by setting the 
> `gobblin.yarn.app.master.serviceClasses` configuration. This class runs a 
> scheduled task with a default interval of 60 seconds to detect the number of 
> required partitions for the workflows submitted to Helix. It will request the 
> `YarnService` to scale to a computed number of containers. If the requested 
> number of containers is higher than the YarnService has previously requested 
> then it will request more containers. If the requested count is less than the 
> current number of allocated containers then it will free any unused 
> containers.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (GOBBLIN-762) Add automatic scaling for Gobblin on YARN

2019-05-14 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/GOBBLIN-762?focusedWorklogId=241822=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-241822
 ]

ASF GitHub Bot logged work on GOBBLIN-762:
--

Author: ASF GitHub Bot
Created on: 14/May/19 16:06
Start Date: 14/May/19 16:06
Worklog Time Spent: 10m 
  Work Description: asfgit commented on pull request #2626: [GOBBLIN-762] 
Add automatic scaling for Gobblin on YARN
URL: https://github.com/apache/incubator-gobblin/pull/2626
 
 
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 241822)
Time Spent: 3.5h  (was: 3h 20m)

> Add automatic scaling for Gobblin on YARN
> -
>
>     Key: GOBBLIN-762
> URL: https://issues.apache.org/jira/browse/GOBBLIN-762
> Project: Apache Gobblin
>  Issue Type: Task
>Reporter: Hung Tran
>Priority: Major
> Fix For: 0.15.0
>
>  Time Spent: 3.5h
>  Remaining Estimate: 0h
>
> Gobblin on YARN needs a way to scale up and down the containers based on the 
> workload.
> Added `YarnAutoScalingManager` which can be started by the 
> `GobblinApplicationMaster` by setting the 
> `gobblin.yarn.app.master.serviceClasses` configuration. This class runs a 
> scheduled task with a default interval of 60 seconds to detect the number of 
> required partitions for the workflows submitted to Helix. It will request the 
> `YarnService` to scale to a computed number of containers. If the requested 
> number of containers is higher than the YarnService has previously requested 
> then it will request more containers. If the requested count is less than the 
> current number of allocated containers then it will free any unused 
> containers.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[GitHub] [incubator-gobblin] asfgit closed pull request #2626: [GOBBLIN-762] Add automatic scaling for Gobblin on YARN

2019-05-14 Thread GitBox
asfgit closed pull request #2626: [GOBBLIN-762] Add automatic scaling for 
Gobblin on YARN
URL: https://github.com/apache/incubator-gobblin/pull/2626
 
 
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[jira] [Work logged] (GOBBLIN-762) Add automatic scaling for Gobblin on YARN

2019-05-09 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/GOBBLIN-762?focusedWorklogId=239976=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-239976
 ]

ASF GitHub Bot logged work on GOBBLIN-762:
--

Author: ASF GitHub Bot
Created on: 09/May/19 23:11
Start Date: 09/May/19 23:11
Worklog Time Spent: 10m 
  Work Description: jhsenjaliya commented on pull request #2626: 
[GOBBLIN-762] Add automatic scaling for Gobblin on YARN
URL: https://github.com/apache/incubator-gobblin/pull/2626#discussion_r282698677
 
 

 ##
 File path: 
gobblin-yarn/src/main/java/org/apache/gobblin/yarn/YarnAutoScalingManager.java
 ##
 @@ -0,0 +1,158 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.gobblin.yarn;
+
+import java.util.HashSet;
+import java.util.Map;
+import java.util.Set;
+import java.util.concurrent.Executors;
+import java.util.concurrent.ScheduledExecutorService;
+import java.util.concurrent.TimeUnit;
+import java.util.stream.Collectors;
+
+import org.apache.helix.HelixManager;
+import org.apache.helix.task.JobContext;
+import org.apache.helix.task.JobDag;
+import org.apache.helix.task.TaskDriver;
+import org.apache.helix.task.TaskState;
+import org.apache.helix.task.WorkflowConfig;
+import org.apache.helix.task.WorkflowContext;
+
+import com.google.common.annotations.VisibleForTesting;
+import com.google.common.base.Optional;
+import com.google.common.base.Preconditions;
+import com.google.common.util.concurrent.AbstractIdleService;
+import com.typesafe.config.Config;
+
+import lombok.AllArgsConstructor;
+import lombok.extern.slf4j.Slf4j;
+
+import org.apache.gobblin.util.ConfigUtils;
+import org.apache.gobblin.util.ExecutorsUtils;
+
+
+/**
+ * The autoscaling manager is responsible for figuring out how many containers 
are required for the workload and
+ * requesting the {@link YarnService} to request that many containers.
+ */
+@Slf4j
+public class YarnAutoScalingManager extends AbstractIdleService {
+  private final String AUTO_SCALING_PREFIX = 
GobblinYarnConfigurationKeys.GOBBLIN_YARN_PREFIX + "autoScaling.";
+  private final String AUTO_SCALING_POLLING_INTERVAL_SECS =
+  AUTO_SCALING_PREFIX + "pollingIntervalSeconds";
+  private final int DEFAULT_AUTO_SCALING_POLLING_INTERVAL_SECS = 60;
+  // Only one container will be requested for each N partitions of work
+  private final String AUTO_SCALING_PARTITIONS_PER_CONTAINER = 
AUTO_SCALING_PREFIX + "partitionsPerContainer";
+  private final int DEFAULT_AUTO_SCALING_PARTITIONS_PER_CONTAINER = 1;
+
+  private final Config config;
+  private final HelixManager helixManager;
+  private final ScheduledExecutorService autoScalingExecutor;
+  private final YarnService yarnService;
+  private final int partitionsPerContainer;
+
+  public YarnAutoScalingManager(GobblinApplicationMaster appMaster) {
+this.config = appMaster.getConfig();
+this.helixManager = 
appMaster.getMultiManager().getJobClusterHelixManager();
+this.yarnService = appMaster.getYarnService();
+this.partitionsPerContainer = ConfigUtils.getInt(this.config, 
AUTO_SCALING_PARTITIONS_PER_CONTAINER,
+DEFAULT_AUTO_SCALING_PARTITIONS_PER_CONTAINER);
+
+Preconditions.checkArgument(this.partitionsPerContainer > 0,
+AUTO_SCALING_PARTITIONS_PER_CONTAINER + " needs to be greater than 0");
+
+this.autoScalingExecutor = Executors.newSingleThreadScheduledExecutor(
+ExecutorsUtils.newThreadFactory(Optional.of(log), 
Optional.of("AutoScalingExecutor")));
+  }
+
+  @Override
+  protected void startUp() throws Exception {
+int scheduleInterval = ConfigUtils.getInt(this.config, 
AUTO_SCALING_POLLING_INTERVAL_SECS,
+DEFAULT_AUTO_SCALING_POLLING_INTERVAL_SECS);
+log.info("Starting the " + YarnAutoScalingManager.class.getSimpleName());
+log.info("Scheduling the auto scaling task with an interval of {} 
seconds", scheduleInterval);
+
+this.autoScalingExecutor.scheduleAtFixedRate(new 
YarnAutoScalingRunnable(new TaskDriver(this.helixManager),
+t

[GitHub] [incubator-gobblin] jhsenjaliya commented on a change in pull request #2626: [GOBBLIN-762] Add automatic scaling for Gobblin on YARN

2019-05-09 Thread GitBox
jhsenjaliya commented on a change in pull request #2626: [GOBBLIN-762] Add 
automatic scaling for Gobblin on YARN
URL: https://github.com/apache/incubator-gobblin/pull/2626#discussion_r282698677
 
 

 ##
 File path: 
gobblin-yarn/src/main/java/org/apache/gobblin/yarn/YarnAutoScalingManager.java
 ##
 @@ -0,0 +1,158 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.gobblin.yarn;
+
+import java.util.HashSet;
+import java.util.Map;
+import java.util.Set;
+import java.util.concurrent.Executors;
+import java.util.concurrent.ScheduledExecutorService;
+import java.util.concurrent.TimeUnit;
+import java.util.stream.Collectors;
+
+import org.apache.helix.HelixManager;
+import org.apache.helix.task.JobContext;
+import org.apache.helix.task.JobDag;
+import org.apache.helix.task.TaskDriver;
+import org.apache.helix.task.TaskState;
+import org.apache.helix.task.WorkflowConfig;
+import org.apache.helix.task.WorkflowContext;
+
+import com.google.common.annotations.VisibleForTesting;
+import com.google.common.base.Optional;
+import com.google.common.base.Preconditions;
+import com.google.common.util.concurrent.AbstractIdleService;
+import com.typesafe.config.Config;
+
+import lombok.AllArgsConstructor;
+import lombok.extern.slf4j.Slf4j;
+
+import org.apache.gobblin.util.ConfigUtils;
+import org.apache.gobblin.util.ExecutorsUtils;
+
+
+/**
+ * The autoscaling manager is responsible for figuring out how many containers 
are required for the workload and
+ * requesting the {@link YarnService} to request that many containers.
+ */
+@Slf4j
+public class YarnAutoScalingManager extends AbstractIdleService {
+  private final String AUTO_SCALING_PREFIX = 
GobblinYarnConfigurationKeys.GOBBLIN_YARN_PREFIX + "autoScaling.";
+  private final String AUTO_SCALING_POLLING_INTERVAL_SECS =
+  AUTO_SCALING_PREFIX + "pollingIntervalSeconds";
+  private final int DEFAULT_AUTO_SCALING_POLLING_INTERVAL_SECS = 60;
+  // Only one container will be requested for each N partitions of work
+  private final String AUTO_SCALING_PARTITIONS_PER_CONTAINER = 
AUTO_SCALING_PREFIX + "partitionsPerContainer";
+  private final int DEFAULT_AUTO_SCALING_PARTITIONS_PER_CONTAINER = 1;
+
+  private final Config config;
+  private final HelixManager helixManager;
+  private final ScheduledExecutorService autoScalingExecutor;
+  private final YarnService yarnService;
+  private final int partitionsPerContainer;
+
+  public YarnAutoScalingManager(GobblinApplicationMaster appMaster) {
+this.config = appMaster.getConfig();
+this.helixManager = 
appMaster.getMultiManager().getJobClusterHelixManager();
+this.yarnService = appMaster.getYarnService();
+this.partitionsPerContainer = ConfigUtils.getInt(this.config, 
AUTO_SCALING_PARTITIONS_PER_CONTAINER,
+DEFAULT_AUTO_SCALING_PARTITIONS_PER_CONTAINER);
+
+Preconditions.checkArgument(this.partitionsPerContainer > 0,
+AUTO_SCALING_PARTITIONS_PER_CONTAINER + " needs to be greater than 0");
+
+this.autoScalingExecutor = Executors.newSingleThreadScheduledExecutor(
+ExecutorsUtils.newThreadFactory(Optional.of(log), 
Optional.of("AutoScalingExecutor")));
+  }
+
+  @Override
+  protected void startUp() throws Exception {
+int scheduleInterval = ConfigUtils.getInt(this.config, 
AUTO_SCALING_POLLING_INTERVAL_SECS,
+DEFAULT_AUTO_SCALING_POLLING_INTERVAL_SECS);
+log.info("Starting the " + YarnAutoScalingManager.class.getSimpleName());
+log.info("Scheduling the auto scaling task with an interval of {} 
seconds", scheduleInterval);
+
+this.autoScalingExecutor.scheduleAtFixedRate(new 
YarnAutoScalingRunnable(new TaskDriver(this.helixManager),
+this.yarnService, this.partitionsPerContainer), 0,
+scheduleInterval, TimeUnit.SECONDS);
+  }
+
+  @Override
+  protected void shutDown() throws Exception {
+log.info("Stopping the " + YarnAutoScalingManager.class.getSimpleName());
+
+ExecutorsUtils.shutdownExecutorService(this.autoScalingExecutor, 
Optional.of(log));
+  }
+
+  /**
+   * A {@link Runnable} that figures out the number of 

[jira] [Work logged] (GOBBLIN-762) Add automatic scaling for Gobblin on YARN

2019-05-08 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/GOBBLIN-762?focusedWorklogId=239453=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-239453
 ]

ASF GitHub Bot logged work on GOBBLIN-762:
--

Author: ASF GitHub Bot
Created on: 08/May/19 20:18
Start Date: 08/May/19 20:18
Worklog Time Spent: 10m 
  Work Description: htran1 commented on pull request #2626: [GOBBLIN-762] 
Add automatic scaling for Gobblin on YARN
URL: https://github.com/apache/incubator-gobblin/pull/2626#discussion_r281835351
 
 

 ##
 File path: 
gobblin-yarn/src/main/java/org/apache/gobblin/yarn/GobblinApplicationMaster.java
 ##
 @@ -82,13 +89,22 @@ public GobblinApplicationMaster(String applicationName, 
ContainerId containerId,
   .addService(gobblinYarnLogSource.buildLogCopier(this.config, 
containerId, this.fs, this.appWorkDir));
 }
 
-this.applicationLauncher
-.addService(buildYarnService(this.config, applicationName, 
this.applicationId, yarnConfiguration, this.fs));
+this.yarnService = buildYarnService(this.config, applicationName, 
this.applicationId, yarnConfiguration, this.fs);
+this.applicationLauncher.addService(this.yarnService);
 
 if (UserGroupInformation.isSecurityEnabled()) {
   LOGGER.info("Adding YarnContainerSecurityManager since security is 
enabled");
   
this.applicationLauncher.addService(buildYarnContainerSecurityManager(this.config,
 this.fs));
 }
+
+// Add additional services
+List serviceClassNames = ConfigUtils.getStringList(this.config,
+GobblinYarnConfigurationKeys.APP_MASTER_SERVICE_CLASSES);
+
+for (String serviceClassName : serviceClassNames) {
+  Class serviceClass = Class.forName(serviceClassName);
+  this.applicationLauncher.addService((Service) 
GobblinConstructorUtils.invokeLongestConstructor(serviceClass, this));
 
 Review comment:
   This is to support services that take no arguments.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 239453)
Time Spent: 2h 20m  (was: 2h 10m)

> Add automatic scaling for Gobblin on YARN
> -
>
>     Key: GOBBLIN-762
> URL: https://issues.apache.org/jira/browse/GOBBLIN-762
> Project: Apache Gobblin
>  Issue Type: Task
>Reporter: Hung Tran
>Priority: Major
>  Time Spent: 2h 20m
>  Remaining Estimate: 0h
>
> Gobblin on YARN needs a way to scale up and down the containers based on the 
> workload.
> Added `YarnAutoScalingManager` which can be started by the 
> `GobblinApplicationMaster` by setting the 
> `gobblin.yarn.app.master.serviceClasses` configuration. This class runs a 
> scheduled task with a default interval of 60 seconds to detect the number of 
> required partitions for the workflows submitted to Helix. It will request the 
> `YarnService` to scale to a computed number of containers. If the requested 
> number of containers is higher than the YarnService has previously requested 
> then it will request more containers. If the requested count is less than the 
> current number of allocated containers then it will free any unused 
> containers.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (GOBBLIN-762) Add automatic scaling for Gobblin on YARN

2019-05-08 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/GOBBLIN-762?focusedWorklogId=239460=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-239460
 ]

ASF GitHub Bot logged work on GOBBLIN-762:
--

Author: ASF GitHub Bot
Created on: 08/May/19 20:18
Start Date: 08/May/19 20:18
Worklog Time Spent: 10m 
  Work Description: htran1 commented on pull request #2626: [GOBBLIN-762] 
Add automatic scaling for Gobblin on YARN
URL: https://github.com/apache/incubator-gobblin/pull/2626#discussion_r281870422
 
 

 ##
 File path: 
gobblin-yarn/src/main/java/org/apache/gobblin/yarn/YarnAutoScalingManager.java
 ##
 @@ -0,0 +1,158 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.gobblin.yarn;
+
+import java.util.HashSet;
+import java.util.Map;
+import java.util.Set;
+import java.util.concurrent.Executors;
+import java.util.concurrent.ScheduledExecutorService;
+import java.util.concurrent.TimeUnit;
+import java.util.stream.Collectors;
+
+import org.apache.helix.HelixManager;
+import org.apache.helix.task.JobContext;
+import org.apache.helix.task.JobDag;
+import org.apache.helix.task.TaskDriver;
+import org.apache.helix.task.TaskState;
+import org.apache.helix.task.WorkflowConfig;
+import org.apache.helix.task.WorkflowContext;
+
+import com.google.common.annotations.VisibleForTesting;
+import com.google.common.base.Optional;
+import com.google.common.base.Preconditions;
+import com.google.common.util.concurrent.AbstractIdleService;
+import com.typesafe.config.Config;
+
+import lombok.AllArgsConstructor;
+import lombok.extern.slf4j.Slf4j;
+
+import org.apache.gobblin.util.ConfigUtils;
+import org.apache.gobblin.util.ExecutorsUtils;
+
+
+/**
+ * The autoscaling manager is responsible for figuring out how many containers 
are required for the workload and
+ * requesting the {@link YarnService} to request that many containers.
+ */
+@Slf4j
+public class YarnAutoScalingManager extends AbstractIdleService {
+  private final String AUTO_SCALING_PREFIX = 
GobblinYarnConfigurationKeys.GOBBLIN_YARN_PREFIX + "autoScaling.";
+  private final String AUTO_SCALING_POLLING_INTERVAL_SECS =
+  AUTO_SCALING_PREFIX + "pollingIntervalSeconds";
+  private final int DEFAULT_AUTO_SCALING_POLLING_INTERVAL_SECS = 60;
+  // Only one container will be requested for each N partitions of work
+  private final String AUTO_SCALING_PARTITIONS_PER_CONTAINER = 
AUTO_SCALING_PREFIX + "partitionsPerContainer";
+  private final int DEFAULT_AUTO_SCALING_PARTITIONS_PER_CONTAINER = 1;
+
+  private final Config config;
+  private final HelixManager helixManager;
+  private final ScheduledExecutorService autoScalingExecutor;
+  private final YarnService yarnService;
+  private final int partitionsPerContainer;
+
+  public YarnAutoScalingManager(GobblinApplicationMaster appMaster) {
+this.config = appMaster.getConfig();
+this.helixManager = 
appMaster.getMultiManager().getJobClusterHelixManager();
+this.yarnService = appMaster.getYarnService();
+this.partitionsPerContainer = ConfigUtils.getInt(this.config, 
AUTO_SCALING_PARTITIONS_PER_CONTAINER,
+DEFAULT_AUTO_SCALING_PARTITIONS_PER_CONTAINER);
+
+Preconditions.checkArgument(this.partitionsPerContainer > 0,
+AUTO_SCALING_PARTITIONS_PER_CONTAINER + " needs to be greater than 0");
+
+this.autoScalingExecutor = Executors.newSingleThreadScheduledExecutor(
+ExecutorsUtils.newThreadFactory(Optional.of(log), 
Optional.of("AutoScalingExecutor")));
+  }
+
+  @Override
+  protected void startUp() throws Exception {
+int scheduleInterval = ConfigUtils.getInt(this.config, 
AUTO_SCALING_POLLING_INTERVAL_SECS,
+DEFAULT_AUTO_SCALING_POLLING_INTERVAL_SECS);
+log.info("Starting the " + YarnAutoScalingManager.class.getSimpleName());
+log.info("Scheduling the auto scaling task with an interval of {} 
seconds", scheduleInterval);
+
+this.autoScalingExecutor.scheduleAtFixedRate(new 
YarnAutoScalingRunnable(new TaskDriver(this.helixManager),
+this.yarnService, this.

[jira] [Work logged] (GOBBLIN-762) Add automatic scaling for Gobblin on YARN

2019-05-08 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/GOBBLIN-762?focusedWorklogId=239461=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-239461
 ]

ASF GitHub Bot logged work on GOBBLIN-762:
--

Author: ASF GitHub Bot
Created on: 08/May/19 20:18
Start Date: 08/May/19 20:18
Worklog Time Spent: 10m 
  Work Description: htran1 commented on pull request #2626: [GOBBLIN-762] 
Add automatic scaling for Gobblin on YARN
URL: https://github.com/apache/incubator-gobblin/pull/2626#discussion_r281850457
 
 

 ##
 File path: gobblin-yarn/src/main/java/org/apache/gobblin/yarn/YarnService.java
 ##
 @@ -310,10 +344,70 @@ private EventSubmitter buildEventSubmitter() {
 .build();
   }
 
-  private void requestInitialContainers(int containersRequested) {
-for (int i = 0; i < containersRequested; i++) {
+  /**
+   * Request an allocation of containers. If numTargetContainers is larger 
than the max of current and expected number
+   * of containers then additional containers are requested.
+   *
+   * If numTargetContainers is less than the current number of allocated 
containers then release free containers.
+   * Shrinking is relative to the number of currently allocated containers 
since it takes time for containers
+   * to be allocated and assigned work and we want to avoid releasing a 
container prematurely before it is assigned
+   * work. This means that a container may not be released even though 
numTargetContainers is less than the requested
+   * number of containers. The intended usage is for the caller of this method 
to make periodic calls to attempt to
+   * adjust the cluster towards the desired number of containers.
+   *
+   * @param numTargetContainers the desired number of containers
+   * @param inUseInstances  a set of in use instances
+   */
+  public synchronized void requestTargetNumberOfContainers(int 
numTargetContainers, Set inUseInstances) {
+LOGGER.debug("Requesting numTargetContainers {} current 
numRequestedContainers {} in use instances {} map size {}",
+numTargetContainers, this.numRequestedContainers, inUseInstances, 
this.containerMap.size());
+
+// YARN can allocate more than the requested number of containers, compute 
additional allocations and deallocations
+// based on the max of the requested and actual allocated counts
+int numAllocatedContainers = this.containerMap.size();
+
+// The number of allocated containers may be higher than the previously 
requested amount
+// and there may be outstanding allocation requests, so the max of both 
counts is computed here
+// and used to decide whether to allocate containers.
+int numContainers = Math.max(numRequestedContainers, 
numAllocatedContainers);
+
+// Request additional containers if the desired count is higher than the 
max of the current allocation or previously
+// requested amount.
+for (int i = numContainers; i < numTargetContainers; i++) {
 
 Review comment:
   Yes, the true allocation count can be higher than the requested count. If 
YARN happens to give more containers after we compute `numContainers` here then 
the requested amount can take us higher than the target. This should stabilize 
over time as auto-scaling brings the count down.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 239461)
Time Spent: 3h 10m  (was: 3h)

> Add automatic scaling for Gobblin on YARN
> -
>
> Key: GOBBLIN-762
> URL: https://issues.apache.org/jira/browse/GOBBLIN-762
> Project: Apache Gobblin
>  Issue Type: Task
>Reporter: Hung Tran
>Priority: Major
>  Time Spent: 3h 10m
>  Remaining Estimate: 0h
>
> Gobblin on YARN needs a way to scale up and down the containers based on the 
> workload.
> Added `YarnAutoScalingManager` which can be started by the 
> `GobblinApplicationMaster` by setting the 
> `gobblin.yarn.app.master.serviceClasses` configuration. This class runs a 
> scheduled task with a default interval of 60 seconds to detect the number of 
> required partitions for the workflows submitted to Helix. It will request the 
> `YarnService` to scale to a computed number of containers. If the requested 
> number of containers is higher than the YarnService has previously requested 
> then it will request more containers. If the requested count is less than the 
> current number of allocated containers then it will

[jira] [Work logged] (GOBBLIN-762) Add automatic scaling for Gobblin on YARN

2019-05-08 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/GOBBLIN-762?focusedWorklogId=239459=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-239459
 ]

ASF GitHub Bot logged work on GOBBLIN-762:
--

Author: ASF GitHub Bot
Created on: 08/May/19 20:18
Start Date: 08/May/19 20:18
Worklog Time Spent: 10m 
  Work Description: htran1 commented on pull request #2626: [GOBBLIN-762] 
Add automatic scaling for Gobblin on YARN
URL: https://github.com/apache/incubator-gobblin/pull/2626#discussion_r281840066
 
 

 ##
 File path: 
gobblin-yarn/src/main/java/org/apache/gobblin/yarn/YarnAutoScalingManager.java
 ##
 @@ -0,0 +1,158 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.gobblin.yarn;
+
+import java.util.HashSet;
+import java.util.Map;
+import java.util.Set;
+import java.util.concurrent.Executors;
+import java.util.concurrent.ScheduledExecutorService;
+import java.util.concurrent.TimeUnit;
+import java.util.stream.Collectors;
+
+import org.apache.helix.HelixManager;
+import org.apache.helix.task.JobContext;
+import org.apache.helix.task.JobDag;
+import org.apache.helix.task.TaskDriver;
+import org.apache.helix.task.TaskState;
+import org.apache.helix.task.WorkflowConfig;
+import org.apache.helix.task.WorkflowContext;
+
+import com.google.common.annotations.VisibleForTesting;
+import com.google.common.base.Optional;
+import com.google.common.base.Preconditions;
+import com.google.common.util.concurrent.AbstractIdleService;
+import com.typesafe.config.Config;
+
+import lombok.AllArgsConstructor;
+import lombok.extern.slf4j.Slf4j;
+
+import org.apache.gobblin.util.ConfigUtils;
+import org.apache.gobblin.util.ExecutorsUtils;
+
+
+/**
+ * The autoscaling manager is responsible for figuring out how many containers 
are required for the workload and
+ * requesting the {@link YarnService} to request that many containers.
+ */
+@Slf4j
+public class YarnAutoScalingManager extends AbstractIdleService {
+  private final String AUTO_SCALING_PREFIX = 
GobblinYarnConfigurationKeys.GOBBLIN_YARN_PREFIX + "autoScaling.";
+  private final String AUTO_SCALING_POLLING_INTERVAL_SECS =
+  AUTO_SCALING_PREFIX + "pollingIntervalSeconds";
+  private final int DEFAULT_AUTO_SCALING_POLLING_INTERVAL_SECS = 60;
+  // Only one container will be requested for each N partitions of work
+  private final String AUTO_SCALING_PARTITIONS_PER_CONTAINER = 
AUTO_SCALING_PREFIX + "partitionsPerContainer";
+  private final int DEFAULT_AUTO_SCALING_PARTITIONS_PER_CONTAINER = 1;
+
+  private final Config config;
+  private final HelixManager helixManager;
+  private final ScheduledExecutorService autoScalingExecutor;
+  private final YarnService yarnService;
+  private final int partitionsPerContainer;
+
+  public YarnAutoScalingManager(GobblinApplicationMaster appMaster) {
+this.config = appMaster.getConfig();
+this.helixManager = 
appMaster.getMultiManager().getJobClusterHelixManager();
+this.yarnService = appMaster.getYarnService();
+this.partitionsPerContainer = ConfigUtils.getInt(this.config, 
AUTO_SCALING_PARTITIONS_PER_CONTAINER,
+DEFAULT_AUTO_SCALING_PARTITIONS_PER_CONTAINER);
+
+Preconditions.checkArgument(this.partitionsPerContainer > 0,
+AUTO_SCALING_PARTITIONS_PER_CONTAINER + " needs to be greater than 0");
+
+this.autoScalingExecutor = Executors.newSingleThreadScheduledExecutor(
+ExecutorsUtils.newThreadFactory(Optional.of(log), 
Optional.of("AutoScalingExecutor")));
+  }
+
+  @Override
+  protected void startUp() throws Exception {
+int scheduleInterval = ConfigUtils.getInt(this.config, 
AUTO_SCALING_POLLING_INTERVAL_SECS,
+DEFAULT_AUTO_SCALING_POLLING_INTERVAL_SECS);
+log.info("Starting the " + YarnAutoScalingManager.class.getSimpleName());
+log.info("Scheduling the auto scaling task with an interval of {} 
seconds", scheduleInterval);
+
+this.autoScalingExecutor.scheduleAtFixedRate(new 
YarnAutoScalingRunnable(new TaskDriver(this.helixManager),
+this.yarnService, this.

[jira] [Work logged] (GOBBLIN-762) Add automatic scaling for Gobblin on YARN

2019-05-08 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/GOBBLIN-762?focusedWorklogId=239457=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-239457
 ]

ASF GitHub Bot logged work on GOBBLIN-762:
--

Author: ASF GitHub Bot
Created on: 08/May/19 20:18
Start Date: 08/May/19 20:18
Worklog Time Spent: 10m 
  Work Description: htran1 commented on pull request #2626: [GOBBLIN-762] 
Add automatic scaling for Gobblin on YARN
URL: https://github.com/apache/incubator-gobblin/pull/2626#discussion_r281867296
 
 

 ##
 File path: gobblin-yarn/src/main/java/org/apache/gobblin/yarn/YarnService.java
 ##
 @@ -468,6 +562,13 @@ private void handleContainerCompletion(ContainerStatus 
containerStatus) {
   containerStatus.getContainerId(), containerStatus.getDiagnostics()));
 }
 
+if (this.releasedContainerSet.contains(containerStatus.getContainerId())) {
+  LOGGER.info("Container release requested, so not spawning a replacement 
for containerId {}",
+  containerStatus.getContainerId());
+  this.releasedContainerSet.remove(containerStatus.getContainerId());
 
 Review comment:
   The existing code assumes `handleContainerCompletion` only gets called once 
per completion since the code ```Map.Entry 
completedContainerEntry = 
this.containerMap.remove(containerStatus.getContainerId());
   String completedInstanceName = completedContainerEntry.getValue();   
String completedInstanceName = completedContainerEntry.getValue();``` would 
otherwise hit NPE. I can't find from the documentation what the guarantee is on 
this.
   
   I made a change to store the released container in a cache with TTL, so I 
removed the `remove` code, but if `handleContainerCompletion` does get called 
multiple times then that is an existing bug that needs to be fixed.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 239457)

> Add automatic scaling for Gobblin on YARN
> -
>
>     Key: GOBBLIN-762
> URL: https://issues.apache.org/jira/browse/GOBBLIN-762
> Project: Apache Gobblin
>  Issue Type: Task
>Reporter: Hung Tran
>Priority: Major
>  Time Spent: 2h 50m
>  Remaining Estimate: 0h
>
> Gobblin on YARN needs a way to scale up and down the containers based on the 
> workload.
> Added `YarnAutoScalingManager` which can be started by the 
> `GobblinApplicationMaster` by setting the 
> `gobblin.yarn.app.master.serviceClasses` configuration. This class runs a 
> scheduled task with a default interval of 60 seconds to detect the number of 
> required partitions for the workflows submitted to Helix. It will request the 
> `YarnService` to scale to a computed number of containers. If the requested 
> number of containers is higher than the YarnService has previously requested 
> then it will request more containers. If the requested count is less than the 
> current number of allocated containers then it will free any unused 
> containers.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (GOBBLIN-762) Add automatic scaling for Gobblin on YARN

2019-05-08 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/GOBBLIN-762?focusedWorklogId=239458=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-239458
 ]

ASF GitHub Bot logged work on GOBBLIN-762:
--

Author: ASF GitHub Bot
Created on: 08/May/19 20:18
Start Date: 08/May/19 20:18
Worklog Time Spent: 10m 
  Work Description: htran1 commented on pull request #2626: [GOBBLIN-762] 
Add automatic scaling for Gobblin on YARN
URL: https://github.com/apache/incubator-gobblin/pull/2626#discussion_r281846385
 
 

 ##
 File path: gobblin-yarn/src/main/java/org/apache/gobblin/yarn/YarnService.java
 ##
 @@ -159,6 +173,11 @@
 
   private volatile boolean shutdownInProgress = false;
 
+  // The number of containers requested
 
 Review comment:
   Done.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 239458)
Time Spent: 3h  (was: 2h 50m)

> Add automatic scaling for Gobblin on YARN
> -
>
>     Key: GOBBLIN-762
> URL: https://issues.apache.org/jira/browse/GOBBLIN-762
> Project: Apache Gobblin
>  Issue Type: Task
>Reporter: Hung Tran
>Priority: Major
>  Time Spent: 3h
>  Remaining Estimate: 0h
>
> Gobblin on YARN needs a way to scale up and down the containers based on the 
> workload.
> Added `YarnAutoScalingManager` which can be started by the 
> `GobblinApplicationMaster` by setting the 
> `gobblin.yarn.app.master.serviceClasses` configuration. This class runs a 
> scheduled task with a default interval of 60 seconds to detect the number of 
> required partitions for the workflows submitted to Helix. It will request the 
> `YarnService` to scale to a computed number of containers. If the requested 
> number of containers is higher than the YarnService has previously requested 
> then it will request more containers. If the requested count is less than the 
> current number of allocated containers then it will free any unused 
> containers.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (GOBBLIN-762) Add automatic scaling for Gobblin on YARN

2019-05-08 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/GOBBLIN-762?focusedWorklogId=239456=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-239456
 ]

ASF GitHub Bot logged work on GOBBLIN-762:
--

Author: ASF GitHub Bot
Created on: 08/May/19 20:18
Start Date: 08/May/19 20:18
Worklog Time Spent: 10m 
  Work Description: htran1 commented on pull request #2626: [GOBBLIN-762] 
Add automatic scaling for Gobblin on YARN
URL: https://github.com/apache/incubator-gobblin/pull/2626#discussion_r281849098
 
 

 ##
 File path: gobblin-yarn/src/main/java/org/apache/gobblin/yarn/YarnService.java
 ##
 @@ -229,6 +248,21 @@ public void 
handleContainerShutdownRequest(ContainerShutdownRequest containerShu
 }
   }
 
+  /**
+   * Request the Resource Manager to release the container
+   * @param containerReleaseRequest containers to release
+   */
+  @Subscribe
+  public void handleContainerReleaseRequest(ContainerReleaseRequest 
containerReleaseRequest) {
+for (Container container : containerReleaseRequest.getContainers()) {
+  LOGGER.info(String.format("Releasing container %s running on %s", 
container.getId(), container.getNodeId()));
+
+  // record that this container was explicitly released so that a new one 
is not spawned to replace it
+  this.releasedContainerSet.add(container.getId());
 
 Review comment:
   Added some comments. Also changed the concurrent set to a `Cache` to address 
Kuai's concern of cleanup. That case should be rare, but the Cache will clean 
up after some time if it occurs. The aysnc container operations can interleave 
and the race conditions cannot be avoided since there is no global ordering or 
locking mechanism when interacting with YARN. The auto-scaler should be able to 
get to the desired steady state after several iterations. So if a container 
happens to die and get replaced as the autoscaler decides to release it then in 
a future iteration if the new container is unused then another release request 
will be issued.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 239456)
Time Spent: 2h 50m  (was: 2h 40m)

> Add automatic scaling for Gobblin on YARN
> -
>
>     Key: GOBBLIN-762
> URL: https://issues.apache.org/jira/browse/GOBBLIN-762
> Project: Apache Gobblin
>  Issue Type: Task
>Reporter: Hung Tran
>Priority: Major
>  Time Spent: 2h 50m
>  Remaining Estimate: 0h
>
> Gobblin on YARN needs a way to scale up and down the containers based on the 
> workload.
> Added `YarnAutoScalingManager` which can be started by the 
> `GobblinApplicationMaster` by setting the 
> `gobblin.yarn.app.master.serviceClasses` configuration. This class runs a 
> scheduled task with a default interval of 60 seconds to detect the number of 
> required partitions for the workflows submitted to Helix. It will request the 
> `YarnService` to scale to a computed number of containers. If the requested 
> number of containers is higher than the YarnService has previously requested 
> then it will request more containers. If the requested count is less than the 
> current number of allocated containers then it will free any unused 
> containers.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (GOBBLIN-762) Add automatic scaling for Gobblin on YARN

2019-05-08 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/GOBBLIN-762?focusedWorklogId=239454=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-239454
 ]

ASF GitHub Bot logged work on GOBBLIN-762:
--

Author: ASF GitHub Bot
Created on: 08/May/19 20:18
Start Date: 08/May/19 20:18
Worklog Time Spent: 10m 
  Work Description: htran1 commented on pull request #2626: [GOBBLIN-762] 
Add automatic scaling for Gobblin on YARN
URL: https://github.com/apache/incubator-gobblin/pull/2626#discussion_r281836523
 
 

 ##
 File path: gobblin-yarn/build.gradle
 ##
 @@ -65,6 +65,10 @@ dependencies {
   testCompile externalDependency.hadoopYarnMiniCluster
   testCompile externalDependency.curatorFramework
   testCompile externalDependency.curatorTest
+
+  testCompile ('com.google.inject:guice:3.0') {
+force = true
 
 Review comment:
   The test mini cluster uses guice APIs that are not compatible with guice 
4.0. I think this is a known issue and in this file there is the following 
line. `force 'com.google.inject:guice:3.0'` for the testRuntime configuration, 
but that did not seem to work fully. The test would run fine in the IDE, but 
failed at the command line. Adding the testCompile resolved the dependency 
issue when running the test from the gradle build command.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 239454)
Time Spent: 2.5h  (was: 2h 20m)

> Add automatic scaling for Gobblin on YARN
> -
>
>     Key: GOBBLIN-762
> URL: https://issues.apache.org/jira/browse/GOBBLIN-762
> Project: Apache Gobblin
>  Issue Type: Task
>Reporter: Hung Tran
>Priority: Major
>  Time Spent: 2.5h
>  Remaining Estimate: 0h
>
> Gobblin on YARN needs a way to scale up and down the containers based on the 
> workload.
> Added `YarnAutoScalingManager` which can be started by the 
> `GobblinApplicationMaster` by setting the 
> `gobblin.yarn.app.master.serviceClasses` configuration. This class runs a 
> scheduled task with a default interval of 60 seconds to detect the number of 
> required partitions for the workflows submitted to Helix. It will request the 
> `YarnService` to scale to a computed number of containers. If the requested 
> number of containers is higher than the YarnService has previously requested 
> then it will request more containers. If the requested count is less than the 
> current number of allocated containers then it will free any unused 
> containers.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (GOBBLIN-762) Add automatic scaling for Gobblin on YARN

2019-05-08 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/GOBBLIN-762?focusedWorklogId=239455=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-239455
 ]

ASF GitHub Bot logged work on GOBBLIN-762:
--

Author: ASF GitHub Bot
Created on: 08/May/19 20:18
Start Date: 08/May/19 20:18
Worklog Time Spent: 10m 
  Work Description: htran1 commented on pull request #2626: [GOBBLIN-762] 
Add automatic scaling for Gobblin on YARN
URL: https://github.com/apache/incubator-gobblin/pull/2626#discussion_r281838842
 
 

 ##
 File path: 
gobblin-yarn/src/main/java/org/apache/gobblin/yarn/YarnAutoScalingManager.java
 ##
 @@ -0,0 +1,158 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.gobblin.yarn;
+
+import java.util.HashSet;
+import java.util.Map;
+import java.util.Set;
+import java.util.concurrent.Executors;
+import java.util.concurrent.ScheduledExecutorService;
+import java.util.concurrent.TimeUnit;
+import java.util.stream.Collectors;
+
+import org.apache.helix.HelixManager;
+import org.apache.helix.task.JobContext;
+import org.apache.helix.task.JobDag;
+import org.apache.helix.task.TaskDriver;
+import org.apache.helix.task.TaskState;
+import org.apache.helix.task.WorkflowConfig;
+import org.apache.helix.task.WorkflowContext;
+
+import com.google.common.annotations.VisibleForTesting;
+import com.google.common.base.Optional;
+import com.google.common.base.Preconditions;
+import com.google.common.util.concurrent.AbstractIdleService;
+import com.typesafe.config.Config;
+
+import lombok.AllArgsConstructor;
+import lombok.extern.slf4j.Slf4j;
+
+import org.apache.gobblin.util.ConfigUtils;
+import org.apache.gobblin.util.ExecutorsUtils;
+
+
+/**
+ * The autoscaling manager is responsible for figuring out how many containers 
are required for the workload and
+ * requesting the {@link YarnService} to request that many containers.
+ */
+@Slf4j
+public class YarnAutoScalingManager extends AbstractIdleService {
+  private final String AUTO_SCALING_PREFIX = 
GobblinYarnConfigurationKeys.GOBBLIN_YARN_PREFIX + "autoScaling.";
+  private final String AUTO_SCALING_POLLING_INTERVAL_SECS =
+  AUTO_SCALING_PREFIX + "pollingIntervalSeconds";
+  private final int DEFAULT_AUTO_SCALING_POLLING_INTERVAL_SECS = 60;
+  // Only one container will be requested for each N partitions of work
+  private final String AUTO_SCALING_PARTITIONS_PER_CONTAINER = 
AUTO_SCALING_PREFIX + "partitionsPerContainer";
+  private final int DEFAULT_AUTO_SCALING_PARTITIONS_PER_CONTAINER = 1;
+
+  private final Config config;
+  private final HelixManager helixManager;
+  private final ScheduledExecutorService autoScalingExecutor;
+  private final YarnService yarnService;
+  private final int partitionsPerContainer;
+
+  public YarnAutoScalingManager(GobblinApplicationMaster appMaster) {
+this.config = appMaster.getConfig();
+this.helixManager = 
appMaster.getMultiManager().getJobClusterHelixManager();
+this.yarnService = appMaster.getYarnService();
+this.partitionsPerContainer = ConfigUtils.getInt(this.config, 
AUTO_SCALING_PARTITIONS_PER_CONTAINER,
+DEFAULT_AUTO_SCALING_PARTITIONS_PER_CONTAINER);
+
+Preconditions.checkArgument(this.partitionsPerContainer > 0,
+AUTO_SCALING_PARTITIONS_PER_CONTAINER + " needs to be greater than 0");
+
+this.autoScalingExecutor = Executors.newSingleThreadScheduledExecutor(
+ExecutorsUtils.newThreadFactory(Optional.of(log), 
Optional.of("AutoScalingExecutor")));
+  }
+
+  @Override
+  protected void startUp() throws Exception {
+int scheduleInterval = ConfigUtils.getInt(this.config, 
AUTO_SCALING_POLLING_INTERVAL_SECS,
+DEFAULT_AUTO_SCALING_POLLING_INTERVAL_SECS);
+log.info("Starting the " + YarnAutoScalingManager.class.getSimpleName());
+log.info("Scheduling the auto scaling task with an interval of {} 
seconds", scheduleInterval);
+
+this.autoScalingExecutor.scheduleAtFixedRate(new 
YarnAutoScalingRunnable(new TaskDriver(this.helixManager),
+this.yarnService, this.

[GitHub] [incubator-gobblin] htran1 commented on a change in pull request #2626: [GOBBLIN-762] Add automatic scaling for Gobblin on YARN

2019-05-08 Thread GitBox
htran1 commented on a change in pull request #2626: [GOBBLIN-762] Add automatic 
scaling for Gobblin on YARN
URL: https://github.com/apache/incubator-gobblin/pull/2626#discussion_r281836523
 
 

 ##
 File path: gobblin-yarn/build.gradle
 ##
 @@ -65,6 +65,10 @@ dependencies {
   testCompile externalDependency.hadoopYarnMiniCluster
   testCompile externalDependency.curatorFramework
   testCompile externalDependency.curatorTest
+
+  testCompile ('com.google.inject:guice:3.0') {
+force = true
 
 Review comment:
   The test mini cluster uses guice APIs that are not compatible with guice 
4.0. I think this is a known issue and in this file there is the following 
line. `force 'com.google.inject:guice:3.0'` for the testRuntime configuration, 
but that did not seem to work fully. The test would run fine in the IDE, but 
failed at the command line. Adding the testCompile resolved the dependency 
issue when running the test from the gradle build command.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [incubator-gobblin] htran1 commented on a change in pull request #2626: [GOBBLIN-762] Add automatic scaling for Gobblin on YARN

2019-05-08 Thread GitBox
htran1 commented on a change in pull request #2626: [GOBBLIN-762] Add automatic 
scaling for Gobblin on YARN
URL: https://github.com/apache/incubator-gobblin/pull/2626#discussion_r281867296
 
 

 ##
 File path: gobblin-yarn/src/main/java/org/apache/gobblin/yarn/YarnService.java
 ##
 @@ -468,6 +562,13 @@ private void handleContainerCompletion(ContainerStatus 
containerStatus) {
   containerStatus.getContainerId(), containerStatus.getDiagnostics()));
 }
 
+if (this.releasedContainerSet.contains(containerStatus.getContainerId())) {
+  LOGGER.info("Container release requested, so not spawning a replacement 
for containerId {}",
+  containerStatus.getContainerId());
+  this.releasedContainerSet.remove(containerStatus.getContainerId());
 
 Review comment:
   The existing code assumes `handleContainerCompletion` only gets called once 
per completion since the code ```Map.Entry 
completedContainerEntry = 
this.containerMap.remove(containerStatus.getContainerId());
   String completedInstanceName = completedContainerEntry.getValue();   
String completedInstanceName = completedContainerEntry.getValue();``` would 
otherwise hit NPE. I can't find from the documentation what the guarantee is on 
this.
   
   I made a change to store the released container in a cache with TTL, so I 
removed the `remove` code, but if `handleContainerCompletion` does get called 
multiple times then that is an existing bug that needs to be fixed.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [incubator-gobblin] htran1 commented on a change in pull request #2626: [GOBBLIN-762] Add automatic scaling for Gobblin on YARN

2019-05-08 Thread GitBox
htran1 commented on a change in pull request #2626: [GOBBLIN-762] Add automatic 
scaling for Gobblin on YARN
URL: https://github.com/apache/incubator-gobblin/pull/2626#discussion_r281850457
 
 

 ##
 File path: gobblin-yarn/src/main/java/org/apache/gobblin/yarn/YarnService.java
 ##
 @@ -310,10 +344,70 @@ private EventSubmitter buildEventSubmitter() {
 .build();
   }
 
-  private void requestInitialContainers(int containersRequested) {
-for (int i = 0; i < containersRequested; i++) {
+  /**
+   * Request an allocation of containers. If numTargetContainers is larger 
than the max of current and expected number
+   * of containers then additional containers are requested.
+   *
+   * If numTargetContainers is less than the current number of allocated 
containers then release free containers.
+   * Shrinking is relative to the number of currently allocated containers 
since it takes time for containers
+   * to be allocated and assigned work and we want to avoid releasing a 
container prematurely before it is assigned
+   * work. This means that a container may not be released even though 
numTargetContainers is less than the requested
+   * number of containers. The intended usage is for the caller of this method 
to make periodic calls to attempt to
+   * adjust the cluster towards the desired number of containers.
+   *
+   * @param numTargetContainers the desired number of containers
+   * @param inUseInstances  a set of in use instances
+   */
+  public synchronized void requestTargetNumberOfContainers(int 
numTargetContainers, Set inUseInstances) {
+LOGGER.debug("Requesting numTargetContainers {} current 
numRequestedContainers {} in use instances {} map size {}",
+numTargetContainers, this.numRequestedContainers, inUseInstances, 
this.containerMap.size());
+
+// YARN can allocate more than the requested number of containers, compute 
additional allocations and deallocations
+// based on the max of the requested and actual allocated counts
+int numAllocatedContainers = this.containerMap.size();
+
+// The number of allocated containers may be higher than the previously 
requested amount
+// and there may be outstanding allocation requests, so the max of both 
counts is computed here
+// and used to decide whether to allocate containers.
+int numContainers = Math.max(numRequestedContainers, 
numAllocatedContainers);
+
+// Request additional containers if the desired count is higher than the 
max of the current allocation or previously
+// requested amount.
+for (int i = numContainers; i < numTargetContainers; i++) {
 
 Review comment:
   Yes, the true allocation count can be higher than the requested count. If 
YARN happens to give more containers after we compute `numContainers` here then 
the requested amount can take us higher than the target. This should stabilize 
over time as auto-scaling brings the count down.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


  1   2   >