[jira] [Work logged] (GOBBLIN-917) gaas jobs not able to report status should be killed after some time
[ https://issues.apache.org/jira/browse/GOBBLIN-917?focusedWorklogId=330796=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-330796 ]

ASF GitHub Bot logged work on GOBBLIN-917:
--
Author: ASF GitHub Bot
Created on: 18/Oct/19 21:53
Start Date: 18/Oct/19 21:53
Worklog Time Spent: 10m
Work Description: asfgit commented on pull request #2769: [GOBBLIN-917] kill orphan gaas jobs
URL: https://github.com/apache/incubator-gobblin/pull/2769

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

Issue Time Tracking
---
Worklog Id: (was: 330796)
Time Spent: 50m (was: 40m)

> gaas jobs not able to report status should be killed after some time
>
> Key: GOBBLIN-917
> URL: https://issues.apache.org/jira/browse/GOBBLIN-917
> Project: Apache Gobblin
> Issue Type: Improvement
> Reporter: Arjun Singh Bora
> Priority: Major
> Time Spent: 50m
> Remaining Estimate: 0h

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
[jira] [Work logged] (GOBBLIN-917) gaas jobs not able to report status should be killed after some time
[ https://issues.apache.org/jira/browse/GOBBLIN-917?focusedWorklogId=330781=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-330781 ]

ASF GitHub Bot logged work on GOBBLIN-917:
--
Author: ASF GitHub Bot
Created on: 18/Oct/19 21:32
Start Date: 18/Oct/19 21:32
Worklog Time Spent: 10m
Work Description: codecov-io commented on issue #2769: [GOBBLIN-917] kill orphan gaas jobs
URL: https://github.com/apache/incubator-gobblin/pull/2769#issuecomment-543433401

# [Codecov](https://codecov.io/gh/apache/incubator-gobblin/pull/2769?src=pr=h1) Report
> Merging [#2769](https://codecov.io/gh/apache/incubator-gobblin/pull/2769?src=pr=desc) into [master](https://codecov.io/gh/apache/incubator-gobblin/commit/b92f516a4e7a9b2c8bd2e90d71026fb07cb56da7?src=pr=desc) will **increase** coverage by `0.04%`.
> The diff coverage is `94.11%`.

[![Impacted file tree graph](https://codecov.io/gh/apache/incubator-gobblin/pull/2769/graphs/tree.svg?width=650=4MgURJ0bGc=150=pr)](https://codecov.io/gh/apache/incubator-gobblin/pull/2769?src=pr=tree)

```diff
@@            Coverage Diff             @@
##            master    #2769     +/-   ##
+ Coverage     45.3%   45.35%   +0.04%
- Complexity    8846     8861     +15
  Files         1892     1894      +2
  Lines        70752    70829     +77
  Branches      7773     7787     +14
+ Hits         32054    32124     +70
- Misses       35740    35743      +3
- Partials      2958     2962      +4
```

| [Impacted Files](https://codecov.io/gh/apache/incubator-gobblin/pull/2769?src=pr=tree) | Coverage Δ | Complexity Δ | |
|---|---|---|---|
| [...pache/gobblin/configuration/ConfigurationKeys.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2769/diff?src=pr=tree#diff-Z29iYmxpbi1hcGkvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2dvYmJsaW4vY29uZmlndXJhdGlvbi9Db25maWd1cmF0aW9uS2V5cy5qYXZh) | `0% <ø> (ø)` | `0 <0> (ø)` | :arrow_down: |
| [...service/modules/orchestration/DagManagerUtils.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2769/diff?src=pr=tree#diff-Z29iYmxpbi1zZXJ2aWNlL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9nb2JibGluL3NlcnZpY2UvbW9kdWxlcy9vcmNoZXN0cmF0aW9uL0RhZ01hbmFnZXJVdGlscy5qYXZh) | `84.81% <100%> (+0.81%)` | `34 <2> (+2)` | :arrow_up: |
| [...blin/service/modules/orchestration/DagManager.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2769/diff?src=pr=tree#diff-Z29iYmxpbi1zZXJ2aWNlL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9nb2JibGluL3NlcnZpY2UvbW9kdWxlcy9vcmNoZXN0cmF0aW9uL0RhZ01hbmFnZXIuamF2YQ==) | `79.29% <92.3%> (+0.29%)` | `12 <0> (ø)` | :arrow_down: |
| [.../org/apache/gobblin/cluster/GobblinTaskRunner.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2769/diff?src=pr=tree#diff-Z29iYmxpbi1jbHVzdGVyL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9nb2JibGluL2NsdXN0ZXIvR29iYmxpblRhc2tSdW5uZXIuamF2YQ==) | `63.88% <0%> (-0.93%)` | `28% <0%> (ø)` | |
| [...ement/copy/UnixTimestampCopyableDatasetFinder.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2769/diff?src=pr=tree#diff-Z29iYmxpbi1kYXRhLW1hbmFnZW1lbnQvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2dvYmJsaW4vZGF0YS9tYW5hZ2VtZW50L2NvcHkvVW5peFRpbWVzdGFtcENvcHlhYmxlRGF0YXNldEZpbmRlci5qYXZh) | `100% <0%> (ø)` | `2% <0%> (?)` | |
| [...nt/copy/UnixTimestampRecursiveCopyableDataset.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2769/diff?src=pr=tree#diff-Z29iYmxpbi1kYXRhLW1hbmFnZW1lbnQvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2dvYmJsaW4vZGF0YS9tYW5hZ2VtZW50L2NvcHkvVW5peFRpbWVzdGFtcFJlY3Vyc2l2ZUNvcHlhYmxlRGF0YXNldC5qYXZh) | `85% <0%> (ø)` | `8% <0%> (?)` | |
| [.../org/apache/gobblin/metrics/RootMetricContext.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2769/diff?src=pr=tree#diff-Z29iYmxpbi1tZXRyaWNzLWxpYnMvZ29iYmxpbi1tZXRyaWNzLWJhc2Uvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2dvYmJsaW4vbWV0cmljcy9Sb290TWV0cmljQ29udGV4dC5qYXZh) | `79.68% <0%> (+1.56%)` | `16% <0%> (+1%)` | :arrow_up: |
| [...lin/restli/throttling/ZookeeperLeaderElection.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2769/diff?src=pr=tree#diff-Z29iYmxpbi1yZXN0bGkvZ29iYmxpbi10aHJvdHRsaW5nLXNlcnZpY2UvZ29iYmxpbi10aHJvdHRsaW5nLXNlcnZpY2Utc2VydmVyL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9nb2JibGluL3Jlc3RsaS90aHJvdHRsaW5nL1pvb2tlZXBlckxlYWRlckVsZWN0aW9uLmphdmE=) | `72.22% <0%> (+2.22%)` | `13% <0%> (ø)` | :arrow_down: |
| [...lin/util/filesystem/FileSystemInstrumentation.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2769/diff?src=pr=tree#diff-Z29iYmxpbi11dGlsaXR5L3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9nb2JibGluL3V0aWwvZmlsZXN5c3RlbS9GaWxlU3lzdGVtSW5zdHJ1bWVudGF0aW9uLmphdmE=) | `100% <0%> (+7.14%)` | `4% <0%> (+1%)` | :arrow_up: |
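As a sanity check on the coverage summary in this report, the percentages can be recomputed from the Hits and Lines rows. The sketch below assumes that coverage is simply hits divided by lines, and reads the Lines row as 70752 before and 70829 after, and the Hits row as 32054 before and 32124 after. The class and method names are made up for illustration and are not part of Codecov or Gobblin.

```java
/** Illustrative helper (hypothetical name) for recomputing a coverage percentage. */
final class CoverageCheck {
  private CoverageCheck() {}

  /** Returns hits / lines as a percentage; assumes lines counts hits, misses and partials. */
  static double coveragePercent(long hits, long lines) {
    if (lines <= 0) {
      throw new IllegalArgumentException("lines must be positive, got " + lines);
    }
    return 100.0 * hits / lines;
  }

  public static void main(String[] args) {
    // Figures taken from the report: master first, then the PR head.
    System.out.printf("master: %.2f%%%n", coveragePercent(32054, 70752));
    System.out.printf("#2769:  %.2f%%%n", coveragePercent(32124, 70829));
  }
}
```

Under that assumption the two values come out at roughly 45.30 and 45.35, which agrees with the Coverage row. The Hits, Misses and Partials rows also sum to the Lines row on both sides (32054 + 35740 + 2958 = 70752 and 32124 + 35743 + 2962 = 70829).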
[jira] [Work logged] (GOBBLIN-917) gaas jobs not able to report status should be killed after some time
[ https://issues.apache.org/jira/browse/GOBBLIN-917?focusedWorklogId=330268=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-330268 ]

ASF GitHub Bot logged work on GOBBLIN-917:
--
Author: ASF GitHub Bot
Created on: 18/Oct/19 04:19
Start Date: 18/Oct/19 04:19
Worklog Time Spent: 10m
Work Description: sv2000 commented on pull request #2769: [GOBBLIN-917] kill orphan gaas jobs
URL: https://github.com/apache/incubator-gobblin/pull/2769#discussion_r336314957

## File path: gobblin-service/src/main/java/org/apache/gobblin/service/modules/orchestration/DagManager.java ##
@@ -545,8 +542,26 @@ private void pollAndAdvanceDag() throws IOException, ExecutionException, Interru
       }
     }

-    private ExecutionStatus getJobExecutionStatus(boolean slaKilled, JobStatus jobStatus) {
-      if (slaKilled) {
+    // cancel the job and returns true if the job status remains ORCHESTRATED for some specific time

Review comment:
Use javadoc syntax:
/**
 * Cancel the job if the job has been "orphaned". A job is orphaned if {provide definition of orphaned job}.
 * @return true if
 **/

Issue Time Tracking
---
Worklog Id: (was: 330268)
Time Spent: 0.5h (was: 20m)
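The review comment in this message asks for a Javadoc block that defines what "orphaned" means. A minimal sketch of such a method follows, assuming a job is orphaned when its status is still ORCHESTRATED after a configured wait; that definition is inferred from the inline comment in the diff. `OrphanStatus`, `JobStatusSketch` and `OrphanJobKiller` are simplified, hypothetical stand-ins, not Gobblin's actual `ExecutionStatus`, `JobStatus` or `DagManager`.

```java
import java.time.Duration;
import java.time.Instant;

// Hypothetical, simplified status enum; not Gobblin's ExecutionStatus.
enum OrphanStatus { ORCHESTRATED, RUNNING, COMPLETE, CANCELLED }

// Hypothetical stand-in for a polled job status.
final class JobStatusSketch {
  final OrphanStatus status;
  final Instant orchestratedAt;

  JobStatusSketch(OrphanStatus status, Instant orchestratedAt) {
    this.status = status;
    this.orchestratedAt = orchestratedAt;
  }
}

final class OrphanJobKiller {
  private final Duration startTimeout;
  private int cancelCount = 0;

  OrphanJobKiller(Duration startTimeout) {
    this.startTimeout = startTimeout;
  }

  /**
   * Cancel the job if it has been "orphaned". In this sketch a job is orphaned
   * when its status is still ORCHESTRATED after {@code startTimeout} has
   * elapsed since it was orchestrated (an assumed definition).
   *
   * @param jobStatus latest polled status, or null if nothing was reported
   * @param now current time, passed in so the check is easy to test
   * @return true if the job was considered orphaned and a cancel was issued
   */
  boolean killJobIfOrphaned(JobStatusSketch jobStatus, Instant now) {
    if (jobStatus == null || jobStatus.status != OrphanStatus.ORCHESTRATED) {
      return false;
    }
    Duration waited = Duration.between(jobStatus.orchestratedAt, now);
    if (waited.compareTo(startTimeout) <= 0) {
      return false; // still within the allowed start-up window
    }
    cancelCount++; // placeholder for the real cancel call
    return true;
  }

  int getCancelCount() {
    return cancelCount;
  }
}
```

Passing `now` as a parameter instead of calling `Instant.now()` inside the method is a choice made for this sketch only, so the timeout boundary can be exercised in a unit test without sleeping.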
[jira] [Work logged] (GOBBLIN-917) gaas jobs not able to report status should be killed after some time
[ https://issues.apache.org/jira/browse/GOBBLIN-917?focusedWorklogId=330269=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-330269 ]

ASF GitHub Bot logged work on GOBBLIN-917:
--
Author: ASF GitHub Bot
Created on: 18/Oct/19 04:19
Start Date: 18/Oct/19 04:19
Worklog Time Spent: 10m
Work Description: sv2000 commented on pull request #2769: [GOBBLIN-917] kill orphan gaas jobs
URL: https://github.com/apache/incubator-gobblin/pull/2769#discussion_r336314482

## File path: gobblin-service/src/main/java/org/apache/gobblin/service/modules/orchestration/DagManager.java ##
@@ -545,8 +542,26 @@ private void pollAndAdvanceDag() throws IOException, ExecutionException, Interru
       }
     }

-    private ExecutionStatus getJobExecutionStatus(boolean slaKilled, JobStatus jobStatus) {
-      if (slaKilled) {
+    // cancel the job and returns true if the job status remains ORCHESTRATED for some specific time
+    private boolean killOrphanFlows(DagNode node, JobStatus jobStatus)

Review comment:
killOrphanFlows -> killJobIfOrphaned

Issue Time Tracking
---
Worklog Id: (was: 330269)
Time Spent: 0.5h (was: 20m)
[jira] [Work logged] (GOBBLIN-917) gaas jobs not able to report status should be killed after some time
[ https://issues.apache.org/jira/browse/GOBBLIN-917?focusedWorklogId=330270=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-330270 ]

ASF GitHub Bot logged work on GOBBLIN-917:
--
Author: ASF GitHub Bot
Created on: 18/Oct/19 04:19
Start Date: 18/Oct/19 04:19
Worklog Time Spent: 10m
Work Description: sv2000 commented on pull request #2769: [GOBBLIN-917] kill orphan gaas jobs
URL: https://github.com/apache/incubator-gobblin/pull/2769#discussion_r336314407

## File path: gobblin-service/src/main/java/org/apache/gobblin/service/modules/orchestration/DagManager.java ##
@@ -491,12 +486,14 @@ private void pollAndAdvanceDag() throws IOException, ExecutionException, Interru
       Map>> nextSubmitted = Maps.newHashMap();
       List> nodesToCleanUp = Lists.newArrayList();

-      for (DagNode node: this.jobToDag.keySet()) {
+      for (DagNode node : this.jobToDag.keySet()) {
         boolean slaKilled = slaKillIfNeeded(node);
         JobStatus jobStatus = pollJobStatus(node);
-        ExecutionStatus status = getJobExecutionStatus(slaKilled, jobStatus);
+        boolean killOrphanFlow = killOrphanFlows(node, jobStatus);

Review comment:
killOrphanFlows -> killJobIfOrphaned(node, jobStatus);

Issue Time Tracking
---
Worklog Id: (was: 330270)
Time Spent: 0.5h (was: 20m)
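The hunk in this message shows the poll loop gaining a second kill flag next to `slaKilled`. One way the two flags might feed into the job's effective status is sketched below. The diff is cut off before the new call to `getJobExecutionStatus`, so the three-argument shape used here is an assumption, as is treating a missing status as PENDING. `NodeStatus` and `PollLoopSketch` are hypothetical names, and plain strings stand in for DAG nodes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical, simplified status enum; not Gobblin's ExecutionStatus.
enum NodeStatus { PENDING, ORCHESTRATED, RUNNING, COMPLETE, CANCELLED }

final class PollLoopSketch {
  private PollLoopSketch() {}

  /**
   * Either kill path overrides whatever was polled. A missing status is
   * treated as PENDING, which is an assumption of this sketch.
   */
  static NodeStatus resolveStatus(boolean slaKilled, boolean orphanKilled, NodeStatus polled) {
    if (slaKilled || orphanKilled) {
      return NodeStatus.CANCELLED;
    }
    return polled == null ? NodeStatus.PENDING : polled;
  }

  /** Returns the node names whose resolved status is CANCELLED, in map iteration order. */
  static List<String> cancelledNodes(Map<String, NodeStatus> polledByNode,
      Set<String> slaKilled, Set<String> orphanKilled) {
    List<String> cancelled = new ArrayList<>();
    for (Map.Entry<String, NodeStatus> entry : polledByNode.entrySet()) {
      String node = entry.getKey();
      NodeStatus status =
          resolveStatus(slaKilled.contains(node), orphanKilled.contains(node), entry.getValue());
      if (status == NodeStatus.CANCELLED) {
        cancelled.add(node);
      }
    }
    return cancelled;
  }
}
```

Keeping the status resolution in a pure function separate from the code that issues the cancel makes the precedence rule (a kill always wins over the polled status) straightforward to test on its own.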
[jira] [Work logged] (GOBBLIN-917) gaas jobs not able to report status should be killed after some time
[ https://issues.apache.org/jira/browse/GOBBLIN-917?focusedWorklogId=330220=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-330220 ]

ASF GitHub Bot logged work on GOBBLIN-917:
--
Author: ASF GitHub Bot
Created on: 18/Oct/19 00:58
Start Date: 18/Oct/19 00:58
Worklog Time Spent: 10m
Work Description: codecov-io commented on issue #2769: [GOBBLIN-917] kill orphan gaas jobs
URL: https://github.com/apache/incubator-gobblin/pull/2769#issuecomment-543433401

# [Codecov](https://codecov.io/gh/apache/incubator-gobblin/pull/2769?src=pr=h1) Report
> Merging [#2769](https://codecov.io/gh/apache/incubator-gobblin/pull/2769?src=pr=desc) into [master](https://codecov.io/gh/apache/incubator-gobblin/commit/b92f516a4e7a9b2c8bd2e90d71026fb07cb56da7?src=pr=desc) will **increase** coverage by `<.01%`.
> The diff coverage is `70.58%`.

[![Impacted file tree graph](https://codecov.io/gh/apache/incubator-gobblin/pull/2769/graphs/tree.svg?width=650=4MgURJ0bGc=150=pr)](https://codecov.io/gh/apache/incubator-gobblin/pull/2769?src=pr=tree)

```diff
@@             Coverage Diff             @@
##            master    #2769     +/-   ##
===========================================
+ Coverage     45.3%    45.3%   +<.01%
- Complexity    8846     8847      +1
===========================================
  Files         1892     1892
  Lines        70752    70766     +14
  Branches      7773     7776      +3
===========================================
+ Hits         32054    32062      +8
- Misses       35740    35746      +6
  Partials      2958     2958
```

| [Impacted Files](https://codecov.io/gh/apache/incubator-gobblin/pull/2769?src=pr=tree) | Coverage Δ | Complexity Δ | |
|---|---|---|---|
| [...pache/gobblin/configuration/ConfigurationKeys.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2769/diff?src=pr=tree#diff-Z29iYmxpbi1hcGkvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2dvYmJsaW4vY29uZmlndXJhdGlvbi9Db25maWd1cmF0aW9uS2V5cy5qYXZh) | `0% <ø> (ø)` | `0 <0> (ø)` | :arrow_down: |
| [...service/modules/orchestration/DagManagerUtils.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2769/diff?src=pr=tree#diff-Z29iYmxpbi1zZXJ2aWNlL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9nb2JibGluL3NlcnZpY2UvbW9kdWxlcy9vcmNoZXN0cmF0aW9uL0RhZ01hbmFnZXJVdGlscy5qYXZh) | `79.74% <0%> (-4.26%)` | `32 <0> (ø)` | |
| [...blin/service/modules/orchestration/DagManager.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2769/diff?src=pr=tree#diff-Z29iYmxpbi1zZXJ2aWNlL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9nb2JibGluL3NlcnZpY2UvbW9kdWxlcy9vcmNoZXN0cmF0aW9uL0RhZ01hbmFnZXIuamF2YQ==) | `79.29% <92.3%> (+0.29%)` | `12 <0> (ø)` | :arrow_down: |
| [...lin/util/filesystem/FileSystemInstrumentation.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2769/diff?src=pr=tree#diff-Z29iYmxpbi11dGlsaXR5L3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9nb2JibGluL3V0aWwvZmlsZXN5c3RlbS9GaWxlU3lzdGVtSW5zdHJ1bWVudGF0aW9uLmphdmE=) | `85.71% <0%> (-7.15%)` | `3% <0%> (ø)` | |
| [...in/java/org/apache/gobblin/cluster/HelixUtils.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2769/diff?src=pr=tree#diff-Z29iYmxpbi1jbHVzdGVyL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9nb2JibGluL2NsdXN0ZXIvSGVsaXhVdGlscy5qYXZh) | `35.51% <0%> (-3.74%)` | `12% <0%> (-1%)` | |
| [...main/java/org/apache/gobblin/util/HadoopUtils.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2769/diff?src=pr=tree#diff-Z29iYmxpbi11dGlsaXR5L3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9nb2JibGluL3V0aWwvSGFkb29wVXRpbHMuamF2YQ==) | `30.87% <0%> (+0.67%)` | `25% <0%> (+1%)` | :arrow_up: |
| [.../apache/gobblin/runtime/api/JobExecutionState.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2769/diff?src=pr=tree#diff-Z29iYmxpbi1ydW50aW1lL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9nb2JibGluL3J1bnRpbWUvYXBpL0pvYkV4ZWN1dGlvblN0YXRlLmphdmE=) | `80.37% <0%> (+0.93%)` | `24% <0%> (ø)` | :arrow_down: |
| [.../org/apache/gobblin/metrics/RootMetricContext.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2769/diff?src=pr=tree#diff-Z29iYmxpbi1tZXRyaWNzLWxpYnMvZ29iYmxpbi1tZXRyaWNzLWJhc2Uvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2dvYmJsaW4vbWV0cmljcy9Sb290TWV0cmljQ29udGV4dC5qYXZh) | `79.68% <0%> (+1.56%)` | `16% <0%> (+1%)` | :arrow_up: |

------

[Continue to review full report at Codecov](https://codecov.io/gh/apache/incubator-gobblin/pull/2769?src=pr=continue).
> **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta)
> `Δ = absolute (impact)`, `ø = not affected`, `? = missing data`
> Powered by [Codecov](https://codecov.io/gh/apache/incubator-gobblin/pull/2769?src=pr=footer). Last update [b92f516...d8892ae](https://codecov.io/gh/apache/incubator-gobblin/pull/2769?src=pr=lastupdated). Read the
[jira] [Work logged] (GOBBLIN-917) gaas jobs not able to report status should be killed after some time
[ https://issues.apache.org/jira/browse/GOBBLIN-917?focusedWorklogId=330190=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-330190 ]

ASF GitHub Bot logged work on GOBBLIN-917:
--
Author: ASF GitHub Bot
Created on: 17/Oct/19 23:23
Start Date: 17/Oct/19 23:23
Worklog Time Spent: 10m
Work Description: arjun4084346 commented on pull request #2769: [GOBBLIN-917] kill orphan gaas jobs
URL: https://github.com/apache/incubator-gobblin/pull/2769

Dear Gobblin maintainers,

Please accept this PR. I understand that it will not be reviewed until I have checked off all the steps below!

### JIRA
- [x] My PR addresses the following [Gobblin JIRA](https://issues.apache.org/jira/browse/GOBBLIN/) issues and references them in the PR title. For example, "[GOBBLIN-XXX] My Gobblin PR"
    - https://issues.apache.org/jira/browse/GOBBLIN-917

### Description
- [x] Here are some details about my PR, including screenshots (if applicable):
The DAG manager thread should wait for a configurable amount of time and then kill the job if it does not see the job running.

### Tests
- [x] My PR adds the following unit tests __OR__ does not need testing for this extremely good reason:
Unit test added.

### Commits
- [x] My commits all reference JIRA issues in their subject lines, and I have squashed multiple commits if they address the same issue. In addition, my commits follow the guidelines from "[How to write a good git commit message](http://chris.beams.io/posts/git-commit/)":
    1. Subject is separated from body by a blank line
    2. Subject is limited to 50 characters
    3. Subject does not end with a period
    4. Subject uses the imperative mood ("add", not "adding")
    5. Body wraps at 72 characters
    6. Body explains "what" and "why", not "how"

Issue Time Tracking
---
Worklog Id: (was: 330190)
Remaining Estimate: 0h
Time Spent: 10m
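The PR description says the wait before killing a job should be configurable. A minimal sketch of reading such a setting is given below, assuming a plain `java.util.Properties` source. The key name and the 600-second default are invented for illustration; they are not taken from `ConfigurationKeys` or from the PR.

```java
import java.time.Duration;
import java.util.Properties;

/** Illustrative reader (hypothetical name) for a job start timeout setting. */
final class JobStartTimeoutConfig {
  // Hypothetical key and default, for illustration only.
  static final String KEY = "example.dagmanager.jobStartTimeoutSeconds";
  static final long DEFAULT_SECONDS = 600L;

  private JobStartTimeoutConfig() {}

  /** Returns the configured timeout, or the default when the key is absent or blank. */
  static Duration fromProperties(Properties props) {
    String raw = props.getProperty(KEY);
    if (raw == null || raw.trim().isEmpty()) {
      return Duration.ofSeconds(DEFAULT_SECONDS);
    }
    long seconds = Long.parseLong(raw.trim());
    if (seconds <= 0) {
      throw new IllegalArgumentException(KEY + " must be positive, got " + seconds);
    }
    return Duration.ofSeconds(seconds);
  }
}
```

Rejecting zero and negative values up front is a choice made for this sketch, since a non-positive timeout would cause every job still in ORCHESTRATED state to be cancelled on the first poll.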