[jira] [Work logged] (GOBBLIN-771) emit a few metrics for gobblin service
[ https://issues.apache.org/jira/browse/GOBBLIN-771?focusedWorklogId=247803&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-247803 ]

ASF GitHub Bot logged work on GOBBLIN-771:
--
Author: ASF GitHub Bot
Created on: 24/May/19 01:06
Start Date: 24/May/19 01:06
Worklog Time Spent: 10m
Work Description: arjun4084346 commented on pull request #2635: [GOBBLIN-771] add a few metrics for gobblin service
URL: https://github.com/apache/incubator-gobblin/pull/2635#discussion_r287184812

## File path: gobblin-service/src/main/java/org/apache/gobblin/service/modules/orchestration/Orchestrator.java ##

@@ -241,10 +233,15 @@ public void orchestrate(Spec spec) throws Exception {
 if (!canRun(flowName, flowGroup, allowConcurrentExecution)) {
 _log.warn("Another instance of flowGroup: {}, flowName: {} running; Skipping flow execution since "
 + "concurrent executions are disabled for this flow.", flowGroup, flowName);
-if (this.flowAlreadyRunningGauge.isPresent()) {
- this.jobAlreadyRunning.incrementAndGet();
-}
+// We send a gauge with value 0 signifying that the flow could not be compiled because previous execution is already running
+metricContext.newContextAwareGauge(
+ MetricRegistry.name(MetricReportUtils.GOBBLIN_SERVICE_METRICS_PREFIX, flowGroup, flowName, ServiceMetricNames.COMPILED),
+() -> 0L);

Review comment: No, 0 and 1 seem more intuitive. 0 means not compiled, 1 means compiled?

This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at: us...@infra.apache.org

Issue Time Tracking
---
Worklog Id: (was: 247803)
Time Spent: 1h 10m (was: 1h)

> emit a few metrics for gobblin service
> --
>
> Key: GOBBLIN-771
> URL: https://issues.apache.org/jira/browse/GOBBLIN-771
> Project: Apache Gobblin
> Issue Type: Task
> Reporter: Arjun Singh Bora
> Priority: Major
> Time Spent: 1h 10m
> Remaining Estimate: 0h

--
This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (GOBBLIN-771) emit a few metrics for gobblin service
[ https://issues.apache.org/jira/browse/GOBBLIN-771?focusedWorklogId=247804&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-247804 ]

ASF GitHub Bot logged work on GOBBLIN-771:
--
Author: ASF GitHub Bot
Created on: 24/May/19 01:08
Start Date: 24/May/19 01:08
Worklog Time Spent: 10m
Work Description: arjun4084346 commented on pull request #2635: [GOBBLIN-771] add a few metrics for gobblin service
URL: https://github.com/apache/incubator-gobblin/pull/2635#discussion_r287185042

## File path: gobblin-metrics-libs/gobblin-metrics/src/main/java/org/apache/gobblin/metrics/ServiceMetricNames.java ##

@@ -37,5 +37,5 @@
 public static final String RUN_IMMEDIATELY_FLOW_METER = "RunImmediatelyFlow";
 public static final String RUNNING_FLOWS_COUNTER = "RunningFlows";
- public static final String FLOWS_ALREADY_RUNNING_GAUGE = "FlowsAlreadyRunning";
+ public static final String COMPILED = "Compiled";

Review comment: I think 'Compiled' is ok, because we are appending flow details before it. e.g. ktwo.encryption_holdem_faro.Compiled = 1/0 seems better than ktwo.encryption_holdem_faro.CompiledFlows = 1/0

Issue Time Tracking
---
Worklog Id: (was: 247804)
Time Spent: 1h 20m (was: 1h 10m)

> emit a few metrics for gobblin service
> --
>
> Key: GOBBLIN-771
> URL: https://issues.apache.org/jira/browse/GOBBLIN-771
> Project: Apache Gobblin
> Issue Type: Task
> Reporter: Arjun Singh Bora
> Priority: Major
> Time Spent: 1h 20m
> Remaining Estimate: 0h
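For readers following the naming discussion in this thread, here is a minimal sketch of how the dotted metric name and the 0/1 value fit together. It is plain Java and does not use the metrics library: `metricName` is a hypothetical stand-in for `MetricRegistry.name`, the `Supplier<Long>` stands in for a gauge, and the service-wide prefix from the diff is left out so the result matches the example quoted in the review comment.

```java
import java.util.StringJoiner;
import java.util.function.Supplier;

class CompiledGaugeSketch {
  // Same suffix as the constant proposed in the pull request.
  static final String COMPILED = "Compiled";

  /** Hypothetical helper: joins the non-empty parts with dots. */
  static String metricName(String... parts) {
    StringJoiner joiner = new StringJoiner(".");
    for (String part : parts) {
      if (part != null && !part.isEmpty()) {
        joiner.add(part);
      }
    }
    return joiner.toString();
  }

  /** Stand-in for a gauge: reports 1 if the flow compiled, 0 if it did not. */
  static Supplier<Long> compiledGauge(boolean compiled) {
    return () -> compiled ? 1L : 0L;
  }

  public static void main(String[] args) {
    // Flow group and flow name are taken from the example in the review comment.
    String name = metricName("ktwo", "encryption_holdem_faro", COMPILED);
    // Skipped because a previous execution is still running, so report 0.
    System.out.println(name + " = " + compiledGauge(false).get());
  }
}
```

Because the flow group and flow name already appear in the name, the short suffix stays readable, which is the argument made in the comment for keeping "Compiled".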
[jira] [Work logged] (GOBBLIN-775) Add job level retry for gobblin service
[ https://issues.apache.org/jira/browse/GOBBLIN-775?focusedWorklogId=247743&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-247743 ]

ASF GitHub Bot logged work on GOBBLIN-775:
--
Author: ASF GitHub Bot
Created on: 23/May/19 22:11
Start Date: 23/May/19 22:11
Worklog Time Spent: 10m
Work Description: jack-moseley commented on pull request #2640: [GOBBLIN-775] Add job level retries for gobblin service
URL: https://github.com/apache/incubator-gobblin/pull/2640#discussion_r287153533

## File path: gobblin-service/src/main/java/org/apache/gobblin/service/modules/orchestration/DagManager.java ##

@@ -391,6 +392,10 @@ private void pollAndAdvanceDag()
 jobExecutionPlan.setExecutionStatus(RUNNING);
 break;
 }
+
+if (jobStatus.isShouldRetry()) {

Review comment: Yes, and they also have their status updated to running at the same time.

Issue Time Tracking
---
Worklog Id: (was: 247743)
Time Spent: 1h 20m (was: 1h 10m)

> Add job level retry for gobblin service
> ---
>
> Key: GOBBLIN-775
> URL: https://issues.apache.org/jira/browse/GOBBLIN-775
> Project: Apache Gobblin
> Issue Type: New Feature
> Components: gobblin-service
> Reporter: Jack Moseley
> Assignee: Abhishek Tiwari
> Priority: Major
> Time Spent: 1h 20m
> Remaining Estimate: 0h
[jira] [Work logged] (GOBBLIN-775) Add job level retry for gobblin service
[ https://issues.apache.org/jira/browse/GOBBLIN-775?focusedWorklogId=247740&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-247740 ]

ASF GitHub Bot logged work on GOBBLIN-775:
--
Author: ASF GitHub Bot
Created on: 23/May/19 22:10
Start Date: 23/May/19 22:10
Worklog Time Spent: 10m
Work Description: jack-moseley commented on pull request #2640: [GOBBLIN-775] Add job level retries for gobblin service
URL: https://github.com/apache/incubator-gobblin/pull/2640#discussion_r287153336

## File path: gobblin-service/src/main/java/org/apache/gobblin/service/modules/orchestration/DagManager.java ##

@@ -457,6 +462,15 @@ private void submitJob(DagNode dagNode) {
 JobSpec jobSpec = DagManagerUtils.getJobSpec(dagNode);
 Map jobMetadata = TimingEventUtils.getJobMetadata(Maps.newHashMap(), jobExecutionPlan);
+ // Increment submission attempt

Review comment: I don't think the incrementing logic should be in there because it could be called at a time other than job submission. But I moved the logic for updating the map to there.

Issue Time Tracking
---
Worklog Id: (was: 247740)
Time Spent: 1h 10m (was: 1h)

> Add job level retry for gobblin service
> ---
>
> Key: GOBBLIN-775
> URL: https://issues.apache.org/jira/browse/GOBBLIN-775
> Project: Apache Gobblin
> Issue Type: New Feature
> Components: gobblin-service
> Reporter: Jack Moseley
> Assignee: Abhishek Tiwari
> Priority: Major
> Time Spent: 1h 10m
> Remaining Estimate: 0h
[jira] [Work logged] (GOBBLIN-771) emit a few metrics for gobblin service
[ https://issues.apache.org/jira/browse/GOBBLIN-771?focusedWorklogId=247570&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-247570 ]

ASF GitHub Bot logged work on GOBBLIN-771:
--
Author: ASF GitHub Bot
Created on: 23/May/19 17:31
Start Date: 23/May/19 17:31
Worklog Time Spent: 10m
Work Description: sv2000 commented on pull request #2635: [GOBBLIN-771] add a few metrics for gobblin service
URL: https://github.com/apache/incubator-gobblin/pull/2635#discussion_r287054481

## File path: gobblin-service/src/main/java/org/apache/gobblin/service/modules/orchestration/Orchestrator.java ##

@@ -241,10 +233,15 @@ public void orchestrate(Spec spec) throws Exception {
 if (!canRun(flowName, flowGroup, allowConcurrentExecution)) {
 _log.warn("Another instance of flowGroup: {}, flowName: {} running; Skipping flow execution since "
 + "concurrent executions are disabled for this flow.", flowGroup, flowName);
-if (this.flowAlreadyRunningGauge.isPresent()) {
- this.jobAlreadyRunning.incrementAndGet();
-}
+// We send a gauge with value 0 signifying that the flow could not be compiled because previous execution is already running
+metricContext.newContextAwareGauge(
+ MetricRegistry.name(MetricReportUtils.GOBBLIN_SERVICE_METRICS_PREFIX, flowGroup, flowName, ServiceMetricNames.COMPILED),
+() -> 0L);

Review comment: Can we use gauge with values 1 and 2 instead of 0 and 1?

Issue Time Tracking
---
Worklog Id: (was: 247570)
Time Spent: 1h (was: 50m)

> emit a few metrics for gobblin service
> --
>
> Key: GOBBLIN-771
> URL: https://issues.apache.org/jira/browse/GOBBLIN-771
> Project: Apache Gobblin
> Issue Type: Task
> Reporter: Arjun Singh Bora
> Priority: Major
> Time Spent: 1h
> Remaining Estimate: 0h
[jira] [Work logged] (GOBBLIN-771) emit a few metrics for gobblin service
[ https://issues.apache.org/jira/browse/GOBBLIN-771?focusedWorklogId=247569&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-247569 ]

ASF GitHub Bot logged work on GOBBLIN-771:
--
Author: ASF GitHub Bot
Created on: 23/May/19 17:31
Start Date: 23/May/19 17:31
Worklog Time Spent: 10m
Work Description: sv2000 commented on pull request #2635: [GOBBLIN-771] add a few metrics for gobblin service
URL: https://github.com/apache/incubator-gobblin/pull/2635#discussion_r287041859

## File path: gobblin-metrics-libs/gobblin-metrics/src/main/java/org/apache/gobblin/metrics/ServiceMetricNames.java ##

@@ -37,5 +37,5 @@
 public static final String RUN_IMMEDIATELY_FLOW_METER = "RunImmediatelyFlow";
 public static final String RUNNING_FLOWS_COUNTER = "RunningFlows";
- public static final String FLOWS_ALREADY_RUNNING_GAUGE = "FlowsAlreadyRunning";
+ public static final String COMPILED = "Compiled";

Review comment: Maybe "CompiledFlows"?

Issue Time Tracking
---
Worklog Id: (was: 247569)
Time Spent: 50m (was: 40m)

> emit a few metrics for gobblin service
> --
>
> Key: GOBBLIN-771
> URL: https://issues.apache.org/jira/browse/GOBBLIN-771
> Project: Apache Gobblin
> Issue Type: Task
> Reporter: Arjun Singh Bora
> Priority: Major
> Time Spent: 50m
> Remaining Estimate: 0h
[jira] [Work logged] (GOBBLIN-775) Add job level retry for gobblin service
[ https://issues.apache.org/jira/browse/GOBBLIN-775?focusedWorklogId=247561&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-247561 ]

ASF GitHub Bot logged work on GOBBLIN-775:
--
Author: ASF GitHub Bot
Created on: 23/May/19 17:30
Start Date: 23/May/19 17:30
Worklog Time Spent: 10m
Work Description: autumnust commented on pull request #2640: [GOBBLIN-775] Add job level retries for gobblin service
URL: https://github.com/apache/incubator-gobblin/pull/2640#discussion_r287050503

## File path: gobblin-service/src/main/java/org/apache/gobblin/service/modules/orchestration/DagManagerUtils.java ##

@@ -75,8 +75,10 @@ static String getJobName(DagNode dagNode) {
 * @return a fully qualified name of the underlying job.
 */
 static String getFullyQualifiedJobName(DagNode dagNode) {
-Config jobConfig = dagNode.getValue().getJobSpec().getConfig();
+return getFullyQualifiedJobName(dagNode.getValue().getJobSpec().getConfig());
+ }
+ public static String getFullyQualifiedJobName(Config jobConfig) {

Review comment: Let's give it another name instead.

Issue Time Tracking
---
Worklog Id: (was: 247561)
Time Spent: 40m (was: 0.5h)

> Add job level retry for gobblin service
> ---
>
> Key: GOBBLIN-775
> URL: https://issues.apache.org/jira/browse/GOBBLIN-775
> Project: Apache Gobblin
> Issue Type: New Feature
> Components: gobblin-service
> Reporter: Jack Moseley
> Assignee: Abhishek Tiwari
> Priority: Major
> Time Spent: 40m
> Remaining Estimate: 0h
[jira] [Work logged] (GOBBLIN-775) Add job level retry for gobblin service
[ https://issues.apache.org/jira/browse/GOBBLIN-775?focusedWorklogId=247563&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-247563 ]

ASF GitHub Bot logged work on GOBBLIN-775:
--
Author: ASF GitHub Bot
Created on: 23/May/19 17:30
Start Date: 23/May/19 17:30
Worklog Time Spent: 10m
Work Description: autumnust commented on pull request #2640: [GOBBLIN-775] Add job level retries for gobblin service
URL: https://github.com/apache/incubator-gobblin/pull/2640#discussion_r287053928

## File path: gobblin-service/src/main/java/org/apache/gobblin/service/modules/orchestration/DagManager.java ##

@@ -391,6 +392,10 @@ private void pollAndAdvanceDag()
 jobExecutionPlan.setExecutionStatus(RUNNING);
 break;
 }
+
+if (jobStatus.isShouldRetry()) {

Review comment: Will only a failed job's status have this flag turned on?

Issue Time Tracking
---
Worklog Id: (was: 247563)
Time Spent: 1h (was: 50m)

> Add job level retry for gobblin service
> ---
>
> Key: GOBBLIN-775
> URL: https://issues.apache.org/jira/browse/GOBBLIN-775
> Project: Apache Gobblin
> Issue Type: New Feature
> Components: gobblin-service
> Reporter: Jack Moseley
> Assignee: Abhishek Tiwari
> Priority: Major
> Time Spent: 1h
> Remaining Estimate: 0h
[jira] [Work logged] (GOBBLIN-775) Add job level retry for gobblin service
[ https://issues.apache.org/jira/browse/GOBBLIN-775?focusedWorklogId=247562&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-247562 ]

ASF GitHub Bot logged work on GOBBLIN-775:
--
Author: ASF GitHub Bot
Created on: 23/May/19 17:30
Start Date: 23/May/19 17:30
Worklog Time Spent: 10m
Work Description: autumnust commented on pull request #2640: [GOBBLIN-775] Add job level retries for gobblin service
URL: https://github.com/apache/incubator-gobblin/pull/2640#discussion_r287051518

## File path: gobblin-service/src/main/java/org/apache/gobblin/service/modules/orchestration/DagManager.java ##

@@ -457,6 +462,15 @@ private void submitJob(DagNode dagNode) {
 JobSpec jobSpec = DagManagerUtils.getJobSpec(dagNode);
 Map jobMetadata = TimingEventUtils.getJobMetadata(Maps.newHashMap(), jobExecutionPlan);
+ // Increment submission attempt

Review comment: Should this block be part of `TimingEventUtils.getJobMetadata`?

Issue Time Tracking
---
Worklog Id: (was: 247562)
Time Spent: 50m (was: 40m)

> Add job level retry for gobblin service
> ---
>
> Key: GOBBLIN-775
> URL: https://issues.apache.org/jira/browse/GOBBLIN-775
> Project: Apache Gobblin
> Issue Type: New Feature
> Components: gobblin-service
> Reporter: Jack Moseley
> Assignee: Abhishek Tiwari
> Priority: Major
> Time Spent: 50m
> Remaining Estimate: 0h
[jira] [Updated] (GOBBLIN-780) Handle scenarios that cause the YarnAutoScalingManager to be stuck
[ https://issues.apache.org/jira/browse/GOBBLIN-780?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hung Tran updated GOBBLIN-780:
--
Summary: Handle scenarios that cause the YarnAutoScalingManager to be stuck (was: Handle scenarios that causes the YarnAutoScalingManager to be stuck)

> Handle scenarios that cause the YarnAutoScalingManager to be stuck
> --
>
> Key: GOBBLIN-780
> URL: https://issues.apache.org/jira/browse/GOBBLIN-780
> Project: Apache Gobblin
> Issue Type: Task
> Reporter: Hung Tran
> Priority: Major
>
> Issue 1: The YarnAutoScalingRunnable is run on a fixed schedule by a ScheduledExecutorService in YarnAutoScalingManager. If the runnable encounters an exception, the executor service will stop scheduling it. Catch all exceptions in the runnable, log, and do not re-raise.
> Issue 2: The auto scaler may reduce the container count to 0. Helix will not schedule any flows if there are no participants connected. This results in the auto scaler keeping the container count at 0 and no progress is made. Fix this by not allowing the container count to be reduced below 1.
[jira] [Work logged] (GOBBLIN-780) Handle scenarios that cause the YarnAutoScalingManager to be stuck
[ https://issues.apache.org/jira/browse/GOBBLIN-780?focusedWorklogId=247520&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-247520 ]

ASF GitHub Bot logged work on GOBBLIN-780:
--
Author: ASF GitHub Bot
Created on: 23/May/19 16:29
Start Date: 23/May/19 16:29
Worklog Time Spent: 10m
Work Description: htran1 commented on pull request #2644: [GOBBLIN-780] Handle scenarios that cause the YarnAutoScalingManager …
URL: https://github.com/apache/incubator-gobblin/pull/2644

…to be stuck

Dear Gobblin maintainers,

Please accept this PR. I understand that it will not be reviewed until I have checked off all the steps below!

### JIRA
- [X] My PR addresses the following [Gobblin JIRA](https://issues.apache.org/jira/browse/GOBBLIN/) issues and references them in the PR title. For example, "[GOBBLIN-XXX] My Gobblin PR"
    - https://issues.apache.org/jira/browse/GOBBLIN-780

### Description
- [X] Here are some details about my PR, including screenshots (if applicable):

Issue 1: The YarnAutoScalingRunnable is run on a fixed schedule by a ScheduledExecutorService in YarnAutoScalingManager. If the runnable encounters an exception, the executor service will stop scheduling it. Catch all exceptions in the runnable, log, and do not re-raise.

Issue 2: The auto scaler may reduce the container count to 0. Helix will not schedule any flows if there are no participants connected. This results in the auto scaler keeping the container count at 0 and no progress is made. Fix this by not allowing the container count to be reduced below 1.

### Tests
- [X] My PR adds the following unit tests __OR__ does not need testing for this extremely good reason:

### Commits
- [X] My commits all reference JIRA issues in their subject lines, and I have squashed multiple commits if they address the same issue. In addition, my commits follow the guidelines from "[How to write a good git commit message](http://chris.beams.io/posts/git-commit/)":
    1. Subject is separated from body by a blank line
    2. Subject is limited to 50 characters
    3. Subject does not end with a period
    4. Subject uses the imperative mood ("add", not "adding")
    5. Body wraps at 72 characters
    6. Body explains "what" and "why", not "how"

Issue Time Tracking
---
Worklog Id: (was: 247520)
Time Spent: 10m
Remaining Estimate: 0h

> Handle scenarios that cause the YarnAutoScalingManager to be stuck
> --
>
> Key: GOBBLIN-780
> URL: https://issues.apache.org/jira/browse/GOBBLIN-780
> Project: Apache Gobblin
> Issue Type: Task
> Reporter: Hung Tran
> Priority: Major
> Time Spent: 10m
> Remaining Estimate: 0h
>
> Issue 1: The YarnAutoScalingRunnable is run on a fixed schedule by a ScheduledExecutorService in YarnAutoScalingManager. If the runnable encounters an exception, the executor service will stop scheduling it. Catch all exceptions in the runnable, log, and do not re-raise.
> Issue 2: The auto scaler may reduce the container count to 0. Helix will not schedule any flows if there are no participants connected. This results in the auto scaler keeping the container count at 0 and no progress is made. Fix this by not allowing the container count to be reduced below 1.
[jira] [Created] (GOBBLIN-780) Handle scenarios that causes the YarnAutoScalingManager to be stuck
Hung Tran created GOBBLIN-780:
-
Summary: Handle scenarios that causes the YarnAutoScalingManager to be stuck
Key: GOBBLIN-780
URL: https://issues.apache.org/jira/browse/GOBBLIN-780
Project: Apache Gobblin
Issue Type: Task
Reporter: Hung Tran

Issue 1: The YarnAutoScalingRunnable is run on a fixed schedule by a ScheduledExecutorService in YarnAutoScalingManager. If the runnable encounters an exception, the executor service will stop scheduling it. Catch all exceptions in the runnable, log, and do not re-raise.

Issue 2: The auto scaler may reduce the container count to 0. Helix will not schedule any flows if there are no participants connected. This results in the auto scaler keeping the container count at 0 and no progress is made. Fix this by not allowing the container count to be reduced below 1.
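The two fixes described in this issue can be illustrated with a small, self-contained sketch. This is not the Gobblin code: the class name, `guarded`, `clampContainerCount` and `MIN_CONTAINERS` are hypothetical names chosen for the example. It relies only on the documented behavior of `ScheduledExecutorService.scheduleAtFixedRate`, which suppresses subsequent executions once a run throws.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

class AutoScalingSketch {
  // Assumed lower bound; the issue only says the count must not go below 1.
  static final int MIN_CONTAINERS = 1;

  /** Issue 2: never let the requested container count fall below the minimum. */
  static int clampContainerCount(int requested) {
    return Math.max(MIN_CONTAINERS, requested);
  }

  /** Issue 1: log and swallow exceptions so the executor keeps scheduling the task. */
  static Runnable guarded(Runnable task) {
    return () -> {
      try {
        task.run();
      } catch (Exception e) {
        // Do not re-raise: an escaping exception would cancel all future runs.
        System.err.println("Auto-scaling iteration failed: " + e);
      }
    };
  }

  /** Schedules a task that fails on its first run; true if it still runs three times. */
  static boolean survivesFailure() {
    ScheduledExecutorService executor = Executors.newSingleThreadScheduledExecutor();
    AtomicInteger runs = new AtomicInteger();
    CountDownLatch threeRuns = new CountDownLatch(3);
    Runnable flaky = () -> {
      int n = runs.incrementAndGet();
      threeRuns.countDown();
      if (n == 1) {
        throw new IllegalStateException("simulated failure");
      }
    };
    try {
      executor.scheduleAtFixedRate(guarded(flaky), 0, 10, TimeUnit.MILLISECONDS);
      return threeRuns.await(5, TimeUnit.SECONDS);
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      return false;
    } finally {
      executor.shutdownNow();
    }
  }

  public static void main(String[] args) {
    System.out.println("clamped count for 0: " + clampContainerCount(0));
    System.out.println("task kept running after a failure: " + survivesFailure());
  }
}
```

Without the `guarded` wrapper, the first simulated failure would end the schedule and the latch would never reach zero, which mirrors the stuck auto scaler described in Issue 1.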