[jira] [Updated] (GOBBLIN-531) Gobblin AWS Worker cannot start because of state store type and uri mismatch
[ https://issues.apache.org/jira/browse/GOBBLIN-531?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joel Baranick updated GOBBLIN-531: -- Attachment: Screen Shot 2018-07-12 at 8.47.07 AM.png > Gobblin AWS Worker cannot start because of state store type and uri mismatch > > > Key: GOBBLIN-531 > URL: https://issues.apache.org/jira/browse/GOBBLIN-531 > Project: Apache Gobblin > Issue Type: Bug > Components: gobblin-aws >Affects Versions: 0.12.0 >Reporter: Joel Baranick >Assignee: Abhishek Tiwari >Priority: Major > Labels: aws, helix > Attachments: Screen Shot 2018-07-12 at 8.47.07 AM.png > > > Something has changed from 0.10.0 to 0.12.0 which causes the _StateStores_ > class to be instantiated with a _state.store.fs.uri_ which is mismatched with > the _state.store.type_. > The problem seems to be from: > [GobblinTaskRunner.java#L250|https://github.com/apache/incubator-gobblin/blob/0.12.0/gobblin-cluster/src/main/java/org/apache/gobblin/cluster/GobblinTaskRunner.java#L250] > It creates a new _Config_ like: > > {code:java} > Config stateStoreJobConfig = ConfigUtils.propertiesToConfig(properties) > .withValue(ConfigurationKeys.STATE_STORE_FS_URI_KEY, > ConfigValueFactory.fromAnyRef(rootPathUri.toString())); > {code} > Compare this to: > [GobblinHelixJobLauncher.java#L156|https://github.com/apache/incubator-gobblin/blob/0.12.0/gobblin-cluster/src/main/java/org/apache/gobblin/cluster/GobblinHelixJobLauncher.java#L156] > > It creates a new _Config_ like: > > {code:java} > Config stateStoreJobConfig = ConfigUtils.propertiesToConfig(jobProps) > .withValue(ConfigurationKeys.STATE_STORE_FS_URI_KEY, > ConfigValueFactory.fromAnyRef( new URI(appWorkDir.toUri().getScheme(), null, > appWorkDir.toUri().getHost(), appWorkDir.toUri().getPort(), null, null, > null).toString())); > {code} > The following screenshot shows the callstack and the overridden value. > !Screen Shot 2018-07-12 at 8.47.07 AM.png! -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (GOBBLIN-531) Gobblin AWS Worker cannot start because of state store type and uri mismatch
Joel Baranick created GOBBLIN-531: - Summary: Gobblin AWS Worker cannot start because of state store type and uri mismatch Key: GOBBLIN-531 URL: https://issues.apache.org/jira/browse/GOBBLIN-531 Project: Apache Gobblin Issue Type: Bug Components: gobblin-aws Affects Versions: 0.12.0 Reporter: Joel Baranick Assignee: Abhishek Tiwari Something has changed from 0.10.0 to 0.12.0 which causes the _StateStores_ class to be instantiated with a _state.store.fs.uri_ which is mismatched with the _state.store.type_. The problem seems to be from: [GobblinTaskRunner.java#L250|https://github.com/apache/incubator-gobblin/blob/0.12.0/gobblin-cluster/src/main/java/org/apache/gobblin/cluster/GobblinTaskRunner.java#L250] It creates a new _Config_ like: {code:java} Config stateStoreJobConfig = ConfigUtils.propertiesToConfig(properties) .withValue(ConfigurationKeys.STATE_STORE_FS_URI_KEY, ConfigValueFactory.fromAnyRef(rootPathUri.toString())); {code} Compare this to: [GobblinHelixJobLauncher.java#L156|https://github.com/apache/incubator-gobblin/blob/0.12.0/gobblin-cluster/src/main/java/org/apache/gobblin/cluster/GobblinHelixJobLauncher.java#L156] It creates a new _Config_ like: {code:java} Config stateStoreJobConfig = ConfigUtils.propertiesToConfig(jobProps) .withValue(ConfigurationKeys.STATE_STORE_FS_URI_KEY, ConfigValueFactory.fromAnyRef( new URI(appWorkDir.toUri().getScheme(), null, appWorkDir.toUri().getHost(), appWorkDir.toUri().getPort(), null, null, null).toString())); {code} The following screenshot shows the callstack and the overridden value. !Screen Shot 2018-07-12 at 8.47.07 AM.png! -- This message was sent by Atlassian JIRA (v7.6.3#76005)
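The key difference between the two snippets in the report is that GobblinHelixJobLauncher strips the path component (keeping only scheme, host, and port) before setting _state.store.fs.uri_, while GobblinTaskRunner passes the full root-path URI through. A minimal sketch of that stripping behavior; the class name and URI values below are illustrative, not from the Gobblin codebase:

```java
import java.net.URI;
import java.net.URISyntaxException;

public class StateStoreUriExample {
    // Illustrative helper: keep only scheme/host/port of the work-dir URI,
    // mirroring what GobblinHelixJobLauncher does before setting
    // state.store.fs.uri. GobblinTaskRunner instead passes the full
    // root-path URI through, so the two values can disagree.
    static String fsRootUri(URI workDir) {
        try {
            return new URI(workDir.getScheme(), null, workDir.getHost(),
                    workDir.getPort(), null, null, null).toString();
        } catch (URISyntaxException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        URI workDir = URI.create("hdfs://namenode:8020/gobblin/work");
        // Launcher-style value drops the path; runner-style value keeps it.
        System.out.println(fsRootUri(workDir));   // hdfs://namenode:8020
        System.out.println(workDir);              // hdfs://namenode:8020/gobblin/work
    }
}
```

A state store implementation selected by _state.store.type_ may expect one form or the other, which is consistent with the mismatch the screenshot shows.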
[jira] [Resolved] (GOBBLIN-371) In gobblin_pr, Jira resolution fails if python jira package is not installed
[ https://issues.apache.org/jira/browse/GOBBLIN-371?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joel Baranick resolved GOBBLIN-371. --- Resolution: Fixed Fix Version/s: 0.13.0 Issue resolved by pull request #2246 [https://github.com/apache/incubator-gobblin/pull/2246] > In gobblin_pr, Jira resolution fails if python jira package is not installed > > > Key: GOBBLIN-371 > URL: https://issues.apache.org/jira/browse/GOBBLIN-371 > Project: Apache Gobblin > Issue Type: Bug >Reporter: Joel Baranick >Assignee: Joel Baranick >Priority: Major > Fix For: 0.13.0 > > > In gobblin_pr, Jira resolution fails if python jira package is not installed. > If this happens, there is no easy way to recover and you have to resolve the > jira issue manually. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (GOBBLIN-389) Gobblin class resolution requires all classes to be in gobblin packages
Joel Baranick created GOBBLIN-389: - Summary: Gobblin class resolution requires all classes to be in gobblin packages Key: GOBBLIN-389 URL: https://issues.apache.org/jira/browse/GOBBLIN-389 Project: Apache Gobblin Issue Type: Bug Reporter: Joel Baranick Gobblin performs classpath scanning to allow loading classes from configured aliases. The current mechanism forces classes to be in a few specific gobblin packages. This is confusing for users and increases support costs. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (GOBBLIN-371) In gobblin_pr, Jira resolution fails if python jira package is not installed
[ https://issues.apache.org/jira/browse/GOBBLIN-371?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16339425#comment-16339425 ] Joel Baranick commented on GOBBLIN-371: --- [~abti] Any feedback on this bug and the associated PR? > In gobblin_pr, Jira resolution fails if python jira package is not installed > > > Key: GOBBLIN-371 > URL: https://issues.apache.org/jira/browse/GOBBLIN-371 > Project: Apache Gobblin > Issue Type: Bug >Reporter: Joel Baranick >Assignee: Joel Baranick >Priority: Major > > In gobblin_pr, Jira resolution fails if python jira package is not installed. > If this happens, there is no easy way to recover and you have to resolve the > jira issue manually. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (GOBBLIN-318) Gobblin Helix Jobs Hang Indefinitely
[ https://issues.apache.org/jira/browse/GOBBLIN-318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16334662#comment-16334662 ] Joel Baranick commented on GOBBLIN-318: --- Great! I think we should leave this open as there is still some underlying issue that needs to be fixed. > Gobblin Helix Jobs Hang Indefinitely > - > > Key: GOBBLIN-318 > URL: https://issues.apache.org/jira/browse/GOBBLIN-318 > Project: Apache Gobblin > Issue Type: Bug >Reporter: Joel Baranick >Priority: Critical > > In some cases, gobblin helix jobs can hang indefinitely. When coupled with > job locks, this can result in a job becoming stuck and not progressing. The > only solution currently is to restart the master node. > Assume the following is for a {{job_myjob_1510884004834}} and which hung at > 2017-11-17 02:09:00 UTC and was still hung at 2017-11-17 09:12:00 UTC. > {{GobblinHelixJobLauncher.waitForJobCompletion()}} is never detecting the job > as completed. This results in the {{TaskStateCollectorService}} indefinitely > searching for more task states, even though it has processed all the task > states that are ever going to be produced. There is no reference to the hung > job in Zookeeper at {{/mycluster/CONFIGS/RESOURCE}}. In the Helix Web Admin, > the hung job doesn't exist at {{/clusters/mycluster/jobQueues/jobname}}. > There is no record of the job in Zookeeper at > {{/mycluster/PROPERTYSTORE/TaskRebalancer/jobname/Context}}. This means that > the {{GobblinHelixJobLauncher.waitForJobCompletion()}} code fails.
> {code:java} > private void waitForJobCompletion() throws InterruptedException { > while (true) { > WorkflowContext workflowContext = > TaskDriver.getWorkflowContext(this.helixManager, this.helixQueueName); > if (workflowContext != null) { > org.apache.helix.task.TaskState helixJobState = > workflowContext.getJobState(this.jobResourceName); > if (helixJobState == org.apache.helix.task.TaskState.COMPLETED || > helixJobState == org.apache.helix.task.TaskState.FAILED || > helixJobState == org.apache.helix.task.TaskState.STOPPED) { > return; > } > } > Thread.sleep(1000); > } > } > {code} > The code gets the job state from Zookeeper: > {code:javascript} > { > "id": "WorkflowContext", > "simpleFields": { > "START_TIME": "1505159715449", > "STATE": "IN_PROGRESS" > }, > "listFields": {}, > "mapFields": { > "JOB_STATES": { > "jobname_job_jobname_150741571": "COMPLETED", > "jobname_job_jobname_150775680": "COMPLETED", > "jobname_job_jobname_150795931": "COMPLETED", > "jobname_job_jobname_1509857102910": "COMPLETED", > "jobname_job_jobname_1510253708033": "COMPLETED", > "jobname_job_jobname_1510271102898": "COMPLETED", > "jobname_job_jobname_1510852210668": "COMPLETED", > "jobname_job_jobname_1510853133675": "COMPLETED" > } > } > } > {code} > But there is no information contained in the job state for the hung job. > Also, it is really strange that the job states contained in that json blob > are so old. The oldest one is from 2017-10-7 10:35:00 PM UTC, more than a > month ago. > I'm not sure how the system got in this state, but this isn't the first time > we have seen this. While it would be good to prevent this from happening, it > would also be good to allow the system to recover if this state is entered. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
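The polling loop quoted above has no exit path when the Helix WorkflowContext never reaches a terminal state, which is exactly the hang described. A bounded variant would let the launcher fail instead of spinning forever; the sketch below is illustrative (the probe interface stands in for the real TaskDriver/WorkflowContext lookup and is not a Gobblin or Helix API):

```java
public class BoundedJobWait {
    // Hypothetical stand-in for the TaskDriver/WorkflowContext lookup in the
    // loop above; returns the Helix job state name, or null when no context
    // exists yet (as in the hung-job case described in this issue).
    interface JobStateProbe {
        String poll();
    }

    // Same terminal-state check as waitForJobCompletion(), but with a
    // deadline so a job whose Helix context has vanished surfaces as a
    // failure instead of hanging indefinitely.
    static boolean waitForCompletion(JobStateProbe probe, long timeoutMs,
                                     long pollIntervalMs) throws InterruptedException {
        long deadline = System.currentTimeMillis() + timeoutMs;
        while (System.currentTimeMillis() < deadline) {
            String state = probe.poll();
            if ("COMPLETED".equals(state) || "FAILED".equals(state)
                    || "STOPPED".equals(state)) {
                return true;   // reached a terminal state
            }
            Thread.sleep(pollIntervalMs);
        }
        return false;          // caller can treat this as a hung job and abort
    }
}
```

This does not fix the root cause (the missing WorkflowContext), but it addresses the recovery point raised at the end of the issue description.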
[jira] [Commented] (GOBBLIN-378) Task only publish data when the state is successful in the earlier processing
[ https://issues.apache.org/jira/browse/GOBBLIN-378?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16329866#comment-16329866 ] Joel Baranick commented on GOBBLIN-378: --- What versions does this impact? Can you give more details to allow others to access whether they are impacted? > Task only publish data when the state is successful in the earlier processing > - > > Key: GOBBLIN-378 > URL: https://issues.apache.org/jira/browse/GOBBLIN-378 > Project: Apache Gobblin > Issue Type: Bug >Reporter: Kuai Yu >Assignee: Kuai Yu >Priority: Major > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Comment Edited] (GOBBLIN-378) Task only publish data when the state is successful in the earlier processing
[ https://issues.apache.org/jira/browse/GOBBLIN-378?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16329866#comment-16329866 ] Joel Baranick edited comment on GOBBLIN-378 at 1/18/18 1:40 AM: What versions does this impact? Can you give more details to allow others to assess whether they are impacted? was (Author: jbaranick): What versions does this impact? Can you give more details to allow others to access whether they are impacted? > Task only publish data when the state is successful in the earlier processing > - > > Key: GOBBLIN-378 > URL: https://issues.apache.org/jira/browse/GOBBLIN-378 > Project: Apache Gobblin > Issue Type: Bug >Reporter: Kuai Yu >Assignee: Kuai Yu >Priority: Major > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (GOBBLIN-318) Gobblin Helix Jobs Hang Indefinitely
[ https://issues.apache.org/jira/browse/GOBBLIN-318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16329381#comment-16329381 ] Joel Baranick commented on GOBBLIN-318: --- Retries to zk updates were added to master helix in the commit > Gobblin Helix Jobs Hang Indefinitely > - > > Key: GOBBLIN-318 > URL: https://issues.apache.org/jira/browse/GOBBLIN-318 > Project: Apache Gobblin > Issue Type: Bug >Reporter: Joel Baranick >Priority: Critical > > In some cases, gobblin helix jobs can hang indefinitely. When coupled with > job locks, this can result in a job becoming stuck and not progressing. The > only solution currently is to restart the master node. > Assume the following is for a {{job_myjob_1510884004834}} and which hung at > 2017-11-17 02:09:00 UTC and was still hung at 2017-11-17 09:12:00 UTC. > {{GobblinHelixJobLauncher.waitForJobCompletion()}} is never detecting the job > as completed. This results in the {{TaskStateCollectorService}} indefinitely > searching for more task states, even though it has processed all the task > states that are ever going to be produced. There is no reference to the hung > job in Zookeeper at {{/mycluster/CONFIGS/RESOURCE}}. In the Helix Web Admin, > the hung job doesn't exist at {{/clusters/mycluster/jobQueues/jobname}}. > There is no record of the job in Zookeeper at > {{/mycluster/PROPERTYSTORE/TaskRebalancer/jobname/Context}}. This means that > the {{GobblinHelixJobLauncher.waitForJobCompletion()}} code fails. 
> {code:java} > private void waitForJobCompletion() throws InterruptedException { > while (true) { > WorkflowContext workflowContext = > TaskDriver.getWorkflowContext(this.helixManager, this.helixQueueName); > if (workflowContext != null) { > org.apache.helix.task.TaskState helixJobState = > workflowContext.getJobState(this.jobResourceName); > if (helixJobState == org.apache.helix.task.TaskState.COMPLETED || > helixJobState == org.apache.helix.task.TaskState.FAILED || > helixJobState == org.apache.helix.task.TaskState.STOPPED) { > return; > } > } > Thread.sleep(1000); > } > } > {code} > The code gets the job state from Zookeeper: > {code:javascript} > { > "id": "WorkflowContext", > "simpleFields": { > "START_TIME": "1505159715449", > "STATE": "IN_PROGRESS" > }, > "listFields": {}, > "mapFields": { > "JOB_STATES": { > "jobname_job_jobname_150741571": "COMPLETED", > "jobname_job_jobname_150775680": "COMPLETED", > "jobname_job_jobname_150795931": "COMPLETED", > "jobname_job_jobname_1509857102910": "COMPLETED", > "jobname_job_jobname_1510253708033": "COMPLETED", > "jobname_job_jobname_1510271102898": "COMPLETED", > "jobname_job_jobname_1510852210668": "COMPLETED", > "jobname_job_jobname_1510853133675": "COMPLETED" > } > } > } > {code} > But there is no information contained in the job state for the hung job. > Also, it is really strange that the job states contained in that json blob > are so old. The oldest one is from 2017-10-7 10:35:00 PM UTC, more than a > month ago. > I'm not sure how the system got in this state, but this isn't the first time > we have seen this. While it would be good to prevent this from happening, it > would also be good to allow the system to recover if this state is entered. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (GOBBLIN-302) Handle stuck Helix workflow
[ https://issues.apache.org/jira/browse/GOBBLIN-302?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16329191#comment-16329191 ] Joel Baranick commented on GOBBLIN-302: --- This seems like it is related to: [GOBBLIN-318|https://issues.apache.org/jira/browse/GOBBLIN-318] > Handle stuck Helix workflow > --- > > Key: GOBBLIN-302 > URL: https://issues.apache.org/jira/browse/GOBBLIN-302 > Project: Apache Gobblin > Issue Type: Improvement >Reporter: Arjun Singh Bora >Assignee: Arjun Singh Bora >Priority: Trivial > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (GOBBLIN-373) Expose task executor auto scale metrics to external sensor
[ https://issues.apache.org/jira/browse/GOBBLIN-373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16327516#comment-16327516 ] Joel Baranick commented on GOBBLIN-373: --- [~yukuai518] Any more details? Just curious how these are going to be used. > Expose task executor auto scale metrics to external sensor > -- > > Key: GOBBLIN-373 > URL: https://issues.apache.org/jira/browse/GOBBLIN-373 > Project: Apache Gobblin > Issue Type: Task >Reporter: Kuai Yu >Assignee: Kuai Yu >Priority: Major > > This is used for LinkedIn inGraph integration. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (GOBBLIN-371) In gobblin_pr, Jira resolution fails if python jira package is not installed
Joel Baranick created GOBBLIN-371: - Summary: In gobblin_pr, Jira resolution fails if python jira package is not installed Key: GOBBLIN-371 URL: https://issues.apache.org/jira/browse/GOBBLIN-371 Project: Apache Gobblin Issue Type: Bug Reporter: Joel Baranick In gobblin_pr, Jira resolution fails if python jira package is not installed. If this happens, there is no easy way to recover and you have to resolve the jira issue manually. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Resolved] (GOBBLIN-207) Gobblin AWS requires job package to be publicly accessible
[ https://issues.apache.org/jira/browse/GOBBLIN-207?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joel Baranick resolved GOBBLIN-207. --- Resolution: Fixed Fix Version/s: 0.13.0 Issue resolved by pull request #2059 (https://github.com/apache/incubator-gobblin/pull/2059) > Gobblin AWS requires job package to be publicly accessible > -- > > Key: GOBBLIN-207 > URL: https://issues.apache.org/jira/browse/GOBBLIN-207 > Project: Apache Gobblin > Issue Type: Bug >Reporter: Joel Baranick >Assignee: Joel Baranick > Fix For: 0.13.0 > > > {{GobblinAwsJobConfigurationManager}} expects that the job configuration file > is publicly accessible so that it can be downloaded. This PR changes how the > download is done, using Hadoop FS, so that the job package can be stored on > filesystems that don't expose it over HTTP and so that authentication can be > performed. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Comment Edited] (GOBBLIN-207) Gobblin AWS requires job package to be publicly accessible
[ https://issues.apache.org/jira/browse/GOBBLIN-207?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16325018#comment-16325018 ] Joel Baranick edited comment on GOBBLIN-207 at 1/13/18 7:08 AM: Issue resolved by pull request #2059 https://github.com/apache/incubator-gobblin/pull/2059 was (Author: jbaranick): Issue resolved by pull request #2059 (https://github.com/apache/incubator-gobblin/pull/2059) > Gobblin AWS requires job package to be publicly accessible > -- > > Key: GOBBLIN-207 > URL: https://issues.apache.org/jira/browse/GOBBLIN-207 > Project: Apache Gobblin > Issue Type: Bug >Reporter: Joel Baranick >Assignee: Joel Baranick > Fix For: 0.13.0 > > > {{GobblinAwsJobConfigurationManager}} expects that the job configuration file > is publicly accessible so that it can be downloaded. This PR changes how the > download is done, using Hadoop FS, so that the job package can be stored on > filesystems that don't expose it over HTTP and so that authentication can be > performed. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Created] (GOBBLIN-360) Helix not pruning old Zookeeper data
Joel Baranick created GOBBLIN-360: - Summary: Helix not pruning old Zookeeper data Key: GOBBLIN-360 URL: https://issues.apache.org/jira/browse/GOBBLIN-360 Project: Apache Gobblin Issue Type: Bug Reporter: Joel Baranick Helix version 0.6.7 is not correctly pruning old data in zookeeper at path {{/root/cluster/PROPERTYSTORE/TaskRebalancer}}. This causes the zookeeper cluster to keep using more disk and memory. At some point, the number of children in the folder exceeds the default zookeeper {{jute.maxbuffer}} setting and the contents of {{/root/cluster/PROPERTYSTORE/TaskRebalancer}} cannot be listed, deleted, etc. The only resolution at that point to reduce data is to add the {{-Djute.maxbuffer=999}} system parameter to all zookeeper servers and the zkCli application, restart the zookeeper processes, connect via zkCli, and cleanup the data. Once done, {{-Djute.maxbuffer}} can be removed from the zookeeper servers and the zkCli application. Then restart the zookeeper server processes. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
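The recovery procedure described above can be sketched as follows; server names, the ZooKeeper path, and the buffer size are illustrative examples, not values taken from the issue:

```shell
# Illustrative sketch of the jute.maxbuffer recovery procedure.
# 1. On every ZooKeeper server (and for zkCli), raise jute.maxbuffer via the
#    JVM flags, then restart the server processes.
export JVMFLAGS="-Djute.maxbuffer=16777216"

# 2. Connect with zkCli (which inherits JVMFLAGS) and delete the oversized
#    subtree, e.g.:
#      zkCli.sh -server zk1:2181
#      rmr /root/cluster/PROPERTYSTORE/TaskRebalancer
#
# 3. Remove the JVMFLAGS override everywhere and restart the ZooKeeper
#    server processes again.
```

Note that jute.maxbuffer must match on servers and clients; a node larger than the limit on either side cannot be read or listed, which is why the override has to be applied to both before cleanup.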
[jira] [Updated] (GOBBLIN-359) Logged Job/Task info from TaskExecutor threads sometimes does not match the task running
[ https://issues.apache.org/jira/browse/GOBBLIN-359?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joel Baranick updated GOBBLIN-359: -- Summary: Logged Job/Task info from TaskExecutor threads sometimes does not match the task running (was: Job/task info stored in MDC sometimes is incorrect) > Logged Job/Task info from TaskExecutor threads sometimes does not match the > task running > > > Key: GOBBLIN-359 > URL: https://issues.apache.org/jira/browse/GOBBLIN-359 > Project: Apache Gobblin > Issue Type: Bug >Reporter: Joel Baranick >Assignee: Joel Baranick > > In some cases the job/task information that is stored in the MDC to improve > logging doesn't match the actual task being run on a given thread. It seems > as if the MDC contents are not always being managed in a way that ensures > that when a task is complete the MDC data is cleared. > One place I noticed was in {{TaskExecutor}}, where {{this.taskExecutor}} and > {{this.forkExecutor}} are not wrapped with > {{ExecutorUtils.loggingDecorator}}. {{ExecutorUtils.loggingDecorator}} > ensures that submitted {{Runnable}} and {{Callable}} instances first clone > the MDC and finally reset the MDC. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Created] (GOBBLIN-359) Job/task info stored in MDC sometimes is incorrect
Joel Baranick created GOBBLIN-359: - Summary: Job/task info stored in MDC sometimes is incorrect Key: GOBBLIN-359 URL: https://issues.apache.org/jira/browse/GOBBLIN-359 Project: Apache Gobblin Issue Type: Bug Reporter: Joel Baranick Assignee: Joel Baranick In some cases the job/task information that is stored in the MDC to improve logging doesn't match the actual task being run on a given thread. It seems as if the MDC contents are not always being managed in a way that ensures that when a task is complete the MDC data is cleared. One place I noticed was in {{TaskExecutor}}, where {{this.taskExecutor}} and {{this.forkExecutor}} are not wrapped with {{ExecutorUtils.loggingDecorator}}. {{ExecutorUtils.loggingDecorator}} ensures that submitted {{Runnable}} and {{Callable}} instances first clone the MDC and finally reset the MDC. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
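The wrap-and-reset pattern that {{ExecutorUtils.loggingDecorator}} provides (per the description above) can be sketched as follows. This is a hedged illustration: a plain ThreadLocal map stands in for slf4j's MDC, and the class and method names are made up for the example:

```java
import java.util.HashMap;
import java.util.Map;

public class MdcWrapExample {
    // Stand-in for slf4j's MDC: a per-thread map of logging context
    // (e.g. job/task identifiers).
    static final ThreadLocal<Map<String, String>> CONTEXT =
            ThreadLocal.withInitial(HashMap::new);

    // The decorator pattern the issue describes: capture the submitter's
    // context at submission time, install a copy on the worker thread, and
    // always clear it afterwards so the next task run on that thread cannot
    // observe stale job/task info.
    static Runnable withContext(Runnable task) {
        Map<String, String> captured = new HashMap<>(CONTEXT.get());
        return () -> {
            CONTEXT.set(new HashMap<>(captured));
            try {
                task.run();
            } finally {
                CONTEXT.remove();   // the missing reset is the reported bug
            }
        };
    }
}
```

Wrapping every Runnable/Callable handed to {{this.taskExecutor}} and {{this.forkExecutor}} this way would prevent pooled threads from logging another task's identifiers.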
[jira] [Resolved] (GOBBLIN-357) Poor logging when zookeeper connection is lost
[ https://issues.apache.org/jira/browse/GOBBLIN-357?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joel Baranick resolved GOBBLIN-357. --- Resolution: Fixed Fix Version/s: 0.13.0 Issue resolved by pull request #2230 [https://github.com/apache/incubator-gobblin/pull/2230] > Poor logging when zookeeper connection is lost > -- > > Key: GOBBLIN-357 > URL: https://issues.apache.org/jira/browse/GOBBLIN-357 > Project: Apache Gobblin > Issue Type: Bug >Reporter: Joel Baranick >Assignee: Joel Baranick > Fix For: 0.13.0 > > -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Created] (GOBBLIN-357) Poor logging when zookeeper connection is lost
Joel Baranick created GOBBLIN-357: - Summary: Poor logging when zookeeper connection is lost Key: GOBBLIN-357 URL: https://issues.apache.org/jira/browse/GOBBLIN-357 Project: Apache Gobblin Issue Type: Bug Reporter: Joel Baranick Assignee: Joel Baranick -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (GOBBLIN-318) Gobblin Helix Jobs Hang Indefinitely
[ https://issues.apache.org/jira/browse/GOBBLIN-318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16286433#comment-16286433 ] Joel Baranick commented on GOBBLIN-318: --- [~abti] Job timeouts will help. That said, the underlying issue of the TaskStateCollectorService missing task states should be resolved. > Gobblin Helix Jobs Hang Indefinitely > - > > Key: GOBBLIN-318 > URL: https://issues.apache.org/jira/browse/GOBBLIN-318 > Project: Apache Gobblin > Issue Type: Bug >Reporter: Joel Baranick >Priority: Critical > > In some cases, gobblin helix jobs can hang indefinitely. When coupled with > job locks, this can result in a job becoming stuck and not progressing. The > only solution currently is to restart the master node. > Assume the following is for a {{job_myjob_1510884004834}} and which hung at > 2017-11-17 02:09:00 UTC and was still hung at 2017-11-17 09:12:00 UTC. > {{GobblinHelixJobLauncher.waitForJobCompletion()}} is never detecting the job > as completed. This results in the {{TaskStateCollectorService}} indefinitely > searching for more task states, even though it has processed all the task > states that are ever going to be produced. There is no reference to the hung > job in Zookeeper at {{/mycluster/CONFIGS/RESOURCE}}. In the Helix Web Admin, > the hung job doesn't exist at {{/clusters/mycluster/jobQueues/jobname}}. > There is no record of the job in Zookeeper at > {{/mycluster/PROPERTYSTORE/TaskRebalancer/jobname/Context}}. This means that > the {{GobblinHelixJobLauncher.waitForJobCompletion()}} code fails.
> {code:java} > private void waitForJobCompletion() throws InterruptedException { > while (true) { > WorkflowContext workflowContext = > TaskDriver.getWorkflowContext(this.helixManager, this.helixQueueName); > if (workflowContext != null) { > org.apache.helix.task.TaskState helixJobState = > workflowContext.getJobState(this.jobResourceName); > if (helixJobState == org.apache.helix.task.TaskState.COMPLETED || > helixJobState == org.apache.helix.task.TaskState.FAILED || > helixJobState == org.apache.helix.task.TaskState.STOPPED) { > return; > } > } > Thread.sleep(1000); > } > } > {code} > The code gets the job state from Zookeeper: > {code:javascript} > { > "id": "WorkflowContext", > "simpleFields": { > "START_TIME": "1505159715449", > "STATE": "IN_PROGRESS" > }, > "listFields": {}, > "mapFields": { > "JOB_STATES": { > "jobname_job_jobname_150741571": "COMPLETED", > "jobname_job_jobname_150775680": "COMPLETED", > "jobname_job_jobname_150795931": "COMPLETED", > "jobname_job_jobname_1509857102910": "COMPLETED", > "jobname_job_jobname_1510253708033": "COMPLETED", > "jobname_job_jobname_1510271102898": "COMPLETED", > "jobname_job_jobname_1510852210668": "COMPLETED", > "jobname_job_jobname_1510853133675": "COMPLETED" > } > } > } > {code} > But there is no information contained in the job state for the hung job. > Also, it is really strange that the job states contained in that json blob > are so old. The oldest one is from 2017-10-7 10:35:00 PM UTC, more than a > month ago. > I'm not sure how the system got in this state, but this isn't the first time > we have seen this. While it would be good to prevent this from happening, it > would also be good to allow the system to recover if this state is entered. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Comment Edited] (GOBBLIN-318) Gobblin Helix Jobs Hang Indefinitely
[ https://issues.apache.org/jira/browse/GOBBLIN-318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16286467#comment-16286467 ] Joel Baranick edited comment on GOBBLIN-318 at 12/11/17 8:00 PM: - To summarize the issue: # The job is running. # The job lock is still held. # All tasks have completed successfully and written their task state files. # The job has consumed all the task state files and updated the gobblin job and database # The helix state in Zookeeper is missing or not in a terminal state. # The job keeps polling the state at "/mycluster/PROPERTYSTORE/TaskRebalancer/myjob/Context". was (Author: jbaranick): To summarize the issue: # The job is running. # The job lock is still help. # All tasks have completed successfully and written their task state files. # The job has consumed all the task state files and updated the gobblin job and database # The helix state in Zookeeper is missing or not in a terminal state. # The job keeps polling the state at "/mycluster/PROPERTYSTORE/TaskRebalancer/myjob/Context" > Gobblin Helix Jobs Hang Indefinitely > - > > Key: GOBBLIN-318 > URL: https://issues.apache.org/jira/browse/GOBBLIN-318 > Project: Apache Gobblin > Issue Type: Bug >Reporter: Joel Baranick >Priority: Critical > > In some cases, gobblin helix jobs can hang indefinitely. When coupled with > job locks, this can result in a job becoming stuck and not progressing. The > only solution currently is to restart the master node. > Assume the following is for a {{job_myjob_1510884004834}} and which hung at > 2017-11-17 02:09:00 UTC and was still hung at 2017-11-17 09:12:00 UTC. > {{GobblinHelixJobLauncher.waitForJobCompletion()}} is never detecting the job > as completed. This results in the {{TaskStateCollectorService}} indefinitely > searching for more task states, even though it has processed all the task > states that are ever going to be produced. There is no reference to the hung > job in Zookeeper at {{/mycluster/CONFIGS/RESOURCE}}. 
In the Helix Web Admin, > the hung job doesn't exist at {{/clusters/mycluster/jobQueues/jobname}}. > There is no record of the job in Zookeeper at > {{/mycluster/PROPERTYSTORE/TaskRebalancer/jobname/Context}}. This means that > the {{GobblinHelixJobLauncher.waitForJobCompletion()}} code fails. > {code:java} > private void waitForJobCompletion() throws InterruptedException { > while (true) { > WorkflowContext workflowContext = > TaskDriver.getWorkflowContext(this.helixManager, this.helixQueueName); > if (workflowContext != null) { > org.apache.helix.task.TaskState helixJobState = > workflowContext.getJobState(this.jobResourceName); > if (helixJobState == org.apache.helix.task.TaskState.COMPLETED || > helixJobState == org.apache.helix.task.TaskState.FAILED || > helixJobState == org.apache.helix.task.TaskState.STOPPED) { > return; > } > } > Thread.sleep(1000); > } > } > {code} > The code gets the job state from Zookeeper: > {code:javascript} > { > "id": "WorkflowContext", > "simpleFields": { > "START_TIME": "1505159715449", > "STATE": "IN_PROGRESS" > }, > "listFields": {}, > "mapFields": { > "JOB_STATES": { > "jobname_job_jobname_150741571": "COMPLETED", > "jobname_job_jobname_150775680": "COMPLETED", > "jobname_job_jobname_150795931": "COMPLETED", > "jobname_job_jobname_1509857102910": "COMPLETED", > "jobname_job_jobname_1510253708033": "COMPLETED", > "jobname_job_jobname_1510271102898": "COMPLETED", > "jobname_job_jobname_1510852210668": "COMPLETED", > "jobname_job_jobname_1510853133675": "COMPLETED" > } > } > } > {code} > But there is no information contained in the job state for the hung job. > Also, it is really strange that the job states contained in that json blob > are so old. The oldest one is from 2017-10-7 10:35:00 PM UTC, more than a > month ago. > I'm not sure how the system got in this state, but this isn't the first time > we have seen this. 
While it would be good to prevent this from happening, it > would also be good to allow the system to recover if this state is entered. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (GOBBLIN-318) Gobblin Helix Jobs Hang Indefinitely
[ https://issues.apache.org/jira/browse/GOBBLIN-318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16286602#comment-16286602 ] Joel Baranick commented on GOBBLIN-318: --- Another piece of info. All tasks are marked as completed in the Gobblin DB, but when I look at https://zookeeper/node?path=/ROOT/CLUSTER/PROPERTYSTORE/TaskRebalancer/JOB_NAME_job_JOB_NAME_1512924480001/Context , there are multiple tasks still marked as running:
{code:javascript}
{
  "id": "TaskContext",
  "simpleFields": { "START_TIME": "1512924491039" },
  "listFields": {},
  "mapFields": {
    "0": { "ASSIGNED_PARTICIPANT": "worker-1", "FINISH_TIME": "1512924700877", "INFO": "completed tasks: 1", "NUM_ATTEMPTS": "1", "START_TIME": "1512924491044", "STATE": "COMPLETED", "TASK_ID": "124a2e88-90e3-40e8-add6-94b59ee30133" },
    "1": { "ASSIGNED_PARTICIPANT": "worker-2", "FINISH_TIME": "1512924701120", "INFO": "completed tasks: 1", "NUM_ATTEMPTS": "1", "START_TIME": "1512924491044", "STATE": "COMPLETED", "TASK_ID": "9d7c2369-d6d9-4c2f-8bf3-1bcea0a47fdf" },
    "2": { "ASSIGNED_PARTICIPANT": "worker-3", "FINISH_TIME": "1512924695451", "INFO": "completed tasks: 1", "NUM_ATTEMPTS": "1", "START_TIME": "1512924491044", "STATE": "COMPLETED", "TASK_ID": "19545764-e2bf-48b6-9942-361c834790cf" },
    "3": { "ASSIGNED_PARTICIPANT": "worker-4", "FINISH_TIME": "1512924776614", "INFO": "completed tasks: 1", "NUM_ATTEMPTS": "1", "START_TIME": "1512924491044", "STATE": "COMPLETED", "TASK_ID": "3f59431f-2415-477a-8008-26a3eb258129" },
    "4": { "ASSIGNED_PARTICIPANT": "worker-5", "FINISH_TIME": "1512924731962", "INFO": "completed tasks: 1", "NUM_ATTEMPTS": "1", "START_TIME": "1512924491044", "STATE": "COMPLETED", "TASK_ID": "19863633-6ed3-49d4-a07f-2130eec15dd3" },
    "5": { "ASSIGNED_PARTICIPANT": "worker-6", "INFO": "", "START_TIME": "1512924491044", "STATE": "RUNNING", "TASK_ID": "433c0107-0919-428a-b7c5-6e8925df7dac" },
    "6": { "ASSIGNED_PARTICIPANT": "worker-7", "INFO": "", "START_TIME": "1512924491044", "STATE": "RUNNING", "TASK_ID": "89a63cfd-efb4-44ce-a08b-68678d792e25" },
    "7": { "ASSIGNED_PARTICIPANT": "worker-8", "FINISH_TIME": "1512924524111", "INFO": "completed tasks: 1", "NUM_ATTEMPTS": "1", "START_TIME": "1512924491044", "STATE": "COMPLETED", "TASK_ID": "a133db13-3f28-49af-8e3d-1d6fa81f6247" },
    "8": { "ASSIGNED_PARTICIPANT": "worker-9", "INFO": "", "START_TIME": "1512924491044", "STATE": "RUNNING", "TASK_ID": "7bbda2ef-68da-4f11-b217-89c3cd7d7a2e" },
    "9": { "ASSIGNED_PARTICIPANT": "worker-10", "INFO": "", "START_TIME": "1512924491044", "STATE": "RUNNING", "TASK_ID": "8407cb27-4b26-4786-91f2-ad920b1e2343" }
  }
}
{code}
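The polling loop in {{waitForJobCompletion()}} quoted in this issue spins forever when Helix never reports a terminal state, which is exactly the hang described here. A minimal sketch of a deadline-bounded variant, assuming a {{Supplier}} stands in for the {{TaskDriver.getWorkflowContext()}} lookup (the enum and supplier here are illustrative stand-ins, not the actual Gobblin/Helix API):

```java
import java.util.Iterator;
import java.util.List;
import java.util.function.Supplier;

public class BoundedWait {
    enum TaskState { IN_PROGRESS, COMPLETED, FAILED, STOPPED }

    /**
     * Polls jobState until a terminal state is seen or timeoutMs elapses.
     * Returns true on a terminal state, false on timeout or interruption,
     * so a hung job can no longer block the launcher forever.
     */
    static boolean waitForJobCompletion(Supplier<TaskState> jobState,
                                        long timeoutMs, long pollMs) {
        long deadline = System.currentTimeMillis() + timeoutMs;
        while (System.currentTimeMillis() < deadline) {
            TaskState s = jobState.get();
            if (s == TaskState.COMPLETED || s == TaskState.FAILED
                    || s == TaskState.STOPPED) {
                return true;
            }
            try {
                Thread.sleep(pollMs);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false;
            }
        }
        return false; // timed out: Helix never reached a terminal state
    }

    public static void main(String[] args) {
        // A job that completes on the third poll.
        Iterator<TaskState> polls = List.of(TaskState.IN_PROGRESS,
                TaskState.IN_PROGRESS, TaskState.COMPLETED).iterator();
        System.out.println(waitForJobCompletion(polls::next, 1000, 1));  // true
        // A hung job that never leaves IN_PROGRESS times out instead of hanging.
        System.out.println(waitForJobCompletion(
                () -> TaskState.IN_PROGRESS, 50, 10));  // false
    }
}
```

One subtlety: the GOBBLIN-318 hang is worse than a slow job, because the job's entry is missing from the {{WorkflowContext}} entirely, so any real fix also has to treat "no state for this job" as a failure after some grace period.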
[jira] [Commented] (GOBBLIN-318) Gobblin Helix Jobs Hang Indefinitely
[ https://issues.apache.org/jira/browse/GOBBLIN-318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16286467#comment-16286467 ] Joel Baranick commented on GOBBLIN-318: --- To summarize the issue:
# The job is running.
# The job lock is still held.
# All tasks have completed successfully and written their task state files.
# The job has consumed all the task state files and updated the Gobblin job state and database.
# The Helix state in Zookeeper is missing or not in a terminal state.
# The job keeps polling the state at "/mycluster/PROPERTYSTORE/TaskRebalancer/myjob/Context".

> Gobblin Helix Jobs Hang Indefinitely
> ------------------------------------
>
> Key: GOBBLIN-318
> URL: https://issues.apache.org/jira/browse/GOBBLIN-318
> Project: Apache Gobblin
> Issue Type: Bug
> Reporter: Joel Baranick
> Priority: Critical
>
> In some cases, gobblin helix jobs can hang indefinitely. When coupled with
> job locks, this can result in a job becoming stuck and not progressing. The
> only solution currently is to restart the master node.
> Assume the following is for {{job_myjob_1510884004834}}, which hung at
> 2017-11-17 02:09:00 UTC and was still hung at 2017-11-17 09:12:00 UTC.
> {{GobblinHelixJobLauncher.waitForJobCompletion()}} never detects the job as
> completed. This results in the {{TaskStateCollectorService}} indefinitely
> searching for more task states, even though it has processed all the task
> states that will ever be produced. There is no reference to the hung job in
> Zookeeper at {{/mycluster/CONFIGS/RESOURCE}}. In the Helix Web Admin, the
> hung job doesn't exist at {{/clusters/mycluster/jobQueues/jobname}}. There
> is no record of the job in Zookeeper at
> {{/mycluster/PROPERTYSTORE/TaskRebalancer/jobname/Context}}. This means that
> the {{GobblinHelixJobLauncher.waitForJobCompletion()}} code fails.
> {code:java}
> private void waitForJobCompletion() throws InterruptedException {
>   while (true) {
>     WorkflowContext workflowContext =
>         TaskDriver.getWorkflowContext(this.helixManager, this.helixQueueName);
>     if (workflowContext != null) {
>       org.apache.helix.task.TaskState helixJobState =
>           workflowContext.getJobState(this.jobResourceName);
>       if (helixJobState == org.apache.helix.task.TaskState.COMPLETED ||
>           helixJobState == org.apache.helix.task.TaskState.FAILED ||
>           helixJobState == org.apache.helix.task.TaskState.STOPPED) {
>         return;
>       }
>     }
>     Thread.sleep(1000);
>   }
> }
> {code}
> The code gets the job state from Zookeeper:
> {code:javascript}
> {
>   "id": "WorkflowContext",
>   "simpleFields": {
>     "START_TIME": "1505159715449",
>     "STATE": "IN_PROGRESS"
>   },
>   "listFields": {},
>   "mapFields": {
>     "JOB_STATES": {
>       "jobname_job_jobname_150741571": "COMPLETED",
>       "jobname_job_jobname_150775680": "COMPLETED",
>       "jobname_job_jobname_150795931": "COMPLETED",
>       "jobname_job_jobname_1509857102910": "COMPLETED",
>       "jobname_job_jobname_1510253708033": "COMPLETED",
>       "jobname_job_jobname_1510271102898": "COMPLETED",
>       "jobname_job_jobname_1510852210668": "COMPLETED",
>       "jobname_job_jobname_1510853133675": "COMPLETED"
>     }
>   }
> }
> {code}
> But there is no information in the job state for the hung job. Also, it is
> really strange that the job states contained in that JSON blob are so old.
> The oldest one is from 2017-10-07 10:35:00 PM UTC, more than a month ago.
> I'm not sure how the system got into this state, but this isn't the first
> time we have seen this. While it would be good to prevent this from
> happening, it would also be good to allow the system to recover if this
> state is entered.
-- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Comment Edited] (GOBBLIN-318) Gobblin Helix Jobs Hang Indefinitely
[ https://issues.apache.org/jira/browse/GOBBLIN-318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16286433#comment-16286433 ] Joel Baranick edited comment on GOBBLIN-318 at 12/11/17 7:36 PM: - [~abti] Job timeouts will help. That said, the underlying issue still needs to be fixed. A couple more pieces of info that might help figure out what is going on:
# We write the task state to EFS, so it isn't an S3 eventual-consistency issue.
# The TaskStateCollectorService recognizes that the task is done. I know that because the sum of the completed task counts from "Collected task state of %d completed tasks" equals the task count for the job.

was (Author: jbaranick): [~abti] Job timeouts will help. That said, the underlying issue of the TaskCollectorService missing task states should be resolved. One other piece of info. We write the task state to EFS, so it isn't an S3 eventual consistency issue.
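The check described in the comment above (comparing the sum of the "Collected task state of %d completed tasks" log lines against the job's task count) can be reproduced mechanically when auditing a hung job. A minimal sketch; the log message text is taken from the comment, while the helper and its name are illustrative, not a Gobblin API:

```java
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class CollectedTaskTally {
    // Matches the TaskStateCollectorService log line quoted in the comment.
    private static final Pattern COLLECTED =
            Pattern.compile("Collected task state of (\\d+) completed tasks");

    /** Sums the completed-task counts reported across the collector's log lines. */
    static int totalCollected(List<String> logLines) {
        int total = 0;
        for (String line : logLines) {
            Matcher m = COLLECTED.matcher(line);
            if (m.find()) {
                total += Integer.parseInt(m.group(1));
            }
        }
        return total;
    }

    public static void main(String[] args) {
        List<String> log = List.of(
                "INFO Collected task state of 3 completed tasks",
                "INFO Collected task state of 7 completed tasks",
                "INFO some unrelated line");
        // 3 + 7 = 10; compare this against the job's expected task count.
        System.out.println(totalCollected(log));  // 10
    }
}
```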
[jira] [Resolved] (GOBBLIN-321) CSV to HDFS ISSUE
[ https://issues.apache.org/jira/browse/GOBBLIN-321?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joel Baranick resolved GOBBLIN-321. --- Resolution: Not A Problem

> CSV to HDFS ISSUE
> -----------------
>
> Key: GOBBLIN-321
> URL: https://issues.apache.org/jira/browse/GOBBLIN-321
> Project: Apache Gobblin
> Issue Type: Bug
> Reporter: Azmal Sheik
> Assignee: Joel Baranick
> Priority: Critical
> Labels: beginner, newbie, starter
> Attachments: gobblin-current.log, job.txt
>
> I was trying to load CSV file data to HDFS with the job conf below, but I'm
> facing a class-not-found error. I checked lib/gobblin-core.jar and the class
> TextFileBasedSource is present, but it still says class not found.
> Can anyone help here?
> Here are the job conf and logs.
> *JOB:*
> job.name=json-gobblin-hdfs
> job.group=Gobblin-Json-Demo
> job.description=Publishing JSON data from files to HDFS in Avro format.
> job.jars=/home/ndxmetadata/Ravi/Gobblin/gobblin-dist/lib/
> job.lock.enabled=false
> distcp.persist.dir=/home/ndxmetadata/Ravi/Gobblin/gobblin-dist/
> source.class=gobblin.source.extractor.filebased.TextFileBasedSource
> converter.classes="gobblin.converter.StringSchemaInjector,gobblin.converter.csv.CsvToJsonConverter,gobblin.converter.avro.JsonIntermediateToAvroConverter"
> writer.builder.class=gobblin.writer.AvroDataWriterBuilder
> source.entity=
> source.filebased.data.directory=file://home/ndxmetadata/Ravi/Gobblin/sample
> gobblin.converter.schemaInjector.schema=SCHEMA
> converter.csv.to.json.delimiter=","
> extract.table.name=CsvToAvro
> extract.namespace=gobblin.example
> extract.table.type=APPEND_ONLY
> source.schema={"namespace":"example.avro", "type":"record", "name":"User", "fields":[{"name":"name", "type":"string"}, {"name":"favorite_number", "type":"int"}, {#"name":"favorite_color", "type":"string"}]}
> gobblin.converter.schemaInjector.schema=SCHEMA
> converter.csv.to.json.delimiter=","
> qualitychecker.task.policies=gobblin.policies.count.RowCountPolicy,gobblin.policies.schema.SchemaCompatibilityPolicy
> qualitychecker.task.policy.types=OPTIONAL,OPTIONAL
> qualitychecker.row.policies=gobblin.policies.schema.SchemaRowCheckPolicy
> qualitychecker.row.policy.types=OPTIONAL
> data.publisher.type=gobblin.publisher.BaseDataPublisher
> writer.destination.type=HDFS
> writer.output.format=AVRO
> fs.uri=hdfs://:8020/
> writer.fs.uri=hdfs://...:8020/
> state.store.fs.uri=hdfs://:8020/
> mr.job.root.dir=/user/ndxmetadata/output/working
> state.store.dir=/user/ndxmetadata/output/state-store
> writer.staging.dir=/user/ndxmetadata/output/task-staging
> writer.output.dir=/user/ndxmetadata/output/task-output
> data.publisher.final.dir=/user/ndxmetadata/output/
>
> Logs attached below.
[jira] [Comment Edited] (GOBBLIN-318) Gobblin Helix Jobs Hang Indefinitely
[ https://issues.apache.org/jira/browse/GOBBLIN-318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16286433#comment-16286433 ] Joel Baranick edited comment on GOBBLIN-318 at 12/11/17 7:29 PM: - [~abti] Job timeouts will help. That said, the underlying issue of the TaskCollectorService missing task states should be resolved. One other piece of info. We write the task state to EFS, so it isn't an S3 eventual consistency issue.

was (Author: jbaranick): [~abti] Job timeouts will help. That said, the underlying issue of the TaskCollectorService missing task states should be resolved.
[jira] [Resolved] (GOBBLIN-207) Gobblin AWS requires job package to be publicly accessible
[ https://issues.apache.org/jira/browse/GOBBLIN-207?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joel Baranick resolved GOBBLIN-207. --- Resolution: Not A Bug > Gobblin AWS requires job package to be publicly accessible > -- > > Key: GOBBLIN-207 > URL: https://issues.apache.org/jira/browse/GOBBLIN-207 > Project: Apache Gobblin > Issue Type: Bug >Reporter: Joel Baranick >Assignee: Joel Baranick > > {{GobblinAwsJobConfigurationManager}} expects that the job configuration file > is publicly accessible so that it can be downloaded. This PR changes how the > download is done, using Hadoop FS, so that the job package can be stored on > filesystems that don't expose it over HTTP and so that authentication can be > performed. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Reopened] (GOBBLIN-207) Gobblin AWS requires job package to be publicly accessible
[ https://issues.apache.org/jira/browse/GOBBLIN-207?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joel Baranick reopened GOBBLIN-207: ---
[jira] [Commented] (GOBBLIN-321) CSV to HDFS ISSUE
[ https://issues.apache.org/jira/browse/GOBBLIN-321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16264079#comment-16264079 ] Joel Baranick commented on GOBBLIN-321: --- It is the location on {{source.filebased.fs.uri}}
[jira] [Commented] (GOBBLIN-321) CSV to HDFS ISSUE
[ https://issues.apache.org/jira/browse/GOBBLIN-321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16264060#comment-16264060 ] Joel Baranick commented on GOBBLIN-321: --- Check your logs for {{Running ls command with input}}. Does the path listed there make sense? It gets built up by combining {{source.filebased.data.directory}} with {{source.entity}}.
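The combination described in the comment above can be sketched as follows; {{lsInput}} is a hypothetical helper showing how a file-based source typically assembles its listing path, not the actual Gobblin implementation:

```java
public class LsInputPath {
    /**
     * Joins the configured data directory and entity into the path that
     * the "Running ls command with input" log line would report.
     * Illustrative only; the real source may normalize differently.
     */
    static String lsInput(String dataDirectory, String entity) {
        if (entity == null || entity.isEmpty()) {
            return dataDirectory;  // blank source.entity: list the directory itself
        }
        return dataDirectory.replaceAll("/+$", "") + "/" + entity;
    }

    public static void main(String[] args) {
        // With source.entity left blank, as in the job config in this issue,
        // the ls input is just source.filebased.data.directory.
        System.out.println(lsInput("/home/ndxmetadata/Ravi/Gobblin/sample", ""));
        System.out.println(lsInput("/input/", "orders"));  // /input/orders
    }
}
```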
[jira] [Commented] (GOBBLIN-321) CSV to HDFS ISSUE
[ https://issues.apache.org/jira/browse/GOBBLIN-321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16264043#comment-16264043 ] Joel Baranick commented on GOBBLIN-321: --- {{source.filebased.data.directory}} should be a path, not a URI (e.g. {{/input}}).
[jira] [Commented] (GOBBLIN-321) CSV to HDFS ISSUE
[ https://issues.apache.org/jira/browse/GOBBLIN-321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16263904#comment-16263904 ] Joel Baranick commented on GOBBLIN-321: --- Well, if you are using <= 0.11.0, I would stick to the {{gobblin.}} namespaces. Also, I'd pick either 0.10.0 or 0.11.0 and try with that.
[jira] [Commented] (GOBBLIN-187) Gobblin Helix doesn't clean up `.job.state` files, causing unbounded disk usage
[ https://issues.apache.org/jira/browse/GOBBLIN-187?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16263899#comment-16263899 ] Joel Baranick commented on GOBBLIN-187: --- [~abti] Any ideas here? This ends up causing our EFS to keep growing, incurring more cost.

> Gobblin Helix doesn't clean up `.job.state` files, causing unbounded disk
> usage
> -------------------------------------------------------------------------
>
> Key: GOBBLIN-187
> URL: https://issues.apache.org/jira/browse/GOBBLIN-187
> Project: Apache Gobblin
> Issue Type: Bug
> Reporter: Joel Baranick
>
> When Gobblin is running, the `GobblinHelixJobLauncher.createJob` method
> writes the job state to a `.job.state` file. Nothing cleans up these files.
> The result is unbounded disk usage. `.job.state` files should be deleted at
> the completion of jobs.
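A sketch of the requested cleanup: delete the job's {{.job.state}} file once the job finishes. Plain {{java.nio}} is used here for illustration (Gobblin itself would go through the Hadoop FileSystem API), and the layout assumed here, {{<jobId>.job.state}} directly in the work dir, is an assumption rather than the real directory structure:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

public class JobStateCleanup {
    /**
     * Deletes the job's .job.state file; returns true if a file was removed.
     * Safe to call repeatedly, e.g. from a finally block around job completion.
     */
    static boolean cleanUpJobState(Path workDir, String jobId) throws IOException {
        return Files.deleteIfExists(workDir.resolve(jobId + ".job.state"));
    }

    public static void main(String[] args) throws IOException {
        Path workDir = Files.createTempDirectory("gobblin-work");
        Files.createFile(workDir.resolve("job_myjob_1510884004834.job.state"));
        // First call removes the file; the second finds nothing left to delete.
        System.out.println(cleanUpJobState(workDir, "job_myjob_1510884004834"));  // true
        System.out.println(cleanUpJobState(workDir, "job_myjob_1510884004834"));  // false
    }
}
```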
[jira] [Commented] (GOBBLIN-321) CSV to HDFS ISSUE
[ https://issues.apache.org/jira/browse/GOBBLIN-321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16263897#comment-16263897 ] Joel Baranick commented on GOBBLIN-321: --- [~sheik5azmal] Where did you see that you should use the Apache-qualified namespace? Maybe some documentation is guiding people astray during this transition period.
JIRA (v6.4.14#64029)
[jira] [Assigned] (GOBBLIN-321) CSV to HDFS ISSUE
[ https://issues.apache.org/jira/browse/GOBBLIN-321?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joel Baranick reassigned GOBBLIN-321: - Assignee: Joel Baranick -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (GOBBLIN-321) CSV to HDFS ISSUE
[ https://issues.apache.org/jira/browse/GOBBLIN-321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16263895#comment-16263895 ] Joel Baranick commented on GOBBLIN-321: --- From your logs, the class you are loading is {{org.apache.gobblin.source.extractor.filebased.TextFileBasedSource}}; 0.11.0 doesn't use the Apache namespaces. Compare [0.11.0|https://github.com/apache/incubator-gobblin/blob/gobblin_0.11.0/gobblin-core/src/main/java/gobblin/source/extractor/filebased/TextFileBasedSource.java] to [master|https://github.com/apache/incubator-gobblin/blob/master/gobblin-core/src/main/java/org/apache/gobblin/source/extractor/filebased/TextFileBasedSource.java] -- This message was sent by Atlassian JIRA (v6.4.14#64029)
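For readers hitting the same class-not-found error: if you are on a post-incubation build, the fix the comment above points at is to use the {{org.apache}}-prefixed class names in the job file. Shown here against the {{source.class}} line from the attached job config (illustrative; the {{converter.classes}}, {{writer.builder.class}}, and policy classes in the same file would need the same prefix treatment):

{code:none}
# Pre-incubation releases (up to 0.11.0)
source.class=gobblin.source.extractor.filebased.TextFileBasedSource

# Apache incubator releases (master and later)
source.class=org.apache.gobblin.source.extractor.filebased.TextFileBasedSource
{code}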
[jira] [Comment Edited] (GOBBLIN-321) CSV to HDFS ISSUE
[ https://issues.apache.org/jira/browse/GOBBLIN-321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16263895#comment-16263895 ] Joel Baranick edited comment on GOBBLIN-321 at 11/23/17 7:07 AM: - From your logs, the class you are loading is {{org.apache.gobblin.source.extractor.filebased.TextFileBasedSource}}; 0.11.0 doesn't use the Apache namespaces. Compare [0.11.0|https://github.com/apache/incubator-gobblin/blob/gobblin_0.11.0/gobblin-core/src/main/java/gobblin/source/extractor/filebased/TextFileBasedSource.java] to [master|https://github.com/apache/incubator-gobblin/blob/master/gobblin-core/src/main/java/org/apache/gobblin/source/extractor/filebased/TextFileBasedSource.java]. You will see that the namespaces in master are all prefixed with {{org.apache.}} because Gobblin was adopted as an Apache incubator project. The last pre-incubator release is 0.11.0. was (Author: jbaranick): From your logs, the class you are loading is {{org.apache.gobblin.source.extractor.filebased.TextFileBasedSource}}; 0.11.0 doesn't use the Apache namespaces. Compare [0.11.0|https://github.com/apache/incubator-gobblin/blob/gobblin_0.11.0/gobblin-core/src/main/java/gobblin/source/extractor/filebased/TextFileBasedSource.java] to [master|https://github.com/apache/incubator-gobblin/blob/master/gobblin-core/src/main/java/org/apache/gobblin/source/extractor/filebased/TextFileBasedSource.java] -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (GOBBLIN-321) CSV to HDFS ISSUE
[ https://issues.apache.org/jira/browse/GOBBLIN-321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16263891#comment-16263891 ] Joel Baranick commented on GOBBLIN-321: --- What version of gobblin? -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (GOBBLIN-318) Gobblin Helix Jobs Hang Indefinitely
[ https://issues.apache.org/jira/browse/GOBBLIN-318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16256707#comment-16256707 ] Joel Baranick commented on GOBBLIN-318: --- There is one other manual way to recover from this hang: modify the state at {{/mycluster/PROPERTYSTORE/TaskRebalancer/myjob/Context}}, adding an entry for the hung job with a terminal state. For instance, modify: {code:javascript} { "id": "WorkflowContext", "simpleFields": { "START_TIME": "1505159715449", "STATE": "IN_PROGRESS" }, "listFields": {}, "mapFields": { "JOB_STATES": { "jobname_job_jobname_150741571": "COMPLETED", "jobname_job_jobname_150775680": "COMPLETED", "jobname_job_jobname_150795931": "COMPLETED", "jobname_job_jobname_1509857102910": "COMPLETED", "jobname_job_jobname_1510253708033": "COMPLETED", "jobname_job_jobname_1510271102898": "COMPLETED", "jobname_job_jobname_1510852210668": "COMPLETED", "jobname_job_jobname_1510853133675": "COMPLETED" } } } {code} by adding {{"jobname_job_jobname_1510884004834": "COMPLETED"}} to {{JOB_STATES}} (don't forget the comma). The updated JSON will look like: {code:javascript} { "id": "WorkflowContext", "simpleFields": { "START_TIME": "1505159715449", "STATE": "IN_PROGRESS" }, "listFields": {}, "mapFields": { "JOB_STATES": { "jobname_job_jobname_150741571": "COMPLETED", "jobname_job_jobname_150775680": "COMPLETED", "jobname_job_jobname_150795931": "COMPLETED", "jobname_job_jobname_1509857102910": "COMPLETED", "jobname_job_jobname_1510253708033": "COMPLETED", "jobname_job_jobname_1510271102898": "COMPLETED", "jobname_job_jobname_1510852210668": "COMPLETED", "jobname_job_jobname_1510853133675": "COMPLETED", "jobname_job_jobname_1510884004834": "COMPLETED" } } } {code} This will allow Gobblin to detect that the job is done and finish its execution. I'm not sure if there are any other implications of doing this.
-- This message was sent by Atlassian JIRA (v6.4.14#64029)
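Since the workflow context for a hung job can disappear entirely, one defensive option is to bound the completion-wait loop instead of polling forever, so the launcher can fail the job and release its lock after a deadline. This is an illustration, not a committed Gobblin fix; the {{BoundedWait}} helper below is hypothetical and deliberately generic:

```java
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

/** Sketch: bound a completion wait so a lost Helix workflow context cannot hang the launcher forever. */
public class BoundedWait {

    // Polls `done` every pollMillis until it returns true or timeoutMillis elapses.
    // Returns true if the condition was met, false if the deadline passed first.
    public static boolean waitFor(Supplier<Boolean> done, long timeoutMillis, long pollMillis)
            throws InterruptedException {
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMillis);
        while (System.nanoTime() < deadline) {
            if (done.get()) {
                return true;
            }
            Thread.sleep(pollMillis);
        }
        return done.get(); // one final check at the deadline
    }
}
```

A {{waitForJobCompletion()}} built on such a helper would pass a lambda that queries {{TaskDriver.getWorkflowContext}} for a terminal {{TaskState}}, and could treat a {{false}} return as a timed-out job rather than spinning indefinitely.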
[jira] [Assigned] (GOBBLIN-311) Gobblin AWS runs old jobs when cluster is restarted.
[ https://issues.apache.org/jira/browse/GOBBLIN-311?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joel Baranick reassigned GOBBLIN-311: - Assignee: (was: Hung Tran) -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Created] (GOBBLIN-318) Gobblin Helix Jobs Hang Indefinitely
Joel Baranick created GOBBLIN-318: - Summary: Gobblin Helix Jobs Hang Indefinitely Key: GOBBLIN-318 URL: https://issues.apache.org/jira/browse/GOBBLIN-318 Project: Apache Gobblin Issue Type: Bug Reporter: Joel Baranick Priority: Critical In some cases, Gobblin Helix jobs can hang indefinitely. When coupled with job locks, this can result in a job becoming stuck and not progressing. The only solution currently is to restart the master node. Assume the following is for {{job_myjob_1510884004834}}, which hung at 2017-11-17 02:09:00 UTC and was still hung at 2017-11-17 09:12:00 UTC. {{GobblinHelixJobLauncher.waitForJobCompletion()}} never detects the job as completed. This results in the {{TaskStateCollectorService}} indefinitely searching for more task states, even though it has processed all the task states that are ever going to be produced. There is no reference to the hung job in Zookeeper at {{/mycluster/CONFIGS/RESOURCE}}. In the Helix Web Admin, the hung job doesn't exist at {{/clusters/mycluster/jobQueues/jobname}}. There is no record of the job in Zookeeper at {{/mycluster/PROPERTYSTORE/TaskRebalancer/jobname/Context}}. This means that the {{GobblinHelixJobLauncher.waitForJobCompletion()}} code fails. 
{code:java} private void waitForJobCompletion() throws InterruptedException { while (true) { WorkflowContext workflowContext = TaskDriver.getWorkflowContext(this.helixManager, this.helixQueueName); if (workflowContext != null) { org.apache.helix.task.TaskState helixJobState = workflowContext.getJobState(this.jobResourceName); if (helixJobState == org.apache.helix.task.TaskState.COMPLETED || helixJobState == org.apache.helix.task.TaskState.FAILED || helixJobState == org.apache.helix.task.TaskState.STOPPED) { return; } } Thread.sleep(1000); } } {code} The code gets the job state from Zookeeper: {code:javascript} { "id": "WorkflowContext", "simpleFields": { "START_TIME": "1505159715449", "STATE": "IN_PROGRESS" }, "listFields": {}, "mapFields": { "JOB_STATES": { "jobname_job_jobname_150741571": "COMPLETED", "jobname_job_jobname_150775680": "COMPLETED", "jobname_job_jobname_150795931": "COMPLETED", "jobname_job_jobname_1509857102910": "COMPLETED", "jobname_job_jobname_1510253708033": "COMPLETED", "jobname_job_jobname_1510271102898": "COMPLETED", "jobname_job_jobname_1510852210668": "COMPLETED", "jobname_job_jobname_1510853133675": "COMPLETED" } } } {code} But there is no information contained in the job state for the hung job. Also, it is really strange that the job states contained in that json blob are so old. The oldest one is from 2017-10-7 10:35:00 PM UTC, more than a month ago. I'm not sure how the system got in this state, but this isn't the first time we have seen this. While it would be good to prevent this from happening, it would also be good to allow the system to recover if this state is entered. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Created] (GOBBLIN-316) gobblin.util.ImmutableProperties behavior is different from Properties
Joel Baranick created GOBBLIN-316: - Summary: gobblin.util.ImmutableProperties behavior is different from Properties Key: GOBBLIN-316 URL: https://issues.apache.org/jira/browse/GOBBLIN-316 Project: Apache Gobblin Issue Type: Bug Reporter: Joel Baranick gobblin.util.ImmutableProperties uses Lombok's @Delegate annotation to delegate calls to the underlying Properties implementation. Unfortunately, @Delegate isn't delegating calls to the underlying Hashtable. This results in different behavior between Properties and ImmutableProperties. For example, on Properties, .keys() and .keySet() return the same set of keys. However, on ImmutableProperties, .keys() returns an empty enumeration and .keySet() returns all the keys. Additionally, Lombok's @Delegate is likely to be removed in a future version of the library, as the maintainers are not pleased with it: https://projectlombok.org/features/experimental/Delegate -- This message was sent by Atlassian JIRA (v6.4.14#64029)
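One way to avoid the @Delegate inconsistency entirely (a sketch, not the actual Gobblin class — the name {{SafeImmutableProperties}} is hypothetical) is to extend {{Properties}} directly and override the mutators, so the data lives in the inherited {{Hashtable}} and {{keys()}}/{{keySet()}} stay in agreement by construction:

```java
import java.util.Map;
import java.util.Properties;

/** Sketch: an immutable Properties that keeps keys() and keySet() consistent. */
public class SafeImmutableProperties extends Properties {

    public SafeImmutableProperties(Properties source) {
        // Copy entries via super.put so the overridden, throwing put() is bypassed.
        for (Map.Entry<Object, Object> e : source.entrySet()) {
            super.put(e.getKey(), e.getValue());
        }
    }

    @Override
    public synchronized Object put(Object key, Object value) {
        throw new UnsupportedOperationException("immutable");
    }

    @Override
    public synchronized Object remove(Object key) {
        throw new UnsupportedOperationException("immutable");
    }

    @Override
    public synchronized void clear() {
        throw new UnsupportedOperationException("immutable");
    }
}
```

A complete version would also override the remaining Hashtable mutators ({{putAll}}, {{putIfAbsent}}, {{merge}}, etc.); the point is that because the entries are stored in the Hashtable itself rather than behind a delegate, every read path sees the same data.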
[jira] [Created] (GOBBLIN-311) Gobblin AWS runs old jobs when cluster is restarted.
Joel Baranick created GOBBLIN-311: - Summary: Gobblin AWS runs old jobs when cluster is restarted. Key: GOBBLIN-311 URL: https://issues.apache.org/jira/browse/GOBBLIN-311 Project: Apache Gobblin Issue Type: Bug Reporter: Joel Baranick Assignee: Hung Tran On startup of my cluster, old jobs are still attempted. @htran1 said that they should be cleaned up in Standalone mode, but that does not seem compatible with running under AWS: [http://gobblin.readthedocs.io/en/latest/user-guide/Gobblin-Deployment/#standalone-architecture] Also, if I enabled Standalone mode, then {{GobblinClusterManager.sendShutdownRequest()}} won't be called. Additionally, when enabling Standalone mode, GobblinClusterManager will call the following code, which doesn't seem right if I'm running under AWS: {code:java} // In AWS / Yarn mode, the cluster Launcher takes care of setting up Helix cluster /// .. but for Standalone mode, we go via this main() method, so setup the cluster here if (isStandaloneClusterManager) { // Create Helix cluster and connect to it String zkConnectionString = config.getString(GobblinClusterConfigurationKeys.ZK_CONNECTION_STRING_KEY); String helixClusterName = config.getString(GobblinClusterConfigurationKeys.HELIX_CLUSTER_NAME_KEY); HelixUtils.createGobblinHelixCluster(zkConnectionString, helixClusterName, false); LOGGER.info("Created Helix cluster " + helixClusterName); } {code} Thoughts? -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (GOBBLIN-227) JobLauncherUtils.cleanTaskStagingData fails for jobs with forks
[ https://issues.apache.org/jira/browse/GOBBLIN-227?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joel Baranick updated GOBBLIN-227: -- Description: *Precondition:* Using Hocon configuration with two forks configured. *Summary:* When {{AbstractJobLauncher}} calls {{JobLauncherUtils.cleanTaskStagingData}}, it tries to look up {{writer.staging.dir}} in the configuration and fails. *Details:* Hocon configuration doesn't allow the following config: {code:none} writer.staging.dir=/foo writer.staging.dir.0=/foo writer.staging.dir.1=/foo {code} Initially {{writer.staging.dir}} is of type String, but when the Hocon parser encounters {{writer.staging.dir.0}}, it decides that {{writer.staging.dir}} is now of type Object, overwriting the prior value with {{_\{"0": "/foo"\}_}}. The effective Hocon configuration is: {code:javascript} { "writer": { "staging": { "dir": { "0": "/foo", "1": "/foo" } } } } {code} Fork-specific configuration uses the same config keys as regular configuration, except the fork number is appended, like {{.1}}. The code that looks up fork-specific configuration doesn't automatically fall back to regular configuration. For example, if the code is trying to find {{writer.staging.dir.0}} and it isn't configured, the job will fail. This means that all forks must configure fork-specific versions of {{writer.staging.dir}}. When {{AbstractJobLauncher}} calls {{JobLauncherUtils.cleanTaskStagingData}}, it cleans up the staging data based on the current job's configuration. Because of this, {{fork.branches}} is always set to {{1}}. The call to {{WriterUtils.getWriterStagingDir(state, numBranches, branchId)}} is made with {{numBranches=1}} and {{branchId=0}}. This results in the method looking for {{writer.staging.dir}}. Unfortunately, when using Hocon configuration, the value {{writer.staging.dir}} doesn't exist and the job fails. was: *Precondition:* Using Hocon configuration and have two forks configured. *Summary:* When {{AbstractJobLauncher}} calls {{JobLauncherUtils.cleanTaskStagingData}} it tries to lookup {{writer.staging.dir}} in the configuration and fails. *Details:* Hocon configuration doesn't allow the following config: {code:none} writer.staging.dir=/foo writer.staging.dir.0=/foo writer.staging.dir.1=/foo {code} Initially {{writer.staging.dir}} is of type String, but when the Hocon parser encounters {{writer.staging.dir.0}}, it decides that {{writer.staging.dir}} is now of type Object, overwriting the prior value with {{_\{"0": "/foo"\}_}}. The effective Hocon configuration is: {code:javascript} { "writer": { "staging": { "dir": { "0": "/foo", "1": "/foo" } } } } {code} Fork specific configuration uses the same config keys as regular configuration except the fork number is appended like: {{.1}}. The code that looks up fork specific configuration doesn't automatically fallback to regular configuration. For example, if the code is trying to find {{writer.staging.dir.0}} and it isn't configured, the job will fail. Then means that all forks must configure fork specific versions of {{writer.staging.dir}}. When {{AbstractJobLauncher}} calls {{JobLauncherUtils.cleanTaskStagingData}} it cleans up the based on the current job's configuration. Because of this, {{fork.branches}} is always set to {{1}}. The call to {{WriterUtils.getWriterStagingDir(state, numBranches, branchId)}} is make with {{numBranches=1}} and {{branchId=0}}. This results in the method looking for {{writer.staging.dir}}. Unfortunately, when using Hocon configuration the value {{writer.staging.dir}} doesn't exist and the job fails. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
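The missing fallback described in this issue can be sketched in isolation. The helper below is hypothetical (it is not the actual {{WriterUtils.getWriterStagingDir}} implementation or signature): when the branch-specific {{writer.staging.dir.<branchId>}} key is absent, it falls back to the plain {{writer.staging.dir}} key instead of failing:

```java
import java.util.Properties;

/** Sketch: fork-aware config lookup with fallback to the unsuffixed key (hypothetical helper). */
public class ForkConfigLookup {

    // Returns writer.staging.dir.<branchId> when multiple branches are configured and the
    // branch-specific key exists; otherwise falls back to the plain writer.staging.dir key.
    public static String getStagingDir(Properties props, int numBranches, int branchId) {
        if (numBranches > 1) {
            String forkValue = props.getProperty("writer.staging.dir." + branchId);
            if (forkValue != null) {
                return forkValue;
            }
        }
        return props.getProperty("writer.staging.dir");
    }
}
```

With a fallback like this, a cleanup path that runs with {{numBranches=1}} still resolves a usable staging directory even when Hocon has collapsed {{writer.staging.dir}} into an object and only the suffixed keys survive in flattened form.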
[jira] [Created] (GOBBLIN-227) JobLauncherUtils.cleanTaskStagingData fails for jobs with forks
Joel Baranick created GOBBLIN-227: - Summary: JobLauncherUtils.cleanTaskStagingData fails for jobs with forks Key: GOBBLIN-227 URL: https://issues.apache.org/jira/browse/GOBBLIN-227 Project: Apache Gobblin Issue Type: Bug Reporter: Joel Baranick *Precondition:* Using Hocon configuration with two forks configured. *Summary:* When {{AbstractJobLauncher}} calls {{JobLauncherUtils.cleanTaskStagingData}} it tries to look up {{writer.staging.dir}} in the configuration and fails. *Details:* Hocon configuration doesn't allow the following config: {code:none} writer.staging.dir=/foo writer.staging.dir.0=/foo writer.staging.dir.1=/foo {code} Initially {{writer.staging.dir}} is of type String, but when the Hocon parser encounters {{writer.staging.dir.0}}, it decides that {{writer.staging.dir}} is now of type Object, overwriting the prior value with {{_\{"0": "/foo"\}_}}. The effective Hocon configuration is: {code:javascript} { "writer": { "staging": { "dir": { "0": "/foo", "1": "/foo" } } } } {code} Fork-specific configuration uses the same config keys as regular configuration, except that the fork number is appended, like {{.1}}. The code that looks up fork-specific configuration doesn't automatically fall back to regular configuration. For example, if the code is trying to find {{writer.staging.dir.0}} and it isn't configured, the job will fail. This means that all forks must configure fork-specific versions of {{writer.staging.dir}}. When {{AbstractJobLauncher}} calls {{JobLauncherUtils.cleanTaskStagingData}} it cleans up the staging data based on the current job's configuration. In that configuration, {{fork.branches}} is always set to {{1}}, so the call to {{WriterUtils.getWriterStagingDir(state, numBranches, branchId)}} is made with {{numBranches=1}} and {{branchId=0}}. This results in the method looking for {{writer.staging.dir}}. Unfortunately, when using Hocon configuration the value {{writer.staging.dir}} doesn't exist and the job fails. 
-- This message was sent by Atlassian JIRA (v6.4.14#64029)
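The Hocon key collision described in GOBBLIN-227 can be reproduced without Gobblin at all. Below is a minimal, stdlib-only Java sketch (the class name {{HoconPathDemo}} and the helper {{put}} are invented for illustration, not real Hocon or Gobblin APIs) of how a path-based parser materializes dotted keys into nested objects, clobbering the scalar at {{writer.staging.dir}} as soon as {{writer.staging.dir.0}} is parsed:

```java
import java.util.HashMap;
import java.util.Map;

public class HoconPathDemo {
    // Insert a dotted key into a nested map the way a path-based config
    // parser (such as Hocon) materializes "a.b.c" as nested objects.
    @SuppressWarnings("unchecked")
    static void put(Map<String, Object> root, String dottedKey, String value) {
        String[] parts = dottedKey.split("\\.");
        Map<String, Object> node = root;
        for (int i = 0; i < parts.length - 1; i++) {
            Object child = node.get(parts[i]);
            if (!(child instanceof Map)) {
                // A scalar at an intermediate path is silently replaced by an
                // object -- exactly how "writer.staging.dir" loses its value.
                child = new HashMap<String, Object>();
                node.put(parts[i], child);
            }
            node = (Map<String, Object>) child;
        }
        node.put(parts[parts.length - 1], value);
    }

    public static void main(String[] args) {
        Map<String, Object> root = new HashMap<>();
        put(root, "writer.staging.dir", "/foo");
        put(root, "writer.staging.dir.0", "/foo"); // turns "dir" into an object
        put(root, "writer.staging.dir.1", "/foo");
        Map<?, ?> staging = (Map<?, ?>) ((Map<?, ?>) root.get("writer")).get("staging");
        System.out.println(staging.get("dir") instanceof Map); // prints "true"
    }
}
```

After the third {{put}}, a lookup of {{writer.staging.dir}} as a plain string finds an object instead, which is what {{WriterUtils.getWriterStagingDir}} runs into during cleanup.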
[jira] [Created] (GOBBLIN-208) JobCatalogs should fallback to system configuration
Joel Baranick created GOBBLIN-208: - Summary: JobCatalogs should fallback to system configuration Key: GOBBLIN-208 URL: https://issues.apache.org/jira/browse/GOBBLIN-208 Project: Apache Gobblin Issue Type: Bug Reporter: Joel Baranick When `GobblinClusterManager` creates the `JobCatalog`, it passes in a copy of the system config, scoped to the `gobblin.cluster.` prefix. This causes problems later when jobs are loaded, because properties they refer to may not be available. The config should fall back to the unmodified system config. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
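The missing fallback can be illustrated with plain `java.util.Properties` defaults, which behave analogously to Typesafe Config's `withFallback`. This is a hedged sketch, not the actual `GobblinClusterManager` code; the key names (`fs.uri`, `gobblin.cluster.work.dir`) are made up for illustration:

```java
import java.util.Properties;

public class ScopedConfigDemo {
    public static void main(String[] args) {
        Properties system = new Properties();
        system.setProperty("fs.uri", "hdfs://nn:8020");
        system.setProperty("gobblin.cluster.work.dir", "/work");

        // Scoped copy: only keys under the "gobblin.cluster." prefix survive,
        // so anything else a job refers to is lost.
        Properties scoped = new Properties();
        String prefix = "gobblin.cluster.";
        for (String key : system.stringPropertyNames()) {
            if (key.startsWith(prefix)) {
                scoped.setProperty(key.substring(prefix.length()), system.getProperty(key));
            }
        }
        System.out.println(scoped.getProperty("fs.uri"));       // prints "null"

        // Proposed behavior: chain the unmodified system config as the fallback,
        // so unresolved keys are looked up there.
        Properties withFallback = new Properties(system);
        for (String key : scoped.stringPropertyNames()) {
            withFallback.setProperty(key, scoped.getProperty(key));
        }
        System.out.println(withFallback.getProperty("fs.uri")); // prints "hdfs://nn:8020"
    }
}
```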
[jira] [Created] (GOBBLIN-196) Properties saved to JobState cannot be retrieved from DatasetState
Joel Baranick created GOBBLIN-196: - Summary: Properties saved to JobState cannot be retrieved from DatasetState Key: GOBBLIN-196 URL: https://issues.apache.org/jira/browse/GOBBLIN-196 Project: Apache Gobblin Issue Type: Bug Reporter: Joel Baranick In 0.6.0, properties could be saved to JobState and then retrieved from DatasetState via a `getProp()` call. From 0.9.0 on, properties can no longer be retrieved from DatasetState because `getProp()` (and other methods) have been overridden to throw `UnsupportedOperationException`. This is a backwards-incompatible change and makes it hard to build solutions on top of Gobblin that require state persisted across job runs. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
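The incompatibility can be sketched in a few lines. These classes are hypothetical stand-ins, not the real Gobblin types; only the override-and-throw pattern mirrors the report:

```java
import java.util.Properties;

// Hypothetical sketch of the backwards-incompatible pattern described above;
// class and method names mirror the report but are not the real Gobblin classes.
class JobState {
    protected final Properties props = new Properties();
    public void setProp(String key, String value) { props.setProperty(key, value); }
    public String getProp(String key) { return props.getProperty(key); }
}

class DatasetState extends JobState {
    @Override
    public String getProp(String key) {
        // Post-0.9.0 behavior per the report: reads are no longer supported.
        throw new UnsupportedOperationException("getProp not supported on DatasetState");
    }
}

public class StateDemo {
    public static void main(String[] args) {
        DatasetState state = new DatasetState();
        state.setProp("my.checkpoint", "42");
        try {
            state.getProp("my.checkpoint"); // worked in 0.6.0
        } catch (UnsupportedOperationException e) {
            System.out.println("caller breaks: " + e.getMessage());
        }
    }
}
```

Any caller that relied on the 0.6.0 read path now throws at runtime, which is why cross-run state built on top of Gobblin breaks.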
[jira] [Commented] (GOBBLIN-32) StateStores created with rootDir that is incompatible with state.store.type
[ https://issues.apache.org/jira/browse/GOBBLIN-32?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16116876#comment-16116876 ] Joel Baranick commented on GOBBLIN-32: -- @htran1 Can you look at this PR, which should solve this issue: https://github.com/apache/incubator-gobblin/pull/2035 > StateStores created with rootDir that is incompatible with state.store.type > --- > > Key: GOBBLIN-32 > URL: https://issues.apache.org/jira/browse/GOBBLIN-32 > Project: Apache Gobblin > Issue Type: Bug >Reporter: Joel Baranick >Assignee: Hung Tran > > The StateStores class, when run under gobblin-yarn, can be created with a > rootDir (which comes from the yarn application work directory and is in the > form of `HDFS://...`) that is incompatible with the configured > `state.store.type`. > > *Github Url* : https://github.com/linkedin/gobblin/issues/1848 > *Github Reporter* : [~jbaranick] > *Github Created At* : 2017-05-09T17:30:00Z > *Github Updated At* : 2017-06-22T21:36:54Z > h3. Comments > > [~jbaranick] wrote on 2017-06-22T21:36:54Z : @htran1 Are you able to look > into this? > > *Github Url* : > https://github.com/linkedin/gobblin/issues/1848#issuecomment-310509726 -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (GOBBLIN-159) Gobblin Cluster graceful shutdown of master and workers
[ https://issues.apache.org/jira/browse/GOBBLIN-159?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16114785#comment-16114785 ] Joel Baranick commented on GOBBLIN-159: --- Not sure why it didn't auto attach: https://github.com/apache/incubator-gobblin/pull/2037 > Gobblin Cluster graceful shutdown of master and workers > --- > > Key: GOBBLIN-159 > URL: https://issues.apache.org/jira/browse/GOBBLIN-159 > Project: Apache Gobblin > Issue Type: Bug >Reporter: Abhishek Tiwari >Assignee: Zhixiong Chen > > Relevant chat from Gitter channel: > *Joel Baranick @kadaan Jun 30 10:47* > Up scaling seems to work great. But down scaling caused problems with the > cluster. > Basically, once the cpu dropped enough to start down scaling, something broke > where it stopped processing jobs. > I’m concerned that the down scaling is not graceful and that the cluster > doesn’t respond nicely to workers leaving the cluster in the middle of > processing. > There are a couple problems I see. One is that the workers don't gracefully > stop running tasks and allow them to be picked up by other nodes. > The other is that if task publishing is used, partial data might be published > when the node goes away. How does the task get completed without possibly > duplicating data? > *Joel Baranick @kadaan Jun 30 12:07* > @abti What I'm wondering is how we can shutdown a worker node and have it > gracefully stop working. > *Joel Baranick @kadaan Jun 30 12:52* > Also, seems like .../taskstates/... as well as the job...job.state file in > NFS don't get purged. > Our NFS is experiencing unbounded growth. Are we missing a setting or service? > *Abhishek Tiwari @abti Jun 30 15:36* > I didn’t fully understand the issue. Did you see the workers abruptly cancel > the task or did they wait for it to finish before shutting down? If the > worker waits around enough for Task to finish, the task level publish should > be fine? 
> *Joel Baranick @kadaan Jun 30 15:37* > The workers never shut down. > *Abhishek Tiwari @abti Jun 30 15:38* > could be because they wait for graceful shutdown but do not leave cluster and > are assigned new tasks by helix? > *Joel Baranick @kadaan Jun 30 15:39* > I think one issue is that there is an > org.quartz.UnableToInterruptJobException in JobScheduler.shutDown which > causes it to never run > ExecutorsUtils.shutdownExecutorService(this.jobExecutor, Optional.of(LOG)); > *Abhishek Tiwari @abti Jun 30 15:40* > also taskstates should get cleaned up, check with @htran1 too .. only wu > probably should be left around > we need to add some cleaning mechanism for that > we dont recall seeing the lurking state files > *Joel Baranick @kadaan Jun 30 15:47* > In my EFS/NFS, I have tons (> 6000) of files remaining under > .../_taskstates/... for jobs/tasks that have been completed for ages. > *Abhishek Tiwari @abti Jun 30 16:29* > wow thats unexpected, did master switch while several jobs were going on? > *Joel Baranick @kadaan Jun 30 17:23* > There isn't a way for master to switch without jobs running as they don't > cancel correctly. > *Joel Baranick @kadaan Jul 05 14:22* > @abti I was looking at fixing the cancellation problem. > From what I can tell, GobblinHelixJob needs to implement InterruptableJob. > And it needs to call jobLauncher.cancelJob(jobListener); when it is invoked. > Does this seem right? Anything I'm missing? > *Abhishek Tiwari @abti Jul 06 00:34* > looks about right -- This message was sent by Atlassian JIRA (v6.4.14#64029)
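The cancellation fix sketched at the end of the chat above (implement Quartz's `InterruptableJob` and call `jobLauncher.cancelJob(jobListener)` when interrupted) boils down to interrupting the running task and letting the executor drain. A stdlib-only approximation, with illustrative names and no Quartz dependency:

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

public class GracefulCancelDemo {
    public static void main(String[] args) throws InterruptedException {
        ExecutorService jobExecutor = Executors.newSingleThreadExecutor();
        Future<?> task = jobExecutor.submit(() -> {
            // A cooperative task: it checks its interrupt flag, so a
            // cancel(true) from the scheduler actually stops it.
            while (!Thread.currentThread().isInterrupted()) {
                Thread.yield(); // simulated work
            }
        });
        task.cancel(true);  // roughly what an interrupt handler must trigger
        jobExecutor.shutdown();
        boolean drained = jobExecutor.awaitTermination(5, TimeUnit.SECONDS);
        System.out.println("worker drained cleanly: " + drained); // prints "worker drained cleanly: true"
    }
}
```

Without the interrupt (plain `shutdown()` alone), the loop never exits and the worker hangs, which matches the "workers never shut down" symptom in the chat.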
[jira] [Commented] (GOBBLIN-138) Task metrics are not saved to Job History Database when running under Yarn
[ https://issues.apache.org/jira/browse/GOBBLIN-138?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16114558#comment-16114558 ] Joel Baranick commented on GOBBLIN-138: --- [~abti] Seems like they are being stored now, but they show up in the properties table, not the metrics table. > Task metrics are not saved to Job History Database when running under Yarn > -- > > Key: GOBBLIN-138 > URL: https://issues.apache.org/jira/browse/GOBBLIN-138 > Project: Apache Gobblin > Issue Type: Bug > Components: gobblin-yarn >Reporter: Joel Baranick >Assignee: Abhishek Tiwari > Labels: Bug:Generic, LaunchType:Yarn > > Task level metrics are not transmitted from the containers that tasks are > running on back to the app_master. This means that task level metrics cannot > be saved in the Job History Database. We should be able to store the task > level metrics in the Job History Database just like when we run a standalone > job. > > *Github Url* : https://github.com/linkedin/gobblin/issues/748 > *Github Reporter* : [~jbaranick] > *Github Created At* : 2016-02-23T21:33:39Z > *Github Updated At* : 2016-03-08T06:15:25Z > h3. Comments > > *ydai1124* wrote on 2016-02-23T21:42:11Z : @kadaan We are deprecating the > Job/Task metrics you are using. It is better to switch to Gobblin Metrics: > https://github.com/linkedin/gobblin/wiki/Gobblin%20Metrics%20Architecture. It > has more content and is more stable. But we don't have the reporter to report > to database yet. You can implement your own reporter: > https://github.com/linkedin/gobblin/wiki/Implementing%20New%20Reporters. > > > *Github Url* : > https://github.com/linkedin/gobblin/issues/748#issuecomment-187926718 > > [~stakiar] wrote on 2016-02-24T00:07:38Z : @kadaan is this bug blocking the > deployment of the `JobHistoryStore` or are you actually interested in > viewing the `TaskMetrics`? Just want to understand where the bug is. 
> AS @ydai1124 mentioned we are moving away from `TaskMetrics` and > `JobMetrics`, and at some point we want to remove the current way > `TaskMetrics` and `JobMetrics` are written to the `JobHistoryStore`. > > > *Github Url* : > https://github.com/linkedin/gobblin/issues/748#issuecomment-187977305 > > [~jbaranick] wrote on 2016-02-24T00:09:20Z : No, it is not blocking anything. > > > *Github Url* : > https://github.com/linkedin/gobblin/issues/748#issuecomment-187978192 -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (GOBBLIN-152) Private version of Apache Helix causes maven repo to be unusable
[ https://issues.apache.org/jira/browse/GOBBLIN-152?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16114557#comment-16114557 ] Joel Baranick commented on GOBBLIN-152: --- [~abti] Hasn't this already been fixed? > Private version of Apache Helix causes maven repo to be unusable > > > Key: GOBBLIN-152 > URL: https://issues.apache.org/jira/browse/GOBBLIN-152 > Project: Apache Gobblin > Issue Type: Bug > Components: gobblin-helix >Reporter: Joel Baranick >Assignee: Hung Tran > Labels: Bug:Generic, Framework:Build, LaunchType:Yarn > > The gobblin-yarn build.gradle includes a reference to a private version of > helix: `compile files('./src/main/resources/helix-core-0.6.6-SNAPSHOT.jar')`. > When the gobblin libraries are pushed to maven, the private version of helix > is not pushed. Because of this, tarballs built from maven are missing the > helix jar. > Is it possible to switch to the latest release or snapshot version of helix? > > *Github Url* : https://github.com/linkedin/gobblin/issues/525 > *Github Reporter* : [~jbaranick] > *Github Created At* : 2015-12-15T01:43:37Z > *Github Updated At* : 2017-01-12T04:31:44Z > h3. Comments > > [~liyinan926] wrote on 2015-12-15T19:26:40Z : The local Helix jar contains > critical patches that have not been merged into the trunk yet. We are working > on that though. > > > *Github Url* : > https://github.com/linkedin/gobblin/issues/525#issuecomment-164866018 > > [~jbaranick] wrote on 2015-12-15T22:23:17Z : For now we are just pushing it > to artifactory. Please update this when the merge to helix trunk is done. > > > *Github Url* : > https://github.com/linkedin/gobblin/issues/525#issuecomment-164916245 > > [~jbaranick] wrote on 2016-01-04T18:26:17Z : @liyinan926 What is the status > of this? 
> > > *Github Url* : > https://github.com/linkedin/gobblin/issues/525#issuecomment-168759278 > > [~jbaranick] wrote on 2016-01-14T16:44:05Z : @liyinan926 Can you please > provide links to the PRs for the critical patches to Helix so that we can > track the progress of this? > > > *Github Url* : > https://github.com/linkedin/gobblin/issues/525#issuecomment-171697162 > > [~stakiar] wrote on 2016-02-05T21:15:04Z : Here are the PRs: > https://github.com/apache/helix/pull/34 > https://github.com/apache/helix/pull/35 > I believe release 0.7.2 of Helix will have these changes. I don't know when it > will be released to Maven. > > > *Github Url* : > https://github.com/linkedin/gobblin/issues/525#issuecomment-180556016 -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (GOBBLIN-129) AdminUI performs too many requests when update is pressed
[ https://issues.apache.org/jira/browse/GOBBLIN-129?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16114553#comment-16114553 ] Joel Baranick commented on GOBBLIN-129: --- Fixed by [Gobblin-9] > AdminUI performs too many requests when update is pressed > - > > Key: GOBBLIN-129 > URL: https://issues.apache.org/jira/browse/GOBBLIN-129 > Project: Apache Gobblin > Issue Type: Bug > Components: gobblin-admin >Reporter: Joel Baranick >Assignee: Abhishek Tiwari > Labels: Framework:AdminUI, enhancement > > After using the AdminUI for a while and navigating from the overview, to job > information, to job details, and back, the update button causes too many > requests. The update button should only be updating the information for the > current page, but in this case it is making requests for data for all > previous pages as well. > > *Github Url* : https://github.com/linkedin/gobblin/issues/784 > *Github Reporter* : [~jbaranick] > *Github Created At* : 2016-03-02T07:09:21Z > *Github Updated At* : 2017-01-12T04:44:12Z -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Resolved] (GOBBLIN-129) AdminUI performs too many requests when update is pressed
[ https://issues.apache.org/jira/browse/GOBBLIN-129?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joel Baranick resolved GOBBLIN-129. --- Resolution: Fixed Fixed by [Gobblin-9] > AdminUI performs too many requests when update is pressed > - > > Key: GOBBLIN-129 > URL: https://issues.apache.org/jira/browse/GOBBLIN-129 > Project: Apache Gobblin > Issue Type: Bug > Components: gobblin-admin >Reporter: Joel Baranick >Assignee: Abhishek Tiwari > Labels: Framework:AdminUI, enhancement > > After using the AdminUI for a while and navigating from the overview, to job > information, to job details, and back, the update button causes too many > requests. The update button should only be updating the information for the > current page, but in this case it is making requests for data for all > previous pages as well. > > *Github Url* : https://github.com/linkedin/gobblin/issues/784 > *Github Reporter* : [~jbaranick] > *Github Created At* : 2016-03-02T07:09:21Z > *Github Updated At* : 2017-01-12T04:44:12Z -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (GOBBLIN-159) Gobblin Cluster graceful shutdown of master and workers
[ https://issues.apache.org/jira/browse/GOBBLIN-159?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16114544#comment-16114544 ] Joel Baranick commented on GOBBLIN-159: --- Zhixiong Chen, are you actively working on this? > Gobblin Cluster graceful shutdown of master and workers > --- > > Key: GOBBLIN-159 > URL: https://issues.apache.org/jira/browse/GOBBLIN-159 > Project: Apache Gobblin > Issue Type: Bug >Reporter: Abhishek Tiwari >Assignee: Zhixiong Chen > > Relevant chat from Gitter channel: > *Joel Baranick @kadaan Jun 30 10:47* > Up scaling seems to work great. But down scaling caused problems with the > cluster. > Basically, once the cpu dropped enough to start down scaling, something broke > where it stopped processing jobs. > I’m concerned that the down scaling is not graceful and that the cluster > doesn’t respond nicely to workers leaving the cluster in the middle of > processing. > There are a couple problems I see. One is that the workers don't gracefully > stop running tasks and allow them to be picked up by other nodes. > The other is that if task publishing is used, partial data might be published > when the node goes away. How does the task get completed without possibly > duplicating data? > *Joel Baranick @kadaan Jun 30 12:07* > @abti What I'm wondering is how we can shutdown a worker node and have it > gracefully stop working. > *Joel Baranick @kadaan Jun 30 12:52* > Also, seems like .../taskstates/... as well as the job...job.state file in > NFS don't get purged. > Our NFS is experiencing unbounded growth. Are we missing a setting or service? > *Abhishek Tiwari @abti Jun 30 15:36* > I didn’t fully understand the issue. Did you see the workers abruptly cancel > the task or did they wait for it to finish before shutting down? If the > worker waits around enough for Task to finish, the task level publish should > be fine? > *Joel Baranick @kadaan Jun 30 15:37* > The workers never shut down. 
> *Abhishek Tiwari @abti Jun 30 15:38* > could be because they wait for graceful shutdown but do not leave cluster and > are assigned new tasks by helix? > *Joel Baranick @kadaan Jun 30 15:39* > I think one issue is that there is an > org.quartz.UnableToInterruptJobException in JobScheduler.shutDown which > causes it to never run > ExecutorsUtils.shutdownExecutorService(this.jobExecutor, Optional.of(LOG)); > *Abhishek Tiwari @abti Jun 30 15:40* > also taskstates should get cleaned up, check with @htran1 too .. only wu > probably should be left around > we need to add some cleaning mechanism for that > we dont recall seeing the lurking state files > *Joel Baranick @kadaan Jun 30 15:47* > In my EFS/NFS, I have tons (> 6000) of files remaining under > .../_taskstates/... for jobs/tasks that have been completed for ages. > *Abhishek Tiwari @abti Jun 30 16:29* > wow thats unexpected, did master switch while several jobs were going on? > *Joel Baranick @kadaan Jun 30 17:23* > There isn't a way for master to switch without jobs running as they don't > cancel correctly. > *Joel Baranick @kadaan Jul 05 14:22* > @abti I was looking at fixing the cancellation problem. > From what I can tell, GobblinHelixJob needs to implement InterruptableJob. > And it needs to call jobLauncher.cancelJob(jobListener); when it is invoked. > Does this seem right? Anything I'm missing? > *Abhishek Tiwari @abti Jul 06 00:34* > looks about right -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Created] (GOBBLIN-188) Update website URLs to point to https://gobblin.apache.org/
Joel Baranick created GOBBLIN-188: - Summary: Update website URLs to point to https://gobblin.apache.org/ Key: GOBBLIN-188 URL: https://issues.apache.org/jira/browse/GOBBLIN-188 Project: Apache Gobblin Issue Type: Bug Reporter: Joel Baranick Assignee: Abhishek Tiwari The URL at the top of https://github.com/apache/incubator-gobblin needs to point to https://gobblin.apache.org/ The URL listed as the website on http://incubator.apache.org/projects/gobblin.html needs to point to https://gobblin.apache.org/ -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Created] (GOBBLIN-187) Gobblin Helix doesn't clean up `.job.state` files, causing unbounded disk usage
Joel Baranick created GOBBLIN-187: - Summary: Gobblin Helix doesn't clean up `.job.state` files, causing unbounded disk usage Key: GOBBLIN-187 URL: https://issues.apache.org/jira/browse/GOBBLIN-187 Project: Apache Gobblin Issue Type: Bug Reporter: Joel Baranick When Gobblin is running on Helix, the `GobblinHelixJobLauncher.createJob` method writes the job state to a `.job.state` file. Nothing cleans up these files. The result is unbounded disk usage. `.job.state` files should be deleted at the completion of jobs. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
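One plausible shape for the fix, hedged as a sketch (the helper and file naming below are invented, not the real `GobblinHelixJobLauncher` logic): write the `.job.state` file, then delete it in a `finally` block so that both completion and failure reclaim the disk space.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

public class JobStateCleanup {
    // Sketch: persist job state for the duration of the run, then clean up.
    static void runJob(Path workDir, String jobId) throws IOException {
        Path jobStateFile = workDir.resolve(jobId + ".job.state");
        Files.write(jobStateFile, new byte[0]); // stand-in for serializing job state
        try {
            // ... submit the Helix job and wait for completion ...
        } finally {
            Files.deleteIfExists(jobStateFile); // prevents unbounded disk usage
        }
    }

    public static void main(String[] args) throws IOException {
        Path workDir = Files.createTempDirectory("gobblin-work");
        runJob(workDir, "job_1234");
        System.out.println(Files.exists(workDir.resolve("job_1234.job.state"))); // prints "false"
    }
}
```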
[jira] [Resolved] (GOBBLIN-127) Admin UI duration chart is sorted incorrectly
[ https://issues.apache.org/jira/browse/GOBBLIN-127?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joel Baranick resolved GOBBLIN-127. --- Resolution: Fixed Fixed by [GOBBLIN-9] > Admin UI duration chart is sorted incorrectly > - > > Key: GOBBLIN-127 > URL: https://issues.apache.org/jira/browse/GOBBLIN-127 > Project: Apache Gobblin > Issue Type: Bug > Components: gobblin-admin >Reporter: Joel Baranick > Labels: Bug:Generic, Framework:AdminUI > > The Job Duration chart in the AdminUI is sorted incorrectly. It is sorted by > duration, but should be sorted by time. > (screenshot: https://cloud.githubusercontent.com/assets/1904898/13618328/60a39822-e538-11e5-983b-4706e45c2a34.png) > > *Github Url* : https://github.com/linkedin/gobblin/issues/811 > *Github Reporter* : [~jbaranick] > *Github Created At* : 2016-03-08T22:17:30Z > *Github Updated At* : 2017-01-12T04:46:36Z -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Closed] (GOBBLIN-109) Remove need for current.jst
[ https://issues.apache.org/jira/browse/GOBBLIN-109?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joel Baranick closed GOBBLIN-109. - Resolution: Fixed No longer needed. We implemented our own S3aStateStore which handles this. > Remove need for current.jst > --- > > Key: GOBBLIN-109 > URL: https://issues.apache.org/jira/browse/GOBBLIN-109 > Project: Apache Gobblin > Issue Type: Task >Reporter: Joel Baranick > Labels: Framework:StateManagement, enhancement > > Fix for #882 > > *Github Url* : https://github.com/linkedin/gobblin/pull/965 > *Github Reporter* : [~jbaranick] > *Github Assignee* : [~jbaranick] > *Github Created At* : 2016-05-05T22:04:54Z > *Github Updated At* : 2017-04-22T18:44:42Z > h3. Comments > > [~jbaranick] wrote on 2016-05-05T22:06:22Z : @sahilTakiar @zliu41: Can you > review this? > > > *Github Url* : > https://github.com/linkedin/gobblin/pull/965#issuecomment-217293611 > > *coveralls* wrote on 2016-05-05T22:21:08Z : [![Coverage > Status](https://coveralls.io/builds/6068577/badge)](https://coveralls.io/builds/6068577) > Coverage increased (+0.5%) to 45.026% when pulling > **193dddb831475f931999a9aca54c5c00e2d082d3 on kadaan:FixFor882** into > **41963701538ae90ed8042c8d34a2ed7211a9af42 on linkedin:master**. > > > *Github Url* : > https://github.com/linkedin/gobblin/pull/965#issuecomment-217296762 > > *zliu41* wrote on 2016-05-06T16:07:44Z : @kadaan could you please give a > brief description of your approach? It seems you are still using > `current.jst`, which is a different approach than #882. > > > *Github Url* : > https://github.com/linkedin/gobblin/pull/965#issuecomment-217485933 > > [~jbaranick] wrote on 2016-05-06T16:35:48Z : `current.jst` is not used. > There is a compromise here so that users of the API aren't broken. New > callers can call `getCurrent` or `getAllCurrent` to get the latest state. If > they want a specific state they can continue to call `get` or `getAll`. 
If > `current` or `current.jst` is requested when calling `get` or `getAll` it > will return the latest state just like `getCurrent` and `getAllCurrent`. A > precondition will ensure that users of the API are not able to write a file > named `current` or `current.jst`. > > > *Github Url* : > https://github.com/linkedin/gobblin/pull/965#issuecomment-217492590 > > *coveralls* wrote on 2016-05-06T16:57:15Z : [![Coverage > Status](https://coveralls.io/builds/6078675/badge)](https://coveralls.io/builds/6078675) > Coverage increased (+0.1%) to 45.096% when pulling > **e5f0498095f1f6bcbf25e3ed0316ffd772275d73 on kadaan:FixFor882** into > **588d8c77fe3c84c752fd410f916868419c178465 on linkedin:master**. > > > *Github Url* : > https://github.com/linkedin/gobblin/pull/965#issuecomment-217497736 > > *coveralls* wrote on 2016-05-12T07:47:06Z : [![Coverage > Status](https://coveralls.io/builds/6149251/badge)](https://coveralls.io/builds/6149251) > Coverage increased (+0.1%) to 46.765% when pulling > **67e66222cd441d903b4197d380df8041cce2cc9d on kadaan:FixFor882** into > **5cd9d969f73456e46847c9d9e7ef33ad5376617c on linkedin:master**. > > > *Github Url* : > https://github.com/linkedin/gobblin/pull/965#issuecomment-218684160 > > [~jbaranick] wrote on 2016-05-12T20:52:47Z : @zliu41 @sahilTakiar Can you > guys finish this review? > > > *Github Url* : > https://github.com/linkedin/gobblin/pull/965#issuecomment-218882376 > > [~jbaranick] wrote on 2016-06-01T15:05:13Z : @zliu41 @sahilTakiar Can you > guys finish this review? > > > *Github Url* : > https://github.com/linkedin/gobblin/pull/965#issuecomment-223022088 > > *coveralls* wrote on 2016-06-01T15:33:05Z : [![Coverage > Status](https://coveralls.io/builds/6419146/badge)](https://coveralls.io/builds/6419146) > Coverage increased (+0.07%) to 46.308% when pulling > **668262a91516d9919a1cd30c141b058514890c8e on kadaan:FixFor882** into > **fe7dc7c35eebc3a4faee9987ecccaae358c5 on linkedin:master**. 
> > > *Github Url* : > https://github.com/linkedin/gobblin/pull/965#issuecomment-223031389 > > *zliu41* wrote on 2016-06-01T19:09:53Z : @pcadabam @ibuenros @ydai1124 can > you review this PR? Thanks > > > *Github Url* : > https://github.com/linkedin/gobblin/pull/965#issuecomment-223094308 > > *coveralls* wrote on 2016-06-01T20:12:06Z : [![Coverage > Status](https://coveralls.io/builds/6423539/badge)](https://coveralls.io/builds/6423539) > Coverage increased (+0.2%) to 46.398% when pulling > **668262a91516d9919a1cd30c141b058514890c8e on kadaan:FixFor882** into > **fe7dc7c35eebc3a4faee9987ecccaae358c5 on linkedin:master**. > > > *Github Url* : > https://github.com/linkedin/gobblin/pull/965#issuecomment-223110224 > > [~jbaranick] wrote on
[jira] [Resolved] (GOBBLIN-39) JobHistoryDB migration files have been incorrectly modified
[ https://issues.apache.org/jira/browse/GOBBLIN-39?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joel Baranick resolved GOBBLIN-39. -- Resolution: Fixed Resolved by [GOBBLIN-11] > JobHistoryDB migration files have been incorrectly modified > --- > > Key: GOBBLIN-39 > URL: https://issues.apache.org/jira/browse/GOBBLIN-39 > Project: Apache Gobblin > Issue Type: Bug >Reporter: Joel Baranick > > The flyway DB migration files cannot be changed after they are committed. If > you need to make a schema change, it must be done in a newly versioned > file. The way these changes have been made screws up the migration: > 58389c95dc00b23cb1c63ce88a18be9239aa465e, > a4dbf76d17c39f8282d3b765c32de61f2eb23404, > 82678450952a7de194b810dbd82cd0c5b4752e63. > Changing previous migration files changes the checksums. The flyway > migration then fails because of the differing checksums. This check is here > to ensure that flyway can always know what changes, and in what order, need > to be applied. The DB migration is done by running: > `./historystore-manager.sh migrate -Durl=jdbc:mysql:///gobblin > -Duser= -Dpassword=`. > More details can be found in: > https://github.com/linkedin/gobblin/tree/master/gobblin-metastore/src/main/resources > As a short term work around, the following can be added to the migration > command: `-DvalidateOnMigrate=false`. This removes much of the safety net, > but allows the changes to be processed. Please don't rely on this mechanism. > > *Github Url* : https://github.com/linkedin/gobblin/issues/1823 > *Github Reporter* : [~jbaranick] > *Github Created At* : 2017-05-02T00:27:27Z > *Github Updated At* : 2017-05-02T00:27:27Z -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Resolved] (GOBBLIN-40) Job History DB Schema had not been updated to reflect new LauncherType
[ https://issues.apache.org/jira/browse/GOBBLIN-40?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joel Baranick resolved GOBBLIN-40. -- Resolution: Fixed Resolved by [GOBBLIN-11] > Job History DB Schema had not been updated to reflect new LauncherType > -- > > Key: GOBBLIN-40 > URL: https://issues.apache.org/jira/browse/GOBBLIN-40 > Project: Apache Gobblin > Issue Type: Bug >Reporter: Joel Baranick > > A new launcher type has been added, `CLUSTER`, but the JobHistoryDB schema > has not been updated to support this type. > > *Github Url* : https://github.com/linkedin/gobblin/issues/1822 > *Github Reporter* : [~jbaranick] > *Github Created At* : 2017-05-01T23:40:03Z > *Github Updated At* : 2017-05-01T23:40:03Z -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Resolved] (GOBBLIN-30) Reflections errors when scanning classpath and encountering missing/invalid file paths.
[ https://issues.apache.org/jira/browse/GOBBLIN-30?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joel Baranick resolved GOBBLIN-30. -- Resolution: Fixed Resolved by [GOBBLIN-10] > Reflections errors when scanning classpath and encountering missing/invalid > file paths. > --- > > Key: GOBBLIN-30 > URL: https://issues.apache.org/jira/browse/GOBBLIN-30 > Project: Apache Gobblin > Issue Type: Bug >Reporter: Joel Baranick > > Reflections should filter out classpath entries which are missing/invalid. > ``` > 2017-05-04 23:58:03 UTC WARN [JobExecutionInfoServer STARTING] > org.reflections.vfs.Vfs- could not create Dir using directory from url > file:/usr/lib/packages/hadoop2/hadoop2/share/hadoop/mapreduce/lib/*. skipping. > java.lang.NullPointerException > at org.reflections.vfs.Vfs$DefaultUrlTypes$3.matches(Vfs.java:239) > at org.reflections.vfs.Vfs.fromURL(Vfs.java:98) > at org.reflections.vfs.Vfs.fromURL(Vfs.java:91) > at org.reflections.Reflections.scan(Reflections.java:237) > at org.reflections.Reflections.scan(Reflections.java:204) > at org.reflections.Reflections.(Reflections.java:129) > at org.reflections.Reflections.(Reflections.java:170) > at > gobblin.metastore.DatabaseJobHistoryStore.findVersionedDatabaseJobHistoryStore(DatabaseJobHistoryStore.java:102) > at > gobblin.metastore.DatabaseJobHistoryStore.(DatabaseJobHistoryStore.java:61) > at > gobblin.metastore.DatabaseJobHistoryStore$$FastClassByGuice$$ec6cc1b8.newInstance() > at > com.google.inject.internal.cglib.reflect.$FastConstructor.newInstance(FastConstructor.java:40) > at > com.google.inject.internal.DefaultConstructionProxyFactory$1.newInstance(DefaultConstructionProxyFactory.java:61) > at > com.google.inject.internal.ConstructorInjector.provision(ConstructorInjector.java:105) > at > com.google.inject.internal.ConstructorInjector.construct(ConstructorInjector.java:85) > at > com.google.inject.internal.ConstructorBindingImpl$Factory.get(ConstructorBindingImpl.java:267) > at 
com.google.inject.internal.FactoryProxy.get(FactoryProxy.java:56) > at > com.google.inject.internal.InjectorImpl$2$1.call(InjectorImpl.java:1016) > at > com.google.inject.internal.InjectorImpl.callInContext(InjectorImpl.java:1092) > at com.google.inject.internal.InjectorImpl$2.get(InjectorImpl.java:1012) > at > com.google.inject.internal.InjectorImpl.getInstance(InjectorImpl.java:1051) > at > gobblin.rest.JobExecutionInfoServer.startUp(JobExecutionInfoServer.java:85) > at > com.google.common.util.concurrent.AbstractIdleService$2$1.run(AbstractIdleService.java:54) > at com.google.common.util.concurrent.Callables$3.run(Callables.java:95) > at java.lang.Thread.run(Thread.java:745) > ``` > > *Github Url* : https://github.com/linkedin/gobblin/issues/1851 > *Github Reporter* : [~jbaranick] > *Github Created At* : 2017-05-09T17:41:23Z > *Github Updated At* : 2017-05-09T17:41:39Z -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (GOBBLIN-31) Reflections concurrency issue
[ https://issues.apache.org/jira/browse/GOBBLIN-31?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16105561#comment-16105561 ]

Joel Baranick commented on GOBBLIN-31:
--------------------------------------

[~abti] This was fixed by [GOBBLIN-10]

> Reflections concurrency issue
> -----------------------------
>
>          Key: GOBBLIN-31
>          URL: https://issues.apache.org/jira/browse/GOBBLIN-31
>      Project: Apache Gobblin
>   Issue Type: Bug
>     Reporter: Joel Baranick
>
> Reflections has a concurrency issue that causes the classpath scanning in `DatabaseJobHistoryStore` to intermittently fail. The Reflections scanner needs to be created only once per application.
> ```
> 2017-05-08 14:52:06 UTC INFO [DefaultQuartzScheduler_Worker-1] org.quartz.core.JobRunShell - Job my.job threw a JobExecutionException:
> org.quartz.JobExecutionException: com.google.inject.ProvisionException: Unable to provision, see the following errors:
> 1) Error injecting constructor, java.lang.IllegalStateException: zip file closed
>   at gobblin.metastore.DatabaseJobHistoryStore.<init>(DatabaseJobHistoryStore.java:69)
>   while locating gobblin.metastore.DatabaseJobHistoryStore
>   while locating gobblin.metastore.JobHistoryStore
> 1 error [See nested exception: com.google.inject.ProvisionException: Unable to provision, see the following errors:
> 1) Error injecting constructor, java.lang.IllegalStateException: zip file closed
>   at gobblin.metastore.DatabaseJobHistoryStore.<init>(DatabaseJobHistoryStore.java:69)
>   while locating gobblin.metastore.DatabaseJobHistoryStore
>   while locating gobblin.metastore.JobHistoryStore
> 1 error]
> 	at gobblin.cluster.GobblinHelixJob.executeImpl(GobblinHelixJob.java:87)
> 	at gobblin.scheduler.BaseGobblinJob.execute(BaseGobblinJob.java:53)
> 	at org.quartz.core.JobRunShell.run(JobRunShell.java:202)
> 	at org.quartz.simpl.SimpleThreadPool$WorkerThread.run(SimpleThreadPool.java:573)
> Caused by: com.google.inject.ProvisionException: Unable to provision, see the following errors:
> 1) Error injecting constructor, java.lang.IllegalStateException: zip file closed
>   at gobblin.metastore.DatabaseJobHistoryStore.<init>(DatabaseJobHistoryStore.java:69)
>   while locating gobblin.metastore.DatabaseJobHistoryStore
>   while locating gobblin.metastore.JobHistoryStore
> 1 error
> 	at com.google.inject.internal.InjectorImpl$2.get(InjectorImpl.java:1025)
> 	at com.google.inject.internal.InjectorImpl.getInstance(InjectorImpl.java:1051)
> 	at gobblin.runtime.JobContext.createJobHistoryStore(JobContext.java:202)
> 	at gobblin.runtime.JobContext.<init>(JobContext.java:141)
> 	at gobblin.runtime.AbstractJobLauncher.<init>(AbstractJobLauncher.java:172)
> 	at gobblin.runtime.AbstractJobLauncher.<init>(AbstractJobLauncher.java:144)
> 	at gobblin.cluster.GobblinHelixJobLauncher.<init>(GobblinHelixJobLauncher.java:120)
> 	at gobblin.cluster.GobblinHelixJob.executeImpl(GobblinHelixJob.java:65)
> 	... 3 more
> Caused by: java.lang.IllegalStateException: zip file closed
> 	at java.util.zip.ZipFile.ensureOpen(ZipFile.java:634)
> 	at java.util.zip.ZipFile.access$200(ZipFile.java:56)
> 	at java.util.zip.ZipFile$1.hasMoreElements(ZipFile.java:487)
> 	at java.util.jar.JarFile$1.hasMoreElements(JarFile.java:241)
> 	at org.reflections.vfs.ZipDir$1$1.computeNext(ZipDir.java:30)
> 	at org.reflections.vfs.ZipDir$1$1.computeNext(ZipDir.java:26)
> 	at com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:143)
> 	at com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:138)
> 	at org.reflections.Reflections.scan(Reflections.java:240)
> 	at org.reflections.Reflections.scan(Reflections.java:204)
> 	at org.reflections.Reflections.<init>(Reflections.java:129)
> 	at gobblin.metastore.DatabaseJobHistoryStore.findVersionedDatabaseJobHistoryStore(DatabaseJobHistoryStore.java:124)
> 	at gobblin.metastore.DatabaseJobHistoryStore.<init>(DatabaseJobHistoryStore.java:71)
> 	at gobblin.metastore.DatabaseJobHistoryStore$$FastClassByGuice$$ec6cc1b8.newInstance()
> 	at com.google.inject.internal.cglib.reflect.$FastConstructor.newInstance(FastConstructor.java:40)
> 	at com.google.inject.internal.DefaultConstructionProxyFactory$1.newInstance(DefaultConstructionProxyFactory.java:61)
> 	at com.google.inject.internal.ConstructorInjector.provision(ConstructorInjector.java:105)
> 	at com.google.inject.internal.ConstructorInjector.construct(ConstructorInjector.java:85)
> 	at com.google.inject.internal.ConstructorBindingImpl$Factory.get(ConstructorBindingImpl.java:267)
> 	at com.google.inject.internal.FactoryProxy.get(FactoryProxy.java:56)
> 	at com.google.inject.internal.InjectorImpl$2$1.call(InjectorImpl.java:1016)
> 	at
> ```
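The remedy the issue names — creating the Reflections scanner only once per application — is the classic once-only initialization problem. A minimal sketch using the initialization-on-demand holder idiom is below; `ScannerCache` and `expensiveScan` are hypothetical stand-ins, not Gobblin or Reflections API, and a real fix would cache the actual `Reflections` instance the same way.

```java
import java.util.concurrent.atomic.AtomicInteger;

public class ScannerCache {

    // Counts how many times the stand-in "scan" runs, so the
    // once-per-application guarantee is observable.
    static final AtomicInteger scans = new AtomicInteger();

    // Hypothetical stand-in for the expensive classpath scan that
    // races (and hits the closed zip file) when run concurrently.
    static Object expensiveScan() {
        scans.incrementAndGet();
        return new Object();
    }

    // Initialization-on-demand holder: the JVM guarantees a class's
    // static initializer runs exactly once, even under concurrent
    // access, so the scan can no longer be triggered twice in parallel.
    private static class Holder {
        static final Object SCANNER = expensiveScan();
    }

    public static Object getScanner() {
        return Holder.SCANNER;
    }

    public static void main(String[] args) throws InterruptedException {
        // Two threads racing to get the scanner still produce one scan.
        Thread t1 = new Thread(ScannerCache::getScanner);
        Thread t2 = new Thread(ScannerCache::getScanner);
        t1.start(); t2.start();
        t1.join(); t2.join();
        System.out.println(scans.get()); // prints 1
    }
}
```

The holder idiom is preferable here to double-checked locking because it needs no `volatile` and no explicit synchronization; the class-loading machinery provides the happens-before ordering.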
[jira] [Commented] (GOBBLIN-30) Reflections errors when scanning classpath and encountering missing/invalid file paths.
[ https://issues.apache.org/jira/browse/GOBBLIN-30?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16105559#comment-16105559 ]

Joel Baranick commented on GOBBLIN-30:
--------------------------------------

[~abti] This was fixed by [GOBBLIN-10]

> Reflections errors when scanning classpath and encountering missing/invalid file paths.
> ---------------------------------------------------------------------------------------
>
>          Key: GOBBLIN-30
>          URL: https://issues.apache.org/jira/browse/GOBBLIN-30
>      Project: Apache Gobblin
>   Issue Type: Bug
>     Reporter: Joel Baranick
>
> Reflections should filter out classpath entries which are missing/invalid.
> ```
> 2017-05-04 23:58:03 UTC WARN [JobExecutionInfoServer STARTING] org.reflections.vfs.Vfs - could not create Dir using directory from url file:/usr/lib/packages/hadoop2/hadoop2/share/hadoop/mapreduce/lib/*. skipping.
> java.lang.NullPointerException
> 	at org.reflections.vfs.Vfs$DefaultUrlTypes$3.matches(Vfs.java:239)
> 	at org.reflections.vfs.Vfs.fromURL(Vfs.java:98)
> 	at org.reflections.vfs.Vfs.fromURL(Vfs.java:91)
> 	at org.reflections.Reflections.scan(Reflections.java:237)
> 	at org.reflections.Reflections.scan(Reflections.java:204)
> 	at org.reflections.Reflections.<init>(Reflections.java:129)
> 	at org.reflections.Reflections.<init>(Reflections.java:170)
> 	at gobblin.metastore.DatabaseJobHistoryStore.findVersionedDatabaseJobHistoryStore(DatabaseJobHistoryStore.java:102)
> 	at gobblin.metastore.DatabaseJobHistoryStore.<init>(DatabaseJobHistoryStore.java:61)
> 	at gobblin.metastore.DatabaseJobHistoryStore$$FastClassByGuice$$ec6cc1b8.newInstance()
> 	at com.google.inject.internal.cglib.reflect.$FastConstructor.newInstance(FastConstructor.java:40)
> 	at com.google.inject.internal.DefaultConstructionProxyFactory$1.newInstance(DefaultConstructionProxyFactory.java:61)
> 	at com.google.inject.internal.ConstructorInjector.provision(ConstructorInjector.java:105)
> 	at com.google.inject.internal.ConstructorInjector.construct(ConstructorInjector.java:85)
> 	at com.google.inject.internal.ConstructorBindingImpl$Factory.get(ConstructorBindingImpl.java:267)
> 	at com.google.inject.internal.FactoryProxy.get(FactoryProxy.java:56)
> 	at com.google.inject.internal.InjectorImpl$2$1.call(InjectorImpl.java:1016)
> 	at com.google.inject.internal.InjectorImpl.callInContext(InjectorImpl.java:1092)
> 	at com.google.inject.internal.InjectorImpl$2.get(InjectorImpl.java:1012)
> 	at com.google.inject.internal.InjectorImpl.getInstance(InjectorImpl.java:1051)
> 	at gobblin.rest.JobExecutionInfoServer.startUp(JobExecutionInfoServer.java:85)
> 	at com.google.common.util.concurrent.AbstractIdleService$2$1.run(AbstractIdleService.java:54)
> 	at com.google.common.util.concurrent.Callables$3.run(Callables.java:95)
> 	at java.lang.Thread.run(Thread.java:745)
> ```
>
> *Github Url* : https://github.com/linkedin/gobblin/issues/1851
> *Github Reporter* : [~jbaranick]
> *Github Created At* : 2017-05-09T17:41:23Z
> *Github Updated At* : 2017-05-09T17:41:39Z

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)