[jira] [Updated] (GOBBLIN-531) Gobblin AWS Worker cannot start because of state store type and uri mismatch

2018-07-12 Thread Joel Baranick (JIRA)


 [ 
https://issues.apache.org/jira/browse/GOBBLIN-531?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joel Baranick updated GOBBLIN-531:
--
Attachment: Screen Shot 2018-07-12 at 8.47.07 AM.png

> Gobblin AWS Worker cannot start because of state store type and uri mismatch
> 
>
> Key: GOBBLIN-531
> URL: https://issues.apache.org/jira/browse/GOBBLIN-531
> Project: Apache Gobblin
>  Issue Type: Bug
>  Components: gobblin-aws
>Affects Versions: 0.12.0
>Reporter: Joel Baranick
>Assignee: Abhishek Tiwari
>Priority: Major
>  Labels: aws, helix
> Attachments: Screen Shot 2018-07-12 at 8.47.07 AM.png
>
>
> Something changed between 0.10.0 and 0.12.0 that causes the _StateStores_ 
> class to be instantiated with a _state.store.fs.uri_ that is mismatched with 
> the _state.store.type_.  
> The problem seems to come from: 
> [GobblinTaskRunner.java#L250|https://github.com/apache/incubator-gobblin/blob/0.12.0/gobblin-cluster/src/main/java/org/apache/gobblin/cluster/GobblinTaskRunner.java#L250]
> It creates a new _Config_ like: 
>  
> {code:java}
> Config stateStoreJobConfig = ConfigUtils.propertiesToConfig(properties)
>     .withValue(ConfigurationKeys.STATE_STORE_FS_URI_KEY,
>         ConfigValueFactory.fromAnyRef(rootPathUri.toString()));
> {code}
> Compare this to: 
> [GobblinHelixJobLauncher.java#L156|https://github.com/apache/incubator-gobblin/blob/0.12.0/gobblin-cluster/src/main/java/org/apache/gobblin/cluster/GobblinHelixJobLauncher.java#L156]
>  
> It creates a new _Config_ like:
>  
> {code:java}
> Config stateStoreJobConfig = ConfigUtils.propertiesToConfig(jobProps)
>     .withValue(ConfigurationKeys.STATE_STORE_FS_URI_KEY,
>         ConfigValueFactory.fromAnyRef(new URI(appWorkDir.toUri().getScheme(), null,
>             appWorkDir.toUri().getHost(), appWorkDir.toUri().getPort(),
>             null, null, null).toString()));
> {code}
> The following screenshot shows the callstack and the overridden value.
> !Screen Shot 2018-07-12 at 8.47.07 AM.png!



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (GOBBLIN-531) Gobblin AWS Worker cannot start because of state store type and uri mismatch

2018-07-12 Thread Joel Baranick (JIRA)
Joel Baranick created GOBBLIN-531:
-

 Summary: Gobblin AWS Worker cannot start because of state store 
type and uri mismatch
 Key: GOBBLIN-531
 URL: https://issues.apache.org/jira/browse/GOBBLIN-531
 Project: Apache Gobblin
  Issue Type: Bug
  Components: gobblin-aws
Affects Versions: 0.12.0
Reporter: Joel Baranick
Assignee: Abhishek Tiwari


Something changed between 0.10.0 and 0.12.0 that causes the _StateStores_ 
class to be instantiated with a _state.store.fs.uri_ that is mismatched with 
the _state.store.type_.  

The problem seems to come from: 
[GobblinTaskRunner.java#L250|https://github.com/apache/incubator-gobblin/blob/0.12.0/gobblin-cluster/src/main/java/org/apache/gobblin/cluster/GobblinTaskRunner.java#L250]

It creates a new _Config_ like: 

 
{code:java}
Config stateStoreJobConfig = ConfigUtils.propertiesToConfig(properties)
    .withValue(ConfigurationKeys.STATE_STORE_FS_URI_KEY,
        ConfigValueFactory.fromAnyRef(rootPathUri.toString()));
{code}
Compare this to: 
[GobblinHelixJobLauncher.java#L156|https://github.com/apache/incubator-gobblin/blob/0.12.0/gobblin-cluster/src/main/java/org/apache/gobblin/cluster/GobblinHelixJobLauncher.java#L156]

It creates a new _Config_ like:
{code:java}
Config stateStoreJobConfig = ConfigUtils.propertiesToConfig(jobProps)
    .withValue(ConfigurationKeys.STATE_STORE_FS_URI_KEY,
        ConfigValueFactory.fromAnyRef(new URI(appWorkDir.toUri().getScheme(), null,
            appWorkDir.toUri().getHost(), appWorkDir.toUri().getPort(),
            null, null, null).toString()));
{code}
The following screenshot shows the callstack and the overridden value.

!Screen Shot 2018-07-12 at 8.47.07 AM.png!
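For illustration, here is a minimal standalone sketch (hypothetical HDFS work dir, not the actual Gobblin config plumbing) of how the two constructions can produce different state store URIs for the same work dir, since the launcher-style code rebuilds the URI from scheme/host/port alone:

{code:java}
import java.net.URI;

// Hypothetical illustration with an example path: the launcher-style
// construction drops the path component, while the task-runner-style
// construction keeps the full root path URI.
public class StateStoreUriSketch {
    public static void main(String[] args) throws Exception {
        URI appWorkDir = new URI("hdfs://namenode:8020/gobblin/work"); // assumed work dir

        // GobblinTaskRunner-style: use the root path URI verbatim
        String taskRunnerStyle = appWorkDir.toString();

        // GobblinHelixJobLauncher-style: scheme/host/port only, path discarded
        String jobLauncherStyle = new URI(appWorkDir.getScheme(), null,
            appWorkDir.getHost(), appWorkDir.getPort(), null, null, null).toString();

        System.out.println(taskRunnerStyle);  // hdfs://namenode:8020/gobblin/work
        System.out.println(jobLauncherStyle); // hdfs://namenode:8020
    }
}
{code}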





[jira] [Resolved] (GOBBLIN-371) In gobblin_pr, Jira resolution fails if python jira package is not installed

2018-02-14 Thread Joel Baranick (JIRA)

 [ 
https://issues.apache.org/jira/browse/GOBBLIN-371?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joel Baranick resolved GOBBLIN-371.
---
   Resolution: Fixed
Fix Version/s: 0.13.0

Issue resolved by pull request #2246
[https://github.com/apache/incubator-gobblin/pull/2246]

> In gobblin_pr, Jira resolution fails if python jira package is not installed
> 
>
> Key: GOBBLIN-371
> URL: https://issues.apache.org/jira/browse/GOBBLIN-371
> Project: Apache Gobblin
>  Issue Type: Bug
>Reporter: Joel Baranick
>Assignee: Joel Baranick
>Priority: Major
> Fix For: 0.13.0
>
>
> In gobblin_pr, Jira resolution fails if python jira package is not installed. 
>  If this happens, there is no easy way to recover and you have to resolve the 
> jira issue manually.





[jira] [Created] (GOBBLIN-389) Gobblin class resolution requires all classes to be in gobblin packages

2018-01-25 Thread Joel Baranick (JIRA)
Joel Baranick created GOBBLIN-389:
-

 Summary: Gobblin class resolution requires all classes to be in 
gobblin packages
 Key: GOBBLIN-389
 URL: https://issues.apache.org/jira/browse/GOBBLIN-389
 Project: Apache Gobblin
  Issue Type: Bug
Reporter: Joel Baranick


Gobblin performs classpath scanning to allow loading classes from configured 
aliases.  The current mechanism forces classes to be in a few specific gobblin 
packages.  This is confusing for users and increases support costs.
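As a rough sketch of an alternative (hypothetical names, not the actual Gobblin alias code), resolution could consult an explicit alias registry first and fall back to the fully qualified class name, so aliased classes can live in any package:

{code:java}
import java.util.Map;

// Hypothetical sketch: resolve an alias via an explicit registry, falling back
// to Class.forName on the fully qualified name, with no package restriction.
public class AliasResolverSketch {
    // Stand-in alias/target pair for the demo only
    private static final Map<String, String> ALIASES = Map.of(
        "list", "java.util.ArrayList");

    static Class<?> resolve(String nameOrAlias) throws ClassNotFoundException {
        return Class.forName(ALIASES.getOrDefault(nameOrAlias, nameOrAlias));
    }

    public static void main(String[] args) throws Exception {
        System.out.println(resolve("list").getName());             // alias hit
        System.out.println(resolve("java.lang.String").getName()); // FQCN fallback
    }
}
{code}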





[jira] [Commented] (GOBBLIN-371) In gobblin_pr, Jira resolution fails if python jira package is not installed

2018-01-25 Thread Joel Baranick (JIRA)

[ 
https://issues.apache.org/jira/browse/GOBBLIN-371?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16339425#comment-16339425
 ] 

Joel Baranick commented on GOBBLIN-371:
---

[~abti] Any feedback on this bug and the associated PR?

> In gobblin_pr, Jira resolution fails if python jira package is not installed
> 
>
> Key: GOBBLIN-371
> URL: https://issues.apache.org/jira/browse/GOBBLIN-371
> Project: Apache Gobblin
>  Issue Type: Bug
>Reporter: Joel Baranick
>Assignee: Joel Baranick
>Priority: Major
>
> In gobblin_pr, Jira resolution fails if python jira package is not installed. 
>  If this happens, there is no easy way to recover and you have to resolve the 
> jira issue manually.





[jira] [Commented] (GOBBLIN-318) Gobblin Helix Jobs Hang Indefinitely

2018-01-22 Thread Joel Baranick (JIRA)

[ 
https://issues.apache.org/jira/browse/GOBBLIN-318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16334662#comment-16334662
 ] 

Joel Baranick commented on GOBBLIN-318:
---

Great!  I think we should leave this open as there is still some underlying 
issue that needs to be fixed.

> Gobblin Helix Jobs Hang Indefinitely 
> -
>
> Key: GOBBLIN-318
> URL: https://issues.apache.org/jira/browse/GOBBLIN-318
> Project: Apache Gobblin
>  Issue Type: Bug
>Reporter: Joel Baranick
>Priority: Critical
>
> In some cases, gobblin helix jobs can hang indefinitely.  When coupled with 
> job locks, this can result in a job becoming stuck and not progressing.  The 
> only solution currently is to restart the master node.
> Assume the following is for {{job_myjob_1510884004834}}, which hung at 
> 2017-11-17 02:09:00 UTC and was still hung at 2017-11-17 09:12:00 UTC. 
> {{GobblinHelixJobLauncher.waitForJobCompletion()}} is never detecting the job 
> as completed. This results in the {{TaskStateCollectorService}} indefinitely 
> searching for more task states, even though it has processed all the task 
> states that are ever going to be produced.  There is no reference to the hung 
> job in Zookeeper at {{/mycluster/CONFIGS/RESOURCE}}.  In the Helix Web Admin, 
> the hung job doesn't exist at {{/clusters/mycluster/jobQueues/jobname}}. 
> There is no record of the job in Zookeeper at 
> {{/mycluster/PROPERTYSTORE/TaskRebalancer/jobname/Context}}.  This means that 
> the {{GobblinHelixJobLauncher.waitForJobCompletion()}} code fails.
> {code:java}
> private void waitForJobCompletion() throws InterruptedException {
> while (true) {
>   WorkflowContext workflowContext = 
> TaskDriver.getWorkflowContext(this.helixManager, this.helixQueueName);
>   if (workflowContext != null) {
> org.apache.helix.task.TaskState helixJobState = 
> workflowContext.getJobState(this.jobResourceName);
> if (helixJobState == org.apache.helix.task.TaskState.COMPLETED ||
> helixJobState == org.apache.helix.task.TaskState.FAILED ||
> helixJobState == org.apache.helix.task.TaskState.STOPPED) {
>   return;
> }
>   }
>   Thread.sleep(1000);
> }
>   }
> {code}
> The code gets the job state from Zookeeper:
> {code:javascript}
> {
>   "id": "WorkflowContext",
>   "simpleFields": {
> "START_TIME": "1505159715449",
> "STATE": "IN_PROGRESS"
>   },
>   "listFields": {},
>   "mapFields": {
> "JOB_STATES": {
>   "jobname_job_jobname_150741571": "COMPLETED",
>   "jobname_job_jobname_150775680": "COMPLETED",
>   "jobname_job_jobname_150795931": "COMPLETED",
>   "jobname_job_jobname_1509857102910": "COMPLETED",
>   "jobname_job_jobname_1510253708033": "COMPLETED",
>   "jobname_job_jobname_1510271102898": "COMPLETED",
>   "jobname_job_jobname_1510852210668": "COMPLETED",
>   "jobname_job_jobname_1510853133675": "COMPLETED"
> }
>   }
> }
> {code}
> But there is no information contained in the job state for the hung job.
> Also, it is really strange that the job states contained in that json blob 
> are so old.  The oldest one is from 2017-10-7 10:35:00 PM UTC, more than a 
> month ago.
> I'm not sure how the system got in this state, but this isn't the first time 
> we have seen this.  While it would be good to prevent this from happening, it 
> would also be good to allow the system to recover if this state is entered.





[jira] [Commented] (GOBBLIN-378) Task only publish data when the state is successful in the earlier processing

2018-01-17 Thread Joel Baranick (JIRA)

[ 
https://issues.apache.org/jira/browse/GOBBLIN-378?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16329866#comment-16329866
 ] 

Joel Baranick commented on GOBBLIN-378:
---

What versions does this impact?  Can you give more details to allow others to 
assess whether they are impacted?

> Task only publish data when the state is successful in the earlier processing
> -
>
> Key: GOBBLIN-378
> URL: https://issues.apache.org/jira/browse/GOBBLIN-378
> Project: Apache Gobblin
>  Issue Type: Bug
>Reporter: Kuai Yu
>Assignee: Kuai Yu
>Priority: Major
>






[jira] [Comment Edited] (GOBBLIN-378) Task only publish data when the state is successful in the earlier processing

2018-01-17 Thread Joel Baranick (JIRA)

[ 
https://issues.apache.org/jira/browse/GOBBLIN-378?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16329866#comment-16329866
 ] 

Joel Baranick edited comment on GOBBLIN-378 at 1/18/18 1:40 AM:


What versions does this impact?  Can you give more details to allow others to 
assess whether they are impacted?


was (Author: jbaranick):
What versions does this impact?  Can you give more details to allow others to 
access whether they are impacted?

> Task only publish data when the state is successful in the earlier processing
> -
>
> Key: GOBBLIN-378
> URL: https://issues.apache.org/jira/browse/GOBBLIN-378
> Project: Apache Gobblin
>  Issue Type: Bug
>Reporter: Kuai Yu
>Assignee: Kuai Yu
>Priority: Major
>






[jira] [Commented] (GOBBLIN-318) Gobblin Helix Jobs Hang Indefinitely

2018-01-17 Thread Joel Baranick (JIRA)

[ 
https://issues.apache.org/jira/browse/GOBBLIN-318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16329381#comment-16329381
 ] 

Joel Baranick commented on GOBBLIN-318:
---

Retries for ZK updates were added to the Helix master branch in that commit.

> Gobblin Helix Jobs Hang Indefinitely 
> -
>
> Key: GOBBLIN-318
> URL: https://issues.apache.org/jira/browse/GOBBLIN-318
> Project: Apache Gobblin
>  Issue Type: Bug
>Reporter: Joel Baranick
>Priority: Critical
>
> In some cases, gobblin helix jobs can hang indefinitely.  When coupled with 
> job locks, this can result in a job becoming stuck and not progressing.  The 
> only solution currently is to restart the master node.
> Assume the following is for {{job_myjob_1510884004834}}, which hung at 
> 2017-11-17 02:09:00 UTC and was still hung at 2017-11-17 09:12:00 UTC. 
> {{GobblinHelixJobLauncher.waitForJobCompletion()}} is never detecting the job 
> as completed. This results in the {{TaskStateCollectorService}} indefinitely 
> searching for more task states, even though it has processed all the task 
> states that are ever going to be produced.  There is no reference to the hung 
> job in Zookeeper at {{/mycluster/CONFIGS/RESOURCE}}.  In the Helix Web Admin, 
> the hung job doesn't exist at {{/clusters/mycluster/jobQueues/jobname}}. 
> There is no record of the job in Zookeeper at 
> {{/mycluster/PROPERTYSTORE/TaskRebalancer/jobname/Context}}.  This means that 
> the {{GobblinHelixJobLauncher.waitForJobCompletion()}} code fails.
> {code:java}
> private void waitForJobCompletion() throws InterruptedException {
> while (true) {
>   WorkflowContext workflowContext = 
> TaskDriver.getWorkflowContext(this.helixManager, this.helixQueueName);
>   if (workflowContext != null) {
> org.apache.helix.task.TaskState helixJobState = 
> workflowContext.getJobState(this.jobResourceName);
> if (helixJobState == org.apache.helix.task.TaskState.COMPLETED ||
> helixJobState == org.apache.helix.task.TaskState.FAILED ||
> helixJobState == org.apache.helix.task.TaskState.STOPPED) {
>   return;
> }
>   }
>   Thread.sleep(1000);
> }
>   }
> {code}
> The code gets the job state from Zookeeper:
> {code:javascript}
> {
>   "id": "WorkflowContext",
>   "simpleFields": {
> "START_TIME": "1505159715449",
> "STATE": "IN_PROGRESS"
>   },
>   "listFields": {},
>   "mapFields": {
> "JOB_STATES": {
>   "jobname_job_jobname_150741571": "COMPLETED",
>   "jobname_job_jobname_150775680": "COMPLETED",
>   "jobname_job_jobname_150795931": "COMPLETED",
>   "jobname_job_jobname_1509857102910": "COMPLETED",
>   "jobname_job_jobname_1510253708033": "COMPLETED",
>   "jobname_job_jobname_1510271102898": "COMPLETED",
>   "jobname_job_jobname_1510852210668": "COMPLETED",
>   "jobname_job_jobname_1510853133675": "COMPLETED"
> }
>   }
> }
> {code}
> But there is no information contained in the job state for the hung job.
> Also, it is really strange that the job states contained in that json blob 
> are so old.  The oldest one is from 2017-10-7 10:35:00 PM UTC, more than a 
> month ago.
> I'm not sure how the system got in this state, but this isn't the first time 
> we have seen this.  While it would be good to prevent this from happening, it 
> would also be good to allow the system to recover if this state is entered.





[jira] [Commented] (GOBBLIN-302) Handle stuck Helix workflow

2018-01-17 Thread Joel Baranick (JIRA)

[ 
https://issues.apache.org/jira/browse/GOBBLIN-302?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16329191#comment-16329191
 ] 

Joel Baranick commented on GOBBLIN-302:
---

This seems like it is related to: 
[GOBBLIN-318|https://issues.apache.org/jira/browse/GOBBLIN-318]

> Handle stuck Helix workflow
> ---
>
> Key: GOBBLIN-302
> URL: https://issues.apache.org/jira/browse/GOBBLIN-302
> Project: Apache Gobblin
>  Issue Type: Improvement
>Reporter: Arjun Singh Bora
>Assignee: Arjun Singh Bora
>Priority: Trivial
>






[jira] [Commented] (GOBBLIN-373) Expose task executor auto scale metrics to external sensor

2018-01-16 Thread Joel Baranick (JIRA)

[ 
https://issues.apache.org/jira/browse/GOBBLIN-373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16327516#comment-16327516
 ] 

Joel Baranick commented on GOBBLIN-373:
---

[~yukuai518] Any more details?  Just curious how these are going to be used.

> Expose task executor auto scale metrics to external sensor
> --
>
> Key: GOBBLIN-373
> URL: https://issues.apache.org/jira/browse/GOBBLIN-373
> Project: Apache Gobblin
>  Issue Type: Task
>Reporter: Kuai Yu
>Assignee: Kuai Yu
>Priority: Major
>
> This is used for LinkedIn inGraph integration.





[jira] [Created] (GOBBLIN-371) In gobblin_pr, Jira resolution fails if python jira package is not installed

2018-01-13 Thread Joel Baranick (JIRA)
Joel Baranick created GOBBLIN-371:
-

 Summary: In gobblin_pr, Jira resolution fails if python jira 
package is not installed
 Key: GOBBLIN-371
 URL: https://issues.apache.org/jira/browse/GOBBLIN-371
 Project: Apache Gobblin
  Issue Type: Bug
Reporter: Joel Baranick


In gobblin_pr, Jira resolution fails if python jira package is not installed.  
If this happens, there is no easy way to recover and you have to resolve the 
jira issue manually.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Resolved] (GOBBLIN-207) Gobblin AWS requires job package to be publicly accessible

2018-01-12 Thread Joel Baranick (JIRA)

 [ 
https://issues.apache.org/jira/browse/GOBBLIN-207?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joel Baranick resolved GOBBLIN-207.
---
   Resolution: Fixed
Fix Version/s: 0.13.0

Issue resolved by pull request #2059 
(https://github.com/apache/incubator-gobblin/pull/2059)

> Gobblin AWS requires job package to be publicly accessible
> --
>
> Key: GOBBLIN-207
> URL: https://issues.apache.org/jira/browse/GOBBLIN-207
> Project: Apache Gobblin
>  Issue Type: Bug
>Reporter: Joel Baranick
>Assignee: Joel Baranick
> Fix For: 0.13.0
>
>
> {{GobblinAwsJobConfigurationManager}} expects that the job configuration file 
> is publicly accessible so that it can be downloaded.  This PR changes how the 
> download is done, using Hadoop FS, so that the job package can be stored on 
> filesystems that don't expose it over HTTP and so that authentication can be 
> performed.





[jira] [Comment Edited] (GOBBLIN-207) Gobblin AWS requires job package to be publicly accessible

2018-01-12 Thread Joel Baranick (JIRA)

[ 
https://issues.apache.org/jira/browse/GOBBLIN-207?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16325018#comment-16325018
 ] 

Joel Baranick edited comment on GOBBLIN-207 at 1/13/18 7:08 AM:


Issue resolved by pull request #2059
https://github.com/apache/incubator-gobblin/pull/2059


was (Author: jbaranick):
Issue resolved by pull request #2059 
(https://github.com/apache/incubator-gobblin/pull/2059)

> Gobblin AWS requires job package to be publicly accessible
> --
>
> Key: GOBBLIN-207
> URL: https://issues.apache.org/jira/browse/GOBBLIN-207
> Project: Apache Gobblin
>  Issue Type: Bug
>Reporter: Joel Baranick
>Assignee: Joel Baranick
> Fix For: 0.13.0
>
>
> {{GobblinAwsJobConfigurationManager}} expects that the job configuration file 
> is publicly accessible so that it can be downloaded.  This PR changes how the 
> download is done, using Hadoop FS, so that the job package can be stored on 
> filesystems that don't expose it over HTTP and so that authentication can be 
> performed.





[jira] [Created] (GOBBLIN-360) Helix not pruning old Zookeeper data

2018-01-06 Thread Joel Baranick (JIRA)
Joel Baranick created GOBBLIN-360:
-

 Summary: Helix not pruning old Zookeeper data
 Key: GOBBLIN-360
 URL: https://issues.apache.org/jira/browse/GOBBLIN-360
 Project: Apache Gobblin
  Issue Type: Bug
Reporter: Joel Baranick


Helix version 0.6.7 is not correctly pruning old data in zookeeper at path 
{{/root/cluster/PROPERTYSTORE/TaskRebalancer}}.  This causes the zookeeper 
cluster to keep using more disk and memory.  At some point, the number of 
children in the folder exceeds the default zookeeper {{jute.maxbuffer}} setting 
and the contents of {{/root/cluster/PROPERTYSTORE/TaskRebalancer}} cannot be 
listed, deleted, etc.  The only resolution at that point to reduce data is to 
add the {{-Djute.maxbuffer=999}} system parameter to all zookeeper servers 
and the zkCli application, restart the zookeeper processes, connect via zkCli, 
and cleanup the data.  Once done, {{-Djute.maxbuffer}} can be removed from the 
zookeeper servers and the zkCli application.  Then restart the zookeeper server 
processes.





[jira] [Updated] (GOBBLIN-359) Logged Job/Task info from TaskExecutor threads sometimes does not match the task running

2018-01-06 Thread Joel Baranick (JIRA)

 [ 
https://issues.apache.org/jira/browse/GOBBLIN-359?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joel Baranick updated GOBBLIN-359:
--
Summary: Logged Job/Task info from TaskExecutor threads sometimes does not 
match the task running  (was: Job/task info stored in MDC sometimes is 
incorrect)

> Logged Job/Task info from TaskExecutor threads sometimes does not match the 
> task running
> 
>
> Key: GOBBLIN-359
> URL: https://issues.apache.org/jira/browse/GOBBLIN-359
> Project: Apache Gobblin
>  Issue Type: Bug
>Reporter: Joel Baranick
>Assignee: Joel Baranick
>
> In some cases the job/task information that is stored in the MDC to improve 
> logging doesn't match the actual task being run on a given thread.  It seems 
> as if the MDC contents are not always being managed in a way that ensures 
> that when a task is complete the MDC data is cleared.
> One place I noticed was in {{TaskExecutor}}, where {{this.taskExecutor}} and 
> {{this.forkExecutor}} are not wrapped with 
> {{ExecutorUtils.loggingDecorator}}.  {{ExecutorUtils.loggingDecorator}} 
> ensures that submitted {{Runnable}} and {{Callable}} instances first clone 
> the MDC and finally reset the MDC.





[jira] [Created] (GOBBLIN-359) Job/task info stored in MDC sometimes is incorrect

2018-01-06 Thread Joel Baranick (JIRA)
Joel Baranick created GOBBLIN-359:
-

 Summary: Job/task info stored in MDC sometimes is incorrect
 Key: GOBBLIN-359
 URL: https://issues.apache.org/jira/browse/GOBBLIN-359
 Project: Apache Gobblin
  Issue Type: Bug
Reporter: Joel Baranick
Assignee: Joel Baranick


In some cases the job/task information that is stored in the MDC to improve 
logging doesn't match the actual task being run on a given thread.  It seems as 
if the MDC contents are not always being managed in a way that ensures that 
when a task is complete the MDC data is cleared.

One place I noticed was in {{TaskExecutor}}, where {{this.taskExecutor}} and 
{{this.forkExecutor}} are not wrapped with {{ExecutorUtils.loggingDecorator}}.  
{{ExecutorUtils.loggingDecorator}} ensures that submitted {{Runnable}} and 
{{Callable}} instances first clone the MDC and finally reset the MDC.
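The clone/reset pattern described above can be sketched as follows (a self-contained stand-in using a {{ThreadLocal}} context map in place of the real slf4j MDC, so it runs without the logging dependency):

{code:java}
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of a logging decorator: capture the submitter's context,
// install a copy around the task, and always clear it afterwards so the pool
// thread carries no stale job/task info into the next task.
public class MdcDecoratorSketch {
    static final ThreadLocal<Map<String, String>> CTX =
        ThreadLocal.withInitial(HashMap::new);

    static Runnable decorate(Runnable task) {
        Map<String, String> captured = new HashMap<>(CTX.get()); // clone at submit time
        return () -> {
            CTX.set(new HashMap<>(captured));
            try {
                task.run();
            } finally {
                CTX.remove(); // reset, mirroring the clone-then-reset contract
            }
        };
    }

    public static void main(String[] args) throws Exception {
        CTX.get().put("task.id", "task-1");
        Thread worker = new Thread(decorate(() ->
            System.out.println("inside: " + CTX.get().get("task.id"))));
        worker.start();
        worker.join();
    }
}
{code}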





[jira] [Resolved] (GOBBLIN-357) Poor logging when zookeeper connection is lost

2018-01-05 Thread Joel Baranick (JIRA)

 [ 
https://issues.apache.org/jira/browse/GOBBLIN-357?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joel Baranick resolved GOBBLIN-357.
---
   Resolution: Fixed
Fix Version/s: 0.13.0

Issue resolved by pull request #2230
[https://github.com/apache/incubator-gobblin/pull/2230]

> Poor logging when zookeeper connection is lost
> --
>
> Key: GOBBLIN-357
> URL: https://issues.apache.org/jira/browse/GOBBLIN-357
> Project: Apache Gobblin
>  Issue Type: Bug
>Reporter: Joel Baranick
>Assignee: Joel Baranick
> Fix For: 0.13.0
>
>






[jira] [Created] (GOBBLIN-357) Poor logging when zookeeper connection is lost

2018-01-05 Thread Joel Baranick (JIRA)
Joel Baranick created GOBBLIN-357:
-

 Summary: Poor logging when zookeeper connection is lost
 Key: GOBBLIN-357
 URL: https://issues.apache.org/jira/browse/GOBBLIN-357
 Project: Apache Gobblin
  Issue Type: Bug
Reporter: Joel Baranick
Assignee: Joel Baranick








[jira] [Commented] (GOBBLIN-318) Gobblin Helix Jobs Hang Indefinitely

2017-12-11 Thread Joel Baranick (JIRA)

[ 
https://issues.apache.org/jira/browse/GOBBLIN-318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16286433#comment-16286433
 ] 

Joel Baranick commented on GOBBLIN-318:
---

[~abti] Job timeouts will help.  That said, the underlying issue of the 
TaskStateCollectorService missing task states should be resolved.
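A deadline-bounded variant of the polling loop quoted below (a generic sketch, not the actual GobblinHelixJobLauncher change) would at least let the launcher fail loudly instead of spinning forever when the Helix state never reaches a terminal value:

{code:java}
import java.util.concurrent.TimeoutException;
import java.util.function.BooleanSupplier;

// Hypothetical sketch: poll a completion condition with a deadline, throwing
// TimeoutException instead of looping indefinitely like waitForJobCompletion().
public class BoundedWaitSketch {
    static void waitFor(BooleanSupplier done, long timeoutMs, long pollMs)
            throws InterruptedException, TimeoutException {
        long deadline = System.currentTimeMillis() + timeoutMs;
        while (!done.getAsBoolean()) {
            if (System.currentTimeMillis() >= deadline) {
                throw new TimeoutException("job did not reach a terminal state");
            }
            Thread.sleep(pollMs);
        }
    }

    public static void main(String[] args) throws Exception {
        long start = System.currentTimeMillis();
        // Condition becomes true after ~50 ms, well inside the 1 s deadline.
        waitFor(() -> System.currentTimeMillis() - start > 50, 1000, 10);
        System.out.println("completed");
    }
}
{code}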

> Gobblin Helix Jobs Hang Indefinitely 
> -
>
> Key: GOBBLIN-318
> URL: https://issues.apache.org/jira/browse/GOBBLIN-318
> Project: Apache Gobblin
>  Issue Type: Bug
>Reporter: Joel Baranick
>Priority: Critical
>
> In some cases, gobblin helix jobs can hang indefinitely.  When coupled with 
> job locks, this can result in a job becoming stuck and not progressing.  The 
> only solution currently is to restart the master node.
> Assume the following is for {{job_myjob_1510884004834}}, which hung at 
> 2017-11-17 02:09:00 UTC and was still hung at 2017-11-17 09:12:00 UTC. 
> {{GobblinHelixJobLauncher.waitForJobCompletion()}} is never detecting the job 
> as completed. This results in the {{TaskStateCollectorService}} indefinitely 
> searching for more task states, even though it has processed all the task 
> states that are ever going to be produced.  There is no reference to the hung 
> job in Zookeeper at {{/mycluster/CONFIGS/RESOURCE}}.  In the Helix Web Admin, 
> the hung job doesn't exist at {{/clusters/mycluster/jobQueues/jobname}}. 
> There is no record of the job in Zookeeper at 
> {{/mycluster/PROPERTYSTORE/TaskRebalancer/jobname/Context}}.  This means that 
> the {{GobblinHelixJobLauncher.waitForJobCompletion()}} code fails.
> {code:java}
> private void waitForJobCompletion() throws InterruptedException {
> while (true) {
>   WorkflowContext workflowContext = 
> TaskDriver.getWorkflowContext(this.helixManager, this.helixQueueName);
>   if (workflowContext != null) {
> org.apache.helix.task.TaskState helixJobState = 
> workflowContext.getJobState(this.jobResourceName);
> if (helixJobState == org.apache.helix.task.TaskState.COMPLETED ||
> helixJobState == org.apache.helix.task.TaskState.FAILED ||
> helixJobState == org.apache.helix.task.TaskState.STOPPED) {
>   return;
> }
>   }
>   Thread.sleep(1000);
> }
>   }
> {code}
> The code gets the job state from Zookeeper:
> {code:javascript}
> {
>   "id": "WorkflowContext",
>   "simpleFields": {
> "START_TIME": "1505159715449",
> "STATE": "IN_PROGRESS"
>   },
>   "listFields": {},
>   "mapFields": {
> "JOB_STATES": {
>   "jobname_job_jobname_150741571": "COMPLETED",
>   "jobname_job_jobname_150775680": "COMPLETED",
>   "jobname_job_jobname_150795931": "COMPLETED",
>   "jobname_job_jobname_1509857102910": "COMPLETED",
>   "jobname_job_jobname_1510253708033": "COMPLETED",
>   "jobname_job_jobname_1510271102898": "COMPLETED",
>   "jobname_job_jobname_1510852210668": "COMPLETED",
>   "jobname_job_jobname_1510853133675": "COMPLETED"
> }
>   }
> }
> {code}
> But there is no information contained in the job state for the hung job.
> Also, it is really strange that the job states contained in that json blob 
> are so old.  The oldest one is from 2017-10-7 10:35:00 PM UTC, more than a 
> month ago.
> I'm not sure how the system got in this state, but this isn't the first time 
> we have seen this.  While it would be good to prevent this from happening, it 
> would also be good to allow the system to recover if this state is entered.





[jira] [Comment Edited] (GOBBLIN-318) Gobblin Helix Jobs Hang Indefinitely

2017-12-11 Thread Joel Baranick (JIRA)

[ 
https://issues.apache.org/jira/browse/GOBBLIN-318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16286467#comment-16286467
 ] 

Joel Baranick edited comment on GOBBLIN-318 at 12/11/17 8:00 PM:
-

To summarize the issue:
# The job is running.
# The job lock is still held.
# All tasks have completed successfully and written their task state files.
# The job has consumed all the task state files and updated the gobblin job and 
database.
# The helix state in Zookeeper is missing or not in a terminal state.
# The job keeps polling the state at 
"/mycluster/PROPERTYSTORE/TaskRebalancer/myjob/Context".


was (Author: jbaranick):
To summarize the issue:
# The job is running.
# The job lock is still help.
# All tasks have completed successfully and written their task state files.
# The job has consumed all the task state files and updated the gobblin job and 
database
# The helix state in Zookeeper is missing or not in a terminal state.
# The job keeps polling the state at 
"/mycluster/PROPERTYSTORE/TaskRebalancer/myjob/Context"

> Gobblin Helix Jobs Hang Indefinitely 
> -
>
> Key: GOBBLIN-318
> URL: https://issues.apache.org/jira/browse/GOBBLIN-318
> Project: Apache Gobblin
>  Issue Type: Bug
>Reporter: Joel Baranick
>Priority: Critical
>
> In some cases, gobblin helix jobs can hang indefinitely.  When coupled with 
> job locks, this can result in a job becoming stuck and not progressing.  The 
> only solution currently is to restart the master node.
> Assume the following is for {{job_myjob_1510884004834}}, which hung at 
> 2017-11-17 02:09:00 UTC and was still hung at 2017-11-17 09:12:00 UTC. 
> {{GobblinHelixJobLauncher.waitForJobCompletion()}} is never detecting the job 
> as completed. This results in the {{TaskStateCollectorService}} indefinitely 
> searching for more task states, even though it has processed all the task 
> states that are ever going to be produced.  There is no reference to the hung 
> job in Zookeeper at {{/mycluster/CONFIGS/RESOURCE}}.  In the Helix Web Admin, 
> the hung job doesn't exist at {{/clusters/mycluster/jobQueues/jobname}}. 
> There is no record of the job in Zookeeper at 
> {{/mycluster/PROPERTYSTORE/TaskRebalancer/jobname/Context}}.  This means that 
> the {{GobblinHelixJobLauncher.waitForJobCompletion()}} code fails.
> {code:java}
> private void waitForJobCompletion() throws InterruptedException {
> while (true) {
>   WorkflowContext workflowContext = 
> TaskDriver.getWorkflowContext(this.helixManager, this.helixQueueName);
>   if (workflowContext != null) {
> org.apache.helix.task.TaskState helixJobState = 
> workflowContext.getJobState(this.jobResourceName);
> if (helixJobState == org.apache.helix.task.TaskState.COMPLETED ||
> helixJobState == org.apache.helix.task.TaskState.FAILED ||
> helixJobState == org.apache.helix.task.TaskState.STOPPED) {
>   return;
> }
>   }
>   Thread.sleep(1000);
> }
>   }
> {code}
> The code gets the job state from Zookeeper:
> {code:javascript}
> {
>   "id": "WorkflowContext",
>   "simpleFields": {
> "START_TIME": "1505159715449",
> "STATE": "IN_PROGRESS"
>   },
>   "listFields": {},
>   "mapFields": {
> "JOB_STATES": {
>   "jobname_job_jobname_150741571": "COMPLETED",
>   "jobname_job_jobname_150775680": "COMPLETED",
>   "jobname_job_jobname_150795931": "COMPLETED",
>   "jobname_job_jobname_1509857102910": "COMPLETED",
>   "jobname_job_jobname_1510253708033": "COMPLETED",
>   "jobname_job_jobname_1510271102898": "COMPLETED",
>   "jobname_job_jobname_1510852210668": "COMPLETED",
>   "jobname_job_jobname_1510853133675": "COMPLETED"
> }
>   }
> }
> {code}
> But there is no information contained in the job state for the hung job.
> Also, it is really strange that the job states contained in that JSON blob 
> are so old.  The oldest one is from 2017-10-07 10:35:00 PM UTC, more than a 
> month ago.
> I'm not sure how the system got in this state, but this isn't the first time 
> we have seen this.  While it would be good to prevent this from happening, it 
> would also be good to allow the system to recover if this state is entered.
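As a sketch of the recovery idea above: a deadline-bounded variant of the polling loop would let the launcher give up on a job whose Helix context never reaches a terminal state, instead of spinning forever. This is a hypothetical illustration in plain Java (the enum and supplier are stand-ins, not Gobblin or Helix types), not the project's actual fix:

```java
import java.util.function.Supplier;

public class BoundedWait {
  // Stand-in for org.apache.helix.task.TaskState; only the states the
  // original loop checks are modeled here.
  enum JobState { IN_PROGRESS, COMPLETED, FAILED, STOPPED }

  /**
   * Polls stateSupplier until a terminal state is seen or timeoutMs elapses.
   * Returns true if a terminal state was observed, false on timeout, so the
   * caller can fail the job and release its job lock instead of hanging.
   */
  static boolean waitForCompletion(Supplier<JobState> stateSupplier,
                                   long timeoutMs, long pollMs) {
    long deadline = System.currentTimeMillis() + timeoutMs;
    while (System.currentTimeMillis() < deadline) {
      JobState s = stateSupplier.get();
      if (s == JobState.COMPLETED || s == JobState.FAILED || s == JobState.STOPPED) {
        return true;
      }
      try {
        Thread.sleep(pollMs);  // same polling style as the original loop
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
        return false;
      }
    }
    return false;  // deadline hit: the job is treated as hung
  }

  public static void main(String[] args) {
    // A state that never terminates, mimicking the hung job: the bounded
    // wait returns instead of hanging.
    System.out.println(waitForCompletion(() -> JobState.IN_PROGRESS, 200, 50)); // prints "false"
    System.out.println(waitForCompletion(() -> JobState.COMPLETED, 200, 50));   // prints "true"
  }
}
```

On timeout the caller still has to decide what cleanup is safe; the point of the sketch is only that the wait terminates.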



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (GOBBLIN-318) Gobblin Helix Jobs Hang Indefinitely

2017-12-11 Thread Joel Baranick (JIRA)

[ 
https://issues.apache.org/jira/browse/GOBBLIN-318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16286602#comment-16286602
 ] 

Joel Baranick commented on GOBBLIN-318:
---

Another piece of info.  All tasks are marked as completed in the Gobblin DB, 
but when I look at 
https://zookeeper/node?path=/ROOT/CLUSTER/PROPERTYSTORE/TaskRebalancer/JOB_NAME_job_JOB_NAME_1512924480001/Context, 
there are multiple tasks still marked as running:

{code:java}
{
  "id":"TaskContext"
  ,"simpleFields":{
"START_TIME":"1512924491039"
  }
  ,"listFields":{
  }
  ,"mapFields":{
"0":{
  "ASSIGNED_PARTICIPANT":"worker-1"
  ,"FINISH_TIME":"1512924700877"
  ,"INFO":"completed tasks: 1"
  ,"NUM_ATTEMPTS":"1"
  ,"START_TIME":"1512924491044"
  ,"STATE":"COMPLETED"
  ,"TASK_ID":"124a2e88-90e3-40e8-add6-94b59ee30133"
}
,"1":{
  "ASSIGNED_PARTICIPANT":"worker-2"
  ,"FINISH_TIME":"1512924701120"
  ,"INFO":"completed tasks: 1"
  ,"NUM_ATTEMPTS":"1"
  ,"START_TIME":"1512924491044"
  ,"STATE":"COMPLETED"
  ,"TASK_ID":"9d7c2369-d6d9-4c2f-8bf3-1bcea0a47fdf"
}
,"2":{
  "ASSIGNED_PARTICIPANT":"worker-3"
  ,"FINISH_TIME":"1512924695451"
  ,"INFO":"completed tasks: 1"
  ,"NUM_ATTEMPTS":"1"
  ,"START_TIME":"1512924491044"
  ,"STATE":"COMPLETED"
  ,"TASK_ID":"19545764-e2bf-48b6-9942-361c834790cf"
}
,"3":{
  "ASSIGNED_PARTICIPANT":"worker-4"
  ,"FINISH_TIME":"1512924776614"
  ,"INFO":"completed tasks: 1"
  ,"NUM_ATTEMPTS":"1"
  ,"START_TIME":"1512924491044"
  ,"STATE":"COMPLETED"
  ,"TASK_ID":"3f59431f-2415-477a-8008-26a3eb258129"
}
,"4":{
  "ASSIGNED_PARTICIPANT":"worker-5"
  ,"FINISH_TIME":"1512924731962"
  ,"INFO":"completed tasks: 1"
  ,"NUM_ATTEMPTS":"1"
  ,"START_TIME":"1512924491044"
  ,"STATE":"COMPLETED"
  ,"TASK_ID":"19863633-6ed3-49d4-a07f-2130eec15dd3"
}
,"5":{
  "ASSIGNED_PARTICIPANT":"worker-6"
  ,"INFO":""
  ,"START_TIME":"1512924491044"
  ,"STATE":"RUNNING"
  ,"TASK_ID":"433c0107-0919-428a-b7c5-6e8925df7dac"
}
,"6":{
  "ASSIGNED_PARTICIPANT":"worker-7"
  ,"INFO":""
  ,"START_TIME":"1512924491044"
  ,"STATE":"RUNNING"
  ,"TASK_ID":"89a63cfd-efb4-44ce-a08b-68678d792e25"
}
,"7":{
  "ASSIGNED_PARTICIPANT":"worker-8"
  ,"FINISH_TIME":"1512924524111"
  ,"INFO":"completed tasks: 1"
  ,"NUM_ATTEMPTS":"1"
  ,"START_TIME":"1512924491044"
  ,"STATE":"COMPLETED"
  ,"TASK_ID":"a133db13-3f28-49af-8e3d-1d6fa81f6247"
}
,"8":{
  "ASSIGNED_PARTICIPANT":"worker-9"
  ,"INFO":""
  ,"START_TIME":"1512924491044"
  ,"STATE":"RUNNING"
  ,"TASK_ID":"7bbda2ef-68da-4f11-b217-89c3cd7d7a2e"
}
,"9":{
  "ASSIGNED_PARTICIPANT":"worker-10"
  ,"INFO":""
  ,"START_TIME":"1512924491044"
  ,"STATE":"RUNNING"
  ,"TASK_ID":"8407cb27-4b26-4786-91f2-ad920b1e2343"
}
  }
}
{code}


> Gobblin Helix Jobs Hang Indefinitely 
> -
>
> Key: GOBBLIN-318
> URL: https://issues.apache.org/jira/browse/GOBBLIN-318
> Project: Apache Gobblin
>  Issue Type: Bug
>Reporter: Joel Baranick
>Priority: Critical
>

[jira] [Commented] (GOBBLIN-318) Gobblin Helix Jobs Hang Indefinitely

2017-12-11 Thread Joel Baranick (JIRA)

[ 
https://issues.apache.org/jira/browse/GOBBLIN-318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16286467#comment-16286467
 ] 

Joel Baranick commented on GOBBLIN-318:
---

To summarize the issue:
# The job is running.
# The job lock is still held.
# All tasks have completed successfully and written their task state files.
# The job has consumed all the task state files and updated the Gobblin job 
and task state database.
# The Helix state in Zookeeper is missing or not in a terminal state.
# The job keeps polling the state at 
"/mycluster/PROPERTYSTORE/TaskRebalancer/myjob/Context"
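The stuck condition summarized above is: Gobblin-side completion without a terminal Helix state. A minimal, hypothetical detector for that condition might look like the following plain-Java sketch (the enum and method names are stand-ins, not Gobblin or Helix APIs):

```java
import java.util.Optional;

public class StuckJobDetector {
  // Stand-in for org.apache.helix.task.TaskState.
  enum HelixState { IN_PROGRESS, COMPLETED, FAILED, STOPPED }

  static boolean isTerminal(HelixState s) {
    return s == HelixState.COMPLETED || s == HelixState.FAILED || s == HelixState.STOPPED;
  }

  /**
   * A job is considered stuck when Gobblin has collected every task state
   * yet Helix reports no state at all (missing Context znode, modeled as an
   * empty Optional) or a non-terminal one.
   */
  static boolean isStuck(int collectedTaskStates, int expectedTaskStates,
                         Optional<HelixState> helixState) {
    boolean gobblinDone = collectedTaskStates >= expectedTaskStates;
    boolean helixDone = helixState.map(StuckJobDetector::isTerminal).orElse(false);
    return gobblinDone && !helixDone;
  }

  public static void main(String[] args) {
    System.out.println(isStuck(10, 10, Optional.of(HelixState.IN_PROGRESS))); // prints "true"
    System.out.println(isStuck(10, 10, Optional.of(HelixState.COMPLETED)));   // prints "false"
  }
}
```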

> Gobblin Helix Jobs Hang Indefinitely 
> -
>
> Key: GOBBLIN-318
> URL: https://issues.apache.org/jira/browse/GOBBLIN-318
> Project: Apache Gobblin
>  Issue Type: Bug
>Reporter: Joel Baranick
>Priority: Critical
>





[jira] [Comment Edited] (GOBBLIN-318) Gobblin Helix Jobs Hang Indefinitely

2017-12-11 Thread Joel Baranick (JIRA)

[ 
https://issues.apache.org/jira/browse/GOBBLIN-318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16286433#comment-16286433
 ] 

Joel Baranick edited comment on GOBBLIN-318 at 12/11/17 7:36 PM:
-

[~abti] Job timeouts will help.  That said, the underlying issue still needs to 
be fixed.

A couple more pieces of info that might help figure out what is going on.
# We write the task state to EFS, so it isn't an S3 eventual consistency issue.
# The TaskStateCollectorService recognizes that the tasks are done. I know that 
because the sum of the completed task counts from "Collected task state of %d 
completed tasks" equals the task count for the job.
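That log-based check can be sketched as follows. The log line format is taken from the comment above, but the parsing code itself is illustrative, not part of Gobblin:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class CollectedCount {
  // Matches the "Collected task state of %d completed tasks" lines quoted above.
  private static final Pattern LINE =
      Pattern.compile("Collected task state of (\\d+) completed tasks");

  /** Sums the completed-task counts reported across the launcher's log lines. */
  static int totalCollected(Iterable<String> logLines) {
    int total = 0;
    for (String line : logLines) {
      Matcher m = LINE.matcher(line);
      if (m.find()) {
        total += Integer.parseInt(m.group(1));
      }
    }
    return total;
  }

  public static void main(String[] args) {
    int total = totalCollected(java.util.List.of(
        "Collected task state of 4 completed tasks",
        "Collected task state of 6 completed tasks",
        "unrelated line"));
    // If this total equals the job's task count, the collector saw every task.
    System.out.println(total); // prints "10"
  }
}
```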



was (Author: jbaranick):
[~abti] Job timeouts will help.  That said, the underlying issue of the 
TaskCollectorService missing task states should be resolved.

One other piece of info.  We write the task state to EFS, so it isn't an S3 
eventual consistency issue.

> Gobblin Helix Jobs Hang Indefinitely 
> -
>
> Key: GOBBLIN-318
> URL: https://issues.apache.org/jira/browse/GOBBLIN-318
> Project: Apache Gobblin
>  Issue Type: Bug
>Reporter: Joel Baranick
>Priority: Critical
>





[jira] [Resolved] (GOBBLIN-321) CSV to HDFS ISSUE

2017-12-11 Thread Joel Baranick (JIRA)

 [ 
https://issues.apache.org/jira/browse/GOBBLIN-321?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joel Baranick resolved GOBBLIN-321.
---
Resolution: Not A Problem

> CSV to HDFS ISSUE
> -
>
> Key: GOBBLIN-321
> URL: https://issues.apache.org/jira/browse/GOBBLIN-321
> Project: Apache Gobblin
>  Issue Type: Bug
>Reporter: Azmal Sheik
>Assignee: Joel Baranick
>Priority: Critical
>  Labels: beginner, newbie, starter
> Attachments: gobblin-current.log, job.txt
>
>
> I was trying to load CSV file data to HDFS with the job conf below, but I'm 
> facing a class-not-found error. I have checked that the class 
> TextFileBasedSource is present in lib/gobblin-core.jar, but it still says 
> class not found.
> Can anyone help here?
> Here are the job conf and logs.
> *JOB:*
> job.name=json-gobblin-hdfs
> job.group=Gobblin-Json-Demo
> job.description=Publishing JSON data from files to HDFS in Avro format.
> job.jars=/home/ndxmetadata/Ravi/Gobblin/gobblin-dist/lib/
> job.lock.enabled=false
> distcp.persist.dir=/home/ndxmetadata/Ravi/Gobblin/gobblin-dist/
> source.class=gobblin.source.extractor.filebased.TextFileBasedSource
> converter.classes="gobblin.converter.StringSchemaInjector,gobblin.converter.csv.CsvToJsonConverter,gobblin.converter.avro.JsonIntermediateToAvroConverter"
> writer.builder.class=gobblin.writer.AvroDataWriterBuilder
> source.entity=
> source.filebased.data.directory=file://home/ndxmetadata/Ravi/Gobblin/sample
> gobblin.converter.schemaInjector.schema=SCHEMA
> converter.csv.to.json.delimiter=","
> extract.table.name=CsvToAvro
> extract.namespace=gobblin.example
> extract.table.type=APPEND_ONLY
> source.schema={"namespace":"example.avro", "type":"record", "name":"User", 
> "fields":[{"name":"name", "type":"string"}, {"name":"favorite_number",  
> "type":"int"}, {#"name":"favorite_color", "type":"string"}]}
> gobblin.converter.schemaInjector.schema=SCHEMA
> converter.csv.to.json.delimiter=","
> qualitychecker.task.policies=gobblin.policies.count.RowCountPolicy,gobblin.policies.schema.SchemaCompatibilityPolicy
> qualitychecker.task.policy.types=OPTIONAL,OPTIONAL
> qualitychecker.row.policies=gobblin.policies.schema.SchemaRowCheckPolicy
> qualitychecker.row.policy.types=OPTIONAL
> data.publisher.type=gobblin.publisher.BaseDataPublisher
> writer.destination.type=HDFS
> writer.output.format=AVRO
> fs.uri=hdfs://:8020/
> writer.fs.uri=hdfs://...:8020/
> state.store.fs.uri=hdfs://:8020/
> mr.job.root.dir=/user/ndxmetadata/output/working
> state.store.dir=/user/ndxmetadata/output/state-store
> writer.staging.dir=/user/ndxmetadata/output/task-staging
> writer.output.dir=/user/ndxmetadata/output/task-output
> data.publisher.final.dir=/user/ndxmetadata/output/
> ---
> Logs are attached below





[jira] [Comment Edited] (GOBBLIN-318) Gobblin Helix Jobs Hang Indefinitely

2017-12-11 Thread Joel Baranick (JIRA)

[ 
https://issues.apache.org/jira/browse/GOBBLIN-318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16286433#comment-16286433
 ] 

Joel Baranick edited comment on GOBBLIN-318 at 12/11/17 7:29 PM:
-

[~abti] Job timeouts will help.  That said, the underlying issue of the 
TaskCollectorService missing task states should be resolved.

One other piece of info.  We write the task state to EFS, so it isn't an S3 
eventual consistency issue.


was (Author: jbaranick):
[~abti] Job timeouts will help.  That said, the underlying issue of the 
TaskCollectorService missing task states should be resolved.

> Gobblin Helix Jobs Hang Indefinitely 
> -
>
> Key: GOBBLIN-318
> URL: https://issues.apache.org/jira/browse/GOBBLIN-318
> Project: Apache Gobblin
>  Issue Type: Bug
>Reporter: Joel Baranick
>Priority: Critical
>





[jira] [Resolved] (GOBBLIN-207) Gobblin AWS requires job package to be publicly accessible

2017-11-23 Thread Joel Baranick (JIRA)

 [ 
https://issues.apache.org/jira/browse/GOBBLIN-207?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joel Baranick resolved GOBBLIN-207.
---
Resolution: Not A Bug

> Gobblin AWS requires job package to be publicly accessible
> --
>
> Key: GOBBLIN-207
> URL: https://issues.apache.org/jira/browse/GOBBLIN-207
> Project: Apache Gobblin
>  Issue Type: Bug
>Reporter: Joel Baranick
>Assignee: Joel Baranick
>
> {{GobblinAwsJobConfigurationManager}} expects that the job configuration file 
> is publicly accessible so that it can be downloaded.  This PR changes how the 
> download is done, using Hadoop FS, so that the job package can be stored on 
> filesystems that don't expose it over HTTP and so that authentication can be 
> performed.





[jira] [Reopened] (GOBBLIN-207) Gobblin AWS requires job package to be publicly accessible

2017-11-23 Thread Joel Baranick (JIRA)

 [ 
https://issues.apache.org/jira/browse/GOBBLIN-207?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joel Baranick reopened GOBBLIN-207:
---

> Gobblin AWS requires job package to be publicly accessible
> --
>
> Key: GOBBLIN-207
> URL: https://issues.apache.org/jira/browse/GOBBLIN-207
> Project: Apache Gobblin
>  Issue Type: Bug
>Reporter: Joel Baranick
>Assignee: Joel Baranick
>





[jira] [Commented] (GOBBLIN-321) CSV to HDFS ISSUE

2017-11-23 Thread Joel Baranick (JIRA)

[ 
https://issues.apache.org/jira/browse/GOBBLIN-321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16264079#comment-16264079
 ] 

Joel Baranick commented on GOBBLIN-321:
---

It is the location on the filesystem specified by {{source.filebased.fs.uri}}

> CSV to HDFS ISSUE
> -
>
> Key: GOBBLIN-321
> URL: https://issues.apache.org/jira/browse/GOBBLIN-321
> Project: Apache Gobblin
>  Issue Type: Bug
>Reporter: Azmal Sheik
>Assignee: Joel Baranick
>Priority: Critical
>  Labels: beginner, newbie, starter
> Attachments: gobblin-current.log, job.txt
>
>





[jira] [Commented] (GOBBLIN-321) CSV to HDFS ISSUE

2017-11-23 Thread Joel Baranick (JIRA)

[ 
https://issues.apache.org/jira/browse/GOBBLIN-321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16264060#comment-16264060
 ] 

Joel Baranick commented on GOBBLIN-321:
---

Check your logs for {{Running ls command with input}}.  Does the path listed 
there make sense?  It gets built up by combining 
{{source.filebased.data.directory}} with {{source.entity}}.
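A rough illustration of that composition (the real source uses Hadoop's Path; this plain-string join is an assumption for illustration). An empty {{source.entity}} leaves the directory value unchanged, which is why a malformed {{source.filebased.data.directory}} flows straight into the ls call:

```java
public class LsPath {
  /**
   * Illustrates how the "Running ls command with input" path is assumed to
   * be composed: the data directory joined with the entity name.
   */
  static String lsInput(String dataDirectory, String entity) {
    if (entity == null || entity.isEmpty()) {
      return dataDirectory;  // empty source.entity: list the directory itself
    }
    return dataDirectory.endsWith("/")
        ? dataDirectory + entity
        : dataDirectory + "/" + entity;
  }

  public static void main(String[] args) {
    System.out.println(lsInput("/input", "users"));  // prints "/input/users"
    System.out.println(lsInput("/input", ""));       // prints "/input"
  }
}
```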

> CSV to HDFS ISSUE
> -
>
> Key: GOBBLIN-321
> URL: https://issues.apache.org/jira/browse/GOBBLIN-321
> Project: Apache Gobblin
>  Issue Type: Bug
>Reporter: Azmal Sheik
>Assignee: Joel Baranick
>Priority: Critical
>  Labels: beginner, newbie, starter
> Attachments: gobblin-current.log, job.txt
>
>





[jira] [Commented] (GOBBLIN-321) CSV to HDFS ISSUE

2017-11-23 Thread Joel Baranick (JIRA)

[ 
https://issues.apache.org/jira/browse/GOBBLIN-321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16264043#comment-16264043
 ] 

Joel Baranick commented on GOBBLIN-321:
---

{{source.filebased.data.directory}} should be a path, not a URI (e.g. {{/input}})

> CSV to HDFS ISSUE
> -
>
> Key: GOBBLIN-321
> URL: https://issues.apache.org/jira/browse/GOBBLIN-321
> Project: Apache Gobblin
>  Issue Type: Bug
>Reporter: Azmal Sheik
>Assignee: Joel Baranick
>Priority: Critical
>  Labels: beginner, newbie, starter
> Attachments: gobblin-current.log, job.txt
>
>





[jira] [Commented] (GOBBLIN-321) CSV to HDFS ISSUE

2017-11-22 Thread Joel Baranick (JIRA)

[ 
https://issues.apache.org/jira/browse/GOBBLIN-321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16263904#comment-16263904
 ] 

Joel Baranick commented on GOBBLIN-321:
---

Well, if you are using <= 0.11.0, I would stick to the {{gobblin.}} namespaces. 
 Also, I'd pick either 0.10.0 or 0.11.0 and try with that.

> CSV to HDFS ISSUE
> -
>
> Key: GOBBLIN-321
> URL: https://issues.apache.org/jira/browse/GOBBLIN-321
> Project: Apache Gobblin
>  Issue Type: Bug
>Reporter: Azmal Sheik
>Assignee: Joel Baranick
>Priority: Critical
>  Labels: beginner, newbie, starter
> Attachments: gobblin-current.log, job.txt
>
>





[jira] [Commented] (GOBBLIN-187) Gobblin Helix doesn't clean up `.job.state` files, causing unbounded disk usage

2017-11-22 Thread Joel Baranick (JIRA)

[ 
https://issues.apache.org/jira/browse/GOBBLIN-187?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16263899#comment-16263899
 ] 

Joel Baranick commented on GOBBLIN-187:
---

[~abti] Any ideas here?  This ends up causing our EFS to keep growing, 
incurring more cost.

> Gobblin Helix doesn't clean up `.job.state` files, causing unbounded disk 
> usage
> ---
>
> Key: GOBBLIN-187
> URL: https://issues.apache.org/jira/browse/GOBBLIN-187
> Project: Apache Gobblin
>  Issue Type: Bug
>Reporter: Joel Baranick
>
> When Gobblin is running, the `GobblinHelixJobLauncher.createJob` method 
> writes the job state to a `.job.state` file.  Nothing cleans up these files.  
> The result is unbounded disk usage.  `.job.state` files should be deleted at 
> the completion of jobs.
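The quoted issue asks for `.job.state` files to be deleted once a job finishes. A minimal sketch of that cleanup, assuming a hypothetical layout where the file is named `<jobId>.job.state` under the job's work directory (the class and method names below are illustrative, not Gobblin code):

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

public class JobStateCleanup {
    // Delete the job's .job.state file once the job reaches a terminal state.
    // Returns true if a file was actually removed, false if none existed.
    public static boolean deleteJobStateFile(Path workDir, String jobId) throws IOException {
        return Files.deleteIfExists(workDir.resolve(jobId + ".job.state"));
    }
}
```

Hooking a call like this into the job-completion path would bound the disk usage described above.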





[jira] [Commented] (GOBBLIN-321) CSV to HDFS ISSUE

2017-11-22 Thread Joel Baranick (JIRA)

[ 
https://issues.apache.org/jira/browse/GOBBLIN-321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16263897#comment-16263897
 ] 

Joel Baranick commented on GOBBLIN-321:
---

[~sheik5azmal] Where did you see to use the apache qualified namespace?  Maybe 
some documentation is guiding people astray during this transition period.

> CSV to HDFS ISSUE
> -
>
> Key: GOBBLIN-321
> URL: https://issues.apache.org/jira/browse/GOBBLIN-321
> Project: Apache Gobblin
>  Issue Type: Bug
>Reporter: Azmal Sheik
>Assignee: Joel Baranick
>Priority: Critical
>  Labels: beginner, newbie, starter
> Attachments: gobblin-current.log, job.txt
>
>





[jira] [Assigned] (GOBBLIN-321) CSV to HDFS ISSUE

2017-11-22 Thread Joel Baranick (JIRA)

 [ 
https://issues.apache.org/jira/browse/GOBBLIN-321?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joel Baranick reassigned GOBBLIN-321:
-

Assignee: Joel Baranick

> CSV to HDFS ISSUE
> -
>
> Key: GOBBLIN-321
> URL: https://issues.apache.org/jira/browse/GOBBLIN-321
> Project: Apache Gobblin
>  Issue Type: Bug
>Reporter: Azmal Sheik
>Assignee: Joel Baranick
>Priority: Critical
>  Labels: beginner, newbie, starter
> Attachments: gobblin-current.log, job.txt
>
>





[jira] [Commented] (GOBBLIN-321) CSV to HDFS ISSUE

2017-11-22 Thread Joel Baranick (JIRA)

[ 
https://issues.apache.org/jira/browse/GOBBLIN-321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16263895#comment-16263895
 ] 

Joel Baranick commented on GOBBLIN-321:
---

From your logs, the class you are loading is 
{{org.apache.gobblin.source.extractor.filebased.TextFileBasedSource}}; 0.11.0 
doesn't use the apache namespaces. Compare 
[0.11.0|https://github.com/apache/incubator-gobblin/blob/gobblin_0.11.0/gobblin-core/src/main/java/gobblin/source/extractor/filebased/TextFileBasedSource.java]
 to 
[master|https://github.com/apache/incubator-gobblin/blob/master/gobblin-core/src/main/java/org/apache/gobblin/source/extractor/filebased/TextFileBasedSource.java]

> CSV to HDFS ISSUE
> -
>
> Key: GOBBLIN-321
> URL: https://issues.apache.org/jira/browse/GOBBLIN-321
> Project: Apache Gobblin
>  Issue Type: Bug
>Reporter: Azmal Sheik
>Priority: Critical
>  Labels: beginner, newbie, starter
> Attachments: gobblin-current.log, job.txt
>
>





[jira] [Comment Edited] (GOBBLIN-321) CSV to HDFS ISSUE

2017-11-22 Thread Joel Baranick (JIRA)

[ 
https://issues.apache.org/jira/browse/GOBBLIN-321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16263895#comment-16263895
 ] 

Joel Baranick edited comment on GOBBLIN-321 at 11/23/17 7:07 AM:
-

From your logs, the class you are loading is 
{{org.apache.gobblin.source.extractor.filebased.TextFileBasedSource}}; 0.11.0 
doesn't use the apache namespaces. Compare 
[0.11.0|https://github.com/apache/incubator-gobblin/blob/gobblin_0.11.0/gobblin-core/src/main/java/gobblin/source/extractor/filebased/TextFileBasedSource.java]
 to 
[master|https://github.com/apache/incubator-gobblin/blob/master/gobblin-core/src/main/java/org/apache/gobblin/source/extractor/filebased/TextFileBasedSource.java].
 You will see that the namespaces in master are all prefixed with 
{{org.apache.}} because Gobblin was adopted as an Apache incubator 
project.  The last pre-incubator release is 0.11.0.


was (Author: jbaranick):
From your logs, the class you are loading is 
{{org.apache.gobblin.source.extractor.filebased.TextFileBasedSource}}, 0.11.0 
doesn't use the apache namespaces. Compare 
[0.11.0|https://github.com/apache/incubator-gobblin/blob/gobblin_0.11.0/gobblin-core/src/main/java/gobblin/source/extractor/filebased/TextFileBasedSource.java]
 to 
[master|https://github.com/apache/incubator-gobblin/blob/master/gobblin-core/src/main/java/org/apache/gobblin/source/extractor/filebased/TextFileBasedSource.java]

> CSV to HDFS ISSUE
> -
>
> Key: GOBBLIN-321
> URL: https://issues.apache.org/jira/browse/GOBBLIN-321
> Project: Apache Gobblin
>  Issue Type: Bug
>Reporter: Azmal Sheik
>Priority: Critical
>  Labels: beginner, newbie, starter
> Attachments: gobblin-current.log, job.txt
>
>





[jira] [Commented] (GOBBLIN-321) CSV to HDFS ISSUE

2017-11-22 Thread Joel Baranick (JIRA)

[ 
https://issues.apache.org/jira/browse/GOBBLIN-321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16263891#comment-16263891
 ] 

Joel Baranick commented on GOBBLIN-321:
---

What version of gobblin?

> CSV to HDFS ISSUE
> -
>
> Key: GOBBLIN-321
> URL: https://issues.apache.org/jira/browse/GOBBLIN-321
> Project: Apache Gobblin
>  Issue Type: Bug
>Reporter: Azmal Sheik
>Priority: Critical
>  Labels: beginner, newbie, starter
> Attachments: gobblin-current.log, job.txt
>
>





[jira] [Commented] (GOBBLIN-318) Gobblin Helix Jobs Hang Indefinitely

2017-11-17 Thread Joel Baranick (JIRA)

[ 
https://issues.apache.org/jira/browse/GOBBLIN-318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16256707#comment-16256707
 ] 

Joel Baranick commented on GOBBLIN-318:
---

There is one other manual way to recover from this hang.  Modify the state at 
{{/mycluster/PROPERTYSTORE/TaskRebalancer/myjob/Context}}, adding an entry for 
the hung job with a terminal state.  For instance, modify:
{code:javascript}
{
  "id": "WorkflowContext",
  "simpleFields": {
"START_TIME": "1505159715449",
"STATE": "IN_PROGRESS"
  },
  "listFields": {},
  "mapFields": {
"JOB_STATES": {
  "jobname_job_jobname_150741571": "COMPLETED",
  "jobname_job_jobname_150775680": "COMPLETED",
  "jobname_job_jobname_150795931": "COMPLETED",
  "jobname_job_jobname_1509857102910": "COMPLETED",
  "jobname_job_jobname_1510253708033": "COMPLETED",
  "jobname_job_jobname_1510271102898": "COMPLETED",
  "jobname_job_jobname_1510852210668": "COMPLETED",
  "jobname_job_jobname_1510853133675": "COMPLETED"
}
  }
}
{code}

Add {{"jobname_job_jobname_1510884004834": "COMPLETED"}} to {{JOB_STATES}} 
(don't forget the comma).

The updated JSON will look like:

{code:javascript}
{
  "id": "WorkflowContext",
  "simpleFields": {
"START_TIME": "1505159715449",
"STATE": "IN_PROGRESS"
  },
  "listFields": {},
  "mapFields": {
"JOB_STATES": {
  "jobname_job_jobname_150741571": "COMPLETED",
  "jobname_job_jobname_150775680": "COMPLETED",
  "jobname_job_jobname_150795931": "COMPLETED",
  "jobname_job_jobname_1509857102910": "COMPLETED",
  "jobname_job_jobname_1510253708033": "COMPLETED",
  "jobname_job_jobname_1510271102898": "COMPLETED",
  "jobname_job_jobname_1510852210668": "COMPLETED",
  "jobname_job_jobname_1510853133675": "COMPLETED",
  "jobname_job_jobname_1510884004834": "COMPLETED"
}
  }
}
{code}

This will allow Gobblin to detect that the job is done and finish its 
execution.

I'm not sure if there are any other implications of doing this.
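As a hedged illustration of why this manual edit works (class and method names below are illustrative, not Gobblin code): the launcher's poll loop exits only when {{JOB_STATES}} maps the job to a terminal Helix state, so adding a {{COMPLETED}} entry for the hung job satisfies that check.

```java
import java.util.Map;
import java.util.Set;

public class WorkflowStateCheck {
    // Terminal Helix task states, matching the ones the launcher's
    // waitForJobCompletion() loop tests for.
    private static final Set<String> TERMINAL = Set.of("COMPLETED", "FAILED", "STOPPED");

    // True once JOB_STATES maps the job to a terminal state; a missing
    // entry (the hung-job case) keeps the poll loop spinning forever.
    public static boolean isTerminal(Map<String, String> jobStates, String jobId) {
        String state = jobStates.get(jobId);
        return state != null && TERMINAL.contains(state);
    }
}
```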

> Gobblin Helix Jobs Hang Indefinitely 
> -
>
> Key: GOBBLIN-318
> URL: https://issues.apache.org/jira/browse/GOBBLIN-318
> Project: Apache Gobblin
>  Issue Type: Bug
>Reporter: Joel Baranick
>Priority: Critical
>
> In some cases, gobblin helix jobs can hang indefinitely.  When coupled with 
> job locks, this can result in a job becoming stuck and not progressing.  The 
> only solution currently is to restart the master node.
> Assume the following is for a {{job_myjob_1510884004834}} and which hung at 
> 2017-11-17 02:09:00 UTC and was still hung at 2017-11-17 09:12:00 UTC. 
> {{GobblinHelixJobLauncher.waitForJobCompletion()}} is never detecting the job 
> as completed. This results in the {{TaskStateCollectorService}} indefinitely 
> searching for more task states, even though it has processed all the task 
> states that are ever going to be produced.  There is no reference to the hung 
> job in Zookeeper at {{/mycluster/CONFIGS/RESOURCE}}.  In the Helix Web Admin, 
> the hung job doesn't exist at {{/clusters/mycluster/jobQueues/jobname}}. 
> There is no record of the job in Zookeeper at 
> {{/mycluster/PROPERTYSTORE/TaskRebalancer/jobname/Context}}.  This means that 
> the {{GobblinHelixJobLauncher.waitForJobCompletion()}} code fails.
> {code:java}
> private void waitForJobCompletion() throws InterruptedException {
> while (true) {
>   WorkflowContext workflowContext = 
> TaskDriver.getWorkflowContext(this.helixManager, this.helixQueueName);
>   if (workflowContext != null) {
> org.apache.helix.task.TaskState helixJobState = 
> workflowContext.getJobState(this.jobResourceName);
> if (helixJobState == org.apache.helix.task.TaskState.COMPLETED ||
> helixJobState == org.apache.helix.task.TaskState.FAILED ||
> helixJobState == org.apache.helix.task.TaskState.STOPPED) {
>   return;
> }
>   }
>   Thread.sleep(1000);
> }
>   }
> {code}
> The code gets the job state from Zookeeper:
> {code:javascript}
> {
>   "id": "WorkflowContext",
>   "simpleFields": {
> "START_TIME": "1505159715449",
> "STATE": "IN_PROGRESS"
>   },
>   "listFields": {},
>   "mapFields": {
> "JOB_STATES": {
>   "jobname_job_jobname_150741571": "COMPLETED",
>   "jobname_job_jobname_150775680": "COMPLETED",
>   "jobname_job_jobname_150795931": "COMPLETED",
>   "jobname_job_jobname_1509857102910": "COMPLETED",
>   "jobname_job_jobname_1510253708033": "COMPLETED",
>   "jobname_job_jobname_1510271102898": "COMPLETED",
>   "jobname_job_jobname_1510852210668": "COMPLETED",
>   "jobname_job_jobname_1510853133675": "COMPLETED"
> }
>   }
> }
> {code}
> But there is no information contained in the job state 

[jira] [Assigned] (GOBBLIN-311) Gobblin AWS runs old jobs when cluster is restarted.

2017-11-17 Thread Joel Baranick (JIRA)

 [ 
https://issues.apache.org/jira/browse/GOBBLIN-311?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joel Baranick reassigned GOBBLIN-311:
-

Assignee: (was: Hung Tran)

> Gobblin AWS runs old jobs when cluster is restarted.
> 
>
> Key: GOBBLIN-311
> URL: https://issues.apache.org/jira/browse/GOBBLIN-311
> Project: Apache Gobblin
>  Issue Type: Bug
>Reporter: Joel Baranick
>
> On startup of my cluster, old jobs are still attempted. @htran1 said that 
> they should be cleaned up in Standalone mode, but that does not seem 
> compatible with running under AWS: 
> [http://gobblin.readthedocs.io/en/latest/user-guide/Gobblin-Deployment/#standalone-architecture]
> Also, if I enabled Standalone mode, then 
> {{GobblinClusterManager.sendShutdownRequest()}} won't be called. 
> Additionally, when enabling Standalone mode, GobblinClusterManager will call 
> the following code, which doesn't seem right if I'm running under AWS:
> {code:java}
>  // In AWS / Yarn mode, the cluster Launcher takes care of setting up Helix 
> cluster
> /// .. but for Standalone mode, we go via this main() method, so setup the 
> cluster here
> if (isStandaloneClusterManager) {
> // Create Helix cluster and connect to it
> String zkConnectionString = 
> config.getString(GobblinClusterConfigurationKeys.ZK_CONNECTION_STRING_KEY);
> String helixClusterName = 
> config.getString(GobblinClusterConfigurationKeys.HELIX_CLUSTER_NAME_KEY);
> HelixUtils.createGobblinHelixCluster(zkConnectionString, 
> helixClusterName, false);
> LOGGER.info("Created Helix cluster " + helixClusterName);
> }
> {code}
> Thoughts?





[jira] [Created] (GOBBLIN-318) Gobblin Helix Jobs Hang Indefinitely

2017-11-17 Thread Joel Baranick (JIRA)
Joel Baranick created GOBBLIN-318:
-

 Summary: Gobblin Helix Jobs Hang Indefinitely 
 Key: GOBBLIN-318
 URL: https://issues.apache.org/jira/browse/GOBBLIN-318
 Project: Apache Gobblin
  Issue Type: Bug
Reporter: Joel Baranick
Priority: Critical


In some cases, Gobblin Helix jobs can hang indefinitely.  When coupled with job 
locks, this can result in a job becoming stuck and not progressing.  The only 
solution currently is to restart the master node.

Assume the following is for {{job_myjob_1510884004834}}, which hung at 
2017-11-17 02:09:00 UTC and was still hung at 2017-11-17 09:12:00 UTC. 
{{GobblinHelixJobLauncher.waitForJobCompletion()}} is never detecting the job 
as completed. This results in the {{TaskStateCollectorService}} indefinitely 
searching for more task states, even though it has processed all the task 
states that are ever going to be produced.  There is no reference to the hung 
job in Zookeeper at {{/mycluster/CONFIGS/RESOURCE}}.  In the Helix Web Admin, 
the hung job doesn't exist at {{/clusters/mycluster/jobQueues/jobname}}. There 
is no record of the job in Zookeeper at 
{{/mycluster/PROPERTYSTORE/TaskRebalancer/jobname/Context}}.  This means that 
the {{GobblinHelixJobLauncher.waitForJobCompletion()}} code fails.
{code:java}
private void waitForJobCompletion() throws InterruptedException {
while (true) {
  WorkflowContext workflowContext = 
TaskDriver.getWorkflowContext(this.helixManager, this.helixQueueName);
  if (workflowContext != null) {
org.apache.helix.task.TaskState helixJobState = 
workflowContext.getJobState(this.jobResourceName);
if (helixJobState == org.apache.helix.task.TaskState.COMPLETED ||
helixJobState == org.apache.helix.task.TaskState.FAILED ||
helixJobState == org.apache.helix.task.TaskState.STOPPED) {
  return;
}
  }

  Thread.sleep(1000);
}
  }
{code}

The code gets the job state from Zookeeper:
{code:javascript}
{
  "id": "WorkflowContext",
  "simpleFields": {
"START_TIME": "1505159715449",
"STATE": "IN_PROGRESS"
  },
  "listFields": {},
  "mapFields": {
"JOB_STATES": {
  "jobname_job_jobname_150741571": "COMPLETED",
  "jobname_job_jobname_150775680": "COMPLETED",
  "jobname_job_jobname_150795931": "COMPLETED",
  "jobname_job_jobname_1509857102910": "COMPLETED",
  "jobname_job_jobname_1510253708033": "COMPLETED",
  "jobname_job_jobname_1510271102898": "COMPLETED",
  "jobname_job_jobname_1510852210668": "COMPLETED",
  "jobname_job_jobname_1510853133675": "COMPLETED"
}
  }
}
{code}

But there is no information contained in the job state for the hung job.

Also, it is really strange that the job states contained in that json blob are 
so old.  The oldest one is from 2017-10-7 10:35:00 PM UTC, more than a month 
ago.

I'm not sure how the system got in this state, but this isn't the first time we 
have seen this.  While it would be good to prevent this from happening, it 
would also be good to allow the system to recover if this state is entered.
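One recovery direction would be bounding the wait instead of looping forever. A minimal generic sketch (illustrative, not the Gobblin implementation; the timeout value would be a new configuration knob):

```java
import java.util.concurrent.TimeoutException;
import java.util.function.Supplier;

public class BoundedPoll {
    // Poll until the probe yields a non-null result or the deadline passes,
    // so a vanished Helix workflow surfaces as an exception instead of an
    // indefinite hang like the one described above.
    public static <T> T pollUntil(Supplier<T> probe, long timeoutMs, long intervalMs)
            throws InterruptedException, TimeoutException {
        long deadline = System.currentTimeMillis() + timeoutMs;
        while (System.currentTimeMillis() < deadline) {
            T result = probe.get();
            if (result != null) {
                return result;
            }
            Thread.sleep(intervalMs);
        }
        throw new TimeoutException("no terminal state within " + timeoutMs + " ms");
    }
}
```

In {{waitForJobCompletion()}}, the probe would return the job state when it is terminal and null otherwise.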





[jira] [Created] (GOBBLIN-316) gobblin.util.ImmutableProperties behavior is different from Properties

2017-11-16 Thread Joel Baranick (JIRA)
Joel Baranick created GOBBLIN-316:
-

 Summary: gobblin.util.ImmutableProperties behavior is different 
from Properties
 Key: GOBBLIN-316
 URL: https://issues.apache.org/jira/browse/GOBBLIN-316
 Project: Apache Gobblin
  Issue Type: Bug
Reporter: Joel Baranick


gobblin.util.ImmutableProperties uses Lombok's @Delegate annotation to delegate 
calls to the underlying Properties implementation.  Unfortunately, @Delegate 
isn't delegating calls to the underlying Hashtable.  This results in different 
behavior between Properties and ImmutableProperties.  For example, on 
Properties, .keys() and .keySet() return the same set of keys.  However, on 
ImmutableProperties, .keys() returns an empty enumeration and .keySet() returns 
all the keys.

Additionally, Lombok's @Delegate is likely to be removed in a future version of 
the library, as its maintainers are not pleased with it: 
https://projectlombok.org/features/experimental/Delegate
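The invariant that ImmutableProperties breaks can be expressed as a small check (a hypothetical regression-test helper, not Gobblin code): any Properties instance should report the same keys through the inherited Hashtable {{keys()}} enumeration and through {{keySet()}}.

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.Properties;
import java.util.Set;

public class PropertiesKeysCheck {
    // Compare the keys reported via the Hashtable keys() enumeration with
    // those reported via keySet(). For a plain Properties these always match;
    // the ImmutableProperties bug above makes them diverge.
    public static boolean keysConsistent(Properties props) {
        Set<Object> fromEnumeration = new HashSet<>(Collections.list(props.keys()));
        return fromEnumeration.equals(props.keySet());
    }
}
```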





[jira] [Created] (GOBBLIN-311) Gobblin AWS runs old jobs when cluster is restarted.

2017-11-09 Thread Joel Baranick (JIRA)
Joel Baranick created GOBBLIN-311:
-

 Summary: Gobblin AWS runs old jobs when cluster is restarted.
 Key: GOBBLIN-311
 URL: https://issues.apache.org/jira/browse/GOBBLIN-311
 Project: Apache Gobblin
  Issue Type: Bug
Reporter: Joel Baranick
Assignee: Hung Tran


On startup of my cluster, old jobs are still attempted. @htran1 said that they 
should be cleaned up in Standalone mode, but that does not seem compatible with 
running under AWS: 
[http://gobblin.readthedocs.io/en/latest/user-guide/Gobblin-Deployment/#standalone-architecture]
Also, if I enabled Standalone mode, then 
{{GobblinClusterManager.sendShutdownRequest()}} won't be called. Additionally, 
when enabling Standalone mode, GobblinClusterManager will call the following 
code, which doesn't seem right if I'm running under AWS:

{code:java}
// In AWS / Yarn mode, the cluster launcher takes care of setting up the Helix
// cluster, but for Standalone mode, we go via this main() method, so set up
// the cluster here.
if (isStandaloneClusterManager) {
  // Create Helix cluster and connect to it
  String zkConnectionString =
      config.getString(GobblinClusterConfigurationKeys.ZK_CONNECTION_STRING_KEY);
  String helixClusterName =
      config.getString(GobblinClusterConfigurationKeys.HELIX_CLUSTER_NAME_KEY);
  HelixUtils.createGobblinHelixCluster(zkConnectionString, helixClusterName, false);
  LOGGER.info("Created Helix cluster " + helixClusterName);
}
{code}

Thoughts?





[jira] [Updated] (GOBBLIN-227) JobLauncherUtils.cleanTaskStagingData fails for jobs with forks

2017-08-28 Thread Joel Baranick (JIRA)

 [ 
https://issues.apache.org/jira/browse/GOBBLIN-227?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joel Baranick updated GOBBLIN-227:
--
Description: 
*Precondition:* 
Using Hocon configuration with two forks configured.

*Summary:* 
When {{AbstractJobLauncher}} calls {{JobLauncherUtils.cleanTaskStagingData}}, it 
tries to look up {{writer.staging.dir}} in the configuration and fails.

*Details:*
Hocon configuration doesn't allow the following config:
{code:none}
writer.staging.dir=/foo
writer.staging.dir.0=/foo
writer.staging.dir.1=/foo
{code}
Initially {{writer.staging.dir}} is of type String, but when the Hocon parser 
encounters {{writer.staging.dir.0}}, it decides that {{writer.staging.dir}} is 
now of type Object, overwriting the prior value with {{_\{"0": "/foo"\}_}}.
The effective Hocon configuration is:

{code:javascript}
{
  "writer": {
"staging": {
  "dir": {
"0": "/foo",
"1": "/foo"
  }
}
  }
}
{code}

Fork-specific configuration uses the same config keys as regular configuration, 
except the fork number is appended, like {{.1}}.  The code that looks up 
fork-specific configuration doesn't automatically fall back to regular 
configuration.  For example, if the code is trying to find 
{{writer.staging.dir.0}} and it isn't configured, the job will fail.  This 
means that all forks must configure fork-specific versions of 
{{writer.staging.dir}}.
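A sketch of the fallback the lookup code currently lacks; the helper name and signature are hypothetical, not Gobblin's actual API:

```java
import java.util.Properties;

// Sketch of the missing behavior: resolve the fork-specific key first
// (e.g. writer.staging.dir.0), then fall back to the un-suffixed key
// instead of failing the job. Helper name and signature are hypothetical.
class ForkConfig {
  static String getForkProperty(Properties props, String key, int numBranches, int branchId) {
    if (numBranches > 1) {
      // Fork-specific lookup: the branch id is appended to the base key.
      String forkSpecific = props.getProperty(key + "." + branchId);
      if (forkSpecific != null) {
        return forkSpecific;
      }
    }
    // Fallback to the generic key shared by all forks.
    return props.getProperty(key);
  }
}
```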

When {{AbstractJobLauncher}} calls {{JobLauncherUtils.cleanTaskStagingData}}, it 
cleans up staging data based on the current job's configuration.  Because of 
this, {{fork.branches}} is always set to {{1}}.  The call to 
{{WriterUtils.getWriterStagingDir(state, numBranches, branchId)}} is made with 
{{numBranches=1}} and {{branchId=0}}.  This results in the method looking for 
{{writer.staging.dir}}.  Unfortunately, when using Hocon configuration, the 
value {{writer.staging.dir}} doesn't exist and the job fails.

  was:
*Precondition:* 
Using Hocon configuration and have two forks configured.

*Summary:* 
When {{AbstractJobLauncher}} calls {{JobLauncherUtils.cleanTaskStagingData}} it 
tries to lookup {{writer.staging.dir}} in the configuration and fails.

*Details:*
Hocon configuration doesn't allow the following config:
{code:none}
writer.staging.dir=/foo
writer.staging.dir.0=/foo
writer.staging.dir.1=/foo
{code}
Initially {{writer.staging.dir}} is of type String, but when the Hocon parser 
encounters {{writer.staging.dir.0}}, it decides that {{writer.staging.dir}} is 
now of type Object, overwriting the prior value with {{_\{"0": "/foo"\}_}}.
The effective Hocon configuration is:

{code:javascript}
{
  "writer": {
"staging": {
  "dir": {
"0": "/foo",
"1": "/foo"
  }
}
  }
}
{code}

Fork specific configuration uses the same config keys as regular configuration 
except the fork number is appended like: {{.1}}.  The code that looks up fork 
specific configuration doesn't automatically fallback to regular configuration. 
 For example, if the code is trying to find {{writer.staging.dir.0}} and it 
isn't configured, the job will fail.  Then means that all forks must configure 
fork specific versions of {{writer.staging.dir}}.

When {{AbstractJobLauncher}} calls {{JobLauncherUtils.cleanTaskStagingData}} it 
cleans up the based on the current job's configuration.  Because of this, 
{{fork.branches}} is always set to {{1}}. The call to 
{{WriterUtils.getWriterStagingDir(state, numBranches, branchId)}} is make with 
{{numBranches=1}} and {{branchId=0}}.  This results in the method looking for 
{{writer.staging.dir}}. Unfortunately, when using Hocon configuration the value 
{{writer.staging.dir}} doesn't exist and the job fails.


> JobLauncherUtils.cleanTaskStagingData fails for jobs with forks
> ---
>
> Key: GOBBLIN-227
> URL: https://issues.apache.org/jira/browse/GOBBLIN-227
> Project: Apache Gobblin
>  Issue Type: Bug
>Reporter: Joel Baranick
>
> *Precondition:* 
> Using Hocon configuration and have two forks configured.
> *Summary:* 
> When {{AbstractJobLauncher}} calls {{JobLauncherUtils.cleanTaskStagingData}} 
> it tries to lookup {{writer.staging.dir}} in the configuration and fails.
> *Details:*
> Hocon configuration doesn't allow the following config:
> {code:none}
> writer.staging.dir=/foo
> writer.staging.dir.0=/foo
> writer.staging.dir.1=/foo
> {code}
> Initially {{writer.staging.dir}} is of type String, but when the Hocon parser 
> encounters {{writer.staging.dir.0}}, it decides that {{writer.staging.dir}} 
> is now of type Object, overwriting the prior value with {{_\{"0": "/foo"\}_}}.
> The effective Hocon configuration is:
> {code:javascript}
> {
>   "writer": {
> "staging": {
>   "dir": {
> "0": "/foo",
> "1": "/foo"
>   }
> }
>   }
> }
> {code}
> Fork specific configuration 

[jira] [Created] (GOBBLIN-227) JobLauncherUtils.cleanTaskStagingData fails for jobs with forks

2017-08-28 Thread Joel Baranick (JIRA)
Joel Baranick created GOBBLIN-227:
-

 Summary: JobLauncherUtils.cleanTaskStagingData fails for jobs with 
forks
 Key: GOBBLIN-227
 URL: https://issues.apache.org/jira/browse/GOBBLIN-227
 Project: Apache Gobblin
  Issue Type: Bug
Reporter: Joel Baranick


*Precondition:* 
Using Hocon configuration with two forks configured.

*Summary:* 
When {{AbstractJobLauncher}} calls {{JobLauncherUtils.cleanTaskStagingData}}, it 
tries to look up {{writer.staging.dir}} in the configuration and fails.

*Details:*
Hocon configuration doesn't allow the following config:
{code:none}
writer.staging.dir=/foo
writer.staging.dir.0=/foo
writer.staging.dir.1=/foo
{code}
Initially {{writer.staging.dir}} is of type String, but when the Hocon parser 
encounters {{writer.staging.dir.0}}, it decides that {{writer.staging.dir}} is 
now of type Object, overwriting the prior value with {{_\{"0": "/foo"\}_}}.
The effective Hocon configuration is:

{code:javascript}
{
  "writer": {
"staging": {
  "dir": {
"0": "/foo",
"1": "/foo"
  }
}
  }
}
{code}

Fork-specific configuration uses the same config keys as regular configuration, 
except the fork number is appended, like {{.1}}.  The code that looks up 
fork-specific configuration doesn't automatically fall back to regular 
configuration.  For example, if the code is trying to find 
{{writer.staging.dir.0}} and it isn't configured, the job will fail.  This 
means that all forks must configure fork-specific versions of 
{{writer.staging.dir}}.

When {{AbstractJobLauncher}} calls {{JobLauncherUtils.cleanTaskStagingData}}, it 
cleans up staging data based on the current job's configuration.  Because of 
this, {{fork.branches}} is always set to {{1}}.  The call to 
{{WriterUtils.getWriterStagingDir(state, numBranches, branchId)}} is made with 
{{numBranches=1}} and {{branchId=0}}.  This results in the method looking for 
{{writer.staging.dir}}.  Unfortunately, when using Hocon configuration, the 
value {{writer.staging.dir}} doesn't exist and the job fails.





[jira] [Created] (GOBBLIN-208) JobCatalogs should fallback to system configuration

2017-08-14 Thread Joel Baranick (JIRA)
Joel Baranick created GOBBLIN-208:
-

 Summary: JobCatalogs should fallback to system configuration
 Key: GOBBLIN-208
 URL: https://issues.apache.org/jira/browse/GOBBLIN-208
 Project: Apache Gobblin
  Issue Type: Bug
Reporter: Joel Baranick


When `GobblinClusterManager` creates the `JobCatalog`, it passes in a copy of 
the system config, scoped to the `gobblin.cluster.` prefix.  This causes 
problems later when jobs are being loaded, because properties they refer to may 
not be available.  The config should fall back to the unmodified system config.
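A hedged sketch of that fallback using plain java.util.Properties defaults in place of Gobblin's actual Config types; all names are illustrative:

```java
import java.util.Properties;

// Sketch of the proposed fix: scope the system config to a prefix for
// the job catalog, but keep the full, unmodified system config as a
// fallback so properties that jobs refer to remain resolvable.
// Class and method names are illustrative, not Gobblin's actual API.
class ScopedConfig {
  static Properties scopedWithFallback(Properties system, String prefix) {
    // Properties(defaults) gives a fallback chain: scoped entries first,
    // then the unmodified system config.
    Properties result = new Properties(system);
    for (String key : system.stringPropertyNames()) {
      if (key.startsWith(prefix)) {
        result.setProperty(key.substring(prefix.length()), system.getProperty(key));
      }
    }
    return result;
  }
}
```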





[jira] [Created] (GOBBLIN-196) Properties saved to JobState cannot be retrieved from DatasetState

2017-08-09 Thread Joel Baranick (JIRA)
Joel Baranick created GOBBLIN-196:
-

 Summary: Properties saved to JobState cannot be retrieved from 
DatasetState 
 Key: GOBBLIN-196
 URL: https://issues.apache.org/jira/browse/GOBBLIN-196
 Project: Apache Gobblin
  Issue Type: Bug
Reporter: Joel Baranick


In 0.6.0, properties could be saved to JobState and then retrieved from 
DatasetState via a `getProp()` call.  From 0.9.0 on, properties can no longer 
be retrieved from DatasetState because `getProp()` (and other methods) have 
been overridden to throw `UnsupportedOperationException`.  This is a 
backwards-incompatible change and makes it hard to develop solutions on top of 
Gobblin that require state persisted across job runs.
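A minimal reproduction of the behavior change, with illustrative stand-in classes for JobState and DatasetState:

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative stand-ins, not Gobblin's actual classes. In the 0.6.0-era
// behavior the subclass inherits a working getProp; the 0.9.0+ behavior
// is modeled by overriding it to throw, which breaks existing callers.
class JobStateLike {
  private final Map<String, String> props = new HashMap<>();

  void setProp(String key, String value) {
    props.put(key, value);
  }

  String getProp(String key) {
    return props.get(key);
  }
}

class DatasetStateLike extends JobStateLike {
  @Override
  String getProp(String key) {
    // The backwards-incompatible override described above.
    throw new UnsupportedOperationException("getProp is not supported on DatasetState");
  }
}
```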





[jira] [Commented] (GOBBLIN-32) StateStores created with rootDir that is incompatible with state.store.type

2017-08-07 Thread Joel Baranick (JIRA)

[ 
https://issues.apache.org/jira/browse/GOBBLIN-32?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16116876#comment-16116876
 ] 

Joel Baranick commented on GOBBLIN-32:
--

@htran1 Can you look at this PR, which should solve this issue: 
https://github.com/apache/incubator-gobblin/pull/2035

> StateStores created with rootDir that is incompatible with state.store.type
> ---
>
> Key: GOBBLIN-32
> URL: https://issues.apache.org/jira/browse/GOBBLIN-32
> Project: Apache Gobblin
>  Issue Type: Bug
>Reporter: Joel Baranick
>Assignee: Hung Tran
>
> The StateStores class, when run under gobblin-yarn, can be created with a 
> rootDir (which comes from the yarn application work directory and is in the 
> form of `HDFS://...`) that is incompatible with the configured 
> `state.store.type`.
>  
> *Github Url* : https://github.com/linkedin/gobblin/issues/1848 
> *Github Reporter* : [~jbaranick] 
> *Github Created At* : 2017-05-09T17:30:00Z 
> *Github Updated At* : 2017-06-22T21:36:54Z 
> h3. Comments 
> 
> [~jbaranick] wrote on 2017-06-22T21:36:54Z : @htran1 Are you able to look 
> into this? 
>  
> *Github Url* : 
> https://github.com/linkedin/gobblin/issues/1848#issuecomment-310509726





[jira] [Commented] (GOBBLIN-159) Gobblin Cluster graceful shutdown of master and workers

2017-08-04 Thread Joel Baranick (JIRA)

[ 
https://issues.apache.org/jira/browse/GOBBLIN-159?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16114785#comment-16114785
 ] 

Joel Baranick commented on GOBBLIN-159:
---

Not sure why it didn't auto attach: 
https://github.com/apache/incubator-gobblin/pull/2037

> Gobblin Cluster graceful shutdown of master and workers
> ---
>
> Key: GOBBLIN-159
> URL: https://issues.apache.org/jira/browse/GOBBLIN-159
> Project: Apache Gobblin
>  Issue Type: Bug
>Reporter: Abhishek Tiwari
>Assignee: Zhixiong Chen
>
> Relevant chat from Gitter channel: 
> *Joel Baranick @kadaan Jun 30 10:47*
> Up scaling seems to work great. But down scaling caused problems with the 
> cluster.
> Basically, once the cpu dropped enough to start down scaling, something broke 
> where it stopped processing jobs.
> I’m concerned that the down scaling is not graceful and that the cluster 
> doesn’t respond nicely to workers leaving the cluster in the middle of 
> processing.
> There are a couple problems I see. One is that the workers down gracefully 
> stop running tasks and allow them to be picked up by other nodes.
> The other is that if task publishing is used, partial data might be published 
> when the node goes away. How does the task get completed without possibly 
> duplicating data?
> *Joel Baranick @kadaan Jun 30 12:07*
> @abti What I'm wondering is how we can shutdown a worker node and have it 
> gracefully stop working.
> *Joel Baranick @kadaan Jun 30 12:52*
> Also, seems like .../taskstates/... as well as the job...job.state file in 
> NFS don't get purged.
> Our NFS is experiencing unbounded growth. Are we missing a setting or service?
> *Abhishek Tiwari @abti Jun 30 15:36*
> I didn’t fully understand the issue. Did you see the workers abruptly cancel 
> the task or did they wait for it to finish before shutting down? If the 
> worker waits around enough for Task to finish, the task level publish should 
> be fine?
> *Joel Baranick @kadaan Jun 30 15:37*
> The workers never shut down.
> *Abhishek Tiwari @abti Jun 30 15:38*
> could be because they wait for graceful shutdown but do not leave cluster and 
> are assigned new tasks by helix?
> *Joel Baranick @kadaan Jun 30 15:39*
> I think one issue is that there is an 
> org.quartz.UnableToInterruptJobException in JobScheduler.shutDown which 
> causes it to never run 
> ExecutorsUtils.shutdownExecutorService(this.jobExecutor, Optional.of(LOG));
> *Abhishek Tiwari @abti Jun 30 15:40*
> also taskstates should get cleaned up, check with @htran1 too .. only wu 
> probably should be left around
> we need to add some cleaning mechanism for that
> we dont recall seeing the lurking state files
> *Joel Baranick @kadaan Jun 30 15:47*
> In my EFS/NFS, I have tons (> 6000) of files remaining under 
> .../_taskstates/... for jobs/tasks that have been completed for ages.
> *Abhishek Tiwari @abti Jun 30 16:29*
> wow thats unexpected, did master switch while several jobs were going on?
> *Joel Baranick @kadaan Jun 30 17:23*
> There isn't a way for master to switch without jobs running as they don't 
> cancel correctly.
> *Joel Baranick @kadaan Jul 05 14:22*
> @abti I was looking at fixing the cancellation problem.
> From what I can tell, GobblinHelixJob needs to implement InterruptableJob.
> And it needs to call jobLauncher.cancelJob(jobListener); when it is invoked.
> Does this seem right? Anything I'm missing?
> *Abhishek Tiwari @abti Jul 06 00:34*
> looks about right





[jira] [Commented] (GOBBLIN-138) Task metrics are not saved to Job History Database when running under Yarn

2017-08-04 Thread Joel Baranick (JIRA)

[ 
https://issues.apache.org/jira/browse/GOBBLIN-138?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16114558#comment-16114558
 ] 

Joel Baranick commented on GOBBLIN-138:
---

[~abti] Seems like they are being stored now, but they show up in the 
properties table, not the metrics table.

> Task metrics are not saved to Job History Database when running under Yarn
> --
>
> Key: GOBBLIN-138
> URL: https://issues.apache.org/jira/browse/GOBBLIN-138
> Project: Apache Gobblin
>  Issue Type: Bug
>  Components: gobblin-yarn
>Reporter: Joel Baranick
>Assignee: Abhishek Tiwari
>  Labels: Bug:Generic, LaunchType:Yarn
>
> Task level metrics are not transmitted from the containers that tasks are 
> running on back to the app_master.  This means that task level metrics cannot 
> be saved in the Job History Database.  We should be able to store the task 
> level metrics in the Job History Database just like when we run a standalone 
> job.
>  
> *Github Url* : https://github.com/linkedin/gobblin/issues/748 
> *Github Reporter* : [~jbaranick] 
> *Github Created At* : 2016-02-23T21:33:39Z 
> *Github Updated At* : 2016-03-08T06:15:25Z 
> h3. Comments 
> 
> *ydai1124* wrote on 2016-02-23T21:42:11Z : @kadaan We are deprecating the 
> Job/Task metrics you are using. It is better to switch to Gobblin Metrics: 
> https://github.com/linkedin/gobblin/wiki/Gobblin%20Metrics%20Architecture. It 
> has more contents and more stable. But we don't have the reporter to report 
> to database yet. You can implement your own reporter: 
> https://github.com/linkedin/gobblin/wiki/Implementing%20New%20Reporters.
>  
>  
> *Github Url* : 
> https://github.com/linkedin/gobblin/issues/748#issuecomment-187926718 
> 
> [~stakiar] wrote on 2016-02-24T00:07:38Z : @kadaan is this bug blocking the 
> deployment of the `JobHistoryStore` or are you actually interesting in 
> viewing the `TaskMetrics`? Just want to understand where the bug is.
> AS @ydai1124 mentioned we are moving away from `TaskMetrics` and 
> `JobMetrics`, and at some point we want to remove the current way 
> `TaskMetrics` and `JobMetrics` are written to the `JobHistoryStore`.
>  
>  
> *Github Url* : 
> https://github.com/linkedin/gobblin/issues/748#issuecomment-187977305 
> 
> [~jbaranick] wrote on 2016-02-24T00:09:20Z : No, it is not blocking anything.
>  
>  
> *Github Url* : 
> https://github.com/linkedin/gobblin/issues/748#issuecomment-187978192





[jira] [Commented] (GOBBLIN-152) Private version of Apache Helix causes maven repo to be unusable

2017-08-04 Thread Joel Baranick (JIRA)

[ 
https://issues.apache.org/jira/browse/GOBBLIN-152?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16114557#comment-16114557
 ] 

Joel Baranick commented on GOBBLIN-152:
---

[~abti] Hasn't this already been fixed?

> Private version of Apache Helix causes maven repo to be unusable
> 
>
> Key: GOBBLIN-152
> URL: https://issues.apache.org/jira/browse/GOBBLIN-152
> Project: Apache Gobblin
>  Issue Type: Bug
>  Components: gobblin-helix
>Reporter: Joel Baranick
>Assignee: Hung Tran
>  Labels: Bug:Generic, Framework:Build, LaunchType:Yarn
>
> The gobblin-yarn build.gradle includes a reference to an private version of 
> helix: `compile files('./src/main/resources/helix-core-0.6.6-SNAPSHOT.jar')`. 
>  When the gobblin libraries are pushed to maven, the private version of helix 
> is not pushed.  Because of this, tarballs built from maven are missing the 
> helix jar.
> Is it possible to switch to the latest release or snapshot version of helix? 
>  
> *Github Url* : https://github.com/linkedin/gobblin/issues/525 
> *Github Reporter* : [~jbaranick] 
> *Github Created At* : 2015-12-15T01:43:37Z 
> *Github Updated At* : 2017-01-12T04:31:44Z 
> h3. Comments 
> 
> [~liyinan926] wrote on 2015-12-15T19:26:40Z : The local Helix jar contains 
> critical patches that have not been merged into the trunk yet. We are working 
> on that though.
>  
>  
> *Github Url* : 
> https://github.com/linkedin/gobblin/issues/525#issuecomment-164866018 
> 
> [~jbaranick] wrote on 2015-12-15T22:23:17Z : For now we are just pushing it 
> to artifactory. Please update this when the merge to helix trunk is done.
>  
>  
> *Github Url* : 
> https://github.com/linkedin/gobblin/issues/525#issuecomment-164916245 
> 
> [~jbaranick] wrote on 2016-01-04T18:26:17Z : @liyinan926 What is the status 
> of this?
>  
>  
> *Github Url* : 
> https://github.com/linkedin/gobblin/issues/525#issuecomment-168759278 
> 
> [~jbaranick] wrote on 2016-01-14T16:44:05Z : @liyinan926 Can you please 
> provide links to the PRs for the critical patches to Helix so that we can 
> track the progress of this?
>  
>  
> *Github Url* : 
> https://github.com/linkedin/gobblin/issues/525#issuecomment-171697162 
> 
> [~stakiar] wrote on 2016-02-05T21:15:04Z : Here are the PRs:
> https://github.com/apache/helix/pull/34
> https://github.com/apache/helix/pull/35
> I believe release 0.7.2 of Helix will have these changes. I don't know when 
> it will be released to Maven.
>  
>  
> *Github Url* : 
> https://github.com/linkedin/gobblin/issues/525#issuecomment-180556016





[jira] [Commented] (GOBBLIN-129) AdminUI performs too many requests when update is pressed

2017-08-04 Thread Joel Baranick (JIRA)

[ 
https://issues.apache.org/jira/browse/GOBBLIN-129?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16114553#comment-16114553
 ] 

Joel Baranick commented on GOBBLIN-129:
---

Fixed by [GOBBLIN-9]

> AdminUI performs too many requests when update is pressed
> -
>
> Key: GOBBLIN-129
> URL: https://issues.apache.org/jira/browse/GOBBLIN-129
> Project: Apache Gobblin
>  Issue Type: Bug
>  Components: gobblin-admin
>Reporter: Joel Baranick
>Assignee: Abhishek Tiwari
>  Labels: Framework:AdminUI, enhancement
>
> After using the AdminUI for a while and navigating from the overview, to job 
> information, to job details, and back, the update button causes too many 
> requests.  The update button should only be updating the information for the 
> current page, but in this case it is making requests for data for all 
> previous pages as well.
>  
> *Github Url* : https://github.com/linkedin/gobblin/issues/784 
> *Github Reporter* : [~jbaranick] 
> *Github Created At* : 2016-03-02T07:09:21Z 
> *Github Updated At* : 2017-01-12T04:44:12Z





[jira] [Resolved] (GOBBLIN-129) AdminUI performs too many requests when update is pressed

2017-08-04 Thread Joel Baranick (JIRA)

 [ 
https://issues.apache.org/jira/browse/GOBBLIN-129?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joel Baranick resolved GOBBLIN-129.
---
Resolution: Fixed

Fixed by [GOBBLIN-9]

> AdminUI performs too many requests when update is pressed
> -
>
> Key: GOBBLIN-129
> URL: https://issues.apache.org/jira/browse/GOBBLIN-129
> Project: Apache Gobblin
>  Issue Type: Bug
>  Components: gobblin-admin
>Reporter: Joel Baranick
>Assignee: Abhishek Tiwari
>  Labels: Framework:AdminUI, enhancement
>
> After using the AdminUI for a while and navigating from the overview, to job 
> information, to job details, and back, the update button causes too many 
> requests.  The update button should only be updating the information for the 
> current page, but in this case it is making requests for data for all 
> previous pages as well.
>  
> *Github Url* : https://github.com/linkedin/gobblin/issues/784 
> *Github Reporter* : [~jbaranick] 
> *Github Created At* : 2016-03-02T07:09:21Z 
> *Github Updated At* : 2017-01-12T04:44:12Z





[jira] [Commented] (GOBBLIN-159) Gobblin Cluster graceful shutdown of master and workers

2017-08-04 Thread Joel Baranick (JIRA)

[ 
https://issues.apache.org/jira/browse/GOBBLIN-159?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16114544#comment-16114544
 ] 

Joel Baranick commented on GOBBLIN-159:
---

Zhixiong Chen, are you actively working on this?

> Gobblin Cluster graceful shutdown of master and workers
> ---
>
> Key: GOBBLIN-159
> URL: https://issues.apache.org/jira/browse/GOBBLIN-159
> Project: Apache Gobblin
>  Issue Type: Bug
>Reporter: Abhishek Tiwari
>Assignee: Zhixiong Chen
>
> Relevant chat from Gitter channel: 
> *Joel Baranick @kadaan Jun 30 10:47*
> Up scaling seems to work great. But down scaling caused problems with the 
> cluster.
> Basically, once the cpu dropped enough to start down scaling, something broke 
> where it stopped processing jobs.
> I’m concerned that the down scaling is not graceful and that the cluster 
> doesn’t respond nicely to workers leaving the cluster in the middle of 
> processing.
> There are a couple problems I see. One is that the workers down gracefully 
> stop running tasks and allow them to be picked up by other nodes.
> The other is that if task publishing is used, partial data might be published 
> when the node goes away. How does the task get completed without possibly 
> duplicating data?
> *Joel Baranick @kadaan Jun 30 12:07*
> @abti What I'm wondering is how we can shutdown a worker node and have it 
> gracefully stop working.
> *Joel Baranick @kadaan Jun 30 12:52*
> Also, seems like .../taskstates/... as well as the job...job.state file in 
> NFS don't get purged.
> Our NFS is experiencing unbounded growth. Are we missing a setting or service?
> *Abhishek Tiwari @abti Jun 30 15:36*
> I didn’t fully understand the issue. Did you see the workers abruptly cancel 
> the task or did they wait for it to finish before shutting down? If the 
> worker waits around enough for Task to finish, the task level publish should 
> be fine?
> *Joel Baranick @kadaan Jun 30 15:37*
> The workers never shut down.
> *Abhishek Tiwari @abti Jun 30 15:38*
> could be because they wait for graceful shutdown but do not leave cluster and 
> are assigned new tasks by helix?
> *Joel Baranick @kadaan Jun 30 15:39*
> I think one issue is that there is an 
> org.quartz.UnableToInterruptJobException in JobScheduler.shutDown which 
> causes it to never run 
> ExecutorsUtils.shutdownExecutorService(this.jobExecutor, Optional.of(LOG));
> *Abhishek Tiwari @abti Jun 30 15:40*
> also taskstates should get cleaned up, check with @htran1 too .. only wu 
> probably should be left around
> we need to add some cleaning mechanism for that
> we dont recall seeing the lurking state files
> *Joel Baranick @kadaan Jun 30 15:47*
> In my EFS/NFS, I have tons (> 6000) of files remaining under 
> .../_taskstates/... for jobs/tasks that have been completed for ages.
> *Abhishek Tiwari @abti Jun 30 16:29*
> wow thats unexpected, did master switch while several jobs were going on?
> *Joel Baranick @kadaan Jun 30 17:23*
> There isn't a way for master to switch without jobs running as they don't 
> cancel correctly.
> *Joel Baranick @kadaan Jul 05 14:22*
> @abti I was looking at fixing the cancellation problem.
> From what I can tell, GobblinHelixJob needs to implement InterruptableJob.
> And it needs to call jobLauncher.cancelJob(jobListener); when it is invoked.
> Does this seem right? Anything I'm missing?
> *Abhishek Tiwari @abti Jul 06 00:34*
> looks about right





[jira] [Created] (GOBBLIN-188) Update website URLs to point to https://gobblin.apache.org/

2017-08-04 Thread Joel Baranick (JIRA)
Joel Baranick created GOBBLIN-188:
-

 Summary: Update website URLs to point to 
https://gobblin.apache.org/
 Key: GOBBLIN-188
 URL: https://issues.apache.org/jira/browse/GOBBLIN-188
 Project: Apache Gobblin
  Issue Type: Bug
Reporter: Joel Baranick
Assignee: Abhishek Tiwari


The URL at the top of https://github.com/apache/incubator-gobblin needs to 
point to https://gobblin.apache.org/
The URL listed as the website on 
http://incubator.apache.org/projects/gobblin.html needs to point to 
https://gobblin.apache.org/





[jira] [Created] (GOBBLIN-187) Gobblin Helix doesn't clean up `.job.state` files, causing unbounded disk usage

2017-08-04 Thread Joel Baranick (JIRA)
Joel Baranick created GOBBLIN-187:
-

 Summary: Gobblin Helix doesn't clean up `.job.state` files, 
causing unbounded disk usage
 Key: GOBBLIN-187
 URL: https://issues.apache.org/jira/browse/GOBBLIN-187
 Project: Apache Gobblin
  Issue Type: Bug
Reporter: Joel Baranick


When Gobblin is running on Helix, the `GobblinHelixJobLauncher.createJob` 
method writes the job state to a `.job.state` file.  Nothing cleans up these 
files.  The result is unbounded disk usage.  `.job.state` files should be 
deleted at the completion of jobs.
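The cleanup could be as small as deleting the file when the job finishes; the path layout and method name below are assumptions, not Gobblin's actual code:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Sketch of the requested cleanup: remove the job's .job.state file once
// the job completes. The workDir/jobId + ".job.state" layout and the
// method name are assumptions for illustration.
class JobStateCleanup {
  static boolean deleteJobStateFile(Path workDir, String jobId) throws IOException {
    Path stateFile = workDir.resolve(jobId + ".job.state");
    // deleteIfExists is idempotent, so it is safe even if another
    // component already removed the file.
    return Files.deleteIfExists(stateFile);
  }
}
```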





[jira] [Resolved] (GOBBLIN-127) Admin UI duration chart is sorted incorrectly

2017-07-28 Thread Joel Baranick (JIRA)

 [ 
https://issues.apache.org/jira/browse/GOBBLIN-127?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joel Baranick resolved GOBBLIN-127.
---
Resolution: Fixed

Fixed by [Gobblin-9]

> Admin UI duration chart is sorted incorrectly
> -
>
> Key: GOBBLIN-127
> URL: https://issues.apache.org/jira/browse/GOBBLIN-127
> Project: Apache Gobblin
>  Issue Type: Bug
>  Components: gobblin-admin
>Reporter: Joel Baranick
>  Labels: Bug:Generic, Framework:AdminUI
>
> The Job Duration chart in the AdminUI is sorted incorrectly.  It is sorted by 
> duration, but should be sorted by time.
> Screenshot: https://cloud.githubusercontent.com/assets/1904898/13618328/60a39822-e538-11e5-983b-4706e45c2a34.png
>  
> *Github Url* : https://github.com/linkedin/gobblin/issues/811 
> *Github Reporter* : [~jbaranick] 
> *Github Created At* : 2016-03-08T22:17:30Z 
> *Github Updated At* : 2017-01-12T04:46:36Z





[jira] [Closed] (GOBBLIN-109) Remove need for current.jst

2017-07-28 Thread Joel Baranick (JIRA)

 [ 
https://issues.apache.org/jira/browse/GOBBLIN-109?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joel Baranick closed GOBBLIN-109.
-
Resolution: Fixed

No longer needed. We implemented our own S3aStateStore which handles this.

> Remove need for current.jst
> ---
>
> Key: GOBBLIN-109
> URL: https://issues.apache.org/jira/browse/GOBBLIN-109
> Project: Apache Gobblin
>  Issue Type: Task
>Reporter: Joel Baranick
>  Labels: Framework:StateManagement, enhancement
>
> Fix for #882
>  
> *Github Url* : https://github.com/linkedin/gobblin/pull/965 
> *Github Reporter* : [~jbaranick] 
> *Github Assignee* : [~jbaranick] 
> *Github Created At* : 2016-05-05T22:04:54Z 
> *Github Updated At* : 2017-04-22T18:44:42Z 
> h3. Comments 
> 
> [~jbaranick] wrote on 2016-05-05T22:06:22Z : @sahilTakiar @zliu41: Can you 
> review this?
>  
>  
> *Github Url* : 
> https://github.com/linkedin/gobblin/pull/965#issuecomment-217293611 
> 
> *coveralls* wrote on 2016-05-05T22:21:08Z : [![Coverage 
> Status](https://coveralls.io/builds/6068577/badge)](https://coveralls.io/builds/6068577)
> Coverage increased (+0.5%) to 45.026% when pulling 
> **193dddb831475f931999a9aca54c5c00e2d082d3 on kadaan:FixFor882** into 
> **41963701538ae90ed8042c8d34a2ed7211a9af42 on linkedin:master**.
>  
>  
> *Github Url* : 
> https://github.com/linkedin/gobblin/pull/965#issuecomment-217296762 
> 
> *zliu41* wrote on 2016-05-06T16:07:44Z : @kadaan could you please give a 
> brief description of your approach? It seems you are still using 
> `current.jst`, which is a different approach than #882.
>  
>  
> *Github Url* : 
> https://github.com/linkedin/gobblin/pull/965#issuecomment-217485933 
> 
> [~jbaranick] wrote on 2016-05-06T16:35:48Z : `current.jst` is not used.  
> There is a compromise here so that users of the API aren't broken.  New 
> callers can call `getCurrent` or `getAllCurrent` to get the latest state.  If 
> they want a specific state they can continue to call `get` or `getAll`.  If 
> `current` or `current.jst` is requested when calling `get` or `getAll` it 
> will  return the latest state just like `getCurrent` and `getAllCurrent`.  A 
> precondition will ensure that users of the API are not able to write a file 
> named `current` or `current.jst`.
>  
>  
> *Github Url* : 
> https://github.com/linkedin/gobblin/pull/965#issuecomment-217492590 
> 
> *coveralls* wrote on 2016-05-06T16:57:15Z : [![Coverage 
> Status](https://coveralls.io/builds/6078675/badge)](https://coveralls.io/builds/6078675)
> Coverage increased (+0.1%) to 45.096% when pulling 
> **e5f0498095f1f6bcbf25e3ed0316ffd772275d73 on kadaan:FixFor882** into 
> **588d8c77fe3c84c752fd410f916868419c178465 on linkedin:master**.
>  
>  
> *Github Url* : 
> https://github.com/linkedin/gobblin/pull/965#issuecomment-217497736 
> 
> *coveralls* wrote on 2016-05-12T07:47:06Z : [![Coverage 
> Status](https://coveralls.io/builds/6149251/badge)](https://coveralls.io/builds/6149251)
> Coverage increased (+0.1%) to 46.765% when pulling 
> **67e66222cd441d903b4197d380df8041cce2cc9d on kadaan:FixFor882** into 
> **5cd9d969f73456e46847c9d9e7ef33ad5376617c on linkedin:master**.
>  
>  
> *Github Url* : 
> https://github.com/linkedin/gobblin/pull/965#issuecomment-218684160 
> 
> [~jbaranick] wrote on 2016-05-12T20:52:47Z : @zliu41 @sahilTakiar Can you 
> guys finish this review?
>  
>  
> *Github Url* : 
> https://github.com/linkedin/gobblin/pull/965#issuecomment-218882376 
> 
> [~jbaranick] wrote on 2016-06-01T15:05:13Z : @zliu41 @sahilTakiar Can you 
> guys finish this review?
>  
>  
> *Github Url* : 
> https://github.com/linkedin/gobblin/pull/965#issuecomment-223022088 
> 
> *coveralls* wrote on 2016-06-01T15:33:05Z : [![Coverage 
> Status](https://coveralls.io/builds/6419146/badge)](https://coveralls.io/builds/6419146)
> Coverage increased (+0.07%) to 46.308% when pulling 
> **668262a91516d9919a1cd30c141b058514890c8e on kadaan:FixFor882** into 
> **fe7dc7c35eebc3a4faee9987ecccaae358c5 on linkedin:master**.
>  
>  
> *Github Url* : 
> https://github.com/linkedin/gobblin/pull/965#issuecomment-223031389 
> 
> *zliu41* wrote on 2016-06-01T19:09:53Z : @pcadabam @ibuenros @ydai1124 can 
> you review this PR? Thanks
>  
>  
> *Github Url* : 
> https://github.com/linkedin/gobblin/pull/965#issuecomment-223094308 
> 
> *coveralls* wrote on 2016-06-01T20:12:06Z : [![Coverage 
> Status](https://coveralls.io/builds/6423539/badge)](https://coveralls.io/builds/6423539)
> Coverage increased (+0.2%) to 46.398% when pulling 
> **668262a91516d9919a1cd30c141b058514890c8e on kadaan:FixFor882** into 
> **fe7dc7c35eebc3a4faee9987ecccaae358c5 on linkedin:master**.
>  
>  
> *Github Url* : 
> https://github.com/linkedin/gobblin/pull/965#issuecomment-223110224 
> 
> [~jbaranick] wrote on 
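
The reserved-name precondition discussed in the thread above can be sketched as follows. This is a minimal illustration, not Gobblin's actual API: the class and method names are hypothetical, and only the reserved names `current` and `current.jst` come from the discussion.

```java
public class ReservedStateNames {
    static final String CURRENT_NAME = "current";
    static final String CURRENT_FILE = "current.jst";

    /** Rejects writes to the names reserved for the latest-state alias. */
    static String checkWritableName(String tableName) {
        if (CURRENT_NAME.equals(tableName) || CURRENT_FILE.equals(tableName)) {
            throw new IllegalArgumentException(
                "'" + tableName + "' is reserved for the latest-state alias");
        }
        return tableName;
    }

    public static void main(String[] args) {
        // A normal table name passes through unchanged.
        System.out.println(checkWritableName("job_1234.jst"));
        // The reserved alias names are rejected before any write happens.
        try {
            checkWritableName("current.jst");
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

With a guard like this, `get("current.jst")` can safely alias to the latest state because no real table can ever be stored under that name.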

[jira] [Resolved] (GOBBLIN-39) JobHistoryDB migration files have been incorrectly modified

2017-07-28 Thread Joel Baranick (JIRA)

 [ 
https://issues.apache.org/jira/browse/GOBBLIN-39?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joel Baranick resolved GOBBLIN-39.
--
Resolution: Fixed

Resolved by [GOBBLIN-11]

> JobHistoryDB migration files have been incorrectly modified
> ---
>
> Key: GOBBLIN-39
> URL: https://issues.apache.org/jira/browse/GOBBLIN-39
> Project: Apache Gobblin
>  Issue Type: Bug
>Reporter: Joel Baranick
>
> The Flyway DB migration files cannot be changed after they are committed.  If 
> you need to make a schema change, it must be made in a newly versioned 
> file.  The way these changes were made screws up the migration: 
> 58389c95dc00b23cb1c63ce88a18be9239aa465e, 
> a4dbf76d17c39f8282d3b765c32de61f2eb23404, 
> 82678450952a7de194b810dbd82cd0c5b4752e63. 
> Changing previous migration files changes their checksums.  The Flyway 
> migration then fails because of the differing checksums.  This check exists 
> to ensure that Flyway always knows which changes, and in what order, need 
> to be applied.  The DB migration is done by running: 
> `./historystore-manager.sh migrate -Durl=jdbc:mysql:///gobblin 
> -Duser= -Dpassword=`.  
> More details can be found in: 
> https://github.com/linkedin/gobblin/tree/master/gobblin-metastore/src/main/resources
> As a short-term workaround, the following can be added to the migration 
> command: `-DvalidateOnMigrate=false`.  This removes much of the safety net, 
> but allows the changes to be processed.  Please don't rely on this mechanism.
>  
> *Github Url* : https://github.com/linkedin/gobblin/issues/1823 
> *Github Reporter* : [~jbaranick] 
> *Github Created At* : 2017-05-02T00:27:27Z 
> *Github Updated At* : 2017-05-02T00:27:27Z
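
The checksum failure described in GOBBLIN-39 can be illustrated with a small sketch. Flyway records a checksum for each applied migration script and refuses to migrate when the file on disk no longer matches; the CRC32 computation and SQL text below are simplified assumptions for illustration, not Flyway's exact implementation.

```java
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;

public class MigrationChecksum {
    // Fingerprints a migration script's bytes, similar in spirit to how
    // Flyway records a checksum for each applied migration.
    static long checksum(String script) {
        CRC32 crc = new CRC32();
        crc.update(script.getBytes(StandardCharsets.UTF_8));
        return crc.getValue();
    }

    public static void main(String[] args) {
        // Hypothetical script contents: even a small edit to an
        // already-applied file changes the checksum, so the stored value no
        // longer matches the file on disk and `migrate` fails validation.
        long applied = checksum("ALTER TABLE gobblin_job ADD COLUMN launcher_type VARCHAR(16);");
        long edited  = checksum("ALTER TABLE gobblin_job ADD COLUMN launcher_type VARCHAR(32);");
        System.out.println(applied != edited); // prints true
    }
}
```

This is why schema changes belong in a newly versioned migration file: the old file's checksum stays stable, and the new file is applied in order after it.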



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Resolved] (GOBBLIN-40) Job History DB Schema had not been updated to reflect new LauncherType

2017-07-28 Thread Joel Baranick (JIRA)

 [ 
https://issues.apache.org/jira/browse/GOBBLIN-40?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joel Baranick resolved GOBBLIN-40.
--
Resolution: Fixed

Resolved by [GOBBLIN-11]

> Job History DB Schema had not been updated to reflect new LauncherType
> --
>
> Key: GOBBLIN-40
> URL: https://issues.apache.org/jira/browse/GOBBLIN-40
> Project: Apache Gobblin
>  Issue Type: Bug
>Reporter: Joel Baranick
>
> A new launcher type, `CLUSTER`, has been added, but the JobHistoryDB schema 
> has not been updated to support it.
>  
> *Github Url* : https://github.com/linkedin/gobblin/issues/1822 
> *Github Reporter* : [~jbaranick] 
> *Github Created At* : 2017-05-01T23:40:03Z 
> *Github Updated At* : 2017-05-01T23:40:03Z





[jira] [Resolved] (GOBBLIN-30) Reflections errors when scanning classpath and encountering missing/invalid file paths.

2017-07-28 Thread Joel Baranick (JIRA)

 [ 
https://issues.apache.org/jira/browse/GOBBLIN-30?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joel Baranick resolved GOBBLIN-30.
--
Resolution: Fixed

Resolved by [GOBBLIN-10]

> Reflections errors when scanning classpath and encountering missing/invalid 
> file paths.
> ---
>
> Key: GOBBLIN-30
> URL: https://issues.apache.org/jira/browse/GOBBLIN-30
> Project: Apache Gobblin
>  Issue Type: Bug
>Reporter: Joel Baranick
>
> Reflections should filter out classpath entries which are missing/invalid.
> ```
> 2017-05-04 23:58:03 UTC WARN  [JobExecutionInfoServer STARTING] 
> org.reflections.vfs.Vfs- could not create Dir using directory from url 
> file:/usr/lib/packages/hadoop2/hadoop2/share/hadoop/mapreduce/lib/*. skipping.
> java.lang.NullPointerException
>   at org.reflections.vfs.Vfs$DefaultUrlTypes$3.matches(Vfs.java:239)
>   at org.reflections.vfs.Vfs.fromURL(Vfs.java:98)
>   at org.reflections.vfs.Vfs.fromURL(Vfs.java:91)
>   at org.reflections.Reflections.scan(Reflections.java:237)
>   at org.reflections.Reflections.scan(Reflections.java:204)
>   at org.reflections.Reflections.<init>(Reflections.java:129)
>   at org.reflections.Reflections.<init>(Reflections.java:170)
>   at 
> gobblin.metastore.DatabaseJobHistoryStore.findVersionedDatabaseJobHistoryStore(DatabaseJobHistoryStore.java:102)
>   at 
> gobblin.metastore.DatabaseJobHistoryStore.<init>(DatabaseJobHistoryStore.java:61)
>   at 
> gobblin.metastore.DatabaseJobHistoryStore$$FastClassByGuice$$ec6cc1b8.newInstance()
>   at 
> com.google.inject.internal.cglib.reflect.$FastConstructor.newInstance(FastConstructor.java:40)
>   at 
> com.google.inject.internal.DefaultConstructionProxyFactory$1.newInstance(DefaultConstructionProxyFactory.java:61)
>   at 
> com.google.inject.internal.ConstructorInjector.provision(ConstructorInjector.java:105)
>   at 
> com.google.inject.internal.ConstructorInjector.construct(ConstructorInjector.java:85)
>   at 
> com.google.inject.internal.ConstructorBindingImpl$Factory.get(ConstructorBindingImpl.java:267)
>   at com.google.inject.internal.FactoryProxy.get(FactoryProxy.java:56)
>   at 
> com.google.inject.internal.InjectorImpl$2$1.call(InjectorImpl.java:1016)
>   at 
> com.google.inject.internal.InjectorImpl.callInContext(InjectorImpl.java:1092)
>   at com.google.inject.internal.InjectorImpl$2.get(InjectorImpl.java:1012)
>   at 
> com.google.inject.internal.InjectorImpl.getInstance(InjectorImpl.java:1051)
>   at 
> gobblin.rest.JobExecutionInfoServer.startUp(JobExecutionInfoServer.java:85)
>   at 
> com.google.common.util.concurrent.AbstractIdleService$2$1.run(AbstractIdleService.java:54)
>   at com.google.common.util.concurrent.Callables$3.run(Callables.java:95)
>   at java.lang.Thread.run(Thread.java:745)
> ```
>  
> *Github Url* : https://github.com/linkedin/gobblin/issues/1851 
> *Github Reporter* : [~jbaranick] 
> *Github Created At* : 2017-05-09T17:41:23Z 
> *Github Updated At* : 2017-05-09T17:41:39Z
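
A minimal sketch of the fix suggested above: drop classpath URLs whose file paths are missing or are unexpanded wildcards (like `.../mapreduce/lib/*`) before handing them to Reflections. The class and method names here are illustrative assumptions, not the actual patch.

```java
import java.io.File;
import java.net.MalformedURLException;
import java.net.URL;
import java.util.ArrayList;
import java.util.List;

public class ClasspathUrlFilter {
    // Keeps only URLs Reflections can plausibly open: non-file URLs pass
    // through untouched, while file: URLs must resolve to an existing path.
    // This drops wildcard entries and dead paths like the one that triggered
    // the NullPointerException in Vfs.fromURL above.
    static List<URL> filterValid(List<URL> urls) {
        List<URL> valid = new ArrayList<>();
        for (URL url : urls) {
            if (!"file".equals(url.getProtocol()) || new File(url.getPath()).exists()) {
                valid.add(url);
            }
        }
        return valid;
    }

    public static void main(String[] args) throws MalformedURLException {
        List<URL> urls = new ArrayList<>();
        urls.add(new URL("file:/no/such/dir/lib/*"));              // invalid: dropped
        urls.add(new File(".").getAbsoluteFile().toURI().toURL()); // exists: kept
        System.out.println(filterValid(urls).size()); // prints 1
    }
}
```

Filtering up front turns a hard NullPointerException during scanning into a silently skipped entry, which matches the "skipping" intent of the warning already logged by Vfs.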





[jira] [Commented] (GOBBLIN-31) Reflections concurrency issue

2017-07-28 Thread Joel Baranick (JIRA)

[ 
https://issues.apache.org/jira/browse/GOBBLIN-31?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16105561#comment-16105561
 ] 

Joel Baranick commented on GOBBLIN-31:
--

[~abti] This was fixed by [GOBBLIN-10]

> Reflections concurrency issue
> -
>
> Key: GOBBLIN-31
> URL: https://issues.apache.org/jira/browse/GOBBLIN-31
> Project: Apache Gobblin
>  Issue Type: Bug
>Reporter: Joel Baranick
>
> Reflections has a concurrency issue that causes the classpath scanning in 
> `DatabaseJobHistoryStore` to intermittently fail.  The Reflections scanner 
> needs to be created only once per application.
> `2017-05-08 14:52:06 UTC INFO  [DefaultQuartzScheduler_Worker-1] 
> org.quartz.core.JobRunShell- Job my.job threw a JobExecutionException: 
> org.quartz.JobExecutionException: com.google.inject.ProvisionException: 
> Unable to provision, see the following errors:
> 1) Error injecting constructor, java.lang.IllegalStateException: zip file 
> closed
>   at 
> gobblin.metastore.DatabaseJobHistoryStore.<init>(DatabaseJobHistoryStore.java:69)
>   while locating gobblin.metastore.DatabaseJobHistoryStore
>   while locating gobblin.metastore.JobHistoryStore
> 1 error [See nested exception: com.google.inject.ProvisionException: Unable 
> to provision, see the following errors:
> 1) Error injecting constructor, java.lang.IllegalStateException: zip file 
> closed
>   at 
> gobblin.metastore.DatabaseJobHistoryStore.<init>(DatabaseJobHistoryStore.java:69)
>   while locating gobblin.metastore.DatabaseJobHistoryStore
>   while locating gobblin.metastore.JobHistoryStore
> 1 error]
>   at gobblin.cluster.GobblinHelixJob.executeImpl(GobblinHelixJob.java:87)
>   at gobblin.scheduler.BaseGobblinJob.execute(BaseGobblinJob.java:53)
>   at org.quartz.core.JobRunShell.run(JobRunShell.java:202)
>   at 
> org.quartz.simpl.SimpleThreadPool$WorkerThread.run(SimpleThreadPool.java:573)
> Caused by: com.google.inject.ProvisionException: Unable to provision, see the 
> following errors:
> 1) Error injecting constructor, java.lang.IllegalStateException: zip file 
> closed
>   at 
> gobblin.metastore.DatabaseJobHistoryStore.<init>(DatabaseJobHistoryStore.java:69)
>   while locating gobblin.metastore.DatabaseJobHistoryStore
>   while locating gobblin.metastore.JobHistoryStore
> 1 error
>   at com.google.inject.internal.InjectorImpl$2.get(InjectorImpl.java:1025)
>   at 
> com.google.inject.internal.InjectorImpl.getInstance(InjectorImpl.java:1051)
>   at gobblin.runtime.JobContext.createJobHistoryStore(JobContext.java:202)
>   at gobblin.runtime.JobContext.<init>(JobContext.java:141)
>   at 
> gobblin.runtime.AbstractJobLauncher.<init>(AbstractJobLauncher.java:172)
>   at 
> gobblin.runtime.AbstractJobLauncher.<init>(AbstractJobLauncher.java:144)
>   at 
> gobblin.cluster.GobblinHelixJobLauncher.<init>(GobblinHelixJobLauncher.java:120)
>   at gobblin.cluster.GobblinHelixJob.executeImpl(GobblinHelixJob.java:65)
>   ... 3 more
> Caused by: java.lang.IllegalStateException: zip file closed
>   at java.util.zip.ZipFile.ensureOpen(ZipFile.java:634)
>   at java.util.zip.ZipFile.access$200(ZipFile.java:56)
>   at java.util.zip.ZipFile$1.hasMoreElements(ZipFile.java:487)
>   at java.util.jar.JarFile$1.hasMoreElements(JarFile.java:241)
>   at org.reflections.vfs.ZipDir$1$1.computeNext(ZipDir.java:30)
>   at org.reflections.vfs.ZipDir$1$1.computeNext(ZipDir.java:26)
>   at 
> com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:143)
>   at 
> com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:138)
>   at org.reflections.Reflections.scan(Reflections.java:240)
>   at org.reflections.Reflections.scan(Reflections.java:204)
>   at org.reflections.Reflections.<init>(Reflections.java:129)
>   at 
> gobblin.metastore.DatabaseJobHistoryStore.findVersionedDatabaseJobHistoryStore(DatabaseJobHistoryStore.java:124)
>   at 
> gobblin.metastore.DatabaseJobHistoryStore.<init>(DatabaseJobHistoryStore.java:71)
>   at 
> gobblin.metastore.DatabaseJobHistoryStore$$FastClassByGuice$$ec6cc1b8.newInstance()
>   at 
> com.google.inject.internal.cglib.reflect.$FastConstructor.newInstance(FastConstructor.java:40)
>   at 
> com.google.inject.internal.DefaultConstructionProxyFactory$1.newInstance(DefaultConstructionProxyFactory.java:61)
>   at 
> com.google.inject.internal.ConstructorInjector.provision(ConstructorInjector.java:105)
>   at 
> com.google.inject.internal.ConstructorInjector.construct(ConstructorInjector.java:85)
>   at 
> com.google.inject.internal.ConstructorBindingImpl$Factory.get(ConstructorBindingImpl.java:267)
>   at com.google.inject.internal.FactoryProxy.get(FactoryProxy.java:56)
>   at 
> com.google.inject.internal.InjectorImpl$2$1.call(InjectorImpl.java:1016)
>   at 
> 
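
The "create the Reflections scanner only once per application" fix described in GOBBLIN-31 can be sketched with the standard lazy-holder idiom. `Scanner` below is a stand-in for the real Reflections instance, not Gobblin's actual code.

```java
public class ScannerHolder {
    // Stand-in for the expensive, shared classpath-scan result (a Reflections
    // instance in the real code).
    static final class Scanner {
        final long createdAtNanos = System.nanoTime();
    }

    // The JVM's class-initialization rules guarantee Holder.INSTANCE is built
    // exactly once, safely published to all threads, and only on first use,
    // with no explicit locking on the read path.
    private static final class Holder {
        static final Scanner INSTANCE = new Scanner();
    }

    static Scanner get() {
        return Holder.INSTANCE;
    }

    public static void main(String[] args) {
        // Every caller observes the same instance, so the scan runs once and
        // concurrent jobs can no longer race on a half-built scanner.
        System.out.println(ScannerHolder.get() == ScannerHolder.get()); // prints true
    }
}
```

Sharing one scan also avoids re-reading jars that a shutting-down classloader may have already closed, which is what surfaces as the "zip file closed" failure above.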

[jira] [Commented] (GOBBLIN-30) Reflections errors when scanning classpath and encountering missing/invalid file paths.

2017-07-28 Thread Joel Baranick (JIRA)

[ 
https://issues.apache.org/jira/browse/GOBBLIN-30?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16105559#comment-16105559
 ] 

Joel Baranick commented on GOBBLIN-30:
--

[~abti] This was fixed by [GOBBLIN-10]

> Reflections errors when scanning classpath and encountering missing/invalid 
> file paths.
> ---
>
> Key: GOBBLIN-30
> URL: https://issues.apache.org/jira/browse/GOBBLIN-30
> Project: Apache Gobblin
>  Issue Type: Bug
>Reporter: Joel Baranick
>
> Reflections should filter out classpath entries which are missing/invalid.
> ```
> 2017-05-04 23:58:03 UTC WARN  [JobExecutionInfoServer STARTING] 
> org.reflections.vfs.Vfs- could not create Dir using directory from url 
> file:/usr/lib/packages/hadoop2/hadoop2/share/hadoop/mapreduce/lib/*. skipping.
> java.lang.NullPointerException
>   at org.reflections.vfs.Vfs$DefaultUrlTypes$3.matches(Vfs.java:239)
>   at org.reflections.vfs.Vfs.fromURL(Vfs.java:98)
>   at org.reflections.vfs.Vfs.fromURL(Vfs.java:91)
>   at org.reflections.Reflections.scan(Reflections.java:237)
>   at org.reflections.Reflections.scan(Reflections.java:204)
>   at org.reflections.Reflections.<init>(Reflections.java:129)
>   at org.reflections.Reflections.<init>(Reflections.java:170)
>   at 
> gobblin.metastore.DatabaseJobHistoryStore.findVersionedDatabaseJobHistoryStore(DatabaseJobHistoryStore.java:102)
>   at 
> gobblin.metastore.DatabaseJobHistoryStore.<init>(DatabaseJobHistoryStore.java:61)
>   at 
> gobblin.metastore.DatabaseJobHistoryStore$$FastClassByGuice$$ec6cc1b8.newInstance()
>   at 
> com.google.inject.internal.cglib.reflect.$FastConstructor.newInstance(FastConstructor.java:40)
>   at 
> com.google.inject.internal.DefaultConstructionProxyFactory$1.newInstance(DefaultConstructionProxyFactory.java:61)
>   at 
> com.google.inject.internal.ConstructorInjector.provision(ConstructorInjector.java:105)
>   at 
> com.google.inject.internal.ConstructorInjector.construct(ConstructorInjector.java:85)
>   at 
> com.google.inject.internal.ConstructorBindingImpl$Factory.get(ConstructorBindingImpl.java:267)
>   at com.google.inject.internal.FactoryProxy.get(FactoryProxy.java:56)
>   at 
> com.google.inject.internal.InjectorImpl$2$1.call(InjectorImpl.java:1016)
>   at 
> com.google.inject.internal.InjectorImpl.callInContext(InjectorImpl.java:1092)
>   at com.google.inject.internal.InjectorImpl$2.get(InjectorImpl.java:1012)
>   at 
> com.google.inject.internal.InjectorImpl.getInstance(InjectorImpl.java:1051)
>   at 
> gobblin.rest.JobExecutionInfoServer.startUp(JobExecutionInfoServer.java:85)
>   at 
> com.google.common.util.concurrent.AbstractIdleService$2$1.run(AbstractIdleService.java:54)
>   at com.google.common.util.concurrent.Callables$3.run(Callables.java:95)
>   at java.lang.Thread.run(Thread.java:745)
> ```
>  
> *Github Url* : https://github.com/linkedin/gobblin/issues/1851 
> *Github Reporter* : [~jbaranick] 
> *Github Created At* : 2017-05-09T17:41:23Z 
> *Github Updated At* : 2017-05-09T17:41:39Z


