[ 
https://issues.apache.org/jira/browse/GOBBLIN-336?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hung Tran resolved GOBBLIN-336.
-------------------------------
    Resolution: Fixed

Issue resolved by pull request #2193
[https://github.com/apache/incubator-gobblin/pull/2193]

> Gobblin Cluster Job Isolation
> -----------------------------
>
>                 Key: GOBBLIN-336
>                 URL: https://issues.apache.org/jira/browse/GOBBLIN-336
>             Project: Apache Gobblin
>          Issue Type: Improvement
>            Reporter: Ray Yang
>
> Gobblin cluster runs Gobblin jobs. Each cluster worker host runs jobs in a 
> thread pool in a single JVM. The thread pool is reused for subsequent jobs 
> after previous jobs finish.
> Gobblin cluster recently ran into issues with resource leakage. The cluster 
> would fail all job executions when certain resources, such as threads, were 
> exhausted. To recover, the whole cluster had to be restarted and the jobs 
> retried. With the expected increase in the number of jobs executed, such 
> errors will happen more frequently. We have identified the causes, and the 
> fixes have been verified. However, there are concerns that similar, as yet 
> unknown bugs may show up later and bring the whole cluster down.
> In general, any bug in one job’s code may affect the execution of other 
> jobs, since they all run in the same JVM. It’s also possible that a bug will 
> only be triggered by certain input data that is specific to a subset of jobs. 
> The cluster will be more robust if each job execution is better isolated 
> from the others.
> As more use cases are onboarded, we expect jobs to become more diverse, and 
> the need for job isolation will grow over time. Job isolation may eventually 
> be required for security reasons too.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
