[ https://issues.apache.org/jira/browse/GOBBLIN-336?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Hung Tran resolved GOBBLIN-336.
-------------------------------
    Resolution: Fixed

Issue resolved by pull request #2193
[https://github.com/apache/incubator-gobblin/pull/2193]

> Gobblin Cluster Job Isolation
> -----------------------------
>
>                 Key: GOBBLIN-336
>                 URL: https://issues.apache.org/jira/browse/GOBBLIN-336
>             Project: Apache Gobblin
>          Issue Type: Improvement
>            Reporter: Ray Yang
>
> Gobblin cluster runs Gobblin jobs. Each cluster worker host runs jobs in a
> thread pool in a single JVM. The thread pool is reused for subsequent jobs
> after previous jobs finish.
> Gobblin cluster recently ran into issues with resource leakage. The cluster
> would fail all job executions when certain resources such as threads were
> exhausted. To recover, the whole cluster has to be restarted and jobs have to
> be retried. With the expected increase in the number of jobs executed, such
> errors will happen more frequently. We have identified the causes, and fixes
> have been verified. However, there are concerns that similar unknown bugs may
> show up later and bring the whole cluster down.
> In general, any bug in one job’s code may affect the execution of another
> job since they run in the same JVM. It’s also possible that a bug will only
> be triggered by certain input data that is specific to a subset of jobs.
> The cluster will be more robust if each job execution is better isolated
> from other jobs.
> In the future, we expect jobs will become more diverse as more use cases are
> on-boarded. The need for job isolation will become more important over time.
> In the future, job isolation may be required for security reasons too.

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)