[jira] [Commented] (OOZIE-2887) Oozie Server hangs when there is a user job has wrong namenode address

2019-01-12 Thread duan xiong (JIRA)


[ 
https://issues.apache.org/jira/browse/OOZIE-2887?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16741422#comment-16741422
 ] 

duan xiong commented on OOZIE-2887:
---

[~Prabhu Joseph] Hi. With Hadoop HA, Hadoop can provide a single public domain 
name, for example hdfs://hadoop.com. When we submit a job, we can use this 
value in place of nn1 or nn2, which avoids this problem.

> Oozie Server hangs when there is a user job has wrong namenode address 
> ---
>
> Key: OOZIE-2887
> URL: https://issues.apache.org/jira/browse/OOZIE-2887
> Project: Oozie
> Issue Type: Bug
> Components: core
> Affects Versions: 4.3.0
> Reporter: Prabhu Joseph
> Priority: Critical
>
> All Oozie jobs go to the PREP state when a user job tries to connect to a 
> wrong NameNode address by mistake. The jstack output shows that all the 
> threads that try to submit a job are waiting to lock a 
> "java.util.ServiceLoader":
> {code}
> "pool-2-thread-19" #47 prio=5 os_prio=0 tid=0x7f8c08734000 nid=0xb468 waiting for monitor entry [0x7f8bf207a000]
>    java.lang.Thread.State: BLOCKED (on object monitor)
>     at org.apache.hadoop.mapreduce.Cluster.initialize(Cluster.java:89)
>     - waiting to lock <0x81b29098> (a java.util.ServiceLoader)
>     at org.apache.hadoop.mapreduce.Cluster.<init>(Cluster.java:82)
>     at org.apache.hadoop.mapreduce.Cluster.<init>(Cluster.java:75)
>     at org.apache.hadoop.mapreduce.Job$9.run(Job.java:1260)
>     at org.apache.hadoop.mapreduce.Job$9.run(Job.java:1256)
>     at java.security.AccessController.doPrivileged(Native Method)
>     at javax.security.auth.Subject.doAs(Subject.java:422)
>     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1724)
>     at org.apache.hadoop.mapreduce.Job.connect(Job.java:1255)
>     - locked <0x82fd6b30> (a org.apache.hadoop.mapreduce.Job)
>     at org.apache.hadoop.mapreduce.Job.submit(Job.java:1284)
>     at org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:575)
>     at org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:570)
>     at java.security.AccessController.doPrivileged(Native Method)
>     at javax.security.auth.Subject.doAs(Subject.java:422)
>     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1724)
>     at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:570)
>     at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:561)
>     at org.apache.oozie.action.hadoop.JavaActionExecutor.submitLauncher(JavaActionExecutor.java:1187)
>     at org.apache.oozie.action.hadoop.JavaActionExecutor.start(JavaActionExecutor.java:1373)
>     at org.apache.oozie.command.wf.ActionStartXCommand.execute(ActionStartXCommand.java:232)
>     at org.apache.oozie.command.wf.ActionStartXCommand.execute(ActionStartXCommand.java:63)
>     at org.apache.oozie.command.XCommand.call(XCommand.java:287)
>     at org.apache.oozie.service.CallableQueueService$CompositeCallable.call(CallableQueueService.java:331)
>     at org.apache.oozie.service.CallableQueueService$CompositeCallable.call(CallableQueueService.java:260)
>     at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>     at org.apache.oozie.service.CallableQueueService$CallableWrapper.run(CallableQueueService.java:178)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>     at java.lang.Thread.run(Thread.java:745)
> {code}
> The thread that tries to connect to the wrong NameNode address has acquired 
> the lock and keeps retrying the connection to the NameNode forever:
> {code}
> "pool-2-thread-20" #48 prio=5 os_prio=0 tid=0x7f8c08736000 nid=0xb469 waiting on condition [0x7f8bf1f78000]
>    java.lang.Thread.State: TIMED_WAITING (sleeping)
>     at java.lang.Thread.sleep(Native Method)
>     at org.apache.hadoop.ipc.Client$Connection.handleConnectionFailure(Client.java:899)
>     at org.apache.hadoop.ipc.Client$Connection.setupConnection(Client.java:666)
>     - locked <0x83b80360> (a org.apache.hadoop.ipc.Client$Connection)
>     at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:745)
>     - locked <0x83b80360> (a org.apache.hadoop.ipc.Client$Connection)
>     at org.apache.hadoop.ipc.Client$Connection.access$3200(Client.java:397)
>     at org.apache.hadoop.ipc.Client.getConnection(Client.java:1618)
>     at org.apache.hadoop.ipc.Client.call(Client.java:1449)
>     at org.apache.hadoop.ipc.Client.call(Client.java:1396)
>     at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:233)

[jira] [Commented] (OOZIE-2887) Oozie Server hangs when there is a user job has wrong namenode address

2017-08-03 Thread Prabhu Joseph (JIRA)

[ 
https://issues.apache.org/jira/browse/OOZIE-2887?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16112427#comment-16112427
 ] 

Prabhu Joseph commented on OOZIE-2887:
--

The issue also occurs when job.properties has the correct NameNode address 
but the nn1 NameNode machine is down. The whitelist configuration does not 
help here.

{code}
Repro:

1. NameNode HA with nn1 and nn2
2. Shut down nn1
3. Set yarn.timeline-service.enabled to true
Now all Oozie jobs go to PREP: one thread keeps retrying the connection to 
nn1, and the other threads wait to lock the same object.
{code}
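
The stall pattern in this repro can be imitated with plain JDK threads. The sketch below is only an illustration, not Oozie or Hadoop code: the class name, the thread names, and SHARED_LOCK (a stand-in for the shared object that the submitter threads lock) are all made up for this example. One thread takes the monitor and sleeps in a retry loop while still holding it; a second thread that needs the same monitor ends up BLOCKED, which is the state jstack reports for the waiting submitters.

```java
import java.util.concurrent.CountDownLatch;

final class SharedLockStall {
    // Hypothetical stand-in for the shared object all submitter threads lock.
    private static final Object SHARED_LOCK = new Object();

    // Starts a "retrying" thread that holds the lock while sleeping, then a
    // second thread that needs the same lock, and returns the second thread's
    // observed state.
    static Thread.State observeBlockedSubmitter() throws InterruptedException {
        CountDownLatch lockHeld = new CountDownLatch(1);

        Thread retrying = new Thread(() -> {
            synchronized (SHARED_LOCK) {
                lockHeld.countDown();
                try {
                    while (true) {
                        // Simulated pause between connection attempts;
                        // sleeping does not release the monitor.
                        Thread.sleep(1000);
                    }
                } catch (InterruptedException e) {
                    // Interrupted by the demo so that it can terminate.
                }
            }
        }, "retrying-submitter");

        Thread waiting = new Thread(() -> {
            synchronized (SHARED_LOCK) {
                // Would submit a job here; never reached while the lock is held.
            }
        }, "waiting-submitter");

        retrying.start();
        lockHeld.await();          // make sure the lock is taken first
        waiting.start();

        // Poll (for at most five seconds) until the second thread is blocked.
        long deadline = System.nanoTime() + 5_000_000_000L;
        while (waiting.getState() != Thread.State.BLOCKED
                && System.nanoTime() < deadline) {
            Thread.sleep(10);
        }
        Thread.State observed = waiting.getState();

        retrying.interrupt();      // release the lock and clean up
        retrying.join();
        waiting.join();
        return observed;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("waiting-submitter state: " + observeBlockedSubmitter());
    }
}
```

In the demo the retrying thread is interrupted so that the program ends. In the reported hang nothing interrupts the retrying thread, so every later submitter stays blocked.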



[jira] [Commented] (OOZIE-2887) Oozie Server hangs when there is a user job has wrong namenode address

2017-07-17 Thread Daniel Becker (JIRA)

[ 
https://issues.apache.org/jira/browse/OOZIE-2887?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16089526#comment-16089526
 ] 

Daniel Becker commented on OOZIE-2887:
--

I couldn't reproduce it on a pseudo-cluster; the issue might be in Hadoop's 
retry logic.


[jira] [Commented] (OOZIE-2887) Oozie Server hangs when there is a user job has wrong namenode address

2017-05-26 Thread Venkat Ranganathan (JIRA)

[ 
https://issues.apache.org/jira/browse/OOZIE-2887?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16026911#comment-16026911
 ] 

Venkat Ranganathan commented on OOZIE-2887:
---

You can use the whitelist configuration parameters to control this.

See
oozie.service.HadoopAccessorService.jobTracker.whitelist and
oozie.service.HadoopAccessorService.nameNode.whitelist
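
As a rough illustration of what a whitelist of this kind does, here is a small self-contained sketch. It is not Oozie's implementation: the class and method names are hypothetical, the comma-separated host:port format and the rule that an empty whitelist allows everything are assumptions made for the example, and the host names are placeholders. It only shows the idea of comparing the authority part of a submitted NameNode URI against a configured list.

```java
import java.net.URI;
import java.util.Arrays;
import java.util.Locale;
import java.util.Set;
import java.util.stream.Collectors;

final class NameNodeWhitelistSketch {

    // Parses a comma-separated list of host:port entries (assumed format).
    static Set<String> parseWhitelist(String value) {
        return Arrays.stream(value.split(","))
                .map(entry -> entry.trim().toLowerCase(Locale.ROOT))
                .filter(entry -> !entry.isEmpty())
                .collect(Collectors.toSet());
    }

    // Returns true if the URI's authority (host:port) is in the whitelist.
    // Assumption for this sketch: an empty whitelist means "no restriction".
    static boolean isAllowed(String nameNodeUri, Set<String> whitelist) {
        if (whitelist.isEmpty()) {
            return true;
        }
        String authority = URI.create(nameNodeUri).getAuthority();
        return authority != null
                && whitelist.contains(authority.toLowerCase(Locale.ROOT));
    }

    public static void main(String[] args) {
        Set<String> whitelist =
                parseWhitelist("nn1.example.com:8020, nn2.example.com:8020");
        System.out.println(isAllowed("hdfs://nn1.example.com:8020", whitelist));
        System.out.println(isAllowed("hdfs://typo-host:8020", whitelist));
    }
}
```

A check like this rejects a mistyped address before any connection is attempted, but it says nothing about whether a listed NameNode is actually reachable.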
