Was there any resolution to this issue?
It looks like the same thing happened to me this morning.
We have a job that runs once a day, and this morning workflow never left the
'PREP' state, and the coordinator show's the workflow as failed.
I ended up killing the failed coordinator and restarting it (so things look
like they are back to normal).
However, I'm wondering if this will happen again, and if so, how often?
We're using cdh3u2 (so, Oozie 2.3.2-u2).
Any help would be appreciated.
Below is error that was reported in oozie.log:
2012-02-15 01:02:41,753 ERROR StartCommand:522 - USER[oozie] GROUP[users]
TOKEN[] APP[comps_primary_workflow] JOB[0001668-111221102141548-oozie-oozi-W]
ACTION[0000639-111221102141548-oozie-oozi-C@20] XException,
org.apache.oozie.store.StoreException: E0607: Other error in operation
[updateWorkflow], Unable to obtain an object lock on "A lock could not be
obtained due to a deadlock, cycle of locks and waiters is:
Lock : ROW, WF_JOBS, (2,7) Waiting XID : {1971343, U} , SA, UPDATE WF_JOBS SET
app_name = ?, app_path = ?, conf = ?, group_name = ?, run = ?, user_name = ?,
auth_token = ?, created_time = ?, end_time = NULL, external_id = NULL,
last_modified_time = ?, log_token = ?, proto_action_conf = ?, sla_x
ml = ?, start_time = ?, status = ?, wf_instance = ? WHERE id IN (SELECT
DISTINCT t0.id FROM WF_JOBS t0 WHERE (t0.id = ?) AND t0.bean_type = ?)
Granted XID : {1971338, U}
Lock : ROW, WF_JOBS, (952,43) Waiting XID : {1971338, S} , SA, UPDATE WF_JOBS
SET app_name = ?, app_path = ?, conf = ?, group_name = ?, run = ?, user_name =
?, auth_token = ?, created_time = ?, end_time = NULL, external_id = NULL,
last_modified_time = ?, log_token = ?, proto_action_conf = ?, sla_x
ml = ?, start_time = ?, status = ?, wf_instance = ? WHERE id IN (SELECT
DISTINCT t0.id FROM WF_JOBS t0 WHERE (t0.id = ?) AND t0.bean_type = ?)
Granted XID : {1971343, X}
. The selected victim is XID : 1971343. {prepstmnt 1601033644 UPDATE WF_JOBS
SET app_name = ?, app_path = ?, conf = ?, group_name = ?, run = ?, user_name =
?, auth_token = ?, created_time = ?, end_time = NULL, external_id = NULL,
last_modified_time = ?, log_token = ?, p
roto_action_conf = ?, sla_xml = ?, start_time = ?, status = ?, wf_instance = ?
WHERE id IN (SELECT DISTINCT t0.id FROM WF_JOBS t0 WHERE (t0.id = ?) AND
t0.bean_type = ?) [params=(String) comps_primary_workflow, (String)
hdfs://analyticsnn1.sv2.trulia.com:8020/oozie/workflow/coordinators/...,
(String) <configuration>
<property>
<name>downloadedForSaleDir</name>..., (String) users, (int) 0, (String)
oozie, (String) ?, (Timestamp) 2012-02-15 01:02:21.683, (Timestamp) 2012-02-15
01:02:21.705, (String) , (String) <?xml version="1.0" encoding="UTF-8"
standalone="no"?><configuration..., (String) , (Timestamp) 2012-02-15
01:02:21.7, (String) RUNNING, (byte[]) [B@7b46f3aa, (String)
0001668-111221102141548-oozie-oozi-W, (String) WorkflowJobBean]} [code=30000,
state=40001] [java.lang.String]".
org.apache.oozie.store.StoreException: E0607: Other error in operation
[updateWorkflow], Unable to obtain an object lock on "A lock could not be
obtained due to a deadlock, cycle of locks and waiters is:Lock : ROW, WF_JOBS,
(2,7)
Waiting XID : {1971343, U} , SA, UPDATE WF_JOBS SET app_name = ?, app_path =
?, conf = ?, group_name = ?, run = ?, user_name = ?, auth_token = ?,
created_time = ?, end_time = NULL, external_id = NULL, last_modified_time = ?,
log_token = ?, proto_action_conf = ?, sla_xml = ?, start_time = ?, status = ?,
wf_instance = ? WHERE id IN (SELECT DISTINCT t0.id FROM WF_JOBS t0 WHERE (t0.id
= ?) AND t0.bean_type = ?)
Granted XID : {1971338, U}
Lock : ROW, WF_JOBS, (952,43)
Waiting XID : {1971338, S} , SA, UPDATE WF_JOBS SET app_name = ?, app_path =
?, conf = ?, group_name = ?, run = ?, user_name = ?, auth_token = ?,
created_time = ?, end_time = NULL, external_id = NULL, last_modified_time = ?,
log_token = ?, proto_action_conf = ?, sla_xml = ?, start_time = ?, status = ?,
wf_instance = ? WHERE id IN (SELECT DISTINCT t0.id FROM WF_JOBS t0 WHERE (t0.id
= ?) AND t0.bean_type = ?)
Granted XID : {1971343, X}
. The selected victim is XID : 1971343. {prepstmnt 1601033644 UPDATE WF_JOBS
SET app_name = ?, app_path = ?, conf = ?, group_name = ?, run = ?, user_name =
?, auth_token = ?, created_time = ?, end_time = NULL, external_id = NULL,
last_modified_time = ?, log_token = ?, proto_action_conf = ?, sla_xml = ?,
start_time = ?, status = ?, wf_instance = ? WHERE id IN (SELECT DISTINCT t0.id
FROM WF_JOBS t0 WHERE (t0.id = ?) AND t0.bean_type = ?) [params=(String)
comps_primary_workflow, (String)
hdfs://analyticsnn1.sv2.trulia.com:8020/oozie/workflow/coordinators/...,
(String) <configuration>
<property>
<name>downloadedForSaleDir</name>..., (String) users, (int) 0, (String)
oozie, (String) ?, (Timestamp) 2012-02-15 01:02:21.683, (Timestamp) 2012-02-15
01:02:21.705, (String) , (String) <?xml version="1.0" encoding="UTF-8"
standalone="no"?><configuration..., (String) , (Timestamp) 2012-02-15
01:02:21.7, (String) RUNNING, (byte[]) [B@7b46f3aa, (String)
0001668-111221102141548-oozie-oozi-W, (String) WorkflowJobBean]} [code=30000,
state=40001] [java.lang.String]".
at
org.apache.oozie.store.WorkflowStore.doOperation(WorkflowStore.java:823)
at
org.apache.oozie.store.WorkflowStore.updateWorkflow(WorkflowStore.java:197)
at
org.apache.oozie.command.wf.SignalCommand.call(SignalCommand.java:199)
at
org.apache.oozie.command.wf.SignalCommand.execute(SignalCommand.java:305)
at
org.apache.oozie.command.wf.SignalCommand.execute(SignalCommand.java:59)
at org.apache.oozie.command.Command.call(Command.java:202)
at org.apache.oozie.DagEngine.start(DagEngine.java:184)
at org.apache.oozie.DagEngine.submitJob(DagEngine.java:98)
at
org.apache.oozie.command.coord.CoordActionStartCommand.call(CoordActionStartCommand.java:157)
at
org.apache.oozie.command.coord.CoordActionStartCommand.execute(CoordActionStartCommand.java:211)
at
org.apache.oozie.command.coord.CoordActionStartCommand.execute(CoordActionStartCommand.java:45)
at org.apache.oozie.command.Command.call(Command.java:202)
at
org.apache.oozie.service.CallableQueueService$CallableWrapper.run(CallableQueueService.java:132)
at
java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:662)
Caused by: <openjpa-1.2.1-r752877:753278 nonfatal store error>
org.apache.openjpa.persistence.OptimisticLockException: Unable to obtain an
object lock on "A lock could not be obtained due to a deadlock, cycle of locks
and waiters is:
Lock : ROW, WF_JOBS, (2,7)
Waiting XID : {1971343, U} , SA, UPDATE WF_JOBS SET app_name = ?, app_path =
?, conf = ?, group_name = ?, run = ?, user_name = ?, auth_token = ?,
created_time = ?, end_time = NULL, external_id = NULL, last_modified_time = ?,
log_token = ?, proto_action_conf = ?, sla_xml = ?, start_time = ?, status = ?,
wf_instance = ? WHERE id IN (SELECT DISTINCT t0.id FROM WF_JOBS t0 WHERE (t0.id
= ?) AND t0.bean_type = ?)
Granted XID : {1971338, U}
Lock : ROW, WF_JOBS, (952,43)
Waiting XID : {1971338, S} , SA, UPDATE WF_JOBS SET app_name = ?, app_path =
?, conf = ?, group_name = ?, run = ?, user_name = ?, auth_token = ?,
created_time = ?, end_time = NULL, external_id = NULL, last_modified_time = ?,
log_token = ?, proto_action_conf = ?, sla_xml = ?, start_time = ?, status = ?,
wf_instance = ? WHERE id IN (SELECT DISTINCT t0.id FROM WF_JOBS t0 WHERE (t0.id
= ?) AND t0.bean_type = ?)
Granted XID : {1971343, X}
. The selected victim is XID : 1971343. {prepstmnt 1601033644 UPDATE WF_JOBS
SET app_name = ?, app_path = ?, conf = ?, group_name = ?, run = ?, user_name =
?, auth_token = ?, created_time = ?, end_time = NULL, external_id = NULL,
last_modified_time = ?, log_token = ?, proto_action_conf = ?, sla_xml = ?,
start_time = ?, status = ?, wf_instance = ? WHERE id IN (SELECT DISTINCT t0.id
FROM WF_JOBS t0 WHERE (t0.id = ?) AND t0.bean_type = ?) [params=(String)
comps_primary_workflow, (String)
hdfs://analyticsnn1.sv2.trulia.com:8020/oozie/workflow/coordinators/...,
(String) <configuration>
<property>
<name>downloadedForSaleDir</name>..., (String) users, (int) 0, (String)
oozie, (String) ?, (Timestamp) 2012-02-15 01:02:21.683, (Timestamp) 2012-02-15
01:02:21.705, (String) , (String) <?xml version="1.0" encoding="UTF-8"
standalone="no"?><configuration..., (String) , (Timestamp) 2012-02-15
01:02:21.7, (String) RUNNING, (byte[]) [B@7b46f3aa, (String)
0001668-111221102141548-oozie-oozi-W, (String) WorkflowJobBean]} [code=30000,
state=40001] [java.lang.String]".
FailedObject: UPDATE WF_JOBS SET app_name = ?, app_path = ?, conf = ?,
group_name = ?, run = ?, user_name = ?, auth_token = ?, created_time = ?,
end_time = NULL, external_id = NULL, last_modified_time = ?, log_token = ?,
proto_action_conf = ?, sla_xml = ?, start_time = ?, status = ?, wf_instance = ?
WHERE id IN (SELECT DISTINCT t0.id FROM WF_JOBS t0 WHERE (t0.id = ?) AND
t0.bean_type = ?) [java.lang.String]
at
org.apache.openjpa.jdbc.sql.DBDictionary.narrow(DBDictionary.java:4222)
at
org.apache.openjpa.jdbc.sql.DBDictionary.newStoreException(DBDictionary.java:4197)
at
org.apache.openjpa.jdbc.sql.SQLExceptions.getStore(SQLExceptions.java:102)
at
org.apache.openjpa.jdbc.sql.SQLExceptions.getStore(SQLExceptions.java:72)
at
org.apache.openjpa.jdbc.kernel.JDBCStoreQuery.executeBulkOperation(JDBCStoreQuery.java:515)
at
org.apache.openjpa.jdbc.kernel.JDBCStoreQuery.executeUpdate(JDBCStoreQuery.java:440)
at
org.apache.openjpa.kernel.ExpressionStoreQuery$DataStoreExecutor.executeUpdate(ExpressionStoreQuery.java:694)
at org.apache.openjpa.kernel.QueryImpl.update(QueryImpl.java:1039)
at org.apache.openjpa.kernel.QueryImpl.execute(QueryImpl.java:852)
at org.apache.openjpa.kernel.QueryImpl.updateAll(QueryImpl.java:888)
at
org.apache.openjpa.kernel.DelegatingQuery.updateAll(DelegatingQuery.java:581)
at
org.apache.openjpa.persistence.QueryImpl.executeUpdate(QueryImpl.java:338)
at org.apache.oozie.store.WorkflowStore$5.call(WorkflowStore.java:202)
at org.apache.oozie.store.WorkflowStore$5.call(WorkflowStore.java:197)
at
org.apache.oozie.store.WorkflowStore.doOperation(WorkflowStore.java:808)
... 15 more
Caused by: org.apache.openjpa.lib.jdbc.ReportingSQLException: A lock could not
be obtained due to a deadlock, cycle of locks and waiters is:
Lock : ROW, WF_JOBS, (2,7)
Waiting XID : {1971343, U} , SA, UPDATE WF_JOBS SET app_name = ?, app_path =
?, conf = ?, group_name = ?, run = ?, user_name = ?, auth_token = ?,
created_time = ?, end_time = NULL, external_id = NULL, last_modified_time = ?,
log_token = ?, proto_action_conf = ?, sla_xml = ?, start_time = ?, status = ?,
wf_instance = ? WHERE id IN (SELECT DISTINCT t0.id FROM WF_JOBS t0 WHERE (t0.id
= ?) AND t0.bean_type = ?)
Granted XID : {1971338, U}
Lock : ROW, WF_JOBS, (952,43)
Waiting XID : {1971338, S} , SA, UPDATE WF_JOBS SET app_name = ?, app_path =
?, conf = ?, group_name = ?, run = ?, user_name = ?, auth_token = ?,
created_time = ?, end_time = NULL, external_id = NULL, last_modified_time = ?,
log_token = ?, proto_action_conf = ?, sla_xml = ?, start_time = ?, status = ?,
wf_instance = ? WHERE id IN (SELECT DISTINCT t0.id FROM WF_JOBS t0 WHERE (t0.id
= ?) AND t0.bean_type = ?)
Granted XID : {1971343, X}
. The selected victim is XID : 1971343. {prepstmnt 1601033644 UPDATE WF_JOBS
SET app_name = ?, app_path = ?, conf = ?, group_name = ?, run = ?, user_name =
?, auth_token = ?, created_time = ?, end_time = NULL, external_id = NULL,
last_modified_time = ?, log_token = ?, proto_action_conf = ?, sla_xml = ?,
start_time = ?, status = ?, wf_instance = ? WHERE id IN (SELECT DISTINCT t0.id
FROM WF_JOBS t0 WHERE (t0.id = ?) AND t0.bean_type = ?) [params=(String)
comps_primary_workflow, (String)
hdfs://analyticsnn1.sv2.trulia.com:8020/oozie/workflow/coordinators/...,
(String) <configuration>
<property>
<name>downloadedForSaleDir</name>..., (String) users, (int) 0, (String)
oozie, (String) ?, (Timestamp) 2012-02-15 01:02:21.683, (Timestamp) 2012-02-15
01:02:21.705, (String) , (String) <?xml version="1.0" encoding="UTF-8"
standalone="no"?><configuration..., (String) , (Timestamp) 2012-02-15
01:02:21.7, (String) RUNNING, (byte[]) [B@7b46f3aa, (String)
0001668-111221102141548-oozie-oozi-W, (String) WorkflowJobBean]} [code=30000,
state=40001]
at
org.apache.openjpa.lib.jdbc.LoggingConnectionDecorator.wrap(LoggingConnectionDecorator.java:192)
at
org.apache.openjpa.lib.jdbc.LoggingConnectionDecorator.access$700(LoggingConnectionDecorator.java:57)
at
org.apache.openjpa.lib.jdbc.LoggingConnectionDecorator$LoggingConnection$LoggingPreparedStatement.executeUpdate(LoggingConnectionDecorator.java:866)
at
org.apache.openjpa.lib.jdbc.DelegatingPreparedStatement.executeUpdate(DelegatingPreparedStatement.java:269)
at
org.apache.openjpa.jdbc.kernel.JDBCStoreManager$CancelPreparedStatement.executeUpdate(JDBCStoreManager.java:1586)
at
org.apache.openjpa.jdbc.kernel.JDBCStoreQuery.executeUpdate(JDBCStoreQuery.java:671)
at
org.apache.openjpa.jdbc.kernel.JDBCStoreQuery.executeBulkOperation(JDBCStoreQuery.java:513)
... 25 more
________________________________
From: Mohammad Islam <[email protected]>
To: "[email protected]" <[email protected]>
Sent: Friday, January 20, 2012 12:13 AM
Subject: Re: Some lock issue is occuring.
Did you recently upgrade oozie?
Can you please submit a new job and try to kill it
In which log file did you get it? looks like it is not in oozie.log. May be
catalina.out ot localhsot*
Regards,
Mohammad
----- Original Message -----
From: Harshal <[email protected]>
To: [email protected]
Cc:
Sent: Thursday, January 19, 2012 11:40 PM
Subject: Re: Some lock issue is occuring.
Also,
Error: E0607 : E0607: Other error in operation
[getActionsForCoordinatorJob], An optimistic lock violation was detected
when flushing object instance "A lock could not be obtained within the
time requested [java.lang.String]" to the data store. This indicates
that the object was concurrently modified in another transaction.
On Fri, 2012-01-20 at 12:53 +0530, Harshal wrote:
> What do you mean by this? I am not able to kill any of my jobs.
>
> Error: E0607 : E0607: Other error in operation
> [updateCoordinatorAction], Unable to obtain an object lock on "A lock
> could not be obtained due to a deadlock, cycle of locks and waiters is:
> Lock : ROW, COORD_ACTIONS, (336,8) Waiting XID : {312103, U} , SA,
> UPDATE COORD_ACTIONS SET action_number = ?, action_xml = ?, console_url
> = NULL, created_conf = ?, error_code = NULL, error_message = NULL,
> external_status = NULL, missing_dependencies = ?, run_conf = ?, time_out
> = ?, tracker_uri = NULL, job_type = NULL, created_time = ?, external_id
> = NULL, job_id = ?, last_modified_time = ?, nominal_time = ?, sla_xml =
> NULL, status = ? WHERE id IN (SELECT DISTINCT t0.id FROM COORD_ACTIONS
> t0 WHERE (t0.id = ?) AND t0.bean_type = ?) Granted XID : {312125, X}
> Lock : ROW, COORD_ACTIONS, (67,8) Waiting XID : {312125, S} , SA,
> UPDATE COORD_ACTIONS SET action_number = ?, action_xml = ?, console_url
> = NULL, created_conf = ?, error_code = NULL, error_message = NULL,
> external_status = NULL, missing_dependencies = ?, run_conf = ?, time_out
> = ?, tracker_uri = NULL, job_type = NULL, created_time = ?, external_id
> = NULL, job_id = ?, last_modified_time = ?, nominal_time = ?, sla_xml =
> NULL, status = ? WHERE id IN (SELECT DISTINCT t0.id FROM COORD_ACTIONS
> t0 WHERE (t0.id = ?) AND t0.bean_type = ?) Granted XID : {312103,
> X} . The selected victim is XID : 312103. {prepstmnt 34907896 UPDATE
> COORD_ACTIONS SET action_number = ?, action_xml = ?, console_url = NULL,
> created_conf = ?, error_code = NULL, error_message = NULL,
> external_status = NULL, missing_dependencies = ?, run_conf = ?, time_out
> = ?, tracker_uri = NULL, job_type = NULL, created_time = ?, external_id
> = NULL, job_id = ?, last_modified_time = ?, nominal_time = ?, sla_xml =
> NULL, status = ? WHERE id IN (SELECT DISTINCT t0.id FROM COORD_ACTIONS
> t0 WHERE (t0.id = ?) AND t0.bean_type = ?) [params=(int) 68, (String)
> <coordinator-app xmlns="uri:oozie:coordinator:0.1" name="plp-coord" ...,
> (String) <configuration> <property>
> <name>oozie.coord.application.pat..., (String) , (String)
> <configuration> <property> <name>oozie.coord.application.pat...,
> (int) -1, (Timestamp) 2012-01-18 00:04:27.148, (String)
> 0000003-120112014121356-oozie-oozi-C, (Timestamp) 2012-01-20 12:49:01.8,
> (Timestamp) 2012-01-17 15:00:00.0, (String) KILLED, (String)
> 0000003-120112014121356-oozie-oozi-C@68, (String)
> CoordinatorActionBean]} [code=30000, state=40001] [java.lang.String]".
>
>