Hi,
I use oozie 4.0.0 in HA mode. The problem is there is one time that we
restart mysql, it just takes about 30 seconds, then I found both oozie
server log has exception like:
2014-11-12 06:39:34,603 WARN JPAService:542 - USER[-] GROUP[-] TOKEN[-]
APP[-] JOB[-] ACTION[-] JPAExecutor [WorkflowActionsGetPendingJPAExecutor]
ended with an active transaction, rolling back
2014-11-12 06:39:34,604 WARN RecoveryService$RecoveryRunnable:542 -
USER[-] GROUP[-] TOKEN[-] APP[-] JOB[-] ACTION[-] Exception while reading
pending actions from storage
org.apache.oozie.executor.jpa.JPAExecutorException: E0603: SQL error in
operation, <openjpa-2.2.2-r422266:1468616 fatal general error>
org.apache.openjpa.persistence.PersistenceException: Communications link
failure
The last packet successfully received from the server was 14,288
milliseconds ago. The last packet sent successfully to the server was 0
milliseconds ago.
at org.apache.oozie.service.JPAService.execute(JPAService.java:218)
at
org.apache.oozie.service.RecoveryService$RecoveryRunnable.runWFRecovery(RecoveryService.java:326)
at
org.apache.oozie.service.RecoveryService$RecoveryRunnable.run(RecoveryService.java:134)
at
org.apache.oozie.service.SchedulerService$2.run(SchedulerService.java:174)
at
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
at
java.util.concurrent.FutureTask$Sync.innerRunAndReset(FutureTask.java:317)
at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:150)
at
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$101(ScheduledThreadPoolExecutor.java:98)
at
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.runPeriodic(ScheduledThreadPoolExecutor.java:180)
at
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:204)
at
java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:662)
Caused by: <openjpa-2.2.2-r422266:1468616 fatal general error>
org.apache.openjpa.persistence.PersistenceException: Communications link
failure
The last packet successfully received from the server was 14,288
milliseconds ago. The last packet sent successfully to the server was 0
milliseconds ago.
at
org.apache.openjpa.jdbc.sql.DBDictionary.narrow(DBDictionary.java:4962)
at
org.apache.openjpa.jdbc.sql.DBDictionary.newStoreException(DBDictionary.java:4922)
at
org.apache.openjpa.jdbc.sql.SQLExceptions.getStore(SQLExceptions.java:136)
at
org.apache.openjpa.jdbc.sql.SQLExceptions.getStore(SQLExceptions.java:110)
at
org.apache.openjpa.jdbc.sql.SQLExceptions.getStore(SQLExceptions.java:62)
at
org.apache.openjpa.jdbc.kernel.JDBCStoreManager.connect(JDBCStoreManager.java:968)
at
org.apache.openjpa.jdbc.kernel.JDBCStoreManager.getConnection(JDBCStoreManager.java:242)
at
org.apache.openjpa.jdbc.kernel.PreparedSQLStoreQuery$PreparedSQLExecutor.executeQuery(PreparedSQLStoreQuery.java:104)
at org.apache.openjpa.kernel.QueryImpl.execute(QueryImpl.java:1005)
at org.apache.openjpa.kernel.QueryImpl.execute(QueryImpl.java:863)
at org.apache.openjpa.kernel.QueryImpl.execute(QueryImpl.java:794)
at
org.apache.openjpa.kernel.DelegatingQuery.execute(DelegatingQuery.java:542)
at
org.apache.openjpa.persistence.QueryImpl.execute(QueryImpl.java:286)
at
org.apache.openjpa.persistence.QueryImpl.getResultList(QueryImpl.java:302)
at
org.apache.oozie.executor.jpa.WorkflowActionsGetPendingJPAExecutor.execute(WorkflowActionsGetPendingJPAExecutor.java:60)
at
org.apache.oozie.executor.jpa.WorkflowActionsGetPendingJPAExecutor.execute(WorkflowActionsGetPendingJPAExecutor.java:30)
at org.apache.oozie.service.JPAService.execute(JPAService.java:207)
... 12 more
Caused by: com.mysql.jdbc.exceptions.jdbc4.CommunicationsException:
Communications link failure
The last packet successfully received from the server was 14,288
milliseconds ago. The last packet sent successfully to the server was 0
milliseconds ago.
at sun.reflect.GeneratedConstructorAccessor87.newInstance(Unknown
Source)
at
sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
at com.mysql.jdbc.Util.handleNewInstance(Util.java:409)
at
com.mysql.jdbc.SQLError.createCommunicationsException(SQLError.java:1122)
at com.mysql.jdbc.MysqlIO.reuseAndReadPacket(MysqlIO.java:3056)
at com.mysql.jdbc.MysqlIO.reuseAndReadPacket(MysqlIO.java:2942)
at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:3485)
at com.mysql.jdbc.MysqlIO.sendCommand(MysqlIO.java:1960)
at com.mysql.jdbc.MysqlIO.sqlQueryDirect(MysqlIO.java:2114)
at com.mysql.jdbc.ConnectionImpl.execSQL(ConnectionImpl.java:2690)
at
com.mysql.jdbc.ConnectionImpl.setAutoCommit(ConnectionImpl.java:5093)
at
org.apache.commons.dbcp.DelegatingConnection.setAutoCommit(DelegatingConnection.java:371)
at
org.apache.commons.dbcp.PoolingDataSource$PoolGuardConnectionWrapper.setAutoCommit(PoolingDataSource.java:328)
at
org.apache.openjpa.lib.jdbc.DelegatingConnection.setAutoCommit(DelegatingConnection.java:167)
at
org.apache.openjpa.lib.jdbc.DelegatingConnection.setAutoCommit(DelegatingConnection.java:167)
at
org.apache.openjpa.lib.jdbc.ConfiguringConnectionDecorator$ConfiguringConnection.setAutoCommit(ConfiguringConnectionDecorator.java:117)
at
org.apache.openjpa.lib.jdbc.ConfiguringConnectionDecorator$ConfiguringConnection.<init>(ConfiguringConnectionDecorator.java:111)
at
org.apache.openjpa.lib.jdbc.ConfiguringConnectionDecorator.decorate(ConfiguringConnectionDecorator.java:93)
at
org.apache.openjpa.lib.jdbc.DecoratingDataSource.decorate(DecoratingDataSource.java:99)
at
org.apache.openjpa.lib.jdbc.DecoratingDataSource.getConnection(DecoratingDataSource.java:88)
at
org.apache.openjpa.jdbc.kernel.JDBCStoreManager.connectInternal(JDBCStoreManager.java:979)
at
org.apache.openjpa.jdbc.kernel.JDBCStoreManager.connect(JDBCStoreManager.java:964)
... 23 more
Caused by: java.io.EOFException: Can not read response from server.
Expected to read 4 bytes, read 0 bytes before connection was unexpectedly
lost.
at com.mysql.jdbc.MysqlIO.readFully(MysqlIO.java:2503)
at com.mysql.jdbc.MysqlIO.reuseAndReadPacket(MysqlIO.java:2953)
... 40 more
2014-11-12 06:39:41,294 INFO
StatusTransitService$StatusTransitRunnable:539 - USER[-] GROUP[-] Acquired
lock for [org.apache.oozie.service.StatusTransitService]
2014-11-12 06:39:41,295 INFO
StatusTransitService$StatusTransitRunnable:539 - USER[-] GROUP[-] Running
coordinator status service from last instance time = 2014-11-12T06:38Z
2014-11-12 06:39:50,756 INFO SLACalculatorMemory:539 - USER[-] GROUP[-]
TOKEN[-] APP[-] JOB[-] ACTION[-] Running periodic SLA check
2014-11-12 06:39:56,744 INFO PauseTransitService:539 - USER[-] GROUP[-]
Acquired lock for [org.apache.oozie.service.PauseTransitService]
2014-11-12 06:40:20,757 INFO SLACalculatorMemory:539 - USER[-] GROUP[-]
TOKEN[-] APP[-] JOB[-] ACTION[-] Running periodic SLA check
2014-11-12 06:40:50,758 INFO SLACalculatorMemory:539 - USER[-] GROUP[-]
TOKEN[-] APP[-] JOB[-] ACTION[-] Running periodic SLA check
2014-11-12 06:41:20,759 INFO SLACalculatorMemory:539 - USER[-] GROUP[-]
TOKEN[-] APP[-] JOB[-] ACTION[-] Running periodic SLA check
2014-11-12 06:41:50,759 INFO SLACalculatorMemory:539 - USER[-] GROUP[-]
TOKEN[-] APP[-] JOB[-] ACTION[-] Running periodic SLA check
2014-11-12 06:42:20,760 INFO SLACalculatorMemory:539 - USER[-] GROUP[-]
TOKEN[-] APP[-] JOB[-] ACTION[-] Running periodic SLA check
2014-11-12 06:42:50,761 INFO SLACalculatorMemory:539 - USER[-] GROUP[-]
TOKEN[-] APP[-] JOB[-] ACTION[-] Running periodic SLA check
2014-11-12 06:43:20,762 INFO SLACalculatorMemory:539 - USER[-] GROUP[-]
TOKEN[-] APP[-] JOB[-] ACTION[-] Running periodic SLA check
2014-11-12 06:43:50,762 INFO SLACalculatorMemory:539 - USER[-] GROUP[-]
TOKEN[-] APP[-] JOB[-] ACTION[-] Running periodic SLA check
2014-11-12 06:44:20,763 INFO SLACalculatorMemory:539 - USER[-] GROUP[-]
TOKEN[-] APP[-] JOB[-] ACTION[-] Running periodic SLA check
2014-11-12 06:44:50,764 INFO SLACalculatorMemory:539 - USER[-] GROUP[-]
TOKEN[-] APP[-] JOB[-] ACTION[-] Running periodic SLA check
2014-11-12 06:45:20,765 INFO SLACalculatorMemory:539 - USER[-] GROUP[-]
TOKEN[-] APP[-] JOB[-] ACTION[-] Running periodic SLA check
2014-11-12 06:45:50,766 INFO SLACalculatorMemory:539 - USER[-] GROUP[-]
TOKEN[-] APP[-] JOB[-] ACTION[-] Running periodic SLA check
2014-11-12 06:46:20,766 INFO SLACalculatorMemory:539 - USER[-] GROUP[-]
TOKEN[-] APP[-] JOB[-] ACTION[-] Running periodic SLA check
2014-11-12 06:46:50,767 INFO SLACalculatorMemory:539 - USER[-] GROUP[-]
TOKEN[-] APP[-] JOB[-] ACTION[-] Running periodic SLA check
2014-11-12 06:47:20,768 INFO SLACalculatorMemory:539 - USER[-] GROUP[-]
TOKEN[-] APP[-] JOB[-] ACTION[-] Running periodic SLA check
2014-11-12 06:47:50,769 INFO SLACalculatorMemory:539 - USER[-] GROUP[-]
TOKEN[-] APP[-] JOB[-] ACTION[-] Running periodic SLA check
2014-11-12 06:48:20,769 INFO SLACalculatorMemory:539 - USER[-] GROUP[-]
TOKEN[-] APP[-] JOB[-] ACTION[-] Running periodic SLA check
2014-11-12 06:48:50,770 INFO SLACalculatorMemory:539 - USER[-] GROUP[-]
TOKEN[-] APP[-] JOB[-] ACTION[-] Running periodic SLA check
2014-11-12 06:49:20,771 INFO SLACalculatorMemory:539 - USER[-] GROUP[-]
TOKEN[-] APP[-] JOB[-] ACTION[-] Running periodic SLA check
2014-11-12 06:49:50,772 INFO SLACalculatorMemory:539 - USER[-] GROUP[-]
TOKEN[-] APP[-] JOB[-] ACTION[-] Running periodic SLA check
2014-11-12 06:50:20,772 INFO SLACalculatorMemory:539 - USER[-] GROUP[-]
TOKEN[-] APP[-] JOB[-] ACTION[-] Running periodic SLA check
2014-11-12 06:50:21,455 WARN ParameterVerifier:542 - USER[hdfs] GROUP[-]
TOKEN[-] APP[-] JOB[-] ACTION[-] The application does not define formal
parameters in its XML definition
2014-11-12 06:50:21,557 WARN LiteWorkflowAppService:542 - USER[hdfs]
GROUP[-] TOKEN[-] APP[-] JOB[-] ACTION[-] libpath
[hdfs://cluster1/user/nagios/oozie/wfDistinct/lib] does not exist
2014-11-12 06:50:50,773 INFO SLACalculatorMemory:539 - USER[-] GROUP[-]
TOKEN[-] APP[-] JOB[-] ACTION[-] Running periodic SLA check
2014-11-12 06:51:20,775 INFO SLACalculatorMemory:539 - USER[-] GROUP[-]
TOKEN[-] APP[-] JOB[-] ACTION[-] Running periodic SLA check
2014-11-12 06:51:50,776 INFO SLACalculatorMemory:539 - USER[-] GROUP[-]
TOKEN[-] APP[-] JOB[-] ACTION[-] Running periodic SLA check
2014-11-12 06:52:20,776 INFO SLACalculatorMemory:539 - USER[-] GROUP[-]
TOKEN[-] APP[-] JOB[-] ACTION[-] Running periodic SLA check
2014-11-12 06:52:50,777 INFO SLACalculatorMemory:539 - USER[-] GROUP[-]
TOKEN[-] APP[-] JOB[-] ACTION[-] Running periodic SLA check
2014-11-12 06:53:20,778 INFO SLACalculatorMemory:539 - USER[-] GROUP[-]
TOKEN[-] APP[-] JOB[-] ACTION[-] Running periodic SLA check
2014-11-12 06:53:50,778 INFO SLACalculatorMemory:539 - USER[-] GROUP[-]
TOKEN[-] APP[-] JOB[-] ACTION[-] Running periodic SLA check
2014-11-12 06:54:20,779 INFO SLACalculatorMemory:539 - USER[-] GROUP[-]
TOKEN[-] APP[-] JOB[-] ACTION[-] Running periodic SLA check
2014-11-12 06:54:50,780 INFO SLACalculatorMemory:539 - USER[-] GROUP[-]
TOKEN[-] APP[-] JOB[-] ACTION[-] Running periodic SLA check
2014-11-12 06:55:20,780 INFO SLACalculatorMemory:539 - USER[-] GROUP[-]
TOKEN[-] APP[-] JOB[-] ACTION[-] Running periodic SLA check
2014-11-12 06:55:50,781 INFO SLACalculatorMemory:539 - USER[-] GROUP[-]
TOKEN[-] APP[-] JOB[-] ACTION[-] Running periodic SLA check
2014-11-12 06:56:20,782 INFO SLACalculatorMemory:539 - USER[-] GROUP[-]
TOKEN[-] APP[-] JOB[-] ACTION[-] Running periodic SLA check
2014-11-12 06:56:50,782 INFO SLACalculatorMemory:539 - USER[-] GROUP[-]
TOKEN[-] APP[-] JOB[-] ACTION[-] Running periodic SLA check
2014-11-12 06:57:20,783 INFO SLACalculatorMemory:539 - USER[-] GROUP[-]
TOKEN[-] APP[-] JOB[-] ACTION[-] Running periodic SLA check
2014-11-12 06:57:50,784 INFO SLACalculatorMemory:539 - USER[-] GROUP[-]
TOKEN[-] APP[-] JOB[-] ACTION[-] Running periodic SLA check
2014-11-12 06:58:20,785 INFO SLACalculatorMemory:539 - USER[-] GROUP[-]
TOKEN[-] APP[-] JOB[-] ACTION[-] Running periodic SLA check
2014-11-12 06:58:50,786 INFO SLACalculatorMemory:539 - USER[-] GROUP[-]
TOKEN[-] APP[-] JOB[-] ACTION[-] Running periodic SLA check
2014-11-12 06:59:20,787 INFO SLACalculatorMemory:539 - USER[-] GROUP[-]
TOKEN[-] APP[-] JOB[-] ACTION[-] Running periodic SLA check
2014-11-12 06:59:50,788 INFO SLACalculatorMemory:539 - USER[-] GROUP[-]
TOKEN[-] APP[-] JOB[-] ACTION[-] Running periodic SLA check
Seems like the log just continue to print "INFO SLACalculatorMemory:539 -
USER[-] GROUP[-] TOKEN[-] APP[-] JOB[-] ACTION[-] Running periodic SLA
check" and nothing more.
At this time both oozie servers won't accept any job submission and web
console has no response too. We restart oozie server then it's back to
normal.
Any idea of what may happen? Any suggestion is welcome.
thanks