Hi Team, I apologize for the lengthy message.
I am having a problem with my Taverna system and haven't been able to get to the bottom of what is causing it. Curiously, I've been using this system for 12-16 months without problems. It's been great. Recently, however, our MySQL DB was upgraded, forcing a change in the required java_connector version. I've also changed from TV 1.7.1 to TV 1.7.2. Seemingly tiny changes.

The system in question consists of client machines running TV 1.7.2, a server machine with TV 1.7.2 + Remote Execution service, and a MySQL DB instead of Derby. I currently use a slightly modified version of the remote execution service (0.5+); the modifications are DB configuration changes plus changes to the sizes of some of the columns.

The problem is that the DB gets into an inconsistent state in which, for a particular job, its row in QueueEntry has been removed but its queueEntry_id in Job is not null. We determined this by checking the SQL logs of the MySQL DB in question. In fact, we've observed (not 100% reproducibly) that the QueueEntry row occasionally gets deleted before queueEntry_id is set to null. This causes failures in the remote execution service. To get the web service back up and accessing the DB, one simply needs to manually set the "orphaned" queueEntry_id value to null.

What I don't quite understand is why this is happening. (Some OS upgrade was also performed, but it's hard for me to get specific details about what was changed.)

Example run scenario: Six jobs were launched to Taverna on the client machine. One queue was enabled in the execution service (remote server), so jobs run one at a time. The first two completed fine. The third completed (ascertained after the fact), but a problem occurred in the remote execution service. No further jobs would run.

1) The exception found in catalina.out looks (in part) like this.
> Exception in thread "Console reader for Job 1dbde4cf-a7bc-46cd-b2f6-957aefa2a119"
> javax.persistence.EntityNotFoundException: Unable to find net.sf.taverna.service.datastore.bean.QueueEntry with id 3
>     at org.hibernate.ejb.Ejb3Configuration$Ejb3EntityNotFoundDelegate.handleEntityNotFound(Ejb3Configuration.java:107)
>     at org.hibernate.event.def.DefaultLoadEventListener.load(DefaultLoadEventListener.java:145)
>     at org.hibernate.event.def.DefaultLoadEventListener.proxyOrLoad(DefaultLoadEventListener.java:195)
>     at org.hibernate.event.def.DefaultLoadEventListener.onLoad(DefaultLoadEventListener.java:103)
>     at org.hibernate.impl.SessionImpl.fireLoad(SessionImpl.java:878)
>     at org.hibernate.impl.SessionImpl.internalLoad(SessionImpl.java:846)
>     at org.hibernate.type.EntityType.resolveIdentifier(EntityType.java:557)
>     at org.hibernate.type.EntityType.resolve(EntityType.java:379)
>     at org.hibernate.engine.TwoPhaseLoad.initializeEntity(TwoPhaseLoad.java:116)
>     at org.hibernate.loader.Loader.initializeEntitiesAndCollections(Loader.java:842)
>     at org.hibernate.loader.Loader.doQuery(Loader.java:717)
>     at org.hibernate.loader.Loader.doQueryAndInitializeNonLazyCollections(Loader.java:224)
>     at org.hibernate.loader.Loader.loadEntity(Loader.java:1851)
>     at org.hibernate.loader.entity.AbstractEntityLoader.load(AbstractEntityLoader.java:48)
>     at org.hibernate.loader.entity.AbstractEntityLoader.load(AbstractEntityLoader.java:42)
>     (snip)
>     at net.sf.taverna.service.datastore.dao.jpa.GenericDaoImpl.reread(GenericDaoImpl.java:70)
>     at net.sf.taverna.service.backend.executor.ConsoleReaderThread.run(ProcessJobExecutor.java:281)
> Exception in thread "Queue Monitor Thread"
> javax.persistence.EntityNotFoundException: Unable to find net.sf.taverna.service.datastore.bean.QueueEntry with id 3
>     at org.hibernate.ejb.Ejb3Configuration$Ejb3EntityNotFoundDelegate.handleEntityNotFound(Ejb3Configuration.java:107)
>     at org.hibernate.event.def.DefaultLoadEventListener.load(DefaultLoadEventListener.java:14

2) Now, the DB looks like this. Note how queueEntry_id=3 exists, but the corresponding entry in QueueEntry has been deleted. I think this is the problem: I believe the delete QueueEntry step is supposed to be last in the series of Job table updates for a COMPLETED job cleanup.

Job table:
> +---------------+
> | queueEntry_id |
> +---------------+
> |          NULL |
> |          NULL |
> |             3 |
> |             4 |
> |             5 |
> |             6 |
> +---------------+
> 6 rows in set (0.00 sec)

QueueEntry table:
> +----+--------------------------------------+
> | id | queue_id                             |
> +----+--------------------------------------+
> |  4 | b042a005-50f2-4398-93fa-1791a969d676 |
> |  5 | b042a005-50f2-4398-93fa-1791a969d676 |
> |  6 | b042a005-50f2-4398-93fa-1791a969d676 |
> +----+--------------------------------------+
> 3 rows in set (0.00 sec)

3) Persistence.xml is configured as follows. I've attempted to disable pooling, but changing those values had little effect on the failure.
> <persistence-unit name="tavernaService">
>   <provider>org.hibernate.ejb.HibernatePersistence</provider>
>   <class>net.sf.taverna.service.datastore.bean.Job</class>
>   <class>net.sf.taverna.service.datastore.bean.Workflow</class>
>   <class>net.sf.taverna.service.datastore.bean.DataDoc</class>
>   <class>net.sf.taverna.service.datastore.bean.User</class>
>   <class>net.sf.taverna.service.datastore.bean.Worker</class>
>   <class>net.sf.taverna.service.datastore.bean.Queue</class>
>   <class>net.sf.taverna.service.datastore.bean.QueueEntry</class>
>   <class>net.sf.taverna.service.datastore.bean.Configuration</class>
>   <properties>
>     <property name="hibernate.archive.autodetection" value="class, hbm" />
>     <property name="hibernate.show_sql" value="false" />
>     <property name="hibernate.format_sql" value="true" />
>     <property name="hibernate.connection.driver_class" value="com.mysql.jdbc.Driver" />
>     <property name="hibernate.connection.url"
>       value="jdbc:mysql://db.edc.org:3306/name?user=user&password=password&create=true"/>
>     <property name="hibernate.c3p0.min_size" value="5" />
>     <property name="hibernate.c3p0.max_size" value="0" />
>     <property name="hibernate.c3p0.timeout" value="30" />
>     <property name="hibernate.c3p0.max_statements" value="0" />
>     <property name="hibernate.c3p0.idle_test_period" value="300" />
>     <property name="hibernate.dialect" value="org.hibernate.dialect.DerbyDialect" />
>     <property name="hibernate.dialect" value="org.hibernate.dialect.MySQLDialect" />
>     <property name="hibernate.hbm2ddl.auto" value="update" />
>   </properties>
> </persistence-unit>

4) The Taverna running on the client machine can no longer interact with the remote execution service on the server machine and throws this exception:

> Exception in thread "AWT-EventQueue-0" java.lang.RuntimeException: Could not get document from
> https://compute.system.org:8443/network-0.5.1/v1/users/jtilson/jobs
>     at net.sf.taverna.service.rest.client.RESTContext.loadDocument(RESTContext.java:302)
>     at net.sf.taverna.service.rest.client.AbstractREST.loadDocument(AbstractREST.java:77)
>     at net.sf.taverna.service.rest.client.LinkedREST.getDocument(LinkedREST.java:30)
>     at net.sf.taverna.service.rest.client.JobsREST.getJobs(JobsREST.java:33)
>     at net.sf.taverna.service.rest.client.JobsREST.iterator(JobsREST.java:41)
>     at net.sf.taverna.service.executeremotely.ui.JobsPanel.addJobs(JobsPanel.java:187)
>     at net.sf.taverna.service.executeremotely.ui.JobsPanel.access$500(JobsPanel.java:27)

5) If I manually set queueEntry_id=3 to null, everything works again.

Any thoughts, comments, ideas are welcome. I am thinking this is simply a configuration problem but am not sure.

--jeff

_______________________________________________
taverna-hackers mailing list
Web site: http://www.taverna.org.uk
Mailing lists: http://www.taverna.org.uk/taverna-mailing-lists/
Developers Guide: http://www.mygrid.org.uk/tools/developer-information
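
As a footnote to steps 2 and 5: the orphan check and the manual workaround can be sketched as below. This is only an illustration, not part of the remote execution service. It uses an in-memory SQLite database as a stand-in for the MySQL DB, and the helper names (find_orphans, clear_orphans, demo_db) are made up for this example. Only the table and column names (Job.queueEntry_id, QueueEntry.id, QueueEntry.queue_id) and the sample rows come from the output in step 2; the real Job table has more columns than shown here.

```python
import sqlite3


def find_orphans(conn):
    """Return Job.queueEntry_id values that have no matching QueueEntry row."""
    rows = conn.execute(
        "SELECT j.queueEntry_id FROM Job j "
        "LEFT JOIN QueueEntry q ON q.id = j.queueEntry_id "
        "WHERE j.queueEntry_id IS NOT NULL AND q.id IS NULL "
        "ORDER BY j.queueEntry_id"
    ).fetchall()
    return [row[0] for row in rows]


def clear_orphans(conn):
    """Set orphaned Job.queueEntry_id values to NULL; return the rows changed."""
    cur = conn.execute(
        "UPDATE Job SET queueEntry_id = NULL "
        "WHERE queueEntry_id IS NOT NULL "
        "AND queueEntry_id NOT IN (SELECT id FROM QueueEntry)"
    )
    conn.commit()
    return cur.rowcount


def demo_db():
    """Build a reduced copy of the state shown in step 2 (assumed schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE QueueEntry (id INTEGER PRIMARY KEY, queue_id TEXT)")
    conn.execute("CREATE TABLE Job (queueEntry_id INTEGER)")
    queue = "b042a005-50f2-4398-93fa-1791a969d676"
    conn.executemany(
        "INSERT INTO QueueEntry VALUES (?, ?)",
        [(4, queue), (5, queue), (6, queue)],
    )
    conn.executemany(
        "INSERT INTO Job VALUES (?)",
        [(None,), (None,), (3,), (4,), (5,), (6,)],
    )
    conn.commit()
    return conn


if __name__ == "__main__":
    conn = demo_db()
    print(find_orphans(conn))   # orphaned ids before the fix
    print(clear_orphans(conn))  # number of Job rows reset to NULL
    print(find_orphans(conn))   # orphaned ids after the fix
```

The two SQL statements themselves are plain enough that they should also run against MySQL, but I have only reasoned about them against the reduced schema above, so treat them as a starting point and check them on a copy of the real DB first.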
