On Fri, Mar 26, 2010 at 09:09, Stian Soiland-Reyes
<[email protected]> wrote:

> I would have to look at the code, this is a while ago for me!

The method which removes completed jobs from DefaultQueueMonitor:

        private void removeCompletedJobs() {
                Queue queue = daoFactory.getQueueDAO().defaultQueue();
                List<Job> completedJobs = findQueuedJobsByStatus(Status.COMPLETE);
                completedJobs.addAll(findQueuedJobsByStatus(Status.CANCELLED));
                completedJobs.addAll(findQueuedJobsByStatus(Status.FAILED));
                for (Job job : completedJobs) {
                        QueueEntry entry = queue.removeJob(job);
                        daoFactory.getQueueEntryDAO().delete(entry);
                        daoFactory.getJobDAO().update(job);
                }
                daoFactory.getQueueDAO().update(queue);
                daoFactory.commit();
        }

And Queue.removeJob():

        public QueueEntry removeJob(Job job) {
                QueueEntry removeEntry = job.getQueueEntry();
                if (removeEntry == null) {
                        throw new IllegalArgumentException("Unknown job " + job);
                }
                job.setQueueEntry(null);
                entries.remove(removeEntry);
                setLastModified();
                return removeEntry;
        }


This means that if the queue entry is removed through this method, the
job should also have job.queueEntry set to null.
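
To make that invariant concrete, here is a minimal, self-contained
sketch. Job, QueueEntry and Queue below are simplified stand-ins I made
up for illustration (no persistence, no setLastModified()), and addJob()
is a hypothetical helper; only removeJob() follows the method quoted
above.

```java
import java.util.ArrayList;
import java.util.List;

public class RemoveJobSketch {
    // Simplified stand-ins, not the real persisted classes.
    static class QueueEntry {
        final Job job;
        QueueEntry(Job job) { this.job = job; }
    }

    static class Job {
        private QueueEntry queueEntry;
        QueueEntry getQueueEntry() { return queueEntry; }
        void setQueueEntry(QueueEntry entry) { this.queueEntry = entry; }
    }

    static class Queue {
        final List<QueueEntry> entries = new ArrayList<>();

        // Hypothetical helper: the job's entry is set before the entry
        // becomes visible in the list of queue entries.
        QueueEntry addJob(Job job) {
            QueueEntry entry = new QueueEntry(job);
            job.setQueueEntry(entry);
            entries.add(entry);
            return entry;
        }

        // Same logic as the quoted Queue.removeJob(), minus setLastModified().
        QueueEntry removeJob(Job job) {
            QueueEntry removeEntry = job.getQueueEntry();
            if (removeEntry == null) {
                throw new IllegalArgumentException("Unknown job " + job);
            }
            job.setQueueEntry(null);
            entries.remove(removeEntry);
            return removeEntry;
        }
    }

    public static void main(String[] args) {
        Queue queue = new Queue();
        Job job = new Job();
        queue.addJob(job);
        queue.removeJob(job);

        // After removal the job no longer references any queue entry.
        System.out.println("entry cleared: " + (job.getQueueEntry() == null));
        System.out.println("queue empty: " + queue.entries.isEmpty());

        // Removing the same job a second time is rejected.
        try {
            queue.removeJob(job);
        } catch (IllegalArgumentException e) {
            System.out.println("second removal rejected");
        }
    }
}
```

So in memory, at least, a job that went through removeJob() cannot still
point at its old entry; a second removal attempt fails instead.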

I can see that if you use a database without transactional support,
like MySQL MyISAM, you could have two different connections running
into trouble between the two lines:

daoFactory.getQueueEntryDAO().delete(entry);
daoFactory.getJobDAO().update(job);

(Note, however, that only one thread does this database cleanup, but
there could be a second thread from the worker requesting which jobs
are on the queue.)
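
A rough sketch of the window I mean, using two in-memory maps as
hypothetical stand-ins for the queue entry and job tables. This models
only the order of the two writes when each one is visible immediately
(no transaction); it is not the real DAO layer or any JDBC code, and the
IDs are made up.

```java
import java.util.HashMap;
import java.util.Map;

public class NonTransactionalWindowSketch {
    // Hypothetical "tables": every write is visible to readers at once,
    // as it would be on a storage engine without transactions.
    static final Map<Integer, String> queueEntryTable = new HashMap<>();
    // job id -> queue entry id (null means the job is not queued)
    static final Map<Integer, Integer> jobTable = new HashMap<>();

    // What a second connection could observe: the job row still points
    // at a queue entry row that no longer exists.
    static boolean hasDanglingEntry(int jobId) {
        Integer entryId = jobTable.get(jobId);
        return entryId != null && !queueEntryTable.containsKey(entryId);
    }

    public static void main(String[] args) {
        queueEntryTable.put(10, "entry for job 1");
        jobTable.put(1, 10);

        // First write, corresponding to getQueueEntryDAO().delete(entry)
        queueEntryTable.remove(10);
        System.out.println("between the writes: dangling=" + hasDanglingEntry(1));

        // Second write, corresponding to getJobDAO().update(job)
        jobTable.put(1, null);
        System.out.println("after both writes: dangling=" + hasDanglingEntry(1));
    }
}
```

A reader that looks at the job between the two writes sees a reference
to a deleted entry; with a transactional engine both writes would become
visible together at commit.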


But if you did redo the tests with InnoDB (which is transactional) and
a blank database, then something else must be wrong.

The ConsoleReaderThread should only be reading the job it has been
asked to execute, which should not be ready to be deleted yet.

I also checked the creation of the jobs, where the job's queue entry
is set *before* it is added to the list of queue entries, so it should
not be visible to the DefaultQueueMonitor before that has completed
either.


Sounds like a little mystery to me, but obviously there are some race
conditions involved here.

So you submit 6 jobs at once, all from the same Taverna client? Are
these workflows the type that take a short or a long time to execute?

-- 
Stian Soiland-Reyes, myGrid team
School of Computer Science
The University of Manchester

_______________________________________________
taverna-hackers mailing list
[email protected]
Web site: http://www.taverna.org.uk
Mailing lists: http://www.taverna.org.uk/taverna-mailing-lists/
Developers Guide: http://www.mygrid.org.uk/tools/developer-information
