I believe the latest stable update of Galaxy included changes to drmaa.py
that allow a job to be rechecked indefinitely after scheduler
communication errors, so perhaps your "cluster could not complete job"
errors are due to a filesystem race condition, whereby the cluster node
completes
Hello everybody :)
Today, I have a question related to timeout management in Galaxy.
More specifically, I'm looking for a way to set (in a configuration
file, if possible) all the timeouts related to DRMAA and to
communication between Galaxy and SGE.
My goal is to increase c