Galaxy is failing due to a segfault in libdrmaa:

[9116874.391434] python[5211]: segfault at 0 ip 00007fcb9fd8ae62 sp 00007fcb9affe490 error 4 in libdrmaa.so.1.0[7fcb9fc29000+1b9000]
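The kernel segfault line already carries useful detail. Below is a minimal Python 3 sketch that decodes it. `decode_segfault` is a hypothetical helper, not part of Galaxy, and the regex assumes the x86-64 message format quoted in this report. It computes the faulting instruction's offset inside the library (ip minus the mapping base) and interprets the low bits of the x86 page-fault error code: bit 0 clear means the page was not present, bit 1 clear means a read, and bit 2 set means user mode.

```python
import re

# Matches the x86-64 kernel message format seen in this report (an assumption for other kernels).
SEGFAULT_RE = re.compile(
    r"segfault at (?P<addr>[0-9a-f]+) ip (?P<ip>[0-9a-f]+) "
    r"sp (?P<sp>[0-9a-f]+) error (?P<err>[0-9a-f]+) "
    r"in (?P<lib>[^\[\s]+)\[(?P<base>[0-9a-f]+)\+(?P<size>[0-9a-f]+)\]"
)


def decode_segfault(line):
    """Hypothetical helper: split a kernel segfault line into its parts."""
    m = SEGFAULT_RE.search(line)
    if m is None:
        raise ValueError("not a kernel segfault line")
    ip = int(m.group("ip"), 16)
    err = int(m.group("err"), 16)  # the kernel prints the error code in hex
    base = int(m.group("base"), 16)
    size = int(m.group("size"), 16)
    offset = ip - base  # faulting instruction relative to the library's mapping
    return {
        "library": m.group("lib"),
        "fault_addr": int(m.group("addr"), 16),
        "offset": offset,
        "in_library": 0 <= offset < size,
        "protection_violation": bool(err & 1),  # clear: page not present
        "write": bool(err & 2),                 # clear: read access
        "user_mode": bool(err & 4),             # set: fault raised in user mode
    }


line = ("[9116874.391434] python[5211]: segfault at 0 ip 00007fcb9fd8ae62 "
        "sp 00007fcb9affe490 error 4 in libdrmaa.so.1.0[7fcb9fc29000+1b9000]")
info = decode_segfault(line)
print(hex(info["offset"]))  # 0x161e62
```

For this report, the fault address is 0 and the error code is 4, meaning a user-mode read of a non-present page. The instruction lies inside libdrmaa.so.1.0 at offset 0x161e62. Taken together, this is consistent with libdrmaa itself reading through a NULL pointer. If the installed library has debug symbols, `addr2line -f -e <path to libdrmaa.so.1.0> 0x161e62` should name the function. The library's path is not given in the report, so it stays a placeholder here.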
I first started observing this in the last few weeks. After the first occurrence I pulled in changeset 4a95ae9 (https://bitbucket.org/galaxy/galaxy-central/commits/4a95ae9a26d96f0dc9a0fe3b083a2c7b99b0466b), "Handle invalid job ids in the drmaa runner", but I'm still seeing the segfault.

I think this is related log output from before the patch:

Error - <type 'exceptions.UnboundLocalError'>: local variable 'job' referenced before assignment
URL: http://galaxy.neb.com/datasets/c3d98ec09a23e847/show_params
File '/mnt/ngswork/galaxy/galaxy-dist/eggs/Paste-1.6-py2.6.egg/paste/exceptions/errormiddleware.py', line 143 in __call__
  app_iter = self.application(environ, start_response)
File '/mnt/ngswork/galaxy/galaxy-dist/eggs/Paste-1.6-py2.6.egg/paste/recursive.py', line 80 in __call__
  return self.application(environ, start_response)
File '/mnt/ngswork/galaxy/galaxy-dist/lib/galaxy/web/framework/middleware/remoteuser.py', line 91 in __call__
  return self.app( environ, start_response )
File '/mnt/ngswork/galaxy/galaxy-dist/eggs/Paste-1.6-py2.6.egg/paste/httpexceptions.py', line 632 in __call__
  return self.application(environ, start_response)
File '/mnt/ngswork/galaxy/galaxy-dist/lib/galaxy/web/framework/base.py', line 160 in __call__
  body = method( trans, **kwargs )
File '/mnt/ngswork/galaxy/galaxy-dist/lib/galaxy/webapps/galaxy/controllers/dataset.py', line 1025 in show_params
  return trans.fill_template( "show_params.mako", inherit_chain=inherit_chain, history=trans.get_history(), hda=hda, job=job, tool=tool, params_objects=params_objects )
UnboundLocalError: local variable 'job' referenced before assignment

After applying 4a95ae9 I see this:

galaxy.jobs.handler DEBUG 2012-12-05 10:34:20,968 Stopping job 25519:
galaxy.jobs.handler DEBUG 2012-12-05 10:34:20,971 stopping job 25519 in drmaa runner
galaxy.jobs.runners.drmaa DEBUG 2012-12-05 10:34:20,983 (25519/22378) User killed running job, but it was already dead
172.17.121.186 - - [05/Dec/2012:10:34:19 -0400] "GET /datasets/414fa4e8d28bb2be/delete_async HTTP/1.1" 200 - "http://galaxy.neb.com/history?status=done&show_deleted=False&filename=None&dataset_id=6152b5966ba797a7" "Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 5.1; Trident/4.0; InfoPath.2; .NET CLR 2.0.50727; .NET CLR 3.0.4506.2152; .NET CLR 3.5.30729)"
galaxy.jobs.handler INFO 2012-12-05 10:34:21,073 (25520) Job unable to run: one or more inputs deleted
galaxy.jobs.handler DEBUG 2012-12-05 10:34:22,251 Stopping job 25520:
galaxy.jobs.handler DEBUG 2012-12-05 10:34:22,253 stopping job 25520 in drmaa runner

Any ideas?

Brad
--
Brad Langhorst
langho...@neb.com