Galaxy is failing due to a segfault in libdrmaa:

[9116874.391434] python[5211]: segfault at 0 ip 00007fcb9fd8ae62 sp 00007fcb9affe490 error 4 in libdrmaa.so.1.0[7fcb9fc29000+1b9000]
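
For anyone trying to narrow this down, the kernel line above can be decoded mechanically. The sketch below is only an illustration: the helper name `decode_segfault` and the returned dictionary keys are made up here, and the bit meanings are the standard x86 page-fault error-code bits (the kernel prints the code in hex). It pulls out the faulting address, the instruction offset inside the library, and the kind of access.

```python
import re

# x86 page-fault error-code bits, as reported in the kernel's segfault message.
PROTECTION_FAULT = 0x1  # set: protection violation; clear: page not present
WRITE_ACCESS = 0x2      # set: write access; clear: read access
USER_MODE = 0x4         # set: fault occurred while running in user mode

SEGFAULT_RE = re.compile(
    r"segfault at (?P<addr>[0-9a-f]+) ip (?P<ip>[0-9a-f]+) sp (?P<sp>[0-9a-f]+) "
    r"error (?P<err>[0-9a-f]+) in (?P<lib>\S+)\[(?P<base>[0-9a-f]+)\+(?P<size>[0-9a-f]+)\]"
)


def decode_segfault(line):
    """Parse one kernel segfault line into a small dict (hypothetical helper)."""
    m = SEGFAULT_RE.search(line)
    if m is None:
        raise ValueError("not a kernel segfault line")
    ip = int(m.group("ip"), 16)
    base = int(m.group("base"), 16)
    err = int(m.group("err"), 16)
    return {
        "library": m.group("lib"),
        "fault_address": int(m.group("addr"), 16),
        # Instruction pointer relative to where the library is mapped.
        "offset_in_library": ip - base,
        "access": "write" if err & WRITE_ACCESS else "read",
        "user_mode": bool(err & USER_MODE),
        "page_present": bool(err & PROTECTION_FAULT),
    }


line = ("[9116874.391434] python[5211]: segfault at 0 ip 00007fcb9fd8ae62 sp "
        "00007fcb9affe490 error 4 in libdrmaa.so.1.0[7fcb9fc29000+1b9000]")
info = decode_segfault(line)
print(info["library"], hex(info["offset_in_library"]), info["access"])
```

For this line the result is a user-mode read of address 0 (a NULL pointer dereference) at offset 0x161e62 into libdrmaa.so.1.0. If the library has debug symbols, and assuming the mapping shown starts at the beginning of the file, that offset could be given to `addr2line -e` with the path to libdrmaa.so.1.0 to find the function involved.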

I first noticed this within the last few weeks. After the first occurrence I
pulled in changeset 4a95ae9 ("Handle invalid job ids in the drmaa runner"):

https://bitbucket.org/galaxy/galaxy-central/commits/4a95ae9a26d96f0dc9a0fe3b083a2c7b99b0466b

but I'm still seeing the segfault.

I think the following log output, from before the patch, is related:

Error - <type 'exceptions.UnboundLocalError'>: local variable 'job' referenced before assignment
URL: http://galaxy.neb.com/datasets/c3d98ec09a23e847/show_params
File '/mnt/ngswork/galaxy/galaxy-dist/eggs/Paste-1.6-py2.6.egg/paste/exceptions/errormiddleware.py', line 143 in __call__
  app_iter = self.application(environ, start_response)
File '/mnt/ngswork/galaxy/galaxy-dist/eggs/Paste-1.6-py2.6.egg/paste/recursive.py', line 80 in __call__
  return self.application(environ, start_response)
File '/mnt/ngswork/galaxy/galaxy-dist/lib/galaxy/web/framework/middleware/remoteuser.py', line 91 in __call__
  return self.app( environ, start_response )
File '/mnt/ngswork/galaxy/galaxy-dist/eggs/Paste-1.6-py2.6.egg/paste/httpexceptions.py', line 632 in __call__
  return self.application(environ, start_response)
File '/mnt/ngswork/galaxy/galaxy-dist/lib/galaxy/web/framework/base.py', line 160 in __call__
  body = method( trans, **kwargs )
File '/mnt/ngswork/galaxy/galaxy-dist/lib/galaxy/webapps/galaxy/controllers/dataset.py', line 1025 in show_params
  return trans.fill_template( "show_params.mako", inherit_chain=inherit_chain, history=trans.get_history(), hda=hda, job=job, tool=tool, params_objects=params_objects )
UnboundLocalError: local variable 'job' referenced before assignment
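
As a general illustration of how this kind of UnboundLocalError comes about (this is not the actual code in dataset.py; the function names and the list-of-dicts input are hypothetical), a name that is only bound inside a loop or branch stays unbound when that path is never taken:

```python
def find_job_unsafe(creating_job_associations):
    # 'job' is only bound inside the loop body, so an empty list
    # leaves it unbound and the return statement fails.
    for assoc in creating_job_associations:
        job = assoc["job"]
        break
    return job  # UnboundLocalError when the list is empty


def find_job_safe(creating_job_associations):
    # Binding a default first means the caller gets None instead of a crash.
    job = None
    for assoc in creating_job_associations:
        job = assoc["job"]
        break
    return job


print(find_job_safe([]))                # no associated job found
print(find_job_safe([{"job": 25519}]))  # first associated job
```

A dataset with no creating job (assumed here to be the trigger) would take the empty-list path, which would fit the error appearing only for some datasets.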

After applying 4a95ae9, I see this:


galaxy.jobs.handler DEBUG 2012-12-05 10:34:20,968 Stopping job 25519:
galaxy.jobs.handler DEBUG 2012-12-05 10:34:20,971 stopping job 25519 in drmaa runner
galaxy.jobs.runners.drmaa DEBUG 2012-12-05 10:34:20,983 (25519/22378) User killed running job, but it was already dead
172.17.121.186 - - [05/Dec/2012:10:34:19 -0400] "GET /datasets/414fa4e8d28bb2be/delete_async HTTP/1.1" 200 - "http://galaxy.neb.com/history?status=done&show_deleted=False&filename=None&dataset_id=6152b5966ba797a7" "Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 5.1; Trident/4.0; InfoPath.2; .NET CLR 2.0.50727; .NET CLR 3.0.4506.2152; .NET CLR 3.5.30729)"
galaxy.jobs.handler INFO 2012-12-05 10:34:21,073 (25520) Job unable to run: one or more inputs deleted
galaxy.jobs.handler DEBUG 2012-12-05 10:34:22,251 Stopping job 25520:
galaxy.jobs.handler DEBUG 2012-12-05 10:34:22,253 stopping job 25520 in drmaa runner
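
One guess, based only on the log above: job 25520 was never dispatched ("unable to run"), so it may have no DRM job id when the runner tries to stop it, and handing an empty id to the C library could be what crashes. A minimal sketch of a guard against that, assuming a session object with a `control(job_id, action)` method like the one in the drmaa Python bindings; `stop_job_safely` and its arguments are hypothetical names, not Galaxy's actual code:

```python
import logging

log = logging.getLogger(__name__)


def stop_job_safely(session, external_job_id, terminate_action):
    """Ask the DRM to terminate a job; return True if a request was sent.

    Skips the call entirely when there is no usable DRM job id, so an
    empty id never reaches the underlying library.
    """
    if external_job_id is None or str(external_job_id).strip() in ("", "None"):
        log.debug("Job has no valid DRM id; nothing to stop")
        return False
    try:
        session.control(str(external_job_id), terminate_action)
    except Exception as exc:
        # For example, the job already finished or the id is unknown to the DRM.
        log.debug("Could not stop job %s: %s", external_job_id, exc)
        return False
    return True
```

With the drmaa Python bindings, `terminate_action` would presumably be `drmaa.JobControlAction.TERMINATE`. A Python-level check like this can only help if the crash is caused by a bad id being passed in; it would not help if the fault is inside libdrmaa for some other reason.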


Any ideas?



Brad


--
Brad Langhorst
langho...@neb.com





___________________________________________________________
Please keep all replies on the list by using "reply all"
in your mail client.  To manage your subscriptions to this
and other Galaxy lists, please use the interface at:

  http://lists.bx.psu.edu/
