Re: [galaxy-dev] Server stops itself

2013-01-10 Thread MONJEAUD

Hi,

Indeed, you are right. In the database, the job_runner_external_id 
column is empty for all of the jobs that cause Galaxy to crash when they are 
stopped. I tried to launch the instance without the --daemon option, and I 
got this segmentation fault, as you suspected:


run.sh: line 77: 19622 Segmentation fault  (core dumped) 
/local/python/2.7-bis/bin/python ./scripts/paster.py serve 
universe_wsgi.ini


If I understand correctly, we can't delete a job in the "new" state (i.e., one 
with an empty job_runner_external_id column)?


Thanks,
Cyril



On 01/09/2013 07:51 PM, Nate Coraor wrote:

Hi Cyril,

If you start the server in the foreground (no --daemon option), is there a 
segfault when the process dies?  If so, this is most likely a problem where a 
job is attempting to be stopped that does not have an external job ID set.  
Could you check this in the database for one of the jobs that's causing this 
(e.g. 3065)?
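
For example, something along these lines should show it (a minimal sketch: it 
assumes the default SQLite database at database/universe.sqlite; if you use 
PostgreSQL or MySQL, run the equivalent SELECT with your usual client):

# Sketch: print the external (DRMAA/SGE) ID Galaxy recorded for a job.
# Adjust the database path/driver to match your universe_wsgi.ini.
import sqlite3

conn = sqlite3.connect("database/universe.sqlite")
row = conn.execute(
    "SELECT id, state, job_runner_name, job_runner_external_id "
    "FROM job WHERE id = ?",
    (3065,),
).fetchone()
print(row)  # an empty/NULL job_runner_external_id means no cluster job was submitted
conn.close()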

Thanks,
--nate

On Jan 9, 2013, at 4:39 AM, MONJEAUD wrote:


Hello All,

After more research, I found that the crash of the Galaxy server was caused by 
stopping jobs. We are working with our own SGE cluster.

It's weird, because we can kill jobs via the history or the administration 
panel without problems.

In paster.log, we just get this message before the server crashes:

galaxy.jobs.handler DEBUG 2013-01-08 16:52:39,877 Stopping job 3065:
galaxy.jobs.handler DEBUG 2013-01-08 16:52:39,877 stopping job 3065 in drmaa 
runner

I think this problem occurs when there are many jobs in the running, new, and 
queued states.

Cheers,
Cyril


On 01/08/2013 04:11 PM, MONJEAUD wrote:

Hello All,

I'm trying to deploy my instance of Galaxy in production. Some tests we've done 
show that when the number of connected users is high (20 at the same time), the 
server stops itself.

Sometimes I see this error in paster.log:


Exception happened during processing of request from ('127.0.0.1', 60575)
Traceback (most recent call last):
  File "/opt/galaxy-dist/eggs/Paste-1.6-py2.7.egg/paste/httpserver.py", line 1053, in process_request_in_thread
    self.finish_request(request, client_address)
  File "/local/python/2.7-bis/lib/python2.7/SocketServer.py", line 323, in finish_request
    self.RequestHandlerClass(request, client_address, self)
  File "/local/python/2.7-bis/lib/python2.7/SocketServer.py", line 641, in __init__
    self.finish()
  File "/local/python/2.7-bis/lib/python2.7/SocketServer.py", line 694, in finish
    self.wfile.flush()
  File "/local/python/2.7-bis/lib/python2.7/socket.py", line 301, in flush
    self._sock.sendall(view[write_offset:write_offset+buffer_size])
error: [Errno 32] Broken pipe

Do you have any ideas about what causes this and how to resolve it?

Cheers!!
Cyril


--

Cyril Monjeaud
Equipe Symbiose / Plate-forme GenOuest
Bureau D156
IRISA-INRIA, Campus de Beaulieu
35042 Rennes cedex, France
Tél: +33 (0) 2 99 84 74 17





--

Cyril Monjeaud
Equipe Symbiose / Plate-forme GenOuest
Bureau D156
IRISA-INRIA, Campus de Beaulieu
35042 Rennes cedex, France
Tél: +33 (0) 2 99 84 74 17

___
Please keep all replies on the list by using reply all
in your mail client.  To manage your subscriptions to this
and other Galaxy lists, please use the interface at:

 http://lists.bx.psu.edu/


Re: [galaxy-dev] anybody seen 403s after a recent upgrade?

2013-01-10 Thread Nate Coraor
On Jan 9, 2013, at 3:51 PM, Langhorst, Brad wrote:

 grr - actually - no it's not fixed 
 
 I didn't see it on a few refreshes, but apparently it was temporary - or I 
 was hallucinating.

These problems are tricky to debug because they involve a lot of config 
fiddling and retrying.  I'd suggest setting mod_rewrite to debug:

http://httpd.apache.org/docs/2.2/mod/mod_rewrite.html#rewriteloglevel

And modifying Galaxy to debug what is received from Apache:

http://lists.bx.psu.edu/pipermail/galaxy-dev/2010-January/001676.html
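
In short, that amounts to logging what the proxy actually sends. A minimal, 
illustrative sketch of that kind of check (not Galaxy's actual middleware -- 
you could wrap the app with it, or drop the same print into 
lib/galaxy/web/framework/middleware/remoteuser.py while debugging):

# Illustrative WSGI middleware: print the remote-user values each request
# actually carries by the time it reaches Galaxy.
class LogRemoteUser(object):
    def __init__(self, app):
        self.app = app

    def __call__(self, environ, start_response):
        print("REMOTE_USER=%r HTTP_REMOTE_USER=%r path=%s" % (
            environ.get("REMOTE_USER"),
            environ.get("HTTP_REMOTE_USER"),
            environ.get("PATH_INFO")))
        return self.app(environ, start_response)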

One other suggestion that has come up in the past: SELinux can interfere with 
Apache/Galaxy communication, so if it's enabled, I would suggest disabling it 
and testing to see if that has an effect.

--nate

 
 
 brad
 On Jan 9, 2013, at 3:27 PM, Brad Langhorst langho...@neb.com
 wrote:
 
 Hmmm
 I reconsulted the docs and commented out these lines. All seems well again 
 after an Apache restart.
 I had been running like this for a year or more, and only observed problems 
 after the most recent update.
 
 
 #RewriteCond %{IS_SUBREQ} ^false$
 #RewriteCond %{LA-U:REMOTE_USER} (.+)
 #RewriteRule . - [E=RU:%1]
  RequestHeader set REMOTE_USER %{AUTHENTICATE_sAMAccountName}e
 
 I just cargo-culted those in to begin with… I don't really know what they do.
 
 
 Brad
 
 On Jan 9, 2013, at 1:21 PM, Nate Coraor n...@bx.psu.edu
 wrote:
 
 On Jan 7, 2013, at 10:43 PM, Langhorst, Brad wrote:
 
 galaxy.web.framework WARNING 2013-01-07 22:06:30,044 User logged in as 
 '(null)' externally, but has a cookie as 'langho...@neb.com' invalidating 
 session
 10.254.254.86 - - [07/Jan/2013:22:06:30 -0400] GET 
 /api/histories/4c8cd68e0b9ed4a7 HTTP/1.1 403 - 
 http://galaxy.neb.com/history; Mozilla/5.0 (Macintosh; Intel Mac OS X 
 10.8; rv:17.0) Gecko/20100101 Firefox/17.0
 
 Hi Brad,
 
 Your proxy server appears to no longer be passing the REMOTE_USER header 
 correctly.  Did anything change there?
 
 --nate
 
 
 I did not observe this before today… but maybe I've screwed things up in 
 my hg fiddling today.
 Did anybody else see this?
 
 Manual refreshes work, but automatic ones don't.
 
 
 Brad
 --
 Brad Langhorst
 langho...@neb.com
 
 
 
 
 
 
 


___
Please keep all replies on the list by using reply all
in your mail client.  To manage your subscriptions to this
and other Galaxy lists, please use the interface at:

  http://lists.bx.psu.edu/


Re: [galaxy-dev] Server stops itself

2013-01-10 Thread Nate Coraor
On Jan 10, 2013, at 4:43 AM, MONJEAUD wrote:

 Hi,
 
 Indeed, you are right. In the database, the job_runner_external_id column 
 is empty for all of the jobs that cause Galaxy to crash when they are stopped. 
 I tried to launch the instance without the --daemon option, and I got this 
 segmentation fault, as you suspected:
 
 run.sh: line 77: 19622 Segmentation fault  (core dumped) 
 /local/python/2.7-bis/bin/python ./scripts/paster.py serve universe_wsgi.ini
 
 If I understand correctly, we can't delete a job in the "new" state (i.e., one 
 with an empty job_runner_external_id column)?

Hi Cyril,

This was due to a bug, which has been fixed in c015b82b3944.
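
For anyone on an older checkout: the crash happens when a stop request reaches 
the DRMAA runner for a job that never received an external ID, so the fix 
amounts to guarding against that case before calling out to the DRMAA library. 
Roughly along these lines (an illustrative sketch only, not the contents of 
that changeset; "self.ds" stands in for the runner's DRMAA session and "log" 
for the module logger):

# Illustrative guard: don't ask DRMAA to terminate a job that was never
# actually submitted to the cluster (no external job ID recorded).
def stop_job(self, job):
    if not job.job_runner_external_id:
        log.debug("(%s) Job has no external ID, nothing to stop in DRMAA" % job.id)
        return
    self.ds.control(str(job.job_runner_external_id), drmaa.JobControlAction.TERMINATE)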

--nate

 
 Thanks,
 Cyril
 
 
 
 On 01/09/2013 07:51 PM, Nate Coraor wrote:
 Hi Cyril,
 
 If you start the server in the foreground (no --daemon option), is there a 
 segfault when the process dies?  If so, this is most likely a problem where 
 a job is attempting to be stopped that does not have an external job ID set. 
  Could you check this in the database for one of the jobs that's causing 
 this (e.g. 3065)?
 
 Thanks,
 --nate
 
 On Jan 9, 2013, at 4:39 AM, MONJEAUD wrote:
 
 Hello All,
 
 After more research, I found that the crash of the Galaxy server was 
 caused by stopping jobs. We are working with our own SGE cluster.
 
 It's weird, because we can kill jobs via the history or the administration 
 panel without problems.
 
 In paster.log, we just get this message before the server crashes:
 galaxy.jobs.handler DEBUG 2013-01-08 16:52:39,877 Stopping job 3065:
 galaxy.jobs.handler DEBUG 2013-01-08 16:52:39,877 stopping job 3065 in 
 drmaa runner
 I think this problem occurs when there are many jobs in the running, new, and 
 queued states.
 
 Cheers,
 Cyril
 
 
 On 01/08/2013 04:11 PM, MONJEAUD wrote:
 Hello All,
 
 I'm trying to deploy my instance of Galaxy in production. Some tests we've 
 done show that when the number of connected users is high (20 at the same 
 time), the server stops itself.
 
 Sometimes I see this error in paster.log:
 
 Exception happened during processing of request from ('127.0.0.1', 60575)
 Traceback (most recent call last):
   File "/opt/galaxy-dist/eggs/Paste-1.6-py2.7.egg/paste/httpserver.py", line 1053, in process_request_in_thread
     self.finish_request(request, client_address)
   File "/local/python/2.7-bis/lib/python2.7/SocketServer.py", line 323, in finish_request
     self.RequestHandlerClass(request, client_address, self)
   File "/local/python/2.7-bis/lib/python2.7/SocketServer.py", line 641, in __init__
     self.finish()
   File "/local/python/2.7-bis/lib/python2.7/SocketServer.py", line 694, in finish
     self.wfile.flush()
   File "/local/python/2.7-bis/lib/python2.7/socket.py", line 301, in flush
     self._sock.sendall(view[write_offset:write_offset+buffer_size])
 error: [Errno 32] Broken pipe
 Do you have any ideas about what causes this and how to resolve it?
 
 Cheers!!
 Cyril
 
 -- 
 
 Cyril Monjeaud
 Equipe Symbiose / Plate-forme GenOuest
 Bureau D156
 IRISA-INRIA, Campus de Beaulieu
 35042 Rennes cedex, France
 Tél: +33 (0) 2 99 84 74 17
 
 
 
 
 -- 
 
 Cyril Monjeaud
 Equipe Symbiose / Plate-forme GenOuest
 Bureau D156
 IRISA-INRIA, Campus de Beaulieu
 35042 Rennes cedex, France
 Tél: +33 (0) 2 99 84 74 17
 


___
Please keep all replies on the list by using reply all
in your mail client.  To manage your subscriptions to this
and other Galaxy lists, please use the interface at:

  http://lists.bx.psu.edu/