Re: [galaxy-dev] Exporting histories fails: no space left on device

2013-03-29 Thread Joachim Jacob | VIB |
I can confirm that the proxy settings are the reason for the failing 
export. When I go to localhost:8080 directly, I can export large files 
from the Data Library.


When going through the proxy using the public URL, downloading large files does not work. Here is a hint at what the solution might be:
(http://serverfault.com/questions/185894/proxy-error-502-reason-error-reading-from-remote-server-with-apache-2-2-3-de)
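
From that thread, the relevant knob seems to be mod_proxy's read timeout. A minimal sketch of what I understand the fix to be (untested here; 1800 seconds is an arbitrary example):
===
# Inside the VirtualHost that proxies Galaxy: give the backend up to
# 30 minutes to answer before Apache gives up with a 502.
ProxyTimeout 1800
===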



*** The error in the browser:


Proxy Error

The proxy server received an invalid response from an upstream server.
The proxy server could not handle the request POST /library_common/act_on_multiple_datasets (http://galaxy.bits.vib.be/library_common/act_on_multiple_datasets).


Reason: *Error reading from remote server*


*** The error in the http logs:

[Fri Mar 29 10:22:03 2013] [error] [client 157.193.10.20] (70007)The timeout specified has expired: proxy: error reading status line from remote server localhost, referer: http://galaxy.bits.vib.be/library_common/browse_library?sort=name&f-description=All&f-name=All&id=142184b92db50a63&cntrller=library&async=false&show_item_checkboxes=false&operation=browse&page=1
[Fri Mar 29 10:22:03 2013] [error] [client 157.193.10.20] proxy: Error reading from remote server returned by /library_common/act_on_multiple_datasets, referer: http://galaxy.bits.vib.be/library_common/browse_library?sort=name&f-description=All&f-name=All&id=142184b92db50a63&cntrller=library&async=false&show_item_checkboxes=false&operation=browse&page=1



*** Our proxy settings
I would really appreciate it if somebody could have a look at our current 
Apache proxy settings (see also my note after the config below). Since I 
suspect the problem is a timeout, I have tried modifying related 
parameters, with no luck.


===
[root@galaxy conf.d]# cat galaxy_web.conf
NameVirtualHost 157.193.230.103:80
<VirtualHost 157.193.230.103:80>
ServerName galaxy.bits.vib.be
SetEnv force-proxy-request-1.0 1  # tried this, does not help
SetEnv proxy-nokeepalive 1  # tried this, does not help
KeepAliveTimeout 600  # tried this, does not help
ProxyPass /library_common/act_on_multiple_datasets http://galaxy.bits.vib.be/library_common/act_on_multiple_datasets max=6 keepalive=On timeout=600 retry=10  # tried this, does not help.

<Proxy balancer://galaxy>
BalancerMember http://localhost:8080
BalancerMember http://localhost:8081
BalancerMember http://localhost:8082
BalancerMember http://localhost:8083
BalancerMember http://localhost:8084
BalancerMember http://localhost:8085
BalancerMember http://localhost:8086
BalancerMember http://localhost:8087
BalancerMember http://localhost:8088
BalancerMember http://localhost:8089
BalancerMember http://localhost:8090
BalancerMember http://localhost:8091
BalancerMember http://localhost:8092
</Proxy>
RewriteEngine on
RewriteLog /tmp/apacheGalaxy.log
# <Location />
# AuthType Basic
# AuthBasicProvider ldap
# AuthLDAPURL ldap://smeagol.vib.be:389/DC=vib,DC=local?sAMAccountName
# AuthLDAPBindDN vib\administrator
# AuthLDAPBindPassword tofillin
# AuthzLDAPAuthoritative off
# Require valid-user
# # Set the REMOTE_USER header to the contents of the LDAP query response's uid attribute
# RequestHeader set REMOTE_USER %{AUTHENTICATE_sAMAccountName}
# </Location>
RewriteRule ^/static/style/(.*) /home/galaxy/galaxy-dist/static/june_2007_style/blue/$1 [L]
RewriteRule ^/static/scripts/(.*) /home/galaxy/galaxy-dist/static/scripts/packed/$1 [L]
RewriteRule ^/static/(.*) /home/galaxy/galaxy-dist/static/$1 [L]
RewriteRule ^/favicon.ico /home/galaxy/galaxy-dist/static/favicon.ico [L]
RewriteRule ^/robots.txt /home/galaxy/galaxy-dist/static/robots.txt [L]
RewriteRule ^(.*) balancer://galaxy$1 [P]
</VirtualHost>
==
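
One thing I notice in the config above (my reading of the mod_proxy docs, not a verified fix): the timeout=600 sits on a ProxyPass that maps to the public hostname, while the final RewriteRule routes requests through balancer://galaxy, so the balancer workers probably never see that timeout. A minimal sketch of attaching the timeout to the workers themselves (1800 seconds is again an arbitrary example):
===
<Proxy balancer://galaxy>
# timeout= and retry= are standard per-worker mod_proxy parameters
BalancerMember http://localhost:8080 timeout=1800 retry=10
# ... likewise for the workers on ports 8081-8092 ...
</Proxy>
===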


Thanks,

Joachim

Joachim Jacob

Rijvisschestraat 120, 9052 Zwijnaarde
Tel: +32 9 244.66.34
Bioinformatics Training and Services (BITS)
http://www.bits.vib.be
@bitsatvib

On 03/28/2013 03:21 PM, Joachim Jacob | VIB | wrote:

OK, it seems to be a proxy error.

When the proxy does not receive data from the server, it times out, 
and closes the connection.
I think the process that packs the datasets takes too long, so the 
connection is closed before the packaging is finished? Just a guess...


From the httpd logs:
=
[Thu Mar 28 15:14:46 2013] [error] [client 157.193.10.52] (70007)The 
timeout specified has expired: proxy: error reading status line from 
remote server localhost, referer: 
http://galaxy.bits.vib.be/library_common/browse_library?sort=name&f-description=All&f-name=All&id=142184b92db50a63&cntrller=library&async=false&show_item_checkboxes=false&operation=browse&page=1
[Thu Mar 28 15:14:46 2013] [error] [client 157.193.10.52] proxy: Error reading from remote server returned by /library_common/act_on_multiple_datasets, referer: http://galaxy.bits.vib.be/library_common/browse_library?sort=name&f-description=All&f-name=All&id=142184b92db50a63&cntrller=library&async=false&show_item_checkboxes=false&operation=browse&page=1

Re: [galaxy-dev] Exporting histories fails: no space left on device

2013-03-28 Thread Joachim Jacob | VIB |

Hi Assaf,


After all, the problem appears not to be total size of the history, but 
the size of the individual datasets.


Now, histories which contain big datasets (>1GB) imported from Data 
Libraries cause the exporting process to crash. Can somebody confirm 
whether this is a bug? I uploaded the datasets to a directory, from which 
they are then imported into a Data Library.


Downloading datasets >1GB from a data library directly (as tar.gz) 
also crashes.


Note: I have re-enabled abrt, but am waiting for some jobs to finish 
before restarting.



Cheers,
Joachim.


Joachim Jacob

Rijvisschestraat 120, 9052 Zwijnaarde
Tel: +32 9 244.66.34
Bioinformatics Training and Services (BITS)
http://www.bits.vib.be
@bitsatvib



On Tue 26 Mar 2013 03:45:43 PM CET, Assaf Gordon wrote:

Hello Joachim,

Joachim Jacob | VIB | wrote, On 03/26/2013 10:01 AM:


abrt was indeed filling the root directory, so I disabled it.

I have done some exporting tests, and the behaviour is not consistent.

1. *size*: in general, it worked for smaller datasets, and usually crashed 
on bigger ones (starting from 3 GB). So size is key?
2. But now I have found several histories of 4.5GB that I was able to export... 
So much for the size hypothesis.

Another observation: when the export crashes, the corresponding webhandler 
process dies.



A crashing python process crosses the fine boundary between the Galaxy code and 
Python internals... perhaps the Galaxy developers can help with this problem.

It would be helpful to find a reproducible case with a specific history or a 
specific sequence of events, then someone can help you with the debugging.

Once you find a history that causes a crash (every time or sometimes, but in a 
reproducible way), try to pinpoint when exactly it happens:
Is it when you start preparing the export (and export_history.py is running 
as a job), or when you start downloading the exported file?
(I'm a bit behind on the export mechanism, so perhaps there are other steps 
involved?).

Couple of things to try:

1. set cleanup_job=never in your universe_wsgi.ini - this will keep the 
temporary files, and will help you reproduce jobs later.

2. Enable abrt again - it is not the problem (just the symptom).
You can clean up the /var/spool/abrt/XXX directory from previous crash logs, 
then reproduce a new crash, and look at the collected files (assuming you have enough 
space to store at least one crash).
In particular, look at the file called coredump - it will tell you which 
script has crashed.
Try running:
 $ file /var/spool/abrt/XXX/coredump
 coredump: ELF 64-bit LSB core file x86-64, version 1 (SYSV), SVR4-style, from 'python XX.py'

Instead of XX.py it would show the python script that crashed (hopefully 
with full command-line parameters).

It won't show which python statement caused the crash, but it will point in the 
right direction.


So now I suspect something to be wrong with the datasets, but I am not able to find 
anything meaningful in the logs.  I am not confident in turning on logging in Python 
yet, but apparently this happens with the logging module initiated like 
logging.getLogger( __name__ ).



It could be a bad dataset (file on disk), or a problem in the database, or 
something completely different (a bug in the python archive module).
No point guessing until there are more details.

-gordon



___
Please keep all replies on the list by using reply all
in your mail client.  To manage your subscriptions to this
and other Galaxy lists, please use the interface at:
 http://lists.bx.psu.edu/

To search Galaxy mailing lists use the unified search at:
 http://galaxyproject.org/search/mailinglists/


Re: [galaxy-dev] Exporting histories fails: no space left on device

2013-03-28 Thread Joachim Jacob | VIB |

OK, it seems to be a proxy error.

When the proxy does not receive data from the server, it times out, and 
closes the connection.
I think the process that packs the datasets takes too long, so the 
connection is closed before the packaging is finished? Just a guess...


From the httpd logs:
=
[Thu Mar 28 15:14:46 2013] [error] [client 157.193.10.52] (70007)The 
timeout specified has expired: proxy: error reading status line from 
remote server localhost, referer: 
http://galaxy.bits.vib.be/library_common/browse_library?sort=name&f-description=All&f-name=All&id=142184b92db50a63&cntrller=library&async=false&show_item_checkboxes=false&operation=browse&page=1
[Thu Mar 28 15:14:46 2013] [error] [client 157.193.10.52] proxy: Error 
reading from remote server returned by 
/library_common/act_on_multiple_datasets, referer: 
http://galaxy.bits.vib.be/library_common/browse_library?sort=name&f-description=All&f-name=All&id=142184b92db50a63&cntrller=library&async=false&show_item_checkboxes=false&operation=browse&page=1

=

I will see whether changing timeout settings fixes this issue.


Cheers,
Joachim

Joachim Jacob

Rijvisschestraat 120, 9052 Zwijnaarde
Tel: +32 9 244.66.34
Bioinformatics Training and Services (BITS)
http://www.bits.vib.be
@bitsatvib

On 03/28/2013 02:58 PM, Joachim Jacob | VIB | wrote:

Hi Assaf,


After all, the problem appears not to be total size of the history, 
but the size of the individual datasets.


Now, histories which contain big datasets (>1GB) imported from Data 
Libraries cause the exporting process to crash. Can somebody confirm 
whether this is a bug? I uploaded the datasets to a directory, from which 
they are then imported into a Data Library.


Downloading datasets >1GB from a data library directly (as tar.gz) 
also crashes.


Note: I have re-enabled abrt, but am waiting for some jobs to finish 
before restarting.



Cheers,
Joachim.


Joachim Jacob

Rijvisschestraat 120, 9052 Zwijnaarde
Tel: +32 9 244.66.34
Bioinformatics Training and Services (BITS)
http://www.bits.vib.be
@bitsatvib



On Tue 26 Mar 2013 03:45:43 PM CET, Assaf Gordon wrote:

Hello Joachim,

Joachim Jacob | VIB | wrote, On 03/26/2013 10:01 AM:


abrt was indeed filling the root directory, so I disabled it.

I have done some exporting tests, and the behaviour is not consistent.

1. *size*: in general, it worked for smaller datasets, and usually 
crashed on bigger ones (starting from 3 GB). So size is key?
2. But now I have found several histories of 4.5GB that I was able 
to export... So much for the size hypothesis.


Another observation: when the export crashes, the corresponding 
webhandler process dies.




A crashing python process crosses the fine boundary between the 
Galaxy code and Python internals... perhaps the Galaxy developers can 
help with this problem.


It would be helpful to find a reproducible case with a specific 
history or a specific sequence of events, then someone can help you 
with the debugging.


Once you find a history that causes a crash (every time or sometimes, 
but in a reproducible way), try to pinpoint when exactly it happens:
Is it when you start preparing the export (and export_history.py is 
running as a job), or when you start downloading the exported file?
(I'm a bit behind on the export mechanism, so perhaps there are other 
steps involved?).


Couple of things to try:

1. set cleanup_job=never in your universe_wsgi.ini - this will keep 
the temporary files, and will help you reproduce jobs later.


2. Enable abrt again - it is not the problem (just the symptom).
You can clean up the /var/spool/abrt/XXX directory from previous 
crash logs, then reproduce a new crash, and look at the collected 
files (assuming you have enough space to store at least one crash).
In particular, look at the file called coredump - it will tell you 
which script has crashed.

Try running:
 $ file /var/spool/abrt/XXX/coredump
 coredump: ELF 64-bit LSB core file x86-64, version 1 (SYSV), SVR4-style, from 'python XX.py'


Instead of XX.py it would show the python script that crashed 
(hopefully with full command-line parameters).


It won't show which python statement caused the crash, but it will 
point in the right direction.


So now I suspect something to be wrong with the datasets, but I am 
not able to find anything meaningful in the logs.  I am not 
confident in turning on logging in Python yet, but apparently this 
happens with the logging module initiated like logging.getLogger( 
__name__ ).




It could be a bad dataset (file on disk), or a problem in the 
database, or something completely different (a bug in the python 
archive module).

No point guessing until there are more details.

-gordon




___
Please keep all replies on the list by using reply all
in your mail client.  To manage your subscriptions to this
and other Galaxy lists, please use the interface at:
 http://lists.bx.psu.edu/

To search Galaxy mailing lists use the unified search at:
 http://galaxyproject.org/search/mailinglists/

Re: [galaxy-dev] Exporting histories fails: no space left on device

2013-03-26 Thread Joachim Jacob | VIB |

Hi Gordon,


Thanks for your assistance and the recommendations. Freezing postgres 
sounds like hell to me :-)


abrt was indeed filling the root directory, so I disabled it.

I have done some exporting tests, and the behaviour is not consistent.

1. *size*: in general, it worked for smaller datasets, and usually 
crashed on bigger ones (starting from 3 GB). So size is key?
2. But now I have found several histories of 4.5GB that I was able to 
export... So much for the size hypothesis.


Another observation: when the export crashes, the corresponding 
webhandler process dies.


So now I suspect something to be wrong with the datasets, but I am not 
able to find anything meaningful in the logs.  I am not confident in 
turning on logging in Python yet, but apparently this happens with the 
logging module initiated like logging.getLogger( __name__ ).



Cheers,
Joachim

Joachim Jacob

Rijvisschestraat 120, 9052 Zwijnaarde
Tel: +32 9 244.66.34
Bioinformatics Training and Services (BITS)
http://www.bits.vib.be
@bitsatvib

On 03/25/2013 05:18 PM, Assaf Gordon wrote:

Hello Joachim,

Couple of things to check:


On Mar 25, 2013, at 10:01 AM, Joachim Jacob | VIB | wrote:


Hi,

About the exporting of history, which fails:
1. the preparation seems to work fine: meaning: choosing 'Export this history' 
in the History menu leads to a URL that reports initially that the export is 
still in progress.

2. when the export is finished, and I click the download link, the root partition fills 
and the browser displays "Error reading from remote server". A folder 
ccpp-2013-03-25-14:51:15-27045.new is created in the directory /var/spool/abrt, which 
fills the root partition.

Something in your export is likely not finishing cleanly, but crashing instead 
(either the creation of the archive, or the download).

The folder /var/spool/abrt/ccpp- (and especially a file named coredump) 
hints that the program crashed.
abrt is a daemon (at least on Fedora) that monitors crashes and tries to keep 
all relevant information about the program which crashed 
(http://docs.fedoraproject.org/en-US/Fedora/13/html/Deployment_Guide/ch-abrt.html).

So what might have happened is that a program (galaxy's export_history.py or other) 
crashed during your export, and then abrt picked up the pieces (storing a 
memory dump, for example), and then filled your disk.


The handler reports in its log:

galaxy.jobs DEBUG 2013-03-25 14:38:33,322 (8318) Working directory for job is: 
/mnt/galaxydb/job_working_directory/008/8318
galaxy.jobs.handler DEBUG 2013-03-25 14:38:33,322 dispatching job 8318 to local 
runner
galaxy.jobs.handler INFO 2013-03-25 14:38:33,368 (8318) Job dispatched
galaxy.jobs.runners.local DEBUG 2013-03-25 14:38:33,432 Local runner: starting 
job 8318
galaxy.jobs.runners.local DEBUG 2013-03-25 14:38:33,572 executing: python 
/home/galaxy/galaxy-dist/lib/galaxy/tools/imp_exp/export_history.py -G 
/mnt/galaxytemp/tmpHAEokb/tmpQM6g_R /mnt/galaxytemp/tmpHAEokb/tmpeg7bYF 
/mnt/galaxytemp/tmpHAEokb/tmpPXJ245 /mnt/galaxydb/files/013/dataset_13993.dat
galaxy.jobs.runners.local DEBUG 2013-03-25 14:41:29,420 execution finished: 
python /home/galaxy/galaxy-dist/lib/galaxy/tools/imp_exp/export_history.py -G 
/mnt/galaxytemp/tmpHAEokb/tmpQM6g_R /mnt/galaxytemp/tmpHAEokb/tmpeg7bYF 
/mnt/galaxytemp/tmpHAEokb/tmpPXJ245 /mnt/galaxydb/files/013/dataset_13993.dat
galaxy.jobs DEBUG 2013-03-25 14:41:29,476 Tool did not define exit code or 
stdio handling; checking stderr for success
galaxy.tools DEBUG 2013-03-25 14:41:29,530 Error opening galaxy.json file: 
[Errno 2] No such file or directory: 
'/mnt/galaxydb/job_working_directory/008/8318/galaxy.json'
galaxy.jobs DEBUG 2013-03-25 14:41:29,555 job 8318 ended


The system reports:

Mar 25 14:51:26 galaxy abrt[16805]: Write error: No space left on device
Mar 25 14:51:27 galaxy abrt[16805]: Error writing 
'/var/spool/abrt/ccpp-2013-03-25-14:51:15-27045.new/coredump'



One thing to try: if you have galaxy keeping temporary files, try running the 
export command manually:
===
python /home/galaxy/galaxy-dist/lib/galaxy/tools/imp_exp/export_history.py -G 
/mnt/galaxytemp/tmpHAEokb/tmpQM6g_R /mnt/galaxytemp/tmpHAEokb/tmpeg7bYF 
/mnt/galaxytemp/tmpHAEokb/tmpPXJ245 /mnt/galaxydb/files/013/dataset_13993.dat
===

Another thing to try: modify export_history.py, adding debug messages to 
track progress and whether it finishes or not.

And: check the abrt program's GUI, perhaps you'll see previous crashes that 
were stored successfully, providing more information about which program crashed.


As a general rule, it's best to keep the /var directory on a separate 
partition for production systems, exactly so that filling it up with junk wouldn't 
interfere with other programs.
Even better, give each sub-directory of /var a dedicated partition, so that filling up 
/var/log or /var/spool would not fill up /var/lib/pgsql and stop Postgres from 
working.


-gordon







Re: [galaxy-dev] Exporting histories fails: no space left on device

2013-03-26 Thread Assaf Gordon
Hello Joachim,

Joachim Jacob | VIB | wrote, On 03/26/2013 10:01 AM:
 
 abrt was indeed filling the root directory, so I disabled it.
 
 I have done some exporting tests, and the behaviour is not consistent.
 
 1. *size*: in general, it worked for smaller datasets, and usually 
 crashed on bigger ones (starting from 3 GB). So size is key?
 2. But now I have found several histories of 4.5GB that I was able to 
 export... So much for the size hypothesis.
 
 Another observation: when the export crashes, the corresponding webhandler 
 process dies.
 

A crashing python process crosses the fine boundary between the Galaxy code and 
Python internals... perhaps the Galaxy developers can help with this problem.

It would be helpful to find a reproducible case with a specific history or a 
specific sequence of events, then someone can help you with the debugging.

Once you find a history that causes a crash (every time or sometimes, but in a 
reproducible way), try to pinpoint when exactly it happens:
Is it when you start preparing the export (and export_history.py is running 
as a job), or when you start downloading the exported file?
(I'm a bit behind on the export mechanism, so perhaps there are other steps 
involved?).

Couple of things to try:

1. set cleanup_job=never in your universe_wsgi.ini - this will keep the 
temporary files, and will help you reproduce jobs later.

2. Enable abrt again - it is not the problem (just the symptom).
You can clean up the /var/spool/abrt/XXX directory from previous crash logs, 
then reproduce a new crash, and look at the collected files (assuming you have 
enough space to store at least one crash).
In particular, look at the file called coredump - it will tell you which 
script has crashed.
Try running:
$ file /var/spool/abrt/XXX/coredump
coredump: ELF 64-bit LSB core file x86-64, version 1 (SYSV), SVR4-style, from 'python XX.py'

Instead of XX.py it would show the python script that crashed (hopefully 
with full command-line parameters).

It won't show which python statement caused the crash, but it will point in the 
right direction.
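
If gdb and the matching python debuginfo packages are installed (an assumption on my side), a backtrace from the same coredump narrows it down further than file(1) does:
===
$ gdb /usr/bin/python /var/spool/abrt/XXX/coredump   # interpreter first, then the core
(gdb) bt                                             # C-level stack at the moment of the crash
===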

 So now I suspect something to be wrong with the datasets, but I am not able 
 to find anything meaningful in the logs.  I am not confident in turning on 
 logging in Python yet, but apparently this happens with the logging module 
 initiated like logging.getLogger( __name__ ).
 

It could be a bad dataset (file on disk), or a problem in the database, or 
something completely different (a bug in the python archive module).
No point guessing until there are more details.

-gordon
___
Please keep all replies on the list by using reply all
in your mail client.  To manage your subscriptions to this
and other Galaxy lists, please use the interface at:
  http://lists.bx.psu.edu/

To search Galaxy mailing lists use the unified search at:
  http://galaxyproject.org/search/mailinglists/


Re: [galaxy-dev] Exporting histories fails: no space left on device

2013-03-25 Thread Jeremy Goecks
Please keep all replies on-list so that everyone can contribute.

Someone more knowledgeable about systems than me suggests that lsof(8) and/or 
/proc/<galaxy server pid>/fd should yield some clues as to what file is being 
written to.
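
For example (a sketch, with 12345 standing in for the actual Galaxy process id):
===
$ ps aux | grep [p]aster    # find the Galaxy server PID (assuming it runs under paster)
$ lsof -p 12345             # every file the process currently has open
$ ls -l /proc/12345/fd      # the same view via procfs symlinks
===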

Good luck,
J.

On Mar 25, 2013, at 10:01 AM, Joachim Jacob | VIB | wrote:

 Hi,
 
 About the exporting of history, which fails:
 1. the preparation seems to work fine: meaning: choosing 'Export this 
 history' in the History menu leads to a URL that reports initially that the 
 export is still in progress.
 
 2. when the export is finished, and I click the download link, the root 
 partition fills and the browser displays "Error reading from remote server". 
 A folder ccpp-2013-03-25-14:51:15-27045.new is created in the directory 
 /var/spool/abrt, which fills the root partition.
 
 The handler reports in its log:
 
 galaxy.jobs DEBUG 2013-03-25 14:38:33,322 (8318) Working directory for job 
 is: /mnt/galaxydb/job_working_directory/008/8318
 galaxy.jobs.handler DEBUG 2013-03-25 14:38:33,322 dispatching job 8318 to 
 local runner
 galaxy.jobs.handler INFO 2013-03-25 14:38:33,368 (8318) Job dispatched
 galaxy.jobs.runners.local DEBUG 2013-03-25 14:38:33,432 Local runner: 
 starting job 8318
 galaxy.jobs.runners.local DEBUG 2013-03-25 14:38:33,572 executing: python 
 /home/galaxy/galaxy-dist/lib/galaxy/tools/imp_exp/export_history.py -G 
 /mnt/galaxytemp/tmpHAEokb/tmpQM6g_R /mnt/galaxytemp/tmpHAEokb/tmpeg7bYF 
 /mnt/galaxytemp/tmpHAEokb/tmpPXJ245 /mnt/galaxydb/files/013/dataset_13993.dat
 galaxy.jobs.runners.local DEBUG 2013-03-25 14:41:29,420 execution finished: 
 python /home/galaxy/galaxy-dist/lib/galaxy/tools/imp_exp/export_history.py -G 
 /mnt/galaxytemp/tmpHAEokb/tmpQM6g_R /mnt/galaxytemp/tmpHAEokb/tmpeg7bYF 
 /mnt/galaxytemp/tmpHAEokb/tmpPXJ245 /mnt/galaxydb/files/013/dataset_13993.dat
 galaxy.jobs DEBUG 2013-03-25 14:41:29,476 Tool did not define exit code or 
 stdio handling; checking stderr for success
 galaxy.tools DEBUG 2013-03-25 14:41:29,530 Error opening galaxy.json file: 
 [Errno 2] No such file or directory: 
 '/mnt/galaxydb/job_working_directory/008/8318/galaxy.json'
 galaxy.jobs DEBUG 2013-03-25 14:41:29,555 job 8318 ended
 
 
 The system reports:
 
 Mar 25 14:51:26 galaxy abrt[16805]: Write error: No space left on device
 Mar 25 14:51:27 galaxy abrt[16805]: Error writing 
 '/var/spool/abrt/ccpp-2013-03-25-14:51:15-27045.new/coredump'
 
 
 
 Thanks,
 Joachim
 
 
 Joachim Jacob
 
 Rijvisschestraat 120, 9052 Zwijnaarde
 Tel: +32 9 244.66.34
 Bioinformatics Training and Services (BITS)
 http://www.bits.vib.be
 @bitsatvib
 
 
 
 On Tue 19 Mar 2013 11:22:27 PM CET, Jeremy Goecks wrote:
 I'm unable to reproduce this behavior using a clean version of galaxy-dist. 
 The code (export_history.py) doesn't create any temporary files and appears 
 to write directly to the output file, so it seems unlikely that Galaxy is 
 writing anything to the root directory.
 
 Can you provide the name of any file that Galaxy appears to be writing to 
 outside of galaxy-home? What about watching the job output file/export 
 file to see if that's increasing in size and causing the out-of-space 
 error?
 
 Best,
 J.
 
 On Mar 19, 2013, at 10:56 AM, Joachim Jacob | VIB | wrote:
 
 Hi all,
 
 
 Exporting histories fails on our server: Reason: *Error reading from 
 remote server*.
 
 When looking at the logs and the system:
 tail /var/log/messages
 Mar 19 15:52:47 galaxy abrt[25605]: Write error: No space left on device
 Mar 19 15:52:49 galaxy abrt[25605]: Error writing 
 '/var/spool/abrt/ccpp-2013-03-19-15:52:37-13394.new/coredump'
 
 So I watched my system when I repeated the export, and saw that Galaxy 
 fills up the root directory (/) instead of any temporary directory.
 
 Does somebody have an idea where to adjust this setting, so that the export 
 function uses a temporary directory?
 
 
 Thanks,
 Joachim
 
 --
 Joachim Jacob
 
 Rijvisschestraat 120, 9052 Zwijnaarde
 Tel: +32 9 244.66.34
 Bioinformatics Training and Services (BITS)
 http://www.bits.vib.be
 @bitsatvib
 
 ___
 Please keep all replies on the list by using reply all
 in your mail client.  To manage your subscriptions to this
 and other Galaxy lists, please use the interface at:
 
 http://lists.bx.psu.edu/
 
 
 


___
Please keep all replies on the list by using reply all
in your mail client.  To manage your subscriptions to this
and other Galaxy lists, please use the interface at:
  http://lists.bx.psu.edu/

To search Galaxy mailing lists use the unified search at:
  http://galaxyproject.org/search/mailinglists/


Re: [galaxy-dev] Exporting histories fails: no space left on device

2013-03-25 Thread Assaf Gordon
Hello Joachim,

Couple of things to check:

 On Mar 25, 2013, at 10:01 AM, Joachim Jacob | VIB | wrote:
 
 Hi,

 About the exporting of history, which fails:
 1. the preparation seems to work fine: meaning: choosing 'Export this 
 history' in the History menu leads to a URL that reports initially that the 
 export is still in progress.

 2. when the export is finished, and I click the download link, the  root 
 partition fills and the browser displays Error reading from remote server. 
 A folder ccpp-2013-03-25-14:51:15-27045.new is created in the directory 
 /var/spool/abrt, which fills the root partition.

Something in your export is likely not finishing cleanly, but crashing instead 
(either the creation of the archive, or the download).

The folder /var/spool/abrt/ccpp- (and especially a file named coredump) 
hints that the program crashed.
abrt is a daemon (at least on Fedora) that monitors crashes and tries to keep 
all relevant information about the program which crashed 
(http://docs.fedoraproject.org/en-US/Fedora/13/html/Deployment_Guide/ch-abrt.html).

So what might have happened is that a program (galaxy's export_history.py or 
other) crashed during your export, and then abrt picked up the pieces 
(storing a memory dump, for example), and then filled your disk.
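
Incidentally, if you keep abrt enabled, you can cap how much disk it is allowed to use, so one python coredump cannot fill / again. MaxCrashReportsSize is a standard abrt.conf option, though the value below is an arbitrary example:
===
# /etc/abrt/abrt.conf (sketch; value in MB, 1000 is arbitrary)
MaxCrashReportsSize = 1000
===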


 The handler reports in its log:
 
 galaxy.jobs DEBUG 2013-03-25 14:38:33,322 (8318) Working directory for job 
 is: /mnt/galaxydb/job_working_directory/008/8318
 galaxy.jobs.handler DEBUG 2013-03-25 14:38:33,322 dispatching job 8318 to 
 local runner
 galaxy.jobs.handler INFO 2013-03-25 14:38:33,368 (8318) Job dispatched
 galaxy.jobs.runners.local DEBUG 2013-03-25 14:38:33,432 Local runner: 
 starting job 8318
 galaxy.jobs.runners.local DEBUG 2013-03-25 14:38:33,572 executing: python 
 /home/galaxy/galaxy-dist/lib/galaxy/tools/imp_exp/export_history.py -G 
 /mnt/galaxytemp/tmpHAEokb/tmpQM6g_R /mnt/galaxytemp/tmpHAEokb/tmpeg7bYF 
 /mnt/galaxytemp/tmpHAEokb/tmpPXJ245 /mnt/galaxydb/files/013/dataset_13993.dat
 galaxy.jobs.runners.local DEBUG 2013-03-25 14:41:29,420 execution finished: 
 python /home/galaxy/galaxy-dist/lib/galaxy/tools/imp_exp/export_history.py 
 -G /mnt/galaxytemp/tmpHAEokb/tmpQM6g_R /mnt/galaxytemp/tmpHAEokb/tmpeg7bYF 
 /mnt/galaxytemp/tmpHAEokb/tmpPXJ245 /mnt/galaxydb/files/013/dataset_13993.dat
 galaxy.jobs DEBUG 2013-03-25 14:41:29,476 Tool did not define exit code or 
 stdio handling; checking stderr for success
 galaxy.tools DEBUG 2013-03-25 14:41:29,530 Error opening galaxy.json file: 
 [Errno 2] No such file or directory: 
 '/mnt/galaxydb/job_working_directory/008/8318/galaxy.json'
 galaxy.jobs DEBUG 2013-03-25 14:41:29,555 job 8318 ended
 

 The system reports:
 
 Mar 25 14:51:26 galaxy abrt[16805]: Write error: No space left on device
 Mar 25 14:51:27 galaxy abrt[16805]: Error writing 
 '/var/spool/abrt/ccpp-2013-03-25-14:51:15-27045.new/coredump'
 


One thing to try: if you have galaxy keeping temporary files, try running the 
export command manually:
===
python /home/galaxy/galaxy-dist/lib/galaxy/tools/imp_exp/export_history.py -G 
/mnt/galaxytemp/tmpHAEokb/tmpQM6g_R /mnt/galaxytemp/tmpHAEokb/tmpeg7bYF 
/mnt/galaxytemp/tmpHAEokb/tmpPXJ245 /mnt/galaxydb/files/013/dataset_13993.dat
===
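
When running it by hand, you could also lift the shell's core file limit first, so a crash leaves a core right there even with abrt disabled (my suggestion, assuming your limits allow it; the '...' below is shorthand for the logged arguments, not part of the command):
===
$ ulimit -c unlimited    # lift the core file size limit for this shell
$ python /home/galaxy/galaxy-dist/lib/galaxy/tools/imp_exp/export_history.py -G ...
$ echo "exit status: $?"
===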

Another thing to try: modify export_history.py, adding debug messages to 
track progress and whether it finishes or not.

And: check the abrt program's GUI, perhaps you'll see previous crashes that 
were stored successfully, providing more information about which program 
crashed.


As a general rule, it's best to keep the /var directory on a separate 
partition for production systems, exactly so that filling it up with junk 
wouldn't interfere with other programs.
Even better, give each sub-directory of /var a dedicated partition, so that 
filling up /var/log or /var/spool would not fill up /var/lib/pgsql and 
stop Postgres from working.


-gordon


___
Please keep all replies on the list by using reply all
in your mail client.  To manage your subscriptions to this
and other Galaxy lists, please use the interface at:
  http://lists.bx.psu.edu/

To search Galaxy mailing lists use the unified search at:
  http://galaxyproject.org/search/mailinglists/


[galaxy-dev] Exporting histories fails: no space left on device

2013-03-19 Thread Joachim Jacob | VIB |

Hi all,


Exporting histories fails on our server: Reason: *Error reading from 
remote server*.


When looking at the logs and the system:
tail /var/log/messages
Mar 19 15:52:47 galaxy abrt[25605]: Write error: No space left on device
Mar 19 15:52:49 galaxy abrt[25605]: Error writing 
'/var/spool/abrt/ccpp-2013-03-19-15:52:37-13394.new/coredump'


So I watched my system when I repeated the export, and saw that Galaxy 
fills up the root directory (/) instead of any temporary directory.


Does somebody have an idea where to adjust this setting, so that the 
export function uses a temporary directory?
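
My only guess at the knob so far (unverified) is the new_file_path setting in universe_wsgi.ini, which is where Galaxy's temporary files are supposed to go:
===
# universe_wsgi.ini (guess): point Galaxy's temporary files at the
# data partition instead of the root filesystem.
new_file_path = /mnt/galaxytemp
===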



Thanks,
Joachim

--
Joachim Jacob

Rijvisschestraat 120, 9052 Zwijnaarde
Tel: +32 9 244.66.34
Bioinformatics Training and Services (BITS)
http://www.bits.vib.be
@bitsatvib

___
Please keep all replies on the list by using reply all
in your mail client.  To manage your subscriptions to this
and other Galaxy lists, please use the interface at:

 http://lists.bx.psu.edu/


Re: [galaxy-dev] Exporting histories fails: no space left on device

2013-03-19 Thread Jeremy Goecks
I'm unable to reproduce this behavior using a clean version of galaxy-dist. The 
code (export_history.py) doesn't create any temporary files and appears to 
write directly to the output file, so it seems unlikely that Galaxy is writing 
anything to the root directory.

Can you provide the name of any file that Galaxy appears to be writing to 
outside of galaxy-home? What about watching the job output file/export file 
to see if that's increasing in size and causing the out-of-space error?

Best,
J.

On Mar 19, 2013, at 10:56 AM, Joachim Jacob | VIB | wrote:

 Hi all,
 
 
 Exporting histories fails on our server: Reason: *Error reading from remote 
 server*.
 
 When looking at the logs and the system:
 tail /var/log/messages
 Mar 19 15:52:47 galaxy abrt[25605]: Write error: No space left on device
 Mar 19 15:52:49 galaxy abrt[25605]: Error writing 
 '/var/spool/abrt/ccpp-2013-03-19-15:52:37-13394.new/coredump'
 
 So I watched my system when I repeated the export, and saw that Galaxy fills 
 up the root directory (/) instead of any temporary directory.
 
 Does somebody have an idea where to adjust this setting, so that the export 
 function uses a temporary directory?
 
 
 Thanks,
 Joachim
 
 -- 
 Joachim Jacob
 
 Rijvisschestraat 120, 9052 Zwijnaarde
 Tel: +32 9 244.66.34
 Bioinformatics Training and Services (BITS)
 http://www.bits.vib.be
 @bitsatvib
 
 ___
 Please keep all replies on the list by using reply all
 in your mail client.  To manage your subscriptions to this
 and other Galaxy lists, please use the interface at:
 
 http://lists.bx.psu.edu/


___
Please keep all replies on the list by using reply all
in your mail client.  To manage your subscriptions to this
and other Galaxy lists, please use the interface at:

  http://lists.bx.psu.edu/