Re: [galaxy-dev] job_wrapper.finish - No JSON object could be decoded:

2014-03-12 Thread Peter Cock
On Tue, Mar 11, 2014 at 4:04 PM, Peter Cock p.j.a.c...@googlemail.com wrote:
 On Tue, Mar 11, 2014 at 11:44 AM, Peter Cock p.j.a.c...@googlemail.com 
 wrote:
 Hi all,

 Some of our split BLAST jobs using the parallelism feature have
 been failing (apparently for a while, so this is not a recent problem
 from changes in the BLAST wrappers) as follows:

 quote
 The Galaxy framework encountered the following error while attempting
 to run the tool:

 Traceback (most recent call last):
   File /mnt/galaxy/galaxy-dist/lib/galaxy/jobs/runners/tasks.py,
 line 141, in queue_job
 job_wrapper.finish( stdout, stderr, job_exit_code )
   File /mnt/galaxy/galaxy-dist/lib/galaxy/jobs/__init__.py, line
 962, in finish
 if ( not 
 self.external_output_metadata.external_metadata_set_successfully(
 dataset, self.sa_session ) and
 self.app.config.retry_metadata_internally ):
   File /mnt/galaxy/galaxy-dist/lib/galaxy/datatypes/metadata.py,
 line 638, in external_metadata_set_successfully
 rval, rstring = simplejson.load( open(
 metadata_files.filename_results_code ) )
   File 
 /mnt/galaxy/galaxy-dist/eggs/simplejson-2.1.1-py2.6-linux-x86_64-ucs4.egg/simplejson/__init__.py,
 line 328, in load
 use_decimal=use_decimal, **kw)
   File 
 /mnt/galaxy/galaxy-dist/eggs/simplejson-2.1.1-py2.6-linux-x86_64-ucs4.egg/simplejson/__init__.py,
 line 384, in loads
 return _default_decoder.decode(s)
   File 
 /mnt/galaxy/galaxy-dist/eggs/simplejson-2.1.1-py2.6-linux-x86_64-ucs4.egg/simplejson/decoder.py,
 line 402, in decode
 obj, end = self.raw_decode(s, idx=_w(s, 0).end())
   File 
 /mnt/galaxy/galaxy-dist/eggs/simplejson-2.1.1-py2.6-linux-x86_64-ucs4.egg/simplejson/decoder.py,
 line 420, in raw_decode
 raise JSONDecodeError(No JSON object could be decoded, s, idx)
 JSONDecodeError: No JSON object could be decoded: line 1 column 0 (char 0)

 Tool execution generated the following error message:

 Unable to finish job
 /quote

 Reading the code, this is breaking after all the child jobs have
 finished, and the data has been merged, during the external
 set metadata step.

 This specific example was BLAST XML output, and the merge
 had produced a 4.2 GB file which ended mid record (evidently
 the output merge was not really successful).

 However, as a partial XML file, it would be invalid and might
 this cause problems with something in the metadata setting
 code - and explain the reported error?

 Peter

 Clues from my log:

 2014-03-11T07:11:16.635421+00:00 ppserver galaxy.jobs.splitters.multi
 ERROR 2014-03-11 07:11:16,563 Error merging files#012Traceback (most
 recent call last):#012  File
 /mnt/galaxy/galaxy-dist/lib/galaxy/jobs/splitters/multi.py, line
 153, in do_merge#012output_type.merge(output_files,
 output_file_name, **extra_merge_args)#012  File
 /mnt/galaxy/galaxy-dist/lib/galaxy/datatypes/data.py, line 572, in
 merge#012shutil.copyfileobj(open(fsrc, 'rb'), fdst)#012  File
 /usr/lib64/python2.6/shutil.py, line 31, in copyfileobj#012
 fdst.write(buf)#012IOError: [Errno 28] No space left on device

 2014-03-11T07:11:25.594665+00:00 ppserver galaxy.jobs.runners.tasks
 ERROR 2014-03-11 07:11:25,558 Job wrapper finish method
 failed#012Traceback (most recent call last):#012  File
 /mnt/galaxy/galaxy-dist/lib/galaxy/jobs/runners/tasks.py, line 141,
 in queue_job#012job_wrapper.finish( stdout, stderr, job_exit_code
 )#012  File /mnt/galaxy/galaxy-dist/lib/galaxy/jobs/__init__.py,
 line 962, in finish#012if ( not
 self.external_output_metadata.external_metadata_set_successfully(
 dataset, self.sa_session ) and
 self.app.config.retry_metadata_internally ):#012  File
 /mnt/galaxy/galaxy-dist/lib/galaxy/datatypes/metadata.py, line 638,
 in external_metadata_set_successfully#012rval, rstring =
 simplejson.load( open( metadata_files.filename_results_code ) )#012
 File 
 /mnt/galaxy/galaxy-dist/eggs/simplejson-2.1.1-py2.6-linux-x86_64-ucs4.egg/simplejson/__init__.py,
 line 328, in load#012use_decimal=use_decimal, **kw)#012  File
 /mnt/galaxy/galaxy-dist/eggs/simplejson-2.1.1-py2.6-linux-x86_64-ucs4.egg/simplejson/__init__.py,
 line 384, in loads#012return _default_decoder.decode(s)#012  File
 /mnt/galaxy/galaxy-dist/eggs/simplejson-2.1.1-py2.6-linux-x86_64-ucs4.egg/simplejson/decoder.py,
 line 402, in decode#012obj, end = self.raw_decode(s, idx=_w(s,
 0).end())#012  File
 /mnt/galaxy/galaxy-dist/eggs/simplejson-2.1.1-py2.6-linux-x86_64-ucs4.egg/simplejson/decoder.py,
 line 420, in raw_decode#012raise JSONDecodeError(No JSON object
 could be decoded, s, idx)#012JSONDecodeError: No JSON object could be
 decoded: line 1 column 0 (char 0)

 So this looks like the root cause was a full partition, but Galaxy handled
 this poorly (showing a JSON error rather than disk full).

 This proposed patch should give a more helpful error message from the
 metadata script, which is a start:

 $ hg diff lib/galaxy/datatypes/metadata.py
 diff -r 26f58e05aa10 lib/galaxy/datatypes/metadata.py
 --- 

Re: [galaxy-dev] job_wrapper.finish - No JSON object could be decoded:

2014-03-11 Thread Peter Cock
On Tue, Mar 11, 2014 at 11:44 AM, Peter Cock p.j.a.c...@googlemail.com wrote:
 Hi all,

 Some of our split BLAST jobs using the parallelism feature have
 been failing (apparently for a while, so this is not a recent problem
 from changes in the BLAST wrappers) as follows:

 quote
 The Galaxy framework encountered the following error while attempting
 to run the tool:

 Traceback (most recent call last):
   File /mnt/galaxy/galaxy-dist/lib/galaxy/jobs/runners/tasks.py,
 line 141, in queue_job
 job_wrapper.finish( stdout, stderr, job_exit_code )
   File /mnt/galaxy/galaxy-dist/lib/galaxy/jobs/__init__.py, line
 962, in finish
 if ( not self.external_output_metadata.external_metadata_set_successfully(
 dataset, self.sa_session ) and
 self.app.config.retry_metadata_internally ):
   File /mnt/galaxy/galaxy-dist/lib/galaxy/datatypes/metadata.py,
 line 638, in external_metadata_set_successfully
 rval, rstring = simplejson.load( open(
 metadata_files.filename_results_code ) )
   File 
 /mnt/galaxy/galaxy-dist/eggs/simplejson-2.1.1-py2.6-linux-x86_64-ucs4.egg/simplejson/__init__.py,
 line 328, in load
 use_decimal=use_decimal, **kw)
   File 
 /mnt/galaxy/galaxy-dist/eggs/simplejson-2.1.1-py2.6-linux-x86_64-ucs4.egg/simplejson/__init__.py,
 line 384, in loads
 return _default_decoder.decode(s)
   File 
 /mnt/galaxy/galaxy-dist/eggs/simplejson-2.1.1-py2.6-linux-x86_64-ucs4.egg/simplejson/decoder.py,
 line 402, in decode
 obj, end = self.raw_decode(s, idx=_w(s, 0).end())
   File 
 /mnt/galaxy/galaxy-dist/eggs/simplejson-2.1.1-py2.6-linux-x86_64-ucs4.egg/simplejson/decoder.py,
 line 420, in raw_decode
 raise JSONDecodeError(No JSON object could be decoded, s, idx)
 JSONDecodeError: No JSON object could be decoded: line 1 column 0 (char 0)

 Tool execution generated the following error message:

 Unable to finish job
 /quote

 Reading the code, this is breaking after all the child jobs have
 finished, and the data has been merged, during the external
 set metadata step.

 This specific example was BLAST XML output, and the merge
 had produced a 4.2 GB file which ended mid record (evidently
 the output merge was not really successful).

 However, as a partial XML file, it would be invalid and might
 this cause problems with something in the metadata setting
 code - and explain the reported error?

 Peter

Clues from my log:

2014-03-11T07:11:16.635421+00:00 ppserver galaxy.jobs.splitters.multi
ERROR 2014-03-11 07:11:16,563 Error merging files#012Traceback (most
recent call last):#012  File
/mnt/galaxy/galaxy-dist/lib/galaxy/jobs/splitters/multi.py, line
153, in do_merge#012output_type.merge(output_files,
output_file_name, **extra_merge_args)#012  File
/mnt/galaxy/galaxy-dist/lib/galaxy/datatypes/data.py, line 572, in
merge#012shutil.copyfileobj(open(fsrc, 'rb'), fdst)#012  File
/usr/lib64/python2.6/shutil.py, line 31, in copyfileobj#012
fdst.write(buf)#012IOError: [Errno 28] No space left on device

2014-03-11T07:11:25.594665+00:00 ppserver galaxy.jobs.runners.tasks
ERROR 2014-03-11 07:11:25,558 Job wrapper finish method
failed#012Traceback (most recent call last):#012  File
/mnt/galaxy/galaxy-dist/lib/galaxy/jobs/runners/tasks.py, line 141,
in queue_job#012job_wrapper.finish( stdout, stderr, job_exit_code
)#012  File /mnt/galaxy/galaxy-dist/lib/galaxy/jobs/__init__.py,
line 962, in finish#012if ( not
self.external_output_metadata.external_metadata_set_successfully(
dataset, self.sa_session ) and
self.app.config.retry_metadata_internally ):#012  File
/mnt/galaxy/galaxy-dist/lib/galaxy/datatypes/metadata.py, line 638,
in external_metadata_set_successfully#012rval, rstring =
simplejson.load( open( metadata_files.filename_results_code ) )#012
File 
/mnt/galaxy/galaxy-dist/eggs/simplejson-2.1.1-py2.6-linux-x86_64-ucs4.egg/simplejson/__init__.py,
line 328, in load#012use_decimal=use_decimal, **kw)#012  File
/mnt/galaxy/galaxy-dist/eggs/simplejson-2.1.1-py2.6-linux-x86_64-ucs4.egg/simplejson/__init__.py,
line 384, in loads#012return _default_decoder.decode(s)#012  File
/mnt/galaxy/galaxy-dist/eggs/simplejson-2.1.1-py2.6-linux-x86_64-ucs4.egg/simplejson/decoder.py,
line 402, in decode#012obj, end = self.raw_decode(s, idx=_w(s,
0).end())#012  File
/mnt/galaxy/galaxy-dist/eggs/simplejson-2.1.1-py2.6-linux-x86_64-ucs4.egg/simplejson/decoder.py,
line 420, in raw_decode#012raise JSONDecodeError(No JSON object
could be decoded, s, idx)#012JSONDecodeError: No JSON object could be
decoded: line 1 column 0 (char 0)

So this looks like the root cause was a full partition, but Galaxy handled
this poorly (showing a JSON error rather than disk full).

This proposed patch should give a more helpful error message from the
metadata script, which is a start:

$ hg diff lib/galaxy/datatypes/metadata.py
diff -r 26f58e05aa10 lib/galaxy/datatypes/metadata.py
--- a/lib/galaxy/datatypes/metadata.pyMon Nov 04 14:58:55 2013 -0500
+++ b/lib/galaxy/datatypes/metadata.pyTue Mar 11