Re: [galaxy-dev] Corner case in task splitter - merging zero files

2012-10-22 Thread Peter Cock
On Fri, Oct 19, 2012 at 8:57 PM, Scott McManus scottmcma...@gatech.edu wrote:

 OK, it's in. Thanks again! I will add a to-do item to put output-merge
 messages into stdout so that they're more visible.

 -Scott

Great, thanks.

I see Edward Kirton had already reported the underlying problem that
triggered this on our system, namely that "Job output not returned from
cluster" is not being treated as an error condition:

https://trello.com/card/813-drmaa-py-job-output-not-returned-from-cluster-should-also-set-exit-code-str-to-non-zero-value/50686d0302dfa79d13d90c45/257

(The markup imported from Bitbucket seems to have been garbled, but the
gist of the report is understandable.)
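The change that report asks for can be sketched roughly as follows. This is a hypothetical illustration only: JobState, finish_job and the choice of 1 as the forced exit code are my own assumptions, not Galaxy's actual drmaa.py code. The idea is that when the runner logs "Job output not returned from cluster", the recorded exit code must no longer be 0, so the task splitter treats that task as failed instead of going on to merge its missing output.

```python
# Hypothetical sketch: JobState and finish_job are illustrative names,
# not part of Galaxy's drmaa.py job runner.
from dataclasses import dataclass


@dataclass
class JobState:
    job_id: str
    exit_code: int = 0
    stderr: str = ""


def finish_job(state: JobState, output_returned: bool, reported_exit_code: int) -> JobState:
    """Record a finished cluster job, treating missing output as a failure."""
    state.exit_code = reported_exit_code
    if not output_returned:
        state.stderr += "Job output not returned from cluster\n"
        # Assumption: force a non-zero code (1) if the scheduler reported 0,
        # so downstream code such as the task merge sees an error.
        if state.exit_code == 0:
            state.exit_code = 1
    return state
```

A genuine non-zero code from the scheduler is left untouched; only the misleading "exit code: 0" case is overridden.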

Peter
___
Please keep all replies on the list by using reply all
in your mail client.  To manage your subscriptions to this
and other Galaxy lists, please use the interface at:

  http://lists.bx.psu.edu/


Re: [galaxy-dev] Corner case in task splitter - merging zero files

2012-10-19 Thread Peter Cock
On Thu, Oct 18, 2012 at 5:19 PM, Scott McManus scottmcma...@gatech.edu wrote:

 Hey Peter-

 Thanks - I'll look into it. If you're able to reproduce the problem easily
 and wouldn't mind crafting a pull request, then it would be much
 appreciated. Otherwise I'll put this on my to-do list to be done soon.
 I or someone else may want to revisit the exception handling to prevent
 that from happening.

 Thanks!

 -Scott

OK then:
https://bitbucket.org/galaxy/galaxy-central/pull-request/78/avoid-stall-when-merging-zero-files-fao/diff

I can explain what was happening: We had a mount problem. The
Galaxy server could talk to SGE and submit jobs, but when the
jobs came to run, the mount providing their home directory and
the Galaxy file system was down, so they failed. Naturally this
meant Galaxy got no output files back.

Reading the code, I see that you deliberately attempt to merge any
files present (e.g. if 9 out of 10 come back). That does make sense,
as the partial output could be instructive (as long as it is flagged
as an error, which doesn't seem to be happening).

I think getting zero files back from the split jobs ought to be an
error condition. In fact, failing to get all the expected sub-files
back should also be an error condition (although it is still nice
to do the merge so the user can see the partial output).

I think a little refactoring might be needed to treat these
explicitly as errors.
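A rough sketch of what treating these explicitly as errors might look like on the caller side, before merging. The names (check_before_merge, MergeError) and the message wording are hypothetical, not existing Galaxy code: zero returned files is a hard error, while a partial set is still returned for merging, together with an error message the caller can use to flag the job as failed.

```python
import os
from typing import List, Optional, Tuple


class MergeError(Exception):
    """Hypothetical exception: none of the split tasks returned output."""


def check_before_merge(expected_files: List[str]) -> Tuple[List[str], Optional[str]]:
    """Return (files_to_merge, error_message) for the expected task outputs.

    Raises MergeError if no files came back at all. If only some came back,
    the partial list is still returned (so the user can see partial output)
    together with an error message; the message is None when all are present.
    """
    present = [f for f in expected_files if os.path.exists(f)]
    if not present:
        raise MergeError("No task output files returned (expected %d)"
                         % len(expected_files))
    error = None
    if len(present) < len(expected_files):
        error = "Only %d of %d task output files returned" % (
            len(present), len(expected_files))
    return present, error
```

Keeping the "merge anyway" and "report an error" decisions separate is what lets partial output stay visible while the job is still marked as failed.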

Regards,

Peter


Re: [galaxy-dev] Corner case in task splitter - merging zero files

2012-10-19 Thread Scott McManus

OK, it's in. Thanks again! I will add a to-do item to put output-merge messages
into stdout so that they're more visible.

-Scott

- Original Message -
 
 Thanks, Peter! I'll get to it this afternoon EDT.
 
 -Scott
 


[galaxy-dev] Corner case in task splitter - merging zero files

2012-10-18 Thread Peter Cock
Hi Scott,

After some hard drives failed, I'm rebuilding our Galaxy server.
Something isn't quite right with our cluster integration yet, but it has
exposed a problem in Galaxy's handling of task splitting: it can
sometimes attempt to merge zero files.

Here is my fix for the BLAST XML format (now in the ToolShed):
https://bitbucket.org/peterjc/galaxy-central/changeset/5cb6411bad19802ba4001a083164366b42850a48

Here's an example using the text format:

galaxy.jobs.splitters.multi ERROR 2012-10-18 16:26:21,330 Error merging files
Traceback (most recent call last):
  File "/mnt/galaxy/galaxy-central/lib/galaxy/jobs/splitters/multi.py", line 133, in do_merge
    output_type.merge(output_files, output_file_name)
  File "/mnt/galaxy/galaxy-central/lib/galaxy/datatypes/data.py", line 545, in merge
    raise Exception('Result %s from %s' % (result, cmd))
Exception: Result 2 from cat  > /mnt/galaxy/galaxy-central/database/files/000/dataset_304.dat

The problem is that while "cat file1 ... fileN > merged" works fine
for one or more files, with no files cat sits waiting for stdin (and,
from the user's perspective, the job stalls).
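To see why zero files stalls, and one defensive option: run without a shell and with stdin redirected from /dev/null, a bare cat gets end-of-file straight away and exits with an empty output instead of blocking. This is a sketch of my own using Python's subprocess module, not how Galaxy currently runs the merge, and it only turns a hang into a silent empty result, so an explicit zero-file check is still needed.

```python
import subprocess
from typing import List


def cat_merge(split_files: List[str], output_file: str) -> int:
    """Concatenate split_files into output_file with cat, without a shell.

    stdin comes from /dev/null, so with an empty split_files list cat reads
    EOF immediately and writes an empty output_file rather than hanging.
    Passing a list (not a string) also avoids shell quoting problems.
    """
    with open(output_file, "wb") as out:
        result = subprocess.run(["cat"] + list(split_files),
                                stdin=subprocess.DEVNULL, stdout=out)
    return result.returncode
```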

This logic error is in the merge method in lib/galaxy/datatypes/data.py,
which could treat zero files either as an error or as a no-op:

    if len(split_files) == 1:
        cmd = 'mv -f %s %s' % ( split_files[0], output_file )
    else:
        cmd = 'cat %s > %s' % ( ' '.join(split_files), output_file )
    result = os.system(cmd)
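Restating the command-building logic above as a small function (build_merge_cmd is a hypothetical helper, written only for illustration) makes the zero-file case concrete: with an empty list the join produces an empty string, leaving a bare cat with only an output redirect, which matches the bare cat command seen in the traceback.

```python
from typing import List


def build_merge_cmd(split_files: List[str], output_file: str) -> str:
    """Mirror the original merge() command construction (illustration only)."""
    if len(split_files) == 1:
        return 'mv -f %s %s' % (split_files[0], output_file)
    # Zero files also lands here: ' '.join([]) is '', giving 'cat  > output'.
    return 'cat %s > %s' % (' '.join(split_files), output_file)
```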

I think the merge logic should be something like this:

    if not split_files:
        raise Exception('Asked to merge zero files')
    elif len(split_files) == 1:
        cmd = 'mv -f %s %s' % ( split_files[0], output_file )
    else:
        cmd = 'cat %s > %s' % ( ' '.join(split_files), output_file )
    result = os.system(cmd)
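For comparison, the same guard could be written without shelling out at all. This is a sketch under my own assumptions (it raises ValueError rather than a bare Exception, and uses shutil in place of mv and cat), not a patch against data.py:

```python
import shutil
from typing import List


def merge(split_files: List[str], output_file: str) -> None:
    """Merge split_files into output_file, refusing to merge zero files."""
    if not split_files:
        raise ValueError("Asked to merge zero files")
    if len(split_files) == 1:
        # Roughly 'mv -f': move the single part into place.
        shutil.move(split_files[0], output_file)
        return
    # Roughly 'cat a b ... > output_file', streaming each part in order.
    with open(output_file, "wb") as out:
        for name in split_files:
            with open(name, "rb") as part:
                shutil.copyfileobj(part, out)
```

With no subprocess involved nothing can block on stdin, and failures surface as Python exceptions rather than os.system status codes.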

It might also make sense to check for zero files in the code which
calls the merge, i.e. the do_merge function in lib/galaxy/jobs/splitters/multi.py.
I'm still investigating upstream how this comes about; here is one clue:

galaxy.jobs.runners.drmaa DEBUG 2012-10-18 16:25:01,930 (273/510) state change: job is running
galaxy.jobs.runners.drmaa DEBUG 2012-10-18 16:25:03,040 (273/510) state change: job finished, but failed
galaxy.jobs.runners.drmaa DEBUG 2012-10-18 16:25:03,074 Job output not returned from cluster
galaxy.jobs DEBUG 2012-10-18 16:25:03,074 task 641 for job 273 ended; exit code: 0
galaxy.jobs DEBUG 2012-10-18 16:25:03,148 task 641 ended
galaxy.jobs.runners.tasks DEBUG 2012-10-18 16:25:05,169 execution finished - beginning merge: tblastx -query /mnt/galaxy/galaxy-central/database/files/000/dataset_127.dat   -db /var/local/blast/ncbi/nt -query_gencode 2 -evalue 0.001 -out /mnt/galaxy/galaxy-central/database/files/000/dataset_304.dat -outfmt 0 -num_threads 8
galaxy.jobs.splitters.multi DEBUG 2012-10-18 16:25:05,181 files []

If you would prefer that small suggestion as a pull request, let me know.

Regards,

Peter


Re: [galaxy-dev] Corner case in task splitter - merging zero files

2012-10-18 Thread Scott McManus

Hey Peter-

Thanks - I'll look into it. If you're able to reproduce the problem easily
and wouldn't mind crafting a pull request, then it would be much 
appreciated. Otherwise I'll put this on my to-do list to be done soon.
I or someone else may want to revisit the exception handling to prevent
that from happening.

Thanks!

-Scott
