[galaxy-dev] metadata in parallelization

2015-04-17 Thread Roberto Alonso
Hello,

I am writing some code to enable parallelization for some tool wrappers.
First, I did it for a simple bwa wrapper, but now I am modifying
toolshed.g2.bx.psu.edu/repos/devteam/bwa/c71dd035971e/bwa/bwa-mem.xml to
check whether the code works with this wrapper. So I wrote the code that
I think is needed to merge BAM files, and I added the
parallelism tag (in bold) to the config file:

<tool id="bwa_mem" name="BWA-MEM" version="0.1">

  <macros>
    <import>bwa_macros.xml</import>
  </macros>

  <requirements>
    <requirement type="package" version="0.7.10.039ea20639">bwa</requirement>
    <requirement type="package" version="1.1">samtools</requirement>
  </requirements>
  <description>- map medium and long reads (&gt; 100 bp) against reference genome</description>
  *<parallelism method="multi" split_size="3" shared_inputs="ref_file"
    split_mode="number_of_parts" merge_outputs="bam_output"
    split_inputs="fastq_input1,fastq_input2"></parallelism>*


  <command>
...

So everything works well, and the resulting BAM is the same with and
without parallelization, but the Galaxy log shows an error about
metadata, something like this:

galaxy.jobs.splitters.multi DEBUG 2015-04-17 09:54:58,335 merge finished:
/home/ralonso/galaxy/database/files/000/dataset_198.dat
galaxy.jobs.runners.tasks DEBUG 2015-04-17 09:54:58,473 executing external
set_meta script for job 200: python
/home/ralonso/galaxy/database/tmp/set_metadata_E5fGIE.py
/home/ralonso/galaxy/database/tmp/tmpHS8Byo
/home/ralonso/galaxy/database/job_working_directory/000/200/galaxy.json
/home/ralonso/galaxy/database/tmp/metadata_in_HistoryDatasetAssociation_198_yOGiQG,/home/ralonso/galaxy/database/tmp/metadata_kwds_HistoryDatasetAssociation_198_nAsQoq,/home/ralonso/galaxy/database/tmp/metadata_out_HistoryDatasetAssociation_198_I_cLs4,/home/ralonso/galaxy/database/tmp/metadata_results_HistoryDatasetAssociation_198_qhjzoV,/home/ralonso/galaxy/database/files/000/dataset_198.dat,/home/ralonso/galaxy/database/tmp/metadata_override_HistoryDatasetAssociation_198_ScKLqH
Traceback (most recent call last):
  File "/home/ralonso/galaxy/database/tmp/set_metadata_E5fGIE.py", line 1, in <module>
    from galaxy_ext.metadata.set_metadata import set_metadata; set_metadata()
ImportError: No module named galaxy_ext.metadata.set_metadata
galaxy.jobs.runners.tasks DEBUG 2015-04-17 09:54:58,624 execution of
external set_meta finished for job 200
*galaxy.datatypes.metadata DEBUG 2015-04-17 09:54:58,714 setting metadata
externally failed for HistoryDatasetAssociation 198: External set_meta()
not called*

Without parallelization there is no problem, because Galaxy does not
execute this part of the code at all.
I see that Galaxy has to do something with the metadata attributes, but what is
it trying to do? Is there any way to solve this?
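The traceback says the external set_metadata script could not import `galaxy_ext`. One plausible cause (an assumption, not confirmed in this thread) is that the process running the script does not have Galaxy's `lib/` directory on its module search path. A minimal diagnostic sketch; `can_import` is a hypothetical helper and `/home/ralonso/galaxy/lib` is an assumed install location:

```python
import importlib.util
import sys


def can_import(module_name, extra_paths=()):
    """Return True if module_name can be located, optionally after
    temporarily prepending extra_paths to sys.path.

    Hypothetical diagnostic helper, not part of Galaxy.
    """
    saved_path = list(sys.path)
    try:
        sys.path[:0] = list(extra_paths)
        return importlib.util.find_spec(module_name) is not None
    except ImportError:
        # Raised when a parent package (e.g. 'galaxy_ext') cannot be found.
        return False
    finally:
        # Restore the original search path whatever happened.
        sys.path[:] = saved_path


if __name__ == "__main__":
    name = "galaxy_ext.metadata.set_metadata"
    print("importable as-is:", can_import(name))
    # Assumed location of Galaxy's lib/ directory; adjust to your install.
    print("importable with lib/:", can_import(name, ["/home/ralonso/galaxy/lib"]))
```

If the second check succeeds and the first fails, the task runner's environment (for example its PYTHONPATH) would be the next thing to look at.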

Thank you very much

Regards,
Roberto
___
Please keep all replies on the list by using reply all
in your mail client.  To manage your subscriptions to this
and other Galaxy lists, please use the interface at:
  https://lists.galaxyproject.org/

To search Galaxy mailing lists use the unified search at:
  http://galaxyproject.org/search/mailinglists/

[galaxy-dev] join bam results in one file

2015-03-03 Thread Roberto Alonso CIPF
Hello,

I am playing with Galaxy's splitter capabilities. After some cases that
you helped me solve, I am facing a new issue. This may be due to my
tool configuration file, but in any case here is what I've done.
What I would like to do exactly is to split paired FASTQ files, map them and
then join the results. This is my configuration file:

<tool id="bwa_mio" name="map with bwa">
  <description>map with bwa</description>
  <parallelism method="basic" split_size="3" split_mode="number_of_parts"
    merge_outputs="output"></parallelism>

  <command>
  bwa mem -R '@RG\tID:foo\tSM:bar'
    /home/ralonso/BiB/Galaxy/data/Cclementina_v1.0_scaffolds.fa $input &gt;
    temporary_bam_file.sam 2&gt;/dev/null ;
  samtools view -Sb temporary_bam_file.sam &gt; temporary_bam_file.bam ;
  samtools sort temporary_bam_file.bam $output ;
  </command>
  <inputs>
    <param format="fastqsanger" name="input" type="data" label="fastq"/>
  </inputs>
  <outputs>
    <data format="bam" name="output" />
  </outputs>

  <help>
  bwa
  </help>
</tool>

My problem with this configuration is that it generates an empty file. After
looking at the code, I discovered that when Galaxy tries to join the several BAM
files it falls back to the first parent class, *class Data( object )*, and its
method *def merge( split_files, output_file)*. I may be wrong, but I
think the binary.Bam class should override this method; is this right? If
so, I would like to implement it. I have a couple of basic
ideas, such as merging them with samtools. What do you think?
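The samtools idea could look roughly like the following sketch. It is written under assumptions: it mirrors the `merge(split_files, output_file)` signature mentioned above as a plain function (hypothetical, not Galaxy's actual `binary.Bam` code), assumes `samtools` is on the PATH, and assumes each part is already coordinate-sorted, as `samtools merge` expects.

```python
import shutil
import subprocess


def build_merge_command(split_files, output_file, samtools="samtools"):
    """Build a 'samtools merge' command line for a list of BAM parts.

    -f makes samtools overwrite output_file if it already exists.
    """
    if not split_files:
        raise ValueError("no BAM parts to merge")
    return [samtools, "merge", "-f", output_file] + list(split_files)


def merge(split_files, output_file):
    """Hypothetical stand-in for a Bam.merge(split_files, output_file)."""
    split_files = list(split_files)
    if len(split_files) == 1:
        # Single part: copy it rather than invoking samtools.
        shutil.copyfile(split_files[0], output_file)
        return
    # check=True raises CalledProcessError if samtools fails.
    subprocess.run(build_merge_command(split_files, output_file), check=True)
```

Keeping the command builder separate from the subprocess call makes the command easy to log or test without samtools installed.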


On the other hand, is this related to John Chilton's last email and
the 15.03 release?

Best Regards


-- 
Roberto Alonso
Functional Genomics Unit
Bioinformatics and Genomics Department
Prince Felipe Research Center (CIPF)
C./Eduardo Primo Yúfera (Científic), nº 3
(junto Oceanografico)
46012 Valencia, Spain
Tel: +34 963289680 Ext. 1021
Fax: +34 963289574
E-Mail: ralo...@cipf.es

Re: [galaxy-dev] problems splitting

2015-02-24 Thread Roberto Alonso CIPF
Hello again,

First of all, thanks for your help; it has been very useful.

What I have done up to now is copy this method to the class Sequence:

def get_split_commands_sequential(is_compressed, input_name, output_name,
                                  start_sequence, sequence_count):
    """
    Does a brain-dead sequential scan & extract of certain sequences
    >>> Sequence.get_split_commands_sequential(True, './input.gz', './output.gz', start_sequence=0, sequence_count=10)
    ['zcat ./input.gz | ( tail -n +1 2> /dev/null) | head -40 | gzip -c > ./output.gz']
    >>> Sequence.get_split_commands_sequential(False, './input.fastq', './output.fastq', start_sequence=10, sequence_count=10)
    ['tail -n +41 ./input.fastq 2> /dev/null | head -40 > ./output.fastq']
    """
    start_line = start_sequence * 4
    line_count = sequence_count * 4
    # TODO: verify that tail can handle 64-bit numbers
    if is_compressed:
        cmd = 'zcat %s | ( tail -n +%s 2> /dev/null) | head -%s | gzip -c' % (input_name, start_line + 1, line_count)
    else:
        cmd = 'tail -n +%s %s 2> /dev/null | head -%s' % (start_line + 1, input_name, line_count)
    cmd += ' > %s' % output_name

    return [cmd]
get_split_commands_sequential = staticmethod(get_split_commands_sequential)

This is something that you suggested.
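With split_mode="number_of_parts" and split_size="3", the input has to be cut into three contiguous runs of records, each of which can be extracted with a command like the one built by get_split_commands_sequential. The sketch below shows that arithmetic; `split_ranges`, `fastq_split_commands` and the `part_N.fastq` names are hypothetical, and Galaxy's own splitter may distribute records differently.

```python
def split_ranges(total_sequences, number_of_parts):
    """Divide total_sequences records into number_of_parts contiguous
    (start_sequence, sequence_count) ranges; the first
    total_sequences % number_of_parts ranges get one extra record.

    Hypothetical helper: Galaxy's own number_of_parts logic may differ.
    """
    if number_of_parts < 1:
        raise ValueError("number_of_parts must be >= 1")
    base, extra = divmod(total_sequences, number_of_parts)
    ranges = []
    start = 0
    for part in range(number_of_parts):
        count = base + (1 if part < extra else 0)
        ranges.append((start, count))
        start += count
    return ranges


def fastq_split_commands(input_name, total_sequences, number_of_parts):
    """Build one 'tail | head' command per part (4 lines per FASTQ
    record), using the same arithmetic as get_split_commands_sequential."""
    commands = []
    ranges = split_ranges(total_sequences, number_of_parts)
    for part, (start_sequence, sequence_count) in enumerate(ranges):
        start_line = start_sequence * 4
        line_count = sequence_count * 4
        output_name = "part_%d.fastq" % part  # hypothetical naming scheme
        commands.append("tail -n +%d %s 2> /dev/null | head -%d > %s"
                        % (start_line + 1, input_name, line_count, output_name))
    return commands


if __name__ == "__main__":
    for cmd in fastq_split_commands("./input.fastq", 10, 3):
        print(cmd)
```

For 10 reads in 3 parts this yields ranges of 4, 3 and 3 reads starting at lines 1, 17 and 29.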
When I run the tool with this configuration:

<tool id="bwa_mio" name="map with bwa">
  <description>map with bwa</description>
  <parallelism method="basic" split_size="3"
    split_mode="number_of_parts"></parallelism>

  <command>
  bwa mem /home/ralonso/BiB/Galaxy/data/Cclementina_v1.0_scaffolds.fa
    $input > $output 2>/dev/null</command>
  <inputs>
    <param format="fastqsanger" name="input" type="data" label="fastq"/>
  </inputs>
  <outputs>
    <data format="sam" name="output" />
  </outputs>

  <help>
  bwa
  </help>

</tool>
Everything finishes OK, but when I check the SAM file, I see that the
alignments contain the path of the file, i.e.
example_split.sam:
/home/ralonso/galaxy-dist/database/job_working_directory/000/90/task_2/dataset_91.dat:SRR098409.1113446
4 * 0 0 * * 0 0
TCTGGGTGAGGGAGTAGTGGGTGAGGGTGTGTGAGGATGTGTAAGTGGATGGAAGTAGATTGAATGTT

AS:i:0 XS:i:0

Do you know what may be going on?
If I don't split the file, everything goes correctly.

Best regards


On 13 February 2015 at 13:39, Peter Cock p.j.a.c...@googlemail.com wrote:

 On Fri, Feb 13, 2015 at 11:38 AM, Nicola Soranzo nsora...@tiscali.it
 wrote:
  On 13.02.2015 03:17, Peter Cock wrote:
 
  Hi Roberto,
 
  It looks like this is a known issue with FASTQ splitting,
 
  https://trello.com/c/qRHLFSzd/1522-issues-with-tasked-jobs-parallelism
 
  I originally broke it during a refactor, but it looks like the
  discussion died about what that method was meant to do
  (e.g. FQTOC = FASTQ table of contents?):
 
 
 
 https://bitbucket.org/galaxy/galaxy-central/commits/76277761807306ec2be3f1e4059dd7cde6fd2dc6#comment-820648
 
  I'm away from the office so can't try this, but probably all
  that is needed is to copy and paste the old method
  get_split_commands_sequential and the old method
  get_split_commands_with_toc (removed from the
  base Sequence class in the above commit) into the
  base Fastq class instead.
 
  Nicola - did you fix this locally after noticing the
  problem last year?
 
  No, sorry, we disabled Galaxy parallelism because it was using
  too many cluster nodes.
 
  Nicola

 I had similar comments from some of the cluster users
 after getting it working here - but on balance a well used
 cluster helps justify future investment in maintaining it.

 Sorry about not following up on this - I think I might have
 assumed you would take care of it. Unfortunately I won't
 be able to test the obvious fix until at least a week later...

 Peter





Re: [galaxy-dev] problems splitting

2015-02-25 Thread Roberto Alonso CIPF
Hello again,

here is something I consider important. In the log I see this
output:
galaxy.jobs.runners.tasks DEBUG 2015-02-25 11:33:30,989 execution finished - *beginning merge: bwa mem*
/home/ralonso/BiB/Galaxy/data/Cclementina_v1.0_scaffolds.fa
/home/ralonso/galaxy-dist/database/files/000/dataset_8.dat >
/home/ralonso/galaxy-dist/database/files/000/dataset_94.dat 2> /dev/null
I think the merge should be done with samtools. I don't know how this is
programmed in Galaxy, but I didn't specify the path to samtools anywhere;
could the problem be related to this?

Thanks a lot,

Regards



Re: [galaxy-dev] problems splitting

2015-02-25 Thread Roberto Alonso CIPF
Hello,

I just switched to the CDATA format, but the problem still remains. When I
split into 2 parts there is no problem, but with 3 parts the problem
described before appears. Here is the link to the SAM/BAM file:
 https://dl.dropboxusercontent.com/u/1669701/ejemplo_split.bam

Best regards

On 24 February 2015 at 17:49, Peter Cock p.j.a.c...@googlemail.com wrote:

 On Tue, Feb 24, 2015 at 4:43 PM, Roberto Alonso CIPF ralo...@cipf.es
 wrote:
  Hello again,
 
  First of all, thanks for your help; it has been very useful.
 
  What I have done up to now is copy this method to the class Sequence
 
  def get_split_commands_sequential(is_compressed, input_name, output_name,
  start_sequence, sequence_count):
  ...
  return [cmd]
  get_split_commands_sequential =
  staticmethod(get_split_commands_sequential)
 
  This is something that you suggested.

 Good.

  When I run the tool with this configuration:
 
  <tool id="bwa_mio" name="map with bwa">
    <description>map with bwa</description>
    <parallelism method="basic" split_size="3"
      split_mode="number_of_parts"></parallelism>
 
    <command>
    bwa mem /home/ralonso/BiB/Galaxy/data/Cclementina_v1.0_scaffolds.fa
      $input > $output 2>/dev/null</command>
    <inputs>
      <param format="fastqsanger" name="input" type="data" label="fastq"/>
    </inputs>
    <outputs>
      <data format="sam" name="output" />
    </outputs>
 
    <help>
    bwa
    </help>
 
  </tool>

 One minor improvement would be to escape the > as &gt; in
 your XML, or use the CDATA approach documented here:

 https://wiki.galaxyproject.org/Tools/BestPractices

  Everything finishes OK, but when I check the SAM file, I see that the
  alignments contain the path of the file, i.e.
  example_split.sam:
 
 /home/ralonso/galaxy-dist/database/job_working_directory/000/90/task_2/dataset_91.dat:SRR098409.1113446
  4 * 0 0 * * 0 0
 
 TCTGGGTGAGGGAGTAGTGGGTGAGGGTGTGTGAGGATGTGTAAGTGGATGGAAGTAGATTGAATGTT
 
 
  AS:i:0 XS:i:0
 
  Do you know what may be going on?
  If I don't split the file, everything goes correctly.

 This sounds to me like there may be a problem with SAM merging?
 Could you share the entire example_split.sam file (e.g. as a gist
 on GitHub, or via dropbox)?

 Peter
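Peter's question points at the merge step. A common way to merge SAM parts produced against the same reference is to keep the header of the first part and concatenate the alignment lines of all parts; SAM read names cannot start with '@', so header lines are easy to tell apart. The sketch below illustrates that idea; `merge_sam` is a hypothetical function, not Galaxy's `Sam.merge`, and it assumes every part carries the same `@SQ` header lines.

```python
def merge_sam(split_files, output_file):
    """Concatenate SAM parts in order, keeping the '@' header lines of
    the first part only and every alignment line of every part.

    Simplified sketch: assumes all parts were mapped against the same
    reference, so their headers are interchangeable.
    """
    with open(output_file, "w") as out:
        for index, path in enumerate(split_files):
            with open(path) as part:
                for line in part:
                    if index == 0 or not line.startswith("@"):
                        # Guard against a part whose last line lacks "\n".
                        if not line.endswith("\n"):
                            line += "\n"
                        out.write(line)
```

Comparing a merged file produced this way with Galaxy's output for the same parts could help show whether the stray path prefixes in the read names come from the merge or from the split.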





Re: [galaxy-dev] problems splitting

2015-02-25 Thread Roberto Alonso CIPF
Ok, I think I understand the line:
beginning merge: bwa mem
/home/ralonso/BiB/Galaxy/data/Cclementina_v1.0_scaffolds.fa
/home/ralonso/galaxy-dist/database/files/000/dataset_8.dat >
/home/ralonso/galaxy-dist/database/files/000/dataset_94.dat 2> /dev/null
It refers to the original command, so everything is fine with this line.
The other problem still remains.
Regards, and sorry for the confusion


[galaxy-dev] question about splitting bams

2015-04-23 Thread Roberto Alonso CIPF
Hello,
I am trying to write some code to make it possible to parallelize some
tasks. Now I am dealing with the problem of splitting a BAM file into
several parts, and for this I created this simple tool:

<parallelism method="multi" split_size="3" split_mode="number_of_parts"
  merge_outputs="output" split_inputs="input"></parallelism>

  <command>
java -jar
/home/ralonso/software/GenomeAnalysisTK-3.3-0/GenomeAnalysisTK.jar -T
UnifiedGenotyper -R /home/ralonso/BiB/Galaxy/data/chr_19_hg19_ucsc.fa -I
$input -o $output 2&gt; /dev/null;

  </command>
  <inputs>
    <param format="bam" name="input" type="data" label="bam"/>
  </inputs>
  <outputs>
    <data format="vcf" name="output" />
  </outputs>

But I have one problem: when I execute the tool, it goes through this part
of the code (I am working on the dev branch):

*$galaxy/lib/galaxy/jobs/splitters/multi.py, line 75:*

for input in parent_job.input_datasets:
    if input.name in split_inputs:
        this_input_files = job_wrapper.get_input_dataset_fnames(input.dataset)
        if len(this_input_files) > 1:
            log_error = "The input '%s' is composed of multiple files - splitting is not allowed" % str(input.name)
            log.error(log_error)
            raise Exception(log_error)
        input_datasets.append(input.dataset)

So it raises the exception because this_input_files contains two files, namely:
['/home/ralonso/galaxy/database/files/000/dataset_171.dat',
'/home/ralonso/galaxy/database/files/_metadata_files/000/metadata_13.dat'].
I guess that:
*dataset_171.dat* is the BAM file;
*metadata_13.dat* is the BAI file.

So Galaxy can't move on, and I don't know what the best solution would be.
Maybe change the *if* to check only non-metadata files? I think I should
use both files to create the BAM sub-files, but that would be done
inside the Bam class, in the *binary.py* file.
Could you please guide me before I mess things up?

Thanks so much

Re: [galaxy-dev] question about splitting bams

2015-04-23 Thread Roberto Alonso CIPF
Regarding my previous mail, I found this thread:
http://www.bytebucket.org/galaxy/galaxy-central/pull-request/175/parameter-based-bam-file-parallelization/diff

Is it still alive? Is it perhaps the best approach for BAM
parallelization?

Thanks!
Best regards


Re: [galaxy-dev] question about splitting bams

2015-04-24 Thread Roberto Alonso CIPF
Hello,

I have been reading those different threads and I have some doubts that
you may be able to clarify. In the thread you said: "ability to write tools
that split up a single input into a collection." I think this is aimed
at workflows, but in any case, could we use this to split BAMs?
Another comment is the following:

These common pipelines where you split up a BAM files, run a bunch of
steps, and then merge the results will be executable in the near
future (though 15.03 won't have workflow editor support for it - I
will try to get to this by the following release - and you can
manually build up workflows to do this - 

Since I was trying to write something that does exactly this, and I guess
someone is already working on it, do you think it is worth continuing,
or should I switch to something else? Do you know the roadmap for
this feature?

Thanks a lot,

Roberto

On 23 April 2015 at 20:09, John Chilton jmchil...@gmail.com wrote:

 I am a pragmatist - I have no problem just splitting the inputs and
 skipping the metadata files. I would just convert the error into a
 log.info() and warn that the tool cannot use metadata files. If the
 underlying tool needs an index it can recreate it instead I think. One
 can imagine a more intricate solution that would recreate metadata
 files as needed - but that would be a lot of work I think.

 Does that make sense?

 About BB PR 175 there were some recent discussions about that approach
 - I would check out
 http://dev.list.galaxyproject.org/Parallelism-using-metadata-td4666763.html
 .

 -John
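John's suggestion (turn the error into a log.info() and split only the data file) could be sketched as below. `pick_split_file` is a hypothetical helper rather than a patch to multi.py, and it assumes, based on the file list Roberto reported (dataset_171.dat followed by metadata_13.dat), that the primary dataset file is the first entry returned by `get_input_dataset_fnames()`.

```python
import logging

log = logging.getLogger(__name__)


def pick_split_file(input_name, input_files):
    """Return the file to split for an input, ignoring any extra files.

    Hypothetical variant of the multi.py check: instead of raising when
    an input has more than one file, log it at INFO level and keep only
    the first entry, assumed to be the primary dataset.
    """
    if not input_files:
        raise ValueError("input '%s' has no files" % input_name)
    if len(input_files) > 1:
        log.info("Input '%s' has %d files; splitting only %s. The tool "
                 "cannot use the metadata files (e.g. a BAM index), so it "
                 "must recreate them if needed.",
                 input_name, len(input_files), input_files[0])
    return input_files[0]
```

As John notes, a tool that needs an index (such as a BAI for each BAM part) would then have to rebuild it itself.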






Re: [galaxy-dev] PR 149

2015-04-29 Thread Roberto Alonso CIPF
Ok, no problem ;)
It is a separate PR. I think it is useful without the other PR, for example
when you map with BWA. The other PR would be the next step, for
example when you split a BAM to do variant calling or similar. I think both
PRs can live independently and together... I don't know if I explained myself
well :)

Regards

On 29 April 2015 at 17:14, John Chilton jmchil...@gmail.com wrote:

 No it just slipped through the cracks - sorry about that. I have
 commented on it now. There was a time when a couple weeks before a
 first response was the norm :).

 Does it belong with the BAM splitting pull request, or is merging useful
 on its own without the other piece you are working on?

 -John









[galaxy-dev] splitting bams bai

2015-04-27 Thread Roberto Alonso CIPF
 is in a metadata table, but I don't know how to get
it. Could you please help me with this?
In any case, if you find that I am doing something wrong, or you have a
better idea of how to implement this, please don't hesitate to contact me.
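On the index question: as far as I know, Galaxy's Bam datatype keeps the .bai as a `bam_index` metadata file, and tool wrappers commonly reach it from the command block as `${input.metadata.bam_index}`. However the path is obtained, samtools and GATK expect the index next to the BAM under a matching name. A minimal sketch, assuming both paths are already known; `stage_bam_with_index` is a hypothetical helper, not a Galaxy API.

```python
import os
from typing import Tuple


def stage_bam_with_index(bam_path: str, bai_path: str, workdir: str,
                         name: str = "input.bam") -> Tuple[str, str]:
    """Symlink a BAM and its index into workdir as <name> and <name>.bai.

    Hypothetical helper. Tools look for the index next to the BAM
    (e.g. input.bam.bai), so both links share the same basename.
    """
    os.makedirs(workdir, exist_ok=True)
    bam_link = os.path.join(workdir, name)
    bai_link = bam_link + ".bai"
    for src, dst in ((bam_path, bam_link), (bai_path, bai_link)):
        if os.path.lexists(dst):
            os.remove(dst)  # replace stale links left by a previous attempt
        os.symlink(os.path.abspath(src), dst)
    return bam_link, bai_link
```

Symlinking avoids copying the BAM; each split task can then run on the linked file without re-running `samtools index`.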

Best regards




[galaxy-dev] PR 149

2015-04-29 Thread Roberto Alonso CIPF
Hello,

I created a PR (https://github.com/galaxyproject/galaxy/pull/149) some days
ago, but I haven't had any feedback yet. Is there any problem with it? Is it
not interesting for Galaxy at the moment? Or has it simply gone unnoticed?
It would be nice to have some feedback, even if the PR turns out not to be
suitable.

Thanks so much,

Best regards



[galaxy-dev] bam split and gatk calling

2015-05-06 Thread Roberto Alonso CIPF
Hello,

I have been working on the Galaxy parallelization module and I would like
to ask some questions about how to approach one problem.
I have opened a pull request for splitting BAMs:
https://github.com/galaxyproject/galaxy/pull/184

I think it is useful as it is, but it could be more useful if the tool could
somehow access the interval. Let me explain with an example.
If I define a simple tool like this, with the parallelism tag activated:

<tool id="gatk" name="call with gatk">
  <description>gatk</description>
  <parallelism method="multi" split_mode="by_interval"
      split_size="1" merge_outputs="output" split_inputs="input">
  </parallelism>

  <command>
## by_rname
ln -s $input input.bam;
samtools index input.bam;
UnifiedGenotyper -R /home/ralonso/BiB/Galaxy/data/hg19_ucsc.fa -I input.bam -o $output -L REGION ;
  </command>
  <inputs>
    <param format="bam" name="input" type="data" label="bam"/>
  </inputs>
  <outputs>
    <data format="vcf" name="output" />
  </outputs>

  <help>
  bwa
  </help>
</tool>

The region is determined by the split_size attribute; this is explained in
more detail in the PR.
How does the code in the PR work? It goes through the BAM file and runs
something like samtools view -b -o bam_splitted.bam input.bam REGION, and
GATK then does the calling on this small BAM. So what is the problem? As you
know, if you don't pass a region to GATK on the command line, it walks
through the whole genome, which is very slow. How would you recommend passing
this information to GATK? I was thinking of creating a region.bed file at the
same time as the BAM is split, and using it in the tool definition XML, so
the command would look like this:
  <command>
...
UnifiedGenotyper -R /home/ralonso/BiB/Galaxy/data/hg19_ucsc.fa -I input.bam -o $output -L region.bed;
  </command>

I am not fully convinced by this solution, because it is somewhat intrusive
in the tool definition and because you have to trust that the *region.bed*
file exists.
Do you have any opinions or suggestions?
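One way to make the region.bed idea concrete: the splitter already knows which interval each part covers, so it can write that interval to a small BED file alongside the part. The sketch below assumes `split_size` is an interval length in base pairs and that chromosome lengths are taken from the BAM header (the PR may define `split_size` differently); `tile_regions`, `to_samtools_region` and `write_bed` are hypothetical names. It also shows a conversion that is easy to get wrong: BED is 0-based and half-open, while region strings such as chr1:1-1000 are 1-based and inclusive.

```python
from typing import Dict, Iterator, Tuple

# (chrom, start, end) in BED convention: 0-based start, exclusive end.
Region = Tuple[str, int, int]


def tile_regions(chrom_lengths: Dict[str, int], interval_bp: int) -> Iterator[Region]:
    """Cut every chromosome into consecutive windows of at most interval_bp."""
    if interval_bp <= 0:
        raise ValueError("interval_bp must be positive")
    for chrom, length in chrom_lengths.items():
        for start in range(0, length, interval_bp):
            yield chrom, start, min(start + interval_bp, length)


def to_samtools_region(region: Region) -> str:
    """Convert a BED-style region to a 1-based, inclusive chr:start-end string."""
    chrom, start, end = region
    return f"{chrom}:{start + 1}-{end}"


def write_bed(path: str, region: Region) -> None:
    """Write one region as a minimal 3-column BED file (e.g. region.bed)."""
    chrom, start, end = region
    with open(path, "w") as handle:
        handle.write(f"{chrom}\t{start}\t{end}\n")
```

GATK accepts a .bed file for -L, so each task could pass its own region.bed; the `-L REGION` string form would use `to_samtools_region` instead.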

Thanks a lot!

Best regards



Re: [galaxy-dev] bam split and gatk calling

2015-05-06 Thread Roberto Alonso CIPF
Hello,

I agree, what you say fits GATK perfectly, but since I wanted to write more
generic code I did it this way (also because I am new to the Galaxy code and
didn't know well how to implement this). What about a tool that doesn't
accept a region, just a BAM? Maybe we could add another attribute to the
parallelism tag that forces the BAM to be split.
In most cases, just creating a BED file would be better, right?
What do you think?

Regards

On 6 May 2015 at 12:23, Peter Cock p.j.a.c...@googlemail.com wrote:

 Hi Roberto,

 Given the way BAM indexing works, I see no reason to actually
 split the BAM file at all - it seems like wasted disk IO.

 Instead, can you split a BED file into sub-regions? This way
 each child GATK job would look at the full BAM file but only for
 a small region described in the split BED region file?

 Peter
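Peter's suggestion could look roughly like this: keep the single indexed BAM shared by all tasks, and split only the target regions. The sketch assumes a plain BED file (first three columns) and balances the parts by total base pairs using a greedy strategy (longest region first, into the currently lightest part); `read_bed`, `split_bed` and `write_parts` are hypothetical names and nothing here is taken from Galaxy's splitter code.

```python
import heapq
from typing import List, Tuple

Region = Tuple[str, int, int]  # (chrom, start, end), 0-based half-open as in BED


def read_bed(path: str) -> List[Region]:
    """Parse the first three columns of a BED file, skipping header lines."""
    regions = []
    with open(path) as handle:
        for line in handle:
            if not line.strip() or line.startswith(("#", "track", "browser")):
                continue
            chrom, start, end = line.split()[:3]
            regions.append((chrom, int(start), int(end)))
    return regions


def split_bed(regions: List[Region], n_parts: int) -> List[List[Region]]:
    """Distribute regions into n_parts lists with roughly equal total bp."""
    if n_parts <= 0:
        raise ValueError("n_parts must be positive")
    # Min-heap of (total_bp, part_index): the lightest part is filled next.
    heap = [(0, i) for i in range(n_parts)]
    parts: List[List[Region]] = [[] for _ in range(n_parts)]
    for region in sorted(regions, key=lambda r: r[2] - r[1], reverse=True):
        total, i = heapq.heappop(heap)
        parts[i].append(region)
        heapq.heappush(heap, (total + region[2] - region[1], i))
    # Sort each part by (chrom, start) so every child BED file is ordered.
    return [sorted(part) for part in parts]


def write_parts(parts: List[List[Region]], prefix: str) -> List[str]:
    """Write each part to <prefix>_<i>.bed and return the file paths."""
    paths = []
    for i, part in enumerate(parts):
        path = f"{prefix}_{i}.bed"
        with open(path, "w") as handle:
            handle.writelines(f"{c}\t{s}\t{e}\n" for c, s, e in part)
        paths.append(path)
    return paths
```

Balancing by base pairs rather than by region count keeps one task from receiving all the large regions.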







Re: [galaxy-dev] bam split and gatk calling

2015-05-06 Thread Roberto Alonso CIPF
I agree; I prefer your solution and will focus on it, thanks!
However, there is some fairly widely used software, such as
Delly https://github.com/tobiasrausch/delly and Breakdancer
http://gmt.genome.wustl.edu/packages/breakdancer/documentation.html, that
doesn't accept BED files; the only way to parallelize those tools is
through smaller BAMs.
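For tools like these that only accept a BAM, the split step amounts to extracting each region into its own BAM. A sketch, assuming samtools is on PATH and the input BAM is coordinate-sorted and indexed (required for region queries); `extract_region_commands` and `run_all` are hypothetical helpers and the output naming is arbitrary. Note that samtools expects the region after the input file, e.g. `samtools view -b -o part.bam input.bam chr1`.

```python
import subprocess
from typing import Iterable, List, Sequence


def extract_region_commands(bam_path: str, regions: Iterable[str],
                            prefix: str = "part") -> List[List[str]]:
    """Build one `samtools view` argv per region (e.g. 'chr1' or 'chr1:1-1000').

    -b requests BAM output; the region argument must follow the input path,
    and the input BAM must be indexed for region queries to work.
    """
    commands = []
    for i, region in enumerate(regions):
        out_path = f"{prefix}_{i:04d}.bam"
        commands.append(["samtools", "view", "-b", "-o", out_path, bam_path, region])
    return commands


def run_all(commands: Sequence[Sequence[str]]) -> None:
    """Run the extraction commands in order, stopping at the first failure."""
    for argv in commands:
        subprocess.run(list(argv), check=True)
```

For structural-variant callers, splitting by whole chromosome is probably safer than splitting into small windows, since read pairs that cross a window boundary would otherwise end up in different parts.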

Regards


On 6 May 2015 at 15:00, Peter Cock p.j.a.c...@googlemail.com wrote:


 Maybe you're right - BAM splitting might be useful for some tools
 (any examples?), even though BED splitting is a much more elegant
 solution.

 Peter





[galaxy-dev] error executing test

2015-05-11 Thread Roberto Alonso CIPF
Hello,

I am writing some tests and I have a problem: the tool works in the Galaxy
web interface, but not when I run it as a test case. Other tests I have
tried fail as well.

My test *./run_tests.sh -framework -id parallelism_bam_filter_reads* reports
the following:

======================================================================
ERROR: filter reads ( parallelism_bam_filter_reads )  Test-1
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/ralonso/galaxy/test/functional/test_toolbox.py", line 268, in test_tool
    self.do_it( td )
  File "/home/ralonso/galaxy/test/functional/test_toolbox.py", line 55, in do_it
    raise e
RunToolException: Error creating a job for these tool inputs - {u'type': u'error', u'data': {u'input': u'History does not include a dataset of the required format / build'}}




And the other test, *./run_tests.sh -framework -id compare_bam_as_sam*, reports:

======================================================================
ERROR: compare_bam_as_sam ( compare_bam_as_sam )  Test-1
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/ralonso/galaxy/test/functional/test_toolbox.py", line 268, in test_tool
    self.do_it( td )
  File "/home/ralonso/galaxy/test/functional/test_toolbox.py", line 37, in do_it
    stage_data_in_history( galaxy_interactor, testdef.test_data(), test_history, shed_tool_id )
  File "/home/ralonso/galaxy/test/base/interactor.py", line 38, in stage_data_in_history
    upload_wait()
  File "/home/ralonso/galaxy/test/base/interactor.py", line 279, in wait
    while not self.__history_ready( history_id ):
  File "/home/ralonso/galaxy/test/base/interactor.py", line 297, in __history_ready
    return self._state_ready( state, error_msg="History in error state." )
  File "/home/ralonso/galaxy/test/base/interactor.py", line 356, in _state_ready
    raise Exception( error_msg )
Exception: History in error state.
-------------------- >> begin captured logging << --------------------

Besides that, it migrates the database every time I run a test case, which
takes too long. I have seen that you can use --db postgres, but it doesn't
work for me; I think this option is meant to be used with --dockerize (which
is not my case).

Do you have any idea what is going on?

Best regards


[galaxy-dev] metadata in parallelization

2015-04-17 Thread Roberto Alonso CIPF
Hello,

I am writing some code to enable parallelization for some tool wrappers.
First I did it for a simple BWA wrapper, and now I am modifying
toolshed.g2.bx.psu.edu/repos/devteam/bwa/c71dd035971e/bwa/bwa-mem.xml to
check whether the code works with this wrapper. I wrote the code I thought
was necessary to merge BAM files, and added the parallelism tag to the
tool config file:

<tool id="bwa_mem" name="BWA-MEM" version="0.1">

  <macros>
    <import>bwa_macros.xml</import>
  </macros>

  <requirements>
    <requirement type="package" version="0.7.10.039ea20639">bwa</requirement>
    <requirement type="package" version="1.1">samtools</requirement>
  </requirements>
  <description>- map medium and long reads (&gt; 100 bp) against reference genome</description>
  <parallelism method="multi" split_size="3" shared_inputs="ref_file"
      split_mode="number_of_parts" merge_outputs="bam_output"
      split_inputs="fastq_input1,fastq_input2"></parallelism>


  <command>
...
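With split_mode="number_of_parts" and split_size="3", each paired FASTQ input is cut into three pieces. The important constraint is that both mate files are split at the same record boundaries, so read pair i lands in the same part in both files. A minimal sketch assuming plain 4-line FASTQ records and mate files with equal record counts; `count_records` and `split_fastq` are hypothetical names, not Galaxy's implementation.

```python
from typing import List


def count_records(path: str) -> int:
    """Number of 4-line FASTQ records in path."""
    with open(path) as handle:
        n_lines = sum(1 for _ in handle)
    if n_lines % 4:
        raise ValueError(f"{path}: line count {n_lines} is not a multiple of 4")
    return n_lines // 4


def split_fastq(path: str, n_parts: int, out_prefix: str) -> List[str]:
    """Split a FASTQ into n_parts files with near-equal record counts.

    Part sizes depend only on the record count, so splitting both mate
    files with the same n_parts keeps pairs in matching parts.
    """
    total = count_records(path)
    n_parts = max(1, min(n_parts, total))  # avoid empty parts for tiny inputs
    base, extra = divmod(total, n_parts)
    sizes = [base + (1 if i < extra else 0) for i in range(n_parts)]
    out_paths = []
    with open(path) as src:
        for i, size in enumerate(sizes):
            out_path = f"{out_prefix}_{i}.fastq"
            with open(out_path, "w") as dst:
                for _ in range(size * 4):
                    dst.write(next(src))
            out_paths.append(out_path)
    return out_paths
```

The first `extra` parts get one extra record each, so part sizes differ by at most one record.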

Everything works, and the BAM produced in parallel mode is identical to the
one produced without parallelization, but the Galaxy log reports an error
about metadata, something like this:

galaxy.jobs.splitters.multi DEBUG 2015-04-17 09:54:58,335 merge finished: /home/ralonso/galaxy/database/files/000/dataset_198.dat
galaxy.jobs.runners.tasks DEBUG 2015-04-17 09:54:58,473 executing external set_meta script for job 200: python /home/ralonso/galaxy/database/tmp/set_metadata_E5fGIE.py /home/ralonso/galaxy/database/tmp/tmpHS8Byo /home/ralonso/galaxy/database/job_working_directory/000/200/galaxy.json /home/ralonso/galaxy/database/tmp/metadata_in_HistoryDatasetAssociation_198_yOGiQG,/home/ralonso/galaxy/database/tmp/metadata_kwds_HistoryDatasetAssociation_198_nAsQoq,/home/ralonso/galaxy/database/tmp/metadata_out_HistoryDatasetAssociation_198_I_cLs4,/home/ralonso/galaxy/database/tmp/metadata_results_HistoryDatasetAssociation_198_qhjzoV,/home/ralonso/galaxy/database/files/000/dataset_198.dat,/home/ralonso/galaxy/database/tmp/metadata_override_HistoryDatasetAssociation_198_ScKLqH
Traceback (most recent call last):
  File "/home/ralonso/galaxy/database/tmp/set_metadata_E5fGIE.py", line 1, in <module>
    from galaxy_ext.metadata.set_metadata import set_metadata; set_metadata()
ImportError: No module named galaxy_ext.metadata.set_metadata
galaxy.jobs.runners.tasks DEBUG 2015-04-17 09:54:58,624 execution of external set_meta finished for job 200
galaxy.datatypes.metadata DEBUG 2015-04-17 09:54:58,714 setting metadata externally failed for HistoryDatasetAssociation 198: External set_meta() not called

Without parallelization there is no problem, because in that case Galaxy
never executes this part of the code. I can see that Galaxy has to do
something with metadata attributes, but what is it trying to do? Is there
any way to solve this?
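One plausible reading of the traceback, not confirmed in this thread: the external set_meta script is launched without Galaxy's lib/ directory (where the galaxy_ext package lives) on PYTHONPATH, so the import fails before any metadata is set. A quick way to test that hypothesis is to try the same import in a fresh interpreter with and without lib/ on PYTHONPATH. `can_import` is a hypothetical helper; pass Galaxy's own interpreter via `python` if it differs from the one running the check.

```python
import os
import subprocess
import sys
from typing import Optional


def can_import(module: str, extra_path: Optional[str] = None,
               python: str = sys.executable) -> bool:
    """Return True if `python -c "import <module>"` succeeds in a fresh process.

    extra_path, if given, is prepended to PYTHONPATH, mimicking what the
    set_meta launcher would need in order to find Galaxy's lib/ directory.
    """
    env = dict(os.environ)
    if extra_path:
        env["PYTHONPATH"] = os.pathsep.join(
            p for p in (extra_path, env.get("PYTHONPATH")) if p)
    result = subprocess.run([python, "-c", f"import {module}"], env=env,
                            stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
    return result.returncode == 0
```

For example, `can_import("galaxy_ext.metadata.set_metadata", "/home/ralonso/galaxy/lib")` returning True while `can_import("galaxy_ext.metadata.set_metadata")` returns False would point at the environment in which the tasks runner launches set_meta.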

Thank you very much

Regards,

