Re: [galaxy-dev] problems splitting

2015-02-25 Thread Nicola Soranzo
Hi Roberto,

I'm happy you solved your issue, thanks for sharing the solution!
I'd suggest you open a pull request with the fixes at
https://github.com/galaxyproject/galaxy .

Cheers,
Nicola

On 25.02.2015 15:07, Roberto Alonso CIPF wrote:

 Hello again :),

 I have found the problem. The code that merges the files is this:

 galaxy/datatypes/tabular.py:484: cmd = 'egrep -v ^@ %s >> %s' % ( ' '.join(split_files[1:]), output_file )

 This writes the file name in front of every line copied into the SAM file.
 Just adding -h is enough, so it will be like this:

 galaxy/datatypes/tabular.py:484: cmd = 'egrep -hv ^@ %s >> %s' % ( ' '.join(split_files[1:]), output_file )

 Thanks all for your help, best regards
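This also explains why splitting into 2 parts worked while 3 did not: grep only prefixes output lines with the file name when it is given more than one file, and with 2 parts `split_files[1:]` holds a single file. Below is a minimal sketch of a pure-Python alternative to the egrep merge, assuming (as the `split_files[1:]` slice suggests) that the first part is kept whole, header included, and only `@` header lines are dropped from the remaining parts. The name `merge_sam_parts` is hypothetical, not part of Galaxy.

```python
import shutil
from typing import List


def merge_sam_parts(split_files: List[str], output_file: str) -> None:
    """Hypothetical helper: merge SAM parts into output_file.

    The first part is copied unchanged (header included); from every later
    part only non-header lines are appended. Lines are copied verbatim, so
    no 'filename:' prefix can sneak in, whatever the number of parts.
    """
    shutil.copyfile(split_files[0], output_file)
    with open(output_file, "a") as out:
        for part in split_files[1:]:
            with open(part) as handle:
                for line in handle:
                    # Same filter as `egrep -v ^@`: skip SAM header lines.
                    if not line.startswith("@"):
                        out.write(line)
```

Avoiding the shell also sidesteps quoting problems with unusual file names; the trade-off is that the parts are read line by line in Python instead of by grep.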
 
 

Re: [galaxy-dev] problems splitting

2015-02-25 Thread Roberto Alonso CIPF
Hello again,

this is something that I consider important. When I look at the log I see this
output:
galaxy.jobs.runners.tasks DEBUG 2015-02-25 11:33:30,989 execution finished - *beginning merge: bwa mem*
/home/ralonso/BiB/Galaxy/data/Cclementina_v1.0_scaffolds.fa
/home/ralonso/galaxy-dist/database/files/000/dataset_8.dat >
/home/ralonso/galaxy-dist/database/files/000/dataset_94.dat 2> /dev/null
I think the merge should be done with samtools. I don't know how this is
programmed in Galaxy, but I didn't specify the path to samtools anywhere;
could the problem be related to this?

Thanks a lot,

Regards


-- 
Roberto Alonso
Functional Genomics Unit
Bioinformatics and Genomics Department
Prince Felipe Research Center (CIPF)
C./Eduardo Primo Yúfera (Científic), nº 3
(junto Oceanografico)
46012 Valencia, Spain
Tel: +34 963289680 Ext. 1021
Fax: +34 963289574
E-Mail: ralo...@cipf.es
___
Please keep all replies on the list by using reply all
in your mail client.  To manage your subscriptions to this
and other Galaxy lists, please use the interface at:
  https://lists.galaxyproject.org/

To search Galaxy mailing lists use the unified search at:
  http://galaxyproject.org/search/mailinglists/

Re: [galaxy-dev] problems splitting

2015-02-25 Thread Roberto Alonso CIPF
Hello,

I just switched to the CDATA format, but the problem still remains. When I
split into 2 parts there is no problem, but when I split into 3, the problem
I described before happens. Here is the link to the SAM/BAM file:
 https://dl.dropboxusercontent.com/u/1669701/ejemplo_split.bam

Best regards


Re: [galaxy-dev] problems splitting

2015-02-25 Thread Roberto Alonso CIPF
Ok, I think I understand the line:
beginning merge: bwa mem
/home/ralonso/BiB/Galaxy/data/Cclementina_v1.0_scaffolds.fa
/home/ralonso/galaxy-dist/database/files/000/dataset_8.dat >
/home/ralonso/galaxy-dist/database/files/000/dataset_94.dat 2> /dev/null
It refers to the original command, so everything is fine with this line.
The other problem still remains.
Regards, and sorry for the confusion


Re: [galaxy-dev] problems splitting

2015-02-24 Thread Roberto Alonso CIPF
Hello again,

first of all, thanks for your help; it has been very useful.

What I have done so far is to copy this method into the class Sequence:

def get_split_commands_sequential(is_compressed, input_name, output_name,
                                  start_sequence, sequence_count):
    """
    Does a brain-dead sequential scan & extract of certain sequences
    >>> Sequence.get_split_commands_sequential(True, './input.gz', './output.gz', start_sequence=0, sequence_count=10)
    ['zcat ./input.gz | ( tail -n +1 2> /dev/null) | head -40 | gzip -c > ./output.gz']
    >>> Sequence.get_split_commands_sequential(False, './input.fastq', './output.fastq', start_sequence=10, sequence_count=10)
    ['tail -n +41 ./input.fastq 2> /dev/null | head -40 > ./output.fastq']
    """
    start_line = start_sequence * 4
    line_count = sequence_count * 4
    # TODO: verify that tail can handle 64-bit numbers
    if is_compressed:
        cmd = 'zcat %s | ( tail -n +%s 2> /dev/null) | head -%s | gzip -c' % (input_name, start_line+1, line_count)
    else:
        cmd = 'tail -n +%s %s 2> /dev/null | head -%s' % (start_line+1, input_name, line_count)
    cmd += ' > %s' % output_name

    return [cmd]
get_split_commands_sequential = staticmethod(get_split_commands_sequential)
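The method slices the FASTQ file purely by line numbers, assuming every record spans exactly 4 lines (true for typical fastqsanger files, not for line-wrapped FASTQ). As a rough sketch of how split_mode=number_of_parts could map onto it, the hypothetical `plan_parts` helper below divides N records into contiguous chunks; Galaxy's own splitting code may distribute records differently. The hypothetical `tail_head_command` mirrors only the uncompressed branch above.

```python
from typing import List, Tuple


def plan_parts(total_sequences: int, number_of_parts: int) -> List[Tuple[int, int]]:
    """Hypothetical: split records into contiguous (start_sequence, sequence_count) chunks.

    The remainder is spread over the first parts, so sizes differ by at most one.
    """
    base, extra = divmod(total_sequences, number_of_parts)
    chunks = []
    start = 0
    for i in range(number_of_parts):
        count = base + (1 if i < extra else 0)
        chunks.append((start, count))
        start += count
    return chunks


def tail_head_command(input_name: str, output_name: str,
                      start_sequence: int, sequence_count: int) -> str:
    """Same line arithmetic as the uncompressed branch of the method above."""
    start_line = start_sequence * 4  # 4 lines per FASTQ record
    line_count = sequence_count * 4
    return 'tail -n +%s %s 2> /dev/null | head -%s > %s' % (
        start_line + 1, input_name, line_count, output_name)


for n, (start, count) in enumerate(plan_parts(10, 3)):
    print(tail_head_command('./input.fastq', './part%d.fastq' % n, start, count))
# tail -n +1 ./input.fastq 2> /dev/null | head -16 > ./part0.fastq
# tail -n +17 ./input.fastq 2> /dev/null | head -12 > ./part1.fastq
# tail -n +29 ./input.fastq 2> /dev/null | head -12 > ./part2.fastq
```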

This is something that you suggested.
When I run the tool with this configuration:

<tool id="bwa_mio" name="map with bwa">
  <description>map with bwa</description>
  <parallelism method="basic" split_size="3"
split_mode="number_of_parts"></parallelism>

  <command>
  bwa mem /home/ralonso/BiB/Galaxy/data/Cclementina_v1.0_scaffolds.fa
$input > $output 2>/dev/null</command>
  <inputs>
    <param format="fastqsanger" name="input" type="data" label="fastq"/>
  </inputs>
  <outputs>
    <data format="sam" name="output" />
  </outputs>

  <help>
  bwa
  </help>

</tool>
Everything finishes fine, but when I check the SAM output, I see that the
alignments contain the path of the file, i.e.
example_split.sam:
/home/ralonso/galaxy-dist/database/job_working_directory/000/90/task_2/dataset_91.dat:SRR098409.1113446
4 * 0 0 * * 0 0
TCTGGGTGAGGGAGTAGTGGGTGAGGGTGTGTGAGGATGTGTAAGTGGATGGAAGTAGATTGAATGTT

AS:i:0 XS:i:0

Do you know what may be going on?
If I don't split the file, everything works correctly.
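The record above has a job-working-directory path followed by ':' in front of the read name, which is the "filename:line" form grep prints when it searches several files. For SAM files that were already merged this way, a minimal repair sketch could strip that prefix. It assumes the prefix always looks like an absolute path ending in `.dat:`, as in this example; read names may legitimately contain ':' (e.g. Illumina names), so nothing else is touched. `strip_grep_prefix` is a hypothetical name.

```python
import re

# Assumed prefix shape: an absolute path ending in ".dat", then ":".
GREP_PREFIX = re.compile(r"^/[^\t:]*\.dat:")


def strip_grep_prefix(line: str) -> str:
    """Hypothetical: remove a grep-style 'path:' prefix from a SAM alignment line."""
    if line.startswith("@"):
        return line  # header lines are left untouched
    return GREP_PREFIX.sub("", line, count=1)


record = ("/home/ralonso/galaxy-dist/database/job_working_directory/000/90/"
          "task_2/dataset_91.dat:SRR098409.1113446\t4\t*\t0\t0\t*\t*\t0\t0")
print(strip_grep_prefix(record).split("\t")[0])  # SRR098409.1113446
```

This only salvages output that was already produced; the -h flag in the merge command avoids the prefix in the first place.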

Best regards


On 13 February 2015 at 13:39, Peter Cock p.j.a.c...@googlemail.com wrote:

 On Fri, Feb 13, 2015 at 11:38 AM, Nicola Soranzo nsora...@tiscali.it
 wrote:
  On 13.02.2015 03:17, Peter Cock wrote:
 
  Hi Roberto,
 
  It looks like this is a known issue with FASTQ splitting,
 
  https://trello.com/c/qRHLFSzd/1522-issues-with-tasked-jobs-parallelism
 
  I originally broke it during a refactor, but it looks like the
  discussion died about that that method was meant to do
  (e.g. FQTOC = FASTQ table of contents?):
 
 
 
 https://bitbucket.org/galaxy/galaxy-central/commits/76277761807306ec2be3f1e4059dd7cde6fd2dc6#comment-820648
 
  I'm away from the office so can't try this, but probably all
  that is needed is to copy and paste the old method
  get_split_commands_sequential and the old method
  get_split_commands_with_toc (removed from the
  base Sequence class in the above commit) into the
  base Fastq class instead.
 
  Nicola - did you fix this locally after noticing the
  problem last year?
 
  No, sorry, we disabled Galaxy parallelism because it was using
  too many cluster nodes.
 
  Nicola

 I had similar comments from some of the cluster users
 after getting it working here - but on balance a well used
 cluster helps justify future investment in maintaining it.

 Sorry about not following up on this - I think I might have
 assumed you would take care of it. Unfortunately I won't
 be able to test the obvious fix until at least a week later...

 Peter





Re: [galaxy-dev] problems splitting

2015-02-24 Thread Peter Cock
On Tue, Feb 24, 2015 at 4:43 PM, Roberto Alonso CIPF ralo...@cipf.es wrote:
 Hello again,

 first of all, thanks for your help; it has been very useful.

 What I have done so far is to copy this method into the class Sequence:

 def get_split_commands_sequential(is_compressed, input_name, output_name,
                                   start_sequence, sequence_count):
     ...
     return [cmd]
 get_split_commands_sequential = staticmethod(get_split_commands_sequential)

 This is something that you suggested.

Good.

 When I run the tool with this configuration:

 <tool id="bwa_mio" name="map with bwa">
   <description>map with bwa</description>
   <parallelism method="basic" split_size="3"
 split_mode="number_of_parts"></parallelism>

   <command>
   bwa mem /home/ralonso/BiB/Galaxy/data/Cclementina_v1.0_scaffolds.fa
 $input > $output 2>/dev/null</command>
   <inputs>
     <param format="fastqsanger" name="input" type="data" label="fastq"/>
   </inputs>
   <outputs>
     <data format="sam" name="output" />
   </outputs>

   <help>
   bwa
   </help>

 </tool>

One minor improvement would be to escape the > as &gt; in
your XML, or use the CDATA approach documented here:

https://wiki.galaxyproject.org/Tools/BestPractices
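As a quick check that the two options are equivalent, the sketch below parses a cut-down version of the tool XML with Python's standard xml.etree.ElementTree, once with the command wrapped in CDATA and once with > escaped as &gt;; both yield the same command string. Keeping only the <command> element is a simplification for brevity.

```python
import xml.etree.ElementTree as ET

CDATA_TOOL = """<tool id="bwa_mio" name="map with bwa">
  <command><![CDATA[
  bwa mem /home/ralonso/BiB/Galaxy/data/Cclementina_v1.0_scaffolds.fa $input > $output 2>/dev/null
  ]]></command>
</tool>"""

ESCAPED_TOOL = """<tool id="bwa_mio" name="map with bwa">
  <command>bwa mem /home/ralonso/BiB/Galaxy/data/Cclementina_v1.0_scaffolds.fa $input &gt; $output 2&gt;/dev/null</command>
</tool>"""


def command_text(xml_source: str) -> str:
    """Return the stripped text of the <command> element."""
    root = ET.fromstring(xml_source)
    return root.find("command").text.strip()


print(command_text(CDATA_TOOL) == command_text(ESCAPED_TOOL))  # True
```

Strictly speaking, a bare > is legal in XML character data (only < and & must be escaped), which is why the unescaped version also parsed; CDATA or escaping is the safer habit once a command contains < or &.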

 Everything finishes fine, but when I check the SAM output, I see that the
 alignments contain the path of the file, i.e.
 example_split.sam:
 /home/ralonso/galaxy-dist/database/job_working_directory/000/90/task_2/dataset_91.dat:SRR098409.1113446
 4 * 0 0 * * 0 0
 TCTGGGTGAGGGAGTAGTGGGTGAGGGTGTGTGAGGATGTGTAAGTGGATGGAAGTAGATTGAATGTT
 
 AS:i:0 XS:i:0

 Do you know what may be going on?
 If I don't split the file, everything works correctly.

This sounds to me like there may be a problem with SAM merging?
Could you share the entire example_split.sam file (e.g. as a gist
on GitHub, or via dropbox)?

Peter

Re: [galaxy-dev] problems splitting

2015-02-12 Thread Peter Cock
Hi Roberto,

It looks like this is a known issue with FASTQ splitting,
https://trello.com/c/qRHLFSzd/1522-issues-with-tasked-jobs-parallelism

I originally broke it during a refactor, but it looks like the
discussion died about what that method was meant to do
(e.g. FQTOC = FASTQ table of contents?):

https://bitbucket.org/galaxy/galaxy-central/commits/76277761807306ec2be3f1e4059dd7cde6fd2dc6#comment-820648

I'm away from the office so can't try this, but probably all
that is needed is to copy and paste the old method
get_split_commands_sequential and the old method
get_split_commands_with_toc (removed from the
base Sequence class in the above commit) into the
base Fastq class instead.
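The error quoted further down, "type object 'Sequence' has no attribute 'get_split_commands_sequential'", is the message Python raises when an attribute lookup on a class itself fails. The minimal sketch below uses simplified, hypothetical stand-ins for the datatype classes to show that the `@staticmethod` decorator does the same job as the older `name = staticmethod(name)` assignment, and that a static method defined on a base class is reachable from a subclass. The reverse does not hold: a method defined only on a subclass is invisible through the base class, so if the caller looks the method up on Sequence (as the error message suggests), defining it only on Fastq may not be enough. That last point is an inference from the message, not something confirmed in the thread.

```python
class SequenceBeforeFix:
    """Hypothetical stand-in for Sequence after the refactor removed the method."""


class Sequence:
    """Hypothetical, heavily simplified stand-in for galaxy.datatypes.sequence.Sequence."""

    @staticmethod  # same effect as `f = staticmethod(f)` after the def
    def get_split_commands_sequential(is_compressed, input_name, output_name,
                                      start_sequence, sequence_count):
        # Only the uncompressed branch is sketched here.
        start_line = start_sequence * 4
        line_count = sequence_count * 4
        return ['tail -n +%s %s 2> /dev/null | head -%s > %s'
                % (start_line + 1, input_name, line_count, output_name)]


class Fastq(Sequence):
    """Hypothetical FASTQ stand-in: inherits the static method from Sequence."""


def lookup_error(cls) -> str:
    """Return the AttributeError message for the missing method, or '' if found."""
    try:
        getattr(cls, "get_split_commands_sequential")
    except AttributeError as exc:
        return str(exc)
    return ""


print(lookup_error(SequenceBeforeFix))
# type object 'SequenceBeforeFix' has no attribute 'get_split_commands_sequential'
print(Fastq.get_split_commands_sequential(False, './in.fastq', './out.fastq', 0, 10))
# ['tail -n +1 ./in.fastq 2> /dev/null | head -40 > ./out.fastq']
```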

Nicola - did you fix this locally after noticing the
problem last year?

Peter

On Wed, Feb 11, 2015 at 3:45 PM, Roberto Alonso CIPF ralo...@cipf.es wrote:
 Hello,

 I am trying to map a fastqsanger file with bwa. My bwa tool
 config file is this:

 <tool id="bwa_mio" name="map with bwa">
   <description>map with bwa</description>
   <parallelism method="basic" split_size="2"
 split_mode="number_of_parts"></parallelism>

   <command>
   bwa mem /home/ralonso/BiB/Galaxy/data/Cclementina_v1.0_scaffolds.fa
 $input > $output 2xx</command>
   <inputs>
     <param format="fastqsanger" name="input" type="data" label="fastq"/>
   </inputs>
   <outputs>
     <data format="sam" name="output" />
   </outputs>

   <help>
   bwa
   </help>

 </tool>


 And when I look at stderr I see this error:
 type object 'Sequence' has no attribute 'get_split_commands_sequential'

 It seems that this command, which I see in the log, is not working:
 galaxy.jobs.runners DEBUG 2015-02-11 16:33:48,738 (74) command is:
 /home/ralonso/galaxy-dist/extract_dataset_parts.sh
 /home/ralonso/galaxy-dist/database/job_working_directory/000/74/task_0; bwa
 mem /home/ralonso/BiB/Galaxy/data/Cclementina_v1.0_scaffolds.fa
 /home/ralonso/galaxy-dist/database/job_working_directory/000/74/task_0/dataset_8.dat
 >
 /home/ralonso/galaxy-dist/database/job_working_directory/000/74/task_0/dataset_75.dat

 When I look at the code itself, around line 559 of galaxy.datatypes.sequence,
 I can't find this function get_split_commands_sequential anywhere.
 Any idea?

 Thank you very much

 Regards
