Re: [galaxy-dev] Galaxy's dependency on old samtools vs tools wrapping later versions?

2014-11-04 Thread Peter Cock
OK, so this should work then... :)

Thanks Dave,

Peter

On Mon, Nov 3, 2014 at 7:06 PM, Dave Bouvier d...@bx.psu.edu wrote:
 Peter,

 For the automated indexing of bam files, Galaxy uses the samtools version
 linked to as default under tool-dependencies/samtools/

 This should normally be 0.1.19 or older, due to the not-yet-implemented
 handling of bam_index_build and other potential regressions that could be
 uncovered in the future.

--Dave B.


 On 11/03/2014 01:52 PM, Peter Cock wrote:

 Hello all,

 Galaxy currently requires samtools on the $PATH in order to sort
 and index BAM files automatically, and samtools 0.1.19 works fine.

 Unfortunately later versions of samtools index have a regression:
 https://github.com/samtools/samtools/issues/199

 This has caught several people out already,
 e.g. https://biostar.usegalaxy.org/p/7928/
 and https://biostar.usegalaxy.org/p/9335/

 While eventually samtools will be fixed, right now this means we
 can't have samtools 1.1 as the first samtools on the $PATH used
 by Galaxy.
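A minimal sketch in Python of how a deployment could check which samtools it has before relying on it for indexing. It assumes samtools prints a line starting with "Version:" in its usage text (believed true for 0.1.19 and 1.1, but treat it as an assumption), and takes that text as a string instead of running samtools itself. Both function names are hypothetical, and the 0.1.19 cut-off is simply the one discussed in this thread.

```python
import re

def parse_samtools_version(usage_text):
    """Pull the version out of samtools usage text as a tuple, e.g. (0, 1, 19)."""
    match = re.search(r"^Version:\s*(\d+(?:\.\d+)*)", usage_text, re.MULTILINE)
    if match is None:
        raise ValueError("No 'Version:' line found")
    return tuple(int(part) for part in match.group(1).split("."))

def safe_for_galaxy_indexing(version):
    """Assumption taken from this thread: 0.1.19 or older works for indexing."""
    return version <= (0, 1, 19)

old = parse_samtools_version("Program: samtools\nVersion: 0.1.19-44428cd\n")
new = parse_samtools_version("Program: samtools\nVersion: 1.1 (using htslib 1.1)\n")
print(old, safe_for_galaxy_indexing(old))
print(new, safe_for_galaxy_indexing(new))
```

In practice the usage text would come from capturing the output of running samtools with no arguments.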

 I am working on a wrapper for samtools bam2fq:
 https://github.com/peterjc/pico_galaxy/tree/master/tools/samtools_bam2fq
 https://testtoolshed.g2.bx.psu.edu/view/peterjc/samtools_bam2fq

 The bam2fq command in samtools 0.1.19 has a number of bugs,
 so I want to target samtools 1.1.  However this has complicated my
 testing since for my BAM input files Galaxy will call samtools index,
 and if it calls samtools 1.1 this will fail.

 I'm not using the tool shed dependencies during development
 so instead came up with the following hack:
 https://github.com/peterjc/picobio/blob/master/sambam/samtools_auto.py

 My question is, what is expected to happen with a Tool Shed installed
 wrapper for samtools 1.1 and Galaxy's attempts to automatically call
 samtools to index any BAM output file? Would the tool environment
 put samtools 1.1 on the (local) $PATH which would then break setting
 the metadata as part of the same job?

 Regards,

 Peter
 ___
 Please keep all replies on the list by using reply all
 in your mail client.  To manage your subscriptions to this
 and other Galaxy lists, please use the interface at:
http://lists.bx.psu.edu/

 To search Galaxy mailing lists use the unified search at:
http://galaxyproject.org/search/mailinglists/




Re: [galaxy-dev] Galaxy's dependency on old samtools vs tools wrapping later versions?

2014-11-04 Thread Peter Cock
Fingers crossed - perhaps I jumped the gun uploading this to the main
tool shed without seeing the test results on the Test Tool Shed:

https://toolshed.g2.bx.psu.edu/view/peterjc/samtools_bam2fq
https://testtoolshed.g2.bx.psu.edu/view/peterjc/samtools_bam2fq

I look forward to the automated test results from the Tool Sheds ...
http://lists.bx.psu.edu/pipermail/galaxy-dev/2014-November/020792.html

Thanks,

Peter

On Tue, Nov 4, 2014 at 8:38 AM, Peter Cock p.j.a.c...@googlemail.com wrote:
 OK, so this should work then... :)

 Thanks Dave,

 Peter



[galaxy-dev] Can existing SAM/BAM filter tools give me mapped/unmapped pairs?

2014-11-04 Thread Peter Cock
Hi all,

I'm looking for a little advice on the pre-existing SAM/BAM filtering
tools already in the Galaxy Tool Shed (to avoid reinventing the wheel).

As I mentioned on another thread, I'm working on a wrapper for the
samtools bam2fq command (targeting samtools 1.1 which fixed
some bugs in this tool and added new functionality compared to
samtools 0.1.19), see:

https://github.com/peterjc/pico_galaxy/tree/master/tools/samtools_bam2fq
https://toolshed.g2.bx.psu.edu/view/peterjc/samtools_bam2fq
https://testtoolshed.g2.bx.psu.edu/view/peterjc/samtools_bam2fq

One of my motivating use cases is a workflow like this:

1. Upload paired end FASTQ files.
2. Map them against a known contaminant genome giving a BAM file
(note I need the mapper to report unmapped reads in the output).
3. Filter the BAM to get unmapped reads, plus reads whose partner is
unmapped (conversely, remove reads where both partners are mapped).
4. Convert the filtered BAM back into FASTQ (with samtools bam2fq).
5. Proceed with analysis (e.g. de novo assembly).

Assuming I have understood samtools view, this filtering step
has to be done in multiple parts.

This would get the unmapped reads:
$ samtools view -f 0x4 ...

This would get reads with an unmapped partner:
$ samtools view -f 0x8 ...

However this would only get unmapped reads with an unmapped partner:
$ samtools view -f 0xC ...

i.e. samtools view allows logical AND, not logical OR, when combining
flag filters.
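The AND behaviour can be checked with a little bit arithmetic. This is a minimal illustration in Python, not samtools code: the helper name passes_required_flags is made up here, and it models -f as keeping a read only when every bit of the mask is set in its FLAG. Note that the combined mask for 0x4 and 0x8 is 12 in decimal, which is 0xC in hex.

```python
UNMAPPED = 0x4        # the read itself is unmapped
MATE_UNMAPPED = 0x8   # the read's partner is unmapped

def passes_required_flags(flag, mask):
    """Mimic 'samtools view -f mask': every bit in mask must be set in flag."""
    return (flag & mask) == mask

both = UNMAPPED | MATE_UNMAPPED   # 12 decimal, 0xC hex

# A paired first read that is unmapped while its partner is mapped:
# FLAG = 0x1 + 0x4 + 0x40 = 69
print(passes_required_flags(69, UNMAPPED))   # True: kept by -f 4
print(passes_required_flags(69, both))       # False: the partner is mapped
```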

So, I believe using samtools directly, a two stage filter is needed followed
by a merge (and sort), taking care not to duplicate reads, perhaps:

$ samtools view -f 4 ... > unmapped.bam
$ samtools view -f 8 -F 4 ... > mapped_with_partner_unmapped.bam
$ samtools merge unmapped.bam mapped_with_partner_unmapped.bam  ...
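For comparison, the OR condition is easy to express outside samtools in a single pass. A rough sketch in Python, working on SAM text (for example the output of samtools view -h) and not on BAM directly. The function name keep_if_either_unmapped is hypothetical and this is not an existing Tool Shed tool.

```python
UNMAPPED = 0x4
MATE_UNMAPPED = 0x8

def keep_if_either_unmapped(sam_lines):
    """Yield header lines, plus reads that are unmapped or have an unmapped partner."""
    for line in sam_lines:
        if line.startswith("@"):
            yield line   # header lines pass through unchanged
            continue
        flag = int(line.split("\t", 2)[1])   # FLAG is the second SAM column
        if flag & (UNMAPPED | MATE_UNMAPPED):
            yield line   # each read is seen once, so nothing is duplicated

example = [
    "@HD\tVN:1.4\n",
    "r1\t77\t*\t0\t0\t*\t*\t0\t0\tACGT\tIIII\n",          # read and partner unmapped
    "r2\t99\tchr1\t10\t60\t4M\t=\t50\t44\tACGT\tIIII\n",  # both mapped
    "r3\t73\tchr1\t10\t60\t4M\t=\t10\t0\tACGT\tIIII\n",   # partner unmapped
]
kept = list(keep_if_either_unmapped(example))
```

Because the input order is preserved and no read is emitted twice, this avoids the separate merge and sort, at the cost of a round trip through SAM text.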

That could be repeated within Galaxy but is surprisingly complicated
with multiple steps in the history - so I do not want to go that route.

Have I overlooked a simple ToolShed solution using samtools?

As far as I could tell, the only other option on the current Tool Shed
is the Sambamba Filter tool (using unmapped or mate_is_unmapped),
which has a very capable looking filter system:
https://toolshed.g2.bx.psu.edu/view/lomereiter/sambamba_filter

@Artem - have you explored updating your tool_dependencies.xml
to download your pre-compiled binaries by default? That would
make deployment far easier, since D compilers are still rare, and
would mean we can see the test results on the Tool Shed :)
Please ask if you'd like advice on Tool Shed packaging.

Thanks,

Peter


Re: [galaxy-dev] Can existing SAM/BAM filter tools give me mapped/unmapped pairs?

2014-11-04 Thread Peter Cock
On Tue, Nov 4, 2014 at 2:44 PM, Peter Cock p.j.a.c...@googlemail.com wrote:
 Hi all,

 I'm looking for a little advice on the pre-existing SAM/BAM filtering
 tools already in the Galaxy Tool Shed (to avoid reinventing the wheel).

 As I mentioned on another thread, I'm working on a wrapper for the
 samtools bam2fq command (targeting samtools 1.1 which fixed
 some bugs in this tool and added new functionality compared to
 samtools 0.1.19), see:

 https://github.com/peterjc/pico_galaxy/tree/master/tools/samtools_bam2fq
 https://toolshed.g2.bx.psu.edu/view/peterjc/samtools_bam2fq
 https://testtoolshed.g2.bx.psu.edu/view/peterjc/samtools_bam2fq


Going off topic, but I just hit a problem here:
https://github.com/samtools/samtools/issues/313

Depending on whether the reads have a QUAL value or not, samtools bam2fq
will produce either FASTA or FASTQ output - and will happily give
a mixture in one file. I know Heng Li has a parser that will take this
kind of input, but Galaxy likes to have well defined file formats.

I may have to fix samtools, perhaps by adding a strict FASTQ
output mode?
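To make the mixed-output problem concrete, here is a minimal sketch in Python of a check a wrapper could run on the output. It assumes unwrapped records (four-line FASTQ, two-line FASTA), which is an assumption about the output layout. The function names are made up for illustration and none of this is part of samtools or Galaxy.

```python
def classify_records(lines):
    """Return one label per record: 'fastq' or 'fasta'."""
    kinds = []
    it = iter(line.rstrip("\n") for line in lines)
    for header in it:
        if not header:
            continue                      # tolerate blank lines
        if header.startswith(">"):
            next(it, None)                # skip the single sequence line
            kinds.append("fasta")
        elif header.startswith("@"):
            seq, plus, qual = next(it, None), next(it, None), next(it, None)
            if None in (seq, plus, qual) or not plus.startswith("+"):
                raise ValueError("Truncated or malformed FASTQ record: " + header)
            if len(seq) != len(qual):
                raise ValueError("SEQ/QUAL length mismatch: " + header)
            kinds.append("fastq")
        else:
            raise ValueError("Unexpected line: " + header)
    return kinds

def is_strict_fastq(lines):
    """True only if every record is FASTQ."""
    return all(kind == "fastq" for kind in classify_records(lines))

mixed = ["@r1\n", "ACGT\n", "+\n", "IIII\n", ">r2\n", "ACGT\n"]
print(classify_records(mixed))
```

Reading each FASTQ record as a fixed block of four lines means a quality string that happens to start with "@" is not mistaken for a header.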

Peter


[galaxy-dev] importing a large number of samples via a sample sheet

2014-11-04 Thread Ryan G
Hi all - I've recently come back to Galaxy to see how it's progressed over
the last few years.  The Tool Shed is completely new to me and will take
some exploring on my part.

My immediate question is whether there is a way to import a large number of
samples associated with a project via a sample sheet (txt) file?  I have a
text file with 3 columns: sample name, location of fastq files.

I haven't seen a way to import this and haven't seen anything in the docs
either.  I suspect the answer is no, but I'm hoping the answer is actually
yes?

Ryan