Re: [galaxy-dev] SAM/BAM Hybrid Selection Metrics

2012-01-10 Thread Ryan Golhar
I've been looking at this all day today, and here's what I've figured out.
The picard_wrapper.py simply puts the SAM header from the input BAM file
at the top of the BED file.  However, the interval file actually has its
columns in a different order:
Seq Name, Start Pos (1-based), End Pos, Strand, Interval Name
whereas the BED file uses the format
Seq Name, Start Pos (0-based), End Pos, Name, Score, Strand

So the BED file actually needs to be converted and not just have the SAM
header added.  I wonder whether the wrapper should NOT be doing this and
this should instead be a whole different file format.  I see in
datatypes_conf.xml that a picard_interval_list datatype exists, but I'm
not sure it's entirely correct either.  Would it be more appropriate to
have the user upload a correctly formatted file, or should the wrapper
just re-order the BED columns and add 1 to the start pos?
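
For what it's worth, the re-ordering described above could look roughly
like the sketch below.  This is not what picard_wrapper.py does today; the
function name and the default_strand fallback are made up for illustration,
and what to do when the BED file has no strand column is still an open
question (falling back to "+" is only a placeholder assumption).

```python
def bed_to_interval_line(bed_line, default_strand="+"):
    """Convert one tab-separated BED record to an interval-list record.

    BED:       seq, start (0-based), end, name, score, strand
    Intervals: seq, start (1-based), end, strand, name
    """
    fields = bed_line.rstrip("\n").split("\t")
    if len(fields) < 3:
        raise ValueError("BED record needs at least 3 columns: %r" % bed_line)
    chrom, start, end = fields[0], int(fields[1]), int(fields[2])
    # Hypothetical fallback: build a name from the 1-based coordinates
    # when the BED record has no name column.
    if len(fields) > 3:
        name = fields[3]
    else:
        name = "%s:%d-%d" % (chrom, start + 1, end)
    # Assumption: use default_strand when strand is missing or is ".".
    if len(fields) > 5 and fields[5] in ("+", "-"):
        strand = fields[5]
    else:
        strand = default_strand
    # Only the start changes (0-based to 1-based); the end stays as-is.
    return "\t".join([chrom, str(start + 1), str(end), strand, name])


print(bed_to_interval_line("chr1\t99\t200\tbait1\t0\t-"))
```

The end position is left alone because a 0-based, half-open BED end is
numerically the same as a 1-based, inclusive end.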


___
Please keep all replies on the list by using "reply all"
in your mail client.  To manage your subscriptions to this
and other Galaxy lists, please use the interface at:

  http://lists.bx.psu.edu/

Re: [galaxy-dev] SAM/BAM Hybrid Selection Metrics

2012-01-10 Thread Ryan Golhar
In case anyone is interested I posted a message to samtools-dev and got a
few responses about it.  The thread is called 'Picard bait/target format
file for HsMetrics'.  Now, for Galaxy, I think the wrapper should not
accept the BED file as input as that doesn't work.  I like the idea of a
new file format (picardBaitTarget or maybe picardIntervalList) as the input
type.

If the converter tool adds a header to the BED file, then there is the
possibility that a user can associate the BED file with the wrong version
of a genome.  This is what Picard was trying to avoid.  But that doesn't
mean a user can't manually add the wrong header anyway.  If the BED file is
missing strand information, I don't think the tool should add it.  I would
say just leave the rest of the file alone.  If there is no strand
information, perhaps the user doesn't care about the strand.
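
As a rough illustration of the wrong-genome concern, a converter could at
least compare the BED records against the @SQ lines of the SAM header
before gluing the two together.  The sketch below is only that, a sketch:
the function names are invented here, it assumes the header is already
available as text lines, and it can only catch gross mismatches (unknown
sequence names, intervals past the end of a sequence), not prove that the
two files come from the same genome build.

```python
def parse_sq_lengths(header_lines):
    """Map sequence name -> length using the @SQ lines of a SAM header."""
    lengths = {}
    for line in header_lines:
        if not line.startswith("@SQ"):
            continue
        # Each field after the record type looks like TAG:VALUE.
        tags = dict(f.split(":", 1) for f in line.rstrip("\n").split("\t")[1:])
        lengths[tags["SN"]] = int(tags["LN"])
    return lengths


def find_mismatches(header_lines, bed_lines):
    """Return (line number, message) pairs for BED records that do not
    fit the sequences declared in the header."""
    lengths = parse_sq_lengths(header_lines)
    problems = []
    for n, line in enumerate(bed_lines, 1):
        # Skip blank lines and the usual BED comment/track/browser lines.
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        fields = line.rstrip("\n").split("\t")
        chrom, end = fields[0], int(fields[2])
        if chrom not in lengths:
            problems.append((n, "unknown sequence %s" % chrom))
        elif end > lengths[chrom]:
            problems.append(
                (n, "end %d beyond %s length %d" % (end, chrom, lengths[chrom]))
            )
    return problems


header = ["@HD\tVN:1.0\tSO:coordinate\n", "@SQ\tSN:chr1\tLN:1000\n"]
bed = ["chr1\t0\t500\n", "chr1\t900\t1200\n", "chrX\t0\t10\n"]
for lineno, message in find_mismatches(header, bed):
    print("BED line %d: %s" % (lineno, message))
```

A check like this would not stop a user from manually adding the wrong
header, but it would flag the obvious cases before Picard runs.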


Re: [galaxy-dev] SAM/BAM Hybrid Selection Metrics

2012-01-09 Thread Ross
Hi Ryan,

Yes, the Picard tool mandates a bizarre bait/target format file for
reasons which might best be addressed to the Picard devs - they may
have some very good reasons although I can't imagine what they are.
:)

Yes, automated conversion of any valid Galaxy bed dataset into the
strange format required by the Picard tool is a very good idea. We're
already half way there because the tool wrapper adds the (IMHO really
silly) required SAM header automagically.

A new datatype (eg "picardBaitTarget") and an automated converter
would make the tool much easier to use - it's far from ideal to force
Galaxy users to comply with the strange Picard format requirements if
we can automate a converter.

I thought about implementing one but stopped when I realized that I am
not sure what an automated converter should do if the user supplies a
valid Galaxy bed lacking strand information - generally, making up
strand is not a good idea. I don't have enough insight into the way
the stats are calculated to know whether bad things might happen if
(eg) we assume all the bait and target regions are on the + strand if
they're not - but if someone can describe how to automate the
conversion, it would definitely be an improvement to the usability of
the Picard tool.

Suggestions welcomed!





-- 
Ross Lazarus MBBS MPH;
Associate Professor, Harvard Medical School;
Head, Medical Bioinformatics, BakerIDI; Tel: +61 385321444;



[galaxy-dev] SAM/BAM Hybrid Selection Metrics

2012-01-09 Thread Ryan Golhar
Hi all - I think there is a problem with the Picard HSMetrics wrapper in
Galaxy.  The wrapper accepts a BAM file and a BED file.  However, the BED
file isn't really in BED format...it requires a SAM header before the BED
lines.  This really isn't a BED file format.  I'm not quite sure how Galaxy
should deal with this...maybe a file format specific to Picard-formatted
BED files.