I checked some of the offending records, and none of them had the 0x800 bit
set. It seems that samtools sort -n is able to handle this
case, so we will run samtools sort -n followed by a small Java
class to fix the sort-order declaration in the header, eliminate the dups, and
strip out alignment info. Hopefully that'll do it!
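
For anyone following along, here is a rough Python sketch of the three steps described above (it is not the Java class itself). It assumes name-sorted SAM text lines as input, keeps the first primary record seen for each read name and end, and drops all optional tags (including RG), which may not be what you want. The function names are made up for illustration.

```python
# SAM flag bits, as defined in the SAM specification
PAIRED, UNMAPPED, MATE_UNMAPPED, REVERSE = 0x1, 0x4, 0x8, 0x10
READ1, READ2, SECONDARY, SUPPLEMENTARY = 0x40, 0x80, 0x100, 0x800

COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def fix_header(line):
    """Declare queryname sort order on the @HD line; pass other header lines through."""
    if not line.startswith("@HD"):
        return line
    fields = [f for f in line.split("\t") if not f.startswith("SO:")]
    return "\t".join(fields + ["SO:queryname"])


def strip_alignment(fields):
    """Return the 11 mandatory SAM fields rewritten as an unaligned record."""
    flag = int(fields[1])
    seq, qual = fields[9], fields[10]
    if flag & REVERSE:
        # Reverse-strand records store the reverse complement; restore the read as sequenced.
        if seq != "*":
            seq = seq.translate(COMPLEMENT)[::-1]
        if qual != "*":
            qual = qual[::-1]
    new_flag = (flag & (PAIRED | READ1 | READ2)) | UNMAPPED
    if flag & PAIRED:
        new_flag |= MATE_UNMAPPED
    # Assumption: optional tags (fields[11:]) are simply discarded.
    return [fields[0], str(new_flag), "*", "0", "0", "*", "*", "0", "0", seq, qual]


def revert(sam_lines):
    """Yield header lines plus one unaligned record per (read name, end)."""
    current_name, seen_ends = None, set()
    for line in sam_lines:
        line = line.rstrip("\n")
        if line.startswith("@"):
            yield fix_header(line)
            continue
        fields = line.split("\t")
        flag = int(fields[1])
        if flag & (SECONDARY | SUPPLEMENTARY):
            continue
        if fields[0] != current_name:
            # Input is name-sorted, so only the current name needs tracking.
            current_name, seen_ends = fields[0], set()
        end = flag & (READ1 | READ2)
        if end in seen_ends:
            continue  # an extra primary record for this end: drop it
        seen_ends.add(end)
        yield "\t".join(strip_alignment(fields))
```

The input lines could come from something like samtools view -h on the name-sorted BAM. Note that if the file has no @HD line at all, this sketch does not add one.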
Thanks for the help.
Michael
From: Alec Wysoker [mailto:al...@broadinstitute.org]
Sent: Monday, June 16, 2014 12:29 PM
To: Rusch, Michael
Cc: samtools-help@lists.sourceforge.net
Subject: Re: [Samtools-help] working with file with multiple primary alignments
Hi Michael,
Note that some aligners, e.g. BWA, can now produce split alignments if an
alignment cannot be represented by a coordinate + cigar string. In a group of
SAMRecords representing such an alignment, all but one of them should have the
0x800 supplementary flag set. Newer versions of Picard handle this properly.
I.e. for a given {read name, end number} there should be only one SAMRecord
that has neither 0x800 (supplementary) nor 0x100 (secondary) flag set.
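
A quick way to check a file against that rule is to count, for each {read name, end number}, the records that have neither flag set. The following is only a sketch in Python over SAM text lines; the function name is hypothetical, and it holds every read name in memory, so it is better suited to a sample of the file than to a whole BAM.

```python
from collections import Counter

# SAM flag bits, as defined in the SAM specification
READ1, READ2 = 0x40, 0x80
SECONDARY, SUPPLEMENTARY = 0x100, 0x800


def multiple_primaries(sam_lines):
    """Return {(read name, end): count} for ends with more than one primary record."""
    counts = Counter()
    for line in sam_lines:
        if line.startswith("@"):
            continue  # header line
        name, flag = line.split("\t", 2)[:2]
        flag = int(flag)
        if flag & (SECONDARY | SUPPLEMENTARY):
            continue  # not a primary record
        # End number: 1 or 2 for paired reads, 0 when neither bit is set.
        end = 1 if flag & READ1 else 2 if flag & READ2 else 0
        counts[(name, end)] += 1
    return {key: n for key, n in counts.items() if n > 1}
```

An empty result means every {read name, end number} has at most one primary record; anything else lists the offenders and how many primaries each has.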
-Alec
On Jun 16, 2014, at 1:08 PM, Rusch, Michael <michael.ru...@stjude.org> wrote:
We downloaded some BAM files for some public data. We would like to extract
raw reads and remap. We've had trouble doing this so far for this one
particular data set using Picard, and after some investigating, the root cause
seems to be that they have some reads that have more than one primary
alignment. So, for a given read name and mate (1/2), there are in some cases
multiple SAM records that do not have the non-primary flag set. This seems
"wrong" to me. Am I missing something?
In any case, right or wrong, we're having trouble working with it. SamToFastq
throws an exception, as does SortSam (we were going to sort by queryname and
then use some simple code to filter out the dups before converting to FASTQ).
Anybody successfully dealt with this before who could provide some advice?
FYI: we're using an old version of Picard, so we're trying with a newer version
in case the newer version can handle it. We're also going to try samtools sort
followed by the filtering code. But, I thought maybe if others had dealt with
this, they could enlighten me with their wisdom and save us some work
reinventing the wheel.
Michael
Samtools-help mailing list
Samtools-help@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/samtools-help