On 27 May 2014, at 10:58, Wolfgang Maier 
<wolfgang.ma...@biologie.uni-freiburg.de> wrote:
> On 22.05.2014 20:59, Kate Im wrote:
>> the number of unmapped reads (estimated by
>> subtracting the reported number of mapped reads from the reported number
>> of total reads) is always higher than the number of sequences with an "*"
>> in the third column of the SAM file. Shouldn't these be the same?
> 
> Ideally, yes, but the SAM/BAM format specifications 
> (http://samtools.github.io/hts-specs/SAMv1.pdf) say that:
> 
> "Bit 0x4 [in the FLAG field] is the only reliable place to tell whether 
> the segment is unmapped."

In particular (see §2, 4.1 of that document), there is the common convention 
for pairs in which just one end is mapped, of giving both reads the RNAME and 
POS (3rd and 4th) columns of the mapped end.  This has the useful side-effect 
of bringing the unmapped end alongside its mate when the file is 
coordinate-sorted.
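
To illustrate why the two counts differ, here is a minimal Python sketch that 
counts unmapped records both ways: by testing bit 0x4 of FLAG, and by looking 
for "*" in the RNAME column. The function name and the example records are 
made up for illustration; they are not taken from any real file.

```python
def count_unmapped(sam_lines):
    """Return (count with FLAG bit 0x4 set, count with RNAME == '*')."""
    by_flag = 0
    by_rname = 0
    for line in sam_lines:
        # Skip blank lines and header lines, which start with '@'.
        if not line.strip() or line.startswith("@"):
            continue
        fields = line.rstrip("\n").split("\t")
        flag = int(fields[1])   # column 2: FLAG
        rname = fields[2]       # column 3: RNAME
        if flag & 0x4:          # the only reliable "unmapped" indicator
            by_flag += 1
        if rname == "*":
            by_rname += 1
    return by_flag, by_rname


# Hypothetical records: r1 is a pair with only the first end mapped, so the
# unmapped mate carries the RNAME and POS of the mapped end; r2 is an
# unpaired, unmapped read with no coordinate.
EXAMPLE_RECORDS = [
    "@HD\tVN:1.4\tSO:coordinate",
    "r1\t73\tchr1\t200\t60\t4M\t=\t200\t0\tACGT\tIIII",
    "r1\t133\tchr1\t200\t0\t*\t=\t200\t0\tTTTT\tIIII",
    "r2\t4\t*\t0\t0\t*\t*\t0\t0\tGGGG\tIIII",
]

if __name__ == "__main__":
    print(count_unmapped(EXAMPLE_RECORDS))
```

Here two records have bit 0x4 set but only one has "*" as RNAME, which is the 
kind of discrepancy described in the original question. On a real file, 
"samtools view -c -f 4" gives the FLAG-based count directly.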

    John
