On 27 May 2014, at 10:58, Wolfgang Maier <wolfgang.ma...@biologie.uni-freiburg.de> wrote:

> On 22.05.2014 20:59, Kate Im wrote:
>> the number of unmapped reads (estimated by
>> subtracting the reported number of mapped reads from the reported number
>> of total reads) is always higher than the number of sequences with an "*"
>> in the third column of the SAM file. Shouldn't these be the same?
>
> Ideally, yes, but the SAM/BAM format specifications
> (http://samtools.github.io/hts-specs/SAMv1.pdf) say that:
>
> "Bit 0x4 [in the FLAG field] is the only reliable place to tell whether
> the segment is unmapped."
In particular (see §2, 4.1 of that document), there is a common convention, for pairs in which only one end is mapped, of giving both reads the RNAME and POS values (3rd and 4th columns) of the mapped end. So an unmapped read whose mate is mapped has FLAG bit 0x4 set but no "*" in the third column, which is why counting "*" entries gives a lower number than the flag-based count. This has the useful side-effect of bringing the unmapped end alongside its mate when the file is coordinate-sorted.

John

--
The Wellcome Trust Sanger Institute is operated by Genome Research
Limited, a charity registered in England with number 1021457 and a
company registered in England with number 2742969, whose registered
office is 215 Euston Road, London, NW1 2BE.

_______________________________________________
Samtools-help mailing list
Samtools-help@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/samtools-help
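To illustrate the discrepancy discussed in this thread, here is a minimal sketch in Python (the thread itself contains no code) that counts unmapped records in two ways: by FLAG bit 0x4 and by an "*" in the RNAME column. It assumes plain-text SAM input with tab-separated fields and "@" header lines; the function name `count_unmapped` and the sample records are hypothetical and made up for illustration. The sample includes a pair in which one end is unmapped but, following the convention described above, carries its mate's RNAME and POS.

```python
"""Compare two ways of counting unmapped reads in SAM text.

Assumptions: input is plain SAM (not BAM), tab-separated, with header
lines starting with '@'. The sample records below are hypothetical.
"""

UNMAPPED = 0x4  # FLAG bit: segment unmapped


def count_unmapped(sam_lines):
    """Return (count by FLAG bit 0x4, count by RNAME == '*')."""
    by_flag = 0
    by_rname = 0
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue  # skip blank lines and header lines
        fields = line.rstrip("\n").split("\t")
        flag = int(fields[1])  # column 2: FLAG
        rname = fields[2]      # column 3: RNAME
        if flag & UNMAPPED:
            by_flag += 1
        if rname == "*":
            by_rname += 1
    return by_flag, by_rname


# Hypothetical records. Pair r1: first end mapped (FLAG 73 = paired,
# mate unmapped, first in pair); second end unmapped (FLAG 133 = paired,
# unmapped, second in pair) but given the mate's RNAME/POS.
# r2 is a fully unmapped single read with RNAME '*'.
SAMPLE = [
    "@SQ\tSN:chr1\tLN:1000",
    "r1\t73\tchr1\t100\t60\t10M\t=\t100\t0\tACGTACGTAC\t*",
    "r1\t133\tchr1\t100\t0\t*\t=\t100\t0\tTTTTTTTTTT\t*",
    "r2\t4\t*\t0\t0\t*\t*\t0\t0\tGGGGGGGGGG\t*",
]

if __name__ == "__main__":
    print(count_unmapped(SAMPLE))  # prints (2, 1)
```

The flag-based count (2) exceeds the "*" count (1) because the unmapped end of r1 sits at its mate's position. For BAM files, the same flag-based count can be obtained with `samtools view -c -f 4 file.bam`.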