On Mon, 16 Jan 2017, Colin Hercus wrote:

> Sometimes we get reads to align that have been trimmed to zero length and
> I'm wondering how these should be represented in SAM format.
>
> Here's a pair as reported by Novoalign that had been trimmed by cutadapt
> and one read of the pair is zero length
>
> READID    77    *    0    0    *    *    0    0        *    PG:Z:novoalign
> READID    141    *    0    0    *    *    0    0    GTGTAGATCTCGGTGGTCGCCGTATCATTAAAAAAAAAAGGGG    EEDDB:=<;A9/=C=@A;:<,1:<?@.0<./;;;AC.;;5@::    PG:Z:novoalign
>
> The first read of the pair has a zero length SEQ field.
>
> This pair fails with a parse error in Samtools Version: 1.2 (using htslib
> 1.2.1) but is accepted by Samtools Version: 0.1.19-44428cd.
>
> What is a valid SAM record for a zero length read?

The SEQ field should be '*' rather than empty (and when SEQ is '*', QUAL 
must be '*' as well).  In fact, the latest version of samtools seems to 
correct your record to this instead of complaining about it:

cat > /tmp/test.sam
READID  77      *       0       0       *       *       0       0               *       PG:Z:novoalign
READID  141     *       0       0       *       *       0       0       GTGTAGATCTCGGTGGTCGCCGTATCATTAAAAAAAAAAGGGG     EEDDB:=<;A9/=C=@A;:<,1:<?@.0<./;;;AC.;;5@::     PG:Z:novoalign

samtools view /tmp/test.sam
READID  77      *       0       0       *       *       0       0       *       *       PG:Z:novoalign
READID  141     *       0       0       *       *       0       0       GTGTAGATCTCGGTGGTCGCCGTATCATTAAAAAAAAAAGGGG     EEDDB:=<;A9/=C=@A;:<,1:<?@.0<./;;;AC.;;5@::     PG:Z:novoalign
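
If you need to repair such records yourself before passing them to a
version of samtools that rejects them, something along these lines might
work.  This is only a rough sketch and not how samtools does it
internally: the function name normalize_empty_seq is made up for
illustration, and it assumes the fields are tab-separated, as the SAM
format requires.

```python
def normalize_empty_seq(line: str) -> str:
    """Return a SAM line with an empty SEQ field replaced by '*'.

    Hypothetical helper for illustration only. When SEQ becomes '*',
    QUAL is set to '*' too, since QUAL cannot be stored without SEQ.
    """
    if line.startswith("@"):
        return line  # header lines are passed through unchanged

    newline = "\n" if line.endswith("\n") else ""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 11:
        return line  # fewer than the 11 mandatory fields: leave alone

    seq, qual = 9, 10  # zero-based positions of SEQ and QUAL
    if fields[seq] == "":
        fields[seq] = "*"
        fields[qual] = "*"
    return "\t".join(fields) + newline


# The zero-length read from the example above, with real tabs.
record = "READID\t77\t*\t0\t0\t*\t*\t0\t0\t\t*\tPG:Z:novoalign"
print(normalize_empty_seq(record))
```

Records that already have a non-empty SEQ are returned unchanged, so it
should be safe to run over a whole file line by line.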


Rob Davies              r...@sanger.ac.uk
The Sanger Institute    http://www.sanger.ac.uk/
Hinxton, Cambs.,        Tel. +44 (1223) 834244
CB10 1SA, U.K.          Fax. +44 (1223) 494919


