On 6 Nov 2014, at 01:28, Mark Ravinet <mravi...@nig.ac.jp> wrote:
> Grepped lines from a typical header are as follows:
> 
>> @PG  ID:GSNAP        PN:gsnap        VN:2013-11-27   CL:gsnap -d stick_ref 
>> --maxsearch=10 -M 0 -m 5 -t 6 -n 1 -A sam --quiet-if-excessive 
>> --terminal-threshold=10 -i 2 ./samples_081014/CHA13_N_PA_F.fq
> 
> Similarly, the grepped line of one of the entry errors is as follows:
> 
>> 1_1101_10360_49339_1 16      groupI  11728   40      90M     *       0       
>> 0       
>> TCAATTATATTTAATATGAATAGTTACACCGTTAAACCAGCGTTGCATTTTTCCTCTCAAGGAATCCCTAGAGCCGCTTGCGTGCCTGCA
>>       
>> C>DCDDCDCAEDDDEDEEECDCCA>BA?DDDDCBA?CADFFHFFEHGIIGIGJJJIGFIHHCIJIIHGHFJJJIJIIJJIGGFFCCIHFH
>>       MD:Z:90 NH:i:1  HI:i:1  NM:i:0  SM:i:40 XQ:i:40 X2:i:0  XO:Z:UU PG:Z:A
> 
> I guess the error seems to have something to do with the “A” tag at the end 
> of the entry?

The purpose of having a PG:Z: field in a record is to tie that record back to a 
corresponding @PG header that describes the processing applied to that record 
(and any others with the same PG field).  The header line you quote declares 
ID:GSNAP rather than ID:A, so if there is no @PG ID:A header elsewhere, the 
PG:Z:A field has nothing to refer to.  The SAM specification is somewhat 
unclear here, but a dangling PG field like this is basically invalid as well 
as useless.
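
To check a file for this situation, something along these lines might help. It is only a rough sketch in plain Python over SAM text (e.g. the output of "samtools view -h"), not what samtools does internally. The function name and the shortened example record are made up for illustration.

```python
def find_undeclared_pg(sam_lines):
    """Return PG:Z: values used by records that no @PG header declares.

    Hypothetical helper; expects tab-separated SAM text lines.
    """
    declared = set()  # IDs taken from @PG header lines
    used = set()      # values of PG:Z: fields seen on alignment records
    for line in sam_lines:
        fields = line.rstrip("\n").split("\t")
        if line.startswith("@"):
            # Header line: only @PG lines are of interest here.
            if fields[0] == "@PG":
                for field in fields[1:]:
                    if field.startswith("ID:"):
                        declared.add(field[3:])
            continue
        # Alignment record: optional fields follow the 11 mandatory columns.
        for field in fields[11:]:
            if field.startswith("PG:Z:"):
                used.add(field[5:])
    return used - declared


# Shortened, made-up record modelled on the one quoted above.
example = [
    "@PG\tID:GSNAP\tPN:gsnap\tVN:2013-11-27",
    "read1\t16\tgroupI\t11728\t40\t4M\t*\t0\t0\tTCAA\tC>DC\tNM:i:0\tPG:Z:A",
]
print(find_undeclared_pg(example))
```

Anything this returns is a PG value with no header to tie it back to, which is the case described above.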

The record is still kept; "tag lost" just means that the PG:Z:A field is 
removed.
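
In other words, the effect on the record is roughly the following. This is a sketch of the outcome on a SAM text line only, not samtools's actual code; the function name and the shortened record are hypothetical.

```python
def drop_pg_tag(record):
    """Return the SAM record line with any PG:Z: optional field removed.

    The 11 mandatory columns and all other optional fields are kept.
    """
    fields = record.rstrip("\n").split("\t")
    mandatory, optional = fields[:11], fields[11:]
    kept = [field for field in optional if not field.startswith("PG:Z:")]
    return "\t".join(mandatory + kept)


# Shortened, made-up record modelled on the one quoted above.
before = "read1\t16\tgroupI\t11728\t40\t4M\t*\t0\t0\tTCAA\tC>DC\tNM:i:0\tPG:Z:A"
after = drop_pg_tag(before)
print(after)
```

The alignment itself and its other tags come through unchanged; only the PG field goes.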

> However I ran these commands on a group of bams including several which do 
> not produce this error and they also have the “A” tag and an identical header.

This warning message is produced when samtools is merging the temporary files 
produced while sorting a large BAM file.  So probably the several that don't 
produce this message happen to be small enough that they can be sorted without 
temporary files.  They still manifest this problem, but it is not diagnosed.

(That said, it's not really samtools's style to enforce this sort of thing.  So 
perhaps this could be relaxed somewhat.  BTW in each BAM file do you get one or 
a few messages like this, or thousands?)

    John

-- 
 The Wellcome Trust Sanger Institute is operated by Genome Research 
 Limited, a charity registered in England with number 1021457 and a 
 company registered in England with number 2742969, whose registered 
 office is 215 Euston Road, London, NW1 2BE. 
