Hi,

I posted a similar question on Biostars and realized I should have come
here to begin with. We received ~1000 whole-genome BAMs whose reads had no
RG tag (the @RG line existed in the header, though). We used 'bamaddrg'
to add RG tags to the reads and now get the following error when
we run Picard's MarkDuplicates:

Exception in thread "main" htsjdk.samtools.SAMFormatException: SAM
validation error: ERROR: Record 1642900, Read name
HS2000-1005_167:8:1103:3541:88508, bin field of BAM record does not
equal value computed based on alignment start and end, and length of
sequence to which read is aligned
        at htsjdk.samtools.SAMUtils.processValidationErrors(SAMUtils.java:452)
        at htsjdk.samtools.BAMFileReader$BAMFileIterator.advance(BAMFileReader.java:643)
        at htsjdk.samtools.BAMFileReader$BAMFileIterator.next(BAMFileReader.java:628)
        at htsjdk.samtools.BAMFileReader$BAMFileIterator.next(BAMFileReader.java:598)
        at htsjdk.samtools.SamReader$AssertingIterator.next(SamReader.java:514)
        at htsjdk.samtools.SamReader$AssertingIterator.next(SamReader.java:488)
        at picard.sam.MarkDuplicates.buildSortedReadEndLists(MarkDuplicates.java:413)
        at picard.sam.MarkDuplicates.doWork(MarkDuplicates.java:177)
        at picard.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:183)
        at picard.sam.MarkDuplicates.main(MarkDuplicates.java:161)


With 'VALIDATION_STRINGENCY=LENIENT', MarkDuplicates ignores the error
but still prints it for a large number of reads (I didn't count how many).
Another thread on the mailing list (
http://sourceforge.net/p/samtools/mailman/message/31853465/) says that this is
"bad" and that we can use the following command to fix it: 'java -classpath
sam-1.99.jar net.sf.samtools.FixBAMFile test.bam fixed.bam'

We have a lot of large BAMs, though, so fixing each one after the fact is
not very practical.

Questions:
1. The error states that the bin is calculated based on alignment start and
end. These values did not change! So why would the calculated bin change?
2. Is there a more manageable way to avoid the incorrect bins while adding
RG tags to the reads?
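
For reference, this is how I understand the bin to be computed. It is a
Python transcription of the reg2bin function printed in the SAM
specification, so treat it as a sketch of my reading of the spec and not as
what Picard or bamaddrg actually run. I am assuming a 0-based, half-open
interval [beg, end), and the example coordinates are the mapped read from
the pair shown below (POS 5881857, CIGAR 100M).

```python
def reg2bin(beg: int, end: int) -> int:
    """Bin number for the 0-based, half-open interval [beg, end).

    Transcribed from the reg2bin C function in the SAM specification.
    """
    end -= 1  # work with the last covered base
    if beg >> 14 == end >> 14:
        return ((1 << 15) - 1) // 7 + (beg >> 14)
    if beg >> 17 == end >> 17:
        return ((1 << 12) - 1) // 7 + (beg >> 17)
    if beg >> 20 == end >> 20:
        return ((1 << 9) - 1) // 7 + (beg >> 20)
    if beg >> 23 == end >> 23:
        return ((1 << 6) - 1) // 7 + (beg >> 23)
    if beg >> 26 == end >> 26:
        return ((1 << 3) - 1) // 7 + (beg >> 26)
    return 0


# Mapped read from the example pair: 1-based POS 5881857, CIGAR 100M.
beg = 5881857 - 1   # convert to 0-based start
end = beg + 100     # 100M covers 100 reference bases
print(reg2bin(beg, end))
```

Nothing in this calculation depends on the optional tags, which is why the
error after adding RG surprises me.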

Here is the same read pair before and after adding the RG tag:

### BEGIN ###
HS2000-1005_167:8:1103:3541:88508 73 chr1 5881857 254 100M * 0 0
CCGTGCAGTTCCCTTGGGTTTTGAAGCAAAGCCACAGTCTCTTCAGCAAACAACTATTTCCTTTAAAGACACAGTTCAGGAGTTGCTTCTGGACCTGATG
@?@FFFFFHGHHDHCHIIAFHGGGHGCHHJJJIGIIIBDABDHHGBEG3BFDCHIIIIIHBHIGHIGH@
@EHH>?;CD;;;;(6@CDC>CC(;(5(9?@C BC:Z:0 XD:Z:100 SM:i:500 AS:i:0
HS2000-1005_167:8:1103:3541:88508 133 chr1 5881857 0 * = 5881857 0
GGGGGGCCAAGGGGGGGGTTGGGCACAGGGGGAGGGGGGACGGGGGGGAAATCCCTCCCGCGTCGGGTTACAATATTTTTTCTGGCTCCTTTGGTCCCGG
####################################################################################################
BC:Z:0

HS2000-1005_167:8:1103:3541:88508 73 chr1 5881857 254 100M * 0 0
CCGTGCAGTTCCCTTGGGTTTTGAAGCAAAGCCACAGTCTCTTCAGCAAACAACTATTTCCTTTAAAGACACAGTTCAGGAGTTGCTTCTGGACCTGATG
@?@FFFFFHGHHDHCHIIAFHGGGHGCHHJJJIGIIIBDABDHHGBEG3BFDCHIIIIIHBHIGHIGH@
@EHH>?;CD;;;;(6@CDC>CC(;(5(9?@C BC:Z:0 XD:Z:100 SM:i:500 AS:i:0 RG:Z:MYGROUP
HS2000-1005_167:8:1103:3541:88508 133 chr1 5881857 0 * = 5881857 0
GGGGGGCCAAGGGGGGGGTTGGGCACAGGGGGAGGGGGGACGGGGGGGAAATCCCTCCCGCGTCGGGTTACAATATTTTTTCTGGCTCCTTTGGTCCCGG
####################################################################################################
BC:Z:0 RG:Z:MYGROUP
### END ###

Thanks!
_______________________________________________
Samtools-help mailing list
Samtools-help@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/samtools-help
