Hi,
I posted a similar question on Biostars and realized I should have come
here to begin with. We received ~1000 whole-genome BAMs whose reads had no
RG tag (the @RG line existed in the header, though). We used 'bamaddrg'
to add RG tags to the reads, and we now get the following error when
we run Picard's MarkDuplicates:
Exception in thread "main" htsjdk.samtools.SAMFormatException: SAM validation error: ERROR: Record 1642900, Read name HS2000-1005_167:8:1103:3541:88508, bin field of BAM record does not equal value computed based on alignment start and end, and length of sequence to which read is aligned
    at htsjdk.samtools.SAMUtils.processValidationErrors(SAMUtils.java:452)
    at htsjdk.samtools.BAMFileReader$BAMFileIterator.advance(BAMFileReader.java:643)
    at htsjdk.samtools.BAMFileReader$BAMFileIterator.next(BAMFileReader.java:628)
    at htsjdk.samtools.BAMFileReader$BAMFileIterator.next(BAMFileReader.java:598)
    at htsjdk.samtools.SamReader$AssertingIterator.next(SamReader.java:514)
    at htsjdk.samtools.SamReader$AssertingIterator.next(SamReader.java:488)
    at picard.sam.MarkDuplicates.buildSortedReadEndLists(MarkDuplicates.java:413)
    at picard.sam.MarkDuplicates.doWork(MarkDuplicates.java:177)
    at picard.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:183)
    at picard.sam.MarkDuplicates.main(MarkDuplicates.java:161)
With 'VALIDATION_STRINGENCY=LENIENT', MarkDuplicates ignores the error,
but it still prints it for a large number of reads (I didn't count how many).
Another thread on the mailing list (
http://sourceforge.net/p/samtools/mailman/message/31853465/) says that is
"bad" and that we can fix it with the following command: 'java -classpath
sam-1.99.jar net.sf.samtools.FixBAMFile test.bam fixed.bam'
However, we have ~1000 large BAMs, so this is not very practical for us.
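If we did go the FixBAMFile route, the batch run could be sketched roughly
as below. This is only a sketch: the function names, the directory layout
and the '.fixed.bam' output naming are made up for illustration, and the jar
name and class are simply the ones quoted in the thread above. By default it
only builds the commands without running anything.

```python
from pathlib import Path
import subprocess


def fix_command(bam: Path, out_dir: Path, jar: str = "sam-1.99.jar") -> list[str]:
    """Build the FixBAMFile command quoted above for a single BAM.

    The output name (<stem>.fixed.bam) is an arbitrary choice for this sketch.
    """
    fixed = out_dir / f"{bam.stem}.fixed.bam"
    return ["java", "-classpath", jar,
            "net.sf.samtools.FixBAMFile", str(bam), str(fixed)]


def fix_all(bam_dir: Path, out_dir: Path, dry_run: bool = True) -> list[list[str]]:
    """Build (and optionally run) one command per *.bam file in bam_dir."""
    commands = [fix_command(bam, out_dir) for bam in sorted(bam_dir.glob("*.bam"))]
    if not dry_run:
        out_dir.mkdir(parents=True, exist_ok=True)
        for cmd in commands:
            # check=True stops the batch at the first failing file
            subprocess.run(cmd, check=True)
    return commands
```

Each file is still read and rewritten in full, so this does not make the
fix any cheaper, it only automates it.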
Questions:
1. The error says the bin is calculated from the alignment start and end.
Those values did not change, so why would the calculated bin change?
2. Is there a more manageable way to add RG tags to the reads without
ending up with incorrect bins?
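For reference, this is how I understand the bin calculation: a minimal
Python sketch of the reg2bin() function given in the SAM specification.
The helper names (reference_length, expected_bin) are my own, and treating
a read without a CIGAR as having length one is my reading of the spec's
note on unmapped reads, so please correct me if this is wrong.

```python
import re

# CIGAR operations that consume reference bases
REF_CONSUMING = set("MDN=X")


def reg2bin(beg: int, end: int) -> int:
    """Bin for the 0-based, half-open interval [beg, end) (SAM spec scheme)."""
    end -= 1
    if beg >> 14 == end >> 14:
        return ((1 << 15) - 1) // 7 + (beg >> 14)
    if beg >> 17 == end >> 17:
        return ((1 << 12) - 1) // 7 + (beg >> 17)
    if beg >> 20 == end >> 20:
        return ((1 << 9) - 1) // 7 + (beg >> 20)
    if beg >> 23 == end >> 23:
        return ((1 << 6) - 1) // 7 + (beg >> 23)
    if beg >> 26 == end >> 26:
        return ((1 << 3) - 1) // 7 + (beg >> 26)
    return 0


def reference_length(cigar: str) -> int:
    """Number of reference bases covered by a CIGAR string ('*' gives 0)."""
    if cigar == "*":
        return 0
    ops = re.findall(r"(\d+)([MIDNSHP=X])", cigar)
    return sum(int(n) for n, op in ops if op in REF_CONSUMING)


def expected_bin(pos_1based: int, cigar: str) -> int:
    """Bin expected for a record, given its SAM POS and CIGAR fields."""
    beg = pos_1based - 1                   # BAM stores 0-based positions
    length = reference_length(cigar) or 1  # no CIGAR: treat as length one
    return reg2bin(beg, beg + length)


print(expected_bin(5881857, "100M"))  # mapped read below -> 5040
print(expected_bin(5881857, "*"))     # unmapped mate at the same POS -> 5040
```

Nothing in this calculation depends on the tags, which is why I don't see
how adding RG could change the expected value.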
Here is the same read pair before (first two records) and after (last two
records) adding the RG tag:
### BEGIN ###
HS2000-1005_167:8:1103:3541:88508 73 chr1 5881857 254 100M * 0 0 CCGTGCAGTTCCCTTGGGTTTTGAAGCAAAGCCACAGTCTCTTCAGCAAACAACTATTTCCTTTAAAGACACAGTTCAGGAGTTGCTTCTGGACCTGATG @?@FFFFFHGHHDHCHIIAFHGGGHGCHHJJJIGIIIBDABDHHGBEG3BFDCHIIIIIHBHIGHIGH@@EHH>?;CD;;;;(6@CDC>CC(;(5(9?@C BC:Z:0 XD:Z:100 SM:i:500 AS:i:0
HS2000-1005_167:8:1103:3541:88508 133 chr1 5881857 0 * = 5881857 0 GGGGGGCCAAGGGGGGGGTTGGGCACAGGGGGAGGGGGGACGGGGGGGAAATCCCTCCCGCGTCGGGTTACAATATTTTTTCTGGCTCCTTTGGTCCCGG #################################################################################################### BC:Z:0
HS2000-1005_167:8:1103:3541:88508 73 chr1 5881857 254 100M * 0 0 CCGTGCAGTTCCCTTGGGTTTTGAAGCAAAGCCACAGTCTCTTCAGCAAACAACTATTTCCTTTAAAGACACAGTTCAGGAGTTGCTTCTGGACCTGATG @?@FFFFFHGHHDHCHIIAFHGGGHGCHHJJJIGIIIBDABDHHGBEG3BFDCHIIIIIHBHIGHIGH@@EHH>?;CD;;;;(6@CDC>CC(;(5(9?@C BC:Z:0 XD:Z:100 SM:i:500 AS:i:0 RG:Z:MYGROUP
HS2000-1005_167:8:1103:3541:88508 133 chr1 5881857 0 * = 5881857 0 GGGGGGCCAAGGGGGGGGTTGGGCACAGGGGGAGGGGGGACGGGGGGGAAATCCCTCCCGCGTCGGGTTACAATATTTTTTCTGGCTCCTTTGGTCCCGG #################################################################################################### BC:Z:0 RG:Z:MYGROUP
### END ###
Thanks!
_______________________________________________
Samtools-help mailing list
Samtools-help@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/samtools-help