Hi James, Thanks for your reply. I used UCSC hg19 from Illumina iGenomes. Is the value 8192 abnormally large for NM tag? This is a BAM from RNA-seq, so there is an intron of 8192 nt. How does NM compute? Does NM consider CIGAR "N" into account? Thanks.
One example read in the Initial BAM DF9F08P1:388:C737JACXX:1:1207:17084:11061 163 chr1 10412727 255 68M8192N8M = 10412789 138 GTTGATGCCATCCTCTCCCTAAATATTATTTCTGCCAAGTACCTGAAGTCTTCCCACAACTCTAGCAGGACCTTCT CCCFFFFFHHHHGJJJJJJIJIJJIJIJJJJJJJJJJJJHHIJJJIIJGIBGIIJJJJJJJJIJJJJIJJJJIJJH AS:i:0 XN:i:0 XM:i:0 XO:i:0 XG:i:0 NM:i:0 MD:Z:76 YS:i:0 YT:Z:CP XS:A:+ NH:i:1 The same read after conversion: DF9F08P1:388:C737JACXX:1:1207:17084:11061 163 chr1 10412727 255 68M8192N8M = 10412789 138 GTTGATGCCATCCTCTCCCTAAATATTATTTCTGCCAAGTACCTGAAGTCTTCCCACAACTCTAGCAGGACCTTCT CCCFFFFFHHHHGJJJJJJIJIJJIJIJJJJJJJJJJJJHHIJJJIIJGIBGIIJJJJJJJJIJJJJIJJJJIJJH AS:i:0 XN:i:0 XM:i:0 XO:i:0 XG:i:0 YS:i:0 YT:Z:CP XS:A:+ NH:i:1 MD:Z:76 NM:i:8192 Initial BAM header: @HD VN:1.0 SO:coordinate @SQ SN:chrM LN:16571 @SQ SN:chr1 LN:249250621 @SQ SN:chr2 LN:243199373 @SQ SN:chr3 LN:198022430 @SQ SN:chr4 LN:191154276 @SQ SN:chr5 LN:180915260 @SQ SN:chr6 LN:171115067 @SQ SN:chr7 LN:159138663 @SQ SN:chr8 LN:146364022 @SQ SN:chr9 LN:141213431 @SQ SN:chr10 LN:135534747 @SQ SN:chr11 LN:135006516 @SQ SN:chr12 LN:133851895 @SQ SN:chr13 LN:115169878 @SQ SN:chr14 LN:107349540 @SQ SN:chr15 LN:102531392 @SQ SN:chr16 LN:90354753 @SQ SN:chr17 LN:81195210 @SQ SN:chr18 LN:78077248 @SQ SN:chr19 LN:59128983 @SQ SN:chr20 LN:63025520 @SQ SN:chr21 LN:48129895 @SQ SN:chr22 LN:51304566 @SQ SN:chrX LN:155270560 @SQ SN:chrY LN:59373566 @PG ID:hisat2 PN:hisat2 VN:2.0.1-beta header of BAM in the end: @HD VN:1.0 SO:coordinate @SQ SN:chrM LN:16571 M5:d2ed829b8a1628d16cbeee88e88e39eb UR:/home/lin/genomes/Homo_sapiens/UCSC/hg19/Sequence/WholeGenomeFasta/genome.fa @SQ SN:chr1 LN:249250621 M5:1b22b98cdeb4a9304cb5d48026a85128 UR:/home/lin/genomes/Homo_sapiens/UCSC/hg19/Sequence/WholeGenomeFasta/genome.fa @SQ SN:chr2 LN:243199373 M5:a0d9851da00400dec1098a9255ac712e UR:/home/lin/genomes/Homo_sapiens/UCSC/hg19/Sequence/WholeGenomeFasta/genome.fa @SQ SN:chr3 LN:198022430 M5:641e4338fa8d52a5b781bd2a2c08d3c3 UR:/home/lin/genomes/Homo_sapiens/UCSC/hg19/Sequence/WholeGenomeFasta/genome.fa @SQ SN:chr4 LN:191154276 M5:23dccd106897542ad87d2765d28a19a1 UR:/home/lin/genomes/Homo_sapiens/UCSC/hg19/Sequence/WholeGenomeFasta/genome.fa @SQ SN:chr5 LN:180915260 M5:0740173db9ffd264d728f32784845cd7 UR:/home/lin/genomes/Homo_sapiens/UCSC/hg19/Sequence/WholeGenomeFasta/genome.fa @SQ SN:chr6 LN:171115067 M5:1d3a93a248d92a729ee764823acbbc6b UR:/home/lin/genomes/Homo_sapiens/UCSC/hg19/Sequence/WholeGenomeFasta/genome.fa @SQ SN:chr7 LN:159138663 M5:618366e953d6aaad97dbe4777c29375e UR:/home/lin/genomes/Homo_sapiens/UCSC/hg19/Sequence/WholeGenomeFasta/genome.fa @SQ SN:chr8 LN:146364022 M5:96f514a9929e410c6651697bded59aec UR:/home/lin/genomes/Homo_sapiens/UCSC/hg19/Sequence/WholeGenomeFasta/genome.fa @SQ SN:chr9 LN:141213431 M5:3e273117f15e0a400f01055d9f393768 UR:/home/lin/genomes/Homo_sapiens/UCSC/hg19/Sequence/WholeGenomeFasta/genome.fa @SQ SN:chr10 LN:135534747 M5:988c28e000e84c26d552359af1ea2e1d UR:/home/lin/genomes/Homo_sapiens/UCSC/hg19/Sequence/WholeGenomeFasta/genome.fa @SQ SN:chr11 LN:135006516 M5:98c59049a2df285c76ffb1c6db8f8b96 UR:/home/lin/genomes/Homo_sapiens/UCSC/hg19/Sequence/WholeGenomeFasta/genome.fa @SQ SN:chr12 LN:133851895 M5:51851ac0e1a115847ad36449b0015864 UR:/home/lin/genomes/Homo_sapiens/UCSC/hg19/Sequence/WholeGenomeFasta/genome.fa @SQ SN:chr13 LN:115169878 M5:283f8d7892baa81b510a015719ca7b0b UR:/home/lin/genomes/Homo_sapiens/UCSC/hg19/Sequence/WholeGenomeFasta/genome.fa @SQ SN:chr14 LN:107349540 M5:98f3cae32b2a2e9524bc19813927542e UR:/home/lin/genomes/Homo_sapiens/UCSC/hg19/Sequence/WholeGenomeFasta/genome.fa @SQ SN:chr15 LN:102531392 M5:e5645a794a8238215b2cd77acb95a078 UR:/home/lin/genomes/Homo_sapiens/UCSC/hg19/Sequence/WholeGenomeFasta/genome.fa @SQ SN:chr16 LN:90354753 M5:fc9b1a7b42b97a864f56b348b06095e6 UR:/home/lin/genomes/Homo_sapiens/UCSC/hg19/Sequence/WholeGenomeFasta/genome.fa @SQ SN:chr17 LN:81195210 M5:351f64d4f4f9ddd45b35336ad97aa6de UR:/home/lin/genomes/Homo_sapiens/UCSC/hg19/Sequence/WholeGenomeFasta/genome.fa @SQ SN:chr18 LN:78077248 M5:b15d4b2d29dde9d3e4f93d1d0f2cbc9c UR:/home/lin/genomes/Homo_sapiens/UCSC/hg19/Sequence/WholeGenomeFasta/genome.fa @SQ SN:chr19 LN:59128983 M5:1aacd71f30db8e561810913e0b72636d UR:/home/lin/genomes/Homo_sapiens/UCSC/hg19/Sequence/WholeGenomeFasta/genome.fa @SQ SN:chr20 LN:63025520 M5:0dec9660ec1efaaf33281c0d5ea2560f UR:/home/lin/genomes/Homo_sapiens/UCSC/hg19/Sequence/WholeGenomeFasta/genome.fa @SQ SN:chr21 LN:48129895 M5:2979a6085bfe28e3ad6f552f361ed74d UR:/home/lin/genomes/Homo_sapiens/UCSC/hg19/Sequence/WholeGenomeFasta/genome.fa @SQ SN:chr22 LN:51304566 M5:a718acaa6135fdca8357d5bfe94211dd UR:/home/lin/genomes/Homo_sapiens/UCSC/hg19/Sequence/WholeGenomeFasta/genome.fa @SQ SN:chrX LN:155270560 M5:7e0e2e580297b7764e31dbc80c2540dd UR:/home/lin/genomes/Homo_sapiens/UCSC/hg19/Sequence/WholeGenomeFasta/genome.fa @SQ SN:chrY LN:59373566 M5:1e86411d73e6f00a10590f976be01623 UR:/home/lin/genomes/Homo_sapiens/UCSC/hg19/Sequence/WholeGenomeFasta/genome.fa @PG ID:hisat2 PN:hisat2 VN:2.0.1-beta Chih-Hsu ------------------------------------------------------------ Chih-Hsu Lin Grad student Lichtarge Lab (T929) SCBMB, Baylor College of Medicine chihh...@bcm.edu ________________________________________ From: James Bonfield <j...@sanger.ac.uk> Sent: Wednesday, May 18, 2016 3:25:17 AM To: Lin, Chih-Hsu Cc: samtools-help@lists.sourceforge.net Subject: Re: [Samtools-help] About BAM->CRAM->BAM On Tue, May 17, 2016 at 09:17:20PM +0000, Lin, Chih-Hsu wrote: > During the conversion, BAM->CRAM->BAM, using samtools 1.3.1, I found the NM > tags were changed. Does anyone have solution to that? Can you give us a concrete example please; an alignment record along with the relevant @SQ line so we can see what the reference is. The reason for NM and MD changes is that CRAM doesn't explicitly store these (although it could, it leads to larger files). Instead it uses the reference to compute them on-the-fly. However we have seen a number of cases where NM/MD in the original BAM file are incorrect, due to bugs in aligners. This leads to changes after round-tripping through CRAM. So that said, are the NM values output by CRAM->BAM the same values that samtools calmd generates? If so then this is fixing your data! If not then we possibly have a bug that needs fixing. Some have asked for a way to store the invalid data in CRAM regardless. There is perhaps some (albeit twisted) logic to this as it makes the validation of the data the responsibility of other tools and not the file format itself. I experimented with this by writing out all NM/MD, but it usually leads to 5-10% growth. A better solution would be to check when they differ to the computed values and only store them then, although that will slow up CRAM encoding somewhat so I'm not convinced yet this is a problem in need of a solution. James -- James Bonfield (j...@sanger.ac.uk) | Hora aderat briligi. Nunc et Slythia Tova | Plurima gyrabant gymbolitare vabo; A Staden Package developer: | Et Borogovorum mimzebant undique formae, https://sf.net/projects/staden/ | Momiferique omnes exgrabure Rathi. -- The Wellcome Trust Sanger Institute is operated by Genome Research Limited, a charity registered in England with number 1021457 and a company registered in England with number 2742969, whose registered office is 215 Euston Road, London, NW1 2BE. ------------------------------------------------------------------------------ Mobile security can be enabling, not merely restricting. Employees who bring their own devices (BYOD) to work are irked by the imposition of MDM restrictions. Mobile Device Manager Plus allows you to control only the apps on BYO-devices by containerizing them, leaving personal data untouched! https://ad.doubleclick.net/ddm/clk/304595813;131938128;j _______________________________________________ Samtools-help mailing list Samtools-help@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/samtools-help