Hi Axel,

The sort command can simply be "samtools-1.1 sort try5 result_1.1"; you
don't need to pipe the file through cat.

The pretty_header function tries to reorganize the header into a nice
order (@HD lines first, then @SQ, @RG, @PG, and @CO, in that order).
You'll get the error you received any time the header produced by that
function has an unexpected length. In fact, the header you showed from
result_0.1.19.bam also looks very strange: it contains two @HD lines with
the @PG line between them. So either that function has always had issues,
or the original header (where @PG comes before @HD) is odd enough that the
function just can't deal with it. In any case, if you can shrink that BAM
file down (just subset it) so that it still produces the problem, and then
either post it somewhere (Dropbox, Google Drive, etc.) or email it, then I
or someone else on the list can probably have a look.
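
To make the two points above concrete, here is a rough sketch of two shell
helpers that work on plain SAM text. Both names (reorder_header and
subset_sam) are made up for illustration and are not part of samtools, and
the first one only mimics the ordering described above; it is not the
actual pretty_header code.

```shell
# Sketch only: hypothetical helpers operating on SAM text read from stdin.

# Print header lines grouped by record type in the order
# @HD, @SQ, @RG, @PG, @CO. Order within each type is preserved.
reorder_header() {
    local header tag
    header=$(cat)
    for tag in @HD @SQ @RG @PG @CO; do
        # grep exits non-zero when a record type is absent; ignore that.
        printf '%s\n' "$header" | grep "^${tag}" || true
    done
}

# Keep every header line plus the first N alignment records.
# Header lines are the ones starting with "@"; they come before the
# records, so it is safe to stop reading once N records were printed.
subset_sam() {
    awk -v n="$1" '
        /^@/      { print; next }   # header line: always keep
        seen >= n { exit }          # enough records: stop reading
                  { print; seen++ } # alignment record: keep and count
    '
}
```

Something like "samtools-1.1 view -h try5 | subset_sam 100000 > subset.sam"
should then give a much smaller file with the same header. Whether it still
triggers the error has to be checked by running the sort on it. Since the
error showed up while merging the intermediate files, a subset small enough
to be sorted in one pass might not reproduce it, but that is a guess on my
part.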

Best,
Devon


--
Devon Ryan, Ph.D.
Email: dpr...@dpryan.com
Laboratory for Molecular and Cellular Cognition
German Centre for Neurodegenerative Diseases (DZNE)
Ludwig-Erhard-Allee 2
53175 Bonn
Germany
<devon.r...@dzne.de>

On Thu, Jan 22, 2015 at 12:59 PM, Axel Rasche <ras...@molgen.mpg.de> wrote:

> Dear List
>
> Using samtools V1.1 to sort a SAM file into BAM leads to a "pretty_header"
> error, about which information on Google is scarce. So I have no clue what
> this error is about. The same step does not show problems using samtools
> V0.1.19. The command lines describing input, error and possible output
> are shown below. The error shows up in a later step of samtools sort,
> leaving behind a list of intermediate BAM files.
>
> Thanks and best regards, Axel
>
>
> Appendix:
>
> #ll try5
> -rw-rw---- 1 user group 13G Jan 21 16:15 try5
>
> #samtools-1.1 view -H try5
> @PG ID:bwa PN:bwa VN:0.7.10-r789 CL:bwa samse -n 2 input.fa - input.fastq
> @HD VN:1.3 SO:coordinate
> @SQ SN:chr22 LN:51304566
>
> #samtools-1.1 view try5 | head -n2
> HWI-BRUNOP16X_0001:2:2:16141:26794#0 4 * 0 0 * * 0 0
> GTTAAATACAAACTTTCATTTGGTGATGCACGGCACCAATGCTTTGCATATACCTTGCTGCAAAGAACAGGTTAA
> TTTTTSSKJFSSPTT\GUHUSSTTSSTPTTMSTTTSSSTTTTTTSb]bbRSSKTTTTKTT``b[bBBBBBBBBBB
> HWI-BRUNOP16X_0001:2:2:16241:26798#0 4 * 0 0 * * 0 0
> GTACACTTCATTCTCTAGGGCTGCAGGGTACAAAAAGCCAATTTCAGAAGTAAGTGGACAAGGCAGAGAAGAAAA
> gfggggggggggggggggggggggggggfggggggffgggggggggggggfgggfgggggggggggfghggggg]
>
> #cat try5 | samtools-1.1 sort - result_1.1
> [bam_sort_core] merging from 30 files...
> [pretty_header] invalid header
>
> #ll result_1.1.*
> -rw-rw---- 1 user group 143M Jan 21 16:44 result_1.1.0000.bam
>    ...
> -rw-rw---- 1 user group  15M Jan 21 16:52 result_1.1.0029.bam
>
> #cat try5 | samtools-0.1.19 sort - result_0.1.19
> [bam_header_read] EOF marker is absent. The input is probably truncated.
> [bam_sort_core] merging from 30 files...
>
> #ll result_0.1.19.bam
> -rw-rw---- 1 user group 3.6G Jan 21 17:14 result_0.1.19.bam
>
> #samtools-0.1.19 view -H result_0.1.19.bam
> @HD VN:1.3 SO:coordinate
> @PG ID:bwa PN:bwa VN:0.7.10-r789 CL:bwa samse -n 2
>
> /project/altsplice/projekte/rna-seq-analysis/dev/bwa/jctnReference_fastRun/synthJctnReference_ens72synthJctnSize90bp.fa
> - /scratch/local2/user/rna-seq-analysis_bwa/FCA_s_2_hg19_un.fastq
> @HD VN:1.3 SO:coordinate
> @SQ SN:chr22 LN:51304566
>
> #samtools-0.1.19 view result_0.1.19.bam | head -n2
> HWI-BRUNOP16X_0001:2:4:2964:191394#0 0 chr22 16100613 25 36M627N39M * 0
> 0
> CCTATGAATTGAATGTGTTACTATCGCTTTCACATCCTGAAGCATTAGAGCATGTGGGGAATGCACAAAAATTGA
> BBBBBBB_Y`bdgggggggggggggggggggggggggggggggggggggggggggggggggggffgggggggggg
> XT:A:U NM:i:4 X0:i:1 X1:i:0 XM:i:4 XO:i:0 XG:i:0 MD:Z:0A24C11G17T19
> HWI-BRUNOP16X_0001:2:23:17206:48524#0 0 chr22 16100613 37 36M627N39M * 0
> 0
> ACTATGAATTGAATGTGTTACTATCGCTTTCACATCCTGAAGCATTAGAGCATGTGGGGAATGCACAAAAATTGA
> U]b]OX]\MXTTSORP`^XU`RY^VbbWWb^\^^QR`b^b\\\YY\b^[bTSSSS^bW^\TTSPNSSSSTTTTPT
> XT:A:U NM:i:3 X0:i:1 X1:i:0 XM:i:3 XO:i:0 XG:i:0 MD:Z:25C11G17T19
>
> #
>
>
> ------------------------------------------------------------------------------
> New Year. New Location. New Benefits. New Data Center in Ashburn, VA.
> GigeNET is offering a free month of service with a new server in Ashburn.
> Choose from 2 high performing configs, both with 100TB of bandwidth.
> Higher redundancy.Lower latency.Increased capacity.Completely compliant.
> http://p.sf.net/sfu/gigenet
> _______________________________________________
> Samtools-help mailing list
> Samtools-help@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/samtools-help
>