Some historical background: stopping at the EOF block was intended from the start. The idea was to allow a BAM footer carrying extra information, such as the index. However, we never took this idea much further.
Heng

On May 14, 2015, at 10:43 AM, Kanterakis, Efstathios <ekantera...@illumina.com> wrote:

> Thank you all for your help. I'll keep an eye on
> https://github.com/samtools/htslib/issues/45
> Stathis
>
> -----Original Message-----
> From: John Marshall [mailto:j...@sanger.ac.uk]
> Sent: 14 May 2015 11:46
> To: Kanterakis, Efstathios
> Cc: samtools-help@lists.sourceforge.net
> Subject: Re: [Samtools-help] tabix bug on cat'ed vcf.gz
>
> On 13 May 2015, at 12:17, Kanterakis, Efstathios <ekantera...@illumina.com>
> wrote:
>> bgzip chr1_h.vcf
>> bgzip chr2.vcf
>> cat chr1_h.vcf.gz chr2.vcf.gz > test.vcf.gz
>
> ...i.e., constructs test.vcf.gz with many BGZF blocks, including an EOF
> trailer block from each of chr1_h.vcf.gz (in the middle of test.vcf.gz) and
> chr2.vcf.gz (at the end of test.vcf.gz).
>
>> tabix test.vcf.gz # <--
>> tabix test.vcf.gz chr2 # blank
>> tabix test.vcf.gz chr1 # works
>> [...]
>> I was under the impression that bgzipped files are directly cat'able. Is
>> this a bug?
>
> As Len suspected, the tabix index command (marked <--) is stopping at the EOF
> trailer block at the end of chr1_h.vcf.gz. This is an htslib bug:
> https://github.com/samtools/htslib/issues/45 .
>
> See http://sourceforge.net/p/samtools/mailman/message/33493929/ for further
> background. Nobody considered these EOF blocks and concatenation of bgzipped
> files until rather late in the piece, and both htslib/samtools and
> htsjdk/Picard still have bugs that mean they stop reading at these EOF blocks
> in various circumstances. The fact that this doesn't cause chaos shows how
> rare this is in practice, and is a large part of the reason why these bugs
> have not been fixed.
>
> Thanks for the IMHO rather plausible use case! I mostly fixed this in htslib
> a while back, but stopped as the expected utility did not seem to outweigh
> the risk of screwing up error handling in the code in question. Plausible
> use cases change that calculus.
>
> On 13 May 2015, at 13:52, Peter Cock <p.j.a.c...@googlemail.com> wrote:
>> Second, some tools fail
>> to cope with concatenated gzip block (e.g. some Java libraries break).
>
> This is a separate concern and is not in play here. Any sizeable bgzipped
> file is already itself a bunch of concatenated gzip/BGZF blocks, so catting
> two of them together makes no difference to the Java library problem.
>
> John
>
> --
> The Wellcome Trust Sanger Institute is operated by Genome Research Limited,
> a charity registered in England with number 1021457 and a company registered
> in England with number 2742969, whose registered office is 215 Euston Road,
> London, NW1 2BE.
>
> _______________________________________________
> Samtools-help mailing list
> Samtools-help@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/samtools-help
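To make the EOF-block situation in this thread concrete, here is a minimal Python sketch. It is not from the thread and is not htslib's code. It assumes well-formed BGZF input small enough to hold in memory. It walks a file block by block using each block's BSIZE field, reports where the 28-byte EOF marker blocks sit (anything other than the last one is a mid-file marker of the kind `cat` leaves behind), and concatenates BGZF files while keeping only one marker at the very end. The marker bytes are the ones given in the SAM/BAM specification. The helper names `make_bgzf_block`, `iter_bgzf_blocks`, `find_eof_markers` and `concat_bgzf` are hypothetical, and `make_bgzf_block` exists only to build tiny demo inputs.

```python
import gzip
import struct
import zlib

# The 28-byte empty BGZF block written as an end-of-file marker
# (byte values as given in the SAM/BAM format specification).
BGZF_EOF = bytes.fromhex(
    "1f 8b 08 04 00 00 00 00 00 ff 06 00 42 43 02 00 1b 00 03 00 00 00 00 00 00 00 00 00"
)


def make_bgzf_block(payload: bytes) -> bytes:
    """Build one BGZF block for a small payload (assumed well under 64 KiB)."""
    comp = zlib.compressobj(6, zlib.DEFLATED, -15)  # raw deflate stream, no zlib header
    cdata = comp.compress(payload) + comp.flush()
    total = 18 + len(cdata) + 8  # header incl. 'BC' subfield + data + CRC32/ISIZE
    header = struct.pack("<4BI2BH2BHH", 0x1F, 0x8B, 8, 4, 0, 0, 0xFF, 6, 66, 67, 2, total - 1)
    trailer = struct.pack("<II", zlib.crc32(payload), len(payload))
    return header + cdata + trailer


def iter_bgzf_blocks(data: bytes):
    """Yield (offset, block_bytes) for each BGZF block, using the BSIZE field."""
    pos = 0
    while pos < len(data):
        if data[pos:pos + 4] != b"\x1f\x8b\x08\x04":
            raise ValueError(f"no BGZF header at offset {pos}")
        (xlen,) = struct.unpack_from("<H", data, pos + 10)
        p, end_extra, bsize = pos + 12, pos + 12 + xlen, None
        while p + 4 <= end_extra:  # scan gzip extra subfields for SI1='B', SI2='C'
            (slen,) = struct.unpack_from("<H", data, p + 2)
            if data[p:p + 2] == b"BC" and slen == 2:
                (bsize,) = struct.unpack_from("<H", data, p + 4)
            p += 4 + slen
        if bsize is None:
            raise ValueError(f"gzip member at offset {pos} has no BC subfield")
        end = pos + bsize + 1  # BSIZE is the total block size minus 1
        if end > len(data):
            raise ValueError(f"truncated BGZF block at offset {pos}")
        yield pos, data[pos:end]
        pos = end


def find_eof_markers(data: bytes) -> list:
    """Offsets of every EOF marker block; any offset but the last is mid-file."""
    return [off for off, block in iter_bgzf_blocks(data) if block == BGZF_EOF]


def concat_bgzf(parts) -> bytes:
    """Concatenate BGZF files, keeping only a single EOF marker at the very end."""
    out = bytearray()
    for data in parts:
        for _, block in iter_bgzf_blocks(data):
            if block != BGZF_EOF:
                out += block
    out += BGZF_EOF
    return bytes(out)


if __name__ == "__main__":
    # Two tiny made-up stand-ins for chr1_h.vcf.gz and chr2.vcf.gz.
    chr1 = make_bgzf_block(b"#header\nchr1\t100\n") + BGZF_EOF
    chr2 = make_bgzf_block(b"chr2\t200\n") + BGZF_EOF
    naive = chr1 + chr2  # what `cat chr1_h.vcf.gz chr2.vcf.gz` produces
    print(find_eof_markers(naive))  # two markers: one mid-file, one at the end
    fixed = concat_bgzf([chr1, chr2])
    print(find_eof_markers(fixed))  # a single marker at the end
    print(gzip.decompress(fixed) == gzip.decompress(naive))  # True
```

A generic gzip reader such as Python's `gzip.decompress` returns the same text for both versions, which matches John's point that the BGZF stream is already a series of concatenated gzip members; the problem is only a reader that treats the first EOF marker as the end of the file. Dropping the empty mid-file blocks shifts BGZF virtual offsets, so the concatenated file still needs to be indexed afresh with tabix.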