Some historical background: terminating at the EOF block was the original 
intent. The idea was to allow a BAM footer after it for extra information, 
such as the index. However, we have not taken this idea much further.

Heng

On May 14, 2015, at 10:43 AM, Kanterakis, Efstathios <ekantera...@illumina.com> 
wrote:

> Thank you all for your help. I'll keep an eye on 
> https://github.com/samtools/htslib/issues/45
> Stathis
> 
> -----Original Message-----
> From: John Marshall [mailto:j...@sanger.ac.uk] 
> Sent: 14 May 2015 11:46
> To: Kanterakis, Efstathios
> Cc: samtools-help@lists.sourceforge.net
> Subject: Re: [Samtools-help] tabix bug on cat'ed vcf.gz
> 
> On 13 May 2015, at 12:17, Kanterakis, Efstathios <ekantera...@illumina.com> 
> wrote:
>> bgzip chr1_h.vcf
>> bgzip chr2.vcf
>> cat chr1_h.vcf.gz chr2.vcf.gz > test.vcf.gz
> 
> ...i.e., constructs test.vcf.gz with many BGZF blocks, including an EOF 
> trailer block from each of chr1_h.vcf.gz (in the middle of test.vcf.gz) and 
> chr2.vcf.gz (at the end of test.vcf.gz).
> 
>> tabix test.vcf.gz    # <--
>> tabix test.vcf.gz chr2 # blank
>> tabix test.vcf.gz chr1 # works
>> [...]
>> I was under the impression that bgzipped files are directly cat'able. Is 
>> this a bug?
> 
> As Len suspected, the tabix index command (marked <--) is stopping at the EOF 
> trailer block at the end of chr1_h.vcf.gz.  This is an htslib bug: 
> https://github.com/samtools/htslib/issues/45 .
> 
> See http://sourceforge.net/p/samtools/mailman/message/33493929/ for further 
> background.  Nobody considered these EOF blocks and concatenation of bgzipped 
> files until rather late in the piece, and both htslib/samtools and 
> htsjdk/Picard still have bugs that mean they stop reading at these EOF blocks 
> in various circumstances.  The fact that this doesn't cause chaos shows how 
> rare this is in practice, and is a large part of the reason why these bugs 
> have not been fixed.
> 
> Thanks for the IMHO rather plausible use case!  I mostly fixed this in htslib 
> a while back, but stopped as the expected utility did not seem to outweigh 
> the risk of screwing up error handling in the code in question.  Plausible 
> use cases change that calculus.
> 
> On 13 May 2015, at 13:52, Peter Cock <p.j.a.c...@googlemail.com> wrote:
>> Second, some tools fail
>> to cope with concatenated gzip blocks (e.g. some Java libraries break).
> 
> This is a separate concern and is not in play here.  Any sizeable bgzipped 
> file is already itself a bunch of concatenated gzip/BGZF blocks, so catting 
> two of them together makes no difference to the Java library problem.
> 
>    John
> 
> --
> The Wellcome Trust Sanger Institute is operated by Genome Research  Limited, 
> a charity registered in England with number 1021457 and a  company registered 
> in England with number 2742969, whose registered  office is 215 Euston Road, 
> London, NW1 2BE. 
> 
> _______________________________________________
> Samtools-help mailing list
> Samtools-help@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/samtools-help
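
For illustration, the behaviour John describes above (a reader stopping at the
EOF trailer block in the middle of a cat'ed file) can be reproduced in a few
lines of Python. This is only a sketch under stated assumptions: ordinary gzip
members from gzip.compress stand in for real BGZF data blocks, the 28-byte EOF
marker is the one given in the SAM/BAM specification, and the function names
(members, read_stopping_at_eof, read_all) are made up for this example and are
not htslib or tabix APIs.

```python
import gzip
import zlib

# 28-byte BGZF EOF marker (an empty gzip member), as listed in the SAM/BAM spec.
BGZF_EOF = bytes.fromhex(
    "1f8b0804" "00000000" "00ff" "0600"   # gzip header, FLG.FEXTRA set, XLEN=6
    "4243" "0200" "1b00"                  # 'BC' subfield, SLEN=2, BSIZE=27
    "0300"                                # empty deflate stream
    "00000000" "00000000"                 # CRC32=0, ISIZE=0
)


def members(data):
    """Yield the decompressed payload of each gzip member in data, in order."""
    while data:
        d = zlib.decompressobj(wbits=31)  # wbits=31: expect gzip header/trailer
        payload = d.decompress(data) + d.flush()
        yield payload
        data = d.unused_data              # bytes following this member


def read_stopping_at_eof(data):
    """Mimic the bug: treat the first empty member as end of file."""
    out = []
    for payload in members(data):
        if not payload:
            break
        out.append(payload)
    return b"".join(out)


def read_all(data):
    """Read every member, skipping empty ones wherever they occur."""
    return b"".join(members(data))


# Hypothetical stand-ins for chr1_h.vcf.gz and chr2.vcf.gz: one data block
# followed by the EOF trailer block, as bgzip would append.
chr1 = gzip.compress(b"chr1\t100\t.\tA\tG\n") + BGZF_EOF
chr2 = gzip.compress(b"chr2\t200\t.\tC\tT\n") + BGZF_EOF
catted = chr1 + chr2                      # equivalent of `cat a.gz b.gz`

print(read_stopping_at_eof(catted))
print(read_all(catted))
```

The first reader returns only the chr1 record, which matches the symptom in
the report (chr1 queries work, chr2 comes back blank). The second reader
returns both records, since an empty member in the middle of the stream is
still valid gzip.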


