On 13 May 2015, at 12:17, Kanterakis, Efstathios <ekantera...@illumina.com> wrote:
> bgzip chr1_h.vcf
> bgzip chr2.vcf
> cat chr1_h.vcf.gz chr2.vcf.gz > test.vcf.gz

...i.e., constructs test.vcf.gz with many BGZF blocks, including an EOF
trailer block from each of chr1_h.vcf.gz (in the middle of test.vcf.gz)
and chr2.vcf.gz (at the end of test.vcf.gz).

> tabix test.vcf.gz # <--
> tabix test.vcf.gz chr2 # blank
> tabix test.vcf.gz chr1 # works
> [...]
> I was under the impression that bgzipped files are directly cat'able. Is this
> a bug?

As Len suspected, the tabix index command (marked <--) is stopping at the
EOF trailer block at the end of chr1_h.vcf.gz. This is an htslib bug:

    https://github.com/samtools/htslib/issues/45

See http://sourceforge.net/p/samtools/mailman/message/33493929/ for
further background. Nobody considered these EOF blocks and concatenation
of bgzipped files until rather late in the piece, and both
htslib/samtools and htsjdk/Picard still have bugs that mean they stop
reading at these EOF blocks in various circumstances. The fact that this
doesn't cause chaos shows how rare this is in practice, and is a large
part of the reason why these bugs have not been fixed.

Thanks for the IMHO rather plausible use case! I mostly fixed this in
htslib a while back, but stopped as the expected utility did not seem to
outweigh the risk of screwing up error handling in the code in question.
Plausible use cases change that calculus.

On 13 May 2015, at 13:52, Peter Cock <p.j.a.c...@googlemail.com> wrote:
> Second, some tools fail
> to cope with concatenated gzip block (e.g. some Java
> libraries break).

This is a separate concern and is not in play here. Any sizeable bgzipped
file is already itself a bunch of concatenated gzip/BGZF blocks, so
catting two of them together makes no difference to the Java library
problem.

    John

-- 
The Wellcome Trust Sanger Institute is operated by Genome Research
Limited, a charity registered in England with number 1021457 and a
company registered in England with number 2742969, whose registered
office is 215 Euston Road, London, NW1 2BE.
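
P.S. For anyone who wants to see the effect without htslib at hand, here is a minimal Python sketch of the situation described above. It is an illustration only, not htslib's or tabix's code: the helper names (make_block, iter_blocks, read_stop_at_eof, read_all) are made up for this example, and the block parser assumes the gzip extra field holds only the BGZF "BC" subfield, which is what bgzip writes. The 28-byte EOF marker is the one given in the SAM/BAM specification.

```python
import struct
import zlib

# The fixed 28-byte BGZF EOF marker block (an empty BGZF block).
BGZF_EOF = bytes.fromhex(
    "1f 8b 08 04 00 00 00 00 00 ff 06 00 42 43 02 00"
    " 1b 00 03 00 00 00 00 00 00 00 00 00"
)


def make_block(data: bytes) -> bytes:
    """Hypothetical helper: wrap data in a single BGZF block."""
    comp = zlib.compressobj(6, zlib.DEFLATED, -15)  # raw deflate stream
    cdata = comp.compress(data) + comp.flush()
    bsize = 18 + len(cdata) + 8 - 1  # total block size minus one
    header = struct.pack(
        "<4BI2BH2BHH",
        31, 139, 8, 4,   # gzip magic, CM=deflate, FLG=FEXTRA
        0,               # MTIME
        0, 255,          # XFL, OS=unknown
        6,               # XLEN
        66, 67, 2,       # 'B', 'C', SLEN=2
        bsize,
    )
    trailer = struct.pack("<II", zlib.crc32(data) & 0xFFFFFFFF, len(data))
    return header + cdata + trailer


def iter_blocks(buf: bytes):
    """Yield the decompressed payload of each BGZF block in buf.

    Assumes an 18-byte header (XLEN == 6), as in make_block above.
    """
    pos = 0
    while pos < len(buf):
        bsize = struct.unpack_from("<H", buf, pos + 16)[0] + 1
        block = buf[pos:pos + bsize]
        yield zlib.decompress(block[18:-8], -15)
        pos += bsize


def read_stop_at_eof(buf: bytes) -> bytes:
    """Mimic the problematic behaviour: treat the first empty block as end of file."""
    out = []
    for payload in iter_blocks(buf):
        if not payload:
            break
        out.append(payload)
    return b"".join(out)


def read_all(buf: bytes) -> bytes:
    """Read every block; empty (EOF marker) blocks contribute nothing."""
    return b"".join(iter_blocks(buf))


# Two tiny "bgzipped files", each ending in its own EOF block, then cat'ed.
chr1 = make_block(b"chr1\t100\n") + BGZF_EOF
chr2 = make_block(b"chr2\t200\n") + BGZF_EOF
combined = chr1 + chr2

print(read_stop_at_eof(combined))
print(read_all(combined))
```

The first reader returns only the chr1 record because it gives up at the EOF block sitting in the middle of the concatenated data, which is the symptom reported with "tabix test.vcf.gz chr2". The second reader skips over empty blocks and sees both records.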
_______________________________________________
Samtools-help mailing list
Samtools-help@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/samtools-help