Samtools (and HTSlib and BCFtools) version 1.18 is now available from
GitHub and SourceForge.
https://github.com/samtools/htslib/releases/tag/1.18
https://github.com/samtools/samtools/releases/tag/1.18
https://github.com/samtools/bcftools/releases/tag/1.18
https://sourceforge.net/projects/samtools/
The main changes are listed below:
------------------------------------------------------------------------------
htslib - changes v1.18
------------------------------------------------------------------------------
Updates
-------
* Using CRAM 3.1 no longer gives a warning about the specification being
draft. Note CRAM 3.0 is still the default output format. (PR#1583)
* Replaced use of sprintf with snprintf, to silence potential warnings from
Apple's compilers and those who implement similar checks. (PR#1594,
fixes #1586. Reported by Oleksii Nikolaienko)
* Fastq output will now generate empty records for reads with no sequence
data (i.e. sequence is "*" in SAM format). (PR#1576, fixes
samtools/samtools#1576. Reported by Nils Homer)
* CRAM decoding speed-ups. (PR#1580)
* A new MN aux tag can now be used to verify that MM/ML base modification
data has not been broken by hard clipping. (PR#1590, PR#1612. See also PR
samtools/hts-specs#714 and issue samtools/hts-specs#646. Reported by
Jared Simpson)
* The base modification API has been improved to make it easier for
callers to tell unchecked bases from unmodified ones. (PR#1636, fixes
#1550. Requested by Chris Wright)
* A new bam_mods_queryi() API has been added to return additional data about
the i-th base modification returned by bam_mods_recorded(). (PR#1636, fixes
#1550 and #1635. Requested by Jared Simpson)
* Speed up index look-ups for whole-chromosome queries. (PR#1596)
* Mpileup now merges adjacent (mis)match CIGAR operations, so CIGARs using
the X/= operators give the same results as if the M operator was used.
(PR#1607, fixes #1597. Reported by Marcel Martin)
* It's now possible to call bcf_sr_set_regions() after adding readers using
bcf_sr_add_reader() (previously this returned an error). Doing so will
discard any unread data, and reset the readers so they iterate over the
new regions. (PR#1624, fixes samtools/bcftools#1918. Reported by
Gregg Thomas)
* The synced BCF reader can now accept regions with reference names
including colons and hyphens, by enclosing them in curly braces. For
example, {chr_part:1-1001}:10-20 will return bases 10 to 20 from
reference "chr_part:1-1001". (PR#1630, fixes #1620. Reported by Bren)
* Add a "samples" directory with code demonstrating usage of HTSlib plus a
tutorial document. (PR#1589)
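
To illustrate the curly-brace region syntax described above, here is a minimal
Python sketch that splits a region string into a reference name and
coordinates. It is not HTSlib's parser: the function name parse_region() is
hypothetical, and it handles only the "name:start-end" and "{name}:start-end"
forms, not the other region spellings HTSlib accepts.

```python
import re


def parse_region(region):
    """Split a region string into (reference name, start, end).

    Accepts plain "name:start-end" and the braced form "{name}:start-end",
    where the braces protect colons and hyphens inside the name.
    start and end are None when no coordinate range is given.
    """
    if region.startswith("{"):
        close = region.find("}")
        if close == -1:
            raise ValueError("unterminated brace in region: " + repr(region))
        # Everything between the braces is the name, taken verbatim.
        name, rest = region[1:close], region[close + 1:]
        if rest and not rest.startswith(":"):
            raise ValueError("unexpected text after brace: " + repr(region))
        rest = rest[1:]  # drop the ':' separating name and range
    else:
        # Without braces, the first colon ends the name.
        name, _, rest = region.partition(":")
    if not rest:
        return name, None, None
    match = re.fullmatch(r"(\d+)-(\d+)", rest)
    if match is None:
        raise ValueError("bad coordinate range in region: " + repr(region))
    return name, int(match.group(1)), int(match.group(2))


print(parse_region("{chr_part:1-1001}:10-20"))
# prints ('chr_part:1-1001', 10, 20)
```

Without the braces, the same name would be cut at its first colon, which is
why the quoting is needed for such references.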
Build changes
-------------
* Htscodecs has been updated to 1.5.1 (PR#1654)
* Htscodecs SIMD code now works with Apple multiarch binaries. (PR#1587,
HTSlib fix for samtools/htscodecs#76. Reported by John Marshall)
* Improve portability of "expr" usage in version.sh. (PR#1593, fixes #1592.
Reported by John Marshall)
* Improve portability to *BSD targets by ensuring _XOPEN_SOURCE is
defined correctly and that source files properly include "config.h".
Perl scripts also now all use "#!/usr/bin/env perl" instead of assuming
that perl is at /usr/bin/perl. (PR#1628, fixes #1606. Reported by
Robert Clausecker)
* Fixed NAME entry in htslib-s3-plugin man page so the whatis and apropos
commands find it. (PR#1634, thanks to Étienne Mollier)
* Assorted dependency tracking fixes. (PR#1653, thanks to John Marshall)
Documentation updates
---------------------
* Changed Alpine build instructions as they've switched back to using
openssl. (PR#1609)
* Recommend using -rdynamic when statically linking a libhts.a with plugins
enabled. (PR#1611, thanks to John Marshall. Fixes #1600, reported by
Jack Wimberley)
* Fixed example in docs for sam_hdr_add_line(). (PR#1618, thanks to kojix2)
* Improved test harness for base modifications API. (PR#1648)
Bug fixes
---------
* Fix a major bug when searching against a CRAM index where one container
has start and end coordinates entirely contained within the previous
container. This would occasionally miss data, and sometimes return much
more than required. The bug affected versions 1.11 to 1.17, although the
change in 1.11 was bug-fixing multi-threaded index queries. This bug did
not affect index building. There is no need to reindex your CRAM files.
(PR#1574, PR#1640. Fixes #1569, #1639, samtools/samtools#1808,
samtools/samtools#1819. Reported by xuxif, Jens Reeder and Jared Simpson)
* Prevent CRAM blocks from becoming too big in files with short sequences but
very long aux tags. (PR#1613)
* Fix bug where the CRAM decoder for CONST_INT and CONST_BYTE codecs may
incorrectly look for extra data in the CORE block. Note that this bug only
affected the experimental CRAM v4.0 decoder. (PR#1614)
* Fix crypt4gh redirection so it works in conjunction with non-file IO, such
as using htsget. (PR#1577)
* Improve error checking for the VCF POS column, when facing invalid data.
(PR#1575, replaces #1570 originally reported and fixed by Colin Nolan.)
* Improved error checking on VCF indexing to validate the data is BGZF
compressed. (PR#1581)
* Fix bug where bin number calculation could overflow when making iterators
over regions that go to the end of a chromosome. (PR#1595)
* Backport attractivechaos/klib#78 (by Pall Melsted) to HTSlib. Prevents
infinite loops in kseq_read() when reading broken gzip files. (PR#1582,
fixes #1579. Reported by Goran Vinterhalter)
* Backport attractivechaos/klib@384277a (by innoink) to HTSlib. Fixes the
kh_int_hash_func2() macro definition. (PR#1599, fixes #1598. Reported by
fanxinping)
* Remove a compilation warning on systems with newer libcurl releases.
(PR#1572)
* Windows: Fixed BGZF EOF check for recent MinGW releases. (PR#1601, fixes
samtools/bcftools#1901)
* Fixed bug where tabix would not return the correct regions for files
where the column ordering is end, ..., begin instead of begin, ...,
end. (PR#1626, fixes #1622. Reported by Hiruna Samarakoon)
* sam_format_aux1() now always NUL-terminates Z/H tags. (PR#1631)
* Ensure base modification iterator is reset when no MM tag is present.
(PR#1631, PR#1647)
* Fix segfault when attempting to write an uncompressed BAM file opened
using hts_open(name, "wbu"). This was attempting to write BAM data
without wrapping it in BGZF blocks, which is invalid according to the BAM
specification. "wbu" is now internally converted to "wb0" to output
uncompressed data wrapped in BGZF blocks. (PR#1632, fixes #1617. Reported
by Joyjit Daw)
* Fixed over-strict bounds check in probaln_glocal() which caused it to make
sub-optimal alignments when the requested band width was greater than the
query length. (PR#1616, fixes #1605. Reported by Jared Simpson)
* Fixed possible double frees when handling errors in bcf_hdr_add_hrec(), if
particular memory allocations fail. (PR#1637)
* Ensure that bcf_hdr_remove() clears up all pointers to the items removed
from dictionaries. Failing to do this could have resulted in a call
requesting a deleted item via bcf_hdr_get_hrec() returning a stale
pointer. (PR#1637)
* Stop the gzip decompresser from finishing prematurely when an empty gzip
block is followed by more data. (PR#1643, PR#1646)
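
The last fix above concerns gzip streams made of several members, one of which
is empty. The Python sketch below shows the general pattern of decoding such a
stream to the end; it uses Python's zlib module and is not HTSlib's C
implementation. The function name decompress_all_members() is hypothetical.

```python
import gzip
import zlib


def decompress_all_members(data):
    """Decompress every gzip member in data, including empty ones."""
    out = []
    while data:
        # wbits=31 tells zlib to expect a gzip header and trailer.
        member = zlib.decompressobj(31)
        out.append(member.decompress(data))
        if not member.eof:
            raise ValueError("truncated gzip member")
        # An empty member yields no output. That must not be treated as
        # end of input: continue for as long as unread bytes remain.
        data = member.unused_data
    return b"".join(out)


stream = gzip.compress(b"first\n") + gzip.compress(b"") + gzip.compress(b"second\n")
print(decompress_all_members(stream))
# prints b'first\nsecond\n'
```

The loop ends only when the input is exhausted, not when a member decodes to
zero bytes, so data following an empty member is still returned.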
------------------------------------------------------------------------------
samtools - changes v1.18
------------------------------------------------------------------------------
New work and changes:
* Add minimiser sort option to collate by an indexed fasta. Expand the
minimiser sort to arrange the minimiser values in the same order as
they occur in the reference genome. This acts as an extremely crude
and simplistic read aligner that can be used to boost read compression.
(PR#1818)
* Add a --duplicate-count option to markdup. Adds the number of duplicates
(including itself) to the original read in a 'dc' tag. (PR#1816. Thanks to
wulj2)
* Make calmd handle unaligned data or empty files without throwing an error.
This is to make pipelines work more smoothly. A warning will still be
issued. (PR#1841, fixes #1839. Reported by Filipe G. Vieira)
* Consistent, more comprehensive flag filtering for fasta/fastq. Added
--rf/--incl[ude]-flags and long options for -F (--excl[ude]-flags) and -f
(--require-flags). (PR#1842. Thanks to Devang Thakkar)
* Apply fastq --input-fmt-option settings. Previously any options
specified were not being applied to the input file. (PR#1855. Thanks
to John Marshall)
* Add fastq -d TAG[:VAL] check. This mirrors view -d and will only output
alignments that match TAG (and VAL if specified). (PR#1863, fixes #1854.
Requested by Rasmus Kirkegaard)
* Extend import --order TAG to --order TAG:length. If length is specified,
the tag format goes from integer to a 0-padded string format. This is a
workaround for BAM and CRAM that cannot encode an order tag of over 4
billion records. (PR#1850, fixes #1847. Reported by Feng Tian)
* New -aa mode for consensus. This works like the -aa option in depth
and mpileup. The single 'a' reports all bases in contigs covered by
alignments. Double 'aa' (or '-a -a') reports Ns even for the references
with no alignments against them. (PR#1851, fixes #1849. Requested by
Tim Fennell)
* Add long option support to samtools index. (PR#1872, fixes #1869. Reported
by Jason Bacon)
* Be consistent with rounding of "average length" in samtools stats.
(PR#1876, fixes #1867. Reported by Jelinek-J)
* Add option to ampliconclip that marks reads as unmapped when they do not
have enough aligned bases left after clipping. Default is to unmap reads
with zero aligned bases. (PR#1865, fixes #1856. Requested by ces)
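
The import --order TAG:length change above swaps an integer tag for a 0-padded
string. The Python sketch below illustrates why that helps. It is not samtools
code: order_tag_value() is a hypothetical helper, and it assumes the integer
limit is the unsigned 32-bit maximum, matching the "4 billion" figure above.

```python
UINT32_MAX = 2**32 - 1  # assumed upper limit for an integer tag value


def order_tag_value(record_number, length=None):
    """Return the order tag value as an int, or a 0-padded string if
    a length is given (mirroring --order TAG versus --order TAG:length)."""
    if length is None:
        if record_number > UINT32_MAX:
            raise OverflowError("record number does not fit in an integer tag")
        return record_number
    text = str(record_number).zfill(length)
    if len(text) > length:
        raise ValueError("record number needs more than %d digits" % length)
    return text


print(order_tag_value(42))             # prints 42
print(order_tag_value(42, length=12))  # prints 000000000042
print(order_tag_value(5_000_000_000, length=12))  # prints 005000000000
```

Because every padded value has the same width, comparing the strings gives the
same order as comparing the numbers, so the original read order can still be
recovered by sorting on the tag.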
Bug Fixes:
* [From HTSLib] Fix a major bug when searching against a CRAM index where
one container has start and end coordinates entirely contained within the
previous container. This would occasionally miss data, and sometimes
return much more than required. The bug affected versions 1.11 to 1.17,
although the change in 1.11 was bug-fixing multi-threaded index queries.
This bug did not affect index building. There is no need to reindex your
CRAM files. (PR#samtools/htslib#1574, PR#samtools/htslib#1640. Fixes
#samtools/htslib#1569, #samtools/htslib#1639, #1808, #1819. Reported
by xuxif, Jens Reeder and Jared Simpson)
* Fix a sort -M bug (regression) when merging sub-blocks. Data was valid but
in a poor order for compression. (PR#1812)
* Fix bug in split output format. Now SAM and CRAM format can be chosen as well
as BAM. Also a documentation change, see below. (PR#1821)
* Add error checking to view -e filter expression code. Invalid expressions
were not returning an error code. (PR#1833, fixes #1829. Reported by
Steve Huang)
* Fix reheader CRAM output version. Sets the correct CRAM output version for
non-3.0 CRAMs. (PR#1868, fixes #1866. Reported by John Marshall)
Documentation:
* Expand the default filtering information on the mpileup man page. (PR#1802,
fixes #1801. Reported by gevro)
* Add an explanation of the default behaviour of split files on generating a
file for reads with missing or unrecognised RG tags. Also a small bug fix,
see above. (PR#1821, fixes #1817. Reported by Steve Huang)
* In the INSTALL instructions, switched back to openssl for Alpine. This
matches the current Alpine Linux practice. (PR#1837, see htslib#1591.
Reported by John Marshall)
* Fix various typos caught by lintian parsers. (PR#1877. Thanks to
Étienne Mollier)
* Document consensus --qual-calibration option. (PR#1880, fixes #1879.
Reported by John Marshall)
* Updated the page about samtools duplicate marking with more detail at
www.htslib.org/algorithms/duplicate.html
Non user-visible changes and build improvements:
* Removed a redundant line that caused a warning in gcc-13. (PR#1838)
------------------------------------------------------------------------------
bcftools - changes v1.18
------------------------------------------------------------------------------
Changes affecting the whole of bcftools, or multiple commands:
* Support auto indexing during writing BCF and VCF.gz via new `--write-index`
option
Changes affecting specific commands:
* bcftools annotate
- The `-m, --mark-sites` option can be now used to mark all sites without
the need to provide the `-a` file (#1861)
- Fix a bug where the `-m` function did not respect the `--min-overlap`
option (#1869)
- Fix a bug where updating INFO/END resulted in an assertion error (#1957)
* bcftools concat
- New option `--drop-genotypes`
* bcftools consensus
- Support higher-ploidy genotypes with `-H, --haplotype` (#1892)
- Allow `--mark-ins` and `--mark-snv` with a character, similarly to
`--mark-del`
* bcftools convert
- Support for conversion from tab-delimited files (CHROM,POS,REF,ALT) to
sites-only VCFs
* bcftools csq
- New `--unify-chr-names` option to automatically unify different
chromosome naming conventions in the input GFF, fasta and VCF files
(e.g. "chrX" vs "X")
- More versatility in parsing various flavors of GFF
- A new `--dump-gff` option to help with debugging and investigating the
internals of GFF parsing
- When printing consequences in nonsense mediated decay transcripts,
include 'NMD_transcript' in the consequence part of the annotation.
This is to make filtering easier and analogous to VEP annotations.
For example the consequence annotation
3_prime_utr|PCGF3|ENST00000430644|NMD is newly printed as
3_prime_utr&NMD_transcript|PCGF3|ENST00000430644|NMD
* bcftools gtcheck
- Add stats for the number of sites matched in the GT-vs-GT, GT-vs-PL,
etc modes. This information is important for interpretation of the
discordance score, as only the GT-vs-GT matching can be interpreted as
the number of mismatching genotypes.
* bcftools +mendelian2
- Fix in command line argument parsing, the `-p` and `-P` options were
not functioning (#1906)
* bcftools merge
- New `-M, --missing-rules` option to control the behavior of merging of
vector tags to prevent mixtures of known and missing values in tags
when desired
- Use values pertaining to the unknown allele (<*> or <NON_REF>) when
available to prevent mixtures of known and missing values (#1888)
- Revamped line matching code to fix problems in gVCF merging where split
gVCF blocks would not update genotypes (#1891, #1164).
* bcftools mpileup
- Fix a bug in --indels-v2.0 which caused an endless loop when CIGAR
operator 'H' or 'P' was encountered
* bcftools norm
- The `-m, --multiallelics +` mode now preserves phasing (#1893)
- Symbolic <DEL.*> alleles are now normalized too (#1919)
- New `-g, --gff-annot` option to right-align indels in forward
transcripts to follow HGVS 3' rule (#1929)
* bcftools query
- Force newline character in formatting expression when not given
explicitly
- Fix `-H` header output in formatting expressions containing newlines
* bcftools reheader
- Make `-f, --fai` aware of long contigs not representable by 32-bit
integer (#1959)
* bcftools +split-vep
- Prevent a segfault when `-i/-e` use a VEP subfield not included in `-f`
or `-c` (#1877)
- New `-X, --keep-sites` option complementing the existing `-x,
--drop-sites` options
- Force newline character in formatting expression when not given
explicitly
- Fix a subtle ambiguity: identical rows must be returned when `-s` is
applied regardless of `-f` containing the `-a` VEP tag itself or not.
* bcftools stats
- Collect new VAF (variant allele frequency) statistics from FORMAT/AD
field
- When counting transitions/transversions, consider also alternate het
genotypes
* plot-vcfstats
- Add three new VAF plots
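
For readers unfamiliar with VAF, the Python sketch below shows one common way
to derive it from a FORMAT/AD value (per-allele read depths, reference first).
This is an assumption for illustration only; the exact calculation used by
bcftools stats is not described here, and alt_allele_fractions() is a
hypothetical name.

```python
def alt_allele_fractions(ad):
    """Return the fraction of reads supporting each alternate allele.

    ad is a list of read depths, reference allele first, then one entry
    per alternate allele. Returns None when there are no reads at all.
    """
    total = sum(ad)
    if total == 0:
        return None  # no coverage, so the fraction is undefined
    return [alt_depth / total for alt_depth in ad[1:]]


print(alt_allele_fractions([15, 5]))      # prints [0.25]
print(alt_allele_fractions([10, 5, 5]))   # prints [0.25, 0.25]
```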
--
The Wellcome Sanger Institute is operated by Genome Research
Limited, a charity registered in England with number 1021457 and a
company registered in England with number 2742969, whose registered
office is Wellcome Sanger Institute, Wellcome Genome Campus,
Hinxton, CB10 1SA.