Samtools (and HTSlib and BCFtools) version 1.18 is now available from
GitHub and SourceForge.

https://github.com/samtools/htslib/releases/tag/1.18
https://github.com/samtools/samtools/releases/tag/1.18
https://github.com/samtools/bcftools/releases/tag/1.18
https://sourceforge.net/projects/samtools/

The main changes are listed below:

------------------------------------------------------------------------------
htslib - changes v1.18
------------------------------------------------------------------------------

Updates
-------

* Using CRAM 3.1 no longer gives a warning about the specification being
  draft.  Note CRAM 3.0 is still the default output format. (PR#1583)

* Replaced use of sprintf with snprintf, to silence potential warnings from
  Apple's compilers and others that implement similar checks. (PR#1594,
  fixes #1586. Reported by Oleksii Nikolaienko)

* Fastq output will now generate empty records for reads with no sequence
  data (i.e. sequence is "*" in SAM format). (PR#1576, fixes
  samtools/samtools#1576.  Reported by Nils Homer)

* CRAM decoding speed-ups. (PR#1580)

* A new MN aux tag can now be used to verify that MM/ML base modification
  data has not been broken by hard clipping. (PR#1590, PR#1612. See also PR
  samtools/hts-specs#714 and issue samtools/hts-specs#646. Reported by
  Jared Simpson)

* The base modification API has been improved to make it easier for
  callers to tell unchecked bases from unmodified ones. (PR#1636, fixes
  #1550. Requested by Chris Wright)

* A new bam_mods_queryi() API has been added to return additional data about
  the i-th base modification returned by bam_mods_recorded(). (PR#1636, fixes
  #1550 and #1635.  Requested by Jared Simpson)

* Speed up index look-ups for whole-chromosome queries. (PR#1596)

* Mpileup now merges adjacent (mis)match CIGAR operations, so CIGARs using
  the X/= operators give the same results as if the M operator was used.
  (PR#1607, fixes #1597.  Reported by Marcel Martin)

* It's now possible to call bcf_sr_set_regions() after adding readers using
  bcf_sr_add_reader() (previously this returned an error).  Doing so will
  discard any unread data, and reset the readers so they iterate over the
  new regions.  (PR#1624, fixes samtools/bcftools#1918.  Reported by
  Gregg Thomas)

* The synced BCF reader can now accept regions with reference names
  including colons and hyphens, by enclosing them in curly braces.  For
  example, {chr_part:1-1001}:10-20 will return bases 10 to 20 from
  reference "chr_part:1-1001".  (PR#1630, fixes #1620.  Reported by Bren)

* Add a "samples" directory with code demonstrating usage of HTSlib plus a
  tutorial document. (PR#1589)
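
As an illustration of the curly-brace region syntax described above, here is
a minimal Python sketch of how such a string could be split into a reference
name and coordinates.  It is not HTSlib's parser: parse_region() is a
hypothetical helper written for this note, and it makes no attempt to check
names against a file header.

```python
def parse_region(region):
    """Split a region string into (name, start, end).

    start and end are None when they are not given.  Hypothetical helper
    for illustration only; not part of HTSlib.
    """
    if region.startswith("{"):
        # Quoted name: everything up to the closing brace is the name,
        # so colons and hyphens inside it are not treated as separators.
        close = region.find("}")
        if close < 0:
            raise ValueError("unbalanced '{' in region: " + region)
        name = region[1:close]
        rest = region[close + 1:]
        if rest and not rest.startswith(":"):
            raise ValueError("expected ':' after '}' in region: " + region)
        rest = rest[1:]
    else:
        # Unquoted name: split at the first colon (a simplification).
        name, _, rest = region.partition(":")

    if not rest:
        return name, None, None

    start_text, _, end_text = rest.partition("-")
    start = int(start_text.replace(",", ""))
    end = int(end_text.replace(",", "")) if end_text else None
    return name, start, end


quoted = parse_region("{chr_part:1-1001}:10-20")
plain = parse_region("chr1:10-20")
```

With the example from the text, the quoted form yields the reference name
"chr_part:1-1001" with the range 10 to 20, whereas without the braces the
first colon would be taken as the separator.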

Build changes
-------------

* Htscodecs has been updated to 1.5.1 (PR#1654)

* Htscodecs SIMD code now works with Apple multiarch binaries. (PR#1587,
  HTSlib fix for samtools/htscodecs#76. Reported by John Marshall)

* Improve portability of "expr" usage in version.sh. (PR#1593, fixes #1592.
  Reported by John Marshall)

* Improve portability to *BSD targets by ensuring _XOPEN_SOURCE is
  defined correctly and that source files properly include "config.h".
  Perl scripts now all use "#!/usr/bin/env perl" instead of assuming that
  perl is installed as /usr/bin/perl. (PR#1628, fixes #1606. Reported by
  Robert Clausecker)

* Fixed NAME entry in htslib-s3-plugin man page so the whatis and apropos
  commands find it.  (PR#1634, thanks to Étienne Mollier)

* Assorted dependency tracking fixes.  (PR#1653, thanks to John Marshall)

Documentation updates
---------------------

* Changed Alpine build instructions as they've switched back to using
  openssl. (PR#1609)

* Recommend using -rdynamic when statically linking a libhts.a with plugins
  enabled.  (PR#1611, thanks to John Marshall.  Fixes #1600, reported by
  Jack Wimberley)

* Fixed example in docs for sam_hdr_add_line().  (PR#1618, thanks to kojix2)

* Improved test harness for base modifications API. (PR#1648)

Bug fixes
---------

* Fix a major bug when searching against a CRAM index where one container
  has start and end coordinates entirely contained within the previous
  container. This would occasionally miss data, and sometimes return much
  more than required.  The bug affected versions 1.11 to 1.17, although the
  change in 1.11 that introduced it was itself a fix for multi-threaded
  index queries.  This bug did
  not affect index building.  There is no need to reindex your CRAM files.
  (PR#1574, PR#1640. Fixes #1569, #1639, samtools/samtools#1808,
  samtools/samtools#1819.  Reported by xuxif, Jens Reeder and Jared Simpson)

* Prevent CRAM blocks from becoming too big in files with short sequences but
  very long aux tags.  (PR#1613)

* Fix bug where the CRAM decoder for CONST_INT and CONST_BYTE codecs may
  incorrectly look for extra data in the CORE block. Note that this bug only
  affected the experimental CRAM v4.0 decoder. (PR#1614)

* Fix crypt4gh redirection so it works in conjunction with non-file IO, such
  as using htsget. (PR#1577)

* Improve error checking for the VCF POS column, when facing invalid data.
  (PR#1575, replaces #1570 originally reported and fixed by Colin Nolan.)

* Improved error checking on VCF indexing to validate the data is BGZF
  compressed. (PR#1581)

* Fix bug where bin number calculation could overflow when making iterators
  over regions that go to the end of a chromosome. (PR#1595)

* Backport attractivechaos/klib#78 (by Pall Melsted) to HTSlib. Prevents
  infinite loops in kseq_read() when reading broken gzip files. (PR#1582,
  fixes #1579.  Reported by Goran Vinterhalter)

* Backport attractivechaos/klib@384277a (by innoink) to HTSlib. Fixes the
  kh_int_hash_func2() macro definition. (PR#1599, fixes #1598.  Reported by
  fanxinping)

* Remove a compilation warning on systems with newer libcurl releases.
  (PR#1572)

* Windows: Fixed BGZF EOF check for recent MinGW releases. (PR#1601, fixes
  samtools/bcftools#1901)

* Fixed bug where tabix would not return the correct regions for files
  where the column ordering is end, ..., begin instead of begin, ...,
  end. (PR#1626, fixes #1622.  Reported by Hiruna Samarakoon)

* sam_format_aux1() now always NUL-terminates Z/H tags. (PR#1631)

* Ensure base modification iterator is reset when no MM tag is present.
  (PR#1631, PR#1647)

* Fix segfault when attempting to write an uncompressed BAM file opened
  using hts_open(name, "wbu").  This was attempting to write BAM data
  without wrapping it in BGZF blocks, which is invalid according to the BAM
  specification.  "wbu" is now internally converted to "wb0" to output
  uncompressed data wrapped in BGZF blocks. (PR#1632, fixes #1617. Reported
  by Joyjit Daw)

* Fixed over-strict bounds check in probaln_glocal() which caused it to make
  sub-optimal alignments when the requested band width was greater than the
  query length.  (PR#1616, fixes #1605.  Reported by Jared Simpson)

* Fixed possible double frees when handling errors in bcf_hdr_add_hrec(), if
  particular memory allocations fail. (PR#1637)

* Ensure that bcf_hdr_remove() clears up all pointers to the items removed
  from dictionaries.  Failing to do this could have resulted in a call
  requesting a deleted item via bcf_hdr_get_hrec() returning a stale
  pointer. (PR#1637)

* Stop the gzip decompressor from finishing prematurely when an empty gzip
  block is followed by more data. (PR#1643, PR#1646)
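
To show the kind of input the last fix concerns, the following Python sketch
builds a stream in which an empty gzip member is followed by a member with
data, and decodes every member in turn.  It uses Python's zlib module rather
than HTSlib, and decompress_all_members() is a hypothetical helper; it only
illustrates the expected behaviour, not how HTSlib implements it.

```python
import gzip
import zlib


def decompress_all_members(data):
    """Decompress every gzip member in data, including empty ones."""
    out = []
    while data:
        # wbits=31 tells zlib to expect a gzip header and trailer.
        member = zlib.decompressobj(wbits=31)
        out.append(member.decompress(data))
        out.append(member.flush())
        if not member.eof:
            raise ValueError("truncated gzip member")
        # Bytes left over after this member belong to the next one.
        # An empty member must not end the loop while bytes remain.
        data = member.unused_data
    return b"".join(out)


# An empty member followed by a member that carries data.
stream = gzip.compress(b"") + gzip.compress(b"more data")
result = decompress_all_members(stream)
```

A decoder that stopped at the first member producing no output would return
nothing for this stream, which is the premature finish described above.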

------------------------------------------------------------------------------
samtools - changes v1.18
------------------------------------------------------------------------------

New work and changes:

* Add minimiser sort option to collate by an indexed fasta.  Expand the
  minimiser sort to arrange the minimiser values in the same order as
  they occur in the reference genome. This acts as an extremely crude
  and simplistic read aligner that can be used to boost read compression.
  (PR#1818)

* Add a --duplicate-count option to markdup.  Adds the number of duplicates
  (including itself) to the original read in a 'dc' tag. (PR#1816. Thanks to
  wulj2)

* Make calmd handle unaligned data or empty files without throwing an error.
  This is to make pipelines work more smoothly.  A warning will still be
  issued. (PR#1841, fixes #1839.  Reported by Filipe G. Vieira)

* Consistent, more comprehensive flag filtering for fasta/fastq.  Added
  --rf/--incl[ude]-flags and long options for -F (--excl[ude]-flags) and -f
  (--require-flags). (PR#1842.  Thanks to Devang Thakkar)

* Apply fastq --input-fmt-option settings.  Previously any options
  specified were not being applied to the input file. (PR#1855.  Thanks
  to John Marshall)

* Add fastq -d TAG[:VAL] check.  This mirrors view -d and will only output
  alignments that match TAG (and VAL if specified). (PR#1863, fixes #1854.
  Requested by Rasmus Kirkegaard)

* Extend import --order TAG to --order TAG:length.  If length is specified,
  the tag format goes from integer to a 0-padded string format.  This is a
  workaround for BAM and CRAM that cannot encode an order tag of over 4
  billion records. (PR#1850, fixes #1847.  Reported by Feng Tian)

* New -aa mode for consensus.  This works like the -aa option in depth
  and mpileup. The single 'a' reports all bases in contigs covered by
  alignments. Double 'aa' (or '-a -a') reports Ns even for the references
  with no alignments against them. (PR#1851, fixes #1849.  Requested by
  Tim Fennell)

* Add long option support to samtools index. (PR#1872, fixes #1869.  Reported
  by Jason Bacon)

* Be consistent with rounding of "average length" in samtools stats.
  (PR#1876, fixes #1867.  Reported by Jelinek-J)

* Add option to ampliconclip that marks reads as unmapped when they do not
  have enough aligned bases left after clipping.  Default is to unmap reads
  with zero aligned bases. (PR#1865, fixes #1856.  Requested by ces)
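
The import --order TAG:length entry above can be illustrated with a short
Python sketch.  It assumes the "4 billion" limit is that of a 32-bit
unsigned integer tag value; order_tag_value() is a hypothetical helper, not
samtools code.

```python
UINT32_MAX = 2**32 - 1  # assumed cap for an integer tag value (about 4 billion)


def order_tag_value(index, length=None):
    """Return the order tag value for the record at position index.

    Without a length the value stays an integer and must fit in 32 bits.
    With a length it becomes a 0-padded string, which has no such cap.
    """
    if length is None:
        if index > UINT32_MAX:
            raise OverflowError("record index does not fit in a 32-bit tag")
        return index
    return str(index).zfill(length)


padded = order_tag_value(5_000_000_000, 12)
```

Because every padded value has the same width, sorting the strings
lexicographically gives the same order as sorting the original numbers.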

Bug Fixes:

* [From HTSlib] Fix a major bug when searching against a CRAM index where
  one container has start and end coordinates entirely contained within the
  previous container. This would occasionally miss data, and sometimes
  return much more than required.  The bug affected versions 1.11 to 1.17,
  although the change in 1.11 that introduced it was itself a fix for
  multi-threaded index queries.
  This bug did not affect index building.  There is no need to reindex your
  CRAM files. (PR#samtools/htslib#1574, PR#samtools/htslib#1640. Fixes
  #samtools/htslib#1569, #samtools/htslib#1639, #1808, #1819.  Reported
  by xuxif, Jens Reeder and Jared Simpson)

* Fix a sort -M bug (regression) when merging sub-blocks.  Data was valid but
  in a poor order for compression. (PR#1812)

* Fix bug in split output format.  Now SAM and CRAM format can be chosen as
  well as BAM.  Also a documentation change, see below. (PR#1821)

* Add error checking to view -e filter expression code.  Invalid expressions
  were not returning an error code. (PR#1833, fixes #1829.  Reported by
  Steve Huang)

* Fix reheader CRAM output version.  Sets the correct CRAM output version for
  non-3.0 CRAMs. (PR#1868, fixes #1866.  Reported by John Marshall)

Documentation:

* Expand the default filtering information on the mpileup man page. (PR#1802,
  fixes #1801.  Reported by gevro)

* Add an explanation of the default behaviour of split files on generating a
  file for reads with missing or unrecognised RG tags.  Also a small bug fix,
  see above. (PR#1821, fixes #1817.  Reported by Steve Huang)

* In the INSTALL instructions, switched back to openssl for Alpine.  This
  matches the current Alpine Linux practice. (PR#1837, see htslib#1591.
  Reported by John Marshall)

* Fix various typos caught by lintian parsers. (PR#1877.  Thanks to
  Étienne Mollier)

* Document consensus --qual-calibration option. (PR#1880, fixes #1879.
  Reported by John Marshall)

* Updated the page about samtools duplicate marking with more detail at
  www.htslib.org/algorithms/duplicate.html

Non user-visible changes and build improvements:

* Removed a redundant line that caused a warning in gcc-13. (PR#1838)

------------------------------------------------------------------------------
bcftools - changes v1.18
------------------------------------------------------------------------------

Changes affecting the whole of bcftools, or multiple commands:

* Support automatic indexing while writing BCF and VCF.gz via the new
  `--write-index` option

Changes affecting specific commands:

* bcftools annotate

    - The `-m, --mark-sites` option can now be used to mark all sites without
      the need to provide the `-a` file (#1861)

    - Fix a bug where the `-m` function did not respect the `--min-overlap`
      option (#1869)

    - Fix a bug where updating INFO/END resulted in an assertion error (#1957)

* bcftools concat

    - New option `--drop-genotypes`

* bcftools consensus

    - Support higher-ploidy genotypes with `-H, --haplotype` (#1892)

    - Allow `--mark-ins` and `--mark-snv` with a character, similarly to
      `--mark-del`

* bcftools convert

    - Support for conversion from tab-delimited files (CHROM,POS,REF,ALT) to
      sites-only VCFs

* bcftools csq

    - New `--unify-chr-names` option to automatically unify different
      chromosome naming conventions in the input GFF, fasta and VCF files
      (e.g. "chrX" vs "X")

    - More versatility in parsing various flavors of GFF

    - A new `--dump-gff` option to help with debugging and investigating the
      internals of GFF parsing

    - When printing consequences in nonsense mediated decay transcripts,
      include 'NMD_transcript' in the consequence part of the annotation.
      This is to make filtering easier and analogous to VEP annotations.
      For example the consequence annotation
      3_prime_utr|PCGF3|ENST00000430644|NMD is newly printed as
      3_prime_utr&NMD_transcript|PCGF3|ENST00000430644|NMD

* bcftools gtcheck

    - Add stats for the number of sites matched in the GT-vs-GT, GT-vs-PL,
      etc modes. This information is important for interpretation of the
      discordance score, as only the GT-vs-GT matching can be interpreted as
      the number of mismatching genotypes.

* bcftools +mendelian2

    - Fix command line argument parsing; the `-p` and `-P` options were not
      functioning (#1906)

* bcftools merge

    - New `-M, --missing-rules` option to control the behavior of merging of
      vector tags to prevent mixtures of known and missing values in tags
      when desired

    - Use values pertaining to the unknown allele (<*> or <NON_REF>) when
      available to prevent mixtures of known and missing values (#1888)

    - Revamped line matching code to fix problems in gVCF merging where split
      gVCF blocks would not update genotypes (#1891, #1164).

* bcftools mpileup

    - Fix a bug in --indels-v2.0 which caused an endless loop when CIGAR
      operator 'H' or 'P' was encountered

* bcftools norm

    - The `-m, --multiallelics +` mode now preserves phasing (#1893)

    - Symbolic <DEL.*> alleles are now normalized too (#1919)

    - New `-g, --gff-annot` option to right-align indels in forward
      transcripts to follow the HGVS 3' rule (#1929)

* bcftools query

    - Force newline character in formatting expression when not given
      explicitly

    - Fix `-H` header output in formatting expressions containing newlines

* bcftools reheader

    - Make `-f, --fai` aware of long contigs not representable by 32-bit
      integer (#1959)

* bcftools +split-vep

    - Prevent a segfault when `-i/-e` use a VEP subfield not included in `-f`
      or `-c` (#1877)

    - New `-X, --keep-sites` option complementing the existing `-x,
      --drop-sites` option

    - Force newline character in formatting expression when not given
      explicitly

    - Fix a subtle ambiguity: identical rows must be returned when `-s` is
      applied regardless of `-f` containing the `-a` VEP tag itself or not.

* bcftools stats

    - Collect new VAF (variant allele frequency) statistics from FORMAT/AD
      field

    - When counting transitions/transversions, consider also alternate het
      genotypes

* plot-vcfstats

    - Add three new VAF plots
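
For readers unfamiliar with VAF, the sketch below shows one common way to
derive it from a FORMAT/AD value, as mentioned under bcftools stats above.
It assumes VAF is the alternate allele depth divided by the total depth
across all alleles; vaf_from_ad() is a hypothetical helper, and the exact
calculation used by bcftools stats may differ.

```python
def vaf_from_ad(ad):
    """Return one VAF per alternate allele from an AD list.

    ad holds per-allele read depths with the reference allele first,
    for example [15, 5] for 15 reference reads and 5 alternate reads.
    An empty list is returned when there is no coverage.
    """
    total = sum(ad)
    if total == 0:
        return []
    return [alt_depth / total for alt_depth in ad[1:]]


single_alt = vaf_from_ad([15, 5])
two_alts = vaf_from_ad([10, 5, 5])
```

Here a site with 15 reference and 5 alternate reads gives a VAF of 0.25 for
the alternate allele.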



--
The Wellcome Sanger Institute is operated by Genome Research Limited, a charity registered in England with number 1021457 and a company registered in England with number 2742969, whose registered office is Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, CB10 1SA.
_______________________________________________
Samtools-help mailing list
Samtools-help@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/samtools-help