Samtools (and HTSlib and BCFtools) version 1.21 is now available from
GitHub and SourceForge.

https://github.com/samtools/htslib/releases/tag/1.21
https://github.com/samtools/samtools/releases/tag/1.21
https://github.com/samtools/bcftools/releases/tag/1.21
https://sourceforge.net/projects/samtools/

The main changes are listed below:


------------------------------------------------------------------------------
htslib - changes v1.21
------------------------------------------------------------------------------

The primary user-visible changes in this release are updates to the annot-tsv
tool and some speed improvements.  Full details of other changes and bugs
fixed are below.

Notice: this is the last SAMtools / HTSlib release where CRAM 3.0 will
be the default CRAM version.  From the next release the default will be
CRAM 3.1 unless the version is explicitly specified, for example using
"samtools view -O cram,version=3.0".

Updates
-------

* Extend annot-tsv with several new command line options.
    --delim permits use of other delimiters.
    --headers for selection of other header formats.
    --no-header-idx to suppress column index numbers in header.
  The -h option no longer prints help, as it is now short for --headers;
  --help still works.
  (PR #1779)

* Allow annot-tsv -a to rename annotations. (PR #1709)

* Extend annot-tsv --overlap to be able to specify the overlap fraction
  separately for source and target. (PR #1811)

* Added new APIs to facilitate low-level CRAM container manipulations,
  used by the new "samtools cat" region filtering code. Functions are:
    cram_container_get_coords()
    cram_filter_container()
    cram_index_extents()
    cram_container_num2offset()
    cram_container_offset2num()
    cram_num_containers()
    cram_num_containers_between()
  Also improved cram_index_query() to cope with HTS_IDX_NOCOOR regions.
  (PR #1771)

* Bgzip now retains file modification and access times when compressing and
  decompressing. (PR #1727, fixes #1718.  Requested by Gert Hulselmans.)

* Use FNV1a for string hashing in khash.  The old algorithm was particularly
  weak with base-64 style strings and led to a large number of collisions.
  (PR #1806.  Fixes samtools/samtools#2066, reported by
  Hans-Joachim Ruscheweyh)

* Improve the speed of the nibble2base() function on Intel (PR #1667, PR
  #1764, PR #1786, PR #1802, thanks to Ruben Vorderman) and ARM (PR #1795,
  thanks to John Marshall).

* bgzf_getline() will now warn if it encounters UTF-16 data. (PR #1487,
  thanks to John Marshall)

* Speed up bgzf_read().  While this does not reduce CPU significantly, it
  does increase the maximum parallelism available permitting 10-15% faster
  decoding. (PR #1772, PR #1800, Issue #1798)

* Speed up faidx by use of better isgraph methods (PR #1797) and whole-line
  reading (PR #1799, thanks to John Marshall).

* Speed up kputll() function, speeding up BAM -> SAM conversion by about 5%
  and also samtools depth. (PR #1805)

* Added more example code, covering fasta/fastq indexing, tabix indexing and
  use of the thread pool. (PR #1666)
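
The khash change listed above switches string hashing to FNV1a.  As a rough
illustration only (this is not the HTSlib source), the 32-bit FNV-1a
algorithm can be sketched in Python as below.  The function name and the
choice of the 32-bit variant are assumptions made for this example.

```python
FNV32_OFFSET_BASIS = 0x811C9DC5  # standard 32-bit FNV offset basis
FNV32_PRIME = 0x01000193         # standard 32-bit FNV prime


def fnv1a_32(data: bytes) -> int:
    """Return the 32-bit FNV-1a hash of data (hypothetical helper name)."""
    h = FNV32_OFFSET_BASIS
    for byte in data:
        h ^= byte                            # FNV-1a: XOR the byte in first ...
        h = (h * FNV32_PRIME) & 0xFFFFFFFF   # ... then multiply, wrapping to 32 bits
    return h


print(hex(fnv1a_32(b"a")))  # 0xe40c292c
```

Because every input byte is mixed in before the multiply, strings that
differ only in a few characters tend to spread across buckets, which is the
property the release note refers to.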

Build Changes
-------------

* Code warning fixes for pedantic compilers (PR #1777) and avoid some
  undefined behaviour (PR #1810, PR #1816, PR #1828).

* Windows based CI has been migrated from AppVeyor to GitHub Actions. (PR
  #1796, PR #1803, PR #1808)

* Miscellaneous minor build infrastructure and code fixes. (PR #1807, PR
  #1829, both thanks to John Marshall)

* Updated htscodecs submodule to version 1.6.1 (PR #1828)

* Fixed an awk script in the Makefile that only worked with gawk. (PR #1831)

Bug fixes
---------

* Fix small OSS-Fuzz reported issues with CRAM encoding and long CIGARs
  and/or illegal positions. (PR #1775, PR #1801, PR #1817)

* Fix issues with on-the-fly indexing of VCF/BCF (bcftools --write-index)
  when not using multiple threads. (PR #1837. Fixes samtools/bcftools#2267,
  reported by Giulio Genovese)

* Stricter limits on POS / MPOS / TLEN in sam_parse1().  This fixes a
  signed overflow reported by OSS-Fuzz and should help prevent other
  as-yet undetected bugs. (PR #1812)

* Check that the underlying file open worked for preload: URLs.  Fixes a NULL
  pointer dereference reported by OSS-Fuzz. (PR #1821)

* Fix an infinite loop in hts_itr_query() when given extremely large
  positions which cause integer overflow.  Also adds hts_bin_maxpos() and
  hts_idx_maxpos() functions. (PR #1774, thanks to John Marshall and
  reported by Jesus Alberto Munoz Mesa)

* Fix an out of bounds read in hts_itr_multi_next() when switching
  chromosomes.  This bug is present in releases 1.11 to 1.20. (PR #1788.
  Fixes samtools/samtools#2063, reported by acorvelo)

* Work around parsing problems with colons in CHROM names. Fixes
  samtools/bcftools#2139.  (PR #1781, John Marshall / James Bonfield)

* Correct the CPU detection for Mac OS X 10.7.  cpuid is used by htscodecs
  (see samtools/htscodecs#116), and the corresponding changes in htslib are
  PR #1785.  Reported by Ryan Carsten Schmidt.

* Make BAM zero-length intervals work the same as CRAM; permitted and
  returning overlapping records. (PR #1787.  Fixes samtools/samtools#2060,
  reported by acorvelo)

* Replace assert() with abort() in BCF synced reader.  This is not an ideal
  solution, but it gives consistent behaviour when compiling with or without
  NDEBUG. (PR #1791, thanks to Martin Pollard)

* Fixed failure to change the write block size on compressed SAM or VCF files
  due to an internal type confusion. (PR #1826)

* Fixed an out-of-bounds read in cram_codec_iter_next() (PR #1832)

------------------------------------------------------------------------------
samtools - changes v1.21
------------------------------------------------------------------------------

Notice:

* This is the last SAMtools / HTSlib release where CRAM 3.0 will be the
  default CRAM version.  From the next release the default will be CRAM 3.1
  unless the version is explicitly specified, for example using
  "samtools view -O cram,version=3.0".

New work and changes:

* `samtools reset` now removes a set of predefined auxtags, as these tags
  are no longer valid after the reset operation.  This behaviour can be
  overridden if desired. (PR #2034, fixes #2011.  Reported by Felix Lenner)

* `samtools reset` now also removes duplicate flags. (PR #2047.  Reported by
  Kevin Lewis)

* Region and section/part filtering added to CRAM `samtools cat`.  Region
  filtering permits `samtools cat` to produce new CRAMs that only cover a
  specified region. (PR #2035)

* Added a report of the number of alignments for each primer to `samtools
  ampliconclip`. (PR #2039, PR #2101, feature request #2033.  Thanks to
  Brad Langhorst)

* Make `ampliconclip` primer counts output deterministic. (PR #2081)

* `samtools fixmate` no longer removes the PAIRED flag from reads that have
  no mate.  This is done on the understanding that the PAIRED flag is a
  sequencing technology indicator not a feature of alignment.  This is a
  change to previous `fixmate` behaviour. (PR #2056, fixes #2052.  Reported
  by John Wiedenhoeft)

* Added bgzf compressed FASTA output to `samtools faidx`. (PR #2067, fixes
  #2055. Requested by Filipe G Vieira)

* Optimise `samtools depth` histogram incrementing code. (PR #2078)

* In `samtools merge` zero pad unique suffix IDs. (PR #2087, fixes #2086.
  Thanks to Chris Wright)

* `samtools idxstats` now accepts the `-X` option, making it easier to
  specify the location of the index file. (PR #2093, feature request
  #2071.  Requested by Samuel Chen)

* Improved documentation for the mpileup `--adjust-MQ` option. (PR #2098.
  Requested by Georg Langebrake)

Bug fixes:

* Avoid `tview` buffer overflow for positions with >= 14 digits. (PR #2032.
  Thanks to John Marshall. Reported on bioconda/bioconda-recipes#47137 by
  jmunoz94)

* Added file name and error message to 'error closing output file' error in
  `samtools sort`. (PR #2050, fixes #2049.  Thanks to Joshua C Randall)

* Fixed hard clip trimming issue in `ampliconclip` where right-hand side
  qualities were being removed from left-hand side trims. (PR #2053, fixes
  #2048.  Reported by Duda5)

* Fixed a bug in `samtools merge --template-coordinate` where the wrong
  heap was being tested. (PR #2062.  Thanks to Nils Homer.  Reported on
  ng-core/fastquorum#52 by David Mas-Ponte)

* Do not look at chr "*" for unmapped-placed reads with `samtools view
  --fetch-pairs`.  This was causing a significant slowdown when
  `--fetch-pairs` was being used. (PR #2070, fixes #2059.  Reported by
  acorvelo)

* Fixed bug which could cause `samtools view -L` to give incomplete output
  when the BED file contained nested target locations. (PR #2107, fixes
  #2104.  Reported by geertvandeweyer)

* Enable `samtools coverage` to handle alignments that do not have quality
  score data.  This was causing memory access problems. (PR #2083, fixes
  #2076.  Reported by Matthew Colpus)

* Fix undefined behaviour in `samtools fastq` with empty QUAL. (PR #2084)

* In `plot-bamstats` fixed read-length plot for data with limited variations
  in length. Lack of data was causing gnuplot problems. (PR #2085, fixes
  #2068.  Reported by mariyeta)

* Fixed an accidental fall-through that caused `samtools split -p` to also
  enable `--no-PG`. (PR #2101)

* Fixed an overflow that caused `samtools consensus -m simple` to give
  incorrect output when the input coverage exceeded several million reads
  deep. (PR #2099, fixes #2095.  Reported by Dylan Lawrence)

Non user-visible changes and build improvements:

* Work around address sanitizer going missing from the Cirrus CI ubuntu clang
  compiler by moving the address sanitizer build to gcc. Fix warnings from
  the new clang compiler. (PR #2043)

* Windows based CI has been migrated from AppVeyor to GitHub Actions. (PR
  #2072, PR #2108)

* Turn on more warning options in Cirrus-CI builds, ensure everything builds
  with `-Werror`, and add undefined behaviour checks to the address sanitizer
  test. (PR #2101, PR #2103, PR #2109)

* Tidy up Makefile dependencies and untracked test files. (PR #2106.  Thanks
  to John Marshall)

------------------------------------------------------------------------------
bcftools - changes v1.21
------------------------------------------------------------------------------

Changes affecting the whole of bcftools, or multiple commands:

* Support multiple semicolon-separated strings when filtering by ID
  using -i/-e (#2190).
  For example, `-i 'ID="rs123"'` now correctly matches `rs123;rs456`

* The filtering expression ILEN can be positive (insertion), negative
  (deletion), zero (balanced substitutions), or set to missing value
  (symbolic alleles).

* bcftools query
* bcftools +split-vep

    - The column indices printed by default with `-H` (e.g., "#[1]CHROM")
      can now be suppressed by giving the option twice, `-HH` (#2152)
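
The two filtering changes at the top of this section (ID matching and ILEN)
can be pictured with the small Python sketch below.  It is not bcftools
code: the function names are hypothetical, and the rule that any allele
starting with "<" counts as symbolic is an assumption for illustration.

```python
from typing import Optional


def id_matches(id_field: str, wanted: str) -> bool:
    """True if `wanted` equals any one of the semicolon-separated IDs."""
    return wanted in id_field.split(";")


def ilen(ref: str, alt: str) -> Optional[int]:
    """Length of ALT minus length of REF; None stands in for 'missing'."""
    if alt.startswith("<"):  # assumed test for a symbolic allele, e.g. <DEL>
        return None
    return len(alt) - len(ref)


print(id_matches("rs123;rs456", "rs123"))  # True
print(id_matches("rs1234", "rs123"))       # False
print(ilen("A", "ATG"), ilen("ATG", "A"), ilen("AT", "GC"), ilen("A", "<DEL>"))
```

Splitting on ";" rather than doing a substring search is what keeps
"rs123" from matching "rs1234".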

Changes affecting specific commands:

* bcftools annotate

    - Support dynamic variables read from a tab-delimited annotation file
      (#2151). For example, in the two cases below the field 'STR' from the -a
      file is required to match the INFO/TAG in VCF. In the first example the
      alleles REF,ALT must match, in the second example they are ignored. The
      option -k is required to also output records that were not annotated:

       bcftools annotate -a ann.tsv.gz \
          -c CHROM,POS,REF,ALT,SCORE,~STR -i'TAG={STR}' -k in.vcf

       bcftools annotate -a ann.tsv.gz \
          -c CHROM,POS,-,-,SCORE,~STR     -i'TAG={STR}' -k in.vcf

    - When adding Type=String annotations from a tab-delimited file, encode
      characters with special meaning using percent encoding (';', '=' in
      INFO and ':' in FORMAT) (#2202)
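
As a hedged sketch of the percent encoding described in the last item, the
Python below replaces only the characters named there with %XX escapes.
The helper name and constants are hypothetical, and whether bcftools also
escapes other characters (for example '%' itself) is not stated here, so
this example leaves everything else untouched.

```python
INFO_SPECIAL = ";="   # characters named above for INFO values
FORMAT_SPECIAL = ":"  # character named above for FORMAT values


def percent_encode(value: str, special: str) -> str:
    """Replace each character found in `special` with its %XX hex escape."""
    return "".join(
        f"%{ord(ch):02X}" if ch in special else ch
        for ch in value
    )


print(percent_encode("a=b;c", INFO_SPECIAL))   # a%3Db%3Bc
print(percent_encode("1:2", FORMAT_SPECIAL))   # 1%3A2
```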

* bcftools consensus

    - Allow applying a reference allele which overlaps a previous deletion;
      there is no need to complain about overlapping alleles in such a case

    - Fix a bug which required `-s -` to be present even when there were no
      samples in the VCF (#2260)

* bcftools csq

    - Fix a rare bug where an indel combined with a substitution ending at
      an exon boundary was incorrectly predicted to have an 'inframe' rather
      than a 'frameshift' consequence (#2212)

* bcftools gtcheck

    - Fix a segfault with --no-HWE-prob. The bug was introduced with the
      output format change in 1.19 which replaced the DC section with DCv2
      (#2180)

    - The number of matching genotypes in the DCv2 output was not calculated
      correctly with non-zero `-E, --error-probability`. Consequently, the
      average HWE score was also incorrect. The main output, the discordance
      score, was not affected by the bug

* bcftools +mendelian2

    - Include the number of good cases where at least one of the trio
      genotypes has an alternate allele (#2204)

    - Fix the error message which would report the wrong sample when a
      non-existent sample is given. Note that the bug only affected the error
      message, the program otherwise assigns the family members correctly
      (#2242)

* bcftools merge

    - Fix a severe bug in merging of FORMAT fields with Number=R and Number=A
      values. For example, rows with high-coverage FORMAT/AD values (greater
      than or equal to 128) could have been assigned to incorrect samples.
      The bug was introduced in version 1.19. For details see #2244.

* bcftools mpileup

    - Return non-zero error code when the input BAM/CRAM file is truncated
      (#2177)

    - Add FORMAT/AD annotation by default, disable with `-a -AD`

* bcftools norm

    - Support realignment of symbolic <DUP.*> alleles, similarly to <DEL.*>
      added previously (#1919,#2145)

    - Fix in reporting reference allele genotypes with `--multi-overlaps .`
      (#2160)

    - Support duplicate removal of symbolic alleles of the same type but
      different SVLEN (#2182)

    - New `-S, --sort` switch to optionally sort output records by allele
      (#1484)

    - Add the `-i/-e` filtering options to select records for normalization.
      Note duplicate removal ignores this option.

    - Fix a bug where `--atomize` would not fill GT alleles for atomized SNVs
      followed by an indel (#2239)

* bcftools +remove-overlaps

    - Revamp the program to allow greater flexibility, with the following new
      options:

      -M, --mark-tag TAG   Mark -m sites with INFO/TAG

      -m, --mark EXPR      Mark (if also -M is present)
                           or remove sites [overlap]

            dup       .. all overlapping sites
            overlap   .. overlapping sites
            min(QUAL) .. mark sites with lowest QUAL until overlaps
                         are resolved

      --missing EXPR       Value to use for missing tags
                           with -m 'min(QUAL)'

            0   .. the default
            DP  .. heuristics, scale maximum QUAL value proportionally
                   to INFO/DP

      --reverse            Apply the reverse logic, for example preserve
                           duplicates instead of removing

      -O, --output-type t  t: plain list of sites (chr,pos),
                           tz: compressed list

* bcftools +tag2tag

    - The conversions --LXX-to-XX, --XX-to-LXX were working but specific
      cases such as --LAD-to-AD were not.

    - Print a more informative error message when the source tag type violates
      the VCF specification

* bcftools +trio-dnm2

    - Better handling of the --strictly-novel functionality, especially with
      respect to chrX inheritance


----------------------------------------------------------------------
The Wellcome Sanger Institute is operated by Genome Research Limited, a charity 
registered in England with number 1021457 and a company registered in England 
with number 2742969, whose registered office is Wellcome Sanger Institute, 
Wellcome Genome Campus, Hinxton, CB10 1SA.


_______________________________________________
Samtools-help mailing list
Samtools-help@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/samtools-help
