Samtools (and HTSlib and BCFtools) version 1.20 is now available from
GitHub and SourceForge.

https://github.com/samtools/htslib/releases/tag/1.20
https://github.com/samtools/samtools/releases/tag/1.20
https://github.com/samtools/bcftools/releases/tag/1.20
https://sourceforge.net/projects/samtools/

The main changes are listed below:


------------------------------------------------------------------------------
htslib - changes v1.20
------------------------------------------------------------------------------

Updates
-------

* When working on named files, bgzip now sets the modified and access times
  of the output files it makes to match those of the corresponding input.
  (PR #1727, feature request #1718.  Requested by Gert Hulselmans)

* It's now possible to use a -o option to specify the output file name in
  bgzip. (PR #1747, feature request #1726.  Requested by Gert Hulselmans)

* Improved faidx error messages. (PR #1743, thanks to Nick Moore)

* Faster reading of SAM array (type "B") tags.  These often turn up in ONT
  and PacBio data. (PR #1741)

* Improved validity checking of base modification tags. (PR #1749)

* mpileup overlap removal now works where one read has a deletion. (PR #1751,
  fixes samtools/samtools#1992.  Reported by Long Tian)

* The S3 plugin can now find buckets via S3 access point aliases. (PR #1756,
  thanks to Matt Pawelczyk; fixes samtools/samtools#1984.  Reported by
  Albert Li)

* Added a --threads option (and -@ short option) to tabix. (PR #1755, feature
  request #1735.  Requested by Dan Bolser)

* tabix can now index Graph Alignment Format (GAF) files. (See
  https://github.com/lh3/gfatools/blob/master/doc/rGFA.md) (PR #1763,
  thanks to Adam Novak)
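
The bgzip timestamp change above can be illustrated with a minimal Python
sketch.  It is not bgzip's implementation: plain gzip stands in for BGZF, and
the function name `compress_keep_times` is hypothetical.  It only shows the
idea of copying the input's access and modification times onto the output.

```python
import gzip
import os
import shutil


def compress_keep_times(src, dst):
    """Compress src into dst, then give dst the same atime/mtime as src.

    Assumption: plain gzip is used here as a stand-in for BGZF.
    """
    # Record the input's times before reading it, since reading may
    # update the access time on some file systems.
    st = os.stat(src)
    with open(src, "rb") as fin, gzip.open(dst, "wb") as fout:
        shutil.copyfileobj(fin, fout)
    # Apply the recorded access and modification times to the output.
    os.utime(dst, ns=(st.st_atime_ns, st.st_mtime_ns))
```

Calling `compress_keep_times("in.txt", "in.txt.gz")` leaves the two files with
matching modification times, which is the behaviour build tools and rsync-style
workflows tend to rely on.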

Bug fixes
---------

* Security fix: Prevent possible heap overflow in cram_encode_aux() on bad
  RG:Z tags. (PR #1737)

* Security fix: Prevent attempts to call a NULL pointer if certain URL
  schemes are used in CRAM @SQ UR: tags. (PR #1757)

* Security fix: Fixed a bug where following certain AWS S3 redirects could
  downgrade the connection from TLS (i.e. https://) to unencrypted http://.
  This could happen when using path-based URLs and AWS_DEFAULT_REGION was set
  to a region other than the one where the data was stored. (PR #1762, fixes
  #1760. Reported by andaca)

* Fixed arithmetic overflow when loading very long references for CRAM. (PR
  #1738, fixes #1738.  Reported by Shane McCarthy)

* Fixed faidx and CRAM reference look-ups on compressed fasta where the .fai
  index file was present, but the .gzi index of compressed offsets was not.
  (PR #1745, fixes #1744.  Reported by Theodore Li)

* Fixed BCF indexing on-the-fly bug which produced invalid indexes when
  using multiple compression threads. (PR #1742, fixes #1740.  Reported
  by graphenn)

* Ensure that pileup destructors are called by bam_plp_destroy(), to prevent
  memory leaks. (PR #1749, PR #1754)

* Ensure on-the-fly index timestamps are always older than the data file.
  Previously the files could be closed out of order, leading to warnings
  being printed when using the index. (PR #1753, fixes #1732.  Reported by
  Gert Hulselmans)

* To prevent data corruption when reading (strictly invalid) VCF files with
  duplicated FORMAT tags, all but the first copy of the data associated with
  the tag are now dropped with a warning. (PR #1752, PR #1761, fixes #1733.
  Reported by anthakki)

* Fixed a bug introduced in release 1.19 (PR #1689) which broke variant
  record data if it tried to remove an over-long tag. (PR #1752, PR #1761)

* Changed error to warning when complaining about use of the CG tag in SAM or
  CRAM files. (PR #1758, fixes samtools/samtools#2002)
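
The handling of duplicated FORMAT tags described above can be sketched at the
text level in Python.  This is only an illustration under stated assumptions:
HTSlib does the work in C inside its VCF parser, and the function name
`drop_duplicate_format_tags` is hypothetical.  The sketch keeps the first copy
of each tag and drops later copies, along with their per-sample values, with a
warning.

```python
import warnings


def drop_duplicate_format_tags(format_field, sample_fields):
    """Keep only the first occurrence of each FORMAT key.

    format_field:  the VCF FORMAT column, e.g. "GT:DP:GT"
    sample_fields: list of per-sample columns, e.g. ["0/1:12:1/1"]
    Returns (new_format, new_sample_fields).
    """
    keys = format_field.split(":")
    seen = set()
    keep = []  # indexes of the columns that survive
    for i, key in enumerate(keys):
        if key in seen:
            warnings.warn(f"duplicate FORMAT tag {key!r} dropped")
            continue
        seen.add(key)
        keep.append(i)

    new_format = ":".join(keys[i] for i in keep)
    new_samples = []
    for sample in sample_fields:
        values = sample.split(":")
        # VCF allows trailing sample fields to be omitted, so only keep
        # indexes that are actually present for this sample.
        new_samples.append(":".join(values[i] for i in keep if i < len(values)))
    return new_format, new_samples
```

For a record with FORMAT `GT:DP:GT`, the second `GT` column is discarded for
every sample, so the remaining values stay aligned with their tags.
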

------------------------------------------------------------------------------
samtools - changes v1.20
------------------------------------------------------------------------------

* Added a `--max-depth` option to `bedcov`, for more control over the depth
  limit used when calculating the pileup.  Previously this limit was set at
  64000; now it is set to over 2 billion, so effectively all bases will be
  counted. (PR #1970, fixes #1950.  Reported by ellisjj)

* Added `mpileup --output-extra RLEN` to display the unclipped read length.
  (PR #1971, feature request #1959.  Requested by Feng Tian)

* Improved checking of symbolic flag names (e.g. UNMAP) passed to samtools.
  (PR #1981, fixes #1977.  Reported by Ilya Shlyakhter)

* The `samtools consensus --min-depth` option now works for the Bayesian mode
  as well as the simple one. (PR #1989, feature request #1982.  Requested by
  Gautier Richard)

* It's now possible to use the `samtools fastq` `-d tag:val` option multiple
  times, allowing matches on more than one tag/value.  It also gets a `-D`
  option which allows the values to be listed in a file. (PR #1993, feature
  request #1958.  Requested by Tristan Lefebure)

* Added `samtools fixmate` `-M` option to sanity check base modification
  (`ML`, `MM`, `MN`) tags, and where necessary adjust modification data on
  hard-clipped records. (PR #1990)

* Made `mpileup` run faster. (PR #1995)

* `samtools import` now adds a `@PG` header to the files it makes. As with
  other sub-commands, this can be disabled by using `--no-PG`. (PR #2008.
  Requested by Steven Leonard)

* The `samtools split` `-d` option to split by tag value now works on tags
  with integer values. (PR #2005, feature request #1956.  Requested by
  Alex Leonard)

* Adjusted `samtools sort -n` (by name) so that primary reads are always
  sorted before secondary / supplementary. (PR #2012, feature request
  #2010.  Requested by Stijn van Dongen)

* Added `samtools bedcov` `-H` option to print column headers in the output.
  (PR #2025.  Thanks to Dr. K. D. Murray)
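
The `samtools sort -n` change above can be pictured with a small Python sort
key.  This is a simplified sketch, not the samtools comparator: it uses plain
string comparison of read names, ignores any other tie-breakers samtools may
apply, and the function name `name_sort_key` is hypothetical.  The flag values
0x100 (secondary) and 0x800 (supplementary) are those defined by the SAM
format.

```python
SECONDARY = 0x100
SUPPLEMENTARY = 0x800


def name_sort_key(record):
    """Sort key for (qname, flag) pairs: by name, then primary first."""
    qname, flag = record
    # False sorts before True, so primary alignments come first
    # within each group of records sharing a read name.
    is_non_primary = bool(flag & (SECONDARY | SUPPLEMENTARY))
    return (qname, is_non_primary)


def sort_by_name(records):
    """Return records ordered by name with primary alignments first."""
    return sorted(records, key=name_sort_key)
```

Because Python's sort is stable, records that compare equal under this key
keep their input order.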

Documentation:

* Added a note that BAQ is applied before filtering and overlap removal
  during mpileup processing. (PR #1988, fixes #1985.  Reported by
  Joseph Galasso)

* Added 3.1 to the list of supported CRAM versions in the samtools manual
  page. (PR #2009.  Thanks to Andrew Thrasher)

* Made assorted improvements to ampliconclip, flagstat and markdup manual
  pages. (PR #2014)

Bug Fixes:

* Security fix: Fixed double free that could occur if bed file indexing
  failed due to running out of memory.  This bug first appeared in version
  1.19.1. (PR #2026)

* Corrected error message printed when faidx fails to load the fai index. (PR
  #1987.  Thanks to Nick Moore)

* Fixed bug introduced in release 1.4 that caused incorrect reference bases
  to be printed by `samtools mpileup -a -f ref.fa` in the zero-depth regions
  at the end of each reference. (PR #2019, fixes #2018.  Reported by
  Joe Georgeson)

* Fixed a samtools view usage crash on MinGW when given invalid options. (PR
  #2030, fixes #2029.  Reported by Divon Lan)

Non user-visible changes and build improvements:

* Added tests to ensure that CRAM compression is working properly. (PR #1969,
  part of fix for #1968.  Reported by Clockris)

------------------------------------------------------------------------------
bcftools - changes v1.20
------------------------------------------------------------------------------

Changes affecting the whole of bcftools, or multiple commands:

* Add short option -W for --write-index. The option now accepts an optional
  parameter which allows choosing between the TBI and CSI index formats.

Changes affecting specific commands:

* bcftools consensus

    - Add new --regions-overlap option which makes it possible to take into
      account overlapping deletions that start outside the fasta file target
      region.

* bcftools isec

    - Add new option `-l, --file-list` to read the list of file names from a
      file

* bcftools merge

    - Add new option `--force-single` to support single-file edge case
      (#2100)

* bcftools mpileup

    - Add new option --indels-cns for an alternative indel calling model,
      which should increase the speed on long read data (thanks to using
      edlib) and the precision (thanks to a number of heuristics).

* bcftools norm

    - Change the order of atomization and multiallelic splitting (when both
      -a,-m are given) from "atomize first, then split" to "split first, then
      atomize". This usually results in a simpler VCF representation. The
      previous behaviour can be achieved by explicitly streaming the output
      of the --atomize command into the --multiallelics splitting command.

    - Fix Type=String multiallelic splitting for Number=A,R,G tags with an
      incorrect number of values.

    - Merging into multiallelic sites with `bcftools norm -m +indels` did
      not work. This is now fixed and the merging is now more strict about
      variant types, for example complex events, such as AC>TGA, are not
      considered as indels anymore (#2084)

* bcftools reheader

    - Allow reading the input file from a stream with --fai (#2088)

* bcftools +setGT

    - Support for custom genotypes based on the allele with higher depth,
      such as `--new-gt c:0/X` custom genotypes (#2065)

* bcftools +split-vep

    - When only one of the tags is present, automatically choose INFO/BCSQ
      (the default tag name produced by `bcftools csq`) or INFO/CSQ (produced
      by VEP). When both tags are present, use the default INFO/CSQ.

    - Transcript selection by MANE, PICK, and user-defined transcripts, for
      example

       --select CANONICAL=YES
       --select MANE_SELECT!=""
       --select PolyPhen~probably_damaging

    - Select all matching transcripts via --select, not just one

    - Change automatic type parsing of VEP fields cDNA_position,
      CDS_position, and Protein_position from Integer to String, as they can
      be of the form "8586-8599/9231". The type Integer can still be enforced
      with `-c cDNA_position:int,CDS_position:int,Protein_position:int`.

    - Recognize `-c field:str`, not just `-c field:string`, as advertised in
      the usage page

    - Fix a bug which made filtering expressions containing missing values
      crash (#2098)

* bcftools stats

    - When GT is missing but AD is present, the program determines the
      alternate allele from AD. However, if the AD tag has an incorrect number
      of values, the program would exit with an error printing "Requested
      allele outside valid range". This is now fixed by taking into account
      the actual number of ALT alleles.

* bcftools +tag2tag

    - Support for conversion from tags using localized alleles (e.g. LPL,
      LAD) to the family of standard tags (PL, AD)

* bcftools +trio-dnm2

    - Extend --strictly-novel to exclude cases where the non-Mendelian
      allele is the reference allele. The change is motivated by the
      observation that this class of variants is enriched for errors
      (especially for indels), and better corresponds with the option name.
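
The +tag2tag conversion from localized to standard tags mentioned above can be
sketched in Python for the LAD to AD case.  This is an illustration, not the
plugin's code.  It assumes the VCF specification's LAA tag (1-based indexes
into ALT naming the alleles local to a sample) is available alongside LAD; the
function name `lad_to_ad` and the fill value used for alleles that are not
local to the sample are assumptions.

```python
def lad_to_ad(lad, laa, n_alt, fill=0):
    """Expand a localized LAD vector to a full-length AD vector.

    lad:   depths for REF followed by each local ALT allele
    laa:   1-based ALT indexes of the local alleles (from the LAA tag)
    n_alt: number of ALT alleles at the site
    fill:  value for alleles not local to this sample (an assumption)
    """
    if len(lad) != len(laa) + 1:
        raise ValueError("LAD must hold one value for REF plus one per LAA entry")
    ad = [fill] * (n_alt + 1)
    ad[0] = lad[0]  # the reference allele is always at index 0
    for local_idx, alt_idx in enumerate(laa, start=1):
        if not 1 <= alt_idx <= n_alt:
            raise ValueError(f"LAA index {alt_idx} outside 1..{n_alt}")
        ad[alt_idx] = lad[local_idx]
    return ad
```

For a site with three ALT alleles, `lad_to_ad([10, 4, 2], [2, 3], 3)` places
the two local depths at the positions of the second and third ALT alleles and
fills the first ALT allele with the chosen default.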





_______________________________________________
Samtools-help mailing list
Samtools-help@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/samtools-help
