Samtools (and HTSlib and BCFtools) version 1.15 is now available from
GitHub and SourceForge.

https://sourceforge.net/projects/samtools/
https://github.com/samtools/htslib/releases/tag/1.15
https://github.com/samtools/samtools/releases/tag/1.15
https://github.com/samtools/bcftools/releases/tag/1.15

The main changes are listed below:

------------------------------------------------------------------------------
htslib - changes v1.15
------------------------------------------------------------------------------

Features and Updates
--------------------

* Bgzip now has a --keep option to not remove the input file after
  compressing. (PR#1331)

* Improved file format detection so some BED files are no longer detected as
  FASTQ or FASTA. (PR#1350, thanks to John Marshall)

* Added xz (lzma), zstd and D4 formats to the file type detection functions.
  We don't actively support reading these data types, but function calls and
  htsfile can detect them. (PR#1340, thanks to John Marshall)

* CRAM now also uses libdeflate for read-names if the libdeflate version is
  new enough (1.9 onwards).  Previously we used zlib for this due to poor
  performance of libdeflate.  This gives a slight speed up and reduction in
  file size. (PR#1383)

* The VCF and BCF readers will now issue a warning if contig, INFO or FORMAT
  IDs do not match the formats described in the VCFv4.3 specification. Note
  that while the invalid names will mostly still be accepted, future updates
  will convert the warnings to errors causing files including invalid names
  to be rejected.  (PR#1389)
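
As a rough illustration of the kind of check behind the new VCF/BCF ID
warnings, the sketch below tests names against the regular expressions given
in the VCFv4.3 specification.  The patterns are transcribed from the
specification and the helper name is_valid_id is hypothetical; HTSlib's own
implementation may differ in detail.

```python
import re

# Patterns transcribed from the VCFv4.3 specification; verify against the
# specification before relying on them.
INFO_KEY = re.compile(r"([A-Za-z_][0-9A-Za-z_.]*|1000G)")
FORMAT_KEY = re.compile(r"[A-Za-z_][0-9A-Za-z_.]*")
CONTIG_NAME = re.compile(
    r"[0-9A-Za-z!#$%&+./:;?@^_|~-][0-9A-Za-z!#$%&*+./:;=?@^_|~-]*"
)

PATTERNS = {"INFO": INFO_KEY, "FORMAT": FORMAT_KEY, "contig": CONTIG_NAME}


def is_valid_id(kind, name):
    """Return True if `name` is a valid ID of the given kind.

    `kind` is one of "INFO", "FORMAT" or "contig" (hypothetical labels
    used only by this sketch).
    """
    return PATTERNS[kind].fullmatch(name) is not None


if __name__ == "__main__":
    for kind, name in [("INFO", "DP"), ("INFO", "2DP"), ("contig", "chr 1")]:
        status = "ok" if is_valid_id(kind, name) else "would warn"
        print(f"{kind} {name!r}: {status}")
```

Running a header's IDs through a check like this before upgrading is one way
to find names that a future release might reject.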

Build changes
-------------

These are compiler, configuration and makefile based changes.

* HTSlib now uses libhtscodecs release 1.2.1.

* Improved support for compiling and linking against HTSlib with Microsoft
  Visual Studio. (PR#1380, #1377, #1375.  Thanks to Aidan Bickford and
  John Marshall)

* Various internal CI improvements.

Bug fixes
---------

* Fixed CRAM index queries for HTSJDK output (PR#1388, reported by Chris
  Norman).  Note this also fixes CRAM writing, to match the
  specification (and HTSJDK), from version 3.1 onwards.

* Fixed CRAM index queries when required-fields settings are selected to
  ignore CIGARs (PR#1372, reported by Giulio Genovese).

* Unmapped but placed reads (having chr/pos) are now included in the BAM
  indices. (PR#1352, thanks to John Marshall)

* CRAM now honours the filename##idx##index nomenclature for specifying
  non-standard index locations. (PR#1360, reported by Michael Cariaso)

* Minor CRAM v1.0 read-group fix (PR#1349, thanks to John Marshall)

* Permit .fa and .fq file type detection as synonyms for FASTA and FASTQ.
  (PR#1386).

* Empty VCF format fields are now output as ":.:" instead of "::". (PR#1370)

* Repeated bcf_sr_seek calls now work. (PR#1363, reported by Giulio Genovese)

* The bcf_remove_allele_set() function now works on unpacked BCF records.
  (PR#1358, reported by Brent Pedersen)

* The hts_parse_decimal() function used to read numbers in region lists is
  now better at rejecting non-numeric values.  In particular it now rejects a
  lone 'G' instead of interpreting it as '0G', i.e. zero. (PR#1396, PR#1400,
  reported by SSSimon Yang; thanks to John Marshall).

* Improve support for GPU issues listed by -Wdouble-promotion. (PR#1365,
  reported by David Seisert)

* Fix example code in header file documentation. (PR#1381, thanks to
  Aidan Bickford)
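
The hts_parse_decimal() fix above comes down to requiring at least one digit
before any k/M/G suffix.  The following is a simplified sketch of that rule,
not a port of the HTSlib function: the name parse_decimal is hypothetical,
and details such as thousands separators and exponents are left out.

```python
import re
from decimal import Decimal

# Multipliers for the optional size suffix (case-insensitive).
_SUFFIX = {"": 1, "k": 10**3, "m": 10**6, "g": 10**9}

# At least one digit is mandatory, so a lone suffix such as "G" cannot match.
_NUMBER = re.compile(r"([+-]?)(\d+(?:\.\d+)?)([kKmMgG]?)")


def parse_decimal(text):
    """Parse strings such as "1500", "1.5k" or "2M" into an integer.

    Raises ValueError for anything that is not a number followed by an
    optional k/M/G suffix.
    """
    m = _NUMBER.fullmatch(text)
    if m is None:
        raise ValueError(f"not a number: {text!r}")
    sign, digits, suffix = m.groups()
    value = int(Decimal(digits) * _SUFFIX[suffix.lower()])
    return -value if sign == "-" else value


if __name__ == "__main__":
    for text in ["1.5k", "2M", "0G", "G"]:
        try:
            print(text, "->", parse_decimal(text))
        except ValueError as err:
            print(text, "-> rejected:", err)
```

With this rule "0G" still parses as zero, while a bare "G" (for example a
REF allele in a targets file) is rejected rather than silently read as a
position.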

------------------------------------------------------------------------------
samtools - changes v1.15
------------------------------------------------------------------------------

Notice:

 * Samtools mpileup VCF and BCF output (deprecated in release 1.9) has been
   removed.  Please use bcftools mpileup instead.

New work and changes:

 * Added "--min-BQ" and "--min-MQ" options to "depth". These match the
   equivalent long options found in "samtools mpileup" and give a
   consistent way of specifying the base and mapping quality filters.
   (#1584; fixes #1580. Reported by Chang Y)

 * Improved automatic file type detection with "view -u" or "view -1".
   Setting either of these options would default to BAM format regardless
   of the usual automatic file type selection based on the file name.
   The defaults are now only used when the file name does not indicate
   otherwise. (#1582)

 * For "markdup" optical duplicate marking add regex options for custom
   coordinates.  For the case of non-standard read names (QNAME), add
   options to read the coordinates and, optionally, another part of the
   string to test for optical duplication. (#1558)

 * New "samtools consensus" subcommand for generating consensus from SAM,
   BAM or CRAM files based on the contents of the alignment records.  The
   consensus is written as FASTA, FASTQ or as a pileup-oriented format.
   The default FASTA/FASTQ output includes one base per non-gap consensus,
   with insertions with respect to the aligned reference being included and
   deletions removed. This could be used to compute a new reference from
   sequence assemblies to realign against. (#1557)

 * New "samtools view --fetch-pairs" option.  This option retrieves
   pairs even when the mate is outside of the requested region.  Using
   this option enables the multi-region iterator and a region to search
   must be specified.  The input file must be an indexed regular file.
   (#1542)

 * Building on #1530 below, add a tview reflist for Goto. (#1539, thanks to
   Adam Blanchet)

 * Completion of references added to tview Goto. (#1530; thanks to
   Adam Blanchet)

 * New "samtools head" subcommand for conveniently displaying the headers
   of a SAM, BAM, or CRAM file. Without options, this is equivalent to
   `samtools view --header-only --no-PG` but more succinct and memorable.
   (#1517; thanks to John Marshall)
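
To illustrate the idea behind the markdup regex options, the sketch below
pulls coordinates out of a non-standard read name with a regular expression
and compares two reads.  Everything here is an assumption made for the
example: the read-name layout, the regex, the helper names and the distance
test are hypothetical and are not taken from the samtools source.

```python
import re

# Hypothetical non-standard QNAME layout: <run>_<tile>_<x>_<y>
COORDS = re.compile(r"[^_]+_(?P<tile>\d+)_(?P<x>\d+)_(?P<y>\d+)")


def read_coords(qname):
    """Return (tile, x, y) parsed from a read name, or None if it does
    not follow the assumed layout."""
    m = COORDS.fullmatch(qname)
    if m is None:
        return None
    return m["tile"], int(m["x"]), int(m["y"])


def within_optical_distance(qname_a, qname_b, max_dist):
    """Illustrative test: same tile and both coordinates within max_dist."""
    a, b = read_coords(qname_a), read_coords(qname_b)
    if a is None or b is None or a[0] != b[0]:
        return False
    return abs(a[1] - b[1]) <= max_dist and abs(a[2] - b[2]) <= max_dist


if __name__ == "__main__":
    print(read_coords("runA_1101_1500_2300"))
    print(within_optical_distance("runA_1101_1500_2300",
                                  "runA_1101_1510_2290", 100))
```

The extra captured field (the tile in this sketch) plays the role of the
optional "another part of the string" mentioned above: reads are only
compared when that part matches.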

Bug Fixes:

 * Free memory when stats fails to read the header of a file. (#1592; thanks
   to Mathias Schmitt)

 * Fixed empty field on unsupported aux tags in "mpileup --output-extra".
   Replaces the empty fields on unsupported aux tags with a '*'. (#1553;
   fixes #1544. Thanks to Adam Blanchet)

 * In mpileup, the --output-BP-5 and --output-BP options are no longer
   mutually exclusive.  This fixes the problem of output columns being
   switched. (#1540; fixes #1534.  Reported by Konstantin Riege)

 * Fix for hardclip bug in ampliconclip.  Odd length sequences resulted in
   random characters appearing in sequence. (#1538; fixes #1527. Reported by
   Ivana Mihalek)

Documentation:

 * Improved mpileup documentation. (#1566; fixes #1564.  Reported by Chang Y)

 * Fixed "samtools depth -J" documentation, which was reversed. (#1552; fixes
   #1549.  Reported by Stephan Hutter)

 * Numerous minor man page fixes. (#1528, #1536, #1579, #1590.  Thanks to
   John Marshall for some of these)

Non user-visible changes and build improvements:

 * Replace CentOS test build with Rocky Linux.  The CentOS Docker images that
   our test build depended on have stopped working.  Switched to Rocky Linux
   as the nearest available equivalent. (#1589)

 * Fix missing autotools on Appveyor.  Newer versions of msys2 removed
   autotools from their base-devel package.  This is putting them back.
   (#1575)

 * Fixed bug detected by clang-13 with -Wformat-security. (#1553)

 * Switch to using splaysort in bam_lpileup.  Improves speed and efficiency
   in "tview". (#1548; thanks to Adam Blanchet)

------------------------------------------------------------------------------
bcftools - changes v1.15
------------------------------------------------------------------------------

* New `bcftools head` subcommand for conveniently displaying the headers of a
  VCF or BCF file. Without any options, this is equivalent to `bcftools view
  --header-only --no-version` but more succinct and memorable.

* The `-T, --targets-file` option had the following bug originating in
  HTSlib code: when an uncompressed file with multiple columns CHR,POS,REF
  was provided, the REF would be interpreted as 0 gigabases (#1598)

Changes affecting specific commands:

* bcftools annotate

    - In addition to `--rename-annots`, which requires a file with name
      mappings, it is now possible to do the same on the command line
      `-c NEW_TAG:=OLD_TAG`

    - Add new option --min-overlap to specify the minimum required overlap
      of intersecting regions

    - Allow transferring ALT from VCF with or without replacement using
       bcftools annotate -a annots.vcf.gz -c ALT file.vcf.gz
       bcftools annotate -a annots.vcf.gz -c +ALT file.vcf.gz

* bcftools convert

    - Revamp of `--gensample`, `--hapsample` and `--haplegendsample` family
      of options which includes the following changes:

    - New `--3N6` option to output/input the new version of the .gen file
      format, see https://www.cog-genomics.org/plink/2.0/formats#gen

    - Deprecate the `--chrom` option in favor of `--3N6`. A simple `cut`
      command can be used to convert from the new 3*M+6 column format to the
      format printed with `--chrom` (`cut -d' ' -f1,3-`).

    - The CHROM:POS_REF_ALT IDs which are used to detect strand swaps are
      required and must appear either in the "SNP ID" column or the
      "rsID" column. The column is autodetected for `--gensample2vcf`,
      can be the first or the second for `--hapsample2vcf` (depending on
      whether the `--vcf-ids` option is given), must be the first for
      `--haplegendsample2vcf`.

* bcftools csq

    - Allow GFF files with phase column unset

* bcftools filter

    - New `--mask`, `--mask-file` and `--mask-overlap` options to soft filter
      variants in regions (#1635)

* bcftools +fixref

    - The `-m id` option now works also for non-dbSNP ids, i.e. not just
      `rsINT`

    - New `-m flip-all` mode for flipping all sites, including ambiguous A/T
      and C/G sites

* bcftools isec

    - Prevent segfault on sites filtered with -i/-e in all files (#1632)

* bcftools mpileup

    - More flexible read filtering using the options:
      --ls, --skip-all-set   ..  skip reads with all of the FLAG bits set
      --ns, --skip-any-set   ..  skip reads with any of the FLAG bits set
      --lu, --skip-all-unset ..  skip reads with all of the FLAG bits unset
      --nu, --skip-any-unset ..  skip reads with any of the FLAG bits unset

      The existing synonymous options will continue to function but their use is
      discouraged:
       --rf, --incl-flags  Required flags: skip reads with mask bits unset
       --ff, --excl-flags  Filter flags: skip reads with mask bits set

* bcftools query

    - Make the `--samples` and `--samples-file` options work also in the
      `--list-samples` mode. Add a new `--force-samples` option which allows
      the command to proceed even when some of the requested samples are not
      present in the VCF (#1631)

* bcftools +setGT

    - Fix a bug in `-t q -e EXPR` logic applied on FORMAT fields, sites with
      all samples failing the expression EXPR were incorrectly skipped. This
      problem affected only the use of `-e` logic, not the `-i` expressions
      (#1607)

* bcftools sort

    - Make use of the TMPDIR environment variable when defined

* bcftools +trio-dnm2

    - The --use-NAIVE mode now also adds the de novo allele in FORMAT/VA
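
The four bcftools mpileup read filters listed above can be expressed as
bitwise tests on the SAM FLAG field.  The sketch below is one reading of
the option descriptions, with a hypothetical skip_read helper; treating a
mask of 0 as "filter not in use" is an assumption of this example, and
bcftools itself may handle edge cases differently.

```python
def skip_read(flag, all_set=0, any_set=0, all_unset=0, any_unset=0):
    """Return True if a read with this FLAG would be skipped.

    Each keyword is a bit mask; 0 means the filter is not in use.
      all_set   : skip if every bit in the mask is set in flag
      any_set   : skip if at least one bit in the mask is set
      all_unset : skip if every bit in the mask is unset
      any_unset : skip if at least one bit in the mask is unset
    """
    if all_set and (flag & all_set) == all_set:
        return True
    if any_set and (flag & any_set) != 0:
        return True
    if all_unset and (flag & all_unset) == 0:
        return True
    if any_unset and (flag & any_unset) != any_unset:
        return True
    return False


if __name__ == "__main__":
    # SAM FLAG bits: 0x1 paired, 0x2 proper pair, 0x4 unmapped,
    # 0x100 secondary, 0x200 QC fail, 0x400 duplicate.
    bad_bits = 0x4 | 0x100 | 0x200 | 0x400
    print(skip_read(0x1 | 0x2, any_set=bad_bits))    # clean proper pair
    print(skip_read(0x1 | 0x400, any_set=bad_bits))  # marked duplicate
    print(skip_read(0x1, any_unset=0x1 | 0x2))       # paired, not proper
```

Writing the filters this way makes the difference between the "all" and
"any" variants explicit: "all" compares the masked bits with the whole mask,
while "any" only asks whether the masked bits differ from it.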



