Samtools (and HTSlib and BCFtools) version 1.16 is now available from
GitHub and SourceForge.

https://sourceforge.net/projects/samtools/
https://github.com/samtools/htslib/releases/tag/1.16
https://github.com/samtools/samtools/releases/tag/1.16
https://github.com/samtools/bcftools/releases/tag/1.16

The main changes are listed below:

------------------------------------------------------------------------------
htslib - changes v1.16
------------------------------------------------------------------------------

* Make hfile_s3 refresh AWS credentials on expiry in order to make HTSlib
  work better with AWS IAM credentials, which have a limited lifespan.
  (PR#1462 and PR#1474, addresses #344)

* Allow BAM headers between 2GB and 4GB in size once more.  This is not
  permitted in the BAM specification but was allowed in an earlier version
  of HTSlib.  There is now a warning at 2GB and a hard failure at 4GB.
  (PR#1421, fixes #1420 and samtools#1613. Reported by John Marshall and
  R C Mueller)

* Improve error message when failing to load an index. (PR#1468, example of
  the problem samtools#1637)

* Permit MM (base modification) tags containing "." and "?" suffixes.  These
  define implicit vs explicit coordinates.  See the SAM tags specification
  for details. (PR#1423 and PR#1426, fixes #1418.  PR#1469, fixes #1466.
  Reported by cjw85)

* Warn if spaces instead of tabs are detected in a VCF file to prevent
  confusion. (PR#1328, fixes bcftools#1575.  Reported by ketkijoshi278)

* Add an "sclen" filter expression keyword.  This is the total length of
  soft-clipping at both the left and right ends of a read.  It may be
  combined with qlen (qlen-sclen) to obtain the number of bases in the query
  sequence that have been aligned to the genome, i.e. it provides a way to
  compare local-alignment vs global-alignment length. (PR#1441 and
  PR/samtools#1661, fixes #1436.  Requested by Chang Y)
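
The relationship between qlen and sclen can be illustrated outside HTSlib.
The following is a minimal Python sketch, assuming the usual SAM rule that
the CIGAR operations M, I, S, = and X consume query bases; the function name
is hypothetical and this is not how HTSlib computes the values internally.

```python
import re

# CIGAR operations that consume query bases (per the SAM specification).
QUERY_OPS = set("MIS=X")

def qlen_and_sclen(cigar):
    """Return (qlen, sclen) for a CIGAR string such as '5S90M5S'."""
    qlen = sclen = 0
    for length, op in re.findall(r"(\d+)([MIDNSHP=X])", cigar):
        n = int(length)
        if op in QUERY_OPS:
            qlen += n
        if op == "S":
            sclen += n   # soft-clips at either end count towards sclen
    return qlen, sclen

qlen, sclen = qlen_and_sclen("5S90M5S")
aligned = qlen - sclen   # query bases that are not soft-clipped
```

With samtools, an expression such as `qlen-sclen >= 50` could be passed to
`samtools view -e` to keep only reads with at least 50 aligned query bases.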

* Improve error messages for CRAM reference mismatches.  If the user
  specifies the wrong reference, the CRAM slice header MD5sum checks
  fail.  We now report the SQ line M5 string too so it is possible to
  validate against the whole chr in the ref.fa file.  The error message
  has also been improved to report the reference name instead of #num.
  Finally, we now hint at the likely cause, which counters the misleading
  samtools-supplied error of "truncated or corrupt" file. (PR#1427, fixes
  samtools#1640.  Reported by Jian-Guo Zhou)

* Expose more of the CRAM API and add new functionality to extract the
  reference from a CRAM file. (PR#1429 and PR#1442)

* Improvements to the implementation of embedded references in CRAM where no
  external reference is specified. (PR#1449, addresses some of the issues in
  #1445)

* The CRAM writer now allows alignment records with RG:Z: aux tags that
  don't have a corresponding @RG ID in the file header.  Previously these
  tags would have been silently dropped.  HTSlib will complain whenever
  it has to add one though, as such tags do not conform to recommended
  practice for the SAM, BAM and CRAM formats. (PR#1480, fixes #1479.
  Reported by Alex Leonard)

* Set tab delimiter in man page for tabix GFF3 sort. (PR#1457.  Thanks to
  Colin Diesh)

* When using libdeflate, the 1...9 scale of BGZF compression levels is now
  remapped to the 1...12 range used by libdeflate instead of being passed
  directly.  In particular, HTSlib levels 8 and 9 now map to libdeflate
  levels 10 and 12, so it is possible to select the highest (but slowest)
  compression offered by libdeflate. (PR#1488, fixes #1477.  Reported by
  Gert Hulselmans)
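
A remapping of this kind can be pictured as a small lookup table.  The
Python sketch below is illustrative only: the entries for levels 8 and 9
(10 and 12) come from the note above, while every other entry and the
function name are assumptions made for the example and should be checked
against the HTSlib source before being relied upon.

```python
# Illustrative remapping table indexed by HTSlib level 0..9.
# Only the entries for levels 8 and 9 (10 and 12) come from the release note;
# the remaining values are assumptions made for this sketch.
LEVEL_MAP = [0, 1, 2, 3, 5, 6, 7, 8, 10, 12]

def to_libdeflate_level(level):
    """Map a 0..9 level onto the wider libdeflate scale, clamping the input."""
    level = max(0, min(level, 9))
    return LEVEL_MAP[level]
```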

* The VCF variant API has been extended so that it can return separate flags
  for INS and DEL variants as well as the existing INDEL one.  These flags
  have not been added to the old bcf_get_variant_types() interface as it
  could break existing users.  To access them, it is necessary to use new
  functions bcf_has_variant_type() and bcf_has_variant_types(). (PR#1467)

* The missing, but trivial, `le_to_u8()` function has been added to
  hts_endian. (PR#1494, Thanks to John Marshall)

* bcf_format_gt() now works properly on big-endian platforms. (PR#1495,
  Thanks to John Marshall)

Build changes
-------------

These are compiler, configuration and makefile-based changes.

* Update htscodecs to version 1.3.0 for new SIMD code + various fixes.
  Updates the htscodecs submodule and adds changes necessary to make
  HTSlib build the new SIMD codec implementations. (PR#1438, PR#1489,
  PR#1500)

* Fix clang builds under mingw.  Under mingw, clang requires dllexport to be
  applied to both function declarations and function definitions. (PR#1435,
  PR#1497, PR#1498, fixes #1433.  Reported by teepean)

* Fix curl type warning with gcc 12.1 on Windows. (PR#1443)

* Detect ARM Neon support and only build appropriate SIMD object files.
  (PR#1451, fixes #1450.  Thanks to John Marshall)

* `make print-config` now reports extra CFLAGS that are needed to build
  the SIMD parts of htscodecs.  These may be of use to third-party build
  systems that don't use HTSlib's or htscodecs' build infrastructure.
  (PR#1485. Thanks to John Marshall)

* Fixed some Makefile dependency issues for the "check"/"test" targets and
  plugins.  In particular, "make check" will now build the "all" target, if
  not done already, before running the tests. (PR#1496)

Bug fixes
---------

* Fix bug when reading position -1 in BCF (0 in VCF), which is used to
  indicate telomeric regions.  The BCF reader was incorrectly assuming the
  value stored in the file was unsigned, so a VCF->BCF->VCF round-trip would
  change it from 0 to 4294967296. (PR#1476, fixes #1475 and bcftools#1753.
  Reported by Rodrigo Martin)
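
The arithmetic behind the bad value can be reproduced in a few lines.  This
Python sketch assumes the position is stored as a 32-bit little-endian
integer, as in the BCF2 layout, and simply contrasts the signed and unsigned
readings of the same four bytes; it is not HTSlib code.

```python
import struct

# BCF stores 0-based positions, so VCF POS 0 is written as -1.
raw = struct.pack("<i", -1)              # 32-bit little-endian signed value

signed = struct.unpack("<i", raw)[0]     # correct interpretation: -1
unsigned = struct.unpack("<I", raw)[0]   # buggy interpretation: 4294967295

vcf_pos_ok = signed + 1                  # 0, as originally written
vcf_pos_bad = unsigned + 1               # 4294967296, the value seen in the bug
```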

* Various bugs and quirks have been fixed in the filter expression engine,
  mostly related to the handling of absent tags, and the is_true flag. Note
  that as a result of these fixes, some filter expressions may give different
  results:

  - Fixed and-expressions including aux tag values which could give an
    invalid true result depending on the order of terms.

  - The expression `![NM]` is now true only if `NM` does not exist.  In
    earlier versions it would also report true for tags like `NM:i:0` which
    exist but have a value of zero.

  - The expression `[X1] != 0` is now false when `X1` does not exist.
    Earlier versions would return true for this comparison when the tag
    was missing.

  - NULL values due to missing tags now propagate through string, bitwise and
    mathematical operations.  Logical operations always treat them as false.
    (PR#1463, fixes samtools#1670.  Reported by Gert Hulselmans; PR#1478,
    fixes samtools#1677.  Reported by johnsonzcode)
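
The rules listed above can be modelled with a toy evaluator.  In this Python
sketch, None stands for the NULL produced by a missing tag, and all function
names are hypothetical; it mirrors the behaviour described in this note, not
the actual HTSlib expression engine.

```python
# Toy model of the rules listed above; None stands for a missing tag (NULL).

def tag(record, name):
    """Look up an aux tag; returns None when the tag is absent."""
    return record.get(name)

def not_tag(value):
    """Models ![TAG]: true only when the tag does not exist."""
    return value is None

def not_equal(a, b):
    """Models [TAG] != x: false when either side is NULL."""
    if a is None or b is None:
        return False
    return a != b

def add(a, b):
    """Arithmetic propagates NULL instead of inventing a value."""
    if a is None or b is None:
        return None
    return a + b

def logical_or(a, b):
    """Logical operators treat NULL operands as false."""
    return bool(a) or bool(b)   # bool(None) is False

rec = {"NM": 0}   # NM:i:0 is present, X1 is absent
```

Here `not_tag(tag(rec, "NM"))` is false because the tag exists even though
its value is zero, while `not_equal(tag(rec, "X1"), 0)` is false because the
tag is missing.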

* Fix buffer overrun in bam_plp_insertion_mod.  Memory now grows to the
  proper size needed for base modification data. (PR#1430, fixes
  samtools#1652.  Reported by hd2326)

* Remove limit of returned size from fai_retrieve(). (PR#1446, fixes
  samtools#1660.  Reported by Shane McCarthy)

* Cap hts_getline() return value at INT_MAX.  This prevents hts_getline()
  from returning a negative number (which signals failure) when a line is
  longer than INT_MAX. (PR#1448.  Thanks to John Marshall)
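
The effect of the cap can be shown with plain integer arithmetic.  This
Python sketch assumes a 32-bit C int and uses ctypes only to mimic the
wrap-around that occurs when a large length is stored in one; the variable
names are illustrative.

```python
import ctypes

INT_MAX = 2**31 - 1          # assuming a 32-bit C int

long_length = 2**31 + 10     # a line longer than INT_MAX bytes

# Truncating the length to a 32-bit int wraps to a negative number,
# which a caller would interpret as a failure.
wrapped = ctypes.c_int32(long_length).value

# Capping at INT_MAX keeps the return value non-negative.
capped = min(long_length, INT_MAX)
```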

* Fix breakend detection and test bcf_set_variant_type(). (PR#1456, fixes
  #1455.  Thanks to Martin Pollard)

* Prevent arrays of BCF_BT_NULL values found in BCF files from causing
  bcf_fmt_array() to call exit() as the type is unsupported.  These are
  now tested for and caught by bcf_record_check(), which returns an error
  code instead.  (PR#1486)

* Improved detection of fasta and fastq files that have very long
  comments following identifiers.  (PR#1491, thanks to John Marshall.
  Fixes samtools/samtools#1689, reported by cjw85)

* Fixed a SEGV triggered by giving a SAM file to `samtools import`. (PR#1492)

------------------------------------------------------------------------------
samtools - changes v1.16
------------------------------------------------------------------------------

New work and changes:

 * samtools reference command added.  This subcommand extracts the embedded
   reference out of a CRAM file. (PR#1649, addresses #723.  Requested by
   Torsten Seemann)

 * samtools import now adds grouped by query-name to the header. (PR#1633,
   thanks to Nils Homer)

 * Made samtools view read error messages more generic.  The former error
   message would claim that there was a "truncated file or corrupt BAM index file"
   with no real justification.  Also reset errno in stream_view which could
   lead to confusing error messages. (PR#1645, addresses some of the issues
   in #1640.  Reported by Jian-Guo Zhou)

 * Make samtools view -p also clear mqual, tlen and cigar. (PR#1647, fixes
   #1606.  Reported by eboyden)

 * Add bedcov option -c to report read count. (PR#1644, fixes #1629.
   Reported by Natchaphon Rajudom)

 * Add UMI/barcode handling to samtools markdup. (PR#1630, fixes #1358 and
   #1514.  Reported by Gert Hulselmans and Poshi)

 * Add a new template coordinate sort order to samtools sort and samtools
   merge.  This is useful when working with unique molecular identifiers
   (UMIs). (PR#1605, fixes #1591.  Thanks to Nils Homer)

 * Rename mpileup --ignore-overlaps to --ignore-overlaps-removal or
   --disable-overlap-removal.  The previous name was ambiguous and was
   often read as an option to enable removal of overlapping bases, while
   in reality this is on by default and the option turns off the ability
   to remove overlapping bases. (PR#1666, fixes #1663.  Reported by
   yangdingyangding)

 * The dict command can now read BWA's .alt file and add AH:* tags indicating
   reference sequences that represent alternate loci. (PR#1676.  Thanks to
   John Marshall)

 * The "samtools index" command can now accept multiple alignment filenames
   with the new -M option, and will index each of them separately.
   (Specifying the output index filename via out.index or the new -o option
   is currently only applicable when there is only one alignment file to be
   indexed.) (PR#1674.  Reported by Abigail Ramsøe and Nicola Romanò.
   Thanks to John Marshall)

 * Allow samtools fastq -T "*". This allows all tags from SAM records to be
   written to fastq headers. This is a counterpart to samtools import -T "*".
   (PR#1679.  Thanks to cjw85)

Bug Fixes:

 * Re-enable --reference option for samtools depth.  The reference is not
   used but this makes the command line usage compatible with older releases.
   (PR#1646, fixes #1643.  Reported by Randy Harr)

 * Fix regex coordinate bug in samtools markdup. (PR#1657, fixes #1642.
   Reported by Randy Harr)

 * Fix divide by zero in plot-bamstats -m, on unmapped data. (PR#1678, fixes
   #1675.  Thanks to Shane McCarthy)

 * Fix missing RG headers when using samtools merge -r. (PR#1683, addresses
   htslib#1479.  Reported by Alex Leonard)

 * Fix a possible unaligned access in samtools reference. (PR#1696)

Documentation:

 * Add documentation on CRAM compression profiles and some of the newer
   options that appear in CRAM 3.1 and above. (PR#1659, fixes #1656.
   Reported by Matthias De Smet)

 * Add "sclen" filter expression keyword documentation. (PR#1661, see also
   htslib#1441)

 * Extend FILTER EXPRESSION man page section to match the changes made in
   HTSlib. (PR#1687, samtools/htslib#1478)

Non user-visible changes and build improvements:

 * Ensure generated test files are ignored (by git) and cleaned (by make
   testclean) (PR#1692, Thanks to John Marshall)

------------------------------------------------------------------------------
bcftools - changes v1.16
------------------------------------------------------------------------------

* New plugin `bcftools +variant-distance` to annotate records with distance
  to the nearest variant (#1690)

Changes affecting the whole of bcftools, or multiple commands:

* The -i/-e filtering expressions

    - Added support for querying of multiple filters, for example `-i
      'FILTER="A;B"'` can be used to select sites with two filters "A"
      and "B" set. See the documentation for more examples.

    - Added modulo arithmetic operator

Changes affecting specific commands:

* bcftools annotate

    - Fixed a bug introduced in 1.14 which caused records with an INFO/END
      annotation to incorrectly trigger the `-c ~INFO/END` mode of
      comparison even when not explicitly requested, with the result that
      the annotation was not transferred from a tab-delimited file (#1733)

* bcftools merge

    - New `-m snp-ins-del` switch to merge SNVs, insertions and deletions
      separately (#1704)

* bcftools mpileup

    - New NMBZ annotation for Mann-Whitney U-z test on number of mismatches
      within supporting reads

    - Suppress the output of MQSBZ and FS annotations in absence of alternate
      allele

* bcftools +scatter

    - Fix erroneous addition of duplicate PG lines

* bcftools +setGT

    - Custom genotypes (e.g. `-n c:1/1`) now correctly override ploidy



