Samtools (and HTSlib and BCFtools) version 1.19 is now available from
GitHub and SourceForge.

https://github.com/samtools/htslib/releases/tag/1.19
https://github.com/samtools/samtools/releases/tag/1.19
https://github.com/samtools/bcftools/releases/tag/1.19
https://sourceforge.net/projects/samtools/

The main changes are listed below:

------------------------------------------------------------------------------
htslib - changes v1.19
------------------------------------------------------------------------------

Updates
-------

* A temporary work-around has been put in the VCF parser so that it is less
  likely to fail on rows with a large number of ALT alleles, where Number=G
  tags like PL can expand beyond the 2Gb limit enforced by HTSlib.  For now,
  where this happens the offending tag will be dropped so the data can be
  processed, albeit without the likelihood data.

  In future work, the library will instead convert such tags into their
  local alternatives (see https://github.com/samtools/hts-specs/pull/434).
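
As background to the item above, the arithmetic below shows why Number=G
tags grow so quickly.  For diploid calls the VCF specification defines one
value per possible genotype, which is n(n+1)/2 values for n alleles including
REF.  This is only a sketch: the 4-byte value size and the sample count are
illustrative assumptions, and it does not reproduce how HTSlib itself
measures a record against its limit.

```python
def diploid_genotype_count(n_alt):
    """Number of diploid genotypes for REF plus n_alt ALT alleles."""
    n_alleles = n_alt + 1
    return n_alleles * (n_alleles + 1) // 2


def pl_bytes(n_alt, n_samples, bytes_per_value=4):
    """Rough size of one PL field: one value per genotype per sample.

    bytes_per_value=4 is an assumption made for illustration only.
    """
    return diploid_genotype_count(n_alt) * n_samples * bytes_per_value


if __name__ == "__main__":
    # The genotype count grows quadratically with the number of ALT alleles.
    for n_alt in (1, 10, 100, 1000):
        print(n_alt, diploid_genotype_count(n_alt), pl_bytes(n_alt, n_samples=1000))
```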

* New program: annot-tsv, which annotates regions in a destination file
  with text from overlapping regions in a source file. (PR#1619)

* Change bam_parse_cigar() so that it can modify existing BAM records.  This
  makes it more useful as a public API.  Previously it could only handle
  partially formed BAM records. (PR#1651, fixes #1650. Reported by Oleksii
  Nikolaienko)

* Add "uncompressed" to hts_format_description() where appropriate.  This
  adds an "uncompressed" description to uncompressed files that would
  normally be compressed, such as BAM and BCF. (PR#1656, in relation to
  samtools#1884.  Thanks to John Marshall)

* Speed up the VCF parser and writer. (PR#1644 and PR#1663)

* Add an hclen (hard clip length) SAM filter function. (PR#1660, with
  reference to samtools#813)

* Avoid really closing stdin/stdout in hclose()/hts_close()/et al. See
  discussion in PR for details. (PR#1665.  Thanks to John Marshall)

* Add support to handle multiple files in bgzip. (PR#1658, fixes #1642.
  Requested by bw2)

* Enable auto-vectorisation in CRAM 3.1 codecs.  Speeds decoding on some
  sequencing platform data. (PR#1669)

* Speed up removal of lines in large headers. (PR#1662, fixes #1460.
  Reported by Anže Starič)

* Apply seqtk PR to improve kseq.h parsing performance.  Port of
  Fabian Klötzl's (kloetzl) lh3/seqtk#123 and attractivechaos/klib#173
  to HTSlib. (PR#1674.  Thanks to John Marshall)

Build changes
-------------

* Updated htscodecs submodule to 1.6.0. (PR#1685, PR#1717, PR#1719)

* Apply the packed attribute to uint*_u types for Clang to prevent
  -fsanitize=alignment failures. (PR#1667.  Thanks to Fangrui Song)

* Fuzz testing improvements. (PR#1664)

* Add C++ casts for external headers in klist.h and kseq.h. (PR#1683.  See
  also PR#1674 and PR#1682)

* Add test case compiling the public headers as C++. (PR#1682.  Thanks to
  John Marshall)

* Enable optimisation level -O3 for SAM QUAL+33 formatting. (PR#1679)

* Make compiler flag detection work with zig cc. (PR#1687)

* Fix unused value warnings when built with NDEBUG. (PR#1688)

* Remove some disused Makefile variables, fix typos and a warning.  Improve
  bam_parse_basemod() documentation. (PR#1705.  Thanks to John Marshall)

Bug fixes
---------

* Fail bgzf_useek() when offset is above block limits. (PR#1668)

* Fix multi-threaded on-the-fly indexing problems. (PR#1672, fixes
  samtools#1861 and bcftools#1985.  Reported by Mark Ebbert and lacek)

* Fix hfile_libcurl small seek bug. (PR#1676, fixes samtools#1918.  Also may
  fix #1037, #1625 and samtools#1622. Reported by Alex Reynolds, Mark Walker,
  Arthur Gilly and skatragadda-nygc. Thanks to John Marshall)

* Fix a minor memory leak in malformed CRAM EXTERNAL blocks. [fuzz] (PR#1671)

* Fix a cram decode hang from block_resize(). (PR#1680. Reported by
  Sebastian Deorowicz)

* Cram fuzzing improvements.  Fixes a number of cram errors. (PR#1701,
  fixes #1691, #1692, #1693, #1696, #1697, #1698, #1699 and #1700. Thanks
  to Octavio Galland for finding and reporting all these)

* Fix crypt4gh redirection. (PR#1675, fixes grbot/crypt4gh-tutorial#2.
  Reported by hth4)

* Fix PG header linking when records make a loop. (PR#1702, fixes #1694.
  Reported by Octavio Galland)
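
For readers unfamiliar with the problem in the item above: @PG header lines
are chained through their PP (previous program) tags, and a malformed header
can make that chain circular.  The sketch below shows one way to detect such
a loop.  It is not HTSlib's code; the function name and the dictionary
representation of the header are hypothetical.

```python
from typing import Dict, List, Optional


def find_pg_loop(pp_of: Dict[str, Optional[str]]) -> Optional[List[str]]:
    """Return the @PG IDs forming a PP loop, or None if every chain ends.

    pp_of maps each @PG ID to the ID named in its PP tag (None if absent).
    """
    for start in pp_of:
        seen: List[str] = []
        current: Optional[str] = start
        # Follow PP links until the chain ends or revisits an ID.
        while current is not None and current in pp_of:
            if current in seen:
                return seen[seen.index(current):]
            seen.append(current)
            current = pp_of[current]
    return None


if __name__ == "__main__":
    print(find_pg_loop({"bwa": None, "samtools": "bwa"}))
    print(find_pg_loop({"a": "b", "b": "a"}))
```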

* Prevent issues with no-stored-sequence records in CRAM files, by ensuring
  they are accounted for properly in block size calculations, and by limiting
  the maximum query length in the CIGAR data.  Originally seen as an overflow
  by OSS-Fuzz / UBSAN, it turned out this could lead to excessive time and
  memory use by HTSlib, and could result in it writing out unreadable CRAM
  files. (PR#1710)

* Fix some illegal shifts and integer overflows found by OSS-Fuzz / UBSAN.
  (PR#1707, PR#1712, PR#1713)

------------------------------------------------------------------------------
samtools - changes v1.19
------------------------------------------------------------------------------

New work and changes:

* Samtools coverage: add a new --plot-depth option to draw depth (of
  coverage) rather than the percentage of bases covered. (PR #1910.
  Thanks to Pierre Lindenbaum)

* Samtools merge / sort: add a lexicographical name-sort option via the -N
  option.  The "natural" alpha-numeric sort is still available via -n.
  (PR #1900, fixes #1500.  Reported by Steve Huang)
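
The difference between the two name orderings in the item above can be
illustrated in a few lines.  This is a sketch only, not the comparator
samtools uses: it treats runs of ASCII digits as numbers for the "natural"
order and uses plain string comparison for the lexicographical order.  The
read names are made up.

```python
import re


def natural_key(name):
    """Sort key in which runs of digits compare as numbers.

    re.split with a capturing group alternates text and digit runs, so the
    items at odd positions are always digits and can be converted to int.
    """
    parts = re.split(r"([0-9]+)", name)
    return [int(p) if i % 2 else p for i, p in enumerate(parts)]


if __name__ == "__main__":
    names = ["read10", "read2", "read1"]
    print(sorted(names))                   # lexicographical order
    print(sorted(names, key=natural_key))  # "natural" alpha-numeric order
```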

* Samtools view: add -N ^NAME_FILE and -R ^RG_FILE options.  The standard -N
  and -R options only output reads matching a specified list of names or
  read-groups.  With a caret (^) prefix these may be negated to only output
  reads not matching the specified files. (PR #1896, fixes #1895.  Suggested
  by Feng Tian)
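
The caret-prefix behaviour in the item above amounts to a negated set
membership test.  The following is a minimal sketch of that logic, not the
samtools implementation; both function names are hypothetical.

```python
def parse_list_option(arg):
    """Split an option value into (negate, path); a leading '^' negates."""
    if arg.startswith("^"):
        return True, arg[1:]
    return False, arg


def keep_read(name, listed, negate):
    """Keep reads found in the list, or those absent from it when negated."""
    return (name in listed) != negate


if __name__ == "__main__":
    negate, path = parse_list_option("^names.txt")
    listed = {"r1", "r3"}  # stands in for the contents of the file at `path`
    print([r for r in ("r1", "r2", "r3") if keep_read(r, listed, negate)])
```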

* Cope with HTSlib's change to no longer close stdout on hts_close. HTSlib
  companion PR is samtools/htslib#1665. (PR #1909.  Thanks to John Marshall)

* Plot-bamstats: add a new plot of the read lengths ("RL") from samtools
  stats output. (PR #1922, fixes #1824.  Thanks to @erboone, suggested by
  Martin Pollard)

* Samtools split: support splitting files based on the contents of auxiliary
  tags.  Also adds a -M option to limit the number of files split can make,
  to avoid accidental resource over-allocation, and fixes some issues with
  --write-index. (PR #1222, PR #1933, fixes #1758.  Thanks to Valeriu Ohan,
  suggested by Scott Norton)

Bug Fixes:

* Samtools stats: empty barcode tags are now treated as having no barcode.
  (PR #1929, fixes #1926.  Reported by Jukka Matilainen)

* Samtools cat: add support for non-seekable streams.  The file format
  detection blocked pipes from working before, but now files may be
  non-seekable such as stdin or a pipe. (PR #1930, fixes #1731.  Reported
  by Julian Hess)

* Samtools mpileup -aa (absolutely all positions) now produces an output even
  when given an empty input file. (PR #1939.  Reported by Chang Y)

* Samtools markdup: speed up optical duplicate tagging on regions with very
  deep data. (PR #1952)

Documentation:

* Samtools mpileup: add more usage examples to the man page. (PR #1913,
  fixes #1801)

* Samtools fastq: explicitly document the order that filters apply.
  (PR #1907)

* Samtools merge: fix example output to use an uppercase RG PL field.
  (PR #1917.  Thanks to John Marshall.  Reported by Michael Macias)

* Add hclen SAM filter documentation. (PR #1902.  See also
  samtools/htslib#1660)

* Samtools consensus: remove the -5 option from documentation.  This option
  was renamed before the consensus subcommand was merged, but accidentally
  left in the man page. (PR #1924)

------------------------------------------------------------------------------
bcftools - changes v1.19
------------------------------------------------------------------------------

Changes affecting the whole of bcftools, or multiple commands:

* Filtering expressions can be given a file with a list of strings to match;
  this was previously possible only for the ID column. For example

  ID=@file            .. selects lines with ID present in the file
  INFO/TAG=@file.txt  .. selects lines where TAG has a string value listed
                         in the file
  INFO/TAG!=@file.txt .. TAG must not have a string value
                         listed in the file

* Allow the REF and ALT columns to be queried directly, for example

  -e 'REF="N"'

Changes affecting specific commands:

* bcftools annotate

    - Fix `bcftools annotate --mark-sites`, VCF sites overlapping regions in
      a BED file were not annotated (#1989)

    - Add flexibility to FILTER column transfers and allow transfers within
      the same file, across files, and in combination. For examples see
      http://samtools.github.io/bcftools/howtos/annotate.html#transfer_filter_to_info

* bcftools call

    - Output MIN_DP rather than MinDP in gVCF mode

    - New `-*, --keep-unseen-allele` option to output the unobserved
      allele <*>, intended for gVCF.

* bcftools head

    - New `-s, --samples` option to include the #CHROM header line with
      samples.

* bcftools gtcheck

    - Add output options `-o, --output` and `-O, --output-type`

    - Add filtering options `-i, --include` and `-e, --exclude`

    - Rename the short option `-e, --error-probability` from lower case to
      upper case `-E, --error-probability`

    - Change the output format, replacing the DC section with DCv2:

        - adds a new column for the number of matching genotypes

        - The --error-probability is now interpreted as the probability of
          an erroneous allele rather than an erroneous genotype. In other
          words, the calculation of the discordance score now considers the
          probability of genotyping error to be different for HOM and HET
          genotypes, i.e. P(0/1|dsg=0) > P(1/1|dsg=0).

        - fixes in HWE score calculation plus output average HWE score rather
          than absolute HWE score

        - better description of fields
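
One simple model that is consistent with the inequality
P(0/1|dsg=0) > P(1/1|dsg=0) given above for gtcheck is to let each of the
two alleles be miscalled independently with probability e.  This is an
assumption made purely for illustration; the formula bcftools actually uses
is not stated here and may differ.

```python
def observed_genotype_probs(e):
    """P(observed genotype | true dosage 0) under independent allele errors.

    Assumes each of the two alleles is wrong with probability e,
    independently of the other (an illustrative model only).
    """
    return {
        "0/0": (1 - e) ** 2,    # neither allele miscalled
        "0/1": 2 * e * (1 - e), # exactly one allele miscalled
        "1/1": e ** 2,          # both alleles miscalled
    }


if __name__ == "__main__":
    for genotype, prob in observed_genotype_probs(0.01).items():
        print(genotype, prob)
```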

* bcftools merge

    - Add `-m` modifiers to suppress the output of the unseen allele <*> or
      <NON_REF> at variant sites (e.g. `-m both,*`) or all sites
      (e.g. `-m both,**`)

* bcftools mpileup

    - Output MIN_DP rather than MinDP in gVCF mode

* bcftools norm

    - Add the number of joined lines to the summary output, for example

      Lines   total/split/joined/realigned/skipped:  6/0/3/0/0

    - Allow combining -m and -a with --old-rec-tag (#2020)

    - Symbolic <DEL> alleles caused norm to expand REF to the full length of
      the deletion. This was not intended and is problematic for long
      deletions; the REF allele should list one base only (#2029)
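
For scripts that consume the norm summary line shown above, a parser along
the following lines may be useful.  It is a sketch that assumes the line has
exactly the layout of that example; the function name is hypothetical.

```python
import re

# Matches e.g. "Lines   total/split/joined/realigned/skipped:  6/0/3/0/0"
SUMMARY_RE = re.compile(r"^Lines\s+([a-z/]+):\s+([0-9/]+)$")


def parse_norm_summary(line):
    """Return a dict mapping each counter name to its integer value."""
    match = SUMMARY_RE.match(line.strip())
    if match is None:
        raise ValueError("not a summary line: %r" % line)
    names = match.group(1).split("/")
    values = [int(v) for v in match.group(2).split("/")]
    if len(names) != len(values):
        raise ValueError("name/value count mismatch: %r" % line)
    return dict(zip(names, values))


if __name__ == "__main__":
    line = "Lines   total/split/joined/realigned/skipped:  6/0/3/0/0"
    print(parse_norm_summary(line)["joined"])
```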

* bcftools query

    - Add new `-N, --disable-automatic-newline` option to restore the
      pre-1.18 query formatting behavior, in which a missing newline was not
      added

    - Make the automatic addition of the newline character more predictable:
      when missing, it is always put at the end of the expression. In
      version 1.18 it could be added at the end of the expression (for
      per-site expressions) or inside the square brackets (for per-sample
      expressions). The new behavior is:

        - if the formatting expression contains a newline character, do
          nothing

        - if there is no newline character and -N,
          --disable-automatic-newline is given, do nothing

        - if there is no newline character and -N is not given, insert
          newline at the end of the expression

      See #1969 for details

    - Add new `-F, --print-filtered` option to output a default string for
      samples that would otherwise be filtered by `-i/-e` expressions.

    - Include sample name in the output header with `-H` whenever it makes
      sense (#1992)
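
The three newline rules listed above for bcftools query can be sketched as a
small function.  This is not the bcftools source: the function name is
hypothetical, and the format string is assumed to have had its escape
sequences already converted to real characters.

```python
def finalize_format(fmt, disable_automatic_newline=False):
    """Apply the newline rules to an already-unescaped format string."""
    if "\n" in fmt:
        return fmt  # rule 1: a newline is already present, do nothing
    if disable_automatic_newline:
        return fmt  # rule 2: -N given, do nothing
    return fmt + "\n"  # rule 3: append at the end of the expression


if __name__ == "__main__":
    print(repr(finalize_format("%CHROM\t%POS")))
    print(repr(finalize_format("[ %GT]")))
    print(repr(finalize_format("%POS", disable_automatic_newline=True)))
```

Note that for a per-sample expression such as "[ %GT]" the newline goes after
the closing bracket, not inside it.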

* bcftools +split-vep

    - Fix on the fly filtering involving numeric subfields, e.g. `-i
      'MAX_AF<0.001'` (#2039)

    - Interpret default column type names (--columns-types) as entire
      strings, rather than substrings to avoid unexpected spurious
      matches (i.e. internally add ^ and $ to all field names)
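
The effect of anchoring the column type names, as described above, can be
seen with a short regular-expression example.  This only illustrates
substring versus whole-string matching; the pattern "AF" is a made-up
example, and the helper names are hypothetical rather than taken from the
plugin.

```python
import re


def matches_substring(pattern, field):
    """Unanchored match: the pattern may occur anywhere in the field name."""
    return re.search(pattern, field) is not None


def matches_whole(pattern, field):
    """Anchored match: ^ and $ are added so the whole name must match."""
    return re.search("^" + pattern + "$", field) is not None


if __name__ == "__main__":
    for field in ("AF", "MAX_AF"):
        print(field, matches_substring("AF", field), matches_whole("AF", field))
```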

* bcftools +trio-dnm2

    - Do not flag paternal genotyping errors as de novo mutations.
      Specifically, when the father's chrX genotype is 0/1 and the mother's
      is 0/0, a 0/1 genotype in the child will not be marked as a DNM.

* bcftools view

    - Add new `-A, --trim-unseen-allele` option to remove the unseen allele
      <*> or <NON_REF> at variant sites (`-A`) or all sites (`-AA`)
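
A rough sketch of the `-A` / `-AA` distinction described above, applied to
the ALT column only.  It assumes that a "variant site" is one with at least
one ALT allele other than the unseen allele; that reading, and the function
name, are assumptions.  Updating per-allele INFO and FORMAT values, which a
real tool would also need to do, is not shown.

```python
UNSEEN = {"<*>", "<NON_REF>"}


def trim_unseen(alts, all_sites=False):
    """Return the ALT list with the unseen allele removed where applicable.

    all_sites=False models -A (variant sites only); True models -AA.
    """
    real = [a for a in alts if a not in UNSEEN]
    if real or all_sites:
        return real
    return list(alts)  # reference-only site under -A: leave untouched


if __name__ == "__main__":
    print(trim_unseen(["T", "<*>"]))
    print(trim_unseen(["<*>"]))
    print(trim_unseen(["<*>"], all_sites=True))
```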



--
The Wellcome Sanger Institute is operated by Genome Research Limited, a charity registered in England with number 1021457 and a company registered in England with number 2742969, whose registered office is Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, CB10 1SA.
_______________________________________________
Samtools-help mailing list
Samtools-help@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/samtools-help
