Samtools (and HTSlib and BCFtools) version 1.13 is now available from
GitHub and SourceForge.

https://sourceforge.net/projects/samtools/
https://github.com/samtools/htslib/releases/tag/1.13
https://github.com/samtools/samtools/releases/tag/1.13
https://github.com/samtools/bcftools/releases/tag/1.13

The main changes are listed below:


------------------------------------------------------------------------------
htslib - changes v1.13
------------------------------------------------------------------------------

Features and Updates
--------------------

* In case a PG header line has multiple ID tags supplied by other
  applications, the header API now selects the first one encountered as
  the identifying tag and issues a warning when detecting subsequent ID
  tags. (#1256; fixes samtools/samtools#1393)

* VCF header reading function (vcf_hdr_read) no longer tries to download a
  remote index file by default. (#1266; fixes #380)

* Support reading and writing FASTQ format in the same way as SAM, BAM or
  CRAM. Records read from a FASTQ file will be treated as unmapped data.
  (#1156)

* Added GCP requester pays bucket access.  Thanks to @indraniel. (#1255)

* Made mpileup's overlap removal choose which copy to remove at random
  instead of always removing the second one.  This avoids strand bias in
  experiments where the +ve and -ve strand reads always appear in the same
  order. (#1273; fixes samtools/bcftools#1459)

* It is now possible to use platform specific BAQ parameters.  This also
  selects long-read parameters for read lengths bigger than 1kb, which helps
  bcftools mpileup call SNPs on PacBio CCS reads. (#1275)

* Improved bcf_remove_allele_set.  This fixes a bug that stopped iteration
  over alleles prematurely, marks removed alleles as 'missing' and does
  automatic lazy unpacking. (#1288; fixes #1259)

* Improved compression metrics for unsorted CRAM files.  This improves the
  choice of codecs when handling unsorted data. (#1291)

* Linear index entries for empty intervals are now initialised with the file
  offset in the next non-empty interval instead of the previous one.  This
  may reduce the amount of data iterators have to discard before reaching the
  desired region, when the starting location is in a sequence gap. Thanks to
  @carsonh for reporting the issue. (#1286; fixes #486)

* A new hts_bin_level API function has been added, to compute the level of a
  given bin in the binning index. (#1286)

* Related to the above, a new API method, hts_idx_nseq, now returns the total
  number of contigs from an index. (#1295 and #1299)

* Added bracket handling to bcf_hdr_parse_line, for use with ##META lines.
  Thanks to Alberto Casas Ortiz. (#1240)
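
The linear index change listed above (empty intervals now take the file
offset of the next non-empty interval rather than the previous one) can be
pictured with a small Python sketch. This is not HTSlib code: the function
name `backfill_linear_index`, the use of `None` to mark an empty interval,
and the choice to leave trailing empty intervals unset are all assumptions
made for illustration.

```python
from typing import List, Optional


def backfill_linear_index(offsets: List[Optional[int]]) -> List[Optional[int]]:
    """Fill empty (None) intervals with the offset of the next non-empty one.

    Hypothetical helper for illustration; not part of the HTSlib API.
    """
    filled = list(offsets)
    next_offset: Optional[int] = None
    # Walk backwards so each empty slot sees the nearest offset to its right.
    for i in range(len(filled) - 1, -1, -1):
        if filled[i] is None:
            # Assumption: stays None when no non-empty interval follows.
            filled[i] = next_offset
        else:
            next_offset = filled[i]
    return filled


if __name__ == "__main__":
    # Intervals 1 and 2 fall in a sequence gap with no alignments.
    print(backfill_linear_index([100, None, None, 250, None]))
```

In this toy input the two gap intervals end up pointing at offset 250. Had
they been filled from the previous interval they would point at offset 100,
and an iterator starting in the gap would have to read and discard
everything between the two offsets.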

Build changes
-------------

These are compiler, configuration and makefile based changes.

* HTSlib now uses libhtscodecs release 1.1.1.

* Added a curl/curl.h check to configure and improved INSTALL documentation
  on build options.  Thanks to Melanie Kirsche and John Marshall. (#1265;
  fixes #1261)

* Some fixes to address GCC 11.1 warnings. (#1280, #1284, #1285; fixes #1283)

* Supports building HTSlib in a separate directory. Thanks to John Marshall.
  (#1277; fixes #231)

* Supports building HTSlib on MinGW 32-bit environments.
  Thanks to John Marshall. (#1301)

Bug fixes
---------

* Fixed hts_itr_query() et al region queries: fixed bug introduced in
  HTSlib 1.12, which led to iterators producing very few reads for some
  queries (especially for larger target regions) when unmapped reads were
  present. HTSlib 1.11 had a related problem in which iterators would
  omit a few unmapped reads that should have been produced; cf #1142.
  Thanks to Daniel Cooke for reporting the issue. (#1281; fixes #1279)

* Removed compressBound assertions on opening bgzf files.
  Thanks to Gurt Hulselmans for reporting the issue. (#1258; fixes #1257)

* Duplicate sample name error message for a VCF file now only displays
  the duplicated name rather than the entire sample name list. (#1262; fixes
  samtools/bcftools#1451)

* Fix to make samtools cat work on CRAMs again. (#1276; fixes
  samtools/samtools#1420)

* Fix for a double memory free in SAM header creation.  Thanks to @ihsinme.
  (#1274)

* Prevent assert in bcf_sr_set_regions.  Thanks to Dr K D Murray. (#1270)

* Fixed crash in knet_open() etc stubs.  Thanks to John Marshall. (#1289)

* Fixed filter expression "cigar" on unmapped reads.  Stop treating an
  empty CIGAR string as an error.  Thanks to Chang Y for reporting the
  issue. (#1298, fixes samtools/samtools#1445)

* Bug fixes in the bundled copy of htscodecs:

  - Fixed an uninitialized access in the name tokeniser decoder.
    (samtools/htscodecs#23)

  - Fixed a bug with name tokeniser and variable number of names per
    slice, causing it to incorrectly report an error on certain valid
    inputs. (samtools/htscodecs#24)

------------------------------------------------------------------------------
samtools - changes v1.13
------------------------------------------------------------------------------

 * Fixed samtools view FILE REGION, mpileup -r REGION, coverage -r REGION
   and other region queries: fixed bug introduced in 1.12, which led to
   region queries producing very few reads for some queries (especially for
   larger target regions) when unmapped reads were present. Thanks to
   @vinimfava (#1451), @JingGuo1997 (#1457) and Ramprasad Neethiraj (#1460)
   for reporting the respective issues.

 * Added options to set and clear flags to samtools view.  Along with the
   existing option to remove aux tags, this makes it possible to undo the
   changes made by duplicate marking (part of #1358). (#1441)

 * samtools view now has long option equivalents for most of its
   single-letter options. Thanks to John Marshall. (#1442)

 * A new tool, samtools import, has been added.  It reads one or more FASTQ
   files and converts them into unmapped SAM, BAM or CRAM. (#1323)

 * Fixed samtools coverage error message when the target region name is not
   present in the file header. Thanks to @Lyn16 for reporting it. (#1462;
   fixes #1461)

 * Made samtools coverage ASCII mode produce true ASCII output.  Previously
   it would produce UTF-8 characters. (#1423; fixes #1419)

 * samtools coverage now allows setting the maximum depth, using the
   -d/--depth option. Also, the default maximum depth has been set to
   1000000. (#1415; fixes #1395)

 * Complete rewrite of samtools depth.  This means it is now considerably
   faster and does not need a depth limit to avoid high memory usage.
   Results should mostly be the same as the old command with the potential
   exception of overlap removal. (#1428; fixes #889, helps ameliorate #1411)

 * samtools flags now accepts any number of command line arguments, allowing
   multiple SAM flag combinations to be converted at once.
   Thanks to John Marshall. (#1401, fixes #749)

 * samtools ampliconclip, ampliconstats and plot-ampliconstats now support
   inputs that list more than one reference. (#1410 and #1417; fixes #1396
   and #1418)

 * samtools ampliconclip now accepts the --tolerance option, which allows the
   user to set the number of bases within which a region is matched.  The
   default is 5. (#1456)

 * Updated the documentation on samtools ampliconclip to be clearer about
   what it does.  From a suggestion by Nathan S Watson-Haigh. (#1448)

 * Fixed negative depth values in ampliconstats output. (#1400)

 * samtools addreplacerg now allows for updating (replacing) an existing
   `@RG` line in the output header, if a new `@RG` line is provided in the
   command line, via the -r argument. The update still requires the user's
   approval, which can be given with the new -w option.
   Thanks to Chuang Yu. (#1404)

 * Stopped samtools cat from outputting multiple CRAM EOF markers. (#1422)

 * Three new counts have been added to samtools flagstat: primary, mapped
   primary and duplicate primary. (#1431; fixes #1382)

 * samtools merge now accepts a `-o FILE` option specifying the output file,
   similarly to most other subcommands. The existing way of specifying it (as
   the first non-option argument, alongside the input file arguments) remains
   supported. Thanks to David McGaughey and John Marshall. (#1434)

 * The way samtools merge checks for existing files has been changed so that
   it does not hang when used on a named pipe. (#1438; fixes #1437)

 * Updated documentation on mpileup to highlight the fact that the filtering
   options on FLAGs work with ANY rules. (#1447; fixes #1435)

 * samtools can now be configured to use a copy of HTSlib that has been
   set up with separate build and source trees.  When this is the case,
   the `--with-htslib` configure option should be given the location of
   the HTSlib build tree.  (Note that samtools itself does not yet support
   out-of-tree builds).  Thanks to John Marshall. (#1427; companion change
   to samtools/htslib#1277)
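
The three counts added to samtools flagstat above (primary, mapped primary
and duplicate primary) can be expressed as bit tests on the SAM FLAG field.
The sketch below is Python, not samtools' own C code, and the function name
`primary_counts` is hypothetical. It assumes that "primary" means neither
the secondary (0x100) nor the supplementary (0x800) bit is set.

```python
from typing import Dict, Iterable

# Standard SAM FLAG bits used by this sketch.
UNMAPPED = 0x4
SECONDARY = 0x100
DUPLICATE = 0x400
SUPPLEMENTARY = 0x800


def primary_counts(flags: Iterable[int]) -> Dict[str, int]:
    """Count primary, primary mapped and primary duplicate records.

    Hypothetical helper for illustration; not how flagstat is implemented.
    """
    counts = {"primary": 0, "primary mapped": 0, "primary duplicates": 0}
    for flag in flags:
        # Skip secondary and supplementary alignments.
        if flag & (SECONDARY | SUPPLEMENTARY):
            continue
        counts["primary"] += 1
        if not flag & UNMAPPED:
            counts["primary mapped"] += 1
        if flag & DUPLICATE:
            counts["primary duplicates"] += 1
    return counts


if __name__ == "__main__":
    print(primary_counts([0, 4, 256, 2048, 1024, 1040]))
```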

------------------------------------------------------------------------------
bcftools - changes v1.13
------------------------------------------------------------------------------

This release brings new options and significant changes in BAQ
parametrization in `bcftools mpileup`. The previous behaviour can be
triggered by providing the `--config 1.12` option. Please see
PR #1474 for details.

Changes affecting the whole of bcftools, or multiple commands:

* Improved build system

Changes affecting specific commands:

* bcftools annotate:

    - Fix a rare bug that occurred when INFO/END was present, all INFO fields
      were removed with `bcftools annotate -x INFO` and BCF output was
      produced. The removed INFO/END continued to inform the end coordinate
      and caused incorrect retrieval of records with the -r option (#1483)

    - Support for matching annotation line by ID, in addition to
      CHROM,POS,REF, and ALT (#1461)

      bcftools annotate -a annots.tab.gz \
          -c CHROM,POS,~ID,REF,ALT,INFO/END input.vcf

* bcftools csq:

    - When GFF and VCF/fasta use a different chromosome naming convention
      (e.g. chrX vs X), no consequences would be added. The program now
      attempts to detect these differences and removes or adds the "chr"
      prefix to the chromosome name to match the GFF and VCF/fasta (#1507)

    - Parametrize brief-predictions parameter to allow explicit number of
      amino acids to be printed. Note that the `-b, --brief-predictions`
      option is being replaced with `-B, --trim-protein-seq INT`
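
The "chr" prefix handling described for bcftools csq above can be sketched
as a lookup that tries the name as given and then with the prefix removed or
added. This is a Python illustration only, not the bcftools implementation;
the function name `match_chrom_name` and the idea of checking against a set
of known names are assumptions.

```python
from typing import Optional, Set


def match_chrom_name(name: str, known: Set[str]) -> Optional[str]:
    """Return the form of `name` present in `known`, or None if neither fits.

    Hypothetical helper for illustration; not part of bcftools.
    """
    if name in known:
        return name
    # Toggle the "chr" prefix: strip it if present, otherwise add it.
    if name.startswith("chr"):
        alternative = name[3:]
    else:
        alternative = "chr" + name
    return alternative if alternative in known else None


if __name__ == "__main__":
    gff_names = {"1", "2", "X"}
    print(match_chrom_name("chrX", gff_names))
```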

* bcftools +fill-tags:

    - Generalization and better support for custom functions that allow
      adding new INFO tags based on arbitrary `-i, --include` type of
      expressions. For example, to calculate a missing INFO/DP annotation
      from FORMAT/AD, it is possible to use:  -t 'DP:1=int(sum(FORMAT/AD))'

      Here the optional ":1" part specifies that a single value will be
      added (by default Number=. is used) and the optional int(...) adds an
      integer value (by default Type=Float is used).

    - When FORMAT/GT is not present, the INFO/AF tag is now calculated
      from INFO/AC and INFO/AN.
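
The arithmetic behind the two +fill-tags items above can be sketched in
Python. Nothing here is plugin code: the function names are hypothetical,
the sketch assumes that sum(FORMAT/AD) adds the values of all alleles across
all samples, and returning an empty list when AN is zero is an arbitrary
choice for the illustration.

```python
from typing import List, Sequence


def dp_from_ad(ad_per_sample: Sequence[Sequence[int]]) -> int:
    """Mimic int(sum(FORMAT/AD)): total allelic depth over all samples.

    Assumption: the sum runs over every allele of every sample.
    """
    return int(sum(sum(ad) for ad in ad_per_sample))


def af_from_ac_an(ac: Sequence[int], an: int) -> List[float]:
    """Allele frequency per ALT allele, computed as AC / AN."""
    if an == 0:
        # Assumption for this sketch: no frequency can be reported.
        return []
    return [count / an for count in ac]


if __name__ == "__main__":
    print(dp_from_ad([[10, 2], [7, 0]]))
    print(af_from_ac_an([1, 3], 8))
```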

* bcftools gtcheck:

    - Switch between FORMAT/GT or FORMAT/PL when one is (implicitly)
      requested but only the other is available

    - Improve diagnostics, printing warnings when a line cannot be matched
      and the number of lines skipped for various reasons (#1444)

    - Minor bug fix: with PLs being the default, the `--distinctive-sites`
      option had started to require an explicit `--error-probability 0`

* bcftools index:

    - The program now accepts both data file name and the index file name.
      This adds to user convenience when running index statistics (-n, -s)

* bcftools isec:

    - Always generate sites.txt with isec -p (#1462)

* bcftools +mendelian:

    - Consider only complete trios, do not crash on sample name typos (#1520)

* bcftools mpileup:

    - New `--seed` option for reproducibility of subsampling code in HTSlib

    - The SCR annotation which shows the number of soft-clipped reads now
      correctly pools reads together regardless of the variant type.
      Previously only reads with indels were included at indel sites.

    - Major revamp of BAQ. Please see
      https://github.com/samtools/bcftools/pull/1474 for details. The
      previous behaviour can be triggered by providing the `--config 1.12`
      option.

    - Thanks to improvements in HTSlib, the removal of overlapping reads
      (which can be disabled with the `-x, --ignore-overlaps` option) is
      no longer systematically biased
      (https://github.com/samtools/htslib/pull/1273)

    - Modified scale of Mann-Whitney U tests. INFO/*Z annotations are now
      printed, for example MQBZ replaces MQB.
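
The *Z suffix on the new mpileup annotations suggests a Z-score scale for
the Mann-Whitney U statistic. These notes do not spell out the formula, so
the Python sketch below uses the textbook normal approximation with no tie
correction to the variance. It is an assumption about the general idea, not
the bcftools implementation, and the function name `mann_whitney_z` is
hypothetical.

```python
import math
from typing import Sequence


def mann_whitney_z(a: Sequence[float], b: Sequence[float]) -> float:
    """Z-score of the Mann-Whitney U statistic (normal approximation).

    Hypothetical helper; bcftools may handle ties and small samples
    differently.
    """
    n1, n2 = len(a), len(b)
    if n1 == 0 or n2 == 0:
        raise ValueError("both samples must be non-empty")
    # U counts pairs where the value from `a` exceeds the value from `b`;
    # each tie contributes one half.
    u = 0.0
    for x in a:
        for y in b:
            if x > y:
                u += 1.0
            elif x == y:
                u += 0.5
    mean = n1 * n2 / 2.0
    sd = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12.0)
    return (u - mean) / sd


if __name__ == "__main__":
    # Example: mapping qualities of reads supporting REF versus ALT.
    print(mann_whitney_z([60, 60, 58], [20, 25, 30]))
```

A value near zero means the two groups of values are similarly distributed.
A large positive or negative value means one group tends to rank higher
than the other.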

* bcftools norm:

    - Fix Type=Flag output in `norm --atomize` (#1472)

    - Atomization must not discard ALT=. records

    - Atomization of AD and QS tags now correctly updates occurrences of
      duplicate alleles within different haplotypes

    - Fix a bug in atomization of Number=A,R tags

* bcftools reheader:

    - Add `-T, --temp-prefix` option

* bcftools +setGT:

    - A wider range of genotypes can be set by the plugin by allowing
      specifying custom genotypes. For example, to force a heterozygous
      genotype it is now possible to use expressions like:

     c:'m|M' c:0/1 c:0

* bcftools +split-vep:

    - New `-u, --allow-undef-tags` option

    - Better handling of ambiguous keys such as INFO/AF and CSQ/AD. The `-p,
      --annot-prefix` option is now applied before doing anything else which
      allows its use with `-f, --format` and `-c, --columns` options.

    - Some consequence field names may not constitute a valid tag name, such
      as "pos(1-based)". Field names are now trimmed to exclude brackets.

* bcftools +tag2tag:

    - New --QR-QA-to-QS option to convert annotations generated by Freebayes
      to QS used by BCFtools

* bcftools +trio-dnm:

    - Add support for sites with more than four alleles. Note that only the
      four most frequent alleles are considered; the model remains unchanged.
      Previously such sites were skipped.

    - New --use-NAIVE option for a naive DNM calling based solely on
      FORMAT/GT and expected Mendelian inheritance. This option is
      suitable for prefiltering.

    - Fix behaviour to match the documentation: the `--dnm-tag DNG` option now
      correctly outputs log scaled values by default, not phred scaled.

    - Fix bug in VAF calculation, homozygous de novo variants were
      incorrectly reported as having VAF=50%

    - Fix arithmetic underflow which could lead to imprecise scores and
      improve sensitivity in high coverage regions

    - Allow combining --pn and --pns to set the noise thresholds
      independently
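
The idea behind a naive de novo call based only on FORMAT/GT and Mendelian
inheritance, as mentioned for +trio-dnm above, can be sketched in Python.
This is not the plugin's code: the function names are hypothetical, and the
sketch assumes complete, diploid, autosomal genotypes given as pairs of
allele indexes.

```python
from typing import Tuple

Genotype = Tuple[int, int]


def is_mendelian_consistent(child: Genotype, father: Genotype,
                            mother: Genotype) -> bool:
    """True if one child allele can come from each parent."""
    a, b = child
    return (a in father and b in mother) or (b in father and a in mother)


def is_naive_dnm_candidate(child: Genotype, father: Genotype,
                           mother: Genotype) -> bool:
    """Flag a trio whose child genotype the parents cannot explain.

    Hypothetical prefilter for illustration; it ignores genotype
    likelihoods, sex chromosomes and missing genotypes.
    """
    return not is_mendelian_consistent(child, father, mother)


if __name__ == "__main__":
    # Both parents are homozygous reference, the child is heterozygous.
    print(is_naive_dnm_candidate((0, 1), (0, 0), (0, 0)))
```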

