Samtools (and HTSlib and BCFtools) version 1.12 is now available from
GitHub and SourceForge.

https://sourceforge.net/projects/samtools/
https://github.com/samtools/htslib/releases/tag/1.12
https://github.com/samtools/samtools/releases/tag/1.12
https://github.com/samtools/bcftools/releases/tag/1.12

The main changes are listed below:

------------------------------------------------------------------------------
htslib - changes v1.12
------------------------------------------------------------------------------

Features and Updates
--------------------

* Added experimental CRAM 3.1 and 4.0 support. (#929)

  These should not be used for long term data storage as the specification
  still needs to be ratified by GA4GH and may be subject to changes in format
  (this is highly likely for 4.0).  However, the support may be tested using:

    test/test_view -t ref.fa -C -o version=3.1 in.bam -p out31.cram

  For smaller but slower files, try varying the compression profile with an
  additional "-o small".  Profile choices are fast, normal, small and archive,
  and can be applied to all CRAM versions.

* Added a general filtering syntax for alignment records in SAM/BAM/CRAM
  readers. (#1181, #1203)

  An example to find chromosome spanning read-pairs with high mapping quality:
   'mapq >= 30 && mrname != rname'

  To find significant sized deletions: 'cigar =~ "[0-9]{2}D"' or
  'rlen - qlen > 10'.

  To report duplicates that aren't part of a "proper pair":
   'flag.dup && !flag.proper_pair'

  More details are in the samtools.1 man page under "FILTER EXPRESSIONS".
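
As a rough illustration of what the three example expressions select, the
following Python sketch applies equivalent tests to a few hand-written
records.  The dictionary keys and helper names are hypothetical, and this is
not how HTSlib evaluates expressions; only the flag bits (0x2 for proper pair,
0x400 for duplicate) are taken from the SAM specification.

```python
import re

# SAM flag bits, as defined in the SAM specification.
FLAG_PROPER_PAIR = 0x2
FLAG_DUP = 0x400

def is_chromosome_spanning(rec, min_mapq=30):
    """High mapping quality and mate placed on a different reference."""
    return rec["mapping_quality"] >= min_mapq and rec["mate_ref"] != rec["ref"]

def has_long_deletion(rec):
    """A deletion whose length has at least two digits (10 bases or more)."""
    return re.search(r"[0-9]{2}D", rec["cigar"]) is not None

def is_improper_duplicate(rec):
    """Marked as a duplicate but not part of a proper pair."""
    return bool(rec["flag"] & FLAG_DUP) and not (rec["flag"] & FLAG_PROPER_PAIR)

# Hypothetical records, reduced to the fields the tests need.
records = [
    {"name": "r1", "flag": 0x1 | 0x2, "ref": "chr1", "mate_ref": "chr2",
     "mapping_quality": 60, "cigar": "100M"},
    {"name": "r2", "flag": 0x1 | 0x400, "ref": "chr1", "mate_ref": "chr1",
     "mapping_quality": 20, "cigar": "40M12D60M"},
    {"name": "r3", "flag": 0x1 | 0x2 | 0x400, "ref": "chr3", "mate_ref": "chr3",
     "mapping_quality": 60, "cigar": "50M5D50M"},
]

spanning = [r["name"] for r in records if is_chromosome_spanning(r)]
long_del = [r["name"] for r in records if has_long_deletion(r)]
improper_dup = [r["name"] for r in records if is_improper_duplicate(r)]
```

Note that the CIGAR pattern needs two digits before the `D`, so the 5-base
deletion in `r3` is not selected.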

* The knet networking code has been removed.  It only supported the http and
  ftp protocols, and a better and safer alternative using libcurl has been
  available since release 1.3.  If you need access to ftp:// and http://
  URLs, HTSlib should be built with libcurl support. (#1200)

* The old htslib/knetfile.h interfaces have been marked as deprecated.  Any
  code still using them should be updated to use hFILE instead. (#1200)

* Added an introspection API for checking some of the capabilities provided
  by HTSlib. (#1170) Thanks also to John Marshall for contributions. (#1222)

    - `hfile_list_schemes`: returns the number of schemes found

    - `hfile_list_plugins`: returns the number of plugins found

    - `hfile_has_plugin`: checks if a specific plugin is available

    - `hts_features`: returns a bit mask with all available features

    - `hts_test_feature`: test if a feature is available

    - `hts_feature_string`: return a string summary of enabled features

* Made performance improvements to the `probaln_glocal` method, which speeds
  up mpileup BAQ calculations. (#1188)

    - Caching of reused loop variables and removal of loop invariants.

    - Code reordering to remove instruction latency.

    - Other refactoring and tidyups.

* Added a public method for constructing a BAM record from the component
  pieces. Thanks to Anders Kaplan. (#1159, #1164)

* Added two public methods, `sam_parse_cigar` and `bam_parse_cigar`, as part
  of a small CIGAR API (#1169, #1182). Thanks to Daniel Cameron for input.
  (#1147)
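
To show the kind of conversion a CIGAR parser performs, here is a small
Python sketch that turns a CIGAR string into the packed integers used by BAM
(length in the upper 28 bits, operation code in the lower 4, with operations
numbered in the order `MIDNSHP=X`).  It is an independent illustration, not
the HTSlib implementation, and the function names are made up for this
example.

```python
import re

# Operation characters in the order the SAM/BAM specification numbers them.
CIGAR_OPS = "MIDNSHP=X"
CIGAR_SHIFT = 4

def parse_cigar(text):
    """Parse a CIGAR string into BAM-style packed integers."""
    if text == "*":
        return []  # "*" means the CIGAR is unavailable
    packed = []
    pos = 0
    for match in re.finditer(r"([0-9]+)([MIDNSHP=X])", text):
        if match.start() != pos:
            raise ValueError(f"invalid CIGAR at offset {pos}: {text!r}")
        length = int(match.group(1))
        op_code = CIGAR_OPS.index(match.group(2))
        packed.append((length << CIGAR_SHIFT) | op_code)
        pos = match.end()
    if pos != len(text):
        raise ValueError(f"invalid CIGAR at offset {pos}: {text!r}")
    return packed

def unpack_cigar(packed):
    """Turn packed integers back into (length, operation) pairs."""
    return [(value >> CIGAR_SHIFT, CIGAR_OPS[value & 0xF]) for value in packed]

packed = parse_cigar("40M12D60M")
pairs = unpack_cigar(packed)
```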

* HTSlib, and the included htsfile program, will now recognise the old RAZF
  compressed file format.  Note that while the format is detected, HTSlib is
  unable to read it.  It is recommended that RAZF files are uncompressed with
  `gunzip` before using them with HTSlib.  Thanks to John Marshall (#1244);
  and Matthew J. Oldach who reported problems with uncompressing some RAZF
  files (samtools/samtools#1387).

* The S3 plugin now has options to force the address style.  It will
  recognise the addressing_style and host_bucket entries in the
  respective aws .credentials and s3cmd .s3cfg files.  There is also a
  new HTS_S3_ADDRESS_STYLE environment variable.  Details are in the
  htslib-s3-plugin.7 man file (#1249).
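
For readers unfamiliar with the two S3 address styles, the Python sketch
below shows how the same object is addressed in each.  The style names
("virtual", "path"), the default host and the function name are assumptions
made for illustration; consult the man page for the values the plugin
actually accepts.

```python
def s3_url(bucket, key, style, host="s3.amazonaws.com"):
    """Build an HTTPS URL for an object using the requested address style."""
    key = key.lstrip("/")
    if style == "virtual":
        # Virtual-hosted style: the bucket is part of the host name.
        return f"https://{bucket}.{host}/{key}"
    if style == "path":
        # Path style: the bucket is the first component of the path.
        return f"https://{host}/{bucket}/{key}"
    raise ValueError(f"unknown address style: {style!r}")

virtual_url = s3_url("my-bucket", "data/in.bam", "virtual")
path_url = s3_url("my-bucket", "data/in.bam", "path")
```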

Build changes
-------------

These are compiler, configuration and makefile based changes.

* Added new Makefile targets for the applications that embed HTSlib and want
  to run its test suite or clean its generated artefacts. (#1230, #1238)

* The CRAM codecs are now obtained via the htscodecs submodule, hence when
  cloning it is now best to use "git clone --recursive".  In an existing
  clone, you may use "git submodule update --init" to obtain the htscodecs
  submodule checkout.

* Updated CI test configuration to recurse HTSlib submodules. (#1359)

* Added Cirrus-CI integration as a replacement for Travis, which was phased
  out.  (#1175; #1212)

* Updated the Windows image used by Appveyor to 'Visual Studio 2019'. (#1172;
  fixed #1166)

* Fixed a buglet in configure.ac, exposed by the release 2.70 of autoconf.
  Thanks to John Marshall. (#1198)

* Fixed plugin linking on macOS, to prevent symbol conflict when linking with
  a static HTSlib. Thanks to John Marshall. (#1184)

* Fixed a clang++9 error in `cram_io.h`. Thanks to Pjotr Prins. (#1190)

* Introduced $(ALL_CPPFLAGS) to allow for more flexibility in setting the
  compiler flags. Thanks to John Marshall. (#1187)

* Added 'fall through' comments to prevent warnings issued by Clang on
  intentional fall through case statements, when building with the `-Wextra`
  flag. Thanks to John Marshall. (#1163)

* Non-configure builds now define _XOPEN_SOURCE=600 to allow them to work
  when the `gcc -std=c99` option is used.  Thanks to John Marshall. (#1246)

Bug fixes
---------

* Fixed VCF `#CHROM` header parsing to only separate columns at tab
  characters. Thanks to Sam Morris for reporting the issue. (#1237; fixed
  samtools/bcftools#1408)
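
The effect of splitting only at tabs can be seen in the Python sketch below,
which assumes, for illustration, a header whose first sample name contains a
space.  The function name is hypothetical and the code is not taken from
HTSlib.

```python
def parse_chrom_header(line):
    """Split a VCF #CHROM line on tabs only and return the sample names."""
    fields = line.rstrip("\r\n").split("\t")
    if fields[0] != "#CHROM":
        raise ValueError("not a #CHROM header line")
    # Eight fixed columns, then FORMAT, then one column per sample.
    return fields[9:]

header = ("#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT"
          "\tsample one\tsample_two\n")

samples = parse_chrom_header(header)

# Splitting on any whitespace would wrongly break the first sample name apart.
naive_samples = header.split()[9:]
```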

* Fixed a crash reported in `bcf_sr_sort_set`, which expects REF to be
  present. (#1204; fixed samtools/bcftools#1361)

* Fixed a bcf synced reader bug when filtering with a region list, and the
  first record for a chromosome had the same position as the last record for
  the previous chromosome. (#1254; fixed samtools/bcftools#1441)

* Fixed a bug in the overlapping logic of mpileup, dealing with iterating
  over CIGAR segments. Thanks to `@wulj2` for the analysis. (#1202; fixed
  #1196)

* Fixed a tabix bug that prevented setting the correct number of lines to be
  skipped in a region file. Thanks to Jim Robinson for reporting it. (#1189;
  fixed #1186)

* Made `bam_itr_next` an alias for `sam_itr_next`, to prevent it from
  crashing when working with htsFile pointers. Thanks to Torbjörn Klatt
  for reporting it. (#1180; fixed #1179)

* Fixed the `bgzf_idx_flush` assertion, which is checked once per outgoing
  multi-threaded block, to accommodate situations where a single record
  spans multiple blocks.
  Thanks to `@lacek`. (#1168; fixed samtools/samtools#1328)

* Fixed assumption of pthread_t being a non-structure, as permitted by POSIX.
  Thanks also to John Marshall and Anders Kaplan. (#1167, #1153, #1153)

* Fixed the minimum offset of a BAI index bin, to account for unmapped reads.
  Thanks to John Marshall for spotting the issue. (#1158; fixed #1142)

* Fixed the CRLF handling in the `sam_parse_worker` method. Thanks to
  Anders Kaplan. (#1149; fixed #1148)

* Included unistd.h and errno.h directly in HTSlib files, as opposed to
  including them indirectly, via third party code. Thanks to
  Andrew Patterson (#1143) and John Marshall (#1145).

------------------------------------------------------------------------------
samtools - changes v1.12
------------------------------------------------------------------------------

 * The legacy samtools API (libbam.a, bam.h, sam.h, etc) has not been
   actively maintained since 2015. It is deprecated and will be removed
   entirely in a future SAMtools release. We recommend coding against the
   HTSlib API directly.

 * I/O errors and record parsing errors during the reading of SAM/BAM/CRAM
   files are now always detected. Thanks to John Marshall (#1379; fixed #101)

 * New make targets have been added: check-all, test-all, distclean-all,
   mostlyclean-all, testclean-all, which allow SAMtools installations to
   call corresponding Makefile targets from embedded HTSlib installations.

 * samtools --version now displays a summary of the compilation details and
   available features, including flags, used libraries and enabled plugins
   from HTSlib. As an alias, `samtools version` can also be used. (#1371)

 * samtools stats now displays the number of supplementary reads in the SN
   section. Also, supplementary reads are no longer considered when splitting
   read pairs by orientation (inward, outward, other). (#1363)

 * samtools stats now counts only the filtered alignments that overlap target
   regions, if any are specified. (#1363)

 * samtools view now accepts option -N, which takes a file containing read
   names of interest. This allows the output of only the reads with names
   contained in the given file. Thanks to Daniel Cameron. (#1324)
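
Conceptually, selecting reads by name amounts to a set lookup on the first
SAM column.  The Python sketch below illustrates the idea on in-memory text;
the helper names and sample data are invented for this example and it does
not reflect how samtools implements the option.

```python
import io

def load_read_names(handle):
    """Read one read name per line, ignoring blank lines."""
    return {line.strip() for line in handle if line.strip()}

def filter_by_name(sam_lines, wanted):
    """Yield only alignment lines whose QNAME (first column) is in `wanted`."""
    for line in sam_lines:
        if line.startswith("@"):
            continue  # header lines are skipped in this sketch
        if line.split("\t", 1)[0] in wanted:
            yield line

# Stand-ins for the read-name file and the SAM input.
names_file = io.StringIO("read2\nread3\n")
sam_text = [
    "@HD\tVN:1.6\n",
    "read1\t0\tchr1\t100\t60\t4M\t*\t0\t0\tACGT\tIIII\n",
    "read2\t0\tchr1\t200\t60\t4M\t*\t0\t0\tACGT\tIIII\n",
    "read3\t16\tchr1\t300\t60\t4M\t*\t0\t0\tACGT\tIIII\n",
]

wanted = load_read_names(names_file)
kept = [line.split("\t", 1)[0] for line in filter_by_name(sam_text, wanted)]
```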

 * samtools view -d option now works without a tag associated value, which
   allows it to output all the reads with the given tag. (#1339; fixed #1317)

 * samtools view -d and -D options now accept integer and single character
   values associated with tags, not just strings. Thanks to `@dariome` and
   Keiran Raine for the suggestions. (#1357, #1392)

 * samtools view now works with the filtering expressions introduced by
   HTSlib. The filtering expression is passed to the program using the
   specific option -e or the global long option --input-fmt-option. E.g.

   samtools view \
     -e 'qname =~ "#49$" && mrefid != refid && refid != -1 && mrefid != -1' \
     align.bam

   looks for records with query-name ending in `#49` that have their mate
   aligned in a different chromosome. More details can be found in the
   FILTER EXPRESSIONS section of the main man page. (#1346)

 * samtools markdup now benefits from an increase in performance in the
   situation when a single read has tens or hundreds of thousands of
   duplicates. Thanks to `@denriquez` for reporting the issue. (#1345;
   fixed #1325)

 * The documentation for samtools ampliconstats has been added to the
   samtools man page. (#1351)

 * A new FASTA/FASTQ sanitizer script (`fasta-sanitize.pl`) was added, which
   corrects the invalid characters in the reference names. (#1314) Thanks to
   John Marshall for the installation fix. (#1353)

 * The CI scripts have been updated to recurse the HTSlib submodules when
   cloning HTSlib, to accommodate the CRAM codecs, which now reside in
   the htscodecs submodule. (#1359)

 * The CI integrations now include Cirrus-CI rather than Travis. (#1335;
   #1365)

 * Updated the Windows image used by Appveyor to 'Visual Studio 2019'.
   (#1333; fixed #1332)

 * Fixed a bug in samtools cat, which prevented the command from running in
   multi-threaded mode. Thanks to Alex Leonard for reporting the issue.
   (#1337; fixed #1336)

 * A couple of invalid CIGAR strings have been corrected in the test data.
   (#1343)

 * The documentation for `samtools depth -s` has been improved. Thanks to
   `@wulj2`. (#1355)

 * Fixed a `samtools merge` segmentation fault when it failed to merge header
   `@PG` records. Thanks to John Marshall.  (#1394; reported by Kemin Zhou in
   #1393)

 * Ampliconclip and ampliconstats now guard against the BED file containing
   more than one reference (chromosome) and fail when one is found.  Proper
   support for multiple references will be added later.  (#1398)

------------------------------------------------------------------------------
bcftools - changes v1.12
------------------------------------------------------------------------------

Changes affecting the whole of bcftools, or multiple commands:

* The output file type is determined from the output file name suffix, where
  available, so the -O/--output-type option is often no longer necessary.

* Make F_MISSING in filtering expressions work for sites with multiple ALT
  alleles (#1343)

* Fix N_PASS and F_PASS to behave according to expectation when reverse
  logic is used (#1397). This fix has the side effect of `query` (or
  programs like `+trio-stats`) behaving differently with these expressions,
  operating now in site-oriented rather than sample-oriented mode. For
  example, the new behavior could be:

    bcftools query -f'[%POS %SAMPLE %GT\n]' -i'N_PASS(GT="alt")==1'
        11       A       0/0
        11       B       0/0
        11       C       1/1

  while previously the same expression would return:
        11       C       1/1

  The original mode can be mimicked by splitting the filtering into two steps:

    bcftools view -i'N_PASS(GT="alt")==1' | \
    bcftools query -f'[%POS %SAMPLE %GT\n]' -i'GT="alt"'
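
The difference between the two modes can be reproduced with a small Python
sketch using the same three samples as above.  The data structure and
function names are hypothetical, and the genotype test is a simplified
stand-in for `GT="alt"`.

```python
def is_alt(gt):
    """True if a genotype such as '0/1' carries a non-reference allele."""
    alleles = gt.replace("|", "/").split("/")
    return any(allele not in ("0", ".") for allele in alleles)

def n_pass(site):
    """Number of samples at the site whose genotype passes the test."""
    return sum(is_alt(gt) for gt in site["genotypes"].values())

def query_site_oriented(site):
    """New behaviour: the expression selects the site, all samples print."""
    if n_pass(site) != 1:
        return []
    return [(site["pos"], s, gt) for s, gt in site["genotypes"].items()]

def query_sample_oriented(site):
    """Previous behaviour: only the samples passing the test print."""
    if n_pass(site) != 1:
        return []
    return [(site["pos"], s, gt)
            for s, gt in site["genotypes"].items() if is_alt(gt)]

site = {"pos": 11, "genotypes": {"A": "0/0", "B": "0/0", "C": "1/1"}}

new_rows = query_site_oriented(site)
old_rows = query_sample_oriented(site)
```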

Changes affecting specific commands:

* bcftools annotate:

    - New `--rename-annots` option to help fix broken VCFs (#1335)

    - New -C option allows a long list of options to be read from a file,
      to prevent very long command lines.

    - New `append-missing` logic allows annotations to be added for each ALT
      allele in the same order as they appear in the VCF. Note that this is
      not bullet proof. In order for this to work:

        - the annotation file must have one line per ALT allele

        - fields must contain a single value as multiple values are appended
          as they are and would break the correspondence between the alleles
          and values

* bcftools concat:

    - Do not phase genotypes by mistake if they are not already phased
      with `-l` (#1346)

* bcftools consensus:

    - New `--mask-with`, `--mark-del`, `--mark-ins`, `--mark-snv` options
      (#1382, #1381, #1170)

    - Symbolic <DEL> should have only one REF base. If there are multiple,
      take POS+1 as the first deleted base.

    - Make consensus work when the first base of the reference genome is
      deleted. In this situation the VCF record has POS=1 and the first REF
      base cannot precede the event. (#1330)

* bcftools +contrast:

    - The NOVELGT annotation was previously not added when requested.

* bcftools convert:

    - Make the --hapsample and --hapsample2vcf options consistent with each
      other and with the documentation.

* bcftools call:

    - Revamp of `call -G`. Previously, sample grouping by population was not
      truly independent and could still be influenced by the presence of
      other sample groups.

    - Optional addition of INFO/PV4 annotation with `call -a INFO/PV4`

    - Remove generation of useless HOB and ICB annotation; use
      `+fill-tags -- -t HWE,ExcHet` instead

    - The `call -f` option was renamed to `-a` to (1) make it consistent with
      `mpileup` and (2) to indicate that it includes both INFO and FORMAT
      annotations, not just FORMAT as previously

    - Any sensible Number=R,Type=Integer annotation can be used with -G, such
      as AD or QS

    - Don't trim QUAL; although the usefulness of this change is
      questionable for true probabilistic interpretation (such high
      precision is unrealistic), using QUAL as a score rather than
      probability is helpful and permits more fine-grained filtering

    - Fix a suspected bug in `call -F`; in the worst case, the change at
      least improves readability

    - `call -C trio` is temporarily disabled

* bcftools csq:

    - Fix a bug which caused incorrect FORMAT/BCSQ formatting at sites with
      too many per-sample consequences

    - Fix a bug which incorrectly handled the --ncsq parameter and could
      clash with reserved BCF values, consequently producing truncated or
      even incorrect output of the %TBCSQ formatting expression in `bcftools
      query`. To account for the reserved values, the new default value is
      --ncsq 15 (#1428)

* bcftools +fill-tags:

    - MAF definition revised for multiallelic sites: the second most common
      allele is considered to be the minor allele (#1313)

    - New FORMAT/VAF, VAF1 annotations to set the fraction of alternate reads
      provided FORMAT/AD is present
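
As an illustration of the revised MAF definition, the Python sketch below
takes the frequency of the second most common allele at a site.  The
function name and input layout (counts for REF followed by each ALT) are
assumptions for this example; it is not the plugin's code.

```python
def minor_allele_frequency(allele_counts):
    """Frequency of the second most common allele, or None if undefined.

    allele_counts holds the observed count of REF followed by each ALT.
    """
    total = sum(allele_counts)
    if total == 0 or len(allele_counts) < 2:
        return None
    second_most_common = sorted(allele_counts, reverse=True)[1]
    return second_most_common / total

# Multiallelic site: REF seen 70 times, two ALT alleles seen 25 and 5 times.
maf_multi = minor_allele_frequency([70, 25, 5])

# Biallelic site: the definition reduces to the rarer of the two alleles.
maf_bi = minor_allele_frequency([10, 90])
```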

* bcftools gtcheck:

    - Support matching of a single sample against all other samples in the
      file with `-s qry:sample -s gt:-`. This was previously not possible:
      either the full cross-check mode had to be run or a list of
      pairs/samples had to be created explicitly

* bcftools merge:

    - Make `merge -R` behavior consistent with other commands and pull in
      overlapping records with POS outside of the regions (#1374)

    - Bug fix (#1353)

* bcftools mpileup:

    - Add new optional tag `mpileup -a FORMAT/QS`

* bcftools norm:

    - New `-a, --atomize` functionality to decompose complex variants, for
      example MNVs into consecutive SNVs

    - New option `--old-rec-tag` to indicate the original variant
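
To make the MNV case concrete, the Python sketch below splits an
equal-length substitution into per-position SNVs, skipping positions where
REF and ALT agree.  It is a simplified illustration with a made-up function
name; it only covers this one case and is not the `norm` implementation.

```python
def atomize_mnv(pos, ref, alt):
    """Decompose an equal-length substitution into (pos, ref, alt) SNVs."""
    if len(ref) != len(alt):
        raise ValueError("this sketch only handles equal-length substitutions")
    return [(pos + offset, r, a)
            for offset, (r, a) in enumerate(zip(ref, alt))
            if r != a]

# A two-base MNV becomes two consecutive SNVs.
two_snvs = atomize_mnv(100, "AC", "GT")

# The unchanged middle base does not produce a record.
with_gap = atomize_mnv(100, "ACG", "TCA")
```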

* bcftools query:

    - Incorrect fields were printed in the per-sample output when a subset
      of samples was requested via -s/-S and the order of samples in the
      header was different from the requested -s/-S order (#1435)

* bcftools +prune:

    - New options --random-seed and --nsites-per-win-mode (#1050)

* bcftools +split-vep:

    - Transcript selection now works also on the raw CSQ/BCSQ annotation.

    - Bug fix, samples were dropped on VCF input and VCF/BCF output (#1349)

* bcftools stats:

    - Changes to QUAL and ts/tv plotting stats: avoid capping QUAL to
      predefined bins, use an open-range logarithmic binning instead

    - Plot dual ts/tv stats: per quality bin, and cumulative as if the
      threshold were applied on the whole dataset

* bcftools +trio-dnm2:

    - Major revamp of +trio-dnm plugin, which is now deprecated and replaced
      by +trio-dnm2.

      The original trio-dnm calling model used genotype likelihoods (PLs) as
      the input for calling. However, that is flawed because PLs make
      assumptions which are unsuitable for de novo calling: PL(RR) can become
      bigger than PL(RA) even when the ALT allele is present in the parents.
      Note that this is true also for other programs such as DeNovoGear which
      rely on the same samtools calculation.

      The new recommended workflow is:

        bcftools mpileup -a AD,QS -f ref.fa -Ou \
           proband.bam father.bam mother.bam | \
        bcftools call -mv -Ou | \
        bcftools +trio-dnm2 -p proband,father,mother -Oz -o output.vcf.gz

      This new version also implements the DeNovoGear model. The original
      behavior of trio-dnm is no longer supported.

      For more details see http://samtools.github.io/bcftools/trio-dnm.pdf


