Samtools (and HTSlib and BCFtools) version 1.13 is now available from
GitHub and SourceForge.
https://sourceforge.net/projects/samtools/
https://github.com/samtools/htslib/releases/tag/1.13
https://github.com/samtools/samtools/releases/tag/1.13
https://github.com/samtools/bcftools/releases/tag/1.13
The main changes are listed below:
------------------------------------------------------------------------------
htslib - changes v1.13
------------------------------------------------------------------------------
Features and Updates
--------------------
* In case a PG header line has multiple ID tags supplied by other
applications, the header API now selects the first one encountered as
the identifying tag and issues a warning when detecting subsequent ID
tags. (#1256; fixes samtools/samtools#1393)
* VCF header reading function (vcf_hdr_read) no longer tries to download a
remote index file by default. (#1266; fixes #380)
* Support reading and writing FASTQ format in the same way as SAM, BAM or
CRAM. Records read from a FASTQ file will be treated as unmapped data.
(#1156)
* Added GCP requester pays bucket access. Thanks to @indraniel. (#1255)
* Made mpileup's overlap removal choose which copy to remove at random
instead of always removing the second one. This avoids strand bias in
experiments where the +ve and -ve strand reads always appear in the same
order. (#1273; fixes samtools/bcftools#1459)
* It is now possible to use platform specific BAQ parameters. This also
selects long-read parameters for read lengths bigger than 1kb, which helps
bcftools mpileup call SNPs on PacBio CCS reads. (#1275)
* Improved bcf_remove_allele_set. This fixes a bug that stopped iteration
over alleles prematurely, marks removed alleles as 'missing' and does
automatic lazy unpacking. (#1288; fixes #1259)
* Improved compression metrics for unsorted CRAM files. This improves the
choice of codecs when handling unsorted data. (#1291)
* Linear index entries for empty intervals are now initialised with the file
offset in the next non-empty interval instead of the previous one. This
may reduce the amount of data iterators have to discard before reaching the
desired region, when the starting location is in a sequence gap. Thanks to
@carsonh for reporting the issue. (#1286; fixes #486)
* A new hts_bin_level API function has been added, to compute the level of a
given bin in the binning index. (#1286)
* Related to the above, a new API method, hts_idx_nseq, now returns the total
number of contigs from an index. (#1295 and #1299)
* Added bracket handling to bcf_hdr_parse_line, for use with ##META lines.
Thanks to Alberto Casas Ortiz. (#1240)
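To make the two index changes above more concrete, here is a minimal Python
sketch. It is an illustration only, not the HTSlib code: the function names
are hypothetical, the bin numbering assumes the standard scheme from the SAM
specification in which every bin has eight children, and empty linear index
slots are represented as None.

```python
def bin_parent(bin_id):
    """Parent of a bin, assuming each bin has eight children."""
    return (bin_id - 1) >> 3


def bin_level(bin_id):
    """Number of steps from a bin up to the root bin 0 (which is level 0)."""
    level = 0
    while bin_id > 0:
        bin_id = bin_parent(bin_id)
        level += 1
    return level


def fill_empty_offsets(offsets):
    """Fill empty (None) slots with the offset of the next non-empty slot.

    Walking from the end keeps track of the nearest non-empty offset to the
    right. Trailing empty slots have no successor and stay None here; that
    is a choice made for this sketch.
    """
    filled = list(offsets)
    next_offset = None
    for i in range(len(filled) - 1, -1, -1):
        if filled[i] is None:
            filled[i] = next_offset
        else:
            next_offset = filled[i]
    return filled
```

With this numbering, bins 1 to 8 are at level 1 and bins 9 to 72 at level 2.
Filling from the next non-empty slot means a query that starts in a gap seeks
directly to the first data after the gap instead of to data before it.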
Build changes
-------------
These are compiler, configuration and makefile based changes.
* HTSlib now uses libhtscodecs release 1.1.1.
* Added a curl/curl.h check to configure and improved INSTALL documentation
on build options. Thanks to Melanie Kirsche and John Marshall. (#1265;
fixes #1261)
* Some fixes to address GCC 11.1 warnings. (#1280, #1284, #1285; fixes #1283)
* Supports building HTSlib in a separate directory. Thanks to John Marshall.
(#1277; fixes #231)
* Supports building HTSlib on MinGW 32-bit environments.
Thanks to John Marshall. (#1301)
Bug fixes
---------
* Fixed hts_itr_query() et al region queries: fixed bug introduced in
HTSlib 1.12, which led to iterators producing very few reads for some
queries (especially for larger target regions) when unmapped reads were
present. HTSlib 1.11 had a related problem in which iterators would
omit a few unmapped reads that should have been produced; cf #1142.
Thanks to Daniel Cooke for reporting the issue. (#1281; fixes #1279)
* Removed compressBound assertions on opening bgzf files.
Thanks to Gurt Hulselmans for reporting the issue. (#1258; fixes #1257)
* Duplicate sample name error message for a VCF file now only displays
the duplicated name rather than the entire sample name list. (#1262; fixes
samtools/bcftools#1451)
* Fix to make samtools cat work on CRAMs again. (#1276; fixes
samtools/samtools#1420)
* Fix for a double memory free in SAM header creation. Thanks to @ihsinme.
(#1274)
* Prevent assert in bcf_sr_set_regions. Thanks to Dr K D Murray. (#1270)
* Fixed crash in knet_open() etc stubs. Thanks to John Marshall. (#1289)
* Fixed filter expression "cigar" on unmapped reads. Stop treating an
empty CIGAR string as an error. Thanks to Chang Y for reporting the
issue. (#1298, fixes samtools/samtools#1445)
* Bug fixes in the bundled copy of htscodecs:
- Fixed an uninitialized access in the name tokeniser decoder.
(samtools/htscodecs#23)
- Fixed a bug with name tokeniser and variable number of names per
slice, causing it to incorrectly report an error on certain valid
inputs. (samtools/htscodecs#24)
------------------------------------------------------------------------------
samtools - changes v1.13
------------------------------------------------------------------------------
* Fixed samtools view FILE REGION, mpileup -r REGION, coverage -r REGION
and other region queries: fixed bug introduced in 1.12, which led to
region queries producing very few reads for some queries (especially for
larger target regions) when unmapped reads were present. Thanks to
@vinimfava (#1451), @JingGuo1997 (#1457) and Ramprasad Neethiraj (#1460)
for reporting the respective issues.
* Added options to set and clear flags to samtools view. Along with the
existing option to remove aux tags, this makes it possible to undo the
changes made by duplicate marking (part of #1358). (#1441)
* samtools view now has long option equivalents for most of its
single-letter options. Thanks to John Marshall. (#1442)
* A new tool, samtools import, has been added. It reads one or more FASTQ
files and converts them into unmapped SAM, BAM or CRAM. (#1323)
* Fixed samtools coverage error message when the target region name is not
present in the file header. Thanks to @Lyn16 for reporting it. (#1462;
fixes #1461)
* Made samtools coverage ASCII mode produce true ASCII output. Previously
it would produce UTF-8 characters. (#1423; fixes #1419)
* samtools coverage now allows setting the maximum depth, using the
-d/--depth option. Also, the default maximum depth has been set to
1000000. (#1415; fixes #1395)
* Complete rewrite of samtools depth. This means it is now considerably
faster and does not need a depth limit to avoid high memory usage.
Results should mostly be the same as the old command with the potential
exception of overlap removal. (#1428; fixes #889, helps ameliorate #1411)
* samtools flags now accepts any number of command line arguments, allowing
multiple SAM flag combinations to be converted at once.
Thanks to John Marshall. (#1401, fixes #749)
* samtools ampliconclip, ampliconstats and plot-ampliconstats now support
inputs that list more than one reference. (#1410 and #1417; fixes #1396
and #1418)
* samtools ampliconclip now accepts the --tolerance option, which allows the
user to set the number of bases within which a region is matched. The
default is 5. (#1456)
* Updated the documentation on samtools ampliconclip to be clearer about
what it does. From a suggestion by Nathan S Watson-Haigh. (#1448)
* Fixed negative depth values in ampliconstats output. (#1400)
* samtools addreplacerg now allows for updating (replacing) an existing
`@RG` line in the output header, if a new `@RG` line is provided in the
command line, via the -r argument. The update still requires the user's
approval, which can be given with the new -w option.
Thanks to Chuang Yu. (#1404)
* Stopped samtools cat from outputting multiple CRAM EOF markers. (#1422)
* Three new counts have been added to samtools flagstat: primary, mapped
primary and duplicate primary. (#1431; fixes #1382)
* samtools merge now accepts a `-o FILE` option specifying the output file,
similarly to most other subcommands. The existing way of specifying it (as
the first non-option argument, alongside the input file arguments) remains
supported. Thanks to David McGaughey and John Marshall. (#1434)
* The way samtools merge checks for existing files has been changed so that
it does not hang when used on a named pipe. (#1438; fixes #1437)
* Updated documentation on mpileup to highlight the fact that the filtering
options on FLAGs work with ANY rules. (#1447; fixes #1435)
* samtools can now be configured to use a copy of HTSlib that has been
set up with separate build and source trees. When this is the case,
the `--with-htslib` configure option should be given the location of
the HTSlib build tree. (Note that samtools itself does not yet support
out-of-tree builds). Thanks to John Marshall. (#1427; companion change
to samtools/htslib#1277)
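As an illustration of the three new flagstat counts listed above, the
following Python sketch tallies them from a list of SAM FLAG values. It is
not the samtools implementation. It assumes that "primary" means a record
with neither the secondary (0x100) nor the supplementary (0x800) bit set,
and the function name and dictionary keys are hypothetical.

```python
# SAM FLAG bits used below, as defined in the SAM specification.
UNMAPPED = 0x4
SECONDARY = 0x100
DUPLICATE = 0x400
SUPPLEMENTARY = 0x800


def primary_counts(flags):
    """Count primary, mapped primary and duplicate primary records."""
    counts = {"primary": 0, "mapped primary": 0, "duplicate primary": 0}
    for flag in flags:
        # Skip secondary and supplementary alignments.
        if flag & (SECONDARY | SUPPLEMENTARY):
            continue
        counts["primary"] += 1
        if not flag & UNMAPPED:
            counts["mapped primary"] += 1
        if flag & DUPLICATE:
            counts["duplicate primary"] += 1
    return counts
```

Restricting the counts to primary records means that a read with several
secondary or supplementary alignments is counted once.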
------------------------------------------------------------------------------
bcftools - changes v1.13
------------------------------------------------------------------------------
This release brings new options and significant changes in BAQ
parametrization in `bcftools mpileup`. The previous behaviour can be
triggered by providing the `--config 1.12` option. Please see
PR #1474 for details.
Changes affecting the whole of bcftools, or multiple commands:
* Improved build system
Changes affecting specific commands:
* bcftools annotate:
- Fix a rare bug that occurred when INFO/END is present, all INFO fields
are removed with `bcftools annotate -x INFO` and BCF output is produced.
The removed INFO/END then continued to determine the end coordinate,
causing incorrect retrieval of records with the -r option (#1483)
- Support for matching annotation line by ID, in addition to
CHROM,POS,REF, and ALT (#1461)
bcftools annotate -a annots.tab.gz -c CHROM,POS,~ID,REF,ALT,INFO/END input.vcf
* bcftools csq:
- When GFF and VCF/fasta use a different chromosome naming convention
(e.g. chrX vs X), no consequences would be added. The program now
attempts to detect these differences and removes or adds the "chr"
prefix to chromosome names so that the GFF and VCF/fasta match (#1507)
- Parametrize brief-predictions parameter to allow explicit number of
amino acids to be printed. Note that the `-b, --brief-predictions`
option is being replaced with `-B, --trim-protein-seq INT`
* bcftools +fill-tags:
- Generalization and better support for custom functions that allow
adding new INFO tags based on arbitrary `-i, --include` type of
expressions. For example, to calculate a missing INFO/DP annotation
from FORMAT/AD, it is possible to use: -t 'DP:1=int(sum(FORMAT/AD))'
Here the optional ":1" part specifies that a single value will be added (by
default Number=. is used) and the optional int(...) adds an integer value (by
default Type=Float is used).
- When FORMAT/GT is not present, the INFO/AF tag is now calculated
from INFO/AC and INFO/AN.
* bcftools gtcheck:
- Switch between FORMAT/GT or FORMAT/PL when one is (implicitly)
requested but only the other is available
- Improve diagnostics, printing warnings when a line cannot be matched
and the number of lines skipped for various reasons (#1444)
- Minor bug fix: with PLs being the default, the `--distinctive-sites`
option started to require an explicit `--error-probability 0`
* bcftools index:
- The program now accepts both data file name and the index file name.
This adds to user convenience when running index statistics (-n, -s)
* bcftools isec:
- Always generate sites.txt with isec -p (#1462)
* bcftools +mendelian:
- Consider only complete trios, do not crash on sample name typos (#1520)
* bcftools mpileup:
- New `--seed` option for reproducibility of subsampling code in HTSlib
- The SCR annotation which shows the number of soft-clipped reads now
correctly pools reads together regardless of the variant type.
Previously only reads with indels were included at indel sites.
- Major revamp of BAQ. Please see
https://github.com/samtools/bcftools/pull/1474 for details. The
previous behaviour can be triggered by providing the `--config 1.12`
option.
- Thanks to improvements in HTSlib, the removal of overlapping reads
(which can be disabled with the `-x, --ignore-overlaps` option) is
no longer systematically biased
(https://github.com/samtools/htslib/pull/1273)
- Modified scale of Mann-Whitney U tests. INFO/*Z annotations are now
printed instead, for example MQBZ replaces MQB.
* bcftools norm:
- Fix Type=Flag output in `norm --atomize` (#1472)
- Atomization must not discard ALT=. records
- Atomization of AD and QS tags now correctly updates occurrences of
duplicate alleles within different haplotypes
- Fix a bug in atomization of Number=A,R tags
* bcftools reheader:
- Add `-T, --temp-prefix` option
* bcftools +setGT:
- A wider range of genotypes can be set by the plugin by allowing
specifying custom genotypes. For example, to force a heterozygous
genotype it is now possible to use expressions like:
c:'m|M' c:0/1 c:0
* bcftools +split-vep:
- New `-u, --allow-undef-tags` option
- Better handling of ambiguous keys such as INFO/AF and CSQ/AD. The `-p,
--annot-prefix` option is now applied before doing anything else which
allows its use with `-f, --format` and `-c, --columns` options.
- Some consequence field names may not constitute a valid tag name, such
as "pos(1-based)". Field names are now trimmed to exclude brackets.
* bcftools +tag2tag:
- New --QR-QA-to-QS option to convert annotations generated by FreeBayes
to QS used by BCFtools
* bcftools +trio-dnm:
- Add support for sites with more than four alleles. Note that only the
four most frequent alleles are considered; the model remains unchanged.
Previously such sites were skipped.
- New --use-NAIVE option for a naive DNM calling based solely on
FORMAT/GT and expected Mendelian inheritance. This option is
suitable for prefiltering.
- Fix behaviour to match the documentation, the `--dnm-tag DNG` option now
correctly outputs log scaled values by default, not phred scaled.
- Fix bug in VAF calculation, homozygous de novo variants were
incorrectly reported as having VAF=50%
- Fix arithmetic underflow which could lead to imprecise scores and
improve sensitivity in high coverage regions
- Allow combining --pn and --pns to set the noise thresholds
independently
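The +fill-tags change above, which derives INFO/AF from INFO/AC and INFO/AN
when FORMAT/GT is absent, amounts to the following arithmetic. This is a
minimal Python sketch and not the plugin's code: the function name is
hypothetical, and returning None when AN is zero is an assumption made here
to avoid a division by zero.

```python
def allele_frequencies(ac, an):
    """Compute one AF value per ALT allele as AC / AN.

    ac: list of allele counts, one per ALT allele (INFO/AC)
    an: total number of called alleles (INFO/AN)
    """
    if an == 0:
        # No called alleles: the frequency is undefined in this sketch.
        return [None] * len(ac)
    return [count / an for count in ac]
```

For example, a site with AC=2,1 and AN=8 gives AF values of 0.25 and 0.125.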