Samtools (and HTSlib and BCFtools) version 1.14 is now available from
GitHub and SourceForge.
https://sourceforge.net/projects/samtools/
https://github.com/samtools/htslib/releases/tag/1.14
https://github.com/samtools/samtools/releases/tag/1.14
https://github.com/samtools/bcftools/releases/tag/1.14
The main changes are listed below:
------------------------------------------------------------------------------
htslib - changes v1.14
------------------------------------------------------------------------------
Features and Updates
--------------------
* Added a keep option to bgzip to leave the original file untouched. This
brings bgzip into line with gzip. (PR #1331, thanks to Alex Petty)
* "endpos" has been added to the filter language, giving the position of the
rightmost mapped base as measured by the CIGAR string. For unmapped reads
it is the same as "pos". (PR #1307, thanks to John Marshall)
* Interfaces have been added to interpret the new base modification tags
added to the SAMtags document in samtools/hts-specs#418. (PR #1132)
* New API functions hts_flush()/sam_flush()/bcf_flush() for flushing output
htsFile/samFile/vcfFile streams. (PR #1326, thanks to John Marshall)
* The synced_bcf_reader now sorts lines with symbolic alleles by END tag as
well as POS. (PR #1321)
* Added synced_bcf_reader options BCF_SR_REGIONS_OVERLAP and
BCF_SR_TARGETS_OVERLAP for better control of how records that start outside
the desired region but overlap it are handled. Fixes samtools/bcftools#1420
and samtools/bcftools#1421 raised by John Marshall. (PR #1327)
* HTSlib will now accept long-cigar CG:B: tags made by htsjdk which don't
quite follow the specification properly (using signed values instead of
unsigned). Thanks to Colin Diesh for reporting an example file. (PR #1317)
* The warning printed when the BGZF reader finds a file with no EOF block
has been changed to be less alarming. Unfortunately some third-party
BGZF encoders don't write EOF blocks at the end of files. Thanks to
Keiran Raine for reporting an example file. (PR #1323)
* The FASTA and FASTQ readers get an option to skip over the first item on
the header line, and use the second as the read name. It allows the
original name to be restored on some of the fastq files served from the
European Nucleotide Archive (ENA). (PR #1325)
* HTSlib is now stricter when parsing the VCF samples line (beginning
#CHROM). It will only accept tabs between the mandatory field names, and
sample names must also be separated with tabs. (PR #1328)
* HTSlib will now warn if it looks like the header has been corrupted by
diagnostic messages from the program that made it. This can happen when
using `nohup`, which by default mixes stdout and stderr into the same
stream. (PR #1339, thanks to John Marshall)
* File format detection will now recognise signatures for XZ, Zstd and D4
files (note that HTSlib will not read them yet). (PR #1340, thanks to
John Marshall)
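
As an illustration of the "endpos" filter term described above, the following
is a minimal Python sketch of how the rightmost mapped base can be derived
from POS and the CIGAR string. It is not HTSlib code: the function name
endpos() and its arguments are hypothetical, and it assumes 1-based positions
and the reference-consuming CIGAR operations (M, D, N, = and X) from the SAM
specification.

```python
import re

# CIGAR operations that consume reference bases (SAM specification).
REF_CONSUMING = set("MDN=X")


def endpos(pos, cigar, unmapped=False):
    """Return the 1-based position of the rightmost mapped base.

    Hypothetical helper for illustration only; not part of HTSlib.
    For unmapped reads (or a missing CIGAR) the result equals pos.
    """
    if unmapped or cigar == "*":
        return pos
    # Sum the lengths of all operations that advance along the reference.
    ref_len = sum(
        int(length)
        for length, op in re.findall(r"(\d+)([MIDNSHP=X])", cigar)
        if op in REF_CONSUMING
    )
    # A CIGAR that consumes no reference bases leaves endpos at pos.
    return pos + ref_len - 1 if ref_len > 0 else pos


if __name__ == "__main__":
    print(endpos(100, "10M"))            # ten matched bases from position 100
    print(endpos(100, "5S10M2D3M1I4M"))  # soft clips and insertions are ignored
    print(endpos(100, "*", unmapped=True))
```

Soft clips, hard clips, insertions and padding do not move the end position
because they do not consume reference bases.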
Build changes
-------------
These are compiler, configuration and makefile based changes.
* Some redundant tests have been removed from the test harness, speeding it
up. (PR #1308)
* The version.sh script now works better on shallow checkouts. (PR #1324)
* A check-untracked Makefile target has been added to catch untracked files
(mostly) left by the test harness. (PR #1324)
Bug fixes
---------
* Fixed a case where flushing the thread pool could very occasionally cause a
deadlock. (PR #1309)
* Fixed a bug where some CRAM files could fail to decode if the
required_fields option was in use. Thanks to Matt Sexton for reporting
the issue. (PR #1314, fixes samtools/samtools#1475)
* Fixed a regression where the S3 plugin could not read public files
unless you supplied some Amazon credentials. Thanks to Chris Saunders
for reporting. (PR #1332, fixes samtools/samtools#1491)
* Fixed a possible CRAM thread deadlock discovered by @ryancaicse. (PR #1330,
fixes #1329)
* Some set-but-unused variables have been removed. (PR #1334)
* Fixed a bug which prevented "flag.read2" from working in the filter
language unless it was at the end of the expression. Thanks to
Vamsi Kodali for reporting the issue. (PR #1342)
* Fixed a memory leak that could happen if CRAM fails to inflate an LZMA
block. (PR #1340, thanks to John Marshall)
------------------------------------------------------------------------------
samtools - changes v1.14
------------------------------------------------------------------------------
Notice:
* Samtools mpileup VCF and BCF output (deprecated in release 1.9) will be
removed in the next release. Please use bcftools mpileup instead.
New work and changes:
* The legacy samtools API (libbam.a, bam_endian.h, sam.h and most of bam.h)
has been removed. We recommend coding against the HTSlib API directly.
The legacy API had not been actively maintained since 2015. (#1483)
* New "samtools samples" command to list the samples used in a SAM/BAM/CRAM
file. (#1432; thanks to Pierre Lindenbaum)
* "mpileup" now supports base modifications via the SAM Mm/MM auxiliary tag.
Please see the "--output-mods" option. (#1311)
* Added "mpileup --output-BP-5" option to output the BP field in 5' to 3'
order instead of left to right. (#1484; fixes #1481)
* Added "samtools view --rf" option as an additional FLAG filtering method.
This keeps records only if (FLAG & N) != 0. (#1508; fixes #1470)
* New "samtools import -N" option to use the second word on a FASTQ header
line, matching the SRA/ENA FASTQ variant. (#1485)
* Improved the "view -x" option to simplify specifying multiple tags, and
added the reverse "--keep-tag" option to include rather than exclude. (#516)
* Switched the processing order of "view" -x (tag filtering) and -e
(expression) handling. Expressions now happen first so we can filter
on tags which are about to be deleted. This is now consistent with
the "view -d" behaviour too. (#1480; fixes #1476. Reported by
William Rowell)
* Added filter expression "endpos" keyword. (#1464. Thanks to
John Marshall)
* "samtools view" errors now appear after any SAM output, improving their
visibility. (#1490. Thanks to John Marshall)
* Improved "samtools sort" use of temporary files, both tidying up if it
fails and recovery when facing pre-existing temporary files. (#1510; fixes
#1035, #1503. Reported by Vivek Rai and Maarten Kooyman)
* Filtering in "samtools markdup" now sets the UNMAP BAM flag when given the
"-p" option. (#1512; fixes #1469)
* Make CRAM references shared during "samtools merge" so merging many files
has a lower memory usage. (#471)
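
To make the "samtools view --rf" test above concrete, here is a small Python
sketch that contrasts it with the existing -f and -F filters. The function
names are hypothetical and this is not samtools code. It only restates the
bitwise conditions: --rf keeps a record if any of the requested bits is set,
-f requires all of them, and -F requires none.

```python
def keep_rf(flag, n):
    """--rf N: keep the record if at least one bit of N is set in FLAG."""
    return (flag & n) != 0


def keep_f(flag, n):
    """-f N: keep the record only if every bit of N is set in FLAG."""
    return (flag & n) == n


def keep_F(flag, n):
    """-F N: keep the record only if no bit of N is set in FLAG."""
    return (flag & n) == 0


if __name__ == "__main__":
    READ1, READ2 = 0x40, 0x80
    mask = READ1 | READ2
    flag = 99  # paired, proper pair, mate reverse strand, first in pair
    print(keep_rf(flag, mask), keep_f(flag, mask), keep_F(flag, mask))
```

With the mask READ1|READ2, a first-in-pair record passes --rf but not -f,
since only one of the two bits is set.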
Bug fixes:
* Prevent "samtools depth" from closing stdout when outputting to terminal,
avoiding a bad interaction with PySam. (#1465. Thanks to John Marshall)
* In-place "samtools reheader" now works on CRAMs produced using a higher
than default compression level. (#1479)
* Fix setting of the dt tag in "markdup". Optical duplicates were being
marked too early, negating the tagging and counting elsewhere. (#1487;
fixes #1486. Reported by Kevin Lewis)
* Reinstate the "samtools stats -I" option to filter by sample. (#1496;
fixes #1489. Reported by Matthias Bernt)
* Fix "samtools fastq" handling of dual index tags on single-ended input.
(#1474)
* Improve "samtools coverage" documentation. (#1521; fixes #1504. Reported
by Peter Menzel)
Non user-visible changes and build improvements:
* Replace Curses mvprintw() with va_list-based equivalent. (#1509. Thanks to
John Marshall and Andreas Tille)
* Fixed some clang-13 warning messages. (#1506)
* Improve quoting of options in "samtools import" tests. (#1466. Thanks to
John Marshall)
* Fixed a faulty test which caused test harness failures on NetBSD. (#1520)
------------------------------------------------------------------------------
bcftools - changes v1.14
------------------------------------------------------------------------------
Changes affecting the whole of bcftools, or multiple commands:
* New `--regions-overlap` and `--targets-overlap` options which address a
long-standing design problem with subsetting VCF files by region.
BCFtools recognizes two sets of options, one for streaming (`-t/-T`) and
one for index-jumping (`-r/-R`). They behave differently: the first
includes only records with POS coordinate within the regions, the other
also includes records that overlap the regions. The two new options allow
the default behaviour to be modified; see the man page for more details.
* The `--output-type` option can be used to override the default compression
level
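
The difference between the two region-matching behaviours described above can
be sketched in Python as follows. This is a simplified illustration, not
BCFtools code: the function names are hypothetical, coordinates are 1-based
and inclusive, and the extent of a record is assumed to be given by the length
of its REF allele only (the real tools may also take INFO/END into account).

```python
def pos_in_region(pos, start, end):
    """Streaming-style test: only POS itself must lie inside the region."""
    return start <= pos <= end


def record_overlaps_region(pos, ref, start, end):
    """Index-jumping-style test: any base covered by REF may overlap."""
    record_end = pos + len(ref) - 1
    return pos <= end and record_end >= start


if __name__ == "__main__":
    # A deletion whose POS lies just before the region 100-200 but whose
    # REF allele extends into it.
    pos, ref = 98, "ACGTA"
    print(pos_in_region(pos, 100, 200))
    print(record_overlaps_region(pos, ref, 100, 200))
```

In this example the record is excluded by the POS-only test but included by
the overlap test, which is the discrepancy the new options let the user
control.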
Changes affecting specific commands:
* bcftools annotate
- when `--set-id` and `--remove` are combined, `--set-id` cannot use tags
deleted by `--remove`. This is now detected and the program exits with
an informative error message instead of segfaulting (#1540)
- while non-symbolic variants are uniquely identified by
POS,REF,ALT, symbolic alleles starting at the same position were
indistinguishable. This prevented correct matching of records with
the same positions and variant type but different length given by
INFO/END (samtools/htslib@60977f2). When annotating from a VCF/BCF,
the matching is done automatically. When annotating from a
tab-delimited text file, this feature can be invoked by using `-c
INFO/END`.
- add a new '.' modifier to control whether missing values should be
carried over from a tab-delimited file or not. For example:
-c TAG .. adds TAG if the source value is not missing. If TAG exists
in the target file, it will be overwritten
-c .TAG .. adds TAG even if the source value is missing. This can
overwrite non-missing values with a missing value and can create empty
VCF fields (`TAG=.`)
* bcftools +check-ploidy
- by default missing genotypes are not used when determining ploidy.
With the new option `-m, --use-missing` it is possible to use the
information carried in the missing and half-missing genotypes (e.g.
".", "./." or "./1")
* bcftools concat
- new `--ligate-force` and `--ligate-warn` options for finer control
of `-l, --ligate` behaviour in imperfect overlaps. The new default
is to throw an error when sites present in one chunk but absent in
the other are encountered. To drop such sites and proceed, use the
new `--ligate-warn` option (previously this was the default). To
keep such sites, use the new `--ligate-force` option (#1567).
* bcftools consensus:
- Apply mask even when the VCF has no notion about the chromosome. It
was possible to encounter this problem when `contig` lines were not
present in the VCF header and no variants were called on that
chromosome (#1592)
* bcftools +contrast:
- support for chunking within a map/reduce framework, allowing NASSOC
counts to be collected even for empty case/control sample sets (#1566)
* bcftools csq:
- bug fix, compound indels were not recognised in some cases (#1536)
- compound variants were incorrectly marked as 'inframe' even when stop
codon would occur before the frame was restored (#1551)
- bug fix, FORMAT/BCSQ bitmasks could have been assigned incorrectly
to some samples at multiallelic sites, a superset of the correct
consequences would have been set (#1539)
- bug fix, the upstream stop could be falsely assigned to all samples in
a multi-sample VCF even if the stop was relevant for a single sample
only (#1578)
- further improve the detection of mismatching chromosome naming (e.g.
"chrX" vs "X") in the GFF, VCF and fasta files
* bcftools merge:
- keep (sum) INFO/AN,AC values when merging VCFs with no samples (#1394)
* bcftools mpileup:
- new --indel-size option which allows the maximum considered indel size
to be increased; large deletions in long read data are otherwise lost.
* bcftools norm:
- atomization now supports Number=A,R string annotations (#1503)
- assign as many alternate alleles to genotypes at multiallelic sites in
the `-m +` mode, disregarding the phase. Previously the program assumed
it was being executed as an inverse operation of `-m -`, but when that was
not the case, reference alleles would have been filled instead of multiple
alternate alleles (#1542)
* bcftools sort:
- increase accuracy of the --max-mem option limit, previously the limit
could be exceeded by more than 20% (#1576)
* bcftools +trio-dnm:
- new `--with-pAD` option to allow processing of VCFs without FORMAT/QS.
The existing `--ppl` option was changed to the analogous `--with-pPL`
* bcftools view:
- the functionality of the option --compression-level lost in 1.12 has
been restored
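
The effect of the new '.' modifier for "bcftools annotate -c", listed under
bcftools annotate above, can be modelled with a short Python sketch. This is
not BCFtools code: the function annotate_tag() and the dict-based INFO
representation are hypothetical and only illustrate the rule that `-c TAG`
skips missing source values while `-c .TAG` carries them over.

```python
MISSING = "."


def annotate_tag(target_info, tag, source_value, carry_missing=False):
    """Return a copy of target_info with tag set from source_value.

    carry_missing=False models `-c TAG`; carry_missing=True models `-c .TAG`.
    Hypothetical helper for illustration only.
    """
    result = dict(target_info)
    if source_value != MISSING or carry_missing:
        # Existing values in the target are overwritten.
        result[tag] = source_value
    return result


if __name__ == "__main__":
    target = {"TAG": "5"}
    print(annotate_tag(target, "TAG", "7"))
    print(annotate_tag(target, "TAG", MISSING))
    print(annotate_tag(target, "TAG", MISSING, carry_missing=True))
```

The last call shows how `-c .TAG` can replace a non-missing value with a
missing one, giving `TAG=.` in the output.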