Samtools (and HTSlib and BCFtools) version 1.9 is now available from
GitHub and SourceForge:

https://sourceforge.net/projects/samtools/

https://github.com/samtools/htslib/releases/tag/1.9
https://github.com/samtools/samtools/releases/tag/1.9
https://github.com/samtools/bcftools/releases/tag/1.9

The main changes are listed below:

------------------------------------------------------------------------------
htslib - changes v1.9
------------------------------------------------------------------------------

* If `./configure` fails, `make` will stop working until either configure is
  re-run successfully, or `make distclean` is used.  This makes configuration
  failures more obvious.  (#711, thanks to John Marshall)

* The default SAM version has been changed to 1.6.  This is in line with the
  latest version specification and indicates that HTSlib supports the CG tag
  used to store long CIGAR data in BAM format.

* New bgzip integrity check option '--test'. (#682, thanks to @sd4B75bJ,
  @jrayner)

* Faidx can now index fastq files as well as fasta.  The fastq index adds an
  extra column to the `.fai` index which gives the offset to the quality
  values.  New interfaces have been added to `htslib/faidx.h` to read the
  fastq index and retrieve the quality values.  It is possible to open a
  fastq index as if fasta (only sequences will be returned), but not the
  other way round. (#701)

* New API interfaces to add or update integer, float and array aux tags.
  (#694)

* Add `level=<number>` option to `hts_set_opt()` to allow the compression
  level to be set.  Setting `level=0` enables uncompressed output. (#715)

* Improved bgzip error reporting.

* Better error reporting when CRAM reference files can't be opened. (#706)

* Fixes to make tests work properly on Windows/MinGW - mainly to handle line
  ending differences. (#716)

* Efficiency improvements:

  - Small speed-up for CRAM indexing.

  - Reduce the number of unnecessary wake-ups in the thread pool. (#703)

  - Avoid some memory copies when writing data, notably for uncompressed BGZF
    output. (#703)

* Bug fixes:

  - Fix multi-region iterator bugs on CRAM files. (#684)

  - Fixed multi-region iterator bug that caused some reads to be skipped
    incorrectly when reading BAM files. (#687)

  - Fixed synced_bcf_reader() bug when reading contigs multiple times. (#691,
    reported by @freeseek)

  - Fixed bug where bcf_hdr_set_samples() did not update the sample
    dictionary when removing samples. (#692, reported by @freeseek)

  - Fixed bug where the VCF record ref length was calculated incorrectly if
    an INFO END tag was present. (71b00a)

  - Fixed warnings found when compiling with gcc 8.1.0. (#700)

  - sam_hdr_read() and sam_hdr_write() will now return an error code if
    passed a NULL file pointer, instead of crashing.

  - Fixed possible negative array look-up in sam_parse1() that somehow
    escaped previous fuzz testing. (#731, reported by @fCorleone)

  - Fixed bug where CRAM range queries could incorrectly report an error when
    using multiple threads. (#734, reported by Brent Pedersen)

  - Fixed very rare rANS normalisation bug that could cause an assertion
    failure when writing CRAM files.  (#739, reported by @carsonhh)
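
To illustrate the fastq index layout described in the Faidx item above, here
is a minimal Python sketch. It is not HTSlib code: the function name
`index_fastq` and the sample data are hypothetical, and it assumes four-line
FASTQ records whose sequence and quality strings are not wrapped. Each row
holds the usual `.fai` columns (name, length, sequence offset, bases per line,
bytes per line) followed by the extra quality offset column.

```python
def index_fastq(data: bytes):
    """Build .fai-style rows for FASTQ data made of four-line records.

    Each row is (name, length, seq_offset, line_bases, line_width, qual_offset).
    Assumption: sequence and quality each occupy exactly one line.
    """
    rows = []
    offset = 0  # byte offset of the current record's '@' header line
    lines = data.splitlines(keepends=True)
    for i in range(0, len(lines) - 3, 4):
        header, seq, plus, qual = lines[i:i + 4]
        # The name is the first whitespace-delimited word after '@'.
        name = header[1:].split()[0].decode()
        seq_offset = offset + len(header)
        # The quality string starts after the sequence line and the '+' line.
        qual_offset = seq_offset + len(seq) + len(plus)
        bases = len(seq.rstrip(b"\r\n"))
        rows.append((name, bases, seq_offset, bases, len(seq), qual_offset))
        offset = qual_offset + len(qual)
    return rows


if __name__ == "__main__":
    fastq = b"@r1\nACGT\n+\nIIII\n@r2 desc\nGG\n+\n##\n"
    for row in index_fastq(fastq):
        print("\t".join(str(field) for field in row))
```

Reading only the first five columns of each row treats the index as a fasta
index, which is why sequences can still be fetched that way while qualities
cannot.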

------------------------------------------------------------------------------
samtools - changes v1.9
------------------------------------------------------------------------------

 * Samtools mpileup VCF and BCF output is now deprecated.  It is still
   functional, but will warn.  Please use bcftools mpileup instead. (#884)

 * Samtools mpileup now handles the '-d' max_depth option differently.  There
   is no longer an enforced minimum, and '-d 0' is interpreted as limitless
   (no maximum; be warned that this may be slow).  The default per-file depth
   is now 8000, which matches the value mpileup used to use when processing a
   single sample.  To get the previous default behaviour, set '-d' to the
   higher of 8000 divided by the number of samples across all input files,
   or 250.  (#859)

 * Samtools stats new features:

   - The '--remove-overlaps' option discounts overlapping portions of
     templates when computing coverage and counting mapped bases. (#855)

   - When a target file is in use, the number of bases inside the target is
     printed, along with the percentage of target bases whose coverage is
     above the threshold given by the '--cov-threshold' option. (#855)

   - Split base composition and length statistics by first and last reads.
     (#814, #816)

 * Samtools faidx new features:

   - Now takes long options. (#509, thanks to Pierre Lindenbaum)

   - Now warns when a returned sequence is zero-length or truncated because
     the requested range extends beyond the end of the sequence. (#834)

   - New '--continue' option that allows it to carry on when a requested
     sequence is not in the index. (#834)

   - It is now possible to supply the list of regions to output in a text
     file using the new '--region-file' option. (#840)

   - New '-i' option to make faidx return the reverse complement of the
     regions requested. (#878)

   - faidx now works on FASTQ (returning FASTA), and a new fqidx command has
     been added to index and return FASTQ. (#852)

 * Samtools collate now has a fast option '-f' that only operates on primary
   pairs, dropping secondary and supplementary.  It tries to write pairs to
   the final output file as soon as both reads have been found. (#818)

 * Samtools bedcov gets a new '-j' option to make it ignore deletions (D) and
   reference skips (N) when computing coverage. (#843)

 * Small speed-up to samtools coordinate sort, by converting it to use radix
   sort. (#835, thanks to Zhuravleva Aleksandra)

 * Samtools idxstats now works on SAM and CRAM files; however, this is not
   fast because some of the required information is lacking from the
   indices. (#832)

 * Compression levels may now be specified with the level=N
   output-fmt-option, e.g. '-O bam,level=3'.

 * Various documentation improvements.

 * Bug-fixes:

   - Improved error reporting in several places. (#827, #834, #877, cd7197)

   - Various test improvements.

   - Fixed failures in the multi-region iterator (view -M) when regions
     provided via BED files include overlaps (#819, reported by Dave Larson).

   - Samtools stats now counts '=' and 'X' CIGAR operators when counting
     mapped bases. (#855)

   - Samtools stats has fixes for insert size filtering (-m, -i). (#845; #697
     reported by Soumitra Pal)

   - Samtools stats -F no longer negates an earlier -d option. (#830)

   - Fix samtools stats crash when using a target region. (#875, reported by
     John Marshall)

   - Samtools sort now keeps to a single thread when the -@ option is absent.
     Previously it would spawn a writer thread, which could cause the CPU
     usage to go slightly over 100%. (#833, reported by Matthias Bernt)

   - Fixed samtools phase '-A' option which was incorrectly defined to take a
     parameter. (#850; #846 reported by Dianne Velasco)

   - Fixed compilation problems when using C_INCLUDE_PATH. (#870; #817
     reported by Robert Boissy)

   - Fixed --version when built from a Git repository. (#844, thanks to John
     Marshall)

   - Use noenhanced mode for title in plot-bamstats.  Prevents unwanted
     interpretation of characters like underscore in gnuplot version 5.
     (#829, thanks to M. Zapukhlyak)

   - blast2sam.pl now reports perfect match hits (no indels or mismatches).
     (#873, thanks to Nils Homer)

   - Fixed bug in fasta and fastq subcommands where stdout would not be
     flushed correctly if the -0 option was used.

   - Fixed invalid memory access in mpileup and depth on alignment records
     where the sequence is absent.
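
As a worked example of the mpileup '-d' item above, the following Python
sketch computes the value to pass to '-d' to reproduce the previous default.
The function name `previous_default_depth` is hypothetical, and the use of
integer division is an assumption; the rule itself (the higher of 8000 divided
by the number of samples, or 250) is taken from the text.

```python
def previous_default_depth(n_samples: int) -> int:
    """Per-file depth that mimics the pre-1.9 mpileup default.

    Rule from the release notes: the higher of 8000 divided by the number
    of samples across all input files, or 250.
    Assumption: the division is rounded down to a whole number.
    """
    if n_samples < 1:
        raise ValueError("n_samples must be at least 1")
    return max(8000 // n_samples, 250)


if __name__ == "__main__":
    for n in (1, 4, 100):
        print(f"{n} sample(s): -d {previous_default_depth(n)}")
```

With a single sample this gives 8000, the same as the new default, so the
difference only shows up when several samples are processed together.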

------------------------------------------------------------------------------
bcftools - changes v1.9
------------------------------------------------------------------------------

* `annotate`

    - REF and ALT columns can now be transferred from the annotation file.

    - fixed bug when setting vector_end values.

* `consensus`

    - new -M option to control output at missing genotypes

    - variants immediately following insertions are no longer skipped.  Note,
      however, that the current fix requires normalized VCF and may still
      falsely skip variants adjacent to multiallelic indels.

    - bug fixed in -H selection handling

* `convert`

    - the --tsv2vcf option now makes the missing genotypes diploid, "./."
      instead of "."

    - the behavior of -i/-e with --gvcf2vcf has changed. Previously only
      sites with FILTER set to "PASS" or "." were expanded and the -i/-e
      options dropped sites completely. The new behavior is to let the
      -i/-e options control which records will be expanded. In order to
      drop records completely, one can stream through "bcftools view"
      first.

* `csq`

    - since the real consequences of start/splice events are not known, the
      amino acid positions at subsequent variants now stay unchanged

    - add `--force` option to skip malformatted transcripts in GFFs with
      out-of-phase CDS exons.

* `+dosage`: output all alleles and all their dosages at multiallelic sites

* `+fixref`: fix serious bug in -m top conversion

* `-i/-e` filtering expressions:

    - add two-tailed binomial test

    - add functions N_PASS() and F_PASS()

    - add support for lists of samples in filtering expressions, because with
      many samples it was impractical to list them all on the command line.
      Samples can now be given in a file, e.g., GT[@samples.txt]="het"

    - allow multiple perl functions in the expressions and some bug fixes

    - fix a parsing problem, '@' was not removed from '@filename' expressions

* `mpileup`: fixed bug where, if samples were renamed using the `-G`
  (`--read-groups`) option, some samples could be omitted from the output
  file.

* `norm`: update INFO/END when normalizing indels

* `+split`: new -S option to subset samples and to use custom file names
  instead of the defaults

* `+smpl-stats`: new plugin

* `+trio-stats`: new plugin

* Fixed build problems with non-functional configure script produced on some
  platforms
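
The `norm` item above concerns keeping INFO/END consistent when an indel
moves. The Python sketch below is not bcftools code; it only shows the
relationship the VCF specification gives for precise variants, END = POS +
length of REF - 1, which is why a record whose POS or REF changes during
normalization needs its END recomputed. The function name `info_end` and the
example records are hypothetical.

```python
def info_end(pos: int, ref: str) -> int:
    """1-based END coordinate of a precise variant: POS + len(REF) - 1."""
    if pos < 1 or not ref:
        raise ValueError("pos must be >= 1 and ref must be non-empty")
    return pos + len(ref) - 1


if __name__ == "__main__":
    # Hypothetical deletion before and after being shifted four bases left.
    before = {"POS": 105, "REF": "TCA", "ALT": "T"}
    after = {"POS": 101, "REF": "GCA", "ALT": "G"}
    for label, rec in (("before", before), ("after", after)):
        print(label, rec["POS"], rec["REF"], rec["ALT"],
              "END=%d" % info_end(rec["POS"], rec["REF"]))
```

A stale END left over from the original position would no longer match the
span of the shifted REF allele.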


Rob Davies              r...@sanger.ac.uk
The Sanger Institute    http://www.sanger.ac.uk/
Hinxton, Cambs.,        Tel. +44 (1223) 834244
CB10 1SA, U.K.          Fax. +44 (1223) 494919

