On Wed, 8 Nov 2017, Tommy Carstensen wrote:

To samtools-help,

1) I am trying to convert a CRAM file to BAM with samtools view v1.5, but
for some of the files I eventually get the error below, whereas others are
converted successfully:
Block CRC32 failure
[main_samview] truncated file.

Has anyone had this problem? I am quite sure the files are not 
truncated/corrupted. How do people usually check that?

2) Furthermore, when typing the command below:
samtools view -H ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/data_collections/gambian_genome_variation_project/data/FULA/SC_GMFUL5306338/alignment/SC_GMFUL5306338.alt_bwamem_GRCh38DH.20151208.FULA.gambian_lowcov.cram

Then I get this error:
[E::hts_hopen] Failed to open file ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/data_collections/gambian_genome_variation_project/data/FULA/SC_GMFUL5306338/alignment/SC_GMFUL5306338.alt_bwamem_GRCh38DH.20151208.FULA.gambian_lowcov.cram
[E::hts_open_format] Failed to open file ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/data_collections/gambian_genome_variation_project/data/FULA/SC_GMFUL5306338/alignment/SC_GMFUL5306338.alt_bwamem_GRCh38DH.20151208.FULA.gambian_lowcov.cram
samtools view: failed to open "ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/data_collections/gambian_genome_variation_project/data/FULA/SC_GMFUL5306338/alignment/SC_GMFUL5306338.alt_bwamem_GRCh38DH.20151208.FULA.gambian_lowcov.cram" for reading: Exec format error

It looks like the file is corrupt:

curl 'ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/data_collections/gambian_genome_variation_project/data/FULA/SC_GMFUL5306338/alignment/SC_GMFUL5306338.alt_bwamem_GRCh38DH.20151208.FULA.gambian_lowcov.cram' | hexdump -C | head
  % Total    % Received % Xferd  Average Speed   Time    Time     Time   Current
                                 Dload  Upload   Total   Spent    Left   Speed
  0 9972M    0 99.3M    0     0  12.6M      0  0:13:07  0:00:07  0:13:00 9784k
00000000  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
*
066334b0  00 00 00 00 00 00 00 00  00 00 00 00 e5 7a 59 35  |.............zY5|
066334c0  e2 58 f8 07 bc 4e 2b 2d  c7 c7 81 c5 9d 56 aa 27  |.X...N+-.....V.'|
066334d0  62 fb 82 89 05 77 99 23  df a9 f9 c9 8c 0f 67 c4  |b....w.#......g.|
066334e0  c4 2d 8e 2f 67 0a a8 1d  79 1f f5 ef a0 cd ca 71  |.-./g...y......q|
066334f0  a6 b9 24 99 6b b4 95 20  46 8f d5 0b c8 aa 40 bb  |..$.k.. F.....@.|
06633500  05 05 d5 83 f4 8a 2b 86  86 4b 5b da cc 27 9c 8d  |......+..K[..'..|
06633510  77 ca 1f 32 67 2d 14 62  99 90 21 bc 71 0a b2 5b  |w..2g-.b..!.q..[|
06633520  40 a2 bb a9 2e a2 2c df  5f 16 b8 83 f7 c3 0c 9a  |@.....,._.......|

That's a lot of zeros to find at the beginning of a CRAM file.
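
If you want to run the same check over many files without eyeballing hexdump output, something like the Python sketch below might do. It relies only on the fact that a CRAM file starts with the four ASCII bytes "CRAM". The function name inspect_start and the chunk size are made up for illustration; this is not part of samtools or htslib.

```python
import io

CRAM_MAGIC = b"CRAM"  # a CRAM file starts with these four ASCII bytes


def inspect_start(stream, chunk_size=1 << 20):
    """Return (has_magic, leading_zeros) for a binary stream.

    has_magic     -- True if the stream starts with b"CRAM"
    leading_zeros -- number of 0x00 bytes before the first non-zero byte
    """
    first = stream.read(len(CRAM_MAGIC))
    if first == CRAM_MAGIC:
        return True, 0

    leading_zeros = 0
    chunk = first
    while chunk:
        stripped = chunk.lstrip(b"\x00")
        leading_zeros += len(chunk) - len(stripped)
        if stripped:  # reached the first non-zero byte
            break
        chunk = stream.read(chunk_size)
    return False, leading_zeros


# Tiny in-memory demonstration: 12 zero bytes followed by some data.
demo = io.BytesIO(b"\x00" * 12 + b"\xe5\x7a\x59\x35")
print(inspect_start(demo))  # prints (False, 12)
```

For a real file, open it in binary mode (open(path, "rb")) and pass the file object in. A long run of leading zeros, as in the dump above, means the start of the file has been overwritten or was never written properly.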

The bizarre "Exec format error" message is a result of htslib abusing the standard Unix errno codes to pass back its own error conditions; "Exec format error" is the standard text for ENOEXEC. In this case it couldn't work out what sort of file it was trying to open.
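
To see where the wording comes from, you can ask the C library for the text attached to that error code. This is only a small Python illustration of the errno-to-message mapping. That htslib used ENOEXEC here is inferred from the message above, not checked against the source.

```python
import errno
import os

# ENOEXEC is a standard errno value; the C library supplies its message text.
code = errno.ENOEXEC
message = os.strerror(code)
print(code, message)
```

The message text belongs to the operating system, so samtools simply prints whatever the C library associates with the errno value it was handed.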

The "Block CRC32 failure" is likely to be another corruption of some sort. We really need to make the software print out the file position when this happens, as it would make tracking down exactly where the problem is much easier. In this case, as long as the header is intact (and you have an index for the file), you may be able to rescue the data after the corruption by using range queries to jump to parts of the file beyond the broken bit.
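
A rough Python sketch of that rescue approach is below. It assumes samtools is on your PATH, that the CRAM has an intact header and a .crai index next to it, and that the region names you pass match the reference names in the header. The function names, the output prefix and the example regions are hypothetical. Any region that overlaps the damaged block will still fail, and the sketch just records which ones did.

```python
import subprocess


def rescue_commands(cram_path, regions, out_prefix="rescued"):
    """Build one `samtools view` range query per region (nothing is run here)."""
    commands = []
    for region in regions:
        # Make the region safe to use in a file name, e.g. chr3:1-1000.
        out_path = f"{out_prefix}.{region.replace(':', '_')}.bam"
        commands.append(
            ["samtools", "view", "-b", "-o", out_path, cram_path, region]
        )
    return commands


def run_rescue(cram_path, regions, out_prefix="rescued"):
    """Run each range query; return {region: True if samtools exited with 0}."""
    results = {}
    for cmd in rescue_commands(cram_path, regions, out_prefix):
        region = cmd[-1]
        proc = subprocess.run(cmd, stderr=subprocess.PIPE)
        results[region] = proc.returncode == 0
    return results


# Show the commands that would be run for two example regions.
for cmd in rescue_commands("in.cram", ["chr2", "chr3:1-1000"]):
    print(" ".join(cmd))
```

Regions that come back as failed can then be narrowed down with smaller ranges to find roughly where the broken part of the file sits.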

Rob Davies              r...@sanger.ac.uk
The Sanger Institute    http://www.sanger.ac.uk/
Hinxton, Cambs.,        Tel. +44 (1223) 834244
CB10 1SA, U.K.          Fax. +44 (1223) 494919

