On Wed, 8 Nov 2017, Tommy Carstensen wrote:
> To samtools-help,
>
> 1) I am trying to convert a cram to bam with samtools view v1.5, but eventually
> I get the error below for some of the files, whereas others are successfully
> converted:
>
> Block CRC32 failure
> [main_samview] truncated file.
>
> Has anyone had this problem? I am quite sure the files are not
> truncated/corrupted. How do people usually check that?
>
> 2) Furthermore, when typing the command below:
>
> samtools view -H ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/data_collections/gambian_genome_variation_project/data/FULA/SC_GMFUL5306338/alignment/SC_GMFUL5306338.alt_bwamem_GRCh38DH.20151208.FULA.gambian_lowcov.cram
>
> Then I get this error:
>
> [E::hts_hopen] Failed to open file ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/data_collections/gambian_genome_variation_project/data/FULA/SC_GMFUL5306338/alignment/SC_GMFUL5306338.alt_bwamem_GRCh38DH.20151208.FULA.gambian_lowcov.cram
> [E::hts_open_format] Failed to open file ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/data_collections/gambian_genome_variation_project/data/FULA/SC_GMFUL5306338/alignment/SC_GMFUL5306338.alt_bwamem_GRCh38DH.20151208.FULA.gambian_lowcov.cram
> samtools view: failed to open "ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/data_collections/gambian_genome_variation_project/data/FULA/SC_GMFUL5306338/alignment/SC_GMFUL5306338.alt_bwamem_GRCh38DH.20151208.FULA.gambian_lowcov.cram" for reading: Exec format error

It looks like the file is corrupt:

curl 'ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/data_collections/gambian_genome_variation_project/data/FULA/SC_GMFUL5306338/alignment/SC_GMFUL5306338.alt_bwamem_GRCh38DH.20151208.FULA.gambian_lowcov.cram' | hexdump -C | head
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0 9972M    0 99.3M    0     0  12.6M      0  0:13:07  0:00:07  0:13:00 9784k
00000000  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
*
066334b0  00 00 00 00 00 00 00 00  00 00 00 00 e5 7a 59 35  |.............zY5|
066334c0  e2 58 f8 07 bc 4e 2b 2d  c7 c7 81 c5 9d 56 aa 27  |.X...N+-.....V.'|
066334d0  62 fb 82 89 05 77 99 23  df a9 f9 c9 8c 0f 67 c4  |b....w.#......g.|
066334e0  c4 2d 8e 2f 67 0a a8 1d  79 1f f5 ef a0 cd ca 71  |.-./g...y......q|
066334f0  a6 b9 24 99 6b b4 95 20  46 8f d5 0b c8 aa 40 bb  |..$.k.. F.....@.|
06633500  05 05 d5 83 f4 8a 2b 86  86 4b 5b da cc 27 9c 8d  |......+..K[..'..|
06633510  77 ca 1f 32 67 2d 14 62  99 90 21 bc 71 0a b2 5b  |w..2g-.b..!.q..[|
06633520  40 a2 bb a9 2e a2 2c df  5f 16 b8 83 f7 c3 0c 9a  |@.....,._.......|
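
That's a lot of zeros to find at the beginning of a CRAM file. The first
0x066334bc bytes (about 107 MB) are all zero, whereas a valid CRAM file
should begin with the magic bytes "CRAM" followed by the format version.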
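A minimal sketch of the same check without piping the whole file through hexdump. The helper names cram_magic_ok and cram_version are hypothetical (they are not part of samtools); they rely only on the CRAM format rule that a file starts with the four bytes "CRAM" followed by one byte each for the major and minor version.

```shell
# Hypothetical helpers, not part of samtools.
# A CRAM file starts with "CRAM" (hex 43 52 41 4d), then one byte for the
# major version and one byte for the minor version.

cram_magic_ok() {
    # Dump the first 4 bytes as hex and strip the spaces/newline od inserts.
    magic=$(head -c 4 "$1" | od -An -tx1 | tr -d ' \n')
    [ "$magic" = "4352414d" ]
}

cram_version() {
    # Split the first 6 bytes (as unsigned decimals) into positional
    # parameters; bytes 5 and 6 are the major and minor version numbers.
    set -- $(head -c 6 "$1" | od -An -tu1)
    echo "$5.$6"
}
```

For a remote file there is no need to download all ~10 GB: assuming the server honours byte ranges, something like `curl -s -r 0-5 -o first6.bin "$url"` followed by `cram_magic_ok first6.bin` fetches only the first six bytes. Passing this check only says the start of the file looks plausible; it cannot detect damage further in, such as the blocks that trigger "Block CRC32 failure".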
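
The bizarre "Exec format error" message is a result of htslib abusing the
standard Unix error codes to pass back its own error conditions. In this
case it couldn't work out what sort of file it was trying to open, so it
reported ENOEXEC, which the C library describes as "Exec format error".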

The "Block CRC32 failure" is most likely another kind of corruption. CRAM
3.0 stores a CRC32 checksum with each block, and this message means the
data read back did not match its checksum. We really need to make the
software print out the file position when this happens; it would make
tracking down exactly where the problem is much easier. In this case, as
long as the header is intact, you may be able to rescue the data after the
corruption by using range queries (which need a .crai index) to jump to
parts of the file beyond the broken bit.
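One way the range-query rescue could be scripted, sketched under several assumptions that are not from the thread: the header is intact so `samtools view -H` still works, a .crai index sits next to the CRAM (for example downloaded alongside it), and samtools can find the reference sequence that CRAM decoding needs (via -T or REF_PATH). The function rescue_by_region and its output layout are hypothetical. It extracts each @SQ reference separately and reports which ones fail, which also gives a rough idea of where the corruption lies.

```shell
# Hypothetical helper: extract each reference sequence listed in the header
# into its own BAM, so one corrupt block only loses the references it touches.
# Usage: rescue_by_region in.cram outdir
rescue_by_region() {
    in=$1
    outdir=$2
    mkdir -p "$outdir" || return 1
    n=0
    # Reference names come from the SN: tag of each @SQ header line.
    samtools view -H "$in" |
    awk -F '\t' '$1 == "@SQ" {
        for (i = 2; i <= NF; i++) if ($i ~ /^SN:/) print substr($i, 4)
    }' |
    while IFS= read -r ref; do
        n=$((n + 1))
        out="$outdir/region_$n.bam"
        # Region query: needs the index; samtools exits non-zero on
        # decode errors such as a CRC32 mismatch.
        if samtools view -b -o "$out" "$in" "$ref" </dev/null; then
            printf 'ok\t%s\t%s\n' "$ref" "$out"
        else
            printf 'FAILED\t%s\n' "$ref"
            rm -f "$out"
        fi
    done
}
```

The surviving pieces could then be combined with `samtools merge rescued.bam outdir/region_*.bam`. A failing reference loses all of its reads, including those before the bad block; splitting it into narrower ranges such as `chr1:1-50000000` would recover more. Unmapped reads with no reference are not extracted by this loop, and reference names containing ':' (such as the GRCh38 HLA contigs) may be misread as ranges, so those need care.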
Rob Davies r...@sanger.ac.uk
The Sanger Institute http://www.sanger.ac.uk/
Hinxton, Cambs., Tel. +44 (1223) 834244
CB10 1SA, U.K. Fax. +44 (1223) 494919
_______________________________________________
Samtools-help mailing list
Samtools-help@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/samtools-help