[issue21872] LZMA library sometimes fails to decompress a file

2014-07-01 Thread Ville Nummela

Ville Nummela added the comment:

Uploading a few more 'bad' lzma files for testing.

--
Added file: http://bugs.python.org/file35822/more_bad_lzma_files.zip

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue21872
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue21872] LZMA library sometimes fails to decompress a file

2014-06-26 Thread Ville Nummela

Ville Nummela added the comment:

My stats so far:

At the time of writing, I have attempted to decompress about 5000 downloaded 
files (two years of tick data). 25 'bad' files were found in this lot.

I re-downloaded all of them, plus about 500 other files, because the smallest 
lot the server supplies is 24 hours (24 files) at a time.

I compared all these 528 file pairs using hashlib.md5 and got identical hashes 
for all of them.
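
For reference, a comparison of this kind could look roughly like the sketch 
below. It is not the exact script used here; the helper names md5_of_file and 
same_content and the chunk size are made up for illustration.

```python
import hashlib


def md5_of_file(path, chunk_size=65536):
    """Return the hex MD5 digest of a file, read in chunks.

    Hypothetical helper; the chunk size is an arbitrary choice.
    """
    digest = hashlib.md5()
    with open(path, "rb") as f:
        # iter() with a sentinel stops when read() returns b"" at end of file.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def same_content(path_a, path_b):
    """True if both files hash to the same MD5 digest."""
    return md5_of_file(path_a) == md5_of_file(path_b)
```

Identical hashes only show that both downloads contain the same bytes; they do 
not say whether those bytes form a valid lzma stream.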

I guess what I should do next is to go through the decompressed data and look 
for suspicious anomalies, but unfortunately I don't have the tools in place to 
do that quite yet.

--




[issue21872] LZMA library sometimes fails to decompress a file

2014-06-25 Thread Ville Nummela

New submission from Ville Nummela:

Python lzma library sometimes fails to decompress a file, even though the file 
does not appear to be corrupt. 

Originally discovered with OS X 10.9 / Python 2.7.7 / backports.lzma.
Now also reproduced on OS X / Python 3.4 / lzma; please see
https://github.com/peterjc/backports.lzma/issues/6 for more details.

Two example files are provided, a good one and a bad one. Both are compressed 
using the older lzma algorithm (not xz). An attempt to decompress the 'bad' 
file raises EOFError: Compressed file ended before the end-of-stream marker 
was reached.
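
A minimal sketch of how such a file can be checked with the Python 3 lzma 
module, assuming the file is in the legacy .lzma container (lzma.FORMAT_ALONE). 
The function name try_decompress is hypothetical and this is not the exact code 
that produced the error above.

```python
import lzma


def try_decompress(path):
    """Try to decompress a legacy .lzma file.

    Returns (True, data) on success, or (False, exception) if the
    decoder reports truncated or corrupt input.
    Hypothetical helper for illustration only.
    """
    try:
        # FORMAT_ALONE selects the older .lzma container rather than .xz.
        with lzma.open(path, "rb", format=lzma.FORMAT_ALONE) as f:
            return True, f.read()
    except (EOFError, lzma.LZMAError) as exc:
        return False, exc
```

With the 'bad' example file, a call like this is where the EOFError would be 
expected to surface.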

The 'bad' file appears to be ok, because
- a direct call to XZ Utils processes the files without complaints
- the decompressed files' contents appear to be ok.

The example files contain tick data and have been downloaded from the Dukascopy 
bank's historical data feed service. The service is well known for its high 
data quality and is used by multiple analysis SW platforms. Thus I think it is 
unlikely that a file integrity issue on their end would have gone unnoticed.

The error occurs relatively rarely: only around 1 to 5 times per 1000 
downloaded files.

--
components: Library (Lib)
files: Archive.zip
messages: 221566
nosy: nadeem.vawda, vnummela
priority: normal
severity: normal
status: open
title: LZMA library sometimes fails to decompress a file
type: behavior
versions: Python 2.7, Python 3.4
Added file: http://bugs.python.org/file35779/Archive.zip
