Johannes Bauer wrote:
Hello group,

with the following program:

#!/usr/bin/python3
import gzip
x = gzip.open("testdatei", "wb")
x.write("ä")
x.close()

I get a broken .gzip file when decompressing:

$ cat testdatei |gunzip
ä
gzip: stdin: invalid compressed data--length error

As it only happens with UTF-8 characters, I suppose the gzip module
writes a length of 1 in the gzip file header (one character "ä"), but
then actually writes 2 characters (0xc3 0xa4).

UTF-8 is not Unicode. Even if the source encoding above is UTF-8, I'm not sure which encoding is used for the Unicode string when it is written.

Is there a solution?

What about writing a bytestring by explicitly encoding the string to UTF-8 first?

x.write("ä".encode("utf-8"))
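To check that this really produces a valid file, here is a minimal round-trip sketch, assuming Python 3. The temporary directory and the variable names are only for illustration and are not from the original program. It shows that the string is one character but two bytes once encoded, and that what is read back matches what was written.

```python
import gzip
import os
import tempfile

text = "\u00e4"                  # the character "ä"
payload = text.encode("utf-8")   # explicit encoding: bytes 0xc3 0xa4

# Illustrative scratch location; the original post wrote to "testdatei"
# in the current directory.
path = os.path.join(tempfile.mkdtemp(), "testdatei")

# Binary mode: write the already-encoded bytes.
with gzip.open(path, "wb") as f:
    f.write(payload)

# Read the compressed file back and decode it again.
with gzip.open(path, "rb") as f:
    roundtrip = f.read()

print(len(text), len(payload))            # characters vs. bytes
print(roundtrip.decode("utf-8") == text)  # True if the round trip is intact
```

On Python 3.3 and later, gzip.open also accepts a text mode such as "wt" together with an encoding argument, which does the encoding step for you.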


Diez
--
http://mail.python.org/mailman/listinfo/python-list
