Re: UnicodeDecodeError: 'charmap' codec can't decode byte 0x9d in position 10442: character maps to undefined

2009-02-26 Thread Anjanesh Lekshminarayanan
(1) what is produced on Anjanesh's machine sys.getdefaultencoding() 'utf-8' (2) it looks like a small snippet from a Python source file! It's a file containing just JSON data, but it has some Unicode characters as well, since it has data from the web. Anjanesh, is it a .py file? It's a .json file. I

UnicodeDecodeError: 'charmap' codec can't decode byte 0x9d in position 10442: character maps to undefined

2009-01-29 Thread Anjanesh Lekshminarayanan
C:\Python30\lib\io.py, line 1295, in decode output = self.decoder.decode(input, final=final) File C:\Python30\lib\encodings\cp1252.py, line 23, in decode return codecs.charmap_decode(input,self.errors,decoding_table)[0] UnicodeDecodeError: 'charmap' codec can't decode byte 0x9d
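A minimal sketch of how this error can arise, assuming the file is really UTF-8 (the thread does not confirm the file's actual encoding). The character U+201D is only an illustrative choice: its UTF-8 encoding happens to contain the byte 0x9d, which Python's cp1252 codec leaves undefined.

```python
# Illustrative input: U+201D (right double quotation mark) encoded as UTF-8.
# This is an assumed example, not the data from the original file.
raw = "\u201d".encode("utf-8")  # three bytes: 0xE2 0x80 0x9D

try:
    raw.decode("cp1252")
    cp1252_ok = True
    bad_byte = None
except UnicodeDecodeError as exc:
    # exc.start is the index of the first byte the codec could not map.
    cp1252_ok = False
    bad_byte = raw[exc.start]

# Decoding with the encoding the bytes were written in recovers the character.
as_utf8 = raw.decode("utf-8")

print(cp1252_ok, hex(bad_byte) if bad_byte is not None else None, repr(as_utf8))
```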

Re: UnicodeDecodeError: 'charmap' codec can't decode byte 0x9d in position 10442: character maps to undefined

2009-01-29 Thread Benjamin Kaplan
,self.errors,decoding_table)[0] UnicodeDecodeError: 'charmap' codec can't decode byte 0x9d in position 10442: character maps to undefined The string at position 10442 is something like this : query:0 1Ȉ \u2021 0\u201a0 \u2021Ȉ , So what encoding value am I supposed to give ? I tried f = open

Re: UnicodeDecodeError: 'charmap' codec can't decode byte 0x9d in position 10442: character maps to undefined

2009-01-29 Thread Anjanesh Lekshminarayanan
It does auto-detect it as cp1252: look at the files in the traceback and you'll see lib\encodings\cp1252.py. Since cp1252 seems to be the wrong encoding, try opening it as utf-8 or latin1 and see if that fixes it. Thanks a lot! utf-8 and latin1 were accepted! --
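The suggestion above can be sketched as follows. The file name `data.json` and the sample payload are hypothetical stand-ins, and the sketch assumes the data was written as UTF-8. Note that latin1 being "accepted" only means decoding does not fail: latin1 maps every byte value to some character, so it never raises, but it can silently produce the wrong text.

```python
import json
import os
import tempfile

# Hypothetical payload standing in for the JSON data discussed in the thread.
sample = {"query": "0 1\u201d \u2021"}

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "data.json")  # hypothetical file name

    # Write the file as UTF-8, keeping non-ASCII characters unescaped.
    with open(path, "w", encoding="utf-8") as f:
        json.dump(sample, f, ensure_ascii=False)

    # Pass the encoding explicitly instead of relying on the platform default.
    with open(path, encoding="utf-8") as f:
        loaded_utf8 = json.load(f)

    # latin1 never raises, but it decodes each byte as a separate character.
    with open(path, encoding="latin-1") as f:
        loaded_latin1 = json.load(f)

print(loaded_utf8 == sample)    # round-trips correctly
print(loaded_latin1 == sample)  # decoded without error, but the text differs
```

If the bytes really are UTF-8, only the `encoding="utf-8"` read gives back the original strings.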

Re: UnicodeDecodeError: 'charmap' codec can't decode byte 0x9d in position 10442: character maps to undefined

2009-01-29 Thread Benjamin Kaplan
On Thu, Jan 29, 2009 at 12:09 PM, Anjanesh Lekshminarayanan m...@anjanesh.net wrote: It does auto-detect it as cp1252: look at the files in the traceback and you'll see lib\encodings\cp1252.py. Since cp1252 seems to be the wrong encoding, try opening it as utf-8 or latin1 and see if that

Re: UnicodeDecodeError: 'charmap' codec can't decode byte 0x9d in position 10442: character maps to undefined

2009-01-29 Thread Benjamin Peterson
Anjanesh Lekshminarayanan mail at anjanesh.net writes: It does auto-detect it as cp1252: look at the files in the traceback and you'll see lib\encodings\cp1252.py. Since cp1252 seems to be the wrong encoding, try opening it as utf-8 or latin1 and see if that fixes it. Thanks a lot!

Re: UnicodeDecodeError: 'charmap' codec can't decode byte 0x9d in position 10442: character maps to undefined

2009-01-29 Thread John Machin
of Anjanesh's report: UnicodeDecodeError: 'charmap' codec can't decode byte 0x9d in position 10442: character maps to undefined The string at position 10442 is something like this : query:0 1»Ý \u2021 0\u201a0 \u2021»Ý, draws two observations: (1) there is nothing in the reported string that can

Re: UnicodeDecodeError: 'charmap' codec can't decode byte 0x9d in position 10442: character maps to undefined

2009-01-29 Thread Benjamin Kaplan
, you'll avoid this issue altogether (just make sure you use byte strings instead of unicode strings). In fact, inspection of Anjanesh's report: UnicodeDecodeError: 'charmap' codec can't decode byte 0x9d in position 10442: character maps to undefined The string at position 10442 is something

Re: UnicodeDecodeError: 'charmap' codec can't decode byte 0x9d in position 10442: character maps to undefined

2009-01-29 Thread John Machin
Benjamin Kaplan benjamin.kaplan at case.edu writes: First of all, you're right that might be confusing. I was thinking of auto-detect as in check the platform and locale and guess what they usually use. I wasn't thinking of it like the web browsers use it. I think it uses
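As a rough illustration of the "check the platform and locale" behaviour described here: in Python 3, a text-mode `open()` without an `encoding` argument falls back to a locale-dependent default, which is a different setting from `sys.getdefaultencoding()`. The sketch below only prints both values; the preferred encoding it reports depends on the machine it runs on (on a Western European Windows setup it is commonly cp1252, which would match the traceback in this thread, though that is an assumption about the poster's system).

```python
import locale
import sys

# Locale-derived encoding; varies by platform and user settings.
preferred = locale.getpreferredencoding(False)

# Default encoding for str.encode() / bytes.decode(); 'utf-8' in Python 3.
default = sys.getdefaultencoding()

print("locale preferred encoding:", preferred)
print("sys.getdefaultencoding(): ", default)
```

This is why `sys.getdefaultencoding()` can report 'utf-8' while a file opened without an explicit encoding is still decoded with cp1252.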