[issue9598] untabify.py fails on files that contain non-ascii characters

2010-11-30 Thread Alexander Belopolsky
Alexander Belopolsky belopol...@users.sourceforge.net added the comment: Committed revision 86893, which makes untabify.py respect the encoding cookie in the files it processes. I don't think there is anything else that needs to be done here. -- resolution: - fixed stage: -
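As an illustration of what "respecting the encoding cookie" can look like, here is a minimal sketch. It is not the code committed in revision 86893; the helper name `untabify_text` and the demo file are hypothetical, and the only library call it leans on is the standard `tokenize.detect_encoding`.

```python
import os
import tempfile
import tokenize


def untabify_text(path, tabsize=8):
    """Hypothetical helper: return (text with tabs expanded, detected encoding)."""
    # detect_encoding looks at the first two lines for a BOM or a PEP 263
    # coding cookie and falls back to UTF-8 when it finds neither.
    with open(path, "rb") as f:
        encoding, _ = tokenize.detect_encoding(f.readline)
    with open(path, encoding=encoding) as f:
        text = f.read()
    return text.expandtabs(tabsize), encoding


# Demo on a throwaway Latin-1 file that declares its own encoding.
source = "# -*- coding: latin-1 -*-\nname = 'Fran\u00e7ois'\n\tx = 1\n"
with tempfile.NamedTemporaryFile("wb", suffix=".py", delete=False) as tmp:
    tmp.write(source.encode("latin-1"))
new_text, detected = untabify_text(tmp.name)
os.remove(tmp.name)
```

The cookie only exists in Python sources, so a sketch like this says nothing about C files, which is the case most of the thread argues about.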

[issue9598] untabify.py fails on files that contain non-ascii characters

2010-09-07 Thread Alexander Belopolsky
Alexander Belopolsky belopol...@users.sourceforge.net added the comment: From IRC: Me: UTF-8 was not strictly valid in ANSI C comments, so it is a bug in untabify to assume UTF-8 in C files. Merwok: Works for me. I am lowering the priority because it looks like untabify does not fail on the

[issue9598] untabify.py fails on files that contain non-ascii characters

2010-09-07 Thread Éric Araujo
Éric Araujo mer...@netwok.org added the comment: Why would it be the job of untabify to report invalid non-ASCII characters in C files? -- ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue9598

[issue9598] untabify.py fails on files that contain non-ascii characters

2010-09-07 Thread Alexander Belopolsky
Alexander Belopolsky belopol...@users.sourceforge.net added the comment: On Tue, Sep 7, 2010 at 8:08 PM, Éric Araujo rep...@bugs.python.org wrote: .. "Why would it be the job of untabify to report invalid non-ASCII characters in C files?" Since untabify works by loading C code as text, it has

[issue9598] untabify.py fails on files that contain non-ascii characters

2010-09-07 Thread Éric Araujo
Éric Araujo mer...@netwok.org added the comment: My real question was: Shouldn’t this be a VCS hook instead of untabify’s job? (or in addition to untabify if you insist)

[issue9598] untabify.py fails on files that contain non-ascii characters

2010-09-07 Thread Alexander Belopolsky
Alexander Belopolsky belopol...@users.sourceforge.net added the comment: On Tue, Sep 7, 2010 at 8:31 PM, Éric Araujo rep...@bugs.python.org wrote: .. "My real question was: Shouldn’t this be a VCS hook instead of untabify’s job? (or in addition to untabify if you insist)" Yes, VCS hook makes
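The thread does not say what such a VCS hook would check. One possible reading is a check that flags non-ASCII bytes in C sources before they are committed. The sketch below shows only that core check; the function name `non_ascii_lines` is hypothetical, and wiring it into a particular version control system is left out.

```python
def non_ascii_lines(data):
    """Return (line_number, raw_line) pairs for lines holding bytes >= 0x80."""
    return [
        (number, line)
        for number, line in enumerate(data.splitlines(), 1)
        if any(byte > 0x7F for byte in line)
    ]


# Demo: a C comment with a Latin-1 encoded name, followed by a clean line.
sample = b"/* Fran\xe7ois */\nint x;\n"
flagged = non_ascii_lines(sample)
```

Working on raw bytes keeps the check independent of any assumed encoding, so the check itself cannot hit the decoding error that untabify ran into.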

[issue9598] untabify.py fails on files that contain non-ascii characters

2010-09-07 Thread Éric Araujo
Éric Araujo mer...@netwok.org added the comment: I agree with your reply (that’s what I meant with “works for me”, the question about untabify vs. hooks only occurred to me after our IRC exchange).

[issue9598] untabify.py fails on files that contain non-ascii characters

2010-09-04 Thread Florent Xicluna
Florent Xicluna florent.xicl...@gmail.com added the comment: Other C files converted from latin-1 to utf-8 with r84485. -- components: +Unicode nosy: +flox
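For readers unfamiliar with this kind of conversion, a Latin-1 to UTF-8 re-encoding can be sketched as follows. This is not how r84485 was produced; the helper name `reencode` and the demo file are hypothetical.

```python
import os
import tempfile


def reencode(path, source_encoding="latin-1", target_encoding="utf-8"):
    """Hypothetical helper: rewrite a file in place using a different encoding."""
    # Binary mode keeps the line endings exactly as they are on disk.
    with open(path, "rb") as f:
        text = f.read().decode(source_encoding)
    with open(path, "wb") as f:
        f.write(text.encode(target_encoding))


# Demo on a throwaway file holding a Latin-1 encoded C comment.
with tempfile.NamedTemporaryFile("wb", suffix=".c", delete=False) as tmp:
    tmp.write(b"/* Fran\xe7ois */\n")
reencode(tmp.name)
with open(tmp.name, "rb") as f:
    converted = f.read()
os.remove(tmp.name)
```

Latin-1 decodes any byte sequence without error, so running this on a file that is already UTF-8 would silently double-encode it; the source encoding has to be known beforehand.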

[issue9598] untabify.py fails on files that contain non-ascii characters

2010-09-03 Thread Éric Araujo
Éric Araujo mer...@netwok.org added the comment: Fixed encoding error in r84472 through r84474. This bug should be reassessed and retitled. If untabify fails because a file has an incorrect encoding, is it really a problem in untabify? This is a developer’s tool, so getting a traceback here

[issue9598] untabify.py fails on files that contain non-ascii characters

2010-09-03 Thread Alexander Belopolsky
Alexander Belopolsky belopol...@users.sourceforge.net added the comment: "If untabify fails because a file has an incorrect encoding, is it really a problem in untabify? This is a developer’s tool, so getting a traceback here seems okay to me." I disagree. I think we should use this

[issue9598] untabify.py fails on files that contain non-ascii characters

2010-09-03 Thread Éric Araujo
Éric Araujo mer...@netwok.org added the comment: I agree about the need to define the encoding for comments. My vote goes to #2, since I wouldn’t want to see names of authors/contributors mangled in the source. I would reconsider if a specification explicitly forbade that. I repeat that the

[issue9598] untabify.py fails on files that contain non-ascii characters

2010-09-03 Thread Alexander Belopolsky
Alexander Belopolsky belopol...@users.sourceforge.net added the comment: "I wouldn’t want to see names of authors/contributors mangled in the source." This is a reason to write names in ASCII. While Latin-1 is a grey area because most of its characters look familiar to English-speaking

[issue9598] untabify.py fails on files that contain non-ascii characters

2010-09-03 Thread Éric Araujo
Éric Araujo mer...@netwok.org added the comment: "I wouldn’t want to see names of authors/contributors mangled in the source." "This is a reason to write names in ASCII." Oh, sorry, by “mangled” I meant “forced into ASCII”. I was not speaking about mojibake. While Latin-1 is a grey area

[issue9598] untabify.py fails on files that contain non-ascii characters

2010-08-25 Thread Éric Araujo
Éric Araujo mer...@netwok.org added the comment: The builtin open in 3.2 is similar to codecs.open. If you read the error message closely, you’ll see that the decoding that failed did try to use UTF-8. The cause of the problem here is that the bytes used for the ç in François’ name are not
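The failure mode described here can be reproduced in a few lines. The sketch assumes the name was stored as Latin-1, which is consistent with the later Latin-1 to UTF-8 conversions in this thread but is not stated in the truncated comment itself.

```python
# "François" encoded as Latin-1: the ç becomes the single byte 0xE7.
raw = "Fran\u00e7ois".encode("latin-1")

try:
    raw.decode("utf-8")
    utf8_ok = True
except UnicodeDecodeError as exc:
    # 0xE7 announces a multi-byte UTF-8 sequence, but the next byte ("o")
    # is not a continuation byte, so decoding stops here.
    utf8_ok = False
    failing_offset = exc.start

# The same bytes decode cleanly as Latin-1 ...
as_latin1 = raw.decode("latin-1")
# ... and the UTF-8 form of the name uses two bytes for the ç.
as_utf8_bytes = "Fran\u00e7ois".encode("utf-8")
```

This shows why switching from the builtin `open` to `codecs.open` with UTF-8 would not have helped: the decoder was already UTF-8, and the bytes on disk were the problem.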

[issue9598] untabify.py fails on files that contain non-ascii characters

2010-08-18 Thread Popa Claudiu
Popa Claudiu pcmantic...@gmail.com added the comment: Hello. It seems that untabify.py opens the file with the builtin function open, which makes the call error-prone when it encounters non-ASCII characters. The proper handling would be to use open from the codecs library, specifying the encoding

[issue9598] untabify.py fails on files that contain non-ascii characters

2010-08-13 Thread Alexander Belopolsky
New submission from Alexander Belopolsky belopol...@users.sourceforge.net: For example: $ ./python.exe Tools/scripts/untabify.py Modules/_heapqmodule.c Traceback (most recent call last): ... (result, consumed) = self._buffer_decode(data, self.errors, final) UnicodeDecodeError: 'utf8'

[issue9598] untabify.py fails on files that contain non-ascii characters

2010-08-13 Thread Alexander Belopolsky
Changes by Alexander Belopolsky belopol...@users.sourceforge.net: -- nosy: +eric.araujo