hi,

On Monday 29 March 2010 18:38:12 Alexander Artemenko wrote:
> >> On 2010/03/29 16:05 - svetlyak40wt wrote:
> >> I've solved this annoying problem. Here is the patch:
> >> http://gist.github.com/347854
> >
> > On 2010/03/29 16:57 - sthenault wrote:
> > would you please add a test case to the functional suite?
> >
> > see test/input and test/messages or search the ml archives for more
> > details
>
> Hi Sylvain, I've updated the patch and added tests.
>
> by: Alexander Artemenko
> url: http://www.logilab.org/ticket/4683
I tried out your patch, but unfortunately it generated a UnicodeDecodeError in
our test suite. I fixed it, without understanding why, by splitting your lambda
declaration in two lines:

+                decode = stream.readline().decode
+                line_generator = lambda: decode(encoding)

instead of:

+                line_generator = lambda: stream.readline().decode(encoding)

Can somebody explain to me what happened?

Anyhow, appended is my new patch (we use func_noerror_* if we don't want the
message triggered). Is that ok?

-- 
Emile Anclin <emile.anc...@logilab.fr>
http://www.logilab.fr/ http://www.logilab.org/
Informatique scientifique & gestion de connaissances
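[Editor's note: the behavioural difference between the two forms can be sketched as follows. This is a minimal illustration, not the pylint code itself: `io.BytesIO` stands in for the real module stream, and the sample text is made up. In the one-line form, `stream.readline()` is re-evaluated on every call of the lambda; in the split form it is evaluated exactly once, and `decode` is the bound `.decode` method of that single first line.]

```python
import io

# Stand-in for the module stream pylint reads (hypothetical data).
stream = io.BytesIO(u"ligne une\nligne deux\n".encode("utf-8"))
encoding = "utf-8"

# One-line form: stream.readline() runs on *every* call of the lambda,
# so each call reads and decodes a fresh line.
fresh = lambda: stream.readline().decode(encoding)
first = fresh()    # "ligne une\n"
second = fresh()   # "ligne deux\n"

stream.seek(0)

# Split form: stream.readline() runs *once*, right here; `decode` is the
# bound .decode method of that single first line. Every later call just
# re-decodes that same line again.
decode = stream.readline().decode
bound = lambda: decode(encoding)
again = bound()      # "ligne une\n"
and_again = bound()  # still "ligne une\n"
```

So the split form hands tokenize the same first line over and over, which may be why the decode error disappears, but it does not iterate over the stream the way the one-line lambda does.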
fix #4683: Non-ASCII characters count double if utf8

diff -r cdd571901fea checkers/format.py
--- a/checkers/format.py	Mon Mar 29 11:27:19 2010 +0200
+++ b/checkers/format.py	Tue Mar 30 11:13:09 2010 +0200
@@ -31,6 +31,7 @@
 from pylint.interfaces import IRawChecker, IASTNGChecker
 from pylint.checkers import BaseRawChecker
+from pylint.checkers.misc import guess_encoding, is_ascii
 
 MSGS = {
     'C0301': ('Line too long (%s/%s)',
@@ -178,6 +179,25 @@
         self._lines = None
         self._visited_lines = None
 
+    def process_module(self, stream):
+        """extracts encoding from the stream and
+        decodes each line, so that international
+        text's length is properly calculated.
+        """
+        data = stream.read()
+        line_generator = stream.readline
+
+        ascii, lineno = is_ascii(data)
+        if not ascii:
+            encoding = guess_encoding(data)
+            if encoding is not None:
+                decode = stream.readline().decode
+                line_generator = lambda: decode(encoding)
+        del data
+
+        stream.seek(0)
+        self.process_tokens(tokenize.generate_tokens(line_generator))
+
     def new_line(self, tok_type, line, line_num, junk):
         """a new line has been encountered, process it if necessary"""
         if not tok_type in junk:

diff -r cdd571901fea test/input/func_noerror_long_utf8_line.py
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/test/input/func_noerror_long_utf8_line.py	Tue Mar 30 11:13:09 2010 +0200
@@ -0,0 +1,8 @@
+# -*- coding: utf-8 -*-
+"""this utf-8 docstring has some non-ASCII characters like 'é', or '¢»ß'"""
+### check also comments with some more non-ASCII characters like 'é' or '¢»ß'
+
+__revision__ = 1100
+print "------------------------------------------------------------------------"
+print "-----------------------------------------------------------------------é"
+
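[Editor's note: the double-counting that ticket #4683 describes can be reproduced with a short standalone sketch (not pylint code): measuring line length on the raw UTF-8 bytes counts two units per accented character, while measuring on the decoded text counts one.]

```python
# -*- coding: utf-8 -*-
line = u"é" * 79  # 79 characters, within a typical 80-column limit

raw = line.encode("utf-8")
# Each 'é' is two bytes in UTF-8, so a length check on the raw bytes
# sees 158 "characters" and would wrongly flag the line as too long.
print(len(raw))   # 158
# The decoded length is what the user actually sees in the editor.
print(len(line))  # 79
```

This is why the patch decodes each line before handing it to the line-length check.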
_______________________________________________
Python-Projects mailing list
Python-Projects@lists.logilab.org
http://lists.logilab.org/mailman/listinfo/python-projects