[issue11909] Doctest sees directives in strings when it should only see them in comments
Devin Jeanpierre <jeanpierr...@gmail.com> added the comment:

Updated the patch to the newest revision and switched it to the _tokenize function. It now includes a test case verifying that the encoding directive is ignored during the tokenization step (and every other step). I'll file a tokenize bug separately.

--
Added file: http://bugs.python.org/file22559/comments.diff

Python tracker <rep...@bugs.python.org>
http://bugs.python.org/issue11909

Python-bugs-list mailing list
Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue11909] Doctest sees directives in strings when it should only see them in comments
Devin Jeanpierre <jeanpierr...@gmail.com> added the comment:

Erp, I forgot to run this against the rest of the tests. Disregard it; I'll fix it up a bit later.
[issue11909] Doctest sees directives in strings when it should only see them in comments
Devin Jeanpierre <jeanpierr...@gmail.com> added the comment:

Updated.

--
Added file: http://bugs.python.org/file22562/comments3.diff
[issue11909] Doctest sees directives in strings when it should only see them in comments
Devin Jeanpierre <jeanpierr...@gmail.com> added the comment:

You're right, and good catch. If a doctest starts with a #coding:XXX line, this should break.

One option is to replace the call to tokenize.tokenize with a call to tokenize._tokenize and pass 'utf-8' as a parameter. Downside: that's a private and undocumented API. The alternative is to manually add a coding line that specifies UTF-8, so that any coding line in the doctest would be ignored.

My preferred option would be to add the ability to read unicode to the tokenize API, and then use that. I can file a separate ticket if that sounds good, since it's probably useful to others too.

One other thing to be worried about: I'm not sure how doctest treats tests with leading coding:XXX lines. I'd hope it ignores them. If it doesn't, this is more complicated and the options above wouldn't work.

I'll see if I have the time to play around with this (and add more test cases to the patch accordingly) this weekend.
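For illustration, the second option above (prepending an explicit UTF-8 coding line) could look roughly like the sketch below. It is not taken from the patch: the helper name tokenize_as_utf8 is hypothetical, and the sketch relies only on the public tokenize.tokenize, which reads the encoding from a coding line on the first line of its input. One side effect to keep in mind is that every reported line number is shifted by one.

```python
import io
import tokenize


def tokenize_as_utf8(source):
    """Tokenize a str as UTF-8, whatever coding line the fragment contains.

    Hypothetical helper for illustration; not the code from the patch.
    """
    # The coding line we prepend sits on line 1, so tokenize picks it up
    # first; a later "coding:" line in the fragment is then just a comment.
    data = b"# coding: utf-8\n" + source.encode("utf-8")
    return list(tokenize.tokenize(io.BytesIO(data).readline))


# A fragment that claims latin-1 but is handed to us as a str.
fragment = '# coding: latin-1\nx = "\u00e9"\n'
tokens = tokenize_as_utf8(fragment)
strings = [tok.string for tok in tokens if tok.type == tokenize.STRING]
```

Here the string literal comes back intact, whereas encoding the fragment as UTF-8 and letting tokenize honour the latin-1 line would decode it incorrectly.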
[issue11909] Doctest sees directives in strings when it should only see them in comments
R. David Murray <rdmur...@bitdance.com> added the comment:

I agree that having a unicode API for tokenize seems to make sense, and that would indeed require a separate issue.

That's a good point about doctest not otherwise supporting coding cookies. Those only really apply to source files, so no doctest fragment ought to contain a coding cookie at the start, and your patch ought to be fine. But I'm not familiar with the doctest internals, so having some tests to prove everything is fine would be great.

Your code could use the tokenize sniffer to make sure the fragment reads as utf-8 and throw an error otherwise. But using a unicode interface to tokenize would probably be cleaner, since I suspect it would mimic what doctest does otherwise (ignore coding cookies). I don't *know* the latter, though, so your checking it would be appreciated.
[issue11909] Doctest sees directives in strings when it should only see them in comments
R. David Murray <rdmur...@bitdance.com> added the comment:

For the most part the patch looks good to me, too. My one concern is the encoding. tokenize detects the encoding... is it possible for the doctest fragment to be detected as some encoding other than utf-8?

--
nosy: +benjamin.peterson, r.david.murray
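To make the concern concrete, here is a minimal sketch of how tokenize's encoding detection reacts to a coding line at the top of a fragment. It uses only the public tokenize.detect_encoding function; the helper name sniff_encoding and the sample fragments are made up for illustration and say nothing about what doctest itself does with such lines.

```python
import io
import tokenize


def sniff_encoding(fragment_bytes):
    """Return the encoding tokenize would choose for a bytes fragment.

    Hypothetical helper for illustration only.
    """
    readline = io.BytesIO(fragment_bytes).readline
    # detect_encoding returns (encoding, lines_already_read).
    encoding, _lines_read = tokenize.detect_encoding(readline)
    return encoding


with_cookie = sniff_encoding(b"# coding: latin-1\nprint('x')\n")
without_cookie = sniff_encoding(b"print('x')\n")
print(with_cookie)     # iso-8859-1
print(without_cookie)  # utf-8
```

So a fragment whose first line is a coding line can indeed be detected as something other than utf-8.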
[issue11909] Doctest sees directives in strings when it should only see them in comments
Petri Lehtinen <pe...@digip.org> added the comment:

The patch looks good to me. It passes the existing doctest tests and adds a new test case for what it's fixing.

--
nosy: +petri.lehtinen, tim_one
[issue11909] Doctest sees directives in strings when it should only see them in comments
New submission from Devin Jeanpierre <jeanpierr...@gmail.com>:

From the doctest source:

'Option directives are comments starting with doctest:. Warning: this may give false positives for string-literals that contain the string #doctest:. Eliminating these false positives would require actually parsing the string; but we limit them by ignoring any line containing #doctest: that is *followed* by a quote mark.'

This isn't a huge deal, but it's a bit annoying. Besides being confusing, it contradicts the doctest documentation, which states:

'Doctest directives are expressed as a special Python comment following an example’s source code'

No mention is made of this corner case where the regexp breaks.

As suggested by the comment in the source, the patched version parses the source using the tokenize module and runs a modified directive regex on all comment tokens to find directives.

--
components: Library (Lib)
files: comments.diff
keywords: patch
messages: 134278
nosy: Devin Jeanpierre
priority: normal
severity: normal
status: open
title: Doctest sees directives in strings when it should only see them in comments
Added file: http://bugs.python.org/file21757/comments.diff
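The approach described above can be sketched roughly as follows. This is not the code from comments.diff: the function name find_directives and the simplified DIRECTIVE_RE pattern are hypothetical, and the sketch uses tokenize.generate_tokens (which accepts str input) rather than whichever tokenize entry point the patch calls.

```python
import io
import re
import tokenize

# Simplified, hypothetical pattern; the patch uses its own modified regex.
DIRECTIVE_RE = re.compile(r"#\s*doctest:\s*(?P<options>.*)$")


def find_directives(source):
    """Return the option text of each doctest directive found in a comment."""
    found = []
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        # Only COMMENT tokens are inspected, so text inside string
        # literals can never be mistaken for a directive.
        if tok.type == tokenize.COMMENT:
            match = DIRECTIVE_RE.match(tok.string)
            if match:
                found.append(match.group("options").strip())
    return found


in_string = find_directives('print("# doctest: +SKIP")\n')
in_comment = find_directives("print(list(range(20)))  # doctest: +ELLIPSIS\n")
```

With this shape, a "# doctest:" sequence inside a string literal yields no directive, while the same text in a trailing comment does.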
[issue11909] Doctest sees directives in strings when it should only see them in comments
Changes by R. David Murray <rdmur...@bitdance.com>:

--
stage: -> patch review
type: -> feature request
versions: +Python 3.3