Re: [Python-Dev] What does a double coding cookie mean?
On 17.03.16 15:14, M.-A. Lemburg wrote:
> On 17.03.2016 01:29, Guido van Rossum wrote:
>> Should we recommend that everyone use tokenize.detect_encoding()?
> I'd prefer a separate utility for this somewhere, since
> tokenize.detect_encoding() is not available in Python 2. I've attached
> an example implementation with tests, which works in Python 2.7 and 3.

Sorry, but this code doesn't match the behaviour of the Python interpreter, nor other tools. I suggest backporting tokenize.detect_encoding() (but be aware that the default encoding in Python 2 is ASCII, not UTF-8).

___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
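For readers following along, the Python 3 behaviour Serhiy refers to can be exercised directly; a minimal sketch (the helper name is ours, not from the thread):

```python
import io
import tokenize

def sniff_source_encoding(source_bytes):
    """Detect the encoding of Python source bytes the way the
    Python 3 interpreter does: BOM first, then a PEP 263 cookie
    on the first or second line, else the utf-8 default."""
    encoding, lines_read = tokenize.detect_encoding(io.BytesIO(source_bytes).readline)
    return encoding

# A cookie on the first line is honoured (and the name normalized):
declared = sniff_source_encoding(b"# -*- coding: latin-1 -*-\nx = 1\n")  # 'iso-8859-1'
```

Note that detect_encoding() normalizes encoding names ('latin-1' becomes 'iso-8859-1') and reports 'utf-8-sig' when a UTF-8 BOM is present, two of the behaviours Serhiy notes a backport would need to reproduce.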
[Python-Dev] PEP 484 update: add Type[T]
There's a more fundamental PEP 484 update that I'd like to add. The discussion is in https://github.com/python/typing/issues/107.

Currently we don't have a way to talk about arguments and variables whose type is itself a type or class. The only annotation you can use for this is 'type', which says "this argument/variable is a type object" (or a class). But it's often useful to be able to say "this is a class and it must be a subclass of X".

In fact this was proposed in the original rounds of discussion about PEP 484, but at the time it felt too far removed from practice to know quite how it should be used, so I just put it off. But it's been one of the features that's been requested most by the early adopters of PEP 484 at Dropbox. So I'd like to add it now.

At runtime this shouldn't do much; Type would be just a generic class of one parameter that records its one type parameter. The real magic would happen in the type checker, which will be able to use types involving Type. It should also be possible to use this with type variables, so we could write e.g.

    T = TypeVar('T', bound=int)

    def factory(c: Type[T]) -> T:

This would define factory() as a function whose argument must be a subclass of int and returning an instance of that subclass. (The bound= option to TypeVar() is already described in PEP 484, although mypy hasn't implemented it yet.) (If I screwed up this example, hopefully Jukka will correct me. :-)

Again, I'd like this to go out with 3.5.2, because it requires adding something to typing.py (and again, that's allowed because PEP 484 is provisional -- see PEP 411 for an explanation).

-- 
--Guido van Rossum (python.org/~guido)
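The factory() example above can be fleshed out into a runnable sketch (the class names are illustrative; this works once Type lands in typing, as proposed):

```python
from typing import Type, TypeVar

T = TypeVar('T', bound=int)

def factory(c: Type[T]) -> T:
    # A type checker infers the return type from the class passed in;
    # at runtime Type[...] is just a parameterized generic.
    return c(0)

class MyInt(int):
    pass

i = factory(MyInt)   # a checker infers MyInt here, not just int
```

With plain 'type' as the annotation, the checker could only say the result is "some instance"; with Type[T] it knows factory(MyInt) returns a MyInt.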
Re: [Python-Dev] PEP 484: updates to Python 2.7 signature syntax
Heh. I could add an example with a long list of parameters with long names, but apart from showing by example what the motivation is it wouldn't really add anything, and it's more to type. :-)

On Sat, Mar 19, 2016 at 6:43 PM, Andrew Barnert wrote:
> On Mar 19, 2016, at 18:18, Guido van Rossum wrote:
>>
>> Second, https://github.com/python/typing/issues/186. This builds on
>> the previous syntax but deals with the other annoyance of long
>> argument lists, this time in case you *do* care about the types. The
>> proposal is to allow writing the arguments one per line with a type
>> comment on each line. This has been implemented in PyCharm but not yet
>> in mypy. Example:
>>
>>    def gcd(
>>        a, # type: int
>>        b, # type: int
>>    ):
>>        # type: (...) -> int
>
> This is a lot nicer than what you were originally discussing (at #1101? I
> forget...). Even more so given how trivial it will be to mechanically convert
> these to annotations if/when you switch an app to pure Python 3.
>
> But one thing: in the PEP and the docs, I think it would be better to pick an
> example with longer parameter names. This example shows that even in the
> worst case it isn't that bad, but a better example would show that in the
> typical case it's actually pretty nice. (Also, I don't see why you wouldn't
> just use the "old" comment form for this example, since it all fits on one
> line and isn't at all confusing.)

-- 
--Guido van Rossum (python.org/~guido)
[Python-Dev] PEP 484 update: allow @overload in regular module files
Here's another proposal for a change to PEP 484. In https://github.com/python/typing/issues/72 there's a long discussion ending with a reasonable argument to allow @overload in (non-stub) modules after all.

This proposal does *not* sneak in a syntax for multi-dispatch -- the @overload versions are only for the benefit of type checkers, while a single non-@overload implementation must follow that handles all cases. In fact, I expect that if we ever end up adding multi-dispatch to the language or library, it will neither replace nor compete with @overload; the two will most likely be orthogonal to each other, with @overload aiming at a type checker and some other multi-dispatch aiming at the interpreter. (The needs of the two cases are just too different -- e.g. it's hard to imagine multi-dispatch in Python using type variables.)

More details in the issue (that's also where I'd like to get feedback if possible). I want to settle this before 3.5.2 goes out, because it requires a change to typing.py in the stdlib. Fortunately the change will be backward compatible (even though this isn't strictly required for a provisional module). In the original typing module, any use of @overload outside a stub is an error (it raises as soon as it's used). In the new proposal, you can decorate a function with @overload, but any attempt to call such a decorated function raises an error. This should catch cases early where you forget to provide an implementation. (Reference for provisional modules: https://www.python.org/dev/peps/pep-0411/)

-- 
--Guido van Rossum (python.org/~guido)
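Under the proposal above, an @overload-decorated module-level function would look like this (a sketch; the function name is illustrative):

```python
from typing import overload

@overload
def double(x: int) -> int: ...
@overload
def double(x: str) -> str: ...

def double(x):
    # The single real implementation must handle every overloaded
    # signature; the @overload stubs above exist only for the type
    # checker, and calling one of them directly raises an error.
    return x * 2
```

The final non-@overload def replaces the stubs at runtime, so calling double() dispatches to the one real implementation; only a stub with no implementation following it would raise when called.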
Re: [Python-Dev] PEP 484: updates to Python 2.7 signature syntax
On Mar 19, 2016, at 18:18, Guido van Rossum wrote:
>
> Second, https://github.com/python/typing/issues/186. This builds on
> the previous syntax but deals with the other annoyance of long
> argument lists, this time in case you *do* care about the types. The
> proposal is to allow writing the arguments one per line with a type
> comment on each line. This has been implemented in PyCharm but not yet
> in mypy. Example:
>
>    def gcd(
>        a, # type: int
>        b, # type: int
>    ):
>        # type: (...) -> int

This is a lot nicer than what you were originally discussing (at #1101? I forget...). Even more so given how trivial it will be to mechanically convert these to annotations if/when you switch an app to pure Python 3.

But one thing: in the PEP and the docs, I think it would be better to pick an example with longer parameter names. This example shows that even in the worst case it isn't that bad, but a better example would show that in the typical case it's actually pretty nice. (Also, I don't see why you wouldn't just use the "old" comment form for this example, since it all fits on one line and isn't at all confusing.)
Re: [Python-Dev] What does a double coding cookie mean?
On 3/16/2016 12:59 AM, Serhiy Storchaka wrote:
> On 16.03.16 09:46, Glenn Linderman wrote:
>> On 3/16/2016 12:09 AM, Serhiy Storchaka wrote:
>>> On 16.03.16 08:34, Glenn Linderman wrote:
>>>> From the PEP 263: More precisely, the first or second line must
>>>> match the regular expression "coding[:=]\s*([-\w.]+)". The first
>>>> group of this expression is then interpreted as encoding name. If
>>>> the encoding is unknown to Python, an error is raised during
>>>> compilation. There must not be any Python statement on the line
>>>> that contains the encoding declaration.
>>>> Clearly the regular expression would only match the first of
>>>> multiple cookies on the same line, so the first one should always
>>>> win... but there should only be one, from the first PEP quote "a
>>>> magic comment".
>>> "The first group of this expression" means the first regular
>>> expression group. Only the part between parentheses "([-\w.]+)" is
>>> interpreted as the encoding name, not the whole expression.
>> Sure. But there is no mention anywhere in the PEP of more than one
>> being legal: just more than one position for it, EITHER line 1 or
>> line 2. So while the regular expression mentioned is not anchored, to
>> allow variation in syntax between emacs and vim, "must match the
>> regular expression" doesn't imply "several times", and when searching
>> for a regular expression that might not be anchored, one typically
>> expects to find the first.
> Actually "must match the regular expression" is not correct, because
> re.match() implies anchoring at the start. I have proposed a more
> correct regular expression in another branch of this thread.

"match" doesn't imply anchoring at the start. "re.match()" does (and as a result is very confusing to newbies to Python re who have used other regexp systems).
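The distinction Glenn draws between "match" in prose and re.match() can be demonstrated directly with the PEP 263 expression:

```python
import re

COOKIE_RE = re.compile(r"coding[:=]\s*([-\w.]+)")
line = "# -*- coding: utf-8 -*-"

# re.match() anchors at the start of the string, so it fails here,
# because the line begins with "# -*- ", not "coding":
assert COOKIE_RE.match(line) is None

# re.search() scans the whole line and finds the cookie:
m = COOKIE_RE.search(line)
assert m is not None and m.group(1) == "utf-8"
```

So "must match the regular expression" in the PEP is only accurate if read as "must contain a match for", i.e. search semantics, which is Serhiy's point.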
[Python-Dev] PEP 484: updates to Python 2.7 signature syntax
PEP 484 was already updated to support signatures as type comments in Python 2.7. I'd like to add two more variations to this spec, both of which have already come up through users.

First, https://github.com/python/typing/issues/188. This augments the format of signature type comments to allow (...) instead of an argument list. This is useful to avoid having to write (Any, Any, Any, ..., Any) for a long argument list if you don't care about the argument types but do want to specify the return type. It's already implemented by mypy (and presumably by PyCharm). Example:

    def gcd(a, b):
        # type: (...) -> int

Second, https://github.com/python/typing/issues/186. This builds on the previous syntax but deals with the other annoyance of long argument lists, this time in case you *do* care about the types. The proposal is to allow writing the arguments one per line with a type comment on each line. This has been implemented in PyCharm but not yet in mypy. Example:

    def gcd(
        a,  # type: int
        b,  # type: int
    ):
        # type: (...) -> int

In both cases we've considered a few alternatives and ended up agreeing on the best course forward. If you have questions or feedback on either proposal it's probably best to just add a comment to the GitHub tracker issues.

A clarification of the status of PEP 484: it was provisionally accepted in May 2015. Having spent close to a year pondering it, and the last several months actively using it at Dropbox, I'm now ready to move forward with some improvements based on these experiences (and those of others who have started to use it). We already added the basic Python 2.7 compatible syntax (see the thread starting at https://mail.python.org/pipermail/python-ideas/2016-January/037704.html), and having used that for a few months, the two proposals mentioned above handle a few corner cases that were possible but a bit awkward in our experience.
-- 
--Guido van Rossum (python.org/~guido)
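Both forms above are ordinary comments, so they run unchanged on Python 2 and Python 3; a complete runnable sketch of the per-argument variant:

```python
def gcd(
    a,  # type: int
    b,  # type: int
):
    # type: (...) -> int
    """Greatest common divisor via Euclid's algorithm."""
    # The type comments are invisible to the interpreter; only a
    # checker such as mypy or PyCharm reads them.
    while b:
        a, b = b, a % b
    return a
```

Mechanically converting this to Python 3 annotations later means moving each "# type:" onto the parameter and return position, which is part of why the one-comment-per-argument layout was chosen.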
[Python-Dev] Make the warnings module extensible
Hi, I have an API question for you. I would like to add a new parameter to the showwarning() function of the warnings module. Problem: it's not possible to do that without breaking backward compatibility (when an application replaces warnings.showwarning(), the warnings module allows and promotes that).

I proposed a patch to add a new showmsg() function which takes a warnings.WarningMessage object: https://bugs.python.org/issue26568

The design is inspired by the logging module and its logging.LogRecord class. The warnings.WarningMessage class already exists. Since it's a class, it's easy to add new attributes without breaking the API.

- If warnings.showwarning() is replaced by an application, this function will be called in practice to log the warning.
- If warnings.showmsg() is replaced, again, this function will be called in practice.
- If both functions are replaced, showmsg() will be called (the replaced showwarning() is ignored).

I'm not sure about the function names: showmsg() and formatmsg(). Maybe showwarnmsg() and formatwarnmsg()? Bikeshedding fight!

The final goal is to log the traceback where the destroyed object was allocated when a ResourceWarning warning is logged: https://bugs.python.org/issue26567

Adding a new parameter to warnings makes the implementation much simpler and gives more freedom to the logger to decide how to format the warning.

Victor
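The compatibility problem Victor describes comes from the fixed signature of the existing hook. The proposed showmsg()/showwarnmsg() does not exist yet; this sketch shows only today's documented replacement mechanism, whose rigid parameter list is exactly what cannot be extended:

```python
import warnings

captured = []

def my_showwarning(message, category, filename, lineno, file=None, line=None):
    # This exact positional signature is mandated for replacements,
    # which is why a new parameter can't be added without breaking
    # every application that installed a hook like this one.
    captured.append((category.__name__, str(message)))

warnings.simplefilter("always")          # ResourceWarning is ignored by default
warnings.showwarning = my_showwarning    # the documented replacement point
warnings.warn("unclosed resource", ResourceWarning)
```

A WarningMessage-based hook would instead receive one object, so new fields (like the allocation traceback Victor wants) could be added without changing the hook's signature.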
Re: [Python-Dev] bitfields - short - and xlc compiler
On Mar 17, 2016, at 18:35, MRAB wrote:
>> On 2016-03-18 00:56, Michael Felt wrote:
>> Update:
>> Is this going to be impossible?
> From what I've been able to find out, the C89 standard limits bitfields to
> int, signed int and unsigned int, and the C99 standard added _Bool, although
> some compilers allow other integer types too. It looks like your compiler
> doesn't allow those additional types.

Yeah, C99 (6.7.2.1) allows "a qualified or unqualified version of _Bool, signed int, unsigned int, or some other implementation-defined type", and same for C11. This means that a compiler could easily allow an implementation-defined type that's identical to and interconvertible with short, say "i16", to be used in bitfields, but not short itself.

And yet, gcc still allows short "even in strictly conforming mode" (4.9), and it looks like Clang and Intel do the same. Meanwhile, MSVC specifically says it's illegal ("The type-specifier for the declarator must be unsigned int, signed int, or int") but then defines the semantics (you can't have a 17-bit short, bit fields act as the underlying type when accessed, alignment is forced to a boundary appropriate for the underlying type). They do mention that allowing char and long types is a Microsoft extension, but still nothing about short, even though it's used in most of the examples on the page.

Anyway, is the question what ctypes should do? If a platform's compiler allows "short M: 1", especially if it has potentially different alignment than "int M: 1", ctypes on that platform had better make ("M", c_short, 1) match the former, right? So it sounds like you need some configure switch to test that your compiler doesn't allow short bit fields, so your ctypes build at least skips that part of _ctypes_test.c and test_bitfields.py, and maybe even doesn't allow them in Python code.

>> test_short fails on AIX when using xlC in any case. How terrible is this?
>>
>> ======================================================================
>> FAIL: test_shorts (ctypes.test.test_bitfields.C_Test)
>> ----------------------------------------------------------------------
>> Traceback (most recent call last):
>>   File "/data/prj/aixtools/python/python-2.7.11.2/Lib/ctypes/test/test_bitfields.py", line 48, in test_shorts
>>     self.assertEqual((name, i, getattr(b, name)), (name, i, func(byref(b), name)))
>> AssertionError: Tuples differ: ('M', 1, -1) != ('M', 1, 1)
>>
>> First differing element 2:
>> -1
>> 1
>>
>> - ('M', 1, -1)
>> ?         -
>>
>> + ('M', 1, 1)
>>
>> ----------------------------------------------------------------------
>> Ran 440 tests in 1.538s
>>
>> FAILED (failures=1, skipped=91)
>> Traceback (most recent call last):
>>   File "./Lib/test/test_ctypes.py", line 15, in <module>
>>     test_main()
>>   File "./Lib/test/test_ctypes.py", line 12, in test_main
>>     run_unittest(unittest.TestSuite(suites))
>>   File "/data/prj/aixtools/python/python-2.7.11.2/Lib/test/test_support.py", line 1428, in run_unittest
>>     _run_suite(suite)
>>   File "/data/prj/aixtools/python/python-2.7.11.2/Lib/test/test_support.py", line 1411, in _run_suite
>>     raise TestFailed(err)
>> test.test_support.TestFailed: Traceback (most recent call last):
>>   File "/data/prj/aixtools/python/python-2.7.11.2/Lib/ctypes/test/test_bitfields.py", line 48, in test_shorts
>>     self.assertEqual((name, i, getattr(b, name)), (name, i, func(byref(b), name)))
>> AssertionError: Tuples differ: ('M', 1, -1) != ('M', 1, 1)
>>
>> First differing element 2:
>> -1
>> 1
>>
>> - ('M', 1, -1)
>> ?         -
>>
>> + ('M', 1, 1)
>>
>>
>>> On 17-Mar-16 23:31, Michael Felt wrote:
>>> a) hope this is not something you expect to be on -list, if so - my
>>> apologies!
>>>
>>> Getting this message (here using c99 as compiler name, but same issue
>>> with xlc as compiler name)
>>> c99 -qarch=pwr4 -qbitfields=signed -DNDEBUG -O -I. -IInclude
>>> -I./Include -I/data/prj/aixtools/python/python-2.7.11.2/Include
>>> -I/data/prj/aixtools/python/python-2.7.11.2 -c
>>> /data/prj/aixtools/python/python-2.7.11.2/Modules/_ctypes/_ctypes_test.c
>>> -o
>>> build/temp.aix-5.3-2.7/data/prj/aixtools/python/python-2.7.11.2/Modules/_ctypes/_ctypes_test.o
>>> "/data/prj/aixtools/python/python-2.7.11.2/Modules/_ctypes/_ctypes_test.c",
>>> line 387.5: 1506-009 (S) Bit field M must be of type signed int,
>>> unsigned int or int.
>>> "/data/prj/aixtools/python/python-2.7.11.2/Modules/_ctypes/_ctypes_test.c",
>>> line 387.5: 1506-009 (S) Bit field N must be of type signed int,
>>> unsigned int or int.
>>> "/data/prj/aixtools/python/python-2.7.11.2/Modules/_ctypes/_ctypes_test.c",
>>> line 387.5: 1506-009 (S) Bit field O must be of type signed int,
>>> unsigned int or int.
>>> "/data/prj/aixtools/python/python-2.7.11.2/Modules/_ctypes/_ctypes_test.c",
>>> line 387.5: 1506-009 (S) Bit field P must be of type signed int,
>>>
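For reference, the portable subset of ctypes bit fields that xlC has no trouble with is the int-typed kind; a minimal sketch (the structure name and field values are ours):

```python
import ctypes

class Flags(ctypes.Structure):
    # Only int/signed int/unsigned int bit fields are guaranteed by
    # C89/C99; short bit fields (used in ctypes' own test suite) are a
    # common compiler extension that xlC rejects, as the errors above show.
    _fields_ = [
        ("M", ctypes.c_int, 1),   # 1-bit signed field: on most ABIs 1 reads back as -1
        ("N", ctypes.c_int, 3),   # 3-bit signed field: range -4..3
        ("O", ctypes.c_uint, 4),  # 4-bit unsigned field: range 0..15
    ]

f = Flags()
f.M = 1
f.N = 3
f.O = 9
```

The ('M', 1, -1) vs ('M', 1, 1) failure in the quoted traceback is exactly this signedness question: whether a 1-bit field set to 1 reads back sign-extended, which differs between compilers unless forced (hence -qbitfields=signed on xlC).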
Re: [Python-Dev] What does a double coding cookie mean?
On 3/19/2016 2:37 PM, Serhiy Storchaka wrote:
> On 19.03.16 19:36, Glenn Linderman wrote:
>> On 3/19/2016 8:19 AM, Serhiy Storchaka wrote:
>>> On 16.03.16 08:03, Serhiy Storchaka wrote:
>>>> I just tested with Emacs, and it looks like when you specify
>>>> different codings on two different lines, the first coding wins,
>>>> but when you specify different codings on the same line, the last
>>>> coding wins. Therefore current CPython behavior can be correct, and
>>>> the regular expression in PEP 263 should be changed to use greedy
>>>> repetition.
>> Just because emacs works that way (and even though I'm an emacs
>> user), that doesn't mean CPython should act like emacs.
> Yes. But current CPython works that way. The behavior of Emacs is the
> argument that maybe this is not a bug.

If CPython properly handles the following line as having only one proper coding declaration (utf-8), then I might reluctantly agree that the behavior of Emacs might be a relevant argument. Otherwise, vehemently not relevant.

# -*- coding: utf-8 -*- this file does not use coding: latin-1

>> (4) there is no benefit to specifying the coding twice on a line, it
>> only adds confusion, whether in CPython, emacs, or vim.
>> (4a) Here's an untested line that emacs would interpret as utf-8, and
>> CPython with the greedy regular expression would interpret as
>> latin-1, because emacs looks only between the -*- pair, and CPython
>> ignores that.
>>
>> # -*- coding: utf-8 -*- this file does not use coding: latin-1
> Since Emacs allows specifying the coding twice on a line, and this can
> be ambiguous, and CPython already detects some ambiguous situations
> (a UTF-8 BOM and a non-UTF-8 coding cookie), it may be worth adding a
> check that the coding is specified only once on a line.

Diagnosing ambiguous conditions, even including my example above, might be useful... for a few files... is it worth the effort? What % of .py sources have coding specifications? What % of those have two?
Re: [Python-Dev] What does a double coding cookie mean?
On 19.03.16 19:36, Glenn Linderman wrote:
> On 3/19/2016 8:19 AM, Serhiy Storchaka wrote:
>> On 16.03.16 08:03, Serhiy Storchaka wrote:
>>> I just tested with Emacs, and it looks like when you specify
>>> different codings on two different lines, the first coding wins, but
>>> when you specify different codings on the same line, the last coding
>>> wins. Therefore current CPython behavior can be correct, and the
>>> regular expression in PEP 263 should be changed to use greedy
>>> repetition.
> Just because emacs works that way (and even though I'm an emacs user),
> that doesn't mean CPython should act like emacs.

Yes. But current CPython works that way. The behavior of Emacs is the argument that maybe this is not a bug.

> (4) there is no benefit to specifying the coding twice on a line, it
> only adds confusion, whether in CPython, emacs, or vim.
> (4a) Here's an untested line that emacs would interpret as utf-8, and
> CPython with the greedy regular expression would interpret as latin-1,
> because emacs looks only between the -*- pair, and CPython ignores
> that.
>
> # -*- coding: utf-8 -*- this file does not use coding: latin-1

Since Emacs allows specifying the coding twice on a line, and this can be ambiguous, and CPython already detects some ambiguous situations (a UTF-8 BOM and a non-UTF-8 coding cookie), it may be worth adding a check that the coding is specified only once on a line.
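The greedy-vs-first distinction under discussion can be shown on Glenn's example line; this is a sketch of the two readings, not the exact expression CPython uses:

```python
import re

line = "# -*- coding: utf-8 -*- this file does not use coding: latin-1"

# A plain scan finds the *first* cookie on the line...
first = re.search(r"coding[:=]\s*([-\w.]+)", line).group(1)

# ...while a greedy prefix makes the *last* cookie win, which is the
# "current CPython behavior" Serhiy describes: the greedy .* consumes
# as much as possible before backtracking to a "coding" occurrence.
last = re.match(r".*coding[:=]\s*([-\w.]+)", line).group(1)
```

With the greedy reading this line declares latin-1, while Emacs (looking only between the -*- pair) sees utf-8, which is exactly the ambiguity the proposed double-cookie check would reject.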
[Python-Dev] Summary of Python tracker Issues
ACTIVITY SUMMARY (2016-03-11 - 2016-03-18)
Python tracker at http://bugs.python.org/

To view or respond to any of the issues listed below, click on the issue.
Do NOT respond to this message.

Issues counts and deltas:
  open    5459 (+5)
  closed 32885 (+43)
  total  38344 (+48)

Open issues with patches: 2375

Issues opened (36)
==================

#15660: Clarify 0 prefix for width specifier in str.format doc, http://bugs.python.org/issue15660 reopened by terry.reedy
#22758: Regression in Python 3.2 cookie parsing http://bugs.python.org/issue22758 reopened by berker.peksag
#25934: ICC compiler: ICC treats denormal floating point numbers as 0. http://bugs.python.org/issue25934 reopened by zach.ware
#26270: Support for read()/write()/select() on asyncio http://bugs.python.org/issue26270 reopened by gvanrossum
#26481: unittest discovery process not working without .py source file http://bugs.python.org/issue26481 reopened by rbcollins
#26541: Add stop_after parameter to setup() http://bugs.python.org/issue26541 opened by memeplex
#26543: imaplib noop Debug http://bugs.python.org/issue26543 opened by Stephen.Evans
#26544: platform.libc_ver() returns incorrect version number http://bugs.python.org/issue26544 opened by Thomas.Waldmann
#26545: os.walk is limited by python's recursion limit http://bugs.python.org/issue26545 opened by Thomas.Waldmann
#26546: Provide translated french translation on docs.python.org http://bugs.python.org/issue26546 opened by sizeof
#26547: Undocumented use of the term dictproxy in vars() documentation http://bugs.python.org/issue26547 opened by sizeof
#26549: co_stacksize is calculated from unoptimized code http://bugs.python.org/issue26549 opened by ztane
#26550: documentation minor issue : "Step back: WSGI" section from "HO http://bugs.python.org/issue26550 opened by Alejandro Soini
#26552: Failing ensure_future still creates a Task http://bugs.python.org/issue26552 opened by gordon
#26553: Write HTTP in uppercase http://bugs.python.org/issue26553 opened by Sudheer Satyanarayana
#26554: PC\bdist_wininst\install.c: Missing call to fclose() http://bugs.python.org/issue26554 opened by maddin200
#26556: Update expat to 2.2.1 http://bugs.python.org/issue26556 opened by christian.heimes
#26557: dictviews methods not present on shelve objects http://bugs.python.org/issue26557 opened by Michael Crouch
#26559: logging.handlers.MemoryHandler flushes on shutdown but not rem http://bugs.python.org/issue26559 opened by David Escott
#26560: Error in assertion in wsgiref.handlers.BaseHandler.start_respo http://bugs.python.org/issue26560 opened by inglesp
#26565: [ctypes] Add value attribute to non basic pointers. http://bugs.python.org/issue26565 opened by memeplex
#26566: Failures on FreeBSD CURRENT buildbot http://bugs.python.org/issue26566 opened by haypo
#26567: ResourceWarning: Use tracemalloc to display the traceback wher http://bugs.python.org/issue26567 opened by haypo
#26568: Add a new warnings.showmsg() function taking a warnings.Warnin http://bugs.python.org/issue26568 opened by haypo
#26571: turtle regression in 3.5 http://bugs.python.org/issue26571 opened by Ellison Marks
#26574: replace_interleave can be optimized for single character byte http://bugs.python.org/issue26574 opened by Josh Snider
#26576: Tweak wording of decorator docos http://bugs.python.org/issue26576 opened by Rosuav
#26577: inspect.getclosurevars returns incorrect variable when using c http://bugs.python.org/issue26577 opened by Ryan Fox
#26578: Bad BaseHTTPRequestHandler response when using HTTP/0.9 http://bugs.python.org/issue26578 opened by xiang.zhang
#26579: Support pickling slots in subclasses of common classes http://bugs.python.org/issue26579 opened by serhiy.storchaka
#26581: Double coding cookie http://bugs.python.org/issue26581 opened by serhiy.storchaka
#26582: asyncio documentation links to wrong CancelledError http://bugs.python.org/issue26582 opened by awilfox
#26584: pyclbr module needs to be more flexible on loader support http://bugs.python.org/issue26584 opened by eric.snow
#26585: Use html.escape to replace _quote_html in http.server http://bugs.python.org/issue26585 opened by xiang.zhang
#26586: Simple enhancement to BaseHTTPRequestHandler http://bugs.python.org/issue26586 opened by xiang.zhang
#26587: Possible duplicate entries in sys.path if .pth files are used http://bugs.python.org/issue26587 opened by tds333

Most recent 15 issues with no replies (15)
==========================================

#26584: pyclbr module needs to be more flexible on loader support http://bugs.python.org/issue26584
#26582: asyncio documentation links to wrong CancelledError http://bugs.python.org/issue26582
#26581: Double coding cookie http://bugs.python.org/issue26581
#26579: Support pickling slots in subclasses of common classes http://bugs.python.org/issue26579
#26577: inspect.getclosurevars returns incorrect variable
Re: [Python-Dev] What does a double coding cookie mean?
On 17.03.2016 18:53, Serhiy Storchaka wrote:
> On 17.03.16 19:23, M.-A. Lemburg wrote:
>> On 17.03.2016 15:02, Serhiy Storchaka wrote:
>>> On 17.03.16 15:14, M.-A. Lemburg wrote:
>>>> On 17.03.2016 01:29, Guido van Rossum wrote:
>>>>> Should we recommend that everyone use tokenize.detect_encoding()?
>>>> I'd prefer a separate utility for this somewhere, since
>>>> tokenize.detect_encoding() is not available in Python 2. I've
>>>> attached an example implementation with tests, which works in
>>>> Python 2.7 and 3.
>>> Sorry, but this code doesn't match the behaviour of the Python
>>> interpreter, nor other tools. I suggest to backport
>>> tokenize.detect_encoding() (but be aware that the default encoding
>>> in Python 2 is ASCII, not UTF-8).
>> Yes, I got the default for Python 3 wrong. I'll fix that. Thanks
>> for the note.
>>
>> What other aspects are different than what Python implements?
>
> 1. If there is a BOM and coding cookie, the source encoding is "utf-8-sig".

Ok, that makes sense (even though it's not mandated by the PEP; the utf-8-sig codec didn't exist yet).

> 2. If there is a BOM and coding cookie is not 'utf-8', this is an error.

It's an error for Python, but why should a detection function always raise an error for this case? It would probably be a good idea to have an errors parameter to leave this to the user to decide. Same for unknown encodings.

> 3. If the first line is not blank or comment line, the coding cookie is
> not searched in the second line.

Hmm, the PEP does allow having the coding cookie in the second line, even if the first line is not a comment. Perhaps that's not really needed.

> 4. Encoding name should be canonized. "UTF8", "utf8", "utf_8" and
> "utf-8" is the same encoding (and all are changed to "utf-8-sig" with BOM).

Well, that's cosmetics :-) The codec system will take care of this when needed.

> 5. There isn't the limit of 400 bytes. Actually there is a bug with
> handling long lines in current code, but even with this bug the limit is
> larger.
I think it's a reasonable limit, since shebang lines may only be 127 bytes long on at least Linux (and probably several other Unix systems as well). But just in case, I made this configurable :-)

> 6. I made a mistake in the regular expression, missed the underscore.

I added it.

> tokenize.detect_encoding() is the closest imitation of the behavior of
> the Python interpreter.

Probably, but that doesn't help us on Python 2, right?

I'll upload the script to github later today or tomorrow to continue development.

-- 
Marc-Andre Lemburg
eGenix.com

Professional Python Services directly from the Experts (#1, Mar 17 2016)
>>> Python Projects, Coaching and Consulting ...  http://www.egenix.com/
>>> Python Database Interfaces ...           http://products.egenix.com/
>>> Plone/Zope Database Interfaces ...           http://zope.egenix.com/

2016-03-07: Released eGenix pyOpenSSL 0.13.14 ... http://egenix.com/go89
2016-02-19: Released eGenix PyRun 2.1.2 ...       http://egenix.com/go88

::: We implement business ideas - efficiently in both time and costs :::

   eGenix.com Software, Skills and Services GmbH  Pastor-Loeh-Str.48
    D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg
           Registered at Amtsgericht Duesseldorf: HRB 46611
               http://www.egenix.com/company/contact/
                      http://www.malemburg.com/
Re: [Python-Dev] PEP 515: Underscores in Numeric Literals (revision 3)
All that sounds fine!

On Sat, Mar 19, 2016 at 11:28 AM, Stefan Krah wrote:
> Guido van Rossum python.org> writes:
>> So should the preprocessing step just be s.replace('_', ''), or should
>> it reject underscores that don't follow the rules from the PEP
>> (perhaps augmented so they follow the spirit of the PEP and the letter
>> of the IBM spec)?
>>
>> Honestly I think it's also fine if specifying this exactly is left out
>> of the PEP, and handled by whoever adds this to Decimal. Having a PEP
>> to work from for the language spec and core builtins (int(), float(),
>> complex()) is more important.
>
> I'd keep it simple for Decimal: Remove left and right whitespace (we're
> already doing this), then remove underscores from the remaining string
> (which must not contain any further whitespace), then use the IBM grammar.
>
> We could add a clause to the PEP that only those strings that follow
> the spirit of the PEP are guaranteed to be accepted in the future.
>
> One reason for keeping it simple is that I would not like to slow down
> string conversion, but thinking about two grammars is also a problem --
> part of the string conversion in libmpdec is modeled in ACL2, which
> would be invalidated or at least complicated with two grammars.
>
> Stefan Krah

-- 
--Guido van Rossum (python.org/~guido)
Re: [Python-Dev] PEP 515: Underscores in Numeric Literals (revision 3)
Guido van Rossum python.org> writes: > So should the preprocessing step just be s.replace('_', ''), or should > it reject underscores that don't follow the rules from the PEP > (perhaps augmented so they follow the spirit of the PEP and the letter > of the IBM spec)? > > Honestly I think it's also fine if specifying this exactly is left out > of the PEP, and handled by whoever adds this to Decimal. Having a PEP > to work from for the language spec and core builtins (int(), float() > complex()) is more important. I'd keep it simple for Decimal: Remove left and right whitespace (we're already doing this), then remove underscores from the remaining string (which must not contain any further whitespace), then use the IBM grammar. We could add a clause to the PEP that only those strings that follow the spirit of the PEP are guaranteed to be accepted in the future. One reason for keeping it simple is that I would not like to slow down string conversion, but thinking about two grammars is also a problem -- part of the string conversion in libmpdec is modeled in ACL2, which would be invalidated or at least complicated with two grammars. Stefan Krah ___ Python-Dev mailing list Python-Dev@python.org https://mail.python.org/mailman/listinfo/python-dev Unsubscribe: https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
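The preprocessing Stefan describes can be sketched in a few lines (a hypothetical helper, not the eventual libmpdec implementation; the name decimal_from_underscored is made up):

```python
from decimal import Decimal

def decimal_from_underscored(s):
    # Strip left and right whitespace (Decimal already does this),
    # then drop underscores; the remainder must not contain any
    # further whitespace.
    s = s.strip()
    if any(c.isspace() for c in s):
        raise ValueError("whitespace inside the literal: %r" % s)
    # Hand the cleaned string to the IBM-grammar parser.
    return Decimal(s.replace("_", ""))

print(decimal_from_underscored("  1_000.000_1  "))  # 1000.0001
```

Note that this sketch deliberately follows the "keep it simple" route: it does not validate underscore placement, so strings outside the spirit of the PEP are also accepted.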
Re: [Python-Dev] What does a double coding cookie mean?
Glenn Linderman writes: > On 3/19/2016 8:19 AM, Serhiy Storchaka wrote: > > Therefore current CPython behavior can be correct, and the regular > > expression in PEP 263 should be changed to use greedy repetition. > > Just because emacs works that way (and even though I'm an emacs user), > that doesn't mean CPython should act like emacs. > > (1) CPython should not necessarily act like emacs, We can't treat Emacs as a spec, because Emacs doesn't follow specs, doesn't respect standards, and above a certain level of inconvenience to developers doesn't respect backward compatibility. There's never any guarantee that Emacs will do the same thing tomorrow that it does today, although inertia has mostly the same effect. In this case, there's a reason why Emacs behaves the way it does, which is that you can put an arbitrary sequence of variable assignments in "-*- ... -*-" and they will be executed in order. So it makes sense that "last coding wins". But pragmas are severely deprecated in Python; cookies got a very special exception. So that rationale can't apply to Python. > (4) there is no benefit to specifying the coding twice on a line, it > only adds confusion, whether in CPython, emacs, or vim. Indeed. I see no point in reading past the first cookie found (whether a valid codec or not), unless an error would be raised. That might be a good idea, but I doubt it's worth the implementation complexity. ___ Python-Dev mailing list Python-Dev@python.org https://mail.python.org/mailman/listinfo/python-dev Unsubscribe: https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] PEP 515: Underscores in Numeric Literals (revision 3)
So should the preprocessing step just be s.replace('_', ''), or should it reject underscores that don't follow the rules from the PEP (perhaps augmented so they follow the spirit of the PEP and the letter of the IBM spec)? Honestly I think it's also fine if specifying this exactly is left out of the PEP, and handled by whoever adds this to Decimal. Having a PEP to work from for the language spec and core builtins (int(), float(), complex()) is more important. On Sat, Mar 19, 2016 at 10:24 AM, Stefan Krah wrote: > > Guido van Rossum python.org> writes: >> I don't care too much either way, but I think passing underscores to the > constructor shouldn't be affected by the context -- the underscores are just > removed before parsing the number. But if it's too complicated to implement > I'm fine with punting. > > Just removing the underscores would be fine. The problem is that per > the PEP the conversion should happen according to the Python float grammar > but the actual decimal grammar is the one from the IBM specification. > > I'd much rather express the problem like you did above: A preprocessing > step followed by the IBM specification grammar. > > > > Stefan Krah > > > > > > > ___ > Python-Dev mailing list > Python-Dev@python.org > https://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: > https://mail.python.org/mailman/options/python-dev/guido%40python.org -- --Guido van Rossum (python.org/~guido) ___ Python-Dev mailing list Python-Dev@python.org https://mail.python.org/mailman/listinfo/python-dev Unsubscribe: https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
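A rule-enforcing variant of the preprocessing step could look roughly like this (a hedged sketch for decimal integers only; the regex captures the spirit of the PEP -- underscores strictly between digits -- not the final grammar, and the helper name is made up):

```python
import re

# Underscores may only separate runs of digits: no leading, trailing,
# or doubled underscores.
_UNDERSCORED_INT = re.compile(r"[0-9]+(?:_[0-9]+)*$")

def int_with_underscores(s):
    if not _UNDERSCORED_INT.match(s):
        raise ValueError("bad underscore placement: %r" % s)
    # Once validated, the underscores carry no meaning and can go.
    return int(s.replace("_", ""))

print(int_with_underscores("1_000_000"))  # 1000000
```

Strings like "1__0" or "_1" fail the placement check instead of being silently accepted, which is the trade-off Guido raises above.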
Re: [Python-Dev] What does a double coding cookie mean?
On 3/19/2016 8:19 AM, Serhiy Storchaka wrote: On 16.03.16 08:03, Serhiy Storchaka wrote: On 15.03.16 22:30, Guido van Rossum wrote: I came across a file that had two different coding cookies -- one on the first line and one on the second. CPython uses the first, but mypy happens to use the second. I couldn't find anything in the spec or docs ruling out the second interpretation. Does anyone have a suggestion (apart from following CPython)? Reference: https://github.com/python/mypy/issues/1281 There is similar question. If a file has two different coding cookies on the same line, what should win? Currently the last cookie wins, in CPython parser, in the tokenize module, in IDLE, and in number of other code. I think this is a bug. I just tested with Emacs, and it looks that when specify different codings on two different lines, the first coding wins, but when specify different codings on the same line, the last coding wins. Therefore current CPython behavior can be correct, and the regular expression in PEP 263 should be changed to use greedy repetition. Just because emacs works that way (and even though I'm an emacs user), that doesn't mean CPython should act like emacs. (1) CPython should not necessarily act like emacs, unless the coding syntax exactly matches emacs, rather than the generic coding that CPython interprets, that matches emacs, vim, and other similar things that both emacs and vim would ignore. (1a) Maybe if a similar test were run on vim with its syntax, and it also works the same way, then one might think it is a trend worth following, but it is not clear to this non-vim user that vim syntax allows more than one coding specification per line. (2) emacs has no requirement that the coding be placed on the first two lines. It specifically looks at the second line only if the first line has a “ #! ” or a “ '\" ” (for troff). (according to docs, not experimentation) (3) emacs also allows for Local Variables to be specified at the end of the file. 
If CPython were really to act like emacs, then it would need to allow for that too. (4) there is no benefit to specifying the coding twice on a line, it only adds confusion, whether in CPython, emacs, or vim. (4a) Here's an untested line that emacs would interpret as utf-8, and CPython with the greedy regular expression would interpret as latin-1, because emacs looks only between the -*- pair, and CPython ignores that. # -*- coding: utf-8 -*- this file does not use coding: latin-1 ___ Python-Dev mailing list Python-Dev@python.org https://mail.python.org/mailman/listinfo/python-dev Unsubscribe: https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
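The untested line above is easy to check. The only difference between the two expressions below is greedy ".*" versus non-greedy ".*?" before "coding" (a simplified form of the PEP 263 expression, not the exact one):

```python
import re

line = b"# -*- coding: utf-8 -*- this file does not use coding: latin-1"

lazy   = re.match(rb"[ \t]*#.*?coding[:=][ \t]*([-.a-zA-Z0-9]+)", line)
greedy = re.match(rb"[ \t]*#.*coding[:=][ \t]*([-.a-zA-Z0-9]+)", line)

# The lazy form stops at the first cookie; the greedy form backtracks
# from the end of the line and so lands on the last one.
print(lazy.group(1))    # b'utf-8'   -- first cookie wins
print(greedy.group(1))  # b'latin-1' -- last cookie wins
```

This reproduces exactly the utf-8 vs. latin-1 disagreement described in point (4a).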
Re: [Python-Dev] PEP 515: Underscores in Numeric Literals (revision 3)
Guido van Rossum python.org> writes: > I don't care too much either way, but I think passing underscores to the constructor shouldn't be affected by the context -- the underscores are just removed before parsing the number. But if it's too complicated to implement I'm fine with punting. Just removing the underscores would be fine. The problem is that per the PEP the conversion should happen according to the Python float grammar but the actual decimal grammar is the one from the IBM specification. I'd much rather express the problem like you did above: A preprocessing step followed by the IBM specification grammar. Stefan Krah ___ Python-Dev mailing list Python-Dev@python.org https://mail.python.org/mailman/listinfo/python-dev Unsubscribe: https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] PEP 515: Underscores in Numeric Literals (revision 3)
I don't care too much either way, but I think passing underscores to the constructor shouldn't be affected by the context -- the underscores are just removed before parsing the number. But if it's too complicated to implement I'm fine with punting. --Guido (mobile) On Mar 19, 2016 6:24 AM, "Nick Coghlan" wrote: > On 19 March 2016 at 16:44, Georg Brandl wrote: > > On the other hand, assuming decimal literals are introduced at some > > point, they would almost definitely need to support underscores. > > Of course, the decision whether to modify the Decimal constructor > > can be postponed until that time. > > The idea of Decimal literals is complicated significantly by their > current context dependent behaviour (especially when it comes to > rounding), so I'd suggest leaving them alone in the context of this > PEP. > > Cheers, > Nick. > > -- > Nick Coghlan | ncogh...@gmail.com | Brisbane, Australia > ___ > Python-Dev mailing list > Python-Dev@python.org > https://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: > https://mail.python.org/mailman/options/python-dev/guido%40python.org > ___ Python-Dev mailing list Python-Dev@python.org https://mail.python.org/mailman/listinfo/python-dev Unsubscribe: https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] What does a double coding cookie mean?
On 17.03.2016 01:29, Guido van Rossum wrote: > I've updated the PEP. Please review. I decided not to update the > Unicode howto (the thing is too obscure). Serhiy, you're probably in a > better position to fix the code looking for cookies to pick the first > one if there are two on the same line (or do whatever you think should > be done there). Thanks, will do. > Should we recommend that everyone use tokenize.detect_encoding()? I'd prefer a separate utility for this somewhere, since tokenize.detect_encoding() is not available in Python 2. I've attached an example implementation with tests, which works in Python 2.7 and 3. > On Wed, Mar 16, 2016 at 5:05 PM, Guido van Rossum wrote: >> On Wed, Mar 16, 2016 at 12:59 AM, M.-A. Lemburg wrote: >>> The only reason to read up to two lines was to address the use of >>> the shebang on Unix, not to be able to define two competing >>> source code encodings :-) >> >> I know. I was just surprised that the PEP was sufficiently vague about >> it that when I found that mypy picked the second if there were two, I >> couldn't prove to myself that it was violating the PEP. I'd rather >> clarify the PEP than rely on the reasoning presented earlier here. I suppose it's a rather rare case, since it's the first time that I heard about anyone thinking that a possible second line could be picked - after 15 years :-) >> I don't like erroring out when there are two different cookies on two >> lines; I feel that the spirit of the PEP is to read up to two lines >> until a cookie is found, whichever comes first. >> >> I will update the regex in the PEP too (or change the wording to avoid >> "match"). >> >> I'm not sure what to do if there are two cookies on one line. If >> CPython currently picks the latter we may want to preserve that >> behavior. >> >> Should we recommend that everyone use tokenize.detect_encoding()? 
>> >> -- >> --Guido van Rossum (python.org/~guido) > > > -- Marc-Andre Lemburg eGenix.com Professional Python Services directly from the Experts (#1, Mar 17 2016) >>> Python Projects, Coaching and Consulting ... http://www.egenix.com/ >>> Python Database Interfaces ... http://products.egenix.com/ >>> Plone/Zope Database Interfaces ... http://zope.egenix.com/ 2016-03-07: Released eGenix pyOpenSSL 0.13.14 ... http://egenix.com/go89 2016-02-19: Released eGenix PyRun 2.1.2 ... http://egenix.com/go88 ::: We implement business ideas - efficiently in both time and costs ::: eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48 D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg Registered at Amtsgericht Duesseldorf: HRB 46611 http://www.egenix.com/company/contact/ http://www.malemburg.com/

#!/usr/bin/python
"""
Utility to detect the source code encoding of a Python file.

Marc-Andre Lemburg, 2016.

Supports Python 2.7 and 3.

"""
import sys
import re
import codecs

# Debug output ?
_debug = True

# PEP 263 RE
PEP263 = re.compile(b'^[ \t]*#.*?coding[:=][ \t]*([-.a-zA-Z0-9]+)',
                    re.MULTILINE)

###

def detect_source_encoding(code, buffer_size=400):

    """ Detect and return the source code encoding of the Python code
        given in code.

        code must be given as bytes.

        The function uses a buffer to determine the first two code
        lines with a default size of 400 bytes/code points. This can
        be adjusted using the buffer_size parameter.

    """
    # Get the first two lines
    first_two_lines = b'\n'.join(code[:buffer_size].splitlines()[:2])

    # BOMs override any source code encoding comments
    if first_two_lines.startswith(codecs.BOM):
        return 'utf-8'

    # .search() picks the first occurrence
    m = PEP263.search(first_two_lines)
    if m is None:
        return 'ascii'
    return m.group(1).decode('ascii')

# Tests

def _test():
    l = (
        (b"""\
# No encoding
""", 'ascii'),
        (b"""\
# coding: latin-1
""", 'latin-1'),
        (b"""\
#!/usr/bin/python
# coding: utf-8
""", 'utf-8'),
        (b"""\
coding=123
# The above could be detected as source code encoding
""", 'ascii'),
        (b"""\
# coding: latin-1
# coding: utf-8
""", 'latin-1'),
        (b"""\
# No encoding on first line
# No encoding on second line
# coding: utf-8
""", 'ascii'),
        (codecs.BOM + b"""\
# No encoding
""", 'utf-8'),
        (codecs.BOM + b"""\
# BOM and encoding
# coding: latin-1
""", 'utf-8'),
        )
    for code, encoding in l:
        if _debug:
            print ('=' * 72)
            print ('Checking:')
            print ('-' * 72)
            print (code.decode('latin-1'))
            print ('-' * 72)
        detected_encoding = detect_source_encoding(code)
        if _debug:
            print ('detected: %s, expected: %s' %
                   (detected_encoding, encoding))
        assert
Re: [Python-Dev] What does a double coding cookie mean?
On 16.03.16 08:03, Serhiy Storchaka wrote: On 15.03.16 22:30, Guido van Rossum wrote: I came across a file that had two different coding cookies -- one on the first line and one on the second. CPython uses the first, but mypy happens to use the second. I couldn't find anything in the spec or docs ruling out the second interpretation. Does anyone have a suggestion (apart from following CPython)? Reference: https://github.com/python/mypy/issues/1281 There is a similar question. If a file has two different coding cookies on the same line, what should win? Currently the last cookie wins, in the CPython parser, in the tokenize module, in IDLE, and in a number of other places. I think this is a bug. I just tested with Emacs, and it looks like when you specify different codings on two different lines, the first coding wins, but when you specify different codings on the same line, the last coding wins. Therefore the current CPython behavior can be correct, and the regular expression in PEP 263 should be changed to use greedy repetition. ___ Python-Dev mailing list Python-Dev@python.org https://mail.python.org/mailman/listinfo/python-dev Unsubscribe: https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] What does a double coding cookie mean?
On 17.03.2016 15:55, Guido van Rossum wrote: > On Thu, Mar 17, 2016 at 5:04 AM, Serhiy Storchaka wrote: >>> Should we recommend that everyone use tokenize.detect_encoding()? >> >> Likely. However the interface of tokenize.detect_encoding() is not very >> simple. > > I just found that out yesterday. You have to give it a readline() > function, which is cumbersome if all you have is a (byte) string and > you don't want to split it on lines just yet. And the readline() > function raises SyntaxError when the encoding isn't right. I wish > there were a lower-level helper that just took a line and told you > what the encoding in it was, if any. Then the rest of the logic can be > handled by the caller (including the logic of trying up to two lines). I've uploaded the code I posted yesterday, modified to address some of the issues it had to github: https://github.com/malemburg/python-snippets/blob/master/detect_source_encoding.py I'm pretty sure the two-lines read can be optimized away and put straight into the regular expression used for matching. -- Marc-Andre Lemburg eGenix.com Professional Python Services directly from the Experts (#1, Mar 18 2016) >>> Python Projects, Coaching and Consulting ... http://www.egenix.com/ >>> Python Database Interfaces ... http://products.egenix.com/ >>> Plone/Zope Database Interfaces ... http://zope.egenix.com/ 2016-03-07: Released eGenix pyOpenSSL 0.13.14 ... http://egenix.com/go89 2016-02-19: Released eGenix PyRun 2.1.2 ... http://egenix.com/go88 ::: We implement business ideas - efficiently in both time and costs ::: eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48 D-40764 Langenfeld, Germany. CEO Dipl.-Math. 
Marc-Andre Lemburg Registered at Amtsgericht Duesseldorf: HRB 46611 http://www.egenix.com/company/contact/ http://www.malemburg.com/ ___ Python-Dev mailing list Python-Dev@python.org https://mail.python.org/mailman/listinfo/python-dev Unsubscribe: https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] What does a double coding cookie mean?
On 17.03.16 21:11, Guido van Rossum wrote: I tried this and it was too painful, so now I've just changed the regex that mypy uses to use non-eager matching (https://github.com/python/mypy/commit/b291998a46d580df412ed28af1ba1658446b9fe5). \s* matches newlines. {0,1}? is the same as ??. ___ Python-Dev mailing list Python-Dev@python.org https://mail.python.org/mailman/listinfo/python-dev Unsubscribe: https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
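Both of Serhiy's observations here are easy to check interactively with plain re (nothing mypy-specific):

```python
import re

# \s matches any whitespace, including newlines, so \s* can silently
# cross line boundaries in a cookie regex.
assert re.match(r"a\s*b", "a\n\nb")

# {0,1}? is just the verbose spelling of the lazy optional ??
assert re.fullmatch(r"ab{0,1}?", "a")
assert re.fullmatch(r"ab??", "a")

print("both observations hold")
```

The first point matters for encoding-cookie regexes: an unanchored \s* can let a pattern intended for one line match across two.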
Re: [Python-Dev] bitfields - short - and xlc compiler
Update: Is this going to be impossible? test_short fails om AIX when using xlC in any case. How terrible is this? == FAIL: test_shorts (ctypes.test.test_bitfields.C_Test) -- Traceback (most recent call last): File "/data/prj/aixtools/python/python-2.7.11.2/Lib/ctypes/test/test_bitfields.py", line 48, in test_shorts self.assertEqual((name, i, getattr(b, name)), (name, i, func(byref(b), name))) AssertionError: Tuples differ: ('M', 1, -1) != ('M', 1, 1) First differing element 2: -1 1 - ('M', 1, -1) ? - + ('M', 1, 1) -- Ran 440 tests in 1.538s FAILED (failures=1, skipped=91) Traceback (most recent call last): File "./Lib/test/test_ctypes.py", line 15, in test_main() File "./Lib/test/test_ctypes.py", line 12, in test_main run_unittest(unittest.TestSuite(suites)) File "/data/prj/aixtools/python/python-2.7.11.2/Lib/test/test_support.py", line 1428, in run_unittest _run_suite(suite) File "/data/prj/aixtools/python/python-2.7.11.2/Lib/test/test_support.py", line 1411, in _run_suite raise TestFailed(err) test.test_support.TestFailed: Traceback (most recent call last): File "/data/prj/aixtools/python/python-2.7.11.2/Lib/ctypes/test/test_bitfields.py", line 48, in test_shorts self.assertEqual((name, i, getattr(b, name)), (name, i, func(byref(b), name))) AssertionError: Tuples differ: ('M', 1, -1) != ('M', 1, 1) First differing element 2: -1 1 - ('M', 1, -1) ? - + ('M', 1, 1) On 17-Mar-16 23:31, Michael Felt wrote: a) hope this is not something you expect to be on -list, if so - my apologies! Getting this message (here using c99 as compiler name, but same issue with xlc as compiler name) c99 -qarch=pwr4 -qbitfields=signed -DNDEBUG -O -I. 
-IInclude -I./Include -I/data/prj/aixtools/python/python-2.7.11.2/Include -I/data/prj/aixtools/python/python-2.7.11.2 -c /data/prj/aixtools/python/python-2.7.11.2/Modules/_ctypes/_ctypes_test.c -o build/temp.aix-5.3-2.7/data/prj/aixtools/python/python-2.7.11.2/Modules/_ctypes/_ctypes_test.o "/data/prj/aixtools/python/python-2.7.11.2/Modules/_ctypes/_ctypes_test.c", line 387.5: 1506-009 (S) Bit field M must be of type signed int, unsigned int or int. "/data/prj/aixtools/python/python-2.7.11.2/Modules/_ctypes/_ctypes_test.c", line 387.5: 1506-009 (S) Bit field N must be of type signed int, unsigned int or int. "/data/prj/aixtools/python/python-2.7.11.2/Modules/_ctypes/_ctypes_test.c", line 387.5: 1506-009 (S) Bit field O must be of type signed int, unsigned int or int. "/data/prj/aixtools/python/python-2.7.11.2/Modules/_ctypes/_ctypes_test.c", line 387.5: 1506-009 (S) Bit field P must be of type signed int, unsigned int or int. "/data/prj/aixtools/python/python-2.7.11.2/Modules/_ctypes/_ctypes_test.c", line 387.5: 1506-009 (S) Bit field Q must be of type signed int, unsigned int or int. "/data/prj/aixtools/python/python-2.7.11.2/Modules/_ctypes/_ctypes_test.c", line 387.5: 1506-009 (S) Bit field R must be of type signed int, unsigned int or int. "/data/prj/aixtools/python/python-2.7.11.2/Modules/_ctypes/_ctypes_test.c", line 387.5: 1506-009 (S) Bit field S must be of type signed int, unsigned int or int. for: struct BITS { int A: 1, B:2, C:3, D:4, E: 5, F: 6, G: 7, H: 8, I: 9; short M: 1, N: 2, O: 3, P: 4, Q: 5, R: 6, S: 7; }; in short xlC v11 does not like short (xlC v7 might have accepted it, but "32-bit machines were common then". I am guessing that 16-bit is not well liked on 64-bit hw now. 
reference for xlC v7, where short was (apparently) still accepted: http://www.serc.iisc.ernet.in/facilities/ComputingFacilities/systems/cluster/vac-7.0/html/language/ref/clrc03defbitf.htm I am taking this from the xlC v7 documentation at the URL, not because I know it personally. So - my question: if "short" is unacceptable for POWER, or maybe only xlC (not tried with gcc) - how terrible is this, and is it possible to adjust the test so the test is accurate? I am going to modify the test code so it is struct BITS { signed int A: 1, B:2, C:3, D:4, E: 5, F: 6, G: 7, H: 8, I: 9; unsigned int M: 1, N: 2, O: 3, P: 4, Q: 5, R: 6, S: 7; }; And see what happens - BUT - what impact does this have on python - assuming that "short" bitfields are not supported? p.s. not submitting this as a bug (now) as it may just be that "you" consider it a bug in xlC to not support (signed) short bit fields. p.p.s. Note: xlc, by default, considers bitfields to be unsigned. I was trying to force them to signed with -qbitfields=signed - and I still got messages. So, going back to defaults. ___ Python-Dev mailing list Python-Dev@python.org https://mail.python.org/mailman/listinfo/python-dev Unsubscribe:
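The failing assertion can be reproduced from Python without writing any C, via a ctypes structure mirroring the test's bit fields (a sketch; behaviour shown is what a typical gcc/x86-64 CPython build produces, which is exactly what differs under xlC):

```python
import ctypes

class BITS(ctypes.Structure):
    # Mirrors the int and short bit fields from the ctypes test suite.
    _fields_ = [("A", ctypes.c_int, 1),
                ("M", ctypes.c_short, 1)]

b = BITS()
b.A = 1
b.M = 1
# A 1-bit *signed* field can only represent 0 and -1, so storing 1
# reads back as -1 -- the very value test_shorts expects in ('M', 1, -1).
print(b.A, b.M)
```

C89 only guarantees bit fields of type int, signed int, and unsigned int; short bit fields are a compiler extension, which is why xlC rejects the struct outright.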
[Python-Dev] GSoC: looking for a student to help on FAT Python
Hi, I am now looking for a Google Summer of Code (GSoC) student to help me with my FAT Python project, a new static optimizer for CPython 3.6 using specialization with guards. The FAT Python project is already fully functional; the code is written and tested. I need help to implement new efficient optimizations to "finish" the project and prove that my design really allows applications to run faster. FAT Python project: https://faster-cpython.readthedocs.org/fat_python.html fatoptimizer module: https://fatoptimizer.readthedocs.org/ Slides of my talk at FOSDEM: https://github.com/haypo/conf/raw/master/2016-FOSDEM/fat_python.pdf The "fatoptimizer" optimizer is written in pure Python. I'm looking for a student who knows compilers, especially static optimizations like loop unrolling and function inlining. For concrete tasks, take a look at the TODO list: https://fatoptimizer.readthedocs.org/en/latest/todo.html Hurry up students! The deadline is in 1 week! (Sorry, I'm late for my project...) -- PSF GSoC, Python core projects: https://wiki.python.org/moin/SummerOfCode/2016/python-core All PSF GSoC projects: https://wiki.python.org/moin/SummerOfCode/2016 GSOC: https://developers.google.com/open-source/gsoc/ Victor ___ Python-Dev mailing list Python-Dev@python.org https://mail.python.org/mailman/listinfo/python-dev Unsubscribe: https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
[Python-Dev] bitfields - short - and xlc compiler
a) hope this is not something you expect to be on -list, if so - my apologies! Getting this message (here using c99 as compiler name, but same issue with xlc as compiler name) c99 -qarch=pwr4 -qbitfields=signed -DNDEBUG -O -I. -IInclude -I./Include -I/data/prj/aixtools/python/python-2.7.11.2/Include -I/data/prj/aixtools/python/python-2.7.11.2 -c /data/prj/aixtools/python/python-2.7.11.2/Modules/_ctypes/_ctypes_test.c -o build/temp.aix-5.3-2.7/data/prj/aixtools/python/python-2.7.11.2/Modules/_ctypes/_ctypes_test.o "/data/prj/aixtools/python/python-2.7.11.2/Modules/_ctypes/_ctypes_test.c", line 387.5: 1506-009 (S) Bit field M must be of type signed int, unsigned int or int. "/data/prj/aixtools/python/python-2.7.11.2/Modules/_ctypes/_ctypes_test.c", line 387.5: 1506-009 (S) Bit field N must be of type signed int, unsigned int or int. "/data/prj/aixtools/python/python-2.7.11.2/Modules/_ctypes/_ctypes_test.c", line 387.5: 1506-009 (S) Bit field O must be of type signed int, unsigned int or int. "/data/prj/aixtools/python/python-2.7.11.2/Modules/_ctypes/_ctypes_test.c", line 387.5: 1506-009 (S) Bit field P must be of type signed int, unsigned int or int. "/data/prj/aixtools/python/python-2.7.11.2/Modules/_ctypes/_ctypes_test.c", line 387.5: 1506-009 (S) Bit field Q must be of type signed int, unsigned int or int. "/data/prj/aixtools/python/python-2.7.11.2/Modules/_ctypes/_ctypes_test.c", line 387.5: 1506-009 (S) Bit field R must be of type signed int, unsigned int or int. "/data/prj/aixtools/python/python-2.7.11.2/Modules/_ctypes/_ctypes_test.c", line 387.5: 1506-009 (S) Bit field S must be of type signed int, unsigned int or int. for: struct BITS { int A: 1, B:2, C:3, D:4, E: 5, F: 6, G: 7, H: 8, I: 9; short M: 1, N: 2, O: 3, P: 4, Q: 5, R: 6, S: 7; }; in short xlC v11 does not like short (xlC v7 might have accepted it, but "32-bit machines were common then". I am guessing that 16-bit is not well liked on 64-bit hw now. 
reference for xlC v7, where short was (apparently) still accepted: http://www.serc.iisc.ernet.in/facilities/ComputingFacilities/systems/cluster/vac-7.0/html/language/ref/clrc03defbitf.htm I am taking this from the xlC v7 documentation at the URL, not because I know it personally. So - my question: if "short" is unacceptable for POWER, or maybe only xlC (not tried with gcc) - how terrible is this, and is it possible to adjust the test so the test is accurate? I am going to modify the test code so it is struct BITS { signed int A: 1, B:2, C:3, D:4, E: 5, F: 6, G: 7, H: 8, I: 9; unsigned int M: 1, N: 2, O: 3, P: 4, Q: 5, R: 6, S: 7; }; And see what happens - BUT - what impact does this have on python - assuming that "short" bitfields are not supported? p.s. not submitting this as a bug (now) as it may just be that "you" consider it a bug in xlC to not support (signed) short bit fields. p.p.s. Note: xlc, by default, considers bitfields to be unsigned. I was trying to force them to signed with -qbitfields=signed - and I still got messages. So, going back to defaults. ___ Python-Dev mailing list Python-Dev@python.org https://mail.python.org/mailman/listinfo/python-dev Unsubscribe: https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] What does a double coding cookie mean?
On 3/16/2016 3:14 AM, Serhiy Storchaka wrote: On 16.03.16 02:28, Guido van Rossum wrote: I agree that the spirit of the PEP is to stop at the first coding cookie found. Would it be okay if I updated the PEP to clarify this? I'll definitely also update the docs. Could you please also update the regular expression in PEP 263 to "^[ \t\v]*#.*?coding[:=][ \t]*([-.a-zA-Z0-9]+)"? The coding cookie must be in a comment, only the first occurrence in the line must be taken into account (there is a bug in CPython here), the encoding name must be ASCII, and there must not be any Python statement on the line that contains the encoding declaration. [1] [1] https://bugs.python.org/issue18873 Also, I think there should be one 'official' function somewhere in the stdlib to get and return the encoding declaration. The patch for the issue above had to make the same change in four places other than tests, a violent violation of DRY. -- Terry Jan Reedy ___ Python-Dev mailing list Python-Dev@python.org https://mail.python.org/mailman/listinfo/python-dev Unsubscribe: https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] PEP 515: Underscores in Numeric Literals (revision 3)
On 19 March 2016 at 16:44, Georg Brandl wrote: > On the other hand, assuming decimal literals are introduced at some > point, they would almost definitely need to support underscores. > Of course, the decision whether to modify the Decimal constructor > can be postponed until that time. The idea of Decimal literals is complicated significantly by their current context dependent behaviour (especially when it comes to rounding), so I'd suggest leaving them alone in the context of this PEP. Cheers, Nick. -- Nick Coghlan | ncogh...@gmail.com | Brisbane, Australia ___ Python-Dev mailing list Python-Dev@python.org https://mail.python.org/mailman/listinfo/python-dev Unsubscribe: https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
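The context dependence Nick mentions is easy to demonstrate: the Decimal constructor ignores the active context, while arithmetic honours it.

```python
from decimal import Decimal, localcontext

with localcontext() as ctx:
    ctx.prec = 3
    constructed = Decimal("1.23456")  # constructor: exact, context-free
    computed = constructed + 0        # arithmetic: rounds to 3 digits

print(constructed)  # 1.23456
print(computed)     # 1.23
```

A hypothetical Decimal literal would have to pick one of these two behaviours, which is why the PEP sidesteps the question entirely.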
Re: [Python-Dev] What does a double coding cookie mean?
On Thu, Mar 17, 2016 at 5:04 AM, Serhiy Storchaka wrote: >> Should we recommend that everyone use tokenize.detect_encoding()? > > Likely. However the interface of tokenize.detect_encoding() is not very > simple. I just found that out yesterday. You have to give it a readline() function, which is cumbersome if all you have is a (byte) string and you don't want to split it on lines just yet. And the readline() function raises SyntaxError when the encoding isn't right. I wish there were a lower-level helper that just took a line and told you what the encoding in it was, if any. Then the rest of the logic can be handled by the caller (including the logic of trying up to two lines). -- --Guido van Rossum (python.org/~guido) ___ Python-Dev mailing list Python-Dev@python.org https://mail.python.org/mailman/listinfo/python-dev Unsubscribe: https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
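The lower-level helper Guido wishes for might look like this (hypothetical: coding_in_line is not a stdlib function, and the pattern is a simplified form of the PEP 263 expression):

```python
import re

_COOKIE = re.compile(r"^[ \t\f]*#.*?coding[:=][ \t]*([-_.a-zA-Z0-9]+)")

def coding_in_line(line):
    """Return the coding cookie found in a single source line, or None.

    Takes a str, never raises; trying up to two lines and validating
    the codec name are left to the caller.
    """
    m = _COOKIE.match(line)
    return m.group(1) if m else None

print(coding_in_line("# -*- coding: utf-8 -*-"))  # utf-8
print(coding_in_line("x = 1"))                    # None
```

Because it neither reads lines nor looks the codec up, it cannot raise SyntaxError, which addresses both complaints about the readline()-based interface.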
[Python-Dev] Interested in the GSoC idea 'Roundup - GitHub integration'
Hello everyone, I am Wasim Thabraze, a Computer Science undergraduate. I have thoroughly gone through the Core Python GSoC ideas page and have narrowed my choices down to the project 'Improving Roundup GitHub integration'. I have experience building things that connect to GitHub; Openflock (http://www.openflock.co) is one such product that I developed. Can someone please help me learn more about the project? I wanted to know how and where GitHub should be integrated into https://bugs.python.org. I hope I can code with Core Python this summer. Regards, Wasim www.thabraze.me github.com/waseem18
Re: [Python-Dev] bitfields - short - and xlc compiler
On 2016-03-18 00:56, Michael Felt wrote: Update: Is this going to be impossible? From what I've been able to find out, the C89 standard limits bitfields to int, signed int and unsigned int, and the C99 standard added _Bool, although some compilers allow other integer types too. It looks like your compiler doesn't allow those additional types. test_short fails on AIX when using xlC in any case. How terrible is this?

==
FAIL: test_shorts (ctypes.test.test_bitfields.C_Test)
--
Traceback (most recent call last):
  File "/data/prj/aixtools/python/python-2.7.11.2/Lib/ctypes/test/test_bitfields.py", line 48, in test_shorts
    self.assertEqual((name, i, getattr(b, name)), (name, i, func(byref(b), name)))
AssertionError: Tuples differ: ('M', 1, -1) != ('M', 1, 1)

First differing element 2:
-1
1

- ('M', 1, -1)
?          -
+ ('M', 1, 1)
--
Ran 440 tests in 1.538s

FAILED (failures=1, skipped=91)
Traceback (most recent call last):
  File "./Lib/test/test_ctypes.py", line 15, in <module>
    test_main()
  File "./Lib/test/test_ctypes.py", line 12, in test_main
    run_unittest(unittest.TestSuite(suites))
  File "/data/prj/aixtools/python/python-2.7.11.2/Lib/test/test_support.py", line 1428, in run_unittest
    _run_suite(suite)
  File "/data/prj/aixtools/python/python-2.7.11.2/Lib/test/test_support.py", line 1411, in _run_suite
    raise TestFailed(err)
test.test_support.TestFailed: Traceback (most recent call last):
  File "/data/prj/aixtools/python/python-2.7.11.2/Lib/ctypes/test/test_bitfields.py", line 48, in test_shorts
    self.assertEqual((name, i, getattr(b, name)), (name, i, func(byref(b), name)))
AssertionError: Tuples differ: ('M', 1, -1) != ('M', 1, 1)

First differing element 2:
-1
1

- ('M', 1, -1)
?          -
+ ('M', 1, 1)

On 17-Mar-16 23:31, Michael Felt wrote: a) hope this is not something you expect to be on-list; if so, my apologies! Getting this message (here using c99 as the compiler name, but the same issue occurs with xlc as the compiler name):

c99 -qarch=pwr4 -qbitfields=signed -DNDEBUG -O -I. 
-IInclude -I./Include -I/data/prj/aixtools/python/python-2.7.11.2/Include -I/data/prj/aixtools/python/python-2.7.11.2 -c /data/prj/aixtools/python/python-2.7.11.2/Modules/_ctypes/_ctypes_test.c -o build/temp.aix-5.3-2.7/data/prj/aixtools/python/python-2.7.11.2/Modules/_ctypes/_ctypes_test.o

"/data/prj/aixtools/python/python-2.7.11.2/Modules/_ctypes/_ctypes_test.c", line 387.5: 1506-009 (S) Bit field M must be of type signed int, unsigned int or int.
"/data/prj/aixtools/python/python-2.7.11.2/Modules/_ctypes/_ctypes_test.c", line 387.5: 1506-009 (S) Bit field N must be of type signed int, unsigned int or int.
"/data/prj/aixtools/python/python-2.7.11.2/Modules/_ctypes/_ctypes_test.c", line 387.5: 1506-009 (S) Bit field O must be of type signed int, unsigned int or int.
"/data/prj/aixtools/python/python-2.7.11.2/Modules/_ctypes/_ctypes_test.c", line 387.5: 1506-009 (S) Bit field P must be of type signed int, unsigned int or int.
"/data/prj/aixtools/python/python-2.7.11.2/Modules/_ctypes/_ctypes_test.c", line 387.5: 1506-009 (S) Bit field Q must be of type signed int, unsigned int or int.
"/data/prj/aixtools/python/python-2.7.11.2/Modules/_ctypes/_ctypes_test.c", line 387.5: 1506-009 (S) Bit field R must be of type signed int, unsigned int or int.
"/data/prj/aixtools/python/python-2.7.11.2/Modules/_ctypes/_ctypes_test.c", line 387.5: 1506-009 (S) Bit field S must be of type signed int, unsigned int or int.

for:

struct BITS {
    int A: 1, B: 2, C: 3, D: 4, E: 5, F: 6, G: 7, H: 8, I: 9;
    short M: 1, N: 2, O: 3, P: 4, Q: 5, R: 6, S: 7;
};

In short, xlC v11 does not like short (xlC v7 might have accepted it, but "32-bit machines were common then"). I am guessing that 16-bit is not well liked on 64-bit hardware now. 
reference for xlC v7, where short was (apparently) still accepted: http://www.serc.iisc.ernet.in/facilities/ComputingFacilities/systems/cluster/vac-7.0/html/language/ref/clrc03defbitf.htm I take this to be from the xlC v7 documentation based on the URL, not because I know it personally. So, my question: if "short" is unacceptable for POWER, or maybe only for xlC (not tried with gcc), how terrible is this, and is it possible to adjust the test so that the test is accurate? I am going to modify the test code so it is:

struct BITS {
    signed int A: 1, B: 2, C: 3, D: 4, E: 5, F: 6, G: 7, H: 8, I: 9;
    unsigned int M: 1, N: 2, O: 3, P: 4, Q: 5, R: 6, S: 7;
};

and see what happens. BUT: what impact does this have on Python, assuming that "short" bitfields are not supported? p.s. Not submitting this as a bug (now), as it may just be that "you" consider it a bug in xlC to not support (signed) short bit fields. p.p.s. Note: xlc, by default, considers bitfields to be unsigned. I was trying to
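For what it's worth, the signedness question at the heart of the failing assertion can be reproduced from pure Python via ctypes on a mainstream platform; this sketch only illustrates why the test expects -1, and says nothing about xlC's behavior:

```python
import ctypes

class Bits(ctypes.Structure):
    # A 1-bit *signed* field can hold only 0 or -1: storing 1 sets the sign
    # bit, which reads back as -1.  That is the value test_shorts expects,
    # and exactly what an unsigned-by-default compiler would break.
    _fields_ = [
        ("signed_one", ctypes.c_int, 1),
        ("unsigned_one", ctypes.c_uint, 1),
    ]

b = Bits()
b.signed_one = 1
b.unsigned_one = 1
print(b.signed_one)    # -1 (sign-extended)
print(b.unsigned_one)  # 1
```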
Re: [Python-Dev] What does a double coding cookie mean?
On Thu, 17 Mar 2016 at 07:56 Guido van Rossum wrote: > On Thu, Mar 17, 2016 at 5:04 AM, Serhiy Storchaka > wrote: > >> Should we recommend that everyone use tokenize.detect_encoding()? > > > > Likely. However the interface of tokenize.detect_encoding() is not very > > simple. > > I just found that out yesterday. You have to give it a readline() > function, which is cumbersome if all you have is a (byte) string and > you don't want to split it on lines just yet. And the readline() > function raises SyntaxError when the encoding isn't right. I wish > there were a lower-level helper that just took a line and told you > what the encoding in it was, if any. Then the rest of the logic can be > handled by the caller (including the logic of trying up to two lines). > Since this is for mypy my guess is you only want to know the encoding, but if you're simply trying to decode bytes of syntax then importlib.util.decode_source() will handle that for you.
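For reference, a minimal use of importlib.util.decode_source(): it takes the raw bytes directly (no readline shim), honors the coding cookie or BOM, and returns decoded text with newlines normalized to "\n":

```python
import importlib.util

# A latin-1 source file: the \xe9 byte is "é" in that encoding.
source_bytes = b'# -*- coding: latin-1 -*-\nname = "caf\xe9"\n'

text = importlib.util.decode_source(source_bytes)
print(text)  # decoded text, cookie respected, universal newlines applied
```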
Re: [Python-Dev] What does a double coding cookie mean?
On 17.03.16 21:11, Guido van Rossum wrote: This will raise SyntaxError if the encoding is unknown. That needs to be caught in mypy's case and then it needs to get the line number from the exception. Good point. The "lineno" and "offset" attributes of SyntaxError are set to None by tokenize.detect_encoding() and to 0 by the CPython interpreter. They should be set to useful values.
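A quick illustration of the behavior being discussed — detect_encoding() raising SyntaxError for an unknown cookie, with no useful position information attached (the 'no-such-codec' name is invented for the demonstration):

```python
import io
import tokenize

bogus = b'# -*- coding: no-such-codec -*-\npass\n'

err = None
try:
    tokenize.detect_encoding(io.BytesIO(bogus).readline)
except SyntaxError as exc:
    err = exc

# As noted above, the exception carries no position information.
print(err)                     # e.g. "unknown encoding: no-such-codec"
print(err.lineno, err.offset)  # the attributes Serhiy says should be useful
```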
Re: [Python-Dev] PEP 515: Underscores in Numeric Literals (revision 3)
Where did this PEP leave off? Anything blocking its acceptance?

On Sat, 13 Feb 2016 at 00:49 Georg Brandl wrote:
> Hi all,
>
> after talking to Guido and Serhiy we present the next revision
> of this PEP. It is a compromise that we are all happy with,
> and a relatively restricted rule that makes additions to PEP 8
> basically unnecessary.
>
> I think the discussion has shown that supporting underscores in
> the from-string constructors is valuable, therefore this is now
> added to the specification section.
>
> The remaining open question is about the reverse direction: do
> we want a string formatting modifier that adds underscores as
> thousands separators?
>
> cheers,
> Georg
>
> -
>
> PEP: 515
> Title: Underscores in Numeric Literals
> Version: $Revision$
> Last-Modified: $Date$
> Author: Georg Brandl, Serhiy Storchaka
> Status: Draft
> Type: Standards Track
> Content-Type: text/x-rst
> Created: 10-Feb-2016
> Python-Version: 3.6
> Post-History: 10-Feb-2016, 11-Feb-2016
>
> Abstract and Rationale
> ======================
>
> This PEP proposes to extend Python's syntax and number-from-string
> constructors so that underscores can be used as visual separators for
> digit grouping purposes in integral, floating-point and complex number
> literals.
>
> This is a common feature of other modern languages, and can aid
> readability of long literals, or literals whose value should clearly
> separate into parts, such as bytes or words in hexadecimal notation.
>
> Examples::
>
>     # grouping decimal numbers by thousands
>     amount = 10_000_000.0
>
>     # grouping hexadecimal addresses by words
>     addr = 0xDEAD_BEEF
>
>     # grouping bits into nibbles in a binary literal
>     flags = 0b_0011__0100_1110
>
>     # same, for string conversions
>     flags = int('0b__', 2)
>
>
> Specification
> =============
>
> The current proposal is to allow one underscore between digits, and
> after base specifiers in numeric literals. The underscores have no
> semantic meaning, and literals are parsed as if the underscores were
> absent.
>
> Literal Grammar
> ---------------
>
> The production list for integer literals would therefore look like
> this::
>
>     integer: decinteger | bininteger | octinteger | hexinteger
>     decinteger: nonzerodigit (["_"] digit)* | "0" (["_"] "0")*
>     bininteger: "0" ("b" | "B") (["_"] bindigit)+
>     octinteger: "0" ("o" | "O") (["_"] octdigit)+
>     hexinteger: "0" ("x" | "X") (["_"] hexdigit)+
>     nonzerodigit: "1"..."9"
>     digit: "0"..."9"
>     bindigit: "0" | "1"
>     octdigit: "0"..."7"
>     hexdigit: digit | "a"..."f" | "A"..."F"
>
> For floating-point and complex literals::
>
>     floatnumber: pointfloat | exponentfloat
>     pointfloat: [digitpart] fraction | digitpart "."
>     exponentfloat: (digitpart | pointfloat) exponent
>     digitpart: digit (["_"] digit)*
>     fraction: "." digitpart
>     exponent: ("e" | "E") ["+" | "-"] digitpart
>     imagnumber: (floatnumber | digitpart) ("j" | "J")
>
> Constructors
> ------------
>
> Following the same rules for placement, underscores will be allowed in
> the following constructors:
>
> - ``int()`` (with any base)
> - ``float()``
> - ``complex()``
> - ``Decimal()``
>
>
> Prior Art
> =========
>
> Those languages that do allow underscore grouping implement a large
> variety of rules for allowed placement of underscores. In cases where
> the language spec contradicts the actual behavior, the actual behavior
> is listed. ("single" or "multiple" refer to allowing runs of
> consecutive underscores.)
>
> * Ada: single, only between digits [8]_
> * C# (open proposal for 7.0): multiple, only between digits [6]_
> * C++14: single, between digits (different separator chosen) [1]_
> * D: multiple, anywhere, including trailing [2]_
> * Java: multiple, only between digits [7]_
> * Julia: single, only between digits (but not in float exponent parts) [9]_
> * Perl 5: multiple, basically anywhere, although docs say it's
>   restricted to one underscore between digits [3]_
> * Ruby: single, only between digits (although docs say "anywhere") [10]_
> * Rust: multiple, anywhere, except for between exponent "e" and digits [4]_
> * Swift: multiple, between digits and trailing (although textual
>   description says only "between digits") [5]_
>
>
> Alternative Syntax
> ==================
>
> Underscore Placement Rules
> --------------------------
>
> Instead of the relatively strict rule specified above, the use of
> underscores could be limited. As we have seen from other languages, common
> rules include:
>
> * Only one consecutive underscore allowed, and only between digits.
> * Multiple consecutive underscores allowed, but only between digits.
> * Multiple consecutive underscores allowed, in most positions except
>   for the start of the literal, or special positions like after a
>   decimal point.
>
> The syntax in this PEP has ultimately
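For readers following along, the proposed rules are easy to try on Python 3.6+, where the PEP was eventually implemented. Note that the binary examples in the quoted text lost digits in transit, so the values below are merely illustrative:

```python
# One underscore is allowed between digits and after a base specifier;
# underscores carry no meaning, so equal values compare equal.
amount = 10_000_000.0
addr = 0xDEAD_BEEF
flags = 0b_0011_1111_0100_1110

print(amount == 10000000.0)    # True
print(addr == 0xDEADBEEF)      # True
print(flags == 0x3F4E)         # True

# The from-string constructors follow the same placement rules:
print(int('1_000_000'))        # 1000000
print(int('0b_1111_0000', 2))  # 240
print(float('1_000.5'))        # 1000.5
```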
Re: [Python-Dev] What does a double coding cookie mean?
On 3/16/2016 5:29 PM, Guido van Rossum wrote: I've updated the PEP. Please review. I decided not to update the Unicode howto (the thing is too obscure). Serhiy, you're probably in a better position to fix the code looking for cookies to pick the first one if there are two on the same line (or do whatever you think should be done there). Should we recommend that everyone use tokenize.detect_encoding()? On Wed, Mar 16, 2016 at 5:05 PM, Guido van Rossum wrote: On Wed, Mar 16, 2016 at 12:59 AM, M.-A. Lemburg wrote: The only reason to read up to two lines was to address the use of the shebang on Unix, not to be able to define two competing source code encodings :-) I know. I was just surprised that the PEP was sufficiently vague about it that when I found that mypy picked the second if there were two, I couldn't prove to myself that it was violating the PEP. I'd rather clarify the PEP than rely on the reasoning presented earlier here. Oh sure. Updating the PEP is the best way forward. But the reasoning, although from somewhat vague specifications, seems sound enough to declare that it meant "find the first cookie in the first two lines". Which is what you've said in the update, although not quite that tersely. It now leaves no room for ambiguous interpretations. I don't like erroring out when there are two different cookies on two lines; I feel that the spirit of the PEP is to read up to two lines until a cookie is found, whichever comes first. The only reason for an error would be to alert people that had depended on the bugs, or misinterpretations. Personally, I think if they haven't converted to UTF-8 by now, they've got bigger problems than this change. I will update the regex in the PEP too (or change the wording to avoid "match"). I'm not sure what to do if there are two cookies on one line. If CPython currently picks the latter we may want to preserve that behavior. Should we recommend that everyone use tokenize.detect_encoding()? 
-- --Guido van Rossum (python.org/~guido)
Re: [Python-Dev] What does a double coding cookie mean?
I've updated the PEP. Please review. I decided not to update the Unicode howto (the thing is too obscure). Serhiy, you're probably in a better position to fix the code looking for cookies to pick the first one if there are two on the same line (or do whatever you think should be done there). Should we recommend that everyone use tokenize.detect_encoding()? On Wed, Mar 16, 2016 at 5:05 PM, Guido van Rossum wrote: > On Wed, Mar 16, 2016 at 12:59 AM, M.-A. Lemburg wrote: >> The only reason to read up to two lines was to address the use of >> the shebang on Unix, not to be able to define two competing >> source code encodings :-) > > I know. I was just surprised that the PEP was sufficiently vague about > it that when I found that mypy picked the second if there were two, I > couldn't prove to myself that it was violating the PEP. I'd rather > clarify the PEP than rely on the reasoning presented earlier here. > > I don't like erroring out when there are two different cookies on two > lines; I feel that the spirit of the PEP is to read up to two lines > until a cookie is found, whichever comes first. > > I will update the regex in the PEP too (or change the wording to avoid > "match"). > > I'm not sure what to do if there are two cookies on one line. If > CPython currently picks the latter we may want to preserve that > behavior. > > Should we recommend that everyone use tokenize.detect_encoding()? > > -- > --Guido van Rossum (python.org/~guido) -- --Guido van Rossum (python.org/~guido)
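One possible shape for the lower-level helper wished for earlier in this thread — all names here are invented for illustration, and the sketch deliberately simplifies PEP 263 (it ignores the rule that a second-line cookie only counts when the first line is a blank or comment line):

```python
import re

# Hypothetical helper: examine ONE line and report the declared encoding,
# or None.  The two-line policy then lives entirely in the caller.
_COOKIE_RE = re.compile(rb'^[ \t\v]*#.*?coding[:=][ \t]*([-_.a-zA-Z0-9]+)')

def encoding_from_line(line):
    m = _COOKIE_RE.match(line)
    return m.group(1).decode('ascii') if m else None

def encoding_from_source(data, default='utf-8'):
    # "Find the first cookie in the first two lines": the first line that
    # declares an encoding wins, whichever of the two it is.
    for line in data.splitlines(True)[:2]:
        enc = encoding_from_line(line)
        if enc is not None:
            return enc
    return default

src = b'#!/usr/bin/env python\n# coding: latin-1\nx = 1\n'
print(encoding_from_source(src))      # latin-1
print(encoding_from_source(b'x=1\n')) # utf-8 (the assumed default)
```

Unlike tokenize.detect_encoding(), this does no name normalization and raises nothing; the caller decides how to handle unknown encodings.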
Re: [Python-Dev] What does a double coding cookie mean?
Guido van Rossum writes: > > Should we recommend that everyone use tokenize.detect_encoding()? +1
Re: [Python-Dev] What does a double coding cookie mean?
On 17.03.16 02:29, Guido van Rossum wrote: I've updated the PEP. Please review. I decided not to update the Unicode howto (the thing is too obscure). Serhiy, you're probably in a better position to fix the code looking for cookies to pick the first one if there are two on the same line (or do whatever you think should be done there). http://bugs.python.org/issue26581 Should we recommend that everyone use tokenize.detect_encoding()? Likely. However the interface of tokenize.detect_encoding() is not very simple.
Re: [Python-Dev] What does a double coding cookie mean?
On 17.03.16 19:23, M.-A. Lemburg wrote: On 17.03.2016 15:02, Serhiy Storchaka wrote: On 17.03.16 15:14, M.-A. Lemburg wrote: On 17.03.2016 01:29, Guido van Rossum wrote: Should we recommend that everyone use tokenize.detect_encoding()? I'd prefer a separate utility for this somewhere, since tokenize.detect_encoding() is not available in Python 2. I've attached an example implementation with tests, which works in Python 2.7 and 3. Sorry, but this code doesn't match the behaviour of the Python interpreter, nor other tools. I suggest backporting tokenize.detect_encoding() (but be aware that the default encoding in Python 2 is ASCII, not UTF-8). Yes, I got the default for Python 3 wrong. I'll fix that. Thanks for the note. What other aspects are different from what Python implements? 1. If there is a BOM and a coding cookie, the source encoding is "utf-8-sig". 2. If there is a BOM and the coding cookie is not 'utf-8', this is an error. 3. If the first line is not a blank or comment line, the coding cookie is not searched for in the second line. 4. The encoding name should be canonicalized: "UTF8", "utf8", "utf_8" and "utf-8" are the same encoding (and all are changed to "utf-8-sig" with a BOM). 5. There is no 400-byte limit. Actually there is a bug with handling long lines in the current code, but even with this bug the limit is larger. 6. I made a mistake in the regular expression: I missed the underscore. tokenize.detect_encoding() is the closest imitation of the behavior of the Python interpreter.
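Points 1 and 2 of Serhiy's list can be demonstrated directly with tokenize.detect_encoding() (Python 3):

```python
import codecs
import io
import tokenize

# Point 1: BOM plus a matching utf-8 cookie is reported as "utf-8-sig".
src = codecs.BOM_UTF8 + b'# -*- coding: utf-8 -*-\npass\n'
enc, _ = tokenize.detect_encoding(io.BytesIO(src).readline)
print(enc)  # utf-8-sig

# Point 2: BOM plus a conflicting cookie raises SyntaxError.
bad = codecs.BOM_UTF8 + b'# -*- coding: latin-1 -*-\npass\n'
bom_conflict = None
try:
    tokenize.detect_encoding(io.BytesIO(bad).readline)
except SyntaxError as exc:
    bom_conflict = exc
print(bom_conflict is not None)  # True
```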
Re: [Python-Dev] PEP 515: Underscores in Numeric Literals (revision 3)
I'll update the text so that the format() gets promoted from optional to specified. There was one point of discussion in the tracker issue that should be resolved before acceptance: the Decimal constructor is listed as getting updated to allow underscores, but its syntax is specified in the Decimal spec: http://speleotrove.com/decimal/daconvs.html Accepting underscores would be an extension to the spec, which may not be what we want to do, as otherwise Decimal follows that spec closely. On the other hand, assuming decimal literals are introduced at some point, they would almost definitely need to support underscores. Of course, the decision whether to modify the Decimal constructor can be postponed until that time. cheers, Georg On 03/19/2016 01:02 AM, Guido van Rossum wrote: > I'm happy to accept this PEP as it stands, assuming the authors are > ready for this news. I recommend also implementing the option from > footnote [11] (extend the number-to-string formatting language to > allow ``_`` as a thousands separator). > > On Thu, Mar 17, 2016 at 11:19 AM, Brett Cannon wrote: >> Where did this PEP leave off? Anything blocking its acceptance? >> >> On Sat, 13 Feb 2016 at 00:49 Georg Brandl wrote: >>> >>> Hi all, >>> >>> after talking to Guido and Serhiy we present the next revision >>> of this PEP. It is a compromise that we are all happy with, >>> and a relatively restricted rule that makes additions to PEP 8 >>> basically unnecessary. >>> >>> I think the discussion has shown that supporting underscores in >>> the from-string constructors is valuable, therefore this is now >>> added to the specification section. >>> >>> The remaining open question is about the reverse direction: do >>> we want a string formatting modifier that adds underscores as >>> thousands separators? 
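As a historical footnote to the Decimal question raised above: the constructor did end up accepting underscores when PEP 515 shipped in Python 3.6, following the same placement rules as the literals:

```python
from decimal import Decimal

# Underscores in the string constructor, stripped before parsing.
print(Decimal('1_000_000.000_1'))               # 1000000.0001
print(Decimal('0.000_1') == Decimal('0.0001'))  # True
```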