[issue40791] hmac.compare_digest could try harder to be constant-time.
Change by Devin Jeanpierre:

--
keywords: +patch
pull_requests: +19700
stage: -> patch review
pull_request: https://github.com/python/cpython/pull/20444

___ Python tracker <https://bugs.python.org/issue40791> ___
___ Python-bugs-list mailing list
Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue40791] hmac.compare_digest could try harder to be constant-time.
New submission from Devin Jeanpierre:

`hmac.compare_digest` (via `_tscmp`) does not mark the accumulator variable `result` as volatile, which means that the compiler is allowed to short-circuit the comparison loop as long as it still reads from both strings. In particular, when `result` is non-volatile, the compiler is allowed to change the loop from this:

```c
for (i=0; i < length; i++) {
    result |= *left++ ^ *right++;
}
return (result == 0);
```

into (the moral equivalent of) this:

```c
for (i=0; i < length; i++) {
    result |= *left++ ^ *right++;
    if (result) {
        for (; ++i < length;) {
            *left++;
            *right++;
        }
        return 1;
    }
}
return (result == 0);
```

(Code not tested.)

This might not seem like much, but it cuts out almost all of the data dependencies between `result`, `left`, and `right`, which in theory would free the CPU to race ahead using out-of-order execution -- it could execute code that depends on the result of `_tscmp`, even while `_tscmp` is still performing the volatile reads. (I have not actually benchmarked this. :)) In other words, this weird short-circuiting could still actually improve performance. That, in turn, means that it would break constant-time guarantees. (This is different from saying that it _would_ increase performance, but marking it volatile removes the worry.)

(Prior art/discussion: https://github.com/google/tink/commit/335291c42eecf29fca3d85fed6179d11287d253e )

I propose two changes, one trivial, and one that's more invasive:

1) Make `result` a `volatile unsigned char` instead of `unsigned char`.
2) When SSL is available, instead use `CRYPTO_memcmp` from OpenSSL/BoringSSL. We are, in effect, "rolling our own crypto". The SSL libraries are more strictly audited for timing issues, down to actually checking the generated machine code. As tools improve, those libraries will grow to use those tools. If we use their functions, we get the benefit of those audits and improvements.
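For reference, a minimal sketch of how the Python-level API under discussion is used (the `verify_mac` helper name is mine, not part of the proposal):

```python
import hashlib
import hmac


def verify_mac(key: bytes, message: bytes, received_mac: bytes) -> bool:
    expected = hmac.new(key, message, hashlib.sha256).digest()
    # compare_digest's running time is meant not to depend on where the
    # inputs first differ -- the property the volatile accumulator is
    # supposed to protect at the C level.
    return hmac.compare_digest(expected, received_mac)
```

The constant-time concern only matters because callers like this compare a secret-derived digest against attacker-controlled input.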
--
components: Library (Lib)
messages: 370053
nosy: Devin Jeanpierre
priority: normal
severity: normal
status: open
title: hmac.compare_digest could try harder to be constant-time.
versions: Python 3.10, Python 3.5, Python 3.6, Python 3.7, Python 3.8, Python 3.9
Re: Users banned
On Sun, Jul 15, 2018 at 5:09 PM Jim Lee wrote:
> That is, of course, the decision of the moderators - but I happen to
> agree with both Christian and Ethan. Banning for the simple reason of a
> dissenting opinion is censorship, pure and simple. While Bart may have
> been prolific in his arguments, he never spoke in a toxic or
> condescending manner, or broke any of the rules of conduct. I cannot
> say the same for several who engaged with him.

+1000

It seems to me like the python-list moderators are rewarding people for being bullies, by banning the people they were bullying. The behavior on the list the past few days has been unforgivably toxic, and that has nothing to do with the behavior of Bart et al.

-- Devin
--
https://mail.python.org/mailman/listinfo/python-list
Kindness
On Fri, Jul 13, 2018 at 10:49 AM Mark Lawrence wrote:
> On 13/07/18 16:16, Bart wrote:
> > On 13/07/2018 13:33, Steven D'Aprano wrote:
> >> On Fri, 13 Jul 2018 11:37:41 +0100, Bart wrote:
> >>> (** Something so radical I've been using them elsewhere since forever.)
> >> And you just can't resist making it about you and your language.
> > And you can't resist having a personal dig.
> You are a troll and should have been banned from this list years ago.

This exchange is unacceptable. I don't know who Bart is or what their language is, but they left a basically OK (if kinda edgy) comment, and this was immediately escalated into a series of personal attacks. Not everyone is as familiar with the surrounding context. To me, it looks like you are all bullying someone. Even if you think it is justified, consider how it looks to others. I am afraid of ever getting on your "bad side", and I bet the same is true of other non-Bart people, too.

-- Devin
Re: Thread-safe way to add a key to a dict only if it isn't already there?
On Sat, Jul 7, 2018 at 6:49 AM Marko Rauhamaa wrote:
> Is that guaranteed to be thread-safe? The documentation
> (https://docs.python.org/3/library/stdtypes.html#dict.setdefault) makes no
> such promise.

It's guaranteed to be thread-safe because all of Python's core containers are thread-safe (in so far as they document behaviors/invariants, which implicitly also hold in multithreaded code -- Python does not take the approach other languages do of "thread-compatible" containers that have undefined behavior if mutated from multiple threads simultaneously). It isn't guaranteed to be _atomic_ by the documentation, but I bet no Python implementation would make dict.setdefault non-atomic.

There's no good description of the threading rules for Python data structures anywhere. ISTR there was a proposal to give Python some defined rules around thread-safety a couple of years ago (to help with things like GIL-less Python projects), but I guess nothing ever came of it.

-- Devin
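The pattern being asked about can be sketched like this (variable and helper names are mine); every thread racing on the same key gets back the single list object that won the race:

```python
import threading

shared = {}


def get_or_create(key):
    # setdefault performs the "insert if absent" check and the insertion
    # as one dict operation, so two threads racing on the same key both
    # end up seeing the same stored list object.
    return shared.setdefault(key, [])


threads = [threading.Thread(target=get_or_create, args=("k",))
           for _ in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

As the message says, the documentation does not promise atomicity here; this sketch reflects CPython's observed behavior, not a documented guarantee.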
Re: Why is the use of an undefined name not a syntax error?
On Sun, Apr 1, 2018 at 2:38 PM, Chris Angelico wrote:
> On Mon, Apr 2, 2018 at 7:24 AM, David Foster wrote:
>> My understanding is that the Python interpreter already has enough
>> information when bytecode-compiling a .py file to determine which names
>> correspond to local variables in functions. That suggests it has enough
>> information to identify all valid names in a .py file and in particular to
>> identify which names are not valid.
>
> It's not as simple as you think. Here's a demo. Using all of the
> information available to the compiler, tell me which of these names
> are valid and which are not:

This feels like browbeating to me. Just because a programmer finds it hard to figure out manually doesn't mean a computer can't do it automatically. And anyway, isn't the complexity of reviewing such code an argument in favor of automatic detection, rather than against?

For example, whether or not "except Exception:" raises an error depends on what kind of scope we are in and what variable declarations exist in this scope (in a global or class scope, all lookups are dynamic and go up to the builtins, whereas in a function body this would have resulted in an unbound local exception because it uses fast local lookup). What a complex thing. But easy for a computer to detect, actually -- it's right in the syntax tree (and bytecode) what kind of lookup it is, and what paths lead to defining it, and a fairly trivial control flow analysis would discover if it will always, never, or sometimes raise a NameError -- in the absence of "extreme dynamism" like mutating the builtins and so on. :(

Unfortunately, the extreme dynamism can't really be eliminated as a possibility, and there's no rule that says "just because this will always raise an exception, we can fail at compile time instead". Maybe a particular UnboundLocalError was on purpose, after all. Python doesn't know.
So probably this can't ever sensibly be a compile error, even if it's a fantastically useful lint warning.

-- Devin
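The scope-dependent lookup behavior described above can be demonstrated directly (function names here are mine, illustrative only):

```python
def f():
    try:
        undefined_name  # no assignment in f, so this is a global lookup;
        # it raises NameError at call time, not at compile time
    except NameError:
        return "NameError"


def g():
    try:
        x  # x is assigned later in g, so this is a fast-local lookup
    except UnboundLocalError:
        return "UnboundLocalError"
    x = 1  # the mere presence of this assignment makes x local to g
```

Same-looking expression, two different failure modes, purely because of what the compiler recorded about each scope.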
Re: Why is the use of an undefined name not a syntax error?
> But if it is cheap to detect a wide variety of name errors at compile time,
> is there any particular reason it is not done?

From my perspective, it is done, but by tools that give better output than Python's parser. :) Linters (like pylint) are better than syntax errors here, because they collect all of the undefined variables, not just the first one. Maybe Python could/should be changed to give more detailed errors of this kind as well. E.g. Clang parse errors for C and C++ are much more thorough and will report all of your typos, not just the first one.

> P.S. Here are some uncommon language features that interfere with identifying
> all valid names. In their absence, one might expect an invalid name to be a
> syntax error:

Also, if statements, depending on what you mean by "invalid":

```python
def foo(x):
    if x:
        y = 3
    return y  # will raise UnboundLocalError if not x
```

-- Devin
Re: binary decision diagrams
On Mon, Dec 18, 2017 at 5:00 AM, Wild, Marcel, Dr wrote:
> Hello everybody:
> I really don't know anything about Python (I'm using Mathematica) but with
> the help of others learned that
>
> g=expr2bdd(f)
>
> makes the BDD (=binary decision diagram) g of a Boolean function f. But
> what is the easiest (fool-proof) way to print out a diagram of g?

Python doesn't come with support for (RO)BDDs built in. You're probably thinking of this library, which includes visualization instructions: http://pyeda.readthedocs.io/en/latest/bdd.html

-- Devin
[issue29505] Submit the re, json, & csv modules to oss-fuzz testing
Devin Jeanpierre added the comment:

Oops, so it is. I can't read, apparently. I'll spend my time on making more fuzz tests in the meantime.
[issue29505] Submit the re, json, & csv modules to oss-fuzz testing
Devin Jeanpierre added the comment:

kcc strongly disagrees though. Copying latest comment:

"""
fwiw - I object to us running any of this internally at Google. We need to be part of the main oss-fuzz project pulling from upstream revisions. Doing this testing within our blackhole of internal stuff adds more work for us internally (read: which we're not going to do) and wouldn't provide results feedback to the upstream CPython project in a useful timely manner. We must figure out how to get this to build and run on the external oss-fuzz infrastructure
"""
[issue29505] Submit the re, json, & csv modules to oss-fuzz testing
Devin Jeanpierre added the comment:

> i'd rather make this work in oss-fuzz on cpython. can you point me to how
> oss-fuzz works and what it wants to do so i can better understand what it
> needs?

I don't have any details except for what's in the PR to oss-fuzz (https://github.com/google/oss-fuzz/pull/731). My understanding matches what you've said so far: Python is built to one directory (/out/), but then needs to be run from another directory (/out/ is renamed to /foo/bar/baz/out/). We need Python to still work.

I have no idea how to do this. The only suggestion on #python-dev IRC was to statically link a libpython.a, but this doesn't avoid needing to import libraries like "encodings" dynamically, so they still need to be locatable on disk. Is there a way to build Python so that it doesn't use absolute paths to everything, and so that the install can be moved at will? Or is there a way to tell it that it was moved at runtime? (I am unconvinced PYTHONPATH is a maintainable solution, if it works at all...)

oss-fuzz is not going to change away from its model (I asked if they could, they said no), so we're stuck with making Python compatible with it one way or another. This is why I am so drawn to running the test internally on Google's infrastructure anyway: we already _did_ all this work already, via hermetic Python. Doing it a second time, but worse, seems annoying.
[issue29505] Submit the re, json, & csv modules to oss-fuzz testing
Devin Jeanpierre added the comment:

So here's an interesting issue: oss-fuzz requires that the built location be movable. IOW, we build Python into $OUT, and then the $OUT directory gets moved somewhere else and the fuzz test gets run from there. This causes problems because Python can no longer find where the modules it needs are (encodings, for example).

First thought: wouldn't it be nice if we could make a prepackaged and hermetic executable that we can move around freely? Second thought: isn't that "Hermetic Python", as used within Google? Third thought: doesn't Google have an internal fuzz testing environment we can use, instead of oss-fuzz?

So unless someone says this is a bad idea, I'd propose we not run these in oss-fuzz and instead run them in Google proper. The alternative is if there's a way to make it easy to move Python around -- is there a way to build it s.t. the import path is relative and so on?
[issue29505] Submit the re, json, & csv modules to oss-fuzz testing
Changes by Devin Jeanpierre <jeanpierr...@gmail.com>:

--
keywords: +patch
pull_requests: +3434
stage: test needed -> patch review
[issue29505] Submit the re, json, & csv modules to oss-fuzz testing
Changes by Devin Jeanpierre <jeanpierr...@gmail.com>:

--
pull_requests: +3412
[issue29505] Submit the re, json, & csv modules to oss-fuzz testing
Devin Jeanpierre added the comment:

Huh. I would not have predicted that. https://gcc.gnu.org/onlinedocs/cpp/Defined.html

I'll send a fix.
[issue29505] Submit the re, json, & csv modules to oss-fuzz testing
Devin Jeanpierre added the comment:

I think they misspoke; it's normal with fuzzing to test against master. The current draft of the code runs this git clone before building/launching any tests:

```shell
git clone --depth 1 https://github.com/python/cpython.git cpython
```

Speaking of which, I forgot to update this bug thread with the followup PR to actually run CPython's fuzz tests (when they exist): https://github.com/google/oss-fuzz/pull/731. That's where I grabbed the git clone statement from. I think that will be merged after some version of PR 2878 lands in CPython (still in code review / broken).

For Python 2 I guess it's different, and we will test against the 2.7 branch, right?
[issue29505] Submit the re, json, & csv modules to oss-fuzz testing
Changes by Devin Jeanpierre <jeanpierr...@gmail.com>:

--
pull_requests: +2929
[issue17870] Python does not provide PyLong_FromIntMax_t() or PyLong_FromUintMax_t() function
Devin Jeanpierre added the comment:

Oh, to be clear on this last point:

> Hum, who else needs such function except of you?

Right now there is no way to convert an int that might be > 64 bits into a Python long, except really bizarre shenanigans, unless we want to rely on implementation-defined behavior. This would be fine if it were easy to implement, but it isn't -- as we've both agreed, there's no good way to do this, and it is significantly easier to add this to CPython than to implement this from outside of CPython.

And I do think there is merit in writing code that doesn't rely on implementation-defined behavior. I also think it's simpler -- imagine if we just didn't care about all these int types! Phew.

Ack that this isn't "strong rationale" per your standards, so do whatever is right for this bug.
[issue17870] Python does not provide PyLong_FromIntMax_t() or PyLong_FromUintMax_t() function
Devin Jeanpierre added the comment:

> Making two C functions public is very different from supporting intmax_t. I
> expect a change of a few lines, whereas my intmax_t patch modified a lot of
> code.

I requested either a way to create from intmax_t, or from bytes. We have two existing functions (that I didn't know about) to do the latter, so it would fix this bug report to just make those public, from my POV.
[issue17870] Python does not provide PyLong_FromIntMax_t() or PyLong_FromUintMax_t() function
Devin Jeanpierre added the comment:

> Devin, I asked you for a strong rationale to add the feature. I don't see
> such rationale, so this issue will be closed again.

I guess we have different definitions of "strong rationale". Clearer criteria would help.

>> It may be better to make _PyLong_FromByteArray() and _PyLong_AsByteArray()
>> public.
> That makes sense. I suggest to open a new issue for that.

This request was part of the original bug report, so why open a new issue?

> PyLong_FromIntMax_t(myinteger) would be great. Or maybe even better would be
> PyLong_FromBytes(, sizeof(myinteger)) ?
[issue17870] Python does not provide PyLong_FromIntMax_t() or PyLong_FromUintMax_t() function
Devin Jeanpierre added the comment:

> Write your own C extension to do that. Sorry, I don't know what is the best
> way to write such C extension.

If everyone who wants to convert intptr_t to a Python int has to write their own function, then why not just include it in the C-API? Having support for intmax_t means we never have to have this conversation ever again, because it should work for all int types.

Reopening since this use-case doesn't sound solved yet.

--
resolution: rejected ->
status: closed -> open
[issue17870] Python does not provide PyLong_FromIntMax_t() or PyLong_FromUintMax_t() function
Devin Jeanpierre added the comment:

> I wrote my first patch in 2013, but I still fail to find a very good example
> where intmax_t would be an obvious choice. So I have to agree and I will now
> close the issue.

Hold on, nobody ever answered the question in the OP. How would you convert an intptr_t (e.g. Rust's int type) to a Python int? You can't use FromVoidPtr because of signedness. You can use FromLongLong, but that's implementation-defined. If what we should be using is FromLongLong for all "really big ints", why not just rename FromLongLong to FromIntMax and call it a day? There is no standard relationship between long long and most other int types -- all we know is that it's at least 64 bits, but an int type can perfectly reasonably be e.g. 80 bits or 128 bits or similar.

I think it *is* a worthwhile goal to allow programmers to write C code that has as little implementation-defined or undefined behavior as possible. If that isn't considered a worthwhile goal, maybe we should reconsider using such a dangerous and pointy language as C. :)
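For what it's worth, the byte-based route discussed in this thread has a clean Python-level analogue, which shows why a width-independent conversion is attractive (the helper name here is mine, illustrative only):

```python
import sys


def int_from_native(raw: bytes, signed: bool = True) -> int:
    # Width-independent: works the same whether the C-side integer was
    # 64, 80, or 128 bits wide; no reliance on sizeof(long long).
    return int.from_bytes(raw, byteorder=sys.byteorder, signed=signed)
```

`int.from_bytes` is the Python-level cousin of the C-level `_PyLong_FromByteArray` mentioned earlier: hand it the raw bytes of any fixed-width integer and the width question disappears.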
[issue29505] Submit the re, json, & csv modules to oss-fuzz testing
Devin Jeanpierre added the comment:

https://github.com/google/oss-fuzz/pull/583 is the PR to oss-fuzz to add the project. I'm working on actual tests to be submitted here.
[issue29505] Submit the re, json, & csv modules to oss-fuzz testing
Devin Jeanpierre added the comment:

Aha, I found an existing issue! For adding to oss-fuzz, is there a contact email we can use that is connected to a google account? I am tempted to just put gregory.p.smith on there if not. :)

I can volunteer to fuzz some interesting subset of the stdlib. The list I've come up with (by counting uses in my code) is:

- the XML parser (which seems to be written in C)
- struct (unpack)
- the various builtins that parse strings (like int())
- hashlib
- binascii
- datetime's parsing
- json

I'd also suggest the ast module, since people do use ast.literal_eval on untrusted strings, but I probably won't do that one myself.

I wrote a fuzz test for json via upstream simplejson, but the bug on github is getting stale: https://github.com/simplejson/simplejson/issues/163 -- should I add it to CPython instead?

> We should investigate creating fuzz targets for the Python re module (_sre.c)
> at a minimum.

If we prioritize based on security risk, I'd argue that this is lower priority than things like json's speedup extension module, because people should generally not pass untrusted strings to the re module: it's very easy to DOS a service with regexes unless you're using RE2 or similar -- which is fuzzed. In contrast, json is supposed to accept untrusted input and people do that very often. (OTOH, I would be willing to bet that fuzzing re will yield more bugs than fuzzing json.)

--
nosy: +Devin Jeanpierre
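For illustration, here is a stdlib-only sketch of the *shape* of such a fuzz target for json (oss-fuzz actually drives targets with libFuzzer and coverage-guided inputs, not a plain random loop like this; function names are mine):

```python
import json
import random


def fuzz_json_once(data: bytes) -> None:
    # A fuzz target must tolerate arbitrary bytes: rejecting malformed
    # input with JSONDecodeError (a ValueError subclass) is fine; a crash
    # or hang is the kind of bug the fuzzer exists to find.
    try:
        json.loads(data.decode("utf-8", errors="surrogateescape"))
    except ValueError:
        pass


# Drive the target with a deterministic random-bytes loop.
rng = random.Random(0)
for _ in range(1000):
    size = rng.randrange(64)
    fuzz_json_once(bytes(rng.randrange(256) for _ in range(size)))
```

The real harness would feed in fuzzer-generated inputs and let any unexpected exception propagate so the crash gets recorded.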
Re: tempname.mktemp functionality deprecation
On Sat, Apr 29, 2017 at 11:45 AM, Tim Chase wrote:
> Unfortunately, tempfile.mktemp() is described as deprecated
> since 2.3 (though appears to still exist in the 3.4.2 that is the
> default Py3 on Debian Stable). While the deprecation notice says
> "In version 2.3 of Python, this module was overhauled for enhanced
> security. It now provides three new functions, NamedTemporaryFile(),
> mkstemp(), and mkdtemp(), which should eliminate all remaining need
> to use the insecure mktemp() function", as best I can tell, all of
> the other functions/objects in the tempfile module return a file
> object, not a string suitable for passing to link().
>
> So which route should I pursue?
>
> - go ahead and use tempfile.mktemp() ignoring the deprecation?
>
> - use a GUID-named temp-file instead for less chance of collision?
>
> - I happen to already have a hash of the file contents, so use
>   the .hexdigest() string as the temp-file name?
>
> - some other solution I've missed?

I vote the last one: you can read the .name attribute of the returned file(-like) object from NamedTemporaryFile to get a path to a file, which can be passed to other functions.

I guess ideally, one would use linkat instead of os.link[*], but that's platform-specific and not exposed in Python AFAIK. Maybe things would be better if all the functions that accept filenames also accepted files, and did the best job they can (if a platform supports using the fd instead, use that, otherwise use f.name).

.. *: http://stackoverflow.com/questions/17127522/create-a-hard-link-from-a-file-handle-on-unix/18644492#18644492

-- Devin
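The NamedTemporaryFile-plus-.name suggestion might look like this (a sketch; the helper name is mine, and it assumes the destination lives on the same filesystem so os.link can succeed):

```python
import os
import tempfile


def write_then_link(data: bytes, dest: str) -> None:
    # Create the temp file next to the destination: os.link requires
    # both paths to be on the same filesystem.
    with tempfile.NamedTemporaryFile(dir=os.path.dirname(dest) or ".",
                                     delete=False) as f:
        f.write(data)
        f.flush()
        os.link(f.name, dest)  # .name is the path string the OP wanted
    os.unlink(f.name)  # drop the temporary name; dest keeps the data
```

`delete=False` avoids the race where the tempfile machinery unlinks the file before the hard link lands; the explicit `os.unlink` afterwards does the cleanup.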
[issue29986] Documentation recommends raising TypeError from tp_richcompare
Devin Jeanpierre added the comment:

Yeah, I agree there might be a use-case (can't find one offhand, but in principle), but I think it's rare enough that you're more likely to be led astray from reading this note -- almost always, NotImplemented does what you want. In a way this is a special case of being able to raise an exception at all, which is mentioned earlier ("if another error occurred it must return NULL and set an exception condition.")
[issue29986] Documentation recommends raising TypeError from tp_richcompare
Devin Jeanpierre added the comment:

Sorry, forgot to link to docs because I was copy-pasting from the PR:

https://docs.python.org/2/c-api/typeobj.html#c.PyTypeObject.tp_richcompare
https://docs.python.org/3/c-api/typeobj.html#c.PyTypeObject.tp_richcompare

> Note: If you want to implement a type for which only a limited set of
> comparisons makes sense (e.g. == and !=, but not < and friends), directly
> raise TypeError in the rich comparison function.
[issue29986] Documentation recommends raising TypeError from tp_richcompare
New submission from Devin Jeanpierre:

I am not sure when TypeError is the right choice. Definitely, most of the time I've seen it done, it causes trouble, and NotImplemented usually does something better. For example, see the work in https://bugs.python.org/issue8743 to get set to interoperate correctly with other set-like classes --- a problem caused by the use of TypeError instead of returning NotImplemented (e.g. https://hg.python.org/cpython/rev/3615cdb3b86d).

This advice seems to conflict with the usual and expected behavior of objects from Python: e.g. object().__lt__(1) returns NotImplemented rather than raising TypeError, despite < not "making sense" for object. Similarly for file objects and other uncomparable classes. Even complex numbers only return NotImplemented!

```python
>>> 1j.__lt__(1j)
NotImplemented
```

If this note should be kept, this section could use a decent explanation of the difference between "undefined" (should return NotImplemented) and "nonsensical" (should apparently raise TypeError). Perhaps a reference to an example from the stdlib.

--
assignee: docs@python
components: Documentation
messages: 291144
nosy: Devin Jeanpierre, docs@python
priority: normal
pull_requests: 1167
severity: normal
status: open
title: Documentation recommends raising TypeError from tp_richcompare
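The NotImplemented behavior argued for here is the standard pattern for user-defined classes; a minimal sketch (the Version class is mine, illustrative only):

```python
import functools


@functools.total_ordering
class Version:
    def __init__(self, parts):
        self.parts = tuple(parts)

    def __eq__(self, other):
        if not isinstance(other, Version):
            # Decline, letting the other operand's reflected method have
            # a try; Python itself raises TypeError for ordering operators
            # only when both sides decline.
            return NotImplemented
        return self.parts == other.parts

    def __lt__(self, other):
        if not isinstance(other, Version):
            return NotImplemented
        return self.parts < other.parts
```

Note the asymmetry the message describes: for `==`, mutual NotImplemented falls back to identity comparison (so no TypeError), while for `<` it produces the TypeError -- without the class having to raise it directly.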
Re: Clickable hyperlinks
Sadly, no. :( Consoles (and stdout) are just text, not hypertext. The way to make an URL clickable is to use a terminal that makes URLs clickable, and print the URL:

```python
print("%s: %s" % (description, url))
```

-- Devin

On Tue, Jan 3, 2017 at 11:46 AM, Deborah Swanson wrote:
> Excel has a formula:
>
> =HYPERLINK(url,description)
>
> that will put a clickable link into a cell.
>
> Does python have an equivalent function? Probably the most common use
> for it would be output to the console, similar to a print statement, but
> clickable.
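One hedge worth adding: some terminal emulators do understand the OSC 8 hyperlink escape sequence, so text can be made explicitly clickable where supported; terminals that don't understand it just show the plain text. A sketch (the helper name is mine):

```python
def osc8_link(url: str, text: str) -> str:
    # OSC 8 ; params ; URI ST <text> OSC 8 ; ; ST -- terminals without
    # OSC 8 support ignore the escapes and display only the text.
    return f"\033]8;;{url}\033\\{text}\033]8;;\033\\"


print(osc8_link("https://example.com", "example"))
```

This is terminal-dependent behavior, not a Python feature; stdout itself remains plain text.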
Re: Looking for ideas to improve library API
Documentation is all you can do.

-- Devin

On Thu, Nov 26, 2015 at 5:35 AM, Chris Lalancette <clalance...@gmail.com> wrote:
> On Thu, Nov 26, 2015 at 7:46 AM, Devin Jeanpierre
> <jeanpierr...@gmail.com> wrote:
>> Why not take ownership of the file object, instead of requiring users
>> to manage lifetimes?
>
> Yeah, I've kind of been coming to this conclusion. So my question
> then becomes: how do I "take ownership" of it? I already keep a
> reference to it, but how would I signal to the API user that they
> should no longer use that file object (other than documentation)?
>
> Thanks,
> Chris
Re: Looking for ideas to improve library API
Why not take ownership of the file object, instead of requiring users to manage lifetimes?

-- Devin

On Wed, Nov 25, 2015 at 12:52 PM, Chris Lalancette wrote:
> Hello,
> I'm currently developing a library called pyiso
> (https://github.com/clalancette/pyiso), used for manipulating ISO disk
> images. I'm pretty far along with it, but there is one part of the API
> that I really don't like. Typical usage of the library is something like:
>
>     import pyiso
>
>     p = pyiso.PyIso()  # create the object
>     f = open('/path/to/original.iso', 'r')
>     p.open(f)  # parse all of the metadata from the input ISO
>     fp = open('/path/to/file/to/add/to/iso', 'r')
>     p.add_fp(fp)  # add a new file to the ISO
>     out = open('/path/to/modified.iso', 'w')
>     p.write(out)  # write out the modified ISO to another file
>     out.close()
>     fp.close()
>     f.close()
>
> This currently works OK. The problem ends up being the file descriptor
> lifetimes. I want the user to be able to do multiple operations to the
> ISO, and I also don't want to read the entire ISO (and new files) into
> memory. That means that internal to the library, I take a reference to the
> file object that the user passes in during open() and add_fp(). This is
> fine, unless the user decides to close the file object before calling the
> write method, at which point the write complains of I/O to a closed file.
> This is especially problematic when it comes to using context managers,
> since the user needs to leave the context open until they call write().
> I've thought of a couple ways to deal with this:
>
> 1. Make a copy of the file object internal to the library, using os.dup()
>    to copy the file descriptor. This is kind of nasty, especially since I
>    want to support other kinds of file objects (think StringIO).
> 2. Just document the fact that the user needs to leave the file objects
>    open until they are done. This is simple, but not super user-friendly.
>
> I'm looking for any ideas of how to do this better, or something I
> missed. Any input is appreciated!
>
> Thanks,
> Chris Lalancette
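One conventional way to signal "taking ownership" is for the library object to close every file object it was handed when it is itself closed, and to be a context manager so users naturally stop touching those file objects. A hypothetical sketch (`PyIsoSketch` and `_owned` are invented names, not pyiso's actual API):

```python
import io


class PyIsoSketch:
    """Hypothetical sketch: the library owns every file object handed to it."""

    def __init__(self):
        self._owned = []

    def open(self, fp):
        # take ownership: from here on, the caller must not close fp
        self._owned.append(fp)

    def add_fp(self, fp):
        self._owned.append(fp)

    def close(self):
        # closing the library object closes everything it owns
        for fp in self._owned:
            fp.close()
        self._owned.clear()

    # context-manager support is itself a strong API signal of ownership
    def __enter__(self):
        return self

    def __exit__(self, *exc):
        self.close()


p = PyIsoSketch()
f = io.BytesIO(b"iso data")  # stands in for open('/path/to/original.iso', 'rb')
p.open(f)
with p:
    pass           # ... do multiple operations, then write ...
assert f.closed    # the library, not the caller, closed it
```

The design choice here mirrors how `zipfile.ZipFile` and similar stdlib classes behave: the wrapping object's `close()` defines the lifetime, so there is no window where the user can close a file the library still needs.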
Re: Should non-security 2.7 bugs be fixed?
I think you're missing the line where I said all the relevant
conversation happened in IRC, and that you should refer to logs.

On Sun, Jul 19, 2015 at 11:25 PM, Terry Reedy <tjre...@udel.edu> wrote:
> On 7/19/2015 9:20 PM, Devin Jeanpierre wrote:
>> Search your logs for https://bugs.python.org/issue17094
>> http://bugs.python.org/issue5315
>> I was most frustrated by the first case -- the patch was (informally)
>> rejected
>
> By 'the patch', I presume you mean current-frames-cleanup.patch by
> Stefan Ring, who said it is "certainly not the most complete solution,
> but it solves my problem". It was reviewed a month later by a core dev,
> who said it had two defects. Do you expect us to apply defective
> patches?

No, I meant my patch. It was discussed in IRC, and I gave the search
term to grep for. (The issue URL.)

>> in favor of the right fix,
>
> 'right' is your word. Natali simply uploaded an alternate patch that
> did not have the defects cited. It went through 4 versions, two by
> Pitrou, before the commit and close 2 months later, with the comment
> "Hopefully there aren't any applications relying on the previous
> behaviour."

No, 'right' is the word used by members of #python-dev, referring to
Antoine's fix.

> Two years later, last May, you proposed and uploaded a patch with what
> looks to be a new and different approach. It has been ignored. In the
> absence of a core dev focused on 2.7, I expect that this will continue.
> Too bad you did not upload it in Feb 2013, before the review and fix
> started.

I'm not sure what you're implying here. It couldn't be helped.

>> and http://bugs.python.org/issue5315
>
> Another fairly obscure issue for most of us. Five years ago, this was
> turned into a doc issue, but no patch was ever submitted for either 2.x
> or 3.x. Again, no particular prejudice against 2.x. In May, you posted
> a bugfix which so far has been ignored. Not too surprising. I submitted
> a ping and updated the versions. If anyone responds, you might be asked
> for a patch against 3.4 or 3.5.
Again, the prejudice was expressed in IRC. It was ignored because you
can just use asyncio in 3.x, and because the bug was old.

-- Devin
Re: Should non-security 2.7 bugs be fixed?
On Sat, Jul 18, 2015 at 9:45 PM, Steven D'Aprano <st...@pearwood.info> wrote:
>> It gets really boring submitting 2.7-specific patches, though, when
>> they aren't accepted, and the committers have such a hostile attitude
>> towards it. I was told by core devs that, instead of fixing bugs in
>> Python 2, I should just rewrite my app in Python 3.
>
> Really? Can you point us to this discussion?

Yes, really. It was on #python-dev IRC.

> If you are right, and that was an official pronouncement, then it seems
> that non-security bug fixes to 2.7 are forbidden.

I never said it was a pronouncement, or official. It wasn't. I have no
idea where you got that idea from, given that I specifically have said
that I think non-security bug fixes are allowed.

> I suspect though that it's not quite that black and white. Perhaps
> there was some doubt about whether or not the patch in question was
> fixing a bug or adding a feature (a behavioural change). Or the core
> dev in question was speaking for themselves, not for all.

They weren't speaking for all. And, I never said they were. Nor did I
imply that they were. Search your logs for
https://bugs.python.org/issue17094 and http://bugs.python.org/issue5315

I was most frustrated by the first case -- the patch was (informally)
rejected in favor of the right fix, and the right fix was (informally)
rejected because it changed behavior, leaving me only with the option
of absurd workarounds of a bug in Python, or moving to python 3.

>> It has even been implied that bugs in Python 2 are *good*, because
>> that might help with Python 3 adoption.
>
> Really? Can you point us to this discussion? As they say on Wikipedia,
> Citation Needed. I would like to see the context before taking that at
> face value.

Of course, it was a joke. The format of the joke goes like this: people
spend a lot of time debugging and writing bugfixes for Python 2.7, and
you say:

<dev2> guido wants all python 3 features in python 2, so ssbr` maybe
       choose the right time to ask a backport ;-)
<dev1> oh. if i would be paid to contribute to cpython, i would
       probably be ok to backport anything from python 3 to python 2
<dev1> since i'm not paid for that, i will to kill python 2, it must
       suffer a lot

And that's about as close to logs as I am comfortable posting. Grep
your logs for that, too.

I don't like how this is being redirected to "surely you misunderstood"
or "I don't believe you". The fact that some core devs are hostile to
2.x development is really bleedingly obvious, you shouldn't need quotes
or context thrown at you. The rhetoric almost always shies _just_ short
of ceasing bugfixes (until 2020, when that abruptly becomes a cracking
good idea). e.g. in "2.7 is here until 2020, please don't call it a
waste."

I don't want to argue over who said what. I am sure everyone meant the
best, and I misunderstood them given a complicated context and a rough
day. Let's end this thread here, please.

-- Devin
Re: Should non-security 2.7 bugs be fixed?
On Sun, Jul 19, 2015 at 8:05 PM, Steven D'Aprano <st...@pearwood.info> wrote:
> On Mon, 20 Jul 2015 11:20 am, Devin Jeanpierre wrote:
>> I was most frustrated by the first case -- the patch was (informally)
>> rejected in favor of the right fix, and the right fix was (informally)
>> rejected because it changed behavior, leaving me only with the option
>> of absurd workarounds of a bug in Python, or moving to python 3.
>
> In the first case, 17094, your comments weren't added until TWO YEARS
> after the issue was closed. It's quite possible that nobody has even
> noticed them. In the second case, the issue is still open. So I don't
> understand your description above: there's no sign that the patch in
> 17094 was rejected, the patch had bugs and it was fixed and applied to
> 3.4. It wasn't applied to 2.7 for the reasons explained in the tracker:
> it could break code that is currently working. For the second issue,
> it has neither been applied nor rejected.

I meant search your #python-dev IRC logs, where this was discussed.

As far as whether people notice patches after an issue is closed, Terry
Reedy answered yes earlier in the thread. If the answer is actually no,
then we should fix how bugs are handled post-closure, in case e.g.
someone posts a followup patch that fixes a remaining case, and so on.

>> you shouldn't need quotes or context thrown at you. The rhetoric
>> almost always shies _just_ short of ceasing bugfixes (until 2020, when
>> that abruptly becomes a cracking good idea). e.g. in "2.7 is here
>> until 2020, please don't call it a waste."
>
> Right. So you take an extended ten year maintenance period for Python
> 2.7 as evidence that the core devs are *hostile* to maintaining 2.7?
> That makes no sense to me.

That isn't what I said at all.

> If you want to say that *some individuals* who happen to have commit
> rights are hostile to Python 2.7, I can't really argue with that.
> Individuals can have all sorts of ideas and opinions.
> But the core devs as a group are very supportive of Python 2.7, even
> going to the effort of back-porting performance improvements.

I do want to say that. It doesn't help that those same individuals are
the only core devs I have interacted with while trying to patch 2.7.

-- Devin
Re: Should non-security 2.7 bugs be fixed?
Considering CPython is officially accepting performance improvements to
2.7, surely bug fixes are also allowed? I have contributed both
performance improvements and bug fixes to 2.7. In my experience, the
problem is not the lack of contributors, it's the lack of code
reviewers.

I think this is something everyone should care about. The really great
thing about working on a project like Python is that not only do you
help the programmers who use Python, but also the users who use the
software that those programmers create. Python 2.7 is important in the
software ecosystem of the world. Fixing bugs and making performance
improvements can sometimes significantly help the 1B people who use the
software written in Python 2.7.

-- Devin

On Sat, Jul 18, 2015 at 4:36 PM, Terry Reedy <tjre...@udel.edu> wrote:
> I asked the following as an off-topic aside in a reply on another
> thread. I got one response which presented a point I had not
> considered. I would like more viewpoints from 2.7 users.
>
> Background: each x.y.0 release normally gets up to 2 years of bugfixes,
> until x.(y+1).0 is released. For 2.7, released summer 2010, the bugfix
> period was initially extended to 5 years, ending about now. At the
> spring pycon last year, the period was extended to 10 years, with an
> emphasis on security and build fixes.
>
> My general question is what other fixes should be made? Some specific
> forms of this question are the following. If the vast majority of
> Python programmers are focused on 2.7, why are volunteers to help fix
> 2.7 bugs so scarce? Do they all consider it perfect (or sufficient) as
> is? Should the core developers who do not personally use 2.7 stop
> backporting, because no one cares if they do?
>
> --
> Terry Jan Reedy
Re: Should non-security 2.7 bugs be fixed?
On Sat, Jul 18, 2015 at 6:34 PM, Terry Reedy <tjre...@udel.edu> wrote:
> On 7/18/2015 8:27 PM, Mark Lawrence wrote:
>> On 19/07/2015 00:36, Terry Reedy wrote:
>> Programmers don't much like doing maintenance work when they're paid
>> to do it, so why would they volunteer to do it?
>
> Right. So I am asking: if a 3.x user volunteers a 3.x patch and a 3.x
> core developer reviews and edits the patch until it is ready to commit,
> why should either of them volunteer to do a 2.7 backport that they will
> not use?

Because it helps even more people. The reason people make upstream
contributions is so that the world benefits. If you only wanted to help
yourself, you'd just patch CPython locally, and not bother contributing
anything upstream.

> I am suggesting that if there are 10x as many 2.7-only programmers as
> 3.x-only programmers, and none of the 2.7 programmers is willing to do
> the backport *of an already accepted patch*, then maybe it should not
> be done at all.

That just isn't true. I have backported 3.x patches. Other people have
backported entire modules.

It gets really boring submitting 2.7-specific patches, though, when
they aren't accepted, and the committers have such a hostile attitude
towards it. I was told by core devs that, instead of fixing bugs in
Python 2, I should just rewrite my app in Python 3. It has even been
implied that bugs in Python 2 are *good*, because that might help with
Python 3 adoption.

>> Then even if you do the work to fix *ANY* bug there is no guarantee
>> that it gets committed.
>
> I am discussing the situation where there *is* a near guarantee (if the
> backport works and does not break anything and has not been so heavily
> revised as to require a separate review).

That is not how I have experienced contribution to CPython. No, the
patches are *not* guaranteed, and in my experience they are not likely
to be accepted. If the issue was closed as fixed before I contributed
the backported patch, does anyone even see it?

-- Devin
Re: Pure Python Data Mangling or Encrypting
On Fri, Jun 26, 2015 at 11:16 PM, Steven D'Aprano <st...@pearwood.info> wrote:
> On Sat, 27 Jun 2015 02:05 pm, Devin Jeanpierre wrote:
>> On Fri, Jun 26, 2015 at 8:38 PM, Steven D'Aprano <st...@pearwood.info> wrote:
>>> Now you say that the application encrypts the data, except that the
>>> user can turn that option off. Just make the AES encryption
>>> mandatory, not optional. Then the user cannot upload unencrypted
>>> malicious data, and the receiver cannot read the data. That's two
>>> problems solved.
>>
>> No, because another application could pretend to be the file-sending
>> application, but send unencrypted data instead of encrypted data.
>
> Did you stop reading my post when you got to that? Because I went on to
> say:

At that point I quit in frustration, yeah.

> Actually, the more I think about this, the more I come to think that
> the only way this can be secure is for both the sending client
> application and the receiving client application to both encrypt the
> data. The sender can't trust the receiver not to read the files, so the
> sender has to encrypt; the receiver can't trust the sender not to send
> malicious files, so the receiver has to encrypt too.

When you realize you've said something completely wrong, you should
edit your email.

-- Devin
Re: Pure Python Data Mangling or Encrypting
On Sat, Jun 27, 2015 at 6:18 PM, Steven D'Aprano <st...@pearwood.info> wrote:
> On Sun, 28 Jun 2015 06:30 am, Devin Jeanpierre wrote:
>> On Fri, Jun 26, 2015 at 11:16 PM, Steven D'Aprano <st...@pearwood.info> wrote:
>>> On Sat, 27 Jun 2015 02:05 pm, Devin Jeanpierre wrote:
>>>> On Fri, Jun 26, 2015 at 8:38 PM, Steven D'Aprano <st...@pearwood.info> wrote:
>>>>> Now you say that the application encrypts the data, except that the
>>>>> user can turn that option off. Just make the AES encryption
>>>>> mandatory, not optional. Then the user cannot upload unencrypted
>>>>> malicious data, and the receiver cannot read the data. That's two
>>>>> problems solved.
>>>>
>>>> No, because another application could pretend to be the file-sending
>>>> application, but send unencrypted data instead of encrypted data.
>>>
>>> Did you stop reading my post when you got to that? Because I went on
>>> to say:
>>
>> At that point I quit in frustration, yeah.
>>
>>> Actually, the more I think about this, the more I come to think that
>>> the only way this can be secure is for both the sending client
>>> application and the receiving client application to both encrypt the
>>> data. The sender can't trust the receiver not to read the files, so
>>> the sender has to encrypt; the receiver can't trust the sender not to
>>> send malicious files, so the receiver has to encrypt too.
>>
>> When you realize you've said something completely wrong, you should
>> edit your email.
>
> If both the sender and receiver encrypt the data, how is it completely
> wrong to say that encrypting data should be mandatory?

That isn't what I was calling completely wrong. This is:

> Just make the AES encryption mandatory, not optional. Then the user
> cannot upload unencrypted malicious data, and the receiver cannot read
> the data. That's two problems solved.

The user can still upload unencrypted malicious data by writing their
own client that doesn't have mandatory AES encryption. You realized this
later in the email, apparently, which is why you should have edited your
own email to delete your original, insecure, suggestion. :(

That said, I appreciate the work you've done here asking for a specific
threat model and pushing back on the idea that it's up to python-list
to prove something is insecure, not the other way around. That's
important. I think, for the same reasons, it's also important to be
really careful what cryptosystems we discuss, and not suggest or appear
to suggest ones that won't work.

P.S. FWIW, the base64 idea has a lot of promise and is probably
fundamentally better than a crypto algorithm. With something along the
lines of base64 -- say, encoding a file using just the letters 'a' and
'b' -- one might try to make it literally impossible to write bad
things to disk, whereas with any crypto, it is always possible to
obtain the key, so one has to be careful with key management to
prevent/mitigate that. (One might add: why not both? Beats me. I like
using extension modules.)

P.P.S.: of course, I'm not an expert.

-- Devin
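To make the restricted-alphabet idea concrete, a small sketch (the function names here are invented for illustration): standard base64 confines output to `A-Za-z0-9+/=`, and the extreme two-letter variant from the post confines it to just `a` and `b`, one character per bit, at an 8x size cost.

```python
import base64


def encode_restricted(data: bytes) -> bytes:
    # Standard base64: output is drawn only from A-Za-z0-9+/=, so an
    # attacker who controls `data` cannot choose arbitrary on-disk bytes
    # (no executable headers or magic numbers survive verbatim).
    return base64.b64encode(data)


def encode_ab(data: bytes) -> bytes:
    # The extreme version: only the letters 'a' and 'b', one output
    # character per input bit -- an 8x size blow-up.
    out = bytearray()
    for byte in data:
        for i in range(7, -1, -1):
            out.append(ord('b') if (byte >> i) & 1 else ord('a'))
    return bytes(out)


def decode_ab(stored: bytes) -> bytes:
    # Invert encode_ab: 'b' -> 1 bit, 'a' -> 0 bit, 8 bits per byte.
    bits = ''.join('1' if c == ord('b') else '0' for c in stored)
    return bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))
```

Note the key-management point from the post: there is no key here at all, so there is nothing to steal; the guarantee comes purely from the output alphabet, not from secrecy.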
Re: Pure Python Data Mangling or Encrypting
Johannes, I agree with a lot of what you say, but can you please have
less of a mean attitude? -- Devin

On Fri, Jun 26, 2015 at 3:42 PM, Johannes Bauer <dfnsonfsdu...@gmx.de> wrote:
> On 26.06.2015 23:29, Jon Ribbens wrote:
>>> While you seem to think that Steven is rampaging about nothing, he
>>> does have a fair point: You consistently were vague about whether you
>>> want to have encryption, authentication or obfuscation of data. This
>>> suggests that you may not be so sure yourself what it is you actually
>>> want.
>>
>> He hasn't been vague, you and Steven just haven't been paying
>> attention.
>
> Bullshit. Even the topic indicates that he doesn't know what he wants:
> "data mangling or encryption", which one is it?
>
>>> You always play around with the 256! which would be a ridiculously
>>> high security margin (1684 bits of security, w!). You totally ignore
>>> that the system can be broken in a linear fashion.
>>
>> No, it can't, because the attacker does not have access to the
>> ciphertext.
>
> Or so you claim. I could go into detail about how the assumption that
> the ciphertext is secret is not a smart one in the context of
> cryptography. And how side channels and other leakage may affect
> overall system security. But I'm going to save my time on that. I do
> get paid to review cryptographic systems and part of the job is
> dealing with belligerent people who have read Schneier's blog and
> think they can outsmart anyone else. Since I don't get paid to
> convince you, it's absolutely fine that you think your substitution
> scheme is the grand prize.
>
>>> Nobody assumes you're a moron. But it's safe to assume that you're a
>>> crypto layman, because only laymen have no clue on how difficult it
>>> is to get cryptography even remotely right.
>>
>> Amateur crypto is indeed a bad idea. But what you're still not getting
>> is that what he's doing here *isn't crypto*.
>
> So the topic says "Encrypting". If you look really closely at the
> word, the part "crypt" might give away to you that cryptography is
> involved.
>
>> He's just trying to avoid letting third parties write completely
>> arbitrary data to the disk.
>
> There's your requirement. Then there's obviously some kind of
> implication when a third party *can* write arbitrary data to disk. And
> your other solution to that problem...
>
>> You know what would be a perfectly good solution to his problem? Base
>> 64 encoding. That would solve the issue pretty much completely, the
>> only reason it's not an ideal solution is that it of course increases
>> the size of the data.
>
> ...wow. That's a nice interpretation of not letting a third party
> write "completely arbitrary data". According to your definition, this
> would be: It's okay if the attacker can control 6 of 8 bits.
>
>>> That people in 2015 actually defend inventing a substitution-cipher
>>> cryptosystem sends literally shivers down my spine.
>>
>> Nobody is defending such a thing, you just haven't understood what
>> problem is being solved here.
>
> Oh I understand your solutions plenty well. The only thing I don't
> understand is why you don't own a Fields medal yet for your
> groundbreaking work on bulletproof obfuscation.
>
> Cheers,
> Johannes
>
> --
> Wo hattest Du das Beben nochmal GENAU vorhergesagt? Zumindest nicht
> öffentlich! Ah, der neueste und bis heute genialste Streich unsere
> großen Kosmologen: Die Geheim-Vorhersage.
>  - Karl Kaos über Rüdiger Thomas in dsa <hidbv3$om2$1...@speranza.aioe.org>
Re: Pure Python Data Mangling or Encrypting
On Fri, Jun 26, 2015 at 8:38 PM, Steven D'Aprano <st...@pearwood.info> wrote:
> Now you say that the application encrypts the data, except that the
> user can turn that option off. Just make the AES encryption mandatory,
> not optional. Then the user cannot upload unencrypted malicious data,
> and the receiver cannot read the data. That's two problems solved.

No, because another application could pretend to be the file-sending
application, but send unencrypted data instead of encrypted data.

-- Devin
Re: Pure Python Data Mangling or Encrypting
On Thu, Jun 25, 2015 at 2:57 AM, Chris Angelico <ros...@gmail.com> wrote:
> On Thu, Jun 25, 2015 at 7:41 PM, Devin Jeanpierre
> <jeanpierr...@gmail.com> wrote:
>>> I know that the OP doesn't propose using ROT-13, but a classical
>>> substitution cipher isn't that much stronger.
>>
>> Yes, it is. It requires the attacker being able to see something about
>> the ciphertext, unlike ROT13. But it is reasonable to suppose that
>> maybe the attacker can trigger the file getting executed, at which
>> point maybe you can deduce from the behavior what the starting bytes
>> are...?
>
> If a symmetric cipher is being used and the key is known, anyone can
> simply perform a decryption operation on the desired bytes, get back a
> pile of meaningless encrypted junk, and submit that. When it's
> encrypted with the same key, voila! The cleartext will reappear.
> Asymmetric ciphers are a bit different, though. AIUI you can't perform
> a decryption without the private key, whereas you can encrypt with
> only the public key. So you ought to be safe on that one; the only way
> someone could deliberately craft input that, when encrypted with your
> public key, produces a specific set of bytes, would be to brute-force
> it. (But I might be wrong on that. I'm no crypto expert.)

Yes, so it should be random.

-- Devin
Re: Pure Python Data Mangling or Encrypting
On Thu, Jun 25, 2015 at 2:25 AM, Steven D'Aprano
<steve+comp.lang.pyt...@pearwood.info> wrote:
> On Thursday 25 June 2015 14:27, Devin Jeanpierre wrote:
>> The original post said that the sender will usually send files they
>> encrypted, unless they are malicious. So if the sender wants them to
>> be encrypted, they already are.
>
> The OP *hopes* that the sender will encrypt the files. I think that's
> a vanishingly faint hope, unless the application itself encrypts the
> file. Most people don't have any encryption software beyond
> password-protecting zip files. Zip 2.0 legacy encryption is crap, and
> there are plenty of tools available to break it. Winzip has an
> extension for 128-bit and 256-bit AES encryption, both of which are
> probably strong enough unless you're targeted by the NSA, but the weak
> link in the chain is the idea that people will encrypt the software
> before sending it. Even if they have the tools, laziness being the
> defining characteristic of most people, they won't use them.

You're right, I was supposing that since they wrote the server, they
also wrote the client, and were just protecting from the protocol
itself being weak.

> I know that the OP doesn't propose using ROT-13, but a classical
> substitution cipher isn't that much stronger.

Yes, it is. It requires the attacker being able to see something about
the ciphertext, unlike ROT13. But it is reasonable to suppose that
maybe the attacker can trigger the file getting executed, at which
point maybe you can deduce from the behavior what the starting bytes
are...?

> I don't think any of us *really* understand his use-case or the
> potential threats, but to my way of thinking, you can never have too
> strong a cipher or underestimate the risk of users taking short-cuts.

This is truth. It would be nice if something like keyczar came in the
stdlib. (Otherwise, users of Python take shortcuts and use randomized
substitution ciphers instead of AES.)

-- Devin
Re: Pure Python Data Mangling or Encrypting
How about a random substitution cipher? This will be ultra-weak, but
fast (using bytes.translate/bytes.maketrans) and seems to be the kind
of thing you're asking for. -- Devin

On Tue, Jun 23, 2015 at 12:02 PM, Randall Smith <rand...@tnr.cc> wrote:
> Chunks of data (about 2MB) are to be stored on machines using a
> peer-to-peer protocol. The recipient of these chunks can't assume that
> the payload is benign. While the data senders are supposed to encrypt
> data, that's not guaranteed, and I'd like to protect the recipient
> against exposure to nefarious data by mangling or encrypting the data
> before it is written to disk.
>
> My original idea was for the recipient to encrypt using AES. But I
> want to keep this software pure Python "batteries included" and not
> require installation of other platform-dependent software. Pure Python
> AES and even DES are just way too slow. I don't know that I really
> need encryption here, but some type of fast mangling algorithm where a
> bad actor sending a payload can't guess the output ahead of time.
>
> Any ideas are appreciated. Thanks.
>
> -Randall
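A minimal sketch of that suggestion (the names here are invented for illustration; as the rest of the thread establishes, this is obfuscation, not encryption): build a random permutation of the 256 byte values, turn it into a pair of translation tables with `bytes.maketrans`, and apply it with `bytes.translate`, which is a single C-level table lookup per byte.

```python
import random


def make_tables():
    # A random byte-level substitution: a shuffled permutation of 0..255.
    perm = list(range(256))
    random.SystemRandom().shuffle(perm)  # OS-entropy-backed shuffle
    identity = bytes(range(256))
    forward = bytes.maketrans(identity, bytes(perm))
    backward = bytes.maketrans(bytes(perm), identity)  # the inverse mapping
    return forward, backward


forward, backward = make_tables()
payload = b"2MB chunk of untrusted data"      # stand-in for a real chunk
mangled = payload.translate(forward)          # fast enough for 2MB chunks
assert mangled.translate(backward) == payload
```

This only frustrates an attacker who must predict the exact on-disk bytes ahead of time; any attacker who can observe one known plaintext/ciphertext pair recovers the table trivially.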
Re: Pure Python Data Mangling or Encrypting
On Wed, Jun 24, 2015 at 9:07 PM, Steven D'Aprano <st...@pearwood.info> wrote:
> But just sticking to the three above, the first one is partially
> mitigated by allowing virus scanners to scan the data, but that implies
> that the owner of the storage machine can spy on the files. So you have
> a conflict here.

If it's encrypted malware, and you can't decrypt it, there's no threat.

> Honestly, the *only* real defence against the spying issue is to
> encrypt the files. Not obfuscate them with a lousy random substitution
> cipher. The storage machine can keep the files as long as they like,
> just by making a copy, and spend hours bruteforcing them. They *will*
> crack the substitution cipher. In pure Python, that may take a few days
> or weeks; in C, hours or days. If they have the resources to throw at
> it, minutes. Substitution ciphers have not been effective encryption
> since, oh, the 1950s, unless you use a one-time pad. Which you won't
> be.

The original post said that the sender will usually send files they
encrypted, unless they are malicious. So if the sender wants them to be
encrypted, they already are.

> While the data senders are supposed to encrypt data, that's not
> guaranteed, and I'd like to protect the recipient against exposure to
> nefarious data by mangling or encrypting the data before it is written
> to disk.

The cipher is just to keep the sender from being able to control what
is on disk. I am usually very oppositional when it comes to rolling
your own crypto, but am I alone here in thinking the OP very clearly
laid out their case?

-- Devin
Re: enhancement request: make py3 read/write py2 pickle format
FWIW most of the objections below also apply to JSON, so this doesn't
just have to be about repr/literal_eval. I'm definitely a huge
proponent of widespread use of something like protocol buffers, both
for production code and personal hacky projects.

On Wed, Jun 10, 2015 at 2:36 AM, Steven D'Aprano
<steve+comp.lang.pyt...@pearwood.info> wrote:
> On Wednesday 10 June 2015 14:48, Devin Jeanpierre wrote:
> [...]
>> and literal_eval is not a great idea.
>> * the common serializer (repr) does not output a canonical form, and
>>   can serialize things in a way that they can't be deserialized
>
> For literals, the canonical form is that understood by Python. I'm
> pretty sure that these have been stable since the days of Python 1.0,
> and will remain so pretty much forever:

The problem is that there are two different ways repr might write out a
dict equal to {'a': 1, 'b': 2}. This can make tests brittle -- e.g.
it's why doctest fails badly at examples involving dictionaries.

Text format protocol buffers output everything sorted, so that you can
do textual diffs for compatibility tests and such. At work, one thing
we do in places is mock out services using golden expected protobuf
responses, so that you can test that the server returns exactly that,
and test what the client does with that, separately. These are checked
into perforce in text format.

>> * there is no schema
>> * there is no well understood migration story for when the data you
>>   load and store changes
>
> literal_eval is not a serialisation format itself. It is a primitive
> operation usable when serialising. E.g. you might write out a simple
> Unix-style rc file of key:value pairs:
> -snip-
> split on "=" and call literal_eval on the value. This is a perfectly
> reasonable light-weight solution for simple serialisation needs.

I could spend a bunch of time writing yet another config file format,
or I could use text format protocol buffers, YAML, or TOML and call it
a day.

>> * it encourages the use of eval when literal_eval becomes inconvenient
>>   or insufficient
>
> I don't think so. I think that people who make the effort to import ast
> and call ast.literal_eval are fully aware of the dangers of eval and
> aren't silly enough to start using eval.

The problem is when you have your config file format using python
literals, and another programmer wants to deal with it and doesn't look
at your codebase, and things like that. When transferring data, this
can happen a lot, since you are often not the user of the data you
wrote, and you can't control how others consume it. They might use eval
even if you didn't mean for them to. For example, in JavaScript, this
was once a common problem for services exposing JSON, and it still
happens even now.

>> * It is not particularly well specified or documented compared to the
>>   alternatives.
>> * The types you get back differ in python 2 vs 3
>
> Doesn't matter. The types you *write* are different in Python 2 vs 3,
> so of course you do.

In a shared 2/3 codebase, if I write bytes I expect to get bytes, and
if I write unicode I expect to get unicode. (There is a third category
of thing, which should be bytes on 2.x and string on 3.x, but it's
probably best to handle that outside of the deserializer). If you
thread it through repr and literal_eval using different versions for
each, unicode in python 3 becomes bytes in python 2, and vice versa. So
it makes migrating to Python 3 even harder.

For most apps, the alternatives are better. Irmen's serpent library is
strictly better on every front, for example. (Except potentially
security, who knows.)

> Beyond simple needs, like rc files, literal_eval is not sufficient. You
> can't use it to deserialise arbitrary objects. That might be a feature,
> but if you need something more powerful than basic ints, floats,
> strings and a few others, literal_eval will not be powerful enough.

No, it is powerful enough. After all, JSON has the same limitations.
Protobuf only adds enums and structs to JSON's types, and it's
potentially the most-used serialization format in the world by
operations per second. Serialization libraries/formats usually need
handholding to serialize complex Python objects into simple
serializable types. [Except pickle, and that's the very reason it's
insecure (per previous discussion in thread.)]

-- Devin
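The non-canonical-repr point is easy to demonstrate (a small illustration; `canonical` here is an ad-hoc helper, not a library function): two equal dicts built in different insertion orders repr differently, so string-comparing reprs is brittle even though `ast.literal_eval` round-trips to an equal object.

```python
import ast

d1 = {'a': 1, 'b': 2}
d2 = {'b': 2, 'a': 1}
assert d1 == d2              # equal dicts...
assert repr(d1) != repr(d2)  # ...but repr leaks insertion order: not canonical

# literal_eval still round-trips to an *equal* object, just not an equal string
assert ast.literal_eval(repr(d1)) == d2


# a canonical text form has to normalize explicitly, e.g. by sorting keys
def canonical(d):
    return '{' + ', '.join('%r: %r' % (k, d[k]) for k in sorted(d)) + '}'


assert canonical(d1) == canonical(d2)
```

This is exactly the property text-format protobufs provide by always sorting: same data, same string, so textual diffs are meaningful.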
Re: enhancement request: make py3 read/write py2 pickle format
On Wed, Jun 10, 2015 at 4:39 PM, Devin Jeanpierre
<jeanpierr...@gmail.com> wrote:
> On Wed, Jun 10, 2015 at 4:25 PM, Terry Reedy <tjre...@udel.edu> wrote:
>> On 6/10/2015 6:10 PM, Devin Jeanpierre wrote:
>>> The problem is that there are two different ways repr might write out
>>> a dict equal to {'a': 1, 'b': 2}. This can make tests brittle
>>
>> Not if one compares objects rather than string representations of
>> objects. I am strongly of the view that code and tests should be
>> written to directly compare objects as much as possible.
>
> For serialization formats that always output the same string for the
> same data (like text format protos), there is no practical difference
> between the two, except that if you're comparing text, you can easily
> supply a diff to update one to match the other.

Ugh, there's also the fiddly difference between what goes in and what
you read. A serialized data structure might contain lots of data that
is ignored by the deserializer (in protobuf), or it might contain data
which can't be loaded by the deserializer or produces weird / incorrect
results. Being able to inspect and test the serialized data separately
from the deserialized data is useful in that regard, so that you know
where the failure lies, but it's sort of fuzzy. Some examples of where
this crops up: pickles after you've moved a class, JSON encoders that
try to be clever and output invalid JSON, protocol buffers with
unexpected fields.

Overall, though, the diff thing is probably the bigger reason everyone
wants to do this sort of thing with serialized data. If you do it right
and are principled about it, I don't see a problem with it.

-- Devin
Re: enhancement request: make py3 read/write py2 pickle format
On Wed, Jun 10, 2015 at 4:46 PM, Terry Reedy tjre...@udel.edu wrote: On 6/10/2015 7:39 PM, Devin Jeanpierre wrote: On Wed, Jun 10, 2015 at 4:25 PM, Terry Reedy tjre...@udel.edu wrote: On 6/10/2015 6:10 PM, Devin Jeanpierre wrote: The problem is that there are two different ways repr might write out a dict equal to {'a': 1, 'b': 2}. This can make tests brittle You commented about *tests* Not if one compares objects rather than string representations of objects. I am strongly of the view that code and tests should be written to directly compare objects as much as possible. I responded about *tests* For serialization formats that always output the same string for the same data (like text format protos), there is no practical difference between the two, except that if you're comparing text, you can easily supply a diff to update one to match the other. Serialization is a different issue. Yes, tests of code that uses serialization (caching, RPCs, etc.). I mentioned above a sort of test that divides tests of a client and server along RPC boundaries by providing fake queries and responses, and testing that those are the queries and responses given by the client and server. This way you don't need to actually start the client and server to test them both and their interactions. This is one example, there are other uses, but they go along the same lines. For example, one can also imagine testing that a serialized structure is identical across version changes, so that it's guaranteed to be forwards/backwards compatible. It is not enough to test that the deserialized form is, because it might differ substantially, as long as the communicated serialized structure is the same. -- Devin -- https://mail.python.org/mailman/listinfo/python-list
Re: enhancement request: make py3 read/write py2 pickle format
On Wed, Jun 10, 2015 at 4:25 PM, Terry Reedy tjre...@udel.edu wrote: On 6/10/2015 6:10 PM, Devin Jeanpierre wrote: The problem is that there are two different ways repr might write out a dict equal to {'a': 1, 'b': 2}. This can make tests brittle Not if one compares objects rather than string representations of objects. I am strongly of the view that code and tests should be written to directly compare objects as much as possible. For serialization formats that always output the same string for the same data (like text format protos), there is no practical difference between the two, except that if you're comparing text, you can easily supply a diff to update one to match the other. -- Devin -- https://mail.python.org/mailman/listinfo/python-list
Re: enhancement request: make py3 read/write py2 pickle format
Snipped aplenty. On Wed, Jun 10, 2015 at 8:21 PM, Steven D'Aprano st...@pearwood.info wrote: On Thu, 11 Jun 2015 08:10 am, Devin Jeanpierre wrote: [...] I could spend a bunch of time writing yet another config file format, or I could use text format protocol buffers, YAML, or TOML and call it a day. Writing a rc parser is so trivial that it's almost easier to just write it than it is to look up the APIs for YAML or JSON, to say nothing of the rigmarole of defining a protocol buffer config file, compiling it, importing the module, and using that. -snip That's a basic, *but acceptable*, rc parser written in literally under a minute. At the risk of ending up with egg on my face, I reckon that it's so simple and so obviously correct that I can tell it works correctly without even testing it. (Famous last words, huh?) I won't try to egg you. That said, you have to write tests. Also, everyone who uses it has to learn the format and API, and it may have corner cases you aren't aware of, it has to get ported to python 3 if you wrote it for python 2, the parsing errors are obscure and might need improvement, and so on. There's a place for this, but I suspect it is small compared to the place where it seemed like a good idea at the time. Beyond simple needs, like rc files, literal_eval is not sufficient. You can't use it to deserialise arbitrary objects. That might be a feature, but if you need something more powerful than basic ints, floats, strings and a few others, literal_eval will not be powerful enough. No, it is powerful enough. After all, JSON has the same limitations. In the sense that you can build arbitrary objects from a combination of a few basic types, yes, literal_eval is powerful enough if you are prepared to re-invent JSON, YAML, or protocol buffer. But I'm not talking about re-inventing what already exists. If I want JSON, I'll use JSON, not spend weeks or months re-writing it from scratch. 
I can't do this:

class MyClass: pass

a = MyClass()
serialised = repr(a)
b = ast.literal_eval(serialised)
assert a == b

I don't understand. You can't do that in JSON, YAML, XML, or protocol buffers, either. They only provide a small set of types, comparable to (but smaller) than the set of types you get from literal_eval/repr. -- Devin -- https://mail.python.org/mailman/listinfo/python-list
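A sketch of the claim above: repr() plus ast.literal_eval() round-trips the same small family of types that JSON-like formats offer, plus a few extras such as tuples and sets, but nothing fancier.

```python
import ast

# Nested containers of basic types survive a repr/literal_eval round trip.
data = {'name': 'example', 'values': (1, 2.5, None), 'tags': {'a', 'b'}}
restored = ast.literal_eval(repr(data))
assert restored == data
```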
Re: enhancement request: make py3 read/write py2 pickle format
There's a lot of subtle issues with pickle compatibility. e.g. old-style vs new-style classes. It's kinda hard and it's better to give up. I definitely agree it's better to use something else instead. For example, we switched to using protocol buffers, which have much better compatibility properties and are a bit more testable to boot (since text format protobufs are always output in a canonical (sorted) form.) -- Devin On Tue, Jun 9, 2015 at 11:35 AM, Chris Warrick kwpol...@gmail.com wrote: On Tue, Jun 9, 2015 at 8:08 PM, Neal Becker ndbeck...@gmail.com wrote: One of the most annoying problems with py2/3 interoperability is that the pickle formats are not compatible. There must be many who, like myself, often use pickle format for data storage. It certainly would be a big help if py3 could read/write py2 pickle format. You know, backward compatibility? Don’t use pickle. It’s unsafe — it executes arbitrary code, which means someone can give you a pickle file that will delete all your files or eat your cat. Instead, use a safe format that has no ability to execute code, like JSON. It will also work with other programming languages and environments if you ever need to talk to anyone else. But, FYI: there is backwards compatibility if you ask for it, in the form of protocol versions. That’s all you should know — again, don’t use pickle. -- Chris Warrick https://chriswarrick.com/ PGP: 5EAAEA16 -- https://mail.python.org/mailman/listinfo/python-list -- https://mail.python.org/mailman/listinfo/python-list
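On the "protocol versions" point: pickle protocol 2 exists on both Python 2 (since 2.3) and Python 3, so for simple data it can cross the 2/3 boundary. A minimal sketch:

```python
import pickle

# Pickling with an explicit, old protocol keeps the byte stream readable
# by older interpreters (for data that both sides can represent).
data = {'a': 1, 'b': [2, 3]}
blob = pickle.dumps(data, protocol=2)
assert pickle.loads(blob) == data
```

The security caveat from the message still applies: only unpickle data from sources you trust.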
Re: enhancement request: make py3 read/write py2 pickle format
Passing around data that can be put into ast.literal_eval is synonymous with passing around data that can be put into eval. It sounds like a trap. Other points against JSON / etc.: the lack of schema makes it easier to stuff anything in there (not as easily as pickle, mind), and by returning a plain dict, it becomes easier to require a field than to allow a field to be missing, which is bad for robustness and bad for data format migrations. (Protobuf (v3) has schemas and gives every field a default value.) For human readable serialized data, text format protocol buffers are seriously underrated. (Relatedly: underdocumented, too.) /me lifts head out of kool-aid and gasps for air -- Devin On Tue, Jun 9, 2015 at 5:17 PM, Irmen de Jong irmen.nos...@xs4all.nl wrote: On 10-6-2015 1:06, Chris Angelico wrote: On Wed, Jun 10, 2015 at 6:07 AM, Devin Jeanpierre jeanpierr...@gmail.com wrote: There's a lot of subtle issues with pickle compatibility. e.g. old-style vs new-style classes. It's kinda hard and it's better to give up. I definitely agree it's better to use something else instead. For example, we switched to using protocol buffers, which have much better compatibility properties and are a bit more testable to boot (since text format protobufs are always output in a canonical (sorted) form.) Or use JSON, if your data fits within that structure. It's easy to read and write, it's human-readable, and it's safe (no chance of arbitrary code execution). Forcing yourself to use a format that can basically be processed by ast.literal_eval() is a good discipline - means you don't accidentally save/load too much. ChrisA I made a specialized serializer for this, which is more expressive than JSON. It outputs python literal expressions that can be directly parsed by ast.literal_eval(). You can find it on pypi (https://pypi.python.org/pypi/serpent). It's the default serializer of Pyro, and it includes a Java and .NET version as well, as an added bonus.
Irmen -- https://mail.python.org/mailman/listinfo/python-list -- https://mail.python.org/mailman/listinfo/python-list
Re: enhancement request: make py3 read/write py2 pickle format
On Tue, Jun 9, 2015 at 8:52 PM, Steven D'Aprano steve+comp.lang.pyt...@pearwood.info wrote: On Wednesday 10 June 2015 10:47, Devin Jeanpierre wrote: Passing around data that can be put into ast.literal_eval is synonymous with passing around data that can be put into eval. It sounds like a trap. In what way? I misspoke, and instead of "synonymous", meant "also means". (Implication, not equivalence.) For human readable serialized data, text format protocol buffers are seriously underrated. (Relatedly: underdocumented, too.) Ironically, literal_eval is designed to process text-format protocols using human-readable Python syntax for common data types like int, str, and dict. Protocol buffers are a specific technology, not an abstract concept, and literal_eval is not a great idea:

* the common serializer (repr) does not output a canonical form, and can serialize things in a way that they can't be deserialized
* there is no schema
* there is no well understood migration story for when the data you load and store changes
* it is not usable from other programming languages
* it encourages the use of eval when literal_eval becomes inconvenient or insufficient
* it is not particularly well specified or documented compared to the alternatives
* the types you get back differ in Python 2 vs 3

For most apps, the alternatives are better. Irmen's serpent library is strictly better on every front, for example. (Except potentially security, who knows.) At least it's better than pickle, security wise. Reliability wise, repr is a black hole, so no dice. :( -- Devin -- https://mail.python.org/mailman/listinfo/python-list
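The "repr is a black hole" point can be demonstrated directly: for objects without a literal repr, the output is one-way and cannot be parsed back. A small sketch (the Point class is hypothetical):

```python
import ast

class Point(object):
    pass

# repr() of an ordinary object is something like
# '<__main__.Point object at 0x7f...>', which literal_eval cannot parse.
s = repr(Point())
try:
    ast.literal_eval(s)
    recovered = True
except (ValueError, SyntaxError):
    recovered = False
assert not recovered
```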
[issue15138] base64.urlsafe_b64**code are too slow
Devin Jeanpierre added the comment: Here's a backport of the patch to 2.7. It's pretty rad, and basically identical to how YouTube monkeypatches base64. Not sure what will happen to this patch. According to recent discussion on the list (e.g. https://mail.python.org/pipermail/python-dev/2015-May/140380.html ), performance improvements are open for inclusion in 2.7 if anyone wants to bother with merging this in and taking on the review / maintenance burden. I'm OK with just publishing it for others to merge in with their own private versions of Python. It is only relevant if you use base64 a lot. :) -- nosy: +Devin Jeanpierre Added file: http://bugs.python.org/file39568/base64_27.diff ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue15138 ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue17094] sys._current_frames() reports too many/wrong stack frames
Devin Jeanpierre added the comment: The patch I'm providing with this comment has a ... really hokey test case, and a two line + whitespace diff for pystate.c . The objective of the patch is only to have _current_frames report the correct frame for any live thread. It continues to report dead threads' frames, up until they would conflict with a live thread. IMO it's the minimal possible fix for this aspect of the bug, and suitable for 2.7.x. Let me know what you think. -- Added file: http://bugs.python.org/file39564/_current_frames_27_setdefault.diff ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue17094 ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue17094] sys._current_frames() reports too many/wrong stack frames
Devin Jeanpierre added the comment: This bug also affects 2.7. The main problem I'm dealing with is sys._current_frames will then return wrong stack frames for existing threads. One fix to just this would be to change how the dict is created, to keep newer threads rather than tossing them. Alternatively, we could backport the 3.4 fix. Thoughts? -- nosy: +Devin Jeanpierre ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue17094 ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
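For context, a short sketch of the API under discussion: sys._current_frames() returns a dict mapping thread idents to each thread's topmost frame, and the bug is that the dict could contain stale or wrong entries for recycled thread idents.

```python
import sys
import threading

def worker(event):
    event.wait()  # park the thread so it stays alive

event = threading.Event()
t = threading.Thread(target=worker, args=(event,))
t.start()

# Snapshot of all live threads' current frames, keyed by thread ident.
frames = sys._current_frames()
assert threading.get_ident() in frames  # this (main) thread
assert t.ident in frames                # the live worker thread

event.set()
t.join()
```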
[issue23275] Can assign [] = (), but not () = []
Devin Jeanpierre added the comment: [a, b] = (1, 2) is also fine. -- ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue23275 ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue5315] signal handler never gets called
Devin Jeanpierre added the comment: Adding haypo since apparently he's been touching signals stuff a lot lately, maybe has some useful thoughts / review? :) -- nosy: +haypo ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue5315 ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue24283] Print not safe in signal handlers
New submission from Devin Jeanpierre: The code attached runs a while loop that prints, and has a signal handler that also prints. There is a thread that constantly fires off signals, but this is just to ensure the condition for the bug happens -- this is a bug with signal handling, not threads -- I can trigger a RuntimeError (... with a missing message?) by commenting out the threading lines and instead running a separate process: while true; do kill -s SIGUSR1 4687; done. Traceback:

$ python3 threading_print_test.py
hello
world
Traceback (most recent call last):
  File "/usr/local/google/home/devinj/Downloads/threading_print_test.py", line 36, in <module>
    main()
  File "/usr/local/google/home/devinj/Downloads/threading_print_test.py", line 30, in main
    print("world")
  File "/usr/local/google/home/devinj/Downloads/threading_print_test.py", line 13, in print_hello
    print("hello")
RuntimeError: reentrant call inside <_io.BufferedWriter name='stdout'>

-- files: threading_print_test.py messages: 244020 nosy: Devin Jeanpierre, haypo priority: normal severity: normal status: open title: Print not safe in signal handlers Added file: http://bugs.python.org/file39491/threading_print_test.py ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue24283 ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
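A hedged, POSIX-only sketch of the usual workaround (not from the attached file): keep the handler down to async-signal-safe work such as os.write or setting a flag, and defer real printing to the main loop, instead of calling print() from the handler and risking a reentrant call into the buffered writer.

```python
import os
import signal

received = []

def handler(signum, frame):
    # os.write goes straight to the fd -- no buffer layer to re-enter.
    os.write(2, b"got signal\n")
    received.append(signum)

signal.signal(signal.SIGUSR1, handler)
os.kill(os.getpid(), signal.SIGUSR1)

# CPython runs the Python-level handler at the next bytecode boundary,
# so by the time we check, the handler has run.
assert received == [signal.SIGUSR1]
```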
[issue24283] Print not safe in signal handlers
Devin Jeanpierre added the comment: It doesn't do any of those things in Python 2, to my knowledge. Why aren't we willing to make this work? -- ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue24283 ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue5315] signal handler never gets called
Devin Jeanpierre added the comment: Agree with Charles-François's second explanation. This makes it very hard to reliably handle signals -- basically everyone has to remember to use set_wakeup_fd, and most people don't. For example, gunicorn is likely vulnerable to this because it doesn't use set_wakeup_fd. I suspect most code using select + signals is wrong. I've attached a patch which fixes the issue for select(), but not any other functions. If it's considered a good patch, I can work on the rest of the functions in the select module. (Also, tests for the details of the behavior.) Also the patch is pretty hokey, so I'd appreciate feedback if it's going to go in. :) -- keywords: +patch nosy: +Devin Jeanpierre Added file: http://bugs.python.org/file39489/select_select.diff ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue5315 ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
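For readers unfamiliar with it, the set_wakeup_fd approach the message refers to works roughly like this (POSIX-only sketch): the C-level signal handler writes a byte to a non-blocking pipe, so select() reliably wakes up even when the signal lands just before the call.

```python
import os
import select
import signal

r, w = os.pipe()
os.set_blocking(w, False)            # the wakeup fd must be non-blocking
old_fd = signal.set_wakeup_fd(w)
signal.signal(signal.SIGUSR1, lambda signum, frame: None)

os.kill(os.getpid(), signal.SIGUSR1)

# The wakeup byte is already in the pipe, so select() returns promptly
# instead of sleeping through the signal.
ready, _, _ = select.select([r], [], [], 5)
assert r in ready
os.read(r, 1)                        # drain the wakeup byte
signal.set_wakeup_fd(old_fd)
```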
[issue24235] ABCs don't fail metaclass instantiation
New submission from Devin Jeanpierre: If a subclass has abstract methods, it fails to instantiate... unless it's a metaclass, and then it succeeds.

>>> import abc
>>> class A(metaclass=abc.ABCMeta):
...     @abc.abstractmethod
...     def foo(self): pass
...
>>> class B(A): pass
...
>>> B()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: Can't instantiate abstract class B with abstract methods foo
>>> class C(A, type): pass
...
>>> class c(metaclass=C): pass
...
>>> C('', (), {})
<class '__main__.'>

-- components: Library (Lib) messages: 243540 nosy: Devin Jeanpierre priority: normal severity: normal status: open title: ABCs don't fail metaclass instantiation versions: Python 2.7, Python 3.4 ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue24235 ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue24144] Docs discourage use of binascii.unhexlify etc.
New submission from Devin Jeanpierre: Maybe the functions should be split up into those you shouldn't need to call directly, and those you should? I find it unlikely that you're supposed to use codecs.encode(..., 'hex') and codecs.decode(..., 'hex') instead of binascii (the only other thing, AFAIK, that works in both 2 and 3). Relevant quote starts with: Normally, you will not use these functions directly https://docs.python.org/2/library/binascii https://docs.python.org/3/library/binascii -- assignee: docs@python components: Documentation messages: 242737 nosy: Devin Jeanpierre, docs@python priority: normal severity: normal status: open title: Docs discourage use of binascii.unhexlify etc. versions: Python 2.7, Python 3.2, Python 3.3, Python 3.4, Python 3.5, Python 3.6 ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue24144 ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
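A sketch of the two spellings the message compares: the binascii functions the docs discourage, and the codecs route that works on both 2 and 3 (on Python 3 via the bytes-to-bytes "hex" codec).

```python
import binascii
import codecs

data = b"\xde\xad\xbe\xef"

# The "direct" binascii functions:
assert binascii.hexlify(data) == b"deadbeef"
assert binascii.unhexlify(b"deadbeef") == data

# The codecs spelling that the docs nudge you toward:
assert codecs.encode(data, "hex") == b"deadbeef"
assert codecs.decode(b"deadbeef", "hex") == data
```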
Re: Pickle based workflow - looking for advice
On Mon, Apr 13, 2015 at 10:58 AM, Fabien fabien.mauss...@gmail.com wrote: Now, to my questions: 1. Does that seem reasonable? A big issue is the use of pickle, which:

* Often has suboptimal performance (e.g. you can't load only subsets of the data)
* Makes forwards/backwards compatibility very difficult
* Can make Python 2/3 migrations harder
* Creates data files which are difficult to analyze/fix by hand if they get broken
* Is schemaless, and can accidentally include irrelevant data you didn't mean to store, making all of the above worse
* Means you have to be very careful who wrote the pickles, or you open a remote code execution vulnerability

It's common for people to forget that code is unsafe, and get themselves pwned. Security is always better if you don't do anything bad in the first place, than if you do something bad but try to manage the context in which the bad thing is done. Cap'n Proto might be a decent alternative that gives you good performance, by letting you process only the bits of the file you want to. It is also not a walking security nightmare. 2. Should Watershed be an object or should it be a simple dictionary? I thought that an object could be nice, because it could take care of some operations such as plotting and logging. Currently I defined a class Watershed, but its attributes are defined and filled by A, B and C (this seems a bit wrong to me). It is usually very confusing for attributes to be defined anywhere other than __init__. It's really confusing for them to be defined by some random other function living somewhere else.
I could give more responsibilities to this class but it might become way too big: since the whole purpose of the tool is to work on watersheds, making a Watershed class actually sounds like a code smell (http://en.wikipedia.org/wiki/God_object) Whether they are methods or not doesn't make this any more or less of a god object -- if it stores all this data used by all these different things, it is already a bit off. 3. The operation A opens an external file, reads data out of it and writes it in Watershed object. Is it a bad idea to multiprocess this? (I guess it is, since the file might be read twice at the same time) That does sound like a bad idea, for the reason you gave. It might be possible to read it once, and share it among many processes. -- Devin -- https://mail.python.org/mailman/listinfo/python-list
Re: You must register a new account to report a bug (was: Python 2 to 3 conversion - embrace the pain)
On Sun, Mar 15, 2015 at 11:17 PM, Ben Finney ben+pyt...@benfinney.id.au wrote: Sadly becoming the norm. People will run a software project and just assume that users will be willing to go through a registration process for every project just to report a bug. Registering for github is a lot easier than creating a reproducible test case. I agree that we should minimize friction, but friction will always exist. In GitHub's case, the additional friction is amortized over all github projects (and there are lots of those). Other things that can make bug reporting frustrating:

* Slow triage / ignored bug reports
* Automated bug report handling (closing all extant bugs every N months)
* Passive-aggressively requiring hours of work to create an isolated system (e.g. brand new install of Ubuntu) before bug reports are accepted
* Dismissing bug reports as WAI without explanation, or with poor explanation ("we talked about this and decided we disagree")
* Dismissing bugs as not worth fixing
* Passing the buck ("This is a bug in XYlib, WAI")
* Insulting bug reporters

IMO registration is not nearly as big a deal as the others. If nothing else, because it's a one-time cost per project at most, whereas all the other issues (potentially) rear their head with every single bug report. -- Devin -- https://mail.python.org/mailman/listinfo/python-list
Re: Design thought for callbacks
On Fri, Feb 20, 2015 at 9:42 PM, Chris Angelico ros...@gmail.com wrote: No, it's not. I would advise using strong references - if the callback is a closure, for instance, you need to hang onto it, because there are unlikely to be any other references to it. If I register a callback with you, I expect it to be called; I expect, in fact, that that *will* keep my object alive. For that matter, if the callback is a method, you need to hang onto it, because method wrappers are generated on demand, so the method would be removed from the valid callbacks instantly. Weak references for callbacks are broken. -- Devin -- https://mail.python.org/mailman/listinfo/python-list
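The "method wrappers are generated on demand" point is easy to demonstrate. A sketch (the Listener class is hypothetical) showing why a plain weakref to a bound method is dead on arrival, and the stdlib type that exists for exactly this case:

```python
import weakref

class Listener(object):
    def on_event(self):
        return "handled"

obj = Listener()

# obj.on_event creates a fresh bound-method wrapper on each access, so a
# plain weakref to it dies as soon as the temporary wrapper is collected
# (immediately, under CPython's reference counting) -- even though obj
# is still alive:
dead = weakref.ref(obj.on_event)
assert dead() is None

# weakref.WeakMethod (Python 3.4+) rebinds the method on demand and
# keeps working while obj is alive:
alive = weakref.WeakMethod(obj.on_event)
assert alive()() == "handled"
```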
Re: meaning of: line, =
Sorry for late reply, I somehow missed this email. On Thu, Feb 5, 2015 at 8:59 AM, Rustom Mody rustompm...@gmail.com wrote: The reason I ask: I sorely miss haskell's pattern matching in python. It goes some way:

>>> ((x,y),z) = ((1,2),3)
>>> x,y,z
(1, 2, 3)

But not as far as I would like:

>>> ((x,y),3) = ((1,2),3)
  File "<stdin>", line 1
SyntaxError: can't assign to literal

[Haskell]
Prelude> let (x, (y, (42, z, "Hello"))) = (1, (2, (42, 3, "Hello")))
Prelude> (x,y,z)
(1,2,3)

Yeah, but Haskell is ludicrous.

Prelude> let (x, 2) = (1, 3)
Prelude>

Only non-falsifiable patterns really make sense as the left hand side of an assignment in a language without exceptions, IMO. Otherwise you should use a match/case statement. (Of course, Python does have exceptions...) -- Devin -- https://mail.python.org/mailman/listinfo/python-list
Re: meaning of: line, =
On Thu, Feb 5, 2015 at 8:08 AM, Ian Kelly ian.g.ke...@gmail.com wrote: On Thu, Feb 5, 2015 at 2:40 AM, Steven D'Aprano steve+comp.lang.pyt...@pearwood.info wrote: Devin Jeanpierre wrote: On Wed, Feb 4, 2015 at 1:18 PM, Chris Angelico ros...@gmail.com wrote: [result] = f() result 42 Huh, was not aware of that alternate syntax. Nor are most people. Nor is Python, in some places -- it seems like people forgot about it when writing some bits of the grammar. Got an example where you can use a,b but not [a,b] or (a,b)?

>>> def f(a, (b, c)):
...     print a, b, c
...
>>> f(3, [4, 5])
3 4 5
>>> def g(a, [b, c]):
  File "<stdin>", line 1
    def g(a, [b, c]):
             ^
SyntaxError: invalid syntax

Although to be fair, the first syntax there is no longer valid either in Python 3. As Ian rightly understood, I was referring to differences between [a, b, ...] and (a, b, ...). Here's another example, one that still exists in Python 3:

>>> [] = ''
>>> () = ''
  File "<stdin>", line 1
SyntaxError: can't assign to ()

The syntax explicitly blacklists (), but forgets to blacklist []. -- Devin -- https://mail.python.org/mailman/listinfo/python-list
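The empty-list assignment is legal and unpacks zero elements, with shape mismatches caught at runtime. A small sketch (note: the () = '' asymmetry discussed here was later removed, and Python 3.6+ accepts () = '' as well):

```python
# [] = iterable unpacks zero targets; an empty iterable succeeds.
[] = ''

# A shape mismatch is a runtime ValueError, not a silent success.
try:
    [] = 'ab'
    matched = True
except ValueError:
    matched = False
assert not matched
```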
Re: meaning of: line, =
On Wed, Feb 4, 2015 at 1:18 PM, Chris Angelico ros...@gmail.com wrote: On Thu, Feb 5, 2015 at 4:36 AM, Peter Otten __pete...@web.de wrote: Another alternative is to put a list literal on the lefthand side:

>>> def f(): yield 42
...
>>> [result] = f()
>>> result
42

Huh, was not aware of that alternate syntax. Nor are most people. Nor is Python, in some places -- it seems like people forgot about it when writing some bits of the grammar. I'd suggest not using it. (If you're worried: neither the list nor the tuple will be created; the bytecode is identical in both cases) It can't possibly be created anyway. Python doesn't have a notion of assignable thing that, when assigned to, will assign to something else like C's pointers or C++'s references. There's nothing that you could put into the list that would have this behaviour. C pointers don't do that either. It's really just references. (C pointers aren't any more action-at-a-distance than Python attributes.) Anyway, it could create a new list in Python, because Python can do whatever it wants. But it doesn't, because as you say, that wouldn't do anything. -- Devin -- https://mail.python.org/mailman/listinfo/python-list
Re: dunder-docs (was Python is DOOMED! Again!)
On Mon, Feb 2, 2015 at 6:07 AM, Steven D'Aprano steve+comp.lang.pyt...@pearwood.info wrote: Run this code:

# === cut ===
class K(object):
    def f(self): pass

def f(self): pass

instance = K()
things = [instance.f, f.__get__(instance, K)]
from random import shuffle
shuffle(things)
print(things)
# === cut ===

You allege that one of these things is a method, and the other is not. I challenge you to find any behavioural or functional difference between the two. (Object IDs don't count.) If you can find any meaningful difference between the two, I will accept that methods have to be created as functions inside a class body. In this particular case, there is none. What if the body of the method was super().f()? Some methods can be defined outside of the body and still work exactly the same, but others won't. Otherwise you are reduced to claiming that there is some sort of mystical, undetectable essence or spirit that makes one of those two objects a real method and the other one a fake method, even though they have the same type, the same behaviour, and there is no test that can tell you which is which. It isn't mystical. There are differences in the semantics of defining methods inside or outside of a class that apply in certain situations (e.g. super(), metaclasses). You have cherrypicked an example that avoids them. If one wants to say "A method can (...) by using super()", then methods must be defined to only exist inside of class bodies. Obviously, once you construct the correct runtime values, behavior might be identical. The difference is in whether you can do different things, not in behavior. For an example we can all agree on, this is not an instance of collections.Iterable, but the docs claim it is iterable: https://docs.python.org/2/glossary.html#term-iterable

class MyIterable(object):
    def __getitem__(self, i):
        return i

Iterable is a generic term, not a type.
Despite the existence of the collections.Iterable ABC, iterable refers to any type which can be iterated over, using either of two different protocols. As I said above, if you wanted to argue that method was a general term for any callable attached to an instance or class, then you might have a point. But you're doing something much weirder: you are arguing that given two objects which are *identical* save for their object ID, one might be called a method, and the other not, due solely to where it was created. Not even where it was retrieved from, but where it was created. If you believe that method or not depends on where the function was defined, then this will really freak you out:

py> class Q:
...     def f(self): pass  # f defined inside the class
...
py> def f(self): pass  # f defined outside the class
...
py> f, Q.f = Q.f, f  # Swap the inside f and the outside f.
py> instance = Q()
py> instance.f  # Uses outside f, so not a real method!
<bound method Q.f of <__main__.Q object at 0xb7b8fcec>>
py> MethodType(f, instance)  # Uses inside f, so is a real method!
<bound method Q.f of <__main__.Q object at 0xb7b8fcec>>

You are really missing the point, if you think that surprises me. -- Devin -- https://mail.python.org/mailman/listinfo/python-list
Re: dunder-docs (was Python is DOOMED! Again!)
On Mon, Feb 2, 2015 at 6:20 AM, Steven D'Aprano steve+comp.lang.pyt...@pearwood.info wrote: Devin Jeanpierre wrote: Oops, I just realized why such a claim might be made: the documentation probably wants to be able to say that any method can use super(). So that's why it claims that it isn't a method unless it's defined inside a class body. You can use super anywhere, including outside of classes. The only thing you can't do is use the Python 3 super hack which automatically fills in the arguments to super if you don't supply them. That is compiler magic which truly does require the function to be defined inside a class body. But you can use super outside of classes: Obviously, I was referring to no-arg super. Please assume good faith and non-ignorance on my part. -- Devin -- https://mail.python.org/mailman/listinfo/python-list
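The no-arg super distinction being argued about is concrete and testable: zero-argument super() only works in functions compiled inside a class body, because only those get a __class__ closure cell. A sketch (class names hypothetical):

```python
class Base(object):
    def f(self):
        return "base"

class Child(Base):
    def f(self):
        # Zero-argument super() works here because this function was
        # compiled inside a class body, giving it a __class__ cell.
        return "child/" + super().f()

def outside(self):
    # Compiled outside any class body: no __class__ cell.
    return super().f()

Child.g = outside
assert Child().f() == "child/base"

# Attaching the outside function as a method doesn't help; the call
# fails at runtime ("super(): __class__ cell not found").
try:
    Child().g()
    raised = False
except RuntimeError:
    raised = True
assert raised
```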
Re: dunder-docs (was Python is DOOMED! Again!)
On Mon, Feb 2, 2015 at 4:06 AM, Steven D'Aprano steve+comp.lang.pyt...@pearwood.info wrote: On Sun, Feb 1, 2015 at 11:15 PM, Steven D'Aprano steve+comp.lang.pyt...@pearwood.info wrote: Both K.f and K.g are methods, even though only one meets the definition given in the glossary. The glossary is wrong. I agree, it oversimplified and has made a useless distinction here. Even if it is so defined, the definition is wrong. You can define methods on an instance. I showed an example of an instance with its own personal __dir__ method, and showed that dir() ignores it if the instance belongs to a new-style class but uses it if it is an old-style class. You didn't define a method, you defined a callable attribute. That is wrong. I defined a method:

py> from types import MethodType
py> type(instance.f) is MethodType
True

instance.f is a method by the glossary definition. Its type is identical to types.MethodType, which is what I used to create a method by hand. You are assuming that they are both methods, just because they are instances of a type called MethodType. This is like assuming that a Tree() object is made out of wood. The documentation is free to define things in terms other than types and be correct. There are many properties of functions-on-classes that callable instance attributes that are instances of MethodType do not have, as we've already noticed. isinstance can say one thing, and the documentation another, and both can be right, because they are saying different things. For an example we can all agree on, this is not an instance of collections.Iterable, but the docs claim it is iterable: https://docs.python.org/2/glossary.html#term-iterable

class MyIterable(object):
    def __getitem__(self, i):
        return i

The docs are not wrong, they are just making a distinction for humans that is separate from the python types involved. This is OK. -- Devin -- https://mail.python.org/mailman/listinfo/python-list
Re: dunder-docs (was Python is DOOMED! Again!)
On Mon, Feb 2, 2015 at 5:00 AM, Devin Jeanpierre jeanpierr...@gmail.com wrote: On Mon, Feb 2, 2015 at 4:06 AM, Steven D'Aprano steve+comp.lang.pyt...@pearwood.info wrote: On Sun, Feb 1, 2015 at 11:15 PM, Steven D'Aprano steve+comp.lang.pyt...@pearwood.info wrote: Both K.f and K.g are methods, even though only one meets the definition given in the glossary. The glossary is wrong. I agree, it oversimplified and has made a useless distinction here.

Oops, I just realized why such a claim might be made: the documentation probably wants to be able to say that any method can use super(). So that's why it claims that it isn't a method unless it's defined inside a class body. -- Devin

Even if it is so defined, the definition is wrong. You can define methods on an instance. I showed an example of an instance with its own personal __dir__ method, and showed that dir() ignores it if the instance belongs to a new-style class but uses it if it is an old-style class. You didn't define a method, you defined a callable attribute. That is wrong. I defined a method:

    >>> from types import MethodType
    >>> type(instance.f) is MethodType
    True

instance.f is a method by the glossary definition. Its type is identical to types.MethodType, which is what I used to create a method by hand. You are assuming that they are both methods, just because they are instances of a type called MethodType. This is like assuming that a Tree() object is made out of wood. The documentation is free to define things in terms other than types and be correct. There are many properties of functions-on-classes that callable instance attributes that are instances of MethodType do not have, as we've already noticed. isinstance can say one thing, and the documentation another, and both can be right, because they are saying different things.
For an example we can all agree on, this is not an instance of collections.Iterable, but the docs claim it is iterable: https://docs.python.org/2/glossary.html#term-iterable

    class MyIterable(object):
        def __getitem__(self, i):
            return i

The docs are not wrong, they are just making a distinction for humans that is separate from the python types involved. This is OK. -- Devin -- https://mail.python.org/mailman/listinfo/python-list
Re: Python is DOOMED! Again!
On Sun, Feb 1, 2015 at 8:31 AM, Steven D'Aprano steve+comp.lang.pyt...@pearwood.info wrote: Paul Rubin wrote: It's completely practical: polymorphism and type inference get you the value you want with usually no effort on your part. But it's the usually that bites you. If I have an arbitrary pointer, and I want to check if it is safe to dereference, how do I do it? Surely I'm not expected to write something like:

    if type(ptr) == A:
        if ptr != Anil: ...
    if type(ptr) == B:
        if ptr != Bnil: ...
    etc.

That would be insane. So how does Haskell do this? Haskell has different nulls in the same sense Java does: there's one keyword, whose type varies by context. Unlike Java, there is no way at all to cast different nulls to different types. Haskell has return value polymorphism and generics, so it's very easy for a function to return values of different types depending on type parameters. So this isn't even compiler hackery, it's ordinary. Also, you don't dereference in Haskell, you unpack. Python and Haskell code:

    if x is None:
        print("Not found!")
    else:
        print(x)

    case x of
        Nothing -> putStrLn "Not found"
        Just y  -> putStrLn (show y)

Both of these work whenever x is something that can be null and can be shown -- in Haskell, that's anything of type Maybe T, where you have access to a Show implementation for T. In Python, None is its own type/value, in Haskell there is an incompatible Nothing for each T. -- Devin -- https://mail.python.org/mailman/listinfo/python-list
Re: Python is DOOMED! Again!
On Sun, Feb 1, 2015 at 8:34 AM, Steven D'Aprano steve+comp.lang.pyt...@pearwood.info wrote: Devin Jeanpierre wrote: It's really only dynamically typed languages that have a single null value of a single type. Maybe I misunderstand the original statement. Pascal is statically typed and has a single null pointer compatible with all pointer types. C has a single nil pointer compatible with all pointer types. I expect that the Modula and Oberon family of languages copied Pascal, which probably copied Algol. No, C has a NULL macro which evaluates to something which coerces to any pointer type and will be the null value of that type. But there's one null value per type. The C standard makes no guarantees that they are compatible in any way, e.g. they can be of different sizes. On some systems, the null function pointer will have a size of N, where the null int pointer will have a size of M, where N != M -- so these are clearly not the same null value. I don't know Pascal, but I wouldn't be surprised if something similar held, as nonuniform pointer sizes were a thing once. -- Devin -- https://mail.python.org/mailman/listinfo/python-list
Re: Python is DOOMED! Again!
On Sun, Feb 1, 2015 at 2:27 PM, Paul Rubin no.email@nospam.invalid wrote: Devin Jeanpierre jeanpierr...@gmail.com writes: That said, Haskell (and the rest) do have a sort of type coercion, of literals at compile time (e.g. 3 can be an Integer or a Double depending on how you use it.) That's polymorphism, not coercion. OK, yes, that fits better into how Haskell works. After all, that's how Nothing works. If 3 is just a (magic) constructor, then it's no different. The compiler figures out at compile time what type of 3 you actually mean: there is never an automatic runtime conversion. sqrt(3) works because sqrt expects a floating argument so the compiler deduces that the 3 that you wrote denotes a float. sqrt(3+length(xs)) has to fail because length returns an int, so 3+length(xs) is an int, and you can't pass an int to sqrt. BTW it's weird that in this thread, and in the programmer community at large, int->string is considered worse than int->float Hehe, though int->string leads to plenty of weird bugs. Haskell's idiomatic substitute for a null pointer is a Nothing value For that matter, how is this (first part) different from, say, Java? In Java, functions expecting to receive sensible values can get null by surprise. In Haskell, if a term can have a Nothing value, that has to be reflected in its type. Haskell's bug-magnet counterpart to Java's null values is Bottom, an artifact of lazy evaluation. E.g. you can write x = 3 / 0 someplace in your program, and the program will accept this and run merrily until you try to actually print something that depends on x, at which point it crashes. This isn't a difference in whether there are multiple nulls, though. I answered my own question later, by accident: Java nulls are castable to each other if you do it explicitly (routing through Object -- e.g. (Something)((Object)((SomeOtherThing) null))).
So in that sense, there is only one null, just with some arbitrary compiler distinctions you can break through if you try hard enough. -- Devin -- https://mail.python.org/mailman/listinfo/python-list
Re: dunder-docs (was Python is DOOMED! Again!)
-- Devin On Sun, Feb 1, 2015 at 11:15 PM, Steven D'Aprano steve+comp.lang.pyt...@pearwood.info wrote: Gregory Ewing wrote: Steven D'Aprano wrote: [quote] If the object has a method named __dir__(), this method will be called and must return the list of attributes. [end quote] The first inaccuracy is that like all (nearly all?) dunder methods, Python only looks for __dir__ on the class, not the instance itself. It says method, not attribute, so technically it's correct. The methods of an object are defined by what's in its class. Citation please. I'd like to see where that is defined. https://docs.python.org/3/glossary.html#term-method Even if it is so defined, the definition is wrong. You can define methods on an instance. I showed an example of an instance with its own personal __dir__ method, and showed that dir() ignores it if the instance belongs to a new-style class but uses it if it is an old-style class. You didn't define a method, you defined a callable attribute. Old-style classes will call those for special method overriding, because it's the simplest thing to do. New-style classes look methods up on the class as an optimization, but it also really complicates the attribute semantics. The lookup strategy is explicitly defined in the docs. pydoc is, like always, incomplete or inaccurate. See https://docs.python.org/2/reference/datamodel.html#special-method-names -- Devin -- https://mail.python.org/mailman/listinfo/python-list
Re: RAII vs gc (was fortran lib which provide python like data type)
On Fri, Jan 30, 2015 at 1:28 PM, Sturla Molden sturla.mol...@gmail.com wrote: in Python. It actually corresponds to

    with Foo() as bar:
        suite

The problem with with statements is that they only handle the case of RAII with stack allocated variables, and can't handle transfer of ownership cleanly. Consider the case of a function that opens a file and returns it:

    def myfunction(name, stuff):
        f = open(name)
        f.seek(stuff)  # or whatever
        return f

    def blahblah():
        with myfunction('hello', 12) as f:
            ...

This code is wrong, because if an error occurs during seek in myfunction, the file is leaked. The correct myfunction is as follows:

    def myfunction(name, stuff):
        f = open(name)
        try:
            f.seek(stuff)
        except:
            f.close()
            raise
        return f

Or whatever. (I would love a close_on_error context manager, BTW.) With RAII, the equivalent C++ looks nearly exactly like the original (bad) Python approach, except it uses unique_ptr to store the file, and isn't broken. (Modern) C++ makes this easy to get right. But then, this isn't the common case. -- Devin -- https://mail.python.org/mailman/listinfo/python-list
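The close_on_error context manager wished for above is easy to sketch; the name and interface here are my own invention (it is not an existing stdlib API), built on contextlib.contextmanager:

```python
import contextlib

@contextlib.contextmanager
def close_on_error(resource):
    # Run the with-suite; close the resource only if the suite raises.
    try:
        yield resource
    except BaseException:
        resource.close()
        raise

def myfunction(name, stuff):
    f = open(name)
    with close_on_error(f):
        f.seek(stuff)
    return f  # still open on the success path
```

With this helper, the ownership-transferring version of myfunction stays as short as the naive, leaky one.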
Re: Python is DOOMED! Again!
Sorry, sort of responding to both of you. On Sat, Jan 31, 2015 at 10:12 PM, Paul Rubin no.email@nospam.invalid wrote: Steven D'Aprano steve+comp.lang.pyt...@pearwood.info writes: Some degree of weakness in a type system is not necessarily bad. Even the strongest of languages usually allow a few exceptions, such as numeric coercions. Haskell doesn't have automatic coercions of any sort. You have to call a conversion function if you want to turn an Int into an Integer. Yeah. In fact, it isn't very compatible with the ML/Haskell type system to automatically convert, because it does weird things to type inference and type unification. So this is common in that language family. That said, Haskell (and the rest) do have a sort of type coercion, of literals at compile time (e.g. 3 can be an Integer or a Double depending on how you use it.) BTW it's weird that in this thread, and in the programmer community at large, int->string is considered worse than int->float, when the former is predictable and reversible, while the latter is lossy and can cause subtle bugs. Although at least we don't have ten+ types with sixty different spellings which change from platform to platform, and all of which automatically coerce despite massive and outrageous differences in representable values. (Hello, C.) I've never come across a language that has pointers which insists on having a separate Nil pointer for every pointer type Haskell's idiomatic substitute for a null pointer is a Nothing value (like Python's None) and there's a separate one for every type. The FFI offers actual pointers (Foreign.Ptr) and there is a separate nullPtr for every type. For that matter, how is this (first part) different from, say, Java? It's really only dynamically typed languages that have a single null value of a single type. Maybe I misunderstand the original statement. -- Devin -- https://mail.python.org/mailman/listinfo/python-list
Re: ANN: unpyc3 - a python bytecode decompiler for Python3
On Wed, Jan 28, 2015 at 4:34 PM, Steven D'Aprano steve+comp.lang.pyt...@pearwood.info wrote: Devin Jeanpierre wrote: Git doesn't help if you lose your files in between commits, Sure it does? You just lose the changes made since the previous commit, but that's no different from restoring from backup. The restored file is only as up to date as the last time a backup was taken. Yeah. My point here is that Drive/Dropbox take snapshots at much shorter intervals than any reasonable person will commit with a DVCS, so you lose much less. -- Devin -- https://mail.python.org/mailman/listinfo/python-list
Re: ANN: unpyc3 - a python bytecode decompiler for Python3
On Wed, Jan 28, 2015 at 1:40 PM, Chris Angelico ros...@gmail.com wrote: On Thu, Jan 29, 2015 at 5:47 AM, Chris Kaynor ckay...@zindagigames.com wrote: I use Google Drive for it for all the stuff I do at home, and use SVN for all my personal projects, with the SVN depots also in Drive. The combination works well for me, I can transfer between my desktop and laptop freely, and have full revision history for debugging issues. I just do everything in git, no need for either Drive or something as old as SVN. Much easier. :) Git doesn't help if you lose your files in between commits, or if you lose the entire directory between pushes. -- Devin -- https://mail.python.org/mailman/listinfo/python-list
Re: multiprocessing module backport from 3 to 2.7 - spawn feature
On Wed, Jan 28, 2015 at 10:06 AM, Skip Montanaro skip.montan...@gmail.com wrote: On Wed, Jan 28, 2015 at 7:07 AM, Andres Riancho andres.rian...@gmail.com wrote: The feature I'm specially interested in is the ability to spawn processes [1] instead of forking, which is not present in the 2.7 version of the module. Can you explain what you see as the difference between spawn and fork in this context? Are you using Windows perhaps? I don't know anything obviously different between the two terms on Unix systems. On Unix, if you fork without exec* while other threads are running, those threads abruptly terminate in the child, leaving mutexes and other shared state broken, which leads to deadlocks or worse if you try to acquire those resources in the forked child process. So in such circumstances, multiprocessing (in 2.7) is not a viable option. But 3.x adds a feature, spawn, that lets you fork+exec instead of just forking. I too would be interested in such a backport. I considered writing one, but haven't had a strong enough need yet. -- Devin -- https://mail.python.org/mailman/listinfo/python-list
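For reference, opting into the spawn behavior on Python 3.4+ is a one-liner; this is a sketch (`work` is a made-up function, and the `__main__` guard is mandatory because spawned children re-import the main module):

```python
import multiprocessing

def work(x):
    return x * x

if __name__ == '__main__':
    # 'spawn' starts a fresh interpreter (fork+exec style) instead of
    # forking, so no thread or lock state is inherited from the parent.
    ctx = multiprocessing.get_context('spawn')
    with ctx.Pool(2) as pool:
        print(pool.map(work, [1, 2, 3]))  # -> [1, 4, 9]
```

multiprocessing.set_start_method('spawn') does the same thing process-wide; get_context keeps the choice local to one pool.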
Re: ANN: unpyc3 - a python bytecode decompiler for Python3
I distrust any backup strategy that requires explicit action by the user. I've seen users fail too often. (Including myself.) -- Devin On Wed, Jan 28, 2015 at 2:02 PM, Chris Angelico ros...@gmail.com wrote: On Thu, Jan 29, 2015 at 8:52 AM, Devin Jeanpierre jeanpierr...@gmail.com wrote: Git doesn't help if you lose your files in between commits, or if you lose the entire directory between pushes. So you commit often and push immediately. Solved. ChrisA -- https://mail.python.org/mailman/listinfo/python-list -- https://mail.python.org/mailman/listinfo/python-list
Re: ANN: unpyc3 - a python bytecode decompiler for Python3
FWIW I put all my source code inside Dropbox so that even things I haven't yet committed/pushed to Bitbucket/Github are backed up. So far it's worked really well, despite using Dropbox on both Windows and Linux. (See also: Google Drive, etc.) (Free) Dropbox has a 30 day recovery time limit, and I think Google Drive has a trash bin, as well as a 29 day recovery for emptied trash items. That said, hindsight is easier than foresight. I'm glad you were able to recover your files! -- Devin On Wed, Jan 28, 2015 at 10:09 AM, n.poppel...@xs4all.nl wrote: Last night I accidentally deleted a group of *.py files (stupid-stupid-stupid!). Thanks to unpyc3 I have reconstructed all but one of them so far from the *.pyc files that were in the directory __pycache__. Many thanks!!! -- Nico -- https://mail.python.org/mailman/listinfo/python-list -- https://mail.python.org/mailman/listinfo/python-list
Re: An object is an instance (or not)?
On Tue, Jan 27, 2015 at 9:37 PM, random...@fastmail.us wrote: On Tue, Jan 27, 2015, at 16:06, Mario Figueiredo wrote: That error message has me start that thread arguing that the error is misleading because the Sub object does have the __bases__ attribute. It's the Sub instance object that does not have it. What do you think Sub object means? Sub itself is not a Sub object, it is a type object. instance is implicit in the phrase foo object. Yes. Unfortunately, it's still not really completely clear. Sub instance would avoid this confusion for everyone. I think the only reason to avoid instance in the past would have been the old-style object confusion, as Ben Finney pointed out. (BTW I agree with literally every single thing he said in this thread, it's really amazing.) -- Devin -- https://mail.python.org/mailman/listinfo/python-list
[issue23322] parser module docs missing second example
New submission from Devin Jeanpierre: The port to reST missed the second example: https://docs.python.org/release/2.5/lib/node867.html This is still referred to in the docs, so it is not deliberate. For example, the token module docs say The second example for the parser module shows how to use the symbol module: https://docs.python.org/3.5/library/token.html#module-token There is no second example, nor any use of the symbol module, in the docs: https://docs.python.org/3.5/library/parser.html -- assignee: docs@python components: Documentation messages: 234716 nosy: Devin Jeanpierre, docs@python priority: normal severity: normal status: open title: parser module docs missing second example versions: Python 2.7, Python 3.2, Python 3.3, Python 3.4, Python 3.5 ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue23322 ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
Re: Alternative to multi-line lambdas: Assign-anywhere def statements
On Sat, Jan 24, 2015 at 11:55 AM, Chris Angelico ros...@gmail.com wrote: That's still only able to assign to a key of a dictionary, using the function name. There's no way to represent fully arbitrary assignment in Python - normally, you can assign to a name, an attribute, a subscripted item, etc. (Augmented assignment is a different beast altogether, and doesn't really make sense with functions.) There's no easy way to say @stash(dispatch_table_a['asdf']) and have that end up assigning to exactly that. Obviously, nobody will be happy until you can do:

    def call(*a, **kw):
        return lambda f: f(*a, **kw)

    @call()
    def x, y ():
        yield 1
        yield 2

Actually, maybe not even then. -- Devin -- https://mail.python.org/mailman/listinfo/python-list
Re: Alternative to multi-line lambdas: Assign-anywhere def statements
On Sat, Jan 24, 2015 at 5:58 PM, Ethan Furman et...@stoneleaf.us wrote: On 01/24/2015 11:55 AM, Chris Angelico wrote: On Sun, Jan 25, 2015 at 5:56 AM, Ethan Furman et...@stoneleaf.us wrote: If the non-generic is what you're concerned about:

    # not tested
    dispatch_table_a = {}
    dispatch_table_b = {}
    dispatch_table_c = {}

    class dispatch:
        def __init__(self, dispatch_table):
            self.dispatch = dispatch_table
        def __call__(self, func):
            self.dispatch[func.__name__] = func
            return func

    @dispatch(dispatch_table_a)
    def foo(...):
        pass

That's still only able to assign to a key of a dictionary, using the function name. This is a Good Thing. The def statement populates a few items, __name__ being one of them. One of the reasons lambda is not encouraged is because its name is always 'lambda', which just ain't helpful when the smelly becomes air borne! ;) Actually, in this case you'd probably want the function's __name__ to be something different, since it'd be confusing if all three dispatch tables had a 'foo' entry, using functions whose name was 'foo'. No reason a function's name can't be dispatch_table_a['foo'] -- Devin -- https://mail.python.org/mailman/listinfo/python-list
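The closing point -- that nothing stops you giving the registered function a more useful __name__ -- checks out; here is a runnable version of the (untested) sketch quoted above, trimmed to one table:

```python
dispatch_table_a = {}

class dispatch:
    def __init__(self, dispatch_table):
        self.dispatch = dispatch_table
    def __call__(self, func):
        # Register under the def-time name, then hand the function back.
        self.dispatch[func.__name__] = func
        return func

@dispatch(dispatch_table_a)
def foo():
    return 'from table a'

# __name__ is just a writable attribute; point it somewhere traceable.
foo.__name__ = "dispatch_table_a['foo']"
```

Tracebacks and repr() now show which table the function lives in, while the table key stays 'foo'.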
Re: Trees
There are similarly many kinds of hash tables. For a given use case (e.g. a sorted dict, or a list with efficient removal, etc.), there's a few data structures that make sense, and a library (even the standard library) doesn't have to expose which one was picked as long as the performance is good. -- Devin On Tue, Jan 20, 2015 at 12:15 PM, Ken Seehart k...@seehart.com wrote: Exactly. There are over 23,000 different kinds of trees. There's no way you could get all of them to fit in a library, especially a standard one. Instead, we prefer to provide people with the tools they need to grow their own trees. http://caseytrees.org/programs/planting/ctp/ http://www.ncsu.edu/project/treesofstrength/treefact.htm http://en.wikipedia.org/wiki/Tree On 1/19/2015 3:01 PM, Mark Lawrence wrote: On 19/01/2015 22:06, Zachary Gilmartin wrote: Why aren't there trees in the python standard library? Probably because you'd never get agreement as to which specific tree and which specific implementation was the most suitable for inclusion. -- https://mail.python.org/mailman/listinfo/python-list -- https://mail.python.org/mailman/listinfo/python-list
Re: Trees
On Mon, Jan 19, 2015 at 3:08 PM, Steven D'Aprano steve+comp.lang.pyt...@pearwood.info wrote: Zachary Gilmartin wrote: Why aren't there trees in the python standard library? Possibly because they aren't needed? Under what circumstances would you use a tree instead of a list or a dict or combination of both? That's not a rhetorical question. I am genuinely curious, what task do you have that you think must be solved by a tree? In general, any time you want to maintain a sorted list or mapping, balanced search tree structures come in handy. Here's an example task: suppose you want to represent a calendar, where timeslots can be reserved for something. Calendar events are not allowed to intersect. The most important query is: What events are there that intersect with the timespan between datetimes d1 and d2? (To draw a daily agenda, figure out if you should display an alert to the user that an event is ongoing or imminent, etc.) You also want to be able to add a new event to the calendar, that takes place between d1 and d2, and to remove a event. I leave it to the reader to implement this using a sorted map. (hint: sort by start.) This maybe seems contrived, but I've used this exact datatype, or a remarkably similar one, in a few different circumstances: sequenced actions of characters in a strategy game, animation, motion planning... There are a few possible implementations using Python data structures. You can do it using a linear scan, which gets a little slow pretty quickly. You can make insertion slow (usually OK) by sorting on insertion, but if you ever forget to resort your list you will get a subtle bug you might not notice for a while. And so on. It's better in every way to use the third-party blist module, so why bother? -- Devin -- https://mail.python.org/mailman/listinfo/python-list
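To make the comparison concrete, here is a minimal sketch of the sort-by-start approach using only the stdlib bisect module (the Calendar class and its interface are my own; spans are half-open [d1, d2), and add() trusts the caller to keep events disjoint, which a real implementation would have to check):

```python
import bisect

class Calendar:
    def __init__(self):
        self._events = []  # (start, end) pairs, kept sorted by start

    def add(self, start, end):
        bisect.insort(self._events, (start, end))

    def remove(self, start, end):
        self._events.remove((start, end))

    def intersecting(self, d1, d2):
        # Sorted by start and pairwise disjoint, so the candidates are the
        # events starting in [d1, d2), plus at most one still ongoing at d1.
        lo = bisect.bisect_left(self._events, (d1,))
        hi = bisect.bisect_left(self._events, (d2,))
        if lo > 0 and self._events[lo - 1][1] > d1:
            lo -= 1
        return self._events[lo:hi]
```

The query is logarithmic plus output size, but insort and remove still shift a Python list in O(n) -- which is exactly where a balanced tree (or blist) earns its keep.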
[issue23275] Can assign [] = (), but not () = []
New submission from Devin Jeanpierre:

    >>> [] = ()
    >>> () = []
      File "<stdin>", line 1
    SyntaxError: can't assign to ()

This contradicts the assignment grammar, which would make both illegal: https://docs.python.org/3/reference/simple_stmts.html#assignment-statements -- components: Interpreter Core messages: 234324 nosy: Devin Jeanpierre priority: normal severity: normal status: open title: Can assign [] = (), but not () = [] type: behavior ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue23275 ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
Re: Hello World
Sorry for necro. On Sat, Dec 20, 2014 at 10:44 PM, Chris Angelico ros...@gmail.com wrote: On Sun, Dec 21, 2014 at 5:31 PM, Terry Reedy tjre...@udel.edu wrote: Just to be clear, writing to sys.stdout works fine in Idle. import sys; sys.stdout.write('hello ') hello #2.7 In 3.4, the number of chars? bytes? is returned and written also. Whether you mean something different by 'stdout' or not, I am not sure. The error is from writing to a non-existent file descriptor. That's because sys.stdout is replaced. But stdout itself, file descriptor 1, is not available: It surprises me that IDLE, and most other shells, don't dup2 stdout/err/in so that those FDs talk to IDLE. -- Devin -- https://mail.python.org/mailman/listinfo/python-list
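What dup2-ing the real descriptors would look like, in miniature -- a sketch that redirects file descriptor 1 into a pipe, writes to it at the fd level, restores it, and reads back what was captured (a shell like IDLE would keep the pipe open and poll it instead):

```python
import os

r, w = os.pipe()
saved = os.dup(1)            # remember the real stdout
os.dup2(w, 1)                # fd 1 now points at the pipe's write end
os.close(w)
os.write(1, b"captured\n")   # even raw fd-level writes are captured
os.dup2(saved, 1)            # put the real stdout back
os.close(saved)
data = os.read(r, 1024)
os.close(r)
```

Unlike replacing sys.stdout, this catches output from C extensions and child processes that write to fd 1 directly.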
Re: PyWart: Poor Documentation Examples
On Sat, Jan 10, 2015 at 6:32 PM, Steven D'Aprano steve+comp.lang.pyt...@pearwood.info wrote: At the point you are demonstrating reduce(), if the reader doesn't understand or can't guess the meaning of n = 4, n+1 or range(), they won't understand anything you say. Teachers need to understand that education is a process of building upon that which has come before. If the teacher talks down to the student by assuming that the student knows nothing, and tries to go back to first principles for every little thing, they will never get anywhere. Agree wholeheartedly. That said, I do think reduce(operator.mul, [1, 2, 3, 4]) actually _is_ a better example, since it cuts right to the point. -- Devin -- https://mail.python.org/mailman/listinfo/python-list
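For comparison, the suggested example in full (using functools.reduce, since reduce moved out of the builtins in Python 3):

```python
import operator
from functools import reduce

# reduce folds the list left-to-right: ((1*2)*3)*4
result = reduce(operator.mul, [1, 2, 3, 4])
print(result)  # -> 24
```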
Re: Decimals and other numbers
On Fri, Jan 9, 2015 at 2:20 AM, Steven D'Aprano steve+comp.lang.pyt...@pearwood.info wrote: -snip- I don't understand what you're trying to say here. You can't just arbitrarily declare that 0**1 equals something other than 0 (or for that matter, doesn't equal anything at all). You can, actually. It's just silly. (Similarly, you can declare that 0**0 is something other than 1 (or for that matter, doesn't equal anything at all), but it's silly.) Can we agree that 0**1 is well-defined and get back to 0**0? Believe it or not I actually misread your whole thing and thought we were talking about 0**0. Otherwise I would've been much briefer...

Not quite. I agree that, *generally speaking*, having 0**0 equal 1 is the right answer, or at least *a* right answer, but not always. It depends on how you get to 0**0... You don't get to a number. Those are limits. Limits and arithmetic are different. (Well, sort of. :) Yes, sort of :-) I was alluding to the definition of the reals. Of course you can get to numbers. We start with counting, that's a way to get to the natural numbers, by applying the successor function repeatedly until we reach the one we want. Or you can get to pi by generating an infinite sequence of closer and closer approximations. Or an infinite series. Or an infinite product. Or an infinite continued fraction. All of these ways to get to pi converge on the same result.

Yes, all numbers can be represented as a converging limit. However, that does not mean that the way you compute the result of a function like x**y is by taking the limit as its arguments approach the input: that procedure works only for continuous functions. x**y is not continuous at 0, so this style of computation cannot give you an answer.

If 0**0 has a value, we can give that number a name. Let's call it Q. There are different ways to evaluate Q:

    lim x -> 0 of sin(x)/x gives 1
    lim x -> 0 of x**0 gives 1
    lim x -> 0 of 0**x gives 0

This is a proof that f(x, y) = x**y is not continuous around (0, 0).
It is not a proof that it is undefined at (0, 0); in fact, it says nothing about the value.

    0**0 = 0**(5-5) = 0**5 / 0**5 = 0/0 gives indeterminate

Here is a nearly identical proof that 0**1 is indeterminate:

    0 = 0**1 = 0**(5 - 4) = 0**5 / 0**4 = 0/0 gives indeterminate.

The fact that you can construct a nonsensical expression from an expression doesn't mean the original expression was nonsensical. In this case, your proof was invalid, because 0**(X-Y) is not equivalent to 0**X/0**Y.

So we have a problem. Since all these ways to get to Q fail to converge, the obvious answer is to declare that Q doesn't exist and that 0**0 is indeterminate, and that is what many mathematicians do: That isn't what indeterminate means. However, this begs the question of what we mean by 0**0. In the case of m**n, with both m and n positive integers, there is an intuitively obvious definition for exponentiation: repeated multiplication. But there's no obvious meaning for exponentiation when both m and n are zero, hence we (meaning, mathematicians) have to define what it means. So long as that definition doesn't lead to contradiction, we can make any choice we like.

Sorry, I don't follow. n**0 as repeated multiplication makes perfect sense: we don't perform any multiplications, but if we did, we'd be multiplying 'n's. 0**m as repeated multiplication makes perfect sense: whatever we multiply, it's a bunch of 0s. Why doesn't 0**0 make sense? We don't perform any multiplications, but if we did, we'd be multiplying 0s. If we don't perform any multiplications, the things we didn't multiply don't matter. Whether they are fives, sevens, or zeroes, the answer is the same: 1.

Since you can get different results depending on the method you use to calculate it, the technically correct result is that 0**0 is indeterminate. No, only limits are indeterminate. Calculations not involving limits cannot be indeterminate. Do tell me what 0/0 equals, if it is not indeterminate.
0/0 is undefined, it isn't indeterminate. Indeterminate forms are a way of expressing limits where you have performed a lossy substitution. That is: the limit as x approaches a of 0/0 is an indeterminate form. In the real number system, infinity does not exist. It only exists in limits or extended number systems. Yes, you are technically correct, the best kind of correct. I'm just sketching an informal proof. If you want to make it rigorous by using limits, be my guest. It doesn't change the conclusion. No, the point is that limits are irrelevant. As has been proven countless times, x**y is not continuous around the origin. This has no bearing on whether it takes a value at the origin. [...] Arguably, *integer* 0**0 could be zero, on the basis that you can't take limits of integer-valued quantities, and zero times itself zero times surely has to be zero. No. No no no. On the natural numbers, no value other than 1 makes sense.
[issue23086] Add start and stop parameters to the Sequence.index() ABC mixin method
Devin Jeanpierre added the comment: I inferred from Serhiy's comment that if you override __iter__ to be efficient and not use __getitem__, this overridden behavior used to pass on to index(), but wouldn't after this patch. -- ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue23086 ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
Re: Decimals and other numbers
On Fri, Jan 9, 2015 at 7:05 PM, Gregory Ewing greg.ew...@canterbury.ac.nz wrote: It's far from clear what *anything* multiplied by itself zero times should be. A better way of thinking about what x**n for integer n means is this: Start with 1, and multiply it by x n times. The result of this is clearly 1 when n is 0, regardless of the value of x.

    5**4 = 5*5*5*5 = 625

No:

    5**4 = 1*5*5*5*5
    5**3 = 1*5*5*5
    5**2 = 1*5*5
    5**1 = 1*5
    5**0 = 1

I never liked that, it seemed too arbitrary. How about this explanation: Assume that we know how to multiply a nonempty list of numbers, so product([a]) == a, product([a, b]) == a * b, and so on.

    def product(nums):
        if len(nums) == 0:
            return ???
        return reduce(operator.mul, nums)

It should be the case that given lists of factors A and B, product(A + B) == product(A) * product(B) (associativity). We should let this rule apply even if A or B is the empty list, otherwise our rules are kind of stupid. Therefore,

    product([] + X) == product([]) * product(X)

But since [] + X == X,

    product([] + X) == product(X)

There's only one number like that:

    product([]) == 1

(Of course if you choose not to have the full associativity rule for empty products, then anything is possible.) -- Devin -- https://mail.python.org/mailman/listinfo/python-list
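The same argument in runnable form: hand reduce the multiplicative identity as its initializer and the ??? resolves itself, with the associativity rule holding even for empty lists:

```python
import operator
from functools import reduce

def product(nums):
    # Fold with 1 as the starting value, so product([]) == 1.
    return reduce(operator.mul, nums, 1)

# product(A + B) == product(A) * product(B), even when A is empty:
A, B = [], [2, 3, 5]
assert product(A + B) == product(A) * product(B)
assert product([]) == 1
```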
[issue23201] Decimal(0)**0 is an error, 0**0 is 1, but Decimal(0) == 0
Devin Jeanpierre added the comment: Does the spec have a handy list of differences to floats anywhere, or do you have to internalize the whole thing? -- ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue23201 ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
Re: Decimals and other numbers
Marko, your argument is that the function x**y(a, x) must be continuous on [0, inf), and that for it to be continuous at 0, 0**0 must be a. Since there are many possible values of a, this is not a justification; it is a proof by contradiction that the premise was faulty: x**y(a, x) doesn't have to be continuous after all.

0**0 is 1, which makes some functions continuous and some functions not, and who cares? It's 1 because that's what is demanded by combinatorial definitions of exponentiation, and by its origins in the domain of the natural numbers.

Knuth says that, thought of combinatorially on the naturals, x**y counts the number of mappings from a set of y values to a set of x values. Clearly there's only one mapping from the empty set to itself: the empty mapping.

Number theory demands that performing multiplication over an empty bag of numbers gives you the result 1 -- even if the empty bag is an empty bag of zeroes instead of an empty bag of fives. The result does not change.

Either of those ideas about exponentiation can be thought of as descriptions of its behavior, or as definitions. They completely describe its behavior on the naturals, from which we derive its behavior on the reals.

-- Devin

On Thu, Jan 8, 2015 at 11:28 PM, Marko Rauhamaa ma...@pacujo.net wrote:
> Devin Jeanpierre jeanpierr...@gmail.com:
>> If 0**0 is defined, it must be 1.
>
> You can justify any value a within [0, 1]. For example, choose
>
>     y(a, x) = log(a, x)
>
> Then,
>
>     lim[x -> 0+] y(a, x) = 0
>
> and:
>
>     lim[x -> 0+] x**y(a, x) = a
>
> For example, a = 0.5:
>
>     >>> x = 1e-100
>     >>> y = math.log(a, x)
>     >>> y
>     0.0030102999566398118
>     >>> x**y
>     0.5
>
> Marko
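Knuth's counting definition can be checked by brute force. A small sketch (`count_mappings` is a made-up helper name for illustration):

```python
from itertools import product as cartesian

def count_mappings(x, y):
    # Count the functions from a y-element set into an x-element set:
    # each function independently picks one of x outputs for each of
    # the y inputs, so there should be x**y of them.
    return sum(1 for _ in cartesian(range(x), repeat=y))

assert count_mappings(5, 4) == 5 ** 4 == 625
assert count_mappings(0, 2) == 0 ** 2 == 0  # no maps into an empty set
# There is exactly one mapping from the empty set to itself -- the
# empty mapping -- which is the combinatorial reason 0**0 == 1:
assert count_mappings(0, 0) == 0 ** 0 == 1
```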
Re: Decimals and other numbers
On Fri, Jan 9, 2015 at 12:58 AM, Devin Jeanpierre jeanpierr...@gmail.com wrote:
>> Arguably, *integer* 0**0 could be zero, on the basis that you can't
>> take limits of integer-valued quantities, and zero times itself zero
>> times surely has to be zero.

I should have responded in more detail here, sorry.

If you aren't performing any multiplication, why does it matter what numbers you are multiplying? Doing no multiplications of five is the same as doing no multiplications of two is the same as doing no multiplications of... 0. You can define it to be 0 only if you are multiplying an empty bag of zeroes, but it's hard to imagine what makes an empty bag of zeroes different from an empty bag of fives. It really surely is *not* the case.

Obviously, this kind of ridiculousness comes naturally to Java and C++ programmers, with their statically typed collections. It's no surprise that's where the Decimal spec came from. ;)

-- Devin
Re: Decimals and other numbers
On Fri, Jan 9, 2015 at 12:49 AM, Steven D'Aprano steve+comp.lang.pyt...@pearwood.info wrote:
> Devin Jeanpierre wrote:
>> On Thu, Jan 8, 2015 at 6:43 PM, Dave Angel da...@davea.name wrote:
>>> What you don't say is which behavior you actually expected. Since
>>> 0**0 is undefined mathematically, I'd expect either an exception or
>>> a NAN result.
>>
>> It can be undefined, if you choose for it to be. You can also choose
>> to not define 0**1, of course.
>
> No you can't -- that would make arithmetic inconsistent. 0**1 is
> perfectly well defined as 0 however you look at it:
>
>     lim[x -> 0] x**1 = 0
>     lim[y -> 1] 0**y = 0

This is a misunderstanding of limits. Limits are allowed to differ from the actual evaluated result when you substitute the limit point: that's what it means to be discontinuous. What you call making arithmetic inconsistent, I call making the function inside the limit discontinuous at 0.

>> If 0**0 is defined, it must be 1. I Googled around to find a
>> mathematician to back me up, here: http://arxiv.org/abs/math/9205211
>> (page 6, ripples).
>
> Not quite. I agree that, *generally speaking*, having 0**0 equal 1 is
> the right answer, or at least *a* right answer, but not always. It
> depends on how you get to 0**0...

You don't get to a number. Those are limits. Limits and arithmetic are different. (Well, sort of. :)

> Since you can get different results depending on the method you use to
> calculate it, the technically correct result is that 0**0 is
> indeterminate.

No, only limits are indeterminate. Calculations not involving limits cannot be indeterminate.

[snip]

> log(Q) = 0*-inf
>
> What is zero times infinity? In the real number system, that is
> indeterminate, again because it depends on how you calculate it

In the real number system, infinity does not exist. It only exists in limits or extended number systems.

> : naively it sounds like it should be 0, but infinity is pretty big
> and if you add up enough zeroes in the right way you can actually get
> something non-zero. There's no one right answer.
>
> So if the log of Q is indeterminate, then so must be Q. But there are
> a host of good reasons for preferring 0**0 = 1. Donald Knuth writes
> (using ^ for power):
>
>     Some textbooks leave the quantity 0^0 undefined, because the
>     functions 0^x and x^0 have different limiting values when x
>     decreases to 0. But this is a mistake. We must define x^0 = 1 for
>     all x, if the binomial theorem is to be valid when x = 0, y = 0,
>     and/or x = -y. The theorem is too important to be arbitrarily
>     restricted! By contrast, the function 0^x is quite unimportant.
>
> More discussion here:
> http://mathforum.org/dr.math/faq/faq.0.to.0.power.html

I've already been citing Knuth. :P

>> I expected 1, nan, or an exception, but more importantly, I expected
>> it to be the same for floats and decimals.
>
> Arguably, *integer* 0**0 could be zero, on the basis that you can't
> take limits of integer-valued quantities, and zero times itself zero
> times surely has to be zero.

No. No no no. On the natural numbers, no value other than 1 makes sense. All of the definitions of exponentiation for natural numbers require it, except for those derived from analytical notions of exponentiation. (Integers just give you ratios of natural-number exponentials, so again, no.)

-- Devin
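The distinction drawn above between the limit and the value can be seen numerically. A brief sketch:

```python
import math

# The two paths to (0, 0) disagree, which is why the *limit* of x**y
# there is indeterminate:
for t in (1e-3, 1e-6, 1e-9):
    assert t ** 0 == 1.0     # along y = 0:  x**0 -> 1 as x -> 0+
    assert 0.0 ** t == 0.0   # along x = 0:  0**y -> 0 as y -> 0+

# The *value* is a separate, definitional choice; Python (and C99 pow)
# pick 1, matching the combinatorial definition:
assert 0 ** 0 == 1
assert 0.0 ** 0.0 == 1.0
assert math.pow(0.0, 0.0) == 1.0
```

Being discontinuous at a point, as the text argues, does not make the value there inconsistent; it just means the value cannot be recovered by taking limits.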