Re: [Python-Dev] Need discussion for a PR about memory and objects
Chris Angelico wrote: Licence plate numbers do get reused. And they can change, e.g. if you get a personalised plate. -- Greg ___ Python-Dev mailing list Python-Dev@python.org https://mail.python.org/mailman/listinfo/python-dev Unsubscribe: https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] Need discussion for a PR about memory and objects
Nick Coghlan wrote:
  - the object identity is like the registration number or license plate (unique within the particular system of registering vehicles, but not unique across systems, and may sometimes be transferred to a new vehicle after the old one is destroyed)
  - the object type is like the make and model (e.g. a 2007 Toyota Corolla Ascent Sedan)
  - the object value is a specific car (e.g. "that white Corolla over there with 89000 km on the odometer")

A bit confusing, because "that white Corolla over there" is referring to its identity.

-- Greg
Re: [Python-Dev] General concerns about C API changes
On Sun, Nov 18, 2018 at 8:52 AM Stefan Behnel wrote: > > Gregory P. Smith wrote on 15.11.18 at 01:03: > > From my point of view: A static inline function is a much nicer modern code > > style than a C preprocessor macro. > > It's also slower to compile, given that function inlining happens at a much > later point in the compiler pipeline than macro expansion. The C compiler > won't even get to see macros in fact, whereas whether to inline a function > or not is a dedicated decision during the optimisation phase based on > metrics collected in earlier stages. For something as ubiquitous as > Py_INCREF/Py_DECREF, it might even be visible in the compilation times. Have you measured this? I had the opposite intuition, that macros on average will be slower to compile because they increase the amount of code that the frontend has to process. But I've never checked... -n -- Nathaniel J. Smith -- https://vorpus.org
Re: [Python-Dev] Need discussion for a PR about memory and objects
> Date: Sun, 18 Nov 2018 22:32:35 +1000 > From: Nick Coghlan > To: "Steven D'Aprano" > Cc: python-dev > Subject: Re: [Python-Dev] Need discussion for a PR about memory and > objects [ munch background ] > > Chris's initial suggestion was to use "license number" or "social > security number" (i.e. numbers governments assign to people), but I'm > thinking a better comparison might be to vehicle registration numbers, > since that analogy can be extended to the type and value > characteristics in a fairly straightforward way: > > - the object identity is like the registration number or license plate > (unique within the particular system of registering vehicles, but not > unique across systems, and may sometimes be transferred to a new > vehicle after the old one is destroyed) > - the object type is like the make and model (e.g. a 2007 Toyota > Corolla Ascent Sedan) > - the object value is a specific car (e.g. "that white Corolla over > there with 89000 km on the odometer") > > On the other hand, we're talking about the language reference here, > not the tutorial, and understanding memory addressing seems like a > reasonable assumed pre-requisite in that context. "Handle" has been used since the 1980s among Macintosh and Win32 programmers as "unique identifier of some object but isn't the memory address". The usage within those APIs seems to match what's being proposed for the new Python C API, in that programmers used functions to ask "what type are you?" "what value do you have?" but couldn't, or at least shouldn't, rely on actual memory layout. I suggest that for the language reference, use the license plate or registration analogy to introduce "handle" and after that use handle throughout. It's short, distinctive, and either will match up with what the programmer already knows or won't clash if or when they encounter handles elsewhere. 
-- cheers, Hugh Fisher
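To make the "handle" idea above a little more concrete, here is a minimal sketch in Python. Everything in it is hypothetical and invented for illustration (the `HandleTable` class and its `register`, `type_of`, `value_of` and `release` methods); it is not part of any existing or proposed Python C API. It only shows the shape of the idea: callers hold an opaque integer and ask questions through functions, never touching the memory layout.

```python
import itertools


class HandleTable:
    """Hypothetical registry that hands out opaque integer handles."""

    def __init__(self):
        self._objects = {}               # handle -> object, hidden from callers
        self._next = itertools.count(1)  # simple source of fresh handle numbers

    def register(self, obj):
        """Store obj and return an opaque handle for it."""
        handle = next(self._next)
        self._objects[handle] = obj
        return handle

    def type_of(self, handle):
        """Answer "what type are you?" without exposing the object's layout."""
        return type(self._objects[handle])

    def value_of(self, handle):
        """Answer "what value do you have?"."""
        return self._objects[handle]

    def release(self, handle):
        """Forget the object behind this handle."""
        del self._objects[handle]


table = HandleTable()
h = table.register([1, 2, 3])
kind = table.type_of(h)
value = table.value_of(h)
```

Note that this sketch never reuses a handle number after `release`, which is a simplification; the discussion in this thread is about identifiers that may be reused once the object is gone.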
Re: [Python-Dev] Need discussion for a PR about memory and objects
I found this (very good) summary ended in a surprising conclusion. On 18/11/2018 12:32, Nick Coghlan wrote: On Sun, 4 Nov 2018 at 23:33, Steven D'Aprano wrote: On Sun, Nov 04, 2018 at 11:43:50AM +0100, Stephane Wirtel wrote: In this PR [https://github.com/python/cpython/pull/3382] "Remove reference to address from the docs, as it only causes confusion", opened by Chris Angelico, there is a discussion about the right term to use for the address of an object in memory. Why do we need to refer to the address of objects in memory? ... Chris's initial suggestion was to use "license number" or "social security number" (i.e. numbers governments assign to people), but I'm thinking a better comparison might be to vehicle registration numbers, ... On the other hand, we're talking about the language reference here, not the tutorial, and understanding memory addressing seems like a reasonable assumed pre-requisite in that context. Cheers, Nick. It is a good point that this is in the language reference, not a tutorial. Could we not expect readers of that to be prepared for a notion of object identity as the abstraction of what we mean by "the same object" vs "a distinct object"? If it were necessary to be explicit about what Python means by it, one could unpack the idea by its properties: distinct names may be given to the same object (is-operator); distinct objects may have the same value (==-operator); an object may change in value (if allowed by its type) while keeping its identity. And then there is the id() function. That is an imperfect reflection of the identity. id() guarantees that for a given object (identity) it will always return the same integer during the life of that object, and a different integer for any distinct object (distinct identity) with an overlapping lifetime. We note that, in an implementation of Python where objects are fixed in memory for life, a conformant id() may return the object's address. 
Jeff Allen
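The properties listed above can be checked directly in a few lines of Python. This is only an illustrative sketch; the variable names are made up for the example, and nothing here depends on id() being a memory address.

```python
a = [1, 2, 3]
b = a            # a second name for the same object
c = [1, 2, 3]    # a distinct object that happens to have the same value

# Distinct names may be given to the same object (is-operator).
same_object = a is b

# Distinct objects may have the same value (==-operator).
equal_but_distinct = (a == c) and (a is not c)

# An object may change in value while keeping its identity.
id_before = id(a)
a.append(4)
identity_kept = (id(a) == id_before) and (b == [1, 2, 3, 4])

# Two objects with overlapping lifetimes never share an id().
ids_differ = id(a) != id(c)
```

All four flags come out true on any conforming implementation, whether or not objects are fixed in memory.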
Re: [Python-Dev] Experiment an opt-in new C API for Python? (leave current API unchanged)
On Fri, 16 Nov 2018 at 10:11, Paul Moore wrote: > On Fri, 16 Nov 2018 at 17:49, Brett Cannon wrote: > > And Just to be clear, I totally support coming up with a totally > stripped-down C API as I have outlined above as that shouldn't be > controversial for any VM that wants to have a C-level API. > > If a stripped down API like this is intended as "use this and you get > compatibility across multiple Python interpreters and multiple Python > versions" (essentially a much stronger and more effective version of > the stable ABI) then I'm solidly in favour (and such an API has clear > trade-offs that allow people to judge whether it's the right choice > for them). > Yes, that's what I'm getting at. Basically we have to approach this from the "start with nothing and build up until we have _just_ enough and thus we know **everyone** now and into the future can support it", or we approach with "take what we have now and start peeling back until we _think_ it's good enough". Personally, I think the former is more future-proof. > > Having this alongside the existing API, which would still be supported > for projects that need low-level access or backward compatibility (or > simply don't have the resources to change), but which will remain > CPython-specific, seems like a perfectly fine idea. > And it can be done as wrappers around the current C API and as an external project to start. As Nathaniel pointed out in another thread, this is somewhat like what Py_LIMITED_API was meant to be, but I think we all admit we slightly messed up by making it opt-out instead of opt-in and so we didn't explicitly control that API as well as we probably should have (I know I have probably screwed up by accidentally including import functions by forgetting it was opt-out). I also don't think it was necessarily designed from a minimalist perspective to begin with as it defines things in terms of what's _not_ in Py_LIMITED_API instead of explicitly listing what _is_. 
So it may (or may not) lead to a different set of APIs in the end when you have to explicitly list every API to include.
Re: [Python-Dev] Need discussion for a PR about memory and objects
On Mon, Nov 19, 2018 at 6:01 AM Richard Damon wrote: > One issue with things like vehicle registration numbers is that the VIN > of a vehicle is really a UUID, it is globally unique no other vehicle > will ever have that same ID number, and people may not think of the > fact that some other ID numbers, like the SSN do get reused. Since the > Python Object Identities can get reused after the object goes away, the > analogy really needs to keep that clear, and not give the other extreme > of a false impression that the ID won't get reused after the object goes > away. Licence plate numbers do get reused. ChrisA
Re: [Python-Dev] Need discussion for a PR about memory and objects
On 11/18/18 7:32 AM, Nick Coghlan wrote: > On Sun, 4 Nov 2018 at 23:33, Steven D'Aprano wrote: >> On Sun, Nov 04, 2018 at 11:43:50AM +0100, Stephane Wirtel wrote: >>> In this PR [https://github.com/python/cpython/pull/3382] "Remove reference >>> to >>> address from the docs, as it only causes confusion", opened by Chris >>> Angelico, there is a discussion about the right term to use for the >>> address of an object in memory. >> Why do we need to refer to the address of objects in memory? > Following up on this discussion from a couple of weeks ago, note that > Stephane misstated Chris's question/proposal from the PR slightly. > > The context is that the data model documentation for objects currently > describes them as having an identity, a type, and a value, and then > uses "address in memory" as an analogy for the properties that the > object identity has (i.e. only one object can have a given identifier > at any particular point in time, but identifiers can be re-used over > time as objects are created and destroyed). > > That analogy is problematic, since it encourages the "object > identities are memory addresses" mindset that happens to be true in > CPython, but isn't true for Python implementations in general, and > also isn't helpful for folks that have never learned a lower level > language where you're manipulating pointers directly. > > However, simply removing the analogy entirely leaves that paragraph in > the documentation feeling incomplete, so it would be valuable to > replace it with a different real world analogy that will make sense to > a broad audience. > > Chris's initial suggestion was to use "license number" or "social > security number" (i.e. 
numbers governments assign to people), but I'm > thinking a better comparison might be to vehicle registration numbers, > since that analogy can be extended to the type and value > characteristics in a fairly straightforward way: > > - the object identity is like the registration number or license plate > (unique within the particular system of registering vehicles, but not > unique across systems, and may sometimes be transferred to a new > vehicle after the old one is destroyed) > - the object type is like the make and model (e.g. a 2007 Toyota > Corolla Ascent Sedan) > - the object value is a specific car (e.g. "that white Corolla over > there with 89000 km on the odometer") > > On the other hand, we're talking about the language reference here, > not the tutorial, and understanding memory addressing seems like a > reasonable assumed pre-requisite in that context. > > Cheers, > Nick. One issue with things like vehicle registration numbers is that the VIN of a vehicle is really a UUID: it is globally unique, and no other vehicle will ever have that same ID number. People may also not think of the fact that some other ID numbers, like the SSN, do get reused. Since Python object identities can get reused after the object goes away, the analogy really needs to keep that clear, and not give the other extreme of a false impression that the ID won't get reused after the object goes away. -- Richard Damon
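The point about reuse can be illustrated with a short Python sketch. The `Car` class is a made-up placeholder. The only things the language guarantees here are the two "distinct" checks; whether the old number is actually handed out again after `del` depends on the implementation and on timing, so the sketch records that result without assuming it either way.

```python
class Car:
    """Placeholder class, used only to create distinct objects."""


first = Car()
second = Car()

# Guaranteed: two objects alive at the same time have different ids.
live_ids_distinct = id(first) != id(second)

old_id = id(first)
del first        # the object may now be destroyed (immediately, in CPython)
third = Car()

# Not guaranteed either way: the new object may or may not get the old number.
reused = id(third) == old_id

# Guaranteed again: 'second' and 'third' are both alive, so their ids differ.
still_distinct = id(third) != id(second)
```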
Re: [Python-Dev] General concerns about C API changes
Gregory P. Smith wrote on 15.11.18 at 01:03: > From my point of view: A static inline function is a much nicer modern code > style than a C preprocessor macro. It's also slower to compile, given that function inlining happens at a much later point in the compiler pipeline than macro expansion. The C compiler won't even get to see macros in fact, whereas whether to inline a function or not is a dedicated decision during the optimisation phase based on metrics collected in earlier stages. For something as ubiquitous as Py_INCREF/Py_DECREF, it might even be visible in the compilation times. Oh, BTW, I don't know if this was mentioned in the discussion before, but transitive inlining can easily be impacted by the switch from a macro to an inline function. Since inlining happens long before the final CPU code generation, the C compiler needs to use heuristics for estimating the eventual "code weight" of an inline function, and then sums up all weights within a calling function to decide whether to also inline that calling function into the transitive callers or not. Now imagine that you have an inline function that executes several Py_INCREF/Py_DECREF call cycles, and the C compiler happens to slightly overestimate the weights of these two. Then it might end up deciding against inlining the function now, whereas it previously might have decided for it since it was able to see the exact source code expanded from the macros. I think that's what Raymond meant with his concerns regarding changing macros into inline functions. C compilers might be smart enough to always inline CPython's new inline functions themselves, but the style change can still have unexpected transitive impacts on code that uses them. I agree with Raymond that as long as there is no clear gain in this code churn, we should not underestimate the risk of degrading code on the user side. 
Stefan
Re: [Python-Dev] Experiment an opt-in new C API for Python? (leave current API unchanged)
Neil Schemenauer wrote on 17.11.18 at 00:10: > I think making PyObject an opaque pointer would help. ... well, as long as type checks are still as fast as with "ob_type", and visible to the C compiler so that it can eliminate redundant ones, I wouldn't mind. :) > - Borrowed references are a problem. However, because they are so > commonly used and because the source code changes needed to change > to a non-borrowed API is non-trivial, I don't think we should try > to change this. Maybe we could just discourage their use? FWIW, the code that Cython generates has a macro guard [1] that makes it avoid borrowed references where possible, e.g. when it detects compilation under PyPy. That's definitely doable already, right now. > - It would be nice to make PyTypeObject an opaque pointer as well. > I think that's a lot more difficult than making PyObject opaque. > So, I don't think we should attempt it in the near future. Maybe > we could make a half-way step and discourage accessing ob_type > directly. We would provide functions (probably inline) to do what > you would otherwise do by using op->ob_type->. I've sometimes been annoyed by the fact that protocol checks require two pointer indirections in CPython (or even three in some cases), so that the C compiler is essentially prevented from making any assumptions, and the CPU branch prediction is also stretched a bit more than necessary. At least, the slot check usually comes right before the call, so that the lookups are not wasted. Inline functions are unlikely to improve that situation, but at least they shouldn't make it worse, and they would be more explicit. Needless to say that Cython also has a macro guard in [1] that disables direct slot access and makes it fall back to C-API calls, for users and Python implementations where direct slot support is not wanted/available. 
> One reason you want to discourage access to ob_type is that > internally there is not necessarily one PyTypeObject structure for > each Python level type. E.g. the VM might have specialized types > for certain sub-domains. This is like the different flavours of > strings, depending on the set of characters stored in them. Or, > you could have different list types. One type of list if all > values are ints, for example. An implementation like this could also be based on the buffer protocol. It's already supported by the array.array type (which people probably also just use when they have a need like this and don't want to resort to NumPy). > Basically, with CPython op->ob_type is super fast. For other VMs, > it could be a lot slower. By accessing ob_type you are saying > "give me all possible type information for this object pointer". > By using functions to get just what you need, you could be putting > less burden on the VM. E.g. "is this object an instance of some > type" is faster to compute. Agreed. I think that inline functions (well, or macros, because why not?) that check for certain protocols explicitly could be helpful. > - APIs that return pointers to the internals of objects are a > problem. E.g. PySequence_Fast_ITEMS(). For CPython, this is > really fast because it is just exposing the internal details of > the layout that is already in the correct format. For other VMs, > that API could be expensive to emulate. E.g. you have a list to > store only ints. If someone calls PySequence_Fast_ITEMS(), you > have to create real PyObjects for all of the list elements. But that's intended by the caller, right? They want a flat serial representation of the sequence, with potential conversion to a (list) array if necessary. They might be a bit badly named, but that's exactly the contract of the "PySequence_Fast_*()" line of functions. In Cython, we completely avoid these functions, because they are way too generic for optimisation purposes. 
Direct type checks and code specialisation are much more effective. > - Reducing the size of the API seems helpful. E.g. we don't need > PyObject_CallObject() *and* PyObject_Call(). Also, do we really > need all the type specific APIs, PyList_GetItem() vs > PyObject_GetItem()? In some cases maybe we can justify the bigger > API due to performance. To add a new API, someone should have a > benchmark that shows a real speedup (not just that they imagine it > makes a difference). So, in Cython, we use macros wherever possible, and often avoid generic protocols in favour of type specialisations. We sometimes keep local copies of C-API helper functions, because inlining them allows the C compiler to strip down and streamline the implementation at compile time, rather than jumping through generic code. (Also, it's sometimes required in order to backport new CPython features to Py2.7+.) PyPy's cpyext often just maps type specific C-API functions to the same generic code, obviously, but in CPython, having a way to bypass protocols and going
Re: [Python-Dev] Need discussion for a PR about memory and objects
On Sun, 4 Nov 2018 at 23:33, Steven D'Aprano wrote: > > On Sun, Nov 04, 2018 at 11:43:50AM +0100, Stephane Wirtel wrote: > > In this PR [https://github.com/python/cpython/pull/3382] "Remove reference > > to > > address from the docs, as it only causes confusion", opened by Chris > > Angelico, there is a discussion about the right term to use for the > > address of an object in memory. > > Why do we need to refer to the address of objects in memory? Following up on this discussion from a couple of weeks ago, note that Stephane misstated Chris's question/proposal from the PR slightly. The context is that the data model documentation for objects currently describes them as having an identity, a type, and a value, and then uses "address in memory" as an analogy for the properties that the object identity has (i.e. only one object can have a given identifier at any particular point in time, but identifiers can be re-used over time as objects are created and destroyed). That analogy is problematic, since it encourages the "object identities are memory addresses" mindset that happens to be true in CPython, but isn't true for Python implementations in general, and also isn't helpful for folks that have never learned a lower level language where you're manipulating pointers directly. However, simply removing the analogy entirely leaves that paragraph in the documentation feeling incomplete, so it would be valuable to replace it with a different real world analogy that will make sense to a broad audience. Chris's initial suggestion was to use "license number" or "social security number" (i.e. 
numbers governments assign to people), but I'm thinking a better comparison might be to vehicle registration numbers, since that analogy can be extended to the type and value characteristics in a fairly straightforward way: - the object identity is like the registration number or license plate (unique within the particular system of registering vehicles, but not unique across systems, and may sometimes be transferred to a new vehicle after the old one is destroyed) - the object type is like the make and model (e.g. a 2007 Toyota Corolla Ascent Sedan) - the object value is a specific car (e.g. "that white Corolla over there with 89000 km on the odometer") On the other hand, we're talking about the language reference here, not the tutorial, and understanding memory addressing seems like a reasonable assumed pre-requisite in that context. Cheers, Nick. -- Nick Coghlan | ncogh...@gmail.com | Brisbane, Australia