Re: [Python-Dev] Need discussion for a PR about memory and objects

2018-11-18 Thread Greg Ewing

Chris Angelico wrote:

> Licence plate numbers do get reused.


And they can change, e.g. if you get a personalised plate.

--
Greg
___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Need discussion for a PR about memory and objects

2018-11-18 Thread Greg Ewing

Nick Coghlan wrote:

> - the object identity is like the registration number or license plate
> (unique within the particular system of registering vehicles, but not
> unique across systems, and may sometimes be transferred to a new
> vehicle after the old one is destroyed)
> - the object type is like the make and model (e.g. a 2007 Toyota
> Corolla Ascent Sedan)
> - the object value is a specific car (e.g. "that white Corolla over
> there with 89000 km on the odometer")


A bit confusing, because "that white Corolla over there" is referring
to its identity.

--
Greg


Re: [Python-Dev] General concerns about C API changes

2018-11-18 Thread Nathaniel Smith
On Sun, Nov 18, 2018 at 8:52 AM Stefan Behnel  wrote:
>
> Gregory P. Smith schrieb am 15.11.18 um 01:03:
> > From my point of view: A static inline function is a much nicer modern code
> > style than a C preprocessor macro.
>
> It's also slower to compile, given that function inlining happens at a much
> later point in the compiler pipeline than macro expansion. The C compiler
> won't even get to see macros in fact, whereas whether to inline a function
> or not is a dedicated decision during the optimisation phase based on
> metrics collected in earlier stages. For something as ubiquitous as
> Py_INCREF/Py_DECREF, it might even be visible in the compilation times.

Have you measured this? I had the opposite intuition, that macros on
average will be slower to compile because they increase the amount of
code that the frontend has to process. But I've never checked...

-n

-- 
Nathaniel J. Smith -- https://vorpus.org


Re: [Python-Dev] Need discussion for a PR about memory and objects

2018-11-18 Thread Hugh Fisher
> Date: Sun, 18 Nov 2018 22:32:35 +1000
> From: Nick Coghlan 
> To: "Steven D'Aprano" 
> Cc: python-dev 
> Subject: Re: [Python-Dev] Need discussion for a PR about memory and
> objects

[  munch background ]
>
> Chris's initial suggestion was to use "license number" or "social
> security number" (i.e. numbers governments assign to people), but I'm
> thinking a better comparison might be to vehicle registration numbers,
> since that analogy can be extended to the type and value
> characteristics in a fairly straightforward way:
>
> - the object identity is like the registration number or license plate
> (unique within the particular system of registering vehicles, but not
> unique across systems, and may sometimes be transferred to a new
> vehicle after the old one is destroyed)
> - the object type is like the make and model (e.g. a 2007 Toyota
> Corolla Ascent Sedan)
> - the object value is a specific car (e.g. "that white Corolla over
> there with 89000 km on the odometer")
>
> On the other hand, we're talking about the language reference here,
> not the tutorial, and understanding memory addressing seems like a
> reasonable assumed pre-requisite in that context.

"Handle" has been used since the 1980s among Macintosh and
Win32 programmers as "unique identifier of some object but isn't
the memory address". The usage within those APIs seems to
match what's being proposed for the new Python C API, in that
programmers used functions to ask "what type are you?" "what
value do you have?" but couldn't, or at least shouldn't, rely on
actual memory layout.

I suggest that for the language reference, use the license plate
or registration analogy to introduce "handle" and after that use
handle throughout. It's short, distinctive, and either will match
up with what the programmer already knows or won't clash if
or when they encounter handles elsewhere.
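To make the "handle" idea above concrete, here is a minimal sketch of a handle table. It is only an illustration: HandleTable, new, type_of, value_of and release are hypothetical names invented for this example, not part of any existing or proposed Python C API. Callers only ever hold an opaque integer and ask questions through functions; nothing about memory layout is exposed.

```python
# Hypothetical illustration only: none of these names belong to a real API.
_FREE = object()  # sentinel marking an unused slot


class HandleTable:
    """Toy registry in which callers hold opaque integer handles."""

    def __init__(self):
        self._slots = []  # slot index -> stored object, or _FREE
        self._free = []   # indices of released slots, available for reuse

    def new(self, obj):
        """Store obj and return a handle for it, reusing a free slot if any."""
        if self._free:
            handle = self._free.pop()
            self._slots[handle] = obj
        else:
            handle = len(self._slots)
            self._slots.append(obj)
        return handle

    def _lookup(self, handle):
        obj = self._slots[handle]
        if obj is _FREE:
            raise KeyError(f"handle {handle} has been released")
        return obj

    def type_of(self, handle):
        """Answer "what type are you?" without exposing storage."""
        return type(self._lookup(handle))

    def value_of(self, handle):
        """Answer "what value do you have?" without exposing storage."""
        return self._lookup(handle)

    def release(self, handle):
        """Give the handle back; a later new() may hand it out again."""
        self._lookup(handle)  # raises if the handle is already free
        self._slots[handle] = _FREE
        self._free.append(handle)


table = HandleTable()
h1 = table.new("spam")
h2 = table.new(42)
h1_type = table.type_of(h1)
table.release(h1)
h3 = table.new([1, 2])  # takes over the slot that h1 used to name
```

In this toy model h3 ends up equal to the released h1, which mirrors the point made elsewhere in the thread that an identifier is unique only among live objects.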

-- 

cheers,
Hugh Fisher


Re: [Python-Dev] Need discussion for a PR about memory and objects

2018-11-18 Thread Jeff Allen

I found this (very good) summary ended in a surprising conclusion.

On 18/11/2018 12:32, Nick Coghlan wrote:

> On Sun, 4 Nov 2018 at 23:33, Steven D'Aprano  wrote:
>> On Sun, Nov 04, 2018 at 11:43:50AM +0100, Stephane Wirtel wrote:
>>> In this PR [https://github.com/python/cpython/pull/3382] "Remove reference
>>> to address from the docs, as it only causes confusion", opened by Chris
>>> Angelico, there is a discussion about the right term to use for the
>>> address of an object in memory.
>> Why do we need to refer to the address of objects in memory?
> ...
> Chris's initial suggestion was to use "license number" or "social
> security number" (i.e. numbers governments assign to people), but I'm
> thinking a better comparison might be to vehicle registration numbers,
> ...
> On the other hand, we're talking about the language reference here,
> not the tutorial, and understanding memory addressing seems like a
> reasonable assumed pre-requisite in that context.
>
> Cheers,
> Nick.

It is a good point that this is in the language reference, not a 
tutorial. Could we not expect readers of that to be prepared for a 
notion of object identity as the abstraction of what we mean by "the 
same object" vs "a distinct object"? If it were necessary to be explicit 
about what Python means by it, one could unpack the idea by its 
properties: distinct names may be given to the same object 
(is-operator); distinct objects may have the same value (==-operator); 
an object may change in value (if allowed by its type) while keeping its 
identity.
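The three properties listed above can be checked directly in a few lines. This is just a small sketch using a list, chosen because its type allows the value to change; the variable names are arbitrary.

```python
a = [1, 2, 3]
b = a          # a second name for the same object
c = [1, 2, 3]  # a distinct object that happens to have the same value

same_object = a is b                  # distinct names, one object
equal_values = a == c                 # distinct objects, same value
distinct_objects = a is not c

a.append(4)                           # the value changes ...
still_same_object = a is b            # ... the identity does not
value_seen_through_b = list(b)        # b names the same, now longer, list
no_longer_equal = a != c
```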


And then there is the id() function. That is an imperfect reflection of 
the identity. id() guarantees that for a given object (identity) it will 
always return the same integer during the life of that object, and a 
different integer for any distinct object (distinct identity) with an 
overlapping lifetime. We note that, in an implementation of Python where 
objects are fixed in memory for life, a conformant id() may return the 
object's address.
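A short sketch of what that guarantee does and does not promise. Only the properties stated above are relied on; whether the integer happens to be an address is left to the implementation.

```python
first = object()
second = object()  # alive at the same time as first

id_is_an_integer = isinstance(id(first), int)

# Both objects are alive here, so their ids must differ.
ids_differ_while_both_alive = id(first) != id(second)

# For the life of an object, id() keeps returning the same integer.
recorded = id(first)
filler = [object() for _ in range(1000)]  # unrelated allocations in between
id_is_stable = id(first) == recorded
```

Nothing here says anything about ids of objects whose lifetimes do not overlap.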


Jeff Allen






Re: [Python-Dev] Experiment an opt-in new C API for Python? (leave current API unchanged)

2018-11-18 Thread Brett Cannon
On Fri, 16 Nov 2018 at 10:11, Paul Moore  wrote:

> On Fri, 16 Nov 2018 at 17:49, Brett Cannon  wrote:
> > And just to be clear, I totally support coming up with a totally
> stripped-down C API as I have outlined above as that shouldn't be
> controversial for any VM that wants to have a C-level API.
>
> If a stripped down API like this is intended as "use this and you get
> compatibility across multiple Python interpreters and multiple Python
> versions" (essentially a much stronger and more effective version of
> the stable ABI) then I'm solidly in favour (and such an API has clear
> trade-offs that allow people to judge whether it's the right choice
> for them).
>

Yes, that's what I'm getting at. Basically we have to approach this from
the "start with nothing and build up until we have _just_ enough and thus
we know **everyone** now and into the future can support it", or we
approach with "take what we have now and start peeling back until we
_think_ it's good enough". Personally, I think the former is more
future-proof.


>
> Having this alongside the existing API, which would still be supported
> for projects that need low-level access or backward compatibility (or
> simply don't have the resources to change), but which will remain
> CPython-specific, seems like a perfectly fine idea.
>

And it can be done as wrappers around the current C API and as an external
project to start. As Nathaniel pointed out in another thread, this is
somewhat like what Py_LIMITED_API was meant to be, but I think we all admit
we slightly messed up by making it opt-out instead of opt-in and so we
didn't explicitly control that API as well as we probably should have (I
know I have probably screwed up by accidentally including import functions
by forgetting it was opt-out).

I also don't think it was necessarily designed from a minimalist
perspective to begin with as it defines things in terms of what's _not_ in
Py_LIMITED_API instead of explicitly listing what _is_. So it may (or may
not) lead to a different set of APIs in the end when you have to explicitly
list every API to include.
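The difference between the two ways of defining the API can be sketched with plain sets. This is a toy model, not how the headers actually work: the function names below are made up for illustration, and opt_out_api and opt_in_api are hypothetical helpers.

```python
# Toy model with made-up function names; not real C API symbols.
all_functions = {"get_item", "call", "import_module"}

excluded = {"import_module"}    # opt-out: list what is NOT exposed
allowed = {"get_item", "call"}  # opt-in: list what IS exposed


def opt_out_api(functions, excluded):
    """Everything is public unless someone remembers to exclude it."""
    return functions - excluded


def opt_in_api(functions, allowed):
    """Nothing is public unless someone explicitly lists it."""
    return functions & allowed


# Today both definitions describe the same API ...
before_out = opt_out_api(all_functions, excluded)
before_in = opt_in_api(all_functions, allowed)

# ... but when a new function is added and nobody updates either list,
# only the opt-out definition silently grows.
all_functions_v2 = all_functions | {"new_internal_helper"}
after_out = opt_out_api(all_functions_v2, excluded)
after_in = opt_in_api(all_functions_v2, allowed)
```

In the opt-out model new_internal_helper leaks into the public set by default, which is the kind of accidental inclusion described above.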


Re: [Python-Dev] Need discussion for a PR about memory and objects

2018-11-18 Thread Chris Angelico
On Mon, Nov 19, 2018 at 6:01 AM Richard Damon  wrote:
> One issue with things like vehicle registration numbers is that the VIN
> of a vehicle is really a UUID: it is globally unique, and no other vehicle
> will ever have that same ID number. People may also not think of the
> fact that some other ID numbers, like the SSN, do get reused. Since
> Python object identities can get reused after the object goes away, the
> analogy really needs to keep that clear, and not give the other extreme
> of a false impression that the ID won't get reused after the object goes
> away.

Licence plate numbers do get reused.

ChrisA


Re: [Python-Dev] Need discussion for a PR about memory and objects

2018-11-18 Thread Richard Damon
On 11/18/18 7:32 AM, Nick Coghlan wrote:
> On Sun, 4 Nov 2018 at 23:33, Steven D'Aprano  wrote:
>> On Sun, Nov 04, 2018 at 11:43:50AM +0100, Stephane Wirtel wrote:
>>> In this PR [https://github.com/python/cpython/pull/3382] "Remove reference
>>> to address from the docs, as it only causes confusion", opened by Chris
>>> Angelico, there is a discussion about the right term to use for the
>>> address of an object in memory.
>> Why do we need to refer to the address of objects in memory?
> Following up on this discussion from a couple of weeks ago, note that
> Stephane misstated Chris's question/proposal from the PR slightly.
>
> The context is that the data model documentation for objects currently
> describes them as having an identity, a type, and a value, and then
> uses "address in memory" as an analogy for the properties that the
> object identity has (i.e. only one object can have a given identifier
> at any particular point in time, but identifiers can be re-used over
> time as objects are created and destroyed).
>
> That analogy is problematic, since it encourages the "object
> identities are memory addresses" mindset that happens to be true in
> CPython, but isn't true for Python implementations in general, and
> also isn't helpful for folks that have never learned a lower level
> language where you're manipulating pointers directly.
>
> However, simply removing the analogy entirely leaves that paragraph in
> the documentation feeling incomplete, so it would be valuable to
> replace it with a different real world analogy that will make sense to
> a broad audience.
>
> Chris's initial suggestion was to use "license number" or "social
> security number" (i.e. numbers governments assign to people), but I'm
> thinking a better comparison might be to vehicle registration numbers,
> since that analogy can be extended to the type and value
> characteristics in a fairly straightforward way:
>
> - the object identity is like the registration number or license plate
> (unique within the particular system of registering vehicles, but not
> unique across systems, and may sometimes be transferred to a new
> vehicle after the old one is destroyed)
> - the object type is like the make and model (e.g. a 2007 Toyota
> Corolla Ascent Sedan)
> - the object value is a specific car (e.g. "that white Corolla over
> there with 89000 km on the odometer")
>
> On the other hand, we're talking about the language reference here,
> not the tutorial, and understanding memory addressing seems like a
> reasonable assumed pre-requisite in that context.
>
> Cheers,
> Nick.

One issue with things like vehicle registration numbers is that the VIN
of a vehicle is really a UUID: it is globally unique, and no other vehicle
will ever have that same ID number. People may also not think of the
fact that some other ID numbers, like the SSN, do get reused. Since
Python object identities can get reused after the object goes away, the
analogy really needs to keep that clear, and not give the other extreme
of a false impression that the ID won't get reused after the object goes
away.
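A small sketch of why the reuse matters in practice. Whether the second object actually receives the old id depends on the implementation and on what else has been allocated, so the code records the result without assuming either answer; make_and_discard is a name made up for this example.

```python
def make_and_discard():
    """Return the id of an object that is unreachable once we return."""
    temp = object()
    return id(temp)


old_id = make_and_discard()
replacement = object()

# May be True or False: the language only promises uniqueness among
# objects whose lifetimes overlap, and these two lifetimes need not.
id_was_reused = id(replacement) == old_id
```

So an id() value that was stored after its object went away says nothing reliable about any object that exists later.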

-- 
Richard Damon



Re: [Python-Dev] General concerns about C API changes

2018-11-18 Thread Stefan Behnel
Gregory P. Smith schrieb am 15.11.18 um 01:03:
> From my point of view: A static inline function is a much nicer modern code
> style than a C preprocessor macro.

It's also slower to compile, given that function inlining happens at a much
later point in the compiler pipeline than macro expansion. The C compiler
won't even get to see macros in fact, whereas whether to inline a function
or not is a dedicated decision during the optimisation phase based on
metrics collected in earlier stages. For something as ubiquitous as
Py_INCREF/Py_DECREF, it might even be visible in the compilation times.

Oh, BTW, I don't know if this was mentioned in the discussion before, but
transitive inlining can easily be impacted by the switch from a macro to an
inline function. Since inlining happens long before the final CPU code
generation, the C compiler needs to use heuristics for estimating the
eventual "code weight" of an inline function, and then sums up all weights
within a calling function to decide whether to also inline that calling
function into the transitive callers or not.

Now imagine that you have an inline function that executes several
Py_INCREF/Py_DECREF call cycles, and the C compiler happens to slightly
overestimate the weights of these two. Then it might end up deciding
against inlining the function now, whereas it previously might have decided
for it since it was able to see the exact source code expanded from the
macros. I think that's what Raymond meant with his concerns regarding
changing macros into inline functions. C compilers might be smart enough to
always inline CPython's new inline functions themselves, but the style
change can still have unexpected transitive impacts on code that uses them.

I agree with Raymond that as long as there is no clear gain in this code
churn, we should not underestimate the risk of degrading code on the user side.

Stefan



Re: [Python-Dev] Experiment an opt-in new C API for Python? (leave current API unchanged)

2018-11-18 Thread Stefan Behnel
Neil Schemenauer schrieb am 17.11.18 um 00:10:
> I think making PyObject an opaque pointer would help.

... well, as long as type checks are still as fast as with "ob_type", and
visible to the C compiler so that it can eliminate redundant ones, I
wouldn't mind. :)


> - Borrowed references are a problem.  However, because they are so
>   commonly used and because the source code changes needed to change
>   to a non-borrowed API are non-trivial, I don't think we should try
>   to change this.  Maybe we could just discourage their use?

FWIW, the code that Cython generates has a macro guard [1] that makes it
avoid borrowed references where possible, e.g. when it detects compilation
under PyPy. That's definitely doable already, right now.


> - It would be nice to make PyTypeObject an opaque pointer as well.
>   I think that's a lot more difficult than making PyObject opaque.
>   So, I don't think we should attempt it in the near future.  Maybe
>   we could make a half-way step and discourage accessing ob_type
>   directly.  We would provide functions (probably inline) to do what
>   you would otherwise do by using op->ob_type->.

I've sometimes been annoyed by the fact that protocol checks require two
pointer indirections in CPython (or even three in some cases), so that the
C compiler is essentially prevented from making any assumptions, and the
CPU branch prediction is also stretched a bit more than necessary. At
least, the slot check usually comes right before the call, so that the
lookups are not wasted. Inline functions are unlikely to improve that
situation, but at least they shouldn't make it worse, and they would be
more explicit.

Needless to say that Cython also has a macro guard in [1] that disables
direct slot access and makes it fall back to C-API calls, for users and
Python implementations where direct slot support is not wanted/available.


>   One reason you want to discourage access to ob_type is that
>   internally there is not necessarily one PyTypeObject structure for
>   each Python level type.  E.g. the VM might have specialized types
>   for certain sub-domains.  This is like the different flavours of
>   strings, depending on the set of characters stored in them.  Or,
>   you could have different list types.  One type of list if all
>   values are ints, for example.

An implementation like this could also be based on the buffer protocol.
It's already supported by the array.array type (which people probably also
just use when they have a need like this and don't want to resort to NumPy).
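As a rough illustration of that point, seen from the Python side rather than from C: array.array stores its items as packed machine values, and memoryview exposes them through the buffer protocol without creating a separate object per element up front. The variable names are arbitrary.

```python
from array import array

ints = array("q", [10, 20, 30])  # "q": signed long long, stored unboxed

with memoryview(ints) as view:
    view_format = view.format    # element type code reported by the buffer
    same_item_size = view.itemsize == ints.itemsize
    total_bytes = view.nbytes    # number of items times the item size
    view[1] = 99                 # writes straight into the array's storage
    snapshot = view.tolist()

second_item = ints[1]            # the array sees the write made via the view
```

A consumer that only needs the raw integers can work on the buffer directly, which is the kind of specialised access the list-of-ints example in the quoted text is after.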


>   Basically, with CPython op->ob_type is super fast.  For other VMs,
>   it could be a lot slower.  By accessing ob_type you are saying
>   "give me all possible type information for this object pointer".
>   By using functions to get just what you need, you could be putting
>   less burden on the VM.  E.g. "is this object an instance of some
>   type" is faster to compute.

Agreed. I think that inline functions (well, or macros, because why not?)
that check for certain protocols explicitly could be helpful.


> - APIs that return pointers to the internals of objects are a
>   problem.  E.g. PySequence_Fast_ITEMS().  For CPython, this is
>   really fast because it is just exposing the internal details of
>   the layout that is already in the correct format.  For other VMs,
>   that API could be expensive to emulate.  E.g. you have a list to
>   store only ints.  If someone calls PySequence_Fast_ITEMS(), you
>   have to create real PyObjects for all of the list elements.

But that's intended by the caller, right? They want a flat serial
representation of the sequence, with potential conversion to a (list) array
if necessary. They might be a bit badly named, but that's exactly the
contract of the "PySequence_Fast_*()" line of functions.

In Cython, we completely avoid these functions, because they are way too
generic for optimisation purposes. Direct type checks and code
specialisation are much more effective.


> - Reducing the size of the API seems helpful.  E.g. we don't need
>   PyObject_CallObject() *and* PyObject_Call().  Also, do we really
>   need all the type specific APIs, PyList_GetItem() vs
>   PyObject_GetItem()?  In some cases maybe we can justify the bigger
>   API due to performance.  To add a new API, someone should have a
>   benchmark that shows a real speedup (not just that they imagine it
>   makes a difference).

So, in Cython, we use macros wherever possible, and often avoid generic
protocols in favour of type specialisations. We sometimes keep local copies
of C-API helper functions, because inlining them allows the C compiler to
strip down and streamline the implementation at compile time, rather than
jumping through generic code. (Also, it's sometimes required in order to
backport new CPython features to Py2.7+.)

PyPy's cpyext often just maps type specific C-API functions to the same
generic code, obviously, but in CPython, having a way to bypass protocols
and going 

Re: [Python-Dev] Need discussion for a PR about memory and objects

2018-11-18 Thread Nick Coghlan
On Sun, 4 Nov 2018 at 23:33, Steven D'Aprano  wrote:
>
> On Sun, Nov 04, 2018 at 11:43:50AM +0100, Stephane Wirtel wrote:
> > In this PR [https://github.com/python/cpython/pull/3382] "Remove reference
> > to address from the docs, as it only causes confusion", opened by Chris
> > Angelico, there is a discussion about the right term to use for the
> > address of an object in memory.
>
> Why do we need to refer to the address of objects in memory?

Following up on this discussion from a couple of weeks ago, note that
Stephane misstated Chris's question/proposal from the PR slightly.

The context is that the data model documentation for objects currently
describes them as having an identity, a type, and a value, and then
uses "address in memory" as an analogy for the properties that the
object identity has (i.e. only one object can have a given identifier
at any particular point in time, but identifiers can be re-used over
time as objects are created and destroyed).

That analogy is problematic, since it encourages the "object
identities are memory addresses" mindset that happens to be true in
CPython, but isn't true for Python implementations in general, and
also isn't helpful for folks that have never learned a lower level
language where you're manipulating pointers directly.

However, simply removing the analogy entirely leaves that paragraph in
the documentation feeling incomplete, so it would be valuable to
replace it with a different real world analogy that will make sense to
a broad audience.

Chris's initial suggestion was to use "license number" or "social
security number" (i.e. numbers governments assign to people), but I'm
thinking a better comparison might be to vehicle registration numbers,
since that analogy can be extended to the type and value
characteristics in a fairly straightforward way:

- the object identity is like the registration number or license plate
(unique within the particular system of registering vehicles, but not
unique across systems, and may sometimes be transferred to a new
vehicle after the old one is destroyed)
- the object type is like the make and model (e.g. a 2007 Toyota
Corolla Ascent Sedan)
- the object value is a specific car (e.g. "that white Corolla over
there with 89000 km on the odometer")

On the other hand, we're talking about the language reference here,
not the tutorial, and understanding memory addressing seems like a
reasonable assumed pre-requisite in that context.

Cheers,
Nick.

--
Nick Coghlan   |   ncogh...@gmail.com   |   Brisbane, Australia