[Python-ideas] Re: Extension methods in Python

Steven D'Aprano Wed, 23 Jun 2021 06:24:17 -0700

On Wed, Jun 23, 2021 at 03:47:05PM +1000, Chris Angelico wrote:

> Okay. Lemme give it to you *even more clearly* since the previous
> example didn't satisfy.
> 
> # file1.py
> 
> @extend(list)
> def in_order(self):
>     return sorted(self)
> 
> def frob(stuff):
>     return stuff.in_order()
> 
> # file2.py
> 
> from file1 import frob
> thing = [1, 5, 2]
> frob(thing) # == [1, 2, 5]
> def otherfrob(stuff):
>     return stuff.in_order()
> otherfrob(thing) # AttributeError
> 
> 
> 
> Am I correct so far? The function imported from file1 has the
> extension method, the code in file2 does not. That's the entire point
> here, right?


Correct so far.


> Okay. Now, what if getattr is brought into the mix?

To a first approximation (ignoring shadowing) every dot lookup can be 
replaced with getattr and vice versa:

    obj.name <--> getattr(obj, 'name')

A simple source code transformation could handle that, and the behaviour 
of the code should be the same. Extension methods shouldn't change that.
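
As a quick check of that equivalence in today's Python (no extension methods involved), here is a minimal sketch. The Point class and its norm1 method are made up purely for illustration:

```python
class Point:
    def __init__(self, x, y):
        self.x = x
        self.y = y

    def norm1(self):
        return abs(self.x) + abs(self.y)

p = Point(3, -4)

# Plain attribute: both spellings return the same value.
assert p.x == getattr(p, "x") == 3

# Method: both spellings produce an equal bound method.
assert p.norm1 == getattr(p, "norm1")
assert p.norm1() == getattr(p, "norm1")() == 7

# Missing attribute: both spellings raise AttributeError.
for lookup in (lambda: p.z, lambda: getattr(p, "z")):
    try:
        lookup()
    except AttributeError:
        pass
    else:
        raise AssertionError("expected AttributeError")
```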


> # file3.py
> @extend(list)
> def in_order(self):
>     return sorted(self)
> 
> def fetch1(stuff, attr):
>     if attr == "in_order": return stuff.in_order
>     if attr == "unordered": return stuff.unordered
>     return getattr(stuff, attr)
> 
> def fetch2(stuff, attr):
>     return getattr(stuff, attr)


In file3's scope, there is no list.unordered method, so any call like

    some_list.unordered
    getattr(some_list, 'unordered')

will fail, regardless of which list some_list is, or where it was 
created. That implies that:

    fetch1(some_list, 'unordered')
    fetch2(some_list, 'unordered')

will also fail. It doesn't matter who is calling the functions, or what 
module they are called from. What matters is the context where the 
attribute lookup occurs, which in fetch1 and fetch2 is the file3 scope.


> # file4.py
> from file3 import fetch1, fetch2

Doesn't matter that fetch1 and fetch2 are imported into file4. They are 
still executed in the global scope of file3. If they called `globals()`, 
they would see file3's globals, not file4's. Same thing for extension 
methods.
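
The globals() half of that claim can be checked today. This is only a sketch: the two modules are built in memory with types.ModuleType as stand-ins for file3 and file4, and the MARKER variable and where_am_i function are invented for the demonstration:

```python
import types

# Build two throwaway modules in memory (stand-ins for file3 and file4).
file3 = types.ModuleType("file3")
exec(
    "MARKER = 'file3'\n"
    "def where_am_i():\n"
    "    return globals()['MARKER']\n",
    file3.__dict__,
)

file4 = types.ModuleType("file4")
file4.MARKER = "file4"
# Equivalent of: from file3 import where_am_i
file4.where_am_i = file3.where_am_i
# Call the imported function from code running in file4's namespace.
exec("result = where_am_i()\n", file4.__dict__)

print(file4.result)                                    # file3
print(file3.where_am_i.__globals__ is file3.__dict__)  # True
```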


> import random
>
> @extend(list)
> def unordered(self):
>     return random.shuffle(self[:])

I think that's going to always return None :-)
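
To spell that out: random.shuffle works in place and returns None. One possible fix, assuming the intent was "return a shuffled copy", is random.sample. The function names below are illustrative only:

```python
import random

def unordered_buggy(self):
    # random.shuffle shuffles its argument in place and returns None,
    # so the shuffled copy is thrown away.
    return random.shuffle(self[:])

def unordered(self):
    # random.sample returns a new list, leaving the original untouched.
    return random.sample(self, len(self))

thing = [1, 5, 2]
print(unordered_buggy(thing))    # None
print(sorted(unordered(thing)))  # [1, 2, 5]
print(thing)                     # [1, 5, 2]
```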


> def fetch3(stuff, attr):
>     if attr == "in_order": return stuff.in_order
>     if attr == "unordered": return stuff.unordered
>     return getattr(stuff, attr)
> 
> def fetch4(stuff, attr):
>     return getattr(stuff, attr)

In the scope of file4, there is no list method "in_order", but there is 
a list method "unordered". So

    some_list.in_order
    getattr(some_list, 'in_order')

will fail. That implies that:

    fetch3(some_list, 'in_order')
    fetch4(some_list, 'in_order')

will also fail. It doesn't matter who is calling the functions, or what 
module they are called from. What matters is the context where the 
attribute lookup occurs, which in fetch3 and fetch4 is the file4 scope.

(By the way, I think that your example here is about ten times more 
obfuscated than it need be, because of the use of generic, uninformative 
names with numbers.)


> thing = [1, 5, 2]
> fetch1(thing, "in_order")()
> fetch2(thing, "in_order")()
> fetch3(thing, "in_order")()
> fetch4(thing, "in_order")()
> fetch1(thing, "unordered")()
> fetch2(thing, "unordered")()
> fetch3(thing, "unordered")()
> fetch4(thing, "unordered")()
>
> Okay. *NOW* which ones raise AttributeError, and which ones give the
> extension method?

Look at the execution context.

fetch1(thing, "in_order") and fetch2(thing, "in_order") execute in the 
scope of file3, where lists have an in_order extension method.

It doesn't matter that they are called from file4: the body of the 
fetchN functions, where the attribute access takes place, executes where 
the global scope is file3 and hence the extension method "in_order" is 
found and returned.

For the same reason, both fetch1(thing, "unordered") and fetch2(thing, 
"unordered") will fail.

It doesn't matter that they are called from file4: their execution 
context is their global scope, file3, and just as they see file3's 
globals, not the caller's, they will see file3's extension methods.

(I say "the module's extension methods", not necessarily to imply that 
the extension methods are somehow attached to the module, but only that 
there is some sort of registry that says, in effect, "if your execution 
context is module X, then these extension methods are in use".)
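
To make "some sort of registry" concrete, here is one purely hypothetical shape it could take. Nothing here is part of any proposal: the _EXTENSIONS dict, the two-argument extend decorator and the ext_getattr helper are all invented for this sketch, and the execution context is passed in explicitly as a module name rather than discovered automatically:

```python
from collections import defaultdict

# Hypothetical registry: module name -> {(type, attribute name): function}.
_EXTENSIONS = defaultdict(dict)

def extend(cls, module_name):
    """Register the decorated function as an extension of cls for one module."""
    def decorator(func):
        _EXTENSIONS[module_name][(cls, func.__name__)] = func
        return func
    return decorator

def ext_getattr(obj, name, module_name):
    """Look up name normally, then fall back to module_name's extensions."""
    try:
        return getattr(obj, name)
    except AttributeError:
        for cls in type(obj).__mro__:
            func = _EXTENSIONS[module_name].get((cls, name))
            if func is not None:
                return func.__get__(obj, type(obj))  # bind like a method
        raise  # no extension found: re-raise the original AttributeError

@extend(list, "file3")
def in_order(self):
    return sorted(self)

thing = [1, 5, 2]
print(ext_getattr(thing, "in_order", "file3")())  # [1, 2, 5]
try:
    ext_getattr(thing, "in_order", "file4")
except AttributeError:
    print("file4 has not opted in")
```

The same list object gains or loses the method depending only on which module name the lookup is made for.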

Similarly, the bodies of fetch3 and fetch4 execute in the execution 
context of file4, where list has been extended with an unordered method. 
So fetch3(thing, "unordered") and fetch4(thing, "unordered") both return 
that unordered method.

For the same reason (the execution context), fetch3(thing, "in_order") 
and fetch4(thing, "in_order") both fail.


> What exactly are the semantics of getattr?

Oh gods, I don't know the exact semantics of attribute look ups now! 
Something like this, I think:

obj.attr (same as getattr(obj, 'attr')):


    if type(obj).__dict__['attr'] exists and is a data descriptor:
        # data descriptors are the highest priority
        return type(obj).__dict__['attr'].__get__(obj, type(obj))

    elif obj.__dict__ exists and obj.__dict__['attr'] exists:
        # followed by instance attributes in the instance dict
        return obj.__dict__['attr']

    elif type(obj) defines __slots__ and there is an 'attr' slot:
        # then instance attributes in slots
        if the slot is filled:
            return contents of slot 'attr'
        else:
            raise AttributeError

    elif type(obj).__dict__['attr'] exists:
        if it is a non-data descriptor:
            return type(obj).__dict__['attr'].__get__(obj, type(obj))
        else:
            return type(obj).__dict__['attr']

    elif type(obj) defines a __getattr__ method:
        return type(obj).__getattr__(obj, 'attr')

    else:
        # search the superclass hierarchy
        ...
    # if we get all the way to the end
    raise AttributeError
        

I've left out `__getattribute__`, I *think* that gets called right at 
the beginning. Also the result of calling `__getattr__` is checked for 
descriptor protocol too. And the look ups on classes are slightly 
different. Also when looking up on classes, metaclasses may get 
involved. And super() defines its own `__getattribute__` to customize 
the lookups. (As other objects may do too.) And some of the fine details 
may be wrong.

But, overall, the "big picture" should be more or less correct:

1. check for data descriptors;
2. check for instance attributes (dict or slot);
3. check for non-data descriptors and class attributes;
4. call __getattr__ if it exists;
5. search the inheritance hierarchy;
6. raise AttributeError if none of the earlier steps matched.
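
Steps 1 to 4 can be observed directly in current Python. The sketch below uses made-up class names (DataDesc, NonDataDesc, Demo) and a single class with no inheritance, so it says nothing about step 5:

```python
class DataDesc:
    """Data descriptor: defines both __get__ and __set__."""
    def __get__(self, obj, objtype=None):
        return "data descriptor"
    def __set__(self, obj, value):
        raise AttributeError("read-only")

class NonDataDesc:
    """Non-data descriptor: defines only __get__."""
    def __get__(self, obj, objtype=None):
        return "non-data descriptor"

class Demo:
    a = DataDesc()
    b = NonDataDesc()
    c = NonDataDesc()

    def __getattr__(self, name):
        # Only called when normal lookup has failed.
        return f"__getattr__({name!r})"

d = Demo()
# Write straight into the instance dict, bypassing any descriptor __set__.
d.__dict__["a"] = "instance"
d.__dict__["b"] = "instance"

print(d.a)        # data descriptor      (beats the instance dict)
print(d.b)        # instance             (beats the non-data descriptor)
print(d.c)        # non-data descriptor  (nothing in the instance dict)
print(d.missing)  # __getattr__('missing')
```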

If we follow C# semantics, extension methods would be checked after step 
4 and before step 5:

    if the execution context is using extensions for this class:
        and 'attr' is an extension method, return that method



> Please explain exactly what the semantics of getattr are, and exactly
> which modules it is supposed to be able to see. Remember, it is not a
> compiler construct or an operator. It is a function, and it lives in
> its own module (the builtins).

You seem to think that getattr being a function makes a difference. Why?

Aside from the possibility that it might be shadowed or deleted from 
builtins, can you give me any examples where `obj.attr` and 
`getattr(obj, 'attr')` behave differently? Even *one* example?

Okay, this is Python. You could write a class with a `__getattr__` or 
`__getattribute__` method that inspected the call chain and did 
something different if it spotted a function called "getattr". 
Congratulations, you are very smart and Python is very dynamic.

You might even write a __getattr__ that, oh, I don't know, returned a 
method if the execution context had opted in to a system that provided 
extra methods to your class. But I digress.

But apart from custom-made classes that deliberately play silly buggers 
if they see that getattr is involved, can you give an example of where 
it behaves differently to dot syntax?
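
For the curious, a `__getattr__` of the opt-in kind can be sketched in today's Python. This is a toy, not a proposed implementation: the ExtList class, the `__extensions__` module-level dict and the make_module helper are all invented here, and it relies on the CPython-specific sys._getframe. Because the builtin getattr adds no Python frame of its own, dot syntax and getattr see the same execution context:

```python
import sys
import types

class ExtList(list):
    """A list subclass whose extra methods depend on the caller's module."""

    def __getattr__(self, name):
        # Only reached when normal lookup fails. Frame 1 is the Python code
        # that performed the lookup; the builtin getattr adds no Python frame.
        caller_globals = sys._getframe(1).f_globals
        extensions = caller_globals.get("__extensions__", {})
        try:
            func = extensions[name]
        except KeyError:
            raise AttributeError(name) from None
        return types.MethodType(func, self)

def make_module(name, source):
    # Build an in-memory module so the example fits in one file.
    module = types.ModuleType(name)
    exec(source, module.__dict__)
    return module

opted_in = make_module("opted_in", """
__extensions__ = {"in_order": lambda self: sorted(self)}
def via_dot(stuff): return stuff.in_order()
def via_getattr(stuff): return getattr(stuff, "in_order")()
""")

not_opted_in = make_module("not_opted_in", """
def via_dot(stuff): return stuff.in_order()
""")

thing = ExtList([1, 5, 2])
print(opted_in.via_dot(thing))      # [1, 2, 5]
print(opted_in.via_getattr(thing))  # [1, 2, 5]
try:
    not_opted_in.via_dot(thing)
except AttributeError:
    print("AttributeError")
```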


> > Not a rhetorical question: is that how it works in something like Swift,
> > or Kotlin?
> 
> I have no idea. I'm just asking how you intend it to work in Python.
> If you want to cite other languages, go ahead, but I'm not assuming
> that they already have the solution, because they are different
> languages. Also not a rhetorical question: Is their getattr equivalent
> actually an operator or compiler construct, rather than being a
> regular function? Because if it is, then the entire problem doesn't
> exist.

I really don't know why you think getattr being a function makes any 
difference here. It's a builtin function, written in C, and can and does 
call the same internal C routines used by dot notation.


> > > And what about this?
> > >
> > > f = functools.partial(getattr, stuff)
> > > f("in_order")
> > >
> > > NOW which extension methods should apply? Those registered here? Those
> > > registered in the builtins? Those registered in functools?
> >
> > partial is just a wrapper around its function argument, so that should
> > behave *exactly* the same as `getattr(stuff, 'in_order')`.
> 
> So if it behaves exactly the same way that getattr would, then is it
> exactly the same as fetch2 and fetch4? If not, how is it different?

Okay, let's look at the partial object:


    >>> import functools
    >>> f = functools.partial(getattr, [10, 20])
    >>> f('index')(20)
    1

Partial objects like f don't seem to have anything like a __globals__ 
attribute that would allow me to tell what the execution context would be. I 
*think* that for Python functions (def or lambda) they just inherit the 
execution context from the function. For builtins, I'm not sure. I 
presume their execution context will be the current scope.
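
The missing __globals__ is easy to confirm. In this small sketch py_func is just a throwaway function for comparison:

```python
import functools

def py_func():
    pass

f = functools.partial(getattr, [10, 20])

print(hasattr(py_func, "__globals__"))  # True
print(hasattr(getattr, "__globals__"))  # False
print(hasattr(f, "__globals__"))        # False
print(f.func is getattr, f.args)        # True ([10, 20],)
```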

Right now, I've already spent multiple hours on these posts, and I have 
more important things to do now than argue about the minutiae of 
partial's behaviour. But if you wanted to do an experiment, you could do 
something like comparing the behaviour of:


    # module A.py
    from functools import partial
    f = lambda: globals()
    g = partial(globals)

    # module B.py
    from A import f, g
    f()
    g()


and see whether f and g behave identically. I expect that f would return 
A's globals regardless of where it was called from, but I'm not sure 
what g would do. It might very well return the globals of the calling 
site.
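
For anyone who wants to run that experiment without creating two files, here is a single-file sketch. It assumes CPython, where functools.partial is implemented in C and so adds no Python frame of its own; the in-memory modules built with types.ModuleType are stand-ins for A.py and B.py:

```python
import types

# In-memory stand-ins for A.py and B.py.
module_a = types.ModuleType("A")
exec(
    "from functools import partial\n"
    "f = lambda: globals()\n"
    "g = partial(globals)\n",
    module_a.__dict__,
)

module_b = types.ModuleType("B")
module_b.f, module_b.g = module_a.f, module_a.g  # from A import f, g
exec("f_result = f()\ng_result = g()\n", module_b.__dict__)

print(module_b.f_result is module_a.__dict__)  # True: f sees A's globals
print(module_b.g_result is module_b.__dict__)  # True on CPython: g sees B's
```

Under that assumption, the lambda carries its defining module's globals with it, while the partial around a builtin sees whatever Python code called it.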

In any case, with respect to getattr, the principle would be the same: 
the execution context defines whether the partial object sees the 
extension methods or not. If the execution context is A, and A has opted 
in to use extension methods, then it will see extension methods. If the 
context is B, and B hasn't opted in, then it won't.



> What about other functions implemented in C? If I write a C module
> that calls PyObject_GetAttr, does it behave as if dot notation were
> used in the module that called me, or does it use my module's
> extension methods?

That depends. If you write a C module that calls PyObject_GetAttr right 
now, is that *exactly* the same as dot notation in pure-Python code?

The documentation is terse:

https://docs.python.org/3.8/c-api/object.html#c.PyObject_GetAttr

but if it is correct that it is precisely equivalent to dot syntax, then 
the same rules will apply. Has the current module opted in? If so, then 
does the class have an extension method of the requested name?

Same applies to code objects evaluated without a function, or whatever 
other exotic corner cases you think of. Whatever you think of, the 
answer will always be the same:

- if the execution context is a module that has opted to use 
  extension methods, then attribute access will see extension methods;

- if not, then it won't.

If you think of a scenario where you are executing code where there is 
no module scope at all, and all global lookups fail, then "no module" 
cannot opt in to use extension methods and so the code won't see them.

If you can think of a scenario where you are executing code where there 
are multiple module scopes that fight for supremacy using their two 
weapons of fear, surprise and a fanatical devotion to the Pope, then the 
winner will determine the result.

*wink*




-- 
Steve
