Re: Lock-free ID generation (was: Is there a more efficient threading lock?)

2023-03-02 Thread Dennis Lee Bieber
On Thu, 2 Mar 2023 12:45:50 +1100, Chris Angelico 
declaimed the following:

>
>As have all CPUs since; it's the only way to implement locks (push the
>locking all the way down to the CPU level).
>

Xerox Sigma (circa 1970): Modify and Test (byte/halfword/word)

Granted, that was a "mainframe" system, not a microprocessor.

Looks like Intel didn't catch the boat until 1985 and the i386.



-- 
Wulfraed Dennis Lee Bieber AF6VN
wlfr...@ix.netcom.com    http://wlfraed.microdiversity.freeddns.org/
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Lock-free ID generation (was: Is there a more efficient threading lock?)

2023-03-01 Thread Jon Ribbens via Python-list
On 2023-03-02, Chris Angelico  wrote:
> On Thu, 2 Mar 2023 at 08:01, <2qdxy4rzwzuui...@potatochowder.com> wrote:
>> On 2023-03-01 at 14:35:35 -0500,
>> avi.e.gr...@gmail.com wrote:
>> > What would have happened if all processors had been required to have
>> > some low-level instruction that worked atomically, giving anyone using
>> > any language running on that machine a way to do something as simple
>> > as setting or checking a lock?
>>
>> Have happened?  I don't know about "required," but processors have
>> indeed had such instructions for decades; e.g., the MC68000 from the
>> early to mid 1980s (and used in the original Apple Macintosh, but I
>> digress) has/had a Test and Set instruction.
>
> As have all CPUs since; it's the only way to implement locks (push the
> locking all the way down to the CPU level).

Indeed, I remember thinking it was very fancy when they added the SWP
instruction to the ARM processor.
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Lock-free ID generation (was: Is there a more efficient threading lock?)

2023-03-01 Thread Chris Angelico
On Thu, 2 Mar 2023 at 13:02, Weatherby,Gerard  wrote:
>
> So I guess we know what would have happened.
>

Yep. It's not what I was talking about, but it's also a very important
concurrency management feature.

ChrisA
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Lock-free ID generation (was: Is there a more efficient threading lock?)

2023-03-01 Thread Weatherby,Gerard
So I guess we know what would have happened.

From: Python-list  on 
behalf of Chris Angelico 
Sent: Wednesday, March 1, 2023 8:45:50 PM
To: python-list@python.org 
Subject: Re: Lock-free ID generation (was: Is there a more efficient
threading lock?)

On Thu, 2 Mar 2023 at 08:01, <2qdxy4rzwzuui...@potatochowder.com> wrote:
>
> On 2023-03-01 at 14:35:35 -0500,
> avi.e.gr...@gmail.com wrote:
>
> > What would have happened if all processors had been required to have
> > some low-level instruction that worked atomically, giving anyone using
> > any language running on that machine a way to do something as simple
> > as setting or checking a lock?
>
> Have happened?  I don't know about "required," but processors have
> indeed had such instructions for decades; e.g., the MC68000 from the
> early to mid 1980s (and used in the original Apple Macintosh, but I
> digress) has/had a Test and Set instruction.

As have all CPUs since; it's the only way to implement locks (push the
locking all the way down to the CPU level).

ChrisA
--
https://mail.python.org/mailman/listinfo/python-list
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Lock-free ID generation (was: Is there a more efficient threading lock?)

2023-03-01 Thread Chris Angelico
On Thu, 2 Mar 2023 at 08:01, <2qdxy4rzwzuui...@potatochowder.com> wrote:
>
> On 2023-03-01 at 14:35:35 -0500,
> avi.e.gr...@gmail.com wrote:
>
> > What would have happened if all processors had been required to have
> > some low-level instruction that worked atomically, giving anyone using
> > any language running on that machine a way to do something as simple
> > as setting or checking a lock?
>
> Have happened?  I don't know about "required," but processors have
> indeed had such instructions for decades; e.g., the MC68000 from the
> early to mid 1980s (and used in the original Apple Macintosh, but I
> digress) has/had a Test and Set instruction.

As have all CPUs since; it's the only way to implement locks (push the
locking all the way down to the CPU level).

ChrisA
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Lock-free ID generation (was: Is there a more efficient threading lock?)

2023-03-01 Thread 2QdxY4RzWzUUiLuE
On 2023-03-01 at 14:35:35 -0500,
avi.e.gr...@gmail.com wrote:

> What would have happened if all processors had been required to have
> some low-level instruction that worked atomically, giving anyone using
> any language running on that machine a way to do something as simple
> as setting or checking a lock?

Have happened?  I don't know about "required," but processors have
indeed had such instructions for decades; e.g., the MC68000 from the
early to mid 1980s (and used in the original Apple Macintosh, but I
digress) has/had a Test and Set instruction.
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Lock-free ID generation (was: Is there a more efficient threading lock?)

2023-03-01 Thread Chris Angelico
On Thu, 2 Mar 2023 at 06:37,  wrote:
>
> If a workaround like itertools.count.__next__() is used because it will not
> be interrupted as it is implemented in C, then I have to ask if it would
> make sense for Python to supply something similar in the standard library
> for the sole purpose of use in locks.

That's not lock-free :) The only way that it works is because it's
locked against other threads doing the same job. Lock-free ID
generation means that:

1) Two threads can request IDs simultaneously and will not block each other
2) No two "request an ID" calls will ever return the same value
3) Preferably (but not required), IDs are not wasted.

PostgreSQL has ways of doing this, and there are a few other ways, but
simply using a count object and relying on the GIL isn't going to
achieve the first (though it'll happily achieve the other two).

ChrisA
-- 
https://mail.python.org/mailman/listinfo/python-list


RE: Lock-free ID generation (was: Is there a more efficient threading lock?)

2023-03-01 Thread avi.e.gross
If a workaround like itertools.count.__next__() is used because it will not
be interrupted as it is implemented in C, then I have to ask if it would
make sense for Python to supply something similar in the standard library
for the sole purpose of use in locks.

But realistically, this is one place where the concept of an abstract
Python language intersects with what is bundled into a sort of core at or
soon after startup, and with the reality that Python can be implemented in
many ways, including on hardware that may not guarantee this behavior.

Realistically, the history of computing is full of choices made that now
look less useful or obvious.

What would have happened if all processors had been required to have some
low-level instruction that worked atomically, giving anyone using any
language running on that machine a way to do something as simple as
setting or checking a lock?

Of course life has also turned out to be more complex. Some architectures
now support a small number of primitive operations and implement others as
streams of those operations linked together. You would need to be sure
your program is using the atomic operation directly.

-Original Message-
From: Python-list  On
Behalf Of Dieter Maurer
Sent: Wednesday, March 1, 2023 1:43 PM
To: Chris Angelico 
Cc: python-list@python.org
Subject: Lock-free ID generation (was: Is there a more efficient threading
lock?)

Chris Angelico wrote at 2023-3-1 12:58 +1100:
> ...
> The
>atomicity would be more useful in that context as it would give 
>lock-free ID generation, which doesn't work in Python.

I have seen `itertools.count` for that.
This works because its `__next__` is implemented in "C" and therefore will
not be interrupted by a thread switch.
--
https://mail.python.org/mailman/listinfo/python-list

-- 
https://mail.python.org/mailman/listinfo/python-list


Lock-free ID generation (was: Is there a more efficient threading lock?)

2023-03-01 Thread Dieter Maurer
Chris Angelico wrote at 2023-3-1 12:58 +1100:
> ...
> The
>atomicity would be more useful in that context as it would give
>lock-free ID generation, which doesn't work in Python.

I have seen `itertools.count` for that.
This works because its `__next__` is implemented in "C" and
therefore will not be interrupted by a thread switch.
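A minimal sketch of that idiom (assuming CPython, where the single
C-level call cannot be split by a thread switch):

import itertools
import threading

_ids = itertools.count(1)
seen = []

def worker():
    # next() on a count object is one C call in CPython, so concurrent
    # callers cannot observe the same value.
    seen.append(next(_ids))

threads = [threading.Thread(target=worker) for _ in range(8)]
for t in threads: t.start()
for t in threads: t.join()
print(sorted(seen))  # eight distinct IDs: [1, 2, 3, 4, 5, 6, 7, 8]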
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Is there a more efficient threading lock?

2023-02-28 Thread Chris Angelico
On Wed, 1 Mar 2023 at 10:04, Barry  wrote:
>
> > Though it's still probably not as useful as you might hope. In C, if I
> > can do "int id = counter++;" atomically, it would guarantee me a new
> > ID that no other thread could ever have.
>
> C does not have to do that atomically. In fact it is free to use many
> instructions to build the int value, and some compilers indeed do; the
> Linux kernel folks see this in gcc-generated code.
>
> I understand you have to use the new C atomics features for that.
>

Yeah, I didn't have a good analogy so I went with a hypothetical.  The
atomicity would be more useful in that context as it would give
lock-free ID generation, which doesn't work in Python.

ChrisA
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Is there a more efficient threading lock?

2023-02-28 Thread Barry
> Though it's still probably not as useful as you might hope. In C, if I
> can do "int id = counter++;" atomically, it would guarantee me a new
> ID that no other thread could ever have.

C does not have to do that atomically. In fact it is free to use many
instructions to build the int value, and some compilers indeed do; the
Linux kernel folks see this in gcc-generated code.

I understand you have to use the new C atomics features for that.

Barry


-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Is there a more efficient threading lock?

2023-02-26 Thread Chris Angelico
On Mon, 27 Feb 2023 at 17:28, Michael Speer  wrote:
>
> https://github.com/python/cpython/commit/4958f5d69dd2bf86866c43491caf72f774ddec97
>
> it's a quirk of implementation. the scheduler currently only checks if it
> needs to release the gil after the POP_JUMP_IF_FALSE, POP_JUMP_IF_TRUE,
> JUMP_ABSOLUTE, CALL_METHOD, CALL_FUNCTION, CALL_FUNCTION_KW, and
> CALL_FUNCTION_EX opcodes.
>

Oh now that is VERY interesting. It's a quirk of implementation, yes,
but there's a reason for it; a bug being solved. The underlying
guarantee about __exit__ should be considered to be defined behaviour,
meaning that the precise quirk might not be relevant even though the
bug has to remain fixed in all future versions. But I'd also note here
that, if it can be absolutely 100% guaranteed that the GIL will be
released and signals checked on a reasonable interval, there's no
particular reason to state that signals are checked after every single
Python bytecode. (See the removed comment about empty loops, which
would have been a serious issue and is probably why the backward jump
rule exists.)

So it wouldn't be too hard for a future release of Python to mandate
atomicity of certain specific operations. Obviously it'd require
buy-in from other implementations, but it would be rather convenient
if, subject to some very tight rules like "only when adding integers
onto core data types" etc, a simple statement like "x.y += 1" could
actually be guaranteed to take place atomically.

Though it's still probably not as useful as you might hope. In C, if I
can do "int id = counter++;" atomically, it would guarantee me a new
ID that no other thread could ever have. But in Python, that increment
operation doesn't give you the result, so all it's really useful for
is statistics on operations done. Still, that in itself could be of
value in quite a few situations.

In any case, though, this isn't something to depend upon at the moment.

ChrisA
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Is there a more efficient threading lock?

2023-02-26 Thread Michael Speer
https://stackoverflow.com/questions/69993959/python-threads-difference-for-3-10-and-others

https://github.com/python/cpython/commit/4958f5d69dd2bf86866c43491caf72f774ddec97

it's a quirk of implementation. the scheduler currently only checks if it
needs to release the gil after the POP_JUMP_IF_FALSE, POP_JUMP_IF_TRUE,
JUMP_ABSOLUTE, CALL_METHOD, CALL_FUNCTION, CALL_FUNCTION_KW, and
CALL_FUNCTION_EX opcodes.

>>> import code
>>> import dis
>>> dis.dis( code.update_x_times )
 10   0 LOAD_GLOBAL  0 (range)
  2 LOAD_FAST0 (xx)
  4 CALL_FUNCTION1
# GIL CAN RELEASE HERE #
  6 GET_ITER
>>8 FOR_ITER 6 (to 22)
 10 STORE_FAST   1 (_)
 12  12 LOAD_GLOBAL  1 (vv)
 14 LOAD_CONST   1 (1)
 16 INPLACE_ADD
 18 STORE_GLOBAL 1 (vv)
 20 JUMP_ABSOLUTE4 (to 8)
# GIL CAN RELEASE HERE (after JUMP_ABSOLUTE points the instruction
counter back to FOR_ITER, but before the interpreter actually jumps to
FOR_ITER again) #
 10 >>   22 LOAD_CONST   0 (None)
 24 RETURN_VALUE
>>>

due to this, this section:
 12  12 LOAD_GLOBAL  1 (vv)
 14 LOAD_CONST   1 (1)
 16 INPLACE_ADD
 18 STORE_GLOBAL 1 (vv)

is effectively locked/atomic on post-3.10 interpreters, though this is
neither portable nor guaranteed to stay that way into the future


On Sun, Feb 26, 2023 at 10:19 PM Michael Speer  wrote:

> I wanted to provide an example that your claimed atomicity is simply
> wrong, but I found there is something different in the 3.10+ cpython
> implementations.
>
> I've tested the code at the bottom of this message using a few docker
> python images, and it appears there is a difference starting in 3.10.0
>
> python3.8
> EXPECTED 256000
> ACTUAL   84533137
> python:3.9
> EXPECTED 256000
> ACTUAL   95311773
> python:3.10 (.8)
> EXPECTED 256000
> ACTUAL   256000
>
> just to see if there was a specific sub-version of 3.10 that added it
> python:3.10.0
> EXPECTED 256000
> ACTUAL   256000
>
> nope, from the start of 3.10 this is happening
>
> the only difference in the bytecode I see is 3.10 adds SETUP_LOOP and
> POP_BLOCK around the for loop
>
> I don't see anything different in the long c code that I would expect
> would cause this.
>
> AFAICT the inplace add is null for longs and so should revert to the
> long_add that always creates a new integer in x_add
>
> another test
> python:3.11
> EXPECTED 256000
> ACTUAL   256000
>
> I'm not sure where the difference is at the moment. I didn't see anything
> in the release notes given a quick glance.
>
> I do agree that you shouldn't depend on this unless you find a written
> guarantee of the behavior, as it is likely an implementation quirk of some
> kind
>
> --[code]--
>
> import threading
>
> UPDATES = 1000
> THREADS = 256
>
> vv = 0
>
> def update_x_times( xx ):
>     for _ in range( xx ):
>         global vv
>         vv += 1
>
> def main():
>     tts = []
>     for _ in range( THREADS ):
>         tts.append( threading.Thread( target = update_x_times,
>                                       args = (UPDATES,) ) )
>
>     for tt in tts:
>         tt.start()
>
>     for tt in tts:
>         tt.join()
>
>     print( 'EXPECTED', UPDATES * THREADS )
>     print( 'ACTUAL  ', vv )
>
> if __name__ == '__main__':
>     main()
>
> On Sun, Feb 26, 2023 at 6:35 PM Jon Ribbens via Python-list <
> python-list@python.org> wrote:
>
>> On 2023-02-26, Barry Scott  wrote:
>> > On 25/02/2023 23:45, Jon Ribbens via Python-list wrote:
>> >> I think it is the case that x += 1 is atomic but foo.x += 1 is not.
>> >
>> > No that is not true, and has never been true.
>> >
>> >:>>> def x(a):
>> >:...    a += 1
>> >:...
>> >:>>>
>> >:>>> dis.dis(x)
>> >   1   0 RESUME   0
>> >
>> >   2   2 LOAD_FAST    0 (a)
>> >   4 LOAD_CONST   1 (1)
>> >   6 BINARY_OP   13 (+=)
>> >  10 STORE_FAST   0 (a)
>> >  12 LOAD_CONST   0 (None)
>> >  14 RETURN_VALUE
>> >:>>>
>> >
>> > As you can see there are 4 byte code ops executed.
>> >
>> > Python's eval loop can switch to another thread between any of them.
>> >
>> > It is not true that the GIL provides atomic operations in Python.
>>
>> That's oversimplifying to the point of falsehood (just as the opposite
>> would be too). And: see my other reply in this thread just now - if the
>> GIL isn't making "x += 1" atomic, something else is.
>> --
>> https://mail.python.org/mailman/listinfo/python-list
>>
>
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Is there a more efficient threading lock?

2023-02-26 Thread Michael Speer
I wanted to provide an example that your claimed atomicity is simply wrong,
but I found there is something different in the 3.10+ cpython
implementations.

I've tested the code at the bottom of this message using a few docker
python images, and it appears there is a difference starting in 3.10.0

python3.8
EXPECTED 256000
ACTUAL   84533137
python:3.9
EXPECTED 256000
ACTUAL   95311773
python:3.10 (.8)
EXPECTED 256000
ACTUAL   256000

just to see if there was a specific sub-version of 3.10 that added it
python:3.10.0
EXPECTED 256000
ACTUAL   256000

nope, from the start of 3.10 this is happening

the only difference in the bytecode I see is 3.10 adds SETUP_LOOP and
POP_BLOCK around the for loop

I don't see anything different in the long c code that I would expect would
cause this.

AFAICT the inplace add is null for longs and so should revert to the
long_add that always creates a new integer in x_add

another test
python:3.11
EXPECTED 256000
ACTUAL   256000

I'm not sure where the difference is at the moment. I didn't see anything
in the release notes given a quick glance.

I do agree that you shouldn't depend on this unless you find a written
guarantee of the behavior, as it is likely an implementation quirk of some
kind

--[code]--

import threading

UPDATES = 1000
THREADS = 256

vv = 0

def update_x_times( xx ):
    for _ in range( xx ):
        global vv
        vv += 1

def main():
    tts = []
    for _ in range( THREADS ):
        tts.append( threading.Thread( target = update_x_times,
                                      args = (UPDATES,) ) )

    for tt in tts:
        tt.start()

    for tt in tts:
        tt.join()

    print( 'EXPECTED', UPDATES * THREADS )
    print( 'ACTUAL  ', vv )

if __name__ == '__main__':
    main()

On Sun, Feb 26, 2023 at 6:35 PM Jon Ribbens via Python-list <
python-list@python.org> wrote:

> On 2023-02-26, Barry Scott  wrote:
> > On 25/02/2023 23:45, Jon Ribbens via Python-list wrote:
> >> I think it is the case that x += 1 is atomic but foo.x += 1 is not.
> >
> > No that is not true, and has never been true.
> >
> >:>>> def x(a):
> >:...    a += 1
> >:...
> >:>>>
> >:>>> dis.dis(x)
> >   1   0 RESUME   0
> >
> >   2   2 LOAD_FAST    0 (a)
> >   4 LOAD_CONST   1 (1)
> >   6 BINARY_OP   13 (+=)
> >  10 STORE_FAST   0 (a)
> >  12 LOAD_CONST   0 (None)
> >  14 RETURN_VALUE
> >:>>>
> >
> > As you can see there are 4 byte code ops executed.
> >
> > Python's eval loop can switch to another thread between any of them.
> >
> > It is not true that the GIL provides atomic operations in Python.
>
> That's oversimplifying to the point of falsehood (just as the opposite
> would be too). And: see my other reply in this thread just now - if the
> GIL isn't making "x += 1" atomic, something else is.
> --
> https://mail.python.org/mailman/listinfo/python-list
>
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Is there a more efficient threading lock?

2023-02-26 Thread Chris Angelico
On Mon, 27 Feb 2023 at 10:42, Jon Ribbens via Python-list
 wrote:
>
> On 2023-02-26, Chris Angelico  wrote:
> > On Sun, 26 Feb 2023 at 16:16, Jon Ribbens via Python-list
> > wrote:
> >> On 2023-02-25, Paul Rubin  wrote:
> >> > The GIL is an evil thing, but it has been around for so long that most
> >> > of us have gotten used to it, and some user code actually relies on it.
> >> > For example, with the GIL in place, a statement like "x += 1" is always
> >> > atomic, I believe.  But, I think it is better to not have any shared
> >> > mutables regardless.
> >>
> >> I think it is the case that x += 1 is atomic but foo.x += 1 is not.
> >> Any replacement for the GIL would have to keep the former at least,
> >> plus the fact that you can do hundreds of things like list.append(foo)
> >> which are all effectively atomic.
> >
> > The GIL is most assuredly *not* an evil thing. If you think it's so
> > evil, go ahead and remove it, because we'll clearly be better off
> > without it, right?
>
> If you say so. I said nothing whatsoever about the GIL being evil.

You didn't, but I was also responding to Paul's description that the
GIL "is an evil thing". Apologies if that wasn't clear.

> Yes, sure, you can make x += 1 not work even single-threaded if you
> make custom types which override basic operations. I'm talking about
> when you're dealing with simple atomic built-in types such as integers.
>
> > Here's the equivalent with just incrementing a global:
> >
> > >>> def thrd():
> > ... x += 1
> > ...
> > >>> dis.dis(thrd)
> >   1   0 RESUME   0
> >
> >   2   2 LOAD_FAST_CHECK  0 (x)
> >   4 LOAD_CONST   1 (1)
> >   6 BINARY_OP   13 (+=)
> >  10 STORE_FAST   0 (x)
> >  12 LOAD_CONST   0 (None)
> >  14 RETURN_VALUE
> 
> >
> > The exact same sequence: load, add, store. Still not atomic.
>
> And yet, it appears that *something* changed between Python 2
> and Python 3 such that it *is* atomic:

I don't think that's a guarantee. You might be unable to make it
break, but that doesn't mean it's dependable.

In any case, it's not the GIL that's doing this. It might be a quirk
of the current implementation of the core evaluation loop, or it might
be something unrelated, but whatever it is, removing the GIL wouldn't
change that; and it's certainly no different whether it's a global or
an attribute of an object.

ChrisA
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Is there a more efficient threading lock?

2023-02-26 Thread Skip Montanaro
> And yet, it appears that *something* changed between Python 2 and
> Python 3 such that it *is* atomic:

I haven't looked, but something to check in the source is opcode
prediction. It's possible that after the BINARY_OP executes, opcode
prediction jumps straight to the STORE_FAST opcode, avoiding the transfer
to the top of the virtual machine loop. That would (I think) avoid checks
related to GIL release and thread switches.

I don't guarantee that's what's going on, and even if I'm correct, I don't
think you can rely on it.

Skip
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Is there a more efficient threading lock?

2023-02-26 Thread Jon Ribbens via Python-list
On 2023-02-26, Chris Angelico  wrote:
> On Sun, 26 Feb 2023 at 16:16, Jon Ribbens via Python-list
> wrote:
>> On 2023-02-25, Paul Rubin  wrote:
>> > The GIL is an evil thing, but it has been around for so long that most
>> > of us have gotten used to it, and some user code actually relies on it.
>> > For example, with the GIL in place, a statement like "x += 1" is always
>> > atomic, I believe.  But, I think it is better to not have any shared
>> > mutables regardless.
>>
>> I think it is the case that x += 1 is atomic but foo.x += 1 is not.
>> Any replacement for the GIL would have to keep the former at least,
>> plus the fact that you can do hundreds of things like list.append(foo)
>> which are all effectively atomic.
>
> The GIL is most assuredly *not* an evil thing. If you think it's so
> evil, go ahead and remove it, because we'll clearly be better off
> without it, right?

If you say so. I said nothing whatsoever about the GIL being evil.

> As it turns out, most GIL-removal attempts have had a fairly nasty
> negative effect on performance. The GIL is a huge performance boost.
>
> As to what is atomic and what is not... it's complicated, as always.
> Suppose that x (or foo.x) is a custom type:

Yes, sure, you can make x += 1 not work even single-threaded if you
make custom types which override basic operations. I'm talking about
when you're dealing with simple atomic built-in types such as integers.

> Here's the equivalent with just incrementing a global:
>
> >>> def thrd():
> ... x += 1
> ...
> >>> dis.dis(thrd)
>   1   0 RESUME   0
>
>   2   2 LOAD_FAST_CHECK  0 (x)
>   4 LOAD_CONST   1 (1)
>   6 BINARY_OP   13 (+=)
>  10 STORE_FAST   0 (x)
>  12 LOAD_CONST   0 (None)
>  14 RETURN_VALUE

>
> The exact same sequence: load, add, store. Still not atomic.

And yet, it appears that *something* changed between Python 2
and Python 3 such that it *is* atomic:

import sys, threading

class Foo:
    x = 0

foo = Foo()
y = 0

def thrd():
    global y
    for _ in range(1):
        foo.x += 1
        y += 1

threads = [threading.Thread(target=thrd) for _ in range(50)]
for t in threads: t.start()
for t in threads: t.join()
print(sys.version)
print(foo.x, y)

2.7.5 (default, Jun 28 2022, 15:30:04)
[GCC 4.8.5 20150623 (Red Hat 4.8.5-44)]
(64489, 59854)

3.8.10 (default, Nov 14 2022, 12:59:47)
[GCC 9.4.0]
50 50

-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Is there a more efficient threading lock?

2023-02-26 Thread Jon Ribbens via Python-list
On 2023-02-26, Barry Scott  wrote:
> On 25/02/2023 23:45, Jon Ribbens via Python-list wrote:
>> I think it is the case that x += 1 is atomic but foo.x += 1 is not.
>
> No that is not true, and has never been true.
>
>:>>> def x(a):
>:...    a += 1
>:...
>:>>>
>:>>> dis.dis(x)
>   1   0 RESUME   0
>
>   2   2 LOAD_FAST    0 (a)
>   4 LOAD_CONST   1 (1)
>   6 BINARY_OP   13 (+=)
>  10 STORE_FAST   0 (a)
>  12 LOAD_CONST   0 (None)
>  14 RETURN_VALUE
>:>>>
>
> As you can see there are 4 byte code ops executed.
>
> Python's eval loop can switch to another thread between any of them.
>
> It is not true that the GIL provides atomic operations in Python.

That's oversimplifying to the point of falsehood (just as the opposite
would be too). And: see my other reply in this thread just now - if the
GIL isn't making "x += 1" atomic, something else is.
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Is there a more efficient threading lock?

2023-02-26 Thread Barry Scott


On 25/02/2023 23:45, Jon Ribbens via Python-list wrote:

I think it is the case that x += 1 is atomic but foo.x += 1 is not.


No that is not true, and has never been true.

:>>> def x(a):
:...    a += 1
:...
:>>>
:>>> dis.dis(x)
 1   0 RESUME   0

 2   2 LOAD_FAST    0 (a)
 4 LOAD_CONST   1 (1)
 6 BINARY_OP   13 (+=)
10 STORE_FAST   0 (a)
12 LOAD_CONST   0 (None)
14 RETURN_VALUE
:>>>

As you can see there are 4 byte code ops executed.

Python's eval loop can switch to another thread between any of them.

It is not true that the GIL provides atomic operations in Python.

Barry

--
https://mail.python.org/mailman/listinfo/python-list


Re: Is there a more efficient threading lock?

2023-02-25 Thread Chris Angelico
On Sun, 26 Feb 2023 at 16:27, Dennis Lee Bieber  wrote:
>
> On Sat, 25 Feb 2023 15:41:52 -0600, Skip Montanaro
>  declaimed the following:
>
>
> >concurrent.futures.ThreadPoolExecutor() with the default number of workers (
> >os.cpu_count() * 1.5, or 12 threads on my system) to process each month, so
> >12 active threads at a time. Given that the process is pretty much CPU
> >bound, maybe reducing the number of workers to the CPU count would make
>
> Unless things have improved a lot over the years, the GIL still limits
> active threads to the equivalent of a single CPU. The OS may swap among
> which CPU as it schedules system processes, but only one thread will be
> running at any moment regardless of CPU count.

Specifically, a single CPU core *executing Python bytecode*. There are
quite a few libraries that release the GIL during computation. Here's
a script that's quite capable of saturating my CPU entirely - in fact,
typing this email is glitchy due to lack of resources:

import threading
import bcrypt
results = [0, 0]
def thrd():
    for _ in range(10):
        ok = bcrypt.checkpw(b"password",
            b'$2b$15$DGDXMb2zvPotw1rHFouzyOVzSopiLIUSedO5DVGQ1GblAd6L6I8/6')
        results[ok] += 1

threads = [threading.Thread(target=thrd) for _ in range(100)]
for t in threads: t.start()
for t in threads: t.join()
print(results)

I have four cores eight threads, and yeah, my CPU's not exactly the
latest and greatest (i7 6700k - it was quite good some years ago, but
outstripped now), but feel free to crank the numbers if you want to.

I'm pretty sure bcrypt won't use more than one CPU core for a single
hashpw/checkpw call, but by releasing the GIL during the hard number
crunching, it allows easy parallelization. Same goes for numpy work,
or anything else that can be treated as a separate operation.

So it's more accurate to say that only one CPU core can be
*manipulating Python objects* at a time, although it's hard to pin
down exactly what that means, making it easier to say that there can
only be one executing Python bytecode; it should be possible for any
function call into a C library to be a point where other threads can
take over (most notably, any sort of I/O, but also these kinds of
things).

As mentioned, GIL-removal has been under discussion at times, most
recently (and currently) with PEP 703
https://peps.python.org/pep-0703/ - and the benefits in multithreaded
applications always have to be factored against quite significant
performance penalties. It's looking like PEP 703's proposal has the
least-bad performance measurements of any GILectomy I've seen so far,
showing 10% worse performance on average (possibly able to be reduced
to 5%). As it happens, a GIL just makes sense when you want pure, raw
performance, and it's only certain workloads that suffer under it.

ChrisA
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Is there a more efficient threading lock?

2023-02-25 Thread Chris Angelico
On Sun, 26 Feb 2023 at 16:16, Jon Ribbens via Python-list
 wrote:
>
> On 2023-02-25, Paul Rubin  wrote:
> > The GIL is an evil thing, but it has been around for so long that most
> > of us have gotten used to it, and some user code actually relies on it.
> > For example, with the GIL in place, a statement like "x += 1" is always
> > atomic, I believe.  But, I think it is better to not have any shared
> > mutables regardless.
>
> I think it is the case that x += 1 is atomic but foo.x += 1 is not.
> Any replacement for the GIL would have to keep the former at least,
> plus the fact that you can do hundreds of things like list.append(foo)
> which are all effectively atomic.

The GIL is most assuredly *not* an evil thing. If you think it's so
evil, go ahead and remove it, because we'll clearly be better off
without it, right?

As it turns out, most GIL-removal attempts have had a fairly nasty
negative effect on performance. The GIL is a huge performance boost.

As to what is atomic and what is not... it's complicated, as always.
Suppose that x (or foo.x) is a custom type:

class Thing:
    def __iadd__(self, other):
        print("Hi, I'm being added onto!")
        self.increment_by(other)
        return self

Then no, neither of these is atomic, although if the increment itself
is, it probably won't matter. As far as I know, the only way that it
would be at all different for x+=1 and foo.x+=1 would be if the
__iadd__ method both mutates and returns something other than self,
which is quite unusual. (Most incrementing is done by either
constructing a new object to return, or mutating the existing one, but
not a hybrid.)
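A contrived sketch of such a hybrid, purely for illustration; here the
in-place add both mutates the existing object and rebinds the name to a
fresh one:

class Hybrid:
    def __init__(self, n):
        self.n = n
    def __iadd__(self, other):
        self.n += other        # mutation, visible through old references
        return Hybrid(self.n)  # ...but the name gets rebound to a copy

a = b = Hybrid(0)
a += 1
print(a is b, a.n, b.n)  # False 1 1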

Consider this:

import threading
d = {0:0, 1:0, 2:0, 3:0}
def thrd():
    for _ in range(1):
        d[0] += 1
        d[1] += 1
        d[2] += 1
        d[3] += 1

threads = [threading.Thread(target=thrd) for _ in range(50)]
for t in threads: t.start()
for t in threads: t.join()
print(d)

Is this code guaranteed to result in 50 in every slot in the
dictionary? What if you replace the dictionary with a four-element
list? Do you need a GIL for this, or some other sort of lock? What
exactly is it that is needed? To answer that question, let's look at
exactly what happens in the disassembly:

>>> def thrd():
... d[0] += 1
... d[1] += 1
...
>>> import dis
>>> dis.dis(thrd)
  1   0 RESUME   0

  2   2 LOAD_GLOBAL  0 (d)
 14 LOAD_CONST   1 (0)
 16 COPY 2
 18 COPY 2
 20 BINARY_SUBSCR
 30 LOAD_CONST   2 (1)
 32 BINARY_OP   13 (+=)
 36 SWAP 3
 38 SWAP 2
 40 STORE_SUBSCR

  3  44 LOAD_GLOBAL  0 (d)
 56 LOAD_CONST   2 (1)
 58 COPY 2
 60 COPY 2
 62 BINARY_SUBSCR
 72 LOAD_CONST   2 (1)
 74 BINARY_OP   13 (+=)
 78 SWAP 3
 80 SWAP 2
 82 STORE_SUBSCR
 86 LOAD_CONST   0 (None)
 88 RETURN_VALUE
>>>

(Your exact disassembly may differ, this was on CPython 3.12.)
Crucially, note these three instructions that occur in each block:
BINARY_SUBSCR, BINARY_OP, and STORE_SUBSCR. Those are a lookup
(retrieving the value of d[0]), the actual addition (adding one to the
value), and a store (putting the result back into d[0]). So it's
actually not guaranteed to be atomic; it would be perfectly reasonable
to interrupt that sequence and have something else do another
subscript.

Here's the equivalent with just incrementing a global:

>>> def thrd():
... x += 1
...
>>> dis.dis(thrd)
  1   0 RESUME   0

  2   2 LOAD_FAST_CHECK  0 (x)
  4 LOAD_CONST   1 (1)
  6 BINARY_OP   13 (+=)
 10 STORE_FAST   0 (x)
 12 LOAD_CONST   0 (None)
 14 RETURN_VALUE
>>>

The exact same sequence: load, add, store. Still not atomic.

General takeaway: The GIL is a performance feature, not a magic
solution, and certainly not an evil beast that must be slain at any
cost. Attempts to remove it always have to provide equivalent
protection in some other way. But the protection you think you have
might not be what you actually have.

ChrisA
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Is there a more efficient threading lock?

2023-02-25 Thread Dennis Lee Bieber
On Sat, 25 Feb 2023 15:41:52 -0600, Skip Montanaro
 declaimed the following:


>concurrent.futures.ThreadPoolExecutor() with the default number of workers (
>os.cpu_count() * 1.5, or 12 threads on my system) to process each month, so
>12 active threads at a time. Given that the process is pretty much CPU
>bound, maybe reducing the number of workers to the CPU count would make

Unless things have improved a lot over the years, the GIL still limits
active threads to the equivalent of a single CPU. The OS may swap among
which CPU as it schedules system processes, but only one thread will be
running at any moment regardless of CPU count.

Common wisdom is that Python threading works well for I/O bound
systems, where each thread spends most of its time waiting for some I/O
operation to complete -- thereby allowing the OS to schedule other threads.

For CPU bound work, use of the multiprocessing package may be more suited --
though you'll have to devise a working IPC system to transfer data to/from
the separate processes (no shared objects, as is possible with threads).
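A minimal sketch of that approach (the worker function here is a
stand-in for the real CPU-bound task):

import multiprocessing as mp

def extract(text):
    # stand-in for a CPU-bound computation
    return len(text.split())

if __name__ == '__main__':
    texts = ['one two', 'three four five', 'six']
    # Arguments and results are pickled over pipes; each worker process
    # has its own interpreter (and its own GIL), so all cores can be used.
    with mp.Pool() as pool:
        print(pool.map(extract, texts))  # [2, 3, 1]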


-- 
Wulfraed Dennis Lee Bieber AF6VN
wlfr...@ix.netcom.com    http://wlfraed.microdiversity.freeddns.org/
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Is there a more efficient threading lock?

2023-02-25 Thread Jon Ribbens via Python-list
On 2023-02-25, Paul Rubin  wrote:
> Jon Ribbens  writes:
>>> 1) you generally want to use RLock rather than Lock
>> Why?
>
> So that a thread that tries to acquire it twice doesn't block itself,
> etc.  Look at the threading lib docs for more info.

Yes, I know what the docs say, I was asking why you were making the
statement above. I haven't used Lock very often, but I've literally
never once in 25 years needed to use RLock. As you say, it's best
to keep the lock-protected code brief, so it's usually pretty
obvious that the code can't be re-entered.
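For reference, a minimal demonstration of the difference; a Lock does not
track ownership, while an RLock counts acquisitions by the owning thread:

import threading

def reacquirable(lk):
    with lk:
        ok = lk.acquire(blocking=False)  # try to take it again
        if ok:
            lk.release()
        return ok

print(reacquirable(threading.RLock()))  # True: owner may re-acquire
print(reacquirable(threading.Lock()))   # False: would deadlock if blocking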

>> What does this mean? Are you saying the GIL has been removed?
>
> Last I heard there was an experimental version of CPython with the GIL
> removed.  It is supposed to take less of a performance hit due to
> INCREF/DECREF than an earlier attempt some years back.  I don't know its
> current status.
>
> The GIL is an evil thing, but it has been around for so long that most
> of us have gotten used to it, and some user code actually relies on it.
> For example, with the GIL in place, a statement like "x += 1" is always
> atomic, I believe.  But, I think it is better to not have any shared
> mutables regardless.

I think it is the case that x += 1 is atomic but foo.x += 1 is not.
Any replacement for the GIL would have to keep the former at least,
plus the fact that you can do hundreds of things like list.append(foo)
which are all effectively atomic.
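A quick check of that property on CPython (an observation about the
implementation, not a language guarantee):

import threading

items = []

def worker():
    for i in range(10000):
        # list.append is a single call into C, so under the GIL no
        # appends are lost even without an explicit lock.
        items.append(i)

threads = [threading.Thread(target=worker) for _ in range(8)]
for t in threads: t.start()
for t in threads: t.join()
print(len(items))  # 80000 on CPython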
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Is there a more efficient threading lock?

2023-02-25 Thread Jon Ribbens via Python-list
On 2023-02-25, Paul Rubin  wrote:
> Skip Montanaro  writes:
>> from threading import Lock
>
> 1) you generally want to use RLock rather than Lock

Why?

> 2) I have generally felt that using locks at the app level at all is an
> antipattern.  The main way I've stayed sane in multi-threaded Python
> code is to have every mutable strictly owned by exactly one thread, pass
> values around using Queues, and have an event loop in each thread taking
> requests from Queues.
>
> 3) I didn't know that no-gil was a now thing and I'm used to having the
> GIL.  So I would have considered the multiprocessing module rather than
> threading, for something like this.

What does this mean? Are you saying the GIL has been removed?
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Is there a more efficient threading lock?

2023-02-25 Thread Barry
Re sqlite and threads: the C API can be compiled to be thread-safe, from my
reading of the sqlite docs. What I have not checked is how Python's bundled
sqlite is compiled. There are claims that Python's sqlite is not thread-safe.

Barry


-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Is there a more efficient threading lock?

2023-02-25 Thread Thomas Passin

On 2/25/2023 4:41 PM, Skip Montanaro wrote:

Thanks for the responses.

Peter wrote:


Which OS is this?


MacOS Ventura 13.1, M1 MacBook Pro (eight cores).

Thomas wrote:

 > I'm no expert on locks, but you don't usually want to keep a lock while
 > some long-running computation goes on.  You want the computation to be
 > done by a separate thread, put its results somewhere, and then notify
 > the choreographing thread that the result is ready.

In this case I'm extracting the noun phrases from the body of an email
message (returned as a list). I have a collection of email messages
organized by month (typically 1000 to 3000 messages per month). I'm
using concurrent.futures.ThreadPoolExecutor() with the default number
of workers (os.cpu_count() * 1.5, or 12 threads on my system) to
process each month, so 12 active threads at a time. Given that the
process is pretty much CPU bound, maybe reducing the number of workers
to the CPU count would make sense. Processing of each email message
enters that with block once. That's about as minimal as I can make it.
I thought for a bit about pushing the textblob stuff into a separate
worker thread, but it wasn't obvious how to set up queues to handle
the communication between the threads created by ThreadPoolExecutor()
and the worker thread. Maybe I'll think about it harder. (I have a
related problem with SQLite, since an open database can't be
manipulated from multiple threads. That makes much of the program's
end-of-run processing single-threaded.)


If the noun extractor is single-threaded (which I think you mentioned), 
no amount of parallel access is going to help.  The best you can do is 
to queue up requests so that as soon as the noun extractor returns from 
one call, it gets handed another blob.  The CPU will be busy all the 
time running the noun-extraction code.


If that's the case, you might just as well eliminate all the threads and 
just do it sequentially in the most obvious and simple manner.


It would possibly be worth while to try this approach out and see what 
happens to the CPU usage and overall computation time.



 > This link may be helpful -
 >
 > https://anandology.com/blog/using-iterators-and-generators/ 



I don't think that's where my problem is. The lock protects the 
generation of the noun phrases. My loop which does the yielding operates 
outside of that lock's control. The version of the code is my latest, in 
which I tossed out a bunch of phrase-processing code (effectively dead 
end ideas for processing the phrases). Replacing the for loop with a 
simple return seems not to have any effect. In any case, the caller 
which uses the phrases does a fair amount of extra work with the 
phrases, populating a SQLite database, so I don't think the amount of 
time it takes to process a single email message is dominated by the 
phrase generation.


Here's timeit output for the noun_phrases code:

% python -m timeit -s 'text = """`python -m timeit --help`""" ; from 
textblob import TextBlob ; from textblob.np_extractors import 
ConllExtractor ; ext = ConllExtractor() ; phrases = TextBlob(text, 
np_extractor=ext).noun_phrases' 'phrases = TextBlob(text, 
np_extractor=ext).noun_phrases'

5000 loops, best of 5: 98.7 usec per loop

I process the output of timeit's help message which looks to be about 
the same length as a typical email message, certainly the same order of 
magnitude. Also, note that I call it once in the setup to eliminate the 
initial training of the ConllExtractor instance. I don't know if ~100us 
qualifies as long running or not.


I'll keep messing with it.

Skip


--
https://mail.python.org/mailman/listinfo/python-list


Re: Is there a more efficient threading lock?

2023-02-25 Thread Weatherby,Gerard


“I'm no expert on locks, but you don't usually want to keep a lock while
some long-running computation goes on.  You want the computation to be
done by a separate thread, put its results somewhere, and then notify
the choreographing thread that the result is ready.”

Maybe. There are so many possible threaded application designs I’d hesitate to 
make a general statement.

The threading.Lock.acquire method has flags for both a non-blocking attempt and 
a timeout, so a valid design could include a long-running computation with a 
main thread or event loop polling the thread. Or the thread could signal a main 
loop some other way.
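A minimal sketch of those two modes:

import threading

lock = threading.Lock()

# Non-blocking attempt: returns False immediately if the lock is held.
if lock.acquire(blocking=False):
    try:
        pass  # critical section
    finally:
        lock.release()

# Timed attempt: give up after two seconds instead of blocking forever.
if lock.acquire(timeout=2.0):
    try:
        pass  # critical section
    finally:
        lock.release()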

I’ve written some code that coordinated threads by having a process talk to 
itself using a socket.socketpair. The advantage is that you can bundle multiple 
items (sockets, file handles, a polling timeout) into a select.select call 
which waits without consuming resources (at least on Linux) until
something interesting happens.
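A minimal sketch of that pattern (details illustrative): the worker
signals completion over one end of a socketpair, and the main loop waits
on the other end with select, alongside anything else it is watching:

import select
import socket
import threading

notify_rx, notify_tx = socket.socketpair()

def worker():
    # ... long-running computation would go here ...
    notify_tx.send(b'done')  # wake the select() below

threading.Thread(target=worker).start()

# Wait up to 30 seconds for any watched descriptor to become readable.
readable, _, _ = select.select([notify_rx], [], [], 30.0)
if notify_rx in readable:
    print(notify_rx.recv(4))  # b'done'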


-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Is there a more efficient threading lock?

2023-02-25 Thread Skip Montanaro
Thanks for the responses.

Peter wrote:

> Which OS is this?

MacOS Ventura 13.1, M1 MacBook Pro (eight cores).

Thomas wrote:

> I'm no expert on locks, but you don't usually want to keep a lock while
> some long-running computation goes on.  You want the computation to be
> done by a separate thread, put its results somewhere, and then notify
> the choreographing thread that the result is ready.

In this case I'm extracting the noun phrases from the body of an email
message (returned as a list). I have a collection of email messages
organized by month (typically 1000 to 3000 messages per month). I'm using
concurrent.futures.ThreadPoolExecutor() with the default number of workers (
os.cpu_count() * 1.5, or 12 threads on my system) to process each month, so
12 active threads at a time. Given that the process is pretty much CPU
bound, maybe reducing the number of workers to the CPU count would make
sense. Processing of each email message enters that with block once. That's
about as minimal as I can make it. I thought for a bit about pushing the
textblob stuff into a separate worker thread, but it wasn't obvious how to
set up queues to handle the communication between the threads created by
ThreadPoolExecutor() and the worker thread. Maybe I'll think about it
harder. (I have a related problem with SQLite, since an open database can't
be manipulated from multiple threads. That makes much of the program's
end-of-run processing single-threaded.)
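One possible wiring, sketched with the TextBlob call stubbed out (the
real call is shown in the comment): a single worker thread owns all
access to the non-thread-safe library, and each pool thread hands it
work through a shared Queue along with a one-slot reply queue of its own:

import queue
import threading

requests = queue.Queue()

def extractor_worker():
    # The only thread that touches the non-thread-safe library.
    while True:
        text, reply = requests.get()
        if text is None:        # sentinel: shut down
            break
        # phrases = TextBlob(text, np_extractor=EXTRACTOR).noun_phrases
        phrases = text.split()  # stand-in for the real extraction
        reply.put(list(phrases))

threading.Thread(target=extractor_worker, daemon=True).start()

def get_terms(text):
    # Safe to call from any pool thread; blocks only for its own result.
    reply = queue.Queue(maxsize=1)
    requests.put((text, reply))
    return reply.get()

print(get_terms('one two three'))  # ['one', 'two', 'three']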

> This link may be helpful -
>
> https://anandology.com/blog/using-iterators-and-generators/

I don't think that's where my problem is. The lock protects the generation
of the noun phrases. My loop which does the yielding operates outside of
that lock's control. The version of the code is my latest, in which I
tossed out a bunch of phrase-processing code (effectively dead end ideas
for processing the phrases). Replacing the for loop with a simple return
seems not to have any effect. In any case, the caller which uses the
phrases does a fair amount of extra work with the phrases, populating a
SQLite database, so I don't think the amount of time it takes to process a
single email message is dominated by the phrase generation.

Here's timeit output for the noun_phrases code:

% python -m timeit -s 'text = """`python -m timeit --help`""" ; from
textblob import TextBlob ; from textblob.np_extractors import
ConllExtractor ; ext = ConllExtractor() ; phrases = TextBlob(text,
np_extractor=ext).noun_phrases' 'phrases = TextBlob(text,
np_extractor=ext).noun_phrases'
5000 loops, best of 5: 98.7 usec per loop

I process the output of timeit's help message which looks to be about the
same length as a typical email message, certainly the same order of
magnitude. Also, note that I call it once in the setup to eliminate the
initial training of the ConllExtractor instance. I don't know if ~100us
qualifies as long running or not.

I'll keep messing with it.

Skip
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Is there a more efficient threading lock?

2023-02-25 Thread Peter J. Holzer
On 2023-02-25 09:52:15 -0600, Skip Montanaro wrote:
> BLOB_LOCK = Lock()
> 
> def get_terms(text):
>     with BLOB_LOCK:
>         phrases = TextBlob(text, np_extractor=EXTRACTOR).noun_phrases
>     for phrase in phrases:
>         yield phrase
> 
> When I monitor the application using py-spy, that with statement is
> consuming huge amounts of CPU.

Another thought:

How accurate is py-spy? Is it possible that it assigns time actually
spent in 
phrases = TextBlob(text, np_extractor=EXTRACTOR).noun_phrases
to
with BLOB_LOCK:
?

hp

-- 
   _  | Peter J. Holzer    | Story must make more sense than reality.
|_|_) |                    |
| |   | h...@hjp.at        |-- Charles Stross, "Creative writing
__/   | http://www.hjp.at/ |   challenge!"


-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Is there a more efficient threading lock?

2023-02-25 Thread Thomas Passin

On 2/25/2023 10:52 AM, Skip Montanaro wrote:

I have a multi-threaded program which calls out to a non-thread-safe
library (not mine) in a couple places. I guard against multiple
threads executing code there using threading.Lock. The code is
straightforward:

from threading import Lock

# Something in textblob and/or nltk doesn't play nice with no-gil, so just
# serialize all blobby accesses.
BLOB_LOCK = Lock()

def get_terms(text):
    with BLOB_LOCK:
        phrases = TextBlob(text, np_extractor=EXTRACTOR).noun_phrases
    for phrase in phrases:
        yield phrase

When I monitor the application using py-spy, that with statement is
consuming huge amounts of CPU. Does threading.Lock.acquire() sleep
anywhere? I didn't see anything obvious poking around in the C code
which implements this stuff. I'm no expert though, so could easily
have missed something.


I'm no expert on locks, but you don't usually want to keep a lock while 
some long-running computation goes on.  You want the computation to be 
done by a separate thread, put its results somewhere, and then notify 
the choreographing thread that the result is ready.


This link may be helpful -

https://anandology.com/blog/using-iterators-and-generators/

--
https://mail.python.org/mailman/listinfo/python-list


Re: Is there a more efficient threading lock?

2023-02-25 Thread Peter J. Holzer
On 2023-02-25 09:52:15 -0600, Skip Montanaro wrote:
> I have a multi-threaded program which calls out to a non-thread-safe
> library (not mine) in a couple places. I guard against multiple
> threads executing code there using threading.Lock. The code is
> straightforward:
> 
> from threading import Lock
> 
> # Something in textblob and/or nltk doesn't play nice with no-gil, so just
> # serialize all blobby accesses.
> BLOB_LOCK = Lock()
> 
> def get_terms(text):
>     with BLOB_LOCK:
>         phrases = TextBlob(text, np_extractor=EXTRACTOR).noun_phrases
>     for phrase in phrases:
>         yield phrase
> 
> When I monitor the application using py-spy, that with statement is
> consuming huge amounts of CPU.

Which OS is this?

> Does threading.Lock.acquire() sleep anywhere?

On Linux it calls futex(2), which does sleep if it can't get the lock
right away. (Of course if it does get the lock, it will return
immediately which may use a lot of CPU if you are calling it a lot.)

hp


-- 
   _  | Peter J. Holzer    | Story must make more sense than reality.
|_|_) |                    |
| |   | h...@hjp.at        |-- Charles Stross, "Creative writing
__/   | http://www.hjp.at/ |   challenge!"


-- 
https://mail.python.org/mailman/listinfo/python-list


Is there a more efficient threading lock?

2023-02-25 Thread Skip Montanaro
I have a multi-threaded program which calls out to a non-thread-safe
library (not mine) in a couple places. I guard against multiple
threads executing code there using threading.Lock. The code is
straightforward:

from threading import Lock

# Something in textblob and/or nltk doesn't play nice with no-gil, so just
# serialize all blobby accesses.
BLOB_LOCK = Lock()

def get_terms(text):
    with BLOB_LOCK:
        phrases = TextBlob(text, np_extractor=EXTRACTOR).noun_phrases
    for phrase in phrases:
        yield phrase

When I monitor the application using py-spy, that with statement is
consuming huge amounts of CPU. Does threading.Lock.acquire() sleep
anywhere? I didn't see anything obvious poking around in the C code
which implements this stuff. I'm no expert though, so could easily
have missed something.

Thx,

Skip
-- 
https://mail.python.org/mailman/listinfo/python-list