[Python-Dev] Re: Python multithreading without the GIL

Guido van Rossum Sun, 31 Oct 2021 11:48:40 -0700

Remember that py stone is a terrible benchmark. It only exercises a few
byte codes and a modern CPU’s caching and branch prediction make minced
meat of those. Sam wrote a whole new register-based VM so perhaps that
exercises different byte codes.


On Sun, Oct 31, 2021 at 05:19 Skip Montanaro <skip.montan...@gmail.com>
wrote:

> Skip> 1. I use numpy arrays filled with random values, and the output
> array is also a numpy array. The vector multiplication is done in a simple
> for loop in my vecmul() function.
>
> CHB> probably doesn't make a difference for this exercise, but numpy
> arrays make lousy replacements for a  regular list ...
>
> Yeah, I don't think it should matter here. Both versions should be
> similarly penalized.
>
> Skip> The results were confusing, so I dredged up a copy of pystone to
> make sure I wasn't missing anything w.r.t. basic execution performance. I'm
> still confused, so will keep digging.
>
> CHB> I'll be interested to see what you find out :-)
>
> I'm still scratching my head. I was thinking there was something about the
> messaging between the main and worker threads, so I tweaked matmul.py to
> accept 0 as a number of threads. That means it would call matmul which
> would call vecmul directly. The original queue-using versions were simply
> renamed to matmul_t and vecmul_t.
>
> I am still confused. Here are the pystone numbers, nogil first, then the
> 3.9 git tip:
>
> (base) nogil_build% ./bin/python3 ~/cmd/pystone.py
> Pystone(1.1.1) time for 50000 passes = 0.137658
> This machine benchmarks at 363218 pystones/second
>
> (base) 3.9_build% ./bin/python3 ~/cmd/pystone.py
> Pystone(1.1.1) time for 50000 passes = 0.207102
> This machine benchmarks at 241427 pystones/second
>
> That suggests nogil is indeed a definite improvement over vanilla 3.9.
> However, here's a quick nogil v 3.9 timing run of my matrix multiplication,
> again, nogil followed by 3.9 tip:
>
> (base) nogil_build% time ./bin/python3 ~/tmp/matmul.py 0 100000
> a: (160, 625) b: (625, 320) result: (160, 320) -> 51200
>
> real 0m9.314s
> user 0m9.302s
> sys 0m0.012s
>
> (base) 3.9_build% time ./bin/python3 ~/tmp/matmul.py 0 100000
> a: (160, 625) b: (625, 320) result: (160, 320) -> 51200
>
> real 0m4.918s
> user 0m5.180s
> sys 0m0.380s
>
> What's up with that? Suddenly nogil is much slower than 3.9 tip. No
> threads are in use. I thought perhaps the nogil run somehow didn't use
> Sam's VM improvements, so I disassembled the two versions of vecmul. I
> won't bore you with the entire dis.dis output, but suffice it to say that
> Sam's instruction set appears to be in play:
>
> (base) nogil_build% PYTHONPATH=$HOME/tmp ./bin/python3/python3
> Python 3.9.0a4+ (heads/nogil:b0ee2c4740, Oct 30 2021, 16:23:03)
> [GCC 9.3.0] on linux
> Type "help", "copyright", "credits" or "license" for more information.
> >>> import matmul, dis
> >>> dis.dis(matmul.vecmul)
>  26           0 FUNC_HEADER             11 (11)
>
>  28           2 LOAD_CONST               2 (0.0)
>               4 STORE_FAST               2 (result)
>
>  29           6 LOAD_GLOBAL          3 254 ('len'; 254)
>               9 STORE_FAST               8 (.t3)
>              11 COPY                   9 0 (.t4 <- a)
>              14 CALL_FUNCTION          9 1 (.t4 to .t5)
>              18 STORE_FAST               5 (.t0)
> ...
>
> So I unboxed the two numpy arrays once and used lists of lists for the
> actual work. The nogil version still performs worse by about a factor of
> two:
>
> (base) nogil_build% time ./bin/python3 ~/tmp/matmul.py 0 100000
> a: (160, 625) b: (625, 320) result: (160, 320) -> 51200
>
> real 0m9.537s
> user 0m9.525s
> sys 0m0.012s
>
> (base) 3.9_build% time ./bin/python3 ~/tmp/matmul.py 0 100000
> a: (160, 625) b: (625, 320) result: (160, 320) -> 51200
>
> real 0m4.836s
> user 0m5.109s
> sys 0m0.365s
>
> Still scratching my head and am open to suggestions about what to try
> next. If anyone is playing along from home, I've updated my script:
>
> https://gist.github.com/smontanaro/80f788a506d2f41156dae779562fd08d
>
> I'm sure there are things I could have done more efficiently, but I would
> think both Python versions would be similarly penalized by dumb s**t I've
> done.
>
> Skip
>
>
> Skip
> _______________________________________________
> Python-Dev mailing list -- python-dev@python.org
> To unsubscribe send an email to python-dev-le...@python.org
> https://mail.python.org/mailman3/lists/python-dev.python.org/
> Message archived at
> https://mail.python.org/archives/list/python-dev@python.org/message/4JSJFOWQPZHUAUGDVRGIU6LTF7QNXTLD/
> Code of Conduct: http://python.org/psf/codeofconduct/
>
-- 
--Guido (mobile)

_______________________________________________
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-le...@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/SSLCURZJD5NLAYN5LFEZ4RJWU5YPQX65/
Code of Conduct: http://python.org/psf/codeofconduct/

[Python-Dev] Re: Python multithreading without the GIL

Reply via email to