Memz,

Please keep your responses on the mailing list.


On Tue, Mar 02, 2021 at 08:07:39PM +0000, Barry Scott wrote:
> > On 2 Mar 2021, at 13:04, Memz <mmax42...@gmail.com> wrote:
> > 
> > There is no specific scenario it solves. The lack of efficiency of 
> > the timed code should speak for itself. Non-mutable bytes is a limit 
> > of python, since it's reliant on using function calls.

"Lack of efficiency" doesn't speak for itself.

You haven't shown how you benchmarked this, so we don't know if it is a 
valid comparison or not, but generally speaking I will allow that there 
is some function call overhead in Python. In this case you have:

- create a bytes string object;

- look up the name bytearray, which requires two dict lookups (one in 
  the global scope that fails, one in the builtins scope that succeeds);

- then call the function with the bytes string object as argument;

- and finally the bytes object is garbage collected.
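You can see most of those steps in the byte-code itself. Here's a 
minimal sketch using the dis module (this assumes CPython; the exact 
opcode names vary between versions, e.g. the call shows up as 
CALL_FUNCTION on 3.9 but as CALL on newer releases):

import dis

# Disassemble the expression used in the timing below: expect a
# LOAD_NAME for bytearray, a LOAD_CONST for the bytes literal, and
# then a call opcode.  The name lookup and the call are the parts a
# bare bytes literal never pays for.
dis.dis(compile("bytearray(b'abcdefghijklmnop')", "<example>", "eval"))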

So it's reasonable to assume that this has some overhead. The overhead 
might even be significant if, for example, you create a temporary 10 GB 
byte string so you can append one byte to the end. But we don't 
typically care about optimizing for such unusual and extreme cases.

If you are trying to squeeze out every last nanosecond of performance, 
you're probably using the wrong language. Or at least the wrong 
interpreter. You might like to try PyPy, or some of the other 
specialising interpreters. Or write your critical code in Cython, or use 
ctypes, or write it as a C extension.

But honestly, I expect that you are falling into the trap of premature 
optimization. I presume that once you have your mutable bytearray 
object, you're actually going to do some work with it. It is quite 
likely that for any real example, not made-up Mickey-Mouse toy code, the 
time it takes to initialise the byte array object will be a negligible 
fraction of the time it takes your application to actually process the 
byte array object.

Who cares if it takes 130 nanoseconds to initialise the byte array 
object, if you then go on to spend ten million nanoseconds working with 
it? We don't typically make large language changes for the sake of 
micro-benchmarks.

[steve ~]$ python3.9 -m timeit "bytearray(b'abcdefghijklmnop')"
2000000 loops, best of 5: 131 nsec per loop
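If you want to compare that against a plain bytes literal on your own 
machine, here's a rough sketch using the timeit module (the loop count 
is arbitrary, and the numbers will depend on your hardware and 
interpreter):

import timeit

# Time a bare bytes literal versus the bytearray() call.  The literal
# only has to load a constant; the call adds a name lookup, the call
# itself, and the construction of the new bytearray object.
loops = 10_000_000
literal = timeit.timeit("b'abcdefghijklmnop'", number=loops)
call = timeit.timeit("bytearray(b'abcdefghijklmnop')", number=loops)
print(f"bytes literal:    {literal / loops * 1e9:.1f} nsec per loop")
print(f"bytearray() call: {call / loops * 1e9:.1f} nsec per loop")

Whatever the figures on your machine, the point stands: the cost only 
matters if constructing bytearrays is nearly all your program does.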

It's not inconceivable that in a tight loop where you have to make many 
bytearrays but do very little with them, the initialisation cost is 
significant. But to justify adding literal syntax to the language, we 
would need strong evidence that the function call overhead is not only 
significant, but *frequently* a bottleneck in real code.

[Barry]
> All python byte code is interpreted by calling functions. They take 
> time and resources.

That's not entirely correct. Literals such as text strings, ints and 
floats get compiled directly into the byte-code. Now of course there is 
some overhead while executing the byte-code, but that doesn't include 
the heavy cost of a Python function call.
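
As a quick illustration, again assuming CPython 3.9 as above (newer 
versions use slightly different opcodes), disassembling the literal 
shows a single constant load and nothing else:

import dis

# A bytes literal, like an int or str literal, is stored in the code
# object's constants and loaded with one constant-load instruction;
# there is no name lookup and no call involved at run time.
dis.dis(compile("b'abcdefghijklmnop'", "<literal>", "eval"))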


-- 
Steve