While working on #1767933, Serhiy observed that "monkey-patching" one of
the io base classes is faster than using BytesIO when you need a
file-like object to write into.

I've distilled it into this standalone test:

import io

data = [b'a'*10, b'bb'*5, b'ccc'*5] * 10000

def withbytesio():
    bio = io.BytesIO()
    for i in data:
        bio.write(i)
    return bio.getvalue()

def monkeypatching():
    mydata = []
    file = io.RawIOBase()
    file.writable = lambda: True
    file.write = mydata.append

    for i in data:
        file.write(i)
    return b''.join(mydata)

The second approach is consistently 10-20% faster than the first
(depending on the input) on trunk Python 3.3.
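For anyone who wants to reproduce the comparison, here is a minimal timing
harness around the two functions above, using the stdlib timeit module (the
exact numbers will of course vary by build and input):

```python
import io
import timeit

data = [b'a' * 10, b'bb' * 5, b'ccc' * 5] * 10000

def withbytesio():
    bio = io.BytesIO()
    for i in data:
        bio.write(i)
    return bio.getvalue()

def monkeypatching():
    mydata = []
    file = io.RawIOBase()
    file.writable = lambda: True
    file.write = mydata.append   # append returns None, but nothing checks it
    for i in data:
        file.write(i)
    return b''.join(mydata)

# Sanity check: both variants must build the same bytes object.
assert withbytesio() == monkeypatching() == b''.join(data)

for fn in (withbytesio, monkeypatching):
    print(fn.__name__, timeit.timeit(fn, number=20))
```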

Is there any reason for this? What does BytesIO give us that the second
approach does not? (I tried adding more methods to the patched RawIOBase
to make it more functional, like seekable() and tell(), and that doesn't
affect performance.)
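One capability BytesIO does have that an append-only list cannot easily
mimic is random access: you can seek back into the buffer and overwrite
bytes in place. A small sketch of the difference:

```python
import io

bio = io.BytesIO()
bio.write(b'hello world')
bio.seek(0)
bio.write(b'HELLO')  # overwrites the first five bytes in place
assert bio.getvalue() == b'HELLO world'

# The list-append shim only supports sequential writes; there is no
# contiguous buffer to seek into until the final b''.join().
```

Whether ET.tostring ever needs that capability is a separate question, but
it is one concrete thing the monkey-patched RawIOBase gives up.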

This also raises a "moral" question: should I be using the second approach
deep inside the stdlib (ET.tostring) just because it's faster?

Eli
_______________________________________________
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev