New submission from Karthikeyan Singaravelan <tir.kar...@gmail.com>:

I came across this as a result of issue35557 and opened a new issue to keep
the discussion separate. Currently, b16decode calls re.search with a literal
regex pattern on every invocation. Compiling that pattern once, as a
module-level constant, gives roughly a 30% speed-up (about 1.45x faster) on
Python 3.7 in the benchmark below. I am proposing a PR for this change since
it looks safe to me.
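Why does precompiling help at all? The re documentation notes that patterns passed to the module-level matching functions are cached, so the stock code presumably does not recompile the regex on every call. The saving more likely comes from skipping the module-level re.search wrapper and its cache lookup. The check itself should be unchanged. The following minimal sketch compares the two forms on a few inputs. The helper names has_non_hex_per_call and has_non_hex_precompiled are hypothetical and exist only for this illustration.

```python
import re

# Proposed form: a pattern object compiled once, at import time.
_B16DECODE_PAT = re.compile(b'[^0-9A-F]')

def has_non_hex_per_call(s: bytes) -> bool:
    # Current form: module-level re.search, which looks the pattern up in
    # re's internal cache on every call.
    return re.search(b'[^0-9A-F]', s) is not None

def has_non_hex_precompiled(s: bytes) -> bool:
    # Proposed form: reuse the precompiled pattern object directly.
    return _B16DECODE_PAT.search(s) is not None

# Valid, lowercase, non-hex, trailing-newline and empty inputs.
samples = [b'', b'0123456789ABCDEF', b'0123abcd', b'01G2', b'AB\n', b'806903D0']
for sample in samples:
    assert has_non_hex_per_call(sample) == has_non_hex_precompiled(sample), sample
```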

$ python3 -m perf compare_to default.json optimized.json --table
+--------------------+---------+------------------------------+
| Benchmark          | default | optimized                    |
+====================+=========+==============================+
| b16decode          | 2.97 us | 2.03 us: 1.46x faster (-32%) |
+--------------------+---------+------------------------------+
| b16decode_casefold | 3.18 us | 2.19 us: 1.45x faster (-31%) |
+--------------------+---------+------------------------------+
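The table reports two related figures: a speed-up factor (default time / optimized time) and a change in time relative to the default. The following sketch recomputes both from the rounded means in the table. It assumes perf derives both figures from these mean timings; the helper name summarize is hypothetical.

```python
def summarize(default_us, optimized_us):
    """Return (speed-up factor, percent change in time) for two mean timings."""
    speedup = default_us / optimized_us
    percent_change = (optimized_us - default_us) / default_us * 100
    return speedup, percent_change

# Rounded mean timings (microseconds) copied from the perf table above.
for name, default_us, optimized_us in [
    ("b16decode", 2.97, 2.03),
    ("b16decode_casefold", 3.18, 2.19),
]:
    speedup, change = summarize(default_us, optimized_us)
    print(f"{name}: {speedup:.2f}x faster ({change:+.0f}%)")
```

Running it prints "b16decode: 1.46x faster (-32%)" and "b16decode_casefold: 1.45x faster (-31%)", which agrees with the table.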

Benchmark script:

import perf
import re
import binascii
import base64

# Compiled once at import time instead of being looked up on every call.
_B16DECODE_PAT = re.compile(b'[^0-9A-F]')

def b16decode_re_compiled_search(s, casefold=False):
    s = base64._bytes_from_decode_data(s)
    if casefold:
        s = s.upper()
    if _B16DECODE_PAT.search(s):
        raise binascii.Error('Non-base16 digit found')
    return binascii.unhexlify(s)

if __name__ == "__main__":
    hex_data = "806903d098eb50957b1b376385f233bb3a5d54f54191c8536aefee21fc9ba3ca"
    hex_data_upper = hex_data.upper()

    assert base64.b16decode(hex_data_upper) == b16decode_re_compiled_search(hex_data_upper)
    assert base64.b16decode(hex_data, casefold=True) == b16decode_re_compiled_search(hex_data, casefold=True)

    runner = perf.Runner()
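    # True benchmarks the precompiled-pattern version (optimized.json);
    # set to False to benchmark stock base64.b16decode (default.json).
    if True: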
        runner.timeit(name="b16decode",
                      stmt="b16decode_re_compiled_search(hex_data_upper)",
                      setup="from __main__ import b16decode_re_compiled_search, hex_data, hex_data_upper")
        runner.timeit(name="b16decode_casefold",
                      stmt="b16decode_re_compiled_search(hex_data, casefold=True)",
                      setup="from __main__ import b16decode_re_compiled_search, hex_data, hex_data_upper")
    else:
        runner.timeit(name="b16decode",
                      stmt="base64.b16decode(hex_data_upper)",
                      setup="from __main__ import hex_data, hex_data_upper; import base64")
        runner.timeit(name="b16decode_casefold",
                      stmt="base64.b16decode(hex_data, casefold=True)",
                      setup="from __main__ import hex_data, hex_data_upper; import base64")
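If perf (since renamed pyperf) is not installed, the following stdlib-only sketch gives a rough comparison using timeit. It takes the best of several repeats, so it is noisier and less rigorous than perf's calibrated multi-process runs, and absolute numbers will differ by machine. The helper best_per_call_us is hypothetical. base64._bytes_from_decode_data is a private helper, used here only because the benchmark script above also relies on it.

```python
import base64
import binascii
import re
import timeit

# Pattern compiled once at import time, as in the proposed change.
_B16DECODE_PAT = re.compile(b'[^0-9A-F]')

def b16decode_re_compiled_search(s, casefold=False):
    # Same logic as base64.b16decode, but reusing the precompiled pattern.
    s = base64._bytes_from_decode_data(s)
    if casefold:
        s = s.upper()
    if _B16DECODE_PAT.search(s):
        raise binascii.Error('Non-base16 digit found')
    return binascii.unhexlify(s)

def best_per_call_us(func, *args, number=20_000, repeat=5, **kwargs):
    """Return the fastest observed per-call time of func(*args, **kwargs), in microseconds."""
    timer = timeit.Timer(lambda: func(*args, **kwargs))
    return min(timer.repeat(repeat=repeat, number=number)) / number * 1e6

if __name__ == "__main__":
    hex_data = "806903d098eb50957b1b376385f233bb3a5d54f54191c8536aefee21fc9ba3ca"
    hex_data_upper = hex_data.upper()
    for label, func in [("default", base64.b16decode),
                        ("optimized", b16decode_re_compiled_search)]:
        plain = best_per_call_us(func, hex_data_upper)
        folded = best_per_call_us(func, hex_data, casefold=True)
        print(f"{label}: b16decode {plain:.2f} us, b16decode_casefold {folded:.2f} us")
```

The lambda wrapper adds the same small overhead to both variants, so the comparison between them stays fair even though the absolute timings are slightly inflated.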

----------
assignee: xtreak
components: Library (Lib)
messages: 332330
nosy: djhoulihan, serhiy.storchaka, xtreak
priority: normal
severity: normal
status: open
title: Optimize base64.b16decode to use compiled regex
type: performance
versions: Python 3.8

_______________________________________
Python tracker <rep...@bugs.python.org>
<https://bugs.python.org/issue35559>
_______________________________________
_______________________________________________
Python-bugs-list mailing list