https://github.com/python/cpython/commit/a966d94e76d91ef60f9912a98a3869f38ecd438b
commit: a966d94e76d91ef60f9912a98a3869f38ecd438b
branch: main
author: Gregory P. Smith <[email protected]>
committer: gpshead <[email protected]>
date: 2026-01-22T09:21:07-08:00
summary:
gh-144157: Optimize bytes.translate() by deferring change detection (GH-144158)
Optimize bytes.translate() by deferring change detection
Move the equality check out of the hot loop to allow better compiler
optimization. Instead of checking each byte during translation, perform
a single memcmp at the end to determine if the input can be returned
unchanged.
This allows compilers to unroll and pipeline the loops, resulting in ~2x
throughput improvement for medium-to-large inputs (tested on an AMD zen2).
No change observed on small inputs.
It will also be faster for bytes subclasses as those do not need change
detection.
files:
A
Misc/NEWS.d/next/Core_and_Builtins/2026-01-22-16-20-16.gh-issue-144157.dxyp7k.rst
M Objects/bytesobject.c
diff --git
a/Misc/NEWS.d/next/Core_and_Builtins/2026-01-22-16-20-16.gh-issue-144157.dxyp7k.rst
b/Misc/NEWS.d/next/Core_and_Builtins/2026-01-22-16-20-16.gh-issue-144157.dxyp7k.rst
new file mode 100644
index 00000000000000..ff62d739d7804c
--- /dev/null
+++
b/Misc/NEWS.d/next/Core_and_Builtins/2026-01-22-16-20-16.gh-issue-144157.dxyp7k.rst
@@ -0,0 +1,2 @@
+:meth:`bytes.translate` now allows the compiler to unroll its loop more
+usefully for a 2x speedup in the common no-deletions specified case.
diff --git a/Objects/bytesobject.c b/Objects/bytesobject.c
index 2b0925017f29e4..56de99bde11682 100644
--- a/Objects/bytesobject.c
+++ b/Objects/bytesobject.c
@@ -2237,11 +2237,15 @@ bytes_translate_impl(PyBytesObject *self, PyObject
*table,
/* If no deletions are required, use faster code */
for (i = inlen; --i >= 0; ) {
c = Py_CHARMASK(*input++);
- if (Py_CHARMASK((*output++ = table_chars[c])) != c)
- changed = 1;
- }
- if (!changed && PyBytes_CheckExact(input_obj)) {
- Py_SETREF(result, Py_NewRef(input_obj));
+ *output++ = table_chars[c];
+ }
+ /* Check if anything changed (for returning original object) */
+ /* We save this check until the end so that the compiler will */
+ /* unroll the loop above leading to MUCH faster code. */
+ if (PyBytes_CheckExact(input_obj)) {
+ if (memcmp(PyBytes_AS_STRING(input_obj), output_start, inlen) ==
0) {
+ Py_SETREF(result, Py_NewRef(input_obj));
+ }
}
PyBuffer_Release(&del_table_view);
PyBuffer_Release(&table_view);
_______________________________________________
Python-checkins mailing list -- [email protected]
To unsubscribe send an email to [email protected]
https://mail.python.org/mailman3//lists/python-checkins.python.org
Member address: [email protected]