On Fri, 21 Oct 2011 18:23:24 +0000 (GMT)
Richard Saunders <richismyn...@me.com> wrote:

> If both loops are the same unicode kind, we can add memcmp
> to unicode_compare for an optimization:
>
>     Py_ssize_t len = (len1 < len2) ? len1 : len2;
>
>     /* use memcmp if both the same kind */
>     if (kind1 == kind2) {
>         int result = memcmp(data1, data2, ((int)kind1) * len);
>         if (result != 0)
>             return result < 0 ? -1 : +1;
>     }

Hmm, you have to be a bit subtler than that: on a little-endian
machine, you can't compare two characters by comparing their byte
representations in memory order. So memcmp() can only be used for the
one-byte representation. (Actually, it can also be used for equality
comparisons on any representation.)
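For example, on a little-endian box the two bytes of U+0100 are stored
as 00 01, so a raw memcmp() on the two-byte kind would order U+0100
before U+0001. Concretely, the guard could be restricted to the
one-byte kind; a rough, untested sketch (reusing the variable names
from your snippet; PyUnicode_1BYTE_KIND is the PEP 393 kind constant):

    Py_ssize_t len = (len1 < len2) ? len1 : len2;

    /* memcmp() only gives a correct three-way ordering for the
       one-byte (latin-1) representation; for the 2- and 4-byte
       kinds, the in-memory byte order doesn't match the character
       order on little-endian machines. */
    if (kind1 == kind2 && kind1 == PyUnicode_1BYTE_KIND) {
        int result = memcmp(data1, data2, len);
        if (result != 0)
            return result < 0 ? -1 : +1;
    }

    /* For an equality-only test, memcmp(data1, data2, kind1 * len)
       would be usable for any matching kind. */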
> Rerunning the test with this small change to unicode_compare:
>
>   17.84 seconds: -fno-builtin-memcmp
>   36.25 seconds: STANDARD memcmp
>
> The standard memcmp is WORSE than the original unicode_compare
> code, but if we compile using memcmp with -fno-builtin-memcmp, we
> get that wonderful 2x performance increase again.

The standard memcmp being worse is a bit puzzling. Intuitively, it
should have roughly the same performance as the original function. I
also wonder whether the slowdown could materialize on non-glibc
systems.

> I am still rooting for -fno-builtin-memcmp in both Python 2.7 and
> 3.3 ... (after we put memcmp in unicode_compare)

A patch for unicode_compare would be a good start. Its performance can
then be checked on other systems (such as Windows).

Regards

Antoine.