Re: [Python-Dev] memcmp performance

2011-10-25 Thread Stefan Behnel

Richard Saunders, 25.10.2011 01:17:

-On [20111024 09:22], Stefan Behnel wrote:
  I agree. Given that the analysis shows that the libc memcmp() is
  particularly fast on many Linux systems, it should be up to the Python
  package maintainers for these systems to set that option externally through
  the optimisation CFLAGS.

Indeed, this is how I constructed my Python 3.3 and Python 2.7 :
setenv CFLAGS '-fno-builtin-memcmp'
just before I configured.

I would like to revisit changing unicode_compare: adding a
special arm for using memcmp when the unicode kinds are the
same will only work in two specific instances:

(1) the strings are the same kind, the char size is 1
* We could add THIS to unicode_compare, but it seems extremely
specialized by itself


But also extremely likely to happen. This means that the strings are pure 
ASCII, which is highly likely and one of the main reasons why the unicode 
string layout was rewritten for CPython 3.3. It allows CPython to save a 
lot of memory (thus clearly proving how likely this case is!), and it would 
also allow it to do faster comparisons for these strings.
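A minimal sketch of that fast path (a hypothetical helper for illustration, not CPython's actual code): for the 1-byte kind, byte order and code-point order coincide, so memcmp() can drive the full three-way comparison, including the tie-break on length.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical fast path: both strings are the 1-byte (latin1/ASCII)
 * kind, so a raw byte comparison orders them exactly like a
 * per-character comparison would. Returns -1, 0, or 1. */
static int compare_1byte(const unsigned char *a, size_t len_a,
                         const unsigned char *b, size_t len_b)
{
    size_t min_len = len_a < len_b ? len_a : len_b;
    int result = memcmp(a, b, min_len);
    if (result != 0)
        return result < 0 ? -1 : 1;
    /* Common prefix is equal: the shorter string compares smaller. */
    return (len_a < len_b) ? -1 : (len_a != len_b);
}
```

This only works because a 1-byte code unit's memory representation is the code point itself; for the 2- and 4-byte kinds the same trick breaks on little-endian machines, as discussed later in the thread.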


Stefan

___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] memcmp performance

2011-10-25 Thread Victor Stinner
On Tuesday 25 October 2011 at 10:44:16, Stefan Behnel wrote:
 Richard Saunders, 25.10.2011 01:17:
  -On [20111024 09:22], Stefan Behnel wrote:
I agree. Given that the analysis shows that the libc memcmp() is
particularly fast on many Linux systems, it should be up to the
Python package maintainers for these systems to set that option
externally through the optimisation CFLAGS.
  
  Indeed, this is how I constructed my Python 3.3 and Python 2.7 :
  setenv CFLAGS '-fno-builtin-memcmp'
  just before I configured.
  
  I would like to revisit changing unicode_compare: adding a
  special arm for using memcmp when the unicode kinds are the
  same will only work in two specific instances:
  
  (1) the strings are the same kind, the char size is 1
  * We could add THIS to unicode_compare, but it seems extremely
  specialized by itself
 
 But also extremely likely to happen. This means that the strings are pure
 ASCII, which is highly likely and one of the main reasons why the unicode
 string layout was rewritten for CPython 3.3. It allows CPython to save a
 lot of memory (thus clearly proving how likely this case is!), and it would
 also allow it to do faster comparisons for these strings.

Python 3.3 already has some optimizations for latin1: the CPU and the C language 
are more efficient at processing char* strings than Py_UCS2 and Py_UCS4 strings. 
For example, we are using memchr() to search for a single character in a latin1 
string.
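A sketch of that kind of latin1 search fast path (an illustrative helper, not the actual CPython code):

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Illustrative: find a character in a 1-byte-kind (latin1) string with
 * memchr(), which glibc provides as a tuned assembly/SIMD routine.
 * Returns the index of the first occurrence, or -1 if absent. */
static ptrdiff_t find_char_latin1(const unsigned char *s, size_t len,
                                  unsigned char ch)
{
    const unsigned char *p = memchr(s, ch, len);
    return p != NULL ? (p - s) : -1;
}
```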

Victor


Re: [Python-Dev] memcmp performance

2011-10-24 Thread Richard Saunders
-On [20111024 09:22], Stefan Behnel (stefan...@behnel.de) wrote:
 I agree. Given that the analysis shows that the libc memcmp() is
 particularly fast on many Linux systems, it should be up to the Python
 package maintainers for these systems to set that option externally through
 the optimisation CFLAGS.

Indeed, this is how I constructed my Python 3.3 and Python 2.7:
setenv CFLAGS '-fno-builtin-memcmp'
just before I configured.

I would like to revisit changing unicode_compare: adding a
special arm for using memcmp when the "unicode kinds" are the
same will only work in two specific instances:

(1) the strings are the same kind, the char size is 1
  * We could add THIS to unicode_compare, but it seems extremely
    specialized by itself
(2) the strings are the same kind, the char size is > 1, and checking
  for equality
  * Since unicode_compare can't detect equality checking, we can't
    really add this to unicode_compare at all

The problem is, of course, that memcmp won't compare for less-than
or greater-than correctly (unless on a BIG ENDIAN machine) for
char sizes of 2 or 4.

If we wanted to put memcmp in unicodeobject.c, it would probably need
to go into PyUnicode_RichCompare (so we would have some more semantic
information). I may try to put together a patch for that, if people
think that's a good idea? It would be JUST adding a call to memcmp
for the two instances specified above.

From: Jeroen Ruigrok van der Werven asmo...@in-nomine.org
 In the same stretch, stuff like this needs to be documented. Package
 maintainers cannot be expected to follow each and every mailinglist's posts
 for nuggets of information like this. Been there, done that, it's impossible
 to keep track.

I would like to second that: the whole point of a Makefile/configuration
file is to capture knowledge like this so it doesn't get lost.

I would prefer the option to be part of a standard build Python
distributes, but as long as the information gets captured SOMEWHERE
so that (say) Fedora Core 17 has Python 2.7 built with -fno-builtin-memcmp,
I would be happy.

 Gooday,
 Richie


Re: [Python-Dev] memcmp performance

2011-10-24 Thread Stefan Behnel

Martin v. Löwis, 23.10.2011 23:44:

I am still rooting for -fno-builtin-memcmp in both Python 2.7 and 3.3 ...
(after we put memcmp in unicode_compare)


-1. We shouldn't do anything about this. Python has the tradition of not
working around platform bugs, except if the work-arounds are necessary
to make something work at all - i.e. in particular not for performance
issues.

If this is a serious problem, then platform vendors need to look into
it (CPU vendor, compiler vendor, OS vendor). If they don't act, it's
probably not a serious problem.

In the specific case, I don't think it's a problem at all. It's not
that memcmp is slow with the builtin version - it's just not as fast
as it could be. Adding a compiler option would put a maintenance burden
on Python - we already have way too many compiler options in
configure.in, and there is no good procedure to ever take them out
should they not be needed anymore.


I agree. Given that the analysis shows that the libc memcmp() is 
particularly fast on many Linux systems, it should be up to the Python 
package maintainers for these systems to set that option externally through 
the optimisation CFLAGS.


Stefan



Re: [Python-Dev] memcmp performance

2011-10-24 Thread Jeroen Ruigrok van der Werven
-On [20111024 09:22], Stefan Behnel (stefan...@behnel.de) wrote:
I agree. Given that the analysis shows that the libc memcmp() is 
particularly fast on many Linux systems, it should be up to the Python 
package maintainers for these systems to set that option externally through 
the optimisation CFLAGS.

In the same stretch, stuff like this needs to be documented. Package
maintainers cannot be expected to follow each and every mailinglist's posts
for nuggets of information like this. Been there, done that, it's impossible
to keep track.

-- 
Jeroen Ruigrok van der Werven asmodai(-at-)in-nomine.org / asmodai
イェルーン ラウフロック ヴァン デル ウェルヴェン
http://www.in-nomine.org/ | GPG: 2EAC625B
Only in sleep can one find salvation that resembles Death...


Re: [Python-Dev] memcmp performance

2011-10-23 Thread Martin v. Löwis
 I am still rooting for -fno-builtin-memcmp in both Python 2.7 and 3.3 ...
 (after we put memcmp in unicode_compare)

-1. We shouldn't do anything about this. Python has the tradition of not
working around platform bugs, except if the work-arounds are necessary
to make something work at all - i.e. in particular not for performance
issues.

If this is a serious problem, then platform vendors need to look into
it (CPU vendor, compiler vendor, OS vendor). If they don't act, it's
probably not a serious problem.

In the specific case, I don't think it's a problem at all. It's not
that memcmp is slow with the builtin version - it's just not as fast
as it could be. Adding a compiler option would put a maintenance burden
on Python - we already have way too many compiler options in
configure.in, and there is no good procedure to ever take them out
should they not be needed anymore.

Regards,
Martin


Re: [Python-Dev] memcmp performance

2011-10-21 Thread Stefan Behnel

Antoine Pitrou, 20.10.2011 23:08:

I have been doing some performance experiments with memcmp, and I was
surprised that memcmp wasn't faster than it was in Python.  I did a whole,
long analysis and came up with some very simple results.


Thanks for the analysis. Non-bugfix work now happens on Python 3, where
the str type is Python 2's unicode type. Your recommendations would
have to be revisited under that light.


Well, Py3 is quite a bit different now that PEP393 is in. It appears to use 
memcmp() or strcmp() a lot less than before, but I think unicode_compare() 
should actually receive an optimisation to use a fast memcmp() if both 
string kinds are equal, at least when their character unit size is less 
than 4 (i.e. especially for ASCII strings). Funny enough, tailmatch() has 
such an optimisation.


Stefan



Re: [Python-Dev] memcmp performance

2011-10-21 Thread Antoine Pitrou
On Fri, 21 Oct 2011 08:24:44 +0200
Stefan Behnel stefan...@behnel.de wrote:
 Antoine Pitrou, 20.10.2011 23:08:
  I have been doing some performance experiments with memcmp, and I was
  surprised that memcmp wasn't faster than it was in Python.  I did a whole,
  long analysis and came up with some very simple results.
 
  Thanks for the analysis. Non-bugfix work now happens on Python 3, where
  the str type is Python 2's unicode type. Your recommendations would
  have to be revisited under that light.
 
 Well, Py3 is quite a bit different now that PEP393 is in. It appears to use 
 memcmp() or strcmp() a lot less than before, but I think unicode_compare() 
 should actually receive an optimisation to use a fast memcmp() if both 
 string kinds are equal, at least when their character unit size is less 
 than 4 (i.e. especially for ASCII strings). Funny enough, tailmatch() has 
 such an optimisation.

Yes, unicode_compare() probably deserves optimizing.
Patches welcome, by the way :)

Regards

Antoine.




Re: [Python-Dev] memcmp performance

2011-10-21 Thread Richard Saunders
 Richard Saunders:
 I have been doing some performance experiments with memcmp, and I was
 surprised that memcmp wasn't faster than it was in Python. I did a whole,
 long analysis and came up with some very simple results.

Antoine Pitrou, 20.10.2011 23:08:
 Thanks for the analysis. Non-bugfix work now happens on Python 3, where
 the str type is Python 2's unicode type. Your recommendations would
 have to be revisited under that light.

Stefan Behnel stefan...@behnel.de:
 Well, Py3 is quite a bit different now that PEP393 is in. It appears to use
 memcmp() or strcmp() a lot less than before, but I think unicode_compare()
 should actually receive an optimisation to use a fast memcmp() if both
 string kinds are equal, at least when their character unit size is less
 than 4 (i.e. especially for ASCII strings). Funny enough, tailmatch() has
 such an optimisation.

I started looking at the most recent 3.x baseline: in a lot of places,
the memcmp analysis appears relevant (zlib, arraymodule, datetime, xmlparse):
all still use memcmp in about the same way. But I agree that there are
some major differences in the unicode portion.

As long as the two strings are the same unicode "kind", you can use a
memcmp to compare. In that case, I would almost argue some memcmp
optimization is even more important: unicode strings are potentially 2
to 4 times larger, so the amount of time spent in memcmp may be more
(i.e., I am still rooting for -fno-builtin-memcmp on the compile lines).

I went ahead and wrote a quick string_test3.py for comparing strings
(similar to what I did in Python 2.7):

# Simple python string comparison test for Python 3.3
a = []; b = []; c = []; d = []
for x in range(0, 1000):
    a.append("the quick brown fox" + str(x))
    b.append("the wuick brown fox" + str(x))
    c.append("the quick brown fox" + str(x))
    d.append("the wuick brown fox" + str(x))
count = 0
for x in range(0, 20):
    if a == c: count += 1
    if a == c: count += 2
    if a == d: count += 3
    if b == c: count += 5
    if b == d: count += 7
    if c == d: count += 11
print(count)

Timings on my FC14 machine (Intel Xeon W3520 @ 2.67GHz):

29.18 seconds: Vanilla build of Python 3.3
29.17 seconds: Python 3.3 compiled with -fno-builtin-memcmp

No change: a little investigation shows unicode_compare is where all
the work is. Here's currently the main loop inside unicode_compare:

    for (i = 0; i < len1 && i < len2; ++i) {
        Py_UCS4 c1, c2;
        c1 = PyUnicode_READ(kind1, data1, i);
        c2 = PyUnicode_READ(kind2, data2, i);

        if (c1 != c2)
            return (c1 < c2) ? -1 : 1;
    }

    return (len1 < len2) ? -1 : (len1 != len2);

If both strings are the same unicode kind, we can add memcmp
to unicode_compare for an optimization:

    Py_ssize_t len = (len1 < len2) ? len1 : len2;

    /* use memcmp if both the same kind */
    if (kind1 == kind2) {
        int result = memcmp(data1, data2, ((int)kind1) * len);
        if (result != 0)
            return result < 0 ? -1 : +1;
    }

Rerunning the test with this small change to unicode_compare:

17.84 seconds: -fno-builtin-memcmp
36.25 seconds: STANDARD memcmp

The standard memcmp is WORSE than the original unicode_compare
code, but if we compile using memcmp with -fno-builtin-memcmp, we get that
wonderful 2x performance increase again.

I am still rooting for -fno-builtin-memcmp in both Python 2.7 and 3.3 ...
(after we put memcmp in unicode_compare)

 Gooday,
 Richie


Re: [Python-Dev] memcmp performance

2011-10-21 Thread Stefan Behnel

Richard Saunders, 21.10.2011 20:23:

As long as the two strings are the same unicode kind, you can use a
memcmp to compare. In that case, I would almost argue some memcmp
optimization is even more important: unicode strings are potentially 2
to 4 times larger, so the amount of time spent in memcmp may be more
(i.e., I am still rooting for -fno-builtin-memcmp on the compile lines).


I would argue that the pure ASCII (1 byte per character) case is even more 
important than the other cases, and it suffers from the 1 byte per 
comparison problem you noted. That's why you got the 2x speed-up for your 
quick test.


Stefan



Re: [Python-Dev] memcmp performance

2011-10-21 Thread Antoine Pitrou
On Fri, 21 Oct 2011 18:23:24 + (GMT)
Richard Saunders richismyn...@me.com wrote:
 
 If both strings are the same unicode kind, we can add memcmp
 to unicode_compare for an optimization:

     Py_ssize_t len = (len1 < len2) ? len1 : len2;

     /* use memcmp if both the same kind */
     if (kind1 == kind2) {
         int result = memcmp(data1, data2, ((int)kind1) * len);
         if (result != 0)
             return result < 0 ? -1 : +1;
     }

Hmm, you have to be a bit subtler than that: on a little-endian
machine, you can't compare two characters by comparing their bytes
representation in memory order. So memcmp() can only be used for the
one-byte representation.
(actually, it can also be used for equality comparisons on any
representation)
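To make the pitfall concrete (illustrative code, not from CPython): in the 2-byte representation on a little-endian machine, U+0100 is stored as bytes 00 01 and U+00FF as FF 00, so a byte-wise comparison would rank U+0100 below U+00FF. Equality, by contrast, is byte-order independent, since equal same-kind strings have byte-identical buffers:

```c
#include <assert.h>
#include <string.h>

/* Equality of two same-kind strings reduces to byte equality of their
 * buffers, regardless of endianness or character size (1, 2, or 4). */
static int equal_same_kind(const void *a, const void *b,
                           size_t len_chars, size_t char_size)
{
    return memcmp(a, b, len_chars * char_size) == 0;
}
```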

 Rerunning the test with this small change to unicode_compare:
 
 17.84 seconds:  -fno-builtin-memcmp 
 36.25 seconds:  STANDARD memcmp
 
 The standard memcmp is WORSE than the original unicode_compare
 code, but if we compile using memcmp with -fno-builtin-memcmp, we get that
 wonderful 2x performance increase again.

The standard memcmp being worse is a bit puzzling. Intuitively, it
should have roughly the same performance as the original function.
I also wonder whether the slowdown could materialize on non-glibc
systems.

 I am still rooting for -fno-builtin-memcmp in both Python 2.7 and 3.3 ...
 (after we put memcmp in unicode_compare)

A patch for unicode_compare would be a good start. Its performance can
then be checked on other systems (such as Windows).

Regards

Antoine.




[Python-Dev] memcmp performance

2011-10-20 Thread Richard Saunders
Hi,

This is my first time on Python-dev, so I apologize for my newbie-ness.

I have been doing some performance experiments with memcmp, and I was
surprised that memcmp wasn't faster than it was in Python. I did a whole,
long analysis and came up with some very simple results.

Before I put in a tracker bug report, I wanted to present my findings
and make sure they were repeatable to others (isn't that the nature
of science? ;) as well as offer discussion.

The analysis is a pdf and is here:
  http://www.picklingtools.com/study.pdf
The testcases are a tarball here:
  http://www.picklingtools.com/PickTest5.tar.gz

I have three basic recommendations in the study: I am
curious what other people think.

 Gooday,
 Richie


Re: [Python-Dev] memcmp performance

2011-10-20 Thread Antoine Pitrou

Hello,

 I have been doing some performance experiments with memcmp, and I was
 surprised that memcmp wasn't faster than it was in Python.  I did a whole, 
 long analysis and came up with some very simple results.
 
 Before I put in a tracker bug report, I wanted to present my findings
 and make sure they were repeatable to others (isn't that the nature
 of science? ;)   as well as offer discussion.

Thanks for the analysis. Non-bugfix work now happens on Python 3, where
the str type is Python 2's unicode type. Your recommendations would
have to be revisited under that light.

Have you reported gcc's outdated optimization issue to them? Or is it
already solved in newer gcc versions?

Under glibc-based systems, it seems we can't go wrong with the system
memcmp function. If gcc doesn't get in the way, that is.

Regards

Antoine.




Re: [Python-Dev] memcmp performance

2011-10-20 Thread Scott Dial
On 10/20/2011 5:08 PM, Antoine Pitrou wrote:
 Have you reported gcc's outdated optimization issue to them? Or is it
 already solved in newer gcc versions?

I checked this on gcc 4.6, and it still optimizes memcmp/strcmp into a
repz cmpsb instruction on x86. This has been known to be a problem
since at least 2002[1][2]. There are also some alternative
implementations available on their mailing list. It seems the main
objection to removing the optimization was that gcc isn't always
compiling against an optimized libc, so they didn't want to drop the
optimization. Beyond that, I think nobody was willing to put in the
effort to change the optimization itself.

[1] http://gcc.gnu.org/ml/gcc/2002-10/msg01616.html
[2] http://gcc.gnu.org/ml/gcc/2003-04/msg00166.html

-- 
Scott Dial
sc...@scottdial.com


Re: [Python-Dev] memcmp performance

2011-10-20 Thread Richard Saunders
Hey,

 I have been doing some performance experiments with memcmp, and I was
 surprised that memcmp wasn't faster than it was in Python. I did a whole,
 long analysis and came up with some very simple results.

Paul Svensson suggested I post as much as I can as text, as people would
be more likely to read it. So, here's the basic ideas:

(1) memcmp is surprisingly slow on some Intel gcc platforms (Linux).
    On several Linux, Intel platforms, memcmp was 2-3x slower than
    a simple, portable C function (with some optimizations).

(2) The problem: If you compile C programs with gcc with any optimization on,
    it will replace all memcmp calls with an assembly language stub: rep cmpsb
    instead of the memcmp call.

(3) rep cmpsb seems like it would be faster, but it really isn't:
    this completely bypasses the memcmp.S, memcmp_sse3.S
    and memcmp_sse4.S in glibc which are typically faster.

(4) The basic conclusion is that the Python baseline on
    Intel gcc platforms should probably be compiled with -fno-builtin-memcmp
    so we "avoid" gcc's memcmp optimization.

The numbers are all in the paper: I will endeavor to try to generate a text form
of all the tables so it's easier to read. This is my first in the Python-dev
arena, so I went a little overboard with my paper below. ;)

 Gooday,
 Richie

 Before I put in a tracker bug report, I wanted to present my findings
 and make sure they were repeatable to others (isn't that the nature
 of science? ;) as well as offer discussion.

 The analysis is a pdf and is here:
   http://www.picklingtools.com/study.pdf
 The testcases are a tarball here:
   http://www.picklingtools.com/PickTest5.tar.gz

 I have three basic recommendations in the study: I am
 curious what other people think.
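Recommendation (4) amounts to a one-line build tweak (shown for a typical from-source build; the exact configure invocation will vary by distribution):

```shell
# Disable gcc's inline "rep cmpsb" expansion of memcmp so the optimized
# glibc implementation (memcmp.S / SSE variants) is actually called.
CFLAGS='-fno-builtin-memcmp' ./configure
make
```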