"Guido van Rossum" <[EMAIL PROTECTED]> wrote:
>
> Josiah (and other supporters of string views),
>
> You seem to be utterly convinced of the superior performance of your
> proposal without having done any measurements.
>
> You appear to have a rather naive view on what makes code execute fast
> or slow (e.g. you don't seem to appreciate the savings due to a string
> object header and its data being consecutive in memory).
>
> Unless you have serious benchmark data (for realistic Python code) I
> can't continue to participate in this discussion, where you have said
> nothing new in many posts.
Put up or shut up, eh?
I have written a simple extension module using Pyrex (my manual C
extension writing is awful). Here are some sample interactions showing
that string views are indeed quite fast. In all of these examples, a
naive implementation using only stringview.partition() was able to beat
Python 2.5 str.partition, str.split, and re.finditer.
Attached you will find the implementation of stringview I used, along
with enough build scripts to get it working with Python 2.3 and
Pyrex 0.9.3. Aside from replacing int with Py_ssize_t for 2.5, and
*nix users running dos2unix on the files, it should work unchanged
with the most recent Python and Pyrex versions.
- Josiah
Using Python 2.3:
>>> x = stringview(40000*' ')
>>> if 1:
...     t = time.time()
...     while x:
...         _1, _2, x = x.partition(' ')
...     print time.time()-t
...
0.18700003624
>>>
Compared with Python 2.5 beta 2:
>>> x = 40000*' '
>>> if 1:
...     t = time.time()
...     while x:
...         _1, _2, x = x.partition(' ')
...     print time.time()-t
...
0.625
>>>
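The reason this case is so lopsided: with plain strings, every partition() call in the loop builds a new string holding the entire remainder, so for n characters the loop copies (n-1) + (n-2) + ... + 0 = n(n-1)/2 characters in total, about 800 million for n = 40000, while a view only moves an offset. A small Python 3 sketch that counts those copies (both helper names are made up for illustration):

```python
def partition_copy_cost(n):
    """Characters copied into the 'rest' piece by the str loop on n spaces."""
    x = " " * n
    copied = 0
    while x:
        # Each call builds a brand-new string for everything after the
        # first space, i.e. one character shorter than x.
        _1, _2, x = x.partition(" ")
        copied += len(x)
    return copied


def closed_form(n):
    # (n-1) + (n-2) + ... + 1 + 0
    return n * (n - 1) // 2


print(partition_copy_cost(1000), closed_form(1000))  # 499500 499500
print(closed_form(40000))                            # 799980000
```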
But that's about as bad for Python 2.5 as it can get. What about
something more realistic, like a mail file? My 21.5 MB archive of
py3k contains 3456 messages, and I wanted to split it into those
individual messages.
Python 2.3.5 (#62, Feb 8 2005, 16:23:02) [MSC v.1200 32 bit (Intel)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> from stringview import *
>>> rest = stringview(open('mail', 'rb').read())
>>> import time
>>> if 1:
...     x = []
...     t = time.time()
...     while rest:
...         cur, found, rest = rest.partition('\r\n.\r\n')
...         x.append(cur)
...     print time.time()-t, len(x)
...
...
0.0780000686646 3456
>>>
What about Python 2.5 using split? That should be fast...
Python 2.5b2 (r25b2:50512, Jul 11 2006, 10:16:14) [MSC v.1310 32 bit (Intel)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> rest = open('mail', 'rb').read()
>>> import time
>>> if 1:
...     t = time.time()
...     x = rest.split('\r\n.\r\n')
...     print time.time()-t, len(x)
...
0.109999895096 3457
>>>
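The split count (3457) is one higher than the partition count (3456) because of how each treats a trailing separator: str.split returns one more piece than there are separators, so when the buffer ends with '\r\n.\r\n' the last piece is an empty string, whereas the `while rest:` partition loop stops as soon as the remainder is empty. That the two counts differ by exactly one suggests the archive ends with the separator. A minimal Python 3 sketch on a made-up three-message buffer (the sample data and helper name are hypothetical):

```python
SEP = "\r\n.\r\n"
data = "msg one" + SEP + "msg two" + SEP + "msg three" + SEP


def messages_by_partition(rest, sep=SEP):
    # Same shape as the loop above: stop once nothing is left.
    out = []
    while rest:
        cur, _found, rest = rest.partition(sep)
        out.append(cur)
    return out


by_split = data.split(SEP)
by_partition = messages_by_partition(data)
print(len(by_split), len(by_partition))  # 4 3
print(repr(by_split[-1]))                # ''
```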
Hrm...what about using re?
>>> import re
>>> pat = re.compile('\r\n\.\r\n')
>>> rest = open('mail', 'rb').read()
>>> import time
>>> if 1:
...     x = []
...     t = time.time()
...     for i in pat.finditer(rest):
...         x.append(i)
...     print time.time()-t, len(x)
...
0.125 3456
>>>
Even that's not as good as Python 2.3 + string views.
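For anyone who wants to repeat the comparison on a current Python, here is a rough Python 3 harness. It is a sketch under several assumptions: the data is synthetic rather than the py3k archive, `by_offsets` (a hypothetical helper, like the others here) stands in for a string view by advancing a start index with `str.find` instead of copying the remainder, and the `split` and `finditer` variants are adjusted to return the same message bodies as the partition loop so their outputs can be checked against each other. Absolute timings will not match the figures above.

```python
import re
import time

SEP = "\r\n.\r\n"
PAT = re.compile(r"\r\n\.\r\n")


def by_partition(data):
    # Plain str: every partition() copies the whole remainder.
    out, rest = [], data
    while rest:
        cur, _, rest = rest.partition(SEP)
        out.append(cur)
    return out


def by_offsets(data):
    # View-like: keep a start offset and slice each message out once.
    out, start, n = [], 0, len(data)
    while start < n:
        i = data.find(SEP, start)
        if i < 0:
            out.append(data[start:])
            break
        out.append(data[start:i])
        start = i + len(SEP)
    return out


def by_split(data):
    out = data.split(SEP)
    # Drop the empty piece after a trailing separator, as the loops do.
    if out and out[-1] == "":
        out.pop()
    return out


def by_finditer(data):
    # finditer yields separator matches; slice the bodies between them.
    out, start = [], 0
    for m in PAT.finditer(data):
        out.append(data[start:m.start()])
        start = m.end()
    if start < len(data):
        out.append(data[start:])
    return out


def best_time(func, data, repeat=3):
    # Best-of-N wall-clock time for one call of func(data).
    best = float("inf")
    for _ in range(repeat):
        t = time.perf_counter()
        func(data)
        best = min(best, time.perf_counter() - t)
    return best


if __name__ == "__main__":
    # Synthetic stand-in for a mail archive: 1000 messages of 500 chars.
    data = ("x" * 500 + SEP) * 1000
    for f in (by_partition, by_offsets, by_split, by_finditer):
        print(f"{f.__name__:12s} {best_time(f, data):.4f}s")
```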
Attachments:
  stringview_build.py
  stringview.pyx
  stringview_helper.h
  _setup.py
