Re: [Python-Dev] PATCH submitted: Speed up + for string concatenation, now as fast as "".join(x) idiom

2006-10-13 Thread Josiah Carlson

Larry Hastings <[EMAIL PROTECTED]> wrote:
[snip]
> The machine is dual-core, and was quiescent at the time.  XP's scheduler 
> is hopefully good enough to just leave the process running on one core.

It's not.  Go into the Task Manager (accessible via Ctrl+Alt+Del by
default) and change the process' affinity to the second core.  In my
experience, running on the second core (in both 2k and XP) tends to
produce slightly faster results.  Linux tends to keep processes on a
single core for a few seconds at a time.
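
For completeness, here's a rough sketch of doing the same thing
programmatically; it assumes the pywin32 extensions are installed, and
the 0x2 mask ("second core only") is illustrative:

    import win32api, win32process

    # Pin the current process to CPU 1 (bitmask 0x2 = second core).
    handle = win32api.GetCurrentProcess()
    win32process.SetProcessAffinityMask(handle, 0x2)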

 - Josiah



Re: [Python-Dev] PATCH submitted: Speed up + for string concatenation, now as fast as "".join(x) idiom

2006-10-13 Thread Larry Hastings


I've uploaded a new patch to SourceForge in response to feedback:
  * I purged all // comments and fixed all lines over 80 characters added 
by my patch, as per Neal Norwitz.
  * I added a definition of max() for platforms that don't already have 
one, as per [EMAIL PROTECTED].
It now compiles cleanly on Linux again without modification; sorry for 
not checking that since the original patch.

I've also uploaded my hacked-together benchmark script, for whatever that's 
worth.

That patch tracker page again:

http://sourceforge.net/tracker/index.php?func=detail&aid=1569040&group_id=5470&atid=305470


M.-A. Lemburg wrote:
> When comparing results, please look at the minimum runtime.
> The average times are just given to indicate how much the mintime
> differs from the average of all runs.
>   
I'll do that next time.  In the meantime, I've also uploaded a zip file 
containing the results of my benchmarking, including the stdout from the 
run and the "-f" file which contains the pickled output.  So you can 
examine my results yourself, including doing analysis on the pickled 
data if you like.

> If however the speedups are not consistent across several runs of
> pybench, then it's likely that you have some background activity
> going on on the machine which causes a slowdown in the unmodified
> run you chose as basis for the comparison.
>   
The machine is dual-core, and was quiescent at the time.  XP's scheduler 
is hopefully good enough to just leave the process running on one core.

I ran the benchmarks just once on my Linux 2.6 machine; it's a dual-CPU 
P3 933EB (or maybe just 866EB, I forget).  It's faster overall there 
too, by 1.9% (minimum run-time).  The two tests I expected to be faster 
("ConcatStrings" and "CreateStringsWithConcat") were consistently much 
faster; beyond that the results don't particularly resemble the results 
from my XP machine.  (I uploaded those .txt and .pickle files too.)

The mystery overall speedup continues, not that I find it unwelcome.  :)

> Just to make sure: you are using pybench 2.0, right ?
>   
I sure was.  And I used stringbench.py downloaded from here:

http://svn.python.org/projects/sandbox/branches/jim-fix-setuptools-cli/stringbench/stringbench.py

Cheers,


/larry/


Re: [Python-Dev] PATCH submitted: Speed up + for string concatenation, now as fast as "".join(x) idiom

2006-10-09 Thread Kristján V. Jónsson
This patch looks really nice to use here at CCP.  Our code is full of string 
concatenations, so I will probably try to apply the patch soon and see what it 
gives us in a real-life app.  The floating point integer cache was also a big 
win.  Soon, standard Python won't be able to keep up with the patched versions 
out there :)

Oh, and since I have fixed the pcbuild8 thingy in the 2.5 branch, why don't you 
give the PGO version a whirl too?  Even the non-PGO dll, with link-time code 
generation, should be faster than your vanilla PCBuild one.  Read the 
Readme.txt for details.

Cheers,

Kristján
 

> -----Original Message-----
> From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of M.-A. Lemburg
> Sent: 9 October 2006 09:30
> To: Larry Hastings
> Cc: python-dev@python.org
> Subject: Re: [Python-Dev] PATCH submitted: Speed up + for string 
> concatenation, now as fast as "".join(x) idiom
> 
> Larry Hastings wrote:
> > Fredrik Lundh wrote:
> >> [EMAIL PROTECTED] wrote:
> >>> MAL's pybench would probably be better for this presuming it does 
> >>> some addition with string operands.
> >> or stringbench.
> > 
> > I ran 'em, and they are strangely consistent with pystone.
> > 
> > With concat, stringbench is ever-so-slightly faster overall.  "172.82" 
> > vs "174.85" for the "ascii" column, I guess that's in seconds.  I'm 
> > just happy it's not slower.  (I only ran stringbench once; it seems to 
> > take *forever*).
> > 
> > I ran pybench three times for each build.  The slowest concat overall 
> > time was still 2.9% faster than the fastest release time.
> > "ConcatStrings" is a big winner, at around 150% faster; since the test 
> > doesn't *do* anything with the concatenated values, it never renders 
> > the concatenation objects, so it does a lot less work.
> > "CreateStringsWithConcat" is generally 18-19% faster, as expected.  
> > After that, the timings are all over the place, but some tests were 
> > consistently faster: "CompareInternedStrings" was 8-12% faster, 
> > "DictWithFloatKeys" was 9-11% faster, "SmallLists" was 8-15% faster, 
> > "CompareLongs" was 6-10% faster, and "PyMethodCalls" was 4-6% faster.
> > (These are all comparing the "average run-time" results, though the 
> > "minimum run-time" results were similar.)
> 
> When comparing results, please look at the minimum runtime.
> The average times are just given to indicate how much the mintime 
> differs from the average of all runs.
> 
> > I still couldn't tell you why my results are faster.  I swear on my 
> > mother's eyes I didn't touch anything major involved in 
> > "DictWithFloatKeys", "SmallLists", or "CompareLongs".  I didn't touch 
> > the compiler settings, so that shouldn't be it.  I acknowledge not 
> > only that it could all be a mistake, but also that I don't know enough 
> > about it to speculate.
> 
> Depending on what you changed, it is possible that the layout 
> of the code in memory better fits your CPU architecture.
> 
> If however the speedups are not consistent across several 
> runs of pybench, then it's likely that you have some 
> background activity going on on the machine which causes a 
> slowdown in the unmodified run you chose as basis for the comparison.
> 
> Just to make sure: you are using pybench 2.0, right ?
> 


Re: [Python-Dev] PATCH submitted: Speed up + for string concatenation, now as fast as "".join(x) idiom

2006-10-09 Thread M.-A. Lemburg
Larry Hastings wrote:
> Fredrik Lundh wrote:
>> [EMAIL PROTECTED] wrote:
>>   
>>> MAL's pybench would probably be better for this presuming it does some
>>> addition with string operands.
>>> 
>> or stringbench.
>>   
> 
> I ran 'em, and they are strangely consistent with pystone.
> 
> With concat, stringbench is ever-so-slightly faster overall.  "172.82" 
> vs "174.85" for the "ascii" column, I guess that's in seconds.  I'm just 
> happy it's not slower.  (I only ran stringbench once; it seems to take 
> *forever*).
> 
> I ran pybench three times for each build.  The slowest concat overall 
> time was still 2.9% faster than the fastest release time.  
> "ConcatStrings" is a big winner, at around 150% faster; since the test 
> doesn't *do* anything with the concatenated values, it never renders the 
> concatenation objects, so it does a lot less work.  
> "CreateStringsWithConcat" is generally 18-19% faster, as expected.  
> After that, the timings are all over the place, but some tests were 
> consistently faster: "CompareInternedStrings" was 8-12% faster, 
> "DictWithFloatKeys" was 9-11% faster, "SmallLists" was 8-15% faster, 
> "CompareLongs" was 6-10% faster, and "PyMethodCalls" was 4-6% faster.  
> (These are all comparing the "average run-time" results, though the 
> "minimum run-time" results were similar.)

When comparing results, please look at the minimum runtime.
The average times are just given to indicate how much the mintime
differs from the average of all runs.

> I still couldn't tell you why my results are faster.  I swear on my 
> mother's eyes I didn't touch anything major involved in 
> "DictWithFloatKeys", "SmallLists", or "CompareLongs".  I didn't touch 
> the compiler settings, so that shouldn't be it.  I acknowledge not only 
> that it could all be a mistake, but also that I don't know enough about 
> it to speculate.

Depending on what you changed, it is possible that the layout of
the code in memory better fits your CPU architecture.

If however the speedups are not consistent across several runs of
pybench, then it's likely that you have some background activity
going on on the machine which causes a slowdown in the unmodified
run you chose as basis for the comparison.

Just to make sure: you are using pybench 2.0, right ?

-- 
Marc-Andre Lemburg
eGenix.com

Professional Python Services directly from the Source  (#1, Oct 09 2006)
>>> Python/Zope Consulting and Support ... http://www.egenix.com/
>>> mxODBC.Zope.Database.Adapter ... http://zope.egenix.com/
>>> mxODBC, mxDateTime, mxTextTools ... http://python.egenix.com/


::: Try mxODBC.Zope.DA for Windows,Linux,Solaris,FreeBSD for free ! 


Re: [Python-Dev] PATCH submitted: Speed up + for string concatenation, now as fast as "".join(x) idiom

2006-10-08 Thread Larry Hastings
Fredrik Lundh wrote:
> [EMAIL PROTECTED] wrote:
>   
>> MAL's pybench would probably be better for this presuming it does some
>> addition with string operands.
>> 
> or stringbench.
>   

I ran 'em, and they are strangely consistent with pystone.

With concat, stringbench is ever-so-slightly faster overall.  "172.82" 
vs "174.85" for the "ascii" column, I guess that's in seconds.  I'm just 
happy it's not slower.  (I only ran stringbench once; it seems to take 
*forever*).

I ran pybench three times for each build.  The slowest concat overall 
time was still 2.9% faster than the fastest release time.  
"ConcatStrings" is a big winner, at around 150% faster; since the test 
doesn't *do* anything with the concatenated values, it never renders the 
concatenation objects, so it does a lot less work.  
"CreateStringsWithConcat" is generally 18-19% faster, as expected.  
After that, the timings are all over the place, but some tests were 
consistently faster: "CompareInternedStrings" was 8-12% faster, 
"DictWithFloatKeys" was 9-11% faster, "SmallLists" was 8-15% faster, 
"CompareLongs" was 6-10% faster, and "PyMethodCalls" was 4-6% faster.  
(These are all comparing the "average run-time" results, though the 
"minimum run-time" results were similar.)

I still couldn't tell you why my results are faster.  I swear on my 
mother's eyes I didn't touch anything major involved in 
"DictWithFloatKeys", "SmallLists", or "CompareLongs".  I didn't touch 
the compiler settings, so that shouldn't be it.  I acknowledge not only 
that it could all be a mistake, but also that I don't know enough about 
it to speculate.

The speedup mystery continues,


*larry*


Re: [Python-Dev] PATCH submitted: Speed up + for string concatenation, now as fast as "".join(x) idiom

2006-10-07 Thread Nicko van Someren
On 7 Oct 2006, at 09:17, Fredrik Lundh wrote:

> Nicko van Someren wrote:
>
>> If it speeds up pystone by 5.5% with such minimal down side
>> I'm hard pressed to see a reason not to use it.
>
> can you tell me where exactly "pystone" does string concatenations?

No, not without more in-depth examination, but it is a pretty common  
operation in all sorts of cases including inside the interpreter.   
Larry's message in reply to Gregory Smith's request for a pystone  
score showed a 5.5% improvement and as yet I have no reason to doubt  
it.  If the patch provides a measurable performance improvement for  
code that merely happens to use strings as opposed to being  
explicitly heavy on string addition then all the better.

It's clear that this needs to be more carefully measured before it  
goes in (which is why that quote above starts "If").  As I've  
mentioned before in this thread, getting good performance measures on  
code that does lazy evaluation is often tricky.  pystone is a good  
place to start but I'm sure that there are use cases that it does not  
cover.

As for counting up the downsides, Josiah Carlson rightly points out  
that it breaks binary compatibility for modules, so the change can  
not be taken lightly and clearly it will have to wait for a major  
release.  Still, if the benefits outweigh the costs it seems worth  
doing.

Cheers,
Nicko



Re: [Python-Dev] PATCH submitted: Speed up + for string concatenation, now as fast as "".join(x) idiom

2006-10-07 Thread skip

Fredrik> Nicko van Someren wrote:
>> If it speeds up pystone by 5.5% with such minimal down side I'm hard
>> pressed to see a reason not to use it.

Fredrik> can you tell me where exactly "pystone" does string
Fredrik> concatenations?

I wondered about that as well.  While I'm not prepared to assert without a
doubt that pystone does no simpleminded string concatenation, a couple
minutes scanning the pystone source didn't turn up any.  If the pystone
speedup isn't an artifact, the absence of string concatenation in pystone
suggests it's happening somewhere in the interpreter.

I applied the patch, ran the interpreter under gdb with a breakpoint set in
string_concat where the PyStringConcatenationObject is created, then ran
pystone.  The first hit was in

site.py -> distutils/util.py -> string.py

All told, there were only 22 hits, none for very long strings, so that
doesn't explain the performance improvement.

BTW, on my Mac (OSX 10.4.8) max() is not defined.  I had to add a macro
definition to string_concat.

Skip



Re: [Python-Dev] PATCH submitted: Speed up + for string concatenation, now as fast as "".join(x) idiom

2006-10-07 Thread Fredrik Lundh
Nicko van Someren wrote:

> If it speeds up pystone by 5.5% with such minimal down side  
> I'm hard pressed to see a reason not to use it.

can you tell me where exactly "pystone" does string concatenations?





Re: [Python-Dev] PATCH submitted: Speed up + for string concatenation, now as fast as "".join(x) idiom

2006-10-07 Thread Josiah Carlson

Nicko van Someren <[EMAIL PROTECTED]> wrote:
> It's not like having this patch is going to force anyone to change  
> the way they write their code.  As far as I can tell it simply offers  
> better performance if you choose to express your code in some common  
> ways.  If it speeds up pystone by 5.5% with such minimal down side  
> I'm hard pressed to see a reason not to use it.

This has to wait until Python 2.6 (which is anywhere from 14-24 months
away, according to history); including it would destroy binary
compatibility with modules compiled for 2.5, never mind that it is a
nontrivial feature addition.

I also think that the original author (or one of this patch's supporters)
should write a PEP outlining the Python 2.5 and earlier drawbacks, what
changes this implementation brings, its improvements, and any potential
drawbacks.


 - Josiah



Re: [Python-Dev] PATCH submitted: Speed up + for string concatenation, now as fast as "".join(x) idiom

2006-10-06 Thread Ron Adam
Nicko van Someren wrote:
> On 6 Oct 2006, at 12:37, Ron Adam wrote:
> 
>>>> I've never liked the "".join([]) idiom for string concatenation; in my
>>>> opinion it violates the principles "Beautiful is better than ugly." and
>>>> "There should be one-- and preferably only one --obvious way to do it.".
> ...
>> Well I always like things to run faster, but I disagree that this 
>> idiom is broken.
>>
>> I like using lists to store substrings and I think it's just a matter of
>> changing your frame of reference in how you think about them.
> 
> I think that you've hit on exactly the reason why this patch is a good 
> idea.  You happen to like to store strings in lists, and in many 
> situations this is a fine thing to do, but if one is forced to change 
> ones frame of reference in order to get decent performance then as well 
> as violating the maxims Larry originally cited you're also hitting both 
> "readability counts" and "Correctness and clarity before speed."

The statement ".. if one is forced to change .." is a bit overstated, I think. 
The situation is more a matter of increasing awareness so that the frame of 
reference comes to mind more naturally and doesn't seem forced.  And the 
suggestion for how to do that is to add additional functions and methods that 
can use lists-of-strings instead of having to join or concatenate them first.  
Added examples and documentation can help with that as well.

The two ideas are non-competing. They are related because they realize their 
benefits by reducing redundant underlying operations in a similar way.

Cheers,
Ron




Re: [Python-Dev] PATCH submitted: Speed up + for string concatenation, now as fast as "".join(x) idiom

2006-10-06 Thread Nicko van Someren
On 6 Oct 2006, at 12:37, Ron Adam wrote:

>>> I've never liked the "".join([]) idiom for string concatenation;  
>>> in my
>>> opinion it violates the principles "Beautiful is better than  
>>> ugly." and
>>> "There should be one-- and preferably only one --obvious way to  
>>> do it.".
...
> Well I always like things to run faster, but I disagree that this  
> idiom is broken.
>
> I like using lists to store substrings and I think it's just a  
> matter of changing your frame of reference in how you think about them.

I think that you've hit on exactly the reason why this patch is a  
good idea.  You happen to like to store strings in lists, and in many  
situations this is a fine thing to do, but if one is forced to change  
ones frame of reference in order to get decent performance then as  
well as violating the maxims Larry originally cited you're also  
hitting both "readability counts" and "Correctness and clarity before  
speed."

The "".join(L) idiom is not "broken" in the sense that, to the fluent  
Python programmer, it does convey the intent as well as the action.   
That said, there are plenty of places that you'll see it not being  
used because it fails to convey the intent.  It's pretty rare to see  
someone write:
     for k, v in d.items():
         print " has value: ".join([k, v])
but, despite the utility of the % operator on strings it's pretty  
common to see:
     print k + " has value: " + v

This patch _seems_ to be able to provide better performance for this  
sort of usage and provide a major speed-up for some other common  
usage forms without causing the programmer to resort to making their  
code more complicated.  The cost seems to be a small memory hit on  
the size of a string object, a tiny increase in code size and some  
well isolated, under-the-hood complexity.

It's not like having this patch is going to force anyone to change  
the way they write their code.  As far as I can tell it simply offers  
better performance if you choose to express your code in some common  
ways.  If it speeds up pystone by 5.5% with such minimal down side  
I'm hard pressed to see a reason not to use it.

Cheers,
Nicko



Re: [Python-Dev] PATCH submitted: Speed up + for string concatenation, now as fast as "".join(x) idiom

2006-10-06 Thread Ron Adam
Josiah Carlson wrote:
> Fredrik Lundh <[EMAIL PROTECTED]> wrote:
>> Ron Adam wrote:
>>
>>> I think what may be missing is a larger set of higher level string 
>>> functions that will work with lists of strings directly.  Then lists of 
>>> strings can be thought of as a mutable string type by its use, and then 
>>> working with substrings in lists and using ''.join() will not seem as 
>>> out of place.
>> as important is the observation that you don't necessarily have to join 
>> string lists; if the data ends up being sent over a wire or written to 
>> disk, you might as well skip the join step, and work directly from the list.
>>
>> (it's no accident that ET has grown "tostringlist" and "fromstringlist" 
>> functions, for example ;-)
> 
> I've personally added a line-based abstraction with indent/dedent
> handling, etc., for the editor I use, which helps make macros and
> underlying editor functionality easier to write.
> 
> 
>  - Josiah

I've done the same thing just last week.  I've started to collect them into a 
module called stringtools, but I see no reason why they can't reside in the 
string module.

I think this may be just a case of collecting these types of routines together 
in one place so they can be reused easily, because they already are scattered 
around Python's library in some form or another.

Another tool I found tucked away within pydoc is the console pager it uses.  
I think it could easily be a separate module itself.  And it benefits from 
the line-based abstraction as well.
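
For example, the pager can already be reused directly today; a minimal
sketch ('lines' is just an illustrative list of strings):

    import pydoc
    lines = ["line %d" % i for i in range(200)]
    pydoc.pager("\n".join(lines))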

Cheers,
Ron



Re: [Python-Dev] PATCH submitted: Speed up + for string concatenation, now as fast as "".join(x) idiom

2006-10-06 Thread Bob Ippolito
On 10/6/06, Fredrik Lundh <[EMAIL PROTECTED]> wrote:
> Ron Adam wrote:
>
> > I think what may be missing is a larger set of higher level string functions
> > that will work with lists of strings directly.  Then lists of strings can be
> > thought of as a mutable string type by its use, and then working with 
> > substrings in lists and using ''.join() will not seem as out of place.
>
> as important is the observation that you don't necessarily have to join
> string lists; if the data ends up being sent over a wire or written to
> disk, you might as well skip the join step, and work directly from the list.
>
> (it's no accident that ET has grown "tostringlist" and "fromstringlist"
> functions, for example ;-)

The "just make lists" paradigm is used by Erlang too, where it's called
an "iolist" (it's not a type, just a convention).  The lists can be
nested, though, so concatenating chunks of data for I/O is always a
constant-time operation even if the chunks are already iolists.
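
A rough Python sketch of the same idea, purely for illustration
(write_iolist is a made-up name; str-only, no unicode handling):

    def write_iolist(out, data):
        # Recursively write a nested list of strings without ever
        # building the fully joined string in memory.
        if isinstance(data, str):
            out.write(data)
        else:
            for chunk in data:
                write_iolist(out, chunk)

    import StringIO
    buf = StringIO.StringIO()
    write_iolist(buf, ["GET ", ["/index.html", " HTTP/1.0"], "\r\n"])
    assert buf.getvalue() == "GET /index.html HTTP/1.0\r\n"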

-bob


Re: [Python-Dev] PATCH submitted: Speed up + for string concatenation, now as fast as "".join(x) idiom

2006-10-06 Thread Josiah Carlson

Fredrik Lundh <[EMAIL PROTECTED]> wrote:
> 
> Ron Adam wrote:
> 
> > I think what may be missing is a larger set of higher level string 
> > functions that will work with lists of strings directly.  Then lists of 
> > strings can be thought of as a mutable string type by its use, and then 
> > working with substrings in lists and using ''.join() will not seem as 
> > out of place.
> 
> as important is the observation that you don't necessarily have to join 
> string lists; if the data ends up being sent over a wire or written to 
> disk, you might as well skip the join step, and work directly from the list.
> 
> (it's no accident that ET has grown "tostringlist" and "fromstringlist" 
> functions, for example ;-)

I've personally added a line-based abstraction with indent/dedent
handling, etc., for the editor I use, which helps make macros and
underlying editor functionality easier to write.


 - Josiah



Re: [Python-Dev] PATCH submitted: Speed up + for string concatenation, now as fast as "".join(x) idiom

2006-10-06 Thread Fredrik Lundh
Ron Adam wrote:

> I think what may be missing is a larger set of higher level string functions 
> that will work with lists of strings directly.  Then lists of strings can be 
> thought of as a mutable string type by its use, and then working with 
> substrings in lists and using ''.join() will not seem as out of place.

as important is the observation that you don't necessarily have to join 
string lists; if the data ends up being sent over a wire or written to 
disk, you might as well skip the join step, and work directly from the list.

(it's no accident that ET has grown "tostringlist" and "fromstringlist" 
functions, for example ;-)
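
A tiny sketch of what skipping the join looks like when writing to a
file (the filename is just illustrative):

    parts = ["<doc>", "<item/>", "</doc>"]
    f = open("out.xml", "w")
    f.writelines(parts)     # no "".join(parts) needed
    f.close()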





Re: [Python-Dev] PATCH submitted: Speed up + for string concatenation, now as fast as "".join(x) idiom

2006-10-06 Thread Ron Adam
Gregory P. Smith wrote:
>> I've never liked the "".join([]) idiom for string concatenation; in my 
>> opinion it violates the principles "Beautiful is better than ugly." and 
>> "There should be one-- and preferably only one --obvious way to do it.". 
>> (And perhaps several others.)  To that end I've submitted patch #1569040 
>> to SourceForge:
>> 
>> http://sourceforge.net/tracker/index.php?func=detail&aid=1569040&group_id=5470&atid=305470
>> This patch speeds up using + for string concatenation.
> 
> yay!  i'm glad to see this.  i hate the "".join syntax.  i still write
> that as string.join() (because that's at least readable).  it also fixes
> the python idiom for fast string concatenation as intended; anyone
> who's ever written code that builds a large string value by pushing
> substrings into a list only to call join later should agree.

Well I always like things to run faster, but I disagree that this idiom is 
broken.

I like using lists to store substrings, and I think it's just a matter of 
changing your frame of reference in how you think about them.  For example, it 
doesn't bother me to have a numeric type with many digits, and to have lists 
of many many-digit numbers, and work with those.  Working with lists of 
many-character strings is not that different.  I've even come to the 
conclusion (just my opinion) that mutable lists of strings probably would 
work better than a long mutable string of characters in most situations.

What I've found is that there seems to be an optimum string length depending on 
what you are doing.  Too long (hundreds or thousands of characters) and 
repeating some string operations (not just concatenations) can be slow 
(relative to short strings), while using many short (single-character) strings 
would use more memory than is needed.  So a list of medium-length strings is 
actually a very nice compromise.  I'm not sure what the optimal string length 
is, but lines of about 80 columns seem to work very well for most things.

I think what may be missing is a larger set of higher level string functions 
that will work with lists of strings directly.  Then lists of strings can be 
thought of as a mutable string type by its use, and working with substrings 
in lists and using ''.join() will not seem as out of place.  So maybe instead 
of splitting, modifying, then joining (and again, etc.), just pass the whole 
list around and have operations that work directly on the list of strings and 
return a list of strings as the result.  That's pretty much what the patch 
does under the covers, but it only works with concatenation.  Having more 
functions that work with lists of strings directly will reduce the need for 
concatenation as well.

Some operations that could work well with whole lists of strings of lines may 
be indent_lines, dedent_lines, prepend_lines, wrap_lines, and of course 
join_lines as in '\n'.join(L), the inverse of s.splitlines(); there are also 
readlines() and writelines(), and possibly find_line or find_in_lines().  
These really shouldn't seem any more out of place than numeric operations 
that work with lists, such as sum, max, and min.  So to me, "".join(L) as a 
string operation that works on a list of strings seems perfectly natural. :-)
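
As a rough sketch of the kind of helpers described above (names taken
from the list above; the behaviour shown is illustrative only, not a
proposed API):

    def indent_lines(lines, prefix="    "):
        # Return a new list with each line prefixed.
        return [prefix + line for line in lines]

    def find_in_lines(lines, sub):
        # Return (line_index, column) of the first match, or None.
        for i, line in enumerate(lines):
            col = line.find(sub)
            if col >= 0:
                return i, col
        return None

    lines = ["alpha", "beta", "gamma"]
    print "\n".join(indent_lines(lines))
    print find_in_lines(lines, "amm")    # (2, 1)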

Cheers,
Ron




Re: [Python-Dev] PATCH submitted: Speed up + for string concatenation, now as fast as "".join(x) idiom

2006-10-06 Thread Fredrik Lundh
[EMAIL PROTECTED] wrote:

> Greg> have you run any generic benchmarks such as pystone to get a
> Greg> better idea of what the net effect on "typical" python code is?
> 
> MAL's pybench would probably be better for this presuming it does some
> addition with string operands.

or stringbench.





Re: [Python-Dev] PATCH submitted: Speed up + for string concatenation, now as fast as "".join(x) idiom

2006-10-06 Thread skip

Greg> have you run any generic benchmarks such as pystone to get a
Greg> better idea of what the net effect on "typical" python code is?

MAL's pybench would probably be better for this presuming it does some
addition with string operands.

Skip


Re: [Python-Dev] PATCH submitted: Speed up + for string concatenation, now as fast as "".join(x) idiom

2006-10-05 Thread Fredrik Lundh
Steve Holden wrote:

> instance.method(*args) <==> type.method(instance, *args)
> 
> You can nowadays spell this as str.join("", lst) - no need to import a 
> whole module!

except that str.join isn't polymorphic:

 >>> str.join(u",", ["1", "2", "3"])
Traceback (most recent call last):
   File "<stdin>", line 1, in <module>
TypeError: descriptor 'join' requires a 'str' object but received a 'unicode'
 >>> string.join(["1", "2", "3"], u",")
u'1,2,3'





Re: [Python-Dev] PATCH submitted: Speed up + for string concatenation, now as fast as "".join(x) idiom

2006-10-05 Thread Steve Holden
Gregory P. Smith wrote:
>>I've never liked the "".join([]) idiom for string concatenation; in my 
>>opinion it violates the principles "Beautiful is better than ugly." and 
>>"There should be one-- and preferably only one --obvious way to do it.". 
>>(And perhaps several others.)  To that end I've submitted patch #1569040 
>>to SourceForge:
>>
>>http://sourceforge.net/tracker/index.php?func=detail&aid=1569040&group_id=5470&atid=305470
>>This patch speeds up using + for string concatenation.
> 
> 
> yay!  i'm glad to see this.  i hate the "".join syntax.  i still write
> that as string.join()  [...]

instance.method(*args) <==> type.method(instance, *args)

You can nowadays spell this as str.join("", lst) - no need to import a 
whole module!
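
For instance, a quick illustration at the interactive prompt:

     >>> str.join("-", ["a", "b", "c"])
     'a-b-c'
     >>> "-".join(["a", "b", "c"])
     'a-b-c'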

regards
  Steve
-- 
Steve Holden   +44 150 684 7255  +1 800 494 3119
Holden Web LLC/Ltd  http://www.holdenweb.com
Skype: holdenweb   http://holdenweb.blogspot.com
Recent Ramblings http://del.icio.us/steve.holden



Re: [Python-Dev] PATCH submitted: Speed up + for string concatenation, now as fast as "".join(x) idiom

2006-10-05 Thread Larry Hastings

Gregory P. Smith wrote:
> have you run any generic benchmarks such as pystone to get a better
> idea of what the net effect on "typical" python code is?
I hadn't, but I'm happy to.  On my machine (a fire-breathing Athlon 64 
x2 4400+), best of three runs:

Python 2.5 release:
Pystone(1.1) time for 50000 passes = 1.01757
This machine benchmarks at 49136.8 pystones/second

Python 2.5 concat:
Pystone(1.1) time for 50000 passes = 0.963191
This machine benchmarks at 51910.8 pystones/second

I'm surprised by this; I had expected it to be slightly *slower*, not 
the other way 'round.  I'm not sure why this is.  A cursory glance at 
pystone.py doesn't reveal any string concatenation using +, so I doubt 
it's benefiting from my speedup.  And I didn't change the optimization 
flags when I compiled Python, so that should be the same.
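
For anyone who wants to reproduce the numbers, a minimal sketch of
driving pystone from code, using the stock test.pystone module (run it
once under each build):

    from test import pystone
    benchtime, stones = pystone.pystones()   # default is 50,000 passes
    print "Pystone(1.1) time for %d passes = %g" % (pystone.LOOPS, benchtime)
    print "This machine benchmarks at %g pystones/second" % stones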


Josiah Carlson wrote:
> Regardless of "nicer to read", I would just point out that Guido has
> stated that Python will not have strings implemented as trees.
>   
I suspect it was more a caution that Python wouldn't *permanently* store 
strings as "ropes".  In my patch, the rope only exists until someone 
asks for the string's value, at which point the tree is rendered and 
dereferenced.  From that point on the object is exactly like a normal 
PyStringObject to the external viewer.

But you and I are, as I believe the saying goes, "channeling Guido 
(badly)".  Perhaps some adult supervision will intervene soon and make 
its opinions known.


For what it's worth, I've realized two things I want to change about my 
patch:

  * I left in a couple of /* lch */ comments I used during development 
as markers to find my own code.  Whoops; I'll strip those out.

  * I realized that, because of struct packing, all PyStringObjects are 
currently wasting an average of two bytes apiece.  (As in, that's 
something Python 2.5 does, not something added by my code.)  I'll change 
my patch so strings are allocated more precisely.  If my string 
concatenation patch is declined, I'll be sure to submit this patch 
separately.

I'll try to submit an updated patch today.

Cheers,


/larry/


Re: [Python-Dev] PATCH submitted: Speed up + for string concatenation, now as fast as "".join(x) idiom

2006-10-05 Thread Nicko van Someren
On 5 Oct 2006, at 20:28, Gregory P. Smith wrote:

>> I've never liked the "".join([]) idiom for string concatenation;  
>> in my
>> opinion it violates the principles "Beautiful is better than  
>> ugly." and
>> "There should be one-- and preferably only one --obvious way to do  
>> it.".
>> (And perhaps several others.)  To that end I've submitted patch  
>> #1569040
>> to SourceForge:
>>
>> http://sourceforge.net/tracker/index.php?func=detail&aid=1569040&group_id=5470&atid=305470
>> This patch speeds up using + for string concatenation.
>
> yay!  i'm glad to see this.  i hate the "".join syntax.

Hear, hear.  Being able to write what you mean and have the language  
get decent performance nonetheless seems to me to be a "good thing".

> have you run any generic benchmarks such as pystone to get a better
> idea of what the net effect on "typical" python code is?

Yeah, "real world" performance testing is always important with  
anything that uses lazy evaluation.  If you get to control if and  
when the computation actually happens you have even more scope than  
usual for getting the benchmark answer you want to see!

Cheers,
Nicko





Re: [Python-Dev] PATCH submitted: Speed up + for string concatenation, now as fast as "".join(x) idiom

2006-10-05 Thread Josiah Carlson

"Gregory P. Smith" <[EMAIL PROTECTED]> wrote:
> 
> > I've never liked the "".join([]) idiom for string concatenation; in my 
> > opinion it violates the principles "Beautiful is better than ugly." and 
> > "There should be one-- and preferably only one --obvious way to do it.". 
> > (And perhaps several others.)  To that end I've submitted patch #1569040 
> > to SourceForge:
> > 
> > http://sourceforge.net/tracker/index.php?func=detail&aid=1569040&group_id=5470&atid=305470
> > This patch speeds up using + for string concatenation.
> 
> yay!  i'm glad to see this.  i hate the "".join syntax.  i still write
> that as string.join() (because that's at least readable).  it also fixes
> the python idiom for fast string concatenation as intended; anyone
> who's ever written code that builds a large string value by pushing
> substrings into a list only to call join later should agree.
> 
> mystr = "prefix"
> while bla:
>   #...
>   mystr += moredata

Regardless of "nicer to read", I would just point out that Guido has
stated that Python will not have strings implemented as trees.  Also,
Python 3.x will have a data type called 'bytes', which will be the
default return of file.read() (when files are opened as binary), which
uses an over-allocation strategy like lists to get relatively fast
concatenation (on the order of lst1 += lst2).

 - Josiah



Re: [Python-Dev] PATCH submitted: Speed up + for string concatenation, now as fast as "".join(x) idiom

2006-10-05 Thread Gregory P. Smith
> I've never liked the "".join([]) idiom for string concatenation; in my 
> opinion it violates the principles "Beautiful is better than ugly." and 
> "There should be one-- and preferably only one --obvious way to do it.". 
> (And perhaps several others.)  To that end I've submitted patch #1569040 
> to SourceForge:
> 
> http://sourceforge.net/tracker/index.php?func=detail&aid=1569040&group_id=5470&atid=305470
> This patch speeds up using + for string concatenation.

yay!  i'm glad to see this.  i hate the "".join syntax.  i still write
that as string.join() (because that's at least readable).  it also fixes
the python idiom for fast string concatenation as intended; anyone
who's ever written code that builds a large string value by pushing
substrings into a list only to call join later should agree.

mystr = "prefix"
while bla:
  #...
  mystr += moredata

is much nicer to read than

mystr = "prefix"
strParts = [mystr]
while bla:
  #...
  strParts.append(moredata)
mystr = "".join(strParts)
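
For what it's worth, a rough timeit sketch for comparing the two idioms
on a given build (loop sizes and counts are arbitrary):

    import timeit
    plus = timeit.Timer(
        "s = ''\nfor i in xrange(1000): s += 'x' * 50")
    join = timeit.Timer(
        "parts = []\nfor i in xrange(1000): parts.append('x' * 50)\n"
        "s = ''.join(parts)")
    print min(plus.repeat(3, 100)), min(join.repeat(3, 100))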

have you run any generic benchmarks such as pystone to get a better
idea of what the net effect on "typical" python code is?



[Python-Dev] PATCH submitted: Speed up + for string concatenation, now as fast as "".join(x) idiom

2006-10-04 Thread Larry Hastings


I've never liked the "".join([]) idiom for string concatenation; in my 
opinion it violates the principles "Beautiful is better than ugly." and 
"There should be one-- and preferably only one --obvious way to do it.". 
(And perhaps several others.)  To that end I've submitted patch #1569040 
to SourceForge:

http://sourceforge.net/tracker/index.php?func=detail&aid=1569040&group_id=5470&atid=305470
This patch speeds up using + for string concatenation.  It's been in 
discussion on c.l.p for about a week, here:

http://groups.google.com/group/comp.lang.python/browse_frm/thread/b8a8f20bc3c81bcf

I'm not a Python guru, and my initial benchmark had many mistakes.  With 
help from the community, correct benchmarks emerged: + for string 
concatenation is now roughly as fast as the usual "".join() idiom when 
appending.  (It appears to be *much* faster for prepending.)  The 
patched Python passes all the tests in regrtest.py for which I have 
source; I didn't install external packages such as bsddb and sqlite3.

My approach was to add a "string concatenation" object; I have since 
learned this is also called a "rope".  Internally, a 
PyStringConcatenationObject is exactly like a PyStringObject but with a 
few extra members taking an additional thirty-six bytes of storage.  
When you add two PyStringObjects together, string_concat() returns a 
PyStringConcatenationObject which contains references to the two strings.  
Concatenating any mixture of PyStringObjects and 
PyStringConcatenationObjects works similarly, though there are some 
internal optimizations.

These changes are almost entirely contained within 
Objects/stringobject.c and Include/stringobject.h.  There is one major 
externally-visible change in this patch: PyStringObject.ob_sval is no 
longer a char[1] array, but a char *. Happily, this only requires a 
recompile, because the CPython source is *marvelously* consistent about 
using the macro PyString_AS_STRING().  (One hopes extension authors are 
as consistent.)  I only had to touch two other files (Python/ceval.c and 
Objects/codeobject.c) and those were one-line changes.  There is one 
remaining place that still needs fixing: the self-described "hack" in 
Mac/Modules/MacOS.c.  Fixing that is beyond my pay grade.

I changed the representation of ob_sval for two reasons: first, because it 
is initially NULL for a string concatenation object, and second, because it 
may point to separately-allocated memory.  That's where the speedup came 
from--it doesn't render the string until someone asks for the string's 
value.  It is telling to see my new implementation of 
PyString_AS_STRING, as follows (casts and extra parentheses removed for 
legibility):

    #define PyString_AS_STRING(x) ( x->ob_sval ? x->ob_sval : PyString_AsString(x) )

This adds a layer of indirection for the string and a branch, adding a 
tiny (but measurable) slowdown to the general case.  Again, because the 
changes to PyStringObject are hidden by this macro, external users of 
these objects don't notice the difference.
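
For readers who want the flavor of the approach without reading the C,
here is a tiny Python sketch of the same lazy-render idea; it is purely
illustrative (the class name is made up, and the real patch does this
inside the C string type rather than with a new Python-level class):

    class LazyConcat(object):
        # Hold references to the two pieces; only build the real
        # string when somebody asks for the value.
        def __init__(self, left, right):
            self.left, self.right, self.rendered = left, right, None
        def __add__(self, other):
            return LazyConcat(self, other)
        def __str__(self):
            if self.rendered is None:
                self.rendered = "".join([str(self.left), str(self.right)])
            return self.rendered

    s = LazyConcat("abc", "def") + "ghi"   # no copying happens yet
    print str(s)                           # renders "abcdefghi" exactly once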

The patch is posted, and I have donned the thickest skin I have handy.  
I look forward to your feedback.

Cheers,


/larry/