Re: [Python-Dev] PATCH submitted: Speed up + for string concatenation, now as fast as "".join(x) idiom
Larry Hastings <[EMAIL PROTECTED]> wrote:
[snip]
> The machine is dual-core, and was quiescent at the time. XP's scheduler
> is hopefully good enough to just leave the process running on one core.

It's not. Go into the task manager (accessible via Ctrl+Alt+Del by default) and change the process' affinity to the second core. In my experience, running on the second core (in both 2k and XP) tends to produce slightly faster results. Linux tends to keep processes on a single core for a few seconds at a time.

- Josiah

___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
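[Editorial note: on Linux, the affinity pinning Josiah describes can be done programmatically. A minimal sketch in modern Python; `pin_to_core` is our own name, and `os.sched_setaffinity` is a real stdlib call but was only added in Python 3.3, long after this thread.]

```python
import os

def pin_to_core(core):
    """Pin the current process to a single CPU core, if the platform
    exposes the call (Linux does; Windows and macOS do not expose it
    through the os module -- on XP you would use Task Manager, as
    described above)."""
    if hasattr(os, "sched_setaffinity"):
        try:
            os.sched_setaffinity(0, {core})  # pid 0 means "this process"
            return True
        except OSError:
            return False
    return False
```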
Re: [Python-Dev] PATCH submitted: Speed up + for string concatenation, now as fast as "".join(x) idiom
I've uploaded a new patch to Sourceforge in response to feedback:

* I purged all // comments and fixed all lines > 80 characters added by my patch, as per Neal Norwitz.
* I added a definition of max() for those who don't already have one, as per [EMAIL PROTECTED]

It now compiles cleanly on Linux again without modification; sorry for not checking that since the original patch. I've also uploaded my hacked-together benchmark script, for all that's worth. That patch tracker page again:
http://sourceforge.net/tracker/index.php?func=detail&aid=1569040&group_id=5470&atid=305470

M.-A. Lemburg wrote:
> When comparing results, please look at the minimum runtime.
> The average times are just given to indicate how much the mintime
> differs from the average of all runs.

I'll do that next time. In the meantime, I've also uploaded a zip file containing the results of my benchmarking, including the stdout from the run and the "-f" file which contains the pickled output. So you can examine my results yourself, including doing analysis on the pickled data if you like.

> If however the speedups are not consistent across several runs of
> pybench, then it's likely that you have some background activity
> going on on the machine which causes a slowdown in the unmodified
> run you chose as basis for the comparison.

The machine is dual-core, and was quiescent at the time. XP's scheduler is hopefully good enough to just leave the process running on one core.

I ran the benchmarks just once on my Linux 2.6 machine; it's a dual-CPU P3 933EB (or maybe just 866EB, I forget). It's faster overall there too, by 1.9% (minimum run-time). The two tests I expected to be faster ("ConcatStrings" and "CreateStringsWithConcat") were consistently much faster; beyond that the results don't particularly resemble the results from my XP machine. (I uploaded those .txt and .pickle files too.) The mystery overall speedup continues, not that I find it unwelcome.
:)

> Just to make sure: you are using pybench 2.0, right ?

I sure was. And I used stringbench.py downloaded from here:
http://svn.python.org/projects/sandbox/branches/jim-fix-setuptools-cli/stringbench/stringbench.py

Cheers, /larry/
Re: [Python-Dev] PATCH submitted: Speed up + for string concatenation, now as fast as "".join(x) idiom
This patch looks really nice to use here at CCP. Our code is full of string concatenations so I will probably try to apply the patch soon and see what it gives us in a real-life app. The floating point integer cache was also a big win. Soon, standard Python won't be able to keep up with the patched versions out there :)

Oh, and since I have fixed the pcbuild8 thingy in the 2.5 branch, why don't you give the PGO version a whirl too? Even the non-PGO dll, with link-time code generation, should be faster than your vanilla PCBuild one. Read the Readme.txt for details.

Cheers,
Kristján

> -----Original Message-----
> From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]] On Behalf Of M.-A. Lemburg
> Sent: 9. október 2006 09:30
> To: Larry Hastings
> Cc: python-dev@python.org
> Subject: Re: [Python-Dev] PATCH submitted: Speed up + for string concatenation, now as fast as "".join(x) idiom
>
> [snip]
Re: [Python-Dev] PATCH submitted: Speed up + for string concatenation, now as fast as "".join(x) idiom
Larry Hastings wrote:
> Fredrik Lundh wrote:
>> [EMAIL PROTECTED] wrote:
>>> MAL's pybench would probably be better for this presuming it does some
>>> addition with string operands.
>> or stringbench.
>
> I ran 'em, and they are strangely consistent with pystone.
>
> With concat, stringbench is ever-so-slightly faster overall. "172.82"
> vs "174.85" for the "ascii" column, I guess that's in seconds. I'm just
> happy it's not slower. (I only ran stringbench once; it seems to take
> *forever*).
>
> I ran pybench three times for each build. The slowest concat overall
> time was still 2.9% faster than the fastest release time.
> "ConcatStrings" is a big winner, at around 150% faster; since the test
> doesn't *do* anything with the concatenated values, it never renders the
> concatenation objects, so it does a lot less work.
> "CreateStringsWithConcat" is generally 18-19% faster, as expected.
> After that, the timings are all over the place, but some tests were
> consistently faster: "CompareInternedStrings" was 8-12% faster,
> "DictWithFloatKeys" was 9-11% faster, "SmallLists" was 8-15% faster,
> "CompareLongs" was 6-10% faster, and "PyMethodCalls" was 4-6% faster.
> (These are all comparing the "average run-time" results, though the
> "minimum run-time" results were similar.)

When comparing results, please look at the minimum runtime. The average times are just given to indicate how much the mintime differs from the average of all runs.

> I still couldn't tell you why my results are faster. I swear on my
> mother's eyes I didn't touch anything major involved in
> "DictWithFloatKeys", "SmallLists", or "CompareLongs". I didn't touch
> the compiler settings, so that shouldn't be it. I acknowledge not only
> that it could all be a mistake, and that I don't know enough about it to
> speculate.

Depending on what you changed, it is possible that the layout of the code in memory better fits your CPU architecture.
If however the speedups are not consistent across several runs of pybench, then it's likely that you have some background activity going on on the machine which causes a slowdown in the unmodified run you chose as basis for the comparison.

Just to make sure: you are using pybench 2.0, right ?

--
Marc-Andre Lemburg
eGenix.com

Professional Python Services directly from the Source  (#1, Oct 09 2006)
>>> Python/Zope Consulting and Support ...        http://www.egenix.com/
>>> mxODBC.Zope.Database.Adapter ...             http://zope.egenix.com/
>>> mxODBC, mxDateTime, mxTextTools ...        http://python.egenix.com/

::: Try mxODBC.Zope.DA for Windows,Linux,Solaris,FreeBSD for free !
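[Editorial note: MAL's advice -- compare the minimum of several runs, since background activity can only ever make a run slower, never faster -- is the same principle behind `timeit.repeat` in the stdlib. A minimal sketch in modern Python; the `bench` helper and its defaults are our own, not part of pybench.]

```python
import timeit

def bench(stmt, setup="pass", repeats=5, number=100000):
    """Return the best (minimum) time over `repeats` runs of `stmt`.

    The minimum is the least noisy estimate: OS scheduling and other
    background activity can only inflate a measurement, never deflate it.
    """
    times = timeit.repeat(stmt, setup=setup, repeat=repeats, number=number)
    return min(times)

concat = bench("s = a + b", setup="a = 'x' * 100; b = 'y' * 100")
joined = bench("s = ''.join([a, b])", setup="a = 'x' * 100; b = 'y' * 100")
print("concat: %.4fs  join: %.4fs" % (concat, joined))
```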
Re: [Python-Dev] PATCH submitted: Speed up + for string concatenation, now as fast as "".join(x) idiom
Fredrik Lundh wrote:
> [EMAIL PROTECTED] wrote:
>> MAL's pybench would probably be better for this presuming it does some
>> addition with string operands.
> or stringbench.

I ran 'em, and they are strangely consistent with pystone.

With concat, stringbench is ever-so-slightly faster overall. "172.82" vs "174.85" for the "ascii" column, I guess that's in seconds. I'm just happy it's not slower. (I only ran stringbench once; it seems to take *forever*).

I ran pybench three times for each build. The slowest concat overall time was still 2.9% faster than the fastest release time. "ConcatStrings" is a big winner, at around 150% faster; since the test doesn't *do* anything with the concatenated values, it never renders the concatenation objects, so it does a lot less work. "CreateStringsWithConcat" is generally 18-19% faster, as expected. After that, the timings are all over the place, but some tests were consistently faster: "CompareInternedStrings" was 8-12% faster, "DictWithFloatKeys" was 9-11% faster, "SmallLists" was 8-15% faster, "CompareLongs" was 6-10% faster, and "PyMethodCalls" was 4-6% faster. (These are all comparing the "average run-time" results, though the "minimum run-time" results were similar.)

I still couldn't tell you why my results are faster. I swear on my mother's eyes I didn't touch anything major involved in "DictWithFloatKeys", "SmallLists", or "CompareLongs". I didn't touch the compiler settings, so that shouldn't be it. I acknowledge not only that it could all be a mistake, and that I don't know enough about it to speculate.

The speedup mystery continues,

*larry*
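[Editorial note: the two idioms the whole thread compares, as a small self-contained example in modern Python. Note that today's CPython also optimizes repeated `+=` on a uniquely-referenced string in place, so the gap is far smaller than it was in 2006, but `''.join` still has guaranteed linear behavior.]

```python
def build_with_plus(parts):
    """Naive concatenation: without the patch, each + copies the
    accumulated string, giving O(n^2) worst-case behavior."""
    s = ""
    for p in parts:
        s = s + p
    return s

def build_with_join(parts):
    """The recommended idiom: one pass, one allocation, O(n)."""
    return "".join(parts)

parts = [str(i) for i in range(1000)]
assert build_with_plus(parts) == build_with_join(parts)
```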
Re: [Python-Dev] PATCH submitted: Speed up + for string concatenation, now as fast as "".join(x) idiom
On 7 Oct 2006, at 09:17, Fredrik Lundh wrote:
> Nicko van Someren wrote:
>> If it speeds up pystone by 5.5% with such minimal down side
>> I'm hard pressed to see a reason not to use it.
>
> can you tell me where exactly "pystone" does string concatenations?

No, not without more in-depth examination, but it is a pretty common operation in all sorts of cases, including inside the interpreter. Larry's message in reply to Gregory Smith's request for a pystone score showed a 5.5% improvement, and as yet I have no reason to doubt it. If the patch provides a measurable performance improvement for code that merely happens to use strings, as opposed to being explicitly heavy on string addition, then all the better.

It's clear that this needs to be more carefully measured before it goes in (which is why that quote above starts "If"). As I've mentioned before in this thread, getting good performance measures on code that does lazy evaluation is often tricky. pystone is a good place to start, but I'm sure that there are use cases it does not cover.

As for counting up the downsides, Josiah Carlson rightly points out that it breaks binary compatibility for modules, so the change can not be taken lightly and clearly it will have to wait for a major release. Still, if the benefits outweigh the costs it seems worth doing.

Cheers,
Nicko
Re: [Python-Dev] PATCH submitted: Speed up + for string concatenation, now as fast as "".join(x) idiom
Fredrik> Nicko van Someren wrote:
>> If it speeds up pystone by 5.5% with such minimal down side I'm hard
>> pressed to see a reason not to use it.

Fredrik> can you tell me where exactly "pystone" does string
Fredrik> concatenations?

I wondered about that as well. While I'm not prepared to assert without a doubt that pystone does no simpleminded string concatenation, a couple minutes scanning the pystone source didn't turn up any. If the pystone speedup isn't an artifact, the absence of string concatenation in pystone suggests it's happening somewhere in the interpreter.

I applied the patch, ran the interpreter under gdb with a breakpoint set in string_concat where the PyStringConcatenationObject is created, then ran pystone. The first hit was in site.py -> distutils/util.py -> string.py. All told, there were only 22 hits, none for very long strings, so that doesn't explain the performance improvement.

BTW, on my Mac (OS X 10.4.8) max() is not defined. I had to add a macro definition to string_concat.

Skip
Re: [Python-Dev] PATCH submitted: Speed up + for string concatenation, now as fast as "".join(x) idiom
Nicko van Someren wrote:
> If it speeds up pystone by 5.5% with such minimal down side
> I'm hard pressed to see a reason not to use it.

can you tell me where exactly "pystone" does string concatenations?
Re: [Python-Dev] PATCH submitted: Speed up + for string concatenation, now as fast as "".join(x) idiom
Nicko van Someren <[EMAIL PROTECTED]> wrote:
> It's not like having this patch is going to force anyone to change
> the way they write their code. As far as I can tell it simply offers
> better performance if you choose to express your code in some common
> ways. If it speeds up pystone by 5.5% with such minimal down side
> I'm hard pressed to see a reason not to use it.

This has to wait until Python 2.6 (which is anywhere from 14-24 months away, according to history); including it would destroy binary compatibility with modules compiled for 2.5, never mind that it is a nontrivial feature addition.

I also think that the original author (or one of this patch's supporters) should write a PEP outlining the Python 2.5 and earlier drawbacks, what changes this implementation brings, its improvements, and any potential drawbacks.

- Josiah
Re: [Python-Dev] PATCH submitted: Speed up + for string concatenation, now as fast as "".join(x) idiom
Nicko van Someren wrote:
> On 6 Oct 2006, at 12:37, Ron Adam wrote:
>>> I've never liked the "".join([]) idiom for string concatenation; in my
>>> opinion it violates the principles "Beautiful is better than ugly." and
>>> "There should be one-- and preferably only one --obvious way to do it.".
> ...
>> Well I always like things to run faster, but I disagree that this
>> idiom is broken.
>>
>> I like using lists to store sub strings and I think it's just a matter of
>> changing your frame of reference in how you think about them.
>
> I think that you've hit on exactly the reason why this patch is a good
> idea. You happen to like to store strings in lists, and in many
> situations this is a fine thing to do, but if one is forced to change
> ones frame of reference in order to get decent performance then as well
> as violating the maxims Larry originally cited you're also hitting both
> "readability counts" and "Correctness and clarity before speed."

The statement ".. if one is forced to change .." is a bit overstated I think. The situation is more a matter of increasing awareness so the frame of reference comes to mind more naturally and doesn't seem forced. And the suggestion of how to do that is by adding additional functions and methods that can use lists-of-strings instead of having to join or concatenate them first. Added examples and documentation can also do that as well.

The two ideas are non-competing. They are related because they realize their benefits by reducing redundant underlying operations in a similar way.

Cheers,
Ron
Re: [Python-Dev] PATCH submitted: Speed up + for string concatenation, now as fast as "".join(x) idiom
On 6 Oct 2006, at 12:37, Ron Adam wrote:
>>> I've never liked the "".join([]) idiom for string concatenation; in my
>>> opinion it violates the principles "Beautiful is better than ugly." and
>>> "There should be one-- and preferably only one --obvious way to do it.".
> ...
> Well I always like things to run faster, but I disagree that this
> idiom is broken.
>
> I like using lists to store sub strings and I think it's just a matter of
> changing your frame of reference in how you think about them.

I think that you've hit on exactly the reason why this patch is a good idea. You happen to like to store strings in lists, and in many situations this is a fine thing to do, but if one is forced to change ones frame of reference in order to get decent performance then as well as violating the maxims Larry originally cited you're also hitting both "readability counts" and "Correctness and clarity before speed."

The "".join(L) idiom is not "broken" in the sense that, to the fluent Python programmer, it does convey the intent as well as the action. That said, there are plenty of places that you'll see it not being used because it fails to convey the intent. It's pretty rare to see someone write:

    for k, v in d.items():
        print " has value: ".join([k, v])

but, despite the utility of the % operator on strings, it's pretty common to see:

    print k + " has value: " + v

This patch _seems_ to be able to provide better performance for this sort of usage and provide a major speed-up for some other common usage forms without causing the programmer to resort to making their code more complicated. The cost seems to be a small memory hit on the size of a string object, a tiny increase in code size and some well isolated, under-the-hood complexity.

It's not like having this patch is going to force anyone to change the way they write their code. As far as I can tell it simply offers better performance if you choose to express your code in some common ways.
If it speeds up pystone by 5.5% with such minimal down side I'm hard pressed to see a reason not to use it.

Cheers,
Nicko
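[Editorial note: the three spellings Nicko compares do produce identical strings, which is easy to check directly. A minimal example in modern Python syntax; the dict contents are hypothetical.]

```python
d = {"name": "patch", "status": "submitted"}

for k, v in d.items():
    line_join = " has value: ".join([k, v])   # the rare spelling
    line_plus = k + " has value: " + v        # the common spelling
    line_fmt = "%s has value: %s" % (k, v)    # the % operator
    # All three build the same string; they differ only in readability
    # and (before the patch) in performance characteristics.
    assert line_join == line_plus == line_fmt
```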
Re: [Python-Dev] PATCH submitted: Speed up + for string concatenation, now as fast as "".join(x) idiom
Josiah Carlson wrote:
> Fredrik Lundh <[EMAIL PROTECTED]> wrote:
>> Ron Adam wrote:
>>> I think what may be missing is a larger set of higher level string
>>> functions that will work with lists of strings directly. Then lists of
>>> strings can be thought of as a mutable string type by its use, and then
>>> working with substrings in lists and using ''.join() will not seem as
>>> out of place.
>>
>> as important is the observation that you don't necessarily have to join
>> string lists; if the data ends up being sent over a wire or written to
>> disk, you might as well skip the join step, and work directly from the list.
>>
>> (it's no accident that ET has grown "tostringlist" and "fromstringlist"
>> functions, for example ;-)
>
> I've personally added a line-based abstraction with indent/dedent
> handling, etc., for the editor I use, which helps make macros and
> underlying editor functionality easier to write.
>
> - Josiah

I've done the same thing just last week. I've started to collect them into a module called stringtools, but I see no reason why they can't reside in the string module. I think this may be just a case of collecting these types of routines together in one place so they can be reused easily, because they already are scattered around Python's library in some form or another.

Another tool I found tucked away within pydoc is the console pager it uses. I think it could easily be a separate module itself. And it benefits from the line-based abstraction as well.

Cheers,
Ron
Re: [Python-Dev] PATCH submitted: Speed up + for string concatenation, now as fast as "".join(x) idiom
On 10/6/06, Fredrik Lundh <[EMAIL PROTECTED]> wrote:
> Ron Adam wrote:
>> I think what may be missing is a larger set of higher level string
>> functions that will work with lists of strings directly. Then lists of
>> strings can be thought of as a mutable string type by its use, and then
>> working with substrings in lists and using ''.join() will not seem as
>> out of place.
>
> as important is the observation that you don't necessarily have to join
> string lists; if the data ends up being sent over a wire or written to
> disk, you might as well skip the join step, and work directly from the list.
>
> (it's no accident that ET has grown "tostringlist" and "fromstringlist"
> functions, for example ;-)

The just-make-lists paradigm is used by Erlang too; it's called an "iolist" there (it's not a type, just a convention). The lists can be nested though, so concatenating chunks of data for IO is always a constant-time operation even if the chunks are already iolists.

-bob
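[Editorial note: the Erlang "iolist" convention Bob describes can be sketched in a few lines of modern Python. The function names here are our own, not an existing API; the point is that concatenation is O(1) list nesting, and flattening happens once, at the I/O boundary.]

```python
def flatten_iolist(iolist):
    """Yield leaf strings from an arbitrarily nested list of strings."""
    for item in iolist:
        if isinstance(item, str):
            yield item
        else:
            yield from flatten_iolist(item)

def iolist_to_str(iolist):
    """Render the whole nested structure with a single join."""
    return "".join(flatten_iolist(iolist))

# Concatenating two chunks is O(1): just wrap them in a new list.
a = ["Hello", ", "]
b = ["world", "!"]
combined = [a, b]
assert iolist_to_str(combined) == "Hello, world!"
```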
Re: [Python-Dev] PATCH submitted: Speed up + for string concatenation, now as fast as "".join(x) idiom
Fredrik Lundh <[EMAIL PROTECTED]> wrote:
> Ron Adam wrote:
>> I think what may be missing is a larger set of higher level string
>> functions that will work with lists of strings directly. Then lists of
>> strings can be thought of as a mutable string type by its use, and then
>> working with substrings in lists and using ''.join() will not seem as
>> out of place.
>
> as important is the observation that you don't necessarily have to join
> string lists; if the data ends up being sent over a wire or written to
> disk, you might as well skip the join step, and work directly from the list.
>
> (it's no accident that ET has grown "tostringlist" and "fromstringlist"
> functions, for example ;-)

I've personally added a line-based abstraction with indent/dedent handling, etc., for the editor I use, which helps make macros and underlying editor functionality easier to write.

- Josiah
Re: [Python-Dev] PATCH submitted: Speed up + for string concatenation, now as fast as "".join(x) idiom
Ron Adam wrote:
> I think what may be missing is a larger set of higher level string
> functions that will work with lists of strings directly. Then lists of
> strings can be thought of as a mutable string type by its use, and then
> working with substrings in lists and using ''.join() will not seem as
> out of place.

as important is the observation that you don't necessarily have to join string lists; if the data ends up being sent over a wire or written to disk, you might as well skip the join step, and work directly from the list.

(it's no accident that ET has grown "tostringlist" and "fromstringlist" functions, for example ;-)
Re: [Python-Dev] PATCH submitted: Speed up + for string concatenation, now as fast as "".join(x) idiom
Gregory P. Smith wrote:
>> I've never liked the "".join([]) idiom for string concatenation; in my
>> opinion it violates the principles "Beautiful is better than ugly." and
>> "There should be one-- and preferably only one --obvious way to do it.".
>> (And perhaps several others.) To that end I've submitted patch #1569040
>> to SourceForge:
>>
>> http://sourceforge.net/tracker/index.php?func=detail&aid=1569040&group_id=5470&atid=305470
>> This patch speeds up using + for string concatenation.
>
> yay! i'm glad to see this. i hate the "".join syntax. i still write
> that as string.join() because thats at least readable. it also fixes
> the python idiom for fast string concatenation as intended; anyone
> whos ever written code that builds a large string value by pushing
> substrings into a list only to call join later should agree.

Well I always like things to run faster, but I disagree that this idiom is broken.

I like using lists to store sub strings and I think it's just a matter of changing your frame of reference in how you think about them. For example it doesn't bother me to have a numeric type with many digits, and to have lists of many, many-digit numbers, and work with those. Working with lists of many-character strings is not that different.

I've even come to the conclusion (just my opinion) that mutable lists of strings probably would work better than a long mutable string of characters in most situations.

What I've found is there seems to be an optimum string length depending on what you are doing. Too long (hundreds or thousands of characters) and repeating some string operations (not just concatenations) can be slow relative to short strings, while using many short (single character) strings would use more memory than is needed. So a list of medium length strings is actually a very nice compromise. I'm not sure what the optimal string length is, but lines of about 80 columns seem to work very well for most things.
I think what may be missing is a larger set of higher level string functions that will work with lists of strings directly. Then lists of strings can be thought of as a mutable string type by its use, and then working with substrings in lists and using ''.join() will not seem as out of place.

So maybe instead of splitting, modifying, then joining (and again, etc.), just pass the whole list around and have operations that work directly on the list of strings and return a list of strings as the result. Pretty much what the patch does under the covers, but it only works with concatenation. Having more functions that work with lists of strings directly will reduce the need for concatenation as well.

Some operations that could work well with whole lists of strings of lines may be indent_lines, dedent_lines, prepend_lines, wrap_lines, and of course join_lines as in '\n'.join(L), the inverse of s.splitlines(); there are also readlines() and writelines(). Also possibly find_line or find_in_lines(). These really shouldn't seem any more out of place than numeric operations that work with lists, such as sum, max, and min.

So to me... "".join(L) as a string operation that works on a list of strings seems perfectly natural. :-)

Cheers,
Ron
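[Editorial note: the helpers Ron names (indent_lines, dedent_lines, join_lines) are proposals, not an existing API. A minimal sketch of what they might look like, to make the "operate on lists of lines, join once at the end" idea concrete.]

```python
def indent_lines(lines, prefix="    "):
    """Return a new list with `prefix` prepended to every line."""
    return [prefix + line for line in lines]

def dedent_lines(lines, prefix="    "):
    """Return a new list with `prefix` stripped where present."""
    return [line[len(prefix):] if line.startswith(prefix) else line
            for line in lines]

def join_lines(lines):
    """Inverse of str.splitlines() for newline-free lines."""
    return "\n".join(lines)

src = ["def f():", "return 1"]
body = indent_lines(src[1:])
assert join_lines(src[:1] + body) == "def f():\n    return 1"
assert dedent_lines(body) == ["return 1"]
```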
Re: [Python-Dev] PATCH submitted: Speed up + for string concatenation, now as fast as "".join(x) idiom
[EMAIL PROTECTED] wrote:
> Greg> have you run any generic benchmarks such as pystone to get a
> Greg> better idea of what the net effect on "typical" python code is?
>
> MAL's pybench would probably be better for this presuming it does some
> addition with string operands.

or stringbench.
Re: [Python-Dev] PATCH submitted: Speed up + for string concatenation, now as fast as "".join(x) idiom
Greg> have you run any generic benchmarks such as pystone to get a
Greg> better idea of what the net effect on "typical" python code is?

MAL's pybench would probably be better for this presuming it does some addition with string operands.

Skip
Re: [Python-Dev] PATCH submitted: Speed up + for string concatenation, now as fast as "".join(x) idiom
Steve Holden wrote:
> instance.method(*args) <==> type.method(instance, *args)
>
> You can nowadays spell this as str.join("", lst) - no need to import a
> whole module!

except that str.join isn't polymorphic:

>>> str.join(u",", ["1", "2", "3"])
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: descriptor 'join' requires a 'str' object but received a 'unicode'
>>> string.join(["1", "2", "3"], u",")
u'1,2,3'
Re: [Python-Dev] PATCH submitted: Speed up + for string concatenation, now as fast as "".join(x) idiom
Gregory P. Smith wrote:
>> I've never liked the "".join([]) idiom for string concatenation; in my
>> opinion it violates the principles "Beautiful is better than ugly." and
>> "There should be one-- and preferably only one --obvious way to do it.".
>> (And perhaps several others.) To that end I've submitted patch #1569040
>> to SourceForge:
>>
>> http://sourceforge.net/tracker/index.php?func=detail&aid=1569040&group_id=5470&atid=305470
>> This patch speeds up using + for string concatenation.
>
> yay! i'm glad to see this. i hate the "".join syntax. i still write
> that as string.join() [...]

instance.method(*args) <==> type.method(instance, *args)

You can nowadays spell this as str.join("", lst) - no need to import a whole module!

regards
Steve

--
Steve Holden +44 150 684 7255 +1 800 494 3119
Holden Web LLC/Ltd http://www.holdenweb.com
Skype: holdenweb http://holdenweb.blogspot.com
Recent Ramblings http://del.icio.us/steve.holden
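[Editorial note: Steve's equivalence can be checked directly in modern Python -- calling the method through the type with the instance as the first argument gives the same result as a normal bound-method call.]

```python
lst = ["a", "b", "c"]

bound = "-".join(lst)          # instance.method(*args)
unbound = str.join("-", lst)   # type.method(instance, *args)
assert bound == unbound == "a-b-c"
```

(Fredrik's follow-up caveat about polymorphism was specific to Python 2's separate str/unicode types; in Python 3 there is only one text type, so this particular trap is gone.)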
Re: [Python-Dev] PATCH submitted: Speed up + for string concatenation, now as fast as "".join(x) idiom
Gregory P. Smith wrote:
> have you run any generic benchmarks such as pystone to get a better
> idea of what the net effect on "typical" python code is?

I hadn't, but I'm happy to.  On my machine (a fire-breathing Athlon 64
x2 4400+), best of three runs:

Python 2.5 release:
    Pystone(1.1) time for 50000 passes = 1.01757
    This machine benchmarks at 49136.8 pystones/second

Python 2.5 concat:
    Pystone(1.1) time for 50000 passes = 0.963191
    This machine benchmarks at 51910.8 pystones/second

I'm surprised by this; I had expected it to be slightly *slower*, not
the other way 'round.  I'm not sure why this is.  A cursory glance at
pystone.py doesn't reveal any string concatenation using +, so I doubt
it's benefiting from my speedup.  And I didn't change the optimization
flags when I compiled Python, so that should be the same.

Josiah Carlson wrote:
> Regardless of "nicer to read", I would just point out that Guido has
> stated that Python will not have strings implemented as trees.

I suspect it was more a caution that Python wouldn't *permanently*
store strings as "ropes".  In my patch, the rope only exists until
someone asks for the string's value, at which point the tree is
rendered and dereferenced.  From that point on the object is exactly
like a normal PyStringObject to the external viewer.

But you and I are, as I believe the saying goes, "channeling Guido
(badly)".  Perhaps some adult supervision will intervene soon and make
its opinions known.

For what it's worth, I've realized two things I want to change about my
patch:

* I left in a couple of /* lch */ comments I used during development as
  markers to find my own code.  Whoops; I'll strip those out.

* I realized that, because of struct packing, all PyStringObjects are
  currently wasting an average of two bytes apiece.  (As in, that's
  something Python 2.5 does, not something added by my code.)  I'll
  change my patch so strings are allocated more precisely.  If my
  string concatenation patch is declined, I'll be sure to submit this
  patch separately.

I'll try to submit an updated patch today.

Cheers,

/larry/
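(For anyone who wants to poke at the two idioms directly rather than
through pystone, here is a small timeit sketch; the function names and
loop sizes are mine, not from the patch or the uploaded benchmark zip:)

```python
import timeit

def concat_plus(n=1000):
    # Build a string with repeated += (the idiom the patch accelerates).
    s = ""
    for _ in range(n):
        s += "x"
    return s

def concat_join(n=1000):
    # Build the same string with the "".join() idiom.
    parts = []
    for _ in range(n):
        parts.append("x")
    return "".join(parts)

if __name__ == "__main__":
    # Report the minimum of several repeats, per MAL's advice upthread.
    for fn in (concat_plus, concat_join):
        t = min(timeit.repeat(fn, number=200, repeat=3))
        print("%s: %.4fs" % (fn.__name__, t))
```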
Re: [Python-Dev] PATCH submitted: Speed up + for string concatenation, now as fast as "".join(x) idiom
On 5 Oct 2006, at 20:28, Gregory P. Smith wrote:
>> I've never liked the "".join([]) idiom for string concatenation; in my
>> opinion it violates the principles "Beautiful is better than ugly." and
>> "There should be one-- and preferably only one --obvious way to do it.".
>> (And perhaps several others.)  To that end I've submitted patch #1569040
>> to SourceForge:
>>
>> http://sourceforge.net/tracker/index.php?func=detail&aid=1569040&group_id=5470&atid=305470
>> This patch speeds up using + for string concatenation.
>
> yay! i'm glad to see this. i hate the "".join syntax.

Hear, hear.  Being able to write what you mean and have the language
get decent performance nonetheless seems to me to be a "good thing".

> have you run any generic benchmarks such as pystone to get a better
> idea of what the net effect on "typical" python code is?

Yeah, "real world" performance testing is always important with
anything that uses lazy evaluation.  If you get to control if and when
the computation actually happens, you have even more scope than usual
for getting the benchmark answer you want to see!

Cheers,
Nicko
Re: [Python-Dev] PATCH submitted: Speed up + for string concatenation, now as fast as "".join(x) idiom
"Gregory P. Smith" <[EMAIL PROTECTED]> wrote:
> > I've never liked the "".join([]) idiom for string concatenation; in my
> > opinion it violates the principles "Beautiful is better than ugly." and
> > "There should be one-- and preferably only one --obvious way to do it.".
> > (And perhaps several others.)  To that end I've submitted patch #1569040
> > to SourceForge:
> >
> > http://sourceforge.net/tracker/index.php?func=detail&aid=1569040&group_id=5470&atid=305470
> > This patch speeds up using + for string concatenation.
>
> yay! i'm glad to see this. i hate the "".join syntax. i still write
> that as string.join() because that's at least readable. it also fixes
> the python idiom for fast string concatenation as intended; anyone
> who's ever written code that builds a large string value by pushing
> substrings into a list only to call join later should agree.
>
>     mystr = "prefix"
>     while bla:
>         #...
>         mystr += moredata

Regardless of "nicer to read", I would just point out that Guido has
stated that Python will not have strings implemented as trees.

Also, Python 3.x will have a data type called 'bytes', which will be
the default return of file.read() (when files are opened as binary),
and which uses an over-allocation strategy like lists to get relatively
fast concatenation (on the order of lst1 += lst2).

- Josiah
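(The list-style over-allocation Josiah mentions can be sketched in a
few lines; the growth formula below is my approximation modeled on
CPython's list resizing, not the actual bytes-object code:)

```python
def overallocate(n):
    # Mildly geometric growth, modeled on CPython's list_resize():
    # requesting room for n items reserves a bit of extra headroom,
    # so repeated appends reallocate rarely and each append is
    # amortized O(1).
    return n + (n >> 3) + (3 if n < 9 else 6)

def build(chunks):
    # bytearray already uses this kind of strategy internally, which
    # is why += into a bytearray stays cheap as the buffer grows.
    buf = bytearray()
    for chunk in chunks:
        buf += chunk
    return bytes(buf)
```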
Re: [Python-Dev] PATCH submitted: Speed up + for string concatenation, now as fast as "".join(x) idiom
> I've never liked the "".join([]) idiom for string concatenation; in my
> opinion it violates the principles "Beautiful is better than ugly." and
> "There should be one-- and preferably only one --obvious way to do it.".
> (And perhaps several others.)  To that end I've submitted patch #1569040
> to SourceForge:
>
> http://sourceforge.net/tracker/index.php?func=detail&aid=1569040&group_id=5470&atid=305470
> This patch speeds up using + for string concatenation.

yay! i'm glad to see this. i hate the "".join syntax. i still write it
as string.join() because that's at least readable. it also fixes the
python idiom for fast string concatenation as intended; anyone who's
ever written code that builds a large string value by pushing
substrings into a list only to call join later should agree.

    mystr = "prefix"
    while bla:
        #...
        mystr += moredata

is much nicer to read than

    mystr = "prefix"
    strParts = [mystr]
    while bla:
        #...
        strParts.append(moredata)
    mystr = "".join(strParts)

have you run any generic benchmarks such as pystone to get a better
idea of what the net effect on "typical" python code is?
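(A back-of-the-envelope cost model, mine rather than from the thread,
of why the join idiom became the recommended one in the first place:
naive immutable += copies the whole accumulated string on every step,
so total copying grows quadratically, while join makes one final pass:)

```python
def chunks_copied_by_plus(n, chunk=1):
    # Each += allocates a fresh string and copies everything so far,
    # so the total work is chunk * (1 + 2 + ... + n): O(n**2).
    total = size = 0
    for _ in range(n):
        size += chunk
        total += size
    return total

def chunks_copied_by_join(n, chunk=1):
    # "".join() measures, allocates once, and copies each chunk once.
    return n * chunk
```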
[Python-Dev] PATCH submitted: Speed up + for string concatenation, now as fast as "".join(x) idiom
I've never liked the "".join([]) idiom for string concatenation; in my
opinion it violates the principles "Beautiful is better than ugly." and
"There should be one-- and preferably only one --obvious way to do
it.".  (And perhaps several others.)  To that end I've submitted patch
#1569040 to SourceForge:

http://sourceforge.net/tracker/index.php?func=detail&aid=1569040&group_id=5470&atid=305470

This patch speeds up using + for string concatenation.  It's been in
discussion on c.l.p for about a week, here:

http://groups.google.com/group/comp.lang.python/browse_frm/thread/b8a8f20bc3c81bcf

I'm not a Python guru, and my initial benchmark had many mistakes.
With help from the community, correct benchmarks emerged: + for string
concatenation is now roughly as fast as the usual "".join() idiom when
appending.  (It appears to be *much* faster for prepending.)  The
patched Python passes all the tests in regrtest.py for which I have
source; I didn't install external packages such as bsddb and sqlite3.

My approach was to add a "string concatenation" object; I have since
learned this is also called a "rope".  Internally, a
PyStringConcatationObject is exactly like a PyStringObject but with a
few extra members taking an additional thirty-six bytes of storage.
When you add two PyStringObjects together, string_concat() returns a
PyStringConcatationObject which contains references to the two strings.
Concatenating any mixture of PyStringObjects and
PyStringConcatationObjects works similarly, though there are some
internal optimizations.

These changes are almost entirely contained within
Objects/stringobject.c and Include/stringobject.h.  There is one major
externally-visible change in this patch: PyStringObject.ob_sval is no
longer a char[1] array, but a char *.  Happily, this only requires a
recompile, because the CPython source is *marvelously* consistent about
using the macro PyString_AS_STRING().  (One hopes extension authors are
as consistent.)
I only had to touch two other files (Python/ceval.c and
Objects/codeobject.c), and those were one-line changes.  There is one
remaining place that still needs fixing: the self-described "hack" in
Mac/Modules/MacOS.c.  Fixing that is beyond my pay grade.

I changed the representation of ob_sval for two reasons: first, it is
initially NULL for a string concatenation object, and second, because
it may point to separately-allocated memory.  That's where the speedup
came from--it doesn't render the string until someone asks for the
string's value.  It is telling to see my new implementation of
PyString_AS_STRING, as follows (casts and extra parentheses removed for
legibility):

    #define PyString_AS_STRING(x) \
        ( x->ob_sval ? x->ob_sval : PyString_AsString(x) )

This adds a layer of indirection for the string and a branch, adding a
tiny (but measurable) slowdown to the general case.  Again, because the
changes to PyStringObject are hidden by this macro, external users of
these objects don't notice the difference.

The patch is posted, and I have donned the thickest skin I have handy.
I look forward to your feedback.

Cheers,

/larry/
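(To make the scheme concrete without reading the C, here is a toy
Python rendering of the same idea; this is my sketch, not the patch
itself, which does all of this inside Objects/stringobject.c.  '+'
merely records the two operands in O(1), and the flat string is built,
then cached, only when the value is first requested:)

```python
class LazyConcat:
    """Illustrative rope: '+' is O(1); rendering is deferred and cached."""

    def __init__(self, left, right):
        self.left = left
        self.right = right
        self.rendered = None   # filled in on first access

    def __add__(self, other):
        # No copying here: just grow the tree by one node.
        return LazyConcat(self, other)

    def __str__(self):
        if self.rendered is None:
            # Walk the tree iteratively, collecting leaf strings
            # left-to-right, then join them in a single pass.
            parts, stack = [], [self]
            while stack:
                node = stack.pop()
                if isinstance(node, LazyConcat):
                    stack.append(node.right)
                    stack.append(node.left)
                else:
                    parts.append(node)
            self.rendered = "".join(parts)
        return self.rendered

# Usage: nothing is concatenated until str() is called.
s = LazyConcat("Hello, ", "world") + "!"
```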