Re: [Python-Dev] Revising RE docs
Guido van Rossum wrote:

> I also notice that _compile() is needlessly written as a varargs function -- all its uses pass it exactly two arguments.

that's because the function uses [1] the argument tuple as the cache key, and I wanted to make the cache hit path as fast as possible. (but that was back in the 1.6 days; things have changed a lot since then, so maybe someone should benchmark some alternative ways to do this under 2.4...)

/F

1) well, it used to use it. the code was modified slightly in 2.3 to prepend the type of the pattern string; not sure why, since 8-bit and unicode patterns should be equivalent.

___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] Revising RE docs
Guido wrote:

> > > They *are* cached and there is no cost to using the functions instead of the methods unless you have so many regexps in your program that the cache is cleared (the limit is 100).
> >
> > Sure there is; the cost of looking them up in the cache.
> ...
> > So in this (highly artificial toy) application it's about 7.5/2.5 = 3 times faster to use the methods instead of the functions.
>
> Yeah, but the cost is a constant -- it is not related to the cost of compiling the re.

True.

> (You should've shown how much it cost if you included the compilation in each search.)

Why should I have? I don't dispute that the caching helps -- I bet it helps a *lot*. I was just observing that it's not true that there's no cost to using the functions instead of the methods.

> I haven't looked into this, but I bet the overhead you're measuring is actually the extra Python function call, not the cache lookup itself.

Hmm, that's possible. But what matters in practice is how big the cost of using re.search(...,...) rather than compiling once and using the RE object's search method is, not where it comes from.

-- g
Re: [Python-Dev] Revising RE docs
Am I the only one who is getting mails from iextream at naver.com whenever I post to python-dev, btw?

My Korean (?) isn't that good, so I'm not sure what they want...

/F
Re: [Python-Dev] Revising RE docs
[Fredrik Lundh]
> Am I the only one who is getting mails from iextream at naver.com whenever I post to python-dev, btw?
>
> My Korean (?) isn't that good, so I'm not sure what they want...

Only thing I've seen from them is one post in the archives, on June 13:

    http://mail.python.org/pipermail/python-dev/2005-June/054204.html

Must be a secret admirer.
Re: [Python-Dev] Revising RE docs
On 9/2/05, Gareth McCaughan [EMAIL PROTECTED] wrote:
> On Thursday 2005-09-01 18:09, Guido van Rossum wrote:
>
> > They *are* cached and there is no cost to using the functions instead of the methods unless you have so many regexps in your program that the cache is cleared (the limit is 100).
>
> Sure there is; the cost of looking them up in the cache.
>
>     >>> import re,timeit
>     >>> timeit.re=re
>     >>> timeit.Timer('re.search(r"(\d*).*(\d*)", "abc123def456")').timeit(100)
>     7.6042091846466064
>     >>> timeit.r = re.compile(r"(\d*).*(\d*)")
>     >>> timeit.Timer('r.search("abc123def456")').timeit(100)
>     2.6358869075775146
>     >>> timeit.Timer().timeit(100)
>     0.091850996017456055
>
> So in this (highly artificial toy) application it's about 7.5/2.5 = 3 times faster to use the methods instead of the functions.

Yeah, but the cost is a constant -- it is not related to the cost of compiling the re. (You should've shown how much it cost if you included the compilation in each search.)

I haven't looked into this, but I bet the overhead you're measuring is actually the extra Python function call, not the cache lookup itself.

I also notice that _compile() is needlessly written as a varargs function -- all its uses pass it exactly two arguments.

--
--Guido van Rossum (home page: http://www.python.org/~guido/)
Re: [Python-Dev] Revising RE docs
On Thursday 2005-09-01 18:09, Guido van Rossum wrote:

> They *are* cached and there is no cost to using the functions instead of the methods unless you have so many regexps in your program that the cache is cleared (the limit is 100).

Sure there is; the cost of looking them up in the cache.

    >>> import re,timeit
    >>> timeit.re=re
    >>> timeit.Timer('re.search(r"(\d*).*(\d*)", "abc123def456")').timeit(100)
    7.6042091846466064
    >>> timeit.r = re.compile(r"(\d*).*(\d*)")
    >>> timeit.Timer('r.search("abc123def456")').timeit(100)
    2.6358869075775146
    >>> timeit.Timer().timeit(100)
    0.091850996017456055

So in this (highly artificial toy) application it's about 7.5/2.5 = 3 times faster to use the methods instead of the functions.

-- g
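For anyone who wants to repeat this comparison on a current Python, here is a minimal self-contained sketch. It is not the session above: the names `PATTERN`, `TEXT`, `via_function`, `via_method`, and the iteration count `N` are chosen for illustration, and the timings will differ by machine and Python version, so no expected numbers are given.

```python
import re
import timeit

PATTERN = r"(\d*).*(\d*)"   # same toy pattern as in the message above
TEXT = "abc123def456"
N = 50_000                  # assumed iteration count, chosen to run quickly

compiled = re.compile(PATTERN)


def via_function():
    # Module-level function: looks the pattern up in re's cache on each call.
    return re.search(PATTERN, TEXT)


def via_method():
    # Method on the precompiled object: no cache lookup involved.
    return compiled.search(TEXT)


t_function = timeit.timeit(via_function, number=N)
t_method = timeit.timeit(via_method, number=N)

print("re.search(pattern, text): %.4f s" % t_function)
print("compiled.search(text):    %.4f s" % t_method)
```

Both variants are wrapped in a Python function here, so the wrapper's call overhead is the same on both sides and only the difference between the two timings is meaningful.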
Re: [Python-Dev] Revising RE docs
On 8/31/05, Stephen J. Turnbull [EMAIL PROTECTED] wrote:
> >>>>> "Michael" == Michael Chermside [EMAIL PROTECTED] writes:
>
>     Michael> (2) is what we have today, but I would prefer (1) to
>     Michael> gently encourage people to use the precompiled objects
>     Michael> (which are distinctly faster when re-used).
>
> Didn't Fredrik Lundh strongly imply that implicitly compiled objects are cached? That's a pretty big speed up right there.

What happened to RTSL? (Read the Source, Luke :)

They *are* cached and there is no cost to using the functions instead of the methods unless you have so many regexps in your program that the cache is cleared (the limit is 100).

--
--Guido van Rossum (home page: http://www.python.org/~guido/)
Re: [Python-Dev] Revising RE docs
Stephen J. Turnbull wrote:

> But you could have string objects (or a derivative) grow a compiled_regexp attribute internally.

That would make the core dependent on the re module, which I think would be a bad idea.

Personally I like the way the compilation step is made at least somewhat explicit. Regular expressions are not strings; a string is just one way of representing a regular expression. There could potentially be other representations that compile to the same re object.

Greg
Re: [Python-Dev] Revising RE docs
>>>>> "Greg" == Greg Ewing [EMAIL PROTECTED] writes:

    Greg> Stephen J. Turnbull wrote:

    >> But you could have string objects (or a derivative) grow a
    >> compiled_regexp attribute internally.

    Greg> That would make the core dependent on the re module, which I
    Greg> think would be a bad idea.

Probably.

    Greg> Personally I like the way the compilation step is made at
    Greg> least somewhat explicit. Regular expressions are not
    Greg> strings; a string is just one way of representing a regular
    Greg> expression. There could potentially be other representations
    Greg> that compile to the same re object.

I guess I agree, but I would put the emphasis elsewhere. Something like, think of the call to compile() as a declaration that this string (or other representation) represents a regular expression. The actual compilation is an accidental side effect: it could be postponed to the first call of .match() or .search().

So I guess I would prefer a nomenclature like

    r = re.RegExp (string)

over

    r = re.compile (string)

Not a big deal though.

--
School of Systems and Information Engineering http://turnbull.sk.tsukuba.ac.jp
University of Tsukuba Tennodai 1-1-1 Tsukuba 305-8573 JAPAN
Ask not how you can do free software business; ask what your business can do for free software.
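The "declaration now, compilation on first use" idea can be sketched as a small wrapper class. This is purely hypothetical: `LazyRegExp` and its attributes are invented for this illustration, no such class exists in the re module, and only `match()` and `search()` are forwarded.

```python
import re


class LazyRegExp:
    """Hypothetical wrapper: declares a regex now, compiles it on first use."""

    def __init__(self, source, flags=0):
        self.source = source
        self.flags = flags
        self._compiled = None   # nothing is compiled at declaration time

    def _pattern(self):
        # Compile on the first call only; later calls reuse the result.
        if self._compiled is None:
            self._compiled = re.compile(self.source, self.flags)
        return self._compiled

    def match(self, string):
        return self._pattern().match(string)

    def search(self, string):
        return self._pattern().search(string)


r = LazyRegExp(r"\d+")          # declaration only
m = r.search("abc123def456")    # compilation happens here
```

One consequence of postponing compilation this way is that a syntax error in the pattern surfaces at the first `match()` or `search()` call rather than at the point of declaration.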
[Python-Dev] Revising RE docs (was: partition() (was: Remove str.find in 3.0?))
Barry Warsaw writes:
> Although it's mildly annoying that the docs describe the compiled method names in terms of the uncompiled functions. I always find myself looking up the regexp object's API only to be shuffled off to the module's API and then having to do the argument remapping myself.

An excellent point. Obviously, EITHER (1) the module functions ought to be documented by reference to the RE object methods, or vice versa: (2) document the RE object methods by reference to the module functions.

(2) is what we have today, but I would prefer (1) to gently encourage people to use the precompiled objects (which are distinctly faster when re-used).

Does anyone else think we ought to swap that around in the documentation? I'm not trying to assign more work to Fred... but if there were a python-dev consensus that this would be desirable, then perhaps someone would be encouraged to supply a patch.

-- Michael Chermside
Re: [Python-Dev] Revising RE docs (was: partition() (was: Remove str.find in 3.0?))
On Tuesday 30 August 2005 17:35, Michael Chermside wrote:
> An excellent point. Obviously, EITHER (1) the module functions ought to be documented by reference to the RE object methods, or vice versa: (2) document the RE object methods by reference to the module functions.

Agreed. I think the current arrangement is primarily a historical accident more than anything else, but I didn't write that section, so could be wrong.

> Does anyone else think we ought to swap that around in the documentation? I'm not trying to assign more work to Fred... but if there were a python-dev consensus that this would be desirable, then perhaps someone would be encouraged to supply a patch.

I'd rather see it reversed from what it is as well. While I don't have the time myself (and don't consider it a critical issue), I certainly won't revert a patch to make the change without good reason. :-)

  -Fred

--
Fred L. Drake, Jr. fdrake at acm.org
Re: [Python-Dev] Revising RE docs (was: partition() (was: Remove str.find in 3.0?))
Fred L. Drake, Jr. [EMAIL PROTECTED] wrote in message news:[EMAIL PROTECTED]
> I'd rather see it reversed from what it is as well. While I don't have the time myself (and don't consider it a critical issue), I certainly won't revert a patch to make the change without good reason. :-)

Do you mean 'not reject' rather than 'not revert'?

Terry J. Reedy