Re: [Python-Dev] Revising RE docs

2005-09-05 Thread Fredrik Lundh
Guido van Rossum wrote:

> I also notice that _compile() is needlessly written as a varargs
> function -- all its uses pass it exactly two arguments.

that's because the function uses [1] the argument tuple as the cache key,
and I wanted to make the cache hit path as fast as possible.

(but that was back in the 1.6 days; things have changed a lot since then, so
maybe someone should benchmark some alternative ways to do this under
2.4...)

/F

1) well, it used to use it.  the code was modified slightly in 2.3 to prepend
the type of the pattern string; not sure why, since 8-bit and unicode patterns
should be equivalent. 



___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Revising RE docs

2005-09-05 Thread Gareth McCaughan
Guido wrote:

> > > They *are* cached and there is no cost to using the functions instead
> > > of the methods unless you have so many regexps in your program that
> > > the cache is cleared (the limit is 100).
> >
> > Sure there is; the cost of looking them up in the cache.
...
> > So in this (highly artificial toy) application it's about 7.5/2.5 = 3 times
> > faster to use the methods instead of the functions.
>
> Yeah, but the cost is a constant -- it is not related to the cost of
> compiling the re.

True.

> (You should've shown how much it cost if you
> included the compilation in each search.)

Why should I have? I don't dispute that the caching helps -- I bet it
helps a *lot*. I was just observing that it's not true that there's
no cost to using the functions instead of the methods.

> I haven't looked into this, but I bet the overhead you're measuring is
> actually the extra Python function call, not the cache lookup itself.

Hmm, that's possible. But what matters in practice is how big
the cost of using re.search(...,...) rather than compiling
once and using the RE object's search method is, not where it
comes from.

-- 
g



Re: [Python-Dev] Revising RE docs

2005-09-05 Thread Fredrik Lundh
Am I the only one who is getting mails from iextream at naver.com
whenever I post to python-dev, btw?

My Korean (?) isn't that good, so I'm not sure what they want...

/F 





Re: [Python-Dev] Revising RE docs

2005-09-05 Thread Tim Peters
[Fredrik Lundh]
> Am I the only one who is getting mails from iextream at naver.com
> whenever I post to python-dev, btw?
>
> My Korean (?) isn't that good, so I'm not sure what they want...

Only thing I've seen from them is one post in the archives, on June 13:

http://mail.python.org/pipermail/python-dev/2005-June/054204.html

Must be a secret admirer.


Re: [Python-Dev] Revising RE docs

2005-09-03 Thread Guido van Rossum
On 9/2/05, Gareth McCaughan [EMAIL PROTECTED] wrote:
> On Thursday 2005-09-01 18:09, Guido van Rossum wrote:
>
> > They *are* cached and there is no cost to using the functions instead
> > of the methods unless you have so many regexps in your program that
> > the cache is cleared (the limit is 100).
>
> Sure there is; the cost of looking them up in the cache.
>
>     >>> import re,timeit
>
>     >>> timeit.re=re
>     >>> timeit.Timer("""re.search(r"(\d*).*(\d*)", "abc123def456")""").timeit(100)
>     7.6042091846466064
>
>     >>> timeit.r = re.compile(r"(\d*).*(\d*)")
>     >>> timeit.Timer("""r.search("abc123def456")""").timeit(100)
>     2.6358869075775146
>
>     >>> timeit.Timer().timeit(100)
>     0.091850996017456055
>
> So in this (highly artificial toy) application it's about 7.5/2.5 = 3 times
> faster to use the methods instead of the functions.

Yeah, but the cost is a constant -- it is not related to the cost of
compiling the re. (You should've shown how much it cost if you
included the compilation in each search.)

I haven't looked into this, but I bet the overhead you're measuring is
actually the extra Python function call, not the cache lookup itself.
I also notice that _compile() is needlessly written as a varargs
function -- all its uses pass it exactly two arguments.
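
A rough sketch of the comparison asked for here, with compilation included
in each search. It assumes that calling re.purge() before each search is an
acceptable way to force a recompile; the function names and the iteration
count are made up for illustration, and absolute timings will vary by
machine and Python version.

```python
import re
import timeit

PATTERN = r"(\d*).*(\d*)"
TEXT = "abc123def456"
compiled = re.compile(PATTERN)


def search_uncached():
    # re.purge() empties the module's cache, forcing a fresh compile each call.
    re.purge()
    return re.search(PATTERN, TEXT)


def search_cached_function():
    # Module-level function: pays for the cache lookup on every call.
    return re.search(PATTERN, TEXT)


def search_method():
    # Precompiled object: no lookup, no compile.
    return compiled.search(TEXT)


def measure(number=2000):
    """Return the seconds taken by each variant for `number` calls."""
    return {
        func.__name__: timeit.timeit(func, number=number)
        for func in (search_uncached, search_cached_function, search_method)
    }


if __name__ == "__main__":
    for name, seconds in measure().items():
        print(f"{name}: {seconds:.4f}s")
```

All three variants go through one extra Python-level wrapper, so the
differences between them reflect compile and lookup cost rather than the
wrapper call itself.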

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)


Re: [Python-Dev] Revising RE docs

2005-09-02 Thread Gareth McCaughan
On Thursday 2005-09-01 18:09, Guido van Rossum wrote:

> They *are* cached and there is no cost to using the functions instead
> of the methods unless you have so many regexps in your program that
> the cache is cleared (the limit is 100).

Sure there is; the cost of looking them up in the cache.

    >>> import re,timeit

    >>> timeit.re=re
    >>> timeit.Timer("""re.search(r"(\d*).*(\d*)", "abc123def456")""").timeit(100)
    7.6042091846466064

    >>> timeit.r = re.compile(r"(\d*).*(\d*)")
    >>> timeit.Timer("""r.search("abc123def456")""").timeit(100)
    2.6358869075775146

    >>> timeit.Timer().timeit(100)
    0.091850996017456055

So in this (highly artificial toy) application it's about 7.5/2.5 = 3 times
faster to use the methods instead of the functions.
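
The arithmetic can be spelled out, optionally subtracting the empty-Timer
baseline first. This only reworks the three timings reported above; the
variable names are illustrative.

```python
function_time = 7.6042091846466064   # re.search(...) via the module function
method_time = 2.6358869075775146     # r.search(...) on the precompiled object
baseline = 0.091850996017456055      # empty Timer(): loop overhead only

# Ratio of the raw timings.
raw_ratio = function_time / method_time

# Ratio after removing the loop overhead common to both measurements.
net_ratio = (function_time - baseline) / (method_time - baseline)

print(round(raw_ratio, 2), round(net_ratio, 2))
```

Both ratios land a little under 3, so the rounded "about 3 times" figure
holds with or without the baseline correction.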

-- 
g



Re: [Python-Dev] Revising RE docs

2005-09-01 Thread Guido van Rossum
On 8/31/05, Stephen J. Turnbull [EMAIL PROTECTED] wrote:
> >>>>> "Michael" == Michael Chermside [EMAIL PROTECTED] writes:
>
>     Michael> (2) is what we have today, but I would prefer (1) to
>     Michael> gently encourage people to use the precompiled objects
>     Michael> (which are distinctly faster when re-used).
>
> Didn't Fredrik Lundh strongly imply that implicitly compiled objects
> are cached?  That's a pretty big speed up right there.

What happened to RTSL? (Read the Source, Luke :)

They *are* cached and there is no cost to using the functions instead
of the methods unless you have so many regexps in your program that
the cache is cleared (the limit is 100).

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)


Re: [Python-Dev] Revising RE docs

2005-08-31 Thread Greg Ewing
Stephen J. Turnbull wrote:
> But you could have string objects (or a derivative) grow a
> compiled_regexp attribute internally.

That would make the core dependent on the re module,
which I think would be a bad idea.

Personally I like the way the compilation step is
made at least somewhat explicit. Regular expressions
are not strings; a string is just one way of representing
a regular expression. There could potentially be other
representations that compile to the same re object.

Greg


Re: [Python-Dev] Revising RE docs

2005-08-31 Thread Stephen J. Turnbull
>>>>> "Greg" == Greg Ewing [EMAIL PROTECTED] writes:

    Greg> Stephen J. Turnbull wrote:

    >> But you could have string objects (or a derivative) grow a
    >> compiled_regexp attribute internally.

    Greg> That would make the core dependent on the re module, which I
    Greg> think would be a bad idea.

Probably.

    Greg> Personally I like the way the compilation step is made at
    Greg> least somewhat explicit. Regular expressions are not
    Greg> strings; a string is just one way of representing a regular
    Greg> expression. There could potentially be other representations
    Greg> that compile to the same re object.

I guess I agree, but I would put the emphasis elsewhere.  Something
like, think of the call to compile() as a declaration that this string
(or other representation) represents a regular expression.  The actual
compilation is an accidental side effect: it could be postponed to the
first call of .match() or .search().

So I guess I would prefer a nomenclature like

r = re.RegExp (string)

over

r = re.compile (string)

Not a big deal though.
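
A sketch of what such a declaration-style object might look like, with the
compilation postponed to the first .match() or .search(). RegExp here is a
hypothetical wrapper class, not anything in the re module, and it handles
only a string representation.

```python
import re


class RegExp:
    """Hypothetical wrapper: declares a regex now, compiles it on first use."""

    def __init__(self, pattern, flags=0):
        self.pattern = pattern
        self.flags = flags
        self._compiled = None  # nothing has been compiled yet

    def _get(self):
        # Compile lazily, once, the first time a matching method is called.
        if self._compiled is None:
            self._compiled = re.compile(self.pattern, self.flags)
        return self._compiled

    def match(self, string):
        return self._get().match(string)

    def search(self, string):
        return self._get().search(string)
```

One consequence of deferring the work: a malformed pattern would raise its
error at the first match attempt rather than at the declaration.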

-- 
School of Systems and Information Engineering http://turnbull.sk.tsukuba.ac.jp
University of Tsukuba    Tennodai 1-1-1 Tsukuba 305-8573 JAPAN
   Ask not how you can do free software business;
  ask what your business can do for free software.


[Python-Dev] Revising RE docs (was: partition() (was: Remove str.find in 3.0?))

2005-08-30 Thread Michael Chermside
Barry Warsaw writes:
> Although it's mildly annoying that the docs describe the compiled method
> names in terms of the uncompiled functions.  I always find myself
> looking up the regexp object's API only to be shuffled off to the
> module's API and then having to do the argument remapping myself.

An excellent point. Obviously, EITHER (1) the module functions ought to
be documented by reference to the RE object methods, or vice versa:
(2) document the RE object methods by reference to the module functions.

(2) is what we have today, but I would prefer (1) to gently encourage
people to use the precompiled objects (which are distinctly faster when
re-used).
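
To make the "argument remapping" concrete, here is a small illustration of
how the same search is spelled both ways. The pattern and the sample text
are arbitrary.

```python
import re

text = "abc123def456"

# Module function: the pattern is the first argument.
m1 = re.search(r"\d+", text)

# Method on a compiled object: the pattern is implied, so arguments shift left.
digits = re.compile(r"\d+")
m2 = digits.search(text)

# The method also takes an optional start position, which the module
# function has no parameter for.
m3 = digits.search(text, 6)

print(m1.group(), m2.group(), m3.group())
```

Flags move as well: the function takes them as a trailing argument on each
call, while the compiled object fixes them once in re.compile().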

Does anyone else think we ought to swap that around in the documentation?
I'm not trying to assign more work to Fred... but if there were a
python-dev consensus that this would be desirable, then perhaps someone
would be encouraged to supply a patch.

-- Michael Chermside



Re: [Python-Dev] Revising RE docs (was: partition() (was: Remove str.find in 3.0?))

2005-08-30 Thread Fred L. Drake, Jr.
On Tuesday 30 August 2005 17:35, Michael Chermside wrote:
> An excellent point. Obviously, EITHER (1) the module functions ought to
> be documented by reference to the RE object methods, or vice versa:
> (2) document the RE object methods by reference to the module functions.

Agreed.  I think the current arrangement is primarily a historical accident 
more than anything else, but I didn't write that section, so could be wrong.

> Does anyone else think we ought to swap that around in the documentation?
> I'm not trying to assign more work to Fred... but if there were a
> python-dev consensus that this would be desirable, then perhaps someone
> would be encouraged to supply a patch.

I'd rather see it reversed from what it is as well.  While I don't have the 
time myself (and don't consider it a critical issue), I certainly won't 
revert a patch to make the change without good reason.  :-)


  -Fred

-- 
Fred L. Drake, Jr.   fdrake at acm.org


Re: [Python-Dev] Revising RE docs (was: partition() (was: Remove str.find in 3.0?))

2005-08-30 Thread Terry Reedy

Fred L. Drake, Jr. [EMAIL PROTECTED] wrote in message 
news:[EMAIL PROTECTED]
> I'd rather see it reversed from what it is as well.  While I don't have
> the time myself (and don't consider it a critical issue), I certainly
> won't revert a patch to make the change without good reason.  :-)

Do you mean 'not reject' rather than 'not revert'?

Terry J. Reedy


