RE: parrot rx engine

2002-02-04 Thread Hong Zhang
Agh, if you go and do that, you must then be sure that rx is capable of optimizing /a/i and /[aA]/ in the same way. What I mean is that Perl's current regex engine is able to use /abc/i as a constant in a string, while it cannot do the same for /[Aa][Bb][Cc]/. Why? Because in the first
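
A rough Python sketch of the point above, not how Perl or the Parrot rx engine does it: an optimizer that only recognizes literal nodes gets a fixed string to search for from /abc/i but nothing from /[Aa][Bb][Cc]/, unless it also learns to collapse two-character case pairs. The node types `Literal` and `CharClass` and the function `required_literal` are hypothetical names made up for illustration, and the folding is simple lowercasing, which is only adequate for ASCII.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Literal:
    text: str
    ignore_case: bool = False


@dataclass(frozen=True)
class CharClass:
    chars: frozenset


def required_literal(nodes):
    """Return (text, ignore_case) if the whole pattern is one fixed string, else None."""
    if not nodes:
        return None
    # Easy case: the pattern is already a single literal such as /abc/i.
    if len(nodes) == 1 and isinstance(nodes[0], Literal):
        return nodes[0].text, nodes[0].ignore_case
    # Harder case: a run of classes like [Aa][Bb][Cc]. Collapse it only when
    # every class is exactly an upper/lower pair of the same letter.
    folded = []
    for node in nodes:
        if isinstance(node, CharClass) and len(node.chars) == 2:
            a, b = sorted(node.chars)
            if a != b and a.lower() == b.lower():
                folded.append(a.lower())
                continue
        return None
    return "".join(folded), True


def may_match(haystack, required):
    """Cheap pre-check: can the haystack contain the required string at all?"""
    text, ignore_case = required
    if ignore_case:
        return text.lower() in haystack.lower()
    return text in haystack
```

With the collapsing step, both spellings of the pattern yield the same ("abc", True) hint; without it, only the /abc/i form would.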

RE: parrot rx engine

2002-02-02 Thread Jeff 'japhy' Pinyan
On Jan 31, Hong Zhang said: But as you say, case folding is expensive. And with this approach you are going to case-fold every string that is matched against an rx that has some part of it that is case-insensitive. That is correct in general. But the regex compiler can be smarter than that. For

Re: parrot rx engine

2002-01-31 Thread Peter Haworth
On Wed, 30 Jan 2002 17:45:58 +, Graham Barr wrote:
On Wed, Jan 30, 2002 at 09:32:49AM -0800, Brent Dax wrote:
# rx_setprops P0, i, 2
# branch $start0
# $advance:
# rx_advance P0, $fail
# $start0:
#

RE: parrot rx engine

2002-01-31 Thread Brent Dax
Peter Haworth:
# On Wed, 30 Jan 2002 17:45:58 +, Graham Barr wrote:
# On Wed, Jan 30, 2002 at 09:32:49AM -0800, Brent Dax wrote:
# # rx_setprops P0, i, 2
# # branch $start0
# # $advance:
# # rx_advance P0, $fail
# #

Re: parrot rx engine

2002-01-31 Thread Graham Barr
On Thu, Jan 31, 2002 at 08:54:21AM -0800, Brent Dax wrote:
Peter Haworth:
# On Wed, 30 Jan 2002 17:45:58 +, Graham Barr wrote:
# On Wed, Jan 30, 2002 at 09:32:49AM -0800, Brent Dax wrote:
# # rx_setprops P0, i, 2
# # branch $start0
# #

Re: parrot rx engine

2002-01-31 Thread Tim Bunce
On Thu, Jan 31, 2002 at 05:15:49PM +, Graham Barr wrote: Yes, I was assuming that. However what is to be gained by case folding the input string ? Because parts of an rx can be case-insensitive while other parts are case-sensitive, we will probably need two sorts of ops anyway (or a

RE: parrot rx engine

2002-01-31 Thread Hong Zhang
Because parts of an rx can be case-insensitive while other parts are case-sensitive, we will probably need two sorts of ops anyway (or a way to tell the op to be case-insensitive). And you will only be able to do the case folding when the whole rx is case-insensitive. I don't like your
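
To make the mixed-sensitivity concern concrete, here is a small Python sketch, offered as an illustration rather than as the design anyone in the thread proposed. It treats a pattern as a sequence of literal segments, each carrying its own case flag (the "way to tell the op to be case-insensitive" option). The name `match_segments_at` and the (literal, ignore_case) tuple layout are assumptions made up for this example, and lowercasing stands in for real case folding.

```python
def match_segments_at(text, pos, segments):
    """Match consecutive literal segments starting at pos.

    segments is a list of (literal, ignore_case) pairs.
    Returns the end position on success, or None on failure.
    """
    for literal, ignore_case in segments:
        chunk = text[pos:pos + len(literal)]
        if ignore_case:
            ok = chunk.lower() == literal.lower()
        else:
            ok = chunk == literal
        if not ok:
            return None
        pos += len(literal)
    return pos


# Roughly /(?i:ab)C/: the first part ignores case, the last part does not.
segments = [("ab", True), ("C", False)]
print(match_segments_at("ABC", 0, segments))  # matches
print(match_segments_at("abc", 0, segments))  # fails on the final "c"
```

Folding the whole input up front would erase the difference between "ABC" and "abc", which is why that shortcut is only safe when the entire rx is case-insensitive.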

Re: parrot rx engine

2002-01-31 Thread Graham Barr
On Thu, Jan 31, 2002 at 11:18:58AM -0800, Hong Zhang wrote: Because parts of an rx can be case-insensitive while other parts are case-sensitive, we will probably need two sorts of ops anyway (or a way to tell the op to be case-insensitive). And you will only be able to do the case

RE: parrot rx engine

2002-01-31 Thread Hong Zhang
But as you say, case folding is expensive. And with this approach you are going to case-fold every string that is matched against an rx that has some part of it that is case-insensitive. That is correct in general. But the regex compiler can be smarter than that. For example, rx should optimize

RE: parrot rx engine

2002-01-31 Thread Brent Dax
Tim Bunce: # On Thu, Jan 31, 2002 at 05:15:49PM +, Graham Barr wrote: # # Yes, I was assuming that. However what is to be gained by case # folding the input string ? # # Because parts of an rx can be case-insensitive while other parts # are case-sensitive, we will probably need two

RE: parrot rx engine

2002-01-31 Thread Ashley Winters
--- Brent Dax [EMAIL PROTECTED] wrote: Tim Bunce: # On Thu, Jan 31, 2002 at 05:15:49PM +, Graham Barr wrote: # # Especially as the perl6 rx engine will have to be able to # work directly on # non-trivial things like streams and generators and suchlike. I have a suggestion similar to

Re: parrot rx engine

2002-01-31 Thread Tim Bunce
On Thu, Jan 31, 2002 at 12:50:52PM -0800, Brent Dax wrote: Let me know if I'm brilliant, on crack, or both with this idea. I've no idea :-) Tim.

Re: parrot rx engine

2002-01-30 Thread Jonathan Scott Duff
On Wed, Jan 30, 2002 at 08:13:55AM -0800, Ashley Winters wrote: I think that's exactly what you should be doing! Neither parrot nor the rx engine should try to be a full compiler. The rx engine definitely should have opcodes in the virtual machine, but those opcodes should simply contain

Re: parrot rx engine

2002-01-30 Thread Melvin Smith
Basically, I see a black-box being built in the interests of speed. Voodoo array formats, bitmaps, and other such things to avoid actually spelling out what the regular expression is doing *in parrot code*. [snip] What I see is that rx_literal is a speed hack to avoid compiling this into parrot

RE: parrot rx engine

2002-01-30 Thread Angel Faus
Ashley Winters wrote: First, we set the rx engine to case-insensitive. Why is that bad? It's setting a runtime property for what should be compile-time unicode-character-kung-fu. Assuming your CPU knows what the gritty details of unicode in the first place just feels wrong, but I digress. I

RE: parrot rx engine

2002-01-30 Thread Brent Dax
Ashley Winters: # Who the hell am I? # I've been only a weblog-lurker till now. It's been a couple # years since # I last contributed to Perl5. I just read the latest Apocalypse and it # inspired me to get a parrot snapshot and look around. Welcome back to the land of the living. :^) # What's

Re: parrot rx engine

2002-01-30 Thread Simon Cozens
begin quote from Ashley Winters: I think that's exactly what you should be doing! Neither parrot nor the rx engine should try to be a full compiler. The rx engine definitely should have opcodes in the virtual machine, but those opcodes should simply contain state-machine/backtracking info,

Re: parrot rx engine

2002-01-30 Thread Graham Barr
On Wed, Jan 30, 2002 at 09:32:49AM -0800, Brent Dax wrote:
# rx_setprops P0, i, 2
# branch $start0
# $advance:
# rx_advance P0, $fail
# $start0:
# rx_literal P0, a, $advance
#
# First, we set the rx engine to
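
For readers unfamiliar with the quoted ops, the control flow appears to be: set a case-insensitive property, try to match the literal at the current start position, and on failure advance the start position by one and retry, giving up when the input is exhausted. The Python below is a sketch of that reading only. It is not the Parrot implementation, the function name `match_literal_anywhere` is invented here, and the mapping of each comment to an op is an assumption.

```python
def match_literal_anywhere(text, literal, ignore_case=False):
    """Return the first start offset where literal matches, or None."""
    # Assumed analogue of rx_setprops with the "i" property: compare folded copies.
    # Offsets stay valid only if lowercasing keeps the length (true for ASCII).
    if ignore_case:
        text, literal = text.lower(), literal.lower()

    start = 0
    # Assumed analogue of the $start0 / $advance loop.
    while start + len(literal) <= len(text):
        # rx_literal: try the literal at the current start position.
        if text.startswith(literal, start):
            return start
        # rx_advance: move the start position forward one character.
        start += 1
    # Ran out of input: the $fail branch.
    return None


print(match_literal_anywhere("xxAyy", "a", ignore_case=True))
print(match_literal_anywhere("xxAyy", "a"))
```

The sketch folds the input once before scanning, which is the simplest choice but also the one several replies in this thread object to on cost grounds.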

Re: parrot rx engine

2002-01-30 Thread Bryan C. Warnock
On Wednesday 30 January 2002 12:32, Brent Dax wrote: # Mostly, I'd like to hear how either Unicode character-ranges aren't # deterministic at compile-time (I doubt that) or how crippling to One word: locale. Not that locales couldn't provide pre-compiled character classes. -- Bryan C.

Re: parrot rx engine

2002-01-30 Thread Bryan C. Warnock
On Wednesday 30 January 2002 11:13, Ashley Winters wrote: First, we set the rx engine to case-insensitive. Why is that bad? It's setting a runtime property for what should be compile-time {snip} Now, the current CVS rx engine is/would do this at runtime. We're also currently a compiler

Re: parrot rx engine

2002-01-30 Thread Steve Fink
On Wed, Jan 30, 2002 at 08:37:30PM -0500, Bryan C. Warnock wrote: But if you know they're going to be twenty times slower, why are you doing it? Because we know / think / hope / pray / have been making sacrifices to Tangential note: current benchmarking indicates that we're doing a lot

Re: parrot rx engine

2002-01-30 Thread Dan Sugalski
At 6:28 PM -0800 1/30/02, Steve Fink wrote: I'm sure in Apoc 5 Larry's going to go way beyond that and embed full parsers, not just regularish language matchers, but the above is easier to grasp. Odds are, yes. And don't be surprised if the RE engine's required to return data structures as

Re: parrot rx engine

2002-01-30 Thread Bryan C. Warnock
On Wednesday 30 January 2002 21:42, Dan Sugalski wrote: I think we may want trees as a fundamental data type at some point... I wonder about the trees -- Bryan C. Warnock [EMAIL PROTECTED]