Agh, if you go and do that, you must then be sure that rx is capable of
optimizing /a/i and /[aA]/ in the same way. What I mean is that Perl's
current regex engine is able to use /abc/i as a constant in a string,
while it cannot do the same for /[Aa][Bb][Cc]/. Why? Because in the
first
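Here is a minimal Python sketch of the /abc/i point. It assumes ASCII-only case folding; real Perl and Parrot folding rules are more involved. `fold` and `ci_literal_find` are hypothetical names and are not part of either engine. Because the pattern is a single literal, it can be folded once up front and then searched for as a constant. The subject string is never folded in advance; characters are folded only as they are compared.

```python
def fold(ch: str) -> str:
    """ASCII-only case fold (assumption: no Unicode or locale rules)."""
    return ch.lower() if "A" <= ch <= "Z" else ch


def ci_literal_find(subject: str, literal: str) -> int:
    """Return the first offset where `literal` matches `subject`
    case-insensitively, or -1 if it never does.

    The literal is folded once ("compile time"); during the scan only the
    subject characters actually being compared are folded.
    """
    needle = "".join(fold(c) for c in literal)
    n = len(needle)
    for start in range(len(subject) - n + 1):
        if all(fold(subject[start + i]) == needle[i] for i in range(n)):
            return start
    return -1


print(ci_literal_find("xxABcd", "abc"))  # 2
print(ci_literal_find("xyz", "abc"))     # -1
```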
On Jan 31, Hong Zhang said:
But as you say, case folding is expensive. And with this approach you
are going to case-fold every string that is matched against an rx
that has some part of it that is case-insensitive.
That is correct in general. But the regex compiler can be smarter than that.
For
On Wed, 30 Jan 2002 17:45:58 +, Graham Barr wrote:
On Wed, Jan 30, 2002 at 09:32:49AM -0800, Brent Dax wrote:
# rx_setprops P0, i, 2
# branch $start0
# $advance:
# rx_advance P0, $fail
# $start0:
#
Peter Haworth:
# On Wed, 30 Jan 2002 17:45:58 +, Graham Barr wrote:
# On Wed, Jan 30, 2002 at 09:32:49AM -0800, Brent Dax wrote:
# # rx_setprops P0, i, 2
# # branch $start0
# # $advance:
# # rx_advance P0, $fail
# #
On Thu, Jan 31, 2002 at 08:54:21AM -0800, Brent Dax wrote:
Peter Haworth:
# On Wed, 30 Jan 2002 17:45:58 +, Graham Barr wrote:
# On Wed, Jan 30, 2002 at 09:32:49AM -0800, Brent Dax wrote:
# # rx_setprops P0, i, 2
# # branch $start0
# #
On Thu, Jan 31, 2002 at 05:15:49PM +, Graham Barr wrote:
Yes, I was assuming that. However, what is to be gained by case
folding the input string?
Because parts of an rx can be case-insensitive while other parts
are case-sensitive, we will probably need two sorts of ops anyway
(or a
Because parts of an rx can be case-insensitive while other parts
are case-sensitive, we will probably need two sorts of ops anyway
(or a way to tell the op to be case-insensitive). And you will
only be able to do the case folding when the whole rx is
case-insensitive.
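Graham's alternative is a single op that is told whether to be case-insensitive. It might look like the hypothetical sketch below. The `(text, icase)` segment list and `match_segments` are assumptions made for illustration. Folding (ASCII-only here) is applied per comparison, and only for segments flagged as insensitive, so the input string is never folded as a whole.

```python
from typing import List, Optional, Tuple

Segment = Tuple[str, bool]  # (literal text, case-insensitive?)


def fold(ch: str) -> str:
    """ASCII-only case fold (assumption for this sketch)."""
    return ch.lower() if "A" <= ch <= "Z" else ch


def match_segments(subject: str, pos: int, segments: List[Segment]) -> Optional[int]:
    """Match literal segments in order, starting at `pos`.

    Each segment carries its own case-insensitivity flag. Only characters
    compared against an insensitive segment are folded.
    Returns the end offset on success, or None on failure.
    """
    for text, icase in segments:
        end = pos + len(text)
        if end > len(subject):
            return None
        for i, want in enumerate(text):
            got = subject[pos + i]
            if icase:
                got, want = fold(got), fold(want)
            if got != want:
                return None
        pos = end
    return pos


# Roughly /foo(?i:bar)/: "foo" is case-sensitive, "bar" is not.
pattern = [("foo", False), ("bar", True)]
print(match_segments("fooBAR", 0, pattern))  # 6
print(match_segments("FOObar", 0, pattern))  # None
```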
I don't like your
On Thu, Jan 31, 2002 at 11:18:58AM -0800, Hong Zhang wrote:
Because parts of an rx can be case-insensitive while other parts
are case-sensitive, we will probably need two sorts of ops anyway
(or a way to tell the op to be case-insensitive). And you will
only be able to do the case
But as you say, case folding is expensive. And with this approach you
are going to case-fold every string that is matched against an rx
that has some part of it that is case-insensitive.
That is correct in general. But the regex compiler can be smarter than that.
For example, rx should optimize
Tim Bunce:
# On Thu, Jan 31, 2002 at 05:15:49PM +, Graham Barr wrote:
#
# Yes, I was assuming that. However, what is to be gained by case
# folding the input string?
#
# Because parts of an rx can be case-insensitive while other parts
# are case-sensitive, we will probably need two
--- Brent Dax [EMAIL PROTECTED] wrote:
Tim Bunce:
# On Thu, Jan 31, 2002 at 05:15:49PM +, Graham Barr wrote:
#
# Especially as the perl6 rx engine will have to be able to
# work directly on
# non-trivial things like streams and generators and suchlike.
I have a suggestion similar to
On Thu, Jan 31, 2002 at 12:50:52PM -0800, Brent Dax wrote:
Let me know if I'm brilliant, on crack, or both with this idea.
I've no idea :-)
Tim.
On Wed, Jan 30, 2002 at 08:13:55AM -0800, Ashley Winters wrote:
I think that's exactly what you should be doing! Neither parrot nor the
rx engine should try to be a full compiler. The rx engine definitely
should have opcodes in the virtual machine, but those opcodes should
simply contain
Basically, I see a black box being built in the interests of speed.
Voodoo array formats, bitmaps, and other such things to avoid actually
spelling out what the regular expression is doing *in parrot code*.
[snip]
What I see is that rx_literal is a speed hack to avoid compiling this
into parrot
Ashley Winters wrote:
First, we set the rx engine to case-insensitive. Why is that bad? It's
setting a runtime property for what should be compile-time
unicode-character-kung-fu. Assuming your CPU knows what the gritty
details of Unicode are in the first place just feels wrong, but I digress.
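Ashley's suggestion is to resolve case-insensitivity when the rx is compiled, not through a runtime property. Below is a minimal sketch of that rewrite. It assumes ASCII-only letters, and `expand_icase` is a hypothetical name. The sketch turns a literal into a run of two-member classes and checks the result with Python's `re` module. It also shows the trade-off raised earlier in the thread: the output is exactly the /[Aa][Bb][Cc]/ form, which an engine may no longer recognize as a constant string.

```python
import re


def expand_icase(literal: str) -> str:
    """Rewrite a literal as a case-insensitive pattern without a runtime
    flag: each ASCII letter becomes a two-member class, e.g.
    "ab1" -> "[aA][bB]1". Non-letters are escaped and kept as-is.
    ASCII-only is an assumption; Unicode folding (e.g. one character
    folding to several) cannot be expressed this simply.
    """
    out = []
    for ch in literal:
        if ch.isascii() and ch.isalpha():
            out.append(f"[{ch.lower()}{ch.upper()}]")
        else:
            out.append(re.escape(ch))
    return "".join(out)


pat = expand_icase("abc")
print(pat)                             # [aA][bB][cC]
print(bool(re.fullmatch(pat, "AbC")))  # True
```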
I
Ashley Winters:
# Who the hell am I?
# I've been only a weblog-lurker till now. It's been a couple
# years since
# I last contributed to Perl5. I just read the latest Apocalypse and it
# inspired me to get a parrot snapshot and look around.
Welcome back to the land of the living. :^)
# What's
begin quote from Ashley Winters:
I think that's exactly what you should be doing! Neither parrot nor the
rx engine should try to be a full compiler. The rx engine definitely
should have opcodes in the virtual machine, but those opcodes should
simply contain state-machine/backtracking info,
On Wed, Jan 30, 2002 at 09:32:49AM -0800, Brent Dax wrote:
# rx_setprops P0, i, 2
# branch $start0
# $advance:
# rx_advance P0, $fail
# $start0:
# rx_literal P0, a, $advance
#
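The quoted op sequence is easier to follow as a hypothetical Python emulation of its control flow. The op semantics here are inferred from the op names and labels: rx_literal jumps to $advance on a mismatch, and rx_advance moves the start one character right or jumps to $fail. These are assumptions, not Parrot's actual implementation, and the trailing `2` argument of rx_setprops is not modeled.

```python
def run_rx(subject: str, literal: str, icase: bool) -> int:
    """Emulate the quoted op sequence (assumed semantics):

        rx_setprops P0, i, 2      -> icase flag on the rx state
        branch $start0
      $advance:
        rx_advance P0, $fail      -> move start one char right, or fail
      $start0:
        rx_literal P0, a, $advance

    Returns the offset where the literal matched, or -1 for $fail.
    """
    def norm(s: str) -> str:
        # Stand-in for the "i" property: lowercase both sides when set.
        return s.lower() if icase else s

    start = 0  # branch $start0: the first attempt is at offset 0
    while True:
        # $start0: rx_literal compares the literal at the current start.
        if norm(subject[start:start + len(literal)]) == norm(literal):
            return start
        # $advance: rx_advance bumps the start; running off the end -> $fail.
        start += 1
        if start > len(subject) - len(literal):
            return -1


print(run_rx("xyA", "a", True))   # 2
print(run_rx("xyA", "a", False))  # -1
```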
# First, we set the rx engine to
On Wednesday 30 January 2002 12:32, Brent Dax wrote:
# Mostly, I'd like to hear how either Unicode character-ranges aren't
# deterministic at compile-time (I doubt that) or how crippling to
One word: locale.
Not that locales couldn't provide pre-compiled character classes.
--
Bryan C.
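The Turkish dotted and dotless I illustrate Bryan's point. Under Turkish casing rules, 'I' lowercases to 'ı' and 'İ' lowercases to 'i', so the class that /i/i compiles to depends on the locale. The sketch below assumes a locale can be described by a small table of upper-to-lower overrides; that table format and `icase_class` are hypothetical. Given such a table at compile time, a class can be precomputed per locale, as Bryan suggests.

```python
from typing import Dict, FrozenSet

# Hypothetical per-locale overrides (upper -> lower). The Turkish
# dotted/dotless I pairs are real casing rules; the format is assumed.
TURKISH_OVERRIDES: Dict[str, str] = {"I": "ı", "İ": "i"}


def icase_class(ch: str, overrides: Dict[str, str]) -> FrozenSet[str]:
    """Characters that match `ch` case-insensitively under a locale
    described by `overrides`. Default pairs come from str.lower and
    str.upper, restricted to single characters."""
    def lower(c: str) -> str:
        return overrides.get(c, c.lower())

    target = lower(ch)
    candidates = {ch, ch.lower(), ch.upper(), *overrides}
    return frozenset(c for c in candidates if len(c) == 1 and lower(c) == target)


print(sorted(icase_class("i", {})))                 # ['I', 'i']
print(sorted(icase_class("i", TURKISH_OVERRIDES)))  # ['i', 'İ']
```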
On Wednesday 30 January 2002 11:13, Ashley Winters wrote:
First, we set the rx engine to case-insensitive. Why is that bad? It's
setting a runtime property for what should be compile-time
{snip}
Now, the current CVS rx engine is/would do this at runtime.
We're also currently a compiler
On Wed, Jan 30, 2002 at 08:37:30PM -0500, Bryan C. Warnock wrote:
But if you know they're going to be twenty times slower, why are you doing
it? Because we know / think / hope / pray / have been making sacrifices to
Tangential note: current benchmarking indicates that we're doing a lot
At 6:28 PM -0800 1/30/02, Steve Fink wrote:
I'm sure in Apoc 5 Larry's going to go way beyond that and embed full
parsers, not just regularish language matchers, but the above is
easier to grasp.
Odds are, yes. And don't be surprised if the RE engine's required to
return data structures as
On Wednesday 30 January 2002 21:42, Dan Sugalski wrote:
I think we may want trees as a fundamental data type at some point...
I wonder about the trees
--
Bryan C. Warnock
[EMAIL PROTECTED]