Re: Hypothetical synonyms

2002-08-29 Thread Janek Schleicher

Aaron Sherman wrote at Wed, 28 Aug 2002 00:34:15 +0200:

 $stuff = (defined($1)?$1:$2) if /^\s*(?:(.*?)|(\S+))/;

It gives me the idea of a missing feature:

What really should be expressed is:

my ($stuff) = /^\s*(°.*?°|\S+)/;

where the ° character would mean,
Don't capture the previous element.

I think that such a meaning of uncapturing elements
from a regexp would be really nice,
as it would help to express things directly,
instead of going complicated ways.

The ° character doesn't have any special meaning,
that's why I chose it in the above example.
However, it also symbolizes a little capturing
and as it isn't filled,
it could really symbolize an uncapturing.

I don't know how hard it would be to implement or
whether it has already been discussed.


Greetings,
Janek




Capturing alternations (was Re: Hypothetical synonyms)

2002-08-29 Thread Trey Harris

In a message dated Thu, 29 Aug 2002, Janek Schleicher writes:

 Aaron Sherman wrote at Wed, 28 Aug 2002 00:34:15 +0200:

  $stuff = (defined($1)?$1:$2) if /^\s*(?:(.*?)|(\S+))/;

 It gives me the idea of a missing feature:

 What really should be expressed is:

 my ($stuff) = /^\s*(°.*?°|\S+)/;

 where the ° character would mean,
 Don't capture the previous element.

Hmm.  One thing that has always bothered me about regexes is capturing
parentheses in alternations.  It seems to me that:

my ($stuff) = /^\s* [ (.*?) | (\S+) ]/;

should DWIM somehow, since it's impossible that both parens will capture.
So when the same number of capturing parens appear in each of an
alternation, they should factor out to being a single return value.

Is this possible in the general case?

Trey
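
For comparison, later Perl 5 releases (5.10 and up) provide a branch-reset group, (?|...), which numbers the capturing parens in each alternative from the same starting point, so both branches fill $1. The sketch below assumes the intent is "a double-quoted string or else a bare word"; that pattern and the helper name first_token are illustrative assumptions, not something taken from the thread.

```perl
use strict;
use warnings;
use 5.010;    # branch reset (?|...) needs perl 5.10 or later

# Hypothetical helper: return the first token of a line, which is either
# the contents of a double-quoted string or a run of non-blanks.
sub first_token {
    my ($line) = @_;
    # Inside (?|...) each alternative restarts capture numbering,
    # so whichever branch matches, its capture lands in $1.
    my ($stuff) = $line =~ /^\s*(?|"(.*?)"|(\S+))/;
    return $stuff;
}

print first_token('  "hello world" rest'), "\n";   # hello world
print first_token('  bare words'), "\n";           # bare
```

This only covers the case where every alternative has the same number of capturing groups, which is the case described above.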




Re: Capturing alternations (was Re: Hypothetical synonyms)

2002-08-29 Thread Damian Conway

Piers wrote:


 Not exactly DWIM, but how about:
 
   my $stuff = /^\s* [ (.*?) | (\S+) ] : { $foo := $+ }/;
 
 Assuming $+ means 'the last capture group matched' as it does now.


Or just:

 my $stuff = /^\s* [ $foo:=(.*?) | $foo:=(\S+) ]/;

BTW, that doesn't actually *do* the match. It merely puts a reference
to a rule object into $stuff.

Perhaps we all actually meant variants on:

 my $stuff = m/^\s* [ $0:=(.*?) | $0:=(\S+) ]/;

???

Damian







Re: Does ::: constrain the pattern engine implementation?

2002-08-29 Thread Jerome Vouillon

On Wed, Aug 28, 2002 at 10:36:54AM -0700, Larry Wall wrote:
 That is a worthy consideration, but expressiveness takes precedence
 over it in this case.  DFAs are really only good for telling you
 *whether* and *where* a pattern matches as a whole.  They are
 relatively useless for telling you *how* a pattern matches.

Actually, DFAs can also tell you *how* a pattern matches.  For
instance the RE library (http://www.sourceforge.net/projects/libre/)
is DFA-based and supports a lot of Perl 5 regular expression features:
- submatches
- left-most matching semantics
- greedy and non-greedy operators (*, *?, ?, ??)
- zero-length assertions such as ^, $, \b.

I don't know how to implement back-references and embedded Perl code
using a DFA.  Look-ahead and look-behind assertions can probably be
implemented, though this looks tricky.  But these features are not
used that often.  So, I believe that most real-life Perl 5 regular
expressions can be handled by a DFA.

 Add to that the fact that most real-life patterns don't generally do
 much backtracking, because they're written to succeed, not to fail.
 This pattern never backtracks, for instance:
 
 my ($num) = /^Items: (\d+)/;

Another advantage of DFAs over NFAs is that the core of a DFA-based
pattern matching implementation is a small simple loop which is
executed very efficiently by modern CPUs.  On the other hand, the core
of an NFA-based implementation is much larger and much more complex,
and I'm not sure it is executed as efficiently.  In particular, there
are probably a lot more branch mispredictions.
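
To make the "small simple loop" concrete, here is a minimal table-driven sketch in Perl 5. It is not how the RE library is implemented; the state table, the pattern it encodes (a decimal number, roughly ^\d+(\.\d+)?$) and the names char_class and dfa_match are all assumptions chosen for illustration.

```perl
use strict;
use warnings;

# Hand-built transition table for ^\d+(\.\d+)?$ (illustrative only).
# State 0: start, 1: integer digits, 2: just saw '.', 3: fraction digits.
my %next = (
    0 => { digit => 1 },
    1 => { digit => 1, dot => 2 },
    2 => { digit => 3 },
    3 => { digit => 3 },
);
my %accepting = ( 1 => 1, 3 => 1 );

# Map a single character to the input class used by the table.
sub char_class {
    my ($ch) = @_;
    return 'digit' if $ch ge '0' && $ch le '9';
    return 'dot'   if $ch eq '.';
    return 'other';
}

# The whole matcher: one table lookup per input character, no backtracking.
sub dfa_match {
    my ($input) = @_;
    my $state = 0;
    for my $ch ( split //, $input ) {
        $state = $next{$state}{ char_class($ch) };
        return 0 unless defined $state;    # no transition: reject
    }
    return $accepting{$state} ? 1 : 0;
}

print dfa_match('3.14') ? "match\n" : "no match\n";   # match
print dfa_match('3.')   ? "match\n" : "no match\n";   # no match
```

A real implementation would compile the table from the pattern and track submatch positions as well; the sketch only shows the shape of the inner loop.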

As an example of what performance improvement one can sometimes
achieve using a DFA-based rather than an NFA-based implementation,
here are the results I get on my computer for the regular expression
benchmark from the Great Computer Language Shootout.
(http://www.bagley.org/~doug/shootout/bench/regexmatch/)

Perl 5                       3.49s
OCaml with PCRE (NFA-based)  3.17s
OCaml with RE (DFA-based)    0.34s

-- Jerome



backtracking into { code }

2002-08-29 Thread Ken Fox

A question: Do rules matched in a { code } block set backtrack points for
the outer rule? For example, are these rules equivalent?

  rule expr1 {
    <term> { /<operators>/ or fail } <term>
  }

  rule expr2 {
    <term> <operators> <term>
  }

And a comment: It would be nice to have procedural control over back-
tracking so that { code } can fail, succeed (not fail), or succeed and
commit. Right now we can follow { code } with ::, :::, etc. but that does
not allow much control. I'm a little afraid of what happens in an LL(Inf)
grammar if backtracking states aren't aggressively pruned.

- Ken



Re: Hypothetical synonyms

2002-08-29 Thread Luke Palmer

 The ° character doesn't have any special meaning,
 that's why I chose it in the above example.
 However, it also symbolizes a little capturing
 and as it isn't filled,
 it could really symbolize an uncapturing.

Interesting idea.  I'm not sure if I agree with it yet.  However, I don't 
agree with your syntax, as I can't type that character.  Is it possible to 
modify what was captured?

/ ([ \\ . { chop; chop } | [^\\] ]*?) /

Or is that just too ugly?

Luke




Re: backtracking into { code }

2002-08-29 Thread Aaron Sherman

On Thu, 2002-08-29 at 08:05, Ken Fox wrote:
 A question: Do rules matched in a { code } block set backtrack points for
 the outer rule? For example, are these rules equivalent?
 
   rule expr1 {
     <term> { /<operators>/ or fail } <term>
   }
 
   rule expr2 {
     <term> <operators> <term>
   }
 
 And a comment: It would be nice to have procedural control over back-
 tracking so that { code } can fail, succeed (not fail), or succeed and
 commit. Right now we can follow { code } with ::, :::, etc. but that does
 not allow much control. I'm a little afraid of what happens in an LL(Inf)
 grammar if backtracking states aren't aggressively pruned.

Well, if /.../ is returning a result object (Let's say
CORE::RX::Result), then I would imagine it's an easy enough thing to let
you create your own, or return the one from a rule that you invoke.
e.g.:

rule { <term> { /<operators>/.commit(1) or fail } <term> }

The hypothetical commit() method being one that would take a number and
modify the result object so that it commits as if you had used that many
colons.

{} inside a rule would, I imagine, be implemented like so:

sub rxbraces ($code) {
    my $stat = $code();
    if $stat.isa(CORE::RX::Result) {
        return $stat;
    } else {
        my $r is CORE::RX::Result;
        $r.success($stat); # Boolean status-setting method
        return $r;
    }
}

Or the moral equivalent. In other words, it should be able to return a
result of your choosing.

Sorry if I've missed some of the design. My Perl 6 pseudo-code may not
be legal.





Re: backtracking into { code }

2002-08-29 Thread Ken Fox

Aaron Sherman wrote:
 rule { <term> { /<operators>/.commit(1) or fail } <term> }
 
 The hypothetical commit() method being one that would take a number and

That would only be useful if the outer rule can backtrack into the
inner /operators/ rule. Can it?

I agree with you that a commit method would be useful -- especially when
used on $self. I'd probably write your example as:

  rule { <term> { m/<operators> { $self.commit(1) }/ or fail } <term> }

which is of course just a complicated

  rule { <term> { m/<operators> :/ or fail } <term> }

BTW, why isn't fail a method? Then a rule could pass itself to a sub-rule
and allow the sub-rule to fail its parent, but not the entire match. Isn't
failing just invoking the last continuation on the backtrack stack?

- Ken



Re: auto deserialization

2002-08-29 Thread Steve Canfield

From: Dan Sugalski [EMAIL PROTECTED]
I actually had something a bit more subversive
in mind, where the assignment operator for the
Date class did some magic the same way we do
now when we do math on strings.

I was thinking a simple general purpose rule. If the variable is
typed, and its class has a standard static method for
instantiating from a string, and if a String object is being assigned
to the variable, then the class's deserialization method is called,
returning the new object and assigning it to the variable.
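
Perl 5 has no typed variables, but the rule can be approximated with a tied scalar, which gives a rough feel for how it would behave. In this sketch the classes TypedScalar and Date, and the method name from_string, are all hypothetical; nothing here is a proposed Perl 6 API.

```perl
use strict;
use warnings;

{
    # Hypothetical class with a "standard" string constructor.
    package Date;
    sub from_string {
        my ( $class, $s ) = @_;
        my ( $y, $m, $d ) = $s =~ /^(\d{4})-(\d\d)-(\d\d)$/
            or die "bad date: $s";
        return bless { year => $y, month => $m, day => $d }, $class;
    }
    sub year { $_[0]{year} }
}

{
    # Stand-in for a typed variable: remembers its class and converts
    # plain strings through that class's from_string on assignment.
    package TypedScalar;
    sub TIESCALAR {
        my ( $class, $type ) = @_;
        return bless { type => $type, value => undef }, $class;
    }
    sub FETCH { $_[0]{value} }
    sub STORE {
        my ( $self, $v ) = @_;
        if ( !ref $v && $self->{type}->can('from_string') ) {
            $v = $self->{type}->from_string($v);
        }
        $self->{value} = $v;
    }
}

tie my $when, 'TypedScalar', 'Date';
$when = '2002-08-29';                       # string in, Date object stored
print ref($when), ' ', $when->year, "\n";   # Date 2002
```

Assigning something that is already an object bypasses the conversion, which matches the "if a String object is being assigned" condition above.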





Re: backtracking into { code }

2002-08-29 Thread Aaron Sherman

On Thu, 2002-08-29 at 10:28, Ken Fox wrote:
 Aaron Sherman wrote:
  rule { <term> { /<operators>/.commit(1) or fail } <term> }
  
  The hypothetical commit() method being one that would take a number and
 
 That would only be useful if the outer rule can backtrack into the
 inner /operators/ rule. Can it?

Of course not. In the same way that 

rule foo { b }
rule bar { a <foo>+ b }
"abb" =~ /<bar>/

would not. You backtrack OVER it, and that's when your commit (of
whatever degree) would come into play.

 I agree with you that a commit method would be useful -- especially when
 used on $self. I'd probably write your example as:
 
   rule { <term> { m/<operators> { $self.commit(1) }/ or fail } <term> }
 
 which is of course just a complicated
 
   rule { <term> { m/<operators> :/ or fail } <term> }

There's no way that can affect anything, as : doesn't affect calling
rules, e.g.:

rule foo { b : }
rule bar { a <foo>+ b }
"abb" =~ /<bar>/

will match, because the foo rule never needs to backtrack. If foo had
used <commit>, then you'd fail, but that's a horse of a different
animal.
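
The same distinction can be tried out in Perl 5.10 or later, using an atomic group (?>...) as a rough stand-in for the single colon and the (*COMMIT) verb as a rough stand-in for a commit. The mapping between the Perl 6 constructs and these Perl 5 ones is an assumption made for illustration; the variable names are made up.

```perl
use strict;
use warnings;
use 5.010;    # backtracking control verbs need perl 5.10 or later

# Inlined analogue of: rule foo { b : }   rule bar { a <foo>+ b }
# The atomic group forbids backtracking *inside* each "foo", but the
# surrounding + can still give back a whole iteration.
my $with_atomic = qr/a(?:(?>b))+b/;

# Analogue of foo using a commit: backtracking over (*COMMIT) makes the
# entire match fail outright.
my $with_commit = qr/a(?:b(*COMMIT))+b/;

print 'abb' =~ $with_atomic ? "atomic: match\n" : "atomic: no match\n";
print 'abb' =~ $with_commit ? "commit: match\n" : "commit: no match\n";
```

With the atomic group, the + first takes both b's, then gives one back so the final b can match. With (*COMMIT), giving one back means backtracking over the verb, so the match fails.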

The goal was to dynamically cause backtracking over inline code to fail.





Re: auto deserialization

2002-08-29 Thread Nicholas Clark

On Thu, Aug 29, 2002 at 07:52:42AM -0700, Steve Canfield wrote:
 From: Dan Sugalski [EMAIL PROTECTED]
 I actually had something a bit more subversive
 in mind, where the assignment operator for the
 Date class did some magic the same way we do
 now when we do math on strings.
 
 I was thinking a simple general purpose rule. If the variable is
 typed, and its class has a standard static method for
 instantiating from a string, and if a String object is being assigned
 to the variable, then the class's deserialization method is called,
 returning the new object and assigning it to the variable.

This is possibly more an internals question, but I was assuming that the
serialization/deserialization methods would normally be converting an object
to an efficient packed 8 bit binary serial format (much like Storable
does).

In which case, is it a counterproductive assumption to expect (or mandate)
that the incoming serialization method on a class accepts well formed
human readable Unicode (or Shift-JIS or ASCII or whatever) strings?

Surely a class is allowed to make a distinction between the format that it
uses to serialize itself, and the format(s) of initialization strings it
accepts?

Nicholas Clark



Re: Hypothetical synonyms

2002-08-29 Thread Janek Schleicher

Luke Palmer wrote at Thu, 29 Aug 2002 15:21:57 +0200:

 The ° character doesn't have any special meaning,
 that's why I chose it in the above example.
 However, it also symbolizes a little capturing
 and as it isn't filled,
 it could really symbolize an uncapturing.
 
 Interesting idea.  I'm not sure if I agree with it yet.  However, I don't 
 agree with your syntax, as I can't type that character.  

Yeah, that's of course a problem.
But I can't think of any other typeable character
with no other meaning that could be chosen.

 Is it possible to 
 modify what was captured?
 
   / ([ \\ . { chop; chop } | [^\\] ]*?) /
 
 Or is that just too ugly?

IMHO, that looks as ugly as the other workaround solutions :-)

I think, the greatest strength of Perl is that
it expresses simple things in a simple, short and natural way.

Such a regexp behaviour would simplify a lot of
jobs where we have to use workarounds instead
of expressing the simple idea:
  Match it, capture the relevant parts and ignore some irrelevant subparts.

It can always be implemented with
- more captures, joined together later
or
- a substitution/transliteration on the captured part
  to remove the irrelevant subparts
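
The second workaround might look like this in Perl 5. The task (capture a double-quoted string, then drop the backslashes that escape characters inside it) and the helper name unquote are assumptions picked to illustrate "capture, then strip the irrelevant subparts".

```perl
use strict;
use warnings;

# Hypothetical helper: capture the body of the first double-quoted string,
# then remove the escaping backslashes in a second step.
sub unquote {
    my ($text) = @_;
    # Step 1: capture everything between the quotes, escapes included.
    my ($raw) = $text =~ /"((?:\\.|[^"\\])*)"/
        or return;
    # Step 2: a substitution on the captured part drops each backslash
    # and keeps the character it escaped.
    ( my $clean = $raw ) =~ s/\\(.)/$1/gs;
    return $clean;
}

print unquote('say "a \"quoted\" word" now'), "\n";   # a "quoted" word
```

An "uncapture" marker in the regexp would fold step 2 into step 1.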


IMHO, it's comparable to the problem
  Group it, but don't capture it
which was solved with the (?:) syntax.
Along those lines,
a (?_...) (question mark, underscore) syntax could also be an idea,
with the meaning
  Group it, and don't capture it, not even in surrounding captures.
With it,
the OP's problem would look like:
/\s*((?_).*?(?_°)|\S+)/;

(I chose the underscore, as it is typeable and could have the mnemonic meaning
 of some underlying, unimportant background group)


But perhaps I'm only dreaming.


Cheerio,
Janek




Re: Hypothetical synonyms

2002-08-29 Thread Larry Wall

Don't forget you can parameterize rules with subrules.  I don't see
any reason you couldn't write a

pick (.*?) | (\S+)

kind of rule and do whatever you like with the submatched bits.

Larry




Re: declaring if and while (was: rule, rx and sub)

2002-08-29 Thread Larry Wall

On Thu, 29 Aug 2002, Thomas A. Boyer wrote:
: Am I getting this straight?

As straight as any of us are getting it thus far.  :-)

The process is intended to be convergent.  That doesn't guarantee it
will converge, but that's the intention.

When I'm playing golf, I always expect to knock the ball into the hole.
And I'm happy if the ball ends up closer to the hole than it was.

Larry