Re: RFC 308 (v1) Ban Perl hooks into regexes

2000-09-28 Thread Tom Christiansen

> I consider recursive regexps very useful:
>
>  $a = qr{ (?> [^()]+ ) | \( (??{ $a }) \) };

Yes, they're "useful", but darned tricky sometimes, and in
ways other than simple regex-related stuff.  For example,
consider what happens if you do

my $regex = qr{ (?> [^()]+ ) | \( (??{ $regex }) \) };

That doesn't work due to differing scopings on either side
of the assignment.  And clearly a non-regex approach could
be more legible for recursive parsing.

--tom
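
A minimal sketch of the usual workaround for the scoping problem described above, assuming Perl 5.18 or later (where regex code blocks close over lexical variables reliably): declare the variable in one statement and assign to it in the next. The (?: ... )* wrapper, the /x flag and the is_balanced helper are additions for illustration and are not part of the original message.

```perl
use strict;
use warnings;

# Declare first: by the time qr{} is compiled, "my $regex" is already in
# scope, so the code block refers to this lexical and not to a global.
my $regex;
$regex = qr{
    (?:
        (?> [^()]+ )              # a run of non-parenthesis characters
      | \( (??{ $regex }) \)      # or a parenthesised group, matched recursively
    )*
}x;

# Hypothetical helper: true if the parentheses in $text are balanced.
sub is_balanced {
    my ($text) = @_;
    return $text =~ /\A $regex \z/x ? 1 : 0;
}

printf "%-12s %s\n", $_, is_balanced($_) ? 'balanced' : 'not balanced'
    for 'a(b(c)d)e', 'a(b', 'x)y(';
```

Writing the declaration and the assignment as a single "my $regex = qr{...}" statement would make the inner $regex refer to whatever $regex was visible before the declaration, which is the failure described above.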





Re: RFC 308 (v1) Ban Perl hooks into regexes

2000-09-28 Thread Hugo

In [EMAIL PROTECTED], Tom Christiansen writes:
:I consider recursive regexps very useful:
:
: $a = qr{ (?> [^()]+ ) | \( (??{ $a }) \) };
:
:Yes, they're "useful", but darned tricky sometimes, and in
:ways other than simple regex-related stuff.  For example,
:consider what happens if you do
:
:my $regex = qr{ (?> [^()]+ ) | \( (??{ $regex }) \) };
:
:That doesn't work due to differing scopings on either side
:of the assignment.

Yes, this is a problem. But it bites people in other situations
as well:
  my $fib = sub { $_[0] < 2 ? 1 : $fib->($_[0] - 1) };

I haven't kept up with the non-regexp RFCs, but I hope someone
has suggested an alternative scoping that would permit these
cases to refer to the just-introduced variable. Perhaps we
should special-case qr{} and sub{} - I can't offhand think of
another area that suffers from this, and I don't think these
two areas would suffer from an inability to refer to the same-name
variable in an outlying scope.
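
One common workaround in current Perl, sketched here, is to split the declaration from the assignment so that the name is already in scope when the sub body is compiled. The second recursive call is added only so that the example computes actual Fibonacci numbers; it is not in the one-liner above.

```perl
use strict;
use warnings;

# Declared in its own statement, so the anonymous sub below can see it.
my $fib;
$fib = sub {
    my ($n) = @_;
    return $n < 2 ? 1 : $fib->($n - 1) + $fib->($n - 2);
};

print join(' ', map { $fib->($_) } 0 .. 9), "\n";   # 1 1 2 3 5 8 13 21 34 55
```

The closure now holds a reference to itself, so it is not freed until $fib is cleared explicitly, for example with "undef $fib".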

A useful alternative might be a different special case. Plucking
random grammar, perhaps:
  my $regex = qr{ (?> [^()]+ ) | \( ^^ \) }x;

Certainly I think a simple self-reference is likely to be a
common enough use that it would help to avoid the full deferred
eval infrastructure, even when it works properly.
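
For comparison, Perl 5.10 and later provide a direct self-reference of roughly this kind: (?R) recurses into the whole pattern and (?1) into capture group 1, with no variable and no deferred eval involved. The following is only a sketch of that later syntax, not of the ^^ notation floated above; the is_nested helper is a hypothetical name.

```perl
use strict;
use warnings;

# (?1) re-enters capture group 1, so the pattern refers to itself directly.
my $nested = qr{
    (                          # group 1: a balanced chunk
        (?:
            (?> [^()]+ )       #   a run of non-parenthesis characters
          | \( (?1) \)         #   or a parenthesised balanced chunk
        )*
    )
}x;

# Hypothetical helper: true if the parentheses in $text are balanced.
sub is_nested {
    my ($text) = @_;
    return $text =~ /\A $nested \z/x ? 1 : 0;
}

printf "%-12s %s\n", $_, is_nested($_) ? 'balanced' : 'not balanced'
    for '(a(b)c)', '((a)', ')(';
```

Note that (?1) counts groups in the final, combined pattern, so this sketch relies on $nested supplying the first capture group wherever it is interpolated.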

:And clearly a non-regex approach could be more legible for
:recursive parsing.

Like any aspect of programming, if you use it regularly it will
become easier to read. And comments are a wonderful thing.

Hugo



Re: RFC 308 (v1) Ban Perl hooks into regexes

2000-09-26 Thread Piers Cawley

Perl6 RFC Librarian [EMAIL PROTECTED] writes:

> This and other RFCs are available on the web at
>   http://dev.perl.org/rfc/
>
> =head1 TITLE
>
> Ban Perl hooks into regexes
>
> =head1 VERSION
>
>   Maintainer: Simon Cozens [EMAIL PROTECTED]
>   Date: 25 Sep 2000
>   Mailing List: [EMAIL PROTECTED]
>   Number: 308
>   Version: 1
>   Status: Developing
>
> =head1 ABSTRACT
>
> Remove C<?{ code }>, C<??{ code }> and friends.
>
> =head1 DESCRIPTION
>
> The regular expression engine may well be rewritten from scratch or
> borrowed from somewhere else. One of the scarier things we've seen
> recently is that Perl's engine casts back its Kraken tentacles into Perl
> and executes Perl code. This is spooky, tangled, and incestuous.
> (Although admittedly fun.)

It's *loads* of fun. Though admittedly, I've not used it in any *real*
code yet...

> It would be preferable to keep the regular expression engine as
> self-contained as possible, if nothing else to enable it to be used
> either outside Perl or inside standalone translated Perl programs
> without a Perl runtime.
>
> To do this, we'll have to remove the bits of the engine that call
> Perl code. In short: C<?{ code }> and C<??{ code }> must die.

You don't *have* to remove 'em. You can just throw an exception during
compilation if some hypothetical 'no regex subs' pragma is there.

-- 
Piers
'063039183598121887134041122600:1917131105:Jaercunrlkso tPh.'=~/^(.{6})*
(.{6})[^:]*:(..)*(..).*:(??{'.{'.$2%$4.'}'})(.)(??{print$5})/x;print"\n"





Re: RFC 308 (v1) Ban Perl hooks into regexes

2000-09-26 Thread Bart Lateur

On 25 Sep 2000 20:14:52 -, Perl6 RFC Librarian wrote:

> Remove C<?{ code }>, C<??{ code }> and friends.

I'm putting the finishing touches on an RFC to drop (?{...}) and replace
it with something far more localized, hence cleaner: assertions, also in
Perl code. That way,

/(?<!\d)(\d+)(?{$1 < 256})/

would only match integers between 0 and 255.

Communications between Perl code snippets inside a regex would be
strongly discouraged.
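
An assertion of this kind can be approximated in existing Perl with a conditional whose condition is a code block and whose "yes" branch is the always-failing (?!). The sketch below shows that idiom, not the proposed syntax; first_small_int is a hypothetical name, and the trailing (?!\d) is added so that a prefix such as the 25 in 256 is not accepted.

```perl
use strict;
use warnings;

# Return the first standalone integer below 256 in $text, or undef.
sub first_small_int {
    my ($text) = @_;
    return $text =~ /(?<!\d)(\d+)(?!\d)(?(?{ $1 >= 256 })(?!))/
        ? $1
        : undef;
}

for my $text ('999 1024 42 7', 'port 80', '256') {
    my $found = first_small_int($text);
    printf "%-14s %s\n", $text, defined $found ? $found : '(none)';
}
```

The code block only vetoes a candidate; it never changes which text the rest of the pattern would otherwise match.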

-- 
Bart.



Re: RFC 308 (v1) Ban Perl hooks into regexes

2000-09-26 Thread Michael Maraist


> On 25 Sep 2000 20:14:52 -, Perl6 RFC Librarian wrote:
>
> > Remove C<?{ code }>, C<??{ code }> and friends.
>
> I'm putting the finishing touches on an RFC to drop (?{...}) and replace
> it with something far more localized, hence cleaner: assertions, also in
> Perl code. That way,
>
> /(?<!\d)(\d+)(?{$1 < 256})/
>
> would only match integers between 0 and 255.
>
> Communications between Perl code snippets inside a regex would be
> strongly discouraged.

I can't believe that there currently isn't a means of killing a back-track
based on perl-code.  Looking through perlre it seems like you're right.  I'm
not really crazy about breaking backward compatibility like this though.  It
shouldn't be too hard to find another character sequence to perform your
above job.

Beyond that, there's a growing rift between reg-ex extenders and purifiers.
I assume the functionality you're trying to produce above is to find the
first bare number that is less than 256 (your above would match the 25 in
256).. Easily fixed by inserting (?!\d) between the second and third
aggregates.  If you were to be more strict, you could more simply apply
\b(\d+)\b...

In any case, the above is not very intuitive to the casual observers as
might be

while ( /(\d+)/g ) {
  if ( $1 < 256 ) {
    $answer = $1;
    last;
  }
}

Likewise, complex matching tokens are the realm of a parser (I'm almost
getting tired of saying that).  Please be kind to your local maintainer,
don't proliferate n'th order code complexities such as recursive or
conditional reg-ex's.  Yes, I can mandate that my work doesn't use them, but
it doesn't mean that CPAN won't (and I often have to reverse engineer CPAN
modules to figure out why something isn't working).

That said, nobody should touch the various relative reg-ex operators.  I
look at reg-ex as a tokenizer, and things like (?>...) which optimizes
reading, and (?!..), etc are very useful in this realm.

Just my $0.02

-Michael




Re: RFC 308 (v1) Ban Perl hooks into regexes

2000-09-26 Thread Bart Lateur

On Tue, 26 Sep 2000 13:32:37 -0400, Michael Maraist wrote:



> I can't believe that there currently isn't a means of killing a back-track
> based on perl-code.  Looking through perlre it seems like you're right.

There is, but as MJD wrote: "it ain't pretty". Now, semantic checks or
assertions would be the only reason why I'd expect to be able to execute
perl code every time a part of a regex is successfully parsed. Simply
look at RFC 197: a syntactic extension to regexes just to check if a
number is within a range! That is absurd, isn't it? Would a simple way
to include localized tests, *any* test, make more sense?

> I'm not really crazy about breaking backward compatibility like this though.
> It shouldn't be too hard to find another character sequence to perform your
> above job.

Me neither. But many prominent people in the Perl World have expressed
their amazement when they found out that the purpose of embedding Perl
in a regex wasn't aimed to just do this kind of tests. (?{...}) hasn't
even been tried out yet by many people, let alone that they'd use it in
production code. (?{...}) is notorious for dumping core. I can't see why
it can't be recycled. After all, it still executes Perl code.

> Beyond that, there's a growing rift between reg-ex extenders and purifiers.
> I assume the functionality you're trying to produce above is to find the
> first bare number that is less than 256 (your above would match the 25 in
> 256)..

You're forgetting about greediness. This test simply answers the
question: "will this do?" If the answer is always yes, the regex will
*always* match the same thing as it would do without this assertion.
Compare it to other assertions, such as /\b/, anchors (/^/ and /$/), and
lookahead and lookbehind. These too don't really control what it would
match. They can only express their veto.

> In any case, the above is not very intuitive to the casual observers as
> might be
>
> while ( /(\d+)/g ) {
>   if ( $1 < 256 ) {
>     $answer = $1;
>     last;
>   }
> }

Maybe for this simple example. But the same can be said of lookahead and
lookbehind. It takes a *bit* of getting used to, but it's very simple,
and very powerful. IMO.

> Likewise, complex matching tokens are the realm of a parser (I'm almost
> getting tired of saying that).  Please be kind to your local maintainer,
> don't proliferate n'th order code complexities such as recursive or
> conditional reg-ex's.

I said nothing of recursive regexes. Again, just look at RFC 197, and
see what complex rules people would like to cram into a regex. Or look
at the examples in Friedl's book, to see what contortions people put
themselves through, just to make sure that they only match numbers
between 0 and 23:

/[01]?[0-9]|2[0-3]/
/[01]?[4-9]|[012]?[0-3]/

So you think these are easy on the maintainer? I think not. A simple
boolean expression, "match a number and it must be 23 or less", is far
simpler, at least to me.
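
To make the comparison concrete, here is a small sketch that sets the "capture, then test" approach beside the two all-regex forms, using only constructs Perl already has. The is_hour helper and the \A...\z anchoring are assumptions added for the comparison.

```perl
use strict;
use warnings;

# Capture first, then apply the rule as ordinary arithmetic.
sub is_hour {
    my ($text) = @_;
    return ($text =~ /\A(\d{1,2})\z/ && $1 <= 23) ? 1 : 0;
}

# The two all-regex forms quoted above, anchored for a fair comparison.
my @patterns = (
    qr/\A(?:[01]?[0-9]|2[0-3])\z/,
    qr/\A(?:[01]?[4-9]|[012]?[0-3])\z/,
);

# Check that all three agree on every one- or two-digit input.
for my $n (0 .. 99) {
    for my $re (@patterns) {
        my $by_regex = $n =~ $re ? 1 : 0;
        warn "disagreement at $n\n" if $by_regex != is_hour($n);
    }
}
```

All three accept the same strings; the difference is that the limit 23 appears literally only in is_hour.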

-- 
Bart.



Re: RFC 308 (v1) Ban Perl hooks into regexes

2000-09-26 Thread Michael Maraist

> There is, but as MJD wrote: "it ain't pretty". Now, semantic checks or
> assertions would be the only reason why I'd expect to be able to execute
> perl code every time a part of a regex is successfully parsed. Simply
> look at RFC 197: a syntactic extension to regexes just to check if a
> number is within a range! That is absurd, isn't it? Would a simple way
> to include localized tests, *any* test, make more sense?

I'm trying to stick to a general philosophy of what's in a reg-ex, and I can
almost justify assertions since as you say, \d, ^, $, (?=), etc are these
very sort of things.  I've been avoiding most of this discussion because
it's been so odd, I can't believe they'll ultimately get accepted.  Given
the argument that it's unlikely that (?{code}) has been implemented in
production, I can almost see changing its semantics.  From what I
understand, the point would be to run some sort of perl-code and return
defined / undefined, where undefined forces a back-track.

As you said, we shouldn't encourage full-fledged execution (since core dumps
are common).  I can definitely see simple optimizations such as (?{$1 op
const}), though other interesting things such as (?{exists $keywords{ $1 }})
might proliferate.  That would expand to the general purpose (?{
isKeyword( $1 ) }), which then allows function calls within the reg-ex,
which is just asking for trouble.

One restriction might be to disallow various op-codes within the reg-ex
assertion.  Namely user-function calls, reg-ex's, and most OS or IO
operations.

A very common thing could be an optimal /(?>\d+)(?{MIN < $1 && $1 < MAX})/,
where MIN and MAX are constants.
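
Both of the checks mentioned above (a range test against constants and a hash lookup) can be sketched with today's constructs by letting a code-block condition veto the match via (?!). This assumes Perl 5.18 or later so that the code block sees the lexical %keywords; the constant values, the keyword list and the helper names are all made up for illustration.

```perl
use strict;
use warnings;
use constant { MIN => 0, MAX => 256 };

my %keywords = map { $_ => 1 } qw(if else while for);

# First number strictly between MIN and MAX, or undef.
sub first_in_range {
    my ($text) = @_;
    return $text =~ /\b(\d+)\b(?(?{ !($1 > MIN && $1 < MAX) })(?!))/
        ? $1
        : undef;
}

# First word that is a key of %keywords, or undef.
sub first_keyword {
    my ($text) = @_;
    return $text =~ /\b(\w+)\b(?(?{ !exists $keywords{$1} })(?!))/
        ? $1
        : undef;
}

print first_in_range('0 300 42'), "\n";        # 42
print first_keyword('foo while bar'), "\n";    # while
```

Neither block calls a user-defined function, which keeps them within the restricted subset suggested above.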

-Michael




Re: RFC 308 (v1) Ban Perl hooks into regexes

2000-09-26 Thread Hugo

In 005501c027eb$43bafe60$[EMAIL PROTECTED], "Michael Maraist" writes:
:As you said, we shouldn't encourage full-fledged execution (since core dumps
:are common).

Let's not redefine the language just because there are bugs to fix.
Surely it is better to concentrate first on fixing the bugs so that
we can then more fairly judge whether the feature is useful enough
to justify its existence.

:One restriction might be to disallow various op-codes within the reg-ex
:assertion.  Namely user-function calls, reg-ex's, and most OS or IO
:operations.

That seems quite unreasonable. Why do you _want_ to restrict someone
from calling isKeyword($1) within the regexp, which will then read
the keyword patterns from a file and check $1 against those patterns
using regexps? It seems like an entirely reasonable and useful thing
to do.

Hugo



Re: RFC 308 (v1) Ban Perl hooks into regexes

2000-09-26 Thread Hugo

In [EMAIL PROTECTED], Bart Lateur writes:
:On 25 Sep 2000 20:14:52 -, Perl6 RFC Librarian wrote:
:
:Remove C<?{ code }>, C<??{ code }> and friends.
:
:I'm putting the finishing touches on an RFC to drop (?{...}) and replace
:it with something far more localized, hence cleaner: assertions, also in
:Perl code. That way,
:
:   /(?<!\d)(\d+)(?{$1 < 256})/
:
:would only match integers between 0 and 255.

I'd like to suggest an alternative semantic for this: rename
(??{ code }) to (?{ code }), and use the newly freed (??{ code })
for the assertions. (I was about to write an RFC for just that, so
I'm glad I can save a bit of time. :)

Hugo



RFC 308 (v1) Ban Perl hooks into regexes

2000-09-25 Thread Perl6 RFC Librarian

This and other RFCs are available on the web at
  http://dev.perl.org/rfc/

=head1 TITLE

Ban Perl hooks into regexes

=head1 VERSION

  Maintainer: Simon Cozens [EMAIL PROTECTED]
  Date: 25 Sep 2000 
  Mailing List: [EMAIL PROTECTED]
  Number: 308
  Version: 1
  Status: Developing

=head1 ABSTRACT

Remove C<?{ code }>, C<??{ code }> and friends.

=head1 DESCRIPTION

The regular expression engine may well be rewritten from scratch or
borrowed from somewhere else. One of the scarier things we've seen
recently is that Perl's engine casts back its Kraken tentacles into Perl
and executes Perl code. This is spooky, tangled, and incestuous.
(Although admittedly fun.)

It would be preferable to keep the regular expression engine as
self-contained as possible, if nothing else to enable it to be used
either outside Perl or inside standalone translated Perl programs
without a Perl runtime.

To do this, we'll have to remove the bits of the engine that call 
Perl code. In short: C<?{ code }> and C<??{ code }> must die.

=head1 IMPLEMENTATION

It's more of an unimplementation really.

=head1 REFERENCES

None.




Re: RFC 308 (v1) Ban Perl hooks into regexes

2000-09-25 Thread Michael Maraist

> Ban Perl hooks into regexes
>
> =head1 ABSTRACT
>
> Remove C<?{ code }>, C<??{ code }> and friends.


At first, I thought you were crazy, then I read

> It would be preferable to keep the regular expression engine as
> self-contained as possible, if nothing else to enable it to be used
> either outside Perl or inside standalone translated Perl programs
> without a Perl runtime.

Which makes a lot of sense in the development field.

Tom has mentioned that the reg-ex engine is getting really out of hand;
it's hard enough to document clearly, much less be understandable to the
maintainer (or even the debugger).

A lot of what is trying to happen in (?{..}) and friends is parsing.  To
quote Star Trek Undiscovered Country, "Just because we can do a thing,
doesn't mean we should."  Tom and I have commented that parsing should be
done in a PARSER, not a lexer (like our beloved reg-ex engine).  RecDescent
and Yacc do a wonderful job of providing parsing power within perl.

I'd suggest you modify your RFC to summarize the above; that (?{}) and
friends are parsers, and we already have RecDescent / etc. which are much
easier to understand, and don't require too much additional overhead.

Other than the inherent coolness of having hooks into the reg-ex code, I
don't really see much real use from it other than debugging; eg (?{ print
"Still here\n" }).  I could go either way on the topic, but I'm definitely
of the opinion that we shouldn't continue down this dark path any further.


-Michael




Re: RFC 308 (v1) Ban Perl hooks into regexes

2000-09-25 Thread Mark-Jason Dominus


I think the proposal that Joe McMahon and I are finishing up now will
make these obsolete anyway.




Re: RFC 308 (v1) Ban Perl hooks into regexes

2000-09-25 Thread Hugo

In [EMAIL PROTECTED], Perl6 RFC Librarian writes:
:It would be preferable to keep the regular expression engine as
:self-contained as possible, if nothing else to enable it to be used
:either outside Perl or inside standalone translated Perl programs
:without a Perl runtime.
:
:To do this, we'll have to remove the bits of the engine that call 
:Perl code. In short: C<?{ code }> and C<??{ code }> must die.

I would have thought it more reasonable, if you wish to create
standalone translated Perl programs without a Perl runtime, to fail
with a helpful error if you encounter a construct that won't permit
it. You'll need to remove chunks of eval() and do() as well,
otherwise, and probably more besides.

In the context of a more shareable regexp engine, I would like to
see (? and (?? stay, but they need to be implemented more cleanly.
You could handle them quite nicely, I think, with just three
well-defined external hooks: one to find the matching brace at the
end of the code, one to parse the code, and one to run the code.
Anyone wishing to re-use the regexp library could then choose either
to keep the default drop-in replacements for those hooks (that die)
or provide their own equivalents to the perl usage.
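
The three-hook arrangement can be pictured as a small table of callbacks whose defaults simply die. What follows is a rough sketch only: the hook names (find_end, parse, run), make_hooks and run_code_block are hypothetical, and the brace counter is deliberately naive (it ignores braces inside strings or comments).

```perl
use strict;
use warnings;

# Default hooks for a host that cannot run embedded code: each one dies.
my %default_hooks = map {
    my $name = $_;
    ( $name => sub { die "regex code blocks unsupported: no '$name' hook\n" } );
} qw(find_end parse run);

# A host merges its own hooks over the defaults.
sub make_hooks {
    my (%override) = @_;
    return { %default_hooks, %override };
}

# What a Perl host might plug in.
my $perl_hooks = make_hooks(
    # Return the offset of the brace that closes the one at $start.
    find_end => sub {
        my ($src, $start) = @_;
        my $depth = 0;
        for my $i ($start .. length($src) - 1) {
            my $c = substr $src, $i, 1;
            $depth++ if $c eq '{';
            return $i if $c eq '}' && --$depth == 0;
        }
        die "unterminated code block\n";
    },
    # Compile the source text into a callable.
    parse => sub {
        my ($code) = @_;
        my $sub = eval "sub { $code }" or die $@;
        return $sub;
    },
    # Execute the compiled code and return its value.
    run => sub { my ($compiled) = @_; return $compiled->() },
);

# Extract, compile and run the code block whose opening brace is at $start.
sub run_code_block {
    my ($hooks, $src, $start) = @_;
    my $end  = $hooks->{find_end}->($src, $start);
    my $code = substr $src, $start + 1, $end - $start - 1;
    return $hooks->{run}->( $hooks->{parse}->($code) );
}

print run_code_block($perl_hooks, '(?{ 1 + 2 })', 2), "\n";   # prints 3
```

A host that keeps the defaults, as in run_code_block(make_hooks(), ...), gets the "fail with a helpful error" behaviour instead.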

I consider recursive regexps very useful:

 $a = qr{ (?> [^()]+ ) | \( (??{ $a }) \) };

.. and I class re-eval in general in the arena of 'making hard
things possible'. But whether or not they stay, it would probably
also be useful to have a more direct way of expressing simple
recursive regexps such as the above without resorting to a costly
eval. When I've tried to come up with an appropriate restriction,
however, I find it very difficult to pick a dividing line.

Hugo



Re: RFC 308 (v1) Ban Perl hooks into regexes

2000-09-25 Thread Hugo

In [EMAIL PROTECTED], Perl6 RFC Librarian writes:
:=head1 ABSTRACT
:
:Remove C<?{ code }>, C<??{ code }> and friends.

Whoops, I missed this bit - what 'friends' do you mean?

Hugo



Re: RFC 308 (v1) Ban Perl hooks into regexes

2000-09-25 Thread Simon Cozens

On Mon, Sep 25, 2000 at 11:31:08PM +0100, Hugo wrote:
> In [EMAIL PROTECTED], Perl6 RFC Librarian writes:
> :=head1 ABSTRACT
> :
> :Remove C<?{ code }>, C<??{ code }> and friends.
>
> Whoops, I missed this bit - what 'friends' do you mean?

Whatever even more bizarre extensions people will have suggested by now...

-- 
DEC diagnostics would run on a dead whale.
-- Mel Ferentz



Re: RFC 308 (v1) Ban Perl hooks into regexes

2000-09-25 Thread Simon Cozens

On Mon, Sep 25, 2000 at 08:56:47PM +, Mark-Jason Dominus wrote:
> I think the proposal that Joe McMahon and I are finishing up now will
> make these obsolete anyway.

Good! The less I have to maintain the better...

-- 
Keep the number of passes in a compiler to a minimum.
-- D. Gries



Re: RFC 308 (v1) Ban Perl hooks into regexes

2000-09-25 Thread Simon Cozens

On Mon, Sep 25, 2000 at 04:55:18PM -0400, Michael Maraist wrote:
> A lot of what is trying to happen in (?{..}) and friends is parsing.

That's not the problem that I'm trying to solve. The problem I'm trying
to solve is interdependence. Parsing is neither here nor there.
 
-- 
Intel engineering seem to have misheard Intel marketing strategy. The phrase
was "Divide and conquer" not "Divide and cock up"
(By [EMAIL PROTECTED], Alan Cox)



Re: RFC 308 (v1) Ban Perl hooks into regexes

2000-09-25 Thread Mark-Jason Dominus


> On Mon, Sep 25, 2000 at 08:56:47PM +, Mark-Jason Dominus wrote:
> > I think the proposal that Joe McMahon and I are finishing up now will
> > make these obsolete anyway.
>
> Good! The less I have to maintain the better...

Sorry, I meant that it would make (??...) and (?{...}) obsolete, not
that it will make your RFC obsolete.  Our proposal is agnostic about
whether (??...) and (?{...}) should be eliminated.




Re: RFC 308 (v1) Ban Perl hooks into regexes

2000-09-25 Thread Michael Maraist

From: "Hugo" [EMAIL PROTECTED]



> :Remove C<?{ code }>, C<??{ code }> and friends.
>
> Whoops, I missed this bit - what 'friends' do you mean?

Going by the topic, I would assume it involves (?(cond) true-exp |
false-exp).
There's also the $^R or what-ever it was that is the result of (?{ }).
Basically the code-like operations found in perl 5.005 and 5.6's perlre.

-Michael




Re: RFC 308 (v1) Ban Perl hooks into regexes

2000-09-25 Thread Michael Maraist

From: "Simon Cozens" [EMAIL PROTECTED]

> > A lot of what is trying to happen in (?{..}) and friends is parsing.
>
> That's not the problem that I'm trying to solve. The problem I'm trying
> to solve is interdependence. Parsing is neither here nor there.

Well, I recognize that your focus was not on parsing.  However, I don't feel
that perl-abstractness is a key deliverable of perl.  My comment was
primarily on how the world might be a better place with reg-ex's not getting
into algorithms that are better solved elsewhere.  I just thought it might
help your cause if you expanded your rationale.

-Michael