RFC 308 (v1) Ban Perl hooks into regexes

2000-09-25 Thread Perl6 RFC Librarian

This and other RFCs are available on the web at
  http://dev.perl.org/rfc/

=head1 TITLE

Ban Perl hooks into regexes

=head1 VERSION

  Maintainer: Simon Cozens [EMAIL PROTECTED]
  Date: 25 Sep 2000 
  Mailing List: [EMAIL PROTECTED]
  Number: 308
  Version: 1
  Status: Developing

=head1 ABSTRACT

Remove C<?{ code }>, C<??{ code }> and friends.

=head1 DESCRIPTION

The regular expression engine may well be rewritten from scratch or
borrowed from somewhere else. One of the scarier things we've seen
recently is that Perl's engine casts back its Kraken tentacles into Perl
and executes Perl code. This is spooky, tangled, and incestuous.
(Although admittedly fun.)

It would be preferable to keep the regular expression engine as
self-contained as possible, if nothing else to enable it to be used
either outside Perl or inside standalone translated Perl programs
without a Perl runtime.

To do this, we'll have to remove the bits of the engine that call
Perl code. In short: C<?{ code }> and C<??{ code }> must die.
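
For readers unfamiliar with the two constructs, here is a minimal sketch of what they do in Perl 5. The variable names and sample strings are illustrative assumptions, not part of the RFC.

```perl
use strict;
use warnings;

# (?{ code }) runs Perl code when the engine reaches that point in the
# pattern; here it records the match position just after "foo".
my $seen;
my $matched = 'foobar' =~ / foo (?{ $seen = pos() }) bar /x;

# (??{ code }) runs Perl code and uses its result as a sub-pattern:
# one digit, then exactly that many word characters.
my $counted = qr/ \A (\d) ( (??{ "\\w{$1}" }) ) \z /x;
my ($n, $word) = '3abc' =~ $counted;

print "position after foo: $seen\n";
print "digit $n, word $word\n";
print "3ab matches: ", ('3ab' =~ $counted ? 'yes' : 'no'), "\n";
```

Both forms require the regex engine to call back into the Perl interpreter in the middle of a match, which is the interdependence this RFC objects to.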

=head1 IMPLEMENTATION

It's more of an unimplementation really.

=head1 REFERENCES

None.




Re: RFC 308 (v1) Ban Perl hooks into regexes

2000-09-25 Thread Michael Maraist

> Ban Perl hooks into regexes
>
> =head1 ABSTRACT
>
> Remove C<?{ code }>, C<??{ code }> and friends.


At first, I thought you were crazy, then I read

> It would be preferable to keep the regular expression engine as
> self-contained as possible, if nothing else to enable it to be used
> either outside Perl or inside standalone translated Perl programs
> without a Perl runtime.

Which makes a lot of sense in the development field.

Tom has mentioned that the reg-ex engine is getting really out of hand;
it's hard enough to document clearly, much less be understandable to the
maintainer (or even the debugger).

A lot of what is trying to happen in (?{..}) and friends is parsing.  To
quote Star Trek Undiscovered Country, "Just because we can do a thing,
doesn't mean we should."  Tom and I have commented that parsing should be
done in a PARSER, not a lexer (like our beloved reg-ex engine).  RecDescent
and Yacc do a wonderful job of providing parsing power within perl.

I'd suggest you modify your RFC to summarize the above; that (?{}) and
friends are parsers, and we already have RecDescent / etc. which are much
easier to understand, and don't require too much additional overhead.

Other than the inherent coolness of having hooks into the reg-ex code, I
don't really see much real use for it other than debugging; e.g. (?{ print
"Still here\n" }).  I could go either way on the topic, but I'm definitely
of the opinion that we shouldn't continue down this dark path any further.


-Michael




Re: RFC 308 (v1) Ban Perl hooks into regexes

2000-09-25 Thread Mark-Jason Dominus


I think the proposal that Joe McMahon and I are finishing up now will
make these obsolete anyway.




Re: RFC 308 (v1) Ban Perl hooks into regexes

2000-09-25 Thread Hugo

In [EMAIL PROTECTED], Perl6 RFC Librarian writes:
:It would be preferable to keep the regular expression engine as
:self-contained as possible, if nothing else to enable it to be used
:either outside Perl or inside standalone translated Perl programs
:without a Perl runtime.
:
:To do this, we'll have to remove the bits of the engine that call
:Perl code. In short: C<?{ code }> and C<??{ code }> must die.

I would have thought it more reasonable, if you wish to create
standalone translated Perl programs without a Perl runtime, to fail
with a helpful error if you encounter a construct that won't permit
it. You'll need to remove chunks of eval() and do() as well,
otherwise, and probably more besides.

In the context of a more shareable regexp engine, I would like to
see (?{...}) and (??{...}) stay, but they need to be implemented more cleanly.
You could handle them quite nicely, I think, with just three
well-defined external hooks: one to find the matching brace at the
end of the code, one to parse the code, and one to run the code.
Anyone wishing to re-use the regexp library could then choose either
to keep the default drop-in replacements for those hooks (that die)
or provide their own equivalents to the perl usage.
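
A rough sketch of the three-hook idea, written in Perl only for illustration. Everything here is a hypothetical stand-in (the hook names `find_end`, `parse` and `run`, the `split_pattern` helper, and the naive brace search); a real engine would define these hooks at the C level.

```perl
use strict;
use warnings;

# Default hooks: a host that embeds the engine without Perl simply dies.
my %DEFAULT_HOOKS = map {
    my $name = $_;
    ($name => sub { die "embedded code is not supported here ($name hook)\n" });
} qw(find_end parse run);

# Hypothetical front end: splits a pattern into literal text and code
# blocks, delegating everything code-related to the hooks.
sub split_pattern {
    my ($pattern, %hooks) = @_;
    %hooks = (%DEFAULT_HOOKS, %hooks);
    my @parts;
    my $pos = 0;
    while ((my $start = index($pattern, '(?{', $pos)) >= 0) {
        push @parts, [text => substr($pattern, $pos, $start - $pos)];
        my $body = $start + 3;
        # Hook 1: find the brace that closes the code.
        my $end = $hooks{find_end}->($pattern, $body);
        die "unterminated code block\n" if $end < 0;
        # Hook 2: parse the code into something runnable.
        push @parts, [code => $hooks{parse}->(substr($pattern, $body, $end - $body))];
        $pos = $end + 2;    # step over the closing "})"
    }
    push @parts, [text => substr($pattern, $pos)];
    return \@parts;
}

# Perl-flavoured replacements for the hooks.
my %perl_hooks = (
    find_end => sub { my ($p, $from) = @_; index($p, '})', $from) },  # naive: no nested braces
    parse    => sub { my ($src) = @_; eval "sub { $src }" or die $@ },
    run      => sub { my ($code) = @_; $code->() },                   # hook 3: run the code
);

my $parts = split_pattern('ab(?{ 6 * 7 })cd', %perl_hooks);
print "code block returned: ", $perl_hooks{run}->($parts->[1][1]), "\n";
```

Calling `split_pattern` without supplying hooks falls through to the defaults and dies on the first code block, which is the behaviour a non-Perl host would get.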

I consider recursive regexps very useful:

 $a = qr{ (?> [^()]+ ) | \( (??{ $a }) \) }x;

.. and I class re-eval in general in the arena of 'making hard
things possible'. But whether or not they stay, it would probably
also be useful to have a more direct way of expressing simple
recursive regexps such as the above without resorting to a costly
eval. When I've tried to come up with an appropriate restriction,
however, I find it very difficult to pick a dividing line.
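
As a point of comparison, later Perl 5 releases (5.10 onward, as far as I know) added group recursion such as `(?1)`, which expresses this kind of simple recursion without embedded code. The sketch below puts the two forms side by side; the variable names and test strings are assumptions made for illustration.

```perl
use strict;
use warnings;

# Deferred sub-pattern: (??{ $nested }) re-enters the same regex by
# evaluating Perl code at match time.
my $nested;
$nested = qr{ (?> [^()]+ ) | \( (??{ $nested }) \) }x;

# Same language without embedded code: (?1) recurses into capture group 1.
my $recursive = qr{ ( (?> [^()]+ ) | \( (?1) \) ) }x;

my %result;
for my $s ('abc', '((abc))', '(abc', 'a(b)') {
    my $via_eval  = $s =~ m{ \A $nested    \z }x ? 1 : 0;
    my $via_group = $s =~ m{ \A $recursive \z }x ? 1 : 0;
    $result{$s} = [$via_eval, $via_group];
    print "$s: eval=$via_eval group=$via_group\n";
}
```

The two patterns should accept and reject the same strings; only the first needs the engine to call back into Perl.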

Hugo



Re: RFC 308 (v1) Ban Perl hooks into regexes

2000-09-25 Thread Hugo

In [EMAIL PROTECTED], Perl6 RFC Librarian writes:
:=head1 ABSTRACT
:
:Remove C<?{ code }>, C<??{ code }> and friends.

Whoops, I missed this bit - what 'friends' do you mean?

Hugo



Re: RFC 308 (v1) Ban Perl hooks into regexes

2000-09-25 Thread Simon Cozens

On Mon, Sep 25, 2000 at 11:31:08PM +0100, Hugo wrote:
> In [EMAIL PROTECTED], Perl6 RFC Librarian writes:
> :=head1 ABSTRACT
> :
> :Remove C<?{ code }>, C<??{ code }> and friends.
>
> Whoops, I missed this bit - what 'friends' do you mean?

Whatever even more bizarre extensions people will have suggested by now...

-- 
DEC diagnostics would run on a dead whale.
-- Mel Ferentz



Re: RFC 308 (v1) Ban Perl hooks into regexes

2000-09-25 Thread Simon Cozens

On Mon, Sep 25, 2000 at 08:56:47PM +, Mark-Jason Dominus wrote:
> I think the proposal that Joe McMahon and I are finishing up now will
> make these obsolete anyway.

Good! The less I have to maintain the better...

-- 
Keep the number of passes in a compiler to a minimum.
-- D. Gries



Re: RFC 308 (v1) Ban Perl hooks into regexes

2000-09-25 Thread Simon Cozens

On Mon, Sep 25, 2000 at 04:55:18PM -0400, Michael Maraist wrote:
> A lot of what is trying to happen in (?{..}) and friends is parsing.

That's not the problem that I'm trying to solve. The problem I'm trying
to solve is interdependence. Parsing is neither here nor there.
 
-- 
Intel engineering seem to have misheard Intel marketing strategy. The phrase
was "Divide and conquer" not "Divide and cock up"
(By [EMAIL PROTECTED], Alan Cox)



Re: RFC 308 (v1) Ban Perl hooks into regexes

2000-09-25 Thread Mark-Jason Dominus


> On Mon, Sep 25, 2000 at 08:56:47PM +, Mark-Jason Dominus wrote:
> > I think the proposal that Joe McMahon and I are finishing up now will
> > make these obsolete anyway.
>
> Good! The less I have to maintain the better...

Sorry, I meant that it would make (??...) and (?{...}) obsolete, not
that it will make your RFC obsolete.  Our proposal is agnostic about
whether (??...) and (?{...}) should be eliminated.




RFC 317 (v1) Access to optimisation information for regular expressions

2000-09-25 Thread Perl6 RFC Librarian

This and other RFCs are available on the web at
  http://dev.perl.org/rfc/

=head1 TITLE

Access to optimisation information for regular expressions

=head1 VERSION

  Maintainer: Hugo van der Sanden ([EMAIL PROTECTED])
  Date: 25 September 2000
  Mailing List: [EMAIL PROTECTED]
  Number: 317
  Version: 1
  Status: Developing

=head1 ABSTRACT

Currently you can see optimisation information for a regexp only
by running with -Dr in a debugging perl and looking at STDERR.
There should be an interface that allows us to read this information
programmatically and possibly to alter it.

=head1 DESCRIPTION

At its core, the regular expression matcher knows how to check
whether a pattern matches a string starting at a particular location.
When the regular expression is compiled, perl may also look for
optimisation information that can be used to rule out some or all
of the possible starting locations in advance.

Currently you can find out about the optimisation information
captured for a particular regexp only in a perl built with
DEBUGGING, by turning on -Dr:

  % perl -Dr -e 'qr{test.*pattern}'
  Compiling REx `test.*pattern'
  size 8 first at 1
  rarest char p at 0
  rarest char s at 2
 1: EXACT test(3)
 3: STAR(5)
 4:   REG_ANY(0)
 5: EXACT pattern(8)
 8: END(0)
  anchored `test' at 0 floating `pattern' at 4..2147483647 (checking floating) minlen 11
  Omitting $` $& $' support.
  
  EXECUTING...
  
  Freeing REx: `test.*pattern'
  %

For some purposes it would help to be able to get at this information
programmatically: the test suite could take advantage of this (to test
that optimisations occur as expected), and it could also be useful for
enhanced development tools, such as a graphical regexp debugger.

Additionally there are times that the programmer is able to supply
optimisation that the regexp engine cannot discover for itself. While
we could consider making it possible to modify these values, it is
important to remember that these are only hints: the regexp engine
is free to ignore them. So there is a danger that people will misuse
writable optimisation information to move part of the logic out of
the regexp, and then blame us when it breaks.

Suggested example usage:

  % perl -wl
  use re;
  $a = qr{test.*pattern};
  print join ':', $a->fixed_string, $a->floating_string, $a->minlen;
  __END__
  test:pattern:11
  %

.. but perhaps a single new method returning a hashref would be
cleaner and more extensible:

  $opt = $a->optimisation;
  print join ':', @$opt{qw/ fixed_string floating_string minlen /};
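
For comparison, and not part of this proposal: later releases of Perl 5 (5.10 onward, to the best of my knowledge) expose a read-only slice of this information through `regmust` in the `re` module, which returns the anchored and floating fixed strings chosen by the optimiser. A minimal sketch, assuming such a perl is available:

```perl
use strict;
use warnings;
use re qw(regmust);

my $re = qr{test.*pattern};

# regmust returns the longest anchored and floating fixed strings,
# either of which may be undef if the optimiser found none.
my ($anchored, $floating) = regmust($re);

print join(':', map { defined $_ ? $_ : '(none)' } $anchored, $floating), "\n";
```

This covers only the two fixed strings; it offers no counterpart to the proposed `minlen` accessor and no way to write the values back.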

=head1 IMPLEMENTATION

Straightforward: add interface functions within the perl core to give
access to read and/or write the optimisation values; add methods in
re.pm that use XS code to reach the internal functions.

=head1 REFERENCES

Prompted by discussion of RFC 72:

RFC 72: Variable-length lookbehind: the regexp engine should also go backward.




Re: RFC 316 (v1) Regex modifier for support of chunk processing and prefix matching

2000-09-25 Thread Damian Conway


Wouldn't this interact rather badly with the /gc option (which also leaves
C<pos> set on failure)?

This question arose because I was trying to work out how one would write a
lexer with the new /z option, and it made my head ache ;-)


As you can see from the example code, the program flow stays very close 
to what people would ordinarily program under normal circumstances.

By contrast, RFC 93 proposes another solution to the same problem, but 
using callbacks. Since the same sub must do one of several things, the 
first thing that needs to be done is to channel different kinds of 
requests to their own handler. As a result, you need a complete rewrite 
from what you'd use in the ordinary case.

I think that a lot of people will find my approach far less
intimidating.


I'm not sure I see that this:
   
    my $chunksize = 1024;
    while(read FH, my $buffer, $chunksize) {
        while(/(abcd|bc)/gz) {
            # do something boring with the matched string:
            print "$1\n";
        }
        if(defined pos) {  # end-of-buffer exception
            # append the next chunk to the current one
            read FH, $buffer, $chunksize, length $buffer;
            # retry matching
            redo;
        }
    }

is less intimidating or closer to the "ordinary program flow"  than:

    \*FH =~ /(abcd|bc)/g;

(as proposed in RFC 93).

  
=head2 Match prefix

It can be useful to be able to recognize if a string could possibly be a
prefix for a potential match. For example in an interactive program, 
you want to allow a user to enter a number into an input field, but 
nothing else. After every single keystroke, you can test what he just 
entered against a regex matching the valid format for a number, so that 
C<1234E> can be recognized as a prefix for the regex

/^\d+\.?\d*(?:E[+-]?\d+)$/

Isn't this just:

\*STDIN =~ /^\d+\.?\d*(?:E[+-]?\d+)$/
or die "Not a number";

???

Damian



Re: RFC 308 (v1) Ban Perl hooks into regexes

2000-09-25 Thread Michael Maraist

From: "Hugo" [EMAIL PROTECTED]



> :Remove C<?{ code }>, C<??{ code }> and friends.
>
> Whoops, I missed this bit - what 'friends' do you mean?

Going by the topic, I would assume it involves (?(cond) true-exp |
false-exp).
There's also $^R, or whatever it was, that holds the result of (?{ }).
Basically the code-like operations found in perl 5.005 and 5.6's perlre.

-Michael




Re: RFC 308 (v1) Ban Perl hooks into regexes

2000-09-25 Thread Michael Maraist

From: "Simon Cozens" [EMAIL PROTECTED]

> > A lot of what is trying to happen in (?{..}) and friends is parsing.
>
> That's not the problem that I'm trying to solve. The problem I'm trying
> to solve is interdependence. Parsing is neither here nor there.

Well, I recognize that your focus was not on parsing.  However, I don't feel
that perl-abstractness is a key deliverable of perl.  My comment was
primarily on how the world might be a better place with reg-ex's not getting
into algorithms that are better solved elsewhere.  I just thought it might
help your cause if you expanded your rationale.

-Michael