RFC 308 (v1) Ban Perl hooks into regexes
This and other RFCs are available on the web at
  http://dev.perl.org/rfc/

=head1 TITLE

Ban Perl hooks into regexes

=head1 VERSION

  Maintainer: Simon Cozens [EMAIL PROTECTED]
  Date: 25 Sep 2000
  Mailing List: [EMAIL PROTECTED]
  Number: 308
  Version: 1
  Status: Developing

=head1 ABSTRACT

Remove C<(?{ code })>, C<(??{ code })> and friends.

=head1 DESCRIPTION

The regular expression engine may well be rewritten from scratch or
borrowed from somewhere else. One of the scarier things we've seen
recently is that Perl's engine casts back its Kraken tentacles into Perl
and executes Perl code. This is spooky, tangled, and incestuous.
(Although admittedly fun.)

It would be preferable to keep the regular expression engine as
self-contained as possible, if nothing else to enable it to be used
either outside Perl or inside standalone translated Perl programs
without a Perl runtime.

To do this, we'll have to remove the bits of the engine that call
Perl code. In short: C<(?{ code })> and C<(??{ code })> must die.

=head1 IMPLEMENTATION

It's more of an unimplementation really.

=head1 REFERENCES

None.
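For readers who have not met the construct the RFC wants removed, here is a minimal Perl 5 sketch of what an embedded code block does. The variable name `$seen` and the test string are illustrative assumptions, not part of the RFC.

```perl
use strict;
use warnings;

# A package variable sidesteps the closure quirks that older perls had
# with lexicals inside (?{ ... }) blocks.
our $seen = 0;

# The regex engine calls back into Perl each time it passes the code
# block, which here sits right after every 'a' it consumes.
my $matched = "aaa" =~ /^(?:a(?{ $seen++ }))*$/;

print "matched=", ($matched ? 1 : 0), " callbacks=$seen\n";
```

This callback from the matcher into the interpreter is the interdependence the RFC objects to: a regex engine lifted out of perl would have nothing to run the block with.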
Re: RFC 308 (v1) Ban Perl hooks into regexes
> Ban Perl hooks into regexes
>
> =head1 ABSTRACT
>
> Remove C<(?{ code })>, C<(??{ code })> and friends.

At first, I thought you were crazy, then I read

> It would be preferable to keep the regular expression engine as
> self-contained as possible, if nothing else to enable it to be used
> either outside Perl or inside standalone translated Perl programs
> without a Perl runtime.

Which makes a lot of sense in the development field.

Tom has mentioned that the reg-ex engine is getting really out of hand; it's hard enough to document clearly, much less be understandable to the maintainer (or even the debugger).

A lot of what is trying to happen in (?{..}) and friends is parsing. To quote Star Trek: The Undiscovered Country, "Just because we can do a thing, doesn't mean we should." Tom and I have commented that parsing should be done in a PARSER, not a lexer (like our beloved reg-ex engine). RecDescent and Yacc do a wonderful job of providing parsing power within perl.

I'd suggest you modify your RFC to summarize the above: that (?{}) and friends are parsers, and we already have RecDescent etc., which are much easier to understand and don't require too much additional overhead.

Other than the inherent coolness of having hooks into the reg-ex code, I don't really see much real use for them other than debugging, e.g. (?{ print "Still here\n" }). I could go either way on the topic, but I'm definitely of the opinion that we shouldn't continue down this dark path any further.

-Michael
Re: RFC 308 (v1) Ban Perl hooks into regexes
I think the proposal that Joe McMahon and I are finishing up now will make these obsolete anyway.
Re: RFC 308 (v1) Ban Perl hooks into regexes
In [EMAIL PROTECTED], Perl6 RFC Librarian writes:
:It would be preferable to keep the regular expression engine as
:self-contained as possible, if nothing else to enable it to be used
:either outside Perl or inside standalone translated Perl programs
:without a Perl runtime.
:
:To do this, we'll have to remove the bits of the engine that call
:Perl code. In short: C<(?{ code })> and C<(??{ code })> must die.

I would have thought it more reasonable, if you wish to create standalone translated Perl programs without a Perl runtime, to fail with a helpful error if you encounter a construct that won't permit it. You'll need to remove chunks of eval() and do() as well, otherwise, and probably more besides.

In the context of a more shareable regexp engine, I would like to see (?{ and (??{ stay, but they need to be implemented more cleanly. You could handle them quite nicely, I think, with just three well-defined external hooks: one to find the matching brace at the end of the code, one to parse the code, and one to run the code. Anyone wishing to re-use the regexp library could then choose either to keep the default drop-in replacements for those hooks (that die) or provide their own equivalents to the perl usage.

I consider recursive regexps very useful:

  $a = qr{ (? [^()]+ ) | \( (??{ $a }) \) };

.. and I class re-eval in general in the arena of 'making hard things possible'. But whether or not they stay, it would probably also be useful to have a more direct way of expressing simple recursive regexps such as the above without resorting to a costly eval. When I've tried to come up with an appropriate restriction, however, I find it very difficult to pick a dividing line.

Hugo
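The recursive one-liner in the message above seems to have lost some characters in archiving, so the following is only a sketch of the same idea in Perl 5, using the balanced-parentheses form documented in perlre. The name `$group` and the sample strings are assumptions made for illustration.

```perl
use strict;
use warnings;
use re 'eval';   # permits code blocks in patterns built by run-time interpolation

# Package variable so the deferred block can see the finished pattern.
our $group;
$group = qr{
    \(
    (?:
        (?> [^()]+ )        # a run of non-parens, matched without backtracking
      |
        (??{ $group })      # nested group: re-enter the same pattern at match time
    )*
    \)
}x;

for my $s ('(a(b)c)', '(a(b c)') {
    printf "%-10s %s\n", $s, ($s =~ /^$group$/ ? 'balanced' : 'unbalanced');
}
```

The `(??{ ... })` block is evaluated only when the matcher reaches it, which is why the pattern can refer to itself. Perl 5.10 and later also offer `(?R)` and `(?1)`, which express this kind of recursion without evaluating Perl code.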
Re: RFC 308 (v1) Ban Perl hooks into regexes
In [EMAIL PROTECTED], Perl6 RFC Librarian writes:
:=head1 ABSTRACT
:
:Remove C<(?{ code })>, C<(??{ code })> and friends.

Whoops, I missed this bit - what 'friends' do you mean?

Hugo
Re: RFC 308 (v1) Ban Perl hooks into regexes
On Mon, Sep 25, 2000 at 11:31:08PM +0100, Hugo wrote:
> In [EMAIL PROTECTED], Perl6 RFC Librarian writes:
> :=head1 ABSTRACT
> :
> :Remove C<(?{ code })>, C<(??{ code })> and friends.
>
> Whoops, I missed this bit - what 'friends' do you mean?

Whatever even more bizarre extensions people will have suggested by now...

-- 
DEC diagnostics would run on a dead whale.
    -- Mel Ferentz
Re: RFC 308 (v1) Ban Perl hooks into regexes
On Mon, Sep 25, 2000 at 08:56:47PM +, Mark-Jason Dominus wrote:
> I think the proposal that Joe McMahon and I are finishing up now will
> make these obsolete anyway.

Good! The less I have to maintain the better...

-- 
Keep the number of passes in a compiler to a minimum.
    -- D. Gries
Re: RFC 308 (v1) Ban Perl hooks into regexes
On Mon, Sep 25, 2000 at 04:55:18PM -0400, Michael Maraist wrote:
> A lot of what is trying to happen in (?{..}) and friends is parsing.

That's not the problem that I'm trying to solve. The problem I'm trying to solve is interdependence. Parsing is neither here nor there.

-- 
Intel engineering seem to have misheard Intel marketing strategy. The phrase
was "Divide and conquer" not "Divide and cock up"
    (By [EMAIL PROTECTED], Alan Cox)
Re: RFC 308 (v1) Ban Perl hooks into regexes
> On Mon, Sep 25, 2000 at 08:56:47PM +, Mark-Jason Dominus wrote:
> > I think the proposal that Joe McMahon and I are finishing up now will
> > make these obsolete anyway.
>
> Good! The less I have to maintain the better...

Sorry, I meant that it would make (??...) and (?{...}) obsolete, not that it will make your RFC obsolete. Our proposal is agnostic about whether (??...) and (?{...}) should be eliminated.
RFC 317 (v1) Access to optimisation information for regular expressions
This and other RFCs are available on the web at
  http://dev.perl.org/rfc/

=head1 TITLE

Access to optimisation information for regular expressions

=head1 VERSION

  Maintainer: Hugo van der Sanden ([EMAIL PROTECTED])
  Date: 25 September 2000
  Mailing List: [EMAIL PROTECTED]
  Number: 317
  Version: 1
  Status: Developing

=head1 ABSTRACT

Currently you can see optimisation information for a regexp only
by running with -Dr in a debugging perl and looking at STDERR.
There should be an interface that allows us to read this information
programmatically and possibly to alter it.

=head1 DESCRIPTION

At its core, the regular expression matcher knows how to check
whether a pattern matches a string starting at a particular location.
When the regular expression is compiled, perl may also look for
optimisation information that can be used to rule out some or all
of the possible starting locations in advance.

Currently you can find out about the optimisation information
captured for a particular regexp only in a perl built with
DEBUGGING, by turning on -Dr:

  % perl -Dr -e 'qr{test.*pattern}'
  Compiling REx `test.*pattern'
  size 8 first at 1
  rarest char p at 0
  rarest char s at 2
     1: EXACT <test>(3)
     3: STAR(5)
     4:   REG_ANY(0)
     5: EXACT <pattern>(8)
     8: END(0)
  anchored `test' at 0 floating `pattern' at 4..2147483647 (checking floating) minlen 11
  Omitting $` $& $' support.

  EXECUTING...

  Freeing REx: `test.*pattern'
  %

For some purposes it would help to be able to get at this information
programmatically: the test suite could take advantage of this (to
test that optimisations occur as expected), and it could also be
useful for enhanced development tools, such as a graphical regexp
debugger.

Additionally there are times that the programmer is able to supply
optimisation that the regexp engine cannot discover for itself.
While we could consider making it possible to modify these values,
it is important to remember that these are only hints: the regexp
engine is free to ignore them. So there is a danger that people
will misuse writable optimisation information to move part of
the logic out of the regexp, and then blame us when it breaks.

Suggested example usage:

  % perl -wl
  use re;
  $a = qr{test.*pattern};
  print join ':', $a->fixed_string, $a->floating_string, $a->minlen;
  __END__
  test:pattern:11
  %

.. but perhaps a single new method returning a hashref would be
cleaner and more extensible:

  $opt = $a->optimisation;
  print join ':', @$opt{qw/ fixed_string floating_string minlen /};

=head1 IMPLEMENTATION

Straightforward: add interface functions within the perl core to
give access to read and/or write the optimisation values; add methods
in re.pm that use XS code to reach the internal functions.

=head1 REFERENCES

Prompted by discussion of RFC 72:

  RFC 72: Variable-length lookbehind: the regexp engine should also go backward.
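Part of what this RFC asks for can be tried in later perls. The `re` module bundled with perl 5.10 and newer exports `regmust`, which returns the anchored and floating fixed strings the optimiser found. This is not the method interface proposed above: it is read-only and does not report `minlen`. The variable names below are illustrative.

```perl
use strict;
use warnings;
use re qw(regmust);

my $re = qr{test.*pattern};

# regmust returns (anchored string, floating string); either may be undef
# when the optimiser found nothing of that kind.
my ($anchored, $floating) = regmust($re);

print join(':', map { defined $_ ? $_ : 'undef' } $anchored, $floating), "\n";
```

A test suite could use a call like this to check that an expected optimisation happened, which is one of the uses the RFC has in mind.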
Re: RFC 316 (v1) Regex modifier for support of chunk processing and prefix matching
Wouldn't this interact rather badly with the /gc option (which also leaves C<pos> set on failure)?

This question arose because I was trying to work out how one would write a lexer with the new /z option, and it made my head ache ;-)

> As you can see from the example code, the program flow stays very close
> to what people would ordinarily program under normal circumstances.
>
> By contrast, RFC 93 proposes another solution to the same problem, but
> using callbacks. Since the same sub must do one of several things, the
> first thing that needs to be done is to channel different kinds of
> requests to their own handler. As a result, you need a complete rewrite
> from what you'd use in the ordinary case.
>
> I think that a lot of people will find my approach far less
> intimidating.

I'm not sure I see that this:

  my $chunksize = 1024;
  while(read FH, my $buffer, $chunksize) {
      while(/(abcd|bc)/gz) {
          # do something boring with the matched string:
          print "$1\n";
      }
      if(defined pos) {   # end-of-buffer exception
          # append the next chunk to the current one
          read FH, $buffer, $chunksize, length $buffer;
          # retry matching
          redo;
      }
  }

is less intimidating or closer to the "ordinary program flow" than:

  \*FH =~ /(abcd|bc)/g;

(as proposed in RFC 93).

> =head2 Match prefix
>
> It can be useful to be able to recognize if a string could possibly be a
> prefix for a potential match. For example in an interactive program,
> you want to allow a user to enter a number into an input field, but
> nothing else. After every single keystroke, you can test what he just
> entered against a regex matching the valid format for a number, so that
> C<1234E> can be recognized as a prefix for the regex
>
>   /^\d+\.?\d*(?:E[+-]?\d+)$/

Isn't this just:

  \*STDIN =~ /^\d+\.?\d*(?:E[+-]?\d+)$/ or die "Not a number";

???

Damian
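To make the "match prefix" idea concrete without relying on the proposed /z modifier or on RFC 93, here is a sketch in today's Perl. It pairs the number pattern quoted above with a second pattern that accepts every prefix of a valid number. The prefix pattern is written by hand for this example and is an assumption: neither proposal derives it this way.

```perl
use strict;
use warnings;

# Full pattern, as quoted in the RFC text above.
my $full = qr/^\d+\.?\d*(?:E[+-]?\d+)$/;

# Hand-written companion: each later part of the number becomes optional,
# so any leading portion of a valid number is accepted.
my $prefix = qr/^(?:\d+(?:\.?\d*(?:E[+-]?\d*)?)?)?$/;

for my $typed ('1234E', '12.5E+3', '12x') {
    my $state = $typed =~ $full   ? 'complete'
              : $typed =~ $prefix ? 'valid prefix'
              :                     'rejected';
    print "$typed: $state\n";
}
```

Keeping two patterns in step by hand is error-prone, which is the gap the prefix-matching proposal aims to close.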
Re: RFC 308 (v1) Ban Perl hooks into regexes
From: "Hugo" [EMAIL PROTECTED]
> :Remove C<(?{ code })>, C<(??{ code })> and friends.
>
> Whoops, I missed this bit - what 'friends' do you mean?

Going by the topic, I would assume it involves (?(cond) true-exp | false-exp). There's also $^R, or whatever it was, which holds the result of (?{ }). Basically, the code-like operations found in perl 5.005 and 5.6's perlre.

-Michael
Re: RFC 308 (v1) Ban Perl hooks into regexes
From: "Simon Cozens" [EMAIL PROTECTED]
> > A lot of what is trying to happen in (?{..}) and friends is parsing.
>
> That's not the problem that I'm trying to solve. The problem I'm trying
> to solve is interdependence. Parsing is neither here nor there.

Well, I recognize that your focus was not on parsing. However, I don't feel that perl-abstractness is a key deliverable of perl. My comment was primarily about how the world might be a better place if reg-exes stayed out of algorithms that are better solved elsewhere. I just thought it might help your cause if you expanded your rationale.

-Michael