Re: RFC 316 (v1) Regex modifier for support of chunk processing and prefix matching
On Tue, 26 Sep 2000 11:55:32 +1100 (EST), Damian Conway wrote:

> Wouldn't this interact rather badly with the /gc option (which also
> leaves C<pos> set on failure)?

Yes. The easy way out is to disallow combining /gc with /z. But since this is typically one of the applications it is aimed at, I should find a solution. A different interface is one option.

> This question arose because I was trying to work out how one would
> write a lexer with the new /z option, and it made my head ache ;-)

Heheh. Your turn. ;-)

> I'm not sure I see that this:
>
>     ...
>
> is less intimidating or closer to the "ordinary program flow" than:
>
>     \*FH =~ /(abcd|bc)/g;
>
> (as proposed in RFC 93).

Was that what was proposed? I think not. It was:

    sub { ... } =~ /(abcd|bc)/g;

But I kinda like that syntax. In practice, though, it looks too much like black magic:

* Where is the string stored? It looks like it disappears into thin air.
* What about pushback? Your proposal depends on it, but standard filehandles don't support it, IMO. Does this require a TIEHANDLE implementation?
* Your regex shouldn't consume any more characters from the filehandle than it matches? Where are the remaining characters pushed back into?

>> After every single keystroke, you can test what he just entered
>> against a regex matching the valid format for a number, so that
>> C<1234E> can be recognized as a prefix for the regex
>>
>>     /^\d+\.?\d*(?:E[+-]?\d+)$/
>
> Isn't this just:
>
>     \*STDIN =~ /^\d+\.?\d*(?:E[+-]?\d+)$/ or die "Not a number";
>
> ???

No. First of all, you can't override the behaviour of STDIN. That reads a whole line, then checks it, and then your script dies if it's not right. I want a test on every single keystroke, to see if it's in sync with the regex, and if it's not, reject it, i.e. no insertion in the input buffer, and no echo on screen.

Besides, you can't be sure your data comes from a filehandle (or compatible handle). Not in a GUI.

-- 
Bart.
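For anyone who wants to try the per-keystroke idea in a Perl that has no /z, here is a minimal sketch. It assumes the prefix pattern is derived by hand from the number regex in this thread, which is exactly the duplication /z is meant to remove. The name filter_keystrokes and the simulated key sequence are hypothetical, made up for illustration only.

```perl
use strict;
use warnings;

# The number pattern quoted in the thread.
my $full = qr/^\d+\.?\d*(?:E[+-]?\d+)$/;

# ASSUMPTION: a hand-written pattern accepting every prefix of a string
# that $full could match (including the empty string). \z is used so a
# trailing newline is not silently accepted.
my $prefix = qr/^(?:\d+\.?\d*(?:E[+-]?\d*)?)?\z/;

# Hypothetical helper: feed keystrokes one at a time and drop any
# keystroke that would stop the buffer from being a valid prefix.
sub filter_keystrokes {
    my @keys   = @_;
    my $buffer = '';
    for my $key (@keys) {
        my $candidate = $buffer . $key;
        # Accept the key only if the result can still grow into a match.
        $buffer = $candidate if $candidate =~ $prefix;
    }
    return $buffer;
}

# Simulated typing with some stray keys in between.
my $typed = filter_keystrokes(split //, '12x34.5E+-2a1');
print "buffer: $typed\n";
print $typed =~ $full ? "complete number\n" : "prefix only so far\n";
```

The stray "x", the second sign, and the "a" never reach the buffer, so this sample ends with 1234.5E+21 in it. No filehandle is involved, which is the point about GUI input made above.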
Re: RFC 316 (v1) Regex modifier for support of chunk processing and prefix matching
On Fri, 29 Sep 2000 13:19:47 +0100, Hugo wrote:

> I think that involves rewriting your /p example something like:
>
>     if (/^$pat$/z) {
>         print "found a complete match";
>     } elsif (defined pos) {
>         print "found a prefix match";
>     } else {
>         print "not a match";
>     }

Except that this isn't exactly what would happen. Look, "1234E+2" is a complete string matching the regex, but it could be that it's just a prefix for "1234E+21". So, /^$pat$/z should fail. No?

This doesn't seem too intuitive, but that's a result of a minimal interface.

-- 
Bart.
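To make the "1234E+2" case concrete: "complete match" and "could still be extended" are not mutually exclusive, so a three-way if/elsif cannot report both. A rough sketch in current Perl follows, again assuming a hand-derived prefix pattern in place of /z. The classify helper is hypothetical and not part of the RFC.

```perl
use strict;
use warnings;

# The number pattern quoted in the thread.
my $full = qr/^\d+\.?\d*(?:E[+-]?\d+)$/;

# ASSUMPTION: hand-derived pattern for "is a prefix of some match".
my $prefix = qr/^(?:\d+\.?\d*(?:E[+-]?\d*)?)?\z/;

# Hypothetical helper: report the two properties independently
# instead of forcing them into one three-way answer.
sub classify {
    my ($str) = @_;
    my $is_complete = $str =~ $full   ? 1 : 0;
    my $is_prefix   = $str =~ $prefix ? 1 : 0;
    return ($is_complete, $is_prefix);
}

for my $input ('1234E', '1234E+2', '1234E+21', '12x') {
    my ($is_complete, $is_prefix) = classify($input);
    printf "%-10s complete=%d prefix=%d\n", $input, $is_complete, $is_prefix;
}
```

With this pattern, "1234E+2" comes out as both complete and a prefix, which is why a match that runs up to the end of the buffer cannot be reported as final while more input may still arrive.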
Re: RFC 316 (v1) Regex modifier for support of chunk processing and prefix matching
Wouldn't this interact rather badly with the /gc option (which also leaves C<pos> set on failure)?

This question arose because I was trying to work out how one would write a lexer with the new /z option, and it made my head ache ;-)

> As you can see from the example code, the program flow stays very
> close to what people would ordinarily program under normal
> circumstances.
>
> By contrast, RFC 93 proposes another solution to the same problem,
> but using callbacks. Since the same sub must do one of several
> things, the first thing that needs to be done is to channel different
> kinds of requests to their own handler. As a result, you need a
> complete rewrite from what you'd use in the ordinary case. I think
> that a lot of people will find my approach far less intimidating.

I'm not sure I see that this:

    my $chunksize = 1024;
    while(read FH, my $buffer, $chunksize) {
        while(/(abcd|bc)/gz) {
            # do something boring with the matched string:
            print "$1\n";
        }
        if(defined pos) {   # end-of-buffer exception
            # append the next chunk to the current one
            read FH, $buffer, $chunksize, length $buffer;
            # retry matching
            redo;
        }
    }

is less intimidating or closer to the "ordinary program flow" than:

    \*FH =~ /(abcd|bc)/g;

(as proposed in RFC 93).

> =head2 Match prefix
>
> It can be useful to be able to recognize if a string could possibly
> be a prefix for a potential match. For example in an interactive
> program, you want to allow a user to enter a number into an input
> field, but nothing else. After every single keystroke, you can test
> what he just entered against a regex matching the valid format for a
> number, so that C<1234E> can be recognized as a prefix for the regex
>
>     /^\d+\.?\d*(?:E[+-]?\d+)$/

Isn't this just:

    \*STDIN =~ /^\d+\.?\d*(?:E[+-]?\d+)$/ or die "Not a number";

???

Damian
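For comparison, here is roughly what the chunked /(abcd|bc)/ scan takes in a Perl without /z or RFC 93 handles. This is only a sketch under one loud assumption: the longest possible match is known in advance (4 characters here), so any match starting in the last 3 characters of the buffer is held back until more data arrives. The scan_chunks name, the in-memory filehandle, and the tiny chunk size are all made up for the demonstration.

```perl
use strict;
use warnings;
use List::Util qw(max);

# Hypothetical helper: collect all matches of /(abcd|bc)/ from a
# filehandle read in fixed-size chunks.
sub scan_chunks {
    my ($fh, $chunksize) = @_;
    my $maxlen = 4;    # ASSUMPTION: longest string the pattern can match
    my @found;
    my $buffer = '';
    my $eof    = 0;

    until ($eof) {
        # Append the next chunk to whatever was held back.
        my $got = read $fh, $buffer, $chunksize, length $buffer;
        die "read failed: $!" unless defined $got;
        $eof = 1 if $got == 0;

        # A match starting at or after $limit might come out differently
        # once more data is appended, so it is not trusted yet.
        my $limit = $eof ? length $buffer
                         : length($buffer) - ($maxlen - 1);

        my $consumed = 0;
        while ($buffer =~ /(abcd|bc)/g) {
            last if $-[0] >= $limit;    # too close to the end: wait
            push @found, $1;
            $consumed = pos $buffer;
        }

        # Drop everything that is settled; assignment also resets pos.
        $buffer = substr $buffer, max($consumed, $limit, 0);
    }
    return @found;
}

my $data = "xxabcdxxbcxabcd";
open my $fh, '<', \$data or die "cannot open in-memory handle: $!";
print "$_\n" for scan_chunks($fh, 3);    # chunk size 3 forces straddling
close $fh;
```

The hold-back bookkeeping is the part that /z, or a regex bound directly to a filehandle, would take off the programmer's hands. It is also the part that breaks down as soon as the pattern has no fixed maximum length.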