Re: RFC 316 (v1) Regex modifier for support of chunk processing and prefix matching

2000-09-30 Thread Bart Lateur

On Tue, 26 Sep 2000 11:55:32 +1100 (EST), Damian Conway wrote:

Wouldn't this interact rather badly with the /gc option (which also leaves
C<pos> set on failure)?

Yes.

The easy way out is to disallow combining /gc with /z. But since this is
typically one of the applications it is aimed at, I should find a
solution. A different interface is one option.

This question arose because I was trying to work out how one would write a
lexer with the new /z option, and it made my head ache ;-)

Heheh. Your turn.   ;-)


I'm not sure I see that this:
...
is less intimidating or closer to the "ordinary program flow"  than:

   \*FH =~ /(abcd|bc)/g;

(as proposed in RFC 93).

Was that what was proposed? I think not. It was:

sub { ... } =~ /(abcd|bc)/g;


But I kinda like that syntax. But, in practice, it looks too much like
black magic:

 * Where is the string stored? It looks like it disappears into thin air.
 
 * What about pushback? Your proposal depends on it, but standard
filehandles don't support it, IMO. Does this require a TIEHANDLE
implementation?

 * Your regex shouldn't consume any more characters from the filehandle
than it matches? Where are the remaining characters pushed back into?

After every single keystroke, you can test what he just 
entered against a regex matching the valid format for a number, so that 
C<1234E> can be recognized as a prefix for the regex

/^\d+\.?\d*(?:E[+-]?\d+)$/

Isn't this just:

   \*STDIN =~ /^\d+\.?\d*(?:E[+-]?\d+)$/
   or die "Not a number";

???

No. First of all, you can't override the behaviour of STDIN. That reads
a whole line, then checks it, and then your script dies if it's not
right.

I want a test on every single keystroke, see if it's in sync with the
regex, and if it's not, reject it, i.e. no insertion in the input
buffer, and no echo on screen. Besides, you can't be sure your data
comes from a filehandle (or compatible handle). Not in a GUI.
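
For comparison, here is a minimal sketch of the per-keystroke check in today's Perl 5, without the proposed /z. It assumes the prefix pattern is derived by hand from the number regex by making everything after the leading digits optional; the helper name accept_key is hypothetical and only serves the illustration.

```perl
use strict;
use warnings;

# Hand-derived "prefix" form of /^\d+\.?\d*(?:E[+-]?\d+)$/ (an assumption of
# this sketch): the empty string and every partial number are accepted.
my $number_prefix = qr/^(?:\d+\.?\d*(?:E[+-]?\d*)?)?\z/;

# Hypothetical helper: returns the new buffer contents if the key keeps the
# input in sync with the regex, or undef if the key should be rejected.
sub accept_key {
    my ($buffer, $key) = @_;
    my $candidate = $buffer . $key;
    return $candidate =~ $number_prefix ? $candidate : undef;
}

# Simulate a user typing "12x34E+2"; the "x" is rejected and never stored.
my $buffer = '';
for my $key (split //, '12x34E+2') {
    my $next = accept_key($buffer, $key);
    $buffer = $next if defined $next;
}
print "$buffer\n";
```

The weak point is that the prefix pattern has to be written and kept in sync by hand for every regex, which is the work a /z-style modifier would take over.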

-- 
Bart.



Re: RFC 316 (v1) Regex modifier for support of chunk processing and prefix matching

2000-09-29 Thread Bart Lateur

On Fri, 29 Sep 2000 13:19:47 +0100, Hugo wrote:

I think that involves
rewriting your /p example something like:
  if (/^$pat$/z) {
    print "found a complete match";
  } elsif (defined pos) {
    print "found a prefix match";
  } else {
    print "not a match";
  }

Except that this isn't exactly what would happen. Look, "1234E+2" is a
complete string matching the regex, but it could be that it's just a
prefix for "1234E+21". So, /^$pat$/z should fail. No? This doesn't seem
too intuitive, but that's a consequence of a minimal interface.
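
The three-way distinction can be approximated in current Perl 5 with two separate patterns. This is only a sketch: the prefix pattern is hand-derived, classify is a hypothetical name, and \z is used instead of $ so that a trailing newline does not count as a match. Unlike /z as discussed above, it reports "1234E+2" as complete and cannot signal that the match could still grow.

```perl
use strict;
use warnings;

# The full pattern from the thread, anchored with \z instead of $.
my $complete = qr/^\d+\.?\d*(?:E[+-]?\d+)\z/;
# Hand-derived prefix pattern (an assumption of this sketch).
my $prefix   = qr/^(?:\d+\.?\d*(?:E[+-]?\d*)?)?\z/;

# Hypothetical helper: classify the input typed so far.
sub classify {
    my ($input) = @_;
    return 'complete' if $input =~ $complete;
    return 'prefix'   if $input =~ $prefix;
    return 'none';
}

print "$_: ", classify($_), "\n" for '1234E', '1234E+2', '1234E+21', '12x';
```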

-- 
Bart.



Re: RFC 316 (v1) Regex modifier for support of chunk processing and prefix matching

2000-09-25 Thread Damian Conway


Wouldn't this interact rather badly with the /gc option (which also leaves
C<pos> set on failure)?

This question arose because I was trying to work out how one would write a
lexer with the new /z option, and it made my head ache ;-)


As you can see from the example code, the program flow stays very close 
to what people would ordinarily program under normal circumstances.

By contrast, RFC 93 proposes another solution to the same problem, but 
using callbacks. Since the same sub must do one of several things, the 
first thing that needs to be done is to channel different kinds of 
requests to their own handler. As a result, you need a complete rewrite 
from what you'd use in the ordinary case.

I think that a lot of people will find my approach far less
intimidating.


I'm not sure I see that this:
   
    my $chunksize = 1024;
    while(read FH, my $buffer, $chunksize) {
        while(/(abcd|bc)/gz) {
            # do something boring with the matched string:
            print "$1\n";
        }
        if(defined pos) {  # end-of-buffer exception
            # append the next chunk to the current one
            read FH, $buffer, $chunksize, length $buffer;
            # retry matching
            redo;
        }
    }

is less intimidating or closer to the "ordinary program flow"  than:

\*FH =~ /(abcd|bc)/g;

(as proposed in RFC 93).
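
As a point of reference for both proposals, here is a rough sketch of chunked matching in plain Perl 5, with neither /z nor RFC 93's filehandle matching. It rests on one strong assumption that the proposals do not need: the longest possible match (4 characters for this pattern) is known in advance, so a short tail of each buffer can be held back until more data arrives. The name match_in_chunks and the $MAXLEN constant are hypothetical.

```perl
use strict;
use warnings;
use List::Util qw(max);

# Assumption: the longest string (abcd|bc) can match is 4 characters.
my $MAXLEN = 4;

# Hypothetical helper: return every match of (abcd|bc) in the data read from
# $fh, reading at most $chunksize characters at a time.
sub match_in_chunks {
    my ($fh, $chunksize) = @_;
    my $buffer = '';
    my @found;
    my $eof = 0;
    until ($eof) {
        # append the next chunk to the tail kept from the previous round
        my $got = read($fh, $buffer, $chunksize, length($buffer));
        die "read failed: $!" unless defined $got;
        $eof = 1 if $got == 0;

        my $done = 0;            # everything before this offset is settled
        pos($buffer) = 0;        # always rescan the kept tail from the start
        while ($buffer =~ /(abcd|bc)/g) {
            # a match starting within $MAXLEN of the end could still turn
            # into a different match once more data arrives
            last if !$eof && $-[0] + $MAXLEN > length($buffer);
            push @found, $1;
            $done = pos($buffer);
        }

        # drop the settled part, keep the unsettled tail for the next round
        my $keep_from = $eof
            ? length($buffer)
            : max($done, length($buffer) - ($MAXLEN - 1));
        substr($buffer, 0, $keep_from) = '';
    }
    return @found;
}

# Usage: an in-memory filehandle stands in for FH.
my $data = 'xxabcdxxbcxabcd';
open my $fh, '<', \$data or die "cannot open in-memory handle: $!";
print "$_\n" for match_in_chunks($fh, 3);
```

The held-back tail is what stops a buffer ending in "abc" from reporting "bc" when the next chunk starts with "d". For patterns with no upper bound on match length this trick does not apply, which is the gap both RFCs are trying to close.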

  
=head2 Match prefix

It can be useful to be able to recognize if a string could possibly be a
prefix for a potential match. For example in an interactive program, 
you want to allow a user to enter a number into an input field, but 
nothing else. After every single keystroke, you can test what he just 
entered against a regex matching the valid format for a number, so that 
C<1234E> can be recognized as a prefix for the regex

/^\d+\.?\d*(?:E[+-]?\d+)$/

Isn't this just:

\*STDIN =~ /^\d+\.?\d*(?:E[+-]?\d+)$/
or die "Not a number";

???

Damian