subject:"RFC 110 (v3) counting matches"

Re: $& and copying: rfc 158 (was Re: RFC 110 (v3) counting matches)

2000-09-11 Thread Mark-Jason Dominus



  in any case, i think we have a fair agreement on rfc 158 and i will
  freeze it if there is no further comments on it.
 
 I think you should remove the parts of your proposal about making $& be
  autolocalized.

If you're not planning to revise your RFC, let me know so that I can
ask the librarian to mark it as withdrawn.

Re: RFC 110 (v3) counting matches

2000-08-31 Thread Mark-Jason Dominus



 (mystery: how
 can filling in $& be a lot slower than filling in $1?)

It isn't.  It's the same.  $1 might even be more expensive than $&.

It appears that many people don't understand the problem with $&.  I
will try to explain.

Maintaining the information required by $1 or $& slows down the regex
match, possibly by as much as forty to sixty percent, or more.  (How
much depends on details of the regex and the target string.)

For this reason, Perl has an optimization in it so that if you never
use $& anywhere in your program, Perl never maintains the information,
and every regex in your program runs faster.

But if you do use $& somewhere, Perl cannot apply the optimization,
and it must compute the $& information for every regex in the program.
Every regex becomes much slower.

In particular, if you load a module whose author happened to use $&,
all your regexes get slower, which might be an unpleasant surprise,
since you might not be aware of the cause.

A regex with backreferences is *also* slow.  But using backreferences
in one regex does not make all the *other* regexes slow.  If you have

/(...)/   # regex 1
/.../ # regex 2

Perl knows that it must compute the backreference information for
regex 1, and knows that it can skip computing the backreference
information for regex 2, because regex 2 contains no parentheses.

If you use a module that contains regexes that use backreferences,
those regexes run slowly, but there is no effect on *your* regexes.

The cost is just as high for backreferences as for $&, but the
backreference cost is paid only by regexes that actually need it.

The $& cost is paid by every regex in the entire program, whether they
used it or not.  This is because Perl has no way to tell which regexes
use $& and which do not.

One of Uri's suggestions in RFC 158 was to compute $& only for regexes
that have a /k modifier.  This would solve the $& problem because Perl
would compute $& only when asked to, and not for every other regex in
the rest of the program.
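
A minimal sketch of the point above, not taken from the thread: when the
whole match is needed, wrapping the pattern in one pair of parentheses and
reading $1 gives the same text that $& would, and by the explanation above
only that one regex pays for it.  The sample string and variable names are
assumptions made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical sample data, for illustration only.
my $target = "released on 12-25-99, revised later";

# Capture the whole match explicitly instead of reading $&.
# In list context the match returns the captured group.
my ($whole) = $target =~ /(\d\d-\d\d-\d\d)/;

# A regex without parentheses elsewhere in the program needs no
# capture bookkeeping at all.
my $has_comma = $target =~ /,/ ? 1 : 0;

print "whole match: $whole\n";
```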

Re: RFC 110 (v3) counting matches

2000-08-31 Thread Joe McMahon


Jonathan Scott Duff wrote:
 
 How about something like this?
 
   $re = qr/(\d\d)-(\d\d)-(\d\d)/g;
   $re->onmatch_callback(push @list, makedate(^0,^1,^2));
   $string =~ $re;
 
It's not bad, but it loses one thing that I was trying to keep from the 
SNOBOL model. If you have (again, improvised syntax - I *know* you want 
to use the $ variables, OK? This is just for discussion):

   
/($pat1)($pat2)($pat3)(?{sub1(@\)}$pat4|?{sub2(@\)}$pat5|?{sub3(@\)})/

This would translate to "if pat1pat2pat3 matches, call sub1 with all the 
matches to that point if pat4 matches afterward, otherwise call sub2 
with all the matches if pat5 matches, else just call sub3." The key bit 
here is that you pass over the sub call, deferring it until you've 
decided if the whole match worked, then picking the one that succeeded 
and calling it. If you don't like the syntax, please feel free to 
propose another. @\ seemed a good mnemonic for "the array of 
backreferences I already matched".

And, of course, if you assume that @\ keeps growing when you use /g, 
then doing a scalar @\ and dividing by the number of backreferences would 
give you a match count:

   $string =~ /(\d\d)-(\d\d)-(\d\d)/g;
   $hits = scalar(@\)/3;

Of course, multiple alternatives with different numbers of 
backreferences lead to a problem, so maybe this is all academic. Oh well.
--- Joe M.

Re: RFC 110 (v3) counting matches

2000-08-29 Thread Mark-Jason Dominus



 On Mon, 28 Aug 2000, Mark-Jason Dominus wrote:
 
  But there is no convenient way to run the loop once for each date and
  split the dates into pieces:
  
  # WRONG
  while (($mo, $dy, $yr) = ($string =~ /(\d\d)-(\d\d)-(\d\d)/g)) {
...
  }
 
 What I use in a script of mine is:
 
 while ($string =~ /(\d\d)-(\d\d)-(\d\d)/g) {
 ($mo, $dy, $yr) = ($1, $2, $3);
 }
 
 Although this, of course, also requires that you know the number of
 backreferences. 

The real problem I was trying to discuss was not this particular
application.  I was trying to point out a larger problem, which is
that there are several regex features that are enabled or disabled
depending on what context the match is in, so that if you want one
scalar-context feature and one list-context feature at the same time,
there is no direct way to do it.

 Nicer would be to be able to assign from @matchdata or something
 like that :)

I agree.  There are many operations that would be simpler if there was
a magic array that contained ($1, $2, $3, ...).  If anyone wants to
write an RFC on this, I will help.

Re: RFC 110 (v3) counting matches

2000-08-29 Thread Bart Lateur


On Tue, 29 Aug 2000 08:51:29 -0400, Mark-Jason Dominus wrote:

There are many operations that would be simpler if there was
a magic array that contained ($1, $2, $3, ...).  If anyone wants to
write an RFC on this, I will help.

Heh. I once complained about the lack of such an array, in
comp.lang.perl.misc, *years* ago.

My practical problem was something like this, in a translation program
($phrase is one of many patterns in a table, to look for English phrases,
%translate contains the French translations, and interpolate() is a sub
that fills in the parameters -- the numbers in the string):

$_ = "It is 5 past 10.";
$phrase = 'it is (\d+) past (\d+)';
s/^$phrase/interpolate($translate{$phrase}, $1, $2)/ie;


The problem is that with variable patterns, you *don't know* how many
paren groups there are.

The solution they came up with was @+ and @-. I still can't work with
those. An array of matches (e.g. @&) would be a lot easier. It could
also be a lot slower; see the discussion on $& for this. (mystery: how
can filling in $& be a lot slower than filling in $1?)

-- 
Bart.
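
As a hedged illustration of the @- and @+ arrays mentioned above, here is
one way they can be used to rebuild the list ($1, $2, ...) without knowing
in advance how many paren groups a variable pattern contains.  This is a
sketch, not code from the thread; the variable names @groups and $text are
assumptions, and the pattern is the one from the example above.

```perl
use strict;
use warnings;

my $phrase = 'it is (\d+) past (\d+)';   # pattern from the example above
my $text   = "It is 5 past 10.";

my @groups;
if ($text =~ /^$phrase/i) {
    # $#+ is the number of groups in the pattern that just matched;
    # $-[$n] and $+[$n] are the start and end offsets of group $n.
    # A group that did not take part in the match has undefined offsets.
    @groups = map {
        defined $-[$_] ? substr($text, $-[$_], $+[$_] - $-[$_]) : undef
    } 1 .. $#+;
}

print "groups: @groups\n";   # prints "groups: 5 10"
```

When the match is not inside s///, evaluating it in list context without
/g (my @groups = $text =~ /^$phrase/i) returns the same list directly.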

Re: RFC 110 (v3) counting matches

2000-08-29 Thread Tom Christiansen


That empty list to force the proper context irks me.  How about a
modifier to the RE that forces it (this would solve the "counting matches"
problem too).

   $string =~ m{
   (\d\d) - (\d\d) - (\d\d)
   (?{ push @dates, makedate($1,$2,$3) })
   }gxl;

   $count = $string =~ m/foo/gl;   # always list context

The reason why not is because you're adding a special case hack to 
one particular place, rather than promoting a general mechanism
that can be everywhere.  

Tell me: which is better and why.

1) A regex switch to specify scalar context, as in a mythical /r:

push(@got, /bar/r)

2) A general mechanism, say for example, "scalar":

push(@got, scalar /bar/)

Obviously the "scalar" is better, because it does not require that
a new switch be learnt, nor is its use restricted to pattern matching.
Furthermore, it's inarguably more mnemonic for the sense of "match this
scalarishly".

Likewise, to force list context (a far less common operation, mind
you), it is a bad idea to have what amounts to a special argument
to just one function to do this.  What happens to the next function you
want to do this to?  How about if I want to force getpwnam() into list
context and get back a scalar result?

$count = getpwnam("tchrist")/l;
$count = getpwnam("tchrist", LIST);
$count = getpwnam("tchrist")->as_list;

All of those, frankly, suck.  This is much better:

$count = () = getpwnam("tchrist");

It's better because 

  * You don't have to invent anything new, whether syntactically
or mnemonically.  The sucky solutions all require modification
of Perl's very syntax.  With the list assignment, you just need
to learn how to use what you *already have*.  I could say as
much for (?{...}).  Think how many of the suggestions on these
lists can be dealt with simply through using existing features
that the suggesting party was unaware of.

  * It's a general mechanism that isn't tailored for this particular
function call.  Special-purpose solutions are often inferior
to general-purpose ones, because the latter are more likely to 
be creatively usable in a fashion unforeseen by the author.

  * What could possibly be more intuitive for the action of acting
as though one were assigning to a list than doing that very
thing itself?  Since () is the canonical list (it's empty, after
all), this follows directly and requires no special knowledge
whatsoever.

--tom
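
A small sketch of the two general mechanisms compared above, using a
made-up function fn() (hypothetical, not from the thread) that reports the
context it was called in.  Assigning to the empty list forces list context,
and the list assignment evaluated in scalar context yields the number of
elements on its right-hand side.

```perl
use strict;
use warnings;

# Hypothetical function, for illustration only: returns three items
# in list context and a single string in scalar context.
sub fn { return wantarray ? ("a", "b", "c") : "scalar" }

my $count  = () = fn();      # fn() is called in list context; $count is 3
my $single = scalar fn();    # fn() is called in scalar context

print "count=$count single=$single\n";
```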

Re: RFC 110 (v3) counting matches

2000-08-29 Thread Tom Christiansen


p.s. Has anybody already suggested that we ought to have a nicer
solution to execute perl code inside a string, replacing "${\(...)}" and
"@{[...]}", which also won't ever win a beauty contest?  Oops, wrong
mailing list.

The first one doesn't work, and never did.  You want 
@{[]} and @{[scalar ]} instead.

"Doesn't work"?

   print "The sum of 1 + 2 is ${\(1+2)}.\n";
--
   The sum of 1 + 2 is 3.

I'm surprised you wouldn't have known this. The principle is the same:
"${...}" expects a scalar reference inside the block, and '\' provides
one. Of course, there shouldn't be a real multi-element list inside the
parens, but just one scalar. And often, the parens aren't needed.

I'm surprised that you still don't understand.  Notice what I showed
you for the replacement above: @{[scalar ]}.

Using ${\(...)} doesn't work in the sense that contrary to popular
belief, it fails to provide a scalar context to the contents of
those parens.  Thus ${ \( fn() ) } is still calling fn() in list
context, not scalar context.  Witness:

sub fn { sprintf "called in %s context", wantarray ? "list" : "scalar" } 

print "Test 1: ";
print "@{ [fn()] }\n";

print "Test 2: ";
print "${ \(fn()) }\n";

print "Test 3: ";
print "@{ [scalar fn()] }\n";

That, when executed, yields:

Test 1: called in list context
Test 2: called in list context
Test 3: called in scalar context

*That's* why test 2 "doesn't work".

--tom

Re: RFC 110 (v3) counting matches

2000-08-28 Thread Tom Christiansen


Have you ever wanted to count the number of matches of a pattern?  s///g 
returns the number of matches it finds.  m//g just returns 1 for matching.
Counts can be made using s//$&/g but this is wasteful, or by putting some 
counting loop round a m//g.  But this all seems rather messy. 

It's really much easier than all that:

$count = () = $string =~ /pattern/g;

--tom
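
For illustration, a sketch of the idiom above next to the counting loop it
replaces.  The sample string and variable names are assumptions.  Note that
if the pattern contains capturing parentheses, m//g in list context returns
every captured group, so the count becomes matches times groups.

```perl
use strict;
use warnings;

# Hypothetical sample data, for illustration only.
my $string = "01-02-03, 04-05-06 and 07-08-09";

# The idiom: list assignment in scalar context counts the matches.
my $count = () = $string =~ /\d\d-\d\d-\d\d/g;

# The equivalent counting loop around a scalar-context m//g.
my $loop_count = 0;
$loop_count++ while $string =~ /\d\d-\d\d-\d\d/g;

# With three capture groups, the same idiom counts groups, not matches.
my $pieces = () = $string =~ /(\d\d)-(\d\d)-(\d\d)/g;

print "count=$count loop=$loop_count pieces=$pieces\n";
```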
