Re: RFC 158 (v1) Regular Expression Special Variables

2000-08-25 Thread Uri Guttman

 "TC" == Tom Christiansen [EMAIL PROTECTED] writes:

   $`, $& and $' are useful variables which are never used by any
   experienced Perl hacker since they have well known problems with
   efficiency. 

  TC That's hardly true.  I could show you plenty of code from
  TC inexperienced Perl hackers like lwall that use them.  But
  TC the cost is understood.  :-)

those early perl3 scripts by lwall floating around in /etc were poorly
written. i am glad they are finally out of the distribution.

  TC The rest of what you said probably is reasonable, however.

  TC The (.*?)(blah)(.*) solution kind of works sometimes, but is 
  TC hardly pleasant.  Likewise the @+ and @- stuff.
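
For reference, a minimal sketch of the capture-group workaround, using a made-up sample string and the literal stand-in pattern "blah". It hints at one reason the approach only works sometimes: without /s, "." does not match a newline, so the captured "before" and "after" pieces stop at the line boundaries.

```perl
use strict;
use warnings;

# Illustrative sample text only; "blah" stands in for a real pattern.
my $text = "one\ntwo blah three\nfour";

# Without /s, . stops at newlines, so only the current line is captured.
my @naive = $text =~ /(.*?)(blah)(.*)/;

# With /s, . crosses newlines and the three groups cover the whole string,
# which is what $`, $& and $' would have reported.
my @whole = $text =~ /(.*?)(blah)(.*)/s;

print join("|", @naive), "\n";
print join("|", @whole), "\n";
```

Even with /s the caller has to rewrite the pattern and renumber any existing capture groups, which is presumably part of why it is "hardly pleasant".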

i would like to see the @+ and @- stuff made to work faster or better or
something. they have merit but not practicality.
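
A minimal sketch of what the @- and @+ approach looks like in practice, assuming Perl 5.6 or later and a made-up sample string. $-[0] and $+[0] hold the start and end offsets of the whole match, so the three pieces can be cut out with substr without mentioning $`, $& or $' anywhere in the program.

```perl
use strict;
use warnings;

# Sample string and pattern are illustrative only.
my $str = "the quick brown fox";
my ($pre, $match, $post);

if ($str =~ /qu\w+/) {
    $pre   = substr($str, 0, $-[0]);               # text before the match
    $match = substr($str, $-[0], $+[0] - $-[0]);   # the matched text
    $post  = substr($str, $+[0]);                  # text after the match
}

print "[$pre][$match][$post]\n";
```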

another related grabbing issue is grabbing repeated groups like

@all_words = /(\w+\s+)+/ ;

we only get the last match from that. but that should be a separate rfc.
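
A short sketch of the behaviour described here, with a made-up sample line. The quantified group keeps only its final iteration; one common workaround (not what the proposed rfc would necessarily specify) is to match a single group repeatedly with /g in list context.

```perl
use strict;
use warnings;

# Sample line is illustrative only.
my $line = "alpha beta gamma ";

# A quantified group keeps only the last repetition it matched.
my @last_only = $line =~ /(\w+\s+)+/;

# Matching one group at a time with /g returns every repetition.
my @all_words = $line =~ /(\w+\s+)/g;

print scalar(@last_only), " vs ", scalar(@all_words), "\n";
```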

  TC There's also long been talk/thought about making $& and $1 
  TC and friends magic aliases into the original string, which would
  TC save that cost.

but if you modify that string with s/// you lose unless you make a
copy. in fact $`, $& and $' should just be aliases if the op was
m///. it is the s/// case that is the problem. 
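
A small illustration of why s/// is the hard case, using a made-up sample string. After the substitution the matched text no longer exists in the target, yet $& still has to report it, so a pure alias into the target string could not produce the right value.

```perl
use strict;
use warnings;

# Sample string is illustrative only.
my $str = "hello world";
$str =~ s/world/there/;

# The target has changed, but $& still holds the text that was matched
# before the replacement was applied.
my $matched = $&;

print "$str / $matched\n";
```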

that brings up the question about how often is $& needed after a s///?
it almost makes little sense since you are matching and modifying. maybe
we can also remove support for them with s/// and thereby remove the
copy penalty. but my idea would work in both cases and puts it under
program control so we could just use that.

uri

-- 
Uri Guttman  -  [EMAIL PROTECTED]  --  http://www.sysarch.com
SYStems ARCHitecture, Software Engineering, Perl, Internet, UNIX Consulting
The Perl Books Page  ---  http://www.sysarch.com/cgi-bin/perl_books
The Best Search Engine on the Net  --  http://www.northernlight.com



Re: RFC 158 (v1) Regular Expression Special Variables

2000-08-25 Thread Tom Christiansen

those early perl3 scripts by lwall floating around in /etc were poorly
written. i am glad they are finally out of the distribution.

Those weren't the scripts I was thinking about, and it is *NOT*
ipso facto true that something which uses $& or $` is poorly
written.

--tom



Re: RFC 145 (v2) Brace-matching for Perl Regular Expressions

2000-08-25 Thread Eric Roode

Nat wrote:
5.6's regular expressions have (??{ ... }) to permit recursion and
$^R to maintain state through the parsing. 
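
For readers who have not met (??{ ... }), a minimal sketch of the recursion it permits, modelled on the balanced-parentheses idiom. The variable name $balanced and the sample strings are assumptions for illustration; $^R is not used here.

```perl
use strict;
use warnings;

# The pattern refers to itself through (??{ ... }), whose result is
# compiled and matched at that point at run time. A package variable is
# used so the code block sees the same variable on every perl version.
our $balanced;
$balanced = qr/
    \(                      # opening parenthesis
    (?:
        [^()]+              # a run of non-parenthesis characters
      |
        (??{ $balanced })   # or a nested balanced group
    )*
    \)                      # the matching close
/x;

# Extract the first balanced group from a sample string.
my ($group) = "f(a(b)c)d" =~ /($balanced)/;

print "$group\n";
```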

In another thread, Tomc wrote:
[...]  Likewise the @+ and @- stuff.

Okay, I'm throwing my ignorance out for the whole world to see. WTF??

Sure, I'm not in the loop, as certainly gnat and tomc are, but ...
I haven't heard of these features, and can't begin to guess what they
mean. I just spent an hour or two cruising the perl web site, and
nothing about them did I find. Most especially, no mention of any of
them is made in the What's New in 5.6.0, What's new in 5.005, or
What's New in 5.004 articles.

What are these things, and where can I learn about them?

ObPerl6: Perhaps some (many?) of the RFCs propose to solve problems
that have already been solved, but nobody knows about the solution.
 --
 Eric J. Roode,  [EMAIL PROTECTED]       print  scalar  reverse  sort
 Senior Software Engineer                'tona ', 'reh', 'ekca', 'lre',
 Myxa Corporation                        '.r', 'h ', 'uj', 'p ', 'ts';




Re: RFC 158 (v1) Regular Expression Special Variables

2000-08-25 Thread David L. Nicol

Tom Christiansen wrote:

 There's also long been talk/thought about making $& and $1
 and friends magic aliases into the original string, which would
 save that cost.


I was distressed to discover that s///g does not rebuild the
old string between matches, but only at the end.  It broke my
random anagram generator which was depending on instant updates.
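
One way to see the difference, sketched with made-up data rather than the anagram generator itself: s///g makes a single pass over the original text, while a loop of single substitutions re-scans the rebuilt string each time.

```perl
use strict;
use warnings;

# Sample strings are illustrative only.
my $single_pass = "aaaa";
$single_pass =~ s/aa/a/g;          # both matches are found in the original text

my $repeated = "aaaa";
1 while $repeated =~ s/aa/a/;      # each pass sees the result of the previous one

print "$single_pass / $repeated\n";
```

Code that needs "instant updates" can use the loop form, at the cost of restarting the scan from the beginning after every substitution.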


If STRING was a linked list of partially full blocks rather than
a big piece of contiguous space, we could do length-altering substitutions
without copying.

-- 
  David Nicol 816.235.1187 [EMAIL PROTECTED]
   safety first: Republicans for Nader in 2000



Re: RFC 145 (v2) Brace-matching for Perl Regular Expressions

2000-08-25 Thread Tom Christiansen

All in all, though, you're right that neither set of features is particularly
well-known/used outside of p5p followers. At least from what I've seen.
Virtually every person I've worked with since 5.6 came out has been surprised
and amazed at the REx eval stuff.

The completely reworked regex chapter in Camel III explains and demos all the
new 5.6 features.  I do not believe they will long remain the Cabal's secret.

--tom



Re: RFC 158 (v1) Regular Expression Special Variables

2000-08-25 Thread Mark-Jason Dominus


 Please correct me if I'm mistaken, but I believe that that's the way
 they are implemented now.  A regex match populates the ->startp and
 ->endp parts of the regex structure, and the elements of these items
 are byte offsets into the original string.  
 
 I haven't looked at it at all, and perhaps that's something Ilya
 did when creating @+ etc.  So you might be right.  

As far as I know it's the same in 5.000.

I thought the problem with $& was that the regex engine has to adjust
the offsets in the startp/endp arrays every time it scans forward a
character or backtracks a character.  

But maybe the effect of $& is greatly exaggerated or is a relic from
perl4?  Has anyone actually benchmarked this recently?




New match and subst replacements for =~ and !~ (was Re: RFC 135 (v2) Require explicit m on matches, even with ?? and // as delimiters.)

2000-08-25 Thread Nathan Wiger

[cc'ed to -regex b/c this is related to RFC 138]

Proposed replacements for m// and s///:

match /pattern/flags, $string
subst /pattern/newpattern/flags, $string
 
 The more I look at that, the more I like it. Very consistent with split
 and join. You can now potentially match on @multiple_strings too.

Just to extend this idea, at least for the exercise of it, consider:

   match;                  # all defaults (pattern is /\w+/?)
   match /pat/;            # match $_
   match /pat/, $str;      # match $str
   match /pat/, @strs;     # match any of @strs

   subst;                  # like s///, pretty useless :-)
   subst /pat/new/;        # sub on $_
   subst /pat/new/, $str;  # sub on $str
   subst /pat/new/, @strs; # return array of modified strings
 
Notice you can drop trailing args and they work just like split. Much
more consistent. This also eliminates "one more oddity", =~ and !~. So
the new syntax would be:

   Perl 5                          Perl 6
   ------------------------------  -----------------------------------
   if ( /\w+/ ) { }                if ( match ) { }
   if ( $_ !~ /\w+/ ) { }          if ( ! match ) { }           # better
   ($res) = m#^(.*)$#g;            $res = match #^(.*)$#g;

   next if /\s+/ || /\w+/;         next if match /\s+/ or match /\w+/;
   next if ($str =~ /\s+/) ||      next if match /\s+/, $str or
           ($str =~ /\w+/)                 match /\w+/, $str;
   next unless $str =~ /^N/;       next unless match /^N/, $str;

   $str =~ s/\w+/$bob/gi;          $str = subst /\w+/$bob/gi, $str;
   ($str = $_) =~ s/\d+/func/ge;   $str = subst /\d+/func/ge;   # better
   s/\w+/this/;                    subst /\w+/this/;

   # These are pretty cool...
   foreach (@old) {                @new = subst /hello/X/gi, @old;
      s/hello/X/gi;
      push @new, $_;
   }

   foreach (@str) {                print "Got it" if match /\w+/, @str;
      print "Got it" if (/\w+/);
   }
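
To get a feel for the list-oriented forms, here is a rough Perl 5 approximation written as ordinary subs. Everything about it is an assumption for illustration: the proposal has no implementation, the pattern is passed as a qr// object instead of bare /pattern/flags syntax, and the substitution is always global.

```perl
use strict;
use warnings;

# Hypothetical stand-ins for the proposed builtins, not real Perl functions.

# True if any of the strings (default: $_) matches the pattern.
sub match {
    my ($re, @strs) = @_;
    @strs = ($_) unless @strs;
    for my $s (@strs) {
        return 1 if $s =~ $re;
    }
    return 0;
}

# Returns modified copies of the strings (default: $_).
# @strs already holds copies of the arguments, so the originals are untouched.
sub subst {
    my ($re, $new, @strs) = @_;
    @strs = ($_) unless @strs;
    for my $s (@strs) {
        $s =~ s/$re/$new/g;
    }
    return wantarray ? @strs : $strs[0];
}

# Sample data is illustrative only.
my @old = ("hello world", "Hello again", "bye");
my @new = subst(qr/hello/i, "X", @old);   # like: @new = subst /hello/X/gi, @old
my $any = match(qr/\w+/, @old);           # like: match /\w+/, @old

print join("|", @new), "\n";
```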

Now, this gives us a cleaner syntax, yes. More consistent, more
sensical, and makes some things easier. But more typing overall, and
relearning for lots of people. If it's more powerful and extensible,
then it's worth it, but this should be a conscious decision.

However, it is worth consideration, in light of RFC 138 and many other
issues. If we did eliminate =~, I think something like this would work
pretty well in its place. If anyone thinks this is an idea worthy of an
RFC (the more I look at it the better it looks, but I'm biased :), let
me know. Although we'd probably need something better than "subst".
Maybe just "m" and "s" still.

-Nate