RFC 110 counting matches (post Hugo)

2000-09-11 Thread Richard Proctor

This list has gone a little quiet...

Hugo wrote:
 I like this too. I'd suggest /t should mean a) return a scalar of
 the number of matches and b) don't set any special variables. Then
 /t without /g would return 0 or 1, but be faster since no extra
 information need be captured (except internally for (.)\1 type
 matching - compile time checks could determine if these are needed,
 though (?{..}) and (??{..}) patterns would require disabling of
 that optimisation). /tg would give a scalar count of the total
 number of matches. \G would retain its meaning.
 
 Any which way, implementation should be fairly straightforward,
 though ensuring that optimisations occurred precisely when they
 are safe would probably involve a few bug-chasing cycles.
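Perl has no /t modifier; this RFC proposes it. For comparison, the count that /tg would return can already be computed with a //g loop. This is a minimal sketch using only core Perl. The subs count_matches and matched_once are hypothetical helpers. Unlike the proposal, they still set $& and the capture variables on every match, so they pay the cost that /t is meant to avoid.

```perl
use strict;
use warnings;

# Rough stand-in for the proposed /tg: count successive //g matches.
# Each iteration still sets $&, $1 and friends, which /t would skip.
sub count_matches {
    my ($string, $re) = @_;
    my $count = 0;
    $count++ while $string =~ /$re/g;
    return $count;
}

# Rough stand-in for the proposed /t without /g: one match, as 1 or 0.
sub matched_once {
    my ($string, $re) = @_;
    return $string =~ $re ? 1 : 0;
}

print count_matches("banana", qr/an/), "\n";   # 2
print matched_once("banana", qr/xyz/), "\n";   # 0
```

The shorter `my $n = () = $string =~ /$re/g;` idiom gives the same count only when the pattern has no capturing groups. With groups, list-context //g returns the captured substrings instead of one element per match.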


I propose adding this note.  His preference for how /t and /g should
work seems the most appropriate.  Unless I hear any further
discussion, I propose moving this RFC to frozen this week.

Richard






RFC 166 (postHugo)

2000-09-11 Thread Richard Proctor

This RFC had three concepts.  I propose dropping two of them from here: the
"Not a pattern" concept, as it is now in RFC 198, and the null element.  The
list expansion might benefit from a slight enhancement.

Hugo:
 (?@foo) and (?Q@foo) are both things I've wanted before now. I'm
 not sure if this is the right syntax, particularly if RFC 112 is
 adopted: it would be confusing for (?@foo) to have so
 different a meaning from (?$foo=...), and even more so if the
 latter is ever extended to allow (?@foo=...).
 I see no reason that implementation should cause any problems
 since this is purely a regexp-compile time issue.
 
I don't have any problem with the (?@foo) syntax; does anybody else?
I can't imagine a (?@foo=...) style syntax (yet).

Thinking further about what I defined for (?Q@foo), adding the list
as quoted alternatives: is there a case for (?Q$foo) to match the contents of
$foo quoted in a similar way?  (I think the answer is at least a "probably".)
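Both (?Q@foo) and (?Q$foo) are proposed syntax. This is a minimal sketch of what (?Q@foo) would expand to, built by hand in current Perl: quotemeta() each element and join the results with |. The helper quoted_alternation is hypothetical. The longest-first ordering is an assumption of this sketch, not part of the RFC. For the scalar case, the existing \Q...\E quoting already gives roughly what (?Q$foo) would.

```perl
use strict;
use warnings;

# Hand-built equivalent of the proposed (?Q@foo): each list element as a
# literal alternative. quotemeta() escapes regex metacharacters.
# Longest-first ordering (a choice made here) lets "foobar" win over its
# prefix "foo", since Perl takes the first alternative that matches.
# Note: an empty list yields (?:), which matches the empty string anywhere.
sub quoted_alternation {
    my @words = @_;
    my $alt = join '|', map { quotemeta($_) } sort { length($b) <=> length($a) } @words;
    return qr/(?:$alt)/;
}

my @foo = ('a.b', 'c+');
my $list_re = quoted_alternation(@foo);
print "list: ", ('xa.by' =~ $list_re ? "match" : "no match"), "\n";   # match
print "list: ", ('axb'   =~ $list_re ? "match" : "no match"), "\n";   # no match

# The scalar form, (?Q$foo), corresponds to \Q$foo\E in today's Perl.
my $foo = 'a.b';
print "scalar: ", ('axb' =~ /\Q$foo\E/ ? "match" : "no match"), "\n"; # no match
```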

Feedback desirable.  

Richard

(Still thinking on scoping in assignment and boolean regexes)






Re: RFC 166 (postHugo)

2000-09-11 Thread Nathan Torkington

Sorry, I can't help but read the subject as an abbreviation of
  post Hugo, ergo propter Hugo

and then I wonder why you're naming an RFC after a logical fallacy
involving a perl5-porter.  I am seeking treatment, though :-)

Nat



special character to match a valid expression

2000-09-11 Thread David L. Nicol



Thinking about the brace-matching problem, specifically the problem
of writing a regex to match any valid specification of a scalar
written like 

${expression returning name or reference goes here}

I realized that no amount of lookahead is going to be free of
problems.

So why not give up and try another direction?  A special backslash
assertion that matches a valid Perl expression, one which appears as if
it would return a value, would be just the thing.

We could call it \v for valid, and it would match as far as it could get
in validity.

($Name_Of_The_First_Interpolable_Scalar) = m/\$(\w+|(\{\v\}))/;


Its opposite, \V, is something I'd like to know more about before submitting
an RFC on this idea.  Would it be greedy?  Would it require a quantifier, matching
dot for that many, and then validate the results?  Maybe it should be left undefined:
can anyone come up with a situation in which you'd want to match all the characters
that were not syntactically valid, or match up to the last token that would match all
the previous brackets?
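The \v escape does not exist in Perl. For comparison, here is a sketch of the closest thing a regex can do today for the ${...} case. It uses a recursive pattern, which needs Perl 5.10 or later (long after this thread), and checks only that the braces balance. The variable name $scalar_re is a hypothetical illustration. The third sample string shows the kind of problem described above: a brace inside a quoted string fools the pattern. A validity-checking \v presumably would not be fooled.

```perl
use strict;
use warnings;

# Recursive pattern (Perl 5.10+): $ followed by a word or a brace-balanced
# block. (?&braced) checks only that braces balance, not that the
# contents form a valid expression.
my $scalar_re = qr/
    \$ (?: (\w+) | ( (?&braced) ) )
    (?(DEFINE)
        (?<braced> \{ (?: [^{}]++ | (?&braced) )* \} )
    )
/x;

for my $code ('print $name;', 'print ${ $h{x} } and more', 'print ${ "}" };') {
    my ($word, $block) = $code =~ $scalar_re;
    print defined $word ? "name: $word\n" : "block: $block\n";
}
```

The last string yields the block `{ "}`, because the quoted brace is taken as the closing one.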


-- 
  David Nicol 816.235.1187
   perl -e'map{sleep print$w[rand@w]}@w=' ~/nsmail/Inbox



Re: RFC 72 (v1) The regexp engine should go backward as well as forward.

2000-09-11 Thread Mark-Jason Dominus


 Simply put, I want variable-length lookbehind.  

Why didn't you simply propose that the (?<=...) operator be fixed to
support variable-length expressions?  Why so much additional machinery?
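For context on fixing (?<=...), this sketch shows how a modern perl (5.10 or later) behaves. Lookbehind is still limited to patterns of bounded length, so an unbounded one is rejected when the regex is compiled. The \K escape was added in 5.10 and is not part of RFC 72. It gives the effect of a variable-length positive lookbehind for substitutions.

```perl
use strict;
use warnings;

# An unbounded lookbehind is rejected when the regex is compiled. The
# pattern is built from a string so the error happens at run time,
# inside eval, rather than when this file is compiled.
my $unbounded = '(?<=\w+=)\d+';
my $ok = eval { my $compiled = qr/$unbounded/; 1 };
print "unbounded lookbehind: ", ($ok ? "compiled" : "rejected"), "\n";

# \K (Perl 5.10+): text matched before it is left out of $& and out of
# what s/// replaces, much like a variable-length lookbehind on "\w+=".
(my $text = "foo=1, foobar=22") =~ s/\b\w+=\K\d+/N/g;
print "$text\n";   # foo=N, foobar=N
```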




Re: $& and copying: rfc 158 (was Re: RFC 110 (v3) counting matches)

2000-09-11 Thread Mark-Jason Dominus


  in any case, i think we have a fair agreement on rfc 158 and i will
  freeze it if there are no further comments on it.
 
 I think you should remove the parts of your proposal about making $& be
  autolocalized.

If you're not planning to revise your RFC, let me know so that I can
ask the librarian to mark it as withdrawn.




Re: RFC 158 (v1) Regular Expression Special Variables

2000-09-11 Thread Hugo

Mark-Jason Dominus writes:
: There's also long been talk/thought about making $& and $1 
: and friends magic aliases into the original string, which would
: save that cost.
:
:Please correct me if I'm mistaken, but I believe that that's the way
:they are implemented now.  A regex match populates the ->startp and
:->endp parts of the regex structure, and the elements of these items
:are byte offsets into the original string.

I went on a briefish trawl for this the other day, and as far as I
can tell what happens is this:
- during matching, the startp/endp pairs are populated with offsets
into the target string
- immediately after matching, the target string is copied if needed,
and the PL_curpm object is updated to refer to the copy
- the copy is needed if any of the special variables can be referred
to: $`, $&, $', $1, $2, ...

The result of that is that if there are backreferences in the regexp,
the copy is always needed; if not, the copy is needed only if $& or
her kin have been seen. So regexps with backrefs should suffer no
slowdown from use of $& in the same program, but regexps without
backrefs will get a (potentially) unnecessary copy.
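The startp/endp offsets described above are visible from Perl code (5.6 and later) as the arrays @- and @+. This is a minimal sketch, with a hypothetical helper captures_from_offsets, that rebuilds $& and the captures from those offsets with substr. That is the "alias into the original string" approach. It is correct only while the target string is unchanged, which is the case the post-match copy protects against.

```perl
use strict;
use warnings;

# $-[n] / $+[n] are the start / end offsets of group n in the target,
# with n == 0 meaning the whole match ($&). Slicing the target with them
# only works while the target is unmodified.
sub captures_from_offsets {
    my ($target, $re) = @_;
    return unless $target =~ $re;
    my @start = @-;    # copy the offsets before any later match resets them
    my @end   = @+;
    return map {
        defined $start[$_] ? substr($target, $start[$_], $end[$_] - $start[$_]) : undef
    } 0 .. $#start;
}

my @parts = captures_from_offsets("hello world", qr/(o)\s(w)/);
print join("|", @parts), "\n";   # o w|o|w
```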

The other problem with this, of course, is that the compiler may not
yet have seen the $& we intend to use:
  crypt% perl -wle '$_="foo"; /.*/; $_="bar"; print eval q{$&}'
  bar
  crypt% 
.. and I think coredumps may be possible from this. (Hmm, perlbug
upcoming.)

Hugo



Re: XML/HTML-specific ? and ? operators?

2000-09-11 Thread Mark-Jason Dominus


 : it looks worse and dumps core.
 
 That's because the first non-paren forces it to recurse into the
 second branch until you hit REG_INFTY or overflow the stack. Swap
 second and third branches and you have a better chance:

I think something else goes wrong there too.  


   $re = qr{...}
 (I haven't checked that there aren't other problems with it, though.)

Try this:

"(x)(y)" =~ /^$re$/;

This should match, but it dumps core.  I don't think there is infinite
recursion, although I might be mistaken.

Anyway, Snobol has a nice heuristic to prevent infinite recursion in
cases like this, but I'm not sure it's applicable to the way the Perl
regex engine works.  I will think about it.




Re: XML/HTML-specific ? and ? operators?

2000-09-11 Thread Mark-Jason Dominus


 :Anyway, Snobol has a nice heuristic to prevent infinite recursion in
 :cases like this, but I'm not sure it's applicable to the way the Perl
 :regex engine works.  I will think about it.
 
 It is probably worth adding the heuristic above: anytime you recurse
 into the same re at the same position, there is an infinite loop.


That is basically it, except that in SNOBOL it is inside out: each
recursively interpolated pattern is assumed to match a string of at
least length 1, and if the remaining part of the target string isn't
sufficiently long to match the rest of the pattern after recursion,
then the recursion is skipped.
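A toy sketch of the SNOBOL-style rule described above. It applies the rule to the left-recursive pattern R := R 'a' | 'a'. The grammar, the continuation-passing matcher and the names match_R and matches_whole are hypothetical illustrations, not Perl's regex engine. A naive backtracking matcher tries "R 'a'" first and recurses into R at the same position forever. Here each recursive R is assumed to consume at least one character. The recursion is skipped when the rest of the target is too short for that plus whatever must follow.

```perl
use strict;
use warnings;

# Match R := R 'a' | 'a' at $pos, then call continuation $k with the end
# position. $need is the minimum number of characters the pattern still
# requires after this R returns (from the callers' continuations).
sub match_R {
    my ($s, $pos, $need, $k) = @_;

    # Alternative 1: R 'a'. The trailing 'a' adds 1 to what must follow
    # the inner R, and the inner R is assumed to take at least 1 char.
    my $inner_need = $need + 1;
    if (length($s) - $pos >= 1 + $inner_need) {
        my $r = match_R($s, $pos, $inner_need, sub {
            my $p = shift;
            return substr($s, $p, 1) eq 'a' ? $k->($p + 1) : undef;
        });
        return $r if defined $r;
    }

    # Alternative 2: a single 'a'.
    return substr($s, $pos, 1) eq 'a' ? $k->($pos + 1) : undef;
}

# Anchored match: succeed only if R ends exactly at the end of $s.
sub matches_whole {
    my ($s) = @_;
    my $end = length $s;
    return match_R($s, 0, 0, sub { $_[0] == $end ? 1 : undef }) ? 1 : 0;
}

print matches_whole($_), "\n" for 'a', 'aaaa', 'aab', '';
```

A check that simply refuses to re-enter R at the same position would also stop the loop. However, it would never get to try the "R 'a'" branch here, so it could not match "aa". The length bound instead lets the recursion go as deep as the target allows. The rule is only safe for sub-patterns that cannot match the empty string, as R cannot.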