It seems to me that there are at least two important things missing
from this proposal.
1. There is no substantive rationale presented for why the change
would be desirable.
The only reasons you put forth are:
* The syntax is ugly and unintuitive.
Ugliness is a matter of opinion, and I d
.
135 (v1): Require explicit m on matches, even with ?? and // as delimiters.
C<?...?> and C</.../> are what makes Perl hard to tokenize.
Requiring them to be written C<m?...?> and C<m/.../> would
solve this.
Mark-Jason Dominus [EMAIL PROTECTED]
I am
> I'm not concerned about / being mistaken for division, since that
> ambiguity already exists with bare /pat/ matches.
Yes, but the current ambiguity is resolved from context in a rather
complicated way. Nevertheless it turns out that Perl does the right
thing in most cases. You are proposin
> What exactly is matched by \g and \G is controlled by two new special
> variables, @^g and @^G, which are arrays of strings.
These sorts of global variables have been a problem in the past.
Since they change the meaning of the \g and \G escapes, I think they
should be pragmas or some other de
> >I propose that this 'last successful match' behavior be discarded
> >entirely, and that an empty pattern always match the empty string.
>
> I don't see a consideration for simply s/successful// above, which
> has also been talked about.
Thanks, I will add this to the next version. I did c
> There's also long been talk/thought about making $& and $1
> and friends magic aliases into the original string, which would
> save that cost.
Please correct me if I'm mistaken, but I believe that that's the way
they are implemented now. A regex match populates the ->startp and
->endp parts
> >Please correct me if I'm mistaken, but I believe that that's the way
> >they are implemented now. A regex match populates the ->startp and
> >->endp parts of the regex structure, and the elements of these items
> >are byte offsets into the original string.
>
> I haven't looked at it at all
> But maybe the effect of $& is greatly exaggerated or is a relic from
> perl4? Has anyone actually benchmarked this recently?
Matching with $& enabled is about 40% slower.
http://www.plover.com/~mjd/perl/amper.pl
> > $count = () = $string =~ /pattern/g;
>
> Which I find cute as a demonstration of Perl's context concept,
> but ugly as hell from a usability viewpoint.
I'd really like to see an RFC that looks into making the following
features more orthogonal:
1. Return the number of match
> Drawing on some of the proposals for extended 'for' syntax:
> for my($mo, $dy, $yr) ($string =~ /(\d\d)-(\d\d)-(\d\d)/g) {
> ...
> }
>
> This still requires that you know how many () matching groups are in
> the RE, of course. I don't think I would consider that onerous.
If the rege
> > 1. Return the number of matches
> >
> > 2. Iterate over each match in sequence
> >
> > 3. Return list of all matches
> >
> > 4. Return a list of backreferences
>
> Please see RFC 164. It can handle all of 1-3.
You seem to have missed my point. I'm not ask
> Make your suggestions. But I think it is all off-base. None of this is
> addressing some improvement in working conditions, ease of use, problems
> in the language, etc.
1. I don't agree.
2. This mailing list is also for discussing stylistic improvements to
the language.
3. If you thin
Richard Proctor's RFC166 says:
> =head2 Matching Not a pattern
>
> (?^pattern) matches anything that does not match the pattern. On
> its own, one can use !~ etc to negatively match patterns, but to
> match a pattern that has foo(anything but not baz)bar is currently
> difficult. With this sy
foo(?:)bar/ to get what you wanted. This
is almost identical to what Richard proposed anyway.
It is really not clear to me that this problem needs to be solved any
better than it is already.
I suggest that this section be removed from the RFC.
Mark-Jas
> /t is suggested for "counT", as /c is already taken. Using /t
> without /g would result in only 0 or 1 being returned, which is
> nearly the existing syntax.
It occurs to me that since none of the capital letters are taken, we
could adopt the convention that a capital letter as a regex mod
> On Mon, 28 Aug 2000, Mark-Jason Dominus wrote:
>
> > But there is no convenient way to run the loop once for each date and
> > split the dates into pieces:
> >
> > # WRONG
> > while (($mo, $dy, $yr) =
> On Tue, 29 Aug 2000 08:47:25 -0400, Mark-Jason Dominus wrote:
>
> >m/.../Count,Insensitive (instead of m/.../ti)
> >
> >That would escape the problem that we are running out of letters and
> >also the problem that the current letters are hard to remembe
> Mark-Jason Dominus wrote:
> >
> > m/.../Count (instead of m/.../t)
> > m/.../iCount (instead of m/.../it)
> > m/.../Count,i (instead of m/.../ti)
> > m/.../Count,Insensitive (instead of
should investigate several
solutions in parallel, and should compare them with one another and
contrast the benefits and drawbacks of each one.
OK, I think this discussion should be closed.
Richard should add a section to RFC110 that discusses the
$count = () = m/PAT/g;
locution and its advantages and disadvantages compared to his
proposal, duly taking into account the many valuable comments that
have been made.
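
For reference, the locution can be sketched as follows. This is only an illustration; the sample string and pattern are made up, not taken from the RFC. The empty list () puts the match in list context, and the outer scalar assignment then receives the number of elements the match returned.

```perl
use strict;
use warnings;

my $string = 'a1b22c333';                # hypothetical input

# m//g in list context returns every match; assigning that list to ()
# and then to a scalar yields the number of elements on the right side.
my $count = () = $string =~ /\d+/g;

print "$count\n";
```

The same value could be had with a temporary array (my @m = $string =~ /\d+/g; scalar @m), which is the readability trade-off the thread is arguing about.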
Thanks to eve
> This is going to need a much better definition...
Yes, that was my point.
I snipped the following discussion, in which you argued against a
suggestion that I advanced only as an example of something that would
not work.
> (?^baz) should behave as (.*)(?{$1 !~ /baz/})
I don't think that's go
> Would there be any interest in adding these two ideas to this RFC:
>
> 1) tr is not a regex function, so it should be regularized to
>
>tr(SEARCH, REPLACE, MOD, STR)
MOD should be last, because you're frequently going to want to omit MOD.
But I think this is worth discussing further, be
> =head1 IMPLEMENTATION
>
> No idea, but should be straightforward.
I think the reason this hasn't been done before is because it's *not*
quite straightforward.
The way tr/// works is that a 256-byte table is constructed at compile
time that says for each input character what output character is
> When does the structure get built? That's why eg. tr[a-z][A-Z]
> brooks no variables, for it is solely at compile time that these
> things occur, and why you must resort to delayed compilation via
> eval qq/.../ to prod the compiler into building you a new one.
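
The delayed-compilation workaround described above can be sketched like this. The variable names and sample text are hypothetical, and the ranges must come from a trusted source, because they are pasted into code that gets compiled.

```perl
use strict;
use warnings;

my ($from, $to) = ('a-z', 'A-Z');        # hypothetical ranges known only at run time
my $text = 'hello, world';

# tr/// does not interpolate variables, so the operator is written out as
# source text and compiled once at run time with a string eval.
my $upcase = eval "sub { (my \$s = shift) =~ tr/$from/$to/; \$s }"
    or die "could not compile tr: $@";

print $upcase->($text), "\n";
```

Wrapping the tr in an anonymous sub means the table is built once per eval rather than once per call.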
Certainly. But if there were
> Accepting variables in tr// makes no sense. It defeats the purpose of
> tr/// - extremely fast, known transliterations.
The proposal extends tr/// to handle extremely fast transliterations
whose nature is not known at compile time.
>
> tr///e is the same as s///g:
>
> tr/$foo/$bar/e ==
> Note that the 256-byte thing is out the window with Unicode, but that
> I no longer know how it is done.
Thanks. I was going to mention that, but I forgot before I sent the
message. The 256-byte thing is still in place with Unicode, but it's
only used on byte strings, not on UTF8 strings. S
> One thing to be careful of there is thread safety. You can't hand
> the data off the syntax node (the one with the tr op on it), because
> tr/$foo/$bar/ wouldn't work for several threads in it at the same
> time then.
Certainly, but that is true for everything else that is in the op
node, whi
> >>solution to execute perl code inside a string, replacing "${\(...)}" and
> >
> >The first one doesn't work, and never did. You want
> >@{[]} and @{[scalar ]} instead.
>
> "Doesn't work"?
I think what Tom means is that (for example)
print "${\(localtime())}\n";
does not p
> > The way tr/// works is that a 256-byte table is constructed at compile
> > time that says for each input character what output character is
>
> Speaking of which, what's going to happen when there are more than 256
> values to map?
It's already happened, but I forget the details.
> On Tue, 29 Aug 2000, Mark-Jason Dominus wrote:
>
> > OK, I think this discussion should be closed.
>
> I think the bit about "having a special array containing all captured
> matches" might well still live on. The "counting" bit _per se_ is probably
> Ok, I can understand that. But, what happens when we get to UTF16? Aren't
> we talking about 256k per tr///, then? That seems like a lot of memory
> that is potentially wasted and could lead to some really large footprints.
I don't understand what this discussion has to do with this mailing
The big thing I find missing from this RFC is compelling examples.
You are proposing a major change to the regex engine but you only have
two examples. Both involve only fixed strings and one of them is
artificial. I really think you need to discuss in more detail why
this feature would be usef
> (mystery: how
> can filling in $& be a lot slower than filling in $1?)
It isn't. It's the same. $1 might even be more expensive than $&.
It appears that many people don't understand the problem with $&. I
will try to explain.
Maintaining the information required by $1 or $& slows down the
> I am unencumbered by any knowledge of the regex engine implementation,
Yeah.
But I do know something about it, and I have already expressed my
informed opinion. Having you come along to say that you don't know
anything about it at all, but that you nevertheless think I am
mistaken, is bizar
> MD> One of Uri's suggestions in RFC 158 was to compute $& only for
> MD> regexes that have a /k modifier. This would solve the $& problem
> MD> because Perl would compute $& only when asked to, and not for
> MD> every other regex in the rest of the program.
>
> the rfc was about makin
> in any case, i think we have a fair agreement on rfc 158 and i will
> freeze it if there are no further comments on it.
In light of this:
$& The string matched by the last successful pattern match (not
counting any matches hidden within a BLOCK or eval() enclosed
by the
> On Thu, Aug 31, 2000 at 12:34:05PM -0400, Mark-Jason Dominus wrote:
> >
> > perl6-language-regex
> >
> > Summary report 2831
> >
> > RFC 72: The regexp engine should go backward as well as
> > forward. (Peter Heslin)
> >
>
There was no discussion of this.
RFC 170: Generalize =~ to a special-purpose assignment operator
(Nathan Wiger)
This is probably the most interesting and far-reaching RFC proposed
this week, but there was essentially no discussion.
Mark-Jason Dominus [EMAIL PROTECTED]
I am boycotting Amazon. See http://www.plover.com/~mjd/amazon.html for details.
> >...My point is that I think we're approaching this
> >the wrong way. We're trying to apply more and more parser power into what
> >classically has been the lexer / tokenizer, namely our beloved
> >regular-expression engine.
I've been thinking the same thing. It seems to me that the attempts
> >>>>> "Mark-Jason" == Mark-Jason Dominus <[EMAIL PROTECTED]> writes:
>
> Mark-Jason> I have some ideas about how to do this, and I will try to
> Mark-Jason> write up an RFC this week.
>
> "You want Icon, you know where to find i
> > 2. Many people - including Larry - have voiced their desire
> > to see =~ die a horrible death
>
> Please provide a look-up-able reference to Larry's saying that he
> wanted =~ to die a horrible death.
Larry said:
# Well, the fact is, I've been thinking about possible ways to get
> I think what is needed is something along the line of :
Joe McMahon and I are working on something along these lines.
> Simply put, I want variable-length lookbehind.
Why didn't you simply propose that the (?<...) operator be fixed to
support variable-length expressions? Why so much additional machinery?
> As to your contention that "at best" (?r) will defeat many present
> optimizations, can you tell me why this will necessarily be so in the
> new engine?
Let me explain my thinking along these lines. I've made a number of
assumptions, which may not be correct, and certainly aren't obvious.
I
> I propose adding the first para as a note and moving RFC to frozen soon.
You did not address my points about tr///o and related issues.
I suggest that you submit a revised RFC and then freeze it a week
afterwards if there is still no discussion.
> (?@foo) is sort of equivalent to (??{join('|',@foo)}), ie it expands into a
> list of alternatives. One could possibly use just @foo, for this.
It just occurs to me that this is already possible. I've written a
module, 'atq', such that if you write
use atq;
then your regexes may co
I have some trouble understanding just what the proposal is, since the
RFC doesn't contain any examples. But I gather that you want to usurp
*both* the (...) and the [...] notation for numeric ranges.
This would change the meaning of any code that happened to contain a
regex like this:
> > in any case, i think we have a fair agreement on rfc 158 and i will
> > freeze it if there are no further comments on it.
>
> I think you should remove the parts of your proposal about making $& be
> autolocalized.
If you're not planning to revise your RFC, let me know so that I can
> I propose adding this note. His preference for the working of
> /t and /g seems the most appropriate. Unless I hear any further
> discussion I propose moving this RFC to frozen this week.
Please post a complete, revised version of the RFC *before* you freeze it.
> : it looks worse and dumps core.
>
> That's because the first non-paren forces it to recurse into the
> second branch until you hit REG_INFTY or overflow the stack. Swap
> second and third branches and you have a better chance:
I think something else goes wrong there too.
> $re = qr{...
> In theory, all letters should be reserved to map to future flags for
> the same reason.
My recollection is that Larry specifically mandated this, and that's
why (?p...) was changed to (??...) in 5.6.0.
> :Anyway, Snobol has a nice heuristic to prevent infinite recursion in
> :cases like this, but I'm not sure it's applicable to the way the Perl
> :regex engine works. I will think about it.
>
> It is probably worth adding the heuristic above: anytime you recurse
> into the same re at the same
> > (The \ is necessary here because (?@foo) already has a meaning under
> > Perl 5, and I think your proposal must address this.)
>
> (?@foo) has no meaning; I checked the code.
I don't know what you mean, but you're mistaken, because it means to
interpolate @foo as in a double-quoted string.
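
A minimal sketch of that interpolation, using a non-capturing group (?:...) rather than (?@foo) itself, and a made-up array. The array is joined with $" exactly as it would be in a double-quoted string, so setting $" to '|' turns it into a list of alternatives.

```perl
use strict;
use warnings;

my @foo = qw(cat dog bird);              # hypothetical list of alternatives

my ($plain, $hit, $miss);
{
    local $" = '|';                      # separator used when an array is interpolated
    $plain = "@foo";                     # same rule as inside a pattern
    $hit   = 'dog'  =~ /\A(?:@foo)\z/ ? 1 : 0;
    $miss  = 'fish' =~ /\A(?:@foo)\z/ ? 1 : 0;
}

print "$plain hit=$hit miss=$miss\n";
```

Note that the elements are not quoted in any way, so metacharacters in @foo keep their regex meaning.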
perl6-language-regex
Summary report 2911
RFC 72: The regexp engine should go backward as well as
forward. (Peter Heslin)
The author sent a revised version of the RFC. There seem to be two ideas
here:
1. The lookbehind assertions should work for variable-length
patterns. (At pre
> On Tue, 12 Sep 2000 19:01:35 -0400, Mark-Jason Dominus wrote:
>
> >I don't know what you mean, but you're mistaken, because it means to
> >interpolate @foo as in a double-quoted string.
>
> Which is precisely the meaning he wants for it, with $" set t
> (?Q$foo) Quotes the contents of the scalar $foo - equivalent to
> (??{ quotemeta $foo }).
How is this different from
\Q$foo\E
?
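
For comparison, here is a small sketch of the existing \Q...\E escape next to an explicit quotemeta call. The sample value is invented for illustration. Both quoted forms treat the contents of $foo as literal text, while the unquoted form lets . and * act as metacharacters.

```perl
use strict;
use warnings;

my $foo = 'a.b*c';                       # hypothetical value with metacharacters

my $escaped  = qr/\A\Q$foo\E\z/;         # quoting done by the \Q...\E escape
my $quoted   = quotemeta $foo;           # quoting done by the function
my $function = qr/\A$quoted\z/;
my $unquoted = qr/\A$foo\z/;             # no quoting: . and * keep their regex meaning

for my $candidate ('a.b*c', 'aXbbc') {
    printf "%-6s \\Q..\\E=%d quotemeta=%d unquoted=%d\n",
        $candidate,
        ($candidate =~ $escaped)  ? 1 : 0,
        ($candidate =~ $function) ? 1 : 0,
        ($candidate =~ $unquoted) ? 1 : 0;
}
```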
> I lie: the other reason qr{} currently doesn't behave like that is that
> when we interpolate a compiled regexp into a context that requires it be
> recompiled,
Interpolated qr() items shouldn't be recompiled anyway. They should
be treated as subroutine calls. Unfortunately, this requires a
I think the proposal that Joe McMahon and I are finishing up now will
make these obsolete anyway.
> On Mon, Sep 25, 2000 at 08:56:47PM +0000, Mark-Jason Dominus wrote:
> > I think the proposal that Joe McMahon and I are finishing up now will
> > make these obsolete anyway.
>
> Good! The less I have to maintain the better...
Sorry, I meant that it would make (??...)