Re: RFC 138 (v1) Eliminate =~ operator.

2000-08-23 Thread Mark-Jason Dominus
It seems to me that there are at least two important things missing from this proposal. 1. There is no substantive rationale presented for why the change would be desirable. The only reasons you put forth are: * The syntax is ugly and unintuitive. Ugliness is a matter of opinion, and I d

Summary of regex-related RFCs so far

2000-08-23 Thread Mark-Jason Dominus
. 135 (v1): Require explicit m on matches, even with ?? and // as delimiters. C<?...?> and C</.../> are what make Perl hard to tokenize. Requiring them to be written C<m?...?> and C<m/.../> would solve this. Mark-Jason Dominus [EMAIL PROTECTED] I am

Re: RFC 138 (v1) Eliminate =~ operator.

2000-08-23 Thread Mark-Jason Dominus
> I'm not concerned about / being mistaken for division, since that > ambiguity already exists with bare /pat/ matches. Yes, but the current ambiguity is resolved from context in a rather complicated way. Nevertheless it turns out that Perl does the right thing in most cases. You are proposin

Re: RFC 145 (v1) Brace-matching for Perl Regular Expressions

2000-08-24 Thread Mark-Jason Dominus
> What exactly is matched by \g and \G is controlled by two new special > variables, @^g and @^G, which are arrays of strings. These sorts of global variables have been a problem in the past. Since they change the meaning of the \g and \G escapes, I think they should be pragmas or some other de

Re: RFC 144 (v1) Behavior of empty regex should be simple

2000-08-24 Thread Mark-Jason Dominus
> >I propose that this 'last successful match' behavior be discarded > >entirely, and that an empty pattern always match the empty string. > > I don't see a consideration for simply s/successful// above, which > has also been talked about. Thanks, I will add this to the next version. I did c

Re: RFC 158 (v1) Regular Expression Special Variables

2000-08-25 Thread Mark-Jason Dominus
> There's also long been talk/thought about making $& and $1 > and friends magic aliases into the original string, which would > save that cost. Please correct me if I'm mistaken, but I believe that that's the way they are implemented now. A regex match populates the ->startp and ->endp parts

Re: RFC 158 (v1) Regular Expression Special Variables

2000-08-25 Thread Mark-Jason Dominus
> >Please correct me if I'm mistaken, but I believe that that's the way > >they are implemented now. A regex match populates the ->startp and > >->endp parts of the regex structure, and the elements of these items > >are byte offsets into the original string. > > I haven't looked at it at all

Re: RFC 158 (v1) Regular Expression Special Variables

2000-08-25 Thread Mark-Jason Dominus
> But maybe the effect of $& is greatly exaggerated or is a relic from > perl4? Has anyone actually benchmarked this recently? Matching with $& enabled is about 40% slower. http://www.plover.com/~mjd/perl/amper.pl

Re: RFC 110 (v3) counting matches

2000-08-28 Thread Mark-Jason Dominus
> > $count = () = $string =~ /pattern/g; > > Which I find cute as a demonstration of the Perl's context concept, > but ugly as hell from usability viewpoint. I'd really like to see an RFC that looks into making the following features more orthogonal: 1. Return the number of match
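The counting idiom quoted above relies on list assignment returning the number of right-hand elements when evaluated in scalar context. A minimal Perl 5 sketch, with an illustrative sample string and pattern that are assumptions and not taken from the thread:

```perl
use strict;
use warnings;

# Sample data (an assumption for illustration only).
my $string = "10-31-99 and 01-01-00";

# List assignment in scalar context yields the number of elements on
# the right-hand side, so this counts the matches without keeping them.
my $count = () = $string =~ /\d\d-\d\d-\d\d/g;

# The same match in plain list context returns the matches themselves.
my @all = $string =~ /\d\d-\d\d-\d\d/g;

print "count: $count\n";
print "match: $_\n" for @all;
```

The empty list on the left forces list context on the match, which is why every match is found before the count is taken.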

Re: RFC 110 (v3) counting matches

2000-08-28 Thread Mark-Jason Dominus
> Drawing on some of the proposals for extended 'for' syntax: > for my($mo, $dy, $yr) ($string =~ /(\d\d)-(\d\d)-(\d\d)/g) { > ... > } > > This still requires that you know how many () matching groups are in > the RE, of course. I don't think I would consider that onerous. If the rege

Re: RFC 110 (v3) counting matches

2000-08-28 Thread Mark-Jason Dominus
> > 1. Return the number of matches > > > > 2. Iterate over each match in sequence > > > > 3. Return list of all matches > > > > 4. Return a list of backreferences > > Please see RFC 164. It can handle all of 1-3. You seem to have missed my point. I'm not ask

Re: RFC 164 (v1) Replace =~, !~, m//, and s/// with match() and subst()

2000-08-29 Thread Mark-Jason Dominus
> Make your suggestions. But I think it is all off-base. None of this is > addressing some improvement in working conditions, ease of use, problems > in the language, etc. 1. I don't agree. 2. This mailing list is also for discussing stylistic improvements to the language. 3. If you thin

RFC 166 (does-not-match)

2000-08-29 Thread Mark-Jason Dominus
Richard Proctor's RFC166 says: > =head2 Matching Not a pattern > > (?^pattern) matches anything that does not match the pattern. On > its own, one can use !~ etc to negatively match patterns, but to > match a pattern that has foo(anything but not baz)bar is currently > difficult. With this sy

RFC 166 (disambiguator)

2000-08-29 Thread Mark-Jason Dominus
foo(?:)bar/ to get what you wanted. This is almost identical to what Richard proposed anyway. It is really not clear to me that this problem needs to be solved any better than it is already. I suggest that this section be removed from the RFC. Mark-Jas

Re: RFC 110 (v2) counting matches

2000-08-29 Thread Mark-Jason Dominus
> /t is suggested for "counT", as /c is already taken. Using /t > without /g would result in only 0 or 1 being returned, which is > nearly the existing syntax. It occurs to me that since none of the capital letters are taken, we could adopt the convention that a capital letter as a regex mod

Re: RFC 110 (v3) counting matches

2000-08-29 Thread Mark-Jason Dominus
> On Mon, 28 Aug 2000, Mark-Jason Dominus wrote: > > > But there is no convenient way to run the loop once for each date and > > split the dates into pieces: > > > > # WRONG > > while (($mo, $dy, $yr) =
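The snippet above is cut off before the loop is complete, so the following is not a reconstruction of it. It is only a sketch of the usual Perl 5 way to run a loop once per date and split each date into pieces: a scalar-context /g match in the loop condition, with the captures read from $1, $2 and $3. The sample string is an assumption.

```perl
use strict;
use warnings;

# Sample data (an assumption for illustration only).
my $string = "born 10-31-99, moved 01-15-00";

my @dates;
# In scalar context, /g resumes from where the previous match ended,
# so the loop body runs once per date found in the string.
while ($string =~ /(\d\d)-(\d\d)-(\d\d)/g) {
    my ($mo, $dy, $yr) = ($1, $2, $3);
    push @dates, "$mo/$dy/$yr";
}

print "$_\n" for @dates;
```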

Re: RFC 110 (v2) counting matches

2000-08-29 Thread Mark-Jason Dominus
> On Tue, 29 Aug 2000 08:47:25 -0400, Mark-Jason Dominus wrote: > > >m/.../Count,Insensitive (instead of m/.../ti) > > > >That would escape the problem that we are running out of letters and > >also the problem that the current letters are hard to remembe

Re: RFC 110 (v2) counting matches

2000-08-29 Thread Mark-Jason Dominus
> Mark-Jason Dominus wrote: > > > > m/.../Count (instead of m/.../t) > > m/.../iCount (instead of m/.../it) > > m/.../Count,i (instead of m/.../ti) > > m/.../Count,Insensitive (instead of

Overlapping RFCs 135 138 164

2000-08-29 Thread Mark-Jason Dominus
should investigate several solutions in parallel, and should compare them with one another and contrast the benefits and drawbacks of each one.

Re: RFC 110 (v3) counting matches

2000-08-29 Thread Mark-Jason Dominus
OK, I think this discussion should be closed. Richard should add a section to RFC110 that discusses the $count = () = m/PAT/g; locution and its advantages and disadvantages compared to his proposal, duly taking into account the many valuable comments that have been made. Thanks to eve

Re: RFC 166 (does-not-match)

2000-08-29 Thread Mark-Jason Dominus
> This is going to need a much better definition... Yes, that was my point. I snipped the following discussion, in which you argued against a suggestion that I advanced only as an example of something that would not work. > (?^baz) should behave as (.*)(?{$1 !~ /baz/}) I don't think that's go

Re: RFC 165: Allow variables in a tr///

2000-08-29 Thread Mark-Jason Dominus
> Would there be any interest in adding these two ideas to this RFC: > > 1) tr is not regex function, so it should be regularized to > >tr(SEARCH, REPLACE, MOD, STR) MOD should be last, because you're frequently going to want to omit MOD. But I think this is worth discussing further, be

Re: RFC 165 (v1) Allow Varibles in tr///

2000-08-29 Thread Mark-Jason Dominus
> =head1 IMPLEMENTATION > > No idea, but should be straight forward. I think the reason this hasn't been done before is because it's *not* quite straightforward. The way tr/// works is that a 256-byte table is constructed at compile time that says for each input character what output character is

Re: RFC 165: Allow variables in a tr///

2000-08-29 Thread Mark-Jason Dominus
> When does the structure get built? That's why eg. tr[a-z][A-Z] > brooks no variables, for it is solely at compile time that these > things occur, and why you must resort to delayed compilation via > eval qq/.../ to prod the compiler into building you a new one. Certainly. But if there were
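The delayed-compilation workaround mentioned in the quoted text can be sketched as below. The helper name make_tr is hypothetical, and the sketch assumes the two character lists come from a trusted source, since they are pasted directly into code that is then compiled with a string eval.

```perl
use strict;
use warnings;

# Hypothetical helper: compile a tr/// at run time by building its
# source text and handing it to string eval. Trusted input only:
# $from and $to are interpolated into code without any escaping.
sub make_tr {
    my ($from, $to) = @_;
    my $code = "sub { (my \$s = shift) =~ tr/$from/$to/; \$s }";
    my $sub  = eval $code or die "bad tr spec: $@";
    return $sub;
}

# The translation table is built once, when eval compiles the closure.
my $upcase = make_tr('a-z', 'A-Z');
print $upcase->("hello"), "\n";
```

Each call to make_tr pays the compilation cost once; the returned closure can then be reused as often as needed.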

Re: RFC 165 (v1) Allow Varibles in tr///

2000-08-29 Thread Mark-Jason Dominus
> Accepting variables in tr// makes no sense. It defeats the purpose of > tr/// - extremely fast, known transliterations. The proposal extends tr/// to handle extremely fast transliterations whose nature is not known at compile time. > > tr///e is the same as s///g: > > tr/$foo/$bar/e ==

Re: RFC 165 (v1) Allow Varibles in tr///

2000-08-29 Thread Mark-Jason Dominus
> Note that the 256-byte thing is out the window with Unicode, but that > I no longer know how it is done. Thanks. I was going to mention that, but I forgot before I sent the message. The 256-byte thing is still in place with Unicode, but it's only used on byte strings, not on UTF8 strings. S

Re: RFC 165: Allow variables in a tr///

2000-08-29 Thread Mark-Jason Dominus
> One thing to be careful of there is thread safety. You can't hang > the data off the syntax node (the one with the tr op on it), because > tr/$foo/$bar/ wouldn't work for several threads in it at the same > time then. Certainly, but that is true for everything else that is in the op node, whi

Re: RFC 110 (v3) counting matches

2000-08-29 Thread Mark-Jason Dominus
> >>solution to execute perl code inside a string, replacing "${\(...)}" and > > > >The first one doesn't work, and never did. You want > >@{[]} and @{[scalar ]} instead. > > "Doesn't work"? I think what Tom means is that (for example) print "${\(localtime())}\n"; does not p

Re: RFC 165 (v1) Allow Varibles in tr///

2000-08-29 Thread Mark-Jason Dominus
> > The way tr/// works is that a 256-byte table is constructed at compile > > time that says for each input character what output character is > > Speaking of which, what's going to happen when there are more than 256 > values to map? It's already happened, but I forget the details.

Re: RFC 110 (v3) counting matches

2000-08-30 Thread Mark-Jason Dominus
> On Tue, 29 Aug 2000, Mark-Jason Dominus wrote: > > > OK, I think this discussion should be closed. > > I think the bit about "having a special array containing all captured > matches" might well still live on. The "counting" bit _per se_ is probably &

Re: RFC 165 (v1) Allow Varibles in tr///

2000-08-30 Thread Mark-Jason Dominus
> Ok, I can understand that. But, what happens when we get to UTF16? Aren't > we talking about 256k per tr///, then? That seems like a lot of memory > that is potentially wasted and could lead to some really large footprints. I don't understand what this discussion has to do with this mailing

Re: RFC 72 (v1) The regexp engine should go backward as well as forward.

2000-08-30 Thread Mark-Jason Dominus
The big thing I find missing from this RFC is compelling examples. You are proposing a major change to the regex engine but you only have two examples. Both involve only fixed strings and one of them is artificial. I really think you need to discuss in more detail why this feature would be usef

Re: RFC 110 (v3) counting matches

2000-08-31 Thread Mark-Jason Dominus
> (mystery: how > can filling in $& be a lot slower than filling in $1?) It isn't. It's the same. $1 might even be more expensive than $&. It appears that many people don't understand the problem with $&. I will try to explain. Maintaining the information required by $1 or $& slows down the

Re: RFC 72 (v1) The regexp engine should go backward as well as forward.

2000-08-31 Thread Mark-Jason Dominus
> I am unencumbered by any knowledge of the regex engine implementation, Yeah. But I do know something about it, and I have already expressed my informed opinion. Having you come along to say that you don't know anything about it at all, but that you nevertheless think I am mistaken, is bizar

Re: $& and copying: rfc 158 (was Re: RFC 110 (v3) counting matches)

2000-08-31 Thread Mark-Jason Dominus
> MD> One of Uri's suggestions in RFC 158 was to compute $& only for > MD> regexes that have a /k modifier. This would solve the $& problem > MD> because Perl would compute $& only when asked to, and not for > MD> every other regex in the rest of the program. > > the rfc was about makin

Re: $& and copying: rfc 158 (was Re: RFC 110 (v3) counting matches)

2000-08-31 Thread Mark-Jason Dominus
> in any case, i think we have a fair agreement on rfc 158 and i will > freeze it if there is no further comments on it. In light of this: $& The string matched by the last successful pattern match (not counting any matches hidden within a BLOCK or eval() enclosed by the

Re: perl6-language-regex summary for 20000831

2000-09-01 Thread Mark-Jason Dominus
> On Thu, Aug 31, 2000 at 12:34:05PM -0400, Mark-Jason Dominus wrote: > > > > perl6-language-regex > > > > Summary report 20000831 > > > > RFC 72: The regexp engine should go backward as well as > > forward. (Peter Heslin) > > >

perl6-language-regex summary for 20000831

2000-08-31 Thread Mark-Jason Dominus
There was no discussion of this. RFC 170: Generalize =~ to a special-purpose assignment operator (Nathan Wiger) This is probably the most interesting and far-reaching RFC proposed this week, but there was essentially no discussion.

Re: XML/HTML-specific ?< and ?> operators? (was Re: RFC 145 (alternate approach))

2000-09-06 Thread Mark-Jason Dominus
> >...My point is that I think we're approaching this > >the wrong way. We're trying to apply more and more parser power into what > >classically has been the lexer / tokenizer, namely our beloved > >regular-expression engine. I've been thinking the same thing. It seems to me that the attempts

Re: XML/HTML-specific ?< and ?> operators? (was Re: RFC 145 (alternate approach))

2000-09-06 Thread Mark-Jason Dominus
> >>>>> "Mark-Jason" == Mark-Jason Dominus <[EMAIL PROTECTED]> writes: > > Mark-Jason> I have some ideas about how to do this, and I will try to > Mark-Jason> write up an RFC this week. > > "You want Icon, you know where to find i

Re: What's in a Regex (was RFC 145)

2000-09-07 Thread Mark-Jason Dominus
> > 2. Many people - including Larry - have voiced their desire > > to see =~ die a horrible death > > Please provide a look-up-able reference to Larry's saying that he > wanted =~ to die a horrible death. Larry said: # Well, the fact is, I've been thinking about possible ways to get

Re: XML/HTML-specific ?< and ?> operators? (was Re: RFC 145 (alternate approach))

2000-09-07 Thread Mark-Jason Dominus
> I think what is needed is something along the line of : Joe McMahon and I are working on something along these lines.

Re: RFC 72 (v1) The regexp engine should go backward as well as forward.

2000-09-11 Thread Mark-Jason Dominus
> Simply put, I want variable-length lookbehind. Why didn't you simply propose that the (?<...) operator be fixed to support variable-length expressions? Why so much additional machinery?

Re: RFC 72 (v1) The regexp engine should go backward as well as forward.

2000-09-11 Thread Mark-Jason Dominus
> As to your contention that "at best" (?r) will defeat many present > optimizations, can you tell me why this will necessarily be so in the > new engine? Let me explain my thinking along these lines. I've made a number of assumptions, which may not be correct, and certainly aren't obvious. I

Re: RFC 165: Allow Variables in tr/// (post hugo)

2000-09-11 Thread Mark-Jason Dominus
> I propose adding the first para as a note and moving RFC to frozen soon. You did not address my points about tr///o and related issues. I suggest that you submit a revised RFC and then freeze it a week afterwards if there is still no discussion.

Re: RFC 166 (v1) Additions to regexs

2000-09-11 Thread Mark-Jason Dominus
> (?@foo) is sort of equivalent to (??{join('|',@foo)}), ie it expands into a > list of alternatives. One could possibly use just @foo, for this. It just occurs to me that this is already possible. I've written a module, 'atq', such that if you write use atq; then your regexes may co

Re: RFC 197 (v1) Numberic Value Ranges In Regular Expressions

2000-09-11 Thread Mark-Jason Dominus
I have some trouble understanding just what the proposal is, since the RFC doesn't contain any examples. But I gather that you want to usurp *both* the (...) and the [...] notation for numeric ranges. This would change the meaning of any code that happened to contain a regex like this:

Re: $& and copying: rfc 158 (was Re: RFC 110 (v3) counting matches)

2000-09-11 Thread Mark-Jason Dominus
> > in any case, i think we have a fair agreement on rfc 158 and i will > > freeze it if there is no further comments on it. > > I think you should remove the parts of your proposal about making $& be > autolocalized. If you're not planning to revise your RFC, let me know so that I can

Re: RFC 110 counting matches (post Hugo)

2000-09-11 Thread Mark-Jason Dominus
> I propose adding this note. His preference for the working of > /t and /g seems the most appropriate. Unless I hear any further > discussion I propose moving this RFC to frozen this week. Please post a complete, revised version of the RFC *before* you freeze it.

Re: XML/HTML-specific ?< and ?> operators?

2000-09-11 Thread Mark-Jason Dominus
> : it looks worse and dumps core. > > That's because the first non-paren forces it to recurse into the > second branch until you hit REG_INFTY or overflow the stack. Swap > second and third branches and you have a better chance: I think something else goes wrong there too. > $re = qr{...

Re: what (?x) are in use? (was RFC 145 (alternate approach))

2000-09-11 Thread Mark-Jason Dominus
> In theory, all letters should be reserved to map to future flags for > the same reason. My recollection is that Larry specifically mandated this, and that's why (?p...) was changed to (??...) in 5.6.0.

Re: XML/HTML-specific ?< and ?> operators?

2000-09-11 Thread Mark-Jason Dominus
> :Anyway, Snobol has a nice heuristic to prevent infinite recursion in > :cases like this, but I'm not sure it's applicable to the way the Perl > :regex engine works. I will think about it. > > It is probably worth adding the heuristic above: anytime you recurse > into the same re at the same

Re: RFC 166 (v1) Additions to regexs

2000-09-12 Thread Mark-Jason Dominus
> > (The \ is necessary here because (?@foo) already has a meaning under > > Perl 5, and I think your proposal must address this.) > > (?@foo) has no meaning I checked the code I don't know what you mean, but you're mistaken, because it means to interpolate @foo as in a double-quoted string.

perl6-language-regex summary for 20000911

2000-09-11 Thread Mark-Jason Dominus
perl6-language-regex Summary report 20000911 RFC 72: The regexp engine should go backward as well as forward. (Peter Heslin) The author sent a revised version of the RFC. There seem to be two ideas here: 1. The lookbehind assertions should work for variable-length patterns. (At pre

Re: RFC 166 (v1) Additions to regexs

2000-09-13 Thread Mark-Jason Dominus
> On Tue, 12 Sep 2000 19:01:35 -0400, Mark-Jason Dominus wrote: > > >I don't know what you mean, but you're mistaken, because it means to > >interpolate @foo as in a double-quoted string. > > Which is precisely the meaning he wants for it, with $" set t
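The existing Perl 5 behaviour under discussion, an array interpolated into a regex as in a double-quoted string with its elements joined by the list separator $", can be sketched as follows. The array contents and test words are assumptions for illustration, and the elements are assumed to contain no regex metacharacters.

```perl
use strict;
use warnings;

# Illustrative data (an assumption); no metacharacters in the elements.
my @foo = qw(cat dog bird);

my $re;
{
    # $" is the separator used when an array is interpolated into a
    # string or pattern; setting it to '|' produces an alternation.
    local $" = '|';
    $re = qr/\A(?:@foo)\z/;
}

my @hits = grep { $_ =~ $re } qw(dog doge cat cow);
print "@hits\n";
```

If the elements may contain metacharacters, building the alternation explicitly with join '|', map quotemeta, @foo avoids surprises.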

Re: RFC 166 (v2) Alternative lists and quoting of things

2000-09-15 Thread Mark-Jason Dominus
> (?Q$foo) Quotes the contents of the scalar $foo - equivalent to > (??{ quotemeta $foo }). How is this different from \Q$foo\E ?
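For comparison, the existing \Q...\E escape already quotes an interpolated scalar, which is the behaviour the question refers to. A short sketch with an illustrative value for $foo (an assumption):

```perl
use strict;
use warnings;

# A value containing regex metacharacters (illustrative only).
my $foo = 'a.b*c';

# \Q...\E applies quotemeta to the interpolated text, so the dot and
# the star are matched literally.
my $quoted_hit   = ('xa.b*cx' =~ /\Q$foo\E/) ? 1 : 0;
my $quoted_miss  = ('aXbbbc'  =~ /\Q$foo\E/) ? 1 : 0;

# Without quoting, the same value is treated as a pattern.
my $unquoted_hit = ('aXbbbc'  =~ /$foo/)     ? 1 : 0;

print "quoted matches literal text: $quoted_hit\n";
print "quoted matches 'aXbbbc':     $quoted_miss\n";
print "unquoted matches 'aXbbbc':   $unquoted_hit\n";
```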

Re: Perlstorm #0040

2000-09-23 Thread Mark-Jason Dominus
> I lie: the other reason qr{} currently doesn't behave like that is that > when we interpolate a compiled regexp into a context that requires it be > recompiled, Interpolated qr() items shouldn't be recompiled anyway. They should be treated as subroutine calls. Unfortunately, this requires a

Re: RFC 308 (v1) Ban Perl hooks into regexes

2000-09-25 Thread Mark-Jason Dominus
I think the proposal that Joe McMahon and I are finishing up now will make these obsolete anyway.

Re: RFC 308 (v1) Ban Perl hooks into regexes

2000-09-25 Thread Mark-Jason Dominus
> On Mon, Sep 25, 2000 at 08:56:47PM +0000, Mark-Jason Dominus wrote: > > I think the proposal that Joe McMahon and I are finishing up now will > > make these obsolete anyway. > > Good! The less I have to maintain the better... Sorry, I meant that it would make (??...)