Re: RFC 331 (v1) Consolidate the $1 and C<\1> notations
Jonathan Scott Duff [EMAIL PROTECTED] writes:
> On Thu, Sep 28, 2000 at 08:57:39PM -, Perl6 RFC Librarian wrote:
> > ${P1} means what $1 currently means (first match in last regex)
>
> I'm sorry that I don't have anything more constructive to say than
> "ick", but ... Ick.

I'm with the 'Ick' camp too. And possibly with the 'Leave it the hell
alone! If you're that bloody stupid you deserve to lose' camp too.

--
Piers
Re: RFC 112 (v3) Asignment within a regex
> On Fri, 29 Sep 2000 01:02:40 +0100, Hugo wrote:
> > It also isn't clear what parts of the expression are interpolated at
> > compile time; what should the following leave in %foo?
> >
> >   %foo = ();
> >   $bar = "one";
> >   "twothree" =~ / (?$bar=two) (?$foo{$bar}=three) /x;
>
> It's not just that. You act as if this assignment takes place
> whenever a submatch succeeds. So:
>
>   "twofour" =~ /(?$bar=two)(?$foo=three)/;
>
> Will $bar be set to "two", and $foo undef? I think not. Assignment
> should be postponed till the very end, when the match finally
> succeeds as a whole.

In general all assignments should wait to the very end, and then assign
them all. However before code callouts (?{...}) and enemies, the named
assignments that are currently defined should be made (localised) so
that the code can refer to them by name. If the expression finally
fails the localised values would unroll.

> Therefore, I think that allowing just any l-value on the left of the
> "=" sign is not practical. Or is it?

I think any simple scalar value is reasonable.

> OTOH I would rather have all submatches assigned to a hash, not to
> global or lexical variables. I have no clue about what syntax that
> would need.

That is in RFC 150; I think there is a case for both.

Richard
Re: RFC 112 (v3) Asignment within a regex
In [EMAIL PROTECTED], "Richard Proctor" writes:
:In general all assignments should wait to the very end, and then assign
:them all. [...] If the expression finally fails the localised values
:would unroll.

Ah, I hadn't anticipated that - I had assumed you would get whatever
was the last value set. Please can you make sure this is clearly
explained in the next version of the RFC?

Hugo
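For comparison, the all-or-nothing behaviour discussed above can be seen with the named captures that Perl 5.10 and later provide. This is only a sketch: it uses the (?<name>...) syntax and the %+ hash rather than the (?$var=...) syntax proposed in RFC 112, and the helper name capture_pair is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: return a copy of the named captures if the whole
# pattern matches, or undef if it does not.
sub capture_pair {
    my ($text) = @_;
    if ($text =~ /(?<bar>two)(?<foo>three)/) {
        my %named = %+;    # copy: %+ reflects the last successful match
        return \%named;
    }
    return;                # overall match failed, so nothing is assigned
}

my $ok   = capture_pair("twothree");
my $fail = capture_pair("twofour");

print "bar=$ok->{bar} foo=$ok->{foo}\n";
print "twofour: ", (defined $fail ? "assigned" : "nothing assigned"), "\n";
```

With "twofour" the first group does match "two" along the way, but because the pattern as a whole fails, the caller never sees a half-filled result.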
Re: RFC 316 (v1) Regex modifier for support of chunk processing and prefix matching
On Fri, 29 Sep 2000 13:19:47 +0100, Hugo wrote:
> I think that involves rewriting your /p example something like:
>
>   if (/^$pat$/z) {
>       print "found a complete match";
>   } elsif (defined pos) {
>       print "found a prefix match";
>   } else {
>       print "not a match";
>   }

Except that this isn't exactly what would happen. Look, "1234E+2" is a
complete string matching the regex, but it could be that it's just a
prefix for "1234E+21". So, /^$pat$/z should fail. No?

This doesn't seem too intuitive, but that's a result from a minimal
interface.

--
Bart.
Re: RFC 331 (v1) Consolidate the $1 and C<\1> notations
On Thu, 28 Sep 2000, Hugo wrote:
> :=item *
> :/(foo)_C<\1>_bar/
>
> Please don't do this: write C</(foo)_\1_bar/> or /(foo)_\1_bar/, but
> don't insert C<> in the middle: that makes it much more difficult to
> read.

Sorry; that was a global-replace error that I missed on proofreading.

> :mean different things: the second will match 'foo_foo_bar', while the
> :first will match 'foo[SOMETHING]bar' where [SOMETHING] is whatever was
>
> should be: foo_[SOMETHING]_bar

Um, yeah, it should...(jeez...I proofed this like three times,
honest!) *blush*

> :captured in the B<previous> match...which could be a long, long way away,
>
> This seems a bit unfair. It is just another variable. Any variable you
> include in a pattern, you are assumed to know that it contains the
> intended value - there is nothing special about $1 in this regard.

Fair enough; the point I was trying to make was that \1 was captured
right here, while $1 was captured long, long ago in a pattern match
far, far away. The visual/cognitive difference is small, but the
programming difference is huge.

> :=item *
> :${P1} means what $1 currently means (first match in last regex)
>
> Do you understand that this is the same variable as $P1?
> Traditionally, perl very rarely coopts variable names that start with
> alphanumerics, and (off the top of my head) all the ones it does so
> coopt are letters only (ARGV, AUTOLOAD, STDOUT etc). I think we need
> better reasons to extend that to all $P1-style variables.

I do understand that, and I agree with your concern. Actually, I
didn't think that ${P1} was a particularly good notation even as I was
suggesting it...I just wanted to get the RFC up there before the
deadline so that people could discuss it. Having now thought about it
more, I think that (?P1) is better...in other words, make references
to the previous pattern match be a regex _extension_, not a core
feature (if that's a valid way to phrase the distinction).

> What is the migration path for existing uses of $P1-style variables?

Wherever p526 sees a pattern that contains a $1, it should replace it
with (?P1).

> :=item *
> :s/(bar)(bell)/${P1}$2/ # changes "barbell" to "foobell"
>
> Note that in the current regexp engine, ${P1} has disappeared by the
> time matching starts. Can you explain why we need to change this? Note
> also that if you are sticking with ${P1} either we need to rename all
> existing user variables of this form, or we can no longer use the
> existing 'interpolate this string' (or eval, double-eval etc)
> routines, and have to roll our own for this (these) as well.

I'm a bit confused by the way this came out but, if I understand what
you're asking, then I believe your concerns are solved by the new
proposed syntax. Am I right?

> :This may require significant changes to the regex engine, which is a topic
> :on which I am not qualified to speak. Could someone with more
> :knowledge/experience please chime in?
>
> Currently the regexp compiler is handed a string in which $variables
> have already been interpolated. [...]

I know there are certain exceptions to this...my Camel III says
(something to the effect of--I don't have it in front of me) "if there
is any doubt as to whether something should be interpolated or left
for the Engine, it will be left for the Engine."

In any case, I don't think this needs to change. I'm simply changing
what the names of the variables and backreferences are...\1 becomes
(the new) $1, and (the current) $1 becomes (?P1).

> Changing the lifetime of backreferences feels likely to be difficult,
> but it isn't clear to me what you are trying to achieve here. I think
> you at least need to add an example of how it would act under s///g
> and s///ge.

Good point. I'll do that.

> :RFC 276: Localising Paren Counts in qr()s.
>
> I didn't see a mention of these in the body of the proposal.

276 is rather tangentially related, I grant. However, I felt that if
my proposal went forward, it could impact on how 276 was implemented,
so I cross-referenced to it.

Dave
Re: RFC 331 (v1) Consolidate the $1 and C<\1> notations
On Fri, 29 Sep 2000, Hildo Biersma wrote:
> > Currently, C<\1> and $1 have only slightly different meanings within
> > a regex. Let's consolidate them together, eliminate the differences,
> > and settle on $1 as the standard.
>
> Sigh. That would remove functionality from the language.
>
> The reason why you need \1 in a regular expression is that $1, $2, ...
> are interpolated from the previous regular expression. This allows me
> to do a pattern match that captures variables, then use the results of
> that to create a second regular expression. (Remember: A regexp
> interpolates first, then compiles the pattern).

Umm...with all due respect, did you read the RFC? Because what I
proposed does not eliminate any functionality.

Dave
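The distinction at stake in this exchange, as it stands in Perl 5, can be sketched as follows. The sub name match_with_previous and the sample strings are made up for illustration; the point is only that \1 is resolved by the regex engine during the current match, while $1 is interpolated from an earlier match before the pattern is compiled.

```perl
use strict;
use warnings;

# \1 refers to group 1 of the pattern being matched right now.
my $same_match = "foo_foo_bar" =~ /(foo)_\1_bar/ ? 1 : 0;

# Hypothetical helper: $1 inside the second pattern is interpolated
# before compilation, so it carries the capture from the first match.
sub match_with_previous {
    my ($first, $second) = @_;
    return 0 unless $first =~ /(\w+)/;       # sets $1
    return $second =~ /(bar)_$1/ ? 1 : 0;    # $1 comes from the match above
}

print "backreference matched: $same_match\n";
print "bar_foo: ", match_with_previous("foo", "bar_foo"), "\n";
print "bar_bar: ", match_with_previous("foo", "bar_bar"), "\n";
```

Here "bar_bar" is rejected because the second pattern was compiled as /(bar)_foo/; written with \1 instead of $1 it would have accepted "bar_bar".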
Re: RFC 348 (v1) Regex assertions in plain Perl code
In [EMAIL PROTECTED], Perl6 RFC Librarian writes:
:=item assertion in Perl5
:
:  (?(?{not COND})(?!))
:  (?(?{not do { COND }})(?!))

Or (?(?{COND})|(?!)). Migration could consider replacing detectable
equivalents of such constructs with the favoured new construct.

:"local" inside embedded code will no longer be supported, nor will
:conditional regexes. The Perl5 -> Perl6 translator should warn if it
:ever encounters one of these.

I'm not convinced that removing either of these is necessary to the
main thrust of the proposal. They may both still be useful in their own
right, and you seem to offer little evidence against them other than
that you don't like them.

I do like the idea of making (?{...}) an assertion, all the more
because we have a simple migration path that avoids unnecessarily
breaking existing scripts: wrap $code as '$^R = do { $code }; 1'.

If you want to remove support for 'local' in embedded code, it is worth
a full proposal in its own right that will explain what will happen if
people try to do that. (I think it will make perl unnecessarily more
complex to detect and disable it in this case.)

Similarly if you want to remove support for (?(...)) completely, you
need to address the utility and options for migration for all the
available uses of it, not just the one addressed by the new handling of
(?{...}).

Hugo
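As an illustration of the Perl 5 idiom mentioned above, here is a minimal sketch of (?(?{COND})|(?!)) used as an assertion. The helper is_byte and the 0..255 condition are hypothetical, chosen only to give COND something to test; note also that perlre describes (?{...}) as an experimental feature.

```perl
use strict;
use warnings;

# Hypothetical helper: accept a run of digits only if its value is < 256.
# The conditional has an empty "yes" branch and (?!) as its "no" branch,
# so a false condition forces the match to fail at that point.
sub is_byte {
    my ($text) = @_;
    return $text =~ /^(\d+)(?(?{ $1 < 256 })|(?!))$/ ? 1 : 0;
}

print "$_: ", is_byte($_), "\n" for qw(7 255 256);
```

Under the RFC's proposal the same check would presumably shrink to a bare (?{ $1 < 256 }); that reading of the proposal is an assumption, not something this thread spells out.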