Re: Please rename 'but' to 'has'.
On Sun, 2002-04-21 at 10:59, Trey Harris wrote:
> 0 has true
>
> my first reaction would be, "huh? Since when?"

Dare I say... "now"? ;-)

Sorry, someone had to say it.

Personally, even though it sucks up namespace, I think what we're seeing here is a need for more than one keyword that are synonyms. "but" and "now" seem to cover a good deal of ground.

    0 now true

is misleading, IMHO, as 0 is not "now true". 0, in this context, is an expression, and we're saying that that expression is now true. "but" conveys this much more clearly. However, as many have pointed out, there are a number of cases where "but" is equally misleading.

Is there any problem with allowing both "but" and "now"? It might even be elegant to use both at the same time:

    $x now integer but true

which is clearer to my eye than

    $x now integer now true

which seems to change the properties of $x twice without reconciling the changes with each other.

In any other language this would be unthinkable, but I think it fits nicely with Perl's philosophy. Not TMTOWTDI, which I think is often used to excuse the inexcusable, but the idea that Perl reflects the ways in which humans use language. We want to convey shades of meaning that do not translate directly to action.

So, have I just lost it, or would it make sense to have "now" and "but"?

Apologies to the person who started this thread. I know you thought "has" was ideal, and I understand why. It's just that between "but" and "now", I think you get more ground covered than you do with "has" and either one.
RE: Regex and Matched Delimiters
On Sat, 2002-04-20 at 05:06, Mike Lambert wrote:
> He then went on to describe something I didn't understand at all. Sorry.
> Few corrections to what you wrote:
>
> To avoid the problem of extending {} to support new features with a
> character 'x', without breaking stuff that might have an 'x' immediately
> after the '{', my proposal is to require one space after the { before the
> real regex appears.

I hope that you mean one or more whitespace characters, not just a space. The following would be correct, no?

    /{|
        .*
    }/

Anything else would seem rather confusing to the average Perl programmer.
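Under Mike's rule, with Aaron's amendment, the character immediately after the { decides how the group is parsed. The Python sketch below is purely illustrative: classify_brace_group and its return values are hypothetical, not part of any proposal, and it assumes the caller has already located the matching }.

```python
from typing import Tuple, Union


def classify_brace_group(body: str) -> Union[Tuple[str, str], Tuple[str, str, str]]:
    """Classify the text between '{' and its matching '}' under the proposed rule.

    One or more whitespace characters right after '{' mean an ordinary regex
    follows; any other first character is reserved as an extension selector.
    """
    if not body:
        raise ValueError("empty brace group")
    if body[0].isspace():
        # Ordinary regex: the surrounding whitespace is only layout.
        return ("regex", body.strip())
    # Anything else: the first character selects the (hypothetical) extension.
    return ("extension", body[0], body[1:])


print(classify_brace_group(" .* "))      # ('regex', '.*')
print(classify_brace_group("\n\t.*\n"))  # ('regex', '.*')
print(classify_brace_group("x foo"))     # ('extension', 'x', ' foo')
```

Accepting any run of whitespace, not just a single space, is what keeps a multi-line group like the one above legal.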
Re: Regex and Matched Delimiters
On Sat, 2002-04-20 at 14:33, Me wrote:
> [2c. What about ( data) or (ops data) normally means non-capturing,
> ($2 data) captures into $2, ($foo data) captures into $foo?]

Very nice (but, I assume you meant {$foo data})! This does add another special case to the regexp parser's handling of $, but it seems like it would be worth it.

Makes me think of the even slightly hairier:

    {foo data}

or, even more hair-full:

    {{$foo} data}

for references, where you capture into the usual positional and then invoke foo with the variable as a parameter. That would be pretty nice closure-wise:

    sub match_with_alert($re,$id,$ops,$fac,$pri) {
        openlog $id,$ops,$fac;
        my $alert = sub ($match) {
            syslog $pri, "Matched regexp: $match";
        };
        return study /{{$alert} $re}/;
    }
    my $m = match_with_alert('ROOT login',$0,0,LOG_USER,PRI_CRIT);
    for <> -> $_ { /$m/ }

That would certainly be a handy thing that would set Perl apart from the pack of advanced regexp languages that don't support closures.

Some other things come to mind as well, but I'm not sure how evil they are. For example:

    sub decrypt($data is rw) { $data = rot13($data); }
    print "The secret message is: ", /^Encrypted: {decrypt .*}/, "\n";
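The proposed {{$alert} $re} form would hand each matched string straight to a closure. As a rough analogy only, the same flow can be sketched in Python: the regex engine finds each match and a closure receives the matched text. match_with_alert mirrors the name in the post, but the in-memory alerts list is a hypothetical stand-in for syslog, and re.finditer drives the loop because Python's re cannot embed code inside a pattern.

```python
import re
from typing import Callable, List


def match_with_alert(pattern: str, alert: Callable[[str], None]) -> Callable[[str], bool]:
    """Build a matcher that calls `alert` with each substring matching `pattern`."""
    compiled = re.compile(pattern)

    def matcher(line: str) -> bool:
        found = False
        for m in compiled.finditer(line):
            alert(m.group(0))  # run the closure on the matched text
            found = True
        return found

    return matcher


# Hypothetical stand-in for syslog: collect alert messages in a list.
alerts: List[str] = []
m = match_with_alert("ROOT login", lambda text: alerts.append(f"Matched regexp: {text}"))

for line in ["user login ok", "ROOT login from tty1", "ROOT login again"]:
    m(line)

print(alerts)  # ['Matched regexp: ROOT login', 'Matched regexp: ROOT login']
```

The closure captures alerts from the enclosing scope, which is the property that would make the in-regex version attractive.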
Re: Regex and Matched Delimiters
> Very nice (but, I assume you meant {$foo data})!

I didn't mean that (even if I should have).

Aiui, Mike's final suggestion was that parens end up doing all the (ops data) tricks, and braces are used purely to do code insertions. (I really liked that idea.)

So:

    Perl 5          Perl6
    (data)          ( data)
    (?opsdata)      (ops data)
    ({})            {}

--
ralph
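The Perl 5 column of this table can be tried in any engine with Perl-style syntax. A quick check in Python's re module (chosen only because it is easy to run; scoped inline flags such as (?i:...) need Python 3.6 or later) shows the three behaviours: plain parens capture, (?:...) groups without capturing, and a scoped modifier like (?i:...) applies to that group alone.

```python
import re

# (data): plain parens group *and* capture; a repeated group keeps its last iteration.
captured = re.fullmatch(r"(ab)+(c)", "ababc").groups()
print(captured)  # ('ab', 'c')

# (?:data): group for repetition without capturing.
grouped = re.fullmatch(r"(?:ab)+(c)", "ababc").groups()
print(grouped)  # ('c',)

# (?i:data): the modifier applies only inside the group.
print(re.fullmatch(r"(?i:abc)def", "ABCdef") is not None)  # True
print(re.fullmatch(r"(?i:abc)def", "ABCDEF") is not None)  # False
```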
Re: Regex and Matched Delimiters
On Mon, 2002-04-22 at 14:18, Me wrote:
> > Very nice (but, I assume you meant {$foo data})!
>
> I didn't mean that (even if I should have).
>
> Aiui, Mike's final suggestion was that parens end up doing all the
> (ops data) tricks, and braces are used purely to do code insertions.
> (I really liked that idea.)
>
> So:
>
>     Perl 5          Perl6
>     (data)          ( data)
>     (?opsdata)      (ops data)
>     ({})            {}

I don't like that particular way of looking at things, but either way my comments about subroutines and closures still hold.
Re: Please rename 'but' to 'has'.
Aaron Sherman writes:
: On Sun, 2002-04-21 at 10:59, Trey Harris wrote:
: > 0 has true
: >
: > my first reaction would be, "huh? Since when?"
:
: Dare I say... "now"? ;-)
:
: Sorry, someone had to say it.
:
: Personally, even though it sucks up namespace, I think what we're seeing
: here is a need for more than one keyword that are synonyms. "but" and
: "now" seem to cover a good deal of ground.
:
:     0 now true
:
: Is misleading, IMHO, as 0 is not "now true". 0, in this context is an
: expression, and we're saying that that expression is now true. "but"
: conveys this much more clearly. However, as many have pointed out, there
: are a number of cases where "but" is equally misleading.
:
: Is there any problem with allowing both "but" and "now"? It might even be
: elegant to use both at the same time:
:
:     $x now integer but true
:
: which is clearer to my eye than
:
:     $x now integer now true
:
: which seems to change the properties of $x twice without reconciling the
: changes with each other.
:
: In any other language this would be unthinkable, but I think it fits
: nicely with Perl's philosophy. Not TMTOWTDI, which I think is often used
: to excuse the inexcusable, but the idea that Perl reflects the ways in
: which humans use language. We want to convey shades of meaning that do
: not translate directly to action.
:
: So, have I just lost it, or would it make sense to have "now" and "but"?
:
: Apologies to the person who started this thread. I know you thought
: "has" was ideal, and I understand why. It's just that between "but" and
: "now", I think you get more ground covered than you do with "has" and
: either one.

Perl 6 will try to avoid synonyms but make it easy to declare them. At worst it would be something like:

    my sub operator:now ($a,$b) is inline { $a but $b }

Larry
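Python has no run-time properties on values, so the following is only an analogy with hypothetical names (AlwaysTrue, but, now). A value keeps its numeric meaning while its truth is overridden, much as Perl 5's "0 but true" string is numerically 0 yet true, and the synonym Larry describes costs a single assignment.

```python
class AlwaysTrue(int):
    """An int that keeps its numeric value but is always true in boolean context."""

    def __bool__(self) -> bool:
        return True


def but(value: int, prop: str) -> int:
    """Return `value` with a (hypothetical) property attached; only "true" is modelled."""
    if prop != "true":
        raise ValueError(f"unsupported property: {prop!r}")
    return AlwaysTrue(value)


# Declaring the synonym is one line, in the spirit of defining operator:now via but.
now = but

x = now(0, "true")
print(x == 0, bool(x))  # True True
```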
Re: Regex and Matched Delimiters
Me writes:
: Very nice (but, I assume you meant {$foo data})!
:
: I didn't mean that (even if I should have).
:
: Aiui, Mike's final suggestion was that parens end up
: doing all the (ops data) tricks, and braces are used
: purely to do code insertions. (I really liked that idea.)
:
: So:
:
:     Perl 5          Perl6
:     (data)          ( data)
:     (?opsdata)      (ops data)
:     ({})            {}

Hmm. Let me spill a few beans about where I'm going with A5. I've been thinking similar thoughts about the problem of overloading parens so heavily in Perl 5, but I'm going in a slightly different direction with it. The basic principles for the new regexen are:

    * Parens always capture.
    * Braces are always closures.
    * Square brackets are always character classes.
    * Angle brackets are always metasyntax (along with backslash).

So a first whack at the differences might be:

    Old             New
    ---             ---
    //              /<prior>/   ???
    ?pat?           /<?f:pat>/  ???
    /pat/i          m:i/pat/ or /<?i:pat>/ or even m<?i:pat> ???
    /pat/x          /pat/
    /^pat$/m        /^^pat$$/
    /./s            /<any>/ or /./ ???
    \p{prop}        <+prop>     ???
    \P{prop}        <-prop>     ???
    space           <sp> (or \h for horizontal?)
    {n,m}           <n,m>
    \t              also <tab>
    \n              also <lf> or <nl> (latter matching logical newline)
    \r              also <cr>
    \f              also <ff>
    \a              also <bell>
    \e              also <esc>
    \033            same
    \x1B            same
    \x{263a}        \x<263a>    ???
    \c[             same
    \N{name}        <name>
    \l              same
    \u              same
    \Lstring\E      \L<string>
    \Ustring\E      \U<string>
    \E              gone
    [\040\t]        \h plus any Unicode horizontal whitespace
    [\r\n\ck]       \v plus any Unicode vertical whitespace
    \b              same
    \B              same
    \A              ^
    \Z              same?
    \z              $
    \G              <pos>, but assumed in nested patterns?
    \1              $1
    \Q$var\E        $var        always assumed literal, so $1 is literal backref
    $var            <$var>      assumed to be regex
    =~ $re          =~ /<$re>/  ouch?
    (??{$rule})     <rule>
    (?{ code })     { code } with failure semantics
    (?#...)         {...} :-)
    (?:...)         <:...>
    (?=...)         <before: ...>
    (?!...)         <!before: ...>
    (?<=...)        <after: ...>
    (?<!...)        <!after: ...>
    (?>...)         <grab: ...>
    (?(cond)t|f)    Not sure. Could just use { if ... }

Obviously the <word> and <word:...> syntaxes will be user extensible. We have to be able to support full grammars.
I consider it a feature that <foo> looks like a non-terminal in standard BNF notation. I do not consider it a misfeature that <foo> resembles an HTML or XML tag, since most of those languages need to be matched with a fancy rule named <tag> anyway.

An interesting idea would be that if you say

    m<foo: pat>

or

    m{code}

it's as if you said

    m/<foo: pat>/

or

    m/{code}/

The latter is particularly interesting to me in that I can see uses for patterns that are Perl code at the top level rather than a regex literal.

Any closure within a regular expression has full access to the current state object for the match. So most of the RFCs proposing ad hoc mechanisms for saving submatches in various kinds of variables can be handled with closures:

    /(...)(...)(...) { @array = .all } /

or

    /(...) { $first = $+ } (...) { $second = $+ } (...) { $third = $+ }/

or

    /IF (<COND>) (<BLOCK>) { .node = ["if",$1,$2] } /     # shades of yacc

or whatever. Could have a $foo=... as syntactic sugar, perhaps. But we need the general mechanism for building up parse trees of arrays of hashes of arrays of arrays of hashes of arrays of hashes of...

I haven't decided yet whether matches embedded in the closure should automatically pick up where the outer match is, or whether there should be some explicit match op to mean that, much like \G only better. I'm thinking when the current topic is a match state, we automatically continue where we left off, and require explicit =~ to start an unrelated match.

I also haven't committed to any particular mechanism for defining a set of related rules in a grammar. Obviously it needs to be a good enough mechanism to parse Perl and its variants, which means it probably needs to be OO-based, and you make new grammars by derivation from the base grammar and overriding the rules you want to change.

Sorry if this is a bit delirious--I'm fighting off some kind of infection, and my nights have been shortchanged lately by the neighborhood panhandler who doesn't seem to understand either complicated concepts like bedtime or simple concepts like "no".
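Perl 5 already lets a match continue where the previous one stopped, via \G with //g. A rough analogue of "pick up where the outer match left off", sketched in Python: Pattern.match(string, pos) anchors the match at pos instead of searching forward, so a scanner can chain matches end to start. The token names and the tokenize helper are hypothetical, chosen only to echo the IF/COND/BLOCK example.

```python
import re
from typing import List, Tuple

# Hypothetical token rules: each alternative is a named group.
TOKEN = re.compile(r"\s*(?:(?P<word>[A-Za-z_]\w*)|(?P<num>\d+)|(?P<punct>[(){};]))")


def tokenize(src: str) -> List[Tuple[str, str]]:
    """Scan src left to right; each match starts exactly where the previous one ended."""
    pos = 0
    tokens: List[Tuple[str, str]] = []
    while pos < len(src):
        m = TOKEN.match(src, pos)  # anchored at pos, much like \G in Perl 5
        if m is None:
            if src[pos:].strip() == "":
                break  # only trailing whitespace is left
            raise SyntaxError(f"unexpected input at offset {pos}: {src[pos:pos + 10]!r}")
        kind = m.lastgroup  # name of the alternative that matched
        tokens.append((kind, m.group(kind)))
        pos = m.end()
    return tokens


print(tokenize("if (x) { y; }"))
```

Because every alternative consumes at least one character, each successful match advances pos, so the loop cannot spin in place.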
Re: Regex and Matched Delimiters
> (?=...)         <before: ...>
> (?!...)         <!before: ...>
> (?<=...)        <after: ...>
> (?<!...)        <!after: ...>
> (?>...)         <grab: ...>

Yummy :) I'd say this is about perfect. The look(ahead|behind)s, er, <look:ahead|behind>s are used seldom enough that this is practical. And it's I<so> much clea[nr]er than that (?=...) crap. (Think I'm going overboard with this tregext?)

And are you going to reveal the method by which you define your own words, so we can overload it with personal ungrounded opinions? (On the other hand, it'd probably just stick and not move, because you said it.)

> Sorry if this is a bit delirious--I'm fighting off some kind of infection,
> and my nights have been shortchanged lately by the neighborhood panhandler
> who doesn't seem to understand either complicated concepts like bedtime or
> simple concepts like "no".

bed...what?

Luke
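For reference, here are the four Perl 5 lookaround assertions these names would replace, exercised with Python's re module, whose lookaround syntax follows Perl 5. Note that a lookbehind such as (?<=\$) inspects the text before the current position, so what it admits comes after the $; that is presumably the sense in which <after: ...> is meant. The sample string is made up for illustration.

```python
import re

text = "price $42, width 10px, height 7em"

# (?=...) lookahead, proposed <before: ...>: digits that are followed by "px".
ahead = re.findall(r"\d+(?=px)", text)
print(ahead)  # ['10']

# (?!...) negative lookahead, proposed <!before: ...>: digits not followed by
# "px". Including \d keeps \d+ from backtracking to match just the "1" of "10px".
not_ahead = re.findall(r"\d+(?!\d|px)", text)
print(not_ahead)  # ['42', '7']

# (?<=...) lookbehind, proposed <after: ...>: digits that come after a "$".
behind = re.findall(r"(?<=\$)\d+", text)
print(behind)  # ['42']

# (?<!...) negative lookbehind, proposed <!after: ...>: digits not preceded by "$" or a digit.
not_behind = re.findall(r"(?<![$\d])\d+", text)
print(not_behind)  # ['10', '7']
```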
RE: Regex and Matched Delimiters
Larry Wall:
# Me writes:
# : Very nice (but, I assume you meant {$foo data})!
# :
# : I didn't mean that (even if I should have).
# :
# : Aiui, Mike's final suggestion was that parens end up
# : doing all the (ops data) tricks, and braces are used
# : purely to do code insertions. (I really liked that idea.)
# :
# : So:
# :
# :     Perl 5          Perl6
# :     (data)          ( data)
# :     (?opsdata)      (ops data)
# :     ({})            {}
#
# Hmm. Let me spill a few beans about where I'm going with A5.
# I've been thinking similar thoughts about the problem of
# overloading parens so heavily in Perl 5, but I'm going in a
# slightly different direction with it. The basic principles
# for the new regexen are:
#
#     * Parens always capture.
#     * Braces are always closures.
#     * Square brackets are always character classes.
#     * Angle brackets are always metasyntax (along with backslash).
#
# So a first whack at the differences might be:
#
#     Old             New
#     ---             ---
#     //              /<prior>/   ???
#     ?pat?           /<?f:pat>/  ???
#     /pat/i          m:i/pat/ or /<?i:pat>/ or even m<?i:pat> ???

Whoa, those are moving to the front?!?

#     /pat/x          /pat/
#     /^pat$/m        /^^pat$$/

That's...odd. Is $$ (the variable) going away?

#     /./s            /<any>/ or /./ ???

I think that . is too common a metacharacter to be relegated to this.

#     \p{prop}        <+prop>     ???
#     \P{prop}        <-prop>     ???

Intriguing.

#     space           <sp> (or \h for horizontal?)

Same thinking as '.'.

#     {n,m}           <n,m>

Ah, OK.

#     \t              also <tab>
#     \n              also <lf> or <nl> (latter matching logical newline)
#     \r              also <cr>
#     \f              also <ff>
#     \a              also <bell>
#     \e              also <esc>

I can tell you right now that these are going to screw people up. They'll try to use these in normal strings and be confused when it doesn't work. And you probably won't be able to emit a warning, considering how much CGI Perl munches.

#     \033            same
#     \x1B            same
#     \x{263a}        \x<263a>    ???

Why? Wouldn't we want the same thing to work in quoted strings? (Or are those changing syntaxes too?)

#     \c[             same
#     \N{name}        <name>
#     \l              same
#     \u              same
#     \Lstring\E      \L<string>
#     \Ustring\E      \U<string>

So that's changed from whenever you talked about \q{}?
#     \E              gone
#     [\040\t]        \h plus any Unicode horizontal whitespace
#     [\r\n\ck]       \v plus any Unicode vertical whitespace
#
#     \b              same
#     \B              same
#     \A              ^
#     \Z              same?
#     \z              $

Are you sure that optimizes for the common case?

#     \G              <pos>, but assumed in nested patterns?
#
#     \1              $1
#
#     \Q$var\E        $var        always assumed literal, so $1 is literal backref

So these are reinterpolated every time you backtrack? Are you *trying* to destroy regex performance? :^)

#     $var            <$var>      assumed to be regex

What if $var is a qr//ed object?

#     =~ $re          =~ /<$re>/  ouch?

I don't see the win.

#     (??{$rule})     <rule>
#     (?{ code })     { code } with failure semantics
#     (?#...)         {...} :-)
#     (?:...)         <:...>
#     (?=...)         <before: ...>
#     (?!...)         <!before: ...>
#     (?<=...)        <after: ...>
#     (?<!...)        <!after: ...>

Cute. (Wait a minute, aren't those reversed?)

#     (?>...)         <grab: ...>
#     (?(cond)t|f)    Not sure. Could just use { if ... }

<if(cond):true|false>?

# Obviously the <word> and <word:...> syntaxes will be user
# extensible. We have to be able to support full grammars. I
# consider it a feature that <foo> looks like a non-terminal in
# standard BNF notation. I do not consider it a misfeature
# that <foo> resembles an HTML or XML tag, since most of those
# languages need to be matched with a fancy rule named <tag> anyway.

But that *does* make it harder to define the fancy rules. I could see someone defining rules like:

    'gt' => qr/\>/,
    'lt' => qr/\</

just to get around backslashing everything in sight.

# An interesting idea would be that if you say
#
#     m<foo: pat>
#
# or
#
#     m{code}
#
# it's as if you said
#
#     m/<foo: pat>/
#
# or
#
#     m/{code}/

I don't know about that one. I often use {} as delimiters on regexen because it's a character that doesn't occur in data very often. I think the gain of two characters isn't as critical as the loss of