Re: \x{123a 123b 123c}
On Tue, Nov 22, 2005 at 12:48:39PM -0600, Patrick R. Michaud wrote:
: On Mon, Nov 21, 2005 at 09:02:57AM -0800, Larry Wall wrote:
: > On Sun, Nov 20, 2005 at 10:27:17AM -0600, Patrick R. Michaud wrote:
: > : On Sat, Nov 19, 2005 at 06:32:17PM -0800, Larry Wall wrote:
: > : > We already have, from A5, \x[0a;0d], so you can supposedly say
: > : > "\x[123a;123b;123c]"
: > :
: > : Hmm, I hadn't caught that particular syntax in A05. AFAIK it's not
: > : in S05, so I should probably add it, or whatever syntax we end up
: > : adopting.
: >
: > Yes.
:
: Out of curiosity (and so I can update S05 and PGE), what syntax
: are we adopting? Is it semicolon, comma, space, any combination of the
: three, or ...?

S02.pod currently has it as comma.

Larry
Re: \x{123a 123b 123c}
On Mon, Nov 21, 2005 at 09:02:57AM -0800, Larry Wall wrote:
> On Sun, Nov 20, 2005 at 10:27:17AM -0600, Patrick R. Michaud wrote:
> : On Sat, Nov 19, 2005 at 06:32:17PM -0800, Larry Wall wrote:
> : > We already have, from A5, \x[0a;0d], so you can supposedly say
> : > "\x[123a;123b;123c]"
> :
> : Hmm, I hadn't caught that particular syntax in A05. AFAIK it's not
> : in S05, so I should probably add it, or whatever syntax we end up
> : adopting.
>
> Yes.

Out of curiosity (and so I can update S05 and PGE), what syntax are we
adopting? Is it semicolon, comma, space, any combination of the three,
or ...?

Pm
Re: \x{123a 123b 123c}
On Tue, Nov 22, 2005 at 10:30:20AM -0800, Larry Wall wrote:
> On Tue, Nov 22, 2005 at 09:46:59AM -0800, Dave Whipp wrote:
> : Larry Wall wrote:
> :
> : >And there aren't that many regexish languages anyway. So I think :syntax
> : >is relatively useless except for documentation, and in practice people
> : >will almost always omit it, which makes it even less useful, and pretty
> : >nearly kicks it over into the category of multiplied entities for me.
> :
> : It's surprising how many are out there.
>
> We can certainly add a :syntax() modifier as easily as a :foolang modifier,
> if we decide at some point we really need one, or if PGE could make good
> use of it even if Perl 6 doesn't want it.

I'm agreeing with Larry on this one -- let's wait to decide this until
we actually feel like we need it.

Pm
Re: \x{123a 123b 123c}
On Tue, Nov 22, 2005 at 09:46:59AM -0800, Dave Whipp wrote:
: Larry Wall wrote:
:
: >And there aren't that many regexish languages anyway. So I think :syntax
: >is relatively useless except for documentation, and in practice people
: >will almost always omit it, which makes it even less useful, and pretty
: >nearly kicks it over into the category of multiplied entities for me.
:
: It's surprising how many are out there.

We can certainly add a :syntax() modifier as easily as a :foolang modifier,
if we decide at some point we really need one, or if PGE could make good
use of it even if Perl 6 doesn't want it.

Larry
Re: \x{123a 123b 123c}
Larry Wall wrote:
> And there aren't that many regexish languages anyway. So I think :syntax
> is relatively useless except for documentation, and in practice people
> will almost always omit it, which makes it even less useful, and pretty
> nearly kicks it over into the category of multiplied entities for me.

It's surprising how many are out there. Even if we ignore the various
dialects of standard rexen, we can find interesting examples such as PSL,
a language for specifying temporal assertions, for hardware design:
http://www.project-veripage.com/psl_tutorial_5.php. Whether one would want
to fold this syntax into a C is a different question.

There are actually a number of competing languages in this space. E.g.
http://www.pslsugar.org/papers/pslandsva.pdf.
Re: \x{123a 123b 123c}
On Tue, Nov 22, 2005 at 08:19:04PM +1100, Damian Conway wrote:
: >And perhaps we'd want a general form for specifying other
: >pattern syntaxes; i.e., :perl5 and :glob are shortcuts for
: >:syntax('perl5') and :syntax('glob') or something like that.
:
: Agreed.

But the language in the following lexical scope is a constant, so what
can :syntax($foo) possibly mean? [Wait, this is Damian I'm talking to.]
Nevermind, don't answer that...

And there aren't that many regexish languages anyway. So I think :syntax
is relatively useless except for documentation, and in practice people
will almost always omit it, which makes it even less useful, and pretty
nearly kicks it over into the category of multiplied entities for me.

Larry
Re: \x{123a 123b 123c}
Patrick wrote:

> Since we already have :perl5, I'd think that we'd want globbing
> to be something like
>
>     rule jpeg :i :glob /*.jp{e,}g/
>
> or, for something intra-rule-ish:
>
>     m :w / mv (:glob *.c)+ /

Hear! Hear!

> And perhaps we'd want a general form for specifying other
> pattern syntaxes; i.e., :perl5 and :glob are shortcuts for
> :syntax('perl5') and :syntax('glob') or something like that.

Agreed.

Damian
Re: \x{123a 123b 123c}
On Tue, Nov 22, 2005 at 07:52:24AM -0800, Larry Wall wrote:
>
> I think we'll leave both _ and \_ meaning the same thing, just to avoid
> that confusion path [...]

Yay!

> : Whatever shortcuts we introduce, I'll be happy if we can just
> : rule that backslash+space (i.e., "\ ") is a literal space
> : character -- i.e., keeping the principle that placing a backslash
> : in front of a metacharacter removes that character's "meta"
> : behavior.
>
> Yes, that will be a space.

Yay!

> : Since we already have :perl5, I'd think that we'd want globbing
> : to be something like
> :     rule jpeg :i :glob /*.jp{e,}g/
> : or, for something intra-rule-ish:
> :     m :w / mv (:glob *.c)+ /
>
> Yep, that's what I decided in my other message that was thinking about
> using < ... > for word boundaries and << ... >> for capturing $<>.

Yay! (Our messages on this crossed in the mail; mine was moderated
for some reason but that's been corrected.)

> : And perhaps we'd want a general form for specifying other
> : pattern syntaxes; i.e., :perl5 and :glob are shortcuts for
> : :syntax('perl5') and :syntax('glob') or something like that.
>
> Maybe. Or maybe it's enough that there are syntactic categories for
> adding rule modifiers. Doesn't seem like you'd want to parameterize
> the current language very often.

At least within PGE, I'm starting to come across the situation where
each application and host language wants its own slight variations of
the regular expression syntax (for compatibility reasons). And I
figured that since we (conjecturally) have C<:lang('PIR')>,
C<:lang('Python')> and C<:lang('TCL')> to indicate the language to be
used for the closures within a rule, it might be nice to have a similar
parameterized modifier for the pattern syntax itself.

I was also thinking that one of the tricky parts to custom rule
modifiers such as :perl and :glob is that they actually change the
parsing for whatever follows, so it might be nice to have a
parameterized form to hook into rather than defining a custom modifier
for each syntax variant. But on thinking about it further from an
implementation perspective I guess it all comes out the same anyway...

Pm
Re: \x{123a 123b 123c}
On Mon, Nov 21, 2005 at 11:25:20AM -0600, Patrick R. Michaud wrote:
: On Mon, Nov 21, 2005 at 09:02:57AM -0800, Larry Wall wrote:
: > : There's also <sp>, unless someone redefines the subrule.
: >
: > But you can't use <sp> in a character class. Well, that is, unless
: > you write it:
: >
: >     <+[ a..z ]+<sp>>
: >
: > or some such. Maybe that's good enough.
:
: Er, that's now <+[ a..z ]+sp>, unless you're now changing it back.

No, just me going senile.

: > : And in the general case that's a slightly more expensive mechanism
: > : to get a space (it involves at least a subrule lookup). Perhaps
: > : we could also create a visible meta sequence for it, in the same
: > : way that we have visible metas for \e, \f, \r, \t. But I have
: > : no idea what letter we might use there.
: >
: > Something to be said for \_ in that regard.
:
: Yes, I thought of \_ but mentally I still have trouble
: classifying "_" along with the alphabetics -- '_' looks more
: like punctuation to me. And in general we use backslashes
: in front of metacharacters to remove their meta meaning
: (or when we aren't sure if a character has a meta meaning),
: so that \_ somehow seems like it ought to be a literal
: underscore, guarding against the possibility that the unescaped
: underscore has a meta meaning. (And yes, I can shoot
: holes in this line of thinking along with everyone else.)

I think we'll leave both _ and \_ meaning the same thing, just to avoid
that confusion path--I've seen people backwhacking anything remotely
resembling punctuation just in case it's a metacharacter, and if they
are confused about _, they might backwhack it.

More to the point, I think <sp> and +sp are about the right Huffman
length, given that matching a single space is usually wrong. You
usually want \s or \s*.

: Whatever shortcuts we introduce, I'll be happy if we can just
: rule that backslash+space (i.e., "\ ") is a literal space
: character -- i.e., keeping the principle that placing a backslash
: in front of a metacharacter removes that character's "meta"
: behavior.

Yes, that will be a space.

: > I dunno. If «...» in ordinary code does shell quoting, maybe «...» in
: > rules does filename globbing or some such. I can see some issues with
: > anchoring semantics. Makes more sense on a string as a whole, but maybe
: > can anchor on element boundaries if used on a list of filenames.
: > I suppose one could even go as far as
: >
: >     rule jpeg :i « *.jp{e,}g »
: >
: > or whatever the right glob syntax is.
:
: Since we already have :perl5, I'd think that we'd want globbing
: to be something like
:
:     rule jpeg :i :glob /*.jp{e,}g/
:
: or, for something intra-rule-ish:
:
:     m :w / mv (:glob *.c)+ /

Yep, that's what I decided in my other message that was thinking about
using < ... > for word boundaries and << ... >> for capturing $<>.

: And perhaps we'd want a general form for specifying other
: pattern syntaxes; i.e., :perl5 and :glob are shortcuts for
: :syntax('perl5') and :syntax('glob') or something like that.

Maybe. Or maybe it's enough that there are syntactic categories for
adding rule modifiers. Doesn't seem like you'd want to parameterize
the current language very often.

Larry
Re: \x{123a 123b 123c}
On Mon, Nov 21, 2005 at 09:02:57AM -0800, Larry Wall wrote:
> : There's also <sp>, unless someone redefines the subrule.
>
> But you can't use <sp> in a character class. Well, that is, unless
> you write it:
>
>     <+[ a..z ]+<sp>>
>
> or some such. Maybe that's good enough.

Er, that's now <+[ a..z ]+sp>, unless you're now changing it back.

> : And in the general case that's a slightly more expensive mechanism
> : to get a space (it involves at least a subrule lookup). Perhaps
> : we could also create a visible meta sequence for it, in the same
> : way that we have visible metas for \e, \f, \r, \t. But I have
> : no idea what letter we might use there.
>
> Something to be said for \_ in that regard.

Yes, I thought of \_ but mentally I still have trouble
classifying "_" along with the alphabetics -- '_' looks more
like punctuation to me. And in general we use backslashes
in front of metacharacters to remove their meta meaning
(or when we aren't sure if a character has a meta meaning),
so that \_ somehow seems like it ought to be a literal
underscore, guarding against the possibility that the unescaped
underscore has a meta meaning. (And yes, I can shoot
holes in this line of thinking along with everyone else.)

Whatever shortcuts we introduce, I'll be happy if we can just
rule that backslash+space (i.e., "\ ") is a literal space
character -- i.e., keeping the principle that placing a backslash
in front of a metacharacter removes that character's "meta"
behavior.

> I dunno. If «...» in ordinary code does shell quoting, maybe «...» in
> rules does filename globbing or some such. I can see some issues with
> anchoring semantics. Makes more sense on a string as a whole, but maybe
> can anchor on element boundaries if used on a list of filenames.
> I suppose one could even go as far as
>
>     rule jpeg :i « *.jp{e,}g »
>
> or whatever the right glob syntax is.

Since we already have :perl5, I'd think that we'd want globbing
to be something like

    rule jpeg :i :glob /*.jp{e,}g/

or, for something intra-rule-ish:

    m :w / mv (:glob *.c)+ /

And perhaps we'd want a general form for specifying other
pattern syntaxes; i.e., :perl5 and :glob are shortcuts for
:syntax('perl5') and :syntax('glob') or something like that.

Pm
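[Editorial illustration, not part of the thread.] One way to picture a :glob modifier is as a translation step from glob syntax into an ordinary regex. The sketch below is in Python rather than Perl 6, and the helper name `glob_to_regex` and the supported subset (`*`, `?`, `{a,b}`) are assumptions made for the example. It is not how PGE or any Perl 6 implementation does it.

```python
import re

def glob_to_regex(pattern: str) -> str:
    """Translate a small glob subset (*, ?, {a,b}) into a regex string.

    Hypothetical helper for illustration only.
    """
    out = []
    depth = 0  # nesting level of {...} alternations
    for ch in pattern:
        if ch == "*":
            out.append("[^/]*")      # any run of characters within one path element
        elif ch == "?":
            out.append("[^/]")       # exactly one character
        elif ch == "{":
            depth += 1
            out.append("(?:")
        elif ch == "}" and depth > 0:
            depth -= 1
            out.append(")")
        elif ch == "," and depth > 0:
            out.append("|")          # alternative separator inside braces
        else:
            out.append(re.escape(ch))
    if depth:
        raise ValueError("unbalanced '{' in glob: " + pattern)
    return "".join(out)

# Rough analogue of: rule jpeg :i :glob /*.jp{e,}g/
jpeg = re.compile(glob_to_regex("*.jp{e,}g"), re.IGNORECASE)

print(bool(jpeg.fullmatch("photo.JPEG")))
print(bool(jpeg.fullmatch("photo.png")))
```

Using `fullmatch` here sidesteps the anchoring question raised in the thread by treating the glob as matching a whole filename.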
Re: \x{123a 123b 123c}
On Mon, Nov 21, 2005 at 09:02:57AM -0800, Larry Wall wrote:
: But I'd like to reserve < > for delimiting what is returned by $<>,
: the string officially matched:
:
:     "foo bar baz" ~~ /:w foo < \w+ > baz/
:     say $/;  # foo bar baz
:     say $<>; # bar

Though it occurs to me that there's another possible interpretation,
culturally speaking. The overloading of \b has always bothered me,
plus the fact that \b can't distinguish which kind of word boundary
without additional context. In regex culture, we have the \<...\>
word matcher, and maybe that devolves to isolated < ... > in rules.

We could still use << ... >> to capture $<>, which I was leaning toward
anyway just for visibility reasons, since the two ends could be quite
far apart.

And file globbing could just be :glob or some such if we really need
to embed it in rules.

Larry
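[Editorial illustration, not part of the thread.] Both ideas in this message have rough analogues in Python's `re`, which may help readers who do not know the proposed rule syntax. The group name `result` and the constants `WORD_START` and `WORD_END` are made up for the example. This is a sketch of the concepts, not of Perl 6 semantics.

```python
import re

text = "foo bar baz"

# Analogue of $/ versus $<>: group 0 is the whole match, while the named
# group (hypothetical name "result") marks the part we actually want back.
m = re.search(r"foo\s+(?P<result>\w+)\s+baz", text)
whole, result = m.group(0), m.group("result")

# Python's \b does not say which side of a word it is on. Lookarounds can
# emulate directional boundaries like the \< and \> of other regex dialects.
WORD_START = r"(?<!\w)(?=\w)"   # zero-width: a word begins here
WORD_END = r"(?<=\w)(?!\w)"     # zero-width: a word ends here

# Mark every word start with "<" and every word end with ">".
marked = re.sub(WORD_END, ">", re.sub(WORD_START, "<", text))

print(whole)
print(result)
print(marked)
```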
Re: apo5 (was: Re: \x{123a 123b 123c})
On Mon, Nov 21, 2005 at 05:49:59PM +0100, Ruud H.G. van Tol wrote:
: Larry Wall:
: > Juerd:
: >> Ruud:
:
: >>> Maybe
: >>> "\x{123a 123b 123c}"
: >>> is a nice alternative of
: >>> "\x{123a} \x{123b} \x{123c}".
: >>
: >> Hmm, very cute and friendly! Can we keep it, please? Please?
:
: Thanks for the support.

Hey, this ain't exactly a popularity contest here... :-)

: > We already have, from A5, \x[0a;0d], so you can supposedly say
: > "\x[123a;123b;123c]"
:
:
: Found it in the old/new table on page 7. For me the semicolon is fine.

The fact that you say "page 7" leads me to guess that you're reading
it from perl.com. That's going to be the most out-of-date version.
Better would be

    dev.perl.org    one day latency but html-ified
    svn.perl.org    up to the minute but only in pod

In particular, the Apocalypses have little [Update:] sections that
are supposed to alert you to things that have changed since the
Apo was written. (Though some of those are a little out of date
right now too--I'm just working my way through A12 again.)

: I am using character names more and more, and between those, semicolons
: are less cluttery. Character names can contain spaces, but semicolons
: too? If not then
: \c[BEL; EXTENDED ARABIC-INDIC DIGIT ZERO] would be possible, but maybe
: better not, or more like
: \c['BEL'; 'EXTENDED ARABIC-INDIC DIGIT ZERO'] or even
: \c('BEL', 'EXTENDED ARABIC-INDIC DIGIT ZERO').

None of the current names contain either semicolon or comma, so I
expect they're avoiding them by policy.

: Something else:
: The '^' could be used for both the ultimate start- and end-of-string.
: This frees the '$'.

I think this is one of those aspects of regex culture that is too
entrenched to remove. Besides, you have to be able to distinguish
s/^/foo/ from s/$/foo/.

: There is still the '$$' that matches before embedded newlines, and since
: '^^' matches after those newlines, the '^^' and '$$' can only be unified
: to '^^' if it is one-width inside a string, so is like '[$$\n^^]' (or
: just '\n') there.

But then if you use it within a capture, you get an extra newline
you probably don't want.

: At start- and end-of-string the '^^' can still be a zero-width match.
: I am not sure about greedy (meaning to try one-width first) or
: non-greedy.
:
: Example: '^[(\N*)^^]*^' to capture all lines, clean of newlines.
: Not a lot clearer than '^[(\N*)\n*]*$', but freeing the '$' and '$$'
: might be worth it.

I don't think it's any clearer. In fact, I find all the ^'s there
are a little too visually confusing and contextual.

Larry
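[Editorial illustration, not part of the thread.] The point about s/^/foo/ versus s/$/foo/ is that two zero-width anchors at different positions cannot share one symbol without losing the ability to say which end you mean. A small Python sketch of the same distinction follows. It uses `\A` and `\Z` (Python's absolute start and end anchors) as stand-ins, which is an assumption of the example rather than anything proposed in the thread.

```python
import re

text = "one\ntwo\nthree"

# Two different zero-width positions, two different substitutions.
prepended = re.sub(r"\A", "foo", text)  # insert at the very start of the string
appended = re.sub(r"\Z", "foo", text)   # insert at the very end of the string

# Capturing every line without its newline, roughly what the
# '^[(\N*)\n*]*$' example in the message is after.
lines = re.findall(r"^.*$", text, flags=re.MULTILINE)

print(repr(prepended))
print(repr(appended))
print(lines)
```

With a single symbol for both ends, the first two substitutions could not be told apart.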
Re: \x{123a 123b 123c}
On Sun, Nov 20, 2005 at 10:27:17AM -0600, Patrick R. Michaud wrote:
: On Sat, Nov 19, 2005 at 06:32:17PM -0800, Larry Wall wrote:
: > On Sun, Nov 20, 2005 at 01:26:21AM +0100, Juerd wrote:
: > : Ruud H.G. van Tol skribis 2005-11-20 1:19 (+0100):
: > : > Maybe
: > : > "\x{123a 123b 123c}"
: > : > is a nice alternative of
: > : > "\x{123a} \x{123b} \x{123c}".
: >
: > We already have, from A5, \x[0a;0d], so you can supposedly say
: > "\x[123a;123b;123c]"
:
: Hmm, I hadn't caught that particular syntax in A05. AFAIK it's not
: in S05, so I should probably add it, or whatever syntax we end up
: adopting.

Yes.

: (BTW, we haven't announced it on p6l yet, but there's a new version of
: S05 available.)

Indeed, there are new versions of most of the S's. People who want
the latest should use svn.perl.org, which also makes it easy to do
diff listings with svn or svk.

: > [...]
: > But I see that the semicolon is rather cluttery, mainly because it's
: > too tall. I'm not sure going all the way to space is good, but we
: > might have
: > "\x[123a,123b,123c]"
: > just to get a little visual space along with the separator.
:
: Just to verify, with this syntax would we expect
:
:     \x[123a,123b,123c]+
:
: to be the same as
:
:     [\x123a \x123b \x123c]+
:
: and not "\x123a \x123b \x123c+" ?

Yes. I think the rule interpretation of \x is that it is a sequence to
be considered a single character regardless of its context. Certainly
the square brackets we've mandated would tend to read as grouping anyway.

Of course, the main point of the \x[a,b,c] notation is to allow
interpolation of sequences of hex characters into ordinary strings,
and those don't care about abstract character boundaries.

: > It occurs to me that we didn't spec whether character classes ignore
: > whitespace. They probably should, just so you can chunk things:
: >
: >     / <[ a..z A..Z 0..9 _ ]> /
: >
: > Then the question arises about whether <[ \ ]> is an escaped space
: > or a backslash, or illegal
:
: I vote that it's an escaped space. A backslash is nearly always \\
: (or should be imho).
:
: > But if we make it match a backslash
: > or illegal, then the minimal space matcher becomes \x20, I think,
: > unless you graduate to \s. On the other hand, if we make it match
: > a space, people aren't going to read that way unless they're pretty
: > sophisticated...
:
: There's also <sp>, unless someone redefines the subrule.

But you can't use <sp> in a character class. Well, that is, unless
you write it:

    <+[ a..z ]+<sp>>

or some such. Maybe that's good enough.

: And in the general case that's a slightly more expensive mechanism
: to get a space (it involves at least a subrule lookup). Perhaps
: we could also create a visible meta sequence for it, in the same
: way that we have visible metas for \e, \f, \r, \t. But I have
: no idea what letter we might use there.

Something to be said for \_ in that regard.

: I don't think I like this, but perhaps C<< <> >> becomes
: and C<< < > >> becomes <' '>? Seems like not enough visual distinction
: there...

<_> maybe. I'm good with <> being , and <,> being element boundary
when matching lists. But I'd like to reserve < > for delimiting what
is returned by $<>, the string officially matched:

    "foo bar baz" ~~ /:w foo < \w+ > baz/
    say $/;  # foo bar baz
    say $<>; # bar

Or possibly

    "foo bar baz" ~~ /:w foo << \w+ >> baz/

but that should probably mean whatever

    "foo bar baz" ~~ /:w foo « \w+ » baz/

eventually means. Which I haven't the foggiest. But we should probably
reserve the brackets on general principle's sake, just because brackets
are so scarce.

I dunno. If «...» in ordinary code does shell quoting, maybe «...» in
rules does filename globbing or some such. I can see some issues with
anchoring semantics. Makes more sense on a string as a whole, but maybe
can anchor on element boundaries if used on a list of filenames.
I suppose one could even go as far as

    rule jpeg :i « *.jp{e,}g »

or whatever the right glob syntax is.

Larry
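[Editorial illustration, not part of the thread.] The question of whether `\x[123a,123b,123c]+` quantifies the whole sequence or only its last character comes down to grouping. A Python sketch of the two readings is below. The variable names are made up for the example, and Python's `(?:...)` non-capturing group stands in for the Perl 6 `[...]` group.

```python
import re

seq = "\u123a\u123b\u123c"

# Reading 1 (the one Larry confirms): the sequence is a single unit,
# so '+' repeats all three characters together.
grouped = re.compile("(?:" + seq + ")+")

# Reading 2 (the one rejected): '+' binds only to the last character.
ungrouped = re.compile(seq + "+")

twice = seq * 2            # the full sequence repeated
last_doubled = seq + seq[2]  # only the final character repeated

print(bool(grouped.fullmatch(twice)), bool(grouped.fullmatch(last_doubled)))
print(bool(ungrouped.fullmatch(twice)), bool(ungrouped.fullmatch(last_doubled)))
```

The two patterns accept different strings, which is why the thread pins the interpretation down explicitly.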
apo5 (was: Re: \x{123a 123b 123c})
Larry Wall:
> Juerd:
>> Ruud:

>>> Maybe
>>> "\x{123a 123b 123c}"
>>> is a nice alternative of
>>> "\x{123a} \x{123b} \x{123c}".
>>
>> Hmm, very cute and friendly! Can we keep it, please? Please?

Thanks for the support.

> We already have, from A5, \x[0a;0d], so you can supposedly say
> "\x[123a;123b;123c]"

Found it in the old/new table on page 7. For me the semicolon is fine.

I am using character names more and more, and between those, semicolons
are less cluttery. Character names can contain spaces, but semicolons
too? If not then
\c[BEL; EXTENDED ARABIC-INDIC DIGIT ZERO] would be possible, but maybe
better not, or more like
\c['BEL'; 'EXTENDED ARABIC-INDIC DIGIT ZERO'] or even
\c('BEL', 'EXTENDED ARABIC-INDIC DIGIT ZERO').

Something else:
The '^' could be used for both the ultimate start- and end-of-string.
This frees the '$'.

There is still the '$$' that matches before embedded newlines, and since
'^^' matches after those newlines, the '^^' and '$$' can only be unified
to '^^' if it is one-width inside a string, so is like '[$$\n^^]' (or
just '\n') there.
At start- and end-of-string the '^^' can still be a zero-width match.
I am not sure about greedy (meaning to try one-width first) or
non-greedy.

Example: '^[(\N*)^^]*^' to capture all lines, clean of newlines.
Not a lot clearer than '^[(\N*)\n*]*$', but freeing the '$' and '$$'
might be worth it.

--
Affijn, Ruud

"Gewoon is een tijger."
Re: \x{123a 123b 123c}
On Mon, Nov 21, 2005 at 03:23:35PM +0100, TSa wrote:
> Patrick R. Michaud wrote:
> >There's also <sp>, unless someone redefines the subrule.
> >And in the general case that's a slightly more expensive mechanism
> >to get a space (it involves at least a subrule lookup). Perhaps
> >we could also create a visible meta sequence for it, in the same
> >way that we have visible metas for \e, \f, \r, \t. But I have
> >no idea what letter we might use there.
>
> How about \x and \X respectively? Note the *space* after it :)
> ...

If we're going to do that, I'd think it would be "\c " and "\C "
instead of "\x " and "\X ". I'm not really advocating this, I'm
just commenting that in this case \c seems more natural than \x.

Pm
Re: \x{123a 123b 123c}
HaloO,

Patrick R. Michaud wrote:
> There's also <sp>, unless someone redefines the subrule.
> And in the general case that's a slightly more expensive mechanism
> to get a space (it involves at least a subrule lookup). Perhaps
> we could also create a visible meta sequence for it, in the same
> way that we have visible metas for \e, \f, \r, \t. But I have
> no idea what letter we might use there.

How about \x and \X respectively? Note the *space* after it :)
I mean that much more serious than it might sound err read. I hope the
concept of unwritten things in the source being interesting values
of void/undef applies always.

OTOH, I'm usually not saying anything in the area of the grammar
subsystem, but I still try to wrap my brain around the underlying
unified conceptual level where rules and methods or subs and macros
are indistinguishable. So, please consider this as a well wanting
question. And please forgive the syntax errors.

With something like

    # or token? perhaps even sub?
    macro x ( HexLiteral *[$char = 32, [EMAIL PROTECTED] ) is parsed( * ) {...}

and \ in match strings escaping out to the macro level when the
circumfix match creator is invoked, I would expect

    m/ \x /;            # single space is required
    m/ \x20 /;          # same
    m/ <{x}> /;         # same?
    m/ \X /;            # any single char except space
    m/ \x\x\x /;        # exactly three spaces
    m/ \x[20,20,20] /;  # same, as proposed by Larry
    m/ \xy /;           # parse error 'y not a hex digit'
    m/ \x y /;          # one space then y

to insert verbatim, machine level chars into the match definition.
In particular *no* lookup is compiled in. I would call \x the
single character *exact* matcher and \X the *excluder*. BTW, the
definition of the latter could just be

    &X ::= !&x; # or automagically defined by up-casing and outer negation

if ? and ! play in the meta operator league.

> I don't think I like this, but perhaps C<< <> >> becomes
> and C<< < > >> becomes <' '>? Seems like not enough visual distinction
> there...

I strongly agree. I would ask the moot question *how* the single space
in / / is removed ---as leading, trailing or separating space---when
the parser goes over it. But I would never expect the source space to
make it into the compiled match code!
--
Re: \x{123a 123b 123c}
On Sat, Nov 19, 2005 at 06:32:17PM -0800, Larry Wall wrote:
> On Sun, Nov 20, 2005 at 01:26:21AM +0100, Juerd wrote:
> : Ruud H.G. van Tol skribis 2005-11-20 1:19 (+0100):
> : > Maybe
> : > "\x{123a 123b 123c}"
> : > is a nice alternative of
> : > "\x{123a} \x{123b} \x{123c}".
>
> We already have, from A5, \x[0a;0d], so you can supposedly say
> "\x[123a;123b;123c]"

Hmm, I hadn't caught that particular syntax in A05. AFAIK it's not
in S05, so I should probably add it, or whatever syntax we end up
adopting.

(BTW, we haven't announced it on p6l yet, but there's a new version of
S05 available.)

> [...]
> But I see that the semicolon is rather cluttery, mainly because it's
> too tall. I'm not sure going all the way to space is good, but we
> might have
> "\x[123a,123b,123c]"
> just to get a little visual space along with the separator.

Just to verify, with this syntax would we expect

    \x[123a,123b,123c]+

to be the same as

    [\x123a \x123b \x123c]+

and not "\x123a \x123b \x123c+" ?

> It occurs to me that we didn't spec whether character classes ignore
> whitespace. They probably should, just so you can chunk things:
>
>     / <[ a..z A..Z 0..9 _ ]> /
>
> Then the question arises about whether <[ \ ]> is an escaped space
> or a backslash, or illegal

I vote that it's an escaped space. A backslash is nearly always \\
(or should be imho).

> But if we make it match a backslash
> or illegal, then the minimal space matcher becomes \x20, I think,
> unless you graduate to \s. On the other hand, if we make it match
> a space, people aren't going to read that way unless they're pretty
> sophisticated...

There's also <sp>, unless someone redefines the subrule.
And in the general case that's a slightly more expensive mechanism
to get a space (it involves at least a subrule lookup). Perhaps
we could also create a visible meta sequence for it, in the same
way that we have visible metas for \e, \f, \r, \t. But I have
no idea what letter we might use there.

I don't think I like this, but perhaps C<< <> >> becomes
and C<< < > >> becomes <' '>? Seems like not enough visual distinction
there...

Pm
Re: \x{123a 123b 123c}
On Sun, Nov 20, 2005 at 01:26:21AM +0100, Juerd wrote:
: Ruud H.G. van Tol skribis 2005-11-20 1:19 (+0100):
: > Maybe
: > "\x{123a 123b 123c}"
: > is a nice alternative of
: > "\x{123a} \x{123b} \x{123c}".
:
: Hmm, very cute and friendly! Can we keep it, please? Please?

We already have, from A5, \x[0a;0d], so you can supposedly say

    "\x[123a;123b;123c]"

Note that square brackets are now the normative style though, since
we're trying to reserve curlies psychologically for closures.

But I see that the semicolon is rather cluttery, mainly because it's
too tall. I'm not sure going all the way to space is good, but we
might have

    "\x[123a,123b,123c]"

just to get a little visual space along with the separator. My problem
with space is that it has potential visual confusion with character
classes (especially with the square brackets), and it also will make
people wonder whether :w should match optional whitespace between
the characters. The commas seem to imply sequence to me, and they
occur often enough that you can see it's not a well-formed character
class, insofar as it has repeated characters.

It occurs to me that we didn't spec whether character classes ignore
whitespace. They probably should, just so you can chunk things:

    / <[ a..z A..Z 0..9 _ ]> /

Then the question arises about whether <[ \ ]> is an escaped space
or a backslash, or illegal. But if we make it match a backslash
or illegal, then the minimal space matcher becomes \x20, I think,
unless you graduate to \s. On the other hand, if we make it match
a space, people aren't going to read that way unless they're pretty
sophisticated...

Larry
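[Editorial illustration, not part of the thread.] For readers who want to see what the `\x[...]` list notation amounts to in string interpolation, here is a rough Python sketch. The function name `expand_hex_lists` is hypothetical, and accepting both comma and semicolon as separators is an assumption that mirrors the two spellings discussed above rather than any settled rule.

```python
import re

# Matches \x[hex,hex,...] with comma or semicolon separators (assumed grammar).
_HEX_LIST = re.compile(r"\\x\[([0-9A-Fa-f]+(?:\s*[,;]\s*[0-9A-Fa-f]+)*)\]")

def expand_hex_lists(text: str) -> str:
    """Replace each \\x[...] list with the characters it names.

    Hypothetical helper, not part of any Perl 6 implementation.
    """
    def replace(match: re.Match) -> str:
        codes = re.split(r"\s*[,;]\s*", match.group(1))
        # Each entry is a hexadecimal code point.
        return "".join(chr(int(code, 16)) for code in codes)
    return _HEX_LIST.sub(replace, text)

print(repr(expand_hex_lists(r"\x[0a;0d]")))
print(expand_hex_lists(r"\x[123a,123b,123c]") == "\u123a\u123b\u123c")
```

Text that does not fit the assumed grammar, such as `\x[zz]`, is left unchanged by this sketch.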
Re: \x{123a 123b 123c}
Ruud H.G. van Tol skribis 2005-11-20 1:19 (+0100):
> Maybe
> "\x{123a 123b 123c}"
> is a nice alternative of
> "\x{123a} \x{123b} \x{123c}".

Hmm, very cute and friendly! Can we keep it, please? Please?

Juerd
--
http://convolution.nl/maak_juerd_blij.html
http://convolution.nl/make_juerd_happy.html
http://convolution.nl/gajigu_juerd_n.html