:syntax (was: \x{123a 123b 123c})

2005-11-23 Thread Damian Conway

Larry wrote:

 But the language in the following lexical scope is a constant, so what can
 :syntax($foo) possibly mean?  [Wait, this is Damian I'm talking to.]
 Nevermind, don't answer that...

Too late! ;-)

Regex syntaxes already are a twisty maze of variations, mostly alike. I
can easily envisage Perl users occasionally needing/wanting/using
patterns which are any of:

:syntax<POSIX>
:syntax<grep>
:syntax<egrep>
:syntax<vim>
:syntax<Snobol>
:syntax<Google>

Not just because people are used to different syntaxes, but also because
programs will want to accept search patterns in different (generally: more
restrictive) syntaxes so as to be able to interpolate them safely:

use Regex::Google;

for =<> :prompt<Find:> -> $search {
    for @texts {
        say if m:syntax<Google>/$search/;
    }
}
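
To make the safe-interpolation point concrete, here is a minimal sketch in
Python rather than Perl 6. The tiny search syntax and the name
google_to_regex are invented for illustration; this is not an actual
Regex::Google interface, only one way a restrictive user-facing syntax
could be translated so that user input cannot inject metacharacters.

```python
import re

def google_to_regex(query: str) -> re.Pattern:
    # Hypothetical rules: whitespace separates terms, "quoted phrases" stay
    # together, every term must appear somewhere, nothing else is special.
    terms = re.findall(r'"([^"]*)"|(\S+)', query)
    words = [phrase or word for phrase, word in terms]
    # re.escape neutralises any regex metacharacters typed by the user.
    lookaheads = "".join(f"(?=.*{re.escape(w)})" for w in words)
    return re.compile(lookaheads, re.IGNORECASE | re.DOTALL)

texts = ["Perl 6 rules (draft)", "grep and egrep", "a.b"]
pattern = google_to_regex('"rules (draft)" perl')
matches = [t for t in texts if pattern.search(t)]
```

The parentheses in the query are matched literally, which is the whole
point of accepting the more restrictive syntax.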


 And there aren't that many regexish languages anyway.

That depends on how broadly you define regexish. Search is a *very* common
activity and people are (re-)inventing notations for it all the time.

Damian


Re: :syntax (was: \x{123a 123b 123c})

2005-11-23 Thread Luke Palmer
On 11/22/05, Damian Conway [EMAIL PROTECTED] wrote:
  :syntax<POSIX>
  :syntax<grep>
  :syntax<egrep>
  :syntax<vim>
  :syntax<Snobol>
  :syntax<Google>

Aren't we providing an interface to define your own regex modifiers? 
All of these can easily be mapped into Perl 6 patterns, so...

Modules welcome!  ;-)

Luke


Re: \x{123a 123b 123c}

2005-11-22 Thread Patrick R. Michaud
On Mon, Nov 21, 2005 at 09:02:57AM -0800, Larry Wall wrote:
 : There's also sp, unless someone redefines the sp subrule.
 
 But you can't use sp in a character class.  Well, that is, unless
 you write it:
 
 +[ a..z ]+sp
 
 or some such.  Maybe that's good enough.

Er, that's now +[ a..z ]+sp, unless you're now changing it back.

 : And in the general case that's a slightly more expensive mechanism 
 : to get a space (it involves at least a subrule lookup).  Perhaps 
 : we could also create a visible meta sequence for it, in the same 
 : way that we have visible metas for \e, \f, \r, \t.  But I have 
 : no idea what letter we might use there.
 
 Something to be said for \_ in that regard.

Yes, I thought of \_ but mentally I still have trouble 
classifying _ along with the alphabetics -- '_' looks more
like punctuation to me.  And in general we use backslashes
in front of metacharacters to remove their meta meaning
(or when we aren't sure if a character has a meta meaning),
so that \_ somehow seems like it ought to be a literal
underscore, guarding against the possibility that the unescaped
underscore has a meta meaning.  (And yes, I can shoot
holes in this line of thinking along with everyone else.)

Whatever shortcuts we introduce, I'll be happy if we can just
rule that backslash+space (i.e., \ ) is a literal space
character -- i.e., keeping the principle that placing a backslash
in front of a metacharacter removes that character's meta
behavior.

 I dunno.  If «...» in ordinary code does shell quoting, maybe «...» in
 rules does filename globbing or some such.  I can see some issues with
 anchoring semantics.  Makes more sense on a string as a whole, but maybe
 can anchor on element boundaries if used on a list of filenames.
 I suppose one could even go as far as
 
 rule jpeg :i « *.jp{e,}g »
 
 or whatever the right glob syntax is.

Since we already have :perl5, I'd think that we'd want globbing 
to be something like

rule jpeg :i :glob /*.jp{e,}g/

or, for something intra-rule-ish:

m :w / mv (:glob *.c)+ dir /

And perhaps we'd want a general form for specifying other 
pattern syntaxes; i.e., :perl5 and :glob are shortcuts for
:syntax('perl5') and :syntax('glob') or something like that.
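
A rough Python analogue of a parameterized :syntax(...) modifier, assuming
a simple name-to-translator table. The names SYNTAXES and
compile_with_syntax are made up for this sketch; fnmatch.translate is the
standard-library glob translator, which does not handle {e,} brace
alternation, so the jpeg glob is approximated as *.jp*g.

```python
import fnmatch
import re

# Hypothetical registry: syntax name -> function turning a pattern written
# in that syntax into a Python regex string.
SYNTAXES = {
    "regex": lambda pat: pat,     # already a Python regex
    "glob": fnmatch.translate,    # shell-style wildcards (no {a,b} braces)
    "literal": re.escape,         # no metacharacters at all
}

def compile_with_syntax(pattern: str, syntax: str = "regex", flags: int = 0):
    # Pick the translator by name, then compile the resulting regex.
    return re.compile(SYNTAXES[syntax](pattern), flags)

jpeg = compile_with_syntax("*.jp*g", syntax="glob", flags=re.IGNORECASE)
```

Shortcuts such as :glob would then just be fixed entries in the table,
which is roughly the "it all comes out the same" observation made later in
the thread.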

Pm


Re: \x{123a 123b 123c}

2005-11-22 Thread Larry Wall
On Mon, Nov 21, 2005 at 11:25:20AM -0600, Patrick R. Michaud wrote:
: On Mon, Nov 21, 2005 at 09:02:57AM -0800, Larry Wall wrote:
:  : There's also sp, unless someone redefines the sp subrule.
:  
:  But you can't use sp in a character class.  Well, that is, unless
:  you write it:
:  
:  +[ a..z ]+sp
:  
:  or some such.  Maybe that's good enough.
: 
: Er, that's now +[ a..z ]+sp, unless you're now changing it back.

No, just me going senile.

:  : And in the general case that's a slightly more expensive mechanism 
:  : to get a space (it involves at least a subrule lookup).  Perhaps 
:  : we could also create a visible meta sequence for it, in the same 
:  : way that we have visible metas for \e, \f, \r, \t.  But I have 
:  : no idea what letter we might use there.
:  
:  Something to be said for \_ in that regard.
: 
: Yes, I thought of \_ but mentally I still have trouble 
: classifying _ along with the alphabetics -- '_' looks more
: like punctuation to me.  And in general we use backslashes
: in front of metacharacters to remove their meta meaning
: (or when we aren't sure if a character has a meta meaning),
: so that \_ somehow seems like it ought to be a literal
: underscore, guarding against the possibility that the unescaped
: underscore has a meta meaning.  (And yes, I can shoot
: holes in this line of thinking along with everyone else.)

I think we'll leave both _ and \_ meaning the same thing, just to avoid
that confusion path--I've seen people backwhacking anything remotely
resembling punctuation just in case it's a metacharacter, and if they
are confused about _, they might backwhack it.  More to the point,
I think sp and +sp are about the right Huffman length, given that
matching a single space is usually wrong.  You usually want \s or \s*.
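
A small Python illustration of why a single literal space is usually the
wrong thing to match; the sample line is invented.

```python
import re

line = "key\t=  value"   # a tab before '=', two spaces after it

# One literal space on each side: fails on the tab and the double space.
literal_space = re.search(r"key = value", line)

# Whitespace classes with a quantifier cope with both.
flexible = re.search(r"key\s*=\s*value", line)
```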

: Whatever shortcuts we introduce, I'll be happy if we can just
: rule that backslash+space (i.e., \ ) is a literal space
: character -- i.e., keeping the principle that placing a backslash
: in front of a metacharacter removes that character's meta
: behavior.

Yes, that will be a space.
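
As a sketch of how these rules could combine inside a character class,
the Python function below converts a class body written with ignored
whitespace, a..z ranges and backslash-space into an ordinary Python
class. The function name and the exact rule set are assumptions made for
illustration, not the S05 definition.

```python
import re

def class_body_to_python(body: str) -> str:
    # Assumed rules: unescaped whitespace is ignored, ".." marks a range,
    # and a backslash makes the next character literal (so "\ " is a space).
    out = []
    i = 0
    while i < len(body):
        ch = body[i]
        if ch == "\\" and i + 1 < len(body):
            out.append(re.escape(body[i + 1]))
            i += 2
        elif body.startswith("..", i):
            out.append("-")
            i += 2
        elif ch.isspace():
            i += 1
        else:
            out.append(re.escape(ch))
            i += 1
    return "[" + "".join(out) + "]"

word_class = class_body_to_python("a..z A..Z 0..9 _")
with_space = class_body_to_python("a..z \\ ")
```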

:  I dunno.  If «...» in ordinary code does shell quoting, maybe «...» in
:  rules does filename globbing or some such.  I can see some issues with
:  anchoring semantics.  Makes more sense on a string as a whole, but maybe
:  can anchor on element boundaries if used on a list of filenames.
:  I suppose one could even go as far as
:  
:  rule jpeg :i « *.jp{e,}g »
:  
:  or whatever the right glob syntax is.
: 
: Since we already have :perl5, I'd think that we'd want globbing 
: to be something like
: 
: rule jpeg :i :glob /*.jp{e,}g/
: 
: or, for something intra-rule-ish:
: 
: m :w / mv (:glob *.c)+ dir /

Yep, that's what I decided in my other message that was thinking about
using  ...  for word boundaries and  ...  for capturing $.

: And perhaps we'd want a general form for specifying other 
: pattern syntaxes; i.e., :perl5 and :glob are shortcuts for
: :syntax('perl5') and :syntax('glob') or something like that.

Maybe.  Or maybe it's enough that there are syntactic categories for
adding rule modifiers.  Doesn't seem like you'd want to parameterize
the current language very often.

Larry


Re: \x{123a 123b 123c}

2005-11-22 Thread Patrick R. Michaud
On Tue, Nov 22, 2005 at 07:52:24AM -0800, Larry Wall wrote:
 
 I think we'll leave both _ and \_ meaning the same thing, just to avoid
 that confusion path [...]

Yay!

 : Whatever shortcuts we introduce, I'll be happy if we can just
 : rule that backslash+space (i.e., \ ) is a literal space
 : character -- i.e., keeping the principle that placing a backslash
 : in front of a metacharacter removes that character's meta
 : behavior.
 
 Yes, that will be a space.

Yay!

 : Since we already have :perl5, I'd think that we'd want globbing 
 : to be something like
 : rule jpeg :i :glob /*.jp{e,}g/
 : or, for something intra-rule-ish:
 : m :w / mv (:glob *.c)+ dir /
 
 Yep, that's what I decided in my other message that was thinking about
 using  ...  for word boundaries and  ...  for capturing $.

Yay! (Our messages on this crossed in the mail; mine was moderated for
some reason but that's been corrected.)

 : And perhaps we'd want a general form for specifying other 
 : pattern syntaxes; i.e., :perl5 and :glob are shortcuts for
 : :syntax('perl5') and :syntax('glob') or something like that.
 
 Maybe.  Or maybe it's enough that there are syntactic categories for
 adding rule modifiers.  Doesn't seem like you'd want to parameterize
 the current language very often.

At least within PGE, I'm starting to come across the situation
where each application and host language wants its own slight variations
of the regular expression syntax (for compatibility reasons).
And I figured that since we (conjecturally) have :lang('PIR'), 
:lang('Python') and :lang('TCL') to indicate the language 
to be used for the closures within a rule, it might be nice to 
have a similar parameterized modifier for the pattern syntax
itself.

I was also thinking that one of the tricky parts to custom rule
modifiers such as :perl and :glob is that they actually change
the parsing for whatever follows, so it might be nice to have
a parameterized form to hook into rather than defining a custom
modifier for each syntax variant.  But on thinking about it 
further from an implementation perspective I guess it all comes 
out the same anyway...

Pm


Re: \x{123a 123b 123c}

2005-11-22 Thread Damian Conway

Patrick wrote:

Since we already have :perl5, I'd think that we'd want globbing 
to be something like


rule jpeg :i :glob /*.jp{e,}g/

or, for something intra-rule-ish:

m :w / mv (:glob *.c)+ dir /


Hear! Hear!

And perhaps we'd want a general form for specifying other 
pattern syntaxes; i.e., :perl5 and :glob are shortcuts for

:syntax('perl5') and :syntax('glob') or something like that.


Agreed.

Damian


Re: \x{123a 123b 123c}

2005-11-22 Thread Larry Wall
On Tue, Nov 22, 2005 at 08:19:04PM +1100, Damian Conway wrote:
: And perhaps we'd want a general form for specifying other 
: pattern syntaxes; i.e., :perl5 and :glob are shortcuts for
: :syntax('perl5') and :syntax('glob') or something like that.
: 
: Agreed.

But the language in the following lexical scope is a constant, so what can
:syntax($foo) possibly mean?  [Wait, this is Damian I'm talking to.]
Nevermind, don't answer that...

And there aren't that many regexish languages anyway.  So I think :syntax
is relatively useless except for documentation, and in practice people
will almost always omit it, which makes it even less useful, and pretty
nearly kicks it over into the category of multiplied entities for me.

Larry


Re: \x{123a 123b 123c}

2005-11-22 Thread Dave Whipp

Larry Wall wrote:


And there aren't that many regexish languages anyway.  So I think :syntax
is relatively useless except for documentation, and in practice people
will almost always omit it, which makes it even less useful, and pretty
nearly kicks it over into the category of multiplied entities for me.


It's surprising how many are out there. Even if we ignore the various 
dialects of standard rexen, we can find interesting examples such as 
PSL, a language for specifying temporal assertions, for hardware design: 
http://www.project-veripage.com/psl_tutorial_5.php. Whether one would 
want to fold this syntax into a rule is a different question.


There are actually a number of competing languages in this space. E.g. 
http://www.pslsugar.org/papers/pslandsva.pdf.


Re: \x{123a 123b 123c}

2005-11-22 Thread Larry Wall
On Tue, Nov 22, 2005 at 09:46:59AM -0800, Dave Whipp wrote:
: Larry Wall wrote:
: 
: And there aren't that many regexish languages anyway.  So I think :syntax
: is relatively useless except for documentation, and in practice people
: will almost always omit it, which makes it even less useful, and pretty
: nearly kicks it over into the category of multiplied entities for me.
: 
: It's surprising how many are out there.

We can certainly add a :syntax() modifier as easily as a :foolang modifier,
if we decide at some point we really need one, or if PGE could make good
use of it even if Perl 6 doesn't want it.

Larry


Re: \x{123a 123b 123c}

2005-11-22 Thread Patrick R. Michaud
On Tue, Nov 22, 2005 at 10:30:20AM -0800, Larry Wall wrote:
 On Tue, Nov 22, 2005 at 09:46:59AM -0800, Dave Whipp wrote:
 : Larry Wall wrote:
 : 
 : And there aren't that many regexish languages anyway.  So I think :syntax
 : is relatively useless except for documentation, and in practice people
 : will almost always omit it, which makes it even less useful, and pretty
 : nearly kicks it over into the category of multiplied entities for me.
 : 
 : It's surprising how many are out there.
 
 We can certainly add a :syntax() modifier as easily as a :foolang modifier,
 if we decide at some point we really need one, or if PGE could make good
 use of it even if Perl 6 doesn't want it.

I'm agreeing with Larry on this one -- let's wait to decide this 
until we actually feel like we need it.

Pm


Re: \x{123a 123b 123c}

2005-11-22 Thread Patrick R. Michaud
On Mon, Nov 21, 2005 at 09:02:57AM -0800, Larry Wall wrote:
 On Sun, Nov 20, 2005 at 10:27:17AM -0600, Patrick R. Michaud wrote:
 : On Sat, Nov 19, 2005 at 06:32:17PM -0800, Larry Wall wrote:
 :  We already have, from A5, \x[0a;0d], so you can supposedly say 
 :  \x[123a;123b;123c] 
 : 
 : Hmm, I hadn't caught that particular syntax in A05.  AFAIK it's not 
 : in S05, so I should probably add it, or whatever syntax we end up 
 : adopting.
 
 Yes.

Out of curiosity (and so I can update S05 and PGE), what syntax 
are we adopting?  Is it semicolon, comma, space, any combination of the 
three, or ...?

Pm


Re: \x{123a 123b 123c}

2005-11-22 Thread Larry Wall
On Tue, Nov 22, 2005 at 12:48:39PM -0600, Patrick R. Michaud wrote:
: On Mon, Nov 21, 2005 at 09:02:57AM -0800, Larry Wall wrote:
:  On Sun, Nov 20, 2005 at 10:27:17AM -0600, Patrick R. Michaud wrote:
:  : On Sat, Nov 19, 2005 at 06:32:17PM -0800, Larry Wall wrote:
:  :  We already have, from A5, \x[0a;0d], so you can supposedly say 
:  :  \x[123a;123b;123c] 
:  : 
:  : Hmm, I hadn't caught that particular syntax in A05.  AFAIK it's not 
:  : in S05, so I should probably add it, or whatever syntax we end up 
:  : adopting.
:  
:  Yes.
: 
: Out of curiosity (and so I can update S05 and PGE), what syntax 
: are we adopting?  Is it semicolon, comma, space, any combination of the 
: three, or ...?

S02.pod currently has it as comma.

Larry


Re: \x{123a 123b 123c}

2005-11-21 Thread TSa

HaloO,

Patrick R. Michaud wrote:

There's also sp, unless someone redefines the sp subrule.
And in the general case that's a slightly more expensive mechanism 
to get a space (it involves at least a subrule lookup).  Perhaps 
we could also create a visible meta sequence for it, in the same 
way that we have visible metas for \e, \f, \r, \t.  But I have 
no idea what letter we might use there.


How about \x and \X respectively? Note the *space* after it :)
I mean that much more seriously than it might sound, err, read.
I hope the concept of unwritten things in the source being
interesting values of void/undef applies always.

OTOH, I'm usually not saying anything in the area of the grammar
subsystem, but I still try to wrap my brain around the underlying
unified conceptual level where rules and methods or subs and macros
are indistinguishable. So, please consider this a well-meaning
question. And please forgive the syntax errors.

With something like

   # or token? perhaps even sub?
   macro   x ( HexLiteral *[$char = 32, [EMAIL PROTECTED] )
   is parsed( HexLiteral* )
   {...}

and \ in match strings escaping out to the macro level when
the circumfix match creator is invoked, I would expect

   m/  \x   /;  # single space is required
   m/  \x20 /;  # same
   m/ {x} /;  # same?
   m/  \X   /;  # any single char except space
   m/  \x\x\x   /;  # exactly three spaces
   m/  \x[20,20,20] /;  # same, as proposed by Larry
   m/  \xy  /;  # parse error 'y not a hex digit'
   m/  \x y /;  # one space then y

to insert verbatim, machine level chars into the match definition.
In particular *no* lookup is compiled in.

I would call \x the single character *exact* matcher and \X
the *excluder*. BTW, the definition of the latter could just be

   X ::= !x; # or automagically defined by up-casing and outer negation

if ? and ! play in the meta operator league.


I don't think I like this, but perhaps <> becomes <?null>
and < > becomes ' '?  Seems like not enough visual distinction
there...


I strongly agree. I would ask the moot question *how* the single space
in / / is removed ---as leading, trailing or separating space---when the
parser goes over it. But I would never expect the source space to make it
into the compiled match code!
--


Re: \x{123a 123b 123c}

2005-11-21 Thread Patrick R. Michaud
On Mon, Nov 21, 2005 at 03:23:35PM +0100, TSa wrote:
 Patrick R. Michaud wrote:
 There's also sp, unless someone redefines the sp subrule.
 And in the general case that's a slightly more expensive mechanism 
 to get a space (it involves at least a subrule lookup).  Perhaps 
 we could also create a visible meta sequence for it, in the same 
 way that we have visible metas for \e, \f, \r, \t.  But I have 
 no idea what letter we might use there.
 
 How about \x and \X respectively? Note the *space* after it :)
 ...

If we're going to do that, I'd think it would be \c  and \C  
instead of \x  and \X .  I'm not really advocating this,
I'm just commenting that in this case \c seems more natural 
than \x.

Pm


apo5 (was: Re: \x{123a 123b 123c})

2005-11-21 Thread Ruud H.G. van Tol
Larry Wall:
 Juerd:
 Ruud:

 Maybe
 \x{123a 123b 123c}
 is a nice alternative of
 \x{123a} \x{123b} \x{123c}.

 Hmm, very cute and friendly! Can we keep it, please? Please?

Thanks for the support.


 We already have, from A5, \x[0a;0d], so you can supposedly say
 \x[123a;123b;123c]

rereading apo5 /
Found it in the old/new table on page 7. For me the semicolon is fine.

I am using character names more and more, and between those, semicolons
are less cluttery. Character names can contain spaces, but semicolons
too? If not then
\c[BEL; EXTENDED ARABIC-INDIC DIGIT ZERO] would be possible, but maybe
better not, or more like
\c['BEL'; 'EXTENDED ARABIC-INDIC DIGIT ZERO'] or even
\c('BEL', 'EXTENDED ARABIC-INDIC DIGIT ZERO').



Something else:
The '^' could be used for both the ultimate start- and end-of-string.
This frees the '$'.

There is still the '$$' that matches before embedded newlines, and since
'^^' matches after those newlines, the '^^' and '$$' can only be unified
to '^^' if it is one-width inside a string, so is like '[$$\n^^]' (or
just '\n') there.
At start- and end-of-string the '^^' can still be a zero-width match.
I am not sure about greedy (meaning to try one-width first) or
non-greedy.

Example: '^[(\N*)^^]*^' to capture all lines, clean of newlines.
Not a lot clearer than '^[(\N*)\n*]*$', but freeing the '$' and '$$'
might be worth it.

mess about '^^+', '^+^' and '^*^' (bats!) removed

-- 
Affijn, Ruud

Gewoon is een tijger.



Re: \x{123a 123b 123c}

2005-11-21 Thread Larry Wall
On Sun, Nov 20, 2005 at 10:27:17AM -0600, Patrick R. Michaud wrote:
: On Sat, Nov 19, 2005 at 06:32:17PM -0800, Larry Wall wrote:
:  On Sun, Nov 20, 2005 at 01:26:21AM +0100, Juerd wrote:
:  : Ruud H.G. van Tol skribis 2005-11-20  1:19 (+0100):
:  :  Maybe 
:  :  \x{123a 123b 123c} 
:  :  is a nice alternative of 
:  :  \x{123a} \x{123b} \x{123c}. 
:  
:  We already have, from A5, \x[0a;0d], so you can supposedly say 
:  \x[123a;123b;123c] 
: 
: Hmm, I hadn't caught that particular syntax in A05.  AFAIK it's not 
: in S05, so I should probably add it, or whatever syntax we end up 
: adopting.

Yes.

: (BTW, we haven't announced it on p6l yet, but there's a new version of
: S05 available.)

Indeed, there are new versions of most of the S's.  People who want the
latest should use svn.perl.org, which also makes it easy to do diff listings
with svn or svk.

:  [...]
:  But I see that the semicolon is rather cluttery, mainly because it's
:  too tall.  I'm not sure going all the way to space is good, but we
:  might have
:  \x[123a,123b,123c] 
:  just to get a little visual space along with the separator.  
: 
: Just to verify, with this syntax would we expect
: 
: \x[123a,123b,123c]+
: 
: to be the same as
: 
: [\x123a \x123b \x123c]+
: 
: and not \x123a \x123b \x123c+ ?

Yes.  I think the rule interpretation of \x is that it is a sequence to
be considered a single character regardless of its context.  Certainly
the square brackets we've mandated would tend to read as grouping anyway.

Of course, the main point of the \x[a,b,c] notation is to allow
interpolation of sequences of hex characters into ordinary strings,
and those don't care about abstract character boundaries.
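
The two readings of \x[a,b,c] described here can be sketched in Python.
The helper names are invented, and Python has no such escape itself, so
the notation is expanded by hand: in a string it becomes the characters
themselves, and in a pattern it becomes one group so that a following
quantifier applies to the whole sequence.

```python
import re

# Matches the literal text \x[...] containing comma-separated hex numbers.
HEX_LIST = re.compile(r"\\x\[([0-9A-Fa-f]+(?:,[0-9A-Fa-f]+)*)\]")

def expand_hex_lists(text: str) -> str:
    # String use: replace each \x[a,b,c] with the characters it names.
    def to_chars(m):
        return "".join(chr(int(h, 16)) for h in m.group(1).split(","))
    return HEX_LIST.sub(to_chars, text)

def hex_lists_to_regex(pattern: str) -> str:
    # Pattern use: replace each \x[a,b,c] with one non-capturing group.
    def to_group(m):
        chars = (chr(int(h, 16)) for h in m.group(1).split(","))
        return "(?:" + "".join(re.escape(c) for c in chars) + ")"
    return HEX_LIST.sub(to_group, pattern)

plain = expand_hex_lists(r"A\x[42,43]D")
grouped = hex_lists_to_regex(r"\x[61,62]+")
```

With the grouped reading, the trailing + repeats the whole two-character
sequence rather than only the last character.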

:  It occurs to me that we didn't spec whether character classes ignore
:  whitespace.  They probably should, just so you can chunk things:
:  
:  / [ a..z A..Z 0..9 _ ] /
:  
:  Then the question arises about whether [ \ ] is an escaped space
:  or a backslash, or illegal.
: 
: I vote that it's an escaped space.  A backslash is nearly always \\
: (or should be imho).
: 
:  But if we make it match a backslash
:  or illegal, then the minimal space matcher becomes \x20, I think,
:  unless you graduate to \s.  On the other hand, if we make it match
:  a space, people aren't going to read that way unless they're pretty
:  sophisticated...
: 
: There's also sp, unless someone redefines the sp subrule.

But you can't use sp in a character class.  Well, that is, unless
you write it:

+[ a..z ]+sp

or some such.  Maybe that's good enough.

: And in the general case that's a slightly more expensive mechanism 
: to get a space (it involves at least a subrule lookup).  Perhaps 
: we could also create a visible meta sequence for it, in the same 
: way that we have visible metas for \e, \f, \r, \t.  But I have 
: no idea what letter we might use there.

Something to be said for \_ in that regard.

: I don't think I like this, but perhaps <> becomes <?null>
: and < > becomes ' '?  Seems like not enough visual distinction
: there...

_ maybe.  I'm good with  being ?null, and , being element boundary
when matching lists.  But I'd like to reserve   for delimiting what
is returned by $, the string officially matched:

foo bar baz ~~ /:w foo  \w+  baz/
say $/; # foo bar baz
say $;# bar

Or possibly

foo bar baz ~~ /:w foo  \w+  baz/

but that should probably mean whatever

foo bar baz ~~ /:w foo « \w+ » baz/

eventually means.  Which I haven't the foggiest.  But we should probably
reserve the brackets on general principle's sake, just because brackets
are so scarce.

I dunno.  If «...» in ordinary code does shell quoting, maybe «...» in
rules does filename globbing or some such.  I can see some issues with
anchoring semantics.  Makes more sense on a string as a whole, but maybe
can anchor on element boundaries if used on a list of filenames.
I suppose one could even go as far as

rule jpeg :i « *.jp{e,}g »

or whatever the right glob syntax is.

Larry


Re: apo5 (was: Re: \x{123a 123b 123c})

2005-11-21 Thread Larry Wall
On Mon, Nov 21, 2005 at 05:49:59PM +0100, Ruud H.G. van Tol wrote:
: Larry Wall:
:  Juerd:
:  Ruud:
: 
:  Maybe
:  \x{123a 123b 123c}
:  is a nice alternative of
:  \x{123a} \x{123b} \x{123c}.
: 
:  Hmm, very cute and friendly! Can we keep it, please? Please?
: 
: Thanks for the support.

Hey, this ain't exactly a popularity contest here...  :-)

:  We already have, from A5, \x[0a;0d], so you can supposedly say
:  \x[123a;123b;123c]
: 
: rereading apo5 /
: Found it in the old/new table on page 7. For me the semicolon is fine.

The fact that you say page 7 leads me to guess that you're reading
it from perl.com.  That's going to be the most out-of-date version.
Better would be

dev.perl.org    one day latency but html-ified
svn.perl.org    up to the minute but only in pod

In particular, the Apocalypses have little [Update:] sections that are
supposed to alert you to things that have changed since the Apo
was written.  (Though some of those are a little out of date right now
too--I'm just working my way through A12 again.)

: I am using character names more and more, and between those, semicolons
: are less cluttery. Character names can contain spaces, but semicolons
: too? If not then
: \c[BEL; EXTENDED ARABIC-INDIC DIGIT ZERO] would be possible, but maybe
: better not, or more like
: \c['BEL'; 'EXTENDED ARABIC-INDIC DIGIT ZERO'] or even
: \c('BEL', 'EXTENDED ARABIC-INDIC DIGIT ZERO').

None of the current names contain either semicolon or comma, so I expect
they're avoiding them by policy.
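
Since names contain spaces but (so far) no semicolons or commas, a
separator-based list of names is easy to split. A minimal Python sketch,
assuming a semicolon separator; chars_from_names is a made-up helper,
while unicodedata.lookup is the standard-library name lookup.

```python
import unicodedata

def chars_from_names(spec: str, sep: str = ";") -> str:
    # Split on the separator, trim surrounding blanks, look each name up.
    return "".join(unicodedata.lookup(name.strip()) for name in spec.split(sep))

digits = chars_from_names("EXTENDED ARABIC-INDIC DIGIT ZERO; DIGIT ONE")
```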

: Something else:
: The '^' could be used for both the ultimate start- and end-of-string.
: This frees the '$'.

I think this is one of those aspects of regex culture that is too
entrenched to remove.  Besides, you have to be able to distinguish
s/^/foo/ from s/$/foo/.
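
The same distinction, shown with Python's re.sub instead of s///: with a
single anchor for both ends there would be no way to say which of these
two substitutions is meant.

```python
import re

text = "bar"
at_start = re.sub(r"^", "foo", text)   # insert at the start of the string
at_end = re.sub(r"$", "foo", text)     # insert at the end of the string
```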

: There is still the '$$' that matches before embedded newlines, and since
: '^^' matches after those newlines, the '^^' and '$$' can only be unified
: to '^^' if it is one-width inside a string, so is like '[$$\n^^]' (or
: just '\n') there.

But then if you use it within a capture, you get an extra newline you
probably don't want.
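
A Python sketch of that objection. Python has no one-width line boundary,
so it is emulated here by letting the capture swallow an optional newline;
the sample text is invented.

```python
import re

text = "one\ntwo\nthree"

# Zero-width line anchors: the captures contain no newlines.
clean = re.findall(r"^(.*)$", text, re.MULTILINE)

# A boundary that consumes the newline (emulated with \n? inside the
# capture) drags the newline into every captured line but the last.
with_newlines = re.findall(r"^(.*\n?)", text, re.MULTILINE)
```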

: At start- and end-of-string the '^^' can still be a zero-width match.
: I am not sure about greedy (meaning to try one-width first) or
: non-greedy.
: 
: Example: '^[(\N*)^^]*^' to capture all lines, clean of newlines.
: Not a lot clearer than '^[(\N*)\n*]*$', but freeing the '$' and '$$'
: might be worth it.

I don't think it's any clearer.  In fact, I find all the ^'s there
are a little too visually confusing and contextual.

Larry


Re: \x{123a 123b 123c}

2005-11-21 Thread Larry Wall
On Mon, Nov 21, 2005 at 09:02:57AM -0800, Larry Wall wrote:
: But I'd like to reserve   for delimiting what is returned by $,
: the string officially matched:
: 
: foo bar baz ~~ /:w foo  \w+  baz/
: say $/;   # foo bar baz
: say $;  # bar

Though it occurs to me that there's another possible interpretation,
culturally speaking.  The overloading of \b has always bothered me,
plus the fact that \b can't distinguish which kind of word boundary
without additional context.  In regex culture, we have the \<...\>
word matcher, and maybe that devolves to isolated <...> in rules.
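
The ambiguity of \b can be seen in Python, which has \b but no separate
start-of-word and end-of-word assertions. The lookaround combinations
below are one common way to emulate them and are not part of any proposal
in this thread.

```python
import re

text = "foo bar"

# \b alone reports both kinds of boundary.
any_boundary = [m.start() for m in re.finditer(r"\b", text)]

# Lookarounds split it into "a word starts here" and "a word ends here",
# roughly what the separate word-start and word-end assertions in grep
# and vim express.
word_starts = [m.start() for m in re.finditer(r"\b(?=\w)", text)]
word_ends = [m.start() for m in re.finditer(r"\b(?<=\w)", text)]
```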

We could still use  ...  to capture $, which I was leaning toward
anyway just for visibility reasons, since the two ends could be quite
far apart.

And file globbing could just be :glob or some such if we really need
to embed it in rules.

Larry


Re: \x{123a 123b 123c}

2005-11-20 Thread Patrick R. Michaud
On Sat, Nov 19, 2005 at 06:32:17PM -0800, Larry Wall wrote:
 On Sun, Nov 20, 2005 at 01:26:21AM +0100, Juerd wrote:
 : Ruud H.G. van Tol skribis 2005-11-20  1:19 (+0100):
 :  Maybe 
 :  \x{123a 123b 123c} 
 :  is a nice alternative of 
 :  \x{123a} \x{123b} \x{123c}. 
 
 We already have, from A5, \x[0a;0d], so you can supposedly say 
 \x[123a;123b;123c] 

Hmm, I hadn't caught that particular syntax in A05.  AFAIK it's not 
in S05, so I should probably add it, or whatever syntax we end up 
adopting.

(BTW, we haven't announced it on p6l yet, but there's a new version of
S05 available.)

 [...]
 But I see that the semicolon is rather cluttery, mainly because it's
 too tall.  I'm not sure going all the way to space is good, but we
 might have
 \x[123a,123b,123c] 
 just to get a little visual space along with the separator.  

Just to verify, with this syntax would we expect

\x[123a,123b,123c]+

to be the same as

[\x123a \x123b \x123c]+

and not \x123a \x123b \x123c+ ?

 It occurs to me that we didn't spec whether character classes ignore
 whitespace.  They probably should, just so you can chunk things:
 
 / [ a..z A..Z 0..9 _ ] /
 
 Then the question arises about whether [ \ ] is an escaped space
 or a backslash, or illegal.

I vote that it's an escaped space.  A backslash is nearly always \\
(or should be imho).

 But if we make it match a backslash
 or illegal, then the minimal space matcher becomes \x20, I think,
 unless you graduate to \s.  On the other hand, if we make it match
 a space, people aren't going to read that way unless they're pretty
 sophisticated...

There's also sp, unless someone redefines the sp subrule.
And in the general case that's a slightly more expensive mechanism 
to get a space (it involves at least a subrule lookup).  Perhaps 
we could also create a visible meta sequence for it, in the same 
way that we have visible metas for \e, \f, \r, \t.  But I have 
no idea what letter we might use there.

I don't think I like this, but perhaps <> becomes <?null>
and < > becomes ' '?  Seems like not enough visual distinction
there...

Pm


\x{123a 123b 123c}

2005-11-19 Thread Ruud H.G. van Tol
Maybe 

\x{123a 123b 123c} 

is a nice alternative of 

\x{123a} \x{123b} \x{123c}. 

-- 
Grtz, Ruud


Re: \x{123a 123b 123c}

2005-11-19 Thread Juerd
Ruud H.G. van Tol skribis 2005-11-20  1:19 (+0100):
 Maybe 
 \x{123a 123b 123c} 
 is a nice alternative of 
 \x{123a} \x{123b} \x{123c}. 

Hmm, very cute and friendly! Can we keep it, please? Please?


Juerd
-- 
http://convolution.nl/maak_juerd_blij.html
http://convolution.nl/make_juerd_happy.html 
http://convolution.nl/gajigu_juerd_n.html


Re: \x{123a 123b 123c}

2005-11-19 Thread Larry Wall
On Sun, Nov 20, 2005 at 01:26:21AM +0100, Juerd wrote:
: Ruud H.G. van Tol skribis 2005-11-20  1:19 (+0100):
:  Maybe 
:  \x{123a 123b 123c} 
:  is a nice alternative of 
:  \x{123a} \x{123b} \x{123c}. 
: 
: Hmm, very cute and friendly! Can we keep it, please? Please?

We already have, from A5, \x[0a;0d], so you can supposedly say 

\x[123a;123b;123c] 

Note that square brackets are now the normative style though, since we're
trying to reserve curlies psychologically for closures.

But I see that the semicolon is rather cluttery, mainly because it's
too tall.  I'm not sure going all the way to space is good, but we
might have

\x[123a,123b,123c] 

just to get a little visual space along with the separator.  My problem
with space is that it has potential visual confusion with character
classes (especially with the square brackets), and it also will make
people wonder whether :w should match optional whitespace between
the characters.  The commas seem to imply sequence to me, and they
occur often enough that you can see it's not a well-formed character
class, insofar as it has repeated characters.

It occurs to me that we didn't spec whether character classes ignore
whitespace.  They probably should, just so you can chunk things:

/ [ a..z A..Z 0..9 _ ] /

Then the question arises about whether [ \ ] is an escaped space
or a backslash, or illegal.  But if we make it match a backslash
or illegal, then the minimal space matcher becomes \x20, I think,
unless you graduate to \s.  On the other hand, if we make it match
a space, people aren't going to read that way unless they're pretty
sophisticated...

Larry