Re: Please rename 'but' to 'has'.

2002-04-22 Thread Aaron Sherman


On Sun, 2002-04-21 at 10:59, Trey Harris wrote:

> 0 has true
>
> my first reaction would be, huh?  Since when?

Dare I say... now? ;-)

Sorry, someone had to say it.

Personally, even though it sucks up namespace, I think what we're seeing
here is a need for several keywords that are synonyms. "but" and
"now" seem to cover a good deal of ground.

0 now true

Is misleading, IMHO, as 0 is not now true. 0, in this context, is an
expression, and we're saying that that expression is now true. "but"
conveys this much more clearly. However, as many have pointed out, there
are a number of cases where "but" is equally misleading.

Is there any problem with allowing both "but" and "now"? It might even be
elegant to use both at the same time:

$x now integer but true

which is clearer to my eye than

$x now integer now true

which seems to change the properties of $x twice without reconciling the
changes with each other.

In any other language this would be unthinkable, but I think it fits
nicely with Perl's philosophy. Not TMTOWTDI, which I think is often used
to excuse the inexcusable, but the idea that Perl reflects the ways in
which humans use language. We want to convey shades of meaning that do
not translate directly to action.

So, have I just lost it, or would it make sense to have "now" and "but"?

Apologies to the person who started this thread. I know you thought
"has" was ideal, and I understand why. It's just that between "but" and
"now", I think you get more ground covered than you do with "has" and
either one.





RE: Regex and Matched Delimiters

2002-04-22 Thread Aaron Sherman

On Sat, 2002-04-20 at 05:06, Mike Lambert wrote:
> > He then went on to describe something I didn't understand at all.
> > Sorry.
>
> Few corrections to what you wrote:
>
> To avoid the problem of extending {} to support new features with a
> character 'x', without breaking stuff that might have an 'x' immediately
> after the '{', my proposal is to require one space after the { before the
> real regex appears.

I hope that you mean one or more whitespace characters, not just a
space. The following would be correct, no?

/{|
.*
 }/

Anything else would seem rather confusing to the average Perl
programmer.





Re: Regex and Matched Delimiters

2002-04-22 Thread Aaron Sherman

On Sat, 2002-04-20 at 14:33, Me wrote:

> [2c. What about ( data) or (ops data) normally means non-capturing,
> ($2 data) captures into $2, ($foo data) captures into $foo?]

Very nice (but, I assume you meant {$foo data})! This does add another
special case to the regexp parser's handling of $, but it seems like
it would be worth it.

Makes me think of the even slightly hairier:

{foo data}

or even more hair-full:

{{$foo} data}

for references.

Where you capture into the usual positional, and then invoke foo with
the variable as parameter.

Would be pretty nice closure-wise:

sub match_with_alert($re,$id,$ops,$fac,$pri) {
    openlog $id,$ops,$fac;
    my $alert = sub ($match) {
        syslog $pri, "Matched regexp: $match";
    }
    return study /{{$alert} $re}/;
}
my $m = match_with_alert('ROOT login',$0,0,LOG_USER,PRI_CRIT);
for <> -> $_ { /$m/ }

That would certainly be a handy thing that would set Perl apart from the
pack of advanced regexp languages that don't support closures.

Some other things come to mind as well, but I'm not sure how evil they
are. For example:

sub decrypt($data is rw) {
    $data = rot13($data);
}

print "The secret message is: ", /^Encrypted: {decrypt .*}/,
      "\n";






Re: Regex and Matched Delimiters

2002-04-22 Thread Me

> Very nice (but, I assume you meant {$foo data})!

I didn't mean that (even if I should have).

Aiui, Mike's final suggestion was that parens end up
doing all the (ops data) tricks, and braces are used
purely to do code insertions. (I really liked that idea.)

So:

Perl 5        Perl6
(data)        ( data)
(?opsdata)    (ops data)
({})          {}


--
ralph




Re: Regex and Matched Delimiters

2002-04-22 Thread Aaron Sherman

On Mon, 2002-04-22 at 14:18, Me wrote:
> > Very nice (but, I assume you meant {$foo data})!
>
> I didn't mean that (even if I should have).
>
> Aiui, Mike's final suggestion was that parens end up
> doing all the (ops data) tricks, and braces are used
> purely to do code insertions. (I really liked that idea.)
>
> So:
>
> Perl 5        Perl6
> (data)        ( data)
> (?opsdata)    (ops data)
> ({})          {}

I don't like that particular way of looking at things, but either way my
comments about subroutines and closures still hold.





Re: Please rename 'but' to 'has'.

2002-04-22 Thread Larry Wall

Aaron Sherman writes:
: On Sun, 2002-04-21 at 10:59, Trey Harris wrote:
: 
: > 0 has true
: >
: > my first reaction would be, huh?  Since when?
: 
: Dare I say... now? ;-)
: 
: Sorry, someone had to say it.
: 
: Personally, even though it sucks up namespace, I think what we're seeing
: here is a need for several keywords that are synonyms. "but" and
: "now" seem to cover a good deal of ground.
: 
: 0 now true
: 
: Is misleading, IMHO, as 0 is not now true. 0, in this context, is an
: expression, and we're saying that that expression is now true. "but"
: conveys this much more clearly. However, as many have pointed out, there
: are a number of cases where "but" is equally misleading.
: 
: Is there any problem with allowing both "but" and "now"? It might even be
: elegant to use both at the same time:
: 
: $x now integer but true
: 
: which is clearer to my eye than
: 
: $x now integer now true
: 
: which seems to change the properties of $x twice without reconciling the
: changes with each other.
: 
: In any other language this would be unthinkable, but I think it fits
: nicely with Perl's philosophy. Not TMTOWTDI, which I think is often used
: to excuse the inexcusable, but the idea that Perl reflects the ways in
: which humans use language. We want to convey shades of meaning that do
: not translate directly to action.
: 
: So, have I just lost it, or would it make sense to have "now" and "but"?
: 
: Apologies to the person who started this thread. I know you thought
: "has" was ideal, and I understand why. It's just that between "but" and
: "now", I think you get more ground covered than you do with "has" and
: either one.

Perl 6 will try to avoid synonyms but make it easy to declare them.  At
worst it would be something like:

my sub operator:now ($a,$b) is inline { $a but $b }

Larry



Re: Regex and Matched Delimiters

2002-04-22 Thread Larry Wall

Me writes:
: > Very nice (but, I assume you meant {$foo data})!
: 
: I didn't mean that (even if I should have).
: 
: Aiui, Mike's final suggestion was that parens end up
: doing all the (ops data) tricks, and braces are used
: purely to do code insertions. (I really liked that idea.)
: 
: So:
: 
: Perl 5        Perl6
: (data)        ( data)
: (?opsdata)    (ops data)
: ({})          {}

Hmm.  Let me spill a few beans about where I'm going with A5.  I've
been thinking similar thoughts about the problem of overloading parens
so heavily in Perl 5, but I'm going in a slightly different direction
with it.  The basic principles for the new regexen are:

* Parens always capture.
* Braces are always closures.
* Square brackets are always character classes.
* Angle brackets are always metasyntax (along with backslash).

So a first whack at the differences might be:

Old             New
---             ---
//              /<prior>/  ???
?pat?           /<?f:pat>/  ???
/pat/i          m:i/pat/ or /<?i:pat>/ or even m<?i:pat> ???
/pat/x          /pat/
/^pat$/m        /^^pat$$/
/./s            /<any>/ or /<.>/ ???

\p{prop}        <+prop>  ???
\P{prop}        <-prop>  ???
space           <sp> (or \h for "horizontal"?)
{n,m}           <n,m>

\t              also <tab>
\n              also <lf> or <nl> (latter matching logical newline)
\r              also <cr>
\f              also <ff>
\a              also <bell>
\e              also <esc>
\033            same
\x1B            same
\x{263a}        \x<263a> ???
\c[             same
\N{name}        <name>
\l              same
\u              same
\Lstring\E      \L<string>
\Ustring\E      \U<string>
\E              gone
[\040\t]        \h  plus any Unicode horizontal whitespace
[\r\n\ck]       \v  plus any Unicode vertical whitespace

\b              same
\B              same
\A              ^
\Z              same?
\z              $
\G              <pos>, but assumed in nested patterns?

\1              $1

\Q$var\E        $var    always assumed literal, so $1 is literal backref
$var            <$var>  assumed to be regex
=~ $re          =~ /<$re>/   ouch?

(??{$rule})     <rule>
(?{ code })     { code } with failure semantics
(?#...)         {"..."} :-)
(?:...)         <:...>
(?=...)         <before: ...>
(?!...)         <!before: ...>
(?<=...)        <after: ...>
(?<!...)        <!after: ...>
(?>...)         <grab: ...>
(?(cond)t|f)    Not sure.  Could just use { if ... }
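
[Archive note, not part of the original message: for readers less familiar
with the Perl 5 side of the table, the sketch below exercises the four
lookaround forms and the (?>...) independent subexpression using only
existing Perl 5 syntax. The sample strings and variable names are invented
for illustration; the proposed Perl 6 spellings are not shown because they
were still undecided at this point in the thread.]

```perl
use strict;
use warnings;

# Hypothetical sample data, chosen only to exercise each construct.
my $s = "price: 42 USD, weight: 7 kg";

# (?=...)  positive lookahead: digits that are followed by " USD"
my ($usd) = $s =~ /(\d+)(?= USD)/;

# (?!...)  negative lookahead: digits NOT followed by another digit or " USD"
my ($not_usd) = $s =~ /(\d+)(?!\d| USD)/;

# (?<=...) positive lookbehind: digits that are preceded by "weight: "
my ($weight) = $s =~ /(?<=weight: )(\d+)/;

# (?<!...) negative lookbehind: a whole number NOT preceded by "price: "
my ($not_price) = $s =~ /(?<!price: )\b(\d+)/;

# (?>...)  independent subexpression: a+ keeps every "a" it grabbed and
# never gives one back, so the trailing "ab" cannot match.
my $plain  = ("aaab" =~ /a+ab/)     ? 1 : 0;
my $atomic = ("aaab" =~ /(?>a+)ab/) ? 1 : 0;

print "usd=$usd not_usd=$not_usd weight=$weight not_price=$not_price\n";
print "plain=$plain atomic=$atomic\n";
```

The last two lines show why (?>...) earns its own row in the table: the
plain pattern matches by backtracking, while the independent version
refuses to and fails.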

Obviously the <word> and <word:...> syntaxes will be user extensible.
We have to be able to support full grammars.  I consider it a feature
that <foo> looks like a non-terminal in standard BNF notation.  I do
not consider it a misfeature that <foo> resembles an HTML or XML tag,
since most of those languages need to be matched with a fancy rule
named <tag> anyway.

An interesting idea would be that if you say

m<foo: pat>

or

m{code}

it's as if you said

m/<foo: pat>/

or

m/{code}/

The latter is particularly interesting to me in that I can see uses for
patterns that are Perl code at the top level rather than regex
literal.  Any closure within a regular expression has full access to
the current state object for the match.  So most of the RFCs proposing
ad hoc mechanisms for saving submatches in various kinds of variables
can be handled with closures.

/(...)(...)(...) { @array = .all } /

or

/(...) { $first  = $+ }
 (...) { $second = $+ }
 (...) { $third  = $+ }/

or

/IF (COND) (BLOCK) { .node = [if,$1,$2] } /  # shades of yacc

or whatever.  Could have a $foo=... as syntactic sugar, perhaps.
But we need the general mechanism for building up parse trees of
arrays of hashes of arrays of arrays of hashes of arrays of hashes of...
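
[Archive note, not part of the original message: the closest thing Perl 5
already offers to the per-capture closures above is a (?{ ... }) code block
reading $^N, the text of the most recently closed group. The sketch below is
a Perl 5 analogue only, not the proposed Perl 6 semantics; the sample
sentence and the names $first and $second are assumptions made for the
example.]

```perl
use strict;
use warnings;

# Package variables, so the embedded code blocks can assign to them.
our ($first, $second);

# Hypothetical input text.
my $text = "The brown fox jumps over the lazy dog";

# Each (?{ ... }) block runs as soon as the group before it has closed;
# $^N holds what that group just matched.
if ($text =~ /the (\S+)(?{ $first = $^N }) (\S+)(?{ $second = $^N })/i) {
    print "first = $first, second = $second\n";
}
```

Unlike the closures sketched in the message, these blocks do not see a match
state object; they only see the usual match variables.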

I haven't decided yet whether matches embedded in the closure should
automatically pick up where the outer match is, or whether there should
be some explicit match op to mean that, much like \G only better.  I'm
thinking when the current topic is a match state, we automatically
continue where we left off, and require explicit =~ to start an unrelated
match.

I also haven't committed to any particular mechanism for defining a
set of related rules in a grammar.  Obviously it needs to be a good
enough mechanism to parse Perl and its variants, which means it
probably needs to be OO based, and you make new grammars by derivation
from the base grammar and overriding the rules you want to change.
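
[Archive note, not part of the original message: one way to picture
"derivation from the base grammar and overriding the rules you want to
change" with tools that exist today is ordinary Perl 5 method inheritance,
where each rule is a method returning a compiled pattern. The package names
BaseGrammar and HexGrammar and the rule names number, op and expr are all
hypothetical; this is a sketch of the idea, not the mechanism Perl 6 would
adopt.]

```perl
use strict;
use warnings;

package BaseGrammar;

sub new    { my ($class) = @_; return bless {}, $class }

# Each rule is a method, so a subclass can replace it.
sub number { return qr/\d+/ }
sub op     { return qr/[+\-]/ }

# expr is built from whatever number and op resolve to for this object.
sub expr {
    my ($self) = @_;
    my ($n, $o) = ($self->number(), $self->op());
    return qr/^$n(?:\s*$o\s*$n)*$/;
}

package HexGrammar;
our @ISA = ('BaseGrammar');

# Override a single rule; expr is inherited and picks up the change.
sub number { return qr/0x[0-9a-fA-F]+/ }

package main;

my $base = BaseGrammar->new();
my $hex  = HexGrammar->new();

my $base_dec = ("1 + 22 - 3" =~ $base->expr()) ? 1 : 0;
my $base_hex = ("0x1F + 0xa" =~ $base->expr()) ? 1 : 0;
my $hex_hex  = ("0x1F + 0xa" =~ $hex->expr())  ? 1 : 0;
my $hex_dec  = ("1 + 22 - 3" =~ $hex->expr())  ? 1 : 0;

print "base: dec=$base_dec hex=$base_hex\n";
print "hex:  dec=$hex_dec hex=$hex_hex\n";
```

The derived grammar changes only what a number is, and every rule built on
top of it follows along, which is the property the paragraph above asks of
the real mechanism.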

Sorry if this is a bit delirious--I'm fighting off some kind of
infection, and my nights have been shortchanged lately by the
neighborhood panhandler who doesn't seem to understand either
complicated concepts like "bedtime" or simple concepts like "no".

Re: Regex and Matched Delimiters

2002-04-22 Thread Luke Palmer

> (?=...)         <before: ...>
> (?!...)         <!before: ...>
> (?<=...)        <after: ...>
> (?<!...)        <!after: ...>
> (?>...)         <grab: ...>

Yummy :)
I'd say this is about perfect. The look(ahead|behind)s, er, 
<look:ahead|behind>s are used seldom enough that this is practical. And 
it's I<so> much clea[nr]er than that (?=...) crap. (Think I'm going 
overboard with this tregext?)

And are you going to reveal the method by which you define your own 
words, so we can overload it with personal ungrounded opinions? (On the 
other hand, it'd probably just stick and not move, because you said it.)

> Sorry if this is a bit delirious--I'm fighting off some kind of
> infection, and my nights have been shortchanged lately by the
> neighborhood panhandler who doesn't seem to understand either
> complicated concepts like "bedtime" or simple concepts like "no".

bed...what?


Luke




RE: Regex and Matched Delimiters

2002-04-22 Thread Brent Dax

Larry Wall:
# Me writes:
# : > Very nice (but, I assume you meant {$foo data})!
# : 
# : I didn't mean that (even if I should have).
# : 
# : Aiui, Mike's final suggestion was that parens end up
# : doing all the (ops data) tricks, and braces are used
# : purely to do code insertions. (I really liked that idea.)
# : 
# : So:
# : 
# : Perl 5        Perl6
# : (data)        ( data)
# : (?opsdata)    (ops data)
# : ({})          {}
# 
# Hmm.  Let me spill a few beans about where I'm going with A5. 
#  I've been thinking similar thoughts about the problem of 
# overloading parens so heavily in Perl 5, but I'm going in a 
# slightly different direction with it.  The basic principles 
# for the new regexen are:
# 
# * Parens always capture.
# * Braces are always closures.
# * Square brackets are always character classes.
# * Angle brackets are always metasyntax (along with backslash).
# 
# So a first whack at the differences might be:
# 
# Old             New
# ---             ---
# //              /<prior>/  ???
# ?pat?           /<?f:pat>/  ???
# /pat/i          m:i/pat/ or /<?i:pat>/ or even m<?i:pat> ???

Whoa, those are moving to the front?!?

# /pat/x          /pat/
# /^pat$/m        /^^pat$$/

That's...odd.  Is $$ (the variable) going away?

# /./s            /<any>/ or /<.>/ ???

I think that . is too common a metacharacter to be relegated to this.

# \p{prop}        <+prop>  ???
# \P{prop}        <-prop>  ???

Intriguing.

# space           <sp> (or \h for "horizontal"?)

Same thinking as '.'.

# {n,m}           <n,m>

Ah, OK.

# \t              also <tab>
# \n              also <lf> or <nl> (latter matching logical newline)
# \r              also <cr>
# \f              also <ff>
# \a              also <bell>
# \e              also <esc>

I can tell you right now that these are going to screw people up.
They'll try to use these in normal strings and be confused when it
doesn't work.  And you probably won't be able to emit a warning,
considering how much CGI Perl munches.

# \033            same
# \x1B            same
# \x{263a}        \x<263a> ???

Why?  Wouldn't we want the same thing to work in quoted strings?  (Or
are those changing syntaxes too?)

# \c[             same
# \N{name}        <name>
# \l              same
# \u              same
# \Lstring\E      \L<string>
# \Ustring\E      \U<string>

So that's changed from whenever you talked about \q{} ?

# \E              gone
# [\040\t]        \h  plus any Unicode horizontal whitespace
# [\r\n\ck]       \v  plus any Unicode vertical whitespace
# 
# \b              same
# \B              same
# \A              ^
# \Z              same?
# \z              $

Are you sure that optimizes for the common case?

# \G              <pos>, but assumed in nested patterns?
#  
# \1              $1
# 
# \Q$var\E        $var    always assumed literal, so $1 is literal backref

So these are reinterpolated every time you backtrack?  Are you *trying*
to destroy regex performance?  :^)

# $var            <$var>  assumed to be regex

What if $var is a qr//ed object?

# =~ $re          =~ /<$re>/   ouch?

I don't see the win.

# (??{$rule})     <rule>
# (?{ code })     { code } with failure semantics
# (?#...)         {"..."} :-)
# (?:...)         <:...>
# (?=...)         <before: ...>
# (?!...)         <!before: ...>
# (?<=...)        <after: ...>
# (?<!...)        <!after: ...>

Cute.  (Wait a minute, aren't those reversed?)

# (?>...)         <grab: ...>
# (?(cond)t|f)    Not sure.  Could just use { if ... }

<if(cond):true|false>?

# Obviously the <word> and <word:...> syntaxes will be user 
# extensible. We have to be able to support full grammars.  I 
# consider it a feature that <foo> looks like a non-terminal in 
# standard BNF notation.  I do not consider it a misfeature 
# that <foo> resembles an HTML or XML tag, since most of those 
# languages need to be matched with a fancy rule named <tag> anyway.

But that *does* make it harder to define the fancy rules.  I could see
someone defining rules like:

'gt' => qr/\>/,
'lt' => qr/\</

just to get around backslashing everything in sight.

# An interesting idea would be that if you say
# 
# m<foo: pat>
# 
# or
# 
# m{code}
# 
# it's as if you said
# 
# m/<foo: pat>/
# 
# or
# 
# m/{code}/

I don't know about that one.  I often use {} as delimiters on regexen
because it's a character that doesn't occur in data very often.  I think
the gain of two characters isn't as critical as the loss of