Re: A5: hypotheticals outside regexen

2002-06-06 Thread Damian Conway

 Page 13 tells us about C<let> decls. But it also says that the topic must
 be a regex. Whilst it explains that this isn't really a problem, I'm not
 sure that it justifies it. So perhaps someone can clarify why this
 (hypothetical) code is not a reasonable generalization:

Because Perl code doesn't backtrack (except within regexes).
Exceptions and backtracking are quite different.

If you want hypothetical *code*, put it in a regex.

Damian



Re: A5: hypotheticals outside regexen

2002-06-06 Thread Damian Conway

 You have I<no> idea how often that would have been useful.  It's a great
 exception safety mechanism... like C++'s "resource acquisition is
 initialization" thingy, but without having to write a class for every
 variable.

Have you already forgotten KEEP and UNDO (that we introduced in A4/E4):

    our $foo = 0;

    sub do_something
    {
        KEEP { $foo = $foo + 1 }
        commit();
    }

    sub commit
    {
        fail if rand < 0.3;
    }

    for 1..10
    {
        try { do_something() }
    }

    print "$foo\n"; # expect a value of around 7
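
For readers who want to try the idea today, here is a rough Python analogue of the example above. It is only a sketch: the function name run_trials and the use of RuntimeError are assumptions made for illustration, and the try/else pair merely stands in for a KEEP block (code that runs only when the enclosing block exits successfully).

```python
import random


def run_trials(n, fail_prob, rng):
    """Return how many of n attempts completed without failing."""
    foo = 0
    for _ in range(n):
        try:
            # Stand-in for commit(): fail with probability fail_prob.
            if rng.random() < fail_prob:
                raise RuntimeError("commit failed")
        except RuntimeError:
            # The enclosing try swallows the failure, as in the loop above.
            continue
        else:
            # KEEP-like step: reached only when the body exited normally.
            foo += 1
    return foo


if __name__ == "__main__":
    # With a 0.3 failure rate, ten attempts should keep about seven increments.
    print(run_trials(10, 0.3, random.Random()))
```

The increment sits in the else branch rather than in the try body so that it can never run on a failed attempt, which is the property the KEEP block is being used for.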


;-)

Damian



Re: A5: a few simple questions

2002-06-06 Thread Damian Conway

David Whipp wrote:
 
 First, a slight clarification: if I say:
 
   m:w/ %foo := [ (\w+) = (\w+) [ , (\w+) ]* ] /
 
 does this give me a hash of arrays? (i.e. is the rhs of a hash processed as
 a scalar context)

That's an error. The grouping bound to a hypothetical hash has to have
either exactly one or exactly two captures in it. To get what you want 
you'd need something like:

rule wordlist { (\w+) [ , (\w+) ]* }
m:w/ %foo := [ (\w+) = (<wordlist>) ] /

or just:

m:w/ %foo := [ (\w+) = ({ /(\w+) [ , (\w+) ]*/ }) ] /
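
As a point of comparison, the "hash of lists" result that the question is after can be sketched in Python with the standard re module. This is an illustration under assumptions, not a rendering of the Perl 6 semantics: the input format ("key = value, value, ...") and the names ENTRY and parse_entries are made up for the example.

```python
import re

# One entry: a key, an equals sign, then a comma-separated word list.
ENTRY = re.compile(r"(\w+)\s*=\s*(\w+(?:\s*,\s*\w+)*)")


def parse_entries(text):
    """Map each key to the list of words that follow its '=' sign."""
    result = {}
    for key, values in ENTRY.findall(text):
        # The whole word list is captured as one string, then split,
        # mirroring the two-capture (key, wordlist) shape discussed above.
        result[key] = re.split(r"\s*,\s*", values)
    return result


if __name__ == "__main__":
    print(parse_entries("colors = red, green, blue\nsizes = small, large"))
```

The pattern keeps exactly two captures per entry, the key and the whole word list, which is the same restriction described above for a grouping bound to a hypothetical hash.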



 When I look at this, I see a common pattern: the join/split concept. It
 feels like there should be a standard assertion:

These are good ideas for assertions. If they don't become standard, it will
certainly be possible to write a module that makes them available.


 And a question about <m,n> (I think something similar came up a few weeks
 ago): why isn't it <m..n>, i.e. a list of the numbers of matches allowed.
 This seems to be the only place in perl6 where a list of numbers, as a
 range, isn't constructed using the .. operator.

Because an <m,n> isn't a list of numbers. It's the lower and upper bounds on a
repetition count.

Damian



Re: A5: making a production out of REs

2002-06-06 Thread Damian Conway

Rich Morin wrote:

 I'd like to be able to use REs to generate lists of strings.  For
 example, it might be nice to create a loop such as:
 
for $i (sort(p:p5|[0-9A-F]{2}|)) {  # p operator for production?
 
 and have $i walk from '00' through 'FF'.  Or whatever.

You mean:

$ch = any(0..9,'A'..'F');
for sort egs $ch _ $ch  = $i {
...
}

where C<egs> is the (hypothetical) eigenstate operator on
(hypothetical) superpositions?

Even if Larry decides against superpositions, there will definitely be some
kind of non-quantum iterator syntax that supports these kinds of permuted
sequences.
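
For what it's worth, the "non-quantum iterator" version of Rich's request is easy to sketch in a language that already has a Cartesian-product iterator. The Python below is only an illustration of the permuted sequence being asked for; HEX_DIGITS and hex_pairs are names invented for the example.

```python
from itertools import product

HEX_DIGITS = "0123456789ABCDEF"


def hex_pairs():
    """Return every two-character string over the hex digits, in order."""
    return ["".join(pair) for pair in product(HEX_DIGITS, repeat=2)]


if __name__ == "__main__":
    for i in hex_pairs():
        print(i)
```

Because the digits are listed in ascending character order, product() already yields the strings sorted, so no separate sort step is needed here.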

Damian



Re: 6PAN (was: Half measures all round)

2002-06-06 Thread Josh Wilmes



For the record, you will hear no disagreement from me.  I recognize that 
this is a HARD problem.  Nonetheless, I think it's an important one, and 
solving it (even imperfectly, by only supporting well-defined platforms)
would be a major coup.

--Josh

At 23:31 on 06/05/2002 BST, Nicholas Clark [EMAIL PROTECTED] wrote:

 On Wed, Jun 05, 2002 at 12:55:36AM -0400, Josh Wilmes wrote:
  
  Good stuff.  Sounds halfway between CPAN.pm and activestate's ppm.  See 
  also debian's apt-get.
  
  Which brings me to my pet peeve-  I think it's time to start doing binary 
  packaging in CPAN, for those who don't want to bother with compilation.
  
  That has interesting implications for how we deal with paths, but still, I 
  think it's worthwhile.
  
  Of course you would want to support source as well, but having binary 
  available for those who want it just seems like a darn good idea.
 
 OK. Say I want binaries for my 3 boxes:
 
 On Bagpuss /usr/local/bin/perl -v says:
 
 This is perl, v5.8.0 built for armv4l-linux
 (with 1 registered patch, see perl -V for more detail)
 
 but you had better actually build that with -v3 flags on your ARM compiler
 because my machine's hardware can't cope with the v4 instructions on the CPU
 
 On Thinking-Cap /usr/local/bin/perl -v says:
 
 This is perl, version 5.004_05 built for i386-freebsd
 
 Copyright 1987-1998, Larry Wall
 
 5.004 is officially still supported, and some modules do build on 5.004
 
 [Third box, Marvellous-Mechanical-Mouse-Organ is an SGI Indy and
 doesn't want to power up for some reason, probably because it's been off
 for about 12 months]
 
 I presume you're going to suggest that they are too obscure for binary CPAN
 to support them. So limit things to the most recent perl. But having
 experimented with trying to ship 5.8.0-RC1 between FreeBSD versions, there
 are sufficient changes between libc on 4.4 STABLE and 4.5 STABLE such that
 you can't run a binary compiled on 4.5 on a 4.4 box due to missing symbols.
 So you're starting to enter version compatibility nightmare.
 
 And if you have module needing a C++ compiler, are you going to ship your
 x86 linux binaries using RedHat's 2.96, or a real gcc?
 
 And are you doing dependencies, or are you interfacing with the OS package
 manager? And if you're not interfacing, but you are adding modules to the
 OS perl, then what do you do if one of your dependency modules is already
 there? Do you just go "oh good", have binary CPAN say nothing, and then
 hope that the OS packaging system doesn't remove the dependency module from
 under you?
 
 I believe that binary CPAN would have problems that scale as the number
 of OS subversions that binary CPAN would try to support.
 
 This may sound rather negative, but it basically means that I'm feeling
 sufficiently pessimistic that I don't think there are reasonable solutions
 to the problems. However, that's only my opinion, and others' will differ.
 
 On the other hand, I think the idea of multiple platforms automatic CPAN
 testing is a very good idea.
 
 Nicholas Clark
 -- 
 Even better than the real thing:  http://nms-cgi.sourceforge.net/





Re: A5: making a production out of REs

2002-06-06 Thread Rich Morin

At 6:10 PM +1000 6/6/02, Damian Conway wrote:
  Rich sez:
 But make Damian use "es", rather than "egs" for the
 eigenstate (is :-) operator.

s/is/it/, above (blush).  That is, the superposition _could_ be in
any of several states, but the eigenstate tells us what it really is.

No, no, no! "any" and "all" are three letters, so the eigenstate operator has
to be as well. And since the eigenstates are *examples* of the possible states
of a superposition, "egs" is entirely appropriate! ;-)

Well, neither "es" nor "egs" is a word, at least in Scrabble (though this is an
egs Scrabble argument).  While we're on the subject, however, make sure that
you warn Unicode users against putting an umlaut on the "a" in "all" or "any",
as you can't have an umlaut without ...

We now return to the (ahem) serious topics of the list.

-r
-- 
email: [EMAIL PROTECTED]; phone: +1 650-873-7841
http://www.cfcl.com/rdm- my home page, resume, etc.
http://www.cfcl.com/Meta   - The FreeBSD Browser, Meta Project, etc.
http://www.ptf.com/dossier - Prime Time Freeware's DOSSIER series
http://www.ptf.com/tdc - Prime Time Freeware's Darwin Collection



Re: A5: a few simple questions

2002-06-06 Thread John Siracusa

On 6/6/02 2:43 AM, Damian Conway wrote:
   rule wordlist { (\w+) [ , (\w+) ]* }

No semicolon at the end of that line?  I've already forgotten the new
rules for that type of thing... :)

-John




Re: A5: a few simple questions

2002-06-06 Thread Allison Randal

On Thu, Jun 06, 2002 at 10:38:39AM -0400, John Siracusa wrote:
 On 6/6/02 2:43 AM, Damian Conway wrote:
rule wordlist { (\w+) [ , (\w+) ]* }
 
 No semicolon at the end of that line?  I've already forgotten the new
 rules for that type of thing... :)

No, because rules are basically methods, just like grammars are
basically classes. You would only need a semi-colon if you were defining
an anonymous C<rule> (similar to an anonymous C<sub>):

my $wordlist = rule { (\w+) [ , (\w+) ]* };

Allison



A5: Is this right?

2002-06-06 Thread Brent Dax

#Preliminary Perl6::Regex
#  This does not have any actions, but otherwise I think is correct.
#  Let me know if it's right or not.

use 6;

grammar Perl6::Regex {
  rule metachar { <[{(\[\])}:*+?\\|]> }

  rule ws       { [<[\h\v]>|\#\N*]* }

  rule atom     { <ws> (<!metachar> | \\ . | <group>) <ws> }

  rule modifier { <ws> (<[*+?]> \?? \:?) <ws> }

  rule molecule {
    (  <atom> <modifier>
    |  <ws> \:<1,4> <ws>
    |  <compound> <ws> \| <ws> <compound>
    )
  }

  rule compound { [(<molecule>)]* }

  rule group    { <ws>
    (  \( <compound> \)
    |  \[ <compound> \]
    |  \{ <Perl6::Code> \}
    |  \< !? [ \w+ | \d+ , \d+ ] <compound> \>
    )
    <ws>
  }
}

--Brent Dax [EMAIL PROTECTED]
@roles=map {Parrot $_} qw(embedding regexen Configure)

Early in the series, Patrick Stewart came up to us and asked how warp
drive worked.  We explained some of the hypothetical principles . . .
Nonsense, Patrick declared.  All you have to do is say, 'Engage.'
--Star Trek: The Next Generation Technical Manual




Re: A5: Is this right?

2002-06-06 Thread Buddha Buck

At 11:31 AM 06-06-2002 -0700, Brent Dax wrote:
#Preliminary Perl6::Regex
#  This does not have any actions, but otherwise I think is correct.
#  Let me know if it's right or not.

I'm not a regex guru, but...


use 6;

grammar Perl6::Regex {
   rule metachar { <[{(\[\])}:*+?\\|]> }

   rule ws       { [<[\h\v]>|\#\N*]* }

   rule atom     { <ws> (<!metachar> | \\ . | <group>) <ws> }

I had gotten the impression that a literal string separated by whitespace 
was an atom, so

rule foofoobar { foo <1,2> bar }

would match 'foobar' or 'foofoobar'.  If so, I think <!metachar> needs to 
be replaced by <!metachar>+

   rule modifier { <ws> (<[*+?]> \?? \:?) <ws> }

   rule molecule {
     (  <atom> <modifier>

<atom> ends with <ws>, <modifier> begins with <ws>.  Does that mean that 
there must be two <ws> between an <atom> and a <modifier>?  (Possibly not, 
since <ws> can match null, so 'a*' would match <ws> with four nulls).  Just 
clarifying for myself.

     |  <ws> \:<1,4> <ws>
     |  <compound> <ws> \| <ws> <compound>
     )
   }

   rule compound { [(<molecule>)]* }

   rule group    { <ws>
     (  \( <compound> \)
     |  \[ <compound> \]
     |  \{ <Perl6::Code> \}
     |  \< !? [ \w+ | \d+ , \d+ ] <compound> \>
     )
     <ws>
   }
}





Re: A5: Is this right?

2002-06-06 Thread Larry Wall

On Thu, 6 Jun 2002, Buddha Buck wrote:

 At 11:31 AM 06-06-2002 -0700, Brent Dax wrote:
 I had gotten the impression that a literal string separated by whitespace 
 was an atom, so
 
 rule foofoobar { foo <1,2> bar }
 
 would match 'foobar' or 'foofoobar'.  If so, I think <!metachar> needs to 
 be replaced by <!metachar>+

Nope, still gotta use [foo] if you want an atom larger than a character
(whatever a character is...)

Larry




Apoc5 comments/questions

2002-06-06 Thread Jonathan Scott Duff


Whew! I've carefully (well, I tried to be careful :-) read through
Apocalypse 5 twice now and it still makes my head hurt (but in a good
way). What follows is some notes that I jotted down and am tired of
looking at.  Please correct any misconceptions and feel free to add
where I've omitted.


Here's a quick table of the built-in modifiers that I saw and/or
surmised. Are there any others? (entries with ? are guesses or unknown
on my part)

long form    short form  meaning
:any         :a          match returns a list of anywhere the pattern
                         matches within the string regardless of overlap.
:each        :e          Apply the pattern each time we can
                         within the string? Is this what happened
                         to perl5's /g modifier?
:once        :o          Match succeeds exactly once (unless .reset)
:words       :w          Perform a word match treating
                         whitespace between patterns as if it
                         were \s+
:cont        :c          Continue from where the last match left off
:ignorecase  :i          Match alphabetics case insensitively
:perl5?      :p5         Match using perl 5 rules
:unicode0?   :u0         dot matches bytes
:unicode1?   :u1         dot matches code points
:unicode2?   :u2         dot matches graphemes
:unicode3?   :u3         what dot matches is language dependent
:?           :1st        succeed on the first match
:?           :2nd        succeed on the second match
:?           :3rd        succeed on the third match
:?           :4th        succeed on the fourth match
This pattern continues for positive integers (i.e. :53rd succeeds on the
fifty-third match). It'd be simpler IMHO if, instead of the "st", "nd",
"rd", and "th" suffixes, it were an "n" suffix, e.g., :53n would succeed
on the fifty-third match.

:1time?      :1x         match exactly one time
:2times?     :2x         match two times
:3times?     :3x         match three times

This pattern continues for all positive integers (i.e. :23x matches 23
times) Is the x necessary? In a later example s:3/// is used to
perform the s/// 3 times.

Can I use 0 in the above?  Will :0 never match?  Is there a way to
interpolate the number?  Does :$number work?

The text says:

A modifier that starts with a number causes the pattern to match
that many times. It may only be used outside the regex.

Why only outside the RE?  Why wouldn't /:3x foo/ be synonymous with
/foo<3>/?

And here's a table of built-in assertions; are there any others?

assertion       meaning
<alpha>         matches any alphabetic character
<digit>         matches any numeric character
<sp>            matches a space character
<prior>         match whatever the most recently successful match did
<null>          match nothing
<commit>        fails the match if backtracked to
<cut>           fails the match if backtracked to and removes the
                portion of the string that matched to that point
<before ...>    match if the pattern occurs before ...
<after ...>     match if the pattern occurs after ...

The example at the top of "Backslash Reform" ...

    $oldpos = pos $string;
    $string =~ m/... <( .pos == $oldpos )> .../;

Shouldn't that first line be something like 

$oldpos = $matchobj.pos;# or ...
$oldpos = pos $matchobj;# or just ...
$oldpos = pos;  # uses the most recently seen 
# match object
 
?

End of random ramblings ...

-Scott
-- 
Jonathan Scott Duff
[EMAIL PROTECTED]



Re: A5: a few simple questions

2002-06-06 Thread Piers Cawley

Allison Randal [EMAIL PROTECTED] writes:

 On Thu, Jun 06, 2002 at 10:38:39AM -0400, John Siracusa wrote:
 On 6/6/02 2:43 AM, Damian Conway wrote:
rule wordlist { (\w+) [ , (\w+) ]* }
 
 No semicolon at the end of that line?  I've already forgotten the new
 rules for that type of thing... :)

 No, because rules are basically methods, just like grammars are
 basically classes. You would only need a semi-colon if you were defining
 an anonymous C<rule> (similar to an anonymous C<sub>):

   my $wordlist = rule { (\w+) [ , (\w+) ]* };

You wouldn't even need it then. Assuming you're following the closing
brace with nothing but white space and a newline.

-- 
Piers

   It is a truth universally acknowledged that a language in
possession of a rich syntax must be in need of a rewrite.
 -- Jane Austen?




Re: A5: a few simple questions

2002-06-06 Thread Allison Randal

On Thu, Jun 06, 2002 at 08:21:25PM +0100, Piers Cawley wrote:
 Allison Randal [EMAIL PROTECTED] writes:
 
  No, because rules are basically methods, just like grammars are
  basically classes. You would only need a semi-colon if you were defining
  an anonymous C<rule> (similar to an anonymous C<sub>):
 
  my $wordlist = rule { (\w+) [ , (\w+) ]* };
 
 You wouldn't even need it then. Assuming you're following the closing
 brace with nothing but white space and a newline.

I guess you're talking about the bit of A4 to do with "When do I put a
semicolon after a curly?". But that is if the final curly is on a line
by itself. So you could get away with:

my $wordlist = rule { 
(\w+) [ , (\w+) ]* 
}

Allison



RFC261 in Perl 5 and where it needs Perl 6 support

2002-06-06 Thread Aaron Sherman


Larry discounted RFC261 in A5, but I think there's some good in it. The
biggest problem is not that it's hard to do in Perl6, but that 80-90% of
it is ALREADY done in Perl5! Once you peel away that portion of the RFC,
you get to Perl5's limitations and what Perl6 might do to support these
things.

NOTE: My examples are unchecked, so they should be considered
pseudo-code.

# RFC261
match ($a) = foo;
# Perl5
($a) = foo;

#RFC261
match { 'Joe' => ? } = $h or die "Hash does not contain Joe";
#Perl5
scalar(grep { $_ eq 'Joe' } keys %$h) or die ...

# This one states its own solution
# Equiv to scalar(grep { $_ == 1 } @list)
  match (..., 1, ...) = @list;

# No idea what this is supposed to do. I think $_[$_] is meant to work
# in a way it doesn't ($_ will be a value, not an index).
# Pretty close to ($idx) = grep { $_[$_] == 1 } @_; $b = $_[$idx+1];
  match (..., 1, $b) = @_;
# However, I want to suggest that Perl6 closures on grep and map should
# be considered anonymous methods on the implied loop itself, so that we
# can call methods like .index or .prev or .next (look ahead/behind), etc.
# This gets sticky in some situations, but it would be invaluable, so it's
# worth the effort, IMHO.

# RFC
# It gets worse! This gives the value associated with a key matching the
  # regular expression a*b:
  match { /a*b/ => $value } = \%h;
#Perl5
# The RFC does not account for multiple matches
# (presumably we just take the first)
($value) = map {$h{$_}} grep {/a*b/} keys %h;

# RFC
# And if you want to know what the key was:
  match { $key = /a*b/ => $value } = \%h;
# Perl5
($key,$value) = map {($_,$h{$_})} grep {/a*b/} keys %h;

# RFC
# What if you want to grab out the index? This is like
  # ($i) = grep { $list[$_] =~ /foo/ } 0..$#list
  match ( $i = /foo/ ) = @list;
# Perl5
# As suggested above
# Perl6
# See my previous comment on closures of this type
grep {/foo/ && $i = .index} @list;
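
For comparison only, the two lookups above (key/value pairs whose key matches a pattern, and the indices of matching list elements) can be sketched with ordinary comprehensions in Python. The sample data and the names matching_pairs and matching_indices are assumptions for illustration; enumerate() plays the role of the hypothetical .index method by supplying the position alongside each element.

```python
import re


def matching_pairs(table, pattern):
    """Return (key, value) pairs whose key matches the pattern."""
    return [(k, v) for k, v in table.items() if re.search(pattern, k)]


def matching_indices(items, pattern):
    """Return the positions of the elements that match the pattern."""
    return [i for i, item in enumerate(items) if re.search(pattern, item)]


if __name__ == "__main__":
    h = {"aab": 1, "xyz": 2, "b": 3}
    words = ["bar", "foo", "food"]
    print(matching_pairs(h, r"a*b"))
    print(matching_indices(words, r"foo"))
```

Both helpers return every match rather than just the first, which sidesteps the multiple-match question noted in the comments above; taking element [0] of the result gives the "first match" behaviour.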


Sorry, it's time for me to go meet someone, but I think the rest of the
RFC just gets into two cases. One involves some funky OO stuff that I'm
not going to touch on here. The other is the idea of matching sub-lists,
which would be the place for some methods inside of grep and map that
support look ahead and behind.

I hope others will take up this line of thought. Improved list
manipulation can only be a good thing, IMHO.





Re: A5: Is this right?

2002-06-06 Thread Damian Conway

Brent Dax wrote:

 grammar Perl6::Regex {
   rule metachar { <[{(\[\])}:*+?\\|]> }
 
   rule ws       { [<[\h\v]>|\#\N*]* }

Or just:

    rule ws       { [\s|\#\N*]*  }


   rule atom     { <ws> (<!metachar> | \\ . | <group>) <ws> }
 
   rule modifier { <ws> (<[*+?]> \?? \:?) <ws>  }

    rule modifier { <ws> ([<[*+?]>|<reprange>] \?? \:?) <ws> }

    rule reprange { \< [ <bound> [, <bound>?]? | , <bound> ] \> }

    rule bound    { \d+ | <Perl::scalar> }


There are also bits missing from the rest of the grammar (e.g. named captures).
I'll be showing a full regex grammar in E5.

Damian



Re: A5: Is this right?

2002-06-06 Thread Larry Wall

On Fri, 7 Jun 2002, Damian Conway wrote:

 Brent Dax wrote:
 
  grammar Perl6::Regex {
    rule metachar { <[{(\[\])}:*+?\\|]> }
  
    rule ws       { [<[\h\v]>|\#\N*]* }
 
 Or just:
 
 rule ws   { [\s|\#\N*]*  }

Just as a practical matter, given that you tend to have runs of
whitespace,

rule ws   { [ \s+ | \#\N* ]*   }

will probably run faster.   At least, that would certainly run
faster with Perl 5's engine.  Can't speak for Perl 6's, of course.
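
A quick way to convince yourself that the two spellings of the whitespace rule accept the same input is to translate them into an existing regex dialect and compare. The Python sketch below does only that; it makes no claim about speed, and the names WS_SINGLE, WS_RUNS and skipped are invented for the example. \N is written as [^\n] since Python's re has no \N escape of that meaning.

```python
import re

# [ \s | \#\N* ]*  : one whitespace character per trip round the loop
WS_SINGLE = re.compile(r"(?:\s|#[^\n]*)*")

# [ \s+ | \#\N* ]* : a whole run of whitespace per trip round the loop
WS_RUNS = re.compile(r"(?:\s+|#[^\n]*)*")


def skipped(pattern, text):
    """Return how many leading characters the pattern consumes."""
    return pattern.match(text).end()


if __name__ == "__main__":
    sample = "   # a comment\n\t  # another\n  code"
    print(skipped(WS_SINGLE, sample), skipped(WS_RUNS, sample))
```

Both patterns stop at the first character that is neither whitespace nor part of a comment; the only difference is how many loop iterations it takes to get there.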

As a different kind of practical matter, if we put spaces around
our square brackets and vertical bars, it won't look so much like
a character class.  I know we're all from the old school, but we
should therefore be even more alert against excessive regex
compaction.

Larry




Apoc 5 questions/comments

2002-06-06 Thread Dave Storrs

Well, A5 definitely has my head spinning.  The new features seem amazingly
powerful...it almost feels like we're going to have two equally powerful,
equally complex languages living side-by-side:  one of them is called
Perl and the other one is called Regexes.  Although they may talk to
one another, I really did come away feeling like they were completely
separate animals.

I admit I'm a bit nervous about that...so far, I'm completely sold on
(basically) all the new features and changes in Perl 6, and I'm eagerly
anticipating working with them.  But this level of change...I don't know.
I've spent a lot of time getting to be (reasonably) good at Perl regular
expressions, and I don't like the thought of throwing out all or most of
that effort.  Somehow, this feels like we're trying to roll all of Prolog
into Perl, and I'm not sure I personally want to go there (note the
"personally"...YMMV).

For now, I'm just going to defer worrying about it until I see Exegesis 5,
since past experience has shown me that there is a good chance that all my
fears will be shown to be groundless once concrete examples are being
demonstrated.


In any case, I do have some specific questions:

-

Page 8:
s:3x:3rd /foo/bar/
That changes the 3rd, 6th, and 9th occurrences.

Just to verify, this:

s:3rd /foo3/bar/

would do the 3rd, 4th, and 5th, correct?
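
To make the quoted A5 behaviour concrete (s:3x:3rd changes the 3rd, 6th, and 9th occurrences), here is a small Python sketch of "replace every Nth match, at most K times". It illustrates that one reading only and does not settle the question above; sub_every_nth and its parameter names are made up for the example.

```python
import re


def sub_every_nth(pattern, repl, text, nth, times):
    """Replace every nth match of pattern, stopping after `times` changes."""
    seen = 0   # matches encountered so far
    done = 0   # replacements performed so far

    def replace(match):
        nonlocal seen, done
        seen += 1
        if seen % nth == 0 and done < times:
            done += 1
            return repl
        # Leave every other match exactly as it was.
        return match.group(0)

    return re.sub(pattern, replace, text)


if __name__ == "__main__":
    print(sub_every_nth(r"foo", "bar", "foo " * 10, nth=3, times=3))
```

Passing a function to re.sub keeps the counting in one place; the text itself is scanned only once.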

-

Page 8:

The u1-u3 mods all say level 1 support.  I assume this was a typo, and
they should go (u1 = 'level 1', u2 = 'level 2', u3 = 'level 3').

-

Can modifiers abut the delimiter?

s:3x /foo/bar    # most (all?) examples looked like this
s:3x/foo/bar     # is this legal?

-

Can we please have a 'reverse x' modifier that means treat whitespace as
literals?  Yes, we are living in a Unicode world now and your data could
theoretically be coming in from a different character set than expected.
But there are times when it won't...when (for example), you wrote the data
out yourself, or you're operating on files that are generated and
maintained purely in-house, so they are guaranteed to be in the same
character set as the Perl source code you're writing.  I understand the
arguments for the way the defaults are set.  I even agree with them.  But
you will NEVER convince me that the first example below is not easier to
read than any of the alternatives:

/FATAL ERROR\:    Process (\d+) received signal\: (\d+)/
/FATAL ERROR\:\ \ \ \ Process\ (\d+)\ received\ signal\:\ (\d+)/
/FATAL ERROR\: \h+ Process \h+ (\d+) \h+ received \h+ signal\: \h+ (\d+)/
/FATAL ERROR\: \s+ Process \s+ (\d+) \s+ received \s+ signal\: \s+ (\d+)/

(Yes, I know that the last one matches vertical whitespace and
therefore means something slightly different than the others.)

If this means that we need to store a byte or two to remember what
character set the originally-read-in code was in before being converted to
UTF-8 (or whatever we're using internally), so that we know what character
set to assume literal ws refers to...well, that seems like a small
price to pay for a lot of convenience.

-

Page 9:
my $foo = ?/.../;  # boolean context, return whether matched,
my $foo = +/.../;  # numeric context, return count of matches
my $foo = _/.../;  # string context, return captured/matched string

This 'initial character to force evaluation' rule initially seemed
annoying, but the more I think about it, the more I like it; one
character isn't much to type, and it makes it extremely clear why you're
doing the match (i.e., what you're trying to get back).  Kudos to our
Fearless Language Designer!

-

I am a little unclear on what the difference is between these two:
my @foo = $rx;
my @foo = m/$rx/;

If I understand correctly, it works like this:

my @stuff;
$_ = "foofoofoo";
$rx = /:each foo/;

for (0..2) { @stuff = $rx }
# above line is equivalent to following 3 lines:
@stuff = ('foo', 'foo', 'foo');
@stuff = ();
@stuff = ();

for (0..2) { @stuff = m/$rx/ }
# above line is equivalent to following 3 lines:
@stuff = ('foo', 'foo', 'foo');
@stuff = ('foo', 'foo', 'foo');
@stuff = ('foo', 'foo', 'foo');

Is that correct?

-

Page 10:

You could also use the {'...'} construct for comments, but then
you risk warnings about useless use of a string in void context.

Could we automagically turn off that warning inside such constructs, when
the only thing there was a string?  (Perhaps there could be a switch
that prevented it from being turned off, if people really wanted to
see it; if so, make it be OFF by default, so it needs to be enabled,
much like 'use strict.')

-

Page 11:

/ pattern ::: { code() or fail } /  # fails entire rule

Farther down:

A pattern nested within a closure is classified as its own rule,
however, so it never gets the chance to pass out of a {...}
closure.

If I understand