Re: A5: hypotheticals outside regexen
Page 13 tells us about C<let> decls. But it also says that the topic must be a regex. Whilst it explains that this isn't really a problem, I'm not sure that it justifies it. So perhaps someone can clarify why this (hypothetical) code is not a reasonable generalization:

Because Perl code doesn't backtrack (except within regexes). Exceptions and backtracking are quite different. If you want hypothetical *code*, put it in a regex.

Damian
Re: A5: hypotheticals outside regexen
You have I<no> idea how often that would have been useful. It's a great exception safety mechanism... like C++'s "resource acquisition is initialization" thingy, but without having to write a class for every variable.

Have you already forgotten KEEP and UNDO (that we introduced in A4/E4):

    our $foo = 0;

    sub do_something {
        KEEP { $foo = $foo + 1 }
        commit();
    }

    sub commit {
        fail if rand < 0.3;
    }

    for 1..10 { try { do_something() } }

    print "$foo\n";    # expect a value of around 7 ;-)

Damian
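For anyone who wants to poke at the KEEP idea today, here is a rough Python analogue. It is only a sketch under stated assumptions: `Keep`, `bump`, `commit`, `do_something` and `run` are hypothetical names made up for illustration, a Python exception stands in for `fail`, and nothing here models hypothetical variables or backtracking.

```python
import random


class Keep:
    """Run `action` only if the guarded block exits without an exception."""

    def __init__(self, action):
        self.action = action

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        if exc_type is None:      # block succeeded, so keep the change
            self.action()
        return False              # never swallow the exception


counter = {"foo": 0}


def bump():
    counter["foo"] += 1


def commit(rng, fail_probability):
    # Stand-in for `fail if rand < 0.3`
    if rng.random() < fail_probability:
        raise RuntimeError("commit failed")


def do_something(rng, fail_probability):
    with Keep(bump):
        commit(rng, fail_probability)


def run(trials=10, fail_probability=0.3, seed=None):
    """Count how many of `trials` attempts got past commit()."""
    rng = random.Random(seed)
    counter["foo"] = 0
    for _ in range(trials):
        try:
            do_something(rng, fail_probability)
        except RuntimeError:
            pass                  # plays the role of try { ... }
    return counter["foo"]
```

Calling `run()` a few times should give values scattered around 7, in line with the comment in the Perl 6 example.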
Re: A5: a few simple questions
David Whipp wrote:

First, a slight clarification: if I say:

    m:w/ %foo := [ (\w+) = (\w+) [ , (\w+) ]* ] /

does this give me a hash of arrays? (i.e. is the rhs of a hash processed as a scalar context)

That's an error. The grouping bound to a hypothetical hash has to have either exactly one or exactly two captures in it. To get what you want you'd need something like:

    rule wordlist { (\w+) [ , (\w+) ]* }
    m:w/ %foo := [ (\w+) = (<wordlist>) ] /

or just:

    m:w/ %foo := [ (\w+) = ({ /(\w+) [ , (\w+) ]*/ }) ] /

When I look at this, I see a common pattern: the join/split concept. It feels like there should be a standard assertion:

These are good ideas for assertions. If they don't become standard, it will certainly be possible to write a module that makes them available.

And a question about <m,n> (I think something similar came up a few weeks ago): why isn't it <m..n>, i.e. a list of the numbers of matches allowed. This seems to be the only place in perl6 where a list of numbers, as a range, isn't constructed using the .. operator.

Because <m,n> isn't a list of numbers. It's the lower and upper bounds on a repetition count.

Damian
Re: A5: making a production out of REs
Rich Morin wrote:

I'd like to be able to use REs to generate lists of strings. For example, it might be nice to create a loop such as:

    for $i (sort(p:p5|[0-9A-F]{2}|)) {    # p operator for production?

and have $i walk from '00' through 'FF'. Or whatever.

You mean:

    $ch = any(0..9,'A'..'F');
    for sort egs $ch _ $ch -> $i { ... }

where C<egs> is the (hypothetical) eigenstate operator on (hypothetical) superpositions?

Even if Larry decides against superpositions, there will definitely be some kind of non-quantum iterator syntax that supports these kinds of permuted sequences.

Damian
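The walk from '00' through 'FF' that Rich asks for can be sketched in Python with a plain Cartesian product. This is not the proposed Perl 6 syntax, just an illustration of the result; `productions` and `HEX_DIGITS` are names invented for the example.

```python
from itertools import product

HEX_DIGITS = "0123456789ABCDEF"


def productions(alphabet, length):
    """Return every string of `length` symbols drawn from `alphabet`, sorted."""
    return sorted("".join(chars) for chars in product(alphabet, repeat=length))


# Usage: iterate over all two-digit hex strings in order.
for code in productions(HEX_DIGITS, 2):
    pass  # do something with each code here
```

This only covers the fixed-length character-class case; generating strings from an arbitrary regex is a much bigger job.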
Re: 6PAN (was: Half measures all round)
For the record, you will hear no disagreement from me. I recognize that this is a HARD problem. Nonetheless, I think it's an important one, and solving it (even imperfectly, by only supporting well-defined platforms) would be a major coup.

--Josh

At 23:31 on 06/05/2002 BST, Nicholas Clark [EMAIL PROTECTED] wrote:

On Wed, Jun 05, 2002 at 12:55:36AM -0400, Josh Wilmes wrote:

Good stuff. Sounds halfway between CPAN.pm and activestate's ppm. See also debian's apt-get. Which brings me to my pet peeve- I think it's time to start doing binary packaging in CPAN, for those who don't want to bother with compilation. That has interesting implications for how we deal with paths, but still, I think it's worthwhile. Of course you would want to support source as well, but having binary available for those who want it just seems like a darn good idea.

OK. Say I want binaries for my 3 boxes:

On Bagpuss /usr/local/bin/perl -v says:

    This is perl, v5.8.0 built for armv4l-linux
    (with 1 registered patch, see perl -V for more detail)

but you had better actually build that with -v3 flags on your ARM compiler because my machine's hardware can't cope with the v4 instructions on the CPU

On Thinking-Cap /usr/local/bin/perl -v says:

    This is perl, version 5.004_05 built for i386-freebsd

    Copyright 1987-1998, Larry Wall

5.004 is officially still supported, and some modules do build on 5.004

[Third box, Marvellous-Mechanical-Mouse-Organ is an SGI Indy and doesn't want to power up for some reason, probably because it's been off for about 12 months]

I presume you're going to suggest that they are too obscure for binary CPAN to support them. So limit things to the most recent perl. But having experimented with trying to ship 5.8.0-RC1 between FreeBSD versions, there are sufficient changes between libc on 4.4 STABLE and 4.5 STABLE such that you can't run a binary compiled on 4.5 on a 4.4 box due to missing symbols. So you're starting to enter a version compatibility nightmare.
And if you have a module needing a C++ compiler, are you going to ship your x86 linux binaries using RedHat's 2.96, or a real gcc?

And are you doing dependencies, or are you interfacing with the OS package manager? And if you're not interfacing, but you are adding modules to the OS perl, then what do you do if one of your dependency modules is already there? Do you just go "oh good", have binary CPAN say nothing, and then hope that the OS packaging system doesn't remove the dependency module from under you?

I believe that binary CPAN would have problems that scale as the number of OS subversions that binary CPAN would try to support.

This may sound rather negative, but it basically means that I'm feeling sufficiently pessimistic that I don't think there are reasonable solutions to the problems. However, that's only my opinion, and others' will differ.

On the other hand, I think automatic CPAN testing on multiple platforms is a very good idea.

Nicholas Clark
--
Even better than the real thing: http://nms-cgi.sourceforge.net/
Re: A5: making a production out of REs
At 6:10 PM +1000 6/6/02, Damian Conway wrote:

Rich sez: But make Damian use es, rather than egs for the eigenstate (is :-) operator.

s/is/it/, above (blush). That is, the superposition _could_ be in any of several states, but the eigenstate tells us what it really is.

No, no, no! any and all are three letters, so the eigenstate operator has to be as well. And since the eigenstates are *examples* of the possible states of a superposition, egs is entirely appropriate! ;-)

Well, neither es nor egs is a word, at least in Scrabble (though this is an egs Scrabble argument). While we're on the subject, however, make sure that you warn Unicode users against putting an umlaut on the a in all or any, as you can't have an umlaut without ...

We now return to the (ahem) serious topics of the list.

-r
--
email: [EMAIL PROTECTED]; phone: +1 650-873-7841
http://www.cfcl.com/rdm - my home page, resume, etc.
http://www.cfcl.com/Meta - The FreeBSD Browser, Meta Project, etc.
http://www.ptf.com/dossier - Prime Time Freeware's DOSSIER series
http://www.ptf.com/tdc - Prime Time Freeware's Darwin Collection
Re: A5: a few simple questions
On 6/6/02 2:43 AM, Damian Conway wrote:

    rule wordlist { (\w+) [ , (\w+) ]* }

No semicolon at the end of that line? I've already forgotten the new rules for that type of thing... :)

-John
Re: A5: a few simple questions
On Thu, Jun 06, 2002 at 10:38:39AM -0400, John Siracusa wrote:

On 6/6/02 2:43 AM, Damian Conway wrote:

    rule wordlist { (\w+) [ , (\w+) ]* }

No semicolon at the end of that line? I've already forgotten the new rules for that type of thing... :)

No, because rules are basically methods, just like grammars are basically classes. You would only need a semi-colon if you were defining an anonymous C<rule> (similar to an anonymous C<sub>):

    my $wordlist = rule { (\w+) [ , (\w+) ]* };

Allison
A5: Is this right?
#Preliminary Perl6::Regex
# This does not have any actions, but otherwise I think is correct.
# Let me know if it's right or not.

use 6;
grammar Perl6::Regex {
    rule metachar { [{(\[\])}:*+?\\|]}
    rule ws       { [[\h\v]|\#\N*]*}
    rule atom     { ws (!metachar | \\ . | group) ws }
    rule modifier { ws ([*+?] \?? \:?) ws }
    rule molecule { ( atom modifier | ws \:1,4 ws | compound ws \| ws compound ) }
    rule compound { [(molecule)]* }
    rule group    { ws ( \( compound \) | \[ compound \] | \{ Perl6::Code \} | \ !? [ \w+ | \d+ , \d+ ] compound \ ) ws }
}

--Brent Dax [EMAIL PROTECTED]
@roles=map {Parrot $_} qw(embedding regexen Configure)

Early in the series, Patrick Stewart came up to us and asked how warp drive worked. We explained some of the hypothetical principles . . . Nonsense, Patrick declared. All you have to do is say, 'Engage.'
--Star Trek: The Next Generation Technical Manual
Re: A5: Is this right?
At 11:31 AM 06-06-2002 -0700, Brent Dax wrote:

#Preliminary Perl6::Regex
# This does not have any actions, but otherwise I think is correct.
# Let me know if it's right or not.

I'm not a regex guru, but...

use 6;
grammar Perl6::Regex {
    rule metachar { [{(\[\])}:*+?\\|]}
    rule ws { [[\h\v]|\#\N*]*}
    rule atom { ws (!metachar | \\ . | group) ws }

I had gotten the impression that a literal string separated by whitespace was an atom, so

    rule foofoobar { foo 1,2 bar }

would match 'foobar' or 'foofoobar'. If so, I think !metachar needs to be replaced by !metachar+

    rule modifier { ws ([*+?] \?? \:?) ws }
    rule molecule { ( atom modifier

atom ends with ws, modifier begins with ws. Does that mean that there must be two ws between an atom and a modifier? (Possibly not, since ws can match null, so 'a*' would match ws with four nulls). Just clarifying for myself.

        | ws \:1,4 ws | compound ws \| ws compound ) }
    rule compound { [(molecule)]* }
    rule group{ws ( \( compound \) | \[ compound \] | \{ Perl6::Code \} | \ !? [ \w+ | \d+ , \d+ ] compound \ ) ws }
}

--Brent Dax [EMAIL PROTECTED]
@roles=map {Parrot $_} qw(embedding regexen Configure)

Early in the series, Patrick Stewart came up to us and asked how warp drive worked. We explained some of the hypothetical principles . . . Nonsense, Patrick declared. All you have to do is say, 'Engage.'
--Star Trek: The Next Generation Technical Manual
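The point about ws matching null can be checked with an ordinary regex engine. Below is a small Python sketch, not the Perl 6 grammar itself: `WS`, `ATOM`, `MODIFIER` and `atom_with_modifier` are hypothetical stand-ins, with the atom simplified to a single word character and `[^\n]` used where the grammar has `\N`.

```python
import re

# Optional whitespace and/or comments; can match the empty string.
WS = r"(?:\s|#[^\n]*)*"

# Simplified stand-ins for the atom and modifier rules: each is wrapped
# in optional whitespace on both sides, as in the grammar above.
ATOM = WS + r"\w" + WS
MODIFIER = WS + r"[*+?]" + WS


def atom_with_modifier(text):
    """True if `text` is one atom followed by one modifier."""
    return re.fullmatch(ATOM + MODIFIER, text) is not None
```

With these definitions `atom_with_modifier("a*")` succeeds even though four ws slots sit around the two tokens, because every one of them is free to match nothing.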
Re: A5: Is this right?
On Thu, 6 Jun 2002, Buddha Buck wrote:

At 11:31 AM 06-06-2002 -0700, Brent Dax wrote:

I had gotten the impression that a literal string separated by whitespace was an atom, so

    rule foofoobar { foo 1,2 bar }

would match 'foobar' or 'foofoobar'. If so, I think !metachar needs to be replaced by !metachar+

Nope, still gotta use [foo] if you want an atom larger than a character (whatever a character is...)

Larry
Apoc5 comments/questions
Whew! I've carefully (well, I tried to be careful :-) read through Apocalypse 5 twice now and it still makes my head hurt (but in a good way). What follows is some notes that I jotted down and am tired of looking at. Please correct any misconceptions and feel free to add where I've omitted.

Here's a quick table of the built-in modifiers that I saw and/or surmised. Are there any others? (entries with ? are guesses or unknown on my part)

    long form      short form   meaning
    :any           :a           match returns a list of anywhere the pattern
                                matches within the string regardless of overlap.
    :each          :e           Apply the pattern each time we can within the
                                string? Is this what happened to perl5's /g
                                modifier?
    :once          :o           Match succeeds exactly once (unless .reset)
    :words         :w           Perform a word match treating whitespace
                                between patterns as if it were \s+
    :cont          :c           Continue from where the last match left off
    :ignorecase    :i           Match alphabetics case insensitively
    :perl5?        :p5          Match using perl 5 rules
    :unicode0?     :u0          dot matches bytes
    :unicode1?     :u1          dot matches code points
    :unicode2?     :u2          dot matches graphemes
    :unicode3?     :u3          what dot matches is language dependent
    :?             :1st         succeed on the first match
    :?             :2nd         succeed on the second match
    :?             :3rd         succeed on the third match
    :?             :4th         succeed on the fourth match

This pattern continues for positive integers (i.e. :53rd succeeds on the fifty-third match). It'd be simpler IMHO, if instead of the st, nd, rd, and th suffixes it were an n suffix. e.g., :53n would succeed on the fifty-third match.

    :1time?        :1x          match exactly one time
    :2times?       :2x          match two times
    :3times?       :3x          match three times

This pattern continues for all positive integers (i.e. :23x matches 23 times). Is the x necessary? In a later example s:3/// is used to perform the s/// 3 times. Can I use 0 in the above? Will :0 never match? Is there a way to interpolate the number? Does :$number work?

The text says: A modifier that starts with a number causes the pattern to match that many times. It may only be used outside the regex.

Why only outside the RE? Why wouldn't /:3x foo/ be synonymous with /foo<3>/?

And here's a table of built-in assertions; are there any others?

    assertion      meaning
    alpha          matches any alphabetic character
    digit          matches any numeric character
    sp             matches a space character
    prior          match whatever the most recently successful match did
    null           match nothing
    commit         fails the match if backtracked to
    cut            fails the match if backtracked to and removes the
                   portion of the string that matched to that point
    before ...     match if the pattern occurs before ...
    after ...      match if the pattern occurs after ...

The example at the top of Backslash Reform ...

    $oldpos = pos $string;
    $string =~ m/... ( .pos == $oldpos ) .../;

Shouldn't that first line be something like

    $oldpos = $matchobj.pos;    # or ...
    $oldpos = pos $matchobj;    # or just ...
    $oldpos = pos;              # uses the most recently seen
                                # match object

?

End of random ramblings ...

-Scott
--
Jonathan Scott Duff [EMAIL PROTECTED]
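To make a couple of the table entries concrete, here is a hedged Python sketch of what ":3rd"-style and ":any"-style matching might look like when written by hand. The function names are invented for this example, and the overlap handling is only one reading of ":any": it returns at most one match per starting position, which may be narrower than what A5 intends.

```python
import re


def nth_match(pattern, text, n):
    """Return the n-th (1-based) non-overlapping match, or None if there is none."""
    for count, match in enumerate(re.finditer(pattern, text), start=1):
        if count == n:
            return match
    return None


def overlapping_matches(pattern, text):
    """Return the text matched at every start position, overlaps included."""
    compiled = re.compile(pattern)
    found = []
    for start in range(len(text) + 1):
        match = compiled.match(text, start)   # anchored at `start`
        if match:
            found.append(match.group(0))
    return found
```

For instance, `nth_match(r"\d+", "a1 b22 c333", 2)` picks out the second run of digits, while `overlapping_matches("aa", "aaaa")` finds three hits where a left-to-right scan without overlap would find only two.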
Re: A5: a few simple questions
Allison Randal [EMAIL PROTECTED] writes:

On Thu, Jun 06, 2002 at 10:38:39AM -0400, John Siracusa wrote:

On 6/6/02 2:43 AM, Damian Conway wrote:

    rule wordlist { (\w+) [ , (\w+) ]* }

No semicolon at the end of that line? I've already forgotten the new rules for that type of thing... :)

No, because rules are basically methods, just like grammars are basically classes. You would only need a semi-colon if you were defining an anonymous C<rule> (similar to an anonymous C<sub>):

    my $wordlist = rule { (\w+) [ , (\w+) ]* };

You wouldn't even need it then. Assuming you're following the closing brace with nothing but white space and a newline.

--
Piers

It is a truth universally acknowledged that a language in possession of a rich syntax must be in need of a rewrite.
-- Jane Austen?
Re: A5: a few simple questions
On Thu, Jun 06, 2002 at 08:21:25PM +0100, Piers Cawley wrote:

Allison Randal [EMAIL PROTECTED] writes:

No, because rules are basically methods, just like grammars are basically classes. You would only need a semi-colon if you were defining an anonymous C<rule> (similar to an anonymous C<sub>):

    my $wordlist = rule { (\w+) [ , (\w+) ]* };

You wouldn't even need it then. Assuming you're following the closing brace with nothing but white space and a newline.

I guess you're talking about the bit of A4 to do with "When do I put a semicolon after a curly?". But that is if the final curly is on a line by itself. So you could get away with:

    my $wordlist = rule {
        (\w+) [ , (\w+) ]*
    }

Allison
RFC261 in Perl 5 and where it needs Perl 6 support
Larry discounted RFC261 in A5, but I think there's some good in it. The biggest problem is not that it's hard to do in Perl6, but that 80-90% of it is ALREADY done in Perl5! Once you peel away that portion of the RFC, you get to Perl5's limitations and what Perl6 might do to support these things.

NOTE: My examples are unchecked, so they should be considered pseudo-code.

    # RFC261
    match ($a) = foo;
    # Perl5
    ($a) = foo;

    #RFC261
    match { 'Joe' = ? } = $h or die Hash does not contain Joe;
    #Perl5
    scalar(grep { $_ eq 'Joe' } keys %$h) or die ...

    # This one states its own solution
    # Equiv to scalar(grep { $_ == 1 } list)
    match (..., 1, ...) = list;

    # No idea what this is supposed to do. I think $_[$_] is meant to work
    # in a way it doesn't ($_ will be a value, not an index).
    # Pretty close to ($idx) = grep { $_[$_] == 1 } _; $b = $_[$idx+1];
    match (..., 1, $b) = _;

    # However, I want to suggest that Perl6 closures on grep and map should
    # be considered anonymous methods on the implied loop itself, so that we
    # can call methods like .index or .prev or .next (look ahead/behind), etc.
    # This gets sticky in some situations, but it would be invaluable, so it's
    # worth the effort, IMHO.

    # RFC
    # It gets worse! This gives the value associated with a key matching the
    # regular expression a*b:
    match { /a*b/ = $value } = \%h;
    #Perl5
    # The RFC does not account for multiple matches
    # (presumably we just take the first)
    ($value) = map {$h{$_}} grep {/a*b/} keys %h;

    # RFC
    # And if you want to know what the key was:
    match { $key = /a*b/ = $value } = \%h;
    # Perl5
    ($key,$value) = map {($_,$h{$_})} grep {/a*b/} keys %h;

    # RFC
    # What if you want to grab out the index? This is like
    # ($i) = grep { $list[$_] =~ /foo/ } 0..$#list
    match ( $i = /foo/ ) = list;
    # Perl5
    # As suggested above
    # Perl6
    # See my previous comment on closures of this type
    grep {/foo/ $i = .index} list;

Sorry, it's time for me to go meet someone, but I think the rest of the RFC just gets into two cases. One involves some funky OO stuff that I'm not going to touch on here. The other is the idea of matching sub-lists, which would be the place for some methods inside of grep and map that support look ahead and behind.

I hope others will take up this line of thought. Improved list manipulation can only be a good thing, IMHO.
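The three lookups discussed above (key matching a pattern, index of a matching element, element following a given value) can be sketched in Python, where `enumerate` supplies the index that the proposed `.index` method would provide. This is an illustration only, not RFC261 semantics; the function names are made up, and "first match wins" is an assumption taken from the "presumably we just take the first" comment.

```python
import re


def first_pair_for_key(mapping, key_pattern):
    """Return (key, value) for the first key matching key_pattern, else None."""
    for key, value in mapping.items():
        if re.search(key_pattern, key):
            return key, value
    return None


def indices_matching(items, pattern):
    """Return the index of every string in items that matches pattern."""
    return [i for i, item in enumerate(items) if re.search(pattern, item)]


def element_after(items, target):
    """Return the element following the first occurrence of target, else None."""
    for i, item in enumerate(items[:-1]):
        if item == target:
            return items[i + 1]
    return None
```

`element_after` is the "look ahead" case: it needs the position of the current element, which is exactly what a plain grep block does not give you.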
Re: A5: Is this right?
Brent Dax wrote:

    grammar Perl6::Regex {
        rule metachar { [{(\[\])}:*+?\\|]}
        rule ws { [[\h\v]|\#\N*]*}

Or just:

    rule ws { [\s|\#\N*]* }

    rule atom { ws (!metachar | \\ . | group) ws }
    rule modifier { ws ([*+?] \?? \:?) ws }

    rule modifier { ws ([[*+?]|reprange] \?? \:?) ws }
    rule reprange { \ [ bound [, bound?]? | , bound ] \ }
    rule bound    { \d+ | Perl::scalar }

There are also bits missing from the rest of the grammar (e.g. named captures). I'll be showing a full regex grammar in E5.

Damian
Re: A5: Is this right?
On Fri, 7 Jun 2002, Damian Conway wrote:

Brent Dax wrote:

    grammar Perl6::Regex {
        rule metachar { [{(\[\])}:*+?\\|]}
        rule ws { [[\h\v]|\#\N*]*}

Or just:

    rule ws { [\s|\#\N*]* }

Just as a practical matter, given that you tend to have runs of whitespace,

    rule ws { [ \s+ | \#\N* ]* }

will probably run faster. At least, that would certainly run faster with Perl 5's engine. Can't speak for Perl 6's, of course.

As a different kind of practical matter, if we put spaces around our square brackets and vertical bars, it won't look so much like a character class. I know we're all from the old school, but we should therefore be even more alert against excessive regex compaction.

Larry
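The two spellings of ws accept the same strings; only the amount of work per whitespace run differs. Here is a small Python sketch for checking that on sample input. The names are hypothetical, `[^\n]` stands in for `\N`, and nothing here says anything about the speed of either Perl engine. One caution: in Python's backtracking engine, a nested quantifier like this can backtrack heavily on long inputs that fail to match, so try it on short strings.

```python
import re

WS_SINGLE = re.compile(r"(?:\s|#[^\n]*)*")    # one whitespace char per iteration
WS_RUNS = re.compile(r"(?:\s+|#[^\n]*)*")     # a whole run per iteration


def is_ws(text):
    """True if text consists only of whitespace and # comments."""
    return WS_RUNS.fullmatch(text) is not None


def same_verdict(text):
    """True when both spellings agree about text."""
    single = WS_SINGLE.fullmatch(text) is not None
    return single == is_ws(text)
```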
Apoc 5 questions/comments
Well, A5 definitely has my head spinning. The new features seem amazingly powerful...it almost feels like we're going to have two equally powerful, equally complex languages living side-by-side: one of them is called Perl and the other one is called Regexes. Although they may talk to one another, I really did come away feeling like they were completely separate animals.

I admit I'm a bit nervous about that...so far, I'm completely sold on (basically) all the new features and changes in Perl 6, and I'm eagerly anticipating working with them. But this level of change...I don't know. I've spent a lot of time getting to be (reasonably) good at Perl regular expressions, and I don't like the thought of throwing out all or most of that effort. Somehow, this feels like we're trying to roll all of Prolog into Perl, and I'm not sure I personally want to go there (note the personally...YMMV).

For now, I'm just going to defer worrying about it until I see Exegesis 5, since past experience has shown me that there is a good chance that all my fears will be shown to be groundless once concrete examples are being demonstrated.

In any case, I do have some specific questions:

- Page 8:

    s:3x:3rd /foo/bar/

That changes the 3rd, 6th, and 9th occurrences. Just to verify, this:

    s:3rd /foo<3>/bar/

would do the 3rd, 4th, and 5th, correct?

- Page 8: The u1-u3 mods all say level 1 support. I assume this was a typo, and they should go (u1 = 'level 1', u2 = 'level 2', u3 = 'level 3').

- Can modifiers abut the delimiter?

    s:3x /foo/bar    # most (all?) examples looked like this
    s:3x/foo/bar     # is this legal?

- Can we please have a 'reverse x' modifier that means treat whitespace as literals? Yes, we are living in a Unicode world now and your data could theoretically be coming in from a different character set than expected.
But there are times when it won't...when (for example), you wrote the data out yourself, or you're operating on files that are generated and maintained purely in-house, so they are guaranteed to be in the same character set as the Perl source code you're writing. I understand the arguments for the way the defaults are set. I even agree with them. But you will NEVER convince me that the first example below is not easier to read than any of the alternatives:

    /FATAL ERROR\:Process (\d+) received signal\: (\d+)/
    /FATAL ERROR\:\ \ \ \ Process\ (\d+)\ received\ signal\:\ (\d+)/
    /FATAL ERROR\: \h+ Process \h+ (\d+) \h+ received \h+ signal: \h+ (\d+)/
    /FATAL ERROR\: \s+ Process \s+ (\d+) \s+ received \s+ signal: \s+ (\d+)/

(Yes, I know that the last one matches vertical whitespace and therefore means something slightly different than the others.)

If this means that we need to store a byte or two to remember what character set the originally-read-in code was in before being converted to UTF-8 (or whatever we're using internally), so that we know what character set to assume literal ws refers to...well, that seems like a small price to pay for a lot of convenience.

- Page 9:

    my $foo = ?/.../;    # boolean context, return whether matched,
    my $foo = +/.../;    # numeric context, return count of matches
    my $foo = _/.../;    # string context, return captured/matched string

This 'initial character to force evaluation' rule initially seemed annoying, but the more I think about it, the more I like it; one character isn't much to type, and it makes it extremely clear why you're doing the match (i.e., what you're trying to get back). Kudos to our Fearless Language Designer!
- I am a little unclear on what the difference is between these two:

    my foo = $rx;
    my foo = m/$rx/;

If I understand correctly, it works like this:

    my stuff;
    $_ = foofoofoo;
    $rx = /:each foo/;

    for (0..2) { stuff = $rx }
    # above line is equivalent to following 3 lines:
    stuff = ('foo', 'foo', 'foo');
    stuff = ();
    stuff = ();

    for (0..2) { stuff = m/$rx/ }
    # above line is equivalent to following 3 lines:
    stuff = ('foo', 'foo', 'foo');
    stuff = ('foo', 'foo', 'foo');
    stuff = ('foo', 'foo', 'foo');

Is that correct?

- Page 10: You could also use the {'...'} construct for comments, but then you risk warnings about useless use of a string in void context. Could we automagically turn off that warning inside such constructs, when the only thing there was a string? (Perhaps there could be a switch that prevented it from being turned off, if people really wanted to see it; if so, make it be OFF by default, so it needs to be enabled, much like 'use strict.')

- Page 11:

    / pattern ::: { code() or fail } /    # fails entire rule

Farther down: A pattern nested within a closure is classified as its own rule, however, so it never gets the chance to pass out of a {...} closure. If I understand