URI replacement pseudocode
Over the past week, I've been using my scant bits of nighttime coding to cobble together a pseudocode version of what I think the URI module should look like. There's already one available as example code, but it doesn't actually implement either the URI or IRI spec correctly. Instead, this approach uses a pluggable grammar so that you can:

    my URI $uri .= new( get_url_from_user(), :spec<IRI> )

which would parse the given URL using the RFC3987 IRI grammar. By default, it will use RFC3986 to parse URIs, which does not implement the UCS extensions. It can even handle the legacy RFC2396 and regex-based RFC3986 variations.

Here's the code:

https://docs.google.com/leaf?id=0B41eVYcoggK7YjdkMzVjODctMTAxMi00ZGE0LWE1OTAtZTg1MTY0Njk5YjY4&hl=en

So, my questions are:

* Is this code doing anything that is explicitly not Perl 6ish?
* Is this style of pluggable grammar the correct approach?
* Should I hold off until R* to even try to convert this into working code?
* What's the best way to write tests/package?
* Am I correct in assuming that <...> in a regex is intended to allow the creation of interface roles for grammars?
* I guessed wildly at how I should be invoking the match against a saved token reference:

      if $s ~~ m/^ <.$.spec.gtype.URI_reference> $/ {

  is that correct?
* Are implementations going to be OK with massive character classes like:

      <+[\xA0 .. \xD7FF] + [\xF900 .. \xFDCF] + [\xFDF0 .. \xFFEF]
       + [\x10000 .. \x1FFFD] + [\x20000 .. \x2FFFD] + [\x30000 .. \x3FFFD]
       + [\x40000 .. \x4FFFD] + [\x50000 .. \x5FFFD] + [\x60000 .. \x6FFFD]
       + [\x70000 .. \x7FFFD] + [\x80000 .. \x8FFFD] + [\x90000 .. \x9FFFD]
       + [\xA0000 .. \xAFFFD] + [\xB0000 .. \xBFFFD] + [\xC0000 .. \xCFFFD]
       + [\xD0000 .. \xDFFFD] + [\xE1000 .. \xEFFFD]>

  (from the IRI specification)

--
Aaron Sherman
Email or GTalk: a...@ajs.com
http://www.ajs.com/~ajs
Re: URI replacement pseudocode
Hi,

Aaron Sherman wrote:
> Over the past week, I've been using my scant bits of nighttime coding to cobble together a pseudocode version of what I think the URI module should look like. There's already one available as example code, but it doesn't actually implement either the URI or IRI spec correctly. Instead, this approach uses a pluggable grammar so that you can:
>
>     my URI $uri .= new( get_url_from_user(), :spec<IRI> )
>
> which would parse the given URL using the RFC3987 IRI grammar. By default, it will use RFC3986 to parse URIs, which does not implement the UCS extensions. It can even handle the legacy RFC2396 and regex-based RFC3986 variations.
>
> Here's the code:
>
> https://docs.google.com/leaf?id=0B41eVYcoggK7YjdkMzVjODctMTAxMi00ZGE0LWE1OTAtZTg1MTY0Njk5YjY4&hl=en

I think your code would benefit greatly from actually trying to get it to run with Rakudo (or at least those parts that are already implemented), as well as from a version control system.

> So, my questions are:
>
> * Is this code doing anything that is explicitly not Perl 6ish?

Some things I've noticed:

* you put lots of subs into roles - you probably meant methods
* Don't inherit from roles, implement them with 'does'
* the grammars contain a mixture of tokens for parsing and of methods/subs for data extraction; yet Perl 6 offers a nice way to separate the two, in the form of action/reduction methods; your code might benefit from them.
* class URI::GrammarType seems not very extensible... maybe keep a hash of URI names that map to URIs, which can be extended by method calls?

> * Is this style of pluggable grammar the correct approach?

Looks good, at first glance.

> * Should I hold off until R* to even try to convert this into working code?

No need for that. The support for grammars and roles is pretty good; character classes and match objects are still a bit unstable/whacky.

> * What's the best way to write tests/package?

Every Perl 6 compiler comes with a Test.pm module, so use that. It outputs TAP, so you can use the 'prove' command from perl5/TAP::Harness.

> * Am I correct in assuming that <...> in a regex is intended to allow the creation of interface roles for grammars?

You lost me here. <identifier(...)> calls a named rule (with arguments). Could you rephrase your question?

> * I guessed wildly at how I should be invoking the match against a saved token reference:
>
>     if $s ~~ m/^ <.$.spec.gtype.URI_reference> $/ {
>
> is that correct?

probably just

    $s ~~ /^ $regex $/;

> * Are implementations going to be OK with massive character classes like:
>
>     <+[\xA0 .. \xD7FF] + [\xF900 .. \xFDCF] + [\xFDF0 .. \xFFEF]
>      + [\x10000 .. \x1FFFD] + [\x20000 .. \x2FFFD] + [\x30000 .. \x3FFFD]
>      + [\x40000 .. \x4FFFD] + [\x50000 .. \x5FFFD] + [\x60000 .. \x6FFFD]
>      + [\x70000 .. \x7FFFD] + [\x80000 .. \x8FFFD] + [\x90000 .. \x9FFFD]
>      + [\xA0000 .. \xAFFFD] + [\xB0000 .. \xBFFFD] + [\xC0000 .. \xCFFFD]
>      + [\xD0000 .. \xDFFFD] + [\xE1000 .. \xEFFFD]>
>
> (from the IRI specification)

Funny thing, why does it exclude the FFFE and FFFF codepoints? Anyway, I can't answer that question.

Cheers,
Moritz
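The separation suggested in this reply (tokens do the parsing, action methods do the data extraction) together with the Test module can be sketched roughly as follows. This is only a minimal sketch under stated assumptions: `MiniURI` and `MiniURI::Actions` are hypothetical names, the grammar is a drastic simplification and not the RFC 3986 grammar, and the code targets a current Rakudo rather than the 2010 compiler discussed in the thread.

```raku
use Test;

# Hypothetical, heavily simplified grammar: NOT the RFC 3986 grammar,
# just enough structure to show parsing kept apart from data extraction.
grammar MiniURI {
    token TOP       { <scheme> ':' <hier-part> }
    token scheme    { <[a..zA..Z]> <[a..zA..Z0..9+.\-]>* }
    token hier-part { .* }
}

# The action method named TOP is called once the TOP rule has matched.
class MiniURI::Actions {
    method TOP($/) {
        make %( scheme => ~$<scheme>, rest => ~$<hier-part> );
    }
}

my $match = MiniURI.parse('http://example.com/', :actions(MiniURI::Actions));
my $parts = $match.made;

# The Test module emits TAP, so a harness such as 'prove' can consume it.
ok $match.defined,                   'input parses';
is $parts<scheme>, 'http',           'scheme extracted by the action method';
is $parts<rest>,   '//example.com/', 'remainder extracted';
done-testing;
```

Saved as a test file, a script like this could be run through a TAP harness. The grammar itself stays free of extraction code, which is the point of the action-method approach.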
Re: Fwd: URI replacement pseudocode
Aaron Sherman wrote:
> Ooops, took this off-list by accident.
>
> ---------- Forwarded message ----------
> From: ajs a...@ajs.com
> Date: Mon, May 17, 2010 at 2:59 PM
> Subject: Re: URI replacement pseudocode
> To: Moritz Lenz mor...@faui2k3.org
>
> Thank you for your responses!
>
> On Mon, May 17, 2010 at 1:37 PM, Moritz Lenz mor...@faui2k3.org wrote:
> > Aaron Sherman wrote:
> > > Here's the code:
> > >
> > > https://docs.google.com/leaf?id=0B41eVYcoggK7YjdkMzVjODctMTAxMi00ZGE0LWE1OTAtZTg1MTY0Njk5YjY4&hl=en
> >
> > I think your code would benefit greatly from actually trying to get it to run with Rakudo (or at least those parts that are already implemented), as well as from a version control system.
>
> (re: storage. yes, I intend to get this into something. not sure what, yet. git is preferred, I presume?)

Yes, but it's really your decision in the end.

> I had a hard time even getting basic code working like:
>
>     token foo { 'blah' }
>     if "blah" ~~ m/<foo>/ { say "blah!" }
>
> (See my question to the list, last week)

Right. What works today is

    grammar Foo {
        token TOP { <foo> }
        token foo { 'blah' }
    }
    if Foo.parse('blah') { say 'yes' }

> > > So, my questions are:
> > >
> > > * Is this code doing anything that is explicitly not Perl 6ish?
> >
> > Some things I've noticed:
> >
> > * you put lots of subs into roles - you probably meant methods
>
> Well... that's a fair question. What does a method mean in a grammar? I wasn't too clear on what being a method of a grammar meant. Should I be calling these as class-methods?

A grammar is really just a class that inherits from Grammar. So the answer is that it means the same as in a class.

> > * Don't inherit from roles, implement them with 'does'
>
> I did that, didn't I? Did I typo something?
>
>     grammar URI::rfc2396 does URI::Grammarish ...

and

    grammarb URI::rfc3986_regex is URI::Grammarish

that's what I meant.

> > * the grammars contain a mixture of tokens for parsing and of methods/subs for data extraction; yet Perl 6 offers a nice way to separate the two, in the form of action/reduction methods; your code might benefit from them.
>
> Do you have a pointer for some discussion of this? I'd love to pursue it.

http://github.com/perl6/book/raw/master/src/grammars.pod

(that chapter still uses the outdated {*} rules - if you read about them, ignore them, and instead know that the corresponding action method is always called implicitly at the end of each named rule).

> > * class URI::GrammarType seems not very extensible... maybe keep a hash of URI names that map to URIs, which can be extended by method calls?
>
> The idea that I was working with was that you would provide the grammar itself when you wanted to do something custom, and the string names were just a convenience for the default cases. So, for example:
>
>     my URI $privatewww .= new("ajs://perl**6", :spec(::MyURI::Spec));

Fair enough.

> > > * Should I hold off until R* to even try to convert this into working code?
> >
> > No need for that. The support for grammars and roles is pretty good; character classes and match objects are still a bit unstable/whacky.
>
> Is there any collected wisdom available on this? I'd love to not run around chasing my own tail trying to figure out why something doesn't work.

It's called #perl6, and is our IRC channel :-) Writing down such volatile information isn't very useful, because it becomes outdated rather quickly.

> > > * Am I correct in assuming that <...> in a regex is intended to allow the creation of interface roles for grammars?
> >
> > You lost me here. <identifier(...)> calls a named rule (with arguments). Could you rephrase your question?
>
> Sure. All S05 says is "The <...>, <???>, and <!!!> special tokens have the same "not-defined-yet" meanings within regexes that the bare ellipses have in ordinary code." Which doesn't tell me a lot, but seems to imply that:
>
>     role blah { token bletch { <...> } }
>
> is roughly analogous to:
>
>     role blah { method bletch {...} }
>
> that is to say, the role should have an interface which, when applied to a grammar, would assert the presence of a bletch token. Am I reading too much into this?

Don't know...

> If yes, is there a way to assert role-based interfaces on grammars? The main reason I wanted this was for the very parametric grammar selection we were talking about, above, where the given block says:
>
>     given $type { when .does(URI::Grammarish) { $.gtype = $_ }
>
> I'm assuming, of course, that I can make such assertions about a grammar in the same way that I would make them about a class. Is this true?

Yes. But you could just say

    given $type { when Grammar { $!gtype = $_ } ... }

and accept any grammar there. Not as type-safe, but probably a good start.

> Have I identified an interface token/rule correctly given that that was my goal?
>
> > > * I guessed wildly at how I should be invoking the match against a saved token reference:
> > >
> > >     if $s ~~ m/^ <.$.spec.gtype.URI_reference> $/ {
> > >
> > > is that correct?
> >
> > probably just
> >
> >     $s ~~ /^ $regex $/;
>
> But what should
Re: Fwd: URI replacement pseudocode
On Mon, May 17, 2010 at 3:34 PM, Moritz Lenz mor...@faui2k3.org wrote:
> Aaron Sherman wrote:
> > I had a hard time even getting basic code working like:
> >
> >     token foo { 'blah' }
> >     if "blah" ~~ m/<foo>/ { say "blah!" }
> >
> > (See my question to the list, last week)
>
> Right. What works today is
>
>     grammar Foo {
>         token TOP { <foo> }
>         token foo { 'blah' }
>     }
>     if Foo.parse('blah') { say 'yes' }

I will do this. Thanks.

> > > * Don't inherit from roles, implement them with 'does'
> >
> > I did that, didn't I? Did I typo something?
> >
> >     grammar URI::rfc2396 does URI::Grammarish ...
>
> and
>
>     grammarb URI::rfc3986_regex is URI::Grammarish
>
> that's what I meant

That's a double typo (grammarb and is). I'll fix that in the version I put up after this discussion.

> it's called #perl6, and is our IRC channel :-) Writing down such volatile information isn't very useful, because it becomes outdated rather quickly.

I used to be active in #perl6. I'll try to jump back in.

I'm noting the rest of what you said and moving forward with the changes. It all sounds much more reasonable than I feared it would be.

--
Aaron Sherman
Email or GTalk: a...@ajs.com
http://www.ajs.com/~ajs
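The two points settled in this exchange (compose a role into a grammar with 'does' rather than 'is', and select a grammar by type in a given/when block) can be sketched roughly like this. It is a minimal sketch, not the code from the thread: `Grammarish`, `Bar`, `parse-whole`, and `accept-grammar` are hypothetical names standing in for things like URI::Grammarish, and the code assumes a current Rakudo.

```raku
# Hypothetical role standing in for the thread's URI::Grammarish.
role Grammarish {
    # Helper shared by every grammar that does this role.
    method parse-whole(Str $input) { self.parse($input) }
}

# Roles are composed with 'does'; 'is' is reserved for inheritance.
grammar Foo does Grammarish {
    token TOP { <foo> }
    token foo { 'blah' }
}

# A plain grammar that does not compose the role.
grammar Bar {
    token TOP { 'bar' }
}

# Classify a type object, most specific check first.
sub accept-grammar($type) {
    given $type {
        when Grammarish { 'does Grammarish' }
        when Grammar    { 'plain Grammar' }
        default         { 'not a grammar' }
    }
}

say accept-grammar(Foo);
say accept-grammar(Bar);
say accept-grammar(Str);
say 'yes' if Foo.parse-whole('blah');
```

Checking for the role first keeps the stricter, more type-safe branch ahead of the catch-all `when Grammar` branch, which accepts any grammar at all.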
replacement of $
Recently I had time to think about the $ symbol we use in Perl. I think Perl has been using the USD symbol for too long, and I'm now sure that it's time to replace it. After some research I came to the conclusion that the best fit is the euro symbol (€). So, spread the word, Perl 6 will require you to replace all the $ in your scripts with €. That's just a regex after all...
Re: replacement of $
At 15:04 -0800 1/4/06, Larry Wall wrote:
> Recently I had time to think about the $ symbol we use in Perl. I think Perl has been using the USD symbol for too long, and I'm now sure that it's time to replace it. After some research I came to the conclusion that the best fit is the euro symbol (€).
>
> So, spread the word, Perl 6 will require you to replace all the $ in your scripts with €. That's just a regex after all...

But $ isn't specifically a USD symbol. It's used by Canada and Australia too, if not more places. It's multi-country like the Euro is.

Perhaps what we need is a more universal currency. I suggest gold. So every relevant symbol name could start with 'Au' instead of '$', and an advantage of this is that it is still easy to type on any keyboard.

-- Darren Duncan
Re: replacement of $
On Sat, Apr 01, 2006 at 03:11:27PM -0800, Larry Wall wrote:
: Recently I had time to think about the $ symbol we use in Perl.
:
: I think Perl has been using the USD symbol for too long, and I'm now sure
: that it's time to replace it. After some research I came to the conclusion
: that the best fit is the euro symbol (€).
:
: So, spread the word, Perl 6 will require you to replace all the $ in your
: scripts with €. That's just a regex after all...

Hmm, like anyone's going to believe you...

Anyone can forge Larry-like email. Look how easy it is for *me* to forge email from Larry even though I'm in Japan right now. All you have to do is end most of your paragraphs with ... and throw in a few Hmms here and there

And then there's all these checkins I've been forging from Audrey. Piece o' cake...

Now, gettin' myself made up to look like Larry for the Bugs Manifesto, that was a wee bit more challenging, but I think most people were fooled...

TomTiady, AKA 落第の駱駝, AKA Larry Boy (the tsukemono, not the pickle)
Re: replacement of $
On Sun, Apr 02, 2006 at 02:04:07 +0300, Larry Wall wrote:
                                  ^^^-- (actually that was IDT in the headers)

Hi, I'm in Israel and Japan at the same time!

Nice one though ;-)

<plug>If you guys would have participated in the keysigning parties...</plug>

--
Yuval Kogman [EMAIL PROTECTED]
http://nothingmuch.woobling.org 0xEBD27418