Oops, took this off-list by accident.

---------- Forwarded message ----------
From: ajs <a...@ajs.com>
Date: Mon, May 17, 2010 at 2:59 PM
Subject: Re: URI replacement pseudocode
To: Moritz Lenz <mor...@faui2k3.org>

Thank you for your responses!

On Mon, May 17, 2010 at 1:37 PM, Moritz Lenz <mor...@faui2k3.org> wrote:

> Aaron Sherman wrote:
> > Here's the code:
> >
> > https://docs.google.com/leaf?id=0B41eVYcoggK7YjdkMzVjODctMTAxMi00ZGE0LWE1OTAtZTg1MTY0Njk5YjY4&hl=en
>
> I think your code would benefit greatly from actually trying to get run
> with Rakudo (or at least that parts that are yet implemented), as well
> as from a version control system.

(re: storage. yes, I intend to get this into something. not sure what, yet. git is preferred, I presume?)

I had a hard time even getting basic code working like:

    token foo { blah }
    if "blah" ~~ m/<foo>/ { say "blah!" }

(See my question to the list, last week) so I really didn't want to venture into trying to get this working, but yeah, now that it's done I'll see how Rakudo chokes on it.

> > So, my questions are:
> >
> > * Is this code doing anything that is explicitly not Perl 6ish?
>
> Some things I've noticed:
> * you put lots of subs into roles - you probably meant methods

Well... that's a fair question. What does a method mean in a grammar? I wasn't too clear on what being a method of a grammar meant. Should I be calling these as class-methods?

> * Don't inherit from roles, implement them with 'does'

I did that, didn't I? Did I typo something?

    grammar URI::rfc2396 does URI::Grammarish ...

> * the grammars contain a mixture of tokens for parsing and of
> methods/subs for data extraction; yet Perl 6 offers a nice way to
> separate the two, in the form of action/reduction methods; your code
> might benefit from them.

Do you have a pointer for some discussion of this? I'd love to pursue it.

> * class URI::GrammarType seems not very extensible... maybe keep a hash
> of URI names that map to URIs, which can be extended by method calls?

The idea that I was working with was that you would provide the grammar itself when you wanted to do something custom, and the string names were just a convenience for the default cases.

So, for example:

    my URI $privatewww .= new("ajs://perl**6", :spec(::MyURI::Spec));

Where MyURI::Spec could be any grammar that implements the URI::Grammarish interface (see grammar interface discussion, below). I can look into extending it with string names as well, though.

> > * Is this style of pluggable grammar the correct approach?
>
> Looks good, from a first glance.

Thanks!

> > * Should I hold off until R* to even try to convert this into working
> > code?
>
> No need for that. The support for grammars and roles is pretty good,
> character classes and match objects are still a bit unstable/whacky.

Is there any collected wisdom available on this? I'd love to not run around chasing my own tail trying to figure out why something doesn't work.

> > * Am I correct in assuming that <...> in a regex is intended to allow the
> > creation of interface roles for grammars?
>
> You lost me here. <identifier(...)> calls a named rule (with arguments).
> Could you rephrase your question?

Sure. All S05 says is "The <...>, <???>, and <!!!> special tokens have the same "not-defined-yet" meanings within regexes that the bare ellipses have in ordinary code."

Which doesn't tell me a lot, but seems to imply that:

    role blah { token bletch { <...> } }

is roughly analogous to:

    role blah { method bletch {...} }

that is to say, the role should have an interface which, when applied to a grammar, would assert the presence of a bletch token. Am I reading too much into this? If yes, is there a way to assert role-based interfaces on grammars?

The main reason I wanted this was for the very parametric grammar selection we were talking about, above, where the given block says:

    given $type {
        when .does(URI::Grammarish) { $.gtype = $_ }

I'm assuming, of course, that I can make such assertions about a grammar in the same way that I would make them about a class. Is this true? Have I identified an interface token/rule correctly given that that was my goal?

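A minimal sketch of the role-as-interface idea above, written against present-day Raku rather than the 2010 Rakudo discussed in this thread. It sidesteps the open question about `<...>` by stubbing an ordinary method in the role, on the assumption that a token counts as a method when the role is composed into a grammar. Only URI::Grammarish, URI::rfc2396 and URI_reference come from the thread; `pick-grammar`, `scheme`, `rest` and the toy grammar body are hypothetical.

```raku
# Names other than URI::Grammarish, URI::rfc2396 and URI_reference are
# hypothetical, and the grammar body is a toy, not RFC 2396.
role URI::Grammarish {
    # Stubbed method: composing this role without supplying
    # URI_reference is expected to be rejected at composition time.
    method URI_reference { ... }
}

grammar URI::rfc2396 does URI::Grammarish {
    # Tokens are methods, so this is assumed to satisfy the stub above.
    token URI_reference { <scheme> ':' <rest> }
    token scheme        { <[a..z]>+ }
    token rest          { \S* }
}

# Hypothetical helper standing in for the given/when block in the thread.
sub pick-grammar($type) {
    die "not a URI::Grammarish grammar" unless $type.does(URI::Grammarish);
    return $type;
}

my $grammar = pick-grammar(URI::rfc2396);

# Start the parse at the interface rule instead of the default TOP.
my $match = $grammar.parse('http://example.com/', :rule<URI_reference>);
say $match ?? "scheme: $match<scheme>" !! "no match";
```

Passing the grammar's type object around and calling `.parse` with `:rule<URI_reference>` is one way to avoid interpolating a rule reference into a regex at all; whether that suits the pluggable design described here is a separate question.
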
> > * I guessed wildly at how I should be invoking the match against a saved
> > "token" reference:
> > if $s ~~ m/^ <.$.spec.gtype.URI_reference> $/ {
> > is that correct?
>
> probably just $s ~~ /^ $regex $/;

But what should $regex contain? I have a $.spec which contains a reference to the URI::GrammarType object whose $.gtype identifies the grammar I should be using. That grammar is guaranteed to have a URI_reference rule, so the variable is:

    $.spec.gtype.URI_reference

Should that be:

    if $s ~~ m/^ <$.spec.gtype.URI_reference> $/ { ... }

then? Do I need to copy that into a lexical to avoid confusing the rule parser? I think I'll do that, just to avoid all the confusion anyway.

> > * Are implementations going to be OK with massive character classes like:
> > <+[\xA0 .. \xD7FF] + [\xF900 .. \xFDCF] + [\xFDF0 .. \xFFEF] +
> > [\x10000 .. \x1FFFD] + [\x20000 .. \x2FFFD] +
> > [\x30000 .. \x3FFFD] + [\x40000 .. \x4FFFD] +
> > [\x50000 .. \x5FFFD] + [\x60000 .. \x6FFFD] +
> > [\x70000 .. \x7FFFD] + [\x80000 .. \x8FFFD] +
> > [\x90000 .. \x9FFFD] + [\xA0000 .. \xAFFFD] +
> > [\xB0000 .. \xBFFFD] + [\xC0000 .. \xCFFFD] +
> > [\xD0000 .. \xDFFFD] + [\xE1000 .. \xEFFFD]>
> > (from the IRI specification)
>
> Funny thing, why does it exclude the FFFE and FFFF codepoints?
> Anyway, I can't answer that question.

FFFE and FEFF are used to manage byte-ordering, so they really shouldn't be part of a URI (URIs should exist in a context in which byte ordering is assured, would be my take). The Unicode spec says that FFFF is guaranteed not to be a valid Unicode character, but does not explain why. [http://unicode.org/charts/PDF/UFFF0.pdf]

--
Aaron Sherman
Email or GTalk: a...@ajs.com
http://www.ajs.com/~ajs