Hi,

Aaron Sherman wrote:
> Over the past week, I've been using my scant bits of nighttime coding to
> cobble together a pseudocode version of what I think the URI module should
> look like. There's already one available as example code, but it doesn't
> actually implement either the URI or IRI spec correctly. Instead, this
> approach uses a pluggable grammar so that you can:
>
>     my URI $uri .= new( get_url_from_user(), :spec<IRI> )
>
> which would parse the given URL using the RFC3987 IRI grammar. By default,
> it will use RFC3896 to parse URIs, which does not implement the UCS
> extensions. It can even handle the "legacy" RFC2396 and regex-based RFC3896
> variations.
>
> Here's the code:
>
> https://docs.google.com/leaf?id=0B41eVYcoggK7YjdkMzVjODctMTAxMi00ZGE0LWE1OTAtZTg1MTY0Njk5YjY4&hl=en

I think your code would benefit greatly from actually being run with Rakudo
(or at least those parts that are already implemented), as well as from a
version control system.

> So, my questions are:
>
> * Is this code doing anything that is explicitly not Perl 6ish?

Some things I've noticed:

* you put lots of subs into roles - you probably meant methods
* Don't inherit from roles, implement them with 'does'
* the grammars contain a mixture of tokens for parsing and of methods/subs
  for data extraction; yet Perl 6 offers a nice way to separate the two, in
  the form of action/reduction methods; your code might benefit from them.
* class URI::GrammarType seems not very extensible... maybe keep a hash of
  URI names that map to URIs, which can be extended by method calls?

> * Is this style of pluggable grammar the correct approach?

Looks good, at first glance.

> * Should I hold off until R* to even try to convert this into working code?

No need for that. The support for grammars and roles is pretty good;
character classes and match objects are still a bit unstable/wacky.

> * What's the best way to write tests/package?

Every Perl 6 compiler comes with a Test.pm module, so use that. It outputs
TAP, so you can use the 'prove' command from perl5/TAP::Harness.

> * Am I correct in assuming that <...> in a regex is intended to allow the
> creation of interface roles for grammars?

You lost me here. <identifier(...)> calls a named rule (with arguments).
Could you rephrase your question?

> * I guessed wildly at how I should be invoking the match against a saved
> "token" reference:
>     if $s ~~ m/^ <.$.spec.gtype.URI_reference> $/ {
> is that correct?

Probably just

    $s ~~ /^ $regex $/;

> * Are implementations going to be OK with massive character classes like:
>     <+[\xA0 .. \xD7FF] + [\xF900 .. \xFDCF] + [\xFDF0 .. \xFFEF] +
>     [\x10000 .. \x1FFFD] + [\x20000 .. \x2FFFD] +
>     [\x30000 .. \x3FFFD] + [\x40000 .. \x4FFFD] +
>     [\x50000 .. \x5FFFD] + [\x60000 .. \x6FFFD] +
>     [\x70000 .. \x7FFFD] + [\x80000 .. \x8FFFD] +
>     [\x90000 .. \x9FFFD] + [\xA0000 .. \xAFFFD] +
>     [\xB0000 .. \xBFFFD] + [\xC0000 .. \xCFFFD] +
>     [\xD0000 .. \xDFFFD] + [\xE1000 .. \xEFFFD]>
> (from the IRI specification)

Funny thing, why does it exclude the FFFE and FFFF codepoints?
Anyway, I can't answer that question.

Cheers,
Moritz
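
P.S. To make a few of the suggestions in this reply more concrete, here is a minimal sketch, written against present-day Raku syntax rather than the Rakudo of this thread. Everything named in it (SketchURI, SketchActions, Describable, SketchLink, %grammar-for) is hypothetical and is not taken from your code, and the grammar is a toy that only handles scheme://authority/path, not RFC 3986 or RFC 3987. It only illustrates keeping tokens and data extraction apart via an actions class, composing a role with 'does', keeping an extensible hash of grammars, interpolating a stored regex, and testing with Test.pm.

```raku
use Test;

# Hypothetical, heavily simplified grammar; NOT the RFC 3986 grammar.
# It contains parsing tokens only, no data-extraction code.
grammar SketchURI {
    token TOP       { <scheme> '://' <authority> <path> }
    token scheme    { <[a..zA..Z]> <[a..zA..Z0..9]>* }
    token authority { <-[/?\#]>* }
    token path      { <-[?\#]>* }
}

# Data extraction lives in a separate actions class.
class SketchActions {
    method TOP($/) {
        make %(
            scheme    => ~$<scheme>,
            authority => ~$<authority>,
            path      => ~$<path>,
        );
    }
}

# A role supplies methods (not subs) and is composed with 'does'.
role Describable {
    method describe() { "{self.scheme}://{self.authority}" }
}

class SketchLink does Describable {
    has $.scheme;
    has $.authority;
}

# Extensible table mapping spec names to grammars.
my %grammar-for = URI => SketchURI;
%grammar-for<IRI> = SketchURI;   # placeholder; a real IRI grammar would go here

my $match = %grammar-for<URI>.parse(
    'http://example.org/a/b',
    actions => SketchActions.new,
);
ok $match, 'sketch grammar parses a simple URL';

my $parts = $match.made;
is $parts<scheme>, 'http', 'scheme extracted by the action method';
is $parts<path>,   '/a/b', 'path extracted by the action method';

my $link = SketchLink.new(
    scheme    => $parts<scheme>,
    authority => $parts<authority>,
);
is $link.describe, 'http://example.org', 'method supplied by the role';

# A regex stored in a variable can be interpolated into another regex.
my $regex = /\w+/;
ok 'abc' ~~ /^ $regex $/, 'interpolated regex matches';

done-testing;
```

Saved as a .t file, this emits TAP, so it can be run through 'prove' with a Raku executable, assuming one is installed. Swapping in a different grammar for a given spec name then only means changing an entry in %grammar-for.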