Hi,
Aaron Sherman wrote:
> Over the past week, I've been using my scant bits of nighttime coding to
> cobble together a pseudocode version of what I think the URI module should
> look like. There's already one available as example code, but it doesn't
> actually implement either the URI or IRI spec correctly. Instead, this
> approach uses a pluggable grammar so that you can:
>
> my URI $uri .= new( get_url_from_user(), :spec )
>
> which would parse the given URL using the RFC3987 IRI grammar. By default,
> it will use RFC3986 to parse URIs, which does not implement the UCS
> extensions. It can even handle the "legacy" RFC2396 and regex-based RFC3986
> variations.
>
> Here's the code:
>
> https://docs.google.com/leaf?id=0B41eVYcoggK7YjdkMzVjODctMTAxMi00ZGE0LWE1OTAtZTg1MTY0Njk5YjY4&hl=en
I think your code would benefit greatly from actually being run with
Rakudo (or at least the parts that are already implemented), as well
as from being kept in a version control system.
> So, my questions are:
>
> * Is this code doing anything that is explicitly not Perl 6ish?
Some things I've noticed:
* you put lots of subs into roles - you probably meant methods
* Don't inherit from roles, implement them with 'does'
* the grammars contain a mixture of tokens for parsing and of
methods/subs for data extraction; yet Perl 6 offers a nice way to
separate the two, in the form of action/reduction methods; your code
might benefit from them.
* class URI::GrammarType seems not very extensible... maybe keep a hash
of URI names that map to URIs, which can be extended by method calls?
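A minimal sketch of what I mean by these points. All names in it (SpecInfo, SimpleURI, SimpleURIActions, GrammarRegistry) are hypothetical and the grammar is a toy, not either RFC; it only shows a role with methods composed via 'does', an actions class kept apart from the tokens, and a registry that is extended by method calls:

```raku
# All names here are hypothetical; this is not the code from the linked file.
role SpecInfo {
    method spec-name() { ... }    # stub: the composing grammar must supply it
    method describe()  { 'grammar for ' ~ self.spec-name }
}

# The grammar only parses; it composes the role with 'does'.
grammar SimpleURI does SpecInfo {
    token TOP    { <scheme> ':' <rest> }
    token scheme { <[A..Z a..z]> <[A..Z a..z 0..9 + . \-]>* }
    token rest   { .* }
    method spec-name() { 'simple' }
}

# Data extraction lives in a separate actions class.
class SimpleURIActions {
    method scheme($/) { make $/.Str.lc }
    method TOP($/)    { make %( scheme => $<scheme>.made, rest => $<rest>.Str ) }
}

# A registry of grammars that can be extended by method calls.
class GrammarRegistry {
    has %!grammars;
    method register(Str $name, $grammar) { %!grammars{$name} = $grammar }
    method lookup(Str $name) {
        # Grammars are type objects (undefined), so test with :exists, not //.
        die "unknown spec: $name" unless %!grammars{$name}:exists;
        %!grammars{$name}
    }
}

my $registry = GrammarRegistry.new;
$registry.register('simple', SimpleURI);

my $match  = $registry.lookup('simple').parse('HTTP://example.com/',
                                              actions => SimpleURIActions);
my %result = $match.made;
say %result<scheme>;      # http
say SimpleURI.describe;   # grammar for simple
```

Registering an IRI grammar would then be one more register call, with no change to the registry class itself.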
> * Is this style of pluggable grammar the correct approach?
Looks good, at first glance.
> * Should I hold off until R* to even try to convert this into working code?
No need for that. The support for grammars and roles is pretty good;
character classes and match objects are still a bit unstable/wacky.
> * What's the best way to write tests/package?
Every Perl 6 compiler comes with a Test.pm module, so use that. It
outputs TAP, so you can run the tests with the 'prove' command from
Perl 5's TAP::Harness.
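A test file could look roughly like this. The SchemeOnly grammar is a hypothetical stand-in for the real URI grammar, so treat it as a sketch of the Test functions rather than as a test suite for your module:

```raku
use Test;

# Hypothetical stand-in for the real URI grammar.
grammar SchemeOnly {
    token TOP    { <scheme> ':' .* }
    token scheme { <[A..Z a..z]> <[A..Z a..z 0..9 + . \-]>* }
}

plan 3;

# .parse returns a match object on success and a false value on failure.
ok  SchemeOnly.parse('http://example.com/'), 'absolute URL parses';
is  ~SchemeOnly.parse('mailto:user@example.com')<scheme>, 'mailto',
    'scheme is captured';
nok SchemeOnly.parse('://no-scheme'), 'missing scheme is rejected';
```

Saved as something like t/01-scheme.t (the file name is an assumption), it can be run through prove by pointing its --exec option at your Perl 6 binary.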
> * Am I correct in assuming that <...> in a regex is intended to allow the
> creation of interface roles for grammars?
You lost me here. Angle brackets in a regex call a named rule (with arguments).
Could you rephrase your question?
> * I guessed wildly at how I should be invoking the match against a saved
> "token" reference:
> if $s ~~ m/^ <.$.spec.gtype.URI_reference> $/ {
> is that correct?
probably just $s ~~ /^ $regex $/;
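A small sketch of both options, with a hypothetical PortSpec grammar and made-up input. The first part interpolates a regex object held in a variable; the second asks a grammar to start at a rule other than TOP, which is an alternative I'd consider when the token lives in a grammar anyway:

```raku
# A regex object stored in a variable is matched as a regex when interpolated.
my $regex = / <[a..z]>+ ':' \d+ /;
my $s     = 'port:8080';

if $s ~~ /^ $regex $/ {
    say 'matched: ' ~ $/;
}

# Hypothetical grammar; .parse anchors at both ends and :rule picks the
# starting rule.
grammar PortSpec {
    token TOP  { <name> ':' <port> }
    token name { <[a..z]>+ }
    token port { \d+ }
}

say so PortSpec.parse('8080', :rule<port>);
say so PortSpec.parse('port:8080');
```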
> * Are implementations going to be OK with massive character classes like:
> <+[\xA0 .. \xD7FF] + [\xF900 .. \xFDCF] + [\xFDF0 .. \xFFEF] +
> [\x10000 .. \x1FFFD] + [\x20000 .. \x2FFFD] +
> [\x30000 .. \x3FFFD] + [\x40000 .. \x4FFFD] +
> [\x50000 .. \x5FFFD] + [\x60000 .. \x6FFFD] +
> [\x70000 .. \x7FFFD] + [\x80000 .. \x8FFFD] +
> [\x90000 .. \x9FFFD] + [\xA0000 .. \xAFFFD] +
> [\xB0000 .. \xBFFFD] + [\xC0000 .. \xCFFFD] +
> [\xD0000 .. \xDFFFD] + [\xE1000 .. \xEFFFD]>
> (from the IRI specification)
Funny thing, why does it exclude the FFFE and FFFF codepoints?
Anyway, I can't answer that question.
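The easiest way to find out for a given implementation is probably to try a cut-down version of the class. A sketch, assuming the bracketed \x[...] escape form and using only two of the ranges; the token name ucschar-sample is made up:

```raku
# Two of the ranges from the IRI class, written with the \x[...] escape form.
my token ucschar-sample {
    <[ \x[A0] .. \x[D7FF] ] + [ \x[10000] .. \x[1FFFD] ]>
}

# One character inside each range, and one outside both.
my $latin1-e-acute = "\x[E9]";
my $plane1-char    = "\x[10400]";
my $ascii-a        = "A";

say so $latin1-e-acute ~~ /^ <ucschar-sample> $/;
say so $plane1-char    ~~ /^ <ucschar-sample> $/;
say so $ascii-a        ~~ /^ <ucschar-sample> $/;
```

If the small version behaves, the full class is the same construct with more ranges added.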
Cheers,
Moritz