Re: URI replacement pseudocode

Moritz Lenz Mon, 17 May 2010 10:37:22 -0700

Hi,

Aaron Sherman wrote:
> Over the past week, I've been using my scant bits of nighttime coding to
> cobble together a pseudocode version of what I think the URI module should
> look like. There's already one available as example code, but it doesn't
> actually implement either the URI or IRI spec correctly. Instead, this
> approach uses a pluggable grammar so that you can:
> 
>   my URI $uri .= new( get_url_from_user(), :spec<IRI> )
> 
> which would parse the given URL using the RFC3987 IRI grammar. By default,
> it will use RFC3986 to parse URIs, which does not implement the UCS
> extensions. It can even handle the "legacy" RFC2396 and regex-based RFC3986
> variations.
> 
> Here's the code:
> 
> https://docs.google.com/leaf?id=0B41eVYcoggK7YjdkMzVjODctMTAxMi00ZGE0LWE1OTAtZTg1MTY0Njk5YjY4&hl=en


I think your code would benefit greatly from actually being run with
Rakudo (or at least the parts that Rakudo already implements), as well
as from a version control system.

> So, my questions are:
> 
> * Is this code doing anything that is explicitly not Perl 6ish?

Some things I've noticed:
* You put lots of subs into roles; you probably meant methods.
* Don't inherit from roles; apply them with 'does'.
* The grammars contain a mixture of tokens for parsing and
methods/subs for data extraction. Perl 6 offers a nice way to
separate the two, in the form of action/reduction methods; your code
might benefit from them.
* class URI::GrammarType seems not very extensible... maybe keep a hash
of URI names that map to URIs, which can be extended by method calls?
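
To illustrate the first three points, here is a minimal sketch. All the
names in it (Describable, Link, SchemeAndRest, SchemeActions) are made up
for the example and not taken from your code, and the grammar is a toy,
not a real URI grammar:

```raku
# Hypothetical names throughout; not taken from the pseudocode under review.
role Describable {
    # A method (not a sub), so it can reach the object through self.
    method describe() { "{self.^name}: {self.Str}" }
}

# The role is applied with 'does' rather than inherited with 'is'.
class Link does Describable {
    has Str $.target;
    method Str() { $!target }
}

# The grammar only describes the syntax ...
grammar SchemeAndRest {
    token TOP    { <scheme> ':' <rest> }
    token scheme { <[a..z A..Z]> <[a..z A..Z 0..9 + \- .]>* }
    token rest   { .* }
}

# ... and a separate actions class extracts the data.
class SchemeActions {
    method TOP($/) {
        make %( scheme => $<scheme>.Str.lc, rest => $<rest>.Str );
    }
}

say Link.new(target => 'http://example.org/').describe;

my $match = SchemeAndRest.parse('HTTP://example.org/', actions => SchemeActions);
say $match.made<scheme>;
```

The point of the split is that the same grammar can be reused with a
different actions class, or with none at all if you only want to validate.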

> * Is this style of pluggable grammar the correct approach?

Looks good, from a first glance.

> * Should I hold off until R* to even try to convert this into working code?

No need for that. The support for grammars and roles is pretty good;
character classes and match objects are still a bit unstable/wacky.

> * What's the best way to write tests/package?

Every Perl 6 compiler comes with a Test.pm module, so use that. It
outputs TAP, so you can use the 'prove' command from Perl 5's
TAP::Harness to run the tests.
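
A test file could look roughly like this. The scheme-of sub is only a
hypothetical stand-in for whatever your module ends up exporting, so that
the sketch is self-contained:

```raku
use Test;

# Hypothetical stand-in for the code under test: returns the scheme of a
# URI-like string, or Nil if there is none.
sub scheme-of(Str $uri) {
    $uri ~~ /^ (<[a..z A..Z]> <[a..z A..Z 0..9 + \- .]>*) ':' / ?? ~$0 !! Nil
}

is  scheme-of('http://example.org/'),        'http',   'scheme is extracted';
is  scheme-of('mailto:someone@example.org'), 'mailto', 'works without //';
nok scheme-of('no scheme here').defined,     'no scheme, no result';

done-testing;
```

Assuming the file lives in t/, something like `prove -e perl6 t/` should
then run it and summarize the TAP output.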

> * Am I correct in assuming that <...> in a regex is intended to allow the
> creation of interface roles for grammars?

You lost me here. <identifier(...)> calls a named rule (with arguments).
Could you rephrase your question?
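
In case it helps to make that concrete, here is a small sketch of a named
rule called with arguments. The grammar and its token names are invented
for the example:

```raku
# Hypothetical toy grammar: <digits(3)> calls the token 'digits' with $n = 3.
grammar Repeated {
    token TOP        { <digits(3)> '-' <digits(4)> }
    token digits($n) { \d ** {$n} }
}

say so Repeated.parse('555-1234');
say so Repeated.parse('55-1234');
```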

> * I guessed wildly at how I should be invoking the match against a saved
> "token" reference:
>         if $s ~~ m/^ <.$.spec.gtype.URI_reference> $/ {
>   is that correct?

Probably just interpolate the stored regex: $s ~~ /^ $regex $/;
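
As a self-contained sketch, with a deliberately simplistic regex standing
in for the token you have stored (the variable names are assumptions, too):

```raku
# Hypothetical stand-in for the stored token; not a real URI pattern.
my $regex = rx/ <[a..z]>+ ':' \S* /;

my $s = 'http://example.org/';

# A variable holding a Regex object is matched as a sub-pattern.
if $s ~~ /^ $regex $/ {
    say 'matched: ' ~ $/;
}
```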

> * Are implementations going to be OK with massive character classes like:
> <+[\xA0 .. \xD7FF] + [\xF900 .. \xFDCF] + [\xFDF0 .. \xFFEF] +
>   [\x10000 .. \x1FFFD] + [\x20000 .. \x2FFFD] +
>   [\x30000 .. \x3FFFD] + [\x40000 .. \x4FFFD] +
>   [\x50000 .. \x5FFFD] + [\x60000 .. \x6FFFD] +
>   [\x70000 .. \x7FFFD] + [\x80000 .. \x8FFFD] +
>   [\x90000 .. \x9FFFD] + [\xA0000 .. \xAFFFD] +
>   [\xB0000 .. \xBFFFD] + [\xC0000 .. \xCFFFD] +
>   [\xD0000 .. \xDFFFD] + [\xE1000 .. \xEFFFD]>
> (from the IRI specification)

Funny thing, why does it exclude the FFFE and FFFF codepoints?
Anyway, I can't answer that question.

Cheers,
Moritz
