URI replacement pseudocode

2010-05-17 Thread Aaron Sherman
Over the past week, I've been using my scant bits of nighttime coding to
cobble together a pseudocode version of what I think the URI module should
look like. There's already one available as example code, but it doesn't
actually implement either the URI or IRI spec correctly. Instead, this
approach uses a pluggable grammar so that you can:

  my URI $uri .= new( get_url_from_user(), :spec<IRI> )

which would parse the given URL using the RFC3987 IRI grammar. By default,
it will use RFC3986 to parse URIs, which does not implement the UCS
extensions. It can even handle the legacy RFC2396 and regex-based RFC3986
variations.

Here's the code:

https://docs.google.com/leaf?id=0B41eVYcoggK7YjdkMzVjODctMTAxMi00ZGE0LWE1OTAtZTg1MTY0Njk5YjY4hl=en

So, my questions are:

* Is this code doing anything that is explicitly not Perl 6ish?
* Is this style of pluggable grammar the correct approach?
* Should I hold off until R* to even try to convert this into working code?
* What's the best way to write tests/package?
* Am I correct in assuming that <...> in a regex is intended to allow the
creation of interface roles for grammars?
* I guessed wildly at how I should be invoking the match against a saved
token reference:
if $s ~~ m/^ <.$.spec.gtype.URI_reference> $/ {
  is that correct?
* Are implementations going to be OK with massive character classes like:
<+[\xA0 .. \xD7FF] + [\xF900 .. \xFDCF] + [\xFDF0 .. \xFFEF] +
  [\x10000 .. \x1FFFD] + [\x20000 .. \x2FFFD] +
  [\x30000 .. \x3FFFD] + [\x40000 .. \x4FFFD] +
  [\x50000 .. \x5FFFD] + [\x60000 .. \x6FFFD] +
  [\x70000 .. \x7FFFD] + [\x80000 .. \x8FFFD] +
  [\x90000 .. \x9FFFD] + [\xA0000 .. \xAFFFD] +
  [\xB0000 .. \xBFFFD] + [\xC0000 .. \xCFFFD] +
  [\xD0000 .. \xDFFFD] + [\xE1000 .. \xEFFFD]>
(from the IRI specification)

-- 
Aaron Sherman
Email or GTalk: a...@ajs.com
http://www.ajs.com/~ajs


Re: URI replacement pseudocode

2010-05-17 Thread Moritz Lenz
Hi,

Aaron Sherman wrote:
 Over the past week, I've been using my scant bits of nighttime coding to
 cobble together a pseudocode version of what I think the URI module should
 look like. There's already one available as example code, but it doesn't
 actually implement either the URI or IRI spec correctly. Instead, this
 approach uses a pluggable grammar so that you can:
 
   my URI $uri .= new( get_url_from_user(), :spec<IRI> )
 
 which would parse the given URL using the RFC3987 IRI grammar. By default,
 it will use RFC3986 to parse URIs, which does not implement the UCS
 extensions. It can even handle the legacy RFC2396 and regex-based RFC3986
 variations.
 
 Here's the code:
 
 https://docs.google.com/leaf?id=0B41eVYcoggK7YjdkMzVjODctMTAxMi00ZGE0LWE1OTAtZTg1MTY0Njk5YjY4hl=en

I think your code would benefit greatly from actually being run with
Rakudo (or at least the parts that are already implemented), as well
as from a version control system.

 So, my questions are:
 
 * Is this code doing anything that is explicitly not Perl 6ish?

Some things I've noticed:
* you put lots of subs into roles - you probably meant methods
* Don't inherit from roles, implement them with 'does'
* the grammars contain a mixture of tokens for parsing and of
methods/subs for data extraction; yet Perl 6 offers a nice way to
separate the two, in the form of action/reduction methods; your code
might benefit from them.
* class URI::GrammarType seems not very extensible... maybe keep a hash
of URI names that map to URIs, which can be extended by method calls?

 * Is this style of pluggable grammar the correct approach?

Looks good, from a first glance.

 * Should I hold off until R* to even try to convert this into working code?

No need for that. The support for grammars and roles is pretty good,
character classes and match objects are still a bit unstable/whacky.

 * What's the best way to write tests/package?

Every Perl 6 compiler comes with a Test.pm module, so use that. It
outputs TAP, so you can use the 'prove' command from Perl 5's TAP::Harness.
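
A test file along those lines might look like the sketch below. The Tiny::URI grammar is a hypothetical stand-in, not the real URI module, and the rule names are assumptions made only for illustration.

```raku
use Test;

# Hypothetical stand-in for the real URI grammar: scheme, colon, rest.
grammar Tiny::URI {
    token TOP    { <scheme> ':' <rest> }
    token scheme { <[ a..z A..Z ]> <[ a..z A..Z 0..9 \+ \. \- ]>* }
    token rest   { .* }
}

plan 3;

# .parse returns a Match on success and Nil on failure.
ok  Tiny::URI.parse('http://example.com/'), 'parses a simple URI';
is  ~Tiny::URI.parse('http://example.com/')<scheme>, 'http', 'extracts the scheme';
nok Tiny::URI.parse('://missing-scheme'), 'rejects a missing scheme';
```

Saved as, say, t/basic.t, this should be runnable with something like `prove -e raku t/`, assuming a current Rakudo where the executable is called raku.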

 * Am I correct in assuming that <...> in a regex is intended to allow the
 creation of interface roles for grammars?

You lost me here. <identifier(...)> calls a named rule (with arguments).
Could you rephrase your question?

 * I guessed wildly at how I should be invoking the match against a saved
 token reference:
 if $s ~~ m/^ <.$.spec.gtype.URI_reference> $/ {
   is that correct?

probably just $s ~~ /^ $regex $/;
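
For what it's worth, here is a small sketch of two ways this can be spelled on a current Rakudo: interpolating a regex stored in a variable, and asking a grammar to start at a rule other than TOP. The Demo grammar and its one-line URI_reference rule are placeholders, not the real RFC grammar.

```raku
# Hypothetical grammar; URI_reference here is only a placeholder rule.
grammar Demo {
    token TOP           { <URI_reference> }
    token URI_reference { \w+ ':' \S* }
}

my $s = 'http://example.com/';

# 1. Interpolate a regex object held in a variable.
my $regex = rx/ \w+ ':' \S* /;
say 'variable matched' if $s ~~ /^ <$regex> $/;

# 2. Start the grammar at a named rule instead of TOP.
say 'rule matched' if Demo.parse($s, :rule<URI_reference>);
```

The second form keeps the rule inside its grammar, so any subrules it calls are still found.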

 * Are implementations going to be OK with massive character classes like:
 <+[\xA0 .. \xD7FF] + [\xF900 .. \xFDCF] + [\xFDF0 .. \xFFEF] +
   [\x10000 .. \x1FFFD] + [\x20000 .. \x2FFFD] +
   [\x30000 .. \x3FFFD] + [\x40000 .. \x4FFFD] +
   [\x50000 .. \x5FFFD] + [\x60000 .. \x6FFFD] +
   [\x70000 .. \x7FFFD] + [\x80000 .. \x8FFFD] +
   [\x90000 .. \x9FFFD] + [\xA0000 .. \xAFFFD] +
   [\xB0000 .. \xBFFFD] + [\xC0000 .. \xCFFFD] +
   [\xD0000 .. \xDFFFD] + [\xE1000 .. \xEFFFD]>
 (from the IRI specification)

Funny thing, why does it exclude the FFFE and FFFF codepoints?
Anyway, I can't answer that question.

Cheers,
Moritz


Re: Fwd: URI replacement pseudocode

2010-05-17 Thread Moritz Lenz


Aaron Sherman wrote:
 Ooops, took this off-list by accident.
 
 -- Forwarded message --
 From: ajs a...@ajs.com
 Date: Mon, May 17, 2010 at 2:59 PM
 Subject: Re: URI replacement pseudocode
 To: Moritz Lenz mor...@faui2k3.org
 
 
 Thank you for your responses!
 
 On Mon, May 17, 2010 at 1:37 PM, Moritz Lenz mor...@faui2k3.org wrote:
 
 Aaron Sherman wrote:
  Here's the code:
 
 
 https://docs.google.com/leaf?id=0B41eVYcoggK7YjdkMzVjODctMTAxMi00ZGE0LWE1OTAtZTg1MTY0Njk5YjY4hl=en

 I think your code would benefit greatly from actually being run with
 Rakudo (or at least the parts that are already implemented), as well
 as from a version control system.

 
 (re: storage. yes, I intend to get this into something. not sure what, yet.
 git is preferred, I presume?)

Yes, but it's really your decision in the end.

 I had a hard time even getting basic code working like:
 
   token foo { blah }
   if "blah" ~~ m/<foo>/ { say "blah!" }
 
 (See my question to the list, last week)

Right. What works today is

grammar Foo {
   token TOP { <foo> }
   token foo { blah }
}

if Foo.parse('blah') {
   say "yes"
}

  So, my questions are:
 
  * Is this code doing anything that is explicitly not Perl 6ish?

 Some things I've noticed:
 * you put lots of subs into roles - you probably meant methods

 
 Well... that's a fair question. What does a method mean in a grammar? I
 wasn't too clear on what being a method of a grammar meant. Should I be
 calling these as class-methods?

a grammar is really just a class that inherits from Grammar. So the
answer is it means the same as in a class.

 
 * Don't inherit from roles, implement them with 'does'

 
 I did that, didn't I? Did I typo something?
 
grammar URI::rfc2396 does URI::Grammarish ...
 

and

grammarb URI::rfc3986_regex is URI::Grammarish

that's what I meant

 * the grammars contain a mixture of tokens for parsing and of
 methods/subs for data extraction; yet Perl 6 offers a nice way to
 separate the two, in the form of action/reduction methods; your code
 might benefit from them.

 
 Do you have a pointer for some discussion of this? I'd love to pursue it.


http://github.com/perl6/book/raw/master/src/grammars.pod

(that chapter still uses the outdated {*} rules - if you read about
them, ignore them, and instead know that the corresponding action method
is always called implicitly at the end of each named rule).
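
A minimal sketch of that separation, for illustration only: the KeyValue grammar does the parsing and the KeyValueActions class builds the data. Both names are made up here and have nothing to do with the URI pseudocode.

```raku
# Parsing only: no data extraction in the grammar itself.
grammar KeyValue {
    token TOP   { <pair>+ % ';' }
    token pair  { <key> '=' <value> }
    token key   { \w+ }
    token value { \w+ }
}

# Data extraction only: each method is called after the rule of the
# same name has matched, and attaches a result with make.
class KeyValueActions {
    method pair($/) { make Pair.new(~$<key>, ~$<value>) }
    method TOP($/)  { make %( $<pair>.map(*.made) ) }
}

my $match = KeyValue.parse('scheme=http;host=example',
                           :actions(KeyValueActions));
say $match.made<scheme>;
```

Swapping in a different actions class changes what gets built without touching the tokens, which is the main attraction for a pluggable-grammar design.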


 * class URI::GrammarType seems not very extensible... maybe keep a hash
 of URI names that map to URIs, which can be extended by method calls?

 
 The idea that I was working with was that you would provide the grammar
 itself when you wanted to do something custom, and the string names were
 just a convenience for the default cases.  So, for example:
 
   my URI $privatewww .= new("ajs://perl**6", :spec(::MyURI::Spec));

Fair enough.

  * Should I hold off until R* to even try to convert this into working
 code?

 No need for that. The support for grammars and roles is pretty good,
 character classes and match objects are still a bit unstable/whacky.

 
 Is there any collected wisdom available on this? I'd love to not run around
 chasing my own tail trying to figure out why something doesn't work.

it's called #perl6, and is our IRC channel :-)
Writing down such volatile information isn't very useful, because it
becomes outdated rather quickly.

  * Am I correct in assuming that <...> in a regex is intended to allow the
  creation of interface roles for grammars?

 You lost me here. <identifier(...)> calls a named rule (with arguments).
 Could you rephrase your question?
 
 
 Sure.
 
 All S05 says is "The <...>, <???>, and <!!!> special tokens have the same
 "not-defined-yet" meanings within regexes that the bare ellipses have in
 ordinary code." Which doesn't tell me a lot, but seems to imply that:
 
 role blah { token bletch { <...> } }
 
 is roughly analogous to:
 
 role blah { method bletch {...} }
 
 that is to say, the role should have an interface which, when applied to a
 grammar, would assert the presence of a bletch token. Am I reading too much
 into this?

Don't know...

 If yes, is there a way to assert role-based interfaces on
 grammars? The main reason I wanted this was for the very parametric grammar
 selection we were talking about, above, where the given block says:
 
 given $type {
 when .does(URI::Grammarish) { $.gtype = $_ }

 I'm assuming, of course, that I can make such assertions about a grammar in
 the same way that I would make them about a class. Is this true?

Yes. But you could just say

given $type {
when Grammar { $!gtype = $_ }
...
}

and accept any grammar there. Not as type-safe, but probably a good start.
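
The two checks can also be combined, most specific first. This is only a sketch: Grammarish below is an empty, hypothetical stand-in for URI::Grammarish, and kind-of is a made-up helper.

```raku
# Hypothetical marker role; the real URI::Grammarish would also declare
# the rules a URI grammar has to provide.
role Grammarish { }

grammar WithRole does Grammarish { token TOP { \w+ } }
grammar Plain                    { token TOP { \w+ } }

# Smartmatching a type object against a role or class is a type check,
# so this works on the grammars themselves, not on instances.
sub kind-of($type) {
    given $type {
        when Grammarish { 'does the role' }
        when Grammar    { 'some other grammar' }
        default         { 'not a grammar' }
    }
}

say kind-of(WithRole);
say kind-of(Plain);
say kind-of(Str);
```

That way a custom grammar that does the role gets the stricter treatment, while any other grammar is still accepted.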

 Have I
 identified an interface token/rule correctly given that that was my goal?
 
 
 * I guessed wildly at how I should be invoking the match against a saved
  token reference:
  if $s ~~ m/^ <.$.spec.gtype.URI_reference> $/ {
is that correct?

 probably just $s ~~ /^ $regex $/;

 
 But what should

Re: Fwd: URI replacement pseudocode

2010-05-17 Thread Aaron Sherman
On Mon, May 17, 2010 at 3:34 PM, Moritz Lenz mor...@faui2k3.org wrote:



 Aaron Sherman wrote:



  I had a hard time even getting basic code working like:
 
    token foo { blah }
    if "blah" ~~ m/<foo>/ { say "blah!" }
 
  (See my question to the list, last week)

 Right. What works today is

 grammar Foo {
   token TOP { <foo> }
   token foo { blah }
 }

 if Foo.parse('blah') {
   say "yes"
 }



I will do this. Thanks.

 * Don't inherit from roles, implement them with 'does'
 
 
  I did that, didn't I? Did I typo something?
 
 grammar URI::rfc2396 does URI::Grammarish ...
 

 and

 grammarb URI::rfc3986_regex is URI::Grammarish

 that's what I meant


That's a double typo (grammarb and is). I'll fix that in the version I put
up after this discussion.

 it's called #perl6, and is our IRC channel :-)
 Writing down such volatile information isn't very useful, because it
 becomes outdated rather quickly.


I used to be active in #perl6. I'll try to jump back in.

I'm noting the rest of what you said and moving forward with the changes. It
all sounds much more reasonable than I feared it would be.

-- 
Aaron Sherman
Email or GTalk: a...@ajs.com
http://www.ajs.com/~ajs


replacement of $

2006-04-01 Thread Larry Wall


Recently I had time to think about the $ symbol we use in Perl.

I think Perl has been using the USD symbol for too long, and I'm now sure
that it's time to replace it. After some research I came to the conclusion 
that the best fit is the euro symbol (€).

So, spread the word, Perl 6 will require you to replace all the $ in your 
scripts with €. That's just a regex after all...




Re: replacement of $

2006-04-01 Thread Darren Duncan

At 15:04 -0800 1/4/06, Larry Wall wrote:

Recently I had time to think about the $ symbol we use in Perl.

I think Perl has been using the USD symbol for too long, and I'm now sure
that it's time to replace it. After some research I came to the conclusion
that the best fit is the euro symbol (€).

So, spread the word, Perl 6 will require you to replace all the $ in your
scripts with €. That's just a regex after all...


But $ isn't specifically a USD symbol.  It's used by Canada and Australia
too, if not more places.  It's multi-country like the Euro is.


Perhaps what we need is a more universal currency.  I suggest gold.

So every relevant symbol name could start with 'Au' instead of '$', and an
advantage of this is that it is still easy to type on any keyboard.


-- Darren Duncan


Re: replacement of $

2006-04-01 Thread Larry Wall
On Sat, Apr 01, 2006 at 03:11:27PM -0800, Larry Wall wrote:
: 
: 
: Recently I had time to think about the $ symbol we use in Perl.
: 
: I think Perl has been using the USD symbol for too long, and I'm now sure
: that it's time to replace it. After some research I came to the conclusion 
: that the best fit is the euro symbol (€).
: 
: So, spread the word, Perl 6 will require you to replace all the $ in your 
: scripts with €. That's just a regex after all...

Hmm, like anyone's going to believe you...

Anyone can forge Larry-like email.  Look how easy it is for *me* to
forge email from Larry even though I'm in Japan right now.  All you
have to do is end most of your paragraphs with ... and throw in a
few Hmms here and there

And then there's all these checkins I've been forging from Audrey.
Piece o' cake...

Now, gettin' myself made up to look like Larry for the Bugs Manifesto,
that was a wee bit more challenging, but I think most people were fooled...

TomTiady, AKA 落第の駱駝, AKA Larry Boy (the tsukemono, not the pickle)


Re: replacement of $

2006-04-01 Thread Yuval Kogman
On Sun, Apr 02, 2006 at 02:04:07 +0300, Larry Wall wrote:
^^^-- (actually that was IDT in the headers)

 Hi,
 I'm in Israel and Japan at the same time!

Nice one though ;-)

<plug>If you guys would have participated in the keysigning
parties...</plug>

-- 
  Yuval Kogman [EMAIL PROTECTED]
http://nothingmuch.woobling.org  0xEBD27418


