Perl 6 Rules implementations in Haskell and Perl 6.

A 20-minutes hack had paid off handsomely: In 33 lines, we have a fast, pure Haskell implementation of basic Perl 6 Rules support. Adding support for each new construct should take only 2 lines -- one at the OpTable precedence parser below to recognize it, and one to compile it to the underlying PArrows library.

(As an aside, I'm glad to have finally grokked what an Arrow is. As usual with lambdafolk technologies such as MonadsActions, it is very simple yet tricky to convey, and could really use a better name like Flow.)

Of course, the real work is done somewhere else. Aside from using PArrows for the heavy lifting in top-down parsing, we have the bottom-up OpTable.hs precedence parser (a straight port of Parrot's PGE::OPTable). This is my first successful use of GHC's implicit parameters, which are exactly like "env" variables as recently specced in the lexical syntax spec (S02), allowing a much faster and concise implementation than Reader monads.

Currently the OpTable port covers all features found in the PIR base code, and I plan to add list/chain-associative infix operators to it, so we can  perform interesting optimizations like turning /cat|car|cow/ into /c[a[t|r]|ow]/.

Much kudos to Patrick Michaud for doing all the design work for Rules and OpTable, so this coding monkey can just plunge ahead without any head-banging against the wall.

Luqui is also doing well in implementing both libraries in Perl 6, using a role-based composition architecture to handle multiple input types and evaluation strategies.  More details is available in its architecture document.

With luck, we can ship 6.28.0 with this Haskell-native Rule support, without having PGE (and thus Parrot) as a dependency. We'd still use PGE when compiling to Parrot, and rely on luqui's Parse::Rule when targetting _javascript_ and Perl 5. PArrows can compile to _javascript_ natively, which may speed up things a bit...  We'll see. :-)

Second day of Text.Parser.Rule.

I managed to transcribe most of PGE::P6Rule to Text.Parser.Rule, so about 80% of rule syntax is now parsed correctly into a Rule tree. Instead of using GADTs, I went for a more conventional cascading data type approach, with five levels of rule nodes:

(Alternation > Conjunction > Concat > Quantified > Term)

Moreover, list-associative infix support landed in OpTable parser, so /(a|b|c)/ is now parsed as alternation of a three-element list, instead of (a|(b|c)) in the current PGE implementation.

Multipositional error reporting.

It's past midnight here, so I'll let the commit log speak for itself:

$ ghc --make -main-is Text.Parser.Rule.main \
      -isrc -Isrc/cbits
src/cbits/*o \
 src/Text/Parser/Rule.hs
$ ./a.out '.abc|def' 'xyz'
Expecting: 'def' at line 1, column 1
       or: 'abc' at line 1, column 2

I have rewritten half of PArrow for this.  The neat thing is that if it matches at any given point, none of the expensive error collection would take place, so there should be no severe performance drawbacks.

Also, the returned match object is now compatible with PGE's structure, so hooking it back to Pugs is trivial.

Day 4 of Text.Parser.Rule.

Today I've rewritten PArrow's primitive combinators to support proper backtracking and non-greedy matching, as well as subrule support.  This now works correctly:

miniLang :: Grammar
miniLang = grammar
    [ "literal" ~:~ "nil | true | false"
    , "call"    ~:~ "\\. <method>"
    , "method"  ~:~ "\\w+"
    , "expr"    ~:~ "<literal> <call>*"
    ]

ghci> "nil.foo.bar" ~~ miniLang .<> "expr"
Right (MatchObj
    { matchString = "nil.foo.bar"
    , matchSubPos = fromList []
    , matchSubNam =
        { "call" := MatchSeq (fromList
            [ MatchStr "."
            , MatchNam "method" (MatchStr "bar")
            ])
        , "literal" := MatchStr "nil"
        }
    })

Both named and positional captures are supported, as well as negated and noncapturing variants (<!name> and <?name> respectively).  This seems sufficient for porting the surface language parser of PILN and PIL2 from Parsec; tewk already volunteered to give it a try.

If it turns out that we need more features (e.g. lookaround, integration with OpTable, callbacks and backreferences), I'll definitely hack them in. :-)

Mixing Rules with OpTable.

As Patrick and Larry had observed, Perl 6 is better parsed with a combination of top-down (Rules) and bottom-up (OpTable) parsers. To reach this goal, today I hacked in Dynamic Rules support to Text.Parser.Rule, so we can define a rule term with any function that can parse a string into anything:

data DynamicTerm = MkDynamicTerm
    { dynLabel :: Str
    , dynTerm  :: Str -> Maybe (Dynamic, Str)
    }
    deriving (Data, Typeable)

As a concrete example, here is a grammar that parses a named rule declaration:

ruleGrammar :: Grammar
ruleGrammar = grammar
    [ "ruleDecl" ~:~ "rule \\s+ (\\w+) \\s* \\{ <ruleBody> \\}"
    , "ruleBody" ~&~ ruleTable
    ]

And the ruleTable is defined as an OpTable:

ruleTable :: OpTable Rule
ruleTable = mkOpTable
    [ noWs (op _Lit  Term wsLiteral)
   ++ noWs (op (_Group CapturePos)   Circumfix "( )")
   ++ noWs (op (_Group NonCapture)   Circumfix "[ ]")
   -- ...and a lot more...
    ]

Because of the flexible Dynamic return type, the term derived from OpTable doesn't have to create a MatchObj structure, but can simply return the matched Rule. We can also hook Parsec and parser methods defined in Perl 6 to this engine -- it's very likely to happen during the development towards 6.281, if not sooner.

In other news, tewk finished porting the PILN minilang parser to Rules. It does not run yet, because I hadn't got around to add enumerated character class support, but there's always tomorrow... :-)


Attachment: signature.asc
Description: OpenPGP digital signature

Reply via email to