Something like this might be what you want. It depends on how malformed tags are to be dealt with. This machine just treats them as plain text. They get broken down into separate tokens, but for most applications that's not a problem.

    main := |*
        ( [^{]+ | '{' );
        '{{' lower+ '}}';
        0;
    *|;
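To see how such a scanner would split its input, here is a rough plain-Ruby emulation (not Ragel output). The `tokenize` helper and the `[type, text]` pair format are assumptions made for illustration, and the NUL terminator pattern is left out. Ordered regex alternatives stand in for the longest-match rule, which should be equivalent here because only the tag pattern and the lone '{' can start at a '{', and the tag is tried first.

```ruby
require 'strscan'

# Hypothetical helper that emulates the scanner patterns in plain Ruby.
# Well-formed tags become :tag tokens; everything else, including the
# pieces of a malformed tag, comes back as :static text.
def tokenize(data)
  scanner = StringScanner.new(data)
  results = []
  until scanner.eos?
    if (text = scanner.scan(/\{\{[a-z]+\}\}/))
      results << [:tag, text]
    elsif (text = scanner.scan(/[^{]+|\{/))
      # Either a run of non-brace characters or a single stray '{'.
      results << [:static, text]
    end
  end
  results
end
```

With this sketch a malformed tag such as `{{B}}` comes back as three static tokens: `{`, `{` and `B}}`.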

-Adrian

On 10-07-28 09:40 PM, Tobias Lütke wrote:
Thanks Alex,

I adapted the code to follow your clever example. This almost works;
however, after the any* scanner rule runs, p has advanced all the way
to the end of {{, so the other rule does not match the tag correctly.
Here is my current machine:


       machine parser;

       action start      { tokstart = p; }
       action on_tag     { results << [:tag,    data[tokstart..p]] }
       action on_static  { results << [:static, data[tokstart..p]] }

       tag  = '{{' lower+ '}}'>start @on_tag;
       html = (any* -- '{{')>start @on_static;
       EOF = 0;

       main := |*
         tag;
         html;
         EOF;
       *|;

Regards
-- tobi



On Tue, Jul 27, 2010 at 9:52 PM, Laslavic, Alex wrote:
I'm actually working on a similar sounding task.

Try the strong subtraction operator (untested):

main := |*
   '[[' lower+ ']]' => action;
   ( any* -- '[[' ) => action;
*|;


( any* -- '[[' ) will match the longest possible string that doesn't have
'[[' as a substring.
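
A small plain-Ruby sketch of what that expression can match from a given position (the helper name `longest_without` is made up for illustration, and this is not generated scanner code). One thing it makes visible: the longest string without the delimiter as a substring can still end with the delimiter's first character, which matters when the pattern sits next to a tag rule in a scanner.

```ruby
# Hypothetical helper: the longest prefix of data, beginning at start,
# that does not contain delim as a substring.
def longest_without(data, start, delim)
  idx = data.index(delim, start)
  # No delimiter ahead: everything up to the end of the input qualifies.
  return data[start..-1] if idx.nil?
  # All but the last character of the first delimiter occurrence can be
  # included before the delimiter becomes a complete substring.
  stop = idx + delim.length - 1
  data[start...stop]
end
```

For the input `abc[[foo]]` and the delimiter `[[`, this returns `abc[`, so the static text takes the first bracket with it.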

-----Original Message-----
From: [email protected] on behalf of Tobias Lütke
Sent: Tue 7/27/2010 6:54 PM
To: [email protected]
Subject: Re: [ragel-users] Parsing a template language

Depends on the answers in this thread I suppose :-)



On Tue, Jul 27, 2010 at 3:42 AM, Magnus Holm wrote:
(A little off-topic, but whatever:

So Liquid will finally get a proper parser? :-))

// Magnus Holm



On Tue, Jul 27, 2010 at 03:15, Tobias Lütke wrote:
I've been working on a parser for a simple template language. I'm using
Ragel.

The requirements are modest. I'm trying to find [[tags]] that can be
embedded anywhere in the input string.

I'm trying to parse a simple template language, something that can
have tags such as {{foo}} embedded within HTML. I tried several
approaches to parse this but had to resort to a Ragel scanner with the
inefficient approach of matching only a single character as a
"catch all". I feel this is the wrong way to go about this. I'm
essentially abusing the longest-match bias of the scanner to implement
my default rule (it can only be 1 char long, so it should always be
the last resort).

%%{

  machine parser;

  action start      { tokstart = p; }
  action on_tag     { results << [:tag, data[tokstart..p]] }
  action on_static  { results << [:static, data[p..p]] }

  tag  = ('[[' lower+ ']]')>start @on_tag;

  main := |*
    tag;
    any => on_static;
  *|;

}%%

(Actions are written in Ruby, but should be easy to understand.)

How would you go about writing a parser for such a simple language? Is
Ragel maybe not the right tool? It seems you have to fight Ragel tooth
and nail if the syntax is unpredictable such as this.
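
If the one-character catch-all is kept, one way to soften its effect is to merge adjacent static tokens after the scan. A minimal sketch, assuming `results` is the array of `[type, text]` pairs built by the actions above; the `merge_static` helper is hypothetical and only tidies the output, it does not make the scanner itself any faster.

```ruby
# Hypothetical post-processing step: collapse runs of adjacent :static
# tokens into one token each, leaving :tag tokens untouched.
def merge_static(results)
  results.each_with_object([]) do |(type, text), merged|
    if type == :static && !merged.empty? && merged.last[0] == :static
      # Extend the previous static token instead of adding a new one.
      merged.last[1] << text
    else
      # Copy the text so the caller's strings are never mutated.
      merged << [type, text.dup]
    end
  end
end
```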


Regards
-- tobi

_______________________________________________
ragel-users mailing list
[email protected]
http://www.complang.org/mailman/listinfo/ragel-users

