Apparently I don't know how to use my own tool! Let's try this again, this time not so rushed on my part :)

        action le {}
        foo = 'hello' $^le;
        main := (
                any* |
                foo
        );

Local error actions are local to the named machine they are embedded in, not to the enclosing parentheses. That was the mistake I made in my rushed reply.
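For anyone following along, here is a hand-written C model of the corrected example. It is not Ragel's generated code, and the name count_le is made up for illustration. It assumes le runs on whichever character first kills the 'hello' branch, including any character after a complete 'hello' (since `$` covers final states too), while the any* branch keeps the machine accepting. EOF handling is not modelled.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical hand-written model (not Ragel output) of:
 *     action le {}
 *     foo = 'hello' $^le;
 *     main := ( any* | foo );
 * Assumption: $^le is resolved when the named machine foo is complete,
 * so every error transition of 'hello' carries le; the union with any*
 * then keeps the overall machine alive. Returns how many times le
 * would run. EOF handling is not modelled. */
int count_le(const char *input)
{
    static const char word[] = "hello";
    int pos = 0;      /* chars of "hello" matched so far; -1 once foo is dead */
    int le_fired = 0;

    assert(input != NULL);
    for (const char *p = input; *p != '\0'; p++) {
        if (pos < 0)
            continue;                  /* only the any* branch remains */
        if (word[pos] != '\0' && *p == word[pos]) {
            pos++;                     /* still inside 'hello' */
        } else {
            le_fired++;                /* error transition of foo: run le */
            pos = -1;                  /* foo dies, any* keeps accepting */
        }
    }
    return le_fired;
}
```

For example, count_le("help") reports one call (on the 'p'), while count_le("hello") reports none.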

Thanks,
 Adrian

On 11-02-02 02:17 PM, Murray Henderson wrote:
Hi Adrian,

Thanks for taking an interest :-).


As far as I can tell,

  main = (
           ('HELLO ' $^parse_error) 'WORLD' |
           any*
        );

and

  main = (
           ('HELLO ' $!parse_error) 'WORLD' |
           any*
        );

are equivalent to


  main = any*;




Anyway, the real machine I am trying to build currently looks like this:


doctype_single_quoted_value = (
     "'" ([^>]*)
         >start_token_value
         %end_token
     :>>  "'"
);

doctype_double_quoted_value = (
     '"' ([^>]*)
         >start_token_value
         %end_token
     :>>  '"'
);

doctype_quoted_value = (doctype_single_quoted_value |
                        doctype_double_quoted_value);

doctype_name = (
     space+ (any - ('>' | space))+
         >start_token_doctype_name
         %end_token
);

doctype_public = space+ 'PUBLIC' %token_doctype_public
                 space+ doctype_quoted_value;

doctype_system = space+ 'SYSTEM' %token_doctype_system
                 space+ doctype_quoted_value;

doctype = (
     '<!DOCTYPE' %token_doctype space*
         (doctype_name doctype_public? doctype_system?)? space* '>'
);
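To see which inputs the strict machine rejects, here is a rough C recognizer written by hand to follow the same grammar. It is only a sketch: the function names are hypothetical, no token actions run, and the whole string must be a single doctype ending at '>'. `isspace` in the C locale covers the same characters as Ragel's `space`.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical hand-written counterpart of the strict `doctype` machine;
 * it only reports accept/reject and does not extract tokens. */

static size_t skip_spaces(const char **p)
{
    size_t n = 0;
    while (**p != '\0' && isspace((unsigned char)**p)) { (*p)++; n++; }
    return n;
}

static bool match_word(const char **p, const char *w)
{
    size_t len = strlen(w);
    if (strncmp(*p, w, len) != 0) return false;
    *p += len;
    return true;
}

/* doctype_quoted_value: opening quote, then anything except '>' up to
 * the first matching quote (the effect of `:>>`). */
static bool match_quoted(const char **p)
{
    char q = **p;
    if (q != '"' && q != '\'') return false;
    const char *s = *p + 1;
    while (*s != '\0' && *s != q) {
        if (*s == '>') return false;
        s++;
    }
    if (*s != q) return false;
    *p = s + 1;
    return true;
}

/* space+ KEYWORD space+ doctype_quoted_value; leaves *p untouched on
 * failure so the part stays optional. */
static bool match_keyword_value(const char **p, const char *kw)
{
    const char *s = *p;
    if (skip_spaces(&s) == 0 || !match_word(&s, kw) ||
        skip_spaces(&s) == 0 || !match_quoted(&s))
        return false;
    *p = s;
    return true;
}

bool doctype_is_well_formed(const char *s)
{
    assert(s != NULL);
    if (!match_word(&s, "<!DOCTYPE"))
        return false;
    size_t n = skip_spaces(&s);
    if (*s != '>') {
        /* doctype_name needs at least one leading space */
        if (n == 0 || *s == '\0')
            return false;
        while (*s != '\0' && *s != '>' && !isspace((unsigned char)*s))
            s++;
        match_keyword_value(&s, "PUBLIC");   /* doctype_public? */
        match_keyword_value(&s, "SYSTEM");   /* doctype_system? */
        skip_spaces(&s);
    }
    return *s == '>' && s[1] == '\0';
}
```

For instance, doctype_is_well_formed("<!DOCTYPEhtml>") returns false because doctype_name needs a leading space, which is exactly the kind of malformed doctype a permissive parser has to absorb.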



This machine looks about right (in the FSM diagram) except that it
doesn't handle malformed doctypes.

With the $^^ operator I described, I imagine the machine would look
like this (given a parse error action, pe):



doctype = (
     '<!DOCTYPE' %token_doctype space*
         ((doctype_name doctype_public? doctype_system?) $^^pe)?
         space* <: ([^>]+ >pe)? '>'
);


Additionally, I think I might be able to use that imaginary operator
to make whitespace optional (though with a parse error if the
whitespace is omitted):

e.g.:

omittable_space = space+ >^^pe;
doctype_public = omittable_space 'PUBLIC' %token_doctype_public
                 omittable_space doctype_quoted_value;
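A hand-written C sketch of what omittable_space is meant to do, assuming `>^^` fires pe only from the start state and then hands the current character to the next machine. The helper names are hypothetical, and only the opening quote of the value is checked.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical model of `omittable_space = space+ >^^pe;`: whitespace is
 * consumed as usual, but if none is present, pe fires and the current
 * character is left for the next machine instead of being an error. */
static void omittable_space(const char **p, int *pe_count)
{
    if (!isspace((unsigned char)**p)) {
        (*pe_count)++;                 /* start-state gap: run pe, move on */
        return;
    }
    while (isspace((unsigned char)**p))
        (*p)++;
}

/* omittable_space 'PUBLIC' omittable_space, followed by an opening quote.
 * Returns false only for input even the lenient rule cannot accept. */
bool lenient_public_prefix(const char *s, int *pe_count)
{
    assert(s != NULL && pe_count != NULL);
    *pe_count = 0;
    omittable_space(&s, pe_count);
    if (strncmp(s, "PUBLIC", 6) != 0)
        return false;
    s += 6;
    omittable_space(&s, pe_count);
    return *s == '"' || *s == '\'';
}
```

With this model, lenient_public_prefix("PUBLIC\"x\"", &pe) accepts with pe == 2, one for each missing run of whitespace.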




I will be using this machine inside multiple scanners, so goto-based
error recovery would be a pain. Default actions that transition to the
final state seem like a handy feature for any permissive parser
(although I realize I am doing something extreme).

I am still thinking about attempting to patch Ragel. It is much more
complicated than I thought it would be, but it can't hurt for me to give
it a crack.


Still absolutely nowhere near finished, but my work is progressing slowly ;-).
https://github.com/murrayh/html5rl/blob/master/html5_grammar.rl


Cheers,
Murray


On Tue, Feb 1, 2011 at 5:16 PM, Adrian Thurston <[email protected]> wrote:
Hi, does this do what you want?

main = (
          ('HELLO ' $^parse_error) 'WORLD' |
          any*
       );

I'm not sure how that fits into your overall plan. Try it out and we'll
discuss further.

Regards,
  Adrian

On 11-01-31 03:50 PM, Murray Henderson wrote:

Hello,

Both local and global error actions transition to the error state. I
am using Ragel 6.5. I can try with 6.6 when I get home.

I made a quick example (based on S. Geist's example):

http://pastebin.com/06ihRxQg

Example output:

HELLO WORLD
read: HELLO WORLD
len: 12, state: 12
HELWORLD
parse error
read: HEL
len: 3, state: 0


Cheers,
Murray


On Tue, Feb 1, 2011 at 10:02 AM, Adrian Thurston <[email protected]> wrote:

Local error actions don't. Sorry, I should have suggested just those.

On 11-01-31 02:58 PM, Murray Henderson wrote:

Hello,

Local and global error actions transition to the error state.

I want DEF to transition to the next machine (i.e. behave like a final
state), not the error state.

The parser I am writing is permissive: all input must be accepted (I
never want to go to the error state).

I do not wish to use manual goto recovery, because the parser is large
and complex; such manual tracking is a lot of work and error-prone.

Cheers,
Murray



On Tue, Feb 1, 2011 at 4:58 AM, Adrian Thurston <[email protected]> wrote:

Hi, have you looked at Ragel's local and global error actions yet? These
may do what you want.

-Adrian

On 11-01-26 08:08 PM, Murray Henderson wrote:

Hello,

I want to embed a default action into a machine that leaves the
machine (without using a manual jump inside the action).

For simplicity's sake, I will call this operator $^^ (since it is
similar to the Local Error operator).


Example:

action parse_error {}
helloworld = ('HELLO ' $^^parse_error) 'WORLD';

Non-error inputs include:
HELLO WORLD
HELLOWORLD (parse_error action occurs on 'O' -> 'W' transition)
HELLWORLD (parse_error action occurs on 'L' -> 'W' transition)
HELWORLD (parse_error action occurs on 'L' -> 'W' transition)
HEWORLD (parse_error action occurs on 'E' -> 'W' transition)
HWORLD (parse_error action occurs on 'H' -> 'W' transition)
WORLD (parse_error action occurs on the initial -> 'W' transition)


I can simulate the above behavior with the '?' operator, but that is
laborious, and there are other ways of using $^^ that I suspect cannot
be simulated.
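To pin down the intended behaviour, here is a hand-written C model of ('HELLO ' $^^parse_error) 'WORLD'. It is hypothetical, not generated by Ragel. On a character that does not fit 'HELLO ', it runs parse_error, treats the prefix as finished and lets 'WORLD' consume that same character, which reproduces the list of inputs above. What should happen in the final state of 'HELLO ' itself is not covered by those examples, so the sketch simply hands over to 'WORLD' there.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of ('HELLO ' $^^parse_error) 'WORLD': when a
 * character does not fit 'HELLO ', parse_error runs, the prefix is
 * treated as finished, and the same character is handed to 'WORLD'.
 * Returns true if the input is accepted; *errors counts parse_error. */
bool helloworld(const char *input, int *errors)
{
    static const char prefix[] = "HELLO ";
    static const char suffix[] = "WORLD";
    size_t i = 0;               /* position in prefix */
    size_t j = 0;               /* position in suffix */
    bool in_prefix = true;

    assert(input != NULL && errors != NULL);
    *errors = 0;
    for (const char *p = input; *p != '\0'; p++) {
        if (in_prefix) {
            if (*p == prefix[i]) {
                if (prefix[++i] == '\0')
                    in_prefix = false;   /* reached the final state of 'HELLO ' */
                continue;
            }
            (*errors)++;                 /* default ($^^) transition: run parse_error */
            in_prefix = false;           /* jump to the final state of 'HELLO ' ... */
        }
        /* ... and let 'WORLD' consume the current character */
        if (suffix[j] == '\0' || *p != suffix[j])
            return false;                /* ordinary error: 'WORLD' does not match */
        j++;
    }
    return !in_prefix && suffix[j] == '\0';
}
```

An input such as HELXWORLD is still rejected: after parse_error runs, 'X' is not a valid start for 'WORLD'.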


I want this operator because I am trying to make a liberal parser that
accepts all possible input (every state must have a default action).
I am creating an HTML5 parser that uses regular machines for
tokenizing, and scanners built from the regular machines for parsing.
Yes, I am mad.

I cannot use manual jumps, because I don't want to jump out of the
scanners mid-token.


I am willing to try and add this operator into Ragel myself. I have
grabbed the source code and tracked my way to fsmap.cpp, where the new
operator would be added.

Before I continue...
Is there already a way to achieve my desired behavior that I am not
aware of?
Would such an operator be worthwhile? Is it even possible?
Is there any knowledge that could be imparted that would help me make
a patch?

If I do end up making a patch, for symmetry purposes I will make
global/local and start/any/final etc. versions of the operator.

After a brief look through the source, it looks like I would need to
modify the FsmAp::fillGaps() function, passing a final state (a
separate object for each?) into FsmAp::attachNewTrans() instead of NULL.

Ragel is a wonderful program by the way, thank you for creating it.

Cheers,
Murray

_______________________________________________
ragel-users mailing list
[email protected]
http://www.complang.org/mailman/listinfo/ragel-users


_______________________________________________
ragel-users mailing list
[email protected]
http://www.complang.org/mailman/listinfo/ragel-users


_______________________________________________
ragel-users mailing list
[email protected]
http://www.complang.org/mailman/listinfo/ragel-users

_______________________________________________
ragel-users mailing list
[email protected]
http://www.complang.org/mailman/listinfo/ragel-users


_______________________________________________
ragel-users mailing list
[email protected]
http://www.complang.org/mailman/listinfo/ragel-users

_______________________________________________
ragel-users mailing list
[email protected]
http://www.complang.org/mailman/listinfo/ragel-users


_______________________________________________
ragel-users mailing list
[email protected]
http://www.complang.org/mailman/listinfo/ragel-users

_______________________________________________
ragel-users mailing list
[email protected]
http://www.complang.org/mailman/listinfo/ragel-users

Reply via email to