I took what I think you were trying to accomplish (at least to some degree) and tried to develop a state machine based on the specifications as I understood them. I did this partly to help me understand Ragel a little better, so I hope you don't mind that I didn't use much of what you provided in your original email. Some notes related to your email, though:
- I don't think you want a scanner for what you are trying to accomplish. A scanner spits out things like "number" or "string" or "operator" without regard to how those things are put together. I think you want something that understands the structure of what can happen where. For example, udp:foo = true "sets" udp:foo equal to true, but a scanner might kick out "identifier" for udp:foo, "operator" or "equals" for '=', and "keyword" or "identifier" for 'true'.
- I think you might be able to accomplish some of what you were intending (even with a scanner) by using fgoto instead of fcall, although I'm not entirely sure, as I didn't fully grasp your code.

The following is in C (not C++), but I think it should be easy to follow. Note that the last section of my main() (checking for errors) was very helpful in letting me know when I had screwed something up (e.g., forgot a specific char in a machine). All I'm doing is printing stuff out, but you could adapt it to your needs. Hope this helps.

PS: You may want to read http://zedshaw.com/essays/ragel_state_charts.html; I also found it very helpful in getting started with Ragel.

#include <stdio.h>
#include <string.h>

%%{
  machine sas_scanner;

  # Each action prints the text between 'start' and the current
  # character, then moves 'start' just past the current character.
  # The (int) casts are needed because %.*s takes an int precision.
  action init  { printf("group: "); start = fpc+1; }
  action args  { printf("call: %.*s\n", (int)(fpc-start), start); start = fpc+1; }
  action pr    { printf("%.*s\n", (int)(fpc-start), start); start = fpc+1; }
  action kwd   { printf(" %.*s = ", (int)(fpc-start), start); start = fpc+1; }
  action nl    { printf("\n"); start = fpc+1; }
  action reset { start = fpc; }
  action chain { printf("- Chained call -\n"); start = fpc+1; }
  action prset { printf(" Set: %.*s ->", (int)(fpc-start), start); start = fpc+1; }

  main := (
    start: (
      "group " @init -> group_name
    ),
    group_name: (
      alpha+ -> group_name |
      " " @pr -> group_name |
      "{" -> details
    ),
    details: (
      '(' @args -> arguments |
      [:.] -> details |
      '>' @chain -> details |
      alpha+ -> details |
      '\n' @reset -> details |
      ';' @reset -> details |
      '=' @prset -> set |
      ' ' -> details |
      digit+ -> details |
      '}' -> final
    ),
    arguments: (
      ',' @pr -> arguments |
      alpha+ -> arguments |
      ':' @kwd -> arguments |
      ' ' -> arguments |
      digit+ -> arguments |
      ')' @pr -> details
    ),
    set: (
      alpha+ -> set |
      ' ' -> set |
      ';' @pr -> details
    )
  );
}%%

%% write data;

int main(void)
{
  /* The sample config from your email. */
  const char *to_parse =
    "group MyGroup {\n"
    " tcpclient( host: foo, port: 49152 );\n"
    " udp( host: bar, port: 49152 ) > tcpserver( port: 11111 );\n"
    " udp:foo:49152.nonblocking = true;\n"
    "}";
  int cs, act;
  const char *p = to_parse;
  const char *pe = to_parse + strlen(to_parse);
  const char *start;
  const char *end;

  %% write init;
  %% write exec;

  /* Report where parsing stopped if the machine hit its error state. */
  if (cs == sas_scanner_error) {
    printf("Error parsing @ %s\n", p);
  }
  return 0;
}

---------------------
Kevin T. Ryan
http://blog.gridmule.com/

On Fri, Aug 5, 2011 at 9:13 AM, 𝄆 Rob Harris 𝄇 <[email protected]> wrote:
>
> All, help. I've R'd TFM all week trying to figure this out, but am still
> confused (so please pardon the potential n00bness.)
>
> I have to parse a config file for an app I'm working on, whose format is
> basically of the format:
> group MyGroup {
>   tcpclient( host: foo, port: 49152 );
>   udp( host: bar, port: 49152 ) > tcpserver( port: 11111 );
>   udp:foo:49152.nonblocking = true;
> }
>
> From what I've read on the Intertubes, it seems that the SOP for processing
> this is to define a main := which will match a particular line of the text
> and then upon matching call a another machine to "scan" the message.
> However, I'm not sure how to do that because it seem that regardless of
> whether I define main as a matcher or a scanner, executing the parser always
> seems to consume the text as it matches. For instance, when I parse the
> group definition, I can simply match on the word "group" and then pass the
> rest of the line (up to the {) in to the scanner and I can get 'MyGroup' out
> relatively easily.
> However, when I try to parse the first encapsulated line,
> I don't know whether I'm dealing with a string of the first line form or
> third line form (or if the command is "chained" as in the second line) until
> I've done a kleene star match of the entire line (up to the ;) at which
> point it seems that the parser has already consumed the entire line and when
> I pass it into a scanner the pointers are already at the next line. Do I
> need the store the starting pointer before the first main scan (and if so,
> how?) and then how would I tell the downstream scanner where to start? I
> thought of making a number of nested c++ "parser objects" but that just seem
> inherently wrong.
>
> Below is what I've written so far--just enough to hopefully pass the first
> two cases. Again, I don't know if I'm only a character or so off or if my
> mindset is completely off. Any help would be appreciated.
>
> --
> Rob Harris
> Technological Pragmatist
> rob period harris shift-2 gmail decimal-point com
> "The universe tends towards maximum irony." --Jamie Zawinsky
>
> %%{
>   machine sas_scanner;
>   ml_comment = '/*' ( any )* :>> '*/';
>   sl_comment = '//' [^\n]* '\n';
>   comment = ml_comment | sl_comment;
>   wspace = comment | space+ ;
>   integer = [0-9]*;
>   float = [0-9]* '.' [0-9]*;
>   identifier = [a-zA-Z][a-zA-Z0-9]*;
>   fqsm = [a-zA-Z] ( [a-zA-Z0-9:][a-zA-Z0-9_] )*;
>   sqstring = '\'' [^\n]* :>> '\'';
>   dqstring = '\"' [^\n]* :>> '\"';
>   strvalue = ( integer | float | identifier | sqstring | dqstring );
>   action DEBUG { fprintf( stderr, "state: %4d, char: %c\n", cs, *p ); }
>   action RESET { reset(); }
>   action CRLF { std::cout << std::endl << std::endl; }
>   action NAME { m_name.append( 1, fc ); }
>   action KEY { m_key.append( 1, fc ); }
>   action VAL { m_val.append( 1, fc ); }
>   action QKV
>   {
>     printf( "[%s]=>[%s]\n", m_key.c_str(), m_val.c_str());
>     m_kvMap[ m_key ] = m_val;
>     m_key.clear();
>     m_val.clear();
>   }
>   action SNAME { printf( "NAME: [%s]\n", m_name.c_str() ); }
>   kvpair = ( identifier space* ':' space* strvalue );
>   kvlist = ( space+ | kvpair | ',' space+ kvpair );
>   instantiation = ( identifier '(' kvlist* ')' );
>
>   instantiation_chain = (
>     instantiation $NAME ( space* '>' space* instantiation )*
>   ) $NAME >RESET ';' @SNAME;
>
>   inst_chain_scanner :=
>   |*
>     space+;
>     identifier => { diff(); };
>     strvalue => { diff(); };
>   *|;
>
>   group_name = ( 'g' 'r' 'o' 'u' 'p' );
>   group_id = ( identifier - group_name ) @NAME;
>   group_line = ( group_name space+ group_id :>> space* '{' );
>
>   group_scanner :=
>   |*
>     space+ => { m_name.clear(); };
>     group_name;
>     group_id => { printf( ">> %s\n", m_name.c_str() ); };
>     '{' => { fret; };
>   *|;
>
>   main :=
>   |*
>     wspace+;
>     group_name => { fcall group_scanner; };
>     instantiation_chain => { fcall inst_chain_scanner; };
>   *|;
>
> _______________________________________________
> ragel-users mailing list
> [email protected]
> http://www.complang.org/mailman/listinfo/ragel-users
