Hi Rajesh,

Inside a parser rule, `~` negates tokens, not characters. So if no lexer rule produces a token for '%', '^' or '$', then ~SEMICOLON can never match those characters. The lexer reports an error for each of them and drops it, so the parser never sees it.
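To make the tokens-versus-characters point concrete, here is a toy sketch in plain Java. The class TokenNegationDemo and its methods are hypothetical. This is not ANTLR's generated code, only a rough imitation of the grammar below. When '$', '%' and '^' have no lexer rule, they are dropped before the `COLON (~SEMICOLON)* SEMICOLON` loop runs, so that loop only ever sees the digits. Adding the three rules makes the characters show up.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model (hypothetical, not ANTLR code) of why a parser-level ~SEMICOLON
// can only match characters that the lexer actually turned into tokens.
public class TokenNegationDemo {

    // A token is a type name plus the text it matched.
    static final class Token {
        final String type;
        final String text;

        Token(String type, String text) {
            this.type = type;
            this.text = text;
        }
    }

    static boolean isLetter(char c) { return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z'); }

    static boolean isDigit(char c) { return c >= '0' && c <= '9'; }

    static boolean isIdChar(char c) { return isLetter(c) || isDigit(c) || c == '_'; }

    // Single-character rules. DOLLAR, PERCENT and CARET exist only when
    // withSymbolRules is true, mirroring the rules added to the grammar.
    static String punctuation(char c, boolean withSymbolRules) {
        switch (c) {
            case '.': return "DOT";
            case '{': return "L_BRACE";
            case '}': return "R_BRACE";
            case ':': return "COLON";
            case ';': return "SEMICOLON";
            case '$': return withSymbolRules ? "DOLLAR" : null;
            case '%': return withSymbolRules ? "PERCENT" : null;
            case '^': return withSymbolRules ? "CARET" : null;
            default:  return null;
        }
    }

    // Hand-written lexer loosely following the grammar's lexer rules.
    static List<Token> lex(String input, boolean withSymbolRules) {
        List<Token> tokens = new ArrayList<>();
        int i = 0;
        while (i < input.length()) {
            char c = input.charAt(i);
            if (c == ' ' || c == '\t' || c == '\r' || c == '\f') {
                i++; // WS: skipped
            } else if (isLetter(c) || c == '_') {
                int start = i;
                while (i < input.length() && isIdChar(input.charAt(i))) {
                    i++;
                }
                String text = input.substring(start, i);
                tokens.add(new Token(text.equals("options") ? "OPTIONS" : "ID", text));
            } else if (isDigit(c)) {
                tokens.add(new Token("DIGIT", String.valueOf(c))); // one digit per token
                i++;
            } else {
                String type = punctuation(c, withSymbolRules);
                if (type != null) {
                    tokens.add(new Token(type, String.valueOf(c)));
                }
                // If no rule matches, the character is dropped, roughly what
                // ANTLR 3's lexer does after reporting "no viable alternative".
                i++;
            }
        }
        return tokens;
    }

    // Mimics `COLON (~SEMICOLON)* SEMICOLON`: returns the text of every token
    // between the first COLON and the next SEMICOLON. It works on tokens only,
    // so characters the lexer dropped can never appear here.
    static List<String> optionValue(List<Token> tokens) {
        List<String> value = new ArrayList<>();
        int i = 0;
        while (i < tokens.size() && !tokens.get(i).type.equals("COLON")) {
            i++;
        }
        i++; // step past the COLON
        while (i < tokens.size() && !tokens.get(i).type.equals("SEMICOLON")) {
            value.add(tokens.get(i).text);
            i++;
        }
        return value;
    }

    public static void main(String[] args) {
        String input = "options { foo: $ % 1 2 45 ^ $ $$$; }";
        System.out.println(optionValue(lex(input, false))); // [1, 2, 4, 5]
        System.out.println(optionValue(lex(input, true)));  // [$, %, 1, 2, 4, 5, ^, $, $, $, $]
    }
}
```

The second line of output lines up with the tree shown below. Because DIGIT matches a single character, 45 comes out as two tokens, 4 and 5.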
Your grammar (with minor modifications):

    grammar Test;

    options {
      output=AST;
    }

    tokens {
      OPTION;
      OPTION_BLOCK;
    }

    query_options
      :  OPTIONS^ option_block
      ;

    option_block
      :  L_BRACE option_def* R_BRACE -> ^(OPTION_BLOCK option_def*)
      ;

    option_def
      :  option_name option_value -> ^(OPTION option_name option_value)
      ;

    option_name
      :  ID (DOT^ ID)*
      ;

    option_value
      :  COLON^ (~SEMICOLON)* SEMICOLON!
      |  option_block
      ;

    OPTIONS   : 'options';
    ID        : (LETTER | '_') (LETTER | DIGIT | '_')*;
    DOLLAR    : '$';
    PERCENT   : '%';
    CARET     : '^';
    DOT       : '.';
    L_BRACE   : '{';
    R_BRACE   : '}';
    COLON     : ':';
    SEMICOLON : ';';
    DIGIT     : '0'..'9';

    SL_COMMENT : '#' ~('\r' | '\n')* { skip(); };
    WS         : (' ' | '\f' | '\r' | '\t')+ { skip(); };

    fragment LETTER : 'a'..'z' | 'A'..'Z';

parses the input:

    "options { foo: $ % 1 2 45 ^ $ $$$; }"

as follows:

    (options (OPTION_BLOCK (OPTION foo (: $ % 1 2 4 5 ^ $ $ $ $))))

as you can see after running the test rig:

    import org.antlr.runtime.*;
    import org.antlr.runtime.tree.*;
    import org.antlr.stringtemplate.*;

    public class Main {
      public static void main(String[] args) throws Exception {
        // Lex and parse the sample input, starting at the query_options rule
        ANTLRStringStream in = new ANTLRStringStream("options { foo: $ % 1 2 45 ^ $ $$$; }");
        TestLexer lexer = new TestLexer(in);
        CommonTokenStream tokens = new CommonTokenStream(lexer);
        TestParser parser = new TestParser(tokens);
        TestParser.query_options_return returnValue = parser.query_options();
        CommonTree tree = (CommonTree)returnValue.getTree();

        // Print the AST as a Graphviz DOT graph, then as a LISP-style string
        DOTTreeGenerator gen = new DOTTreeGenerator();
        StringTemplate st = gen.toDOT(tree);
        System.out.println(st);
        System.out.println("-----------------------\n" + tree.toStringTree());
      }
    }

Regards,

Bart.


On Wed, May 18, 2011 at 12:55 AM, Rajesh Raman <r...@fb.com> wrote:

> Hello ANTLR-ites,
>
> I'm trying to parse an "options" structure, like the following:
>
> options {
>   foo {
>     bar {
>       ww: $32.50;
>       xx: Jekyll & Hyde;
>     }
>     yy.zz: @15% p/a;
>   }
> }
>
> (Please ignore the nonsensical values for ww, xx and yy.zz; I'm just
> making a point, which will become clearer below.)
> This options structure will be followed by a query expression whose
> grammar is more complicated, and includes ints/floats, identifiers,
> operators, etc.
>
> The grammar I have for parsing the options structure looks like the below.
> (The grammar for the query language is complicated and therefore omitted.)
>
> <snip>
>
> // ... other stuff here
> tokens {
>   // ... other ad hoc token values
>   OPTION;
>   OPTION_BLOCK;
>   OPTION_VALUE;
> }
>
> // ...
>
> query_options
>   : OPTIONS^ option_block
>   ;
>
> option_block
>   : L_BRACE option_def* R_BRACE ->
>       ^(OPTION_BLOCK option_def*)
>   ;
>
> option_def
>   : option_name option_value ->
>       ^(OPTION option_name option_value)
>   ;
>
> option_name
>   : ID (DOT^ ID)*
>   ;
>
> option_value
>   : COLON^ (~SEMICOLON)* SEMICOLON!
>   | option_block
>   ;
>
> //... other stuff here
> //...
>
> OPTIONS: 'options';
> ID: (LETTER | '_') (LETTER | DIGIT | '_')*;
> DOT: '.';
> L_BRACE: '{';
> R_BRACE: '}';
> COLON: ':';
> SEMICOLON: ';';
>
> SL_COMMENT: '#' ~('\r' | '\n')* NEWLINE { skip(); };
> WS: (' ' | '\f' | '\r' | '\t')+ { skip(); };
>
> ...
>
> </snip>
>
> As mentioned, the "options" clause is part of a larger grammar for a
> language that includes operators, identifiers, numbers, etc. However,
> within the options clause, I want the characters between the colon and the
> semicolon to be treated as a single string, even if they contain
> characters that lex into other tokens used by the language.
> This feels like I should be able to use the same techniques as used in
> comment-stripping (i.e., see the line that has COLON^...).
> But this doesn't seem to work:
> - The "stray" characters that are not used elsewhere in the grammar are
>   ignored and don't show up in the parse tree (e.g., $, @, % and & in the
>   example above).
> - Character sequences that form valid tokens for the rest of the language
>   (like integers or identifiers) are lexed into those respective tokens
>   instead of being slurped into a single string as intended.
>
> E.g., when I input a string like "options { foo: $ % 1 2 45 ^ $ $$$; }"
> and display the resulting tree.toStringTree(), I get
> "(options (OPTION_BLOCK (OPTION foo (: 1 2 45))))"
>
> Any guidance you have on the above will be greatly appreciated.
>
> Thanks in advance.
>
> ++Rajesh
>
> List: http://www.antlr.org/mailman/listinfo/antlr-interest
> Unsubscribe:
> http://www.antlr.org/mailman/options/antlr-interest/your-email-address