[il-antlr-interest: 32356] Re: [antlr-interest] Rematching AST Nodes
> Your grammar doesn't have an 'aaa' token. It does have CHARACTERS tokens. If 'aaa' is special, then you need to match it in your grammar like a keyword. Then you can reference it in your tree grammar. Otherwise you will need to match any CHARACTERS token in your rematch rule and do what you need to when the value is 'aaa' and do something else when it is not. Your tree grammars can only work with the tokens your lexers produce (and the same set that your parsers use as well).

That's unfortunate. I'm working on a workaround using semantic predicates. The huge downside is that I have to implement the boolean validation function for the semantic predicate in a separate piece of Java code. Then I implement the string parsing function in a second separate piece of Java code. This solution is far less elegant than implementing everything as ANTLR logic.

Court

List: http://www.antlr.org/mailman/listinfo/antlr-interest
Unsubscribe: http://www.antlr.org/mailman/options/antlr-interest/your-email-address
--
You received this message because you are subscribed to the Google Groups il-antlr-interest group. To post to this group, send email to il-antlr-inter...@googlegroups.com. To unsubscribe from this group, send email to il-antlr-interest+unsubscr...@googlegroups.com. For more options, visit this group at http://groups.google.com/group/il-antlr-interest?hl=en.
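The alternative described in the quoted advice (match any CHARACTERS token, then branch on its text) can be sketched in plain Java. This is only an illustration: the class and method names are hypothetical and do not come from the poster's grammar. A helper like this could be called from the action of a rule that matches any CHARACTERS token, passing in the token's text.

```java
// Hypothetical helper; none of these names appear in the original grammar.
final class CharactersHandler {
    private CharactersHandler() {}

    /** Boolean check of the kind a semantic predicate or action could call. */
    static boolean isSpecial(String tokenText) {
        return "aaa".equals(tokenText);
    }

    /** Branches on the token text: one path for 'aaa', another for everything else. */
    static String handle(String tokenText) {
        return isSpecial(tokenText) ? "special:" + tokenText : "plain:" + tokenText;
    }

    public static void main(String[] args) {
        System.out.println(handle("aaa"));
        System.out.println(handle("abc"));
    }
}
```

Keeping the check in one small static method means the same test can back both a predicate and an action, so the 'aaa' comparison is written only once.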
[il-antlr-interest: 32357] Re: [antlr-interest] Lexer too quick to grab a token?
On Mon, May 2, 2011 at 1:19 AM, Todd O'Bryan toddobr...@gmail.com wrote:
> ...
> Does this make any sense? Is there some way to deal with it?
> ...

You could let '/]]' be matched in the 'R_TAG' rule and emit another token as per the instructions described here: http://www.antlr.org/wiki/pages/viewpage.action?pageId=3604497

A demo:

    lexer grammar TLexer;

    @header {
      import java.util.ArrayList;
      import java.util.List;
    }

    @members {
      // Tokens emitted by the current rule; a single rule may add more than one.
      List<Token> tokens = new ArrayList<Token>();

      // Convenience overload: build a token from text and type, then queue it.
      private void emit(String text, int type) {
        Token token = new CommonToken(type, text);
        token.setType(type);
        emit(token);
      }

      @Override
      public void emit(Token token) {
        state.token = token;
        tokens.add(token);
      }

      // Run one lexer rule, then hand out the queued tokens one at a time.
      @Override
      public Token nextToken() {
        super.nextToken();
        if(tokens.size() == 0) {
          return Token.EOF_TOKEN;
        }
        return (Token)tokens.remove(0);
      }
    }

    L_TAG     : '[/' ;
    R_TAG     : '/]]' {emit("/", ANY); emit("]]", R_BRACKET);}
              | '/]'
              ;
    L_BRACKET : '[[' ;
    R_BRACKET : ']]' ;
    SPACE     : (' ' | '\t' | '\r' | '\n') {skip();} ;
    ANY       : . ;

which can be tested with the class:

    import org.antlr.runtime.*;

    public class Main {
      public static void main(String[] args) throws Exception {
        String source = "[/ foo /] [[/ bar /]]";
        ANTLRStringStream in = new ANTLRStringStream(source);
        TLexer lexer = new TLexer(in);
        CommonTokenStream tokens = new CommonTokenStream(lexer);
        for(Object o : tokens.getTokens()) {
          Token t = (Token)o;
          System.out.println("text=" + t.getText() + ", type=" + t.getType());
        }
      }
    }

Regards,

Bart.
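The queueing idea behind the demo above can also be seen without the ANTLR runtime. The following plain-Java sketch is an illustration only: it is a hand-written scanner, not generated code, and the class name and the "TYPE:text" string representation of tokens are assumptions made for the example. It shows one scanning step placing two tokens on a queue when it sees '/]]', and nextToken() draining that queue before scanning again.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Plain-Java illustration only; it does not use the ANTLR runtime.
final class QueueingLexerSketch {
    private final String input;
    private int pos = 0;
    // Tokens produced but not yet handed out, stored as "TYPE:text" strings.
    private final Deque<String> pending = new ArrayDeque<>();

    QueueingLexerSketch(String input) {
        this.input = input;
    }

    private void emit(String type, String text) {
        pending.addLast(type + ":" + text);
    }

    /** Scans once at the current position; may queue zero, one, or two tokens. */
    private void scanOne() {
        if (Character.isWhitespace(input.charAt(pos))) {
            pos++;                                   // whitespace is skipped
        } else if (input.startsWith("/]]", pos)) {
            emit("ANY", "/");                        // one match, two tokens
            emit("R_BRACKET", "]]");
            pos += 3;
        } else if (input.startsWith("[/", pos)) {
            emit("L_TAG", "[/");
            pos += 2;
        } else if (input.startsWith("/]", pos)) {
            emit("R_TAG", "/]");
            pos += 2;
        } else if (input.startsWith("[[", pos)) {
            emit("L_BRACKET", "[[");
            pos += 2;
        } else if (input.startsWith("]]", pos)) {
            emit("R_BRACKET", "]]");
            pos += 2;
        } else {
            emit("ANY", input.substring(pos, pos + 1));
            pos++;
        }
    }

    /** Returns the next queued token, scanning more input only when the queue is empty. */
    String nextToken() {
        while (pending.isEmpty() && pos < input.length()) {
            scanOne();
        }
        return pending.isEmpty() ? "EOF" : pending.removeFirst();
    }

    static List<String> tokenize(String source) {
        QueueingLexerSketch lexer = new QueueingLexerSketch(source);
        List<String> out = new ArrayList<>();
        for (String t = lexer.nextToken(); !t.equals("EOF"); t = lexer.nextToken()) {
            out.add(t);
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("[/ foo /] [[/ bar /]]"));
    }
}
```

The point of the queue is that the caller still asks for one token at a time; only the scanner knows that a single match sometimes stands for two tokens.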
[il-antlr-interest: 32358] Re: [antlr-interest] Rematching AST Nodes
I suspect that you are approaching this problem incorrectly in some way. Why do you feel you need to specify a new token at the AST stage? Why don't you restate your goal, ignoring what you have done so far - I suspect that we may be trying to solve a problem that you should not have.

Jim

-----Original Message-----
From: antlr-interest-boun...@antlr.org [mailto:antlr-interest-boun...@antlr.org] On Behalf Of Courtney Falk
Sent: Monday, May 02, 2011 5:29 AM
To: antlr-interest@antlr.org
Subject: Re: [antlr-interest] Rematching AST Nodes

> Your grammar doesn't have an 'aaa' token. It does have CHARACTERS tokens. If 'aaa' is special, then you need to match it in your grammar like a keyword. Then you can reference it in your tree grammar. Otherwise you will need to match any CHARACTERS token in your rematch rule and do what you need to when the value is 'aaa' and do something else when it is not. Your tree grammars can only work with the tokens your lexers produce (and the same set that your parsers use as well).

That's unfortunate. I'm working on a workaround using semantic predicates. The huge downside is that I have to implement the boolean validation function for the semantic predicate in a separate piece of Java code. Then I implement the string parsing function in a second separate piece of Java code. This solution is far less elegant than implementing everything as ANTLR logic.

Court
[il-antlr-interest: 32359] Re: [antlr-interest] Rematching AST Nodes
On 5/2/2011 9:47 AM, Jim Idle wrote:
> I suspect that you are approaching this problem incorrectly in some way. Why do you feel you need to specify a new token at the AST stage? Why don't you restate your goal, ignoring what you have done so far - I suspect that we may be trying to solve a problem that you should not have.

Certainly. I was trying to keep things simple/short, but I can expand.

My project is an NLP tokenizer/parser. The first stage of functionality is implemented in the FuzzyLexer and FuzzyParser grammars. They strip out all punctuation and white space, preserving them as tokens and grouping all the text between the punctuation/white space as unspecified tokens.

Stage 1.5 is the language-specific composite grammar (Sentential.g), which imports the Fuzzy* grammars. Here, I implement regular expressions used in semantic predicates that attempt to categorize unspecified tokens into relevant categories (see also LongNumber.java). For instance, the string "one" would be cast as a long form number token. Any unspecified tokens that don't match any semantic predicates stay unspecified tokens.

Stage 2, which is yet to be written, walks the AST output by stage 1.5 and wraps the tokens up into an application-specific data structure. This tree grammar will also perform tasks such as clustering together numbers into one single number, etc.
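The stage 1.5 categorization step could look roughly like the plain-Java sketch below. It is an illustration under stated assumptions, not the code from Sentential.g: the class name, the Category enum, and the short word list in the regular expression are all hypothetical. It shows a boolean check of the kind a semantic predicate could call, and a fallback to "unspecified" when nothing matches.

```java
import java.util.regex.Pattern;

// Hypothetical sketch; the real predicates live in Sentential.g, which is not shown here.
final class TokenCategorizer {
    enum Category { LONG_NUMBER, UNSPECIFIED }

    // Assumed, deliberately short word list; matching is case-insensitive.
    private static final Pattern LONG_NUMBER =
            Pattern.compile("zero|one|two|three|four|five|six|seven|eight|nine|ten",
                            Pattern.CASE_INSENSITIVE);

    private TokenCategorizer() {}

    /** Boolean validation function of the kind a semantic predicate could call. */
    static boolean isLongNumber(String text) {
        return LONG_NUMBER.matcher(text).matches();
    }

    /** Tokens that match no predicate stay unspecified. */
    static Category categorize(String text) {
        return isLongNumber(text) ? Category.LONG_NUMBER : Category.UNSPECIFIED;
    }

    public static void main(String[] args) {
        System.out.println(categorize("one"));
        System.out.println(categorize("cat"));
    }
}
```

Because matches() requires the whole string to match, a word such as "someone" is not mistaken for the number "one".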
Courtney Falk
co...@infiauto.com

lexer grammar FuzzyLexer;

options {
    filter=UNSPECIFIED;
    k=2;
}

@members {
    // Accumulates the characters that have not matched any punctuation or whitespace rule.
    private StringBuilder unknown;
    {
        unknown = new StringBuilder();
    }

    public void appendUnknown(char c) {
        unknown.append(c);
    }

    // Returns the accumulated text and resets the buffer.
    public String getUnknown() {
        String result = unknown.toString();
        clearUnknown();
        return result;
    }

    public void clearUnknown() {
        unknown.delete(0, unknown.length());
    }

    public boolean isUnknownEmpty() {
        return unknown.length() == 0;
    }

    // Same as the runtime's match(String), except that each character examined is also
    // appended to the buffer, and the buffer is cleared once the whole string has matched.
    @Override
    public void match(String s) throws MismatchedTokenException {
        int i = 0;
        while ( i<s.length() ) {
            unknown.append((char)input.LA(1));
            if ( input.LA(1)!=s.charAt(i) ) {
                if ( state.backtracking>0 ) {
                    state.failed = true;
                    return;
                }
                MismatchedTokenException mte = new MismatchedTokenException(s.charAt(i), input);
                recover(mte);
                throw mte;
            }
            i++;
            input.consume();
            state.failed = false;
        }
        // successfully matched the string
        clearUnknown();
    }
}

ELLIPSIS : '...';
PERIOD : '.';
QUESTION_MARK : '?';
LEFT_QUESTION_MARK : '¿';
EXCLAMATION_POINT : '!';
LEFT_EXCLAMATION_POINT : '¡';
COMMA : ',';
COLON : ':';
SEMI_COLON : ';';
MDASH : '--';
DASH : '-';
FORWARD_SLASH : '/';
QUOTATION_MARK : '"';
SINGLE_QUOTATION_MARK : '\'';
LEFT_PARENTHESIS : '(';
RIGHT_PARENTHESIS : ')';
LEFT_BRACKET : '[';
RIGHT_BRACKET : ']';
LEFT_BRACE : '{';
RIGHT_BRACE : '}';
WHITESPACE : ' ' | '\t' | '\r' | '\n';

protected UNSPECIFIED
    : . { unknown.append(getText()); }
    ;

parser grammar FuzzyParser;

@members {
    public Sentential_FuzzyLexer lexer;

    public void setLexer(Sentential_FuzzyLexer lexer) {
        this.lexer = lexer;
    }
}

whitespace : WHITESPACE+;

unspecified returns [String s]
    : UNSPECIFIED+ { $s = lexer.getUnknown(); }
    ;

nonterminal_punctuation
    : COMMA
    | COLON
    | SEMI_COLON
    | FORWARD_SLASH
    | MDASH
    | DASH
    | QUOTATION_MARK
    | SINGLE_QUOTATION_MARK
    ;

terminal_punctuation
    : PERIOD
    | EXCLAMATION_POINT
    | QUESTION_MARK
    | ELLIPSIS
    ;

package com.infiauto.ontosem.lang.eng;

enum LongNumber {
    // Arguments: long form, power, value.
    ZERO("zero", 0, 0),
    ONE("one", 0, 1),
    TWO("two", 0, 2),
    THREE("three", 0, 3),
    FOUR("four", 0, 4),
    FIVE("five", 0, 5),
    SIX("six", 0, 6),
    SEVEN("seven", 0, 7),
    EIGHT("eight", 0, 8),
    NINE("nine", 0, 9),
    TEN("ten", 1, 10),
    ELEVEN("eleven", 1, 11),
    TWELVE("twelve", 1, 12),
    THIRTEEN("thirteen", 1, 13),
    FOURTEEN("fourteen", 1, 14),
    FIFTEEN("fifteen", 1, 15),
    SIXTEEN("sixteen", 1, 16),
    SEVENTEEN("seventeen", 1, 17),
    EIGHTEEN("eighteen", 1, 18),
    NINTEEN("nineteen", 1, 19),
    TWENTY("twenty", 1, 20),
    THIRTY("thirty", 1, 30),
    FORTY("forty", 1, 40),
    FIFTY("fifty", 1, 50),
    SIXTY("sixty", 1, 60),
    SEVENTY("seventy", 1, 70),
    EIGHTY("eighty", 1, 80),
    NINTY("ninety", 1, 90),
    HUNDRED("hundred", 2, 100),
    THOUSAND("thousand", 3, 1000),
    MILLION("million", 6, 100),
    BILLION("billion", 9, 10);

    private String long_form;
    private long power;
    private long value;

    private LongNumber(String long_form, long power, long value) {
        this.long_form = long_form;
        this.power = power;
        this.value = value;
    }

    public String getLongForm() {
        return long_form;
    }

    public long