[il-antlr-interest: 29096] Re: [antlr-interest] .* consuming all input
Hi,

Thank you for the answer. The book The Definitive ANTLR Reference, page 101, states that ANTLR automatically makes .* and .+ non-greedy. Also, adding the non-greedy option changes nothing.

Georg

Oliver Zeigermann oliver.zeigerm...@gmail.com wrote on 3 Jun 2010, 12:09 PM:
> Subject: Re: [antlr-interest] .* consuming all input
>
> Hi,
>
> I am pretty sure wildcards are *greedy* by default and you have to switch on non-greediness. I seem to remember this should look like:
>
>     (options {greedy=false;}:.)*
>
> - Oliver
>
> 2010/6/3 George Soom george.s...@siria.cc:
>> Hi,
>>
>> According to the documentation, wildcards are non-greedy in ANTLR, so the rule
>>
>>     comment: '//' a+=.* NEWLINE -> comment(a={a})
>>
>> should match anything until a newline, construct the list 'a', and send it to the template 'comment'. Somehow .* consumes everything up to the end of the input file, so I get the error "line 0:-1 mismatched input '<EOF>' expecting NEWLINE". NEWLINE is defined as
>>
>>     NEWLINE: ('\r'? '\n')+;
>>
>> and is not sent to the hidden channel or skipped. Where is the problem? I need to send everything to the template 'comment', so I cannot discard comments in a lexer rule.
>>
>> Thank you,
>> Georg

List: http://www.antlr.org/mailman/listinfo/antlr-interest
Unsubscribe: http://www.antlr.org/mailman/options/antlr-interest/your-email-address

--
You received this message because you are subscribed to the Google Groups il-antlr-interest group. To post to this group, send email to il-antlr-inter...@googlegroups.com. To unsubscribe from this group, send email to il-antlr-interest+unsubscr...@googlegroups.com. For more options, visit this group at http://groups.google.com/group/il-antlr-interest?hl=en.
[il-antlr-interest: 29095] Re: [antlr-interest] .* consuming all input
On Thu, Jun 3, 2010 at 11:09 AM, Oliver Zeigermann oliver.zeigerm...@gmail.com wrote:
> Hi, I am pretty sure wildcards are *greedy* by default and ...

+ and * are normally greedy, except when preceded by a DOT. From The Definitive ANTLR Reference:

    What you really want to type, though, and what you will see in other systems, is the terse notation: '.*' and '.+'. Unfortunately, following the usual convention that all subrules are greedy makes this notation useless. Such greedy subrules would match all characters until the end of file. Instead, ANTLR considers them idioms for “Match any symbol until you see what lies beyond the subrule.” ANTLR automatically makes these two subrules nongreedy. So, you can use '.*' instead of manually specifying the option.

See chapter 4, Extended BNF Subrules, page 86.

Regards,
Bart.
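The greedy versus nongreedy distinction quoted above can be seen outside ANTLR as well. The sketch below is only an analogy: it uses java.util.regex, not ANTLR, and in that library '.*' is greedy by default while '.*?' is the nongreedy form. The class name GreedyDemo, the helper capture, and the sample input are made up for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GreedyDemo {
    // Returns the text captured between "//" and the newline that ends the match,
    // or null if the pattern does not match. DOTALL lets '.' match newlines too,
    // which mirrors a wildcard that accepts any symbol.
    static String capture(String regex, String input) {
        Matcher m = Pattern.compile(regex, Pattern.DOTALL).matcher(input);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String input = "// first\nx = 1\n// second\n";
        // Greedy: runs to the LAST newline, swallowing the lines in between.
        System.out.println("[" + capture("//(.*)\n", input) + "]");
        // Nongreedy: stops at the FIRST newline after "//".
        System.out.println("[" + capture("//(.*?)\n", input) + "]");
    }
}
```

A regex engine backtracks from the end of the input to find the last newline, so the greedy form still matches here. That detail is specific to regular expressions and says nothing about how ANTLR resolves the loop exit.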
[il-antlr-interest: 29100] Re: [antlr-interest] Advice with backtracking/ambiguity
On 6/2/10 5:38 PM, John B. Brodie j...@acm.org wrote:
> On Wed, 2010-06-02 at 17:03 -0500, Ken Williams wrote:
>> Yeah, probably I should be using parser rules. I was trying to keep things simple by making everything a linear stream of tokens from the point of view of the Java caller, while still having high-level constructs like DATE.
>
> just be aware that when you make date a parser rule WS will be silently accepted between the DIGITS and SLASHes comprising the date non-terminal.

Yeah, good point. In this case that's fine.

It would be nice, though, if there were a per-rule parser directive to control which channel(s) to pay attention to, something like this:

    date options {channel=ALL;}
        : DIGITS SLASH DIGITS SLASH DIGITS
        ;

Not sure whether that's feasible or not though.

--
Ken Williams
Sr. Research Scientist
Thomson Reuters
Phone: 651-848-7712
ken.willi...@thomsonreuters.com
[il-antlr-interest: 29101] [antlr-interest] Grammar not detecting extraneous input at end
Hi,

I have a grammar that does not give an error (in ANTLR 3.2) if there are extraneous characters at the end of the input. For example, I would expect (a+b)) to generate an error, but it does not. (a+b) is parsed fine and the extra ) at the end is just ignored.

I enclose a small sample grammar that demonstrates this problem. I am using the C runtime, but the problem is not there. The error is not detected in ANTLRWorks 1.4 either.

Any help anyone can give me would be greatly appreciated.

Thank you,
Karim

[Attachment: Test.g]
[il-antlr-interest: 29104] Re: [antlr-interest] Grammar not detecting extraneous input at end
Channeling Jim Idle: Remember to use antlr.markmail.org; this is a very common question and has been answered many times.

More than likely you don't require EOF at the end of whatever your start rule is. The token stream more than likely still has a ')' in it, and if you tried to parse again, you'd get an error. Use EOF to require all input to be parsed.

Kirby

On Thu, Jun 3, 2010 at 1:38 PM, Karim Chichakly karim...@gmail.com wrote:
> Hi,
>
> I have a grammar that does not give an error (in ANTLR 3.2) if there are extraneous characters at the end of the input. For example, I would expect (a+b)) to generate an error, but it does not. (a+b) is parsed fine and the extra ) at the end is just ignored.
>
> I enclose a small sample grammar that demonstrates this problem. I am using the C runtime, but the problem is not there. The error is not detected in ANTLRWorks 1.4 either.
>
> Any help anyone can give me would be greatly appreciated.
>
> Thank you,
> Karim
[il-antlr-interest: 29109] Re: [antlr-interest] Grammar not detecting extraneous input at end
Thank you Bart and Kirby, that is very helpful!

Karim

On Thu, Jun 3, 2010 at 2:47 PM, Bart Kiers bki...@gmail.com wrote:
> Hi,
>
> On Thu, Jun 3, 2010 at 8:38 PM, Karim Chichakly karim...@gmail.com wrote:
>> Hi,
>>
>> I have a grammar that does not give an error (in ANTLR 3.2) if there are extraneous characters at the end of the input. For example, I would expect (a+b)) to generate an error, but it does not. (a+b) is parsed fine and the extra ) at the end is just ignored.
>
> Since (a+b)) does not contain any illegal tokens, the parser simply stops after it (successfully) parses (a+b). You'll want to force the parser to go through the entire token stream by adding an 'EOF' after your 'equation' rule:
>
>     equation
>         : expr EOF
>         ;
>
> Regards,
> Bart.
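The effect Bart describes can be reproduced with a tiny hand-written recursive-descent parser. This is a sketch only, not ANTLR-generated code: the grammar in the comments is hypothetical (the attached Test.g is not shown in this thread), and the class TrailingInputDemo and its requireEof flag are invented for illustration. With requireEof set, the parser behaves like a start rule that ends in EOF.

```java
class TrailingInputDemo {
    private final String input;
    private int pos = 0;

    TrailingInputDemo(String input) { this.input = input; }

    // expr : atom ('+' atom)* ;
    private void expr() {
        atom();
        while (peek() == '+') {
            pos++;
            atom();
        }
    }

    // atom : LETTER | '(' expr ')' ;
    private void atom() {
        char c = peek();
        if (Character.isLetter(c)) {
            pos++;
        } else if (c == '(') {
            pos++;
            expr();
            if (peek() != ')') throw new IllegalStateException("expected ')' at " + pos);
            pos++;
        } else {
            throw new IllegalStateException("unexpected input at " + pos);
        }
    }

    // Next character, or '\0' once the input is exhausted.
    private char peek() {
        return pos < input.length() ? input.charAt(pos) : '\0';
    }

    // Returns true if the input is accepted. With requireEof == false the parser
    // stops as soon as expr is satisfied, like a start rule without EOF.
    static boolean accepts(String input, boolean requireEof) {
        TrailingInputDemo p = new TrailingInputDemo(input);
        try {
            p.expr();
        } catch (IllegalStateException e) {
            return false;
        }
        return !requireEof || p.pos == input.length();
    }

    public static void main(String[] args) {
        System.out.println(accepts("(a+b))", false)); // trailing ')' is never looked at
        System.out.println(accepts("(a+b))", true));  // leftover input is rejected
    }
}
```

Without the end-of-input check nothing ever inspects the second ')', which is why no error is reported in the original grammar.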
[il-antlr-interest: 29115] Re: [antlr-interest] Multiple lexer tokens per rule
Try this to get you started:

    @lexer::members {
        // Queue to hold additional tokens.
        private java.util.Queue<Token> tokenQueue = new java.util.LinkedList<Token>();

        // Include queue in reset().
        public void reset() {
            super.reset();
            tokenQueue.clear();
        }

        // Queued tokens are returned before matching a new token.
        public Token nextToken() {
            if (tokenQueue.peek() != null)
                return tokenQueue.poll();
            return super.nextToken();
        }
    }

    MATCHED_TOKEN
        : ...
          {
              // Add additional tokens to the queue.
              tokenQueue.add(new CommonToken( ... ));
          }
        ;

MATCHED_TOKEN is returned first, and additional tokens queued by MATCHED_TOKEN's action are returned subsequently before matching new tokens in the input stream. Instantiate the additional token accordingly if you need input stream context - see Lexer.emit().

Ken Williams wrote:
> On 6/3/10 4:18 PM, Jim Idle j...@temporal-wave.com wrote:
>> Add to an array or collection then get nextToken to remove from the collection. It is slower to do this so it isn't the default way.
>
> Yeah, that's what the book says. =)
>
> It seems like there are some subtleties involved, though - there's a lot of bookkeeping in nextToken() that looks kind of scary (e.g. the current-line-number stuff, the default-channel stuff, etc.), and if I override it I'm really not confident I'll do it correctly. I'm also unsure how mTokens(), emit(), and nextToken() cooperate with their member variables.
>
> I tried this simple-minded implementation, and started getting out-of-bounds exceptions:
>
>     @lexer::members {
>         List<Token> tokBuf = new ArrayList<Token>();
>
>         public Token nextToken() {
>             while (tokBuf.isEmpty()) {
>                 emit();
>             }
>             return tokBuf.remove(0);
>         }
>
>         public void emit(Token token) {
>             tokBuf.add(token);
>         }
>     }
>
> So if someone does have a working example, I'd love to see it!
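The ordering described in the reply (the matched token first, then anything it queued, then fresh input) can be tried without the ANTLR runtime. The sketch below uses stand-in classes rather than ANTLR's Lexer and Token: BaseLexer, QueueingLexer, the "<EOF>" marker string, and the rule that a word containing '/' queues its two halves are all assumptions made for illustration.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

class QueueingLexerDemo {
    // Stand-in for a generated lexer: one whitespace-separated word per call.
    static class BaseLexer {
        private final String[] words;
        private int next = 0;

        BaseLexer(String input) { words = input.trim().split("\\s+"); }

        String nextToken() {
            return next < words.length ? words[next++] : "<EOF>";
        }
    }

    // Overrides nextToken() so that queued tokens are drained before new input is matched.
    static class QueueingLexer extends BaseLexer {
        private final Queue<String> pending = new ArrayDeque<String>();

        QueueingLexer(String input) { super(input); }

        @Override
        String nextToken() {
            if (!pending.isEmpty()) return pending.poll();
            String tok = super.nextToken();
            // Hypothetical rule action: a word such as "1/2" queues "1" and "2".
            int slash = tok.indexOf('/');
            if (slash > 0) {
                pending.add(tok.substring(0, slash));
                pending.add(tok.substring(slash + 1));
            }
            return tok;
        }
    }

    // Collects every token up to, but not including, the end marker.
    static List<String> tokenize(String input) {
        QueueingLexer lexer = new QueueingLexer(input);
        List<String> out = new ArrayList<String>();
        for (String t = lexer.nextToken(); !t.equals("<EOF>"); t = lexer.nextToken()) {
            out.add(t);
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("a 1/2 b")); // prints [a, 1/2, 1, 2, b]
    }
}
```

The override delegates to the parent for actual matching and only adds the queue check in front. The grammar snippet in the reply has the same shape: it calls super.nextToken() and so leaves the parent's bookkeeping untouched.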