[il-antlr-interest: 29096] Re: [antlr-interest] .* consuming all input

2010-06-03 Thread George Soom
Hi, thank you for the answer.
The book The Definitive ANTLR Reference, page 101, states
that ANTLR automatically makes .* and .+ non-greedy.
Also, adding the non-greedy option changes nothing.

Georg

Oliver Zeigermann oliver.zeigerm...@gmail.com wrote on 3 Jun 2010, 12:09 PM:
Subject: Re: [antlr-interest] .* consuming all input
Hi, I am pretty sure wildcards are *greedy* by default and you have to
switch on non-greediness. I seem to remember this should look like:

(options {greedy=false;}:.)*

- Oliver

2010/6/3 George Soom george.s...@siria.cc:

 Hi,
 according to the documentation, wildcards are non-greedy in ANTLR, so the rule

   comment: '//' a+=.* NEWLINE -> comment(a={a})

 should match anything until a newline, build the list 'a', and send it to
 the template 'comment'.
 Instead, .* consumes everything up to the end of the input file, so I get the
 error 'line 0:-1 mismatched input 'EOF' expecting NEWLINE'. NEWLINE is
 defined as NEWLINE: ('\r'? '\n')+; and is neither sent to the hidden channel
 nor skipped. Where is the problem? I need to send everything to the template
 comment, so I cannot discard comments in a lexer rule.
 Thank you
 Georg



 List: http://www.antlr.org/mailman/listinfo/antlr-interest
 Unsubscribe:
http://www.antlr.org/mailman/options/antlr-interest/your-email-address





-- 
You received this message because you are subscribed to the Google Groups 
il-antlr-interest group.
To post to this group, send email to il-antlr-inter...@googlegroups.com.
To unsubscribe from this group, send email to 
il-antlr-interest+unsubscr...@googlegroups.com.
For more options, visit this group at 
http://groups.google.com/group/il-antlr-interest?hl=en.



[il-antlr-interest: 29095] Re: [antlr-interest] .* consuming all input

2010-06-03 Thread Bart Kiers
On Thu, Jun 3, 2010 at 11:09 AM, Oliver Zeigermann 
oliver.zeigerm...@gmail.com wrote:

 Hi, I am pretty sure wildcards are *greedy* by default and ...


+ and * are normally greedy, except when preceded by a DOT. From the
Definitive ANTLR reference:

  What you really want to type, though, and what you will see in other
  systems, is the terse notation: '.*' and '.+'. Unfortunately, following the
  usual convention that all subrules are greedy makes this notation useless.
  Such greedy subrules would match all characters until the end of
  file. Instead, ANTLR considers them idioms for "Match any symbol until
  you see what lies beyond the subrule." ANTLR automatically makes
  these two subrules nongreedy. So, you can use '.*' instead of manually
  specifying the option.


See chapter 4, *Extended BNF Subrules*, page 86.
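
The difference the book describes can be imitated outside ANTLR. What follows is only a sketch in plain Java, not ANTLR-generated code: tokens are modelled as strings, and the names WildcardLoopSketch, matchNongreedy and matchGreedy are made up for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class WildcardLoopSketch {

    // Nongreedy loop: stop as soon as the token that follows the subrule
    // (here: stop) shows up. Returns the tokens matched by the wildcard.
    static List<String> matchNongreedy(List<String> tokens, String stop) {
        List<String> matched = new ArrayList<String>();
        for (String t : tokens) {
            if (t.equals(stop)) {
                return matched;
            }
            matched.add(t);
        }
        throw new IllegalStateException("mismatched input 'EOF' expecting " + stop);
    }

    // Greedy loop without backtracking: the wildcard also matches the stop
    // token, so the loop only ends at end of input and the stop token can
    // no longer be matched afterwards.
    static List<String> matchGreedy(List<String> tokens, String stop) {
        int i = 0;
        while (i < tokens.size()) { // '.' accepts any token, including stop
            i++;
        }
        throw new IllegalStateException("mismatched input 'EOF' expecting " + stop);
    }

    public static void main(String[] args) {
        // Tokens after the leading '//' of a comment, followed by more input.
        List<String> input = Arrays.asList("some", "text", "NEWLINE", "next", "NEWLINE");
        System.out.println(matchNongreedy(input, "NEWLINE")); // prints [some, text]
        try {
            matchGreedy(input, "NEWLINE");
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

The greedy variant fails with the same kind of message George reported, which is why that error suggests the loop was treated as greedy.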

Regards,

Bart.




[il-antlr-interest: 29100] Re: [antlr-interest] Advice with backtracking/ambiguity

2010-06-03 Thread Ken Williams



On 6/2/10 5:38 PM, John B. Brodie j...@acm.org wrote:

 On Wed, 2010-06-02 at 17:03 -0500, Ken Williams wrote:
 Yeah, probably I should be using parser rules.  I was trying to keep things
 simple by making everything a linear stream of tokens from the point of
 view of the Java caller, while still having high-level constructs like DATE.
 
 just be aware that when you make date a parser rule WS will be silently
 accepted between the DIGITS and SLASHes comprising the date
 non-terminal. 

Yeah, good point.  In this case that's fine.  It would be nice, though, if
there were a per-rule parser directive to control which channel(s) to pay
attention to, something like this:

date
options {channel=ALL;}
:DIGITS SLASH DIGITS SLASH DIGITS ;


Not sure whether that's feasible or not though.
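
Without such an option, one possible workaround is to let the parser rule match and then check that the matched tokens are adjacent in the character stream. The sketch below models that check in plain Java. The Tok class and the isContiguous helper are hypothetical stand-ins, not part of the ANTLR runtime; with real ANTLR 3 tokens the offsets would presumably come from the tokens' start and stop indexes.

```java
import java.util.Arrays;
import java.util.List;

class ContiguousTokensSketch {

    // Hypothetical stand-in for a token: text plus inclusive char offsets.
    static final class Tok {
        final String text;
        final int start;
        final int stop;

        Tok(String text, int start, int stop) {
            this.text = text;
            this.start = start;
            this.stop = stop;
        }
    }

    // True when every token starts right after the previous one ends,
    // i.e. no hidden-channel whitespace sat between them.
    static boolean isContiguous(List<Tok> toks) {
        for (int i = 1; i < toks.size(); i++) {
            if (toks.get(i).start != toks.get(i - 1).stop + 1) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // "6/3/10": DIGITS SLASH DIGITS SLASH DIGITS with no gaps.
        List<Tok> tight = Arrays.asList(
            new Tok("6", 0, 0), new Tok("/", 1, 1), new Tok("3", 2, 2),
            new Tok("/", 3, 3), new Tok("10", 4, 5));
        // "6 /3/10": the space at offset 1 never reaches the parser.
        List<Tok> loose = Arrays.asList(
            new Tok("6", 0, 0), new Tok("/", 2, 2), new Tok("3", 3, 3),
            new Tok("/", 4, 4), new Tok("10", 5, 6));
        System.out.println(isContiguous(tight)); // prints true
        System.out.println(isContiguous(loose)); // prints false
    }
}
```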

-- 
Ken Williams
Sr. Research Scientist
Thomson Reuters
Phone: 651-848-7712
ken.willi...@thomsonreuters.com






[il-antlr-interest: 29101] [antlr-interest] Grammar not detecting extraneous input at end

2010-06-03 Thread Karim Chichakly
Hi,

I have a grammar that does not give an error (in ANTLR 3.2) if there are
extraneous characters at the end of the input.  For example, I would expect
(a+b)) to generate an error, but it does not.  (a+b) is parsed fine and
the extra ) at the end is just ignored.

I enclose a small sample grammar that demonstrates this problem.  I am using
the C runtime, but the problem is not there.  The error is not detected in
ANTLRWorks 1.4 either.

Any help anyone can give me would be greatly appreciated.

Thank you,

Karim



Test.g
Description: Binary data



[il-antlr-interest: 29104] Re: [antlr-interest] Grammar not detecting extraneous input at end

2010-06-03 Thread Kirby Bohling
Channeling Jim Idle:

Remember to use antlr.markmail.org, this is a very common question,
and has been answered many times.

More than likely you don't require EOF at the end of whatever your
start rule is.  The token stream more than likely still has a ')', and if you
tried to parse again, you'd get an error.  Use EOF to require all
input to be parsed.
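
To illustrate the effect outside ANTLR, here is a small hand-written recursive-descent sketch in Java. It is not what ANTLR generates, and the grammar (letters, '+', parentheses) and the names EofCheckSketch, parsePrefix and parseAll are assumptions chosen only to mirror the (a+b)) example.

```java
class EofCheckSketch {
    private final String in;
    private int pos = 0;

    EofCheckSketch(String in) {
        this.in = in;
    }

    // expr : atom ('+' atom)* ;
    private void expr() {
        atom();
        while (pos < in.length() && in.charAt(pos) == '+') {
            pos++;
            atom();
        }
    }

    // atom : LETTER | '(' expr ')' ;
    private void atom() {
        if (pos < in.length() && Character.isLetter(in.charAt(pos))) {
            pos++;
        } else if (pos < in.length() && in.charAt(pos) == '(') {
            pos++;
            expr();
            if (pos >= in.length() || in.charAt(pos) != ')') {
                throw new IllegalArgumentException("expecting ')' at " + pos);
            }
            pos++;
        } else {
            throw new IllegalArgumentException("unexpected input at " + pos);
        }
    }

    // Start rule without EOF: succeeds once one expr has been matched.
    static boolean parsePrefix(String s) {
        try {
            new EofCheckSketch(s).expr();
            return true;
        } catch (IllegalArgumentException e) {
            return false;
        }
    }

    // Start rule with EOF: also requires that no input is left over.
    static boolean parseAll(String s) {
        EofCheckSketch p = new EofCheckSketch(s);
        try {
            p.expr();
        } catch (IllegalArgumentException e) {
            return false;
        }
        return p.pos == s.length();
    }

    public static void main(String[] args) {
        System.out.println(parsePrefix("(a+b))")); // prints true
        System.out.println(parseAll("(a+b))"));    // prints false
        System.out.println(parseAll("(a+b)"));     // prints true
    }
}
```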

Kirby



On Thu, Jun 3, 2010 at 1:38 PM, Karim Chichakly karim...@gmail.com wrote:
 Hi,

 I have a grammar that does not give an error (in ANTLR 3.2) if there are
 extraneous characters at the end of the input.  For example, I would expect
 (a+b)) to generate an error, but it does not.  (a+b) is parsed fine and
 the extra ) at the end is just ignored.

 I enclose a small sample grammar that demonstrates this problem.  I am using
 the C runtime, but the problem is not there.  The error is not detected in
 ANTLRWorks 1.4 either.

 Any help anyone can give me would be greatly appreciated.

 Thank you,

 Karim








[il-antlr-interest: 29109] Re: [antlr-interest] Grammar not detecting extraneous input at end

2010-06-03 Thread Karim Chichakly
Thank you Bart and Kirby, that is very helpful!

Karim


On Thu, Jun 3, 2010 at 2:47 PM, Bart Kiers bki...@gmail.com wrote:

 Hi,

 On Thu, Jun 3, 2010 at 8:38 PM, Karim Chichakly karim...@gmail.com wrote:

  Hi,
 
  I have a grammar that does not give an error (in ANTLR 3.2) if there are
  extraneous characters at the end of the input.  For example, I would expect
  (a+b)) to generate an error, but it does not.  (a+b) is parsed fine and
  the extra ) at the end is just ignored.
 

 Since (a+b)) does not contain any illegal tokens, the parser simply stops
 after it (successfully) parses (a+b). You'll want to force the parser to
 go through the entire token stream by adding 'EOF' at the end of your
 'equation' rule:

 equation
  :  expr EOF
  ;


 Regards,

 Bart.






[il-antlr-interest: 29115] Re: [antlr-interest] Multiple lexer tokens per rule

2010-06-03 Thread Junkman
Try this to get you started:
-
@lexer::members {

    // Queue to hold additional tokens
    private java.util.Queue<Token> tokenQueue =
        new java.util.LinkedList<Token>();

    // Include queue in reset().
    public void reset() {
        super.reset();
        tokenQueue.clear();
    }

    // Queued tokens are returned before matching a new token.
    public Token nextToken() {
        if (tokenQueue.peek() != null)
            return tokenQueue.poll();
        return super.nextToken();
    }

}

MATCHED_TOKEN:  ...
{
    // Add additional tokens to the queue.
    tokenQueue.add( new CommonToken( ... ) );
}

-

MATCHED_TOKEN is returned first, and additional tokens queued by
MATCHED_TOKEN's action are returned subsequently before matching new
tokens in the input stream.

Instantiate the additional token accordingly if you need input stream
context - see Lexer.emit().
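
If you want to see the ordering without generating a lexer, the same pattern can be modelled in plain Java. This is only a sketch: BaseLexer is a hypothetical stand-in for the generated lexer (one token per whitespace-separated word), and the rule that queues a synthetic ";" after "}" is invented for the example.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

class TokenQueueSketch {

    // Hypothetical stand-in for a generated lexer: one token per
    // whitespace-separated word, then "EOF" forever.
    static class BaseLexer {
        private final String[] words;
        private int next = 0;

        BaseLexer(String input) {
            words = input.trim().split("\\s+");
        }

        String nextToken() {
            return next < words.length ? words[next++] : "EOF";
        }
    }

    // Same idea as the @lexer::members block: drain the queue first,
    // otherwise fall through to the normal matching logic.
    static class QueueingLexer extends BaseLexer {
        private final Queue<String> tokenQueue = new ArrayDeque<String>();

        QueueingLexer(String input) {
            super(input);
        }

        @Override
        String nextToken() {
            if (!tokenQueue.isEmpty()) {
                return tokenQueue.poll();
            }
            String matched = super.nextToken();
            // Stand-in for a rule action: after "}" also queue a ";".
            if (matched.equals("}")) {
                tokenQueue.add(";");
            }
            return matched;
        }
    }

    // Collects every token up to (not including) EOF.
    static List<String> tokens(String input) {
        QueueingLexer lexer = new QueueingLexer(input);
        List<String> out = new ArrayList<String>();
        for (String t = lexer.nextToken(); !t.equals("EOF"); t = lexer.nextToken()) {
            out.add(t);
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(tokens("a } b")); // prints [a, }, ;, b]
    }
}
```

The matched token comes out first and the queued one right after it, before any new input is examined.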



Ken Williams wrote:
 
 On 6/3/10 4:18 PM, Jim Idle j...@temporal-wave.com wrote:
 
 Add to an array or collection then get nextToken to remove from the
 collection. It is slower to do this, so it isn't the default way.
 
 Yeah, that's what the book says. =)
 
 It seems like there are some subtleties involved, though - there's a lot of
 bookkeeping in nextToken() that looks kind of scary (e.g. the
 current-line-number stuff, the default-channel stuff, etc.), and if I
 override it I'm really not confident I'll do it correctly.  I'm also unsure
 how mTokens(), emit(), and nextToken() cooperate with their member
 variables.
 
 I tried this simple-minded implementation, and started getting out-of-bounds
 exceptions:
 
 @lexer::members {
     List<Token> tokBuf = new ArrayList<Token>();
     public Token nextToken() {
         while (tokBuf.isEmpty()) {
             emit();
         }
         return tokBuf.remove(0);
     }
     public void emit(Token token) {
         tokBuf.add(token);
     }
 }
 
 
 So if someone does have a working example, I'd love to see it!
 

