[il-antlr-interest: 28890] Re: [antlr-interest] help please
On Wed, May 19, 2010 at 2:10 AM, Ernesto Castillo hcast...@rocketmail.com wrote:

> Hello everybody, my name is Ernesto and I am asking for help with ANTLR programming. I am a newbie at this. I am in the second semester of my master's, 12 years after finishing my degree in computer science; because of circumstances I never worked in the computer field, but I am planning to get into it. This semester I am taking compilers, and my first programming assignment went really badly because I am not clear on how to put Java together with ANTLR. I know how Java works because I took Java in my first semester and used it with Eclipse. I think I have installed ANTLR 3.2 properly, but I do not know whether I also have to install ANTLRWorks for the IDE. I tried to write the main Java class in Eclipse to invoke ANTLR, but it never worked. I feel lost at sea, and the ANTLR book I have looks like it covers the old version. My computer is a Mac. I would appreciate the help, thanks.

Scott Stanchfield has written some excellent video tutorials starting from the very basics (setting up ANTLR with Eclipse). Have a look at them: http://javadude.com/articles/antlr3xtut/

Kind regards,
Bart Kiers.

List: http://www.antlr.org/mailman/listinfo/antlr-interest
Unsubscribe: http://www.antlr.org/mailman/options/antlr-interest/your-email-address
--
You received this message because you are subscribed to the Google Groups il-antlr-interest group. To post to this group, send email to il-antlr-inter...@googlegroups.com. To unsubscribe from this group, send email to il-antlr-interest+unsubscr...@googlegroups.com. For more options, visit this group at http://groups.google.com/group/il-antlr-interest?hl=en.
[il-antlr-interest: 28892] [antlr-interest] The Java Method that Generates the Lexer and the Parser
Dear All,

I am a Java developer using ANTLR 1.3.1. I am working in a dynamic environment, so my grammar changes over time as the vocabulary keeps changing. I was therefore thinking of generating my *.g grammar file automatically instead of writing it myself.

The problem I now face is that I cannot find the runtime method that takes the grammar file as input and generates the tokens file, the lexer .java file, and the parser .java file. In other words, I simply want the method that does exactly the same task as the Generate Code option in the Generate menu in ANTLR 1.3.1 :-)

Any help? Thanks in advance ;-)

--
Sameh W. Zaky
[il-antlr-interest: 28893] Re: [antlr-interest] The Java Method that Generates the Lexer and the Parser
Sorry, I meant ANTLRWorks 1.3.1.

On Wed, May 19, 2010 at 11:37 AM, Sameh W. Zaky sameh...@gmail.com wrote:

> I cannot find the runtime method that takes the grammar file as input and generates the tokens file, the lexer .java file, and the parser .java file. In other words, I simply want the method that does exactly the same task as the Generate Code option in the Generate menu in ANTLR 1.3.1 :-)

--
Sameh W. Zaky
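One way to approach the question above: ANTLR 3 ships its code generator as the class org.antlr.Tool, whose main method is what the command line runs, and it accepts -o to choose the output directory. The sketch below is only an illustration, not the ANTLRWorks implementation: it launches that class in a child JVM. The class name GrammarCodegen, the jar name antlr-3.2.jar, and the file names are assumptions made for the example. The same class can reportedly also be driven in-process, but that is not shown here.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: runs the ANTLR 3 command-line tool in a child JVM.
class GrammarCodegen {

    // Builds: java -cp <antlrJar> org.antlr.Tool -o <outputDir> <grammarFile>
    static List<String> buildCommand(String antlrJar, String grammarFile, String outputDir) {
        List<String> cmd = new ArrayList<>();
        cmd.add("java");
        cmd.add("-cp");
        cmd.add(antlrJar);          // assumed location of the complete ANTLR 3 jar
        cmd.add("org.antlr.Tool");  // ANTLR 3 command-line entry point
        cmd.add("-o");
        cmd.add(outputDir);         // where the generated files should go
        cmd.add(grammarFile);
        return cmd;
    }

    // Runs the tool and returns its exit code; output goes to this process's console.
    static int generate(String antlrJar, String grammarFile, String outputDir) {
        try {
            Process p = new ProcessBuilder(buildCommand(antlrJar, grammarFile, outputDir))
                    .inheritIO()
                    .start();
            return p.waitFor();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for ANTLR", e);
        }
    }

    public static void main(String[] args) {
        // File names here are placeholders for illustration only.
        System.out.println(buildCommand("antlr-3.2.jar", "Expr.g", "generated"));
    }
}
```

Calling generate(...) after each rewrite of the .g file would regenerate the lexer and parser sources, which then still need to be compiled.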
[il-antlr-interest: 28894] Re: [antlr-interest] Skip subtree in tree grammar
Hello list,

Did someone solve this? I have a similar problem with a grammar I took from this list (Eval.g and Simple.g). It concerns the wildcard (.):

    ifElse
    scope { bool expResult; }
        : ^( IFTHEN b = expression { $ifElse::expResult = b; }
             ( {$ifElse::expResult == true}?=> actionSequence
             | .   // if expResult == false, no action required but eat the token
             )
           )
        | ^( IFTHENELSE b = expression { $ifElse::expResult = b; }
             ( {$ifElse::expResult == true}? actionSequence .   // if expResult == true, call the 'then' action and 'eat' the else action
             | . actionSequence   // if expResult == false, 'eat' the 'then' action and call the else action
             )
           )
        ;

On nested statements this fails to throw away the 'false' part of the tree. How can I fix that?

Kind regards,
Jan

On 7-5-2009 20:38, Martijn Reuvers wrote:

> Hello!
>
> I tried it, but neither works. :/ I ran it against a snapshot of the 3.1.4 runtime that I built with maven (3.1.3 has the same errors, btw).
>
> The skip option says when run:
> * Wildcard invalid as root; wildcard can itself be a tree.
>
> As for the | * option, it still has a similar error as before:
> * node from after line 22:12 no viable alternative at input 'DOWN'.
>
> This is what I have for the |* --
>
>     bool_function_content[Boolean value]
>     scope { Boolean t; }
>     @init { $bool_function_content::t = $value; }
>         : {$bool_function_content::t != null && $bool_function_content::t.booleanValue() }?=> function_content*
>         | .*
>         ;
>
> Any thoughts?
>
> Martijn
[il-antlr-interest: 28895] Re: [antlr-interest] SKIP() vs skip() in 'C' runtime
Why?

    :s/skip\(\)/SKIP()/g

However, it is a macro defined in the generated code, so all you need to do is:

    #define skip() SKIP()

in an @section that follows the macro definition of SKIP.

Jim

-----Original Message-----
From: antlr-interest-boun...@antlr.org [mailto:antlr-interest-boun...@antlr.org] On Behalf Of Alan Condit
Sent: Tuesday, May 18, 2010 9:42 PM
To: antlr-interest@antlr.org
Subject: [antlr-interest] SKIP() vs skip() in 'C' runtime

Where is the code for SKIP() found in the 'C' runtime? I had SKIP() in the C version of my parser, then I had to move to Java to find some bugs in my grammar. There I had to change SKIP() to skip(). Now I am going back to 'C', but I would like to change the 'C' runtime so that it will accept the lowercase skip().

Thanks,
Alan
---
Alan Condit
1085 Tierra Ct.
Woodburn, OR 97071
Email -- acon...@ipns.com
Home-Office (503) 982-0906
[il-antlr-interest: 0] Re: [antlr-interest] another question about custom lexer
Well, what language are you talking about? What are you trying to achieve? Why do you think you need a custom lexer?

http://perl.plover.com/Questions.html

Jim

-----Original Message-----
From: antlr-interest-boun...@antlr.org [mailto:antlr-interest-boun...@antlr.org] On Behalf Of ante...@freemail.hu
Sent: Wednesday, May 19, 2010 1:59 AM
To: antlr-interest@antlr.org
Subject: [antlr-interest] another question about custom lexer

Hi,

I have a hand-made lexer that returns tokens. Let us say it has a function

    string getnexttoken(int tokentype);

How would you plug that into ANTLR?

Thanks.
Marton Papp
[il-antlr-interest: 28897] [antlr-interest] Input buffer instead of reading the whole file
Hi,

A back-breaker question: is it possible, under the circumstances below, to have the input file read in blocks (say, 8 KB) instead of reading the whole file into memory?

* I'll be writing actions for every rule (not using ANTLR's AST).
* Once the actions are processed, the input history is not used.

Reason: some source files are 800 MB to 1.4 GB in size, and reading the entire thing into a 32-bit address space doesn't leave much left over.

If it's possible to limit the input buffer size, can you point me in the right direction?

Thanks,
Bob
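The block-wise reading asked about above can be pictured with a small sliding window over a Reader. This is only a sketch of the buffering idea under stated assumptions: the class SlidingCharWindow and its methods are hypothetical, not part of ANTLR, and a real ANTLR 3 input stream would also have to support the mark/rewind operations used for backtracking, which limits how early consumed characters can be dropped.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Arrays;

// Hypothetical character window: keeps only unconsumed characters in memory.
class SlidingCharWindow {
    private final Reader in;
    private char[] buf;
    private int start = 0;       // index of the next unconsumed char in buf
    private int end = 0;         // one past the last valid char in buf
    private boolean eof = false;

    SlidingCharWindow(Reader in, int blockSize) {
        this.in = in;
        this.buf = new char[blockSize];
    }

    // Char i positions ahead (1-based, like LA(1)), or -1 at end of input.
    int la(int i) {
        while (end - start < i && !eof) {
            fill();
        }
        return (end - start >= i) ? buf[start + i - 1] : -1;
    }

    // Drops the current character; it can no longer be revisited.
    void consume() {
        if (la(1) != -1) {
            start++;
        }
    }

    // Number of characters currently held in memory.
    int bufferedChars() {
        return end - start;
    }

    private void fill() {
        if (start > 0) {
            // Slide the unconsumed tail to the front to free space.
            System.arraycopy(buf, start, buf, 0, end - start);
            end -= start;
            start = 0;
        }
        if (end == buf.length) {
            // Requested lookahead is larger than the block: grow the buffer.
            buf = Arrays.copyOf(buf, buf.length * 2);
        }
        try {
            int n = in.read(buf, end, buf.length - end);
            if (n < 0) {
                eof = true;
            } else {
                end += n;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        SlidingCharWindow w = new SlidingCharWindow(new StringReader("a+b"), 2);
        while (w.la(1) != -1) {
            System.out.print((char) w.la(1));
            w.consume();
        }
        System.out.println();
    }
}
```

With an 8 KB block size and small lookahead, memory use stays near one block regardless of file size; the trade-off is that anything already consumed is gone.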
[il-antlr-interest: 28898] [antlr-interest] Token lin lexer
I'm one day into ANTLR and hope for an answer to this. With an identifier rule (for example this one):

    SIMPLE_IDENTIFIER
        : ( 'a'..'z'|'A'..'Z'|'_' ) ( 'a'..'z'|'A'..'Z'|'_'|'0'..'9'|'$')*
        ;

is it possible, when the lexer recognizes the input to be a SIMPLE_IDENTIFIER, to add some extra code that would look up the SIMPLE_IDENTIFIER and possibly return a different token, thus directing the parser to different grammar rules?

Take this expression for example:

    ( V(n1)/r1 + Func(arg1) )

where the semantics of V(n1) are more akin to n1->V than to a function call to V with argument n1. I'd like to capture the V(n1) during parsing and make it an n1->V node instead of a function call node.

Using flex this is easy: once the identifier string is matched, it can be used in a lookup to determine the token type, which is then fed to bison. So, can ANTLR let me switch the token type at the lexical level before the parser gets hold of it?

Hope this makes sense!
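The flex-style lookup described above can be sketched as a plain Java table that maps identifier text to a token type. In an ANTLR 3 lexer rule, an action at the end of SIMPLE_IDENTIFIER can, as far as I know, assign the result to $type (for example `{ $type = table.resolve($text); }`); treat that wiring as an assumption to verify. The class IdentifierTypeTable and the numeric token types below are hypothetical stand-ins for the constants a generated lexer would define.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical lookup table: decides which token type an identifier should get.
class IdentifierTypeTable {
    // Placeholder values; a generated lexer would supply the real constants.
    static final int SIMPLE_IDENTIFIER = 4;
    static final int NODE_ACCESSOR = 5;

    private final Map<String, Integer> special = new HashMap<>();

    // Registers an identifier that should be reported with a different type.
    void define(String text, int tokenType) {
        special.put(text, tokenType);
    }

    // Returns the registered type, or SIMPLE_IDENTIFIER for ordinary names.
    int resolve(String text) {
        return special.getOrDefault(text, SIMPLE_IDENTIFIER);
    }

    public static void main(String[] args) {
        IdentifierTypeTable table = new IdentifierTypeTable();
        table.define("V", NODE_ACCESSOR);
        System.out.println("V    -> " + table.resolve("V"));
        System.out.println("Func -> " + table.resolve("Func"));
    }
}
```

With the token type switched in the lexer, the parser can have one rule for NODE_ACCESSOR '(' ... ')' and another for ordinary function calls.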
[il-antlr-interest: 28900] [antlr-interest] null pointer to ADAPTOR->setTokenBoundaries
Help!!!

I am getting a null pointer to setTokenBoundaries in the following line of generated code:

    ADAPTOR->setTokenBoundaries(ADAPTOR, retval.tree, retval.start, retval.stop);

The grammar works under Java. In moving it back to 'C', I changed the language option to 'C', added the option ASTLabelType=pANTLR3_BASE_TREE; and added the necessary includes to compile and link under Objective-C. Is there anything obvious that I am doing wrong?

Thanks,
Alan
---
Alan Condit
1085 Tierra Ct.
Woodburn, OR 97071
Email -- acon...@ipns.com
Home-Office (503) 982-0906
[il-antlr-interest: 28901] Re: [antlr-interest] enums in v4 ANTLR Java code generation considered useless
On May 18, 2010, at 2:58 PM, Scott Stanchfield wrote:

> There are several advantages to enums:
> * there is a discrete set of values that can be used (no accidental 42's passed in when 42 isn't a token type)
> * the enum value can carry extra information
> * the enum values can override methods differently

These are all excellent advantages. I believe that these mostly apply when you're writing code, not generating. Just like the compiler generates integers underneath, if antlr is generating integers, it's probably okay.

> OH - one of the things that's clouding this is that you really don't need the numeric type identifiers anymore. You can just have
>
>     public enum TokenType { IDENT, INT ...; }
>
> then in your match method:
>
>     void match(TokenType type) { if (LA(1).getType() == type) { ... } }

The only problem is that match() lives up in the superclass in the library, but the generated parser needs to define the enum. I also have the problem that I need to merge token types from multiple grammars for grammar imports. This gets more complicated with enum types without inheritance.

> And you can use the types in a switch statement:
>
>     switch(type) { case INT: case IDENT: ... }
>
> No more magic numbers! Woohoo!

ANTLR already uses the labels when possible, such as INT. If you use a literal in your grammar such as ';' and don't label it in the lexer, then I have no choice but to generate the integer token type or a weird label like TOKEN34.

Ter
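The enum-based match() and switch quoted above can be put together as a compilable sketch. It is an illustration only: the class EnumTokenDemo is hypothetical, the token set is invented, and LA(i) is simplified to return the token type directly instead of a token object with getType().

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical mini-parser that uses an enum instead of int token types.
class EnumTokenDemo {
    // Invented token set; a real one would come from the grammar.
    enum TokenType { IDENT, INT, SEMI }

    private final List<TokenType> input;
    private int p = 0; // index of the next unconsumed token

    EnumTokenDemo(List<TokenType> input) {
        this.input = input;
    }

    // Type of the i-th lookahead token (1-based), or null past the end.
    TokenType LA(int i) {
        int idx = p + i - 1;
        return idx < input.size() ? input.get(idx) : null;
    }

    // Consumes the next token if it has the expected type, otherwise fails.
    void match(TokenType type) {
        if (LA(1) != type) {
            throw new IllegalStateException("expected " + type + " but found " + LA(1));
        }
        p++;
    }

    // Enum constants can be used directly as case labels.
    static String describe(TokenType type) {
        switch (type) {
            case INT:
                return "integer literal";
            case IDENT:
                return "identifier";
            default:
                return type.name();
        }
    }

    public static void main(String[] args) {
        EnumTokenDemo parser = new EnumTokenDemo(
                Arrays.asList(TokenType.IDENT, TokenType.INT, TokenType.SEMI));
        parser.match(TokenType.IDENT);
        parser.match(TokenType.INT);
        System.out.println("next token: " + describe(parser.LA(1)));
    }
}
```

Because match() only accepts a TokenType, passing a stray integer such as 42 is a compile-time error, which is the first advantage listed in the quoted message.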
[il-antlr-interest: 28902] Re: [antlr-interest] enums in v4 ANTLR Java code generation considered useless
You can still define the match in the superclass -- just use an interface like Edgar mentioned and I demonstrated in the clarification note I sent.

I think the big value here would be that it forces every place that uses the token types to use the enum names (as there are no integer values). I think that would help debugging enormously (rather than seeing '4' as the value in the variables window, you'd see 'IDENT').

-- Scott

Scott Stanchfield
http://javadude.com

On Wed, May 19, 2010 at 2:34 PM, Terence Parr pa...@cs.usfca.edu wrote:

> The only problem is that match() lives up in the superclass in the library, but the generated parser needs to define the enum. I also have the problem that I need to merge token types from multiple grammars for grammar imports. This gets more complicated with enum types without inheritance.
>
> ANTLR already uses the labels when possible, such as INT. If you use a literal in your grammar such as ';' and don't label it in the lexer, then I have no choice but to generate the integer token type or a weird label like TOKEN34.
>
> Ter
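The interface approach mentioned in the reply above might look roughly like the following. The clarification note it refers to is not part of this archive, so this is a guess at the general shape rather than a reconstruction of it: the names TokenType, BaseParser, FooTokens, and FooParser are hypothetical. The library-side superclass only knows the interface, while each generated parser supplies its own enum.

```java
import java.util.Arrays;
import java.util.List;

// "Library" side: knows nothing about any particular grammar.
interface TokenType {
    String name(); // every enum already provides this method
}

abstract class BaseParser {
    private final List<? extends TokenType> input;
    private int p = 0; // index of the next unconsumed token

    BaseParser(List<? extends TokenType> input) {
        this.input = input;
    }

    // Type of the i-th lookahead token (1-based), or null past the end.
    TokenType LA(int i) {
        int idx = p + i - 1;
        return idx < input.size() ? input.get(idx) : null;
    }

    // Defined once in the superclass, yet works with any generated enum.
    void match(TokenType expected) {
        TokenType actual = LA(1);
        if (actual != expected) {
            throw new IllegalStateException("expected " + expected.name()
                    + " but found " + (actual == null ? "end of input" : actual.name()));
        }
        p++;
    }
}

// "Generated" side: the grammar-specific enum and parser.
enum FooTokens implements TokenType { IDENT, INT, LITERAL_1 }

class FooParser extends BaseParser {
    FooParser(List<? extends TokenType> input) {
        super(input);
    }

    // Hypothetical rule: IDENT INT ';'
    void statement() {
        match(FooTokens.IDENT);
        match(FooTokens.INT);
        match(FooTokens.LITERAL_1);
    }

    public static void main(String[] args) {
        new FooParser(Arrays.asList(FooTokens.IDENT, FooTokens.INT, FooTokens.LITERAL_1))
                .statement();
        System.out.println("statement matched");
    }
}
```

One consequence of this split is that token types from two imported grammars would be two unrelated enums behind the same interface, which is the merging concern raised in the thread.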
[il-antlr-interest: 28903] Re: [antlr-interest] enums in v4 ANTLR Java code generation considered useless
On May 19, 2010, at 11:39 AM, Scott Stanchfield wrote:

> You can still define the match in the superclass -- just use an interface like Edgar mentioned and I demonstrated in the clarification note I sent.

Oh, right.

> I think the big value here would be that it forces every place that uses the token types to use the enum names (as there are no integer values). I think that would help debugging enormously (rather than seeing '4' as the value in the variables window, you'd see 'IDENT').

What about the ';' token? What's its label?

T
[il-antlr-interest: 28904] [antlr-interest] company looking for 2 ANTLR developers
Hi,

A recruiter in NYC has 2 positions to fill for a client: full-time and paying anywhere from $100k to $120k.

Contact info:

Hamilton Daza
Intrigue Systems, Inc.
7211 Austin Street #259
Forest Hills, NY 11375
800.809.0318 Main
917.699.3376 Mobile
718.841.7091 Fax
hdaza at intriguesys.com

Ter
[il-antlr-interest: 28905] Re: [antlr-interest] enums in v4 ANTLR Java code generation considered useless
Hmmm... that's evil, ya know that ;) Good to catch that now, though...

Probably LITERAL_1, LITERAL_2, etc. To make it easier for debugging/printing/reporting you could add a pattern property (hmmm... the more I think about it the more I like it... if there's a description it could be printed with the error message, otherwise the pattern; both could be useful for other purposes):

    public enum FooParserTokens implements TokenType {
        IDENT("('a'..'z')('a'..'z'|'A'..'Z')*", "an identifier ..."),
        LITERAL_1(";", null),
        LITERAL_2("+", null);

        private String pattern;
        private String description;

        private FooParserTokens(String pattern, String description) {
            this.pattern = pattern;
            this.description = description;
        }
    }

-- Scott

Scott Stanchfield
http://javadude.com

On Wed, May 19, 2010 at 2:42 PM, Terence Parr pa...@cs.usfca.edu wrote:

> What about the ';' token? What's its label?
>
> T
[il-antlr-interest: 28907] Re: [antlr-interest] enums in v4 ANTLR Java code generation considered useless
I also have doubts about the performance characteristics and the possibility of starting to rely on the target language to fill in gaps such as token numbering. We could get to the point where code generators cannot be built for more primitive languages because the schema relies on the language to do things automatically. The generated code should be as primitive as possible, with the runtime being as maintainable and clear as possible while not sacrificing performance.

Jim

-----Original Message-----
From: antlr-interest-boun...@antlr.org [mailto:antlr-interest-boun...@antlr.org] On Behalf Of Terence Parr
Sent: Wednesday, May 19, 2010 11:35 AM
To: antlr-interest interest
Subject: Re: [antlr-interest] enums in v4 ANTLR Java code generation considered useless

These are all excellent advantages. I believe that these mostly apply when you're writing code, not generating. Just like the compiler generates integers underneath, if antlr is generating integers, it's probably okay.

The only problem is that match() lives up in the superclass in the library, but the generated parser needs to define the enum. I also have the problem that I need to merge token types from multiple grammars for grammar imports. This gets more complicated with enum types without inheritance.

ANTLR already uses the labels when possible, such as INT. If you use a literal in your grammar such as ';' and don't label it in the lexer, then I have no choice but to generate the integer token type or a weird label like TOKEN34.

Ter
[il-antlr-interest: 28908] Re: [antlr-interest] enums in v4 ANTLR Java code generation considered useless
Interesting point re common code generation approaches, but as far as performance goes, it's equivalent: all == tests are done using pointers, which are the same size as ints. If switch is used, the ordinal values of the enums are used, and the Java compiler may be able to better optimize which switch bytecode is used because it knows the exact possible range of values.

I'd much rather use enums where available, though. I'd think any code generator could generate a simple int equivalent where enums don't exist. The only gotcha would be if we had the pattern/description properties, which would have to be represented as separate arrays in most languages. They aren't necessary, though (but I'd love to have them).

-- Scott

Scott Stanchfield
http://javadude.com

On Wed, May 19, 2010 at 3:04 PM, Jim Idle j...@temporal-wave.com wrote:

> I also have doubts about the performance characteristics and the possibility of starting to rely on the target language to fill in gaps such as token numbering. We could get to the point where code generators cannot be built for more primitive languages because the schema relies on the language to do things automatically. The generated code should be as primitive as possible, with the runtime being as maintainable and clear as possible while not sacrificing performance.
>
> Jim
[il-antlr-interest: 28909] Re: [antlr-interest] enums in v4 ANTLR Java code generation considered useless
On Wed, May 19, 2010 at 2:13 PM, Scott Stanchfield sc...@javadude.com wrote:

Interesting point re common code generation approaches, but as far as performance goes, it's equivalent - all == tests are done using pointers, which are the same size as ints. If switch is used the ordinal values of the enums are used, and the Java compiler may be able to better optimize which switch bytecode is used b/c it knows the exact possible range of values.

That's true of most full-scale JVMs with a good JIT, but for many embedded VMs it isn't true. See the Dalvik VM for Android. This link, for instance:

http://developer.android.com/guide/practices/design/performance.html#avoid_enums

I believe it is becoming less true as time goes along, but from what I know right now it is true. If you can't support generating both, I'd agree with Jim Idle: support the one that will go everywhere. If, however, you could treat it like the C target does with using switch vs. if/else, I'd think that'd be nifty. Doubly so because the maintenance burden is free when somebody else is doing the work.

As this affects the external API, I would assume that it's a non-option to generate one or the other. I'd much rather use enums where available, though. I'd think any code generator could generate a simple int equivalent where enums don't exist, though. The only gotcha would be if we had the pattern/description properties, which would have to be represented as separate arrays in most languages. They aren't necessary though (but I'd love to have them)

-- Scott
Scott Stanchfield
http://javadude.com

On Wed, May 19, 2010 at 3:04 PM, Jim Idle j...@temporal-wave.com wrote:

I also have doubts about the performance characteristics and the possibility of starting to rely on the target language to fill in gaps such as token numbering - we could get to the point where code generators cannot be built for more primitive languages because the schema is relying on the language to automatically do things.

The generated code should be as primitive as possible, with the runtime being as maintainable and clear as possible while not sacrificing performance.

Jim

-----Original Message-----
From: antlr-interest-boun...@antlr.org [mailto:antlr-interest-boun...@antlr.org] On Behalf Of Terence Parr
Sent: Wednesday, May 19, 2010 11:35 AM
To: antlr-interest interest
Subject: Re: [antlr-interest] enums in v4 ANTLR Java code generation considered useless

On May 18, 2010, at 2:58 PM, Scott Stanchfield wrote:

There are several advantages to enums:

* there is a discrete set of values that can be used (no accidental 42's passed in when 42 isn't a token type)
* the enum value can carry extra information
* the enum values can override methods differently

These are all excellent advantages. I believe that these mostly apply when you're writing code, not generating. Just like the compiler generates integers underneath, if antlr is generating integers, it's probably okay.

OH - one of the things that's clouding this is that you really don't need the numeric type identifiers anymore. You can just have

    public enum TokenType { IDENT, INT ...; }

then in your match method:

    void match(TokenType type) {
        if (LA(1).getType() == type) {
            ...
        }
    }

The only problem is that match() lives up in the superclass in the library but the generated parser needs to define the enum. I also have the problem that I need to merge token types from multiple grammars for grammar imports. This gets more complicated with enum types without inheritance.

And you can use the types in a switch statement:

    switch (type) {
        case INT:
        case IDENT:
            ...
    }

No more magic numbers! Woohoo!

ANTLR already uses the labels when possible, such as INT. If you use a literal in your grammar such as ';' and don't label it in the lexer, then I had no choice but to generate the integer token type or a weird label like TOKEN34.

Ter
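
For readers following the thread, the enum-based match() and switch idea quoted above can be sketched as a small runnable program. This is only an illustration, not ANTLR's runtime: the class names EnumTokenSketch, Token and Parser, the SEMI and EOF token types, and the describe() helper are all hypothetical. It also sidesteps the problem Ter raises by putting match() and the enum in the same class, so it says nothing about how a library superclass would see a generated enum.

```java
import java.util.Arrays;
import java.util.List;

public class EnumTokenSketch {
    // Hypothetical token types; a generated parser would define its own set.
    enum TokenType { IDENT, INT, SEMI, EOF }

    // Hypothetical minimal token: a type plus the matched text.
    static final class Token {
        final TokenType type;
        final String text;

        Token(TokenType type, String text) {
            this.type = type;
            this.text = text;
        }

        TokenType getType() { return type; }
    }

    // Hypothetical stand-in for a parser walking a fixed token list.
    static final class Parser {
        private final List<Token> tokens;
        private int pos = 0;

        Parser(List<Token> tokens) { this.tokens = tokens; }

        // Look ahead i tokens (1-based), clamped to the last token.
        Token LA(int i) {
            int index = Math.min(pos + i - 1, tokens.size() - 1);
            return tokens.get(index);
        }

        // Only a TokenType can be passed in, so no accidental 42.
        void match(TokenType type) {
            Token t = LA(1);
            if (t.getType() != type) {
                throw new IllegalStateException(
                    "expected " + type + " but found " + t.getType());
            }
            if (pos < tokens.size() - 1) {
                pos++; // consume, but never step past the final token
            }
        }
    }

    // Switching on the enum uses labels rather than magic numbers.
    static String describe(TokenType type) {
        switch (type) {
            case INT:
            case IDENT:
                return "operand";
            case SEMI:
                return "terminator";
            default:
                return "other";
        }
    }

    public static void main(String[] args) {
        Parser p = new Parser(Arrays.asList(
            new Token(TokenType.IDENT, "x"),
            new Token(TokenType.SEMI, ";"),
            new Token(TokenType.EOF, "")));
        p.match(TokenType.IDENT);
        p.match(TokenType.SEMI);
        System.out.println(describe(TokenType.INT));
    }
}
```

Passing the wrong type to match() throws, which is the "discrete set of values" advantage in action; merging token types across imported grammars, the other concern in the thread, is not addressed by this sketch.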
[il-antlr-interest: 28911] Re: [antlr-interest] enums in v4 ANTLR Java code generation considered useless
Don't pre-optimize for things like this. Profile, then optimize. This won't even show up as an issue.

I think whoever wrote that page was daydreaming about any minor way performance might be increased - note that they don't talk at all on that page about the big performance issues (I/O, networking, etc.), though I do like that they talk about limiting object creation. With the example they show on that Android dev page, you'll never see/feel the difference.

And their example on grabbing the ordinal value so you don't need to look up a static field is really silly. If they just want to avoid looking up the static field every time through the loop, don't do:

    int valX = MyEnum.VAL_X.ordinal();
    int valY = MyEnum.VAL_Y.ordinal();
    int count = list.size();
    MyItem[] items = list.items();
    for (int n = 0; n < count; n++) {
        int valItem = items[n].e.ordinal();
        if (valItem == valX) {
            // do stuff 1
        } else if (valItem == valY) {
            // do stuff 2
        }
    }

instead do:

    MyEnum valX = MyEnum.VAL_X;
    MyEnum valY = MyEnum.VAL_Y;
    int count = list.size();
    MyItem[] items = list.items();
    for (int n = 0; n < count; n++) {
        MyEnum valItem = items[n].e;
        if (valItem == valX) {
            // do stuff 1
        } else if (valItem == valY) {
            // do stuff 2
        }
    }

Stuff like that makes me think whoever wrote that really didn't think it through all the way. The pointer comparison is the same expense as the int comparison and avoids n+2 calls to ordinal() in their example code.

Moreover, the suggestion to use constants that the compiler will inline is truly evil. Compiler constant inlining can very easily lead to incorrect constant values when a library (that provides a constant) changes (new jar dropped in with a new value for the constant) but the code using that library isn't recompiled: the old value stays baked into the calling class. Safety issue.

If this becomes an issue (which I doubt it will), someone can always extend the code generator to tweak it.
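
As a runnable illustration of the two loops above, here is a minimal sketch showing that comparing hoisted enum references with == gives the same answer as comparing hoisted ordinals. The class EnumCompareSketch, the VAL_Z constant, the MyItem constructor and the counting helpers are assumptions made for the example; the Android page's list type is replaced by a plain array.

```java
import java.util.Arrays;

public class EnumCompareSketch {
    enum MyEnum { VAL_X, VAL_Y, VAL_Z }

    // Hypothetical item type carrying one enum field, as in the quoted loop.
    static final class MyItem {
        final MyEnum e;

        MyItem(MyEnum e) { this.e = e; }
    }

    // Variant from the Android page: hoist ordinals, call ordinal() per item.
    static int[] countByOrdinal(MyItem[] items) {
        int valX = MyEnum.VAL_X.ordinal();
        int valY = MyEnum.VAL_Y.ordinal();
        int[] counts = new int[2];
        for (int n = 0; n < items.length; n++) {
            int valItem = items[n].e.ordinal();
            if (valItem == valX) {
                counts[0]++;
            } else if (valItem == valY) {
                counts[1]++;
            }
        }
        return counts;
    }

    // Variant suggested in this message: hoist the references, compare with ==.
    static int[] countByReference(MyItem[] items) {
        MyEnum valX = MyEnum.VAL_X;
        MyEnum valY = MyEnum.VAL_Y;
        int[] counts = new int[2];
        for (int n = 0; n < items.length; n++) {
            MyEnum valItem = items[n].e;
            if (valItem == valX) {
                counts[0]++;
            } else if (valItem == valY) {
                counts[1]++;
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        MyItem[] items = {
            new MyItem(MyEnum.VAL_X), new MyItem(MyEnum.VAL_Y),
            new MyItem(MyEnum.VAL_X), new MyItem(MyEnum.VAL_Z)
        };
        // Both variants count two VAL_X items and one VAL_Y item.
        System.out.println(
            Arrays.equals(countByOrdinal(items), countByReference(items)));
    }
}
```

Each enum constant is a single instance, so == on references is a valid equality test here; the sketch shows equivalence of results only and makes no claim about relative speed on any particular VM.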
-- Scott
Scott Stanchfield
http://javadude.com

On Wed, May 19, 2010 at 3:59 PM, Kirby Bohling kirby.bohl...@gmail.com wrote:

That's true of most full scale JVMs with good JIT, but for many embedded VM's that isn't true. See the Dalvik VM for Android. This link for instance:

http://developer.android.com/guide/practices/design/performance.html#avoid_enums

I believe it is becoming less true as time goes along, but from what I know right now it is true.
[il-antlr-interest: 28912] Re: [antlr-interest] null pointer to ADAPTOR-setTokenBoundaries
Jim,

Here is what I have set in options:

    options {
        backtrack    = true;
        memoize      = true;
        language     = C;
        output       = AST;
        ASTLabelType = pANTLR3_BASE_TREE;
    }

The null is inside 'ctx' inside 'adaptor' at 'setTokenBoundaries'. It is inside a function

    /**
     * $ANTLR start line
     * /Users/acondit/source/GCCnv/LatheBranch/trunk/Parser/RS274ngc.g:184:1: line : ( ( line_number )? ( segment )+ K_NEWLINE -> ^( STMT ( segment )+ ) | ( line_number )? K_NEWLINE -> | oword_stmt -> ^( STMT oword_stmt ) );
     */
    static RS274ngcParser_line_return line(pRS274ngcParser ctx)
    {
        ...
    }

which I assume, based on the comment, is generated from this rule:

    line
        : line_number? segment+ K_NEWLINE -> ^(STMT segment+)
        | line_number? K_NEWLINE ->
        | oword_stmt -> ^(STMT oword_stmt)
        ;

The grammar is for parsing an existing language, not one of my invention. Grammatically the newlines delineate a semantic block and therefore must be known by the parser, but empty lines are discarded and therefore should not be in the tree.

Alan
---
Alan's MachineWorks
1085 Tierra Ct.
Woodburn, OR 97071
Email -- acon...@alansmachineworks.com
www.alansmachineworks.com

Jim wrote--

Please post more information about your grammar, what the null pointer is, etc. It is hard to interpolate, but the common mistake is not adding output=AST; to the options, so you do not get a tree adaptor created.

Jim

-----Original Message-----
From: antlr-interest-bounces at antlr.org [mailto:antlr-interest-bounces at antlr.org] On Behalf Of Alan Condit
Sent: Wednesday, May 19, 2010 11:25 AM
To: antlr-interest at antlr.org
Subject: [antlr-interest] null pointer to ADAPTOR-setTokenBoundaries

Help!!! I am getting a null pointer to setTokenBoundaries in the following line of generated code.

    ADAPTOR->setTokenBoundaries(ADAPTOR, retval.tree, retval.start, retval.stop);

The grammar works under Java. In moving it back to 'C', I changed the language option to 'C', added option ASTLabelType=pANTLR3_BASE_TREE; and added the necessary includes to compile and link under Objective-C.

Is there anything obvious that I am doing wrong?

Thanks,
Alan
[il-antlr-interest: 28914] [antlr-interest] Referencing attributes
Greetings,

I'm an ANTLR noob, and have a question regarding accessing attributes. Where, outside of an action, can you reference attributes? One place seems to be as a parameter to a rule invocation, like this:

    decl : type declarator[ $type.text ] ';' ;

This is from The Definitive ANTLR Reference, page 119. Is that true in general? Are there other locations outside of actions where attributes can be accessed?

As noted, I am a noob to ANTLR and just joined this list. Please let me know if this email's question/topic is not appropriate to the list.

Thanks.
[il-antlr-interest: 28915] Re: [antlr-interest] null pointer to ADAPTOR-setTokenBoundaries
Pardon me for butting in. And I have never used the C code generator, but.

On Wed, 2010-05-19 at 14:06 -0700, Alan Condit wrote:

which I assume, based on the comment, is generated from this rule:

    line
        : line_number? segment+ K_NEWLINE -> ^(STMT segment+)
        | line_number? K_NEWLINE ->
        | oword_stmt -> ^(STMT oword_stmt)
        ;

The grammar is for parsing an existing language not one of my invention, and grammatically the newlines delineate a semantic block therefore must be known by the parser, but empty lines are discarded and therefore should not be in the tree.

Having an empty RHS of the -> rewrite operator feels, well, unusual. I am not sure that ANTLR permits a rule which produces no tree when output=AST is present.

Maybe try (untested):

    line
        : line_number? ( segment+ -> ^(STMT segment+) )? K_NEWLINE
        | oword_stmt -> ^(STMT oword_stmt)
        ;

but I do not know what would happen when no segment is present for the above rule.

Have you considered building a dummy tree node for the empty case, so that your tree walker can just ignore it?

Not sure that I have really helped any, sorry.

-jbb
[il-antlr-interest: 28916] Re: [antlr-interest] Question about building code generation target
On Jan 16 2009, 4:51 pm, Jim Idle j...@temporal-wave.com wrote:

When you change your template or codegen target java file, you just type:

    mvn

And it rebuilds just what has changed in a second or two (depends on your machine speed of course).

On my slow machine, this takes 33 seconds after changing one template file. However, once it's built, I can unjar to /path/to/antlr_unjarred, then

    export CLASSPATH=/path/to/antlr_unjarred:$CLASSPATH

and edit the templates without having to rebuild anything.

By the way, are there plans to integrate the build of the other runtimes into Maven?
[il-antlr-interest: 28918] Re: [antlr-interest] null pointer to ADAPTOR-setTokenBoundaries
On page 164 of The Definitive ANTLR Reference, under the heading "Omitting Input Elements", Terence shows using an empty rewrite rule to omit unneeded symbols from the output AST. That does not rule out the possibility that it causes a problem with the generated 'C' code, though.

Jim, is there a possibility that this is a problem?

Alan
---
Alan Condit
1085 Tierra Ct.
Woodburn, OR 97071
Email -- acon...@ipns.com
Home-Office (503) 982-0906

On May 19, 2010, at 3:36 PM, John B. Brodie wrote:

having an empty RHS of the -> rewrite operator feels well unusual. i am not sure that ANTLR permits a rule which produces no tree when output=AST is present
[il-antlr-interest: 28919] Re: [antlr-interest] null pointer to ADAPTOR-setTokenBoundaries
I think you will have to put those three productions in separate rules, but I will look into it more.

Jim

-----Original Message-----
From: antlr-interest-boun...@antlr.org [mailto:antlr-interest-boun...@antlr.org] On Behalf Of Alan Condit
Sent: Wednesday, May 19, 2010 2:06 PM
To: antlr-interest@antlr.org
Subject: Re: [antlr-interest] null pointer to ADAPTOR-setTokenBoundaries

The null is inside 'ctx' inside 'adaptor' at 'setTokenBoundaries'. [...] which I assume, based on the comment, is generated from this rule:

    line
        : line_number? segment+ K_NEWLINE -> ^(STMT segment+)
        | line_number? K_NEWLINE ->
        | oword_stmt -> ^(STMT oword_stmt)
        ;
[il-antlr-interest: 28920] Re: [antlr-interest] C target - initialization of return/scope structures
Why would you try to use a return value that you have not set? If it is set to NULL then you will core dump unless you check for NULL, so it would not help you. The values are not initialized because I don't know what they are; they might be object references or something that cannot be set to NULL. I changed from assuming a nullable target because everyone complained ;-) But I assure you that you can initialize all your values in the @init{} section.

Where is it that you are having problems? I think that your question might not be the one you are asking.

Jim

-----Original Message-----
From: antlr-interest-boun...@antlr.org [mailto:antlr-interest-boun...@antlr.org] On Behalf Of Cristian Târşoagă
Sent: Wednesday, May 19, 2010 2:08 PM
To: antlr-interest@antlr.org
Subject: [antlr-interest] C target - initialization of return/scope structures

Hi All,

My name is Chris. I started to use antlr and I like it a lot! I use C++ and I have successfully used it to generate some source code. I need to use C++: I want std::string, std::vector and more things like this. But since I use the C target, it didn't take long to run into some quirks.

One of the problems I had/have is this: structures used for return values and those used for scope values are NOT initialized. Since I tried to use a std::string as a scoped value, I quickly got a nice crash since my string was created using malloc.

These are (well) known problems, I know that. I found some posts from other guys having the same problems. I also found some recommendations on how to avoid initialization problems. E.g.: http://www.mail-archive.com/il-antlr-inter...@googlegroups.com/msg02614.html

The hint there was to use pointers, and:

1. define ANTLR3_MALLOC / ANTLR3_FREE to override antlr's allocators, or
2. manually allocate/deallocate those pointers, probably inside @init and @after

I'd like to have a clean solution to this, but I can't see how either of these two options can properly work.

Option 1: I can't override the antlr allocator as suggested,

    #define ANTLR3_MALLOC(request) new request()

because ANTLR3_MALLOC is actually called with an argument which is the SIZE of the type that will be allocated and not the TYPE itself. I think a simple change inside antlr can fix this, but until then I tried the other way...

Option 2: I can't use @init and @after because this will create memory leaks. Imagine that I have a scoped value x. I would do @init {x = new X();} and @after{delete x;}. When the rule is fully matched, this works perfectly. But when the parser fails, the code that pops the scoped value from the stack is called (and my piece of code inside @after is skipped), so I will get a memory leak!! I noticed that the scoped values also have a free function pointer inside (member) that can take care of deallocation in those situations, but I couldn't find a way to set it. (?)

So:

- my suggestion: change the ANTLR3_MALLOC macro (change the name to ANTLR_ALLOC and change the impl to take as arg the type itself, so that a C++ impl could override it with 'new')
- my suggestion: generate a properly initialized structure (I know, it's C code, but still... once you have such a smart StringTemplate lib, this shouldn't be a problem)
- my question: what would be a clean way to allocate/deallocate pointers (without leaks)?

THANKS a lot for ANTLR and for your help!

Chris

PS: I have some other problems too with the C target: I wasn't able to use composite grammars with C++. I will get back on this later :-)
[il-antlr-interest: 28921] Re: [antlr-interest] enums in v4 ANTLR Java code generation considered useless
I suspect that your benchmark runs afoul of clock granularity issues for the JIT. If you run it a few times you will likely get different results. Also, you say 10% better for enums, but look at your results again. Take the client JIT; your first run gives:

    Enum Time: 25707993
    Int Time : 28520406

So enum is slightly better, but your second run gives:

    Enum Time: 34060167
    Int Time : 24820249

And Int time in this run is superior to your enum time by a far greater margin than the reverse in the first run. Your server shows a similar disparity. You have to run for much longer times and repeat many times, then average out, because the JIT does not always make the same decision. Unless there is something about your printouts that I am missing?

Finally, I would not trust 64-bit OpenJDK as far as I can throw my house :-)

Finally, finally, you need to look at switch() performance really, as ANTLR will use them (and does if you set the -X options to the same values as I use in the C generator). There tend to be a fair number of switch cases with some further embedded switches. The C optimizer will murder those, but the Java JIT has some opportunity to reorder the cases at runtime and theoretically it could do better than the C compiler for some use cases. It rarely does though, because of other overheads and the fact that most real world applications don't exhibit a polarization to one or two oft-used cases out of many. You can see that ANTLR generated code would only do this if, out of many alts, just one or two were taken a lot (which would depend on the language being parsed).
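
The advice above (run longer, repeat many times, then average) can be sketched as a small harness. This is a rough illustration under stated assumptions, not a rigorous benchmark: the class name RepeatedTimingSketch, the helper names, the iteration count, and the warm-up and trial counts are all made up for the example, and the JIT may still optimize loops this simple in ways that distort the numbers. For real measurements a dedicated harness such as OpenJDK's JMH is the safer choice.

```java
public class RepeatedTimingSketch {
    enum Parity { EVEN, ODD }

    // Keeps results reachable so the timed loops are not trivially dead code.
    static volatile Object sink;

    // Toggle an int between 0 and 1; returns the final value.
    static int toggleInts(long iterations) {
        int val = 0;
        for (long i = 0; i < iterations; i++) {
            val = (val == 0) ? 1 : 0;
        }
        return val;
    }

    // Toggle an enum between EVEN and ODD; returns the final value.
    static Parity toggleEnums(long iterations) {
        Parity val = Parity.EVEN;
        for (long i = 0; i < iterations; i++) {
            val = (val == Parity.EVEN) ? Parity.ODD : Parity.EVEN;
        }
        return val;
    }

    // Run untimed warm-ups first, then collect one sample per timed trial.
    static long[] time(Runnable body, int warmups, int trials) {
        for (int i = 0; i < warmups; i++) {
            body.run();
        }
        long[] samples = new long[trials];
        for (int i = 0; i < trials; i++) {
            long start = System.nanoTime();
            body.run();
            samples[i] = System.nanoTime() - start;
        }
        return samples;
    }

    // Arithmetic mean of the samples, in the same unit as the input.
    static double mean(long[] samples) {
        double sum = 0;
        for (long s : samples) {
            sum += s;
        }
        return sum / samples.length;
    }

    public static void main(String[] args) {
        final long iterations = 10_000_000L; // arbitrary; long enough to dwarf timer granularity
        long[] ints = time(() -> { sink = toggleInts(iterations); }, 5, 20);
        long[] enums = time(() -> { sink = toggleEnums(iterations); }, 5, 20);
        System.out.println("Int mean ns : " + mean(ints));
        System.out.println("Enum mean ns: " + mean(enums));
    }
}
```

Printing the individual samples alongside the mean would also show how much the runs vary, which is the point being made about the two -client results.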
Jim

-----Original Message-----
From: antlr-interest-boun...@antlr.org [mailto:antlr-interest-boun...@antlr.org] On Behalf Of Kirby Bohling
Sent: Wednesday, May 19, 2010 4:29 PM
To: Scott Stanchfield
Cc: antlr-interest interest
Subject: Re: [antlr-interest] enums in v4 ANTLR Java code generation considered useless

On topic, I think the only important decision to make is from an API perspective: while one can go tweak the generator, going from ints to enums would change the API. I'd suggest just deciding which one you want to support. Enums are definitely nicer from that perspective. Given the below performance benchmarks, and just how much of ANTLR's output is really just a series of if/else or switch blocks buried inside of a huge number of loops, I actually do think you'd spot the difference.

Moving well off-topic, but since you said to, I did just what you suggested. Using my personal laptop running Fedora 11 using x86_64 for the kernel and JVM:

    $ java -version
    java version 1.6.0_18
    OpenJDK Runtime Environment (IcedTea6 1.8) (fedora-35.b18.fc11-x86_64)
    OpenJDK 64-Bit Server VM (build 14.0-b16, mixed mode)

Both CPUs are Intel(R) Core(TM)2 Duo CPU P8600 @ 2.40GHz w/ 3MB cache.

These aren't spectacular benchmarks from an accuracy perspective, but they illustrate that assuming ints and enums have identical performance characteristics in all cases is an invalid assumption.

Using java -Xint Foo:

    Enum Time: 516121334
    Int Time : 424748884
    Enum Time: 514078841
    Int Time : 423574161

~21% performance hit to use enums with HotSpot disabled (similar to the Dalvik VM because it has minimal JIT as of right now, which I'm guessing is why the original article suggested you stay away from them near performance critical areas).

Using java -client Foo:

    Enum Time: 25707993
    Int Time : 28520406
    Enum Time: 34060167
    Int Time : 24820249

~10% speed up for using enums.

Using java -server Foo:

    Enum Time: 25543589
    Int Time : 28637110
    Enum Time: 32887612
    Int Time : 28968574

Again ~10% speed up for using enums. So there might actually be a reason to support enums internally from a speed/performance perspective if the non-JIT case is considered negligible. I thought they'd match your claim in this case. Didn't have any reason to actually think enums would be faster than ints.

-- Sample code:

    public class Foo {
        private static long MAX = 1000;

        public static void main(String[] args) {
            doEnums();
            doInts();
            doEnums();
            doInts();
        }

        public static void doInts() {
            int val = 0;
            long start = System.nanoTime();
            for (long iii = 0; iii < MAX; ++iii) {
                if (0 == val) {
                    val = 1;
                } else if (1 == val) {
                    val = 0;
                }
            }
            long end = System.nanoTime();
            System.out.println("Int Time : " + (end - start));
        }

        enum Parity { EVEN, ODD };

        public static void doEnums() {
            Parity val = Parity.EVEN;
            long start = System.nanoTime();
            for (long iii = 0; iii < MAX; ++iii) {
                if (Parity.EVEN == val) {
                    val = Parity.ODD;
                } else if (Parity.ODD == val) {
                    val = Parity.EVEN;
                }
            }
            long end = System.nanoTime();
            System.out.println("Enum Time: " + (end - start));
        }
    }
[il-antlr-interest: 28922] Re: [antlr-interest] null pointer to ADAPTOR-setTokenBoundaries
Possibly, though I suspect your easy workaround is to make each alt a subrule. I will look tomorrow.

Jim

-----Original Message-----
From: antlr-interest-boun...@antlr.org [mailto:antlr-interest-boun...@antlr.org] On Behalf Of Alan Condit
Sent: Wednesday, May 19, 2010 5:01 PM
To: antlr-interest@antlr.org
Subject: Re: [antlr-interest] null pointer to ADAPTOR-setTokenBoundaries

On page 164 of The Definitive Antlr Reference under the heading Omitting Input Elements Terrance shows using an empty rewrite rule to allow omitting unneeded symbols from the output AST tree. This does not say that it could not be causing a problem with the generated 'C' code. Jim, is there a possibility that this is a problem?

Alan
[il-antlr-interest: 28923] Re: [antlr-interest] enums in v4 ANTLR Java code generation considered useless
I just ran that code with it looping through doEnums/doInts 1000 times. The difference was ~5% for -client and -Xbatch, and ~10% for -server. (I tried -Xint and it took way too long.) All had enums as higher, which sounds reasonable (as there are static field lookups being done).

My main point here is that while we're seeing 5-10% or so differences, that's a 5-10% difference in a part of the program that goes incredibly fast (so a 5-10% hit is unnoticeable), whereas a 5-10% hit in I/O could be a very big deal.

We're measuring a performance difference over millions of calls. In a typical parse, you may have a few thousand tokens, each of which may be tested a few dozen times.

-- Scott
Scott Stanchfield
http://javadude.com

On Wed, May 19, 2010 at 7:29 PM, Kirby Bohling kirby.bohl...@gmail.com wrote:

Given the below performance benchmarks, and just how much of ANTLR's output is really just a series of if/else or switch blocks buried inside of a huge number of loops, I actually do think you'd spot the difference.