[il-antlr-interest: 30340] Re: [antlr-interest] Grammar natural language
Hi Armin,

I would like to cover just basic sentences in English, so that a user can formulate simple constraints for a modelling environment. To begin with, sentences like:

    The length of a runway is not greater than 5000 metres

or

    If the runway is dependent then the distance is smaller than 1000 metres

Thanks for any advice,
Dagi

-----Original Message-----
From: armin.weg...@bka.bund.de [mailto:armin.weg...@bka.bund.de]
Sent: Friday, 15 October 2010 07:47
To: Trögner, Dagi
Subject: RE: [antlr-interest] Grammar natural language

Hi Dagi,

for which one? Most likely you will have a separate grammar for each natural language.

Armin

-----Original Message-----
From: antlr-interest-boun...@antlr.org [mailto:antlr-interest-boun...@antlr.org] On Behalf Of dagi.troeg...@dlr.de
Sent: Thursday, 14 October 2010 14:39
To: antlr-interest@antlr.org
Subject: [antlr-interest] Grammar natural language

Hi everyone,

I am looking for a simple grammar for natural language. In a first version, just short simple sentences would be sufficient. Has anyone already tried to formulate such a grammar?

Thanks a lot,
Dagi

Dagi Troegner
Deutsches Zentrum fuer Luft- und Raumfahrt (DLR)
Institut fuer Flugfuehrung, Abteilung Lotsenassistenz
Lilienthalplatz 7, D-38108 Braunschweig
Phone: (49) 531 / 295-2189
Fax: (49) 531 / 295-2180
Email: dagi.troeg...@dlr.de

List: http://www.antlr.org/mailman/listinfo/antlr-interest
Unsubscribe: http://www.antlr.org/mailman/options/antlr-interest/your-email-address

--
You received this message because you are subscribed to the Google Groups il-antlr-interest group.
To post to this group, send email to il-antlr-inter...@googlegroups.com.
To unsubscribe from this group, send email to il-antlr-interest+unsubscr...@googlegroups.com.
For more options, visit this group at http://groups.google.com/group/il-antlr-interest?hl=en.
[il-antlr-interest: 30342] [antlr-interest] complex lexer (at least to me)
Hi list,

while writing a parser I ran into trouble lexing comments correctly, together with non-comments that look like comments.

Comments start with '#' and end at a newline; they should be hidden. BUT '#!something' is an ID, and ':#header' has its own meaning too.

I've tried several approaches (synpreds, ...), none of which worked well enough. This one eats everything in the last option:

    COLUMN_NAMES_END
        : HASH HEADER
          { System.out.println(" ^~^ LEXER: found HEADER_COMMENT: " + $text); }
        ;

    DBT_UNIT_NAME_START
        : HASH BANG
          { System.out.println(" ^~^ LEXER: found DBT_UNIT_NAME_START: " + $text); }
        ;

    LINE_COMMENT_OR_ELSE
        : ( HASH BANG ) => DBT_UNIT_NAME_START
          { $type = DBT_UNIT_NAME_START;
            System.out.println(" ^~^ LEXER: found HASH BANG: DBT_UNIT_NAME_START: " + $text); }
        | ( HASH HEADER ) => COLUMN_NAMES_END
          { $type = COLUMN_NAMES_END;
            System.out.println(" ^~^ LEXER: found HASH HEADER: COLUMN_NAMES_END: " + $text); }
        | ( HASH (options {greedy=false;} : .)* NEWLINE ) => COMMENT
          { System.out.println(" ^~^ LEXER: LINE_COMMENT Ignoring LINE comment: " + $text); }
        ;

    protected COMMENT
        : HASH (options {greedy=false;} : .)* NEWLINE
          { $channel=HIDDEN;
            System.out.println(" ^~^ LEXER: COMMENT: Ignoring LINE comment: " + $text); }
        ;

So every '#' line ends up caught by COMMENT, and I get this single error message on grammar generation:

    [java] error(208): JADATextGrammar.g:98:1: The following token definitions can never be matched because prior tokens match the same input: COMMENT

Any ideas?

Stanislas Herman Rusinsky.

P.S.: Going by the article "What makes a language problem hard?" ( http://www.antlr.org/wiki/pages/viewpage.action?pageId=1773 ), it looks like I meet these:

* Context sensitive lexer? You can't decide what vocabulary symbol to match unless you know what kind of sentence you are parsing.
* Is the set of all input fixed? If you have a fixed set of files to convert, your job is much easier because the set of language construct combinations is fixed. For example, building a general Pascal to Java translator is much harder than building a translator for a set of 50 existing Pascal files.
* Are delimiters non-fixed for things like strings and comments? That makes it tough to build an efficient lexer.
* Are the source statements really similar; declarations vs expressions in C++?
* Column sensitive input? E.g., are newlines significant like lines in a log file and does the position of an item change its meaning?
* Does your input have comments as you do in programming languages that can occur anywhere in the input and need to go into the output in a sane location?
* How much semantic information do you need to do the translation? For example, do you need to simply know that something is a type name or do you need to know that it is, say, an array whose indices are a set like (day,week,month) and contains records? Sometimes syntax alone is enough to do translation.
* ...
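The decision described in this post can be viewed as a fixed-lookahead choice made once a '#' has been seen: try the most specific continuations first and fall back to a comment. The sketch below shows that ordering in plain Java rather than in ANTLR. The class name `HashLineClassifier`, the `Kind` enum, and the spelling `header` for the HEADER fragment are assumptions made for illustration; the post does not show how HASH, BANG, and HEADER are defined.

```java
// Hypothetical sketch: decide what a '#' introduces by peeking at what follows it.
final class HashLineClassifier {
    enum Kind { DBT_UNIT_NAME_START, COLUMN_NAMES_END, COMMENT, NOT_HASH }

    // Assumed spelling of the HEADER fragment; the original post does not show it.
    private static final String HEADER = "header";

    /** Classifies the text that starts at index pos of input. */
    static Kind classify(String input, int pos) {
        if (pos >= input.length() || input.charAt(pos) != '#') {
            return Kind.NOT_HASH;
        }
        // Most specific alternatives first; the comment is the fallback.
        if (input.startsWith("!", pos + 1)) {
            return Kind.DBT_UNIT_NAME_START;
        }
        if (input.startsWith(HEADER, pos + 1)) {
            return Kind.COLUMN_NAMES_END;
        }
        return Kind.COMMENT;
    }

    /** Length of a comment starting at pos: everything up to and including the newline. */
    static int commentLength(String input, int pos) {
        int newline = input.indexOf('\n', pos);
        return (newline < 0 ? input.length() : newline + 1) - pos;
    }
}
```

In a grammar, the same ordering would typically be expressed as alternatives of a single lexer rule that sets the token type, with the comment alternative last; whether that resolves error 208 for this particular grammar is not something the sketch can confirm.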
[il-antlr-interest: 30343] Re: [antlr-interest] Grammar natural language
Hi Dagi,

Grammars for natural languages are very difficult, and ANTLR is not suited for the general case. A natural language is a complex structure involving the interaction of phonemics, morphology, syntax and semantics (not to mention general knowledge). Classic illustrations of the sort of problems that can arise are sentences like "flying planes can be dangerous" or "general flies back to front".

However, if you can restrict your corpus to a relatively small, well-defined domain (runways?), you may still be able to create an adequate grammar. But the chances that anyone has already written a grammar for that domain are correspondingly small. And your users are going to have to learn to restrict their language to what the grammar can handle, so you might really be better off writing a simple DSL instead.

Steve
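To make the "restricted domain" idea concrete, here is a minimal sketch that recognises exactly one sentence shape from Dagi's examples and nothing else. It is plain Java with a regular expression, not an ANTLR grammar. The class name `ConstraintSentence`, its fields, and the choice of accepted comparison words are assumptions for illustration only.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch: one fixed sentence shape of a constrained-English constraint language.
final class ConstraintSentence {
    final String attribute;   // e.g. "length"
    final String subject;     // e.g. "runway"
    final boolean negated;    // true when "not" is present
    final String comparison;  // "greater" or "smaller"
    final int value;          // e.g. 5000
    final String unit;        // e.g. "metres"

    private ConstraintSentence(String attribute, String subject, boolean negated,
                               String comparison, int value, String unit) {
        this.attribute = attribute;
        this.subject = subject;
        this.negated = negated;
        this.comparison = comparison;
        this.value = value;
        this.unit = unit;
    }

    // "The <attribute> of a <subject> is [not] greater|smaller than <number> <unit>"
    private static final Pattern SHAPE = Pattern.compile(
        "The (\\w+) of a (\\w+) is (not )?(greater|smaller) than (\\d{1,9}) (\\w+)");

    /** Returns the parsed constraint, or empty if the sentence is outside the accepted shape. */
    static Optional<ConstraintSentence> parse(String sentence) {
        Matcher m = SHAPE.matcher(sentence.trim());
        if (!m.matches()) {
            return Optional.empty();
        }
        return Optional.of(new ConstraintSentence(
            m.group(1), m.group(2), m.group(3) != null,
            m.group(4), Integer.parseInt(m.group(5)), m.group(6)));
    }
}
```

Anything that does not fit the shape is rejected rather than guessed at, which is the trade-off Steve describes: users have to learn what the grammar can handle.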
[il-antlr-interest: 30344] [antlr-interest] cretae the inverse of a rule
If I create a rule like

    LETTER : 'a'..'z' ;

how can I create a rule for everything that is not a LETTER? Something like !myBoolean in Java.
[il-antlr-interest: 30345] Re: [antlr-interest] cretae the inverse of a rule
On Fri, Oct 15, 2010 at 3:39 PM, Remi Marechal remi.marec...@gmail.com wrote:

> if i create a rule like LETTER: 'a'..'z' ;

With the '~' (tilde):

    NON_LETTER : ~LETTER ;

Note that the negation only works on single characters (or lexer rules that match a single character). For example, you can't negate the rule:

    FOO : 'foo' ;

Regards,

Bart.
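The single-character restriction in Bart's reply is easier to see when the rule is read as a set of characters: a set has a complement, a three-character sequence such as 'foo' does not. A minimal Java analogue follows, assuming nothing about ANTLR itself; the class name `CharSetNegation` is made up for the example.

```java
import java.util.function.IntPredicate;

// Hypothetical sketch: a lexer rule that matches one character behaves like a set of characters.
final class CharSetNegation {
    // Counterpart of  LETTER : 'a'..'z' ;
    static final IntPredicate LETTER = c -> c >= 'a' && c <= 'z';

    // Counterpart of  NON_LETTER : ~LETTER ;  (the complement of the set)
    static final IntPredicate NON_LETTER = LETTER.negate();

    static boolean isLetter(int c) {
        return LETTER.test(c);
    }

    static boolean isNonLetter(int c) {
        return NON_LETTER.test(c);
    }
}
```

Note that with this definition upper-case letters and digits count as NON_LETTER, exactly as they would for `~LETTER` when LETTER is 'a'..'z'.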
[il-antlr-interest: 30346] [antlr-interest] Tree grammar: How to handle rule arguments
Hello,

I have implemented a parser grammar as well as a tree parser for a little Java-like language. I have encountered difficulties with the following productions.

Parser grammar:

    declList
        : ( varDecl | methodDecl )+
        ;

    varDecl
        : varType=type identList[$varType.tree] -> identList
        ;

    identList[Object varType]
        : id=Identifier ( ',' Identifier )*
          -> ^( VarDecl[$id, "VarDecl"] { $varType } Identifier )+
        ;

I have checked the parser grammar using ANTLRWorks' debugger and it works as expected. E.g., for the input

    int x, y, z;

it creates three VarDecl nodes, each with int as its left child and the identifier as its right child.

I use the following rules in the tree grammar:

    declList[ArrayList<Decl> members]
        : ( v=varDecl    { $members.add($v.var); }
          | m=methodDecl { $members.add($m.mth); }
          )+
        ;

    varDecl returns [VarDecl var]
        : ^( VarDecl t=type n=Identifier )
          { $var = new VarDecl($t.text, $n.text); }
        ;

However, if I pass the above example, i.e., int x, y, z; as input to the walker, I get only one variable declaration: int z. The variables x and y are lost (it turns out that only the last identifier is kept).

I guess that there is a problem in how I constructed the tree grammar rules. In particular, I merged identList with varDecl and removed the '+' from ^(VarDecl ...). However, if I keep the '+', I get an ambiguity. What would the correct tree grammar rule be?

Thanks a lot for your help!

Stephanie
[il-antlr-interest: 30347] Re: [antlr-interest] Grammar natural language
I agree with Steve that a small structured language is probably best.

However, if natural language input is a requirement and you can tolerate some degree of inexactness, you can use the OpenNLP (sourceforge) package to:

1) do sentence detection (unless you can guarantee that every statement is bounded by a hard line end).
2) do part-of-speech tagging to augment the words of the sentence.
3) do word grouping to identify related word relations and further augment the contents of the sentence.

You will also need to:

4) develop tools to build a corpus of examples to train the models underlying 1-3.
5) develop an Antlr grammar and set of tree walkers to analyze and extract usable information from a fully augmented sentence.

Your initial OpenNLP models will likely be about 70% accurate. With a lot of training and tuning, and depending on the size of the domain, you can push it up to about 95-98% accuracy.

Doing NLP solely in Antlr is a practical impossibility. With OpenNLP as a front end, Antlr is well suited for NLP. Just don't do it unless NL is a requirement.

Best,
Gerald

--
Gerald B. Rosenberg, Esq.
NewTechLaw
www.newtechlaw.com
[il-antlr-interest: 30348] Re: [antlr-interest] very simple doubt about EXPR grammar
You need to post your grammar and code, not screen shots. First though, delete everything that was generated and start from a clean directory. You are just doing something silly somewhere, but we can't see what it is unless you post all your code.

Also, as you are using Java, you should look at Maven for your builds.

Jim

-----Original Message-----
From: antlr-interest-boun...@antlr.org [mailto:antlr-interest-boun...@antlr.org] On Behalf Of Leonardo K. Shikida
Sent: Thursday, October 14, 2010 5:19 PM
To: John B. Brodie
Cc: antlr-interest@antlr.org
Subject: Re: [antlr-interest] very simple doubt about EXPR grammar

Hi John

here are some screenshots of what I am doing with ANTLR Works.

not ok: http://img833.imageshack.us/img833/7586/noviable.png
ok: http://img194.imageshack.us/img194/7260/viable.png

so strange it only happens to the MINUS sign.

thanks
Leo

On Thu, Oct 14, 2010 at 5:17 PM, John B. Brodie j...@acm.org wrote:

> Greetings!
>
> On Thu, 2010-10-14 at 09:31 -0300, Leonardo K. Shikida wrote:
>> Hi Kevin
>>
>> You're right. So I've changed the grammar to include a stopword (semicolon).
>> Still the same problem.
>>
>> 1-1+1; generates a NoViableAltException
>>
>> very strange... while
>>
>> 1+1-1; does not
>>
>> This is very strange because according to the rule
>>
>> expr : e=multExpr ( '+' multExpr | '-' multExpr | '*' multExpr | '/' multExpr )* ;
>>
>> it does not matter what symbol comes. In fact, for all other combinations of
>> symbols in the same expression, only those starting with 1-1 throw the exception.
>>
>> 1*1-1; OK
>> 1*1/1; OK
>> 1-1-1; NOT OK
>> 1*1+1; OK
>
> unable to reproduce.
>
> attached please find a complete test grammar including a test driver that
> contains your grammar. this test grammar parses all four of the above
> without any problem.
>
> (does your test input happen to (incorrectly) include a blank(s)? your lexer
> accepts white space but your parser does not)
>
>> and so on...
>>
>> Can anyone help me? Is it an ANTLR bug or am I missing something here in
>> this grammar?
>>
>> Thanks in advance
>> Leo.
>>
>> grammar Expr;
>>
>> @header { }
>> @members { }
>>
>> stat: comp ';' ;
>>
>> comp : e=expr ( '' expr | '' expr | '=' expr )* ;
>>
>> expr : e=multExpr ( '+' multExpr | '-' multExpr | '*' multExpr | '/' multExpr )* ;
>>
>> multExpr : atom ( atom )* ;
>>
>> atom : INT | ID | '(' comp ')' ;
>>
>> ID : ('a'..'z'|'_')+ ;
>> INT : '0'..'9'+ ;
>> WS : (' '|'\t')+ ;
>>
>> Leonardo K. Shikida
>>
>> On Wed, Oct 13, 2010 at 3:14 PM, Kevin J. Cummings cummi...@kjchome.homeip.net wrote:
>>> On 10/13/2010 01:29 PM, Leonardo K. Shikida wrote:
>>>> Hi
>>>>
>>>> This is something stupid, I guess.
>>>>
>>>> I have a grammar like this below and I would like to know why 1+1-1 works
>>>> and 1-1+1 does not work (NoViableAltException)
>>>
>>> NoViableAltException is thrown in your stat rule when it can't predict an
>>> INT, ID, (, or NEWLINE in the lookahead. Does your test case end in a NEWLINE?
>>>
>>>> Thanks
>>>> Leo K.
>>>
>>> --
>>> Kevin J. Cummings
[il-antlr-interest: 30349] Re: [antlr-interest] C++ Map not usable, always SEGFAULT
1) Do not embed code in the grammar as you will end up with an unreadable mess (see many other posts about this);
2) Don't use $text unless what you are doing is trivial, but remember that the $text items are released once you free the parser (see many other posts about this);
3) Don't use 'literals', use tokens defined in the lexer (see many other posts about this);
4) However, that is not your problem: download and run valgrind, which will tell you what you are doing wrong. This is a programming issue, not an antlr issue.

http://antlr.markmail.org
http://valgrind.org/

You might be better off just putting this code in a main.c program and taking antlr out of the equation. When you can just set the list you will see what you are doing wrong.

Jim

-----Original Message-----
From: antlr-interest-boun...@antlr.org [mailto:antlr-interest-boun...@antlr.org] On Behalf Of Christian Benjamin Ries
Sent: Friday, October 15, 2010 2:05 AM
To: antlr-interest@antlr.org
Subject: [antlr-interest] C++ Map not usable, always SEGFAULT

Hello,

I'm a little bit frustrated. I'm trying to use a C++ map with ANTLR, but I always get a segfault. The source is pasted below, any suggestions?

The output is always:

    c...@:~/dsl/output$ ./dsl ../install1.host_dsl
    NetIP: 127.0.0.1
    hlist size: 0
    Segmentation fault

1. Snippet - class Host with Map:

    class Host {
    private:
        Network network;
        Distribution distribution;
    public:
        Host();
        Host(const Host &host);
        ~Host();
        void setNetwork(Network network) { this->network = network; }
        Network getNetwork() { return this->network; }
        void setDistribution(Distribution distri) { this->distribution = distri; }
        Distribution getDistribution() { return this->distribution; }
    };

    struct HostMapCompare {
        bool operator()( const char* s1, const char* s2 ) const {
            return strcmp( s1, s2 ) < 0;
        }
    };

    typedef std::map<const char*, Host, HostMapCompare> HostList;
    typedef std::map<const char*, Host, HostMapCompare>::iterator HostListIterator;

2. Snippet:

    cluster : host* ;

3. Snippet:

    host
    scope Symbols;
    @init {
        $Symbols::name->custom = new std::string();
        $Symbols::name->freeCustom = freeName;
    }
        : 'Host' n=STRING { $Symbols::name->custom = $n.text->chars; }
          OPEN_BRACE host_values[$Symbols::name] CLOSE_BRACE
        ;

    host_values[pANTLR3_COMMON_TOKEN hostname]
    @init {
        std::string hostid;
        Network network;
    }
        : { hostid = (char*)hostname->custom;
            hostid = hostid.substr(1,hostid.length()-2); }
          ( 'distribution' ':' DISTRIBUTION END
          | 'netip' ':' a=IP {network.setNetIp((char*)$a.text->chars);} END
          | 'netgw' ':' b=IP {network.setNetGw((char*)$b.text->chars);} END
          | 'netns' ':' c=IP {network.setNetNs((char*)$c.text->chars);} END
          | 'netnm' ':' d=IP {network.setNetNm((char*)$d.text->chars);} END
          | 'nethn' ':' STRING {network.setNetHn((char*)$STRING.text->chars);} END
          )*
          'services' ':' (ID|service) ((',' ID)|service)* END
          {
              VisualGrid::Host h;
              h.setNetwork(network);
              std::cout << "NetIP: " << h.getNetwork().getNetIp() << std::endl;
              std::cout << "hlist size: " << hlist.size() << std::endl;
              hlist[hostid.c_str()] = h; // COULD NOT SET?!?!?!?!?
              std::cout << "hlist size: " << hlist.size() << std::endl;
          }
        ;

--
Dipl.-Ing. (FH) Christian Benjamin Ries
University of Applied Sciences Bielefeld
Department of Engineering Sciences and Mathematics
CMSE - Computational Materials Science Engineering
http://www.christianbenjaminries.de
[il-antlr-interest: 30350] [antlr-interest] generate lexer parser pair at runtime
Thanks to all in advance for your help.

Is there a sample available that demonstrates how to generate a lexer/parser pair at runtime? I've searched a bit and can't seem to find one.

The naive approach would be something like ...

    org.antlr.Tool tool = new org.antlr.Tool("Grammar.g");

-- Rick
[il-antlr-interest: 30352] [antlr-interest] Semantic predicate behaviour with k1
Hello,

I am seeing ANTLR generate unexpected code when using semantic predicates and am wondering if my grammar or my understanding is incorrect. The EBNF has a rule similar to the following:

    rule
        : primary_literal
        | {isIdent(LT(1)->getText(LT(1)),PARAM_IDENT)}? identifier LBRACKET?
        | {isIdent(LT(1)->getText(LT(1)),SPECPARAM_IDENT)}? identifier (LBRACKET constant_range_expression RBRACKET)?
        | {isIdent(LT(1)->getText(LT(1)),TYPE_IDENT)}? identifier APOSTROPHE
        | {isIdent(LT(1)->getText(LT(1)),ENUM_IDENT)}? identifier
        | {isIdent(LT(1)->getText(LT(1)),GENVAR_IDENT)}? identifier
        | {isIdent(LT(1)->getText(LT(1)),LET_IDENT)}? identifier LPARAN?
        | {isIdent(LT(1)->getText(LT(1)),GENBLOCK_IDENT)}? identifier (LBRACKET constant_expression RBRACKET)? PERIOD
        | {isIdent(LT(1)->getText(LT(1)),PACKAGE_IDENT)}? identifier COLONCOLON constant_primary_package_scope_suffix
        | identifier ((LPARAN list_of_arguments RPARAN) => LPARAN list_of_arguments RPARAN)? // tf_call

The last identifier type can be forward declared, so that type is assumed if the identifier is undefined at this point. I previously tried doing this by factoring, but it makes the grammar very difficult to follow and substantially increases the number of rules.

With this rule ANTLR generates the following:

    else if ( (LA1039_0 == SIMPLE_IDENT) )
    {
        {
            int LA1039_2 = LA(2);
            if ( (LA1039_2 == LBRACKET || LA1039_2 == PERIOD) )
            {
                alt1039=8;
            }
            else if ( (LA1039_2 == APOSTROPHE) )
            {
                alt1039=4;
            }
            else if ( (LA1039_2 == COLONCOLON) )
            {
                alt1039=9;
            }
            else if ( ((isIdent(LT(1)->getText(LT(1)),PARAM_IDENT))) )
            {
                alt1039=2;
            }
            else if ( ((isIdent(LT(1)->getText(LT(1)),SPECPARAM_IDENT))) )
            {
                alt1039=3;
            }
            else if ( ((isIdent(LT(1)->getText(LT(1)),ENUM_IDENT))) )
            {
                alt1039=5;
            }
            else if ( ((isIdent(LT(1)->getText(LT(1)),GENVAR_IDENT))) )
            {
                alt1039=6;
            }
            else if ( ((isIdent(LT(1)->getText(LT(1)),LET_IDENT))) )

The first 3 conditions look out of place. It appears that, even with predicates, ANTLR will increase k if it thinks that can help resolve ambiguities. Chapter 13 in the book doesn't appear to describe cases like this. The first case won't work, as three different alternatives match this sequence. If I force k=1 for this rule, then the code is generated as expected. Strangely, removing the PERIOD from the GENBLOCK subrule also works, but breaks the grammar.

Is this expected behaviour?
[il-antlr-interest: 30353] Re: [antlr-interest] Tree grammar: How to handle rule arguments
Hi Jim,

On Oct 15, 2010, at 7:52 PM, Jim Idle j...@temporal-wave.com wrote:

> -----Original Message-----
> From: antlr-interest-boun...@antlr.org [mailto:antlr-interest-boun...@antlr.org] On Behalf Of Stephanie
> Sent: Friday, October 15, 2010 8:05 AM
> To: antlr-interest@antlr.org
> Cc: balz...@inf.ethz.ch
> Subject: [antlr-interest] Tree grammar: How to handle rule arguments
>
>> Hello,
>>
>> I have implemented a parser grammar as well as a tree parser for a little
>> Java-like language. I have encounter difficulties with the following productions
>>
>> identList[Object varType]
>>     : id=Identifier ( ',' Identifier )*
>>       -> ^( VarDecl[$id, "VarDecl"] { $varType } Identifier )+
>
> What does Identifier refer to here in your rewrite? It is just the last
> variable that is parsed. When you are looking in Works, you are probably
> looking at the parse tree and not the AST. Why are you trying to rewrite the
> first token to an Imaginary? What you want is:
>
>     : id+=Identifier (',' id+=Identifier)* -> ^(VarDecl {$varType} $id)+

No, I was looking at the AST in ANTLRWorks and the AST was produced as described. I see that I should have named the starred occurrence of Identifier as well. What is the meaning of id+= above?

> Then reflect that in your tree parser. I have never quite seen the reason
> behind eliding the fact that multiple declarations were made on the same
> type, but if that is what you want, then you need to do it as above.

OK, thanks a lot.

Stephanie

> Jim
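On the open question about `id+=`: as far as I understand ANTLR 3, a `+=` label collects every token matched under that label into a list, whereas a plain `id=` label holds a single token, and a rewrite of the form `^( ... $id )+` then produces one subtree per list element. The Java below is only an analogy for that behaviour, not generated ANTLR code; `ListLabelSketch` and `varDecls` are made-up names, and tree nodes are represented as strings for brevity.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of what a list label plus a ( ... )+ rewrite amounts to.
final class ListLabelSketch {
    /** Builds one "(VarDecl type name)" node per identifier in a comma-separated list. */
    static List<String> varDecls(String type, String commaSeparatedNames) {
        // Plays the role of the list label: every matched Identifier is appended.
        List<String> ids = new ArrayList<>();
        for (String name : commaSeparatedNames.split(",")) {
            ids.add(name.trim());
        }
        // Plays the role of  -> ^(VarDecl {$varType} $id)+ : one node per collected id.
        List<String> nodes = new ArrayList<>();
        for (String id : ids) {
            nodes.add("(VarDecl " + type + " " + id + ")");
        }
        return nodes;
    }
}
```

For the input from the thread, `varDecls("int", "x, y, z")` yields three nodes, one each for x, y and z, which is the shape the tree grammar's `varDecl` rule then has to match repeatedly.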
[il-antlr-interest: 30354] Re: [antlr-interest] Semantic predicate behaviour with k1
You can't do that, but you could use options {k=1;} on this rule. But all your alts call identifier anyway, so why would you do that? Predicates are not supposed to have side effects, though I sometimes break that rule on keyword vs identifier problems. But it seems you just need to left factor your parser rule:

    identifier ( LBRACKET ... | etc)

It looks to me like you are trying to type in a grammar from the normative spec of something like Verilog, and do everything in one pass. You need to parse the common syntax into a tree, then walk the tree and verify it (throw out ranges that are not constant when they must be, etc.). Basically, don't try to reject semantic errors in the parser.

Jim
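Jim's two suggestions, left factoring and moving the semantic check out of the parser, can be sketched together: match the identifier once, pick the alternative from the token that follows it, and only afterwards ask whether that combination makes sense for the identifier's declared kind. The Java below is an illustration under assumptions, not the poster's grammar: tokens are plain strings, the class and method names are made up, and the kind-to-suffix table covers only a few of the kinds from the original rule.

```java
import java.util.List;

// Hypothetical sketch: syntax is decided by lookahead alone, semantics is checked afterwards.
final class LeftFactoredPrimary {
    enum Suffix { NONE, BRACKET, APOSTROPHE, SCOPE, CALL }

    /** Syntactic step: tokens.get(0) is the identifier, the next token selects the suffix. */
    static Suffix suffixOf(List<String> tokens) {
        if (tokens.isEmpty()) {
            throw new IllegalArgumentException("expected an identifier");
        }
        String next = tokens.size() > 1 ? tokens.get(1) : "";
        switch (next) {
            case "[":  return Suffix.BRACKET;
            case "'":  return Suffix.APOSTROPHE;
            case "::": return Suffix.SCOPE;
            case "(":  return Suffix.CALL;
            default:   return Suffix.NONE;
        }
    }

    /** Semantic step, run after parsing: is this suffix allowed for the declared kind? */
    static boolean plausible(String kind, Suffix suffix) {
        switch (kind) {
            case "TYPE_IDENT":    return suffix == Suffix.APOSTROPHE;
            case "PACKAGE_IDENT": return suffix == Suffix.SCOPE;
            case "ENUM_IDENT":
            case "GENVAR_IDENT":  return suffix == Suffix.NONE;
            case "PARAM_IDENT":   return suffix == Suffix.NONE || suffix == Suffix.BRACKET;
            // Undeclared identifiers are assumed to be forward-declared tf_calls.
            default:              return suffix == Suffix.NONE || suffix == Suffix.CALL;
        }
    }
}
```

With this split the prediction never needs a semantic predicate, so the question of how predicates interact with k > 1 lookahead does not arise for this rule.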