Re: Followup: EBNF grammar for .proto files

Chris Mon, 22 Sep 2008 03:24:15 -0700

Yegor wrote:
> Hi, everyone,
>
> I am following up on the discussion about the EBNF grammar for .proto
> files: 
> http://groups.google.com/group/protobuf/browse_thread/thread/1cccfc624cd612da
>
> I am now trying to port this grammar to ANTLR format and make it generate
> the lexers and parsers, but so far no luck.
>
> Does anyone know how to translate /[^\0\n]/ to ANTLR format? I'm not
> even sure what it means. It's from the definition of strLit.
>   
It matches any single character that is not a null byte (0) and not a
newline byte (10).
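
A minimal sketch of that character class in Python, to show what it accepts and rejects. The helper name allowed() is hypothetical. In ANTLR 3 a negated set is normally written with ~, for example ~('\u0000' | '\n'); treat that spelling as an untested assumption.

```python
import re

# [^\0\n] is a negated character class: it matches any single character
# except NUL (code 0) and line feed (code 10).
NOT_NUL_OR_NEWLINE = re.compile(r"[^\0\n]")


def allowed(ch):
    # True when the one-character string ch is accepted by the class.
    return NOT_NUL_OR_NEWLINE.fullmatch(ch) is not None


if __name__ == "__main__":
    for ch in ["a", "\t", "\0", "\n"]:
        print(repr(ch), allowed(ch))
```

Note that a tab or a carriage return is still accepted; only the two listed bytes are excluded.
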
> Also can anyone tell me what's wrong with the following grammar? You
> should be able to just copy and paste the following in ANTLRWorks.
> (NOTE: I simplified strLit (STR_LIT) as I couldn't translate the regex
> above.) Thanks.
>


Many things are wrong.  I stopped using the EBNF that was posted to the 
list when making my lexer.

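Negative constant values are allowed (for default values), but the 
grammar below does not accept them (including octal and hex constants).
The  ('.' | DIGIT+)? in FLOAT_LIT is just wrong: it accepts either a dot 
or more digits, never a dot followed by digits, so 1.5 does not match.  
Floats also ought to be allowed to be negative.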
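
The two points above can be sketched with Python regular expressions. ORIGINAL_FLOAT is a direct transcription of the posted FLOAT_LIT rule; SIGNED_INT and SIGNED_FLOAT are hypothetical replacements written for illustration, not taken from protoc's tokenizer.

```python
import re

# Transcription of the posted rule:
#   DIGIT+ ('.' | DIGIT+)? (('E' | 'e') ('+' | '-')? DIGIT+)?
ORIGINAL_FLOAT = re.compile(r"[0-9]+(\.|[0-9]+)?([eE][+-]?[0-9]+)?")

# Assumed fix: optional '-', then hex, octal (including a bare 0), or decimal.
SIGNED_INT = re.compile(r"-?(0[xX][0-9A-Fa-f]+|0[0-7]*|[1-9][0-9]*)")

# Assumed fix: optional '-', digits, then either a fraction ('.' plus zero or
# more digits) with an optional exponent, or an exponent alone.
SIGNED_FLOAT = re.compile(
    r"-?[0-9]+(\.[0-9]*([eE][+-]?[0-9]+)?|[eE][+-]?[0-9]+)"
)


def is_int(text):
    return SIGNED_INT.fullmatch(text) is not None


def is_float(text):
    return SIGNED_FLOAT.fullmatch(text) is not None


if __name__ == "__main__":
    # The posted rule cannot consume the digits after the dot.
    print(ORIGINAL_FLOAT.fullmatch("1.5"))
    print(is_float("1.5"), is_float("-1.5e-3"), is_int("-0x1F"))
```

Whether the sign belongs in the lexer token or in the parser's constant rule is a design choice; the sketch folds it into the token only to keep the example short.
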

The opening and closing QUOTE of a string must match.  You should not 
accept an opening single quote and a closing double quote.
Inside a single-quoted string you are allowed unescaped double quotes.  
Inside a double-quoted string you are allowed unescaped single quotes.
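
One way to express the matching-quote rule is to write the string literal as two separate alternatives, one per quote character, which also maps onto a lexer without backreferences. The Python sketch below assumes a simplified escape rule (a backslash followed by any character other than NUL or newline); the real escape set is narrower.

```python
import re

# Single-quoted form: an unescaped double quote is allowed inside.
SINGLE = r"'(?:\\[^\0\n]|[^'\\\0\n])*'"
# Double-quoted form: an unescaped single quote is allowed inside.
DOUBLE = r'"(?:\\[^\0\n]|[^"\\\0\n])*"'

# A string literal is one form or the other, so the quotes always match.
STRING = re.compile(SINGLE + "|" + DOUBLE)


def is_string(text):
    return STRING.fullmatch(text) is not None


if __name__ == "__main__":
    for sample in ["'say \"hi\"'", "\"it's\"", "'mismatch\""]:
        print(sample, is_string(sample))
```
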

The grammar should allow a period inside a constant identifier, so that 
a default can use the qualified name of an enum value from an imported 
package.  Two periods in a row are not permitted, however.
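
A small Python sketch of that rule: identifiers joined by single periods. The names IDENT and QUALIFIED are illustrative, and the sketch deliberately does not handle a leading period.

```python
import re

# Same shape as the IDENT token in the quoted grammar.
IDENT = r"[A-Za-z_][A-Za-z0-9_]*"

# One identifier, then zero or more ".identifier" parts. Every period must be
# followed by an identifier, which rules out ".." and a trailing period.
QUALIFIED = re.compile(rf"{IDENT}(?:\.{IDENT})*")


def is_qualified(text):
    return QUALIFIED.fullmatch(text) is not None


if __name__ == "__main__":
    for sample in ["RED", "other.pkg.Color.RED", "a..b", "a."]:
        print(sample, is_qualified(sample))
```
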



>
> /*
>  * ANTLR grammar file for Google Protocol Buffers
>  */
>
> grammar proto;
>
> proto
>       : ( message | extend | enum | pimport | package | option | ';' )*
>       ;
>
> pimport
>       : 'import' STR_LIT ';'
>       ;
>
> package
>       : 'package' IDENT ( '.' IDENT )* ';'
>         ;
>
> option
>       : 'option' optionBody ';'
>       ;
>
> optionBody
>       : IDENT ( '.' IDENT )* '=' constant
>       ;
>
> message
>       : 'message' IDENT messageBody
>       ;
>
> extend
>       : 'extend' userType '{' ( field | group | ';' )* '}'
>       ;
>
> enum
>       : 'enum' IDENT '{' ( option | enumField | ';' )* '}'
>       ;
>
> enumField
>       : IDENT '=' INT_LIT ';'
>       ;
>
> service
>       : 'service' IDENT '{' ( option | rpc | ';' )* '}'
>       ;
>
> rpc
>       : 'rpc' IDENT '(' userType ')' 'returns' '(' userType ')' ';'
>       ;
>
> messageBody
>       : '{' ( field | enum | message | extend | extensions | group | option
> | ':' )* '}'
>       ;
>
> group
>       : modifier 'group' camelIdent '=' INT_LIT messageBody
>       ;
>
> // tag number must be 2^28-1 or lower
> field
>       : modifier type IDENT '=' INT_LIT ( '[' fieldOption ( ','
> fieldOption )* ']' )? ';'
>       ;
>
> fieldOption
>       : optionBody | 'default' '=' constant
>       ;
>
> extensions
>       : extRange ( ',' extRange )* ';'
>       ;
>
> extRange
>       : INT_LIT ( 'to' ( INT_LIT | 'max' ) )?
>       ;
>
> // Kenton: "I would either call this "label" or "cardinality""
> modifier
>       : 'required' | 'optional' | 'repeated'
>       ;
>
> type
>       : 'double' | 'float' | 'int32' | 'int64' | 'uint32' |
>         'uint64' | 'sint32' | 'sint64' | 'fixed32' | 'fixed64' |
>         'sfixed32' | 'sfixed64' | 'bool' | 'string' | 'bytes' | userType
>       ;
>
> // leading dot for identifiers means they're fully qualified
> // Kenton: userType ::= "."? ident ( "." ident )*
> userType
>       : '.'? IDENT ( '.' IDENT )*
>       ;
>
> constant
>       : IDENT | INT_LIT | FLOAT_LIT | STR_LIT | BOOL_LIT
>       ;
>
> IDENT
>       : ('a'..'z'|'A'..'Z'|'_')('A'..'Z'|'a'..'z'|'0'..'9'|'_')*
>       ;
>
> // according to parser.cc, group names must start with a capital letter
> // as a hack for backwards-compatibility
> camelIdent
>       : ('A'..'Z')('A'..'Z'|'a'..'z'|'0'..'9'|'_')*
>       ;
>
> INT_LIT
>       : DEC_INT | HEX_INT | OCT_INT
>       ;
>
> DEC_INT
>       : '1'..'9' DIGIT*
>       ;
>
> HEX_INT
>       : '0' ('x' | 'X') ('A'..'F' | 'a'..'f' | DIGIT)+
>       ;
>
> OCT_INT
>       : '0' ('0'..'7')+
>       ;
>
> // allow_f_after_float_ is disabled by default in tokenizer.cc
> FLOAT_LIT
>       : DIGIT+ ('.' | DIGIT+)? (('E' | 'e') ('+' | '-')? DIGIT+)?
>       ;
>
> DIGIT
>       : '0'..'9'
>       ;
>
> BOOL_LIT
>       : 'true' | 'false'
>       ;
>
> STR_LIT
>       : QUOTE ( HEX_ESCAPE | OCT_ESCAPE | CHAR_ESCAPE | 'a'..'z' | 'A'..'Z'
> | '0'..'9' | ' ' ) QUOTE
>       ;
>
> QUOTE
>       : '\'' | '"'
>       ;
>
> HEX_ESCAPE
>       : '\\' ('X' | 'x') ('A'..'F' | 'a'..'f' | '0'..'9'){1,2}
>       ;
>
> OCT_ESCAPE
>       : '\\' '0'? ('0'..'7'){1,3}
>       ;
>
> CHAR_ESCAPE
>       : '\\' ('a' | 'b' | 'f' | 'n' | 'r' | 't' | 'v' | '\\' | '\?' | ('\\'
> '\'') | ('\\' '"'))
>       ;
>
> >
>   


