The specification is more restrictive. It says:

  The name portion of a fullname, record field names, and enum symbols must:
  - start with [A-Za-z_]
  - subsequently contain only [A-Za-z0-9_]
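
As a rough illustration of the difference, here is a minimal sketch that puts the ASCII-only rule from the specification next to a Unicode-aware check modeled on the snippet quoted further down in this thread. The class name AvroNameCheck and both method names are made up for this example and are not part of the Avro API; the real parser may differ in details.

```java
import java.util.regex.Pattern;

// Hypothetical helper for illustration only; not part of Avro.
class AvroNameCheck {
    // Rule as quoted from the specification:
    // first character in [A-Za-z_], every following character in [A-Za-z0-9_].
    private static final Pattern SPEC_NAME =
        Pattern.compile("[A-Za-z_][A-Za-z0-9_]*");

    /** True if the whole name satisfies the ASCII-only rule from the spec. */
    static boolean isSpecName(String name) {
        return name != null && SPEC_NAME.matcher(name).matches();
    }

    /** True if the name passes a Unicode-aware check modeled on the quoted snippet. */
    static boolean isLiberalName(String name) {
        if (name == null || name.isEmpty()) {
            return false;
        }
        char first = name.charAt(0);
        if (!(Character.isLetter(first) || first == '_')) {
            return false;
        }
        for (int i = 1; i < name.length(); i++) {
            char c = name.charAt(i);
            if (!(Character.isLetterOrDigit(c) || c == '_')) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // "m\u00e9m\u00e9" is "mémé", written with escapes to avoid source-encoding issues.
        String[] samples = {"MALE", "WHO CARES?", "m\u00e9m\u00e9", "_x1", "1x"};
        for (String s : samples) {
            System.out.println(s + " spec=" + isSpecName(s)
                + " liberal=" + isLiberalName(s));
        }
    }
}
```

Under these assumptions, "WHO CARES?" fails both checks because of the space and the question mark, while a name with accented letters passes only the Unicode-aware check.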

The Java implementation is more liberal in what it accepts. This is discussed in https://issues.apache.org/jira/browse/AVRO-1022.

Doug

On Wed, Feb 27, 2013 at 8:27 AM, Francis Galiegue <[email protected]> wrote:
> On Wed, Feb 27, 2013 at 5:24 PM, Francis Galiegue <[email protected]> wrote:
>> Hello,
>>
>> I have tried to parse this schema:
>>
>> {
>>     "name": "gender",
>>     "type": "enum",
>>     "symbols": [ "MALE", "FEMALE", "WHO CARES?" ]
>> }
>>
>> But the parser complains about an illegal character in the third symbol.
>>
>> The problem is, nothing in the spec as far as I can see says that the
>> set of usable code points in a symbol is limited at all...
>>
>> So, what is this allowed set of code points?
>>
>> --
>> Francis Galiegue, [email protected]
>> JSON Schema in Java: http://json-schema-validator.herokuapp.com
>
> OK, beginning of answer to self:
>
>     if (!(Character.isLetter(first) || first == '_'))
>       throw new SchemaParseException("Illegal initial character: "+name);
>     for (int i = 1; i < length; i++) {
>       char c = name.charAt(i);
>       if (!(Character.isLetterOrDigit(c) || c == '_'))
>         throw new SchemaParseException("Illegal character in: "+name);
>     }
>
> It therefore means any Unicode letter or digit, or the underscore, is
> allowed anywhere, except at the first position, which must not be a
> digit. So, it means the following is legal:
>
>     [ "mémé", "dans", "les", "orties" ]
>
> Right?
>
> --
> Francis Galiegue, [email protected]
> JSON Schema in Java: http://json-schema-validator.herokuapp.com
