#1132: XML Schema type "identifier" is broken
--------------------+-------------------------------------------------------
 Reporter:  david   |       Owner:  david
     Type:  defect  |      Status:  new  
 Priority:  normal  |   Milestone:  1.0.2
Component:  config  |     Version:  1.0.1
 Severity:  minor   |    Keywords:       
Has_patch:  0       |  
--------------------+-------------------------------------------------------
Description changed by david:

New description:

 The purpose of the {{{identifier}}} type is to make sure that a given
 identifier (variable name, class name, method name, ...) is a valid PHP
 {{{LABEL}}}: a scanner rule that operates on bytes and is equivalent to
 the regular expression {{{^[a-zA-Z_\x7f-\xff][a-zA-Z0-9_\x7f-\xff]*$}}}.

 The equivalent we have in {{{_types.xsd}}} is:
 {{{
 #!xml
 <xs:simpleType name="identifier">
         <xs:restriction base="xs:string">
                 <xs:pattern value="[_A-Za-z\p{IsLatin-1Supplement}][_A-Za-z0-9\p{IsLatin-1Supplement}]*" />
         </xs:restriction>
 </xs:simpleType>
 }}}

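 That is complete nonsense: XML Schema patterns match Unicode characters,
 not bytes, so `\p{IsLatin-1Supplement}` only admits the code points
 U+0080 to U+00FF. As a result, "über" is a valid identifier according to
 the schema, but "русский" is not, even though in UTF-8 both consist only
 of bytes that PHP accepts in a LABEL.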

 We have two options:
  1. drop the validation from the schema and do it at runtime in the config
 handlers instead, assuming that the XML file uses the same encoding as
 the PHP file and converting the identifiers to that encoding at runtime
 (because if the XML file is in Shift_JIS and specifies a class with
 Japanese characters, then the PHP file must be in Shift_JIS, too)
  1. simply mandate that PHP source files use UTF-8 whenever identifiers
 contain non-ASCII characters

 We'll go with 2) because 1) is bucketloads of work for cases nobody cares
 about.

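 The pattern therefore needs to be changed so that, instead of listing the
 permitted non-ASCII characters, it rejects every ASCII character outside
 {{{[_A-Za-z0-9]}}} (or {{{[_A-Za-z]}}} for the first character) and
 accepts any non-ASCII character. That is expressible in XML Schema, and
 for UTF-8 it maps directly onto PHP's byte rule: ASCII characters are
 encoded as the same single bytes, and every non-ASCII character is
 encoded using only bytes 0x80 to 0xFF.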
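
 A minimal sketch of what the changed type could look like, assuming the
 validator implements the standard XML Schema block escapes:
 `\P{IsBasicLatin}` is the complement of the BasicLatin block (U+0000 to
 U+007F), i.e. any non-ASCII character. Patterns are implicitly anchored
 in XML Schema, so no {{{^}}} or {{{$}}} is needed.
 {{{
 #!xml
 <xs:simpleType name="identifier">
         <xs:restriction base="xs:string">
                 <!-- ASCII is limited to [_A-Za-z0-9] (no digit first); any
                      character outside the BasicLatin block is accepted -->
                 <xs:pattern value="[_A-Za-z\P{IsBasicLatin}][_A-Za-z0-9\P{IsBasicLatin}]*" />
         </xs:restriction>
 </xs:simpleType>
 }}}

 With this pattern, both "über" and "русский" validate, while "foo-bar"
 and "1st" are rejected. The only character PHP accepts in a LABEL that
 this sketch rejects is 0x7F (DEL), which is harmless to exclude.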

--

-- 
Ticket URL: <http://trac.agavi.org/ticket/1132#comment:1>
Agavi <http://www.agavi.org/>
An MVC Framework for PHP5