#1132: XML Schema type "identifier" is broken
--------------------+-------------------------------------------------------
Reporter: david | Owner: david
Type: defect | Status: new
Priority: normal | Milestone: 1.0.2
Component: config | Version: 1.0.1
Severity: minor | Keywords:
Has_patch: 0 |
--------------------+-------------------------------------------------------
Description changed by david:
Old description:
> The purpose of it is making sure that a given identifier (var name, class
> name, method name, ...) is a valid PHP {{{LABEL}}}, a scanner rule that
> operates on bytes which can be represented by the regular expression
> {{{^[a-zA-Z_\x7f-\xff][a-zA-Z0-9_\x7f-\xff]*$}}}.
>
> The equivalent we have in {{{_types.xsd}}} is:
> {{{
> #!xml
> <xs:simpleType name="identifier">
>   <xs:restriction base="xs:string">
>     <xs:pattern value="[_A-Za-z\p{IsLatin-1Supplement}][_A-Za-z0-9\p{IsLatin-1Supplement}]*" />
>   </xs:restriction>
> </xs:simpleType>
> }}}
>
> That is complete nonsense, because the above uses Unicode character
> properties. As a result, "über" is a valid LABEL, but "русский" is not.
>
> We have two options:
> 1. drop the validation in the schema, do it at runtime in the config
> handlers, and assume that the encoding of the XML file is the same as the
> encoding of the PHP file and do the conversion at runtime (because if the
> XML file is in Shift_JIS and specifies a class with japanese characters,
> then the PHP file must be in Shift_JIS, too)
> 1. simply mandate that for anything non-ASCII, UTF-8 must be used as the
> encoding in PHP source
>
> We'll go with #2 because #1 is bucketloads of work for cases nobody cares
> about.
>
> The pattern now needs to be changed so that it does ''not'' allow
> characters that are ''not'' {{{[_A-Za-z0-9]}}} (or {{{[_A-Za-z]}}} for
> the first character), which is possible, and directly translates into the
> equivalent ASCII ranges in case of UTF-8.
New description:
The purpose of this type is to make sure that a given identifier (variable
name, class name, method name, ...) is a valid PHP {{{LABEL}}}. {{{LABEL}}}
is a scanner rule that operates on bytes and can be represented by the
regular expression {{{^[a-zA-Z_\x7f-\xff][a-zA-Z0-9_\x7f-\xff]*$}}}.

The equivalent we have in {{{_types.xsd}}} is:
{{{
#!xml
<xs:simpleType name="identifier">
  <xs:restriction base="xs:string">
    <xs:pattern value="[_A-Za-z\p{IsLatin-1Supplement}][_A-Za-z0-9\p{IsLatin-1Supplement}]*" />
  </xs:restriction>
</xs:simpleType>
}}}
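
That is complete nonsense, because the pattern above uses Unicode character
properties and therefore matches characters, not bytes. As a result, the
schema accepts "über" but rejects "русский", although both are valid
{{{LABEL}}}s at the byte level (for example, when encoded as UTF-8).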
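
To make the mismatch concrete, here is a minimal Python sketch (not Agavi code). It assumes the identifiers are encoded as UTF-8 and emulates {{{\p{IsLatin-1Supplement}}}} with the range U+0080 to U+00FF, because Python's {{{re}}} module has no block escapes. The names {{{LABEL_BYTES}}}, {{{SCHEMA_CHARS}}}, {{{is_php_label}}} and {{{passes_current_schema}}} are made up for illustration.

```python
import re

# PHP's LABEL rule, applied to raw bytes (regex taken from the ticket).
LABEL_BYTES = re.compile(rb"[a-zA-Z_\x7f-\xff][a-zA-Z0-9_\x7f-\xff]*")

# Emulation of the current schema pattern. It works on characters, and
# \p{IsLatin-1Supplement} is approximated by the block U+0080..U+00FF.
SCHEMA_CHARS = re.compile(r"[_A-Za-z\u0080-\u00ff][_A-Za-z0-9\u0080-\u00ff]*")


def is_php_label(name: str, encoding: str = "utf-8") -> bool:
    """Return True if the encoded bytes of `name` form a valid LABEL."""
    return LABEL_BYTES.fullmatch(name.encode(encoding)) is not None


def passes_current_schema(name: str) -> bool:
    """Return True if `name` matches the emulated character-level pattern."""
    return SCHEMA_CHARS.fullmatch(name) is not None


if __name__ == "__main__":
    for name in ("über", "русский"):
        print(name, is_php_label(name), passes_current_schema(name))
```

Both names pass the byte-level check, but only "über" passes the emulated schema pattern, which is the inconsistency described above.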

We have two options:
 1. Drop the validation in the schema and do it at runtime in the config
    handlers. This means assuming that the encoding of the XML file is the
    same as the encoding of the PHP file and doing the conversion at runtime
    (because if the XML file is in Shift_JIS and specifies a class with
    Japanese characters, then the PHP file must be in Shift_JIS, too).
 2. Simply mandate that for anything non-ASCII, UTF-8 must be used as the
    encoding in PHP source.

We'll go with 2) because 1) is bucketloads of work for cases nobody cares
about.
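
The pattern now needs to be changed so that the only ASCII characters it
allows are {{{[_A-Za-z0-9]}}} (or {{{[_A-Za-z]}}} for the first character),
while every non-ASCII character is accepted. That is possible in a schema
pattern, and in case of UTF-8 it translates directly into the byte ranges
of {{{LABEL}}}, because UTF-8 encodes every non-ASCII character as bytes in
the range {{{\x80-\xff}}}.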
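
One reading of this requirement, sketched in Python rather than in schema syntax: restrict ASCII to the listed ranges and accept any character from U+0080 upwards. This is an illustration under the assumption that PHP source is UTF-8, not the pattern that was committed; {{{PROPOSED_CHARS}}}, {{{matches_label_bytes}}} and {{{matches_proposed}}} are hypothetical names.

```python
import re

# Byte-level LABEL rule from the ticket.
LABEL_BYTES = re.compile(rb"[a-zA-Z_\x7f-\xff][a-zA-Z0-9_\x7f-\xff]*")

# Character-level rule: ASCII is limited to [_A-Za-z0-9] (no digit first),
# and every non-ASCII character (U+0080 and above) is accepted.
PROPOSED_CHARS = re.compile(
    r"[_A-Za-z\u0080-\U0010ffff][_A-Za-z0-9\u0080-\U0010ffff]*"
)


def matches_label_bytes(name: str) -> bool:
    """Check the UTF-8 bytes of `name` against the byte-level LABEL rule."""
    return LABEL_BYTES.fullmatch(name.encode("utf-8")) is not None


def matches_proposed(name: str) -> bool:
    """Check the characters of `name` against the sketched rule."""
    return PROPOSED_CHARS.fullmatch(name) is not None


if __name__ == "__main__":
    samples = ["über", "русский", "クラス", "_foo9", "9foo", "foo-bar", ""]
    for name in samples:
        # Both checks are expected to agree on these samples.
        print(repr(name), matches_label_bytes(name), matches_proposed(name))
```

The two checks agree on the samples above. One known difference is the DEL character ({{{\x7f}}}), which {{{LABEL}}} accepts but this sketch rejects because it is ASCII. In an XML Schema pattern the same idea could presumably be written with the complement block escape {{{\P{IsBasicLatin}}}}, though that has not been tested here against any validator.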
--
Ticket URL: <http://trac.agavi.org/ticket/1132#comment:1>
Agavi <http://www.agavi.org/>
An MVC Framework for PHP5