#478: Character escape sequences in CLDR data are not evaluated, single quotes
possibly broken, too
-------------------------+--------------------------------------------------
Reporter: david | Owner: dominik
Type: defect | Status: new
Priority: high | Milestone: 0.11
Component: translation | Version: 0.11.0RC4
Severity: critical | Keywords:
-------------------------+--------------------------------------------------
Example:
source:/tags/0.11.0RC4/src/translation/data/locales/[EMAIL PROTECTED]
Note the {{{\u00a0}}} - that's a non-breaking space
(http://www.unicode.org/charts/PDF/U0080.pdf), it should be converted.
Reproduce:
* configure and set locale to {{{ru_RU}}}
* format a date using the {{{long}}} format (here the {{{date}}} part only,
  for Nov 26, 2006).
* Expected result is: 26 ноября 2006 г.
  * Yes, including the trailing dot
  * The space between "2006" and "г" is a non-breaking space, Unicode
    character U+00A0 (decimal 160; in UTF-8 that is the two bytes
    0xC2 0xA0, not a single byte 160!)
* Actual result is: 26 ноября 2006\200600AM0г.'
  * the trailing single quote indicates that single quotes are probably
    handled incorrectly, see http://unicode.org/reports/tr35/#Unicode_Sets
    (subsection E.1)
  * the second "2006" is from the lowercase "u" in the escape sequence,
    which gets treated as a pattern letter
  * the "AM" is from the lowercase "a" in the escape sequence, which gets
    treated as a pattern letter as well
Possible solution: grab the escape sequences (maybe best at compile time!)
and convert them to XML entities, then convert them to UTF-8:
{{{
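// hex code point taken from the escape sequence \u00a0
$seq = '00a0';
// decode the numeric character reference into a UTF-8 encoded string
$char = html_entity_decode('&#x' . $seq . ';', ENT_QUOTES, 'UTF-8');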
}}}
The possible sequences are described in
http://unicode.org/reports/tr35/#Unicode_Sets section E.2 (I don't think
we can support {{{\N{name}}}} though).
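
A rough sketch of that conversion for just two of the forms, {{{\uhhhh}}}
and {{{\Uhhhhhhhh}}}. The function names are made up for illustration and
are not part of Agavi. The other sequences from section E.2 and escaped
backslashes are not handled here, and I'm assuming html_entity_decode()
accepts every code point we'd run into.

```php
<?php
// Callback for preg_replace_callback(): $m[1] holds the hex digits of a
// \uhhhh match, $m[2] those of a \Uhhhhhhhh match.
function ldml_decode_escape_match(array $m)
{
    $hex = $m[1] !== '' ? $m[1] : $m[2];
    // Drop leading zeros, then use the trick from above: build a numeric
    // character reference and let PHP decode it into a UTF-8 string.
    $hex = dechex(hexdec($hex));
    return html_entity_decode('&#x' . $hex . ';', ENT_QUOTES, 'UTF-8');
}

// Hypothetical helper: replaces \uhhhh and \Uhhhhhhhh in $value with the
// UTF-8 encoded characters they stand for. Everything else is left alone.
function ldml_unescape($value)
{
    return preg_replace_callback(
        '/\\\\u([0-9a-fA-F]{4})|\\\\U([0-9a-fA-F]{8})/',
        'ldml_decode_escape_match',
        $value
    );
}
```

With this, {{{ldml_unescape('yyyy\u00a0')}}} should give {{{yyyy}}}
followed by the two UTF-8 bytes of the non-breaking space.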
And to make things even nicer, the spec says:

  Any character formed as the result of a backslash escape loses any
  special meaning and is treated as a literal. In particular, note that \u
  and \U escapes create literal characters. (In contrast, Java treats
  Unicode escapes as just a way to represent arbitrary characters in an
  ASCII source file, and any resulting characters are not tagged as
  literals.)
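
To illustrate what "treated as a literal" would mean for a date pattern,
here is a minimal tokenizer sketch. It is hypothetical and not Agavi's
actual parser: the function name and the token format are made up, only
{{{\uhhhh}}} is recognized, and decoding escapes inside quoted text as
well is my own simplification.

```php
<?php
// Hypothetical tokenizer. Returns a list of array(type, text) pairs where
// type is either 'field' (pattern letters) or 'literal' (everything else).
function tokenize_date_pattern($pattern)
{
    $tokens = array();
    $literal = '';      // literal text collected since the last field
    $inQuotes = false;  // true while inside '...'
    $len = strlen($pattern);
    for ($i = 0; $i < $len; ) {
        $c = $pattern[$i];
        if ($c === '\\' && preg_match('/\Gu([0-9a-fA-F]{4})/', $pattern, $m, 0, $i + 1)) {
            // Escaped character: always a literal, even if it is a letter.
            $hex = dechex(hexdec($m[1]));
            $literal .= html_entity_decode('&#x' . $hex . ';', ENT_QUOTES, 'UTF-8');
            $i += 6; // backslash + "u" + four hex digits
        } elseif ($c === "'") {
            if ($i + 1 < $len && $pattern[$i + 1] === "'") {
                $literal .= "'"; // two quotes stand for one literal quote
                $i += 2;
            } else {
                $inQuotes = !$inQuotes; // the quote itself is not output
                $i++;
            }
        } elseif (!$inQuotes && preg_match('/^[A-Za-z]$/', $c) === 1) {
            // Unquoted ASCII letter: a run of the same letter is one field.
            $run = strspn($pattern, $c, $i);
            if ($literal !== '') {
                $tokens[] = array('literal', $literal);
                $literal = '';
            }
            $tokens[] = array('field', substr($pattern, $i, $run));
            $i += $run;
        } else {
            $literal .= $c; // quoted text, punctuation, non-ASCII bytes
            $i++;
        }
    }
    if ($literal !== '') {
        $tokens[] = array('literal', $literal);
    }
    return $tokens;
}
```

The point is the first branch: {{{\u0061}}} ends up as the literal text
"a", while a bare {{{a}}} is still the AM/PM field. Simply replacing the
escapes in the string before parsing would lose that difference.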
My guess is that many other locales use escape sequences, not only in date
(or other) patterns, so it's pretty important to fix this.
I suggest replacing all {{{$node->getValue()}}} calls in
AgaviLdmlConfigHandler with {{{$this->_($node)}}} or something similar,
where {{{_()}}} calls {{{getValue()}}} on the node and then looks for
escape sequences to evaluate.
Note: Prado and Symfony use ICU's {{{.dat}}} files and thus are not
affected. Zend Framework doesn't seem to handle the escape sequences at
all.
--
Ticket URL: <http://trac.agavi.org/ticket/478>
Agavi <http://www.agavi.org/>
An MVC Framework for PHP5