2012/4/22 C.Koy can5...@gmail.com
On 4/21/2012 4:37 AM, Galen Wright-Watson wrote:
What about instead creating a special-purpose Zend function to normalize
class names (zend_normalize_class_name, or zend_classname_tolower)? This
function would examine the current locale and, if it's a problematic one,
convert the string to lower case on its own (calling zend_tolower on
non-problematic characters). Alternatively, zend_normalize_class_name could
switch LC_CTYPE to an appropriate locale (e.g. UTF-8; the locale could be
determined at compile time), call zend_str_tolower_copy, then switch back
before returning. Then, any appropriate function (e.g.
zend_resolve_class_name, zend_lookup_class_ex, class_exists, class_alias)
would call zend_normalize_class_name instead of zend_str_tolower_copy/
zend_str_tolower_dup.
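A minimal sketch of the first variant, in plain C rather than against the Zend API. The names normalize_class_name and locale_is_problematic are hypothetical stand-ins, and the prefix test is only a placeholder: a real check would need a vetted list of affected locale names.

```c
#include <ctype.h>
#include <locale.h>
#include <stddef.h>
#include <string.h>

/* Placeholder check (assumption): treats locale names starting with "tr"
 * or "az" as problematic. A complete solution needs a proper list. */
static int locale_is_problematic(const char *loc)
{
    return loc != NULL &&
           (strncmp(loc, "tr", 2) == 0 || strncmp(loc, "az", 2) == 0);
}

/* Hypothetical stand-in for the proposed zend_normalize_class_name.
 * Writes the lowercased name into dst, which must hold len + 1 bytes. */
static void normalize_class_name(char *dst, const char *src, size_t len)
{
    int problematic = locale_is_problematic(setlocale(LC_CTYPE, NULL));

    for (size_t i = 0; i < len; i++) {
        unsigned char c = (unsigned char)src[i];
        if (problematic && c < 0x80) {
            /* Map ASCII letters by hand so the locale cannot interfere. */
            dst[i] = (c >= 'A' && c <= 'Z') ? (char)(c + ('a' - 'A')) : (char)c;
        } else {
            /* Everything else goes through the usual locale-aware call. */
            dst[i] = (char)tolower(c);
        }
    }
    dst[len] = '\0';
}
```

Callers would pass a buffer of at least len + 1 bytes and use the result as the lookup key in place of the output of zend_str_tolower_copy.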
In plain words/pseudo-code, adding an if statement at a certain step
should suffice, like:
1. lowercase the name;
2. if the effective locale is tr_XY, then replace every ı with i;
3. look up the name;
For those who have nothing to do with Turkish locales, that should incur
the overhead of an if condition only.
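The three steps above can be sketched roughly as follows. This assumes a single-byte ISO-8859-9 locale, where ı is the byte 0xFD; the function name and the turkish_locale flag are hypothetical, and how the flag is computed is left open.

```c
#include <ctype.h>
#include <stddef.h>

/* Assumption: ISO-8859-9, where dotless i is the single byte 0xFD. */
#define DOTLESS_I_LATIN5 0xFD

/* Hypothetical helper: lowercases name in place, then applies the fix-up. */
static void lowercase_for_lookup(char *name, size_t len, int turkish_locale)
{
    /* Step 1: lowercase using whatever locale is in effect. */
    for (size_t i = 0; i < len; i++) {
        name[i] = (char)tolower((unsigned char)name[i]);
    }

    /* Step 2: in a Turkish locale, fold dotless i back to plain i. */
    if (turkish_locale) {
        for (size_t i = 0; i < len; i++) {
            if ((unsigned char)name[i] == DOTLESS_I_LATIN5) {
                name[i] = 'i';
            }
        }
    }

    /* Step 3, the lookup itself, is left to the caller. */
}
```

For other locales the cost is the single if test; in a Turkish locale the name is scanned a second time.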
The fix would need to be applied to at least four functions, so adding a
new function would be more maintainable. Also, there are affected locales
whose names neither begin with tr_ nor contain TR, so the condition would
need to be more complex.
Converting I or ı separately from lowercase conversion is less
performant than either option I describe, as it requires an extra loop,
which is why I didn't bother suggesting it. I suspect switching the locale
is most performant, as it doesn't require additional tests, though I
haven't examined the cost of setting the locale.
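For reference, a rough sketch of the locale-switching variant in plain C. It switches to the "C" locale rather than a UTF-8 one, purely because "C" is guaranteed to exist; the function name is hypothetical. Note that setlocale affects the whole process and is not thread-safe, which would matter for threaded builds.

```c
#include <ctype.h>
#include <locale.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: lowercases name in place under the "C" locale.
 * Returns 0 on success, -1 if the locale could not be saved or switched. */
static int lowercase_in_c_locale(char *name, size_t len)
{
    const char *current = setlocale(LC_CTYPE, NULL);
    if (current == NULL) {
        return -1;
    }

    /* Copy the name: the returned string may be overwritten by the next
     * setlocale call. */
    size_t n = strlen(current) + 1;
    char *saved = malloc(n);
    if (saved == NULL) {
        return -1;
    }
    memcpy(saved, current, n);

    if (setlocale(LC_CTYPE, "C") == NULL) {
        free(saved);
        return -1;
    }

    for (size_t i = 0; i < len; i++) {
        name[i] = (char)tolower((unsigned char)name[i]);
    }

    /* Switch back before returning. */
    setlocale(LC_CTYPE, saved);
    free(saved);
    return 0;
}
```

The extra work per call is two setlocale calls plus one allocation, which is the cost that would need measuring.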
But I did not start this thread to discuss such a bug fix, because:
1. It does not take a genius to figure it out, and it should take minutes to
implement for someone experienced in the internals. Given the 10-year span
and dozens of comments/complaints on the bug's entry, it's hard to say this
issue went unnoticed. So I had to conclude that such a fix has quietly been
overruled for performance and/or other undisclosed reasons.
Why does it matter if a solution is simple? If anything, that a fix does
not take a genius is an argument in its favor, if it also solves the
problem.
If it's already been rejected privately, it's time to bring the reasons
into the open (which is why I asked). If not, it should be considered
publicly.
2. Absent bug #18556, case-sensitive PHP has merits, as I stated in another
post, and several people voiced opinions in favor. Case-sensitive PHP is
worth considering.
It is, but it's also a major BC break, hence perhaps better suited for
PHP6. Case-sensitivity is also a much bigger issue than this bug. A custom
conversion function, on the other hand, has the least impact of any
option I've read. As such, it's hopefully a solution for this bug that
everyone can agree on.
Does this bug pop up for locales other than Turkish, Azerbaijani and
Kurdish?
Theoretically, this problem occurs for any set of locales that share a
letter whose lowercase form differs between them, when the PHP script
switches among these locales during its execution.
The abstract property that makes a locale problematic is obvious. I was
looking for specific locales, as they need to be identified for a complete
solution.