Re: [PHP-DEV] Complete case-sensitivity in PHP

2012-04-22 Thread C.Koy

On 4/21/2012 4:37 AM, Galen Wright-Watson wrote:

What about instead creating a special-purpose Zend function to normalize
class names (zend_normalize_class_name, or zend_classname_tolower)? This
function would examine the current locale and, if it's a problematic one,
convert the string to lower case on its own (calling zend_tolower on
non-problematic characters). Alternatively, zend_normalize_class_name could
switch LC_CTYPE to an appropriate locale (e.g. UTF-8; the locale could be
determined at compile time), call zend_str_tolower_copy, then switch back
before returning. Then, any appropriate function (e.g.
zend_resolve_class_name, zend_lookup_class_ex, class_exists,  class_alias)
would call zend_normalize_class_name instead of zend_str_tolower_copy/
zend_str_tolower_dup.
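
A minimal sketch of the first option above, under stated assumptions: the name normalize_class_name is only the hypothetical one proposed here, not an existing Zend function, and plain tolower() stands in for zend_tolower. The one problematic character, 'I', is mapped by hand, so the result for it does not depend on LC_CTYPE; every other byte still goes through the locale-aware call.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper (not part of the Zend API).
 * Writes a lowercased copy of src[0..len) into dst, which must have room
 * for len + 1 bytes. 'I' is handled by hand so that it always becomes 'i',
 * whatever the current LC_CTYPE says; all other bytes use tolower(). */
static void normalize_class_name(char *dst, const char *src, size_t len)
{
    assert(dst != NULL && src != NULL);

    for (size_t i = 0; i < len; i++) {
        unsigned char c = (unsigned char)src[i];
        if (c == 'I') {
            dst[i] = 'i';               /* problematic character: fixed mapping */
        } else {
            dst[i] = (char)tolower(c);  /* non-problematic: locale-aware call */
        }
    }
    dst[len] = '\0';
}
```

This variant needs no locale test at all, at the price of one extra comparison per byte.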


In plain words/pseudo-code, adding an if statement at a certain step 
should suffice, like:


1. lowercase the name;
2. if the effective locale is tr_XY, then replace every ı with i;
3. look up the name;

For those who have nothing to do with Turkish locales, that should incur 
the overhead of an if condition only.
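
The three steps might be sketched in C as follows. This is an illustration, not engine code: the function name is hypothetical, and instead of matching a tr_XY locale name it assumes the locale can be detected by checking what tolower() does to 'I'. Step 3, the lookup itself, is left to the caller.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper illustrating steps 1 and 2.
 * dst must have room for len + 1 bytes. */
static void lowercase_then_fix_dotless_i(char *dst, const char *src, size_t len)
{
    assert(dst != NULL && src != NULL);

    /* Step 1: lowercase the name with the locale-aware tolower(). */
    for (size_t i = 0; i < len; i++) {
        dst[i] = (char)tolower((unsigned char)src[i]);
    }
    dst[len] = '\0';

    /* Step 2: the single "if". Assumption: a locale is problematic exactly
     * when it lowercases 'I' to something other than 'i'. */
    int lowered_i = tolower('I');
    if (lowered_i != 'i') {
        for (size_t i = 0; i < len; i++) {
            if ((unsigned char)dst[i] == (unsigned char)lowered_i) {
                dst[i] = 'i';   /* replace the dotless i with a plain i */
            }
        }
    }

    /* Step 3: the caller looks up dst in the class table. */
}
```

In a non-problematic locale the second loop never runs, so the extra cost is the one tolower('I') call and the comparison.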



But I did not start this thread to discuss such a bug fix, because:

1. It does not take a genius to figure it out, and should take minutes 
to implement for someone experienced in the internals. Given the 10-year 
span and dozens of comments/complaints on the bug's entry, it's hard to 
say this issue went unnoticed. So I had to conclude that such a fix has 
quietly been overruled for performance and/or other undisclosed reasons.
2. Absent bug #18556, case-sensitive PHP has merits, as I stated in another 
post, and several people voiced opinions in favor. Case-sensitive PHP is 
worth considering.




Does this bug pop up for locales other than Turkish, Azerbaijani and Kurdish?


Theoretically, this problem occurs for any set of locales that share a 
letter whose lowercase form differs from one locale to another, when the 
PHP script switches among these locales during its execution.


best regards,






--
PHP Internals - PHP Runtime Development Mailing List
To unsubscribe, visit: http://www.php.net/unsub.php



Re: [PHP-DEV] Complete case-sensitivity in PHP

2012-04-22 Thread Galen Wright-Watson
2012/4/22 C.Koy can5...@gmail.com

 On 4/21/2012 4:37 AM, Galen Wright-Watson wrote:

 What about instead creating a special-purpose Zend function to normalize
 class names (zend_normalize_class_name, or zend_classname_tolower)? This
 function would examine the current locale and, if it's a problematic one,
 convert the string to lower case on its own (calling zend_tolower on
 non-problematic characters). Alternatively, zend_normalize_class_name
 could switch LC_CTYPE to an appropriate locale (e.g. UTF-8; the locale
 could be determined at compile time), call zend_str_tolower_copy, then
 switch back before returning. Then, any appropriate function (e.g.
 zend_resolve_class_name, zend_lookup_class_ex, class_exists,  class_alias)
 would call zend_normalize_class_name instead of zend_str_tolower_copy/
 zend_str_tolower_dup.


 In plain words/pseudo-code, adding an if statement at a certain step
 should suffice, like:

 1. lowercase the name;
 2. if the effective locale is tr_XY, then replace every ı with i;
 3. look up the name;

 For those who have nothing to do with Turkish locales, that should incur
 the overhead of an if condition only.

The fix would need to be applied to at least four functions, so adding a
new function would be more maintainable. Also, there are affected locales
whose names neither begin with tr_ nor contain TR, so the condition would
need to be more complex.

Converting I or ı separately from lowercase conversion is less
performant than either option I describe, as it requires an extra loop,
which is why I didn't bother suggesting it. I suspect switching the locale
is most performant, as it doesn't require additional tests, though I
haven't examined the cost of setting the locale.
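
A rough sketch of the locale-switching option, for illustration only. The helper name is hypothetical, the "C" locale is an assumed choice of target locale, and plain tolower() stands in for zend_str_tolower_copy. Note that setlocale() changes the locale for the whole process, so this approach would need care in threaded builds.

```c
#include <assert.h>
#include <ctype.h>
#include <locale.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: lowercases src[0..len) into dst (len + 1 bytes)
 * while LC_CTYPE is temporarily set to "C", then restores the previous
 * LC_CTYPE. Returns 0 on success, -1 on failure. */
static int tolower_in_c_locale(char *dst, const char *src, size_t len)
{
    assert(dst != NULL && src != NULL);

    /* Query the current LC_CTYPE; the returned string may be overwritten
     * by later setlocale() calls, so keep a private copy. */
    const char *current = setlocale(LC_CTYPE, NULL);
    if (current == NULL) {
        return -1;
    }
    size_t n = strlen(current) + 1;
    char *saved = malloc(n);
    if (saved == NULL) {
        return -1;
    }
    memcpy(saved, current, n);

    if (setlocale(LC_CTYPE, "C") == NULL) {
        free(saved);
        return -1;
    }

    /* In the "C" locale, tolower() maps 'I' to 'i'. */
    for (size_t i = 0; i < len; i++) {
        dst[i] = (char)tolower((unsigned char)src[i]);
    }
    dst[len] = '\0';

    /* Switch back before returning. */
    setlocale(LC_CTYPE, saved);
    free(saved);
    return 0;
}
```

The conversion loop itself has no extra tests, but each call pays for two setlocale() switches and one allocation, which is the cost that would have to be measured.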


 But I did not start this thread to discuss such a bug fix, because:

 1. It does not take a genius to figure it out, and should take minutes to
 implement for someone experienced in the internals. Given the 10-year span
 and dozens of comments/complaints on the bug's entry, it's hard to say this
 issue went unnoticed. So I had to conclude that such a fix has quietly been
 overruled for performance and/or other undisclosed reasons.


Why does it matter if a solution is simple? If anything, that a fix does
not take a genius is an argument in its favor, if it also solves the
problem.

If it's already been rejected privately, it's time to bring the reasons
into the open (which is why I asked). If not, it should be considered
publicly.


 2. Absent bug #18556, case-sensitive PHP has merits, as I stated in another
 post, and several people voiced opinions in favor. Case-sensitive PHP is
 worth considering.

It is, but it's also a major BC break, hence perhaps better suited for
PHP6. Case-sensitivity is also a much bigger issue than this bug. A custom
conversion function, on the other hand, has the least impact of any
option I've read. As such, it's hopefully a solution for this bug that
everyone can agree on.


 Does this bug pop up for locales other than Turkish, Azerbaijani and
 Kurdish?


 Theoretically, this problem occurs for any set of locales that share a
 letter whose lowercase form differs from one locale to another, when the
 PHP script switches among these locales during its execution.

The abstract property that makes a locale problematic is obvious. I was
looking for specific locales, as they need to be identified for a complete
solution.


Re: [PHP-DEV] Complete case-sensitivity in PHP

2012-04-22 Thread Yasuo Ohgaki
Hi,

2012/4/23 Galen Wright-Watson ww.ga...@gmail.com:
 2. Absent bug #18556, case-sensitive PHP has merits, as I stated in another
 post, and several people voiced opinions in favor. Case-sensitive PHP is
 worth considering.

 It is, but it's also a major BC break, hence perhaps better suited for
 PHP6. Case-sensitivity is also a much bigger issue than this bug. A custom
 conversion function, on the other hand, has the least impact of any
 option I've read. As such, it's hopefully a solution for this bug that
 everyone can agree on.

A conversion script could be provided.
It would be a rather simple script using the tokenizer.

Anyway, if we are going to change the function naming rule, consistent
module function names should be considered at the same time.
createimage(), htmlentities(), etc. should be create_image(), html_entities().
There is an alias system, so this is just a matter of defining aliases for them.

Regards,

--
Yasuo Ohgaki
yohg...@ohgaki.net
