Re: [PHP] utf-8-safe replacement for strtr()?

2009-03-27 Thread Tom Worster
On 3/26/09 11:36 AM, Nisse Engström news.nospam.0ixbt...@luden.se wrote:

> On Wed, 25 Mar 2009 11:32:42 +0100, Nisse Engström wrote:
>
>> On Tue, 24 Mar 2009 08:15:35 -0400, Tom Worster wrote:
>>
>>> strtr() with three parameters is certainly unsafe. but my tests are showing
>>> that it may be ok with two parameters if the strings in the second parameter
>>> are well formed utf-8.
>>>
>>> does anyone know more? can confirm or contradict?
>>
>> The two-argument version of strtr() should work fine
>> since there are no collisions in utf-8 such that part
>> of one character matches part of a different character.
>
> Oops. I meant to write that one complete character does
> not match any part of any other character. If a string
> of one or more utf-8 characters matches a utf-8 text, it
> matches exactly those characters in the text. If that
> makes sense...

yes.

my conclusion is that 2-param strtr is safe if the subject text and
parameter strings are valid utf-8.
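
a minimal sketch of how that precondition could be enforced, assuming mbstring is available. the helper name utf8_strtr_checked() is hypothetical and not something from this thread; it only validates its inputs with mb_check_encoding() and then hands off to the ordinary 2-param strtr():

```php
<?php
// Hypothetical helper (not from the thread): refuse to translate unless the
// subject and every key and value in the map are valid UTF-8.
function utf8_strtr_checked(string $subject, array $map): string
{
    if (!mb_check_encoding($subject, 'UTF-8')) {
        throw new InvalidArgumentException('subject is not valid UTF-8');
    }
    foreach ($map as $from => $to) {
        // Cast because PHP turns numeric string keys into integers.
        if (!mb_check_encoding((string) $from, 'UTF-8')
            || !mb_check_encoding((string) $to, 'UTF-8')) {
            throw new InvalidArgumentException('map contains invalid UTF-8');
        }
    }
    return strtr($subject, $map);
}

// "naïve café" written with byte escapes so the file encoding does not matter.
echo utf8_strtr_checked(
    "na\xC3\xAFve caf\xC3\xA9",
    ["\xC3\xAF" => 'i', "\xC3\xA9" => 'e']
), "\n"; // prints "naive cafe"
```

whether the check is worth its cost on every call is a judgment call; validating once at the input boundary would serve the same purpose.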



--
PHP General Mailing List (http://www.php.net/)
To unsubscribe, visit: http://www.php.net/unsub.php



Re: [PHP] utf-8-safe replacement for strtr()?

2009-03-26 Thread Nisse Engström
On Wed, 25 Mar 2009 11:32:42 +0100, Nisse Engström wrote:

> On Tue, 24 Mar 2009 08:15:35 -0400, Tom Worster wrote:
>
>> strtr() with three parameters is certainly unsafe. but my tests are showing
>> that it may be ok with two parameters if the strings in the second parameter
>> are well formed utf-8.
>>
>> does anyone know more? can confirm or contradict?
>
> The two-argument version of strtr() should work fine
> since there are no collisions in utf-8 such that part
> of one character matches part of a different character.

Oops. I meant to write that one complete character does
not match any part of any other character. If a string
of one or more utf-8 characters matches a utf-8 text, it
matches exactly those characters in the text. If that
makes sense...
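
A small sketch of that property, using characters picked only for illustration: "ã" shares its lead byte (0xC3) with "é", and "©" shares its trailing byte (0xA9) with "é". A byte-wise search for the complete "é" should still hit only the real "é":

```php
<?php
$e_acute   = "\xC3\xA9"; // U+00E9 "é"
$a_tilde   = "\xC3\xA3"; // U+00E3 "ã", same lead byte as "é"
$copyright = "\xC2\xA9"; // U+00A9 "©", same trailing byte as "é"

$text = $a_tilde . $copyright . $e_acute;

// Two-argument strtr() compares raw bytes, but only the last character
// contains the full sequence C3 A9, so only that one is replaced.
$out = strtr($text, [$e_acute => 'e']);

var_dump($out === $a_tilde . $copyright . 'e'); // bool(true)
var_dump(mb_check_encoding($out, 'UTF-8'));     // bool(true)
```

This works because lead bytes and continuation bytes come from disjoint ranges in well-formed UTF-8, so a match can never start in the middle of a character.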


/Nisse




Re: [PHP] utf-8-safe replacement for strtr()?

2009-03-25 Thread Nisse Engström
On Tue, 24 Mar 2009 08:15:35 -0400, Tom Worster wrote:

> On 3/23/09 2:02 PM, Tom Worster f...@thefsb.org wrote:
>
>> i have a general replacement or workaround for every php function in my code
>> that i know to be utf-8-unsafe. except one: strtr().
>
> strtr() with three parameters is certainly unsafe. but my tests are showing
> that it may be ok with two parameters if the strings in the second parameter
> are well formed utf-8.
>
> does anyone know more? can confirm or contradict?

The two-argument version of strtr() should work fine
since there are no collisions in utf-8 such that part
of one character matches part of a different character.

The question is whether the function is binary safe.
The manual page doesn't say as far as I can tell.
Google came up with the following:

strtr() made binary safe in PHP3:
  http://marc.info/?l=php-general&m=92740681805351&w=4

Two-argument version added in PHP4:
  http://php.net/ChangeLog-4.php
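
Binary safety can also be checked empirically. The following is only a sketch that tests whatever PHP version runs it, with a made-up subject string containing a NUL byte; it says nothing about what the manual guarantees:

```php
<?php
// Subject with an embedded NUL byte between two UTF-8 words.
$subject = "caf\xC3\xA9\x00na\xC3\xAFve";

$out = strtr($subject, ["\xC3\xA9" => 'e', "\xC3\xAF" => 'i']);

// If strtr() stopped at the NUL, the second word would be lost or untouched.
var_dump(strlen($subject));        // int(12)
var_dump(strlen($out));            // int(10)
var_dump($out === "cafe\x00naive"); // bool(true) on a binary-safe strtr()
```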


/Nisse




Re: [PHP] utf-8-safe replacement for strtr()?

2009-03-25 Thread Tom Worster
thanks for the info.

i'll leave 2-param uses of strtr in my code alone. i have a replacement for
the 3-param version.

btw: i have quite a long checklist of stuff to do when upgrading code for
utf-8, including notes on about 100 functions. do you think it would be
worth putting it on a wiki somewhere?


On 3/25/09 6:32 AM, Nisse Engström news.nospam.0ixbt...@luden.se wrote:

> On Tue, 24 Mar 2009 08:15:35 -0400, Tom Worster wrote:
>
>> On 3/23/09 2:02 PM, Tom Worster f...@thefsb.org wrote:
>>
>>> i have a general replacement or workaround for every php function in my code
>>> that i know to be utf-8-unsafe. except one: strtr().
>>
>> strtr() with three parameters is certainly unsafe. but my tests are showing
>> that it may be ok with two parameters if the strings in the second parameter
>> are well formed utf-8.
>>
>> does anyone know more? can confirm or contradict?
>
> The two-argument version of strtr() should work fine
> since there are no collisions in utf-8 such that part
> of one character matches part of a different character.
>
> The question is whether the function is binary safe.
> The manual page doesn't say as far as I can tell.
> Google came up with the following:
>
> strtr() made binary safe in PHP3:
>   http://marc.info/?l=php-general&m=92740681805351&w=4
>
> Two-argument version added in PHP4:
>   http://php.net/ChangeLog-4.php
>
>
> /Nisse






Re: [PHP] utf-8-safe replacement for strtr()?

2009-03-24 Thread Tom Worster
On 3/23/09 2:02 PM, Tom Worster f...@thefsb.org wrote:

> i have a general replacement or workaround for every php function in my code
> that i know to be utf-8-unsafe. except one: strtr().

strtr() with three parameters is certainly unsafe. but my tests are showing
that it may be ok with two parameters if the strings in the second parameter
are well formed utf-8.
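
to illustrate why the 3-param form is unsafe, here is a small sketch with strings chosen only for the demo. the 3-param form translates byte by byte, so it can split a multi-byte character or silently turn one character into another:

```php
<?php
// Case 1: $from is 2 bytes ("é"), $to is 1 byte, so strtr() ignores the extra
// byte of $from and maps only 0xC3 => "e". The stray 0xA9 is left behind.
$broken = strtr("caf\xC3\xA9", "\xC3\xA9", "e");
echo bin2hex($broken), "\n";                   // 63616665a9
var_dump(mb_check_encoding($broken, 'UTF-8')); // bool(false)

// Case 2: "é" (C3 A9) => "ø" (C3 B8) maps the byte A9 => B8 everywhere, so an
// unrelated "©" (C2 A9) becomes "¸" (C2 B8). The result is still valid UTF-8,
// which makes this kind of corruption easy to miss.
$changed = strtr("\xC2\xA9", "\xC3\xA9", "\xC3\xB8");
echo bin2hex($changed), "\n";                  // c2b8
```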

does anyone know more? can confirm or contradict?


> the only ideas i have to implement strtr in php with known utf-8-safe php
> functions would be rather inefficient.

my replacement for the 3-param strtr is an order of magnitude less unlovely
to behold than my replacement for the 2-param version
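
for readers who want something concrete, here is one possible shape for a 3-param replacement. it is a sketch, not the replacement referred to above: the name utf8_strtr3() is hypothetical, and it assumes PCRE was built with utf-8 support so that preg_split() with the /u modifier can split a string into characters. it pairs the characters of the two lists and then relies on the 2-param strtr():

```php
<?php
// Hypothetical function name; pairs up whole UTF-8 characters, not bytes.
function utf8_strtr3(string $subject, string $from, string $to): string
{
    // With /u, preg_split() returns false if the input is not valid UTF-8.
    $fromChars = preg_split('//u', $from, -1, PREG_SPLIT_NO_EMPTY);
    $toChars   = preg_split('//u', $to, -1, PREG_SPLIT_NO_EMPTY);
    if ($fromChars === false || $toChars === false) {
        throw new InvalidArgumentException('from/to must be valid UTF-8');
    }

    // Like strtr(), ignore the extra characters of the longer list.
    $n = min(count($fromChars), count($toChars));
    if ($n === 0) {
        return $subject;
    }

    $map = [];
    for ($i = 0; $i < $n; $i++) {
        $map[$fromChars[$i]] = $toChars[$i]; // a later duplicate overrides an earlier one
    }
    return strtr($subject, $map);
}

// "été ©" with "é" => "e": the "©" is left alone.
echo bin2hex(utf8_strtr3("\xC3\xA9t\xC3\xA9 \xC2\xA9", "\xC3\xA9", "e")), "\n"; // 65746520c2a9
```

the subject itself is not validated here; that is assumed to have been checked elsewhere.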


> any ideas, suggestions, pointers? or maybe you're better at googling an
> answer than i am. assume mbstring is available.


