[PHP] Re: ctype_print, the British Pound and other non-ASCII characters

2010-02-26 Thread Nathan Rixham
Bob wrote:
 I'm seeing mischief from ctype_print.
 
 So far as I can tell, the British Pound symbol, '£' is considered a 
 printable character according to the locale I use on my Ubuntu box. But 
 even across two years, two boxes, several versions of Ubuntu (from 7.04 
 to 9.10, one x86, one AMD64), and two major versions of PHP (PHP 4 and 
 now PHP 5.2.11), I cannot get ctype_print to return true when a string 
 given to it contains the British Pound symbol. (Or other non-ASCII 
 characters such as ø or ß.)
 
 The locale I'm using is en_GB.UTF-8 and when I call setlocale(LC_ALL, 
 'en_GB.UTF-8') in PHP, it returns the name of this locale rather than 
 FALSE, so that seems to be in order. (However, to be sure I have 
 installed and reinstalled the language pack in Ubuntu as suggested by 
 others.)
 
 I've even read through the en_GB and i18n locale definition files to 
 confirm that U00A3 (for the British Pound symbol) does appear within 
 the print and graph sections, so both ctype_print and ctype_graph should 
 consider it acceptable.
 
 What's most maddening is that ctype_print does return true on my shared 
 hosting server, so I know that it can be achieved. I'm just hoping that 
 someone here can tell me what I'm doing wrong, or what my operating 
 system is doing wrong.
 
 For your information, I'm currently running the following:
 
 Ubuntu 9.10 (AMD64)
 Apache 2.2.14
 PHP 5.2.11 running as a CGI (to mirror the config of my shared host)
 Locale in use: en_GB.UTF-8
 LANG=en_GB.UTF-8
 
 Can anyone tell me how to get ctype_print to behave?

Tested on a few Ubuntu boxes (89s) and:

When using en_US.utf8, all is fine:

var_dump( ctype_print( 'abcd ef £ ghs als kl ,!' ) ); // TRUE

then:

# locale-gen en_GB.UTF-8
Generating locales...
  en_GB.UTF-8... done
Generation complete.

# locale -a
C
en_GB.utf8
en_US
en_US.utf8
POSIX

setlocale(LC_ALL, 'en_GB.UTF-8');
var_dump( ctype_print( 'abcd ef £ ghs als kl ,!' ) ); // FALSE

I'm wondering whether this is a PHP issue or a locale mapping generation
issue on Ubuntu.

Have you checked the output of `locale` in a shell to ensure LC_CTYPE is
set to the appropriate value?
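
The same LC_CTYPE check can be made from inside PHP, which may be worth doing because a CGI or web server process does not necessarily inherit the shell's environment. What follows is only a minimal sketch: the helper name current_ctype_locale is made up for illustration, and it relies on setlocale() returning the current setting unchanged when the locale argument is the string "0".

```php
<?php
// Hypothetical helper: report the LC_CTYPE locale PHP is actually using.
// Passing "0" as the locale asks setlocale() to return the current
// setting without changing it.
function current_ctype_locale()
{
    $current = setlocale(LC_CTYPE, '0');
    return $current === false ? '(unknown)' : $current;
}

echo 'before: ', current_ctype_locale(), PHP_EOL;

// setlocale() returns false if the requested locale is not available.
$result = setlocale(LC_ALL, 'en_GB.UTF-8');
echo 'setlocale returned: ', var_export($result, true), PHP_EOL;
echo 'after:  ', current_ctype_locale(), PHP_EOL;
```

If the "after" line does not show the locale that was requested, the ctype functions are not running under it, whatever `locale` reports in the shell.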

regards!

-- 
PHP General Mailing List (http://www.php.net/)
To unsubscribe, visit: http://www.php.net/unsub.php



[PHP] Re: ctype_print, the British Pound and other non-ASCII characters

2010-02-26 Thread Bob
Hello, Nathan.

I'm glad to hear that someone else can reproduce the problem with 
en_GB.UTF-8. I was worried it was some bad-luck quirk that I was never 
going to get to the bottom of.

I tried using en_US.utf8 (and also en_US.UTF-8) in setlocale (and it did 
not return false, so again looks like the locale is found and accepted). 
But I still got a return of false from ctype_print for non-ASCII 
characters. So even with en_US I'm getting bad behaviour.

When you switch back to en_US.UTF-8 (or en_US.utf8) do you get true from 
ctype_print as expected? (I'm hoping that you don't suddenly find 
ctype_print refuses to behave properly under all locales.)

Output from `locale` shows that all types are 'en_GB.UTF-8' except LC_ALL 
which is blank (as I believe it should be).

Do you know how I can dig further? I don't know anything about debugging 
PHP or Linux, so I don't know how to trace the source of this strange 
result.
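
One way to dig further without a debugger is to look at the string byte by byte. PHP strings are byte sequences and the ctype functions examine them one byte at a time, so in UTF-8 the '£' reaches ctype_print as two bytes (0xC2 0xA3) rather than as one character. The sketch below only shows which bytes are rejected under the current locale; it does not by itself settle whether PHP or the system locale data is at fault. The helper name dump_bytes is made up for illustration.

```php
<?php
// Hypothetical helper: list each byte of $s together with whether
// ctype_print() accepts that byte on its own under the current locale.
function dump_bytes($s)
{
    $rows = array();
    $len = strlen($s);               // strlen() counts bytes, not characters
    for ($i = 0; $i < $len; $i++) {
        $byte = $s[$i];
        $rows[] = sprintf(
            '0x%02X %s',
            ord($byte),
            ctype_print($byte) ? 'print' : 'non-print'
        );
    }
    return $rows;
}

setlocale(LC_ALL, 'en_GB.UTF-8');

// "\xC2\xA3" is the UTF-8 encoding of the pound sign (U+00A3).
echo implode(PHP_EOL, dump_bytes("a\xC2\xA3")), PHP_EOL;
```

Running the same script on the machine where ctype_print returns true and comparing the two listings, along with the locale each one reports, should narrow down where the behaviour differs.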




[PHP] Re: ctype_print, the British Pound and other non-ASCII characters

2010-02-26 Thread Bob
In php.i18n, an interesting discussion about this problem has appeared.

It looks like the problem is Ubuntu and not PHP: a short piece of C code 
that calls isprint, the native equivalent of ctype_print, also returns 
false for the British Pound symbol.
