[PHP-I18N] Re: [PHP-DEV] Unicode string literals and casting

Andrei Zmievski Thu, 02 Mar 2006 09:47:45 -0800

[moving the discussion to php-i18n list]

Will a program always be able to change the runtime_encoding setting?


Yes.

Some hosts like to lock off everything and disable ini_set etc. If the host has hardlocked it at something terrible, can my portable program still declare that it needs to work with UTF-8?

You can change anything but unicode_semantics from your own .ini file or from within the script (using declare() pragma instead of script_encoding).

Which brings to mind: if the input in $_REQUEST etc. has been misconverted by a bad setting, how do I get at the unconverted data to fix it? The (outdated ;) README says this will be possible but I didn't see any reference to how.


Yes, that part has not been implemented yet.

I do find the FATAL ERRORS on using the 'wrong' string type a bit odd though; most other types in PHP will coerce silently (string . int), and the wildly incompatible ones usually cause mere NOTICE or WARNING-level messages.
Was this change from PHP's regular behavior a conscious decision to make people think harder about what kind of strings they're using? From the original design document I got the impression that it was meant to be specific to special binary-only strings, which would be used relatively rarely (e.g. for binary file I/O) while more typical strings would transparently "just work" most of the time. Now the binary strings have replaced the native strings and the whole behavior has changed.

The only difference between binary and native strings in the original design was that binary ones did not participate in implicit or some explicit conversions. Now that these two types have been conflated, we may have to adjust the semantics, which is why I proposed that casting operators (explicit conversions) always work. We could make implicit conversions work also if we work out the details, such as what encoding to use for converting binary strings to Unicode (script or runtime). I kind of like your idea of allowing only ASCII characters in binary string literals.

(A comparison with other languages: Python is normally very strict about typing and won't even let you concatenate a string with an integer without an explicit conversion. But it will let you concatenate a byte string with a Unicode string, with an automatic coercion to Unicode.)
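
[A small sketch of the Python comparison above, for readers who want to try it. The coercion described is Python 2 behavior, current when this message was written; Python 3 rejects mixing byte strings and Unicode strings just as it rejects string + integer. The helper name try_concat is made up for this illustration, and the code assumes Python 3.]

```python
# Python 3 sketch: str is Unicode text, bytes is binary data.

def try_concat(left, right):
    """Return left + right, or the name of the exception raised (hypothetical helper)."""
    try:
        return left + right
    except TypeError as exc:
        return type(exc).__name__

# String + integer is rejected unless the integer is converted explicitly.
str_plus_int = try_concat("count: ", 3)
explicit_int = "count: " + str(3)

# Python 3 also rejects bytes + str; Python 2 coerced the byte string
# to Unicode implicitly at this point.
bytes_plus_str = try_concat(b"caf\xc3\xa9", " au lait")

# The explicit conversion has to name an encoding, which is the same
# question raised in this thread (script or runtime encoding).
explicit_bytes = b"caf\xc3\xa9".decode("utf-8") + " au lait"
```

[The explicit decode() call is roughly the equivalent of the "casting operators always work" proposal: the conversion is allowed, but the caller asks for it.]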


I guess they don't worry about script encoding?

Personally I have no use for non-ASCII identifiers.
Anything that needs to get used for referring to identifiers, though, needs to be able to operate consistently in some fashion...
* array_map("some_function_name", $data);
* $GLOBALS["myConfigVar"] = $newval;
etc
These probably need to either 'just work' when passed the other kind of string, or have some kind of consistent cast available.

Yes, of course. One solution would be to make our class/function tables always be Unicode instead of depending on the unicode_semantics switch. But that might slow down the non-Unicode mode.

(Life would be a lot simpler if there weren't two different modes, of course. :)

Absolutely. Derick, Rasmus, and I have been discussing the possibility of having only one mode. The main issue is performance (portability issues should not be that great) so Derick was going to run some tests and see how much slower PHP 6 in Unicode mode really is compared to non-Unicode mode and to 5.1.


-Andrei

--
PHP Unicode & I18N Mailing List (http://www.php.net/)
To unsubscribe, visit: http://www.php.net/unsub.php
