[moving the discussion to php-i18n list]

Will a program always be able to change the runtime_encoding setting?

Yes.

Some hosts like to lock down everything and disable ini_set() etc. If the host has hard-locked it at something terrible, can my portable program still declare that it needs to work with UTF-8?

You can change anything but unicode_semantics from your own .ini file or from within the script (using the declare() pragma instead of script_encoding).

Which brings to mind: if the input in $_REQUEST etc. has been misconverted by a bad setting, how do I get at the unconverted data to fix it? The (outdated ;) README says this will be possible, but I didn't see any reference to how.

Yes, that part has not been implemented yet.

I do find the FATAL ERRORS on using the 'wrong' string type a bit odd, though; most other types in PHP coerce silently (string . int), and the wildly incompatible ones usually cause mere NOTICE- or WARNING-level messages.

Was this change from PHP's regular behavior a conscious decision to make people think harder about what kind of strings they're using? From the original design document I got the impression that the strictness was meant to be specific to special binary-only strings, which would be used relatively rarely (e.g. for binary file I/O), while more typical strings would transparently "just work" most of the time. Now the binary strings have replaced the native strings, and the whole behavior has changed.

The only difference between binary and native strings in the original design was that binary ones did not participate in implicit or some explicit conversions. Now that these two types have been conflated, we may have to adjust the semantics, which is why I proposed that casting operators (explicit conversions) always work. We could make implicit conversions work as well if we work out the details, such as which encoding to use for converting binary strings to Unicode (script or runtime). I kind of like your idea of allowing only ASCII characters in binary string literals.
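To make the trade-off concrete, here is a rough Python model of the rules discussed above. It is a sketch under stated assumptions, not PHP's implementation: Python `bytes` stands in for binary strings and `str` for Unicode strings, and the helper names (`to_unicode`, `implicit_concat`, `is_valid_binary_literal`) are hypothetical. It assumes explicit casts never fail (undecodable bytes become U+FFFD) and that implicit mixing promotes to Unicode using whichever encoding is chosen, script or runtime.

```python
from typing import Union

# Hypothetical model: bytes = PHP binary string, str = PHP Unicode string.
Str = Union[bytes, str]


def to_unicode(value: Str, encoding: str) -> str:
    """Explicit cast that always succeeds (assumption: bad bytes -> U+FFFD)."""
    if isinstance(value, str):
        return value
    return value.decode(encoding, errors="replace")


def implicit_concat(left: Str, right: Str, encoding: str) -> Str:
    """Implicit conversion: binary + binary stays binary; any mix becomes Unicode."""
    if isinstance(left, bytes) and isinstance(right, bytes):
        return left + right
    return to_unicode(left, encoding) + to_unicode(right, encoding)


def is_valid_binary_literal(raw: bytes) -> bool:
    """ASCII-only binary literals: every byte must be below 0x80."""
    return all(b < 0x80 for b in raw)


# Mixing types promotes to Unicode.
print(implicit_concat(b"caf", "é", "utf-8"))  # café

# The chosen encoding matters once a byte is outside ASCII.
print(to_unicode(b"\xe9", "iso-8859-1"))  # é
print(to_unicode(b"\xe9", "utf-8") == "\ufffd")  # True
```

The last two lines show why ASCII-only binary literals would sidestep the script-vs-runtime question: pure ASCII bytes decode identically under any ASCII-compatible encoding, while a byte like 0xE9 does not.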

(A comparison with other languages: Python is normally very strict about typing and won't even let you concatenate a string with an integer without an explicit conversion. But it will let you concatenate a byte string with a Unicode string, with an automatic coercion to Unicode.)

I guess they don't worry about script encoding?

Personally I have no use for non-ASCII identifiers.

Anything that is used to refer to identifiers, though, needs to operate consistently in some fashion:
* array_map("some_function_name", $data);
* $GLOBALS["myConfigVar"] = $newval;
etc

These probably need to either 'just work' when passed the other kind of string, or have some kind of consistent cast available.

Yes, of course. One solution would be to make our class/function tables always be Unicode instead of depending on the unicode_semantics switch. But that might slow down the non-Unicode mode.
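A rough Python model of the "always Unicode" option, as a sketch only: `SymbolTable` is a hypothetical name, not a PHP internal. Keys are always Unicode strings, and names passed as binary strings are decoded before lookup, so both of the examples above work whichever string type the caller used. The `case_insensitive` flag reflects that PHP function names are case-insensitive while variable names are not.

```python
from typing import Dict, Union

# Hypothetical model: bytes = PHP binary string, str = PHP Unicode string.
Name = Union[bytes, str]


class SymbolTable:
    """Identifier table whose keys are always Unicode strings."""

    def __init__(self, encoding: str = "utf-8", case_insensitive: bool = False):
        self.encoding = encoding
        self.case_insensitive = case_insensitive
        self._entries: Dict[str, object] = {}

    def _key(self, name: Name) -> str:
        # Binary names are decoded on every lookup (strict: invalid bytes raise).
        if isinstance(name, bytes):
            name = name.decode(self.encoding)
        return name.lower() if self.case_insensitive else name

    def __setitem__(self, name: Name, value: object) -> None:
        self._entries[self._key(name)] = value

    def __getitem__(self, name: Name) -> object:
        return self._entries[self._key(name)]


# Function table: case-insensitive, like PHP function names.
functions = SymbolTable(case_insensitive=True)
functions["strtoupper"] = str.upper

# Rough analogue of array_map("strtoupper", $data) with a binary-string name.
print(list(map(functions[b"StrToUpper"], ["a", "b"])))  # ['A', 'B']

# Rough analogue of $GLOBALS["myConfigVar"] = $newval: case-sensitive.
globals_ = SymbolTable()
globals_[b"myConfigVar"] = 42
print(globals_["myConfigVar"])  # 42
```

In this model, every lookup with a binary name pays for a decode, which is one plausible place the slowdown for the non-Unicode mode would come from.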

(Life would be a lot simpler if there weren't two different modes, of course. :)

Absolutely. Derick, Rasmus, and I have been discussing the possibility of having only one mode. The main issue is performance (portability issues should not be that great), so Derick was going to run some tests and see how much slower PHP 6 in Unicode mode really is compared to non-Unicode mode and to 5.1.

-Andrei

--
PHP Unicode & I18N Mailing List (http://www.php.net/)
To unsubscribe, visit: http://www.php.net/unsub.php