So we don't lose track of it.

Begin forwarded message:

Resent-From: [EMAIL PROTECTED]
From: Rolland Santimano <[EMAIL PROTECTED]>
Date: October 4, 2005 8:25:22 AM PDT
Resent-To: [EMAIL PROTECTED]
To: Tex Texin <[EMAIL PROTECTED]>, Andrei Zmievski <[EMAIL PROTECTED]>
Subject: Converting URL processing funcns in PHP to Unicode

I was looking at the following functions to consider for Unicode
migration. They deal with processing URL-like strings, and I am
wondering whether there are any special or extra issues to consider,
rather than simply converting the iteration, etc. to handle Unicode?

E.g., for functions [2]-[6], should the input be processed with IDN
(internationalized domain names) in mind?

I haven't posted this on the internals list yet. Do you guys have
any comments or suggestions?

[1] string http_build_query(mixed formdata [, string prefix [, string
arg_separator]])
Generates a form-encoded query string from an associative array or
object - uses urlencode() mentioned below.
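To make the behaviour of [1] concrete for non-ASCII input, here is a rough Python sketch (not the PHP implementation). It assumes values are encoded as UTF-8 before escaping; the name build_query and the sample fields are made up for illustration.

```python
from urllib.parse import quote_plus

def build_query(formdata, separator="&"):
    """Form-encode a flat dict, percent-escaping the UTF-8 bytes of each item.

    Hypothetical helper: nested arrays and the numeric-key prefix handled by
    http_build_query() are deliberately left out.
    """
    pairs = []
    for key, value in formdata.items():
        # quote_plus() encodes str input as UTF-8 and turns spaces into '+'.
        pairs.append(quote_plus(str(key)) + "=" + quote_plus(str(value)))
    return separator.join(pairs)

print(build_query({"name": "Jos\u00e9", "q": "a b"}))
```

The point is only that each non-ASCII character becomes several %xx escapes, one per UTF-8 byte.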


[2] mixed parse_url(string url, [int url_component])
Split URL into components: username, password, hostname, port, etc.
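On the IDN question for [2], one possible approach is to split the URL first and then convert only the host part to its ASCII (Punycode) form. The Python sketch below illustrates that idea with the standard library; it is an assumption about how the PHP side might behave, not a description of it. The function name and the example URL are invented, and Python's built-in "idna" codec implements the older IDNA 2003 rules.

```python
from urllib.parse import urlsplit

def split_url_with_idn(url):
    """Split a URL and add an ASCII-compatible form of the host name."""
    parts = urlsplit(url)
    host = parts.hostname  # lower-cased by urlsplit, may be non-ASCII
    # Only the host is IDN-encoded; path and query need percent-encoding instead.
    host_ascii = host.encode("idna").decode("ascii") if host else None
    return {
        "scheme": parts.scheme,
        "user": parts.username,
        "pass": parts.password,
        "host": host,
        "host_ascii": host_ascii,
        "port": parts.port,
        "path": parts.path,
        "query": parts.query,
    }

info = split_url_with_idn("http://user:pw@b\u00fccher.example:8080/path?q=1")
print(info["host"], "->", info["host_ascii"])
```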

[3] string urlencode(string str)
[4] string urldecode(string str)
urlencode() replaces non-alphanumerics (except for hyphen, underscore
& period) with equivalent 2-digit hex escape sequences of the form
%xx. Space is replaced with a plus sign (+).

[5] string rawurlencode(string str)
[6] string rawurldecode(string str)
rawurlencode() replaces non-alphanumerics (except for hyphen,
underscore & period) with equivalent 2-digit hex escape sequences of
the form %xx.

A couple of problems in converting [3]-[6] above to handle Unicode:
(1) 2-digit hex sequences cover only one byte each, so they don't
cover the range of Unicode code points.
(2) The existing code has #define sections to handle EBCDIC and ASCII
input.
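One common way around problem (1) is to encode the string as UTF-8 first and escape each resulting byte as %xx, so a single code point may yield several escapes. The Python sketch below shows that idea under that assumption; percent_encode_utf8 is a hypothetical name, and the unreserved set simply mirrors the description of [3] and [5] above.

```python
# Bytes left unescaped: ASCII alphanumerics plus hyphen, underscore, period.
UNRESERVED = (
    b"ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    b"abcdefghijklmnopqrstuvwxyz"
    b"0123456789-_."
)

def percent_encode_utf8(text, space_as_plus=False):
    """Escape every non-unreserved UTF-8 byte of text as %XX.

    space_as_plus=True mimics the urlencode() convention for spaces;
    False mimics rawurlencode().
    """
    out = []
    for byte in text.encode("utf-8"):
        if byte in UNRESERVED:
            out.append(chr(byte))
        elif space_as_plus and byte == 0x20:
            out.append("+")
        else:
            out.append("%{:02X}".format(byte))
    return "".join(out)

print(percent_encode_utf8("\u20ac 5"))        # euro sign, space, digit
print(percent_encode_utf8("\u20ac 5", True))
```

With this scheme the %xx form itself does not need to change; only the step that turns characters into bytes does.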


[7] string base64_encode(string str)
[8] string base64_decode(string str)
Implement MIME base64 encoding and decoding.

Is it correct to extend [7] & [8] above to support Unicode simply by
changing the iteration over the input string data? Or should an
alternate transfer encoding method (MIME quoted-printable?) be used?
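One thing worth keeping in mind for [7] and [8] is that base64 is defined over bytes, not characters, so a Unicode string has to be serialized in some agreed encoding before it is base64-encoded. The Python sketch below illustrates this; the helper names are hypothetical and UTF-8 is only an assumed default.

```python
import base64

def b64_encode_text(text, encoding="utf-8"):
    """Serialize text to bytes in the given encoding, then base64-encode."""
    return base64.b64encode(text.encode(encoding)).decode("ascii")

def b64_decode_text(data, encoding="utf-8"):
    """Reverse of b64_encode_text(); the same encoding must be supplied."""
    return base64.b64decode(data).decode(encoding)

# The same character gives different base64 output under different encodings.
print(b64_encode_text("\u00e9"))               # UTF-8 bytes C3 A9
print(b64_encode_text("\u00e9", "utf-16-le"))  # UTF-16LE bytes E9 00
```

The decoder cannot tell which encoding was used, so that choice would have to be fixed or passed alongside the data.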


I had also posted the following question last week on the internals
list, but didn't get any responses. Any comments as to the correct
approach?

[3] string addcslashes(string text, string charlist)
[4] string stripcslashes(string text)
Escape chars < 32 or > 126 with octal sequences, and escape
characters from charlist with a backslash.

Escaping chars/code points with values > 126 is a problem in Unicode
strings. With the 3-digit octal escape sequence, only code points up
to 0x1FF (octal 777) can be escaped. One solution is to escape only
values < 32 with the 3-digit octal sequence. Another is to use hex
sequences for escaping everything.
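The first option can be sketched as follows in Python. This is only an illustration of the proposal, not the PHP code: the function name is made up, and the special cases that addcslashes() has for characters such as newline and tab are ignored.

```python
# Largest value a 3-digit octal escape can express: 0o777 == 0x1FF == 511.
MAX_3_DIGIT_OCTAL = int("777", 8)

def escape_control_chars(text, charlist=""):
    """Backslash-escape charlist members; octal-escape code points below 32.

    Code points of 32 and above that are not in charlist pass through
    unchanged, so non-ASCII characters are never truncated to three
    octal digits.
    """
    out = []
    for ch in text:
        if ch in charlist:
            out.append("\\" + ch)
        elif ord(ch) < 32:
            out.append("\\{:03o}".format(ord(ch)))
        else:
            out.append(ch)
    return "".join(out)

print(escape_control_chars("caf\u00e9\n", charlist="f"))
```

The trade-off is that characters above 126 are no longer escaped at all, which may or may not be acceptable to existing callers.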

--
PHP Unicode & I18N Mailing List (http://www.php.net/)
To unsubscribe, visit: http://www.php.net/unsub.php
