Re: [PHP-DEV] Re: strlen() under unicode.semantics

2006-06-24 Thread Sara Golemon
On Thu, Jun 22, 2006 at 09:15:23PM -0700, Sara Golemon wrote: utf16 of php's internal encoding. Big or Little Endian? Yes. By that I of course mean that the endianness of the U16 data points used internally by PHP depends on the architecture's endianness. For example: If it's x86, then

Re: [PHP-DEV] Re: strlen() under unicode.semantics

2006-06-23 Thread Andrei Zmievski
How about str_storage_size()? It is explicit enough that people will be wary of using it. -Andrei On Jun 22, 2006, at 10:56 PM, Andi Gutmans wrote: Oops, senile me :) How about str_size()?

RE: [PHP-DEV] Re: strlen() under unicode.semantics

2006-06-23 Thread Andi Gutmans
Fine with me. -Original Message- From: Andrei Zmievski [mailto:[EMAIL PROTECTED] Sent: Thursday, June 22, 2006 11:08 PM To: Andi Gutmans Cc: 'Johannes Schlueter'; internals@lists.php.net; 'Ron Korving' Subject: Re: [PHP-DEV] Re: strlen() under unicode.semantics How about

RE: [PHP-DEV] Re: strlen() under unicode.semantics

2006-06-23 Thread Andi Gutmans
- From: Sara Golemon [mailto:[EMAIL PROTECTED] Sent: Thursday, June 22, 2006 9:15 PM To: Ron Korving Cc: internals@lists.php.net Subject: Re: [PHP-DEV] Re: strlen() under unicode.semantics Still, it's gotta be useful to know how many bytes it occupies. Perhaps for Content-length

Re: [PHP-DEV] Re: strlen() under unicode.semantics

2006-06-23 Thread Andrei Zmievski
PROTECTED] Sent: Thursday, June 22, 2006 9:15 PM To: Ron Korving Cc: internals@lists.php.net Subject: Re: [PHP-DEV] Re: strlen() under unicode.semantics Still, it's gotta be useful to know how many bytes it occupies. Perhaps for Content-length headers or something. There are plenty of low level

RE: [PHP-DEV] Re: strlen() under unicode.semantics

2006-06-23 Thread Andi Gutmans
'; internals@lists.php.net Subject: Re: [PHP-DEV] Re: strlen() under unicode.semantics The only way they can get at the internal UTF-16 representation is via unicode_encode($uni, 'UTF-16') which will return a binary UTF-16 string. In that case, strlen() will work just as well. -Andrei

Re: [PHP-DEV] Re: strlen() under unicode.semantics

2006-06-23 Thread Johannes Schlueter
Hi, in my opinion that name is bad since most of the time the string won't be stored using the internal encoding but stored using some implicitly converted encoding like the encoding of the stream being used or the one from the database. So the size needed to store the string would most likely
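
To illustrate the point that the storage size depends on the target encoding rather than on the internal one, here is a minimal sketch in present-day PHP. It assumes PHP 7 or later with the mbstring extension loaded; the sample string and the $sizes array are illustrative and not from the thread.

```php
<?php
// Assumption: mbstring is available. "\u{00E9}" is PHP 7+ syntax that
// yields the UTF-8 bytes for U+00E9 (é), so $str holds "café" in UTF-8.
$str = "caf\u{00E9}";

$sizes = [];
foreach (['UTF-8', 'UTF-16LE', 'ISO-8859-1'] as $encoding) {
    // Convert from UTF-8 to the target encoding, then count raw bytes.
    $encoded = mb_convert_encoding($str, $encoding, 'UTF-8');
    $sizes[$encoding] = strlen($encoded);
    printf("%-10s %d bytes\n", $encoding, $sizes[$encoding]);
}
// UTF-8: 5 bytes, UTF-16LE: 8 bytes, ISO-8859-1: 4 bytes
```

The same four characters need a different number of bytes in each encoding, so a size derived from the internal representation says little about what a stream or database will actually store.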

Re: [PHP-DEV] Re: strlen() under unicode.semantics

2006-06-23 Thread Ron Korving
-Original Message- From: Sara Golemon [mailto:[EMAIL PROTECTED] Sent: Thursday, June 22, 2006 9:15 PM To: Ron Korving Cc: internals@lists.php.net Subject: Re: [PHP-DEV] Re: strlen() under unicode.semantics Still, it's gotta be useful to know how many bytes it occupies. Perhaps

Re: [PHP-DEV] Re: strlen() under unicode.semantics

2006-06-23 Thread Andrei Zmievski
Golemon [mailto:[EMAIL PROTECTED] Sent: Thursday, June 22, 2006 9:15 PM To: Ron Korving Cc: internals@lists.php.net Subject: Re: [PHP-DEV] Re: strlen() under unicode.semantics Still, it's gotta be useful to know how many bytes it occupies. Perhaps for Content-length headers or something

Re: [PHP-DEV] Re: strlen() under unicode.semantics

2006-06-23 Thread Andrei Zmievski
it. -Original Message- From: Andrei Zmievski [mailto:[EMAIL PROTECTED] Sent: Thursday, June 22, 2006 11:38 PM To: Andi Gutmans Cc: 'Sara Golemon'; 'Ron Korving'; internals@lists.php.net Subject: Re: [PHP-DEV] Re: strlen() under unicode.semantics The only way they can get at the internal UTF-16

Re: [PHP-DEV] Re: strlen() under unicode.semantics

2006-06-23 Thread Sara Golemon
The only way they can get at the internal UTF-16 representation is via unicode_encode($uni, 'UTF-16') which will return a binary UTF-16 string. In that case, strlen() will work just as well. Hmm, I was thinking we might have some binary write function which would do that automagically. I

RE: [PHP-DEV] Re: strlen() under unicode.semantics

2006-06-23 Thread Andi Gutmans
:[EMAIL PROTECTED] Sent: Friday, June 23, 2006 1:16 AM To: Andi Gutmans; 'Andrei Zmievski' Cc: 'Ron Korving'; internals@lists.php.net Subject: Re: [PHP-DEV] Re: strlen() under unicode.semantics The only way they can get at the internal UTF-16 representation is via unicode_encode($uni

Re: [PHP-DEV] Re: strlen() under unicode.semantics

2006-06-23 Thread Andrei Zmievski
Especially since the UTF-16 internal representation may be little- or big-endian, depending on the platform. -Andrei On Jun 23, 2006, at 11:31 AM, Andi Gutmans wrote: Nah I didn't mean to get back to that discussion. I was thinking more of a binary dump of info (e.g. session-like stuff) or
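
A minimal sketch of why platform-dependent endianness matters for a binary dump: the same code point serializes to different byte sequences in UTF-16BE and UTF-16LE. This uses present-day PHP with the mbstring extension (an assumption, since the PHP 6 functions discussed in this thread are not available in released PHP); variable names are illustrative.

```php
<?php
// Assumption: mbstring is available. 'A' is U+0041.
$str = 'A';

$be = mb_convert_encoding($str, 'UTF-16BE', 'UTF-8');
$le = mb_convert_encoding($str, 'UTF-16LE', 'UTF-8');

echo bin2hex($be), "\n"; // 0041
echo bin2hex($le), "\n"; // 4100

// pack('S', ...) writes an unsigned 16-bit value in machine byte order,
// so comparing it with the two encodings reveals the host's endianness.
$native = pack('S', 0x0041);
echo ($native === $le) ? "little-endian host\n" : "big-endian host\n";
```

A dump written in native order on one machine would therefore be misread on a machine with the opposite byte order unless the byte order is recorded or fixed.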

Re: [PHP-DEV] Re: strlen() under unicode.semantics

2006-06-23 Thread Daniel Convissor
On Thu, Jun 22, 2006 at 09:15:23PM -0700, Sara Golemon wrote: utf16 of php's internal encoding. Big or Little Endian? Thanks, --Dan

Re: [PHP-DEV] Re: strlen() under unicode.semantics

2006-06-22 Thread Ron Korving
Still, it's gotta be useful to know how many bytes it occupies. Perhaps for Content-length headers or something. There are plenty of low level concepts to think of where one might need this. And even if you can't think of any reason now, you don't wanna get hit in the face by it and have to

Re: [PHP-DEV] Re: strlen() under unicode.semantics

2006-06-22 Thread Andrei Zmievski
It'll be there. strlen_bytes() perhaps? -Andrei On Jun 22, 2006, at 2:55 PM, Ron Korving wrote: Still, it's gotta be useful to know how many bytes it occupies. Perhaps for Content-length headers or something. There are plenty of low level concepts to think of where one might need this.

Re: [PHP-DEV] Re: strlen() under unicode.semantics

2006-06-22 Thread Sara Golemon
Still, it's gotta be useful to know how many bytes it occupies. Perhaps for Content-length headers or something. There are plenty of low level concepts to think of where one might need this. And even if you can't think of any reason now, you don't wanna get hit in the face by it and have to

RE: [PHP-DEV] Re: strlen() under unicode.semantics

2006-06-22 Thread Andi Gutmans
Maybe sizeof() should not be an alias for strlen() when operating on Unicode...? Andi -Original Message- From: Andrei Zmievski [mailto:[EMAIL PROTECTED] Sent: Thursday, June 22, 2006 3:14 PM To: Ron Korving Cc: internals@lists.php.net Subject: Re: [PHP-DEV] Re: strlen() under

Re: [PHP-DEV] Re: strlen() under unicode.semantics

2006-06-22 Thread Johannes Schlueter
: Thursday, June 22, 2006 3:14 PM To: Ron Korving Cc: internals@lists.php.net Subject: Re: [PHP-DEV] Re: strlen() under unicode.semantics It'll be there. strlen_bytes() perhaps? -Andrei On Jun 22, 2006, at 2:55 PM, Ron Korving wrote: Still, it's gotta be useful to know how

RE: [PHP-DEV] Re: strlen() under unicode.semantics

2006-06-22 Thread Andi Gutmans
[mailto:[EMAIL PROTECTED] Sent: Thursday, June 22, 2006 3:14 PM To: Ron Korving Cc: internals@lists.php.net Subject: Re: [PHP-DEV] Re: strlen() under unicode.semantics It'll be there. strlen_bytes() perhaps? -Andrei On Jun 22, 2006, at 2:55 PM, Ron Korving wrote

RE: [PHP-DEV] Re: strlen() under unicode.semantics

2006-06-21 Thread Jared Williams
Enjoyed Andrei's talk at the NYPHP Conference last week about Unicode in PHP 6. He mentioned that when unicode.semantics is on, strlen() will return the number of characters rather than the number of bytes, like mb_strlen() does or strlen() if mbstring.func_overload is on. The
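
The character-versus-byte distinction described above can be seen in present-day PHP by comparing mb_strlen() with strlen(). This is only a sketch, assuming PHP 7 or later with mbstring loaded; the sample string is illustrative.

```php
<?php
// Assumption: mbstring is available. "\u{00EF}" yields the UTF-8 bytes
// for U+00EF (ï), so $str holds "naïve" encoded as UTF-8.
$str = "na\u{00EF}ve";

$chars = mb_strlen($str, 'UTF-8'); // counts characters (code points)
$bytes = strlen($str);             // counts bytes

printf("%d characters, %d bytes\n", $chars, $bytes); // 5 characters, 6 bytes
```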

Re: [PHP-DEV] Re: strlen() under unicode.semantics

2006-06-21 Thread Sara Golemon
What happens with $fp = fopen('foo.bin', 'wb'); $written = fwrite($fp, $str); if (strlen($str) != $written) { echo 'Not written', "\n"; } Assuming $str is a binary string. The above code works just fine. If it's a unicode string: Short version: Don't do that. Writing a unicode string to a
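
For the binary-string case, a runnable version of the check above might look like the following sketch. It uses present-day PHP, where strings are byte strings, and assumes mbstring for the character count; the temporary file and sample string are illustrative and not from the thread.

```php
<?php
// Assumption: mbstring is available. $str holds "résumé" as UTF-8 bytes,
// which plays the role of the "binary string" in the question above.
$str = "r\u{00E9}sum\u{00E9}";

$path = tempnam(sys_get_temp_dir(), 'foo');
$fp = fopen($path, 'wb');
$written = fwrite($fp, $str); // number of bytes written, or false on failure
fclose($fp);

if ($written !== strlen($str)) {
    echo 'Not written', "\n";
}

// A character count is the wrong yardstick for a byte-oriented write:
// here it is 6, while fwrite() reports 8.
printf("%d bytes written, %d characters\n", $written, mb_strlen($str, 'UTF-8'));

unlink($path);
```

Comparing fwrite()'s return value with a byte count is consistent; comparing it with a character count would report a failure for any string containing multi-byte characters.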