It is not clear to me that the overhead of supporting both native and
Unicode strings is worth the effort.
The developers of each API will still need to change their
implementation, and making the code look as if it supports either form
disguises errors that will later surface as bugs.

It might be more efficient in the long run to be Unicode internally and
offer a thunk layer to interface to the unmodified functions.
Old functions will go through a conversion of string variables to native and
back on call/return, perhaps with some other conversions for length. As
functions are upgraded to be native Unicode, they will avoid the conversions.
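
A minimal sketch of what such a thunk might look like, purely for
illustration. Everything here is hypothetical: uchar16 stands in for ICU's
UChar, legacy_upper() stands in for an unmodified native function, and the
two converters handle only ASCII (anything else becomes '?'), where real
code would call a proper converter.

```c
#include <ctype.h>
#include <stdint.h>
#include <stdlib.h>

typedef uint16_t uchar16;   /* hypothetical stand-in for ICU's UChar */

/* Hypothetical unmodified function: uppercases a native string in place. */
static void legacy_upper(char *s, int len)
{
    for (int i = 0; i < len; i++)
        s[i] = (char)toupper((unsigned char)s[i]);
}

/* Toy conversion, ASCII only: any other code unit becomes '?'. */
static void to_native(const uchar16 *src, int len, char *dst)
{
    for (int i = 0; i < len; i++)
        dst[i] = src[i] < 0x80 ? (char)src[i] : '?';
}

/* Toy conversion back: each native byte becomes one code unit. */
static void from_native(const char *src, int len, uchar16 *dst)
{
    for (int i = 0; i < len; i++)
        dst[i] = (uchar16)(unsigned char)src[i];
}

/* Thunk: Unicode in, Unicode out, legacy function in the middle.
 * Returns 0 on success, -1 if the temporary buffer cannot be allocated. */
static int thunk_upper(uchar16 *s, int len)
{
    char *tmp = malloc(len > 0 ? (size_t)len : 1);
    if (tmp == NULL)
        return -1;
    to_native(s, len, tmp);     /* conversion on call */
    legacy_upper(tmp, len);     /* unmodified function */
    from_native(tmp, len, s);   /* conversion on return */
    free(tmp);
    return 0;
}
```

Once legacy_upper() is rewritten to work on uchar16 directly, callers keep
the same interface and the two conversions simply disappear.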

There is a performance cost, but the most frequently called functions will
be upgraded first, so the majority of the performance issues can be
addressed. This approach has the benefit that we don't carry the baggage of
dual support around throughout the PHP core going forward.

The thunk layer could include a way to specify the conversion details for
each API so there would be no "guessing" as to what is needed.
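
One way this could be expressed, as a rough sketch only, is a table with one
entry per unmodified function that declares how each argument is converted.
The type names, the enum values and the two example entries below are all
made up for illustration; none of them exist in PHP.

```c
#include <stddef.h>
#include <string.h>

/* How one string argument of a legacy function must be converted. */
typedef enum {
    CONV_NONE,     /* pass through unchanged (e.g. binary data) */
    CONV_IN,       /* Unicode -> native before the call */
    CONV_IN_OUT    /* ... and native -> Unicode again after the return */
} conv_kind;

/* What kind of length the legacy function expects. */
typedef enum {
    LEN_BYTES,           /* explicit byte count */
    LEN_NUL_TERMINATED   /* no count, NUL-terminated string */
} len_kind;

#define THUNK_MAX_ARGS 4

typedef struct {
    const char *name;                  /* legacy function name */
    int         nargs;                 /* number of described arguments */
    conv_kind   conv[THUNK_MAX_ARGS];  /* per-argument conversion */
    len_kind    len[THUNK_MAX_ARGS];   /* per-argument length handling */
} thunk_spec;

/* Hypothetical entries; unlisted array slots default to 0. */
static const thunk_spec thunk_table[] = {
    { "example_upper", 1, { CONV_IN_OUT },
                          { LEN_BYTES } },
    { "example_open",  2, { CONV_IN, CONV_NONE },
                          { LEN_NUL_TERMINATED, LEN_BYTES } },
};

/* Find the declared conversions for a function, or NULL if none. */
static const thunk_spec *thunk_lookup(const char *name)
{
    size_t n = sizeof thunk_table / sizeof thunk_table[0];
    for (size_t i = 0; i < n; i++)
        if (strcmp(thunk_table[i].name, name) == 0)
            return &thunk_table[i];
    return NULL;   /* nothing declared for this function */
}
```

The thunk would consult the entry instead of inferring from the argument
types, and an entry would be deleted when its function is upgraded.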

Tex Texin
Internationalization Architect,   Yahoo! Inc.
 
 


> -----Original Message-----
> From: Dmitry Stogov [mailto:[EMAIL PROTECTED] 
> Sent: Thursday, February 16, 2006 1:12 AM
> To: php-i18n@lists.php.net
> Subject: RE: [PHP-I18N] Ideas for a portable string api
> 
> 
> Hi,
> 
> After reviewing Marcus's ideas, some experiments, and speaking 
> with Andrei, I propose the following solutions:
> 
> 1) We will not use any kind of Unicode literals in C code (no 
> L"foo", no "f\0o\0o\0\0"), because L"" is not portable and 
> "f\0.." looks too ugly.
> 
> 2) We will change the "zval" structure to make 
> "zval.value.str.len" and "zval.value.ustr.len" the same 
> type. This will allow us to optimize the Z_UNISTR() and Z_UNILEN() 
> macros. They will become
> 
> #define Z_UNISTR(z)  ((void*)(Z_STRVAL(z)))
> #define Z_UNILEN(z)  ((int)(Z_STRLEN(z)))
> 
> Instead of
> 
> #define Z_UNISTR(z)  Z_TYPE(z)==IS_UNICODE?(char*)Z_USTRVAL(z):Z_STRVAL(z)
> #define Z_UNILEN(z)  Z_TYPE(z)==IS_UNICODE?(int)Z_USTRLEN(z):Z_STRLEN(z)
> 
> 3) I don't want to break source compatibility by modifying 
> the "zval" layout as Marcus suggested. We will pass 
> string/unicode values in nearly the same way as we do today, 
> as three values: zend_uchar type, void* str, int len. But we 
> will create the following set of macros to do it with less overhead.
> 
> #define S_TYPE(x)             _type_##x
> #define S_UNIVAL(x)           _val_##x
> #define S_UNILEN(x)           _len_##x
> #define S_STRVAL(x)           ((char*)S_UNIVAL(x))
> #define S_USTRVAL(x)          ((UChar*)S_UNIVAL(x))
> #define S_STRLEN(x)           S_UNILEN(x)             
> #define S_USTRLEN(x)          S_UNILEN(x)
> 
> #define S_ARG(x)              zend_uchar S_TYPE(x), void *S_UNIVAL(x), \
>                               int S_UNILEN(x)
> 
> #define S_PASS(x)             S_TYPE(x), S_UNIVAL(x), S_UNILEN(x)
> 
> #define Z_STR_PASS(x)         Z_TYPE(x), Z_UNIVAL(x), Z_UNILEN(x)
> #define Z_STR_PASS_P(x)       Z_TYPE_P(x), Z_UNIVAL_P(x), Z_UNILEN_P(x)
> #define Z_STR_PASS_PP(x)      Z_TYPE_PP(x), Z_UNIVAL_PP(x), Z_UNILEN_PP(x)
> 
> Then most zend_u_... functions must be rewritten with these macros.
> 
> For example:
> 
> ZEND_API int zend_u_lookup_class(S_ARG(name), zend_class_entry ***ce
>                                  TSRMLS_DC)
> {
>       return zend_u_lookup_class_ex(S_PASS(name), 1, ce TSRMLS_CC);
> }
> 
> Instead of
> 
> ZEND_API int zend_u_lookup_class(zend_uchar type, void *name,
>                                  int name_length,
>                                  zend_class_entry ***ce TSRMLS_DC)
> {
>       return zend_u_lookup_class_ex(type, name, name_length, 1,
>                                     ce TSRMLS_CC);
> }
> 
> Any objections, additions?
> 
> Thanks. Dmitry.
> 
> -- 
> PHP Unicode & I18N Mailing List (http://www.php.net/)
> To unsubscribe, visit: http://www.php.net/unsub.php
> 
> 

