Re: [PHP-DEV] [RFC] ICU UConverter implementation for ext/intl

2012-11-01 Thread Sara Golemon
Okay, I can be convinced. :)

Diff and RFC updated.

And I agree on the slight sense of unease at adding a third encoding
converter, but ICU is so far ahead of the other two that I really feel
we should encourage its use over iconv and mbstring.

-Sara

On Wed, Oct 31, 2012 at 6:40 PM, Adam Harvey ahar...@php.net wrote:
 On 31 October 2012 06:18, Sara Golemon poll...@php.net wrote:
 With the exception of renaming the UConverter::UCNV_* constants to
 remove the UCNV_ prefix, I believe I've addressed the concerns thus
 far.  ((Waiting to hear if anyone else wants to weigh in on the
 contants))  The RFC has been updated accordingly.+

 I would say that unprefixing makes sense in general, particularly
 since that's what happens in other intl classes (such as Collator).
 Prefixing the callback reason constants with REASON_* seems like a
 good compromise there — as you said upthread, they are different to
 the other constants defined on UConverter. Beyond that, I think the
 type constants would do fine as direct UConverter constants
 (UConverter::UTF8, for instance, makes sense to me — in general,
 you're using a converter because you want to deal with encoding
 types).

 I'm +1 on the functionality in general. The thought of another
 encoding conversion API in PHP doesn't fill me with great joy given we
 already have mbstring and iconv, but it does provide features neither
 of those libraries provide: combined with the existing intl
 functionality and the ever-increasing need for internationalisation
 support, I'd like to think we might look at nudging intl towards being
 the usual way of providing that functionality in PHP.

 Thanks,

 Adam, who is apparently in a run-on sentence kind of mood today.

--
PHP Internals - PHP Runtime Development Mailing List
To unsubscribe, visit: http://www.php.net/unsub.php



Re: [PHP-DEV] [RFC] ICU UConverter implementation for ext/intl

2012-10-31 Thread Ilia Alshanetsky
After the updates it looks really good, very handy functionality to have.

On Tue, Oct 30, 2012 at 6:18 PM, Sara Golemon poll...@php.net wrote:
 With the exception of renaming the UConverter::UCNV_* constants to
 remove the UCNV_ prefix, I believe I've addressed the concerns thus
 far.  ((Waiting to hear if anyone else wants to weigh in on the
 contants))  The RFC has been updated accordingly.+

 --
 PHP Internals - PHP Runtime Development Mailing List
 To unsubscribe, visit: http://www.php.net/unsub.php


-- 
PHP Internals - PHP Runtime Development Mailing List
To unsubscribe, visit: http://www.php.net/unsub.php



Re: [PHP-DEV] [RFC] ICU UConverter implementation for ext/intl

2012-10-31 Thread Adam Harvey
On 31 October 2012 06:18, Sara Golemon poll...@php.net wrote:
 With the exception of renaming the UConverter::UCNV_* constants to
 remove the UCNV_ prefix, I believe I've addressed the concerns thus
 far.  ((Waiting to hear if anyone else wants to weigh in on the
 contants))  The RFC has been updated accordingly.+

I would say that unprefixing makes sense in general, particularly
since that's what happens in other intl classes (such as Collator).
Prefixing the callback reason constants with REASON_* seems like a
good compromise there — as you said upthread, they are different to
the other constants defined on UConverter. Beyond that, I think the
type constants would do fine as direct UConverter constants
(UConverter::UTF8, for instance, makes sense to me — in general,
you're using a converter because you want to deal with encoding
types).

I'm +1 on the functionality in general. The thought of another
encoding conversion API in PHP doesn't fill me with great joy given we
already have mbstring and iconv, but it does provide features neither
of those libraries provide: combined with the existing intl
functionality and the ever-increasing need for internationalisation
support, I'd like to think we might look at nudging intl towards being
the usual way of providing that functionality in PHP.

Thanks,

Adam, who is apparently in a run-on sentence kind of mood today.

--
PHP Internals - PHP Runtime Development Mailing List
To unsubscribe, visit: http://www.php.net/unsub.php



Re: [PHP-DEV] [RFC] ICU UConverter implementation for ext/intl

2012-10-30 Thread Pierre Joye
hi Sara!

On Tue, Oct 30, 2012 at 2:57 AM, Sara Golemon poll...@php.net wrote:
 http://wiki.php.net/rfc/uconverter

 Discuss!

Nice job!

Some comments, raw :) :

- the UCNV prefix is not necessary, we are in the UConverter class
already, I would simply use LATIN_1, or ILLEGAL for exampe.
- It is OO, we should or could use exception. Warning are not nice to
deal with and errors are quite common during conversion
- I would add ERROR_ to the error related constants (UNASSIGNED  co), or?

Some questions, is 'latin1' the same than const UCNV_LATIN_1? or is
there any relation between the two?

Cheers,
-- 
Pierre

@pierrejoye | http://blog.thepimp.net | http://www.libgd.org

-- 
PHP Internals - PHP Runtime Development Mailing List
To unsubscribe, visit: http://www.php.net/unsub.php



Re: [PHP-DEV] [RFC] ICU UConverter implementation for ext/intl

2012-10-30 Thread Pierre Joye
btw, can you add the changes to config.w32 too pls? Pretty much the
same than in .m4, so I can  add it to our CI and valid the RFC patch
in a more handy manner :)

On Tue, Oct 30, 2012 at 2:57 AM, Sara Golemon poll...@php.net wrote:
 http://wiki.php.net/rfc/uconverter

 Discuss!

 --
 PHP Internals - PHP Runtime Development Mailing List
 To unsubscribe, visit: http://www.php.net/unsub.php




-- 
Pierre

@pierrejoye | http://blog.thepimp.net | http://www.libgd.org

-- 
PHP Internals - PHP Runtime Development Mailing List
To unsubscribe, visit: http://www.php.net/unsub.php



Re: [PHP-DEV] [RFC] ICU UConverter implementation for ext/intl

2012-10-30 Thread Ivan Enderlin @ Hoa

Very nice work.
Bravo!


On 30/10/12 02:57, Sara Golemon wrote:

http://wiki.php.net/rfc/uconverter

Discuss!



--
Ivan Enderlin
Developer of Hoa
http://hoa.42/ or http://hoa-project.net/

PhD. student at DISC/Femto-ST (Vesontio) and INRIA (Cassis)
http://disc.univ-fcomte.fr/ and http://www.inria.fr/

Member of HTML and WebApps Working Group of W3C
http://w3.org/


--
PHP Internals - PHP Runtime Development Mailing List
To unsubscribe, visit: http://www.php.net/unsub.php



Re: [PHP-DEV] [RFC] ICU UConverter implementation for ext/intl

2012-10-30 Thread Gustavo Lopes

Em 2012-10-30 2:57, Sara Golemon escreveu:

http://wiki.php.net/rfc/uconverter

Discuss!


Good work. I do, however, have some concerns:

* Error handling is done in a different way from the rest of the 
extension.


Error handling in ext/intl is, admittedly, a little strange. Errors are 
both stored globally (to be retrieved by intl_get_error_code() and 
intl_get_error_message()) and in the object that raised them (through 
::getErrorCode() and ::getErrorMessage()). The mechanism in ICU where 
one can do several calls in succession and check the error code only at 
the end is broken; the error codes are reset at the start of each call. 
The errors may raise any kind of PHP error (depending on the value of a 
INI setting, intl.error_level) or an exception (activated globally via 
intl.use_exceptions). On failure, functions/methods return false 
(constructors and factory methods return null).


Unusual as this may be, it would be a bad idea to introduce 
inconsistency here. See the implementations in the other modules.


* The tests have poor coverage.

--
Gustavo Lopes

--
PHP Internals - PHP Runtime Development Mailing List
To unsubscribe, visit: http://www.php.net/unsub.php



Re: [PHP-DEV] [RFC] ICU UConverter implementation for ext/intl

2012-10-30 Thread Sara Golemon
 - the UCNV prefix is not necessary, we are in the UConverter class
 already, I would simply use LATIN_1, or ILLEGAL for exampe.
 - I would add ERROR_ to the error related constants (UNASSIGNED  co), or?

For the UConverterType enum I could maybe get behind that.  For
UConverterCallbackReason I think we need something, but ERROR_ isn't
universally applicable since three are just status incidents
(UCNV_RESET, UCNV_CLOSE, UCNV_CLONE).  Perhaps: REASON_* for these?
In that vein, I'd be tempted to make the UConverterType values be
TYPE_*

 - It is OO, we should or could use exception. Warning are not nice to
 deal with and errors are quite common during conversion

We are.  UConverterException is thrown everywhere but one place.  That
one exception is the static method I added to give a non-oop-like API
for the most core functionality.  Given its non-oopness, I left it
non-exceptiony.  See Gustvo's notes about error handling below.

 Some questions, is 'latin1' the same than const UCNV_LATIN_1? or is
 there any relation between the two?

The UCNV_LATIN1 constant is a numeric value, so they're not the same,
no.  The relationship is that when you instantiate a UConverter with
'latin1' (or 'iso-8859-1', or any of its aliases),
$conv-getSourceType() (or getDestinationType() as appropriate) will
return UConverter::UCNV_LATIN_1.

Each type in the UConverterType set uses a different algorithm
approach to do the character conversion.  Sets like latin1, utf*,
ascii, etc... can be done faster than sets like 'koi8-r' because the
conversions are simple bitshifts (very simple in the case of
ascii/latin1).  This just surfaces some info on what the converter is
doing internally and isn't much use beyond a bit of logging, in
practice.

 btw, can you add the changes to config.w32 too pls?

Ah, right sorry, had meant to do that at the last minute.  Will add
that in a bit,

 Error handling is done in a different way from the rest of the extension.
 Unusual as this may be, it would be a bad idea to introduce inconsistency
 here. See the implementations in the other modules.

I do agree, I was hoping I could slip that one by. :p  I'll take
another look at the rest of ext/intl and see about adapting my error
handling.

 The tests have poor coverage.

The important bits are hit, but yeah, it could stand to have better
testing.  I'll add some.

-- 
PHP Internals - PHP Runtime Development Mailing List
To unsubscribe, visit: http://www.php.net/unsub.php



Re: [PHP-DEV] [RFC] ICU UConverter implementation for ext/intl

2012-10-30 Thread Stas Malyshev
Hi!

 http://wiki.php.net/rfc/uconverter
 
 Discuss!

Looks nice. Some points:

1. transcode() accepts options, but there's no comparable way to set
options to the object. I think these APIs should be synchronized.
Imagine code keeping options in array/config object - it's be really
annoying to have two separate procedures to feed these to object and to
transcode().
Also, description of options would be helpful.

2. Shouldn't Enumeration and lookup methods be static? They look like
independent from encodings and don't use the object.

3. For Advanced Use, I think no error condition should be the
default and not requiring explicit action.

4. I think error reporting should match other intl functions. It'd not
really be good if intl submodules would be all different in error
reporting.

5. What is $source parameter for callbacks?

6. Why toUCallback returns string but fromUCallback gets codepoint as
long? Shouldn't those be the same - i.e., if toU returns unicode
codepoint, it should be long? Or it can return multiple codepoints? In
which case it becomes confusing as we represent codepoints as both
string and long in the same API.

7. Link to ICU API from the RFC would be helpful for reviewers and later
docs, I think.

-- 
Stanislav Malyshev, Software Architect
SugarCRM: http://www.sugarcrm.com/
(408)454-6900 ext. 227

-- 
PHP Internals - PHP Runtime Development Mailing List
To unsubscribe, visit: http://www.php.net/unsub.php



Re: [PHP-DEV] [RFC] ICU UConverter implementation for ext/intl

2012-10-30 Thread Sara Golemon
 1. transcode() accepts options, but there's no comparable way to set
 options to the object. I think these APIs should be synchronized.
 Imagine code keeping options in array/config object - it's be really
 annoying to have two separate procedures to feed these to object and to
 transcode().

transcode() having an $options parameter is to make up for the
instance version (convert()) being able to set those via instance
functions (setSubstChars()).

I don't picture a given app using both convert() and transcode(), the
latter only exists to placate those who are objectophobic.

 Also, description of options would be helpful.

They're covered in the RFC: to_subst and from_subst under Simple Use

 2. Shouldn't Enumeration and lookup methods be static? They look like
 independent from encodings and don't use the object.

They are in the patch, I just forgot to note that in the RFC.  Updated.

 3. For Advanced Use, I think no error condition should be the
 default and not requiring explicit action.

If you take no action at all, then an error still exists.  This is
consistent with the underlying API.

 4. I think error reporting should match other intl functions. It'd not
 really be good if intl submodules would be all different in error
 reporting.

Mentioned in previous feedback, I plan to look at this again.

 5. What is $source parameter for callbacks?

It's context for where in the conversion we are.  $codeunit/$codepoint
is the specific element causing the problem, $source is the string
from that point forward.

 6. Why toUCallback returns string but fromUCallback gets codepoint as
 long? Shouldn't those be the same - i.e., if toU returns unicode
 codepoint, it should be long? Or it can return multiple codepoints? In
 which case it becomes confusing as we represent codepoints as both
 string and long in the same API.

Actually (I left this out of the RFC), they both can return a large
number of types.

In the case of toUCallback, you can return a utf-8 string (most
reasonable Unicode representation to be returned as a char* string)
and the callback mechanism will make that into UChars to put into the
target string.  You can return a long and it'll be treated as a single
Unicode codepoint (One UChar for BMP, 2 for higher planes).  You can
also return an array of either of these types to specify a string in a
readable, but unicode friendly format, e.g. array(Espa, 0x00F1 /*
LATIN SMALL LETTER N WITH TILDE */, ol)  would be equivalent to
Espa\xC3\xB1ol.

The same is true for fromUCallback() with the exception that the
values being returned are assumed to be in the target encoding.  For
longs this means a single byte unsigned char which is appended to the
target as-is.  Similarly strings are appended as-is.

As for input parameters: for toUCallback, $source and $codeUnits are
still in their original encoding and presented as-is for that
encoding.  For fromUCallback(), the $source/$codePoint are in Unicode
(UChar/UTF16 internall) and can't be directly offered to PHP without
running into endianness issues.  So the codepoint is provided as a
single UChar32 (avoiding the surrogate problem in the process), and
source is given as a series of UChar32 codepoints in a numerically
indexed array.

I'll add a section about callback input/return types to clarify this.

 7. Link to ICU API from the RFC would be helpful for reviewers and later
 docs, I think.

Added!

-- 
PHP Internals - PHP Runtime Development Mailing List
To unsubscribe, visit: http://www.php.net/unsub.php



Re: [PHP-DEV] [RFC] ICU UConverter implementation for ext/intl

2012-10-30 Thread Sara Golemon
With the exception of renaming the UConverter::UCNV_* constants to
remove the UCNV_ prefix, I believe I've addressed the concerns thus
far.  ((Waiting to hear if anyone else wants to weigh in on the
contants))  The RFC has been updated accordingly.+

-- 
PHP Internals - PHP Runtime Development Mailing List
To unsubscribe, visit: http://www.php.net/unsub.php



[PHP-DEV] [RFC] ICU UConverter implementation for ext/intl

2012-10-29 Thread Sara Golemon
http://wiki.php.net/rfc/uconverter

Discuss!

-- 
PHP Internals - PHP Runtime Development Mailing List
To unsubscribe, visit: http://www.php.net/unsub.php