Re: [PHP-DEV][RFC] mb_str_split

2019-01-15 Thread Lauri Kenttä

On 2019-01-13 17:29, Legale Legage wrote:
> There are two more 2-byte-wide encodings: MBFL_ENCTYPE_MWC2BE (UTF16-BE),
> MBFL_ENCTYPE_MWC2LE (UTF16-LE).


UTF-16 is not a fixed-width 2-byte encoding.
Just like UTF-8 is not a fixed-width 1-byte encoding.
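
To make the point concrete, here is a minimal sketch in C. The function name utf16be_codepoints is hypothetical and not part of libmbfl; it only illustrates that a code point outside the BMP occupies two 16-bit units (a surrogate pair), so len/2 counts code units rather than characters.

```c
#include <stddef.h>

/* Hypothetical helper, not a libmbfl function.
 * Counts code points in a UTF-16BE byte buffer. A high surrogate
 * (0xD800..0xDBFF) followed by a low surrogate (0xDC00..0xDFFF)
 * forms ONE code point out of two 16-bit units. Unpaired surrogates
 * are counted as one code point each. */
static size_t utf16be_codepoints(const unsigned char *s, size_t len)
{
    size_t count = 0;
    size_t i = 0;

    while (i + 1 < len) {
        unsigned int unit = ((unsigned int)s[i] << 8) | s[i + 1];
        i += 2;

        if (unit >= 0xD800 && unit <= 0xDBFF && i + 1 < len) {
            unsigned int next = ((unsigned int)s[i] << 8) | s[i + 1];
            if (next >= 0xDC00 && next <= 0xDFFF) {
                i += 2; /* low surrogate belongs to the same code point */
            }
        }
        count++;
    }
    return count;
}
```

For the six bytes 00 61 D8 3D DE 00 ("a" followed by U+1F600), len/2 gives 3, while the function above returns 2.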

--
Lauri Kenttä

--
PHP Internals - PHP Runtime Development Mailing List
To unsubscribe, visit: http://www.php.net/unsub.php



Re: [PHP-DEV][RFC] mb_str_split

2019-01-13 Thread Legale Legage
Please help me find a memory leak. Travis CI reports:
010+ [Sun Jan 13 18:49:49 2019]  Script: '/home/travis/build/php/php-src/ext/mbstring/tests/mb_str_split_jp.php'
011+ /home/travis/build/php/php-src/ext/mbstring/mbstring.c(646) :  Freeing 0x7f59d2c02540 (66 bytes),

https://travis-ci.org/php/php-src/jobs/479101356

What should I do next?





On Sun, 13 Jan 2019 at 16:29, Legale Legage wrote:

> Hello, internals!
> NikiC wrote a very detailed review of my mb_str_split, so I rewrote the
> function completely. While working on the new implementation, I noticed
> something in the mbfl library functions mbfl_substr and mbfl_strlen.
>
>
>     if (encoding->flag & MBFL_ENCTYPE_SBCS) {
>         len = string->len;
>     } else if (encoding->flag & (MBFL_ENCTYPE_WCS2BE | MBFL_ENCTYPE_WCS2LE)) {
>         len = string->len/2;
>     } else if (encoding->flag & (MBFL_ENCTYPE_WCS4BE | MBFL_ENCTYPE_WCS4LE)) {
>         len = string->len/4;
>     }
>
> There are two more 2-byte-wide encodings: MBFL_ENCTYPE_MWC2BE (UTF16-BE) and
> MBFL_ENCTYPE_MWC2LE (UTF16-LE).
>
> Is it a mistake that they are not handled here?
>
> Please check:
> https://github.com/php/php-src/blob/30668755b64aa732246d952451f89d1fcfe581f0/ext/mbstring/libmbfl/mbfl/mbfilter.c#L659
>


[PHP-DEV][RFC] mb_str_split

2019-01-13 Thread Legale Legage
Hello, internals!
NikiC wrote a very detailed review of my mb_str_split, so I rewrote the
function completely. While working on the new implementation, I noticed
something in the mbfl library functions mbfl_substr and mbfl_strlen.


    if (encoding->flag & MBFL_ENCTYPE_SBCS) {
        len = string->len;
    } else if (encoding->flag & (MBFL_ENCTYPE_WCS2BE | MBFL_ENCTYPE_WCS2LE)) {
        len = string->len/2;
    } else if (encoding->flag & (MBFL_ENCTYPE_WCS4BE | MBFL_ENCTYPE_WCS4LE)) {
        len = string->len/4;
    }

There are two more 2-byte-wide encodings: MBFL_ENCTYPE_MWC2BE (UTF16-BE) and
MBFL_ENCTYPE_MWC2LE (UTF16-LE).

Is it a mistake that they are not handled here?

Please check:
https://github.com/php/php-src/blob/30668755b64aa732246d952451f89d1fcfe581f0/ext/mbstring/libmbfl/mbfl/mbfilter.c#L659


[PHP-DEV] RFC mb_str_split

2019-01-02 Thread Legale Legage
Hello, internals.
I would like to introduce the mb_str_split RFC (
https://wiki.php.net/rfc/mb_str_split).
mb_str_split is simply a multibyte analog of the native str_split.
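
As a rough illustration of the idea (not the implementation proposed in the RFC), the sketch below splits a UTF-8 buffer on character boundaries instead of byte boundaries. The helper names utf8_seq_len, utf8_chunk_bytes and utf8_split_count are hypothetical, and the code assumes well-formed UTF-8 input.

```c
#include <stddef.h>

/* Hypothetical helper: byte length of the UTF-8 sequence that starts
 * with lead byte b. Assumes valid UTF-8; any unexpected byte is
 * treated as a 1-byte sequence so the caller always makes progress. */
static size_t utf8_seq_len(unsigned char b)
{
    if (b < 0x80) return 1;
    if ((b & 0xE0) == 0xC0) return 2;
    if ((b & 0xF0) == 0xE0) return 3;
    if ((b & 0xF8) == 0xF0) return 4;
    return 1;
}

/* Hypothetical helper: byte length of the first chunk of s[0..len)
 * that holds up to `chars` characters. */
static size_t utf8_chunk_bytes(const char *s, size_t len, size_t chars)
{
    size_t pos = 0;

    while (chars > 0 && pos < len) {
        size_t n = utf8_seq_len((unsigned char)s[pos]);
        if (n > len - pos) {
            n = len - pos; /* truncated trailing sequence */
        }
        pos += n;
        chars--;
    }
    return pos;
}

/* Hypothetical helper: number of chunks a split into pieces of
 * `chars` characters would produce. Returns 0 when chars is 0. */
static size_t utf8_split_count(const char *s, size_t len, size_t chars)
{
    size_t chunks = 0;
    size_t pos = 0;

    if (chars == 0) {
        return 0;
    }
    while (pos < len) {
        pos += utf8_chunk_bytes(s + pos, len - pos, chars);
        chunks++;
    }
    return chunks;
}
```

A byte-based split of the six bytes of "a\xC3\xA9\xE6\x97\xA5" with a chunk size of 1 would cut through the 2-byte and 3-byte sequences; walking the buffer with utf8_chunk_bytes instead yields three chunks of 1, 2 and 3 bytes.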

Let's discuss.