Re: [PHP-DEV] Re: foreach() for strings

2011-06-21 Thread Robert Eisele
And what actually failed? The idea seems straightforward.

Robert

2011/6/20 Johannes Schlüter johan...@schlueters.de

 On Mon, 2011-06-20 at 20:38 +0200, Robert Eisele wrote:
  I really like the ideas shared here. It's worth considering that
  array functions should also work with strings. Maybe this would be the
  way to go, but I'm more excited about the OOP implementation of
  TextIterator and ByteIterator, which solves the whole problem at once
  (and is easier to implement, as mentioned by Stas). As Jonathan said,
  database results with a certain encoding could get iterated, too. The
  only way to work around the Text/Byte problem would be to prefix EVERY
  string with 1-2 bytes of string-type information, or to add a type flag
  to the zval structure. Handling everything with zvals instead of
  objects would have the advantage that database layers like mysqlnd
  could write the database encoding directly into the zval, so the user
  would not need to decide which encoding is used.

 Welcome back to the failed PHP 6 Unicode project. ;-)
 (although we didn't store the original encoding but converted to UTF-16,
 which prevents random/strange conversions in other places when mixing
 encodings)

 johannes





Re: [PHP-DEV] Re: foreach() for strings

2011-06-21 Thread Ferenc Kovacs
2011/6/21 Robert Eisele rob...@xarg.org

 And what actually failed? The idea seems straightforward.

 Robert


http://www.slideshare.net/andreizm/the-good-the-bad-and-the-ugly-what-happened-to-unicode-and-php-6

To my understanding: in retrospect, UTF-16 wasn't the best idea. It
caused more conversions than had seemed necessary beforehand, many of the
core devs lacked the vision and/or the technical knowledge about Unicode,
and adoption of the support for Unicode strings was much slower than
expected.

Tyrael


Re: [PHP-DEV] Re: foreach() for strings

2011-06-21 Thread Stas Malyshev

Hi!

On 6/21/11 1:23 AM, Ferenc Kovacs wrote:


2011/6/21 Robert Eisele rob...@xarg.org

And what actually failed? The idea seems straightforward.

Robert


http://www.slideshare.net/andreizm/the-good-the-bad-and-the-ugly-what-happened-to-unicode-and-php-6


Also you may want to read this:
http://stackoverflow.com/questions/6162484/why-does-modern-perl-avoid-utf-8-by-default

to understand why the idea is not as straightforward as it seems. Yes,
it's about Perl and UTF-8, but it gives some impression of the number of
issues that need to be handled. There are many PHP-specific ones on top
of that (think databases, streams, filesystems, etc.) which would be
expected to work out of the box if we declare Unicode support.

--
Stanislav Malyshev, Software Architect
SugarCRM: http://www.sugarcrm.com/
(408)454-6900 ext. 227

--
PHP Internals - PHP Runtime Development Mailing List
To unsubscribe, visit: http://www.php.net/unsub.php



Re: [PHP-DEV] [RFC] 5.4 features for vote (long)

2011-06-21 Thread Etienne Kneuss
Hello,

On Tue, Jun 21, 2011 at 05:17, Rasmus Lerdorf syst...@php.net wrote:
 On 06/20/2011 08:09 PM, Felipe Pena wrote:

 I'm ok with this, I just think it's ugly to repeat the token name in
 the definition in the .y file. :P

 %token T_LNUMBER 'number' (T_LNUMBER)
 %token T_STRING 'identifier' (T_STRING)

 Why 'identifier' and not 'string' or 'string-literal' there?

For people using PHP, a string or a string literal is "foo" or 'foo'.
T_STRING represents neither "foo" nor 'foo'.
'identifier' seems to adequately describe what it encompasses.

IMHO, it would be even better if the "unexpected" part displayed the
actual content:

i.e.

function 1() = Unexpected number '1' ...
or
function 1() = Unexpected '1'...
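
To make the T_STRING point concrete, here is a minimal sketch using the
tokenizer extension (token_get_all() and token_name()). The helper name
tokenNames() is hypothetical and only exists for this illustration; the
sketch assumes PHP 7 or later with the tokenizer extension enabled.

```php
<?php
// Hypothetical helper: returns the token names the tokenizer reports
// for a snippet of PHP source. Single-character tokens such as "("
// are plain strings in token_get_all() output and are skipped here.
function tokenNames(string $code): array
{
    $names = [];
    foreach (token_get_all($code) as $token) {
        if (is_array($token)) {
            $names[] = token_name($token[0]);
        }
    }
    return $names;
}

$names = tokenNames("<?php function foo() { return 'foo'; }");

// The function name foo is reported as T_STRING, while the quoted
// literal 'foo' is reported as T_CONSTANT_ENCAPSED_STRING.
var_dump(in_array('T_STRING', $names, true));
var_dump(in_array('T_CONSTANT_ENCAPSED_STRING', $names, true));
```

So the token called T_STRING is what most users would call an identifier,
which is the mismatch being discussed.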

Best,

 People know
 what a string is. I am not sure that people know what an identifier is,
 so in this case changing the error message from something that says
 expecting T_STRING to expecting identifier isn't making the error
 message any clearer as far as I am concerned. This is one of the reasons
 that having the token name there is useful. It provides continuity with
 the current error messages that people have grown used to. I think we
 either need the token names, or we need more descriptive names printed.

 -Rasmus




-- 
Etienne Kneuss
http://www.colder.ch




[PHP-DEV] Re: svn: /php/php-src/ branches/PHP_5_4/sapi/cli/config.m4 branches/PHP_5_4/sapi/cli/config.w32 branches/PHP_5_4/sapi/cli/php_cli.c branches/PHP_5_4/sapi/cli/php_cli_server.c branches/PHP_5_

2011-06-21 Thread Michael Wallner
On Mon, 20 Jun 2011 20:27:39 +, Moriyoshi Koizumi wrote:

 moriyoshiMon, 20 Jun 2011 20:27:39 +
 
 Revision: http://svn.php.net/viewvc?view=revisionrevision=312344
 
 Log:
 - Add built-in web server to CLI SAPI. See the RFC for detail.

As noted [1] php_http_* had been used by pecl_http, so am I supposed to 
change to e.g. pecl_http_* prefix now?

Regards,
Mike

[1] http://marc.info/?l=php-internalsm=130321550627147w=2




Re: [PHP-DEV] Re: svn: /php/php-src/ branches/PHP_5_4/sapi/cli/config.m4 branches/PHP_5_4/sapi/cli/config.w32 branches/PHP_5_4/sapi/cli/php_cli.c branches/PHP_5_4/sapi/cli/php_cli_server.c branches/PH

2011-06-21 Thread Pierre Joye
On Tue, Jun 21, 2011 at 11:43 AM, Michael Wallner m...@php.net wrote:
 On Mon, 20 Jun 2011 20:27:39 +, Moriyoshi Koizumi wrote:

 moriyoshi                                Mon, 20 Jun 2011 20:27:39 +

 Revision: http://svn.php.net/viewvc?view=revisionrevision=312344

 Log:
 - Add built-in web server to CLI SAPI. See the RFC for detail.

 As noted [1] php_http_* had been used by pecl_http, so am I supposed to
 change to e.g. pecl_http_* prefix now?

it should be php_cli_http actually, as php_http is also likely to be
used for other http related function not related to this.

-- 
Pierre

@pierrejoye | http://blog.thepimp.net | http://www.libgd.org




Re: [PHP-DEV] Can't vote yet, as RFC has options (Was: Re: [PHP-DEV] [VOTE] release process RFC)

2011-06-21 Thread Pierre Joye
On Tue, Jun 21, 2011 at 5:30 AM, Sebastian Bergmann sebast...@php.net wrote:
 On 20.06.2011 15:30, Derick Rethans wrote:
 I am not generally against this RFC, but this point needs to be
 discussed first IMO. As having 5 active branches at the same time for
 the multiple major releases option is *not* workable.

  I agree.

That's why we added a couple of notices about 12 or 18 months. It is
also very unlikely that we end up in such situations anyway. And
even if we do, given the strictness (about what can be applied), we
didn't see much of a problem here.

Cheers,
-- 
Pierre

@pierrejoye | http://blog.thepimp.net | http://www.libgd.org




Re: [PHP-DEV] [VOTE] voting rfc

2011-06-21 Thread Alexey Shein
Hi!
Wouldn't it be better if the wiki voting mechanism were embedded in
the RFC page itself? Under the "Table of contents" block, for example.
Right now the green "Votes are open" link is not very noticeable, so
being able to vote without leaving the page would better encourage
people to take part in the voting process.

2011/6/20 David Soria Parra d...@php.net:
 Hi Internals,

 we have been working on getting an rfc together on how to deal with
 votes on rfcs. We aim to provide a simple mechanism for votes while
 still maintaining freedom on how to do votes and how to work on rfcs.

 I want to move forward on the voting and release RFCs, so we can move
 forward on the 5.4 process. Therefore I call for votes on the voting RFC.

 The RFC can be found here:

  https://wiki.php.net/rfc/voting

 You can vote here:

  https://wiki.php.net/rfc/voting/vote

 Votes are open until Monday 27.06.2011.

 Thank you
 David






-- 
Regards,
Shein Alexey




Re: [PHP-DEV] [VOTE] voting rfc

2011-06-21 Thread Pierre Joye
added a link to the vote page. It should be more clear now.

On Tue, Jun 21, 2011 at 12:25 PM, Alexey Shein con...@gmail.com wrote:
 Hi!
 Wouldn't it be better if wiki voting mechanism would be embedded in
 the page with rfc? Under the block Table of contents, for example.
 Just now green link Votes are open is not so noticeable, so having a
 chance to vote without leaving the page would better inspire people to
 take part in voting process.






-- 
Pierre

@pierrejoye | http://blog.thepimp.net | http://www.libgd.org




Re: [PHP-DEV] foreach() for strings

2011-06-21 Thread Derick Rethans
On Mon, 20 Jun 2011, Stas Malyshev wrote:

 On 6/20/11 9:15 AM, John Crenshaw wrote:
   From: Ilia Alshanetsky [mailto:i...@prohost.org]
   
   As long as it works on a premise that a string is a byte array
   and each element represents 1 byte, +1 from me.
  
  Code written on this premise is almost always bug central when people
  finally get around to realizing why they really do need to support
  wide characters (and everybody does, because people like to paste
  stuff containing non-break-spaces, and decorative quotes). I really
  don't think this single byte character mentality should be
  encouraged.
 
 I think you're right, TextIterator would be better (and also much easier to
 implement, I think). Didn't we have it in Unicode branch? We could port it
 back or we could have something along the lines of grapheme_extract...

It depended on ICU there, and I would be against making a core thing in 
PHP 5.x depend on ICU.

cheers,
Derick

-- 
http://derickrethans.nl | http://xdebug.org
Like Xdebug? Consider a donation: http://xdebug.org/donate.php
twitter: @derickr and @xdebug




Re: [PHP-DEV] Changed behaviour for strtr()

2011-06-21 Thread Derick Rethans
On Mon, 20 Jun 2011, Stas Malyshev wrote:

  Here is the next one.
  
  I think it's quite intuitive to use strtr() to remove single characters from
  a string, too, instead of using many str_replace($chr, '', $str) calls. I'd
  be glad to see this change in 5.4 as well.
 
 This is a BC break, if I understand it correctly, so I don't think it is a
 good idea.

I agree that this is not a good thing then.
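
For context, a small sketch of what already works for removing characters
and what the three-argument form of strtr() does today. The input string
and variable names are made up for the illustration.

```php
<?php
$input = "a-b_c";

// Array form of strtr(): an empty replacement value removes the key.
$viaStrtr = strtr($input, array('-' => '', '_' => ''));

// str_replace() with an array of needles and one empty replacement.
$viaStrReplace = str_replace(array('-', '_'), '', $input);

// Three-argument strtr(): characters without a counterpart in the
// third argument are ignored, so an empty "to" string changes nothing.
$threeArg = strtr($input, '-_', '');

var_dump($viaStrtr, $viaStrReplace, $threeArg);
```

Code that relies on the third call leaving the string untouched is what
would change behaviour, which is the BC concern raised above.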

cheers,
Derick




Re: [PHP-DEV] foreach() for strings

2011-06-21 Thread Pierre Joye
On Tue, Jun 21, 2011 at 12:53 PM, Derick Rethans der...@php.net wrote:

 It depended on ICU there, and I would be against making a core thing in
 PHP 5.x depend on ICU.

It can and should be done as part of intl, actually.

But that's somehow unrelated to the proposal here, as it is about
byte, not characters :)

-- 
Pierre

@pierrejoye | http://blog.thepimp.net | http://www.libgd.org




Re: [PHP-DEV] [VOTE] voting rfc

2011-06-21 Thread Alexey Shein
2011/6/21 Pierre Joye pierre@gmail.com:
 added a link to the vote page. It should be more clear now.
Thank you.

But why not just place the doodle plugin at the bottom of the RFC
page? That would improve the chances that people read the RFC to the
end before voting. What's the idea behind keeping two separate pages
for the RFC and the voting?






-- 
Regards,
Shein Alexey




Re: [PHP-DEV] [VOTE] voting rfc

2011-06-21 Thread Pierre Joye
On Tue, Jun 21, 2011 at 1:05 PM, Alexey Shein con...@gmail.com wrote:
 2011/6/21 Pierre Joye pierre@gmail.com:
 added a link to the vote page. It should be more clear now.
 Thank you.

 But why not just place doodle plugin in the bottom of the page with
 rfc? This will give some chances that people will read rfc till the
 end before voting. What's the idea behind keeping 2 separate pages for
 rfc and voting?

I find it nicer and clearer on a separate page, but it could be done on
the same page too... It is not that important, though, so we should not
change it now.


-- 
Pierre

@pierrejoye | http://blog.thepimp.net | http://www.libgd.org




Re: [PHP-DEV] Re: foreach() for strings

2011-06-21 Thread Derick Rethans
On Mon, 20 Jun 2011, Anthony Ferrara wrote:

  text_to_array($s) == str_split($s, 1)
 
 No, because str_split always splits into 1 byte chunks.  text_to_array
 would take the character set into account (or that's where the utility
 in it would be)...

No, as PHP currently does *NOT* know about character sets. If you want 
character set, we need Unicode strings like we had in the PHP6 branch.

Derick
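
A minimal sketch of the difference under discussion, assuming the input is
valid UTF-8 and that PCRE was built with UTF-8 support. The sample string
is made up; text_to_array() itself is only a name from the thread and is
not implemented here.

```php
<?php
// "naïve" written with explicit UTF-8 bytes: 5 characters, 6 bytes.
$s = "na\xC3\xAFve";

// str_split() knows nothing about encodings and cuts after every byte.
$bytes = str_split($s, 1);

// With the /u modifier PCRE treats the subject as UTF-8, so splitting
// on the empty pattern yields one element per code point.
$chars = preg_split('//u', $s, -1, PREG_SPLIT_NO_EMPTY);

var_dump(count($bytes), count($chars));
```

Note that the caller has to supply the knowledge that the data is UTF-8;
the string itself carries no character set information.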




Re: [PHP-DEV] [RFC] 5.4 features for vote (long)

2011-06-21 Thread Felipe Pena
2011/6/21 Etienne Kneuss col...@php.net:
 Hello,

 On Tue, Jun 21, 2011 at 05:17, Rasmus Lerdorf syst...@php.net wrote:
 On 06/20/2011 08:09 PM, Felipe Pena wrote:

 I'm ok with this, I just think it's ugly to repeat the token name in
 the definition in the .y file. :P

 %token T_LNUMBER 'number' (T_LNUMBER)
 %token T_STRING 'identifier' (T_STRING)

 Why 'identifier' and not 'string' or 'string-literal' there?

 For people using PHP, a string or a string literal is "foo" or 'foo'.
 T_STRING represents neither "foo" nor 'foo'.
 'identifier' seems to adequately describe what it encompasses.

 IMHO, it would be even better if the "unexpected" part displayed the
 actual content:

 i.e.

 function 1() = Unexpected number '1' ...
 or
 function 1() = Unexpected '1'...


This is already possible; it would only require a static
variable in the yytnamerr implementation.

-- 
Regards,
Felipe Pena




Re: [PHP-DEV] foreach() for strings

2011-06-21 Thread Lester Caine

Pierre Joye wrote:

It depended on ICU there, and I would be against making a core thing in
  PHP 5.x depend on ICU.

It can and should be done as part of intl, actually.

But that's somehow unrelated to the proposal here, as it is about
byte, not characters :)


I believe this may be where some of the new niggles may be coming from? With 
browsers returning unicode, it may be that some of the 'extra' characters are 
being returned as multibyte rather than as single bytes? Such as the problem 
reported on the general list currently. How do we ensure that we are dealing 
with single byte character strings nowadays?


--
Lester Caine - G8HFL
-
Contact - http://lsces.co.uk/wiki/?page=contact
L.S.Caine Electronic Services - http://lsces.co.uk
EnquirySolve - http://enquirysolve.com/
Model Engineers Digital Workshop - http://medw.co.uk//
Firebird - http://www.firebirdsql.org/index.php




Re: [PHP-DEV] foreach() for strings

2011-06-21 Thread Pierre Joye
On Tue, Jun 21, 2011 at 1:33 PM, Lester Caine les...@lsces.co.uk wrote:
 Pierre Joye wrote:

 It depended on ICU there, and I would be against making a core thing in
   PHP 5.x depend on ICU.

 It can and should be done as part of intl, actually.

 But that's somehow unrelated to the proposal here, as it is about
 byte, not characters :)

 I believe this may be where some of the new niggles may be coming from? With
 browsers returning unicode, it may be that some of the 'extra' characters
 are being returned as multibyte rather than as single bytes? Such as the
 problem reported on the general list currently. How do we ensure that we are
 dealing with single byte character strings nowadays?

As has been stated numerous times in this thread and others, we do
not do anything with multibyte systems, Unicode, etc. mbstring and
intl do, but PHP's strings as of now are all about bytes: arrays of
bytes, if I may describe them this way.

And we can't change this behavior.
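
A short sketch of what "array of bytes" means in practice. The sample
string is made up, and the mb_strlen() call assumes the mbstring
extension is available, so it is guarded.

```php
<?php
// "été" written with explicit UTF-8 bytes: 3 characters, 5 bytes.
$s = "\xC3\xA9t\xC3\xA9";

// strlen() and offset access both count bytes.
$byteLength = strlen($s);
$firstByte  = bin2hex($s[0]); // first half of the two-byte "é"

// Walking the string by offset visits each byte, not each character.
$hex = array();
for ($i = 0; $i < $byteLength; $i++) {
    $hex[] = bin2hex($s[$i]);
}

var_dump($byteLength, $firstByte, $hex);

// mbstring counts characters once it is told the encoding.
if (function_exists('mb_strlen')) {
    var_dump(mb_strlen($s, 'UTF-8'));
}
```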

Cheers,
-- 
Pierre

@pierrejoye | http://blog.thepimp.net | http://www.libgd.org




Re: [PHP-DEV] foreach() for strings

2011-06-21 Thread Lester Caine

Pierre Joye wrote:

On Tue, Jun 21, 2011 at 1:33 PM, Lester Caine les...@lsces.co.uk wrote:

Pierre Joye wrote:

It depended on ICU there, and I would be against making a core thing in
PHP 5.x depend on ICU.


It can and should be done as part of intl, actually.

But that's somehow unrelated to the proposal here, as it is about
byte, not characters :)


I believe this may be where some of the new niggles may be coming from? With
browsers returning unicode, it may be that some of the 'extra' characters
are being returned as multibyte rather than as single bytes? Such as the
problem reported on the general list currently. How do we ensure that we are
dealing with single byte character strings nowadays?


As it has been stated numerous times in this thread and other, we do
not do anything with multi bytes systems, unicode, etc. mbstring and
intl do, but php's string as of now is all about bytes, array of bytes
if I may describe them this way.

And we can't change this behavior.


That is exactly the point. I suppose what I am asking is how people ensure that
what they are feeding into simple strings is single byte when cut and paste
nowadays does not make a distinction?


--
Lester Caine - G8HFL
-
Contact - http://lsces.co.uk/wiki/?page=contact
L.S.Caine Electronic Services - http://lsces.co.uk
EnquirySolve - http://enquirysolve.com/
Model Engineers Digital Workshop - http://medw.co.uk//
Firebird - http://www.firebirdsql.org/index.php




RE: [PHP-DEV] foreach() for strings

2011-06-21 Thread John Crenshaw
Pierre Joye wrote:
 On Tue, Jun 21, 2011 at 1:33 PM, Lester Caine les...@lsces.co.uk wrote:
 Pierre Joye wrote:

 It depended on ICU there, and I would be against making a core thing in
   PHP 5.x depend on ICU.

 It can and should be done as part of intl, actually.

 But that's somehow unrelated to the proposal here, as it is about
 byte, not characters :)

 I believe this may be where some of the new niggles may be coming from? With
 browsers returning unicode, it may be that some of the 'extra' characters
 are being returned as multibyte rather than as single bytes? Such as the
 problem reported on the general list currently. How do we ensure that we are
 dealing with single byte character strings nowadays?

 As it has been stated numerous times in this thread and other, we do
 not do anything with multi bytes systems, unicode, etc. mbstring and
 intl do, but php's string as of now is all about bytes, array of bytes
 if I may describe them this way.

 And we can't change this behavior.

This mindset is fundamentally broken. You can call it a byte array all you 
want, but the truth is that 99.999% of the time, when a developer is using a 
string they need it for characters, not for bytes, and characters are not 
single byte. Even English users tend to submit Unicode range characters at an 
alarming rate. If you're using a WYSIWYG editor, Chrome will submit 
non-breaking-spaces as the actual UTF8 encoded character, not as an HTML 
encoded entity. Whether developers like it, or even know it, supporting an 
extended universal character set is not really optional.

PHP makes this bad enough with the whole collection of bytewise string 
functions, including many with no appropriate multibyte aware replacement, but 
at least this can be avoided, quickly audited, and in the future can even be 
fixed in any number of ways with only a nominal BC impact. Hard coding this 
single byte idiocy into a language construct (foreach) though would be an 
incredibly awful idea. This would create a trap for new naive PHP developers, 
and create a character set problem that the language could NEVER recover from 
without a massive BC break.

This proposal is really about adding a feature which, whenever it is used, is
almost guaranteed to be an error. It probably won't look to the developer like an
error during simple testing, but will almost certainly show up as an error in 
production. Is it really worth all that for a bit of syntax sugar that the 
developer will have to strip out anyway to fix their bug?

If string iteration needs to be addressed in the core (and IMO it doesn't 
because it can be handled at the script level, but if it does) why not use 
iterator classes? This gives the same functionality and prevents the language 
from encouraging hidden bugs.
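
A rough userland sketch of the iterator-class idea, assuming PHP 7 or
later (generators with return types) and UTF-8 input. The class names
ByteIteratorSketch and Utf8TextIteratorSketch are hypothetical; this is
not the TextIterator from the Unicode branch, and it iterates code
points, not grapheme clusters.

```php
<?php
// Hypothetical: iterates a string one byte at a time.
final class ByteIteratorSketch implements IteratorAggregate
{
    private $data;

    public function __construct($data)
    {
        $this->data = (string) $data;
    }

    public function getIterator(): Iterator
    {
        $length = strlen($this->data);
        for ($i = 0; $i < $length; $i++) {
            yield $i => $this->data[$i];
        }
    }
}

// Hypothetical: iterates a UTF-8 string one code point at a time.
final class Utf8TextIteratorSketch implements IteratorAggregate
{
    private $text;

    public function __construct($text)
    {
        $this->text = (string) $text;
    }

    public function getIterator(): Iterator
    {
        // preg_split() with /u returns false for invalid UTF-8. Because
        // this method is a generator, the exception below is raised when
        // iteration starts, not when getIterator() is called.
        $chars = preg_split('//u', $this->text, -1, PREG_SPLIT_NO_EMPTY);
        if ($chars === false) {
            throw new InvalidArgumentException('Input is not valid UTF-8');
        }
        foreach ($chars as $index => $char) {
            yield $index => $char;
        }
    }
}

// "a", a UTF-8 non-breaking space (two bytes), then "b".
$s = "a\xC2\xA0b";

$byteCount = count(iterator_to_array(new ByteIteratorSketch($s)));
$charCount = count(iterator_to_array(new Utf8TextIteratorSketch($s)));

var_dump($byteCount, $charCount);
```

With classes like these the caller has to pick byte or text iteration
explicitly, instead of foreach silently choosing bytes.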

John Crenshaw
Priacta, Inc.




Re: [PHP-DEV] foreach() for strings

2011-06-21 Thread Pierre Joye
On Tue, Jun 21, 2011 at 4:38 PM, John Crenshaw johncrens...@priacta.com wrote:

 This mindset is fundamentally broken. You can call it a byte array all you 
 want, but the truth is that 99.999% of the time, when a developer is using a 
 string they need it for characters, not for bytes

Let me rephrase:

For backward compatibility reasons we cannot change this behavior.

Any serious text processing should be done using intl, mbstring,
transliterator (pecl) or other similar solutions.

Cheers,
--
Pierre

@pierrejoye | http://blog.thepimp.net | http://www.libgd.org




Re: [PHP-DEV] [VOTE] voting rfc

2011-06-21 Thread Philip Olson


On Jun 20, 2011, at 5:15 AM, David Soria Parra wrote:

 Hi Internals,
 
 we have been working on getting an rfc together on how to deal with
 votes on rfcs. We aim to provide a simple mechanism for votes while
 still maintaining freedom on how to do votes and how to work on rfcs.
 
 I want to move forward on the voting and release RFCs, so we can move
 forward on the 5.4 process. Therefore I call for votes on the voting RFC.
 
 The RFC can be found here:
 
  https://wiki.php.net/rfc/voting
 
 You can vote here:
 
  https://wiki.php.net/rfc/voting/vote
 
 Votes are open until Monday 27.06.2011.

Please clarify the "who can vote" aspect of this RFC, which is:


The proposal here is for two audiences to participate in the voting process:

* People with php.net SVN accounts that have contributed code to PHP
* Representatives from the PHP community, that will be chosen by those with 
php.net SVN accounts
  * Lead developers of PHP based projects (frameworks, cms, tools, etc.)
  * regular participant of internals discussions


Does this mean that a php.net account holder must have 1+ commits? How are
"Lead developers" determined exactly? Do they nominate themselves? Does each
name require an official vote with a two week waiting period? And what's a
"regular participant of internals discussions"? One post per week/month/year?
And only the internals@lists.php.net mailing list applies?

I voted against this RFC partly because the above is not clear.

Regards,
Philip





RE: [PHP-DEV] foreach() for strings

2011-06-21 Thread Tomas Kuliavas
On 2011.06.21 17:38, John Crenshaw wrote:

 This mindset is fundamentally broken. You can call it a byte array all you
 want, but the truth is that 99.999% of the time, when a developer is using
 a string they need it for characters, not for bytes, and characters are
 not single byte. Even English users tend to submit Unicode range
 characters at an alarming rate. If you're using a WYSIWYG editor, Chrome
 will submit non-breaking-spaces as the actual UTF8 encoded character, not
 as an HTML encoded entity. Whether developers like it, or even know it,
 supporting an extended universal character set is not really optional.

They submit it in utf-8 only if your html form allows them to do that or
they don't follow html specification and try to exploit your form. Set
form input charset to iso-8859-1 and your nbspace will take only one byte.
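
The byte counts being argued about can be checked with a few lines. This
is only a sketch and assumes the mbstring extension is enabled for
mb_convert_encoding(); the variable names are made up.

```php
<?php
// U+00A0 (non-breaking space) as UTF-8 is the two bytes C2 A0.
$nbspUtf8 = "\xC2\xA0";

// Converted to ISO-8859-1 it becomes the single byte A0.
$nbspLatin1 = mb_convert_encoding($nbspUtf8, 'ISO-8859-1', 'UTF-8');

var_dump(strlen($nbspUtf8), strlen($nbspLatin1), bin2hex($nbspLatin1));
```

Whether the server actually receives the one-byte or the two-byte form
depends on what the client sends, which the server cannot assume.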

-- 
Tomas






Re: [PHP-DEV] [VOTE] voting rfc

2011-06-21 Thread Arvids Godjuks
That really needs clarifying, because if I understand correctly, I should get
the ability to vote (userland developer actively reading the list and writing
to the list on some matters). So the question: do I get the ability to vote? :-)
On 21.06.2011 17:36, Philip Olson phi...@roshambo.org wrote:


 Please clarify the "who can vote" aspect of this RFC, which is:

 The proposal here is for two audiences to participate in the voting
 process:

 * People with php.net SVN accounts that have contributed code to PHP
 * Representatives from the PHP community, that will be chosen by those
   with php.net SVN accounts
   * Lead developers of PHP based projects (frameworks, cms, tools, etc.)
   * regular participant of internals discussions

 Does this mean that a php.net account holder must have 1+ commits? How are
 "Lead developers" determined exactly? Do they nominate themselves? Does each
 name require an official vote with a two week waiting period? And what's a
 "regular participant of internals discussions"? One post per week/month/year?
 And only the internals@lists.php.net mailing list applies?

 I voted against this RFC partly because the above is not clear.

 Regards,
 Philip





Re: RE: [PHP-DEV] foreach() for strings

2011-06-21 Thread Arvids Godjuks
As a userland developer, because of where I live I constantly have to work
with three languages: English, Russian (Cyrillic) and Latvian (which has
its own share of non-Latin characters). I end up using UTF-8 in every
project, and some of them give me a headache when dealing with text parsing.
mbstring covers just part of the functionality and can be turned off.

I personally think something has to be done about Unicode handling in PHP
after 5.4, so that we have an official method of dealing with it in the core.
It could probably be done in a namespace of its own and be new functionality
to which people should migrate.

my 2 cents.
On 21.06.2011 17:56, Tomas Kuliavas to...@users.sourceforge.net wrote:
 They submit it in utf-8 only if your html form allows them to do that or
 they don't follow html specification and try to exploit your form. Set
 form input charset to iso-8859-1 and your nbspace will take only one byte.

 --
 Tomas






Re: [PHP-DEV] foreach() for strings

2011-06-21 Thread Reindl Harald


On 21.06.2011 17:55, Tomas Kuliavas wrote:

 They submit it in utf-8 only if your html form allows them to do that or
 they don't follow html specification and try to exploit your form. Set
 form input charset to iso-8859-1 and your nbspace will take only one byte.

and this naive attitude is the root of most security problems!

why do you believe that every client submission is coming over
your form or generally over anything you can control?





Re: [PHP-DEV] foreach() for strings

2011-06-21 Thread Ferenc Kovacs
On Tue, Jun 21, 2011 at 6:14 PM, Reindl Harald h.rei...@thelounge.net wrote:



 On 21.06.2011 17:55, Tomas Kuliavas wrote:

  They submit it in utf-8 only if your html form allows them to do that or
  they don't follow html specification and try to exploit your form. Set
  form input charset to iso-8859-1 and your nbspace will take only one
 byte.

 and this naive attitude is the root of most security problems!

 why do you believe that every client submission is coming over
 your form or generally over anything you can control?


That doesn't matter here. Tomas just corrected John: his statement that
Chrome will always use UTF-8 encoding for some special characters isn't true.
Browsers will adhere to the accept-charset attribute:
http://www.w3.org/TR/html401/interact/forms.html#adef-accept-charset
Of course you can't trust user input, and you have to validate it, but this
has nothing to do with this topic.

Tyrael


Re: [PHP-DEV] foreach() for strings

2011-06-21 Thread Reindl Harald


On 21.06.2011 18:22, Ferenc Kovacs wrote:
 On Tue, Jun 21, 2011 at 6:14 PM, Reindl Harald h.rei...@thelounge.net wrote:
 


 On 21.06.2011 17:55, Tomas Kuliavas wrote:

 They submit it in utf-8 only if your html form allows them to do that or
 they don't follow html specification and try to exploit your form. Set
 form input charset to iso-8859-1 and your nbspace will take only one
 byte.

 and this naive attitude is the root of most security problems!

 why do you believe that every client submission is coming over
 your form or generally over anything you can control?


 that doesn't matter here, Tomas just corrected John, that his statement that
 chrome will always use utf-8 encoding for some special character isn't true.
 browsers will adhere the
 http://www.w3.org/TR/html401/interact/forms.html#adef-accept-charset
 of course you can't trust user input, and you have to validate it, but this
 has nothing to do with this topic

It does.

How do you validate input if the string functions that you probably use for
your validation have undefined results?





Re: [PHP-DEV] foreach() for strings

2011-06-21 Thread Ferenc Kovacs
On Tue, Jun 21, 2011 at 6:24 PM, Reindl Harald h.rei...@thelounge.net wrote:



 On 21.06.2011 18:22, Ferenc Kovacs wrote:
  On Tue, Jun 21, 2011 at 6:14 PM, Reindl Harald h.rei...@thelounge.net
 wrote:
 
 
 
  On 21.06.2011 17:55, Tomas Kuliavas wrote:
 
  They submit it in utf-8 only if your html form allows them to do that
 or
  they don't follow html specification and try to exploit your form. Set
  form input charset to iso-8859-1 and your nbspace will take only one
  byte.
 
  and this naive attitude is the root of most security problems!
 
  why do you believe that every client submission is coming over
  your form or generally over anything you can control?
 
 
  that doesn't matter here, Tomas just corrected John, that his statement
 that
  chrome will always use utf-8 encoding for some special character isn't
 true.
  browsers will adhere the
  http://www.w3.org/TR/html401/interact/forms.html#adef-accept-charset
  of course you can't trust user input, and you have to validate it, but
 this
  has nothing to do with this topic

 it has

 how du you validate input if the string-functions having undefined results
 which you probably use for your validation?


What do you mean by undefined?
If you use iso-8859-1 in your whole app and database, it doesn't matter from
the security POV if somebody sends you crafted UTF-8 data.
If you mix up your encodings, or you don't escape with the proper encoding,
then that can hit you (
http://shiflett.org/blog/2006/jan/addslashes-versus-mysql-real-escape-string
 ).

The multibyte support in the PHP core isn't undefined, just unsupported. :/
Use intl or mbstring for handling multibyte encodings.
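
To illustrate the "escape with the proper encoding" point, here is a minimal
sketch of the byte-level effect the linked article describes, assuming the
consumer of the escaped string interprets it as GBK. The input value is made
up for illustration.

```php
<?php
// 0xBF 0x27 is not a valid GBK character; 0x27 is a plain single quote (').
$input = "\xBF\x27 OR 1=1";

// addslashes() works on bytes: it inserts 0x5C (backslash) before 0x27.
$escaped = addslashes($input);

// The result starts with 0xBF 0x5C 0x27. In GBK, 0xBF 0x5C is one valid
// two-byte character, so a GBK-aware parser sees the quote as unescaped.
echo bin2hex(substr($escaped, 0, 3)), "\n"; // bf5c27
```

The function did exactly what a byte-oriented function is defined to do; the
trouble only appears once the two sides disagree about the encoding.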

Tyrael


RE: RE: [PHP-DEV] foreach() for strings

2011-06-21 Thread John Crenshaw
 They submit it in utf-8 only if your html form allows them to do that or
 they don't follow html specification and try to exploit your form.

If no explicit encoding is given, all modern browsers will attempt to 
autodetect the encoding based on the page contents, often with unpredictable 
results. Most web developers really don't understand the whole encoding thing, 
and many aren't aware of it at all. If they aren't taking care of the encoding 
question in their server side code, what makes anyone believe that they are 
specifying the encoding in their response headers, or HTML?

I can tell you for certain that if no encoding is specified, Chrome can and 
will decide that the data is UTF8, at least under certain conditions (because I 
watched it recently when working on an encoding problem in some legacy code.)

 Set form input charset to iso-8859-1

I can't believe I just saw someone recommend that ;)

Yes, you *could* use Latin-1...for which the Euro sign, ellipsis, decorative 
quotes, trademark, em dash, and a number of other frequently pasted characters 
are still out of range.
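
A small sketch of the range argument above: ISO-8859-1 only covers code points
U+0000 to U+00FF, so a check like the following tells you whether a UTF-8
string would survive a conversion to Latin-1. The helper name fits_latin1() is
hypothetical, and the sketch assumes PCRE with UTF-8 support (the /u modifier).

```php
<?php
// Hypothetical helper: true if every character of a valid UTF-8 string has
// a code point in the ISO-8859-1 range (U+0000..U+00FF).
function fits_latin1(string $utf8): bool
{
    // With /u, PCRE treats pattern and subject as UTF-8; invalid UTF-8
    // makes preg_match() return false, which we also report as "no".
    return preg_match('/\A[\x{00}-\x{FF}]*\z/u', $utf8) === 1;
}

var_dump(fits_latin1("caf\xC3\xA9"));      // e-acute, U+00E9
var_dump(fits_latin1("5 \xE2\x82\xAC"));   // euro sign, U+20AC
var_dump(fits_latin1("a \xE2\x80\x94 b")); // em dash, U+2014
```

Only the first string fits; the euro sign and the em dash lie far outside the
one-byte range.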

Then, when you eventually decide that latin1 isn't meeting your needs, you'll 
get to go through the wonderful process of trying to convert all of your legacy 
data to UTF8.

Single byte just doesn't cut the mustard anymore, especially on the web. The 
world is too small. We should be trying to move PHP *away* from this, not 
towards it.

John Crenshaw
Priacta, Inc.


RE: [PHP-DEV] foreach() for strings

2011-06-21 Thread John Crenshaw
 From: Pierre Joye [mailto:pierre@gmail.com] 
  On Tue, Jun 21, 2011 at 4:38 PM, John Crenshaw johncrens...@priacta.com 
  wrote:
 
  This mindset is fundamentally broken. You can call it a byte array all you 
  want, but the truth is that 99.999% of the time, when a developer is using 
  a string they need it for characters, not for bytes
 
 Let me rephrase:
 
 For backward compatibility reasons we cannot change this behavior.
 
 Any serious text processing should be done using intl, mbstring,
 transliterator (pecl) or other similar solutions.
 
 Cheers,
 --
 Pierre

Right, I totally agree. We can't fix the multibyte string issue today; I'm just 
saying that we *can* (and should) avoid making it much worse.
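
For readers following along, a minimal sketch of what "strings are bytes"
means in practice, and why byte-wise iteration can cut a character in half.
The sample word is made up, and the preg_split() idiom is just one way to get
characters without mbstring or intl.

```php
<?php
$word = "na\xC3\xAFve"; // "naive" with a diaeresis, UTF-8; the i takes two bytes

// Core string functions count and index bytes.
echo strlen($word), "\n";     // 6
echo bin2hex($word[2]), "\n"; // c3, only the first half of the character

// Splitting with PCRE's /u modifier yields whole characters instead.
$chars = preg_split('//u', $word, -1, PREG_SPLIT_NO_EMPTY);
echo count($chars), "\n";     // 5
```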

John Crenshaw
Priacta, Inc.




Re: [PHP-DEV] foreach() for strings

2011-06-21 Thread Tomas Kuliavas
2011.06.21 19:24 Reindl Harald wrote:


 On 21.06.2011 18:22, Ferenc Kovacs wrote:
 On Tue, Jun 21, 2011 at 6:14 PM, Reindl Harald
 h.rei...@thelounge.net wrote:



 On 21.06.2011 17:55, Tomas Kuliavas wrote:

 They submit it in utf-8 only if your html form allows them to do that
 or
 they don't follow html specification and try to exploit your form. Set
 form input charset to iso-8859-1 and your nbspace will take only one
 byte.

 and this naive attitude is the root of most security problems!

 why do you believe that every client submission is coming over
 your form or generally over anything you can control?


 that doesn't matter here, Tomas just corrected John, that his statement
 that
 chrome will always use utf-8 encoding for some special character isn't
 true.
 browsers will adhere the
 http://www.w3.org/TR/html401/interact/forms.html#adef-accept-charset
 of course you can't trust user input, and you have to validate it, but
 this
 has nothing to do with this topic

 it has

 how du you validate input if the string-functions having undefined results
 which you probably use for your validation?

I never said that he should trust user input. I only said that his
valid user inputs depend on the HTML form format.

UTF-8 is a strict format. If you expect UTF-8 and someone submits something
else, you can tell that without any string function. You can verify UTF-8
strings in pcre. You can convert nbspace to a regular space, if you want.
UTF-8 does not have any byte sequence that can collide with the nbspace byte
sequence in UTF-8.
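
A rough sketch of the two steps described above, assuming PCRE is built with
UTF-8 support. The helper name nbsp_to_space() is made up for illustration.

```php
<?php
// Hypothetical helper: check that the input is valid UTF-8, then replace
// U+00A0 (no-break space) with a regular space. Returns null if invalid.
function nbsp_to_space(string $input): ?string
{
    // With /u, preg_match() returns false when the subject is not valid UTF-8.
    if (preg_match('//u', $input) !== 1) {
        return null;
    }
    // U+00A0 is the two-byte sequence C2 A0 in UTF-8. In valid UTF-8 that
    // pair cannot occur inside another character, so a byte replace is safe.
    return str_replace("\xC2\xA0", ' ', $input);
}

var_dump(nbsp_to_space("a\xC2\xA0b")); // string(3) "a b"
var_dump(nbsp_to_space("a\xA0b"));     // NULL: a lone A0 is Latin-1, not UTF-8
```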

-- 
Tomas





Re: [PHP-DEV] foreach() for strings

2011-06-21 Thread Reindl Harald


On 21.06.2011 19:12, Tomas Kuliavas wrote:
 and this naive attitude is the root of most security problems!

 why do you believe that every client submission is coming over
 your form or generally over anything you can control?


 that doesn't matter here, Tomas just corrected John, that his statement
 that
 chrome will always use utf-8 encoding for some special character isn't
 true.
 browsers will adhere the
 http://www.w3.org/TR/html401/interact/forms.html#adef-accept-charset
 of course you can't trust user input, and you have to validate it, but
 this
 has nothing to do with this topic

 it has

 how du you validate input if the string-functions having undefined results
 which you probably use for your validation?
 
 I've never said that he should trust user input. I've only said that his
 valid user inputs depend on html form format.

And I told you that in the real world this is utopian;
there is a world outside of forms.

Show me FIVE PHP apps which are using accept-charset,

not counting mine. Mine do, and even there I cannot be sure that
all of the thousands of scripts/websites I wrote really use it everywhere.

 utf-8 is strict format. If you expect utf-8 and someone submits something
 else, you can tell that without any string function. You can verify utf-8
 strings in pcre. You can convert nbspace to regular space, if you want.
 utf-8 does not have any byte sequence that can collide with nbspace byte
 sequence in utf-8

Show me a practicable way to detect whether some input data contains UTF-8.
The mbstring functions are out of the game because there are many servers,
even at really big companies, where they are not available.

So the problem is simply that you cannot really write portable and
well-performing code that is aware of UTF-8.








Re: [PHP-DEV] foreach() for strings

2011-06-21 Thread Tomas Kuliavas
2011.06.21 20:51 Reindl Harald wrote:
 utf-8 is strict format. If you expect utf-8 and someone submits
 something
 else, you can tell that without any string function. You can verify
 utf-8
 strings in pcre. You can convert nbspace to regular space, if you want.
 utf-8 does not have any byte sequence that can collide with nbspace byte
 sequence in utf-8

 show me a practicable way to detect if some input data contains UTF8
 mb_string-functions are out of the game because there are many servers
 even of real big companies where they are not available

:) I said pcre, not mbstring. If you had read the fine UTF-8 manual, as I
did about 8 years ago, you would know how to detect 8-bit input that is
not UTF-8. UTF-8 is a variable-length encoding which has very
specific rules about the way bytes are arranged. You can tell the length of
a symbol in bytes from its first byte. You can tell which byte values
are allowed for the second, third, fourth, fifth or sixth byte. If you
eliminate the five valid UTF-8 8-bit byte sequences and still have 8-bit
data, it is not UTF-8.
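
The "length from the first byte" rule can be sketched as follows. This follows
RFC 3629, which limits UTF-8 to sequences of one to four bytes; the five- and
six-byte forms mentioned above belong to the older definition. The function
name is hypothetical, and a full validator would also have to check that each
following byte is a continuation byte (0x80 to 0xBF).

```php
<?php
// Hypothetical helper: expected sequence length for a UTF-8 lead byte,
// per RFC 3629. Returns 0 if the byte cannot start a sequence.
function utf8_sequence_length(string $leadByte): int
{
    $b = ord($leadByte);
    if ($b <= 0x7F) return 1;               // 0xxxxxxx: ASCII
    if ($b >= 0xC2 && $b <= 0xDF) return 2; // 110xxxxx (C0 and C1 are overlong)
    if ($b >= 0xE0 && $b <= 0xEF) return 3; // 1110xxxx
    if ($b >= 0xF0 && $b <= 0xF4) return 4; // 11110xxx, up to U+10FFFF
    return 0;                               // continuation byte or out of range
}

echo utf8_sequence_length("a"), "\n";    // 1
echo utf8_sequence_length("\xC2"), "\n"; // 2
echo utf8_sequence_length("\xA0"), "\n"; // 0
```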

-- 
Tomas





Re: [PHP-DEV] foreach() for strings

2011-06-21 Thread Reindl Harald


On 21.06.2011 22:19, Tomas Kuliavas wrote:
 2011.06.21 20:51 Reindl Harald wrote:
 utf-8 is strict format. If you expect utf-8 and someone submits
 something
 else, you can tell that without any string function. You can verify
 utf-8
 strings in pcre. You can convert nbspace to regular space, if you want.
 utf-8 does not have any byte sequence that can collide with nbspace byte
 sequence in utf-8

 show me a practicable way to detect if some input data contains UTF8
 mb_string-functions are out of the game because there are many servers
 even of real big companies where they are not available
 
 :) I've said pcre and not mbstring. If you read fine utf-8 manual like I
 did about 8 years ago, you would know how to detect 8bit inputs that are
 not in utf-8. utf-8 is variable byte length character set which has very
 specific rules about the way bytes are arranged. You can tell length of
 symbol in bytes based on first byte. You can tell what kind of byte values
 should be used for second, third, fourth, fifth or sixth byte. If you
 eliminate five valid utf-8 8bit byte sequences and still have 8bit data,
 it is not utf-8

I do not understand a word of that, and I miss a simple str_is_utf8() (or
whatever you want to call it) that does this natively and efficiently on a
given variable, and that would make it possible to stop a script on
unexpected input without degrading performance.
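
For what it is worth, a userland approximation of such a function is short.
This is only a sketch: str_is_utf8() is not a built-in, and the fallback
assumes PCRE was built with UTF-8 support.

```php
<?php
// Hypothetical str_is_utf8(): NOT a PHP built-in, just a userland sketch.
function str_is_utf8(string $value): bool
{
    // Prefer mbstring when the extension is loaded.
    if (function_exists('mb_check_encoding')) {
        return mb_check_encoding($value, 'UTF-8');
    }
    // Fallback without mbstring: /u makes PCRE validate the subject.
    return preg_match('//u', $value) === 1;
}

var_dump(str_is_utf8("gr\xC3\xBC\xC3\x9Fe")); // bool(true), UTF-8 bytes
var_dump(str_is_utf8("gr\xFC\xDFe"));         // bool(false), Latin-1 bytes
```

A script could call this on each request parameter and bail out with an error
response as soon as one of them returns false.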






Re: [PHP-DEV] [VOTE] voting rfc

2011-06-21 Thread Pierre Joye
We thought there was no need to over-regulate this part.

It is something like mentors: if you just come in and post a couple of
times, or even daily, but nobody can second you and you lead zero OSS
projects, then the chance that you can vote will be rather low. Your
option? Contribute! :-)

On Tue, Jun 21, 2011 at 5:57 PM, Arvids Godjuks
arvids.godj...@gmail.com wrote:
 That really needs clearing up, because if I understand correctly, I should
 get the ability to vote (userland developer actively reading the list and
 writing to the list on some matters). So the question: do I get the ability
 to vote? :-)
 On 21.06.2011 17:36, Philip Olson phi...@roshambo.org wrote:


-- 
Pierre

@pierrejoye | http://blog.thepimp.net | http://www.libgd.org




Re: [PHP-DEV] Changed behaviour for strtr()

2011-06-21 Thread Jordi Boggiano
On Tue, Jun 21, 2011 at 12:55 PM, Derick Rethans der...@php.net wrote:
 On Mon, 20 Jun 2011, Stas Malyshev wrote:

  Here is the next one.
 
  I think it's quite intuitive to use strtr() to remove single characters of
  a string, too, instead of using many str_replace($str, $chr, ). I'd be glad
  to see this change also in 5.4.

 This is a BC break, if I understand it correctly, so I don't think it is a
 good idea.

 I agree that this is not a good thing then.

Right now strtr('anything', 'anything', '') === 'anything', which
means that anyone relying on this behavior is doing something strange
and dumb imo, doing a function call for nothing. We could maybe say
that strtr('anything', 'anything', null) maps all letters to an empty
string? That should take care of the user-based inputs for BC reasons,
while still allowing strtr() to be used for this strip letter x and
y use case.
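
To make the behaviour under discussion concrete, here is a small sketch of the
no-op case and of two ways to strip single characters that, as far as I know,
already work without any change to strtr(). The sample string is made up.

```php
<?php
// Behaviour discussed above: an empty third argument makes the
// three-argument form of strtr() return the string unchanged.
var_dump(strtr('anything', 'anything', '') === 'anything'); // bool(true)

// Removing single characters with one str_replace() call and a list...
echo str_replace(['x', 'y'], '', 'xylophone yay'), "\n";    // lophone a

// ...or with the array form of strtr(), whose values may be empty strings.
echo strtr('xylophone yay', ['x' => '', 'y' => '']), "\n";  // lophone a
```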

Anyway I'm not gonna fight one way or the other, it's a detail, but I
don't think the BC concern is as big as it's presented.

Cheers

-- 
Jordi Boggiano
@seldaek :: http://seld.be/




Re: [PHP-DEV] Changed behaviour for strtr()

2011-06-21 Thread Stas Malyshev

Hi!

On 6/21/11 5:14 PM, Jordi Boggiano wrote:

Right now strtr('anything', 'anything', '') === 'anything', which
means that anyone relying on this behavior is doing something strange
and dumb imo, doing a function call for nothing. We could maybe say


It does not matter if you approve or disapprove how people write their 
code - we can't break BC unless there's a VERY good reason. You never 
know in which situation with which combination of inputs which 
application may end up using this sequence of parameters and how 
changing it may break it.

--
Stanislav Malyshev, Software Architect
SugarCRM: http://www.sugarcrm.com/
(408)454-6900 ext. 227




Re: [PHP-DEV] Changed behaviour for strtr()

2011-06-21 Thread Jordi Boggiano
On Wed, Jun 22, 2011 at 2:22 AM, Stas Malyshev smalys...@sugarcrm.com wrote:
 Right now strtr('anything', 'anything', '') === 'anything', which
 means that anyone relying on this behavior is doing something strange
 and dumb imo, doing a function call for nothing. We could maybe say

 It does not matter if you approve or disapprove how people write their code
 - we can't break BC unless there's a VERY good reason. You never know in
 which situation with which combination of inputs which application may end
 up using this sequence of parameters and how changing it may break it.

Of course it's not just a matter of taste, but the null case imo
really is not likely to happen by accident, and there is no valid use
case. Anyways.. Case closed I suppose.

Cheers

-- 
Jordi Boggiano
@seldaek :: http://seld.be/




Re: [PHP-DEV] Changed behaviour for strtr()

2011-06-21 Thread Sanford Whiteman
 Right now strtr('anything', 'anything', '') === 'anything', which
 means that anyone relying on this behavior is doing something strange
 and dumb imo, doing a function call for nothing.

How is relying on by-design behavior that turns the call into a no-op
(instead of wrapping the call in an empty() check or whatever) dumb?
Is there some performance hit for entering strtr() that makes this
true?

-- S.

