Re: [PHP-DEV] Re: foreach() for strings
And what actually failed? The idea seems straightforward. Robert
2011/6/20 Johannes Schlüter johan...@schlueters.de: On Mon, 2011-06-20 at 20:38 +0200, Robert Eisele wrote: I really like the ideas shared here. It is worth considering whether array functions should also work with strings. Maybe this would be the way to go, but I'm more excited about the OOP implementation of TextIterator and ByteIterator, which solves the whole problem at once (and is easier to implement, as mentioned by Stas). As Jonathan said, database results with a certain encoding could be iterated, too. The only way to work around the Text/Byte problem would be to prefix EVERY string with 1-2 bytes of string-type information, or to add a type flag to the zval structure. Handling everything with zvals instead of objects would have the advantage that database layers like mysqlnd could write the database encoding directly into the zval, so the user would not need to decide which encoding is used.
Welcome back to the failed PHP 6 Unicode project. ;-) (Although we didn't store the original encoding but converted to UTF-16, which prevents random/strange conversions in other places when mixing encodings.) johannes
Re: [PHP-DEV] Re: foreach() for strings
2011/6/21 Robert Eisele rob...@xarg.org: And what actually failed? The idea seems straightforward. Robert
http://www.slideshare.net/andreizm/the-good-the-bad-and-the-ugly-what-happened-to-unicode-and-php-6
To my understanding: in retrospect, UTF-16 wasn't the best idea; it caused more conversions than had seemed necessary beforehand, many of the core devs lacked the vision and/or the technical knowledge about Unicode, and adoption of Unicode string support was much slower than expected. Tyrael
Re: [PHP-DEV] Re: foreach() for strings
Hi! On 6/21/11 1:23 AM, Ferenc Kovacs wrote: 2011/6/21 Robert Eisele rob...@xarg.org: And what actually failed? The idea seems straightforward. Robert http://www.slideshare.net/andreizm/the-good-the-bad-and-the-ugly-what-happened-to-unicode-and-php-6
Also you may want to read this: http://stackoverflow.com/questions/6162484/why-does-modern-perl-avoid-utf-8-by-default to understand why the idea is not as straightforward as it seems. Yes, it's about Perl and UTF-8, but it gives some impression of the number of issues that need to be handled. There are many PHP-specific ones on top of that (think databases, streams, filesystems, etc.) which would be expected to work out of the box if we declare Unicode support.
-- Stanislav Malyshev, Software Architect SugarCRM: http://www.sugarcrm.com/ (408)454-6900 ext. 227
-- PHP Internals - PHP Runtime Development Mailing List To unsubscribe, visit: http://www.php.net/unsub.php
Re: [PHP-DEV] [RFC] 5.4 features for vote (long)
Hello,
On Tue, Jun 21, 2011 at 05:17, Rasmus Lerdorf syst...@php.net wrote: On 06/20/2011 08:09 PM, Felipe Pena wrote: I'm ok with this, I just think it's ugly to repeat the token name in the definition in the .y file. :P %token T_LNUMBER 'number' (T_LNUMBER) %token T_STRING 'identifier' (T_STRING)
Rasmus: Why 'identifier' and not 'string' or 'string-literal' there?
For people using PHP, a string or a string literal is "foo" or 'foo'. T_STRING represents neither "foo" nor 'foo'. 'identifier' seems to adequately describe what it encompasses. IMHO, it would be even better if the "unexpected" part displayed the actual content, i.e. function 1() = Unexpected number '1' ... or function 1() = Unexpected '1'... Best,
Rasmus, continued: People know what a string is. I am not sure that people know what an identifier is, so in this case changing the error message from something that says "expecting T_STRING" to "expecting identifier" isn't making the error message any clearer as far as I am concerned. This is one of the reasons that having the token name there is useful. It provides continuity with the current error messages that people have grown used to. I think we either need the token names, or we need more descriptive names printed. -Rasmus
-- Etienne Kneuss http://www.colder.ch
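The distinction argued over here (T_STRING is the token for a bare identifier, not for a quoted string literal) can be checked from userland. The following is only a sketch: it assumes the bundled tokenizer extension is available, and token_names() is a hypothetical helper written for this illustration, not part of PHP.

```php
<?php
// Hypothetical helper: return [token name, source text] pairs for a snippet.
// Single-character tokens such as "(" come back from token_get_all() as plain
// strings and are skipped here.
function token_names(string $code): array
{
    $names = [];
    foreach (token_get_all($code) as $token) {
        if (is_array($token)) {
            $names[] = [token_name($token[0]), $token[1]];
        }
    }
    return $names;
}

// The function name foo and the literal 'foo' are lexed as different tokens.
$tokens = token_names("<?php function foo() { return 'foo'; }");

foreach ($tokens as [$name, $text]) {
    if ($name !== 'T_WHITESPACE') {
        echo $name, ' => ', $text, "\n";
    }
}
```

The bare name foo is reported as T_STRING, while the quoted 'foo' is a T_CONSTANT_ENCAPSED_STRING, which is why "string" would be a misleading label for T_STRING in an error message.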
[PHP-DEV] Re: svn: /php/php-src/ branches/PHP_5_4/sapi/cli/config.m4 branches/PHP_5_4/sapi/cli/config.w32 branches/PHP_5_4/sapi/cli/php_cli.c branches/PHP_5_4/sapi/cli/php_cli_server.c branches/PHP_5_
On Mon, 20 Jun 2011 20:27:39 +, Moriyoshi Koizumi wrote: moriyoshi Mon, 20 Jun 2011 20:27:39 + Revision: http://svn.php.net/viewvc?view=revision&revision=312344 Log: - Add built-in web server to CLI SAPI. See the RFC for detail.
As noted [1], php_http_* had been used by pecl_http, so am I supposed to change to e.g. a pecl_http_* prefix now? Regards, Mike
[1] http://marc.info/?l=php-internals&m=130321550627147&w=2
Re: [PHP-DEV] Re: svn: /php/php-src/ branches/PHP_5_4/sapi/cli/config.m4 branches/PHP_5_4/sapi/cli/config.w32 branches/PHP_5_4/sapi/cli/php_cli.c branches/PHP_5_4/sapi/cli/php_cli_server.c branches/PH
On Tue, Jun 21, 2011 at 11:43 AM, Michael Wallner m...@php.net wrote: On Mon, 20 Jun 2011 20:27:39 +, Moriyoshi Koizumi wrote: moriyoshi Mon, 20 Jun 2011 20:27:39 + Revision: http://svn.php.net/viewvc?view=revision&revision=312344 Log: - Add built-in web server to CLI SAPI. See the RFC for detail. As noted [1], php_http_* had been used by pecl_http, so am I supposed to change to e.g. a pecl_http_* prefix now?
It should be php_cli_http actually, as php_http is also likely to be used for other HTTP-related functions not related to this.
-- Pierre @pierrejoye | http://blog.thepimp.net | http://www.libgd.org
Re: [PHP-DEV] Can't vote yet, as RFC has options (Was: Re: [PHP-DEV] [VOTE] release process RFC)
On Tue, Jun 21, 2011 at 5:30 AM, Sebastian Bergmann sebast...@php.net wrote: On 20.06.2011 15:30, Derick Rethans wrote: I am not generally against this RFC, but this point needs to be discussed first IMO. As having 5 active branches at the same time for the multiple major releases option is *not* workable. I agree.
That's why we added a couple of notices about 12 or 18 months. It is also very unlikely that we end up in such a situation anyway. And even if we do, given the strictness (about what can be applied), we didn't see much of a problem here. Cheers,
-- Pierre @pierrejoye | http://blog.thepimp.net | http://www.libgd.org
Re: [PHP-DEV] [VOTE] voting rfc
Hi! Wouldn't it be better if the wiki voting mechanism were embedded in the page with the RFC? Under the "Table of contents" block, for example. Right now the green "Votes are open" link is not very noticeable, so being able to vote without leaving the page would better encourage people to take part in the voting process.
2011/6/20 David Soria Parra d...@php.net: Hi Internals, we have been working on getting an RFC together on how to deal with votes on RFCs. We aim to provide a simple mechanism for votes while still maintaining freedom on how to do votes and how to work on RFCs. I want to move forward on the voting and release RFCs, so we can move forward on the 5.4 process. Therefore I call for votes on the voting RFC. The RFC can be found here: https://wiki.php.net/rfc/voting You can vote here: https://wiki.php.net/rfc/voting/vote Votes are open until Monday 27.06.2011. Thank you David
-- Regards, Shein Alexey
Re: [PHP-DEV] [VOTE] voting rfc
Added a link to the vote page. It should be clearer now.
On Tue, Jun 21, 2011 at 12:25 PM, Alexey Shein con...@gmail.com wrote: Hi! Wouldn't it be better if the wiki voting mechanism were embedded in the page with the RFC? Under the "Table of contents" block, for example. Right now the green "Votes are open" link is not very noticeable, so being able to vote without leaving the page would better encourage people to take part in the voting process.
-- Pierre @pierrejoye | http://blog.thepimp.net | http://www.libgd.org
Re: [PHP-DEV] foreach() for strings
On Mon, 20 Jun 2011, Stas Malyshev wrote: On 6/20/11 9:15 AM, John Crenshaw wrote: From: Ilia Alshanetsky [mailto:i...@prohost.org] As long as it works on a premise that a string is a byte array and each element represents 1 byte, +1 from me.
John Crenshaw: Code written on this premise is almost always bug central when people finally get around to realizing why they really do need to support wide characters (and everybody does, because people like to paste stuff containing non-breaking spaces and decorative quotes). I really don't think this single-byte character mentality should be encouraged.
Stas Malyshev: I think you're right, TextIterator would be better (and also much easier to implement, I think). Didn't we have it in the Unicode branch? We could port it back or we could have something along the lines of grapheme_extract...
It depended on ICU there, and I would be against making a core thing in PHP 5.x depend on ICU. cheers, Derick
-- http://derickrethans.nl | http://xdebug.org Like Xdebug? Consider a donation: http://xdebug.org/donate.php twitter: @derickr and @xdebug
Re: [PHP-DEV] Changed behaviour for strtr()
On Mon, 20 Jun 2011, Stas Malyshev wrote: Here is the next one. I think it's quite intuitive to use strtr() to remove single characters from a string, too, instead of using many str_replace($chr, '', $str) calls. I'd be glad to see this change in 5.4 as well.
Stas Malyshev: This is a BC break, if I understand it correctly, so I don't think it is a good idea.
I agree that this is not a good thing then. cheers, Derick
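To make the BC concern concrete, here is a small sketch of how the existing forms behave, as far as the documented strtr() semantics go. The variable names are only illustrative. With the three-argument form, extra characters in the longer of $from and $to are ignored, so an empty $to currently leaves the input untouched; changing that to mean "delete" would alter the result of existing code.

```php
<?php
$str = 'hello world';

// Three-argument form: $from and $to are paired byte for byte. With an empty
// $to there is nothing to pair with, so the input comes back unchanged.
$unchanged = strtr($str, 'l', '');

// Array form: each key is replaced by its value, and the value may be ''.
$removedViaArray = strtr($str, ['l' => '', 'o' => '']);

// str_replace() with several needles and one empty replacement.
$removedViaReplace = str_replace(['l', 'o'], '', $str);

var_dump($unchanged, $removedViaArray, $removedViaReplace);
```

Both the array form of strtr() and str_replace() with an array of needles already cover the "remove several characters" use case without touching the three-argument behavior.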
Re: [PHP-DEV] foreach() for strings
On Tue, Jun 21, 2011 at 12:53 PM, Derick Rethans der...@php.net wrote: It depended on ICU there, and I would be against making a core thing in PHP 5.x depend on ICU.
It can and should be done as part of intl, actually. But that's somewhat unrelated to the proposal here, as it is about bytes, not characters :)
-- Pierre @pierrejoye | http://blog.thepimp.net | http://www.libgd.org
Re: [PHP-DEV] [VOTE] voting rfc
2011/6/21 Pierre Joye pierre@gmail.com: Added a link to the vote page. It should be clearer now.
Thank you. But why not just place the doodle plugin at the bottom of the RFC page? That would improve the chances that people read the RFC to the end before voting. What's the idea behind keeping two separate pages for the RFC and the voting?
-- Regards, Shein Alexey
Re: [PHP-DEV] [VOTE] voting rfc
On Tue, Jun 21, 2011 at 1:05 PM, Alexey Shein con...@gmail.com wrote: 2011/6/21 Pierre Joye pierre@gmail.com: Added a link to the vote page. It should be clearer now. Thank you. But why not just place the doodle plugin at the bottom of the RFC page? That would improve the chances that people read the RFC to the end before voting. What's the idea behind keeping two separate pages for the RFC and the voting?
I find it nicer and clearer on a separate page, but it could be done on the same page too... Not that it is that important; we should not change it now.
-- Pierre @pierrejoye | http://blog.thepimp.net | http://www.libgd.org
Re: [PHP-DEV] Re: foreach() for strings
On Mon, 20 Jun 2011, Anthony Ferrara wrote: text_to_array($s) == str_split($s, 1) No, because str_split always splits into 1-byte chunks. text_to_array would take the character set into account (or that's where the utility in it would be)...
No, as PHP currently does *NOT* know about character sets. If you want character sets, we need Unicode strings like we had in the PHP6 branch. Derick
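The difference being discussed can be shown with a short sketch. text_to_array() is not a real PHP function; the version below is a hypothetical userland stand-in that assumes the caller already knows the input is UTF-8, which is exactly the knowledge the engine does not have. It splits on code points using PCRE's /u modifier, so it does not handle grapheme clusters.

```php
<?php
// "naïve" written with explicit UTF-8 bytes so the example does not depend
// on the encoding of this source file.
$s = "na\xC3\xAFve";

// str_split() works on bytes: the two-byte "ï" is cut in half.
$bytes = str_split($s, 1);

// Hypothetical text_to_array(): split on UTF-8 code point boundaries.
// preg_split() returns false if $s is not valid UTF-8.
function text_to_array(string $s)
{
    return preg_split('//u', $s, -1, PREG_SPLIT_NO_EMPTY);
}

$chars = text_to_array($s);

printf("%d bytes, %d code points\n", count($bytes), count($chars));
```

The encoding is hard-wired into the helper here; a general text_to_array() would need the character set passed in or stored with the string, which is the point Derick raises.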
Re: [PHP-DEV] [RFC] 5.4 features for vote (long)
2011/6/21 Etienne Kneuss col...@php.net: Hello, On Tue, Jun 21, 2011 at 05:17, Rasmus Lerdorf syst...@php.net wrote: On 06/20/2011 08:09 PM, Felipe Pena wrote: I'm ok with this, I just think it's ugly to repeat the token name in the definition in the .y file. :P %token T_LNUMBER 'number' (T_LNUMBER) %token T_STRING 'identifier' (T_STRING) Why 'identifier' and not 'string' or 'string-literal' there? For people using PHP, a string or a string literal is "foo" or 'foo'. T_STRING represents neither "foo" nor 'foo'. 'identifier' seems to adequately describe what it encompasses. IMHO, it would be even better if the "unexpected" part displayed the actual content, i.e. function 1() = Unexpected number '1' ... or function 1() = Unexpected '1'...
It is already possible to do this; it only requires a static variable in the yytnamerr implementation.
-- Regards, Felipe Pena
Re: [PHP-DEV] foreach() for strings
Pierre Joye wrote: It depended on ICU there, and I would be against making a core thing in PHP 5.x depend on ICU. It can and should be done as part of intl, actually. But that's somewhat unrelated to the proposal here, as it is about bytes, not characters :)
I believe this may be where some of the new niggles are coming from. With browsers returning Unicode, it may be that some of the 'extra' characters are being returned as multibyte rather than as single bytes, as in the problem currently reported on the general list. How do we ensure that we are dealing with single-byte character strings nowadays?
-- Lester Caine - G8HFL - Contact - http://lsces.co.uk/wiki/?page=contact L.S.Caine Electronic Services - http://lsces.co.uk EnquirySolve - http://enquirysolve.com/ Model Engineers Digital Workshop - http://medw.co.uk// Firebird - http://www.firebirdsql.org/index.php
Re: [PHP-DEV] foreach() for strings
On Tue, Jun 21, 2011 at 1:33 PM, Lester Caine les...@lsces.co.uk wrote: I believe this may be where some of the new niggles are coming from. With browsers returning Unicode, it may be that some of the 'extra' characters are being returned as multibyte rather than as single bytes, as in the problem currently reported on the general list. How do we ensure that we are dealing with single-byte character strings nowadays?
As has been stated numerous times in this thread and others, we do not do anything with multibyte systems, Unicode, etc. mbstring and intl do, but PHP's string as of now is all about bytes: an array of bytes, if I may describe it this way. And we can't change this behavior. Cheers,
-- Pierre @pierrejoye | http://blog.thepimp.net | http://www.libgd.org
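As a rough illustration of "strings are arrays of bytes", the sketch below applies three core operations to a UTF-8 string. The variable names are illustrative only; the validity check relies on preg_match() returning false for malformed UTF-8 when the /u modifier is used.

```php
<?php
$s = "na\xC3\xAFve"; // "naïve" as UTF-8 bytes

$length    = strlen($s); // counts bytes, not characters
$thirdByte = $s[2];      // first half of the two-byte "ï"
$reversed  = strrev($s); // reverses bytes, splitting the sequence

// Is the reversed string still well-formed UTF-8?
$stillValid = preg_match('//u', $reversed) !== false;

var_dump($length, bin2hex($thirdByte), $stillValid);
```

None of these calls is wrong by the engine's definition; they simply operate one level below what a reader of the text would call a character.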
Re: [PHP-DEV] foreach() for strings
Pierre Joye wrote: As has been stated numerous times in this thread and others, we do not do anything with multibyte systems, Unicode, etc. mbstring and intl do, but PHP's string as of now is all about bytes: an array of bytes, if I may describe it this way. And we can't change this behavior.
That is exactly the point. I suppose what I am asking is how people ensure that what they are feeding into simple strings is single-byte, when cut and paste nowadays does not make a distinction?
-- Lester Caine - G8HFL - Contact - http://lsces.co.uk/wiki/?page=contact L.S.Caine Electronic Services - http://lsces.co.uk EnquirySolve - http://enquirysolve.com/ Model Engineers Digital Workshop - http://medw.co.uk// Firebird - http://www.firebirdsql.org/index.php
RE: [PHP-DEV] foreach() for strings
Pierre Joye wrote: As has been stated numerous times in this thread and others, we do not do anything with multibyte systems, Unicode, etc. mbstring and intl do, but PHP's string as of now is all about bytes: an array of bytes, if I may describe it this way. And we can't change this behavior.
This mindset is fundamentally broken. You can call it a byte array all you want, but the truth is that 99.999% of the time, when a developer is using a string they need it for characters, not for bytes, and characters are not single-byte. Even English users tend to submit Unicode-range characters at an alarming rate. If you're using a WYSIWYG editor, Chrome will submit non-breaking spaces as the actual UTF-8 encoded character, not as an HTML entity. Whether developers like it, or even know it, supporting an extended universal character set is not really optional.
PHP makes this bad enough with the whole collection of bytewise string functions, including many with no appropriate multibyte-aware replacement, but at least this can be avoided, quickly audited, and in the future can even be fixed in any number of ways with only a nominal BC impact. Hard-coding this single-byte idiocy into a language construct (foreach), though, would be an incredibly awful idea. It would create a trap for new, naive PHP developers, and create a character set problem that the language could NEVER recover from without a massive BC break.
This proposal is really about adding a feature which, whenever it is used, is almost guaranteed to be an error. It probably won't look like an error to the developer during simple testing, but it will almost certainly show up as an error in production. Is it really worth all that for a bit of syntactic sugar that the developer will have to strip out anyway to fix their bug?
If string iteration needs to be addressed in the core (and IMO it doesn't, because it can be handled at the script level, but if it does), why not use iterator classes? This gives the same functionality and prevents the language from encouraging hidden bugs.
John Crenshaw Priacta, Inc.
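A minimal userland sketch of the iterator-class idea follows. ByteIterator and TextIterator are hypothetical classes named after the ones mentioned in the thread; they are not an existing PHP API, and this TextIterator assumes UTF-8 input and iterates code points rather than grapheme clusters.

```php
<?php
// Hypothetical class: iterate a string one byte at a time.
final class ByteIterator implements IteratorAggregate
{
    private $data;

    public function __construct(string $data)
    {
        $this->data = $data;
    }

    public function getIterator(): Traversable
    {
        for ($i = 0, $n = strlen($this->data); $i < $n; $i++) {
            yield $i => $this->data[$i];
        }
    }
}

// Hypothetical class: iterate a UTF-8 string one code point at a time.
final class TextIterator implements IteratorAggregate
{
    private $chars;

    public function __construct(string $utf8)
    {
        $chars = preg_split('//u', $utf8, -1, PREG_SPLIT_NO_EMPTY);
        if ($chars === false) {
            throw new InvalidArgumentException('Input is not valid UTF-8');
        }
        $this->chars = $chars;
    }

    public function getIterator(): Traversable
    {
        return new ArrayIterator($this->chars);
    }
}

$s = "caf\xC3\xA9"; // "café" as UTF-8 bytes

$bytes = iterator_to_array(new ByteIterator($s));
$chars = iterator_to_array(new TextIterator($s));

printf("%d bytes, %d code points\n", count($bytes), count($chars));
```

The design point is that the caller has to pick a class, so the choice between bytes and text is visible at the call site instead of being implied by foreach.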
Re: [PHP-DEV] foreach() for strings
On Tue, Jun 21, 2011 at 4:38 PM, John Crenshaw johncrens...@priacta.com wrote: This mindset is fundamentally broken. You can call it a byte array all you want, but the truth is that 99.999% of the time, when a developer is using a string they need it for characters, not for bytes
Let me rephrase: for backward compatibility reasons we cannot change this behavior. Any serious text processing should be done using intl, mbstring, transliterator (PECL) or other similar solutions. Cheers,
-- Pierre @pierrejoye | http://blog.thepimp.net | http://www.libgd.org
Re: [PHP-DEV] [VOTE] voting rfc
On Jun 20, 2011, at 5:15 AM, David Soria Parra wrote: Hi Internals, we have been working on getting an RFC together on how to deal with votes on RFCs. We aim to provide a simple mechanism for votes while still maintaining freedom on how to do votes and how to work on RFCs. I want to move forward on the voting and release RFCs, so we can move forward on the 5.4 process. Therefore I call for votes on the voting RFC. The RFC can be found here: https://wiki.php.net/rfc/voting You can vote here: https://wiki.php.net/rfc/voting/vote Votes are open until Monday 27.06.2011.
Please clarify the "who can vote" aspect of this RFC, which is:
The proposal here is for two audiences to participate in the voting process:
* People with php.net SVN accounts that have contributed code to PHP
* Representatives from the PHP community, that will be chosen by those with php.net SVN accounts
  * Lead developers of PHP based projects (frameworks, cms, tools, etc.)
  * regular participant of internals discussions
Does this mean that a php.net account holder must have 1+ commits? How are lead developers determined exactly? Do they nominate themselves? Does each name require an official vote with a two-week waiting period? And what's a "regular participant of internals discussions"? One post per week/month/year? And does only the internals@lists.php.net mailing list apply? I voted against this RFC partly because the above is not clear. Regards, Philip
RE: [PHP-DEV] foreach() for strings
2011.06.21 17:38 John Crenshaw wrote: This mindset is fundamentally broken. You can call it a byte array all you want, but the truth is that 99.999% of the time, when a developer is using a string they need it for characters, not for bytes, and characters are not single-byte. Even English users tend to submit Unicode-range characters at an alarming rate. If you're using a WYSIWYG editor, Chrome will submit non-breaking spaces as the actual UTF-8 encoded character, not as an HTML entity. Whether developers like it, or even know it, supporting an extended universal character set is not really optional.
They submit it in UTF-8 only if your HTML form allows them to do that, or if they don't follow the HTML specification and try to exploit your form. Set the form input charset to iso-8859-1 and your non-breaking space will take only one byte.
-- Tomas
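The byte counts Tomas refers to can be shown with a short sketch. The byte values for U+00A0 in ISO-8859-1 and UTF-8 are standard; the variable names and the bytewise replacement are only an illustration of what happens when code written for one encoding receives the other.

```php
<?php
$latin1Nbsp = "\xA0";     // U+00A0 NO-BREAK SPACE in ISO-8859-1
$utf8Nbsp   = "\xC2\xA0"; // the same character in UTF-8

printf(
    "ISO-8859-1: %d byte(s), UTF-8: %d byte(s)\n",
    strlen($latin1Nbsp),
    strlen($utf8Nbsp)
);

// Code that assumes ISO-8859-1 input and swaps the nbsp byte for a space
// leaves a stray 0xC2 byte behind when it is handed UTF-8 instead.
$naive = str_replace("\xA0", ' ', "a{$utf8Nbsp}b");

echo bin2hex($naive), "\n";
```

Whether the form charset can be relied on is a separate question from what the bytes look like once they arrive.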
Re: [PHP-DEV] [VOTE] voting rfc
That really needs clarifying, because if I understand correctly, I should get the ability to vote (I am a userland developer actively reading the list and writing to it on some matters). So the question is: do I get the ability to vote? :-)
On 21.06.2011 17:36, Philip Olson phi...@roshambo.org wrote: Does this mean that a php.net account holder must have 1+ commits? How are lead developers determined exactly? Do they nominate themselves? Does each name require an official vote with a two-week waiting period? And what's a "regular participant of internals discussions"? One post per week/month/year? And does only the internals@lists.php.net mailing list apply? I voted against this RFC partly because the above is not clear. Regards, Philip
Re: RE: [PHP-DEV] foreach() for strings
As a userland developer, because of where I live I constantly have to work with three languages: English, Russian (Cyrillic) and Latvian (which has its own share of non-Latin characters). I end up using UTF-8 in every project, and some of them give me a headache when it comes to text parsing. mbstring covers just part of the functionality and can be turned off. I personally think something has to be done about Unicode handling in PHP after 5.4, so that we have an official method of dealing with it in the core. It could probably be done in a namespace of its own and be new functionality to which people should migrate. My 2 cents.
On 21.06.2011 17:56, Tomas Kuliavas to...@users.sourceforge.net wrote: They submit it in UTF-8 only if your HTML form allows them to do that, or if they don't follow the HTML specification and try to exploit your form. Set the form input charset to iso-8859-1 and your non-breaking space will take only one byte. -- Tomas
Re: [PHP-DEV] foreach() for strings
On 21.06.2011 17:55, Tomas Kuliavas wrote: They submit it in UTF-8 only if your HTML form allows them to do that, or if they don't follow the HTML specification and try to exploit your form. Set the form input charset to iso-8859-1 and your non-breaking space will take only one byte.
And this naive attitude is the root of most security problems! Why do you believe that every client submission comes through your form, or generally through anything you can control?
Re: [PHP-DEV] foreach() for strings
On Tue, Jun 21, 2011 at 6:14 PM, Reindl Harald h.rei...@thelounge.net wrote: On 21.06.2011 17:55, Tomas Kuliavas wrote: They submit it in UTF-8 only if your HTML form allows them to do that, or if they don't follow the HTML specification and try to exploit your form. Set the form input charset to iso-8859-1 and your non-breaking space will take only one byte. And this naive attitude is the root of most security problems! Why do you believe that every client submission comes through your form, or generally through anything you can control?
That doesn't matter here. Tomas just corrected John: his statement that Chrome will always use UTF-8 encoding for some special characters isn't true. Browsers will adhere to accept-charset: http://www.w3.org/TR/html401/interact/forms.html#adef-accept-charset Of course you can't trust user input, and you have to validate it, but this has nothing to do with this topic. Tyrael
Re: [PHP-DEV] foreach() for strings
On 21.06.2011 18:22, Ferenc Kovacs wrote: On Tue, Jun 21, 2011 at 6:14 PM, Reindl Harald h.rei...@thelounge.net wrote: On 21.06.2011 17:55, Tomas Kuliavas wrote: They submit it in utf-8 only if your html form allows them to do that or they don't follow html specification and try to exploit your form. Set form input charset to iso-8859-1 and your nbspace will take only one byte. And this naive attitude is the root of most security problems! Why do you believe that every client submission is coming over your form or generally over anything you can control? That doesn't matter here; Tomas just corrected John, that his statement that Chrome will always use utf-8 encoding for some special character isn't true. Browsers will adhere to accept-charset (http://www.w3.org/TR/html401/interact/forms.html#adef-accept-charset). Of course you can't trust user input, and you have to validate it, but this has nothing to do with this topic. It has: how do you validate input if the string functions, which you probably use for your validation, have undefined results?
Re: [PHP-DEV] foreach() for strings
On Tue, Jun 21, 2011 at 6:24 PM, Reindl Harald h.rei...@thelounge.net wrote: On 21.06.2011 18:22, Ferenc Kovacs wrote: On Tue, Jun 21, 2011 at 6:14 PM, Reindl Harald h.rei...@thelounge.net wrote: On 21.06.2011 17:55, Tomas Kuliavas wrote: They submit it in utf-8 only if your html form allows them to do that or they don't follow html specification and try to exploit your form. Set form input charset to iso-8859-1 and your nbspace will take only one byte. And this naive attitude is the root of most security problems! Why do you believe that every client submission is coming over your form or generally over anything you can control? That doesn't matter here; Tomas just corrected John, that his statement that Chrome will always use utf-8 encoding for some special character isn't true. Browsers will adhere to accept-charset (http://www.w3.org/TR/html401/interact/forms.html#adef-accept-charset). Of course you can't trust user input, and you have to validate it, but this has nothing to do with this topic. It has: how do you validate input if the string functions, which you probably use for your validation, have undefined results? What do you mean by undefined? If you use iso-8859-1 in your whole app and database, it doesn't matter from the security POV if somebody sends you crafted utf-8 data. If you mix up your encodings or you don't escape with the proper encoding, then that can hit you ( http://shiflett.org/blog/2006/jan/addslashes-versus-mysql-real-escape-string ). The multibyte support in the php core isn't undefined, just unsupported. :/ Use intl or mbstring for handling multibyte encodings. Tyrael
RE: RE: [PHP-DEV] foreach() for strings
They submit it in utf-8 only if your html form allows them to do that or they don't follow html specification and try to exploit your form. If no explicit encoding is given, all modern browsers will attempt to autodetect the encoding based on the page contents, often with unpredictable results. Most web developers really don't understand the whole encoding thing, and many aren't aware of it at all. If they aren't taking care of the encoding question in their server side code, what makes anyone believe that they are specifying the encoding in their response headers, or HTML? I can tell you for certain that if no encoding is specified, Chrome can and will decide that the data is UTF8, at least under certain conditions (because I watched it recently when working on an encoding problem in some legacy code.) Set form input charset to iso-8859-1 I can't believe I just saw someone recommend that ;) Yes, you *could* use Latin-1...for which the Euro sign, ellipsis, decorative quotes, trademark, em dash, and a number of other frequently pasted characters are still out of range. Then, when you eventually decide that latin1 isn't meeting your needs, you'll get to go through the wonderful process of trying to convert all of your legacy data to UTF8. Single byte just doesn't cut the mustard anymore, especially on the web. The world is too small. We should be trying to move PHP *away* from this, not towards it. John Crenshaw Priacta, Inc.
RE: [PHP-DEV] foreach() for strings
From: Pierre Joye [mailto:pierre@gmail.com] On Tue, Jun 21, 2011 at 4:38 PM, John Crenshaw johncrens...@priacta.com wrote: This mindset is fundamentally broken. You can call it a byte array all you want, but the truth is that 99.999% of the time, when a developer is using a string they need it for characters, not for bytes Let me rephrase: For backward compatibility reasons we cannot change this behavior. Any serious text processing should be done using intl, mbstring, transliterator (pecl) or other similar solutions. Cheers, -- Pierre Right, I totally agree. We can't fix the multibyte string issue today; I'm just saying that we *can* (and should) avoid making it much worse. John Crenshaw Priacta, Inc.
Re: [PHP-DEV] foreach() for strings
On 2011.06.21 19:24, Reindl Harald wrote: On 21.06.2011 18:22, Ferenc Kovacs wrote: On Tue, Jun 21, 2011 at 6:14 PM, Reindl Harald h.rei...@thelounge.net wrote: On 21.06.2011 17:55, Tomas Kuliavas wrote: They submit it in utf-8 only if your html form allows them to do that or they don't follow html specification and try to exploit your form. Set form input charset to iso-8859-1 and your nbspace will take only one byte. And this naive attitude is the root of most security problems! Why do you believe that every client submission is coming over your form or generally over anything you can control? That doesn't matter here; Tomas just corrected John, that his statement that Chrome will always use utf-8 encoding for some special character isn't true. Browsers will adhere to accept-charset (http://www.w3.org/TR/html401/interact/forms.html#adef-accept-charset). Of course you can't trust user input, and you have to validate it, but this has nothing to do with this topic. It has: how do you validate input if the string functions, which you probably use for your validation, have undefined results? I've never said that he should trust user input. I've only said that his valid user inputs depend on html form format. utf-8 is strict format. If you expect utf-8 and someone submits something else, you can tell that without any string function. You can verify utf-8 strings in pcre. You can convert nbspace to regular space, if you want. utf-8 does not have any byte sequence that can collide with nbspace byte sequence in utf-8. -- Tomas
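A minimal sketch of the nbspace conversion described above, assuming the input has already been checked to be valid UTF-8. The helper name nbsp_to_space() is hypothetical. In valid UTF-8, 0xC2 is always a lead byte and 0xA0 always a continuation byte, so the pair 0xC2 0xA0 (U+00A0) cannot appear inside any other character and a plain byte-level str_replace() should be enough:

```php
<?php
// Hypothetical helper, for illustration only.
function nbsp_to_space($utf8)
{
    // U+00A0 NO-BREAK SPACE is encoded in UTF-8 as the two bytes C2 A0.
    return str_replace("\xC2\xA0", ' ', $utf8);
}

// strlen() counts bytes: two for the UTF-8 form of nbspace,
// one for the ISO-8859-1 form (the single byte A0).
echo strlen("\xC2\xA0"), "\n";
echo strlen("\xA0"), "\n";

// Replaces the non-breaking space between "10" and "km" with a regular one.
echo nbsp_to_space("10\xC2\xA0km"), "\n";
```

No multibyte extension is needed for this particular replacement; that is only true because the needle is a complete, valid UTF-8 sequence.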
Re: [PHP-DEV] foreach() for strings
On 21.06.2011 19:12, Tomas Kuliavas wrote: And this naive attitude is the root of most security problems! Why do you believe that every client submission is coming over your form or generally over anything you can control? That doesn't matter here; Tomas just corrected John, that his statement that Chrome will always use utf-8 encoding for some special character isn't true. Browsers will adhere to accept-charset (http://www.w3.org/TR/html401/interact/forms.html#adef-accept-charset). Of course you can't trust user input, and you have to validate it, but this has nothing to do with this topic. It has: how do you validate input if the string functions, which you probably use for your validation, have undefined results? I've never said that he should trust user input. I've only said that his valid user inputs depend on html form format. And I told you that in the real world this is utopian. There is a world outside of forms. Show me FIVE PHP apps which are using accept-charset, not counting mine (they do), and even there I cannot be sure that all of the thousands of scripts/websites I wrote really use it everywhere. utf-8 is strict format. If you expect utf-8 and someone submits something else, you can tell that without any string function. You can verify utf-8 strings in pcre. You can convert nbspace to regular space, if you want. utf-8 does not have any byte sequence that can collide with nbspace byte sequence in utf-8. Show me a practicable way to detect if some input data contains UTF8. The mbstring functions are out of the game because there are many servers, even at really big companies, where they are not available, so the problem is simply that you cannot really write portable and well-performing code that is aware of UTF8.
Re: [PHP-DEV] foreach() for strings
On 2011.06.21 20:51, Reindl Harald wrote: utf-8 is strict format. If you expect utf-8 and someone submits something else, you can tell that without any string function. You can verify utf-8 strings in pcre. You can convert nbspace to regular space, if you want. utf-8 does not have any byte sequence that can collide with nbspace byte sequence in utf-8. Show me a practicable way to detect if some input data contains UTF8. The mbstring functions are out of the game because there are many servers, even at really big companies, where they are not available. :) I've said pcre and not mbstring. If you read a fine utf-8 manual like I did about 8 years ago, you would know how to detect 8bit inputs that are not in utf-8. utf-8 is a variable byte length character set which has very specific rules about the way bytes are arranged. You can tell the length of a symbol in bytes based on the first byte. You can tell what kind of byte values should be used for the second, third, fourth, fifth or sixth byte. If you eliminate the five valid utf-8 8bit byte sequences and still have 8bit data, it is not utf-8. -- Tomas
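One way to do the PCRE check mentioned above, as a sketch rather than a definitive implementation: with the /u modifier, preg_match() validates the subject as UTF-8 before matching and returns false when it is malformed. The function name is_valid_utf8() is made up for this example. Note that RFC 3629 restricts UTF-8 to sequences of at most four bytes, so the older five- and six-byte forms are not expected to pass this check.

```php
<?php
// Hypothetical helper name; PHP core does not ship a function called this.
function is_valid_utf8($input)
{
    // An empty pattern matches any string, so the only way this call
    // can fail is the UTF-8 validity check that /u runs on the subject.
    return preg_match('//u', $input) === 1;
}

var_dump(is_valid_utf8("caf\xC3\xA9")); // C3 A9 is the two-byte UTF-8 form of é
var_dump(is_valid_utf8("caf\xE9"));     // lone ISO-8859-1 byte E9, malformed as UTF-8
```

Where mbstring happens to be installed, mb_check_encoding($input, 'UTF-8') is an alternative; the PCRE variant only needs the pcre extension.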
Re: [PHP-DEV] foreach() for strings
On 21.06.2011 22:19, Tomas Kuliavas wrote: On 2011.06.21 20:51, Reindl Harald wrote: utf-8 is strict format. If you expect utf-8 and someone submits something else, you can tell that without any string function. You can verify utf-8 strings in pcre. You can convert nbspace to regular space, if you want. utf-8 does not have any byte sequence that can collide with nbspace byte sequence in utf-8. Show me a practicable way to detect if some input data contains UTF8. The mbstring functions are out of the game because there are many servers, even at really big companies, where they are not available. :) I've said pcre and not mbstring. If you read a fine utf-8 manual like I did about 8 years ago, you would know how to detect 8bit inputs that are not in utf-8. utf-8 is a variable byte length character set which has very specific rules about the way bytes are arranged. You can tell the length of a symbol in bytes based on the first byte. You can tell what kind of byte values should be used for the second, third, fourth, fifth or sixth byte. If you eliminate the five valid utf-8 8bit byte sequences and still have 8bit data, it is not utf-8. I do not understand a word of that, and I miss a simple str_is_utf8() (or call it whatever you like) which can do this natively and performantly on a given variable and would offer the possibility to stop a script on unexpected input without degrading performance.
Re: [PHP-DEV] [VOTE] voting rfc
We thought there was no need to over-regulate this part. It is something like mentors: if you just come in, post a couple of times or daily, but nobody can second you and you lead zero OSS projects, then the chance that you can vote will be rather low. Your option? Contribute! :-) On Tue, Jun 21, 2011 at 5:57 PM, Arvids Godjuks arvids.godj...@gmail.com wrote: That really needs clearing up, because if I understand correctly, I should get the ability to vote (userland developer actively reading the list and writing to the list on some matters). So the question: do I get the ability to vote? :-) On 21.06.2011 17:36, user Philip Olson phi...@roshambo.org wrote: -- Pierre @pierrejoye | http://blog.thepimp.net | http://www.libgd.org
Re: [PHP-DEV] Changed behaviour for strtr()
On Tue, Jun 21, 2011 at 12:55 PM, Derick Rethans der...@php.net wrote: On Mon, 20 Jun 2011, Stas Malyshev wrote: Here is the next one. I think it's quite intuitive to use strtr() to remove single characters of a string, too, instead of using many str_replace($str, $chr, ). I'd be glad to see this change also in 5.4. This is a BC break, if I understand it correctly, so I don't think it is a good idea. I agree that this is not a good thing then. Right now strtr('anything', 'anything', '') === 'anything', which means that anyone relying on this behavior is doing something strange and dumb imo, doing a function call for nothing. We could maybe say that strtr('anything', 'anything', null) maps all letters to an empty string? That should take care of the user-based inputs for BC reasons, while still allowing strtr() to be used for this "strip letter x and y" use case. Anyway I'm not gonna fight one way or the other, it's a detail, but I don't think the BC concern is as big as it's presented. Cheers -- Jordi Boggiano @seldaek :: http://seld.be/
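For reference, a small sketch of the behaviour under discussion, using only existing calls. With string arguments, strtr() translates only as many characters as the shorter of the two lists contains, so an empty replacement list is a no-op. The array form of strtr() and str_replace() can already strip characters. The sample strings are made up for illustration.

```php
<?php
// Three-argument form: an empty "to" list means nothing is translated,
// so the input comes back unchanged.
var_dump(strtr('anything', 'anything', '') === 'anything');

// Array form: mapping a character to '' removes it from the string.
echo strtr('hello world', array('l' => '', 'o' => '')), "\n";

// str_replace() with a list of needles strips the same characters.
echo str_replace(array('l', 'o'), '', 'hello world'), "\n";
```

Both of the last two calls already cover the "strip letter x and y" use case without changing what the three-argument form does.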
Re: [PHP-DEV] Changed behaviour for strtr()
Hi! On 6/21/11 5:14 PM, Jordi Boggiano wrote: Right now strtr('anything', 'anything', '') === 'anything', which means that anyone relying on this behavior is doing something strange and dumb imo, doing a function call for nothing. We could maybe say It does not matter if you approve or disapprove how people write their code - we can't break BC unless there's a VERY good reason. You never know in which situation with which combination of inputs which application may end up using this sequence of parameters and how changing it may break it. -- Stanislav Malyshev, Software Architect SugarCRM: http://www.sugarcrm.com/ (408)454-6900 ext. 227
Re: [PHP-DEV] Changed behaviour for strtr()
On Wed, Jun 22, 2011 at 2:22 AM, Stas Malyshev smalys...@sugarcrm.com wrote: Right now strtr('anything', 'anything', '') === 'anything', which means that anyone relying on this behavior is doing something strange and dumb imo, doing a function call for nothing. We could maybe say It does not matter if you approve or disapprove how people write their code - we can't break BC unless there's a VERY good reason. You never know in which situation with which combination of inputs which application may end up using this sequence of parameters and how changing it may break it. Of course it's not just a matter of taste, but the null case imo really is not likely to happen by accident, and there is no valid use case. Anyways.. Case closed I suppose. Cheers -- Jordi Boggiano @seldaek :: http://seld.be/
Re: [PHP-DEV] Changed behaviour for strtr()
Right now strtr('anything', 'anything', '') === 'anything', which means that anyone relying on this behavior is doing something strange and dumb imo, doing a function call for nothing. How is relying on by-design behavior that turns the call into a no-op (instead of wrapping the call in an empty() check or whatever) dumb? Is there some performance hit for entering strtr() that makes this true? -- S.