Re: [PHP-DEV] Removal of unicode_semantics
On Thu, May 8, 2008 at 7:33 AM, Andi Gutmans [EMAIL PROTECTED] wrote: So for now we should remove the switch. We can do this if needed. Who is we in this context? Zend? Scott is already working on the removal but I'll bet he would really appreciate help with it. -Hannes -- PHP Internals - PHP Runtime Development Mailing List To unsubscribe, visit: http://www.php.net/unsub.php
RE: [PHP-DEV] Removal of unicode_semantics
See below:

-----Original Message-----
From: Derick Rethans [mailto:[EMAIL PROTECTED]
Sent: Thursday, May 08, 2008 12:23 AM
To: Andi Gutmans
Cc: Andrei Zmievski; PHP Developers Mailing List
Subject: RE: [PHP-DEV] Removal of unicode_semantics

Scott is already working on this AFAIK. And like Andrei, I'd also be against defaulting to binary strings.

Great. Dmitry can help out if needed. He'll be reviewing it anyway. I understand you are against it but as we discussed on this list a few months ago we will have to see what reality delivers when people actually start migrating applications. It's not something we should decide at this point before we are any smarter. For now we can definitely keep it as Unicode and we'll learn how that works during the alpha/beta cycles. We do owe our users a feasible upgrade path whether it's with automated scripts or some other way. As we figure that out it'll become more apparent what makes sense.

Andi
Re: [PHP-DEV] Removal of unicode_semantics
The easiest thing would be just to default unicode_semantics to On internally and hide it from users. Don't remove all the UG(unicode) checks yet, because we can test migration/compatibility with those in place.

-Andrei

Derick Rethans wrote:

On Wed, 7 May 2008, Andi Gutmans wrote:

Yep, we said that we'd remove the switch. Then we'd see how compatibility fares and if we discover the upgrade path is too painful we'd consider making "" be a binary string and require u"" for Unicode strings. But this was TBD depending on people's experiences and our ability to deliver an easy migration path for applications. So for now we should remove the switch. We can do this if needed.

Scott is already working on this AFAIK. And like Andrei, I'd also be against defaulting to binary strings.

regards,
Derick
Re: [PHP-DEV] Removal of unicode_semantics
As far as I remember, the latest point was to remove the unicode_semantics switch and presume that its value is always On. At the same time we said that binary strings should probably be the default string type (which I don't agree with), and that we need to have a test suite to see what exactly breaks with these changes.

-Andrei

Derick Rethans wrote:

On Sun, 4 May 2008, Tomas Kuliavas wrote:

We've discussed this a few times in the past and it's time to make a final decision about its removal. I think most people have agreed that this is the way forward but no one has produced a patch. I have a student working on unicode conversion for the Google Summer of Code and this would help make it simpler.

unicode_semantics=on breaks backwards compatibility in scripts that have implemented multiple character set support in current PHP setups.

Why don't you go ahead and make a list of those exact issues then? We can then see how to fix those issues. That's much more useful than just posting to the mailing list when you don't agree with something. From what I've seen with my code base, the changes that I have to do are minimal once some (internal) functions are fixed up.

regards,
Derick
Re: [PHP-DEV] Removal of unicode_semantics
On 07.05.2008, at 18:35, Andrei Zmievski wrote:

As far as I remember, the latest point was to remove the unicode_semantics switch and presume that its value is always On. At the same time we said that binary strings should probably be the default string type (which I don't agree with), and that we need to have a test suite to see what exactly breaks with these changes.

yeah .. that is what i remember as well .. one decision done .. one more to go (what the default string type will be: unicode or binary)

regards,
Lukas
Re: [PHP-DEV] Removal of unicode_semantics
Tomas Kuliavas wrote:

If I remain silent, others will have arguments that everybody agrees on removal of unicode_semantics. I write and maintain charset decoding and encoding functions. unicode_semantics breaks every mapping table and other functions that operate with binary 8bit strings.

Just curious, do these decoding/encoding functions do something that Unicode support won't do?

In slides by Andrei Zmievski Unicode symbols are written with \u. Why are they written with \x(hex) and \(octal) in current PHP6?

\x and \(octal) inside Unicode strings are assumed to specify Unicode characters. This is one of the contention points, since a few people have said that they should specify individual bytes rather than characters, but in my opinion it's kind of dangerous since it may lead to broken/invalid Unicode strings.

---
<?php
echo "\xC3\200";
---
I am not writing U+00C3 and U+0080, I am writing U+00C0 in UTF-8.

This should work fine inside binary strings.

I can bypass it by adding one line to every script that operates with binary strings, but where are warranties that you won't dump declare() support just like you dump unicode_semantics.

It won't get dumped. Unicode_semantics is a BC/transition switch. declare() is crucial to proper script parsing.

What happens to your new Unicode aware string functions, if I lie about strings' charset to PHP interpreter?

You will get in trouble.

mb_strlen can't calculate correct $string length even when I set correct charset in mb_strlen() arguments. If above code works as I want in PHP6 unicode_semantics=on, mb_strlen($string,'utf-8') returns 2 and not 1.

I don't know what mbstring does or does not do with the unicode_semantics switch, since it's meant to be deprecated.

-Andrei
Re: [PHP-DEV] Removal of unicode_semantics
Precisely.

Stefan Walk wrote:

Lester Caine wrote:

That sounds like just the sort of edge case that Derick is suggesting needs logging for fixing up. unicode_semantics=on is just another bodge to make it happen rather than a solution. I think I understand your description, and to my eyes it looks like a unicode bug that needs addressing?

No, it's a misunderstanding of how things work that has been explained to Tomas countless times. A unicode string consists of codepoints, not of bytes. Having \xXX and \XXX insert bytes instead of codepoints does not make sense, because a) that would require a defined unicode encoding to be used, and even if that is the case b) it would allow you to insert broken data into the unicode string, so it's not a unicode string anymore, which is a no-no. If you want to do that sort of fiddling with binary details, use binary strings, not unicode strings.

Regards,
Stefan
RE: [PHP-DEV] Removal of unicode_semantics
Yep, we said that we'd remove the switch. Then we'd see how compatibility fares and if we discover the upgrade path is too painful we'd consider making "" be a binary string and require u"" for Unicode strings. But this was TBD depending on people's experiences and our ability to deliver an easy migration path for applications. So for now we should remove the switch. We can do this if needed.

Andi

-----Original Message-----
From: Andrei Zmievski [mailto:[EMAIL PROTECTED]
Sent: Wednesday, May 07, 2008 9:36 AM
To: Derick Rethans
Cc: Tomas Kuliavas; internals@lists.php.net
Subject: Re: [PHP-DEV] Removal of unicode_semantics

As far as I remember, the latest point was to remove the unicode_semantics switch and presume that its value is always On. At the same time we said that binary strings should probably be the default string type (which I don't agree with), and that we need to have a test suite to see what exactly breaks with these changes.

-Andrei

Derick Rethans wrote:

On Sun, 4 May 2008, Tomas Kuliavas wrote:

We've discussed this a few times in the past and it's time to make a final decision about its removal. I think most people have agreed that this is the way forward but no one has produced a patch. I have a student working on unicode conversion for the Google Summer of Code and this would help make it simpler.

unicode_semantics=on breaks backwards compatibility in scripts that have implemented multiple character set support in current PHP setups.

Why don't you go ahead and make a list of those exact issues then? We can then see how to fix those issues. That's much more useful than just posting to the mailing list when you don't agree with something. From what I've seen with my code base, the changes that I have to do are minimal once some (internal) functions are fixed up.

regards,
Derick
Re: [PHP-DEV] Removal of unicode_semantics
+1 for removal.

Scott MacVicar [EMAIL PROTECTED] wrote in message news:[EMAIL PROTECTED]

Hi everyone,

We've discussed this a few times in the past and it's time to make a final decision about its removal. I think most people have agreed that this is the way forward but no one has produced a patch. I have a student working on unicode conversion for the Google Summer of Code and this would help make it simpler.

If there are no serious objections I'll create a patch and get this done as soon as possible.

Scott
Re: [PHP-DEV] Removal of unicode_semantics
Tomas Kuliavas wrote:

We've discussed this a few times in the past and it's time to make a final decision about its removal. I think most people have agreed that this is the way forward but no one has produced a patch. I have a student working on unicode conversion for the Google Summer of Code and this would help make it simpler.

unicode_semantics=on breaks backwards compatibility in scripts that have implemented multiple character set support in current PHP setups.

Why don't you go ahead and make a list of those exact issues then? We can then see how to fix those issues. That's much more useful than just posting to the mailing list when you don't agree with something. From what I've seen with my code base, the changes that I have to do are minimal once some (internal) functions are fixed up.

If I remain silent, others will have arguments that everybody agrees on removal of unicode_semantics.

<snip>

I can bypass it by adding one line to every script that operates with binary strings, but where are warranties that you won't dump declare() support just like you dump unicode_semantics. What happens to your new Unicode aware string functions, if I lie about strings' charset to PHP interpreter? mb_strlen can't calculate correct $string length even when I set correct charset in mb_strlen() arguments. If above code works as I want in PHP6 unicode_semantics=on, mb_strlen($string,'utf-8') returns 2 and not 1.

That sounds like just the sort of edge case that Derick is suggesting needs logging for fixing up. unicode_semantics=on is just another bodge to make it happen rather than a solution. I think I understand your description, and to my eyes it looks like a unicode bug that needs addressing?

We have been maintaining two code bases for a long time now - PHP4 and PHP5. Now that PHP4 is finally being shelved, those of us who have had to maintain compatibility with PHP4 can move on and address the problems of PHP5/PHP6 compatibility.

So from *MY* point of view unicode_semantics=on is creating a THIRD case to have to manage? PLEASE can someone take charge and at least get PHP6 moving forward to a stable alpha so that we have something users can be happy to test against!

PHP5 = code sets
PHP6 = Unicode

--
Lester Caine - G8HFL
-----------------------------
Contact - http://home.lsces.co.uk/lsces/wiki/?page=contact
L.S.Caine Electronic Services - http://home.lsces.co.uk
EnquirySolve - http://enquirysolve.com/
Model Engineers Digital Workshop - http://medw.co.uk//
Firebird - http://www.firebirdsql.org/index.php
Re: [PHP-DEV] Removal of unicode_semantics
On 04.05.2008 20:34, Tomas Kuliavas wrote:

We've discussed this a few times in the past and it's time to make a final decision about its removal. I think most people have agreed that this is the way forward but no one has produced a patch. I have a student working on unicode conversion for the Google Summer of Code and this would help make it simpler.

unicode_semantics=on breaks backwards compatibility in scripts that have implemented multiple character set support in current PHP setups. If the setting is removed, instead of maintaining at least some bits of backwards compatibility and doing some additional work, you force massive code rewrites in scripts that depend on working charset support and more work for people who use the interpreter.

That is correct, removing The Switch does cause some backward compatibility breakage. But The Switch does NOT fix it, that's the problem: you would still have to fix your applications to work with unicode_semantics both OFF and ON, i.e. it causes _2x more_ trouble.

Every time somebody proposes removal of this setting, they claim that the majority agreed on it when there is no agreement on anything.

The majority of active developers have agreed that the switch would cause more harm than good. That's the fact.

--
Wbr,
Antony Dovgal
Re: [PHP-DEV] Removal of unicode_semantics
My biggest concern is the 2 code bases that need to be maintained by the PHP developers, you need to have two branches for handling unicode and native strings. To sum it up, unicode_semantics is in the exact same vein as ze1_compatibility and it was a complete failure.

Totally agree!

Before any developers decide they need to port things to PHP 6 we need to just make it Unicode only.

I have some internal applications that I am happy to try porting to PHP 6 to see the outcome and list any issues. I was waiting for this switch to be removed first though... If I have time I might try and do it before then, but I'm pretty snowed under currently.

Regards
Marco
Re: [PHP-DEV] Removal of unicode_semantics
Lester Caine wrote:

That sounds like just the sort of edge case that Derick is suggesting needs logging for fixing up. unicode_semantics=on is just another bodge to make it happen rather than a solution. I think I understand your description, and to my eyes it looks like a unicode bug that needs addressing?

No, it's a misunderstanding of how things work that has been explained to Tomas countless times. A unicode string consists of codepoints, not of bytes. Having \xXX and \XXX insert bytes instead of codepoints does not make sense, because a) that would require a defined unicode encoding to be used, and even if that is the case b) it would allow you to insert broken data into the unicode string, so it's not a unicode string anymore, which is a no-no. If you want to do that sort of fiddling with binary details, use binary strings, not unicode strings.

Regards,
Stefan
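The bytes-versus-codepoints distinction Stefan describes can be sketched in present-day PHP. This is only an illustration, not PHP 6 behaviour: PHP 6 never shipped, and in current PHP every string is a byte string. The sketch assumes PHP 7.2 or later with the mbstring extension loaded; the variable names are made up for the example. Here the \x and octal escapes insert raw bytes, while the \u{...} escape (PHP 7.0 and later) names a codepoint and stores its UTF-8 encoding.

```php
<?php
// Two raw bytes, 0xC3 0x80 (\200 is octal for 0x80), as in Tomas's example.
$bytes = "\xC3\200";

// Two codepoints, U+00C3 and U+0080, each stored as its UTF-8 encoding.
$codepoints = "\u{C3}\u{80}";

// Read as UTF-8, the two raw bytes form the single character U+00C0.
$byteLength = strlen($bytes);              // counts bytes
$charLength = mb_strlen($bytes, 'UTF-8');  // counts codepoints
$firstCode  = mb_ord($bytes, 'UTF-8');     // codepoint of the first character

// The codepoint version occupies four bytes and holds two characters.
$codepointHex    = bin2hex($codepoints);
$codepointLength = mb_strlen($codepoints, 'UTF-8');

printf(
    "%d %d U+%04X %s %d\n",
    $byteLength,
    $charLength,
    $firstCode,
    $codepointHex,
    $codepointLength
);
// prints: 2 1 U+00C0 c383c280 2
```

The same four characters of source text (\xC3 followed by \200) therefore mean "one character" under the byte reading and "two characters" under the codepoint reading, which is the point of contention in this thread.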
Re: [PHP-DEV] Removal of unicode_semantics
Lester Caine wrote:

That sounds like just the sort of edge case that Derick is suggesting needs logging for fixing up. unicode_semantics=on is just another bodge to make it happen rather than a solution. I think I understand your description, and to my eyes it looks like a unicode bug that needs addressing?

No, it's a misunderstanding of how things work that has been explained to Tomas countless times. A unicode string consists of codepoints, not of bytes. Having \xXX and \XXX insert bytes instead of codepoints does not make sense, because a) that would require a defined unicode encoding to be used, and even if that is the case b) it would allow you to insert broken data into the unicode string, so it's not a unicode string anymore, which is a no-no. If you want to do that sort of fiddling with binary details, use binary strings, not unicode strings.

I agree that it is not a bug, because I declare an invalid encoding in scripts in order to make sure that binary and unicode bytes are equal.

You haven't explained to me how things work. All your explanations ask me to use code compatible only with PHP 5.2.1+, drop code that worked fine in older PHP versions and take away control of charset conversions.

I want backwards compatibility with PHP 5.2.0 and PHP4. I want to be able to control charset conversions. Where are warranties that charset conversions will work better in PHP6? In current setups it is safer to do charset conversions internally instead of relying on PHP to do things. And I can't drop that code entirely because the Unicode implementation in PHP 5.2.1 is a dummy. It is there only to avoid E_PARSE errors in PHP6 compatible code.

--
Tomas
Re: [PHP-DEV] Removal of unicode_semantics
On 05.05.2008 12:16, Tomas Kuliavas wrote:

PHP4, PHP5 and PHP6 unicode_semantics = off work same way.

No, they do not work in the same way. I.e. we were trying to make PHP5 work in the same way PHP4 did as much as we could, but that's not always possible. Same for PHP6 - there will be some differences anyway, that's the reality.

--
Wbr,
Antony Dovgal
Re: [PHP-DEV] Removal of unicode_semantics
On Mon, 5 May 2008, Lester Caine wrote:

So from *MY* point of view unicode_semantics=on is creating a THIRD case to have to manage? PLEASE can someone take charge and at least get PHP6 moving forward to a stable alpha so that we have something users can be happy to test against!

I think the reason why people are reluctant to take charge here is just because of this setting.

regards,
Derick

--
Derick Rethans
http://derickrethans.nl | http://ezcomponents.org | http://xdebug.org
Re: [PHP-DEV] Removal of unicode_semantics
PHP4, PHP5 and PHP6 unicode_semantics = off work same way.

No, they do not work in the same way. I.e. we were trying to make PHP5 work in the same way PHP4 did as much as we could, but that's not always possible.

In my case they do.

--
Tomas
Re: [PHP-DEV] Removal of unicode_semantics
On 05.05.2008 12:44, Tomas Kuliavas wrote:

PHP4, PHP5 and PHP6 unicode_semantics = off work same way.

No, they do not work in the same way. I.e. we were trying to make PHP5 work in the same way PHP4 did as much as we could, but that's not always possible.

In my case they do.

This means your case is very simple and you have nothing to worry about.

--
Wbr,
Antony Dovgal
Re: [PHP-DEV] Removal of unicode_semantics
Derick Rethans wrote:

On Mon, 5 May 2008, Lester Caine wrote:

So from *MY* point of view unicode_semantics=on is creating a THIRD case to have to manage? PLEASE can someone take charge and at least get PHP6 moving forward to a stable alpha so that we have something users can be happy to test against!

I think the reason why people are reluctant to take charge here is just because of this setting.

And as a result nothing is happening :(

Do we need to set up some formal vote on this quite basic feature which was - I thought - the whole basis that PHP6 was being built on? Or do we have to wait another 5 years for PHP6 :(

Working with Unicode does require a different mindset, and THEN overloading it by requiring complete compatibility with a non-unicode model is adding a level of complexity that has resulted in the current stalemate? I was ready to run with Unicode/PHP6 two years ago and run all the database data Unicode as well, but at present things seem to be in limbo all around?

--
Lester Caine - G8HFL
-----------------------------
Contact - http://home.lsces.co.uk/lsces/wiki/?page=contact
L.S.Caine Electronic Services - http://home.lsces.co.uk
EnquirySolve - http://enquirysolve.com/
Model Engineers Digital Workshop - http://medw.co.uk//
Firebird - http://www.firebirdsql.org/index.php
Re: [PHP-DEV] Removal of unicode_semantics
On 05.05.2008 at 09:51, Antony Dovgal wrote:

On 04.05.2008 20:34, Tomas Kuliavas wrote:

We've discussed this a few times in the past and it's time to make a final decision about its removal. I think most people have agreed that this is the way forward but no one has produced a patch. I have a student working on unicode conversion for the Google Summer of Code and this would help make it simpler.

unicode_semantics=on breaks backwards compatibility in scripts that have implemented multiple character set support in current PHP setups. If the setting is removed, instead of maintaining at least some bits of backwards compatibility and doing some additional work, you force massive code rewrites in scripts that depend on working charset support and more work for people who use the interpreter.

That is correct, removing The Switch does cause some backward compatibility breakage. But The Switch does NOT fix it, that's the problem: you would still have to fix your applications to work with unicode_semantics both OFF and ON, i.e. it causes _2x more_ trouble.

Every time somebody proposes removal of this setting, they claim that the majority agreed on it when there is no agreement on anything.

The majority of active developers have agreed that the switch would cause more harm than good. That's the fact.

And that's the word. +1000. Let's get rid of it and move on.

David
Re: [PHP-DEV] Removal of unicode_semantics
Just use Unicode and don't even think about backward compatibility, because those who need it are most probably still on PHP4 and MySQL 3.x.

Most normal developers have been on utf-8 for years now and wouldn't even notice it.

So +1 for pure Unicode. No switches. Lame hosting companies will 100% mess this switch up and ruin everything again, like it was with PHP5. Make them pay for PHP5! ;) :D
Re: [PHP-DEV] Removal of unicode_semantics
Arvids Godjuks wrote:

Most normal developers have been on utf-8 for years now and wouldn't even notice it.

Sorry to destroy your pipe dream but that's just not true.

- Chris
Re: [PHP-DEV] Removal of unicode_semantics
Well, at least in my country I haven't seen any normal programmer not using unicode :)

2008/5/5 Christian Schneider [EMAIL PROTECTED]:

Arvids Godjuks wrote:

Most normal developers have been on utf-8 for years now and wouldn't even notice it.

Sorry to destroy your pipe dream but that's just not true.

- Chris
Re: [PHP-DEV] Removal of unicode_semantics
Arvids Godjuks wrote:

Well, at least in my country I haven't seen any normal programmer not using unicode :)

<meta-posting>
I guess that was meant to be an ironic comment but I think we should improve the signal-to-noise ratio on internals again.
</meta-posting>

- Chris
[PHP-DEV] Removal of unicode_semantics
Hi everyone,

We've discussed this a few times in the past and it's time to make a final decision about its removal. I think most people have agreed that this is the way forward but no one has produced a patch. I have a student working on unicode conversion for the Google Summer of Code and this would help make it simpler.

If there are no serious objections I'll create a patch and get this done as soon as possible.

Scott
Re: [PHP-DEV] Removal of unicode_semantics
We've discussed this a few times in the past and it's time to make a final decision about its removal. I think most people have agreed that this is the way forward but no one has produced a patch. I have a student working on unicode conversion for the Google Summer of Code and this would help make it simpler.

unicode_semantics=on breaks backwards compatibility in scripts that have implemented multiple character set support in current PHP setups. If the setting is removed, instead of maintaining at least some bits of backwards compatibility and doing some additional work, you force massive code rewrites in scripts that depend on working charset support and more work for people who use the interpreter.

Every time somebody proposes removal of this setting, they claim that the majority agreed on it when there is no agreement on anything. People only defended their own positions and we had another flame about unicode_semantics.

--
Tomas
Re: [PHP-DEV] Removal of unicode_semantics
On Sun, 4 May 2008, Tomas Kuliavas wrote:

We've discussed this a few times in the past and it's time to make a final decision about its removal. I think most people have agreed that this is the way forward but no one has produced a patch. I have a student working on unicode conversion for the Google Summer of Code and this would help make it simpler.

unicode_semantics=on breaks backwards compatibility in scripts that have implemented multiple character set support in current PHP setups.

Why don't you go ahead and make a list of those exact issues then? We can then see how to fix those issues. That's much more useful than just posting to the mailing list when you don't agree with something. From what I've seen with my code base, the changes that I have to do are minimal once some (internal) functions are fixed up.

regards,
Derick

--
Derick Rethans
http://derickrethans.nl | http://ezcomponents.org | http://xdebug.org
Re: [PHP-DEV] Removal of unicode_semantics
Tomas Kuliavas wrote:

We've discussed this a few times in the past and it's time to make a final decision about its removal. I think most people have agreed that this is the way forward but no one has produced a patch. I have a student working on unicode conversion for the Google Summer of Code and this would help make it simpler.

unicode_semantics=on breaks backwards compatibility in scripts that have implemented multiple character set support in current PHP setups. If the setting is removed, instead of maintaining at least some bits of backwards compatibility and doing some additional work, you force massive code rewrites in scripts that depend on working charset support and more work for people who use the interpreter.

Every time somebody proposes removal of this setting, they claim that the majority agreed on it when there is no agreement on anything. People only defended their own positions and we had another flame about unicode_semantics.

There has been agreement by the people that actually contribute towards the development of PHP.

It certainly doesn't give backwards compatibility: you are able to turn it off in php.ini, and it's going to mean that developers will need to maintain two versions, one for it off and the other for on.

My biggest concern is the 2 code bases that need to be maintained by the PHP developers, you need to have two branches for handling unicode and native strings. To sum it up, unicode_semantics is in the exact same vein as ze1_compatibility and it was a complete failure.

Before any developers decide they need to port things to PHP 6 we need to just make it Unicode only.

Scott
Re: [PHP-DEV] Removal of unicode_semantics
On Sun, May 4, 2008 at 8:34 PM, Tomas Kuliavas [EMAIL PROTECTED] wrote:

We've discussed this a few times in the past and it's time to make a final decision about its removal. I think most people have agreed that this is the way forward but no one has produced a patch. I have a student working on unicode conversion for the Google Summer of Code and this would help make it simpler.

unicode_semantics=on breaks backwards compatibility in scripts that have implemented multiple character set support in current PHP setups. If the setting is removed, instead of maintaining at least some bits of backwards compatibility and doing some additional work, you force massive code rewrites in scripts that depend on working charset support and more work for people who use the interpreter.

Every time somebody proposes removal of this setting, they claim that the majority agreed on it when there is no agreement on anything. People only defended their own positions and we had another flame about unicode_semantics.

It's the lesser of two evils. If the switch stays there, every future author of libraries/frameworks will have to maintain 2 separate code-bases (one for unicode_semantics=off, the other for unicode_semantics=on).

On the other hand, 1 year from now it would be safe to require 5.2.1 as the minimal supported version of php, which will allow you to mark all the strings as binary, which will lead to easier migration to php-6.

--
Alexey Zakhlestin
http://blog.milkfarmsoft.com/
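A small sketch of what "mark all the strings as binary" could look like, for readers unfamiliar with it. It relies on the `b` prefix for string literals, which PHP accepts from 5.2.1 onwards and otherwise ignores; the idea in this thread was that PHP 6 would treat such literals as binary strings. The mapping table and its contents are invented for the example and are not taken from anyone's code in this thread.

```php
<?php
// The b prefix parses on PHP 5.2.1 and later and has no effect there.
// Under the PHP 6 plans discussed here it would have marked the literal
// as a binary (byte) string rather than a Unicode string.
$mapping = array(
    b"\xC4\x85" => 261, // bytes C4 85 are the UTF-8 form of U+0105
    b"\xC3\x80" => 192, // bytes C3 80 are the UTF-8 form of U+00C0
);

$key = b"\xC4\x85";

// In released PHP versions the prefixed and unprefixed literals are identical.
$sameAsUnprefixed = ($key === "\xC4\x85");
$keyLength = strlen($key);
$codepoint = $mapping[$key];
```

Because the prefix is a no-op on released versions, a byte-oriented mapping table written this way behaves the same on 5.2.1 and later, which seems to be the migration path Alexey has in mind.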
Re: [PHP-DEV] Removal of unicode_semantics
Tomas Kuliavas wrote:

We've discussed this a few times in the past and it's time to make a final decision about its removal. I think most people have agreed that this is the way forward but no one has produced a patch. I have a student working on unicode conversion for the Google Summer of Code and this would help make it simpler.

unicode_semantics=on breaks backwards compatibility in scripts that have implemented multiple character set support in current PHP setups. If the setting is removed, instead of maintaining at least some bits of backwards compatibility and doing some additional work, you force massive code rewrites in scripts that depend on working charset support and more work for people who use the interpreter.

Every time somebody proposes removal of this setting, they claim that the majority agreed on it when there is no agreement on anything. People only defended their own positions and we had another flame about unicode_semantics.

And leaving unicode_semantics in will make it so web application developers like myself, who distribute their applications to be installed on people's own servers, have to write two different versions of their software to support the switch being on or off because of the major differences in the language based on an ini setting. Not only is there twice the code in PHP's codebase, there's twice the code in the codebases for people like me.

But, we've been through this discussion before. I've already stated my opinions.

+1 to removing this.

--
Jeremy Privett
C.E.O. C.S.A.
Omega Vortex Corporation
http://www.omegavortex.net

Please note: This message has been sent with information that could be confidential and meant only for the intended recipient. If you are not the intended recipient, please delete all copies and inform us of the error as soon as possible. Thank you for your cooperation.
Re: [PHP-DEV] Removal of unicode_semantics
Hey Scott

As most of the others have already posted, from the php developers' point of view it would be stupid to maintain two versions of the same function unless you wrap it all into a function that does it by itself.

And yes, zend.ze1_compatibility_mode was a major failure.

+1 for removal

Kalle

----- Original Message -----
From: Scott MacVicar [EMAIL PROTECTED]
To: PHP Developers Mailing List internals@lists.php.net
Sent: Sunday, May 04, 2008 6:12 PM
Subject: [PHP-DEV] Removal of unicode_semantics

Hi everyone,

We've discussed this a few times in the past and it's time to make a final decision about its removal. I think most people have agreed that this is the way forward but no one has produced a patch. I have a student working on unicode conversion for the Google Summer of Code and this would help make it simpler.

If there are no serious objections I'll create a patch and get this done as soon as possible.

Scott
Re: [PHP-DEV] Removal of unicode_semantics
We've discussed this a few times in the past and it's time to make a final decision about its removal. I think most people have agreed that this is the way forward but no one has produced a patch. I have a student working on unicode conversion for the Google Summer of Code and this would help make it simpler.

unicode_semantics=on breaks backwards compatibility in scripts that have implemented multiple character set support in current PHP setups.

Why don't you go ahead and make a list of those exact issues then? We can then see how to fix those issues. That's much more useful than just posting to the mailing list when you don't agree with something. From what I've seen with my code base, the changes that I have to do are minimal once some (internal) functions are fixed up.

If I remain silent, others will have arguments that everybody agrees on removal of unicode_semantics.

I write and maintain charset decoding and encoding functions. unicode_semantics breaks every mapping table and other functions that operate with binary 8bit strings.

In slides by Andrei Zmievski Unicode symbols are written with \u. Why are they written with \x(hex) and \(octal) in current PHP6?

---
<?php
echo "\xC3\200";
---
I am not writing U+00C3 and U+0080, I am writing U+00C0 in UTF-8.

---
<?php
$string = "ą";
var_dump(preg_replace("/([\300-\337])([\200-\277])/e",
    "'&#'.((ord('\\1')-192)*64+(ord('\\2')-128)).';'",
    $string));
for ($i=0;$i<strlen($string);$i++) {
    $char = ord($string[$i]);
    echo sprintf("=%02X",$char);
}
---
string(6) "&#261;" and '=C4=85' expected, if ą is written in UTF-8.

I can bypass it by adding one line to every script that operates with binary strings, but where are warranties that you won't dump declare() support just like you dump unicode_semantics. What happens to your new Unicode aware string functions, if I lie about strings' charset to PHP interpreter? mb_strlen can't calculate correct $string length even when I set correct charset in mb_strlen() arguments. If above code works as I want in PHP6 unicode_semantics=on, mb_strlen($string,'utf-8') returns 2 and not 1.

--
Tomas
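For anyone trying Tomas's second snippet on a current PHP: the /e modifier it relies on was removed in PHP 7.0, so it will not run as written. The following is a sketch of the same byte-level logic using preg_replace_callback; it is not Tomas's code, and both function names are hypothetical. It assumes the input is a byte string holding UTF-8, which is what every string is in released PHP versions.

```php
<?php
// Hypothetical helper: turn each two-byte UTF-8 sequence into a decimal
// HTML entity, using the same arithmetic as the snippet above.
function utf8_two_byte_to_entities(string $bytes): string
{
    return preg_replace_callback(
        // No /u flag, so PCRE matches individual bytes:
        // a lead byte C0..DF followed by a continuation byte 80..BF.
        '/([\xC0-\xDF])([\x80-\xBF])/',
        static function (array $m): string {
            $codepoint = (ord($m[1]) - 192) * 64 + (ord($m[2]) - 128);
            return '&#' . $codepoint . ';';
        },
        $bytes
    );
}

// Hypothetical helper: dump every byte as =XX, like the loop above.
function dump_bytes_as_hex(string $bytes): string
{
    $out = '';
    for ($i = 0; $i < strlen($bytes); $i++) {
        $out .= sprintf('=%02X', ord($bytes[$i]));
    }
    return $out;
}

$string = "\xC4\x85"; // the UTF-8 bytes of ą (U+0105)

var_dump(utf8_two_byte_to_entities($string)); // string(6) "&#261;"
echo dump_bytes_as_hex($string), "\n";        // =C4=85
```

Writing the input as explicit byte escapes rather than a literal ą keeps the result independent of the encoding the source file happens to be saved in.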