Re: [PHP-DEV] Removal of unicode_semantics

2008-05-08 Thread Hannes Magnusson
On Thu, May 8, 2008 at 7:33 AM, Andi Gutmans [EMAIL PROTECTED] wrote:
  So for now we should remove the switch. We can do this if needed.

Who is we in this context? Zend?
Scott is already working on the removal but I'll bet he would really
appreciate help with it.

-Hannes

-- 
PHP Internals - PHP Runtime Development Mailing List
To unsubscribe, visit: http://www.php.net/unsub.php



RE: [PHP-DEV] Removal of unicode_semantics

2008-05-08 Thread Andi Gutmans
See below:

 -Original Message-
 From: Derick Rethans [mailto:[EMAIL PROTECTED]
 Sent: Thursday, May 08, 2008 12:23 AM
 To: Andi Gutmans
 Cc: Andrei Zmievski; PHP Developers Mailing List
 Subject: RE: [PHP-DEV] Removal of unicode_semantics
 
 
 Scott is already working on this AFAIK. And like Andrei, I'd also be
 against defaulting to binary strings.

Great. Dmitry can help out if needed. He'll be reviewing it anyway.

I understand you are against it, but as we discussed on this list a few months 
ago, we will have to see what reality delivers when people actually start 
migrating applications. It's not something we should decide at this point, 
before we are any smarter. For now we can definitely keep "" as Unicode and 
we'll learn how that works during the alpha/beta cycles.
We do owe our users a feasible upgrade path whether it's with automated scripts 
or some other way. As we figure that out it'll become more apparent what makes 
sense. 

Andi



Re: [PHP-DEV] Removal of unicode_semantics

2008-05-08 Thread Andrei Zmievski
The easiest thing would be just to default unicode_semantics to On 
internally and hide it from users. Don't remove all the UG(unicode) 
checks yet, because we can test migration/compatibility with those in place.


-Andrei

Derick Rethans wrote:

On Wed, 7 May 2008, Andi Gutmans wrote:

Yep, we said that we'd remove the switch. Then we'd see how 
compatibility fares, and if we discover the upgrade path is too painful 
we'd consider making "" be binary strings and require u"" for Unicode 
strings. But this was TBD depending on people's experiences and our 
ability to deliver an easy migration path for applications. So for now 
we should remove the switch. We can do this if needed.


Scott is already working on this AFAIK. And like Andrei, I'd also be 
against defaulting to binary strings.


regards,
Derick






Re: [PHP-DEV] Removal of unicode_semantics

2008-05-07 Thread Andrei Zmievski
As far as I remember, the latest point was to remove the 
unicode_semantics switch and presume that its value is always On. At the 
same time we said that binary strings should probably be the default 
string type (which I don't agree with), and that we need to have a test 
suite to see what exactly breaks with these changes.


-Andrei

Derick Rethans wrote:

On Sun, 4 May 2008, Tomas Kuliavas wrote:


We've discussed this a few times in the past and it's time to make a
final decision about its removal.

I think most people have agreed that this is the way forward but no
one has produced a patch. I have a student working on unicode
conversion for the Google Summer of Code and this would help make it
simpler.

unicode_semantics=on breaks backwards compatibility in scripts that have
implemented multiple character set support in current PHP setups.


Why don't you go ahead and make a list of those exact issues then? We 
can then see how to fix those issues. That's much more useful than just 
posting to the mailing list when you don't agree with something. From 
what I've seen with my code base, the changes that I have to do are 
minimal once some (internal) functions are fixed up.


regards,
Derick






Re: [PHP-DEV] Removal of unicode_semantics

2008-05-07 Thread Lukas Kahwe Smith


On 07.05.2008, at 18:35, Andrei Zmievski wrote:

As far as I remember, the latest point was to remove the  
unicode_semantics switch and presume that its value is always On. At  
the same time we said that binary strings should probably be the  
default string type (which I don't agree with), and that we need to  
have a test suite to see what exactly breaks with these changes.


yeah... that is what I remember as well.
one decision done, one more to go (whether the default string type will  
be unicode or binary)


regards,
Lukas




Re: [PHP-DEV] Removal of unicode_semantics

2008-05-07 Thread Andrei Zmievski

Tomas Kuliavas wrote:

If I remain silent, others will argue that everybody agrees on the
removal of unicode_semantics.

I write and maintain charset decoding and encoding functions.
unicode_semantics breaks every mapping table and other functions that
operate on binary 8-bit strings.


Just curious, do these decoding/encoding functions do something that 
Unicode support won't do?
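Tomas's concern about mapping tables can be illustrated with a minimal sketch of byte-oriented charset code. The helper latin1_to_utf8() below is hypothetical, not taken from his code: it converts ISO-8859-1 to UTF-8 by treating each byte as a codepoint (the ISO-8859-1 mapping table is the identity), and it relies on strlen(), ord() and $str[$i] addressing bytes, as they do in PHP 4 and 5. According to the thread, under unicode_semantics=on literals become Unicode strings, which is why such code would have to switch to binary strings.

```php
<?php
// Hypothetical helper (not from the thread): decode ISO-8859-1 bytes to UTF-8.
// In ISO-8859-1 every byte value equals its Unicode codepoint, so no lookup
// table is needed; other 8-bit charsets would map each byte via a 256-entry array.
function latin1_to_utf8($bytes)
{
    $out = '';
    $len = strlen($bytes);              // byte count in PHP 4/5
    for ($i = 0; $i < $len; $i++) {
        $cp = ord($bytes[$i]);          // one byte -> one codepoint
        if ($cp < 0x80) {
            $out .= chr($cp);           // U+0000..U+007F: one UTF-8 byte
        } else {
            // U+0080..U+00FF: two UTF-8 bytes, 110xxxxx 10xxxxxx
            $out .= chr(0xC0 | ($cp >> 6)) . chr(0x80 | ($cp & 0x3F));
        }
    }
    return $out;
}

echo bin2hex(latin1_to_utf8("\xC0")), "\n"; // prints c380 (U+00C0 in UTF-8)
```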



In Andrei Zmievski's slides, Unicode symbols are written with \u. Why are
they written with \x(hex) and \(octal) in current PHP6?


\x and \(octal) inside Unicode strings are assumed to specify Unicode 
characters. This is one of the contention points, since a few people 
have said that they should specify individual bytes rather than 
characters, but in my opinion it's kind of dangerous since it may lead 
to broken/invalid Unicode strings.



---
<?php
echo "\xC3\200";
---
I am not writing U+00C3 and U+0080; I am writing U+00C0 in UTF-8.


This should work fine inside binary strings.
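For comparison, a minimal sketch of Tomas's snippet as it behaves in a byte-string PHP (5.x or later; the mbstring extension is assumed to be loaded). Here both escapes produce single bytes, so the string holds C3 80, the UTF-8 encoding of U+00C0: strlen() counts two bytes and mb_strlen() with an explicit 'UTF-8' charset counts one character. According to Andrei, a PHP 6 Unicode string literal would instead read the same escapes as the two codepoints U+00C3 and U+0080, which is where Tomas's count of 2 comes from.

```php
<?php
// In a byte string "\xC3" (hex) and "\200" (octal, = 0x80) are single bytes.
$string = "\xC3\200";

echo bin2hex($string), "\n";            // prints c380
echo strlen($string), "\n";             // prints 2: strlen() counts bytes
echo mb_strlen($string, 'UTF-8'), "\n"; // prints 1: C3 80 decodes to U+00C0
```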


I can bypass it by adding one line to every script that operates on
binary strings, but what guarantee is there that you won't dump declare()
support just like you are dumping unicode_semantics?


It won't get dumped. unicode_semantics is a BC/transition switch. 
declare() is crucial to proper script parsing.



What happens to your new
Unicode-aware string functions if I lie about the strings' charset to the PHP
interpreter?


You will get in trouble.


mb_strlen can't calculate the correct $string length even when I
pass the correct charset in the mb_strlen() arguments. If the above code works as I
want in PHP6 with unicode_semantics=on, mb_strlen($string,'utf-8') returns 2
and not 1.


I don't know what mbstring does or does not do with the unicode_semantics 
switch, since it's meant to be deprecated.


-Andrei




Re: [PHP-DEV] Removal of unicode_semantics

2008-05-07 Thread Andrei Zmievski

Precisely.

Stefan Walk wrote:

Lester Caine wrote:
That sounds like just the sort of edge case that Derick is suggesting 
needs logging for fixing up. unicode_semantics=on is just another 
bodge to make it happen rather than a solution. I think I 
understand your description, and to my eyes it looks like a unicode 
bug that needs addressing?


No, it's a misunderstanding of how things work that has been explained 
to Tomas countless times. A unicode string consists of codepoints, not 
of bytes. Having \xXX and \XXX insert bytes instead of codepoints does 
not make sense, because a) that would require a defined Unicode 
encoding to be used, and even if that were the case, b) it would allow you to 
insert broken data into the unicode string, so it's not a unicode string 
anymore, which is a no-no. If you want to do that sort of fiddling with 
binary details, use binary strings, not unicode strings.


Regards,
Stefan






RE: [PHP-DEV] Removal of unicode_semantics

2008-05-07 Thread Andi Gutmans
Yep, we said that we'd remove the switch. Then we'd see how compatibility fares, 
and if we discover the upgrade path is too painful we'd consider making "" be 
binary strings and require u"" for Unicode strings. But this was TBD depending 
on people's experiences and our ability to deliver an easy migration path for 
applications.
So for now we should remove the switch. We can do this if needed.

Andi

 -Original Message-
 From: Andrei Zmievski [mailto:[EMAIL PROTECTED]
 Sent: Wednesday, May 07, 2008 9:36 AM
 To: Derick Rethans
 Cc: Tomas Kuliavas; internals@lists.php.net
 Subject: Re: [PHP-DEV] Removal of unicode_semantics
 
 As far as I remember, the latest point was to remove the
 unicode_semantics switch and presume that its value is always On. At
 the same time we said that binary strings should probably be the default
 string type (which I don't agree with), and that we need to have a test
 suite to see what exactly breaks with these changes.
 
 -Andrei
 
 Derick Rethans wrote:
  On Sun, 4 May 2008, Tomas Kuliavas wrote:
 
  We've discussed this a few times in the past and it's time to make a
  final decision about its removal.
 
  I think most people have agreed that this is the way forward but no
  one has produced a patch. I have a student working on unicode
  conversion for the Google Summer of Code and this would help make it
  simpler.
  unicode_semantics=on breaks backwards compatibility in scripts that have
  implemented multiple character set support in current PHP setups.
 
  Why don't you go ahead and make a list of those exact issues then? We
  can then see how to fix those issues. That's much more useful than just
  posting to the mailing list when you don't agree with something. From
  what I've seen with my code base, the changes that I have to do are
  minimal once some (internal) functions are fixed up.
 
  regards,
  Derick
 
 



Re: [PHP-DEV] Removal of unicode_semantics

2008-05-06 Thread Nikolay Ananiev
+1 for removal.

Scott MacVicar [EMAIL PROTECTED] wrote in message 
news:[EMAIL PROTECTED]
 Hi everyone,

 We've discussed this a few times in the past and it's time to make a 
 final decision about its removal.

 I think most people have agreed that this is the way forward but no one 
 has produced a patch. I have a student working on unicode conversion for 
 the Google Summer of Code and this would help make it simpler.

 If there are no serious objections I'll create a patch and get this done 
 as soon as possible

 Scott


 







Re: [PHP-DEV] Removal of unicode_semantics

2008-05-05 Thread Lester Caine

Tomas Kuliavas wrote:

We've discussed this a few times in the past and it's time to make a
final decision about its removal.

I think most people have agreed that this is the way forward but no
one has produced a patch. I have a student working on unicode
conversion for the Google Summer of Code and this would help make it
simpler.

unicode_semantics=on breaks backwards compatibility in scripts that have
implemented multiple character set support in current PHP setups.

Why don't you go ahead and make a list of those exact issues then? We
can then see how to fix those issues. That's much more useful than just
posting to the mailing list when you don't agree with something. From
what I've seen with my code base, the changes that I have to do are
minimal once some (internal) functions are fixed up.


If I remain silent, others will argue that everybody agrees on the
removal of unicode_semantics.

<snip>


I can bypass it by adding one line to every script that operates on
binary strings, but what guarantee is there that you won't dump declare()
support just like you are dumping unicode_semantics? What happens to your new
Unicode-aware string functions if I lie about the strings' charset to the PHP
interpreter? mb_strlen can't calculate the correct $string length even when I
pass the correct charset in the mb_strlen() arguments. If the above code works as I
want in PHP6 with unicode_semantics=on, mb_strlen($string,'utf-8') returns 2
and not 1.


That sounds like just the sort of edge case that Derick is suggesting needs 
logging for fixing up. unicode_semantics=on is just another bodge to make 
it happen rather than a solution. I think I understand your description, and 
to my eyes it looks like a unicode bug that needs addressing?


We have been maintaining two code bases for a long time now - PHP4 and PHP5. 
Now that PHP4 is finally being shelved, those of us who have had to maintain 
compatibility with PHP4 can now move on and address the problems of PHP5/PHP6 
compatibility. So from *MY* point of view unicode_semantics=on is creating a 
THIRD case to have to manage? PLEASE can someone take charge and at least get 
PHP6 moving forward to a stable alpha so that we have something users can be 
happy to test against!


PHP5 = code sets
PHP6 = Unicode

--
Lester Caine - G8HFL
-
Contact - http://home.lsces.co.uk/lsces/wiki/?page=contact
L.S.Caine Electronic Services - http://home.lsces.co.uk
EnquirySolve - http://enquirysolve.com/
Model Engineers Digital Workshop - http://medw.co.uk//
Firebird - http://www.firebirdsql.org/index.php




Re: [PHP-DEV] Removal of unicode_semantics

2008-05-05 Thread Antony Dovgal

On 04.05.2008 20:34, Tomas Kuliavas wrote:

We've discussed this a few times in the past and it's time to make a
final decision about its removal.

I think most people have agreed that this is the way forward but no
one has produced a patch. I have a student working on unicode
conversion for the Google Summer of Code and this would help make it
simpler.


unicode_semantics=on breaks backwards compatibility in scripts that have
implemented multiple character set support in current PHP setups.

If the setting is removed, instead of maintaining at least some bits of
backwards compatibility and doing some additional work, you force massive
code rewrites in scripts that depend on working charset support and more
work for people who use the interpreter.


That is correct, removing The Switch does cause some backward compatibility 
breakage.
But The Switch does NOT fix it, that's the problem: you would still have 
to fix your applications to work with unicode_semantics both OFF and ON, 
i.e. it causes _2x more_ trouble.



Every time somebody proposes removal of this setting, they claim that the
majority agreed on it when there is no agreement on anything.


The majority of active developers have agreed that the switch would cause more 
harm than good.
That's the fact.

--
Wbr, 
Antony Dovgal





Re: [PHP-DEV] Removal of unicode_semantics

2008-05-05 Thread Marco


 My biggest concern is the 2 code bases that need to be maintained by the
 PHP developers, you need to have two branches for handling unicode and
 native strings.


 To sum it up, unicode_semantics is in the exact same vein as
 ze1_compatibility and it was a complete failure.


Totally agree!



 Before any developers decide they need to port things to PHP 6 we need to
 just make it Unicode only.


I have some internal applications that I am happy to try porting to PHP 6 to
see the outcome and list any issues; I was waiting for this switch to be
removed first though... If I have time I might try and do it before, but
I'm pretty snowed under currently.

Regards

Marco


Re: [PHP-DEV] Removal of unicode_semantics

2008-05-05 Thread Stefan Walk

Lester Caine wrote:
That sounds like just the sort of edge case that Derick is suggesting 
needs logging for fixing up. unicode_semantics=on is just another bodge 
to make it happen rather than a solution. I think I understand your 
description, and to my eyes it looks like a unicode bug that needs 
addressing?


No, it's a misunderstanding of how things work that has been explained 
to Tomas countless times. A unicode string consists of codepoints, not 
of bytes. Having \xXX and \XXX insert bytes instead of codepoints does 
not make sense, because a) that would require a defined Unicode 
encoding to be used, and even if that were the case, b) it would allow you to 
insert broken data into the unicode string, so it's not a unicode string 
anymore, which is a no-no. If you want to do that sort of fiddling with 
binary details, use binary strings, not unicode strings.
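Stefan's distinction between codepoints and bytes can be shown with a short sketch in current PHP (7.0 or later is assumed for the \u{...} escape, and the mbstring extension is assumed to be loaded). A codepoint escape always yields a well-formed UTF-8 sequence, whereas a lone byte escape can leave data that is not valid UTF-8, which is the kind of broken data he warns about.

```php
<?php
// "\u{C0}" names the codepoint U+00C0; PHP stores its UTF-8 encoding.
$codepoint = "\u{C0}";
// "\xC3" is one raw byte: only the lead byte of a two-byte UTF-8 sequence.
$lone_byte = "\xC3";

echo bin2hex($codepoint), "\n"; // prints c380

var_dump(mb_check_encoding($codepoint, 'UTF-8'));          // bool(true)
var_dump(mb_check_encoding($lone_byte, 'UTF-8'));          // bool(false): truncated sequence
var_dump(mb_check_encoding($lone_byte . "\x80", 'UTF-8')); // bool(true): sequence completed
```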


Regards,
Stefan




Re: [PHP-DEV] Removal of unicode_semantics

2008-05-05 Thread Tomas Kuliavas
 Lester Caine wrote:
 That sounds like just the sort of edge case that Derick is suggesting
 needs logging for fixing up. unicode_semantics=on is just another bodge
 to make it happen rather than a solution. I think I understand your
 description, and to my eyes it looks like a unicode bug that needs
 addressing?

 No, it's a misunderstanding of how things work that has been explained
 to Tomas countless times. A unicode string consists of codepoints, not
 of bytes. Having \xXX and \XXX insert bytes instead of codepoints does
 not make sense, because a) that would require a defined Unicode
 encoding to be used, and even if that were the case, b) it would allow you to
 insert broken data into the unicode string, so it's not a unicode string
 anymore, which is a no-no. If you want to do that sort of fiddling with
 binary details, use binary strings, not unicode strings.

I agree that it is not a bug, because I declare an invalid encoding in
scripts in order to make sure that binary and unicode bytes are equal.

You haven't explained to me how things work. All your explanations ask me to
use code compatible only with PHP 5.2.1+, drop code that worked fine in
older PHP versions and give up control of charset conversions. I want
backwards compatibility with PHP 5.2.0 and PHP4. I want to be able to
control charset conversions. What guarantee is there that charset conversions
will work better in PHP6? In current setups it is safer to do charset
conversions internally instead of relying on PHP to do things. And I can't
drop that code entirely because the Unicode implementation in PHP 5.2.1 is a
dummy. It is there only to avoid E_PARSE errors in PHP6-compatible code.

-- 
Tomas






Re: [PHP-DEV] Removal of unicode_semantics

2008-05-05 Thread Antony Dovgal

On 05.05.2008 12:16, Tomas Kuliavas wrote:

PHP4, PHP5 and PHP6 with unicode_semantics = off work the same way.


No, they do not work in the same way.
I.e. we were trying to make PHP5 work in the same way PHP4 did as much as 
we could, but that's not always possible.


Same for PHP6 - there will be some differences anyway, that's the reality.

--
Wbr, 
Antony Dovgal





Re: [PHP-DEV] Removal of unicode_semantics

2008-05-05 Thread Derick Rethans
On Mon, 5 May 2008, Lester Caine wrote:

 So from *MY* point of view unicode_semantics=on is creating a THIRD 
 case to have to manage? PLEASE can someone take charge and at least 
 get PHP6 moving forward to a stable alpha so that we have something 
 users can be happy to test against!

I think the reason why people are reluctant to take charge here is 
just because of this setting.

regards,
Derick

-- 
Derick Rethans
http://derickrethans.nl | http://ezcomponents.org | http://xdebug.org




Re: [PHP-DEV] Removal of unicode_semantics

2008-05-05 Thread Tomas Kuliavas
 PHP4, PHP5 and PHP6 with unicode_semantics = off work the same way.

 No, they do not work in the same way.
 I.e. we were trying to make PHP5 work in the same way PHP4 did as much as
 we could, but that's not always possible.

In my case they do.

-- 
Tomas





Re: [PHP-DEV] Removal of unicode_semantics

2008-05-05 Thread Antony Dovgal

On 05.05.2008 12:44, Tomas Kuliavas wrote:

PHP4, PHP5 and PHP6 with unicode_semantics = off work the same way.


No, they do not work in the same way.
I.e. we were trying to make PHP5 work in the same way PHP4 did as much as
we could, but that's not always possible.


In my case they do.


This means your case is very simple and you have nothing to worry about.

--
Wbr, 
Antony Dovgal





Re: [PHP-DEV] Removal of unicode_semantics

2008-05-05 Thread Lester Caine

Derick Rethans wrote:

On Mon, 5 May 2008, Lester Caine wrote:

So from *MY* point of view unicode_semantics=on is creating a THIRD 
case to have to manage? PLEASE can someone take charge and at least 
get PHP6 moving forward to a stable alpha so that we have something 
users can be happy to test against!


I think the reason why people are reluctant to take charge here is 
just because of this setting.


And as a result nothing is happening :(
Do we need to set up some formal vote on this quite basic feature which was - 
I thought - the whole basis that PHP6 was being built on?

Or do we have to wait another 5 years for PHP6 :(

Working with Unicode does require a different mindset, and THEN overloading it 
by requiring complete compatibility with a non-unicode model is adding a level 
of complexity that has resulted in the current stalemate?


I was ready to run with Unicode/PHP6 two years ago and convert all the database 
data to Unicode as well, but at present things seem to be in limbo all around?





Re: [PHP-DEV] Removal of unicode_semantics

2008-05-05 Thread David Zülke

On 05.05.2008, at 09:51, Antony Dovgal wrote:


On 04.05.2008 20:34, Tomas Kuliavas wrote:

We've discussed this a few times in the past and it's time to make a
final decision about its removal.

I think most people have agreed that this is the way forward but no
one has produced a patch. I have a student working on unicode
conversion for the Google Summer of Code and this would help make it
simpler.
unicode_semantics=on breaks backwards compatibility in scripts that have
implemented multiple character set support in current PHP setups.
If the setting is removed, instead of maintaining at least some bits of
backwards compatibility and doing some additional work, you force massive
code rewrites in scripts that depend on working charset support and more
work for people who use the interpreter.


That is correct, removing The Switch does cause some backward  
compatibility breakage.
But The Switch does NOT fix it, that's the problem: you would still  
have to fix your applications to work with unicode_semantics both  
OFF and ON, i.e. it causes _2x more_ trouble.



Every time somebody proposes removal of this setting, they claim that the
majority agreed on it when there is no agreement on anything.
The majority of active developers have agreed that the switch would  
cause more harm than good.

That's the fact.


And that's the word. +1000. Let's get rid of it and move on.


David




Re: [PHP-DEV] Removal of unicode_semantics

2008-05-05 Thread Arvids Godjuks
Just use Unicode and don't even think about backward compatibility, because
those who need it are most probably still on PHP4 and MySQL 3.x.
Most normal developers have been using UTF-8 for years now and wouldn't even
notice it.

So +1 for pure Unicode. No switches. Lame hosting companies will 100% mess
up this switch and ruin everything again like they did with PHP5.
Make them pay for PHP5! ;) :D


Re: [PHP-DEV] Removal of unicode_semantics

2008-05-05 Thread Christian Schneider
Arvids Godjuks wrote:
 Most normal developers have been using UTF-8 for years now and wouldn't even
 notice it.

Sorry to destroy your pipe dream but that's just not true.

- Chris




Re: [PHP-DEV] Removal of unicode_semantics

2008-05-05 Thread Arvids Godjuks
Well, at least in my country I haven't seen any normal programmer not using
unicode :)

2008/5/5 Christian Schneider [EMAIL PROTECTED]:

 Arvids Godjuks wrote:
  Most normal developers have been using UTF-8 for years now and wouldn't even
  notice it.

 Sorry to destroy your pipe dream but that's just not true.

 - Chris



Re: [PHP-DEV] Removal of unicode_semantics

2008-05-05 Thread Christian Schneider
Arvids Godjuks wrote:
 Well, at least in my country I haven't seen any normal programmer not using
 unicode :)

<meta-posting>
I guess that was meant to be an ironic comment but I think we should
improve the signal-to-noise ratio on internals again.
</meta-posting>

- Chris




[PHP-DEV] Removal of unicode_semantics

2008-05-04 Thread Scott MacVicar

Hi everyone,

We've discussed this a few times in the past and it's time to make a  
final decision about its removal.


I think most people have agreed that this is the way forward but no  
one has produced a patch. I have a student working on unicode  
conversion for the Google Summer of Code and this would help make it  
simpler.


If there are no serious objections I'll create a patch and get this  
done as soon as possible


Scott




Re: [PHP-DEV] Removal of unicode_semantics

2008-05-04 Thread Tomas Kuliavas
 We've discussed this a few times in the past and it's time to make a
 final decision about its removal.

 I think most people have agreed that this is the way forward but no
 one has produced a patch. I have a student working on unicode
 conversion for the Google Summer of Code and this would help make it
 simpler.

unicode_semantics=on breaks backwards compatibility in scripts that have
implemented multiple character set support in current PHP setups.

If the setting is removed, instead of maintaining at least some bits of
backwards compatibility and doing some additional work, you force massive
code rewrites in scripts that depend on working charset support and more
work for people who use the interpreter.

Every time somebody proposes removal of this setting, they claim that the
majority agreed on it when there is no agreement on anything. People only
defended their own positions and we had another flame war about unicode_semantics.

-- 
Tomas






Re: [PHP-DEV] Removal of unicode_semantics

2008-05-04 Thread Derick Rethans
On Sun, 4 May 2008, Tomas Kuliavas wrote:

  We've discussed this a few times in the past and it's time to make a
  final decision about its removal.
 
  I think most people have agreed that this is the way forward but no
  one has produced a patch. I have a student working on unicode
  conversion for the Google Summer of Code and this would help make it
  simpler.
 
 unicode_semantics=on breaks backwards compatibility in scripts that have
 implemented multiple character set support in current PHP setups.

Why don't you go ahead and make a list of those exact issues then? We 
can then see how to fix those issues. That's much more useful than just 
posting to the mailing list when you don't agree with something. From 
what I've seen with my code base, the changes that I have to do are 
minimal once some (internal) functions are fixed up.

regards,
Derick




Re: [PHP-DEV] Removal of unicode_semantics

2008-05-04 Thread Scott MacVicar

Tomas Kuliavas wrote:

We've discussed this a few times in the past and it's time to make a
final decision about its removal.

I think most people have agreed that this is the way forward but no
one has produced a patch. I have a student working on unicode
conversion for the Google Summer of Code and this would help make it
simpler.


unicode_semantics=on breaks backwards compatibility in scripts that have
implemented multiple character set support in current PHP setups.

If the setting is removed, instead of maintaining at least some bits of
backwards compatibility and doing some additional work, you force massive
code rewrites in scripts that depend on working charset support and more
work for people who use the interpreter.

Every time somebody proposes removal of this setting, they claim that the
majority agreed on it when there is no agreement on anything. People only
defended their own positions and we had another flame war about unicode_semantics.



There has been agreement by the people that actually contribute towards 
the development of PHP.


It certainly doesn't give backwards compatibility: you are able to turn 
it off in php.ini, and it's going to mean that developers will need to 
maintain two versions, one for it off and the other for on.


My biggest concern is the 2 code bases that need to be maintained by the 
PHP developers, you need to have two branches for handling unicode and 
native strings.


To sum it up, unicode_semantics is in the exact same vein as 
ze1_compatibility and it was a complete failure.


Before any developers decide they need to port things to PHP 6 we need 
to just make it Unicode only.


Scott




Re: [PHP-DEV] Removal of unicode_semantics

2008-05-04 Thread Alexey Zakhlestin
On Sun, May 4, 2008 at 8:34 PM, Tomas Kuliavas
[EMAIL PROTECTED] wrote:
  We've discussed this a few times in the past and it's time to make a
   final decision about its removal.
  
   I think most people have agreed that this is the way forward but no
   one has produced a patch. I have a student working on unicode
   conversion for the Google Summer of Code and this would help make it
   simpler.

  unicode_semantics=on breaks backwards compatibility in scripts that have
  implemented multiple character set support in current PHP setups.

  If the setting is removed, instead of maintaining at least some bits of
  backwards compatibility and doing some additional work, you force massive
  code rewrites in scripts that depend on working charset support and more
  work for people who use the interpreter.

  Every time somebody proposes removal of this setting, they claim that the
  majority agreed on it when there is no agreement on anything. People only
  defended their own positions and we had another flame war about unicode_semantics.

It's the lesser of two evils.
If the switch stays there, every future author of libraries/frameworks
will have to maintain 2 separate code-bases (one for
unicode_semantics=off, other for unicode_semantics=on).

On the other hand, 1 year from now it would be safe to require 5.2.1
as a minimal supported version of php, which will allow you to mark
all the strings as binary, which will lead to easier migration to
php-6
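Alexey is referring to the binary string prefix that PHP 5.2.1 added for forward compatibility. A minimal sketch, assuming PHP 5.2.1 or later: in released PHP versions the b prefix is accepted by the parser but changes nothing, since every string is already a byte string, so marking byte-oriented literals now costs nothing. How PHP 6 would treat such literals is taken from the thread, not demonstrated by this code.

```php
<?php
// The b prefix marks a literal as a binary (byte) string. In released PHP
// versions it is a no-op, so both literals hold the same two bytes, C3 80.
$marked   = b"\xC3\x80";
$unmarked = "\xC3\x80";

var_dump($marked === $unmarked); // bool(true)
var_dump(strlen($marked));       // int(2)
```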

-- 
Alexey Zakhlestin
http://blog.milkfarmsoft.com/




Re: [PHP-DEV] Removal of unicode_semantics

2008-05-04 Thread Jeremy Privett

Tomas Kuliavas wrote:

We've discussed this a few times in the past and it's time to make a
final decision about its removal.

I think most people have agreed that this is the way forward but no
one has produced a patch. I have a student working on unicode
conversion for the Google Summer of Code and this would help make it
simpler.



unicode_semantics=on breaks backwards compatibility in scripts that have
implemented multiple character set support in current PHP setups.

If the setting is removed, then instead of maintaining at least some
backwards compatibility at the cost of some additional work, you force
massive code rewrites in scripts that depend on working charset support,
and more work for the people who use the interpreter.

Every time somebody proposes removing this setting, they claim that the
majority agreed on it, when there is no agreement on anything. People only
defended their own positions, and we had another flame war about unicode_semantics.

  


And leaving unicode_semantics in means that web application developers 
like myself, who distribute their applications to be installed on 
people's own servers, have to write two different versions of their 
software to support the switch being on or off, because an ini setting 
makes major differences in the language. Not only is there twice the 
code in PHP's codebase, there's twice the code in the codebases of 
people like me.


But, we've been through this discussion before. I've already stated my 
opinions. +1 to removing this.


--
Jeremy Privett
C.E.O. & C.S.A.
Omega Vortex Corporation

http://www.omegavortex.net






Re: [PHP-DEV] Removal of unicode_semantics

2008-05-04 Thread Kalle Sommer Nielsen

Hey Scott

As most others have already posted, from a PHP developer's point of view
it would be stupid to maintain two versions of the same function, unless
you wrap it all in a single function that handles both modes itself.

And yes, zend.ze1_compatibility_mode was a major failure.

+1 for removal

Kalle

- Original Message - 
From: Scott MacVicar [EMAIL PROTECTED]

To: PHP Developers Mailing List internals@lists.php.net
Sent: Sunday, May 04, 2008 6:12 PM
Subject: [PHP-DEV] Removal of unicode_semantics



Hi everyone,

We've discussed this a few times in the past and it's time to make a 
final decision about its removal.


I think most people have agreed that this is the way forward, but no one 
has produced a patch. I have a student working on Unicode conversion for 
the Google Summer of Code, and this would help make it simpler.


If there are no serious objections, I'll create a patch and get this done 
as soon as possible.


Scott







Re: [PHP-DEV] Removal of unicode_semantics

2008-05-04 Thread Tomas Kuliavas
  We've discussed this a few times in the past and it's time to make a
  final decision about its removal.
 
  I think most people have agreed that this is the way forward but no
  one has produced a patch. I have a student working on unicode
  conversion for the Google Summer of Code and this would help make it
  simpler.

 unicode_semantics=on breaks backwards compatibility in scripts that have
 implemented multiple character set support in current PHP setups.

 Why don't you go ahead and make a list of those exact issues then? We
 can then see how to fix those issues. That's much more useful than just
 posting to the mailing list when you don't agree with something. From
 what I've seen with my code base, the changes that I have to do are
 minimal once some (internal) functions are fixed up.

If I remain silent, others will argue that everybody agrees on removing
unicode_semantics.

I write and maintain charset decoding and encoding functions.
unicode_semantics breaks every mapping table and every other function that
operates on binary 8-bit strings.
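To make the mapping-table point concrete, here is a hypothetical decoder of that kind (the function name and the two-entry table are illustrative, not taken from any real project). It walks the input one byte at a time with ord($bytes[$i]), looks each byte up in an ISO-8859-2 table, and emits HTML numeric character references. Code like this assumes that indexing a string yields exactly one byte, which is the assumption this message says unicode_semantics=on breaks.

```php
<?php
// Hypothetical decoder: ISO-8859-2 bytes -> ASCII plus HTML numeric entities.
// Only two table entries are shown; a real table maps every byte 0xA0-0xFF.
function decode_iso_8859_2_excerpt(string $bytes): string
{
    static $table = [
        0xA1 => 0x0104, // Ą (LATIN CAPITAL LETTER A WITH OGONEK)
        0xB1 => 0x0105, // ą (LATIN SMALL LETTER A WITH OGONEK)
    ];

    $out = '';
    for ($i = 0, $n = strlen($bytes); $i < $n; $i++) {
        $byte = ord($bytes[$i]); // relies on $bytes[$i] being exactly one byte
        if ($byte < 0x80) {
            $out .= $bytes[$i];                 // ASCII passes through
        } elseif (isset($table[$byte])) {
            $out .= '&#' . $table[$byte] . ';'; // mapped character
        } else {
            $out .= '?';                        // not in this excerpt
        }
    }
    return $out;
}

echo decode_iso_8859_2_excerpt("a\xB1b"), "\n"; // a&#261;b
```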

In Andrei Zmievski's slides, Unicode characters are written with \u escapes.
Why are \x (hex) and \ (octal) escapes treated as Unicode code points in the
current PHP 6?

---
<?php
echo "\xC3\200";
---
I am not writing U+00C3 and U+0080; I am writing U+00C0 encoded in UTF-8.
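The arithmetic behind that claim, as a small sketch with a hypothetical helper name: a code point in the range U+0080 to U+07FF is encoded in UTF-8 as the two bytes 110xxxxx 10xxxxxx, so U+00C0 becomes C3 80, which is exactly what "\xC3\200" spells in a byte-string PHP.

```php
<?php
// Hypothetical helper: encode a code point in the two-byte UTF-8 range
// (U+0080..U+07FF) as raw bytes: 110xxxxx 10xxxxxx.
function utf8_two_byte(int $cp): string
{
    if ($cp < 0x80 || $cp > 0x7FF) {
        throw new InvalidArgumentException('code point outside two-byte UTF-8 range');
    }
    // High 5 bits go into the lead byte, low 6 bits into the continuation byte.
    return chr(0xC0 | ($cp >> 6)) . chr(0x80 | ($cp & 0x3F));
}

var_dump(utf8_two_byte(0xC0) === "\xC3\200"); // bool(true): U+00C0 is bytes C3 80
```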


---
<?php
$string = "ą";
// /e: the replacement string is evaluated as PHP code
var_dump(preg_replace("/([\300-\337])([\200-\277])/e",
    "'&#'.((ord('\\1')-192)*64+(ord('\\2')-128)).';'", $string));

for ($i = 0; $i < strlen($string); $i++) {
    $char = ord($string[$i]);
    echo sprintf("=%02X", $char);
}
---
Expected output: string(6) "&#261;" and =C4=85, if ą is saved as UTF-8.
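The /e modifier used in that test was deprecated in PHP 5.5 and removed in PHP 7.0, so the snippet no longer runs on current PHP. Below is a sketch of an equivalent using preg_replace_callback(), assuming a current byte-string PHP; the function names are hypothetical. Without the /u modifier PCRE matches bytes, so [\xC0-\xDF] (the same range as the octal [\300-\337]) selects a two-byte UTF-8 lead byte and [\x80-\xBF] its continuation byte.

```php
<?php
// Convert two-byte UTF-8 sequences to HTML numeric character references.
function two_byte_utf8_to_ncr(string $s): string
{
    return preg_replace_callback(
        '/([\xC0-\xDF])([\x80-\xBF])/',
        static function (array $m): string {
            // Same arithmetic as the /e version: strip the marker bits, combine.
            return '&#' . ((ord($m[1]) - 192) * 64 + (ord($m[2]) - 128)) . ';';
        },
        $s
    );
}

// Hypothetical helper that reproduces the byte-dump loop as a string.
function hex_dump(string $s): string
{
    $out = '';
    for ($i = 0, $n = strlen($s); $i < $n; $i++) {
        $out .= sprintf('=%02X', ord($s[$i]));
    }
    return $out;
}

$string = "\xC4\x85"; // "ą" written in UTF-8
var_dump(two_byte_utf8_to_ncr($string)); // string(6) "&#261;"
echo hex_dump($string), "\n";             // =C4=85
```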


I can bypass it by adding one line to every script that operates on
binary strings, but what guarantee is there that you won't drop declare()
support just as you are dropping unicode_semantics? What happens to your new
Unicode-aware string functions if I lie to the PHP interpreter about the
strings' charset? mb_strlen can't calculate the correct $string length even
when I pass the correct charset in the mb_strlen() arguments. If the above
code works as I want in PHP 6 with unicode_semantics=on,
mb_strlen($string,'utf-8') returns 2 and not 1.
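For comparison, this is the expected answer on a byte-string PHP: strlen() reports 2 bytes for ą, while a UTF-8 aware count reports 1 character. The helper below is a hypothetical sketch that counts every byte which is not a UTF-8 continuation byte (10xxxxxx); it assumes valid UTF-8 input and does not validate it. Where the mbstring extension is loaded, mb_strlen($string, 'UTF-8') gives the same count.

```php
<?php
// Hypothetical helper: count the characters in a UTF-8 byte string by
// counting every byte that is not a continuation byte (10xxxxxx).
// Assumes valid UTF-8; malformed input is not detected.
function utf8_length(string $bytes): int
{
    $count = 0;
    for ($i = 0, $n = strlen($bytes); $i < $n; $i++) {
        if ((ord($bytes[$i]) & 0xC0) !== 0x80) {
            $count++;
        }
    }
    return $count;
}

$string = "\xC4\x85";           // "ą" as two UTF-8 bytes
var_dump(strlen($string));      // int(2): bytes
var_dump(utf8_length($string)); // int(1): characters
```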

-- 
Tomas

