Re: [PHP-DEV] POSIX regex [PATCH]
On Mon, July 30, 2007 2:22 am, Richard Lynch wrote:
> On Mon, July 16, 2007 7:47 am, Jani Taskinen wrote:
>> Now only places using the POSIX regex functions (ext/ereg/ excluded) are ext/standard/browscap.c and ext/pgsql/pgsql.c.
>
> For your review, my first patch (!) along with a php test case, of course, in a URL/directory structure that should be familiar:
> http://l-i-e.com/php5/ext/pgsql/
> :-)
>
> The commit comment should probably have something not unlike this:
> Use PCRE instead of POSIX regex
> Remove stray closing parenthesis in PG_TIME pattern

It's been a week and nobody has commented on this.

Should somebody commit it now?...
Or grant me commit karma to ext/pgsql
CVS username is 'lynch'

And, just to be sure, since it only changes internal workings and not documented features, it should go into 5.x, right?...
Or is requiring PCRE instead of POSIX considered not BC for 5.x series?

I'll check PHP 6 pgsql and see if it's been Unicode-ified beyond recognition for this patch, or if it applies cleanly there as well.

PS
I'll change the test case to do the insert with the converted data as a further check that it worked, instead of the rather bogus test insert of hand-coded data that it does now.

--
Some people have a gift link here.
Know what I want?
I want you to buy a CD from some indie artist.
http://cdbaby.com/browse/from/lynch
Yeah, I get a buck. So?

--
PHP Internals - PHP Runtime Development Mailing List
To unsubscribe, visit: http://www.php.net/unsub.php
Re: [PHP-DEV] POSIX regex [PATCH]
On Mon, July 16, 2007 7:47 am, Jani Taskinen wrote:
> Now only places using the POSIX regex functions (ext/ereg/ excluded) are ext/standard/browscap.c and ext/pgsql/pgsql.c.

For your review, my first patch (!) along with a php test case, of course, in a URL/directory structure that should be familiar:
http://l-i-e.com/php5/ext/pgsql/
:-)

The commit comment should probably have something not unlike this:
Use PCRE instead of POSIX regex
Remove stray closing parenthesis in PG_TIME pattern

Real Hackers can snag the patch and play with it and hit 'delete' now.

Regarding the test case...

The existing pg_convert test case only tested 3 conversions and there are/were 9 PCRE/POSIX-regex non-trivial conversions.

I didn't really want to mess with adding a bunch of columns to the existing test table, potentially messing up a bunch of other test scripts, so I just created/dropped my own table to hit all 9 PCREs I hacked.

There are many other conversions, actually, but they mostly consist of a no-op, or typecasting an int to a string with no actual change, or adding apostrophes around a value to make it DB-ready. I didn't touch those anyway, so they should be no less broken than they were before.

I am, of course, 100% open to critiques, comments, or derogatory remarks. :-)

PS
The function was and probably should remain experimental in the docs, I guess... Though I am pretty sure I did excise one bug with that stray paren. :-)
Re: [PHP-DEV] POSIX regex
On Mon, July 16, 2007 7:47 am, Jani Taskinen wrote:
> Now only places using the POSIX regex functions (ext/ereg/ excluded) are ext/standard/browscap.c and ext/pgsql/pgsql.c.

As you may know, I'm working on converting ext/pgsql/pgsql.c to use PCRE instead of POSIX regex.

It's actually going fairly well, believe it or not, though I have a ton of debug printf's in my C code at the moment, to be sure I'm hitting all the lines I want to hit to test everything. [Yeah, I know, there are way more fancy tools available this decade, but I'm old.]

At any rate, I've hit a bit of a snag, and either I'm being stupid, or there's been a bad regex pattern in there for a while now...

Does not this line:
http://lxr.php.net/source/php-src/ext/pgsql/pgsql.c#5021
have an extra closing paren at the end of the pattern?
[I am making my patch against PHP 5; it just hasn't changed since 4]

My PCRE patch is telling me it's broken there:
Warning: pg_convert(): Compilation failed: unmatched parentheses at offset 47 in /home/lynch/pg_pcre.php on line 28
[Seems weird how I get a PHP error out of my C code, but there it is...]

My eyes say it does. Counting up/down on my fingers, like I learned in Lisp class in college, says it does. The Regex Coach says it does.

Does POSIX regex not generate some kind of error on an extra paren at the end?

Or am I missing something particularly arcane or abstract here?

If it's actually broken:
My PCRE patch will just not have that extra closing paren, so don't rush a patch through just for this, unless you really want to. I expect to wrap up in a couple days, unless something totally unexpected crops up.
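For readers who want to repeat the two checks described in this message, here is a minimal C sketch. The function names `first_unmatched_close` and `posix_ere_compile_status` are made up for illustration and are not part of PHP or the patch; only `regcomp()`, `regerror()` and `regfree()` are real POSIX calls. The first function mechanizes the "count up/down" trick, and the second reports whether the platform's POSIX regex library accepts a pattern. As far as I understand the POSIX wording, `)` is only special in an extended regex when it matches an earlier `(`, so a library may quietly treat a stray one as a literal character; that is an assumption about whichever regex library PHP was built against, not something checked here.

```c
#include <regex.h>
#include <stddef.h>

/* Naive "count up, count down" check: returns the byte offset of the first
 * ')' that has no '(' open before it, or -1 if there is none.
 * Assumption: escapes and bracket expressions are ignored, so this is only
 * a rough first look, not a regex parser. */
long first_unmatched_close(const char *pattern)
{
    long depth = 0;
    for (size_t i = 0; pattern[i] != '\0'; i++) {
        if (pattern[i] == '(') {
            depth++;
        } else if (pattern[i] == ')') {
            if (depth == 0) {
                return (long)i;
            }
            depth--;
        }
    }
    return -1;
}

/* Compiles `pattern` as a POSIX extended regex. Returns 0 on success or the
 * regcomp() error code; if errlen > 0, a readable message (or an empty
 * string on success) is copied into errbuf. */
int posix_ere_compile_status(const char *pattern, char *errbuf, size_t errlen)
{
    regex_t re;
    int rc = regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB);
    if (rc != 0) {
        if (errlen > 0) {
            regerror(rc, &re, errbuf, errlen);
        }
        return rc;  /* nothing to free after a failed regcomp() */
    }
    if (errlen > 0) {
        errbuf[0] = '\0';
    }
    regfree(&re);
    return 0;
}
```

Calling `posix_ere_compile_status()` on the suspect pattern on a given platform shows directly whether that library rejects the extra paren or lets it through, which is the question asked above.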
Re: [PHP-DEV] POSIX regex
snip

> Also I wonder how a unicode on/off switch will be handled on the documentation side. It would add more permutations in the documentation to have the switch. From my understanding the situation is fairly non trivial already in how to handle all the version dependent differences. Philipp, whats your take on this?

I don't think it matters for documentation because both routes have hurdles and planning requirements. But, it's exciting that we're worrying about this because it's time we educate the world to understand why unicode is useful, and why it's needed today.

Andrei asked the documentation team to start the unicode documentation process long ago but given that nobody knows what PHP 6 will be, it makes that tough so we've (for time reasons too) done little. However, each function has a unicode section dedicated to it and general unicode feature sections are planned.

I don't know if a PHP 6 version of the manual would be a good route to take but it's possible, although I prefer shoving information into a user's face, both past and present, so said user knows what to look for and worry about in all directions. Each function now has a changelog for that.

In reply to removing the directive, I fear that PHP 6 would be discussed as === PHP 5 + Unicode when this won't be true... yet this idea could persist and cause confusion, so let's be sure everyone realizes this from day #.01. It's the main new (and big) feature only, so that's all we can promise.

And in this scenario please decide what PHP 7 could be. Would we have 5/7, 7/8, or just 7 with unicode? In other words, coupled PHP versions forever? Or just once?

And regardless, we need an effective marketing strategy via PHP.net that does not solely rely on third parties, word of mouth, or PHP's greatness like we've done in the past. This includes the website and documentation, and this includes strong efforts by everyone. Like, explaining ways to be forward compatible.

And perhaps PHP 6 will bring with it a new web design, with pictures of little children from all around the world happily holding hands... :-)

So unless something truly innovative seeps up (maybe it has) then stealing ideas from other languages' experience and growing pains (like Python and Java) sounds good. If a document existed that compared the situation in many programming languages, the pros and cons, that would be great and might shed light in many of the right places. At least, for me. And/or an update deciphering where we're at after all these lengthy unicode threads.

If it's time to go old school with two sides presenting official statements/arguments, then a vote, then so be it. But I don't feel we're quite there yet.

Regards,
Philip
RE: [PHP-DEV] POSIX regex
On Wed, 18 Jul 2007, Andi Gutmans wrote:
> Functions would work properly with Unicode, but you would explicitly create Unicode strings e.g. u"foobar". This is not uncommon practice and many other languages actually go down this route incl. Python and various versions of C++ frameworks.

That's what I meant, Unicode is not implied so it doesn't work by default.

Derick
Re: [PHP-DEV] POSIX regex
On 7/20/07, Stanislav Malyshev [EMAIL PROTECTED] wrote:
> I think on Windows you can do something with the registry per-dir too. On unix there's no registry though. Maybe we need some generic solution to this (like for FastCGI users)? Anybody has good ideas?

FastCGI users already can have their own php.ini for every application.

--
Alexey Zakhlestin
http://blog.milkfarmsoft.com/
Re: [PHP-DEV] POSIX regex
On 7/20/07, Alexey Zakhlestin [EMAIL PROTECTED] wrote:
> On 7/20/07, Stanislav Malyshev [EMAIL PROTECTED] wrote:
>> I think on Windows you can do something with the registry per-dir too. On unix there's no registry though. Maybe we need some generic solution to this (like for FastCGI users)? Anybody has good ideas?
>
> FastCGI users already can have their own php.ini for every application

Having 100 FCGI only because you have 100 different configs is not an option.

--Pierre
Re: [PHP-DEV] POSIX regex
On Thu, 2007-07-19 at 15:39 -0700, Andrei Zmievski wrote:
> Python did go down that road, but take a look at Python 3000 effort and you will see that what they are trying to do is exactly what we have: native Unicode strings, without prefixes.

So maybe we should learn from mistakes others have already made and not do the same.. and remove that stupid option before it's too late.

--Jani
Re: [PHP-DEV] POSIX regex
On Thu, July 19, 2007 7:52 pm, Stanislav Malyshev wrote:
>> Yeah I also like that casting better than the u
>
> It's different things. Casting means create string as binary, then in runtime cast it to unicode, u"" means this string is unicode.

Oh.

I think we're going to have to write some documentation on that one before implementation, or a zillion users are gonna be very very confused...

If it remains one of those undocumented functions for any length of time, expect mass confusion :-)

u"stuff typed in unicode" to allow creation of Unicode strings in PHP5 seems like a Good Idea to this naive reader, if it's easy enough to code that. It may even ease the transition from 5 to 6 for some?

Presumably u"foo" would be a no-op in PHP 6 with semantics on and not generate some kind of silly error or something, right?...
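The distinction quoted in this message (a cast converts a binary string at run time, while a prefixed literal is Unicode from the start) can be illustrated with a small C analogy. This is only a sketch under stated assumptions: the binary string is assumed to hold UTF-8, the function names are hypothetical, and nothing here reflects how PHP 6 actually stores or converts strings.

```c
#include <stddef.h>
#include <stdint.h>

#define UTF8_INVALID 0xFFFFFFFFu

/* Decodes one code point from the NUL-terminated UTF-8 string at *p and
 * advances *p past it. Returns UTF8_INVALID on a malformed sequence.
 * Minimal sketch: overlong forms and surrogates are not rejected. */
static uint32_t utf8_next(const unsigned char **p)
{
    unsigned char b = *(*p)++;
    uint32_t cp;
    int extra;
    if (b < 0x80)                { return b; }
    else if ((b & 0xE0) == 0xC0) { cp = b & 0x1F; extra = 1; }
    else if ((b & 0xF0) == 0xE0) { cp = b & 0x0F; extra = 2; }
    else if ((b & 0xF8) == 0xF0) { cp = b & 0x07; extra = 3; }
    else                         { return UTF8_INVALID; }
    while (extra-- > 0) {
        unsigned char c = **p;
        if ((c & 0xC0) != 0x80) {
            return UTF8_INVALID;  /* also catches a premature NUL */
        }
        (*p)++;
        cp = (cp << 6) | (c & 0x3F);
    }
    return cp;
}

/* The "cast" route: the data starts life as bytes and is converted to code
 * points at run time. Returns the number of code points written, or -1 if
 * the bytes are malformed or `out` is too small. */
long binary_to_unicode(const char *bytes, uint32_t *out, size_t cap)
{
    const unsigned char *p = (const unsigned char *)bytes;
    size_t n = 0;
    while (*p != '\0') {
        uint32_t cp = utf8_next(&p);
        if (cp == UTF8_INVALID || n == cap) {
            return -1;
        }
        out[n++] = cp;
    }
    return (long)n;
}

/* Returns 1 if converting `bytes` at run time yields exactly the n code
 * points in `literal`, which stands in for a string that was written as
 * Unicode in the first place (the prefixed-literal route). Handles inputs
 * of up to 64 code points. */
int cast_equals_literal(const char *bytes, const uint32_t *literal, size_t n)
{
    uint32_t buf[64];
    long got = binary_to_unicode(bytes, buf, 64);
    if (got < 0 || (size_t)got != n) {
        return 0;
    }
    for (size_t i = 0; i < n; i++) {
        if (buf[i] != literal[i]) {
            return 0;
        }
    }
    return 1;
}
```

The point of the analogy is that the cast route has a conversion step that can fail or cost time on every execution, whereas the literal route has no such step because the value never existed as bytes.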
Re: [PHP-DEV] POSIX regex
On Fri, July 20, 2007 3:07 am, Alexey Zakhlestin wrote:
> On 7/20/07, Stanislav Malyshev [EMAIL PROTECTED] wrote:
>> I think on Windows you can do something with the registry per-dir too. On unix there's no registry though. Maybe we need some generic solution to this (like for FastCGI users)? Anybody has good ideas?
>
> FastCGI users already can have their own php.ini for every application

Perhaps the OP just needs a link to a good HowTo FastCGI reference...
http://www.fastcgi.com/docs/faq.html#PHP

It would be nice if it were a bit more specific about the CLI install hack...

Or if PHP out of the box compiled --with-fastcgi as a different binary name so there was no hack... :-v
Re: [PHP-DEV] POSIX regex
On Thu, July 19, 2007 8:29 am, Jani Taskinen wrote:
> On Thu, 2007-07-19 at 15:47 +0300, Tomas Kuliavas wrote:
>>> From the low end user perspective I think this would be great from another POV. Let's imagine for a second that Wordpress will only work with unicode semantics off and that phpBB will only work with the switch on. What if someone would want to run both on a shared server?
>>
>> from httpd.conf
>> <Directory /var/www/example.org/www/phpbb>
>>   php_admin_flag unicode.semantics on
>> </Directory>
>> <Directory /var/www/example.org/www/wp>
>>   php_admin_flag unicode.semantics off
>> </Directory>
>
> Hmm..I forgot that this works for ZEND_INI_SYSTEM type of options. Live and learn I guess. :) Too bad it only works for Apache module.. ;)

Maybe I'm being stupid, but why would this work when .htaccess isn't supposed to work for Unicode on/off because it would require too much gnarly ifdef-type code in PHP source?

Maybe this doesn't really really work at all and it's going to be a problem?
Re: [PHP-DEV] POSIX regex
On Fri, 2007-07-20 at 15:46 -0500, Richard Lynch wrote:
> u"stuff typed in unicode" to allow creation of Unicode strings in PHP5 seems like a Good Idea to this naive reader, if it's easy enough to code that.

No, we can't introduce a unicode string type in PHP 5.

johannes
RE: [PHP-DEV] POSIX regex
Jani Taskinen writes:
> On Thu, 2007-07-19 at 15:39 -0700, Andrei Zmievski wrote:
>> Python did go down that road, but take a look at Python 3000 effort and you will see that what they are trying to do is exactly what we have: native Unicode strings, without prefixes.
>
> So maybe we should learn from mistakes others have already made and not do the same.. and remove that stupid option before it's too late.

You betcha!
IMHO, it'll be a persistent ugliness and source of headaches and regret for a long time.

Best Regards
Mike Robinson
Re: [PHP-DEV] POSIX regex
On Wed, July 18, 2007 10:45 am, Zeev Suraski wrote:

I also was thinking the other day, like Ze'ev, that PHP Devs aren't really in touch with the unwashed masses of the userbase...

There are a zillion websites out there that run on shared hosts with copy/pasted code and all these scripters will get burned big-time if ereg is suddenly unavailable. They don't really care about PCRE versus POSIX, so long as they can get the job done.

I suspect all the shared webhosts will just install ereg once they figure out that their users who never re-factor need it, but they'll be pretty cranky with you for nuking it and making them jump through an extra hoop to bring it back.

And all the distro package-maintainers will probably just bundle it right into their packages.

And there will be tutorials on how to compile PHP with ereg in it, or how to add it back into windows, or how to install PECL ereg.

So just yanking ereg will cause a fair amount of grief, followed by the dubious benefit of thousands of users figuring out how to install a PECL module.

Any gurus really offended by ereg can --disable-ereg or whatever it is, no?

At least just spit out an E_DEPRECATED in PHP 6, and move it to PECL in PHP 7. Give people enough warning that it's going away before nuking it, so that you can at least say "You've been warned for a whole major release that it was going away."

I suspect you'll still end up with people just installing it rather than re-writing their code, though, so it's not serving any real purpose to any real users to move it.

The people who need to use PCRE exclusively can do that already. The people who need their legacy code to work will just have to jump through an extra hoop.

What purpose is served, then, in moving ereg out? None, really.

PS
I'm working on the PostgreSQL POSIX-PCRE patch, as I don't think PHP itself should need ereg.
Re: [PHP-DEV] POSIX regex
Richard Lynch wrote:
> Any gurus really offended by ereg can --disable-ereg or whatever it is, no?

So in a dream world, Rasmus would have shipped all the features of PHP 42 as his first release. In a slightly less dreamy, but still unrealistic world, we would have infinite development resources to maintain all the BC hacks in the world. In reality, we have limited resources, so it's not about being offended, it's about yet another redundant extension that needs to be supported.

This is the point with a lot of this. How do we set the priorities in managing the scarce resources? For the most part, this is pretty automatic: whatever people do is what we prioritize, the other stuff is left for someone else to pick up if they care. Obviously it's not quite that extreme, since there are several people that are willing to do stuff they do not need (or they have a company sponsoring them), just to move PHP forward.

regards,
Lukas
RE: [PHP-DEV] POSIX regex
On Wed, July 18, 2007 3:04 am, Derick Rethans wrote:
> I hope you are not suggesting to port them to both modes? Why on earth should an application support both unicode=off and unicode=on? That's exactly the thing that some of us are so afraid of and want to prevent as this just annoys more and more PHP users that have to deal with this stuff. And as mentioned before, having both modes is *way* worse than having to deal with register_globals on/off or magic_quotes, as those two cases could at least be handled in user space.

I suspect some apps can only be reasonably ported one way or the other.

But one would hope that an app could make the choice to go either way, and not have a nightmare experience.

The purpose of the PHP Devs doing a port is not to release both versions, or either version, but to find out if it can actually be done without major grief for either version.
Re: [PHP-DEV] POSIX regex
Zeev Suraski wrote:
> Other than the theological views some people on this list have (either very pro-BC or anti-BC), what did keeping BC cost us?

Hey that must be me he is talking about - as I am a real theologian!

So for a theologian's 2c on Unicode:

1. Teaching unicode and PHP

As stated elsewhere I am *working* as a teacher. I follow this list for one *main* purpose and that is I am trying to remedy the extremely sad situation when it comes to books and other teaching material about PHP in Sweden. All books we have got by Swedish authors are so bad that I actively discourage people from reading them!

I am trying to write an advanced newbie book that will focus on PHP 6 (+ some HTML 5, CSS 3 and JS 2), with an emphasis on best practice.

In Sweden we can do nicely with iso-8859-1 (we do not even need the stinkin' euro-symbol!) But I have students that have developed websites in Arabic, Kurdish and Hindi! I am appalled to see some comments even seemingly questioning if Unicode is worthwhile at all. That's a no brainer! i18n is the next big move on the web.

But what technique would be easier to grasp when it comes to switching it on or off? Considering that PHP's main strength always has been its low entry barrier, I think this is a reasonable consideration. And maybe I am the only one on this list that deals daily with newbies...?

From this POV I would definitely say that it would be easier to teach that in PHP 6 unicode is always on and in PHP 5 it's N/A. I do however find the arguments compelling that such an ideal would be impractical.

My second best option would be something that can be turned on or off within the scripts, i.e. with ini_set or per directory with .htaccess.

From the low end user perspective I think this would be great from another POV. Let's imagine for a second that Wordpress will only work with unicode semantics off and that phpBB will only work with the switch on. What if someone would want to run both on a shared server?

But as my commit karma is zero I do not know if this is feasible at all.

2. User base.

There is not one voice on this list as far as I can tell that is from the CJK-language hemisphere. Is it part of the PHP way to be Europe/America ethnocentric? I think it would be a noble thing to actively try to engage PHP developers from Asia in this discussion. (Well, besides the Israeli ones... who *are* doing a great job!)

3. Adoption rate.

When PHP 5 was new we got two books in Sweden claiming to teach this version. When I read them there was so little PHP 5 in there that it was scary. Even today most resources that newbies read tend to teach PHP 4. Most discussion fora - at least in Sweden - discuss PHP 4 solutions to people's problems.

This spring I actually taught my students PDO - but then my wife got ill and had a heart transplant. When I got back to school and started grading my students' work, all but two had switched to the mysql extension. I asked why, and all said that they had found tutorials and help in a discussion forum, all teaching the old way.

I undertook a study: all four of the totally dominant sites in Sweden where a young developer would turn teach PHP 4. (Two of them also teach table-based-layout, unsemantic, inaccessible, proprietary HTML and obtrusive browser-sniffing old school DHTML.)

Conclusion: Every advance in PHP internally has to be communicated to us who teach PHP and the easier something is, the more likely it is that it will be picked up.

Lars Gunther
Re: [PHP-DEV] POSIX regex
> From the low end user perspective I think this would be great from another POV. Let's imagine for a second that Wordpress will only work with unicode semantics off and that phpBB will only work with the switch on. What if someone would want to run both on a shared server?

from httpd.conf
<Directory /var/www/example.org/www/phpbb>
  php_admin_flag unicode.semantics on
</Directory>
<Directory /var/www/example.org/www/wp>
  php_admin_flag unicode.semantics off
</Directory>

Code written to work in unicode.semantics=off can work in unicode.semantics=on. It just has to deal with functions that expect binary strings instead of PHP5 strings. Other side effects of unicode.semantics=on can be switched off without breaking backwards compatibility.

--
Tomas
Re: [PHP-DEV] POSIX regex
On Thu, 2007-07-19 at 14:27 +0200, Keryx Web wrote:
> one *main* purpose and that is I am trying to remedy the extremely sad situation when it comes to books and other teaching material about PHP in Sweden. All books we have got by Swedish authors are so bad that I actively discourage people from reading it!

Perhaps you should teach the students English? And encourage them to read English books which are widely available.. :D
I really thought most Swedes do learn English in school? Like we Finns do.. :)

> another POV. Let's imagine for a second that Wordpress will only work with unicode semantics off and that phpBB will only work with the switch on. What if someone would want to run both on a shared server?

Very good point.

--Jani
Re: [PHP-DEV] POSIX regex
On Thu, 2007-07-19 at 15:47 +0300, Tomas Kuliavas wrote:
>> From the low end user perspective I think this would be great from another POV. Let's imagine for a second that Wordpress will only work with unicode semantics off and that phpBB will only work with the switch on. What if someone would want to run both on a shared server?
>
> from httpd.conf
> <Directory /var/www/example.org/www/phpbb>
>   php_admin_flag unicode.semantics on
> </Directory>
> <Directory /var/www/example.org/www/wp>
>   php_admin_flag unicode.semantics off
> </Directory>

Hmm..I forgot that this works for ZEND_INI_SYSTEM type of options. Live and learn I guess. :)

Too bad it only works for Apache module.. ;)

--Jani
Re: [PHP-DEV] POSIX regex
> Too bad it only works for Apache module.. ;)

I think on Windows you can do something with the registry per-dir too. On unix there's no registry though. Maybe we need some generic solution to this (like for FastCGI users)? Anybody has good ideas?

--
Stanislav Malyshev, Zend Software Architect
[EMAIL PROTECTED] http://www.zend.com/
(408)253-8829 MSN: [EMAIL PROTECTED]
Re: [PHP-DEV] POSIX regex
On 7/19/07, Stanislav Malyshev [EMAIL PROTECTED] wrote:
>> Too bad it only works for Apache module.. ;)
>
> I think on Windows you can do something with the registry per-dir too. On unix there's no registry though. Maybe we need some generic solution to this (like for FastCGI users)? Anybody has good ideas?

Yes, merge htscanner (pecl) into the core (sapi hooks or something like that). Doing so will also kill the couple of limitations due to the init order in php.

It is on my todos, but I would appreciate any help :)

Cheers,
--Pierre
Re: [PHP-DEV] POSIX regex
Python did go down that road, but take a look at Python 3000 effort and you will see that what they are trying to do is exactly what we have: native Unicode strings, without prefixes.

-Andrei

On Jul 18, 2007, at 11:51 AM, Andi Gutmans wrote:
> Functions would work properly with Unicode, but you would explicitly create Unicode strings e.g. u"foobar". This is not uncommon practice and many other languages actually go down this route incl. Python and various versions of C++ frameworks.
Re: [PHP-DEV] POSIX regex
> Python did go down that road, but take a look at Python 3000 effort and you will see that what they are trying to do is exactly what we have: native Unicode strings, without prefixes.

Maybe still having u"" - that always produce unicode, regardless of semantics - could be helpful...

--
Stanislav Malyshev, Zend Software Architect
[EMAIL PROTECTED] http://www.zend.com/
(408)253-8829 MSN: [EMAIL PROTECTED]
RE: [PHP-DEV] POSIX regex
I don't like the idea of having a u prefix for Unicode strings. It may improve performance, and give you some level of fine grain control, but...

- It breaks your "keep php simple" policy by introducing a lot of new functions (ugly).
- I (plus a lot of others) have an existing php5 application which I wish to eventually use with Unicode, and like others, I don't want to spend time refactoring.
- It will also introduce bugs when programmers accidentally forget to add the u prefix when working with unicode.

If you always want to produce Unicode, I think it's best to always use a cast or a conversion function.

Eg
$str = (unicode)(strtoupper($str));
Or
$str = unicode_val(strtoupper($str));

My 2c :)

-----Original Message-----
From: Stanislav Malyshev [mailto:[EMAIL PROTECTED]
Sent: Friday, 20 July 2007 8:47 AM
To: Andrei Zmievski
Cc: Andi Gutmans; Derick Rethans; Lukas Kahwe Smith; Ilia Alshanetsky; [EMAIL PROTECTED]; internals@lists.php.net
Subject: Re: [PHP-DEV] POSIX regex

> Python did go down that road, but take a look at Python 3000 effort and you will see that what they are trying to do is exactly what we have: native Unicode strings, without prefixes.

Maybe still having u"" - that always produce unicode, regardless of semantics - could be helpful...

--
Stanislav Malyshev, Zend Software Architect
[EMAIL PROTECTED] http://www.zend.com/
(408)253-8829 MSN: [EMAIL PROTECTED]
Re: [PHP-DEV] POSIX regex
On 7/19/07, [EMAIL PROTECTED] [EMAIL PROTECTED] wrote:
> I don't like the idea of having a u prefix for Unicode strings. It may improve performance, and give you some level of fine grain control, but...
>
> - It breaks your "keep php simple" policy by introducing a lot of new functions (ugly).
> - I (plus a lot of others) have an existing php5 application which I wish to eventually use with Unicode, and like others, I don't want to spend time refactoring.
> - It will also introduce bugs when programmers accidentally forget to add the u prefix when working with unicode.
>
> If you always want to produce Unicode, I think its best to always use a cast or a conversion function.
>
> Eg
> $str = (unicode)(strtoupper($str));
> Or
> $str = unicode_val(strtoupper($str));
>
> My 2c :)

Yeah I also like that casting better than the u

$0.02 :P

> -----Original Message-----
> From: Stanislav Malyshev [mailto:[EMAIL PROTECTED]
> Sent: Friday, 20 July 2007 8:47 AM
> To: Andrei Zmievski
> Cc: Andi Gutmans; Derick Rethans; Lukas Kahwe Smith; Ilia Alshanetsky; [EMAIL PROTECTED]; internals@lists.php.net
> Subject: Re: [PHP-DEV] POSIX regex
>
>> Python did go down that road, but take a look at Python 3000 effort and you will see that what they are trying to do is exactly what we have: native Unicode strings, without prefixes.
>
> Maybe still having u"" - that always produce unicode, regardless of semantics - could be helpful...
>
> --
> Stanislav Malyshev, Zend Software Architect
> [EMAIL PROTECTED] http://www.zend.com/
> (408)253-8829 MSN: [EMAIL PROTECTED]

--
David Coallier,
Founder & Software Architect,
Agora Production (http://agoraproduction.com)
51.42.06.70.18
Re: [PHP-DEV] POSIX regex
On Jul 19, 2007, at 4:14 PM, [EMAIL PROTECTED] [EMAIL PROTECTED] wrote:
> I don't like the idea of having a u prefix for Unicode strings. It may improve performance, and give you some level of fine grain control, but...
>
> - It breaks your "keep php simple" policy by introducing a lot of new functions (ugly).
> - I (plus a lot of others) have an existing php5 application which I wish to eventually use with Unicode, and like others, I don't want to spend time refactoring.
> - It will also introduce bugs when programmers accidentally forget to add the u prefix when working with unicode.
>
> If you always want to produce Unicode, I think its best to always use a cast or a conversion function.
>
> Eg
> $str = (unicode)(strtoupper($str));
> Or
> $str = unicode_val(strtoupper($str));

Good idea and it will totally work, except that it won't. strtoupper() operates in different ways according to the type of the string that it gets.

-Andrei
RE: [PHP-DEV] POSIX regex
I don't really know much about unicode, and to be honest, I don't really know much about the internal workings of php. But I assume that there are going to be different implementations of string functions depending on whether the string is unicode or not.

I'm going to make an implementation suggestion... Keep in mind I haven't hacked around with php source, so my variable naming etc will be wrong... and it's all pseudocode, so it's not

    // The object type used when php creates a string
    class ZendString {
        char *strPtr; // however strings are stored in php
        ZendStringFunctions *pFunctions;
    };

    abstract class ZendStringFunctions {
        abstract function strtolower(ZendString *pStr);
        abstract function strtoupper(ZendString *pStr);
        abstract function substr(ZendString *pStr);
        // All functions that differ depending on unicode / non-unicode implementation
        // ...
    };

    // A set of string functions for unicode strings
    class ZendStringFunctionsUnicode {
        function strtolower(ZendString *pStr) {
            // unicode implementation
        }
        function strtoupper(ZendString *pStr) {
            // unicode implementation
        }
        function substr(ZendString *pStr) {
            // unicode implementation
        }
    };

    // A set of string functions for non-unicode strings
    class ZendStringFunctionsNonUnicode {
        function strtolower(ZendString *pStr) {
            // non-unicode implementation
        }
        function strtoupper(ZendString *pStr) {
            // non-unicode implementation
        }
        function substr(ZendString *pStr) {
            // non-unicode implementation
        }
    };

    // the strtolower implementation
    ZEND_FUNC strtolower(ZendString *pStr) {
        return pStr->pFunctions->strtolower(pStr);
    }

    // the strtoupper implementation
    ZEND_FUNC strtoupper(ZendString *pStr) {
        return pStr->pFunctions->strtoupper(pStr);
    }

    ZEND_FUNC unicode_val(ZendString *pStr) {
        // do something with pStr->strPtr
        delete pStr->pFunctions;
        pStr->pFunctions = new ZendStringFunctionsUnicode();
    }

Anyway - the point I'm trying to make is to use function pointers to switch between implementations.
You could even make the ZendStringFunctions singletons and just set pStr->pFunctions to an instance of the singleton. I think this would provide a very fast implementation of what is trying to be done.

I'm just making a suggestion, and feel free to ignore/criticise me if I'm wrong. I don't know anything about PHP's internals... Just an idea

Scott

-Original Message-
From: Andrei Zmievski [mailto:[EMAIL PROTECTED]
Sent: Friday, 20 July 2007 9:36 AM
To: [EMAIL PROTECTED]
Cc: internals@lists.php.net
Subject: Re: [PHP-DEV] POSIX regex

On Jul 19, 2007, at 4:14 PM, [EMAIL PROTECTED] [EMAIL PROTECTED] wrote:

I don't like the idea of having a u prefix for Unicode strings. It may improve performance, and give you some level of fine grain control, but...

- It breaks your "keep php simple" policy by introducing a lot of new functions (ugly).
- I (plus a lot of others) have an existing php5 application which I wish to eventually use with Unicode, and like others, I don't want to spend time refactoring.
- It will also introduce bugs when programmers accidentally forget to add the u prefix when working with unicode.

If you always want to produce Unicode, I think it's best to always use a cast or a conversion function. Eg

$str = (unicode)(strtoupper($str));

Or

$str = unicode_val(strtoupper($str));

Good idea and it will totally work, except that it won't. strtoupper() operates in different ways according to the type of the string that it gets.

-Andrei

-- PHP Internals - PHP Runtime Development Mailing List To unsubscribe, visit: http://www.php.net/unsub.php
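Scott's function-pointer idea, including the "singleton" variant, can be sketched in plain C as a table of function pointers that every string points at. This is only an illustration under stated assumptions: `my_string`, `str_ops`, `my_strtoupper` and `my_unicode_val` are hypothetical names, none of them are real Zend Engine types or functions, the buffer is assumed to already hold UTF-8 so that switching modes only swaps the table, and the "Unicode" routine is a toy that knows nothing beyond ASCII and the two-byte range U+00E0..U+00FE.

```c
/* All names here are hypothetical; this is not the Zend Engine API. */
struct str_ops;

typedef struct {
    char *buf;                  /* NUL-terminated bytes, assumed UTF-8 */
    const struct str_ops *ops;  /* shared table, never freed */
} my_string;

struct str_ops {
    const char *kind;
    void (*to_upper)(my_string *s);
};

/* Binary implementation: only ASCII a-z is changed. */
static void binary_to_upper(my_string *s)
{
    for (char *p = s->buf; *p != '\0'; p++) {
        if (*p >= 'a' && *p <= 'z')
            *p = (char)(*p - 'a' + 'A');
    }
}

/* Toy Unicode implementation: ASCII plus U+00E0..U+00FE (except U+00F7),
 * which UTF-8 encodes as 0xC3 followed by 0xA0..0xBE. The upper-case
 * forms are the same lead byte followed by 0x80..0x9E. */
static void unicode_to_upper(my_string *s)
{
    unsigned char *p = (unsigned char *)s->buf;
    while (*p != '\0') {
        if (*p >= 'a' && *p <= 'z') {
            *p = (unsigned char)(*p - 'a' + 'A');
        } else if (*p == 0xC3 && p[1] >= 0xA0 && p[1] <= 0xBE && p[1] != 0xB7) {
            p[1] = (unsigned char)(p[1] - 0x20);
            p++;                /* skip the continuation byte */
        }
        p++;
    }
}

/* One static table per implementation plays the role of the singleton. */
static const struct str_ops BINARY_OPS  = { "binary",  binary_to_upper };
static const struct str_ops UNICODE_OPS = { "unicode", unicode_to_upper };

/* The public entry point just forwards through the table. */
void my_strtoupper(my_string *s)
{
    s->ops->to_upper(s);
}

/* Switching a string to Unicode behaviour is a pointer assignment. */
void my_unicode_val(my_string *s)
{
    s->ops = &UNICODE_OPS;
}
```

Because the tables are static and shared, there is nothing to allocate or delete when a string changes mode, which is the cheap part of the proposal. The sketch says nothing about the cost of actually converting the stored bytes, which the pseudocode leaves as "do something with pStr->strPtr".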
RE: [PHP-DEV] POSIX regex
Sorry if you are using outlook, turn off the thing that says Extra line breaks in this message were removed at the top of my previous message.

Scott

-- PHP Internals - PHP Runtime Development Mailing List To unsubscribe, visit: http://www.php.net/unsub.php
Re: [PHP-DEV] POSIX regex
Yeah I also like that casting better than the u It's different things. Casting means create string as binary, then in runtime cast it to unicode, u means this string is unicode. -- Stanislav Malyshev, Zend Software Architect [EMAIL PROTECTED] http://www.zend.com/ (408)253-8829 MSN: [EMAIL PROTECTED] -- PHP Internals - PHP Runtime Development Mailing List To unsubscribe, visit: http://www.php.net/unsub.php
Re: [PHP-DEV] POSIX regex
On 7/19/07, Stanislav Malyshev [EMAIL PROTECTED] wrote:

Yeah I also like that casting better than the u

It's different things. Casting means create string as binary, then in runtime cast it to unicode, u means this string is unicode.

You are right that casting means string -> binary, and that runtime cast to unicode means a string is unicode. However, after speaking with many php developers (not internals), the same answer always comes up: "It's ugly."

Does that simply mean that it's ugly? I believe not, it means that it's also unreadable, unclear at first look, and easy to overlook.

One solution that I could foresee would be to recognize (unicode) within a function call. Ex:

strlen( (unicode) "Óglaig");

This would runtime-cast "Óglaig" to a unicode string.

Expected answers:

1) I don't find it to be useful and better than u
A: It is more readable, easier to find/notice and simply cleaner.

2) No
A: Ok..

3) It's going against the usual casting standard of (type)
A: True

The decision probably has been made already and if so just let me know and I'll stop trying to raise a voice for the community :P

And no, I do not have a patch ;-)

-- Stanislav Malyshev, Zend Software Architect [EMAIL PROTECTED] http://www.zend.com/ (408)253-8829 MSN: [EMAIL PROTECTED]

-- David Coallier, Founder Software Architect, Agora Production (http://agoraproduction.com) 51.42.06.70.18

-- PHP Internals - PHP Runtime Development Mailing List To unsubscribe, visit: http://www.php.net/unsub.php
Re: [PHP-DEV] POSIX regex
On 7/18/07, Zeev Suraski [EMAIL PROTECTED] wrote: Pierre, I wanted to send my 2c even though I'm not really involved in internals@ any longer - because in reality it doesn't really have much to do with such decisions. internals@ makes decisions that effect the entire PHP userbase. We all need to remember that the people on this mailing list are not close to something that represents the userbase. We do have some very opinionated people on this list, some of them with a lot of commit-karma - which are not very open to feedback from regular users. I'm not saying I represent the PHP userbase, and I don't think Andi is saying this either - but both of us try to take the end user's view when we think about stuff like this, as opposed as the internals@ PHP developer view. I would go as far as saying that I think we do it (as well as some others, like Rasmus) more so than some others on this list. For that reason I suspect that if you moved the discussion to, say, php-general - you'd see a much more balanced view of the world. Unfortunately it will probably not be very manageable. Something more practical would be trying to think about things from the end users perspective as opposed to our perspective as the developers and maintainers of the language. Finally, at the risk of sounding like a broken record, we always need to remember that BC breakage accumulates, and it's not binary. Every cleanup we do in PHP 6 will further slow migration, and as Andi pointed out a few days ago, things don't look too well as it is. As for ereg - especially in light of the discontinuation of PHP 4 we shouldn't even consider removing it in PHP 5. I agree with Andi that I'm not sure it's a good idea for PHP 6 either, but I'm not sure it isn't either. As long as it's easy enough to turn it back on (i.e. have it bundled but disabled) I think it's not unreasonable. 
My answer to Andi was not only about ereg but about php6 in general (the unicode flag being a much more important problem than ereg, for example).

I fully agree with you. Each individual here does not represent the user base but only a relatively small part. However, my problem here is not about that but about the respect of our voices. It is understandable that you think you have a brighter customer base, but that is not necessarily the case, neither historically nor practically. Conference attendees are also a very small part of our users. All in all, internals developers, with their customers, coworkers or users (Ez, PEAR, linux package maintainers, etc.) do represent what I consider a good representation of what our users are or would like to have.

About the migration path, we should not forget our PHP5 lessons. What Andi is trying to do is what was done with PHP5. Many cleanups were not done for the sake of avoiding BC breaks and migration troubles. We know now that it does not matter. Users migrate when they have to or need to, not just for the fun of it.

Finally, you are right to say that an opinion has little to do with the commit karma.

Cheers, --Pierre

-- PHP Internals - PHP Runtime Development Mailing List To unsubscribe, visit: http://www.php.net/unsub.php
RE: [PHP-DEV] POSIX regex
On Tue, 17 Jul 2007, Andi Gutmans wrote:

Hmm I don't quite understand what bad code vs. good code plays here. Wordpress is one of the most popular applications out there so it's got huge value to our community. I bet there's a huge amount of PHP applications whose source code is of the same quality or worse. Anyway, the issues I have seen would also be relevant to what you call good code but again, when it comes to compatibility, I don't quite know why that will play a big role.

I am talking about porting to both unicode_semantics=off and on. This will give us a good understanding of the difference of the modes and where we're at. I bet most people who are voicing their opinions have actually not tried to write a sizeable application with PHP 6 and also tried to run an existing one on PHP 6 (unicode_semantics=on).

I hope you are not suggesting to port them to both modes? Why on earth should an application support both unicode=off and unicode=on? That's exactly the thing that some of us are so afraid of and want to prevent, as this just annoys more and more PHP users that have to deal with this stuff. And as mentioned before, having both modes is *way* worse than having to deal with register_globals on/off or magic_quotes, as those two cases could at least be handled in user space.

regards, Derick

-- Derick Rethans http://derickrethans.nl | http://ez.no | http://xdebug.org

-- PHP Internals - PHP Runtime Development Mailing List To unsubscribe, visit: http://www.php.net/unsub.php
Re: [PHP-DEV] POSIX regex
On Tue, 17 Jul 2007, Stanislav Malyshev wrote: that would actually benefit quite a bit from unicode support, but I guess you are talking about porting with unicode==off, right? unicode=off doesn't mean no unicode support, btw. Of course that's what it means, as none of the string functions work properly with unicode if you turn it off. And that's just the whole selling point of Unicode support. Derick -- PHP Internals - PHP Runtime Development Mailing List To unsubscribe, visit: http://www.php.net/unsub.php
Re: [PHP-DEV] POSIX regex
On Wed, 2007-07-18 at 10:20 +0200, Derick Rethans wrote: On Wed, 18 Jul 2007, Zeev Suraski wrote: As for ereg - especially in light of the discontinuation of PHP 4 we shouldn't even consider removing it in PHP 5. I don't think anybody wanted to remove it in PHP 5 - just make it possible to disable as an extension. I guess it was misunderstood: All the talk about it concerns HEAD only, not PHP 5. But I will MFH the move to ext in PHP_5_3 though. Helps future merges around when the changes are in both branches. --Jani -- PHP Internals - PHP Runtime Development Mailing List To unsubscribe, visit: http://www.php.net/unsub.php
Re: [PHP-DEV] POSIX regex
On Tue, 17 Jul 2007, Lukas Kahwe Smith wrote:

Andi Gutmans wrote:

There are clear things we want to change (like register_globals) because we believe that ultimately they have a significant benefit to our users with controllable downside (there is an easy one line workaround which we can document for people to get their old apps to work). There are other areas where breaking BC makes sense. But saying we should just break it across the board and not even consider having a good upgrade path for our users is unreasonable. I believe we can have a very good PHP 6, which is pretty much in sync with many of your feelings, but that provides a well documented and reasonable upgrade path (unlike VB -> VB.NET).

I never said we should break BC just for the hell of it. The goal must be that PHP6 feels and behaves like PHP. It's not about hijacking PHP to come up with the language we all wanted instead.

So let's not oversimplify this situation. We have to continue to make trade-offs.

Sure, but you are suggesting to delay decisions indefinitely. Either you are saying this because you already decided that you don't want this change,

Doh, isn't that obvious?

regards, Derick

-- Derick Rethans http://derickrethans.nl | http://ez.no | http://xdebug.org

-- PHP Internals - PHP Runtime Development Mailing List To unsubscribe, visit: http://www.php.net/unsub.php
Re: [PHP-DEV] POSIX regex
At 00:55 18/07/2007, Pierre wrote: My answer to Andi was not only about ereg but php6 in general (the unicode flag being a much more important problem that ereg, for example). I fully agree with you. Each individual here does not represent the user base but only a relative small part. However, my problem here is not about that but about the respect of our voices. It is understandable that you think to have a brighter customers base, it is not necessary the case. not historically and not practically. Conferences attendees are also a very small part of our users. All in all, internals developers, with their customers, coworkers or users (Ez, PEAR, linux package maintainers, etc.) do represent what I consider as a good representation of what our users are or like to have. I think that they're still quite far away from a real coverage of the entire userbase. Each of them sees a certain part of the userbase through a different prism. I think that some of us get to see people through some more prisms than others, and you may very well be one of them - but they are still prisms, and I *think* that most of us don't get to meet some of the lower 'average' developers. The ones that don't respond to blogs, go to conferences, let alone participate in [EMAIL PROTECTED] The ones who constitute the vast majority of PHP developers around the world - those using it to get their job done. If you noticed, I didn't just speak about the users that I meet, but trying to put myself in the average user's place using a simple thought experiment. I think using this approach (the famous 'WTF factor' is a part of that) helped PHP tremendously and was one of the key reasons for its success. 
That's why I'm pretty confident you'd get a very different (much more balanced) view of the world if you ask the question in a more neutral environment - such as php-general (and even that list arguably includes people with above-average interest in PHP - given that we're talking about millions of developers and only thousands of subscribers). Can I realize, from an end-user's point of view, why the removal of a certain feature that I'm using would help me? Or will it be much easier for me to imagine the pain involved with working around it? Other than the theological views some people on this list have (either very pro-BC or anti-BC), what did keeping BC cost us? About the migration path, we should not forget our PHP5 lessons. All Andi is trying to do was what was done with PHP5. Many cleanups have not been done for the sake of BC breaks and migration troubles. We know now that it does not matter. Users migrate when they have to or need to not just for the fun of it. I think we're learning very different lessons from the same facts. PHP 5 migration stalled because of several reasons, the key of which are (IMHO): 1. Misperception about the level of compatibility breakage. 2. Correct perception that moving to PHP 5 requires a full QA cycle of your entire codebase with full code coverage (assuming you're running a critical app that you can't afford to break, which needless to say thousands and thousands of users do); And contrary to popular belief, that's actually a very very big deal. In the shared hosting arena there's supposedly also lack of support for PHP 5 deployment, although the big hosters I've been in touch with have provided PHP 5 support (as an option) a couple of months after its release, so I'm not sure how much this had to do with it. Is the lesson we should learn that we need to turn #1 into a correct perception, requiring substantial changes and potentially a full code audit, and make the migration much more difficult? 
Would we ever be able to discontinue PHP 5 if migrating to PHP 6 is a truly tough task, like we just did with PHP 4? The less undue compatibility breakage we introduce the better. I hope we can agree on that - turning the discussion into what's exactly 'due' and what is 'undue'. IMHO - if we remove the unicode=off mode, we'll have to support PHP 5 (unlike we supported PHP 4 with bugfixes only for the most part - but with true backporting of all key features, apps frameworks running properly on both versions, etc.) or seriously risk losing our userbase. Given that we managed to nail it fairly well already, I can't understand why we would want to do that and increase the chances of PHP 6 being a flop quite significantly. Zeev -- PHP Internals - PHP Runtime Development Mailing List To unsubscribe, visit: http://www.php.net/unsub.php
Re: [PHP-DEV] POSIX regex
At 01:20 18/07/2007, Derick Rethans wrote:

This sounds like a broken record, this sounds like a broken record, this sounds like a broken record. I've heard this so many times now, it gets boring.

I'm not surprised, but it doesn't change the fact that it's true, though. No matter how many times this will be discussed or disputed, the more we break, the harder it is for our users to move. It's an axiom, and we have to live with it, even if it gets easy to repress it and take all sorts of opportunities for an end-of-the-season compatibility breakage sale.

You seem to think that none of the people on the internals list are part of the user base - that is incorrect. Most of my opinions come forth out of my involvement with an extremely large code base.

I didn't say that. I did say that they (myself included) don't represent the PHP userbase at large, and I fully stand behind that statement. Read my other post from a couple of minutes ago for an explanation as to what I mean.

I'm not saying I represent the PHP userbase, and I don't think Andi is saying this either - but both of us try to take the end user's view when we think about stuff like this, as opposed to the internals@ PHP developer view. I would go as far as saying that I think we do it (as well as some others, like Rasmus) more so than some others on this list.

Regarding the unicode on/off modes, I don't think you put yourself in the developer's view at all. Users are not going to be better off having to deal with both modes.

Well, I tend to agree with you that they shouldn't have to handle BOTH modes (write code that works with both settings). But they will definitely be better off if they can choose one of these modes and develop/deploy for it. For someone for whom PHP 6 is a non-item (no interest in Unicode), moving to PHP 6 and being forced to audit his code will be a completely unreasonable cost of migration. A clear 'not worth it' situation.
For that reason I suspect that if you moved the discussion to, say, php-general - you'd see a much more balanced view of the world. I really doubt that, as that list does not include many people that use PHP for internal projects. It's mostly the geeks that have time to discuss on the list. I know that *many* PHP users don't either know about this list, or simply can't be bothered with it. You know what, I agree. I wrote something to that effect in my post from a few minutes ago. The vast userbase is mostly comprised of people we hardly even get to see. As for ereg - especially in light of the discontinuation of PHP 4 we shouldn't even consider removing it in PHP 5. I don't think anybody wanted to remove it in PHP 5 - just make it possible to disable as an extension. Great, I misunderstood. Zeev -- PHP Internals - PHP Runtime Development Mailing List To unsubscribe, visit: http://www.php.net/unsub.php
Re: [PHP-DEV] POSIX regex
Derick Rethans wrote:

Regarding the unicode on/off modes, I don't think you put yourself in the developer's view at all. Users are not going to be better off having to deal with both modes.

Have you guys really thought this through? Let's look at this from two angles.

First, from our perspective maintaining and developing PHP. Without the Unicode switch, and as has already been suggested, PHP 5 will never die. Anything new in PHP 6 that isn't related to Unicode will be backported to PHP 5. Or, a slight variation of that, any developer with no interest in Unicode will only work on the PHP 5 branch and not bother worrying about whether it works in PHP 6, forcing others to do that work. I don't think we have the resources to do this, and I think it is likely to either create 2 classes of developers and potentially diverging trees, or it may simply kill off the Unicode effort altogether if not enough developers bother looking at PHP 6, since PHP 5 will live forever and is free of all this annoying Unicode stuff that is just too complicated to deal with.

Second, from the user space PHP developers' perspective. There are two groups of those out there. There is the group that builds apps for controlled environments: Yahoo, Facebook, and the hundreds, if not thousands, of smaller companies out there that will define a certain PHP configuration and code against that. To them such a switch isn't a big deal except when it comes to re-using external code. Which brings us to the second group, which is the group that strives to build portable apps designed to run on as many unknown PHP configs as possible. This is the group that will get hit by this, and here is where we need to figure out how to cause them the least amount of pain. They are going to feel some pain in order to get their heads around Unicode no matter how we handle this.

For the portion of these folks who don't want to worry about Unicode at all and who actually have code that does stuff on binary strings that will break, their stuff just won't work no matter what we do. The difference comes down to whether it gets marked as PHP5-only or it gets marked as non-Unicode-only. And the other camp, who do want to make sure their stuff supports Unicode, will need to write the Unicode and non-Unicode versions and check to see if the system they are running on supports Unicode or not. Whether they check the PHP version number, or the Unicode switch, or probe directly for the features they need, it ends up being about the same amount of pain.

What may be somewhat lost in all this, that I hope nobody here is forgetting, is that smooth Unicode support is really important. Being able to work directly in your native charset with your native strings without having to deal with iconv and other crap is the goal here. And let's also not forget that a lot of code will actually work unchanged in PHP 6 Unicode-mode and suddenly be Unicode-capable where it wasn't before. I would love to see all this energy put toward making sure as much code as possible falls into this category instead of arguing about where to put the Unicode switch. It's still a switch whether you put it in the version number or in the .ini file. In the version number it is simply easier for people to ignore from all sides of the discussion here, but where does that leave us 4 years from now?

Perhaps the real argument here is whether we should be doing Unicode at all?

-Rasmus

-- PHP Internals - PHP Runtime Development Mailing List To unsubscribe, visit: http://www.php.net/unsub.php
Re: [PHP-DEV] POSIX regex
Rasmus Lerdorf wrote:

Perhaps the real argument here is whether we should be doing Unicode at all?

I've watched this debate with tremendous interest. i18n is one of my pure 'hobbies' (my 'clients' are all quite happy with ISO-8859-1, and one of my backgrounds is WinNT where everything became unicode within the OS.)

I'm pondering if utf-8 as the 'default' encoding wouldn't have been a more effective approach than pure unicode wide-chars, but no matter how you slice this, there will be several points of pain in the transition. Rethinking in terms of utf-8 might be an interesting exercise, just to draw up a comparison of 'what is broken' when sliding between a PHP5 ISO charset and a PHP6 Unicode or utf-8 charset.

-- PHP Internals - PHP Runtime Development Mailing List To unsubscribe, visit: http://www.php.net/unsub.php
Re: [PHP-DEV] POSIX regex
Rasmus Lerdorf wrote: Derick Rethans wrote: Regarding the unicode on/off modes, I don't think you put yourself in the developer's view at all. Users are not going to be better of having to deal with both modes. Have you guys really thought this through? Let's look at this from two angles. First, from the our perspective maintaining and developing PHP. Without the Unicode switch, and as has already been suggested, PHP 5 will never die. Anything new in PHP 6 that isn't related to Unicode will be backported to PHP 5. Or, a slight variation of that, any developer with no interest in Unicode will only work on the PHP 5 branch and not bother worrying about whether it works in PHP 6 forcing others to do that work. I don't think we have the resources to do this, and I think it is likely to either create 2 classes of developers and potentially diverging trees, or it may simply kill off the Unicode effort altogether if not enough developers bother looking at PHP 6 since PHP 5 will live forever and is free of all this annoying Unicode stuff that is just too complicated to deal with. Second, from the user space PHP developers' perspective. There are two groups of those out there. There is the group that builds apps for controlled environments. Yahoo, Facebook, and the hundreds, if not thousands of smaller companies out there that will define a certain PHP configuration and code against that. To them such a switch isn't a big deal except when it comes to re-using external code. Which bring us to the second group which is the group that strives to build portable apps designed to run on as many unknown PHP configs as possible. This is the group that will get hit by this, and here is where we need to figure out how to cause them the least amount of pain. They are going to feel some pain in order to get their heads around Unicode no matter how we handle this. 
For the portion of these folks who don't want to worry about Unicode at all and they actually have code that does stuff on binary strings that will break, their stuff just won't work no matter what we do. The difference comes down to whether it gets marked as PHP5-only or it gets marked as non-Unicode-only. And the other camp who do want to make sure their stuff supports Unicode will need to write the Unicode and non-Unicode versions and check to see if the system they are running on supports Unicode or not. Whether they check the PHP version number, or the Unicode switch, or probe directly for the features they need, it ends up being about the same amount of pain. What may be somewhat lost in all this, that I hope nobody here is forgetting, is that smooth Unicode support is really important. Being able to work directly in your native charset with your native strings without having to deal with iconv and other crap is the goal here. And let's also not forget that a lot of code will actually work unchanged in PHP 6 Unicode-mode and suddenly be Unicode-capable where they weren't before. I would love to see all this energy put toward making sure as much code as possible falls into this category instead of arguing about where to put the Unicode switch. It's still a switch whether you put it in the version number or in the .ini file. In the version number it is simply easier for people to ignore from all sides or the discussion here, but where does that leave us 4 years from now? I guess the question (which I am unable to answer) is if its easier to maintain PHP6 with the switch or be forced to backport to PHP5 without the switch in PHP6. If it does end up that a lot of devs prefer to work on PHP5 and as a result PHP6 is left dangling, I wonder if with the switch things will be any easier as devs will work/test only the non unicode side of things? 
I think this was the key point that was brought up: that it will not be easier, and that handling all the ifs in a single tree was deemed more error-prone than having a clean separation. Also, I wonder how a Unicode on/off switch will be handled on the documentation side. Having the switch would add more permutations to the documentation. From my understanding, the situation is already fairly non-trivial in how to handle all the version-dependent differences. Philipp, what's your take on this? regards, Lukas
Re: [PHP-DEV] POSIX regex
On Wed, 2007-07-18 at 02:42 -0700, Rasmus Lerdorf wrote:
> What may be somewhat lost in all this, that I hope nobody here is forgetting, is that smooth Unicode support is really important. Being

Smooth it will be only if it's the only option. Otherwise it's just a PITA for both camps. I'm all for Unicode support as long as it's always there.

> where to put the Unicode switch. It's still a switch whether you put it in the version number or in the .ini file. In the version number it is simply easier for people to ignore from all sides of the discussion here, but where does that leave us 4 years from now?

With a bone in hand? ;) Or most likely with an actually working PHP with full Unicode support rather than a half-assed one. Why not just rename the beast to uPHP. :D

--Jani
Re: [PHP-DEV] POSIX regex
On Wed, 18 Jul 2007, Zeev Suraski wrote:
> ... You know what, I agree. I wrote something to that effect in my post from a few minutes ago. The vast userbase is mostly comprised of people we hardly even get to see.

Sorry to chime in on this already long thread with my -negative- commit karma, but I really want to show support for the extremely sensible and considerate position of Zeev.

I tend to consider myself a not-so-average PHP user, not because of my self-assessed superior coding skills, but because most of my (ex-)coworkers, both developers and sysadmins involved in web applications, have zero interest in any of the PHP mailing lists, conferences or similar. They need to get a job done, have very limited resources for it and absolutely no time at all to improve their knowledge. They use PHP because 1) it's easy and 2) other people use it. They can read English, but with some difficulty, so their main source of information is blogs from the Italian PHP community. They might keep running applications on PHP 4.04 (pl1!!!) because the original coder left the project years ago, doing proper QA on an application you have not written is a huge effort, and migration is a risk they cannot really even assess.

Now, I can sneer at them all I want; the fact remains that they are part of the user base, and have no fewer rights than I do to get the best solution that can be served to them. And I do not think mindless BC breakage is a thing they like.

IMHO, a lesson to be learned from the slow transition to PHP 5 is to really focus on communication. The big changes were written on the walls, but many small ones were not. And QA is needed almost exclusively to catch the small ones (for the big ones, you have the coder fix it upfront). Some examples of things I stumbled upon include: objects not being copied on assignment (it was really documented, in the cabinet after the 'beware of the leopard' sign), *curl_version suddenly returning an array instead of a string, and others...

Of course most users will migrate only when they feel the need for it anyway, but the more obstacles are put in their path, the slower the adoption rate will be.

Keeping the 'unicode off' switch is a kind of double-edged sword: it eases life for people developing for intranets (they can migrate to PHP 6 with unicode off and be fine), but might backfire on framework/library developers, who will have to code for two environments...

Maybe the only solution is making it easier to run different versions of PHP in parallel?

my .2euros
Gaetano *
Re: [PHP-DEV] POSIX regex
On Wed, 2007-07-18 at 12:23 +0200, Gaetano Giunta wrote:
> Maybe the only solution is making it easier to run different versions of php in parallel?

It's already easy and possible. Please don't start that discussion or spread the FUD that it isn't.

--Jani
Re: [PHP-DEV] POSIX regex
Hi Zeev,

On Wed, 2007-07-18 at 01:58 -0700, Zeev Suraski wrote:
> > Regarding the unicode on/off modes, I don't think you put yourself in the developer's view at all. Users are not going to be better off having to deal with both modes.
>
> Well, I tend to agree with you that they shouldn't have to handle BOTH modes (write code that works with both settings). But they will definitely be better off if they can choose one of these modes and develop/deploy for it. For someone for whom PHP 6 is a non-item (no interest in Unicode), moving to PHP 6 and being forced to audit his code will be a completely unreasonable cost of migration. A clear 'not worth it' situation.

The question here, in my opinion, is: how much harm should we do to users who develop new things in order to make life simpler for those who need BC?

The first thing I see is: having these two modes is a PITA for everybody who wants to write portable code. PHP acts differently depending on that switch, and some parts of it work quite differently. Some of these changes can be worked around quite simply, others not that easily, although it is still possible (since the engine still knows Unicode and you can still make it all think there's some more Unicode in there). But for a new application it's IMO bad to need such compatibility hacks. If you want clean code you might concentrate on one of these two modes, but which? The faster or the better? Well, that depends on what hosters will configure, but how should they know?

For hosters it's hard to decide which road to go. Offer both? Offering both is, in terms of complexity, the same as hosting PHP 5 and PHP 6, since unicode.semantics is PHP_INI_SYSTEM, meaning you need independent PHP instances (FastCGI, individual hosts, whatever). Another possibility is offering just PHP 6 with unicode.semantics Off. In my opinion a hoster doing that should not advertise offering PHP 6, since with that mode off it's only offering half of PHP 6 (namespaces, goto, maybe LSB, ...). Or offer PHP 6 + Unicode and PHP 5 for BC. To me this feels like the sanest way in terms of BC: on the one hand you have full BC by using PHP 5, on the other hand you're offering full PHP 6 for the ones who need this feature.

Talking about BC: except for the Unicode stuff, PHP 6 will most likely have around the same amount of BC breaks as PHP 5 had compared to PHP 4 (there are already a few tiny ones, like you can't call your functions goto anymore and such stuff). PHP 5 offers a compatibility mode for PHP 4; the benefit of that mode, compared to PHP 6's BC mode, was that one might change it even at runtime, which might help with doing the migration (while making the code ugly, but hopefully such hacks are temporary).

Another argument for that setting I read was performance. I didn't do proper benchmarks of the code comparing both modes, so I don't know how relevant the impact is. But if performance of the Unicode mode really is a big problem for most users, we are really going to have a big problem, since then we have to keep the mode forever. And I, who can really live with using ISO-8859-1, am wondering whether it really makes sense to change half the engine for a mode which is too slow for most cases and only needed by a minority of users (some mentioned numbers in these discussions like 10% Unicode mode on, 90% off ...), and whether it wouldn't be better to concentrate on the intl and mbstring extensions to improve the tools for the ones needing better support in this area without harming most users. But well, as said: here I'm just wondering after reading the previous discussions.

All this leads me to the conclusion that we really should consider removing the mode, but well, that's my opinion.

> > > As for ereg - especially in light of the discontinuation of PHP 4 we shouldn't even consider removing it in PHP 5.
> >
> > I don't think anybody wanted to remove it in PHP 5 - just make it possible to disable as an extension.
>
> Great, I misunderstood.
This gives me the possibility to come back to the original topic of this thread, which wasn't about the unicode.semantics mode. Since I think we should remove that setting, I think we should disable ereg with PHP 6, since as far as I know ereg won't work with Unicode data. Regular expressions which won't work on the main data type are pointless in my opinion. Besides that, there are two other reasons I see:
- ereg functions have been marked as deprecated for ages, so users should be prepared
- ereg functions aren't binary safe; most cases where I've seen them were most likely insecure, since people didn't know you can bypass ereg-based input checking by inserting null bytes, so removing these helps writing more secure code

In most cases a workaround, by PHP_Compat or something, can be offered by escaping slashes in the pattern, adding slashes as delimiters and giving that to preg. This won't work in all cases but I'm sure it works in most cases.

Ah, another thing kind of related to this
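The workaround described in this message (escape the slashes, add slash delimiters, hand the result to preg) and the null-byte problem can be sketched roughly as follows. This is only an illustration written in Python, not PHP_Compat code: `ereg_to_preg` and `c_string_view` are hypothetical helper names, the truncation at the first NUL is a simplified model of a matcher that is not binary safe, and Python's `re` module merely stands in for a binary-safe regex engine.

```python
import re


def ereg_to_preg(pattern, case_insensitive=False):
    """Wrap a POSIX-style pattern in '/' delimiters for a preg_* call.

    Slashes that are not already backslash-escaped are escaped so they
    cannot end the delimited pattern early. Hypothetical helper.
    """
    out = []
    escaped = False  # True right after an unescaped backslash
    for ch in pattern:
        if escaped:
            out.append(ch)
            escaped = False
        elif ch == "\\":
            out.append(ch)
            escaped = True
        elif ch == "/":
            out.append("\\/")
        else:
            out.append(ch)
    flags = "i" if case_insensitive else ""  # eregi() would map to the 'i' modifier
    return "/" + "".join(out) + "/" + flags


def c_string_view(subject):
    """Model a non-binary-safe matcher: it only sees data before the first NUL."""
    return subject.split("\x00", 1)[0]


if __name__ == "__main__":
    print(ereg_to_preg("^[a-z]+/[0-9]+$"))
    print(ereg_to_preg("^yes$", case_insensitive=True))

    check = re.compile(r"^[a-z]+$")
    payload = "abc\x00<script>"
    # The truncated view passes the "letters only" check; the full string does not.
    print(bool(check.search(c_string_view(payload))))
    print(bool(check.search(payload)))
```

As the message says, a mechanical rewrite like this cannot cover every case. For example, a backslash inside a POSIX bracket expression is a literal character, while the sketch treats every backslash as an escape.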
Re: [PHP-DEV] POSIX regex
Zeev Suraski wrote:
> Finally, at the risk of sounding like a broken record, we always need to remember that BC breakage accumulates, and it's not binary. Every cleanup we do in PHP 6 will further slow migration, and as Andi pointed out a few days ago, things don't look too well as it is.

Agreed, it's not binary, but it's also not the simple addition of all issues either. The effort does diminish, as you can cover multiple BC breaks in one pass over your code. The key thing that we screwed up with PHP 5.x was not providing enough documentation on the BC breaks. Doing this better this time (the migration guides are a good start, porting some major apps and documenting the issues is another) could help us ease the transition as well. But as you point out, there is the fixed overhead of having to do the QA at any rate.

regards,
Lukas
Re: [PHP-DEV] POSIX regex
On 7/18/07, Lukas Kahwe Smith [EMAIL PROTECTED] wrote:
> Zeev Suraski wrote:
> > Finally, at the risk of sounding like a broken record, we always need to remember that BC breakage accumulates, and it's not binary. Every cleanup we do in PHP 6 will further slow migration, and as Andi pointed out a few days ago, things don't look too well as it is.
>
> Agreed, it's not binary, but it's also not the simple addition of all issues either. The effort does diminish, as you can cover multiple BC breaks in one pass over your code. The key thing that we screwed up with PHP 5.x was not providing enough documentation on the BC breaks. Doing this better this time (the migration guides are a good start, porting some major apps and documenting the issues is another) could help us ease the transition as well. But as you point out, there is the fixed overhead of having to do the QA at any rate.

What we really screwed up are the breakages _after_ 5.0, between 5.0 and now. Everyone expects changes and breakages between two major versions, no matter the language.

--Pierre
Re: [PHP-DEV] POSIX regex
Pierre wrote:
> On 7/18/07, Lukas Kahwe Smith [EMAIL PROTECTED] wrote:
> > Zeev Suraski wrote:
> > > Finally, at the risk of sounding like a broken record, we always need to remember that BC breakage accumulates, and it's not binary. Every cleanup we do in PHP 6 will further slow migration, and as Andi pointed out a few days ago, things don't look too well as it is.
> >
> > Agreed, it's not binary, but it's also not the simple addition of all issues either. The effort does diminish, as you can cover multiple BC breaks in one pass over your code. The key thing that we screwed up with PHP 5.x was not providing enough documentation on the BC breaks. Doing this better this time (the migration guides are a good start, porting some major apps and documenting the issues is another) could help us ease the transition as well. But as you point out, there is the fixed overhead of having to do the QA at any rate.
>
> What we really screwed up are the breakages _after_ 5.0, between 5.0 and now. Everyone expects changes and breakages between two major versions, no matter the language.

True that ... the way E_STRICT was handled did not help either. Still looking forward to E_DEPRECATED.

regards,
Lukas
Re: [PHP-DEV] POSIX regex
Johannes Schlüter wrote:
> Ah, another thing kind of related to this thread: We really need a proper way of having decisions declared as being made. Recently it happened quite often that many developers thought some decision was made (for example from reading the Paris meeting notes) and then some developers come and say nothing was finally decided yet. But IMO it's important to decide some things (like removal of possibly often-used functionality) soon, so users can be informed and prepare their code, and developers here can spend time on these tasks knowing that they are following decisions. Maybe this should be discussed independently from this thread, but it's a good example of the need... (while there might be reasons to change a decision, that shouldn't happen too often)

Yeah, I guess I should put higher emphasis on adding links to the todo page that reference key mailing list posts.

regards,
Lukas
Re: [PHP-DEV] POSIX regex
At 04:47 18/07/2007, Lukas Kahwe Smith wrote:
> Zeev Suraski wrote:
> > Finally, at the risk of sounding like a broken record, we always need to remember that BC breakage accumulates, and it's not binary. Every cleanup we do in PHP 6 will further slow migration, and as Andi pointed out a few days ago, things don't look too well as it is.
>
> Agreed, it's not binary, but it's also not the simple addition of all issues either. The effort does diminish, as you can cover multiple BC breaks in one pass over your code. The key thing that we screwed up with PHP 5.x was not providing enough documentation on the BC breaks. Doing this better this time (the migration guides are a good start, porting some major apps and documenting the issues is another) could help us ease the transition as well. But as you point out, there is the fixed overhead of having to do the QA at any rate.

Well I don't think it really diminishes, but I agree that 1+1 is maybe 1.9 and not 2. On the other hand, if you remember that perception is everything (or at least very important), 1+1 can easily be perceived as 3, and in a negative sense.

Zeev
Re: [PHP-DEV] POSIX regex
On 7/18/07, Zeev Suraski [EMAIL PROTECTED] wrote:
> Well I don't think it really diminishes, but I agree that 1+1 is maybe 1.9 and not 2. On the other hand, if you remember that perception is everything (or at least very important), 1+1 can easily be perceived as 3, and in a negative sense.

Exactly. And many people lost much more time hunting down smaller things like the "Indirect modification of overloaded property..." message or the numerous other annoying (but sometimes required) changes. And that means 1+1+1+1=2^32/F* php for most of them. A dropped extension, function or feature, when known (and done) soon enough, is far easier to deal with (planning is possible, migration, etc.).

--Pierre
RE: [PHP-DEV] POSIX regex
Functions would work properly with Unicode, but you would explicitly create Unicode strings, e.g. u"foobar". This is not an uncommon practice, and many other languages actually go down this route, incl. Python and various versions of C++ frameworks.

Andi

-----Original Message-----
From: Derick Rethans [mailto:[EMAIL PROTECTED]
Sent: Wednesday, July 18, 2007 1:07 AM
To: Stas Malyshev
Cc: Lukas Kahwe Smith; Andi Gutmans; Ilia Alshanetsky; [EMAIL PROTECTED]; internals@lists.php.net
Subject: Re: [PHP-DEV] POSIX regex

On Tue, 17 Jul 2007, Stanislav Malyshev wrote:
> > that would actually benefit quite a bit from unicode support, but I guess you are talking about porting with unicode==off, right?
>
> unicode=off doesn't mean no unicode support, btw.

Of course that's what it means, as none of the string functions work properly with unicode if you turn it off. And that's just the whole selling point of Unicode support.

Derick
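To make the idea of explicitly created Unicode strings concrete, here is a minimal sketch in Python, one of the languages mentioned above. It assumes Python 3.3 or later, where the `u` prefix is accepted and `str` is a Unicode type while `bytes` plays the role of the plain byte string; the sample text and variable names are purely illustrative. It shows why length, case conversion and slicing give different answers on a Unicode string and on its UTF-8 bytes, which is the kind of difference the "string functions don't work properly" remark is about.

```python
# Illustrative sample text; the names below are not from the thread.
text = u"na\u00efve caf\u00e9"   # explicit Unicode string ("naïve café")
raw = text.encode("utf-8")       # the same text as a UTF-8 byte string

# Length: code points versus bytes (each accented letter takes 2 bytes).
unicode_len = len(text)
byte_len = len(raw)

# Case conversion: the byte version only touches ASCII letters.
upper_text = text.upper()
upper_raw = raw.upper()

# Slicing bytes can cut a multi-byte character in half.
try:
    raw[:3].decode("utf-8")
    clean_cut = True
except UnicodeDecodeError:
    clean_cut = False

print(unicode_len, byte_len)
print(upper_text, upper_raw)
print(clean_cut)
```

The byte-oriented results are not wrong for binary data; they are only wrong if the program believes it is handling characters, which is why the choice between the two kinds of string has to be made somewhere, whether by a literal prefix or by a global switch.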
Re: [PHP-DEV] POSIX regex
On Tue, 17 Jul 2007 08:47:42 +0200, Lukas Kahwe Smith [EMAIL PROTECTED] wrote:
> Larry Garfield wrote:
> > Non-core PHP developer speaking, so read with that in mind: One of the things that held back PHP 5 adoption for so long, IMO, is the large amount of FUD that surrounded it. Even now, 3 years after it was released, I keep seeing the argument that "I can't drop PHP 4 and use PHP 5, then I have to rewrite *everything* to use objects. I hate objects." That is, of course, completely untrue, and if you're paying even moderate attention it's not at all difficult to write code that runs just fine in both PHP 4 and PHP 5, with and without register_globals and magic_quotes. All it takes is a little forethought and not letting yourself be sloppy.
>
> I have seen little of that. But I have seen issues due to array_merge() changes. But more importantly our handling of E_STRICT has made it difficult for many.
>
> > Writing PHP 5/6 compatible code needs to be just as easy, if not easier, in addition to having better marketing to head off the FUD. Taking a stance of "you'll have to start from scratch if you want to be PHP 6 compatible, oh well" is an absolutely sure-fire way to guarantee that no one uses PHP 6 for anything except niche markets.
>
> I see it more as a question of being open about what's going on. If we would have had the upgrading guides from the beginning of 5.0.z, I think things would have been easier.

I'm /quite/ sure you are correct here. As memory serves, people were polarized for or against almost immediately when PHP 5 came out. This alone is probably the single most important ingredient for a FUD factory.

> The fact that our x.0.z releases are not particularly popular is another issue. I think the biggest challenge PHP5 faced however was that it was mainly about making developers' lives easier, since PHP4 already enables you to do pretty much any kind of web site if you are willing to put in the required time. Native unicode to me feels a bit more like adding something that was not really doable before (sure you can but that would mean writing every lib yourself, so the time required is beyond the vast majority of dev teams). Then again it's not like all developers will jump on unicode the second it's released (mainly because not all end users are asking for this). But the point is, getting very high adoption rates for new PHP releases will always be hard.

Just wondering: would it make /any/ sense to run a survey/poll on the PHP site, asking users what feature/capability/etc. they would most like to see in future PHP versions? This /might/ give the developers some insight into whether users are at all in line with the developers' goals/roadmap. Point being, it may help future versions avoid the /underwhelming/ reception that PHP 5 received. And better, it might help future versions achieve the same success that PHP 4 did. Just a thought.

> regards, Lukas
Re: [PHP-DEV] POSIX regex
On Wednesday 18 July 2007, Rasmus Lerdorf wrote: Second, from the user space PHP developers' perspective. There are two groups of those out there. There is the group that builds apps for controlled environments. Yahoo, Facebook, and the hundreds, if not thousands of smaller companies out there that will define a certain PHP configuration and code against that. To them such a switch isn't a big deal except when it comes to re-using external code. Which bring us to the second group which is the group that strives to build portable apps designed to run on as many unknown PHP configs as possible. This is the group that will get hit by this, and here is where we need to figure out how to cause them the least amount of pain. They are going to feel some pain in order to get their heads around Unicode no matter how we handle this. For the portion of these folks who don't want to worry about Unicode at all and they actually have code that does stuff on binary strings that will break, their stuff just won't work no matter what we do. The difference comes down to whether it gets marked as PHP5-only or it gets marked as non-Unicode-only. And the other camp who do want to make sure their stuff supports Unicode will need to write the Unicode and non-Unicode versions and check to see if the system they are running on supports Unicode or not. Whether they check the PHP version number, or the Unicode switch, or probe directly for the features they need, it ends up being about the same amount of pain. Disclaimer again: PHP commit karma of 0, PHP development karma of some positive integer, PHP support karma of depends if you like gophp5.org or not. :-) Permit me to offer a concrete example. I am a Drupal developer; that is, I work on the Drupal CMS core and also get paid to build sites with Drupal professionally. Drupal has made a huge push for internationalization in the past year and a half or so. 
It's currently UTF-8 through and through, complete with user-space UTF-8-safe implementations of various string manipulation functions. Native Unicode support would be awesome. Drupal is used by a huge number of people on dedicated boxes where they control the environment. It's also used by an even huger number of people on shared hosts where they get almost no control over the environment. Right now it handles both quite well, under PHP 4.3.6-5.2.3. (PHP 4 to be dropped in version 7.) Now, when PHP 6 is released we are going to want to be able to run in PHP 6, and likely at some point in the future switch to PHP 6 only just as we're now (finally) moving to PHP 5 only. That means that, for a time, we'll have to be able to run with the same code base on PHP 5 and PHP 6. A great many people will want to run it on a PHP 6 unicode=on server, so they can leverage native Unicode support. A great many people will want to run it on shared hosts, which means either PHP 5 or PHP 6 unicode=off (because I don't expect shared hosts to default to unicode=on any more readily than they accepted the default of register_globals=off). And unlike register_globals, it won't be something we can change in the So there will be a prolonged period where we will have to be able to run on PHP 5.2, PHP 6 unicode=off, and PHP 6 unicode=on, even if we don't explicitly use PHP 6-only features yet. Simply excluding one of those three completely will not be a viable option. Maintaining two or three separate trees is also not an option. We simply don't have anywhere close to the resources to do that. (Plus Drupal is a plugin-based system, and asking plugin authors to do that is completely unreasonable.) So, just how much hair should we plan to pull out in order to make that happen? That's the million dollar question for me, and for, I suspect, most of the open source application developers out there. How can we minimize that hair loss? Right now I really don't know what the answer is. 
That's why I'm asking the question, because as C is really not a comfortable language for me anymore I have little ability to affect it directly. -- Larry Garfield AIM: LOLG42 [EMAIL PROTECTED] ICQ: 6817012 If nature has made any one thing less susceptible than all others of exclusive property, it is the action of the thinking power called an idea, which an individual may exclusively possess as long as he keeps it to himself; but the moment it is divulged, it forces itself into the possession of every one, and the receiver cannot dispossess himself of it. -- Thomas Jefferson -- PHP Internals - PHP Runtime Development Mailing List To unsubscribe, visit: http://www.php.net/unsub.php
Re: [PHP-DEV] POSIX regex
Andi Gutmans wrote:
> There are clear things we want to change (like register_globals) because we believe that ultimately they have a significant benefit to our users with controllable downside (there is an easy one line workaround which we can document for people to get their old apps to work). There are other areas where breaking BC makes sense. But saying we should just break it across the board and not even consider having a good upgrade path for our users is unreasonable. I believe we can have a very good PHP 6, which is pretty much in sync with many of your feelings, but that provides a well documented and reasonable upgrade path (unlike VB -> VB.NET).

I never said we should break BC just for the hell of it. The goal must be that PHP 6 feels and behaves like PHP. It's not about hijacking PHP to come up with the language we all wanted instead.

> So let's not oversimplify this situation. We have to continue to make trade-offs.

Sure, but you are suggesting to delay decisions indefinitely. Either you are saying this because you already decided that you don't want this change, or you are accepting that our users will be unable to prepare themselves for what happens in the future. This of course will make it that much harder for them to take the plunge into PHP 6.

> Btw, one of PHP's strengths has been in high performance sites and with a Unicode=on only mode this would take quite a hit (but it's not the only reason why I think we need choice). In any case, I think on this question it does make sense that we start making informed decisions by understanding the migration path better, as opposed to just basing decisions on gut feelings. Maybe that kind of learning experience will prove me wrong (which may be so).

I have not seen any proposed way of finding out this migration path besides "let's wait". "Let's wait" is not the answer. What I asked for was exactly a decision on how far we are willing to go with the breakage and, more importantly, the fundamental decision about how we approach Unicode in PHP 6. The on/off switch is not something that makes sense to delay forever. It's a big decision, and once it's decided other things will become much easier (like PHP 6 development or deciding the impact of other potential BC breaks).

regards,
Lukas
Re: [PHP-DEV] POSIX regex
Larry Garfield wrote:
> Non-core PHP developer speaking, so read with that in mind: One of the things that held back PHP 5 adoption for so long, IMO, is the large amount of FUD that surrounded it. Even now, 3 years after it was released, I keep seeing the argument that "I can't drop PHP 4 and use PHP 5, then I have to rewrite *everything* to use objects. I hate objects." That is, of course, completely untrue, and if you're paying even moderate attention it's not at all difficult to write code that runs just fine in both PHP 4 and PHP 5, with and without register_globals and magic_quotes. All it takes is a little forethought and not letting yourself be sloppy.

I have seen little of that. But I have seen issues due to array_merge() changes. But more importantly our handling of E_STRICT has made it difficult for many.

> Writing PHP 5/6 compatible code needs to be just as easy, if not easier, in addition to having better marketing to head off the FUD. Taking a stance of "you'll have to start from scratch if you want to be PHP 6 compatible, oh well" is an absolutely sure-fire way to guarantee that no one uses PHP 6 for anything except niche markets.

I see it more as a question of being open about what's going on. If we would have had the upgrading guides from the beginning of 5.0.z, I think things would have been easier. The fact that our x.0.z releases are not particularly popular is another issue.

I think the biggest challenge PHP 5 faced, however, was that it was mainly about making developers' lives easier, since PHP 4 already enables you to do pretty much any kind of web site if you are willing to put in the required time. Native Unicode to me feels a bit more like adding something that was not really doable before (sure you can, but that would mean writing every lib yourself, so the time required is beyond the vast majority of dev teams). Then again it's not like all developers will jump on Unicode the second it's released (mainly because not all end users are asking for this). But the point is, getting very high adoption rates for new PHP releases will always be hard.

regards,
Lukas
Re: [PHP-DEV] POSIX regex
Hi Andi,

On 7/16/07, Andi Gutmans [EMAIL PROTECTED] wrote:
> I disagree with this view of the world.

Well, we all seem to agree on this view, but let's forget this insignificant fact :)

> It doesn't have to be a complete either/or decision and labeling everything as a bc hacks decision is an inaccurate and populistic way of building FUD.

You persist in telling me (I say "me" as I'm not in a position to speak for the other developers) that my way is populist, a source of FUD, or whatever else comes to your mind at a given moment. Fine, if it helps you make your point. However, may I suggest that you seriously consider the (legitimate) voices outside your world (no matter how huge it is)? It would be much appreciated.

> There are other areas where breaking BC makes sense. But saying we should just break it across the board and not even consider having a good upgrade path for our users is unreasonable.

From what I see in the various code I can fgrep, pcre is already used much more than ereg. Migrating from ereg to pcre is a very small task and it only brings advantages (cache, Unicode support if required, ...). Ironically, a little pcre-based script or grep should do the job, if any regexp fan would like to play with that :) Other changes in the engine will cause much more trouble (because they are not obvious), just like they did in the past between two minor PHP versions.

> I believe we can have a very good PHP 6, which is pretty much in sync with many of your feelings, but that provides a well documented and reasonable upgrade path (unlike VB -> VB.NET).

That is comparing apples and oranges. As far as I remember, VB.NET was not really planned; they only realized how much their users liked VB and why they would not move to C# or whatever else :)

> If you want to break everything and anything

It is not about breaking everything just for the fun of it but about creating a sane base on which to build portable and maintainable applications and libraries.

> and don't want to be limited whatsoever by our huge user-base then maybe you should write a new language which fits exactly what your preference would be. The fact is though, that even after these discussions and the Paris discussions, the bulk of the idiosyncrasies which make PHP what it is today will remain (as per agreement).

You seem to have a clear view of what PHP6 should be. Why don't you publish it (we have a wiki for this exact purpose) and let's see what we (as PHP internals developers) think about it, the sooner the better (and once and for all)? Waiting indefinitely is not a solution, and neither is taking quick decisions a week before the final release. Taking decisions early will let us adapt or change them if necessary. Our users will have time to think about the consequences and tell us their needs or fears.

> So let's not oversimplify this situation. We have to continue to make trade-offs.

Let's not complicate it either.

> Btw, one of PHP's strengths has been in high performance sites and with a Unicode=on only mode this would take quite a hit (but it's not the only reason why I think we need choice). In any case, I think on this question it does make sense that we start making informed decisions by understanding the migration path better, as opposed to just basing decisions on gut feelings. Maybe that kind of learning experience will prove me wrong (which may be so).

At the risk of repeating myself, we already learned from PHP5. Nothing can make users migrate quicker than they want (read: quicker than they need) unless the benefits are enormous, and that's not the case (they are, but not for a large number of users).

We can keep dreaming about a short migration path for PHP6 or we can simply take the right decisions. Saying that we are not informed is a poor excuse for delaying critical decisions. We are informed: we use PHP daily and deal every day with the issues we are trying to solve now.

Cheers,
--Pierre
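
The "little pcre based script or grep" suggested in the message above could look roughly like the following Python sketch. It only flags suspected calls to the POSIX regex functions so they can be reviewed by hand. The function list, the helper name `find_ereg_calls` and the sample input are assumptions made for illustration; the check is purely textual, so it will also report matches inside strings or comments.

```python
import re

# POSIX regex functions provided by ext/ereg (assumed list for this sketch).
EREG_FUNCTIONS = ("ereg", "eregi", "ereg_replace", "eregi_replace",
                  "split", "spliti", "sql_regcase")

# Match a bare function call: the name must not be preceded by an identifier
# character, "$", ">" or ":" (so preg_split(), mb_ereg(), $split() and
# $obj->split() are ignored) and must be followed by "(".
CALL_RE = re.compile(
    r"(?<![\w$>:])(" + "|".join(EREG_FUNCTIONS) + r")\s*\(",
    re.IGNORECASE,
)


def find_ereg_calls(php_source):
    """Return (line_number, function_name) pairs for each suspected call."""
    hits = []
    for lineno, line in enumerate(php_source.splitlines(), start=1):
        for match in CALL_RE.finditer(line):
            hits.append((lineno, match.group(1).lower()))
    return hits


if __name__ == "__main__":
    # Hypothetical PHP snippet used only to exercise the scanner.
    sample = """<?php
if (ereg('^[0-9]+$', $id)) { echo 'ok'; }
$parts = preg_split('/,/', $csv);
$words = split(' ', $text);
"""
    for lineno, name in find_ereg_calls(sample):
        print(f"line {lineno}: {name}()")
```

Each flagged call would still need a manual conversion (adding delimiters, checking escapes), so the script narrows the search rather than performing the migration.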
RE: [PHP-DEV] POSIX regex
A few months ago we agreed that we would give our users the choice of both modes. The burden of maintenance has mainly been on us, btw, as the majority of the differences here are in the Zend Engine and the extensions don't have as much work associated with them.

Here's my proposed way of figuring out how to make migration easier. Port the following applications to PHP 6 and let's see what we can learn from it:
- mediaWiki
- SugarCRM
- Drupal
- Wordpress

I don't think we can have more of a reality check than actually going through this exercise and understanding the issues. As I mentioned, from the small amount of work we have done up to now it seems like there really is no migration path except for applications to be almost completely rewritten when unicode_semantics=on. I don't think this is a feasible way to go. But if volunteers can work on this porting and it allows us to fix some things (if they are fixable) then that would change the situation. I believe that people who actually do this exercise and want to have a migration path will understand that there's no other way except to support unicode_semantics=off. Btw, most languages deliver Unicode in this way and it works pretty well.

Andi

-----Original Message-----
From: Lukas Kahwe Smith [mailto:[EMAIL PROTECTED]
Sent: Monday, July 16, 2007 11:40 PM
To: Andi Gutmans
Cc: Ilia Alshanetsky; [EMAIL PROTECTED]; internals@lists.php.net
Subject: Re: [PHP-DEV] POSIX regex

Andi Gutmans wrote:
> There are clear things we want to change (like register_globals) because we believe that ultimately they have a significant benefit to our users with controllable downside (there is an easy one line workaround which we can document for people to get their old apps to work). There are other areas where breaking BC makes sense. But saying we should just break it across the board and not even consider having a good upgrade path for our users is unreasonable. I believe we can have a very good PHP 6, which is pretty much in sync with many of your feelings, but that provides a well documented and reasonable upgrade path (unlike VB -> VB.NET).

I never said we should break BC just for the hell of it. The goal must be that PHP6 feels and behaves like PHP. It's not about hijacking PHP to come up with the language we all wanted instead.

> So let's not oversimplify this situation. We have to continue to make trade-offs.

Sure, but you are suggesting that we delay decisions indefinitely. Either you are saying this because you already decided that you don't want this change, or you are accepting that our users will be unable to prepare themselves for what happens in the future. This of course will make it that much harder for them to take the plunge into PHP6.

> Btw, one of PHP's strengths has been in high performance sites and with a Unicode=on only mode this would take quite a hit (but it's not the only reason why I think we need choice). In any case, I think on this question it does make sense that we start making informed decisions by understanding the migration path better, as opposed to just basing decisions on gut feelings. Maybe that kind of learning experience will prove me wrong (which may be so).

I have not seen any proposed way of finding out this migration path besides "let's wait". "Let's wait" is not the answer. What I asked for was exactly a decision on how far we are willing to go with the breakage and, more importantly, the fundamental decision about how we approach Unicode in PHP6. The on/off switch is not something that makes sense to delay forever. It's a big decision and once it's decided other things will become much easier (like PHP6 development or deciding the impact of other potential BC breaks).

regards,
Lukas
RE: [PHP-DEV] POSIX regex
Just FYI: I did not agree with that choice. And IIRC, neither did several other people here.

--Jani

On Tue, 2007-07-17 at 07:27 -0700, Andi Gutmans wrote:
> A few months ago we agreed that we would give our users the choice of both modes. The burden of maintenance has mainly been on us, btw, as the majority of the differences here are in the Zend Engine and the extensions don't have as much work associated with them.
>
> Here's my proposed way of figuring out how to make migration easier. Port the following applications to PHP 6 and let's see what we can learn from it:
> - mediaWiki
> - SugarCRM
> - Drupal
> - Wordpress
>
> I don't think we can have more of a reality check than actually going through this exercise and understanding the issues. As I mentioned, from the small amount of work we have done up to now it seems like there really is no migration path except for applications to be almost completely rewritten when unicode_semantics=on. I don't think this is a feasible way to go. But if volunteers can work on this porting and it allows us to fix some things (if they are fixable) then that would change the situation. I believe that people who actually do this exercise and want to have a migration path will understand that there's no other way except to support unicode_semantics=off. Btw, most languages deliver Unicode in this way and it works pretty well.
>
> Andi
>
> -----Original Message-----
> From: Lukas Kahwe Smith [mailto:[EMAIL PROTECTED]
> Sent: Monday, July 16, 2007 11:40 PM
> To: Andi Gutmans
> Cc: Ilia Alshanetsky; [EMAIL PROTECTED]; internals@lists.php.net
> Subject: Re: [PHP-DEV] POSIX regex
>
> Andi Gutmans wrote:
> > There are clear things we want to change (like register_globals) because we believe that ultimately they have a significant benefit to our users with controllable downside (there is an easy one line workaround which we can document for people to get their old apps to work). There are other areas where breaking BC makes sense. But saying we should just break it across the board and not even consider having a good upgrade path for our users is unreasonable. I believe we can have a very good PHP 6, which is pretty much in sync with many of your feelings, but that provides a well documented and reasonable upgrade path (unlike VB -> VB.NET).
>
> I never said we should break BC just for the hell of it. The goal must be that PHP6 feels and behaves like PHP. It's not about hijacking PHP to come up with the language we all wanted instead.
>
> > So let's not oversimplify this situation. We have to continue to make trade-offs.
>
> Sure, but you are suggesting that we delay decisions indefinitely. Either you are saying this because you already decided that you don't want this change, or you are accepting that our users will be unable to prepare themselves for what happens in the future. This of course will make it that much harder for them to take the plunge into PHP6.
>
> > Btw, one of PHP's strengths has been in high performance sites and with a Unicode=on only mode this would take quite a hit (but it's not the only reason why I think we need choice). In any case, I think on this question it does make sense that we start making informed decisions by understanding the migration path better, as opposed to just basing decisions on gut feelings. Maybe that kind of learning experience will prove me wrong (which may be so).
>
> I have not seen any proposed way of finding out this migration path besides "let's wait". "Let's wait" is not the answer. What I asked for was exactly a decision on how far we are willing to go with the breakage and, more importantly, the fundamental decision about how we approach Unicode in PHP6. The on/off switch is not something that makes sense to delay forever. It's a big decision and once it's decided other things will become much easier (like PHP6 development or deciding the impact of other potential BC breaks).
>
> regards,
> Lukas
RE: [PHP-DEV] POSIX regex
Hmm, I don't quite understand what role bad code vs. good code plays here. Wordpress is one of the most popular applications out there so it's got huge value to our community. I bet there's a huge amount of PHP applications whose source code is of the same quality or worse. Anyway, the issues I have seen would also be relevant to what you call good code but again, when it comes to compatibility, I don't quite know why that will play a big role.

I am talking about porting to both unicode_semantics=off and on. This will give us a good understanding of the difference of the modes and where we're at. I bet most people who are voicing their opinions have actually not tried to write a sizeable application with PHP 6 and also tried to run an existing one on PHP 6 (unicode_semantics=on). I can also do some performance testing in our performance lab once we have both working.

I haven't yet mentioned how companies building high-performance sites would probably take a huge hit by moving to Unicode, to the point where I think they will not adopt it for a long time and then will be faced with the choice to migrate off of PHP or bite the bullet. For some of the companies I know that have huge server farms, adding 50% capacity (or whatever the number is) could be a good enough reason to migrate off, as they are paying huge fees for the servers...

Andi

-----Original Message-----
From: Lukas Kahwe Smith [mailto:[EMAIL PROTECTED]
Sent: Tuesday, July 17, 2007 7:50 AM
To: Andi Gutmans
Cc: Ilia Alshanetsky; [EMAIL PROTECTED]; internals@lists.php.net
Subject: Re: [PHP-DEV] POSIX regex

Andi Gutmans wrote:
> Here's my proposed way of figuring out how to make migration easier. Port the following applications to PHP 6 and let's see what we can learn from it:
> - mediaWiki
> - SugarCRM
> - Drupal
> - Wordpress

IIRC Wordpress is a good example of bad source code to fix. Drupal would be a good example of a PHP4 style fairly procedural app to port. mediaWiki also seems like a worthy cause since it's one of those apps that would actually benefit quite a bit from unicode support, but I guess you are talking about porting with unicode==off, right? SugarCRM would be a good example of gigantic, horrible, horrible source code to fix and I am not sure if I would put it on the list considering the limited open source release they do. I think it would be cool if they would do it themselves or sponsor whoever is doing it.

We also have an SoC project where someone is implementing a PHP6 version of Jaws.

regards,
Lukas
Re: [PHP-DEV] POSIX regex
Andi Gutmans wrote:
> Here's my proposed way of figuring out how to make migration easier. Port the following applications to PHP 6 and let's see what we can learn from it:
> - mediaWiki
> - SugarCRM
> - Drupal
> - Wordpress

IIRC Wordpress is a good example of bad source code to fix. Drupal would be a good example of a PHP4 style fairly procedural app to port. mediaWiki also seems like a worthy cause since it's one of those apps that would actually benefit quite a bit from unicode support, but I guess you are talking about porting with unicode==off, right? SugarCRM would be a good example of gigantic, horrible, horrible source code to fix and I am not sure if I would put it on the list considering the limited open source release they do. I think it would be cool if they would do it themselves or sponsor whoever is doing it.

We also have an SoC project where someone is implementing a PHP6 version of Jaws.

regards,
Lukas
RE: [PHP-DEV] POSIX regex
I thought you were retired at the time...

-----Original Message-----
From: Jani Taskinen [mailto:[EMAIL PROTECTED]
Sent: Tuesday, July 17, 2007 7:37 AM
To: Andi Gutmans
Cc: internals@lists.php.net
Subject: RE: [PHP-DEV] POSIX regex

Just FYI: I did not agree with that choice. And IIRC, neither did several other people here.

--Jani

On Tue, 2007-07-17 at 07:27 -0700, Andi Gutmans wrote:
> A few months ago we agreed that we would give our users the choice of both modes. The burden of maintenance has mainly been on us, btw, as the majority of the differences here are in the Zend Engine and the extensions don't have as much work associated with them.
>
> Here's my proposed way of figuring out how to make migration easier. Port the following applications to PHP 6 and let's see what we can learn from it:
> - mediaWiki
> - SugarCRM
> - Drupal
> - Wordpress
>
> I don't think we can have more of a reality check than actually going through this exercise and understanding the issues. As I mentioned, from the small amount of work we have done up to now it seems like there really is no migration path except for applications to be almost completely rewritten when unicode_semantics=on. I don't think this is a feasible way to go. But if volunteers can work on this porting and it allows us to fix some things (if they are fixable) then that would change the situation. I believe that people who actually do this exercise and want to have a migration path will understand that there's no other way except to support unicode_semantics=off. Btw, most languages deliver Unicode in this way and it works pretty well.
>
> Andi
>
> -----Original Message-----
> From: Lukas Kahwe Smith [mailto:[EMAIL PROTECTED]
> Sent: Monday, July 16, 2007 11:40 PM
> To: Andi Gutmans
> Cc: Ilia Alshanetsky; [EMAIL PROTECTED]; internals@lists.php.net
> Subject: Re: [PHP-DEV] POSIX regex
>
> Andi Gutmans wrote:
> > There are clear things we want to change (like register_globals) because we believe that ultimately they have a significant benefit to our users with controllable downside (there is an easy one line workaround which we can document for people to get their old apps to work). There are other areas where breaking BC makes sense. But saying we should just break it across the board and not even consider having a good upgrade path for our users is unreasonable. I believe we can have a very good PHP 6, which is pretty much in sync with many of your feelings, but that provides a well documented and reasonable upgrade path (unlike VB -> VB.NET).
>
> I never said we should break BC just for the hell of it. The goal must be that PHP6 feels and behaves like PHP. It's not about hijacking PHP to come up with the language we all wanted instead.
>
> > So let's not oversimplify this situation. We have to continue to make trade-offs.
>
> Sure, but you are suggesting that we delay decisions indefinitely. Either you are saying this because you already decided that you don't want this change, or you are accepting that our users will be unable to prepare themselves for what happens in the future. This of course will make it that much harder for them to take the plunge into PHP6.
>
> > Btw, one of PHP's strengths has been in high performance sites and with a Unicode=on only mode this would take quite a hit (but it's not the only reason why I think we need choice). In any case, I think on this question it does make sense that we start making informed decisions by understanding the migration path better, as opposed to just basing decisions on gut feelings. Maybe that kind of learning experience will prove me wrong (which may be so).
>
> I have not seen any proposed way of finding out this migration path besides "let's wait". "Let's wait" is not the answer. What I asked for was exactly a decision on how far we are willing to go with the breakage and, more importantly, the fundamental decision about how we approach Unicode in PHP6. The on/off switch is not something that makes sense to delay forever. It's a big decision and once it's decided other things will become much easier (like PHP6 development or deciding the impact of other potential BC breaks).
>
> regards,
> Lukas
Re: [PHP-DEV] POSIX regex
Andi Gutmans wrote:
> Hmm, I don't quite understand what role bad code vs. good code plays here. Wordpress is one of the most popular applications out there so it's got huge value to our community. I bet there's a huge amount of PHP applications whose source code is of the same quality or worse. Anyway, the issues I have seen would also be relevant to what you call good code but again, when it comes to compatibility, I don't quite know why that will play a big role.

Bad vs. good in the sense of how messy it is. But what I was getting at is that I find your proposed list good, with the exception of SugarCRM. It might be good to also include a PHP5-only app, so that we have a good idea of how messy, fairly procedural, E_STRICT-compliant etc. code ports to PHP6 with unicode==off.

> I am talking about porting to both unicode_semantics=off and on. This will give us a good understanding of the difference of the modes and where we're at. I bet most people who are voicing their opinions have actually not tried to write a sizeable application with PHP 6 and also tried to run an existing one on PHP 6 (unicode_semantics=on). I can also do some performance testing in our

ok .. this makes this quite a large undertaking indeed.

regards,
Lukas
Re: [PHP-DEV] POSIX regex
On 7/17/07, Andi Gutmans [EMAIL PROTECTED] wrote:
> Hmm, I don't quite understand what role bad code vs. good code plays here. Wordpress is one of the most popular applications out there so it's got huge value to our community. I bet there's a huge amount of PHP applications whose source code is of the same quality or worse. Anyway, the issues I have seen would also be relevant to what you call good code but again, when it comes to compatibility, I don't quite know why that will play a big role.

Using PHP4 as a base to test the compatibility of PHP6 is a bad idea. The entry point should be PHP5+ (even if the troubles begin between 5.1 and 5.2). Having apps running on 5.2 with E_STRICT without notices would be a good indicator of how they will work with php6 without unicode (or php 5.3 for php6/Off and php6 with unicode only).

> I am talking about porting to both unicode_semantics=off and on. This will give us a good understanding of the difference of the modes and where we're at. I bet most people who are voicing their opinions have actually not tried to write a sizeable application with PHP 6 and also tried to run an existing one on PHP 6 (unicode_semantics=on).

I did. And please (for God's sake...), can you stop making bad assumptions about what others know or not? I did it with all my apps and I'm well aware of the work I will need to do to port them. But this work is required as long as I'm interested in Unicode. Unicode off? No interest, sorry; I do not care about namespaces for my existing apps. Don't get me wrong: I love them, but I don't consider this feature critical for my _existing_ applications. They have worked without it for years, and they will continue to work without it for a couple more years. Using namespaces will require more work anyway.

> I can also do some performance testing in our performance lab once we have both working. I haven't yet mentioned how companies building high-performance sites would probably take a huge hit by moving to Unicode, to the point where I think they will not adopt it for a long time and then will be faced with the choice to migrate off of PHP or bite the bullet. For some of the companies I know that have huge server farms, adding 50% capacity (or whatever the number is) could be a good enough reason to migrate off, as they are paying huge fees for the servers...

50% increase sounds off base. But I did not bench php6 yet. When all the new features are implemented, it will make more sense to work on the performance problem. For now, it is simply premature.

Regards,
--Pierre
Re: [PHP-DEV] POSIX regex
On 7/17/07, Andi Gutmans [EMAIL PROTECTED] wrote:
> I thought you were retired at the time...

Others were not. Some others were not even present. And those who were present seem to have different interpretations of the decisions. I also have to say that this meeting was held when we were not actually informed.

--Pierre
Re: [PHP-DEV] POSIX regex
> that would actually benefit quite a bit from unicode support, but I guess you are talking about porting with unicode==off, right?

unicode=off doesn't mean no unicode support, btw.

--
Stanislav Malyshev, Zend Software Architect
[EMAIL PROTECTED] http://www.zend.com/
(408)253-8829 MSN: [EMAIL PROTECTED]
Re: [PHP-DEV] POSIX regex
Pierre wrote:
> 50% increase sounds off base. But I did not bench php6 yet. When all the new features are implemented, it will make more sense to work on the performance problem. For now, it is simply premature.

If Moore's law stands for the coming years, this argument is moot anyway. By the time PHP 6 is out the door, any performance issues are insignificant. :)

And by the time people actually start using PHP 6, it's probably already antique tech anyway.. (around 2015 or so) :D

--Jani
Re: [PHP-DEV] POSIX regex
> > 50% increase sounds off base. But I did not bench php6 yet. When all the new features are implemented, it will make more sense to work on the performance problem. For now, it is simply premature.
>
> If Moore's law stands for the coming years, this argument is moot anyway. By the time PHP 6 is out the door, any performance issues are insignificant. :)

If you have a setup with 10 machines and the new interpreter works 10% faster, you can serve the same number of users with 9 machines. Plus, Moore's law is about the number of transistors, not about performance or power consumption.

--
Tomas
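
The capacity arithmetic in the message above can be sketched as follows. This is a rough illustration only: the helper name `machines_needed` is made up here, "10% faster" is read as "10% less CPU time per request", and the load is assumed to be CPU-bound and to scale linearly across machines. The 50% figure is the hypothetical number quoted earlier in the thread, not a measurement.

```python
def machines_needed(current_machines, cpu_change_percent):
    """Machines needed for the same load after CPU time per request changes.

    cpu_change_percent is the change in CPU time per request in whole
    percent: -10 means each request costs 10% less, +50 means 50% more.
    Assumes a CPU-bound load that scales linearly across machines.
    """
    scaled = current_machines * (100 + cpu_change_percent)
    # Ceiling division on integers, so a fractional machine rounds up.
    return -(-scaled // 100)


if __name__ == "__main__":
    print(machines_needed(10, -10))  # interpreter needs 10% less CPU per request
    print(machines_needed(10, 50))   # hypothetical 50% overhead from the thread
```

If "10% faster" instead means 10% more requests per second, the same farm would need 10 / 1.1, or about 9.1 machines, so the saving of a whole machine only appears on larger farms.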
Re: [PHP-DEV] POSIX regex
Nitpicking, are we? :)

Tomas Kuliavas wrote:
> > > 50% increase sounds off base. But I did not bench php6 yet. When all the new features are implemented, it will make more sense to work on the performance problem. For now, it is simply premature.
> >
> > If Moore's law stands for the coming years, this argument is moot anyway. By the time PHP 6 is out the door, any performance issues are insignificant. :)
>
> If you have a setup with 10 machines and the new interpreter works 10% faster, you can serve the same number of users with 9 machines. Plus, Moore's law is about the number of transistors, not about performance or power consumption.
Re: [PHP-DEV] POSIX regex
On 7/17/07, Tomas Kuliavas [EMAIL PROTECTED] wrote:
> > > 50% increase sounds off base. But I did not bench php6 yet. When all the new features are implemented, it will make more sense to work on the performance problem. For now, it is simply premature.
> >
> > If Moore's law stands for the coming years, this argument is moot anyway. By the time PHP 6 is out the door, any performance issues are insignificant. :)
>
> If you have a setup with 10 machines and the new interpreter works 10% faster, you can serve the same number of users with 9 machines. Plus, Moore's law is about the number of transistors, not about performance or power consumption.

Three cores in one processor consume less than three separate processors. More CPUs in one host will also be faster than many hosts (processing power). Sorry, but Jani's reference to Moore is correct.

But that's definitely not the topic :)
Re: [PHP-DEV] POSIX regex
On Mon, July 16, 2007 7:47 am, Jani Taskinen wrote:
> I have moved the POSIX regex dependent functions to the ext/ereg/ extension. Now the only places using the POSIX regex functions (ext/ereg/ excluded) are ext/standard/browscap.c and ext/pgsql/pgsql.c.

I took a brief look at the pgsql.c stuff, and it looks like it's all fairly straightforward to alter to PCRE instead of POSIX, and it's all localized to this function:

http://lxr.php.net/ident?i=php_pgsql_convert_match

Am I under-estimating the problem? Or is it actually possible that *I* could fix this and contribute something useful for once?

Is anybody else already on it? Cuz I'm gonna go download CVS and see if I can't submit a patch... [be afraid, be very afraid]

--
Some people have a gift link here.
Know what I want?
I want you to buy a CD from some indie artist.
http://cdbaby.com/browse/from/lynch
Yeah, I get a buck. So?
Re: [PHP-DEV] POSIX regex
Richard Lynch wrote:
> I took a brief look at the pgsql.c stuff, and it looks like it's all fairly straightforward to alter to PCRE instead of POSIX, and it's all localized to this function:
> http://lxr.php.net/ident?i=php_pgsql_convert_match
>
> Am I under-estimating the problem?

Probably not.

> Is anybody else already on it?

AFAIK, no. Feel free. I hope you have pgsql in use and you can test it too. :)

--Jani
Re: [PHP-DEV] POSIX regex
On Tue, July 17, 2007 4:29 pm, Jani Taskinen wrote:
> Richard Lynch wrote:
> > I took a brief look at the pgsql.c stuff, and it looks like it's all fairly straightforward to alter to PCRE instead of POSIX, and it's all localized to this function:
> > http://lxr.php.net/ident?i=php_pgsql_convert_match
> >
> > Am I under-estimating the problem?
>
> Probably not.
>
> > Is anybody else already on it?
>
> AFAIK, no. Feel free. I hope you have pgsql in use and you can test it too. :)

Yes and yes.

Errr, I guess I'd better make that "Yes" and "I'll try" :-)

I use PostgreSQL a lot, actually.

--
Some people have a gift link here.
Know what I want?
I want you to buy a CD from some indie artist.
http://cdbaby.com/browse/from/lynch
Yeah, I get a buck. So?
Re: [PHP-DEV] POSIX regex
At 00:21 17/07/2007, Pierre wrote:
> Hi Andi,
>
> On 7/16/07, Andi Gutmans [EMAIL PROTECTED] wrote:
> > I disagree with this view of the world.
>
> Well, we all seem to agree on this view, but let's forget this insignificant fact :)

Pierre,

I wanted to send my 2c even though I'm not really involved in internals@ any longer - because in reality it doesn't really have much to do with such decisions.

internals@ makes decisions that affect the entire PHP userbase. We all need to remember that the people on this mailing list are not close to being representative of the userbase. We do have some very opinionated people on this list, some of them with a lot of commit-karma - who are not very open to feedback from regular users.

I'm not saying I represent the PHP userbase, and I don't think Andi is saying this either - but both of us try to take the end user's view when we think about stuff like this, as opposed to the internals@ PHP developer view. I would go as far as saying that I think we do it (as well as some others, like Rasmus) more so than some others on this list.

For that reason I suspect that if you moved the discussion to, say, php-general - you'd see a much more balanced view of the world. Unfortunately it will probably not be very manageable. Something more practical would be trying to think about things from the end user's perspective as opposed to our perspective as the developers and maintainers of the language.

Finally, at the risk of sounding like a broken record, we always need to remember that BC breakage accumulates, and it's not binary. Every cleanup we do in PHP 6 will further slow migration, and as Andi pointed out a few days ago, things don't look too good as it is.

As for ereg - especially in light of the discontinuation of PHP 4 we shouldn't even consider removing it in PHP 5. I agree with Andi that I'm not sure it's a good idea for PHP 6 either, but I'm not sure it isn't either. As long as it's easy enough to turn it back on (i.e. have it bundled but disabled) I think it's not unreasonable.

Zeev
RE: [PHP-DEV] POSIX regex
Why move it to PECL? I agree that PCRE is the preferred way but not having ereg() will break a huge amount of applications for very little gain. We might possibly want to consider disabling it by default, but not having it in the default package doesn't really make sense.

Trying to do browscap.c and pgsql.c with PCRE sounds right (if it's possible, which it probably is).

Andi

-----Original Message-----
From: Jani Taskinen [mailto:[EMAIL PROTECTED]
Sent: Monday, July 16, 2007 5:48 AM
To: internals@lists.php.net
Subject: [PHP-DEV] POSIX regex

I have moved the POSIX regex dependent functions to the ext/ereg/ extension. Now the only places using the POSIX regex functions (ext/ereg/ excluded) are ext/standard/browscap.c and ext/pgsql/pgsql.c.

So what to do with these 2 places using the POSIX stuff? Convert them to use PCRE functions or enable PCRE to be built with the POSIX compat functions? ext/ereg/ is going to go to PECL anyway..

--Jani
Re: [PHP-DEV] POSIX regex
On 16-Jul-07, at 9:46 AM, Andi Gutmans wrote:
> Why move it to PECL? I agree that PCRE is the preferred way but not having ereg() will break a huge amount of applications for very little gain.

I tend to agree, unless we provide wrappers via PCRE that emulate ereg functionality I don't think we can remove posix regex until PHP 6.

Ilia Alshanetsky
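
One small piece of the "wrappers via PCRE that emulate ereg functionality" idea mentioned above is turning an undelimited POSIX-style pattern into the delimited form that the preg_* functions expect. The Python sketch below illustrates only that string transformation. The helper name `ereg_to_pcre` and the choice of "/" as delimiter are assumptions for illustration, and the sketch does not address semantic differences between the two engines (for example leftmost-longest matching, or backslashes being literal inside POSIX bracket expressions).

```python
def ereg_to_pcre(pattern, icase=False):
    """Wrap an undelimited pattern in "/" delimiters for preg_* style use.

    Unescaped "/" characters are escaped so they cannot end the pattern
    early; existing backslash escapes (including "\\/") are copied through
    unchanged. icase=True appends the "i" modifier, mirroring eregi().
    A trailing lone backslash is passed through as-is and would still
    need manual attention.
    """
    out = []
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        if ch == "\\" and i + 1 < len(pattern):
            # Copy the backslash and the escaped character together.
            out.append(pattern[i:i + 2])
            i += 2
            continue
        out.append("\\/" if ch == "/" else ch)
        i += 1
    return "/" + "".join(out) + "/" + ("i" if icase else "")


if __name__ == "__main__":
    print(ereg_to_pcre("^[0-9]+$"))
    print(ereg_to_pcre("^[a-z]+/[a-z]+$", icase=True))
```

A real wrapper would also have to map the return values and the captured-groups array of ereg() onto what preg_match() provides, which is outside the scope of this sketch.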
RE: [PHP-DEV] POSIX regex
Please read about the decision made regarding this, and why it was made, at:

http://derickrethans.nl/files/meeting-notes.html#move-ereg-to-pecl

This is getting quite boring. You have had over 2 years to read about this and complain.. and this wasn't the first time you made your usual "will break a huge amount of applications" comment about anything we're trying to improve.

[removed usual rant about BC]

--Jani

On Mon, 2007-07-16 at 06:53 -0700, Andi Gutmans wrote:
> Even in PHP 6 I am not sure it's a good idea. There are a huge amount of apps that use them and it'll be very hard for people to upgrade. Anyway, let's do some more research on that once we get closer to PHP 6 and see what the migration path looks like. We'll have to check with a few popular apps + google code search :) No need to decide on that right now without having more info.
>
> Andi
>
> -----Original Message-----
> From: Ilia Alshanetsky [mailto:[EMAIL PROTECTED]
> Sent: Monday, July 16, 2007 6:48 AM
> To: Andi Gutmans
> Cc: [EMAIL PROTECTED]; internals@lists.php.net
> Subject: Re: [PHP-DEV] POSIX regex
>
> On 16-Jul-07, at 9:46 AM, Andi Gutmans wrote:
> > Why move it to PECL? I agree that PCRE is the preferred way but not having ereg() will break a huge amount of applications for very little gain.
>
> I tend to agree, unless we provide wrappers via PCRE that emulate ereg functionality I don't think we can remove posix regex until PHP 6.
>
> Ilia Alshanetsky
Re: [PHP-DEV] POSIX regex
On 7/16/07, Andi Gutmans [EMAIL PROTECTED] wrote:
> Even in PHP 6 I am not sure it's a good idea.

As far as I know, Jani is referring to PHP6 only. And it was decided in the php6 notes.

I'm in favour of removing ereg in php6, and the sooner we decide the better. Users will know about this change and will finally understand the PCRE superiority and why they should use it instead, and today.

As for 5.x (5.2.x or 5.3.x), I would rather deprecate it in 5.3 (if at all), but I don't think we should remove it in 5.x.

Cheers,
--Pierre
RE: [PHP-DEV] POSIX regex
On Mon, 16 Jul 2007, Andi Gutmans wrote:
> Even in PHP 6 I am not sure it's a good idea. There are a huge amount of apps that use them and it'll be very hard for people to upgrade.

Their apps are breaking anyway, and three regex engines don't make sense.

Derick

-- 
Derick Rethans
http://derickrethans.nl | http://ez.no | http://xdebug.org
RE: [PHP-DEV] POSIX regex
Even in PHP 6 I am not sure it's a good idea. There are a huge amount of apps that use them and it'll be very hard for people to upgrade.

Anyway, let's do some more research on that once we get closer to PHP 6 and see what the migration path looks like. We'll have to check with a few popular apps + google code search :) No need to decide on that right now without having more info.

Andi

-----Original Message-----
From: Ilia Alshanetsky
Sent: Monday, July 16, 2007 6:48 AM
To: Andi Gutmans
Cc: internals@lists.php.net
Subject: Re: [PHP-DEV] POSIX regex

On 16-Jul-07, at 9:46 AM, Andi Gutmans wrote:
> Why move it to PECL? I agree that PCRE is the preferred way but not having ereg() will break a huge amount of applications for very little gain.

I tend to agree, unless we provide wrappers via PCRE that emulate ereg functionality I don't think we can remove posix regex until PHP 6.

Ilia Alshanetsky
Re: [PHP-DEV] POSIX regex
On Mon, 16 Jul 2007, Pierre wrote:
> On 7/16/07, Andi Gutmans wrote:
> > Even in PHP 6 I am not sure it's a good idea.
>
> As far as I know, Jani is referring to PHP6 only. And it was decided in the php6 notes.

Unfortunately that is not true. It's only the title of the agenda point, it's not part of the conclusions.

> I'm in favour to remove ereg in php6, and the sooner we decide the better.

Yes, I agree.

> Users will know about this change and will finally understand the PCRE superiority and why they should use it instead, and today.

However, users should learn how to use the new regexp engine as that will support Unicode :)

regards,
Derick
Re: [PHP-DEV] POSIX regex
PCRE has a POSIX API, so it is possible to use PCRE as a drop-in replacement for the engine behind ereg(). What I don't know is how compatible it is with the current engine. But I think it is worth investigating.

Nuno

P.S.: this POSIX PCRE layer isn't currently bundled with PHP, because it wasn't needed so far.
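For readers following the thread, here is a minimal sketch of what "drop-in" means in this context. PCRE ships a wrapper header, pcreposix.h, that declares the same regcomp()/regexec()/regfree() entry points as the system <regex.h>. The sketch below is written against the system header; the helper name ere_matches is made up for illustration and does not come from PHP's source. Whether both engines agree on every pattern is exactly the open compatibility question raised above, so treat this as an illustration of the shared API shape only.

```c
#include <regex.h>   /* system POSIX regex; PCRE's pcreposix.h declares the same functions */
#include <stddef.h>

/* Hypothetical helper (an assumption for illustration, not PHP code):
 * returns 1 if `subject` matches the extended regular expression
 * `pattern`, 0 if it does not, and -1 if the pattern fails to compile. */
int ere_matches(const char *pattern, const char *subject)
{
    regex_t re;
    int rc;

    /* REG_NOSUB: only a yes/no answer is needed, no capture offsets. */
    if (regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB) != 0) {
        return -1;
    }

    rc = regexec(&re, subject, 0, NULL, 0);
    regfree(&re);

    return rc == 0 ? 1 : 0;
}
```

In principle, swapping the include for pcreposix.h and linking against PCRE's POSIX wrapper library should leave the calling code unchanged, while the pattern dialect that is actually accepted may differ.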
Re: [PHP-DEV] POSIX regex
Andi Gutmans wrote:
> Even in PHP 6 I am not sure it's a good idea. There are a huge amount of apps that use them and it'll be very hard for people to upgrade.
> Anyway, let's do some more research on that once we get closer to PHP 6 and see what the migration path looks like. We'll have to check with a few popular apps + google code search :) No need to decide on that right now without having more info.

I disagree with this approach. The thing is that we need to get a clear message out ASAP. This all ties into topics like whether we will have a unicode off/on switch or not. Delaying these decisions will hurt our userbase. We need to prepare them early.

IMHO we should use PHP6 as the clean up release. Drop the unicode on/off switch, accept that the bulk of all code will need to be rewritten from scratch. The benefit will be that it will truly be cleaned up, people will still be able to leverage the bulk of their PHP programming background and they can enjoy the fastest possible unicode engine we can provide them.

PHP5 will be for the people that cannot make the jump. We will back port whatever we can reasonably get into PHP5. People will linger on PHP5, just as they are doing now with PHP4. So it goes. At least we will not punish the early adopters for those that are unwilling to move to the new version in the near future anyways.

At any rate .. the time is now to make a decision on what it's gonna be. PHP6 with BC hacks or not.

regards,
Lukas
Re: [PHP-DEV] POSIX regex
Ilia Alshanetsky wrote:
> On 16-Jul-07, at 9:46 AM, Andi Gutmans wrote:
> > Why move it to PECL? I agree that PCRE is the preferred way but not having ereg() will break a huge amount of applications for very little gain.
>
> I tend to agree, unless we provide wrappers via PCRE that emulate ereg functionality I don't think we can remove posix regex until PHP 6.

Doing it before PHP6 would require some very, very solid wrappers. Given the little phpt coverage (*) we currently seem to have for ereg, I do not think it's really possible to even determine whether any attempt at a wrapper is truly solid or not.

regards,
Lukas

(*) Actually I heard this said on IRC. I do not see an ext/ereg on gcov.php.net .. are the available tests part of regexp?
Re: [PHP-DEV] POSIX regex
On Mon, 2007-07-16 at 15:22 +0100, Nuno Lopes wrote:
> PCRE has a POSIX API, so it is possible to use PCRE as a drop-in replacement for the engine behind ereg(). What I don't know is how compatible it is with the current engine. But I think it worth investigating.

Worked fine when I tested it. But it's quite pointless, it's still not unicode friendly. It's just better to use the system POSIX regex funcs if ext/ereg/ is to stay..which is stupid since all the functions it provides can easily be replaced with unicode friendly preg_* funcs. Nobody should use ereg_*() for anything if they want to use unicode. If they don't need unicode, they don't need PHP 6 either.

> P.S.: this POSIX PCRE layer isn't currently bundled with PHP, because it wasn't needed so far.

It is bundled, it just isn't compiled, see above.. :)

--Jani
Re: [PHP-DEV] POSIX regex
Thank you Lukas for expressing exactly my thoughts on this.

On 16.07.2007 18:24, Lukas Kahwe Smith wrote:
> [...]
> At any rate .. the time is now to make a decision on what its gonna be. PHP6 with BC hacks or not.

-- 
Wbr,
Antony Dovgal
Re: [PHP-DEV] POSIX regex
Thank you Lukas and Antony. Could not agree more..

On Mon, 2007-07-16 at 19:19 +0400, Antony Dovgal wrote:
> Thank you Lukas for expressing exactly my thoughts on this.
>
> On 16.07.2007 18:24, Lukas Kahwe Smith wrote:
> > [...]
> > At any rate .. the time is now to make a decision on what its gonna be. PHP6 with BC hacks or not.
Re: [PHP-DEV] POSIX regex
On 7/16/07, Jani Taskinen wrote:
> Thank you Lukas and Antony. Could not agree more..
>
> On Mon, 2007-07-16 at 19:19 +0400, Antony Dovgal wrote:
> > Thank you Lukas for expressing exactly my thoughts on this.
> >
> > On 16.07.2007 18:24, Lukas Kahwe Smith wrote:
> > > [...]
> > > At any rate .. the time is now to make a decision on what its gonna be. PHP6 with BC hacks or not.

Another thing to mention is that without GLOBALS (PHP6), most applications and *cough* php4-developers *cough* will have far more problems than without POSIX regexes.

D
Re: [PHP-DEV] POSIX regex
On 7/16/07, Jani Taskinen wrote:
> Thank you Lukas and Antony. Could not agree more..

But we all agree, don't we? :)
Re: [PHP-DEV] POSIX regex
Non-core PHP developer speaking, so read with that in mind:

One of the things that held back PHP 5 adoption for so long, IMO, is the large amount of FUD that surrounded it. Even now, 3 years after it was released, I keep seeing the argument that "I can't drop PHP 4 and use PHP 5, then I have to rewrite *everything* to use objects. I hate objects." That is, of course, completely untrue, and if you're paying even moderate attention it's not at all difficult to write code that runs just fine in both PHP 4 and PHP 5, with and without register_globals and magic_quotes. All it takes is a little forethought and not letting yourself be sloppy.

Writing PHP 5/6 compatible code needs to be just as easy, if not easier, in addition to having better marketing to head off the FUD. Taking a stance of "you'll have to start from scratch if you want to be PHP 6 compatible, oh well" is an absolutely sure-fire way to guarantee that no one uses PHP 6 for anything except niche markets.

If people are still relying on register_globals at this point, sure, they're screwed no matter what they do. But code written to PHP 5 E_STRICT standards with a recommended configuration (register_globals off, etc.) should be possible to make run successfully in PHP 6 without gutting and starting from scratch (even if you can't use the new-and-cool features). If not, GoPHP6 will be a failure before it even gets started. :-)

(And yes, I'm already pondering how to do GoPHP6 in order to make the 5/6 transition smoother.)

On Monday 16 July 2007, Andi Gutmans wrote:
> I disagree with this view of the world. It doesn't have to be a complete either/or decision, and labeling everything as a "BC hacks" decision is an inaccurate and populistic way of building FUD.
>
> There are clear things we want to change (like register_globals) because we believe that ultimately they have a significant benefit to our users with controllable downside (there is an easy one line workaround which we can document for people to get their old apps to work). There are other areas where breaking BC makes sense. But saying we should just break it across the board and not even consider having a good upgrade path for our users is unreasonable. I believe we can have a very good PHP 6, which is pretty much in sync with many of your feelings, but that provides a well documented and reasonable upgrade path (unlike VB -> VB.NET).
>
> If you want to break everything and anything and don't want to be limited whatsoever by our huge user-base then maybe you should write a new language which fits exactly what your preference would be. The fact is though, that even after these discussions and the Paris discussions, the bulk of the idiosyncrasies which make PHP what it is today will remain (as per agreement). So there must have been some kind of view even by the folks here that they don't want to create a new language but improve on what we have. And it's a trade-off between bang for the buck; sometimes it really brings high returns to break BC, especially when it comes to security; but sometimes, except for making 10 PHP devs happy who are not the bulk of our users, it doesn't.
>
> So let's not oversimplify this situation. We have to continue to make trade-offs. Btw, one of PHP's strengths has been in high performance sites and with a Unicode=on only mode this would take quite a hit (but it's not the only reason why we need choice).
>
> In any case, I think on this question it does make sense that we start making informed decisions by understanding the migration path better, as opposed to just basing decisions on gut feelings. Maybe that kind of learning experience will prove me wrong (which may be so).
>
> Andi
>
> -----Original Message-----
> From: Lukas Kahwe Smith
> Sent: Monday, July 16, 2007 7:25 AM
> To: Andi Gutmans
> Cc: Ilia Alshanetsky; internals@lists.php.net
> Subject: Re: [PHP-DEV] POSIX regex
>
> Andi Gutmans wrote:
> > Even in PHP 6 I am not sure it's a good idea. There are a huge amount of apps that use them and it'll be very hard for people to upgrade.
> > Anyway, let's do some more research on that once we get closer to PHP 6 and see what the migration path looks like. We'll have to check with a few popular apps + google code search :) No need to decide on that right now without having more info.
>
> I disagree with this approach. The thing is that we need to get a clear message out ASAP. This all ties into topics like if we will have a unicode off/on switch or not. Delaying these decisions will hurt our userbase. We need to prepare them early.
>
> IMHO we should use PHP6 as the clean up release. Drop unicode on/off switch, accept that the bulk of all code will need to be rewritten from scratch. The benefit will be that it will truly be cleaned up, people will still be able to leverage the bulk of their PHP programming background