Re: [PHP-DEV] POSIX regex [PATCH]

2007-08-06 Thread Richard Lynch
On Mon, July 30, 2007 2:22 am, Richard Lynch wrote:
 On Mon, July 16, 2007 7:47 am, Jani Taskinen wrote:
 Now only places using the POSIX regex functions (ext/ereg/ excluded)
 are
 ext/standard/browscap.c and ext/pgsql/pgsql.c.

 For your review, my first patch (!) along with a php test case, of
 course, in a URL/directory structure that should be familiar:

 http://l-i-e.com/php5/ext/pgsql/

 :-)

 The commit comment should probably have something not unlike this:
 Use PCRE instead of POSIX regex
 Remove stray closing parenthesis in PG_TIME pattern

It's been a week and nobody has commented on this.

Should somebody commit it now?...

Or grant me commit karma to ext/pgsql

CVS username is 'lynch'

And, just to be sure, since it only changes internal workings and not
documented features, it should go into 5.x, right?...

Or is requiring PCRE instead of POSIX considered not BC for 5.x series?

I'll check PHP 6 pgsql and see if it's been Unicode-ified beyond
recognition for this patch, or if it applies cleanly there as well.

PS
I'll change the test case to do the insert with the converted data as
a further check that it worked, instead of a rather bogus test insert
of hand-coded data that it does now.

-- 
Some people have a gift link here.
Know what I want?
I want you to buy a CD from some indie artist.
http://cdbaby.com/browse/from/lynch
Yeah, I get a buck. So?

-- 
PHP Internals - PHP Runtime Development Mailing List
To unsubscribe, visit: http://www.php.net/unsub.php



Re: [PHP-DEV] POSIX regex [PATCH]

2007-07-30 Thread Richard Lynch
On Mon, July 16, 2007 7:47 am, Jani Taskinen wrote:
 Now only places using the POSIX regex functions (ext/ereg/ excluded)
 are
 ext/standard/browscap.c and ext/pgsql/pgsql.c.

For your review, my first patch (!) along with a php test case, of
course, in a URL/directory structure that should be familiar:

http://l-i-e.com/php5/ext/pgsql/

:-)

The commit comment should probably have something not unlike this:
Use PCRE instead of POSIX regex
Remove stray closing parenthesis in PG_TIME pattern

Real Hackers can snag the patch and play with it and hit 'delete' now.

Regarding the test case...

The existing pg_convert test case only tested 3 conversions and there
are/were 9 PCRE/POSIX-regex non-trivial conversions.

I didn't really want to mess with adding a bunch of columns to the
existing test table, potentially messing up a bunch of other test
scripts, so just created/dropped my own table to hit all 9 PCREs I
hacked.

There are many other conversions, actually, but they mostly consist of
no-op or typecasting an int to a string with no actual change, or
adding apostrophes around a value to make it DB-ready, and I didn't
touch those anyway, so they should be no less broken than they were
before.

I am, of course, 100% open to critiques, comments, or derogatory
remarks. :-)

PS
The function was and probably should remain experimental in the docs,
I guess...
Though I am pretty sure I did excise one bug with that stray paren. :-)




Re: [PHP-DEV] POSIX regex

2007-07-29 Thread Richard Lynch
On Mon, July 16, 2007 7:47 am, Jani Taskinen wrote:

 Now only places using the POSIX regex functions (ext/ereg/ excluded)
 are
 ext/standard/browscap.c and ext/pgsql/pgsql.c.

As you may know, I'm working on converting ext/pgsql/pgsql.c to use
PCRE instead of POSIX regex.

It's actually going fairly well, believe it or not, though I have a
ton of debug printf's in my C code at the moment, to be sure I'm
hitting all the lines I want to hit to test everything.
[Yeah, I know, there are way more fancy tools available this decade,
but I'm old.]

At any rate, I've hit a bit of a snag, and either I'm being stupid, or
there's been a bad regex pattern in there for a while now...

Does not this line:
http://lxr.php.net/source/php-src/ext/pgsql/pgsql.c#5021
have an extra closing paren at the end of the pattern?
[I am making my patch against PHP 5; it just hasn't changed since 4]

My PCRE patch is telling me it's broken there:
Warning: pg_convert(): Compilation failed: unmatched parentheses at
offset 47 in /home/lynch/pg_pcre.php on line 28
[Seems weird how I get a PHP error out of my C code, but there it is...]

My eyes, and counting up/down on my fingers like I learned in Lisp
class in college, say it does.

The Regex Coach says it does.

Does POSIX regex not generate some kind of error on an extra paren at
the end?

Or am I missing something particularly arcane or abstract here?

If it's actually broken:
My PCRE patch will just not have that extra closing paren, so don't
rush a patch through just for this, unless you really want to.

I expect to wrap up in a couple days, unless something totally
unexpected crops up.
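
For anyone who would rather not count parens on their fingers, here is a
minimal sketch in Python. It is not the C patch, and the pattern below is
NOT the one from pgsql.c (that pattern is not reproduced in this thread);
BROKEN, FIXED and both helper names are made up for illustration. Python's
re module is a different engine from PCRE, but it likewise refuses to
compile a pattern with a stray closing paren, so it shows the same class
of failure as the warning quoted above.

```python
import re


def first_unmatched_close(pattern: str) -> int:
    """Return the offset of the first unmatched ')' in pattern, or -1.

    Backslash-escaped characters are skipped, and parentheses inside a
    [...] bracket expression are treated as literals. POSIX classes such
    as [[:digit:]] are not parsed specially; that does not affect the
    paren count.
    """
    depth = 0
    in_class = False
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        if ch == "\\":
            i += 2                      # skip the escaped character
            continue
        if in_class:
            if ch == "]":
                in_class = False
        elif ch == "[":
            in_class = True
            # a ']' directly after '[' or '[^' is a literal member
            if pattern[i + 1:i + 2] == "^":
                i += 1
            if pattern[i + 1:i + 2] == "]":
                i += 1
        elif ch == "(":
            depth += 1
        elif ch == ")":
            if depth == 0:
                return i                # nothing open to close
            depth -= 1
        i += 1
    return -1


def compiles(pattern: str) -> bool:
    """True if Python's re engine accepts the pattern."""
    try:
        re.compile(pattern)
        return True
    except re.error:
        return False


# Hypothetical time-like pattern with one stray ')' before the final '$'.
BROKEN = r"^([0-9]{2}):([0-9]{2}))$"
FIXED = r"^([0-9]{2}):([0-9]{2})$"
```

Calling first_unmatched_close(BROKEN) points at the stray paren, and
compiles(BROKEN) is False while compiles(FIXED) is True. Whether a given
POSIX regex library complains about the same pattern is a separate
question that this sketch does not answer.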




Re: [PHP-DEV] POSIX regex

2007-07-25 Thread Philip Olson

snip
Also I wonder how a unicode on/off switch will be handled on the
documentation side. It would add more permutations in the
documentation to have the switch. From my understanding the
situation is fairly non-trivial already in how to handle all the
version-dependent differences. Philip, what's your take on this?


I don't think it matters for documentation because both routes have  
hurdles and planning requirements. But, it's exciting that we're  
worrying about this because it's time we educate the world to  
understand why unicode is useful, and why it's needed today. Andrei  
asked the documentation team to start the unicode documentation  
process long ago but given that nobody knows what PHP 6 will be, it  
makes that tough so we've (for time reasons too) done little.  
However, each function has a unicode section dedicated to it and
general unicode feature sections planned. I don't know if a PHP 6
version of the manual would be a good route to take, but it's possible,
although I prefer shoving information into a user's face, both past
and present, so said user knows what to look for and worry about in
all directions. Each function now has a changelog for that.


In reply to removing the directive, I fear that PHP 6 would be  
discussed as === PHP 5 + Unicode when this won't be true... yet this  
idea could persist and cause confusion so let's be sure everyone  
realizes this from day #.01. It's the main new (and big) feature  
only, so that's all we can promise. And in this scenario please  
decide what PHP 7 could be. Would we have 5/7, 7/8, or just 7 with  
unicode. In other words, coupled PHP versions forever? Or just once.  
And regardless, we need an effective marketing strategy via PHP.net  
that does not solely rely on third parties, word of mouth, or PHP's  
greatness like we've done in the past. This includes the website and  
documentation, and this includes strong efforts by everyone. Like,  
explaining ways to be forward compatible. And perhaps PHP 6 will  
bring with it a new web design, with pictures of little children from  
all around the world happily holding hands... :-)


So unless something truly innovative seeps up (maybe it has) then
stealing ideas from other languages' experience and growing pains
(like Python and Java) sounds good. If a document existed that
compared the situation in many programming languages, the pros and  
cons, that would be great and might shed light in many of the right  
places. At least, for me. And/or an update deciphering where we're at  
after all these lengthy unicode threads. If it's time to go old  
school with two sides presenting official statements/arguments, then  
a vote, then so be it. But I don't feel we're quite there yet.


Regards,
Philip




RE: [PHP-DEV] POSIX regex

2007-07-20 Thread Derick Rethans
On Wed, 18 Jul 2007, Andi Gutmans wrote:

 Functions would work properly with Unicode, but you would explicitly
 create Unicode strings e.g. u"foobar". This is not uncommon practice and
 many other languages actually go down this route incl. Python and
 various versions of C++ frameworks.

That's what I meant, Unicode is not implied so it doesn't work by 
default.

Derick




Re: [PHP-DEV] POSIX regex

2007-07-20 Thread Alexey Zakhlestin

On 7/20/07, Stanislav Malyshev wrote:

I think on Windows you can do something with the registry per-dir too.
On unix there's no registry though. Maybe we need some generic solution
to this (like for FastCGI users)? Anybody has good ideas?


FastCGI users already can have their own php.ini for every application

--
Alexey Zakhlestin
http://blog.milkfarmsoft.com/




Re: [PHP-DEV] POSIX regex

2007-07-20 Thread Pierre

On 7/20/07, Alexey Zakhlestin wrote:

On 7/20/07, Stanislav Malyshev wrote:
 I think on Windows you can do something with the registry per-dir too.
 On unix there's no registry though. Maybe we need some generic solution
 to this (like for FastCGI users)? Anybody has good ideas?

FastCGI users already can have their own php.ini for every application


Having 100 FCGI only because you have 100 different configs is not an option.

--Pierre




Re: [PHP-DEV] POSIX regex

2007-07-20 Thread Jani Taskinen
On Thu, 2007-07-19 at 15:39 -0700, Andrei Zmievski wrote:
 Python did go down that road, but take a look at Python 3000 effort  
 and you will see that what they are trying to do is exactly what we  
 have: native Unicode strings, without prefixes.

So maybe we should learn from mistakes others have already made and not
do the same.. and remove that stupid option before it's too late.

--Jani




Re: [PHP-DEV] POSIX regex

2007-07-20 Thread Richard Lynch
On Thu, July 19, 2007 7:52 pm, Stanislav Malyshev wrote:
 Yeah I also like that casting better than the u

 It's different things. Casting means create string as binary, then in
 runtime cast it to unicode, u means this string is unicode.

Oh.

I think we're going to have to write some documentation on that one
before implementation, or a zillion users are gonna be very very
confused...

If it remains one of those undocumented functions for any length of
time, expect mass confusion :-)

u"stuff typed in unicode" to allow creation of Unicode strings in PHP5
seems like a Good Idea to this naive reader, if it's easy enough to
code that.

It may even ease the transition from 5 to 6 for some?

Presumably u"foo" would be a no-op in PHP 6 with semantics on and
not generate some kind of silly error or something, right?...
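
Python, which this thread keeps pointing to, makes the cast-versus-prefix
distinction concrete. A rough sketch, assuming Python 3.3 or later (where
the u prefix is accepted and changes nothing); the name runtime_cast is
made up here, and none of this is PHP 6 syntax.

```python
def runtime_cast(raw: bytes) -> str:
    """'Cast' route: start from binary data and convert while running."""
    return raw.decode("utf-8")   # raises UnicodeDecodeError on bad input


# Literal written as bytes: five bytes, the last two encode one character.
BINARY_LITERAL = b"caf\xc3\xa9"

# Literal written as text: four characters, Unicode from parse time on.
# The u prefix is optional in Python 3.3+, so both spellings are equal.
UNICODE_LITERAL = u"caf\u00e9"
PLAIN_LITERAL = "caf\u00e9"
```

The cast route can fail at run time (for example on b"\xff"), while a
prefixed literal is already text when the script is parsed. That is the
difference Stanislav describes, shown in another language.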




Re: [PHP-DEV] POSIX regex

2007-07-20 Thread Richard Lynch
On Fri, July 20, 2007 3:07 am, Alexey Zakhlestin wrote:
 On 7/20/07, Stanislav Malyshev wrote:
  I think on Windows you can do something with the registry per-dir
  too. On unix there's no registry though. Maybe we need some generic
  solution to this (like for FastCGI users)? Anybody has good ideas?

 FastCGI users already can have their own php.ini for every application

Perhaps the OP just needs a link to a good HowTo FastCGI reference...

http://www.fastcgi.com/docs/faq.html#PHP

It would be nice if it were a bit more specific about the CLI install
hack...

Or if PHP out of the box compiled --with-fastcgi as a different binary
name so there was no hack... :-v




Re: [PHP-DEV] POSIX regex

2007-07-20 Thread Richard Lynch


On Thu, July 19, 2007 8:29 am, Jani Taskinen wrote:
 On Thu, 2007-07-19 at 15:47 +0300, Tomas Kuliavas wrote:
   From the low end user perspective I think this would be great from
   another POV. Let's imagine for a second that Wordpress will only work
   with unicode semantics off and that phpBB will only work with the
   switch on. What if someone would want to run both on a shared server?

  from httpd.conf

  <Directory /var/www/example.org/www/phpbb>
    php_admin_flag unicode.semantics on
  </Directory>
  <Directory /var/www/example.org/www/wp>
    php_admin_flag unicode.semantics off
  </Directory>

 Hmm..I forgot that this works for ZEND_INI_SYSTEM type of options.
 Live and learn I guess. :)

 Too bad it only works for Apache module.. ;)

Maybe I'm being stupid, but why would this work when .htaccess isn't
supposed to work for Unicode on/off because it would require too much
gnarly ifdef-type code in PHP source?

Maybe this doesn't really really work at all and it's going to be a
problem?




Re: [PHP-DEV] POSIX regex

2007-07-20 Thread Johannes Schlüter
On Fri, 2007-07-20 at 15:46 -0500, Richard Lynch wrote:
 u"stuff typed in unicode" to allow creation of Unicode strings in PHP5
 seems like a Good Idea to this naive reader, if it's easy enough to
 code that.

No, we can't introduce a unicode string type in PHP 5.

johannes




RE: [PHP-DEV] POSIX regex

2007-07-20 Thread Mike Robinson
Jani Taskinen writes:

 On Thu, 2007-07-19 at 15:39 -0700, Andrei Zmievski wrote:
  Python did go down that road, but take a look at Python 3000 effort 
  and you will see that what they are trying to do is exactly what we
  have: native Unicode strings, without prefixes.
 
 So maybe we should learn from mistakes others have already
 made and not do the same.. and remove that stupid option
 before it's too late.

You betcha! IMHO, it'll be a persistent ugliness and source of headaches and
regret for a long time.

Best Regards


Mike Robinson




Re: [PHP-DEV] POSIX regex

2007-07-19 Thread Richard Lynch
On Wed, July 18, 2007 10:45 am, Zeev Suraski wrote:

I also was thinking the other day, like Ze'ev, that PHP Devs aren't
really in touch with the unwashed masses of the userbase...

There are a zillion websites out there that run on shared hosts with
copy/pasted code and all these scripters will get burned big-time if
ereg is suddenly unavailable.

They don't really care about PCRE versus POSIX, so long as they can
get the job done.

I suspect all the shared webhosts will just install ereg once they
figure out that their users who never re-factor need it, but they'll
be pretty cranky with you for nuking it and making them jump through
an extra hoop to bring it back.

And all the distro package-maintainers will probably just bundle it
right into their packages.

And there will be tutorials on how to compile PHP with ereg in it, or
how to add it back into windows, or how to install PECL ereg.

So just yanking ereg will cause a fair amount of grief, followed by
the dubious benefit of thousands of users figuring out how to install
a PECL module.

Any gurus really offended by ereg can --disable-ereg or whatever it
is, no?

At least just spit out an E_DEPRECATED in PHP 6, and move it to PECL
in PHP 7.

Give people enough warning that it's going away before nuking it, so
that you can at least say "You've been warned for a whole major
release that it was going away."
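
The "warn for a whole release, then remove" idea can be sketched as a thin
shim that still does the work but complains every time it is called. This
is an illustration in Python only, not how ereg or PHP's error levels are
implemented; legacy_match is a hypothetical name standing in for a legacy
regex call.

```python
import re
import warnings


def legacy_match(pattern: str, subject: str) -> bool:
    """Hypothetical stand-in for a legacy regex call kept for old code."""
    warnings.warn(
        "legacy_match() is deprecated; use re.search() instead",
        DeprecationWarning,
        stacklevel=2,           # report the caller's line, not this one
    )
    # Old code keeps working while the warning is in place.
    return re.search(pattern, subject) is not None
```

Copy/pasted scripts keep running during the warning period, and whoever
reads the logs has a release cycle to notice before the call disappears.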

I suspect you'll still end up with people just installing it rather
than re-writing their code, though, so it's not serving any real
purpose to any real users to move it.

The people who need to use PCRE exclusively can do that already.

The people who need their legacy code to work will just have to jump
through an extra hoop.

What purpose is served, then, in moving ereg out? None, really.

PS
I'm working on the PostgreSQL POSIX-PCRE patch, as I don't think PHP
itself should need ereg.




Re: [PHP-DEV] POSIX regex

2007-07-19 Thread Lukas Kahwe Smith

Richard Lynch wrote:


Any gurus really offended by ereg can --disable-ereg or whatever it
is, no?


So in a dream world, Rasmus would have shipped all the features of PHP 
42 as his first release.


In a slightly less dreamy, but still unrealistic world, we would have 
infinite development resources to maintain all the BC hacks in the world.


In reality, we have limited resources, so it's not about being
offended, it's about yet another redundant extension that needs
to be supported. This is the point with a lot of this: how do we set the
priorities in managing the scarce resources? For the most part, this is
pretty automatic: whatever people do is what we prioritize, and the other
stuff is left for someone else to pick up if they care. Obviously it's
not quite that extreme, since there are several people who are willing
to do stuff they do not need (or they have a company sponsoring them),
just to move PHP forward.


regards,
Lukas




RE: [PHP-DEV] POSIX regex

2007-07-19 Thread Richard Lynch
On Wed, July 18, 2007 3:04 am, Derick Rethans wrote:
 I hope you are not suggesting to port them to both modes? Why on earth
 should an application support both unicode=off and unicode=on? That's
 exactly the thing that some of us are so afraid of and want to prevent
 as this just annoys more and more PHP users that have to deal with
 this stuff. And as mentioned before, having both modes is *way* worse
 than having to deal with register_globals on/off or magic_quotes, as
 those two cases could at least be handled in user space.

I suspect some apps can only be reasonably ported one way or the other.

But one would hope that an app could make the choice to go either way,
and not have a nightmare experience.

The purpose of the PHP Devs doing a port is not to release both
versions, or either version, but to find out if it can actually be
done without major grief for either version.




Re: [PHP-DEV] POSIX regex

2007-07-19 Thread Keryx Web

Zeev Suraski wrote:

 Other than the theological views some people on this list have 
(either very pro-BC or anti-BC), what did keeping BC cost us?


Hey that must be me he is talking about - as I am a real theologian!

So for a theologian's 2c on Unicode:

1. Teaching unicode and PHP

As stated elsewhere I am *working* as a teacher. I follow this list for 
one *main* purpose and that is I am trying to remedy the extremely sad 
situation when it comes to books and other teaching material about PHP 
in Sweden. All books we have got by Swedish authors are so bad that I
actively discourage people from reading them!


I am trying to write an advanced newbie book that will focus on PHP 6 
(+ some HTML 5, CSS 3 and JS 2), with an emphasis on best practice.


In Sweden we can do nicely with iso-8859-1 (we do not even need the 
stinkin' euro-symbol!) But I have students that have developed websites 
in Arabic, Kurdish and Hindi!


I am appalled to see some comments even seemingly questioning if Unicode 
is worthwhile at all. That's a no brainer! i18n is the next big move on 
the web. But what technique would be easier to grasp when it comes to 
switching it on or off? Considering that PHP's main strength always
has been its low entry barrier, I think this is a reasonable
consideration. And maybe I am the only one on this list that deals daily 
with newbies...?


From this POV I would definitely say that it would be easier to teach 
that in PHP 6 unicode is always on and in PHP 5 it's N/A. I do however 
find the arguments compelling that such an ideal would be impractical.


My second best option would be something that can be turned on or off 
within the scripts, i.e. with ini_set or per directory with .htaccess


From the low end user perspective I think this would be great from 
another POV. Let's imagine for a second that Wordpress will only work 
with unicode semantics off and that phpBB will only work with the switch
on. What if someone would want to run both on a shared server?


But as my commit karma is zero I do not know if this is feasible at all.

2. User base.

There is not one voice on this list as far as I can tell that is from
the CJK-language hemisphere. Is it part of the PHP way to be
Europe/America ethnocentric?


I think it would be a noble thing to actively try to engage PHP 
developers from Asia in this discussion. (Well, besides the Israeli 
ones... who *are* doing a great job!)


3. Adoption rate.

When PHP 5 was new we got two books in Sweden claiming to teach this 
version. When I read them there was so little PHP 5 in there that it was 
scary. Even today most resources that newbies read tend to teach PHP 4. 
Most discussion fora - at least in Sweden - discuss PHP 4 solutions to 
people's problems.


This spring I actually taught my students PDO - but then my wife got ill 
and had a heart transplant. When I got back to school and started 
grading my students' work, all but two had switched to the mysql
extension. I asked why, and all said that they had found tutorials and 
help in a discussion forum, all teaching the old way.


I undertook a study: All four totally dominant sites in Sweden where a 
young developer would turn, all teach PHP 4. (Two of them also teach 
table-based-layout, unsemantic, inaccessible, proprietary HTML and 
obtrusive browser-sniffing old school DHTML.)


Conclusion: Every advance in PHP internally has to be communicated to us 
who teach PHP and the easier something is, the more likely it is that it 
will be picked up.



Lars Gunther




Re: [PHP-DEV] POSIX regex

2007-07-19 Thread Tomas Kuliavas
  From the low end user perspective I think this would be great from
 another POV. Let's imagine for a second that Wordpress will only work
 with unicode semantics off and that phpBB will only work with the switch
   on. What if someone would want to run both on a shared server?

from httpd.conf

<Directory /var/www/example.org/www/phpbb>
   php_admin_flag unicode.semantics on
</Directory>
<Directory /var/www/example.org/www/wp>
   php_admin_flag unicode.semantics off
</Directory>

Code written to work in unicode.semantics = off, can work in
unicode.semantics=on. It just has to deal with functions that expect
binary strings instead of PHP5 strings. Other side effects of
unicode.semantics=on can be switched off without breaking backwards
compatibility.
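
To make the per-directory idea concrete, here is a small model of the
lookup in Python: the most specific matching directory decides the flag
for a given script. This is a sketch of the concept only, not Apache's or
PHP's actual code; the table and the function name unicode_semantics_for
are hypothetical and simply mirror the two directories in the httpd.conf
fragment above.

```python
from pathlib import PurePosixPath

# Hypothetical per-directory table mirroring the httpd.conf fragment.
DIRECTORY_FLAGS = {
    "/var/www/example.org/www/phpbb": True,
    "/var/www/example.org/www/wp": False,
}


def unicode_semantics_for(script_path: str, default: bool = False) -> bool:
    """Return the flag of the deepest configured directory containing the script."""
    path = PurePosixPath(script_path)
    best_depth = -1
    value = default
    for directory, flag in DIRECTORY_FLAGS.items():
        d = PurePosixPath(directory)
        # Compare whole path components, so /phpbb2 does not match /phpbb.
        if d == path or d in path.parents:
            if len(d.parts) > best_depth:
                best_depth = len(d.parts)
                value = flag
    return value
```

A script under .../www/phpbb/ resolves to on, one under .../www/wp/ to
off, and anything else falls back to the server-wide default, which is the
shared-server scenario raised earlier in the thread.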

-- 
Tomas




Re: [PHP-DEV] POSIX regex

2007-07-19 Thread Jani Taskinen
On Thu, 2007-07-19 at 14:27 +0200, Keryx Web wrote:
 one *main* purpose and that is I am trying to remedy the extremely sad 
 situation when it comes to books and other teaching material about PHP 
 in Sweden. All books we have got by Swedish authors are so bad that I 
 actively discourage people from reading them!

Perhaps you should teach the students English? And encourage them to
read English books, which are widely available.. :D

I really thought most Swedes do learn English in school? Like we Finns
do.. :)

 another POV. Let's imagine for a second that Wordpress will only work 
 with unicode semantics off and that phpBB will only work with the switch 
   on. What if someone would want to run both on a shared server?

Very good point.

--Jani




Re: [PHP-DEV] POSIX regex

2007-07-19 Thread Jani Taskinen
On Thu, 2007-07-19 at 15:47 +0300, Tomas Kuliavas wrote:
  From the low end user perspective I think this would be great from
  another POV. Let's imagine for a second that Wordpress will only work
  with unicode semantics off and that phpBB will only work with the switch
  on. What if someone would want to run both on a shared server?

 from httpd.conf

 <Directory /var/www/example.org/www/phpbb>
   php_admin_flag unicode.semantics on
 </Directory>
 <Directory /var/www/example.org/www/wp>
   php_admin_flag unicode.semantics off
 </Directory>

Hmm..I forgot that this works for ZEND_INI_SYSTEM type of options.
Live and learn I guess. :)

Too bad it only works for Apache module.. ;)

--Jani




Re: [PHP-DEV] POSIX regex

2007-07-19 Thread Stanislav Malyshev

Too bad it only works for Apache module.. ;)


I think on Windows you can do something with the registry per-dir too.
On unix there's no registry though. Maybe we need some generic solution 
to this (like for FastCGI users)? Anybody has good ideas?

--
Stanislav Malyshev, Zend Software Architect
[EMAIL PROTECTED]   http://www.zend.com/
(408)253-8829   MSN: [EMAIL PROTECTED]




Re: [PHP-DEV] POSIX regex

2007-07-19 Thread Pierre

On 7/19/07, Stanislav Malyshev wrote:

 Too bad it only works for Apache module.. ;)

I think on Windows you can do something with the registry per-dir too.
On unix there's no registry though. Maybe we need some generic solution
to this (like for FastCGI users)? Anybody has good ideas?


Yes, merge htscanner (pecl) into the core (sapi hooks or something
like that). Doing so will also kill the couple of limitations due to
the init order in php. It is on my todos, but I would appreciate any
help :)

Cheers,
--Pierre




Re: [PHP-DEV] POSIX regex

2007-07-19 Thread Andrei Zmievski
Python did go down that road, but take a look at Python 3000 effort  
and you will see that what they are trying to do is exactly what we  
have: native Unicode strings, without prefixes.


-Andrei


On Jul 18, 2007, at 11:51 AM, Andi Gutmans wrote:


Functions would work properly with Unicode, but you would explicitly
create Unicode strings e.g. u"foobar". This is not uncommon practice and
many other languages actually go down this route incl. Python and
various versions of C++ frameworks.





Re: [PHP-DEV] POSIX regex

2007-07-19 Thread Stanislav Malyshev
Python did go down that road, but take a look at Python 3000 effort and 
you will see that what they are trying to do is exactly what we have: 
native Unicode strings, without prefixes.


Maybe still having u - that always produce unicode, regardless of 
semantics - could be helpful...


--
Stanislav Malyshev, Zend Software Architect
[EMAIL PROTECTED]   http://www.zend.com/
(408)253-8829   MSN: [EMAIL PROTECTED]




RE: [PHP-DEV] POSIX regex

2007-07-19 Thread scott.mcnaught
I don't like the idea of having a u prefix for Unicode strings.  It may
improve performance, and give you some level of fine grain control, but...

- It breaks your "keep php simple" policy by introducing a lot of new
functions (ugly).
- I (plus a lot of others) have an existing php5 application which I wish to
eventually use with Unicode, and like others, I don't want to spend time
refactoring.
- It will also introduce bugs when programmers accidentally forget to add
the u prefix when working with unicode.

If you always want to produce Unicode, I think it's best to always use a cast
or a conversion function.

Eg 

$str = (unicode)(strtoupper($str));
Or
$str = unicode_val(strtoupper($str));

My 2c :)







Re: [PHP-DEV] POSIX regex

2007-07-19 Thread David Coallier

On 7/19/07, [EMAIL PROTECTED] [EMAIL PROTECTED] wrote:

I don't like the idea of having a u prefix for Unicode strings.  It may
improve performance, and give you some level of fine grain control, but...

- It breaks your keep php simple policy by introducing a lot of new
functions (ugly).
- I (plus a lot of others) have an existing php5 application which I wish to
eventually use with Unicode, and like others, I don't want to spend time
refactoring.
- It will also introduce bugs when programmers accidentally forget to add
the u prefix when working with unicode.

If you always want to produce Unicode, I think its best to always use a cast
or a conversion function.

Eg

$str = (unicode)(strtoupper($str));
Or
$str = unicode_val(strtoupper($str));

My 2c :)



Yeah I also like that casting better than the u

$0.02 :P





-Original Message-
From: Stanislav Malyshev [mailto:[EMAIL PROTECTED]
Sent: Friday, 20 July 2007 8:47 AM
To: Andrei Zmievski
Cc: Andi Gutmans; Derick Rethans; Lukas Kahwe Smith; Ilia Alshanetsky;
[EMAIL PROTECTED]; internals@lists.php.net
Subject: Re: [PHP-DEV] POSIX regex

 Python did go down that road, but take a look at Python 3000 effort and
 you will see that what they are trying to do is exactly what we have:
 native Unicode strings, without prefixes.

Maybe still having u - that always produce unicode, regardless of
semantics - could be helpful...

--
Stanislav Malyshev, Zend Software Architect
[EMAIL PROTECTED]   http://www.zend.com/
(408)253-8829   MSN: [EMAIL PROTECTED]

--
PHP Internals - PHP Runtime Development Mailing List
To unsubscribe, visit: http://www.php.net/unsub.php






--
David Coallier,
Founder & Software Architect,
Agora Production (http://agoraproduction.com)
51.42.06.70.18

--
PHP Internals - PHP Runtime Development Mailing List
To unsubscribe, visit: http://www.php.net/unsub.php



Re: [PHP-DEV] POSIX regex

2007-07-19 Thread Andrei Zmievski
On Jul 19, 2007, at 4:14 PM, [EMAIL PROTECTED]  
[EMAIL PROTECTED] wrote:


I don't like the idea of having a u prefix for Unicode strings.  It may
improve performance, and give you some level of fine grain control, but...

- It breaks your keep php simple policy by introducing a lot of new
functions (ugly).
- I (plus a lot of others) have an existing php5 application which I wish to
eventually use with Unicode, and like others, I don't want to spend time
refactoring.
- It will also introduce bugs when programmers accidentally forget to add
the u prefix when working with unicode.

If you always want to produce Unicode, I think its best to always use a cast
or a conversion function.

Eg

$str = (unicode)(strtoupper($str));
Or
$str = unicode_val(strtoupper($str));


Good idea and it will totally work, except that it won't. strtoupper()
operates in different ways according to the type of the string
that it gets.
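
The ordering problem can be illustrated with a small sketch. This is not PHP 6 code: it assumes Python's bytes and str types as stand-ins for binary and unicode strings, and the variable names are made up for the illustration.

```python
# Sketch only: bytes stands in for a binary string, str for a unicode string.
word = "\u00f3glaig"  # "óglaig", written with an escape to pin the code point

# Uppercase the binary (UTF-8) form first, then convert to unicode.
# bytes.upper() only changes ASCII letters, so the two bytes that encode
# the accented letter are left untouched.
upper_then_cast = word.encode("utf-8").upper().decode("utf-8")

# Convert to unicode first, then uppercase with full Unicode case mapping.
cast_then_upper = word.encode("utf-8").decode("utf-8").upper()

# upper_then_cast is "óGLAIG" (accented letter still lowercase)
# cast_then_upper is "ÓGLAIG"
results_differ = upper_then_cast != cast_then_upper
```

Casting the result of a binary-mode call cannot recover what the unicode-mode call would have produced, which seems to be the objection here.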


-Andrei

--
PHP Internals - PHP Runtime Development Mailing List
To unsubscribe, visit: http://www.php.net/unsub.php



RE: [PHP-DEV] POSIX regex

2007-07-19 Thread scott.mcnaught
I don't really know much about unicode, and to be honest, I don't really
know much about the internal workings of php.
But I assume that there are going to be different implementations of string
functions depending on whether the string is unicode or not.

I'm going to suggest an implementation... Keep in mind I haven't
hacked around with the php source, so my variable naming etc. will be wrong...
and it's all pseudocode, so it's not

// The object type used when php creates a string
class ZendString
{
    char *strPtr; // however strings are stored in php
    ZendStringFunctions *pFunctions; // implementations for this string's type
};


abstract class ZendStringFunctions
{
    abstract function strtolower(ZendString *pStr);
    abstract function strtoupper(ZendString *pStr);
    abstract function substr(ZendString *pStr);

    // All functions that differ depending on unicode / non-unicode
    // implementation
    // ...
};

// A set of string functions for unicode strings
class ZendStringFunctionsUnicode extends ZendStringFunctions
{
    function strtolower(ZendString *pStr)
    {
        // unicode implementation
    }

    function strtoupper(ZendString *pStr)
    {
        // unicode implementation
    }

    function substr(ZendString *pStr)
    {
        // unicode implementation
    }
};

// A set of string functions for non-unicode strings
class ZendStringFunctionsNonUnicode extends ZendStringFunctions
{
    function strtolower(ZendString *pStr)
    {
        // non-unicode implementation
    }

    function strtoupper(ZendString *pStr)
    {
        // non-unicode implementation
    }

    function substr(ZendString *pStr)
    {
        // non-unicode implementation
    }
};


// the strtolower implementation: dispatch on the string's own function table
ZEND_FUNC strtolower(ZendString *pStr)
{
    return pStr->pFunctions->strtolower(pStr);
}

// the strtoupper implementation: dispatch on the string's own function table
ZEND_FUNC strtoupper(ZendString *pStr)
{
    return pStr->pFunctions->strtoupper(pStr);
}

ZEND_FUNC unicode_val(ZendString *pStr)
{
    // do something with pStr->strPtr
    delete pStr->pFunctions;
    pStr->pFunctions = new ZendStringFunctionsUnicode();
}


Anyway - the point I'm trying to make is to use function pointers to switch
between implementations. 

You could even make the ZendStringFunctions singletons and just set
pStr->pFunctions to an instance of the singleton.

I think this would provide a very fast implementation of what is trying to
be done.
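
A runnable sketch of the same idea, for anyone who wants to play with it. It is written in Python rather than against the real engine, and every name in it (StringFunctions, BINARY, UNICODE, the ZendString dataclass) is hypothetical; nothing here reflects how PHP actually stores strings.

```python
from dataclasses import dataclass


class StringFunctions:
    """Operations whose behaviour depends on the kind of string."""

    def strtoupper(self, data):
        raise NotImplementedError

    def strlen(self, data):
        raise NotImplementedError


class BinaryFunctions(StringFunctions):
    def strtoupper(self, data):
        return data.upper()  # bytes.upper() only changes ASCII letters

    def strlen(self, data):
        return len(data)  # number of bytes


class UnicodeFunctions(StringFunctions):
    def strtoupper(self, data):
        return data.upper()  # str.upper() applies Unicode case mapping

    def strlen(self, data):
        return len(data)  # number of code points


# One shared instance per implementation (the "singleton" idea).
BINARY = BinaryFunctions()
UNICODE = UnicodeFunctions()


@dataclass
class ZendString:
    data: object                 # bytes for binary, str for unicode
    functions: StringFunctions   # table used to dispatch every call


def strtoupper(s):
    return ZendString(s.functions.strtoupper(s.data), s.functions)


def strlen(s):
    return s.functions.strlen(s.data)


def unicode_val(s, encoding="utf-8"):
    """Return a unicode string; the payload is decoded and the table swapped."""
    if s.functions is UNICODE:
        return s
    return ZendString(s.data.decode(encoding), UNICODE)


raw = ZendString("\u00f3glaig".encode("utf-8"), BINARY)  # "óglaig" as UTF-8
binary_length = strlen(raw)                 # 7 bytes
unicode_length = strlen(unicode_val(raw))   # 6 code points
```

Because the two tables are shared instances, switching a string between modes costs one pointer assignment plus the conversion of the payload itself.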

I'm just making a suggestion, and feel free to ignore/criticise me if I'm
wrong.  I don't know anything about PHP's internals... Just an idea

Scott


-Original Message-
From: Andrei Zmievski [mailto:[EMAIL PROTECTED] 
Sent: Friday, 20 July 2007 9:36 AM
To: [EMAIL PROTECTED]
Cc: internals@lists.php.net
Subject: Re: [PHP-DEV] POSIX regex

On Jul 19, 2007, at 4:14 PM, [EMAIL PROTECTED]  
[EMAIL PROTECTED] wrote:

 I don't like the idea of having a u prefix for Unicode strings.  It may
 improve performance, and give you some level of fine grain control, but...

 - It breaks your keep php simple policy by introducing a lot of new
 functions (ugly).
 - I (plus a lot of others) have an existing php5 application which I wish to
 eventually use with Unicode, and like others, I don't want to spend time
 refactoring.
 - It will also introduce bugs when programmers accidentally forget to add
 the u prefix when working with unicode.

 If you always want to produce Unicode, I think its best to always use a cast
 or a conversion function.

 Eg

 $str = (unicode)(strtoupper($str));
 Or
 $str = unicode_val(strtoupper($str));

Good idea and it will totally work, except that it won't. strtoupper()
operates in different ways according to the type of the string that it
gets.

-Andrei

-- 
PHP Internals - PHP Runtime Development Mailing List
To unsubscribe, visit: http://www.php.net/unsub.php




RE: [PHP-DEV] POSIX regex

2007-07-19 Thread scott.mcnaught
Sorry, if you are using Outlook, turn off the thing that says "Extra line
breaks in this message were removed" at the top of my previous message.

Scott



-- 
PHP Internals - PHP Runtime Development Mailing List
To unsubscribe, visit: http://www.php.net/unsub.php



Re: [PHP-DEV] POSIX regex

2007-07-19 Thread Stanislav Malyshev

Yeah I also like that casting better than the u


These are different things. Casting means: create the string as binary, then
cast it to unicode at runtime; u means this string is unicode.

--
Stanislav Malyshev, Zend Software Architect
[EMAIL PROTECTED]   http://www.zend.com/
(408)253-8829   MSN: [EMAIL PROTECTED]

--
PHP Internals - PHP Runtime Development Mailing List
To unsubscribe, visit: http://www.php.net/unsub.php



Re: [PHP-DEV] POSIX regex

2007-07-19 Thread David Coallier

On 7/19/07, Stanislav Malyshev [EMAIL PROTECTED] wrote:

 Yeah I also like that casting better than the u

These are different things. Casting means: create the string as binary, then
cast it to unicode at runtime; u means this string is unicode.
--


You are right that casting means the string starts out as binary and is
cast to unicode at runtime, whereas u means the string is unicode from
the start. However, after speaking with many PHP developers (not
internals), the same answer always comes up: "It's ugly." Does that
simply mean that it's ugly? I believe not; it means that it's also
unreadable, unclear at first look, and easy to overlook.

One solution that I could foresee would be to recognize (unicode)
within a function call.
Ex:
strlen( (unicode) "Óglaig");

This would runtime-cast "Óglaig" to a unicode string.
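
Why the cast matters for strlen can be shown with a short sketch. It assumes UTF-8 as the binary encoding and uses Python's bytes and str as stand-ins for the two string types; it is an illustration, not the proposed PHP syntax.

```python
word = "\u00d3glaig"               # "Óglaig"
as_binary = word.encode("utf-8")   # the bytes a binary string would hold

byte_length = len(as_binary)                   # 7: the accented letter takes 2 bytes
char_length = len(as_binary.decode("utf-8"))   # 6: one per character after the cast
```

So the same call gives 7 or 6 depending on whether the argument was cast before the function saw it.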

Expected answers:
1) I don't find it to be useful and better than u
A: It is more readable, easier to find/notice and simply cleaner.

2) No
A: Ok..

3) It's going against the usual casting standard of (type)
A: True


The decision has probably been made already, and if so just let me know
and I'll stop trying to raise a voice for the community :P


And no, I do not have a patch ;-)



Stanislav Malyshev, Zend Software Architect
[EMAIL PROTECTED]   http://www.zend.com/
(408)253-8829   MSN: [EMAIL PROTECTED]





--
David Coallier,
Founder & Software Architect,
Agora Production (http://agoraproduction.com)
51.42.06.70.18

--
PHP Internals - PHP Runtime Development Mailing List
To unsubscribe, visit: http://www.php.net/unsub.php



Re: [PHP-DEV] POSIX regex

2007-07-18 Thread Pierre

On 7/18/07, Zeev Suraski [EMAIL PROTECTED] wrote:


Pierre,

I wanted to send my 2c even though I'm not really involved in
internals@ any longer - because in reality it doesn't really have
much to do with such decisions.  internals@ makes decisions that
affect the entire PHP userbase.

We all need to remember that the people on this mailing list are not
close to something that represents the userbase.  We do have some
very opinionated people on this list, some of them with a lot of
commit-karma - which are not very open to feedback from regular
users.  I'm not saying I represent the PHP userbase, and I don't
think Andi is saying this either - but both of us try to take the end
user's view when we think about stuff like this, as opposed to the
internals@ PHP developer view.  I would go as far as saying that I
think we do it (as well as some others, like Rasmus) more so than
some others on this list.

For that reason I suspect that if you moved the discussion to, say,
php-general - you'd see a much more balanced view of the
world.  Unfortunately it will probably not be very
manageable.  Something more practical would be trying to think about
things from the end users perspective as opposed to our perspective
as the developers and maintainers of the language.

Finally, at the risk of sounding like a broken record, we always need
to remember that BC breakage accumulates, and it's not binary.  Every
cleanup we do in PHP 6 will further slow migration, and as Andi
pointed out a few days ago, things don't look too well as it is.

As for ereg - especially in light of the discontinuation of PHP 4 we
shouldn't even consider removing it in PHP 5.  I agree with Andi that
I'm not sure it's a good idea for PHP 6 either, but I'm not sure it
isn't either.  As long as it's easy enough to turn it back on (i.e.
have it bundled but disabled) I think it's not unreasonable.


My answer to Andi was not only about ereg but php6 in general (the
unicode flag being a much more important problem than ereg, for
example).

I fully agree with you. Each individual here does not represent the
user base but only a relatively small part of it.

However, my problem here is not about that but about the respect of
our voices. It is understandable that you think you have a brighter
customer base, but that is not necessarily the case, neither
historically nor practically. Conference attendees are also a very
small part of our users.

All in all, internals developers, with their customers, coworkers or
users (Ez, PEAR, linux package maintainers, etc.) do represent what I
consider a good representation of what our users are or would like to
have.

About the migration path, we should not forget our PHP5 lessons. All
Andi is trying to do is what was done with PHP5. Many cleanups were
not done, to avoid BC breaks and migration troubles. We know now that
it does not matter. Users migrate when they have to or need to, not
just for the fun of it.

Finally, you are right to say that an opinion has little to do with
the commit karma.

Cheers,
--Pierre

--
PHP Internals - PHP Runtime Development Mailing List
To unsubscribe, visit: http://www.php.net/unsub.php



RE: [PHP-DEV] POSIX regex

2007-07-18 Thread Derick Rethans
On Tue, 17 Jul 2007, Andi Gutmans wrote:

 Hmm I don't quite understand what bad code vs. good code plays here.
 Wordpress is one of the most popular applications out there so it's got
 huge value to our community. I bet there's a huge amount of PHP
 applications whose source code is of the same quality or worse. Anyway,
 the issues I have seen would also be relevant to what you call good
 code but again, when it comes to compatibility, I don't quite know why
 that will play a big role.
 
 I am talking about porting to both unicode_semantics=off and on. This
 will give us a good understanding of the difference of the modes and
 where we're at. I bet most people who are voicing their opinions have
 actually not tried to write a sizeable application with PHP 6 and also
 tried to run an existing one on PHP 6 
 (unicode_semantics=on).

I hope you are not suggesting to port them to both modes? Why on earth 
should an application support both unicode=off and unicode=on? That's 
exactly the thing that some of us are so afraid of and want to prevent 
as this just annoys more and more PHP users that have to deal with this 
stuff.  And as mentioned before, having both modes is *way* worse than
having to deal with register_globals on/off or magic_quotes, as those
two cases could at least be handled in user space.

regards,
Derick

-- 
Derick Rethans
http://derickrethans.nl | http://ez.no | http://xdebug.org

-- 
PHP Internals - PHP Runtime Development Mailing List
To unsubscribe, visit: http://www.php.net/unsub.php



Re: [PHP-DEV] POSIX regex

2007-07-18 Thread Derick Rethans
On Tue, 17 Jul 2007, Stanislav Malyshev wrote:

  that would actually benefit quite a bit from unicode support, but I guess
  you are talking about porting with unicode==off, right?
 
 unicode=off doesn't mean no unicode support, btw.

Of course that's what it means, as none of the string functions work 
properly with unicode if you turn it off. And that's just the whole 
selling point of Unicode support.

Derick

-- 
PHP Internals - PHP Runtime Development Mailing List
To unsubscribe, visit: http://www.php.net/unsub.php



Re: [PHP-DEV] POSIX regex

2007-07-18 Thread Jani Taskinen
On Wed, 2007-07-18 at 10:20 +0200, Derick Rethans wrote:
 On Wed, 18 Jul 2007, Zeev Suraski wrote:
  As for ereg - especially in light of the discontinuation of PHP 4 we 
  shouldn't even consider removing it in PHP 5.
 
 I don't think anybody wanted to remove it in PHP 5 - just make it 
 possible to disable as an extension.

I guess it was misunderstood: All the talk about it concerns HEAD only,
not PHP 5. But I will MFH the move to ext in PHP_5_3 though. Helps
future merges around when the changes are in both branches. 

--Jani

-- 
PHP Internals - PHP Runtime Development Mailing List
To unsubscribe, visit: http://www.php.net/unsub.php



Re: [PHP-DEV] POSIX regex

2007-07-18 Thread Derick Rethans
On Tue, 17 Jul 2007, Lukas Kahwe Smith wrote:

 Andi Gutmans wrote:
 
  There are clear things we want to change (like register_globals) because
  we believe that ultimately they have a significant benefit to our users
  with controllable downside (there is an easy one line workaround which
  we can document for people to get their old apps to work). There are
  other areas where breaking BC makes sense. But saying we should just
  break it across the board and not even consider having a good upgrade
  path for our users is unreasonable. I believe we can have a very good
  PHP 6, which is pretty much in sync with many of your feelings, but that
  provides a well documented and reasonable upgrade path (unlike VB -
  VB.NET). 
 
 I never said we should break BC just for the hell of it. The goal must be that
 PHP6 feels and behaves like PHP. It's not about hijacking PHP to come up
 with the language we all wanted instead.
 
  So let's not oversimplify this situation. We have to continue to make
  trade-offs.
 
 Sure, but you are suggesting to delay decisions indefinitely. Either you are
 saying this because you already decided that you don't want this change,

Doh, isn't that obvious? 

regards,
Derick

-- 
Derick Rethans
http://derickrethans.nl | http://ez.no | http://xdebug.org

-- 
PHP Internals - PHP Runtime Development Mailing List
To unsubscribe, visit: http://www.php.net/unsub.php



Re: [PHP-DEV] POSIX regex

2007-07-18 Thread Zeev Suraski

At 00:55 18/07/2007, Pierre wrote:

My answer to Andi was not only about ereg but php6 in general (the
unicode flag being a much more important problem than ereg, for
example).

I fully agree with you. Each individual here does not represent the
user base but only a relatively small part of it.

However, my problem here is not about that but about the respect of
our voices. It is understandable that you think you have a brighter
customer base, but that is not necessarily the case, neither
historically nor practically. Conference attendees are also a very
small part of our users.

All in all, internals developers, with their customers, coworkers or
users (Ez, PEAR, linux package maintainers, etc.) do represent what I
consider a good representation of what our users are or would like to
have.


I think that they're still quite far away from a real coverage of the 
entire userbase.  Each of them sees a certain part of the userbase 
through a different prism.  I think that some of us get to see people 
through some more prisms than others, and you may very well be one of 
them - but they are still prisms, and I *think* that most of us don't 
get to meet some of the lower 'average' developers.  The ones that 
don't respond to blogs, go to conferences, let alone participate in 
[EMAIL PROTECTED]  The ones who constitute the vast majority of PHP 
developers around the world - those using it to get their job done.
If you noticed, I didn't just speak about the users that I meet, but 
trying to put myself in the average user's place using a simple 
thought experiment.  I think using this approach (the famous 'WTF 
factor' is a part of that) helped PHP tremendously and was one of the 
key reasons for its success.
That's why I'm pretty confident you'd get a very different (much more 
balanced) view of the world if you ask the question in a more neutral 
environment - such as php-general (and even that list arguably 
includes people with above-average interest in PHP - given that we're 
talking about millions of developers and only thousands of 
subscribers).  Can I realize, from an end-user's point of view, why 
the removal of a certain feature that I'm using would help me?  Or 
will it be much easier for me to imagine the pain involved with 
working around it?


Other than the theological views some people on this list have 
(either very pro-BC or anti-BC), what did keeping BC cost us?



About the migration path, we should not forget our PHP5 lessons. All
Andi is trying to do is what was done with PHP5. Many cleanups were
not done, to avoid BC breaks and migration troubles. We know now that
it does not matter. Users migrate when they have to or need to, not
just for the fun of it.


I think we're learning very different lessons from the same facts.

PHP 5 migration stalled because of several reasons, the key of which 
are (IMHO):

1.  Misperception about the level of compatibility breakage.
2.  Correct perception that moving to PHP 5 requires a full QA cycle 
of your entire codebase with full code coverage (assuming you're 
running a critical app that you can't afford to break, which needless 
to say thousands and thousands of users do);  And contrary to popular 
belief, that's actually a very very big deal.


In the shared hosting arena there's supposedly also lack of support 
for PHP 5 deployment, although the big hosters I've been in touch 
with have provided PHP 5 support (as an option) a couple of months 
after its release, so I'm not sure how much this had to do with it.


Is the lesson we should learn that we need to turn #1 into a correct 
perception, requiring substantial changes and potentially a full code 
audit, and make the migration much more difficult?  Would we ever be 
able to discontinue PHP 5 if migrating to PHP 6 is a truly tough 
task, like we just did with PHP 4?


The less undue compatibility breakage we introduce the better.  I 
hope we can agree on that - turning the discussion into what's 
exactly 'due' and what is 'undue'.


IMHO - if we remove the unicode=off mode, we'll have to support PHP 5
(not the way we supported PHP 4, with bugfixes only for the most part,
but with true backporting of all key features, apps & frameworks
running properly on both versions, etc.) or seriously risk losing our
userbase.  Given that we managed to nail it fairly well already, I
can't understand why we would want to do that and increase the
chances of PHP 6 being a flop quite significantly.


Zeev 


--
PHP Internals - PHP Runtime Development Mailing List
To unsubscribe, visit: http://www.php.net/unsub.php



Re: [PHP-DEV] POSIX regex

2007-07-18 Thread Zeev Suraski

At 01:20 18/07/2007, Derick Rethans wrote:

This sounds like a broken record, this sounds like a broken record, this
sounds like a broken record.  I've heard this so many times now, it
gets boring.


I'm not surprised, but it doesn't change the fact that it's true, though.
No matter how many times this will be discussed or disputed, the more 
we break - the harder it is for our users to move.  It's an axiom, 
and we have to live with it, even if it gets easy to repress it and 
take all sorts of opportunities for an end-of-the-season 
compatibility breakage sale.



 You seem to think that none of the people on the internals
list are part of the user base - that is incorrect. Most of my opinions
come forth out of my involvement with an extremely large code base.


I didn't say that, I did say that they (myself included) don't 
represent the PHP userbase at large and I fully stand behind that statement.
Read my other post from a couple of minutes ago for an explanation as 
to what I mean.



 I'm not saying I represent the PHP userbase, and I don't think Andi is
 saying this either - but both of us try to take the end user's view
 when we think about stuff like this, as opposed to the internals@ PHP
 developer view.  I would go as far as saying that I think we do it (as
 well as some others, like Rasmus) more so than some others on this
 list.

Regarding the unicode on/off modes, I don't think you put yourself in
the developer's view at all. Users are not going to be better off having
to deal with both modes.


Well, I tend to agree with you that they shouldn't have to handle 
BOTH modes (write code that works with both settings).  But they will 
definitely be better off if they can choose one of these modes and 
develop/deploy for it.


For someone for whom PHP 6 is a non-item (no interest in Unicode), 
moving to PHP 6 and being forced to audit his code will be a 
completely unreasonable cost of migration.  A clear 'not worth it' situation.



 For that reason I suspect that if you moved the discussion to, say,
 php-general - you'd see a much more balanced view of the world.

I really doubt that, as that list does not include many people that use
PHP for internal projects. It's mostly the geeks that have time to
discuss on the list. I know that *many* PHP users don't either know
about this list, or simply can't be bothered with it.


You know what, I agree.  I wrote something to that effect in my post 
from a few minutes ago.  The vast userbase is mostly comprised of 
people we hardly even get to see.



 As for ereg - especially in light of the discontinuation of PHP 4 we
 shouldn't even consider removing it in PHP 5.

I don't think anybody wanted to remove it in PHP 5 - just make it
possible to disable as an extension.


Great, I misunderstood.

Zeev 


--
PHP Internals - PHP Runtime Development Mailing List
To unsubscribe, visit: http://www.php.net/unsub.php



Re: [PHP-DEV] POSIX regex

2007-07-18 Thread Rasmus Lerdorf
Derick Rethans wrote:
 Regarding the unicode on/off modes, I don't think you put yourself in 
 the developer's view at all. Users are not going to be better off having
 to deal with both modes.

Have you guys really thought this through?

Let's look at this from two angles.

First, from our perspective maintaining and developing PHP.  Without
the Unicode switch, and as has already been suggested, PHP 5 will never
die.  Anything new in PHP 6 that isn't related to Unicode will be
backported to PHP 5.  Or, a slight variation of that, any developer with
no interest in Unicode will only work on the PHP 5 branch and not bother
worrying about whether it works in PHP 6, forcing others to do that work.
I don't think we have the resources to do this, and I think it is
likely to either create 2 classes of developers and potentially
diverging trees, or it may simply kill off the Unicode effort altogether
if not enough developers bother looking at PHP 6 since PHP 5 will live
forever and is free of all this annoying Unicode stuff that is just too
complicated to deal with.

Second, from the user space PHP developers' perspective.  There are two
groups of those out there.  There is the group that builds apps for
controlled environments.  Yahoo, Facebook, and the hundreds, if not
thousands of smaller companies out there that will define a certain PHP
configuration and code against that.  To them such a switch isn't a big
deal except when it comes to re-using external code.  Which brings us to
the second group which is the group that strives to build portable apps
designed to run on as many unknown PHP configs as possible.  This is the
group that will get hit by this, and here is where we need to figure out
how to cause them the least amount of pain.  They are going to feel some
pain in order to get their heads around Unicode no matter how we handle
this.  For the portion of these folks who don't want to worry about
Unicode at all and they actually have code that does stuff on binary
strings that will break, their stuff just won't work no matter what we
do.  The difference comes down to whether it gets marked as PHP5-only or
it gets marked as non-Unicode-only.  And the other camp who do want to
make sure their stuff supports Unicode will need to write the Unicode
and non-Unicode versions and check to see if the system they are running
on supports Unicode or not.  Whether they check the PHP version number,
or the Unicode switch, or probe directly for the features they need, it
ends up being about the same amount of pain.
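
A rough sketch of how similar the checks end up looking. It uses Python as an analogy, since Python went through the same binary-versus-Unicode string split; the variable names are made up, and the unicode_semantics setting discussed in this thread has no Python counterpart, so only the version check and the feature probe are shown.

```python
import sys

# Option 1: key off the version number.
unicode_by_version = sys.version_info[0] >= 3

# Option 2 (reading a runtime switch) is omitted: there is no such switch here.

# Option 3: probe for the behaviour itself. Text strings are Unicode when
# str is a different type from bytes and one accented letter has length 1.
unicode_by_probe = (str is not bytes) and len("\u00d3") == 1

# Portable code then branches on whichever flag it trusts.
use_unicode_path = unicode_by_version and unicode_by_probe
```

Either way the portable code carries one boolean and two code paths, which is the point being made about the amount of pain being about the same.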

What may be somewhat lost in all this, that I hope nobody here is
forgetting, is that smooth Unicode support is really important.  Being
able to work directly in your native charset with your native strings
without having to deal with iconv and other crap is the goal here.  And
let's also not forget that a lot of code will actually work unchanged in
PHP 6 Unicode-mode and suddenly be Unicode-capable where they weren't
before.  I would love to see all this energy put toward making sure as
much code as possible falls into this category instead of arguing about
where to put the Unicode switch.  It's still a switch whether you put it
in the version number or in the .ini file.  In the version number it is
simply easier for people to ignore from all sides of the discussion
here, but where does that leave us 4 years from now?

Perhaps the real argument here is whether we should be doing Unicode at all?

-Rasmus

-- 
PHP Internals - PHP Runtime Development Mailing List
To unsubscribe, visit: http://www.php.net/unsub.php



Re: [PHP-DEV] POSIX regex

2007-07-18 Thread William A. Rowe, Jr.
Rasmus Lerdorf wrote:
 
 Perhaps the real argument here is whether we should be doing Unicode at all?

I've watched these debates with tremendous interest.  i18n is one of my
pure 'hobbies' (my 'clients' are all quite happy with ISO-8859-1, and
one of my backgrounds is WinNT where everything became unicode within
the OS.)

I'm pondering if utf-8 as the 'default' encoding wouldn't have been a
more effective approach than pure unicode wide-chars, but no matter how
you slice this, there will be several points of pain in the transition.

Rethinking in terms of utf-8 might be an interesting exercise, just to
draw up a comparison of 'what is broken' when sliding between a PHP5 ISO
charset and a PHP6 Unicode or utf-8 charset.

-- 
PHP Internals - PHP Runtime Development Mailing List
To unsubscribe, visit: http://www.php.net/unsub.php



Re: [PHP-DEV] POSIX regex

2007-07-18 Thread Lukas Kahwe Smith

Rasmus Lerdorf wrote:

Derick Rethans wrote:
Regarding the unicode on/off modes, I don't think you put yourself in 
the developer's view at all. Users are not going to be better off having
to deal with both modes.


Have you guys really thought this through?

Let's look at this from two angles.

First, from our perspective maintaining and developing PHP.  Without
the Unicode switch, and as has already been suggested, PHP 5 will never
die.  Anything new in PHP 6 that isn't related to Unicode will be
backported to PHP 5.  Or, a slight variation of that, any developer with
no interest in Unicode will only work on the PHP 5 branch and not bother
worrying about whether it works in PHP 6, forcing others to do that work.
I don't think we have the resources to do this, and I think it is
likely to either create 2 classes of developers and potentially
diverging trees, or it may simply kill off the Unicode effort altogether
if not enough developers bother looking at PHP 6 since PHP 5 will live
forever and is free of all this annoying Unicode stuff that is just too
complicated to deal with.

Second, from the user space PHP developers' perspective.  There are two
groups of those out there.  There is the group that builds apps for
controlled environments.  Yahoo, Facebook, and the hundreds, if not
thousands of smaller companies out there that will define a certain PHP
configuration and code against that.  To them such a switch isn't a big
deal except when it comes to re-using external code.  Which brings us to
the second group which is the group that strives to build portable apps
designed to run on as many unknown PHP configs as possible.  This is the
group that will get hit by this, and here is where we need to figure out
how to cause them the least amount of pain.  They are going to feel some
pain in order to get their heads around Unicode no matter how we handle
this.  For the portion of these folks who don't want to worry about
Unicode at all and they actually have code that does stuff on binary
strings that will break, their stuff just won't work no matter what we
do.  The difference comes down to whether it gets marked as PHP5-only or
it gets marked as non-Unicode-only.  And the other camp who do want to
make sure their stuff supports Unicode will need to write the Unicode
and non-Unicode versions and check to see if the system they are running
on supports Unicode or not.  Whether they check the PHP version number,
or the Unicode switch, or probe directly for the features they need, it
ends up being about the same amount of pain.

What may be somewhat lost in all this, that I hope nobody here is
forgetting, is that smooth Unicode support is really important.  Being
able to work directly in your native charset with your native strings
without having to deal with iconv and other crap is the goal here.  And
let's also not forget that a lot of code will actually work unchanged in
PHP 6 Unicode-mode and suddenly be Unicode-capable where they weren't
before.  I would love to see all this energy put toward making sure as
much code as possible falls into this category instead of arguing about
where to put the Unicode switch.  It's still a switch whether you put it
in the version number or in the .ini file.  In the version number it is
simply easier for people to ignore from all sides of the discussion
here, but where does that leave us 4 years from now?


I guess the question (which I am unable to answer) is whether it's easier
to maintain PHP6 with the switch or to be forced to backport to PHP5
without the switch in PHP6. If it does end up that a lot of devs prefer to
work on PHP5 and as a result PHP6 is left dangling, I wonder if with the
switch things will be any easier, as devs will work on and test only the
non-unicode side of things? I think this was the key point that was
brought up: that it will not be easier, and that handling all the if's in
a single tree was deemed more error prone than having a clean separation.


Also I wonder how a unicode on/off switch will be handled on the
documentation side. It would add more permutations in the documentation
to have the switch. From my understanding the situation is already fairly
non-trivial in how to handle all the version dependent differences.
Philipp, what's your take on this?


regards,
Lukas




Re: [PHP-DEV] POSIX regex

2007-07-18 Thread Jani Taskinen
On Wed, 2007-07-18 at 02:42 -0700, Rasmus Lerdorf wrote:
 What may be somewhat lost in all this, that I hope nobody here is
 forgetting, is that smooth Unicode support is really important.  Being

Smooth it will be only if it's the only option. Otherwise it's just PITA
for both the camps. I'm all for unicode support as long as it's always
there.

 where to put the Unicode switch.  It's still a switch whether you put it
 in the version number or in the .ini file.  In the version number it is
 simply easier for people to ignore from all sides of the discussion
 here, but where does that leave us 4 years from now?

With a bone in hand? ;) Or most likely with actually working PHP with
full Unicode support rather than half-assed one..

Why not just rename the beast to uPHP. :D

--Jani




Re: [PHP-DEV] POSIX regex

2007-07-18 Thread Gaetano Giunta

On Wed, 18 Jul 2007, Zeev Suraski wrote:
...
You know what, I agree.  I wrote something to that effect in my post
from a few minutes ago.  The vast userbase is mostly comprised of
people we hardly even get to see.


Sorry to chime in on this already long thread with my -negative-
commit karma, but I really want to show support for the extremely
sensible and considerate position of Zeev.

I tend to consider myself a not-so-average php user, not because of my
self-assessed superior coding skills, but because most of my
(ex)coworkers, both developers and sysadmins involved in web
applications, have zero interest in any of the php mailing lists,
conferences or similar.
They need to get a job done, have very limited resources for it and
absolutely no time at all to improve their knowledge. They use php
because 1-it's easy, 2-other ppl use it. They can read English, but
with some difficulty, so their main source of information is blogs
from the Italian php community.
They might keep running applications on PHP 4.04 (pl1!!!) because the
original coder left the project years ago, and doing proper QA on an
application you have not written is a huge effort, and migration a
risk they cannot really even assess.

Now, I can sneer at them all I want, the fact remains that they are
part of the user base, and have no less rights than I do to get the
best solution that can be served to them.

And I do not think mindless BC breakage is a thing they like.

imho, a lesson to be learned from the slow transition to php5 is to
really focus on communication. The big changes were written on the
walls, but many small ones were not. And QA is needed almost
exclusively to catch the small ones (for the big ones, you have the
coder fix it upfront).
Some examples of things I stumbled upon include: objects not being
copied on assignment (it was really documented, in the cabinet after
the 'beware of the leopard' sign), *curl_version suddenly returning an
array instead of a string and others...

Of course most users will migrate only when they feel the need for it
anyway, but the more obstacles are put on their path, the slower the
adoption rate will be.

Keeping the 'unicode off' switch is a kind of double-edged sword: it
eases life for people developing for intranets (they can migrate to
php 6 with unicode off and be fine), but might backfire on
framework/library developers, that will have to code for two
environments...
Maybe the only solution is making it easier to run different versions
of php in parallel?

my .2euros
Gaetano
*


Re: [PHP-DEV] POSIX regex

2007-07-18 Thread Jani Taskinen
On Wed, 2007-07-18 at 12:23 +0200, Gaetano Giunta wrote:
 Maybe the only solution is making it easier to run different versions
 of php in parallel?

It's already easy and possible. Please don't start that discussion nor
spread the fud that it isn't.

--Jani






Re: [PHP-DEV] POSIX regex

2007-07-18 Thread Johannes Schlüter
Hi Zeev,

On Wed, 2007-07-18 at 01:58 -0700, Zeev Suraski wrote:
 Regarding the unicode on/off modes, I don't think you put yourself in
 the developer's view at all. Users are not going to be better off having
 to deal with both modes.
 
 Well, I tend to agree with you that they shouldn't have to handle 
 BOTH modes (write code that works with both settings).  But they will 
 definitely be better off if they can choose one of these modes and 
 develop/deploy for it.
 
 For someone for whom PHP 6 is a non-item (no interest in Unicode), 
 moving to PHP 6 and being forced to audit his code will be a 
 completely unreasonable cost of migration.  A clear 'not worth it' situation.

The question here in my opinion is: How much harm should we do to users
who develop new things in order to make lives simpler for those who need
BC?

The first thing I see is: Having these two modes is a pita for everybody
who wants to write portable code. PHP acts differently depending on
that switch, and some parts of PHP work quite differently. Some of these
changes can be worked around quite simply, others not as easily, though
it is still possible (since the engine still knows unicode and you can
still make it all think there's some more unicode in there). But for a new
application it's imo bad to need such compatibility hacks.

If you want clean code you might concentrate on one of these two modes -
but which? The faster or the better? Well, that depends on what
hosters will configure, but how should they know?

For hosters it's hard to decide which road to take. Offer both? Offering
both is, in terms of complexity, the same as hosting PHP 5 and PHP 6, since
unicode.semantics is PHP_INI_SYSTEM, meaning you need independent PHP
instances (FastCGI, individual hosts, whatever). Another possibility is
offering just PHP 6 with unicode.semantics Off. In my opinion a hoster
doing that should not advertise that they offer PHP 6 with that mode off,
since that is only offering half of PHP 6 (namespaces, goto, maybe LSB,
...). Or they can offer PHP 6 + unicode and PHP 5 for BC. To me this feels
like the sanest way in terms of BC: on the one hand you have full BC by
using PHP 5, on the other hand you're offering full PHP 6 for those
who need this feature.

Talking about BC: Except for the unicode stuff PHP 6 will most likely
have around the same amount of BC breaks as PHP 5 had compared to PHP 4
(there are already a few tiny ones, like you can't call your functions
goto anymore and such stuff). PHP 5 offers a compatibility mode for
PHP 4; the benefit of that mode, compared to PHP 6's BC mode, was that
one might change it even at runtime, which might help with the migration
(while making the code ugly, but hopefully such hacks are temporary).

Another argument for that setting I read was performance. I didn't do
proper  benchmarks of the code comparing both modes so I don't know how
relevant the impact is but if performance of the unicode mode really is
a big problem for most users we are really going to have a big problem
since then we have to keep the mode forever and I, who can really live
with using ISO-8859-1, am wondering whether it really makes sense to
change half the engine for a mode which is too slow for most cases and
only needed by a minority of users (some mentioned in these discussions
numbers like 10% unicode mode on, 90% off ...) and whether it won't be
better to concentrate on the intl and mbstring extensions to improve the
tools for the ones needing better support in the area without harming
most users. But well, as said: Here I'm just wondering after reading the
previous discussions.

This all gives me the conclusion that we really should consider removing
the mode, but well, that's my opinion.

   As for ereg - especially in light of the discontinuation of PHP 4 we
   shouldn't even consider removing it in PHP 5.
 
 I don't think anybody wanted to remove it in PHP 5 - just make it
 possible to disable as an extension.
 
 Great, I misunderstood.

This gives me the possibility to come back to the original topic of this
thread, which wasn't about the unicode.semantics mode: Since I think we
should remove that setting I think we should disable ereg with PHP 6
since as far as I know ereg won't work with unicode data. Regular
expressions which won't work on the main data type are pointless in my
opinion.

Besides that there are two other reasons I see:
- ereg functions have been marked as deprecated for ages so users should be
  prepared
- ereg functions aren't binary safe - most cases where I've seen them
  were most likely insecure since people didn't know you can bypass
  ereg-based input checking by inserting null bytes, so removing these
  helps writing more secure code
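
To make the binary-safety point concrete, here is a minimal sketch of the
preg side only. The input string is made up for illustration, and the ereg
half of the comparison is not shown because the extension is not available
on current PHP builds. The check assumes a simple anchored whitelist.

```php
<?php
// Hypothetical input: a valid-looking prefix, a NUL byte, then a payload.
$input = "abc\0<script>";

// preg_match() works on the full length of the subject, so the anchored
// pattern has to match every byte, including the NUL and what follows it.
$ok = preg_match('/^[a-z]+$/', $input) === 1;

var_dump(strlen($input)); // int(12)
var_dump($ok);            // bool(false)
```

A check that stopped reading at the NUL byte would only ever see "abc" and
accept the input, which is the bypass described above.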

In most cases a workaround, by PHP_Compat or something, can be offered
by escaping slashes in the pattern, adding slashes as delimiters and
giving that to preg - this won't work in all cases but I'm sure it works
in most cases.
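
The workaround described above can be sketched roughly as follows. The
helper name ereg_pattern_to_preg() is hypothetical (it is not part of PHP
or PHP_Compat), and the sketch assumes the POSIX pattern does not already
contain backslash-escaped slashes, which is one of the cases where it
would not work.

```php
<?php
// Hypothetical helper: wrap a POSIX ERE pattern so it can be passed to
// preg_match(). Not an existing PHP or PHP_Compat function.
function ereg_pattern_to_preg($pattern, $case_insensitive = false)
{
    // Escape the delimiter so a literal slash cannot end the pattern early.
    $escaped = str_replace('/', '\\/', $pattern);

    // Add slashes as delimiters; the "i" modifier stands in for eregi().
    return '/' . $escaped . '/' . ($case_insensitive ? 'i' : '');
}

$preg = ereg_pattern_to_preg('^[a-z]+/[0-9]+$');

var_dump($preg);                        // string(19) "/^[a-z]+\/[0-9]+$/"
var_dump(preg_match($preg, 'abc/123')); // int(1)
var_dump(preg_match($preg, 'abc-123')); // int(0)
```

Patterns that rely on POSIX-only behaviour would still need manual review;
this only covers the delimiter handling mentioned above.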


Ah, another thing kind of related to this 

Re: [PHP-DEV] POSIX regex

2007-07-18 Thread Lukas Kahwe Smith

Zeev Suraski wrote:

Finally, at the risk of sounding like a broken record, we always need to 
remember that BC breakage accumulates, and it's not binary.  Every 
cleanup we do in PHP 6 will further slow migration, and as Andi pointed 
out a few days ago, things don't look too well as it is.


Agreed, it's not binary, but it's also not the simple addition of all
issues either. The effort does diminish as you can cover multiple BC
breaks in one pass over your code. The key thing that we screwed up
with PHP 5.x was not providing enough documentation on the BC breaks.
Doing this better this time (the migration guides are a good start,
porting some major apps and documenting the issues is another) could
help us ease the transition as well. But as you point out, there is the
fixed overhead of having to do the QA'ing at any rate.


regards,
Lukas




Re: [PHP-DEV] POSIX regex

2007-07-18 Thread Pierre

On 7/18/07, Lukas Kahwe Smith [EMAIL PROTECTED] wrote:

Zeev Suraski wrote:

 Finally, at the risk of sounding like a broken record, we always need to
 remember that BC breakage accumulates, and it's not binary.  Every
 cleanup we do in PHP 6 will further slow migration, and as Andi pointed
 out a few days ago, things don't look too well as it is.

Agreed, it's not binary, but it's also not the simple addition of all
issues either. The effort does diminish as you can cover multiple BC
breaks in one pass over your code. The key thing that we screwed up
with PHP 5.x was not providing enough documentation on the BC breaks.
Doing this better this time (the migration guides are a good start,
porting some major apps and documenting the issues is another) could
help us ease the transition as well. But as you point out, there is the
fixed overhead of having to do the QA'ing at any rate.


What we really screwed up are the breakages _after_ 5.0, between 5.0
and now. Everyone expects changes and breakages between two major
versions, no matter the language.

--Pierre




Re: [PHP-DEV] POSIX regex

2007-07-18 Thread Lukas Kahwe Smith

Pierre wrote:

On 7/18/07, Lukas Kahwe Smith [EMAIL PROTECTED] wrote:

Zeev Suraski wrote:

 Finally, at the risk of sounding like a broken record, we always need to
 remember that BC breakage accumulates, and it's not binary.  Every
 cleanup we do in PHP 6 will further slow migration, and as Andi pointed
 out a few days ago, things don't look too well as it is.

Agreed, it's not binary, but it's also not the simple addition of all
issues either. The effort does diminish as you can cover multiple BC
breaks in one pass over your code. The key thing that we screwed up
with PHP 5.x was not providing enough documentation on the BC breaks.
Doing this better this time (the migration guides are a good start,
porting some major apps and documenting the issues is another) could
help us ease the transition as well. But as you point out, there is the
fixed overhead of having to do the QA'ing at any rate.


What we really screwed up are the breakages _after_ 5.0, between 5.0
and now. Everyone expects changes and breakages between two major
versions, no matter the language.


True that ... the way E_STRICT was handled did not help either. Still 
looking forward to E_DEPRECATED.


regards,
Lukas




Re: [PHP-DEV] POSIX regex

2007-07-18 Thread Lukas Kahwe Smith

Johannes Schlüter wrote:


Ah, another thing kind of related to this thread: We really need a
proper way of having decisions declared as being made. Recently it
happened quite often that many developers thought some decision was
made (for example from reading the Paris meeting notes) and then some
developers come and say there wasn't anything finally decided yet. But
imo it's important to decide some things (like removal of possibly often
used functionality) soon so users can be informed and prepare their
code, and developers here can spend time on these tasks knowing that
they are following decisions. Maybe this should be discussed independently
from this thread - but it's a good example of the need... (while there
might be reasons to change a decision - but that shouldn't happen too
often)


Yeah, I guess I should put higher emphasis on adding links to the todo 
page that reference key mailing list posts.


regards,
Lukas




Re: [PHP-DEV] POSIX regex

2007-07-18 Thread Zeev Suraski

At 04:47 18/07/2007, Lukas Kahwe Smith wrote:

Zeev Suraski wrote:

Finally, at the risk of sounding like a broken record, we always 
need to remember that BC breakage accumulates, and it's not 
binary.  Every cleanup we do in PHP 6 will further slow migration, 
and as Andi pointed out a few days ago, things don't look too well as it is.


Agreed, it's not binary, but it's also not the simple addition of all
issues either. The effort does diminish as you can cover multiple BC
breaks in one pass over your code. The key thing that we screwed up
with PHP 5.x was not providing enough documentation on the BC
breaks. Doing this better this time (the migration guides are a good
start, porting some major apps and documenting the issues is
another) could help us ease the transition as well. But as you point
out, there is the fixed overhead of having to do the QA'ing at any rate.


Well I don't think it really diminishes, but I agree that 1+1 is 
maybe 1.9 and not 2.  On the other hand, if you remember that 
perception is everything (or at least very important), 1+1 can easily 
be perceived as 3, and in a negative sense.


Zeev 





Re: [PHP-DEV] POSIX regex

2007-07-18 Thread Pierre

On 7/18/07, Zeev Suraski [EMAIL PROTECTED] wrote:


Well I don't think it really diminishes, but I agree that 1+1 is
maybe 1.9 and not 2.  On the other hand, if you remember that
perception is everything (or at least very important), 1+1 can easily
be perceived as 3, and in a negative sense.


Exactly. And many people lost much more time hunting down smaller
things like the "Indirect modification of overloaded property" notice or
the numerous other annoying (but sometimes required) changes. And
those mean 1+1+1+1=2^32/F* php for most of them.

A dropped extension, function or feature, when known (and done) soon
enough, is far easier to deal with (planning is possible, migration, etc.).

--Pierre




RE: [PHP-DEV] POSIX regex

2007-07-18 Thread Andi Gutmans
Functions would work properly with Unicode, but you would explicitly
create Unicode strings, e.g. u"foobar". This is not an uncommon practice
and many other languages actually go down this route, incl. Python and
various versions of C++ frameworks.

Andi 

 -Original Message-
 From: Derick Rethans [mailto:[EMAIL PROTECTED] 
 Sent: Wednesday, July 18, 2007 1:07 AM
 To: Stas Malyshev
 Cc: Lukas Kahwe Smith; Andi Gutmans; Ilia Alshanetsky; 
 [EMAIL PROTECTED]; internals@lists.php.net
 Subject: Re: [PHP-DEV] POSIX regex
 
 On Tue, 17 Jul 2007, Stanislav Malyshev wrote:
 
   that would actually benefit quite a bit from unicode 
 support, but I 
   guess you are talking about porting with unicode==off, right?
  
  unicode=off doesn't mean no unicode support, btw.
 
 Of course that's what it means, as none of the string 
 functions work properly with unicode if you turn it off. And 
 that's just the whole selling point of Unicode support.
 
 Derick
 




Re: [PHP-DEV] POSIX regex

2007-07-18 Thread chris#



On Tue, 17 Jul 2007 08:47:42 +0200, Lukas Kahwe Smith [EMAIL PROTECTED] wrote:
 Larry Garfield wrote:
 Non-core PHP developer speaking, so read with that in mind:

 One of the things that held back PHP 5 adoption for so long, IMO, is the
 large amount of FUD that surrounded it.  Even now, 3 years after it was
 released, I keep seeing the argument that "I can't drop PHP 4 and use
 PHP 5, then I have to rewrite *everything* to use objects.  I hate
 objects."  That is, of course, completely untrue, and if you're paying
 even moderate attention it's not at all difficult to write code that runs
 just fine in both PHP 4 and PHP 5, with and without register_globals and
 magic_quotes.  All it takes is a little forethought and not letting
 yourself be sloppy.
 
 I have seen little of that. But I have seen issues due to array_merge()
 changes. But more importantly our handling of E_STRICT has made it
 difficult for many.
 
 Writing PHP 5/6 compatible code needs to be just as easy, if not easier, in
 addition to having better marketing to head off the FUD.  Taking a stance
 of "you'll have to start from scratch if you want to be PHP 6 compatible,
 oh well" is an absolutely sure-fire way to guarantee that no one uses
 PHP 6 for anything except niche markets.
 
 I see it more as a question of being open about what's going on. If we
 would have had the upgrading guides from the beginning of 5.0.z, I think
 things would have been easier.
I'm /quite/ sure you are correct here. As memory serves, ppl were polarized for
or against almost immediately when PHP5 came out. This alone is probably the
single most important ingredient to produce a FUD factory.
 The fact that our x.0.z releases are not
 particularly popular is another issue.
 
 I think the biggest challenge PHP5 faced however was that it was mainly
 about making developers life easier, since PHP4 already enables you to
 do pretty much any kind of web site if you are willing to put in the
 required time. Native unicode to me feels a bit more like adding
 something that was not really doable before (sure you can but that would
 mean writing every lib yourself, so the time required is beyond the vast
 majority of dev teams). Then again its not like all developers will jump
 on unicode the second its released (mainly because not all end users are
 asking for this). But the point is, getting very high adoption rates for
 new PHP releases will always be hard.
Just wondering; would it make /any/ sense to run a survey/poll on the PHP site,
asking what feature/capability/etc. they would most like to see in future PHP
versions? This /might/ provide some insight for the developers to see if they
are at all in line with the developers' goals/roadmap. Point being; it may help
future versions avoid the /underwhelming/ reception that PHP5 received. And
better; it might help future versions achieve the same success that PHP4 did.

Just a thought.
 
 regards,
 Lukas
 



Re: [PHP-DEV] POSIX regex

2007-07-18 Thread Larry Garfield
On Wednesday 18 July 2007, Rasmus Lerdorf wrote:

 Second, from the user space PHP developers' perspective.  There are two
 groups of those out there.  There is the group that builds apps for
 controlled environments.  Yahoo, Facebook, and the hundreds, if not
 thousands of smaller companies out there that will define a certain PHP
 configuration and code against that.  To them such a switch isn't a big
 deal except when it comes to re-using external code.  Which bring us to
 the second group which is the group that strives to build portable apps
 designed to run on as many unknown PHP configs as possible.  This is the
 group that will get hit by this, and here is where we need to figure out
 how to cause them the least amount of pain.  They are going to feel some
 pain in order to get their heads around Unicode no matter how we handle
 this.  For the portion of these folks who don't want to worry about
 Unicode at all and they actually have code that does stuff on binary
 strings that will break, their stuff just won't work no matter what we
 do.  The difference comes down to whether it gets marked as PHP5-only or
 it gets marked as non-Unicode-only.  And the other camp who do want to
 make sure their stuff supports Unicode will need to write the Unicode
 and non-Unicode versions and check to see if the system they are running
 on supports Unicode or not.  Whether they check the PHP version number,
 or the Unicode switch, or probe directly for the features they need, it
 ends up being about the same amount of pain.

Disclaimer again: PHP commit karma of 0, PHP development karma of some
positive integer, PHP support karma of "depends if you like gophp5.org or
not". :-)

Permit me to offer a concrete example.  I am a Drupal developer; that is, I 
work on the Drupal CMS core and also get paid to build sites with Drupal 
professionally.  Drupal has made a huge push for internationalization in the 
past year and a half or so.  It's currently UTF-8 through and through, 
complete with user-space UTF-8-safe implementations of various string 
manipulation functions.  Native Unicode support would be awesome.

Drupal is used by a huge number of people on dedicated boxes where they 
control the environment.  It's also used by an even huger number of people on 
shared hosts where they get almost no control over the environment.  Right 
now it handles both quite well, under PHP 4.3.6-5.2.3.  (PHP 4 to be dropped 
in version 7.)  

Now, when PHP 6 is released we are going to want to be able to run in PHP 6, 
and likely at some point in the future switch to PHP 6 only just as we're now 
(finally) moving to PHP 5 only.  That means that, for a time, we'll have to 
be able to run with the same code base on PHP 5 and PHP 6.  

A great many people will want to run it on a PHP 6 unicode=on server, so they 
can leverage native Unicode support.  A great many people will want to run it 
on shared hosts, which means either PHP 5 or PHP 6 unicode=off (because I 
don't expect shared hosts to default to unicode=on any more readily than they 
accepted the default of register_globals=off).  And unlike register_globals, 
it won't be something we can change in the 

So there will be a prolonged period where we will have to be able to run on 
PHP 5.2, PHP 6 unicode=off, and PHP 6 unicode=on, even if we don't explicitly 
use PHP 6-only features yet.  Simply excluding one of those three completely 
will not be a viable option.  Maintaining two or three separate trees is also 
not an option.  We simply don't have anywhere close to the resources to do 
that.  (Plus Drupal is a plugin-based system, and asking plugin authors to do 
that is completely unreasonable.)
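
As a rough illustration of what "the same code base on all three" tends to
look like in practice, here is a minimal sketch that probes the environment
instead of assuming it. The helper names are hypothetical, and the
'unicode.semantics' setting is the one named elsewhere in this thread; on a
build that does not know that setting, ini_get() simply returns false.

```php
<?php
// Hypothetical helper: report which string-handling environment we run in.
function detect_string_mode()
{
    // Assumption: a PHP 6 build would expose the switch under this name.
    $setting = ini_get('unicode.semantics');
    if ($setting !== false
        && in_array(strtolower($setting), array('1', 'on'), true)) {
        return 'native-unicode';
    }
    if (function_exists('mb_strlen')) {
        return 'mbstring';
    }
    return 'bytes';
}

// Hypothetical helper: character count of a UTF-8 string in any mode.
function utf8_length($string)
{
    switch (detect_string_mode()) {
        case 'native-unicode':
            // Assumption: strlen() counts characters in this mode.
            return strlen($string);
        case 'mbstring':
            return mb_strlen($string, 'UTF-8');
        default:
            // Drop UTF-8 continuation bytes (10xxxxxx) and count the rest.
            return strlen(preg_replace('/[\x80-\xBF]/', '', $string));
    }
}

var_dump(utf8_length("h\xC3\xA9llo")); // int(5)
```

Every string helper in the code base ends up needing a branch like this,
which is the maintenance cost being asked about.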

So, just how much hair should we plan to pull out in order to make that 
happen?  That's the million dollar question for me, and for, I suspect, most 
of the open source application developers out there.  How can we minimize 
that hair loss?  

Right now I really don't know what the answer is.  That's why I'm asking the 
question, because as C is really not a comfortable language for me anymore I 
have little ability to affect it directly.

-- 
Larry Garfield  AIM: LOLG42
[EMAIL PROTECTED]   ICQ: 6817012

If nature has made any one thing less susceptible than all others of 
exclusive property, it is the action of the thinking power called an idea, 
which an individual may exclusively possess as long as he keeps it to 
himself; but the moment it is divulged, it forces itself into the possession 
of every one, and the receiver cannot dispossess himself of it.  -- Thomas 
Jefferson




Re: [PHP-DEV] POSIX regex

2007-07-17 Thread Lukas Kahwe Smith

Andi Gutmans wrote:


There are clear things we want to change (like register_globals) because
we believe that ultimately they have a significant benefit to our users
with controllable downside (there is an easy one line workaround which
we can document for people to get their old apps to work). There are
other areas where breaking BC makes sense. But saying we should just
break it across the board and not even consider having a good upgrade
path for our users is unreasonable. I believe we can have a very good
PHP 6, which is pretty much in sync with many of your feelings, but that
provides a well documented and reasonable upgrade path (unlike VB ->
VB.NET).


I never said we should break BC just for the hell of it. The goal must
be that PHP6 feels and behaves like PHP. It's not about hijacking PHP
to come up with the language we all wanted instead.



So let's not oversimplify this situation. We have to continue to make
trade-offs.


Sure, but you are suggesting to delay decisions indefinitely. Either you 
are saying this because you already decided that you don't want this 
change, or you are accepting that our users will be unable to prepare 
themselves for what happens in the future. This of course will make it 
that much harder for them to take the plunge into PHP6.



Btw, one of PHP's strengths has been in high performance sites and with
a Unicode=on only mode this would take quite a hit (but it's not the
only reason why I think we need choice). In any case, I think on this
question it does make sense that we start making informed decisions by
understanding the migration path better, as opposed to just basing
decisions on gut feelings. Maybe that kind of learning experience will
prove me wrong (which may be so).


I have not seen any proposed way of finding out this migration path
besides "let's wait". "Let's wait" is not the answer. What I asked for was
exactly a decision on how far we are willing to go with the breakage and
more importantly the fundamental decision about how we approach unicode
in PHP6. The on/off switch is not something that makes sense to delay
until forever. It's a big decision and once it's decided other things will
become much easier (like PHP6 development or deciding the impact of
other potential BC breaks).


regards,
Lukas




Re: [PHP-DEV] POSIX regex

2007-07-17 Thread Lukas Kahwe Smith

Larry Garfield wrote:

Non-core PHP developer speaking, so read with that in mind:

One of the things that held back PHP 5 adoption for so long, IMO, is the large
amount of FUD that surrounded it.  Even now, 3 years after it was released, I
keep seeing the argument that "I can't drop PHP 4 and use PHP 5, then I have
to rewrite *everything* to use objects.  I hate objects."  That is, of
course, completely untrue, and if you're paying even moderate attention it's
not at all difficult to write code that runs just fine in both PHP 4 and PHP
5, with and without register_globals and magic_quotes.  All it takes is a
little forethought and not letting yourself be sloppy.


I have seen little of that. But I have seen issues due to array_merge() 
changes. But more importantly our handling of E_STRICT has made it 
difficult for many.


Writing PHP 5/6 compatible code needs to be just as easy, if not easier, in 
addition to having better marketing to head off the FUD.  Taking a stance 
of "you'll have to start from scratch if you want to be PHP 6 compatible, oh 
well" is an absolutely sure-fire way to guarantee that no one uses PHP 6 for 
anything except niche markets.  


I see it more as a question of being open about what's going on. If we 
had had the upgrading guides from the beginning of 5.0.z, I think 
things would have been easier. The fact that our x.0.z releases are not 
particularly popular is another issue.


I think the biggest challenge PHP5 faced, however, was that it was mainly 
about making developers' lives easier, since PHP4 already enables you to 
do pretty much any kind of web site if you are willing to put in the 
required time. Native unicode to me feels a bit more like adding 
something that was not really doable before (sure you can, but that would 
mean writing every lib yourself, so the time required is beyond the vast 
majority of dev teams). Then again, it's not like all developers will jump 
on unicode the second it's released (mainly because not all end users are 
asking for this). But the point is, getting very high adoption rates for 
new PHP releases will always be hard.


regards,
Lukas




Re: [PHP-DEV] POSIX regex

2007-07-17 Thread Pierre

Hi Andi,

On 7/16/07, Andi Gutmans [EMAIL PROTECTED] wrote:

I disagree with this view of the world.


Well, we all seem to agree on this view, but let's forget this
insignificant fact :)


It doesn't have to be a complete either/or decision and labeling
everything as a "BC hacks" decision is an inaccurate and populist way
of building FUD.


You persistently tell me (I say "me" as I'm not in a position to
speak for the other developers) that my way is populist, a source of
FUD, or whatever else comes to your mind at a given moment. Fine,
if it helps you make your point. However, may I suggest that you
seriously consider the (legitimate) voices outside your world (no
matter how huge it is)? It would be much appreciated.


There are
other areas where breaking BC makes sense. But saying we should just
break it across the board and not even consider having a good upgrade
path for our users is unreasonable.


From what I see in the various code I can fgrep, pcre is already used
much more than ereg. Migrating from ereg to pcre is a very small task
and it only brings advantages (cache, unicode support if
required,...). Ironically, a little pcre based script or grep should
do the job, if any regexp fan likes to play with that :)
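
As a rough illustration of the kind of small regex-based script meant
here, the following Python sketch rewrites simple ereg()/eregi() calls
into preg_match() calls. It is only a sketch under assumptions that are
not from this thread: the helper name ereg_to_preg is made up, it only
handles calls whose pattern is a quoted string literal, and it assumes
the POSIX pattern is also valid PCRE once wrapped in '/' delimiters.
Every rewritten call would still need a review by hand.

```python
import re

# Matches ereg( or eregi( followed by a quoted string literal.
# Group 1: function name, group 2: quote character, group 3: pattern body.
_EREG_CALL = re.compile(r"""\b(eregi?)\s*\(\s*(['"])((?:\\.|(?!\2).)*)\2""")


def ereg_to_preg(php_source: str) -> str:
    """Rewrite literal-pattern ereg()/eregi() calls to preg_match().

    Hypothetical helper for illustration only; calls whose pattern is a
    variable or an expression are left untouched.
    """

    def rewrite(match: re.Match) -> str:
        func, quote, pattern = match.groups()
        # '/' becomes the PCRE delimiter, so escape any unescaped '/'.
        escaped = re.sub(r"(?<!\\)/", r"\\/", pattern)
        # eregi() is case-insensitive; PCRE expresses that with the 'i' flag.
        flags = "i" if func == "eregi" else ""
        return f"preg_match({quote}/{escaped}/{flags}{quote}"

    return _EREG_CALL.sub(rewrite, php_source)
```

For example, ereg_to_preg("if (ereg('^[a-z]+$', $name)) {") returns
"if (preg_match('/^[a-z]+$/', $name)) {". Functions such as mb_ereg()
or my_ereg() are not touched because of the word boundary in the
pattern. Call sites that use the return value for more than a
true/false check should be looked at separately, since the two
functions do not return the same values.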

Other changes in the engine will bring much more trouble (because
they are not obvious), just like they did in the past between two
minor PHP versions.


I believe we can have a very good
PHP 6, which is pretty much in sync with many of your feelings, but that
provides a well documented and reasonable upgrade path (unlike VB -
VB.NET).


That is comparing apples and oranges. As far as I remember, VB.net was not
really planned; they only realized how much their users liked VB and
why they would not move to c* or whatever else :)


If you want to break everything and anything


It is not about breaking everything just for the fun of it but about
creating a sane base for building portable and maintainable applications
and libraries.


and don't want to be
limited whatsoever by our huge user-base then maybe you should write a
new language which fits exactly what your preference would be. The fact
is though, that even after these discussions and the Paris discussions,
the bulk of the idiosyncrasies which make PHP what it is today will
remain (as per agreement).


You seem to have a clear view of what PHP6 should be, so why don't you
publish it (we have a wiki for this exact purpose) and let's see what we
(as PHP internals developers) think about it, the sooner the better
(and once and for all).

Waiting indefinitely is not a solution, and neither is making quick
decisions a week before the final release. Making decisions early will let
us adapt them or change them if necessary. Our users will have
time to think about the consequences and tell us their needs or fears.


So let's not oversimplify this situation. We have to continue to make
trade-offs.


Let's not complicate it either.


Btw, one of PHP's strengths has been in high performance sites and with
a Unicode=on only mode this would take quite a hit (but it's not the
only reason why I think we need choice).



 In any case, I think on this
question it does make sense that we start making informed decisions by
understanding the migration path better, as opposed to just basing
decisions on gut feelings. Maybe that kind of learning experience will
prove me wrong (which may be so).


At the risk of repeating myself, we already learned this from PHP5.
Nothing can make users migrate more quickly than they want
(read: more quickly than they need) unless the benefits are enormous,
and that's not the case (they are, but not for a large number of users).

We can keep dreaming about a short migration path for PHP6 or we can
simply make the right decisions. Saying that we are not informed is a
poor excuse to delay any critical decisions. We are informed, we use
PHP daily and we have to deal every day with the issues we are trying to
solve now.

Cheers,
--Pierre




RE: [PHP-DEV] POSIX regex

2007-07-17 Thread Andi Gutmans
A few months ago we agreed that we will give our users the choice of
both modes. The burden of maintenance has mainly been on us btw as the
majority of the differences here are in the Zend Engine and the
extensions don't have as much work associated with them. 

Here's my proposed way of figuring how to make migration easier. Port
the following applications to PHP 6 and let's see what we can learn from
it:
- mediaWiki
- SugarCRM
- Drupal
- Wordpress

I don't think we can have more of a reality check than actually going
through this exercise and understanding the issues. As I mentioned from
the small work we have done up to now it seems like there really is no
migration path except for applications to be almost completely
rewritten when unicode_semantics=on. I don't think this is a feasible
way to go. But if volunteers can work on this porting and it allows us
to fix some things (if they are fixable) then that would change the
situation.

I believe that people who actually do this exercise and want to have a
migration path will understand that there's no other way except to
support unicode_semantics=off. Btw, most languages deliver Unicode in
this way and it works pretty well.

Andi

 -Original Message-
 From: Lukas Kahwe Smith [mailto:[EMAIL PROTECTED] 
 Sent: Monday, July 16, 2007 11:40 PM
 To: Andi Gutmans
 Cc: Ilia Alshanetsky; [EMAIL PROTECTED]; internals@lists.php.net
 Subject: Re: [PHP-DEV] POSIX regex
 




RE: [PHP-DEV] POSIX regex

2007-07-17 Thread Jani Taskinen
Just FYI: I did not agree with that choice. And IIRC, neither did
several other people here.

--Jani

On Tue, 2007-07-17 at 07:27 -0700, Andi Gutmans wrote:
 A few months ago we agreed that we will give our users the choice of
 both modes. The burden of maintenance has mainly been on us btw as the
 majority of the differences here are in the Zend Engine and the
 extensions don't have as much work associated with them. 
 




RE: [PHP-DEV] POSIX regex

2007-07-17 Thread Andi Gutmans
Hmm, I don't quite understand what part "bad code" vs. "good code" plays
here. Wordpress is one of the most popular applications out there so it's
got huge value to our community. I bet there's a huge number of PHP
applications whose source code is of the same quality or worse. Anyway,
the issues I have seen would also be relevant to what you call "good
code" but again, when it comes to compatibility, I don't quite know why
that will play a big role.

I am talking about porting to both unicode_semantics=off and on. This
will give us a good understanding of the difference between the modes and
where we're at. I bet most people who are voicing their opinions have
not actually tried to write a sizeable application with PHP 6 and also
tried to run an existing one on PHP 6 (unicode_semantics=on). I can
also do some performance testing in our performance lab once we have
both working. I haven't yet mentioned how companies building
high-performance sites would probably take a huge hit by moving to
Unicode, to the point where I think they will not adopt it for a long
time and then will be faced with the choice to migrate off of PHP or
bite the bullet. For some of the companies I know that have huge server
farms, adding 50% capacity (or whatever the number is) could be a good
enough reason to migrate off, as they are paying huge fees for the
servers...

Andi

 -Original Message-
 From: Lukas Kahwe Smith [mailto:[EMAIL PROTECTED] 
 Sent: Tuesday, July 17, 2007 7:50 AM
 To: Andi Gutmans
 Cc: Ilia Alshanetsky; [EMAIL PROTECTED]; internals@lists.php.net
 Subject: Re: [PHP-DEV] POSIX regex
 




Re: [PHP-DEV] POSIX regex

2007-07-17 Thread Lukas Kahwe Smith

Andi Gutmans wrote:


Here's my proposed way of figuring how to make migration easier. Port
the following applications to PHP 6 and let's see what we can learn from
it:
- mediaWiki
- SugarCRM
- Drupal
- Wordpress


IIRC Wordpress is a good example of bad source code to fix. Drupal would 
be a good example of a PHP4 style fairly procedural app to port. 
mediaWiki also seems like a worthy cause since it's one of those apps 
that would actually benefit quite a bit from unicode support, but I 
guess you are talking about porting with unicode==off, right?


SugarCRM would be a good example of gigantic, horrible, horrible source 
code to fix and I am not sure if I would put it on the list considering 
the limited open source release they do. I think it would be cool if 
they would do it themselves or sponsor whoever is doing it.


We also have an SoC project where someone is implementing a PHP6 version 
of Jaws.


regards,
Lukas




RE: [PHP-DEV] POSIX regex

2007-07-17 Thread Andi Gutmans
I thought you were retired at the time... 

 -Original Message-
 From: Jani Taskinen [mailto:[EMAIL PROTECTED] 
 Sent: Tuesday, July 17, 2007 7:37 AM
 To: Andi Gutmans
 Cc: internals@lists.php.net
 Subject: RE: [PHP-DEV] POSIX regex
 
 Just FYI: I did not agree with that choice. And IIRC, neither 
 did several other people here.
 
 --Jani
 




Re: [PHP-DEV] POSIX regex

2007-07-17 Thread Lukas Kahwe Smith

Andi Gutmans wrote:

Hmm, I don't quite understand what part "bad code" vs. "good code" plays
here. Wordpress is one of the most popular applications out there so it's
got huge value to our community. I bet there's a huge number of PHP
applications whose source code is of the same quality or worse. Anyway,
the issues I have seen would also be relevant to what you call "good
code" but again, when it comes to compatibility, I don't quite know why
that will play a big role.


"Bad" in the sense that it's messy. But what I was getting at is that I find 
your proposed list good, with the exception of SugarCRM. It might be good 
to also include a PHP5-only app, so that we have a good idea of how 
messy code, fairly procedural code, E_STRICT compliant code etc. ports to 
PHP6 unicode==off.



I am talking about porting to both unicode_semantics=off and on. This
will give us a good understanding of the difference of the modes and
where we're at. I bet most people who are voicing their opinions have
actually not tried to write a sizeable application with PHP 6 and also
tried to run an existing one on PHP 6 
(unicode_semantics=on). I can also do some performance testing in our


ok .. this makes this quite a large undertaking indeed.

regards,
Lukas




Re: [PHP-DEV] POSIX regex

2007-07-17 Thread Pierre

On 7/17/07, Andi Gutmans [EMAIL PROTECTED] wrote:

Hmm, I don't quite understand what part "bad code" vs. "good code" plays
here. Wordpress is one of the most popular applications out there so it's
got huge value to our community. I bet there's a huge number of PHP
applications whose source code is of the same quality or worse. Anyway,
the issues I have seen would also be relevant to what you call "good
code" but again, when it comes to compatibility, I don't quite know why
that will play a big role.


Using PHP4 as a base to test the compatibility of PHP6 is a bad idea.
The entry point should be PHP5+ (even if the troubles begin between
5.1 and 5.2).

Having apps running on 5.2 with E_STRICT without notices would be a
good indicator of how they will work with php6 without unicode (or
php 5.3 for php6/Off  and php6 with unicode only).


I am talking about porting to both unicode_semantics=off and on. This
will give us a good understanding of the difference of the modes and
where we're at. I bet most people who are voicing their opinions have
actually not tried to write a sizeable application with PHP 6 and also
tried to run an existing one on PHP 6  (unicode_semantics=on).


I did. And please (for God's sake...), can you stop making bad
assumptions about what others know or don't?

I did it with all my apps, and I'm well aware of the work I will need
to do to port them. But this work is required as long as I'm interested
in Unicode. Unicode off? No interest, sorry; I do not care about
namespaces for my existing apps.

Don't get me wrong: I love them but I don't consider this feature
critical for my _existing_ applications. They have worked without it for
years, and they will continue to work without it for a couple more years.
Using namespaces will require more work anyway.


I can also do some performance testing in our
performance lab once we have both working. I haven't yet mentioned how
companies building high-performance sites would probably take a huge hit
by moving to Unicode to the point where I think they will not adopt for
a long time and then will be faced with the choice to migrate off of PHP
or bite the bullet. With some of the companies I know that have huge
server farms adding 50% capacity (or whatever the number is) could be a
good enough reason to migrate off as they are paying huge fees for the
servers...


50% increase sounds off base. But I did not bench php6 yet. When all
the new features are implemented, it will make more sense to work on
the performance problem. For now, it is simply premature.

Regards,
--Pierre




Re: [PHP-DEV] POSIX regex

2007-07-17 Thread Pierre

On 7/17/07, Andi Gutmans [EMAIL PROTECTED] wrote:

I thought you were retired at the time...


Others were not. Some others were not even present. And those who were
present seem to have different interpretations of the decisions. I
also have to say that this meeting was held when we were not actually
informed.

--Pierre




Re: [PHP-DEV] POSIX regex

2007-07-17 Thread Stanislav Malyshev
that would actually benefit quite a bit from unicode support, but I 
guess you are talking about porting with unicode==off, right?


unicode=off doesn't mean no unicode support, btw.
--
Stanislav Malyshev, Zend Software Architect
[EMAIL PROTECTED]   http://www.zend.com/
(408)253-8829   MSN: [EMAIL PROTECTED]




Re: [PHP-DEV] POSIX regex

2007-07-17 Thread Jani Taskinen

Pierre wrote:

50% increase sounds off base. But I did not bench php6 yet. When all
the new features are implemented, it will make more sense to work on
the performance problem. For now, it is simply premature.


If Moore's law stands for the coming years, this argument is moot anyway.
By the time PHP 6 is out the door, any performance issues are insignificant. :)

And by the time people actually start using PHP 6, it's probably already antique 
tech anyway... (around 2015 or so) :D


--Jani




Re: [PHP-DEV] POSIX regex

2007-07-17 Thread Tomas Kuliavas
 50% increase sounds off base. But I did not bench php6 yet. When all
 the new features are implemented, it will make more sense to work on
 the performance problem. For now, it is simply premature.

 If Moore's law stands for the coming years, this argument is moot anyway.
 By the time PHP 6 is out the door, any performance issues are
 insignificant. :)

If you have a setup with 10 machines and the new interpreter works 10%
faster, you can serve the same number of users with 9 machines. Plus,
Moore's law is about the number of transistors, not about performance or
power consumption.
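
The arithmetic behind this kind of estimate can be sketched as
follows. This is an illustration only: the function name and the idea
of expressing the change as a per-request CPU cost percentage are
assumptions, and "10% faster" is read here as "each request costs 90%
of what it did before". The same helper shows what the 50% figure
mentioned earlier in the thread would mean for a 10-machine setup.

```python
def machines_needed(current_machines: int, cost_percent: int) -> int:
    """Machines needed for the same load when each request costs
    cost_percent % of its previous CPU time, rounded up to whole machines.

    Hypothetical helper for illustration; it ignores everything except
    CPU time per request.
    """
    # Integer ceiling division avoids floating-point rounding surprises.
    return -(-current_machines * cost_percent // 100)


print(machines_needed(10, 90))   # each request 10% cheaper
print(machines_needed(10, 150))  # each request 50% more expensive
```

Under these assumptions the first call gives 9 machines and the second
gives 15. If "10% faster" is instead read as 10% more throughput per
machine, the load needs 10 / 1.1, or about 9.1 machines, which still
rounds up to 10.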

-- 
Tomas




Re: [PHP-DEV] POSIX regex

2007-07-17 Thread Jani Taskinen

Nitpicking, are we? :)

Tomas Kuliavas wrote:

50% increase sounds off base. But I did not bench php6 yet. When all
the new features are implemented, it will make more sense to work on
the performance problem. For now, it is simply premature.

If Moore's law stands for the coming years, this argument is moot anyway.
By the time PHP 6 is out the door, any performance issues are
insignificant. :)


If you have a setup with 10 machines and the new interpreter works 10%
faster, you can serve the same number of users with 9 machines. Plus,
Moore's law is about the number of transistors, not about performance or
power consumption.






Re: [PHP-DEV] POSIX regex

2007-07-17 Thread Pierre

On 7/17/07, Tomas Kuliavas [EMAIL PROTECTED] wrote:

 50% increase sounds off base. But I did not bench php6 yet. When all
 the new features are implemented, it will make more sense to work on
 the performance problem. For now, it is simply premature.

 If Moore's law stands for the coming years, this argument is moot anyway.
 By the time PHP 6 is out the door, any performance issues are
 insignificant. :)

If you have a setup with 10 machines and the new interpreter works 10%
faster, you can serve the same number of users with 9 machines. Plus,
Moore's law is about the number of transistors, not about performance or
power consumption.


Three cores in one processor consume less power than three separate
processors. More CPUs in one host will also be faster than many hosts
(processing power). Sorry, but Jani's reference to Moore is correct.
But that's definitely not the topic :)




Re: [PHP-DEV] POSIX regex

2007-07-17 Thread Richard Lynch
On Mon, July 16, 2007 7:47 am, Jani Taskinen wrote:
 I have moved the POSIX regex dependent functions to ext/ereg/
 extension.

 Now only places using the POSIX regex functions (ext/ereg/ excluded)
 are
 ext/standard/browscap.c and ext/pgsql/pgsql.c.

I took a brief look at the pgsql.c stuff, and it looks like it's all
fairly straight-forward to alter to PCRE instead of POSIX, and it's
all localized to this function:
http://lxr.php.net/ident?i=php_pgsql_convert_match

Am I under-estimating the problem?

Or is it actually possible that *I* could fix this and contribute
something useful for once?

Is anybody else already on it?

Cuz I'm gonna go download CVS and see if I can't submit a patch...

[be afraid, be very afraid]




Re: [PHP-DEV] POSIX regex

2007-07-17 Thread Jani Taskinen

Richard Lynch kirjoitti:

I took a brief look at the pgsql.c stuff, and it looks like it's all
fairly straight-forward to alter to PCRE instead of POSIX, and it's
all localized to this function:
http://lxr.php.net/ident?i=php_pgsql_convert_match

Am I under-estimating the problem?


Probably not.


Is anybody else already on it?


AFAIK, no. Feel free. I hope you have pgsql in use and you can test it too. :)

--Jani




Re: [PHP-DEV] POSIX regex

2007-07-17 Thread Richard Lynch
On Tue, July 17, 2007 4:29 pm, Jani Taskinen wrote:
 Richard Lynch kirjoitti:
 I took a brief look at the pgsql.c stuff, and it looks like it's all
 fairly straight-forward to alter to PCRE instead of POSIX, and it's
 all localized to this function:
 http://lxr.php.net/ident?i=php_pgsql_convert_match

 Am I under-estimating the problem?

 Probably not.

 Is anybody else already on it?

 AFAIK, no. Feel free. I hope you have pgsql in use and you can test it
 too. :)

Yes and yes.

Errr, I guess I'd better make that Yes and I'll try :-)

I use PostgreSQL a lot, actually.




Re: [PHP-DEV] POSIX regex

2007-07-17 Thread Zeev Suraski

At 00:21 17/07/2007, Pierre wrote:

Hi Andi,

On 7/16/07, Andi Gutmans [EMAIL PROTECTED] wrote:

I disagree with this view of the world.


Well, we all seem to agree on this view, but let's forget this
insignificant fact :)


Pierre,

I wanted to send my 2c even though I'm not really involved in 
internals@ any longer - because in reality it doesn't really have 
much to do with such decisions.  internals@ makes decisions that
affect the entire PHP userbase.


We all need to remember that the people on this mailing list are not 
close to something that represents the userbase.  We do have some 
very opinionated people on this list, some of them with a lot of 
commit-karma - which are not very open to feedback from regular 
users.  I'm not saying I represent the PHP userbase, and I don't 
think Andi is saying this either - but both of us try to take the end
user's view when we think about stuff like this, as opposed to the
internals@ PHP developer view.  I would go as far as saying that I
think we do it (as well as some others, like Rasmus) more so than 
some others on this list.


For that reason I suspect that if you moved the discussion to, say, 
php-general - you'd see a much more balanced view of the 
world.  Unfortunately it will probably not be very 
manageable.  Something more practical would be trying to think about 
things from the end user's perspective as opposed to our perspective
as the developers and maintainers of the language.


Finally, at the risk of sounding like a broken record, we always need 
to remember that BC breakage accumulates, and it's not binary.  Every 
cleanup we do in PHP 6 will further slow migration, and as Andi 
pointed out a few days ago, things don't look too well as it is.


As for ereg - especially in light of the discontinuation of PHP 4 we 
shouldn't even consider removing it in PHP 5.  I agree with Andi that 
I'm not sure it's a good idea for PHP 6 either, but I'm not sure it 
isn't either.  As long as it's easy enough to turn it back on (i.e. 
have it bundled but disabled) I think it's not unreasonable.


Zeev




RE: [PHP-DEV] POSIX regex

2007-07-16 Thread Andi Gutmans
Why move it to PECL? I agree that PCRE is the preferred way but not
having ereg() will break a huge amount of applications for very little
gain.

We might possibly want to consider disabling by default but not having
it in the default package doesn't make real sense.

Trying to do browscap.c and pgsql.c with PCRE sounds right (if it's
possible which it probably is).
Andi 

 -Original Message-
 From: Jani Taskinen [mailto:[EMAIL PROTECTED]
 Sent: Monday, July 16, 2007 5:48 AM
 To: internals@lists.php.net
 Subject: [PHP-DEV] POSIX regex
 
 I have moved the POSIX regex dependent functions to ext/ereg/
 extension.
 
 Now only places using the POSIX regex functions (ext/ereg/
 excluded) are ext/standard/browscap.c and ext/pgsql/pgsql.c.
 
 So what to do with these 2 places using the POSIX stuff?
 Convert them to use PCRE functions or enable PCRE to be built with the
 POSIX compat functions?
 
 ext/ereg/ is going to go to PECL anyway..
 
 --Jani
 
 
 




Re: [PHP-DEV] POSIX regex

2007-07-16 Thread Ilia Alshanetsky


On 16-Jul-07, at 9:46 AM, Andi Gutmans wrote:


Why move it to PECL? I agree that PCRE is the preferred way but not
having ereg() will break a huge amount of applications for very little
gain.


I tend to agree, unless we provide wrappers via PCRE that emulate  
ereg functionality I don't think we can remove posix regex until PHP 6.


Ilia Alshanetsky




RE: [PHP-DEV] POSIX regex

2007-07-16 Thread Jani Taskinen
Please read about the decision made regarding this, and why it was made,
at: http://derickrethans.nl/files/meeting-notes.html#move-ereg-to-pecl

This is getting quite boring. You have had over 2 years to read about
this and complain..and this wasn't the first time with your usual
comment "will break a huge amount of applications" about anything we're
trying to improve. <removed usual rant about BC>

--Jani


On Mon, 2007-07-16 at 06:53 -0700, Andi Gutmans wrote:
 Even in PHP 6 I am not sure it's a good idea. There are a huge amount of
 apps that use them and it'll be very hard for people to upgrade.
 Anyway, let's do some more research on that once we get closer to PHP 6
 and see what the migration path looks like. We'll have to check with a
 few popular apps + google code search :)
 No need to decide on that right now without having more info.
 
 Andi 
 
  -Original Message-
  From: Ilia Alshanetsky [mailto:[EMAIL PROTECTED] 
  Sent: Monday, July 16, 2007 6:48 AM
  To: Andi Gutmans
  Cc: [EMAIL PROTECTED]; internals@lists.php.net
  Subject: Re: [PHP-DEV] POSIX regex
  
  
  On 16-Jul-07, at 9:46 AM, Andi Gutmans wrote:
  
   Why move it to PECL? I agree that PCRE is the preferred way but not
   having ereg() will break a huge amount of applications for very little
   gain.
  
  I tend to agree, unless we provide wrappers via PCRE that 
  emulate ereg functionality I don't think we can remove posix 
  regex until PHP 6.
  
  Ilia Alshanetsky
  
  
  
  
  




Re: [PHP-DEV] POSIX regex

2007-07-16 Thread Pierre

On 7/16/07, Andi Gutmans [EMAIL PROTECTED] wrote:

Even in PHP 6 I am not sure it's a good idea.


As far as I know, Jani is referring to PHP6 only. And it was decided
in the php6 notes.

I'm in favour of removing ereg in php6, and the sooner we decide the
better. Users will know about this change and will finally understand
the superiority of PCRE and why they should use it instead, starting today.

As for 5.x (5.2.x or 5.3.x), I would prefer to deprecate it in 5.3
(if at all), but I don't think we should remove it in 5.x.

Cheers,
--Pierre




RE: [PHP-DEV] POSIX regex

2007-07-16 Thread Derick Rethans
On Mon, 16 Jul 2007, Andi Gutmans wrote:

 Even in PHP 6 I am not sure it's a good idea. There are a huge amount of
 apps that use them and it'll be very hard for people to upgrade.

Their apps are breaking anyway, and having three regex engines doesn't
make sense.

Derick

-- 
Derick Rethans
http://derickrethans.nl | http://ez.no | http://xdebug.org




RE: [PHP-DEV] POSIX regex

2007-07-16 Thread Andi Gutmans
Even in PHP 6 I am not sure it's a good idea. There are a huge amount of
apps that use them and it'll be very hard for people to upgrade.
Anyway, let's do some more research on that once we get closer to PHP 6
and see what the migration path looks like. We'll have to check with a
few popular apps + google code search :)
No need to decide on that right now without having more info.

Andi 

 -Original Message-
 From: Ilia Alshanetsky [mailto:[EMAIL PROTECTED] 
 Sent: Monday, July 16, 2007 6:48 AM
 To: Andi Gutmans
 Cc: [EMAIL PROTECTED]; internals@lists.php.net
 Subject: Re: [PHP-DEV] POSIX regex
 
 
 On 16-Jul-07, at 9:46 AM, Andi Gutmans wrote:
 
  Why move it to PECL? I agree that PCRE is the preferred way but not
  having ereg() will break a huge amount of applications for very little
  gain.
 
 I tend to agree, unless we provide wrappers via PCRE that 
 emulate ereg functionality I don't think we can remove posix 
 regex until PHP 6.
 
 Ilia Alshanetsky
 
 
 
 
 




Re: [PHP-DEV] POSIX regex

2007-07-16 Thread Derick Rethans
On Mon, 16 Jul 2007, Pierre wrote:

 On 7/16/07, Andi Gutmans [EMAIL PROTECTED] wrote:
  Even in PHP 6 I am not sure it's a good idea.
 
 As far as I know, Jani is referring to PHP6 only. And it was decided
 in the php6 notes.

Unfortunately that is not true. It's only the title of the agenda point, 
it's not part of the conclusions.

 I'm in favour to remove ereg in php6, and the sooner we decide the
 better.

Yes, I agree.

 Users will know about this change and will finally understand
 the PCRE superiority and why they should use it instead, and today.

However, users should learn how to use the new regexp engine 
as that will support Unicode :)

regards,
Derick




Re: [PHP-DEV] POSIX regex

2007-07-16 Thread Nuno Lopes
PCRE has a POSIX API, so it is possible to use PCRE as a drop-in replacement 
for the engine behind ereg(). What I don't know is how compatible it is with 
the current engine. But I think it is worth investigating.



Nuno

P.S.: this POSIX PCRE layer isn't currently bundled with PHP, because it 
wasn't needed so far. 
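
To make the "drop-in" idea concrete, here is a rough sketch of code written against the POSIX regex API that could, in principle, be built against either the system regex or PCRE's POSIX wrapper just by switching the header. USE_PCRE_POSIX is a hypothetical build flag invented for this example, and first_group is a made-up helper; how closely the two engines agree on a given pattern is exactly the open compatibility question, so treat this as an illustration only.

```c
/* USE_PCRE_POSIX is a hypothetical flag for this sketch. When it is set,
 * the program must also be linked against PCRE's POSIX wrapper library. */
#ifdef USE_PCRE_POSIX
#include <pcreposix.h>
#else
#include <regex.h>
#endif
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copies the text matched by the first parenthesized
 * group of `pattern` in `str` into `out` (truncated to outlen - 1 bytes).
 * Returns 0 on success, -1 on no match or on a bad pattern. */
int first_group(const char *str, const char *pattern,
                char *out, size_t outlen)
{
    regex_t re;
    regmatch_t m[2]; /* m[0] = whole match, m[1] = first group */
    size_t len;
    int rc = -1;

    if (outlen == 0 || regcomp(&re, pattern, REG_EXTENDED) != 0) {
        return -1;
    }
    if (regexec(&re, str, 2, m, 0) == 0 && m[1].rm_so != -1) {
        len = (size_t)(m[1].rm_eo - m[1].rm_so);
        if (len >= outlen) {
            len = outlen - 1;
        }
        memcpy(out, str + m[1].rm_so, len);
        out[len] = '\0';
        rc = 0;
    }
    regfree(&re);
    return rc;
}
```

The calling code uses only regcomp(), regexec() and regfree(), which is why the choice of engine can be confined to the include line and the link step.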





Re: [PHP-DEV] POSIX regex

2007-07-16 Thread Lukas Kahwe Smith

Andi Gutmans wrote:

Even in PHP 6 I am not sure it's a good idea. There are a huge amount of
apps that use them and it'll be very hard for people to upgrade.
Anyway, let's do some more research on that once we get closer to PHP 6
and see what the migration path looks like. We'll have to check with a
few popular apps + google code search :)
No need to decide on that right now without having more info.


I disagree with this approach. The thing is that we need to get a clear 
message out ASAP. This all ties into topics like if we will have a 
unicode off/on switch or not. Delaying these decisions will hurt our 
userbase. We need to prepare them early.


IMHO we should use PHP6 as the clean up release. Drop unicode on/off 
switch, accept that the bulk of all code will need to be rewritten from 
scratch. The benefit will be that it will truly be cleaned up, people
will still be able to leverage the bulk of their PHP programming 
background and they can enjoy the fastest possible unicode engine we can 
provide them.


PHP5 will be for the people that cannot make the jump. We will back port 
whatever we can reasonably get into PHP5. People will linger on PHP5, 
just as they are doing now with PHP4. So it goes. At least we will not 
punish the early adopters for those that are unwilling to move to the 
new version in the near future anyways.


At any rate .. the time is now to make a decision on what it's gonna be.
PHP6 with BC hacks or not.


regards,
Lukas




Re: [PHP-DEV] POSIX regex

2007-07-16 Thread Lukas Kahwe Smith

Ilia Alshanetsky wrote:


On 16-Jul-07, at 9:46 AM, Andi Gutmans wrote:


Why move it to PECL? I agree that PCRE is the preferred way but not
having ereg() will break a huge amount of applications for very little
gain.


I tend to agree, unless we provide wrappers via PCRE that emulate ereg 
functionality I don't think we can remove posix regex until PHP 6.


Doing it before PHP6 would require some very, very solid wrappers. Given
the little phpt coverage (*) we currently seem to have for ereg, I do
not think it's really possible to even determine whether any
attempt at a wrapper is truly solid or not.


regards,
Lukas

(*) Actually I heard this said on IRC. I do not see an ext/ereg on 
gcov.php.net .. are the available tests part of regexp?





Re: [PHP-DEV] POSIX regex

2007-07-16 Thread Jani Taskinen
On Mon, 2007-07-16 at 15:22 +0100, Nuno Lopes wrote:
 PCRE has a POSIX API, so it is possible to use PCRE as a drop-in replacement 
 for the engine behind ereg(). What I don't know is how compatible it is with 
 the current engine. But I think it is worth investigating.

Worked fine when I tested it. But it's quite pointless, it's still not
unicode friendly. It's just better to use system POSIX regex funcs if
ext/ereg/ is to stay..which is stupid since all functions it provides
can be easily replaced with unicode friendly preg_* funcs. Nobody should
use ereg_*() for anything if they want to use unicode. If they don't
need unicode, they don't need PHP 6 either.

 P.S.: this POSIX PCRE layer isn't currently bundled with PHP, because it 
 wasn't needed so far. 

It is bundled, just isn't compiled, see above.. :)

--Jani




Re: [PHP-DEV] POSIX regex

2007-07-16 Thread Antony Dovgal

Thank you Lukas for expressing exactly my thoughts on this.



--
Wbr, 
Antony Dovgal





Re: [PHP-DEV] POSIX regex

2007-07-16 Thread Jani Taskinen
Thank you Lukas and Antony. Could not agree more..





Re: [PHP-DEV] POSIX regex

2007-07-16 Thread David Coallier




Another thing to mention is that without GLOBALS (PHP6), most
applications and <cough>php4-developers</cough> will have far more
problems than without POSIX regexes.


D




Re: [PHP-DEV] POSIX regex

2007-07-16 Thread Pierre

On 7/16/07, Jani Taskinen [EMAIL PROTECTED] wrote:

Thank you Lukas and Antony. Could not agree more..


But  we all agree, don't we? :)




Re: [PHP-DEV] POSIX regex

2007-07-16 Thread Larry Garfield
Non-core PHP developer speaking, so read with that in mind:

One of the things that held back PHP 5 adoption for so long, IMO, is the large
amount of FUD that surrounded it.  Even now, 3 years after it was released, I
keep seeing the argument that "I can't drop PHP 4 and use PHP 5, then I have
to rewrite *everything* to use objects.  I hate objects."  That is, of
course, completely untrue, and if you're paying even moderate attention it's
not at all difficult to write code that runs just fine in both PHP 4 and PHP
5, with and without register_globals and magic_quotes.  All it takes is a
little forethought and not letting yourself be sloppy.

Writing PHP 5/6 compatible code needs to be just as easy, if not easier, in
addition to having better marketing to head off the FUD.  Taking a stance
of "you'll have to start from scratch if you want to be PHP 6 compatible, oh
well" is an absolutely sure-fire way to guarantee that no one uses PHP 6 for
anything except niche markets.

If people are still relying on register_globals at this point, sure, they're 
screwed no matter what they do.  But code written to PHP 5 E_STRICT standards 
with a recommended configuration (register_globals off, etc.) should be 
possible to make run successfully in PHP 6 without gutting and starting from 
scratch (even if you can't use the new-and-cool features).  If not, GoPHP6 
will be a failure before it even gets started. :-)

(And yes, I'm already pondering how to do GoPHP6 in order to make the 5/6 
transition smoother.)

On Monday 16 July 2007, Andi Gutmans wrote:
 I disagree with this view of the world.
 It doesn't have to be a complete either/or decision and labeling
 everything as a "bc hacks" decision is an inaccurate and populist way
 of building FUD.

 There are clear things we want to change (like register_globals) because
 we believe that ultimately they have a significant benefit to our users
 with controllable downside (there is an easy one line workaround which
 we can document for people to get their old apps to work). There are
 other areas where breaking BC makes sense. But saying we should just
 break it across the board and not even consider having a good upgrade
 path for our users is unreasonable. I believe we can have a very good
 PHP 6, which is pretty much in sync with many of your feelings, but that
 provides a well documented and reasonable upgrade path (unlike VB ->
 VB.NET).

 If you want to break everything and anything and don't want to be
 limited whatsoever by our huge user-base then maybe you should write a
 new language which fits exactly what your preference would be. The fact
 is though, that even after these discussions and the Paris discussions,
 the bulk of the idiosyncrasies which make PHP what it is today will
 remain (as per agreement). So there must have been some kind of view
 even by the folks here that they don't want to create a new language but
 improve on what we have. And it's a trade-off between bang for the buck;
 sometimes it really brings high returns to break BC especially when it
 comes to security; but sometimes except for making 10 PHP devs happy who
 are not the bulk of our users it doesn't.

 So let's not oversimplify this situation. We have to continue to make
 trade-offs.

 Btw, one of PHP's strengths has been in high performance sites and with
 a Unicode=on only mode this would take quite a hit (but it's not the
 only reason why I think we need choice). In any case, I think on this
 question it does make sense that we start making informed decisions by
 understanding the migration path better, as opposed to just basing
 decisions on gut feelings. Maybe that kind of learning experience will
 prove me wrong (which may be so).

 Andi

  -Original Message-
  From: Lukas Kahwe Smith [mailto:[EMAIL PROTECTED]
  Sent: Monday, July 16, 2007 7:25 AM
  To: Andi Gutmans
  Cc: Ilia Alshanetsky; [EMAIL PROTECTED]; internals@lists.php.net
  Subject: Re: [PHP-DEV] POSIX regex
 
  Andi Gutmans wrote:
   Even in PHP 6 I am not sure it's a good idea. There are a huge amount
   of apps that use them and it'll be very hard for people to upgrade.
   Anyway, let's do some more research on that once we get closer to PHP
   6 and see what the migration path looks like. We'll have to check with
   a few popular apps + google code search :) No need to decide on that
   right now without having more info.
 
  I disagree with this approach. The thing is that we need to
  get a clear message out ASAP. This all ties into topics like
  if we will have a unicode off/on switch or not. Delaying
  these decisions will hurt our userbase. We need to prepare them early.
 
  IMHO we should use PHP6 as the clean up release. Drop unicode
  on/off switch, accept that the bulk of all code will need to
  be rewritten from scratch. The benefit will be that it will
  truly be cleaned up, people will still be able to leverage
  the bulk of their PHP programming background