Re: [PHP-DEV] array_seek function
Hi

As SEEK_END only makes sense with zero or negative offsets (for arrays anyway), I've come up with an implementation for SEEK_END:
http://phpbenelux.eu/array_seek.patch.txt

So you can do:

$arr = array('a', 'b', 'c', 'd');
echo array_seek($arr, -2, SEEK_END); // outputs 'b'
echo array_seek($arr, 0, SEEK_END); // outputs 'd'

Cheers,
Felix

On 16 Mar 2010, at 19:07, Mikko Koppanen wrote:

> On Tue, Mar 16, 2010 at 4:22 PM, Derick Rethans wrote:
>> I was also thinking, can we just make this work just like fseek (with a
>> "whence" parameter) as well? (http://uk3.php.net/fseek)
>
> Hi,
>
> not sure how SEEK_END is supposed to work with arrays but here is
> SEEK_SET and SEEK_CUR (with positive and negative offset)
> http://valokuva.org/~mikko/array_seek_whence.patch.txt
>
> --
> Mikko Koppanen

--
PHP Internals - PHP Runtime Development Mailing List
To unsubscribe, visit: http://www.php.net/unsub.php
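The whence behaviour described above can be sketched in userland. This is only an illustration under stated assumptions: array_seek_sketch() is a hypothetical name, the code is not taken from either patch, and SEEK_CUR is left out because it depends on the array's internal pointer. SEEK_SET and SEEK_END are the existing constants used by fseek().

```php
<?php
// Hypothetical userland helper, not the native array_seek() from the patches.
function array_seek_sketch(array $array, $offset, $whence = SEEK_SET)
{
    $count = count($array);

    if ($whence === SEEK_SET) {
        $position = $offset;               // offset counted from the first element
    } elseif ($whence === SEEK_END) {
        $position = $count - 1 + $offset;  // offset 0 is the last element
    } else {
        return null;                       // SEEK_CUR is not handled in this sketch
    }

    if ($position < 0 || $position >= $count) {
        return null;                       // out of range
    }

    // array_slice() selects by position, regardless of the keys.
    $slice = array_slice($array, $position, 1);
    return reset($slice);
}
```

With $arr = array('a', 'b', 'c', 'd'), array_seek_sketch($arr, -2, SEEK_END) gives 'b' and array_seek_sketch($arr, 0, SEEK_END) gives 'd', matching the outputs quoted in the message. Unlike the patch, this sketch returns NULL silently instead of emitting a warning.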
Re: [PHP-DEV] Re: array_seek function
On Tue, Mar 16, 2010 at 17:12, Mikko Koppanen wrote:
> On Tue, Mar 16, 2010 at 2:12 PM, Christian Schneider wrote:
>> I think the user space implementation
>>
>> function array_seek($array, $pos)
>> {
>>     $a = array_values($array);
>>     return $a[$pos];
>> }
>>
>> is simple enough to not add a native function for this.
>>
>> It might not be the most efficient way to do it but I doubt that it is
>> something done frequently enough to justify another native function.
>
> slightly modified version of the original patch
> http://valokuva.org/~mikko/array_seek.patch.txt. The difference to the

I once proposed a similar patch to in_array, where it didn't reset the position after finding the "found element".
In applications like PhD, this is extremely useful and saves us at least 10% overhead (at the time I benchmarked it with my patch to in_array()).

I think we wound up with something like:

while (list($key, $val) = each($array)) {
    if ($key == "foobar") {
        break;
    }
}
next($array);
$current_index = current($array);

to get the _next_ value after the currently known value (or key).

In an application like PhD (which has already brought the compile time down from 24 hours (DSSSL, xsltproc, two formats) to ~3-4 minutes (3-5 formats)), 10% of _language_ overhead is extremely important, so I am all for a function that can do this (our/my goal is max 1 minute... sorry, HD read/write is still extremely expensive :(, it simply can't get faster than that afaict. If you have an idea, GSoC is open for experiments.. :D).

-Hannes
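For readers on current PHP versions, where each() is no longer available (it was removed in PHP 8.0), the "value after a known key" lookup can be sketched with foreach. The helper name value_after_key() is hypothetical, and the sketch only shows the semantics: it rescans the array from the start on every call, so it does not reproduce the performance benefit discussed above.

```php
<?php
// Hypothetical helper: returns the value that follows $needle in iteration
// order, or null when the key is missing or is the last one in the array.
function value_after_key(array $array, $needle)
{
    $found = false;
    foreach ($array as $key => $value) {
        if ($found) {
            return $value;
        }
        // Compare as strings so that '1' and 1 are treated as the same key.
        if ((string) $key === (string) $needle) {
            $found = true;
        }
    }
    return null;
}
```

For array('a' => 1, 'foobar' => 2, 'c' => 3), value_after_key($array, 'foobar') returns 3.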
RE: [PHP-DEV] array_seek function
> -----Original Message-----
> From: Felix De Vliegher [mailto:felix.devlieg...@gmail.com]
> Sent: 16 March 2010 13:31
> To: PHP internals
> Subject: [PHP-DEV] array_seek function
>
> Hi all
>
> I recently needed seek functionality in arrays, and couldn't
> find it in the regular set of array functions, so I wrote a
> function for it. (Seek = getting an array value based on the
> position (or offset, if you want to call it like that), and
> not the key of the item)
>
> Basically you can use it like this:
> $input = array(3, 'bar', 'baz');
> echo array_seek($input, 2); // returns 'baz'
> echo array_seek($input, 0); // returns 3
> echo array_seek($input, 5); // returns NULL, emits an out of range warning

I remember doing something like this in the past...

$input = array(3, 'bar', 'baz');
$iterator = new ArrayIterator($input);
$iterator->seek(2);
echo $iterator->current();
$iterator->seek(0);
echo $iterator->current();
$iterator->seek(5); // throws OutOfBoundsException

Though a specific function does make sense, imo.

Jared
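The ArrayIterator approach above can be wrapped so that an out-of-range position yields NULL instead of an exception. This is a small sketch only; seek_value() is a hypothetical name, while ArrayIterator::seek(), ArrayIterator::current() and OutOfBoundsException are the existing SPL pieces used in the message.

```php
<?php
// Hypothetical wrapper around ArrayIterator::seek().
function seek_value(array $input, $position)
{
    $iterator = new ArrayIterator($input);
    try {
        $iterator->seek($position);   // throws when $position is out of range
    } catch (OutOfBoundsException $e) {
        return null;
    }
    return $iterator->current();
}
```

With $input = array(3, 'bar', 'baz'), seek_value($input, 2) returns 'baz', seek_value($input, 0) returns 3, and seek_value($input, 5) returns NULL.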
Re: [PHP-DEV] PHP 5.4 branch and trunk
On Tue, 2010-03-16 at 22:13 +0100, Lukas Kahwe Smith wrote:
> On 16.03.2010, at 16:58, Derick Rethans wrote:
>
> > Before we add features, they need to be discussed whether we want to
> > have them. As version name for it I would like to use "trunk-dev" (and
> > not 5.4-dev or 6.0-dev) as we're not quite sure where this is moving.
> > Right now, there are the following features that I can see we should
> > think about:
>
>
> Since we do not know the name of the next version yet, maybe its best to
> base the name on what version it will have as a predecessor and add
> support for this in version_compare()? Something like "5.3post". Ok this
> isnt a good suggestion, but I hope you get what I am suggesting.

We need a version number which can be represented as a numeric value like

#define PHP_VERSION_ID 50303

to help extension authors; as said on IRC, 5.4 is the only sane choice there. We can still increase the number if needed.

How to document this is a good question...

johannes
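To make the numeric form concrete: PHP_VERSION_ID is built as major * 10000 + minor * 100 + release, which is why a name such as "trunk-dev" or "5.3post" has no obvious numeric value. The helper below is a hypothetical illustration of that arithmetic, not code from php-src.

```php
<?php
// Hypothetical helper mirroring how PHP_VERSION_ID is composed:
// major * 10000 + minor * 100 + release.
function version_id($version)
{
    // Missing components default to 0, so "5.4" is treated as "5.4.0".
    $parts = array_map('intval', explode('.', $version)) + array(0, 0, 0);
    return $parts[0] * 10000 + $parts[1] * 100 + $parts[2];
}
```

version_id('5.3.3') gives 50303, matching the #define above, and version_id('5.4') gives 50400. Extension and application code can then compare integers rather than parse version strings.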
Re: [PHP-DEV] Where are we ACTUALLY on Unicode?
On Tue, Mar 16, 2010 at 9:43 PM, dreamcat four wrote:
> And remember,
>
> Its not just the number of times its send to ICU for conversion. Its
> also the number of times your UTF-16 string has to be converted back
> into utf-8 afterwards. This is why Apple makes its utf-16 strings
> immutable. So they are read-only, and the utf-8 representation can be
> cached afterward.
>
> Think of it this way:
>
> 1. Load a utf-8 string from DB or file
> 2. Convert it to utf-16
> 3. Perform ICU conv 3-5 times
> 4. Page gets hit by memcache
> 5. utf-16 is converted back to utf-8
> 6. Something changes ? String was cached ?
> 7. need to spit out another utf-8 version of the string again
>
> And a persistent web application can be held for many hours in memory.
> Are we converting back to utf-8 every time? Then it might be better to
> wrap the string conversions just around ICU.
>
> I'd suggest selecting a real (but still as easy-to-work with as can be
> found) unicode php app. One that has been written to use a unicode php
> module. Then getting a single, representative page from it. By that I
> mean the kind of page that gets accessed the most. So for imdb that
> would be a movie's page, etc. The smalled 'slice' of the app, not the
> whole thing. Dummy-out the other stuff.
>
> Then convert that part (for rendering one page) into the current php6
> unicode scheme. And can see what's what.
>

I would choose the MediaWiki software for this kind of test. It works in a really internationalized environment, and I have seen the main developer of the MediaWiki/Wikipedia application posting and contributing on the mailing list.
But that's just my two cents.

Tyrael

> On Tue, Mar 16, 2010 at 8:04 PM, Ferenc Kovacs wrote:
>> On Tue, Mar 16, 2010 at 8:05 PM, Stanislav Malyshev wrote:
>>> Hi!
>>>
>>>> On disk storage should probably be UTF-8 without any question? Windows
>>>> use of widestrings for some files simple doubles up the on disk storage
>>>
>>> As file content, it's OK (an it'd be easy to add option to specify content
>>> transformation if we wanted), but prescribing filenames as UTF-8 would
>>> probably be not workable, since different systems (and maybe even different
>>> filesystems inside same OS?) can have different opinions on that.
>>>
>>>> '3' is not a very processor friendly number, so working with 4 even
>>>> though wasteful on memory, does make perfect sense. How long is it since
>>>
>>> I'm not sure it does. Most of PHP strings are short, so memory loss would be
>>> very significant. Also, take into account that CPU caches aren't as big as
>>> the main memory, and not fitting your data into the cache is expensive.
>>>
>>>> we had a 640k limit on working memory? SERVERS should have a good amount
>>>
>>> It doesn't matter how much memory you have, in numbers. Until we find an
>>> unlimited source of computer memory left by the aliens in Himalayas, memory
>>> costs money. It doesn't matter how much memory do you have - however many
>>> gigs you have, you'll be able to run 3 times less PHP processes in new
>>> version on the same hardware than in old version, which means new PHP would
>>> cost you more to run. "Memory is cheap" is a very misunderstood expression -
>>> it's only cheap if you always have much more than you need.
>>>
>>>> Probably 90% of the time a string will come in and go out without
>>>> requiring any processing at all, so leave it as UTF-8 ? The only time we
>>>
>>> It might be great if we could do that. The problem might be that right now
>>> AFAIK we don't have a good library to work with utf-8 strings (please
>>> correct me if I'm wrong here).
>> http://source.icu-project.org/repos/icu/icuhtml/trunk/design/strings/icu_utf8.html
>> from ICU 3.6 changelog => The UTF-8 transformation functions and
>> macros are faster.
>> from 4.2 => UTF-8 friendly internal data structure for Unicode data lookup
>> so it's seems that guys at ICU tries to close the gap between the
>> UTF-16 and UTF-8 performance, so maybe it would be a good idea, to
>> check out the current situation.
>>
>> Tyrael
>>> --
>>> Stanislav Malyshev, Zend Software Architect
>>> s...@zend.com http://www.zend.com/
>>> (408)253-8829 MSN: s...@zend.com
Re: [PHP-DEV] Req #51295: busyTimeout method for SQLite3
On Sun, Mar 14, 2010 at 09:15:37AM -0400, Wez Furlong wrote:
> I'm sure that the docs team will add this to the manual if you ask them
> politely.
>
> Specifically, PDO_SQLITE defaults to a 60 second busy timeout. This can
> be changed by setting PDO::ATTR_TIMEOUT. The value is specified in
> seconds.
>
> ISTR that this option can also be specified for some of the other
> database drivers to affect the network timeout when processing a query.

A nod's as good as a wink. :)

This has been committed.
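As a rough illustration of the PDO_SQLITE setting described above, the busy timeout can be set right after connecting. This is a sketch under assumptions: open_sqlite_with_timeout() is a hypothetical helper, the DSN and the 5-second value are placeholders, and the pdo_sqlite extension must be loaded.

```php
<?php
// Hypothetical helper; the DSN and the timeout value are placeholders.
function open_sqlite_with_timeout($dsn, $seconds)
{
    $pdo = new PDO($dsn);
    $pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
    // Busy timeout in seconds, as described for PDO_SQLITE above.
    $pdo->setAttribute(PDO::ATTR_TIMEOUT, $seconds);
    return $pdo;
}
```

A call such as open_sqlite_with_timeout('sqlite::memory:', 5) returns a connection that waits up to five seconds on a locked database before reporting it as busy.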
Re: [PHP-DEV] PHP 5.4 branch and trunk
On 16.03.2010, at 16:58, Derick Rethans wrote:

> I've just renamed the 5.4 branch to THE_5_4_THAT_ISNT_5_4 and moved

Eventually it should be deleted. If it helps at all in merging the OB change then it should be kept until that happens; otherwise it can be deleted now, imho. The new 5.3-based trunk will emerge soon I am sure, but until then let's not bother with having to merge those changes.

> trunk to the branch FIRST_UNICODE_IMPLEMENTATION.

+1

> The next things to do is to re-create trunk from PHP 5.3; I've hold off
> that for now, but I'd like to do the following soon:
>
> - Declare 5.2 security fixes only (Something for Ilia to declare).
> - Declare 5.3 bug fixes only (and ini-mini features if so desired)
>   (Something for Johannes to declare).

+1

> - the new output buffering mechanism (I can not really see why we would
>   not want this)

+1

> - traits, there are also RFCs:
>   http://wiki.php.net/rfc/horizontalreuse
>   http://wiki.php.net/rfc/nonbreakabletraits

+1

other stuff:
http://wiki.php.net/todo/php60
http://wiki.php.net/todo/backlog

That being said, I think that until we know whether the next version will be a new major version, we should hold off on BC-breaking cleanup stuff like dropping register globals and friends. But we still might bundle APC with the next release, for example, even if it's not 6.0 ..

--

As for unicode, I would like the next release to be planned independently of finding a solution for unicode, but with the clear option that it will be included if we find a good solution in time (like I said, I think it would be good to shoot for a final release in summer 2011, so a beta phase in early 2011).

I propose that a sort of unicode working group forms, but much less formal than what I make it sound like. I think the discussions can remain on internals@ and hopefully alternative approaches will be documented as RFCs. What I mean by working group is a list of a handful of names who feel responsible for keeping this topic moving until a solution is found, and who people know they can contact if they want to chat or whatever.

Again, if these guys find a workable solution that can be implemented this year, I am all for putting it into the next release. If not, so be it, because I think the lesson learned from the whole PHP6/PHP5.3 release nightmare is that we should have regular releases. So I say we shoot for the release following the next one to come out in the summer of 2012.

regards,
Lukas Kahwe Smith
m...@pooteeweet.org
Re: [PHP-DEV] PHP 5.4 branch and trunk
On 16.03.2010, at 16:58, Derick Rethans wrote:

> Before we add features, they need to be discussed whether we want to
> have them. As version name for it I would like to use "trunk-dev" (and
> not 5.4-dev or 6.0-dev) as we're not quite sure where this is moving.
> Right now, there are the following features that I can see we should
> think about:

Since we do not know the name of the next version yet, maybe it's best to base the name on what version it will have as a predecessor and add support for this in version_compare()? Something like "5.3post". OK, this isn't a good suggestion, but I hope you get what I am suggesting.

regards,
Lukas Kahwe Smith
m...@pooteeweet.org
Re: [PHP-DEV] PHP 5.4 branch and trunk
On 16.03.2010, at 19:23, Hannes Magnusson wrote:

> On Tue, Mar 16, 2010 at 16:58, Derick Rethans wrote:
>> Before we add features, they need to be discussed whether we want to
>> have them.
>
> Does that mean you want to take up a
> - strict RFC-and-after-3months-discussion-before-commit policy
>   (i.e. killing the scratching-an-itch spirit of PHP)
> - "I'm going to commit this patch tomorrow" mail to internals@
>   (i.e. killing "I need this functionality, maybe others do to" spirit of PHP)

It's all a question of the scope of the change, obviously. There is some tipping point where it makes sense to write an RFC. Remember, an RFC not only serves decision making, but also provides some level of documentation (on which the final documentation can be built) for future generations (this is why I, for example, wrote the ifsetor RFC after we decided that we cannot currently implement it). So like Stas said .. common sense still rules.

> I would much rather have a development branch which "everything
> goes" (like it used to) and then make it up to the release manager to
> merge the features he wants in "his branch" (DVCS style)

I don't think we ever had an "everything goes" HEAD .. let's say that in the past we had a small, very active core dev team with really short turnaround times for decisions, because everybody was answering on IRC or the mailing lists within minutes. As a result, decisions (not always for the better) were made in a much shorter timeframe than the current availability of core developers affords us.

>> - Ilia's scalar type hint patch.
>
> And which of Ilias patches are you referring to? The original one
> (which is identical to the patch I sent in November 2006) or the
> "fucking eyh, I need to please everyone so this can be in 5.3 - but
> still got rejected" patch?

I think he clearly pointed to the wiki page which lists 3 proposals. He is just suggesting we should finalize which one we want and get it in.

> You didn't even list the mbstring patch.. that was discussed and as
> far as I remember everyone thought it was great idea, just not in a
> stable branch.

Is this tone really necessary? On the one hand you are arguing for more flexibility, and then you are shooting the messenger because in a long list he forgot one thing (there are probably a few others .. we might want to go through the todo wiki pages for more)?

regards,
Lukas Kahwe Smith
m...@pooteeweet.org
Re: [PHP-DEV] Where are we ACTUALLY on Unicode?
And remember,

It's not just the number of times it's sent to ICU for conversion. It's also the number of times your UTF-16 string has to be converted back into utf-8 afterwards. This is why Apple makes its utf-16 strings immutable. So they are read-only, and the utf-8 representation can be cached afterward.

Think of it this way:

1. Load a utf-8 string from DB or file
2. Convert it to utf-16
3. Perform ICU conv 3-5 times
4. Page gets hit by memcache
5. utf-16 is converted back to utf-8
6. Something changes ? String was cached ?
7. need to spit out another utf-8 version of the string again

And a persistent web application can be held for many hours in memory. Are we converting back to utf-8 every time? Then it might be better to wrap the string conversions just around ICU.

I'd suggest selecting a real (but still as easy to work with as can be found) unicode php app. One that has been written to use a unicode php module. Then getting a single, representative page from it. By that I mean the kind of page that gets accessed the most. So for imdb that would be a movie's page, etc. The smallest 'slice' of the app, not the whole thing. Dummy-out the other stuff.

Then convert that part (for rendering one page) into the current php6 unicode scheme. Then we can see what's what.

On Tue, Mar 16, 2010 at 8:04 PM, Ferenc Kovacs wrote:
> On Tue, Mar 16, 2010 at 8:05 PM, Stanislav Malyshev wrote:
>> Hi!
>>
>>> On disk storage should probably be UTF-8 without any question? Windows
>>> use of widestrings for some files simple doubles up the on disk storage
>>
>> As file content, it's OK (an it'd be easy to add option to specify content
>> transformation if we wanted), but prescribing filenames as UTF-8 would
>> probably be not workable, since different systems (and maybe even different
>> filesystems inside same OS?) can have different opinions on that.
>>
>>> '3' is not a very processor friendly number, so working with 4 even
>>> though wasteful on memory, does make perfect sense. How long is it since
>>
>> I'm not sure it does. Most of PHP strings are short, so memory loss would be
>> very significant. Also, take into account that CPU caches aren't as big as
>> the main memory, and not fitting your data into the cache is expensive.
>>
>>> we had a 640k limit on working memory? SERVERS should have a good amount
>>
>> It doesn't matter how much memory you have, in numbers. Until we find an
>> unlimited source of computer memory left by the aliens in Himalayas, memory
>> costs money. It doesn't matter how much memory do you have - however many
>> gigs you have, you'll be able to run 3 times less PHP processes in new
>> version on the same hardware than in old version, which means new PHP would
>> cost you more to run. "Memory is cheap" is a very misunderstood expression -
>> it's only cheap if you always have much more than you need.
>>
>>> Probably 90% of the time a string will come in and go out without
>>> requiring any processing at all, so leave it as UTF-8 ? The only time we
>>
>> It might be great if we could do that. The problem might be that right now
>> AFAIK we don't have a good library to work with utf-8 strings (please
>> correct me if I'm wrong here).
> http://source.icu-project.org/repos/icu/icuhtml/trunk/design/strings/icu_utf8.html
> from ICU 3.6 changelog => The UTF-8 transformation functions and
> macros are faster.
> from 4.2 => UTF-8 friendly internal data structure for Unicode data lookup
> so it's seems that guys at ICU tries to close the gap between the
> UTF-16 and UTF-8 performance, so maybe it would be a good idea, to
> check out the current situation.
>
> Tyrael
>> --
>> Stanislav Malyshev, Zend Software Architect
>> s...@zend.com http://www.zend.com/
>> (408)253-8829 MSN: s...@zend.com
Re: [PHP-DEV] Where are we ACTUALLY on Unicode?
On 3/16/2010 6:48 AM, dreamcat four wrote:
>
> Sure UTF-16 can make sense. But only if your external representations
> are also in UTF-16. So whats the default Unicode settings for MYSQL,
> POSTGRE, etc? Well, are they always set to UTF-8, or UTF-16?

This is a very good point. The PHP project consumes some 30-odd libraries of extensions. How many do utf-8? How many do ucs2? Utf-16?
Re: [PHP-DEV] Where are we ACTUALLY on Unicode?
Rasmus Lerdorf wrote:
> On 03/16/2010 12:05 PM, dreamcat four wrote:
>> On Tue, Mar 16, 2010 at 6:32 PM, Rasmus Lerdorf wrote:
>>> On 03/16/2010 10:40 AM, dreamcat four wrote:
>>>> As for text files on disk, if they are unicode, they are most commonly
>>>> utf-8 too. So then, why use utf-16 as internal unicode representation
>>>> in Php? It doesn't really make a lot of sense for most regular people
>>>> who want to use Php for their web application. Unless they don't
>>>> really care how slow its gonna be converting everything, constantly...
>>>
>>> Well, the obvious original reason is that ICU uses UTF-16 internally and
>>> the logic was that we would be going in and out of ICU to do all the
>>> various Unicode operations many more times than we would be interfacing
>>> with external things like MySQL or files on disk. You generally only
>>> read or write a string once from an external source, but you may perform
>>> multiple Unicode operations on that same string so avoiding a conversion
>>> for each operation seems logical.
>>>
>>> -Rasmus
>>
>> Its only logical if you've bothered to profile the conversion calls to
>> ICU against the non-ICU conversion calls. Im guessing the way to do
>> that, is to have 2 versions of each conversion method. One used by
>> ICU, and another used everywhere else. The harder part is to find some
>> suitable, real life php programs to test with.
>
> You mean check to see how many actual Unicode operations a standard app
> makes? We did talk about that, but there is a bit of a chicken-and-egg
> problem here. Because PHP doesn't natively support Unicode, people
> write apps in a way that lets them just pass Unicode through PHP and
> deal with it elsewhere. I would expect the profile to change once PHP
> gets better support for Unicode.
>
> But yes, some ideas around lazy conversions and other tricks would be
> interesting. If your input and output encoding are both utf-8 and all
> your data sources are utf-8 and you never do any sort of string
> manipulation on a particular string, why bother doing the utf-8 to
> utf-16 conversion on that string.

I think that is what I said originally ;)
When a string is read in you set an extra flag if it needs special handling, otherwise you just handle it as a single byte per character string ... and for the diehards you add a switch to treat everything as it is now :)

--
Lester Caine - G8HFL
-----------------------------
Contact - http://lsces.co.uk/wiki/?page=contact
L.S.Caine Electronic Services - http://lsces.co.uk
EnquirySolve - http://enquirysolve.com/
Model Engineers Digital Workshop - http://medw.co.uk//
Firebird - http://www.firebirdsql.org/index.php
Re: [PHP-DEV] array_seek function
>> Right now, it returns the value of a given position.
>
> How it's different from:
> array_slice() returns the sequence of elements from the array array as
> specified by the offset and length parameters?

array_slice returns an array of elements. This function would return the value at the given position.

Brian.
Re: [PHP-DEV] array_seek function
Hi!

> Right now, it returns the value of a given position.

How it's different from:
array_slice() returns the sequence of elements from the array array as specified by the offset and length parameters?

--
Stanislav Malyshev, Zend Software Architect
s...@zend.com http://www.zend.com/
(408)253-8829 MSN: s...@zend.com
Re: [PHP-DEV] PHP 5.4 branch and trunk
Hi!

> Does that mean you want to take up a
> - strict RFC-and-after-3months-discussion-before-commit policy
>   (i.e. killing the scratching-an-itch spirit of PHP)
> - "I'm going to commit this patch tomorrow" mail to internals@
>   (i.e. killing "I need this functionality, maybe others do to" spirit of PHP)

Probably something like "I have this patch and I wrote this RFC, please discuss", then wait a reasonable* time for discussion and a reasonable* consensus before commit; and for reasonably* small patches, "I'm going to commit it in 2 days unless somebody objects" would work.

(*) I know definitions of "reasonable" differ, but I have faith we can find common ground.

> And which of Ilias patches are you referring to? The original one
> (which is identical to the patch I sent in November 2006) or the
> "fucking eyh, I need to please everyone so this can be in 5.3 - but
> still got rejected" patch?

That's exactly why having an RFC is good - one link solves all the questions about "which one is it" :)

--
Stanislav Malyshev, Zend Software Architect
s...@zend.com http://www.zend.com/
(408)253-8829 MSN: s...@zend.com
Re: [PHP-DEV] PHP 5.4 branch and trunk
On 03/16/2010 11:00 PM, Johannes Schlüter wrote:
> On Tue, 2010-03-16 at 19:11 +0300, Alexey Zakhlestin wrote:
>> + merge php-fpm branch?
>
> If we get a trunk which will be released in a foreseeable timeframe we
> don't need to merge this to 5.3 anymore, which had been an old plan.
> Tony, do you agree?

Makes sense to me.

--
Wbr,
Antony Dovgal
---
http://pinba.org - realtime statistics for PHP
Re: [PHP-DEV] Where are we ACTUALLY on Unicode?
On Tue, Mar 16, 2010 at 8:05 PM, Stanislav Malyshev wrote:
> Hi!
>
>> On disk storage should probably be UTF-8 without any question? Windows
>> use of widestrings for some files simple doubles up the on disk storage
>
> As file content, it's OK (an it'd be easy to add option to specify content
> transformation if we wanted), but prescribing filenames as UTF-8 would
> probably be not workable, since different systems (and maybe even different
> filesystems inside same OS?) can have different opinions on that.
>
>> '3' is not a very processor friendly number, so working with 4 even
>> though wasteful on memory, does make perfect sense. How long is it since
>
> I'm not sure it does. Most of PHP strings are short, so memory loss would be
> very significant. Also, take into account that CPU caches aren't as big as
> the main memory, and not fitting your data into the cache is expensive.
>
>> we had a 640k limit on working memory? SERVERS should have a good amount
>
> It doesn't matter how much memory you have, in numbers. Until we find an
> unlimited source of computer memory left by the aliens in Himalayas, memory
> costs money. It doesn't matter how much memory do you have - however many
> gigs you have, you'll be able to run 3 times less PHP processes in new
> version on the same hardware than in old version, which means new PHP would
> cost you more to run. "Memory is cheap" is a very misunderstood expression -
> it's only cheap if you always have much more than you need.
>
>> Probably 90% of the time a string will come in and go out without
>> requiring any processing at all, so leave it as UTF-8 ? The only time we
>
> It might be great if we could do that. The problem might be that right now
> AFAIK we don't have a good library to work with utf-8 strings (please
> correct me if I'm wrong here).

http://source.icu-project.org/repos/icu/icuhtml/trunk/design/strings/icu_utf8.html

From the ICU 3.6 changelog: "The UTF-8 transformation functions and macros are faster."
From 4.2: "UTF-8 friendly internal data structure for Unicode data lookup"

So it seems that the guys at ICU are trying to close the gap between UTF-16 and UTF-8 performance, so maybe it would be a good idea to check out the current situation.

Tyrael

> --
> Stanislav Malyshev, Zend Software Architect
> s...@zend.com http://www.zend.com/
> (408)253-8829 MSN: s...@zend.com
Re: [PHP-DEV] PHP 5.4 branch and trunk
On Tue, 2010-03-16 at 19:11 +0300, Alexey Zakhlestin wrote:
> + merge php-fpm branch?

If we get a trunk which will be released in a foreseeable timeframe we don't need to merge this to 5.3 anymore, which had been an old plan. Tony, do you agree?

johannes
Re: [PHP-DEV] Where are we ACTUALLY on Unicode?
On 03/16/2010 12:05 PM, dreamcat four wrote:
> On Tue, Mar 16, 2010 at 6:32 PM, Rasmus Lerdorf wrote:
>> On 03/16/2010 10:40 AM, dreamcat four wrote:
>>> As for text files on disk, if they are unicode, they are most commonly
>>> utf-8 too. So then, why use utf-16 as internal unicode representation
>>> in Php? It doesn't really make a lot of sense for most regular people
>>> who want to use Php for their web application. Unless they don't
>>> really care how slow its gonna be converting everything, constantly...
>>
>> Well, the obvious original reason is that ICU uses UTF-16 internally and
>> the logic was that we would be going in and out of ICU to do all the
>> various Unicode operations many more times than we would be interfacing
>> with external things like MySQL or files on disk. You generally only
>> read or write a string once from an external source, but you may perform
>> multiple Unicode operations on that same string so avoiding a conversion
>> for each operation seems logical.
>>
>> -Rasmus
>
> Its only logical if you've bothered to profile the conversion calls to
> ICU against the non-ICU conversion calls. Im guessing the way to do
> that, is to have 2 versions of each conversion method. One used by
> ICU, and another used everywhere else. The harder part is to find some
> suitable, real life php programs to test with.

You mean check to see how many actual Unicode operations a standard app makes? We did talk about that, but there is a bit of a chicken-and-egg problem here. Because PHP doesn't natively support Unicode, people write apps in a way that lets them just pass Unicode through PHP and deal with it elsewhere. I would expect the profile to change once PHP gets better support for Unicode.

But yes, some ideas around lazy conversions and other tricks would be interesting. If your input and output encoding are both utf-8 and all your data sources are utf-8 and you never do any sort of string manipulation on a particular string, why bother doing the utf-8 to utf-16 conversion on that string?

-Rasmus
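The lazy-conversion idea can be sketched in userland as a string wrapper that keeps the original UTF-8 bytes and converts to UTF-16 only on first use, caching the result. Everything here is an assumption made for illustration: the LazyUnicodeString class, its method names and the injected converter callable are hypothetical and are not part of any PHP 6 patch.

```php
<?php
// Hypothetical wrapper illustrating lazy conversion with a cached result.
final class LazyUnicodeString
{
    private $utf8;
    private $utf16 = null;
    private $converter;
    private $conversions = 0;

    public function __construct($utf8, $converter)
    {
        $this->utf8 = $utf8;
        $this->converter = $converter; // callable: UTF-8 bytes in, UTF-16 bytes out
    }

    // Pass-through path: the string goes out as it came in, no conversion.
    public function toUtf8()
    {
        return $this->utf8;
    }

    // Converted on first use only, then served from the cached copy.
    public function toUtf16()
    {
        if ($this->utf16 === null) {
            $this->utf16 = call_user_func($this->converter, $this->utf8);
            $this->conversions++;
        }
        return $this->utf16;
    }

    public function conversionCount()
    {
        return $this->conversions;
    }
}
```

Where the mbstring extension is available, the converter could be a closure around mb_convert_encoding($s, 'UTF-16LE', 'UTF-8'). A string that is only read and written back never triggers the conversion, and one that is used in several Unicode operations pays for it once. The wrapper is immutable, which is what makes the cached copy safe to reuse.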
Re: [PHP-DEV] Where are we ACTUALLY on Unicode?
On Tue, Mar 16, 2010 at 7:32 PM, Rasmus Lerdorf wrote:

> Well, the obvious original reason is that ICU uses UTF-16 internally and
> the logic was that we would be going in and out of ICU to do all the
> various Unicode operations many more times than we would be interfacing
> with external things like MySQL or files on disk. You generally only
> read or write a string once from an external source, but you may perform
> multiple Unicode operations on that same string so avoiding a conversion
> for each operation seems logical.

Exactly, that's why I was not so affirmative about using UTF-8 over UTF-16. I would like to evaluate both solutions with a small set of PHP features (say some file ops, 1-2 DBs and part of the core string functions) and see the impact of using UTF-8 or UTF-16. But it is definitely not a small decision.

--
Pierre

@pierrejoye | http://blog.thepimp.net | http://www.libgd.org
Re: [PHP-DEV] Where are we ACTUALLY on Unicode?
On Tue, Mar 16, 2010 at 6:32 PM, Rasmus Lerdorf wrote:
> On 03/16/2010 10:40 AM, dreamcat four wrote:
>> As for text files on disk, if they are unicode, they are most commonly
>> utf-8 too. So then, why use utf-16 as internal unicode representation
>> in Php? It doesn't really make a lot of sense for most regular people
>> who want to use Php for their web application. Unless they don't
>> really care how slow its gonna be converting everything, constantly...
>
> Well, the obvious original reason is that ICU uses UTF-16 internally and
> the logic was that we would be going in and out of ICU to do all the
> various Unicode operations many more times than we would be interfacing
> with external things like MySQL or files on disk. You generally only
> read or write a string once from an external source, but you may perform
> multiple Unicode operations on that same string so avoiding a conversion
> for each operation seems logical.
>
> -Rasmus

It's only logical if you've bothered to profile the conversion calls to ICU against the non-ICU conversion calls. I'm guessing the way to do that is to have two versions of each conversion method: one used by ICU, and another used everywhere else. The harder part is to find some suitable, real-life PHP programs to test with.
Re: [PHP-DEV] Where are we ACTUALLY on Unicode?
Hi!

> On disk storage should probably be UTF-8 without any question? Windows
> use of widestrings for some files simply doubles up the on disk storage

As file content, it's OK (and it'd be easy to add an option to specify content transformation if we wanted), but prescribing filenames as UTF-8 would probably not be workable, since different systems (and maybe even different filesystems inside the same OS?) can have different opinions on that.

> '3' is not a very processor friendly number, so working with 4 even though
> wasteful on memory, does make perfect sense. How long is it since

I'm not sure it does. Most PHP strings are short, so the memory loss would be very significant. Also, take into account that CPU caches aren't as big as main memory, and not fitting your data into the cache is expensive.

> we had a 640k limit on working memory? SERVERS should have a good amount

It doesn't matter how much memory you have, in numbers. Until we find an unlimited source of computer memory left by aliens in the Himalayas, memory costs money. However many gigs you have, you'll be able to run only a third as many PHP processes with the new version as with the old version on the same hardware, which means the new PHP would cost you more to run. "Memory is cheap" is a very misunderstood expression - it's only cheap if you always have much more than you need.

> Probably 90% of the time a string will come in and go out without requiring
> any processing at all, so leave it as UTF-8 ? The only time we

It might be great if we could do that. The problem might be that right now AFAIK we don't have a good library to work with utf-8 strings (please correct me if I'm wrong here).

--
Stanislav Malyshev, Zend Software Architect
s...@zend.com http://www.zend.com/
(408)253-8829 MSN: s...@zend.com
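To put rough numbers on the memory argument, the sketch below estimates how many bytes the same text would occupy as UTF-8, UTF-16 and UTF-32. It is a back-of-the-envelope illustration only: `encoding_sizes` is a hypothetical helper, it assumes its input is valid UTF-8, and it counts payload bytes without any per-string engine overhead.

```php
<?php
// Estimate the storage (in bytes) of a valid UTF-8 string in three
// encodings, using byte-level inspection only (no mbstring/iconv/intl).
function encoding_sizes($utf8)
{
    // Each code point contributes exactly one byte that is NOT a
    // continuation byte (continuation bytes are 0x80-0xBF).
    $codePoints = preg_match_all('/[^\x80-\xBF]/', $utf8, $unused);

    // Lead bytes 0xF0-0xF7 start 4-byte sequences, i.e. code points above
    // U+FFFF, which need a surrogate pair (two 16-bit units) in UTF-16.
    $astral = preg_match_all('/[\xF0-\xF7]/', $utf8, $unused);

    return array(
        'utf8'  => strlen($utf8),
        'utf16' => 2 * ($codePoints + $astral),
        'utf32' => 4 * $codePoints,
    );
}

$ascii = encoding_sizes('hello');
// array('utf8' => 5, 'utf16' => 10, 'utf32' => 20)
```

For the short, mostly ASCII strings typical of a web application, the sketch shows UTF-16 doubling and UTF-32 quadrupling the payload size.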
Re: [PHP-DEV] Where are we ACTUALLY on Unicode?
Rasmus Lerdorf wrote:
> On 03/16/2010 10:40 AM, dreamcat four wrote:
>> As for text files on disk, if they are unicode, they are most commonly
>> utf-8 too. So then, why use utf-16 as internal unicode representation
>> in Php? It doesn't really make a lot of sense for most regular people
>> who want to use Php for their web application. Unless they don't
>> really care how slow its gonna be converting everything, constantly...
>
> Well, the obvious original reason is that ICU uses UTF-16 internally and
> the logic was that we would be going in and out of ICU to do all the
> various Unicode operations many more times than we would be interfacing
> with external things like MySQL or files on disk. You generally only
> read or write a string once from an external source, but you may perform
> multiple Unicode operations on that same string so avoiding a conversion
> for each operation seems logical.

Which begs the question - is ICU actually the right base?

But I'd still like some feedback on my idea that until an operation needs to be able to handle multi byte character string processing, why not simply stay in UTF-8? No reason why a string variable can't be converted only when needed, and then dropped back to UTF-8 if needed later? And if the user is only using single byte characters then the multi byte stuff never kicks in anyway? If you NEED raw speed use the basic character set.

--
Lester Caine - G8HFL
-
Contact - http://lsces.co.uk/wiki/?page=contact
L.S.Caine Electronic Services - http://lsces.co.uk
EnquirySolve - http://enquirysolve.com/
Model Engineers Digital Workshop - http://medw.co.uk//
Firebird - http://www.firebirdsql.org/index.php
Re: [PHP-DEV] Where are we ACTUALLY on Unicode?
On 03/16/2010 10:40 AM, dreamcat four wrote: > As for text files on disk, if they are unicode, they are most commonly > utf-8 too. So then, why use utf-16 as internal unicode representation > in Php? It doesn't really make a lot of sense for most regular people > who want to use Php for their web application. Unless they don't > really care how slow its gonna be converting everything, constantly... Well, the obvious original reason is that ICU uses UTF-16 internally and the logic was that we would be going in and out of ICU to do all the various Unicode operations many more times than we would be interfacing with external things like MySQL or files on disk. You generally only read or write a string once from an external source, but you may perform multiple Unicode operations on that same string so avoiding a conversion for each operation seems logical. -Rasmus -- PHP Internals - PHP Runtime Development Mailing List To unsubscribe, visit: http://www.php.net/unsub.php
Re: [PHP-DEV] Where are we ACTUALLY on Unicode?
dreamcat four wrote:
> On Tue, Mar 16, 2010 at 11:48 AM, dreamcat four wrote:
>> On Tue, Mar 16, 2010 at 8:30 AM, Lester Caine wrote:
>>> '3' is not a very processor friendly number, so working with 4 even though
>>> wasteful on memory, does make perfect sense. How long is it since we had a
>>> 640k limit on working memory? SERVERS should have a good amount of memory
>>> for caching information anyway. SO is UTF-16 the right approach for
>>> processing wide strings? It needs special code to handle everything wider
>>> than 16 bits, but at what gain really? If all core functionality is handled
>>> as 32 bit characters is there that much of an overhead over the additional
>>> processing to get around strings of dissimilar sizes in UTF-16 ?
>>
>> Just to re-enforce some of Lester's points above here.
>>
>> 4-byte per character is never slower than 2-bytes per character... its
>> faster if anything. Bear in mind that 4-byte has been the defacto size
>> for all modern cpu registers / 32-bit microarchitectures since
>> like... Forever. Give a c compiler 4bytes of data... it'll say: thank
>> you very much, and more of the same please! It keeps em happy ;)
>>
>> Sure UTF-16 can make sense. But only if your external representations
>> are also in UTF-16. So whats the default Unicode settings for MYSQL,
>> POSTGRE, etc? Well, are they always set to UTF-8, or UTF-16?
>
> To answer my own question, I have done some further research. It
> seems that MySQL and PostgreSQL recommend / default to Latin1
> (ISO-8859-1, an 8-bit superset of ASCII) and 'C' (7-bit ASCII)
> respectively. That is to say, neither sets itself to any Unicode
> standard by default.
>
> In the case of PostgreSQL, the ASCII default is often overridden to UTF-8
> by the distro / OS / package managers, via the locale environment
> variables (such as $LANG). So then it's UTF-8.
>
> In the case of MySQL, it may be left as latin1. But most competent web
> developers decide to set it to utf-8. Again, it's not generally believed
> that very many people (by comparison) actively choose utf-16.
>
> The most common encoding issue people run into is that their web
> application has sent their database utf-8 encoded data. But their
> (usually a MySQL) database still has the factory default encoding
> Latin-1. People who discover this almost always solve the problem by
> converting their databases into utf-8.

MySQL doesn't support UTF-16 in any GA release. UCS-2 can be used though.

> As for text files on disk, if they are unicode, they are most commonly
> utf-8 too. So then, why use utf-16 as internal unicode representation
> in Php? It doesn't really make a lot of sense for most regular people
> who want to use Php for their web application. Unless they don't
> really care how slow its gonna be converting everything, constantly...

Andrey
Re: [PHP-DEV] PHP 5.4 branch and trunk
On Tue, Mar 16, 2010 at 16:58, Derick Rethans wrote:
> Before we add features, they need to be discussed whether we want to
> have them.

Does that mean you want to take up a
- strict RFC-and-after-3-months-discussion-before-commit policy (i.e. killing the scratching-an-itch spirit of PHP)
- "I'm going to commit this patch tomorrow" mail to internals@ (i.e. killing the "I need this functionality, maybe others do too" spirit of PHP)
or what exactly do you mean by that?

I would much rather have a development branch where "everything goes" (like it used to) and then leave it up to the release manager to merge the features he wants into "his branch" (DVCS style).

> - Ilia's scalar type hint patch.

And which of Ilia's patches are you referring to? The original one (which is identical to the patch I sent in November 2006) or the "fucking eyh, I need to please everyone so this can be in 5.3 - but still got rejected" patch?

You didn't even list the mbstring patch... that was discussed and as far as I remember everyone thought it was a great idea, just not in a stable branch.

-Hannes
Re: [PHP-DEV] PHP 5.4 branch and trunk
On Tue, Mar 16, 2010 at 17:54, Pierre Joye wrote:
> On Tue, Mar 16, 2010 at 5:43 PM, Sebastian Bergmann wrote:
>> Am 16.03.2010 16:58, schrieb Derick Rethans:
>>> I've just renamed the 5.4 branch to THE_5_4_THAT_ISNT_5_4 and moved
>>> trunk to the branch FIRST_UNICODE_IMPLEMENTATION.
>>
>> Why do we need THE_5_4_THAT_ISNT_5_4
>
> Right, this branch must be deleted, useless. The OB patch can be
> merged again in trunk when trunk has been rebranched.

Why exactly do we need to duplicate the work? IMO that branch should be renamed to trunk/ and those 2 or 3 patches to 5.3 should be merged into it.

-Hannes
Re: [PHP-DEV] array_seek function
On Tue, Mar 16, 2010 at 4:22 PM, Derick Rethans wrote: > I was also thinking, can we just make this work just like fseek (with a > "whence" parameter) as well? (http://uk3.php.net/fseek) Hi, not sure how SEEK_END is supposed to work with arrays but here is SEEK_SET and SEEK_CUR (with positive and negative offset) http://valokuva.org/~mikko/array_seek_whence.patch.txt -- Mikko Koppanen -- PHP Internals - PHP Runtime Development Mailing List To unsubscribe, visit: http://www.php.net/unsub.php
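For readers without access to the linked patch, the fseek()-style behaviour under discussion can be approximated in user space. The sketch below is not the patch itself: `array_seek_whence` is a hypothetical name, and the SEEK_END branch assumes the semantics proposed in a later reply in this thread (offset 0 means the last element, negative offsets count back from it).

```php
<?php
// User-space sketch of an fseek()-like array seek. Moves the internal
// pointer of $array and returns the value at the new position, or null
// (with a warning) if the target position is out of range.
function array_seek_whence(array &$array, $offset, $whence = SEEK_SET)
{
    $count = count($array);

    switch ($whence) {
        case SEEK_SET: // absolute 0-based position
            $target = $offset;
            break;
        case SEEK_CUR: // relative to the current internal pointer
            $key = key($array);
            if ($key === null) {
                return null; // pointer is beyond the end, or array is empty
            }
            // Current position = index of the current key among all keys.
            $target = array_search($key, array_keys($array), true) + $offset;
            break;
        case SEEK_END: // assumption: relative to the LAST element
            $target = $count - 1 + $offset;
            break;
        default:
            return null;
    }

    if ($target < 0 || $target >= $count) {
        trigger_error("Seek position $target is out of range", E_USER_WARNING);
        return null; // the pointer is left untouched
    }

    reset($array);
    for ($i = 0; $i < $target; $i++) {
        next($array);
    }
    return current($array);
}

$letters = array('a', 'b', 'c', 'd');
$first  = array_seek_whence($letters, 1);            // 'b'
$second = array_seek_whence($letters, 2, SEEK_CUR);  // 'd'
$third  = array_seek_whence($letters, -2, SEEK_END); // 'b'
```

Finding the current position through array_keys() is linear in the array size; a native implementation could avoid that, which is part of the case for doing this in C.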
Re: [PHP-DEV] Where are we ACTUALLY on Unicode?
On Tue, Mar 16, 2010 at 11:48 AM, dreamcat four wrote:
> On Tue, Mar 16, 2010 at 8:30 AM, Lester Caine wrote:
>> '3' is not a very processor friendly number, so working with 4 even though
>> wasteful on memory, does make perfect sense. How long is it since we had a
>> 640k limit on working memory? SERVERS should have a good amount of memory
>> for caching information anyway. SO is UTF-16 the right approach for
>> processing wide strings? It needs special code to handle everything wider
>> than 16 bits, but at what gain really? If all core functionality is handled
>> as 32 bit characters is there that much of an overhead over the additional
>> processing to get around strings of dissimilar sizes in UTF-16 ?
>
> Just to re-enforce some of Lester's points above here.
>
> 4-byte per character is never slower than 2-bytes per character... its
> faster if anything. Bear in mind that 4-byte has been the defacto size
> for all modern cpu registers / 32-bit microarchitectures since
> like... Forever. Give a c compiler 4bytes of data... it'll say: thank
> you very much, and more of the same please! It keeps em happy ;)
>
> Sure UTF-16 can make sense. But only if your external representations
> are also in UTF-16. So whats the default Unicode settings for MYSQL,
> POSTGRE, etc? Well, are they always set to UTF-8, or UTF-16?

To answer my own question, I have done some further research. It seems that MySQL and PostgreSQL recommend / default to Latin1 (ISO-8859-1, an 8-bit superset of ASCII) and 'C' (7-bit ASCII) respectively. That is to say, neither sets itself to any Unicode standard by default.

In the case of PostgreSQL, the ASCII default is often overridden to UTF-8 by the distro / OS / package managers, via the locale environment variables (such as $LANG). So then it's UTF-8.

In the case of MySQL, it may be left as latin1. But most competent web developers decide to set it to utf-8. Again, it's not generally believed that very many people (by comparison) actively choose utf-16.

The most common encoding issue people run into is that their web application has sent their database utf-8 encoded data, but their (usually MySQL) database still has the factory default encoding, Latin-1. People who discover this almost always solve the problem by converting their databases to utf-8.

As for text files on disk, if they are Unicode, they are most commonly utf-8 too. So then, why use utf-16 as the internal Unicode representation in PHP? It doesn't really make a lot of sense for most regular people who want to use PHP for their web application. Unless they don't really care how slow it's going to be converting everything, constantly...
Re: [PHP-DEV] PHP 5.4 branch and trunk
On Tue, Mar 16, 2010 at 5:43 PM, Sebastian Bergmann wrote: > Am 16.03.2010 16:58, schrieb Derick Rethans: >> I've just renamed the 5.4 branch to THE_5_4_THAT_ISNT_5_4 and moved >> trunk to the branch FIRST_UNICODE_IMPLEMENTATION. > > Why do we need THE_5_4_THAT_ISNT_5_4 Right, this branch must be deleted, useless. The OB patch can be merged again in trunk when trunk has been rebranched. Cheers, -- Pierre @pierrejoye | http://blog.thepimp.net | http://www.libgd.org -- PHP Internals - PHP Runtime Development Mailing List To unsubscribe, visit: http://www.php.net/unsub.php
Re: [PHP-DEV] PHP 5.4 branch and trunk
On 16.03.2010 16:58, Derick Rethans wrote:
> I've just renamed the 5.4 branch to THE_5_4_THAT_ISNT_5_4 and moved
> trunk to the branch FIRST_UNICODE_IMPLEMENTATION.

Why do we need THE_5_4_THAT_ISNT_5_4 and trunk? trunk should be where the development happens. When the time comes for a release, PHP_X_Y should be branched off of trunk.

--
Sebastian Bergmann
Co-Founder and Principal Consultant
http://sebastian-bergmann.de/ http://thePHP.cc/
Re: [PHP-DEV] array_seek function
On Tue, 16 Mar 2010, Felix De Vliegher wrote: > On 16-mrt-2010, at 17:07, Derick Rethans wrote: > > > On Tue, 16 Mar 2010, Felix De Vliegher wrote: > > > >> Right now, it returns the value of a given position. In that case, > >> array_get_pos might be a better name. Oh, and I attached the patch > >> with .txt extension :) > > > > Does it also seek the array pointer? Because I think array_seek that > > moves the pointer, in combination with current() and key() might make > > slightly more sense? > > Mikko updated the patch a bit to set the array pointer correctly (and > make it perform a bit better when dealing with large arrays), my > version left it one position too far. So yes, that's possible. The > updated version can be found here: > http://valokuva.org/~mikko/array_seek.patch.txt I was also thinking, can we just make this work just like fseek (with a "whence" parameter) as well? (http://uk3.php.net/fseek) with kind regards, Derick -- http://derickrethans.nl | http://xdebug.org Like Xdebug? Consider a donation: http://xdebug.org/donate.php twitter: @derickr and @xdebug -- PHP Internals - PHP Runtime Development Mailing List To unsubscribe, visit: http://www.php.net/unsub.php
Re: [PHP-DEV] array_seek function
On 16-mrt-2010, at 17:07, Derick Rethans wrote: > On Tue, 16 Mar 2010, Felix De Vliegher wrote: > >> Right now, it returns the value of a given position. In that case, >> array_get_pos might be a better name. Oh, and I attached the patch >> with .txt extension :) > > Does it also seek the array pointer? Because I think array_seek that > moves the pointer, in combination with current() and key() might make > slightly more sense? > Mikko updated the patch a bit to set the array pointer correctly (and make it perform a bit better when dealing with large arrays), my version left it one position too far. So yes, that's possible. The updated version can be found here: http://valokuva.org/~mikko/array_seek.patch.txt Cheers, Felix
Re: [PHP-DEV] Re: PHP 5.4 branch and trunk
On Tue, Mar 16, 2010 at 17:10, David Soria Parra wrote:
> On 2010-03-16, Derick Rethans wrote:
>> - Declare 5.2 security fixes only (Something for Ilia to declare).
>> - Declare 5.3 bug fixes only (and ini-mini features if so desired)
>>   (Something for Johannes to declare).
>>
>> Once that's done, I'd like to:
>>
>> - Recreate trunk from the 5.3 branch.
>>
>> - the new output buffering mechanism (I can not really see why we would
>>   not want this)
> is there something about that in the wiki? I think a few lines in the wiki
> about this would be good.

I doubt it. It's a rewrite which had to be done to simplify things, and it fixes several bugs.

-Hannes
Re: [PHP-DEV] PHP 5.4 branch and trunk
On Tue, 16 Mar 2010, Alexey Zakhlestin wrote: > On Tue, Mar 16, 2010 at 6:58 PM, Derick Rethans > wrote: > > > Right now, there are the following features that I can see we should > > think about: > > > > - the new output buffering mechanism (I can not really see why we would > > not want this) > > - Scott's big number improvements. Scott, can you explain (in an RFC) > > what exactly this does and how it works? > > - Ilia's scalar type hint patch. There are RFCs: > > http://wiki.php.net/rfc/typechecking > > - traits, there are also RFCs: > > http://wiki.php.net/rfc/horizontalreuse > > http://wiki.php.net/rfc/nonbreakabletraits > > + merge php-fpm branch? Can't see why not. Is there an RFC for this? regards, Derick -- http://derickrethans.nl | http://xdebug.org Like Xdebug? Consider a donation: http://xdebug.org/donate.php twitter: @derickr and @xdebug -- PHP Internals - PHP Runtime Development Mailing List To unsubscribe, visit: http://www.php.net/unsub.php
Re: [PHP-DEV] Re: array_seek function
On Tue, Mar 16, 2010 at 2:12 PM, Christian Schneider wrote
> I thinks the user space implementation
>
> function array_seek($array, $pos)
> {
>     $a = array_values($array);
>     return $a[$pos];
> }
>
> is simple enough to not add a native function for this.
>
> It might not be the most efficient way to do it but I doubt that it is
> something done frequently enough to justify another native function.

Hi,

slightly modified version of the original patch http://valokuva.org/~mikko/array_seek.patch.txt. The difference to the original is that the iterator position is left where the user seeked to. So something like the following should work:

Not sure how useful it is to have this in core but I do remember looking for a seek function for arrays before.

--
Mikko Koppanen
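The example that followed "should work:" did not survive in the archive. As a stand-in, here is a user-space sketch of the behaviour described, where the internal pointer stays at the position seeked to. It is not the code from the message or the patch; `array_seek_pointer` and the sample data are hypothetical.

```php
<?php
// Moves the internal pointer of $array to the 0-based position $pos and
// returns the value there. The pointer is left at that element, so
// current(), key() and next() continue from it afterwards.
function array_seek_pointer(array &$array, $pos)
{
    if ($pos < 0 || $pos >= count($array)) {
        trigger_error("Seek position $pos is out of range", E_USER_WARNING);
        return null; // the pointer is left untouched
    }

    reset($array);
    for ($i = 0; $i < $pos; $i++) {
        next($array);
    }
    return current($array);
}

$input = array('one' => 'a', 'two' => 'b', 'three' => 'c', 'four' => 'd');
$value = array_seek_pointer($input, 2); // 'c'
$key   = key($input);                   // 'three'
$after = next($input);                  // 'd'
```

The array has to be passed by reference here, otherwise the function would move the pointer of a copy and the caller would see no change.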
Re: [PHP-DEV] PHP 5.4 branch and trunk
On Tue, Mar 16, 2010 at 6:58 PM, Derick Rethans wrote: > Right now, there are the following features that I can see we should > think about: > > - the new output buffering mechanism (I can not really see why we would > not want this) > - Scott's big number improvements. Scott, can you explain (in an RFC) > what exactly this does and how it works? > - Ilia's scalar type hint patch. There are RFCs: > http://wiki.php.net/rfc/typechecking > - traits, there are also RFCs: > http://wiki.php.net/rfc/horizontalreuse > http://wiki.php.net/rfc/nonbreakabletraits + merge php-fpm branch? -- Alexey Zakhlestin http://www.milkfarmsoft.com/ -- PHP Internals - PHP Runtime Development Mailing List To unsubscribe, visit: http://www.php.net/unsub.php
[PHP-DEV] Re: PHP 5.4 branch and trunk
On 2010-03-16, Derick Rethans wrote:
> - Declare 5.2 security fixes only (Something for Ilia to declare).
> - Declare 5.3 bug fixes only (and ini-mini features if so desired)
>   (Something for Johannes to declare).
>
> Once that's done, I'd like to:
>
> - Recreate trunk from the 5.3 branch.
>
> - the new output buffering mechanism (I can not really see why we would
>   not want this)

Is there something about that in the wiki? I think a few lines in the wiki about this would be good.

> - Scott's big number improvements. Scott, can you explain (in an RFC)
>   what exactly this does and how it works?
> - Ilia's scalar type hint patch. There are RFCs:
>   http://wiki.php.net/rfc/typechecking
> - traits, there are also RFCs:
>   http://wiki.php.net/rfc/horizontalreuse
>   http://wiki.php.net/rfc/nonbreakabletraits

Thank you. I agree that we should discuss new additions with proper RFCs before we commit them.

In addition to that, people interested in Unicode should get together and discuss how to re-add Unicode support and in which way to do this.
Re: [PHP-DEV] array_seek function
On Tue, 16 Mar 2010, Felix De Vliegher wrote: > Right now, it returns the value of a given position. In that case, > array_get_pos might be a better name. Oh, and I attached the patch > with .txt extension :) Does it also seek the array pointer? Because I think array_seek that moves the pointer, in combination with current() and key() might make slightly more sense? with kind regards, Derick -- http://derickrethans.nl | http://xdebug.org Like Xdebug? Consider a donation: http://xdebug.org/donate.php twitter: @derickr and @xdebug -- PHP Internals - PHP Runtime Development Mailing List To unsubscribe, visit: http://www.php.net/unsub.php
[PHP-DEV] PHP 5.4 branch and trunk
Hello,

I've just renamed the 5.4 branch to THE_5_4_THAT_ISNT_5_4 and moved trunk to the branch FIRST_UNICODE_IMPLEMENTATION. The next thing to do is to re-create trunk from PHP 5.3; I've held off on that for now, but I'd like to do the following soon:

- Declare 5.2 security fixes only (Something for Ilia to declare).
- Declare 5.3 bug fixes only (and ini-mini features if so desired)
  (Something for Johannes to declare).

Once that's done, I'd like to:

- Recreate trunk from the 5.3 branch.

Before we add features, we need to discuss whether we want to have them. As version name for it I would like to use "trunk-dev" (and not 5.4-dev or 6.0-dev) as we're not quite sure where this is moving.

Right now, there are the following features that I can see we should think about:

- the new output buffering mechanism (I can not really see why we would
  not want this)
- Scott's big number improvements. Scott, can you explain (in an RFC)
  what exactly this does and how it works?
- Ilia's scalar type hint patch. There are RFCs:
  http://wiki.php.net/rfc/typechecking
- traits, there are also RFCs:
  http://wiki.php.net/rfc/horizontalreuse
  http://wiki.php.net/rfc/nonbreakabletraits

with kind regards,
Derick

--
http://derickrethans.nl | http://xdebug.org
Like Xdebug? Consider a donation: http://xdebug.org/donate.php
twitter: @derickr and @xdebug
[PHP-DEV] Re: [PHP-CVS] svn: /php/php-src/
On Tue, Mar 16, 2010 at 16:45, Derick Rethans wrote: > derick Tue, 16 Mar 2010 15:45:24 + > > Revision: http://svn.php.net/viewvc?view=revision&revision=296284 > > Log: > - Moved the Unicode experiment from trunk to its own branch for reference. > > Changed paths: > A + php/php-src/branches/FIRST_UNICODE_IMPLEMENTATION/ > (from php/php-src/trunk/:r296283) > D php/php-src/trunk/ Kudos -Hannes -- PHP Internals - PHP Runtime Development Mailing List To unsubscribe, visit: http://www.php.net/unsub.php
[PHP-DEV] Re: array_seek function
Felix De Vliegher wrote:
> Hi all
>
> I recently needed seek functionality in arrays, and couldn't find it
> in the regular set of array functions, so I wrote a function for it.
> Seek = getting an array value based on the position (or offset, if you
> want to call it like that), and not the key of the item)
>
> Basically you can use it like this:
> $input = array(3, 'bar', 'baz');
> echo array_seek($input, 2); // returns 'baz'
> echo array_seek($input, 0); // returns 3
> echo array_seek($input, 5); // returns NULL, emits an out of range warning

I think the user space implementation

    function array_seek($array, $pos)
    {
        $a = array_values($array);
        return $a[$pos];
    }

is simple enough to not add a native function for this.

It might not be the most efficient way to do it but I doubt that it is something done frequently enough to justify another native function.

My 2 cents,
- Chris
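The user-space version above can be tightened a little. The sketch below adds the out-of-range warning from the original proposal and avoids reindexing the whole array. `array_seek_value` is a hypothetical name chosen for illustration, and whether array_slice() is measurably cheaper than array_values() here is an assumption that would need benchmarking.

```php
<?php
// Returns the value at the 0-based position $pos, regardless of keys.
// Works on a copy, so the caller's internal array pointer is not moved.
function array_seek_value(array $array, $pos)
{
    if ($pos < 0 || $pos >= count($array)) {
        trigger_error("Seek position $pos is out of range", E_USER_WARNING);
        return null;
    }

    // array_slice() returns a one-element array. String keys are preserved
    // by array_slice(), so read the element with current(), not with [0].
    $slice = array_slice($array, $pos, 1);
    return current($slice);
}

$input = array(3, 'bar', 'baz');
$last  = array_seek_value($input, 2); // 'baz'
$head  = array_seek_value($input, 0); // 3
```

Unlike the native patch discussed in this thread, this variant only reads a value and never moves the array pointer.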
Re: [PHP-DEV] PHP 6
On 2010-03-13, Lukas Kahwe Smith wrote: > +1 > > As for the exact features to merge, lets first start with formulating a plan > about what we want to see in the next release. I also think it makes sense to > base the number and scope if the features on a rough idea of when we want to > see this next release. In order to put together that release plan i guess we > should have an RM defined first. I think Andi said the same thing on IRC > yesterday. > > I can certainly see you as RM, but i would like to propose another newer guy > for the job: > David Soria Parra for the record: I'm willing to do the RM. Besides my spare time that I spend on the project I have dedicated working time for this. -- PHP Internals - PHP Runtime Development Mailing List To unsubscribe, visit: http://www.php.net/unsub.php
Re: [PHP-DEV] array_seek function
On 16 March 2010 13:30, Felix De Vliegher wrote:
> Hi all
>
> I recently needed seek functionality in arrays, and couldn't find it in the
> regular set of array functions, so I wrote a function for it. (Seek = getting
> an array value based on the position (or offset, if you want to call it like
> that), and not the key of the item)
>
> Basically you can use it like this:
> $input = array(3, 'bar', 'baz');
> echo array_seek($input, 2); // returns 'baz'
> echo array_seek($input, 0); // returns 3
> echo array_seek($input, 5); // returns NULL, emits an out of range warning
>
> I was wondering if it's useful to add this to the family of array functions.
> I know there is a somewhat similar thing in SPL (ArrayIterator::seek), but
> that doesn't work exactly like what I was aiming for.
>
> Attached is a patch for the function against the 5.3 branch. If approved, I
> could add it to svn + testcases + docs. Feedback please :-)
>
> Kind regards,
> Felix

Maybe not as efficient as it could be but ...

    'Itchy', 'Two' => 'Knee', 'Three' => 'San', 'Four' => 'She');
    echo @reset(array_keys(array_values($input), 'Knee'));

Richard.

--
-
Richard Quadling
"Standing on the shoulders of some very clever giants!"
EE : http://www.experts-exchange.com/M_248814.html
EE4Free : http://www.experts-exchange.com/becomeAnExpert.jsp
Zend Certified Engineer : http://zend.com/zce.php?c=ZEND002498&r=213474731
ZOPA : http://uk.zopa.com/member/RQuadling
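The one-liner above appears to go in the opposite direction from array_seek(): given a value, it yields that value's position. Since the start of the snippet is missing from the archive, the sketch below uses its own hypothetical data and a hypothetical helper name, `array_position_of`, to show the same lookup without needing the `@` operator.

```php
<?php
// Returns the 0-based position of the first element strictly equal to
// $needle, or false if the value does not occur in the array.
function array_position_of(array $array, $needle)
{
    // array_values() discards the keys, so the key that array_search()
    // reports is exactly the element's position.
    return array_search($needle, array_values($array), true);
}

// Hypothetical sample data, not the array from the message above.
$sample = array('a' => 'x', 'b' => 'y', 'c' => 'z');
$position = array_position_of($sample, 'y'); // 1
```

Compare the result with `=== false`, because position 0 is a valid answer that is also falsy.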
Re: [PHP-DEV] array_seek function
Hi Pierre

Right now, it returns the value of a given position. In that case, array_get_pos might be a better name. Oh, and I attached the patch with .txt extension :)

Greetings,
Felix

Index: ext/standard/array.c
===================================================================
--- ext/standard/array.c	(revision 296276)
+++ ext/standard/array.c	(working copy)
@@ -4507,6 +4507,41 @@
 }
 /* }}} */
 
+/* {{{ proto array array_seek(array input, int position)
+   Finds the array value which matches the position of that element */
+PHP_FUNCTION(array_seek)
+{
+	int num_in;
+	int currentpos = 0;
+	long pos;
+	zval *array, **entry;
+	HashPosition hpos;
+
+	if (zend_parse_parameters(ZEND_NUM_ARGS() TSRMLS_CC, "al", &array, &pos) == FAILURE) {
+		return;
+	}
+
+	/* Get number of entries in the array */
+	num_in = zend_hash_num_elements(Z_ARRVAL_P(array));
+
+	/* Check if we have a valid position. */
+	if (pos > num_in - 1 || pos < 0) {
+		php_error_docref(NULL TSRMLS_CC, E_WARNING, "Seek position %ld is out of range", pos);
+		return;
+	}
+
+	/* Loop over the input array untill we are at the right position */
+	zend_hash_internal_pointer_reset_ex(Z_ARRVAL_P(array), &hpos);
+	while (currentpos <= pos && zend_hash_get_current_data_ex(Z_ARRVAL_P(array), (void **)&entry, &hpos) == SUCCESS) {
+		currentpos++;
+		zend_hash_move_forward_ex(Z_ARRVAL_P(array), &hpos);
+	}
+
+	/* Return the matching element */
+	RETURN_ZVAL(*entry, 1, 0);
+}
+/* }}} */
+
 /*
  * Local variables:
  * tab-width: 4
Index: ext/standard/basic_functions.c
===================================================================
--- ext/standard/basic_functions.c	(revision 296276)
+++ ext/standard/basic_functions.c	(working copy)
@@ -609,6 +609,11 @@
 	ZEND_ARG_INFO(0, keys)   /* ARRAY_INFO(0, keys, 0) */
 	ZEND_ARG_INFO(0, values) /* ARRAY_INFO(0, values, 0) */
 ZEND_END_ARG_INFO()
+
+ZEND_BEGIN_ARG_INFO(arginfo_array_seek, 0)
+	ZEND_ARG_INFO(0, input) /* ARRAY_INFO(0, input, 0) */
+	ZEND_ARG_INFO(0, position) /* ARRAY_INFO(0, position, 0) */
+ZEND_END_ARG_INFO()
 /* }}} */
 /* {{{ basic_functions.c */
 ZEND_BEGIN_ARG_INFO(arginfo_get_magic_quotes_gpc, 0)
@@ -3320,6 +3325,7 @@
 	PHP_FE(array_chunk, arginfo_array_chunk)
 	PHP_FE(array_combine, arginfo_array_combine)
 	PHP_FE(array_key_exists, arginfo_array_key_exists)
+	PHP_FE(array_seek, arginfo_array_seek)
 
 	/* aliases from array.c */
 	PHP_FALIAS(pos, current, arginfo_current)
Index: ext/standard/php_array.h
===================================================================
--- ext/standard/php_array.h	(revision 296276)
+++ ext/standard/php_array.h	(working copy)
@@ -101,6 +101,7 @@
 PHP_FUNCTION(array_key_exists);
 PHP_FUNCTION(array_chunk);
 PHP_FUNCTION(array_combine);
+PHP_FUNCTION(array_seek);
 
 PHPAPI HashTable* php_splice(HashTable *, int, int, zval ***, int, HashTable **);
 PHPAPI int php_array_merge(HashTable *dest, HashTable *src, int recursive TSRMLS_DC);

On 16-mrt-2010, at 14:34, Pierre Joye wrote:
> hi Felix,
>
> Not sure about the usefulness of this function but the name is
> misleading (pls reattach the patch as .txt while being at it :). Does
> it set the position (_seek) or does it return the value of a given
> position (_get_pos)? or both (no idea :)?
>
> Cheers,
>
> Cheers,
>
> On Tue, Mar 16, 2010 at 2:30 PM, Felix De Vliegher wrote:
>> Hi all
>>
>> I recently needed seek functionality in arrays, and couldn't find it in the
>> regular set of array functions, so I wrote a function for it. (Seek =
>> getting an array value based on the position (or offset, if you want to call
>> it like that), and not the key of the item)
>>
>> Basically you can use it like this:
>> $input = array(3, 'bar', 'baz');
>> echo array_seek($input, 2); // returns 'baz'
>> echo array_seek($input, 0); // returns 3
>> echo array_seek($input, 5); // returns NULL, emits an out of range warning
>>
>> I was wondering if it's useful to add this to the family of array functions.
>> I know there is a somewhat similar thing in SPL (ArrayIterator::seek), but
>> that doesn't work exactly like what I was aiming for.
>>
>> Attached is a patch for the function against the 5.3 branch. If approved, I
>> could add it to svn + testcases + docs. Feedback please :-)
>>
>> Kind regards,
>> Felix
>
> --
> Pierr
Re: [PHP-DEV] array_seek function
hi Felix,

Not sure about the usefulness of this function but the name is misleading (pls reattach the patch as .txt while being at it :). Does it set the position (_seek) or does it return the value of a given position (_get_pos)? or both (no idea :)?

Cheers,

On Tue, Mar 16, 2010 at 2:30 PM, Felix De Vliegher wrote:
> Hi all
>
> I recently needed seek functionality in arrays, and couldn't find it in the
> regular set of array functions, so I wrote a function for it. (Seek = getting
> an array value based on the position (or offset, if you want to call it like
> that), and not the key of the item)
>
> Basically you can use it like this:
> $input = array(3, 'bar', 'baz');
> echo array_seek($input, 2); // returns 'baz'
> echo array_seek($input, 0); // returns 3
> echo array_seek($input, 5); // returns NULL, emits an out of range warning
>
> I was wondering if it's useful to add this to the family of array functions.
> I know there is a somewhat similar thing in SPL (ArrayIterator::seek), but
> that doesn't work exactly like what I was aiming for.
>
> Attached is a patch for the function against the 5.3 branch. If approved, I
> could add it to svn + testcases + docs. Feedback please :-)
>
> Kind regards,
> Felix

--
Pierre
@pierrejoye | http://blog.thepimp.net | http://www.libgd.org
[PHP-DEV] array_seek function
Hi all

I recently needed seek functionality in arrays, and couldn't find it in the regular set of array functions, so I wrote a function for it. (Seek = getting an array value based on the position (or offset, if you want to call it like that), and not the key of the item)

Basically you can use it like this:

$input = array(3, 'bar', 'baz');
echo array_seek($input, 2); // returns 'baz'
echo array_seek($input, 0); // returns 3
echo array_seek($input, 5); // returns NULL, emits an out of range warning

I was wondering if it's useful to add this to the family of array functions. I know there is a somewhat similar thing in SPL (ArrayIterator::seek), but that doesn't work exactly like what I was aiming for.

Attached is a patch for the function against the 5.3 branch. If approved, I could add it to svn + testcases + docs. Feedback please :-)

Kind regards,
Felix
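The semantics described above (value by zero-based position rather than by key, NULL for an out-of-range position) can be sketched outside the engine. What follows is only a minimal C sketch under stated assumptions: `struct node` is a hypothetical linked list standing in for PHP's ordered array, and `list_seek` is an illustrative name, not part of the attached patch or of the Zend API. It also does not emit the warning the proposal mentions.

```c
#include <stddef.h>

/* Hypothetical stand-in for an ordered PHP array: values kept in
 * insertion order in a singly linked list. This is NOT the Zend
 * HashTable, just enough structure to show the position walk. */
struct node {
    const char *value;
    struct node *next;
};

/* Return the value at zero-based position `pos`, or NULL when `pos`
 * is negative or past the last element. The walk is linear in `pos`,
 * because positions are not stored anywhere and must be counted. */
const char *list_seek(const struct node *head, long pos)
{
    long current = 0;
    const struct node *n;

    if (pos < 0) {
        return NULL;
    }
    for (n = head; n != NULL; n = n->next) {
        if (current == pos) {
            return n->value;
        }
        current++;
    }
    return NULL; /* ran off the end: position out of range */
}
```

With a three-element list holding "3", "bar" and "baz", `list_seek(head, 2)` yields "baz" and `list_seek(head, 5)` yields NULL, mirroring the three calls in the example above.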
Re: [PHP-DEV] Where are we ACTUALLY on Unicode?
dreamcat four wrote:
> On Tue, Mar 16, 2010 at 8:30 AM, Lester Caine wrote:
>> '3' is not a very processor friendly number, so working with 4 even though
>> wasteful on memory, does make perfect sense. How long is it since we had a
>> 640k limit on working memory? SERVERS should have a good amount of memory
>> for caching information anyway. SO is UTF-16 the right approach for
>> processing wide strings? It needs special code to handle everything wider
>> than 16 bits, but at what gain really? If all core functionality is handled
>> as 32 bit characters is there that much of an overhead over the additional
>> processing to get around strings of dissimilar sizes in UTF-16 ?
>
> Just to reinforce some of Lester's points above here. 4 bytes per
> character is never slower than 2 bytes per character... it's faster if
> anything. Bear in mind that 4 bytes has been the de facto size for all
> modern CPU registers / 32-bit microarchitectures since like... forever.
> Give a C compiler 4 bytes of data... it'll say: thank you very much, and
> more of the same please! It keeps 'em happy ;)
>
> Sure UTF-16 can make sense. But only if your external representations
> are also in UTF-16. So what are the default Unicode settings for MySQL,
> PostgreSQL, etc? Well, are they always set to UTF-8, or UTF-16? Just do
> the same as them.

No MySQL GA version (this excludes the upcoming 5.5, which is not GA yet) can eat UTF-16 queries, but they can return UTF-16 results. Strictly speaking, the MySQL GA releases that know about character sets (4.1, 5.0, 5.1) don't know anything about UTF-16, only UCS-2, which covers just the characters in the BMP. It is probable (I can't say definitely due to Oracle's recognition rules) that 5.5 will have proper UTF-16.

UTF-16 has its advantages. But if your Unicode data is mostly ASCII characters with some non-ASCII ones here and there, then UTF-8 should be the choice: less disk space is used, which means the HDD can read more data, which in turn means more table rows served per second. Converting in the client (PHP) is OK, as it scales: just throw in some more web servers. Scaling an RDBMS is a completely different story.

Best,
Andrey
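The UCS-2 versus UTF-16 distinction above comes down to surrogate pairs: a code point inside the BMP fits in one 16-bit unit, while anything above U+FFFF needs two, which is the "special code" for characters wider than 16 bits. A minimal C sketch of that encoding step follows; `utf16_encode` is an illustrative name and not a function from PHP, MySQL or ICU.

```c
#include <stddef.h>
#include <stdint.h>

/* Encode one Unicode code point as UTF-16 code units.
 * Returns the number of units written to `out` (1 or 2), or 0 when
 * `cp` is not a valid scalar value (a surrogate, or above U+10FFFF).
 * A UCS-2-only system can represent exactly the 1-unit cases. */
size_t utf16_encode(uint32_t cp, uint16_t out[2])
{
    if (cp >= 0xD800 && cp <= 0xDFFF) {
        return 0; /* reserved for surrogates, not a character */
    }
    if (cp > 0x10FFFF) {
        return 0; /* beyond the Unicode code space */
    }
    if (cp < 0x10000) {
        out[0] = (uint16_t)cp; /* BMP: one unit, same as UCS-2 */
        return 1;
    }
    /* Supplementary planes: split the 20-bit offset into two halves. */
    cp -= 0x10000;
    out[0] = (uint16_t)(0xD800 + (cp >> 10));   /* high surrogate */
    out[1] = (uint16_t)(0xDC00 + (cp & 0x3FF)); /* low surrogate */
    return 2;
}
```

For example, U+00E9 encodes as the single unit 0x00E9, while U+1F600 encodes as the pair 0xD83D 0xDE00, so any code that counts characters by counting 16-bit units is only correct for BMP-only data.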
Re: [PHP-DEV] XML binding & mapping library
On 16.03.2010, at 10:46, John wrote:
> Hello, people. I am looking for community feedback about my
> ideas for XML binding & persistence library:

Are you thinking about implementing it as some kind of extension? Or as PHP code? Or just as a reusable C library with bindings for PHP?
Re: [PHP-DEV] Where are we ACTUALLY on Unicode?
On Tue, Mar 16, 2010 at 8:30 AM, Lester Caine wrote:
> '3' is not a very processor friendly number, so working with 4 even though
> wasteful on memory, does make perfect sense. How long is it since we had a
> 640k limit on working memory? SERVERS should have a good amount of memory
> for caching information anyway. SO is UTF-16 the right approach for
> processing wide strings? It needs special code to handle everything wider
> than 16 bits, but at what gain really? If all core functionality is handled
> as 32 bit characters is there that much of an overhead over the additional
> processing to get around strings of dissimilar sizes in UTF-16 ?

Just to reinforce some of Lester's points above here. 4 bytes per character is never slower than 2 bytes per character... it's faster if anything. Bear in mind that 4 bytes has been the de facto size for all modern CPU registers / 32-bit microarchitectures since like... forever. Give a C compiler 4 bytes of data... it'll say: thank you very much, and more of the same please! It keeps 'em happy ;)

Sure UTF-16 can make sense. But only if your external representations are also in UTF-16. So what are the default Unicode settings for MySQL, PostgreSQL, etc? Well, are they always set to UTF-8, or UTF-16? Just do the same as them.
[PHP-DEV] XML binding & mapping library
Hello, people. I am looking for community feedback on my ideas for an XML binding & persistence library:

XML binding & persistence library

Description: an object-oriented library for mapping XML file structures and binding them to PHP 5 classes. The library would use XML-based entity descriptions, written in XML Schema, to perform data manipulation (a CRUD facility), and would use the described entity relations to persist XML entries. Association and/or aggregation would be the O-O abstractions for XML entity relations. Queries would be performed by translating XML entity properties and characteristics to XPath expressions, either on the fly or through automatic generation; the latter can give a performance boost in production (read: non-development) use. The library should also be able to build XML formatters from XML Schema, which avoids much of the script execution overhead of preparing and putting together XML tree data structures.

Main implementation reasons and usage opportunities:

* Most XML structures represent complex entries, where both nested tags and tag properties represent characteristics and behavior. This usually makes the XML interpretation rules more complex and the code heavier. An additional development penalty is that such rules often have little in common with each other, so the amount of reusable code is very low. Also, data entries related to entities with deeply nested relations would require manipulating a group of files as a single transaction to provide persistence; normally there is no way to reuse XPath queries in an entry-based way.

  1. I am motivated to go even further: iterators would act as a fork, moving back and forth (at the development stage) through a set of tags and/or tag properties. Iterators could become reusable in different projects, for different XML structures, simply by changing the naming convention (read: the way the tags are named, with respect to W3C qualified name features).

  2. This can help a lot in translating XML binding code through XPath expressions (because of the W3C specification rules, under which a query returns all matched entries by default). It is even possible to enforce relations in the generated XPath to speed up querying: a complex expression can state one rule before another (for example, if a specific ancestor in the XML tree occurs less often than some tag property, the ancestor requirement can be stated before the property requirement, so fewer lines are iterated).

* You could use PHP 5 reflection and the O-O features of PHP 5 to describe classes, and then simply generate the required XML Schema. Awesome for using non-DOM (read: SAX) libraries in a production environment, especially with a mix of PHP 5 code and, say, Java code (by the way, the code generator can produce code for other languages if necessary). Good for situations where you develop administration tools with a web interface in PHP 5 that operate on the XML-based configuration files of Java software (for example, the XML configuration file of a Java message broker).

* Of course, the well-known mixing of database mapping (ORM) and XML mapping. It would be awesome to have an import of ORM classes into XML-related ones and vice versa, giving a more flexible RDBMS table structure. Awesome for complex storage manipulations (storing objects and/or their entities in XML and in an RDBMS, without the performance penalty of forcing the RDBMS to format XML for you). And, probably the most awesome part: you can move data from XML storage to the RDBMS and vice versa to balance load, especially if data characteristics vary a lot. Generation of performance tests can be added if necessary.

* XML web site maps. Yes, those are often in XML. There are also CMS systems that manage site structure through XML. And because MVC frameworks are the most widespread solution, the site map itself refers (directly or not) to controller class / action name pairs. As a result, it is hard to ignore automated testing, and that is where PHPUnit can come in. Main idea: use the site map as a prototype for defining a more complex mapping, some kind of XML tree with entities described (entries refer to controller-index pairs, remember), where the entities are described in a test-feature-driven manner. You can drive your test automation with a guarantee of covering the necessary QA features, and even connect QA software to development tools. Note: a DOM library would be used in the background because of the complexity of the testing configuration.

* And, of course, the famous RESTful Web services ;) At least it can help with data deduplication...

Thanks.
John aka webautoma...@gmail.com
Re: [PHP-DEV] Where are we ACTUALLY on Unicode?
Stanislav Malyshev wrote:
> Hi!
>
>> What I am probably asking is what was the brick wall PHP6 hit. I was under
>> the impression that there was no agreement on 'switchable or only' to
>> unicode core? ( And those who did write PHP6 books seemed to have their own
>> views on which way the discussions would go ;) ).
>
> From what I can see, the biggest issues are these:
> 1. Performance - Unicode-based PHP right now requires tons of conversions
> when talking to the outside world (like MySQL), which slows down the app
> significantly. Many extensions frequently used by PHP app writers (such as
> mysql, pcre, etc.) do not support UTF-16 properly. Also, inflated memory
> usage hurts scalability a lot.
> 2. Compatibility - it's hard to make an existing app work with Unicode
> without losing performance or hitting weird scenarios where your passwords
> suddenly stop working because there's an extra recoding step in some md5()
> call.

I think that there does need to be a proper review of just what the target is? There are a number of 'unknowns' such as how does one identify the version of unicode being used. Differences seem to exist between OS's which don't help with that problem?

On disk storage should probably be UTF-8 without any question? Windows' use of widestrings for some files simply doubles up the on disk storage requirements for very little gain? And remembering to convert '.reg' files back to normal raw text so I can read them on the Linux machines adds to the fun.

In memory handling of character strings is I think where some alternative methods may be appropriate. Firebird's original UNICODE_FSS collation was 3 bytes per character ( that IS the limit for Unicode ;) ) and so all of the character counting stuff works transparently. Firebird records are automatically compressed before storage, so white space in character strings is not wasting space on disk, and the unicode collations get compressed in the same way.

'3' is not a very processor friendly number, so working with 4 even though wasteful on memory, does make perfect sense. How long is it since we had a 640k limit on working memory? SERVERS should have a good amount of memory for caching information anyway. SO is UTF-16 the right approach for processing wide strings? It needs special code to handle everything wider than 16 bits, but at what gain really? If all core functionality is handled as 32 bit characters is there that much of an overhead over the additional processing to get around strings of dissimilar sizes in UTF-16 ?

Most of my own data handling is done via the database anyway, so queries return data already sorted and filtered. There is no point pulling unprocessed data and then throwing much of it away, hence the rest of the infrastructure being used is important to get the best performance? Probably 90% of the time a string will come in and go out without requiring any processing at all, so leave it as UTF-8 ? The only time we need to accurately know the number and position of characters is when we need to do some string processing, and then only if the strings use multibyte characters.

SO how about an additional couple of flags on a string variable. When a UTF-8 string is loaded, it is counted for bytes, and characters, and number of bytes per character. If bytes and characters are the same ... no problems. If the number of bytes per character is greater than 1, then string handling needs to 'open them up' before processing, and '2' just uses an efficient UTF-16 processing, while '3+' goes to 32 bit processing?

Am I missing something? Why does unicode have to complicate things when in reality they are quite simple? Legacy stuff gets converted to UTF-8 and in many cases the user will not even see a difference, but the 'unicode on/off' switch just allows 127 single byte characters rather than 255 ? Currently all the multilingual stuff IS passing through PHP transparently and it would seem we can use unicode for variable names? So what IS missing?

--
Lester Caine - G8HFL
-
Contact - http://lsces.co.uk/wiki/?page=contact
L.S.Caine Electronic Services - http://lsces.co.uk
EnquirySolve - http://enquirysolve.com/
Model Engineers Digital Workshop - http://medw.co.uk//
Firebird - http://www.firebirdsql.org/index.php
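The "couple of flags" idea above (count bytes, count characters, and record the widest character when a UTF-8 string is loaded) can be sketched in C roughly as follows. This is a hedged illustration only: `struct utf8_stats` and `utf8_scan` are made-up names, the scan checks lead and continuation bytes but not overlong forms or encoded surrogates, and it assumes a NUL-terminated input. One detail worth keeping in mind: UTF-8 uses up to 4 bytes for code points above the BMP, so the sketch tracks widths from 1 to 4 rather than stopping at 3.

```c
#include <stddef.h>

/* Illustrative per-string bookkeeping, as proposed in the message above. */
struct utf8_stats {
    size_t bytes;       /* total bytes in the string */
    size_t chars;       /* number of code points */
    unsigned max_width; /* widest code point in bytes (0 for "") */
};

/* Scan a NUL-terminated UTF-8 string and fill in `st`.
 * Returns 0 on success, -1 on a malformed lead or continuation byte. */
int utf8_scan(const char *s, struct utf8_stats *st)
{
    const unsigned char *p = (const unsigned char *)s;
    unsigned width, i;

    st->bytes = 0;
    st->chars = 0;
    st->max_width = 0;

    while (*p != 0) {
        if (*p < 0x80) {
            width = 1; /* ASCII */
        } else if ((*p & 0xE0) == 0xC0) {
            width = 2;
        } else if ((*p & 0xF0) == 0xE0) {
            width = 3;
        } else if ((*p & 0xF8) == 0xF0) {
            width = 4; /* beyond the BMP */
        } else {
            return -1; /* stray continuation byte or invalid lead */
        }
        /* Each following byte must look like 10xxxxxx; a NUL here
         * fails the check, so we never read past the terminator. */
        for (i = 1; i < width; i++) {
            if ((p[i] & 0xC0) != 0x80) {
                return -1;
            }
        }
        p += width;
        st->bytes += width;
        st->chars++;
        if (width > st->max_width) {
            st->max_width = width;
        }
    }
    return 0;
}
```

When `bytes == chars` the string is pure ASCII and byte offsets are character offsets, which is the "no problems" case. A caller could use `max_width` to pick a wider working representation only when needed; note that 3-byte UTF-8 sequences still fit in a single UTF-16 unit, so under this scheme only `max_width == 4` strictly requires more than 16 bits per character.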