date:20100316

Re: [PHP-DEV] array_seek function

2010-03-16 Thread Felix De Vliegher

Hi

As SEEK_END only makes sense with zero or negative offsets (for arrays anyway), 
I've come up with an implementation for SEEK_END:
http://phpbenelux.eu/array_seek.patch.txt

So you can do:
$arr = array('a', 'b', 'c', 'd');
echo array_seek($arr, -2, SEEK_END); // outputs 'b'
echo array_seek($arr, 0, SEEK_END); // outputs 'd'
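
For illustration only, the SEEK_SET and SEEK_END behaviour described above can be approximated in userland. This is a rough sketch, not the code from the linked patch: the name array_seek_sketch() and the out-of-range handling are assumptions, and SEEK_CUR is left out because it depends on the array's internal pointer.

```php
<?php
// Hypothetical userland approximation; not the code from the patch.
function array_seek_sketch(array $array, int $offset, int $whence = SEEK_SET)
{
    $count = count($array);

    if ($whence === SEEK_SET) {
        $pos = $offset;               // count from the first element
    } elseif ($whence === SEEK_END) {
        $pos = $count - 1 + $offset;  // offset 0 is the last element
    } else {
        throw new InvalidArgumentException('Only SEEK_SET and SEEK_END are sketched here');
    }

    if ($pos < 0 || $pos >= $count) {
        trigger_error('array_seek_sketch(): position out of range', E_USER_WARNING);
        return null;
    }

    // array_slice() works by position, so string keys and gaps do not matter.
    $slice = array_slice($array, $pos, 1);
    return reset($slice);
}

$arr = array('a', 'b', 'c', 'd');
echo array_seek_sketch($arr, -2, SEEK_END), "\n"; // prints b
echo array_seek_sketch($arr, 0, SEEK_END), "\n";  // prints d
```

A native function could presumably avoid the temporary one-element array that array_slice() builds here.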


Cheers,
Felix

On 16-mrt-2010, at 19:07, Mikko Koppanen wrote:

> On Tue, Mar 16, 2010 at 4:22 PM, Derick Rethans  wrote:
>> I was also thinking, can we just make this work just like fseek (with a
>> "whence" parameter) as well? (http://uk3.php.net/fseek)
> 
> Hi,
> 
> not sure how SEEK_END is supposed to work with arrays but here is
> SEEK_SET and SEEK_CUR (with positive and negative offset)
> http://valokuva.org/~mikko/array_seek_whence.patch.txt
> 
> -- 
> Mikko Koppanen


--
PHP Internals - PHP Runtime Development Mailing List
To unsubscribe, visit: http://www.php.net/unsub.php

Re: [PHP-DEV] Re: array_seek function

2010-03-16 Thread Hannes Magnusson

On Tue, Mar 16, 2010 at 17:12, Mikko Koppanen  wrote:
> On Tue, Mar 16, 2010 at 2:12 PM, Christian Schneider
>  wrote
>> I thinks the user space implementation
>>
>> function array_seek($array, $pos)
>> {
>>        $a = array_values($array);
>>        return $a[$pos];
>> }
>>
>> is simple enough to not add a native function for this.
>>
>> It might not be the most efficient way to do it but I doubt that it is
>> something done frequently enough to justify another native function.

> slightly modified version of the original patch
> http://valokuva.org/~mikko/array_seek.patch.txt. The difference to the

I once proposed a similar patch to in_array, where it didn't reset the
position after finding the "found element".


In applications like PhD, this is extremely useful and saves us at
least 10% overhead (at the time I benchmarked it with my patch to
in_array()).

I think we wound up with something like:
while (list($key, $val) = each($array)) {
  if ($key == "foobar") {
    break;
  }
}
// each() has already advanced the pointer past the match
$current_index = current($array);
to get the _next_ value after a known value (or key).
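
As a side note for anyone reading this later: each() was deprecated in PHP 7.2 and removed in PHP 8.0, so the loop above no longer runs on current versions. A minimal sketch of the same "value after a known key" lookup follows. The helper name value_after_key() is an assumption, and because it copies the keys it does not give the performance benefit discussed in this thread; it only shows the semantics.

```php
<?php
// Hypothetical helper: returns the value stored after $key, or null.
function value_after_key(array $array, $key)
{
    $keys = array_keys($array);

    // Strict search: numeric string keys are stored as integers by PHP,
    // so pass 1 rather than "1" when looking for such a key.
    $pos = array_search($key, $keys, true);

    if ($pos === false || !isset($keys[$pos + 1])) {
        return null; // key not found, or it was the last element
    }

    return $array[$keys[$pos + 1]];
}

$data = array('a' => 1, 'foobar' => 2, 'z' => 3);
var_dump(value_after_key($data, 'foobar')); // int(3)
```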

In an application like PhD (which has already brought the compile time
down from 24 hours (DSSSL 24 hours, xsltproc two formats) to ~3-4 minutes
(3-5 formats)), 10% of _language_ overhead is extremely important, so I
am all for a function that can do this (our/my goal is max 1 minute...
- sorry, HD read/write is still extremely expensive :(, it simply can't
get faster than that afaict - if you have an idea, GSoC is open for
experiments.. :D).

-Hannes


RE: [PHP-DEV] array_seek function

2010-03-16 Thread Jared Williams

 

> -----Original Message-----
> From: Felix De Vliegher [mailto:felix.devlieg...@gmail.com] 
> Sent: 16 March 2010 13:31
> To: PHP internals
> Subject: [PHP-DEV] array_seek function
> 
> Hi all
> 
> I recently needed seek functionality in arrays, and couldn't 
> find it in the regular set of array functions, so I wrote a 
> function for it. (Seek = getting an array value based on the 
> position (or offset, if you want to call it like that), and 
> not the key of the item)
> 
> Basically you can use it like this:
> $input = array(3, 'bar', 'baz');
> echo array_seek($input, 2); // returns 'baz'
> echo array_seek($input, 0); // returns 3
> echo array_seek($input, 5); // returns NULL, emits an out of range warning
> 

Remember doing something like this in the past...

$input = array(3, 'bar', 'baz');
$iterator = new ArrayIterator($input);

$iterator->seek(2);
echo $iterator->current();

$iterator->seek(0);
echo $iterator->current();

$iterator->seek(5); // throws OutOfBoundsException

Though a specific function does make sense, imo.

Jared



Re: [PHP-DEV] PHP 5.4 branch and trunk

2010-03-16 Thread Johannes Schlüter

On Tue, 2010-03-16 at 22:13 +0100, Lukas Kahwe Smith wrote:
> On 16.03.2010, at 16:58, Derick Rethans wrote:
> 
> > Before we add features, they need to be discussed whether we want to 
> > have them. As version name for it I would like to use "trunk-dev" (and 
> > not 5.4-dev or 6.0-dev) as we're not quite sure where this is moving. 
> > Right now, there are the following features that I can see we should 
> > think about:
> 
> 
> Since we do not know the name of the next version yet, maybe its best to 
> base the name on what version it will have as a predecessor and add 
> support for this in version_compare()? Something like "5.3post". Ok this 
> isnt a good suggestion, but I hope you get what I am suggesting.

We need a version number which can be represented as a numeric value
like 

#define PHP_VERSION_ID 50303

to help extension authors; as said on IRC 5.4 is the only sane choice
there. We can still increase the number if needed.
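
To make the numeric form concrete: PHP_VERSION_ID is composed as major * 10000 + minor * 100 + release, so 5.3.3 becomes 50303. A small sketch follows; the helper name version_id() is made up for this example.

```php
<?php
// Hypothetical helper mirroring how PHP_VERSION_ID is composed.
function version_id(int $major, int $minor, int $release): int
{
    return $major * 10000 + $minor * 100 + $release;
}

echo version_id(5, 3, 3), "\n"; // prints 50303

// A numeric ID allows a plain integer comparison against the running version.
if (PHP_VERSION_ID >= version_id(5, 4, 0)) {
    echo "running on 5.4.0 or later\n";
}
```

Extension authors can make the same kind of comparison in C with the preprocessor, which is why a name like "trunk-dev" alone would not be enough.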

How to document this is a good question...

johannes




Re: [PHP-DEV] Where are we ACTUALLY on Unicode?

2010-03-16 Thread Ferenc Kovacs

On Tue, Mar 16, 2010 at 9:43 PM, dreamcat four  wrote:
> And remember,
>
> Its not just the number of times its send to ICU for conversion. Its
> also the number of times your UTF-16 string has to be converted back
> into utf-8 afterwards. This is why Apple makes its utf-16 strings
> immutable. So they are read-only, and the utf-8 representation can be
> cached afterward.
>
> Think of it this way:
>
> 1. Load a utf-8 string from DB or file
> 2. Convert it to utf-16
> 3. Perform ICU conv 3-5 times
> 4. Page gets hit by memcache
> 5. utf-16 is converted back to utf-8
> 6. Something changes
>  ? String was cached ?
> 7. need to spit out another utf-8 version of the string again
>
> And a persistent web application can be held for many hours in memory.
> Are we converting back to utf-8 every time? Then it might be better to
> wrap the string conversions just around ICU.
>
> I'd suggest selecting a real (but still as easy-to-work with as can be
> found) unicode php app. One that has been written to use a unicode php
> module. Then getting a single, representative page from it. By that I
> mean the kind of page that gets accessed the most. So for imdb that
> would be a movie's page, etc. The smalled 'slice' of the app, not the
> whole thing. Dummy-out the other stuff.
>
> Then convert that part (for rendering one page) into the current php6
> unicode scheme. And can see what's what.
>
I would choose the MediaWiki software for this kind of test: it works in a
really internationalized environment, and I have seen the main developer of
the MediaWiki/Wikipedia application posting and contributing on this mailing
list.

But that's just my two cents.

Tyrael
>
>
> On Tue, Mar 16, 2010 at 8:04 PM, Ferenc Kovacs  wrote:
>> On Tue, Mar 16, 2010 at 8:05 PM, Stanislav Malyshev  wrote:
>>> Hi!
>>>
>>>> On disk storage should probably be UTF-8 without any question? Windows
>>>> use of widestrings for some files simple doubles up the on disk storage
>>>
>>> As file content, it's OK (an it'd be easy to add option to specify content
>>> transformation if we wanted), but prescribing filenames as UTF-8 would
>>> probably be not workable, since different systems (and maybe even different
>>> filesystems inside same OS?) can have different opinions on that.
>>>
>>>> '3' is not a very processor friendly number, so working with 4 even
>>>> though wasteful on memory, does make perfect sense. How long is it since
>>>
>>> I'm not sure it does. Most of PHP strings are short, so memory loss would be
>>> very significant. Also, take into account that CPU caches aren't as big as
>>> the main memory, and not fitting your data into the cache is expensive.
>>>
>>>> we had a 640k limit on working memory? SERVERS should have a good amount
>>>
>>> It doesn't matter how much memory you have, in numbers. Until we find an
>>> unlimited source of computer memory left by the aliens in Himalayas, memory
>>> costs money. It doesn't matter how much memory do you have - however many
>>> gigs you have, you'll be able to run 3 times less PHP processes in new
>>> version on the same hardware than in old version, which means new PHP would
>>> cost you more to run. "Memory is cheap" is a very misunderstood expression -
>>> it's only cheap if you always have much more than you need.
>>>
>>>> Probably 90% of the time a string will come in and go out without
>>>> requiring any processing at all, so leave it as UTF-8 ? The only time we
>>>
>>> It might be great if we could do that. The problem might be that right now
>>> AFAIK we don't have a good library to work with utf-8 strings (please
>>> correct me if I'm wrong here).
>> http://source.icu-project.org/repos/icu/icuhtml/trunk/design/strings/icu_utf8.html
>> from ICU 3.6 changelog => The UTF-8 transformation functions and
>> macros are faster.
>> from 4.2 => UTF-8 friendly internal data structure for Unicode data lookup
>> so it's seems that guys at ICU tries to close the gap between the
>> UTF-16 and UTF-8 performance, so maybe it would be a good idea, to
>> check out the current situation.
>>
>> Tyrael
>>> --
>>> Stanislav Malyshev, Zend Software Architect
>>> s...@zend.com   http://www.zend.com/
>>> (408)253-8829   MSN: s...@zend.com
>>>

Re: [PHP-DEV] Req #51295: busyTimeout method for SQLite3

2010-03-16 Thread Joey Smith

On Sun, Mar 14, 2010 at 09:15:37AM -0400, Wez Furlong wrote:
> I'm sure that the docs team will add this to the manual if you ask them 
> politely.
>
> Specifically, PDO_SQLITE defaults to a 60 second busy timeout.  This can 
> be changed by setting PDO::ATTR_TIMEOUT.  The value is specified in 
> seconds.
>
> ISTR that this option can also be specified for some of the other  
> database drivers to affect the network timeout when processing a query.

A nod's as good as a wink. :) This has been committed.
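
For reference, a minimal sketch of the PDO side described in the quoted text above. It assumes the pdo_sqlite extension is loaded; the helper name open_sqlite_with_timeout() is made up for this example.

```php
<?php
// Hypothetical helper; assumes the pdo_sqlite extension is loaded.
function open_sqlite_with_timeout(string $path, int $seconds): PDO
{
    $pdo = new PDO('sqlite:' . $path);
    $pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);

    // Busy timeout in seconds, overriding the driver's default.
    $pdo->setAttribute(PDO::ATTR_TIMEOUT, $seconds);

    return $pdo;
}
```

Calling open_sqlite_with_timeout(':memory:', 5) would open an in-memory database with a 5 second busy timeout.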


Re: [PHP-DEV] PHP 5.4 branch and trunk

2010-03-16 Thread Lukas Kahwe Smith

On 16.03.2010, at 16:58, Derick Rethans wrote:

> I've just renamed the 5.4 branch to THE_5_4_THAT_ISNT_5_4 and moved 

Eventually it should be deleted. If it helps at all in merging the OB change
then it should be kept until that happens; otherwise it can be deleted now
imho. The new 5.3-based trunk will emerge soon, I am sure, but until then let's
not bother with having to merge those changes.

> trunk to the branch FIRST_UNICODE_IMPLEMENTATION.

+1

> The next things to do is to re-create trunk from PHP 5.3; I've hold off 
> that for now, but I'd like to do the following soon:
> 
> - Declare 5.2 security fixes only (Something for Ilia to declare).
> - Declare 5.3 bug fixes only (and ini-mini features if so desired) 
>  (Something for Johannes to declare).

+1

> - the new output buffering mechanism (I can not really see why we would 
>  not want this)

+1

> - traits, there are also RFCs:
>  http://wiki.php.net/rfc/horizontalreuse
>  http://wiki.php.net/rfc/nonbreakabletraits

+1

other stuff:
http://wiki.php.net/todo/php60
http://wiki.php.net/todo/backlog

That being said, I think until we know if the next version will be a new major
version, we should hold off on BC-breaking cleanup stuff like dropping
register globals and friends. But we still might bundle APC with the next
release for example, even if it's not 6.0.

--

As for unicode, I would like the next release to be planned independently of 
finding a solution for unicode, but with the clear option that it will be 
included if we find a good solution in time (like I said I think it would be 
good to shoot for a final release summer 2011, so beta phase in early 2011). I 
propose that sort of a unicode working group forms but much less formal than 
what I make it sound like. I think the discussions can remain on internals@ and 
hopefully alternative approaches will be documented as RFCs. But what I mean 
with working group is a list of a handful of names who feel responsible to keep 
this topic moving until a solution is found and who people know they can 
contact if they want to chat or whatever.

Again, if these guys find a workable solution that can be implemented this year,
I am all for putting it into the next release. If not, so be it, because I
think the lesson learned in all of the PHP6/PHP5.3 release nightmare is that we
should have regular releases. So I say we shoot for the release following the
next one to come out in the summer of 2012.

regards,
Lukas Kahwe Smith
m...@pooteeweet.org


Re: [PHP-DEV] PHP 5.4 branch and trunk

2010-03-16 Thread Lukas Kahwe Smith

On 16.03.2010, at 16:58, Derick Rethans wrote:

> Before we add features, they need to be discussed whether we want to 
> have them. As version name for it I would like to use "trunk-dev" (and 
> not 5.4-dev or 6.0-dev) as we're not quite sure where this is moving. 
> Right now, there are the following features that I can see we should 
> think about:

Since we do not know the name of the next version yet, maybe it's best to base
the name on what version it will have as a predecessor and add support for this
in version_compare()? Something like "5.3post". OK, this isn't a good suggestion,
but I hope you get what I am suggesting.

regards,
Lukas Kahwe Smith
m...@pooteeweet.org


Re: [PHP-DEV] PHP 5.4 branch and trunk

2010-03-16 Thread Lukas Kahwe Smith

On 16.03.2010, at 19:23, Hannes Magnusson wrote:

> On Tue, Mar 16, 2010 at 16:58, Derick Rethans  wrote:
>> Before we add features, they need to be discussed whether we want to
>> have them.
> 
> Does that mean you want to take up a
> - strict RFC-and-after-3months-discussion-before-commit policy
>  (i.e. killing the scratching-an-itch spirit of PHP)
> - "I'm going to commit this patch tomorrow" mail to internals@
>  (i.e. killing "I need this functionality, maybe others do to" spirit of PHP)

It's all a question of the scope of the change, obviously. There is some
tipping point where it makes sense to write an RFC. Remember, an RFC not only serves
decision making, but also provides some level of documentation (on which the
final documentation can be built) for future generations (this is why I, for
example, wrote the ifsetor RFC after we decided that we cannot currently
implement it).

So like Stas said .. common sense still rules.

> I would much rather have a development branch which ""everything
> goes"" (like it used to) and then make it up to the release manager to
> merge the features he wants in "his branch" (DVCS style)

I don't think we ever had an "everything goes" HEAD. Let's say in the past we
had a small, very active core dev team with really short turnaround times for
decisions, because everybody was answering on IRC or mailing lists within
minutes. As a result, decisions (not always for the better) were made in a much
shorter timeframe than the current availability of core developers affords us.

>> - Ilia's scalar type hint patch.
> 
> And which of Ilias patches are you referring to? The original one
> (which is identical to the patch I sent in November 2006) or the
> "fucking eyh, I need to please everyone so this can be in 5.3 - but
> still got rejected" patch?

I think he clearly pointed to the wiki page which lists 3 proposals. He is just 
suggesting we should finalize which one we want and get it in.

> You didn't even list the mbstring patch.. that was discussed and as
> far as I remember everyone thought it was great idea, just not in a
> stable branch.

Is this tone really necessary? On the one hand you are arguing for more
flexibility, and then you are shooting the messenger because in a long list he
forgot one thing (there are probably a few others .. we might want to go
through the todo wiki pages for more)?

regards,
Lukas Kahwe Smith
m...@pooteeweet.org


Re: [PHP-DEV] Where are we ACTUALLY on Unicode?

2010-03-16 Thread dreamcat four

And remember,

It's not just the number of times it's sent to ICU for conversion. It's
also the number of times your UTF-16 string has to be converted back
into utf-8 afterwards. This is why Apple makes its utf-16 strings
immutable. So they are read-only, and the utf-8 representation can be
cached afterward.

Think of it this way:

1. Load a utf-8 string from DB or file
2. Convert it to utf-16
3. Perform ICU conv 3-5 times
4. Page gets hit by memcache
5. utf-16 is converted back to utf-8
6. Something changes
 ? String was cached ?
7. need to spit out another utf-8 version of the string again

And a persistent web application can be held for many hours in memory.
Are we converting back to utf-8 every time? Then it might be better to
wrap the string conversions just around ICU.

I'd suggest selecting a real (but still as easy to work with as can be
found) unicode php app. One that has been written to use a unicode php
module. Then getting a single, representative page from it. By that I
mean the kind of page that gets accessed the most. So for imdb that
would be a movie's page, etc. The smallest 'slice' of the app, not the
whole thing. Dummy-out the other stuff.

Then convert that part (for rendering one page) into the current php6
unicode scheme. Then we can see what's what.

On Tue, Mar 16, 2010 at 8:04 PM, Ferenc Kovacs  wrote:
> On Tue, Mar 16, 2010 at 8:05 PM, Stanislav Malyshev  wrote:
>> Hi!
>>
>>> On disk storage should probably be UTF-8 without any question? Windows
>>> use of widestrings for some files simple doubles up the on disk storage
>>
>> As file content, it's OK (an it'd be easy to add option to specify content
>> transformation if we wanted), but prescribing filenames as UTF-8 would
>> probably be not workable, since different systems (and maybe even different
>> filesystems inside same OS?) can have different opinions on that.
>>
>>> '3' is not a very processor friendly number, so working with 4 even
>>> though wasteful on memory, does make perfect sense. How long is it since
>>
>> I'm not sure it does. Most of PHP strings are short, so memory loss would be
>> very significant. Also, take into account that CPU caches aren't as big as
>> the main memory, and not fitting your data into the cache is expensive.
>>
>>> we had a 640k limit on working memory? SERVERS should have a good amount
>>
>> It doesn't matter how much memory you have, in numbers. Until we find an
>> unlimited source of computer memory left by the aliens in Himalayas, memory
>> costs money. It doesn't matter how much memory do you have - however many
>> gigs you have, you'll be able to run 3 times less PHP processes in new
>> version on the same hardware than in old version, which means new PHP would
>> cost you more to run. "Memory is cheap" is a very misunderstood expression -
>> it's only cheap if you always have much more than you need.
>>
>>> Probably 90% of the time a string will come in and go out without
>>> requiring any processing at all, so leave it as UTF-8 ? The only time we
>>
>> It might be great if we could do that. The problem might be that right now
>> AFAIK we don't have a good library to work with utf-8 strings (please
>> correct me if I'm wrong here).
> http://source.icu-project.org/repos/icu/icuhtml/trunk/design/strings/icu_utf8.html
> from ICU 3.6 changelog => The UTF-8 transformation functions and
> macros are faster.
> from 4.2 => UTF-8 friendly internal data structure for Unicode data lookup
> so it's seems that guys at ICU tries to close the gap between the
> UTF-16 and UTF-8 performance, so maybe it would be a good idea, to
> check out the current situation.
>
> Tyrael
>> --
>> Stanislav Malyshev, Zend Software Architect
>> s...@zend.com   http://www.zend.com/
>> (408)253-8829   MSN: s...@zend.com
>>

Re: [PHP-DEV] Where are we ACTUALLY on Unicode?

2010-03-16 Thread William A. Rowe Jr.

On 3/16/2010 6:48 AM, dreamcat four wrote:
> 
> Sure UTF-16 can make sense. But only if your external representations
> are also in UTF-16. So whats the default Unicode settings for MYSQL,
> POSTGRE, etc? Well, are they always set to UTF-8, or UTF-16?

This is a very good point.  The PHP project consumes some 30-odd libraries
of extensions.  How many do utf-8?  How many do ucs2?  Utf-16?


Re: [PHP-DEV] Where are we ACTUALLY on Unicode?

2010-03-16 Thread Lester Caine


Rasmus Lerdorf wrote:
> On 03/16/2010 12:05 PM, dreamcat four wrote:
>> On Tue, Mar 16, 2010 at 6:32 PM, Rasmus Lerdorf  wrote:
>>> On 03/16/2010 10:40 AM, dreamcat four wrote:
>>>> As for text files on disk, if they are unicode, they are most commonly
>>>> utf-8 too. So then, why use utf-16 as internal unicode representation
>>>> in Php? It doesn't really make a lot of sense for most regular people
>>>> who want to use Php for their web application. Unless they don't
>>>> really care how slow its gonna be converting everything, constantly...
>>>
>>> Well, the obvious original reason is that ICU uses UTF-16 internally and
>>> the logic was that we would be going in and out of ICU to do all the
>>> various Unicode operations many more times than we would be interfacing
>>> with external things like MySQL or files on disk.  You generally only
>>> read or write a string once from an external source, but you may perform
>>> multiple Unicode operations on that same string so avoiding a conversion
>>> for each operation seems logical.
>>>
>>> -Rasmus
>>
>> Its only logical if you've bothered to profile the conversion calls to
>> ICU against the non-ICU conversion calls. Im guessing the way to do
>> that, is to have 2 versions of each conversion method. One used by
>> ICU, and another used everywhere else. The harder part is to find some
>> suitable, real life php programs to test with.
>
> You mean check to see how many actual Unicode operations a standard app
> makes?  We did talk about that, but there is a bit of a chicken-and-egg
> problem here.  Because PHP doesn't natively support Unicode, people
> write apps in a way that lets them just pass Unicode through PHP and
> deal with it elsewhere.  I would expect the profile to change once PHP
> gets better support for Unicode.
>
> But yes, some ideas around lazy conversions and other tricks would be
> interesting.  If your input and output encoding are both utf-8 and all
> your data sources are utf-8 and you never do any sort of string
> manipulation on a particular string, why bother doing the utf-8 to
> utf-16 conversion on that string.


I think that is what I said originally ;)
When a string is read in you set an extra flag if it needs special handling, 
otherwise you just handle it as a single byte per character string ... and for 
the diehards you add a switch to treat everything as it is now :)


--
Lester Caine - G8HFL
-
Contact - http://lsces.co.uk/wiki/?page=contact
L.S.Caine Electronic Services - http://lsces.co.uk
EnquirySolve - http://enquirysolve.com/
Model Engineers Digital Workshop - http://medw.co.uk//
Firebird - http://www.firebirdsql.org/index.php


Re: [PHP-DEV] array_seek function

2010-03-16 Thread Brian Moon


>> Right now, it returns the value of a given position.
>
> How is it different from:
>
> array_slice() returns the sequence of elements from the array array as
> specified by the offset and length parameters?


array_slice returns an array of elements. This function would return the 
value at the given position.


Brian.


Re: [PHP-DEV] array_seek function

2010-03-16 Thread Stanislav Malyshev


Hi!


> Right now, it returns the value of a given position.

How is it different from:

array_slice() returns the sequence of elements from the array array as
specified by the offset and length parameters?

--
Stanislav Malyshev, Zend Software Architect
s...@zend.com   http://www.zend.com/
(408)253-8829   MSN: s...@zend.com


Re: [PHP-DEV] PHP 5.4 branch and trunk

2010-03-16 Thread Stanislav Malyshev


Hi!


> Does that mean you want to take up a
> - strict RFC-and-after-3months-discussion-before-commit policy
>  (i.e. killing the scratching-an-itch spirit of PHP)
> - "I'm going to commit this patch tomorrow" mail to internals@
>  (i.e. killing "I need this functionality, maybe others do to" spirit of PHP)


Probably something like "I have this patch and I wrote this RFC, please 
discuss", then wait reasonable* time for discussion and reasonable* 
consensus before commit, and for reasonably* small patches "I'm going to 
commit it in 2 days unless somebody objects" would work


(*) I know definitions of "reasonable" differ but I have faith we find a 
common ground.



> And which of Ilias patches are you referring to? The original one
> (which is identical to the patch I sent in November 2006) or the
> "fucking eyh, I need to please everyone so this can be in 5.3 - but
> still got rejected" patch?


That's exactly why having an RFC is good - one link solves all the
questions about "which one is it" :)


--
Stanislav Malyshev, Zend Software Architect
s...@zend.com   http://www.zend.com/
(408)253-8829   MSN: s...@zend.com


Re: [PHP-DEV] PHP 5.4 branch and trunk

2010-03-16 Thread Antony Dovgal

On 03/16/2010 11:00 PM, Johannes Schlüter wrote:
> On Tue, 2010-03-16 at 19:11 +0300, Alexey Zakhlestin wrote:
>> + merge php-fpm branch?
> 
> If we get a trunk which will be released in a foreseeable timeframe we
> don't need to merge this to 5.3 anymore, which had been an old plan.
> Tony, do you agree?

Makes sense to me.

-- 
Wbr,
Antony Dovgal
---
http://pinba.org - realtime statistics for PHP


Re: [PHP-DEV] Where are we ACTUALLY on Unicode?

2010-03-16 Thread Ferenc Kovacs

On Tue, Mar 16, 2010 at 8:05 PM, Stanislav Malyshev  wrote:
> Hi!
>
>> On disk storage should probably be UTF-8 without any question? Windows
>> use of widestrings for some files simple doubles up the on disk storage
>
> As file content, it's OK (an it'd be easy to add option to specify content
> transformation if we wanted), but prescribing filenames as UTF-8 would
> probably be not workable, since different systems (and maybe even different
> filesystems inside same OS?) can have different opinions on that.
>
>> '3' is not a very processor friendly number, so working with 4 even
>> though wasteful on memory, does make perfect sense. How long is it since
>
> I'm not sure it does. Most of PHP strings are short, so memory loss would be
> very significant. Also, take into account that CPU caches aren't as big as
> the main memory, and not fitting your data into the cache is expensive.
>
>> we had a 640k limit on working memory? SERVERS should have a good amount
>
> It doesn't matter how much memory you have, in numbers. Until we find an
> unlimited source of computer memory left by the aliens in Himalayas, memory
> costs money. It doesn't matter how much memory do you have - however many
> gigs you have, you'll be able to run 3 times less PHP processes in new
> version on the same hardware than in old version, which means new PHP would
> cost you more to run. "Memory is cheap" is a very misunderstood expression -
> it's only cheap if you always have much more than you need.
>
>> Probably 90% of the time a string will come in and go out without
>> requiring any processing at all, so leave it as UTF-8 ? The only time we
>
> It might be great if we could do that. The problem might be that right now
> AFAIK we don't have a good library to work with utf-8 strings (please
> correct me if I'm wrong here).
http://source.icu-project.org/repos/icu/icuhtml/trunk/design/strings/icu_utf8.html
from ICU 3.6 changelog => The UTF-8 transformation functions and
macros are faster.
from 4.2 => UTF-8 friendly internal data structure for Unicode data lookup
so it seems that the guys at ICU are trying to close the gap between
UTF-16 and UTF-8 performance, so maybe it would be a good idea to
check out the current situation.

Tyrael
> --
> Stanislav Malyshev, Zend Software Architect
> s...@zend.com   http://www.zend.com/
> (408)253-8829   MSN: s...@zend.com
>

Re: [PHP-DEV] PHP 5.4 branch and trunk

2010-03-16 Thread Johannes Schlüter

On Tue, 2010-03-16 at 19:11 +0300, Alexey Zakhlestin wrote:
> + merge php-fpm branch?

If we get a trunk which will be released in a foreseeable timeframe we
don't need to merge this to 5.3 anymore, which had been an old plan.
Tony, do you agree?

johannes



Re: [PHP-DEV] Where are we ACTUALLY on Unicode?

2010-03-16 Thread Rasmus Lerdorf

On 03/16/2010 12:05 PM, dreamcat four wrote:
> On Tue, Mar 16, 2010 at 6:32 PM, Rasmus Lerdorf  wrote:
>> On 03/16/2010 10:40 AM, dreamcat four wrote:
>>> As for text files on disk, if they are unicode, they are most commonly
>>> utf-8 too. So then, why use utf-16 as internal unicode representation
>>> in Php? It doesn't really make a lot of sense for most regular people
>>> who want to use Php for their web application. Unless they don't
>>> really care how slow its gonna be converting everything, constantly...
>>
>> Well, the obvious original reason is that ICU uses UTF-16 internally and
>> the logic was that we would be going in and out of ICU to do all the
>> various Unicode operations many more times than we would be interfacing
>> with external things like MySQL or files on disk.  You generally only
>> read or write a string once from an external source, but you may perform
>> multiple Unicode operations on that same string so avoiding a conversion
>> for each operation seems logical.
>>
>> -Rasmus
>
> Its only logical if you've bothered to profile the conversion calls to
> ICU against the non-ICU conversion calls. Im guessing the way to do
> that, is to have 2 versions of each conversion method. One used by
> ICU, and another used everywhere else. The harder part is to find some
> suitable, real life php programs to test with.

You mean check to see how many actual Unicode operations a standard app
makes?  We did talk about that, but there is a bit of a chicken-and-egg
problem here.  Because PHP doesn't natively support Unicode, people
write apps in a way that lets them just pass Unicode through PHP and
deal with it elsewhere.  I would expect the profile to change once PHP
gets better support for Unicode.

But yes, some ideas around lazy conversions and other tricks would be
interesting.  If your input and output encoding are both utf-8 and all
your data sources are utf-8 and you never do any sort of string
manipulation on a particular string, why bother doing the utf-8 to
utf-16 conversion on that string?
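
The lazy-conversion idea can be sketched in user-space PHP. This is only an
illustration under assumptions: the class name LazyUnicodeString and its
methods are hypothetical, the mbstring extension is assumed to be available,
and a real engine implementation would do this in C against ICU rather than
through mb_convert_encoding().

```php
<?php
// Hypothetical sketch: keep the UTF-8 bytes as received and build the
// UTF-16 form only when a Unicode-aware operation first asks for it.
final class LazyUnicodeString
{
    private string $utf8;
    private ?string $utf16 = null;   // filled in on first use
    public int $conversions = 0;     // counts real conversions, for illustration

    public function __construct(string $utf8)
    {
        $this->utf8 = $utf8;
    }

    // Pass-through path: no conversion at all.
    public function bytes(): string
    {
        return $this->utf8;
    }

    // Conversion path: convert once, then reuse the cached copy.
    private function utf16(): string
    {
        if ($this->utf16 === null) {
            $this->utf16 = mb_convert_encoding($this->utf8, 'UTF-16LE', 'UTF-8');
            $this->conversions++;
        }
        return $this->utf16;
    }

    // Example of a Unicode operation: length in UTF-16 code units.
    public function codeUnits(): int
    {
        return intdiv(strlen($this->utf16()), 2);
    }
}

$s = new LazyUnicodeString("na\u{00EF}ve");
echo $s->bytes(), "\n";      // output only: no conversion has happened yet
echo $s->codeUnits(), "\n";  // first Unicode operation triggers the conversion
echo $s->codeUnits(), "\n";  // served from the cached UTF-16 copy
```

A string that is only read and echoed never pays for the conversion; a string
that is manipulated pays for it once.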

-Rasmus


Re: [PHP-DEV] Where are we ACTUALLY on Unicode?

2010-03-16 Thread Pierre Joye

On Tue, Mar 16, 2010 at 7:32 PM, Rasmus Lerdorf  wrote:

> Well, the obvious original reason is that ICU uses UTF-16 internally and
> the logic was that we would be going in and out of ICU to do all the
> various Unicode operations many more times than we would be interfacing
> with external things like MySQL or files on disk.  You generally only
> read or write a string once from an external source, but you may perform
> multiple Unicode operations on that same string so avoiding a conversion
> for each operation seems logical.

Exactly, that's why I was not so affirmative about using UTF-8 over
UTF-16. I would like to evaluate both solutions with a small set of
PHP features (say some file ops, 1-2 DBs and part of the core string
functions) and see the impact of using UTF-8 or UTF-16. But it is
definitely not a small decision.

-- 
Pierre

@pierrejoye | http://blog.thepimp.net | http://www.libgd.org


Re: [PHP-DEV] Where are we ACTUALLY on Unicode?

2010-03-16 Thread dreamcat four

On Tue, Mar 16, 2010 at 6:32 PM, Rasmus Lerdorf  wrote:
> On 03/16/2010 10:40 AM, dreamcat four wrote:
>> As for text files on disk, if they are unicode, they are most commonly
>> utf-8 too. So then, why use utf-16 as internal unicode representation
>> in Php? It doesn't really make a lot of sense for most regular people
>> who want to use Php for their web application. Unless they don't
>> really care how slow its gonna be converting everything, constantly...
>
> Well, the obvious original reason is that ICU uses UTF-16 internally and
> the logic was that we would be going in and out of ICU to do all the
> various Unicode operations many more times than we would be interfacing
> with external things like MySQL or files on disk.  You generally only
> read or write a string once from an external source, but you may perform
> multiple Unicode operations on that same string so avoiding a conversion
> for each operation seems logical.
>
> -Rasmus
>
>
>

It's only logical if you've bothered to profile the conversion calls to
ICU against the non-ICU conversion calls. I'm guessing the way to do
that is to have 2 versions of each conversion method: one used by
ICU, and another used everywhere else. The harder part is to find some
suitable, real-life PHP programs to test with.
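
As a very rough starting point, the cost of a single conversion round trip can
be timed from user space. This is a sketch only: the function name
time_round_trips() is made up, it assumes the mbstring extension and PHP 7.3
or later for hrtime(), and it measures mb_convert_encoding() rather than the
engine's own ICU converters, so the numbers say nothing definite about the
engine itself.

```php
<?php
// Rough sketch: time N round trips UTF-8 -> UTF-16 -> UTF-8 for one string.
function time_round_trips(string $utf8, int $iterations): float
{
    $start = hrtime(true);                       // monotonic clock, nanoseconds
    for ($i = 0; $i < $iterations; $i++) {
        $utf16 = mb_convert_encoding($utf8, 'UTF-16LE', 'UTF-8');
        $back  = mb_convert_encoding($utf16, 'UTF-8', 'UTF-16LE');
    }
    return (hrtime(true) - $start) / 1e6;        // elapsed milliseconds
}

// Made-up sample: about 1 KB of mostly-ASCII text.
$sample = str_repeat("Hello w\u{00F6}rld. ", 64);
printf("%d round trips: %.2f ms\n", 10000, time_round_trips($sample, 10000));
```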


Re: [PHP-DEV] Where are we ACTUALLY on Unicode?

2010-03-16 Thread Stanislav Malyshev


Hi!


> On disk storage should probably be UTF-8 without any question? Windows
> use of widestrings for some files simple doubles up the on disk storage

As file content, it's OK (and it'd be easy to add an option to specify
content transformation if we wanted), but prescribing filenames as UTF-8
would probably not be workable, since different systems (and maybe even
different filesystems inside the same OS?) can have different opinions on that.



> '3' is not a very processor friendly number, so working with 4 even
> though wasteful on memory, does make perfect sense. How long is it since

I'm not sure it does. Most PHP strings are short, so the memory loss
would be very significant. Also, take into account that CPU caches
aren't as big as the main memory, and not fitting your data into the
cache is expensive.
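
To put rough numbers on this, the snippet below compares how many bytes the
same short strings take in UTF-8, UTF-16 and UTF-32. It is only an
illustration: it assumes the mbstring extension, the helper name
encoded_sizes() and the sample strings are made up, and it counts raw payload
bytes only, ignoring any per-string overhead in the engine.

```php
<?php
// Illustration only: payload size of the same text in three encodings.
function encoded_sizes(string $utf8): array
{
    return [
        'UTF-8'  => strlen($utf8),
        'UTF-16' => strlen(mb_convert_encoding($utf8, 'UTF-16LE', 'UTF-8')),
        'UTF-32' => strlen(mb_convert_encoding($utf8, 'UTF-32LE', 'UTF-8')),
    ];
}

// An ASCII identifier, a Latin word with one accent, and a CJK word.
foreach (['user_id', "gr\u{00FC}n", "\u{65E5}\u{672C}\u{8A9E}"] as $sample) {
    $sizes = encoded_sizes($sample);
    printf("%-10s UTF-8: %2d  UTF-16: %2d  UTF-32: %2d\n",
        $sample, $sizes['UTF-8'], $sizes['UTF-16'], $sizes['UTF-32']);
}
```

For ASCII-heavy strings UTF-16 doubles and UTF-32 quadruples the payload; for
the CJK sample UTF-16 is actually smaller than UTF-8.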



> we had a 640k limit on working memory? SERVERS should have a good amount

It doesn't matter how much memory you have, in numbers. Until we find an
unlimited source of computer memory left by the aliens in the Himalayas,
memory costs money. However many gigs you have, you'll be able to run
three times fewer PHP processes on the new version than on the old
version on the same hardware, which means the new PHP would cost you more
to run. "Memory is cheap" is a very misunderstood expression - it's only
cheap if you always have much more than you need.



> Probably 90% of the time a string will come in and go out without
> requiring any processing at all, so leave it as UTF-8 ? The only time we

It might be great if we could do that. The problem is that, as far as I
know, we don't currently have a good library for working with UTF-8
strings (please correct me if I'm wrong here).

--
Stanislav Malyshev, Zend Software Architect
s...@zend.com   http://www.zend.com/
(408)253-8829   MSN: s...@zend.com


Re: [PHP-DEV] Where are we ACTUALLY on Unicode?

2010-03-16 Thread Lester Caine


Rasmus Lerdorf wrote:
> On 03/16/2010 10:40 AM, dreamcat four wrote:
>> As for text files on disk, if they are unicode, they are most commonly
>> utf-8 too. So then, why use utf-16 as internal unicode representation
>> in Php? It doesn't really make a lot of sense for most regular people
>> who want to use Php for their web application. Unless they don't
>> really care how slow its gonna be converting everything, constantly...
>
> Well, the obvious original reason is that ICU uses UTF-16 internally and
> the logic was that we would be going in and out of ICU to do all the
> various Unicode operations many more times than we would be interfacing
> with external things like MySQL or files on disk.  You generally only
> read or write a string once from an external source, but you may perform
> multiple Unicode operations on that same string so avoiding a conversion
> for each operation seems logical.


Which begs the question - is ICU actually the right base?

But I'd still like some feedback on my idea that until an operation needs to be 
able to handle multi byte character string processing, why not simply stay in 
UTF-8? No reason why a string variable can't be converted only when needed, and 
then dropped back to UTF-8 if needed later? And if the user is only using single 
byte characters then the multi byte stuff never kicks in anyway? If you NEED raw 
speed use the basic character set.


--
Lester Caine - G8HFL
-
Contact - http://lsces.co.uk/wiki/?page=contact
L.S.Caine Electronic Services - http://lsces.co.uk
EnquirySolve - http://enquirysolve.com/
Model Engineers Digital Workshop - http://medw.co.uk//
Firebird - http://www.firebirdsql.org/index.php


Re: [PHP-DEV] Where are we ACTUALLY on Unicode?

2010-03-16 Thread Rasmus Lerdorf

On 03/16/2010 10:40 AM, dreamcat four wrote:
> As for text files on disk, if they are unicode, they are most commonly
> utf-8 too. So then, why use utf-16 as internal unicode representation
> in Php? It doesn't really make a lot of sense for most regular people
> who want to use Php for their web application. Unless they don't
> really care how slow its gonna be converting everything, constantly...

Well, the obvious original reason is that ICU uses UTF-16 internally and
the logic was that we would be going in and out of ICU to do all the
various Unicode operations many more times than we would be interfacing
with external things like MySQL or files on disk.  You generally only
read or write a string once from an external source, but you may perform
multiple Unicode operations on that same string so avoiding a conversion
for each operation seems logical.

-Rasmus


Re: [PHP-DEV] Where are we ACTUALLY on Unicode?

2010-03-16 Thread Andrey Hristov


dreamcat four wrote:
> On Tue, Mar 16, 2010 at 11:48 AM, dreamcat four wrote:
>> On Tue, Mar 16, 2010 at 8:30 AM, Lester Caine wrote:
>>> '3' is not a very processor friendly number, so working with 4 even though
>>> wasteful on memory, does make perfect sense. How long is it since we had a
>>> 640k limit on working memory? SERVERS should have a good amount of memory
>>> for caching information anyway. SO is UTF-16 the right approach for
>>> processing wide strings? It needs special code to handle everything wider
>>> than 16 bits, but at what gain really? If all core functionality is handled
>>> as 32 bit characters is there that much of an overhead over the additional
>>> processing to get around strings of dissimilar sizes in UTF-16 ?
>>
>> Just to re-enforce some of Lester's points above here.
>>
>> 4-byte per character is never slower that 2-bytes per character... its
>> faster if anything. Bear in mind that 4-byte has been the defacto size
>> for all modern cpu registers / 32-bit microarchitectures since
>> like... Forever. Give a c compiler 4bytes of data... it'll say: thank
>> you very much, and more of the same please! It keeps em happy ;)
>>
>> Sure UTF-16 can make sense. But only if your external representations
>> are also in UTF-16. So whats the default Unicode settings for MYSQL,
>> POSTGRE, etc? Well, are they always set to UTF-8, or UTF-16?
>
> To answer my own question, I have done some some further research.
>
> It seems that both MySQL and Postgre recommend / default to Latin1
> (8-bit ASCII) and  'C' (7-bit ASCII) respectively. So that is to say
> neither set themselves to any unicode standard by default.
>
> In the case of Postgre, the ASCII default is often overiden to UTF-8
> by the distro / os / package managers. From the $LOCALE environment
> variable. So then its UTF-8.
>
> In the case of MySQL, it may be left as latin1. But most competent web
> developers decide to set it to utf-8. Again, its not generally
> believed that very many people (by comparison) actively chooses
> utf-16. The most common encoding issue people run into is that their
> web application has sent their database utf-8 encoded data. But their
> (usually a MySQL) database still has the factory default encoding
> Latin-1 (8-bit ascii). People who discover this almost always solve
> the problem by converting their databases into utf-8.

MySQL doesn't support UTF-16 in any GA release. UCS-2 can be used though.

> As for text files on disk, if they are unicode, they are most commonly
> utf-8 too. So then, why use utf-16 as internal unicode representation
> in Php? It doesn't really make a lot of sense for most regular people
> who want to use Php for their web application. Unless they don't
> really care how slow its gonna be converting everything, constantly...




Andrey


Re: [PHP-DEV] PHP 5.4 branch and trunk

2010-03-16 Thread Hannes Magnusson

On Tue, Mar 16, 2010 at 16:58, Derick Rethans  wrote:
> Before we add features, they need to be discussed whether we want to
> have them.

Does that mean you want to take up a
 - strict RFC-and-after-3months-discussion-before-commit policy
  (i.e. killing the scratching-an-itch spirit of PHP)
 - "I'm going to commit this patch tomorrow" mail to internals@
  (i.e. killing "I need this functionality, maybe others do too" spirit of PHP)

or what exactly do you mean by that?

I would much rather have a development branch where "everything
goes" (like it used to) and then leave it up to the release manager to
merge the features he wants into "his branch" (DVCS style).

> - Ilia's scalar type hint patch.

And which of Ilia's patches are you referring to? The original one
(which is identical to the patch I sent in November 2006) or the
"fucking eyh, I need to please everyone so this can be in 5.3 - but
still got rejected" patch?

You didn't even list the mbstring patch. That was discussed, and as
far as I remember everyone thought it was a great idea, just not in a
stable branch.

-Hannes


Re: [PHP-DEV] PHP 5.4 branch and trunk

2010-03-16 Thread Hannes Magnusson

On Tue, Mar 16, 2010 at 17:54, Pierre Joye  wrote:
> On Tue, Mar 16, 2010 at 5:43 PM, Sebastian Bergmann
>  wrote:
>> Am 16.03.2010 16:58, schrieb Derick Rethans:
>>> I've just renamed the 5.4 branch to THE_5_4_THAT_ISNT_5_4 and moved
>>> trunk to the branch FIRST_UNICODE_IMPLEMENTATION.
>>
>>  Why do we need THE_5_4_THAT_ISNT_5_4
>
> Right, this branch must be deleted, useless. The OB patch can be
> merged again in trunk when trunk has been rebranched.

Why exactly do we need to duplicate the work?

IMO that branch should be renamed to trunk/ and those 2 or 3 patches
to 5.3 merged into it.

-Hannes


Re: [PHP-DEV] array_seek function

2010-03-16 Thread Mikko Koppanen

On Tue, Mar 16, 2010 at 4:22 PM, Derick Rethans  wrote:
> I was also thinking, can we just make this work just like fseek (with a
> "whence" parameter) as well? (http://uk3.php.net/fseek)

Hi,

not sure how SEEK_END is supposed to work with arrays but here is
SEEK_SET and SEEK_CUR (with positive and negative offset)
http://valokuva.org/~mikko/array_seek_whence.patch.txt
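
For discussion, here is a user-space sketch of what an fseek-style signature
could look like. It is not the patch linked above: the function name
array_seek_whence() is hypothetical, SEEK_CUR is resolved against an
explicitly passed current position because user-space code cannot cheaply read
the index of the internal pointer, and SEEK_END is assumed to count back from
the last element (so offset 0 is the last element).

```php
<?php
// Hypothetical user-space sketch of fseek()-style positioning for arrays.
// Returns the value at the resolved position, or null if out of range.
function array_seek_whence(array $array, int $offset, int $whence = SEEK_SET, int $current = 0)
{
    $count = count($array);

    switch ($whence) {
        case SEEK_SET: $pos = $offset;              break; // from the first element
        case SEEK_CUR: $pos = $current + $offset;   break; // from a caller-supplied position
        case SEEK_END: $pos = $count - 1 + $offset; break; // assumption: 0 is the last element
        default:
            throw new InvalidArgumentException('Unknown whence value');
    }

    if ($pos < 0 || $pos >= $count) {
        return null; // a native version could emit a warning here instead
    }

    // array_slice() works on positions, so string keys are not a problem.
    $slice = array_slice($array, $pos, 1);
    return reset($slice);
}

$arr = array('x' => 'a', 'y' => 'b', 'z' => 'c', 'w' => 'd');
var_dump(array_seek_whence($arr, 1));               // from the start
var_dump(array_seek_whence($arr, 1, SEEK_CUR, 1));  // one past position 1
var_dump(array_seek_whence($arr, -2, SEEK_END));    // two before the last element
```

Returning null for an out-of-range position is ambiguous when the array itself
holds null values; a native function would need a clearer error signal.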

-- 
Mikko Koppanen


Re: [PHP-DEV] Where are we ACTUALLY on Unicode?

2010-03-16 Thread dreamcat four

On Tue, Mar 16, 2010 at 11:48 AM, dreamcat four  wrote:
> On Tue, Mar 16, 2010 at 8:30 AM, Lester Caine  wrote:
>> '3' is not a very processor friendly number, so working with 4 even though
>> wasteful on memory, does make perfect sense. How long is it since we had a
>> 640k limit on working memory? SERVERS should have a good amount of memory
>> for caching information anyway. SO is UTF-16 the right approach for
>> processing wide strings? It needs special code to handle everything wider
>> than 16 bits, but at what gain really? If all core functionality is handled
>> as 32 bit characters is there that much of an overhead over the additional
>> processing to get around strings of dissimilar sizes in UTF-16 ?
>
> Just to re-enforce some of Lester's points above here.
>
> 4-byte per character is never slower that 2-bytes per character... its
> faster if anything. Bear in mind that 4-byte has been the defacto size
> for all modern cpu registers / 32-bit microarchitectures since
> like... Forever. Give a c compiler 4bytes of data... it'll say: thank
> you very much, and more of the same please! It keeps em happy ;)
>
> Sure UTF-16 can make sense. But only if your external representations
> are also in UTF-16. So whats the default Unicode settings for MYSQL,
> POSTGRE, etc? Well, are they always set to UTF-8, or UTF-16?
>

To answer my own question, I have done some further research.

It seems that MySQL and PostgreSQL default to Latin1 (ISO-8859-1, an
8-bit superset of ASCII) and 'C' (7-bit ASCII) respectively. That is to
say, neither sets itself to any Unicode encoding by default.

In the case of PostgreSQL, the ASCII default is often overridden to UTF-8
by the distro / OS / package managers, based on the locale environment
variables. So then it's UTF-8.

In the case of MySQL, it may be left as latin1. But most competent web
developers decide to set it to utf-8. Again, it's not generally
believed that very many people (by comparison) actively choose
utf-16. The most common encoding issue people run into is that their
web application has sent their database utf-8 encoded data, but their
(usually MySQL) database still has the factory default encoding,
Latin-1. People who discover this almost always solve
the problem by converting their databases to utf-8.

As for text files on disk, if they are Unicode, they are most commonly
utf-8 too. So then, why use utf-16 as the internal Unicode representation
in PHP? It doesn't really make a lot of sense for most regular people
who want to use PHP for their web application. Unless they don't
really care how slow it's gonna be converting everything, constantly...


Re: [PHP-DEV] PHP 5.4 branch and trunk

2010-03-16 Thread Pierre Joye

On Tue, Mar 16, 2010 at 5:43 PM, Sebastian Bergmann
 wrote:
> Am 16.03.2010 16:58, schrieb Derick Rethans:
>> I've just renamed the 5.4 branch to THE_5_4_THAT_ISNT_5_4 and moved
>> trunk to the branch FIRST_UNICODE_IMPLEMENTATION.
>
>  Why do we need THE_5_4_THAT_ISNT_5_4

Right, this branch must be deleted, useless. The OB patch can be
merged again in trunk when trunk has been rebranched.

Cheers,
-- 
Pierre

@pierrejoye | http://blog.thepimp.net | http://www.libgd.org


Re: [PHP-DEV] PHP 5.4 branch and trunk

2010-03-16 Thread Sebastian Bergmann

Am 16.03.2010 16:58, schrieb Derick Rethans:
> I've just renamed the 5.4 branch to THE_5_4_THAT_ISNT_5_4 and moved 
> trunk to the branch FIRST_UNICODE_IMPLEMENTATION.

 Why do we need THE_5_4_THAT_ISNT_5_4 and trunk? trunk should be where
 the development happens. When the time comes for a release, PHP_X_Y
 should be branched off of trunk.

-- 
Sebastian BergmannCo-Founder and Principal Consultant
http://sebastian-bergmann.de/   http://thePHP.cc/



Re: [PHP-DEV] array_seek function

2010-03-16 Thread Derick Rethans

On Tue, 16 Mar 2010, Felix De Vliegher wrote:

> On 16-mrt-2010, at 17:07, Derick Rethans wrote:
> 
> > On Tue, 16 Mar 2010, Felix De Vliegher wrote:
> > 
> >> Right now, it returns the value of a given position. In that case, 
> >> array_get_pos might be a better name. Oh, and I attached the patch 
> >> with .txt extension :)
> > 
> > Does it also seek the array pointer? Because I think array_seek that 
> > moves the pointer,  in combination with current() and key() might make 
> > slightly more sense?
> 
> Mikko updated the patch a bit to set the array pointer correctly (and 
> make it perform a bit better when dealing with large arrays), my 
> version left it one position too far. So yes, that's possible. The 
> updated version can be found here: 
> http://valokuva.org/~mikko/array_seek.patch.txt

I was also thinking, can we just make this work just like fseek (with a 
"whence" parameter) as well? (http://uk3.php.net/fseek)

with kind regards,
Derick
-- 
http://derickrethans.nl | http://xdebug.org
Like Xdebug? Consider a donation: http://xdebug.org/donate.php
twitter: @derickr and @xdebug


Re: [PHP-DEV] array_seek function

2010-03-16 Thread Felix De Vliegher

On 16-mrt-2010, at 17:07, Derick Rethans wrote:

> On Tue, 16 Mar 2010, Felix De Vliegher wrote:
> 
>> Right now, it returns the value of a given position. In that case, 
>> array_get_pos might be a better name. Oh, and I attached the patch 
>> with .txt extension :)
> 
> Does it also seek the array pointer? Because I think array_seek that 
> moves the pointer,  in combination with current() and key() might make 
> slightly more sense?
> 

Mikko updated the patch a bit to set the array pointer correctly (and make it 
perform a bit better when dealing with large arrays), my version left it one 
position too far. So yes, that's possible. The updated version can be found 
here: http://valokuva.org/~mikko/array_seek.patch.txt

Cheers,
Felix

Re: [PHP-DEV] Re: PHP 5.4 branch and trunk

2010-03-16 Thread Hannes Magnusson

On Tue, Mar 16, 2010 at 17:10, David Soria Parra  wrote:
> On 2010-03-16, Derick Rethans  wrote:
>> - Declare 5.2 security fixes only (Something for Ilia to declare).
>> - Declare 5.3 bug fixes only (and ini-mini features if so desired)
>>   (Something for Johannes to declare).
>>
>> Once that's done, I'd like to:
>>
>> - Recreate trunk from the 5.3 branch.
>>
>> - the new output buffering mechanism (I can not really see why we would
>>   not want this)
> is there something about that in the wiki? I think a few lines in the wiki
> about this would be good.

I doubt it. It's a rewrite that had to be done to simplify things, and
it fixes several bugs.


-Hannes


Re: [PHP-DEV] PHP 5.4 branch and trunk

2010-03-16 Thread Derick Rethans

On Tue, 16 Mar 2010, Alexey Zakhlestin wrote:

> On Tue, Mar 16, 2010 at 6:58 PM, Derick Rethans  
> wrote:
> 
> > Right now, there are the following features that I can see we should
> > think about:
> >
> > - the new output buffering mechanism (I can not really see why we would
> >  not want this)
> > - Scott's big number improvements. Scott, can you explain (in an RFC)
> >  what exactly this does and how it works?
> > - Ilia's scalar type hint patch. There are RFCs:
> >  http://wiki.php.net/rfc/typechecking
> > - traits, there are also RFCs:
> >  http://wiki.php.net/rfc/horizontalreuse
> >  http://wiki.php.net/rfc/nonbreakabletraits
> 
> + merge php-fpm branch?

Can't see why not. Is there an RFC for this?

regards,
Derick

-- 
http://derickrethans.nl | http://xdebug.org
Like Xdebug? Consider a donation: http://xdebug.org/donate.php
twitter: @derickr and @xdebug

Re: [PHP-DEV] Re: array_seek function

2010-03-16 Thread Mikko Koppanen

On Tue, Mar 16, 2010 at 2:12 PM, Christian Schneider
 wrote
> I thinks the user space implementation
>
> function array_seek($array, $pos)
> {
>        $a = array_values($array);
>        return $a[$pos];
> }
>
> is simple enough to not add a native function for this.
>
> It might not be the most efficient way to do it but I doubt that it is
> something done frequently enough to justify another native function.

Hi,

Here is a slightly modified version of the original patch:
http://valokuva.org/~mikko/array_seek.patch.txt
The difference from the original is that the iterator position is left
where the user seeked to. So something like the following should work:



Not sure how useful it is to have this in core, but I do remember
looking for a seek function for arrays before.

-- 
Mikko Koppanen


Re: [PHP-DEV] PHP 5.4 branch and trunk

2010-03-16 Thread Alexey Zakhlestin

On Tue, Mar 16, 2010 at 6:58 PM, Derick Rethans  wrote:

> Right now, there are the following features that I can see we should
> think about:
>
> - the new output buffering mechanism (I can not really see why we would
>  not want this)
> - Scott's big number improvements. Scott, can you explain (in an RFC)
>  what exactly this does and how it works?
> - Ilia's scalar type hint patch. There are RFCs:
>  http://wiki.php.net/rfc/typechecking
> - traits, there are also RFCs:
>  http://wiki.php.net/rfc/horizontalreuse
>  http://wiki.php.net/rfc/nonbreakabletraits

+ merge php-fpm branch?



-- 
Alexey Zakhlestin
http://www.milkfarmsoft.com/


[PHP-DEV] Re: PHP 5.4 branch and trunk

2010-03-16 Thread David Soria Parra

On 2010-03-16, Derick Rethans  wrote:
> - Declare 5.2 security fixes only (Something for Ilia to declare).
> - Declare 5.3 bug fixes only (and ini-mini features if so desired) 
>   (Something for Johannes to declare).
>
> Once that's done, I'd like to:
>
> - Recreate trunk from the 5.3 branch.
>
> - the new output buffering mechanism (I can not really see why we would 
>   not want this)
is there something about that in the wiki? I think a few lines in the wiki
about this would be good.

> - Scott's big number improvements. Scott, can you explain (in an RFC) 
>   what exactly this does and how it works?
> - Ilia's scalar type hint patch. There are RFCs:
>   http://wiki.php.net/rfc/typechecking
> - traits, there are also RFCs:
>   http://wiki.php.net/rfc/horizontalreuse
>   http://wiki.php.net/rfc/nonbreakabletraits

thank you.

I agree that we should discuss new additions with proper RFCs before we commit
them. In addition to that, people interested in Unicode should get together and
discuss how to re-add Unicode support and in which way to do this.


Re: [PHP-DEV] array_seek function

2010-03-16 Thread Derick Rethans

On Tue, 16 Mar 2010, Felix De Vliegher wrote:

> Right now, it returns the value of a given position. In that case, 
> array_get_pos might be a better name. Oh, and I attached the patch 
> with .txt extension :)

Does it also seek the array pointer? Because I think array_seek that 
moves the pointer,  in combination with current() and key() might make 
slightly more sense?
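
What a pointer-moving variant would give the caller can be approximated in
user space with reset() and next(). This is a sketch only: the name
array_seek_pointer() is hypothetical, and unlike a native implementation it
always walks from the start of the array.

```php
<?php
// Hypothetical sketch: move the array's internal pointer to a position,
// so that current() and key() can be used afterwards.
function array_seek_pointer(array &$array, int $pos): bool
{
    if ($pos < 0 || $pos >= count($array)) {
        return false;            // leave the pointer untouched when out of range
    }

    reset($array);               // rewind to the first element
    for ($i = 0; $i < $pos; $i++) {
        next($array);            // advance one element at a time
    }
    return true;
}

$input = array('one' => 3, 'two' => 'bar', 'three' => 'baz');

if (array_seek_pointer($input, 2)) {
    echo key($input), ' => ', current($input), "\n";
}
```

Because the pointer is moved rather than a value returned, the caller gets
both the key and the value at that position, which the value-returning version
cannot provide.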

with kind regards,
Derick

-- 
http://derickrethans.nl | http://xdebug.org
Like Xdebug? Consider a donation: http://xdebug.org/donate.php
twitter: @derickr and @xdebug


[PHP-DEV] PHP 5.4 branch and trunk

2010-03-16 Thread Derick Rethans

Hello,

I've just renamed the 5.4 branch to THE_5_4_THAT_ISNT_5_4 and moved 
trunk to the branch FIRST_UNICODE_IMPLEMENTATION.

The next thing to do is to re-create trunk from PHP 5.3; I've held off on
that for now, but I'd like to do the following soon:

- Declare 5.2 security fixes only (Something for Ilia to declare).
- Declare 5.3 bug fixes only (and ini-mini features if so desired) 
  (Something for Johannes to declare).

Once that's done, I'd like to:

- Recreate trunk from the 5.3 branch.

Before we add features, we need to discuss whether we want to
have them. As the version name I would like to use "trunk-dev" (and
not 5.4-dev or 6.0-dev), as we're not quite sure where this is moving.
Right now, there are the following features that I can see we should
think about:

- the new output buffering mechanism (I can not really see why we would 
  not want this)
- Scott's big number improvements. Scott, can you explain (in an RFC) 
  what exactly this does and how it works?
- Ilia's scalar type hint patch. There are RFCs:
  http://wiki.php.net/rfc/typechecking
- traits, there are also RFCs:
  http://wiki.php.net/rfc/horizontalreuse
  http://wiki.php.net/rfc/nonbreakabletraits


with kind regards,
Derick

-- 
http://derickrethans.nl | http://xdebug.org
Like Xdebug? Consider a donation: http://xdebug.org/donate.php
twitter: @derickr and @xdebug


[PHP-DEV] Re: [PHP-CVS] svn: /php/php-src/

2010-03-16 Thread Hannes Magnusson

On Tue, Mar 16, 2010 at 16:45, Derick Rethans  wrote:
> derick                                   Tue, 16 Mar 2010 15:45:24 +
>
> Revision: http://svn.php.net/viewvc?view=revision&revision=296284
>
> Log:
> - Moved the Unicode experiment from trunk to its own branch for reference.
>
> Changed paths:
>    A + php/php-src/branches/FIRST_UNICODE_IMPLEMENTATION/
>        (from php/php-src/trunk/:r296283)
>    D   php/php-src/trunk/


Kudos

-Hannes


[PHP-DEV] Re: array_seek function

2010-03-16 Thread Christian Schneider

Felix De Vliegher wrote:
> Hi all
> 
> I recently needed seek functionality in arrays, and couldn't find it
> in the regular set of array functions, so I wrote a function for it.
> Seek = getting an array value based on the position (or offset, if you
> want to call it like that), and not the key of the item)
> 
> Basically you can use it like this:
> $input = array(3, 'bar', 'baz');
> echo array_seek($input, 2); // returns 'baz'
> echo array_seek($input, 0); // returns 3
> echo array_seek($input, 5); // returns NULL, emits an out of range warning

I think the user space implementation

function array_seek($array, $pos)
{
        $a = array_values($array);
        return $a[$pos];
}

is simple enough to not add a native function for this.

It might not be the most efficient way to do it but I doubt that it is
something done frequently enough to justify another native function.

My 2 cents,
- Chris


Re: [PHP-DEV] PHP 6

2010-03-16 Thread David Soria Parra

On 2010-03-13, Lukas Kahwe Smith  wrote:
> +1
>
> As for the exact features to merge, lets first start with formulating a plan 
> about what we want to see in the next release. I also think it makes sense to 
> base the number and scope if the features on a rough idea of when we want to 
> see this next release. In order to put together that release plan i guess we 
> should have an RM defined first. I think Andi said the same thing on IRC 
> yesterday.
>
> I can certainly see you as RM, but i would like to propose another newer guy 
> for the job:
> David Soria Parra
For the record: I'm willing to be the RM. Besides the spare time that I spend on
the project, I have dedicated working time for this.


Re: [PHP-DEV] array_seek function

2010-03-16 Thread Richard Quadling

On 16 March 2010 13:30, Felix De Vliegher  wrote:
> Hi all
>
> I recently needed seek functionality in arrays, and couldn't find it in the 
> regular set of array functions, so I wrote a function for it. (Seek = getting 
> an array value based on the position (or offset, if you want to call it like 
> that), and not the key of the item)
>
> Basically you can use it like this:
> $input = array(3, 'bar', 'baz');
> echo array_seek($input, 2); // returns 'baz'
> echo array_seek($input, 0); // returns 3
> echo array_seek($input, 5); // returns NULL, emits an out of range warning
>
> I was wondering if it's useful to add this to the family of array functions. 
> I know there is a somewhat similar thing in SPL (ArrayIterator::seek), but 
> that doesn't work exactly like what I was aiming for.
>
> Attached is a patch for the function against the 5.3 branch. If approved, I 
> could add it to svn + testcases + docs. Feedback please :-)
>
>
> Kind regards,
> Felix
>
>
>
> --
> PHP Internals - PHP Runtime Development Mailing List
> To unsubscribe, visit: http://www.php.net/unsub.php
>

Maybe not as efficient as it could be but ...

$input = array('One' => 'Itchy', 'Two' => 'Knee', 'Three' => 'San',
'Four' => 'She');

echo @reset(array_keys(array_values($input), 'Knee'));

Richard.
-- 
-
Richard Quadling
"Standing on the shoulders of some very clever giants!"
EE : http://www.experts-exchange.com/M_248814.html
EE4Free : http://www.experts-exchange.com/becomeAnExpert.jsp
Zend Certified Engineer : http://zend.com/zce.php?c=ZEND002498&r=213474731
ZOPA : http://uk.zopa.com/member/RQuadling


Re: [PHP-DEV] array_seek function

2010-03-16 Thread Felix De Vliegher

Hi Pierre

Right now, it returns the value of a given position. In that case, 
array_get_pos might be a better name. Oh, and I attached the patch with .txt 
extension :)

Greetings,
Felix

Index: ext/standard/array.c
===
--- ext/standard/array.c(revision 296276)
+++ ext/standard/array.c(working copy)
@@ -4507,6 +4507,41 @@
 }
 /* }}} */
 
+/* {{{ proto mixed array_seek(array input, int position)
+   Returns the array value at the given position */
+PHP_FUNCTION(array_seek)
+{
+  int num_in;
+  int currentpos = 0;
+  long pos;
+   zval *array, **entry;
+  HashPosition hpos;
+
+   if (zend_parse_parameters(ZEND_NUM_ARGS() TSRMLS_CC, "al", &array, &pos) == FAILURE) {
+return;
+  }
+
+  /* Get number of entries in the array */
+  num_in = zend_hash_num_elements(Z_ARRVAL_P(array));
+
+  /* Check if we have a valid position. */
+  if (pos > num_in - 1 || pos < 0) {
+php_error_docref(NULL TSRMLS_CC, E_WARNING, "Seek position %ld is out of range", pos);
+return;
+  }
+
+  /* Loop over the input array until we are at the right position */
+  zend_hash_internal_pointer_reset_ex(Z_ARRVAL_P(array), &hpos);
+  while (currentpos <= pos && zend_hash_get_current_data_ex(Z_ARRVAL_P(array), (void **)&entry, &hpos) == SUCCESS) {
+currentpos++;
+zend_hash_move_forward_ex(Z_ARRVAL_P(array), &hpos);
+  }
+
+  /* Return the matching element */
+  RETURN_ZVAL(*entry, 1, 0);
+}
+/* }}} */
+
 /*
  * Local variables:
  * tab-width: 4
Index: ext/standard/basic_functions.c
===
--- ext/standard/basic_functions.c  (revision 296276)
+++ ext/standard/basic_functions.c  (working copy)
@@ -609,6 +609,11 @@
ZEND_ARG_INFO(0, keys)   /* ARRAY_INFO(0, keys, 0) */
ZEND_ARG_INFO(0, values) /* ARRAY_INFO(0, values, 0) */
 ZEND_END_ARG_INFO()
+
+ZEND_BEGIN_ARG_INFO(arginfo_array_seek, 0)
+   ZEND_ARG_INFO(0, input)   /* ARRAY_INFO(0, input, 0) */
+   ZEND_ARG_INFO(0, position) /* ARRAY_INFO(0, position, 0) */
+ZEND_END_ARG_INFO()
 /* }}} */
 /* {{{ basic_functions.c */
 ZEND_BEGIN_ARG_INFO(arginfo_get_magic_quotes_gpc, 0)
@@ -3320,6 +3325,7 @@
PHP_FE(array_chunk, arginfo_array_chunk)
PHP_FE(array_combine,   arginfo_array_combine)
PHP_FE(array_key_exists, arginfo_array_key_exists)
+  PHP_FE(array_seek,  arginfo_array_seek)
 
/* aliases from array.c */
PHP_FALIAS(pos, current, arginfo_current)
Index: ext/standard/php_array.h
===
--- ext/standard/php_array.h(revision 296276)
+++ ext/standard/php_array.h(working copy)
@@ -101,6 +101,7 @@
 PHP_FUNCTION(array_key_exists);
 PHP_FUNCTION(array_chunk);
 PHP_FUNCTION(array_combine);
+PHP_FUNCTION(array_seek);
 
 PHPAPI HashTable* php_splice(HashTable *, int, int, zval ***, int, HashTable **);
 PHPAPI int php_array_merge(HashTable *dest, HashTable *src, int recursive TSRMLS_DC);


On 16-mrt-2010, at 14:34, Pierre Joye wrote:

> hi Felix,
> 
> Not sure about the usefulness of this function but the name is
> misleading (pls reattach the patch as .txt while being at it :). Does
> it set the position (_seek) or does it return the value of a given
> position (_get_pos)? or both (no idea :)?
> 
> Cheers,
> 
> Cheers,
> 
> On Tue, Mar 16, 2010 at 2:30 PM, Felix De Vliegher
>  wrote:
>> Hi all
>> 
>> I recently needed seek functionality in arrays, and couldn't find it in the 
>> regular set of array functions, so I wrote a function for it. (Seek = 
>> getting an array value based on the position (or offset, if you want to call 
>> it like that), and not the key of the item)
>> 
>> Basically you can use it like this:
>> $input = array(3, 'bar', 'baz');
>> echo array_seek($input, 2); // returns 'baz'
>> echo array_seek($input, 0); // returns 3
>> echo array_seek($input, 5); // returns NULL, emits an out of range warning
>> 
>> I was wondering if it's useful to add this to the family of array functions. 
>> I know there is a somewhat similar thing in SPL (ArrayIterator::seek), but 
>> that doesn't work exactly like what I was aiming for.
>> 
>> Attached is a patch for the function against the 5.3 branch. If approved, I 
>> could add it to svn + testcases + docs. Feedback please :-)
>> 
>> 
>> Kind regards,
>> Felix
>> 
>> 
>> 
>> --
>> PHP Internals - PHP Runtime Development Mailing List
>> To unsubscribe, visit: http://www.php.net/unsub.php
>> 
> 
> 
> 
> -- 
> Pierr

Re: [PHP-DEV] array_seek function

2010-03-16 Thread Pierre Joye

hi Felix,

Not sure about the usefulness of this function but the name is
misleading (pls reattach the patch as .txt while being at it :). Does
it set the position (_seek) or does it return the value of a given
position (_get_pos)? or both (no idea :)?

Cheers,

Cheers,

On Tue, Mar 16, 2010 at 2:30 PM, Felix De Vliegher
 wrote:
> Hi all
>
> I recently needed seek functionality in arrays, and couldn't find it in the 
> regular set of array functions, so I wrote a function for it. (Seek = getting 
> an array value based on the position (or offset, if you want to call it like 
> that), and not the key of the item)
>
> Basically you can use it like this:
> $input = array(3, 'bar', 'baz');
> echo array_seek($input, 2); // returns 'baz'
> echo array_seek($input, 0); // returns 3
> echo array_seek($input, 5); // returns NULL, emits an out of range warning
>
> I was wondering if it's useful to add this to the family of array functions. 
> I know there is a somewhat similar thing in SPL (ArrayIterator::seek), but 
> that doesn't work exactly like what I was aiming for.
>
> Attached is a patch for the function against the 5.3 branch. If approved, I 
> could add it to svn + testcases + docs. Feedback please :-)
>
>
> Kind regards,
> Felix
>
>
>
> --
> PHP Internals - PHP Runtime Development Mailing List
> To unsubscribe, visit: http://www.php.net/unsub.php
>



-- 
Pierre

@pierrejoye | http://blog.thepimp.net | http://www.libgd.org


[PHP-DEV] array_seek function

2010-03-16 Thread Felix De Vliegher

Hi all

I recently needed seek functionality in arrays, and couldn't find it in the 
regular set of array functions, so I wrote a function for it. (Seek = getting 
an array value based on the position (or offset, if you want to call it like 
that), and not the key of the item)

Basically you can use it like this:
$input = array(3, 'bar', 'baz');
echo array_seek($input, 2); // returns 'baz'
echo array_seek($input, 0); // returns 3
echo array_seek($input, 5); // returns NULL, emits an out of range warning

I was wondering if it's useful to add this to the family of array functions. I 
know there is a somewhat similar thing in SPL (ArrayIterator::seek), but that 
doesn't work exactly like what I was aiming for.

Attached is a patch for the function against the 5.3 branch. If approved, I 
could add it to svn + testcases + docs. Feedback please :-)


Kind regards,
Felix



Re: [PHP-DEV] Where are we ACTUALLY on Unicode?

2010-03-16 Thread Andrey Hristov


dreamcat four wrote:
> On Tue, Mar 16, 2010 at 8:30 AM, Lester Caine  wrote:
>> '3' is not a very processor friendly number, so working with 4 even though
>> wasteful on memory, does make perfect sense. How long is it since we had a
>> 640k limit on working memory? SERVERS should have a good amount of memory
>> for caching information anyway. SO is UTF-16 the right approach for
>> processing wide strings? It needs special code to handle everything wider
>> than 16 bits, but at what gain really? If all core functionality is handled
>> as 32 bit characters is there that much of an overhead over the additional
>> processing to get around strings of dissimilar sizes in UTF-16 ?
>
> Just to reinforce some of Lester's points above here.
>
> 4-byte per character is never slower than 2-bytes per character... it's
> faster if anything. Bear in mind that 4-byte has been the de facto size
> for all modern cpu registers / 32-bit microarchitectures since
> like... Forever. Give a c compiler 4bytes of data... it'll say: thank
> you very much, and more of the same please! It keeps em happy ;)
>
> Sure UTF-16 can make sense. But only if your external representations
> are also in UTF-16. So what are the default Unicode settings for MYSQL,
> POSTGRE, etc? Well, are they always set to UTF-8, or UTF-16?
>
> Just do the same as them.

All MySQL GA versions (not including the upcoming 5.5 which is not GA) 
can't eat UTF-16 queries but can receive UTF-16 results (although all 
MySQL GA releases that know character sets, 4.1, 5.0, 5.1, don't know 
anything about UTF-16 but only UCS-2, which are the characters in the 
BMP). It is probable (I can't say definitely due to Oracle's recognition 
rules) that 5.5 will have proper UTF-16. UTF-16 has its advantages.


If your Unicode data is mostly ASCII characters with some non-ASCII ones 
here and there, then UTF-8 should be the choice: less disk space used, 
which means the HDD can read more data, which in turn means more 
table rows served per second.
Converting in the client (PHP) is OK, as it scales: just throw in some more 
web servers. Scaling an RDBMS is a completely different story.


Best,
Andrey


Re: [PHP-DEV] XML binding & mapping library

2010-03-16 Thread Alexey Zakhlestin


On 16.03.2010, at 10:46, John wrote:

> Hello, people. I am looking for community feedback about my 
> ideas for XML binding & persistence library:
> 

Are you thinking about implementing it as some kind of extension? or about 
php-code?
or just reusable C-library with bindings for PHP?

Re: [PHP-DEV] Where are we ACTUALLY on Unicode?

2010-03-16 Thread dreamcat four

On Tue, Mar 16, 2010 at 8:30 AM, Lester Caine  wrote:
> '3' is not a very processor friendly number, so working with 4 even though
> wasteful on memory, does make perfect sense. How long is it since we had a
> 640k limit on working memory? SERVERS should have a good amount of memory
> for caching information anyway. SO is UTF-16 the right approach for
> processing wide strings? It needs special code to handle everything wider
> than 16 bits, but at what gain really? If all core functionality is handled
> as 32 bit characters is there that much of an overhead over the additional
> processing to get around strings of dissimilar sizes in UTF-16 ?

Just to reinforce some of Lester's points above here.

4-byte per character is never slower than 2-bytes per character... it's
faster if anything. Bear in mind that 4-byte has been the de facto size
for all modern cpu registers / 32-bit microarchitectures since
like... Forever. Give a c compiler 4bytes of data... it'll say: thank
you very much, and more of the same please! It keeps em happy ;)

Sure UTF-16 can make sense. But only if your external representations
are also in UTF-16. So what are the default Unicode settings for MYSQL,
POSTGRE, etc? Well, are they always set to UTF-8, or UTF-16?

Just do the same as them.


[PHP-DEV] XML binding & mapping library

2010-03-16 Thread John

Hello, people. I am looking for community feedback about my 
ideas for XML binding & persistence library:

XML binding & persistence library

Description: an object-oriented library for mapping XML file 
structure and binding it to PHP 5 classes. The library would 
use XML-based entity descriptions, written in XML Schema, to 
perform data manipulation (CRUD), and would use the described 
entity relations to persist XML entries. Association and/or 
aggregation would be the O-O abstractions for XML entity 
relations. Queries would be performed by translating XML 
entity properties and characteristics into XPath expressions, 
either on the fly or through automatically generated code, 
which can give a performance boost in production (read: 
non-development) use. The library should also be able to 
build XML formatters from XML Schema, which avoids much of 
the script execution slowdown caused by preparing and putting 
together XML tree data structures.

Main implementation reasons and usage opportunities:

* Most XML structures represent complex entries, where both 
  nested tags and tag properties represent characteristics 
  and behavior. This usually makes the XML interpretation 
  rules more complex and the code heavier. An additional 
  development penalty is that such rules often have little 
  in common with each other, so the amount of reusable code 
  is very low. Also, data entries related to entities with 
  deeply nested relations require a group of files to be 
  manipulated as a single transaction to provide persistence; 
  normally there is no entry-based way to reuse XPath queries.

  1. I have motivation to go even further: iterators would 
     act as a fork, moving back and forth (development stage) 
     through a set of tags and/or tag properties. Iterators 
     can become reusable in different projects, for different 
     XML structures, simply by changing the naming convention 
     (read: how the tags are named, with respect to W3C 
     qualified name features).
  2. Can help a lot in translating XML binding code through 
     XPath expressions (because of specific W3C specification 
     rules, which are supposed to query all matched entries 
     by default). It is even possible to force relations in 
     generated XPath to speed up querying: a complex 
     expression can specify one rule before another (for 
     example, if a specific ancestor in the XML tree occurs 
     less often than some tag property, the ancestor 
     requirement can be stated before the property 
     requirement, so fewer lines are iterated).

* You could use PHP 5 reflection and the O-O features of 
  PHP 5 to describe classes, and then simply generate the 
  required XML Schema. Awesome for using non-DOM (read: SAX) 
  libraries in a production environment, especially with a 
  mix of PHP 5 code and, say, Java code (by the way, the code 
  generator can produce code for other languages, if 
  necessary). Good for situations where you develop 
  administration tools with a web interface in PHP 5 that 
  operate on XML-based configuration files of Java software 
  (for example, the XML configuration file of a Java message 
  broker).
* Of course, the well-known mixing of database mapping (ORM) 
  and XML mapping. It would be awesome to import ORM classes 
  into XML-related ones and vice versa, giving a more 
  flexible RDBMS table structure. Awesome for complex storage 
  manipulations (storing objects and/or their entities in XML 
  and RDBMS, without the performance penalty of forcing the 
  RDBMS to format XML for you). And, probably the most 
  awesome here: you can move data from XML storage to RDBMS 
  and vice versa to balance load, especially if data 
  characteristics vary a lot. Generation of performance tests 
  can be added if necessary.
* XML web site maps. Yes, those are often in XML. There are 
  also CMS systems that can manage site structure through 
  XML. And because MVC frameworks are the most widespread 
  solution, the site map itself refers (directly or not) to 
  controller class / action name pairs. As a result it is 
  hard to forget about automated testing, and that is where 
  PHPUnit can come in. Main idea: use the site map as a 
  prototype for defining more complex mapping. Some kind of 
  XML tree with entities described (entries refer to 
  controller-index pairs, remember), where the entities are 
  described in a test-feature-driven manner; you can drive 
  your test automation with a guarantee of covering the 
  necessary QA features, and even connect QA software to 
  development tools. Note: the DOM library would be used in 
  the background due to the complexity of the testing 
  configuration.
* And, of course, the famous RESTful Web services ;) At 
  least it can help with data deduplication...

Thanks. John aka webautoma...@gmail.com


Re: [PHP-DEV] Where are we ACTUALLY on Unicode?

2010-03-16 Thread Lester Caine


Stanislav Malyshev wrote:
> Hi!
>
>> What I am probably asking is what was the brick wall PHP6 hit. I was
>> under the impression that there was no agreement on 'switchable or only'
>> to unicode core? ( And those who did write PHP6 books seemed to have
>> their own views on which way the discussions would go ;) ).
>
> From what I can see, the biggest issues are these:
> 1. Performance - Unicode-based PHP right now requires tons of
> conversions when talking to outside world (like MySQL) which slows down
> the app significantly. Many extensions frequently used by PHP app
> writers (such as mysql, pcre, etc.) do not support UTF-16 properly.
> Also, inflated memory usage hurts scalability a lot.
> 2. Compatibility - it's hard to make an existing app work with Unicode
> without losing performance or hitting weird scenarios where
> your passwords suddenly stop working because there's an extra recoding
> step in some md5() call.


I think that there does need to be a proper review of just what the target is?

There are a number of 'unknowns' such as how does one identify the version of 
unicode being used. Differences seem to exist between OS's which don't help with 
that problem?


On disk storage should probably be UTF-8 without any question? Windows use of 
wide strings for some files simply doubles up the on-disk storage requirements 
for very little gain? And remembering to convert '.reg' files back to normal raw 
text so I can read them on the Linux machines adds to the fun.


In memory handling of character strings is I think where some alternative 
methods may be appropriate. Firebird's original UNICODE_FSS collation was 3 
bytes per character ( that IS the limit for Unicode ;) ) and so all of the 
character counting stuff works transparently. Firebird records are automatically 
compressed before storage, so white space in character strings is not wasting 
space on disk, and the unicode collations get compressed in the same way.


'3' is not a very processor friendly number, so working with 4 even though 
wasteful on memory, does make perfect sense. How long is it since we had a 640k 
limit on working memory? SERVERS should have a good amount of memory for caching 
information anyway. SO is UTF-16 the right approach for processing wide strings? 
It needs special code to handle everything wider than 16 bits, but at what gain 
really? If all core functionality is handled as 32 bit characters is there that 
much of an overhead over the additional processing to get around strings of 
dissimilar sizes in UTF-16 ?


Most of my own data handling is done via the database anyway, so queries return 
data already sorted and filtered. There is no point pulling unprocessed data 
and then throwing much of it away, hence the rest of the infrastructure being 
used is important to get the best performance?


Probably 90% of the time a string will come in and go out without requiring any 
processing at all, so leave it as UTF-8 ? The only time we need to accurately 
know the number and position of characters is when we need to do some string 
processing, and then only if the strings use multibyte characters. SO how about 
an additional couple of flags on a string variable. When a UTF-8 string is 
loaded, it is counted for bytes, and characters, and number of bytes per. If 
bytes and characters are the same ... no problems. If number of bytes is greater 
than 1, then string handling needs to 'open them up' before processing, and '2' 
just uses an efficient UTF-16 processing, while '3+' goes to 32 bit processing?


Am I missing something? Why does unicode have to complicate things when in 
reality they are quite simple? Legacy stuff gets converted to UTF-8 and in many 
cases the user will not even see a difference, but the 'unicode on/off' switch 
just allows 127 single byte characters rather than 255 ? Currently all the 
multilingual stuff IS passing through PHP transparently and it would seem we can 
use unicode for variable names? So what IS missing?


--
Lester Caine - G8HFL
-
Contact - http://lsces.co.uk/wiki/?page=contact
L.S.Caine Electronic Services - http://lsces.co.uk
EnquirySolve - http://enquirysolve.com/
Model Engineers Digital Workshop - http://medw.co.uk//
Firebird - http://www.firebirdsql.org/index.php

