Re: [PHP-DEV] How deep is copy on write?
Using references does not speed up PHP. It does that already internally, if I'm not mistaken. The point of my post was that assigning values to tree arrays is in general faster than a full array copy.

Hannes

On 19 January 2011 08:36, Ben Schmidt mail_ben_schm...@yahoo.com.au wrote:
> Yep. PHP does clock up memory very quickly for big arrays, objects with lots of members and/or lots of small objects with large overheads. There are a LOT of zvals and zobjects and things around the place, and their overhead isn't all that small.
>
> Of course, if you go to the trouble to construct arrays using references, you can avoid some of that, because a copy-on-write will just copy the reference. It does mean you're passing references, though.
>
>     $bar['baz'] = 1;
>     $poink['narf'] = 1;
>     $a['foo']['bar'] = &$bar;
>     $a['foo']['poink'] = &$poink;
>
> Then if you test($a), $bar and $poink will be changed, since they are 'passed by reference'--no copying needs to be done. It's almost as if $b were passed by reference, but setting $b['blip'] wouldn't show up in $a, because $a itself would be copied in that case, including the references, which would continue to refer to $bar and $poink. So a much quicker copy, but obviously not the same level of isolation that you might expect or desire.
>
> Unless you did some jiggerypokery like
>
>     $b_bar=$b['bar']; $b['bar']=$b_bar;
>
> which would break the reference and make a copy of just that part of the array. But this is a pretty nasty caller-callee co-operative kind of thing. Just a thought to throw into the mix, though.
>
> Disclaimer: I'm somewhat out of my depth here. But I'm sure someone will jump on me if I'm wrong.
>
> Ben.
>
> On 19/01/11 6:09 PM, Larry Garfield wrote:
>> That's what I was afraid of. So it does copy the entire array. Crap. :-)
>>
>> Am I correct that each level in the array represents its own ZVal, with the additional memory overhead a ZVal has (however many bytes that is)? That is, the array below would have $a, foo, bar, baz, bob, narf, poink, poink/narf = 8 ZVals? (That seems logical to me because each is its own variable that just happens to be an array, but I want to be sure.)
>>
>> --Larry Garfield
>>
>> On Wednesday, January 19, 2011 1:01:44 am Ben Schmidt wrote:
>>> It does the whole of $b. It has to, because when you change 'baz', a reference in 'bar' needs to change to point to the newly copied 'baz', so 'bar' is written...and likewise 'foo' is written.
>>>
>>> Ben.
>>>
>>> On 19/01/11 5:45 PM, Larry Garfield wrote:
>>>> Hi folks. I have a question about the PHP runtime that I hope is appropriate for this list. (If not, please thwap me gently; I bruise easily.)
>>>>
>>>> I know PHP does copy-on-write. However, how deeply does it copy when dealing with nested arrays? This is probably easiest to explain with an example...
>>>>
>>>>     $a['foo']['bar']['baz'] = 1;
>>>>     $a['foo']['bar']['bob'] = 1;
>>>>     $a['foo']['bar']['narf'] = 1;
>>>>     $a['foo']['poink']['narf'] = 1;
>>>>
>>>>     function test($b) {
>>>>       // Assume each of the following lines in isolation...
>>>>
>>>>       // Does this copy just the one variable baz, or the full array?
>>>>       $b['foo']['bar']['baz'] = 2;
>>>>
>>>>       // Does this copy $b, or just $b['foo']['poink']?
>>>>       $b['foo']['poink']['stuff'] = 3;
>>>>
>>>>       return $b;
>>>>     }
>>>>
>>>>     // I know this is wasteful; I'm trying to figure out just how wasteful.
>>>>     $a = test($a);
>>>>
>>>> test() in this case should take $b by reference, but I'm trying to determine how much of a difference it is. (In practice my use case has a vastly larger array, so any inefficiencies are multiplied.)
>>>>
>>>> --Larry Garfield

-- 
PHP Internals - PHP Runtime Development Mailing List
To unsubscribe, visit: http://www.php.net/unsub.php
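The question in this thread can be probed from userland. The sketch below is only an illustration and was not posted to the list: it rebuilds a small version of the array from the original question, adds a hypothetical large 'big' branch, and compares the memory cost of one nested write against the size of that branch. The helper name modify_leaf() is an assumption, and memory_get_usage() figures vary by PHP version and build.

```php
<?php
// Sketch only: the 'big' branch and modify_leaf() are hypothetical additions.

function modify_leaf($b)
{
    $before = memory_get_usage();
    // Writing a leaf forces separation along the path $b -> foo -> bar.
    $b['foo']['bar']['baz'] = 2;
    $delta = memory_get_usage() - $before;
    return array($b, $delta);
}

$a = array();
$a['foo']['bar']['baz'] = 1;
$a['foo']['poink']['narf'] = 1;

// Attach a large sibling branch and record roughly how much memory it takes.
$start = memory_get_usage();
$a['foo']['big'] = array_fill(0, 100000, 'x');
$bigSize = memory_get_usage() - $start;

list($copy, $delta) = modify_leaf($a);

printf("big branch: %d bytes, cost of the nested write: %d bytes\n", $bigSize, $delta);
var_dump($a['foo']['bar']['baz']);     // value seen by the caller
var_dump($copy['foo']['bar']['baz']);  // value in the returned copy
```

If the cost of the write comes out far smaller than the big branch, that is consistent with only the modified path (the root, 'foo' and 'bar') being separated while untouched siblings stay shared.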
[PHP-DEV] [citations for] Re: [PHP-DEV] Experiments with a threading library for Zend: spawning a new executor
On 19/01/11 16:14, Sam Vilain wrote:
> In general, Java's basic types typically correspond with types that can be dealt with atomically by processors, or are small enough to be passed by value. This already makes things a lot easier.

I've had another reason for the differences explained to me. I'm not sure I understand it fully enough to be able to re-explain it, but I'll try anyway.

As I grasped the concept, the key to making VMs fully threadable with shared state is to first allow reference addresses to change, such as via generational garbage collection. This allows you to have much clearer stack frames, perhaps even really stored on the thread-local/C stack, as opposed to most dynamic language interpreters, which barely use the C stack at all. Then, when the long-lived objects are discovered at scope exit time, they can be safely moved into the next memory pool, as well as letting access to old objects be locked (or copied, in the case of Software Transactional Memory). Access to objects in your own frame can therefore be fast, and the number of locks that have to be held reduced.

Ref: http://java.sun.com/docs/books/jvms/second_edition/html/Concepts.doc.html#33308 and, to a lesser extent, the note on http://java.sun.com/docs/books/jvms/second_edition/html/Threads.doc.html#22244

Perhaps to support/refute this argument, in your JVM, how do you handle:

- memory allocation: object references' timeline and garbage collection
- call stack frames and/or return continuations - the C stack or the heap?
- atomicity of functions (that's the synchronized keyword?)
- timely object destruction

I put it forward that the overall design of the interpreter, and therefore what is possible in terms of threading, is highly influenced by these factors.

When threading in C or C++ for instance (and this includes HipHop-TBB), the call stack frame is on the C stack, so shared state is possible so long as you pass heap pointers around and synchronise appropriately.
The virtual machine is of a different nature, and it can work. For JVMs, as far as I know references are temporary and again the nature of the execution environment is different. For VMs where there is basically nothing on the stack, and everything on the heap, it becomes a lot harder.

To talk about a VM I know better, Perl has about 6 internal stacks all represented on the heap: a function call/return stack, a lexical scope stack to represent what is in scope, a variable stack (the tmps stack) for variables declared in those scopes and for timely destruction, a stack to implement local($var) called the save stack, a mark stack used for garbage collection... ok, well, only 5, but I think you get my point. From my reading of the PHP internals so far there is a similar set there too, so comparisons are quite likely to be instructive.

It's a bit hard figuring out everything that is going on internally (all these internal void* types don't help either), and whether there is some inherent property of reference counting, or whether it just makes a shared state model harder, is a question I'm not sure is easy to answer.

Based on https://github.com/smarr/RoarVM/blob/98caf11d0/README.rst it can be seen that indeed it is a completely different architecture. From the first of the ACM papers' abstract:

    In addition to the cost of inter-core communication, two hardware characteristics influenced our design: the absence of hardware-provided cache-coherence, and the inability to move a single object from one core's cache to another's without changing its address.

In any case, full shared state is not required for a large set of useful parallelism APIs, and in fact contains a number of pitfalls which are difficult to explain, debug and fix. I'm far more interested in simple acceleration of tight loops - to make use of otherwise idle CPU cores (perhaps virtual as in hyperthreading) to increase throughput - and APIs like map express this well.
The idea is that the executor can start up with no variables in scope, though hopefully shared code segments, call some function on the data it is passed in, and pass the answers back to the main thread and then set about cleaning itself up.

You could probably support this with any paper on Erlang ;-)

Sam
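To make the shape of such a map-style API concrete, here is a minimal sketch. It is an assumption-laden illustration, not code from the thread: isolated_map() is a hypothetical name, not a real PHP or Zend function, and it runs sequentially. It only models the contract described above, where each call sees nothing but the data it is handed and returns an answer to the caller.

```php
<?php
// Hypothetical sketch: isolated_map() is not a real PHP or Zend API.
// It runs sequentially and only models the "no shared state" contract.

function isolated_map($fn, array $items)
{
    $results = array();
    foreach ($items as $key => $item) {
        // Hand over a serialized copy so the callback cannot alias the
        // caller's data, much as a separate executor would only see its input.
        $input = unserialize(serialize($item));
        $results[$key] = call_user_func($fn, $input);
    }
    return $results;
}

function square($n)
{
    return $n * $n;
}

$squares = isolated_map('square', array(1, 2, 3, 4));
print_r($squares);
```

Because each callback depends only on its argument, a real implementation would be free to run the calls on separate executors without changing the caller's code.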
Re: [PHP-DEV] How deep is copy on write?
What about objects?

    class Foo {
        public $foo;
    }

    function test($o) {
        $o->foo->foo->foo = 2;
    }

    $bar = new Foo;
    $bar->foo = new Foo;
    $bar->foo->foo = new Foo;
    test( $bar );

---

Also... is it better to pass an object as a parameter rather than many values?

    function withValues($anInteger, $aBool, $aString) {
        var_dump($anInteger, $aBool, $aString);
    }

    function withObject(ParamOject $o) {
        var_dump( $o->theInteger(), $o->theBool(), $o->theString() );
    }

Martin Scotta

On Wed, Jan 19, 2011 at 5:03 AM, Hannes Landeholm landeh...@gmail.com wrote:
> Using references does not speed up PHP. It does that already internally, if I'm not mistaken. The point of my post was that assigning values to tree arrays is in general faster than a full array copy.
>
> Hannes
Re: [PHP-DEV] [citations for] Re: [PHP-DEV] Experiments with a threading library for Zend: spawning a new executor
I think the point is that the PHP language itself does not provide solid constructs for writing rock-solid code. Yes, there are many programmers/hackers that can, but the effort they put in is huge.

It's so easy to break well-written, bug-free code that it's impossible for developers to share libraries, and even those who share have the problem that the language does not provide the constructs for a system to evolve without breaking its clients' code.

As you were speaking about Java, we must learn from the Java experience. All that nonsense stuff that it imposes is the same stuff that allows Java developers to share their libraries. All you need to do is put the .jar in your classpath, and that's it.

In Java you are free to extend a class --yours or imported-- without worrying about its internal implementation. Is that possible in PHP? Nope. __construct breaks that.

So instead of hacking the language, why don't we start by adding better language constructs? Look at the foreach statement and the Iterators; that is a really good example of a well-designed language construct.

I'm really interested in threads for PHP, but as a language construct. Threads are not easy; even the most experienced programmer could not get it right from scratch.

IMHO, as a simple PHP programmer, the language should provide the simplest language construct and the engine should handle all the complexity under the hood.

Martin Scotta

On Wed, Jan 19, 2011 at 8:40 AM, Sam Vilain sam.vil...@openparallel.com wrote:
> On 19/01/11 16:14, Sam Vilain wrote:
>> In general, Java's basic types typically correspond with types that can be dealt with atomically by processors, or are small enough to be passed by value. This already makes things a lot easier.
>
> I've had another reason for the differences explained to me. I'm not sure I understand it fully enough to be able to re-explain it, but I'll try anyway.
Re: [PHP-DEV] [citations for] Re: [PHP-DEV] Experiments with a threading library for Zend: spawning a new executor
hi,

On Wed, Jan 19, 2011 at 4:41 PM, Martin Scotta martinsco...@gmail.com wrote:
> I think the point is that the PHP language itself does not provide solid constructs for writing rock-solid code. Yes, there are many programmers/hackers that can, but the effort they put in is huge.

Care to enlighten me and tell me what is missing to allow one to write rock-solid code?

> It's so easy to break well-written, bug-free code that it's impossible for developers to share libraries, and even those who share have the problem that the language does not provide the constructs for a system to evolve without breaking its clients' code.

I think that most of PHP is actually thread safe. And almost all libraries are now either thread safe or used in a way that makes them thread safe. Now, about making the engine itself and the userland scripts able to implement parallelized functions for multi-core architecture (which is very disputable in a web environment, btw), that's a totally different topic and I don't think it is worth the effort.

> I'm really interested in threads for PHP, but as a language construct. Threads are not easy; even the most experienced programmer could not get it right from scratch.

Most of the time what PHP needs are non-blocking operations, not necessarily multi-threaded operations. That's what some of the newly implemented features do (like in mysqlnd, to fetch the data).

> IMHO, as a simple PHP programmer, the language should provide the simplest language construct and the engine should handle all the complexity under the hood.

Honestly, if a given part of an application needs something along this line for performance reasons, then doing that on the same box where the request is executed may be a bad idea. Tools like gearman will do a far better job and will let you do resource-intensive processing on other machines where cores may not be already busy serving other requests.
my 2 cents based on my experiences and benches in this area,

Cheers,
-- 
Pierre
@pierrejoye | http://blog.thepimp.net | http://www.libgd.org
Re: [PHP-DEV] [citations for] Re: [PHP-DEV] Experiments with a threading library for Zend: spawning a new executor
On 1/19/11 7:50 AM, Pierre Joye wrote:
> Honestly, if a given part of an application needs something along this line for performance reasons, then doing that on the same box where the request is executed may be a bad idea. Tools like gearman will do a far better job and will let you do resource-intensive processing on other machines where cores may not be already busy serving other requests.
>
> my 2 cents based on my experiences and benches in this area,

In real-world situations this is what I see as well. People either want to parallelize operations like fetching data from multiple URLs at once, where they think they need threading but actually just need to learn the async calls, or they want to background something that takes a while to finish. This second case is much better handled by a separate job manager like Gearman.

One example I have written is a rule engine that calculates a trust score for a financial transaction. The rules can get a bit complicated, so it isn't something I want to have the web request wait on. Using the Kohana framework, the call to kick off the rule engine looks like this:

    Gearman::doBackground('kohana', "gearman/payment_score/{$payment->id}");

And I have a 'kohana' gearman worker that loads the entire framework, which means my actual worker code is just another controller that looks exactly like my Web code. Any controller can be backgrounded that way, with the added advantage that I can distribute these backgrounded jobs to a pool of worker servers that are separate from my frontend web servers, but they all run the same code stack.

To me this is a much more flexible way to solve the problem than having to write thread-management code in my Web code and have my already overloaded web servers take on more work.

-Rasmus
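For readers without the Kohana wrapper, which is not shown in the thread, the same fire-and-forget pattern can be sketched against the PECL gearman extension. This is an illustration under stated assumptions: queue_controller(), the server address and the payment id are hypothetical, and only GearmanClient, addServer() and doBackground() come from the extension. The dispatcher is injectable so the sketch can be dry-run without a Gearman server.

```php
<?php
// Sketch under stated assumptions: queue_controller() and the route format
// are hypothetical; GearmanClient comes from the PECL gearman extension.

function queue_controller($route, $dispatch = null)
{
    if ($dispatch === null) {
        // Default dispatcher: submit the route as a background job for a
        // worker registered under the name 'kohana' (address is assumed).
        $dispatch = function ($function, $workload) {
            $client = new GearmanClient();
            $client->addServer('127.0.0.1', 4730);
            return $client->doBackground($function, $workload);
        };
    }
    return call_user_func($dispatch, 'kohana', $route);
}

// Dry run with a stand-in dispatcher, so no Gearman server is needed here.
$queued = array();
$fake = function ($function, $workload) use (&$queued) {
    $queued[] = array($function, $workload);
    return 'fake-job-handle';
};

$paymentId = 42; // hypothetical payment id
$handle = queue_controller("gearman/payment_score/{$paymentId}", $fake);
print_r($queued);
```

Calling queue_controller() without the second argument would use the real client, in which case the web request returns as soon as the job is queued and a separate worker pool does the scoring.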
Re: [PHP-DEV] How deep is copy on write?
On Wed, 19 Jan 2011 14:23:49 -, Martin Scotta martinsco...@gmail.com wrote:
> What about objects?

With objects less copying occurs, because the object value (zval) data is actually just a pointer and an id that for most purposes works as a pointer.

However, it should be said that while a copy of an array forces more memory to be copied, the inner zvals are not actually copied. In this snippet:

    $a = array(1, 2, array(3));
    $b = $a;
    function separate(&$dummy) { }
    separate($a);

the copy that occurs when you force the separation of the zval that is shared by $a and $b ($b = $a doesn't copy the array in $a to $b, it merely copies the zval pointer of $a to $b and increments its reference count) is just a shallow copy of the hash table and an increment of the first-level zvals' refcounts. This means the zvals that have their pointers stored in the array $a's HashTable are not themselves copied.

Interestingly (or should I say, unfortunately), this happens even if the inner zvals are references. See http://php.net/manual/en/language.references.whatdo.php, the part on arrays.

>     class Foo {
>         public $foo;
>     }
>
>     function test($o) {
>         $o->foo->foo->foo = 2;
>     }
>
>     $bar = new Foo;
>     $bar->foo = new Foo;
>     $bar->foo->foo = new Foo;
>     test( $bar );

This example shows no copying (in the sense of new zval allocation on passing or assignment) at all.

> ---
>
> Also... is it better to pass an object as a parameter rather than many values?
>
>     function withValues($anInteger, $aBool, $aString) {
>         var_dump($anInteger, $aBool, $aString);
>     }
>
>     function withObject(ParamOject $o) {
>         var_dump( $o->theInteger(), $o->theBool(), $o->theString() );
>     }

It should be indifferent. In normal circumstances, there is no zval copying at all (only the pointers of arguments' symbols are copied). Only when you start throwing references into the mix will you start forcing copies.

-- 
Gustavo Lopes
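Two of the points above can be observed from userland. The following is a minimal sketch with illustrative variable names, not code from the list: the first part shows a reference stored in an array surviving a by-value copy of that array (the behaviour described in the manual page linked above), and the second shows a nested write through an object parameter reaching the caller's object.

```php
<?php
// Sketch only; variable names are illustrative.

// 1. A reference stored in an array survives a by-value copy of the array.
$x = 1;
$a = array(&$x);
$b = $a;       // by-value assignment, but slot 0 still refers to $x
$b[0] = 2;     // the write goes through the shared reference
var_dump($x, $a[0]);

// 2. Objects are passed by handle, so nested writes reach the caller's object.
class Foo
{
    public $foo;
}

function test($o)
{
    $o->foo->foo->foo = 2;
}

$bar = new Foo;
$bar->foo = new Foo;
$bar->foo->foo = new Foo;
test($bar);
var_dump($bar->foo->foo->foo);
```

The first part is the isolation caveat raised earlier in the thread: copying an array that holds references is cheap, but writes through those slots remain visible to every copy.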
Re: [PHP-DEV] How deep is copy on write?
So it sounds like the general answer is that if you pass a complex array to a function by value and mess with it, data is duplicated for every item you modify and its direct ancestors up to the root variable, but not for the rest of the tree.

For objects, because of their pass-by-handle-type behavior, you are (usually) modifying the same data directly, so there's no duplication.

Does that sound correct?

Related: What is the overhead of a ZVal? I'm assuming it's a fixed number of bytes.

--Larry Garfield

On 1/19/11 11:27 AM, Gustavo Lopes wrote:
> With objects less copying occurs because the object value (zval) data is actually just a pointer and an id that for most purposes works as a pointer.
>
> However, it should be said that while a copy of an array forces more memory to be copied, the inner zvals are not actually copied.
Re: [PHP-DEV] How deep is copy on write?
On 19 January 2011 20:05, la...@garfieldtech.com la...@garfieldtech.com wrote:
> So it sounds like the general answer is that if you pass a complex array to a function by value and mess with it, data is duplicated for every item you modify and its direct ancestors up to the root variable but not for the rest of the tree.
>
> For objects, because of their pass by handle-type behavior you are (usually) modifying the same data directly so there's no duplication.
>
> Does that sound correct?
>
> Related: What is the overhead of a ZVal? I'm assuming it's a fixed number of bytes.

http://lmgtfy.com/?q=php+zval&l=1

Regards
Peter

-- 
<hype>
WWW: plphp.dk / plind.dk
LinkedIn: plind
BeWelcome/Couchsurfing: Fake51
Twitter: kafe15
</hype>
Re: [PHP-DEV] [citations for] Re: [PHP-DEV] Experiments with a threading library for Zend: spawning a new executor
Hi!

> I think the point is that the PHP language itself does not provide solid constructs for writing rock-solid code. Yes, there are many programmers/hackers that can, but the effort they put in is huge.

I think this is completely untrue.

> In Java you are free to extend a class --yours or imported-- without worrying about its internal implementation. Is that possible in PHP? Nope. __construct breaks that.

Could you please explain what you mean? How does __construct break extending a class?

> IMHO, as a simple PHP programmer, the language should provide the simplest language construct and the engine should handle all the complexity under the hood.

I see no way of hiding thread complexity under the hood - if you want threads, you'll need to deal with synchronization, locking, race conditions, etc. Do you see any way to avoid it?

-- 
Stanislav Malyshev, Software Architect
SugarCRM: http://www.sugarcrm.com/
(408)454-6900 ext. 227
Re: [PHP-DEV] Experiments with a threading library for Zend: spawning a new executor
Have you taken a look at Runkit_Sandbox? It may provide useful tips.
Re: [PHP-DEV] Experiments with a threading library for Zend: spawning a new executor
On 20/01/11 10:17, Ángel González wrote:
> Have you taken a look at Runkit_Sandbox? It may provide useful tips.

*headdesk*

No, I hadn't seen that. Thanks for pointing this out, it looks like exactly what I was trying to reinvent...

Cheers,
Sam.
Re: [PHP-DEV] Experiments with a threading library for Zend: spawning a new executor
On 19/01/11 23:10, Sam Vilain wrote:
> On 20/01/11 10:17, Ángel González wrote:
>> Have you taken a look at Runkit_Sandbox? It may provide useful tips.
>
> *headdesk*
>
> No, I hadn't seen that. Thanks for pointing this out, it looks like exactly what I was trying to reinvent...
>
> Cheers,
> Sam.

You may need to patch it to work on 5.3 as-is. Patches at its bugzilla are your friend. Dmitry Zenovich was going to take care of maintaining it, but I don't know if he finally got his account or not.
Re: [PHP-DEV] How deep is copy on write?
On 20/01/11 6:05 AM, la...@garfieldtech.com wrote:
> So it sounds like the general answer is that if you pass a complex array to a function by value and mess with it, data is duplicated for every item you modify and its direct ancestors up to the root variable but not for the rest of the tree.
>
> For objects, because of their pass by handle-type behavior you are (usually) modifying the same data directly so there's no duplication.
>
> Does that sound correct?

Yes.

> Related: What is the overhead of a ZVal? I'm assuming it's a fixed number of bytes.

It seems not, though a zval has a fixed size. What that size is will depend on the compiler and architecture of the system being used, or at least on the ABI.

From zend.h:

    typedef union _zvalue_value {
        long lval;                  /* long value */
        double dval;                /* double value */
        struct {
            char *val;
            int len;
        } str;
        HashTable *ht;              /* hash table value */
        zend_object_value obj;
    } zvalue_value;

    struct _zval_struct {
        /* Variable information */
        zvalue_value value;         /* value */
        zend_uint refcount__gc;
        zend_uchar type;            /* active type */
        zend_uchar is_ref__gc;
    };

The zvalue_value union will probably be 8 or 12 bytes, depending on the architecture. The whole struct will then probably be between 14 and 24 bytes, depending on the architecture and structure alignment and so on.

For my system:

    $ cd php-5.3.3
    $ ./configure
    $ cd Zend
    $ gcc -I. -I../TSRM -x c - <<END
    #include <zend.h>
    int main(void) {
        printf("%lu\n",sizeof(zval));
        return 0;
    }
    END
    $ file ./a.out
    ./a.out: Mach-O 64-bit executable
    $ ./a.out
    24
    $ gcc -I. -I../TSRM -arch i386 -x c - <<END
    #include <zend.h>
    int main(void) {
        printf("%lu\n",sizeof(zval));
        return 0;
    }
    END
    $ file ./a.out
    ./a.out: Mach-O executable i386
    $ ./a.out
    16

You can figure out what you think the overhead is from that. For a string, arguably the whole structure is overhead, since the string is stored elsewhere via pointer. Likewise for objects. For a double, the payload is 8 bytes, and stored in the zval, so there's less overhead.
An integer, with a payload of 4 bytes, is somewhere in between.

Ben.
Re: [PHP-DEV] Experiments with a threading library for Zend: spawning a new executor
Hi Sam:

(becomes off-topic here, but for the sake of argument)

On 19 Jan 2011, at 04:14, Sam Vilain wrote:

On 19/01/11 10:50, Stefan Marr wrote:

On 18 Jan 2011, at 22:16, Sam Vilain wrote:

there doesn't seem to be an interpreter under the sun which has successfully pulled off threading with shared data.

Could you explain what you mean by that statement? Sorry, but that's my topic, and the best-known interpreters that 'pulled off' threading with shared data are for Java. The interpreter I am working on is for manycore systems (running on a 64-core Tilera chip) and executes Smalltalk (https://github.com/smarr/RoarVM).

You raise a very good point. My statement is too broad and should probably apply only to dynamic languages executed on reference-counted VMs. Look at some major ones - PHP, Python, Ruby, Perl, most JS engines - none of them actually thread properly.

Ok, but the reason here is that building such VMs is inherently complex. And it has nothing to do with dynamic or not, with typed or whatever. The mentioned languages happen to be very successful in the domain of web applications, and as others already mentioned, the need for fine-grained shared-memory parallelism here is not clear. So, why don't we have Python without the GIL? Because nobody cared enough. However, there is still JRuby...

Well, Perl's threading does run full speed, but actually copies every variable on the heap for each new thread, massively bloating the process.

Cutting corners is the only way if you do not have a great team of engineers. For the RoarVM we also have to cut more corners than we would like.

So the question is why should this be so, if C++ and Java, even interpreted on a JVM, can do it?

JVMs suffer from the same complexity. And C++, well, last time I checked there was just no threading model. There will be a memory model in C++0x, but there is nothing which makes it inherently hard to implement.
That is because you don't get any guarantees (besides the memory model semantics), and you don't have any GC either.

In general, Java's basic types typically correspond with types that can be dealt with atomically by processors, or are small enough to be passed by value. This already makes things a lot easier.

I don't think that buys you anything. Which basic types can be passed by copy? Ints, and bools perhaps. That takes a bit of pressure off the GC, but does not really help with making things safe. Smalltalk does not know basic types. However, it knows an implementation technique called tagged pointers/tagged integers. This allows you to have 31-bit integers, since pointers are aligned and do not need all their bits. However, that really helps only with GC pressure.

I've had another reason for the differences explained to me. I'm not sure I understand it fully enough to be able to re-explain it, but I'll try anyway. As I grasped the concept, the key to making VMs fully threadable with shared state is to first allow reference addresses to change, such as via generational garbage collection.

Hm, there is usually the wish that you can run your GC threads in parallel with mutator threads; here it is indeed helpful to support moving GCs. But how does it help with threads working in parallel on some shared object? Any point where an object is allowed to move requires synchronization. So either someone has to change the pointer you own to that object, or you need an additional level of indirection. I guess you are talking here about having such an additional indirection, object handles?

This allows you to have much clearer stack frames, perhaps even really stored on the thread-local/C stack, as opposed to most dynamic language interpreters, which barely use the C stack at all.

Why does having object handles give you a better stack frame layout? Using the C stack can be helpful for performance but, well, makes other language features harder to implement. For instance, what about closures?
Other techniques, like recycling your stack-frame objects, are usually a simpler optimization that does not make it harder to implement things like closures.

Then, when the long-lived objects are discovered at scope exit time they can be safely moved into the next memory pool,

Ui ui ui. Slooow. I don't follow. Ok, there are things like escape analysis. And then there are techniques like on-stack allocation. Both are usually done in JIT compilers, not so much in interpreters. Are we still talking about interpreters? Or are you implying an incremental GC that is triggered on the return of method calls?

as well as letting access to old objects be locked (or copied, in the case of Software Transactional Memory).

There are too many things discussed here in a single sentence. Sorry, I am lost.

Access to objects in your own frame can therefore be fast, and the number of locks that have to be held reduced.

Ok, on-stack allocation and biased locking?

- memory allocation: object references' timeline and
Re: [PHP-DEV] [citations for] Re: [PHP-DEV] Experiments with a threading library for Zend: spawning a new executor
Many PHP features should be language constructs, but they were made as language hacks. __construct is evil, like any other language hack. It does not provide a safe foundation on which to build safe abstractions or reusable and extensible components, which leads to the lack of PHP libraries.

Let's suppose there is a library that provides a utility class, which has no superclass and no constructor.

    // lives in library.phar
    class Utility { }

A client uses this class by extending it:

    // includes library.phar
    class Client extends Utility {
        function __construct() {
            // client initialization code here
        }
    }

At that point the Utility class cannot add __construct safely: if it does, Client will break it, because Client does not call the parent constructor.

But what happens if Utility provides a __construct?

    class Utility {
        function __construct() {
            // Utility initialization here
        }
    }

    class Client extends Utility {
        function __construct() {
            // some code
            parent::__construct(); // as a good client, call the superclass
            // and then more code
        }
    }

In this case Utility is forced to keep __construct; if it's removed, the Client call will fail because parent::__construct will not exist.

In both cases there were no API changes; only the way the objects are initialized changed. My point is that the language does not provide solid foundations (aka language constructs) for systems and libraries to evolve in a safe way.

Martin Scotta

On Wed, Jan 19, 2011 at 4:58 PM, Stas Malyshev smalys...@sugarcrm.com wrote:

Hi!

I think the point is that the php language itself does not provide solid constructs for writing rock-solid code. Yes, there are many programmers/hackers that can, but the effort they put in is huge.

I think this is completely untrue.

In Java you are free to extend a class --yours or imported-- without worrying about its internal implementation. Is that possible in PHP? nope. __construct breaks that.

Could you please explain what you mean? How does __construct break extending a class?
IMHO, as a simple PHP programmer, the language should provide the simplest language construct and the engine should handle all the complexity under the hood. I see no way of hiding threads complexity under the hood - if you want threads, you'll need to deal with synchronization, locking, race conditions, etc. Do you see any way to avoid it? -- Stanislav Malyshev, Software Architect SugarCRM: http://www.sugarcrm.com/ (408)454-6900 ext. 227
Re: [PHP-DEV] [citations for] Re: [PHP-DEV] Experiments with a threading library for Zend: spawning a new executor
Hi!

Many PHP features should be language constructs, but they were made as language hacks. __construct is evil, like any other language hack.

Constructors are a standard feature in many languages. There's nothing evil in them.

    class Client extends Utility {
        function __construct() {
            // some code
            parent::__construct(); // as a good client, call the superclass
            // and then more code
        }
    }

Arguably, initialization is part of the API, but I see your point - it might be useful to supply all objects with an empty default ctor so that parent::__construct() always works. Submit a feature request to bugs.php.net.

-- Stanislav Malyshev, Software Architect SugarCRM: http://www.sugarcrm.com/ (408)454-6900 ext. 227
Re: [PHP-DEV] How deep is copy on write?
On Wednesday, January 19, 2011 4:45:14 pm Ben Schmidt wrote:

Related: What is the overhead of a ZVal? I'm assuming it's a fixed number of bytes.

It seems not, though a zval has a fixed size. What that size is will depend on the compiler and architecture of the system being used, or at least on the ABI.

Ah, yes, of course. Oh C...

*snip*

The zvalue_value union will probably be 8 or 12 bytes, depending on the architecture. The whole struct will then probably be between 14 and 24 bytes, depending on the architecture and structure alignment and so on.

*snip*

You can figure out what you think the overhead is from that. For a string, arguably the whole structure is overhead, since the string is stored elsewhere via pointer. Likewise for objects. For a double, the payload is 8 bytes, and stored in the zval, so there's less overhead. An integer, with a payload of 4 bytes, is somewhere in between.

Hm. OK, so if I'm assuming a 64-bit architecture (most servers these days, I'd think) and just looking for a rough approximation, it sounds like 20 bytes per zval/variable is a not unreasonable estimation. At least close enough for determining the memory overhead of a general algorithm.

Thanks again!

--Larry Garfield