Re: [PHP-DEV] How deep is copy on write?

2011-01-19 Thread Hannes Landeholm
Using references does not speed up PHP. PHP already avoids unnecessary
copies internally (copy-on-write), if I'm not mistaken. The point of my
post was that assigning values to tree arrays is in general faster than a
full array copy.

Hannes

On 19 January 2011 08:36, Ben Schmidt mail_ben_schm...@yahoo.com.au wrote:
 Yep. PHP does clock up memory very quickly for big arrays, objects with lots
 of members and/or lots of small objects with large overheads. There are a
 LOT of zvals and zobjects and things around the place, and their overhead
 isn't all that small.

 Of course, if you go to the trouble to construct arrays using references,
 you can avoid some of that, because a copy-on-write will just copy the
 reference. It does mean you're passing references, though.

 $bar['baz'] = 1;
 $poink['narf'] = 1;
 $a['foo']['bar'] = &$bar;
 $a['foo']['poink'] = &$poink;

 Then if you test($a), $bar and $poink will be changed, since they are
 'passed by reference'--no copying needs to be done. It's almost as if $b
 were passed by reference, but setting $b['blip'] wouldn't show up in $a,
 because $a itself would be copied in that case, including the references,
 which would continue to refer to $bar and $poink. So a much quicker copy,
 but obviously not the same level of isolation that you might expect or
 desire. Unless you did some jiggerypokery like $b_bar=$b['bar'];
 $b['bar']=$b_bar; which would break the reference and make a copy of just
 that part of the array. But this is a pretty nasty caller-callee
 co-operative kind of thing. Just a thought to throw into the mix, though.

 Disclaimer: I'm somewhat out of my depth here. But I'm sure someone will
 jump on me if I'm wrong.

 Ben.



 On 19/01/11 6:09 PM, Larry Garfield wrote:

 That's what I was afraid of.  So it does copy the entire array.  Crap. :-)

 Am I correct that each level in the array represents its own ZVal, with
 the additional memory overhead a ZVal has (however many bytes that is)?

 That is, the array below would have $a, foo, bar, baz, bob, narf, poink,
 poink/narf = 8 ZVals?  (That seems logical to me because each is its own
 variable that just happens to be an array, but I want to be sure.)

 --Larry Garfield

 On Wednesday, January 19, 2011 1:01:44 am Ben Schmidt wrote:

 It does the whole of $b. It has to, because when you change 'baz', a
 reference in 'bar' needs to change to point to the newly copied 'baz', so
 'bar' is written...and likewise 'foo' is written.

 Ben.

 On 19/01/11 5:45 PM, Larry Garfield wrote:

 Hi folks.  I have a question about the PHP runtime that I hope is
 appropriate for this list.  (If not, please thwap me gently; I bruise
 easily.)

 I know PHP does copy-on-write.  However, how deeply does it copy when
 dealing with nested arrays?

 This is probably easiest to explain with an example...

 $a['foo']['bar']['baz'] = 1;
 $a['foo']['bar']['bob'] = 1;
 $a['foo']['bar']['narf'] = 1;
 $a['foo']['poink']['narf'] = 1;

 function test($b) {

    // Assume each of the following lines in isolation...

    // Does this copy just the one variable baz, or the full array?
    $b['foo']['bar']['baz'] = 2;

    // Does this copy $b, or just $b['foo']['poink']?
    $b['foo']['poink']['stuff'] = 3;

    return $b;

 }

 // I know this is wasteful; I'm trying to figure out just how wasteful.
 $a = test($a);

 test() in this case should take $b by reference, but I'm trying to
 determine how much of a difference it is.  (In practice my use case has
 a vastly larger array, so any inefficiencies are multiplied.)

 --Larry Garfield


 --
 PHP Internals - PHP Runtime Development Mailing List
 To unsubscribe, visit: http://www.php.net/unsub.php






[PHP-DEV] [citations for] Re: [PHP-DEV] Experiments with a threading library for Zend: spawning a new executor

2011-01-19 Thread Sam Vilain
On 19/01/11 16:14, Sam Vilain wrote:
 In general, Java's basic types typically correspond with types that can
 be dealt with atomically by processors, or are small enough to be passed
 by value.  This already makes things a lot easier.

 I've had another reason for the differences explained to me.  I'm not
 sure I understand it fully enough to be able to re-explain it, but I'll
 try anyway.  As I grasped the concept, the key to making VMs fully
 threadable with shared state, is to first allow reference addresses to
 change, such as via generational garbage collection.  This allows you to
 have much clearer stack frames, perhaps even really stored on the
 thread-local/C stack, as opposed to most dynamic language interpreters
 which barely use the C stack at all.  Then, when the long-lived objects
 are discovered at scope exit time they can be safely moved into the next
 memory pool, as well as letting access to old objects be locked (or
 copied, in the case of Software Transactional Memory).  Access to
 objects in your own frame can therefore be fast, and the number of locks
 that have to be held reduced.

Ref:
http://java.sun.com/docs/books/jvms/second_edition/html/Concepts.doc.html#33308
and to a lesser extent, the note on
http://java.sun.com/docs/books/jvms/second_edition/html/Threads.doc.html#22244

 Perhaps to support/refute this argument, in your JVM, how do you handle:

 - memory allocation: object references' timeline and garbage collection
 - call stack frames and/or return continuations - the C stack or the heap?
 - atomicity of functions (that's the synchronized keyword?)
 - timely object destruction

 I put it forward that the overall design of the interpreter, and
 therefore what is possible in terms of threading, is highly influenced
 by these factors.

 When threading in C or C++ for instance (and this includes HipHop-TBB),
 the call stack frame is on the C stack, so shared state is possible so
 long as you pass heap pointers around and synchronise appropriately. 
 The virtual machine is of a different nature, and it can work.  For
 JVMs, as far as I know references are temporary and again the nature of
 the execution environment is different.

 For VMs where there is basically nothing on the stack, and everything on
 the heap, it becomes a lot harder.  To talk about a VM I know better,
 Perl has about 6 internal stacks all represented on the heap; a function
 call/return stack, a lexical scope stack to represent what is in scope,
 a variable stack (the tmps stack) for variables declared in those
 scopes and for timely destruction, a stack to implement local($var)
 called the save stack, a mark stack used for garbage collection, ok
 well only 5 but I think you get my point.  From my reading of the PHP
 internals so far there is a similar set there too, so comparisons are
 quite likely to be instructive.  It's a bit hard figuring out everything
 that is going on internally (all these internal void* types don't help
 either), and whether or not there is some inherent property of reference
 counting, or whether it just makes a shared state model harder, is a
 question I'm not sure is easy to answer.


Based on https://github.com/smarr/RoarVM/blob/98caf11d0/README.rst it
can be seen that indeed it is a completely different architecture.  From
the first of the ACM papers' abstract:

In addition to the cost of inter-core communication, two hardware
characteristics influenced our design: the absence of hardware-provided
cache-coherence, and the inability to move a single object from one
core's cache to another's without changing its address.

 In any case, full shared state is not required for a large set of useful
 parallelism APIs, and in fact contains a number of pitfalls which are
 difficult to explain, debug and fix.  I'm far more interested in simple
 acceleration of tight loops - to make use of otherwise idle CPU cores
 (perhaps virtual as in hyperthreading) to increase throughput - and APIs
 like map express this well.  The idea is that the executor can start
 up with no variables in scope, though hopefully shared code segments,
 call some function on the data it is passed in, and pass the answers
 back to the main thread and then set about cleaning itself up.

You could probably support this with any paper on Erlang ;-)

Sam




Re: [PHP-DEV] How deep is copy on write?

2011-01-19 Thread Martin Scotta
What about objects?

class Foo {
    public $foo;
}

function test($o) {
    $o->foo->foo->foo = 2;
}

$bar = new Foo;
$bar->foo = new Foo;
$bar->foo->foo = new Foo;

test( $bar );

---
Also... is it better to pass an object as a parameter rather than many
values?

function withValues($anInteger, $aBool, $aString) {
   var_dump($anInteger, $aBool, $aString);
}

function withObject(ParamOject $o) {
   var_dump( $o->theInteger(), $o->theBool(), $o->theString() );
}

 Martin Scotta






Re: [PHP-DEV] [citations for] Re: [PHP-DEV] Experiments with a threading library for Zend: spawning a new executor

2011-01-19 Thread Martin Scotta
I think the point is that the PHP language itself does not provide solid
constructs for writing rock-solid code. Yes, there are many
programmers/hackers who can, but the effort they put in is huge.

It's so easy to break well-written, bug-free code that it's impossible for
developers to share libraries, and even those who do share have the problem
that the language does not provide the constructs that would let a system
evolve without breaking its clients' code.

As you were speaking about Java, we must learn from Java's experience. All
that nonsense stuff that it imposes is the same stuff that allows Java
developers to share their libraries. All you need to do is put the .jar in
your classpath, and that's it.

In Java you are free to extend a class (yours or imported) without worrying
about its internal implementation. Is that possible in PHP? Nope.
__construct breaks that.

So instead of hacking the language, why don't we start by adding better
language constructs?
Look at the foreach statement and the Iterators; that is a really good
example of a well-designed language construct.

I'm really interested in threads for PHP, but as a language construct.
Threads are not easy; even the most experienced programmer could not get
them right from scratch.

IMHO, as a simple PHP programmer, the language should provide the simplest
language construct and the engine should handle all the complexity under the
hood.

 Martin Scotta



Re: [PHP-DEV] [citations for] Re: [PHP-DEV] Experiments with a threading library for Zend: spawning a new executor

2011-01-19 Thread Pierre Joye
hi,

On Wed, Jan 19, 2011 at 4:41 PM, Martin Scotta martinsco...@gmail.com wrote:
 I think the point is that the php language itself does not provide solid
 construct for writing rock-solid code. Yes, there are many
 programmers/hackers that can, but the effort they put is huge.

Care to enlighten me and tell me what is missing to allow one to write
rock-solid code?

 it's so easy to break well-written bug-free code, that's impossible for
 developers to share libraries, and even those who share has the problems
 that the language does not provides the language construct for the system to
 evolve without breaking its clients code.

I think that most of PHP is actually thread safe. And almost all
libraries are now either thread safe or used in a way that makes them
thread safe.

Now, about making the engine itself and the userland scripts able to
implement parallelized functions for multi-core architecture (which is
very disputable in a web environment, btw), that's a totally different
topic and I don't think it is worth the effort.


 I'm really interested on threads for PHP, but as a language construct.
 Threads are not easy, even the most experienced programmer could not get it
 right from the scratch.

Most of the time what PHP needs are non-blocking operations, not
necessarily multi-threaded operations. That's what some of the newly
implemented features do (like in mysqlnd, to fetch the data).

 IMHO, as a simple PHP programmer, the language should provide the simplest
 language construct and the engine should handle all the complexity under the
 hood.

Honestly, if a given part of an application needs something along these
lines for performance reasons, then doing that on the same box where
the request is executed may be a bad idea. Tools like gearman will do
a far better job and will let you do resource-intensive processing on
other machines where cores may not be already busy serving other
requests.

my 2 cents based on my experiences and benches in this area,

Cheers,
-- 
Pierre

@pierrejoye | http://blog.thepimp.net | http://www.libgd.org




Re: [PHP-DEV] [citations for] Re: [PHP-DEV] Experiments with a threading library for Zend: spawning a new executor

2011-01-19 Thread Rasmus Lerdorf
On 1/19/11 7:50 AM, Pierre Joye wrote:
 Honestly if a given part of an application needs something along this
 line for performance reasons, then doing that on the same box where
 the request is executed may be a bad idea. Tools like gearman will do
 a far better jobs and will let you do resource intensive processing on
 other machines where cores may not be already busy serving other
 requests.
 
 my 2 cents based on my experiences and benches in this area,

In real-world situations this is what I see as well.  People either want
to parallelize operations like fetching data from multiple URLs at once,
where they think they need threading, but actually just need to learn
the async calls, or they want to background something that takes a while
to finish.  This second case is much better handled by a separate job
manager like Gearman.

One example I have written is a rule engine that calculates a trust
score for a financial transaction.  The rules can get a bit complicated
so it isn't something I want to have the web request wait on.  Using the
Kohana framework the call to kick off the rule engine looks like this:

Gearman::doBackground('kohana', "gearman/payment_score/{$payment->id}")

And I have a 'kohana' gearman worker that loads the entire framework
which means my actual worker code is just another controller that looks
exactly like my Web code.  Any controller can be backgrounded that way
with the added advantage that I can distribute these backgrounded jobs
to a pool of worker servers that are separate from my frontend web
servers, but they all run the same code stack.  To me this is a much
more flexible way to solve the problem than having to write
thread-management code in my Web code and have my already overloaded web
servers take on more work.

-Rasmus




Re: [PHP-DEV] How deep is copy on write?

2011-01-19 Thread Gustavo Lopes
On Wed, 19 Jan 2011 14:23:49 -, Martin Scotta martinsco...@gmail.com  
wrote:



What about objects?


With objects less copying occurs because the object value (zval) data is  
actually just a pointer and an id that for most purposes works as a  
pointer.


However, it should be said that while a copy of an array forces more  
memory to be copied, the inner zvals are not actually copied. In this  
snippet:


$a = array(1, 2, array(3));
$b = $a;
function separate(&$dummy) { }
separate($a);

the copy that occurs when you force the separation of the zval that is
shared by $a and $b ($b = $a doesn't copy the array in $a to $b, it merely
copies the zval pointer of $a to $b and increments its reference count) is
just a shallow copy of the hash table and an increment of the first-level
zvals' refcounts. This means the zvals that have their pointers stored in
the array $a's HashTable are not themselves copied.
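
A rough way to observe the shallow copy from userland is to watch
memory_get_usage() around the writes. This is only a sketch: the array shape,
the sizes and the variable names are assumptions made for illustration, and
the exact byte counts vary by PHP version and platform, so none are quoted
here.

```php
<?php
// Sketch: observe copy-on-write separation through memory_get_usage().
// The array shape and the element count are illustrative assumptions.

$a = array('big' => range(1, 100000), 'small' => 1);

$base = memory_get_usage();

$b = $a;            // nothing is duplicated yet, $a and $b share one array
$b['small'] = 2;    // write: only the outer hash table is duplicated
$outerCost = memory_get_usage() - $base;

$b['big'][] = 0;    // write inside 'big': now the inner array is duplicated too
$innerCost = memory_get_usage() - $base;

printf("after outer write: %d bytes\n", $outerCost);
printf("after inner write: %d bytes\n", $innerCost);
```

The first figure should be small compared with the second, because the large
inner array stays shared until something writes into it.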


Interestingly (or should I say, unfortunately), this happens even if the  
inner zvals are references. See  
http://php.net/manual/en/language.references.whatdo.php the part on arrays.
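
The behaviour that manual page describes can be reproduced with a few lines.
This is a minimal sketch modelled on the manual's array example; the variable
names are chosen here for illustration.

```php
<?php
// Sketch: a reference stored inside an array survives a plain array copy.

$arr = array(1);
$ref = &$arr[0];   // $ref and $arr[0] now belong to the same reference set
$copy = $arr;      // plain assignment, NOT assignment by reference
$copy[0]++;        // the copied element is still the shared reference

var_dump($ref, $arr[0], $copy[0]);   // all three are int(2)
```

So copying an array gives you an independent hash table, but any element that
was a reference at the time of the copy stays shared between the two arrays.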




class Foo {
    public $foo;
}

function test($o) {
    $o->foo->foo->foo = 2;
}

$bar = new Foo;
$bar->foo = new Foo;
$bar->foo->foo = new Foo;

test( $bar );


This example shows no copying (in the sense of new zval allocation on  
passing or assignment) at all.
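
To make the handle behaviour concrete, here is a small sketch built on the
example above. The replace() function is a hypothetical addition, included
only to show what passing the handle by value does not let the callee do.

```php
<?php
// Sketch: objects are passed as handles, so writes through the handle are
// visible to the caller, while rebinding the parameter is not.

class Foo {
    public $foo;
}

function test($o) {
    // $o is a copy of the handle, not of the object graph
    $o->foo->foo->foo = 2;
}

function replace($o) {
    // hypothetical helper: rebinding the local handle leaves the caller alone
    $o = new Foo;
    $o->foo = 'other';
}

$bar = new Foo;
$bar->foo = new Foo;
$bar->foo->foo = new Foo;

test($bar);
var_dump($bar->foo->foo->foo);   // int(2)

replace($bar);
var_dump($bar->foo instanceof Foo);   // bool(true)
```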




---
Also... is it better to pass an object as a parameter rather than many
values?

function withValues($anInteger, $aBool, $aString) {
   var_dump($anInteger, $aBool, $aString);
}

function withObject(ParamOject $o) {
   var_dump( $o->theInteger(), $o->theBool(), $o->theString() );
}



It should make no difference. In normal circumstances, there is no zval
copying at all (only the pointers of arguments' symbols are copied). Only
when you start throwing references into the mix will you start forcing
copies.



--
Gustavo Lopes




Re: [PHP-DEV] How deep is copy on write?

2011-01-19 Thread la...@garfieldtech.com
So it sounds like the general answer is that if you pass a complex array 
to a function by value and mess with it, data is duplicated for every 
item you modify and its direct ancestors up to the root variable but not 
for the rest of the tree.
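
The caller-visible side of that summary can be checked with the array from
the original question. This sketch only verifies value isolation; which hash
tables were actually duplicated is not observable at this level. The $result
variable is introduced here for illustration.

```php
<?php
// Sketch: by-value array arguments are isolated from the caller's copy,
// whatever the engine shares internally.

$a = array();
$a['foo']['bar']['baz'] = 1;
$a['foo']['bar']['bob'] = 1;
$a['foo']['bar']['narf'] = 1;
$a['foo']['poink']['narf'] = 1;

function test($b) {
    $b['foo']['bar']['baz'] = 2;       // separates $b, 'foo' and 'bar'
    $b['foo']['poink']['stuff'] = 3;   // separates 'poink' as well
    return $b;
}

$result = test($a);

var_dump($a['foo']['bar']['baz']);              // int(1), caller untouched
var_dump($result['foo']['bar']['baz']);         // int(2)
var_dump(isset($a['foo']['poink']['stuff']));   // bool(false)
```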


For objects, because of their pass by handle-type behavior you are 
(usually) modifying the same data directly so there's no duplication.


Does that sound correct?

Related: What is the overhead of a ZVal?  I'm assuming it's a fixed 
number of bytes.


--Larry Garfield




Re: [PHP-DEV] How deep is copy on write?

2011-01-19 Thread Peter Lind
On 19 January 2011 20:05, la...@garfieldtech.com la...@garfieldtech.com wrote:
 So it sounds like the general answer is that if you pass a complex array to
 a function by value and mess with it, data is duplicated for every item you
 modify and its direct ancestors up to the root variable but not for the rest
 of the tree.

 For objects, because of their pass by handle-type behavior you are
 (usually) modifying the same data directly so there's no duplication.

 Does that sound correct?

 Related: What is the overhead of a ZVal?  I'm assuming it's a fixed number
 of bytes.


http://lmgtfy.com/?q=php+zval&l=1

Regards
Peter

-- 
hype
WWW: plphp.dk / plind.dk
LinkedIn: plind
BeWelcome/Couchsurfing: Fake51
Twitter: kafe15
/hype




Re: [PHP-DEV] [citations for] Re: [PHP-DEV] Experiments with a threading library for Zend: spawning a new executor

2011-01-19 Thread Stas Malyshev

Hi!


I think the point is that the php language itself does not provide solid
construct for writing rock-solid code. Yes, there are many
programmers/hackers that can, but the effort they put is huge.


I think this is completely untrue.


In Java you are free to extend a class --yours or imported-- without worries
about it's internal implementation. Is that possible in PHP? nope.
__construct breaks that.


Could you please explain what you mean? How does __construct break
extending a class?



IMHO, as a simple PHP programmer, the language should provide the simplest
language construct and the engine should handle all the complexity under the
hood.


I see no way of hiding threads complexity under the hood - if you want 
threads, you'll need to deal with synchronization, locking, race 
conditions, etc. Do you see any way to avoid it?

--
Stanislav Malyshev, Software Architect
SugarCRM: http://www.sugarcrm.com/
(408)454-6900 ext. 227




Re: [PHP-DEV] Experiments with a threading library for Zend: spawning a new executor

2011-01-19 Thread Ángel González
Have you taken a look at Runkit_Sandbox? It may provide useful tips.




Re: [PHP-DEV] Experiments with a threading library for Zend: spawning a new executor

2011-01-19 Thread Sam Vilain
On 20/01/11 10:17, Ángel González wrote:
 Have you taken a look at Runkit_Sandbox? It may provide useful tips.

*headdesk*

No, I hadn't seen that.  Thanks for pointing this out, it looks like
exactly what I was trying to reinvent...

Cheers,
Sam.




Re: [PHP-DEV] Experiments with a threading library for Zend: spawning a new executor

2011-01-19 Thread Ángel González
On 19/01/11 23:10, Sam Vilain wrote:
 On 20/01/11 10:17, Ángel González wrote:
 Have you taken a look at Runkit_Sandbox? It may provide useful tips.
 *headdesk*

 No, I hadn't seen that.  Thanks for pointing this out, it looks like
 exactly what I was trying to reinvent...

 Cheers,
 Sam.

You may need to patch it to work on 5.3 as-is. Patches at its bugzilla
are your friend.

Dmitry Zenovich was going to take care of maintaining it, but I don't know if
he finally got his account or not.







Re: [PHP-DEV] How deep is copy on write?

2011-01-19 Thread Ben Schmidt

On 20/01/11 6:05 AM, la...@garfieldtech.com wrote:

So it sounds like the general answer is that if you pass a complex array to a
function by value and mess with it, data is duplicated for every item you modify
and its direct ancestors up to the root variable but not for the rest of the 
tree.

For objects, because of their pass by handle-type behavior you are (usually)
modifying the same data directly so there's no duplication.

Does that sound correct?


Yes.


Related: What is the overhead of a ZVal? I'm assuming it's a fixed
number of bytes.


It seems not, though a zval has a fixed size. What that size is will
depend on the compiler and architecture of the system being used, or at
least on the ABI.

From zend.h:

typedef union _zvalue_value {
    long lval;                  /* long value */
    double dval;                /* double value */
    struct {
        char *val;
        int len;
    } str;
    HashTable *ht;              /* hash table value */
    zend_object_value obj;
} zvalue_value;

struct _zval_struct {
    /* Variable information */
    zvalue_value value;         /* value */
    zend_uint refcount__gc;
    zend_uchar type;            /* active type */
    zend_uchar is_ref__gc;
};

The zvalue_value union will probably be 8 or 12 bytes, depending on the
architecture. The whole struct will then probably be between 14 and 24
bytes, depending on the architecture and structure alignment and so on.

For my system:

$ cd php-5.3.3
$ ./configure
$ cd Zend
$ gcc -I. -I../TSRM -x c - <<END

#include "zend.h"
int main(void) {
   printf("%lu\n", (unsigned long) sizeof(zval));
   return 0;
}
END

$ file ./a.out
./a.out: Mach-O 64-bit executable
$ ./a.out
24
$ gcc -I. -I../TSRM -arch i386 -x c - <<END

#include "zend.h"
int main(void) {
   printf("%lu\n", (unsigned long) sizeof(zval));
   return 0;
}
END

$ file ./a.out
./a.out: Mach-O executable i386
$ ./a.out
16

You can figure out what you think the overhead is from that. For a
string, arguably the whole structure is overhead, since the string is
stored elsewhere via pointer. Likewise for objects. For a double, the
payload is 8 bytes, and stored in the zval, so there's less overhead. An
integer, with a payload of 4 bytes, is somewhere in between.
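
As a rough cross-check of the two measured sizes, the struct arithmetic can be
redone by hand. The sketch below is only an estimate under stated assumptions:
the primitive sizes and alignment passed in are typical values for x86-64 and
i386, and zend_object_value is assumed to hold an unsigned int handle plus one
pointer. The helper names alignUp() and estimateZvalSize() are made up for this
example. Under these assumptions the union comes out at 16 bytes on x86-64 and
8 bytes on i386.

```php
<?php
// Round $size up to the next multiple of $align.
function alignUp(int $size, int $align): int
{
    return intdiv($size + $align - 1, $align) * $align;
}

// Estimate the PHP 5.3 zval layout from assumed primitive sizes (in bytes).
// $wordAlign is the assumed strictest alignment the ABI applies inside structs.
function estimateZvalSize(int $long, int $pointer, int $wordAlign): array
{
    $int = 4;
    $double = 8;
    $uchar = 1;

    // str { char *val; int len; }
    $str = alignUp($pointer + $int, $wordAlign);
    // zend_object_value, assumed to be { unsigned int handle; pointer handlers; }
    $obj = alignUp(alignUp($int, $wordAlign) + $pointer, $wordAlign);

    // A union is as large as its largest member.
    $union = max($long, $double, $str, $pointer, $obj);

    // value + refcount__gc (zend_uint) + type + is_ref__gc, then tail padding.
    $zval = alignUp($union + $int + 2 * $uchar, $wordAlign);

    return ['union' => $union, 'zval' => $zval];
}

$lp64 = estimateZvalSize(8, 8, 8);   // assumed x86-64 sizes
$i386 = estimateZvalSize(4, 4, 4);   // assumed i386 sizes

printf("x86-64: union=%d zval=%d\n", $lp64['union'], $lp64['zval']);
printf("i386:   union=%d zval=%d\n", $i386['union'], $i386['zval']);
```

With these inputs the estimate agrees with the 24 and 16 printed by the
compiled test programs.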

Ben.





--Larry Garfield

On 1/19/11 11:27 AM, Gustavo Lopes wrote:

On Wed, 19 Jan 2011 14:23:49 -, Martin Scotta
martinsco...@gmail.com wrote:


What about objects?


With objects less copying occurs because the object value (zval) data is
actually just a pointer and an id that for most purposes works as a
pointer.

However, it should be said that while a copy of an array forces more
memory to be copied, the inner zvals are not actually copied. In this
snippet:

$a = array(1, 2, array(3));
$b = $a;
function separate(&$dummy) { }
separate($a);

the copy that occurs when you force the separation of the zval that is
shared by $a and $b ($b = $a doesn't copy the array in $a to $b, it
merely copies the zval pointer of $a to $b and increments its reference
count) is just a shallow copy of the hash table and an increment of the
first-level zvals' refcounts. This means the zvals that have their pointers
stored in the array $a's HashTable are not themselves copied.

Interestingly (or should I say, unfortunately), this happens even if the
inner zvals are references. See
http://php.net/manual/en/language.references.whatdo.php the part on arrays.
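
A short sketch of that effect, closely following the array example on the
manual page linked above. The variable names are illustrative. Because the
element is part of a reference set when the array is copied, the copy shares
that one element with the original.

```php
<?php
$arr = array(1);
$ref =& $arr[0];   // $ref and $arr[0] now belong to the same reference set

$copy = $arr;      // a plain assignment, not an assignment by reference
$copy[0]++;        // ...yet the write goes through the shared element

var_dump($arr[0]); // int(2)
var_dump($ref);    // int(2)
```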



class Foo {
    public $foo;
}

function test($o) {
    $o->foo->foo->foo = 2;
}

$bar = new Foo;
$bar->foo = new Foo;
$bar->foo->foo = new Foo;

test( $bar );


This example shows no copying (in the sense of new zval allocation on
passing or assignment) at all.
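
A small sketch of what handle-style passing means in practice, written for
PHP 7.1 or later. The class Node and the functions mutate() and rebind() are
hypothetical. Mutating through the parameter changes the caller's object, while
rebinding the parameter only changes the callee's local variable.

```php
<?php
class Node
{
    public $value = 1;
    public $child = null;
}

// Writes through the handle: the caller sees this change.
function mutate(Node $n): void
{
    $n->child->value = 2;
}

// Rebinds the local variable only: the caller's object is unaffected.
function rebind(Node $n): void
{
    $n = new Node();
    $n->value = 99;
}

$root = new Node();
$root->child = new Node();

mutate($root);
rebind($root);

var_dump($root->child->value); // int(2)
var_dump($root->value);        // int(1)
```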



---
Also... is it better to pass an object as a parameter rather than many
values?

function withValues($anInteger, $aBool, $aString) {
    var_dump($anInteger, $aBool, $aString);
}

function withObject(ParamOject $o) {
    var_dump( $o->theInteger(), $o->theBool(), $o->theString() );
}



It should be indifferent. In normal circumstances, there is no zval
copying at all (only the pointers of arguments' symbols are copied).
Only when you start throwing references into the mix will you start
forcing copies.









Re: [PHP-DEV] Experiments with a threading library for Zend: spawning a new executor

2011-01-19 Thread Stefan Marr
Hi Sam:

(becomes off-topic here, but for the sake of argument)

On 19 Jan 2011, at 04:14, Sam Vilain wrote:

 On 19/01/11 10:50, Stefan Marr wrote:
 On 18 Jan 2011, at 22:16, Sam Vilain wrote:
 there doesn't seem to
 be an interpreter under the sun which has successfully pulled off
 threading with shared data.
 Could you explain what you mean with that statement?
 
 Sorry, but that's my topic, and the best-known interpreters that 'pulled
 off' threading with shared data are for Java. The interpreter I am working
 on is for manycore systems (running on a 64-core Tilera chip) and executes
 Smalltalk (https://github.com/smarr/RoarVM).
 
 You raise a very good point.  My statement is too broad and should
 probably apply only to dynamic languages, executed on reference counted
 VMs.  Look at some major ones - PHP, Python, Ruby, Perl, most JS engines
 - none of them actually thread properly.
Ok, but the reason here is that building such VMs is inherently complex.
And it has nothing to do with dynamic or not, with typed or whatever.
The mentioned languages happen to be very successful in the domain of web 
applications, and as others already mentioned, the need for fine-grained 
shared-memory parallelism here is not clear. So, why don't we have Python 
without the GIL? Because nobody cared enough. However, there is still JRuby...

 Well, Perl's threading does
 run full speed, but actually copies every variable on the heap for each
 new thread, massively bloating the process.
Cutting corners is the only way, if you do not have a great team of engineers.
For the RoarVM we also have to cut more corners than we would like.

 So the question is why should this be so, if C++ and Java, even
 interpreted on a JVM, can do it?
JVMs suffer from the same complexity. And C++, well, last time I checked there 
is just no threading model.
There will be a memory model in C++0x, but there is nothing which makes it
inherently hard to implement, since you don't get any guarantees (besides the
memory model semantics) and you don't have any GC either.

 In general, Java's basic types typically correspond with types that can
 be dealt with atomically by processors, or are small enough to be passed
 by value.  This already makes things a lot easier.
I don't think that buys you anything. Which basic types can be passed by copy?
Ints and bools, perhaps. That takes a bit of pressure off the GC, but does not
really help with making things safe. Smalltalk does not know basic types.
However, it knows an implementation technique called tagged pointers/tagged
integers. This allows you to have 31-bit integers, since pointers are aligned and
do not need all bits. However, that really helps only with GC pressure.
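
For readers unfamiliar with the technique, here is a toy model of tagged
integers. It is only a sketch: PHP has no raw pointers, so a machine word is
modelled as a plain int, and the function names tagInt(), isTaggedInt() and
untagInt() are invented for this example. The low bit marks a word as a small
integer; aligned pointers always have that bit clear, so the two never collide.

```php
<?php
// Store a small integer in a word: shift left and set the low tag bit.
function tagInt(int $n): int
{
    return ($n << 1) | 1;
}

// A word with the low bit set is a tagged integer, not an aligned pointer.
function isTaggedInt(int $word): bool
{
    return ($word & 1) === 1;
}

// Recover the integer; PHP's >> is an arithmetic shift, so the sign survives.
function untagInt(int $word): int
{
    return $word >> 1;
}

$word = tagInt(21);
var_dump(isTaggedInt($word));       // bool(true)
var_dump(untagInt($word));          // int(21)
var_dump(untagInt(tagInt(-3)));     // int(-3)
var_dump(isTaggedInt(0x1000));      // bool(false): looks like an aligned address
```

Since one bit is spent on the tag, a 32-bit word leaves 31 bits for the integer
itself, which is where the 31-bit figure comes from.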

 
 I've had another reason for the differences explained to me.  I'm not
 sure I understand it fully enough to be able to re-explain it, but I'll
 try anyway.  As I grasped the concept, the key to making VMs fully
 threadable with shared state, is to first allow reference addresses to
 change, such as via generational garbage collection.
Hm, there is usually the wish that you can run your GC threads in parallel with
mutator threads; here it is indeed helpful to support moving GCs. But how does
it help with threads working in parallel on some shared object? Any point where
an object is allowed to move requires synchronization. So, either someone has
to change the pointer you own to that object, or you need an additional level
of indirection.

I guess you are talking here about having such an additional indirection, 
object handles?

 This allows you to
 have much clearer stack frames, perhaps even really stored on the
 thread-local/C stack, as opposed to most dynamic language interpreters
 which barely use the C stack at all.
Why does having object handles give you a better stack frame layout?
Using the C stack can be helpful for performance, but it makes other language
features harder to implement.
For instance, what about closures?
Other techniques, like recycling your stack-frame objects, are usually a simpler
optimization that does not make it harder to implement things like closures.


  Then, when the long-lived objects
 are discovered at scope exit time they can be safely moved into the next
 memory pool,
Ui ui ui. Slooow. I don't follow. Ok, there are things like escape analysis.
And then there are techniques like on-stack-allocation. Both usually done in 
JIT compilers, not so much in interpreters. Are we still talking about 
interpreters?
Or are you implying an incremental GC that is triggered on the return of method
calls?


 as well as letting access to old objects be locked (or
 copied, in the case of Software Transactional Memory).
There are too many things discussed here in a single sentence. Sorry, I am lost.

  Access to
 objects in your own frame can therefore be fast, and the number of locks
 that have to be held reduced.
Ok, on-stack-allocation and biased locking? 

 - memory allocation: object references' timeline and 

Re: [PHP-DEV] [citations for] Re: [PHP-DEV] Experiments with a threading library for Zend: spawning a new executor

2011-01-19 Thread Martin Scotta
Many PHP features should be language constructs, but they were made as
language hacks.

__construct is evil, like any other language hack.

It does not provide a safe foundation for building safe abstractions or
reusable and extensible components, which leads to the lack of PHP libraries.

Let's suppose there is a library that provides a utility class, which has
no superclass and no constructor.

// lives in library.phar
class Utility { }

A client uses this class by extending it

// includes library.phar
class Client extends Utility {
function __construct() {
//  client initialization code here
}
}

At that point the Utility class cannot add __construct safely: if it does,
Client will break it, because Client does not call the parent constructor.
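
A minimal sketch of this first scenario. The properties $ready and
$clientReady are hypothetical, added only to make the effect observable. Once
Utility gains a constructor, the existing Client silently skips it, and no
error is raised.

```php
<?php
// New library version: Utility has gained a constructor.
class Utility
{
    public $ready = false;

    public function __construct()
    {
        $this->ready = true;   // library initialization
    }
}

// Unchanged client code, written when Utility had no constructor.
class Client extends Utility
{
    public $clientReady = false;

    public function __construct()
    {
        // No parent::__construct() call here, so Utility never initializes.
        $this->clientReady = true;
    }
}

$c = new Client();
var_dump($c->clientReady); // bool(true)
var_dump($c->ready);       // bool(false): Utility's constructor never ran
```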

But what happens if Utility provides a __construct?
class Utility {
function __construct() {
// Utility initialization here
}
}

class Client extends Utility {
function __construct() {
// some code
parent::__construct(); // as a good client, call the superclass constructor
// and then more code
}
}

In this case Utility is forced to keep its __construct: if it is removed,
the Client call will fail, because parent::__construct will not exist.
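
A sketch of this second scenario, assuming PHP 7 or later, where the failure
surfaces as a catchable Error. On the 5.x releases discussed in this thread it
is a fatal error instead. The $failed flag is only there to make the outcome
visible.

```php
<?php
// New library version: the constructor has been removed from Utility.
class Utility
{
}

// Unchanged client code that dutifully calls the parent constructor.
class Client extends Utility
{
    public function __construct()
    {
        parent::__construct();   // no such constructor exists any more
    }
}

$failed = false;
try {
    new Client();
} catch (Error $e) {
    $failed = true;              // instantiation is no longer possible
}

var_dump($failed);
```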

In both cases there were no API changes; only the way the objects are
initialized changed.

My point is that the language does not provide solid foundations (aka
language constructs) for systems and libraries to evolve in a safe way.

 Martin Scotta


On Wed, Jan 19, 2011 at 4:58 PM, Stas Malyshev smalys...@sugarcrm.com wrote:

 Hi!


  I think the point is that the php language itself does not provide solid
 construct for writing rock-solid code. Yes, there are many
 programmers/hackers that can, but the effort they put is huge.


 I think this is completely untrue.


  In Java you are free to extend a class --yours or imported-- without
 worries about its internal implementation. Is that possible in PHP? nope.
 __construct breaks that.


 Could you please explain what you mean? How __construct breaks extending a
 class?


  IMHO, as a simple PHP programmer, the language should provide the simplest
 language construct and the engine should handle all the complexity under
 the hood.


 I see no way of hiding threads complexity under the hood - if you want
 threads, you'll need to deal with synchronization, locking, race conditions,
 etc. Do you see any way to avoid it?

 --
 Stanislav Malyshev, Software Architect
 SugarCRM: http://www.sugarcrm.com/
 (408)454-6900 ext. 227



Re: [PHP-DEV] [citations for] Re: [PHP-DEV] Experiments with a threading library for Zend: spawning a new executor

2011-01-19 Thread Stas Malyshev

Hi!


Many PHP features should be language constructs, but they were made as
language hacks.

__construct is evil, like any other language hack.


Constructors are standard feature in many languages. There's nothing 
evil in them.



class Client extends Utility {
 function __construct() {
 // some code
 parent::__construct(); // as a good client, call the superclass constructor
 // and then more code
 }
}


Arguably, initialization is the part of the API, but I see your point - 
it might be useful to supply all objects with empty default ctor so that 
parent::__construct() always works. Submit a feature request to 
bugs.php.net.

--
Stanislav Malyshev, Software Architect
SugarCRM: http://www.sugarcrm.com/
(408)454-6900 ext. 227




Re: [PHP-DEV] How deep is copy on write?

2011-01-19 Thread Larry Garfield
On Wednesday, January 19, 2011 4:45:14 pm Ben Schmidt wrote:

  Related: What is the overhead of a ZVal? I'm assuming it's a fixed
  number of bytes.
 
 It seems not, though a zval has a fixed size. What that size is will
 depend on the compiler and architecture of the system being used, or at
 least on the ABI.

Ah, yes, of course.  Oh C...

*snip*

 The zvalue_value union will probably be 8 or 12 bytes, depending on the
 architecture. The whole struct will then probably be between 14 and 24
 bytes, depending on the architecture and structure alignment and so on.

*snip*

 You can figure out what you think the overhead is from that. For a
 string, arguably the whole structure is overhead, since the string is
 stored elsewhere via pointer. Likewise for objects. For a double, the
 payload is 8 bytes, and stored in the zval, so there's less overhead. An
 integer, with a payload of 4 bytes, is somewhere in between.

Hm.  OK, so if I'm assuming a 64-bit architecture (most servers these days, 
I'd think) and just looking for a rough approximation, it sounds like 20 bytes 
per zval/variable is a not unreasonable estimation.  At least close enough for 
determining the memory overhead of a general algorithm.

Thanks again!

--Larry Garfield
