Re: [PHP-DEV] Improvements to for-each implementation

2015-01-29 Thread Dmitry Stogov
Hi,

I've solved most of the reported problems and started working on RFC.

Nikta, please take a look into:
https://wiki.php.net/rfc/php7_foreach
https://github.com/php/php-src/pull/1034
and especially commit 15a23b1

Only iteration by value over plain objects is not completely consistent now.
However, the implementation may have bugs and other problems (your code
review may be very helpful).
I'm still working...

Thanks. Dmitry.

On Wed, Jan 28, 2015 at 7:59 AM, Dmitry Stogov dmi...@zend.com wrote:

 I've created a branch and pull request for easier collaboration and
 tracking.

 https://github.com/php/php-src/pull/1034

 It's the same diff I published yesterday. Nothing changed or added yet.

 Thanks. Dmitry.

 On Wed, Jan 28, 2015 at 1:36 AM, Nikita Popov nikita@gmail.com
 wrote:

 On Tue, Jan 27, 2015 at 5:55 PM, Dmitry Stogov dmi...@zend.com wrote:

 Hi,

 I'm working on a PoC, implementing the proposed behavior -
 https://gist.github.com/dstogov/a311e8b0b2cabee4dab4

 Only few PHPT tests had to be changed to confirm new behavior and
 actually they started to behave as HHVM.
 2 tests are still unfixed XFAILed. Foreach by reference is still need to
 be improved to support different array modifications.

 The patch makes ~1% improvement on Wordpress-3.6 (saving duplication and
 destruction of 200 arrays on each request)

 Thanks. Dmitry.


 I quickly looked over the patch, some notes:

 * 171: Can directly use GET_OP1_ZVAL_PTR_DEREF
 * For iterator failures (exception) you use FREE_OP1_IF_VAR(). Is this
 enough? If the object is a TMP_VAR, don't we have to free it as well?
 * For objects it will still be possible to modify the hashtable during
 iteration even in the _R case, correct? I assume we just don't care about
 this edge case.
 * 315: In RESET_RW you now always create a reference for VAR|CV operands.
 However for the get_iterator case we don't need this, right?
 * 328: In the non VAR|CV case SEPARTE_ARRAY is not used. As we're going
 to change the IAP, this is probably necessary in this case as well.
 * For RW iterator failures FREE_OP1_VAR_PTR() is used. This probably
 leaks in the TMP case. (Same for the invalid argument case.)

 What concerns me a bit is the new FETCH_RW implementation, because it no
 longer checks the internal pointer against the external one. This
 effectively means that foreach by reference behavior will be influenced a
 lot by details of the hashtable implementation. An example:

 $array = [0, 1, 2, 3, 4, 5, 6, 7];
 unset($array[0], $array[1], $array[2], $array[3]);
 foreach ($array as $ref) {
 var_dump($ref);
 //$array[42] = 42;
 }

 Without the commented line this will just print the numbers 4, 5, 6, 7.
 If you uncomment the line it will only print 4. (Because the append caused
 a rehash and arData was compacted, so the position now points past all
 elements). The previous output would have been 4, 5, 6, 7, 42, which is
 closer to what you would expect. Maybe better to keep the pos !=
 nInternalPointer code?

 Thanks,
 Nikita





Re: [PHP-DEV] Improvements to for-each implementation

2015-01-27 Thread Rasmus Lerdorf
On 01/27/2015 08:55 AM, Dmitry Stogov wrote:
 The patch makes ~1% improvement on Wordpress-3.6 (saving duplication and
 destruction of 200 arrays on each request)

Awesome!



signature.asc
Description: OpenPGP digital signature


Re: [PHP-DEV] Improvements to for-each implementation

2015-01-27 Thread Yasuo Ohgaki
Hi Dmitry,

On Wed, Jan 28, 2015 at 1:55 AM, Dmitry Stogov dmi...@zend.com wrote:

 I'm working on a PoC, implementing the proposed behavior -
 https://gist.github.com/dstogov/a311e8b0b2cabee4dab4

 Only few PHPT tests had to be changed to confirm new behavior and actually
 they started to behave as HHVM.
 2 tests are still unfixed XFAILed. Foreach by reference is still need to be
 improved to support different array modifications.

 The patch makes ~1% improvement on Wordpress-3.6 (saving duplication and
 destruction of 200 arrays on each request)


Great!

--
Yasuo Ohgaki
yohg...@ohgaki.net


Re: [PHP-DEV] Improvements to for-each implementation

2015-01-27 Thread Nikita Popov
On Tue, Jan 27, 2015 at 5:55 PM, Dmitry Stogov dmi...@zend.com wrote:

 Hi,

 I'm working on a PoC, implementing the proposed behavior -
 https://gist.github.com/dstogov/a311e8b0b2cabee4dab4

 Only few PHPT tests had to be changed to confirm new behavior and actually
 they started to behave as HHVM.
 2 tests are still unfixed XFAILed. Foreach by reference is still need to
 be improved to support different array modifications.

 The patch makes ~1% improvement on Wordpress-3.6 (saving duplication and
 destruction of 200 arrays on each request)

 Thanks. Dmitry.


I quickly looked over the patch, some notes:

* 171: Can directly use GET_OP1_ZVAL_PTR_DEREF
* For iterator failures (exception) you use FREE_OP1_IF_VAR(). Is this
enough? If the object is a TMP_VAR, don't we have to free it as well?
* For objects it will still be possible to modify the hashtable during
iteration even in the _R case, correct? I assume we just don't care about
this edge case.
* 315: In RESET_RW you now always create a reference for VAR|CV operands.
However for the get_iterator case we don't need this, right?
* 328: In the non VAR|CV case SEPARTE_ARRAY is not used. As we're going to
change the IAP, this is probably necessary in this case as well.
* For RW iterator failures FREE_OP1_VAR_PTR() is used. This probably leaks
in the TMP case. (Same for the invalid argument case.)

What concerns me a bit is the new FETCH_RW implementation, because it no
longer checks the internal pointer against the external one. This
effectively means that foreach by reference behavior will be influenced a
lot by details of the hashtable implementation. An example:

$array = [0, 1, 2, 3, 4, 5, 6, 7];
unset($array[0], $array[1], $array[2], $array[3]);
foreach ($array as $ref) {
var_dump($ref);
//$array[42] = 42;
}

Without the commented line this will just print the numbers 4, 5, 6, 7. If
you uncomment the line it will only print 4. (Because the append caused a
rehash and arData was compacted, so the position now points past all
elements). The previous output would have been 4, 5, 6, 7, 42, which is
closer to what you would expect. Maybe better to keep the pos !=
nInternalPointer code?

Thanks,
Nikita


Re: [PHP-DEV] Improvements to for-each implementation

2015-01-27 Thread Xinchen Hui
Hey:

On Wed, Jan 28, 2015 at 11:22 AM, Xinchen Hui xinche...@zend.com wrote:
 Hey:

 On Wed, Jan 28, 2015 at 12:55 AM, Dmitry Stogov dmi...@zend.com wrote:
 Hi,

 I'm working on a PoC, implementing the proposed behavior -
 https://gist.github.com/dstogov/a311e8b0b2cabee4dab4

 Only few PHPT tests had to be changed to confirm new behavior and actually
 they started to behave as HHVM.
 2 tests are still unfixed XFAILed. Foreach by reference is still need to be
 improved to support different array modifications.

 The patch makes ~1% improvement on Wordpress-3.6 (saving duplication and
 destruction of 200 arrays on each request)

 I just have got an idea:

 Droping use of ZEND_OP_DATA make it possible to generate better
 foreach loop opcodes

 previously FE_RESET and FE_FETCH  share one same OP_DATA based on offset.

 0 FE_RESET
 1 FE_FETCH fail- 5
 2 OP_DATA
 3  . statements
 4 JMP 1
 5

 without depending on that OP_DATA, we can change this to

 0 FE_RESET
 1 JMP 3
 2 statements
 3 FE_FETCH success-2
 4 

 that will reduce on JMP in every foreach loop

 could you please also try this?
I mean, dropping both ZEND_FE_RESET and FE_FETCH share one ZEND_OP_DATA.

the previously example should be:

 0 FE_RESET
 1 JMP 3
 2 statements
 3 FE_FETCH success-2
   4 ZEND_OP_DATA
 5 

thanks

 thanks


 Thanks. Dmitry.



 On Thu, Jan 22, 2015 at 1:38 PM, Benjamin Coutu ben.co...@zeyos.com wrote:

 Hi Nikita,

 I would suggest using the proposal for the by-value case and sticking with
 the current behavior for the by-reference case as you suggested. Granted, we
 then cannot remove the internal pointer all together, but we would just use
 it for the less common by-reference case as well as for the legacy
 reset/next functions, of course. This would still improve most for-each
 traversals. At least we then wouldn't have to find a replacement solution
 for rest/prev/next/end. Moreover we can then move the internal position
 pointer out of the hashtable structure into zend_array because it is only
 used for userland arrays, not other hash tables (such as object properties
 or internal structures that build upon the hashtable struct).

 Thanks,

 Ben

 == Original ==
 From: Nikita Popov nikita@gmail.com
 To: Benjamin Coutu ben.co...@zeyos.com
 Date: Thu, 22 Jan 2015 10:53:19 +0100
 Subject: Re: [PHP-DEV] Improvements to for-each implementation

 Doing this was the idea I had in mind as well, i.e. change the semantics
 of
 foreach to say that it will always work on a copy for by-value iteration
 (which ironically avoids having to actually copy it). Note that this will
 differ from the current behavior in a number of ways. In particular it
 means that changes to arrays that were references prior to iteration will
 not influence the iteration.

 The real question is what we should do in the by-reference case. Given
 that
 we need to acquire references to elements of the original array we can't
 reasonably work with copy-semantics (at least I don't see how). So would
 we
 just stick with the previous behavior (using the hash position hack) for
 that?

 Nikita


 --
 PHP Internals - PHP Runtime Development Mailing List
 To unsubscribe, visit: http://www.php.net/unsub.php





 --
 Xinchen Hui
 @Laruence
 http://www.laruence.com/



-- 
Xinchen Hui
@Laruence
http://www.laruence.com/

-- 
PHP Internals - PHP Runtime Development Mailing List
To unsubscribe, visit: http://www.php.net/unsub.php



Re: [PHP-DEV] Improvements to for-each implementation

2015-01-27 Thread Dmitry Stogov
I've created a branch and pull request for easier collaboration and
tracking.

https://github.com/php/php-src/pull/1034

It's the same diff I published yesterday. Nothing changed or added yet.

Thanks. Dmitry.

On Wed, Jan 28, 2015 at 1:36 AM, Nikita Popov nikita@gmail.com wrote:

 On Tue, Jan 27, 2015 at 5:55 PM, Dmitry Stogov dmi...@zend.com wrote:

 Hi,

 I'm working on a PoC, implementing the proposed behavior -
 https://gist.github.com/dstogov/a311e8b0b2cabee4dab4

 Only few PHPT tests had to be changed to confirm new behavior and
 actually they started to behave as HHVM.
 2 tests are still unfixed XFAILed. Foreach by reference is still need to
 be improved to support different array modifications.

 The patch makes ~1% improvement on Wordpress-3.6 (saving duplication and
 destruction of 200 arrays on each request)

 Thanks. Dmitry.


 I quickly looked over the patch, some notes:

 * 171: Can directly use GET_OP1_ZVAL_PTR_DEREF
 * For iterator failures (exception) you use FREE_OP1_IF_VAR(). Is this
 enough? If the object is a TMP_VAR, don't we have to free it as well?
 * For objects it will still be possible to modify the hashtable during
 iteration even in the _R case, correct? I assume we just don't care about
 this edge case.
 * 315: In RESET_RW you now always create a reference for VAR|CV operands.
 However for the get_iterator case we don't need this, right?
 * 328: In the non VAR|CV case SEPARTE_ARRAY is not used. As we're going to
 change the IAP, this is probably necessary in this case as well.
 * For RW iterator failures FREE_OP1_VAR_PTR() is used. This probably
 leaks in the TMP case. (Same for the invalid argument case.)

 What concerns me a bit is the new FETCH_RW implementation, because it no
 longer checks the internal pointer against the external one. This
 effectively means that foreach by reference behavior will be influenced a
 lot by details of the hashtable implementation. An example:

 $array = [0, 1, 2, 3, 4, 5, 6, 7];
 unset($array[0], $array[1], $array[2], $array[3]);
 foreach ($array as $ref) {
 var_dump($ref);
 //$array[42] = 42;
 }

 Without the commented line this will just print the numbers 4, 5, 6, 7. If
 you uncomment the line it will only print 4. (Because the append caused a
 rehash and arData was compacted, so the position now points past all
 elements). The previous output would have been 4, 5, 6, 7, 42, which is
 closer to what you would expect. Maybe better to keep the pos !=
 nInternalPointer code?

 Thanks,
 Nikita



Re: [PHP-DEV] Improvements to for-each implementation

2015-01-27 Thread Dmitry Stogov
Hi,

I'm working on a PoC, implementing the proposed behavior -
https://gist.github.com/dstogov/a311e8b0b2cabee4dab4

Only few PHPT tests had to be changed to confirm new behavior and actually
they started to behave as HHVM.
2 tests are still unfixed XFAILed. Foreach by reference is still need to be
improved to support different array modifications.

The patch makes ~1% improvement on Wordpress-3.6 (saving duplication and
destruction of 200 arrays on each request)

Thanks. Dmitry.



On Thu, Jan 22, 2015 at 1:38 PM, Benjamin Coutu ben.co...@zeyos.com wrote:

 Hi Nikita,

 I would suggest using the proposal for the by-value case and sticking with
 the current behavior for the by-reference case as you suggested. Granted,
 we then cannot remove the internal pointer all together, but we would just
 use it for the less common by-reference case as well as for the legacy
 reset/next functions, of course. This would still improve most for-each
 traversals. At least we then wouldn't have to find a replacement solution
 for rest/prev/next/end. Moreover we can then move the internal position
 pointer out of the hashtable structure into zend_array because it is only
 used for userland arrays, not other hash tables (such as object properties
 or internal structures that build upon the hashtable struct).

 Thanks,

 Ben

 == Original ==
 From: Nikita Popov nikita@gmail.com
 To: Benjamin Coutu ben.co...@zeyos.com
 Date: Thu, 22 Jan 2015 10:53:19 +0100
 Subject: Re: [PHP-DEV] Improvements to for-each implementation

 Doing this was the idea I had in mind as well, i.e. change the semantics of
 foreach to say that it will always work on a copy for by-value iteration
 (which ironically avoids having to actually copy it). Note that this will
 differ from the current behavior in a number of ways. In particular it
 means that changes to arrays that were references prior to iteration will
 not influence the iteration.

 The real question is what we should do in the by-reference case. Given that
 we need to acquire references to elements of the original array we can't
 reasonably work with copy-semantics (at least I don't see how). So would we
 just stick with the previous behavior (using the hash position hack) for
 that?

 Nikita


 --
 PHP Internals - PHP Runtime Development Mailing List
 To unsubscribe, visit: http://www.php.net/unsub.php




Re: [PHP-DEV] Improvements to for-each implementation

2015-01-27 Thread Xinchen Hui
Hey:

On Wed, Jan 28, 2015 at 11:47 AM, Dmitry Stogov dmi...@zend.com wrote:
 Hi,

 I have this in mind. May be It's even possible to remove JMP.

 FE_RESET op1=array op2=label 2 result=var
 OP_DATA op1=iterator result=key
 1: statements...
 FE_FETCH op1=iterator op2=label 1 result=var
 OP_DATA result=key
 2:  FREE iterator

 Anyway, it's not a big problem implementing this, but eliminating of array
 duplication is more important.
 We still have to solve more serious problems first.

of course, I just was curious maybe this could give us more
improvement added to that ~1%.

thanks
 Thanks. Dmitry.

 On Wed, Jan 28, 2015 at 6:22 AM, Xinchen Hui xinche...@zend.com wrote:

 Hey:

 On Wed, Jan 28, 2015 at 12:55 AM, Dmitry Stogov dmi...@zend.com wrote:
  Hi,
 
  I'm working on a PoC, implementing the proposed behavior -
  https://gist.github.com/dstogov/a311e8b0b2cabee4dab4
 
  Only few PHPT tests had to be changed to confirm new behavior and
  actually
  they started to behave as HHVM.
  2 tests are still unfixed XFAILed. Foreach by reference is still need to
  be
  improved to support different array modifications.
 
  The patch makes ~1% improvement on Wordpress-3.6 (saving duplication and
  destruction of 200 arrays on each request)

 I just have got an idea:

 Droping use of ZEND_OP_DATA make it possible to generate better
 foreach loop opcodes

 previously FE_RESET and FE_FETCH  share one same OP_DATA based on offset.

 0 FE_RESET
 1 FE_FETCH fail- 5
 2 OP_DATA
 3  . statements
 4 JMP 1
 5

 without depending on that OP_DATA, we can change this to

 0 FE_RESET
 1 JMP 3
 2 statements
 3 FE_FETCH success-2
 4 

 that will reduce on JMP in every foreach loop

 could you please also try this?

 thanks

 
  Thanks. Dmitry.
 
 
 
  On Thu, Jan 22, 2015 at 1:38 PM, Benjamin Coutu ben.co...@zeyos.com
  wrote:
 
  Hi Nikita,
 
  I would suggest using the proposal for the by-value case and sticking
  with
  the current behavior for the by-reference case as you suggested.
  Granted, we
  then cannot remove the internal pointer all together, but we would just
  use
  it for the less common by-reference case as well as for the legacy
  reset/next functions, of course. This would still improve most for-each
  traversals. At least we then wouldn't have to find a replacement
  solution
  for rest/prev/next/end. Moreover we can then move the internal position
  pointer out of the hashtable structure into zend_array because it is
  only
  used for userland arrays, not other hash tables (such as object
  properties
  or internal structures that build upon the hashtable struct).
 
  Thanks,
 
  Ben
 
  == Original ==
  From: Nikita Popov nikita@gmail.com
  To: Benjamin Coutu ben.co...@zeyos.com
  Date: Thu, 22 Jan 2015 10:53:19 +0100
  Subject: Re: [PHP-DEV] Improvements to for-each implementation
 
  Doing this was the idea I had in mind as well, i.e. change the
  semantics
  of
  foreach to say that it will always work on a copy for by-value
  iteration
  (which ironically avoids having to actually copy it). Note that this
  will
  differ from the current behavior in a number of ways. In particular it
  means that changes to arrays that were references prior to iteration
  will
  not influence the iteration.
 
  The real question is what we should do in the by-reference case. Given
  that
  we need to acquire references to elements of the original array we
  can't
  reasonably work with copy-semantics (at least I don't see how). So
  would
  we
  just stick with the previous behavior (using the hash position hack)
  for
  that?
 
  Nikita
 
 
  --
  PHP Internals - PHP Runtime Development Mailing List
  To unsubscribe, visit: http://www.php.net/unsub.php
 
 



 --
 Xinchen Hui
 @Laruence
 http://www.laruence.com/





-- 
Xinchen Hui
@Laruence
http://www.laruence.com/

-- 
PHP Internals - PHP Runtime Development Mailing List
To unsubscribe, visit: http://www.php.net/unsub.php



Re: [PHP-DEV] Improvements to for-each implementation

2015-01-27 Thread Dmitry Stogov
Hi,

I have this in mind. May be It's even possible to remove JMP.

FE_RESET op1=array op2=label 2 result=var
OP_DATA op1=iterator result=key
1: statements...
FE_FETCH op1=iterator op2=label 1 result=var
OP_DATA result=key
2:  FREE iterator

Anyway, it's not a big problem implementing this, but eliminating of array
duplication is more important.
We still have to solve more serious problems first.

Thanks. Dmitry.

On Wed, Jan 28, 2015 at 6:22 AM, Xinchen Hui xinche...@zend.com wrote:

 Hey:

 On Wed, Jan 28, 2015 at 12:55 AM, Dmitry Stogov dmi...@zend.com wrote:
  Hi,
 
  I'm working on a PoC, implementing the proposed behavior -
  https://gist.github.com/dstogov/a311e8b0b2cabee4dab4
 
  Only few PHPT tests had to be changed to confirm new behavior and
 actually
  they started to behave as HHVM.
  2 tests are still unfixed XFAILed. Foreach by reference is still need to
 be
  improved to support different array modifications.
 
  The patch makes ~1% improvement on Wordpress-3.6 (saving duplication and
  destruction of 200 arrays on each request)

 I just have got an idea:

 Droping use of ZEND_OP_DATA make it possible to generate better
 foreach loop opcodes

 previously FE_RESET and FE_FETCH  share one same OP_DATA based on offset.

 0 FE_RESET
 1 FE_FETCH fail- 5
 2 OP_DATA
 3  . statements
 4 JMP 1
 5

 without depending on that OP_DATA, we can change this to

 0 FE_RESET
 1 JMP 3
 2 statements
 3 FE_FETCH success-2
 4 

 that will reduce on JMP in every foreach loop

 could you please also try this?

 thanks

 
  Thanks. Dmitry.
 
 
 
  On Thu, Jan 22, 2015 at 1:38 PM, Benjamin Coutu ben.co...@zeyos.com
 wrote:
 
  Hi Nikita,
 
  I would suggest using the proposal for the by-value case and sticking
 with
  the current behavior for the by-reference case as you suggested.
 Granted, we
  then cannot remove the internal pointer all together, but we would just
 use
  it for the less common by-reference case as well as for the legacy
  reset/next functions, of course. This would still improve most for-each
  traversals. At least we then wouldn't have to find a replacement
 solution
  for rest/prev/next/end. Moreover we can then move the internal position
  pointer out of the hashtable structure into zend_array because it is
 only
  used for userland arrays, not other hash tables (such as object
 properties
  or internal structures that build upon the hashtable struct).
 
  Thanks,
 
  Ben
 
  == Original ==
  From: Nikita Popov nikita@gmail.com
  To: Benjamin Coutu ben.co...@zeyos.com
  Date: Thu, 22 Jan 2015 10:53:19 +0100
  Subject: Re: [PHP-DEV] Improvements to for-each implementation
 
  Doing this was the idea I had in mind as well, i.e. change the semantics
  of
  foreach to say that it will always work on a copy for by-value iteration
  (which ironically avoids having to actually copy it). Note that this
 will
  differ from the current behavior in a number of ways. In particular it
  means that changes to arrays that were references prior to iteration
 will
  not influence the iteration.
 
  The real question is what we should do in the by-reference case. Given
  that
  we need to acquire references to elements of the original array we can't
  reasonably work with copy-semantics (at least I don't see how). So would
  we
  just stick with the previous behavior (using the hash position hack) for
  that?
 
  Nikita
 
 
  --
  PHP Internals - PHP Runtime Development Mailing List
  To unsubscribe, visit: http://www.php.net/unsub.php
 
 



 --
 Xinchen Hui
 @Laruence
 http://www.laruence.com/



Re: [PHP-DEV] Improvements to for-each implementation

2015-01-27 Thread Xinchen Hui
Hey:

On Wed, Jan 28, 2015 at 12:55 AM, Dmitry Stogov dmi...@zend.com wrote:
 Hi,

 I'm working on a PoC, implementing the proposed behavior -
 https://gist.github.com/dstogov/a311e8b0b2cabee4dab4

 Only few PHPT tests had to be changed to confirm new behavior and actually
 they started to behave as HHVM.
 2 tests are still unfixed XFAILed. Foreach by reference is still need to be
 improved to support different array modifications.

 The patch makes ~1% improvement on Wordpress-3.6 (saving duplication and
 destruction of 200 arrays on each request)

I just have got an idea:

Droping use of ZEND_OP_DATA make it possible to generate better
foreach loop opcodes

previously FE_RESET and FE_FETCH  share one same OP_DATA based on offset.

0 FE_RESET
1 FE_FETCH fail- 5
2 OP_DATA
3  . statements
4 JMP 1
5

without depending on that OP_DATA, we can change this to

0 FE_RESET
1 JMP 3
2 statements
3 FE_FETCH success-2
4 

that will reduce on JMP in every foreach loop

could you please also try this?

thanks


 Thanks. Dmitry.



 On Thu, Jan 22, 2015 at 1:38 PM, Benjamin Coutu ben.co...@zeyos.com wrote:

 Hi Nikita,

 I would suggest using the proposal for the by-value case and sticking with
 the current behavior for the by-reference case as you suggested. Granted, we
 then cannot remove the internal pointer all together, but we would just use
 it for the less common by-reference case as well as for the legacy
 reset/next functions, of course. This would still improve most for-each
 traversals. At least we then wouldn't have to find a replacement solution
 for rest/prev/next/end. Moreover we can then move the internal position
 pointer out of the hashtable structure into zend_array because it is only
 used for userland arrays, not other hash tables (such as object properties
 or internal structures that build upon the hashtable struct).

 Thanks,

 Ben

 == Original ==
 From: Nikita Popov nikita@gmail.com
 To: Benjamin Coutu ben.co...@zeyos.com
 Date: Thu, 22 Jan 2015 10:53:19 +0100
 Subject: Re: [PHP-DEV] Improvements to for-each implementation

 Doing this was the idea I had in mind as well, i.e. change the semantics
 of
 foreach to say that it will always work on a copy for by-value iteration
 (which ironically avoids having to actually copy it). Note that this will
 differ from the current behavior in a number of ways. In particular it
 means that changes to arrays that were references prior to iteration will
 not influence the iteration.

 The real question is what we should do in the by-reference case. Given
 that
 we need to acquire references to elements of the original array we can't
 reasonably work with copy-semantics (at least I don't see how). So would
 we
 just stick with the previous behavior (using the hash position hack) for
 that?

 Nikita


 --
 PHP Internals - PHP Runtime Development Mailing List
 To unsubscribe, visit: http://www.php.net/unsub.php





-- 
Xinchen Hui
@Laruence
http://www.laruence.com/

-- 
PHP Internals - PHP Runtime Development Mailing List
To unsubscribe, visit: http://www.php.net/unsub.php



[PHP-DEV] Improvements to for-each implementation

2015-01-22 Thread Benjamin Coutu
Hi,

this post is a fork of the [PHP-DEV] Fixing strange foreach behavior thread. 
It proposes a more efficient for-each mechanism (that does NOT change the 
conceptual behaviour).

Currently on for-each the engine will have to copy the array if that array is 
visible anywhere else in the program because it will reset the internal 
position pointer (which is part of the underlying hashtable structure) and 
another part of the program might rely on it.

Essentially the array gets duplicated prematurely, only because of the internal 
position pointer. Of course it might have to anyways be duplicated within the 
for-each loop, but if (any only if) it is actually altered. In most cases one 
just iterates over without altering. Please consider the following sample, 
taken from my recent post:

$arr = $obj-arr; // property arr is an array
foreach ($arr as $val) ...;

This will currently copy the array, because $arr is also visible through 
$obj-arr although this is not really necessary unless the array is actually 
changed during iteration.

If one would use an external position variable that is initialized in FE_RESET 
(TEMPVAR) and then incremented in FE_FETCH one could just increment the 
ref_count of the array while being traversed without the initial need to 
perform copy-on-write.

Now, if the hashtable is in any way altered during the traversal then the usual 
copy-on-write would kick in because for-each initialization made sure that 
ref_count was incremented before starting traversal. In that case PHP would - 
just like currently - have to duplicate, but only on first actual alteration, 
not prematurely on for-each initialization.

So in 90% (just a guess) of the cases, when you just traverse without altering 
you get the full benefit of no-copy-necessary, while in the other cases you 
will basically have the previous performance penalty of duplication, but at 
least postponed to the first alteration (which might be inside a branch that is 
not even taken).

Nested for-each loops would not have to revert to copy-on-write either, because 
they have their own pointer.

This would effectively speed up most for-each operations and would have the 
extra benefit of not having to store an internal pointer in the hashtable 
structure.

Please let me know your thoughts!

Cheers,

Ben

-- 

Benjamin Coutu
Zeyon Technologies Inc.
http://www.zeyos.com


--
PHP Internals - PHP Runtime Development Mailing List
To unsubscribe, visit: http://www.php.net/unsub.php



Re: [PHP-DEV] Improvements to for-each implementation

2015-01-22 Thread Nikita Popov
On Thu, Jan 22, 2015 at 9:33 AM, Benjamin Coutu ben.co...@zeyos.com wrote:

 Hi,

 this post is a fork of the [PHP-DEV] Fixing strange foreach behavior
 thread. It proposes a more efficient for-each mechanism (that does NOT
 change the conceptual behaviour).

 Currently on for-each the engine will have to copy the array if that array
 is visible anywhere else in the program because it will reset the internal
 position pointer (which is part of the underlying hashtable structure) and
 another part of the program might rely on it.

 Essentially the array gets duplicated prematurely, only because of the
 internal position pointer. Of course it might have to anyways be duplicated
 within the for-each loop, but if (any only if) it is actually altered. In
 most cases one just iterates over without altering. Please consider the
 following sample, taken from my recent post:

 $arr = $obj-arr; // property arr is an array
 foreach ($arr as $val) ...;

 This will currently copy the array, because $arr is also visible through
 $obj-arr although this is not really necessary unless the array is
 actually changed during iteration.

 If one would use an external position variable that is initialized in
 FE_RESET (TEMPVAR) and then incremented in FE_FETCH one could just
 increment the ref_count of the array while being traversed without the
 initial need to perform copy-on-write.

 Now, if the hashtable is in any way altered during the traversal then the
 usual copy-on-write would kick in because for-each initialization made sure
 that ref_count was incremented before starting traversal. In that case PHP
 would - just like currently - have to duplicate, but only on first actual
 alteration, not prematurely on for-each initialization.

 So in 90% (just a guess) of the cases, when you just traverse without
 altering you get the full benefit of no-copy-necessary, while in the other
 cases you will basically have the previous performance penalty of
 duplication, but at least postponed to the first alteration (which might be
 inside a branch that is not even taken).

 Nested for-each loops would not have to revert to copy-on-write either,
 because they have their own pointer.

 This would effectively speed up most for-each operations and would have
 the extra benefit of not having to store an internal pointer in the
 hashtable structure.

 Please let me know your thoughts!

 Cheers,

 Ben


Doing this was the idea I had in mind as well, i.e. change the semantics of
foreach to say that it will always work on a copy for by-value iteration
(which ironically avoids having to actually copy it). Note that this will
differ from the current behavior in a number of ways. In particular it
means that changes to arrays that were references prior to iteration will
not influence the iteration.

The real question is what we should do in the by-reference case. Given that
we need to acquire references to elements of the original array we can't
reasonably work with copy-semantics (at least I don't see how). So would we
just stick with the previous behavior (using the hash position hack) for
that?

Nikita


Re: [PHP-DEV] Improvements to for-each implementation

2015-01-22 Thread Benjamin Coutu
Hi Nikita,

I would suggest using the proposal for the by-value case and sticking with the 
current behavior for the by-reference case as you suggested. Granted, we then 
cannot remove the internal pointer all together, but we would just use it for 
the less common by-reference case as well as for the legacy reset/next 
functions, of course. This would still improve most for-each traversals. At 
least we then wouldn't have to find a replacement solution for 
rest/prev/next/end. Moreover we can then move the internal position pointer out 
of the hashtable structure into zend_array because it is only used for userland 
arrays, not other hash tables (such as object properties or internal structures 
that build upon the hashtable struct).

Thanks,

Ben

== Original ==
From: Nikita Popov nikita@gmail.com
To: Benjamin Coutu ben.co...@zeyos.com
Date: Thu, 22 Jan 2015 10:53:19 +0100
Subject: Re: [PHP-DEV] Improvements to for-each implementation

Doing this was the idea I had in mind as well, i.e. change the semantics of
foreach to say that it will always work on a copy for by-value iteration
(which ironically avoids having to actually copy it). Note that this will
differ from the current behavior in a number of ways. In particular it
means that changes to arrays that were references prior to iteration will
not influence the iteration.

The real question is what we should do in the by-reference case. Given that
we need to acquire references to elements of the original array we can't
reasonably work with copy-semantics (at least I don't see how). So would we
just stick with the previous behavior (using the hash position hack) for
that?

Nikita


--
PHP Internals - PHP Runtime Development Mailing List
To unsubscribe, visit: http://www.php.net/unsub.php