Re: [PHP-DEV] Improvements to for-each implementation
Hi, I've solved most of the reported problems and started working on RFC. Nikta, please take a look into: https://wiki.php.net/rfc/php7_foreach https://github.com/php/php-src/pull/1034 and especially commit 15a23b1 Only iteration by value over plain objects is not completely consistent now. However, the implementation may have bugs and other problems (your code review may be very helpful). I'm still working... Thanks. Dmitry. On Wed, Jan 28, 2015 at 7:59 AM, Dmitry Stogov dmi...@zend.com wrote: I've created a branch and pull request for easier collaboration and tracking. https://github.com/php/php-src/pull/1034 It's the same diff I published yesterday. Nothing changed or added yet. Thanks. Dmitry. On Wed, Jan 28, 2015 at 1:36 AM, Nikita Popov nikita@gmail.com wrote: On Tue, Jan 27, 2015 at 5:55 PM, Dmitry Stogov dmi...@zend.com wrote: Hi, I'm working on a PoC, implementing the proposed behavior - https://gist.github.com/dstogov/a311e8b0b2cabee4dab4 Only few PHPT tests had to be changed to confirm new behavior and actually they started to behave as HHVM. 2 tests are still unfixed XFAILed. Foreach by reference is still need to be improved to support different array modifications. The patch makes ~1% improvement on Wordpress-3.6 (saving duplication and destruction of 200 arrays on each request) Thanks. Dmitry. I quickly looked over the patch, some notes: * 171: Can directly use GET_OP1_ZVAL_PTR_DEREF * For iterator failures (exception) you use FREE_OP1_IF_VAR(). Is this enough? If the object is a TMP_VAR, don't we have to free it as well? * For objects it will still be possible to modify the hashtable during iteration even in the _R case, correct? I assume we just don't care about this edge case. * 315: In RESET_RW you now always create a reference for VAR|CV operands. However for the get_iterator case we don't need this, right? * 328: In the non VAR|CV case SEPARTE_ARRAY is not used. As we're going to change the IAP, this is probably necessary in this case as well. * For RW iterator failures FREE_OP1_VAR_PTR() is used. This probably leaks in the TMP case. (Same for the invalid argument case.) What concerns me a bit is the new FETCH_RW implementation, because it no longer checks the internal pointer against the external one. This effectively means that foreach by reference behavior will be influenced a lot by details of the hashtable implementation. An example: $array = [0, 1, 2, 3, 4, 5, 6, 7]; unset($array[0], $array[1], $array[2], $array[3]); foreach ($array as $ref) { var_dump($ref); //$array[42] = 42; } Without the commented line this will just print the numbers 4, 5, 6, 7. If you uncomment the line it will only print 4. (Because the append caused a rehash and arData was compacted, so the position now points past all elements). The previous output would have been 4, 5, 6, 7, 42, which is closer to what you would expect. Maybe better to keep the pos != nInternalPointer code? Thanks, Nikita
Re: [PHP-DEV] Improvements to for-each implementation
On 01/27/2015 08:55 AM, Dmitry Stogov wrote: The patch makes ~1% improvement on Wordpress-3.6 (saving duplication and destruction of 200 arrays on each request) Awesome! signature.asc Description: OpenPGP digital signature
Re: [PHP-DEV] Improvements to for-each implementation
Hi Dmitry, On Wed, Jan 28, 2015 at 1:55 AM, Dmitry Stogov dmi...@zend.com wrote: I'm working on a PoC, implementing the proposed behavior - https://gist.github.com/dstogov/a311e8b0b2cabee4dab4 Only few PHPT tests had to be changed to confirm new behavior and actually they started to behave as HHVM. 2 tests are still unfixed XFAILed. Foreach by reference is still need to be improved to support different array modifications. The patch makes ~1% improvement on Wordpress-3.6 (saving duplication and destruction of 200 arrays on each request) Great! -- Yasuo Ohgaki yohg...@ohgaki.net
Re: [PHP-DEV] Improvements to for-each implementation
On Tue, Jan 27, 2015 at 5:55 PM, Dmitry Stogov dmi...@zend.com wrote: Hi, I'm working on a PoC, implementing the proposed behavior - https://gist.github.com/dstogov/a311e8b0b2cabee4dab4 Only few PHPT tests had to be changed to confirm new behavior and actually they started to behave as HHVM. 2 tests are still unfixed XFAILed. Foreach by reference is still need to be improved to support different array modifications. The patch makes ~1% improvement on Wordpress-3.6 (saving duplication and destruction of 200 arrays on each request) Thanks. Dmitry. I quickly looked over the patch, some notes: * 171: Can directly use GET_OP1_ZVAL_PTR_DEREF * For iterator failures (exception) you use FREE_OP1_IF_VAR(). Is this enough? If the object is a TMP_VAR, don't we have to free it as well? * For objects it will still be possible to modify the hashtable during iteration even in the _R case, correct? I assume we just don't care about this edge case. * 315: In RESET_RW you now always create a reference for VAR|CV operands. However for the get_iterator case we don't need this, right? * 328: In the non VAR|CV case SEPARTE_ARRAY is not used. As we're going to change the IAP, this is probably necessary in this case as well. * For RW iterator failures FREE_OP1_VAR_PTR() is used. This probably leaks in the TMP case. (Same for the invalid argument case.) What concerns me a bit is the new FETCH_RW implementation, because it no longer checks the internal pointer against the external one. This effectively means that foreach by reference behavior will be influenced a lot by details of the hashtable implementation. An example: $array = [0, 1, 2, 3, 4, 5, 6, 7]; unset($array[0], $array[1], $array[2], $array[3]); foreach ($array as $ref) { var_dump($ref); //$array[42] = 42; } Without the commented line this will just print the numbers 4, 5, 6, 7. If you uncomment the line it will only print 4. (Because the append caused a rehash and arData was compacted, so the position now points past all elements). The previous output would have been 4, 5, 6, 7, 42, which is closer to what you would expect. Maybe better to keep the pos != nInternalPointer code? Thanks, Nikita
Re: [PHP-DEV] Improvements to for-each implementation
Hey: On Wed, Jan 28, 2015 at 11:22 AM, Xinchen Hui xinche...@zend.com wrote: Hey: On Wed, Jan 28, 2015 at 12:55 AM, Dmitry Stogov dmi...@zend.com wrote: Hi, I'm working on a PoC, implementing the proposed behavior - https://gist.github.com/dstogov/a311e8b0b2cabee4dab4 Only few PHPT tests had to be changed to confirm new behavior and actually they started to behave as HHVM. 2 tests are still unfixed XFAILed. Foreach by reference is still need to be improved to support different array modifications. The patch makes ~1% improvement on Wordpress-3.6 (saving duplication and destruction of 200 arrays on each request) I just have got an idea: Droping use of ZEND_OP_DATA make it possible to generate better foreach loop opcodes previously FE_RESET and FE_FETCH share one same OP_DATA based on offset. 0 FE_RESET 1 FE_FETCH fail- 5 2 OP_DATA 3 . statements 4 JMP 1 5 without depending on that OP_DATA, we can change this to 0 FE_RESET 1 JMP 3 2 statements 3 FE_FETCH success-2 4 that will reduce on JMP in every foreach loop could you please also try this? I mean, dropping both ZEND_FE_RESET and FE_FETCH share one ZEND_OP_DATA. the previously example should be: 0 FE_RESET 1 JMP 3 2 statements 3 FE_FETCH success-2 4 ZEND_OP_DATA 5 thanks thanks Thanks. Dmitry. On Thu, Jan 22, 2015 at 1:38 PM, Benjamin Coutu ben.co...@zeyos.com wrote: Hi Nikita, I would suggest using the proposal for the by-value case and sticking with the current behavior for the by-reference case as you suggested. Granted, we then cannot remove the internal pointer all together, but we would just use it for the less common by-reference case as well as for the legacy reset/next functions, of course. This would still improve most for-each traversals. At least we then wouldn't have to find a replacement solution for rest/prev/next/end. Moreover we can then move the internal position pointer out of the hashtable structure into zend_array because it is only used for userland arrays, not other hash tables (such as object properties or internal structures that build upon the hashtable struct). Thanks, Ben == Original == From: Nikita Popov nikita@gmail.com To: Benjamin Coutu ben.co...@zeyos.com Date: Thu, 22 Jan 2015 10:53:19 +0100 Subject: Re: [PHP-DEV] Improvements to for-each implementation Doing this was the idea I had in mind as well, i.e. change the semantics of foreach to say that it will always work on a copy for by-value iteration (which ironically avoids having to actually copy it). Note that this will differ from the current behavior in a number of ways. In particular it means that changes to arrays that were references prior to iteration will not influence the iteration. The real question is what we should do in the by-reference case. Given that we need to acquire references to elements of the original array we can't reasonably work with copy-semantics (at least I don't see how). So would we just stick with the previous behavior (using the hash position hack) for that? Nikita -- PHP Internals - PHP Runtime Development Mailing List To unsubscribe, visit: http://www.php.net/unsub.php -- Xinchen Hui @Laruence http://www.laruence.com/ -- Xinchen Hui @Laruence http://www.laruence.com/ -- PHP Internals - PHP Runtime Development Mailing List To unsubscribe, visit: http://www.php.net/unsub.php
Re: [PHP-DEV] Improvements to for-each implementation
I've created a branch and pull request for easier collaboration and tracking. https://github.com/php/php-src/pull/1034 It's the same diff I published yesterday. Nothing changed or added yet. Thanks. Dmitry. On Wed, Jan 28, 2015 at 1:36 AM, Nikita Popov nikita@gmail.com wrote: On Tue, Jan 27, 2015 at 5:55 PM, Dmitry Stogov dmi...@zend.com wrote: Hi, I'm working on a PoC, implementing the proposed behavior - https://gist.github.com/dstogov/a311e8b0b2cabee4dab4 Only few PHPT tests had to be changed to confirm new behavior and actually they started to behave as HHVM. 2 tests are still unfixed XFAILed. Foreach by reference is still need to be improved to support different array modifications. The patch makes ~1% improvement on Wordpress-3.6 (saving duplication and destruction of 200 arrays on each request) Thanks. Dmitry. I quickly looked over the patch, some notes: * 171: Can directly use GET_OP1_ZVAL_PTR_DEREF * For iterator failures (exception) you use FREE_OP1_IF_VAR(). Is this enough? If the object is a TMP_VAR, don't we have to free it as well? * For objects it will still be possible to modify the hashtable during iteration even in the _R case, correct? I assume we just don't care about this edge case. * 315: In RESET_RW you now always create a reference for VAR|CV operands. However for the get_iterator case we don't need this, right? * 328: In the non VAR|CV case SEPARTE_ARRAY is not used. As we're going to change the IAP, this is probably necessary in this case as well. * For RW iterator failures FREE_OP1_VAR_PTR() is used. This probably leaks in the TMP case. (Same for the invalid argument case.) What concerns me a bit is the new FETCH_RW implementation, because it no longer checks the internal pointer against the external one. This effectively means that foreach by reference behavior will be influenced a lot by details of the hashtable implementation. An example: $array = [0, 1, 2, 3, 4, 5, 6, 7]; unset($array[0], $array[1], $array[2], $array[3]); foreach ($array as $ref) { var_dump($ref); //$array[42] = 42; } Without the commented line this will just print the numbers 4, 5, 6, 7. If you uncomment the line it will only print 4. (Because the append caused a rehash and arData was compacted, so the position now points past all elements). The previous output would have been 4, 5, 6, 7, 42, which is closer to what you would expect. Maybe better to keep the pos != nInternalPointer code? Thanks, Nikita
Re: [PHP-DEV] Improvements to for-each implementation
Hi, I'm working on a PoC, implementing the proposed behavior - https://gist.github.com/dstogov/a311e8b0b2cabee4dab4 Only few PHPT tests had to be changed to confirm new behavior and actually they started to behave as HHVM. 2 tests are still unfixed XFAILed. Foreach by reference is still need to be improved to support different array modifications. The patch makes ~1% improvement on Wordpress-3.6 (saving duplication and destruction of 200 arrays on each request) Thanks. Dmitry. On Thu, Jan 22, 2015 at 1:38 PM, Benjamin Coutu ben.co...@zeyos.com wrote: Hi Nikita, I would suggest using the proposal for the by-value case and sticking with the current behavior for the by-reference case as you suggested. Granted, we then cannot remove the internal pointer all together, but we would just use it for the less common by-reference case as well as for the legacy reset/next functions, of course. This would still improve most for-each traversals. At least we then wouldn't have to find a replacement solution for rest/prev/next/end. Moreover we can then move the internal position pointer out of the hashtable structure into zend_array because it is only used for userland arrays, not other hash tables (such as object properties or internal structures that build upon the hashtable struct). Thanks, Ben == Original == From: Nikita Popov nikita@gmail.com To: Benjamin Coutu ben.co...@zeyos.com Date: Thu, 22 Jan 2015 10:53:19 +0100 Subject: Re: [PHP-DEV] Improvements to for-each implementation Doing this was the idea I had in mind as well, i.e. change the semantics of foreach to say that it will always work on a copy for by-value iteration (which ironically avoids having to actually copy it). Note that this will differ from the current behavior in a number of ways. In particular it means that changes to arrays that were references prior to iteration will not influence the iteration. The real question is what we should do in the by-reference case. Given that we need to acquire references to elements of the original array we can't reasonably work with copy-semantics (at least I don't see how). So would we just stick with the previous behavior (using the hash position hack) for that? Nikita -- PHP Internals - PHP Runtime Development Mailing List To unsubscribe, visit: http://www.php.net/unsub.php
Re: [PHP-DEV] Improvements to for-each implementation
Hey: On Wed, Jan 28, 2015 at 11:47 AM, Dmitry Stogov dmi...@zend.com wrote: Hi, I have this in mind. May be It's even possible to remove JMP. FE_RESET op1=array op2=label 2 result=var OP_DATA op1=iterator result=key 1: statements... FE_FETCH op1=iterator op2=label 1 result=var OP_DATA result=key 2: FREE iterator Anyway, it's not a big problem implementing this, but eliminating of array duplication is more important. We still have to solve more serious problems first. of course, I just was curious maybe this could give us more improvement added to that ~1%. thanks Thanks. Dmitry. On Wed, Jan 28, 2015 at 6:22 AM, Xinchen Hui xinche...@zend.com wrote: Hey: On Wed, Jan 28, 2015 at 12:55 AM, Dmitry Stogov dmi...@zend.com wrote: Hi, I'm working on a PoC, implementing the proposed behavior - https://gist.github.com/dstogov/a311e8b0b2cabee4dab4 Only few PHPT tests had to be changed to confirm new behavior and actually they started to behave as HHVM. 2 tests are still unfixed XFAILed. Foreach by reference is still need to be improved to support different array modifications. The patch makes ~1% improvement on Wordpress-3.6 (saving duplication and destruction of 200 arrays on each request) I just have got an idea: Droping use of ZEND_OP_DATA make it possible to generate better foreach loop opcodes previously FE_RESET and FE_FETCH share one same OP_DATA based on offset. 0 FE_RESET 1 FE_FETCH fail- 5 2 OP_DATA 3 . statements 4 JMP 1 5 without depending on that OP_DATA, we can change this to 0 FE_RESET 1 JMP 3 2 statements 3 FE_FETCH success-2 4 that will reduce on JMP in every foreach loop could you please also try this? thanks Thanks. Dmitry. On Thu, Jan 22, 2015 at 1:38 PM, Benjamin Coutu ben.co...@zeyos.com wrote: Hi Nikita, I would suggest using the proposal for the by-value case and sticking with the current behavior for the by-reference case as you suggested. Granted, we then cannot remove the internal pointer all together, but we would just use it for the less common by-reference case as well as for the legacy reset/next functions, of course. This would still improve most for-each traversals. At least we then wouldn't have to find a replacement solution for rest/prev/next/end. Moreover we can then move the internal position pointer out of the hashtable structure into zend_array because it is only used for userland arrays, not other hash tables (such as object properties or internal structures that build upon the hashtable struct). Thanks, Ben == Original == From: Nikita Popov nikita@gmail.com To: Benjamin Coutu ben.co...@zeyos.com Date: Thu, 22 Jan 2015 10:53:19 +0100 Subject: Re: [PHP-DEV] Improvements to for-each implementation Doing this was the idea I had in mind as well, i.e. change the semantics of foreach to say that it will always work on a copy for by-value iteration (which ironically avoids having to actually copy it). Note that this will differ from the current behavior in a number of ways. In particular it means that changes to arrays that were references prior to iteration will not influence the iteration. The real question is what we should do in the by-reference case. Given that we need to acquire references to elements of the original array we can't reasonably work with copy-semantics (at least I don't see how). So would we just stick with the previous behavior (using the hash position hack) for that? Nikita -- PHP Internals - PHP Runtime Development Mailing List To unsubscribe, visit: http://www.php.net/unsub.php -- Xinchen Hui @Laruence http://www.laruence.com/ -- Xinchen Hui @Laruence http://www.laruence.com/ -- PHP Internals - PHP Runtime Development Mailing List To unsubscribe, visit: http://www.php.net/unsub.php
Re: [PHP-DEV] Improvements to for-each implementation
Hi, I have this in mind. May be It's even possible to remove JMP. FE_RESET op1=array op2=label 2 result=var OP_DATA op1=iterator result=key 1: statements... FE_FETCH op1=iterator op2=label 1 result=var OP_DATA result=key 2: FREE iterator Anyway, it's not a big problem implementing this, but eliminating of array duplication is more important. We still have to solve more serious problems first. Thanks. Dmitry. On Wed, Jan 28, 2015 at 6:22 AM, Xinchen Hui xinche...@zend.com wrote: Hey: On Wed, Jan 28, 2015 at 12:55 AM, Dmitry Stogov dmi...@zend.com wrote: Hi, I'm working on a PoC, implementing the proposed behavior - https://gist.github.com/dstogov/a311e8b0b2cabee4dab4 Only few PHPT tests had to be changed to confirm new behavior and actually they started to behave as HHVM. 2 tests are still unfixed XFAILed. Foreach by reference is still need to be improved to support different array modifications. The patch makes ~1% improvement on Wordpress-3.6 (saving duplication and destruction of 200 arrays on each request) I just have got an idea: Droping use of ZEND_OP_DATA make it possible to generate better foreach loop opcodes previously FE_RESET and FE_FETCH share one same OP_DATA based on offset. 0 FE_RESET 1 FE_FETCH fail- 5 2 OP_DATA 3 . statements 4 JMP 1 5 without depending on that OP_DATA, we can change this to 0 FE_RESET 1 JMP 3 2 statements 3 FE_FETCH success-2 4 that will reduce on JMP in every foreach loop could you please also try this? thanks Thanks. Dmitry. On Thu, Jan 22, 2015 at 1:38 PM, Benjamin Coutu ben.co...@zeyos.com wrote: Hi Nikita, I would suggest using the proposal for the by-value case and sticking with the current behavior for the by-reference case as you suggested. Granted, we then cannot remove the internal pointer all together, but we would just use it for the less common by-reference case as well as for the legacy reset/next functions, of course. This would still improve most for-each traversals. At least we then wouldn't have to find a replacement solution for rest/prev/next/end. Moreover we can then move the internal position pointer out of the hashtable structure into zend_array because it is only used for userland arrays, not other hash tables (such as object properties or internal structures that build upon the hashtable struct). Thanks, Ben == Original == From: Nikita Popov nikita@gmail.com To: Benjamin Coutu ben.co...@zeyos.com Date: Thu, 22 Jan 2015 10:53:19 +0100 Subject: Re: [PHP-DEV] Improvements to for-each implementation Doing this was the idea I had in mind as well, i.e. change the semantics of foreach to say that it will always work on a copy for by-value iteration (which ironically avoids having to actually copy it). Note that this will differ from the current behavior in a number of ways. In particular it means that changes to arrays that were references prior to iteration will not influence the iteration. The real question is what we should do in the by-reference case. Given that we need to acquire references to elements of the original array we can't reasonably work with copy-semantics (at least I don't see how). So would we just stick with the previous behavior (using the hash position hack) for that? Nikita -- PHP Internals - PHP Runtime Development Mailing List To unsubscribe, visit: http://www.php.net/unsub.php -- Xinchen Hui @Laruence http://www.laruence.com/
Re: [PHP-DEV] Improvements to for-each implementation
Hey: On Wed, Jan 28, 2015 at 12:55 AM, Dmitry Stogov dmi...@zend.com wrote: Hi, I'm working on a PoC, implementing the proposed behavior - https://gist.github.com/dstogov/a311e8b0b2cabee4dab4 Only few PHPT tests had to be changed to confirm new behavior and actually they started to behave as HHVM. 2 tests are still unfixed XFAILed. Foreach by reference is still need to be improved to support different array modifications. The patch makes ~1% improvement on Wordpress-3.6 (saving duplication and destruction of 200 arrays on each request) I just have got an idea: Droping use of ZEND_OP_DATA make it possible to generate better foreach loop opcodes previously FE_RESET and FE_FETCH share one same OP_DATA based on offset. 0 FE_RESET 1 FE_FETCH fail- 5 2 OP_DATA 3 . statements 4 JMP 1 5 without depending on that OP_DATA, we can change this to 0 FE_RESET 1 JMP 3 2 statements 3 FE_FETCH success-2 4 that will reduce on JMP in every foreach loop could you please also try this? thanks Thanks. Dmitry. On Thu, Jan 22, 2015 at 1:38 PM, Benjamin Coutu ben.co...@zeyos.com wrote: Hi Nikita, I would suggest using the proposal for the by-value case and sticking with the current behavior for the by-reference case as you suggested. Granted, we then cannot remove the internal pointer all together, but we would just use it for the less common by-reference case as well as for the legacy reset/next functions, of course. This would still improve most for-each traversals. At least we then wouldn't have to find a replacement solution for rest/prev/next/end. Moreover we can then move the internal position pointer out of the hashtable structure into zend_array because it is only used for userland arrays, not other hash tables (such as object properties or internal structures that build upon the hashtable struct). Thanks, Ben == Original == From: Nikita Popov nikita@gmail.com To: Benjamin Coutu ben.co...@zeyos.com Date: Thu, 22 Jan 2015 10:53:19 +0100 Subject: Re: [PHP-DEV] Improvements to for-each implementation Doing this was the idea I had in mind as well, i.e. change the semantics of foreach to say that it will always work on a copy for by-value iteration (which ironically avoids having to actually copy it). Note that this will differ from the current behavior in a number of ways. In particular it means that changes to arrays that were references prior to iteration will not influence the iteration. The real question is what we should do in the by-reference case. Given that we need to acquire references to elements of the original array we can't reasonably work with copy-semantics (at least I don't see how). So would we just stick with the previous behavior (using the hash position hack) for that? Nikita -- PHP Internals - PHP Runtime Development Mailing List To unsubscribe, visit: http://www.php.net/unsub.php -- Xinchen Hui @Laruence http://www.laruence.com/ -- PHP Internals - PHP Runtime Development Mailing List To unsubscribe, visit: http://www.php.net/unsub.php
[PHP-DEV] Improvements to for-each implementation
Hi, this post is a fork of the [PHP-DEV] Fixing strange foreach behavior thread. It proposes a more efficient for-each mechanism (that does NOT change the conceptual behaviour). Currently on for-each the engine will have to copy the array if that array is visible anywhere else in the program because it will reset the internal position pointer (which is part of the underlying hashtable structure) and another part of the program might rely on it. Essentially the array gets duplicated prematurely, only because of the internal position pointer. Of course it might have to anyways be duplicated within the for-each loop, but if (any only if) it is actually altered. In most cases one just iterates over without altering. Please consider the following sample, taken from my recent post: $arr = $obj-arr; // property arr is an array foreach ($arr as $val) ...; This will currently copy the array, because $arr is also visible through $obj-arr although this is not really necessary unless the array is actually changed during iteration. If one would use an external position variable that is initialized in FE_RESET (TEMPVAR) and then incremented in FE_FETCH one could just increment the ref_count of the array while being traversed without the initial need to perform copy-on-write. Now, if the hashtable is in any way altered during the traversal then the usual copy-on-write would kick in because for-each initialization made sure that ref_count was incremented before starting traversal. In that case PHP would - just like currently - have to duplicate, but only on first actual alteration, not prematurely on for-each initialization. So in 90% (just a guess) of the cases, when you just traverse without altering you get the full benefit of no-copy-necessary, while in the other cases you will basically have the previous performance penalty of duplication, but at least postponed to the first alteration (which might be inside a branch that is not even taken). Nested for-each loops would not have to revert to copy-on-write either, because they have their own pointer. This would effectively speed up most for-each operations and would have the extra benefit of not having to store an internal pointer in the hashtable structure. Please let me know your thoughts! Cheers, Ben -- Benjamin Coutu Zeyon Technologies Inc. http://www.zeyos.com -- PHP Internals - PHP Runtime Development Mailing List To unsubscribe, visit: http://www.php.net/unsub.php
Re: [PHP-DEV] Improvements to for-each implementation
On Thu, Jan 22, 2015 at 9:33 AM, Benjamin Coutu ben.co...@zeyos.com wrote: Hi, this post is a fork of the [PHP-DEV] Fixing strange foreach behavior thread. It proposes a more efficient for-each mechanism (that does NOT change the conceptual behaviour). Currently on for-each the engine will have to copy the array if that array is visible anywhere else in the program because it will reset the internal position pointer (which is part of the underlying hashtable structure) and another part of the program might rely on it. Essentially the array gets duplicated prematurely, only because of the internal position pointer. Of course it might have to anyways be duplicated within the for-each loop, but if (any only if) it is actually altered. In most cases one just iterates over without altering. Please consider the following sample, taken from my recent post: $arr = $obj-arr; // property arr is an array foreach ($arr as $val) ...; This will currently copy the array, because $arr is also visible through $obj-arr although this is not really necessary unless the array is actually changed during iteration. If one would use an external position variable that is initialized in FE_RESET (TEMPVAR) and then incremented in FE_FETCH one could just increment the ref_count of the array while being traversed without the initial need to perform copy-on-write. Now, if the hashtable is in any way altered during the traversal then the usual copy-on-write would kick in because for-each initialization made sure that ref_count was incremented before starting traversal. In that case PHP would - just like currently - have to duplicate, but only on first actual alteration, not prematurely on for-each initialization. So in 90% (just a guess) of the cases, when you just traverse without altering you get the full benefit of no-copy-necessary, while in the other cases you will basically have the previous performance penalty of duplication, but at least postponed to the first alteration (which might be inside a branch that is not even taken). Nested for-each loops would not have to revert to copy-on-write either, because they have their own pointer. This would effectively speed up most for-each operations and would have the extra benefit of not having to store an internal pointer in the hashtable structure. Please let me know your thoughts! Cheers, Ben Doing this was the idea I had in mind as well, i.e. change the semantics of foreach to say that it will always work on a copy for by-value iteration (which ironically avoids having to actually copy it). Note that this will differ from the current behavior in a number of ways. In particular it means that changes to arrays that were references prior to iteration will not influence the iteration. The real question is what we should do in the by-reference case. Given that we need to acquire references to elements of the original array we can't reasonably work with copy-semantics (at least I don't see how). So would we just stick with the previous behavior (using the hash position hack) for that? Nikita
Re: [PHP-DEV] Improvements to for-each implementation
Hi Nikita, I would suggest using the proposal for the by-value case and sticking with the current behavior for the by-reference case as you suggested. Granted, we then cannot remove the internal pointer all together, but we would just use it for the less common by-reference case as well as for the legacy reset/next functions, of course. This would still improve most for-each traversals. At least we then wouldn't have to find a replacement solution for rest/prev/next/end. Moreover we can then move the internal position pointer out of the hashtable structure into zend_array because it is only used for userland arrays, not other hash tables (such as object properties or internal structures that build upon the hashtable struct). Thanks, Ben == Original == From: Nikita Popov nikita@gmail.com To: Benjamin Coutu ben.co...@zeyos.com Date: Thu, 22 Jan 2015 10:53:19 +0100 Subject: Re: [PHP-DEV] Improvements to for-each implementation Doing this was the idea I had in mind as well, i.e. change the semantics of foreach to say that it will always work on a copy for by-value iteration (which ironically avoids having to actually copy it). Note that this will differ from the current behavior in a number of ways. In particular it means that changes to arrays that were references prior to iteration will not influence the iteration. The real question is what we should do in the by-reference case. Given that we need to acquire references to elements of the original array we can't reasonably work with copy-semantics (at least I don't see how). So would we just stick with the previous behavior (using the hash position hack) for that? Nikita -- PHP Internals - PHP Runtime Development Mailing List To unsubscribe, visit: http://www.php.net/unsub.php