On Jun 18, 2013, at 12:00 AM, Gregory P. Smith <g...@krypto.org> wrote:
> Why did you do this in the 2.7 branch? > > It hasn't been done in 3.3 or default I worked on the 2.7 branch first because that was the one I had loaded and the one where I did timings and code disassembly. I intended to post it to 3.3 and 3.4 as well over the weekend. Ideally, it makes maintenance simpler for me if I keep the branches the same as possible. I viewed the one-line struct transposition, comment correction, and one-line blocklen change as being somewhat innocuous non-algorithmic changes. The struct change fixed an unintended cache miss for left links and the blocksize change causes the deque_index code to compile more efficiently (using a right-shift and bitwise-and and rather than a measurably more expensive division and modulo calculation). I truly wasn't expecting the Spanish Inquisition :-) On Jun 22, 2013, at 1:43 PM, Scott Dial <scott+python-...@scottdial.com> wrote: > changeset 84248:f1dc30a1be72 2.7 > Arrange structure to match the common access patterns. > > 1.1 --- a/Modules/_collectionsmodule.c > 1.2 +++ b/Modules/_collectionsmodule.c > 1.3 @@ -48,8 +48,8 @@ > 1.4 > 1.5 typedef struct BLOCK { > 1.6 struct BLOCK *leftlink; > 1.7 + PyObject *data[BLOCKLEN]; > 1.8 struct BLOCK *rightlink; > 1.9 - PyObject *data[BLOCKLEN]; > 1.10 } block; > 1.11 > 1.12 #define MAXFREEBLOCKS 10 > > , which seems like a strange micro-optimization, at best. Yes, this is a micro-optimization. In working on implementing deque slicing for 3.4, I restudied the block access patterns. On an appendleft(), popleft() or extendleft() operation, the left link is accessed immediately before or after the leftmost entry in the data block. The old struct arrangement can cause an unnecessary cache miss when jumping leftward. This was something I overlooked when I originally wrote the code almost a decade ago. On Jun 23, 2013, at 11:38 AM, Benjamin Peterson <benja...@python.org> wrote: > I've backed this one out, too. Really, you reverted my one-line change within 24 hours of it being posted? I can't be on-line every day and sometimes it takes a little while to catch up with python email. On Jun 22, 2013, at 2:55 PM, Guido van Rossum <gu...@python.org> wrote: > Actually the data buffer is an array of pointers too, so with the > original BLOCKLEN value of 62, sizeof(block) would be 64 times > sizeof(PyObject *). In the Py3 version of the source there's even a > comment explaining this right before the #define BLOCKLEN. Raymond, > can you explain? I also tried to fix that comment so it would stop emphasizing the blocklen being a multiple of the cache line. Also long as there is a reasonably long data block, it matters less whether the data block size is an exact multiple of the cache line length (62 or 64 words of data versus a typical 8 byte cache line). The data block size does matter elsewhere though. The benefit of the having the data block being a power-of-two is that the deque_index computation can use bits shifts rather division and modulo calculations. The benefit of switching data block size from 62 to 64 was measurable (a few percent) and observable in the disassembly of the code. I experimented with one other ordering as well (putting the data block first and the links afterwards). That saved the scaled-index byte in the generated code but produced no measureable speed-up. In short, I believe the following should go in: * The comment fix. (Incorrectly suggesting that a 64-bit Py_ssize_t would overflow). The revised comment is simpler, more general, and correct. * Putting the leftlink before the data block in the structure. The follow is up for debate: Changing the BLOCKLEN from 62 to 64 is debatable. It measureably improved deque_index without an observable negative effect on the other operations. Lastly, there was a change I just put in to Py 3.4 replacing the memcpy() with a simple loop and replacing the "deque->" references with local variables. Besides giving a small speed-up, it made the code more clear and less at the mercy of various implementations of memcpy(). Ideally, I would like 2.7 and 3.3 to replace their use of memcpy() as well, but the flavor of this thread suggests that is right out. Raymond
_______________________________________________ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com