Re: [Python-Dev] CPython optimization: storing reference counters outside of objects

2011-05-24 Thread Sturla Molden
On 24.05.2011 17:39, Artur Siekielski wrote: Disk access is about 1000x slower than memory access in C, and Python in the worst case is 50x slower than C, so there is still a huge win (not to mention that in the common case Python is only a few times slower). You can put databases in shared memor
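Taking Sturla's rough figures at face value (they are estimates from the post, not measurements), the worst case still favors shared memory over disk by the ratio of the two factors, where t_C is the time of a memory access in C:

```latex
\frac{t_{\text{disk}}}{t_{\text{Python, worst case}}} \approx \frac{1000\,t_{\text{C}}}{50\,t_{\text{C}}} = 20
```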

Re: [Python-Dev] CPython optimization: storing reference counters outside of objects

2011-05-24 Thread Cesare Di Mauro
2011/5/24 Stefan Behnel > Maciej Fijalkowski, 24.05.2011 13:31: > > CPython was not designed for CPU cache usage as far as I'm aware. >> > > That's a pretty bold statement to make on this list. Even if it wasn't > originally "designed" for (efficient?) CPU cache usage, it's certainly been > aro

Re: [Python-Dev] CPython optimization: storing reference counters outside of objects

2011-05-24 Thread geremy condra
On Tue, May 24, 2011 at 8:44 AM, Terry Reedy wrote: > On 5/24/2011 8:25 AM, Sturla Molden wrote: > >> Artur Siekielski is not talking about cache locality, but copy-on-write >> fork on Linux et al. >> >> When reference counts are updated after forking, memory pages marked >> copy-on-write are copi

Re: [Python-Dev] CPython optimization: storing reference counters outside of objects

2011-05-24 Thread Terry Reedy
On 5/24/2011 8:25 AM, Sturla Molden wrote: Artur Siekielski is not talking about cache locality, but copy-on-write fork on Linux et al. When reference counts are updated after forking, memory pages marked copy-on-write are copied if they store reference counts. And then he quickly runs out of m

Re: [Python-Dev] CPython optimization: storing reference counters outside of objects

2011-05-24 Thread Artur Siekielski
2011/5/24 Sturla Molden : > On 24.05.2011 11:55, Artur Siekielski wrote: >> >> PYRO/multiprocessing proxies aren't a comparable solution because of >> ORDERS OF MAGNITUDE worse performance. You compare here direct memory >> access vs serialization/message passing through sockets/pipes. > The bottl

Re: [Python-Dev] CPython optimization: storing reference counters outside of objects

2011-05-24 Thread Nick Coghlan
On Tue, May 24, 2011 at 10:05 PM, Stefan Behnel wrote: > Maciej Fijalkowski, 24.05.2011 13:31: >> >> CPython was not designed for CPU cache usage as far as I'm aware. > > That's a pretty bold statement to make on this list. Even if it wasn't > originally "designed" for (efficient?) CPU cache usage

Re: [Python-Dev] CPython optimization: storing reference counters outside of objects

2011-05-24 Thread Stefan Behnel
Antoine Pitrou, 24.05.2011 14:32: On Tue, 24 May 2011 14:05:26 +0200 Stefan Behnel wrote: I doubt that efficient CPU cache usage was a major design goal of PyPy right from the start. IMHO, the project has changed its objectives way too many times to claim something like that, especially at the l

Re: [Python-Dev] CPython optimization: storing reference counters outside of objects

2011-05-24 Thread Antoine Pitrou
On Tue, 24 May 2011 14:05:26 +0200 Stefan Behnel wrote: > > I doubt that efficient CPU cache usage was a major design goal of PyPy > right from the start. IMHO, the project has changed its objectives way too > many times to claim something like that, especially at the low level where > the CPU

Re: [Python-Dev] CPython optimization: storing reference counters outside of objects

2011-05-24 Thread Sturla Molden
On 24.05.2011 11:55, Artur Siekielski wrote: POSH might be good, but the project has been dead for 8 years. And this copy-on-write is nice because you don't need changes/restrictions to your code, or a special garbage collector. Then I have a solution for you, one that is cheaper than anything else

Re: [Python-Dev] CPython optimization: storing reference counters outside of objects

2011-05-24 Thread Sturla Molden
On 24.05.2011 13:31, Maciej Fijalkowski wrote: Not sure exactly what scenario you are discussing here, but storing reference counts outside of objects has (at least on a single processor) worse cache locality than inside objects. Artur Siekielski is not talking about cache locality, but copy
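A back-of-the-envelope model of the copy-on-write cost Sturla describes, under assumptions that are not from the thread: 4096-byte pages, 8-byte counters, equally sized objects packed back to back from a page boundary, and a read-only traversal that touches every object once. It compares the pages dirtied when counts live in object headers with the pages dirtied when they live in one separate array; the function names are hypothetical.

```python
# Toy model of post-fork page copies caused by refcount writes.
# Assumptions: 4096-byte pages, 8-byte counters, objects packed from a page boundary.
PAGE_SIZE = 4096
COUNTER_SIZE = 8


def pages_dirtied_inline(num_objects: int, object_size: int) -> int:
    """Pages the kernel must copy if each refcount lives in its object's header
    and a traversal writes to every object's count once."""
    # The header of object i starts at byte offset i * object_size.
    return len({(i * object_size) // PAGE_SIZE for i in range(num_objects)})


def pages_dirtied_external(num_objects: int) -> int:
    """Pages copied if all refcounts live in one contiguous side array instead."""
    total_bytes = num_objects * COUNTER_SIZE
    return -(-total_bytes // PAGE_SIZE)  # ceiling division


if __name__ == "__main__":
    print(pages_dirtied_inline(100_000, 64))  # 1563
    print(pages_dirtied_external(100_000))    # 196
```

With 100,000 objects of 64 bytes the model gives 1563 pages versus 196, roughly the object_size / COUNTER_SIZE ratio. It ignores other post-fork writes to objects, for example by the cyclic garbage collector.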

Re: [Python-Dev] CPython optimization: storing reference counters outside of objects

2011-05-24 Thread Sturla Molden
On 24.05.2011 11:55, Artur Siekielski wrote: PYRO/multiprocessing proxies aren't a comparable solution because of ORDERS OF MAGNITUDE worse performance. You compare here direct memory access vs serialization/message passing through sockets/pipes. The bottleneck is likely the serialization, bu
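Sturla's point that serialization is the likely bottleneck shows up in what a multiprocessing connection does with every message: Connection.send() pickles the object and recv() unpickles it, so the receiver always works on a copy rather than the sender's memory. A minimal single-process sketch (the helper name is hypothetical; a real setup would put the two ends in different processes):

```python
from multiprocessing import Pipe


def round_trip(obj):
    """Send obj through a multiprocessing Pipe and return what comes out.

    send() pickles the object and recv() unpickles it, so the result is a
    freshly built object graph, never the original objects.
    """
    sender, receiver = Pipe()
    try:
        sender.send(obj)          # pickled and written into the pipe
        return receiver.recv()    # read back and unpickled into new objects
    finally:
        sender.close()
        receiver.close()


if __name__ == "__main__":
    data = {"rows": list(range(10))}
    copy = round_trip(data)
    print(copy == data, copy is data)  # True False
```

Keep the payload small in this single-threaded form: sending before receiving in the same thread relies on the message fitting in the pipe's buffer.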

Re: [Python-Dev] CPython optimization: storing reference counters outside of objects

2011-05-24 Thread Stefan Behnel
Maciej Fijalkowski, 24.05.2011 13:31: CPython was not designed for CPU cache usage as far as I'm aware. That's a pretty bold statement to make on this list. Even if it wasn't originally "designed" for (efficient?) CPU cache usage, it's certainly been around for long enough to have received nu

Re: [Python-Dev] CPython optimization: storing reference counters outside of objects

2011-05-24 Thread Maciej Fijalkowski
On Sun, May 22, 2011 at 1:57 AM, Artur Siekielski wrote: > Hi. > The problem with reference counters is that they are very often > incremented/decremented, even for read-only algorithms (like traversal > of a list). It has two drawbacks: > 1. CPU cache lines (64 bytes on X86) containing a beginnin

Re: [Python-Dev] CPython optimization: storing reference counters outside of objects

2011-05-24 Thread Artur Siekielski
2011/5/24 Sturla Molden : >> Oh, and using explicit shared memory or mmap is much harder, because >> you have to map the whole object graph into bytes. > > It sounds like you need PYRO, POSH or multiprocessing's proxy objects. PYRO/multiprocessing proxies aren't a comparable solution because of ORD

Re: [Python-Dev] CPython optimization: storing reference counters outside of objects

2011-05-23 Thread Nick Coghlan
On Tue, May 24, 2011 at 8:33 AM, Sturla Molden wrote: > On 24.05.2011 00:07, Artur Siekielski wrote: >> >> Oh, and using explicit shared memory or mmap is much harder, because >> you have to map the whole object graph into bytes. > > It sounds like you need PYRO, POSH or multiprocessing's proxy o

Re: [Python-Dev] CPython optimization: storing reference counters outside of objects

2011-05-23 Thread Sturla Molden
On 24.05.2011 00:07, Artur Siekielski wrote: Oh, and using explicit shared memory or mmap is much harder, because you have to map the whole object graph into bytes. It sounds like you need PYRO, POSH or multiprocessing's proxy objects. Sturla

Re: [Python-Dev] CPython optimization: storing reference counters outside of objects

2011-05-23 Thread Artur Siekielski
2011/5/23 Guido van Rossum : >> Anyway, I'd like to have working copy-on-write in CPython - in the >> presence of GIL I find it important to have multiprocess programs >> optimized (and I think it's a common idiom that a parent process >> prepares some big data structure, and child "worker" process
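The idiom Artur describes, a parent building a large structure that forked workers then only read, looks roughly like this. A POSIX-only sketch (it uses os.fork) with a hypothetical function name; the child reads the parent's list directly, without pickling, which is exactly the case where refcount writes force shared pages to be copied.

```python
import os


def sum_in_forked_child(data):
    """Fork a worker that reads `data` inherited from the parent (no pickling)
    and reports sum(data) back through a pipe. POSIX-only."""
    read_fd, write_fd = os.pipe()
    pid = os.fork()
    if pid == 0:  # child: shares the parent's pages copy-on-write
        os.close(read_fd)
        status = 1
        try:
            # sum() takes a reference to each element it reads; for ordinary
            # (non-immortal) objects that writes to the object's header, and the
            # kernel then copies the page holding it.
            os.write(write_fd, str(sum(data)).encode())
            status = 0
        finally:
            os._exit(status)  # skip parent-side cleanup in the child
    os.close(write_fd)
    with os.fdopen(read_fd, "rb") as pipe:
        result = int(pipe.read())
    os.waitpid(pid, 0)
    return result


if __name__ == "__main__":
    big = list(range(1000))       # prepared once by the parent
    print(sum_in_forked_child(big))  # 499500
```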

Re: [Python-Dev] CPython optimization: storing reference counters outside of objects

2011-05-23 Thread Guido van Rossum
On Mon, May 23, 2011 at 1:55 PM, Artur Siekielski wrote: > Ok, I managed to make a quick but working patch (sufficient to get > working interpreter, it segfaults for extension modules). It uses the > "ememoa" allocator (http://code.google.com/p/ememoa/) which seems a > reasonable pool allocator. T

Re: [Python-Dev] CPython optimization: storing reference counters outside of objects

2011-05-23 Thread Artur Siekielski
Ok, I managed to make a quick but working patch (sufficient to get working interpreter, it segfaults for extension modules). It uses the "ememoa" allocator (http://code.google.com/p/ememoa/) which seems a reasonable pool allocator. The patch: http://dpaste.org/K8en/. The main obstacle was that ther
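The patch itself is only linked, so its details are not reproduced here. As a rough illustration of the general idea, the toy Python model below keeps counts in one contiguous array indexed by a slot number and reuses freed slots the way a pool allocator might. The class and method names are hypothetical, and this is not a description of the patch or of ememoa.

```python
from array import array


class RefcountTable:
    """Toy side table: reference counts live in one contiguous array, indexed
    by a slot handed out at registration, so the objects are never written to
    when their count changes."""

    def __init__(self):
        self._counts = array("q")  # contiguous signed 64-bit counters
        self._free = []            # slots released by dead objects

    def register(self) -> int:
        """Allocate a slot with count 1, reusing a freed slot if possible."""
        if self._free:
            slot = self._free.pop()
            self._counts[slot] = 1
        else:
            slot = len(self._counts)
            self._counts.append(1)
        return slot

    def incref(self, slot: int) -> None:
        self._counts[slot] += 1

    def decref(self, slot: int) -> bool:
        """Decrement; return True when the count reaches zero (object is dead)."""
        self._counts[slot] -= 1
        if self._counts[slot] == 0:
            self._free.append(slot)
            return True
        return False

    def count(self, slot: int) -> int:
        return self._counts[slot]


if __name__ == "__main__":
    table = RefcountTable()
    a, b = table.register(), table.register()
    table.incref(a)
    print(table.count(a), table.count(b))  # 2 1
    print(table.decref(b))                 # True
    print(table.register() == b)           # True
```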

Re: [Python-Dev] CPython optimization: storing reference counters outside of objects

2011-05-23 Thread Sturla Molden
On 23.05.2011 06:59, "Martin v. Löwis" wrote: My expectation is that your approach would likely make the issues worse in a multi-CPU setting. If you put multiple reference counters into a contiguous block of memory, unrelated reference counters will live in the same cache line. Consequently,

Re: [Python-Dev] CPython optimization: storing reference counters outside of objects

2011-05-22 Thread Cesare Di Mauro
2011/5/23 "Martin v. Löwis" > > I'm not a compiler/profiling expert so the main question is if such > > design can work, and maybe someone was thinking about something > > similar? > > My expectation is that your approach would likely make the issues > worse in a multi-CPU setting. If you put mul

Re: [Python-Dev] CPython optimization: storing reference counters outside of objects

2011-05-22 Thread Martin v. Löwis
> I'm not a compiler/profiling expert so the main question is if such > design can work, and maybe someone was thinking about something > similar? My expectation is that your approach would likely make the issues worse in a multi-CPU setting. If you put multiple reference counters into a contiguou
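Martin's false-sharing concern comes down to simple arithmetic: with 64-byte lines (the figure from the original post) and, as an assumption, 8-byte counters (a Py_ssize_t on a 64-bit build), eight unrelated objects share one line of a contiguous counter array, so increments from different CPUs contend for the same line. A sketch with hypothetical helper names:

```python
CACHE_LINE = 64    # bytes, x86 figure from the original post
COUNTER_SIZE = 8   # assumed: one Py_ssize_t per counter on a 64-bit build


def cache_line_of(slot: int) -> int:
    """Cache line holding counter `slot` in an array aligned to a line boundary."""
    return (slot * COUNTER_SIZE) // CACHE_LINE


def share_a_line(slot_a: int, slot_b: int) -> bool:
    """True if two objects' counters sit in the same cache line, so writes from
    different CPUs would bounce that line between their caches."""
    return cache_line_of(slot_a) == cache_line_of(slot_b)


if __name__ == "__main__":
    print([cache_line_of(i) for i in range(10)])  # [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
```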

Re: [Python-Dev] CPython optimization: storing reference counters outside of objects

2011-05-22 Thread Charles-François Natali
>> 1. CPU cache lines (64 bytes on X86) containing a beginning of a >> PyObject are very often invalidated, resulting in losing many chances >> to use the CPU caches > > Mutating data doesn't invalidate a cache line. It just makes it > necessary to write it back to memory at some point. > I think

Re: [Python-Dev] CPython optimization: storing reference counters outside of objects

2011-05-22 Thread Antoine Pitrou
Hello, On Sun, 22 May 2011 01:57:55 +0200 Artur Siekielski wrote: > 1. CPU cache lines (64 bytes on X86) containing a beginning of a > PyObject are very often invalidated, resulting in losing many chances > to use the CPU caches Mutating data doesn't invalidate a cache line. It just makes it n

[Python-Dev] CPython optimization: storing reference counters outside of objects

2011-05-21 Thread Artur Siekielski
Hi. The problem with reference counters is that they are very often incremented/decremented, even for read-only algorithms (like traversal of a list). It has two drawbacks: 1. CPU cache lines (64 bytes on X86) containing the beginning of a PyObject are very often invalidated, resulting in losing man
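The first drawback Artur describes, that even read-only code writes to reference counts, can be observed from Python itself. A minimal sketch, assuming CPython: sys.getrefcount is CPython-specific and its absolute values differ between versions, so only the change across the loop is meaningful, which is why no output is shown. The function name is hypothetical.

```python
import sys


def refcounts_around_traversal():
    """Return sys.getrefcount() for one list element before, during, and
    after a read-only loop over the list."""
    item = object()   # an ordinary, non-immortal object
    data = [item]
    before = sys.getrefcount(item)
    for x in data:    # binding x stores a new reference, incrementing the count
        during = sys.getrefcount(item)
    del x             # dropping the loop variable decrements it again
    after = sys.getrefcount(item)
    return before, during, after


if __name__ == "__main__":
    print(refcounts_around_traversal())
```

Nothing in the loop modifies the list or the element, yet the element's header is written twice: once when the count goes up and once when it comes back down.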