Maybe you just have a job for Cap'n Proto? https://capnproto.org/
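Roughly the kind of thing I have in mind, as an untested sketch (the schema, file names, and exact pycapnp calls below are assumptions based on the pycapnp quick-start, not verified code): serialize the big structure once, then let every worker mmap the same file and read it in place instead of holding its own copy.

# Untested sketch: share one big read-only table between processes by
# serializing it once with Cap'n Proto and mmap-ing the file in every worker.
#
# Assumed schema file 'bigdata.capnp':
#
#   @0xbf5147cbbecf40c1;
#   struct Table {
#     struct Row { key @0 :Text; value @1 :Float64; }
#     rows @0 :List(Row);
#   }

import mmap
import capnp  # provided by the pycapnp package

bigdata_capnp = capnp.load('bigdata.capnp')

def build(path):
    # Run once, e.g. in the parent process: write the structure to disk.
    msg = bigdata_capnp.Table.new_message()
    rows = msg.init('rows', 2)
    rows[0].key, rows[0].value = 'alpha', 1.0
    rows[1].key, rows[1].value = 'beta', 2.0
    with open(path, 'wb') as f:
        msg.write(f)

def scan(path):
    # Run in each worker: map the file read-only and walk it in place,
    # without deserializing the whole thing into Python objects.
    with open(path, 'rb') as f:
        buf = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
    with bigdata_capnp.Table.from_bytes(buf) as table:
        return sum(row.value for row in table.rows)

if __name__ == '__main__':
    build('table.bin')
    print(scan('table.bin'))

The reads go against the mapped pages, so many worker processes can share one copy of the data through the page cache; whether the access patterns of the analysis fit Cap'n Proto's structs and lists is the real question.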
On 8 September 2015 at 11:12, Gary Robinson <gary...@me.com> wrote:
> Folks,
>
> If it’s out of line in some way for me to make this comment on this list, let me know and I’ll stop! But I do feel strongly about one issue and think it’s worth mentioning, so here goes.
>
> I read the "A better story for multi-core Python” thread with great interest, because the GIL has actually been a major hindrance to me. I know that for many uses it’s a non-issue. But it was for me.
>
> My situation was that I had a huge (technically mutable, but unchanging) data structure which needed a lot of analysis. CPU time was a major factor — things took days to run. But even so, my time as a programmer was much more important than CPU time. I needed to prototype different algorithms very quickly. Even Cython would have slowed me down too much. Also, I had a lot of reason to want to make use of the many great statistical functions in SciPy, so Python was an excellent choice for me in that way.
>
> So, even though pure Python might not be the right choice for this program in a production environment, it was the right choice for me at the time. And, if I could have accessed as many cores as I wanted, it might have been good enough in production too. But my work was hampered by one thing:
>
> There was a huge data structure that all the analysis needed to access. Using a database would have slowed things down too much. Ideally, I needed to access this same structure from many cores at once. On a Power8 system, for example, with its larger number of cores, performance may well have been good enough for production. In any case, my experimentation and prototyping would have gone more quickly with more cores.
>
> But this data structure was simply too big. Replicating it in different processes used memory far too quickly and was the limiting factor on the number of cores I could use. (I could fork with the big data structure already in memory, but copy-on-write issues due to reference counting caused multiple copies to exist anyway.)
>
> So, one thing I am hoping comes out of any effort in the “A better story” direction is a way to share large data structures between processes. Two possible solutions:
>
> 1) Move the reference counts away from the data structures, so copy-on-write isn’t an issue. That sounds like a lot of work — I have no idea whether it’s practical. It has been mentioned in the “A better story” discussion, but I wanted to bring it up again in the context of my specific use case. Also, it seems worth reiterating that even though copy-on-write forking is a Unix thing, the midipix project appears to bring it to Windows as well (http://midipix.org).
>
> 2) Have a mode where a particular data structure is not reference counted or garbage collected. The programmer would be entirely responsible for manually calling del on the structure if he wants to free that memory. I would imagine this would be controversial, because Python is currently designed in a very different way. However, I see no actual risk if one were to use an @manual_memory_management decorator or some technique like that to make it very clear that the programmer is taking responsibility. I.e., in general, information sharing between subinterpreters would occur through message passing, but there would be the option of the programmer taking responsibility for memory management of a particular structure.
> In my case, the amount of work required for this would have been approximately zero — once the structure was created, it was needed for the lifetime of the process.
>
> Under this second solution, there would be little need to actually remove the reference counts from the data structures — they just wouldn’t be accessed. Maybe it’s not a practical solution, if only because of the overhead of Python needing to check whether a given structure is manually managed or not. In that case, the first solution makes more sense.
>
> In any case, I thought this was worth mentioning, because it has been a real problem for me, and I assume it has been a real problem for other people as well. If a solution is both possible and practical, that would be great.
>
> Thank you for listening,
> Gary
>
> --
>
> Gary Robinson
> gary...@me.com
> http://www.garyrobinson.net
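On the copy-on-write point above: one workaround that already works, when the data can be flattened, is to keep the bulk payload in a single refcount-free buffer (array, numpy, or an mmap) and fork the workers only after it is built. A minimal, self-contained sketch of that pattern follows; the toy array and the sizes are invented for illustration.

# Sketch of the "build once, then fork" pattern. The point is that the big
# payload is one large buffer of raw doubles (a single Python object, one
# refcount), so forked children reading it do not dirty copy-on-write pages
# the way they would while walking millions of small refcounted objects.
# This only helps when the data fits a flat layout; it is not a fix for
# arbitrary object graphs.
import array
import multiprocessing as mp

N = 4000000
BIG = array.array('d', (float(i) for i in range(N)))  # ~32 MB, built pre-fork

def worker(start, stop):
    # Children inherit BIG through fork; reading elements creates temporary
    # floats but never writes into BIG's own data buffer.
    return sum(BIG[start:stop])

if __name__ == '__main__':
    ctx = mp.get_context('fork')  # the 'fork' start method is POSIX-only
    step = N // 4
    chunks = [(i, i + step) for i in range(0, N, step)]
    with ctx.Pool(4) as pool:
        print(sum(pool.starmap(worker, chunks)))

That only sidesteps the per-object refcount writes rather than fixing them; for real object graphs, option 1 in the quoted message (moving the counts away from the objects) is what would actually help.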