Maybe you just have a job for Cap'n Proto? https://capnproto.org/
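Roughly the kind of thing I have in mind, as an untested sketch (the schema, file names, and exact pycapnp calls below are assumptions based on the pycapnp quick-start, not verified code): serialize the big structure once, then let every worker mmap the same file and read it in place instead of holding its own copy.

# Untested sketch: share one big read-only table between processes by
# serializing it once with Cap'n Proto and mmap-ing the file in every worker.
#
# Assumed schema file 'bigdata.capnp':
#
#   @0xbf5147cbbecf40c1;
#   struct Table {
#     struct Row { key @0 :Text; value @1 :Float64; }
#     rows @0 :List(Row);
#   }

import mmap
import capnp  # provided by the pycapnp package

bigdata_capnp = capnp.load('bigdata.capnp')

def build(path):
    # Run once, e.g. in the parent process: write the structure to disk.
    msg = bigdata_capnp.Table.new_message()
    rows = msg.init('rows', 2)
    rows[0].key, rows[0].value = 'alpha', 1.0
    rows[1].key, rows[1].value = 'beta', 2.0
    with open(path, 'wb') as f:
        msg.write(f)

def scan(path):
    # Run in each worker: map the file read-only and walk it in place,
    # without deserializing the whole thing into Python objects.
    with open(path, 'rb') as f:
        buf = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
    with bigdata_capnp.Table.from_bytes(buf) as table:
        return sum(row.value for row in table.rows)

if __name__ == '__main__':
    build('table.bin')
    print(scan('table.bin'))

The reads go against the mapped pages, so many worker processes can share one copy of the data through the page cache; whether the access patterns of the analysis fit Cap'n Proto's structs and lists is the real question.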
On 8 September 2015 at 11:12, Gary Robinson <gary...@me.com> wrote:
> Folks,
>
> If it’s out of line in some way for me to make this comment on this list, let me know and I’ll stop! But I do feel strongly about one issue and think it’s worth mentioning, so here goes.
>
> I read the "A better story for multi-core Python” thread with great interest, because the GIL has actually been a major hindrance to me. I know that for many uses it’s a non-issue. But it was for me.
>
> My situation was that I had a huge (technically mutable, but unchanging) data structure which needed a lot of analysis. CPU time was a major factor — things took days to run. But even so, my time as a programmer was much more important than CPU time. I needed to prototype different algorithms very quickly. Even Cython would have slowed me down too much. Also, I had a lot of reason to want to make use of the many great statistical functions in SciPy, so Python was an excellent choice for me in that way.
>
> So, even though pure Python might not be the right choice for this program in a production environment, it was the right choice for me at the time. And, if I could have accessed as many cores as I wanted, it might have been good enough in production too. But my work was hampered by one thing:
>
> There was a huge data structure that all the analysis needed to access. Using a database would have slowed things down too much. Ideally, I needed to access this same structure from many cores at once. On a Power8 system, for example, with its larger number of cores, performance may well have been good enough for production. In any case, my experimentation and prototyping would have gone more quickly with more cores.
>
> But this data structure was simply too big. Replicating it in different processes used memory far too quickly and was the limiting factor on the number of cores I could use. (I could fork with the big data structure already in memory, but copy-on-write issues due to reference counting caused multiple copies to exist anyway.)
>
> So, one thing I am hoping comes out of any effort in the “A better story” direction is a way to share large data structures between processes. Two possible solutions:
>
> 1) Move the reference counts away from the data structures, so copy-on-write isn’t an issue. That sounds like a lot of work — I have no idea whether it’s practical. It has been mentioned in the “A better story” discussion, but I wanted to bring it up again in the context of my specific use case. Also, it seems worth reiterating that even though copy-on-write forking is a Unix thing, the midipix project appears to bring it to Windows as well (http://midipix.org).
>
> 2) Have a mode where a particular data structure is not reference counted or garbage collected. The programmer would be entirely responsible for manually calling del on the structure if he wants to free that memory. I would imagine this would be controversial, because Python is currently designed in a very different way. However, I see no actual risk if one were to use an @manual_memory_management decorator or some technique like that to make it very clear that the programmer is taking responsibility. I.e., in general, information sharing between subinterpreters would occur through message passing, but there would be the option of the programmer taking responsibility for memory management of a particular structure.
> In my case, the amount of work required for this would have been approximately zero — once the structure was created, it was needed for the lifetime of the process.
>
> Under this second solution, there would be little need to actually remove the reference counts from the data structures — they just wouldn’t be accessed. Maybe it’s not a practical solution, if only because of the overhead of Python needing to check whether a given structure is manually managed or not. In that case, the first solution makes more sense.
>
> In any case, I thought this was worth mentioning, because it has been a real problem for me, and I assume it has been a real problem for other people as well. If a solution is both possible and practical, that would be great.
>
> Thank you for listening,
> Gary
>
> --
>
> Gary Robinson
> gary...@me.com
> http://www.garyrobinson.net
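On the copy-on-write point above: one workaround that already works, when the data can be flattened, is to keep the bulk payload in a single refcount-free buffer (array, numpy, or an mmap) and fork the workers only after it is built. A minimal, self-contained sketch of that pattern follows; the toy array and the sizes are invented for illustration.

# Sketch of the "build once, then fork" pattern. The point is that the big
# payload is one large buffer of raw doubles (a single Python object, one
# refcount), so forked children reading it do not dirty copy-on-write pages
# the way they would while walking millions of small refcounted objects.
# This only helps when the data fits a flat layout; it is not a fix for
# arbitrary object graphs.
import array
import multiprocessing as mp

N = 4000000
BIG = array.array('d', (float(i) for i in range(N)))  # ~32 MB, built pre-fork

def worker(start, stop):
    # Children inherit BIG through fork; reading elements creates temporary
    # floats but never writes into BIG's own data buffer.
    return sum(BIG[start:stop])

if __name__ == '__main__':
    ctx = mp.get_context('fork')  # the 'fork' start method is POSIX-only
    step = N // 4
    chunks = [(i, i + step) for i in range(0, N, step)]
    with ctx.Pool(4) as pool:
        print(sum(pool.starmap(worker, chunks)))

That only sidesteps the per-object refcount writes rather than fixing them; for real object graphs, option 1 in the quoted message (moving the counts away from the objects) is what would actually help.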