Re: [Python-Dev] Yet another "A better story for multi-core Python" comment
On Wed, Sep 09, 2015 at 04:33:49PM -0400, Trent Nelson wrote:
> PyObjects, loads a huge NumPy array, and has a WSS of ~11GB.
> [...]
> I've done a couple of consultancy projects now that were very data science oriented (with huge data sets), so I really gained an appreciation for how common the situation you describe is. It is probably the best demonstration of PyParallel's strengths.

This problem is also common in well-heeled financial services places, many of which are non-Windows. There might be some good opportunities there.

> Trent.

Martin
Re: [Python-Dev] Yet another "A better story for multi-core Python" comment
Hi Gary,

On Tue, Sep 8, 2015 at 4:12 PM, Gary Robinson wrote:
> 1) Move the reference counts away from data structures, so copy-on-write isn’t an issue.

A general note about PyPy --- sorry, it probably doesn't help your use case because SciPy is not supported right now...

Right now, PyPy hits the same problem as CPython, despite not being based on reference counting, because every major collection needs to write flag bits inside the header of every object. However, fixing this issue is much more straightforward here: there are well-documented ways to do it that other virtual machines (for other languages) already use. Mostly, instead of writing one bit in the GC header, we'd write one bit in some compact out-of-line array of bits. Moreover, it is an issue that we hear about every now and again, so we may eventually just do it.

A bientôt,

Armin.
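To make the out-of-line idea concrete, here is a toy Python sketch (purely illustrative, not PyPy's implementation, which lives at a much lower level): writing a mark flag into each object's header dirties the memory pages the objects occupy, so a fork()ed child ends up copying those pages even though it never mutated anything, while marking into a compact side table leaves the object pages clean and shared.

    # Toy sketch: why out-of-line GC mark bits preserve copy-on-write.
    # (Illustrative only; real collectors do this in C, per page/arena.)

    class InHeaderGC:
        """Marking writes a flag into every live object, dirtying the
        pages the objects live on; after fork(), those pages get copied."""
        def mark_all(self, objects):
            for obj in objects:
                obj.gc_marked = True  # write lands inside the object itself

    class OutOfLineGC:
        """Marking writes into one compact side table instead; the pages
        holding the objects stay clean, so the kernel keeps them shared."""
        def __init__(self, objects):
            self._slot = {id(obj): i for i, obj in enumerate(objects)}
            self._marks = bytearray(len(objects))  # one byte per object here

        def mark(self, obj):
            self._marks[self._slot[id(obj)]] = 1  # object pages untouched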
Re: [Python-Dev] Yet another "A better story for multi-core Python" comment
On Wed, Sep 09, 2015 at 04:52:39PM -0400, Gary Robinson wrote:
> I’m going to seriously consider installing Windows or using a dedicated hosted Windows box next time I have this problem so that I can try your solution. It does seem pretty ideal, although the STM branch of PyPy (using http://codespeak.net/execnet/ to access SciPy) might also work at this point.

I'm not sure how up-to-date this is:

    http://pypy.readthedocs.org/en/latest/stm.html

But it sounds like there's a 1.5GB memory limit (or maybe 2.5GB now, I just peeked at core.h linked in that page) and a 4-core segment limit.

PyParallel has no memory limit (although it actually does have support for throttling back memory pressure by not accepting new connections when the system hits 90% physical memory used) and no core limit, and it scales linearly with cores+concurrency.

PyPy-STM and PyParallel are both pretty bleeding edge and experimental though, so I'm sure we both crash as much as each other when exercised outside of our comfort zones :-)

I haven't tried getting the SciPy stack running with PyParallel yet.

Trent.
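The throttling Trent describes is a generic back-off pattern. A minimal sketch of the idea in ordinary Python (using psutil, with a hypothetical handle_connection helper; this is not PyParallel's code):

    # Generic sketch of memory-pressure throttling: stop accepting new
    # connections while physical memory use is above a threshold.
    import socket
    import time
    import psutil

    MEM_THRESHOLD = 90.0  # percent of physical RAM in use

    def serve_forever(listener: socket.socket) -> None:
        while True:
            if psutil.virtual_memory().percent >= MEM_THRESHOLD:
                time.sleep(0.1)  # back off; let in-flight work drain
                continue
            conn, addr = listener.accept()
            handle_connection(conn)  # hypothetical request handler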
Re: [Python-Dev] Yet another "A better story for multi-core Python" comment
> I haven't tried getting the SciPy stack running with PyParallel yet.

That would be essential for my use. I would assume a lot of potential PyParallel users are in the same boat.

Thanks for the info about PyPy limits. You have a really interesting project.

--

Gary Robinson
gary...@me.com
http://www.garyrobinson.net

> On Sep 9, 2015, at 7:02 PM, Trent Nelson wrote:
>
> On Wed, Sep 09, 2015 at 04:52:39PM -0400, Gary Robinson wrote:
>> I’m going to seriously consider installing Windows or using a dedicated hosted Windows box next time I have this problem so that I can try your solution. It does seem pretty ideal, although the STM branch of PyPy (using http://codespeak.net/execnet/ to access SciPy) might also work at this point.
>
> I'm not sure how up-to-date this is:
>
> http://pypy.readthedocs.org/en/latest/stm.html
>
> But it sounds like there's a 1.5GB memory limit (or maybe 2.5GB now, I just peeked at core.h linked in that page) and a 4-core segment limit.
>
> PyParallel has no memory limit (although it actually does have support for throttling back memory pressure by not accepting new connections when the system hits 90% physical memory used) and no core limit, and it scales linearly with cores+concurrency.
>
> PyPy-STM and PyParallel are both pretty bleeding edge and experimental though, so I'm sure we both crash as much as each other when exercised outside of our comfort zones :-)
>
> I haven't tried getting the SciPy stack running with PyParallel yet.
>
> Trent.
Re: [Python-Dev] Yet another "A better story for multi-core Python" comment
On Tue, Sep 08, 2015 at 10:12:37AM -0400, Gary Robinson wrote:
> There was a huge data structure that all the analysis needed to access. Using a database would have slowed things down too much. Ideally, I needed to access this same structure from many cores at once. On a Power8 system, for example, with its larger number of cores, performance may well have been good enough for production. In any case, my experimentation and prototyping would have gone more quickly with more cores.
>
> But this data structure was simply too big. Replicating it in different processes used memory far too quickly and was the limiting factor on the number of cores I could use. (I could fork with the big data structure already in memory, but copy-on-write issues due to reference counting caused multiple copies to exist anyway.)

This problem is *exactly* the type of thing that PyParallel excels at, just FYI. PyParallel can load large, complex data structures now, and then access them freely from within multiple threads. I'd recommend taking a look at the "instantaneous Wikipedia search server" example as a start:

    https://github.com/pyparallel/pyparallel/blob/branches/3.3-px/examples/wiki/wiki.py

That loads a trie with 27 million entries, creates ~27.1 million PyObjects, loads a huge NumPy array, and has a WSS of ~11GB. I've actually got a new version in development that loads 6 tries of the most frequent terms for character lengths 1-6. Once everything is loaded, the data structures can be accessed for free in parallel threads.

There are more details regarding how this is achieved on the landing page:

    https://github.com/pyparallel/pyparallel

I've done a couple of consultancy projects now that were very data science oriented (with huge data sets), so I really gained an appreciation for how common the situation you describe is. It is probably the best demonstration of PyParallel's strengths.

> Gary Robinson gary...@me.com http://www.garyrobinson.net

Trent.
Re: [Python-Dev] Yet another "A better story for multi-core Python" comment
On 09/09/2015 01:33 PM, Trent Nelson wrote:
> This problem is *exactly* the type of thing that PyParallel excels at [...]

Sorry if I missed it, but is PyParallel still Windows only?

--
~Ethan~
Re: [Python-Dev] Yet another "A better story for multi-core Python" comment
On Wed, Sep 09, 2015 at 01:43:19PM -0700, Ethan Furman wrote:
> On 09/09/2015 01:33 PM, Trent Nelson wrote:
>
>> This problem is *exactly* the type of thing that PyParallel excels at [...]
>
> Sorry if I missed it, but is PyParallel still Windows only?

Yeah, still Windows only. Still based off 3.3.5. I'm hoping to rebase off 3.5 after it's tagged and get it into a state where it can at least build on POSIX (i.e. stub enough functions such that it'll compile). That's going to be a lot of work though; I would love to get some help with it.

Trent.
Re: [Python-Dev] Yet another "A better story for multi-core Python" comment
I’m going to seriously consider installing Windows or using a dedicated hosted Windows box next time I have this problem so that I can try your solution. It does seem pretty ideal, although the STM branch of PyPy (using http://codespeak.net/execnet/ to access SciPy) might also work at this point.

Thanks! I still hope CPython has a solution at some point… maybe PyParallel functionality will be integrated into Python 4 circa 2023… :)

--

Gary Robinson
gary...@me.com
http://www.garyrobinson.net

> On Sep 9, 2015, at 4:33 PM, Trent Nelson wrote:
>
> On Tue, Sep 08, 2015 at 10:12:37AM -0400, Gary Robinson wrote:
>> There was a huge data structure that all the analysis needed to access. Using a database would have slowed things down too much. Ideally, I needed to access this same structure from many cores at once. On a Power8 system, for example, with its larger number of cores, performance may well have been good enough for production. In any case, my experimentation and prototyping would have gone more quickly with more cores.
>>
>> But this data structure was simply too big. Replicating it in different processes used memory far too quickly and was the limiting factor on the number of cores I could use. (I could fork with the big data structure already in memory, but copy-on-write issues due to reference counting caused multiple copies to exist anyway.)
>
> This problem is *exactly* the type of thing that PyParallel excels at, just FYI. PyParallel can load large, complex data structures now, and then access them freely from within multiple threads. I'd recommend taking a look at the "instantaneous Wikipedia search server" example as a start:
>
> https://github.com/pyparallel/pyparallel/blob/branches/3.3-px/examples/wiki/wiki.py
>
> That loads a trie with 27 million entries, creates ~27.1 million PyObjects, loads a huge NumPy array, and has a WSS of ~11GB. I've actually got a new version in development that loads 6 tries of the most frequent terms for character lengths 1-6. Once everything is loaded, the data structures can be accessed for free in parallel threads.
>
> There are more details regarding how this is achieved on the landing page:
>
> https://github.com/pyparallel/pyparallel
>
> I've done a couple of consultancy projects now that were very data science oriented (with huge data sets), so I really gained an appreciation for how common the situation you describe is. It is probably the best demonstration of PyParallel's strengths.
>
>> Gary Robinson gary...@me.com http://www.garyrobinson.net
>
> Trent.
Re: [Python-Dev] Yet another "A better story for multi-core Python" comment
> Trent seems to be on to something that requires only a bit of a tilt ;-), and despite the caveat above, I agree with David, check it out:

I emailed with Trent a couple years ago about this very topic. The biggest issue for me was that it was Windows-only, but it sounds like that restriction may be getting closer to possibly going away… (?)

--

Gary Robinson
gary...@me.com
http://www.garyrobinson.net
Re: [Python-Dev] Yet another "A better story for multi-core Python" comment
On 8 September 2015 at 11:07, Gary Robinson wrote:
>> I guess a third possible solution, although it would probably have meant developing something for yourself which would have hit the same "programmer time is critical" issue that you noted originally, would be to create a module that managed the data structure in shared memory, and then use that to access the data from the multiple processes.
>
> I think you mean, write a non-Python data structure in shared memory, such as writing it in C? If so, you’re right, I want to avoid the time overhead for writing something like that. Although I have used C data in shared-memory in the past when the data structure was simple enough. It’s not a foreign concept to me — it just would have been a real nuisance in this case.
>
> An in-memory SQLite database would have been too slow, at least if I used any kind of ORM. Without an ORM it still would have slowed things down while making for code that’s harder to read and write. While I have used in-memory SQLite code at times, I’m not sure how much slowdown it would have engendered in this case.
>
>> Your suggestion (2), of having a non-refcounted data structure is essentially this, doable as an extension module. The core data structures all use refcounting, and that's unlikely to change, but there's nothing to say that an extension module couldn't implement fast data structures with objects allocated from a pool of preallocated memory which is only freed as a complete block.
>
> Again, I think you’re talking about non-Python data structures, for instance C structures, which could be written to be “fast”? Again, I want to avoid writing that kind of code. Sure, for a production project where I had more programmer time, that would be a solution, but that wasn’t my situation. And, ideally, even if I had more time, I would greatly prefer not to have to spend it on that kind of code. I like Python because it saves me time and eliminates potential bugs that are associated with languages like C but not with Python (primarily memory management related). To the extent that I have to write and debug external modules in C or C++, it doesn’t.

I've used cffi to good effect to gain some of the benefits of the "share a lump of memory" model.
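For the curious, a minimal sketch of that model (my own illustration, assuming Unix fork() semantics): an anonymous shared mmap viewed through cffi as a flat C array, so the shared region holds raw values rather than refcounted PyObjects.

    # Sketch: share a lump of memory across fork() and view it with cffi.
    # The shared pages hold raw doubles, not PyObjects, so reading them
    # in the child writes no refcounts and triggers no copy-on-write.
    import mmap
    import os
    import cffi

    ffi = cffi.FFI()
    N = 10000000  # ten million doubles, ~80 MB
    buf = mmap.mmap(-1, N * ffi.sizeof("double"))      # anonymous, MAP_SHARED
    data = ffi.cast("double *", ffi.from_buffer(buf))  # C view of the buffer

    for i in range(N):
        data[i] = float(i)  # parent populates the structure once

    if os.fork() == 0:
        print(data[N - 1])  # child reads the same physical pages
        os._exit(0)
    os.wait()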
Re: [Python-Dev] Yet another "A better story for multi-core Python" comment
Maybe you just have a job for Cap'n Proto? https://capnproto.org/

On 8 September 2015 at 11:12, Gary Robinson wrote:
> Folks,
>
> If it’s out of line in some way for me to make this comment on this list, let me know and I’ll stop! But I do feel strongly about one issue and think it’s worth mentioning, so here goes.
>
> I read the "A better story for multi-core Python” discussion with great interest because the GIL has actually been a major hindrance to me. I know that for many uses, it’s a non-issue. But it was for me.
>
> My situation was that I had a huge (technically mutable, but unchanging) data structure which needed a lot of analysis. CPU time was a major factor — things took days to run. But even so, my time as a programmer was much more important than CPU time. I needed to prototype different algorithms very quickly. Even Cython would have slowed me down too much. Also, I had a lot of reason to want to make use of the many great statistical functions in SciPy, so Python was an excellent choice for me in that way.
>
> So, even though pure Python might not be the right choice for this program in a production environment, it was the right choice for me at the time. And, if I could have accessed as many cores as I wanted, it may have been good enough in production too. But my work was hampered by one thing:
>
> There was a huge data structure that all the analysis needed to access. Using a database would have slowed things down too much. Ideally, I needed to access this same structure from many cores at once. On a Power8 system, for example, with its larger number of cores, performance may well have been good enough for production. In any case, my experimentation and prototyping would have gone more quickly with more cores.
>
> But this data structure was simply too big. Replicating it in different processes used memory far too quickly and was the limiting factor on the number of cores I could use. (I could fork with the big data structure already in memory, but copy-on-write issues due to reference counting caused multiple copies to exist anyway.)
>
> So, one thing I am hoping comes out of any effort in the “A better story” direction would be a way to share large data structures between processes. Two possible solutions:
>
> 1) Move the reference counts away from data structures, so copy-on-write isn’t an issue. That sounds like a lot of work — I have no idea whether it’s practical. It has been mentioned in the “A better story” discussion, but I wanted to bring it up again in the context of my specific use-case. Also, it seems worth reiterating that even though copy-on-write forking is a Unix thing, the midipix project appears to bring it to Windows as well. (http://midipix.org)
>
> 2) Have a mode where a particular data structure is not reference counted or garbage collected. The programmer would be entirely responsible for manually calling del on the structure if he wants to free that memory. I would imagine this would be controversial because Python is currently designed in a very different way. However, I see no actual risk if one were to use an @manual_memory_management decorator or some technique like that to make it very clear that the programmer is taking responsibility. I.e., in general, information sharing between subinterpreters would occur through message passing. But there would be the option of the programmer taking responsibility of memory management for a particular structure. In my case, the amount of work required for this would have been approximately zero — once the structure was created, it was needed for the lifetime of the process.
>
> Under this second solution, there would be little need to actually remove the reference counts from the data structures — they just wouldn’t be accessed. Maybe it’s not a practical solution, if only because of the overhead of Python needing to check whether a given structure is manually managed or not. In that case, the first solution makes more sense.
>
> In any case I thought this was worth mentioning, because it has been a real problem for me, and I assume it has been a real problem for other people as well. If a solution is both possible and practical, that would be great.
>
> Thank you for listening,
> Gary
>
> --
>
> Gary Robinson
> gary...@me.com
> http://www.garyrobinson.net
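For readers unfamiliar with it, Cap'n Proto stores data in a flat, mmap-friendly wire format that can be read without a deserialization step, which is what makes it relevant to sharing a big read-only structure across processes. A rough sketch with pycapnp, the Python bindings; the schema, file names, and fields here are invented for illustration:

    # Hypothetical schema in bigdata.capnp (not from any real project):
    #   @0xbf5147cbbecf40c1;
    #   struct Record { key @0 :Text; value @1 :Float64; }
    #   struct Table  { records @0 :List(Record); }
    import capnp  # pycapnp

    bigdata = capnp.load("bigdata.capnp")  # compiles the schema at import

    # Build the structure once and write it out as one flat message.
    table = bigdata.Table.new_message()
    records = table.init("records", 2)
    records[0].key, records[0].value = "alpha", 1.0
    records[1].key, records[1].value = "beta", 2.0
    with open("table.bin", "wb") as f:
        table.write(f)

    # Each worker process opens the same file; fields are decoded
    # lazily from the buffer rather than copied into Python objects.
    with open("table.bin", "rb") as f:
        view = bigdata.Table.read(f)
        print(view.records[1].value)  # -> 2.0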
[Python-Dev] Yet another "A better story for multi-core Python" comment
Folks,

If it’s out of line in some way for me to make this comment on this list, let me know and I’ll stop! But I do feel strongly about one issue and think it’s worth mentioning, so here goes.

I read the "A better story for multi-core Python” discussion with great interest because the GIL has actually been a major hindrance to me. I know that for many uses, it’s a non-issue. But it was for me.

My situation was that I had a huge (technically mutable, but unchanging) data structure which needed a lot of analysis. CPU time was a major factor — things took days to run. But even so, my time as a programmer was much more important than CPU time. I needed to prototype different algorithms very quickly. Even Cython would have slowed me down too much. Also, I had a lot of reason to want to make use of the many great statistical functions in SciPy, so Python was an excellent choice for me in that way.

So, even though pure Python might not be the right choice for this program in a production environment, it was the right choice for me at the time. And, if I could have accessed as many cores as I wanted, it may have been good enough in production too. But my work was hampered by one thing:

There was a huge data structure that all the analysis needed to access. Using a database would have slowed things down too much. Ideally, I needed to access this same structure from many cores at once. On a Power8 system, for example, with its larger number of cores, performance may well have been good enough for production. In any case, my experimentation and prototyping would have gone more quickly with more cores.

But this data structure was simply too big. Replicating it in different processes used memory far too quickly and was the limiting factor on the number of cores I could use. (I could fork with the big data structure already in memory, but copy-on-write issues due to reference counting caused multiple copies to exist anyway.)

So, one thing I am hoping comes out of any effort in the “A better story” direction would be a way to share large data structures between processes. Two possible solutions:

1) Move the reference counts away from data structures, so copy-on-write isn’t an issue. That sounds like a lot of work — I have no idea whether it’s practical. It has been mentioned in the “A better story” discussion, but I wanted to bring it up again in the context of my specific use-case. Also, it seems worth reiterating that even though copy-on-write forking is a Unix thing, the midipix project appears to bring it to Windows as well. (http://midipix.org)

2) Have a mode where a particular data structure is not reference counted or garbage collected. The programmer would be entirely responsible for manually calling del on the structure if he wants to free that memory. I would imagine this would be controversial because Python is currently designed in a very different way. However, I see no actual risk if one were to use an @manual_memory_management decorator or some technique like that to make it very clear that the programmer is taking responsibility. I.e., in general, information sharing between subinterpreters would occur through message passing. But there would be the option of the programmer taking responsibility of memory management for a particular structure. In my case, the amount of work required for this would have been approximately zero — once the structure was created, it was needed for the lifetime of the process.

Under this second solution, there would be little need to actually remove the reference counts from the data structures — they just wouldn’t be accessed. Maybe it’s not a practical solution, if only because of the overhead of Python needing to check whether a given structure is manually managed or not. In that case, the first solution makes more sense.

In any case I thought this was worth mentioning, because it has been a real problem for me, and I assume it has been a real problem for other people as well. If a solution is both possible and practical, that would be great.

Thank you for listening,
Gary

--

Gary Robinson
gary...@me.com
http://www.garyrobinson.net
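The copy-on-write effect Gary describes is easy to reproduce. A minimal demonstration (sizes arbitrary; the exact memory growth is platform-dependent):

    # Demonstration of refcount-driven copy-on-write: the child only
    # *reads* the shared list, but every Py_INCREF/DECREF during the
    # traversal writes into the objects' headers, so the kernel must
    # copy the pages anyway. Watch the child's RSS grow in top/ps.
    import os
    import time

    big = [str(i) for i in range(20000000)]  # millions of small PyObjects

    pid = os.fork()
    if pid == 0:
        total = sum(len(s) for s in big)  # read-only, yet pages get dirtied
        time.sleep(30)                    # pause so memory can be inspected
        os._exit(0)
    os.waitpid(pid, 0)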
Re: [Python-Dev] Yet another "A better story for multi-core Python" comment
On 8 September 2015 at 15:12, Gary Robinson wrote:
> So, one thing I am hoping comes out of any effort in the “A better story” direction would be a way to share large data structures between processes. Two possible solutions:
>
> 1) Move the reference counts away from data structures, so copy-on-write isn’t an issue. That sounds like a lot of work — I have no idea whether it’s practical. It has been mentioned in the “A better story” discussion, but I wanted to bring it up again in the context of my specific use-case. Also, it seems worth reiterating that even though copy-on-write forking is a Unix thing, the midipix project appears to bring it to Windows as well. (http://midipix.org)
>
> 2) Have a mode where a particular data structure is not reference counted or garbage collected. The programmer would be entirely responsible for manually calling del on the structure if he wants to free that memory. I would imagine this would be controversial because Python is currently designed in a very different way. However, I see no actual risk if one were to use an @manual_memory_management decorator or some technique like that to make it very clear that the programmer is taking responsibility. I.e., in general, information sharing between subinterpreters would occur through message passing. But there would be the option of the programmer taking responsibility of memory management for a particular structure. In my case, the amount of work required for this would have been approximately zero — once the structure was created, it was needed for the lifetime of the process.

I guess a third possible solution, although it would probably have meant developing something for yourself which would have hit the same "programmer time is critical" issue that you noted originally, would be to create a module that managed the data structure in shared memory, and then use that to access the data from the multiple processes. If your data structure is generic enough, you could make such a module generally usable - or there may even be something available already...

I know you said that putting the data into a database would be too slow, but how about an in-memory SQLite database (using shared memory so that there was only one copy for all processes)?

Your suggestion (2), of having a non-refcounted data structure is essentially this, doable as an extension module. The core data structures all use refcounting, and that's unlikely to change, but there's nothing to say that an extension module couldn't implement fast data structures with objects allocated from a pool of preallocated memory which is only freed as a complete block.

These suggestions are probably more suitable for python-list, though, as (unlike your comment on non-refcounted core data structures) they are things you can do in current versions of Python.

Paul
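A rough sketch of Paul's third suggestion for the special case of a flat, fixed-size structure, using only the stdlib plus NumPy, and assuming the Unix fork start method so the shared block is inherited by the workers:

    # Sketch: one shared block via multiprocessing's RawArray, viewed
    # through NumPy, so the big structure is never replicated and holds
    # no per-object refcounts for the workers to dirty.
    import multiprocessing as mp
    from multiprocessing.sharedctypes import RawArray
    import numpy as np

    N = 50000000                                  # 50M doubles, ~400 MB
    shared = RawArray("d", N)                     # one shared memory block
    data = np.frombuffer(shared, dtype=np.float64)
    data[:] = np.random.random(N)                 # parent populates it once

    def worker(lo, hi):
        view = np.frombuffer(shared, dtype=np.float64)  # same physical pages
        print(view[lo:hi].sum())

    if __name__ == "__main__":
        procs = [mp.Process(target=worker, args=(i * N // 4, (i + 1) * N // 4))
                 for i in range(4)]
        for p in procs:
            p.start()
        for p in procs:
            p.join()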
Re: [Python-Dev] Yet another "A better story for multi-core Python" comment
On 08.09.2015 19:17, R. David Murray wrote:
> On Tue, 08 Sep 2015 10:12:37 -0400, Gary Robinson wrote:
>> 2) Have a mode where a particular data structure is not reference counted or garbage collected.
>
> This sounds kind of like what Trent did in PyParallel (in a more generic way).

Yes, I can recall that from his talk as well.
Re: [Python-Dev] Yet another "A better story for multi-core Python" comment
> I guess a third possible solution, although it would probably have meant developing something for yourself which would have hit the same "programmer time is critical" issue that you noted originally, would be to create a module that managed the data structure in shared memory, and then use that to access the data from the multiple processes.

I think you mean, write a non-Python data structure in shared memory, such as writing it in C? If so, you’re right, I want to avoid the time overhead for writing something like that. Although I have used C data in shared-memory in the past when the data structure was simple enough. It’s not a foreign concept to me — it just would have been a real nuisance in this case.

An in-memory SQLite database would have been too slow, at least if I used any kind of ORM. Without an ORM it still would have slowed things down while making for code that’s harder to read and write. While I have used in-memory SQLite code at times, I’m not sure how much slowdown it would have engendered in this case.

> Your suggestion (2), of having a non-refcounted data structure is essentially this, doable as an extension module. The core data structures all use refcounting, and that's unlikely to change, but there's nothing to say that an extension module couldn't implement fast data structures with objects allocated from a pool of preallocated memory which is only freed as a complete block.

Again, I think you’re talking about non-Python data structures, for instance C structures, which could be written to be “fast”? Again, I want to avoid writing that kind of code. Sure, for a production project where I had more programmer time, that would be a solution, but that wasn’t my situation. And, ideally, even if I had more time, I would greatly prefer not to have to spend it on that kind of code. I like Python because it saves me time and eliminates potential bugs that are associated with languages like C but not with Python (primarily memory management related). To the extent that I have to write and debug external modules in C or C++, it doesn’t.

But, my view is: I shouldn’t be forced to even think about that kind of thing. Python should simply provide a solution. The fact that the reference counters are mixed in with the data structure, so that copy-on-write causes copies to be made of the data structure, shouldn’t be something I have to discover by trial and error, or by having deep knowledge of language and OS internals before I start a project, and then have to find a way to work around.

Obviously, Python, like any language, will always have limitations, and therefore it’s arguable that no one should say that any language “should” do anything it doesn’t do; if I don’t like it, I can use a more appropriate language. But these limitations aren’t obvious up-front. They make the language less predictable to people who don’t have deep knowledge, who just want to get something done, and who think Python (especially combined with things like SciPy) looks like a great choice to do it. And that confusion and uncertainty has to be bad for general language acceptance. I don’t see it as a “PR issue” — I see it as a practical issue having to do with the cost of knowledge acquisition. Indeed, I personally lost a lot of time because I didn’t understand these limitations upfront!

Solving the problem I mention here would provide real benefits even with the current multiprocessing module. But it would also make the “A better story” subinterpreter idea a better solution than it would be without it. The subinterpreter multi-core solution is a major project — it seems like it would be a shame to create that solution and still have it not solve the problem discussed here.

Anyway, too much of this post is probably spent proselytizing for my point of view. Members of python-dev can judge it as they think fit — I don’t have much more to say unless anyone has questions. But if I’m missing something about the solutions mentioned by Paul, and they can be implemented in pure Python, I would be much appreciative if that could be explained!

Thanks,
Gary

--

Gary Robinson
gary...@me.com
http://www.garyrobinson.net

> On Sep 8, 2015, at 11:44 AM, Paul Moore wrote:
>
> On 8 September 2015 at 15:12, Gary Robinson wrote:
>> So, one thing I am hoping comes out of any effort in the “A better story” direction would be a way to share large data structures between processes. Two possible solutions:
>>
>> 1) Move the reference counts away from data structures, so copy-on-write isn’t an issue. That sounds like a lot of work — I have no idea whether it’s practical. It has been mentioned in the “A better story” discussion, but I wanted to bring it up again in the context of my specific use-case. Also, it seems worth reiterating that even though copy-on-write forking is a Unix thing, the midipix project appears to bring it to Windows as well. (http://midipix.org)
> [...]
Re: [Python-Dev] Yet another "A better story for multi-core Python" comment
On Tue, 08 Sep 2015 10:12:37 -0400, Gary Robinson wrote:
> 2) Have a mode where a particular data structure is not reference counted or garbage collected.

This sounds kind of like what Trent did in PyParallel (in a more generic way).

--David
Re: [Python-Dev] Yet another "A better story for multi-core Python" comment
R. David Murray writes:
> On Tue, 08 Sep 2015 10:12:37 -0400, Gary Robinson wrote:
>> 2) Have a mode where a particular data structure is not reference counted or garbage collected.
>
> This sounds kind of like what Trent did in PyParallel (in a more generic way).

Except Gary has a large persistent data structure, and Trent's only rule is "don't persist objects you want to operate on in parallel." The similarity may be purely superficial, though.

@Gary: Justifying your request is unnecessary. As far as I can see, everybody acknowledges that "large shared data structure" + "multiple cores" is something that Python doesn't do well enough in some sense. It's just a hard problem, and the applications that really need it are sufficiently specialized that we haven't been able to justify turning the world upside down to serve them.

Trent seems to be on to something that requires only a bit of a tilt ;-), and despite the caveat above, I agree with David, check it out:

    https://mail.python.org/pipermail/python-dev/2015-September/141485.html
Re: [Python-Dev] Yet another "A better story for multi-core Python" comment
On 9/8/2015 2:08 PM, Stephen J. Turnbull wrote:
> R. David Murray writes:
>> On Tue, 08 Sep 2015 10:12:37 -0400, Gary Robinson wrote:
>>> 2) Have a mode where a particular data structure is not reference counted or garbage collected.
>>
>> This sounds kind of like what Trent did in PyParallel (in a more generic way).
>
> Except Gary has a large persistent data structure, and Trent's only rule is "don't persist objects you want to operate on in parallel." The similarity may be purely superficial, though.

That rule, which includes not modifying persistent data, is only for the parallel threads. In his Wikipedia search example, the main thread loads 60 GB of data (and perhaps occasionally updates it) while multiple parallel threads, running on multiple cores, search the persistent data like busy little bees.

--
Terry Jan Reedy