I’m going to seriously consider installing Windows or using a dedicated hosted Windows box next time I have this problem so that I can try your solution. It does seem pretty ideal, although the STM branch of PyPy (using http://codespeak.net/execnet/ to access SciPy) might also work at this point.
Thanks! I still hope CPython has a solution at some point… maybe PyParallel functionality will be integrated into Python 4 circa 2023… :)

-- 
Gary Robinson
gary...@me.com
http://www.garyrobinson.net

> On Sep 9, 2015, at 4:33 PM, Trent Nelson <tr...@snakebite.org> wrote:
>
> On Tue, Sep 08, 2015 at 10:12:37AM -0400, Gary Robinson wrote:
>> There was a huge data structure that all the analysis needed to
>> access. Using a database would have slowed things down too much.
>> Ideally, I needed to access this same structure from many cores at
>> once. On a Power8 system, for example, with its larger number of
>> cores, performance may well have been good enough for production. In
>> any case, my experimentation and prototyping would have gone more
>> quickly with more cores.
>>
>> But this data structure was simply too big. Replicating it in
>> different processes used memory far too quickly and was the limiting
>> factor on the number of cores I could use. (I could fork with the big
>> data structure already in memory, but copy-on-write issues due to
>> reference counting caused multiple copies to exist anyway.)
>
> This problem is *exactly* the type of thing that PyParallel excels at,
> just FYI. PyParallel can load large, complex data structures now, and
> then access them freely from within multiple threads. I'd recommend
> taking a look at the "instantaneous Wikipedia search server" example as
> a start:
>
> https://github.com/pyparallel/pyparallel/blob/branches/3.3-px/examples/wiki/wiki.py
>
> That loads a trie with 27 million entries, creates ~27.1 million
> PyObjects, loads a huge NumPy array, and has a WSS of ~11GB. I've
> actually got a new version in development that loads 6 tries of the
> most frequent terms for character lengths 1-6. Once everything is
> loaded, the data structures can be accessed for free in parallel
> threads.
>
> There are more details regarding how this is achieved on the landing
> page:
>
> https://github.com/pyparallel/pyparallel
>
> I've done a couple of consultancy projects now that were very data
> science oriented (with huge data sets), so I really gained an
> appreciation for how common the situation you describe is. It is
> probably the best demonstration of PyParallel's strengths.
>
>> Gary Robinson gary...@me.com http://www.garyrobinson.net
>
> Trent.
_______________________________________________
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
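
As a rough illustration of the copy-on-write issue mentioned in the quoted message, here is a minimal sketch, assuming CPython, of how merely taking a reference to an object changes its reference count. CPython keeps that count inside the object itself, so read-only access in a forked child still writes to the memory page holding the object, and the operating system then copies that page. The names `refcount_delta_on_read` and `big_structure` are made up for this example and do not come from the thread.

```python
import sys


def refcount_delta_on_read(obj):
    """Return how much obj's reference count grows when a new name is bound to it."""
    before = sys.getrefcount(obj)
    alias = obj  # no mutation, just one more reference to the same object
    after = sys.getrefcount(obj)
    del alias
    return after - before


# Hypothetical stand-in for the "huge data structure" in the thread.
big_structure = [[i, str(i)] for i in range(1000)]

# Reading each element bumps its refcount field, which is a write to the
# memory holding that object, even though the data is never modified.
deltas = [refcount_delta_on_read(item) for item in big_structure]
```

On a standard CPython build every entry in `deltas` should be nonzero, which suggests why a forked worker that only reads a large structure can still end up with its own copy of most of the pages.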