On Wed, Mar 7, 2012 at 4:38 AM, Anthony Scopatz <scop...@gmail.com> wrote:
> 1) What you described. Every process writes out its own library and then
> a post-process sweeps through and combines them all later. This is
> probably the easiest to implement. You wouldn't even need to dump them
> to ASCII. Tables have an append() method you would find useful.

The documentation page appears to be broken:
http://pytables.github.com/usersguide/ch04.html#Table.append

But I found this: http://www.pytables.org/moin/HintsForSQLUsers

From that I understand that I should get the rows with something like:

[for (x['n'], x['c']...) in table.inter...]

and then append them directly to my "master" DB.

> 2) Have one library per node (ie 10 total libraries, 4 processes per
> library). If the writing is done in a thread safe way, then you only
> have to sweep through and post-process 10 files. Naturally, the
> individual file sizes are larger.

From the FAQ: "several process writing concurrently to the same PyTables
file will probably end corrupting it, so don't do this!"

How can this thread-safe way be achieved? I thought this was impossible
(but there are gaps in my knowledge).

> 3) Have one master process whose sole job it is to write the single
> library. All other 'compute' processes communicate with this process.
> The compute processes will calculate a row of the table and send it back
> over the wire to the master process as a tuple. The master process will
> take this row, put it on a stack and crank through the stack, adding
> rows to the table when it has free time. For communication, you could
> use something like JSON RPC (in the Python standard library) or ZeroMQ /
> pyzmq (which is easy to use and has a lot of nice features) or MPI /
> mpi4py (which is meant for high performance computing concerns). No
> post-processing is needed for this strategy.

This one looks complicated to me. Do you know of any simple example, or
where I could look for one? If there is no easy way, I will stick to the
previous options.

Thank you for your help.

David.
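
For reference, here is a minimal sketch of the single-writer idea from
option 3. It is only an illustration under several assumptions: threads
and the standard library's queue.Queue stand in for the compute
processes and the wire, a plain Python list stands in for the PyTables
table, and the names compute, writer, run and SENTINEL are made up for
this example. With real processes, multiprocessing.Queue offers the same
put()/get() interface, and the writer would call table.append([row])
where the list is appended to.

```python
import queue
import threading

SENTINEL = None  # hypothetical marker meaning "this worker has finished"


def compute(worker_id, n_rows, out_q):
    """Stand-in compute worker: builds each row as a tuple and sends it on."""
    for i in range(n_rows):
        out_q.put((worker_id, i, float(i) ** 2))
    out_q.put(SENTINEL)


def writer(in_q, n_workers, store):
    """The only code that touches the output.

    `store` is a plain list standing in for a table; with PyTables the
    append below would become table.append([row]).
    """
    finished = 0
    while finished < n_workers:
        row = in_q.get()  # blocks until a worker sends something
        if row is SENTINEL:
            finished += 1
        else:
            store.append(row)


def run(n_workers=4, n_rows=5):
    """Start one writer and n_workers compute workers; return the rows."""
    q = queue.Queue()
    store = []
    writer_thread = threading.Thread(target=writer, args=(q, n_workers, store))
    writer_thread.start()
    workers = [
        threading.Thread(target=compute, args=(k, n_rows, q))
        for k in range(n_workers)
    ]
    for t in workers:
        t.start()
    for t in workers:
        t.join()
    writer_thread.join()  # returns once every worker has sent its SENTINEL
    return store


if __name__ == "__main__":
    rows = run()
    print(len(rows), "rows collected by the single writer")
```

Because only the writer ever appends, the concurrent-write problem
mentioned in the FAQ does not arise, and no post-processing step is
needed. The order in which rows arrive from different workers is not
deterministic.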
_______________________________________________
Pytables-users mailing list
Pytables-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/pytables-users