Re: [Pytables-users] Merging multiple DB

Daπid Fri, 09 Mar 2012 01:41:33 -0800

On Wed, Mar 7, 2012 at 4:38 AM, Anthony Scopatz <scop...@gmail.com> wrote:
> 1) What you described.  Every process writes out its own library and then
> a post-process sweeps through and combines them all later.  This is
> probably the easiest to implement.  You wouldn't even need to dump them
> to ASCII.  Tables have an append() method you would find useful.


The documentation page appears to be broken:
http://pytables.github.com/usersguide/ch04.html#Table.append

But I found this: http://www.pytables.org/moin/HintsForSQLUsers.

From that I understand that I should get the rows with something like:

[(x['n'], x['c'], ...) for x in table.iter...]

and then append that directly to my "master" DB.
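
A minimal sketch of that post-processing merge, under stated assumptions: every per-process file holds one table at a hypothetical node `/results`, the columns `n` and `c` come from the snippet above but their types are guesses, and the helper names `write_part` and `merge_parts` are made up for illustration. It uses the PyTables 3.x names (`open_file`, `create_table`); older releases spell them `openFile` and `createTable`. Rows are read in slices and passed straight to `Table.append()`, so no Python-level loop over individual rows is needed.

```python
import numpy as np
import tables

# Assumed row layout: column names from the message, types are guesses.
ROW_DTYPE = np.dtype([("n", np.int64), ("c", np.float64)])


def write_part(path, rows):
    """Write one per-process file with a single table at /results."""
    with tables.open_file(path, mode="w") as h5:
        table = h5.create_table("/", "results", description=ROW_DTYPE)
        table.append(np.array(rows, dtype=ROW_DTYPE))


def merge_parts(part_paths, master_path, chunk=100_000):
    """Append every row of every part file to /results in the master file."""
    with tables.open_file(master_path, mode="a") as master:
        if "/results" in master:
            out = master.root.results
        else:
            out = master.create_table("/", "results", description=ROW_DTYPE)
        for path in part_paths:
            with tables.open_file(path, mode="r") as part:
                src = part.root.results
                # Read in slices so a large part never has to fit in memory.
                for start in range(0, src.nrows, chunk):
                    stop = min(start + chunk, src.nrows)
                    out.append(src.read(start, stop))
        out.flush()
        total = int(out.nrows)
    return total
```

Calling `merge_parts(["part0.h5", "part1.h5"], "master.h5")` would then leave all rows in `master.h5`; the file names here are placeholders.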



> 2) Have one library per node (i.e. 10 total libraries, 4 processes per
> library).  If the writing is done in a thread-safe way, then you only
> have to sweep through and post-process 10 files.  Naturally, the
> individual file sizes are larger.

From the FAQ: "several process writing concurrently to the same
PyTables file will probably end corrupting it, so don't do this!" How
can the writing be done in a thread-safe way? I thought this was
impossible (but there are gaps in my knowledge).
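
One possible reading of "thread-safe", sketched here purely as an assumption and not necessarily what was meant in the quoted message: the processes never write concurrently, because each one takes an exclusive lock before opening the file and closes the file before releasing the lock. The sketch relies on `fcntl.flock`, so it is POSIX-only and may not be reliable on network filesystems; the sidecar `.lock` file, the `/results` node, the column types, and the helper name `append_locked` are all hypothetical.

```python
import fcntl

import numpy as np
import tables

# Assumed row layout: column names from the message, types are guesses.
ROW_DTYPE = np.dtype([("n", np.int64), ("c", np.float64)])


def append_locked(h5_path, rows):
    """Append rows to /results while holding an exclusive lock.

    Only one process at a time can be between LOCK_EX and LOCK_UN, and the
    HDF5 file is fully closed before the lock is released.
    """
    with open(h5_path + ".lock", "w") as lock:
        fcntl.flock(lock, fcntl.LOCK_EX)  # blocks until the lock is free
        try:
            with tables.open_file(h5_path, mode="a") as h5:
                if "/results" in h5:
                    table = h5.root.results
                else:
                    table = h5.create_table(
                        "/", "results", description=ROW_DTYPE
                    )
                table.append(np.array(rows, dtype=ROW_DTYPE))
        finally:
            fcntl.flock(lock, fcntl.LOCK_UN)
```

Opening and closing the file for every call is expensive, so each process would presumably buffer many rows in memory and call `append_locked` only occasionally.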


> 3) Have one master process whose sole job it is to write the single library.
> All other 'compute' processes communicate with this process.  The compute
> processes will calculate a row of the table and send it back over the wire
> to the master process as a tuple.  The master process will take this row,
> put it on a stack and crank through the stack, adding rows to the table when
> it has free time.  For communication, you could use something like JSON RPC
> (in the Python standard library) or ZeroMQ / pyzmq (which is easy to use and
> has a lot of nice features) or MPI / mpi4py (which is meant for high
> performance computing concerns).  No post-processing is needed for this
> strategy.

This one looks complicated to me. Do you know of any simple example,
or where I could look for one? If there is no easy way, I will stick to
the previous options.
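
For reference, a single-machine sketch of the master-writer idea, with several assumptions: it uses `multiprocessing.Queue` from the standard library instead of the JSON RPC, ZeroMQ, or MPI transports named in the quote, so it does not cover processes spread over several nodes. The `/results` node, the column types, and the function names are hypothetical, and the "fork" start method makes it POSIX-only. The point it illustrates is that only the writer process ever opens the HDF5 file.

```python
import multiprocessing as mp

import numpy as np
import tables

# Assumed row layout: column names from the message, types are guesses.
ROW_DTYPE = np.dtype([("n", np.int64), ("c", np.float64)])
DONE = None  # sentinel a compute process sends when it has no more rows


def compute(worker_id, n_rows, queue):
    """Stand-in for the real computation: send each row as a tuple."""
    for i in range(n_rows):
        queue.put((worker_id * 1000 + i, i / 2.0))
    queue.put(DONE)


def writer(path, queue, n_workers):
    """The only process that opens the file; runs until every worker is done."""
    with tables.open_file(path, mode="w") as h5:
        table = h5.create_table("/", "results", description=ROW_DTYPE)
        finished = 0
        while finished < n_workers:
            item = queue.get()
            if item is DONE:
                finished += 1
            else:
                table.append([item])  # one row per call; batching is faster
        table.flush()


def run(path, n_workers=4, n_rows=100):
    """Start one writer and n_workers compute processes, then wait for all."""
    ctx = mp.get_context("fork")  # assumption: a POSIX system
    queue = ctx.Queue()
    w = ctx.Process(target=writer, args=(path, queue, n_workers))
    w.start()
    workers = [
        ctx.Process(target=compute, args=(k, n_rows, queue))
        for k in range(n_workers)
    ]
    for p in workers:
        p.start()
    for p in workers:
        p.join()
    w.join()
```

With this layout no post-processing is needed: after `run("master.h5")` returns, the single file already holds every row (the file name is a placeholder).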


Thank you for your help.

David.
