On Fri, Mar 9, 2012 at 3:40 AM, Daπid <davidmen...@gmail.com> wrote:

> On Wed, Mar 7, 2012 at 4:38 AM, Anthony Scopatz <scop...@gmail.com> wrote:
> > 1) What you described.  Every process writes out its own library and then
> > a post-process sweeps through and combines them all later.  This is
> > probably the easiest to implement.  You wouldn't even need to dump them
> > to ASCII.  Tables have an append() method you would find useful.
>
> The documentation page appears to be broken:
> http://pytables.github.com/usersguide/ch04.html#Table.append
>
> But I found this: http://www.pytables.org/moin/HintsForSQLUsers.
>
> From that I understand that I should get the rows with something like:
>
> [for (x['n'], x['c']...) in table.inter...]
>
> and then append that directly into my "master" DB.
>
>
Sorry about the broken link.  Here is the fixed one:
http://pytables.github.com/usersguide/libref.html#tables.Table.append

Rather than using a list comprehension you might also want to look at:
http://pytables.github.com/usersguide/libref.html#tables.Table.whereAppend
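
For option (1), the post-processing sweep could look roughly like the
sketch below.  It is only an illustration: the Result row layout, the
"results" table name, and the write_part / merge_parts helpers are made up
for the example, and it assumes PyTables 3.x naming (open_file,
create_table), whereas the 2.x releases spell these openFile and
createTable.

```python
import os
import tempfile

import tables  # PyTables (third-party)


class Result(tables.IsDescription):
    # Hypothetical row layout, echoing the 'n' and 'c' fields mentioned above.
    n = tables.Int64Col(pos=0)
    c = tables.Float64Col(pos=1)


def write_part(path, rows):
    """Stand-in for one compute process writing its own library file."""
    with tables.open_file(path, mode="w") as h5:
        table = h5.create_table("/", "results", Result)
        table.append(rows)  # rows is a list of (n, c) tuples


def merge_parts(part_paths, master_path):
    """Append every row of every per-process file to one master table."""
    with tables.open_file(master_path, mode="w") as master:
        dst = master.create_table("/", "results", Result)
        for path in part_paths:
            with tables.open_file(path, mode="r") as part:
                # read() returns a NumPy structured array, which append() accepts.
                dst.append(part.root.results.read())
        dst.flush()
        return int(dst.nrows)


if __name__ == "__main__":
    workdir = tempfile.mkdtemp()
    parts = [os.path.join(workdir, "part%d.h5" % i) for i in range(2)]
    write_part(parts[0], [(1, 0.5), (2, 1.5)])
    write_part(parts[1], [(3, 2.5)])
    print(merge_parts(parts, os.path.join(workdir, "master.h5")))
```

Reading each part in one go keeps the sketch short; for files that do not
fit in memory you would copy in chunks using the start/stop arguments of
read().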


>
>
> > 2) Have one library per node (i.e. 10 total libraries, 4 processes per
> > library).  If the writing is done in a thread-safe way, then you only
> > have to sweep through and post-process 10 files.  Naturally, the
> > individual file sizes are larger.
>
> From the FAQ: "several process writing concurrently to the same
> PyTables file will probably end corrupting it, so don't do this!" How
> can this thread-safe way be achieved? I thought this was impossible
> (but I have knowledge holes). How could that be achieved?
>

That is correct.  Each node needs a single thread that is solely
responsible for writing to the actual file.  You can't write in parallel,
but in this option only one of the cores is ever writing, so the other
cores can continue performing computations.
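
A minimal sketch of that single-writer pattern, using only the standard
library, might look like this.  The compute function and the rows it
produces are made up, and write_rows is a placeholder for whatever actually
touches the file (in practice something like table.append); here it just
collects rows in a list so the sketch runs on its own.

```python
import queue
import threading

_DONE = object()  # sentinel telling the writer thread to stop


def writer(row_queue, write_rows, batch_size=1000):
    """The only thread that ever calls write_rows (e.g. table.append)."""
    batch = []
    while True:
        row = row_queue.get()
        if row is _DONE:
            break
        batch.append(row)
        if len(batch) >= batch_size:
            write_rows(batch)
            batch = []
    if batch:
        write_rows(batch)  # write whatever is left over


def compute(worker_id, n_rows, row_queue):
    """Hypothetical compute step: produce rows and hand them to the writer."""
    for i in range(n_rows):
        row_queue.put((worker_id, i * i))


def run(n_workers=4, n_rows=5):
    stored = []  # stands in for the table on disk
    row_queue = queue.Queue()
    writer_thread = threading.Thread(
        target=writer, args=(row_queue, stored.extend))
    writer_thread.start()
    workers = [threading.Thread(target=compute, args=(w, n_rows, row_queue))
               for w in range(n_workers)]
    for t in workers:
        t.start()
    for t in workers:
        t.join()
    row_queue.put(_DONE)  # all producers finished, let the writer drain
    writer_thread.join()
    return stored


if __name__ == "__main__":
    print(len(run()))
```

For CPU-bound work the compute side would more likely be separate
processes; the shape stays the same with multiprocessing.Queue in place of
queue.Queue.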


>
>
> > 3) Have one master process whose sole job it is to write the single
> > library.  All other 'compute' processes communicate with this process.
> > The compute processes will calculate a row of the table and send it back
> > over the wire to the master process as a tuple.  The master process will
> > take this row, put it on a stack and crank through the stack, adding rows
> > to the table when it has free time.  For communication, you could use
> > something like JSON RPC (in the Python standard library) or ZeroMQ /
> > pyzmq (which is easy to use and has a lot of nice features) or MPI /
> > mpi4py (which is meant for high performance computing concerns).  No
> > post-processing is needed for this strategy.
>
> This one looks complicated to me. Do you know of any simple example,
> or how to look for it? If there is no easy way, I will stick to the
> previous options.
>

I would watch the ZeroMQ introduction videos for a quick overview of how to
do this kind of thing:
http://www.zeromq.org/intro:read-the-manual

If done properly, the difference between this option and (2) is that option
(2) can use Python's threading and multiprocessing modules, whereas this
option requires different, third-party libraries.
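
As a rough idea of what option (3) could look like with pyzmq, here is a
small PUSH/PULL sketch.  The message format, the function names, and the
"done" marker are all invented for the example, the workers are threads
only so that it runs as one script (in real use each would be a separate
process with its own zmq.Context), and write_row is a placeholder for the
code that appends to the table.

```python
import threading

import zmq  # pyzmq (third-party)


def compute_worker(ctx, endpoint, worker_id, n_rows):
    """Hypothetical compute side: push each row to the master, then say done."""
    sock = ctx.socket(zmq.PUSH)
    sock.connect(endpoint)
    for i in range(n_rows):
        sock.send_json({"row": [worker_id, i * i]})
    sock.send_json({"done": worker_id})
    sock.close()


def master(sock, n_workers, write_row):
    """Sole writer: receive rows until every worker has reported done."""
    finished = 0
    while finished < n_workers:
        msg = sock.recv_json()
        if "done" in msg:
            finished += 1
        else:
            write_row(tuple(msg["row"]))


def run(n_workers=3, n_rows=4):
    ctx = zmq.Context()
    pull = ctx.socket(zmq.PULL)
    port = pull.bind_to_random_port("tcp://127.0.0.1")
    endpoint = "tcp://127.0.0.1:%d" % port
    rows = []  # stands in for the table on disk
    workers = [threading.Thread(target=compute_worker,
                                args=(ctx, endpoint, w, n_rows))
               for w in range(n_workers)]
    for t in workers:
        t.start()
    master(pull, n_workers, rows.append)
    for t in workers:
        t.join()
    pull.close()
    ctx.term()
    return rows


if __name__ == "__main__":
    print(len(run()))
```

Because only the master ever touches the file, the compute side never
needs to know that PyTables is involved at all.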

Let us know if you have any other questions!

Be Well
Anthony


>
>
> Thank you for your help.
>
> David.
>
>
> ------------------------------------------------------------------------------
> Virtualization & Cloud Management Using Capacity Planning
> Cloud computing makes use of virtualization - but cloud computing
> also focuses on allowing computing to be delivered as a service.
> http://www.accelacomm.com/jaw/sfnl/114/51521223/
> _______________________________________________
> Pytables-users mailing list
> Pytables-users@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/pytables-users
>