Re: [Pytables-users] pytables or pyroot?

Francesc Alted Thu, 30 Jul 2009 06:02:28 -0700

On Thursday 30 July 2009 14:32:42 you wrote:
> On Thu, 2009-07-30 at 13:46 +0200, Francesc Alted wrote:
> > On Thursday 30 July 2009 12:22:27 David Fokkema wrote:
> > > On Wed, 2009-07-29 at 18:47 +0200, Francesc Alted wrote:
> > > > Hi David,
> > > >
> > > > If you need concurrency for writing, you can always set up a data
> > > > collector that gathers info from the several threads by using the
> > > > `Queue.Queue()` container. As it is thread safe, you don't have to
> > > > worry about concurrency problems.
> > >
> > > Yes, I've used that before quite successfully. However, I'm not quite
> > > sure how separate Apache threads will be able to contact my writer
> > > thread. Maybe I'll have to dive into the Apache documentation. Another
> > > problem is that we'd like to have a confirmation that the data was
> > > actually stored, so simple queuing won't suffice, I guess. Using
> > > threads and offloading the writing of data to one thread is a solution, then.
> >
> > This is exactly what I meant.  Several threads send data to the same
> > queue, and the writing thread extracts it later on.  You can
> > also devise a simple protocol for ensuring that data has been
> > effectively written (or not).
>
> Yes, but I can't figure out how to have multiple apache processes
> connect to the same Queue object.


Ah, correct, you are using different *processes*, not threads.  Well, I think 
that some kind of communications package must be used then.  There are plenty 
of options, but Pyro [1] or Ice [2] (which I was told about recently) seem to 
be powerful and easy enough to program.  If you want more performance, you may 
want to use MPI via mpi4py [3], but I don't really think you are going to need 
this.

[1] http://pyro.sourceforge.net/
[2] http://www.zeroc.com/icepy.html
[3] http://mpi4py.scipy.org/
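For the single-process, multi-threaded case discussed in the quoted exchange, a minimal sketch of the data-collector pattern with a write confirmation might look like this. It assumes Python 3 (where the module is `queue`, not `Queue`); the names `writer`, `submit` and `run_demo` are hypothetical, and a plain list stands in for the PyTables table, where a real writer would call something like `table.append(rows)` followed by `table.flush()`. It does not solve the multi-process Apache problem, which still needs a communications layer such as the ones listed above.

```python
import queue
import threading

# Sentinel that tells the writer thread to shut down.
_STOP = object()


def writer(requests, storage):
    """Single writer thread: the only code that touches `storage`.

    Each request is a (record, done_event, result) triple. The writer stores
    the record, fills in `result`, and sets `done_event` so the producer
    learns whether the write succeeded.
    """
    while True:
        item = requests.get()
        if item is _STOP:
            break
        record, done, result = item
        try:
            # Stand-in for the real storage call (hypothetical here).
            storage.append(record)
            result["ok"] = True
        except Exception as exc:  # report the failure back to the producer
            result["ok"] = False
            result["error"] = exc
        finally:
            done.set()


def submit(requests, record, timeout=5.0):
    """Called from any producer thread; blocks until the writer answers.

    Returns True only if the writer confirmed the write within `timeout`.
    """
    done = threading.Event()
    result = {}
    requests.put((record, done, result))
    if not done.wait(timeout):
        return False  # not confirmed in time
    return result.get("ok", False)


def run_demo(n_producers=10):
    """Start one writer and several producers; return (storage, confirmations)."""
    requests = queue.Queue()
    storage = []
    confirmations = [None] * n_producers
    w = threading.Thread(target=writer, args=(requests, storage))
    w.start()

    def producer(i):
        # Each producer writes to its own slot, so no extra lock is needed.
        confirmations[i] = submit(requests, {"id": i})

    producers = [threading.Thread(target=producer, args=(i,))
                 for i in range(n_producers)]
    for t in producers:
        t.start()
    for t in producers:
        t.join()
    requests.put(_STOP)
    w.join()
    return storage, confirmations
```

Each producer waits on its own `threading.Event`, so the confirmation comes from the writer thread itself rather than from the queue. A `False` after a timeout only means the write was not confirmed in time, not that it failed.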

[clip]
> > Mmh, that's a good question :-)  Well, given that Pro allows me to
> > dedicate time to the PyTables project as a whole, I must admit that I
> > won't be very happy if that happens one day, and perhaps some users won't
> > be happy either that the main developer of PyTables has to cease his
> > work on it --unless, of course, other developers can continue my work by
> > using another business model.
>
> Figured this much. It's a pity, though. I guess some corporation has to
> step in and decide to fund you so that Pro can be fully open source, ;-)

That would be nice indeed :)

> > Having said this, I can't stop anybody from doing things, and especially
> > not from implementing new indexing schemes on top of PyTables,
> > which can be a *good* thing for the community.  However, re-implementing
> > OPSI as open source does not seem like a very innovative project for
> > others to take on, so it might be more interesting to go for something
> > better (which is possible for sure).
>
> Not innovative as opposed to Pro, but OPSI itself seems to be
> innovative. And fast, apparently. Oh well, certainly don't want you to
> stop working on pytables, :-)

Thank you :-)

> > > > Pro, perhaps you can split your data into several tables (hourly
> > > > tables?), set up some search code that can select the appropriate
> > > > table and then do an in-kernel query on the interesting table.
> > >
> > > Several identical (in structure) tables in one file? Haven't thought of
> > > that...
> >
> > Why not?  It can be a way to have better access to your data (if properly
> > categorized, of course).
>
> Indeed. It just never occurred to me. I have several tables in my offline
> analysis files, but that is just because they _are_ different. When
> thinking about partitioning, I was thinking about splitting into several
> files, not splitting into several tables, for no particular reason
> really.
>
> Do you see a performance difference between the two options? Right now I
> can see that splitting up in tables has the benefit that you only have
> one file open, whereas splitting up in files has the benefit that you
> can more easily move data around.

You can also move tables around when working with the single-file approach 
(see the `File.moveNode` method, or the ``ptrepack`` utility).  I don't see 
any disadvantage in having several tables in one single file, other than that 
you should keep their number small (a few thousand at most) to avoid incurring 
performance penalties derived from internal HDF5 pointer handling.
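As a minimal sketch of the hourly-table idea, assuming hypothetical node names such as `/events/h2009073014` under a hypothetical `/events` group, the search code only has to map a time range to the table paths it overlaps and then run an in-kernel query on each. The PyTables calls use the current 3.x names (`get_node`, `read_where`); the 2009-era API spelled these `getNode` and `readWhere`, just as `File.moveNode` is now `File.move_node`.

```python
from datetime import datetime, timedelta


def hourly_table_path(ts, group="/events"):
    """Map a timestamp to the path of its (hypothetical) hourly table."""
    return "{}/h{:%Y%m%d%H}".format(group, ts)


def tables_for_range(start, stop, group="/events"):
    """Return paths of all hourly tables overlapping the interval [start, stop)."""
    if stop <= start:
        return []
    hour = start.replace(minute=0, second=0, microsecond=0)
    paths = []
    while hour < stop:
        paths.append(hourly_table_path(hour, group))
        hour += timedelta(hours=1)
    return paths


def query_range(h5file, start, stop, condition, condvars=None, group="/events"):
    """Run an in-kernel query on every existing hourly table in [start, stop).

    `h5file` is assumed to be an open PyTables 3.x `File`; hours with no
    table are skipped. The condition should still filter on the time column,
    because the first and last tables may extend beyond the requested range.
    """
    results = []
    for path in tables_for_range(start, stop, group):
        if path in h5file:
            table = h5file.get_node(path)
            results.append(table.read_where(condition, condvars))
    return results


if __name__ == "__main__":
    print(tables_for_range(datetime(2009, 7, 30, 13, 30),
                           datetime(2009, 7, 30, 15, 0)))
    # ['/events/h2009073013', '/events/h2009073014']
```

Keeping the table selection in plain Python means only the tables that can contain matching rows are ever opened, and the heavy filtering is left to PyTables' in-kernel `read_where`.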

Cheers,

-- 
Francesc Alted

_______________________________________________
Pytables-users mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/pytables-users
