Hi list,

I can now announce that my project has decided to drop MySQL, pass on
ROOT, and move to PyTables as its central data storage solution, thanks
to its speed and ease of use. Thanks for a great product and, in
particular, thank you Francesc for creating it and for easing our burden
with innumerable, elaborate answers to our hard questions.


On Thu, 2009-07-30 at 15:00 +0200, Francesc Alted wrote:
> On Thursday 30 July 2009 at 14:32:42, you wrote:
> > On Thu, 2009-07-30 at 13:46 +0200, Francesc Alted wrote:
> > > On Thursday 30 July 2009 at 12:22:27, David Fokkema wrote:
> > > > On Wed, 2009-07-29 at 18:47 +0200, Francesc Alted wrote:
> > > > > Hi David,
> > > > >
> > > > > If you need concurrency for writing, you can always set up a data
> > > > > collector that gathers info from the several threads using a
> > > > > `Queue.Queue()` container. As it is thread-safe, you don't have to
> > > > > worry about concurrency problems.
> > > >
> > > > Yes, I've used that before quite successfully. I'm, however, not quite
> > > > sure how separate apache threads will be able to contact my writer
> > > > thread. Maybe I'll have to dive into the apache documentation. Another
> > > > problem is that we'd like to have confirmation that the data was
> > > > actually stored, so simple queuing won't suffice, I guess. Offloading
> > > > the writing of data to a single thread is a solution, then.
> > >
> > > This is exactly what I meant.  Several threads sending data to the same
> > > queue, to be drained by the writing thread later on.  You can also
> > > arrange a simple protocol for ensuring that data has been effectively
> > > written (or not).
> >
> > Yes, but I can't figure out how to have multiple apache processes
> > connect to the same Queue object.
> 
> Ah, correct, you are using different *processes*, not threads.  Well, I think
> that some kind of communications package must be used then.  There are plenty
> of options, but Pyro [1] or Ice [2] (I was recently told about this one) seem
> to be powerful and easy enough to program.  If you want more performance, you
> may want to use MPI via mpi4py [3], but I don't really think you are going to
> need this.
> 
> [1] http://pyro.sourceforge.net/
> [2] http://www.zeroc.com/icepy.html
> [3] http://mpi4py.scipy.org/
> 

We'll look into those, thanks!
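To make sure I understand the collector idea, here is a minimal sketch using
only the standard library's `multiprocessing` module: a single writer drains
a queue and sends back a per-item acknowledgment. All names are illustrative,
and the actual storage step is left as a comment; for genuinely unrelated
apache processes, the two queues could be exposed over a TCP socket with
`multiprocessing.managers.BaseManager` instead of being passed directly.

```python
import multiprocessing as mp

def writer(task_q, ack_q):
    """Single writer process: drains the queue and acknowledges each item."""
    while True:
        item = task_q.get()
        if item is None:      # sentinel tells the writer to shut down
            break
        # ... here the item would be appended to the PyTables file,
        # followed by a flush() ...
        ack_q.put(item)       # confirmation that the item was stored

def store_with_ack(events):
    """Send events to the writer and wait for a per-item confirmation."""
    task_q, ack_q = mp.Queue(), mp.Queue()
    w = mp.Process(target=writer, args=(task_q, ack_q))
    w.start()
    acks = []
    for event in events:
        task_q.put(event)
        acks.append(ack_q.get(timeout=10))  # block until the item is stored
    task_q.put(None)          # shut the writer down
    w.join()
    return acks
```

Since only the writer touches the file, no locking is needed on the HDF5 side,
and the acknowledgment queue gives the confirmation-of-storage we were after.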

> [clip]
> > > Mmh, that's a good question :-)  Well, provided that Pro allows me to
> > > dedicate time to the PyTables project as a whole, I must admit that I
> > > won't be very happy if that happens one day, and perhaps some users
> > > won't be happy either that the main developer of PyTables has to cease
> > > his work on it --unless, of course, other developers can continue my
> > > work under another business model.
> >
> > Figured this much. It's a pity, though. I guess some corporation has to
> > step in and decide to fund you so that Pro can be fully open source, ;-)
> 
> That would be nice indeed :)

Any candidates yet?

> 
> > > Having said this, I can't stop anybody from doing things, and most
> > > especially can't prevent the implementation of new indexing schemes on
> > > top of PyTables, which can be a *good* thing for the community.
> > > However, re-implementing OPSI as open source seems to me like a not
> > > very innovative project for others, so it might be more interesting to
> > > go for something better (which is possible for sure).
> >
> > Not innovative as opposed to Pro, but OPSI itself seems to be
> > innovative. And fast, apparently. Oh well, I certainly don't want you to
> > stop working on PyTables, :-)
> 
> Thank you :-)

:-)

> 
> > > > > Pro, perhaps you can split your data into several tables (hourly
> > > > > tables?), set up some search code that can select the appropriate
> > > > > table, and then do an in-kernel query on the interesting table.
> > > >
> > > > Several identical (in structure) tables in one file? Haven't thought of
> > > > that...
> > >
> > > Why not?  It can be a way to have better access to your data (if properly
> > > categorized, of course).
> >
> > Indeed. It just never occurred to me. I have several tables in my
> > offline analysis files, but that is just because they _are_ different.
> > When thinking about partitioning, I was thinking about splitting into
> > several files, not into several tables, for no particular reason
> > really.
> >
> > Do you see a performance difference between the two options? Right now I
> > can see that splitting up in tables has the benefit that you only have
> > one file open, whereas splitting up in files has the benefit that you
> > can more easily move data around.
> 
> You can also move tables around when working with the single-file approach
> (see the `File.moveNode` method, or the ``ptrepack`` utility).  I don't see
> any disadvantage in having several tables in one single file, other than
> that you should keep their number small (around some thousands) to avoid
> incurring performance penalties from internal HDF5 pointer handling.

Some thousands... that won't be a problem for us. We only have about a
hundred detector stations. Furthermore, since millions of rows are not
really a problem, it might be faster to dump them all into one table. One
use case has us fetching all station timestamps to look for coincidences,
which might as well be done from one table.
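For what it's worth, here is a rough sketch of how that single-table
coincidence lookup could go with an in-kernel query. The column names, toy
data, and time window are made up; I'm also using the modern
underscore-style API names (`open_file`, `create_table`, `read_where`),
which the 2.x series spells in camelCase (`openFile`, etc.):

```python
import tables

def coincidence_stations(path, t0, t1):
    """Build a hypothetical single 'events' table for all stations and
    return the stations that triggered inside the window [t0, t1)."""
    with tables.open_file(path, mode='w') as f:
        table = f.create_table('/', 'events',
                               {'station_id': tables.Int32Col(),
                                'timestamp': tables.Float64Col()})
        row = table.row
        for sid, ts in [(1, 10.2), (2, 10.3), (3, 42.0)]:  # toy data
            row['station_id'] = sid
            row['timestamp'] = ts
            row.append()
        table.flush()
        # In-kernel query: the condition is evaluated inside PyTables
        # (via numexpr), not row-by-row in Python.
        hits = table.read_where('(timestamp >= t0) & (timestamp < t1)',
                                condvars={'t0': t0, 't1': t1})
        return sorted(int(s) for s in hits['station_id'])
```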

How easy is it to open a new file, lift lots of data from the old file
(possibly using a selection mechanism), and 'copy' it to the new file? Of
course, you can iterate and table.row.append() each row, but is there a
faster solution?
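Right now the best I can come up with is something along these lines, using
`Table.append_where` (`whereAppend` in the 2.x spelling) to do the selection
and the append in one call, without any Python-level row iteration -- is
that the recommended route? The file layout, column names, and condition
below are just an illustration:

```python
import tables

def make_source(path, n=1000):
    """Hypothetical source file: one 'events' table of station timestamps."""
    with tables.open_file(path, mode='w') as f:
        tbl = f.create_table('/', 'events',
                             {'station_id': tables.Int32Col(),
                              'timestamp': tables.Float64Col()})
        row = tbl.row
        for i in range(n):
            row['station_id'] = i % 100
            row['timestamp'] = float(i)
            row.append()
        tbl.flush()

def copy_selection(src_path, dst_path, condition):
    """Copy only the rows matching `condition` into a fresh file."""
    with tables.open_file(src_path, mode='r') as src, \
         tables.open_file(dst_path, mode='w') as dst:
        src_tbl = src.root.events
        # New, empty table with the same description as the source.
        dst_tbl = dst.create_table('/', 'events', src_tbl.description)
        # In-kernel select + append in one call, no Python-level loop.
        src_tbl.append_where(dst_tbl, condition)
        dst_tbl.flush()
        return dst_tbl.nrows
```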

Thanks,

David


_______________________________________________
Pytables-users mailing list
Pytables-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/pytables-users
