Quick and easy solution: at the end of each day, move all the
time-series data out of MySQL and into PyTables. Run the job every 10
minutes if you need the data sooner.

Alternatively, write a single server program that inserts data into
backend storage sequentially. The front-end Apache server processes
(if you really need them to be Apache) can just forward data to it --
over the network, or by writing files on disk (hint: renaming a file
within the same filesystem is atomic, so the server program never sees
a half-written file).
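
A minimal sketch of the files-on-disk variant, assuming a POSIX-style
filesystem. The names spool_event, drain_spool, the tmp/ and ready/
subdirectories, and the JSON encoding are all made up for illustration;
the store callback stands in for whatever appends a row to the backend
(for example a PyTables table).

```python
import itertools
import json
import os
import tempfile
import time

# Per-process counter so two events spooled in the same clock tick
# still get distinct, ordered file names.
_counter = itertools.count()


def spool_event(spool_dir, event):
    """Front-end side: hand one event (a dict) to the writer process."""
    tmp_dir = os.path.join(spool_dir, "tmp")
    ready_dir = os.path.join(spool_dir, "ready")
    os.makedirs(tmp_dir, exist_ok=True)
    os.makedirs(ready_dir, exist_ok=True)

    # Write the complete file under tmp/ first.
    fd, tmp_path = tempfile.mkstemp(dir=tmp_dir)
    with os.fdopen(fd, "w") as f:
        json.dump(event, f)

    # Then rename it into ready/. Both directories live on the same
    # filesystem, so the file appears in ready/ all at once.
    name = "%020d-%d-%06d.json" % (time.time_ns(), os.getpid(), next(_counter))
    os.rename(tmp_path, os.path.join(ready_dir, name))


def drain_spool(spool_dir, store):
    """Writer side: pass every ready event to store(), in name order."""
    ready_dir = os.path.join(spool_dir, "ready")
    if not os.path.isdir(ready_dir):
        return 0
    count = 0
    for name in sorted(os.listdir(ready_dir)):
        path = os.path.join(ready_dir, name)
        with open(path) as f:
            event = json.load(f)
        store(event)      # hypothetical hook: append to backend storage
        os.remove(path)   # delete only after the event has been stored
        count += 1
    return count
```

Each front-end process calls spool_event() per request, and the single
server program calls drain_spool() in a loop. Only that one program
ever opens the backend file, so no locking is needed there.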

-Ken



On Tue, Jul 28, 2009 at 3:14 PM, David Fokkema<dfokk...@ileos.nl> wrote:
> Hi list,
>
> I've been using pytables for offline analysis for a while now. Workflow
> is simple: I extract a set of data from a database and store it in hdf5
> using pytables and start doing analysis work. The thing is: database
> performance is breaking down now that we have about 100M events stored
> and after two years of patching indexes, queries, mysql settings and
> things like that we're increasingly worrying about using a relational
> database for data storage in the first place. Furthermore, when
> extracting real data, queries get horribly complex and the data must be
> postprocessed before it can be useful. Once stored in pytables,
> retrieving data is, of course, very easy.
>
> We've asked around what large experiments (LHC experiments like ATLAS)
> are using and they are _not_ using db's for storage. That is expected
> since a single event could take up in the order of a hundred Mb. The
> point is that they are very happy with using ROOT for data storage. ROOT
> is the analysis framework used by most high energy physicists and is
> especially adapted to be used for data storage as well. However, not
> everyone is happy with ROOT. Criticism mainly concerns the complexity of
> ROOT and the cleanliness of the design.
>
> For Python users, there is pyROOT. Of course, we know and love pytables.
> We're going to test several things, but I'd like to have your thoughts
> on whether pytables is a sane choice for semi-large scale data
> storage. Our requirements are:
>
> - Data is sent over HTTP and received by Python scripts running behind
> Apache. We need concurrency (no problem for MySQL)
> - Each detector station sends about 40k events per day.
> - Within a year or two, we need to be able to handle about 100 detector
> stations, making this 4M events per day.
> - Each event is about 12k
> - It should be relatively easy to access all data from one detector on a
> particular day
> - It should be relatively easy to search for coincidences between
> detector stations, based on timestamps. That is, retrieving all
> timestamps from all detector stations on a particular day should be
> easy.
>
> It is possible to have a relational database containing metadata on top
> of the low-level data storage. In fact, that's how ATLAS manages things.
>
> When using pytables, what are your thoughts on the size of individual
> files? One file per day? One file per detector per day? One file per
> apache thread per day? The last option is probably the easiest to
> implement (no need to worry about several threads accessing the same
> file) but would probably make it hard to quickly access one detector's
> data because it would be contained in separate files.
>
> Your input is very much appreciated!
>
> Thanks,
>
> David
>
>
> ------------------------------------------------------------------------------
> Let Crystal Reports handle the reporting - Free Crystal Reports 2008 30-Day
> trial. Simplify your report design, integration and deployment - and focus on
> what you do best, core application coding. Discover what's new with
> Crystal Reports now.  http://p.sf.net/sfu/bobj-july
> _______________________________________________
> Pytables-users mailing list
> Pytables-users@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/pytables-users
>
