Quick and easy solution: at the end of each day, move all the time-series data out of MySQL and into PyTables. Run the job every 10 minutes instead if you need the data available sooner.
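A minimal sketch of that periodic move. It assumes Python's built-in sqlite3 as a stand-in for MySQL and a hypothetical `events(id, station_id, timestamp, payload)` table. The `append_rows` callback is where the PyTables write would go, for example a small wrapper around `Table.append()` followed by `Table.flush()`. `drain_events` and the schema are illustrative names, not an existing API.

```python
import sqlite3
from typing import Callable, List, Tuple

# Hypothetical row layout: (id, station_id, timestamp, payload)
Row = Tuple[int, int, int, bytes]


def drain_events(conn: sqlite3.Connection, cutoff: int,
                 append_rows: Callable[[List[Row]], None],
                 batch_size: int = 10_000) -> int:
    """Move every event with timestamp < cutoff out of the database.

    Rows are handed to append_rows in timestamp order, in batches, and are
    deleted from the database only after append_rows has returned.
    Returns the number of rows moved.
    """
    moved = 0
    while True:
        rows = conn.execute(
            "SELECT id, station_id, timestamp, payload FROM events "
            "WHERE timestamp < ? ORDER BY timestamp, id LIMIT ?",
            (cutoff, batch_size),
        ).fetchall()
        if not rows:
            return moved
        # Archive first (e.g. a hypothetical wrapper doing table.append(rows); table.flush()).
        append_rows(rows)
        # Then delete exactly the archived rows, committed as one transaction.
        with conn:
            conn.executemany("DELETE FROM events WHERE id = ?",
                             [(row[0],) for row in rows])
        moved += len(rows)
```

Archiving before deleting means a crash between the two steps leaves a few duplicates in the archive instead of losing events; batching keeps each transaction and each append reasonably small.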
Alternatively, write a single server program that inserts data into the backend storage sequentially. The front-end Apache server processes (if you really need them to be Apache) can just forward data to it, either over the network or by writing files to disk (hint: renaming a file within the same filesystem is atomic).

-Ken

On Tue, Jul 28, 2009 at 3:14 PM, David Fokkema<dfokk...@ileos.nl> wrote:
> Hi list,
>
> I've been using pytables for offline analysis for a while now. Workflow
> is simple: I extract a set of data from a database and store it in hdf5
> using pytables and start doing analysis work. The thing is: database
> performance is breaking down now that we have about 100M events stored
> and after two years of patching indexes, queries, mysql settings and
> things like that we're increasingly worrying about using a relational
> database for data storage in the first place. Furthermore, when
> extracting real data queries get horribly complex and the data must be
> postprocessed before it can be useful. Once stored in pytables,
> retrieving data is, of course, very easy.
>
> We've asked around what large experiments (LHC experiments like ATLAS)
> are using and they are _not_ using db's for storage. That is expected
> since a single event could take up in the order of a hundred Mb. The
> point is that they are very happy with using ROOT for data storage. ROOT
> is the analysis framework used by most high energy physicists and is
> especially adapted to be used for data storage as well. However, not
> everyone is happy with ROOT. Criticism mainly concerns the complexity of
> ROOT and the cleanliness of the design.
>
> For python users, there is pyROOT. Of course, we know and love pytables.
> We're going to test several things, but I'd like to have your thoughts
> on whether pytables is a sane choice for semi-large scale data
> storage. Our requirements are:
>
> - Data is sent over http and received by python scripts running behind
> apache. We need concurrency (no problem for mysql)
> - Each detector station sends about 40k events per day.
> - Within a year or two, we need to be able to handle about 100 detector
> stations, making this 4M events per day.
> - Each event is about 12k
> - It should be relatively easy to access all data from one detector on a
> particular day
> - It should be relatively easy to search for coincidences between
> detector stations, based on timestamps. That is, retrieving all
> timestamps from all detector stations on a particular day should be
> easy.
>
> It is possible to have a relational database containing metadata on top
> of the low-level data storage. In fact, that's how ATLAS manages things.
>
> When using pytables, what are your thoughts on the size of individual
> files? One file per day? One file per detector per day? One file per
> apache thread per day? The last option is probably the easiest to
> implement (no need to worry about several threads accessing the same
> file) but would probably make it hard to quickly access one detector's
> data because it would be contained in separate files.
>
> Your input is very much appreciated!
>
> Thanks,
>
> David
>
>
> ------------------------------------------------------------------------------
> Let Crystal Reports handle the reporting - Free Crystal Reports 2008 30-Day
> trial. Simplify your report design, integration and deployment - and focus on
> what you do best, core application coding. Discover what's new with
> Crystal Reports now. http://p.sf.net/sfu/bobj-july
> _______________________________________________
> Pytables-users mailing list
> Pytables-users@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/pytables-users
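The file-based variant of the single-writer idea in the reply above could look roughly like this. It assumes all processes share one local filesystem and a hypothetical spool directory with `tmp/` and `new/` subdirectories (a layout borrowed from Maildir). Front-end processes write each submission into `tmp/` and then `os.replace()` it into `new/`; because a rename within one filesystem is atomic, the writer never sees a half-written file. `make_spool`, `submit`, `drain_spool` and the `store` callback are illustrative names; `store` is a placeholder for whatever does the actual insert into PyTables.

```python
import os
import tempfile
import time
import uuid
from typing import Callable


def make_spool(spool_dir: str) -> None:
    """Create the tmp/ and new/ subdirectories if they do not exist yet."""
    for sub in ("tmp", "new"):
        os.makedirs(os.path.join(spool_dir, sub), exist_ok=True)


def submit(spool_dir: str, data: bytes) -> str:
    """Front-end side: write to tmp/, then atomically publish into new/."""
    # Time prefix lets the writer process files in roughly arrival order;
    # the random suffix keeps names unique across concurrent processes.
    name = f"{time.time_ns():020d}-{uuid.uuid4().hex}"
    tmp_path = os.path.join(spool_dir, "tmp", name)
    with open(tmp_path, "wb") as f:
        f.write(data)
        f.flush()
        os.fsync(f.fileno())  # make sure the bytes are on disk before publishing
    final_path = os.path.join(spool_dir, "new", name)
    os.replace(tmp_path, final_path)  # atomic rename within the same filesystem
    return final_path


def drain_spool(spool_dir: str, store: Callable[[bytes], None]) -> int:
    """Single writer: hand every published file to `store`, then delete it."""
    new_dir = os.path.join(spool_dir, "new")
    count = 0
    for name in sorted(os.listdir(new_dir)):
        path = os.path.join(new_dir, name)
        with open(path, "rb") as f:
            store(f.read())  # e.g. parse the event and append it to a PyTables table
        # Delete only after storing: a crash in between causes a duplicate
        # on restart, never a lost event.
        os.remove(path)
        count += 1
    return count


if __name__ == "__main__":
    spool = tempfile.mkdtemp()
    make_spool(spool)
    for i in range(3):
        submit(spool, b"event %d" % i)
    received = []
    print(drain_spool(spool, received.append), received)
```

Any number of Apache processes can call `submit()` concurrently; only one process should ever call `drain_spool()`, which is what keeps the PyTables file single-writer and sidesteps the "one file per apache thread" problem.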