Hi Ken,

On Tue, 2009-07-28 at 17:28 -0400, Kenneth Arnold wrote:
> Quick and easy solution: at the end of each day, remove all the
> time-series data from MySQL and add it to PyTables. Make it every 10
> minutes if you need it faster.

You don't know our current MySQL schema. It really is a pain to retrieve
_all_ detector data (lots of joins resulting in ten result rows for
_one_ event). I'm happy we can drop it ;-)

> Alternatively, write a single server program that inserts data into
> backend storage sequentially. The front-end Apache server processes
> (if you really need them to be Apache) can just forward data to it --
> over the network, or by writing files on disk (hint: renaming or
> moving files is atomic).

Thanks for your detailed response in your other mail. I'll pass it on!

Best regards,

David

>
> -Ken
>
>
>
> On Tue, Jul 28, 2009 at 3:14 PM, David Fokkema<dfokk...@ileos.nl> wrote:
> > Hi list,
> >
> > I've been using pytables for offline analysis for a while now. Workflow
> > is simple: I extract a set of data from a database and store it in hdf5
> > using pytables and start doing analysis work. The thing is: database
> > performance is breaking down now that we have about 100M events stored,
> > and after two years of patching indexes, queries, mysql settings and
> > things like that we're increasingly worrying about using a relational
> > database for data storage in the first place. Furthermore, when
> > extracting real data, queries get horribly complex and the data must be
> > postprocessed before it can be useful. Once stored in pytables,
> > retrieving data is, of course, very easy.
> >
> > We've asked around what large experiments (LHC experiments like ATLAS)
> > are using and they are _not_ using db's for storage. That is expected
> > since a single event could take up in the order of a hundred Mb. The
> > point is that they are very happy with using ROOT for data storage. ROOT
> > is the analysis framework used by most high energy physicists and is
> > especially adapted to be used for data storage as well. However, not
> > everyone is happy with ROOT. Criticism mainly concerns the complexity of
> > ROOT and the cleanliness of the design.
> >
> > For python users, there is pyROOT. Of course, we know and love pytables.
> > We're going to test several things, but I'd like to have your thoughts
> > on whether pytables is a sane choice for semi-large scale data
> > storage. Our requirements are:
> >
> > - Data is sent over http and received by python scripts running behind
> >   apache. We need concurrency (no problem for mysql)
> > - Each detector station sends about 40k events per day.
> > - Within a year or two, we need to be able to handle about 100 detector
> >   stations, making this 4M events per day.
> > - Each event is about 12k
> > - It should be relatively easy to access all data from one detector on a
> >   particular day
> > - It should be relatively easy to search for coincidences between
> >   detector stations, based on timestamps. That is, retrieving all
> >   timestamps from all detector stations on a particular day should be
> >   easy.
> >
> > It is possible to have a relational database containing metadata on top
> > of the low-level data storage. In fact, that's how ATLAS manages things.
> >
> > When using pytables, what are your thoughts on the size of individual
> > files? One file per day? One file per detector per day? One file per
> > apache thread per day? The last option is probably the easiest to
> > implement (no need to worry about several threads accessing the same
> > file) but would probably make it hard to quickly access one detector's
> > data because it would be contained in separate files.
> >
> > Your input is very much appreciated!
> >
> > Thanks,
> >
> > David
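
For what it's worth, the timestamp-based coincidence search from the requirements quoted above could look roughly like the sketch below. It is only an illustration in plain Python, not something anyone in this thread has proposed or tested. It assumes that one day of timestamps per station is already in memory as a sorted list of numbers. The name `find_coincidences`, the station ids and the window value are all made up for the example.

```python
import heapq


def find_coincidences(timestamps_by_station, window):
    """Group events from different stations that lie within `window`.

    timestamps_by_station: dict of station id -> sorted list of timestamps
    (hypothetical layout: one day of data per station).
    window: largest allowed difference between the first event of a
    group and any other event in it.
    """
    # Tag every timestamp with its station, then merge the already
    # sorted per-station lists into one time-ordered list.
    streams = [
        [(t, station) for t in stamps]
        for station, stamps in timestamps_by_station.items()
    ]
    merged = list(heapq.merge(*streams))

    coincidences = []
    i = 0
    while i < len(merged):
        j = i + 1
        # Extend the group while events stay within the window
        # measured from the first event of the group.
        while j < len(merged) and merged[j][0] - merged[i][0] <= window:
            j += 1
        group = merged[i:j]
        # A coincidence needs at least two distinct stations.
        if len({station for _, station in group}) >= 2:
            coincidences.append(group)
            i = j  # continue after this group
        else:
            i += 1
    return coincidences


# Made-up example data: three stations, timestamps in arbitrary units.
day = {
    501: [10.0, 25.0, 40.0],
    502: [10.2, 33.0],
    503: [10.3, 40.1],
}
result = find_coincidences(day, window=0.5)
```

Since the search only needs the timestamps, whatever storage layout is chosen would ideally allow reading that one column for all stations on a given day without touching the rest of each event.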
_______________________________________________
Pytables-users mailing list
Pytables-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/pytables-users
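
A small sketch of the "renaming or moving files is atomic" hint from earlier in the thread, using only the Python standard library. The assumptions here are mine: events are serialised as JSON, the spool directory sits on a single POSIX filesystem, and the function names `spool_event` and `drain_spool` are invented for the example. The front-end processes would call `spool_event`; one separate process would call `drain_spool` and append the results to the backend storage.

```python
import json
import os
import tempfile


def spool_event(spool_dir, event):
    """Write one event so that a reader never sees a half-written file."""
    # Create the temporary file inside the spool directory itself, so
    # that the rename below stays on one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=spool_dir, suffix=".tmp")
    with os.fdopen(fd, "w") as f:
        json.dump(event, f)
    final_path = tmp_path[: -len(".tmp")] + ".json"
    # os.replace renames atomically on POSIX: the .json file appears
    # fully written or not at all.
    os.replace(tmp_path, final_path)
    return final_path


def drain_spool(spool_dir):
    """Single consumer: load and remove all completed event files."""
    events = []
    for name in sorted(os.listdir(spool_dir)):
        if not name.endswith(".json"):
            continue  # still being written by a front-end process
        path = os.path.join(spool_dir, name)
        with open(path) as f:
            events.append(json.load(f))
        os.remove(path)
    return events


# Made-up usage: two front-end writes, then one pass of the consumer.
spool = tempfile.mkdtemp()
spool_event(spool, {"station": 501, "timestamp": 10.0})
spool_event(spool, {"station": 502, "timestamp": 10.2})
events = drain_spool(spool)
```

Because only the single consumer ever writes to the backend files, the front-end processes need no locking among themselves. Note that the file names are random, so the consumer sees events in arbitrary order rather than arrival order.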