On Wed, 2009-07-29 at 18:47 +0200, Francesc Alted wrote:
> Hi David,

> If you need concurrency for writing, you can always set up a data collector
> that gathers info from the several threads by using the `Queue.Queue()`
> container.  As it is thread safe, you don't have to worry about concurrency
> problems.

Yes, I've used that before quite successfully. I'm, however, not quite sure
how separate Apache threads will be able to contact my writer thread.
Maybe I'll have to dive into the Apache documentation. Another problem is
that we'd like to have a confirmation that the data was actually stored,
so simple queuing won't suffice, I guess. Using threads and offloading the
writing of data to a single thread is a solution, then.
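To sketch what I have in mind for the single-writer setup, including the
storage confirmation: each request could carry a threading.Event that the
writer sets once the data is persisted. The names below (`WriteRequest`,
`storage`) are mine, and the `storage` list is just a stand-in for the actual
PyTables table; this uses the Python 3 spelling, where `Queue.Queue` became
`queue.Queue`:

```python
import queue
import threading

class WriteRequest:
    """An event payload plus a flag the writer sets once the data is stored."""
    def __init__(self, payload):
        self.payload = payload
        self.stored = threading.Event()

def writer(q, storage):
    """Single writer thread: drains the queue, persists each event,
    then signals the producer that the write completed."""
    while True:
        req = q.get()
        if req is None:              # sentinel: shut down
            break
        storage.append(req.payload)  # stand-in for a PyTables table append
        req.stored.set()             # confirmation back to the producer

# --- usage sketch ---
q = queue.Queue()
storage = []                         # stand-in for an HDF5 table
t = threading.Thread(target=writer, args=(q, storage))
t.start()

req = WriteRequest({"station": 42, "timestamp": 1248886000})
q.put(req)
req.stored.wait(timeout=5)           # block until the writer confirms storage

q.put(None)                          # stop the writer
t.join()
```

The producers (the Apache threads, in our case) would each hold a reference to
the same queue, so only the confirmation hand-off needs the extra Event.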

> > - Each detector station sends about 40k events per day.
> > - Within a year or two, we need to be able to handle about 100 detector
> > stations, making this 4M events per day.
> > - Each event is about 12k
> 
> Well, 4 M events * 12 KB makes around 50 GB per day.  Provided that PyTables
> can write at full disk speed (if used correctly), say 500 MB/s on a decent
> RAID, it can write this info in less than 2 minutes, so I would not say that
> this is a problem at all.  You will only need to make sure that your system
> has a decent amount of memory so that the queue object can act as a buffer
> with enough capacity to cope with data bunches.

We have a RAID, so writing speed will not be a problem then. And I think we
can get a decent amount of memory, since a few GB should suffice.
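For reference, the back-of-the-envelope numbers behind those estimates:

```python
events_per_day = 100 * 40_000   # 100 stations * 40k events each
event_size = 12 * 1024          # 12 KB per event, in bytes
daily_volume = events_per_day * event_size
print(daily_volume / 1024**3)   # about 45.8 GB/day, i.e. "around 50 GB"

raid_speed = 500 * 1024**2      # 500 MB/s sustained write
print(daily_volume / raid_speed)  # 93.75 s: indeed under two minutes
```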

> > - It should be relatively easy to access all data from one detector on a
> > particular day
> > - It should be relatively easy to search for coincidences between
> > detector stations, based on timestamps. That is, retrieving all
> > timestamps from all detector stations on a particular day should be
> > easy.
> 
> My preference here goes to using a monolithic table to save your daily
> observations, and then making use of the indexing capabilities of PyTables
> Pro for locating and retrieving your data quickly.  If you can't afford
> buying

We'll take that into consideration. My personal preference would be to
always use open source, so Pro is, in that respect, a step back. However,
I'd rather spend money on the Pro version of a product I really like,
with a very dedicated maintainer, than on some other proprietary
solution. We basically don't have much money to spend, because we're a
relatively small outreach project which lives off donations from our
institute and some industrial partners (earmarked for particular schools
in their neighborhood and not to be spent centrally).

Just curious, how would you look upon someone else implementing OPSI in
pytables (fully open source)?

> Pro, perhaps you can split your data into several tables (hourly tables?),
> set up some search code that can select the appropriate table, and then do
> an in-kernel query on the interesting table.

Several structurally identical tables in one file? I hadn't thought of
that...
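If I understand the idea, the dispatch step could be as simple as mapping a
timestamp onto a node path and then querying only that table. The
`/events/dYYYYMMDD/hHH` layout below is just my own invented naming, and the
commented query sketches roughly how the in-kernel part would look with a
real PyTables file handle:

```python
from datetime import datetime, timezone

def table_for(timestamp):
    """Map a UNIX timestamp onto the (hypothetical) hourly table node path."""
    dt = datetime.fromtimestamp(timestamp, tz=timezone.utc)
    return dt.strftime("/events/d%Y%m%d/h%H")

# With a real PyTables file handle `h5`, the lookup plus in-kernel query
# would then look roughly like (not executed here):
#
#   table = h5.get_node(table_for(t0))
#   rows = table.read_where("(timestamp >= t0) & (timestamp < t1)")

print(table_for(1248886020))  # -> /events/d20090729/h16
```

A coincidence search between stations would then only touch the one or two
hourly tables that bracket the time window.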

> > It is possible to have a relational database containing metadata on top
> > of the low-level data storage. In fact, that's how ATLAS manages things.
> 
> Exactly.  This is why I like to call PyTables a relational database
> *teammate*, not a competitor:
> 
> http://pytables.org/moin/FAQ#IsPyTablesareplacementforarelationaldatabase.3F

Yes, interesting indeed.

> 
> >
> > When using PyTables, what are your thoughts on the size of individual
> > files? One file per day? One file per detector per day? One file per
> > Apache thread per day? The last option is probably the easiest to
> > implement (no need to worry about several threads accessing the same
> > file), but it would probably make it hard to quickly access one
> > detector's data, because it would be contained in separate files.
> 
> As I said before, my preference goes to consolidating daily data in a
> single table (50 GB is not that much), but of course that would depend on
> your requirements and budget.  At any rate, there are many different
> possibilities, so I'd recommend doing some experiments, as that is the best
> way to assess the best solution for your long-term needs.

We'll start experimenting!

As always, thank you for your thoughtful reply.

Best regards,

David


_______________________________________________
Pytables-users mailing list
Pytables-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/pytables-users