Hopefully an easy one for someone out there! I'm trying to move some of my existing calculations from a relational DB into PyTables to speed up a range of calculations. One of the problems is that the data has irregular dates, and I'm not sure where to do the processing.
For example:

Table 1 (roughly daily, Measures 1-20):

    Date        CompanyID  Measure1  Measure2  ...  Measure20
    '20110912'  535        5.0       15
    '20110911'  535        5.1       12
    '20110910'  535        4.9       11
    '20110909'  535        3.6       20
    '20110912'  12         6.0       34
    '20110911'  12         6.4       34
    '20110910'  12         4.6       36
    '20110909'  12         2.6       35

Table 2 (roughly monthly, Measures 21-30):

    Date        CompanyID  Measure21  ...  Measure30
    '20110901'  535        6.6
    '20110812'  535        6.1
    '20110901'  12         6.0
    '20110814'  12         6.3

Table 3 (roughly annual/irregular, Measures 31-100):

    Date        CompanyID  Measure31  ...  Measure100
    '20110911'  535        7.6
    '20110818'  535        7.1
    '20110903'  12         7.0
    '20110818'  12         7.3

Usually I use the latest value for each measure until a newer one presents itself (currently I do these joins with scikits.timeseries). While it's not strictly true, for the purpose of the exercise we could assume the first table holds daily data with 10-20 measures, the second monthly data with 5-10 measures, and the third annual data with 50-70 measures. The calculations include a lot of aggregations over different slices of companies, plus a lot of statistical measures on the timeseries.

I think my options are:

1. Store everything in one big table with the minimum data: CompanyID in the first column and then an array in each remaining column. For example, column 2 might hold Measures 1 and 2, so for company 535 it would contain [['20110912',5.0,15], ['20110911',5.1,12], ['20110910',4.9,11], ['20110909',3.6,20]], filled with daily data. Column 3 would then be a much smaller array of (approximately) monthly data, and column 4 smaller again, filled with irregular (say annual) data. (Rough sketch in the P.S. below.)

2. Store everything in one huge table, but process the data as it goes into the pytable so that rows are unique on company and date and all the empty spaces are filled in on the way in. My rough estimate is that this makes the database about 50-100 times larger than the first option (maybe 1 TB vs 10-20 GB). On the other hand, since the annual data repeats for roughly 250 weekdays a year, I'm guessing it would compress pretty well - I haven't looked at compression yet. (Rough sketches of the table layout and the fill step are in the P.S. below.)

3. Store the data in multiple small tables and then try to join them (I'm assuming this one is wrong because it's how I would do it in SQL!).

4. ?????

Thanks for any help - am hoping to get the data design right!

Damien
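P.S. In case it helps to see what I mean, here is a rough, untested sketch of option 1 using the PyTables 2.x VLArray API. The file name, node names and the idea of flattening each company's history into one float64 row (with dates stored as yyyymmdd integers) are just placeholders I made up for illustration:

    # Option 1 sketch (untested): one VLArray row per company, with each
    # company's daily block flattened as [date, m1, m2, date, m1, m2, ...].
    # Dates are stored as yyyymmdd integers so everything fits in float64.
    import numpy as np
    import tables

    fileh = tables.openFile('measures_option1.h5', mode='w')
    group = fileh.createGroup('/', 'bycompany', 'One row per company')

    daily = fileh.createVLArray(group, 'daily_measures',
                                tables.Float64Atom(),
                                'Daily Measure1/Measure2 blocks')
    company_ids = fileh.createEArray(group, 'company_ids',
                                     tables.Int32Atom(), shape=(0,),
                                     title='CompanyID for each VLArray row')

    # Company 535's daily block: (date, Measure1, Measure2)
    block_535 = np.array([[20110912, 5.0, 15],
                          [20110911, 5.1, 12],
                          [20110910, 4.9, 11],
                          [20110909, 3.6, 20]], dtype=np.float64)

    daily.append(block_535.ravel())                      # row 0
    company_ids.append(np.array([535], dtype=np.int32))  # row 0 -> company 535
    fileh.close()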
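And a similarly rough sketch of option 2: one wide Table, unique on company and date, created with a compression filter. The column names, the number of measures shown and the expectedrows guess are all made up:

    # Option 2 sketch (untested): one wide, compressed Table, one row per
    # company per weekday, with the monthly/annual measures forward-filled.
    import tables

    class MeasureRow(tables.IsDescription):
        date      = tables.StringCol(8)    # 'YYYYMMDD'
        companyid = tables.Int32Col()
        measure1  = tables.Float64Col()
        measure2  = tables.Float64Col()
        # ... measure3 to measure100 would follow the same pattern

    filters = tables.Filters(complevel=5, complib='zlib', shuffle=True)

    fileh = tables.openFile('measures_option2.h5', mode='w')
    table = fileh.createTable('/', 'measures', MeasureRow,
                              'Forward-filled, unique on company+date',
                              filters=filters,
                              expectedrows=3000 * 250 * 10)  # companies x days x years (guess)

    row = table.row
    # Toy append; the real values would come from the fill step below.
    for date, cid, m1, m2 in [('20110912', 535, 5.0, 15.0),
                              ('20110911', 535, 5.1, 12.0)]:
        row['date'] = date
        row['companyid'] = cid
        row['measure1'] = m1
        row['measure2'] = m2
        row.append()
    table.flush()
    fileh.close()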
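Finally, the "use the latest value until a newer one appears" fill for option 2, sketched here with plain numpy searchsorted rather than scikits.timeseries (dates as yyyymmdd integers, so numeric order matches date order):

    # Fill step sketch (untested): align a sparse (monthly or annual) series
    # onto the daily date index, carrying the latest value forward.
    import numpy as np

    # Sparse observations for one company/measure, oldest first
    obs_dates  = np.array([20110812, 20110901])   # yyyymmdd as integers
    obs_values = np.array([6.1, 6.6])             # e.g. Measure21 for company 535

    # Daily index to fill onto, oldest first
    daily_dates = np.array([20110909, 20110910, 20110911, 20110912])

    # For each daily date, index of the latest observation on or before it
    idx = np.searchsorted(obs_dates, daily_dates, side='right') - 1

    # Days before the first observation get NaN rather than a stale value
    filled = np.where(idx >= 0, obs_values[np.clip(idx, 0, None)], np.nan)
    # filled -> array([6.6, 6.6, 6.6, 6.6]) for this toy example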