Hi all,
I am working on an energy trading system right now, and I'm looking at
PyTables as a way to store some large multidimensional arrays for use with
NumPy.
The main data is roughly 4,000 points * 24 hours * 365 days * 6 items * 10
years -- about 2.1 billion values, or around 4 GB if I store them as int16,
which works if I'm willing to lose a little resolution.
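For concreteness, here's the kind of on-disk layout I'm picturing -- a single
chunked, compressed CArray of int16. Just a sketch, assuming the PyTables
3-style API (open_file / create_carray); the file and node names are made up:

    import numpy as np
    import tables

    shape = (4000, 24, 365, 6, 10)   # points, hours, days, items, years
    h5 = tables.open_file('energy.h5', mode='w')
    filters = tables.Filters(complevel=5, complib='blosc')
    prices = h5.create_carray(h5.root, 'prices',
                              atom=tables.Int16Atom(),
                              shape=shape, filters=filters)
    # write one year at a time so import memory stays bounded
    prices[..., 0] = np.zeros(shape[:-1], dtype=np.int16)  # placeholder data
    h5.close()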
The algorithms are mostly vectorizable, but occasionally I need to iterate
through a few million rows and do some math that I can't vectorize. The
vectorized algorithms will hit pretty much every datapoint during
backtesting.
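When I do have to drop out of vectorized code, my current plan is to read
year-sized blocks into NumPy and loop there, rather than iterating row by row
straight out of HDF5. A rough sketch (same made-up file as above):

    import tables

    h5 = tables.open_file('energy.h5', mode='r')
    prices = h5.root.prices
    for year in range(prices.shape[-1]):
        block = prices[..., year]     # one year pulled into memory as numpy
        running = 0.0
        for row in block.reshape(-1, block.shape[-1]):
            # stand-in for the path-dependent math I can't vectorize
            running = 0.9 * running + row[0]
    h5.close()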
So, here's my question, which I can't seem to find an answer to on the
PyTables site -- what's the best way to store this data in PyTables? Should I
be creating a custom PyTables-style data structure, or should I create a node
in the HDF5 file which stores a compressed NumPy-style array -- maybe one per
year or so, or maybe everything in one ginormous array? (Sketches of both
layouts below.)
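To make the trade-off concrete, the two layouts I'm weighing look roughly
like this -- again assuming the 3-style API, and the chunkshape numbers are
guesses, not tuned:

    import tables

    h5 = tables.open_file('energy.h5', mode='a')
    filters = tables.Filters(complevel=5, complib='blosc')

    # (a) one array per year, collected in a group
    by_year = h5.create_group(h5.root, 'by_year')
    for y in range(10):
        h5.create_carray(by_year, 'y%d' % y, atom=tables.Int16Atom(),
                         shape=(4000, 24, 365, 6), filters=filters)

    # (b) everything in one ginormous array, with chunks matched to the
    # expected access pattern (here: one item, one year at a time)
    h5.create_carray(h5.root, 'all_years', atom=tables.Int16Atom(),
                     shape=(4000, 24, 365, 6, 10),
                     chunkshape=(64, 24, 365, 1, 1), filters=filters)
    h5.close()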
My main focus right now is making sure that as much of the vector / matrix
math as possible hits NumPy quickly, ideally using whatever multicore support
is available there, with a fallback to weave or inline C if necessary; a
secondary concern is ease of importing more data into the system.
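I've also seen that PyTables ships tables.Expr, which evaluates expressions
over disk-based arrays chunk by chunk through numexpr (which can use multiple
threads). If that's the recommended route for the vectorized passes,
something like this is what I'd try (toy expression, made-up node names,
continuing from the file above):

    import tables

    h5 = tables.open_file('energy.h5', mode='a')
    a = h5.root.all_years
    filters = tables.Filters(complevel=5, complib='blosc')
    # numexpr upcasts int16, so store this toy result as float64
    out = h5.create_carray(h5.root, 'scaled', atom=tables.Float64Atom(),
                           shape=a.shape, filters=filters)
    expr = tables.Expr('a * 2.0')
    expr.set_output(out)
    expr.eval()
    h5.close()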
Right now, though, I'd just love some initial thoughts on best practices and
trade-offs. Pitches on PyTables Pro also appreciated.
Thanks,
Peter
--
Peter J Vessenes http://about.me/peterv