On Thursday 09 December 2010 10:19:22 Nicholas Potter wrote:
> Hello everyone,
>
> I am working with economic data for 3140 counties and the 50 states
> as well as 500 industries, and trying to figure out the best way to
> store and access the data. The two options seem to be to have one
> table of ~32 million rows, like this:
>
> Region | Industry | variable | value
> **data**
>
> or instead, decouple by region, having 3140 tables, one for each
> county, with industries as the columns (so 500 columns) and
> variables as the rows.
>
> I guess this is essentially a row versus column orientation question,
> but also whether it would be better to split the tables or keep them
> together. Are there advantages to either way?
>
> I will always be accessing data for a specific county, so it seems
> separate tables might be better, but is there any reason to go with
> one giant table instead?

No, I also think that separate tables are best. PyTables currently uses
a row-wise data arrangement for its Table implementation. I plan to
introduce column-wise tables in PyTables 3, but we are not there yet.
Until then, very wide tables (> 1000 bytes/row) are strongly
discouraged.

Hope this helps,

--
Francesc Alted
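
To put rough numbers on the two layouts discussed above, here is a back-of-the-envelope sketch. It is arithmetic only, not a measurement. It assumes 8-byte float values and 4-byte integer keys (neither is stated in the thread); the counts and the 1000 bytes/row figure are taken from the messages. The function names are made up for illustration.

```python
# Rough row-size arithmetic for the two layouts in this thread.
# ASSUMPTIONS (not stated in the thread): values are 8-byte floats and
# the region/industry/variable keys are stored as 4-byte integers.

N_COUNTIES = 3140
N_STATES = 50
N_INDUSTRIES = 500
TOTAL_ROWS_LONG = 32_000_000   # "~32 million rows" from the question
WIDE_ROW_GUIDELINE = 1000      # bytes/row, from the reply

FLOAT_BYTES = 8                # assumed float64 value
KEY_BYTES = 4                  # assumed int32 key


def implied_variable_count():
    """Variables per (region, industry) pair implied by ~32M rows."""
    n_regions = N_COUNTIES + N_STATES
    return round(TOTAL_ROWS_LONG / (n_regions * N_INDUSTRIES))


def long_row_bytes():
    """One row of the single big table: three keys plus one value."""
    return 3 * KEY_BYTES + FLOAT_BYTES


def wide_row_bytes():
    """One row of a per-county table: one value per industry column."""
    return N_INDUSTRIES * FLOAT_BYTES


if __name__ == "__main__":
    print("implied variables:", implied_variable_count())
    print("long-table row bytes:", long_row_bytes())
    print("per-county row bytes:", wide_row_bytes())
    print("exceeds guideline:", wide_row_bytes() > WIDE_ROW_GUIDELINE)
```

Under these assumptions the single table has narrow rows, while a 500-column row of 8-byte floats comes to about 4000 bytes, so the wide-row caveat is worth checking against the actual column types before settling on the per-county layout.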