On Thursday 09 December 2010 10:19:22 Nicholas Potter wrote:
> Hello everyone,
> 
> I am working with economic data for 3140 counties and the 50 states
> as well as 500 industries, and trying to figure out the best way to
> store and access the data.  The two options seem to be to have one
> table of ~32 million rows, like this:
> 
> Region | Industry | variable | value
> **data**
> 
> or instead, decouple by region, having 3140 tables, one for each
> county, with industries as the columns (so 500 columns) and
> variables as the rows.
> 
> I guess this is essentially a row versus column orientation question,
> but also whether it would be better to split the tables or keep them
> together. Are there advantages to either way?
> 
> I will always be accessing data for a specific county, so it seems
> separate tables might be better, but is there any reason to go with
> one giant table instead?

No, I also think that separate tables are best.  Keep in mind that
PyTables uses a row-wise data arrangement for its current Table
implementation.  I plan to introduce column-wise tables in PyTables 3,
but we are not there yet.  Until then, very wide tables (> 1000
bytes/row) are strongly discouraged.
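To put numbers on the > 1000 bytes/row guideline, here is a rough sketch
that only measures row widths with NumPy structured dtypes (the kind of
row description PyTables tables are built from).  It assumes each value
is a float64, which the thread does not state; the field names and the
16-byte `variable` label are hypothetical.  The "long" layout (one row
per industry/variable pair, still one table per county) is just one
possible alternative, not something proposed in the thread.

```python
import numpy as np

N_INDUSTRIES = 500       # from the question
ROW_LIMIT_BYTES = 1000   # rough guideline from the reply above

# Layout from the question: one table per county, one column per industry.
# Assumes every value is a float64 (hypothetical column names).
wide_row = np.dtype([(f"ind{i:03d}", np.float64) for i in range(N_INDUSTRIES)])

# Alternative: still one table per county, but one narrow row per
# (industry, variable) pair.  Field sizes here are assumptions.
long_row = np.dtype([
    ("industry", np.int16),   # industry index 0..499
    ("variable", "S16"),      # short variable label
    ("value", np.float64),
])

for name, dt in [("wide", wide_row), ("long", long_row)]:
    verdict = "OK" if dt.itemsize <= ROW_LIMIT_BYTES else "too wide"
    print(f"{name}: {dt.itemsize} bytes/row ({verdict})")
```

Under these assumptions the 500-column layout comes out at 4000 bytes per
row, four times the guideline, while the narrow layout is 26 bytes per row.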

Hope this helps,

-- 
Francesc Alted

_______________________________________________
Pytables-users mailing list
Pytables-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/pytables-users