Re: [Pytables-users] Is it better to have many smaller tables or one large table

Vineet Jain Wed, 16 Sep 2009 11:52:25 -0700

When I initially migrated my sqlite code to pytables, the pytables code was a 
lot smaller. But then I realized that storing the keys in the data table would 
make my data very large (I also found out that having one large table is the 
way to go, rather than a large number of smaller tables). Imagine having to 
repeat the same keys over and over again on 10+ million rows (I'm guessing 
that compression would help, but I don't know what the downside of that is).
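
To put rough numbers on the repeated-key cost, here is a back-of-the-envelope sketch in plain Python (it does not use PyTables). The 8-byte date and float32 value match the table described later in the thread. The 16-byte width for each string key is purely an assumption for illustration, and the row count is just the "10+ million" figure rounded down. It compares a row that stores both keys inline with a row that stores two unsigned 16-bit ids instead.

```python
import struct

ROWS = 10_000_000  # "10+ million rows", rounded down

# Row layout with both keys stored inline as fixed-width strings.
# The 16-byte key width is an assumed value, not taken from the thread.
inline_row = struct.Struct("<16s16s8sf")  # field_name, symbol_name, date, value

# Row layout with each key replaced by an unsigned 16-bit id.
id_row = struct.Struct("<HH8sf")          # field_id, symbol_id, date, value

inline_bytes = inline_row.size * ROWS
id_bytes = id_row.size * ROWS

print(f"inline keys: {inline_row.size} bytes/row, "
      f"{inline_bytes / 1e6:.0f} MB uncompressed")
print(f"uint16 ids:  {id_row.size} bytes/row, "
      f"{id_bytes / 1e6:.0f} MB uncompressed")
```

These are uncompressed sizes only; how much compression narrows the gap depends on the data and is not estimated here.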


Then I replaced the keys with unsigned 16-bit ints (as you suggested) and added 
a separate table which stores the mapping from keys to indexes. I have another 
table which stores metrics on the data table, so that the metrics are kept up 
to date as I update the data. The code has now grown larger than the sqlite 
version. I'm hoping that the performance improvement will more than make up 
for the additional complexity. I'm also looking forward to compression, which 
I would not have gotten with sqlite. 
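
A minimal sketch of the key-to-index mapping described above, in plain Python rather than PyTables. The `KeyMap` class, its method names, and the sample rows are all hypothetical; in a real file the mapping would live in its own small table and the ids in an unsigned 16-bit column (for example a `UInt16Col`).

```python
class KeyMap:
    """Assigns each distinct key a small integer id, in first-seen order."""

    MAX_KEYS = 2 ** 16  # ids must fit in an unsigned 16-bit column

    def __init__(self):
        self._ids = {}   # key -> id
        self._keys = []  # id -> key (the "mapping table")

    def intern(self, key):
        """Return the id for key, assigning the next free id if it is new."""
        try:
            return self._ids[key]
        except KeyError:
            if len(self._keys) >= self.MAX_KEYS:
                raise OverflowError("more than 65536 distinct keys")
            new_id = len(self._keys)
            self._ids[key] = new_id
            self._keys.append(key)
            return new_id

    def lookup(self, key_id):
        """Translate an id back to the original key."""
        return self._keys[key_id]

    def distinct(self):
        """All keys seen so far, without touching the big data table."""
        return list(self._keys)


fields = KeyMap()
symbols = KeyMap()
rows = []
# Made-up sample data: (field_name, symbol_name, date, value)
for field, symbol, date, value in [
    ("close", "IBM", "20090915", 121.5),
    ("close", "MSFT", "20090915", 25.2),
    ("volume", "IBM", "20090915", 5.1e6),
]:
    rows.append((fields.intern(field), symbols.intern(symbol), date, value))
```

A side effect of this layout is that the list of distinct keys is simply the contents of the mapping table, so it can be read without scanning the data rows.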

Things which would be nice to have on indexed columns (without having to read 
all the data):

1. Min and Max
2. Distinct

It would also be nice to be able to limit the number of rows returned from the db. 
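
Until something like this is available on the index itself, one workaround is the metrics side table mentioned earlier. The sketch below is plain Python, not PyTables; `ColumnMetrics` and `first_matches` are hypothetical names, and the sample rows are made up. It keeps min, max and the distinct set current on every append, and caps the number of rows returned by a lazy scan.

```python
from itertools import islice


class ColumnMetrics:
    """Running min, max and distinct values for one column."""

    def __init__(self):
        self.minimum = None
        self.maximum = None
        self.distinct = set()

    def update(self, value):
        """Fold one newly appended value into the metrics."""
        if self.minimum is None or value < self.minimum:
            self.minimum = value
        if self.maximum is None or value > self.maximum:
            self.maximum = value
        self.distinct.add(value)


def first_matches(rows, predicate, limit):
    """Lazily yield at most `limit` rows for which predicate(row) is true."""
    return islice((row for row in rows if predicate(row)), limit)


date_metrics = ColumnMetrics()
table = []
for date, value in [
    ("20090914", 1.0),
    ("20090916", 3.0),
    ("20090915", 2.0),
    ("20090916", 4.0),
]:
    table.append((date, value))
    date_metrics.update(date)  # metrics updated together with the data

recent = list(first_matches(table, lambda row: row[0] >= "20090915", limit=2))
```

This only stays correct for appends. If rows are deleted or modified, the min, max and distinct set may have to be recomputed from the data.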

Thanks,

Vineet


-----Original Message-----
From: Kenneth Arnold [mailto:kenneth.arn...@gmail.com] 
Sent: Wednesday, September 16, 2009 2:43 PM
To: Vineet Jain
Cc: Francesc Alted; pytables-users@lists.sourceforge.net
Subject: Re: [Pytables-users] Is it better to have many smaller tables or one 
large table

"names" sounds like strings. To borrow a SQL concept, normalize your
database: make a table (or VLArray) that just stores e.g. the distinct
field_names, and use the index into that table instead of the full
name in your big table.

A potential problem is that updating the indices for the field_names
table will be slow, but it has to happen each time a new field_name is
added. Maybe Francesc has some ideas about that.

-Ken



On Wed, Sep 16, 2009 at 11:19 AM, Vineet Jain <vinjv...@gmail.com> wrote:
> There is one big downside though. My two keys are field_name and
> symbol_name. When I add tables/groups for field_name/symbol_name, I can
> walk the groups and quickly find out what the unique field_names and
> symbol_names are in the file. However, if I have one table with 10+ million
> rows, is there an equivalent of SQL's 'distinct' which will give me the
> unique field_names and symbol_names in the table (assuming that both
> field_names and symbol_names are indexed columns) without having to load all
> the rows?
>
>
> -----Original Message-----
> From: Francesc Alted [mailto:fal...@pytables.org]
> Sent: Wednesday, September 16, 2009 11:10 AM
> To: pytables-users@lists.sourceforge.net
> Subject: Re: [Pytables-users] Is it better to have many smaller tables or
> one large table
>
> On Wednesday 16 September 2009 15:18:04 Vineet Jain wrote:
>> I have a table with two fields: date (str 8) and value (float32).
>>
>> I created two files and am trying to explain the file size difference
>> between the two:
>>
>> File 1:
>> 3 groups and 2 tables. Each table has 390 rows. Total number of rows: 780
>>
>> File 2:
>> 3 groups and 102 tables. Each table has 7 rows. Total number of rows: 784
>>
>> File 2 is 1.3MB while file 1 is 43k. In my design I was going to have
>> thousands of tables and hundreds of groups. Given the size difference
>> between the two files, is it better to have a few tables with extra keys
>> or a large number of smaller tables?
>
> The difference is due to the fact that file 2 has to store much more metadata
> (i.e. data that describes data), while in file 1 the metadata is minimal.
> The file 1 structure is always preferred over file 2 because it is more
> scalable.
>
> Hope that helps,
>
> --
> Francesc Alted
>
> ------------------------------------------------------------------------------
> Come build with us! The BlackBerry® Developer Conference in SF, CA
> is the only developer event you need to attend this year. Jumpstart your
> developing skills, take BlackBerry mobile applications to market and stay
> ahead of the curve. Join us from November 9-12, 2009. Register now!
> http://p.sf.net/sfu/devconf
> _______________________________________________
> Pytables-users mailing list
> Pytables-users@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/pytables-users
>
>
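
As a rough worked example of the metadata cost discussed in the thread, the arithmetic below takes the reported file sizes and table counts at face value and estimates the overhead per additional table. It assumes decimal units (43k = 43,000 bytes, 1.3MB = 1,300,000 bytes), treats the row data itself as negligible, and assumes the overhead grows linearly with the number of tables. None of these assumptions is confirmed by the thread.

```python
# Sizes and table counts as reported in the thread.
file1_bytes, file1_tables = 43_000, 2       # "43k"
file2_bytes, file2_tables = 1_300_000, 102  # "1.3MB"

extra_tables = file2_tables - file1_tables
extra_bytes = file2_bytes - file1_bytes
per_table = extra_bytes / extra_tables

print(f"about {per_table / 1000:.1f} kB of overhead per additional table")

# Extrapolation to a design with 1000 tables, assuming linear growth.
print(f"1000 tables: roughly {1000 * per_table / 1e6:.1f} MB before any real data")
```

Under these assumptions each extra table costs on the order of 12 to 13 kB, far more than the few rows it holds.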

