"names" sounds like strings. To borrow a SQL concept, normalize your
database: make a table (or VLArray) that just stores e.g. the distinct
field_names, and use the index into that table instead of the full
name in your big table.
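
A minimal sketch of that idea in plain Python, just to show the shape of it.
The NameTable class and the sample rows are made up for illustration; in a
real file the `names` list would be the small table (or VLArray) and the
integer ids would go into integer columns of the big table. The in-memory
dict is an assumption on my part: it is one way to keep the name-to-id lookup
cheap when new names arrive.

```python
class NameTable:
    """Maps each distinct name to a small integer id (its row index)."""

    def __init__(self):
        self.names = []   # id -> name; stands in for the small on-disk table
        self._ids = {}    # name -> id; in-memory cache for fast lookups

    def intern(self, name):
        """Return the id for `name`, adding it if it has not been seen yet."""
        idx = self._ids.get(name)
        if idx is None:
            idx = len(self.names)
            self.names.append(name)
            self._ids[name] = idx
        return idx


fields = NameTable()
symbols = NameTable()

# Hypothetical sample rows: (field_name, symbol_name, date, value)
raw = [
    ("close", "IBM", "20090915", 121.5),
    ("close", "MSFT", "20090915", 25.2),
    ("volume", "IBM", "20090915", 7.1e6),
    ("close", "IBM", "20090916", 122.0),
]

# The big table stores two integers per row instead of two strings.
big_table = [(fields.intern(f), symbols.intern(s), d, v) for f, s, d, v in raw]

# The equivalent of SELECT DISTINCT is just the contents of the small tables.
print(fields.names)
print(symbols.names)
```

With this layout, listing the unique field_names or symbol_names never
touches the big table, no matter how many rows it has.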

A potential problem is that updating the indices for the field_names
table will be slow, but it has to happen each time a new field_name is
added. Maybe Francesc has some ideas about that.

-Ken



On Wed, Sep 16, 2009 at 11:19 AM, Vineet Jain <vinjv...@gmail.com> wrote:
> There is one big downside though. My two keys are field_name and
> symbol_name. When I add tables/groups for field_name/symbol_name, I can
> walk the groups and quickly find out what the unique field_names and
> symbol_names are in the file. However, if I have 10 million+ rows, is
> there a SQL equivalent of 'distinct' that will give me the unique
> field_names and symbol_names in the table (assuming that both
> field_name and symbol_name are indexed columns) without having to load all
> the rows?
>
>
> -----Original Message-----
> From: Francesc Alted [mailto:fal...@pytables.org]
> Sent: Wednesday, September 16, 2009 11:10 AM
> To: pytables-users@lists.sourceforge.net
> Subject: Re: [Pytables-users] Is it better to have many smaller tables or
> one large table
>
> On Wednesday 16 September 2009 15:18:04 Vineet Jain wrote:
>> I have a table with two fields: date (str 8) and value (float32).
>>
>> I created two files and am trying to explain the file size difference
>> between the two:
>>
>> File 1:
>> 3 groups and 2 tables. Each table has 390 rows. Total number of rows: 780
>>
>> File 2:
>> 3 groups and 102 tables. Each table has 7 rows. Total number of rows: 784
>>
>> File 2 is 1.3MB while file 1 is 43k. In my design I was going to have
>> thousands of tables and hundreds of groups. Given the size difference
>> between the two files, is it better to have a few tables with extra keys
>> or a large number of smaller tables?
>
> The difference is due to the fact that file 2 has to store much more metadata
> (i.e. data that describes data), while in file 1 the metadata is
> minimal.
> File 1's structure is always preferred over file 2's because it is more
> scalable.
>
> Hope that helps,
>
> --
> Francesc Alted
>
> ------------------------------------------------------------------------------
> Come build with us! The BlackBerry® Developer Conference in SF, CA
> is the only developer event you need to attend this year. Jumpstart your
> developing skills, take BlackBerry mobile applications to market and stay
> ahead of the curve. Join us from November 9-12, 2009. Register now!
> http://p.sf.net/sfu/devconf
> _______________________________________________
> Pytables-users mailing list
> Pytables-users@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/pytables-users
>
