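"names" sounds like strings. To borrow a SQL concept, normalize your database: make a table (or VLArray) that stores just the distinct field_names, and in your big table store the integer index into that table instead of the full name. The distinct field_names are then simply the rows of that small table, so you never have to scan the big one to list them.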
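Here is a minimal sketch of that idea in plain Python, with no PyTables calls. NameTable, intern and the sample rows are made up for illustration. In a real file I would expect the names list to live in its own small table or VLArray and the big table to carry two integer columns, but treat that mapping as an assumption.

```python
# Hypothetical helper, not part of PyTables: stores each distinct name once
# and hands out its row index as a small integer id.
class NameTable:
    def __init__(self):
        self.names = []   # row i holds the name whose id is i (the lookup table)
        self._ids = {}    # in-memory reverse map: name -> id

    def intern(self, name):
        """Return the id for name, appending it to the table if it is new."""
        idx = self._ids.get(name)
        if idx is None:
            idx = len(self.names)
            self.names.append(name)
            self._ids[name] = idx
        return idx


field_names = NameTable()
symbol_names = NameTable()

# Made-up sample rows: (field_name, symbol_name, date, value)
raw_rows = [
    ("close", "IBM", "20090915", 121.5),
    ("close", "MSFT", "20090915", 25.2),
    ("volume", "IBM", "20090915", 7.1e6),
    ("close", "IBM", "20090916", 122.0),
]

# The big table stores two small integers per row instead of two strings.
big_table = [
    (field_names.intern(f), symbol_names.intern(s), date, value)
    for f, s, date, value in raw_rows
]

# The equivalent of SELECT DISTINCT is just the contents of the lookup
# tables; the big table is never scanned.
distinct_fields = list(field_names.names)
distinct_symbols = list(symbol_names.names)
```

Keeping the name -> id dict in memory means a new name costs one append to the small table and one dict insert, and an existing name costs one dict lookup.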
A potential problem is that updating the indices for the field_names table will be slow, but it has to happen each time a new field_name is added. Maybe Francesc has some ideas about that.

-Ken

On Wed, Sep 16, 2009 at 11:19 AM, Vineet Jain <vinjv...@gmail.com> wrote:
> There is one big downside though. My two keys are field_name and
> symbol_name. When I add tables/groups for field_name/symbol_name, I can
> walkgroups and quickly find out what the unique field_names and
> symbol_names are in the file. However, if I have 10 million plus rows, is
> there a SQL equivalent of 'distinct' which will give me the unique
> field_names and symbol_names in the table (assuming that both
> field_names and symbol_names are indexed columns) without having to load all
> the rows?
>
> -----Original Message-----
> From: Francesc Alted [mailto:fal...@pytables.org]
> Sent: Wednesday, September 16, 2009 11:10 AM
> To: pytables-users@lists.sourceforge.net
> Subject: Re: [Pytables-users] Is it better to have many smaller tables or
> one large table
>
> On Wednesday 16 September 2009 15:18:04 Vineet Jain wrote:
>> I have a table with two fields: date (str 8) and value (float32).
>>
>> I created two files and am trying to explain the file size difference
>> between the two:
>>
>> File 1:
>> 3 groups and 2 tables. Each table has 390 rows. Total number of rows: 780
>>
>> File 2:
>> 3 groups and 102 tables. Each table has 7 rows. Total number of rows: 784
>>
>> File 2 is 1.3MB while file 1 is 43k. In my design I was going to have
>> thousands of tables and hundreds of groups. Given the size difference between
>> the two files, is it better to have a few tables with extra keys or a large
>> number of smaller tables?
>
> The difference is due to the fact that file 2 has to store much more metadata
> (i.e. data that describes data), while in file 1 metadata is
> minimal.
> File 1 structure is always preferred over file 2 because it is more
> scalable.
>
> Hope that helps,
>
> --
> Francesc Alted

_______________________________________________
Pytables-users mailing list
Pytables-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/pytables-users