When I initially migrated my sqlite code to pytables, the pytables code was a lot smaller. But then I realized that if I stored the keys in the data table, it would make my data very large (I also found out that having one large table is the way to go, as opposed to having a large number of smaller tables). Imagine having to repeat the same keys over and over again on 10+ million rows (I'm guessing that compression would help, but I don't know what the downside to that is).
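To put rough numbers on the repeated-key concern, here is a small back-of-the-envelope sketch. The date (8-byte string) and value (float32) columns come from the thread; the 16-byte field_name and 8-byte symbol_name widths are assumptions made up for illustration, and the figures ignore compression, indexes and file metadata.

```python
import struct

ROWS = 10_000_000  # the "10+ million rows" case from the message

# Fixed-width row layouts, packed without padding ("<").
# date (8-byte string) and value (float32) are taken from the thread;
# the 16-byte field_name and 8-byte symbol_name widths are assumed.
ROW_WITH_STRING_KEYS = struct.Struct("<16s8s8sf")
ROW_WITH_UINT16_KEYS = struct.Struct("<HH8sf")


def table_bytes(row_format, rows=ROWS):
    """Uncompressed payload size of a table with fixed-width rows."""
    return row_format.size * rows


string_total = table_bytes(ROW_WITH_STRING_KEYS)
uint16_total = table_bytes(ROW_WITH_UINT16_KEYS)

print("bytes per row:", ROW_WITH_STRING_KEYS.size, "vs", ROW_WITH_UINT16_KEYS.size)
print("total bytes:  ", string_total, "vs", uint16_total)
```

Under these assumed widths the keys dominate the row, so replacing them with small integer ids shrinks the raw table noticeably even before compression is considered.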
Then I replaced the keys (as you suggested, with unsigned 16-bit ints) and now have a separate table which stores the mapping from keys to indexes. I have another table which stores metrics on the data table, so as I update the data table the metrics are kept up to date. The code has now gotten larger than the sqlite version. I'm hoping that the performance improvement will more than make up for the additional complexity. I'm also looking forward to compression, which I would not have gotten with sqlite.

Things which would be nice to have on indexed columns (without having to read all the data):

1. Min and Max
2. Distinct

And to be able to limit the number of rows to get back from the db.

Thanks,

Vineet

-----Original Message-----
From: Kenneth Arnold [mailto:kenneth.arn...@gmail.com]
Sent: Wednesday, September 16, 2009 2:43 PM
To: Vineet Jain
Cc: Francesc Alted; pytables-users@lists.sourceforge.net
Subject: Re: [Pytables-users] Is it better to have many smaller tables or one large table

"names" sounds like strings. To borrow a SQL concept, normalize your database: make a table (or VLArray) that just stores e.g. the distinct field_names, and use the index into that table instead of the full name in your big table.

A potential problem is that updating the indices for the field_names table will be slow, but it has to happen each time a new field_name is added. Maybe Francesc has some ideas about that.

-Ken

On Wed, Sep 16, 2009 at 11:19 AM, Vineet Jain <vinjv...@gmail.com> wrote:
> There is one big downside though. My two keys are field_name and
> symbol_name. When I add tables/groups for field_name/symbol_name, I can
> walkgroups and quickly find out what the unique field_names and
> symbol_names are in the file. However, if I have 10 million plus rows, is
> there a sql equivalent of 'distinct' which will give me what the unique
> field_names and symbol_names are in the table (assuming that both
> field_names and symbol_names are indexed columns) without having to load all
> the rows?
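The normalized layout discussed here (small integer ids in the big table, a separate key mapping, and a metrics record kept up to date on every append) can be sketched in plain Python, independent of any storage layer. This is only an illustration under stated assumptions: the class names `KeyIndex` and `DataTable` are hypothetical, dates are assumed to be 'YYYYMMDD' strings so that string order matches date order, and in PyTables the mapping and metrics would presumably live in their own tables rather than in memory.

```python
from array import array


class KeyIndex:
    """Maps key strings to small integer ids (the separate mapping table)."""

    MAX_KEYS = 2 ** 16  # ids must fit in an unsigned 16-bit integer

    def __init__(self):
        self._ids = {}    # name -> id
        self._names = []  # id -> name

    def intern(self, name):
        """Return the id for name, assigning the next free id if it is new."""
        key_id = self._ids.get(name)
        if key_id is None:
            if len(self._names) >= self.MAX_KEYS:
                raise OverflowError("more than 65536 distinct keys")
            key_id = len(self._names)
            self._ids[name] = key_id
            self._names.append(name)
        return key_id

    def distinct(self):
        """Distinct keys, read from the mapping without touching data rows."""
        return list(self._names)


class DataTable:
    """Column-wise data table that stores key ids instead of key strings."""

    def __init__(self):
        self.fields = KeyIndex()
        self.symbols = KeyIndex()
        self.field_ids = array("H")   # unsigned short ids
        self.symbol_ids = array("H")
        self.dates = []               # assumed 'YYYYMMDD' strings
        self.values = array("f")      # float32 values
        self.metrics = {"rows": 0, "min_date": None, "max_date": None}

    def append(self, field_name, symbol_name, date, value):
        """Append one row and update the metrics in the same step."""
        self.field_ids.append(self.fields.intern(field_name))
        self.symbol_ids.append(self.symbols.intern(symbol_name))
        self.dates.append(date)
        self.values.append(value)
        m = self.metrics
        m["rows"] += 1
        if m["min_date"] is None or date < m["min_date"]:
            m["min_date"] = date
        if m["max_date"] is None or date > m["max_date"]:
            m["max_date"] = date


table = DataTable()
table.append("close", "IBM", "20090915", 120.5)
table.append("close", "MSFT", "20090916", 25.2)
table.append("volume", "IBM", "20090916", 1.0e6)
print(table.fields.distinct())
print(table.symbols.distinct())
print(table.metrics)
```

With this layout the 'distinct' question is answered from the mapping alone, and min/max come from the metrics record. The incremental metrics are only valid for appends; deleting or rewriting rows would require recomputing them.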
>
> -----Original Message-----
> From: Francesc Alted [mailto:fal...@pytables.org]
> Sent: Wednesday, September 16, 2009 11:10 AM
> To: pytables-users@lists.sourceforge.net
> Subject: Re: [Pytables-users] Is it better to have many smaller tables or
> one large table
>
> On Wednesday 16 September 2009 15:18:04 Vineet Jain wrote:
>> I have a table with two fields: date (str 8) and value (float32).
>>
>> I created two files and am trying to explain the file size difference
>> between the two:
>>
>> File 1:
>> 3 groups and 2 tables. Each table has 390 rows. Total number of rows: 780
>>
>> File 2:
>> 3 groups and 102 tables. Each table has 7 rows. Total number of rows: 784
>>
>> File 2 is 1.3MB while file 1 is 43k. In my design I was going to have
>> 1000's of tables and hundreds of groups. Given the size difference between
>> the two files, is it better to have few tables with extra keys or a large
>> number of smaller tables?
>
> The difference is due to the fact that file 2 has to put much more metadata
> (i.e. data that describes data) in there, while in file 1 metadata is
> minimal.
> File 1 structure is always preferred over file 2 because it is more
> scalable.
>
> Hope that helps,
>
> --
> Francesc Alted
>
> ------------------------------------------------------------------------------
> Come build with us! The BlackBerry® Developer Conference in SF, CA
> is the only developer event you need to attend this year. Jumpstart your
> developing skills, take BlackBerry mobile applications to market and stay
> ahead of the curve. Join us from November 9-12, 2009. Register now!
> http://p.sf.net/sfu/devconf
> _______________________________________________
> Pytables-users mailing list
> Pytables-users@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/pytables-users