Re: [sqlite] designing a db to hold repeating data

John Machin Sun, 15 Mar 2009 18:25:06 -0700

On 16/03/2009 11:45 AM, P Kishor wrote:
> On Sun, Mar 15, 2009 at 7:29 PM, John Machin <[email protected]> wrote:
>> On 16/03/2009 11:00 AM, P Kishor wrote:
>>> I have a grid of 1000 x 1000 cells with their own data as well as 20
>>> years of daily weather data (20 * 365 = 7300 rows) for each contiguous
>>> group of 50 x 50 cell.
>>>
>> [snip]
>>> CREATE TABLE met (
>>>        met_id INTEGER PRIMARY KEY,
>>>        other met attributes,
>>>        met_grid_id INTEGER
>>> );
>> [snip]
>>
>>> The db is about 350 MB with the cell table with 1000,000 rows and the
>>> met table with 2,920,000 rows and the R*Tree index.
>>>
>>> Is there any other better way that jumps out at any of you?
>> How many unique sets of "other met attributes" are there? If the answer is
>> substantially less than 2,920,000, you could consider parking those on yet
>> another table, referenced from the "met" table.
>>
> 
> 
> I am not sure what that gains me?


If there are are only (say) 100,000 unique sets of "other met 
attributes", then you can save substantial space, at the cost of some
complexity.

Here's a simple example of the concept in practice:

Imagine a spreadsheet containing the names and addresses of the 
employees of a large corporation. The "surname / family name / last 
name" column is quite likely to have many fewer unique entries than the 
number of rows. The same is likely to apply to the "city / location" 
column and the "zipcode / postcode" column [assuming the zipcode is kept 
as text and not a number].

Space is saved by keeping the unique strings in a "shared string table"; 
text cells contain an index into the shared string table.

> 
> Imagine a million square kms area divided into 1 sq. kms. cells. Those
> are the land cells.
> 
> Now imagine another overlay, the met grid, on top of the land grid.
> The met grid is also a million square kms., but is tessellated into
> 2500 sq. kms. cells, that is, a 20 x 20 grid.
> 
> I have 7 met attributes for every day of 20 years (that is, 7300 rows)
> for each "met_cell". So, imagine a stack of 7300 rows sitting on each
> met cell. In other words, each set of 7300 rows is related to a block
> of 2500 land cells.
> 
> Hope that helps picture the problem.

It appears to match what I had to infer from your sample blah_blah_id 
numbers :-)

Cheers,
John


_______________________________________________
sqlite-users mailing list
[email protected]
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users

Re: [sqlite] designing a db to hold repeating data

Reply via email to