On Sun, Mar 15, 2009 at 8:57 PM, Rich Shepard <[email protected]> wrote:
> On Sun, 15 Mar 2009, P Kishor wrote:
>
>> CREATE TABLE met (
>>        met_id INTEGER PRIMARY KEY,
>>        other met attributes,
>>        met_grid_id INTEGER
>> );
>
>   Given the quantity of data you have I urge you to read Joe Celko's "SQL
> Programming Style" before you dig yourself into a really deep hole. For
> example, your met_grid covers a defined land area. It would be much better
> to use the geographic coordinates (lat/lon, UTM, State Plane) of the center
> of each grid as the primary key. On the other hand, if all your information
> is associated with the met_grid, then its center should be the unique
> identifier. Your *_id are artificial and convey no useful information.
>
>> Imagine a million square kms area divided into 1 sq. kms. cells. Those are
>> the land cells. Now imagine another overlay, the met grid, on top of the
>> land grid. The met grid is also a million square kms., but is tessellated
>> into 2500 sq. kms. cells, that is, a 20 x 20 grid.
>
>   What you could do is have a Grid table whose primary key is the center of
> the central met_grid, and which references all the other met_grids it
> contains.
>
>   Given your schema above, how can you tell that a specific 1 km**2 met cell
> belongs to a specific 2500 km**2 grid cell? The numbers are meaning less.
> Geographic coordinates use real, meaningful data to associate all 2500 met
> cells in each grid cell.
>
>> I have 7 met attributes for every day of 20 years (that is, 7300 rows)
>> for each "met_cell". So, imagine a stack of 7300 rows sitting on each
>> met cell. In other words, each set of 7300 rows is related to a block
>> of 2500 land cells.
>
>   Then your Met table contains 7 columns (one for each attribute), plus the
> geographic center columns (Northing/Easting, Lat/Lon), and the date. That's
> 10 columns and 7300 rows.
>
>   Now you have a start toward a normalized table.
>
>   I suspect that your thinking has been shaped by trying to use spreadsheets
> as databases, and thinking of SQL as a procedural language rather than a
> declarative language that works on sets of tables (rows, columns) instead of
> files (records, fields).
>
> Rich
>


I appreciate your help Rich, but your suspicion is unfounded and
wrong. I don't use spreadsheets other than when doing my taxes, and
while I am only about 1/100th as good as Igor at SQL, that probably
places me in the upper 90s percentile.

That said, my original question was about designing a db for ease of
use and speed of query retrieval, not for size or memory constraints.

I also have a fairly long history with GIS, so I have considered fully
the benefits and drawbacks of using lat/lon as the the key.
Geographically unique keys are useless for my problem even though the
problem space is geographic.

I don't really need to know which cell_id or what location I am trying
to query... I will build a map front end that will allow the users to
choose an area of interest and send the bounding box back to the
application. The application will take that bounding box and locate
the cells. Hence, the R*Tree index. The cell_ids will then be used to
return the spatial and weather attributes which will be passed to the
model. Hence, the cell_ids and met_ids can be as artificial as
Sucralose.

In any case, I built the db last night. Weighs in at about 450 MB. I
basically loaded all the weather data in a single table, and linked
the spatial cells to the weather cells using met_id. Built an index on
met_id, and, of course, the R*Tree index. Did a few cursory tests. The
performance is outstanding.

Now I need to do a few load tests, and I think I am well on my way to
the next step.

I think I have converted a procedure that took many several hours
doing data prep to a process that returns data in a few milliseconds.

Simple schema, simple db, simply great results.

-- 
Puneet Kishor http://www.punkish.org/
Nelson Institute for Environmental Studies http://www.nelson.wisc.edu/
Carbon Model http://carbonmodel.org/
Open Source Geospatial Foundation http://www.osgeo.org/
Sent from: Madison WI United States.
_______________________________________________
sqlite-users mailing list
[email protected]
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users

Reply via email to