Hi Michel,

Yes, there is redundant data stored in the pgsnapshot schema. It's there for performance reasons. If you compare the pgsnapshot schema to the earlier pgsimple schema you'll notice that the ways.nodes column is new, but that the way_nodes table has always existed. The two way->node mechanisms are used for different purposes.
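
As a rough illustration of that redundancy, here is a toy in-memory model in Python. It is not the schema DDL: the ids are made up, and the (way_id, node_id, sequence) row layout is an assumption for the sake of the example. It only shows that the same way->node information can be held in both forms.

```python
# Toy model only: the real pgsnapshot tables live in PostgreSQL.
# Inline form: each way carries its ordered node ids in a "nodes" list.
ways = {
    100: {"nodes": [1, 2, 3]},
    101: {"nodes": [3, 4]},
}

# Row form: one (way_id, node_id, sequence) tuple per way member,
# derived here from the inline lists so the two stay in step.
way_nodes = [
    (way_id, node_id, seq)
    for way_id, way in ways.items()
    for seq, node_id in enumerate(way["nodes"])
]


def nodes_from_rows(way_id):
    """Rebuild a way's ordered node list from the way_nodes rows."""
    rows = [(seq, node_id) for w, node_id, seq in way_nodes if w == way_id]
    return [node_id for _seq, node_id in sorted(rows)]


# Both representations give the same answer for every way.
for way_id, way in ways.items():
    assert nodes_from_rows(way_id) == way["nodes"]
```

The inline list answers "which nodes are in this way?" without touching a second structure, which is the lookup the duplicated column exists to speed up.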
The pgsnapshot schema relies heavily on the PostgreSQL CLUSTER feature, where table contents can be sorted to align with one of the indexes. The nodes and ways tables are both organised by geographical location, which drastically improves performance for bounding box style queries because data in the same geographical area is stored close together on disk.

The way_nodes table can't make use of geographical clustering, which means that joining the ways table to the nodes table via way_nodes is very slow for large amounts of data because of the large number of disk seeks involved. The way->node information is therefore duplicated inside the ways table for fast lookup of nodes within ways. This schema stores tags directly on the node, way and relation tables in hstore columns for exactly the same reason.

On typical SATA disks, the pgsnapshot schema performs approximately 10 times faster than the pgsimple schema for bounding box queries.

I can't drop the way_nodes table because it is still needed for two reasons (there may be others I've forgotten):

1. It is used for replication when a node changes and the related way geometries have to be updated.
2. It is used for bounding box queries if the ways table doesn't have any geometric columns configured.

I hope that makes it clearer.

Brett
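
P.S. To make reason 1 concrete, here is a toy sketch in Python of the node->way direction of the lookup. It is not the Osmosis replication code: the sample rows, the ways_by_node index and the ways_to_refresh helper are all hypothetical, and the dictionary simply stands in for an index on the node id column of way_nodes.

```python
from collections import defaultdict

# Hypothetical sample data: (way_id, node_id, sequence) rows.
way_nodes = [(100, 1, 0), (100, 2, 1), (100, 3, 2), (101, 3, 0), (101, 4, 1)]

# Index by node id, standing in for a database index on the node id column.
ways_by_node = defaultdict(set)
for way_id, node_id, _seq in way_nodes:
    ways_by_node[node_id].add(way_id)


def ways_to_refresh(changed_node_ids):
    """Return the ids of ways whose geometry depends on any changed node."""
    affected = set()
    for node_id in changed_node_ids:
        affected |= ways_by_node.get(node_id, set())
    return sorted(affected)


print(ways_to_refresh([3]))  # [100, 101]
print(ways_to_refresh([4]))  # [101]
```

Answering the same question from the inline node list alone would mean scanning every way, which is why a per-member row structure is handy for this direction.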
_______________________________________________
osmosis-dev mailing list
[email protected]
http://lists.openstreetmap.org/listinfo/osmosis-dev
