Thanks for your detailed reply, Brett. Makes sense. 2011/12/22 Brett Henderson <[email protected]>
> Hi Michel, > > Yes, there is redundant data stored in the pgsnapshot schema. It's there > for performance reasons. If you compare the pgsnapshot schema to the > earlier pgsimple schema you'll notice that the way.nodes column is new, but > that the way_nodes table has always existed. Both way->node mechanisms are > used for different purposes. > > The pgsnapshot schema relies heavily on the PostgreSQL CLUSTER feature > where table contents can be sorted to align with one of the indexes. The > nodes and ways tables are both organised by geographical location which > drastically improves performance when performing bounding box style queries > because data in the same geographical area will be stored close together on > disk. The way_nodes table can't make use of geographical clustering which > means that joining the ways table to the nodes table via way_nodes is very > slow for large amounts of data due to large amounts of disk seeks. > Therefore the way->node information is duplicated inside the ways table for > fast lookup of nodes within ways. This schema includes tags directly > against the node, way and relation tables in hstore columns for exactly the > same reason. On typical SATA disks, the pgsnapshot schema performs > approximately 10 times faster than the pgsimple schema for bounding box > queries. > > I can't drop the way_node table because it is still needed for two reasons > (may be others I've forgotten). 1. It is used for replication when a node > changes and the related way geometries have to be updated. 2. It is used > for bounding box queries if the ways table doesn't have any geometric > columns configured. > > I hope that makes it clearer. > Brett > >
_______________________________________________ osmosis-dev mailing list [email protected] http://lists.openstreetmap.org/listinfo/osmosis-dev
