While waiting for a recent planet import to catch up using minutely diffs I started wondering what the slow parts of minutely processing were. So I took a look at my postgres log which is set to record slow queries. Turns out, the most frequent slow query during diff processing is the one that updates a way's linestring after a node is modified. [1] Sometimes it takes 10s of seconds. I assume this is only on very large ways and/or nodes that belong to many ways. On some random way with 100 nodes the query took ~300ms for me. A node that was a member of two ways took ~600ms.
The one that does the same when a way is modified is also in the slow query log quite frequently. [2] But this got me to thinking... this query is executed at the node level. If I am reading this right, this will lead to a LOT of unnecessary linestring updates. For example if I am working on TIGER fixup and move 50 nodes in a way and then add another 10, the way is going to have its linestring updated 51 times while processing a single diff. One time each when the 50 existing nodes are moved and then once more when the 10 new nodes are added to the way. Maybe not in that order. This query is also executed on nodes that aren't members of any way. Executing the query for this type of way took me about 150ms. Unfortunately diffs don't tell us what ways nodes are a member of so executing the query on unconnected nodes seems unavoidable. The one place where I think an improvement might be possible is the "linestring updated 51 times in one diff" problem. My initial thought would be to make the "action" table required for diff consumption. Then instead of updating the linestring in Node/WayDao, wait until the diff processing is complete and issue a single query to update all ways affected in the diff based on what is in the action table. Maybe something in ChangeWriter.complete() [3] right before it truncates the action table. This is somewhat similar to what osm2pgsql does with its "pending" tables although osm2pgsql actually constructs linestrings in code using cached node locations so it is a little bit different. Here is an EXPLAIN ANALYZE for the node query: http://pastebin.com/H4nbJ402 (might need to click on the "raw" option to avoid wrapping) It seems like there is potential benefit from grouping updates so some of those index scans only happen once per diff instead of once per node. The first question that comes to mind is what is the overhead of using the action table and would it eclipse any gains achieved by this? Has anyone else given this subject any thought? Is it even worth pursuing further? [1] https://github.com/openstreetmap/osmosis/blob/master/pgsnapshot/src/main/java/org/openstreetmap/osmosis/pgsnapshot/v0_6/impl/NodeDao.java#L28-L37 [2] https://github.com/openstreetmap/osmosis/blob/master/pgsnapshot/src/main/java/org/openstreetmap/osmosis/pgsnapshot/v0_6/impl/WayDao.java#L30-L37 [3] https://github.com/openstreetmap/osmosis/blob/master/pgsnapshot/src/main/java/org/openstreetmap/osmosis/pgsnapshot/v0_6/impl/ChangeWriter.java#L211 Toby _______________________________________________ osmosis-dev mailing list [email protected] http://lists.openstreetmap.org/listinfo/osmosis-dev
