Re: [OSM-dev] osm2pgsql diff imports benchmarks

2012-03-19 Thread Lynn W. Deffenbaugh (Mr)

On 3/18/2012 4:07 PM, Kai Krueger wrote:

sylvain letuffe wrote

It turned out that the index on the pending field wasn't used at all and
led
to a full scan of the planet_osm_ways table, increasing noticeably the
diff
import.


I think I have seen that one before as well. If I remember correctly, it was
sufficient to run a simple analyze on the table. No need to do a re-index.
It probably thinks a too large proportion of the ways are pending and
therefore decides it is best to do a seq scan. Possibly because before the
going over pending ways stage about 50% of ways are pending during the
import. Osm2pgsql should be doing a analyze at the end of the import though,
so I am not sure why this is happening.


I have a planet database that has been imported and has been attempting 
to catch up for a while now (3 days to go) and it's been showing extreme 
data throughput spikes when entering the pending ways phase, so I 
decided to check this out.  An analyze command show that a sequential 
scan is, in fact, being done:




gis= explain select id from planet_osm_ways where pending;
 Seq Scan on planet_osm_ways  (cost=0.00..5253263.74 rows=65592587 
width=4)

   Filter: pending



I'm not sure why, but I got an error attempting to analyze verbose 
planet_osm_ways as www-data, but it seems to execute as the postgres user:



gis= analyze verbose planet_osm_ways;
WARNING:  skipping planet_osm_ways --- only table or database owner 
can analyze it


The analyze output really didn't tell me much:


gis=# analyze verbose planet_osm_ways;
INFO:  analyzing public.planet_osm_ways
INFO:  planet_osm_ways: scanned 3 of 3941416 pages, containing 
983808 live rows and 589826 dead rows; 3 rows in sample, 131170601 
estimated total rows


But a subsequent explain certainly looks promising:


gis= explain select id from planet_osm_ways where pending;
 Index Scan using planet_osm_ways_idx on planet_osm_ways  
(cost=0.00..916513.78 rows=43724 width=4)


Now to wait for my next 12 hour catchup chunk to get to the pending ways 
phase,probably in about 6 hours or so.


Lynn (D) - KJ4ERJ

PS.  If anyone gets this far, are those dead rows a function of my 
running with autovaccuum off?  And should I be doing a periodic vaccuum 
to clean out accumulated cruft?  I was thinking that not too much in the 
osm planet would be deleting, but maybe my assumption is incorrect?



___
dev mailing list
dev@openstreetmap.org
http://lists.openstreetmap.org/listinfo/dev


Re: [OSM-dev] osm2pgsql diff imports benchmarks

2012-03-19 Thread Simon Poole


Am 19.03.2012 12:37, schrieb Lynn W. Deffenbaugh (Mr):
 And should I be doing a periodic vaccuum to clean out accumulated
cruft? I was thinking that not too much in the osm planet would be
deleting, but maybe my assumption is incorrect?

Yes and yes.

osm2pgsql does not update rows, it deletes them and inserts the new
version. You really need to run vacuum at least once per week, probably
more right now.

Simon

___
dev mailing list
dev@openstreetmap.org
http://lists.openstreetmap.org/listinfo/dev


Re: [OSM-dev] osm2pgsql diff imports benchmarks

2012-03-19 Thread sly (sylvain letuffe)

  gis= explain select id from planet_osm_ways where pending;
   Index Scan using planet_osm_ways_idx on planet_osm_ways  
  (cost=0.00..916513.78 rows=43724 width=4)

This is what I've got after a re-index :
gis=# explain select id from planet_osm_ways where pending;
  QUERY PLAN
--
 Index Scan using planet_osm_ways_idx on planet_osm_ways  (cost=0.00..1157.47 
rows=1 width=4)
(1 ligne)

and that saves me around ~20 seconds per minute diff



-- 
sly
qui suis-je : http://sly.letuffe.org
email perso : sylvain chez letuffe un point org

___
dev mailing list
dev@openstreetmap.org
http://lists.openstreetmap.org/listinfo/dev


[OSM-dev] osm2pgsql diff imports benchmarks

2012-03-18 Thread sly (sylvain letuffe)
Hi,

In the process of trying to speed up diff imports with osm2pgsql, I'm searching 
for a typical output of osm2pgsql importing one minute.

I know of :
http://wiki.openstreetmap.org/wiki/Osm2pgsql/benchmarks
But beside the one I just wrote, there are no output of what is expected  
for a one minute (or few minutes) diff import.

Could someone operating and maintaining a planet osm2pgsql database be so kind 
as to provide me one (or more in order to have average metrics) ?


On a side note, while loging long queries I ran into a weird long one about 
retrieving pending ways during diff import.

It turned out that the index on the pending field wasn't used at all and led 
to a full scan of the planet_osm_ways table, increasing noticeably the diff 
import.
I don't feel I have imported data in a weird way and don't know if others are 
affected by that, but you can check if you are affected by runing :
select id from planet_osm_way where pending which should run in less than a 
second if index is well used.

To solve the problem post import, we did re-index and ran analyse
reindex index planet_osm_ways_idx;
analyse;
-- 
sly (sylvain letuffe)

___
dev mailing list
dev@openstreetmap.org
http://lists.openstreetmap.org/listinfo/dev