Re: [OSM-dev] An alternative rendering-database setup and update strategy!

2013-12-17 Thread Stefan Keller
Hi Sven

Can you explain liitle bit more the setup of the servers you have in mind?
You seem to have a potent machine which has a PostGIS database and uses
osm2pgsql to populate and update it periodically.
Then you want to attach a semi-potent server to that master?

Yours, Stefan


2013/11/25 Sven Geggus li...@fuchsschwanzdomain.de

 Hello,

 on my day-job I recently had to solve the Problem of setting up a Postgis
 database contaning a full-planet extract using one of those cheap 180GB SSD
 on a semi-potent machine (only 8GB of RAM).

 I first tried to use osm2pgsql for this purpose which is almost impossible
 for a couple of reasons:

 * I would have needed more than 180GB of disk space (at least during
 import)
 * The import would take a _very_ long time (several days rather than hours)

 So I had to look for a backup strategy!

 I found one which proved to be that good, that I would like to discuss it
 here as an alternative way for rendering-database setup.

 The first thing I discovered is the fact, that we currently use roughly
 twice
 the disk-space for intermediate tables (or file in case of flatnode) than
 for the processed data itself.

 Here is how this looks like on tile.openstreetmap.de:

  relation  | total_size
 ---+
  public.planet_osm_line| 61 GB
  public.planet_osm_polygon | 59 GB
  public.planet_osm_point   | 10 GB
  public.planet_osm_roads   |  8 GB

  public.planet_osm_ways| 174 GB
  public.planet_osm_rels|   4 GB

 + flatnode.dat 20 GB

 processed data: 61GB+59GB+10GB+8GB=138GB
 intermediate data: 20GB+174GB+4GB=202GB

 The most annoying part is the 174GB planet_osm_ways table.

 So what I did to solve my problem was using pg_dump for the processed data
 tables only and setting up my target database using pg_restore.

 Advantages:
 * Very fast data import even on machines where import using osm2pgsql would
   be practically impossible
 * Decent size of database dump (32GB in case of a --hstore-match-only
 database
   which is about the same size as a planet dump)

 Disadvantages:
 * Currently no update strategy available
 * Will need a master osm2pgsql database
 * Will inherit table scheme from master database

 Conclusion:

 IMO the disadvantages can be resolved or at least mitigated in the
 following
 way:

 * we generate a downloadable dump of the database on
   tile.openstreetmap.de (e.g. weekly)
 * osm2pgsql needs to be patched to output the changes to the processed data
   tables as SQL commands which can then be used to replicated the slave
 databases
 * We already use --hstore-match-only database format. So flexibility in
   table-layout is not that much of a concern as views to hstore column can
 be used
   for rendering instead of tables.

 I would like to hear your comments to this proposial.

 Regards

 Sven

 --
 Threading is a performance hack.
 (The Art of Unix Programming by Eric S. Raymond)

 /me is giggls@ircnet, http://sven.gegg.us/ on the Web


 --
 Thinking of using NT for your critical apps?
   Isn't there enough suffering in the
 world?
(Advertisement of Sun Microsystems in Wall Street
 Journal)
 /me is giggls@ircnet, http://sven.gegg.us/ on the Web

 ___
 dev mailing list
 dev@openstreetmap.org
 https://lists.openstreetmap.org/listinfo/dev

___
dev mailing list
dev@openstreetmap.org
https://lists.openstreetmap.org/listinfo/dev


[OSM-dev] An alternative rendering-database setup and update strategy!

2013-11-25 Thread Sven Geggus
Hello,

on my day-job I recently had to solve the Problem of setting up a Postgis
database contaning a full-planet extract using one of those cheap 180GB SSD
on a semi-potent machine (only 8GB of RAM).

I first tried to use osm2pgsql for this purpose which is almost impossible
for a couple of reasons:

* I would have needed more than 180GB of disk space (at least during import)
* The import would take a _very_ long time (several days rather than hours)

So I had to look for a backup strategy!

I found one which proved to be that good, that I would like to discuss it
here as an alternative way for rendering-database setup.

The first thing I discovered is the fact, that we currently use roughly twice
the disk-space for intermediate tables (or file in case of flatnode) than
for the processed data itself.

Here is how this looks like on tile.openstreetmap.de:

 relation  | total_size 
---+
 public.planet_osm_line| 61 GB
 public.planet_osm_polygon | 59 GB
 public.planet_osm_point   | 10 GB
 public.planet_osm_roads   |  8 GB

 public.planet_osm_ways| 174 GB
 public.planet_osm_rels|   4 GB

+ flatnode.dat 20 GB

processed data: 61GB+59GB+10GB+8GB=138GB
intermediate data: 20GB+174GB+4GB=202GB

The most annoying part is the 174GB planet_osm_ways table.

So what I did to solve my problem was using pg_dump for the processed data
tables only and setting up my target database using pg_restore.

Advantages:
* Very fast data import even on machines where import using osm2pgsql would
  be practically impossible
* Decent size of database dump (32GB in case of a --hstore-match-only database
  which is about the same size as a planet dump)

Disadvantages:
* Currently no update strategy available
* Will need a master osm2pgsql database
* Will inherit table scheme from master database

Conclusion:

IMO the disadvantages can be resolved or at least mitigated in the following
way:

* we generate a downloadable dump of the database on
  tile.openstreetmap.de (e.g. weekly)
* osm2pgsql needs to be patched to output the changes to the processed data
  tables as SQL commands which can then be used to replicated the slave 
databases
* We already use --hstore-match-only database format. So flexibility in
  table-layout is not that much of a concern as views to hstore column can be 
used
  for rendering instead of tables.

I would like to hear your comments to this proposial.

Regards

Sven

-- 
Threading is a performance hack.
(The Art of Unix Programming by Eric S. Raymond)

/me is giggls@ircnet, http://sven.gegg.us/ on the Web


-- 
Thinking of using NT for your critical apps?
  Isn't there enough suffering in the world?
   (Advertisement of Sun Microsystems in Wall Street Journal)
/me is giggls@ircnet, http://sven.gegg.us/ on the Web

___
dev mailing list
dev@openstreetmap.org
https://lists.openstreetmap.org/listinfo/dev