Re: [OSM-talk] Bug in using osm2pgsql to keep up with dailies

2008-09-03 Thread Martijn van Oosterhout
Did you see the patch posted a few days ago about making creates into modifies?

Otherwise I'll just commit it to SVN.
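
For readers following along, here is a minimal sketch of what "making creates into modifies" could mean, assuming the duplicate key error quoted below comes from a diff trying to create a way that the planet import already loaded. The dictionary and both function names are hypothetical stand-ins for illustration; this is not the patch itself and not osm2pgsql code.

```python
# Hypothetical in-memory stand-in for the ways table, keyed by way ID.
ways = {}


def apply_create_strict(way_id, nodes, tags):
    """Plain insert: fails if the ID already exists, like a primary-key violation."""
    if way_id in ways:
        raise KeyError(f"duplicate key: way {way_id} already exists")
    ways[way_id] = (nodes, tags)


def apply_create_as_modify(way_id, nodes, tags):
    """Treat a create as a modify: overwrite any existing row with the same ID."""
    ways[way_id] = (nodes, tags)


# The way from the error report in this thread, assumed to be present already
# because the planet file contained it.
apply_create_strict(
    26580292,
    [26469086, 11080906, 165095606, 11080816],
    {"ref": "A232", "highway": "trunk", "name": "Croydon Road"},
)
```

With the strict version, replaying the same create raises an error; with the create-as-modify version it simply overwrites the row, so applying a diff that overlaps the planet file becomes harmless.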

Have a nice day,

On Wed, Sep 3, 2008 at 7:00 PM, Michal Migurski [EMAIL PROTECTED] wrote:
 I'm revisiting the planet.osm stuff this week, from below.

 If the planet.osm file is from 2008-08-27, do I start running daily
 diffs at 20080827-20080828 or 20080828-20080829?

 I would assume the former, but I get duplicate key errors when I try:

ERROR:  duplicate key value violates unique constraint osm_bayarea_ways_pkey
(7)
Arguments were: 26580292, {26469086,11080906,165095606,11080816}, {ref,A232,highway,trunk,name,Croydon Road}, f,
Error occurred, cleaning up

 Am I correct to go in this order?

create planet-080827.osm.bz2
ignore  20080827-20080828.osc.gz
append 20080828-20080829.osc.gz
append 20080829-20080830.osc.gz
etc.
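
The file names in that list follow a regular pattern, so the sequence can be generated rather than typed by hand. This is only a sketch: the function name is made up, and which day to start from is exactly the open question here, so it is left as a parameter.

```python
from datetime import date, timedelta


def daily_diff_names(first_day, count):
    """Return `count` consecutive daily diff names starting at `first_day`."""
    names = []
    for offset in range(count):
        start = first_day + timedelta(days=offset)
        end = start + timedelta(days=1)
        # Same naming as the files listed above: YYYYMMDD-YYYYMMDD.osc.gz
        names.append(f"{start:%Y%m%d}-{end:%Y%m%d}.osc.gz")
    return names


for name in daily_diff_names(date(2008, 8, 27), 3):
    print(name)
```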

 -mike.

 On Aug 12, 2008, at 12:30 PM, Jon Burgess wrote:

 On Sun, 2008-08-10 at 18:45 -0700, Michal Migurski wrote:
 So I'm definitely doing the bbox thing - I ran out of space on the
 volume when doing a slim import of planet.osm with a box that covered
 only the extended SF Bay Area. Seems like that should be fairly
 reasonable, right?

 Perhaps the slim mode is not taking the bounding box into account.
 I'll take a look.

 Any news?


 The news is mixed. The slim mode code does correctly exclude nodes
 outside of the bounding box when reading them in from the file.
 Unfortunately all the ways and relations still make it to the
 intermediate tables. It isn't until the code tries to extract the
 geometries from the ways that it can discover if the nodes for the way
 are outside the bounding box.

 It may be possible to improve this but it would need to make the
 assumption that all the nodes are in the file. I don't have time to look
 at this right now though.




 
 michal migurski- [EMAIL PROTECTED]
  415.558.1610




 ___
 talk mailing list
 talk@openstreetmap.org
 http://lists.openstreetmap.org/listinfo/talk




-- 
Martijn van Oosterhout [EMAIL PROTECTED] http://svana.org/kleptog/



Re: [OSM-talk] Bug in using osm2pgsql to keep up with dailies

2008-08-12 Thread Jon Burgess
On Sun, 2008-08-10 at 18:45 -0700, Michal Migurski wrote:
  So I'm definitely doing the bbox thing - I ran out of space on the
  volume when doing a slim import of planet.osm with a box that covered
  only the extended SF Bay Area. Seems like that should be fairly
  reasonable, right?
 
  Perhaps the slim mode is not taking the bounding box into account.
  I'll take a look.
 
 Any news?
 

The news is mixed. The slim mode code does correctly exclude nodes
outside of the bounding box when reading them in from the file.
Unfortunately all the ways and relations still make it to the
intermediate tables. It isn't until the code tries to extract the
geometries from the ways that it can discover if the nodes for the way
are outside the bounding box.
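
A toy model of the three stages described above may make the disk usage easier to picture. Everything here is an assumption for illustration: the bounding box values, the table layout, and the rule that a way needs at least two surviving nodes. It is not how osm2pgsql is actually written.

```python
# Hypothetical bounding box (min_lon, min_lat, max_lon, max_lat); not from the thread.
BBOX = (-123.0, 37.0, -121.0, 38.5)


def in_bbox(lon, lat, bbox=BBOX):
    min_lon, min_lat, max_lon, max_lat = bbox
    return min_lon <= lon <= max_lon and min_lat <= lat <= max_lat


def slim_import(nodes, ways):
    """nodes: {id: (lon, lat)}, ways: {id: [node ids]}."""
    # Stage 1: nodes are filtered against the box as they are read.
    node_table = {nid: pos for nid, pos in nodes.items() if in_bbox(*pos)}
    # Stage 2: every way is stored; a way is just a list of IDs, so nothing
    # at this point says where in the world it is.
    way_table = dict(ways)
    # Stage 3: only while building geometries do out-of-box ways drop out.
    geometries = {}
    for wid, refs in way_table.items():
        coords = [node_table[r] for r in refs if r in node_table]
        if len(coords) >= 2:
            geometries[wid] = coords
    return node_table, way_table, geometries


nodes = {1: (-122.40, 37.77), 2: (-122.41, 37.78), 3: (-0.10, 51.50), 4: (-0.11, 51.51)}
ways = {10: [1, 2], 20: [3, 4]}
node_table, way_table, geometries = slim_import(nodes, ways)
print(len(node_table), len(way_table), len(geometries))
```

In this model the intermediate way table still holds both ways even though only one of them produces a geometry, which matches the behaviour described: the box trims nodes early but ways and relations only late.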

It may be possible to improve this but it would need to make the
assumption that all the nodes are in the file. I don't have time to look
at this right now though.

Jon





Re: [OSM-talk] Bug in using osm2pgsql to keep up with dailies

2008-08-12 Thread Jon Burgess
On Sun, 2008-08-10 at 18:45 -0700, Michal Migurski wrote:
  So I'm definitely doing the bbox thing - I ran out of space on the
  volume when doing a slim import of planet.osm with a box that covered
  only the extended SF Bay Area. Seems like that should be fairly
  reasonable, right?

Your best bet would probably be to start with a pre-filtered planet extract
like the one for California here:
http://downloads.cloudmade.com/north_america/united_states/california

Jon





Re: [OSM-talk] Bug in using osm2pgsql to keep up with dailies

2008-08-11 Thread Dave Stubbs
 Slim mode over the full planet won't work without the intarray module,
 but it's not included in contrib/ for postgresql-8.3 on Debian Lenny.
 Where should I be looking for this?



If it's anything like Ubuntu then it is -- but freakily it's called _int
rather than intarray. So see if you can find that instead.

Dave



Re: [OSM-talk] Bug in using osm2pgsql to keep up with dailies

2008-08-10 Thread Michal Migurski
 So I'm definitely doing the bbox thing - I ran out of space on the
 volume when doing a slim import of planet.osm with a box that covered
 only the extended SF Bay Area. Seems like that should be fairly
 reasonable, right?

 Perhaps the slim mode is not taking the bounding box into account.
 I'll take a look.

Any news?


 Probably the right thing to do would be to get the import done once
 with a larger volume available to Postgres (EC2 does give you a
 secondary disk at /mnt that's over 100GB), then keep up with
 incrementals moving forward after the initial inconvenience.

 100GB would definitely be enough. I've seen the full slim-mode planet
 import taking around 40GB. This code to handle the diff mode import is
 all very new and does not scale up to handling the whole planet yet.


So I'm running into further frustrations here.

Slim mode over the full planet won't work without the intarray module,  
but it's not included in contrib/ for postgresql-8.3 on Debian Lenny.  
Where should I be looking for this?

-mike.


michal migurski- [EMAIL PROTECTED]
  415.558.1610






Re: [OSM-talk] Bug in using osm2pgsql to keep up with dailies

2008-08-05 Thread Jon Burgess
2008/8/5 Michal Migurski [EMAIL PROTECTED]:
 Another way to save more disk space is to filter out the data you don't
 require. Either by applying a bounding box or by removing items from the
 default.style.


 So I'm definitely doing the bbox thing - I ran out of space on the
 volume when doing a slim import of planet.osm with a box that covered
 only the extended SF Bay Area. Seems like that should be fairly
 reasonable, right?

Perhaps the slim mode is not taking the bounding box into account.
I'll take a look.

 Probably the right thing to do would be to get the import done once
 with a larger volume available to Postgres (EC2 does give you a
 secondary disk at /mnt that's over 100GB), then keep up with
 incrementals moving forward after the initial inconvenience.

100GB would definitely be enough. I've seen the full slim-mode planet
import taking around 40GB. This code to handle the diff mode import is
all very new and does not scale up to handling the whole planet yet.

-- 
 Jon



[OSM-talk] Bug in using osm2pgsql to keep up with dailies

2008-08-04 Thread Michal Migurski
 Nice, works very well.

 One hiccup I see is that if I run the executable from a directory
 other than the one where it was built, it complains that default.style
 can't be found. Otherwise works beautifully.


So, two frustrating things about osm2pgsql's --slim mode:

I first tried doing the whole planet.osm without --slim, which worked  
well. However, when I would then use the --slim option to catch up on  
dailies, I found that a number of tables (prefix_nodes, prefix_ways,  
etc.) hadn't been created. It was not possible to do the dailies
unless they had been planned for from the start.

The second thing is that upon going back to the original planet.osm  
with the much-slower --slim mode turned on, it required so much disk  
space that it maxed out an EC2 standard disk image.

It would be nice if it were possible to do the initial planet.osm  
import without --slim for speed and space, and still import subsequent  
diffs.

-mike.


michal migurski- [EMAIL PROTECTED]
  415.558.1610






Re: [OSM-talk] Bug in using osm2pgsql to keep up with dailies

2008-08-04 Thread Jon Burgess
On Mon, 2008-08-04 at 14:51 -0700, Michal Migurski wrote:
  Nice, works very well.
 
  One hiccup I see is that if I run the executable from a directory
  other than the one where it was built, it complains that default.style
  can't be found. Otherwise works beautifully.
 
 
 So, two frustrating things about osm2pgsql's --slim mode:
 
 I first tried doing the whole planet.osm without --slim, which worked  
 well. However, when I would then use the --slim option to catch up on  
 dailies, I found that a number of tables (prefix_nodes, prefix_ways,  
 etc.) hadn't been created. It was not possible to do the dailies
 unless they had been planned for from the start.
 
 The second thing is that upon going back to the original planet.osm  
 with the much-slower --slim mode turned on, it required so much disk  
 space that it maxed out an EC2 standard disk image.
 
 It would be nice if it were possible to do the initial planet.osm  
 import without --slim for speed and space, and still import subsequent  
 diffs.

I'm afraid that is not possible. The conversion from OSM to postgres is
lossy. It converts all the node references on the ways into linestring
geometries referencing the individual lat/lon of the nodes without any
reference to the IDs. This makes it impossible to update this data
without storing a copy of all the raw nodes and ways in the extra tables
generated by the slim-mode import.
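
The point about lossiness can be shown with a small sketch. The data and function names below are invented for illustration, and plain dictionaries stand in for the database tables; it is a simplified model of the idea, not the real schema.

```python
# Raw OSM data: nodes by ID, and a way that refers to nodes by ID.
nodes = {1: (-122.40, 37.77), 2: (-122.41, 37.78), 3: (-122.42, 37.79)}
ways = {100: [1, 2, 3]}


def render_linestrings(nodes, ways):
    """Rendering output: coordinates only, the node IDs are discarded."""
    return {wid: [nodes[ref] for ref in refs] for wid, refs in ways.items()}


rendered = render_linestrings(nodes, ways)


def apply_node_move(node_id, new_pos, nodes, ways, rendered):
    """Apply a diff that moves one node.

    Finding the affected linestrings needs the raw nodes and ways, i.e. the
    kind of data the extra slim-mode tables keep around.
    """
    nodes[node_id] = new_pos
    for wid, refs in ways.items():
        if node_id in refs:
            rendered[wid] = [nodes[ref] for ref in refs]


apply_node_move(2, (-122.50, 37.80), nodes, ways, rendered)
print(rendered[100])
```

Given only `rendered`, a diff saying "node 2 moved" has nothing to match against, because the linestring holds coordinates and no IDs. That is why the update step takes `nodes` and `ways` as inputs.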

An alternative way to do this is to use osmosis to update the planet
file with the daily diff and then reload this into postgres.
Unfortunately this may take too long to be a practical solution.

Another way to save more disk space is to filter out the data you don't
require. Either by applying a bounding box or by removing items from the
default.style.

Jon





Re: [OSM-talk] Bug in using osm2pgsql to keep up with dailies

2008-08-04 Thread Michal Migurski
 It would be nice if it were possible to do the initial planet.osm
 import without --slim for speed and space, and still import subsequent
 diffs.

 I'm afraid that is not possible. The conversion from OSM to postgres is
 lossy. It converts all the node references on the ways into linestring
 geometries referencing the individual lat/lon of the nodes without any
 reference to the IDs. This makes it impossible to update this data
 without storing a copy of all the raw nodes and ways in the extra tables
 generated by the slim-mode import.

Gotcha.


 An alternative way to do this is to use osmosis to update the planet
 file with the daily diff and then reload this into postgres.
 Unfortunately this may take too long to be a practical solution.

 Another way to save more disk space is to filter out the data you don't
 require. Either by applying a bounding box or by removing items from the
 default.style.


So I'm definitely doing the bbox thing - I ran out of space on the  
volume when doing a slim import of planet.osm with a box that covered  
only the extended SF Bay Area. Seems like that should be fairly  
reasonable, right?

Probably the right thing to do would be to get the import done once  
with a larger volume available to Postgres (EC2 does give you a  
secondary disk at /mnt that's over 100GB), then keep up with  
incrementals moving forward after the initial inconvenience.

-mike.


michal migurski- [EMAIL PROTECTED]
  415.558.1610



