Re: [osmosis-dev] Reading OSM History dumps

2010-08-25 Thread Brett Henderson
On Wed, Aug 25, 2010 at 11:14 PM, Peter Körner osm-li...@mazdermind.de wrote:

 Brett, the pgsql tasks currently write (in COPY mode) all data to temp
 files first. The process seems to be

 PlanetFile -> NodeStoreTempFile -> CopyFormatTempFile -> PgsqlCopyImport

 In osm2pgsql the copy data is pushed to pgsql via unix pipes (5 or 6 COPY
 transactions running at the same time in different connections). This
 approach skips the CopyFormatTempFile stage. Is there any special reason
 this approach isn't used in the pgsnapshot package?


Not too sure now :-)  I think it was the simplest way to share code between
both the --write-pgsql-dump task and what was then the --fast-write-pgsql
(now simply --write-pgsql) task.

In practice the COPY file creation and loading is fairly fast.  The biggest
downside is the extra disk space.  The slowest parts of the whole process
are the way geometry creation, index building, and the CLUSTER statements
(in the newest schema).  On relatively low-end hardware it takes many days
to import an entire planet, only a small part of which is the COPY
processing.

In most cases I create the COPY files using --write-pgsql-dump and load them
via the provided load script so that I can better monitor progress and
resume if processing is interrupted.
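
For readers unfamiliar with the COPY files mentioned above, the following is a minimal sketch of how one row can be written in PostgreSQL's COPY text format (tab-separated columns, \N for NULL, backslash escapes). The class and method names are hypothetical and this is not the Osmosis implementation; it only illustrates the file format.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

/** Hypothetical helper, not part of Osmosis: formats rows in PostgreSQL COPY text format. */
final class CopyRowFormatter {

    /** Escapes one column value; null becomes the COPY null marker \N. */
    static String escape(String value) {
        if (value == null) {
            return "\\N";
        }
        StringBuilder sb = new StringBuilder(value.length());
        for (char c : value.toCharArray()) {
            switch (c) {
                case '\\': sb.append("\\\\"); break; // backslash
                case '\t': sb.append("\\t"); break;  // column delimiter
                case '\n': sb.append("\\n"); break;  // row delimiter
                case '\r': sb.append("\\r"); break;
                default: sb.append(c);
            }
        }
        return sb.toString();
    }

    /** Joins the escaped columns with tabs and terminates the row with a newline. */
    static String formatRow(List<String> columns) {
        return columns.stream()
                .map(CopyRowFormatter::escape)
                .collect(Collectors.joining("\t")) + "\n";
    }

    public static void main(String[] args) {
        // Made-up row: an id, a value containing a tab, and a NULL column.
        System.out.print(formatRow(Arrays.asList("1", "a\tb", null)));
    }
}
```

Because each row is self-contained text, a file of such rows can be loaded later and re-loaded after an interruption, which fits the monitoring and resume workflow described above.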

In short it just hasn't been a high priority to change it.

While I'm on the topic, I've mostly completed the changes to the schema
now.  Performance is drastically improved over the old version for bounding
box query processing.  The --read-pgsql --dataset-bounding-box task
combination previously took approximately an hour to retrieve a 1x1
degree box in a populated area; it is now down to around 5 minutes due to a
far better disk layout.  The biggest downside is that the table clustering
takes a long time during initial database creation.
___
osmosis-dev mailing list
osmosis-dev@openstreetmap.org
http://lists.openstreetmap.org/listinfo/osmosis-dev


Re: [osmosis-dev] Reading OSM History dumps

2010-08-25 Thread Peter Körner

On 25.08.2010 15:16, Marco Lechner - FOSSGIS e.V. wrote:

Hi Peter,

I'm very interested in your history extension and I'm going to test it as
soon as a first snapshot is available. Will it be possible to consume a
--bound-polygon stream from osmosis? Or will it just import the whole
history planet?


You will be able to add a bbox or bound-polygon task before pushing
things into the database.

But without special tasks that handle bounding boxes for history dumps,
you will run into problems with nodes moving into and out of your
bounding box.
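
To make the problem concrete, here is a small sketch with made-up data. It assumes a filter that tests every node version against the box on its own; the class, the coordinates and the box are hypothetical and not taken from osmosis.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical illustration, not Osmosis code: a per-version bounding-box filter on node history. */
final class BboxHistoryDemo {

    static final class NodeVersion {
        final long id;
        final int version;
        final double lat;
        final double lon;

        NodeVersion(long id, int version, double lat, double lon) {
            this.id = id;
            this.version = version;
            this.lat = lat;
            this.lon = lon;
        }
    }

    /** Returns the version numbers whose coordinates fall inside the box. */
    static List<Integer> versionsInside(List<NodeVersion> history,
            double minLat, double minLon, double maxLat, double maxLon) {
        List<Integer> kept = new ArrayList<>();
        for (NodeVersion v : history) {
            if (v.lat >= minLat && v.lat <= maxLat && v.lon >= minLon && v.lon <= maxLon) {
                kept.add(v.version);
            }
        }
        return kept;
    }

    /** Made-up history: the node starts inside the box, moves out, then moves back in. */
    static List<NodeVersion> sampleHistory() {
        List<NodeVersion> history = new ArrayList<>();
        history.add(new NodeVersion(1L, 1, 49.5, 8.5)); // inside
        history.add(new NodeVersion(1L, 2, 51.0, 8.5)); // outside
        history.add(new NodeVersion(1L, 3, 49.6, 8.4)); // inside again
        return history;
    }

    public static void main(String[] args) {
        // Version 2 is dropped, so the extract can no longer tell that the
        // node was outside the box between versions 1 and 3.
        System.out.println(versionsInside(sampleHistory(), 49.0, 8.0, 50.0, 9.0));
    }
}
```

With such a filter the database would show version 1 as current until version 3 appears, although the node had left the area in between.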


For the time being, the plugin will also not be able to handle change
streams, so it will not be possible to keep the database updated.


This is still work in progress in its earliest stage, so please don't
expect it to solve any real problems.


Peter



Re: [osmosis-dev] Reading OSM History dumps

2010-08-25 Thread Peter Körner

On 25.08.2010 15:26, Brett Henderson wrote:

 In short it just hasn't been a high priority to change it.

I was planning to share code at the FileInputStream/FileOutputStream level.
You can feed a FileInputStream into the CopyManager as well as into a file,
can't you?


Maybe we can copy the relevant bits to pgsnapshot later.

Peter



Re: [osmosis-dev] Reading OSM History dumps

2010-08-25 Thread Peter Körner

Hi Marco,

The first snapshot is out. Unfortunately, the hstore migration that Brett
is still in the middle of causes the pgsnapshot tests to fail, which is why
Hudson is no longer providing nightly builds.


Because of that you'll need to compile osmosis yourself. I attached
instructions to this mail that also include the concrete plugin usage.


The following tasks are available: --write-pgsql-history and
--write-pgsql-history-dump. They correspond closely to --write-pgsql and
--write-pgsql-dump.


All features that are marked as experimental may or may not work, and
they will of course be painfully memory intensive on larger datasets
because a good store implementation is still missing.


Peter

On 25.08.2010 15:16, Marco Lechner - FOSSGIS e.V. wrote:

Hi Peter,

I'm very interested in your history extension and I'm going to test it as
soon as a first snapshot is available. Will it be possible to consume a
--bound-polygon stream from osmosis? Or will it just import the whole
history planet?

Marco

On 25.08.2010 15:14, Peter Körner wrote:

Hi all

After a little playing around I now have an idea of how I'm going to
implement everything. I'll stay as close as possible to the regular
simple schema and to the way the pgsql tasks work.

Just as with the optional linestring/bbox builder, the history import
tasks will serve more than one schema. I'm leaving relations out, again.

the regular simple schema
-> it's the basis of all the others, but not capable of holding history data

+ history columns
-> create and populate an extra column in way_nodes to store
   the way version
-> change the PKs of way_nodes to allow
   more than one version of an element

+ way_nodes version builder
-> create and populate an extra column in way_nodes that holds the node
   version that corresponds to the way's timestamp

+ minor version builder
-> create and populate an extra column in ways and way_nodes to store
   the way's minor versions, which are generated by changes to the nodes
   of the way between version changes of the way itself

+ from-to-timestamp builder
-> create and populate an extra column in the nodes and ways tables that
   specifies the date until which a version of an item was the current
   one. After that time, the next version of the same item was
   current (or the item was deleted). The tstamp field, in contrast,
   contains the starting date from which an item was current.

+ linestring / bbox builder
-> just the same as with the regular simple schema, works for all
   version and minor-version rows
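
Two of the builders listed above can be sketched in a few lines. This is only an illustration under assumptions, not the plugin's code: all names are hypothetical, timestamps are plain numbers, and "the node version that corresponds to the way's timestamp" is read here as the newest node version that is not newer than the way version.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical sketch of two of the builders; names are not taken from the plugin. */
final class HistoryBuilders {

    /**
     * From-to-timestamp builder: for version timestamps sorted ascending,
     * returns the timestamp at which each version stopped being current.
     * The last version has no end yet, so its entry is null.
     */
    static List<Long> validUntil(List<Long> sortedTimestamps) {
        List<Long> result = new ArrayList<>();
        for (int i = 0; i < sortedTimestamps.size(); i++) {
            boolean last = i == sortedTimestamps.size() - 1;
            result.add(last ? null : sortedTimestamps.get(i + 1));
        }
        return result;
    }

    /**
     * way_nodes version builder (assumed interpretation): picks the newest
     * node version whose timestamp is not after the way's timestamp, or null
     * if the node did not exist yet. Keys are node timestamps, values are
     * node version numbers.
     */
    static Integer nodeVersionAt(TreeMap<Long, Integer> nodeVersionsByTimestamp, long wayTimestamp) {
        Map.Entry<Long, Integer> entry = nodeVersionsByTimestamp.floorEntry(wayTimestamp);
        return entry == null ? null : entry.getValue();
    }

    public static void main(String[] args) {
        // Made-up timestamps for three versions of one item.
        System.out.println(validUntil(Arrays.asList(100L, 250L, 400L)));

        // Made-up node history: version 1 at t=100, version 2 at t=300.
        TreeMap<Long, Integer> node = new TreeMap<>();
        node.put(100L, 1);
        node.put(300L, 2);
        System.out.println(nodeVersionAt(node, 250L));
    }
}
```

The lookup needs the complete version history of every referenced node in memory or in a store, which is consistent with the remark below that a simple NodeStore performs badly on larger files.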

By the end of the week I'll get a pre-snapshot out that can
populate the history table with version columns and changed PKs. The
database created from this can be used to test Scott's SQL-only
solution [1].

It will also contain a first implementation of the way_nodes version
builder, but only with an example implementation of the NodeStore that
performs badly on bigger files.


Brett, the pgsql tasks currently write (in COPY mode) all data to temp
files first. The process seems to be

PlanetFile -> NodeStoreTempFile -> CopyFormatTempFile -> PgsqlCopyImport

In osm2pgsql the copy data is pushed to pgsql via unix pipes (5 or 6
COPY transactions running at the same time in different connections).
This approach skips the CopyFormatTempFile stage. Is there any special
reason this approach isn't used in the pgsnapshot package?


Peter


[1]
http://lists.openstreetmap.org/pipermail/dev/2010-August/020308.html

# download osmosis
svn export http://svn.openstreetmap.org/applications/utils/osmosis/trunk/ osmosis-trunk

# enter the source directory
cd osmosis-trunk

# download the history plugin
svn export http://svn.toolserver.org/svnroot/mazder/osmhist/osmosis-plugin/history/

# enable the history plugin
patch -p0 < history/script/source-activation.patch

# compile
ant clean publish

# create a postgis user, if not already done
sudo -u postgres createuser osmosis

# create an empty database with hstore and postgis capabilities, if not already done
sudo -u postgres createdb -E UTF8 -O osmosis osmosis-history
sudo -u postgres createlang plpgsql osmosis-history

# create the simple schema database
psql -U osmosis osmosis-history < package/script/pgsql_simple_schema_0.6.sql

# add the history extension to the database
psql -U osmosis osmosis-history < history/script/pgsql_simple_schema_0.6_history.sql

# the following lines add extra features to the database
# execute them before the import
# they are experimental and very memory intensive
# use only with small data sets

# enable the node version builder
#psql -U osmosis osmosis-history < history/script/pgsql_simple_schema_0.6_history_way_nodes_version.sql

# enable 

Re: [osmosis-dev] Reading OSM History dumps

2010-08-25 Thread Brett Henderson
On Wed, Aug 25, 2010 at 11:33 PM, Peter Körner osm-li...@mazdermind.de wrote:

 On 25.08.2010 15:26, Brett Henderson wrote:

  In short it just hasn't been a high priority to change it.

 I was planning to share code at the FileInputStream/FileOutputStream level.
 You can feed a FileInputStream into the CopyManager as well as into a file,
 can't you?


Sorry, I'm not sure what you mean.  I think the only way to feed data into
the CopyManager is via an InputStream.  That InputStream can be a
FileInputStream or a piped input stream or whatever you wish.  But there are
also classes like PGCopyOutputStream so perhaps you can use those directly
to avoid using multiple threads.  It's been a while since I looked at it.
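
The piped-stream variant described above could look roughly like the following sketch. To stay runnable without a database, it uses a hypothetical CopySink interface as a stand-in for the consumer; with the PostgreSQL JDBC driver that role would be played by CopyManager.copyIn(String sql, InputStream from). All other names are made up for the example.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.PipedInputStream;
import java.io.PipedOutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

/** Hypothetical sketch: streaming COPY rows to an InputStream consumer without a temp file. */
final class PipedCopyDemo {

    /** Stand-in for a consumer such as the JDBC driver's CopyManager; an assumption of this sketch. */
    interface CopySink {
        long copyIn(InputStream from) throws IOException;
    }

    /** Writes the rows on a producer thread while the sink reads them from the pipe. */
    static long stream(String[] rows, CopySink sink) {
        try {
            PipedInputStream in = new PipedInputStream();
            PipedOutputStream out = new PipedOutputStream(in);
            Thread producer = new Thread(() -> {
                // Closing the output stream signals end-of-data to the reader.
                try (OutputStream o = out) {
                    for (String row : rows) {
                        o.write(row.getBytes(StandardCharsets.UTF_8));
                    }
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
            });
            producer.start();
            long copied;
            try (InputStream i = in) {
                copied = sink.copyIn(i);
            }
            producer.join();
            return copied;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    /** Example sink that simply counts newline-terminated rows. */
    static long countRows(InputStream from) throws IOException {
        long rows = 0;
        int b;
        while ((b = from.read()) != -1) {
            if (b == '\n') {
                rows++;
            }
        }
        return rows;
    }

    public static void main(String[] args) {
        String[] rows = {"1\tfoo\n", "2\tbar\n"};
        System.out.println(stream(rows, PipedCopyDemo::countRows));
    }
}
```

The second thread is needed because a pipe has a bounded buffer: writer and reader must run concurrently or the writer blocks once the buffer is full. Writing directly to an output stream such as PGCopyOutputStream would avoid that extra thread.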



 Maybe we can copy the relevant bits to pgsnapshot later.


Yep, sure.


Re: [osmosis-dev] Reading OSM History dumps

2010-08-25 Thread Brett Henderson
On Thu, Aug 26, 2010 at 8:19 AM, Peter Körner osm-li...@mazdermind.de wrote:

 Hi Marco

 The first snapshot is out. Unfortunately, the hstore migration that Brett
 is still in the middle of causes the pgsnapshot tests to fail, which is why
 Hudson is no longer providing nightly builds.


I hope to have this fixed over the next few days.  I'm working with the
server admins to get hstore support added to the database.

 Because of that you'll need to compile osmosis yourself. I attached
 instructions to this mail that also include the concrete plugin usage.


If you wish to avoid compiling yourself you can also get nightly builds
from:
http://bretth.dev.openstreetmap.org/osmosis-build/

The 0.37.SNAPSHOT version in the above location is built via a cron job.  No
tests are run during the build.

Brett