Re: [osmosis-dev] Reading OSM History dumps

2010-08-25 Thread Brett Henderson
On Wed, Aug 25, 2010 at 11:14 PM, Peter Körner osm-li...@mazdermind.de wrote:

 Brett, the pgsql tasks currently write (in COPY mode) all data to temp
 files first. The process seems to be

 PlanetFile -> NodeStoreTempFile -> CopyFormatTempFile -> PgsqlCopyImport

 in osm2pgsql the copy data is pushed to pgsql via unix pipes (5 or 6 COPY
 transactions running at the same time in different connections). This
 approach skips the CopyFormatTempFile stage. Is there any special reason
 this approach isn't used in the pgsnapshot package?


Not too sure now :-)  I think it was the simplest way to share code between
both the --write-pgsql-dump task and what was then the --fast-write-pgsql
(now simply --write-pgsql) task.

In practice the COPY file creation and loading is fairly fast.  The biggest
downside is the extra disk space.  The slowest parts of the whole process
are way geometry creation, index building, and the CLUSTER statements
(in the newest schema).  On relatively low-end hardware it takes many days
to import an entire planet, only a small part of which is the COPY
processing.

In most cases I create the COPY files using --write-pgsql-dump and load them
via the provided load script so that I can better monitor progress and
resume if processing is interrupted.

In short it just hasn't been a high priority to change it.

While I'm on the topic, I've mostly completed the changes to the schema
now.  Performance is drastically improved over the old version for bounding
box query processing.  The --read-pgsql --dataset-bounding-box task
combination would previously take approximately an hour to retrieve a 1x1
degree box in a populated area; now it is down to around 5 minutes due to
far better disk layout.  The biggest downside is that the table clustering
takes a long time during initial database creation.


Re: [osmosis-dev] Reading OSM History dumps

2010-08-25 Thread Peter Körner

On 25.08.2010 15:16, Marco Lechner - FOSSGIS e.V. wrote:

Hi Peter,

I'm very interested in your history-extension and I'm going to test it as
soon as a first snapshot is available. Will it be possible to eat a
--bounding-polygon stream from osmosis? Or will it just import the whole
history planet?


You will be able to add a bbox or bounding-polygon task before pushing 
things into the database.


But without special tasks to handle bounding boxes in regard to 
history dumps, you will get problems with nodes moving in and out of 
your bounding box.


For the time being, the plugin will also not be able to handle change 
streams, so it will not be possible to keep the database updated.


This is still work in progress at its earliest stage, so please don't 
expect it to solve any real problems yet.


Peter



Re: [osmosis-dev] Reading OSM History dumps

2010-08-25 Thread Peter Körner

On 25.08.2010 15:26, Brett Henderson wrote:

In short it just hasn't been a high priority to change it.

I was planning to share at the FileInputStream/FileOutputStream level. You 
can feed a FileInputStream into the CopyManager as well as into a file, 
can't you?


Maybe you can copy the relevant bits over to pgsnapshot later.

Peter



Re: [osmosis-dev] Reading OSM History dumps

2010-08-25 Thread Peter Körner

Hi Marco

The first snapshot is out. Unfortunately the hstore migration Brett is 
still in the middle of makes the pgsnapshot tests fail, which is why 
Hudson is not providing nightly builds anymore.


Because of that you'll need to compile osmosis yourself. I attached 
instructions to this mail that also include the concrete plugin usage.


The following tasks are available: --write-pgsql-history and 
--write-pgsql-history-dump. They correspond closely to --write-pgsql and 
--write-pgsql-dump.


All features that are marked as experimental may or may not work, and of 
course they will be painfully memory intensive on larger datasets 
because of the lack of a good store implementation.


Peter

On 25.08.2010 15:16, Marco Lechner - FOSSGIS e.V. wrote:

Hi Peter,

I'm very interested in your history-extension and I'm going to test it as
soon as a first snapshot is available. Will it be possible to eat a
--bounding-polygon stream from osmosis? Or will it just import the whole
history planet?

Marco

On 25.08.2010 15:14, Peter Körner wrote:

Hi all

After a little playing around I now have an idea of how I'm going to
implement everything. I'll keep as close as possible to the regular
simple schema and to the way the pgsql tasks work.

Just as with the optional linestring/bbox builder, the history import
tasks will serve more than one schema. I'm leaving relations out, again.

the regular simple schema
-  it's the basis of everything but not capable of holding history data

+ history columns
-  create and populate an extra column in way_nodes to store
the way version.
-  change the PKs of way_nodes to allow
more than one version of an element

+ way_nodes version builder
-  create and populate an extra column in way_nodes that holds the node
version that corresponds to the way's timestamp

+ minor version builder
-  create and populate an extra column in ways and way_nodes to store
the way's minor versions, which are generated by changes to the nodes
of the way between version changes of the way itself.

+ from-to-timestamp builder
-  create and populate an extra column in the nodes and ways table that
specifies the date until which a version of an item was the current
one. After that time, the next version of the same item was
current (or the item was deleted). The tstamp field, in contrast,
contains the starting date from which an item was current.

+ linestring / bbox builder
-  just the same as with the regular simple schema, works for all
version and minor-version rows

Until the end of the week I'll get a pre-snapshot out that can
populate the history table with version columns and changed PKs. The
database created from this can be used to test Scott's SQL-only
solution [1].

It will also contain a first implementation of the way_nodes version
builder, but only with an example implementation of the NodeStore that
performs badly on bigger files.


Brett, the pgsql tasks currently write (in COPY mode) all data to temp
files first. The process seems to be

PlanetFile -> NodeStoreTempFile -> CopyFormatTempFile -> PgsqlCopyImport

in osm2pgsql the copy data is pushed to pgsql via unix pipes (5 or 6
COPY transactions running at the same time in different connections).
This approach skips the CopyFormatTempFile stage. Is there any special
reason this approach isn't used in the pgsnapshot package?


Peter


[1]
http://lists.openstreetmap.org/pipermail/dev/2010-August/020308.html

# download osmosis
svn export http://svn.openstreetmap.org/applications/utils/osmosis/trunk/ osmosis-trunk

# enter the source directory
cd osmosis-trunk

# download the history plugin
svn export http://svn.toolserver.org/svnroot/mazder/osmhist/osmosis-plugin/history/

# enable the history plugin
patch -p0 < history/script/source-activation.patch

# compile
ant clean publish

# create a postgres user, if not already done
sudo -u postgres createuser osmosis

# create an empty database with hstore and postgis capabilities, if not already done
sudo -u postgres createdb -E UTF8 -O osmosis osmosis-history
sudo -u postgres createlang plpgsql osmosis-history
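
# load hstore and postgis into the database (the contrib paths below are
# only examples for a PostgreSQL 8.4 layout and may differ on your system)
sudo -u postgres psql -d osmosis-history -f /usr/share/postgresql/8.4/contrib/hstore.sql
sudo -u postgres psql -d osmosis-history -f /usr/share/postgresql/8.4/contrib/postgis.sql
sudo -u postgres psql -d osmosis-history -f /usr/share/postgresql/8.4/contrib/spatial_ref_sys.sql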

# create the simple schema database
psql -U osmosis osmosis-history < package/script/pgsql_simple_schema_0.6.sql

# add the history extension to the database
psql -U osmosis osmosis-history < history/script/pgsql_simple_schema_0.6_history.sql

# the following lines add extra features to the database
# execute them before the import
# they are experimental and very memory intensive
# use only with small data sets

# enable the node version builder
#psql -U osmosis osmosis-history < history/script/pgsql_simple_schema_0.6_history_way_nodes_version.sql

# enable 

Re: [osmosis-dev] Reading OSM History dumps

2010-08-25 Thread Brett Henderson
On Wed, Aug 25, 2010 at 11:33 PM, Peter Körner osm-li...@mazdermind.de wrote:

 On 25.08.2010 15:26, Brett Henderson wrote:

  In short it just hasn't been a high priority to change it.

 I was planning to share at the FileInputStream/FileOutputStream level. You can
 feed a FileInputStream into the CopyManager as well as into a file, can't
 you?


Sorry, I'm not sure what you mean.  I think the only way to feed data into
the CopyManager is via an InputStream.  That InputStream can be a
FileInputStream or a piped input stream or whatever you wish.  But there are
also classes like PGCopyOutputStream so perhaps you can use those directly
to avoid using multiple threads.  It's been a while since I looked at it.
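
For what it's worth, the two approaches look roughly like this with the
PostgreSQL JDBC driver (an untested sketch; the connection details, table
name and COPY statement are only examples):

import java.io.FileInputStream;
import java.io.InputStream;
import java.io.OutputStream;
import java.sql.Connection;
import java.sql.DriverManager;

import org.postgresql.PGConnection;
import org.postgresql.copy.CopyManager;
import org.postgresql.copy.PGCopyOutputStream;

public class CopySketch {
    public static void main(String[] args) throws Exception {
        Connection conn = DriverManager.getConnection(
                "jdbc:postgresql://localhost/osmosis", "osmosis", "password");

        // Pull model: CopyManager reads COPY data from any InputStream,
        // e.g. a FileInputStream or the read end of a pipe.
        CopyManager copyManager = ((PGConnection) conn).getCopyAPI();
        InputStream in = new FileInputStream("nodes.copy");
        copyManager.copyIn("COPY nodes FROM STDIN", in);
        in.close();

        // Push model: PGCopyOutputStream lets the producer write COPY rows
        // straight to the connection, avoiding temp files and extra threads.
        OutputStream out = new PGCopyOutputStream(
                (PGConnection) conn, "COPY nodes FROM STDIN");
        out.write("1\t...\n".getBytes("UTF-8")); // one row in COPY text format
        out.close();

        conn.close();
    }
}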



 Maybe you can copy the relevant bits over to pgsnapshot later.


Yep, sure.


Re: [osmosis-dev] Reading OSM History dumps

2010-08-25 Thread Brett Henderson
On Thu, Aug 26, 2010 at 8:19 AM, Peter Körner osm-li...@mazdermind.de wrote:

 Hi Marco

 The first snapshot is out. Unfortunately the hstore migration Brett is
 still in the middle of makes the pgsnapshot tests fail, which is why
 Hudson is not providing nightly builds anymore.


I hope to have this fixed over the next few days.  I'm working with the
server admins to get hstore support added to the database.

Because of that you'll need to compile osmosis yourself. I attached
 instructions to this mail that also include the concrete plugin usage.


If you wish to avoid compiling yourself you can also get nightly builds
from:
http://bretth.dev.openstreetmap.org/osmosis-build/

The 0.37.SNAPSHOT version in the above location is built via a cron job.  No
tests are run during the build.

Brett


Re: [osmosis-dev] Reading OSM History dumps

2010-08-24 Thread Brett Henderson
On Tue, Aug 24, 2010 at 1:09 AM, Peter Körner osm-li...@mazdermind.de wrote:

 On 23.08.2010 13:35, Brett Henderson wrote:

 To create your own store implementation you can build on the Osmosis
 persistence support.  All classes that are persistable implement the
 Storeable interface and have a constructor with StoreReader sr,
 StoreClassRegister scr arguments.

 The existing IndexedObjectStore assumes that the key is a long but
 provides a good example to start from.  The underlying IndexStore it
 uses can support any type of key as long as it has a fixed width (ie.
 always persists to the same number of bytes).

 It would need a key of 96 bits (long id + int version). I was not aware of
 any type larger than 64 bits in Java, so I'm not sure how I could build a
 store with a 96-bit index, but I think I have to take a deeper look into
 the IndexStore and company.


IndexStore just requires an IndexElement implementation that holds both the
key and the value.  You can define a key implementation class that holds as
many individual long or int values as you like, so long as it persists
through the Storeable interface to a fixed number of bytes.  You also have
to provide the IndexStore with a comparator that knows how to compare the
order of two keys.
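
To make that concrete, a composite id+version key might look something like
this (a rough sketch against the contract described above; method names from
memory, not compiled):

import java.util.Comparator;

import org.openstreetmap.osmosis.core.store.StoreClassRegister;
import org.openstreetmap.osmosis.core.store.StoreReader;
import org.openstreetmap.osmosis.core.store.StoreWriter;
import org.openstreetmap.osmosis.core.store.Storeable;

// Fixed-width composite key: 8 bytes id + 4 bytes version = 12 bytes.
public class NodeVersionKey implements Storeable {
    private long id;
    private int version;

    public NodeVersionKey(long id, int version) {
        this.id = id;
        this.version = version;
    }

    // The persistence constructor that Osmosis instantiates keys through.
    public NodeVersionKey(StoreReader sr, StoreClassRegister scr) {
        id = sr.readLong();
        version = sr.readInteger();
    }

    public void store(StoreWriter sw, StoreClassRegister scr) {
        sw.writeLong(id);
        sw.writeInteger(version);
    }

    public long getId() { return id; }
    public int getVersion() { return version; }

    // Comparator for the IndexStore: order by id, then by version.
    public static final Comparator<NodeVersionKey> COMPARATOR =
            new Comparator<NodeVersionKey>() {
                public int compare(NodeVersionKey a, NodeVersionKey b) {
                    if (a.id != b.id) {
                        return a.id < b.id ? -1 : 1;
                    }
                    return a.version - b.version;
                }
            };
}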



 The timestamp is just a 64-bit long value, so the only problem here is to
 do the comparison, but this is the easy part, I think.



  It may
 be possible to make the existing IndexedObjectStore more generic but I'd
 need to experiment with it.

 I'll try to keep all the changes local to my project. Once it's finished
 you can take classes over to core as they're needed.


  Hmm, but thinking more about your problem it may make more sense to
 stick with the IndexedObjectStore and store a list of Nodes as each
 element instead of single Nodes.  I suspect in most cases you won't know
 the exact version you're looking for when you're loading a Node

 In the first phase when selecting the versions of the nodes used to create
 a version of a way I'll have a lot of timestamp searches (find the oldest
 node that is younger than the timestamp of the way) that need the timestamp
 index.

 Later on, when the intermediate versions are calculated, I'll need a lookup
 for all versions of an id.

 A direct request for a known id/version will, as far as I see at this early
 stage, not be used too often (maybe during linestring building).


  (you'll

 only know node ids when looking at a way after all), and will only know
 a timestamp range.  When looking up a specific node/version/timestamp
 combination you would have to load all versions of a node from the
 IndexedObjectStore then linearly search for a match in the (usually
 fairly limited) list of objects.  You will possibly need to create your
 own Storeable list type to hold all versions of a particular Node
 because I don't think one exists.

 The main problem I see is that such a list won't be of fixed size. When I
 write it to the store and later on add another version, it will grow bigger
 and have to be re-allocated in the store file, freeing up space at the
 beginning. Basically malloc/realloc/free in files.


If you need the ability to write values randomly then it won't work.  But if
you have sorted input (ie. all versions of a node are together on input)
then you can write them all to the store at once.  IndexedObjectStore will
allow you to write variable length objects to the store which is already
necessary to hold entities with variable numbers of tags.
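
With sorted input the writing could then look something like this (sketch
only; NodeHistory is a hypothetical Storeable wrapper around a list of node
versions, and the IndexedObjectStore signature is from memory):

import java.util.ArrayList;
import java.util.List;

import org.openstreetmap.osmosis.core.domain.v0_6.Node;
import org.openstreetmap.osmosis.core.store.IndexedObjectStore;

public class GroupedNodeWriter {
    // Groups consecutive versions of the same node and writes each group
    // to the store in one go; assumes input is sorted by node id.
    public static void write(Iterable<Node> sortedNodes,
            IndexedObjectStore<NodeHistory> store) {
        long currentId = -1;
        List<Node> versions = new ArrayList<Node>();
        for (Node node : sortedNodes) {
            if (node.getId() != currentId && !versions.isEmpty()) {
                // All versions of the previous node seen; persist at once.
                store.add(currentId, new NodeHistory(versions));
                versions = new ArrayList<Node>();
            }
            currentId = node.getId();
            versions.add(node);
        }
        if (!versions.isEmpty()) {
            store.add(currentId, new NodeHistory(versions));
        }
    }
}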




  Just keep in mind that Osmosis stores aren't particularly fast to query
 because they're based on very simple data structures.  They tend to
 result in huge amounts of disk seeks when processing, so there may be
 libraries out there that perform better.  The main reason they were
 originally developed was to minimise external library dependencies and I
 haven't revisited that decision since Osmosis put on weight (ie. it now
 relies on many third-party jars).

 Thinking about all this I find that we're re-inventing the wheel. I'll try
 to use JavaDB as the backend store. It is entirely written in Java and thus
 cross-platform compatible, supports btree indexes on multiple fields and
 can reside both in memory and on disk. If it turns out to be fast enough,
 it may be a good alternative to a custom binary file/memory store.


I hope it works out because I've been down a similar path here.  After I
gave up on custom stores I tried Berkeley DB Java edition and performance
was horrible.  I finally bit the bullet and went the PostgreSQL path and
created the pgsql tasks.  I hope JavaDB works out though because requiring a
full database server really complicates usage.

Be very careful with btree indexes on multiple fields because they usually
only work well when you're looking up values for specific values of indexed
fields.  If you ever need to use range queries (eg. timestamp range)
involving multiple fields they tend to fall down.  I suspect you'll be just
as well off 

Re: [osmosis-dev] Reading OSM History dumps

2010-08-22 Thread Brett Henderson
Hi Peter,

This all sounds very interesting and will no doubt have many uses that I
can't anticipate.

I can't give you much assistance but will try to answer any specific
questions you have.  My wife is going to give birth sometime within the next
month which means my priorities are about to change drastically ;-)

You seem to have thought about most of the complexities of the problem
already so you know what you're dealing with.

You mentioned the problem of obtaining test data.  I'd suggest using:
http://planet.openstreetmap.org/history/

That is a full history from day one of the project up until now.  It is
already in the OSM change format that Osmosis understands.  Cutting bounding
boxes out of full history data is a difficult (but not impossible) problem
that you may have to solve in order to move forward.  In order to build way
linestrings for all way versions and for all node versions impacting the way
you will have to solve a similar problem to understanding how to cut bbox
data so you may be able to kill a couple of birds with one stone.

One thing to note is that I'm currently changing the simple schema a bit to
improve performance.  I've moved the tags into hstore columns, and have
duplicated the way_node table info into a nodes array column on the way
table.  This improves bounding box style query performance by several times
on large datasets.  I don't think it will impact you too much.

Good luck!

Cheers,
Brett

On Sun, Aug 22, 2010 at 12:18 AM, Peter Körner osm-li...@mazdermind.de wrote:

 Hi

 During the last week I thought intensively about the new full history
 dump and how we could use it. I wrote some kind of paper and also some demo
 code to check how we could get osm history information into a postgis
 database with linestrings and all these delicate features osmosis offers.

 I've put it on the wiki at
 
 http://wiki.openstreetmap.org/wiki/User:MaZderMind/Reading_OSM_History_dumps
 

 I'd love to hear some comments about it.


 Peter




Re: [osmosis-dev] Reading OSM History dumps

2010-08-22 Thread Peter Körner



On 22.08.2010 08:26, Brett Henderson wrote:

Hi Peter,

This all sounds very interesting and will no doubt have many uses that I
can't anticipate.

I can't give you much assistance but will try to answer any specific
questions you have.  My wife is going to give birth sometime within the
next month which means my priorities are about to change drastically ;-)

Oh, congratulations on this!


You seem to have thought about most of the complexities of the problem
already so you know what you're dealing with.

I think that all is solvable using just enough logic :) I did the demo 
implementation in PHP to see if this is possible, and I think I know the 
OSM data structure well enough to know what it means.


But I don't know Osmosis and Java well enough to know how to implement 
the simple multi-level arrays from PHP in a way that will work with 
those really big files.


What I need is a store that can (see the interface sketch below):
 - store all versions of a Node*
 - access a specific version of a node
 - access all versions of a node
 - access the oldest version of a node that has been created before Date X

*not only the Node's location but also the Meta-Info (Timestamp, User, 
UserID) because you would want to have this as the Meta-Info on the 
generated intermediate Way-Versions.
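
Spelled out as an interface, something like this (the names are made up for
illustration; nothing of this exists in Osmosis yet):

import java.util.Date;
import java.util.List;

import org.openstreetmap.osmosis.core.domain.v0_6.Node;

// Hypothetical store interface covering the four lookups listed above.
public interface HistoryNodeStore {

    // Stores one version of a node, including its meta info.
    void add(Node node);

    // Returns one specific version of a node.
    Node get(long nodeId, int version);

    // Returns all versions of a node, oldest first.
    List<Node> getAllVersions(long nodeId);

    // Returns the version of the node that was current at the given date.
    Node getVersionAt(long nodeId, Date timestamp);
}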


I looked into the three implementations of NodeLocationStore (especially 
the InMemoryNodeLocationStore) and I was thinking about how I could extend 
the really simple fixed-size memory store to be able to store a complete 
Node and index by Id and Version at the same time.


Because there is no fixed number of versions per Node I can't go with a 
simple offset=NodeID*NodeSize calculation; I have to write the nodes 
one after another just as they come in and save the offsets in a list. 
But I'm not sure how to build a list in Java that allows random access 
both to the offsets of all versions of a node and to the offset of a 
specific version.
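
The simplest in-memory shape I can think of for such an offset index is
something like this (sketch; assumes the versions of a node arrive in order
and start at 1):

import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Maps node id -> file offsets of its versions. If versions are appended
// in order starting at version 1, version v lives at list index v - 1.
public class NodeOffsetIndex {
    private final Map<Long, List<Long>> offsets =
            new HashMap<Long, List<Long>>();

    public void append(long nodeId, long offset) {
        List<Long> list = offsets.get(nodeId);
        if (list == null) {
            list = new ArrayList<Long>();
            offsets.put(nodeId, list);
        }
        list.add(offset);
    }

    // Offset of one specific version.
    public long getOffset(long nodeId, int version) {
        return offsets.get(nodeId).get(version - 1);
    }

    // Offsets of all versions of a node.
    public List<Long> getAllOffsets(long nodeId) {
        return offsets.get(nodeId);
    }
}

Of course keeping all of this on the heap won't scale to a full planet,
which is exactly the memory problem mentioned above.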


I also found the IndexedObjectStore class in 
org.openstreetmap.osmosis.core.store and I thought about extending it to 
track three Indexes (NodeID, Version and Timestamp). Do you know if this 
would be workable?



You mentioned the problem of obtaining test data.  I'd suggest using:
http://planet.openstreetmap.org/history/

They are in .osc format, but I need a task to convert from .osc to 
history-.osm and back, too.



That is a full history from day one of the project up until now.  It is
already in the OSM change format that Osmosis understands.  Cutting
bounding boxes out of full history data is a difficult (but not
impossible)

In regard to the node-moved-in/-out problem, yes. At the moment I'm 
working with self-contained history files that contain all referenced 
items from version 1 on. When I start to convert .osc files into 
history-.osm files I will have to deal with objects with incomplete 
histories (when a node has been moved I only know its new position). 
There is a need to feed in a second data source like an already existing 
database.


problem that you may have to solve in order to move
forward.  In order to build way linestrings for all way versions and for
all node versions impacting the way you will have to solve a similar
problem to understanding how to cut bbox data so you may be able to kill
a couple of birds with one stone.

I'm not really sure if this will work as all I'm focusing on now is to 
get a complete dump analyzed, but we may get closer to this goal.



One thing to note is that I'm currently changing the simple schema a bit
to improve performance.

Yes, I tracked that and I like the step towards hstore, as I already used 
it a lot with osm2pgsql.


Peter



[osmosis-dev] Reading OSM History dumps

2010-08-21 Thread Peter Körner

Hi

During the last week I thought intensively about the new full history 
dump and how we could use it. I wrote some kind of paper and also some 
demo code to check how we could get osm history information into a 
postgis database with linestrings and all these delicate features 
osmosis offers.


I've put it on the wiki at
http://wiki.openstreetmap.org/wiki/User:MaZderMind/Reading_OSM_History_dumps

I'd love to hear some comments about it.

Peter
