Re: [osmosis-dev] Reading OSM History dumps
On Wed, Aug 25, 2010 at 11:14 PM, Peter Körner <osm-li...@mazdermind.de> wrote:

> Brett, the pgsql tasks currently write (in COPY mode) all data to temp
> files first. The process seems to be PlanetFile -> NodeStoreTempFile ->
> CopyFormatTempFile -> PgsqlCopyImport. In osm2pgsql the copy data is
> pushed to pgsql via unix pipes (5 or 6 COPY transactions running at the
> same time in different connections). This approach skips the
> CopyFormatTempFile stage. Is there any special reason this approach
> isn't used in the pgsnapshot package?

Not too sure now :-) I think it was the simplest way to share code between the --write-pgsql-dump task and what was then the --fast-write-pgsql (now simply --write-pgsql) task. In practice, COPY file creation and loading is fairly fast; the biggest downside is the extra disk space. The slowest parts of the whole process are way geometry creation, index building, and the CLUSTER statements (in the newest schema). On relatively low-end hardware it takes many days to import an entire planet, and only a small part of that is the COPY processing.

In most cases I create the COPY files using --write-pgsql-dump and load them via the provided load script, so that I can better monitor progress and resume if processing is interrupted. In short, it just hasn't been a high priority to change it.

While I'm on the topic: I've now mostly completed the changes to the schema. Performance is drastically improved over the old version for bounding-box query processing. The --read-pgsql --dataset-bounding-box task combination would previously take approximately an hour to retrieve a 1x1 degree box in a populated area; it is now down to around 5 minutes due to a far better on-disk layout. The biggest downside is that the table clustering takes a long time during initial database creation.

___
osmosis-dev mailing list
osmosis-dev@openstreetmap.org
http://lists.openstreetmap.org/listinfo/osmosis-dev
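[Editorial note: for readers unfamiliar with the CopyFormatTempFile stage discussed above, a minimal sketch of the idea follows. The column layout (id, lat, lon, user_name) is purely illustrative and not the actual pgsnapshot schema; the point is the file format itself: tab-separated values with \N for NULL, which is exactly what PostgreSQL's COPY text mode expects, so the file can later be streamed into the server in one pass.]

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Sketch of a COPY-format temp file writer, assuming a simplified
// node table (id, lat, lon, user_name). Names are illustrative only.
public class CopyFormatSketch {
    static String copyRow(long id, double lat, double lon, String user) {
        // PostgreSQL COPY text format: tab-separated columns, \N for NULL
        return id + "\t" + lat + "\t" + lon + "\t" + (user == null ? "\\N" : user);
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("nodes", ".copy");
        try (BufferedWriter w = Files.newBufferedWriter(tmp, StandardCharsets.UTF_8)) {
            w.write(copyRow(1L, 51.5, -0.1, "alice"));
            w.newLine();
            w.write(copyRow(2L, 48.1, 11.5, null)); // anonymous edit -> NULL user
            w.newLine();
        }
        // The real pipeline would now feed this file to the server;
        // here we just read it back to show the format on stdout.
        for (String line : Files.readAllLines(tmp, StandardCharsets.UTF_8)) {
            System.out.println(line);
        }
        Files.delete(tmp);
    }
}
```

Writing the file first (as pgsnapshot does) costs disk space but makes the load restartable; piping the same bytes straight to the server (as osm2pgsql does) trades that for lower disk usage.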
Re: [osmosis-dev] Reading OSM History dumps
On 25.08.2010 15:16, Marco Lechner - FOSSGIS e.V. wrote:

> Hi Peter,
> I'm very interested in your history extension and I'm going to test it
> as soon as a first snapshot is available. Will it be possible to feed it
> a --bounding-polygon stream from osmosis, or will it just import the
> whole history planet?

You will be able to add a bbox or bounding-polygon task before pushing things into the database. But without special tasks that handle bounding boxes with regard to history dumps, you will get problems with nodes moving in and out of your bounding box. The plugin will, for the time being, also not be able to handle change streams, so it will not be possible to keep the database updated. This is still work in progress at its earliest stage, so please don't expect it to solve any real problems.

Peter
Re: [osmosis-dev] Reading OSM History dumps
On 25.08.2010 15:26, Brett Henderson wrote:

> In short it just hasn't been a high priority to change it.

I was planning to share at the FileInputStream/FileOutputStream level. You can feed a FileInputStream into the CopyManager as well as into a file, can't you? Maybe we can copy the relevant bits over to pgsnapshot later.

Peter
Re: [osmosis-dev] Reading OSM History dumps
Hi Marco

The first snapshot is out. Unfortunately, the hstore migration Brett is still in the middle of makes the pgsnapshot tests fail, which is why hudson is no longer providing nightly builds. Because of that you'll need to compile osmosis yourself. I've attached instructions to this mail that also include the concrete plugin usage.

The following tasks are available: --write-pgsql-history and --write-pgsql-history-dump. They correlate closely to --write-pgsql and --write-pgsql-dump. All features that are marked as experimental may or may not work, and of course they will be painfully memory intensive on larger datasets because of the lack of a good store implementation.

Peter

On 25.08.2010 15:16, Marco Lechner - FOSSGIS e.V. wrote:

> Hi Peter,
> I'm very interested in your history extension and I'm going to test it
> as soon as a first snapshot is available. Will it be possible to feed it
> a --bounding-polygon stream from osmosis, or will it just import the
> whole history planet?
> Marco

On 25.08.2010 15:14, Peter Körner wrote:

> Hi all
>
> After a little playing around I now have an idea of how I'm going to
> implement everything. I'll keep as close as possible to the regular
> simple schema and to the way the pgsql tasks work. Just as with the
> optional linestring/bbox builder, the history import tasks will serve
> more than one schema. I'm leaving relations out, again.
>
> - the regular simple schema: it's the basis of everything, but it is not
>   capable of holding history data
> + history columns: create and populate an extra column in way_nodes to
>   store the way version; change the PKs of way_nodes to allow more than
>   one version of an element
> + way_nodes version builder: create and populate an extra column in
>   way_nodes that holds the node version corresponding to the way's
>   timestamp
> + minor version builder: create and populate an extra column in ways and
>   way_nodes to store the way's minor versions, which are generated by
>   changes to the nodes of the way between version changes of the way
>   itself
> + from-to-timestamp builder: create and populate an extra column in the
>   nodes and ways tables that specifies the date until which a version of
>   an item was the current one. After that time, the next version of the
>   same item was current (or the item was deleted). The tstamp field, in
>   contrast, contains the starting date from which an item was current.
> + linestring/bbox builder: just the same as with the regular simple
>   schema; works for all version and minor-version rows
>
> By the end of the week I'll get a pre-snapshot out that can populate the
> history tables with version columns and changed PKs. The database
> created from this can be used to test Scott's SQL-only solution [1]. It
> will also contain a first implementation of the way_nodes version
> builder, but only with an example implementation of the NodeStore that
> performs badly on bigger files.
>
> Brett, the pgsql tasks currently write (in COPY mode) all data to temp
> files first. The process seems to be PlanetFile -> NodeStoreTempFile ->
> CopyFormatTempFile -> PgsqlCopyImport. In osm2pgsql the copy data is
> pushed to pgsql via unix pipes (5 or 6 COPY transactions running at the
> same time in different connections). This approach skips the
> CopyFormatTempFile stage. Is there any special reason this approach
> isn't used in the pgsnapshot package?
>
> Peter
>
> [1] http://lists.openstreetmap.org/pipermail/dev/2010-August/020308.html

Attached instructions:

# download osmosis
svn export http://svn.openstreetmap.org/applications/utils/osmosis/trunk/ osmosis-trunk

# enter the source directory
cd osmosis-trunk

# download the history plugin
svn export http://svn.toolserver.org/svnroot/mazder/osmhist/osmosis-plugin/history/

# enable the history plugin
patch -p0 < history/script/source-activation.patch

# compile
ant clean publish

# create a postgis user, if not already done
sudo -u postgres createuser osmosis

# create an empty database with hstore and postgis capabilities, if not already done
sudo -u postgres createdb -E UTF8 -O osmosis osmosis-history
sudo -u postgres createlang plpgsql osmosis-history

# create the simple schema database
psql -U osmosis osmosis-history < package/script/pgsql_simple_schema_0.6.sql

# add the history extension to the database
psql -U osmosis osmosis-history < history/script/pgsql_simple_schema_0.6_history.sql

# the following lines add extra features to the database;
# execute them before the import.
# they are experimental and very memory intensive:
# use only with small data sets

# enable the node version builder
#psql -U osmosis osmosis-history < history/script/pgsql_simple_schema_0.6_history_way_nodes_version.sql

# enable
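[Editorial note: the from-to-timestamp builder described in the message above can be illustrated with a short sketch. The types and field names here are hypothetical, not the plugin's actual code; the idea is simply that, with an element's versions sorted by time, version n is "current" from its own tstamp until the tstamp of version n+1.]

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Sketch of the from-to-timestamp idea for one element's version chain.
// Version/tstamp types are illustrative only.
public class FromToSketch {
    record Version(int version, String tstamp) {}

    // Returns version -> "valid until" timestamp. The newest version has
    // no successor, so it is still current (represented as null here).
    static Map<Integer, String> validTo(List<Version> versions) {
        Map<Integer, String> out = new LinkedHashMap<>();
        for (int i = 0; i < versions.size(); i++) {
            String until = (i + 1 < versions.size()) ? versions.get(i + 1).tstamp() : null;
            out.put(versions.get(i).version(), until);
        }
        return out;
    }

    public static void main(String[] args) {
        // three versions of one node, sorted by timestamp
        List<Version> node = List.of(
            new Version(1, "2008-03-01"),
            new Version(2, "2009-07-15"),
            new Version(3, "2010-08-25"));
        System.out.println(validTo(node));
    }
}
```

Storing this "valid until" column alongside the existing tstamp ("valid from") is what lets a query reconstruct the state of the data at any past instant with a simple range condition.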
Re: [osmosis-dev] Reading OSM History dumps
On Wed, Aug 25, 2010 at 11:33 PM, Peter Körner <osm-li...@mazdermind.de> wrote:

> On 25.08.2010 15:26, Brett Henderson wrote:
>> In short it just hasn't been a high priority to change it.
>
> I was planning to share at the FileInputStream/FileOutputStream level.
> You can feed a FileInputStream into the CopyManager as well as into a
> file, can't you?

Sorry, I'm not sure what you mean. I think the only way to feed data into the CopyManager is via an InputStream. That InputStream can be a FileInputStream or a piped input stream or whatever you wish. But there are also classes like PGCopyOutputStream, so perhaps you can use those directly and avoid using multiple threads. It's been a while since I looked at it.

> Maybe we can copy the relevant bits over to pgsnapshot later.

Yep, sure.
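[Editorial note: the threaded alternative Brett mentions (an InputStream fed by a producer thread) can be sketched with plain java.io pipes. In the real setup the consumer side would be handed to the JDBC driver's CopyManager to run COPY ... FROM STDIN; that wiring is an assumption here, and this stdlib-only sketch just counts the rows it receives.]

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.PipedInputStream;
import java.io.PipedOutputStream;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.StandardCharsets;

// Streaming COPY data through a pipe instead of a temp file: a producer
// thread writes COPY-format rows into a PipedOutputStream while the
// consumer reads them from the connected PipedInputStream.
public class PipedCopySketch {
    public static void main(String[] args) throws Exception {
        PipedInputStream in = new PipedInputStream();
        PipedOutputStream out = new PipedOutputStream(in);

        Thread producer = new Thread(() -> {
            try (Writer w = new OutputStreamWriter(out, StandardCharsets.UTF_8)) {
                for (long id = 1; id <= 3; id++) {
                    w.write(id + "\t51.5\t-0.1\n"); // one COPY text-format row
                }
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        });
        producer.start();

        // Stand-in for handing `in` to the driver's CopyManager; closing
        // the producer's writer ends the stream, so the loop terminates.
        int rows = 0;
        try (BufferedReader r = new BufferedReader(
                new InputStreamReader(in, StandardCharsets.UTF_8))) {
            while (r.readLine() != null) {
                rows++;
            }
        }
        producer.join();
        System.out.println("rows copied: " + rows);
    }
}
```

An output stream such as PGCopyOutputStream would invert this arrangement: the writer pushes bytes directly to the server on the calling thread, so no second thread or pipe is needed.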
Re: [osmosis-dev] Reading OSM History dumps
On Thu, Aug 26, 2010 at 8:19 AM, Peter Körner <osm-li...@mazdermind.de> wrote:

> Hi Marco
>
> The first snapshot is out. Unfortunately, the hstore migration Brett is
> still in the middle of makes the pgsnapshot tests fail, which is why
> hudson is no longer providing nightly builds.

I hope to have this fixed over the next few days. I'm working with the server admins to get hstore support added to the database.

> Because of that you'll need to compile osmosis yourself. I've attached
> instructions to this mail that also include the concrete plugin usage.

If you wish to avoid compiling yourself, you can also get nightly builds from: http://bretth.dev.openstreetmap.org/osmosis-build/

The 0.37.SNAPSHOT version in the above location is built via a cron job. No tests are run during the build.

Brett