Re: [OSM-talk] Semi-automated edits - postal code database

2012-11-06 Thread Svavar Kjarrval
Hi.

This is an update to an e-mail I sent at the beginning of October to the
talk@osm list regarding updating postal codes in Iceland semi-automatically.

I wanted to let you know I have written the script, which is for Python
3.2. I have not yet submitted data made by the script but I haven't
detected any problems thus far. I have performed some random manual
checks on the output and see nothing wrong with the XML. JOSM didn't
complain when I opened the .osc file.

The input is any valid .osm file and the output is an .osc file
(https://wiki.openstreetmap.org/wiki/Osc) which lists any changes made.
The output can be loaded into an editor and submitted to the OSM server
from there.
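
A minimal sketch of the .osm-in, .osc-out idea, not the actual script: it assumes the change is a plain addr:postcode update on nodes, and the sample data, the build_osc name and the new_postcodes mapping are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample input; ids, coordinates and codes are made up.
SAMPLE_OSM = """<osm version="0.6">
  <node id="1" version="3" lat="64.14" lon="-21.92">
    <tag k="addr:street" v="Laugavegur"/>
    <tag k="addr:postcode" v="105"/>
  </node>
  <node id="2" version="1" lat="64.15" lon="-21.94">
    <tag k="addr:postcode" v="101"/>
  </node>
</osm>"""


def build_osc(osm_xml, new_postcodes):
    """Return osmChange XML listing only the nodes whose postcode changes.

    new_postcodes maps a node id (as a string) to the wanted postal code.
    """
    root = ET.fromstring(osm_xml)
    change = ET.Element('osmChange',
                        {'version': '0.6', 'generator': 'postcode-sketch'})
    modify = ET.SubElement(change, 'modify')
    for node in list(root.iter('node')):
        wanted = new_postcodes.get(node.get('id'))
        if wanted is None:
            continue  # no data for this node
        tag = node.find("tag[@k='addr:postcode']")
        if tag is not None and tag.get('v') == wanted:
            continue  # already up to date, nothing to record
        if tag is None:
            tag = ET.SubElement(node, 'tag', {'k': 'addr:postcode'})
        tag.set('v', wanted)
        # The node keeps its id and version so an editor can match it.
        modify.append(node)
    return ET.tostring(change, encoding='unicode')


osc_text = build_osc(SAMPLE_OSM, {'1': '101', '2': '101'})
```

Only node 1 ends up in the modify block here, since node 2 already carries the wanted code. The result should still be opened in an editor and checked before anything is uploaded.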

You're free to adapt the script to suit your purpose, but I recommend
that you always check the proposed changes before uploading. The code is
commented well enough that anybody who knows Python should be able to
follow what it does.

Minimum requirements:
- Enough computer memory. The larger the .osm file, the more memory the
script needs.
- Python 3.
- A working installation of the Osmosis program
(https://wiki.openstreetmap.org/wiki/Osmosis).

- Svavar Kjarrval

On 04/10/12 23:48, Martin Guttesen wrote:
 I have imported all the addresses for the Faroe Islands
 and update them from time to time when new data is available;
 see http://wiki.openstreetmap.org/wiki/Import/Catalogue/usfo
 I keep an ID tag (us.fo:Adressutal) so I can create, update or delete
 address nodes.


 -----Original Message-----
 From: Jochen Topf
 Sent: Thursday, October 04, 2012 7:39 AM
 To: Svavar Kjarrval
 Cc: talk@openstreetmap.org
 Subject: Re: [OSM-talk] Semi-automated edits - postal code database

 Hi!

 On Wed, Oct 03, 2012 at 11:10:05AM +, Svavar Kjarrval wrote:
 I'm trying to find a good method to maintain data from outside sources.
 The data in question is the Icelandic postal code database (which they
 say we may use freely). My searches on the OSM wiki have been fruitless
 so far.

 The idea is to maintain the data in associatedStreet relations. Each
 relation has a tag called 'götuskrá:id' whose value is a direct
 reference to the row ID in the files we retrieve from the postal
 company's website. The file formats available are CSV and XML 1.0. The
 script would presumably go over each associatedStreet relation and make
 any changes (if appropriate) when a götuskrá:id tag is found. The output
 could be an OSM change file loaded into an editor like JOSM to be
 uploaded manually. Maybe an automated process later when we're confident
 that everything is done correctly, and of course after submitting the
 script(s) for review by the local community.

 It is not a good idea to add some random ID of your favourite database to
 OSM, because nobody except you can understand this ID and do useful things
 with it. It just confuses mappers and makes it more difficult to edit the
 data. For every change somebody does to the data they have to know what this
 tag means so that they can properly do their edit. And if they don't, people
 will just mess up your data and you will not be able to use this ID for
 syncing the data anyways.

 And in this case I don't even see why you need it. You have street names and
 postal codes in both OSM and the Icelandic postal code database. If something
 changes you can find out which combinations changed and apply those changes
 to OSM easily just based on the postal code and street name. There is no
 need for those IDs.

 And, btw, you should not use the associatedStreet relation. It solves the same
 problem as the addr:street tags on nodes and buildings but in a much more
 complicated way. The overwhelming majority of all addresses are tagged with
 addr:street (there are nearly 15 million addr:street tags vs. only 18.000
 associatedStreet relations).

 Jochen

#!/usr/bin/env python3.2
# -*- coding: utf-8 -*-

# Copyright 2012, Svavar Kjarrval Lúthersson
# Released under the CC0 license.
# I can be contacted at sva...@kjarrval.is.

# This program performs changes according to predetermined formulas to .osm files
# and outputs a single .osc file which in turn can either be submitted automatically
# by another program (which is not implemented here) or manually with an editor.

# To use it, you must have:
# 1 - An .osm file of the area in question.
# 2 - An Osmosis binary set up and ready to use.

# The reason the script filters instead of working directly on the original file
# is to reduce memory consumption of programs which need to load the complete .osm file into memory.
# If, despite having done proper filtering, the .osm file is still too big to fit into memory,
# please consider splitting the area further.

import os
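# cElementTree is deprecated since Python 3.3 and was removed in 3.9;
# ElementTree provides the same API and also works on Python 3.2.
import xml.etree.ElementTree as etree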

# Change the value of DEBUG to 0 when you don't want extra debug messages to appear on screen.
DEBUG = 0

# Get the current working directory
pwd = os.getcwd() + '/'

# Location of the osmosis binary.
osmosis_bin = '/home/kjarrval/bin/osmosis

Re: [OSM-talk] Semi-automated edits - postal code database

2012-10-04 Thread Christian Quest
2012/10/4 Jochen Topf joc...@remote.org:
 And, btw, you should not use the associatedStreet relation. It solves the same
 problem as the addr:street tags on nodes and buildings but in a much more
 complicated way. The overwhelming majority of all addresses are tagged with
 addr:street (there are nearly 15 million addr:street tags vs. only 18.000
 associatedStreet relations).


A direct comparison of the number of addr:street tags and associatedStreet
relations is not that simple.
How many addresses are behind the associatedStreet relations?

For example in France, we currently have:
- 27730 associatedStreet relations
- 472941 members with the house role
- 395895 of 761051 nodes (52%) and 78541 of 187193 ways (42%) with
addr:housenumber in these relations, so a total of 50% of addresses
are in associatedStreet relations.
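
The percentages above can be rechecked with a few lines of Python. This is only a sketch of the arithmetic using the counts quoted in this message; the variable and function names are made up.

```python
# Counts quoted above for France (objects carrying addr:housenumber).
nodes_in_relations, nodes_total = 395895, 761051
ways_in_relations, ways_total = 78541, 187193


def share(part, whole):
    """Return part/whole as a percentage."""
    return 100.0 * part / whole


node_share = share(nodes_in_relations, nodes_total)
way_share = share(ways_in_relations, ways_total)
overall_share = share(nodes_in_relations + ways_in_relations,
                      nodes_total + ways_total)

print('nodes: %.1f%%, ways: %.1f%%, overall: %.1f%%'
      % (node_share, way_share, overall_share))
```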

This is also due to the JOSM plugin we use to simplify creating addresses,
which automatically takes care of all the associatedStreet relation
stuff.
We also developed quality assurance analyses in our osmose tool to
make sure the addresses are coherent (unique addr:housenumber in one
relation, unique relation for one addr:street in a town, limited
distance between addr:housenumber nodes/ways and the street highway,
etc).

-- 
Christian Quest - OpenStreetMap France - http://openstreetmap.fr/u/cquest

___
talk mailing list
talk@openstreetmap.org
http://lists.openstreetmap.org/listinfo/talk


Re: [OSM-talk] Semi-automated edits - postal code database

2012-10-04 Thread Jochen Topf
On Thu, Oct 04, 2012 at 09:58:02AM +0200, Christian Quest wrote:
 2012/10/4 Jochen Topf joc...@remote.org:
  And, btw, you should not use the associatedStreet relation. It solves the same
  problem as the addr:street tags on nodes and buildings but in a much more
  complicated way. The overwhelming majority of all addresses are tagged with
  addr:street (there are nearly 15 million addr:street tags vs. only 18.000
  associatedStreet relations).
 
 
 Direct comparison of number of addr:street tags and associatedStreet
 relations is not that simple.

Okay, sorry. Worldwide we have about 16 million addr:housenumber tags and about
15 million addr:street tags. So there is no addr:street for about 1 million
housenumbers. Presumably that's because they are members of an associatedStreet
relation. (It could also be because it is easy to find the right street,
because it is the one next to the house, but let's ignore those cases.) So
it's still less than 10%.

Jochen
-- 
Jochen Topf  joc...@remote.org  http://www.remote.org/jochen/  +49-721-388298



Re: [OSM-talk] Semi-automated edits - postal code database

2012-10-04 Thread Tobias Knerr
On 04.10.2012 14:53, Ed Loach wrote:
 But how many of the 15 million are the results of imports taking the
 easy way of using addr:street? Taginfo lists combinations and we
 have 2.3 million that also have osak: tags, 0.8 million that also
 have kms: tags, then lesser combinations such as uir_adr:ADRESA_KOD,
 usar_addr:edit_date, mvdgis:cod_nombre, chicago:building_id and
 surrey:addrid and that's only got me to page 5 of 519 of the
 combinations.

It is true that probably a lot of these are imports. But this might be
true for both tagging styles, and you also have to account for the JOSM
plugins where the authors decided to automatically create relations.
They don't necessarily set tags like that, so they are harder to filter out.

 Then you have all the people who have used addr:street instead of
 the relation because it seems the more popular option, perhaps only
 because of those imports.

Then you have all the people who believe that relations are easier to
use for computers - after all, why would anyone use that confusing
concept otherwise? -, and therefore suffer through them because they
mistakenly believe that it makes their data better.
Or they think that addr:street is outdated because relations as a whole
are newer than other elements. (I've encountered both of these beliefs.)

Imo, addr:street is more straightforward to understand, makes the common
beginner task of entering or fixing an address much more accessible and
is therefore preferable over relations. The number of uses is hard to
measure, but doesn't really affect these basic arguments anyway.

To me, it's associatedStreet which seems out of place in OSM tagging,
and that's not just because it uses camelCase for some reason. ;)

Tobias



Re: [OSM-talk] Semi-automated edits - postal code database

2012-10-04 Thread Paul Norman
 From: Christian Quest [mailto:cqu...@openstreetmap.fr]
 Sent: Thursday, October 04, 2012 12:58 AM
 To: talk@openstreetmap.org
 Subject: Re: [OSM-talk] Semi-automated edits - postal code database
 
 2012/10/4 Jochen Topf joc...@remote.org:
  And, btw, you should not use the associatedStreet relation. It solves
  the same problem as the addr:street tags on nodes and buildings but in
  a much more complicated way. The overwhelming majority of all
  addresses are tagged with addr:street (there are nearly 15 million
  addr:street tags vs. only 18.000 associatedStreet relations).
 
 Direct comparison of number of addr:street tags and associatedStreet
 relations is not that simple.
 How many addresses are behind the associatedStreet relations ?

And how many associatedStreets don't have addresses at all?
http://www.openstreetmap.org/browse/relation/2523 doesn't have any members
except for streets.

A more accurate count would be how many relation members have the role house
and are also a member of an associatedStreet relation.

The answer is 1128546 objects. Broken down by object type, this is 656010
nodes, 658 relations and 471878 ways.

So there is about a 13:1 preference in the database for addr:street over
relations.
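
A quick check of the figures, as a sketch: the breakdown is taken from this message, while the round 15 million for addr:street is the approximate number quoted earlier in the thread, so the ratio is only approximate too.

```python
# Members with role house in associatedStreet relations, by object type.
house_members = {'node': 656010, 'relation': 658, 'way': 471878}
members_total = sum(house_members.values())

# "Nearly 15 million" addr:street tags, taken as a round figure.
addr_street_tags = 15000000

ratio = addr_street_tags / members_total
print('%d members, about %.1f addr:street tags per member'
      % (members_total, ratio))
```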




Re: [OSM-talk] Semi-automated edits - postal code database

2012-10-04 Thread Martin Guttesen

I have imported all the addresses for the Faroe Islands
and update them from time to time when new data is available;
see http://wiki.openstreetmap.org/wiki/Import/Catalogue/usfo
I keep an ID tag (us.fo:Adressutal) so I can create, update or delete
address nodes.



-----Original Message-----
From: Jochen Topf
Sent: Thursday, October 04, 2012 7:39 AM
To: Svavar Kjarrval
Cc: talk@openstreetmap.org
Subject: Re: [OSM-talk] Semi-automated edits - postal code database

Hi!

On Wed, Oct 03, 2012 at 11:10:05AM +, Svavar Kjarrval wrote:

I'm trying to find a good method to maintain data from outside sources.
The data in question is the Icelandic postal code database (which they
say we may use freely). My searches on the OSM wiki have been fruitless
so far.

The idea is to maintain the data in associatedStreet relations. Each
relation has a tag called 'götuskrá:id' whose value is a direct
reference to the row ID in the files we retrieve from the postal
company's website. The file formats available are CSV and XML 1.0. The
script would presumably go over each associatedStreet relation and make
any changes (if appropriate) when a götuskrá:id tag is found. The output
could be an OSM change file loaded into an editor like JOSM to be
uploaded manually. Maybe an automated process later when we're confident
that everything is done correctly, and of course after submitting the
script(s) for review by the local community.


It is not a good idea to add some random ID of your favourite database to
OSM, because nobody except you can understand this ID and do useful things
with it. It just confuses mappers and makes it more difficult to edit the
data. For every change somebody does to the data they have to know what this
tag means so that they can properly do their edit. And if they don't, people
will just mess up your data and you will not be able to use this ID for
syncing the data anyways.

And in this case I don't even see why you need it. You have street names and
postal codes in both OSM and the Icelandic postal code database. If something
changes you can find out which combinations changed and apply those changes
to OSM easily just based on the postal code and street name. There is no
need for those IDs.
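
A minimal sketch of such an ID-free comparison, assuming both sides can be reduced to (street name, postal code) pairs. The diff_pairs name and the sample pairs are made up, and matching real street names would need more care (spelling variants, the same name in several towns).

```python
def diff_pairs(previous, current):
    """Compare two collections of (street name, postal code) pairs.

    Returns (added, removed): pairs that only occur in the current data
    and pairs that only occur in the previous data.
    """
    previous, current = set(previous), set(current)
    return sorted(current - previous), sorted(previous - current)


# Hypothetical snapshots: what OSM holds now and what the postal
# database says after an update.
in_osm = [('Laugavegur', '101'), ('Laugavegur', '105'), ('Hverfisgata', '101')]
in_database = [('Laugavegur', '101'), ('Hverfisgata', '101'), ('Borgartún', '105')]

added, removed = diff_pairs(in_osm, in_database)
print('to add:', added)
print('to remove:', removed)
```

A street whose code changed shows up once in each list, so the two lists together describe every combination that needs attention.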

And, btw, you should not use the associatedStreet relation. It solves the same
problem as the addr:street tags on nodes and buildings but in a much more
complicated way. The overwhelming majority of all addresses are tagged with
addr:street (there are nearly 15 million addr:street tags vs. only 18.000
associatedStreet relations).

Jochen
--
Jochen Topf  joc...@remote.org  http://www.remote.org/jochen/  +49-721-388298




[OSM-talk] Semi-automated edits - postal code database

2012-10-03 Thread Svavar Kjarrval
Hi.

I'm trying to find a good method to maintain data from outside sources.
The data in question is the Icelandic postal code database (which they
say we may use freely). My searches on the OSM wiki have been fruitless
so far.

The idea is to maintain the data in associatedStreet relations. Each
relation has a tag called 'götuskrá:id' whose value is a direct
reference to the row ID in the files we retrieve from the postal
company's website. The file formats available are CSV and XML 1.0. The
script would presumably go over each associatedStreet relation and make
any changes (if appropriate) when a götuskrá:id tag is found. The output
could be an OSM change file loaded into an editor like JOSM to be
uploaded manually. Maybe an automated process later when we're confident
that everything is done correctly, and of course after submitting the
script(s) for review by the local community.

I can make the script myself in Python if necessary but decided to find
out if somebody has already done all the work before.
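
To make the idea concrete, the relation walk described above might look roughly like this. It is a sketch under assumptions: the CSV column names (id, street, postcode), the semicolon delimiter, the addr:postcode tag on the relation and all sample values are hypothetical, not the postal company's real format.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical extract of the postal code file.
SAMPLE_CSV = "id;street;postcode\n17;Laugavegur;101\n18;Hverfisgata;101\n"

# Hypothetical .osm extract with three associatedStreet relations.
SAMPLE_OSM = """<osm version="0.6">
  <relation id="900" version="2">
    <tag k="type" v="associatedStreet"/>
    <tag k="götuskrá:id" v="17"/>
    <tag k="addr:postcode" v="105"/>
  </relation>
  <relation id="901" version="1">
    <tag k="type" v="associatedStreet"/>
    <tag k="götuskrá:id" v="18"/>
    <tag k="addr:postcode" v="101"/>
  </relation>
  <relation id="902" version="1">
    <tag k="type" v="associatedStreet"/>
  </relation>
</osm>"""


def proposed_changes(osm_xml, csv_text):
    """List (relation id, old code, new code) where OSM and the file differ."""
    reader = csv.DictReader(io.StringIO(csv_text), delimiter=';')
    rows = {row['id']: row for row in reader}
    changes = []
    for relation in ET.fromstring(osm_xml).iter('relation'):
        tags = {t.get('k'): t.get('v') for t in relation.findall('tag')}
        if tags.get('type') != 'associatedStreet':
            continue
        row = rows.get(tags.get('götuskrá:id'))
        if row is None:
            continue  # no götuskrá:id tag, or the id is not in the file
        if tags.get('addr:postcode') != row['postcode']:
            changes.append((relation.get('id'),
                            tags.get('addr:postcode'), row['postcode']))
    return changes


changes = proposed_changes(SAMPLE_OSM, SAMPLE_CSV)
```

With the sample data only relation 900 is reported, because 901 already matches and 902 has no götuskrá:id tag.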

With regards,
Svavar Kjarrval


