Hi Daniel,
thanks for the input. It helps me to understand some of the reasons for the
problems I found.
Also thanks for checking the proposed algorithm.
I am still working on the code in the housenumber2 branch for mkgmap and I want
to finish this first.
I probably won't find time to do much more coding before the end of summer,
so I hope that I've inspired someone to start cleaning up.
If not, maybe I'll find the time in a few months.
Gerd
From: jfd...@hotmail.com
To: gpetermann_muenc...@hotmail.com
Subject: RE: [Talk-ca] duplicate address data
Date: Mon, 30 Mar 2015 07:03:40 -0400
Bonjour Gerd,

I used to work for Natural Resources Canada (NRCan), which produced the
Canvec files (note 1). I am actually the guy who made the conversion from the
government format to the .osm map format. The objective was to provide 50K
topographic map data to the community in OSM format, without modifications to
the original data (if possible).

Reading your emails, I understand there are three problems mixed together:
initial addr interpolation, multiple/bad imports, and inconsistencies between
OSM and governmental data…

Initial addr interpolation:
The interpolation lines and addresses were created from the governmental street
network available at the time of conversion. There were slight changes in the
algorithm used to create address interpolation between the different versions
of the Canvec product; however, most of them should look similar. Errors in the
original data were discovered when producing the interpolation but could not be
repaired (such as road segments only a few meters long, bad addressing schemes,
etc…). Such errors were exceptions, not the norm.

Addresses were available only for the first/last coordinates of the original
line segments, whatever the length of that line segment. Sometimes this results
in an address interpolation line with the same address on both ends of the
line; sometimes you will find hundreds of potential addresses between both
ends. It might be helpful to know that the distance between the interpolation
lines and the original street network was set to 20m for tertiary-motorway,
15m for lower highway classes. This sometimes produced strange artefacts.

Multiple/bad imports:
The Canadian OSM community
asked to be able to import Canvec data by layers (i.e. only the street or
waterway network rather than the whole file), which explains the Canvec data
model and the way contributors imported their data. Some contributors imported
data layers without considering existing OSM content, which often included
previously imported Canvec data. That created a lot of duplicated objects, as
you have found out! In areas where the street network was well developed, some
contributors imported only the address interpolation layer, which creates the
third problem…

Inconsistencies in resulting OSM data:
There are inconsistencies
between OSM and governmental data!-) The data model of the governmental street
network differs from the OSM data model. I had to convert it to mimic the
Karlsruhe Schema. When only the address interpolation layer was imported, the
geometry of the street network does not necessarily fit the geometry of the
address interpolation schema. As a result, street segments may cross address
interpolation lines or may be found outside the interpolation lines of that
street. Street names may then be different from the street names in address
nodes. From my experience, there is no way to know which one is the actual
road name.

The algorithm you proposed seems right, even though I am not sure looking at
Canvec in the source would help (point 8).

Hope it will help.
Daniel

Notes (1): Some documentation you may have already read, even if the
addressing schema is not documented …
http://wiki.openstreetmap.org/wiki/CanVec
http://wiki.openstreetmap.org/wiki/CanVec:_Geometric_Model
http://wiki.openstreetmap.org/wiki/CanVec:_Transportation_(TR)
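
Daniel's remarks on interpolation lines (one address pair per original segment, and a 20m or 15m gap to the street) can be illustrated with a small sketch. This is not the Canvec conversion code. The function names, and the set of highway values taken to mean "tertiary-motorway", are assumptions made only for illustration.

```python
# Assumption: "tertiary-motorway" means these OSM highway values.
MAJOR_CLASSES = {"motorway", "trunk", "primary", "secondary", "tertiary"}


def interpolation_offset_m(highway):
    """Distance in metres between a street and its interpolation line,
    using the 20m / 15m figures quoted in the email."""
    return 20 if highway in MAJOR_CLASSES else 15


def potential_addresses(first, last, scheme="all"):
    """House numbers implied by an addr:interpolation way whose two end
    nodes carry the numbers `first` and `last`.

    `scheme` follows the OSM addr:interpolation values "all", "even"
    and "odd"; the end numbers are assumed to match the scheme's parity.
    """
    step = 1 if scheme == "all" else 2
    low, high = sorted((first, last))
    return list(range(low, high + 1, step))
```

With this sketch, a way with the same number on both ends, e.g. `potential_addresses(10, 10, "even")`, yields a single address, while a long rural segment numbered 1 to 999 with scheme "odd" yields hundreds of candidates, which matches the two extremes described above.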
From: Gerd Petermann [mailto:gpetermann_muenc...@hotmail.com]
Sent: March-29-15 01:44
To: talk-ca@openstreetmap.org
Subject: Re: [Talk-ca] duplicate address data

Hi Stewart,
I don't care much about special cases.
I'd say that rural addressing is between 10-20% of addresses in Ontario.
Far from a special case.
OK. I understand that this is a problem, I just don't care about it because
I can't solve it with my knowledge.
I wanted to point out that the OSM data base for Canada contains a
huge amount of
- useless data like duplicated addr:interpolation ways including nodes
from different imports
which IMHO should be removed ASAP
Yes, I agree that there are some errors, but we can't guarantee that the
Canvec 10 data will be much better, or that some of the older data is
bad just because of its version. Imports work really badly in Canada, as
our source data isn't wonderful and we don't have enough folks on the
ground to verify.
Let's start with the simple problem first.
I don't want to replace data, I just want to remove completely obsolete