Re: [Talk-ca] duplicate address data

2015-03-30 Thread Gerd Petermann
Hi Daniel,

thanks for the input. It helps me to understand some of the reasons for the 
problems I found.
Also thanks for checking the proposed algorithm.

I am still working on the code in the housenumber2 branch for mkgmap and I want 
to finish this first.
I probably won't find time to do much more coding before the end of summer,
so I hope that I've inspired someone to start cleaning up.
If not, maybe I'll find the time in a few months.

Gerd



From: jfd...@hotmail.com
To: gpetermann_muenc...@hotmail.com
Subject: RE: [Talk-ca] duplicate address data
Date: Mon, 30 Mar 2015 07:03:40 -0400

Bonjour Gerd,

I used to work for Natural Resources Canada (NRCan), which produced the
Canvec files (note 1). I am actually the guy who made the conversion from the
government format to the .osm map format. The objective was to provide 50K
topographic map data to the community in OSM format, without modifications to
the original data (if possible).

Reading your emails, I understand there are three problems mixed together:
initial addr interpolation, multiple/bad imports, and inconsistencies between
OSM and governmental data.

Initial addr interpolation:
The interpolation lines and addresses were created from the governmental
street network available at the time of conversion. There were slight changes
in the algorithm used to create address interpolation between the different
versions of the Canvec product, but most of the results should look similar.
Errors in the original data were discovered when producing the interpolation
but could not be repaired (such as road segments only a few meters long, bad
addressing schemes, etc.). Such errors were exceptions, not the norm.
Addresses were available only for the first and last coordinates of the
original line segments, whatever the length of the segment. Sometimes this
results in an address interpolation line with the same address on both ends;
sometimes you will find hundreds of potential addresses between both ends. It
might be helpful to know that the distance between the interpolation lines and
the original street network was set to 20 m for tertiary through motorway and
15 m for lower highway classes. This sometimes produced strange artefacts.

Multiple/bad imports:
The Canadian OSM community asked to be able to import Canvec data by layer
(i.e. only the street or waterway network rather than the whole file), which
explains the Canvec data model and the way contributors imported their data.
Some contributors imported data layers without considering existing OSM
content, which often included previously imported Canvec data. This created a
lot of duplicated objects, as you have found out! In areas where the street
network was well developed, some contributors imported only the address
interpolation layer, which creates the third problem.

Inconsistencies in resulting OSM data:
There are inconsistencies between OSM and governmental data!-) The data model
of the governmental street network differs from the OSM data model. I had to
convert it to mimic the Karlsruhe Schema. When only the address interpolation
layer was imported, the geometry of the street network does not necessarily
fit the geometry of the address interpolation schema. As a result, street
segments may cross address interpolation lines or lie outside the
interpolation lines of that street. Street names may then differ from the
street names in address nodes. In my experience, there is no way to know which
one is the actual road name.

The algorithm you proposed seems right, even though I am not sure looking at
Canvec in the source would help (point 8).

Hope it will help.

Daniel

Notes (1): Some documentation you may have already read, even if the
addressing schema is not documented …
http://wiki.openstreetmap.org/wiki/CanVec
http://wiki.openstreetmap.org/wiki/CanVec:_Geometric_Model
http://wiki.openstreetmap.org/wiki/CanVec:_Transportation_(TR)

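
[Editorial sketch] Daniel's point that an interpolation line may carry the same
address on both ends, or hundreds of potential addresses between them, can be
illustrated with a small Python example. It is a minimal sketch, not the Canvec
conversion code and not mkgmap's logic: the function name
expand_interpolation is hypothetical, and it only assumes the usual
addr:interpolation values "all", "odd" and "even" with purely numeric house
numbers.

```python
def expand_interpolation(start, end, mode="all"):
    """Return the house numbers implied by an interpolation line.

    start/end are the numeric house numbers on the two end nodes;
    mode is assumed to be one of "all", "odd" or "even".
    """
    lo, hi = sorted((start, end))
    numbers = range(lo, hi + 1)
    if mode == "odd":
        return [n for n in numbers if n % 2 == 1]
    if mode == "even":
        return [n for n in numbers if n % 2 == 0]
    if mode == "all":
        return list(numbers)
    raise ValueError("unsupported addr:interpolation value: %r" % mode)


# Same number on both ends: the line says no more than a single address node.
print(expand_interpolation(1002, 1002, "even"))  # [1002]

# A long original segment: hundreds of candidate addresses on one line.
print(len(expand_interpolation(1, 999, "odd")))  # 500
```

The first case is the degenerate line a consumer can safely ignore; the second
shows why a long segment with addresses only on its end points gives a very
coarse position for any single house.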
From: Gerd Petermann [mailto:gpetermann_muenc...@hotmail.com] 
Sent: March-29-15 01:44
To: talk-ca@openstreetmap.org
Subject: Re: [Talk-ca] duplicate address data

Hi Stewart,

 I don't care much about special cases.

I'd say that rural addressing is between 10-20% of addresses in Ontario.
Far from a special case.

OK. I understand that this is a problem, I just don't care about it because
I can't solve it with my knowledge.


 I wanted to point out that the OSM data base for Canada contains a
 huge amount of
 - useless data like duplicated addr:interpolation ways including nodes
 from different imports
  which IMHO should be removed ASAP

Yes, I agree that there are some errors, but we can't guarantee that the
Canvec 10 data will be much better, or that some of the older data is
bad just because of its version. Imports work really badly in Canada, as
our source data isn't wonderful and we don't have enough folks on the
ground to verify.

Let's start with the simple problem first.
I don't want to replace data, I just want to remove completely obsolete

Re: [Talk-ca] Talk-ca Digest, Vol 85, Issue 15

2015-03-30 Thread Gilles Allard
Today, Gerd Petermann wrote:
 - wrong data like 
  * addr:interpolation ways crossing the road
  *  addr:interpolation ways with nodes that have equal numbers

I don't see any problem with addr:interpolation ways going from the beginning 
to the end of a street. If it causes a problem in the renderer, it should be 
addressed by the renderer developers.
For the mapper (surveyor), the address vector is (in most cases) a linear set 
of data.
If the constraint (no addr:interpolation way crossing a junction) is removed, 
then segments can be merged and duplicated addresses will not be a problem anymore.

dega

On 28 March 2015, 08:15:56, talk-ca-requ...@openstreetmap.org wrote:
 
 Today's Topics:
 
1. Re: duplicate address data (Gerd Petermann)
2. Re: duplicate address data (Stewart C. Russell)
3. duplicate address data (Gerd Petermann)
 
 
 --
 
 Message: 1
 Date: Fri, 27 Mar 2015 14:51:35 +0100
 From: Gerd Petermann gpetermann_muenc...@hotmail.com
 To: talk-ca@openstreetmap.org talk-ca@openstreetmap.org
 Subject: Re: [Talk-ca] duplicate address data
 Message-ID: dub112-w68d686608b2188f44c13209e...@phx.gbl
 Content-Type: text/plain; charset=utf-8
 
 Hi all,
 
 I've tried the latest data for Ontario.
 I see few errors in data with source NRCan-CanVec-10.0,
 but I still see quite a lot of warnings like this:
 
 http://www.openstreetmap.org/way/176690658 addr:interpolation way connects
 two points with equal numbers, numbers are ignored
 
 I also see a lot of messages like this:
 found no street for house number element County Road 17 1002
 http://www.openstreetmap.org/node/2009492976 , distance to next possible
 road: 9161 m
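
[Editorial sketch] The first warning quoted above (an addr:interpolation way
connecting two points with equal numbers) describes a check that is easy to
express. The Python below is only an illustration of that idea, not mkgmap's
implementation (mkgmap is written in Java); the function name and the
representation of a way as a list of node tag dictionaries are assumptions
made for the example.

```python
def has_equal_end_numbers(way_nodes):
    """True if the first and last node of an interpolation way
    carry the same addr:housenumber value.

    way_nodes is assumed to be the ordered list of tag dictionaries
    of the nodes that make up the way.
    """
    if len(way_nodes) < 2:
        return False
    first = way_nodes[0].get("addr:housenumber")
    last = way_nodes[-1].get("addr:housenumber")
    return first is not None and first == last


# Hypothetical way: both end nodes carry the same number.
suspicious = [
    {"addr:housenumber": "1002", "addr:street": "County Road 17"},
    {"addr:housenumber": "1002", "addr:street": "County Road 17"},
]
# Hypothetical way with a usable range.
normal = [
    {"addr:housenumber": "1002"},
    {},  # intermediate node without tags
    {"addr:housenumber": "1050"},
]

print(has_equal_end_numbers(suspicious))  # True
print(has_equal_end_numbers(normal))      # False
```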
 
 
 I also tried the inspector at http://tools.geofabrik.de/osmi/ 
 The tool shows a lot of errors, but I fear nobody will spend the time to 
 investigate an error when he sees 100 other errors close to it,
 and all of them seem to be real errors.
 
 I agree that the best approach to fix this problem seems to be to remove all
 the old data and start from scratch with the latest import data.
 
 I will not do anything like that, but I think a good approach is to remove 
 all address data with a tag like
 source=CanVec 6.0 - NRCan
 and maybe also 
 source=NRCan-CanVec-7.0
 
 and then see what is missing.
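
[Editorial sketch] The selection proposed above (address data tagged with one
of the two older source values) could be expressed as a simple filter. This is
a hedged illustration only, not a ready-made clean-up tool: the function
names, the element layout (dictionaries with "id" and "tags") and the sample
ids are hypothetical, and the first source value is assumed to be written with
ordinary spaces.

```python
# Source values named in the message; the spelling with plain spaces
# in the first one is an assumption.
OLD_SOURCES = {"CanVec 6.0 - NRCan", "NRCan-CanVec-7.0"}


def is_old_canvec_address(tags):
    """True if the tags carry address data and one of the old source values."""
    carries_address = any(key.startswith("addr:") for key in tags)
    return carries_address and tags.get("source") in OLD_SOURCES


def removal_candidates(elements):
    """Return the ids of elements that match the proposed selection."""
    return [e["id"] for e in elements if is_old_canvec_address(e["tags"])]


# Hypothetical sample elements (ids are made up for illustration).
sample = [
    {"id": 1, "tags": {"addr:interpolation": "odd",
                       "source": "NRCan-CanVec-7.0"}},
    {"id": 2, "tags": {"addr:housenumber": "12",
                       "source": "NRCan-CanVec-10.0"}},
    {"id": 3, "tags": {"highway": "residential",
                       "source": "CanVec 6.0 - NRCan"}},
]

print(removal_candidates(sample))  # [1]
```

Only element 1 is selected: element 2 comes from the newer import and element
3 has an old source but no address tags. As John notes in the quoted message
below, some mappers removed the CANVEC tags, so a filter on the source tag
alone would miss part of the imported data.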
 
 Gerd
 
 
 Date: Thu, 26 Mar 2015 15:20:36 -0400
 Subject: Re: [Talk-ca] duplicate address data
 From: jwhelan0...@gmail.com
 To: gpetermann_muenc...@hotmail.com
 CC: talk-ca@openstreetmap.org
 
 Basically the CANVEC data imports need cleaning up.  I think there were ten
 different versions; the most common, I think, is 7.  Unfortunately some
 mappers locally removed the CANVEC tags from the data if they touched it, or
 even on import, as they didn't think the tags were important.  Sometimes
 addresses were imported with a CANVEC tag, sometimes not.  Due to funding
 cutbacks the CANVEC data is no longer exported in OSM format.
 
 Also, some of the original mapping was done from low-resolution satellite
 imagery (Yahoo, I think, provided the images), so some roads were mapped
 about 100 meters from where they should be.  Where highways had already been
 mapped, the CANVEC imports were sometimes used and sometimes not.  In Ottawa
 we took a local decision to delete all the roads above service roads and
 replace them with CANVEC imports because of the data quality issues of the
 existing road network; that was some years ago.
 
 Unfortunately in Canada we have fewer mappers per kilometer of highway than
 in Germany and the CANVEC imports were very useful.
 
 The clean-up solution I would suggest is to delete all address information
 with a CANVEC tag on it and then import only CANVEC 10, which is the latest
 version.  That's a lot of work, but the end result would be clean.
 It might also hit problems, as the address information would follow the
 CANVEC highways rather than highways mapped in other ways, but only the
 address information would change and the road network would remain as
 it is.
 
 Cheerio John
 
 On 26 March 2015 at 14:54, Gerd Petermann gpetermann_muenc...@hotmail.com
 wrote:
 
 
 
 Hi list,
 
 I am one of the developers of mkgmap, see also
 http://wiki.openstreetmap.org/wiki/Mkgmap
 and
 http://gis.19327.n5.nabble.com/Mkgmap-Development-f5324443.html
 
 During the last weeks I've enhanced the support for 
 the evaluation of addr:interpolation ways or more general
 the evaluation of addr:housenumber, addr:street and so on
 to