[OSM-talk-be] Fwd: [OSM-talk] Fwd: Addresses are a tiny fraction of what we do (was: The world's best addressable map)

2014-10-24 Thread Marc Gemis
Cross-posting from the talk mailing list, where address imports are also
being discussed at the moment.

-- Forwarded message --
From: Christian Quest cqu...@openstreetmap.fr
Date: Fri, Oct 24, 2014 at 9:38 PM
Subject: Re: [OSM-talk] Fwd: Addresses are a tiny fraction of what we do
(was: The world's best addressable map)
To: OpenStreetMap t...@openstreetmap.org


Addresses in France...

We started a project to collect addresses in a separate database called
BANO (Base d'Adresses Nationale Ouverte: Open National Address Database).

We've recreated data from the national cadastre (scraping 1.3 million PDF
files), open data sources and... OSM.

This database contains 15+ million addresses so far, and we recently added
almost 4 million hamlet and locality names.
A full dump contains 19.7 million locations ranging from house numbers to
municipalities (no POIs).

Why did we do it that way?

An import of millions of addresses can be done quick and dirty in a couple
of days, but such a blind import does not really fit the import policy, and
we also learned from the TIGER import that fixing data is much less fun
than creating new data.

Why import all this if the data is available (under ODbL)?

It seems much better to take the time required to import this data street
by street, reviewing it to make sure we improve its quality and do not just
copy it. This will take years, many years (from 5 to 20), depending on how
deeply we review the data before the upload. Some contributors have started
this work, but it is really boring and I don't expect we can attract a
large number of contributors to that project.

Anyway, BANO updates its content every night and collects new OSM addresses
to replace those from other sources. So it also takes advantage of the
address reviewing/fixing done in OSM during this import process or during
any address-related contribution.
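
The replacement rule described above can be sketched roughly as follows.
This is an illustration only, not BANO's actual code: the source labels,
the ordering of the non-OSM sources, and the record layout are all
assumptions made for the example. The only thing taken from the message is
that an OSM address replaces the same address from another source.

```python
# Hypothetical source priority: lower number wins. Only "OSM replaces
# other sources" comes from the message; the rest is an assumption.
SOURCE_PRIORITY = {"osm": 0, "opendata": 1, "cadastre": 2}


def merge_addresses(records):
    """Keep one record per (municipality, street, housenumber) key,
    preferring the record whose source has the lowest priority number."""
    best = {}
    for rec in records:
        key = (rec["municipality"], rec["street"].lower(), rec["housenumber"])
        current = best.get(key)
        if (current is None
                or SOURCE_PRIORITY[rec["source"]] < SOURCE_PRIORITY[current["source"]]):
            best[key] = rec
    return list(best.values())


# Made-up sample records, for illustration only.
sample = [
    {"municipality": "Provins", "street": "Rue Victor Hugo",
     "housenumber": "12", "source": "cadastre"},
    {"municipality": "Provins", "street": "Rue Victor Hugo",
     "housenumber": "12", "source": "osm"},
    {"municipality": "Provins", "street": "Rue Victor Hugo",
     "housenumber": "14", "source": "cadastre"},
]
merged = merge_addresses(sample)
```

Run nightly, a merge like this would let every address fixed in OSM take
over from the cadastre or open data version on the next update.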

What is much more interesting is that OSM contributors can use BANO to
detect missing roads/streets and names (we have a BANO tiled overlay
showing missing names, like here:
http://layers.openstreetmap.fr/?zoom=18&lat=48.8474&lon=3.23191&layers=BFT
).
This seems much more useful, as we're far from having all roads and streets
mapped and named in France.
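
The idea of detecting missing names can be sketched as a comparison of two
name lists for one municipality. This is a minimal sketch under
assumptions, not how the BANO overlay is actually computed: the function
names, the normalization rules and the sample names are made up for the
example.

```python
import unicodedata


def normalize(name):
    """Lowercase, strip accents and collapse whitespace so that
    spelling variants of the same street name compare equal."""
    decomposed = unicodedata.normalize("NFKD", name)
    no_accents = "".join(c for c in decomposed if not unicodedata.combining(c))
    return " ".join(no_accents.lower().split())


def missing_street_names(bano_names, osm_names):
    """Return the BANO street names that have no matching OSM street name."""
    known = {normalize(n) for n in osm_names}
    return sorted(n for n in bano_names if normalize(n) not in known)


# Made-up names for one hypothetical municipality.
bano = ["Rue de l'Église", "Chemin du Moulin", "Rue Victor Hugo"]
osm = ["rue de l'eglise", "Rue Victor Hugo"]
missing = missing_street_names(bano, osm)
```

Here only "Chemin du Moulin" is reported, since the other two names match
once case and accents are ignored. A real comparison would probably also
have to handle abbreviations and typos.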

We can even see this BANO effect on some graphs:
http://osm2020.free.fr/qa-commune/popu-sans-route-name-france.png

Yes, something happened last May... BANO became available at that time,
and the population with no named road nearby has decreased almost twice as
fast since then.

You can also see the missing names graph here:
http://munin.openstreetmap.fr/osm12.free.org/osm104.openstreetmap.fr/bano_rapproche.html
More than 100,000 names have been added since May.


To summarize... yes, addresses are a really important dataset, mainly
because they allow us to cross the boundary between non-geographic data
(postal addresses) and geographic data with the help of a (good) geocoding
algorithm.
This brings a lot of new data users to OSM by providing the data that
fuels services like routing from address A to address B. Some public
service web sites have started using OSM + BANO that way.
This also allows us to geocode new (open) datasets to improve OSM with more
interesting data (we're about to do this for almost 3 pharmacy).
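
To make the geocoding step concrete, here is a deliberately naive sketch:
an exact lookup of a normalized address string in an index built from
address records. It is not the algorithm used with BANO. The index layout,
the normalization and the coordinates are invented for the example, and a
good geocoder has to do far more (fuzzy matching, abbreviations,
interpolation).

```python
def normalize_address(address):
    """Lowercase, drop commas and collapse whitespace."""
    return " ".join(address.lower().replace(",", " ").split())


def geocode(address, index):
    """Return (lat, lon) for an exact normalized match, else None."""
    return index.get(normalize_address(address))


# Hypothetical index entry; the coordinates are placeholders, not real data.
index = {
    normalize_address("12 Rue Victor Hugo 77160 Provins"): (48.56, 3.30),
}

found = geocode("12, rue Victor Hugo  77160 PROVINS", index)
not_found = geocode("99 Rue Inconnue 77160 Provins", index)
```

Rows of an open dataset that come back as None are the ones that would
need manual review before anything is added to OSM.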

Is it mandatory to have these huge address datasets in OSM?
Maybe not, and not if the import process does not bring any improvement to
the data.
Mappers' time seems to me much better spent on less mechanical
contributions.

-- 
Christian Quest - OpenStreetMap France

___
talk mailing list
t...@openstreetmap.org
https://lists.openstreetmap.org/listinfo/talk
___
Talk-be mailing list
Talk-be@openstreetmap.org
https://lists.openstreetmap.org/listinfo/talk-be


Re: [OSM-talk] Fwd: Addresses are a tiny fraction of what we do (was: The world's best addressable map)

2014-10-24 Thread Marc Gemis
Thanks a lot for sharing the methodology of the French community.
I like this approach

regards

m
