> From: Olivier Croquette [mailto:m...@ocroquette.de]
> Sent: Wednesday, September 26, 2012 11:59 AM
> Subject: Re: [OSM-talk] All you've ever wanted to know about the french
> cadastre
> 
> > This is not an example that you only find after a long search; it is a
> > typical cadastre import building.
> 
> Until you can back up your claim with solid numbers, your claim, more
> specifically the word "typical", is just FUD.
> Furthermore it can hurt many hard-working French contributors, who for a
> single city spent dozens of hours integrating the cadastre into OSM.

Time for some numbers then...

Detailed data is available upon request.

I decided that the simplest way to get a representative sample of recent
cadastre imports was to rewind my scripts one week and let them run until I
had at least 15 import changesets by at least 10 different users. This
generated a set of 16 changesets. A side effect of running the scripts was a
list of all changesets that overlapped with the time period. As there were
approximately 11k of these, I used Python's random library to select 1000
random changesets as a representative sample of data of the same age.
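
The sampling procedure can be sketched as below. This is only an
illustration, not the actual scripts: the (changeset_id, user) input format
and the helper names enough_imports and sample_changesets are hypothetical.
The stopping rule mirrors "at least 15 import changesets by at least 10
different users", and random.sample draws without replacement, so no
changeset is picked twice.

```python
import random


def enough_imports(import_changesets, min_changesets=15, min_users=10):
    """Stopping rule: enough import changesets from enough distinct users.

    import_changesets is a hypothetical list of (changeset_id, user) pairs.
    """
    users = {user for _, user in import_changesets}
    return len(import_changesets) >= min_changesets and len(users) >= min_users


def sample_changesets(changeset_ids, k=1000, seed=None):
    """Draw k distinct changeset IDs uniformly at random (no replacement)."""
    rng = random.Random(seed)  # seeded generator makes the sample reproducible
    return rng.sample(list(changeset_ids), k)


# Usage with a made-up range of about 11k changeset IDs.
all_ids = range(13175000, 13186000)
sample = sample_changesets(all_ids, seed=42)
print(len(sample), len(set(sample)))  # 1000 1000
```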

By setting these criteria beforehand I avoided any bias that may be present
in existing lists of import changesets, such as one compiled from a list of
blocks.

For technical reasons this analysis is based on the data as it is now, not
as it was immediately after import. As a result, bad imports (e.g. duplicate
uploads) that were reverted in the past week may not be counted.

The total period reviewed was approximately Wed, 19 Sep 2012 20:00 to
Thu, 20 Sep 2012 17:30, or about one day. If import quality varies from day
to day, this analysis will not show it.

Although not the point of this analysis, one potential measure of
integration with other data is the version of the objects. I present this
for general interest, not for the question of what a typical cadastre
building is.

Previous research into object versions has found that they tend to obey a
power law after normalizing by the number of v1 objects:
count(v) / count(1) = v^m, where v is the object version.

A best-fit line on a log-log plot gives m=-4.681 (R^2=0.946) for ways in the
import changesets and m=-2.243 (R^2=0.991) for ways in the random
changesets. There is a marked difference between the versions of ways in
the two sets of changesets. This is not unexpected, as the cadastre imports
consist primarily of new buildings and involve minimal changes to existing
data. [1] An analysis of buildings alone in the random changesets gives
m=-3.520 (R^2=0.971), but the equivalent building-only analysis of the
import changesets does not have enough data to draw conclusions from.
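
A fit of this kind can be sketched as an ordinary least-squares line through
(log v, log(count(v) / count(1))). This is an assumption about the method:
the email does not say whether the intercept was left free or forced
through the origin, and the dict-of-counts input and the name
fit_power_law are hypothetical. The usage line feeds in synthetic counts
that follow v^-3 exactly, so the fit should recover m = -3 with R^2 = 1.

```python
import math


def fit_power_law(counts_by_version):
    """Fit count(v)/count(1) = v**m by least squares on a log-log scale.

    counts_by_version maps version -> number of objects at that version
    (hypothetical input format). Returns (m, intercept, r_squared).
    """
    v1 = counts_by_version[1]  # normalize by the number of v1 objects
    pts = [(math.log(v), math.log(c / v1))
           for v, c in sorted(counts_by_version.items()) if c > 0]
    n = len(pts)
    mean_x = sum(x for x, _ in pts) / n
    mean_y = sum(y for _, y in pts) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in pts)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pts)
    m = sxy / sxx                      # slope = power-law exponent
    b = mean_y - m * mean_x            # intercept (free, not forced to 0)
    ss_res = sum((y - (m * x + b)) ** 2 for x, y in pts)
    ss_tot = sum((y - mean_y) ** 2 for _, y in pts)
    r2 = 1 - ss_res / ss_tot if ss_tot else 1.0
    return m, b, r2


# Synthetic counts following v**-3 exactly: 8000, 1000, 125.
m, b, r2 = fit_power_law({1: 8000, 2: 1000, 4: 125})
print(round(m, 3), round(r2, 3))  # -3.0 1.0
```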

With respect to object versions, changes to ways in cadastre imports cannot
be said to be similar to changes to ways in random changesets.

Now, the analysis of geometry.

One way to measure how far buildings are broken into parts is to take the
buildings, turn them into polygons, combine them into one multipolygon with
ST_Union, count the resulting parts with ST_Dump, and compare that with the
original number of buildings. This does not consider buildings made from
multipolygons (e.g. those with an inner hole). I get the following data
("Joined" is the number of parts after the union, "Original" is the number
of buildings before it):

Changeset  Joined  Original
13175035     3661      6341
13175649      503      1240
13176058      521       951
13176212      219       341
13176769      922      1510
13177032     1515      2782
13177569     2216      4291
13180264      536       830
13180628     1449      2230
13181698        2         2
13183198      506       883
13184921      286       462
13185567      255       438
13185645     1135      2373
Total       13726     24674
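
The "Joined" count can be illustrated without a database. The sketch below
is a pure-Python stand-in for ST_Union plus ST_Dump under a strong
simplifying assumption: every building is an axis-aligned rectangle
(x0, y0, x1, y1). Rectangles that overlap or share an edge of positive
length merge into one part, while rectangles touching only at a corner stay
separate, as they would in a polygon union. The name count_union_parts is
hypothetical; the real analysis ran on actual OSM polygons in PostGIS.

```python
def count_union_parts(rects):
    """Count connected parts after merging overlapping/edge-sharing rectangles."""
    parent = list(range(len(rects)))  # union-find forest, one node per building

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    def joined(a, b):
        # Length of the overlap of the two rectangles along each axis.
        ox = min(a[2], b[2]) - max(a[0], b[0])
        oy = min(a[3], b[3]) - max(a[1], b[1])
        # Overlap or shared edge segment; a corner-only touch (0, 0) is not enough.
        return ox >= 0 and oy >= 0 and (ox > 0 or oy > 0)

    for i in range(len(rects)):
        for j in range(i + 1, len(rects)):
            if joined(rects[i], rects[j]):
                parent[find(i)] = find(j)
    return len({find(i) for i in range(len(rects))})


# One building split into two ways sharing a wall, plus a separate house.
buildings = [(0, 0, 5, 10), (5, 0, 10, 10), (20, 0, 25, 5)]
print(count_union_parts(buildings), len(buildings))  # 2 3
```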

An analysis of the average number of parts per building is beyond the scope
of this, but if you assume that a building like Frederik's example consists
of 5 parts on average, then 20% of buildings consist of multiple ways (or,
put another way, 55% of ways are part of such buildings).
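
The 20% and 55% figures follow from a two-type model: each joined part is
either a single-way building or a building made of exactly k ways (k = 5
here). With P parts and W ways, W = P + (k - 1) * M, where M is the number
of multi-way buildings, so M = (W - P) / (k - 1). The helper name
multi_part_shares is hypothetical; the inputs are the totals from the table.

```python
def multi_part_shares(parts, ways, ways_per_multi=5):
    """Shares of multi-way buildings and of ways in them, under the k-ways model.

    Assumes every joined part is either one way or exactly ways_per_multi ways.
    Returns (fraction of buildings that are multi-way,
             fraction of ways that belong to multi-way buildings).
    """
    multi = (ways - parts) / (ways_per_multi - 1)  # number of multi-way buildings
    return multi / parts, ways_per_multi * multi / ways


b, w = multi_part_shares(13726, 24674)  # import changesets, "Total" row
print(f"{b:.1%} of buildings, {w:.1%} of ways")  # 19.9% of buildings, 55.5% of ways
```

The same function applied to the random sample's 5023 buildings and 6375
ways gives the 6.7% quoted for comparison.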

For reference, when I ran the same analysis on the random data (without
grouping by changeset) I found 5023 buildings after the union and 6375 ways.
Under the same assumptions, this means 6.7% of buildings consist of multiple
ways.

I repeated the same analysis looking only at v1 ways and found 4249
buildings and 5497 ways for the random changesets, and 13109 buildings and
23611 ways for the import changesets.

Conclusion:

A significant number of cadastre-imported buildings consist of multiple
ways, as in the example Frederik gave. The difference from other buildings a
week old is statistically significant. This holds even when looking only at
the subset of buildings that are new buildings.
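
The email does not say which significance test was used. One simple check,
offered here only as an illustrative assumption, is a two-proportion z-test
on the share of ways that were merged into another way's part
(Original - Joined): 10948 of 24674 ways for the imports versus 1352 of 6375
for the random sample. The test treats ways as independent, which they are
not (ways in one changeset or one building are correlated), so it overstates
significance; the function name two_proportion_z is hypothetical.

```python
import math


def two_proportion_z(x1, n1, x2, n2):
    """Two-sided two-proportion z-test with a pooled standard error."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail
    return z, p_value


# Merged ways (Original - Joined) out of all ways, imports vs random sample.
z, p = two_proportion_z(24674 - 13726, 24674, 6375 - 5023, 6375)
print(round(z, 1))  # 33.7
```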
        
[1]: If anyone doubts this I could carry out an analysis on this point. 


_______________________________________________
talk mailing list
talk@openstreetmap.org
http://lists.openstreetmap.org/listinfo/talk
