> From: Olivier Croquette [mailto:m...@ocroquette.de]
> Sent: Wednesday, September 26, 2012 11:59 AM
> Subject: Re: [OSM-talk] All you've ever wanted to know about the french
> cadastre
>
> > This is not an example that you only find after a long search; it is a
> > typical cadastre import building.
>
> Until you can back up your claim with solid numbers, your claim, more
> specifically the word "typical", is just FUD.
> Furthermore it can hurt many hard working french contributors, who for a
> single city spent dozens of hours integrating the cadaster into OSM.

Time for some numbers then... Detailed data is available upon request.

I decided that the simplest way to get a representative sample of recent
cadastre imports was to rewind my scripts one week and let them run until
I had at least 15 import changesets by at least 10 different users. This
generated a set of 16 changesets.

A side effect of running the scripts was a list of all changesets that
overlapped with the time period. As there were approximately 11k
changesets, I used Python's random library to select 1000 of them at
random as a representative sample of data of the same age.

By setting these criteria beforehand I avoided any bias that may be
present in existing lists of import changesets, such as a list derived
from a list of blocks.

For technical reasons this analysis is based on the data as it is now,
not as it was immediately after import. This may result in bad imports
(e.g. duplicate uploads) not being considered if they were reverted in
the past week.

The total time reviewed was approximately from Wed, 19 Sep 2012 20:00 to
Thu, 20 Sep 2012 17:30, or approximately 1 day. If there is day-to-day
variation in import quality this analysis will not show it.

Although not the point of this analysis, one potential measure of
integration with other data is the version of the objects. I present this
for general interest, not to answer the question of what a typical
cadastre building is.

Previous research into object versions has found they tend to obey a
power law (after normalization by the number of v1 objects) where
count = version ^ m. A best fit line on a log-log plot finds m=-4.681
(R^2=0.946) for the imported ways and m=-2.243 (R^2=0.991) for the ways
in the random changesets. There is a marked difference between the
versions of ways in the two sets of changesets. This is not unexpected as
the cadastre imports are primarily new buildings and involve minimal
changes to existing data.
[1]

An analysis of buildings in the random changesets finds m=-3.520
(R^2=0.971), but a similar building-only analysis of the imports does not
find enough data to draw conclusions from. With respect to object
versions, changes to ways in cadastre imports cannot be said to be
similar to changes to ways in random changesets.

Now, the analysis of geometry. One measure of how broken down into parts
buildings are is to take the buildings, turn them into polygons, combine
them into one multipolygon with ST_Union, then count the number of parts
with ST_Dump and compare it with the original number of buildings. This
does not consider buildings made from multipolygons (e.g. those with an
inner hole). I get the following data:

Changeset    Joined  Original
13175035       3661      6341
13175649        503      1240
13176058        521       951
13176212        219       341
13176769        922      1510
13177032       1515      2782
13177569       2216      4291
13180264        536       830
13180628       1449      2230
13181698          2         2
13183198        506       883
13184921        286       462
13185567        255       438
13185645       1135      2373
Total         13726     24674

An analysis of the average number of parts of a building is beyond the
scope of this, but if you assume that a building like Frederik's example
consists of 5 parts on average, then 20% of buildings consist of multiple
ways (or, put another way, 55% of ways are part of these buildings).

For reference, when I ran the same analysis on the random data (but not
grouping by changeset) I found 5023 buildings and 6375 ways. Using the
same assumptions this is 6.7% of buildings.

I repeated the same analysis looking only at v1 ways and found 4249
buildings and 5497 ways for the random changesets and 13109 buildings and
23611 ways for the import changesets.

Conclusion: A significant number of cadastre imported buildings consist
of multiple ways, such as in the example Frederik gave. The difference
from other buildings a week old is statistically significant. This is
true even if only looking at the subset of buildings that are new
buildings.

[1]: If anyone doubts this I could carry out an analysis on this point.
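
The arithmetic behind the 20%, 55% and 6.7% figures can be reproduced with
a short sketch. It assumes, as in the text, that every multi-way building
consists of exactly 5 ways and that every other building is a single way.
The function name multi_way_share and its parts_per_building parameter
are illustrative only and are not taken from the scripts used for the
analysis.

```python
def multi_way_share(joined, original, parts_per_building=5):
    """Estimate the share of buildings that are drawn as several ways.

    Assumption (illustrative): each multi-way building has exactly
    `parts_per_building` ways and every other building has exactly one.
    With x multi-way buildings and s single-way buildings:
        joined   = x + s
        original = parts_per_building * x + s
    so x = (original - joined) / (parts_per_building - 1).
    """
    multi = (original - joined) / (parts_per_building - 1)
    building_share = multi / joined                     # buildings made of several ways
    way_share = multi * parts_per_building / original   # ways belonging to them
    return building_share, way_share


# Totals quoted in the text above (joined buildings, original ways).
import_buildings, import_ways = multi_way_share(13726, 24674)
random_buildings, _ = multi_way_share(5023, 6375)

print(f"import: {import_buildings:.1%} of buildings, {import_ways:.1%} of ways")
print(f"random: {random_buildings:.1%} of buildings")
```

With the totals from the table this gives roughly 20% of buildings and
55% of ways for the imports, and about 6.7% of buildings for the random
sample. A different assumed number of parts per building changes the
percentages, which is why the 5-part figure should be read as an
assumption rather than a measurement.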