> I will repeat that this is not something which COULD be done, this
> comparison is something, what IS ACTUALLY DONE and has been done for
> years.

Tomas, this is what I understand from what you are saying:

* You download a geotagging Wikidata dump and generate a table with latitude, longitude, and a wiki page title.
* You also generate the same table from OSM for all nodes, ways (using geo centroid?), and relations (using ??).
* You compare article titles between the two, and when OSM has something that Wikipedia doesn't, you search automatically by geo proximity, or you let users fix it, or ??

If I understood you correctly (and please correct me if I did not), it wouldn't work for the whole planet, simply because the average distance between what OSM has and what Wikidata has is far too great to be useful. Maybe Lithuania, being a relatively small area with a very active community, has been kept in perfect shape (and each geo point is identical in both Wikidata and OSM, which might be a licensing issue), but in the current state of the world's OSM data only 17% of nodes are within 10 meters of their Wikidata counterpart. If we count ways and relations, it drops to 11% -- http://tinyurl.com/ybp4tp7a

In other words, with your approach you can detect when OSM's wikipedia tag is no longer correct, because the Wikipedia geo dump no longer has it. But afterwards you have to go and fix it by hand. And this is pretty much the only operation you can do with this approach. You cannot analyze the tens of thousands of existing wikipedia tags that are pointing to links, disambigs, people, tree species, places of business; you can simply mark them as "geo missing in Wikipedia".

I took a quick look at the various quality control queries I built on the cleanup page. Lithuania does seem pretty clean, with only one disambiguation at the moment (has been there for 4 months) - https://www.openstreetmap.org/node/1717783246 - but both have the same location, two airports that point to a list - https://www.openstreetmap.org/node/1042034645 and https://www.openstreetmap.org/node/1042034660.

None of these issues can be found with your approach, nor can it detect renaming. For the rest of the world, the situation is much worse.
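
To make the title-plus-proximity comparison concrete, here is a minimal Python sketch of how I read it. It assumes both sides have already been reduced to (title, lat, lon) rows; the function names, the row layout, and the 10 m default are my own illustration, not Tomas's actual tooling or the query behind the numbers above.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in meters (spherical approximation)


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two lat/lon points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def compare_by_title(osm_rows, wiki_rows, threshold_m=10.0):
    """Hypothetical helper: match rows by exact title, then bucket by distance.

    osm_rows and wiki_rows are iterables of (title, lat, lon).
    Returns three lists of titles: (near, far, missing), where "missing"
    means the OSM title does not appear in the wiki geo table at all.
    """
    wiki = {title: (lat, lon) for title, lat, lon in wiki_rows}
    near, far, missing = [], [], []
    for title, lat, lon in osm_rows:
        if title not in wiki:
            missing.append(title)
            continue
        distance = haversine_m(lat, lon, *wiki[title])
        (near if distance <= threshold_m else far).append(title)
    return near, far, missing
```

Note that everything in the "missing" bucket looks the same to this comparison: a renamed article, a disambiguation page, a list, and a person all just show up as a title with no geo entry.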

_______________________________________________
talk mailing list
talk@openstreetmap.org
https://lists.openstreetmap.org/listinfo/talk