> This all conversation confort my (un-educated, I confess) idea of the
> uselessness of cross referencing the Wikipedia ecosystem with OSM with OSM
> tags.
>
> Automated addition of wikidata id to OSM objects seems worthy, so why not
> doing it on the fly instead of writing it to the database? Next year maybe?
When you have a lot of data, it becomes difficult to verify. That is, you run into an "Oracle problem": either it takes too much time for somebody with the knowledge to verify the data, or you have no Oracle who can verify it at all.

One way of solving the Oracle problem is to have a different dataset captured INDEPENDENTLY. This way you can compare two (or more) datasets and identify PROBABLE errors. When such errors/incompatibilities are found, they should be checked and resolved manually.

If you do a dumb copy/overwrite of data (possibly converted data, as in the case of wikipedia article -> wikidata), you lose the "two dataset" situation. That is, you can no longer use these two datasets to solve the Oracle problem, i.e. to verify the data in BOTH datasets. You simply take one of the datasets (or somebody's assumption of how data in set A converts to set B) as "correct" and overwrite the other dataset, thus destroying the possibility of doing genuine data validation.

So such automated addition of wikidata tags without local knowledge does more harm than good. If the whole change is based on existing wikipedia tags, such a conversion can be done on the fly by anybody with minimal knowledge.

And to give a more practical perspective: we have been doing OSM<->wikipedia comparison for more than two years now. We take OSM objects which have a wikipedia tag, and so we get page:coordinates. Then we take the wikipedia dump (***-latest-geo_tags) and thus get a different dataset of page:coordinates. Then we compare those two datasets and identify mismatches. Then we manually check each of those to be sure that the information in BOTH datasets is correct. We NEVER do any automated update. This is the only way to be sure data quality is kept at a high level.

Try going through at least a hundred mismatches and you will see how many different situations there are, and you will understand why automated update is NOT an option.
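
The comparison step described above can be sketched roughly as follows. This is only an illustration, not the actual tooling: it assumes both datasets have already been reduced to page -> (lat, lon) mappings, and the function names, the 1 km threshold and the sample coordinates are all made up for the example. Note that it only reports candidates for manual checking and never writes anything back.

```python
import math


def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) points given in degrees."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(h))


def find_mismatches(osm, wiki, threshold_km=1.0):
    """List pages present in both datasets whose coordinates disagree.

    Neither dataset is modified; the result is a worklist for a human.
    threshold_km is an assumed tolerance, not a value from the original workflow.
    """
    mismatches = []
    for page in sorted(osm.keys() & wiki.keys()):
        distance = haversine_km(osm[page], wiki[page])
        if distance > threshold_km:
            mismatches.append((page, osm[page], wiki[page], round(distance, 1)))
    return mismatches


# Hypothetical sample data: page title -> (lat, lon)
osm_pages = {"lt:Vilnius": (54.6870, 25.2800), "lt:Kaunas": (54.8970, 23.8860)}
wiki_pages = {"lt:Vilnius": (54.6872, 25.2797), "lt:Kaunas": (55.8970, 23.8860)}

for page, osm_coord, wiki_coord, km in find_mismatches(osm_pages, wiki_pages):
    print(f"CHECK MANUALLY: {page} OSM={osm_coord} wikipedia={wiki_coord} ({km} km apart)")
```

A mismatch here says nothing about which side is wrong: the error may be in OSM, in wikipedia, or in neither (for example a large object tagged at a different point), which is exactly why each one needs a human look.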
-- 
Tomas

