Hi Wiktor,

If you want to automate this, you will need a database outside of OSM that stores the matches between the municipal and OSM addresses, the municipal differences over time, and the OSM differences over time. The same underlying matching algorithm previously described is used to create the OSM-to-municipal matches, the differences in the municipal data over time, and the differences in OSM over time.

Good automation opportunities exist when you see previously matching OSM and municipal data diverging on the municipal side. Conflicting data (divergent changes made to both OSM and municipal data) could be ignored, or fed into some kind of manual pipeline like the task manager, MapRoulette, or a QA tool. If you can get a copy of the older address data that was actually imported, you should be able to mostly automate catching OSM up. I don't really see any way of automating divergent changes, since it will be impossible for the software to know which side is "better".

These are all normal diff/merge concepts, except that rather than diffing text files, the fuzzy matching algorithm is generating the diffs.
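To make the diff/merge idea a bit more concrete, here is a rough Python sketch of the three-way comparison for a single address. It is only an illustration under some assumptions: the fuzzy matching has already paired up the records, each record is a plain dict of tags (or None if the address is missing from that snapshot), and the names Action and classify are made up for this example, not taken from any existing tool.

```python
from enum import Enum


class Action(Enum):
    UNCHANGED = "unchanged"    # nothing changed on either side
    UPDATE_OSM = "update_osm"  # only the municipal data changed: candidate for automation
    KEEP_OSM = "keep_osm"      # only OSM changed: a mapper edited it, leave it alone
    CONVERGED = "converged"    # both sides changed, but to the same value
    CONFLICT = "conflict"      # both sides changed differently: needs manual review


def classify(baseline, municipal, osm):
    """Three-way comparison of one matched address (hypothetical helper).

    baseline  -- the record as it was originally imported, or None
    municipal -- the record in the current municipal data, or None
    osm       -- the record currently in OSM, or None
    """
    municipal_changed = municipal != baseline
    osm_changed = osm != baseline

    if not municipal_changed and not osm_changed:
        return Action.UNCHANGED
    if municipal_changed and not osm_changed:
        return Action.UPDATE_OSM
    if osm_changed and not municipal_changed:
        return Action.KEEP_OSM
    if municipal == osm:
        return Action.CONVERGED
    return Action.CONFLICT


# Address was imported, then the municipality renamed the street.
imported = {"addr:street": "Old Street", "addr:housenumber": "5"}
renamed = {"addr:street": "New Street", "addr:housenumber": "5"}
print(classify(imported, renamed, imported))

# Address was never imported: a mapper added it by hand with a
# different street than the municipality uses.
by_mapper = {"addr:street": "Other Street", "addr:housenumber": "5"}
print(classify(None, renamed, by_mapper))
```

With a stored baseline, your two scenarios no longer look the same: the first comes out as an update that can be applied to OSM, the second as a conflict that goes to a person.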
Thanks
Jason

On Sat, Jan 10, 2015 at 1:35 PM, Wiktor Niesiobedzki <o...@vink.pl> wrote:
> 2015-01-10 16:44 GMT+01:00 Jason Remillard <remillard.ja...@gmail.com>:
>> Hi Wiktor,
>>
>> I don't think an address tag is needed or desirable.
>>
>> The best way of doing this is to compare versions of the official data
>> (perhaps every 6 months), making a list of things that have changed so
>> that they can be examined in OSM.
>
> To compare only changes in the source, I need to know what was
> imported to OSM first. And without any reference in OSM, how do I
> guess the baseline? I could just check what's new for the last 6 months
> (for example, the previous half of the calendar year), but then we still
> need some tooling to verify whether someone actually did it during this
> time, and to present the backlog for specific areas. We have a very uneven
> distribution of mappers between geographic areas.
>
> And this way, I may fail to identify nodes that were deleted in the
> source (not all sources report deleted nodes).
>
>>
>> Of course the big issue is that the matching is not trivial. First
>> devise a matching score combining distance to the address and edit
>> distance in the address name and number. These scores are the weights.
>> Then use one of the weighted bipartite graph matching algorithms
>> (augmented path) that works well on sparse data. If you keep the
>> search radius down, the graph will be very sparse, so it should be
>> manageable. Using the match, you can get a list of nodes that have
>> been moved, deleted, and edited in the official data set.
>
> But how should I handle such a real scenario:
> - address is created in municipality
> - mapper adds it on a map
> - the script runs, sees a new address, finds a nearby address, but if
> there are some differences (different street or something like that) -
> should it update, or skip it? According to the rules so far, when I
> have a change in the source, I should update OSM, but this might not
> be the case here.
>
> And - from an algorithmic point of view it looks exactly the same as this scenario:
> - address is created in municipality
> - address is imported to OSM
> - street in the address is changed by the municipality
> - the script runs
>
>
> Cheers,
>
> Wiktor
>
> _______________________________________________
> talk mailing list
> talk@openstreetmap.org
> https://lists.openstreetmap.org/listinfo/talk