Hi Glenn,

I will respond to some of your points because they are relevant to my contributions in this thread. At the end of this email I also report on a survey of six stations that I did today to evaluate the quality of the API data.

As far as I can tell from my survey, the station names returned by the Villo! API in the "name" field are exactly what is shown at the stations themselves. (The website, on the other hand, only shows the "address" field, which contains a name that often, but not always, matches the "name" field.) The station names are not printed on the infrastructure: they only appear on the dynamic displays. (Only the reference number is physically printed on the station, along with "bonus" if it is a bonus station.)

The full official name of (most) stations, as reported by the "name" API field, follows the format of Yves' example: "076 - PLACE VAN MEENEN/VAN MEENENPLEIN". Of course, in OSM we want to split that into two (or three) components: ref and name (or ref, name:fr and name:nl). Note however that, unlike with the Antwerp Velo API data, this cannot be straightforwardly automated. There are multiple reasons for this.

First of all, names are in all-caps and (partially) stripped of accents, and turning them into properly capitalized names with no missing accents is nontrivial.

Second, many station names are misspelled or don't follow the standard OSM practice of expanding abbreviations (e.g. Place St Jean -> Place Saint-Jean).

Third, there is the problem of bilingual names: the Dutch name is sometimes missing even though a STIB/MIVB station nearby (or some street, or some building) has the exact same French name and an available Dutch translation. Moreover, in a couple of instances it is not so easy to split the French and Dutch names, for example "255 - SACRE-COEUR DE/HEILIGE HART VAN GANSHOREN".

Finally, names are limited to 50 characters, and we probably don't want to encode truncated names as-is even if that is the official name, for example "257 - PL MARGHERITE D'AUTRICHE / MARGARETHA VAN OO".

When I saw all those issues I decided to go through the list of station names and clean them up myself.
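
To illustrate why a purely mechanical split is not enough, here is a minimal Python sketch of the naive approach. The function name, the regular expression and the "needs_review" flag are my own assumptions for illustration; only the "NNN - FRENCH/DUTCH" layout and the 50-character limit come from the examples above.

```python
import re

# Hypothetical helper, not part of any existing tool. It assumes names look
# like "NNN - FRENCH/DUTCH", as in the examples quoted above.
NAME_RE = re.compile(r"^(?P<ref>\d+)\s*-\s*(?P<rest>.+)$")
MAX_LEN = 50  # names appear to be cut off at 50 characters


def split_station_name(raw):
    """Split an API "name" value into a ref and tentative French/Dutch parts."""
    raw = raw.strip()
    match = NAME_RE.match(raw)
    if match is None:
        return {"ref": None, "fr": None, "nl": None, "needs_review": True}
    parts = [part.strip() for part in match.group("rest").split("/")]
    return {
        "ref": match.group("ref"),
        "fr": parts[0],
        "nl": parts[1] if len(parts) == 2 else None,
        # Flag names without exactly one "/" and names that may be truncated.
        "needs_review": len(parts) != 2 or len(raw) >= MAX_LEN,
    }


print(split_station_name("076 - PLACE VAN MEENEN/VAN MEENENPLEIN"))
print(split_station_name("255 - SACRE-COEUR DE/HEILIGE HART VAN GANSHOREN"))
print(split_station_name("257 - PL MARGHERITE D'AUTRICHE / MARGARETHA VAN OO"))
```

The first name splits cleanly and the third is flagged because it hits the length limit. The second is not flagged even though the French half comes out as "SACRE-COEUR DE", without "GANSHOREN", so a check like this cannot replace a manual pass; the output is still all-caps as well.
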
I did a first pass using a dictionary I built from OSM street names to translate all-caps words into properly capitalized words with accents. Then I went through the list by hand to fix conversion mistakes and misspellings, and to provide Dutch translations where they were missing. The results are in my GitHub repository (see my previous message in this thread), and that is what I propose we use in the name, name:fr and name:nl tags.

I don't know how we can do QA on name tags given the quality of the source data, but at the very least we can store the official name (in all caps, maybe with the station number stripped off) in the official_name tag. That way we can easily compare that tag against the API in the event that the name changes. Sometimes the Villo! operators change the name to include a notice that the station is closed for works, but this can be filtered out, either by removing all text in parentheses or by ignoring name discrepancies on stations which are marked as "closed" (which is another field in the API).

Given that the API names are the same as the names displayed on-location, we can reliably use them for armchair mapping, so I wouldn't say the API "just sucks and we shouldn't use it". The API also reports station capacity and the possibility of card payment, which is also useful.

--------

I did a quick survey of six stations in Auderghem to compare the API data to reality. Three stations had wrong coordinates (wrong street block). I suppose they must have been correct at some point in the past, but the stations have been moved since. However, for two of the three wrongly located stations, the API "address" field pointed at the correct house numbers. The third station was not in front of a house, so the "address" field only gave the street name.

I checked the "banking", "bonus" and "bike_stands" fields, which all matched reality, as well as the sum of "available_bike_stands" and "available_bikes". Note that sometimes this sum is not equal to "bike_stands".
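
For anyone who wants to repeat that comparison on the whole station list, a minimal Python sketch follows. It assumes each station is available as a dict keyed by the API field names quoted above; the function names and the sample records are hypothetical, with made-up numbers.

```python
def stand_discrepancy(station):
    """Return bike_stands minus (available_bike_stands + available_bikes)."""
    in_service = station["available_bike_stands"] + station["available_bikes"]
    return station["bike_stands"] - in_service


def mismatched_stations(stations):
    """List (name, difference) for every station where the counts disagree."""
    return [
        (station["name"], stand_discrepancy(station))
        for station in stations
        if stand_discrepancy(station) != 0
    ]


# Made-up sample records, for illustration only (not real API output).
sample = [
    {"name": "A", "bike_stands": 20, "available_bike_stands": 8, "available_bikes": 12},
    {"name": "B", "bike_stands": 22, "available_bike_stands": 9, "available_bikes": 12},
    {"name": "C", "bike_stands": 25, "available_bike_stands": 15, "available_bikes": 14},
]
print(mismatched_stations(sample))  # [('B', 1), ('C', -4)]
```

The sign of the difference tells which side is larger, but what a mismatch actually means has to be checked on the ground.
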
I checked one of those stations (311 - Delta), where "bike_stands" is 22 but available stands + bikes is 21. This is explained by the fact that one of the stands is out of service, as indicated by a red light on the stand. Strangely, last time I checked, one station in the API (003 - Porte de Flandre / Vlaamsepoort) had four more available bikes + stands than "bike_stands", which makes no sense unless the station was upgraded without updating the API field "bike_stands". I did not survey that station.

As far as I could tell, the data reported on the interactive displays on the stations matches the API data exactly (including the wrong locations).

In conclusion, I think the "name" API field is perfectly OK to use after cleanup. The "banking" and "bonus" fields matched at the six stations surveyed. The "bike_stands" field seems to be static data, unlike "available_bike_stands" and "available_bikes", which are dynamic. The static data matched at my six surveyed stations, but it may be outdated in some instances (though only one station in the API shows signs of this). Meanwhile the dynamic data only counts stands that are in service rather than the actual number of physical stands. Therefore I think importing "bike_stands" data is also OK, as we only risk importing a few outdated counts (perhaps even only one) rather than plainly incorrect ones.

On the other hand, location data is clearly problematic and should only be used to guide mappers to the approximate location of the stations. Perhaps we can still import yet-unmapped stations with a "note" or "fixme" tag indicating that the location should be surveyed? Surely this is better than not mapping the station at all, and there are about a hundred missing stations in OSM.

Best,
Cédric
