Re: [OSM-dev] Disallowing certain characters in tag keys
On Sun, Oct 17, 2010 at 09:48:31AM +0200, Ulf Lamping wrote:
> On 16.10.2010 20:44, Jochen Topf wrote:
>> Hi!
>>
>> I am currently fighting some issues where tags with strange characters in them need to be represented in a URL for Taginfo. Lots of other websites probably will have similar issues. Characters like /, ?,, etc. have special meaning in URLs so if they appear in tags I can't have those tags in URLs. Sometimes escaping characters as %XX helps, sometimes not.
>>
>> And those problems are not confined to web pages and URLs only. Special characters that need escaping are often a problem.
>
> Yes, special characters can cause headaches. I remember this from my own tag analyzing experiments and other software projects ;-)
>
> I agree with you that most (all?) of them are (usually unintended) bugs. For example: not long ago, it was a common tag problem that keys started or ended with a space character. IIRC the xybot regularly fixes those bugs now.
>
> However, as those characters can be used in name values (and elsewhere), you have to handle special characters correctly in your software anyway, so I'm not sure if disallowing special characters in the key will really help us in that regard.
>
> The problem with disallowing special characters is that you close the door. Software writers will then write software that depends on them not being there (or not caring, which is probably the common case today). If we later find out that, for whatever reason, we want to use one of those characters, this will become extremely difficult, as it will cause trouble in many places in existing software.

That's absolutely true. That's why I am only proposing a very small list and don't include characters like {} that are not used now, but might make sense in the future.

> What we currently don't have is a guideline for mappers. I've been missing (and thinking of writing for some time) a "How to write good tags" guide. To my knowledge we don't have a written guide (not a rule) saying that we tend to use lowercase characters, underscores instead of spaces, and all those unwritten rules. Of course, this could include: don't use special characters like /, ?, ... in keys, because this makes things hard for software writers.

I agree that we should come up with these guidelines, but that's really a different issue.

> I tend to simply ignore keys with special characters, as we do today ...

Which works well for lots of software (like renderers, which don't care about the things they don't understand). Unfortunately it doesn't work for editors or something like Taginfo, which needs to work with *all* legal tags.

Jochen
-- 
Jochen Topf  joc...@remote.org  http://www.remote.org/jochen/  +49-721-388298
___
dev mailing list
dev@openstreetmap.org
http://lists.openstreetmap.org/listinfo/dev
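Ulf's suggested "How to write good tags" guideline could be backed by a lint check that only *warns*, so that editors and tools like Taginfo still accept every legal key. The following is a minimal Python sketch; `lint_key` and the `DISCOURAGED` character set are hypothetical illustrations, not an agreed OSM list (the thread does not pin down the exact characters).

```python
# Minimal sketch of a tag-key linter that warns about keys violating the
# unwritten conventions mentioned in the thread. It never rejects a key.
# DISCOURAGED is a hypothetical example set, not an agreed OSM list.
DISCOURAGED = set("/?&=#")


def lint_key(key: str) -> list[str]:
    """Return human-readable warnings for a tag key (empty list if none)."""
    warnings = []
    if key != key.strip():
        warnings.append("leading or trailing whitespace")
    if " " in key.strip():
        warnings.append("contains a space; underscores are conventional")
    if key != key.lower():
        warnings.append("contains uppercase characters")
    found = sorted(DISCOURAGED.intersection(key))
    if found:
        warnings.append("contains discouraged characters: " + " ".join(found))
    return warnings


if __name__ == "__main__":
    for k in ["highway", " name", "Building", "addr street", "a/b?"]:
        print(repr(k), lint_key(k))
```

Because the function returns warnings instead of raising, the same check could run in an editor (to nudge mappers) and in an analysis tool (to report oddities) without making any existing key unprocessable.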
Re: [OSM-dev] Disallowing certain characters in tag keys
On Sun, Oct 17, 2010 at 04:57:33PM -0400, Anthony wrote:
> On Sat, Oct 16, 2010 at 2:44 PM, Jochen Topf joc...@remote.org wrote:
>> Technically this would mean changing the API to check for those characters, removing any that are already in the database (can be done with normal manual edits because there are so few cases) and adding checks to the editors so that they can give meaningful error messages.
>
> To be clear, they'd still be in the database, in the history.
>
> Which is one implementation problem, because it means putting checks in more than one place. At the very least, the regular API and the Potlatch API, but there are probably multiple places within the regular API where things would need to be checked.

But that's far fewer places than all the software out there. The whole point of an API is that it's a sort of choke point, a single place where things can be checked.

> And then any software which relies on these changes wouldn't work with historical data.

That's a problem, you are right. We could solve that by faking the history. It wouldn't be the first time this has been done, so it would be possible. But most software out there only deals with current data, so even if we keep the history, writing that software would become easier.

> It could be done, but to do all that work just to make it easier to code Taginfo would be, in my opinion, a waste. Especially when there are plenty of simple solutions within Taginfo. If URL encoding is too painful, use a modified base64 encoding of the Unicode string (using - and _ instead of + and /).

It's not only Taginfo. All software out there would become easier to write. If this were a Taginfo-only problem, I wouldn't propose it. One of the biggest problems is that Taginfo doesn't work alone, but wants to work with other tools. If I use base64 encoding, then people would need to link to something like http://taginfo.openstreetmap.de/keys/aGlnaHdheQo= instead of http://taginfo.openstreetmap.de/keys/highway. And the link to XAPI would then not be http://www.informationfreeway.org/api/0.6/*[highway=*] but http://www.informationfreeway.org/api/0.6/*[aGlnaHdheQo==*]. Not very user friendly. And then every service would probably use a different encoding scheme...

I have actually thought about that and might offer a secondary interface to Taginfo using base64 or something like it if I can't avoid it. But that's really ugly and probably nobody would use it anyway, because nobody wants to write special cases for the few keys that use those characters and are bogus anyway.

> For cleaning up the keys, I'd want to strip down to as few characters as possible. There's no point supporting most Unicode characters - keys are supposed to be in English.

No. English people should be allowed to use their own language if they want to. So should speakers of every other language on the planet, too.

Jochen
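Anthony's alternative, URL-safe base64 with `-` and `_` in place of `+` and `/`, is available in Python's standard library as `base64.urlsafe_b64encode`. A minimal sketch with hypothetical helper names `encode_key`/`decode_key`. Two details worth noting: the bytes of `highway` encode to `aGlnaHdheQ==`, while the `aGlnaHdheQo=` in the message is the encoding of `highway` followed by a newline (what `echo highway | base64` produces); and the `=` padding would clash with XAPI's `[key=value]` syntax, so this sketch strips it and restores it on decoding.

```python
import base64


def encode_key(key: str) -> str:
    """URL-safe base64 of the UTF-8 key, with '=' padding stripped."""
    raw = base64.urlsafe_b64encode(key.encode("utf-8")).decode("ascii")
    return raw.rstrip("=")


def decode_key(token: str) -> str:
    """Inverse of encode_key: restore the padding, then decode."""
    padded = token + "=" * (-len(token) % 4)
    return base64.urlsafe_b64decode(padded).decode("utf-8")


if __name__ == "__main__":
    print(encode_key("highway"))           # aGlnaHdheQ
    print(base64.b64encode(b"highway\n"))  # b'aGlnaHdheQo='
```

As Jochen points out, the encoded form is opaque to humans, which is the main cost of this design compared with percent-encoding.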
Re: [OSM-dev] Disallowing certain characters in tag keys
On 16/10/10 19:44, Jochen Topf wrote:
> I am currently fighting some issues where tags with strange characters in them need to be represented in a URL for Taginfo. Lots of other websites probably will have similar issues. Characters like /, ?,, etc. have special meaning in URLs so if they appear in tags I can't have those tags in URLs. Sometimes escaping characters as %XX helps, sometimes not.
>
> And those problems are not confined to web pages and URLs only. Special characters that need escaping are often a problem.

I really don't understand the problem here: as far as I know, all characters can be used in URLs so long as they are properly escaped. If your server software is not coping with that for some reason, then I think it's a bug.

As a test I just created a file called '<>&+?#;%.html' in an Apache-served directory and then asked Firefox to fetch:

http://server/%3c%3e%26%2b%3f%23%3b%25.html

and it was retrieved just fine.

Tom
-- 
Tom Hughes (t...@compton.nu)
http://compton.nu/
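Tom's test can be reproduced with Python's `urllib.parse`. One pitfall is visible immediately: `quote` treats `/` as safe by default, so `safe=""` is needed to escape a slash that is part of a key. The `key_url` helper and the `/keys/` path layout are illustrative assumptions modelled on the Taginfo URLs quoted earlier in the thread.

```python
from urllib.parse import quote, unquote

# Decode the path Tom requested; hex digits are case-insensitive.
print(unquote("%3c%3e%26%2b%3f%23%3b%25"))  # <>&+?#;%

# Encode in the other direction. quote() leaves "/" alone unless safe="".
print(quote("<>&+?#;%", safe=""))  # %3C%3E%26%2B%3F%23%3B%25
print(quote("a/b"))                 # a/b
print(quote("a/b", safe=""))        # a%2Fb


def key_url(base: str, key: str) -> str:
    """Hypothetical helper: build a /keys/<key> URL with the key fully escaped."""
    return base.rstrip("/") + "/keys/" + quote(key, safe="")
```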
Re: [OSM-dev] Disallowing certain characters in tag keys
On Tue, Oct 19, 2010 at 10:06:15AM +0100, Tom Hughes wrote:
> On 16/10/10 19:44, Jochen Topf wrote:
>> I am currently fighting some issues where tags with strange characters in them need to be represented in a URL for Taginfo. Lots of other websites probably will have similar issues. Characters like /, ?,, etc. have special meaning in URLs so if they appear in tags I can't have those tags in URLs. Sometimes escaping characters as %XX helps, sometimes not.
>>
>> And those problems are not confined to web pages and URLs only. Special characters that need escaping are often a problem.
>
> I really don't understand the problem here - as far as I know all characters can be used in URLs so long as they are properly escaped. If your server software is not coping with that for some reason then I think it's a bug.

That might well be a bug. But those bugs crop up all the time, because these things are hard to do and because the specs are not as clear as they should be. I am not saying these things can't be done right, but wouldn't it be nice if we could get rid of that problem instead of everybody writing software for OSM having to make sure all those cases are handled properly?

> As a test I just created a file called '<>&+?#;%.html' in an apache served directory and then asked Firefox to fetch:
>
> http://server/%3c%3e%26%2b%3f%23%3b%25.html
>
> and it was retrieved just fine.

And now try the same thing again, creating a filename with a '/' in it, and see whether it works this time. It doesn't, because '/' is special for Unix (and HTTP), and you need to create a directory with the first part of your name and then the second part as a file. If you actually wanted to create one file for every key in the OSM database in your filesystem, you'd have a problem. Your example basically proves my point. :-)

Jochen
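One way to get "one file per key" even though `/` is a path separator is to percent-encode the whole key, including `%` itself, into the file name; because every `%` in the result then introduces an escape, the mapping stays reversible. A minimal sketch with hypothetical helper names; it does not address other filesystem limits such as case-insensitive filesystems or maximum name lengths.

```python
import os
import tempfile
from urllib.parse import quote, unquote


def key_to_filename(key: str) -> str:
    """Escape every character outside [A-Za-z0-9_.~-] so '/' never splits paths."""
    name = quote(key, safe="")
    # "." and ".." are special directory entries; escape a leading dot.
    if name.startswith("."):
        name = "%2E" + name[1:]
    return name


def filename_to_key(name: str) -> str:
    """Inverse of key_to_filename."""
    return unquote(name)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        for key in ["highway", "a/b", "<>&+?#;%"]:
            path = os.path.join(d, key_to_filename(key) + ".html")
            with open(path, "w", encoding="utf-8") as f:
                f.write(key)
        print(sorted(os.listdir(d)))
```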
Re: [OSM-dev] Disallowing certain characters in tag keys
On Tue, Oct 19, 2010 at 10:25 AM, Jochen Topf joc...@remote.org wrote:
> On Tue, Oct 19, 2010 at 10:06:15AM +0100, Tom Hughes wrote:
>> On 16/10/10 19:44, Jochen Topf wrote:
>>> I am currently fighting some issues where tags with strange characters in them need to be represented in a URL for Taginfo. Lots of other websites probably will have similar issues. Characters like /, ?,, etc. have special meaning in URLs so if they appear in tags I can't have those tags in URLs. Sometimes escaping characters as %XX helps, sometimes not.
>>>
>>> And those problems are not confined to web pages and URLs only. Special characters that need escaping are often a problem.
>>
>> I really don't understand the problem here - as far as I know all characters can be used in URLs so long as they are properly escaped. If your server software is not coping with that for some reason then I think it's a bug.
>
> That might well be a bug. But those bugs creep up all the time, because these things are hard to do and because the specs are not as clear as they should be. I am not saying these things can't be done right, but wouldn't it be nice if we can get rid of that problem instead of everybody writing software for OSM having to make sure all those cases are handled properly?
>
>> As a test I just created a file called '<>&+?#;%.html' in an apache served directory and then asked Firefox to fetch:
>>
>> http://server/%3c%3e%26%2b%3f%23%3b%25.html
>>
>> and it was retrieved just fine.
>
> And now try the same thing again creating a filename with a '/' in it and see whether it works this time. It doesn't, because '/' is special for Unix (and HTTP) and you need to create a directory with the first part of your name and then the second as file. If you would actually want to create one file for every key in the OSM database in your filesystem, you'd have a problem. You example basically proves my point. :-)

No, it really doesn't.

Let's put it this way: there is a subset[1] of Unicode code points that is valid for both keys and values. If you find any characters emitted by OSM that lie outwith that range, then do let us know[3]. But we've taken great care to permit all other code points in both keys and values alike, since we've no idea when someone is going to need them.

Your example of why we need (and presumably ) is actually a great example that undermines your point. Some of these characters need escaping for particular purposes. If you find a Unicode character that cannot be URL-encoded[2], then do let us know. Or if you find another encoding scenario which can only encode a subset of Unicode code points, let us know.

Your application should be able to handle every valid input. You've found that your application is buggy, and now you're asking for the input to be changed. But just the keys, not the values, and only current data, not historical data. It seems a bit ... weird. And your original list of characters is completely arbitrary, not based on any formal specification as far as I can see.

If your editor can't handle all necessary characters, fix the editor. If your application can't handle all the characters, fix the application. And if you find dealing with or = or in a key to be hard, it's probably worth taking some time to test with non-BMP characters.

(If you later find that having ');DROP DATABASE;-- in a key or value is breaking your database inserts, then please don't ask for these characters to be banned too!)

Thanks,
Andy

[1] See http://www.w3.org/TR/2008/REC-xml-20081126/#charsets
[2] http://en.wikipedia.org/wiki/Urlencode - / is %2f, by the way.
[3] But you shouldn't rely on it, and should program defensively anyway. Not all OSM files are generated by the API.
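Andy's advice to "handle every valid input" can be turned into a small round-trip test. The sketch below uses a hypothetical list of awkward keys (URL-special characters, the non-BMP code point U+1D11E, and an SQL-injection-looking string) and checks that each survives percent-encoding and an SQLite insert/select. Binding values through `?` placeholders in Python's `sqlite3` module means a key such as `');DROP DATABASE;--` is stored as data, never spliced into SQL text.

```python
import sqlite3
from urllib.parse import quote, unquote

# Hypothetical test corpus: URL-special characters, a non-BMP code point
# (U+1D11E MUSICAL SYMBOL G CLEF) and an SQL-injection-looking string.
AWKWARD_KEYS = [
    "highway",
    "a/b",
    "k=v&x",
    "name:\U0001D11E",
    "');DROP DATABASE;--",
]


def roundtrip_url(key: str) -> str:
    """Percent-encode every reserved character, then decode again."""
    return unquote(quote(key, safe=""))


def roundtrip_db(keys: list[str]) -> list[str]:
    """Store keys in an in-memory SQLite table and read them back in order."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE tags (k TEXT NOT NULL)")
        # Parameter binding: the driver never splices the key into SQL text.
        conn.executemany("INSERT INTO tags (k) VALUES (?)", [(k,) for k in keys])
        return [row[0] for row in conn.execute("SELECT k FROM tags ORDER BY rowid")]
    finally:
        conn.close()


if __name__ == "__main__":
    for key in AWKWARD_KEYS:
        assert roundtrip_url(key) == key, key
    assert roundtrip_db(AWKWARD_KEYS) == AWKWARD_KEYS
    print("all", len(AWKWARD_KEYS), "keys survived")
    print(quote("\U0001D11E", safe=""))  # %F0%9D%84%9E
```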
Re: [OSM-dev] Disallowing certain characters in tag keys
On 19/10/10 10:25, Jochen Topf wrote:
> On Tue, Oct 19, 2010 at 10:06:15AM +0100, Tom Hughes wrote:
>> As a test I just created a file called '<>&+?#;%.html' in an apache served directory and then asked Firefox to fetch:
>>
>> http://server/%3c%3e%26%2b%3f%23%3b%25.html
>>
>> and it was retrieved just fine.
>
> And now try the same thing again creating a filename with a '/' in it and see whether it works this time. It doesn't, because '/' is special for Unix (and HTTP) and you need to create a directory with the first part of your name and then the second as file. If you would actually want to create one file for every key in the OSM database in your filesystem, you'd have a problem.

Sure, if you have a slash then, for static files served from Unix, that would have to correspond to a directory separator. That's a Unix file-naming limitation, though. In a dynamic application where you are decoding the path information yourself and deciding what it means, there is no such restriction.

Tom
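Tom's point about dynamic applications can be sketched as a handler that matches the route on the *raw*, still percent-encoded path and only then decodes the key, so `%2F` becomes a `/` inside the key rather than a path separator. The `/keys/` prefix and `key_from_path` are hypothetical. Whether a given web server or framework hands the application that raw path at all is a separate question (Apache, for example, refuses URLs containing `%2F` in the path unless `AllowEncodedSlashes` is enabled), which is the kind of pitfall Jochen is describing.

```python
from typing import Optional
from urllib.parse import unquote, urlsplit

PREFIX = "/keys/"  # hypothetical route prefix


def key_from_path(raw_url: str) -> Optional[str]:
    """Extract and decode the key from a URL such as /keys/a%2Fb.

    Routing is done on the raw (percent-encoded) path, so an encoded
    slash inside the key is not mistaken for a path separator.
    """
    path = urlsplit(raw_url).path  # urlsplit does not percent-decode
    if not path.startswith(PREFIX):
        return None
    encoded = path[len(PREFIX):]
    if not encoded or "/" in encoded:
        return None  # a literal "/" means a different, unknown route
    return unquote(encoded)
```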
Re: [OSM-dev] Disallowing certain characters in tag keys
On Tue, Oct 19, 2010 at 6:52 AM, Andy Allan gravityst...@gmail.com wrote:
> Let's put it this way - there is a subset[1] of unicode code points that is valid for both keys and values. If you find any characters emitted by OSM that lie outwith that range, then do let us know[3]

Even if they're only in the history? Last I checked (a couple of months ago), there were quite a few invalid characters in the history (1). Would you like the list (it seems like something which would be easy for you to generate yourself)? If so, is something going to be done about them?

(1) For example, see the last character in the comment at http://www.openstreetmap.org/api/0.6/changeset/936207
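Scanning stored keys, values or changeset comments for characters outside the XML 1.0 `Char` production that Andy cites is straightforward. A minimal sketch with hypothetical helper names; the allowed ranges are #x9, #xA, #xD, #x20-#xD7FF, #xE000-#xFFFD and #x10000-#x10FFFF, and the sample comment is made up for illustration.

```python
def is_xml_char(cp: int) -> bool:
    """True if code point cp matches the XML 1.0 Char production."""
    return (
        cp in (0x9, 0xA, 0xD)
        or 0x20 <= cp <= 0xD7FF
        or 0xE000 <= cp <= 0xFFFD
        or 0x10000 <= cp <= 0x10FFFF
    )


def invalid_xml_chars(text: str) -> list[tuple[int, str]]:
    """Return (index, 'U+XXXX') for every character XML 1.0 does not allow."""
    return [
        (i, f"U+{ord(ch):04X}")
        for i, ch in enumerate(text)
        if not is_xml_char(ord(ch))
    ]


if __name__ == "__main__":
    comment = "fixed roads\x01"  # hypothetical comment ending in a control char
    print(invalid_xml_chars(comment))
```

Run over a history dump, a check like this would produce exactly the kind of list Anthony offers to share.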