Re: [OSM-dev] Restrict key names on order to retain reusability of OSM
On Fri, Feb 15, 2008 at 12:57:41AM +0100, Stefan Keller wrote:
> model and will never evolve or be re-imported from other databases. Users will be 'surprised' when they miss their data on the map, as with 'Tunnel ' instead of 'Tunnel', or with things like '¨name'='Südstrasse'. My proposal would eliminate that.

In OSM this problem is solved not by restricting what people can tag but by having tools like the JOSM validator plugin and Maplint that will tell you if there is data that looks wrong to the computer according to some criteria. It is the job of a human then to decide whether it actually is wrong in his opinion and fix it. I expect these tools to improve over time and eventually find most cases where people have accidentally used the wrong tag. With this solution we keep the default openness: People who want to use unusual tags on purpose are not restricted.

> Now obviously it's about geographic data as said in the OSM homepage and it's about databases and it's hopefully not HTML. The model you chose is a sort of meta model which is ok for initial capturing but not optimal for post-processing and showing it in maps - and the difference (from your point of view) is just restricting key characters to some smaller set!

That's exactly what we are doing and want to be doing: We are capturing data in the most flexible way possible. Everybody who wants to *use* the data for something has to pick the subset of the data he is interested in and can convert it to any format he likes best and that is suited for his needs. Of course there are ways to store subsets of the data in more efficient forms and many people do that, but everybody has different needs and so needs different subsets of the data and in different forms. The needs of somebody drawing a map are very different from those of somebody doing routing, for instance. The OSM data model is not designed to be efficient, it is designed to be flexible.
Jochen -- Jochen Topf [EMAIL PROTECTED] http://www.remote.org/jochen/ +49-721-388298 ___ dev mailing list dev@openstreetmap.org http://lists.openstreetmap.org/cgi-bin/mailman/listinfo/dev
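The kind of check Jochen describes (a tool flags what looks wrong, a human decides) can be sketched in a few lines. This is only an illustration and not how the JOSM validator or Maplint actually work; the function name `lint_tag` and the two rules are assumptions chosen to catch the 'Tunnel ' and '¨name' cases quoted above.

```python
import unicodedata


def lint_tag(key: str, value: str) -> list[str]:
    """Return warnings for one tag; an empty list means nothing looked odd.

    Hypothetical rules for illustration only. Nothing is rejected:
    the caller (a human mapper) decides whether a warning is a real error.
    """
    warnings = []
    for label, text in (("key", key), ("value", value)):
        # Catches accidental padding such as 'Tunnel ' instead of 'Tunnel'.
        if text != text.strip():
            warnings.append(f"{label} has leading or trailing whitespace")
    # Unicode general categories starting with 'L' are letters in any script,
    # so non-ASCII keys such as 'construcción' pass this check.
    if key and not unicodedata.category(key[0]).startswith("L"):
        warnings.append("key does not start with a letter")
    return warnings


if __name__ == "__main__":
    for k, v in [("highway", "residential"), ("tunnel", "Tunnel "), ("\u00a8name", "Südstrasse")]:
        print(repr(k), repr(v), lint_tag(k, v))
```

Because the function only reports and never rejects, unusual tags used on purpose stay possible, which is the openness argued for above.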
Re: [OSM-dev] Restrict key names on order to retain reusability of OSM
Look: This seems to be a debate about unstructured, semi-structured and structured data. What you're celebrating is something around semi-structured data.

Marcus reminded me that OSM allows a building to be more than just a shop: also a gas station, a fuel station, lit, a car wash, etc. That's right, but keep in mind that I *don't* propose to change the current internal OSM schema and I *won't* restrict the number of keys on your side. I propose only to restrict the character set of keys.

I'm showing a use case where parts of the OSM data get out into a usual application environment, and I hope OSM wasn't meant to stay stuck in its free but closed world. So let's say for my application I'm happy with one attribute (type=shop) or two. You know that in a more application-oriented environment you do stick to a schema with a distinct number of attributes and take care of each of them.

Having said that, if you want multivalued attributes and lossless data exchange, there are many possibilities in my example schema to either sync back based on the ID or store all the additional key/values you may receive from the users (e.g. in a attribute). This again comes at the price that there is no usable index in the application schema (except for the ).

Rob wrote:
> You use building=shop in your examples but there is absolutely nothing to stop someone using (apologies in advance for my lack of language skills) construcción=almacén or ??=?.

Good example: How to make a map of more than one country when you and I can't interpret construcción and ?? being buildings?
Re: [OSM-dev] Restrict key names on order to retain reusability of OSM
2008/2/15 Stefan Keller [EMAIL PROTECTED]:
> Good example: How to make a map of more than one country when you and I can't interpret construcción and ?? being buildings?

How do you make a map of China without being able to use Mandarin characters?

--
Nick Black
http://www.blacksworld.net
Re: [OSM-dev] Restrict key names on order to retain reusability of OSM
On Friday 15 February 2008 08:32:17 Stefan Keller wrote:
> Marcus reminded me that OSM allows a building to be more than just a shop: also a gas station, a fuel station, lit, a car wash, etc. That's right, but keep in mind that I *don't* propose to change the current internal OSM schema and I *won't* restrict the number of keys on your side. I propose only to restrict the character set of keys.

That doesn't make technical sense. All db systems that I know of can cope with UTF-8 everywhere. Certainly MySQL allows me to create UTF-8 table names, column names, and data.

I can understand wanting to restrict the usage of tags to 'highway' rather than 'road', or some translation of 'highway', which would allow import into a more structured db system, if required. But that doesn't mean that one of those 'allowed key names' couldn't be in Mandarin.

Perhaps what we need is some sort of translation system that can translate the Mandarin for 'highway' so that renderers don't have to implement all the translations for 'highway' in their rules.

Maybe, too, the mapping tools can warn people of keys that will not currently render. Users can then decide whether the key is a misspelling or whether the key name is correct for their needs.

Alex Wright.
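The translation idea and the "will not render" warning could look roughly like the sketch below. Everything in it is hypothetical: the alias table, the function names, and the set of rendered keys are made up for illustration (the Spanish entry comes from Rob's example in this thread, the Chinese one is just an example translation), and no existing OSM tool is claimed to work this way.

```python
# Hypothetical alias table: localized key -> canonical key used by renderers.
KEY_ALIASES = {
    "construcción": "building",  # Spanish example taken from this thread
    "公路": "highway",            # illustrative Chinese entry
}


def canonical_tags(tags: dict[str, str]) -> dict[str, str]:
    """Translate known localized keys; unknown keys pass through unchanged."""
    return {KEY_ALIASES.get(key, key): value for key, value in tags.items()}


def unrendered_keys(tags: dict[str, str], rendered_keys: set[str]) -> list[str]:
    """List keys that no rendering rule covers, for an editor to warn about.

    The mapper then decides: misspelling, or a key used on purpose.
    """
    return sorted(key for key in canonical_tags(tags) if key not in rendered_keys)


if __name__ == "__main__":
    rendered = {"highway", "building"}
    print(canonical_tags({"construcción": "almacén"}))
    print(unrendered_keys({"公路": "primary", "hihgway": "primary"}, rendered))
```

With a table like this, renderer rules only need the canonical key, while mappers keep typing keys in their own language. Collisions (a localized key and its canonical form on the same object) are not handled here.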
Re: [OSM-dev] Restrict key names on order to retain reusability of OSM
In the model I showed, and which I would use for such purposes, sample queries would look like this:

  # select * from building;  -- was: where type='shop'
  # select * from streets;   -- where the street type value can be anything

Here buildings (= table = keyname) get a building point symbol/style and streets (= table = keyname) get a street line map symbol/style. How would you do it better in OSM than with unicolored node and line symbols?

- S.

2008/2/15, Nick Black [EMAIL PROTECTED]:
> How do you make a map of China without being able to use Mandarin characters?
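For readers who want to try the "one table per key name" export discussed in this message, here is a minimal sketch using Python's built-in sqlite3 module. The generic `tags` table, its column names, and the `export_key` helper are assumptions for illustration, not part of any OSM tool. Note that the ASCII-identifier check lives in the exporting application, so the source data can stay unrestricted.

```python
import sqlite3


def export_key(conn: sqlite3.Connection, key: str) -> None:
    """Copy all rows tagged with `key` into a table named after that key.

    The application (not the source data) decides which keys it accepts
    as table names; here that is any ASCII identifier.
    """
    if not (key.isascii() and key.isidentifier()):
        raise ValueError(f"not exported as a table: {key!r}")
    # Safe to interpolate: the check above rules out quotes and spaces.
    # Double quotes keep SQL keywords usable as table names.
    conn.execute(f'CREATE TABLE "{key}" (id INTEGER NOT NULL, type TEXT NOT NULL)')
    conn.execute(f'INSERT INTO "{key}" (id, type) SELECT id, v FROM tags WHERE k = ?', (key,))


conn = sqlite3.connect(":memory:")
# Generic key/value storage, loosely modelled on the thread's way_tags table.
conn.execute("CREATE TABLE tags (id INTEGER NOT NULL, k TEXT NOT NULL, v TEXT NOT NULL)")
conn.executemany(
    "INSERT INTO tags VALUES (?, ?, ?)",
    [(1, "building", "shop"), (2, "building", "church"), (3, "highway", "residential")],
)

export_key(conn, "building")
rows = conn.execute('SELECT * FROM "building" ORDER BY id').fetchall()
print(rows)
```

A key such as construcción is simply skipped (or mapped to another name) by this particular consumer, which is the position most replies in the thread take: the restriction belongs in the export step.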
Re: [OSM-dev] Restrict key names on order to retain reusability of OSM
> It is no harder for me to add construcción(spanish) to the renderer than

So, you're after a moving target? Not so in my example schema of exported OSM data.

And in the end, all this just because of a reluctance to restrict key names to ASCII without spaces or so?

- S.

2008/2/15, Rob Reid [EMAIL PROTECTED]:
> Stefan Keller wrote the following on 15/02/2008 21:32:
> > That's right, but keep in mind that I *don't* propose to change the current internal OSM schema and I *won't* restrict the number of keys on your side. I propose only to restrict the character set of keys.
> >
> > So let's say for my application I'm happy with one attribute (type=shop) or two. You know that in a more application-oriented environment you do stick to a schema with a distinct number of attributes and take care of each of them.
> >
> > Rob wrote:
> > > You use building=shop in your examples but there is absolutely nothing to stop someone using (apologies in advance for my lack of language skills) construcción=almacén or ??=?.
> >
> > Good example: How to make a map of more than one country when you and I can't interpret construcción and ?? being buildings?
>
> If someone wants to start using construcción as a key and wants it to be rendered in the standard maps, then the process is no different from that for any other new key: they have to add it to the rendering rules for the various renderers, or ask for it to be added and specify how it should appear.
>
> Just because I don't speak Spanish does not mean I'm unable to add 'construcción' as a rule to one of the renderers, and the renderers don't care what language it is. It is no harder for me to add construcción(spanish) to the renderer than it would be to add construktion(german). Our current system would allow both; your proposal would allow one and not the other.
>
> Cheers
> rcr
Re: [OSM-dev] Restrict key names on order to retain reusability of OSM
Stefan,

What exactly is the problem you're trying to solve? It seems to me you're on a mission :)

Here is what you should do:

A. If you have the skills, get the data and convert it into whatever format/schema is good for you.
B. If you haven't got the skills, _fund_ someone to help you convert the data into whatever format is good for you.
C. If none of the above, go and acquire either A or B.

Cheers
Artem

On 15 Feb 2008, at 11:28, Stefan Keller wrote:
> So, you're after a running target? Not so in my example schema of exported OSM data. And finally all this just because being reluctant to restrict key names to ASCII without space or so?
Re: [OSM-dev] Restrict key names on order to retain reusability of OSM
On Fri, Feb 15, 2008 at 01:53:43PM +0100, Martijn van Oosterhout wrote:
> VALUES ( 3029222, 'LINESTRING()', 'key=value, key=value, ...' );

^^

I'm all for completely redoing the data model every once in a while, but I'd suggest that you prepare a complete proposal in that case, including answers to the skeptics who'd like to ask 'How does that work with junctions?'

Gabriel.
Re: [OSM-dev] Restrict key names on order to retain reusability of OSM
On Fri, Feb 15, 2008 at 2:22 PM, Gabriel Ebner [EMAIL PROTECTED] wrote:
> I'm all for completely redoing the data model every once in a while, but I'd suggest that you prepare a complete proposal in that case, including answers to the skeptics who'd like to ask 'How does that work with junctions?'

Umm, sorry. I'm not talking about the main DB, I'm talking about what users might want. Map generators like mapnik don't care about junctions, so the above model doesn't include them. If you want to store them too, add an ARRAY OF INTEGER in addition to or instead of the geometry. They're as easily indexable as the attributes or geometries. In that case you could store a BBOX instead of the actual linestring.

My point was: the fact that we don't restrict key values doesn't make it hard on users or hard for databases to deal with. Databases are good at what they do, so let them do it...

Have a nice day,
--
Martijn van Oosterhout [EMAIL PROTECTED] http://svana.org/kleptog/
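To make the "geometry plus optional node list" idea above concrete, here is a small in-memory sketch. It is not the schema from the earlier message (only its VALUES fragment is quoted here); the class name, field names, and sample coordinates are hypothetical. The node-id list stands in for the ARRAY OF INTEGER, and two ways that share a node id form a junction.

```python
from dataclasses import dataclass, field


@dataclass
class ExportedWay:
    """One exported way: geometry, free-form tags, optional node ids."""
    id: int
    coords: list[tuple[float, float]]          # (lon, lat) pairs of the linestring
    tags: dict[str, str]                       # keys may be any UTF-8 string
    node_ids: list[int] = field(default_factory=list)  # only needed for junctions

    def bbox(self) -> tuple[float, float, float, float]:
        """Bounding box (min_lon, min_lat, max_lon, max_lat) of the geometry."""
        lons = [lon for lon, _ in self.coords]
        lats = [lat for _, lat in self.coords]
        return (min(lons), min(lats), max(lons), max(lats))


def share_node(a: ExportedWay, b: ExportedWay) -> bool:
    """Two ways meet at a junction if they reference a common node id."""
    return bool(set(a.node_ids) & set(b.node_ids))


if __name__ == "__main__":
    a = ExportedWay(3029222, [(8.40, 49.00), (8.41, 49.01)], {"highway": "residential"}, [10, 11])
    b = ExportedWay(2, [(8.41, 49.01), (8.42, 49.00)], {"公路": "primary"}, [11, 12])
    print(a.bbox(), share_node(a, b))
```

A renderer would ignore `node_ids` entirely, while a router would use them and might keep only the bounding box for spatial lookups.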
Re: [OSM-dev] Restrict key names on order to retain reusability of OSM
As Frederik Ramm got it, it's about this:
> There is a reason why our data format has <tag k="keyname" v="value" /> instead of <keyname>value</keyname> and that reason is allowing non-XML stuff in the key names. Or at least it seems like that could have been a reason. I hope nobody's going to come forward and tell me it was just decided over a few beers ;-)

All the tools/languages I use (xerces/xalan/Java) can cope with UTF-8. As I tried to explain, GML is *not* the issue; strictly speaking, it's *not even* about XML. I'm just trying to say that you have given an accidental freedom to key names here, while for values UTF-8 is all right, which means you can give street names as exotic as you wish. The consequences of this are problematic for all the more database-schema-oriented applications.

It's about databases, XML and other formats outside the OSM farm you described which don't rely on this fancy general meta schema, and it goes like this:

  TABLE way_tags (
    geometry ST_GEOMETRY,
    id bigint(64) NOT NULL,
    keyname varchar(255) NOT NULL,
    value varchar(255) NOT NULL  -- type
  )

which typically becomes this:

  TABLE highway (  -- former way keyname in UTF-8???
    geometry ST_GEOMETRY,
    id bigint(64) NOT NULL,
    type varchar(255) NOT NULL,  -- e.g. primary, footway, residential, unclassified
    name varchar(255) NOT NULL
  )

  TABLE pointsofinterest (  -- former node keyname in UTF-8???
    geometry ST_GEOMETRY,
    id bigint(64) NOT NULL,
    type varchar(255) NOT NULL  -- e.g. level_crossing, rail, station, viaduct
  )

  TABLE whateveryourkeynamewillbe ( ...

> So to recap: The current allowable characters in OSM tag names is UTF8 - Deal with it, instead of trying to impose limitations into OSM to make OSM data comply with YOUR requirements.

It's not 'my' requirement, it's about best practices.
The use cases are e.g.:
* UMN MapServer (which is at least as capable as all known OSM renderers)
* Almost any other XML format/XML Schema
* Any other database, any GIS

So to recap: It's about a few limits within the scope of OSM on a single meta attribute (keyname) while retaining the GOOD thing of UTF-8 values, in order to avoid limited reusability and broken compatibility of OSM data beyond its current well-known tools. And best of all: users obviously have not so far needed the kind of freedom you are arguing for. It's simply easier to release this small constraint afterwards than the other way round!

Stefan

2008/2/13, J.D. Schmidt [EMAIL PROTECTED]:
> Stefan Keller wrote:
> > You are right that XML names (= keys/tags) are valid in Unicode, in which case the encoding of the whole XML document (exchange file) must support this. But you know well that many tools have problems with non-ASCII XML element and attribute names (for content/value UTF-8 is ok since chars can be escaped)! So, my last 20 cents for valid key names before I give up is the following: 'aAbBcCdDeEfFgGhHiIjJkKlLmMnNoOpPqQrRsStTuUvVwWxXyYzZ_-.0123456789' whereas such qualified names must begin with a letter and contain at most one colon and have at most a length of 255.
>
> Stefan, if I am coming across in this message as a bit harsh, then you're not mistaken - I am a grumpy old man, and damn proud of it. Just remember, it's not personal. I try to go after the ball, not the man. (No FakeSteveC, it doesn't mean I try to go after a guy's ball(s) in THAT way..)
>
> Three times you have posted that you want to limit the characters used in tag naming, revising your proposal first to include the colon, and now to include numbers. On each previous attempt you have been told that UTF8 is valid, for good reasons, and yet still you persist. You have not once given a valid TECHNICAL reason for such a change, WITHIN THE SCOPE OF OSM, for limiting the characters allowable in tag names.
> As far as I can see from your first message on this subject, your idea stems from converting OSM data from its XML format to GML. Your project might need GML, OSM doesn't. If you are in need of GML-compliant output, then it is your task to massage the OSM-provided data into GML-compliant output. It is not the task of OSM to have the data in a GML-compliant format, since the XML format with UTF8 as allowable just plain works for OSM.
>
> The tools that you state have problems with non-ASCII characters should be fixed to be able to handle the UTF8 characters. Not the other way around, by changing the dataset to comply with the requirements of the tools.
>
> You might think it's a hen and egg situation, although in this case, the egg definitely is the important part, and has priority. The egg (the data) in this case has attributes that can contain non-ASCII characters, thereby allowing non-Latin-based nationalities to define their own tags in their own language. This is a GOOD thing, which should NOT be changed. The hen (tools and programs utilizing OSM data) must take this into account. If a tool can't do that, then the farmer (the user of that tool) have
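The key-name rule quoted in this message (letters, digits, '_', '-', '.', starting with a letter, at most one colon, at most 255 long) is easy to state as a pattern, which also makes it easy to see what it would exclude. The sketch below is one possible reading: it assumes the colon must separate two non-empty parts and that the length limit counts characters, neither of which the proposal spells out. The name `is_restricted_key` is made up.

```python
import re

# One reading of the proposed rule: an ASCII letter first, then letters,
# digits, '_', '.', '-', optionally followed by ':' and a second non-empty part.
KEY_PATTERN = re.compile(r"[A-Za-z][A-Za-z0-9_.-]*(?::[A-Za-z0-9_.-]+)?")


def is_restricted_key(key: str) -> bool:
    """True if `key` satisfies the proposed ASCII-only key-name rule."""
    return len(key) <= 255 and KEY_PATTERN.fullmatch(key) is not None


if __name__ == "__main__":
    for key in ["highway", "name:de", "Tunnel ", "construcción", "a:b:c"]:
        print(repr(key), is_restricted_key(key))
```

Under this reading 'highway' and 'name:de' pass, while 'Tunnel ' (trailing space), 'construcción' (non-ASCII letter) and 'a:b:c' (two colons) fail. The same function could just as well run in a consumer's import step rather than in OSM itself.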
Re: [OSM-dev] Restrict key names on order to retain reusability of OSM
On Feb 13, 2008 8:47 AM, Stefan Keller [EMAIL PROTECTED] wrote:
> which typically becomes this:
>
>   TABLE highway (  -- former way keyname in UTF-8???
>     geometry ST_GEOMETRY,
>     id bigint(64) NOT NULL,
>     type varchar(255) NOT NULL,  -- e.g. primary, footway, residential, unclassified
>     name varchar(255) NOT NULL
>   )

It's a table name. You define this schema yourself, so you can call the table whatever you like. If the database doesn't accept the tag name as a valid table name, you call it something slightly different. If you're dynamically generating tables (probably a bit dodgy), then you come up with some way of encoding the name. A trivial way of doing this is to just use the UTF-8 hex; you can probably come up with something more intelligent.

This isn't a problem, it's a simple programming exercise, and one that you're going to have to solve anyway even with your restrictions, as you'd know if you've ever tried to run the following query in postgres:

  create table natural ( id int NOT NULL, type text NOT NULL );

There's actually a very simple solution to this in postgres, as it is quite happy with some very odd table names as long as you quote them (including spaces, UTF8 etc).

So no, I still don't buy it. We don't /need/ any restrictions so why impose them?

Dave
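Both workarounds mentioned in this message (encode the key as UTF-8 hex, or quote the identifier) fit in a few lines. This is a sketch only: the helper names and the `k_` prefix are made up, and the quoting follows the standard SQL rule for delimited identifiers (wrap in double quotes, double any embedded double quote), which PostgreSQL accepts.

```python
def hex_table_name(key: str, prefix: str = "k_") -> str:
    """Encode any tag key as an ASCII-only table name via its UTF-8 hex."""
    return prefix + key.encode("utf-8").hex()


def decode_table_name(name: str, prefix: str = "k_") -> str:
    """Invert hex_table_name, so the mapping loses no information."""
    return bytes.fromhex(name[len(prefix):]).decode("utf-8")


def quote_ident(name: str) -> str:
    """Quote a name as an SQL delimited identifier (reserved words, spaces, UTF-8)."""
    return '"' + name.replace('"', '""') + '"'


if __name__ == "__main__":
    for key in ["natural", "construcción", "Tunnel "]:
        print(repr(key), hex_table_name(key), quote_ident(key))
    # With quoting, the failing statement from the message becomes:
    print(f"create table {quote_ident('natural')} ( id int NOT NULL, type text NOT NULL );")
```

One caveat on the hex route: it doubles the byte length of the name, and PostgreSQL truncates identifiers longer than 63 bytes by default, so long keys would need a shorter scheme such as a lookup table.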
Re: [OSM-dev] Restrict key names on order to retain reusability of OSM
On 13 Feb 2008, at 08:47, Stefan Keller wrote:
> > So to recap: The current allowable characters in OSM tag names is UTF8 - Deal with it, instead of trying to impose limitations into OSM to make OSM data comply with YOUR requirements.
>
> It's not 'my' requirement, it's about best practices. The use cases are e.g.:
> * UMN MapServer (which is at least as capable as all known OSM renderers)

We don't like mapserver, see mapnik.

> * Almost any other XML format/XML Schema

We don't like schemas, see keyvals

> * Any other database, any GIS

We don't like GIS, see potlatch/josm

OSM is and was a break with the past. Mapserver map files, GML, WFS-T, strong ontologies and friends.. these top-down 'best practices' in general just aren't very nice. We do the best/simplest things to make maps. Joe User doesn't care about any of those things. Joe User wants to make maps. It turns out that we've been able to do everything we need with simple nodes, ways and tags.

have fun,

SteveC | [EMAIL PROTECTED] | http://www.asklater.com/steve/
Re: [OSM-dev] Restrict key names on order to retain reusability of OSM
On 13 Feb 2008, at 11:31, Stefan Keller wrote:
> SteveC, I acknowledge the argumentation of Jochen and Dave but I can't follow yours. As regards me, you're still welcome to join the technical discussion.

Well that's kind of the point, OSM isn't a technical project. Mapping parties aren't about mapping the most area in a weekend, they're about engendering community. If you start from the position of making maps and community then the technical pieces fall into place. Other projects start from the technology and then go the other way, that's why they fuck up.

> Stefan

have fun,

SteveC | [EMAIL PROTECTED] | http://www.asklater.com/steve/
Re: [OSM-dev] Restrict key names on order to retain reusability of OSM
Sorry, are you sure, you're on the right list here? I thought this was the OSM-developers list. S.
Re: [OSM-dev] Restrict key names on order to retain reusability of OSM
On 13 Feb 2008, at 19:12, Stefan Keller wrote: Sorry, are you sure, you're on the right list here? Yeah... I think so... I think I even had something to do with creating it. But I'm not sure. Maybe I'm a list dreaming that I was a person? I thought this was the OSM-developers list. Yes, and I'm trying to make you understand why you're deeply wrong. It may be better to move this discussion to talk@ where we can help you understand the philosophy, and then the technology will be more easily understandable. have fun, SteveC | [EMAIL PROTECTED] | http://www.asklater.com/steve/
Re: [OSM-dev] Restrict key names on order to retain reusability of OSM
On Feb 12, 2008 2:23 AM, Frederik Ramm [EMAIL PROTECTED] wrote: BTW, does having UTF8 keys mean that a key may contain a null byte, or is UTF8 crafted in a way to avoid that?

It's specially crafted so that:
- A NUL byte can't appear in the encoding of any character other than NUL (U+0000) itself
- No character's encoding is a substring of any other character's encoding
- Leading bytes are distinguished from following bytes (for quick scanning)

Which pretty much means the byte-oriented string functions of the C library work on UTF-8 strings, even if you don't know they're UTF-8. Not many encodings have that property. FWIW, if we're going to forbid anything in keys, I'd say forbid just the space, that will discourage people from putting values in it, but I don't feel strongly about it.

Have a nice day, -- Martijn van Oosterhout [EMAIL PROTECTED] http://svana.org/kleptog/
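The three properties Martijn lists can be checked with a few lines of Python. This is only a sketch: the sample strings are arbitrary examples rather than real OSM keys, and the helper names are made up for illustration.

```python
# Illustrative check of the UTF-8 properties listed above.
# The sample strings are arbitrary examples, not taken from OSM data.
samples = ["name", "Südstrasse", "ÆØÅ", "日本語", "€"]

def is_continuation(byte: int) -> bool:
    """Continuation (following) bytes always have the bit pattern 10xxxxxx."""
    return byte & 0xC0 == 0x80

for text in samples:
    encoded = text.encode("utf-8")
    # Property 1: no NUL byte, because none of the samples contains U+0000.
    assert 0 not in encoded
    # Property 3: each character starts with a lead byte,
    # followed only by continuation bytes.
    for ch in text:
        first, *rest = ch.encode("utf-8")
        assert not is_continuation(first)
        assert all(is_continuation(b) for b in rest)

def count_characters(encoded: bytes) -> int:
    """Count characters by counting lead bytes, without decoding."""
    return sum(1 for b in encoded if not is_continuation(b))

def find_character(haystack: bytes, needle: str) -> int:
    """Byte-level search for one character (property 2).

    In valid UTF-8 a match can only start at a character boundary,
    so a plain byte search is safe. Returns a byte offset or -1.
    """
    return haystack.find(needle.encode("utf-8"))
```

The byte search in `find_character` is the same kind of operation that C's `strstr` performs, which is why byte-oriented code tends to keep working on UTF-8 input.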
Re: [OSM-dev] Restrict key names on order to retain reusability of OSM
GML/XML is *not* the issue, you know that: It's almost any application outside the OSM database. It's about reusability and consistency! I love the approach of key-value pairs (and I like beers too... ;-). I agree with Martijn that, above all, spaces must be kept out. I agree too with Frederik: Colons can be included as namespace delimiters. Namespaces, tags and keys remind us that OSM is a database and *not* a Wiki on an island (though I love Wikis used as they are)! So I'm sorry, guys, but I have to insist: I propose distinctly to restrict key names (element, tag) to the set 'aAbBcCdDeEfFgGhHiIjJkKlLmMnNoOpPqQrRsStTuUvVwWxXyYzZ_', now plus colon as namespace delimiter, allowed once and not at the beginning or the end. -- Stefan BTW: Restricting tags in del.icio.us to ASCII did not restrain the success of social bookmarking in any way IMHO.

2008/2/12, Andy Robinson (blackadder) [EMAIL PROTECTED]: Stefan Keller wrote: Sent: 12 February 2008 12:36 AM To: dev@openstreetmap.org Subject: [OSM-dev] Restrict key names on order to retain reusability of OSM Hi all, I just have finished a converter of OSM xml format to GML and I BOLDLY suggest to constrain the allowed characters of tags (= key-names) to the following XML related set: 'aAbBcCdDeEfFgGhHiIjJkKlLmMnNoOpPqQrRsStTuUvVwWxXyYzZ_' in order to retain reusability.

No chance, GML is going to have to get with the times. OSM is of course an international open project so just about anything that gets thrown at it is acceptable. If you want something out of OSM the tools are going to have to be clever and work with whatever is there (or ignore it as you see fit.) Cheers Andy

After having looked at more than 100 MB of data we found in key names characters like space, slashes, colons and even more weird ones. I don't think this will take too much of users' freedom of choice... What do you think about agreeing on such a character list and subsequently building this into editors like JOSM in order to get clean key names from the beginning? -- Stefan
Re: [OSM-dev] Restrict key names on order to retain reusability of OSM
Stefan Keller wrote: You are right that XML names (= keys/tags) are valid in Unicode, in which case the encoding of the whole XML document (exchange file) must support this. But you know well that many tools have problems with non-ASCII XML element and attribute names (for content/value UTF-8 is ok since chars can be escaped)! So, my last 20 cents for valid key names before I give up is the following: 'aAbBcCdDeEfFgGhHiIjJkKlLmMnNoOpPqQrRsStTuUvVwWxXyYzZ_-.0123456789' whereas such qualified names must begin with a letter and contain at most one colon and have at most a length of 255.

Stefan, if I am coming across in this message as a bit harsh, then you're not mistaken - I am a grumpy old man, and damn proud of it. Just remember, it's not personal. I try to go after the ball, not the man. (No FakeSteveC, it doesn't mean I try to go after a guy's ball(s) in THAT way..) Three times you have posted that you want to limit the characters used in tag naming, revising your proposal first to include the colon, and now to include numbers. On each previous attempt you have been told that UTF8 is valid, for good reasons, and yet still you persist. You have not once given a valid TECHNICAL reason, WITHIN THE SCOPE OF OSM, for limiting the characters allowable in tag names. As far as I can see from your first message on this subject, your idea stems from converting OSM data from its XML format to GML. Your project might need GML, OSM doesn't. If you are in need of GML compliant output, then it is your task to massage the OSM provided data into a GML compliant output. It is not the task of OSM to have the data in GML compliant format, since the XML format with UTF8 as allowable just plain works for OSM. The tools that you state have problems with non-ASCII characters should be fixed to be able to handle the UTF8 characters. Not the other way around, by changing the dataset to comply with the requirements of the tools.

You might think it's a hen and egg situation, although in this case, the egg definitely is the important part, and has priority. The egg (the data) in this case has attributes that can contain non-ASCII characters, thereby allowing non-Latin based nationalities to define their own tags in their own language. This is a GOOD thing, which should NOT be changed. The hen (tools and programs utilizing OSM data) must take this into account. If a tool can't do that, then the farmer (the user of that tool) has to either change that tool, or use the egg to prepare a dish that the tool can digest (massage the OSM data into a format the tool can use). The farmer should not try to persuade the egg that it is better off as a watermelon. So to recap: The current allowable characters in OSM tag names is UTF8 - Deal with it, instead of trying to impose limitations into OSM to make OSM data comply with YOUR requirements. Dutch
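A tool that cannot digest arbitrary keys can apply the rule from the quoted proposal on its own side, without OSM changing anything. The sketch below is one possible reading of that rule: the function names are hypothetical, and the requirement that a colon not be the last character is an assumption carried over from the earlier version of the proposal.

```python
import re

# Assumptions: letters, digits, "_", "-" and "." as listed in the quoted
# proposal; the first character is a letter; at most one colon, and
# (carried over from the earlier proposal) not as the last character.
_PROPOSED_KEY = re.compile(r"[A-Za-z][A-Za-z0-9_.-]*(?::[A-Za-z0-9_.-]+)?")
MAX_KEY_LENGTH = 255

def matches_proposed_rule(key: str) -> bool:
    """Return True if `key` fits the restricted form proposed in the thread."""
    return len(key) <= MAX_KEY_LENGTH and _PROPOSED_KEY.fullmatch(key) is not None

def flag_unusual_keys(tags: dict[str, str]) -> list[str]:
    """List keys a converter may want to escape or report; nothing is rejected."""
    return [key for key in tags if not matches_proposed_rule(key)]
```

Used this way the rule is a filter in the consuming tool: flagged keys can be escaped, skipped or reported, while the OSM data itself stays as flexible as it is.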
Re: [OSM-dev] Restrict key names on order to retain reusability of OSM
Marcus Wolschon wrote: Stefan Keller wrote: | BTW: Restricting tags in del.icio.us <http://del.icio.us> to ASCII did not restrain the success of social bookmarking in any way IMHO. Well, it did. There are localized sites like that that allow any UTF-8 characters and they are used.

Thinking about it a bit further, restricting tags in del.icio.us _didn't_ solve the problem of reusability. If I tag something as open source in del.icio.us, should it be open_source, or opensource, or openSource, or OpenSource? I've seen all four used. Interoperability is best ensured by community and convention, not by hard-and-fast rules; and we're doing pretty well on the former. cheers Richard
Re: [OSM-dev] Restrict key names on order to retain reusability of OSM
2008/2/12 Stefan Keller [EMAIL PROTECTED]: GML/XML is *not* the issue, you know that: It's almost any application outside the OSM database. It's about reusability and consistency! I love the approach of key-value pairs (and I like beers too... ;-). I agree with Martijn that, above all, spaces must be kept out. I agree too with Frederik: Colons can be included as namespace delimiters. Namespaces, tags and keys remind us that OSM is a database and *not* a Wiki on an island (though I love Wikis used as they are)! So I'm sorry, guys, but I have to insist: I propose distinctly to restrict key names (element, tag) to the set 'aAbBcCdDeEfFgGhHiIjJkKlLmMnNoOpPqQrRsStTuUvVwWxXyYzZ_', now plus colon as namespace delimiter, allowed once and not at the beginning or the end.

Even XML allows significantly more than that -- pretty much anything but whitespace [1], with a : as namespace delimiter. So insist all you like, but personally I think making people handle UTF-8 nicely is probably a good thing given the number of values that will rely on it heavily anyway. Most reasonable programming environments have decent Unicode support these days, and certainly every XML parser that isn't a hack. Dave

[1] http://www.w3.org/TR/2006/REC-xml-20060816/#charsets
Re: [OSM-dev] Restrict key names on order to retain reusability of OSM
Stefan Keller wrote: BTW: Restricting tags in del.icio.us to ASCII did not restrain the success of social bookmarking in any way IMHO.

And allowing \W tags in OSM has not restrained the success of Mapnik, Osmarender, the cycle map, the Garmin .img files, Kosmos, or any of the other wonderful things that people are doing, right now, with OSM data. If GML is sufficiently braindead that it can't cope with anything beyond the charset of a Sinclair Spectrum*, the converter should have an escaping function. Like Andy says, constraining OSM for this one particular use - and it is just one use, none of the others above are affected by it - is mad. I could equally say that Actionscript doesn't like colons in keys (which it doesn't), so colons should be banned. After all, there are many many more people using Actionscript on OSM data (via Potlatch) than there are using GML. Instead I wrote about three lines of code to escape them. I recommend it. :) cheers Richard

* Actually, I'm misrepresenting the Spectrum. It had some very handy block-graphic characters at 128+. Mr Westcott, are you there?
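An escaping function of the kind Richard recommends can be very small. The following is a hypothetical sketch, not the Potlatch code. It assumes the target format accepts ASCII letters, digits and underscores, and the `_<hex>_` notation is invented here for illustration.

```python
import re

def escape_key(key: str) -> str:
    """Replace every character outside A-Z, a-z, 0-9 with _<hex code point>_."""
    return re.sub(r"[^A-Za-z0-9]", lambda m: f"_{ord(m.group()):x}_", key)

def unescape_key(escaped: str) -> str:
    """Invert escape_key by turning each _<hex>_ group back into its character."""
    return re.sub(r"_([0-9a-f]+)_", lambda m: chr(int(m.group(1), 16)), escaped)
```

Because the underscore itself is escaped too, every underscore in the output starts or ends an escape group. That keeps the mapping reversible, so a converter loses no information on the way out of OSM and back.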
Re: [OSM-dev] Restrict key names on order to retain reusability of OSM
Stefan Keller wrote: | GML/XML is *not* the issue, you know that: | It's almost any application outside OSM database. | It's about reusability and consistency!

Dear Stefan, I too strongly oppose restricting the character set. If you want consistency, you need national characters. Here UTF8 is the way to go. The simple fact is that there are things that are to be in a map that only exist in certain regions and that have names containing non-ASCII characters. YOU need not use them but you have no authority to decide for the people in that part of the world to not use them. Every application working with the OSM database uses UTF8 with no issues. That simply IS the default character set to use for XML data, and for a reason. Most applications outside are converted to UTF8 or pass it along nicely. Any application that only accepts US characters is plainly broken and unfit for international use. If you want to use it anyway, you are free to convert its input and output data. (Something as ugly as the Punycode used in DNS will at least not destroy information you process.)

| BTW: Restricting tags in del.icio.us <http://del.icio.us> to ASCII did not restrain the success of social bookmarking in any way IMHO.

Well, it did. There are localized sites like that that allow any UTF-8 characters and they are used. Sorry pal but UTF8 is the right way to go and the very concept of national character sets is dying everywhere because it is no longer fit for the multi-lingual world out there. This is a map of the world to be used for many different purposes, not a map of one country only. Marcus
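The lossless conversion Marcus mentions can be tried with Python's built-in `punycode` codec. This is a sketch only: the two helper names are made up, and nothing in the thread says any OSM tool actually does this.

```python
def key_to_ascii(key: str) -> str:
    """Encode a key with the built-in 'punycode' codec; the result is pure ASCII."""
    return key.encode("punycode").decode("ascii")

def key_from_ascii(encoded: str) -> str:
    """Recover the original key from its punycode form."""
    return encoded.encode("ascii").decode("punycode")
```

The encoded form is hard to read, which is the ugliness he refers to, but decoding it gives back the original key unchanged.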
Re: [OSM-dev] Restrict key names on order to retain reusability of OSM
Frederik Ramm wrote: Hi, I just have finished a converter of OSM xml format to GML and I *BOLDLY* suggest to constrain the allowed characters of tags (= key-names) to the following XML related set: 'aAbBcCdDeEfFgGhHiIjJkKlLmMnNoOpPqQrRsStTuUvVwWxXyYzZ_' in order to retain reusability.

There is a reason why our data format has <tag k="keyname" v="value"/> instead of <keyname>value</keyname> and that reason is allowing non-XML stuff in the key names. Or at least it seems like that could have been a reason. I hope nobody's going to come forward and tell me it was just decided over a few beers ;-)

Since it was defined by Brits, you can bet your sweet patootie that beers MUST have been involved... ;)

But I'd agree with Dutch,

Damn, there goes my perfect record.. ;) Dutch
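The practical effect of keeping the key in an attribute value rather than in an element name can be seen with any XML parser. The sketch below uses Python's standard library; the fragment, including its ids, coordinates, keys and values, is made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment in the <tag k="..." v="..."/> style;
# the ids, coordinates, keys and values are invented for this example.
OSM_FRAGMENT = """<osm>
  <node id="1" lat="47.0" lon="8.0">
    <tag k="name" v="Südstrasse"/>
    <tag k="navn på dansk" v="Sydgade"/>
    <tag k="name:de" v="Südstrasse"/>
  </node>
</osm>"""

def read_tags(xml_text: str) -> dict[str, str]:
    """Collect key/value pairs from every <tag> element in the fragment."""
    root = ET.fromstring(xml_text)
    return {tag.get("k"): tag.get("v") for tag in root.iter("tag")}
```

A key such as `navn på dansk` is an ordinary attribute value here. Used as an element name it would not even be well-formed XML, because element names cannot contain spaces.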
Re: [OSM-dev] Restrict key names on order to retain reusability of OSM
Stefan Keller wrote: Hi all, I just have finished a converter of OSM xml format to GML and I *BOLDLY* suggest to constrain the allowed characters of tags (= key-names) to the following XML related set: 'aAbBcCdDeEfFgGhHiIjJkKlLmMnNoOpPqQrRsStTuUvVwWxXyYzZ_' in order to retain reusability. After having looked at more than 100 MB of data we found in key names characters like space, slashes, colons and even more weird ones. I don't think this will take too much of users' freedom of choice... What do you think about agreeing on such a character list and subsequently building this into editors like JOSM in order to get clean key names from the beginning? -- Stefan

NO FRACKIN' WAY. If people can't use national characters such as ÆØÅæøå, accented characters such as ÔÂÊôâê and ÕÃŨĨõãũĩ, or non-Latin scripts such as Hebrew, Arabic and Chinese in the naming of tags, we are losing one hell of an incentive for people all over the world to use, participate in and develop OSM further. If you had suggested that tag names to be used on the Map Features list of recommended tags on the Wiki should only contain the characters used in the English language, then I would have been with you, but to indiscriminately remove the possibility of having tags in national languages using characters only found in those languages is IMHO a complete no-go. Remember, one of the great advantages of the OSM tagging model is that anybody can define a tag, even if that tag is only going to be useful for that particular person. This maintains the versatility for people who might not be very well versed in English to still be able to customize and name the data they put into OSM, so it makes sense for their particular use.

I still recall the old days in the 1970s and 1980s when a lot of US-made software balked at high-ASCII characters, which meant that a lot of us Europeans had to, for instance, spell street names that contained national characters such as the above as they sounded in English, instead of the way they actually were spelled. And I do NOT wish for that to return in the twenty-first century in OSM, not even for tag names. If an OSM tool is unable to utilize UTF8 characters, then the tool should be rewritten. It is a big step backwards if we instead choose to limit the characters available for use. Just my 0.02€ Dutch
[OSM-dev] Restrict key names on order to retain reusability of OSM
Hi all, I just have finished a converter of OSM xml format to GML and I *BOLDLY* suggest to constrain the allowed characters of tags (= key-names) to the following XML related set: 'aAbBcCdDeEfFgGhHiIjJkKlLmMnNoOpPqQrRsStTuUvVwWxXyYzZ_' in order to retain reusability. After having looked at more than 100 MB of data we found in key names characters like space, slashes, colons and even more weird ones. I don't think this will take too much of users' freedom of choice... What do you think about agreeing on such a character list and subsequently building this into editors like JOSM in order to get clean key names from the beginning? -- Stefan