Re: [OSM-talk] keys with multiple values
On 15/09/2014, Paul Norman penor...@mac.com wrote:
> On 9/15/2014 9:45 AM, moltonel 3x Combo wrote:
>> Supporting multiple values natively in the osm data model would provide a clean and efficient solution, but updating all the tools to support it would be a huge undertaking.
>
> It's not going to get supported by most data consumers. This isn't a question of upgrading tools, this is a question of tools relying on a key=value store, a single column, or some other external dependency which doesn't allow multiple tag values. I believe when the API did support multiple values for one tag almost no data consumers supported it.

An implementation doesn't need to break the restriction of common stores (like PG's hstore) that keys in key=val pairs are unique. The obvious way is to store a composite value in val. This is in essence exactly what the current semicolon-separated-value scheme does, but if it were done at API level, it would avoid the inconsistent parsing issues.

msgpack is a very lean and fast format that could be used. Compared to the current csv approach, the overhead of storing a typical array of strings is just 2 bytes (and splitting would be faster).

It can be introduced in a backward-compatible manner: the old API version can convert arrays to the traditional csv-string format when exporting, and convert them back to a proper array when importing (with the added benefit of syntax checking). The new API skips the conversion, dealing only in native strings and arrays. Any consumer that can't handle arrays can request the stringified version instead. All conversions are done on the fly; the integrity of the array/string data in the db is kept.

___
talk mailing list
talk@openstreetmap.org
https://lists.openstreetmap.org/listinfo/talk
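A minimal sketch of the size comparison and the on-the-fly conversion described in the message above. It hand-encodes only the two smallest MessagePack types (fixarray, up to 15 elements, and fixstr, up to 31 bytes), so the 2-byte figure it reproduces holds only for such small arrays of short strings. The function names and the sample values are hypothetical, and a real implementation would more likely use an existing msgpack library than this hand-rolled encoder.

```python
def pack_short_string_array(values):
    """Encode a list of short strings as MessagePack (fixarray + fixstr only)."""
    if len(values) > 15:
        raise ValueError("fixarray holds at most 15 elements")
    out = bytearray([0x90 | len(values)])      # fixarray header: 1 byte
    for value in values:
        raw = value.encode("utf-8")
        if len(raw) > 31:
            raise ValueError("fixstr holds at most 31 bytes")
        out.append(0xA0 | len(raw))            # fixstr header: 1 byte per string
        out += raw
    return bytes(out)


def to_legacy_string(values):
    """What an old API version could export: the semicolon-joined string."""
    return ";".join(values)


def from_legacy_string(text):
    """What an old API version could do on import: split back into an array."""
    return text.split(";")


# Hypothetical sample values, chosen only for illustration.
values = ["Main Street", "High Street", "Route 9"]
packed = pack_short_string_array(values)
legacy = to_legacy_string(values).encode("utf-8")

# n strings cost 1 + n header bytes in this encoding versus n - 1 separators
# in the joined string, so the difference is 2 bytes for any such array.
overhead = len(packed) - len(legacy)
print(overhead)
```

The legacy round trip here only works because none of the sample values contains a semicolon, which is the ambiguity discussed elsewhere in this thread.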
Re: [OSM-talk] keys with multiple values
I'm not quite sure what the issue with formalizing the ; convention is. As Jochen Topf has written before, we do need to at least define the semantics (set, ordered list etc.), regardless of implementation. Once we agree on that, agreeing on whether we should simply define an escape ( ;; would be good enough IMHO) or do something new should be really easy.

Simon

On 17.09.2014 14:06, moltonel 3x Combo wrote:
> On 15/09/2014, Paul Norman penor...@mac.com wrote:
>> On 9/15/2014 9:45 AM, moltonel 3x Combo wrote:
>>> Supporting multiple values natively in the osm data model would provide a clean and efficient solution, but updating all the tools to support it would be a huge undertaking.
>>
>> It's not going to get supported by most data consumers. This isn't a question of upgrading tools, this is a question of tools relying on a key=value store, a single column, or some other external dependency which doesn't allow multiple tag values. I believe when the API did support multiple values for one tag almost no data consumers supported it.
>
> An implementation doesn't need to break the restriction of common stores (like PG's hstore) that keys in key=val pairs are unique. The obvious way is to store a composite value in val. This is in essence exactly what the current semicolon-separated-value scheme does, but if it were done at API level, it would avoid the inconsistent parsing issues.
>
> msgpack is a very lean and fast format that could be used. Compared to the current csv approach, the overhead of storing a typical array of strings is just 2 bytes (and splitting would be faster).
>
> It can be introduced in a backward-compatible manner: the old API version can convert arrays to the traditional csv-string format when exporting, and convert them back to a proper array when importing (with the added benefit of syntax checking). The new API skips the conversion, dealing only in native strings and arrays. Any consumer that can't handle arrays can request the stringified version instead. All conversions are done on the fly; the integrity of the array/string data in the db is kept.
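A minimal sketch of the ;; escape suggested in the reply above, assuming a literal semicolon inside a value is written as ;; and a single ; separates values. The function names and sample values are hypothetical. One assumption is worth stating loudly: with this escape alone, empty values and values that begin with a semicolon cannot be told apart from escaped semicolons, so this sketch simply rejects them when joining.

```python
def join_values(values):
    """Join values with ';', writing each literal ';' inside a value as ';;'."""
    for value in values:
        if value == "" or value.startswith(";"):
            # Assumption of this sketch: such values are ambiguous under a
            # bare ';;' escape, so they are refused rather than mis-encoded.
            raise ValueError("empty or ';'-leading values are not supported")
    return ";".join(value.replace(";", ";;") for value in values)


def split_values(text):
    """Split on single ';' while turning each ';;' back into a literal ';'."""
    values = []
    current = []
    i = 0
    while i < len(text):
        if text.startswith(";;", i):
            current.append(";")              # escaped semicolon
            i += 2
        elif text[i] == ";":
            values.append("".join(current))  # value separator
            current = []
            i += 1
        else:
            current.append(text[i])
            i += 1
    values.append("".join(current))
    return values


# Hypothetical sample values; the first one contains a semicolon.
names = ["Smith; Sons", "Old Mill"]
joined = join_values(names)
print(joined)
print(split_values(joined))
print(joined.split(";"))  # what a plain string split would produce instead
```

The last line shows why the semantics would have to be agreed first: a consumer that keeps doing a plain split sees four fragments instead of two values.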
Re: [OSM-talk] keys with multiple values
On 14/09/2014, Norbert Wenzel norbert.wenzel.li...@gmail.com wrote:
> How is making a data search accept all nameX fields with X being any arbitrary number easier than splitting all values at semicolons? The data preprocessing has to be fixed (has it, or are semicolons already supported by Nominatim?) for all variants, and I'd say a simple string split at a semicolon is easier to implement than checking every key for possible numbers at the end. And the semicolon list is in my eyes more logical when it comes to removed values, e.g. what does it mean when name2 is deleted, but name3 still exists?

The problem with semicolon-separated values is that you can never be sure whether you're looking at multiple values, or one value that happens to contain a semicolon. That's part of the reason why other tags sometimes use a different separator (a comma, a pipe...). It's messy.

Varying the key name is only slightly better: you know for sure that there are multiple values, but finding all those values is a bit of a nightmare (foo, foo_2, foo2, foo:2, foo[2], alt_foo, old_foo...). With some standardisation it could work (perhaps aided by editor support), but it hasn't happened yet.

Supporting multiple values natively in the osm data model would provide a clean and efficient solution, but updating all the tools to support it would be a huge undertaking.
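A minimal sketch of what collecting values from varied key names involves, assuming only the spellings listed in the message above (foo, foo_2, foo2, foo:2, foo[2], alt_foo, old_foo). The pattern, the function name and the sample tags are hypothetical; real data would contain spellings this does not cover.

```python
import re


def collect_variant_values(tags, base):
    """Return {key: value} for every key that looks like a variant of `base`."""
    pattern = re.compile(
        r"^(?:(?:alt|old)_)?"          # optional alt_ or old_ prefix
        + re.escape(base)
        + r"(?:[_:]?\d+|\[\d+\])?$"    # optional _2, 2, :2 or [2] suffix
    )
    return {key: value for key, value in tags.items() if pattern.match(key)}


# Hypothetical tags on a single object.
tags = {
    "name": "Main Street",
    "name_2": "High Street",
    "name[3]": "Route 9",
    "alt_name": "The Strip",
    "name:fr": "Rue Principale",   # language suffix, not a numbered variant
    "highway": "residential",
}

variants = collect_variant_values(tags, "name")
print(sorted(variants))
```

Only a numeric suffix is accepted after the colon, so name:fr is left out of the result; every further spelling found in the data would need another alternative in the pattern.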
Re: [OSM-talk] keys with multiple values
On 09/15/2014 06:45 PM, moltonel 3x Combo wrote:
> The problem with semicolon-separated values is that you can never be sure whether you're looking at multiple values, or one value that happens to contain a semicolon. That's part of the reason why other tags sometimes use a different separator (a comma, a pipe...). It's messy.

As I've mainly thought about name tags (as you may have guessed from my examples) I still think it's safe to assume there is no name containing a semicolon (except Little Bobby Tables), but nevertheless you have a very valid point here. Of course we could define some escape character like the famous backslash but ... I don't even want to write that stupid idea down.

> Supporting multiple values natively in the osm data model would provide a clean and efficient solution, but updating all the tools to support it would be a huge undertaking.

That would be the best solution and should at least be considered for a next API version, whenever that seems necessary.

While I was reading your mail I just remembered I tried to put multiple values under the same key when I started with OSM and I was surprised the editors wouldn't let me do this. So this might also be the most intuitive way to add multiple values. Some years of OSM made me forget about that. ;-)

Norbert
Re: [OSM-talk] keys with multiple values
On 15.09.2014 19:08, Norbert Wenzel wrote:
> On 09/15/2014 06:45 PM, moltonel 3x Combo wrote:
>> The problem with semicolon-separated values is that you can never be sure whether you're looking at multiple values, or one value that happens to contain a semicolon. That's part of the reason why other tags sometimes use a different separator (a comma, a pipe...). It's messy.

Actually, I have met a lot of special characters in names, but no semicolon so far.

> As I've mainly thought about name tags (as you may have guessed from my examples) I still think it's safe to assume there is no name containing a semicolon (except Little Bobby Tables), but nevertheless you have a very valid point here. Of course we could define some escape character like the famous backslash but ... I don't even want to write that stupid idea down.

I do not think that is stupid; it is one solution for a rare situation. If it is properly described on the wiki, there should be no problem.

>> Supporting multiple values natively in the osm data model would provide a clean and efficient solution, but updating all the tools to support it would be a huge undertaking.
>
> That would be the best solution and should at least be considered for a next API version, whenever that seems necessary.

The API already supports it but other software might not.

> While I was reading your mail I just remembered I tried to put multiple values under the same key when I started with OSM and I was surprised the editors wouldn't let me do this. So this might also be the most intuitive way to add multiple values. Some years of OSM made me forget about that. ;-)

Once more, that depends on the software (editor).

cu colliar
Re: [OSM-talk] keys with multiple values
On 9/15/2014 9:45 AM, moltonel 3x Combo wrote:
> Supporting multiple values natively in the osm data model would provide a clean and efficient solution, but updating all the tools to support it would be a huge undertaking.

It's not going to get supported by most data consumers. This isn't a question of upgrading tools, this is a question of tools relying on a key=value store, a single column, or some other external dependency which doesn't allow multiple tag values. I believe when the API did support multiple values for one tag almost no data consumers supported it.
Re: [OSM-talk] keys with multiple values
On 9/15/2014 10:29 AM, colliar wrote:
>> Of course we could define some escape character like the famous backslash but ... I don't even want to write that stupid idea down.
>
> Do not think that is stupid but one solution for a rare situation. If it is properly described on the wiki, there should be no problem.

Anyone who does handle a semicolon is almost certainly going to be simply doing a string split on ;, regardless of what the wiki says, rendering the wiki wrong if it were to say that.
Re: [OSM-talk] keys with multiple values
Paul Norman penorman at mac.com writes:
> I believe with TIGER these were not alternate names, but different names, none of which were clearly main or alternates in the data. It's one of those things that makes raw TIGER data a pain to work with. That situation is distinct from when you have a primary name and multiple alternate names, although in practice many of the TIGER cases had a primary name, or one of the alternate names wasn't really a name.

Alternative names in OSM come in two types: names to display (with tags such as name:fr) and names to search for. Although a semicolon-separated list is more logical in some ways than artificially separating into multiple tags, it makes a difference from searching alternative display names.

> TIGER is also not where to look for good tagging practices. For various reasons, there were numerous problems with the tagging.

It was a learning experience. The red tape at the beginning of this discussion is one of the legacies.

--
Andrew
Re: [OSM-talk] keys with multiple values
On 09/14/2014 04:28 PM, Andrew Hain wrote:
> Alternative names in OSM come in two types: names to display (with tags such as name:fr) and names to search for. Although a semicolon-separated list is more logical in some ways than artificially separating into multiple tags, it makes a difference from searching alternative display names.

How is making a data search accept all nameX fields with X being any arbitrary number easier than splitting all values at semicolons? The data preprocessing has to be fixed (has it, or are semicolons already supported by Nominatim?) for all variants, and I'd say a simple string split at a semicolon is easier to implement than checking every key for possible numbers at the end. And the semicolon list is in my eyes more logical when it comes to removed values, e.g. what does it mean when name2 is deleted, but name3 still exists?

Norbert

PS: Of course this should not mean that name:lang should be changed, since the language is given in the key, which adds information to the key. I'm simply referring to fields with multiple values for a key (I think most of the time it will be alt_name), where all values are in the same language and equally important.
Re: [OSM-talk] keys with multiple values
On Fri, Sep 12, 2014 at 5:24 PM, moltonel 3x Combo molto...@gmail.com wrote:
> To me there's very little semantic value in distinguishing between name_2 and alt_name. Even old_name and loc_name arguably don't bring much to the table (I do see the nuance, but it doesn't seem to be worth the complication). We've got the same problem with the ref tag and many others.

These are artifacts of the TIGER import, and I think I've already beaten that horse until it's a horse-shaped hole. Try grepping the archives for "TIGER import considered harmful" or something to that effect. My mood about TIGER has lightened for the most part, but I know in the early days a few years ago after the imports, particularly in the Portland area, it was like "shoot me now" bad trying to fix a lot of it.
[OSM-talk] keys with multiple values
On 11/09/2014, Andrew Hain andrewhain...@hotmail.co.uk wrote:
> The TIGER import used name_n instead of alt_name_n. Have any data consumers got a preference for one or the other?

To me there's very little semantic value in distinguishing between name_2 and alt_name. Even old_name and loc_name arguably don't bring much to the table (I do see the nuance, but it doesn't seem to be worth the complication). We've got the same problem with the ref tag and many others.

We've invented loads of conflicting schemes because there's no obviously correct solution for this common problem. To me it's a failure in the osm data model, like the lack of a dedicated area object type. It should be possible to assign an ordered list of values to a key, without resorting to a confusing collection of hacks.

Maybe an update to the data model would be so disruptive that it wouldn't be worth it. Maybe we could decide instead on a key separator that only and always indicates an alternate value (neither _ nor : fit the bill). But it'll face the usual uphill adoption battle (xkcd 927), and efficiency/ambiguity issues (like the current use of relations and implicit area=yes/no tags to identify areas).
Re: [OSM-talk] keys with multiple values
On 9/12/2014 3:24 PM, moltonel 3x Combo wrote:
> To me there's very little semantic value in distinguishing between name_2 and alt_name. Even old_name and loc_name arguably don't bring much to the table (I do see the nuance, but it doesn't seem to be worth the complication). We've got the same problem with the ref tag and many others.

I believe with TIGER these were not alternate names, but different names, none of which were clearly main or alternates in the data. It's one of those things that makes raw TIGER data a pain to work with. That situation is distinct from when you have a primary name and multiple alternate names, although in practice many of the TIGER cases had a primary name, or one of the alternate names wasn't really a name.

TIGER is also not where to look for good tagging practices. For various reasons, there were numerous problems with the tagging.