Re: [OSM-talk] keys with multiple values

2014-09-17 Thread moltonel 3x Combo
On 15/09/2014, Paul Norman penor...@mac.com wrote:
 On 9/15/2014 9:45 AM, moltonel 3x Combo wrote:
 Supporting multiple values natively in the osm data model would
 provide a clean and efficient solution, but updating all the tools to
 support it would be a huge undertaking.

 It's not going to get supported by most data consumers. This isn't a
 question of upgrading tools, this is a question of tools relying on a
 key=value store, a single column, or some other external dependency
 which doesn't allow multiple tag values. I believe when the API did
 support multiple values for one tag almost no data consumers supported it.

An implementation doesn't need to break the "key=val with unique
keys" restriction of common stores (like PG's hstore). The obvious
way is to store a composite value in val. This is in essence exactly
what the current semicolon-separated-value scheme does, but if it were
done at the API level, it would avoid the inconsistent parsing issues.

msgpack is a very lean and fast format that could be used. Compared to
the current csv approach, the overhead of storing a typical array of
strings is just 2 bytes (and splitting would be faster).
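
The 2-byte figure can be checked by hand for small arrays. The sketch below is not the msgpack library: it hand-encodes only the format's fixarray and fixstr cases (at most 15 elements, each at most 31 bytes of UTF-8), and the function names are made up for illustration.

```python
def pack_small_str_array(values):
    """Encode a short list of short strings with msgpack's fixarray/fixstr forms."""
    if len(values) > 15:
        raise ValueError("fixarray holds at most 15 elements")
    out = bytearray([0x90 | len(values)])   # fixarray header byte: 1001xxxx
    for value in values:
        raw = value.encode("utf-8")
        if len(raw) > 31:
            raise ValueError("fixstr holds at most 31 bytes")
        out.append(0xA0 | len(raw))         # fixstr header byte: 101xxxxx
        out += raw
    return bytes(out)


def join_semicolons(values):
    """The current convention: values joined with ';' and no escaping."""
    return ";".join(values).encode("utf-8")


values = ["bus", "tram", "subway"]
packed = pack_small_str_array(values)
joined = join_semicolons(values)
overhead = len(packed) - len(joined)        # one array header plus one extra byte
```

Within those limits the packed form costs one header byte per string plus one for the array, against one separator between each pair of values, so the difference stays at 2 bytes. Longer strings or arrays need larger msgpack headers, so the overhead would grow somewhat there.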

It can be introduced in a backward-compatible manner: the old API
version can convert arrays to the traditional csv-string format when
exporting, and convert them back to a proper array when importing
(with the added benefit of syntax checking). The new API skips the
conversion, dealing only in native strings and arrays. Any consumer
that can't handle arrays can request the stringified version instead.
All conversions are done on the fly; the integrity of the array/string
data in the db is kept.
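
A minimal sketch of that on-the-fly conversion, assuming the database stores each tag value as either a string or a list of strings. The function names and the `wants_arrays` flag are hypothetical, not part of any existing OSM API.

```python
def export_value(stored, wants_arrays):
    """Render a stored tag value (str or list of str) for a client."""
    if wants_arrays or isinstance(stored, str):
        return stored                 # new API: no conversion at all
    return ";".join(stored)           # old API: traditional joined string


def import_value(received, wants_arrays):
    """Turn an uploaded value into what the database would store."""
    if wants_arrays:
        return received               # new API: already a str or a list
    if ";" not in received:
        return received               # plain single value
    parts = received.split(";")
    if any(part == "" for part in parts):
        # The syntax check mentioned above: reject "a;;b", "a;", ";a".
        raise ValueError(f"empty element in {received!r}")
    return parts


legacy_text = export_value(["bus", "tram"], wants_arrays=False)
restored = import_value(legacy_text, wants_arrays=False)
```

A plain string that itself contains a semicolon cannot survive the legacy round trip without some escape rule, which this sketch deliberately leaves out.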

___
talk mailing list
talk@openstreetmap.org
https://lists.openstreetmap.org/listinfo/talk


Re: [OSM-talk] keys with multiple values

2014-09-17 Thread Simon Poole
I'm not quite sure what the issue with formalizing the ; convention is.

As Jochen Topf has written before, we do need to at least define the
semantics (set, ordered list etc), regardless of implementation.

Once we agree on that, agreeing on whether we should simply define an escape
( ;; would be good enough IMHO) or do something new should be really easy.
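
One way the ;; escape could work, as a sketch only: a doubled semicolon is read as one literal semicolon, and a single semicolon separates values. The function names are hypothetical, and the left-to-right reading rule is an assumption rather than an agreed definition.

```python
def split_escaped(text):
    """Split on single ';', treating ';;' as one literal semicolon."""
    values, current, i = [], [], 0
    while i < len(text):
        if text[i] == ";":
            if i + 1 < len(text) and text[i + 1] == ";":
                current.append(";")          # escaped semicolon
                i += 2
                continue
            values.append("".join(current))  # a lone ';' ends the value
            current = []
        else:
            current.append(text[i])
        i += 1
    values.append("".join(current))
    return values


def join_escaped(values):
    """Inverse of split_escaped for values that are non-empty and do not start with ';'."""
    return ";".join(value.replace(";", ";;") for value in values)


encoded = join_escaped(["Smith; Sons", "Other"])
decoded = split_escaped(encoded)
```

Doubling alone stays ambiguous for empty values and for values that begin with a semicolon ("a;;;b" could be read two ways), and it says nothing about whether the result is a set or an ordered list, so the semantics still need defining first.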

Simon




Re: [OSM-talk] keys with multiple values

2014-09-15 Thread moltonel 3x Combo
On 14/09/2014, Norbert Wenzel norbert.wenzel.li...@gmail.com wrote:
 How is making a data search accept all nameX fields with X being
 any arbitrary number easier than splitting all values at semicolons? The
 data preprocessing has to be fixed (? has it, or are semicolons already
 supported by Nominatim?) for all variants, and I'd say a simple string
 split at a semicolon is easier to implement than checking every key for
 possible numbers at the end.

 And the semicolon list is in my eyes more logical when it comes to
 removed values, eg. what does it mean when name2 is deleted, but name3
 still exists?

The problem with semicolon-separated values is that you can never be
sure whether you're looking at multiple values, or one value that
happens to contain a semicolon. That's part of the reason why other
tags sometimes use a different separator (a comma, a pipe...). It's
messy.

Varying the key name is only slightly better: you know for sure that
there are multiple values, but finding all those values is a bit of a
nightmare (foo, foo_2, foo2, foo:2, foo[2], alt_foo, old_foo...). With
some standardisation it could work (perhaps aided by editor support),
but it hasn't happened yet.
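
To give a feel for the matching involved, here is a sketch that collects the key spellings listed above for one base key. The helper name is hypothetical, and the pattern covers only those example spellings, not every variant found in real data.

```python
import re


def variant_keys(tags, base):
    """Return keys in `tags` that look like numbered or prefixed variants of `base`."""
    escaped = re.escape(base)
    pattern = re.compile(
        # optional alt_/old_ prefix, the base key, then an optional
        # numeric suffix written as 2, _2, :2 or [2]
        rf"^(?:(?:alt|old)_)?{escaped}(?:[_:]?\d+|\[\d+\])?$"
    )
    return sorted(key for key in tags if pattern.match(key))


tags = {
    "name": "A", "name_2": "B", "name2": "C", "name:2": "D",
    "name[2]": "E", "alt_name": "F", "old_name": "G",
    "name:fr": "H", "surname": "I",
}
found = variant_keys(tags, "name")
```

Here name:fr and surname are left out, which is the intent, but every new spelling someone invents means another change to the pattern.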

Supporting multiple values natively in the osm data model would
provide a clean and efficient solution, but updating all the tools to
support it would be a huge undertaking.



Re: [OSM-talk] keys with multiple values

2014-09-15 Thread Norbert Wenzel
On 09/15/2014 06:45 PM, moltonel 3x Combo wrote:
 The problem with semicolon-separated values is that you can never be
 sure whether you're looking at multiple values, or one value that
 happens to contain a semicolon. That's part of the reason why other
 tags sometimes use a different separator (a comma, a pipe...). It's
 messy.

As I've mainly thought about name tags (as you may have guessed from my
examples) I still think it's safe to assume there is no name containing
a semicolon (except Little Bobby Tables), but nevertheless you have a
very valid point here. Of course we could define some escape character
like the famous backslash but ... I don't even want to write that stupid
idea down.

 Supporting multiple values natively in the osm data model would
 provide a clean and efficient solution, but updating all the tools to
 support it would be a huge undertaking.

That would be the best solution and should at least be considered for a
next API version, whenever that seems necessary. While I was reading
your mail I just remembered I tried to put multiple values under the
same key when I started with OSM and I was surprised the editors
wouldn't let me do this. So this might also be the most intuitive way to
add multiple values. Some years of OSM made me forget about that. ;-)

Norbert



Re: [OSM-talk] keys with multiple values

2014-09-15 Thread colliar
On 15.09.2014 19:08, Norbert Wenzel wrote:
 On 09/15/2014 06:45 PM, moltonel 3x Combo wrote:
 The problem with semicolon-separated values is that you can never be
 sure whether you're looking at multiple values, or one value that
 happens to contain a semicolon. That's part of the reason why other
 tags sometimes use a different separator (a comma, a pipe...). It's
 messy.

Actually, I have met a lot of special characters in names but no
semicolon so far.

 As I've mainly thought about name tags (as you may have guessed from my
 examples) I still think it's safe to assume there is no name containing
 a semicolon (except Little Bobby Tables), but nevertheless you have a
 very valid point here. Of course we could define some escape character
 like the famous backslash but ... I don't even want to write that stupid
 idea down.

I do not think that is stupid; it is one solution for a rare situation. If it
is properly described on the wiki, there should be no problem.

 Supporting multiple values natively in the osm data model would
 provide a clean and efficient solution, but updating all the tools to
 support it would be a huge undertaking.

 That would be the best solution and should at least be considered for a
 next API version, whenever that seems necessary.

The API already supports it but other software might not.

 While I was reading
 your mail I just remembered I tried to put multiple values under the
 same key when I started with OSM and I was surprised the editors
 wouldn't let me do this. So this might also be the most intuitive way to
 add multiple values. Some years of OSM made me forget about that. ;-)

Once more, that depends on the software (editor).

cu colliar





Re: [OSM-talk] keys with multiple values

2014-09-15 Thread Paul Norman

On 9/15/2014 9:45 AM, moltonel 3x Combo wrote:

 Supporting multiple values natively in the osm data model would
 provide a clean and efficient solution, but updating all the tools to
 support it would be a huge undertaking.

It's not going to get supported by most data consumers. This isn't a
question of upgrading tools, this is a question of tools relying on a
key=value store, a single column, or some other external dependency
which doesn't allow multiple tag values. I believe when the API did
support multiple values for one tag almost no data consumers supported it.




Re: [OSM-talk] keys with multiple values

2014-09-15 Thread Paul Norman

On 9/15/2014 10:29 AM, colliar wrote:

 Of course we could define some escape character
 like the famous backslash but ... I don't even want to write that stupid
 idea down.

 I do not think that is stupid; it is one solution for a rare situation. If it
 is properly described on the wiki, there should be no problem.

Anyone who does handle a semicolon is almost certainly going to be
simply doing a string split on ;, regardless of what the wiki says,
rendering the wiki wrong if it were to say that.




Re: [OSM-talk] keys with multiple values

2014-09-14 Thread Andrew Hain
Paul Norman penorman at mac.com writes:

 I believe with TIGER these were not alternate names, but different 
 names, none of which were clearly main or alternates in the data. It's 
 one of those things that makes raw TIGER data a pain to work with.
 
 That situation is distinct from when you have a primary name and 
 multiple alternate names, although in practice many of the TIGER cases 
 had a primary name, or one of the alternate names wasn't really a name.

Alternative names in OSM come in two types: names to display (with tags such
as name:fr) and names to search for. Although a semicolon-separated list is
more logical in some ways than artificially separating into multiple tags,
it is different from searching alternative display names.

 TIGER is also not where to look for good tagging practices. For
 various reasons, there were numerous problems with the tagging.

It was a learning experience. The red tape at the beginning of this
discussion is one of the legacies.

--
Andrew




Re: [OSM-talk] keys with multiple values

2014-09-14 Thread Norbert Wenzel
On 09/14/2014 04:28 PM, Andrew Hain wrote:
 Alternative names in OSM come in two types: names to display (with tags such
 as name:fr) and names to search for. Although a semicolon-separated list is
 more logical in some ways than artificially separating into multiple tags,
 it is different from searching alternative display names.

How is making a data search accept all nameX fields with X being
any arbitrary number easier than splitting all values at semicolons? The
data preprocessing has to be fixed (? has it, or are semicolons already
supported by Nominatim?) for all variants, and I'd say a simple string
split at a semicolon is easier to implement than checking every key for
possible numbers at the end.

And the semicolon list is in my eyes more logical when it comes to
removed values, eg. what does it mean when name2 is deleted, but name3
still exists?

Norbert

PS: Of course this should not mean that name:lang should be changed,
since the language is given in the key, which adds information to the
key. I'm simply referring to fields with multiple values for (I think
most of the time it will be alt_name) a key, where all values are in the
same language and equally important.




Re: [OSM-talk] keys with multiple values

2014-09-13 Thread Paul Johnson
On Fri, Sep 12, 2014 at 5:24 PM, moltonel 3x Combo molto...@gmail.com
wrote:

 To me there's very little semantic value in distinguishing between
 name_2 and alt_name. Even old_name and loc_name arguably don't bring
 much to the table (I do see the nuance, but it doesn't seem to be worth
 the complication). We've got the same problem with the ref tag and
 many others.


These are artifacts of the TIGER import, and I think I've already beaten that
horse until it's a horse-shaped hole. Try grepping the archives for "TIGER
import considered harmful" or something to that effect. My mood about
TIGER has lightened for the most part, but I know in the early days a few
years ago after the imports, particularly in the Portland area, it was like
"shoot me now" bad trying to fix a lot of it.


[OSM-talk] keys with multiple values

2014-09-12 Thread moltonel 3x Combo
On 11/09/2014, Andrew Hain andrewhain...@hotmail.co.uk wrote:

 The TIGER import used name_n instead of alt_name_n.

 Have any data consumers got a preference for one or the other?

To me there's very little semantic value in distinguishing between
name_2 and alt_name. Even old_name and loc_name arguably don't bring
much to the table (I do see the nuance, but it doesn't seem to be worth
the complication). We've got the same problem with the ref tag and
many others.

We've invented loads of conflicting schemes because there's no
obviously correct solution for this common problem. To me it's a
failure in the osm data model, like the lack of a dedicated area
object type. It should be possible to assign an ordered list of values
to a key, without resorting to a confusing collection of hacks.

Maybe an update to the data model would be so disruptive that it
wouldn't be worth it. Maybe we could decide instead on a key separator
that only and always indicates an alternate value (neither _ nor :
fit the bill). But it'll face the usual adoption uphill battle (xkcd
927), and efficiency/ambiguity issues (like the current use of
relations and implicit area=yes/no tags to identify areas).



Re: [OSM-talk] keys with multiple values

2014-09-12 Thread Paul Norman

On 9/12/2014 3:24 PM, moltonel 3x Combo wrote:

 To me there's very little semantic value in distinguishing between
 name_2 and alt_name. Even old_name and loc_name arguably don't bring
 much to the table (I do see the nuance, but it doesn't seem to be worth
 the complication). We've got the same problem with the ref tag and
 many others.

I believe with TIGER these were not alternate names, but different
names, none of which were clearly main or alternates in the data. It's
one of those things that makes raw TIGER data a pain to work with.


That situation is distinct from when you have a primary name and 
multiple alternate names, although in practice many of the TIGER cases 
had a primary name, or one of the alternate names wasn't really a name.


TIGER is also not where to look for good tagging practices. For
various reasons, there were numerous problems with the tagging.

