Re: [OSM-talk] Revival: Multilingual Country-List

Peter Wendorff Fri, 22 Feb 2013 00:44:07 -0800

On 21.02.2013 13:01, Hans Schmidt wrote:

On 21.02.2013 12:36, Peter Wendorff wrote:
Well... if there's no localized name tag, then you may omit the name:xx tag for that language, as there's no alternative. On the other hand, name:de might be useful even then, as it's possible to translate programmatically if the software knows the language. The German suffixes -straße, -weg and -platz could be automatically translated to street, way and square; AFAIK the Swedish -gatan is street again, väg is way, and so on. But if you try to translate something this way without knowing the source language, it's much more difficult.
Why would you want to translate the street names? Do you want to translate Paris' “Avenue des Champs-Élysées” to “Allee der Champs-Élysées”? Nobody would know what it is anymore. Also, nobody wants to translate a “Lindenallee” in some minor German town to “Linden avenue”. And automatic translation would be error-prone.

For complete names you may be right, but for natural language generation in tools based on OSM data it might be useful to translate parts of names. For the Lindenallee this might become "Go down the alley...", where "alley" is not necessarily given as a classification by any tag but follows from the name alone.
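A minimal sketch of the suffix idea, assuming the source language is already known from the tag key (e.g. name:de). The suffix table and the function name `generic_street_term` are hypothetical illustrations, not part of any existing OSM tool; the entries only cover the examples mentioned in this thread.

```python
from typing import Optional

# Hypothetical suffix tables: the generic English term a name ending implies,
# per source language. Entries follow the examples in this thread only.
SUFFIXES = {
    "de": {"straße": "street", "weg": "way", "platz": "square"},
    "sv": {"gatan": "street", "väg": "way"},
}


def generic_street_term(name: str, lang: Optional[str]) -> Optional[str]:
    """Return the generic term implied by the name's suffix, or None.

    With no known source language there is no table to consult, which is
    the harder case described above.
    """
    table = SUFFIXES.get(lang) if lang is not None else None
    if table is None:
        return None
    lowered = name.lower()
    # Check longer suffixes first so a short suffix cannot shadow a longer one.
    for suffix in sorted(table, key=len, reverse=True):
        if lowered.endswith(suffix):
            return table[suffix]
    return None


# The language comes from the tag key, e.g. name:de=Hauptstraße.
print(generic_street_term("Hauptstraße", "de"))  # street
print(generic_street_term("Hauptstraße", None))  # None
```

The point of the sketch is the `lang` parameter: the same lookup is impossible when the name's language is unknown.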

So a recommendation might be to
- always tag name
- if you translate name into different languages, always add name:originalLanguageCode with the same content
- if you want, add that even if you don't translate it into other languages.
Yes, that's redundant, but it's easy for software to cut out if wanted (drop every language attribute that equals the plain name), and it's less error-prone than a tag like "language=de" or the lists of default language areas you propose above. Sure: these lists are helpful for all cases where only name is given, and that's a necessity for great software dealing with this data. But that's how defaults in OSM work: there should be few defaults for mappers, where they decide not to add a tag, but more defaults for data consumers, who could and should be able to make a best guess where data is missing.
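The clean-up step described above is indeed short. A minimal sketch, assuming tags arrive as a plain key/value dictionary; the function name `strip_redundant_names` is hypothetical, and treating every `name:*` key as a localized name is a simplification.

```python
from typing import Dict


def strip_redundant_names(tags: Dict[str, str]) -> Dict[str, str]:
    """Drop every name:* tag whose value is identical to the plain name tag.

    Simplification: any key starting with "name:" is treated as a
    localized name.
    """
    name = tags.get("name")
    if name is None:
        return dict(tags)
    return {
        key: value
        for key, value in tags.items()
        if not (key.startswith("name:") and value == name)
    }


tags = {"name": "Montréal", "name:fr": "Montréal", "name:en": "Montreal", "place": "city"}
print(strip_redundant_names(tags))
# {'name': 'Montréal', 'name:en': 'Montreal', 'place': 'city'}
```

Note that this throws away exactly the information (which language the plain name is in) that the redundant tag carries, so a consumer should only do it when it does not need that information.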
You say that there should be few defaults for mappers. But what you propose is exactly the opposite: you'd have a default, meaning that you would need to create a name:originallanguage even if there is a name present. I would bet that nobody does this. And if you don't do it like that, chaos will occur if you decide to display the name.

Wait...

I agree: even in the long term, the majority of objects will surely not have a name:originallanguage in addition to the plain name tag. This is part of the incompleteness we have everywhere in OSM.

I disagree that this by itself would lead to chaos.

Imagine a text-based application whose output could be read aloud by software. To do that properly, names should be spoken with the pronunciation of the language they come from. Let's take a screen reader for browsers and a browser-based application as an example. Producing the output "Dies ist der Times Square in New York" (this is the Times Square in New York) is simple, but a screen reader working only in German would pronounce it roughly like this (not sure I can render it comparably for English speakers here): "Dees ist der Teames Square in Nu Johk", because it has no way of knowing that Times Square and New York are English names. On a website, additional markup could ideally solve that (given that the screen reader also supports English in the user's setup): Dies ist der <span lang="en">Times Square</span> in <span lang="en">New York</span>. But to generate markup like this, the software has to know the language. Sure: this can be approximated based on the area of the world, and yes, developers have to use something like that for the usual case where the language is still unknown, but in text-to-speech this would produce many wrong results.
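To illustrate why the software needs the language of each name, here is a minimal sketch that emits HTML `lang` markup, assuming each piece of text already comes with a known language (for example from a name:en tag equal to name). The function name `to_html` and the segment format are hypothetical; whether a given screen reader actually switches voice on the `lang` attribute depends on the screen reader.

```python
import html
from typing import List, Optional, Tuple


def to_html(segments: List[Tuple[str, Optional[str]]], page_lang: str) -> str:
    """Join text segments, wrapping foreign-language ones in a lang span.

    Each segment is (text, language); None means "same as the page".
    """
    parts = []
    for text, lang in segments:
        escaped = html.escape(text)
        if lang is None or lang == page_lang:
            parts.append(escaped)
        else:
            parts.append(f'<span lang="{html.escape(lang)}">{escaped}</span>')
    return "".join(parts)


sentence = [
    ("Dies ist der ", None),
    ("Times Square", "en"),  # language assumed known, e.g. from name:en equal to name
    (" in ", None),
    ("New York", "en"),
    (".", None),
]
print(to_html(sentence, page_lang="de"))
# Dies ist der <span lang="en">Times Square</span> in <span lang="en">New York</span>.
```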

In contrast, if you do it based on region, it would simplify things much more:
1. You take the nodes/relation for Canada, add language=en.
2. You take the nodes/relation for Québec: language=fr
Then everybody would just continue using name=British Columbia and name=Montréal, and no problem. The multilingual renderer would then show, in case the user wants to see French names, name=Montréal and name:fr=Colombie-Britannique. If the user is English, it would show name:en=Montreal and name=British Columbia.
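The region-based proposal can be sketched as follows, assuming a simple lookup of default languages for enclosing regions. `REGION_LANGUAGE`, `region_language` and `pick_label` are hypothetical names; returning the label's language alongside the label makes visible that, without a name:xx tag, that language is only assumed from the region.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical region defaults, mirroring the two steps above:
# language=en on Canada, language=fr on Québec.
REGION_LANGUAGE = {"Canada": "en", "Québec": "fr"}


def region_language(regions: List[str]) -> Optional[str]:
    """Default language from enclosing regions (outermost first); innermost wins."""
    for region in reversed(regions):
        if region in REGION_LANGUAGE:
            return REGION_LANGUAGE[region]
    return None


def pick_label(
    tags: Dict[str, str], user_lang: str, region_lang: Optional[str]
) -> Tuple[Optional[str], Optional[str]]:
    """Return (label, language of that label) for one feature."""
    key = f"name:{user_lang}"
    if key in tags:
        return tags[key], user_lang
    if "name" in tags:
        # The plain name is *assumed* to be in the region's default language.
        return tags["name"], region_lang
    return None, None


montreal = {"name": "Montréal", "name:en": "Montreal"}
lang = region_language(["Canada", "Québec"])  # "fr"
print(pick_label(montreal, "fr", lang))       # ('Montréal', 'fr')
print(pick_label(montreal, "en", lang))       # ('Montreal', 'en')
```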

I completely agree, as long as it's only about displaying. I agree that this is a valid fallback, but as I showed above, it cannot solve all problems. Even for rendering I'm not sure that this is really an optimal solution for languages written right-to-left or top-to-bottom. There you have to know at least these characteristics of the language to decide on label sizes and placement; I'm not sure that this is really given by the Unicode characters themselves.

Tell me where this is not easier than adding a redundant name:en or name:fr for every town, bus stop and street in Canada. You would only have to change the multilingual renderer so that it displays names like that. This is no problem because I guess it is still in development; it could be done relatively easily (speaking from a non-developer standpoint).

Examples above.

And yes, it's easier to skip the native language as a separate tag. It will work for most cases, but it won't for many others. We're not a map, we're a geo database, and languages are important for that as well, especially foreign ones. It is in fact interesting to see which pubs and restaurants in Germany have English, French, Spanish, Italian, ... names. It's fascinating to see where, e.g., pubs have Cymraeg (Welsh) or Gaelic names, both in their "native" areas and outside them.

And these are examples that occur often.

Plus, most of today's nodes only have a name=... tag, not a name:xyz=... one. You would not need to change anything.

Sure. Software has to support that and has to make a best guess, but it's only that: a best guess, and sometimes it's wrong, especially in multilingual parts of the world. Assuming English as the language in the Hispanic cities, towns or suburbs of the United States (e.g. Santa Fe, New Mexico [1]) is error-prone, and I'm sure there are areas where two or more languages are used roughly equally.
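A minimal sketch of how a data consumer could combine both sources, assuming the tagging recommended earlier in this mail: if exactly one name:xx tag repeats the plain name, take xx as the name's language; otherwise fall back to a region default and report that it is only a guess. The function name `guess_name_language` is hypothetical, and the Santa Fe tags are illustrative, not copied from the database.

```python
from typing import Dict, Optional, Tuple


def guess_name_language(
    tags: Dict[str, str], region_lang: Optional[str]
) -> Tuple[Optional[str], bool]:
    """Return (language of the plain name, whether it is explicit rather than guessed)."""
    name = tags.get("name")
    if name is None:
        return None, False
    matches = [
        key[len("name:"):]
        for key, value in tags.items()
        if key.startswith("name:") and value == name
    ]
    if len(matches) == 1:
        # Exactly one name:<lang> repeats the plain name: take it as the original language.
        return matches[0], True
    # No match, or several languages spelling it the same way: fall back to region.
    return region_lang, False


print(guess_name_language({"name": "Santa Fe", "name:es": "Santa Fe"}, "en"))  # ('es', True)
print(guess_name_language({"name": "Santa Fe"}, "en"))                        # ('en', False)
```

The boolean flag lets a text-to-speech or rendering tool treat region-based answers with the caution described above.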

So:

- Yes: Software developers should support guessing the natural language (where that's necessary).
- No: Mappers should NOT delete localized name tags just because they are equal to the local name and therefore seem redundant.
- No: Mappers should NOT be told never to add localized tags where only a single name tag exists.


regards
Peter

[1] http://www.openstreetmap.org/?lat=35.68022&lon=-105.94028&zoom=17&layers=M


_______________________________________________
talk mailing list
[email protected]
http://lists.openstreetmap.org/listinfo/talk
