Re: Language codes

Andy Seaborne Wed, 02 Jul 2014 14:57:08 -0700

On 02/07/14 22:27, Benson Margulies wrote:

On Wed, Jul 2, 2014 at 5:11 PM, Andy Seaborne <[email protected]> wrote:

On 02/07/14 21:45, Benson Margulies wrote:


Andy,

The upshot of all of this is that ISO-639-3 codes should work.
However, that leaves a mystery to me. If I store a triple with @en,
and someone queries with @eng, are they supposed to match? In
practical terms, do they match in TDB or any other common triple
stores?



No.

""@en and ""@eng are different RDF terms.  As is ""@en-uk.

All the stores I know of treat language tags as (normalized) strings.


That's perfectly clear and perfectly awful, at least for people who
care about Persian, Dari, and that ilk. Thanks.

Why? All ISO-639 systems are supported - but there are no equivalence tables between the different systems built in. Or within the systems (B and T codes).

(This is all outside the RDF specs - they just inherit from W3C Internationalization and BCP 47).



Experiment with:
http://www.sparql.org/data-validator.html

        Andy


SPARQL uses LANGMATCHES, which is the algorithm from RFC 4647 "Matching of
Language Tags".
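
As a rough illustration of why "eng" and "en" do not match under LANGMATCHES, here is a minimal Python sketch of RFC 4647 basic filtering (section 3.3.1). The function name lang_matches is made up for this example; it is not Jena's or any store's API.

```python
def lang_matches(tag: str, lang_range: str) -> bool:
    """RFC 4647 basic filtering: compare a language tag against a range.

    Comparison is case-insensitive. A range matches a tag when they are
    equal, or when the tag starts with the range followed by "-".
    """
    tag = tag.lower()
    lang_range = lang_range.lower()
    if lang_range == "*":
        # The wildcard range matches any non-empty tag.
        return tag != ""
    return tag == lang_range or tag.startswith(lang_range + "-")


if __name__ == "__main__":
    for tag, rng in [("en", "en"), ("en-uk", "en"), ("eng", "en"), ("en", "eng")]:
        print(f"{tag!r} against range {rng!r}: {lang_matches(tag, rng)}")
```

The match is purely on subtag prefixes, so "en-uk" is caught by the range "en" while "eng" is not: nothing in the algorithm knows that the two codes name the same language.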

If you want semantic (ha!) equality, then canonicalizing on input is best.
Then worry about en-uk.
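
A sketch of what canonicalizing on input might look like, in Python. The mapping table TO_SHORT and the function canonicalize are hypothetical and deliberately tiny; a real table would have to be built from the IANA language subtag registry. It assumes that folding case and rewriting only the primary language subtag is acceptable for the data at hand.

```python
# Hypothetical, incomplete table: three-letter codes (including both the
# B and T variants) mapped to the two-letter code where one exists.
TO_SHORT = {
    "eng": "en",
    "fas": "fa", "per": "fa",   # Persian: T code and B code
    "fra": "fr", "fre": "fr",   # French: T code and B code
    "deu": "de", "ger": "de",   # German: T code and B code
}


def canonicalize(tag: str) -> str:
    """Lower-case the tag and shorten its primary language subtag if known.

    Codes with no entry in the table (for example "prs") pass through
    unchanged apart from case folding.
    """
    subtags = tag.lower().split("-")
    subtags[0] = TO_SHORT.get(subtags[0], subtags[0])
    return "-".join(subtags)


if __name__ == "__main__":
    for tag in ["eng", "ENG-us", "per", "fas", "prs", "en-uk"]:
        print(tag, "->", canonicalize(tag))
```

Applied to both the stored data and the query terms, this makes @eng and @en the same RDF term. It leaves region subtags such as the one in en-uk untouched, which is the remaining worry.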

         Andy




On Wed, Jul 2, 2014 at 12:34 PM, Andy Seaborne <[email protected]> wrote:


On 02/07/14 12:01, Benson Margulies wrote:



I always see two-letter ISO-639-1 language codes. This isn't enough,
not all languages have them.

Does the spec specifically call for these, or does it also allow for -3?

--benson


RDF 1.1 Concepts:

http://www.w3.org/TR/rdf11-concepts/#section-Graph-Literal

so it's BCP 47 / RFC 5646

The grammars do not include the RFC grammar (because a big language tag
grammar would dwarf the rest).

http://www.w3.org/TR/turtle/#grammar-production-LANGTAG

[144s]  LANGTAG         ::=     '@' [a-zA-Z]+ ('-' [a-zA-Z0-9]+)*

So the spec is neutral, and the grammars provide a more general match for
language codes.
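
To see how loose the LANGTAG production is, it can be transcribed into a regular expression. This is a Python sketch for experimenting, not the parser any toolkit actually uses; the helper name is_langtag is made up for the example.

```python
import re

# Direct transcription of Turtle production [144s]:
#   LANGTAG ::= '@' [a-zA-Z]+ ('-' [a-zA-Z0-9]+)*
LANGTAG = re.compile(r"@[a-zA-Z]+(?:-[a-zA-Z0-9]+)*")


def is_langtag(token: str) -> bool:
    """True if the whole token fits the Turtle LANGTAG production."""
    return LANGTAG.fullmatch(token) is not None


if __name__ == "__main__":
    for token in ["@en", "@eng", "@en-uk", "@x-anything-goes-123", "@12", "@en_uk"]:
        print(token, is_langtag(token))
```

Two-letter and three-letter codes are both accepted, as is almost any hyphen-separated run of letters and digits; checking a tag against BCP 47 itself is a separate step.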

Jena has a language tag parser: LangTag.

          Andy
