Re: Language codes

Benson Margulies Wed, 02 Jul 2014 14:28:29 -0700

On Wed, Jul 2, 2014 at 5:11 PM, Andy Seaborne <[email protected]> wrote:
> On 02/07/14 21:45, Benson Margulies wrote:
>>
>> Andy,
>>
>> The upshot of all of this is that ISO-639-3 codes should work.
>> However, that leaves a mystery to me. If I store a triple with @en,
>> and someone queries with @eng, are they supposed to match? In
>> practical terms, do they match in TDB or any other common triple
>> stores?
>
>
> No.
>
> ""@en and ""@eng are different RDF terms.  As is ""@en-uk.
>
> All the stores I know of treat language tags as (normalized) strings.


That's perfectly clear and perfectly awful, at least for people who
care about Persian, Dari, and that ilk. Thanks.

>
> SPARQL uses LANGMATCHES, which is the algorithm from RFC 4647 "Matching of
> Language Tags".
>
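
For reference, the matching that LANGMATCHES performs is the "basic filtering" scheme of RFC 4647 section 3.3.1. A minimal sketch in Python follows; the function name `lang_matches` is made up for illustration and is not any store's API. It shows why a range of "en" picks up "en-GB" but still does not pick up "eng":

```python
def lang_matches(tag: str, lang_range: str) -> bool:
    """Sketch of RFC 4647 basic filtering (hypothetical helper, not a Jena API)."""
    tag = tag.lower()                # language tags compare case-insensitively
    lang_range = lang_range.lower()
    if lang_range == "*":
        # The wildcard range matches any non-empty language tag.
        return tag != ""
    # Match on equality, or on a prefix that ends exactly at a subtag boundary.
    return tag == lang_range or tag.startswith(lang_range + "-")


print(lang_matches("en-GB", "en"))   # range "en" covers the longer tag
print(lang_matches("eng", "en"))     # "eng" is a different primary subtag
```

So range matching helps with regional variants, but it does nothing for the two-letter versus three-letter problem.
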
> If you want semantic (ha!) equality, then canonicalizing on input is best.
> Then worry about en-uk.
>
>         Andy
>
>
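
A rough sketch of what "canonicalizing on input" could look like, in Python. The mapping table is a tiny illustrative subset written by hand, and the names `THREE_TO_TWO` and `canonicalize` are hypothetical; a real table would presumably be derived from the IANA Language Subtag Registry, which lists the two-letter subtag wherever one exists.

```python
# Illustrative subset only (an assumption for this sketch, not a complete table).
THREE_TO_TWO = {
    "eng": "en",
    "fas": "fa",
    "fra": "fr",
    "deu": "de",
}


def canonicalize(tag: str) -> str:
    """Lowercase a language tag and shorten its primary subtag where the table allows."""
    subtags = tag.lower().split("-")
    # Only the primary (first) subtag is rewritten; later subtags are left alone.
    subtags[0] = THREE_TO_TWO.get(subtags[0], subtags[0])
    return "-".join(subtags)


print(canonicalize("eng"))     # en
print(canonicalize("fas-IR"))  # fa-ir
print(canonicalize("prs"))     # prs (no two-letter code to map to)
```

Applied to every literal before it is stored, and to every tag in a query, this would make @en and @eng land on the same RDF term. It deliberately leaves region subtags such as the "uk" in en-uk untouched, which is the separate problem mentioned above.
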
>>
>>
>>
>> On Wed, Jul 2, 2014 at 12:34 PM, Andy Seaborne <[email protected]> wrote:
>>>
>>> On 02/07/14 12:01, Benson Margulies wrote:
>>>>
>>>>
>>>> I always see two-letter ISO-639-1 language codes. This isn't enough:
>>>> not all languages have them.
>>>>
>>>> Does the spec specifically call for these, or does it also allow for -3?
>>>>
>>>> --benson
>>>>
>>>
>>> RDF 1.1 Concepts:
>>>
>>> http://www.w3.org/TR/rdf11-concepts/#section-Graph-Literal
>>>
>>> so it's BCP 47 / RFC 5646
>>>
>>> The grammars do not include the RFC grammar (because a big language tag
>>> grammar would dwarf the rest).
>>>
>>> http://www.w3.org/TR/turtle/#grammar-production-LANGTAG
>>>
>>> [144s]  LANGTAG         ::=     '@' [a-zA-Z]+ ('-' [a-zA-Z0-9]+)*
>>>
>>> So the spec is neutral, and the grammars accept a more general pattern
>>> than the language codes themselves.
>>>
>>> Jena has a language tag parser: LangTag.
>>>
>>>          Andy
>>>
>
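
To see how permissive the Turtle LANGTAG production quoted above is, here is a small Python check that transcribes it directly into a regular expression. The helper name `is_langtag` is made up for this sketch. The production only constrains the shape of the token, so two-letter codes, three-letter codes, and tags that are not valid BCP 47 at all are accepted alike.

```python
import re

# Direct transcription of: LANGTAG ::= '@' [a-zA-Z]+ ('-' [a-zA-Z0-9]+)*
LANGTAG = re.compile(r"@[a-zA-Z]+(-[a-zA-Z0-9]+)*")


def is_langtag(token: str) -> bool:
    """True if the whole token fits the Turtle LANGTAG production (hypothetical helper)."""
    return LANGTAG.fullmatch(token) is not None


for token in ["@en", "@eng", "@en-uk", "@en_GB", "@-en"]:
    print(token, is_langtag(token))
```
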
