On Wed, Jul 2, 2014 at 5:55 PM, Andy Seaborne <[email protected]> wrote:
> On 02/07/14 22:27, Benson Margulies wrote:
>>
>> On Wed, Jul 2, 2014 at 5:11 PM, Andy Seaborne <[email protected]> wrote:
>>>
>>> On 02/07/14 21:45, Benson Margulies wrote:
>>>>
>>>>
>>>> Andy,
>>>>
>>>> The upshot of all of this is that ISO-639-3 codes should work.
>>>> However, that leaves a mystery to me. If I store a triple with @en,
>>>> and someone queries with @eng, are they supposed to match? In
>>>> practical terms, do they match in TDB or any other common triple
>>>> stores?
>>>
>>>
>>>
>>> No.
>>>
>>> ""@en and ""@eng are different RDF terms.  As is ""@en-uk.
>>>
>>> All the stores I know of treat language tags as (normalized) strings.
>>
>>
>> That's perfectly clear and perfectly awful, at least for people who
>> care about Persian, Dari, and that ilk. Thanks.
>
>
> Why?  All ISO-639 systems are supported - but there are no equivalence tables
> built in between the different systems, or within the systems (B and T
> codes).

Well, I'm exaggerating. "Perfectly awful" should be "mildly inconvenient".

In my space, it's typical to assume that language code comparisons
know the equivalence between en and eng. That matters if one expects
to process a range of data including languages only distinguished in
-3 space. There's lots of RDF out there with @en, so I can't just
throw the switch, as it were, to -3 codes and expect to match against
it. I need to be careful to use -1 codes except for those languages
where -3 codes are required to distinguish.
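
To make that rule concrete, here is a minimal sketch in Python of
canonicalizing a tag before storing or querying. The table
ISO639_3_TO_1 and the function canonical_lang_tag are hypothetical
names of my own, not part of Jena or any spec, and the table is
deliberately partial; a real one would be generated from the ISO 639-3
registry. Lowercasing the whole tag is also an assumption about how
the store normalizes tags.

```python
# Hypothetical, deliberately partial table: ISO 639-3 code -> ISO 639-1 code.
# Only languages that actually have a two-letter code belong here.
ISO639_3_TO_1 = {
    "eng": "en",
    "fra": "fr",
    "deu": "de",
    "fas": "fa",  # Persian
}


def canonical_lang_tag(tag: str) -> str:
    """Lowercase the tag and replace a three-letter primary subtag with
    its two-letter equivalent when the table has one; otherwise keep it."""
    primary, sep, rest = tag.lower().partition("-")
    primary = ISO639_3_TO_1.get(primary, primary)
    return primary + sep + rest


print(canonical_lang_tag("eng"))     # matches existing @en data
print(canonical_lang_tag("ENG-US"))  # region subtag is carried through
print(canonical_lang_tag("prs"))     # Dari has no -1 code, so it stays as-is
```

Applied on input and again to the tag in each query, this keeps
-1 codes wherever they exist and falls back to -3 codes only for
languages that have nothing shorter.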

Am I making sense?


>
> (This is all outside the RDF specs - they just inherit from W3C
> Internationalization and BCP 47).
>
>
> Experiment with:
> http://www.sparql.org/data-validator.html
>
>         Andy
>
>
>>
>>>
>>> SPARQL uses LANGMATCHES, which is the algorithm from RFC 4647,
>>> "Matching of Language Tags".
>>>
>>> If you want semantic (ha!) equality, then canonicalizing on input is best.
>>> Then worry about en-uk.
>>>
>>>          Andy
>>>
>>>
>>>>
>>>>
>>>>
>>>> On Wed, Jul 2, 2014 at 12:34 PM, Andy Seaborne <[email protected]> wrote:
>>>>>
>>>>>
>>>>> On 02/07/14 12:01, Benson Margulies wrote:
>>>>>>
>>>>>>
>>>>>>
>>>>>> I always see two-letter ISO-639-1 language codes. This isn't enough,
>>>>>> not all languages have them.
>>>>>>
>>>>>> Does the spec specifically call for these, or does it also allow for
>>>>>> -3?
>>>>>>
>>>>>> --benson
>>>>>>
>>>>>
>>>>> RDF 1.1 Concepts:
>>>>>
>>>>> http://www.w3.org/TR/rdf11-concepts/#section-Graph-Literal
>>>>>
>>>>> so it's BCP 47 / RFC 5646
>>>>>
>>>>> The grammars do not include the RFC grammar (because a big language tag
>>>>> grammar would dwarf the rest).
>>>>>
>>>>> http://www.w3.org/TR/turtle/#grammar-production-LANGTAG
>>>>>
>>>>> [144s]  LANGTAG         ::=     '@' [a-zA-Z]+ ('-' [a-zA-Z0-9]+)*
>>>>>
>>>>> So the spec is neutral, and the grammars provide a more general match
>>>>> for language codes.
>>>>>
>>>>> Jena has a language tag parser: LangTag.
>>>>>
>>>>>           Andy
>>>>>
>>>
>
