On Wed, Jul 2, 2014 at 5:55 PM, Andy Seaborne <[email protected]> wrote:
> On 02/07/14 22:27, Benson Margulies wrote:
>>
>> On Wed, Jul 2, 2014 at 5:11 PM, Andy Seaborne <[email protected]> wrote:
>>>
>>> On 02/07/14 21:45, Benson Margulies wrote:
>>>>
>>>> Andy,
>>>>
>>>> The upshot of all of this is that ISO-639-3 codes should work.
>>>> However, that leaves a mystery to me. If I store a triple with @en,
>>>> and someone queries with @eng, are they supposed to match? In
>>>> practical terms, do they match in TDB or any other common triple
>>>> stores?
>>>
>>> No.
>>>
>>> ""@en and ""@eng are different RDF terms. As is ""@en-uk.
>>>
>>> All the stores I know of treat language tags as (normalized) strings.
>>
>> That's perfectly clear and perfectly awful, at least for people who
>> care about Persian, Dari, and that ilk. Thanks.
>
> Why? All ISO-639 systems are supported - but there is no equivalence tables
> between the different systems built in. Or within the systems (B and T
> codes).

Well, I'm exaggerating. "Perfectly awful" should be "mildly inconvenient".

In my space, it's typical to assume that language code comparisons know
the equivalence between en and eng. So, if one expects to process a range
of data including languages only distinguished in -3 space, one would
like to use -3 codes throughout. But there's lots of RDF out there with
@en, so I can't just throw the switch, as it were, to -3 codes and expect
to match against it. I need to be careful to use -1 codes except for
those languages where -3 codes are required to distinguish.

Am I making sense?

>
> (This is all outside the RDF specs - they just inherit from W3C
> Internationalization and BCP 47).
>
> Experiment with:
> http://www.sparql.org/data-validator.html
>
> Andy
>
>>>
>>> SPARQL uses LANGMATCHES, which is the algorithm from RFC 4647 "Matching
>>> of Language Tags".
>>>
>>> If you want semantic (ha!) equality, then canonicalizing on input is
>>> best. Then worry about en-uk.
>>>
>>> Andy
>>>
>>>>
>>>> On Wed, Jul 2, 2014 at 12:34 PM, Andy Seaborne <[email protected]> wrote:
>>>>>
>>>>> On 02/07/14 12:01, Benson Margulies wrote:
>>>>>>
>>>>>> I always see two-letter ISO-639-1 language codes. This isn't enough,
>>>>>> not all languages have them.
>>>>>>
>>>>>> Does the spec specifically call for these, or does it also allow for
>>>>>> -3?
>>>>>>
>>>>>> --benson
>>>>>>
>>>>>
>>>>> RDF 1.1 Concepts:
>>>>>
>>>>> http://www.w3.org/TR/rdf11-concepts/#section-Graph-Literal
>>>>>
>>>>> so it's BCP 47 / RFC 5646
>>>>>
>>>>> The grammars do not include the RFC grammar (because a big language tag
>>>>> grammar would dwarf the rest).
>>>>>
>>>>> http://www.w3.org/TR/turtle/#grammar-production-LANGTAG
>>>>>
>>>>> [144s] LANGTAG ::= '@' [a-zA-Z]+ ('-' [a-zA-Z0-9]+)*
>>>>>
>>>>> So neutral and the grammars provide a more general match to language
>>>>> codes.
>>>>>
>>>>> Jena has a language tag parser: LangTag.
>>>>>
>>>>> Andy
>>>>>
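
To make the "canonicalize on input" idea from this thread concrete, here is a rough Python sketch, not anything Jena or TDB provides. The names canonicalize, lang_matches and TO_SHORTEST are made up for illustration, the mapping table is a deliberately tiny hand-written sample (a real one would presumably be generated from the IANA Language Subtag Registry), and lowercasing the whole tag is an assumed normalization. lang_matches follows the basic filtering scheme of RFC 4647, which is what LANGMATCHES is defined in terms of.

```python
# Hypothetical sample table: ISO 639-2 (B and T) and 639-3 codes that have
# a two-letter ISO 639-1 equivalent. Illustrative only, far from complete.
TO_SHORTEST = {
    "eng": "en",
    "fra": "fr", "fre": "fr",   # T and B codes for French
    "deu": "de", "ger": "de",   # T and B codes for German
    "fas": "fa", "per": "fa",   # T and B codes for Persian
}


def canonicalize(tag: str) -> str:
    """Lowercase the tag and replace the primary subtag with its
    two-letter code when the sample table has one.

    Codes with no -1 equivalent (e.g. "prs", Dari) pass through unchanged.
    Lowercasing everything is an assumption made for this sketch.
    """
    subtags = tag.lower().split("-")
    subtags[0] = TO_SHORTEST.get(subtags[0], subtags[0])
    return "-".join(subtags)


def lang_matches(tag: str, lang_range: str) -> bool:
    """Basic filtering in the style of RFC 4647 section 3.3.1.

    The range "*" matches any non-empty tag; otherwise the range must
    equal the tag, or be a prefix of it followed by "-", ignoring case.
    """
    if lang_range == "*":
        return tag != ""
    tag, lang_range = tag.lower(), lang_range.lower()
    return tag == lang_range or tag.startswith(lang_range + "-")


if __name__ == "__main__":
    for raw in ["en", "eng", "en-UK", "prs"]:
        canon = canonicalize(raw)
        print(raw, "->", canon, "| matches 'en':", lang_matches(canon, "en"))
```

Without the canonicalization step, lang_matches("eng", "en") is false, which is the mismatch discussed above. Applying canonicalize to both stored literals and query terms is one way to get the en/eng equivalence while leaving -3-only languages such as Dari alone.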
