On Wed, Jul 2, 2014 at 5:11 PM, Andy Seaborne <[email protected]> wrote:
> On 02/07/14 21:45, Benson Margulies wrote:
>>
>> Andy,
>>
>> The upshot of all of this is that ISO-639-3 codes should work.
>> However, that leaves a mystery to me. If I store a triple with @en,
>> and someone queries with @eng, are they supposed to match? In
>> practical terms, do they match in TDB or any other common triple
>> stores?
>
>
> No.
>
> ""@en and ""@eng are different RDF terms. As is ""@en-uk.
>
> All the stores I know of treat language tags as (normalized) strings.
That's perfectly clear and perfectly awful, at least for people who
care about Persian, Dari, and that ilk. Thanks.

>
> SPARQL uses LANGMATCHES, which is the algorithm from RFC 4647 "Matching of
> Language Tags".
>
> If you want semantic (ha!) equality, then canonicalizing on input is best.
> Then worry about en-uk.
>
> Andy
>
>
>>
>>
>>
>> On Wed, Jul 2, 2014 at 12:34 PM, Andy Seaborne <[email protected]> wrote:
>>>
>>> On 02/07/14 12:01, Benson Margulies wrote:
>>>>
>>>>
>>>> I always see two-letter ISO-639-1 language codes. This isn't enough,
>>>> not all languages have them.
>>>>
>>>> Does the spec specifically call for these, or does it also allow for -3?
>>>>
>>>> --benson
>>>>
>>>
>>> RDF 1.1 Concepts:
>>>
>>> http://www.w3.org/TR/rdf11-concepts/#section-Graph-Literal
>>>
>>> so it's BCP 47 / RFC 5646
>>>
>>> The grammars do not include the RFC grammar (because a big language tag
>>> grammar would dwarf the rest).
>>>
>>> http://www.w3.org/TR/turtle/#grammar-production-LANGTAG
>>>
>>> [144s] LANGTAG ::= '@' [a-zA-Z]+ ('-' [a-zA-Z0-9]+)*
>>>
>>> So neutral and the grammars provide a more general match to language
>>> codes.
>>>
>>> Jena has a language tag parser: LangTag.
>>>
>>> Andy
>>>
>
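
For anyone reading this thread later, here is a minimal Python sketch of the points above. It is not Jena's LangTag and not what TDB does internally. The function names (lang_matches, canonicalize) and the three-entry ISO639_3_TO_1 table are made up for illustration; a real canonicalizer would need the full IANA Language Subtag Registry. lang_matches is my reading of RFC 4647 basic filtering, which is what LANGMATCHES is described as using.

```python
import re

# Turtle's LANGTAG production, exactly as quoted in the thread.
LANGTAG = re.compile(r"@[a-zA-Z]+(-[a-zA-Z0-9]+)*")


def lang_matches(tag: str, lang_range: str) -> bool:
    """RFC 4647 basic filtering (case-insensitive prefix match on subtags)."""
    tag, lang_range = tag.lower(), lang_range.lower()
    if lang_range == "*":
        return tag != ""  # "*" matches any non-empty tag
    return tag == lang_range or tag.startswith(lang_range + "-")


# Hypothetical, deliberately tiny mapping: illustration only.
ISO639_3_TO_1 = {"eng": "en", "fas": "fa", "fra": "fr"}


def canonicalize(tag: str) -> str:
    """Lowercase the tag and swap a 3-letter primary subtag for its
    2-letter form when the (partial) table above knows one."""
    subtags = tag.lower().split("-")
    subtags[0] = ISO639_3_TO_1.get(subtags[0], subtags[0])
    return "-".join(subtags)


if __name__ == "__main__":
    print(bool(LANGTAG.fullmatch("@eng")))          # grammar accepts 3-letter codes
    print(lang_matches("en-GB", "en"))              # prefix match on a subtag boundary
    print(lang_matches("eng", "en"))                # "eng" is not "en" plus a subtag
    print(canonicalize("eng") == canonicalize("en"))
```

Run as a script, this shows that the grammar accepts @eng, that range matching does not bridge en and eng, and that the two only compare equal once both are canonicalized before storing and before querying.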
