good evening;

> On 2016-10-24, at 14:02, Osma Suominen <[email protected]> wrote:
>
> Hi!
>
> I'm looking into an issue [1] we have in the Skosmos application with the
> ordering of literals as returned by a SPARQL query served by Fuseki. It
> appears that when using ORDER BY, the order of literals is based on Unicode
> collation order (or something similar). This is not always optimal for
> user-facing applications where language-specific collation order would be
> expected.
> ...
>
> Based on what I found out, SPARQL doesn't really state the collation order of
> literals [3,4,5]. Often generic Unicode collation is used. However, Dydra, a
> cloud-based triple store, has special support for language-specific collation
> [5]. There, the logic is this: "plain literals which share a language tag are
> ordered according to the collation rules for the respective language" [5,6].
> Implementing collation this way makes a lot of sense to me.
>
> Could the same be done with Jena ARQ? Either by changing the current sorting
> implementation to be language-aware, or by using some custom extension
> function to pre-process the literals into strings that can then be compared
> using ORDER BY? Would this be a lot of work to implement?
>
> I note that there is basic language-sensitive collation support available in
> the Collator class [7] introduced in Java 7. A possibly more complete (and
> apparently faster) Collator implementation [8] is available in the ICU4J
> library.
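
for illustration only, the rule quoted above (literals which share a language tag are ordered by that language's collation rules) might be sketched roughly as follows. this is a minimal sketch built on the jdk's java.text.Collator, not dydra's or jena's actual code: the class LangCollationSketch, the Literal stand-in and the helper names are all hypothetical, and the grouping of differently-tagged literals by tag is an assumption, since the quoted rule does not cover that case.

```java
import java.text.Collator;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.Locale;

// hypothetical sketch, not dydra's or jena's code: all names here are made up.
class LangCollationSketch {

    // stand-in for a plain literal with a language tag, e.g. "apina"@fi
    static final class Literal {
        final String lexical;
        final String lang;

        Literal(String lexical, String lang) {
            this.lexical = lexical;
            this.lang = lang;
        }
    }

    static int compare(Literal a, Literal b) {
        if (!a.lang.equalsIgnoreCase(b.lang)) {
            // assumption: literals in different languages are grouped by tag
            return a.lang.compareToIgnoreCase(b.lang);
        }
        // same language tag: use the collation rules for that language
        Collator collator = Collator.getInstance(Locale.forLanguageTag(a.lang));
        int byLanguage = collator.compare(a.lexical, b.lexical);
        // break collation ties by code unit so the order stays deterministic
        return byLanguage != 0 ? byLanguage : a.lexical.compareTo(b.lexical);
    }

    // sort lexical forms as if they were all literals tagged with `lang`
    static List<String> sortLexicals(String lang, String... lexicals) {
        List<Literal> literals = new ArrayList<>();
        for (String lexical : lexicals) {
            literals.add(new Literal(lexical, lang));
        }
        literals.sort(LangCollationSketch::compare);
        List<String> sorted = new ArrayList<>();
        for (Literal literal : literals) {
            sorted.add(literal.lexical);
        }
        return sorted;
    }

    // baseline for comparison: plain String order (utf-16 code units)
    static List<String> sortByCodeUnit(String... lexicals) {
        List<String> sorted = new ArrayList<>(Arrays.asList(lexicals));
        sorted.sort(Comparator.naturalOrder());
        return sorted;
    }

    public static void main(String[] args) {
        // "\u00e4iti" is "aiti" with a-umlaut, escaped to avoid source-encoding issues
        String[] labels = {"Zebra", "apina", "\u00e4iti"};
        System.out.println("code units: " + sortByCodeUnit(labels));
        System.out.println("de: " + sortLexicals("de", labels));
        System.out.println("fi: " + sortLexicals("fi", labels));
    }
}
```

the point of the comparison is that finnish alphabetical order places ä after z, while german treats it as a variant of a, so the same three labels should come out in different orders per language tag. what the jdk actually prints depends on the locale data it ships with; an icu4j collator could presumably be swapped in at the Collator.getInstance call.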

in the interest of promoting interoperability, i note, we use the icu-project library.

best regards, from berlin,
---
james anderson | [email protected] | http://dydra.com
