Re: Language-specific collation in ARQ

james anderson Mon, 24 Oct 2016 13:43:02 -0700

good evening;

> On 2016-10-24, at 14:02, Osma Suominen <[email protected]> wrote:
> 
> Hi!
> 
> I'm looking into an issue [1] we have in the Skosmos application with the 
> ordering of literals as returned by a SPARQL query served by Fuseki. It 
> appears that when using ORDER BY, the order of literals is based on Unicode 
> collation order (or something similar). This is not always optimal for 
> user-facing applications where language-specific collation order would be 
> expected.
> ...
> 
> Based on what I found out, SPARQL doesn't really state the collation order of 
> literals [3,4,5]. Often generic Unicode collation is used. However, Dydra, a 
> cloud-based triple store, has special support for language-specific collation 
> [5]. There, the logic is this: "plain literals which share a language tag are 
> ordered according to the collation rules for the respective language" [5,6]. 
> Implementing collation this way makes a lot of sense to me.
> 
> Could the same be done with Jena ARQ? Either by changing the current sorting 
> implementation to be language-aware, or by using some custom extension 
> function to pre-process the literals into strings that can then be compared 
> using ORDER BY? Would this be a lot of work to implement?
> 
> I note that there is basic language-sensitive collation support available in 
> the Collator class [7] introduced in Java 7. A possibly more complete (and 
> apparently faster) Collator implementation [8] is available in the ICU4J 
> library.


in the interest of promoting interoperability, i note that we use the
icu-project library for collation.
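
as a rough illustration of the rule quoted above (literals sharing a language tag are ordered by that language's collation rules), here is a minimal sketch. it is not dydra's or ARQ's implementation: the class name CollationSketch and the method sortForLanguage are hypothetical, and it uses the JDK's java.text.Collator for self-containment rather than ICU, which offers a similarly shaped Collator class.

```java
import java.text.Collator;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

// Hypothetical helper, for illustration only: sorts the lexical forms of
// literals that all carry the same language tag.
class CollationSketch {

    // Returns a sorted copy of `values`, ordered by the collation rules
    // the JDK associates with `languageTag` (a BCP 47 tag such as "en" or "fi").
    static List<String> sortForLanguage(List<String> values, String languageTag) {
        Collator collator = Collator.getInstance(Locale.forLanguageTag(languageTag));
        List<String> sorted = new ArrayList<>(values);
        sorted.sort(collator); // Collator implements Comparator<Object>
        return sorted;
    }

    // Returns a sorted copy of `values` in plain String.compareTo order
    // (UTF-16 code unit order), for comparison with the collated result.
    static List<String> sortByCodeUnits(List<String> values) {
        List<String> sorted = new ArrayList<>(values);
        sorted.sort(null); // null comparator means natural ordering
        return sorted;
    }

    public static void main(String[] args) {
        List<String> labels = Arrays.asList("z", "ä", "a");
        System.out.println("code units: " + sortByCodeUnits(labels));
        System.out.println("en:         " + sortForLanguage(labels, "en"));
        // The Finnish result depends on the locale data shipped with the runtime.
        System.out.println("fi:         " + sortForLanguage(labels, "fi"));
    }
}
```

with english rules "ä" sorts next to "a", while plain string comparison places it after "z"; a sorter applying this per language tag would still need a separate rule for literals whose tags differ.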

best regards, from berlin,
---
james anderson | [email protected] | http://dydra.com
