Re: Language-specific collation in ARQ

Andy Seaborne Mon, 24 Oct 2016 13:44:51 -0700

Does that apply to "<" on strings as well or is it specific to collation?


    Andy

On 24/10/16 21:25, james anderson wrote:

good evening;

On 2016-10-24, at 14:02, Osma Suominen <[email protected]> wrote:

Hi!

I'm looking into an issue [1] we have in the Skosmos application with the 
ordering of literals as returned by a SPARQL query served by Fuseki. It appears 
that when using ORDER BY, the order of literals is based on Unicode collation 
order (or something similar). This is not always optimal for user-facing 
applications where language-specific collation order would be expected.
...

From what I found, SPARQL doesn't really specify the collation order of literals 
[3,4,5]. Often generic Unicode collation is used. However, Dydra, a cloud-based triple 
store, has special support for language-specific collation [5]. There, the logic is this: 
"plain literals which share a language tag are ordered according to the collation 
rules for the respective language" [5,6]. Implementing collation this way makes a 
lot of sense to me.
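As a rough illustration of that rule, here is a Java sketch using the JDK's java.text.Collator. LangLiteral, collatorFor, ORDER and sortedLex are hypothetical names, not Jena or Dydra API. The handling of literals with different or no language tags (order by tag, then by code point) is an assumption, since the quoted rule only covers literals that share a tag.

```java
import java.text.Collator;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Sketch only: LangLiteral is a hypothetical stand-in for an RDF literal,
// not a Jena class, and ORDER is not how ARQ actually sorts.
class LangCollation {

    // Hypothetical literal: lexical form plus language tag ("" if none).
    static final class LangLiteral {
        final String lex;
        final String lang;

        LangLiteral(String lex, String lang) {
            this.lex = lex;
            this.lang = lang.toLowerCase(Locale.ROOT); // tags are case-insensitive
        }
    }

    // One java.text.Collator per language tag, created on first use.
    private static final Map<String, Collator> COLLATORS = new ConcurrentHashMap<>();

    static Collator collatorFor(String lang) {
        return COLLATORS.computeIfAbsent(lang,
                tag -> Collator.getInstance(Locale.forLanguageTag(tag)));
    }

    // Same non-empty tag: compare with that language's collation rules.
    // Different tags (ASSUMPTION, not part of the quoted rule): order by tag,
    // then fall back to plain code-point order of the lexical forms.
    static final Comparator<LangLiteral> ORDER = (a, b) -> {
        if (!a.lang.isEmpty() && a.lang.equals(b.lang)) {
            return collatorFor(a.lang).compare(a.lex, b.lex);
        }
        int byTag = a.lang.compareTo(b.lang);
        return byTag != 0 ? byTag : a.lex.compareTo(b.lex);
    };

    // Sorts a copy of the input and returns just the lexical forms.
    static List<String> sortedLex(List<LangLiteral> in) {
        List<LangLiteral> copy = new ArrayList<>(in);
        copy.sort(ORDER);
        List<String> out = new ArrayList<>();
        for (LangLiteral l : copy) {
            out.add(l.lex);
        }
        return out;
    }

    public static void main(String[] args) {
        List<LangLiteral> labels = Arrays.asList(
                new LangLiteral("Zitrone", "de"),
                new LangLiteral("\u00C4pfel", "de"), // "Äpfel"
                new LangLiteral("Birne", "de"));
        System.out.println(sortedLex(labels));
    }
}
```

With the German collator "Äpfel" sorts before "Birne", whereas plain code-point comparison would put it after "Zitrone", because Ä (U+00C4) lies above every ASCII letter.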

Could the same be done with Jena ARQ? Either by changing the current sorting 
implementation to be language-aware, or by using some custom extension function 
to pre-process the literals into strings that can then be compared using ORDER 
BY? Would this be a lot of work to implement?
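One way to read the "pre-process the literals" option is to map each lexical form to a hex-encoded collation key whose ordinary string order matches the collator's order. The javadoc for CollationKey.toByteArray() says comparing the byte arrays gives the same result as comparing the keys, most significant byte first, and fixed-width hex preserves that order. This is a sketch only: CollationSortKey and sortKey are hypothetical names, and wrapping the helper as an ARQ extension function is not shown.

```java
import java.text.CollationKey;
import java.text.Collator;
import java.util.Locale;

// Sketch: turn a lexical form into a string whose plain code-point order
// follows the given language's collation order. Hypothetical helper, not Jena API.
class CollationSortKey {

    static String sortKey(String lex, String langTag) {
        Collator collator = Collator.getInstance(Locale.forLanguageTag(langTag));
        CollationKey key = collator.getCollationKey(lex);
        StringBuilder hex = new StringBuilder();
        // Two lower-case hex digits per byte keep unsigned byte order
        // when the result is compared as an ordinary string.
        for (byte b : key.toByteArray()) {
            hex.append(String.format("%02x", b & 0xff));
        }
        return hex.toString();
    }

    public static void main(String[] args) {
        // Code-point order puts "Banana" before "apple" (upper case first)...
        System.out.println("Banana".compareTo("apple") < 0); // true
        // ...while the English collation keys put "apple" first.
        System.out.println(sortKey("apple", "en").compareTo(sortKey("Banana", "en")) < 0); // true
    }
}
```

A query could then use something like ORDER BY lang(?label) ex:sortKey(?label, lang(?label)), where ex:sortKey is a hypothetical function. Keys built by different collators are not meaningfully comparable, hence the lang(?label) term first.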

I note that there is basic language-sensitive collation support available in 
the java.text.Collator class [7] of the standard Java library (it has been 
there since JDK 1.1). A possibly more complete (and apparently faster) 
Collator implementation [8] is available in the ICU4J library.
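For reference, here is a minimal sketch of the JDK class in use. Collator.getInstance(Locale) returns a locale-specific instance that can be passed directly as a Comparator, and setStrength(Collator.PRIMARY) makes it ignore accent and case differences. ICU4J's com.ibm.icu.text.Collator exposes a broadly similar getInstance/compare API, but it is not exercised here. CollatorDemo and its methods are illustrative names.

```java
import java.text.Collator;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

// Minimal use of the JDK's java.text.Collator: get a locale-specific
// instance, optionally change its strength, and use it as a Comparator.
class CollatorDemo {

    static List<String> sorted(List<String> words, Locale locale) {
        Collator collator = Collator.getInstance(locale); // default strength: TERTIARY
        List<String> out = new ArrayList<>(words);
        out.sort(collator); // Collator implements Comparator<Object>
        return out;
    }

    // At PRIMARY strength only base letters matter, so accent and case
    // differences are ignored.
    static boolean equalIgnoringAccentsAndCase(String a, String b, Locale locale) {
        Collator collator = Collator.getInstance(locale);
        collator.setStrength(Collator.PRIMARY);
        return collator.compare(a, b) == 0;
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("cherry", "Banana", "apple");
        System.out.println(sorted(words, Locale.ENGLISH)); // [apple, Banana, cherry]
        System.out.println(equalIgnoringAccentsAndCase("Apfel", "\u00E4pfel", Locale.GERMAN)); // true
    }
}
```

Plain String.compareTo would instead give [Banana, apple, cherry], since upper-case ASCII letters precede lower-case ones.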


in the interest of promoting interoperability, i note, we use the icu-project 
library.

best regards, from berlin,
---
james anderson | [email protected] | http://dydra.com
