Does that apply to "<" on strings as well or is it specific to collation?
Andy
On 24/10/16 21:25, james anderson wrote:
good evening;
On 2016-10-24, at 14:02, Osma Suominen <[email protected]> wrote:
Hi!
I'm looking into an issue [1] we have in the Skosmos application with the
ordering of literals as returned by a SPARQL query served by Fuseki. It appears
that when using ORDER BY, the order of literals is based on Unicode collation
order (or something similar). This is not always optimal for user-facing
applications where language-specific collation order would be expected.
...
Based on what I found out, SPARQL doesn't really state the collation order of literals
[3,4,5]. Often generic Unicode collation is used. However, Dydra, a cloud-based triple
store, has special support for language-specific collation [5]. There, the logic is this:
"plain literals which share a language tag are ordered according to the collation
rules for the respective language" [5,6]. Implementing collation this way makes a
lot of sense to me.
Could the same be done with Jena ARQ? Either by changing the current sorting
implementation to be language-aware, or by using some custom extension function
to pre-process the literals into strings that can then be compared using ORDER
BY? Would this be a lot of work to implement?
I note that there is basic language-sensitive collation support available in
the java.text.Collator class [7] in the standard Java library. A possibly more
complete (and apparently faster) Collator implementation [8] is available in
the ICU4J library.
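To make the idea concrete, here is a minimal sketch of language-aware sorting
with java.text.Collator, next to the plain String ordering. The class and
method names are hypothetical and are not part of Jena ARQ or Dydra. The sketch
assumes the literals have already been grouped by language tag, and the
tailoring actually applied for a given tag depends on the locale data shipped
with the JVM.

```java
import java.text.Collator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical illustration only; not an ARQ or Dydra API.
final class LanguageCollationSketch {

    // Sort labels that share one language tag using that language's collation rules.
    static List<String> sortForLanguage(List<String> labels, String languageTag) {
        Locale locale = Locale.forLanguageTag(languageTag);
        Collator collator = Collator.getInstance(locale);
        List<String> sorted = new ArrayList<>(labels); // leave the input untouched
        sorted.sort(collator);                         // Collator is a Comparator<Object>
        return sorted;
    }

    // Baseline: String's natural order, which compares UTF-16 code units.
    static List<String> sortByCodeUnit(List<String> labels) {
        List<String> sorted = new ArrayList<>(labels);
        sorted.sort(null); // null comparator means natural ordering
        return sorted;
    }

    public static void main(String[] args) {
        List<String> labels = List.of("Zebra", "apple", "äiti");
        System.out.println("code units: " + sortByCodeUnit(labels));
        System.out.println("en:         " + sortForLanguage(labels, "en"));
        System.out.println("fi:         " + sortForLanguage(labels, "fi"));
    }
}
```

With the plain ordering, every upper-case ASCII letter sorts before every
lower-case one, and accented letters sort after "z". A collator compares
letters first and case or accents only as tie-breakers, and a language-specific
tailoring can additionally move letters such as "ä" to where that language's
alphabet puts them.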
in the interest of promoting interoperability, i note, we use the icu-project
library.
best regards, from berlin,
---
james anderson | [email protected] | http://dydra.com