Hello,

I've been trying to get this working for some time now. The problem is with 
Polish characters in OrientDB (e.g. 'ą', 'ę', 'ć', 'ź', 'ż'). When using 
an order by clause, the results aren't ordered properly.
Test case:
1. Create new database (test)
2. Connect to test
3. Create class Book
4. Create 'name' property (string) in Book 
5. Add 6 records to Book
- name = "ącki"
- name = "abrakadabra"
- name = "baran"
- name = "bączek"
- name = "ćwierkacz"
- name = "czarny"
6. select * from Book order by name asc.

*Expected result (sort order by name):*
abrakadabra
ącki
baran
bączek
czarny
ćwierkacz

*Received result:*
abrakadabra
baran
bączek
czarny
ącki
ćwierkacz


I already tried using a Lucene index with the analyzer 
org.apache.lucene.analysis.pl.PolishAnalyzer, but it doesn't seem to work 
(the index distinguishes "a" and "ą", and I don't see a way to set the field 
as an ICUCollationField, which works for Solr as expected). I also tried 
the normalize function (i.e. select * from Book order by 
name.normalize()), which works almost as expected, except that "ą" gets in 
front of "a". It also seems that normalize() was intended for a different 
purpose than being used in order by.

To summarize, I expect order by to work for any language that uses 
diacritics, so it should work for German ü, Polish ą, Czech č, etc. 
In Polish, the letter "ą" comes after "a" but before "b".
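
To make the expected ordering concrete, here is a minimal Java sketch that sorts the six test names on the client side with java.text.Collator for the Polish locale. This is not an OrientDB feature, only an illustration of the collation I am after; the class name PolishSort and the helper sortPolish are made up for this example, and it assumes the JDK in use ships Polish collation data (a standard full JDK does, as far as I know).

```java
import java.text.Collator;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

public class PolishSort {
    // Hypothetical helper: returns a sorted copy of the input,
    // ordered by the JDK's collation rules for the Polish locale.
    static List<String> sortPolish(List<String> names) {
        Collator collator = Collator.getInstance(Locale.forLanguageTag("pl-PL"));
        List<String> sorted = new ArrayList<>(names);
        // Collator implements Comparator<Object>, so it can be passed to sort().
        sorted.sort(collator);
        return sorted;
    }

    public static void main(String[] args) {
        // The same six names as in the Book test case.
        List<String> names = Arrays.asList(
                "ącki", "abrakadabra", "baran", "bączek", "ćwierkacz", "czarny");
        sortPolish(names).forEach(System.out::println);
    }
}
```

With Polish collation data available, this should print the names in the order listed under "Expected result". Sorting in the application like this would only be a workaround, since it does not help with paging or limits applied inside the database.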

What would be the proper way to get this working? I've noticed that the 
only engine I've tried that gets this right is ArangoDB. Neo4j has the same 
problem, and now, trying OrientDB, I cannot get this to work either. 
Would someone be so kind as to point me in the right direction on how to 
approach this issue?


Best Regards
Rafal.
