On 30/10/12 14:28, Osma Suominen wrote:
On 27.10.2012 02:31, Ondřej Hoferek wrote:
I would like to use the full text search with LARQ for accent-insensitive
matching, i.e. the pattern {?literal pf:textMatch "laska"} should also return
the literal "láska žije".
I know that in Lucene there is a class ISOLatin1AccentFilter which can be
used while building/querying the index. However, I don't know how to use it
from within LARQ.
Is there any way to achieve my goal?
Hi Ondrej,
I've had similar wishes but unfortunately LARQ is currently hardwired to
use the Lucene StandardAnalyzer which does not support such a filter. So
you would have to modify LARQ to use a custom analyzer with an accent
filter such as ISOLatin1AccentFilter or perhaps ICUFoldingFilter.
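To illustrate what such an accent filter does, here is a minimal JDK-only sketch. It is not Lucene code, and the class and method names are made up for illustration. It approximates an analyzer chain of lowercase, fold accents, then tokenize, using java.text.Normalizer to strip combining marks. The key point is that the same analysis has to be applied both when indexing the literal and when analysing the query term, otherwise "laska" will never match "láska".

```java
import java.text.Normalizer;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

public class AccentFoldingSketch {

    // Decompose (NFD) and drop combining marks, e.g. "á" -> "a" + U+0301 -> "a".
    // Approximates an accent filter; this is NOT Lucene's implementation.
    static String fold(String s) {
        return Normalizer.normalize(s, Normalizer.Form.NFD).replaceAll("\\p{M}+", "");
    }

    // Rough stand-in for an analyzer: lowercase, fold accents, split on non-alphanumerics.
    static List<String> analyze(String text) {
        String folded = fold(text.toLowerCase(Locale.ROOT));
        List<String> tokens = new ArrayList<>();
        for (String t : folded.split("[^\\p{L}\\p{N}]+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    // The query term goes through the SAME analysis as the indexed literal;
    // every resulting query token must occur among the literal's tokens.
    static boolean matches(String literal, String queryTerm) {
        List<String> indexTokens = analyze(literal);
        List<String> queryTokens = analyze(queryTerm);
        return !queryTokens.isEmpty() && indexTokens.containsAll(queryTokens);
    }

    public static void main(String[] args) {
        System.out.println(analyze("láska žije"));           // [laska, zije]
        System.out.println(matches("láska žije", "laska")); // true
    }
}
```

Characters without a Unicode decomposition, such as ø or ł, pass through this sketch unchanged. That is one reason dedicated folding filters such as ASCIIFoldingFilter exist.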
What I would like to see is a way to configure LARQ to use a custom
analyzer, filter etc. combination without having to change the Java
code. For example Solr allows one to select suitable analysis components
using an XML configuration file.
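Something in the spirit of Solr's analyzer configuration could be as simple as reading an ordered list of filter names and composing them. The sketch below is purely hypothetical: it uses only the JDK (no Lucene or Solr API), and the filter names "lowercase" and "accentfold" are invented. It only shows the idea of selecting analysis components by name instead of hard-coding them.

```java
import java.text.Normalizer;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.function.UnaryOperator;

public class ConfigurableChainSketch {

    // Hypothetical registry of token filters, keyed by the name a config file would use.
    static final Map<String, UnaryOperator<String>> FILTERS = new LinkedHashMap<>();
    static {
        FILTERS.put("lowercase", t -> t.toLowerCase(Locale.ROOT));
        FILTERS.put("accentfold",
                t -> Normalizer.normalize(t, Normalizer.Form.NFD).replaceAll("\\p{M}+", ""));
    }

    // Compose the named filters, in the given order, into a single function.
    static UnaryOperator<String> buildChain(List<String> filterNames) {
        UnaryOperator<String> chain = UnaryOperator.identity();
        for (String name : filterNames) {
            UnaryOperator<String> f = FILTERS.get(name);
            if (f == null) {
                throw new IllegalArgumentException("Unknown filter: " + name);
            }
            UnaryOperator<String> prev = chain;
            chain = t -> f.apply(prev.apply(t));
        }
        return chain;
    }

    // Whitespace tokenizer followed by the configured filter chain.
    static List<String> analyze(String text, UnaryOperator<String> chain) {
        List<String> out = new ArrayList<>();
        for (String token : text.trim().split("\\s+")) {
            if (!token.isEmpty()) {
                out.add(chain.apply(token));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // In practice these names would be read from a configuration file.
        UnaryOperator<String> chain = buildChain(List.of("lowercase", "accentfold"));
        System.out.println(analyze("Láska Žije", chain)); // [laska, zije]
    }
}
```

Reading the list from an XML or RDF file instead of List.of(...) is the only extra step here. In practice the harder part is exposing Lucene's real tokenizer and filter classes by name, which is what Solr's schema does with its tokenizer and filter factory classes.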
How could we do this?
IndexLARQ.java has a couple of constructors which take an Analyzer:
public IndexLARQ(IndexReader r, Analyzer a)
public IndexLARQ(IndexWriter w, Analyzer a)
But they are not used directly.
One would need to make the Lucene Analyzer configurable via Jena's
Assembler vocabulary and use these constructors if the user specifies an analyzer.
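As a rough sketch of the assembler idea: if the configuration named an analyzer class, LARQ could instantiate it by reflection and pass it to one of the constructors above, falling back to the current default otherwise. Everything here is hypothetical. TextAnalyzer stands in for Lucene's Analyzer, and the config key "analyzerClass" is invented; it is not part of Jena's Assembler vocabulary.

```java
import java.text.Normalizer;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.Map;

public class AnalyzerConfigSketch {

    // Hypothetical stand-in for Lucene's Analyzer class.
    public interface TextAnalyzer {
        List<String> analyze(String text);
    }

    // Placeholder for the current hard-wired default (StandardAnalyzer in LARQ).
    public static class DefaultAnalyzer implements TextAnalyzer {
        public List<String> analyze(String text) {
            return Arrays.asList(text.toLowerCase(Locale.ROOT).trim().split("\\s+"));
        }
    }

    // An accent-folding alternative a user might name in the configuration.
    public static class FoldingAnalyzer implements TextAnalyzer {
        public List<String> analyze(String text) {
            String folded = Normalizer.normalize(text.toLowerCase(Locale.ROOT), Normalizer.Form.NFD)
                    .replaceAll("\\p{M}+", "");
            return Arrays.asList(folded.trim().split("\\s+"));
        }
    }

    // Instantiate the class named under the hypothetical key "analyzerClass",
    // or fall back to the default when the user did not specify one.
    static TextAnalyzer fromConfig(Map<String, String> config) throws ReflectiveOperationException {
        String className = config.get("analyzerClass");
        if (className == null) {
            return new DefaultAnalyzer();
        }
        Class<? extends TextAnalyzer> cls = Class.forName(className).asSubclass(TextAnalyzer.class);
        return cls.getDeclaredConstructor().newInstance();
    }

    public static void main(String[] args) throws Exception {
        // A real config would hold the class name as a plain string; getName() avoids typos here.
        Map<String, String> config = Map.of("analyzerClass", FoldingAnalyzer.class.getName());
        System.out.println(fromConfig(config).analyze("láska žije"));        // [laska, zije]
        System.out.println(fromConfig(Map.of()).getClass().getSimpleName()); // DefaultAnalyzer
    }
}
```

Using asSubclass makes a misconfigured class name fail early with a ClassCastException rather than somewhere inside the indexing code.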
I confess that the Assembler mechanism for building things in Jena is a bit
confusing to me, and every time I need it I have to re-learn it or refresh my memory.
The XML configuration file in Solr is quite powerful but also complex.
Where would you draw the line between what LARQ does, should do and what
a full search engine such as Solr does?
Paolo
-Osma