Re: Accents-insensitive search with LARQ

Paolo Castagna Fri, 16 Nov 2012 14:34:40 -0800

On 30/10/12 14:28, Osma Suominen wrote:

On 27.10.2012 02:31, Ondřej Hoferek wrote:

I would like to use the full text search with LARQ for accent-insensitive
matching. I.e. pattern {?literal pf:textMatch "laska"} should also return
literal "láska žije".


I know that in Lucene, there is a class ISOLatin1AccentFilter which can be
used while building/querying the index. However, I don't know how to use it
from within LARQ.

Is there any way to achieve my goal?


Hi Ondrej,

I've had similar wishes but unfortunately LARQ is currently hardwired to
use the Lucene StandardAnalyzer which does not support such a filter. So
you would have to modify LARQ to use a custom analyzer with an accent
filter such as ISOLatin1AccentFilter or perhaps ICUFoldingFilter.
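
For readers unfamiliar with accent folding, here is a minimal sketch of the
idea using only the Java standard library. It is not Lucene's
ISOLatin1AccentFilter and does not touch LARQ; the class name AccentFold and
the methods fold() and matches() are made up for illustration. The point it
tries to show is that the same folding has to be applied both to the indexed
text and to the query term.

```java
import java.text.Normalizer;
import java.util.Locale;

// Hypothetical helper, not part of Lucene or LARQ.
public class AccentFold {

    // Decompose characters (NFD), drop the combining marks, then lowercase.
    static String fold(String s) {
        String decomposed = Normalizer.normalize(s, Normalizer.Form.NFD);
        return decomposed.replaceAll("\\p{M}+", "").toLowerCase(Locale.ROOT);
    }

    // Naive whitespace tokenization; both sides go through the same folding.
    static boolean matches(String literal, String term) {
        String foldedTerm = fold(term);
        for (String token : fold(literal).split("\\s+")) {
            if (token.equals(foldedTerm)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // "l\u00e1ska \u017eije" is the literal "láska žije" from the question.
        String literal = "l\u00e1ska \u017eije";
        System.out.println(fold(literal));
        System.out.println(matches(literal, "laska"));
    }
}
```

Decomposition only helps for letters that split into a base letter plus a
combining mark; letters such as "ø" or "ł" have no such decomposition and
would pass through unchanged, so a table-based filter inside the analyzer
chain is likely the better fit for real use.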

What I would like to see is a way to configure LARQ to use a custom
analyzer, filter etc. combination without having to change the Java
code. For example Solr allows one to select suitable analysis components
using an XML configuration file.


How could we do this?

IndexLARQ.java has a couple of constructors which take an Analyzer:
public IndexLARQ(IndexReader r, Analyzer a)
public IndexLARQ(IndexWriter w, Analyzer a)
But they are not used directly.

One would need to make the Lucene Analyzer configurable via Jena's
Assembler vocabulary and use these constructors if the user specifies an
analyzer. I confess that the Assembler mechanism to build stuff in Jena is
a bit confusing to me, and every time I need to re-learn it or refresh it.


The XML configuration file in Solr is quite powerful but also complex.

Where would you draw the line between what LARQ does, should do, and what
a full search engine such as Solr does?


Paolo


-Osma
