Hello Sebastian,

> this sounds like quite a lot of effort and additional storage space. I
> am not sure if the problem is worth it.
> In any case, could you point me towards documentation on the matter so I
> can evaluate it?

http://docs.openlinksw.com/virtuoso/cinterface.html#langfuncapi
The documentation is very incomplete; the related sources are in libsrc/langfunc.

Best Regards,
Ivan.

> 
> Cheers,
> Sebastian
> 
> On 03/23/2010 04:48 PM, Ivan Mikhailov wrote:
> > Hello Sebastian,
> > 
> > Yes, tokenizers are pluggable (as are the descriptions of unusual
> > multi-character encodings). In the default language, named "x-any", a
> > dot does not act as a delimiter if it appears between letters or digits.
> > You could create another language, tweak this one (and then rebuild the
> > index and worry about future compatibility), or produce more than one
> > tokenization and index them all. For an RDF store, I'd recommend the
> > third option, because interoperability matters more in the RDF world
> > than in corporate databases.
> > 
> > Best Regards,
> > 
> > Ivan Mikhailov
> > OpenLink Software
> > http://virtuoso.openlinksw.com
> > 
> > On Tue, 2010-03-23 at 15:16 +0100, Sebastian Trüg wrote:
> >> Is there any way to influence the tokenizer for the Virtuoso full text
> >> indexer?
> >> The issue is that a dot does not act as a token divider. For an example,
> >> see the bug report: http://bugs.kde.org/show_bug.cgi?id=231549
> >>
> >> Thanks,
> >> Sebastian Trueg
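
The following is a minimal Python sketch, not Virtuoso code. It approximates the "x-any" rule as described in the quoted reply, where a dot between letters or digits stays inside the token. It also illustrates the third option, which indexes the union of two tokenizations. The function names are hypothetical. Virtuoso's real tokenizers are C code under libsrc/langfunc, and this sketch does not model its language function API.

```python
def tokenize_x_any(text: str) -> list[str]:
    """Hypothetical approximation of the described "x-any" rule:
    a dot is kept inside a token when both neighbours are letters/digits."""
    tokens: list[str] = []
    current: list[str] = []
    for i, ch in enumerate(text):
        if ch.isalnum():
            current.append(ch)
        elif (
            ch == "."
            and i > 0
            and text[i - 1].isalnum()
            and i + 1 < len(text)
            and text[i + 1].isalnum()
        ):
            # Dot sits between two word characters: keep it in the token.
            current.append(ch)
        else:
            # Any other character ends the current token.
            if current:
                tokens.append("".join(current))
                current = []
    if current:
        tokens.append("".join(current))
    return tokens


def tokenize_split_dots(text: str) -> list[str]:
    """Hypothetical alternative tokenization: every dot is a delimiter."""
    return [piece for tok in tokenize_x_any(text) for piece in tok.split(".") if piece]


def index_terms(text: str) -> set[str]:
    """Third option: index the terms produced by both tokenizations."""
    return set(tokenize_x_any(text)) | set(tokenize_split_dots(text))


if __name__ == "__main__":
    sample = "see report.final.pdf."  # hypothetical example text
    print(tokenize_x_any(sample))
    print(sorted(index_terms(sample)))
```

With the union index, a search for "final" matches the sample text, and so does a search for the whole dotted name. The cost is extra index entries for every dotted word, which is the additional storage concern raised at the top of this thread.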
