Hello Sebastian,

Yes, tokenizers are pluggable (and so are the descriptions of unusual
multi-character encodings). In the default language, named "x-any", a
dot does not act as a delimiter when it appears between letters or
digits. You can create another language, tweak this one (and then
rebuild the index and worry about future compatibility), or produce more
than one tokenization and index them all. For RDF storage I'd recommend
the third option, because interoperability matters more in the RDF
world than in corporate databases.
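To make the rule and the third option concrete, here is a minimal
plain-Python sketch. It is not Virtuoso's tokenizer and does not use any
Virtuoso API. The function names are hypothetical, and "between letters
or digits" is approximated as "flanked by alphanumeric characters".

```python
import re

# Hypothetical approximation of the "x-any" behaviour described above:
# a dot stays inside a token when it sits between letters or digits,
# and anywhere else it acts as a delimiter. NOT Virtuoso's actual code.
# [^\W_] matches a letter or digit (word characters minus underscore).


def tokenize_keep_inner_dots(text):
    """Keep dots flanked by letters/digits, e.g. 'file.txt' stays whole."""
    return re.findall(r"[^\W_]+(?:\.[^\W_]+)*", text)


def tokenize_split_dots(text):
    """Alternative tokenization in which every dot is a delimiter."""
    return re.findall(r"[^\W_]+", text)


def index_terms(text):
    """The 'more than one tokenization' option: index the union of both."""
    terms = set(tokenize_keep_inner_dots(text))
    terms.update(tokenize_split_dots(text))
    return terms


if __name__ == "__main__":
    sample = "Open file.txt, version 4.1.2."
    print(tokenize_keep_inner_dots(sample))
    # ['Open', 'file.txt', 'version', '4.1.2']
    print(tokenize_split_dots(sample))
    # ['Open', 'file', 'txt', 'version', '4', '1', '2']
```

When both tokenizations are indexed, a search for "file", "txt" or
"file.txt" all match the same text. The cost is a larger index.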

Best Regards,

Ivan Mikhailov
OpenLink Software
http://virtuoso.openlinksw.com

On Tue, 2010-03-23 at 15:16 +0100, Sebastian Trüg wrote:
> Is there any way to influence the tokenizer for the Virtuoso full text
> indexer?
> The issue is that a dot does not act as a token divider. For an example,
> see the bug report: http://bugs.kde.org/show_bug.cgi?id=231549
> 
> Thanks,
> Sebastian Trueg
