Hi Ivan,

this sounds like quite a lot of effort and additional storage space. I am not sure if the problem is worth it. In any case, could you point me towards documentation on the matter so I can evaluate it?
Cheers,
Sebastian

On 03/23/2010 04:48 PM, Ivan Mikhailov wrote:
> Hello Sebastian,
>
> Yes, tokenizers are pluggable (as well as descriptions of funny
> multicharacter encodings). In the default language, named "x-any", the
> dot does not act as a delimiter if it appears between chars or numbers.
> It is possible to create some other language or tweak this (and rebuild
> the index and worry about future compatibility) or make more than one
> tokenization and index them all. For an RDF storage, I'd recommend the
> third option, because the interop is more important in RDF world than in
> corporate databases.
>
> Best Regards,
>
> Ivan Mikhailov
> OpenLink Software
> http://virtuoso.openlinksw.com
>
> On Tue, 2010-03-23 at 15:16 +0100, Sebastian Trüg wrote:
>> Is there any way to influence the tokenizer for the Virtuoso full text
>> indexer?
>> The issue is that a dot does not act as a token divider. For an example
>> see the bug report: http://bugs.kde.org/show_bug.cgi?id=231549
>>
>> Thanks,
>> Sebastian Trueg
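
For anyone weighing the "more than one tokenization" option, here is a rough Python sketch of the idea. It is not Virtuoso's tokenizer or API: the two regular expressions, the function names and the sample documents are all made up for illustration. One tokenizer keeps a dot that sits between letters or digits inside the token (roughly the behaviour described for "x-any"), the other always splits on it, and both token sets go into the same index. This also shows where the extra storage comes from, since every dotted term is stored in several forms.

```python
import re
from collections import defaultdict

# Hypothetical stand-in: a dot between word characters stays inside the token.
KEEP_DOT = re.compile(r"\w+(?:\.\w+)*")
# Hypothetical alternative: a dot always separates tokens.
SPLIT_DOT = re.compile(r"\w+")


def tokenize_keep_dot(text):
    """Return lowercased tokens, keeping dots that sit between word characters."""
    return [t.lower() for t in KEEP_DOT.findall(text)]


def tokenize_split_dot(text):
    """Return lowercased tokens, treating every dot as a delimiter."""
    return [t.lower() for t in SPLIT_DOT.findall(text)]


def build_index(docs):
    """Index each document under the tokens from both tokenizations."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for tok in tokenize_keep_dot(text) + tokenize_split_dot(text):
            index[tok].add(doc_id)
    return index


# Made-up sample data.
docs = {
    1: "See report.pdf for details.",
    2: "Version 2.5 of the report",
}
index = build_index(docs)

print(sorted(index["report"]))
print(sorted(index["report.pdf"]))
```

With both tokenizations indexed, a search for "report" finds the document that only mentions "report.pdf", while a search for the full file name still works.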
