Hello Sebastian,

> this sounds like quite a lot of effort and additional storage space. I
> am not sure if the problem is worth it.
> In any case, could you point me towards documentation on the matter so I
> can evaluate it?

http://docs.openlinksw.com/virtuoso/cinterface.html#langfuncapi

The doc is very incomplete. Related sources are in libsrc/langfunc.

Best Regards,
Ivan.

>
> Cheers,
> Sebastian
>
> On 03/23/2010 04:48 PM, Ivan Mikhailov wrote:
> > Hello Sebastian,
> >
> > Yes, tokenizers are pluggable (as well as descriptions of funny
> > multicharacter encodings). In the default language, named "x-any", the
> > dot does not act as a delimiter if it appears between chars or numbers.
> > It is possible to create some other language or tweak this (and rebuild
> > the index and worry about future compatibility) or make more than one
> > tokenization and index them all. For an RDF storage, I'd recommend the
> > third option, because the interop is more important in RDF world than in
> > corporate databases.
> >
> > Best Regards,
> >
> > Ivan Mikhailov
> > OpenLink Software
> > http://virtuoso.openlinksw.com
> >
> > On Tue, 2010-03-23 at 15:16 +0100, Sebastian Trüg wrote:
> >> Is there any way to influence the tokenizer for the Virtuoso full text
> >> indexer?
> >> The issue is that a dot does not act as a token divider. For an example
> >> see the bug report: http://bugs.kde.org/show_bug.cgi?id=231549
> >>
> >> Thanks,
> >> Sebastian Trueg
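
To make the third option from the thread (more than one tokenization, all of them indexed) a bit more concrete, here is a rough Python sketch. It does not use Virtuoso or its langfunc API in any way. The function names, the regular expressions, and the in-memory index are all hypothetical, and the "keep dots" rule only approximates the behaviour described for "x-any" (a dot between letters or digits stays inside the token).

```python
import re
from collections import defaultdict

# Hypothetical approximation of the behaviour described for "x-any":
# a dot stays inside a token when it sits between word characters.
KEEP_DOTS = re.compile(r"\w+(?:\.\w+)*")

# Alternative tokenization: every dot acts as a delimiter.
SPLIT_DOTS = re.compile(r"\w+")


def tokenize_keep_dots(text):
    """Return lowercase tokens, keeping dots that appear inside a token."""
    return [t.lower() for t in KEEP_DOTS.findall(text)]


def tokenize_split_dots(text):
    """Return lowercase tokens, treating every dot as a delimiter."""
    return [t.lower() for t in SPLIT_DOTS.findall(text)]


def build_index(docs):
    """Build an inverted index (token -> set of doc ids) from both tokenizations."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in tokenize_keep_dots(text) + tokenize_split_dots(text):
            index[token].add(doc_id)
    return index


if __name__ == "__main__":
    docs = {1: "report.pdf is here.", 2: "The annual report"}
    index = build_index(docs)
    # Both the full file name and its parts are searchable.
    print(sorted(index["report.pdf"]))
    print(sorted(index["report"]))
```

The cost is the one raised in the reply above: the index holds tokens from both tokenizations, so it needs more storage than a single-tokenizer index.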
