Hi Ivan,

This sounds like quite a lot of effort and additional storage space. I
am not sure if the problem is worth it.
In any case, could you point me towards documentation on the matter so I
can evaluate it?

Cheers,
Sebastian

On 03/23/2010 04:48 PM, Ivan Mikhailov wrote:
> Hello Sebastian,
> 
> Yes, tokenizers are pluggable (as well as descriptions of funny
> multicharacter encodings). In the default language, named "x-any", the
> dot does not act as a delimiter if it appears between chars or numbers.
> It is possible to create some other language or tweak this (and rebuild
> the index and worry about future compatibility) or make more than one
> tokenization and index them all. For an RDF storage, I'd recommend the
> third option, because interop is more important in the RDF world than in
> corporate databases.
> 
> Best Regards,
> 
> Ivan Mikhailov
> OpenLink Software
> http://virtuoso.openlinksw.com
> 
> On Tue, 2010-03-23 at 15:16 +0100, Sebastian Trüg wrote:
>> Is there any way to influence the tokenizer for the Virtuoso full text
>> indexer?
>> The issue is that a dot does not act as a token divider. For an example
>> see the bug report: http://bugs.kde.org/show_bug.cgi?id=231549
>>
>> Thanks,
>> Sebastian Trueg
> 
> 
> 
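The third option Ivan mentions (make more than one tokenization and index
them all) can be sketched in plain Python. This is an illustration only,
not Virtuoso code: it does not use Virtuoso's tokenizer interface, the
regular expressions merely approximate the "x-any" behaviour described in
the thread, and every function name here is hypothetical.

```python
import re
from collections import defaultdict

# Assumption: approximates the described "x-any" rule, where a dot that sits
# between two letters or digits stays inside the token.
KEEP_DOTS = re.compile(r"\w+(?:\.\w+)*")
# Alternative tokenization: every dot acts as a delimiter.
SPLIT_DOTS = re.compile(r"\w+")


def tokenize_keep_dots(text):
    """Return lowercase tokens, keeping dots that join word characters."""
    return [t.lower() for t in KEEP_DOTS.findall(text)]


def tokenize_split_dots(text):
    """Return lowercase tokens, splitting on every dot."""
    return [t.lower() for t in SPLIT_DOTS.findall(text)]


def build_index(docs):
    """Build one inverted index that holds the tokens of both tokenizers."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in tokenize_keep_dots(text) + tokenize_split_dots(text):
            index[token].add(doc_id)
    return index


docs = {
    1: "Holiday photo stored as beach.jpg",
    2: "Version 1.5 released.",
}
index = build_index(docs)

# Both the full file name and its parts now find document 1.
print(sorted(index["beach.jpg"]), sorted(index["beach"]))
```

Because every document is tokenized twice, the index stores both token
sets, which is where the additional storage space comes from.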
