Re: [basex-talk] Can the full text tokenizer be instructed not to tokenize at hyphens?

2021-07-13 Thread Imsieke, Gerrit, le-tex
That’s a feasible workaround, thank you.

On 13.07.2021 08:27, Christian Grün wrote:
Hi Gerrit, I’m sorry there’s currently no way to adjust that. We’d probably think of how this goes hand in hand with other XQFT features that rely on single-word tokens (such as stemming). For now, a little

Re: [basex-talk] Can the full text tokenizer be instructed not to tokenize at hyphens?

2021-07-13 Thread Christian Grün
Hi Gerrit, I’m sorry there’s currently no way to adjust that. We’d probably think of how this goes hand in hand with other XQFT features that rely on single-word tokens (such as stemming). For now, a little extra index could be generated instead, which contains all terms that are supposed to
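
The snippet above is cut off, so what exactly the extra index should contain is not stated here. As a rough sketch of one possible reading, the hyphenated compounds could be collected into a separate lookup table outside the full-text index and used to feed the completion list. The example is plain Python rather than BaseX or XQuery, and the names `COMPOUND`, `build_compound_index` and `complete` are hypothetical, not part of any BaseX API.

```python
import re
from collections import Counter

# Assumption: a "composite term" is a run of word characters joined by
# single hyphens, e.g. 'third-generation' or 'state-of-the-art'.
COMPOUND = re.compile(r"\w+(?:-\w+)+")


def build_compound_index(texts):
    """Count every hyphenated compound (lowercased) in the given strings."""
    counts = Counter()
    for text in texts:
        for match in COMPOUND.finditer(text):
            counts[match.group(0).lower()] += 1
    return counts


def complete(index, prefix):
    """Return the indexed compounds that start with prefix, sorted A to Z."""
    prefix = prefix.lower()
    return sorted(term for term in index if term.startswith(prefix))


if __name__ == "__main__":
    docs = [
        "A third-generation device.",
        "Third-generation hardware and state-of-the-art software.",
    ]
    index = build_compound_index(docs)
    print(complete(index, "thi"))
```

Keeping the compounds in their own table leaves the regular full-text index untouched, so features that rely on single-word tokens, such as stemming, would not be affected by it.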

[basex-talk] Can the full text tokenizer be instructed not to tokenize at hyphens?

2021-07-12 Thread Imsieke, Gerrit, le-tex
A customer wants to include composite terms such as 'third-generation' as single tokens so that they may be offered in a completion list. I don’t think this is configurable, or is it? Gerrit