Re: 1.3.9+ and CJK support in sphinx.conf

2020-12-28 Thread d tbsky




> Hello,
>
> On 2020-12-27 08:31, d tbsky wrote:
> > I want to upgrade our piler from "1.3.4 + sphinx 2.2" to "1.3.9/10
> > + sphinx 3.3.1".
> > I notice there is a new config entry in sphinx.conf:
> > "SPHINX_CHARSET_TABLE". I am still confused after reading the documentation.
> > If I have already set up "ngram_len" and "ngram_chars" as the FAQ says, do I
> > need to change piler's default "SPHINX_CHARSET_TABLE" setting
> > to something else?
> > Thanks a lot for your help!
>
> The SPHINX_CHARSET_TABLE setting covers the characters that most Western
> countries use. If you want support for CJK languages, then you need to
> uncomment ngram_len and ngram_chars in the NGRAM_CONFIG line.
>
> However, frankly I don't have much experience with CJK, so I am counting
> on your and others' experience and feedback on whether this setup
> works or needs some improvement.

I didn't change the "SPHINX_CHARSET_TABLE" setting and it seems to
work fine. I don't really know how sphinx works, but I guess that with
"ngram_len = 1", sphinx just treats each CJK character as a word? So if
I search for a CJK "word" that consists of two "characters", piler
returns results containing any of those characters. However, if I
double-quote the "word", then piler returns only results containing the
exact "word". I think that behavior is OK.
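
To make the guess above concrete, here is a small Python sketch of what
"each CJK character is a word" would mean for searching. It is a toy model
only, not Sphinx's tokenizer or matcher. The function names are hypothetical,
the range U+3000..U+2FA1F is assumed to be the ngram_chars value in use, and
the unquoted search is assumed to require every character of the query to
appear somewhere in the message, in any position.

```python
# Toy model only: this is NOT how Sphinx is implemented.
# Assumption: ngram_chars = U+3000..U+2FA1F and ngram_len = 1.
NGRAM_FIRST, NGRAM_LAST = 0x3000, 0x2FA1F


def is_ngram_char(ch):
    """True if the character falls inside the assumed ngram_chars range."""
    return NGRAM_FIRST <= ord(ch) <= NGRAM_LAST


def tokenize(text):
    """Emit each CJK character as its own token; keep other words whole."""
    tokens, word = [], []
    for ch in text:
        if is_ngram_char(ch):
            if word:
                tokens.append("".join(word))
                word = []
            tokens.append(ch)
        elif ch.isalnum():
            word.append(ch.lower())
        elif word:
            # Any other character (space, punctuation) ends the current word.
            tokens.append("".join(word))
            word = []
    if word:
        tokens.append("".join(word))
    return tokens


def match_loose(query, document):
    """Unquoted search: every query token occurs somewhere in the document."""
    doc_tokens = set(tokenize(document))
    return all(tok in doc_tokens for tok in tokenize(query))


def match_phrase(query, document):
    """Quoted search: the query tokens occur adjacent and in order."""
    q, d = tokenize(query), tokenize(document)
    if not q:
        return False
    return any(d[i:i + len(q)] == q for i in range(len(d) - len(q) + 1))
```

In this model, searching for 京都 without quotes matches a message containing
京の都, because both characters are present, while the quoted search matches
only where 京 and 都 are adjacent, as in 東京都. That is consistent with the
behavior described above, but the real matching rules are Sphinx's, not this
sketch's.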



Re: 1.3.9+ and CJK support in sphinx.conf

2020-12-28 Thread sj




Hello,

On 2020-12-27 08:31, d tbsky wrote:

> I want to upgrade our piler from "1.3.4 + sphinx 2.2" to "1.3.9/10
> + sphinx 3.3.1".
> I notice there is a new config entry in sphinx.conf:
> "SPHINX_CHARSET_TABLE". I am still confused after reading the documentation.
> If I have already set up "ngram_len" and "ngram_chars" as the FAQ says, do I
> need to change piler's default "SPHINX_CHARSET_TABLE" setting
> to something else?
> Thanks a lot for your help!


The SPHINX_CHARSET_TABLE setting covers the characters that most Western
countries use. If you want support for CJK languages, then you need to
uncomment ngram_len and ngram_chars in the NGRAM_CONFIG line.

However, frankly I don't have much experience with CJK, so I am counting
on your and others' experience and feedback on whether this setup
works or needs some improvement.

Janos