Re: sphinx reindex question
> > cd /var/piler/tmp
> > runuser -u piler -- reindex -a (why didn't this command need to
> > specify /etc/piler/sphinx.conf?)
>
> Because it has nothing to do with sphinx. The reindex tool retrieves the
> emails, parses them, and puts their contents into the sph_index table.
> Then the sphinx indexer reads and processes this table.

I am really stupid. Now I understand.

> sphinx 3.3.1 has some improvements over the 2.2 series. If you can find
> all your email, then don't worry.
>
> > /var/lib/mysql/piler/sph_index.ibd => become 1.9GB
>
> This is the aforementioned table. Since you put a huge volume of data
> into it during the reindex process, this table has grown, and now it just
> doesn't want to shrink. Try running "optimize table sph_index;" at the
> mysql console to shrink it when the reindex has completed.

OK, now I get it. After "reindex -a", I need to wait for cron or manually
run "indexer.delta.sh" and "indexer.main.sh". They will empty the mysql
sph_index table and write to /var/piler/sphinx. Finally, I can optimize
the mysql table and get the disk space back.

Thanks a lot for the detailed explanation!
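The sequence summarized above can be sketched as a small shell script. This is only a sketch under stated assumptions: the `run` helper, the `DRY_RUN` switch, and the `PILER_LIBEXEC` path are illustrative and do not come from the thread (the thread names the two indexer scripts but not where they live), and the `mysql` call assumes client credentials are already configured for the piler database. With the default `DRY_RUN=1` it only prints the commands instead of executing them.

```shell
#!/bin/sh
# Sketch of the full reindex sequence discussed in this thread.
# DRY_RUN=1 (the default here) only prints each command; set DRY_RUN=0 to execute.
DRY_RUN="${DRY_RUN:-1}"
# Assumption: directory holding piler's indexer wrapper scripts; adjust to your install.
PILER_LIBEXEC="${PILER_LIBEXEC:-/usr/libexec/piler}"

# Hypothetical helper: print the command in dry-run mode, otherwise run it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

piler_full_reindex() {
    # 1. Re-parse all stored emails into the sph_index MySQL table.
    run cd /var/piler/tmp || return 1
    run runuser -u piler -- reindex -a || return 1
    # 2. Let the sphinx indexer read and process sph_index (cron normally does this).
    run runuser -u piler -- "$PILER_LIBEXEC/indexer.delta.sh" || return 1
    run runuser -u piler -- "$PILER_LIBEXEC/indexer.main.sh" || return 1
    # 3. Shrink the table once the reindex has completed.
    run mysql piler -e 'OPTIMIZE TABLE sph_index;'
}

piler_full_reindex
```

Running the table optimization last matters: as explained above, the table only shrinks after the reindex has completed and the indexer has processed its contents.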
Re: 1.3.9+ and CJK support in sphinx.conf
> Hello,
>
> On 2020-12-27 08:31, d tbsky wrote:
> > I want to upgrade our piler from "1.3.4 + sphinx 2.2" to "1.3.9/10
> > + sphinx 3.3.1".
> > I noticed there is a new config entry in sphinx.conf:
> > "SPHINX_CHARSET_TABLE". I am still confused after reading the
> > documentation.
> > If I have already set up "ngram_len" and "ngram_chars" as the FAQ says,
> > do I need to change the piler default of "SPHINX_CHARSET_TABLE" to
> > something else?
> > Thanks a lot for your help!
>
> The SPHINX_CHARSET_TABLE settings cover the characters that most Western
> countries use. If you want support for CJK languages, then you need to
> uncomment ngram_len and ngram_chars in the line of NGRAM_CONFIG.
>
> However, frankly, I don't have much experience with CJK stuff, so I count
> on your and others' experience and feedback on whether the mentioned setup
> works or needs some improvement.

I didn't change the "SPHINX_CHARSET_TABLE" setting and it seems to work
fine. I don't really know how sphinx works, but I guess that with
"ngram_len = 1" sphinx just treats each CJK character as a word? So if I
search for a CJK "word" that consists of two "characters", piler returns
results containing any of the characters. However, if I double-quote the
"word", piler returns only results containing the exact "word". I think
this behavior is OK.
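For reference, the two directives discussed above look roughly like this once uncommented. This is a hedged illustration, not a copy of piler's shipped file: `ngram_len` and `ngram_chars` are real Sphinx index settings, but the character range shown is the example range from the Sphinx documentation and is an assumption here, so check the NGRAM_CONFIG entry in your own /etc/piler/sphinx.conf for the actual values.

```
# Index every character in ngram_chars as its own one-character token.
ngram_len   = 1
# Assumed range (taken from the Sphinx documentation example), covering CJK code points.
ngram_chars = U+3000..U+2FA1F
```

With `ngram_len = 1`, a two-character CJK word is indexed as two separate tokens, which is consistent with the behavior described above: quoting the word turns the search into a phrase match on the adjacent characters.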
Re: 1.3.9+ and CJK support in sphinx.conf
Hello,

On 2020-12-27 08:31, d tbsky wrote:
> I want to upgrade our piler from "1.3.4 + sphinx 2.2" to "1.3.9/10
> + sphinx 3.3.1".
> I noticed there is a new config entry in sphinx.conf:
> "SPHINX_CHARSET_TABLE". I am still confused after reading the
> documentation.
> If I have already set up "ngram_len" and "ngram_chars" as the FAQ says,
> do I need to change the piler default of "SPHINX_CHARSET_TABLE" to
> something else?
> Thanks a lot for your help!

The SPHINX_CHARSET_TABLE settings cover the characters that most Western
countries use. If you want support for CJK languages, then you need to
uncomment ngram_len and ngram_chars in the line of NGRAM_CONFIG.

However, frankly, I don't have much experience with CJK stuff, so I count
on your and others' experience and feedback on whether the mentioned setup
works or needs some improvement.

Janos
Re: sphinx reindex question
Hello,

On 2020-12-28 08:56, d tbsky wrote:
> I was testing master commit fb9150f and upgrading from sphinx 2.2 to
> 3.3.1. According to the piler manual, I need to reindex everything, so I
> installed a new machine, copied /var/piler and /var/lib/mysql to it, and
> followed the procedure below:
>
> rm -f /var/piler/sphinx/*
> runuser -u piler -- indexer --all --config /etc/piler/sphinx.conf
> cd /var/piler/tmp
> runuser -u piler -- reindex -a (why didn't this command need to
> specify /etc/piler/sphinx.conf?)

Because it has nothing to do with sphinx. The reindex tool retrieves the
emails, parses them, and puts their contents into the sph_index table.
Then the sphinx indexer reads and processes this table.

> I cannot understand what happened. The sizes on the old system:
> /var/piler/sphinx => 300MB
> /var/lib/mysql/piler/sph_index.ibd => 22M
>
> The sizes on the new system after the reindex:
> /var/piler/sphinx => 89M, which didn't change at all during the reindex
> process

sphinx 3.3.1 has some improvements over the 2.2 series. If you can find
all your email, then don't worry.

> /var/lib/mysql/piler/sph_index.ibd => become 1.9GB

This is the aforementioned table. Since you put a huge volume of data
into it during the reindex process, this table has grown, and now it just
doesn't want to shrink. Try running "optimize table sph_index;" at the
mysql console to shrink it when the reindex has completed.

> /var/piler/tmp => 17G, the same size as /var/piler/store
>
> So the whole process seems to extract mails from /var/piler/store to
> /var/piler/tmp and write them to mysql. Is this behavior correct? Why
> does reindex touch nothing in /var/piler/sphinx, and why did the mysql db
> become so huge? I opened the piler web UI and did some mail searches. It
> seems to work fine, but I am still confused.

That's the ultimate test of whether you have done it well or not :-)

Janos