Re: sphinx reindex question

2020-12-28 Thread d tbsky




> >   cd /var/piler/tmp
> >   runuser -u piler -- reindex -a  (why doesn't this command need to
> > specify /etc/piler/sphinx.conf?)
>
> Because it has nothing to do with sphinx. The reindex tool retrieves the
> emails, parses them, and puts their contents into the sph_index table.
> Then the sphinx indexer reads and processes this table.

  I am really stupid. Now I understand.

> sphinx 3.3.1 has some improvements over the 2.2 series. If you can find
> all your email, then don't worry.
>
> > /var/lib/mysql/piler/sph_index.ibd => grew to 1.9GB
>
> This is the aforementioned table. Since you put a huge volume of data
> into it during the reindex process, the table has grown, and now it just
> doesn't want to shrink. Try running "optimize table sph_index;" at the
> mysql console to shrink it once the reindex has completed.

 OK, now I get it. After "reindex -a", I need to wait for cron or run
 "indexer.delta.sh" and "indexer.main.sh" manually. They will empty the
 mysql sph_index table and write to /var/piler/sphinx.
 Finally, I can optimize the mysql table and get the disk space back.
 Thanks a lot for the detailed explanation!!
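
The sequence summarized above could be sketched as a small shell script. This is
only a sketch under assumptions: the LIBEXEC directory holding indexer.delta.sh
and indexer.main.sh is a guess and differs between installs, the database is
assumed to be named "piler" (inferred from the /var/lib/mysql/piler path), and
the mysql client is assumed to find its credentials on its own. The helper
names run and full_reindex are hypothetical. By default it only prints the
commands instead of executing them.

```shell
#!/bin/sh
# ASSUMPTION: location of the piler indexer scripts; adjust for your install.
LIBEXEC="${LIBEXEC:-/usr/libexec/piler}"
# Default to a dry run that only prints each command.
DRY_RUN="${DRY_RUN:-1}"

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

full_reindex() {
    # 1. reindex re-parses the stored emails into the sph_index table.
    run cd /var/piler/tmp &&
    run runuser -u piler -- reindex -a &&
    # 2. The indexer scripts read sph_index and write /var/piler/sphinx.
    run runuser -u piler -- "$LIBEXEC/indexer.delta.sh" &&
    run runuser -u piler -- "$LIBEXEC/indexer.main.sh" &&
    # 3. Only now shrink the table (ASSUMPTION: database name "piler").
    run mysql piler -e 'OPTIMIZE TABLE sph_index;'
}
```

Calling full_reindex with DRY_RUN=1 lists the five steps in order. Setting
DRY_RUN=0 as root would actually run them, so check every path first.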



Re: 1.3.9+ and CJK support in sphinx.conf

2020-12-28 Thread d tbsky




> Hello,
>
> On 2020-12-27 08:31, d tbsky wrote:
> > I want to upgrade our piler from "1.3.4 + sphinx 2.2" to "1.3.9/10
> > + sphinx 3.3.1".
> > I notice there is a new config entry in sphinx.conf:
> > "SPHINX_CHARSET_TABLE". I am still confused after reading the
> > documentation. If I have already set up "ngram_len" and "ngram_chars"
> > as the FAQ says, do I need to change piler's default
> > "SPHINX_CHARSET_TABLE" setting to something else?
> > Thanks a lot for your help!!
>
> The SPHINX_CHARSET_TABLE settings cover the characters that most Western
> countries use. If you want support for CJK languages, then you need to
> uncomment ngram_len and ngram_chars in the NGRAM_CONFIG line.
>
> However, frankly, I don't have much experience with CJK stuff, so I am
> counting on your and others' experience and feedback on whether the
> mentioned setup works or needs some improvement.

I didn't change the "SPHINX_CHARSET_TABLE" setting and it seems to
work fine. I don't really know how sphinx works, but I guess that with
"ngram_len = 1", sphinx just treats each CJK character as a word? So if
I search for a CJK "word" that consists of two "characters", piler
returns results containing any of its characters. However, if I
double-quote the "word", piler returns only results with the exact
"word". I think this behavior is OK.
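
A quick way to confirm that both ngram settings are really uncommented might
look like the sketch below. It assumes each setting appears in sphinx.conf as
"name = value" and is disabled by a "#" placed directly in front of the name.
The function names ngram_active and cjk_ngrams_enabled are hypothetical, not
part of piler or sphinx.

```shell
#!/bin/sh
# Succeed if setting $2 appears in file $1 without a '#' directly before it.
# ASSUMPTION: a disabled setting is written as "#ngram_len = 1".
ngram_active() {
    grep -Eq "(^|[^#[:alnum:]_])$2[[:space:]]*=" "$1"
}

# Succeed only if both CJK-related settings are active in the given file.
cjk_ngrams_enabled() {
    ngram_active "$1" ngram_len && ngram_active "$1" ngram_chars
}
```

For example, "cjk_ngrams_enabled /etc/piler/sphinx.conf && echo enabled"
prints "enabled" only when both settings are active under that assumption.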



Re: 1.3.9+ and CJK support in sphinx.conf

2020-12-28 Thread sj




Hello,

On 2020-12-27 08:31, d tbsky wrote:

> I want to upgrade our piler from "1.3.4 + sphinx 2.2" to "1.3.9/10
> + sphinx 3.3.1".
> I notice there is a new config entry in sphinx.conf:
> "SPHINX_CHARSET_TABLE". I am still confused after reading the
> documentation. If I have already set up "ngram_len" and "ngram_chars"
> as the FAQ says, do I need to change piler's default
> "SPHINX_CHARSET_TABLE" setting to something else?
> Thanks a lot for your help!!


The SPHINX_CHARSET_TABLE settings cover the characters that most Western
countries use. If you want support for CJK languages, then you need to
uncomment ngram_len and ngram_chars in the NGRAM_CONFIG line.

However, frankly, I don't have much experience with CJK stuff, so I am
counting on your and others' experience and feedback on whether the
mentioned setup works or needs some improvement.

Janos



Re: sphinx reindex question

2020-12-28 Thread sj




Hello,

On 2020-12-28 08:56, d tbsky wrote:


> I was testing master commit fb9150f and upgrading from sphinx 2.2 to
> 3.3.1. According to the piler manual, I need to reindex everything.
> So I installed a new machine, copied /var/piler and /var/lib/mysql to
> the new machine, and followed the procedure below:
>
>   rm -f /var/piler/sphinx/*
>   runuser -u piler -- indexer --all --config /etc/piler/sphinx.conf
>   cd /var/piler/tmp
>   runuser -u piler -- reindex -a  (why doesn't this command need to
> specify /etc/piler/sphinx.conf?)


Because it has nothing to do with sphinx. The reindex tool retrieves the
emails, parses them, and puts their contents into the sph_index table.
Then the sphinx indexer reads and processes this table.


> I can't understand what happened.
> The sizes on the old system:
> /var/piler/sphinx => 300MB
> /var/lib/mysql/piler/sph_index.ibd => 22M
>
> The sizes on the new system after the reindex:
> /var/piler/sphinx => 89M, which didn't change at all during the
> reindex process


sphinx 3.3.1 has some improvements over the 2.2 series. If you can find
all your email, then don't worry.


> /var/lib/mysql/piler/sph_index.ibd => grew to 1.9GB


This is the aforementioned table. Since you put a huge volume of data
into it during the reindex process, the table has grown, and now it just
doesn't want to shrink. Try running "optimize table sph_index;" at the
mysql console to shrink it once the reindex has completed.
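
To see whether the optimize step actually gave space back, one could measure
the table file before and after. The sketch below assumes the file path from
this thread and a database named "piler". The helper names size_bytes and
measure_shrink are hypothetical, and reading /var/lib/mysql usually requires
root.

```shell
#!/bin/sh
# Print the size of file $1 in bytes (no GNU stat needed).
size_bytes() {
    wc -c < "$1" | tr -d ' '
}

# measure_shrink FILE COMMAND...: run COMMAND and report how FILE's size changed.
measure_shrink() {
    watched="$1"
    shift
    before=$(size_bytes "$watched")
    "$@" || return 1
    after=$(size_bytes "$watched")
    echo "before=$before after=$after reclaimed=$(( $before - $after ))"
}

# Example (ASSUMPTION: database name "piler", credentials already configured):
#   measure_shrink /var/lib/mysql/piler/sph_index.ibd \
#       mysql piler -e 'OPTIMIZE TABLE sph_index;'
```

A "reclaimed" value close to the "before" value would suggest that the indexer
had already emptied the table when the optimize ran.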



> /var/piler/tmp => 17G, the same size as /var/piler/store
>
> So the whole process seems to extract mails from /var/piler/store to
> /var/piler/tmp and write them to mysql.
>
> I don't know whether this behavior is correct. Why does reindex touch
> nothing in /var/piler/sphinx, and why has the mysql db become so huge?
>
> I tried opening the piler web UI and doing some mail searches. It seems
> to be working fine, but I am still confused.


That's the ultimate test of whether you have done it right :-)

Janos