Dear piler users,

I've just released the latest stable version of piler. Actually, it was released
a while ago; I just hadn't had time to write this email until now.

Note that it's 1.2.x, not 1.1.x, which means there are some minor incompatibilities
you must be aware of. I've compiled a RELEASE_NOTES file which describes some of the
changes.

The most important change is that I've moved all piler-related configs to the
${sysconfdir}/piler directory (with the default options that's /usr/local/etc/piler).

This means that whatever piler files you had in /usr/local/etc must be moved to
/usr/local/etc/piler, e.g. /usr/local/etc/piler.conf -> /usr/local/etc/piler/piler.conf,
and so on.
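
For a default installation the move looks something like this (adjust the paths if
you configured a different ${sysconfdir}):

   mkdir -p /usr/local/etc/piler
   mv /usr/local/etc/piler.conf /usr/local/etc/piler/
   # move any other piler config files you have there the same way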

I've also decided to put the sphinx config file in ${sysconfdir}/piler. Debian and
Ubuntu ship a sphinx package that enables a periodic 'indexer --all' cron job, which
practically destroys the sphinx indices, and even though both the install docs and
the FAQ warn about it, many piler users have fallen for this Debian 'trick'.
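
If you run piler on Debian or Ubuntu, make sure that cron job is disabled. The file
name below is an assumption from memory, so verify it on your own system first:

   # /etc/cron.d/sphinxsearch is assumed to be where the distro puts the job
   rm -f /etc/cron.d/sphinxsearch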

To match the new path, I've updated the rc.searchd file and the indexer shell scripts
as well.

If you upgraded, be sure to run the util/db-upgrade-1.1.0-vs-1.2.0.sql script. If you
have questions about the upgrade procedure, don't hesitate to ask. I recommend running
pilerconf after the upgrade and checking whether you get the values from piler.conf
back. If so, the config files are in the proper new location.
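
For reference, the upgrade steps look something like this (the 'piler' mysql user and
database name are the defaults; substitute your own if they differ):

   mysql -u piler -p piler < util/db-upgrade-1.1.0-vs-1.2.0.sql
   # pilerconf should print back the values from piler.conf
   pilerconf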

What's next? I have three interesting topics in my head. One of them is high
availability. Currently your most basic option is to set up two archives (even in
different locations) and have your mail server send copies of emails to both. Then
you have two independent archives with the same content: if either of them goes down,
your archived emails are still accessible.

However, it's not that elegant, and while this approach may work out for you, it can
be improved. MySQL supports a cluster mode. Sphinx data can be replicated easily
(think rsync); replicating the millions of stored files, however, is not that easy.
I've seen some replicating object stores, e.g. Swift from OpenStack or Ambry from
LinkedIn. I think they could be used to replicate the stored encrypted and gzipped
files.
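
As a rough sketch of the sphinx part (the index path and the 'standby' hostname are
made up, and you'd want to pause indexing while the copy runs):

   rsync -a --delete /var/piler/sphinx/ standby:/var/piler/sphinx/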

Another idea in my head is zstandard, Facebook's new compression algorithm, which
outperforms gzip in every way. Fortunately it can read gzipped data (meaning your
already stored emails will remain readable), and new emails can be compressed with
zstandard's own algorithm, offering better speed and compression.
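
To give you an idea, recompressing a stored message with the zstd command line tool
could look like this (the filenames are illustrative):

   # decompress the gzipped message, then recompress it with zstd
   zcat message.gz | zstd -19 -o message.zst

The -19 level trades speed for a better ratio; lower levels are much faster.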

The 3rd thing in my head is a non-forking version of piler. An o365 user reported a
problem: he got lots of NDRs for undeliverable emails. It turned out that o365 has no
means of flow control, so in case of a spike in email volume the default 10 piler
workers are not enough to handle the emails delivered in parallel. After a trial and
error approach, it took 40 piler workers to serve the load.

A non-forking piler smtp server would solve the problem by only receiving the emails,
and doing that amazingly fast. With such a processing model it could receive 100 or
even more smtp sessions simultaneously very effectively. Then we'd need a few workers
to actually process the stored emails, i.e. parsing, indexing, encrypting and storing
them.

I'm investigating the poll() mechanism at the moment. I've been told to use epoll
because it's much more efficient than select() or poll(); however, epoll is
Linux-only. So if anyone uses piler on FreeBSD, Solaris, or some other non-Linux
system, that would be a problem. Before picking either poll or epoll, let me know if
you use piler on a Unix flavor other than Linux.

Let me know what you think.

Janos
