Dear piler users,

I've just released the latest stable version of piler. Actually, it was released
a while ago; I just hadn't had time to write this email until now.

Note that it's 1.2.x, not 1.1.x, which means there are some minor incompatibilities
you must be aware of. I've compiled a RELEASE_NOTES file which describes some of the
changes.

The most important change is that I've moved all piler-related configs to the
${sysconfdir}/piler directory (with the default options that's /usr/local/etc/piler).

This means that whatever piler files you had in /usr/local/etc must be moved to
/usr/local/etc/piler, e.g. /usr/local/etc/piler.conf -> /usr/local/etc/piler/piler.conf,
and so on.
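
For a default installation the move looks something like this (adjust the paths if
you configured a different ${sysconfdir}):

   mkdir -p /usr/local/etc/piler
   mv /usr/local/etc/piler.conf /usr/local/etc/piler/
   # move any other piler config files you have there the same way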

I've also decided to put the sphinx config file in ${sysconfdir}/piler. Debian and
Ubuntu ship a sphinx package that enables a periodic 'indexer --all' cron job, which
practically destroys the sphinx indices, and even though both the install docs and
the FAQ warn about it, many piler users have fallen for this Debian 'trick'.
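
If you run piler on Debian or Ubuntu, make sure that cron job is disabled. The file
name below is an assumption from memory, so verify it on your own system first:

   # /etc/cron.d/sphinxsearch is assumed to be where the distro puts the job
   rm -f /etc/cron.d/sphinxsearch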

To match the new path, I've updated the rc.searchd file and the indexer shell scripts
as well.

If you upgraded, be sure to run the util/db-upgrade-1.1.0-vs-1.2.0.sql script. If you
have questions about the upgrade procedure, don't hesitate to ask. I recommend running
pilerconf after the upgrade and checking whether you get the values from piler.conf
back. If so, the config files are in the proper new location.
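
For reference, the upgrade steps look something like this (the 'piler' mysql user and
database name are the defaults; substitute your own if they differ):

   mysql -u piler -p piler < util/db-upgrade-1.1.0-vs-1.2.0.sql
   # pilerconf should print back the values from piler.conf
   pilerconf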

What's next? I have three interesting topics in my head. One of them is high
availability. Currently your most basic option is to set up two archives (even in
different locations) and have your mail server send copies of emails to both. Then
you have two independent archives with the same content: if either of them goes down,
your archived emails are still accessible.

However, it's not that elegant, and while this approach may work out for you, it can
be improved. MySQL supports a cluster mode. Sphinx data can be replicated easily
(think rsync); replicating the millions of stored files, however, is not that easy.
I've seen some replicating object stores, e.g. Swift from OpenStack or Ambry from
LinkedIn. I think they could be used to replicate the stored encrypted and gzipped
files.
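
As a rough sketch of the sphinx part (the index path and the 'standby' hostname are
made up, and you'd want to pause indexing while the copy runs):

   rsync -a --delete /var/piler/sphinx/ standby:/var/piler/sphinx/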

Another idea in my head is zstandard, Facebook's new compression algorithm, which
outperforms gzip in every way. Fortunately it can read gzipped data (meaning your
already stored emails will remain readable), and new emails can be compressed with
zstandard's own algorithm, offering better speed and compression.
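
To give you an idea, recompressing a stored message with the zstd command line tool
could look like this (the filenames are illustrative):

   # decompress the gzipped message, then recompress it with zstd
   zcat message.gz | zstd -19 -o message.zst

The -19 level trades speed for a better ratio; lower levels are much faster.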

The 3rd thing in my head is a non-forking version of piler. An o365 user reported a
problem: he got lots of NDRs for undeliverable emails. It turned out that o365 has no
means of flow control, so in case of a spike in email volume the default 10 piler
workers are not enough to handle the emails delivered in parallel. After a trial and
error approach, it took 40 piler workers to serve the load.

A non-forking piler smtp server would solve the problem by only receiving the emails,
and doing that amazingly fast. With such a processing model it could receive 100 or
even more smtp sessions simultaneously very effectively. Then we'd need a few workers
to actually process the stored emails, i.e. parsing, indexing, encrypting and storing
them.

I'm investigating the poll() mechanism at the moment. I've been told to use epoll
because it's much more efficient than select() or poll(); however, epoll is
Linux-only. So if anyone uses piler on FreeBSD, Solaris, or some other non-Linux
system, that would be a problem. Before picking either poll or epoll, let me know if
you use piler on a Unix flavor other than Linux.

Let me know what you think.

Janos
