Hello Janos,


happened again, the same way, with the recompiled piler-smtp binary (git master).

~# piler-smtp -V
1.3.10 build 998

Feb 10 02:30:01 hostname CRON[8429]: (piler) CMD (/usr/bin/indexer --quiet note1 --rotate) Feb 10 02:30:01 hostname CRON[8427]: (piler) CMD (/usr/bin/find /var/piler/error -type f|wc -l > /var/piler/stat/error) Feb 10 02:30:01 hostname CRON[8428]: (piler) CMD (/usr/bin/find /var/www/piler/tmp -type f -name i.\* -exec rm -f {} \;) Feb 10 02:30:01 hostname CRON[8431]: (piler) CMD (/usr/local/libexec/piler/indexer.main.sh) Feb 10 02:30:01 hostname CRON[8430]: (piler) CMD (/usr/bin/indexer --quiet tag1 --rotate) Feb 10 02:30:01 hostname CRON[8424]: (CRON) info (No MTA installed, discarding output) Feb 10 02:30:01 hostname CRON[8425]: (CRON) info (No MTA installed, discarding output)
Feb 10 02:30:01 hostname piler: INDEXER INFO: merging to main started
Feb 10 02:30:17 hostname piler-smtp[27596]: connected from 192.168.165.252:57902 on fd=6 (active connections: 1) Feb 10 02:30:17 hostname piler-smtp[27596]: connected from 192.168.165.252:57908 on fd=9 (active connections: 2) Feb 10 02:30:17 hostname piler-smtp[27596]: disconnected from 192.168.165.252 on fd=6, slot=0 (1 active connections) Feb 10 02:30:17 hostname piler-smtp[27596]: disconnected from 192.168.165.252 on fd=9, slot=1 (0 active connections) Feb 10 02:30:31 hostname piler-smtp[27596]: connected from 192.168.165.154:32120 on fd=6 (active connections: 1) Feb 10 02:31:47 hostname piler-smtp[27596]: received: ZQAHOOJ8RXM5NCHD, from=, size=4945, client=192.168.165.154, fd=6, fsync=75881359 Feb 10 02:31:47 hostname piler-smtp[27596]: client 192.168.165.154 timeout, lasttime: 1612920631 Feb 10 02:31:47 hostname piler-smtp[27596]: disconnected from 192.168.165.154 on fd=6, slot=0 (0 active connections) Feb 10 02:31:47 hostname piler-smtp[27596]: got signal: 11, Segmentation fault
Feb 10 02:31:47 hostname piler-smtp[27596]: piler-smtp has been terminated
Feb 10 02:31:48 hostname piler[27599]: 0/ZQAHOOJ8RXM5NCHD: 400000006023378d2c9c99d400a51d237bc0, size=4945/1936, attachments=0, reference=, message-id=<4Db2JR32vQz5vS9@mailrelay.intern>, retention=7350, folder=0, delay=0.9705, status=stored

There were no changes to backup schedules recently, and no scheduled job or I/O heavy maintenance should happen, especially not at this point in time. Still, point taken, we will check whether there could be anything, but why would only 1-2 VMs out of hundreds be affected?

In any case, SIGSEGV still happens and my piler-smtp process is dead.

- Patrick

On 09.02.2021 09:04, s...@acts.hu wrote:


Hello Patrick,

I've been reported such issues when a backup system paused or froze
the virtual machine running piler. I suspect that it might be the
case for you based on the time and days you mentioned.

The problem in such cases is that piler-smtp is in the middle of
processing an email, then the system is frozen for a few minutes
enough to pass the smtp timeout limit (60 sec), and then when it's
unfrozen then piler-smtp notices the timeout situation, then tears
down the connection. However, at the end of the session it tries
to tear down the already terminated and freed connection data.

So, let's try the following: get the master branch, recompile piler,
however replace the piler-smtp binary only, nothing else. Then let's
see if the problem is solved.

Janos

On 2021-02-09 07:56, Wagner, Patrick wrote:
Hello everyone,

a few weeks ago 2 of our Piler instances began segfaulting - more
precisely, several times per month (but not every day night) at about
02:30 am local time the MS Exchange server that has the Piler instance
setup as journaling target connects to piler-smtp, upon which
piler-smtp segfaults after trying to store an email that clearly
doesn't get parsed correctly by piler-smtp as it is represented as
gibberish (and gibberish cannot be renamed...)

See below for a few of the latest examples. Interestingly, the SIGSEGV
did not happen Sunday and Monday night, meaning the day following a
non-weekday? Not going on long enough that I could call it a pattern,
but in any case it would be nice if piler-smtp didn't segfault in the
first place.

What can I do to supply more useful information to get this fixed? Or
has there been meaningful work on piler-smtp in the meantime that this
could / would've already been fixed?


Thanks, Patrick


Feb  4 02:30:01 hostname CRON[24428]: (piler) CMD (/usr/bin/find
/var/www/piler/tmp -type f -name i.\* -exec rm -f {} \;)
Feb  4 02:30:01 hostname CRON[24430]: (piler) CMD (/usr/bin/find
/var/piler/error -type f|wc -l > /var/piler/stat/error)
Feb  4 02:30:01 hostname CRON[24429]: (piler) CMD
(/usr/local/libexec/piler/indexer.main.sh)
Feb  4 02:30:01 hostname CRON[24431]: (piler) CMD (/usr/bin/indexer
--quiet note1 --rotate)
Feb  4 02:30:01 hostname CRON[24436]: (piler) CMD (/usr/bin/indexer
--quiet tag1 --rotate)
Feb  4 02:30:01 hostname piler: INDEXER INFO: merging to main started
Feb  4 02:30:01 hostname CRON[24426]: (CRON) info (No MTA installed,
discarding output)
Feb  4 02:30:01 hostname CRON[24425]: (CRON) info (No MTA installed,
discarding output)
Feb  4 02:30:20 hostname piler-smtp[1144]: connected from
192.168.165.252:58292 on fd=6 (active connections: 1)
Feb  4 02:30:20 hostname piler-smtp[1144]: connected from
192.168.165.252:58294 on fd=9 (active connections: 2)
Feb  4 02:30:20 hostname piler-smtp[1144]: disconnected from
192.168.165.252 on fd=6 (1 active connections)
Feb  4 02:30:20 hostname piler-smtp[1144]: disconnected from
192.168.165.252 on fd=9 (0 active connections)
Feb  4 02:30:27 hostname piler-smtp[1144]: connected from
192.168.165.154:41139 on fd=6 (active connections: 1)
Feb  4 02:32:18 hostname piler-smtp[1144]: client 192.168.165.154 timeout
Feb  4 02:32:18 hostname piler-smtp[1144]: disconnected from
192.168.165.154 on fd=6 (0 active connections)
Feb  4 02:32:18 hostname piler-smtp[1144]: Removing 6KHX6ZMY7OD1R0KG
Feb  4 02:32:18 hostname piler-smtp[1144]: received: <A4><BD>#013}^?,
from=, size=4940, client=192.168.165.154, fd=6
Feb  4 02:32:18 hostname piler-smtp[1144]: ERROR: couldn't rename
<A4><BD>#013}^? to 2/ <A4><BD>#013}^? (reason: no such file or
directory)
Feb  4 02:32:18 hostname piler-smtp[1144]: got signal: 11, Segmentation fault Feb  4 02:32:18 hostname piler-smtp[1144]: piler-smtp has been terminated
########
Feb  5 02:32:15 hostname piler-smtp[28329]: connected from
192.168.165.252:57818 on fd=6 (active connections: 1)
Feb  5 02:32:15 hostname piler-smtp[28329]: connected from
192.168.165.252:57820 on fd=9 (active connections: 2)
Feb  5 02:32:15 hostname piler-smtp[28329]: disconnected from
192.168.165.252 on fd=6 (1 active connections)
Feb  5 02:32:15 hostname piler-smtp[28329]: disconnected from
192.168.165.252 on fd=9 (0 active connections)
Feb  5 02:33:15 hostname piler-smtp[28329]: connected from
192.168.165.252:60458 on fd=6 (active connections: 1)
Feb  5 02:33:15 hostname piler-smtp[28329]: connected from
192.168.165.252:60460 on fd=9 (active connections: 2)
Feb  5 02:33:15 hostname piler-smtp[28329]: disconnected from
192.168.165.252 on fd=6 (1 active connections)
Feb  5 02:33:15 hostname piler-smtp[28329]: disconnected from
192.168.165.252 on fd=9 (0 active connections)
Feb  5 02:34:15 hostname piler-smtp[28329]: connected from
192.168.165.252:34840 on fd=6 (active connections: 1)
Feb  5 02:34:15 hostname piler-smtp[28329]: connected from
192.168.165.252:34842 on fd=9 (active connections: 2)
Feb  5 02:34:15 hostname piler-smtp[28329]: disconnected from
192.168.165.252 on fd=6 (1 active connections)
Feb  5 02:34:15 hostname piler-smtp[28329]: disconnected from
192.168.165.252 on fd=9 (0 active connections)
Feb  5 02:34:29 hostname piler-smtp[28329]: connected from
192.168.165.154:25626 on fd=6 (active connections: 1)
Feb  5 02:35:01 hostname CRON[30058]: (piler) CMD (/usr/bin/find
/var/piler/error -type f|wc -l > /var/piler/stat/error)
Feb  5 02:35:01 hostname CRON[30060]: (root) CMD (command -v
debian-sa1 > /dev/null && debian-sa1 1 1)
Feb  5 02:35:01 hostname CRON[30059]: (piler) CMD
(/usr/local/libexec/piler/indexer.delta.sh)
Feb  5 02:35:01 hostname CRON[30063]: (piler) CMD (/usr/bin/find
/var/www/piler/tmp -type f -name i.\* -exec rm -f {} \;)
Feb  5 02:35:01 hostname piler: INDEXER ERROR: indexer merging to main
index is already running. It started at Fr 5. Feb 02:30:01 CET 2021
Feb  5 02:35:51 hostname piler-smtp[28329]: client 192.168.165.154 timeout
Feb  5 02:35:51 hostname piler-smtp[28329]: disconnected from
192.168.165.154 on fd=6 (0 active connections)
Feb  5 02:35:51 hostname piler-smtp[28329]: Removing LYLJITGU9CLG714L
Feb  5 02:35:51 hostname piler-smtp[28329]: received: <C4>#035<DB>#^?,
from=, size=4940, client=192.168.165.154, fd=6
Feb  5 02:35:51 hostname piler-smtp[28329]: ERROR: couldn't rename
<C4>#035<DB>#^? to 2/ <C4>#035<DB>#^? (reason: no such file or
directory)
Feb  5 02:35:51 hostname piler-smtp[28329]: got signal: 11, Segmentation fault Feb  5 02:35:51 hostname piler-smtp[28329]: piler-smtp has been terminated
########
Feb  6 02:32:19 hostname kernel: [54858.889863] ret_from_fork+0x35/0x40
Feb  6 02:32:58 hostname piler-smtp[932]: connected from
192.168.165.252:60962 on fd=6 (active connections: 1)
Feb  6 02:32:58 hostname piler-smtp[932]: disconnected from
192.168.165.252 on fd=6 (0 active connections)
Feb  6 02:32:58 hostname piler-smtp[932]: connected from
192.168.165.252:60978 on fd=6 (active connections: 1)
Feb  6 02:32:58 hostname piler-smtp[932]: disconnected from
192.168.165.252 on fd=6 (0 active connections)
Feb  6 02:32:58 hostname piler-smtp[932]: connected from
192.168.165.154:9236 on fd=6 (active connections: 1)
Feb  6 02:33:37 hostname piler-smtp[932]: client 192.168.165.154 timeout
Feb  6 02:33:38 hostname piler-smtp[932]: disconnected from
192.168.165.154 on fd=6 (0 active connections)
Feb  6 02:33:38 hostname piler-smtp[932]: Removing 2C8S47TCJPA3JY1B
Feb  6 02:33:38 hostname piler-smtp[932]: received: d<A0>#022#020^?,
from=, size=4938, client=192.168.165.154, fd=6
Feb  6 02:33:38 hostname piler-smtp[932]: ERROR: couldn't rename
d<A0>#022#020^? to 2/ d<A0>#022#020^? (reason: no such file or
directory)
Feb  6 02:33:38 hostname piler-smtp[932]: got signal: 11, Segmentation fault
Feb  6 02:33:38 hostname piler-smtp[932]: piler-smtp has been terminated
########
Feb  9 02:37:54 hostname piler-smtp[817]: connected from
192.168.165.154:47818 on fd=6 (active connections: 1)
Feb  9 02:39:00 hostname systemd[1]: Starting Clean php session files...
Feb  9 02:39:00 hostname systemd[1]: Started Clean php session files.
Feb  9 02:39:01 hostname CRON[31441]: (root) CMD (  [ -x
/usr/lib/php/sessionclean ] && if [ ! -d /run/systemd/system ]; then
/usr/lib/php/sessionclean; fi)
Feb  9 02:39:01 hostname CRON[31442]: (root) CMD (  [ -x
/usr/lib/php5/maxlifetime ] && [ -x /usr/lib/php5/sessionclean ] && [
-d /var/lib/php5 ] && /usr/lib/php5/se
ssionclean /var/lib/php5 $(/usr/lib/php5/maxlifetime))
Feb  9 02:39:13 hostname piler-smtp[817]: client 192.168.165.154 timeout
Feb  9 02:39:13 hostname piler-smtp[817]: disconnected from
192.168.165.154 on fd=6 (0 active connections)
Feb  9 02:39:13 hostname piler-smtp[817]: Removing F8BS7OAUXENUK2W1
Feb  9 02:39:13 hostname piler-smtp[817]: received:
<84><CA><CA><FB>^?, from=, size=4940, client=192.168.165.154, fd=6
Feb  9 02:39:13 hostname piler-smtp[817]: ERROR: couldn't rename
<84><CA><CA><FB>^? to 2/ <84><CA><CA><FB>^? (reason: no such file or
directory)
Feb  9 02:39:13 hostname piler-smtp[817]: got signal: 11, Segmentation fault
Feb  9 02:39:13 hostname piler-smtp[817]: piler-smtp has been terminated
Feb  9 02:39:44 hostname piler[827]: 6/BYGS1XC5QYVNUFK0:
400000006021e6f93527cf9c0084b50d9de2, size=4940/1936, attachments=0,
reference=, message-id=<4DZQRr2C3Rz5vSM@mailrelay.intern>,
retention=7350, folder=0, delay=240.2417, status=stored

#########################################

root@hostname:~# piler -V
piler 1.3.5, build 997, Janos SUTO <s...@acts.hu>         // ready-made
RPM on the Piler website

Build Date: Tue Oct 8 21:29:23 CEST 2019
ldd version: ldd (Ubuntu GLIBC 2.27-3ubuntu1) 2.27
gcc version: gcc version 7.4.0 (Ubuntu 7.4.0-1ubuntu1~18.04.1)
OS: Linux hostname 4.15.0-65-generic #74-Ubuntu SMP Tue Sep 17
17:06:04 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux
Configure command: ./configure --localstatedir=/var
--with-database=mysql --enable-starttls --enable-tcpwrappers
MySQL client library version: 5.7.33
Extractors: /usr/bin/pdftotext /usr/bin/catdoc /usr/bin/catppt
/usr/bin/xls2csv /usr/bin/unrtf /usr/bin/tnef

root@hostname:~# piler-smtp -V
1.3.5 build 997
root@hostname:~# lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description:    Ubuntu 18.04.5 LTS
Release:        18.04
Codename:       bionic


Reply via email to