Piler 1.35 on Ubuntu 18.04 - piler-smtp SIGSEGV
Hello everyone, a few weeks ago 2 of our Piler instances began segfaulting - more precisely, several times per month (but not every day night) at about 02:30 am local time the MS Exchange server that has the Piler instance setup as journaling target connects to piler-smtp, upon which piler-smtp segfaults after trying to store an email that clearly doesn't get parsed correctly by piler-smtp as it is represented as gibberish (and gibberish cannot be renamed...) See below for a few of the latest examples. Interestingly, the SIGSEGV did not happen Sunday and Monday night, meaning the day following a non-weekday? Not going on long enough that I could call it a pattern, but in any case it would be nice if piler-smtp didn't segfault in the first place. What can I do to supply more useful information to get this fixed? Or has there been meaningful work on piler-smtp in the meantime that this could / would've already been fixed? Thanks, Patrick Feb 4 02:30:01 hostname CRON[24428]: (piler) CMD (/usr/bin/find /var/www/piler/tmp -type f -name i.\* -exec rm -f {} \;) Feb 4 02:30:01 hostname CRON[24430]: (piler) CMD (/usr/bin/find /var/piler/error -type f|wc -l > /var/piler/stat/error) Feb 4 02:30:01 hostname CRON[24429]: (piler) CMD (/usr/local/libexec/piler/indexer.main.sh) Feb 4 02:30:01 hostname CRON[24431]: (piler) CMD (/usr/bin/indexer --quiet note1 --rotate) Feb 4 02:30:01 hostname CRON[24436]: (piler) CMD (/usr/bin/indexer --quiet tag1 --rotate) Feb 4 02:30:01 hostname piler: INDEXER INFO: merging to main started Feb 4 02:30:01 hostname CRON[24426]: (CRON) info (No MTA installed, discarding output) Feb 4 02:30:01 hostname CRON[24425]: (CRON) info (No MTA installed, discarding output) Feb 4 02:30:20 hostname piler-smtp[1144]: connected from 192.168.165.252:58292 on fd=6 (active connections: 1) Feb 4 02:30:20 hostname piler-smtp[1144]: connected from 192.168.165.252:58294 on fd=9 (active connections: 2) Feb 4 02:30:20 hostname piler-smtp[1144]: disconnected from 192.168.165.252 on fd=6 (1 active connections) Feb 4 02:30:20 hostname piler-smtp[1144]: disconnected from 192.168.165.252 on fd=9 (0 active connections) Feb 4 02:30:27 hostname piler-smtp[1144]: connected from 192.168.165.154:41139 on fd=6 (active connections: 1) Feb 4 02:32:18 hostname piler-smtp[1144]: client 192.168.165.154 timeout Feb 4 02:32:18 hostname piler-smtp[1144]: disconnected from 192.168.165.154 on fd=6 (0 active connections) Feb 4 02:32:18 hostname piler-smtp[1144]: Removing 6KHX6ZMY7OD1R0KG Feb 4 02:32:18 hostname piler-smtp[1144]: received: #013}^?, from=, size=4940, client=192.168.165.154, fd=6 Feb 4 02:32:18 hostname piler-smtp[1144]: ERROR: couldn't rename #013}^? to 2/ #013}^? (reason: no such file or directory) Feb 4 02:32:18 hostname piler-smtp[1144]: got signal: 11, Segmentation fault Feb 4 02:32:18 hostname piler-smtp[1144]: piler-smtp has been terminated Feb 5 02:32:15 hostname piler-smtp[28329]: connected from 192.168.165.252:57818 on fd=6 (active connections: 1) Feb 5 02:32:15 hostname piler-smtp[28329]: connected from 192.168.165.252:57820 on fd=9 (active connections: 2) Feb 5 02:32:15 hostname piler-smtp[28329]: disconnected from 192.168.165.252 on fd=6 (1 active connections) Feb 5 02:32:15 hostname piler-smtp[28329]: disconnected from 192.168.165.252 on fd=9 (0 active connections) Feb 5 02:33:15 hostname piler-smtp[28329]: connected from 192.168.165.252:60458 on fd=6 (active connections: 1) Feb 5 02:33:15 hostname piler-smtp[28329]: connected from 192.168.165.252:60460 on fd=9 (active connections: 2) Feb 5 02:33:15 hostname piler-smtp[28329]: disconnected from 192.168.165.252 on fd=6 (1 active connections) Feb 5 02:33:15 hostname piler-smtp[28329]: disconnected from 192.168.165.252 on fd=9 (0 active connections) Feb 5 02:34:15 hostname piler-smtp[28329]: connected from 192.168.165.252:34840 on fd=6 (active connections: 1) Feb 5 02:34:15 hostname piler-smtp[28329]: connected from 192.168.165.252:34842 on fd=9 (active connections: 2) Feb 5 02:34:15 hostname piler-smtp[28329]: disconnected from 192.168.165.252 on fd=6 (1 active connections) Feb 5 02:34:15 hostname piler-smtp[28329]: disconnected from 192.168.165.252 on fd=9 (0 active connections) Feb 5 02:34:29 hostname piler-smtp[28329]: connected from 192.168.165.154:25626 on fd=6 (active connections: 1) Feb 5 02:35:01 hostname CRON[30058]: (piler) CMD (/usr/bin/find /var/piler/error -type f|wc -l > /var/piler/stat/error) Feb 5 02:35:01 hostname CRON[30060]: (root) CMD (command -v debian-sa1 > /dev/null && debian-sa1 1 1) Feb 5 02:35:01 hostname CRON[30059]: (piler) CMD (/usr/local/libexec/piler/indexer.delta.sh) Feb 5 02:35:01 hostname CRON[30063]: (piler) CMD (/usr/bin/find /var/www/piler/tmp -type f -name i.\* -exec rm -f {} \;) Feb 5 02:35:01 hostname piler: INDEXER ERROR: indexer merging to main index is already running. It started at Fr 5. Feb 02:30:01 CET
Re: Piler 1.35 on Ubuntu 18.04 - piler-smtp SIGSEGV
Hello Patrick, I've been reported such issues when a backup system paused or froze the virtual machine running piler. I suspect that it might be the case for you based on the time and days you mentioned. The problem in such cases is that piler-smtp is in the middle of processing an email, then the system is frozen for a few minutes enough to pass the smtp timeout limit (60 sec), and then when it's unfrozen then piler-smtp notices the timeout situation, then tears down the connection. However, at the end of the session it tries to tear down the already terminated and freed connection data. So, let's try the following: get the master branch, recompile piler, however replace the piler-smtp binary only, nothing else. Then let's see if the problem is solved. Janos On 2021-02-09 07:56, Wagner, Patrick wrote: Hello everyone, a few weeks ago 2 of our Piler instances began segfaulting - more precisely, several times per month (but not every day night) at about 02:30 am local time the MS Exchange server that has the Piler instance setup as journaling target connects to piler-smtp, upon which piler-smtp segfaults after trying to store an email that clearly doesn't get parsed correctly by piler-smtp as it is represented as gibberish (and gibberish cannot be renamed...) See below for a few of the latest examples. Interestingly, the SIGSEGV did not happen Sunday and Monday night, meaning the day following a non-weekday? Not going on long enough that I could call it a pattern, but in any case it would be nice if piler-smtp didn't segfault in the first place. What can I do to supply more useful information to get this fixed? Or has there been meaningful work on piler-smtp in the meantime that this could / would've already been fixed? Thanks, Patrick Feb 4 02:30:01 hostname CRON[24428]: (piler) CMD (/usr/bin/find /var/www/piler/tmp -type f -name i.\* -exec rm -f {} \;) Feb 4 02:30:01 hostname CRON[24430]: (piler) CMD (/usr/bin/find /var/piler/error -type f|wc -l > /var/piler/stat/error) Feb 4 02:30:01 hostname CRON[24429]: (piler) CMD (/usr/local/libexec/piler/indexer.main.sh) Feb 4 02:30:01 hostname CRON[24431]: (piler) CMD (/usr/bin/indexer --quiet note1 --rotate) Feb 4 02:30:01 hostname CRON[24436]: (piler) CMD (/usr/bin/indexer --quiet tag1 --rotate) Feb 4 02:30:01 hostname piler: INDEXER INFO: merging to main started Feb 4 02:30:01 hostname CRON[24426]: (CRON) info (No MTA installed, discarding output) Feb 4 02:30:01 hostname CRON[24425]: (CRON) info (No MTA installed, discarding output) Feb 4 02:30:20 hostname piler-smtp[1144]: connected from 192.168.165.252:58292 on fd=6 (active connections: 1) Feb 4 02:30:20 hostname piler-smtp[1144]: connected from 192.168.165.252:58294 on fd=9 (active connections: 2) Feb 4 02:30:20 hostname piler-smtp[1144]: disconnected from 192.168.165.252 on fd=6 (1 active connections) Feb 4 02:30:20 hostname piler-smtp[1144]: disconnected from 192.168.165.252 on fd=9 (0 active connections) Feb 4 02:30:27 hostname piler-smtp[1144]: connected from 192.168.165.154:41139 on fd=6 (active connections: 1) Feb 4 02:32:18 hostname piler-smtp[1144]: client 192.168.165.154 timeout Feb 4 02:32:18 hostname piler-smtp[1144]: disconnected from 192.168.165.154 on fd=6 (0 active connections) Feb 4 02:32:18 hostname piler-smtp[1144]: Removing 6KHX6ZMY7OD1R0KG Feb 4 02:32:18 hostname piler-smtp[1144]: received: #013}^?, from=, size=4940, client=192.168.165.154, fd=6 Feb 4 02:32:18 hostname piler-smtp[1144]: ERROR: couldn't rename #013}^? to 2/ #013}^? (reason: no such file or directory) Feb 4 02:32:18 hostname piler-smtp[1144]: got signal: 11, Segmentation fault Feb 4 02:32:18 hostname piler-smtp[1144]: piler-smtp has been terminated Feb 5 02:32:15 hostname piler-smtp[28329]: connected from 192.168.165.252:57818 on fd=6 (active connections: 1) Feb 5 02:32:15 hostname piler-smtp[28329]: connected from 192.168.165.252:57820 on fd=9 (active connections: 2) Feb 5 02:32:15 hostname piler-smtp[28329]: disconnected from 192.168.165.252 on fd=6 (1 active connections) Feb 5 02:32:15 hostname piler-smtp[28329]: disconnected from 192.168.165.252 on fd=9 (0 active connections) Feb 5 02:33:15 hostname piler-smtp[28329]: connected from 192.168.165.252:60458 on fd=6 (active connections: 1) Feb 5 02:33:15 hostname piler-smtp[28329]: connected from 192.168.165.252:60460 on fd=9 (active connections: 2) Feb 5 02:33:15 hostname piler-smtp[28329]: disconnected from 192.168.165.252 on fd=6 (1 active connections) Feb 5 02:33:15 hostname piler-smtp[28329]: disconnected from 192.168.165.252 on fd=9 (0 active connections) Feb 5 02:34:15 hostname piler-smtp[28329]: connected from 192.168.165.252:34840 on fd=6 (active connections: 1) Feb 5 02:34:15 hostname piler-smtp[28329]: connected from 192.168.165.252:34842 on fd=9 (active connections: 2) Feb 5 02:34:15 hostname piler-smtp[28329]: disconnected from 192.168.165.252 on fd=6 (1 active connections) Feb 5 02:34:15 hostname