Dovecot Index Corruption and dovecot-uidlist Issues

2019-10-25 Thread Billy Luedtke via dovecot
Hello,

We recently implemented Dovecot directors to resolve an ongoing NFS 
lock/corruption issue. Everything was working great! So we updated our servers 
from CentOS 7.3 to 7.7 and Dovecot 2.2.10 to 2.3.8.
This was a bit of a failure: our logs were spammed with corrupt index file 
errors. Even after we clear the index files with 
rm -rf /mail/d/e/username/Maildir/dovecot*, the corruption just comes back.
We then rolled the Dovecot packages back from:
dovecot-mysql-2.3.8-4.x86_64
dovecot-2.3.8-4.x86_64
dovecot-pigeonhole-2.3.8-4.x86_64
To:
dovecot-2.2.10-8.el7.x86_64
dovecot-mysql-2.2.10-8.el7.x86_64
dovecot-pigeonhole-2.2.10-8.el7.x86_64

We also rolled back nfs-utils and its libraries due to a case in Red Hat's bug 
tracker (https://bugzilla.redhat.com/show_bug.cgi?id=1521071):
libnfsidmap-0.25-19.el7.x86_64
nfs-utils-1.3.0-0.65.el7.x86_64
to
libnfsidmap-0.25-15.el7.x86_64
nfs-utils-1.3.0-0.33.el7.x86_64

We rolled our Dovecot version back to 2.2.10 and are still getting the 
following logs.

Any advice would help

Network view:

|Dovecot Directors x 3|
 |
|Dovecot servers x4| Storage is shared between all 4 servers
 |
|Netapp NFSv3 Storage| Nothing changed here
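
Since the storage is shared between all four back-end servers, the NFS mount 
options matter for Maildir. As an illustrative sketch only (the hostname, 
export path, and options below are examples, not our actual configuration), an 
/etc/fstab entry along the lines the Dovecot NFS documentation suggests could 
look like:

```
# /etc/fstab (illustrative example, not the poster's real configuration)
# "hard" avoids silent failures on NFS errors; "nordirplus" disables
# READDIRPLUS, which is often recommended for Maildir on NetApp filers.
netapp.example.com:/vol/mail  /mail  nfs  vers=3,tcp,hard,nordirplus  0 0
```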


Oct 25 01:33:25  dovecot: imap(): Error: Log synchronization 
error at seq=89,offset=32340 for /mail/j/k//Maildir/dovecot.index: 
Extension record update for invalid uid=6798
Oct 25 01:33:25  dovecot: imap(): Error: Log synchronization 
error at seq=89,offset=32384 for /mail/j/k//Maildir/dovecot.index: 
Extension record update for invalid uid=6798
Oct 25 01:33:25  dovecot: imap(): Error: Synchronization 
corrupted index header /mail/j/k//Maildir/dovecot.index: 
first_recent_uid 6799 > next_uid 6798
Oct 25 01:33:25  dovecot: imap(): Warning: fscking index file 
/mail/j/k//Maildir/dovecot.index
Oct 25 01:33:25  dovecot: imap(): Error: Fixed index file 
/mail/j/k//Maildir/dovecot.index: first_recent_uid 6799 -> 6798
Oct 25 01:33:26  dovecot: imap(): Error: Broken file 
/mail/j/k//Maildir/dovecot-uidlist line 6461: Invalid data:
Oct 25 01:33:26  dovecot: imap(): Error: Log synchronization 
error at seq=89,offset=32488 for /mail/j/k//Maildir/dovecot.index: 
Append with UID 6798, but next_uid = 6799
Oct 25 01:33:26  dovecot: imap(): Warning: fscking index file 
/mail/j/k//Maildir/dovecot.index
Oct 25 01:47:14  dovecot: imap(): Warning: fscking index file 
/mail/d/e//Maildir/dovecot.index
Oct 25 01:47:14  dovecot: imap(): Error: Failed to map 
transaction log /mail/d/e//Maildir/dovecot.index.log at 
sync_offset=9900 after locking: pread() failed: Stale file handle
Oct 25 01:47:14  dovecot: imap(): Error: Index 
/mail/d/e//Maildir/dovecot.index: Lost log for seq=16 offset=192: 
Missing middle file seq=16 (between 16..4294967295, we have seqs 14): Requested 
newer log than exists: Log inode is unchanged (initial_mapped=1, reason=Index 
mapped)
Oct 25 01:51:14  dovecot: imap(): Error: Broken file 
/mail/d/e//Maildir/dovecot-uidlist line 65: Invalid data:
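
For what it's worth, rather than deleting the index files with rm, Dovecot can 
rebuild them in place with doveadm. A sketch of what could be tried per 
affected user (username here is a placeholder):

```shell
# Rebuild the index files for every mailbox of one user; force-resync
# regenerates dovecot.index* in place instead of leaving the account
# with no indexes at all.
doveadm force-resync -u username '*'

# The same across all users known to the userdb:
doveadm force-resync -A '*'
```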

dovecot -n output:

# 2.2.10: /etc/dovecot/dovecot.conf
# OS: Linux 3.10.0-1062.1.2.el7.x86_64 x86_64 CentOS Linux release 7.7.1908 
(Core)
auth_mechanisms = plain login
auth_username_format = %Ln
auth_verbose = yes
auth_verbose_passwords = plain
disable_plaintext_auth = no
first_valid_uid = 300
lda_mailbox_autocreate = yes
lda_mailbox_autosubscribe = yes
mail_fsync = always
mail_location = maildir:~/Maildir
mail_nfs_index = yes
mail_nfs_storage = yes
maildir_very_dirty_syncs = yes
managesieve_notify_capability = mailto
managesieve_sieve_capability = fileinto reject envelope encoded-character 
vacation subaddress comparator-i;ascii-numeric relational regex imap4flags copy 
include variables body enotify environment mailbox date ihave
mmap_disable = yes
namespace inbox {
  inbox = yes
  location =
  mailbox Drafts {
    special_use = \Drafts
  }
  mailbox Junk {
    special_use = \Junk
  }
  mailbox Sent {
    special_use = \Sent
  }
  mailbox "Sent Messages" {
    special_use = \Sent
  }
  mailbox Trash {
    special_use = \Trash
  }
  prefix =
}
passdb {
  args = /etc/dovecot/sql.conf.ext
  driver = sql
}
plugin {
  sieve = ~/.dovecot.sieve
  sieve_dir = ~/sieve
  sieve_global_dir = /mail/sieve/global/
  sieve_global_path = /mail/sieve/default.sieve
}
postmaster_address = postmaster@%d
protocols = imap pop3 lmtp sieve
service auth {
  client_limit = 0
  inet_listener {
port = 11010
  }
  process_limit = 1
  process_min_avail = 0
  service_count = 0
}
service dict {
  unix_listener dict {
    group = vmail
    mode = 0600
    user = vmail
  }
}
service imap-login {
  inet_listener imap {
    port = 143
  }
  inet_listener imaps {
    port = 993
    ssl = yes
  }
  process_limit = 2
  process_min_avail = 2
  service_count = 0
}
service imap {
  client_limit = 1
  executable = imap
  process_limit = 2048
  process_min_avail = 50
  service_count = 1
}
service lmtp {
  client_limit = 1
  process_limit = 0
  process_min_avail = 0
  service_count = 0
  unix_listener /var/spool/postfix/private/dovecot-lmtp {
    group = postfix

warning: NFS hangs with dovecot 2.3.8 on Debian buster

2019-10-25 Thread Jan-Pieter Cornet via dovecot

A warning to those considering upgrading to Debian 10 (buster): we have seen 
occasional NFS hangs with Dovecot when using the stock Debian buster kernel 
(4.19.67-2+deb10u1).

When we downgrade to the Debian stretch kernel (4.9.189-3+deb9u1), the issue 
does not occur. Note that we *only* downgraded the kernel; the rest of the OS 
is still Debian buster. Dovecot is 2.3.8.

A little more info: we have a Dovecot cluster, using mdbox for storage, on an 
NFS server (NetApp C-mode, version 9.6P2). We use a Dovecot director layer, so a 
user is always connected to the same back-end Dovecot server. The NFS hang 
occurs on the back-end server.

Once a process hangs, other processes trying to write to the same mailbox 
will get an error like this:

Timeout (180s) while waiting for lock for transaction log file 
/var/mail/.../index/storage/dovecot.map.index.log (WRITE lock held by pid )

The stuck process itself doesn't seem to do anything: it is stuck in the "D" 
(uninterruptible sleep) state, and strace doesn't show anything (after attaching, 
strace itself needs a kill -KILL to stop). The only way to unwedge the stuck 
process seems to be a kill -KILL. Reading from the mailbox is still possible.
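
One way to see where a D-state process is blocked in the kernel, which may 
help pinpoint the hang (pid 12345 is a placeholder, and both commands need 
root):

```shell
# Kernel stack of the stuck process; for a task stuck in NFS this
# typically shows nfs_* / rpc_* functions near the top.
cat /proc/12345/stack

# Ask the kernel to log the stacks of all uninterruptible (D-state)
# tasks to the ring buffer, then read them back.
echo w > /proc/sysrq-trigger
dmesg | tail -n 50
```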

We are in the process of contacting the linux-nfs folks about this, but I'm 
trying to reproduce it on a test cluster first, to be able to present a 
well-documented case. Since this hang doesn't happen immediately (it takes a 
few hours to a day in the wild, or a few thousand writes to the same mailbox), 
it's a bit hard to debug.
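
To hammer a single mailbox on a test cluster, something along these lines 
might do, assuming a doveadm version that has the save command (testuser and 
the message file are placeholders):

```shell
# Append a few thousand messages to one user's INBOX via doveadm to
# simulate the write pressure that seems to trigger the hang.
for i in $(seq 1 5000); do
    doveadm save -u testuser < /tmp/test-message.eml
done
```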

--
Jan-Pieter Cornet 
Systeembeheer XS4ALL Internet bv
www.xs4all.nl



