Re: Cyrus-imapd 2.4.17: processes stick on mailbox locking resulting in total mailsystem failure

2019-02-22 Thread Simon Matter via Info-cyrus
> Hi Ellie
>
> Thanks a lot, I will try to build and test 2.4.20

Maybe try this:

http://www.invoca.ch/pub/packages/cyrus-imapd/RPMS/ils-7/SRPMS/cyrus-imapd-2.4.20-2.el7.src.rpm

Regards,
Simon


Cyrus Home Page: http://www.cyrusimap.org/
List Archives/Info: http://lists.andrew.cmu.edu/pipermail/info-cyrus/
To Unsubscribe:
https://lists.andrew.cmu.edu/mailman/listinfo/info-cyrus


Re: Cyrus-imapd 2.4.17: processes stick on mailbox locking resulting in total mailsystem failure

2019-02-22 Thread Ivan Kuznetsov

Hi Ellie

Thanks a lot, I will try to build and test 2.4.20

22.02.2019 04:20, ellie timoney пишет:

Hi Ivan,


#7  0x56193ed6382e in idle_update ()



with cyrus-imapd 2.4.17 (cyrus-imapd-2.4.17-13.el7.x86_64 rpm)


It looks like this version is old enough to still be using signal handlers in 
its IMAP IDLE implementation.  This is known to be a problem, and IDLE was 
rewritten to not use signals for 2.5 and later.

Thomas Jarosch kindly backported the rewrite for 2.4, and it has been included 
in releases since 2.4.19 (the current 2.4 release is 2.4.20)

Cheers,

ellie

On Wed, Feb 20, 2019, at 1:15 AM, Ivan Kuznetsov wrote:

Hello

I have gdb'ed the locked process, backtrace is below. It seems that
problem occurs when imapd process catch a signal when is inside malloc()
call. The signal handler has a malloc() call too, so finally there is
interlock between two mallocs

I have only a few time to investigate because locked processes list
grows up dramatically. So I didn't found what the signal was. But it
seems that there is a bug in imapd code...

(gdb) thread apply all bt

Thread 1 (Thread 0x7ff6ead5f840 (LWP 22980)):
#0  0x7ff6e83e282c in __lll_lock_wait_private () from /lib64/libc.so.6
#1  0x7ff6e835e7e5 in _L_lock_16773 () from /lib64/libc.so.6
#2  0x7ff6e835b833 in malloc () from /lib64/libc.so.6
#3  0x56193edc9910 in xzmalloc ()
#4  0x56193edb2664 in seqset_init ()
#5  0x56193ed821b0 in index_tellchanges ()
#6  0x56193ed8232b in index_check ()
#7  0x56193ed6382e in idle_update ()
#8  
#9  0x7ff6e83580f0 in _int_malloc () from /lib64/libc.so.6
#10 0x7ff6e835b7dc in malloc () from /lib64/libc.so.6
#11 0x7ff6e956f4b8 in CRYPTO_malloc () from /lib64/libcrypto.so.10
#12 0x7ff6e96397ec in EVP_PKEY_CTX_dup () from /lib64/libcrypto.so.10
#13 0x7ff6e962b360 in EVP_MD_CTX_copy_ex () from /lib64/libcrypto.so.10
#14 0x7ff6e999ab3d in tls1_mac () from /lib64/libssl.so.10
#15 0x7ff6e998db02 in ssl3_read_bytes () from /lib64/libssl.so.10
#16 0x7ff6e998a644 in ssl3_read_internal () from /lib64/libssl.so.10
#17 0x56193edb9745 in prot_fill ()
#18 0x56193eda9ad7 in getword ()
#19 0x56193ed670e7 in cmd_idle ()
#20 0x56193ed7848d in cmdloop ()
#21 0x56193ed79769 in service_main ()
#22 0x56193ed62875 in main ()


13.12.2018 17:50, Ivan Kuznetsov пишет:

Jonk, thank you for the idea. Somewhat looks strange as old mail server
worked w/o this problem 5+ years. But the system environment changed
dramatically, may be some filesystem quircks are significant for this
locks...

I will try gdb'ing the process when problem occurs once more

13.12.2018 17:34, John Wade пишет:

Without running gdb on the process, I have no idea, but your problem
sounds similar to something we hit a very long time ago:

See https://www.oakton.edu/user/3/jwade/cyrus/Readme.html

In our cases, the problem was the imapd process that was holding the
lock was trying to obtain a second lock on the same file.   What does
a stack trace look like on the imapd process that is holding the lock?
It would appear the lock process has changed since I last looked at
this, so this may not be a help at all.

Good luck,
John



On 12/13/2018 5:21 AM, Ivan Kuznetsov wrote:

Hello

We had a company mail server under Oracle Linux 6 (a clone of RHEL6)
with cyrus-imapd 2.3.16 working for years w/o problems. There are >9M
messages in ~9500 mailboxes in two partitions. Daily mail traffic is
20-50K messages.

Some weeks ago we migrated the server to a new one under Oracle Linux
7 with cyrus-imapd 2.4.17 (cyrus-imapd-2.4.17-13.el7.x86_64 rpm) and
now have problem. Some times a week one of imapd processes locks an
"INBOX" mailbox with corresponding
/var/lib/imap/lock/user/.lock file and does not unlock it
anymore. Other imapd processes trying to access this mailbox sticks
waiting to obtain the lock. The bigger problem is that lmtpd
processes trying to deliver new mail to this mailbox hangs too. The
number of lmtpd processes is limited (maxchild=32) to limit the
server load, so free lmtpd pool become empty after a time, and mail
delivery stopsto all the mailboxes. MTA (postfix) mail queue blows up
quickly

Example lsof output:

lmtpd   8182 cyrus   18uR  REG  249,0 0 202944402
/var/lib/imap/lock/user/smilovsky.lock
lmtpd   8183 cyrus   18uR  REG  249,0 0 202944402
/var/lib/imap/lock/user/smilovsky.lock
lmtpd   8187 cyrus   18uR  REG  249,0 0 202944402
/var/lib/imap/lock/user/smilovsky.lock
lmtpd   8771 cyrus   18uR  REG  249,0 0 202944402
/var/lib/imap/lock/user/smilovsky.lock
lmtpd   8834 cyrus   18uR  REG  249,0 0 202944402
/var/lib/imap/lock/user/smilovsky.lock
lmtpd   9123 cyrus   18uR  REG  249,0 0 202944402
/var/lib/imap/lock/user/smilovsky.lock
lmtpd   9631 cyrus   18uR  REG  249,0 0 202944402
/var/lib/imap/lock/user/smilovsky.lock
lmtpd   

Re: Cyrus-imapd 2.4.17: processes stick on mailbox locking resulting in total mailsystem failure

2019-02-21 Thread ellie timoney
Hi Ivan,

> #7  0x56193ed6382e in idle_update ()

> with cyrus-imapd 2.4.17 (cyrus-imapd-2.4.17-13.el7.x86_64 rpm)

It looks like this version is old enough to still be using signal handlers in 
its IMAP IDLE implementation.  This is known to be a problem, and IDLE was 
rewritten to not use signals for 2.5 and later.

Thomas Jarosch kindly backported the rewrite for 2.4, and it has been included 
in releases since 2.4.19 (the current 2.4 release is 2.4.20)

Cheers,

ellie

On Wed, Feb 20, 2019, at 1:15 AM, Ivan Kuznetsov wrote:
> Hello
> 
> I have gdb'ed the locked process, backtrace is below. It seems that 
> problem occurs when imapd process catch a signal when is inside malloc() 
> call. The signal handler has a malloc() call too, so finally there is 
> interlock between two mallocs
> 
> I have only a few time to investigate because locked processes list 
> grows up dramatically. So I didn't found what the signal was. But it 
> seems that there is a bug in imapd code...
> 
> (gdb) thread apply all bt
> 
> Thread 1 (Thread 0x7ff6ead5f840 (LWP 22980)):
> #0  0x7ff6e83e282c in __lll_lock_wait_private () from /lib64/libc.so.6
> #1  0x7ff6e835e7e5 in _L_lock_16773 () from /lib64/libc.so.6
> #2  0x7ff6e835b833 in malloc () from /lib64/libc.so.6
> #3  0x56193edc9910 in xzmalloc ()
> #4  0x56193edb2664 in seqset_init ()
> #5  0x56193ed821b0 in index_tellchanges ()
> #6  0x56193ed8232b in index_check ()
> #7  0x56193ed6382e in idle_update ()
> #8  
> #9  0x7ff6e83580f0 in _int_malloc () from /lib64/libc.so.6
> #10 0x7ff6e835b7dc in malloc () from /lib64/libc.so.6
> #11 0x7ff6e956f4b8 in CRYPTO_malloc () from /lib64/libcrypto.so.10
> #12 0x7ff6e96397ec in EVP_PKEY_CTX_dup () from /lib64/libcrypto.so.10
> #13 0x7ff6e962b360 in EVP_MD_CTX_copy_ex () from /lib64/libcrypto.so.10
> #14 0x7ff6e999ab3d in tls1_mac () from /lib64/libssl.so.10
> #15 0x7ff6e998db02 in ssl3_read_bytes () from /lib64/libssl.so.10
> #16 0x7ff6e998a644 in ssl3_read_internal () from /lib64/libssl.so.10
> #17 0x56193edb9745 in prot_fill ()
> #18 0x56193eda9ad7 in getword ()
> #19 0x56193ed670e7 in cmd_idle ()
> #20 0x56193ed7848d in cmdloop ()
> #21 0x56193ed79769 in service_main ()
> #22 0x56193ed62875 in main ()
> 
> 
> 13.12.2018 17:50, Ivan Kuznetsov пишет:
> > Jonk, thank you for the idea. Somewhat looks strange as old mail server 
> > worked w/o this problem 5+ years. But the system environment changed 
> > dramatically, may be some filesystem quircks are significant for this 
> > locks...
> > 
> > I will try gdb'ing the process when problem occurs once more
> > 
> > 13.12.2018 17:34, John Wade пишет:
> >> Without running gdb on the process, I have no idea, but your problem 
> >> sounds similar to something we hit a very long time ago:
> >>
> >> See https://www.oakton.edu/user/3/jwade/cyrus/Readme.html
> >>
> >> In our cases, the problem was the imapd process that was holding the 
> >> lock was trying to obtain a second lock on the same file.   What does 
> >> a stack trace look like on the imapd process that is holding the lock? 
> >> It would appear the lock process has changed since I last looked at 
> >> this, so this may not be a help at all.
> >>
> >> Good luck,
> >> John
> >>
> >>
> >>
> >> On 12/13/2018 5:21 AM, Ivan Kuznetsov wrote:
> >>> Hello
> >>>
> >>> We had a company mail server under Oracle Linux 6 (a clone of RHEL6) 
> >>> with cyrus-imapd 2.3.16 working for years w/o problems. There are >9M 
> >>> messages in ~9500 mailboxes in two partitions. Daily mail traffic is 
> >>> 20-50K messages.
> >>>
> >>> Some weeks ago we migrated the server to a new one under Oracle Linux 
> >>> 7 with cyrus-imapd 2.4.17 (cyrus-imapd-2.4.17-13.el7.x86_64 rpm) and 
> >>> now have problem. Some times a week one of imapd processes locks an 
> >>> "INBOX" mailbox with corresponding 
> >>> /var/lib/imap/lock/user/.lock file and does not unlock it 
> >>> anymore. Other imapd processes trying to access this mailbox sticks 
> >>> waiting to obtain the lock. The bigger problem is that lmtpd 
> >>> processes trying to deliver new mail to this mailbox hangs too. The 
> >>> number of lmtpd processes is limited (maxchild=32) to limit the 
> >>> server load, so free lmtpd pool become empty after a time, and mail 
> >>> delivery stopsto all the mailboxes. MTA (postfix) mail queue blows up 
> >>> quickly
> >>>
> >>> Example lsof output:
> >>>
> >>> lmtpd   8182 cyrus   18uR  REG  249,0 0 202944402 
> >>> /var/lib/imap/lock/user/smilovsky.lock
> >>> lmtpd   8183 cyrus   18uR  REG  249,0 0 202944402 
> >>> /var/lib/imap/lock/user/smilovsky.lock
> >>> lmtpd   8187 cyrus   18uR  REG  249,0 0 202944402 
> >>> /var/lib/imap/lock/user/smilovsky.lock
> >>> lmtpd   8771 cyrus   18uR  REG  249,0 0 202944402 
> >>> /var/lib/imap/lock/user/smilovsky.lock
> >>> lmtpd   8834 cyrus   18uR  REG  

Re: Cyrus-imapd 2.4.17: processes stick on mailbox locking resulting in total mailsystem failure

2019-02-19 Thread Ivan Kuznetsov

Hello

I have gdb'ed the locked process, backtrace is below. It seems that 
problem occurs when imapd process catch a signal when is inside malloc() 
call. The signal handler has a malloc() call too, so finally there is 
interlock between two mallocs


I have only a few time to investigate because locked processes list 
grows up dramatically. So I didn't found what the signal was. But it 
seems that there is a bug in imapd code...


(gdb) thread apply all bt

Thread 1 (Thread 0x7ff6ead5f840 (LWP 22980)):
#0  0x7ff6e83e282c in __lll_lock_wait_private () from /lib64/libc.so.6
#1  0x7ff6e835e7e5 in _L_lock_16773 () from /lib64/libc.so.6
#2  0x7ff6e835b833 in malloc () from /lib64/libc.so.6
#3  0x56193edc9910 in xzmalloc ()
#4  0x56193edb2664 in seqset_init ()
#5  0x56193ed821b0 in index_tellchanges ()
#6  0x56193ed8232b in index_check ()
#7  0x56193ed6382e in idle_update ()
#8  
#9  0x7ff6e83580f0 in _int_malloc () from /lib64/libc.so.6
#10 0x7ff6e835b7dc in malloc () from /lib64/libc.so.6
#11 0x7ff6e956f4b8 in CRYPTO_malloc () from /lib64/libcrypto.so.10
#12 0x7ff6e96397ec in EVP_PKEY_CTX_dup () from /lib64/libcrypto.so.10
#13 0x7ff6e962b360 in EVP_MD_CTX_copy_ex () from /lib64/libcrypto.so.10
#14 0x7ff6e999ab3d in tls1_mac () from /lib64/libssl.so.10
#15 0x7ff6e998db02 in ssl3_read_bytes () from /lib64/libssl.so.10
#16 0x7ff6e998a644 in ssl3_read_internal () from /lib64/libssl.so.10
#17 0x56193edb9745 in prot_fill ()
#18 0x56193eda9ad7 in getword ()
#19 0x56193ed670e7 in cmd_idle ()
#20 0x56193ed7848d in cmdloop ()
#21 0x56193ed79769 in service_main ()
#22 0x56193ed62875 in main ()


13.12.2018 17:50, Ivan Kuznetsov пишет:
Jonk, thank you for the idea. Somewhat looks strange as old mail server 
worked w/o this problem 5+ years. But the system environment changed 
dramatically, may be some filesystem quircks are significant for this 
locks...


I will try gdb'ing the process when problem occurs once more

13.12.2018 17:34, John Wade пишет:
Without running gdb on the process, I have no idea, but your problem 
sounds similar to something we hit a very long time ago:


See https://www.oakton.edu/user/3/jwade/cyrus/Readme.html

In our cases, the problem was the imapd process that was holding the 
lock was trying to obtain a second lock on the same file.   What does 
a stack trace look like on the imapd process that is holding the lock? 
It would appear the lock process has changed since I last looked at 
this, so this may not be a help at all.


Good luck,
John



On 12/13/2018 5:21 AM, Ivan Kuznetsov wrote:

Hello

We had a company mail server under Oracle Linux 6 (a clone of RHEL6) 
with cyrus-imapd 2.3.16 working for years w/o problems. There are >9M 
messages in ~9500 mailboxes in two partitions. Daily mail traffic is 
20-50K messages.


Some weeks ago we migrated the server to a new one under Oracle Linux 
7 with cyrus-imapd 2.4.17 (cyrus-imapd-2.4.17-13.el7.x86_64 rpm) and 
now have problem. Some times a week one of imapd processes locks an 
"INBOX" mailbox with corresponding 
/var/lib/imap/lock/user/.lock file and does not unlock it 
anymore. Other imapd processes trying to access this mailbox sticks 
waiting to obtain the lock. The bigger problem is that lmtpd 
processes trying to deliver new mail to this mailbox hangs too. The 
number of lmtpd processes is limited (maxchild=32) to limit the 
server load, so free lmtpd pool become empty after a time, and mail 
delivery stopsto all the mailboxes. MTA (postfix) mail queue blows up 
quickly


Example lsof output:

lmtpd   8182 cyrus   18uR  REG  249,0 0 202944402 
/var/lib/imap/lock/user/smilovsky.lock
lmtpd   8183 cyrus   18uR  REG  249,0 0 202944402 
/var/lib/imap/lock/user/smilovsky.lock
lmtpd   8187 cyrus   18uR  REG  249,0 0 202944402 
/var/lib/imap/lock/user/smilovsky.lock
lmtpd   8771 cyrus   18uR  REG  249,0 0 202944402 
/var/lib/imap/lock/user/smilovsky.lock
lmtpd   8834 cyrus   18uR  REG  249,0 0 202944402 
/var/lib/imap/lock/user/smilovsky.lock
lmtpd   9123 cyrus   18uR  REG  249,0 0 202944402 
/var/lib/imap/lock/user/smilovsky.lock
lmtpd   9631 cyrus   18uR  REG  249,0 0 202944402 
/var/lib/imap/lock/user/smilovsky.lock
lmtpd   10343 cyrus   18uR  REG  249,0 0 
202944402 /var/lib/imap/lock/user/smilovsky.lock
lmtpd   10437 cyrus   18uR  REG  249,0 0 
202944402 /var/lib/imap/lock/user/smilovsky.lock
lmtpd   11411 cyrus   18uR  REG  249,0 0 
202944402 /var/lib/imap/lock/user/smilovsky.lock
lmtpd   11413 cyrus   18uR  REG  249,0 0 
202944402 /var/lib/imap/lock/user/smilovsky.lock
lmtpd   11506 cyrus   18uR  REG  249,0 0 
202944402 /var/lib/imap/lock/user/smilovsky.lock


[root@hippo4 bin]# lsof 

Re: Cyrus-imapd 2.4.17: processes stick on mailbox locking resulting in total mailsystem failure

2018-12-13 Thread Ivan Kuznetsov
Jonk, thank you for the idea. Somewhat looks strange as old mail server 
worked w/o this problem 5+ years. But the system environment changed 
dramatically, may be some filesystem quircks are significant for this 
locks...


I will try gdb'ing the process when problem occurs once more

13.12.2018 17:34, John Wade пишет:
Without running gdb on the process, I have no idea, but your problem 
sounds similar to something we hit a very long time ago:


See https://www.oakton.edu/user/3/jwade/cyrus/Readme.html

In our cases, the problem was the imapd process that was holding the 
lock was trying to obtain a second lock on the same file.   What does a 
stack trace look like on the imapd process that is holding the lock? It 
would appear the lock process has changed since I last looked at this, 
so this may not be a help at all.


Good luck,
John



On 12/13/2018 5:21 AM, Ivan Kuznetsov wrote:

Hello

We had a company mail server under Oracle Linux 6 (a clone of RHEL6) 
with cyrus-imapd 2.3.16 working for years w/o problems. There are >9M 
messages in ~9500 mailboxes in two partitions. Daily mail traffic is 
20-50K messages.


Some weeks ago we migrated the server to a new one under Oracle Linux 
7 with cyrus-imapd 2.4.17 (cyrus-imapd-2.4.17-13.el7.x86_64 rpm) and 
now have problem. Some times a week one of imapd processes locks an 
"INBOX" mailbox with corresponding /var/lib/imap/lock/user/.lock 
file and does not unlock it anymore. Other imapd processes trying to 
access this mailbox sticks waiting to obtain the lock. The bigger 
problem is that lmtpd processes trying to deliver new mail to this 
mailbox hangs too. The number of lmtpd processes is limited 
(maxchild=32) to limit the server load, so free lmtpd pool become 
empty after a time, and mail delivery stopsto all the mailboxes. MTA 
(postfix) mail queue blows up quickly


Example lsof output:

lmtpd   8182 cyrus   18uR  REG  249,0 0 202944402 
/var/lib/imap/lock/user/smilovsky.lock
lmtpd   8183 cyrus   18uR  REG  249,0 0 202944402 
/var/lib/imap/lock/user/smilovsky.lock
lmtpd   8187 cyrus   18uR  REG  249,0 0 202944402 
/var/lib/imap/lock/user/smilovsky.lock
lmtpd   8771 cyrus   18uR  REG  249,0 0 202944402 
/var/lib/imap/lock/user/smilovsky.lock
lmtpd   8834 cyrus   18uR  REG  249,0 0 202944402 
/var/lib/imap/lock/user/smilovsky.lock
lmtpd   9123 cyrus   18uR  REG  249,0 0 202944402 
/var/lib/imap/lock/user/smilovsky.lock
lmtpd   9631 cyrus   18uR  REG  249,0 0 202944402 
/var/lib/imap/lock/user/smilovsky.lock
lmtpd   10343 cyrus   18uR  REG  249,0 0 202944402 
/var/lib/imap/lock/user/smilovsky.lock
lmtpd   10437 cyrus   18uR  REG  249,0 0 202944402 
/var/lib/imap/lock/user/smilovsky.lock
lmtpd   11411 cyrus   18uR  REG  249,0 0 202944402 
/var/lib/imap/lock/user/smilovsky.lock
lmtpd   11413 cyrus   18uR  REG  249,0 0 202944402 
/var/lib/imap/lock/user/smilovsky.lock
lmtpd   11506 cyrus   18uR  REG  249,0 0 202944402 
/var/lib/imap/lock/user/smilovsky.lock


[root@hippo4 bin]# lsof /var/lib/imap/lock/user/smilovsky.lock | grep 
'^imapd'
imapd    9132 cyrus   19uR  REG  249,0    0 202944402 
/var/lib/imap/lock/user/smilovsky.lock
imapd    9154 cyrus   19uR  REG  249,0    0 202944402 
/var/lib/imap/lock/user/smilovsky.lock
imapd   10311 cyrus   19uR  REG  249,0    0 202944402 
/var/lib/imap/lock/user/smilovsky.lock
imapd   10746 cyrus   19uR  REG  249,0    0 202944402 
/var/lib/imap/lock/user/smilovsky.lock
imapd   11843 cyrus   19uR  REG  249,0    0 202944402 
/var/lib/imap/lock/user/smilovsky.lock
imapd   12059 cyrus   19uR  REG  249,0    0 202944402 
/var/lib/imap/lock/user/smilovsky.lock
imapd   21885 cyrus   19uR  REG  249,0    0 202944402 
/var/lib/imap/lock/user/smilovsky.lock


In this case the root cause imapd process is 21885. strace shows that 
the process waits on a futex forever:


[root@hippo4 bin]# strace -tt -f -p 21885
strace: Process 21885 attached
13:07:28.264713 futex(0x7f6dcc5ac760, FUTEX_WAIT_PRIVATE, 2, 
NULL^Cstrace: Process 21885 detached

 

Killing the process frees the lock, all the other stuck processes runs 
and the problem disappears in a few seconds.


Investigation shows that this process has a long-term connection from 
a user MUA (Thunderbird) and deletes mail messages time-to-time. This 
may be a Thunderbird 'filter' thread


[root@hippo4 bin]# grep 'imap\[21885\]' /var/log/maillog
[...]
Dec 13 09:59:11 hippo4 imap[21885]: starttls: TLSv1.2 with cipher 
DHE-RSA-AES256-SHA (256/256 bits new) no authentication
Dec 13 09:59:11 hippo4 imap[21885]: login: thunder.solvo.ru 
[172.16.85.69] smilovsky CRAM-MD5+TLS User logged in 
SESSIONID=
Dec 13 09:59:11 hippo4 imap[21885]: client id: "name" "Thunderbird" 
"version" "52.5.2"
Dec 13 09:59:11 hippo4 imap[21885]: 

Re: Cyrus-imapd 2.4.17: processes stick on mailbox locking resulting in total mailsystem failure

2018-12-13 Thread John Wade
Without running gdb on the process, I have no idea, but your problem 
sounds similar to something we hit a very long time ago:


See https://www.oakton.edu/user/3/jwade/cyrus/Readme.html

In our cases, the problem was the imapd process that was holding the 
lock was trying to obtain a second lock on the same file.   What does a 
stack trace look like on the imapd process that is holding the lock? 
It would appear the lock process has changed since I last looked at 
this, so this may not be a help at all.


Good luck,
John



On 12/13/2018 5:21 AM, Ivan Kuznetsov wrote:

Hello

We had a company mail server under Oracle Linux 6 (a clone of RHEL6) 
with cyrus-imapd 2.3.16 working for years w/o problems. There are >9M 
messages in ~9500 mailboxes in two partitions. Daily mail traffic is 
20-50K messages.


Some weeks ago we migrated the server to a new one under Oracle Linux 
7 with cyrus-imapd 2.4.17 (cyrus-imapd-2.4.17-13.el7.x86_64 rpm) and 
now have problem. Some times a week one of imapd processes locks an 
"INBOX" mailbox with corresponding /var/lib/imap/lock/user/.lock 
file and does not unlock it anymore. Other imapd processes trying to 
access this mailbox sticks waiting to obtain the lock. The bigger 
problem is that lmtpd processes trying to deliver new mail to this 
mailbox hangs too. The number of lmtpd processes is limited 
(maxchild=32) to limit the server load, so free lmtpd pool become 
empty after a time, and mail delivery stopsto all the mailboxes. MTA 
(postfix) mail queue blows up quickly


Example lsof output:

lmtpd   8182 cyrus   18uR  REG  249,0 0 202944402 
/var/lib/imap/lock/user/smilovsky.lock
lmtpd   8183 cyrus   18uR  REG  249,0 0 202944402 
/var/lib/imap/lock/user/smilovsky.lock
lmtpd   8187 cyrus   18uR  REG  249,0 0 202944402 
/var/lib/imap/lock/user/smilovsky.lock
lmtpd   8771 cyrus   18uR  REG  249,0 0 202944402 
/var/lib/imap/lock/user/smilovsky.lock
lmtpd   8834 cyrus   18uR  REG  249,0 0 202944402 
/var/lib/imap/lock/user/smilovsky.lock
lmtpd   9123 cyrus   18uR  REG  249,0 0 202944402 
/var/lib/imap/lock/user/smilovsky.lock
lmtpd   9631 cyrus   18uR  REG  249,0 0 202944402 
/var/lib/imap/lock/user/smilovsky.lock
lmtpd   10343 cyrus   18uR  REG  249,0 0 202944402 
/var/lib/imap/lock/user/smilovsky.lock
lmtpd   10437 cyrus   18uR  REG  249,0 0 202944402 
/var/lib/imap/lock/user/smilovsky.lock
lmtpd   11411 cyrus   18uR  REG  249,0 0 202944402 
/var/lib/imap/lock/user/smilovsky.lock
lmtpd   11413 cyrus   18uR  REG  249,0 0 202944402 
/var/lib/imap/lock/user/smilovsky.lock
lmtpd   11506 cyrus   18uR  REG  249,0 0 202944402 
/var/lib/imap/lock/user/smilovsky.lock


[root@hippo4 bin]# lsof /var/lib/imap/lock/user/smilovsky.lock | grep 
'^imapd'
imapd    9132 cyrus   19uR  REG  249,0    0 202944402 
/var/lib/imap/lock/user/smilovsky.lock
imapd    9154 cyrus   19uR  REG  249,0    0 202944402 
/var/lib/imap/lock/user/smilovsky.lock
imapd   10311 cyrus   19uR  REG  249,0    0 202944402 
/var/lib/imap/lock/user/smilovsky.lock
imapd   10746 cyrus   19uR  REG  249,0    0 202944402 
/var/lib/imap/lock/user/smilovsky.lock
imapd   11843 cyrus   19uR  REG  249,0    0 202944402 
/var/lib/imap/lock/user/smilovsky.lock
imapd   12059 cyrus   19uR  REG  249,0    0 202944402 
/var/lib/imap/lock/user/smilovsky.lock
imapd   21885 cyrus   19uR  REG  249,0    0 202944402 
/var/lib/imap/lock/user/smilovsky.lock


In this case the root cause imapd process is 21885. strace shows that 
the process waits on a futex forever:


[root@hippo4 bin]# strace -tt -f -p 21885
strace: Process 21885 attached
13:07:28.264713 futex(0x7f6dcc5ac760, FUTEX_WAIT_PRIVATE, 2, 
NULL^Cstrace: Process 21885 detached

 

Killing the process frees the lock, all the other stuck processes runs 
and the problem disappears in a few seconds.


Investigation shows that this process has a long-term connection from 
a user MUA (Thunderbird) and deletes mail messages time-to-time. This 
may be a Thunderbird 'filter' thread


[root@hippo4 bin]# grep 'imap\[21885\]' /var/log/maillog
[...]
Dec 13 09:59:11 hippo4 imap[21885]: starttls: TLSv1.2 with cipher 
DHE-RSA-AES256-SHA (256/256 bits new) no authentication
Dec 13 09:59:11 hippo4 imap[21885]: login: thunder.solvo.ru 
[172.16.85.69] smilovsky CRAM-MD5+TLS User logged in 
SESSIONID=
Dec 13 09:59:11 hippo4 imap[21885]: client id: "name" "Thunderbird" 
"version" "52.5.2"
Dec 13 09:59:11 hippo4 imap[21885]: Expunged 1 messages from 
user.smilovsky
Dec 13 09:59:12 hippo4 imap[21885]: Expunged 1 messages from 
user.smilovsky
Dec 13 09:59:12 hippo4 imap[21885]: Expunged 1 messages from 
user.smilovsky
Dec 13 09:59:12 hippo4 imap[21885]: Expunged 1 messages from 
user.smilovsky
Dec 13 10:01:45 hippo4 imap[21885]: Expunged 1