Bug#842291: [pkg-gnupg-maint] Bug#842291: notmuch processes frequently stuck in select()
Werner Kochwrites: > On Wed, 23 Nov 2016 18:19, d...@fifthhorseman.net said: > >> 0) turn off CRL updates entirely during s/mime signature verification > > The gpgsm option is --disable-crl-checks. > >> 1) do s/mime signature verification without CRL updates, but schedule >> CRL checks to happen in the background for dirmngr, so that future >> verifications will reflect the cert validity A notmuch user reported on IRC that adding disable-crl-checks to ~/.gnupg/dirmngr.conf eliminated the long pauses when verifying s/mime signatures. This will prevent the user from noticing Certificate revokations, so it's not without cost in security, but perhaps it's temporary workaround until we figure out some better solution. d
Bug#842291: [pkg-gnupg-maint] Bug#842291: notmuch processes frequently stuck in select()
On Wed, 23 Nov 2016 18:19, d...@fifthhorseman.net said: > 0) turn off CRL updates entirely during s/mime signature verification The gpgsm option is --disable-crl-checks. > 1) do s/mime signature verification without CRL updates, but schedule > CRL checks to happen in the background for dirmngr, so that future > verifications will reflect the cert validity As above but use dirmngr-client--url --load-crl URLOFCRL You need to known the URL of the CRL, though. > 2) have dirmngr avoid checking CRLs that it knows it has already > updated recently A CRL carries a next-update date which is homored by dirmngr. Further dirmngr doesn't avoids to download a CRL unless 30 minutes have passed since the lassed download. > 3) tell dirmngr to use much shorter CRL fetch timeouts gpgsm -k --enable-crl-check --force-crl-refresh USERID > that's a 20-second lag between each failed check, adding up to 80 That seems to be caused by DNS lookups. For example ADNS keeps on trying even if it has received an ENETUNREACH and thus no UDP query packet has been sent out. We will very likely replace ADNS by a more flexible library in the next GnuPG version. Salam-Shalom, Werner -- Die Gedanken sind frei. Ausnahmen regelt ein Bundesgesetz. pgpM3cSJdhUyC.pgp Description: PGP signature
Bug#842291: notmuch processes frequently stuck in select()
Daniel Kahn Gillmorwrites: > > 0) turn off CRL updates entirely during s/mime signature verification > > 1) do s/mime signature verification without CRL updates, but schedule > CRL checks to happen in the background for dirmngr, so that future > verifications will reflect the cert validity > > 2) have dirmngr avoid checking CRLs that it knows it has already > updated recently > > 3) tell dirmngr to use much shorter CRL fetch timeouts > > > Any thoughts on the best way to pursue this? > > --dkg Maybe the issue is in gmime's usage of gpgme. If I understand correctly (which is far from a sure thing), pkcs7_verify calls gpgme_op_verify which is synchronous, and (apparently) does not support timeouts. An alternate strategy would be to call gpgme_op_verify_start, and then call gpgme_wait, which has a nonblocking mode. I don't really understand the S/MIME model, but naively it seems OK for signature verification to fail if the CRL check doesn't finish quickly. d
Bug#842291: notmuch processes frequently stuck in select()
Control: affects 842291 + gpgsm dirmngr On Wed 2016-11-23 03:50:40 -0500, David Bremner wrote: > David Bremnerwrites: > >> Brian May writes: >>> strace shows notmuch looping in select. >>> >>> select(10, [9], [], NULL, {1, 0}) = 0 (Timeout) >>> select(10, [9], [], NULL, {1, 0}) = 0 (Timeout) >>> select(10, [9], [], NULL, {1, 0}) = 0 (Timeout) >>> select(10, [9], [], NULL, {1, 0}) = 0 (Timeout) >>> etc >>> >> >> a backtrace would be helpful. >> >> d > > Nevermind, I managed to download the list archive for debian-devel and > replicate the bug. > > The bug seems to be related to smime signature verification. After > adding the attached mail message (and running notmuch-new), to replicate > the hang it suffices to run > > % notmuch show --decrypt id:20161116t143809.ga.21721.s...@fsing.rootsland.net > > > As far as workarounds, turning off decryption / signature verification > should allow you to at least view the thread. I've noticed similar behavior, and it seems to correlate with gpgsm asking dirmngr for an update to CRLs related to S/MIME certs. some CRLs simply do not respond at all (resulting in a timeout), and some do not respond, or are laggy, when accessed over tor specifically. I see a couple possible ways to consider resolving this, none of them great, and i don't know exactly how to implement any of them: 0) turn off CRL updates entirely during s/mime signature verification 1) do s/mime signature verification without CRL updates, but schedule CRL checks to happen in the background for dirmngr, so that future verifications will reflect the cert validity 2) have dirmngr avoid checking CRLs that it knows it has already updated recently 3) tell dirmngr to use much shorter CRL fetch timeouts Some example traffic from my dirmngr that uses tor (complete with timestamps indicating just how bad the delays can be): Nov 22 14:08:24 alice dirmngr[11976]: no CRL available for issuer id 770B4DA5913F2572B9F679AE0819FB7D77572689 Nov 22 14:08:24 alice dirmngr[11976]: fetching CRL from 'http://crl.ll.mit.edu/getcrl/LLCA3' Nov 22 14:08:44 alice dirmngr[11976]: resolving 'crl.ll.mit.edu' failed: No data Nov 22 14:08:44 alice dirmngr[11976]: can't connect to 'crl.ll.mit.edu': host not found Nov 22 14:08:44 alice dirmngr[11976]: error retrieving 'http://crl.ll.mit.edu/getcrl/LLCA3': Unknown host Nov 22 14:08:44 alice dirmngr[11976]: crl_fetch via DP failed: Unknown host Nov 22 14:08:45 alice dirmngr[11976]: no CRL available for issuer id 770B4DA5913F2572B9F679AE0819FB7D77572689 Nov 22 14:08:45 alice dirmngr[11976]: fetching CRL from 'http://crl.ll.mit.edu/getcrl/LLCA3' Nov 22 14:09:05 alice dirmngr[11976]: resolving 'crl.ll.mit.edu' failed: No data Nov 22 14:09:05 alice dirmngr[11976]: can't connect to 'crl.ll.mit.edu': host not found Nov 22 14:09:05 alice dirmngr[11976]: error retrieving 'http://crl.ll.mit.edu/getcrl/LLCA3': Unknown host Nov 22 14:09:05 alice dirmngr[11976]: crl_fetch via DP failed: Unknown host Nov 22 14:09:05 alice dirmngr[11976]: no CRL available for issuer id 26FD002905277B015EE9B2A3C092A348F28A4C6B Nov 22 14:09:05 alice dirmngr[11976]: fetching CRL from 'http://crl.startssl.com/sca-client1.crl' Nov 22 14:09:25 alice dirmngr[11976]: resolving 'crl.startssl.com' failed: No data Nov 22 14:09:25 alice dirmngr[11976]: can't connect to 'crl.startssl.com': host not found Nov 22 14:09:25 alice dirmngr[11976]: error retrieving 'http://crl.startssl.com/sca-client1.crl': Unknown host Nov 22 14:09:25 alice dirmngr[11976]: crl_fetch via DP failed: Unknown host Nov 22 14:09:25 alice dirmngr[11976]: no CRL available for issuer id 26FD002905277B015EE9B2A3C092A348F28A4C6B Nov 22 14:09:25 alice dirmngr[11976]: fetching CRL from 'http://crl.startssl.com/sca-client1.crl' Nov 22 14:09:45 alice dirmngr[11976]: resolving 'crl.startssl.com' failed: No data Nov 22 14:09:45 alice dirmngr[11976]: can't connect to 'crl.startssl.com': host not found Nov 22 14:09:45 alice dirmngr[11976]: error retrieving 'http://crl.startssl.com/sca-client1.crl': Unknown host Nov 22 14:09:45 alice dirmngr[11976]: crl_fetch via DP failed: Unknown host that's a 20-second lag between each failed check, adding up to 80 seconds delay in rendering a single thread where 4 messages were signed by S/MIME keys signed by two different authorities. Fwiw, crl.ll.mit.edu doesn't seem to respond over tor on port 80 at all in some cases, and in other cases takes nearly a minute to reply: 0 dkg@alice:/tmp/cdtemp.Ue45bu$ time wget -q 'http://crl.ll.mit.edu/getcrl/LLCA3' real0m0.694s user0m0.008s sys 0m0.008s 0 dkg@alice:/tmp/cdtemp.Ue45bu$ time torsocks wget -q 'http://crl.ll.mit.edu/getcrl/LLCA3' real0m58.828s user0m0.008s sys 0m0.008s 0 dkg@alice:/tmp/cdtemp.Ue45bu$ Any thoughts on the best way to pursue this? --dkg signature.asc Description: PGP signature
Bug#842291: notmuch processes frequently stuck in select()
David Bremnerwrites: > Brian May writes: >> strace shows notmuch looping in select. >> >> select(10, [9], [], NULL, {1, 0}) = 0 (Timeout) >> select(10, [9], [], NULL, {1, 0}) = 0 (Timeout) >> select(10, [9], [], NULL, {1, 0}) = 0 (Timeout) >> select(10, [9], [], NULL, {1, 0}) = 0 (Timeout) >> etc >> > > a backtrace would be helpful. > > d Nevermind, I managed to download the list archive for debian-devel and replicate the bug. The bug seems to be related to smime signature verification. After adding the attached mail message (and running notmuch-new), to replicate the hang it suffices to run % notmuch show --decrypt id:20161116t143809.ga.21721.s...@fsing.rootsland.net As far as workarounds, turning off decryption / signature verification should allow you to at least view the thread. 20161116T143809.GA.21721.stse@fsing.rootsland.net.eml:2, Description: Binary data
Bug#842291: notmuch processes frequently stuck in select()
Brian Maywrites: > strace shows notmuch looping in select. > > select(10, [9], [], NULL, {1, 0}) = 0 (Timeout) > select(10, [9], [], NULL, {1, 0}) = 0 (Timeout) > select(10, [9], [], NULL, {1, 0}) = 0 (Timeout) > select(10, [9], [], NULL, {1, 0}) = 0 (Timeout) > select(10, [9], [], NULL, {1, 0}) = 0 (Timeout) > select(10, [9], [], NULL, {1, 0}) = 0 (Timeout) > select(10, [9], [], NULL, {1, 0}) = 0 (Timeout) > select(10, [9], [], NULL, {1, 0}) = 0 (Timeout) > select(10, [9], [], NULL, {1, 0}) = 0 (Timeout) > select(10, [9], [], NULL, {1, 0}) = 0 (Timeout) > select(10, [9], [], NULL, {1, 0}) = 0 (Timeout) > select(10, [9], [], NULL, {1, 0}) = 0 (Timeout) > select(10, [9], [], NULL, {1, 0}) = 0 (Timeout) > select(10, [9], [], NULL, {1, 0}) = 0 (Timeout) > select(10, [9], [], NULL, {1, 0}) = 0 (Timeout) > etc > a backtrace would be helpful. d
Bug#842291: notmuch processes frequently stuck in select()
I am also seeing similar behaviour. Means I can't read certain threads in notmuch. For example, the recent openssl thread on debian-devel. I have version 0.23.1-1 It doesn't seem to matter what messages I want to try to look at from that thread, I can't see anything in it. strace shows notmuch looping in select. select(10, [9], [], NULL, {1, 0}) = 0 (Timeout) select(10, [9], [], NULL, {1, 0}) = 0 (Timeout) select(10, [9], [], NULL, {1, 0}) = 0 (Timeout) select(10, [9], [], NULL, {1, 0}) = 0 (Timeout) select(10, [9], [], NULL, {1, 0}) = 0 (Timeout) select(10, [9], [], NULL, {1, 0}) = 0 (Timeout) select(10, [9], [], NULL, {1, 0}) = 0 (Timeout) select(10, [9], [], NULL, {1, 0}) = 0 (Timeout) select(10, [9], [], NULL, {1, 0}) = 0 (Timeout) select(10, [9], [], NULL, {1, 0}) = 0 (Timeout) select(10, [9], [], NULL, {1, 0}) = 0 (Timeout) select(10, [9], [], NULL, {1, 0}) = 0 (Timeout) select(10, [9], [], NULL, {1, 0}) = 0 (Timeout) select(10, [9], [], NULL, {1, 0}) = 0 (Timeout) select(10, [9], [], NULL, {1, 0}) = 0 (Timeout) etc Work arounds would be appreciated. Thanks. -- Brian May
Bug#842291: notmuch processes frequently stuck in select()
Robbie Harwoodwrites: > David Bremner writes: > >> 1) Can you duplicate the problem with non-encrypted messages? > > Not sure. Most of the time it's inconsistent what messages it happens > for (and it's hard to see whether the message is encrypted when notmuch > won't show it to me). I'll update with a traceback for a non-encrypted > if I can find one. I guess the question is really can you find a case that doesn't involve hanging in a function starting with _gpgme. Your initial traceback actually looks like it involves verifying an SMIME signature, rather than decrypting. You can test this hypothesis by repeating the command without '--decrypt' or '--verify'
Bug#842291: notmuch processes frequently stuck in select()
David Bremnerwrites: > 1) Can you duplicate the problem with non-encrypted messages? Not sure. Most of the time it's inconsistent what messages it happens for (and it's hard to see whether the message is encrypted when notmuch won't show it to me). I'll update with a traceback for a non-encrypted if I can find one. > 2) Can you duplicate the problem with the same command run outside of >emacs? Yes, the same command hangs outside of emacs. signature.asc Description: PGP signature
Bug#842291: notmuch processes frequently stuck in select()
Robbie Harwoodwrites: > Package: notmuch > Version: 0.23.1-1 > Severity: important > > Dear Maintainer, > > Frequently the processes notmuch-emacs spawns to view messages never finish > returning the message data. Sometimes this is deterministic and the message > is never viewable; other times, spawning more of the same process causes the > first process to complete. > > Here's a traceback from one of the processes (invoked as > /usr/bin/notmuch show --format=sexp --format-version=1 --decrypt > --exclude=false ' thread:[redacted] and ( [redacted] )' > ): > > (gdb) bt > #0 0x7f1d74cff293 in __select_nocancel () at > ../sysdeps/unix/syscall-template.S:84 > #1 0x7f1d736b2429 in _gpgme_ath_select (nfd=nfd@entry=10, > rset=rset@entry=0x7ffe2d9c99e0, > wset=wset@entry=0x7ffe2d9c9a60, eset=eset@entry=0x0, > timeout=timeout@entry=0x7ffe2d9c99d0) > at ath-pthread.c:153 > #2 0x7f1d736ad857 in _gpgme_io_select (fds=0x556a3ddb1e30, nfds=10, > nonblock=nonblock@entry=0) > at posix-io.c:643 > #3 0x7f1d7368d2de in _gpgme_wait_on_condition > (ctx=ctx@entry=0x556a3ddb1260, cond=cond@entry=0x0, > op_err_p=op_err_p@entry=0x0) at wait-private.c:87 > #4 0x7f1d7368d4d9 in _gpgme_wait_one (ctx=ctx@entry=0x556a3ddb1260) at > wait-private.c:170 > #5 0x7f1d73691898 in gpgme_op_verify (ctx=0x556a3ddb1260, sig= out>, 1) Can you duplicate the problem with non-encrypted messages? 2) Can you duplicate the problem with the same command run outside of emacs? d
Bug#842291: notmuch processes frequently stuck in select()
Package: notmuch Version: 0.23.1-1 Severity: important Dear Maintainer, Frequently the processes notmuch-emacs spawns to view messages never finish returning the message data. Sometimes this is deterministic and the message is never viewable; other times, spawning more of the same process causes the first process to complete. Here's a traceback from one of the processes (invoked as /usr/bin/notmuch show --format=sexp --format-version=1 --decrypt --exclude=false ' thread:[redacted] and ( [redacted] )' ): (gdb) bt #0 0x7f1d74cff293 in __select_nocancel () at ../sysdeps/unix/syscall-template.S:84 #1 0x7f1d736b2429 in _gpgme_ath_select (nfd=nfd@entry=10, rset=rset@entry=0x7ffe2d9c99e0, wset=wset@entry=0x7ffe2d9c9a60, eset=eset@entry=0x0, timeout=timeout@entry=0x7ffe2d9c99d0) at ath-pthread.c:153 #2 0x7f1d736ad857 in _gpgme_io_select (fds=0x556a3ddb1e30, nfds=10, nonblock=nonblock@entry=0) at posix-io.c:643 #3 0x7f1d7368d2de in _gpgme_wait_on_condition (ctx=ctx@entry=0x556a3ddb1260, cond=cond@entry=0x0, op_err_p=op_err_p@entry=0x0) at wait-private.c:87 #4 0x7f1d7368d4d9 in _gpgme_wait_one (ctx=ctx@entry=0x556a3ddb1260) at wait-private.c:170 #5 0x7f1d73691898 in gpgme_op_verify (ctx=0x556a3ddb1260, sig=, signed_text=0x556a3ce9eb10, plaintext=plaintext@entry=0x0) at verify.c:1147 #6 0x7f1d75985b95 in pkcs7_verify (context=, digest=, istream=, sigstream=0x556a3ce38850, err=0x7ffe2d9c9c70) at gmime-pkcs7-context.c:646 #7 0x7f1d7597c832 in g_mime_multipart_signed_verify (mps=, ctx=ctx@entry=0x556a3ce5f630, err=err@entry=0x7ffe2d9c9c70) at gmime-multipart-signed.c:466 #8 0x556a3c5f1c80 in node_verify (cryptoctx=0x556a3ce5f630, part=0x556a3ce3cf00, node=0x556a3ce7e100) at mime-node.c:159 #9 _mime_node_create (part=0x556a3ce3cf00, parent=0x556a3ddc92d0) at mime-node.c:267 #10 mime_node_child (parent=parent@entry=0x556a3ddc92d0, child=child@entry=0) at mime-node.c:296 #11 0x556a3c5eed6d in format_part_sprinter (ctx=0x556a3ddb7cc0, sp=0x556a3ce5e540, node=0x556a3ddc92d0, first=1, output_body=1, include_html=0) at notmuch-show.c:563 #12 0x556a3c5ef486 in format_part_sprinter_entry (ctx=, sp=, node=, indent=, params=) at notmuch-show.c:670 #13 0x556a3c5edfc0 in show_message (ctx=ctx@entry=0x556a3ce38200, sp=sp@entry=0x556a3ce5e540, message=message@entry=0x556a3ddb87a0, indent=indent@entry=3, params=params@entry=0x7ffe2d9c9fa0, format=) at notmuch-show.c:827 #14 0x556a3c5ee0b9 in show_messages (ctx=ctx@entry=0x556a3ce38200, format=format@entry=0x556a3c7fd280 , sp=sp@entry=0x556a3ce5e540, messages=0x556a3ddb5aa0, indent=indent@entry=3, params=params@entry=0x7ffe2d9c9fa0) at notmuch-show.c:863 #15 0x556a3c5ee0fc in show_messages (ctx=ctx@entry=0x556a3ce38200, format=format@entry=0x556a3c7fd280 , sp=sp@entry=0x556a3ce5e540, messages=0x556a3ce81700, indent=indent@entry=2, params=params@entry=0x7ffe2d9c9fa0) at notmuch-show.c:871 #16 0x556a3c5ee0fc in show_messages (ctx=ctx@entry=0x556a3ce38200, format=format@entry=0x556a3c7fd280 , sp=sp@entry=0x556a3ce5e540, messages=0x556a3ce846c0, indent=indent@entry=1, params=params@entry=0x7ffe2d9c9fa0) at notmuch-show.c:871 #17 0x556a3c5ee0fc in show_messages (ctx=ctx@entry=0x556a3ce38200, format=format@entry=0x556a3c7fd280 , sp=sp@entry=0x556a3ce5e540, messages=0x556a3ce6dec0, indent=indent@entry=0, params=params@entry=0x7ffe2d9c9fa0) at notmuch-show.c:871 #18 0x556a3c5efd9f in do_show (params=0x7ffe2d9c9fa0, sp=0x556a3ce5e540, format=0x556a3c7fd280 , query=0x556a3ce3dfd0, ctx=0x556a3ce38200) at notmuch-show.c:959 #19 notmuch_show_command (config=0x556a3ce38200, argc=, argv=) at notmuch-show.c:1171 #20 0x556a3c5e1b8f in main (argc=, argv=0x7ffe2d9ca458) at notmuch.c:421 (gdb) (Versions of programs are the latest I could install, but it also occurs with about the same frequency when everything is from testing.) Thanks! -- System Information: Debian Release: stretch/sid APT prefers testing-debug APT policy: (600, 'testing-debug'), (600, 'testing'), (400, 'unstable-debug'), (400, 'unstable'), (200, 'experimental'), (1, 'experimental-debug') Architecture: amd64 (x86_64) Foreign Architectures: i386 Kernel: Linux 4.7.0-1-amd64 (SMP w/8 CPU cores) Locale: LANG=en_US.UTF-8, LC_CTYPE=en_US.UTF-8 (charmap=UTF-8) Shell: /bin/sh linked to /bin/dash Init: sysvinit (via /sbin/init) Versions of packages notmuch depends on: ii libc6 2.24-5 ii libglib2.0-02.50.1-1 ii libgmime-2.6-0 2.6.20-8 ii libnotmuch4 0.23.1-1 ii libtalloc2 2.1.8-1 ii zlib1g 1:1.2.8.dfsg-2+b1 Versions of packages notmuch recommends: ii gnupg-agent2.1.15-4 ii gpgsm 2.1.15-4 ii notmuch-emacs 0.23.1-1 notmuch suggests no packages. -- no debconf information