Re: Bug#842291: notmuch processes frequently stuck in select()

2016-11-23 Thread David Bremner
Daniel Kahn Gillmor  writes:
>
>  0) turn off CRL updates entirely during s/mime signature verification
>
>  1) do s/mime signature verification without CRL updates, but schedule
> CRL checks to happen in the background for dirmngr, so that future
> verifications will reflect the cert validity
>
>  2) have dirmngr avoid checking CRLs that it knows it has already
> updated recently
>
>  3) tell dirmngr to use much shorter CRL fetch timeouts
>
> Any thoughts on the best way to pursue this?
>
> --dkg

Maybe the issue is in gmime's usage of gpgme. If I understand correctly
(which is far from a sure thing), pkcs7_verify calls gpgme_op_verify
which is synchronous, and (apparently) does not support timeouts. An
alternate strategy would be to call gpgme_op_verify_start, and then call
gpgme_wait, which has a nonblocking mode. I don't really understand the
S/MIME model, but naively it seems OK for signature verification to fail
if the CRL check doesn't finish quickly.
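
The start-then-poll strategy described above can be sketched roughly as follows. This is Python rather than gpgme's C API, and `verify_with_deadline`, `deadline_s` and `poll_s` are hypothetical names: the background thread stands in for gpgme_op_verify_start, and the `done()` check stands in for a nonblocking gpgme_wait. It is only an illustration of the control flow, not a patch for gmime.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def verify_with_deadline(verify, deadline_s, poll_s=0.05):
    """Run verify() in the background; return its result, or None on timeout."""
    pool = ThreadPoolExecutor(max_workers=1)
    future = pool.submit(verify)          # stands in for starting the operation
    give_up_at = time.monotonic() + deadline_s
    try:
        while time.monotonic() < give_up_at:
            if future.done():             # stands in for a nonblocking wait
                return future.result()
            time.sleep(poll_s)
        return None                       # caller treats this as "not verified"
    finally:
        # Does not cancel the running work; it only stops waiting for it.
        pool.shutdown(wait=False)
```

A caller would map a `None` result to a failed (or unknown) signature status instead of blocking until the CRL fetch gives up.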

d
___
notmuch mailing list
notmuch@notmuchmail.org
https://notmuchmail.org/mailman/listinfo/notmuch


Re: Bug#842291: notmuch processes frequently stuck in select()

2016-11-23 Thread Daniel Kahn Gillmor
Control: affects 842291 + gpgsm dirmngr

On Wed 2016-11-23 03:50:40 -0500, David Bremner wrote:
> David Bremner  writes:
>
>> Brian May  writes:
>>> strace shows notmuch looping in select.
>>>
>>> select(10, [9], [], NULL, {1, 0})   = 0 (Timeout)
>>> select(10, [9], [], NULL, {1, 0})   = 0 (Timeout)
>>> select(10, [9], [], NULL, {1, 0})   = 0 (Timeout)
>>> select(10, [9], [], NULL, {1, 0})   = 0 (Timeout)
>>> etc
>>>
>>
>> a backtrace would be helpful.
>>
>> d
>
> Never mind, I managed to download the list archive for debian-devel and
> replicate the bug.
>
> The bug seems to be related to smime signature verification. After
> adding the attached mail message (and running notmuch-new), to replicate
> the hang it suffices to run
>
> % notmuch show --decrypt id:20161116t143809.ga.21721.s...@fsing.rootsland.net 
>  
>
> As far as workarounds, turning off decryption / signature verification
> should allow you to at least view the thread.

I've noticed similar behavior, and it seems to correlate with gpgsm
asking dirmngr for an update to CRLs related to S/MIME certs.

some CRL servers simply do not respond at all (resulting in a timeout),
and some do not respond, or are laggy, when accessed over tor specifically.

I see a few possible ways to resolve this, none of them great, and i
don't know exactly how to implement any of them:

 0) turn off CRL updates entirely during s/mime signature verification

 1) do s/mime signature verification without CRL updates, but schedule
CRL checks to happen in the background for dirmngr, so that future
verifications will reflect the cert validity

 2) have dirmngr avoid checking CRLs that it knows it has already
updated recently

 3) tell dirmngr to use much shorter CRL fetch timeouts
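
As an illustration of option 2 only, here is a minimal Python sketch of remembering when each CRL URL was last refreshed. `RecentCrlChecks`, `min_interval_s`, `needs_check` and `mark_updated` are hypothetical names made up for this example; nothing here reflects how dirmngr actually stores or expires CRLs.

```python
import time


class RecentCrlChecks:
    """Remember when each CRL URL was last refreshed (hypothetical helper)."""

    def __init__(self, min_interval_s, clock=time.monotonic):
        self.min_interval_s = min_interval_s
        self.clock = clock
        self.last_update = {}  # URL -> clock value at last successful refresh

    def needs_check(self, url):
        # A URL needs checking if it was never refreshed or the refresh is stale.
        last = self.last_update.get(url)
        return last is None or self.clock() - last >= self.min_interval_s

    def mark_updated(self, url):
        self.last_update[url] = self.clock()
```

Note that when every fetch fails, nothing is ever marked as updated, so a record of successful refreshes alone would not avoid the delay; failed attempts would need to be remembered too.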


Some example traffic from my dirmngr that uses tor (complete with
timestamps indicating just how bad the delays can be):

Nov 22 14:08:24 alice dirmngr[11976]: no CRL available for issuer id 770B4DA5913F2572B9F679AE0819FB7D77572689
Nov 22 14:08:24 alice dirmngr[11976]: fetching CRL from 'http://crl.ll.mit.edu/getcrl/LLCA3'
Nov 22 14:08:44 alice dirmngr[11976]: resolving 'crl.ll.mit.edu' failed: No data
Nov 22 14:08:44 alice dirmngr[11976]: can't connect to 'crl.ll.mit.edu': host not found
Nov 22 14:08:44 alice dirmngr[11976]: error retrieving 'http://crl.ll.mit.edu/getcrl/LLCA3': Unknown host
Nov 22 14:08:44 alice dirmngr[11976]: crl_fetch via DP failed: Unknown host
Nov 22 14:08:45 alice dirmngr[11976]: no CRL available for issuer id 770B4DA5913F2572B9F679AE0819FB7D77572689
Nov 22 14:08:45 alice dirmngr[11976]: fetching CRL from 'http://crl.ll.mit.edu/getcrl/LLCA3'
Nov 22 14:09:05 alice dirmngr[11976]: resolving 'crl.ll.mit.edu' failed: No data
Nov 22 14:09:05 alice dirmngr[11976]: can't connect to 'crl.ll.mit.edu': host not found
Nov 22 14:09:05 alice dirmngr[11976]: error retrieving 'http://crl.ll.mit.edu/getcrl/LLCA3': Unknown host
Nov 22 14:09:05 alice dirmngr[11976]: crl_fetch via DP failed: Unknown host
Nov 22 14:09:05 alice dirmngr[11976]: no CRL available for issuer id 26FD002905277B015EE9B2A3C092A348F28A4C6B
Nov 22 14:09:05 alice dirmngr[11976]: fetching CRL from 'http://crl.startssl.com/sca-client1.crl'
Nov 22 14:09:25 alice dirmngr[11976]: resolving 'crl.startssl.com' failed: No data
Nov 22 14:09:25 alice dirmngr[11976]: can't connect to 'crl.startssl.com': host not found
Nov 22 14:09:25 alice dirmngr[11976]: error retrieving 'http://crl.startssl.com/sca-client1.crl': Unknown host
Nov 22 14:09:25 alice dirmngr[11976]: crl_fetch via DP failed: Unknown host
Nov 22 14:09:25 alice dirmngr[11976]: no CRL available for issuer id 26FD002905277B015EE9B2A3C092A348F28A4C6B
Nov 22 14:09:25 alice dirmngr[11976]: fetching CRL from 'http://crl.startssl.com/sca-client1.crl'
Nov 22 14:09:45 alice dirmngr[11976]: resolving 'crl.startssl.com' failed: No data
Nov 22 14:09:45 alice dirmngr[11976]: can't connect to 'crl.startssl.com': host not found
Nov 22 14:09:45 alice dirmngr[11976]: error retrieving 'http://crl.startssl.com/sca-client1.crl': Unknown host
Nov 22 14:09:45 alice dirmngr[11976]: crl_fetch via DP failed: Unknown host

that's a 20-second lag for each failed fetch (four fetches in all),
adding up to an 80-second delay in rendering a single thread where 4
messages were signed by S/MIME keys signed by two different authorities.
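
The arithmetic can be checked against the log timestamps with a short Python sketch. The `LOG` string is abridged from the dirmngr output above (one "fetching" and one "failed" line per attempt); `stamp` and `fetch_delays` are hypothetical helper names, and the year 2016 is an assumption taken from the date of this thread, since syslog timestamps carry no year.

```python
from datetime import datetime

LOG = """\
Nov 22 14:08:24 alice dirmngr[11976]: fetching CRL from 'http://crl.ll.mit.edu/getcrl/LLCA3'
Nov 22 14:08:44 alice dirmngr[11976]: crl_fetch via DP failed: Unknown host
Nov 22 14:08:45 alice dirmngr[11976]: fetching CRL from 'http://crl.ll.mit.edu/getcrl/LLCA3'
Nov 22 14:09:05 alice dirmngr[11976]: crl_fetch via DP failed: Unknown host
Nov 22 14:09:05 alice dirmngr[11976]: fetching CRL from 'http://crl.startssl.com/sca-client1.crl'
Nov 22 14:09:25 alice dirmngr[11976]: crl_fetch via DP failed: Unknown host
Nov 22 14:09:25 alice dirmngr[11976]: fetching CRL from 'http://crl.startssl.com/sca-client1.crl'
Nov 22 14:09:45 alice dirmngr[11976]: crl_fetch via DP failed: Unknown host
"""


def stamp(line):
    # First 15 characters are the syslog timestamp; the year is assumed.
    return datetime.strptime("2016 " + line[:15], "%Y %b %d %H:%M:%S")


def fetch_delays(log):
    """Seconds between each 'fetching CRL' line and the failure that follows it."""
    delays, started = [], None
    for line in log.splitlines():
        if "fetching CRL from" in line:
            started = stamp(line)
        elif "crl_fetch via DP failed" in line and started is not None:
            delays.append((stamp(line) - started).total_seconds())
            started = None
    return delays


delays = fetch_delays(LOG)
print(delays, sum(delays))  # [20.0, 20.0, 20.0, 20.0] 80.0
```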

Fwiw, crl.ll.mit.edu doesn't seem to respond over tor on port 80 at all
in some cases, and in other cases takes nearly a minute to reply:

0 dkg@alice:/tmp/cdtemp.Ue45bu$ time wget -q 'http://crl.ll.mit.edu/getcrl/LLCA3'

real    0m0.694s
user    0m0.008s
sys     0m0.008s
0 dkg@alice:/tmp/cdtemp.Ue45bu$ time torsocks wget -q 'http://crl.ll.mit.edu/getcrl/LLCA3'

real    0m58.828s
user    0m0.008s
sys     0m0.008s
0 dkg@alice:/tmp/cdtemp.Ue45bu$ 


Any thoughts on the best way to pursue this?

--dkg



Re: Bug#842291: notmuch processes frequently stuck in select()

2016-11-23 Thread David Bremner
David Bremner  writes:

> Brian May  writes:
>> strace shows notmuch looping in select.
>>
>> select(10, [9], [], NULL, {1, 0})   = 0 (Timeout)
>> select(10, [9], [], NULL, {1, 0})   = 0 (Timeout)
>> select(10, [9], [], NULL, {1, 0})   = 0 (Timeout)
>> select(10, [9], [], NULL, {1, 0})   = 0 (Timeout)
>> etc
>>
>
> a backtrace would be helpful.
>
> d

Never mind, I managed to download the list archive for debian-devel and
replicate the bug.

The bug seems to be related to smime signature verification. After
adding the attached mail message (and running notmuch-new), to replicate
the hang it suffices to run

% notmuch show --decrypt id:20161116t143809.ga.21721.s...@fsing.rootsland.net  

As far as workarounds, turning off decryption / signature verification
should allow you to at least view the thread.



20161116T143809.GA.21721.stse@fsing.rootsland.net.eml:2,
Description: Binary data