Bug#842291: [pkg-gnupg-maint] Bug#842291: notmuch processes frequently stuck in select()

2016-11-29 Thread David Bremner
Werner Koch  writes:

> On Wed, 23 Nov 2016 18:19, d...@fifthhorseman.net said:
>
>>  0) turn off CRL updates entirely during s/mime signature verification
>
> The gpgsm option is --disable-crl-checks.  
>
>>  1) do s/mime signature verification without CRL updates, but schedule
>> CRL checks to happen in the background for dirmngr, so that future
>> verifications will reflect the cert validity

A notmuch user reported on IRC that adding disable-crl-checks to
~/.gnupg/dirmngr.conf eliminated the long pauses when verifying s/mime
signatures.

This will prevent the user from noticing Certificate revokations, so
it's not without cost in security, but perhaps it's temporary workaround
until we figure out some better solution.

d



Bug#842291: [pkg-gnupg-maint] Bug#842291: notmuch processes frequently stuck in select()

2016-11-25 Thread Werner Koch
On Wed, 23 Nov 2016 18:19, d...@fifthhorseman.net said:

>  0) turn off CRL updates entirely during s/mime signature verification

The gpgsm option is --disable-crl-checks.  

>  1) do s/mime signature verification without CRL updates, but schedule
> CRL checks to happen in the background for dirmngr, so that future
> verifications will reflect the cert validity

As above but use 

  dirmngr-client--url --load-crl URLOFCRL

You need to known the URL of the CRL, though.

>  2) have dirmngr avoid checking CRLs that it knows it has already
> updated recently

A CRL carries a next-update date which is homored by dirmngr.  Further
dirmngr doesn't avoids to download a CRL unless 30 minutes have passed
since the lassed download.

>  3) tell dirmngr to use much shorter CRL fetch timeouts

gpgsm -k  --enable-crl-check --force-crl-refresh  USERID

> that's a 20-second lag between each failed check, adding up to 80

That seems to be caused by DNS lookups.  For example ADNS keeps on
trying even if it has received an ENETUNREACH and thus no UDP query
packet has been sent out.   We will very likely replace ADNS by a more
flexible library in the next GnuPG version.


Salam-Shalom,

   Werner

-- 
Die Gedanken sind frei.  Ausnahmen regelt ein Bundesgesetz.


pgpM3cSJdhUyC.pgp
Description: PGP signature


Bug#842291: notmuch processes frequently stuck in select()

2016-11-23 Thread David Bremner
Daniel Kahn Gillmor  writes:
>
>  0) turn off CRL updates entirely during s/mime signature verification
>
>  1) do s/mime signature verification without CRL updates, but schedule
> CRL checks to happen in the background for dirmngr, so that future
> verifications will reflect the cert validity
>
>  2) have dirmngr avoid checking CRLs that it knows it has already
> updated recently
>
>  3) tell dirmngr to use much shorter CRL fetch timeouts
>

>
> Any thoughts on the best way to pursue this?
>
> --dkg

Maybe the issue is in gmime's usage of gpgme. If I understand correctly
(which is far from a sure thing), pkcs7_verify calls gpgme_op_verify
which is synchronous, and (apparently) does not support timeouts. An
alternate strategy would be to call gpgme_op_verify_start, and then call
gpgme_wait, which has a nonblocking mode. I don't really understand the
S/MIME model, but naively it seems OK for signature verification to fail
if the CRL check doesn't finish quickly.

d



Bug#842291: notmuch processes frequently stuck in select()

2016-11-23 Thread Daniel Kahn Gillmor
Control: affects 842291 + gpgsm dirmngr

On Wed 2016-11-23 03:50:40 -0500, David Bremner wrote:
> David Bremner  writes:
>
>> Brian May  writes:
>>> strace shows notmuch looping in select.
>>>
>>> select(10, [9], [], NULL, {1, 0})   = 0 (Timeout)
>>> select(10, [9], [], NULL, {1, 0})   = 0 (Timeout)
>>> select(10, [9], [], NULL, {1, 0})   = 0 (Timeout)
>>> select(10, [9], [], NULL, {1, 0})   = 0 (Timeout)
>>> etc
>>>
>>
>> a backtrace would be helpful.
>>
>> d
>
> Nevermind, I managed to download the list archive for debian-devel and
> replicate the bug.
>
> The bug seems to be related to smime signature verification. After
> adding the attached mail message (and running notmuch-new), to replicate
> the hang it suffices to run
>
> % notmuch show --decrypt id:20161116t143809.ga.21721.s...@fsing.rootsland.net 
>  
>
> As far as workarounds, turning off decryption / signature verification
> should allow you to at least view the thread.

I've noticed similar behavior, and it seems to correlate with gpgsm
asking dirmngr for an update to CRLs related to S/MIME certs.

some CRLs simply do not respond at all (resulting in a timeout), and
some do not respond, or are laggy, when accessed over tor specifically.

I see a couple possible ways to consider resolving this, none of them
great, and i don't know exactly how to implement any of them:

 0) turn off CRL updates entirely during s/mime signature verification

 1) do s/mime signature verification without CRL updates, but schedule
CRL checks to happen in the background for dirmngr, so that future
verifications will reflect the cert validity

 2) have dirmngr avoid checking CRLs that it knows it has already
updated recently

 3) tell dirmngr to use much shorter CRL fetch timeouts


Some example traffic from my dirmngr that uses tor (complete with
timestamps indicating just how bad the delays can be):

Nov 22 14:08:24 alice dirmngr[11976]: no CRL available for issuer id 
770B4DA5913F2572B9F679AE0819FB7D77572689
Nov 22 14:08:24 alice dirmngr[11976]: fetching CRL from 
'http://crl.ll.mit.edu/getcrl/LLCA3'
Nov 22 14:08:44 alice dirmngr[11976]: resolving 'crl.ll.mit.edu' failed: No data
Nov 22 14:08:44 alice dirmngr[11976]: can't connect to 'crl.ll.mit.edu': host 
not found
Nov 22 14:08:44 alice dirmngr[11976]: error retrieving 
'http://crl.ll.mit.edu/getcrl/LLCA3': Unknown host
Nov 22 14:08:44 alice dirmngr[11976]: crl_fetch via DP failed: Unknown host
Nov 22 14:08:45 alice dirmngr[11976]: no CRL available for issuer id 
770B4DA5913F2572B9F679AE0819FB7D77572689
Nov 22 14:08:45 alice dirmngr[11976]: fetching CRL from 
'http://crl.ll.mit.edu/getcrl/LLCA3'
Nov 22 14:09:05 alice dirmngr[11976]: resolving 'crl.ll.mit.edu' failed: No data
Nov 22 14:09:05 alice dirmngr[11976]: can't connect to 'crl.ll.mit.edu': host 
not found
Nov 22 14:09:05 alice dirmngr[11976]: error retrieving 
'http://crl.ll.mit.edu/getcrl/LLCA3': Unknown host
Nov 22 14:09:05 alice dirmngr[11976]: crl_fetch via DP failed: Unknown host
Nov 22 14:09:05 alice dirmngr[11976]: no CRL available for issuer id 
26FD002905277B015EE9B2A3C092A348F28A4C6B
Nov 22 14:09:05 alice dirmngr[11976]: fetching CRL from 
'http://crl.startssl.com/sca-client1.crl'
Nov 22 14:09:25 alice dirmngr[11976]: resolving 'crl.startssl.com' failed: No 
data
Nov 22 14:09:25 alice dirmngr[11976]: can't connect to 'crl.startssl.com': host 
not found
Nov 22 14:09:25 alice dirmngr[11976]: error retrieving 
'http://crl.startssl.com/sca-client1.crl': Unknown host
Nov 22 14:09:25 alice dirmngr[11976]: crl_fetch via DP failed: Unknown host
Nov 22 14:09:25 alice dirmngr[11976]: no CRL available for issuer id 
26FD002905277B015EE9B2A3C092A348F28A4C6B
Nov 22 14:09:25 alice dirmngr[11976]: fetching CRL from 
'http://crl.startssl.com/sca-client1.crl'
Nov 22 14:09:45 alice dirmngr[11976]: resolving 'crl.startssl.com' failed: No 
data
Nov 22 14:09:45 alice dirmngr[11976]: can't connect to 'crl.startssl.com': host 
not found
Nov 22 14:09:45 alice dirmngr[11976]: error retrieving 
'http://crl.startssl.com/sca-client1.crl': Unknown host
Nov 22 14:09:45 alice dirmngr[11976]: crl_fetch via DP failed: Unknown host

that's a 20-second lag between each failed check, adding up to 80
seconds delay in rendering a single thread where 4 messages were signed
by S/MIME keys signed by two different authorities.

Fwiw, crl.ll.mit.edu doesn't seem to respond over tor on port 80 at all
in some cases, and in other cases takes nearly a minute to reply:

0 dkg@alice:/tmp/cdtemp.Ue45bu$ time wget -q 
'http://crl.ll.mit.edu/getcrl/LLCA3'

real0m0.694s
user0m0.008s
sys 0m0.008s
0 dkg@alice:/tmp/cdtemp.Ue45bu$ time torsocks wget -q 
'http://crl.ll.mit.edu/getcrl/LLCA3'

real0m58.828s
user0m0.008s
sys 0m0.008s
0 dkg@alice:/tmp/cdtemp.Ue45bu$ 


Any thoughts on the best way to pursue this?

--dkg


signature.asc
Description: PGP signature


Bug#842291: notmuch processes frequently stuck in select()

2016-11-23 Thread David Bremner
David Bremner  writes:

> Brian May  writes:
>> strace shows notmuch looping in select.
>>
>> select(10, [9], [], NULL, {1, 0})   = 0 (Timeout)
>> select(10, [9], [], NULL, {1, 0})   = 0 (Timeout)
>> select(10, [9], [], NULL, {1, 0})   = 0 (Timeout)
>> select(10, [9], [], NULL, {1, 0})   = 0 (Timeout)
>> etc
>>
>
> a backtrace would be helpful.
>
> d

Nevermind, I managed to download the list archive for debian-devel and
replicate the bug.

The bug seems to be related to smime signature verification. After
adding the attached mail message (and running notmuch-new), to replicate
the hang it suffices to run

% notmuch show --decrypt id:20161116t143809.ga.21721.s...@fsing.rootsland.net  

As far as workarounds, turning off decryption / signature verification
should allow you to at least view the thread.



20161116T143809.GA.21721.stse@fsing.rootsland.net.eml:2,
Description: Binary data


Bug#842291: notmuch processes frequently stuck in select()

2016-11-23 Thread David Bremner
Brian May  writes:
> strace shows notmuch looping in select.
>
> select(10, [9], [], NULL, {1, 0})   = 0 (Timeout)
> select(10, [9], [], NULL, {1, 0})   = 0 (Timeout)
> select(10, [9], [], NULL, {1, 0})   = 0 (Timeout)
> select(10, [9], [], NULL, {1, 0})   = 0 (Timeout)
> select(10, [9], [], NULL, {1, 0})   = 0 (Timeout)
> select(10, [9], [], NULL, {1, 0})   = 0 (Timeout)
> select(10, [9], [], NULL, {1, 0})   = 0 (Timeout)
> select(10, [9], [], NULL, {1, 0})   = 0 (Timeout)
> select(10, [9], [], NULL, {1, 0})   = 0 (Timeout)
> select(10, [9], [], NULL, {1, 0})   = 0 (Timeout)
> select(10, [9], [], NULL, {1, 0})   = 0 (Timeout)
> select(10, [9], [], NULL, {1, 0})   = 0 (Timeout)
> select(10, [9], [], NULL, {1, 0})   = 0 (Timeout)
> select(10, [9], [], NULL, {1, 0})   = 0 (Timeout)
> select(10, [9], [], NULL, {1, 0})   = 0 (Timeout)
> etc
>

a backtrace would be helpful.

d



Bug#842291: notmuch processes frequently stuck in select()

2016-11-22 Thread Brian May
I am also seeing similar behaviour. Means I can't read certain threads
in notmuch. For example, the recent openssl thread on debian-devel.

I have version 0.23.1-1

It doesn't seem to matter what messages I want to try to look at from
that thread, I can't see anything in it.

strace shows notmuch looping in select.

select(10, [9], [], NULL, {1, 0})   = 0 (Timeout)
select(10, [9], [], NULL, {1, 0})   = 0 (Timeout)
select(10, [9], [], NULL, {1, 0})   = 0 (Timeout)
select(10, [9], [], NULL, {1, 0})   = 0 (Timeout)
select(10, [9], [], NULL, {1, 0})   = 0 (Timeout)
select(10, [9], [], NULL, {1, 0})   = 0 (Timeout)
select(10, [9], [], NULL, {1, 0})   = 0 (Timeout)
select(10, [9], [], NULL, {1, 0})   = 0 (Timeout)
select(10, [9], [], NULL, {1, 0})   = 0 (Timeout)
select(10, [9], [], NULL, {1, 0})   = 0 (Timeout)
select(10, [9], [], NULL, {1, 0})   = 0 (Timeout)
select(10, [9], [], NULL, {1, 0})   = 0 (Timeout)
select(10, [9], [], NULL, {1, 0})   = 0 (Timeout)
select(10, [9], [], NULL, {1, 0})   = 0 (Timeout)
select(10, [9], [], NULL, {1, 0})   = 0 (Timeout)
etc

Work arounds would be appreciated.

Thanks.
-- 
Brian May 



Bug#842291: notmuch processes frequently stuck in select()

2016-10-27 Thread David Bremner
Robbie Harwood  writes:

> David Bremner  writes:
>
>> 1) Can you duplicate the problem with non-encrypted messages?
>
> Not sure.  Most of the time it's inconsistent what messages it happens
> for (and it's hard to see whether the message is encrypted when notmuch
> won't show it to me).  I'll update with a traceback for a non-encrypted
> if I can find one.

I guess the question is really can you find a case that doesn't involve
hanging in a function starting with _gpgme. Your initial traceback
actually looks like it involves verifying an SMIME signature, rather
than decrypting. You can test this hypothesis by repeating the command
without '--decrypt' or '--verify'



Bug#842291: notmuch processes frequently stuck in select()

2016-10-27 Thread Robbie Harwood
David Bremner  writes:

> 1) Can you duplicate the problem with non-encrypted messages?

Not sure.  Most of the time it's inconsistent what messages it happens
for (and it's hard to see whether the message is encrypted when notmuch
won't show it to me).  I'll update with a traceback for a non-encrypted
if I can find one.

> 2) Can you duplicate the problem with the same command run outside of
>emacs?

Yes, the same command hangs outside of emacs.


signature.asc
Description: PGP signature


Bug#842291: notmuch processes frequently stuck in select()

2016-10-27 Thread David Bremner
Robbie Harwood  writes:

> Package: notmuch
> Version: 0.23.1-1
> Severity: important
>
> Dear Maintainer,
>
> Frequently the processes notmuch-emacs spawns to view messages never finish
> returning the message data.  Sometimes this is deterministic and the message
> is never viewable; other times, spawning more of the same process causes the
> first process to complete.
>
> Here's a traceback from one of the processes (invoked as
> /usr/bin/notmuch show --format=sexp --format-version=1 --decrypt 
> --exclude=false ' thread:[redacted] and ( [redacted] )'
> ):
>
> (gdb) bt
> #0  0x7f1d74cff293 in __select_nocancel () at 
> ../sysdeps/unix/syscall-template.S:84
> #1  0x7f1d736b2429 in _gpgme_ath_select (nfd=nfd@entry=10, 
> rset=rset@entry=0x7ffe2d9c99e0, 
> wset=wset@entry=0x7ffe2d9c9a60, eset=eset@entry=0x0, 
> timeout=timeout@entry=0x7ffe2d9c99d0)
> at ath-pthread.c:153
> #2  0x7f1d736ad857 in _gpgme_io_select (fds=0x556a3ddb1e30, nfds=10, 
> nonblock=nonblock@entry=0)
> at posix-io.c:643
> #3  0x7f1d7368d2de in _gpgme_wait_on_condition 
> (ctx=ctx@entry=0x556a3ddb1260, cond=cond@entry=0x0, 
> op_err_p=op_err_p@entry=0x0) at wait-private.c:87
> #4  0x7f1d7368d4d9 in _gpgme_wait_one (ctx=ctx@entry=0x556a3ddb1260) at 
> wait-private.c:170
> #5  0x7f1d73691898 in gpgme_op_verify (ctx=0x556a3ddb1260, sig= out>, 

1) Can you duplicate the problem with non-encrypted messages?
2) Can you duplicate the problem with the same command run outside of
   emacs?

d



Bug#842291: notmuch processes frequently stuck in select()

2016-10-27 Thread Robbie Harwood
Package: notmuch
Version: 0.23.1-1
Severity: important

Dear Maintainer,

Frequently the processes notmuch-emacs spawns to view messages never finish
returning the message data.  Sometimes this is deterministic and the message
is never viewable; other times, spawning more of the same process causes the
first process to complete.

Here's a traceback from one of the processes (invoked as
/usr/bin/notmuch show --format=sexp --format-version=1 --decrypt 
--exclude=false ' thread:[redacted] and ( [redacted] )'
):

(gdb) bt
#0  0x7f1d74cff293 in __select_nocancel () at 
../sysdeps/unix/syscall-template.S:84
#1  0x7f1d736b2429 in _gpgme_ath_select (nfd=nfd@entry=10, 
rset=rset@entry=0x7ffe2d9c99e0, 
wset=wset@entry=0x7ffe2d9c9a60, eset=eset@entry=0x0, 
timeout=timeout@entry=0x7ffe2d9c99d0)
at ath-pthread.c:153
#2  0x7f1d736ad857 in _gpgme_io_select (fds=0x556a3ddb1e30, nfds=10, 
nonblock=nonblock@entry=0)
at posix-io.c:643
#3  0x7f1d7368d2de in _gpgme_wait_on_condition 
(ctx=ctx@entry=0x556a3ddb1260, cond=cond@entry=0x0, 
op_err_p=op_err_p@entry=0x0) at wait-private.c:87
#4  0x7f1d7368d4d9 in _gpgme_wait_one (ctx=ctx@entry=0x556a3ddb1260) at 
wait-private.c:170
#5  0x7f1d73691898 in gpgme_op_verify (ctx=0x556a3ddb1260, sig=, 
signed_text=0x556a3ce9eb10, plaintext=plaintext@entry=0x0) at verify.c:1147
#6  0x7f1d75985b95 in pkcs7_verify (context=, 
digest=, 
istream=, sigstream=0x556a3ce38850, err=0x7ffe2d9c9c70) at 
gmime-pkcs7-context.c:646
#7  0x7f1d7597c832 in g_mime_multipart_signed_verify (mps=, 
ctx=ctx@entry=0x556a3ce5f630, err=err@entry=0x7ffe2d9c9c70) at 
gmime-multipart-signed.c:466
#8  0x556a3c5f1c80 in node_verify (cryptoctx=0x556a3ce5f630, 
part=0x556a3ce3cf00, 
node=0x556a3ce7e100) at mime-node.c:159
#9  _mime_node_create (part=0x556a3ce3cf00, parent=0x556a3ddc92d0) at 
mime-node.c:267
#10 mime_node_child (parent=parent@entry=0x556a3ddc92d0, child=child@entry=0) 
at mime-node.c:296
#11 0x556a3c5eed6d in format_part_sprinter (ctx=0x556a3ddb7cc0, 
sp=0x556a3ce5e540, 
node=0x556a3ddc92d0, first=1, output_body=1, include_html=0) at 
notmuch-show.c:563
#12 0x556a3c5ef486 in format_part_sprinter_entry (ctx=, 
sp=, 
node=, indent=, params=) at 
notmuch-show.c:670
#13 0x556a3c5edfc0 in show_message (ctx=ctx@entry=0x556a3ce38200, 
sp=sp@entry=0x556a3ce5e540, 
message=message@entry=0x556a3ddb87a0, indent=indent@entry=3, 
params=params@entry=0x7ffe2d9c9fa0, 
format=) at notmuch-show.c:827
#14 0x556a3c5ee0b9 in show_messages (ctx=ctx@entry=0x556a3ce38200, 
format=format@entry=0x556a3c7fd280 , 
sp=sp@entry=0x556a3ce5e540, 
messages=0x556a3ddb5aa0, indent=indent@entry=3, 
params=params@entry=0x7ffe2d9c9fa0)
at notmuch-show.c:863
#15 0x556a3c5ee0fc in show_messages (ctx=ctx@entry=0x556a3ce38200, 
format=format@entry=0x556a3c7fd280 , 
sp=sp@entry=0x556a3ce5e540, 
messages=0x556a3ce81700, indent=indent@entry=2, 
params=params@entry=0x7ffe2d9c9fa0)
at notmuch-show.c:871
#16 0x556a3c5ee0fc in show_messages (ctx=ctx@entry=0x556a3ce38200, 
format=format@entry=0x556a3c7fd280 , 
sp=sp@entry=0x556a3ce5e540, 
messages=0x556a3ce846c0, indent=indent@entry=1, 
params=params@entry=0x7ffe2d9c9fa0)
at notmuch-show.c:871
#17 0x556a3c5ee0fc in show_messages (ctx=ctx@entry=0x556a3ce38200, 
format=format@entry=0x556a3c7fd280 , 
sp=sp@entry=0x556a3ce5e540, 
messages=0x556a3ce6dec0, indent=indent@entry=0, 
params=params@entry=0x7ffe2d9c9fa0)
at notmuch-show.c:871
#18 0x556a3c5efd9f in do_show (params=0x7ffe2d9c9fa0, sp=0x556a3ce5e540, 
format=0x556a3c7fd280 , query=0x556a3ce3dfd0, 
ctx=0x556a3ce38200) at notmuch-show.c:959
#19 notmuch_show_command (config=0x556a3ce38200, argc=, 
argv=)
at notmuch-show.c:1171
#20 0x556a3c5e1b8f in main (argc=, argv=0x7ffe2d9ca458) at 
notmuch.c:421
(gdb) 


(Versions of programs are the latest I could install, but it also occurs with
about the same frequency when everything is from testing.)

Thanks!

-- System Information:
Debian Release: stretch/sid
  APT prefers testing-debug
  APT policy: (600, 'testing-debug'), (600, 'testing'), (400, 
'unstable-debug'), (400, 'unstable'), (200, 'experimental'), (1, 
'experimental-debug')
Architecture: amd64 (x86_64)
Foreign Architectures: i386

Kernel: Linux 4.7.0-1-amd64 (SMP w/8 CPU cores)
Locale: LANG=en_US.UTF-8, LC_CTYPE=en_US.UTF-8 (charmap=UTF-8)
Shell: /bin/sh linked to /bin/dash
Init: sysvinit (via /sbin/init)

Versions of packages notmuch depends on:
ii  libc6   2.24-5
ii  libglib2.0-02.50.1-1
ii  libgmime-2.6-0  2.6.20-8
ii  libnotmuch4 0.23.1-1
ii  libtalloc2  2.1.8-1
ii  zlib1g  1:1.2.8.dfsg-2+b1

Versions of packages notmuch recommends:
ii  gnupg-agent2.1.15-4
ii  gpgsm  2.1.15-4
ii  notmuch-emacs  0.23.1-1

notmuch suggests no packages.

-- no debconf information