Re: [RFH] gpg --import entropy while running tests

2016-12-28 Thread Theodore Ts'o
On Wed, Dec 28, 2016 at 03:39:30AM -0500, Jeff King wrote:
> >   
> > https://git.gnupg.org/cgi-bin/gitweb.cgi?p=gnupg.git;a=commit;h=4473db1ef24031ff4e26c9a9de95dbe898ed2b97
> > 
> > So this does seem like a gpg bug.
> 
> I've submitted a bug report to gpg:
> 
>   https://bugs.gnupg.org/gnupg/issue2897
> 
> so we'll see what they say.

Yeah, they are definitely doing something very hard to explain.

Pid 8348 is the gpg-agent process which the main gpg program (pid
8344) connected to.  It starts trying to get randomness in response to
a KEYWRAP command:

8348  10:58:57.882909 access("/dev/random", R_OK) = 0
8348  10:58:57.883205 access("/dev/urandom", R_OK) = 0
8348  10:58:57.883472 open("/dev/urandom", O_RDONLY) = 9
8348  10:58:57.883729 fcntl(9, F_GETFD) = 0
8348  10:58:57.883914 fcntl(9, F_SETFD, FD_CLOEXEC) = 0

It opens /dev/urandom, but then never uses fd 9 ever again.  Instead,
it uses getrandom, but in a pretty silly fashion, with lots of sleeps
in between, and not between each progress report, either:

8348  10:58:57.884129 write(8, "S PROGRESS need_entropy X 30 120", 32 

8344  10:58:57.884338 <... read resumed> "S PROGRESS need_entropy X 30 120", 
1002) = 32
8348  10:58:57.884424 <... write resumed> ) = 32
8344  10:58:57.884488 read(5,  
8348  10:58:57.884550 write(8, "\n", 1 
8344  10:58:57.884715 <... read resumed> "\n", 970) = 1
8348  10:58:57.884800 <... write resumed> ) = 1
8344  10:58:57.884883 read(5,  
8348  10:58:57.884951 nanosleep({0, 1}, NULL) = 0
8348  10:58:57.985363 select(10, [9], NULL, NULL, {0, 10}) = 1 (in [9], 
left {0, 4})
8348  10:58:57.985593 
getrandom("&\275\354^\256\320\3w\21:R]`eJ\t\t\350\245\202>\255\237\324\324\340\24^c\323\210\376"...,
 90, 0) = 90
8348  10:58:57.985751 write(8, "S PROGRESS need_entropy X 120 12"..., 33) = 33
8344  10:58:57.985885 <... read resumed> "S PROGRESS need_entropy X 120 12"..., 
1002) = 33
8348  10:58:57.985934 write(8, "\n", 1 
8344  10:58:57.985982 read(5,  
8348  10:58:57.986015 <... write resumed> ) = 1
8344  10:58:57.986048 <... read resumed> "\n", 969) = 1
8348  10:58:57.986090 nanosleep({0, 1},  
8344  10:58:57.986142 read(5,  
8348  10:58:58.086253 <... nanosleep resumed> NULL) = 0
8348  10:58:58.086370 write(8, "S PROGRESS need_entropy X 30 120", 32) = 32
8344  10:58:58.086502 <... read resumed> "S PROGRESS need_entropy X 30 120", 
1002) = 32
8348  10:58:58.086541 write(8, "\n", 1 
8344  10:58:58.086579 read(5,  
8348  10:58:58.086604 <... write resumed> ) = 1
8344  10:58:58.086630 <... read resumed> "\n", 970) = 1
8348  10:58:58.086661 nanosleep({0, 1},  
8344  10:58:58.086703 read(5,  
8348  10:58:58.186815 <... nanosleep resumed> NULL) = 0
8348  10:58:58.186894 select(10, [9], NULL, NULL, {0, 10}) = 1 (in [9], 
left {0, 5})
8348  10:58:58.187038 
getrandom("\365\221\374m\360\235\27\330\264\223\365\363<6\302\324F\5\354Q|,\366\253\337u\226\265\345\250CA"...,
 90, 0) = 90

The worst part of this is that the commit description claims this is a
workaround for libgcrypt using /dev/random, but it's not using
/dev/random --- it's using getrandom, and it pointlessly opened
/dev/urandom first (having never opened /dev/random).

This looks like a classic case of Lotus Notes / Websphere disease ---
to many d*mned layers of abstraction

- Ted


Re: [RFH] gpg --import entropy while running tests

2016-12-28 Thread Jeff King
On Wed, Dec 28, 2016 at 03:02:30AM -0500, Jeff King wrote:

> > Is it a bug in gpg (oddly, the kernel reports lots of entropy available,
> > and generating the signatures themselves is quite fast)? Or is the new
> > version doing something special in the import process that we need to
> > work around or disable?
> 
> Answering my own question (somewhat): this is bisectable in the gnupg
> repository, and it turns out to be caused by their 4473db1ef (agent:
> Kludge to mitigate blocking calls in Libgcrypt., 2016-11-11), which
> introduces a 100ms sleep (yuck) that is presumably triggering way more
> than it needs to. More details at:
> 
>   
> https://git.gnupg.org/cgi-bin/gitweb.cgi?p=gnupg.git;a=commit;h=4473db1ef24031ff4e26c9a9de95dbe898ed2b97
> 
> So this does seem like a gpg bug.

I've submitted a bug report to gpg:

  https://bugs.gnupg.org/gnupg/issue2897

so we'll see what they say.

-Peff


Re: [RFH] gpg --import entropy while running tests

2016-12-28 Thread Jeff King
On Wed, Dec 28, 2016 at 02:23:03AM -0500, Jeff King wrote:

> That's a lot of time not using any CPU. What's going on? Running with
> "sh -x" shows that we spend most of the time in this line from
> lib-gpg.sh:
> 
>   gpg --homedir "${GNUPGHOME}" 2>/dev/null --import \
> "$TEST_DIRECTORY"/lib-gpg/keyring.gpg
> 
> And running gpg with "--debug-level guru" shows that we are blocking
> while waiting for entropy. Has anybody else seen this? I feel like I
> noticed it starting a few weeks ago, and indeed dropping back to gpg
> 2.0.26 (from 2.1.17) makes the problem go away.
> 
> Is it a bug in gpg (oddly, the kernel reports lots of entropy available,
> and generating the signatures themselves is quite fast)? Or is the new
> version doing something special in the import process that we need to
> work around or disable?

Answering my own question (somewhat): this is bisectable in the gnupg
repository, and it turns out to be caused by their 4473db1ef (agent:
Kludge to mitigate blocking calls in Libgcrypt., 2016-11-11), which
introduces a 100ms sleep (yuck) that is presumably triggering way more
than it needs to. More details at:

  
https://git.gnupg.org/cgi-bin/gitweb.cgi?p=gnupg.git;a=commit;h=4473db1ef24031ff4e26c9a9de95dbe898ed2b97

So this does seem like a gpg bug.

-Peff


[RFH] gpg --import entropy while running tests

2016-12-27 Thread Jeff King
I noticed a few of our test scripts taking a long time to run, even
though they used to be quick. Here's one:

  $ time ./t7612-merge-verify-signatures.sh 
  ok 1 - create signed commits
  ok 2 - merge unsigned commit with verification
  ok 3 - merge commit with bad signature with verification
  ok 4 - merge commit with untrusted signature with verification
  ok 5 - merge signed commit with verification
  ok 6 - merge commit with bad signature without verification
  # passed all 6 test(s)
  1..6

  real0m12.285s
  user0m0.048s
  sys 0m0.044s


That's a lot of time not using any CPU. What's going on? Running with
"sh -x" shows that we spend most of the time in this line from
lib-gpg.sh:

  gpg --homedir "${GNUPGHOME}" 2>/dev/null --import \
  "$TEST_DIRECTORY"/lib-gpg/keyring.gpg

And running gpg with "--debug-level guru" shows that we are blocking
while waiting for entropy. Has anybody else seen this? I feel like I
noticed it starting a few weeks ago, and indeed dropping back to gpg
2.0.26 (from 2.1.17) makes the problem go away.

Is it a bug in gpg (oddly, the kernel reports lots of entropy available,
and generating the signatures themselves is quite fast)? Or is the new
version doing something special in the import process that we need to
work around or disable?

The reason we use --import at all is to handle differences in the .gnupg
format between versions 1 and 2. So the nuclear option would be to just
carry pre-made .gnupg directories for _both_ versions in our test suite,
and pick the appropriate one based on the output of "gpg --version".
That feels like a hack, but it gives us a lot of control.

I'd love it if we could figure out a way of making --import reliably
fast, though.

-Peff