Bug#923423: dpkg: Hangs for 19 mins installing texlive-fonts-extra

2022-10-15 Thread Vincent Lefevre
Control: reopen -1

On 2022-10-14 02:26:20 +0200, Guillem Jover wrote:
> All these fsync()s you see in rapid succession are used as a
> synchronization points, way after the data has been requested to be
> synced to disk asynchronously via sync_file_range().

I don't know why, but what strace shows is fsync(), not
sync_file_range(). See strace output at

  https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=923423#44

So this is not "asynchronously".

> What this is trying to achieve is durability, so that dpkg can know
> the data is on the disk, so that it can mark the package as installed.

I agree that there should be a sync at the end (at least). But here,
there sems to be one for *every* of the 92000 files!

> This is explained on the dpkg FAQ:
> 
>   
> 
> [...]
> Most programs do not seem concerned about making sure the data is
> stored safely on disk, I'm afraid.
> 
> In any case, I don't think there's anything else for dpkg to do here.
> Please see the FAQ entry. I'm thus closing this now.

According to the discussion at bug 1021750, and in particular

  https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1021750#45

this FAQ is wrong (concerning both the performance and data safety).
If you disagree, please comment there.

Since this is based on incorrect assumptions (wrong FAQ, and sync's
are not asynchronous) and synchronizations could be less frequent
without making data store less safe[*], I'm reopening the bug.

[*] On the opposite, I would tend to think that such frequent
synchronizations tend to yield more writes on the disk, thus more
stress on the hardware, which could make data store less safe.

-- 
Vincent Lefèvre  - Web: 
100% accessible validated (X)HTML - Blog: 
Work: CR INRIA - computer arithmetic / AriC project (LIP, ENS-Lyon)



Bug#923423: dpkg: Hangs for 19 mins installing texlive-fonts-extra

2019-04-09 Thread Vincent Lefevre
Control: retitle -1 dpkg: package install can be very slow with some disks due 
to too frequent fsync

Hi,

On 2019-04-09 00:33:27 +0200, Guillem Jover wrote:
> On Tue, 2019-03-05 at 15:43:05 +0100, Vincent Lefevre wrote:
> > zira:~> iozone -a -e
> > [...]
> >   random
> > random bkwdrecordstride
> >   kB  reclenwrite  rewritereadrereadread 
> > write read   rewrite  read   fwrite frewritefread  freread
> >   64   4 1821 1987 1922 3703 3657 
> > 1770 1621  1628  1697 1707 1773  1879725  2892445
> >   64   8 1606 1684 1713 3493 3476 
> > 1700 1765  1730  1812 2135 1852  2278628  3406295
> >   64  16 1807 1823 1986 3898 3798 
> > 1977 1813  1988  1871 2049 3320  2561267  4018152
> >   64  32 1702 1714 1755 3494 3683 
> > 1741 1841  1796  1737 1633 1674  1879725  4018152
> >   64  64 1808 1790 1632 3314 3387 
> > 2254 1995  1843  1984 1789 1820  2801873  3203069
> > [...]
> 
> Ok, can we conclude this is not a problem in dpkg then? :)

I'm not sure. This may just be a slow disk after all, and in practice
(I mean except benchmarking), the problem seems specific to dpkg.
I assume that programs normally don't do a fsync at every fraction
of microsecond! I don't know what dpkg is trying to achieve with such
a frequency of fsync, but this doesn't seem to make sense to me. You
had said "which has been the solution to earlier problems with ext4",
but the real solution would be to fix ext4 (does "earlier" mean this
has eventually been fixed?).

> It seems to me either hardware, filesystem or kernel related as
> mentioned before.

I've seen https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1785020
"fsync is slow on later kernels with ext4 filesystms" but the comments
in this bug page and 4.19.28-1 kernel logs show that this has been
fixed.

FYI, I get very similar timings on the same machine, but on /boot,
which is:

/dev/sda1 on /boot type ext2 
(rw,relatime,block_validity,barrier,user_xattr,acl,stripe=4)

The fact that the main, ext4 partition is encrypted but not this
one makes another difference, but the timings are similar, so that
this doesn't seem to be related to the FS system itself and appears
to be at a lower level.

Note also that my machine is a laptop, and I couldn't do a comparison
with other laptops, just in case.

> Could you reassign where you think it would be more appropriate?

I think that this should still be regarded as a dpkg bug, based on
my first paragraph above (the fsync's occur much too often in dpkg
compared to other programs, and this doesn't seem to be useful).

-- 
Vincent Lefèvre  - Web: 
100% accessible validated (X)HTML - Blog: 
Work: CR INRIA - computer arithmetic / AriC project (LIP, ENS-Lyon)



Bug#923423: dpkg: Hangs for 19 mins installing texlive-fonts-extra

2019-04-08 Thread Guillem Jover
Hi!

On Tue, 2019-03-05 at 15:43:05 +0100, Vincent Lefevre wrote:
> I've done a test with "iozone -a -e", and with this -e option
> (to include fsync), the operations are much slower than on the
> other machine.
> 
> zira:~> iozone -a -e
> [...]
>   random
> random bkwdrecordstride
>   kB  reclenwrite  rewritereadrereadread 
> write read   rewrite  read   fwrite frewritefread  freread
>   64   4 1821 1987 1922 3703 3657 
> 1770 1621  1628  1697 1707 1773  1879725  2892445
>   64   8 1606 1684 1713 3493 3476 
> 1700 1765  1730  1812 2135 1852  2278628  3406295
>   64  16 1807 1823 1986 3898 3798 
> 1977 1813  1988  1871 2049 3320  2561267  4018152
>   64  32 1702 1714 1755 3494 3683 
> 1741 1841  1796  1737 1633 1674  1879725  4018152
>   64  64 1808 1790 1632 3314 3387 
> 2254 1995  1843  1984 1789 1820  2801873  3203069
> [...]

Ok, can we conclude this is not a problem in dpkg then? :) It seems to
me either hardware, filesystem or kernel related as mentioned before.
Could you reassign where you think it would be more appropriate?

Thanks,
Guillem



Bug#923423: dpkg: Hangs for 19 mins installing texlive-fonts-extra

2019-03-05 Thread Vincent Lefevre
I've done a test with "iozone -a -e", and with this -e option
(to include fsync), the operations are much slower than on the
other machine.

zira:~> iozone -a -e
[...]
  randomrandom  
   bkwdrecordstride
  kB  reclenwrite  rewritereadrereadread write  
   read   rewrite  read   fwrite frewritefread  freread
  64   4 1821 1987 1922 3703 3657 1770  
   1621  1628  1697 1707 1773  1879725  2892445
  64   8 1606 1684 1713 3493 3476 1700  
   1765  1730  1812 2135 1852  2278628  3406295
  64  16 1807 1823 1986 3898 3798 1977  
   1813  1988  1871 2049 3320  2561267  4018152
  64  32 1702 1714 1755 3494 3683 1741  
   1841  1796  1737 1633 1674  1879725  4018152
  64  64 1808 1790 1632 3314 3387 2254  
   1995  1843  1984 1789 1820  2801873  3203069
[...]

-- 
Vincent Lefèvre  - Web: 
100% accessible validated (X)HTML - Blog: 
Work: CR INRIA - computer arithmetic / AriC project (LIP, ENS-Lyon)



Bug#923423: dpkg: Hangs for 19 mins installing texlive-fonts-extra

2019-03-05 Thread Vincent Lefevre
On 2019-03-05 15:43:05 +0100, Vincent Lefevre wrote:
> I've done a test with "iozone -a -e", and with this -e option
> (to include fsync), the operations are much slower than on the
> other machine.
> 
> zira:~> iozone -a -e
> [...]
>   random
> random bkwdrecordstride
>   kB  reclenwrite  rewritereadrereadread 
> write read   rewrite  read   fwrite frewritefread  freread
>   64   4 1821 1987 1922 3703 3657 
> 1770 1621  1628  1697 1707 1773  1879725  2892445
>   64   8 1606 1684 1713 3493 3476 
> 1700 1765  1730  1812 2135 1852  2278628  3406295
>   64  16 1807 1823 1986 3898 3798 
> 1977 1813  1988  1871 2049 3320  2561267  4018152
>   64  32 1702 1714 1755 3494 3683 
> 1741 1841  1796  1737 1633 1674  1879725  4018152
>   64  64 1808 1790 1632 3314 3387 
> 2254 1995  1843  1984 1789 1820  2801873  3203069
> [...]

I meant, except fread and freread.

-- 
Vincent Lefèvre  - Web: 
100% accessible validated (X)HTML - Blog: 
Work: CR INRIA - computer arithmetic / AriC project (LIP, ENS-Lyon)



Bug#923423: dpkg: Hangs for 19 mins installing texlive-fonts-extra

2019-03-05 Thread Vincent Lefevre
On 2019-03-05 13:52:27 +0100, Guillem Jover wrote:
> Hi!
> 
> On Sun, 2019-03-03 at 02:38:10 +0100, Vincent Lefevre wrote:
> > On 2019-03-03 00:15:13 +0100, Guillem Jover wrote:
> > > Did this start happening due to the new kernel version?
> > 
> > In March 2018, this was already slow, but much less:
> > 
> > [...]
> > 2018-03-08 00:55:55 status unpacked texlive-fonts-extra:all 2017.20180225-1
> > 2018-03-08 00:55:55 status half-installed texlive-fonts-extra:all 
> > 2017.20180225-1
> > 2018-03-08 01:02:55 status half-installed texlive-fonts-extra:all 
> > 2017.20180225-1
> > 2018-03-08 01:02:56 status unpacked texlive-fonts-extra:all 2017.20180305-2
> > 2018-03-08 01:02:56 status unpacked texlive-fonts-extra:all 2017.20180305-2
> > [...]
> > 
> > I don't have older information.
> > 
> > > What filesystem are you using (say btrfs or similar?),
> > 
> > mount information:
> > 
> > /dev/mapper/zira--vg-root on / type ext4 (rw,relatime,errors=remount-ro)
> > 
> > > etc, can you reproduce on another system? I certainly can't. :)
> > 
> > No, on two of my other machines, it takes around 1 minute
> > (both for old and new versions). I assume that this is what
> > is expected.
> 
> Yes. This does really look like a kernel or hardware issue TBH.
> Could you try downgrading the kernel to earlier versions to see
> whether that improves, that'd give a definitive answer.

Same issue (also around 20 minutes) with the kernel from stable:

Linux zira 4.9.0-8-amd64 #1 SMP Debian 4.9.144-3.1 (2019-02-19) x86_64 GNU/Linux

If it's a hardware issue, I'm wondering why I don't have messages
in journalctl output.

-- 
Vincent Lefèvre  - Web: 
100% accessible validated (X)HTML - Blog: 
Work: CR INRIA - computer arithmetic / AriC project (LIP, ENS-Lyon)



Bug#923423: dpkg: Hangs for 19 mins installing texlive-fonts-extra

2019-03-05 Thread Guillem Jover
Hi!

On Sun, 2019-03-03 at 02:38:10 +0100, Vincent Lefevre wrote:
> On 2019-03-03 00:15:13 +0100, Guillem Jover wrote:
> > Did this start happening due to the new kernel version?
> 
> In March 2018, this was already slow, but much less:
> 
> [...]
> 2018-03-08 00:55:55 status unpacked texlive-fonts-extra:all 2017.20180225-1
> 2018-03-08 00:55:55 status half-installed texlive-fonts-extra:all 
> 2017.20180225-1
> 2018-03-08 01:02:55 status half-installed texlive-fonts-extra:all 
> 2017.20180225-1
> 2018-03-08 01:02:56 status unpacked texlive-fonts-extra:all 2017.20180305-2
> 2018-03-08 01:02:56 status unpacked texlive-fonts-extra:all 2017.20180305-2
> [...]
> 
> I don't have older information.
> 
> > What filesystem are you using (say btrfs or similar?),
> 
> mount information:
> 
> /dev/mapper/zira--vg-root on / type ext4 (rw,relatime,errors=remount-ro)
> 
> > etc, can you reproduce on another system? I certainly can't. :)
> 
> No, on two of my other machines, it takes around 1 minute
> (both for old and new versions). I assume that this is what
> is expected.

Yes. This does really look like a kernel or hardware issue TBH.
Could you try downgrading the kernel to earlier versions to see
whether that improves, that'd give a definitive answer.

I'd say try also older dpkg versions, but I seriously doubt that'd
change much, as I don't remember any major changes in this area
recently.

> > I guess attaching strace to the running dpkg might also give some more
> > info on what's going on, or where it's stalling. Also if you can
> > simply reproduce via «dpkg -i», then enabling all dpkg debug flags
> > might also help.
> 
> With
> 
>   dpkg -i /var/cache/apt/archives/texlive-fonts-extra_2018.20190227-1_all.deb
> 
> GKrellM shows full disk activity for a bit less than 2 minutes, then
> no activity, but dpkg is still running. Attaching strace against it
> during this time shows lots of rename's:
> 
> [...]
> rename("/usr/share/texlive/texmf-dist/fonts/tfm/arkandis/berenisadf/ybdr2cijw8y.tfm.dpkg-new",
>  
> "/usr/share/texlive/texmf-dist/fonts/tfm/arkandis/berenisadf/ybdr2cijw8y.tfm")
>  = 0
> openat(AT_FDCWD, 
> "/usr/share/texlive/texmf-dist/fonts/tfm/arkandis/berenisadf/ybdr2ciw8y.tfm.dpkg-new",
>  O_WRONLY) = 11
> fsync(11)   = 0
> close(11)   = 0
> rename("/usr/share/texlive/texmf-dist/fonts/tfm/arkandis/berenisadf/ybdr2ciw8y.tfm.dpkg-new",
>  
> "/usr/share/texlive/texmf-dist/fonts/tfm/arkandis/berenisadf/ybdr2ciw8y.tfm") 
> = 0
> openat(AT_FDCWD, 
> "/usr/share/texlive/texmf-dist/fonts/tfm/arkandis/berenisadf/ybdr2cj8c.tfm.dpkg-new",
>  O_WRONLY) = 11
> fsync(11)   = 0
> close(11)   = 0
> rename("/usr/share/texlive/texmf-dist/fonts/tfm/arkandis/berenisadf/ybdr2cj8c.tfm.dpkg-new",
>  "/usr/share/texlive/texmf-dist/fonts/tfm/arkandis/berenisadf/ybdr2cj8c.tfm") 
> = 0
> openat(AT_FDCWD, 
> "/usr/share/texlive/texmf-dist/fonts/tfm/arkandis/berenisadf/ybdr2cj8t.tfm.dpkg-new",
>  O_WRONLY) = 11
> fsync(11)   = 0
> close(11)   = 0
> [...]

Thanks. Yes this would be the code doing the delayed fsync(3 ) with
atomic renames(3), which has been the solution to earlier problems
with ext4.

Thanks,
Guillem



Bug#923423: dpkg: Hangs for 19 mins installing texlive-fonts-extra

2019-03-02 Thread Vincent Lefevre
On 2019-03-03 00:15:13 +0100, Guillem Jover wrote:
> Did this start happening due to the new kernel version?

In March 2018, this was already slow, but much less:

[...]
2018-03-08 00:55:55 status unpacked texlive-fonts-extra:all 2017.20180225-1
2018-03-08 00:55:55 status half-installed texlive-fonts-extra:all 
2017.20180225-1
2018-03-08 01:02:55 status half-installed texlive-fonts-extra:all 
2017.20180225-1
2018-03-08 01:02:56 status unpacked texlive-fonts-extra:all 2017.20180305-2
2018-03-08 01:02:56 status unpacked texlive-fonts-extra:all 2017.20180305-2
[...]

I don't have older information.

> What filesystem are you using (say btrfs or similar?),

mount information:

/dev/mapper/zira--vg-root on / type ext4 (rw,relatime,errors=remount-ro)

> etc, can you reproduce on another system? I certainly can't. :)

No, on two of my other machines, it takes around 1 minute
(both for old and new versions). I assume that this is what
is expected.

> I guess attaching strace to the running dpkg might also give some more
> info on what's going on, or where it's stalling. Also if you can
> simply reproduce via «dpkg -i», then enabling all dpkg debug flags
> might also help.

With

  dpkg -i /var/cache/apt/archives/texlive-fonts-extra_2018.20190227-1_all.deb

GKrellM shows full disk activity for a bit less than 2 minutes, then
no activity, but dpkg is still running. Attaching strace against it
during this time shows lots of rename's:

[...]
rename("/usr/share/texlive/texmf-dist/fonts/tfm/arkandis/berenisadf/ybdr2cijw8y.tfm.dpkg-new",
 "/usr/share/texlive/texmf-dist/fonts/tfm/arkandis/berenisadf/ybdr2cijw8y.tfm") 
= 0
openat(AT_FDCWD, 
"/usr/share/texlive/texmf-dist/fonts/tfm/arkandis/berenisadf/ybdr2ciw8y.tfm.dpkg-new",
 O_WRONLY) = 11
fsync(11)   = 0
close(11)   = 0
rename("/usr/share/texlive/texmf-dist/fonts/tfm/arkandis/berenisadf/ybdr2ciw8y.tfm.dpkg-new",
 "/usr/share/texlive/texmf-dist/fonts/tfm/arkandis/berenisadf/ybdr2ciw8y.tfm") 
= 0
openat(AT_FDCWD, 
"/usr/share/texlive/texmf-dist/fonts/tfm/arkandis/berenisadf/ybdr2cj8c.tfm.dpkg-new",
 O_WRONLY) = 11
fsync(11)   = 0
close(11)   = 0
rename("/usr/share/texlive/texmf-dist/fonts/tfm/arkandis/berenisadf/ybdr2cj8c.tfm.dpkg-new",
 "/usr/share/texlive/texmf-dist/fonts/tfm/arkandis/berenisadf/ybdr2cj8c.tfm") = 0
openat(AT_FDCWD, 
"/usr/share/texlive/texmf-dist/fonts/tfm/arkandis/berenisadf/ybdr2cj8t.tfm.dpkg-new",
 O_WRONLY) = 11
fsync(11)   = 0
close(11)   = 0
[...]

> > -- System Information:
> > Debian Release: buster/sid
> >   APT prefers unstable-debug
> >   APT policy: (500, 'unstable-debug'), (500, 'stable-updates'), (500, 
> > 'unstable'), (500, 'testing'), (500, 'stable'), (1, 'experimental')
> > Architecture: amd64 (x86_64)
> > 
> > Kernel: Linux 4.19.0-3-amd64 (SMP w/8 CPU cores)
> > Kernel taint flags: TAINT_PROPRIETARY_MODULE, TAINT_OOT_MODULE, 
> > TAINT_UNSIGNED_MODULE
> > Locale: LANG=POSIX, LC_CTYPE=en_US.UTF-8 (charmap=UTF-8), LANGUAGE=POSIX 
> > (charmap=UTF-8)
> > Shell: /bin/sh linked to /bin/dash
> > Init: systemd (via /run/systemd/system)
> > LSM: AppArmor: enabled
> 
> I see the kernel is tainted. Perhaps try to unload those and see? :)

That's the NVIDIA drivers (because nouveau is too broken).

-- 
Vincent Lefèvre  - Web: 
100% accessible validated (X)HTML - Blog: 
Work: CR INRIA - computer arithmetic / AriC project (LIP, ENS-Lyon)



Bug#923423: dpkg: Hangs for 19 mins installing texlive-fonts-extra

2019-03-02 Thread Guillem Jover
Hi!

On Thu, 2019-02-28 at 01:06:28 +0100, Vincent Lefevre wrote:
> Package: texlive-fonts-extra
> Version: 2018.20190227-1
> Severity: serious

> Installation of texlive-fonts-extra 2018.20190227-1 hangs.

> 17m01s ago dpkg was started, at 2019-02-28T00:48:45+01:00.
> 1.5% has been its average CPU usage since then, or 15s/17m01s
> 
> Other processes started close to dpkg(5553):
>   [kworker/0:1-events](5351) was started 2m57s before dpkg(5553)
>   [kworker/u17:0-kcryptd](5540) was started 1.0s before dpkg(5553)
>   [kworker/4:2](5771) was started 2m19s after dpkg(5553)
>   [kworker/u16:1-events_unbound](5928) was started 4m21s after dpkg(5553)
>   [kworker/5:2-mm_percpu_wq](6068) was started 5m06s after dpkg(5553)

Thanks for all the info. This does look like a hardware or kernel
problem to me, though.

Did this start happening due to the new kernel version? What
filesystem are you using (say btrfs or similar?), etc, can you
reproduce on another system? I certainly can't. :)

I guess attaching strace to the running dpkg might also give some more
info on what's going on, or where it's stalling. Also if you can
simply reproduce via «dpkg -i», then enabling all dpkg debug flags
might also help.

> -- System Information:
> Debian Release: buster/sid
>   APT prefers unstable-debug
>   APT policy: (500, 'unstable-debug'), (500, 'stable-updates'), (500, 
> 'unstable'), (500, 'testing'), (500, 'stable'), (1, 'experimental')
> Architecture: amd64 (x86_64)
> 
> Kernel: Linux 4.19.0-3-amd64 (SMP w/8 CPU cores)
> Kernel taint flags: TAINT_PROPRIETARY_MODULE, TAINT_OOT_MODULE, 
> TAINT_UNSIGNED_MODULE
> Locale: LANG=POSIX, LC_CTYPE=en_US.UTF-8 (charmap=UTF-8), LANGUAGE=POSIX 
> (charmap=UTF-8)
> Shell: /bin/sh linked to /bin/dash
> Init: systemd (via /run/systemd/system)
> LSM: AppArmor: enabled

I see the kernel is tainted. Perhaps try to unload those and see? :)

> Versions of packages tex-common depends on:
> ii  dpkg  1.19.5

Thanks,
Guillem