[Bug 1948995] Re: Allow reverting to older revisions of a snap

2022-01-03 Thread Benjamin Allot
I think this bug should be followed up with a change to retain=3 on
Ubuntu servers.
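
For reference, a minimal sketch of what that change could look like,
assuming "retain" here means snapd's `refresh.retain` system option (the
helper name below is made up for illustration):

```shell
# Sketch only: keep N revisions of each snap so that an older one is still
# available to revert to. Needs root and a running snapd on a real system.
set_snap_retain() {
    # $1: number of revisions to keep (snapd documents a range of 2 to 20)
    count="$1"
    if [ "$count" -lt 2 ] || [ "$count" -gt 20 ]; then
        echo "refresh.retain must be between 2 and 20" >&2
        return 1
    fi
    snap set system refresh.retain="$count"
}

# Usage (placeholders, not taken from this bug):
#   set_snap_retain 3
#   snap list --all <snap-name>              # shows the retained revisions
#   snap revert <snap-name> --revision=<rev>
```

Only revisions still present on disk can be reverted to, which is why the
retain count matters for this bug.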

** Changed in: snapd (Ubuntu)
   Status: Expired => New

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1948995

Title:
  Allow reverting to older revisions of a snap

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/snapd/+bug/1948995/+subscriptions


-- 
ubuntu-bugs mailing list
ubuntu-bugs@lists.ubuntu.com
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs

[Bug 1900438] Re: Bcache bypass writeback on caching device with fragmentation

2021-10-13 Thread Benjamin Allot
** Summary changed:

- Bcache bypasse writeback on caching device with fragmentation
+ Bcache bypass writeback on caching device with fragmentation

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1900438

Title:
  Bcache bypass writeback on caching device with fragmentation

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1900438/+subscriptions



[Bug 1902960] Re: Upgrade from 245.4-4ubuntu3.3 to 245.4-4ubuntu3.2 appears to break DNS resolution in some cases

2021-01-13 Thread Benjamin Allot
I confirm I got it working at first boot on Azure with
systemd-245.4-4ubuntu3.4:

```
ubuntu@machine-3:~$ sudo networkctl
IDX LINK TYPE     OPERATIONAL SETUP
  1 lo   loopback carrier     unmanaged
  2 eth0 ether    routable    configured

2 links listed.
ubuntu@machine-3:~$ sudo apt update
Hit:1 http://ppa.launchpad.net/telegraf-devs/ppa/ubuntu focal InRelease
Hit:2 http://us.archive.ubuntu.com/ubuntu focal InRelease
Get:3 http://us.archive.ubuntu.com/ubuntu focal-updates InRelease [114 kB]
Get:4 http://us.archive.ubuntu.com/ubuntu focal-backports InRelease [101 kB]
Get:5 http://us.archive.ubuntu.com/ubuntu focal-security InRelease [109 kB]
Get:6 http://us.archive.ubuntu.com/ubuntu focal-proposed InRelease [267 kB]
Fetched 591 kB in 3s (225 kB/s)  
Reading package lists... Done
Building dependency tree   
Reading state information... Done
All packages are up to date.
ubuntu@machine-3:~$ dpkg -l systemd
Desired=Unknown/Install/Remove/Purge/Hold
| Status=Not/Inst/Conf-files/Unpacked/halF-conf/Half-inst/trig-aWait/Trig-pend
|/ Err?=(none)/Reinst-required (Status,Err: uppercase=bad)
||/ Name           Version          Architecture Description
+++-==============-================-============-==========================
ii  systemd        245.4-4ubuntu3.4 amd64        system and service manager

```

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1902960

Title:
  Upgrade from 245.4-4ubuntu3.3 to 245.4-4ubuntu3.2 appears to break DNS
  resolution in some cases

To manage notifications about this bug go to:
https://bugs.launchpad.net/cloud-images/+bug/1902960/+subscriptions


[Bug 1904549] Re: MTU is not set on vlan interface

2020-11-30 Thread Benjamin Allot
Sadly, the journalctl logs don't go back that far now.

It failed on a baremetal server, not a cloud.

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1904549

Title:
  MTU is not set on vlan interface

To manage notifications about this bug go to:
https://bugs.launchpad.net/netplan/+bug/1904549/+subscriptions


[Bug 1898786] Re: bcache: Issues with large IO wait in bch_mca_scan() when shrinker is enabled

2020-11-30 Thread Benjamin Allot
Ack !

I'll check with the Launchpad team then; I think they would probably prefer
to wait for the -updates indeed.

Thanks again for your work and Dan's.

Cheers,

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1898786

Title:
  bcache: Issues with large IO wait in bch_mca_scan() when shrinker is
  enabled

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1898786/+subscriptions


[Bug 1898786] Re: bcache: Issues with large IO wait in bch_mca_scan() when shrinker is enabled

2020-11-27 Thread Benjamin Allot
Hello Matthew, sorry for the lack of response.

I'll check with Launchpad people if we can justify a reboot of the
server soon and will keep you posted !

Regards,

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1898786

Title:
  bcache: Issues with large IO wait in bch_mca_scan() when shrinker is
  enabled

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1898786/+subscriptions


[Bug 1904549] Re: MTU is not set on vlan interface

2020-11-23 Thread Benjamin Allot
Hello,

I don't need it on bond-manlan for sure (MTU 1500 on this one and its VLANs),
but I read in several places that setting the MTU on the bond would also set it
on the member interfaces.
And that indeed worked correctly.

As I said, I think the issue is definitely more in the udevd/networkd
part (it reminds me of the Azure issue seen in
https://bugs.launchpad.net/ubuntu/+source/systemd/+bug/1902960) because
the /run/systemd/network files were correct after the netplan apply.
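
A rough way to check for this kind of mismatch is to compare what the kernel
reports with the value that was requested. This is only a sketch: the helper
name, the interface name and the MTU value are placeholders, and looking for
`MTUBytes` in the generated files is an assumption about how netplan writes
them.

```shell
# Sketch: compare the MTU the kernel reports for an interface with the
# value we expect. SYSFS_NET can be overridden; /sys/class/net is the
# usual location on Linux.
SYSFS_NET="${SYSFS_NET:-/sys/class/net}"

check_mtu() {
    # $1: interface name (for example a VLAN interface), $2: expected MTU
    iface="$1"
    expected="$2"
    actual="$(cat "$SYSFS_NET/$iface/mtu")" || return 2
    if [ "$actual" = "$expected" ]; then
        echo "$iface: MTU $actual (ok)"
    else
        echo "$iface: MTU $actual, expected $expected"
        return 1
    fi
}

# Usage: what networkd was told versus what the kernel applied.
#   grep -H MTUBytes /run/systemd/network/*
#   check_mtu bond0.100 9000   # placeholder interface and value
```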

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1904549

Title:
  MTU is not set on vlan interface

To manage notifications about this bug go to:
https://bugs.launchpad.net/netplan/+bug/1904549/+subscriptions


[Bug 1902960] Re: Upgrade from 245.4-4ubuntu3.3 to 245.4-4ubuntu3.2 appears to break DNS resolution in some cases

2020-11-10 Thread Benjamin Allot
Thanks for the explanation.

I confirm that the workaround using "systemctl restart systemd-udev-trigger
&& systemctl restart systemd-networkd" does the trick.

@Dan Watkins : did you do some specific thing to reproduce the issue on
your local VM ? It would be interesting to see the whole logs happening
there.

We could possibly hijack the image to add a 
 | udevadm control --log-priority=debug

and see what happens.
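
Put together, the workaround and the debug logging could be wrapped like
this. It is a sketch only: the function names are made up, and both steps
need root on a real system.

```shell
# Sketch of the two steps discussed in this thread.
apply_workaround() {
    # Replay udev events first, then restart networkd, in the order
    # given in the workaround.
    systemctl restart systemd-udev-trigger &&
        systemctl restart systemd-networkd
}

enable_udev_debug() {
    # Raise the running udev daemon's log level.
    udevadm control --log-priority=debug
}

# Usage:
#   enable_udev_debug
#   apply_workaround
#   journalctl -b -u systemd-udevd   # udev messages from the current boot
```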

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1902960

Title:
  Upgrade from 245.4-4ubuntu3.3 to 245.4-4ubuntu3.2 appears to break DNS
  resolution in some cases

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/cloud-init/+bug/1902960/+subscriptions


[Bug 1902960] Re: Upgrade from 245.4-4ubuntu3.3 to 245.4-4ubuntu3.2 appears to break DNS resolution in some cases

2020-11-05 Thread Benjamin Allot
Here is a pastebin of the situation and how I tried to resolve this :
https://pastebin.ubuntu.com/p/c6cfKqvBmN/

Unfortunately, the interface stays "unmanaged".

When I check the netplan source
(https://github.com/CanonicalLtd/netplan/blob/master/netplan/cli/commands/apply.py#L128),
it just stops the systemd-networkd service, then starts it again after
generating the files.

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1902960

Title:
  Upgrade from 245.4-4ubuntu3.3 to 245.4-4ubuntu3.2 appears to break DNS
  resolution in some cases

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/systemd/+bug/1902960/+subscriptions


[Bug 1900438] Re: Bcache bypasse writeback on caching device with fragmentation

2020-10-20 Thread Benjamin Allot
I tried to run apport-collect on one of the servers where we can see the
issue:

Waiting to hear from Launchpad about your decision...

*** Collecting problem information

The collected information can be sent to the developers to improve the
application. This might take a few minutes.
..dpkg-query: no packages found matching linux
..
Traceback (most recent call last):
  File "/usr/bin/apport-cli", line 370, in <module>
    if not app.run_argv():
  File "/usr/lib/python2.7/dist-packages/apport/ui.py", line 666, in run_argv
    return self.run_update_report()
  File "/usr/lib/python2.7/dist-packages/apport/ui.py", line 564, in run_update_report
    self.report.add_proc_environ()
  File "/usr/lib/python2.7/dist-packages/apport/report.py", line 577, in add_proc_environ
    proc_pid_fd = os.open('/proc/%s' % pid, os.O_RDONLY | os.O_PATH | os.O_DIRECTORY)
AttributeError: 'module' object has no attribute 'O_PATH'
Error in sys.excepthook:
Traceback (most recent call last):
  File "/usr/lib/python2.7/dist-packages/apport_python_hook.py", line 109, in apport_excepthook
    pr.add_proc_info(extraenv=['PYTHONPATH', 'PYTHONHOME'])
  File "/usr/lib/python2.7/dist-packages/apport/report.py", line 507, in add_proc_info
    proc_pid_fd = os.open('/proc/%s' % pid, os.O_RDONLY | os.O_PATH | os.O_DIRECTORY)
AttributeError: 'module' object has no attribute 'O_PATH'


But the bug has already been marked as "Confirmed" anyway.

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1900438

Title:
  Bcache bypasse writeback on caching device with fragmentation

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1900438/+subscriptions


[Bug 1900438] [NEW] Bcache bypasse writeback on caching device with fragmentation

2020-10-19 Thread Benjamin Allot
Public bug reported:

Hello,

An upstream bug has been open on the matter for quite some time now [0].
I can reproduce it easily on our production compute nodes, which are
trusty hosts with xenial HWE kernels (4.15.0-101-generic).
However, due to heavy backporting and such, doing real tracing is a bit hard there.

I was able to reproduce the behavior on a hwe-bionic kernel as well.
Since most of our critical deployments use bcache, I think this is a kinda
nasty bug to have.

Reproducing the issue is relatively easy with the script provided in the bug
[1].
The script used to capture the stats is this one [2].

[0]: https://bugzilla.kernel.org/show_bug.cgi?id=206767
[1]: https://pastebin.ubuntu.com/p/YnnvvSRhXK/
[2]: https://pastebin.ubuntu.com/p/XfVpzg32sN/

** Affects: linux (Ubuntu)
 Importance: Undecided
 Status: New

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1900438

Title:
  Bcache bypasse writeback on caching device with fragmentation

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1900438/+subscriptions


[Bug 1899852] Re: Cannot assign requested address: AH00072: make_sock: could not bind to address

2020-10-15 Thread Benjamin Allot
** Description changed:

  Hello,
  
  Let's first list my configuration items:
  * apache2 2.4.29-1ubuntu4.14
  * release: Ubuntu 18.04.5 LTS
  
  Upon reboot, the following message is seen in apache2.service logs:
  
  -- Unit apache2.service has begun starting up.
  Oct 14 12:18:32 SERVER apachectl[3833]: (99)Cannot assign requested address: 
AH00072: make_sock: could not bind to address [REDACTED IPV6.33]:443
  Oct 14 12:18:32 SERVER apachectl[3833]: no listening sockets available, 
shutting down
  Oct 14 12:18:32 SERVER apachectl[3833]: AH00015: Unable to open logs
  Oct 14 12:18:32 SERVER apachectl[3833]: Action 'start' failed.
  Oct 14 12:18:32 SERVER apachectl[3833]: The Apache error log may have more 
information.
  Oct 14 12:18:33 SERVER systemd[1]: apache2.service: Control process exited, 
code=exited status=1
  Oct 14 12:18:33 SERVER systemd[1]: apache2.service: Failed with result 
'exit-code'.
  Oct 14 12:18:33 SERVER systemd[1]: Failed to start The Apache HTTP Server.
  
  The apache2 configuration is using the ipv4 and ipv6 present on the server:
  /etc/apache2/ports.conf:Listen :443
  /etc/apache2/ports.conf:Listen :443
  /etc/apache2/ports.conf:Listen [REDACTED IPV6::33]:443
  /etc/apache2/ports.conf:Listen [REDACTED IPV6::35]:443
  
- and the /etc/network/interfaces look as this (no netplan):
+ and the /etc/network/interfaces looks like this (no netplan):
  # Additional IPs that are used to serve https traffic for
  # releases.ubuntu.com so that archive doesn't respond on 443.
  auto bond0:1
  iface bond0:1 inet static
- address .247/32
- # Using up/down to avoid LP:1347246.
- up /sbin/ip addr add REDACTED IPV6::33/128 dev $IFACE preferred_lft 0
- down /bin/ip addr del REDACTED IPV6::33/128 dev $IFACE preferred_lft 0
+ address .247/32
+ # Using up/down to avoid LP:1347246.
+ up /sbin/ip addr add REDACTED IPV6::33/128 dev $IFACE preferred_lft 0
+ down /bin/ip addr del REDACTED IPV6::33/128 dev $IFACE preferred_lft 0
  
  # Additional IPs that are used to serve *.clouds.archive.ubuntu.com
  # with HTTPProtocolOptions unsafe, which is needed to work around
  # cloud-init bug LP:1868232 (cRT#125271).
  auto bond0:2
  iface bond0:2 inet static
- address .245/32
- # Using up/down to avoid LP:1347246.
- up /sbin/ip addr add REDACTED IPV6::35/128 dev $IFACE preferred_lft 0
- down /bin/ip addr del REDACTED IPV6::35/128 dev $IFACE preferred_lft 0
+ address .245/32
+ # Using up/down to avoid LP:1347246.
+ up /sbin/ip addr add REDACTED IPV6::35/128 dev $IFACE preferred_lft 0
+ down /bin/ip addr del REDACTED IPV6::35/128 dev $IFACE preferred_lft 0
  
- I was surprised that the apache2.service does not contain a 
+ I was surprised that the apache2.service does not contain a
  After=network-online.target
  
  $ systemctl show apache2.service | grep -E '(Wants|Require|After|Before)'
  RemainAfterExit=no
  Requires=system.slice sysinit.target -.mount
  Before=multi-user.target shutdown.target
  After=basic.target sysinit.target systemd-journald.socket system.slice 
network.target nss-lookup.target systemd-tmpfiles-setup.service 
remote-fs.target -.mount
  RequiresMountsFor=/var/tmp /tmp
  
  $ systemctl show network.target | grep "^After"
  After=network-pre.target ifup@bond0.service ifup@ens2f0.service 
ifup@ens2f1.service systemd-resolved.service ufw.service networking.service 
systemd-networkd.service
  
  So I was wondering if the "ifup@bond0" was enough as a dependency here,
  to be sure to have the ipv6 up and running or if we would need something
  like "ifup@bond0:2" and "ifup@bond0:1" as part of the list of the
  services in the network.target "After" list.

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1899852

Title:
   Cannot assign requested address: AH00072: make_sock: could not bind
  to address

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/apache2/+bug/1899852/+subscriptions


[Bug 1899852] [NEW] Cannot assign requested address: AH00072: make_sock: could not bind to address

2020-10-14 Thread Benjamin Allot
Public bug reported:

Hello,

Let's first list my configuration items:
* apache2 2.4.29-1ubuntu4.14
* release: Ubuntu 18.04.5 LTS

Upon reboot, the following message is seen in apache2.service logs:

-- Unit apache2.service has begun starting up.
Oct 14 12:18:32 SERVER apachectl[3833]: (99)Cannot assign requested address: AH00072: make_sock: could not bind to address [REDACTED IPV6.33]:443
Oct 14 12:18:32 SERVER apachectl[3833]: no listening sockets available, shutting down
Oct 14 12:18:32 SERVER apachectl[3833]: AH00015: Unable to open logs
Oct 14 12:18:32 SERVER apachectl[3833]: Action 'start' failed.
Oct 14 12:18:32 SERVER apachectl[3833]: The Apache error log may have more information.
Oct 14 12:18:33 SERVER systemd[1]: apache2.service: Control process exited, code=exited status=1
Oct 14 12:18:33 SERVER systemd[1]: apache2.service: Failed with result 'exit-code'.
Oct 14 12:18:33 SERVER systemd[1]: Failed to start The Apache HTTP Server.

The apache2 configuration is using the ipv4 and ipv6 present on the server:
/etc/apache2/ports.conf:Listen :443
/etc/apache2/ports.conf:Listen :443
/etc/apache2/ports.conf:Listen [REDACTED IPV6::33]:443
/etc/apache2/ports.conf:Listen [REDACTED IPV6::35]:443

and the /etc/network/interfaces looks like this (no netplan):
# Additional IPs that are used to serve https traffic for
# releases.ubuntu.com so that archive doesn't respond on 443.
auto bond0:1
iface bond0:1 inet static
    address .247/32
    # Using up/down to avoid LP:1347246.
    up /sbin/ip addr add REDACTED IPV6::33/128 dev $IFACE preferred_lft 0
    down /bin/ip addr del REDACTED IPV6::33/128 dev $IFACE preferred_lft 0

# Additional IPs that are used to serve *.clouds.archive.ubuntu.com
# with HTTPProtocolOptions unsafe, which is needed to work around
# cloud-init bug LP:1868232 (cRT#125271).
auto bond0:2
iface bond0:2 inet static
    address .245/32
    # Using up/down to avoid LP:1347246.
    up /sbin/ip addr add REDACTED IPV6::35/128 dev $IFACE preferred_lft 0
    down /bin/ip addr del REDACTED IPV6::35/128 dev $IFACE preferred_lft 0

I was surprised that apache2.service does not contain an
After=network-online.target

$ systemctl show apache2.service | grep -E '(Wants|Require|After|Before)'
RemainAfterExit=no
Requires=system.slice sysinit.target -.mount
Before=multi-user.target shutdown.target
After=basic.target sysinit.target systemd-journald.socket system.slice network.target nss-lookup.target systemd-tmpfiles-setup.service remote-fs.target -.mount
RequiresMountsFor=/var/tmp /tmp

$ systemctl show network.target | grep "^After"
After=network-pre.target ifup@bond0.service ifup@ens2f0.service ifup@ens2f1.service systemd-resolved.service ufw.service networking.service systemd-networkd.service

So I was wondering if the "ifup@bond0" was enough as a dependency here,
to be sure to have the ipv6 up and running or if we would need something
like "ifup@bond0:2" and "ifup@bond0:1" as part of the list of the
services in the network.target "After" list.
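
For illustration, the missing ordering could be tried locally with a systemd
drop-in. This is only a sketch, not a confirmed fix: the helper name and the
drop-in file name are made up, and whether network-online.target really waits
for the addresses added by the "up" commands above is an assumption that
would need testing.

```shell
# Hypothetical helper: write a drop-in that orders apache2 after
# network-online.target.
write_apache_dropin() {
    # $1: drop-in directory, normally /etc/systemd/system/apache2.service.d
    dropin_dir="$1"
    mkdir -p "$dropin_dir" || return 1
    cat > "$dropin_dir/wait-online.conf" <<'EOF'
[Unit]
Wants=network-online.target
After=network-online.target
EOF
}

# Usage (as root):
#   write_apache_dropin /etc/systemd/system/apache2.service.d
#   systemctl daemon-reload
#   systemctl show apache2.service | grep '^After'
```

Wants= is there because After= alone only orders the units and does not pull
network-online.target into the boot transaction.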

** Affects: apache2 (Ubuntu)
 Importance: Undecided
 Status: New

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1899852

Title:
   Cannot assign requested address: AH00072: make_sock: could not bind
  to address

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/apache2/+bug/1899852/+subscriptions


[Bug 1898786] Re: Issue with bcache bch_mca_scan causing huge IO wait

2020-10-13 Thread Benjamin Allot
The testing of the new kernel looks very promising. We don't observe any
of the latency/IOwait we had before even with the btree_shrinker
enabled.

We'll give it a week probably, but having a backport of those patches
would be fantastic for sure.
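
For anyone repeating the test, a small sketch for flipping the shrinker
switch. The sysfs path is the one quoted in the bug description and assumes
the device is bcache0; the helper name and the SHRINKER_FILE variable are
made up for illustration.

```shell
# Sketch: set the bcache btree shrinker switch through sysfs.
# Writing to the real file needs root.
SHRINKER_FILE="${SHRINKER_FILE:-/sys/block/bcache0/bcache/cache/internal/btree_shrinker_disabled}"

set_shrinker_disabled() {
    # $1: 1 disables the shrinker, 0 enables it
    case "$1" in
        0|1) printf '%s\n' "$1" > "$SHRINKER_FILE" ;;
        *) echo "usage: set_shrinker_disabled 0|1" >&2; return 1 ;;
    esac
}

# Usage (as root):
#   set_shrinker_disabled 0    # shrinker enabled, as in the test above
#   cat "$SHRINKER_FILE"
```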

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1898786

Title:
  Issue with bcache bch_mca_scan causing huge IO wait

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1898786/+subscriptions


[Bug 1898786] Re: Issue with bcache bch_mca_scan causing huge IO wait

2020-10-12 Thread Benjamin Allot
Thanks Matthew, we'll try this kernel tomorrow.

I tested it on an openstack instance, the only downside is that it
uninstalls the official 4.15.0-118-generic one.

Regards,

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1898786

Title:
  Issue with bcache bch_mca_scan causing huge IO wait

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1898786/+subscriptions


[Bug 1898786] Re: Issue with bcache bch_mca_scan causing huge IO wait

2020-10-08 Thread Benjamin Allot
Hi Mauricio,

The tests in [5][6] and [7] have been done with a 44GB memory VM.
This VM usually has 64GB of allocated memory.
The goal was to verify that the load of the whole btree in RAM was not 
prevented (like in mca-reap -> down_write_trylock).

All this kernel stuff is rather new to me, so I may be following a wrong
lead.

** Description changed:

  Hello,
  
  In short, we faced an issue with a huge IO wait on a bionic Ubuntu 
4.15.0-118.119-generic kernel.
  This is the full list of process and the kernel function they were stuck in 
[0].
  
  The main issue can probably be summarized by this perf reports
  * first identify that the cpu are stuck in idle because of something[1]
  * second, see what kernel function seems to stuck the process kswapd0 and 
kswapd1 [2].
  
  We could see that this seems to be the mutex_lock in the bch_mca_scan
  function [3].
  
  After running the command:
  
   | sudo bash -c "echo 1 > /sys/fs/bcache/f1a1e8cb-3e6b-40ea-852e-
  583c48d0c2b8/internal/btree_shrinker_disabled"
  
  The server started to respond normally and the IO wait dropped
  significantly
  
  Here is a trace of the bcache event related lock in the kernel obtained
  with some bpfcc-tools [4].
  
  klockstat-bpfcc -c bch_ -i 5 -s 3
  
  The trace has been run in parallel with the following command line
  
  echo "Shrinker disabled: $(date)"; sleep 60; echo "Enabling shrinker:
  $(date)"; echo 0 | sudo tee
  /sys/block/bcache0/bcache/cache/internal/btree_shrinker_disabled ; sleep
  60; echo "Disabling shrinker: $(date)"; echo 1 | sudo tee
  /sys/block/bcache0/bcache/cache/internal/btree_shrinker_disabled; sleep
  60; echo "End of test: $(date)"
  
  Trying to dig more, we reduced by 20 GB the memory allocated to a VM on the 
server.
  * The bcache btree size fluctuation seems "normal" [5]
- * I noticed that, when the shrinker was enabled,a lot of time was spent in 
the locks during "bch_btree_insert_node".
+ * I noticed that, when the shrinker was enabled,a lot of time was spent in 
the locks during "bch_btree_insert_node". [6]
  
  I decided to check if one of the function called during
  bch_btree_insert_node was taking longer than usual when the shrinker was
  enabled.
  
  I finally found the "funclatency" tool and tried do have the same approach I 
had with the klockstat [7]. However, that was inconclusive. I could see there 
that the bch_btree_insert_node was barely called during the whole duration of 
the test.
  Which made me think it's amount of time spent in lock is more due to another 
process acquiring the lock.
  
  I'm going to try to have another go with some perf/klockstat/funclatency
  focused on bch_mca_scan and the function called there.
  
  Also, here are some memory related metrics [8].
  
- 
  Now another perf stacktrace with the command used [9].
  Strangely this one doesn't show any bch_mca_scan at all.
  
  I enabled the shrinker again, hoping to get more traces, but apparently the 
timeframe was not right. Not enough load to trigger the cliff resulting in a 
1sec IOwait plateau.
  Which is interesting because that means that without the maximal workload, 
the kernel can cope with the shrinker.
- 
  
  [0]: https://pastebin.ubuntu.com/p/QYXPdsMCWC/
  [1]: https://pastebin.ubuntu.com/p/BFsvF7H54r/
  [2]: https://pastebin.ubuntu.com/p/35qdsHYHf5/
  [3]: 
https://git.launchpad.net/~ubuntu-kernel/ubuntu/+source/linux/+git/bionic/tree/drivers/md/bcache/btree.c?h=Ubuntu-4.15.0-118.119#n674
  [4]: https://pastebin.ubuntu.com/p/qhyqP35fCw/
  [5]: https://pastebin.ubuntu.com/p/McjxxqTVjn/
  [6]: https://pastebin.ubuntu.com/p/KmrnW4Ng8F/
  [7]: https://pastebin.ubuntu.com/p/fSX4c7tTFV/
  [8]: https://pastebin.ubuntu.com/p/CZgXkgKhmJ/
  [9]: https://pastebin.ubuntu.com/p/DzKCP8NGdf/
  
  
  $ cat /proc/version_signature
  Ubuntu 4.15.0-118.119-generic 4.15.18
  
  ProblemType: Bug
  DistroRelease: Ubuntu 18.04
  Package: linux-image-4.15.0-118-generic 4.15.0-118.119
  ProcVersionSignature: User Name 4.15.0-118.119-generic 4.15.18
  Uname: Linux 4.15.0-118-generic x86_64
  AlsaDevices:
   total 0
   crw-rw 1 root audio 116,  1 Sep 29 10:04 seq
   crw-rw 1 root audio 116, 33 Sep 29 10:04 timer
  AplayDevices: Error: [Errno 2] No such file or directory: 'aplay': 'aplay'
  ApportVersion: 2.20.9-0ubuntu7.16
  Architecture: amd64
  ArecordDevices: Error: [Errno 2] No such file or directory: 'arecord': 
'arecord'
  AudioDevicesInUse: Error: command ['fuser', '-v', '/dev/snd/seq', 
'/dev/snd/timer'] failed with exit code 1:
  Date: Tue Oct  6 20:36:18 2020
  IwConfig: Error: [Errno 2] No such file or directory: 'iwconfig': 'iwconfig'
  MachineType: HP ProLiant DL380 G7
  PciMultimedia:
  
  ProcFB: 0 radeondrmfb
  ProcKernelCmdLine: BOOT_IMAGE=/boot/vmlinuz-4.15.0-118-generic 
root=UUID=c6ad1629-a506-4043-a339-6d57f0708d12 ro console=ttyS1,115200 nosplash
  RelatedPackageVersions:
   linux-restricted-modules-4.15.0-118-generic N/A
   

[Bug 1898786] Re: Issue with bcache bch_mca_scan causing huge IO wait

2020-10-08 Thread Benjamin Allot
** Description changed:

  Hello,
  
  In short, we faced an issue with a huge IO wait on a bionic Ubuntu 
4.15.0-118.119-generic kernel.
  This is the full list of process and the kernel function they were stuck in 
[0].
  
  The main issue can probably be summarized by this perf reports
  * first identify that the cpu are stuck in idle because of something[1]
  * second, see what kernel function seems to stuck the process kswapd0 and 
kswapd1 [2].
  
  We could see that this seems to be the mutex_lock in the bch_mca_scan
  function [3].
  
  After running the command:
  
   | sudo bash -c "echo 1 > /sys/fs/bcache/f1a1e8cb-3e6b-40ea-852e-
  583c48d0c2b8/internal/btree_shrinker_disabled"
  
  The server started to respond normally and the IO wait dropped
  significantly
  
  Here is a trace of the bcache event related lock in the kernel obtained
  with some bpfcc-tools [4].
  
  klockstat-bpfcc -c bch_ -i 5 -s 3
  
  The trace has been run in parallel with the following command line
  
  echo "Shrinker disabled: $(date)"; sleep 60; echo "Enabling shrinker:
  $(date)"; echo 0 | sudo tee
  /sys/block/bcache0/bcache/cache/internal/btree_shrinker_disabled ; sleep
  60; echo "Disabling shrinker: $(date)"; echo 1 | sudo tee
  /sys/block/bcache0/bcache/cache/internal/btree_shrinker_disabled; sleep
  60; echo "End of test: $(date)"
  
  Trying to dig more, we reduced by 20 GB the memory allocated to a VM on the 
server.
  * The bcache btree size fluctuation seems "normal" [5]
  * I noticed that, when the shrinker was enabled,a lot of time was spent in 
the locks during "bch_btree_insert_node".
  
  I decided to check if one of the function called during
  bch_btree_insert_node was taking longer than usual when the shrinker was
  enabled.
  
  I finally found the "funclatency" tool and tried do have the same approach I 
had with the klockstat [7]. However, that was inconclusive. I could see there 
that the bch_btree_insert_node was barely called during the whole duration of 
the test.
  Which made me think it's amount of time spent in lock is more due to another 
process acquiring the lock.
  
  I'm going to try to have another go with some perf/klockstat/funclatency
  focused on bch_mca_scan and the function called there.
  
  Also, here are some memory related metrics [8].
  
  
+ Now another perf stacktrace with the command used [9].
+ Strangely this one doesn't show any bch_mca_scan at all.
+ 
+ I enabled the shrinker again, hoping to get more traces, but apparently the 
timeframe was not right. Not enough load to trigger the cliff resulting in a 
1sec IOwait plateau.
+ Which is interesting because that means that without the maximal workload, 
the kernel can cope with the shrinker.
+ 
+ 
  [0]: https://pastebin.ubuntu.com/p/QYXPdsMCWC/
  [1]: https://pastebin.ubuntu.com/p/BFsvF7H54r/
  [2]: https://pastebin.ubuntu.com/p/35qdsHYHf5/
  [3]: 
https://git.launchpad.net/~ubuntu-kernel/ubuntu/+source/linux/+git/bionic/tree/drivers/md/bcache/btree.c?h=Ubuntu-4.15.0-118.119#n674
  [4]: https://pastebin.ubuntu.com/p/qhyqP35fCw/
  [5]: https://pastebin.ubuntu.com/p/McjxxqTVjn/
  [6]: https://pastebin.ubuntu.com/p/KmrnW4Ng8F/
  [7]: https://pastebin.ubuntu.com/p/fSX4c7tTFV/
  [8]: https://pastebin.ubuntu.com/p/CZgXkgKhmJ/
+ [9]: https://pastebin.ubuntu.com/p/DzKCP8NGdf/
  
  
  $ cat /proc/version_signature
  Ubuntu 4.15.0-118.119-generic 4.15.18
  
  ProblemType: Bug
  DistroRelease: Ubuntu 18.04
  Package: linux-image-4.15.0-118-generic 4.15.0-118.119
  ProcVersionSignature: User Name 4.15.0-118.119-generic 4.15.18
  Uname: Linux 4.15.0-118-generic x86_64
  AlsaDevices:
   total 0
   crw-rw 1 root audio 116,  1 Sep 29 10:04 seq
   crw-rw 1 root audio 116, 33 Sep 29 10:04 timer
  AplayDevices: Error: [Errno 2] No such file or directory: 'aplay': 'aplay'
  ApportVersion: 2.20.9-0ubuntu7.16
  Architecture: amd64
  ArecordDevices: Error: [Errno 2] No such file or directory: 'arecord': 
'arecord'
  AudioDevicesInUse: Error: command ['fuser', '-v', '/dev/snd/seq', 
'/dev/snd/timer'] failed with exit code 1:
  Date: Tue Oct  6 20:36:18 2020
  IwConfig: Error: [Errno 2] No such file or directory: 'iwconfig': 'iwconfig'
  MachineType: HP ProLiant DL380 G7
  PciMultimedia:
  
  ProcFB: 0 radeondrmfb
  ProcKernelCmdLine: BOOT_IMAGE=/boot/vmlinuz-4.15.0-118-generic 
root=UUID=c6ad1629-a506-4043-a339-6d57f0708d12 ro console=ttyS1,115200 nosplash
  RelatedPackageVersions:
   linux-restricted-modules-4.15.0-118-generic N/A
   linux-backports-modules-4.15.0-118-generic  N/A
   linux-firmware  1.173.18
  RfKill: Error: [Errno 2] No such file or directory: 'rfkill': 'rfkill'
  SourcePackage: linux
  UpgradeStatus: Upgraded to bionic on 2019-09-27 (375 days ago)
  dmi.bios.date: 05/05/2011
  dmi.bios.vendor: HP
  dmi.bios.version: P67
  dmi.chassis.type: 23
  dmi.chassis.vendor: HP
  dmi.modalias: 
dmi:bvnHP:bvrP67:bd05/05/2011:svnHP:pnProLiantDL380G7:pvr:cvnHP:ct23:cvr:
  

[Bug 1898786] Re: Issue with bcache bch_mca_scan causing huge IO wait

2020-10-08 Thread Benjamin Allot
** Description changed:

  Hello,
  
  In short, we faced an issue with a huge IO wait on a bionic Ubuntu 
4.15.0-118.119-generic kernel.
  This is the full list of process and the kernel function they were stuck in 
[0].
  
  The main issue can probably be summarized by this perf reports
  * first identify that the cpu are stuck in idle because of something[1]
  * second, see what kernel function seems to stuck the process kswapd0 and 
kswapd1 [2].
  
  We could see that this seems to be the mutex_lock in the bch_mca_scan
  function [3].
  
  After running the command:
  
   | sudo bash -c "echo 1 > /sys/fs/bcache/f1a1e8cb-3e6b-40ea-852e-583c48d0c2b8/internal/btree_shrinker_disabled"
  
  The server started to respond normally and the IO wait dropped
  significantly
  
  Here is a trace of the bcache event related lock in the kernel obtained
  with some bpfcc-tools [4].
  
  klockstat-bpfcc -c bch_ -i 5 -s 3
  
  The trace has been run in parallel with the following command line
  
  echo "Shrinker disabled: $(date)"; sleep 60; echo "Enabling shrinker:
  $(date)"; echo 0 | sudo tee
  /sys/block/bcache0/bcache/cache/internal/btree_shrinker_disabled ; sleep
  60; echo "Disabling shrinker: $(date)"; echo 1 | sudo tee
  /sys/block/bcache0/bcache/cache/internal/btree_shrinker_disabled; sleep
  60; echo "End of test: $(date)"
  
+ To dig further, we reduced the memory allocated to a VM on the server
+ by 20 GB.
+ * The bcache btree size fluctuation seems "normal" [5]
+ * I noticed that, when the shrinker was enabled, a lot of time was spent
+ in the locks during "bch_btree_insert_node".
+ 
+ I decided to check whether one of the functions called during
+ bch_btree_insert_node was taking longer than usual when the shrinker was
+ enabled.
+ 
+ I finally found the "funclatency" tool and tried the same approach I
+ had used with klockstat [7]. However, that was inconclusive: I could see
+ that bch_btree_insert_node was barely called during the whole duration
+ of the test.
+ That made me think the time it spends in locks is mostly due to another
+ process acquiring the lock.
+ 
+ I'm going to have another go with perf/klockstat/funclatency
+ focused on bch_mca_scan and the functions called there.
+ 
+ Also, here are some memory-related metrics [8].
+ 
  
  [0]: https://pastebin.ubuntu.com/p/QYXPdsMCWC/
  [1]: https://pastebin.ubuntu.com/p/BFsvF7H54r/
  [2]: https://pastebin.ubuntu.com/p/35qdsHYHf5/
  [3]: https://git.launchpad.net/~ubuntu-kernel/ubuntu/+source/linux/+git/bionic/tree/drivers/md/bcache/btree.c?h=Ubuntu-4.15.0-118.119#n674
  [4]: https://pastebin.ubuntu.com/p/qhyqP35fCw/
+ [5]: https://pastebin.ubuntu.com/p/McjxxqTVjn/
+ [6]: https://pastebin.ubuntu.com/p/KmrnW4Ng8F/
+ [7]: https://pastebin.ubuntu.com/p/fSX4c7tTFV/
+ [8]: https://pastebin.ubuntu.com/p/CZgXkgKhmJ/
  
  
  $ cat /proc/version_signature
  Ubuntu 4.15.0-118.119-generic 4.15.18
  
  ProblemType: Bug
  DistroRelease: Ubuntu 18.04
  Package: linux-image-4.15.0-118-generic 4.15.0-118.119
  ProcVersionSignature: Ubuntu 4.15.0-118.119-generic 4.15.18
  Uname: Linux 4.15.0-118-generic x86_64
  AlsaDevices:
   total 0
   crw-rw 1 root audio 116,  1 Sep 29 10:04 seq
   crw-rw 1 root audio 116, 33 Sep 29 10:04 timer
  AplayDevices: Error: [Errno 2] No such file or directory: 'aplay': 'aplay'
  ApportVersion: 2.20.9-0ubuntu7.16
  Architecture: amd64
  ArecordDevices: Error: [Errno 2] No such file or directory: 'arecord': 
'arecord'
  AudioDevicesInUse: Error: command ['fuser', '-v', '/dev/snd/seq', 
'/dev/snd/timer'] failed with exit code 1:
  Date: Tue Oct  6 20:36:18 2020
  IwConfig: Error: [Errno 2] No such file or directory: 'iwconfig': 'iwconfig'
  MachineType: HP ProLiant DL380 G7
  PciMultimedia:
  
  ProcFB: 0 radeondrmfb
  ProcKernelCmdLine: BOOT_IMAGE=/boot/vmlinuz-4.15.0-118-generic 
root=UUID=c6ad1629-a506-4043-a339-6d57f0708d12 ro console=ttyS1,115200 nosplash
  RelatedPackageVersions:
   linux-restricted-modules-4.15.0-118-generic N/A
   linux-backports-modules-4.15.0-118-generic  N/A
   linux-firmware  1.173.18
  RfKill: Error: [Errno 2] No such file or directory: 'rfkill': 'rfkill'
  SourcePackage: linux
  UpgradeStatus: Upgraded to bionic on 2019-09-27 (375 days ago)
  dmi.bios.date: 05/05/2011
  dmi.bios.vendor: HP
  dmi.bios.version: P67
  dmi.chassis.type: 23
  dmi.chassis.vendor: HP
  dmi.modalias: 
dmi:bvnHP:bvrP67:bd05/05/2011:svnHP:pnProLiantDL380G7:pvr:cvnHP:ct23:cvr:
  dmi.product.family: ProLiant
  dmi.product.name: ProLiant DL380 G7
  dmi.sys.vendor: HP

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1898786

Title:
  Issue with bcache bch_mca_scan causing huge IO wait

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1898786/+subscriptions

-- 
ubuntu-bugs mailing list

[Bug 1898786] Re: Issue with bcache bch_mca_scan causing huge IO wait

2020-10-08 Thread Benjamin Allot
Hello Mauricio,

That was also one of the conclusions we reached yesterday.

I followed a wrong lead yesterday, after getting another more detailed
output with klockstat.

I'll update the description of the bug at the end, adding the bits I
found.

The system is not really under memory pressure, but to confirm this we
reduced the memory allocated to a VM on the server from 64GB to 44GB and
looked into the shrinker behavior.
It didn't change a thing, and the IO wait started to rise again.

As for the IO load, I'm not sure. I would probably need to record it with
blktrace (I have never done that, but I read that you can then pass the
result to fio to reproduce the load).
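
For reference, the blktrace/fio idea could look roughly like the sketch
below. It has not been run on this server: the device name, duration and
file names are placeholders chosen for the example, and the helper only
prints the commands unless RUN=1 is set. Replaying a trace that contains
writes issues writes to the target device, so a replay belongs on a test
machine rather than on the production host.

```shell
# Sketch only: DEV, SECS and OUT are placeholders picked for this example.
# By default the commands are printed; set RUN=1 to execute them (needs root).
run() {
    if [ "${RUN:-0}" = "1" ]; then
        "$@"
    else
        printf '%s\n' "$*"
    fi
}

replay_plan() {
    local dev="${DEV:-/dev/bcache0}"   # device to trace
    local secs="${SECS:-60}"           # capture duration in seconds
    local out="${OUT:-bcache_trace}"   # basename of the trace files

    # 1. Record block-layer events on the device for $secs seconds.
    run blktrace -d "$dev" -w "$secs" -o "$out"
    # 2. Merge the per-CPU trace files into one binary file for fio.
    run blkparse -i "$out" -o /dev/null -d "$out.bin"
    # 3. Replay the recorded I/O pattern with fio.
    run fio --name=replay --read_iolog="$out.bin"
}
```

Calling `replay_plan` prints the three commands for review; `RUN=1
replay_plan` would execute them.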

[Bug 1898786] Re: Issue with bcache bch_mca_scan causing huge IO wait

2020-10-08 Thread Benjamin Allot
** Description changed:

  Hello,
  
  In short, we faced an issue with a huge IO wait on a bionic Ubuntu
  4.15.0-118.119-generic kernel.
  This is the full list of processes and the kernel functions they were
  stuck in [0].
  
  The main issue can probably be summarized by these perf reports:
  * first, identify that the CPUs are stuck in idle because of something [1]
  * second, see which kernel function seems to block the kswapd0 and
  kswapd1 processes [2].
  
  We could see that this seems to be the mutex_lock in the bch_mca_scan
  function [3].
  
  After running the command:
  
   | sudo bash -c "echo 1 > /sys/fs/bcache/f1a1e8cb-3e6b-40ea-852e-583c48d0c2b8/internal/btree_shrinker_disabled"
  
  The server started to respond normally and the IO wait dropped significantly
- [0]: https://pastebin.canonical.com/p/wYYKwHdRXk/
- [1]: https://pastebin.canonical.com/p/n2Tw57QyBC/
- [2]: https://pastebin.canonical.com/p/3QqFTfdHhX/
+ [0]: https://pastebin.ubuntu.com/p/QYXPdsMCWC/
+ [1]: https://pastebin.ubuntu.com/p/BFsvF7H54r/
+ [2]: https://pastebin.ubuntu.com/p/35qdsHYHf5/
  [3]: https://git.launchpad.net/~ubuntu-kernel/ubuntu/+source/linux/+git/bionic/tree/drivers/md/bcache/btree.c?h=Ubuntu-4.15.0-118.119#n674
  
  
  $ cat /proc/version_signature
  Ubuntu 4.15.0-118.119-generic 4.15.18
  
  ProblemType: Bug
  DistroRelease: Ubuntu 18.04
  Package: linux-image-4.15.0-118-generic 4.15.0-118.119
  ProcVersionSignature: Ubuntu 4.15.0-118.119-generic 4.15.18
  Uname: Linux 4.15.0-118-generic x86_64
  AlsaDevices:
   total 0
   crw-rw 1 root audio 116,  1 Sep 29 10:04 seq
   crw-rw 1 root audio 116, 33 Sep 29 10:04 timer
  AplayDevices: Error: [Errno 2] No such file or directory: 'aplay': 'aplay'
  ApportVersion: 2.20.9-0ubuntu7.16
  Architecture: amd64
  ArecordDevices: Error: [Errno 2] No such file or directory: 'arecord': 
'arecord'
  AudioDevicesInUse: Error: command ['fuser', '-v', '/dev/snd/seq', 
'/dev/snd/timer'] failed with exit code 1:
  Date: Tue Oct  6 20:36:18 2020
  IwConfig: Error: [Errno 2] No such file or directory: 'iwconfig': 'iwconfig'
  MachineType: HP ProLiant DL380 G7
  PciMultimedia:
  
  ProcFB: 0 radeondrmfb
  ProcKernelCmdLine: BOOT_IMAGE=/boot/vmlinuz-4.15.0-118-generic 
root=UUID=c6ad1629-a506-4043-a339-6d57f0708d12 ro console=ttyS1,115200 nosplash
  RelatedPackageVersions:
   linux-restricted-modules-4.15.0-118-generic N/A
   linux-backports-modules-4.15.0-118-generic  N/A
   linux-firmware  1.173.18
  RfKill: Error: [Errno 2] No such file or directory: 'rfkill': 'rfkill'
  SourcePackage: linux
  UpgradeStatus: Upgraded to bionic on 2019-09-27 (375 days ago)
  dmi.bios.date: 05/05/2011
  dmi.bios.vendor: HP
  dmi.bios.version: P67
  dmi.chassis.type: 23
  dmi.chassis.vendor: HP
  dmi.modalias: 
dmi:bvnHP:bvrP67:bd05/05/2011:svnHP:pnProLiantDL380G7:pvr:cvnHP:ct23:cvr:
  dmi.product.family: ProLiant
  dmi.product.name: ProLiant DL380 G7
  dmi.sys.vendor: HP

** Description changed:

  Hello,
  
  In short, we faced an issue with a huge IO wait on a bionic Ubuntu
  4.15.0-118.119-generic kernel.
  This is the full list of processes and the kernel functions they were
  stuck in [0].
  
  The main issue can probably be summarized by these perf reports:
  * first, identify that the CPUs are stuck in idle because of something [1]
  * second, see which kernel function seems to block the kswapd0 and
  kswapd1 processes [2].
  
  We could see that this seems to be the mutex_lock in the bch_mca_scan
  function [3].
  
  After running the command:
  
   | sudo bash -c "echo 1 > /sys/fs/bcache/f1a1e8cb-3e6b-40ea-852e-583c48d0c2b8/internal/btree_shrinker_disabled"
  
- The server started to respond normally and the IO wait dropped significantly
+ The server started to respond normally and the IO wait dropped
+ significantly
+ 
+ Here is a trace of the bcache event related lock in the kernel obtained
+ with some bpfcc-tools [4].
+ 
+ klockstat-bpfcc -c bch_ -i 5 -s 3
+ 
+ The trace has been run in parallel with the following command line
+ 
+ echo "Shrinker disabled: $(date)"; sleep 60; echo "Enabling shrinker:
+ $(date)"; echo 0 | sudo tee
+ /sys/block/bcache0/bcache/cache/internal/btree_shrinker_disabled ; sleep
+ 60; echo "Disabling shrinker: $(date)"; echo 1 | sudo tee
+ /sys/block/bcache0/bcache/cache/internal/btree_shrinker_disabled; sleep
+ 60; echo "End of test: $(date)"
+ 
+ 
  [0]: https://pastebin.ubuntu.com/p/QYXPdsMCWC/
  [1]: https://pastebin.ubuntu.com/p/BFsvF7H54r/
  [2]: https://pastebin.ubuntu.com/p/35qdsHYHf5/
  [3]: https://git.launchpad.net/~ubuntu-kernel/ubuntu/+source/linux/+git/bionic/tree/drivers/md/bcache/btree.c?h=Ubuntu-4.15.0-118.119#n674
+ [4]: https://pastebin.ubuntu.com/p/qhyqP35fCw/
  
  
  $ cat /proc/version_signature
  Ubuntu 4.15.0-118.119-generic 4.15.18
  
  ProblemType: Bug
  DistroRelease: Ubuntu 18.04
  Package: linux-image-4.15.0-118-generic 4.15.0-118.119
  ProcVersionSignature: Ubuntu

[Bug 1898786] Re: Issue with bcache bch_mca_scan causing huge IO wait

2020-10-07 Thread Benjamin Allot
Here is a trace of the bcache event related lock in the kernel obtained
with some bpfcc-tools.

klockstat-bpfcc -c bch_ -i 5 -s 3

The trace has been run in parallel with the following command line

echo "Shrinker disabled: $(date)"
sleep 60
echo "Enabling shrinker: $(date)"
echo 0 | sudo tee /sys/block/bcache0/bcache/cache/internal/btree_shrinker_disabled
sleep 60
echo "Disabling shrinker: $(date)"
echo 1 | sudo tee /sys/block/bcache0/bcache/cache/internal/btree_shrinker_disabled
sleep 60
echo "End of test: $(date)"

The logs are here: https://pastebin.canonical.com/p/jVKdbV3RrK/
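
To put a number on the IO wait during each phase of such a test, one
option is to snapshot /proc/stat before and after a phase and compare the
two. The helper below is a minimal sketch of that idea, not what was used
to produce the logs above; the function name and the snapshot file names
are made up for the example.

```shell
# iowait_pct BEFORE AFTER
# BEFORE and AFTER are copies of /proc/stat taken at two points in time.
# Prints the percentage of total CPU time spent in iowait between them.
iowait_pct() {
    awk '
        $1 == "cpu" {
            # Aggregate line: user nice system idle iowait irq softirq steal.
            # Guest time is already counted in user/nice, so stop at field 9.
            total = 0
            for (i = 2; i <= 9 && i <= NF; i++) total += $i
            if (NR == FNR) { t0 = total; w0 = $6 }   # first file
            else           { t1 = total; w1 = $6 }   # second file
        }
        END {
            dt = t1 - t0
            if (dt <= 0) { print "0.0"; exit }
            printf "%.1f\n", 100 * (w1 - w0) / dt
        }
    ' "$1" "$2"
}
```

For example: `cat /proc/stat > before; sleep 60; cat /proc/stat > after;
iowait_pct before after`.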

[Bug 1898786] Re: Issue with bcache bch_mca_scan causing huge IO wait

2020-10-07 Thread Benjamin Allot
Actually, no, we cannot be sure. The server didn't have any metrics until
a few days ago, and the issue was already there.

It's worth noting that few servers have this bcache configuration,
because the cache mode is configured as writethrough and the load is
pretty significant.

So no last "good" version.

Actually, we have various IO wait issues on another platform (running a
xenial-hwe kernel), and we already suspected bcache there. I mention it to
show that bcache behavior seems to have been related to disk performance
problems for quite some time.

[Bug 1898786] [NEW] Issue with bcache bch_mca_scan causing huge IO wait

2020-10-06 Thread Benjamin Allot
Public bug reported:

Hello,

In short, we faced an issue with a huge IO wait on a bionic Ubuntu
4.15.0-118.119-generic kernel.
This is the full list of processes and the kernel functions they were
stuck in [0].

The main issue can probably be summarized by these perf reports:
* first, identify that the CPUs are stuck in idle because of something [1]
* second, see which kernel function seems to block the kswapd0 and
  kswapd1 processes [2].

We could see that this seems to be the mutex_lock in the bch_mca_scan
function [3].

After running the command:

 | sudo bash -c "echo 1 > /sys/fs/bcache/f1a1e8cb-3e6b-40ea-852e-583c48d0c2b8/internal/btree_shrinker_disabled"

The server started to respond normally and the IO wait dropped significantly
[0]: https://pastebin.canonical.com/p/wYYKwHdRXk/
[1]: https://pastebin.canonical.com/p/n2Tw57QyBC/
[2]: https://pastebin.canonical.com/p/3QqFTfdHhX/
[3]: https://git.launchpad.net/~ubuntu-kernel/ubuntu/+source/linux/+git/bionic/tree/drivers/md/bcache/btree.c?h=Ubuntu-4.15.0-118.119#n674
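
To check which cache sets currently have the shrinker disabled, something
like the following sketch can be used. The function name is made up for
the example, and the sysfs root is a parameter only so the helper can be
pointed at a test directory; on a real host the default applies.

```shell
# shrinker_state [SYSFS_ROOT]
# Prints "<cache-set-uuid> <value>" for every bcache cache set found,
# where <value> is the content of internal/btree_shrinker_disabled.
shrinker_state() {
    local root="${1:-/sys/fs/bcache}" f
    for f in "$root"/*/internal/btree_shrinker_disabled; do
        # Skip the unexpanded glob when no cache set is registered.
        [ -r "$f" ] || continue
        printf '%s %s\n' \
            "$(basename "$(dirname "$(dirname "$f")")")" \
            "$(cat "$f")"
    done
}
```

A value of 1 means the shrinker is disabled for that cache set, which is
the state the command above puts it in.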


$ cat /proc/version_signature
Ubuntu 4.15.0-118.119-generic 4.15.18

ProblemType: Bug
DistroRelease: Ubuntu 18.04
Package: linux-image-4.15.0-118-generic 4.15.0-118.119
ProcVersionSignature: Ubuntu 4.15.0-118.119-generic 4.15.18
Uname: Linux 4.15.0-118-generic x86_64
AlsaDevices:
 total 0
 crw-rw 1 root audio 116,  1 Sep 29 10:04 seq
 crw-rw 1 root audio 116, 33 Sep 29 10:04 timer
AplayDevices: Error: [Errno 2] No such file or directory: 'aplay': 'aplay'
ApportVersion: 2.20.9-0ubuntu7.16
Architecture: amd64
ArecordDevices: Error: [Errno 2] No such file or directory: 'arecord': 'arecord'
AudioDevicesInUse: Error: command ['fuser', '-v', '/dev/snd/seq', 
'/dev/snd/timer'] failed with exit code 1:
Date: Tue Oct  6 20:36:18 2020
IwConfig: Error: [Errno 2] No such file or directory: 'iwconfig': 'iwconfig'
MachineType: HP ProLiant DL380 G7
PciMultimedia:
 
ProcFB: 0 radeondrmfb
ProcKernelCmdLine: BOOT_IMAGE=/boot/vmlinuz-4.15.0-118-generic 
root=UUID=c6ad1629-a506-4043-a339-6d57f0708d12 ro console=ttyS1,115200 nosplash
RelatedPackageVersions:
 linux-restricted-modules-4.15.0-118-generic N/A
 linux-backports-modules-4.15.0-118-generic  N/A
 linux-firmware  1.173.18
RfKill: Error: [Errno 2] No such file or directory: 'rfkill': 'rfkill'
SourcePackage: linux
UpgradeStatus: Upgraded to bionic on 2019-09-27 (375 days ago)
dmi.bios.date: 05/05/2011
dmi.bios.vendor: HP
dmi.bios.version: P67
dmi.chassis.type: 23
dmi.chassis.vendor: HP
dmi.modalias: 
dmi:bvnHP:bvrP67:bd05/05/2011:svnHP:pnProLiantDL380G7:pvr:cvnHP:ct23:cvr:
dmi.product.family: ProLiant
dmi.product.name: ProLiant DL380 G7
dmi.sys.vendor: HP

** Affects: linux (Ubuntu)
 Importance: Undecided
 Status: New


** Tags: amd64 apport-bug bionic

[Bug 1873352] UdevLog.txt

2020-04-16 Thread Benjamin Allot
apport information

** Attachment added: "UdevLog.txt"
   
https://bugs.launchpad.net/bugs/1873352/+attachment/5355748/+files/UdevLog.txt

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1873352

Title:
  unregister_netdevice: waiting for tape3e33cb9-d6 to become free. Usage
  count = 2

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1873352/+subscriptions

-- 
ubuntu-bugs mailing list
ubuntu-bugs@lists.ubuntu.com
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs

[Bug 1873352] ProcCpuinfo.txt

2020-04-16 Thread Benjamin Allot
apport information

** Attachment added: "ProcCpuinfo.txt"
   
https://bugs.launchpad.net/bugs/1873352/+attachment/5355742/+files/ProcCpuinfo.txt

[Bug 1873352] ProcInterrupts.txt

2020-04-16 Thread Benjamin Allot
apport information

** Attachment added: "ProcInterrupts.txt"
   
https://bugs.launchpad.net/bugs/1873352/+attachment/5355745/+files/ProcInterrupts.txt

[Bug 1873352] WifiSyslog.txt

2020-04-16 Thread Benjamin Allot
apport information

** Attachment added: "WifiSyslog.txt"
   
https://bugs.launchpad.net/bugs/1873352/+attachment/5355749/+files/WifiSyslog.txt

[Bug 1873352] UdevDb.txt

2020-04-16 Thread Benjamin Allot
apport information

** Attachment added: "UdevDb.txt"
   https://bugs.launchpad.net/bugs/1873352/+attachment/5355747/+files/UdevDb.txt

[Bug 1873352] ProcCpuinfoMinimal.txt

2020-04-16 Thread Benjamin Allot
apport information

** Attachment added: "ProcCpuinfoMinimal.txt"
   
https://bugs.launchpad.net/bugs/1873352/+attachment/5355743/+files/ProcCpuinfoMinimal.txt

[Bug 1873352] ProcModules.txt

2020-04-16 Thread Benjamin Allot
apport information

** Attachment added: "ProcModules.txt"
   
https://bugs.launchpad.net/bugs/1873352/+attachment/5355746/+files/ProcModules.txt

[Bug 1873352] NonfreeKernelModules.txt

2020-04-16 Thread Benjamin Allot
apport information

** Attachment added: "NonfreeKernelModules.txt"
   
https://bugs.launchpad.net/bugs/1873352/+attachment/5355741/+files/NonfreeKernelModules.txt

[Bug 1873352] CurrentDmesg.txt

2020-04-16 Thread Benjamin Allot
apport information

** Attachment added: "CurrentDmesg.txt"
   
https://bugs.launchpad.net/bugs/1873352/+attachment/5355738/+files/CurrentDmesg.txt

[Bug 1873352] ProcEnviron.txt

2020-04-16 Thread Benjamin Allot
apport information

** Attachment added: "ProcEnviron.txt"
   
https://bugs.launchpad.net/bugs/1873352/+attachment/5355744/+files/ProcEnviron.txt

[Bug 1873352] Lsusb.txt

2020-04-16 Thread Benjamin Allot
apport information

** Attachment added: "Lsusb.txt"
   https://bugs.launchpad.net/bugs/1873352/+attachment/5355740/+files/Lsusb.txt

[Bug 1873352] BootDmesg.txt

2020-04-16 Thread Benjamin Allot
apport information

** Attachment added: "BootDmesg.txt"
   
https://bugs.launchpad.net/bugs/1873352/+attachment/5355736/+files/BootDmesg.txt

[Bug 1873352] CRDA.txt

2020-04-16 Thread Benjamin Allot
apport information

** Attachment added: "CRDA.txt"
   https://bugs.launchpad.net/bugs/1873352/+attachment/5355737/+files/CRDA.txt

[Bug 1873352] [NEW] unregister_netdevice: waiting for tape3e33cb9-d6 to become free. Usage count = 2

2020-04-16 Thread Benjamin Allot
Public bug reported:

Hello,

We recently encountered an issue when trying to stop nova instances.

| Apr 16 09:35:10 ginzel kernel: [6925958.071665] unregister_netdevice: waiting for tape3e33cb9-d6 to become free. Usage count = 2

When I try to stop the qemu process manually, it fails:
"""
ballot@ginzel:~$ sudo virsh destroy instance-000718af
setlocale: No such file or directory
error: Failed to destroy domain instance-000718af
error: Failed to terminate process 20953 with SIGKILL: Device or resource busy
"""

The qemu process is in D state because of this
"""
libvirt+ 20953 12.9  0.5 9001400 2925908 ? D Apr15  84:25
  /usr/bin/qemu-system-x86_64 -name instance-000718af -S
  -machine pc-i440fx-trusty,accel=kvm,usb=off
  -cpu EPYC-IBPB-2.0,+perfctr_nb,+perfctr_core,+topoext,+tce,+wdt,+skinit,+extapic,+cmp_legacy,+osxsave,+ht
  -m 4096 -realtime mlock=off -smp 2,sockets=2,cores=1,threads=1
  -uuid 020ecf87-4c96-47af-8d3f-d86d51b445f4
  -smbios type=1,manufacturer=OpenStack Foundation,product=OpenStack Nova,version=2014.1.5,serial=993710e9-0a44-428e-88ec-9210c6dfed55,uuid=020ecf87-4c96-47af-8d3f-d86d51b445f4
  -no-user-config -nodefaults
  -chardev socket,id=charmonitor,path=/var/lib/libvirt/qemu/instance-000718af.monitor,server,nowait
  -mon chardev=charmonitor,id=monitor,mode=control
  -rtc base=utc,driftfix=slew -global kvm-pit.lost_tick_policy=discard
  -no-hpet -no-shutdown -boot strict=on
  -device piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2
  -drive file=/srv/nova/instances/020ecf87-4c96-47af-8d3f-d86d51b445f4/disk,if=none,id=drive-virtio-disk0,format=qcow2,cache=none
  -device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x4,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=1
  -netdev tap,fd=24,id=hostnet0,vhost=on,vhostfd=29
  -device virtio-net-pci,netdev=hostnet0,id=net0,mac=fa:16:3e:eb:99:5e,bus=pci.0,addr=0x3
  -chardev file,id=charserial0,path=/srv/nova/instances/020ecf87-4c96-47af-8d3f-d86d51b445f4/console.log
  -device isa-serial,chardev=charserial0,id=serial0
  -chardev pty,id=charserial1
  -device isa-serial,chardev=charserial1,id=serial1
  -device usb-tablet,id=input0 -vnc 0.0.0.0:4 -k en-us
  -device cirrus-vga,id=video0,bus=pci.0,addr=0x2
  -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x5
ballot@ginzel:~$ sudo cat /proc/20953/sta
stack   stat    statm   status
"""

and we can see the stack of the process
"""
ballot@ginzel:~$ sudo cat /proc/20953/stack
[<0>] msleep+0x2d/0x40
[<0>] netdev_run_todo+0x11c/0x320
[<0>] rtnl_unlock+0xe/0x10
[<0>] tun_chr_close+0x28/0x30
[<0>] __fput+0xea/0x220
[<0>] fput+0xe/0x10
[<0>] task_work_run+0x9d/0xc0
[<0>] exit_to_usermode_loop+0xc0/0xd0
[<0>] do_syscall_64+0x115/0x130
[<0>] entry_SYSCALL_64_after_hwframe+0x3d/0xa2
[<0>] 0x
"""

Will attach the relevant files
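
To collect the same information for every process stuck in D state,
rather than one PID at a time, a helper along these lines could be used.
This is only a sketch with made-up function names, not something that was
run for this report.

```shell
# d_state_pids: read "PID STAT COMMAND" lines on stdin and print the PIDs
# whose state starts with D (uninterruptible sleep).
d_state_pids() {
    awk '$2 ~ /^D/ { print $1 }'
}

# dump_d_state_stacks: print the kernel stack of every D-state process.
# Reading /proc/<pid>/stack needs root, hence sudo.
dump_d_state_stacks() {
    ps -eo pid=,stat=,comm= | d_state_pids | while read -r pid; do
        echo "== PID $pid =="
        sudo cat "/proc/$pid/stack"
    done
}
```

Running `dump_d_state_stacks` on the affected host would list each
blocked PID followed by its kernel stack.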
--- 
AlsaDevices:
 total 0
 crw-rw 1 root audio 116,  1 Jan 27 05:43 seq
 crw-rw 1 root audio 116, 33 Jan 27 05:43 timer
AplayDevices: Error: [Errno 2] No such file or directory
ApportVersion: 2.14.1-0ubuntu3.29
Architecture: amd64
ArecordDevices: Error: [Errno 2] No such file or directory
AudioDevicesInUse: Error: command ['fuser', '-v', '/dev/snd/seq', 
'/dev/snd/timer'] failed with exit code 1:
DistroRelease: Ubuntu 14.04
IwConfig: Error: [Errno 2] No such file or directory
MachineType: HPE ProLiant DL385 Gen10
Package: linux (not installed)
PciMultimedia:
 
ProcFB: 0 mgadrmfb
ProcKernelCmdLine: BOOT_IMAGE=/boot/vmlinuz-4.15.0-42-generic 
root=UUID=06e712a9-0adb-4078-8976-0c41ce346aa5 ro console=tty0 
console=ttyS0,115200
ProcVersionSignature: Ubuntu 4.15.0-42.45-generic 4.15.18
RelatedPackageVersions:
 linux-restricted-modules-4.15.0-42-generic N/A
 linux-backports-modules-4.15.0-42-generic  N/A
 linux-firmware 1.127.24
RfKill: Error: [Errno 2] No such file or directory
Tags:  trusty uec-images
Uname: Linux 4.15.0-42-generic x86_64
UpgradeStatus: No upgrade log present (probably fresh install)
UserGroups:
 
_MarkForUpload: True
dmi.bios.date: 10/02/2018
dmi.bios.vendor: HPE
dmi.bios.version: A40
dmi.board.name: ProLiant DL385 Gen10
dmi.board.vendor: HPE
dmi.chassis.type: 23
dmi.chassis.vendor: HPE
dmi.modalias: 
dmi:bvnHPE:bvrA40:bd10/02/2018:svnHPE:pnProLiantDL385Gen10:pvr:rvnHPE:rnProLiantDL385Gen10:rvr:cvnHPE:ct23:cvr:

[Bug 1873352] Lspci.txt

2020-04-16 Thread Benjamin Allot
apport information

** Attachment added: "Lspci.txt"
   https://bugs.launchpad.net/bugs/1873352/+attachment/5355739/+files/Lspci.txt
