[Bug 1919177] Re: Azure: issues with accelerated networking on Hirsute
** No longer affects: linux-azure (Ubuntu)

** No longer affects: linux-azure (Ubuntu Hirsute)

** No longer affects: cloud-init (Ubuntu Hirsute)

** No longer affects: cloud-init (Ubuntu)

** Project changed: cloud-init => ubuntu-translations

** No longer affects: ubuntu-translations

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1919177

Title:
  Azure: issues with accelerated networking on Hirsute

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/systemd/+bug/1919177/+subscriptions

-- 
ubuntu-bugs mailing list
ubuntu-bugs@lists.ubuntu.com
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs
[Bug 1919177] Re: Azure: issues with accelerated networking on Hirsute
I tried to reproduce the issue with the latest hirsute image pushed to Azure and it appears that I cannot. While I can still reproduce the issue with 20210511.1, I can't with 20210622.1.
[Bug 1919177] Re: Azure: issues with accelerated networking on Hirsute
@rbalint @ddstreed any update on this issue?
[Bug 1919177] Re: Azure: issues with accelerated networking on Hirsute
It's been there for almost 3 months, so I think it can wait a few more days. It affects Azure users who run Hirsute instances with accelerated networking enabled; I don't know how many users that is.
[Bug 1919177] Re: Azure: issues with accelerated networking on Hirsute
@ddstreed Could you please look into this SRU candidate?

@gjolly What is the importance/urgency of landing this fix?

** Also affects: cloud-init (Ubuntu Hirsute)
   Importance: Undecided
       Status: New

** Also affects: systemd (Ubuntu Hirsute)
   Importance: Undecided
       Status: New

** Also affects: linux-azure (Ubuntu Hirsute)
   Importance: Undecided
       Status: New

** Changed in: systemd (Ubuntu)
       Status: Incomplete => Fix Released

** Changed in: linux-azure (Ubuntu Hirsute)
       Status: New => Invalid

** Changed in: cloud-init (Ubuntu Hirsute)
       Status: New => Incomplete
[Bug 1919177] Re: Azure: issues with accelerated networking on Hirsute
@rbalint, I just tested Impish and I cannot reproduce the issue. That doesn't prove it's not there, but it's a very good sign!

I can see that our automated testing failed because of this issue on 2021-05-10 but not after. If I'm not wrong, systemd 248 was pushed on the 14th, which would make a nice correlation.
[Bug 1919177] Re: Azure: issues with accelerated networking on Hirsute
@rbalint I was able to reproduce the issue with an image that was running the following:

ii  cloud-init         20.4.1-79-g71564dce-0ubuntu1  all    initialization and customization tool for cloud instances
ii  linux-image-azure  5.8.0.1017.19+21.04.14        amd64  Linux kernel image for Azure systems.
ii  systemd            247.3-3ubuntu3                amd64  system and service manager

For reference, I took 20210219 and only upgraded systemd.

I will also try to reproduce with Impish to confirm whether or not systemd 248 solves the issue.
[Bug 1919177] Re: Azure: issues with accelerated networking on Hirsute
Hi, Gauthier and Gauthier. I'm marking linux-azure as invalid for now. If something changes please let us know. Thank you.

** Changed in: linux-azure (Ubuntu)
       Status: New => Invalid
[Bug 1919177] Re: Azure: issues with accelerated networking on Hirsute
@gjolly There were networking-related changes in systemd which could have caused this, but I see that the kernel packages also changed between the Azure images:

-ii linux-image-azure 5.8.0.1017.19+21.04.14
...
+ii linux-image-5.8.0-1022-azure 5.8.0-1022.24+21.04.2

Could you please prepare an image that has the older kernel and the newer systemd, to rule out kernel changes causing the issue?

Also, do you still observe the issue in Impish? It has systemd 248 now.

** Changed in: systemd (Ubuntu)
       Status: New => Incomplete
[Bug 1919177] Re: Azure: issues with accelerated networking on Hirsute
Hi there,

Is there any update on this issue? I would like to understand who owns the investigation/debugging process. Please tell us if you need any help from the CPC team.
[Bug 1919177] Re: Azure: issues with accelerated networking on Hirsute
** Tags removed: rls-hh-incoming
[Bug 1919177] Re: Azure: issues with accelerated networking on Hirsute
** Tags added: fr-1324
[Bug 1919177] Re: Azure: issues with accelerated networking on Hirsute
** Changed in: cloud-init (Ubuntu)
    Milestone: ubuntu-21.04 => hirsute-updates

** Changed in: linux-azure (Ubuntu)
    Milestone: ubuntu-21.04 => hirsute-updates
[Bug 1919177] Re: Azure: issues with accelerated networking on Hirsute
** Changed in: cloud-init (Ubuntu)
       Status: New => Incomplete
[Bug 1919177] Re: Azure: issues with accelerated networking on Hirsute
I attach here the full diff of packages between the two serials.

** Attachment added: "Package diff between 20210219 and 20210220 20.04 Azure images"
   https://bugs.launchpad.net/ubuntu/+source/systemd/+bug/1919177/+attachment/5491010/+files/package-diff.txt
[Bug 1919177] Re: Azure: issues with accelerated networking on Hirsute
I isolated the issue between these two serials: 20210219 (doesn't have the issue) and 20210220 (reproduces the issue).

The main difference I can see between those two serials is the systemd version:

20210219 -> 247.1-4ubuntu1
20210220 -> 247.3-1ubuntu2

The cloud-init version is the same (20.4.1-79-g71564dce-0ubuntu1).

I will try to get a better package diff between those two images to understand if anything else significant changed.
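A comparison of this kind can be scripted. The following is a minimal sketch, not the tooling actually used for this bug: it assumes each image's package list was saved as `dpkg -l` text, and the helper names `parse_dpkg_list` and `changed_packages` are hypothetical.

```python
def parse_dpkg_list(text):
    """Map package name -> version for installed ("ii") rows of `dpkg -l` output."""
    packages = {}
    for line in text.splitlines():
        fields = line.split()
        # Installed packages start with the status flag "ii", followed by
        # name, version, architecture and description.
        if len(fields) >= 3 and fields[0] == "ii":
            packages[fields[1]] = fields[2]
    return packages


def changed_packages(old_text, new_text):
    """Return {name: (old_version, new_version)} for packages whose version differs.

    A package present in only one listing is reported with None on the other side.
    """
    old, new = parse_dpkg_list(old_text), parse_dpkg_list(new_text)
    return {
        name: (old.get(name), new.get(name))
        for name in sorted(set(old) | set(new))
        if old.get(name) != new.get(name)
    }


if __name__ == "__main__":
    # Versions are the ones quoted in this comment; the architecture and
    # description columns are filler for the example.
    serial_20210219 = (
        "ii  cloud-init  20.4.1-79-g71564dce-0ubuntu1  all    cloud init tool\n"
        "ii  systemd     247.1-4ubuntu1                amd64  service manager\n"
    )
    serial_20210220 = (
        "ii  cloud-init  20.4.1-79-g71564dce-0ubuntu1  all    cloud init tool\n"
        "ii  systemd     247.3-1ubuntu2                amd64  service manager\n"
    )
    for name, (before, after) in changed_packages(serial_20210219, serial_20210220).items():
        print(f"{name}: {before} -> {after}")
```

Only packages whose versions differ are reported, so an unchanged cloud-init drops out and the systemd bump stands out.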
[Bug 1919177] Re: Azure: issues with accelerated networking on Hirsute
** Also affects: systemd (Ubuntu)
   Importance: Undecided
       Status: New
[Bug 1919177] Re: Azure: issues with accelerated networking on Hirsute
** Description changed:

  [General]
  On Azure, when provisioning a Hirsute VM with Accelerated Networking
- enabled, sometimes the SSH key is not setup properly and the user cannot
- log into the VM.
+ enabled, sometimes part of the cloud-init configuration is not applied.
+ 
+ Especially, in those cases, the SSH key is not setup properly and the
+ user cannot log into the VM.
  
  [how to reproduce]
  Start a VM with AN enabled:
  ```
  az vm create --name "$VM_NAME" --resource-group "$GROUP" --location "UK South" --image 'Canonical:0001-com-ubuntu-server-hirsute-daily:21_04-daily-gen2:latest' --size Standard_F8s_v2 --admin-username ubuntu --ssh-key-value "$SSH_KEY" --accelerated-networking
  ```
  
  After a moment, try to SSH: if you succeed, delete and recreate a new VM.
  
  [troubleshooting]
- To be able to connect into the VM to debug, run:
+ To be able to connect into the VM, run:
  
- ```
+ 
  az vm run-command invoke -g "$GROUP" -n "$VM_NAME" --command-id RunShellScript --scripts "sudo -u ubuntu ssh-import-id $LP_USERNAME"
  ```
  
  In "/run/cloud-init/instance-data.json", I can see:
  ```
- "publicKeys": [
-   {
-     "keyData": "",
-     "path": "/home/ubuntu/.ssh/authorized_keys"
-   }
- ],
+ "publicKeys": [
+   {
+     "keyData": "",
+     "path": "/home/ubuntu/.ssh/authorized_keys"
+   }
+ ],
  ```
  as expected.
+ 
+ [workaround]
+ 
+ As mentioned, Azure allows the user to run command into the VM without
+ SSH connection. To do so, one can use the Azure CLI:
+ 
+ 
+ az vm run-command invoke -g "$GROUP" -n "$VM_NAME" --command-id RunShellScript --scripts "sudo -u ubuntu ssh-import-id $LP_USERNAME"
+ 
+ This example uses "ssh-import-id" but it's also possible to just echo a
+ given public key into /home/ubuntu/.ssh/authorized_keys
+ 
+ NOTE: this will only solve the SSH issue, I do not know if this bug
+ affects other things. If so the user would have to apply those things
+ manually.
** Description changed:

  [General]
  On Azure, when provisioning a Hirsute VM with Accelerated Networking
  enabled, sometimes part of the cloud-init configuration is not applied.
  
- Especially, in those cases, the SSH key is not setup properly and the
- user cannot log into the VM.
+ Especially, in those cases, the public SSH key is not setup properly.
  
  [how to reproduce]
  Start a VM with AN enabled:
  ```
  az vm create --name "$VM_NAME" --resource-group "$GROUP" --location "UK South" --image 'Canonical:0001-com-ubuntu-server-hirsute-daily:21_04-daily-gen2:latest' --size Standard_F8s_v2 --admin-username ubuntu --ssh-key-value "$SSH_KEY" --accelerated-networking
  ```
  
  After a moment, try to SSH: if you succeed, delete and recreate a new VM.
  
  [troubleshooting]
  To be able to connect into the VM, run:
  
- 
  az vm run-command invoke -g "$GROUP" -n "$VM_NAME" --command-id RunShellScript --scripts "sudo -u ubuntu ssh-import-id $LP_USERNAME"
  ```
  
  In "/run/cloud-init/instance-data.json", I can see:
  ```
  "publicKeys": [
    {
      "keyData": "",
      "path": "/home/ubuntu/.ssh/authorized_keys"
    }
  ],
  ```
  as expected.
  
  [workaround]
  
  As mentioned, Azure allows the user to run command into the VM without
  SSH connection. To do so, one can use the Azure CLI:
  
- 
- az vm run-command invoke -g "$GROUP" -n "$VM_NAME" --command-id RunShellScript --scripts "sudo -u ubuntu ssh-import-id $LP_USERNAME"
+ az vm run-command invoke -g "$GROUP" -n "$VM_NAME" --command-id
+ RunShellScript --scripts "sudo -u ubuntu ssh-import-id $LP_USERNAME"
  
  This example uses "ssh-import-id" but it's also possible to just echo a
  given public key into /home/ubuntu/.ssh/authorized_keys
  
  NOTE: this will only solve the SSH issue, I do not know if this bug
  affects other things. If so the user would have to apply those things
  manually.
[Bug 1919177] Re: Azure: issues with accelerated networking on Hirsute
The issue probably* started between these two serials: 20210215 and 20210304.

*I say "probably" because, while I'm sure I can reproduce the issue with 20210304, the fact that I didn't manage to reproduce it with 20210215 (in ~15 tries) doesn't **absolutely prove** that the issue wasn't there.
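To put a number on that caveat: if the bug hits each boot independently with probability p, the chance of seeing no failure in n boots of an affected image is (1 - p)^n. The sketch below is only illustrative. The 10% failure rate is an assumed figure, not a measurement from this bug, and the function names are hypothetical.

```python
import math


def prob_all_pass(failure_rate, tries):
    """Probability that `tries` independent boots all succeed on an affected image."""
    return (1.0 - failure_rate) ** tries


def tries_needed(failure_rate, confidence):
    """Smallest number of clean boots needed to rule out the bug at `confidence`."""
    # Solve (1 - failure_rate) ** n <= 1 - confidence for n.
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - failure_rate))


if __name__ == "__main__":
    assumed_rate = 0.10  # assumption for illustration only, not measured
    print(f"P(15 clean boots on an affected image) = {prob_all_pass(assumed_rate, 15):.3f}")
    print(f"Clean boots needed for 95% confidence = {tries_needed(assumed_rate, 0.95)}")
```

Under that assumed rate, 15 clean boots would still occur about one time in five on an affected image, and roughly 29 clean boots would be needed to reach 95% confidence. A lower real failure rate would require more tries.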
[Bug 1919177] Re: Azure: issues with accelerated networking on Hirsute
Hi,

OK, thanks. I am now trying to find when the issue was first introduced. So far I have tested an image from 20201222 and can confirm that I cannot reproduce the issue with it.

Then, by modifying the image (always before first boot), I was able to find that:
- upgrading cloud-init from 20.4-0ubuntu1 to 21.1-19-gbad84ad4-0ubuntu2: I still cannot reproduce the issue
- upgrading systemd from 246.6-5ubuntu1 to 247.3-1ubuntu4: I couldn't reproduce the issue
- upgrading BOTH cloud-init and systemd at the same time: I was able to reproduce the issue

I am now testing newer images to find the first image that introduces the issue.
[Bug 1919177] Re: Azure: issues with accelerated networking on Hirsute
Hi,

I don't think that's a viable option, as cloud-init may very well be the component that allows the network interface to be configured with a routable address. So we can't wait for such an address to be available before configuring the machine, at least not in general.
[Bug 1919177] Re: Azure: issues with accelerated networking on Hirsute
I don't have any particular knowledge of how systemd waits for the network to be ready, but reading systemd-networkd-wait-online.service(8) and networkctl(1) I wonder if cloud-init shouldn't wait for networkd-wait-online to report that the link is "routable" instead of "degraded" (the default minimum).

"/usr/lib/systemd/systemd-networkd-wait-online -o routable"
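For anyone who wants to try this by hand on a test VM, the call could be wrapped roughly as follows. This is a sketch only and not what cloud-init does: the binary path and `-o routable` are the ones quoted in this comment, `--timeout` is an option documented in systemd-networkd-wait-online(8), and the helper names are hypothetical.

```python
import subprocess

WAIT_ONLINE = "/usr/lib/systemd/systemd-networkd-wait-online"


def wait_online_command(min_state="routable", timeout_s=30):
    """Build the argument list for systemd-networkd-wait-online."""
    return [WAIT_ONLINE, "-o", min_state, f"--timeout={timeout_s}"]


def wait_until_routable(timeout_s=30):
    """Return True if the tool exits 0, i.e. the requested state was reached in time."""
    result = subprocess.run(wait_online_command("routable", timeout_s), check=False)
    return result.returncode == 0


if __name__ == "__main__":
    # Must run on a machine managed by systemd-networkd.
    print("routable" if wait_until_routable() else "not routable before timeout")
```

Keeping the command construction separate from the call makes it easy to compare "routable" against the default "degraded" on the same VM.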
[Bug 1919177] Re: Azure: issues with accelerated networking on Hirsute
Hi there,

Regarding whether or not the issue affects Groovy: I've been creating Groovy VMs in a loop to see if I could reproduce it there and, after ~50 tries, I still haven't seen it happen. I think we can safely assume that this issue does not affect Groovy.

Also, I was able to reproduce the issue with gen1 Hyper-V (before, I was only using gen2).

To continue the system log investigation, I can add the following:

GOOD
Apr 19 08:53:43 hirsute-man-2 systemd[1]: Finished Wait for Network to be Configured.
Apr 19 08:53:43 hirsute-man-2 systemd-networkd[660]: eth0: Gained carrier
Apr 19 08:53:43 hirsute-man-2 systemd-networkd[660]: eth0: DHCPv4 address 10.0.0.4/24 via 10.0.0.1
Apr 19 08:53:44 hirsute-man-2 cloud-init[667]: Cloud-init v. 21.1-19-gbad84ad4-0ubuntu2 running 'init' at Mon, 19 Apr 2021 08:53:44 +. Up 84.81 seconds.
Apr 19 08:53:48 hirsute-man-2 systemd-networkd[660]: enP6454s1: Link UP
Apr 19 08:53:48 hirsute-man-2 systemd-networkd[660]: enP6454s1: Gained carrier

BAD
Apr 19 08:33:48 groovy-acc-UWP5F systemd[1]: Finished Wait for Network to be Configured.
Apr 19 08:33:48 groovy-acc-UWP5F systemd-networkd[616]: eth0: Gained carrier
Apr 19 08:33:48 groovy-acc-UWP5F cloud-init[626]: Cloud-init v. 21.1-19-gbad84ad4-0ubuntu2 running 'init' at Mon, 19 Apr 2021 08:33:48 +. Up 7.34 seconds.
Apr 19 08:33:48 groovy-acc-UWP5F systemd-networkd[616]: enP30932s1np0: Gained carrier
Apr 19 08:33:51 groovy-acc-UWP5F systemd-networkd[616]: eth0: DHCPv4 address 10.0.0.4/24 via 10.0.0.1

On GOOD, DHCP is configured before cloud-init runs; on BAD it's done after. This is "obvious" considering the error raised by cloud-init, but I just wanted to underline it.
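The GOOD/BAD distinction in the preceding comment can be checked mechanically on a saved syslog. A minimal sketch, assuming lines in the syslog format quoted there and that all lines are from the same day; the function names are hypothetical and this is not part of cloud-init.

```python
import re

# Matches the "HH:MM:SS" part of a syslog prefix such as "Apr 19 08:53:43 host ...".
_TIME = re.compile(r"^\w{3}\s+\d+\s+(\d{2}):(\d{2}):(\d{2})\s")


def _seconds(line):
    """Seconds since midnight taken from the syslog timestamp, or None."""
    match = _TIME.match(line)
    if not match:
        return None
    hours, minutes, seconds = (int(part) for part in match.groups())
    return hours * 3600 + minutes * 60 + seconds


def first_time(lines, needle):
    """Timestamp of the first line containing `needle`, or None if absent."""
    for line in lines:
        if needle in line:
            return _seconds(line)
    return None


def dhcp_before_cloud_init(lines):
    """True when the DHCPv4 lease is logged no later than cloud-init's 'init' stage.

    Syslog timestamps have one-second resolution, so events in the same
    second are treated as "DHCP first"; returns None if either event is missing.
    """
    dhcp = first_time(lines, "DHCPv4 address")
    init = first_time(lines, "running 'init'")
    if dhcp is None or init is None:
        return None
    return dhcp <= init


if __name__ == "__main__":
    import sys

    # Usage: python3 check_order.py /var/log/syslog
    with open(sys.argv[1], encoding="utf-8", errors="replace") as handle:
        print(dhcp_before_cloud_init(handle.read().splitlines()))
```

A result of False corresponds to the BAD ordering, where cloud-init starts its 'init' stage before eth0 has a DHCPv4 address.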
[Bug 1919177] Re: Azure: issues with accelerated networking on Hirsute
Hi again Guilherme,

I don't think there's any useful information hiding in the fact that cloud-init logs some messages as DataSourceAzure.py[DEBUG] and others as azure.py[DEBUG]. Look at:

https://github.com/canonical/cloud-init/blob/45db197cfc7e3488baae7dc1053c45da070248f6/cloudinit/sources/DataSourceAzure.py#L691

See that some messages are logged with LOG.debug(), while others are logged with report_diagnostic_event(). The latter function is defined in azure.py, hence the difference. Had LOG.debug() been used everywhere, we would always see DataSourceAzure.py[DEBUG].

I'm not sure what the rationale was for the two ways of logging; maybe report_diagnostic_event() was used for "more useful" debugging messages? Anyway, I doubt it will help debugging this issue.
[Bug 1919177] Re: Azure: issues with accelerated networking on Hirsute
Thanks Paride. I still think we should understand precisely that difference in the logs, since in the BAD case we always see the "azure.py" messages, not the other one. This could be related, or at least a clue to the root cause.

Regarding the kernel side, I've built a 5.11 kernel with a debug patch [0]. I'm attaching the patch here; it is very simple, just a parameter-controlled delay in the carrier notification. Unfortunately gjolly tried it in a custom image and the issue didn't reproduce. My theory is that just delaying the notification is not enough: due to the complex SR-IOV multi-interface nature of Hyper-V, maybe there is network connectivity even before the carrier is fully UP, so the debug patch could perhaps be extended to block packet transmission in mlx5 for N seconds.

I have a feeling that Groovy should reproduce this, as discussed with gjolly: in our first reproducer, we had a Hirsute image with the Groovy 5.8 kernel, and the cloud-init versions in Groovy and Hirsute are very similar. So, if it reproduces in Groovy, it definitely shouldn't be a release blocker.

Thanks!

[0] https://launchpad.net/~gpiccoli/+archive/ubuntu/test1919177/

** Patch added: "DBG-mlx5-Add-delaylink-parameter-to-delay-Link-up-event-.patch"
   https://bugs.launchpad.net/cloud-init/+bug/1919177/+attachment/5488894/+files/DBG-mlx5-Add-delaylink-parameter-to-delay-Link-up-event-.patch
[Bug 1919177] Re: Azure: issues with accelerated networking on Hirsute
Hi Guilherme,

Answering your question in comment #6:

> Why one method executes from DataSourceAzure.py
> whereas the other from azure.py?

The reason is not that interesting and won't really help here. Some log messages generated in DataSourceAzure.py are logged using a helper function defined in azure.py, while others are logged directly from DataSourceAzure.py. Honestly, I'm not completely sure why we have this difference; it could be in part a legacy thing. In any case, the interesting logic is all in DataSourceAzure.py.
[Bug 1919177] Re: Azure: issues with accelerated networking on Hirsute
Hi,

Thanks for looking into that.

Indeed, while trying to reproduce the issue this morning, I found it more challenging than I originally thought. I want to add a few points here on how I reproduced the issue:

1. Usually, I do not use the Azure CLI directly. I use a custom CLI of my own that uses the Azure SDK. This custom CLI always creates the NIC (with AN) before creating the VM. The VM is created with the existing NIC. I don't know how the Azure CLI handles the "--accelerated-networking" flag under the hood; maybe it does something different that makes the issue harder to reproduce.

2. (For cost reasons) I always create a new resource group when I create a new VM. Once again, I don't know if it has any impact on the reproducibility.

3. This morning I managed to reproduce the issue using only the Azure CLI, after a few (unsuccessful) tries:

➜ ~ az group create --resource-group hirsute-acc-manual-1 --location 'UK South'
{
  "id": "/subscriptions/5059ce5a-a72d-4085-acb7-33b421daa1ee/resourceGroups/hirsute-acc-manual-1",
  "location": "uksouth",
  "managedBy": null,
  "name": "hirsute-acc-manual-1",
  "properties": {
    "provisioningState": "Succeeded"
  },
  "tags": null,
  "type": "Microsoft.Resources/resourceGroups"
}
➜ ~ az vm create --name hirsute-acc-manual --resource-group hirsute-acc-manual-1 --location "UK South" --image 'Canonical:0001-com-ubuntu-server-hirsute-daily:21_04-daily-gen2:latest' --size Standard_F8s_v2 --admin-username ubuntu --ssh-key-value "$(cat ~/.ssh/canonical.pub)" --accelerated-networking
{- Finished ..
  "fqdns": "",
  "id": "/subscriptions/5059ce5a-a72d-4085-acb7-33b421daa1ee/resourceGroups/hirsute-acc-manual-1/providers/Microsoft.Compute/virtualMachines/hirsute-acc-manual",
  "location": "uksouth",
  "macAddress": "00-22-48-40-82-32",
  "powerState": "VM running",
  "privateIpAddress": "10.0.0.4",
  "publicIpAddress": "51.104.198.218",
  "resourceGroup": "hirsute-acc-manual-1",
  "zones": ""
}
➜ ~ ssh -i ~/.ssh/canonical ubuntu@51.104.198.218
The authenticity of host '51.104.198.218 (51.104.198.218)' can't be established.
ECDSA key fingerprint is SHA256:wIQAUjmIeFvdBeqT5a2RHJEpDtjCnrJ+FggR8pzW7OM.
Are you sure you want to continue connecting (yes/no/[fingerprint])? yes
Warning: Permanently added '51.104.198.218' (ECDSA) to the list of known hosts.
ubuntu@51.104.198.218: Permission denied (publickey).

4. I will post the full syslog file here, but I also want to point out that I THINK this issue only appears with mlx5 devices/drivers. When I was checking the VM created with no issue, mlx4 modules were loaded. On the previous VM, I can see:

ubuntu@hirsute-acc-manual:~$ lsmod | grep mlx
mlx5_ib               331776  0
ib_uverbs             139264  1 mlx5_ib
ib_core               348160  2 ib_uverbs,mlx5_ib
mlx5_core            1081344  1 mlx5_ib
tls                    90112  1 mlx5_core
mlxfw                  36864  1 mlx5_core

Once again, I don't know if that really matters.

** Attachment added: "syslog"
   https://bugs.launchpad.net/cloud-init/+bug/1919177/+attachment/5487677/+files/syslog
[Bug 1919177] Re: Azure: issues with accelerated networking on Hirsute
(b) Regarding kernel, I found some oddities too [ output of dmesg | grep -v "audit\|apparmor" ]:

GOOD:
[ 1627.732924] hv_netvsc 00224840-7fbf-0022-4840-7fbf00224840 eth0: VF slot 1 added
[ 1627.733637] hv_pci fb9ea909-d0dd-41b6-a1c2-98b1233e987d: PCI VMBus probing: Using version 0x10002
[ 1627.742469] hv_pci fb9ea909-d0dd-41b6-a1c2-98b1233e987d: PCI host bridge to bus d0dd:00
[ 1627.742472] pci_bus d0dd:00: root bus resource [mem 0xfe000-0xfe00f window]
[ 1627.743208] pci d0dd:00:02.0: [15b3:1016] type 00 class 0x02
[ 1627.747302] pci d0dd:00:02.0: reg 0x10: [mem 0xfe000-0xfe00f 64bit pref]
[ 1627.825728] pci d0dd:00:02.0: enabling Extended Tags
[ 1627.830106] pci d0dd:00:02.0: 0.000 Gb/s available PCIe bandwidth, limited by Unknown x0 link at d0dd:00:02.0 (capable of 63.008 Gb/s with 8.0 GT/s PCIe x8 link)
[ 1627.834774] pci d0dd:00:02.0: BAR 0: assigned [mem 0xfe000-0xfe00f 64bit pref]
[ 1627.926227] mlx5_core d0dd:00:02.0: firmware version: 14.25.8362
[ 1627.936526] mlx5_core d0dd:00:02.0: handle_hca_cap:526:(pid 619): log_max_qp value in current profile is 18, changing it to HCA capability limit (12)
[ 1628.131452] mlx5_core d0dd:00:02.0: MLX5E: StrdRq(0) RqSz(1024) StrdSz(256) RxCqeCmprss(0)
[ 1628.256107] hv_netvsc 00224840-7fbf-0022-4840-7fbf00224840 eth0: VF registering: eth1
[ 1628.256147] mlx5_core d0dd:00:02.0 eth1: joined to eth0
[ 1628.257059] mlx5_core d0dd:00:02.0 eth1: Disabling LRO, not supported in legacy RQ
[ 1628.266117] mlx5_core d0dd:00:02.0 eth1: Disabling LRO, not supported in legacy RQ
[ 1628.266803] mlx5_core d0dd:00:02.0 enP53469s1np0: renamed from eth1
[ 1628.305588] mlx5_core d0dd:00:02.0 enP53469s1np0: Disabling LRO, not supported in legacy RQ
[ 1628.444056] mlx5_core d0dd:00:02.0 enP53469s1np0: Link up
[ 1628.445592] hv_netvsc 00224840-7fbf-0022-4840-7fbf00224840 eth0: Data path switched to VF: enP53469s1np0

BAD:
[5.211059] hv_netvsc 000d3ad6-1871-000d-3ad6-1871000d3ad6 eth0: VF slot 1 added
[5.211618] hv_pci e94e34d7-6e53-4ab5-95d1-4328102a7c87: PCI VMBus probing: Using version 0x10002
[5.220736] hv_pci e94e34d7-6e53-4ab5-95d1-4328102a7c87: PCI host bridge to bus 6e53:00
[5.220739] pci_bus 6e53:00: root bus resource [mem 0xfe000-0xfe00f window]
[5.221465] pci 6e53:00:02.0: [15b3:1016] type 00 class 0x02
[5.239844] pci 6e53:00:02.0: reg 0x10: [mem 0xfe000-0xfe00f 64bit pref]
[5.317841] pci 6e53:00:02.0: enabling Extended Tags
[5.322168] pci 6e53:00:02.0: 0.000 Gb/s available PCIe bandwidth, limited by Unknown x0 link at 6e53:00:02.0 (capable of 63.008 Gb/s with 8.0 GT/s PCIe x8 link)
[5.325766] pci 6e53:00:02.0: BAR 0: assigned [mem 0xfe000-0xfe00f 64bit pref]
[5.412415] mlx5_core 6e53:00:02.0: firmware version: 14.25.8102
[5.424153] mlx5_core 6e53:00:02.0: handle_hca_cap:526:(pid 7): log_max_qp value in current profile is 18, changing it to HCA capability limit (12)
[5.613614] mlx5_core 6e53:00:02.0: MLX5E: StrdRq(0) RqSz(1024) StrdSz(256) RxCqeCmprss(0)
[5.727686] hv_netvsc 000d3ad6-1871-000d-3ad6-1871000d3ad6 eth0: VF registering: eth1
[5.727714] mlx5_core 6e53:00:02.0 eth1: joined to eth0
[5.728473] mlx5_core 6e53:00:02.0 eth1: Disabling LRO, not supported in legacy RQ
[5.740989] mlx5_core 6e53:00:02.0 eth1: Disabling LRO, not supported in legacy RQ
[5.741670] mlx5_core 6e53:00:02.0 enP28243s1np0: renamed from eth1
[5.785980] mlx5_core 6e53:00:02.0 enP28243s1np0: Disabling LRO, not supported in legacy RQ
[8.790213] mlx5_core 6e53:00:02.0 enP28243s1np0: Link up
[8.791599] hv_netvsc 000d3ad6-1871-000d-3ad6-1871000d3ad6 eth0: Data path switched to VF: enP28243s1np0
[8.792382] IPv6: ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
[9.759956] hv_netvsc 000d3ad6-1871-000d-3ad6-1871000d3ad6 eth0: Data path switched from VF: enP28243s1np0
[ 10.388005] mlx5_core 6e53:00:02.0 enP28243s1np0: Link up
[ 10.389894] hv_netvsc 000d3ad6-1871-000d-3ad6-1871000d3ad6 eth0: Data path switched to VF: enP28243s1np0
[...]
[79404.014695] hv_netvsc 000d3ad6-1871-000d-3ad6-1871000d3ad6 eth0: Data path switched from VF: enP28243s1np0
[79404.015275] hv_netvsc 000d3ad6-1871-000d-3ad6-1871000d3ad6 eth0: VF unregistering: enP28243s1np0
[79404.988951] hv_netvsc 000d3ad6-1871-000d-3ad6-1871000d3ad6 eth0: VF slot 1 removed
[79536.467267] hv_netvsc 000d3ad6-1871-000d-3ad6-1871000d3ad6 eth0: VF slot 1 added
[79536.467725] hv_pci e94e34d7-6e53-4ab5-95d1-4328102a7c87: PCI VMBus probing: Using version 0x10002
[79536.477073] hv_pci e94e34d7-6e53-4ab5-95d1-4328102a7c87: PCI host bridge to bus 6e53:00
[79536.477075] pci_bus 6e53:00: root bus resource [mem 0xfe000-0xfe00f window]
[79536.477816] pci 6e53:00:02.0: [15b3:1016] type 00 class 0x02
[79536.491917] pci 6e53:00:02.0: reg 0x10: [mem 0xfe000-0xfe00f 64bit pref]
[79536.883663] pci 6e53:00:02.0: enabling Extended Tags
[79536.889425] pci 6e53:00:02.0:
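The difference between the GOOD and BAD dmesg excerpts is easier to spot when only the hv_netvsc "Data path switched" messages are pulled out. The following is a minimal Python sketch and not part of the original comment: the function names are hypothetical, and the regex assumes only the message format visible in the excerpts above.

```python
import re

# Matches the hv_netvsc messages quoted in the dmesg excerpts, e.g.
# "[9.759956] hv_netvsc <guid> eth0: Data path switched from VF: enP28243s1np0".
# Assumption: the format is exactly as shown in this thread.
SWITCH_RE = re.compile(
    r"\[\s*(?P<ts>\d+\.\d+)\]\s+hv_netvsc\s+\S+\s+(?P<synthetic>\S+): "
    r"Data path switched (?P<direction>to|from) VF: (?P<vf>\S+)"
)


def data_path_switches(dmesg_text):
    """Return (timestamp, direction, vf_name) for each data path switch."""
    return [
        (float(m.group("ts")), m.group("direction"), m.group("vf"))
        for m in SWITCH_RE.finditer(dmesg_text)
    ]


def count_fallbacks(dmesg_text):
    """Count switches away from the VF, back to the synthetic NIC."""
    return sum(
        1 for _, direction, _ in data_path_switches(dmesg_text)
        if direction == "from"
    )


if __name__ == "__main__":
    import sys

    # Example: dmesg | python3 this_script.py
    for ts, direction, vf in data_path_switches(sys.stdin.read()):
        print(f"{ts:>14.6f}  switched {direction} VF {vf}")
```

Run over the BAD excerpt, this would list a switch away from the VF at 9.759956 and another at 79404.014695, while the GOOD excerpt contains only the single switch to the VF.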
[Bug 1919177] Re: Azure: issues with accelerated networking on Hirsute
I'm also taking a look at this one. I couldn't reproduce it (tried 11 times, with the same az-cli command line provided by gjolly). I found 2 interesting things after Gauthier provided me access to one of his failing instances:

(a) Regarding cloud-init, I see the following in the logs (comparing a GOOD and a BAD instance):

GOOD:
2021-04-13 19:19:09,341 - DataSourceAzure.py[DEBUG]: Retrieving public SSH keys
2021-04-13 19:19:09,341 - azure.py[DEBUG]: Unable to get keys from IMDS, falling back to OVF
2021-04-13 19:19:09,341 - DataSourceAzure.py[DEBUG]: Retrieved keys from OVF
2021-04-13 19:19:09,342 - handlers.py[DEBUG]: finish: azure-ds/get_public_ssh_keys: SUCCESS: get_public_ssh_keys
2021-04-13 19:19:09,342 - util.py[DEBUG]: Changing the ownership of /home/ubuntu/.ssh to 1000:1000
2021-04-13 19:19:09,343 - util.py[DEBUG]: Reading from /etc/ssh/sshd_config (quiet=False)
2021-04-13 19:19:09,343 - util.py[DEBUG]: Read 3287 bytes from /etc/ssh/sshd_config
2021-04-13 19:19:09,343 - util.py[DEBUG]: Writing to /home/ubuntu/.ssh/authorized_keys - wb: [600] 381 bytes
2021-04-13 19:19:09,343 - util.py[DEBUG]: Changing the ownership of /home/ubuntu/.ssh/authorized_keys to 1000:1000
2021-04-13 19:19:09,344 - util.py[DEBUG]: Changing the ownership of /root/.ssh to 0:0
2021-04-13 19:19:09,344 - util.py[DEBUG]: Reading from /etc/ssh/sshd_config (quiet=False)
2021-04-13 19:19:09,344 - util.py[DEBUG]: Read 3287 bytes from /etc/ssh/sshd_config
2021-04-13 19:19:09,344 - util.py[DEBUG]: Writing to /root/.ssh/authorized_keys - wb: [600] 545 bytes

BAD:
2021-04-12 08:25:07,412 - DataSourceAzure.py[DEBUG]: Retrieving public SSH keys
2021-04-12 08:25:07,412 - azure.py[DEBUG]: Unable to get keys from IMDS, falling back to OVF
2021-04-12 08:25:07,412 - azure.py[DEBUG]: No keys available from OVF
2021-04-12 08:25:07,412 - handlers.py[DEBUG]: finish: azure-ds/get_public_ssh_keys: SUCCESS: get_public_ssh_keys
2021-04-12 08:25:07,413 - util.py[DEBUG]: Changing the ownership of /home/ubuntu/.ssh to 1000:1000
2021-04-12 08:25:07,413 - util.py[DEBUG]: Reading from /etc/ssh/sshd_config (quiet=False)
2021-04-12 08:25:07,413 - util.py[DEBUG]: Read 3287 bytes from /etc/ssh/sshd_config
2021-04-12 08:25:07,414 - util.py[DEBUG]: Writing to /home/ubuntu/.ssh/authorized_keys - wb: [600] 0 bytes
2021-04-12 08:25:07,414 - util.py[DEBUG]: Changing the ownership of /home/ubuntu/.ssh/authorized_keys to 1000:1000
2021-04-12 08:25:07,414 - util.py[DEBUG]: Changing the ownership of /root/.ssh to 0:0
2021-04-12 08:25:07,414 - util.py[DEBUG]: Reading from /etc/ssh/sshd_config (quiet=False)
2021-04-12 08:25:07,415 - util.py[DEBUG]: Read 3287 bytes from /etc/ssh/sshd_config
2021-04-12 08:25:07,415 - util.py[DEBUG]: Writing to /root/.ssh/authorized_keys - wb: [600] 0 bytes

So, the main difference here is:

2021-04-13 19:19:09,341 - DataSourceAzure.py[DEBUG]: Retrieved keys from OVF
vs
2021-04-12 08:25:07,412 - azure.py[DEBUG]: No keys available from OVF

Why does one method execute from DataSourceAzure.py whereas the other executes from azure.py? I'm far from an expert in cloud-init, so I'll defer that question to the cloud-init folks.

Will continue in next comment.
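The GOOD/BAD comparison above comes down to two things in the cloud-init log: which OVF message was logged, and whether authorized_keys was written with 0 bytes. As a rough illustration only (not part of the original comment), the Python sketch below checks a log for both. The function names are hypothetical, and the patterns assume nothing beyond the message texts quoted in this thread.

```python
import re

# Message texts copied from the log excerpts quoted in this thread.
RETRIEVED = "Retrieved keys from OVF"
MISSING = "No keys available from OVF"

# "Writing to <path> - wb: [600] <n> bytes", as logged by util.py in the excerpts.
WRITE_RE = re.compile(
    r"Writing to (?P<path>\S*authorized_keys) - wb: \[\d+\] (?P<size>\d+) bytes"
)


def ovf_key_status(log_text):
    """Classify the log as 'retrieved', 'missing', or 'unknown'."""
    if RETRIEVED in log_text:
        return "retrieved"
    if MISSING in log_text:
        return "missing"
    return "unknown"


def empty_authorized_keys(log_text):
    """Return the authorized_keys paths that were written with 0 bytes."""
    return [
        m.group("path")
        for m in WRITE_RE.finditer(log_text)
        if int(m.group("size")) == 0
    ]
```

On the BAD excerpt this would report the status "missing" and both /home/ubuntu/.ssh/authorized_keys and /root/.ssh/authorized_keys as empty, which matches the 0-byte writes shown in the log.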
[Bug 1919177] Re: Azure: issues with accelerated networking on Hirsute
gjolly - can you attach the whole /var/log/syslog, captured soon after boot? I see the same pattern of link UP/DOWN/UP on a non-accelerated Hirsute instance, but systemd correctly waits for networking to be configured in this case.
[Bug 1919177] Re: Azure: issues with accelerated networking on Hirsute
** Also affects: linux-azure (Ubuntu) Importance: Undecided Status: New ** Changed in: linux-azure (Ubuntu) Milestone: None => ubuntu-21.04
[Bug 1919177] Re: Azure: issues with accelerated networking on Hirsute
** Also affects: cloud-init (Ubuntu) Importance: Undecided Status: New ** Changed in: cloud-init (Ubuntu) Milestone: None => ubuntu-21.04 ** Tags added: rls-hh-incoming