[Bug 1195524] Re: race condition / transient failure to provision
** Changed in: walinuxagent (Ubuntu Precise)
       Status: Fix Committed => Fix Released

--
You received this bug notification because you are a member of Ubuntu Server Team, which is subscribed to walinuxagent in Ubuntu.
https://bugs.launchpad.net/bugs/1195524

Title:
  race condition / transient failure to provision

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/walinuxagent/+bug/1195524/+subscriptions

--
Ubuntu-server-bugs mailing list
Ubuntu-server-bugs@lists.ubuntu.com
Modify settings or unsubscribe at: https://lists.ubuntu.com/mailman/listinfo/ubuntu-server-bugs
[Bug 1195524] Re: race condition / transient failure to provision
This bug was fixed in the package walinuxagent - 1.3.2-0ubuntu2~13.04.1

---
walinuxagent (1.3.2-0ubuntu2~13.04.1) raring-proposed; urgency=low

  * Backport of 1.3.2-0ubuntu5 from 13.10
  * disable ephemeral disk formatting by default (LP: #1231490)
  * debian/patches/shadow_permissions.patch: apply the appropriate
    permissions to /etc/shadow (LP: #1188820).
  * debian/patches/verbose_logging.patch: use the appropriate log
    facility when using verbose logging (LP: #1193404).
  * Mark bugs fixed in 1.3.2-0ubuntu3:
    debian/patches/config_for_cloud-init.patch:
    - fix for race condition between cloud-init and waagent (LP: #1195524)
    - mount resource disk on /mnt (LP: #1193380)
    - move walinuxagent init functionality to cloud-init (LP: #1037723)
  * Add requirement of cloud-init (LP: #1037723).

 -- Ben Howard <ben.how...@ubuntu.com>  Thu, 10 Oct 2013 09:24:46 -0600

** Changed in: walinuxagent (Ubuntu Raring)
       Status: Fix Committed => Fix Released
[Bug 1195524] Re: race condition / transient failure to provision
** Changed in: walinuxagent (Ubuntu Precise)
       Status: New => Fix Committed

** Changed in: walinuxagent (Ubuntu Precise)
   Importance: Undecided => Medium

** Changed in: walinuxagent (Ubuntu Raring)
   Importance: Undecided => Medium

** Changed in: walinuxagent (Ubuntu)
     Assignee: (unassigned) => Ben Howard (utlemming)

** Changed in: walinuxagent (Ubuntu Precise)
     Assignee: (unassigned) => Ben Howard (utlemming)

** Changed in: walinuxagent (Ubuntu Raring)
     Assignee: (unassigned) => Ben Howard (utlemming)
[Bug 1195524] Re: race condition / transient failure to provision
Hello Scott, or anyone else affected,

Accepted walinuxagent into raring-proposed. The package will build now and be available at http://launchpad.net/ubuntu/+source/walinuxagent/1.3.2-0ubuntu2~13.04.1 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Your feedback will aid us in getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested, and change the tag from verification-needed to verification-done. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed. In either case, details of your testing will help us make a better decision.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance!

** Also affects: walinuxagent (Ubuntu Precise)
   Importance: Undecided
       Status: New

** Also affects: walinuxagent (Ubuntu Raring)
   Importance: Undecided
       Status: New

** Changed in: walinuxagent (Ubuntu Raring)
       Status: New => Fix Committed
[Bug 1195524] Re: race condition / transient failure to provision
Marking verification-done as part of SRU testing.

** Tags added: verification-done
[Bug 1195524] Re: race condition / transient failure to provision
** Description changed:

+ [Impact]:
+ WALinuxAgent currently handles the provisioning of Ubuntu. This bug is fixed by Bug #1037723, which causes provisioning to be handled by cloud-init.
+
+ [Regression]: By moving provisioning functions to cloud-init, the
+ regression potential is low. Further, since this is handled by a
+ configuration change in WALinuxAgent, current users will not be
+ affected.
+
+ [Test Case]: Boot on Windows Azure and make sure that the system
+ is provisioned.
+
+
+ [ORIGINAL REPORT]
+
  when starting instances on azure, there is a fair chance (seems more likely for some users than others) that the instance will fail to reach provisioned state. I do not have a good guess as to why this is.

  ProblemType: Bug
  DistroRelease: Ubuntu 13.04
  Package: walinuxagent 1.3.2-0ubuntu1
  ProcVersionSignature: Ubuntu 3.8.0-25.37-generic 3.8.13
  Uname: Linux 3.8.0-25-generic x86_64
  ApportVersion: 2.9.2-0ubuntu8.1
  Architecture: amd64
  Date: Fri Jun 28 01:29:25 2013
  MarkForUpload: True
  ProcEnviron:
-  TERM=screen
-  PATH=(custom, no user)
-  LANG=en_US.UTF-8
-  SHELL=/bin/bash
+  TERM=screen
+  PATH=(custom, no user)
+  LANG=en_US.UTF-8
+  SHELL=/bin/bash
  SourcePackage: walinuxagent
  UpgradeStatus: No upgrade log present (probably fresh install)
[Bug 1195524] Re: race condition / transient failure to provision
** Branch linked: lp:ubuntu/walinuxagent
[Bug 1195524] Re: race condition / transient failure to provision
This bug was fixed in the package walinuxagent - 1.3.2-0ubuntu4

---
walinuxagent (1.3.2-0ubuntu4) saucy; urgency=low

  * debian/patches/shadow_permissions.patch: apply the appropriate
    permissions to /etc/shadow (LP: #1188820).
  * debian/patches/verbose_logging.patch: use the appropriate log
    facility when using verbose logging (LP: #1193404).
  * Mark bugs fixed in 1.3.2-0ubuntu3:
    debian/patches/config_for_cloud-init.patch:
    - fix for race condition between cloud-init and waagent (LP: #1195524)
    - mount resource disk on /mnt (LP: #1193380)
    - move walinuxagent init functionality to cloud-init (LP: #1037723)

 -- Ben Howard <ben.how...@ubuntu.com>  Tue, 23 Jul 2013 09:43:40 -0600

** Changed in: walinuxagent (Ubuntu)
       Status: Confirmed => Fix Released
[Bug 1195524] Re: race condition / transient failure to provision
I'm fairly certain there is still a race condition that I described in comment 8. Please raise the hostname issue in another bug.
[Bug 1195524] Re: race condition / transient failure to provision
I did some testing and it turns out the hostname size seems to be the problem!

I started a bunch of machines with a hostname of 64 characters and a bunch of machines with a hostname of 25 characters (with the script I linked above): all the machines with a hostname of 25 ended up in the state Running and all the others were stuck Commissioning. Just to be sure, I had someone else confirm that behavior.

I did some more testing and I can now say that Azure is ok with hostnames up to 55 characters. A machine with a hostname of size 56 or more won't get out of the commissioning state.
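The finding above suggests a simple pre-flight check before a VM is created. What follows is only a minimal sketch and is not part of walinuxagent or gwacl: the function name `check_hostname` and both constants are hypothetical. The value 64 is the documented limit quoted elsewhere in this thread, and 55 is the longest length that provisioned in the testing reported here.

```python
# Hypothetical pre-flight check; names and structure are illustrative only.
DOCUMENTED_MAX = 64  # limit stated in the Azure documentation quoted in this thread
OBSERVED_MAX = 55    # longest hostname that provisioned in the tests reported here


def check_hostname(name):
    """Return 'ok', 'risky' or 'invalid' for a proposed VM hostname."""
    if not name or len(name) > DOCUMENTED_MAX:
        return "invalid"
    try:
        # The documentation describes host names as ASCII strings.
        name.encode("ascii")
    except UnicodeEncodeError:
        return "invalid"
    if len(name) > OBSERVED_MAX:
        # Documented as valid, but observed to hang during provisioning.
        return "risky"
    return "ok"
```

A caller that generates random names, as the linked script does, could shorten any name reported as "risky" before submitting it.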
[Bug 1195524] Re: race condition / transient failure to provision
s/Commissioning/Provisioning/ sorry about that.
[Bug 1195524] Re: race condition / transient failure to provision
> 2013/06/28 15:08:12 EnvMonitor: Detected host name change: ubuntu -> gwaclhostblkhljy4re3yp9swkdwp63kswkss9bqhn0zm3f3gunipzu5vwdr8qzw
> 2013/06/28 15:08:12 Setting host name: gwaclhostblkhljy4re3yp9swkdwp63kswkss9bqhn0zm3f3gunipzu5vwdr8qzw
> Is it expected?

Yes, this machine was created using (a slightly modified version of) this script: http://bazaar.launchpad.net/~gwacl-hackers/gwacl/trunk/view/head:/example/management/run.go

The name is randomly generated. It is a bit long (64 characters) but I guess this cannot be the cause of our problem.
[Bug 1195524] Re: race condition / transient failure to provision
The doc (http://msdn.microsoft.com/en-us/library/windowsazure/jj157194.aspx) explicitly says the hostname can be up to 64 characters long:

HostName: Required. Specifies the host name for the VM. Host names are ASCII character strings 1 to 64 characters in length. Used with the LinuxProvisioningConfigurationSet configuration set.
[Bug 1195524] Re: race condition / transient failure to provision
Hello! Thank you for confirming that. I think the hostname just looked odd, so I just wanted to confirm, as we don't have access to this environment.

If the instance is still available we really need more logs to figure out what went wrong. This will help determine if there's something wrong on the VM (disk or network issue) or a timing issue that we may be able to work around.

We know the timeout set on the socket by httplib is the system default timeout for sockets, so if the following http request from ReportRoleProperties() was 'blocked' it would throw an exception on a timeout or other IO error. However, clearly the EnvMonitor was not able to complete in this case. Most likely the ifup call could not complete for some reason. If this is not a reliable operation then perhaps disabling the HostMonitor is a good solution; of course, the VM's other services may then not be aware of a hostname change.
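To illustrate the timeout point, here is a minimal sketch, assuming Python 3's `http.client` (the successor of the `httplib` module mentioned above). The helper `get_with_timeout` is hypothetical and is not code from waagent. With an explicit timeout, a request that blocks raises an exception instead of waiting indefinitely. Note that in Python `socket.getdefaulttimeout()` returns None (no timeout) unless something has called `socket.setdefaulttimeout()`, so passing the value explicitly may be the safer choice.

```python
import http.client
import socket


def get_with_timeout(host, port, path, timeout):
    """Issue a GET and return the status, or None if the request timed out.

    Hypothetical helper for illustration; not part of waagent.
    """
    conn = http.client.HTTPConnection(host, port, timeout=timeout)
    try:
        conn.request("GET", path)
        return conn.getresponse().status
    except socket.timeout:
        # Raised when connect or read blocks longer than `timeout` seconds.
        return None
    finally:
        conn.close()
```

A caller can treat a None result as a transient failure and retry, rather than leaving a monitoring thread stuck on a blocked read.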
[Bug 1195524] Re: race condition / transient failure to provision
** Attachment added: "waagent.log and waagent dir."
   https://bugs.launchpad.net/ubuntu/+source/walinuxagent/+bug/1195524/+attachment/3717115/+files/waagent-info.tar
[Bug 1195524] Re: race condition / transient failure to provision
OK, so I have 2 other systems that are showing this failure now. I was able to ssh into them, though. walinux-agent had provisioned the user, populated ssh keys and then also started sshd (which it actually should not do). It shouldn't start sshd because it is possibly doing that before sshd has the required facilities up (sshd starts on 'filesystem or runlevel [2345]'). That wouldn't seem to be the problem here, and it actually has allowed us into the instance to debug.

$ ls -tr --full-time /var/log/upstart/*.log
-rw-r- 1 root root 46 2013-06-28 15:07:55.843772000 + /var/log/upstart/container-detect.log
-rw-r- 1 root root 95 2013-06-28 15:07:56.095772000 + /var/log/upstart/console-setup.log
-rw-r- 1 root root 282 2013-06-28 15:07:56.183772000 + /var/log/upstart/procps-virtual-filesystems.log
-rw-r- 1 root root 118 2013-06-28 15:07:56.311772000 + /var/log/upstart/module-init-tools.log
-rw-r- 1 root root 282 2013-06-28 15:07:58.310376600 + /var/log/upstart/procps-static-network-up.log
-rw-r- 1 root root 110 2013-06-28 15:08:02.993943800 + /var/log/upstart/udev-fallback-graphics.log
-rw-r- 1 root root 158 2013-06-28 15:08:09.876561300 + /var/log/upstart/ureadahead-other.log
-rw-r- 1 root root 64 2013-06-28 15:09:30.346411301 + /var/log/upstart/rsyslog.log
-rw-r- 1 root root 64 2013-06-28 15:09:30.370411301 + /var/log/upstart/dbus.log

$ cat /proc/mounts
rootfs / rootfs rw 0 0
sysfs /sys sysfs rw,nosuid,nodev,noexec,relatime 0 0
proc /proc proc rw,nosuid,nodev,noexec,relatime 0 0
udev /dev devtmpfs rw,relatime,size=335336k,nr_inodes=83834,mode=755 0 0
devpts /dev/pts devpts rw,nosuid,noexec,relatime,gid=5,mode=620,ptmxmode=000 0 0
tmpfs /run tmpfs rw,nosuid,relatime,size=137672k,mode=755 0 0
/dev/disk/by-uuid/65a0705a-7afe-482f-917d-c59e75cf0c52 / ext4 rw,relatime,user_xattr,barrier=1,data=ordered,discard 0 0
none /sys/fs/fuse/connections fusectl rw,relatime 0 0
none /sys/kernel/debug debugfs rw,relatime 0 0
none /sys/kernel/security securityfs rw,relatime 0 0
none /run/lock tmpfs rw,nosuid,nodev,noexec,relatime,size=5120k 0 0
none /run/shm tmpfs rw,nosuid,nodev,relatime 0 0
/dev/sdb1 /mnt/resource ext4 rw,relatime,user_xattr,barrier=1,data=ordered 0 0

$ cat /etc/fstab
UUID=65a0705a-7afe-482f-917d-c59e75cf0c52 / ext4 defaults,discard 0 0

mountall is not running.

$ sudo status mountall
mountall stop/waiting

$ ls -altr /var/run/landscape
ls: cannot access /var/run/landscape: No such file or directory

$ runlevel
N 2

$ ps axw
..
root 389 1 0 15:07 ? 00:00:00 upstart-udev-bridge --daemon
root 391 1 0 15:07 ? 00:00:00 /sbin/udevd --daemon
root 508 1 0 15:07 ? 00:00:00 /usr/bin/python /usr/sbin/waagent -daemo
root 574 391 0 15:07 ? 00:00:00 /sbin/udevd --daemon
root 577 391 0 15:07 ? 00:00:00 /sbin/udevd --daemon
root 598 2 0 15:07 ? 00:00:00 [kpsmoused]
root 633 1 0 15:07 ? 00:00:00 upstart-socket-bridge --daemon
root 906 2 0 15:08 ? 00:00:00 [jbd2/sdb1-8]
root 907 2 0 15:08 ? 00:00:00 [ext4-dio-unwrit]
root 931 508 0 15:08 ? 00:00:00 [sh] <defunct>
root 1015 1 0 15:08 ? 00:00:00 dhclient3 -e IF_METRIC=100 -pf /var/run/
root 1025 1 0 15:08 ? 00:00:00 /bin/sh /etc/network/if-up.d/ntpdate
root 1028 1025 0 15:08 ? 00:00:00 lockfile-create /var/lock/ntpdate-ifup
root 1121 1 0 15:09 ? 00:00:00 /usr/sbin/sshd -D
syslog 1137 1 0 15:09 ? 00:00:00 rsyslogd -c5
102 1142 1 0 15:09 ? 00:00:00 dbus-daemon --system --fork --activation
root 1200 1 0 15:09 tty4 00:00:00 /sbin/getty -8 38400 tty4
root 1207 1 0 15:09 tty5 00:00:00 /sbin/getty -8 38400 tty5
root 1214 1 0 15:09 tty2 00:00:00 /sbin/getty -8 38400 tty2
root 1215 1 0 15:09 tty3 00:00:00 /sbin/getty -8 38400 tty3
root 1218 1 0 15:09 tty6 00:00:00 /sbin/getty -8 38400 tty6
root 1248 1 0 15:09 ? 00:00:00 /usr/sbin/hv_kvp_daemon_3.2.0-48-virtual
root 1250 1 0 15:09 ? 00:00:00 acpid -c /etc/acpi/events -s /var/run/ac
root 1251 1 0 15:09 ? 00:00:00 cron
daemon 1252 1 0 15:09 ? 00:00:00 atd
root 1265 1 0 15:09 tty1 00:00:00 /sbin/getty -8 38400 tty1
whoopsie 1279 1 0 15:09 ? 00:00:00 whoopsie
root 1308 1121 0 15:23 ? 00:00:00 sshd: test [priv]
test 1412 1308 0 15:24 ? 00:00:00 sshd: test@pts/0
test 1413 1412 0 15:24 pts/0 00:00:01 -bash
root 1755 2 0 15:33 ? 00:00:00 [kworker/0:0]
root 2054 2 0 15:38 ? 00:00:00 [kworker/0:2]
root 2274 2 0 15:43 ? 00:00:00 [kworker/0:1]
test 2450 1413 0 15:45 pts/0 00:00:00 ps -ef

Note, it seems that
Re: [Bug 1195524] Re: race condition / transient failure to provision
I noticed while doing the kdump debugging that you can ssh into a cloud-image instance very early (well before the console allows login) on a KVM-based cloud-image instance. I was surprised that the service was available before console login was available. Not sure it is related to what you are seeing on Azure, though. (During kdump there is about a 40-second delay while the dump is post-processed before console login is available, but ssh login is already available and you can 'watch' the kdump post-processing.) This means (at least to me) that ssh is on fairly early.

On Fri, Jun 28, 2013 at 10:06 AM, Scott Moser smo...@ubuntu.com wrote:
> ** Attachment added: waagent.log and waagent dir.
> https://bugs.launchpad.net/ubuntu/+source/walinuxagent/+bug/1195524/+attachment/3717115/+files/waagent-info.tar
> --
> You received this bug notification because you are a member of Canonical Microsoft Azure Collaboration, which is subscribed to walinuxagent in Ubuntu.
> Matching subscriptions: walinxuagnet bugs
> https://bugs.launchpad.net/bugs/1195524
> Title: race condition / transient failure to provision
> To manage notifications about this bug go to: https://bugs.launchpad.net/ubuntu/+source/walinuxagent/+bug/1195524/+subscriptions

--
-dave
[Bug 1195524] Re: race condition / transient failure to provision
OK... I think I have a reasonable description of what caused this specific hang.

The last entry in this /var/log/waagent.log was:

2013/06/28 14:50:12 Provisioning image using OVF settings in the DVD.
2013/06/28 14:50:12 Resource disk (/dev/sdb1) is mounted at /mnt/resource with fstype ext4
2013/06/28 14:50:14 Created user account: test
2013/06/28 14:50:15 EnvMonitor: Detected host name change: ubuntu -> gwaclhost24lm4crqqc7kqfl9hya2ieuvxz9fvdqpn6ipfgdi4gh1p7snuwjezcw
2013/06/28 14:50:15 Setting host name: gwaclhost24lm4crqqc7kqfl9hya2ieuvxz9fvdqpn6ipfgdi4gh1p7snuwjezcw

In a *good* run, the entry after 'Created user account' should be something like:

2013/06/28 16:38:40 Provisioning image using OVF settings in the DVD.
2013/06/28 16:38:40 Disabled SSH password-based authentication methods.
2013/06/28 16:38:41 Created user account: smoser
2013/06/28 16:38:42 EnvMonitor: Detected host name change: ubuntu -> smoser0628p
2013/06/28 16:38:42 Setting host name: smoser0628p
2013/06/28 16:38:55 ERROR:CalledProcessError. Error Code: 1
2013/06/28 16:38:55 ERROR:CalledProcessError. Command string: service ssh status | grep running
2013/06/28 16:38:55 ERROR:CalledProcessError. Command result:
2013/06/28 16:38:55 Posted Role Properties. CertificateThumbprint=075533b9075b4130651b1f74c451ca9b
2013/06/28 16:38:55 Root password deleted.

Note, the ERROR information is coming from a different thread in the process; you can just ignore it. The key is that it should be Posting Role Properties, but this instance didn't get there.

An 'strace' of the pid shows:

$ sudo strace -p 498
Process 498 attached - interrupt to quit
select(0, NULL, NULL, NULL, {0, 321648}) = 0 (Timeout)
select(0, NULL, NULL, NULL, {1, 0}) = 0 (Timeout)
select(0, NULL, NULL, NULL, {1, 0}

The other key bit of info here is that 'Setting host name' comes from 'UpdateAndPublishHostName', which, among other questionable operations, does:

    for ethernetInterface in PossibleEthernetInterfaces:
        Run(ifdown + ethernetInterface + ifup + ethernetInterface,chk_err=False) # We suppress error logging on error.
    self.RestoreRoutes()

So what I think happened here is that the attempt to ReportRoleProperties (Posting Role Properties) ends up blocking/hanging in httplib.HTTPConnection(self.Endpoint), where no 'timeout' is specified.

One good thing is that the host name monitoring can be disabled via:

    Provisioning.MonitorHostName=n

** Changed in: walinuxagent (Ubuntu)
Importance: Undecided => High

** Changed in: walinuxagent (Ubuntu)
Status: New => Confirmed
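The suspected hang (an HTTP connection opened with no timeout, waiting forever on a peer that never answers) can be reproduced and bounded in a few lines. The following is only a sketch, not the waagent code: it uses Python 3's http.client (the successor of the httplib module named above), and the helper names start_silent_server and post_with_timeout are made up for illustration. The "silent server" is a local stand-in for an unreachable endpoint.

```python
import http.client
import socket
import threading


def start_silent_server():
    """Listen on a loopback port, accept connections, and never reply.

    Hypothetical stand-in for an endpoint that became unreachable
    after the interface was bounced.
    """
    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    srv.bind(("127.0.0.1", 0))
    srv.listen(1)
    held = []  # keep accepted sockets referenced so they stay open

    def accept_loop():
        try:
            while True:
                conn, _ = srv.accept()
                held.append(conn)  # accept, then say nothing
        except OSError:
            pass  # listening socket was closed

    threading.Thread(target=accept_loop, daemon=True).start()
    return srv, srv.getsockname()[1]


def post_with_timeout(host, port, body, timeout_s):
    """POST body and return the HTTP status, or None if the peer stays silent.

    Without the timeout argument, getresponse() would block indefinitely.
    """
    conn = http.client.HTTPConnection(host, port, timeout=timeout_s)
    try:
        conn.request("POST", "/", body=body)
        return conn.getresponse().status
    except socket.timeout:
        return None
    finally:
        conn.close()
```

With a timeout in place, a caller could log the failure and retry rather than sit in a blocked call with nothing written to the log.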
[Bug 1195524] Re: race condition / transient failure to provision
Can you also share the kernel logs, and all the files in /var/lib/waagent?

In the waagent.log, the new host name seems uncommon:

2013/06/28 15:08:12 EnvMonitor: Detected host name change: ubuntu -> gwaclhostblkhljy4re3yp9swkdwp63kswkss9bqhn0zm3f3gunipzu5vwdr8qzw
2013/06/28 15:08:12 Setting host name: gwaclhostblkhljy4re3yp9swkdwp63kswkss9bqhn0zm3f3gunipzu5vwdr8qzw

Is it expected?
[Bug 1195524] Re: race condition / transient failure to provision
Aside from the kernel logs, please also include /var/log/syslog and any other relevant logs.

Long, I believe the contents of /var/lib/waagent were posted here: https://bugs.launchpad.net/ubuntu/+source/walinuxagent/+bug/1195524/+attachment/3717115/+files/waagent-info.tar

There is retry logic in httplib, and there would be logs if we encountered issues/timeouts from ReportRoleProperties(), so I'm not sure if the VM even got that far. I see we also have a defunct 'sh' process, possibly left over from something that went wrong earlier when calling ifup.
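A defunct 'sh' whose parent is the agent means the child exited but was never waited on. As a sketch only (this is not how waagent's Run() is written, and run_with_timeout is a name invented here), a helper command can be run so that it is always reaped and cannot stall the caller. The sleep and true commands in the usage note are stand-ins for ifdown/ifup.

```python
import subprocess


def run_with_timeout(argv, timeout_s):
    """Run argv and return its exit code, or None if it exceeded timeout_s.

    subprocess.run() kills the child and waits for it when the timeout
    expires, so no zombie is left behind in either case.
    """
    try:
        completed = subprocess.run(argv, timeout=timeout_s)
    except subprocess.TimeoutExpired:
        return None
    return completed.returncode
```

For example, run_with_timeout(["sleep", "5"], 0.5) gives None after about half a second, while run_with_timeout(["true"], 5) gives 0. Passing an argument list rather than a shell string also avoids leaving a grandchild behind when only the shell is killed.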
[Bug 1195524] Re: race condition / transient failure to provision
** Attachment added: waagent.log and waagent dir. https://bugs.launchpad.net/ubuntu/+source/walinuxagent/+bug/1195524/+attachment/3717115/+files/waagent-info.tar -- You received this bug notification because you are a member of Ubuntu Bugs, which is subscribed to Ubuntu. https://bugs.launchpad.net/bugs/1195524 Title: race condition / transient failure to provision To manage notifications about this bug go to: https://bugs.launchpad.net/ubuntu/+source/walinuxagent/+bug/1195524/+subscriptions -- ubuntu-bugs mailing list ubuntu-bugs@lists.ubuntu.com https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs
[Bug 1195524] Re: race condition / transient failure to provision
As I'm trying to debug this instance that failed for me, and then came up after a 'vm restart', there is no indication that / was ever mounted RW the first time. I.e., there is no evidence in /var/log of *anything* having run. cloud-init starts on 'mounted MOUNTPOINT=/' and logs pretty much immediately to /var/log/cloud-init.log, but there are no timestamps other than from this boot.
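For checking the mount state on a live instance, the options field of /proc/mounts can be read directly. This is a small sketch with a made-up helper name (mount_is_rw); it only reports the current state and cannot show whether / was writable during an earlier boot, which is the open question above.

```python
def mount_is_rw(mounts_text, mountpoint="/"):
    """Report whether mountpoint is mounted read-write.

    mounts_text is the content of /proc/mounts. Returns True or False
    for the last entry at mountpoint (later mounts shadow earlier ones,
    e.g. the real root device over 'rootfs'), or None if it is absent.
    """
    result = None
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4 or fields[1] != mountpoint:
            continue
        options = fields[3].split(",")
        result = "rw" in options
    return result
```

On a running system it could be called as mount_is_rw(open("/proc/mounts").read()).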
[Bug 1195524] Re: race condition / transient failure to provision
Given my previous comment, I would have suspected that there was a disk or kernel failure. However, ssh-keyscan seemed to indicate ssh was running:

$ ssh-keyscan us-west-1.cloudapp.net
# us-west-1.cloudapp.net SSH-2.0-OpenSSH_6.1p1 Debian-4
Connection closed by 137.135.115.232
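The line ssh-keyscan prints after '#' is the server's identification banner, which sshd sends as soon as a connection is accepted. A rough probe for "is sshd answering at all" can be sketched as below; read_ssh_banner is a hypothetical helper, not part of any tool mentioned in this thread, and it does no key exchange.

```python
import socket


def read_ssh_banner(host, port=22, timeout_s=5.0):
    """Return the server's SSH identification line, or None on failure.

    Only the first line sent by the server is read; a timeout keeps the
    probe from hanging if the port accepts but never speaks.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout_s) as sock:
            data = b""
            while b"\n" not in data and len(data) < 255:
                chunk = sock.recv(255)
                if not chunk:
                    break  # peer closed before sending a full line
                data += chunk
    except OSError:
        return None  # refused, unreachable, or timed out
    line = data.split(b"\n", 1)[0].strip()
    if not line.startswith(b"SSH-"):
        return None
    return line.decode("ascii", "replace")
```

A banner followed by 'Connection closed', as in the output above, shows that sshd was listening but says nothing about whether host keys or user accounts were ready.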
Re: [Bug 1195524] Re: race condition / transient failure to provision
You can get kdump working in Azure (not that you really need it for this) with the attached patch. Ref: http://support.microsoft.com/kb/2858695

On Thu, Jun 27, 2013 at 8:01 PM, Scott Moser smo...@ubuntu.com wrote:
> given my previous comment, i would have suspected that there was disk or kernel failure. however, ssh-keyscan seemed to indicate ssh was running:
> $ ssh-keyscan us-west-1.cloudapp.net
> # us-west-1.cloudapp.net SSH-2.0-OpenSSH_6.1p1 Debian-4
> Connection closed by 137.135.115.232

--
-dave

** Attachment added: blacklist.hyperv.kdump.patch https://bugs.launchpad.net/bugs/1195524/+attachment/3716552/+files/blacklist.hyperv.kdump.patch