[Group.of.nepali.translators] [Bug 1696434] Re: drmgr command fails during the scale-up test on Novalink System (Brazos)
** Changed in: ubuntu-power-systems
       Status: Fix Committed => Fix Released

--
You received this bug notification because you are a member of नेपाली
भाषा समायोजकहरुको समूह, which is subscribed to Xenial.
Matching subscriptions: Ubuntu 16.04 Bugs
https://bugs.launchpad.net/bugs/1696434

Title:
  drmgr command fails during the scale-up test on Novalink System
  (Brazos)

Status in The Ubuntu-power-systems project:
  Fix Released
Status in powerpc-utils package in Ubuntu:
  Fix Released
Status in powerpc-utils source package in Xenial:
  Fix Released
Status in powerpc-utils source package in Yakkety:
  Fix Committed
Status in powerpc-utils source package in Zesty:
  Fix Released

Bug description:
  [SRU Justification]
  drmgr fails intermittently when adding devices to the system.

  [Test case]
  To be completed by IBM, who have access to the hardware.
  1. Run a scale test of launching 1000 VMs on a Novalink system.
  2. Observe that some of the deployments fail with the following error:
     kernel I/O op failed, rc = 26 len = 26.
  3. Install powerpc-utils from -proposed.
  4. Run the scale test again.
  5. Observe that all the deployments succeed.

  [Regression potential]
  This change, cherry-picked from upstream, corrects faulty handling of
  a 0 return code from syscalls. Regression potential appears to be
  minimal.

  Problem:
  During the scale-up test to 1000 VMs, 20 deploys failed because of the
  following command failure:

  Command /usr/sbin/pvmdrmgr drmgr -c slot -s 'U9119.MHE.1085B07-V1-C1030' -a -w 3 returned 19.
  Additional messages:
  /usr/sbin/pvmdrmgr drmgr -c slot -s 'U9119.MHE.1085B07-V1-C1030' -a -w 3
  Validating I/O DLPAR capability...yes.
  kernel I/O op failed, rc = 26 len = 26.

  I have been looking through the logs on this system to piece together
  what is happening when the dlpar add failures occur. From what I am
  seeing, we are trying to dlpar add a virtual network device and
  getting an error when trying to add the device to the system.
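The status 19 in the command output above can be decoded with a short Python sketch. It assumes that the value pvmdrmgr reports is a plain errno number, which the report does not state outright; `describe_status` is a hypothetical helper name, and the message text returned by `os.strerror` depends on the platform.

```python
import errno
import os


def describe_status(status: int) -> str:
    """Render a numeric status as 'number (ERRNO_NAME): message'.

    Assumption: the status is an errno value. Unknown numbers are
    labelled 'unknown' instead of raising.
    """
    name = errno.errorcode.get(status, "unknown")
    return f"{status} ({name}): {os.strerror(status)}"


# 19 is the value in "returned 19" from the command output.
print(describe_status(19))
```

On Linux, 19 maps to ENODEV, which agrees with the "No such device" line in the drmgr log.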
  > ## May 17 05:18:00 2017 ##
  > drmgr: -c slot -s U9119.MHE.1085B07-V1-C1030 -a -w 3
  > Validating I/O DLPAR capability...yes.
  > Getting node types 0x0003
  > Could not find DRC property group in path: /proc/device-tree/ibm,serial.
  > Acquiring drc index 0x3406
  > get-sensor for 3406: 0, 2
  > Setting allocation state to 'alloc usable'
  > Setting indicator state to 'unisolate'
  > Configuring connector for drc index 3406
  > Adding device-tree node /proc/device-tree/vdevice/l-lan@3406
  > ofdt update: add_node /vdevice/l-lan@3406 ibm,loc-code 30 U9119.MHE.1085B07-V1-C1030-T1
  > Getting node types 0x0003
  > performing kernel op for U9119.MHE.1085B07-V1-C1030, file is /sys/bus/pci/slots/control/add_slot
  > kernel I/O op failed, rc = 26 len = 26.
  > No such device
  > Releasing drc index 0x3406
  > get-sensor for 3406: 0, 1
  > Setting isolation state to 'isolate'
  > Setting allocation state to 'alloc unusable'
  > get-sensor for 3406: 0, 2
  > drc_index 3406 sensor-state: 2
  > Resource is not available to the partition.
  > Removing device-tree node /proc/device-tree/vdevice/l-lan@3406
  > ## May 17 05:20:11 2017 ##

  From the drmgr log, you can see that we get an ENODEV return code when
  performing the kernel operation to add the device to the system.

  > performing kernel op for U9119.MHE.1085B07-V1-C1030, file is /sys/bus/pci/slots/control/add_slot
  > kernel I/O op failed, rc = 26 len = 26.
  > No such device

  This indicates that the rpadlpar_io kernel module was unable to find
  the device in the device tree. This does not seem right, because
  earlier in the drmgr log we add the device to the device tree.
  Additionally, the drmgr code validates that the add succeeds by
  retrieving the newly added device node from the device tree as a
  sanity check. There are no failures reported for this.
  > Adding device-tree node /proc/device-tree/vdevice/l-lan@3406
  > ofdt update: add_node /vdevice/l-lan@3406 ibm,loc-code 30 U9119.MHE.1085B07-V1-C1030-T1
  > Getting node types 0x0003

  I started scale-up testing and the deploys are going fine. I will post
  a comment here if I see further drmgr failures.

  Patches have been submitted upstream:
  https://groups.google.com/forum/#!topic/powerpc-utils-devel/GNEi65WBwkQ
  and
  https://groups.google.com/forum/#!topic/powerpc-utils-devel/hJfUb5wYPsE

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu-power-systems/+bug/1696434/+subscriptions

___
Mailing list: https://launchpad.net/~group.of.nepali.translators
Post to     : group.of.nepali.translators@lists.launchpad.net
Unsubscribe : https://launchpad.net/~group.of.nepali.translators
More help   : https://help.launchpad.net/ListHelp
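Editorial sketch for the report above: the [Regression potential] note describes faulty handling of syscall return codes, and the failing log line shows rc equal to len (26 and 26). One reading consistent with both, offered as an inference and not as the confirmed root cause, is that a successful write was judged by an errno value left over from an earlier, unrelated failure. The Python simulation below illustrates that general class of bug. It is not the drmgr C code, and `FakeLibc`, `open_missing`, `write_ok_buggy` and `write_ok_fixed` are hypothetical names.

```python
import errno


class FakeLibc:
    """Toy stand-in for a C runtime: calls set errno only when they fail."""

    def __init__(self) -> None:
        self.errno = 0

    def open_missing(self) -> int:
        # A failing call: returns -1 and records the reason in errno.
        self.errno = errno.ENODEV
        return -1

    def write(self, data: bytes) -> int:
        # A successful call: returns the byte count and leaves errno untouched.
        return len(data)


def write_ok_buggy(libc: FakeLibc, data: bytes) -> bool:
    rc = libc.write(data)
    # Bug: errno may still hold a value from an earlier, unrelated failure.
    return libc.errno == 0 and rc == len(data)


def write_ok_fixed(libc: FakeLibc, data: bytes) -> bool:
    rc = libc.write(data)
    # Judge success by the return value alone; errno matters only after a failure.
    return rc == len(data)


libc = FakeLibc()
libc.open_missing()   # an earlier failure leaves errno set to ENODEV
cmd = b"x" * 26       # 26 bytes, mirroring "rc = 26 len = 26" in the log

print(write_ok_buggy(libc, cmd))   # False: the write succeeded but is reported as failed
print(write_ok_fixed(libc, cmd))   # True
```

The design point is that errno is only meaningful immediately after a call has signalled failure through its return value, so a check should look at the return value first and consult errno only on the failure path.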
[Group.of.nepali.translators] [Bug 1696434] Re: drmgr command fails during the scale-up test on Novalink System (Brazos)
This bug was fixed in the package powerpc-utils - 1.3.2-1ubuntu2~17.04

---
powerpc-utils (1.3.2-1ubuntu2~17.04) zesty; urgency=medium

  * d/p/Improve-perf-of-drmgr-lsslot-with-large-num-of-virt.patch: Fix
    scaling with large number of virtual adapters. LP: #1692837
  * d/p/drmgr-Stale-errno-usage-corrections.patch,
    d/p/drmgr-Correct-errno-usage-use-in-validate_paltform.patch,
    d/p/drmgr-Correct-errno-usage-in-init_cpu_info.patch: Fix failures
    during scale-up test on Novalink System. LP: #1696434

 -- Steve Langasek Mon, 19 Jun 2017 14:18:01 -0700

** Changed in: powerpc-utils (Ubuntu Zesty)
       Status: Fix Committed => Fix Released

Status in The Ubuntu-power-systems project:
  Fix Committed
Status in powerpc-utils package in Ubuntu:
  Fix Released
Status in powerpc-utils source package in Xenial:
  Fix Released
Status in powerpc-utils source package in Yakkety:
  Fix Committed
Status in powerpc-utils source package in Zesty:
  Fix Released
[Group.of.nepali.translators] [Bug 1696434] Re: drmgr command fails during the scale-up test on Novalink System (Brazos)
This bug was fixed in the package powerpc-utils - 1.3.1-2ubuntu0.3

---
powerpc-utils (1.3.1-2ubuntu0.3) xenial; urgency=medium

  * d/p/Improve-perf-of-drmgr-lsslot-with-large-num-of-virt.patch: Fix
    scaling with large number of virtual adapters. LP: #1692837
  * d/p/drmgr-Stale-errno-usage-corrections.patch,
    d/p/drmgr-Correct-errno-usage-use-in-validate_paltform.patch,
    d/p/drmgr-Correct-errno-usage-in-init_cpu_info.patch: Fix failures
    during scale-up test on Novalink System. LP: #1696434

 -- Breno Leitao Fri, 09 Jun 2017 10:39:15 -0400

** Changed in: powerpc-utils (Ubuntu Xenial)
       Status: Fix Committed => Fix Released

Status in The Ubuntu-power-systems project:
  Fix Committed
Status in powerpc-utils package in Ubuntu:
  Fix Released
Status in powerpc-utils source package in Xenial:
  Fix Released
Status in powerpc-utils source package in Yakkety:
  Fix Committed
Status in powerpc-utils source package in Zesty:
  Fix Committed
[Group.of.nepali.translators] [Bug 1696434] Re: drmgr command fails during the scale-up test on Novalink System (Brazos)
This bug was fixed in the package powerpc-utils - 1.3.2-1ubuntu2

---
powerpc-utils (1.3.2-1ubuntu2) artful; urgency=medium

  * d/p/in-kernel-dlpar.patch: fix FTBFS.

 -- Steve Langasek Mon, 19 Jun 2017 14:18:01 -0700

** Changed in: powerpc-utils (Ubuntu)
       Status: Confirmed => Fix Released

Status in The Ubuntu-power-systems project:
  Confirmed
Status in powerpc-utils package in Ubuntu:
  Fix Released
Status in powerpc-utils source package in Xenial:
  New
Status in powerpc-utils source package in Yakkety:
  New
Status in powerpc-utils source package in Zesty:
  New