Public bug reported:

Problem:

During the scale-up test to 1000 VMs, 20 deploys failed due to the
following command failure:

Command /usr/sbin/pvmdrmgr drmgr -c slot -s 'U9119.MHE.1085B07-V1-C1030' -a -w 
3 returned 19. Additional messages: /usr/sbin/pvmdrmgr drmgr -c slot -s 
'U9119.MHE.1085B07-V1-C1030' -a -w 3
Validating I/O DLPAR capability...yes.
kernel I/O op failed, rc = 26 len = 26.

I have been looking through the logs on this system to piece together
what is happening when the DLPAR add failures occur. From what I am
seeing, we are trying to DLPAR add a virtual network device and getting
an error when trying to add the device to the system.

> ########## May 17 05:18:00 2017 ##########
> drmgr: -c slot -s U9119.MHE.1085B07-V1-C1030 -a -w 3 
> Validating I/O DLPAR capability...yes.
> Getting node types 0x00000003
> Could not find DRC property group in path: /proc/device-tree/ibm,serial.
> Acquiring drc index 0x30000406
> get-sensor for 30000406: 0, 2
> Setting allocation state to 'alloc usable'
> Setting indicator state to 'unisolate'
> Configuring connector for drc index 30000406
> Adding device-tree node /proc/device-tree/vdevice/l-lan@30000406
> ofdt update: add_node /vdevice/l-lan@30000406 ibm,loc-code 30 
> U9119.MHE.1085B07-V1-C1030-T1
> Getting node types 0x00000003
> performing kernel op for U9119.MHE.1085B07-V1-C1030, file is 
> /sys/bus/pci/slots/control/add_slot
> kernel I/O op failed, rc = 26 len = 26.
> No such device
> Releasing drc index 0x30000406
> get-sensor for 30000406: 0, 1
> Setting isolation state to 'isolate'
> Setting allocation state to 'alloc unusable'
> get-sensor for 30000406: 0, 2
> drc_index 30000406 sensor-state: 2
> Resource is not available to the partition.
> Removing device-tree node /proc/device-tree/vdevice/l-lan@30000406
> ########## May 17 05:20:11 2017 ##########

From the drmgr log, you can see that we get an ENODEV return code when
performing the kernel operation to add the device to the system.

> performing kernel op for U9119.MHE.1085B07-V1-C1030, file is 
> /sys/bus/pci/slots/control/add_slot
> kernel I/O op failed, rc = 26 len = 26.
> No such device
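
For anyone who wants to pull the failing entries out of a long drmgr log
during a scale-up run, here is a minimal sketch. It assumes only what is
visible in the excerpt above: entries are separated by "##########
<timestamp> ##########" lines, the invocation is logged on a line
starting with "drmgr:", and the failure is reported as "kernel I/O op
failed". The function names are hypothetical and not part of
powerpc-utils.

```python
import re

# Entry delimiter as it appears in the quoted log,
# e.g. "########## May 17 05:18:00 2017 ##########".
HEADER = re.compile(r"^#+ (.+?) #+$")
# Line that records the drmgr arguments for an entry.
INVOCATION = re.compile(r"^drmgr: (.*)$")
FAILURE = "kernel I/O op failed"


def strip_quote(line):
    """Drop a leading email quote marker (">") and surrounding whitespace."""
    line = line.strip()
    if line.startswith(">"):
        line = line[1:].strip()
    return line


def failed_entries(lines):
    """Yield (timestamp, drmgr arguments) for each entry with a failed kernel op."""
    timestamp = args = None
    failed = False
    for raw in lines:
        line = strip_quote(raw)
        header = HEADER.match(line)
        if header:
            # A new delimiter closes the previous entry.
            if failed:
                yield timestamp, args
            timestamp, args, failed = header.group(1), None, False
            continue
        invocation = INVOCATION.match(line)
        if invocation and args is None:
            args = invocation.group(1)
        elif line.startswith(FAILURE):
            failed = True
    # Report the last entry if the log ends without a closing delimiter.
    if failed:
        yield timestamp, args
```

Passing the lines of the log (for example from an open file object) to
failed_entries() gives one tuple per failed add, which makes it easier
to match the failures against the deploys that failed.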

This indicates that the rpadlpar_io kernel module was unable to find
the device in the device tree. This does not seem right because earlier
in the drmgr logs we add the device to the device tree. Additionally,
the drmgr code validates that the add succeeds by retrieving the newly
added device node from the device tree as a sanity check. There are no
failures reported for this.

> Adding device-tree node /proc/device-tree/vdevice/l-lan@30000406
> ofdt update: add_node /vdevice/l-lan@30000406 ibm,loc-code 30 
> U9119.MHE.1085B07-V1-C1030-T1
> Getting node types 0x00000003
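
To check by hand whether the node is really visible at the time of the
failure, a small sketch like the following can be run on the partition.
This is not the drmgr code. It assumes, based only on this log, that the
node name is "l-lan@" followed by the DRC index as 8 hex digits under
/proc/device-tree/vdevice. The function names are hypothetical.

```python
import os


def device_tree_node(drc_index, root="/proc/device-tree"):
    """Return the assumed path of the l-lan node for a DRC index."""
    # Naming taken from the log: drc index 0x30000406 -> l-lan@30000406.
    return os.path.join(root, "vdevice", "l-lan@%08x" % drc_index)


def node_present(drc_index, root="/proc/device-tree"):
    """True if the node directory exists under the given device-tree root."""
    return os.path.isdir(device_tree_node(drc_index, root))
```

For the failing slot above the call would be node_present(0x30000406).
The root argument is there only so the check can be pointed at a copy of
the tree.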

I started scale-up testing and the deploys are going fine so far. I will
post a comment here if I see further drmgr failures.

Patches have been submitted upstream.

https://groups.google.com/forum/#!topic/powerpc-utils-devel/GNEi65WBwkQ

and

https://groups.google.com/forum/#!topic/powerpc-utils-devel/hJfUb5wYPsE

** Affects: powerpc-ibm-utils (Ubuntu)
     Importance: Undecided
     Assignee: Ubuntu on IBM Power Systems Bug Triage (ubuntu-power-triage)
         Status: New


** Tags: architecture-ppc64le bugnameltc-154853 severity-high 
targetmilestone-inin16043

** Tags added: architecture-ppc64le bugnameltc-154853 severity-high
targetmilestone-inin16043

** Changed in: ubuntu
     Assignee: (unassigned) => Ubuntu on IBM Power Systems Bug Triage 
(ubuntu-power-triage)

** Package changed: ubuntu => powerpc-ibm-utils (Ubuntu)

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1696434

Title:
  drmgr command fails during the scale-up test on Novalink System
  (Brazos)

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/powerpc-ibm-utils/+bug/1696434/+subscriptions

-- 
ubuntu-bugs mailing list
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs
