On 09/12/2021 04:17, Jan Beulich wrote:
Paul,

in 0fdb48ffe7a1 ("libxl: Make sure devices added by pci-attach are
reflected in the config") you've moved down the invocation of
libxl__create_pci_backend() from libxl__device_pci_add_xenstore().
In the PV case, soon after the original invocation place there is

     if (!starting && domtype == LIBXL_DOMAIN_TYPE_PV) {
         if (libxl__wait_for_backend(gc, be_path,
                                     GCSPRINTF("%d", XenbusStateConnected)) < 0)
             return ERROR_FAIL;
     }

Afaict the only way this wait could succeed is if the backend was
created up front. The lack thereof does, I think, explain a report
we've had:

vh015:~ # xl -vvv pci-attach sles-15-sp4-64-pv-def-net 63:11.4
libxl: debug: libxl_pci.c:1561:libxl_device_pci_add: Domain 18:ao 0x55a517704170: create: how=(nil) callback=(nil) poller=0x55a517704210
libxl: debug: libxl_qmp.c:1921:libxl__ev_qmp_dispose:  ev 0x55a5177047e8
libxl: error: libxl_device.c:1393:libxl__wait_for_backend: Backend /local/domain/0/backend/pci/18/0 does not exist
libxl: error: libxl_pci.c:1779:device_pci_add_done: Domain 18:libxl__device_pci_add failed for PCI device 0:63:11.4 (rc -3)
libxl: error: libxl_device.c:1420:device_addrm_aocomplete: unable to add device


Wow. It must be a year since those patches went in... Most of the context has disappeared from my mind.

Since I don't fully understand what that commit as a whole is
doing, and since the specific change in the sequence of operations
also isn't explained in the description (or at least not in a way
for me to recognize the connection), I'm afraid I can't see what a
possible solution to this could look like. The best guess I could
come up with so far is that the code quoted above may also need
moving down, but I can't tell at all whether doing this after the
various other intermediate steps wouldn't be too late. Your input
(or even better a patch) would be highly appreciated.

The commit comment explains the problem that it is trying to fix but I agree that it does not call out the new sequence. The issue IIRC was in what happened before the call to device_add_domain_config() and what happened afterwards. In fixing that I guess I missed this immediate use of xenstore.

I *think* the correct fix would be to move the wait into the end of libxl__create_pci_backend(), which is where the frontend and backend state nodes are now set.
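
To illustrate the ordering problem only, here is a toy model in C. It is not libxl code: struct toy_store and all toy_* functions are hypothetical stand-ins, and the model assumes the backend reaches the Connected state as soon as it is created, whereas the real libxl__wait_for_backend() polls xenstore until the backend driver gets there.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for the backend directory in xenstore (hypothetical,
 * not libxl's real data model). */
struct toy_store {
    bool backend_exists;  /* has the backend path been written yet? */
    int  backend_state;   /* value of the backend "state" node */
};

/* XenbusStateConnected has the value 4 in Xen's public xenbus.h. */
enum { TOY_STATE_CONNECTED = 4 };

/* Hypothetical analogue of libxl__wait_for_backend(): returns -1 if the
 * backend path is missing or is not in the wanted state, 0 otherwise. */
int toy_wait_for_backend(const struct toy_store *s, int wanted)
{
    assert(s != NULL);
    if (!s->backend_exists)
        return -1;  /* corresponds to "Backend ... does not exist" */
    return s->backend_state == wanted ? 0 : -1;
}

/* Hypothetical analogue of libxl__create_pci_backend(). Assumption: the
 * backend is treated as connected immediately after creation. */
void toy_create_backend(struct toy_store *s)
{
    assert(s != NULL);
    s->backend_exists = true;
    s->backend_state = TOY_STATE_CONNECTED;
}

/* Ordering as in the report: the wait runs before the backend exists. */
int toy_add_wait_then_create(struct toy_store *s)
{
    int rc = toy_wait_for_backend(s, TOY_STATE_CONNECTED);
    if (rc < 0)
        return rc;  /* fails here, so the backend is never created */
    toy_create_backend(s);
    return 0;
}

/* Suggested ordering: create the backend, then wait as the last step. */
int toy_add_create_then_wait(struct toy_store *s)
{
    toy_create_backend(s);
    return toy_wait_for_backend(s, TOY_STATE_CONNECTED);
}
```

Starting from an empty store, toy_add_wait_then_create() fails without creating anything, which matches the "does not exist" error in the log, while toy_add_create_then_wait() succeeds.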

  Paul



Thanks, Jan


