I thought newer NVIDIA cards were integrated more directly and no longer
went through PCI. Is that still the case?
NVLink is much faster than PCI-Express, but it is still a bus that is
slower (and has higher latency) than aggregate DDR memory, though there
are open questions about how much message packing (for example) should
take place on the GPU versus the CPU.
To add: NVLink has higher bandwidth, but as far as I know its latency is
about the same. That is, if you send only a few bytes or kilobytes at a
time, you'll see the same latency problems as with PCI-Express.
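A simple latency-bandwidth model makes this concrete: the time to move n
bytes is roughly t(n) = latency + n / bandwidth. The sketch below uses
made-up link parameters (same assumed 10 us latency, roughly 10x different
bandwidth for the two links); they are illustrative assumptions, not
measured numbers for any particular card.

```python
# Latency-bandwidth model: t(n) = latency + n / bandwidth.
# The link parameters below are illustrative assumptions, not measurements.

def transfer_time(nbytes, latency_s, bandwidth_bytes_per_s):
    """Estimated time in seconds to move nbytes over a link."""
    return latency_s + nbytes / bandwidth_bytes_per_s

# Hypothetical links: same latency, ~10x different bandwidth.
LINKS = {
    "PCIe (assumed)": (10e-6, 16e9),
    "NVLink (assumed)": (10e-6, 160e9),
}

if __name__ == "__main__":
    for nbytes in (1_000, 10_000_000):
        for name, (lat, bw) in LINKS.items():
            t = transfer_time(nbytes, lat, bw)
            print(f"{nbytes:>10} B over {name:<17}: {t * 1e6:8.2f} us")
```

With these assumed numbers a 1 kB message takes about 10 us on either
link, because the latency term dominates, while a 10 MB transfer is
roughly 8-9x faster on the higher-bandwidth link.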
Best regards,
Karli