Hi,

With the intent of running an OCP networking demo at the OCP summit in Amsterdam, I tried to set up RDMA on top of OCP gear with Mellanox adapters.
My goal is to demonstrate iSCSI performance using iSER and virtualization. So I picked two OCP nodes at random, connected them back to back, and set up a target node with a local SSD drive. Both nodes are running Ubuntu 16.04.4. The iSER connection works pretty well on bare metal, and after some deeper testing LIO is definitely the best target currently for such a setup.

My final goal is to run some VMs using KVM on the client node, and to boot these VMs over iSER through PCI passthrough implemented with SR-IOV. The target operating systems for the guests are Ubuntu and Windows. Everything works like a charm with Ubuntu, as expected. Unfortunately it is a massive mess with Windows Server 2016 (or other flavors). The WinOF driver is unable to properly initialize the passed-through Virtual Function adapter, for a reason I don't know. The error code I got is something like 0x2B, which is not really useful without the right documentation.

I spent some time reading all the available documentation from Mellanox. My setup is Ethernet only, with a single-port OCP Mezzanine card (currently ConnectX-3, but I also tested with ConnectX-4). The Mellanox documentation is a little bit weird.
This one
http://www.mellanox.com/related-docs/prod_software/WinOF_VPI_User_Manual_v5.50.pdf
says that SR-IOV over Ethernet is only supported with the Hyper-V hypervisor, and that KVM is only supported with InfiniBand.

The release notes of the same driver
http://www.mellanox.com/related-docs/prod_software/WinOF_VPI_Release_Notes_5.50.pdf
say that SR-IOV over Ethernet is supported with KVM and Windows Server 2016.

Now, in the Linux driver release notes
http://www.mellanox.com/related-docs/prod_software/Mellanox_OFED_Linux_Release_Notes_4_4-2_0_7_0.pdf
it is mentioned that:

    The following are the supported non-Linux Virtual Machines in MLNX_OFED Rev 4.4-2.0.7.0:

    ConnectX-3      Windows 2012 R2 DC   MLNX_VPI 5.46      IPoIB, ETH
    ConnectX-3Pro   Windows 2016 DC      MLNX_VPI 5.46      IPoIB, ETH
    ConnectX-4      Windows 2012 R2 DC   MLNX_WinOF2 1.90   IB, IPoIB, ETH
    ConnectX-4 Lx   Windows 2016 DC      MLNX_WinOF2 1.90   IB, IPoIB, ETH

So apparently I need to use WinOF 5.46, which is available nowhere.

Has anybody run into this issue? I am struggling a little, and I am getting frustrated by all of this mix-up. I tried to report the issue to Mellanox, without much success, as they went through the standard process of requesting a support contract. That is fine, but I am not a big fan of it when I am trying to set up something whose support status is this unclear.
Some more technical info.

lspci output on the server (the good news is that SR-IOV works, as well as the IOMMU):

    06:00.0 Ethernet controller: Mellanox Technologies MT27500 Family [ConnectX-3]
    06:00.1 Ethernet controller: Mellanox Technologies MT27500/MT27520 Family [ConnectX-3/ConnectX-3 Pro Virtual Function]
    06:00.2 Ethernet controller: Mellanox Technologies MT27500/MT27520 Family [ConnectX-3/ConnectX-3 Pro Virtual Function]

My KVM guest is set up this way:

    <hostdev mode='subsystem' type='pci' managed='yes'>
      <driver name='vfio'/>
      <source>
        <address domain='0x0000' bus='0x06' slot='0x00' function='0x1'/>
      </source>
      <alias name='hostdev0'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x03' function='0x0'/>
    </hostdev>

This is the driver report from the Ubuntu guest using the VF:

    [ 1.710125] mlx4_core: Mellanox ConnectX core driver v2.2-1 (Feb, 2014)
    [ 1.711813] mlx4_core: Initializing 0000:00:03.0
    [ 1.715953] mlx4_core 0000:00:03.0: Detected virtual function - running in slave mode
    [ 1.721014] mlx4_core 0000:00:03.0: Sending reset
    [ 1.726681] mlx4_core 0000:00:03.0: Sending vhcr0
    [ 1.730422] mlx4_core 0000:00:03.0: HCA minimum page size:512
    [ 1.732877] mlx4_core 0000:00:03.0: Timestamping is not supported in slave mode
    [ 1.807115] mlx4_en: Mellanox ConnectX HCA Ethernet driver v2.2-1 (Feb 2014)
    [ 1.809836] mlx4_en 0000:00:03.0: Activating port:1
    [ 1.811763] mlx4_en: 0000:00:03.0: Port 1: Assigned random MAC address b2:a5:c7:d1:f1:2a
    [ 1.860248] mlx4_en: 0000:00:03.0: Port 1: Using 64 TX rings
    [ 1.862343] mlx4_en: 0000:00:03.0: Port 1: Using 8 RX rings
    [ 1.863899] mlx4_en: 0000:00:03.0: Port 1: frag:0 - size:1522 prefix:0 stride:1536
    [ 1.865796] mlx4_en: 0000:00:03.0: Port 1: Initializing port
    [ 1.870796] mlx4_core 0000:00:03.0 ens3: renamed from eth0
    [ 54.480763] mlx4_en: ens3: frag:0 - size:1522 prefix:0 stride:1536
    [ 54.665195] mlx4_en: ens3: Link Up

It looks like the Linux driver switches to a specific mode (slave mode) when it detects that it is running on a virtual function.
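
For anyone who wants to reproduce the passthrough configuration, here is a minimal Python sketch of how a <hostdev> stanza like the one above could be generated from the lspci listing. The function names (find_virtual_functions, hostdev_xml) are made up for illustration and are not part of libvirt or of any Mellanox tool. It assumes the short lspci line format shown above (bus:slot.function, PCI domain 0) and it deliberately leaves out the guest-side <alias> and <address> elements, which as far as I know libvirt fills in by itself.

```python
import re
import xml.etree.ElementTree as ET

# The lspci lines from this post, used as sample input.
LSPCI_SAMPLE = """\
06:00.0 Ethernet controller: Mellanox Technologies MT27500 Family [ConnectX-3]
06:00.1 Ethernet controller: Mellanox Technologies MT27500/MT27520 Family [ConnectX-3/ConnectX-3 Pro Virtual Function]
06:00.2 Ethernet controller: Mellanox Technologies MT27500/MT27520 Family [ConnectX-3/ConnectX-3 Pro Virtual Function]
"""

# Start of a short-format lspci line: "bus:slot.function description".
PCI_RE = re.compile(r"^([0-9a-fA-F]{2}):([0-9a-fA-F]{2})\.([0-7])\s+(.*)$")


def find_virtual_functions(lspci_text):
    """Return (bus, slot, function) for every line described as a Virtual Function."""
    vfs = []
    for line in lspci_text.splitlines():
        match = PCI_RE.match(line.strip())
        if match and "Virtual Function" in match.group(4):
            bus, slot, func = (int(match.group(i), 16) for i in (1, 2, 3))
            vfs.append((bus, slot, func))
    return vfs


def hostdev_xml(bus, slot, func, domain=0):
    """Build a libvirt <hostdev> element for one host PCI function (hypothetical helper)."""
    hostdev = ET.Element("hostdev", mode="subsystem", type="pci", managed="yes")
    ET.SubElement(hostdev, "driver", name="vfio")
    source = ET.SubElement(hostdev, "source")
    ET.SubElement(
        source,
        "address",
        domain=f"0x{domain:04x}",
        bus=f"0x{bus:02x}",
        slot=f"0x{slot:02x}",
        function=f"0x{func:x}",
    )
    return ET.tostring(hostdev, encoding="unicode")


if __name__ == "__main__":
    # Print one stanza per virtual function found in the sample listing.
    for vf in find_virtual_functions(LSPCI_SAMPLE):
        print(hostdev_xml(*vf))
```

Each printed stanza could then be pasted into the guest definition (one VF per guest). The physical function 06:00.0 is skipped on purpose, since only the VFs are meant to be passed through.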
Could it be that the Windows driver does not properly detect the connection with the PF when the VF is passed through by KVM?

vejmarie

_______________________________________________
opencompute-networking mailing list
Unsubscribe: http://lists.opencompute.org/mailman/options/opencompute-networking
opencompute-networking@lists.opencompute.org
http://lists.opencompute.org/mailman/listinfo/opencompute-networking