Back in March I had a problem initializing the ib_mthca driver on an EM64T system. Loading the module failed with the error "ib_mthca 0000:03:00.0: NOP command failed to generate interrupt (IRQ 169), aborting." This appeared to be corrected when I updated the firmware on the Mellanox MT25208 HCA card.

The problem has reappeared with the OFED release, on the same system, but with different software and a different HCA card.

I have a small testbed with two EM64T machines connected back-to-back with two Mellanox MT25204 single-port DDR cards. I was successfully running the backported 2.6.9-34 kernel on RHEL4 Update 3, with a recent version of the OpenIB tree. Both systems would come up and the cards initialized successfully.

Over the weekend I moved to the stock 2.6.16 kernel, and then built and installed the OFED-1.0-rc4 release. One of the systems appears to come up OK, but the port stays in the "down" state. I assumed this was because the other end of the link (the other machine) was not up.

The second machine boots, but I see the following in dmesg:

    ib_mthca: Mellanox InfiniBand HCA driver v0.08 (February 14, 2006)
    ib_mthca: Initializing 0000:03:00.0
    ACPI: PCI Interrupt 0000:03:00.0[A] -> GSI 16 (level, low) -> IRQ 169
    PCI: Setting latency timer of device 0000:03:00.0 to 64
    ib_mthca 0000:03:00.0: NOP command failed to generate interrupt (IRQ 169), aborting.
    ib_mthca 0000:03:00.0: BIOS or ACPI interrupt routing problem?
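
For anyone comparing logs across machines, here is a minimal sketch of how the failing device and IRQ could be pulled out of saved dmesg text. The regular expression is an assumption based only on the wording of the messages quoted above, and the names `NOP_FAILURE`, `find_nop_failures`, and `SAMPLE` are hypothetical helpers for illustration, not part of any driver or tool.

```python
import re

# Pattern assumed from the mthca message quoted above; other driver
# versions may word the message differently.
NOP_FAILURE = re.compile(
    r"ib_mthca (?P<device>[0-9a-f]{4}:[0-9a-f]{2}:[0-9a-f]{2}\.[0-7]): "
    r"NOP command failed to generate interrupt \(IRQ (?P<irq>\d+)\)"
)


def find_nop_failures(dmesg_text):
    """Return a (pci_device, irq) pair for each NOP interrupt failure found."""
    return [
        (m.group("device"), int(m.group("irq")))
        for m in NOP_FAILURE.finditer(dmesg_text)
    ]


# The dmesg excerpt from this message, used as sample input.
SAMPLE = """\
ib_mthca: Mellanox InfiniBand HCA driver v0.08 (February 14, 2006)
ib_mthca: Initializing 0000:03:00.0
ACPI: PCI Interrupt 0000:03:00.0[A] -> GSI 16 (level, low) -> IRQ 169
PCI: Setting latency timer of device 0000:03:00.0 to 64
ib_mthca 0000:03:00.0: NOP command failed to generate interrupt (IRQ 169), aborting.
ib_mthca 0000:03:00.0: BIOS or ACPI interrupt routing problem?
"""

if __name__ == "__main__":
    for device, irq in find_nop_failures(SAMPLE):
        print(f"{device} failed the NOP interrupt test on IRQ {irq}")
```

The IRQ number reported this way could then be looked up in /proc/interrupts to see whether any interrupts are being counted on that line at all.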

When I had the problem previously, Roland Dreier suggested loading the ib_mthca module with "fw_cmd_doorbell=0". That avoided the error then, and it avoids it with this new problem as well. But the question is why? Updating the firmware on the old board seemed to have solved the problem before, but now it has occurred again on a fairly new card with recent firmware. Has anyone else seen this problem?

One thing that may have a bearing on this is that the "/sbin/lspci" command has also started issuing an error message relating to the PCI slot that the HCA is in.  Here is the message:

    pcilib: Resource 2 in /sys/bus/pci/devices/0000:03:00.0/resource has a 64-bit address, ignoring
    ....
    03:00.0 InfiniBand: Mellanox Technologies MT25204 [InfiniHost III Lx HCA] (rev 20)

Do I need a new version of pcilib?  I currently have pciutils-2.1.99.test8-3.1.

        -Don Albert-
_______________________________________________
openib-general mailing list
[email protected]
http://openib.org/mailman/listinfo/openib-general
