> > > Can we just backport our own version of ip_dev_find()? We had this once
> > > before in svn when they removed it from being exported from the kernel.
> >
> > Yes, this is in kernel_addons for 2.6.19 or something like that.
> > Just copy from there, much cleaner than the patch.
>
> I just realized that ip_dev_find() is being redefined to xxx_ip_dev_find
> for sles9sp3. So maybe this function is causing the error. Stay tuned.

xxx_ip_dev_find() is returning the wrong interface (sometimes).

I added printks to xxx_ip_dev_find(). Then I ran rping -s -a <local ip addr>
and it failed because xxx_ip_dev_find() returned loopback instead of my eth
device.

Here is the function with printks:

    static inline struct net_device *xxx_ip_dev_find(u32 addr)
    {
            struct net_device *dev;
            u32 ip;

            read_lock(&dev_base_lock);
            printk("%s looking for dev with addr %x\n", __FUNCTION__, addr);
            for (dev = dev_base; dev; dev = dev->next) {
                    ip = inet_select_addr(dev, 0, RT_SCOPE_LINK);
                    printk("%s dev %p name %s ipaddr %x\n", __FUNCTION__,
                           dev, dev->name, ip);
                    if (ip == addr) {
                            dev_hold(dev);
                            break;
                    }
            }
            read_unlock(&dev_base_lock);

            return dev;
    }

Here is the printk log showing loopback being returned:

    xxx_ip_dev_find looking for dev with addr 8846a8c0
    xxx_ip_dev_find dev ffffffff804000e0 name lo ipaddr 8846a8c0

The address bound to eth3 is 192.168.70.136 (0xc0a84688). For some reason,
this line:

    ip = inet_select_addr(dev, 0, RT_SCOPE_LINK);

returns the 192.168.70.136 address for dev->name == "lo". Riddle me that!

Also, sometimes it works ok because the loopback interface gets some other
IP address that is assigned to the local system, as opposed to my rdma
address. For example, I booted up the sles9sp3 system with a rebuilt kernel
and no ofed modules installed. The system gets 10.10.0.136 via DHCP for its
"public" interface. I then built the ofed modules and installed them. I then
loaded them and configured my rnic interface with 192.168.70.136. I ran
rping and bound to the local ipaddr and it worked. The log showed that
inet_select_addr() returned 10.10.0.136 for loopback, and thus
xxx_ip_dev_find() continued walking the list and found the correct ethernet
interface.

I then rebooted and ran the test again and it failed. So somehow module load
order affects this, I think. grrrr.

Steve.
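
A possible explanation, offered only as a sketch: if inet_select_addr() falls back to an address owned by some other device when the device it was handed has no address at the requested scope, then loopback (whose 127.0.0.1 is host scope) would report whichever suitable address it finds first on the system. That would match both the failing log and the case where loopback reported 10.10.0.136. The user-space model below is not kernel code. The struct, the scope constants, the fallback rule and every function name are assumptions made up for illustration, and addresses are held in host byte order.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical user-space model; none of these are kernel types. */
enum { SCOPE_UNIVERSE = 0, SCOPE_LINK = 253, SCOPE_HOST = 254 };

struct model_dev {
    const char *name;
    uint32_t addr;  /* one primary address, host byte order; 0 = none */
    int scope;      /* scope of that address */
};

/*
 * ASSUMED behaviour of inet_select_addr(dev, 0, scope): use the device's
 * own address if its scope is acceptable, otherwise fall back to the first
 * suitable address found on ANY device in list order.
 */
static uint32_t model_select_addr(const struct model_dev *devs, size_t n,
                                  size_t idx, int scope)
{
    assert(devs != NULL && idx < n);

    if (devs[idx].addr && devs[idx].scope <= scope)
        return devs[idx].addr;

    for (size_t i = 0; i < n; i++) {
        if (devs[i].addr && devs[i].scope != SCOPE_LINK &&
            devs[i].scope <= scope)
            return devs[i].addr;
    }
    return 0;
}

/* Lookup in the style of xxx_ip_dev_find(): match on the selected address. */
static const struct model_dev *find_by_selected(const struct model_dev *devs,
                                                size_t n, uint32_t addr)
{
    for (size_t i = 0; i < n; i++) {
        if (model_select_addr(devs, n, i, SCOPE_LINK) == addr)
            return &devs[i];
    }
    return NULL;
}

/* Alternative: match only against the address the device itself owns. */
static const struct model_dev *find_by_owned(const struct model_dev *devs,
                                             size_t n, uint32_t addr)
{
    for (size_t i = 0; i < n; i++) {
        if (devs[i].addr == addr)
            return &devs[i];
    }
    return NULL;
}
```

In this model, with the device list ordered lo, eth3, eth0, find_by_selected() returns lo for 192.168.70.136 because the fallback reaches eth3's address first. With the order lo, eth0, eth3, lo reports 10.10.0.136 and the walk continues to eth3. find_by_owned() returns eth3 in both cases, which suggests a backported ip_dev_find() should compare against each device's own address list rather than rely on inet_select_addr().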