I was running an MPI application over IPoIB on a 4-node cluster when one of the systems died with the following messages logged to /var/log/messages. The svn revision is 1355. Two of the nodes have PCI-E cards and two have PCI-X cards. The system that hit the assertion was one of the PCI-E systems.

Dec 17 11:45:33 iclust-16 rsh(pam_unix)[4605]: session closed for user woody
Dec 17 11:46:30 iclust-16 kernel: ib_mthca 0000:04:00.0: SQ full (64 posted, 64 max, 0 nreq)
Dec 17 11:46:30 iclust-16 kernel: ib0: post_send failed
Dec 17 11:46:39 iclust-16 kernel: ib_mthca 0000:04:00.0: SQ full (64 posted, 64 max, 0 nreq)
Dec 17 11:46:39 iclust-16 kernel: ib0: post_send failed
Dec 17 11:46:39 iclust-16 kernel: KERNEL: assertion (!atomic_read(&skb->users)) failed at net/core/dev.c (1616)
Dec 17 11:49:49 iclust-16 syslogd 1.4.1: restart.

Any ideas?

woody
