I was running an MPI application over IPoIB on a 4-node cluster when one
of the systems died with the following messages logged to
/var/log/messages. The svn revision is 1355. Two of the nodes have PCI-E
cards and two have PCI-X cards. The system that hit the assertion was
one of the PCI-E systems.

Dec 17 11:45:33 iclust-16 rsh(pam_unix)[4605]: session closed for user woody
Dec 17 11:46:30 iclust-16 kernel: ib_mthca 0000:04:00.0: SQ full (64 posted, 64 max, 0 nreq)
Dec 17 11:46:30 iclust-16 kernel: ib0: post_send failed
Dec 17 11:46:39 iclust-16 kernel: ib_mthca 0000:04:00.0: SQ full (64 posted, 64 max, 0 nreq)
Dec 17 11:46:39 iclust-16 kernel: ib0: post_send failed
Dec 17 11:46:39 iclust-16 kernel: KERNEL: assertion (!atomic_read(&skb->users)) failed at net/core/dev.c (1616)
Dec 17 11:49:49 iclust-16 syslogd 1.4.1: restart.
 
Any ideas?

woody
_______________________________________________
openib-general mailing list
[EMAIL PROTECTED]
http://openib.org/mailman/listinfo/openib-general
