Greg,

I have located the source of the bus error and core dump on SGI multiprocessor machines. Randall mentioned it a while back (see http://opendx.npaci.edu/mail/opendx-users/2002.08/msg00098.html), and I have been seeing it while testing 4.3.

The problem is that exCleanup() in dxmain.c is executed twice: once by the parent (_dxd_exMyPID = -1) and once by the master child (_dxd_exMyPID = 0). If the child gets there first, the bus error occurs. If the parent gets there first, the exit is clean.

The condition around line 1557 in dxmain.c can be modified to eliminate this error by having either the parent or the master child, but not both, do the cleanup. The first form below selects the parent and the second selects the master child:

ok = ((exParent && _dxd_exMyPID==-1) || (nprocs == 1 && _dxd_exMyPID == 0 && ! processor_status_on));

or

ok = ((exParent && !_dxd_exMyPID) || (nprocs == 1 && _dxd_exMyPID == 0 && ! processor_status_on));
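To make the difference between the two forms easier to see, here is a minimal standalone sketch that evaluates each condition as a plain function. The function names and the PARENT_PID / MASTER_CHILD_PID macros are made up for illustration and are not in dxmain.c; the parameters simply stand in for exParent, _dxd_exMyPID, nprocs and processor_status_on, and I am assuming exParent is true in the multiprocessor case.

```c
#include <stdbool.h>

/* Hypothetical names for the two _dxd_exMyPID values described above. */
#define PARENT_PID       (-1)
#define MASTER_CHILD_PID 0

/* First form: in the multiprocessor case only the parent cleans up.
 * The second clause is the unchanged single-process case. */
bool cleanup_ok_parent(bool exParent, int myPID, int nprocs,
                       bool processor_status_on)
{
    return (exParent && myPID == PARENT_PID) ||
           (nprocs == 1 && myPID == MASTER_CHILD_PID && !processor_status_on);
}

/* Second form: in the multiprocessor case only the master child cleans up.
 * !myPID is true exactly when myPID == 0. */
bool cleanup_ok_master_child(bool exParent, int myPID, int nprocs,
                             bool processor_status_on)
{
    return (exParent && !myPID) ||
           (nprocs == 1 && myPID == MASTER_CHILD_PID && !processor_status_on);
}
```

With exParent true and nprocs greater than 1, each form lets exactly one of the two processes through, so exCleanup() would run only once whichever process arrives first.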

Since you have worked on the multiprocessor implementation, I wanted to check with you on the correct approach. Another method that works is to call sleep(1) if _dxd_exMyPID == 0.
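For reference, the sleep workaround amounts to something like the sketch below. The helper name and the seconds parameter are hypothetical (in my test it was just a sleep(1) guarded by the PID check), and where exactly it would sit relative to exCleanup() is an assumption.

```c
#include <stdbool.h>
#include <unistd.h>

/* Hypothetical helper: delay the master child (PID value 0) so that the
 * parent has a chance to reach the cleanup first. Returns true if the
 * caller was delayed. */
bool delay_if_master_child(int myPID, unsigned int seconds)
{
    if (myPID == 0) {
        sleep(seconds);   /* POSIX sleep(); the workaround used 1 second */
        return true;
    }
    return false;
}
```

This only makes it much more likely that the parent gets there first; it does not guarantee the ordering, which is why I would rather change the condition.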

Jeff
