Cross your fingers; we might release tomorrow (I've probably now
jinxed it by saying that!).
On Jan 12, 2009, at 1:54 PM, Justin wrote:
In order for me to test this out I need to wait for TACC to install this version on Ranger. Right now they have version 1.3a1r19685 installed. I'm guessing this is probably an older version. I'm not sure when TACC will get around to updating their Open MPI version. I could ask them to update it, but it would be a lot easier to request an actual release. What is the current schedule for the 1.3 release?
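(As an aside, for anyone trying to reproduce this on a cluster: a quick way to confirm which Open MPI your mpicc actually picks up is to print the version macros from mpi.h. This is only a sketch and assumes Open MPI's OMPI_*_VERSION macros are available, which I believe is the case for the 1.2/1.3 series; running ompi_info on the login node should report the same information.)

    /* print_ompi_version.c -- compile with: mpicc print_ompi_version.c -o print_ompi_version */
    #include <stdio.h>
    #include <mpi.h>

    int main(int argc, char **argv)
    {
        MPI_Init(&argc, &argv);
    #ifdef OMPI_MAJOR_VERSION
        /* OMPI_*_VERSION come from Open MPI's mpi.h (assumed present) */
        printf("Open MPI %d.%d.%d\n",
               OMPI_MAJOR_VERSION, OMPI_MINOR_VERSION, OMPI_RELEASE_VERSION);
    #else
        printf("mpi.h is not from Open MPI (or version macros are not defined)\n");
    #endif
        MPI_Finalize();
        return 0;
    }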
Justin
Jeff Squyres wrote:
Justin --
Could you actually give your code a whirl with 1.3rc3 to ensure
that it fixes the problem for you?
http://www.open-mpi.org/software/ompi/v1.3/
On Jan 12, 2009, at 1:30 PM, Tim Mattox wrote:
Hi Justin,
I applied the fixes for this particular deadlock to the 1.3 code base late last week; see ticket #1725:
https://svn.open-mpi.org/trac/ompi/ticket/1725
This should fix the described problem, but I personally have not tested to see if the deadlock in question is now gone. Everyone should give thanks to George for his efforts in tracking down the problem and finding a solution.
-- Tim Mattox, the v1.3 gatekeeper
On Mon, Jan 12, 2009 at 12:46 PM, Justin <luitj...@cs.utah.edu> wrote:
Hi, has this deadlock been fixed in the 1.3 source yet?
Thanks,
Justin
Jeff Squyres wrote:
On Dec 11, 2008, at 5:30 PM, Justin wrote:
The more I look at this bug the more I'm convinced it is with Open MPI and not our code. Here is why: our code generates a communication/execution schedule. At each timestep this schedule is executed and all communication and execution is performed. Our problem is AMR, which means the communication schedule may change from time to time. In this case the schedule has not changed in many timesteps, meaning the same communication schedule has been used for the last X (X being around 20 in this case) timesteps.

Our code does perform a very large amount of communication. I have been able to reduce the hang down to 16 processors, and it seems to me the hang occurs when we have lots of work per processor. That is, if I add more processors it may not hang, but reducing the number of processors makes it more likely to hang.
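For reference, each timestep essentially does what the sketch below shows: post all the receives and sends from a fixed schedule, then wait for the whole exchange before computing. This is a stripped-down illustration only; the names (sched_entry_t, nrecvs, nsends, do_timestep) are made up for this email and are not our actual code.

    /* Rough sketch of the per-timestep exchange driven by a fixed
     * communication schedule.  Hypothetical names, not our real code. */
    #include <mpi.h>
    #include <stdlib.h>

    typedef struct {
        int     peer;   /* rank to send to / receive from */
        int     tag;
        double *buf;
        int     count;
    } sched_entry_t;

    void do_timestep(sched_entry_t *recvs, int nrecvs,
                     sched_entry_t *sends, int nsends)
    {
        MPI_Request *reqs = malloc((nrecvs + nsends) * sizeof(MPI_Request));
        int i, r = 0;

        /* Post all receives first ... */
        for (i = 0; i < nrecvs; i++)
            MPI_Irecv(recvs[i].buf, recvs[i].count, MPI_DOUBLE,
                      recvs[i].peer, recvs[i].tag, MPI_COMM_WORLD, &reqs[r++]);

        /* ... then all sends from the same (unchanged) schedule. */
        for (i = 0; i < nsends; i++)
            MPI_Isend(sends[i].buf, sends[i].count, MPI_DOUBLE,
                      sends[i].peer, sends[i].tag, MPI_COMM_WORLD, &reqs[r++]);

        /* Wait for the entire exchange to finish before the execution phase. */
        MPI_Waitall(r, reqs, MPI_STATUSES_IGNORE);
        free(reqs);

        /* local execution/computation for this timestep follows here */
    }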
What is the status on the fix for this particular freelist
deadlock?
George is actively working on it because it is the "last" issue blocking us from releasing v1.3. I fear that if he doesn't get it fixed by tonight, we'll have to push v1.3 to next year (see
http://www.open-mpi.org/community/lists/devel/2008/12/5029.php and
http://www.open-mpi.org/community/lists/users/2008/12/7499.php).
--
Tim Mattox, Ph.D. - http://homepage.mac.com/tmattox/
tmat...@gmail.com || timat...@open-mpi.org
I'm a bright... http://www.the-brights.net/
_______________________________________________
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users
--
Jeff Squyres
Cisco Systems