Paul H. Hargrove wrote:
Quoting from a different manpage for ftruncate:
[T]he POSIX standard allows two behaviours for ftruncate when length
exceeds the file length [...]: either returning an error, or extending
the file.
So, if that is to be trusted, it is not legal by POSIX to *silently* not
extend the file.
On Mar 25, 2009, at 6:09 PM, Aurélien Bouteiller wrote:
I'm trying to state that a particular component depends on another
that should therefore be dlopened automatically when it is loaded. I
found some code doing exactly that in
mca_base_component_find:open_component, but can't find any example
FWIW, MPI_TEST* and MPI_WAIT* all check for MPI_STATUS[ES]_IGNORE at
the lower layers.
I believe that the correct fix for MPI_REQUEST_GET_STATUS should be
the following, because checks for MPI_STATUS_IGNORE are performed
later in the function:
Index: ompi/mpi/c/request_get_status.c
===================================================================
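The patch following the "Index:" header above is cut off in this archive. As an illustration only (an assumption, not the actual change), the fix described would amount to something like the following fragment of the parameter-check block in ompi/mpi/c/request_get_status.c; the macro names follow the usual conventions of that directory:

/* Sketch only -- not the actual patch, which is truncated above.  The
 * idea is to stop rejecting a NULL "status" argument here, because
 * Open MPI defines MPI_STATUS_IGNORE as NULL and that value is handled
 * explicitly later in the function. */
if (MPI_PARAM_CHECK) {
    OMPI_ERR_INIT_FINALIZE(FUNC_NAME);
    if (NULL == flag) {
        /* "flag" must still be a valid pointer */
        return OMPI_ERRHANDLER_INVOKE(MPI_COMM_WORLD, MPI_ERR_ARG, FUNC_NAME);
    }
    /* Note: no "NULL == status" check any more; NULL is
     * MPI_STATUS_IGNORE and is tested further down. */
}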
Quoting from a different manpage for ftruncate:
[T]he POSIX standard allows two behaviours for ftruncate when length
exceeds the file length [...]: either returning an error, or extending
the file.
So, if that is to be trusted, it is not legal by POSIX to *silently* not
extend the file.
Talking with Aurelien here @ UT, we think we came up with a possible
way to get such an error. Before explaining this, let me set out the basics.
There are two critical functions used in setting up the shared memory
file. One is ftruncate, the other one mmap. Here are two snippets from
these functions
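The two snippets themselves are cut off in this archive. As a stand-in, here is a minimal self-contained sketch (not the Open MPI sm code) of the general ftruncate-plus-mmap pattern and of the checks the POSIX discussion above calls for; the file name and size are arbitrary:

/* Stand-alone sketch of creating and mapping a shared-memory backing
 * file.  Per the POSIX discussion above, ftruncate is allowed to fail
 * rather than extend the file, so its return value must be checked,
 * and fstat is used here to confirm the file really has the requested
 * length before mmap'ing it. */
#include <fcntl.h>
#include <stdio.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

int main(void)
{
    const size_t size = 4096;
    int fd = open("/tmp/shmem_backing_file", O_RDWR | O_CREAT, 0600);
    if (fd < 0) { perror("open"); return 1; }

    if (ftruncate(fd, (off_t) size) != 0) {    /* may legally fail */
        perror("ftruncate");
        return 1;
    }

    struct stat sb;                            /* verify the new length */
    if (fstat(fd, &sb) != 0 || (size_t) sb.st_size < size) {
        fprintf(stderr, "backing file was not extended\n");
        return 1;
    }

    void *seg = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (MAP_FAILED == seg) { perror("mmap"); return 1; }

    /* ... use the mapping ... */
    munmap(seg, size);
    close(fd);
    unlink("/tmp/shmem_backing_file");
    return 0;
}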
In reference to this critical bug, there are implications for the current
1.3.x release schedule that are alluded to in Jeff's message. In
particular, there are two time-critical issues at play:
1) getting a fix for #1853 in time for inclusion in OFED-1.4.1
2) getting in Sun's changes/CMRs in
Shaun,
Not in Open MPI :) But there is a section in the MPI Standard that
talks about MPI_STATUS_IGNORE and lists the functions that can deal
with it.
george.
On Mar 27, 2009, at 15:15, Shaun Jackman wrote:
Hi George,
You will need to update MPI_Test and MPI_Wait as well, which do not
check that status != NULL.
Hi George,
You will need to update MPI_Test and MPI_Wait as well, which do not
check that status != NULL. Is there an index of MPI functions by their
parameter type, such as the set of functions that take an MPI_Status
argument?
Cheers,
Shaun
George Bosilca wrote:
Shaun,
Thanks for the bug report.
The Open MPI team has uncovered a serious bug in Open MPI v1.3.0 and
v1.3.1: when running on OpenFabrics-based networks, silent data
corruption is possible in some cases. There are two workarounds to
avoid the issue -- please see the bug ticket that has been opened
about this issue for further details.
Shaun,
Thanks for the bug report. In general we like to check the arguments
against NULL, in order to make sure we don't segfault. However, in
this particular context we check against NULL but NULL is our
MPI_STATUS_IGNORE. I think I would prefer a slightly safer
solution where we t
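The sentence above is cut off in the archive, so the direction of the "safer solution" is a guess; one plausible reading is to test explicitly against MPI_STATUS_IGNORE instead of a bare NULL before touching the pointer, for example:

/* Assumption: a helper in the spirit of the truncated sentence above.
 * MPI_STATUS_IGNORE is a legal value (and happens to be NULL in
 * Open MPI), so it must be skipped silently rather than treated as an
 * invalid argument. */
#include <mpi.h>
#include <string.h>

static void copy_status_if_wanted(MPI_Status *dst, const MPI_Status *src)
{
    if (MPI_STATUS_IGNORE != dst && NULL != dst) {
        memcpy(dst, src, sizeof(MPI_Status));  /* caller wants a status */
    }
    /* MPI_STATUS_IGNORE: do nothing, and do not raise an error */
}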
MPI_Request_get_status fails if the status parameter is passed
MPI_STATUS_IGNORE. A patch is attached.
Cheers,
Shaun
2009-03-26 Shaun Jackman
* ompi/mpi/c/request_get_status.c (MPI_Request_get_status):
Do not fail if the status argument is NULL, because the
application may pass MPI_STATUS_IGNORE.
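The attached patch is not part of this archive. For context, a minimal reproducer for the behaviour Shaun describes would look roughly like the following (a hypothetical test program, not from the original mail); a compliant MPI_Request_get_status must accept MPI_STATUS_IGNORE here:

/* Reproducer sketch: poll a request with MPI_Request_get_status,
 * passing MPI_STATUS_IGNORE for the status argument. */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int flag = 0, in = 42, out = 0;
    MPI_Request sreq, rreq;

    MPI_Init(&argc, &argv);

    /* Post a send/receive to ourselves so there is a real request. */
    MPI_Irecv(&out, 1, MPI_INT, 0, 0, MPI_COMM_SELF, &rreq);
    MPI_Isend(&in, 1, MPI_INT, 0, 0, MPI_COMM_SELF, &sreq);

    /* The call under discussion: MPI_STATUS_IGNORE must be accepted. */
    while (!flag) {
        MPI_Request_get_status(rreq, &flag, MPI_STATUS_IGNORE);
    }
    printf("receive completed, got %d\n", out);

    /* Release both requests (MPI_Request_get_status does not free them). */
    MPI_Wait(&rreq, MPI_STATUS_IGNORE);
    MPI_Wait(&sreq, MPI_STATUS_IGNORE);
    MPI_Finalize();
    return 0;
}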
Eugene,
I think I remember setting up the MTT tests on Sif so that tests
are run both with and without the coll_hierarch component selected.
The coll_hierarch component stresses code paths and potential
race conditions in its own way. So, if the problems are showing up
more frequently for the test
Josh Hursey wrote:
Sif is also running the coll_hierarch component on some of those
tests which has caused some additional problems. I don't know if that
is related or not.
Indeed. Many of the MTT stack traces (for both 1.3.1 and 1.3.2, and that
have seg faults and call out mca_btl_sm.so)
Ignore the 17k+ failures from Cisco last night...
I had a bunch of half-complete changes on my cluster last night and
forgot to disable MTT overnight.
--
Jeff Squyres
Cisco Systems
FWIW, when I was looking into this before, the problem was definitely
during MPI_INIT. I ran out of time before being able to track it down
further, but it was definitely something during the sm startup --
during add_procs, IIRC.
It *looked* like there was some kind of bogus value in the b
Hmmm...Eugene, you need to be a tad less sensitive. Nobody was
attempting to indict you or in any way attack you or your code.
What I was attempting to point out is that there are a number of sm
failures during sm init. I didn't single you out. I posted it to the
community because (a) it is
On Mar 26, 2009, at 6:41 PM, Ralph Castain wrote:
I suspect Josh or someone at IU could tell you the compiler. I
would be very surprised if it wasn't gcc, but I don't know what
version.
All the MTT runs on Sif are using gcc 4.1.2:
-bash-3.2$ gcc --version
gcc (GCC) 4.1.2 20080704 (Red Hat
Ralph Castain wrote:
You are correct - the Sun errors are in a version prior to the
insertion of the SM changes. We didn't relabel the version to 1.3.2
until -after- those changes went in, so you have to look for anything
with an r number >= 20839.
The Sif errors are all in that group - I