Creating nightly hwloc snapshot git tarball was a success.
Snapshot: hwloc dev-397-ga95433a
Start time: Thu Feb 12 21:01:02 EST 2015
End time: Thu Feb 12 21:02:44 EST 2015
Your friendly daemon,
Cyrador
Nathan,
Just FYI: Both systems where I've seen this failure are VMs on a
well-loaded server.
So, the instruction interleaving (for reproducing races) is likely a bit
different than what you would see on one's own laptop or workstation. Also,
I don't see the SEGV in every run, but to reproduce it
Yes, seriously. This code is still undergoing testing which is part of
the reason it is on master. Once I am confident in the code I will be
updating some of my code to use a fifo instead of an opal_list_t and a
lock.
I don't know if the barrier will make a difference but it is the only
place I
Seriously?
George.
On Thu, Feb 12, 2015 at 1:00 PM, Nathan Hjelm wrote:
>
> I think I see the issue. Looks like there is a missing memory barrier
> after the head consistency code. I will add one and see if that fixes
> your problem.
>
> BTW, I can't reproduce the issue on
I think I see the issue. Looks like there is a missing memory barrier
after the head consistency code. I will add one and see if that fixes
your problem.
BTW, I can't reproduce the issue on any of my systems :-/.
-Nathan
On Thu, Feb 12, 2015 at 02:07:08AM -0800, Paul Hargrove wrote:
>Just
True - but to directly answer Adrian’s question:
Setting the buffer to NULL is not necessary and definitely a bad example.
> On Feb 12, 2015, at 3:01 AM, Gilles Gouaillardet
> wrote:
>
> Adrian,
>
> in the case of ompi/group/group_init.c, new_group = NULL is
Adrian,
in the case of ompi/group/group_init.c, new_group = NULL is clearly an
overkill,
but there is nothing wrong with it :
it can only be 1 when OBJ_RELEASE is invoked
(and hence new_group is already NULL, so no need to NULLify it a second
time)
that being said some typing can also be saved
Just experienced the same failure as below with openmpi-dev-904-g08dceda
built with "gcc (GCC) 4.8.2 20140120 (Red Hat 4.8.2-16)" on Scientific
Linux 7.x (a RHEL 7 clone).
gdb says:
Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x753b0700 (LWP 19685)]
I am not 100% sure I was understood correctly and I am also not sure I
understand the discussion I triggered.
Not being very familiar with the Open MPI code base, I often look at
other places in the code for examples of how something can/could be done.
Looking at different examples OBJ_RELEASE() I
It would be good to know where you are seeing this - as was stated, the macro
reduces the ref count and will NULL the pointer if and only if the ref count
goes to zero. However, the code may set it to NULL for some other reason that
relates to the later use of that particular variable.
If not
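The behaviour described above (decrement the count, then free and NULL the pointer only when the count reaches zero) can be illustrated with a minimal sketch. This is not the Open MPI definition of OBJ_RELEASE: the names sketch_object_t, SKETCH_RETAIN, SKETCH_RELEASE and sketch_demo are hypothetical, the counter is a plain int rather than an atomically updated one, and no destructors are run.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for a reference-counted object; NOT opal_object_t. */
typedef struct sketch_object {
    int ref_count;
} sketch_object_t;

/* Counts freed objects, kept only so the demo can check what happened. */
static int sketch_freed = 0;

static sketch_object_t *sketch_new(void)
{
    sketch_object_t *obj = malloc(sizeof(*obj));
    if (obj != NULL) {
        obj->ref_count = 1;   /* the creator holds the first reference */
    }
    return obj;
}

/* Take one more reference (assumption: single-threaded, no atomics). */
#define SKETCH_RETAIN(object) ((object)->ref_count++)

/* Drop one reference; free and NULL the pointer only when it hits zero. */
#define SKETCH_RELEASE(object)                \
    do {                                      \
        if (0 == --(object)->ref_count) {     \
            free(object);                     \
            sketch_freed++;                   \
            (object) = NULL;                  \
        }                                     \
    } while (0)

/* Returns 0 if the sketch behaves as described, non-zero otherwise. */
int sketch_demo(void)
{
    int freed_before = sketch_freed;
    sketch_object_t *buffer = sketch_new();
    if (buffer == NULL) {
        return -1;
    }
    sketch_object_t *alias = buffer;   /* second holder of the object */
    SKETCH_RETAIN(buffer);             /* count is now 2 */

    SKETCH_RELEASE(buffer);            /* count 1: nothing freed, not NULLed */
    if (buffer == NULL || sketch_freed != freed_before) {
        return 1;
    }
    buffer = NULL;                     /* explicit NULL drops this handle only */
    if (alias->ref_count != 1) {
        return 2;
    }

    SKETCH_RELEASE(alias);             /* count 0: freed and NULLed by the macro */
    if (alias != NULL || sketch_freed != freed_before + 1) {
        return 3;
    }
    return 0;
}
```

In this sketch the explicit buffer = NULL after a release that leaves the count at 1 only discards the caller's own handle; the object stays alive for the other holder and is freed when that holder releases it.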
I was just curious, because if I call
OBJ_RELEASE(buffer);
buffer = NULL;
on a buffer with a reference count other than 1, the buffer is not freed
but is set to NULL. If I call it again, the buffer is NULL and the original
buffer will not be freed. Setting the buffer to NULL seems unnecessary.
I
Adrian,
opal_obj_update does not fail or succeed, it returns the new
obj_reference_count.
can you point to one specific location in the code where you think it is
wrong ?
OBJ_RELEASE(buffer);
buffer = NULL;
could be written as
if (((opal_object_t *)buffer)->obj_reference_count == 1) {
At many places all over the code I see
OBJ_RELEASE(buffer);
buffer = NULL;
Looking at the definition of OBJ_RELEASE() this seems unnecessary and
wrong:
#define OBJ_RELEASE(object) \
do {