Re: [OMPI devel] SM initialization race condition

2008-08-21 Thread Tim Mattox
A little Google searching, and the best I can find is that
memset is part of the C89/C90 standard, while bzero isn't.
Thus memset would/should be supported even on non-POSIX
systems.  Also, The Open Group marks bzero as LEGACY
and says "This function may be withdrawn in a future version."
http://www.opengroup.org/onlinepubs/95399/functions/bzero.html

However, who actually thinks that bzero would ever be removed?
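
For anyone who wants the portable spelling, here is a minimal sketch of
zeroing a buffer with memset only (ISO C90), which has the same effect as
bzero(addr, len). The helper names zero_region and region_is_zero are
hypothetical and are not part of Open MPI.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>   /* memset is declared here by ISO C */

/* Hypothetical helper (not an Open MPI function): zero `len` bytes at
 * `addr` using only ISO C. Same effect as bzero(addr, len). */
void zero_region(void *addr, size_t len)
{
    assert(addr != NULL || len == 0);
    if (len > 0) {
        memset(addr, 0, len);
    }
}

/* Hypothetical checker: returns 1 if every byte of the region is zero,
 * 0 otherwise. */
int region_is_zero(const void *addr, size_t len)
{
    const unsigned char *p = addr;
    size_t i;

    for (i = 0; i < len; i++) {
        if (p[i] != 0) {
            return 0;
        }
    }
    return 1;
}
```

In the patch under discussion this would amount to writing
memset(map->data_addr, 0, size) in place of bzero(map->data_addr, size).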

And yes, there is a hyphen in ;-)

Now back to productive work for me.

On Thu, Aug 21, 2008 at 9:39 AM, Tim Mattox wrote:
> Actually, bzero() is POSIX.  Here is the history section of the bzero man page
> on Mac OS X 10.4:
>
> A bzero() function appeared in 4.3BSD.  Its prototype existed previously
> in <string.h> before it was moved to <strings.h> for IEEE Std 1003.1-2001
> (``POSIX.1'') compliance.
>
> Hmmm, but the Linux man page says it is deprecated, and says we should
> use memset.
> Wish it explained why...  so I think we are safe to just use bzero,
>
> On Thu, Aug 21, 2008 at 9:32 AM, Jeff Squyres wrote:
>> IIRC, bzero is a gnu-ism.  We should probably use memset instead.
>>
>>
>> On Aug 21, 2008, at 5:40 AM, George Bosilca wrote:
>>
>>> Terry,
>>>
>>> We use the feature defined by POSIX mmap where the area should be
>>> zero-filled when the file length is extended. What OS you're using when you
>>> see such problems ?
>>>
>>> Just in case, here is a patch that set the beginning of the mmaped region
>>> to zero, in case this is not done automatically. As in most cases this is an
>>> unnecessary overhead, we should find the cases where we really need this,
>>> and possibly conditionally compile it.
>>>
>>> Index: ompi/mca/common/sm/common_sm_mmap.c
>>> ===
>>> --- ompi/mca/common/sm/common_sm_mmap.c (revision 19377)
>>> +++ ompi/mca/common/sm/common_sm_mmap.c (working copy)
>>> @@ -163,6 +163,7 @@
>>>
>>>  /* initialize the segment - only the first process
>>>     to open the file */
>>> +bzero( map->data_addr, size );
>>>  mem_offset = map->data_addr - (unsigned char *)map->map_seg;
>>>  map->map_seg->seg_offset = mem_offset;
>>>  map->map_seg->seg_size = size - mem_offset;
>>>
>>>  george.
>>>
>>> On Aug 21, 2008, at 1:22 PM, Terry Dontje wrote:
>>>
>>>> I've been seeing an intermittent (once every 4 hours looping on a quick
>>>> initialization program) segv with the following stack trace.
>>>>
>>>> =>[1] mca_btl_sm_add_procs(btl = 0xfd7ffdb67ef0, nprocs = 2U, procs =
>>>> 0x591560, peers = 0x591580, reachability = 0xfd7fffdff000), line 519 in
>>>> "btl_sm.c"
>>>> [2] mca_bml_r2_add_procs(nprocs = 2U, procs = 0x591560, bml_endpoints =
>>>> 0x591500, reachable = 0xfd7fffdff000), line 222 in "bml_r2.c"
>>>> [3] mca_pml_ob1_add_procs(procs = 0x5914c0, nprocs = 2U), line 248 in
>>>> "pml_ob1.c"
>>>> [4] ompi_mpi_init(argc = 1, argv = 0xfd7fffdff318, requested = 0,
>>>> provided = 0xfd7fffdff234), line 651 in "ompi_mpi_init.c"
>>>> [5] PMPI_Init(argc = 0xfd7fffdff2ec, argv = 0xfd7fffdff2e0), line
>>>> 90 in "pinit.c"
>>>> [6] main(argc = 1, argv = 0xfd7fffdff318), line 82 in "buffer.c"
>>>>
>>>> I believe the problem is that mca_btl_sm_component.shm_fifo[j] contains
>>>> uninitialized data causes the loop on line 504 in btl_sm.c to think that a
>>>> remote rank has set its fifo address.
>>>>
>>>> Has anyone else seen the above happening?
>>>>
>>>> --td
>>>>
>>>> ___
>>>> devel mailing list
>>>> de...@open-mpi.org
>>>> http://www.open-mpi.org/mailman/listinfo.cgi/devel
>>>
>>
>>
>> --
>> Jeff Squyres
>> Cisco Systems
>>
>>
>
>
>
> --
> Tim Mattox, Ph.D. - http://homepage.mac.com/tmattox/
>  tmat...@gmail.com || timat...@open-mpi.org
>  I'm a bright... http://www.the-brights.net/
>



-- 
Tim Mattox, Ph.D. - http://homepage.mac.com/tmattox/
 tmat...@gmail.com || timat...@open-mpi.org
 I'm a bright... http://www.the-brights.net/


Re: [OMPI devel] SM initialization race condition

2008-08-21 Thread Terry Dontje

George Bosilca wrote:

> Terry,
>
> We use the feature defined by POSIX mmap where the area should be
> zero-filled when the file length is extended. What OS you're using
> when you see such problems ?

So far I've only tested this on Solaris.  We'll try out the bzero to see
if this goes away.

--td

> Just in case, here is a patch that set the beginning of the mmaped
> region to zero, in case this is not done automatically. As in most
> cases this is an unnecessary overhead, we should find the cases where
> we really need this, and possibly conditionally compile it.
>
> Index: ompi/mca/common/sm/common_sm_mmap.c
> ===
> --- ompi/mca/common/sm/common_sm_mmap.c (revision 19377)
> +++ ompi/mca/common/sm/common_sm_mmap.c (working copy)
> @@ -163,6 +163,7 @@
>
>  /* initialize the segment - only the first process
>     to open the file */
> +bzero( map->data_addr, size );
>  mem_offset = map->data_addr - (unsigned char *)map->map_seg;
>  map->map_seg->seg_offset = mem_offset;
>  map->map_seg->seg_size = size - mem_offset;
>
>   george.
>
> On Aug 21, 2008, at 1:22 PM, Terry Dontje wrote:
>
>> I've been seeing an intermittent (once every 4 hours looping on a
>> quick initialization program) segv with the following stack trace.
>>
>> =>[1] mca_btl_sm_add_procs(btl = 0xfd7ffdb67ef0, nprocs = 2U,
>> procs = 0x591560, peers = 0x591580, reachability =
>> 0xfd7fffdff000), line 519 in "btl_sm.c"
>> [2] mca_bml_r2_add_procs(nprocs = 2U, procs = 0x591560, bml_endpoints
>> = 0x591500, reachable = 0xfd7fffdff000), line 222 in "bml_r2.c"
>> [3] mca_pml_ob1_add_procs(procs = 0x5914c0, nprocs = 2U), line 248 in
>> "pml_ob1.c"
>> [4] ompi_mpi_init(argc = 1, argv = 0xfd7fffdff318, requested = 0,
>> provided = 0xfd7fffdff234), line 651 in "ompi_mpi_init.c"
>> [5] PMPI_Init(argc = 0xfd7fffdff2ec, argv = 0xfd7fffdff2e0),
>> line 90 in "pinit.c"
>> [6] main(argc = 1, argv = 0xfd7fffdff318), line 82 in "buffer.c"
>>
>> I believe the problem is that mca_btl_sm_component.shm_fifo[j]
>> contains uninitialized data causes the loop on line 504 in btl_sm.c
>> to think that a remote rank has set its fifo address.
>>
>> Has anyone else seen the above happening?
>>
>> --td
  




Re: [OMPI devel] SM initialization race condition

2008-08-21 Thread Tim Mattox
Actually, bzero() is POSIX.  Here is the history section of the bzero man page
on Mac OS X 10.4:

A bzero() function appeared in 4.3BSD.  Its prototype existed previously
 in <string.h> before it was moved to <strings.h> for IEEE Std 1003.1-2001
 (``POSIX.1'') compliance.

Hmmm, but the Linux man page says it is deprecated, and says we should
use memset.
Wish it explained why...  so I think we are safe to just use bzero.

On Thu, Aug 21, 2008 at 9:32 AM, Jeff Squyres wrote:
> IIRC, bzero is a gnu-ism.  We should probably use memset instead.
>
>
> On Aug 21, 2008, at 5:40 AM, George Bosilca wrote:
>
>> Terry,
>>
>> We use the feature defined by POSIX mmap where the area should be
>> zero-filled when the file length is extended. What OS you're using when you
>> see such problems ?
>>
>> Just in case, here is a patch that set the beginning of the mmaped region
>> to zero, in case this is not done automatically. As in most cases this is an
>> unnecessary overhead, we should find the cases where we really need this,
>> and possibly conditionally compile it.
>>
>> Index: ompi/mca/common/sm/common_sm_mmap.c
>> ===
>> --- ompi/mca/common/sm/common_sm_mmap.c (revision 19377)
>> +++ ompi/mca/common/sm/common_sm_mmap.c (working copy)
>> @@ -163,6 +163,7 @@
>>
>>  /* initialize the segment - only the first process
>>     to open the file */
>> +bzero( map->data_addr, size );
>>  mem_offset = map->data_addr - (unsigned char *)map->map_seg;
>>  map->map_seg->seg_offset = mem_offset;
>>  map->map_seg->seg_size = size - mem_offset;
>>
>>  george.
>>
>> On Aug 21, 2008, at 1:22 PM, Terry Dontje wrote:
>>
>>> I've been seeing an intermittent (once every 4 hours looping on a quick
>>> initialization program) segv with the following stack trace.
>>>
>>> =>[1] mca_btl_sm_add_procs(btl = 0xfd7ffdb67ef0, nprocs = 2U, procs =
>>> 0x591560, peers = 0x591580, reachability = 0xfd7fffdff000), line 519 in
>>> "btl_sm.c"
>>> [2] mca_bml_r2_add_procs(nprocs = 2U, procs = 0x591560, bml_endpoints =
>>> 0x591500, reachable = 0xfd7fffdff000), line 222 in "bml_r2.c"
>>> [3] mca_pml_ob1_add_procs(procs = 0x5914c0, nprocs = 2U), line 248 in
>>> "pml_ob1.c"
>>> [4] ompi_mpi_init(argc = 1, argv = 0xfd7fffdff318, requested = 0,
>>> provided = 0xfd7fffdff234), line 651 in "ompi_mpi_init.c"
>>> [5] PMPI_Init(argc = 0xfd7fffdff2ec, argv = 0xfd7fffdff2e0), line
>>> 90 in "pinit.c"
>>> [6] main(argc = 1, argv = 0xfd7fffdff318), line 82 in "buffer.c"
>>>
>>> I believe the problem is that mca_btl_sm_component.shm_fifo[j] contains
>>> uninitialized data causes the loop on line 504 in btl_sm.c to think that a
>>> remote rank has set its fifo address.
>>>
>>> Has anyone else seen the above happening?
>>>
>>> --td
>>>
>>
>
>
> --
> Jeff Squyres
> Cisco Systems
>
>



-- 
Tim Mattox, Ph.D. - http://homepage.mac.com/tmattox/
 tmat...@gmail.com || timat...@open-mpi.org
 I'm a bright... http://www.the-brights.net/


Re: [OMPI devel] SM initialization race condition

2008-08-21 Thread George Bosilca

bzero() function conforms to IEEE Std 1003.1-2001 (``POSIX.1'')

memset() function conforms to ISO/IEC 9899:1990 (``ISO C90'')

Both functions are in libc, so it's definitely difficult to see
which one is better.


  george.

On Aug 21, 2008, at 3:32 PM, Jeff Squyres wrote:


IIRC, bzero is a gnu-ism.  We should probably use memset instead.


On Aug 21, 2008, at 5:40 AM, George Bosilca wrote:


Terry,

We use the feature defined by POSIX mmap where the area should be  
zero-filled when the file length is extended. What OS you're using  
when you see such problems ?


Just in case, here is a patch that set the beginning of the mmaped  
region to zero, in case this is not done automatically. As in most  
cases this is an unnecessary overhead, we should find the cases  
where we really need this, and possibly conditionally compile it.


Index: ompi/mca/common/sm/common_sm_mmap.c
===
--- ompi/mca/common/sm/common_sm_mmap.c (revision 19377)
+++ ompi/mca/common/sm/common_sm_mmap.c (working copy)
@@ -163,6 +163,7 @@

   /* initialize the segment - only the first process
  to open the file */
+bzero( map->data_addr, size );
   mem_offset = map->data_addr - (unsigned char *)map->map_seg;

   map->map_seg->seg_offset = mem_offset;
   map->map_seg->seg_size = size - mem_offset;

george.

On Aug 21, 2008, at 1:22 PM, Terry Dontje wrote:

I've been seeing an intermittent (once every 4 hours looping on a  
quick initialization program) segv with the following stack trace.


=>[1] mca_btl_sm_add_procs(btl = 0xfd7ffdb67ef0, nprocs = 2U,  
procs = 0x591560, peers = 0x591580, reachability =  
0xfd7fffdff000), line 519 in "btl_sm.c"
[2] mca_bml_r2_add_procs(nprocs = 2U, procs = 0x591560,  
bml_endpoints = 0x591500, reachable = 0xfd7fffdff000), line  
222 in "bml_r2.c"
[3] mca_pml_ob1_add_procs(procs = 0x5914c0, nprocs = 2U), line 248  
in "pml_ob1.c"
[4] ompi_mpi_init(argc = 1, argv = 0xfd7fffdff318, requested =  
0, provided = 0xfd7fffdff234), line 651 in "ompi_mpi_init.c"
[5] PMPI_Init(argc = 0xfd7fffdff2ec, argv =  
0xfd7fffdff2e0), line 90 in "pinit.c"

[6] main(argc = 1, argv = 0xfd7fffdff318), line 82 in "buffer.c"

I believe the problem is that mca_btl_sm_component.shm_fifo[j]  
contains uninitialized data causes the loop on line 504 in  
btl_sm.c to think that a remote rank has set its fifo address.


Has anyone else seen the above happening?

--td






--
Jeff Squyres
Cisco Systems







Re: [OMPI devel] SM initialization race condition

2008-08-21 Thread Brian W. Barrett
bzero is not a gnu-ism -- it's in POSIX.1.  Either bzero or memset is
correct, and both are used throughout OMPI.


Brian

On Thu, 21 Aug 2008, Jeff Squyres wrote:


IIRC, bzero is a gnu-ism.  We should probably use memset instead.


On Aug 21, 2008, at 5:40 AM, George Bosilca wrote:


Terry,

We use the feature defined by POSIX mmap where the area should be 
zero-filled when the file length is extended. What OS you're using when you 
see such problems ?


Just in case, here is a patch that set the beginning of the mmaped region 
to zero, in case this is not done automatically. As in most cases this is 
an unnecessary overhead, we should find the cases where we really need 
this, and possibly conditionally compile it.


Index: ompi/mca/common/sm/common_sm_mmap.c
===
--- ompi/mca/common/sm/common_sm_mmap.c (revision 19377)
+++ ompi/mca/common/sm/common_sm_mmap.c (working copy)
@@ -163,6 +163,7 @@

   /* initialize the segment - only the first process
  to open the file */
+bzero( map->data_addr, size );
   mem_offset = map->data_addr - (unsigned char *)map->map_seg;
   map->map_seg->seg_offset = mem_offset;
   map->map_seg->seg_size = size - mem_offset;

george.

On Aug 21, 2008, at 1:22 PM, Terry Dontje wrote:

I've been seeing an intermittent (once every 4 hours looping on a quick 
initialization program) segv with the following stack trace.


=>[1] mca_btl_sm_add_procs(btl = 0xfd7ffdb67ef0, nprocs = 2U, procs = 
0x591560, peers = 0x591580, reachability = 0xfd7fffdff000), line 519 
in "btl_sm.c"
[2] mca_bml_r2_add_procs(nprocs = 2U, procs = 0x591560, bml_endpoints = 
0x591500, reachable = 0xfd7fffdff000), line 222 in "bml_r2.c"
[3] mca_pml_ob1_add_procs(procs = 0x5914c0, nprocs = 2U), line 248 in 
"pml_ob1.c"
[4] ompi_mpi_init(argc = 1, argv = 0xfd7fffdff318, requested = 0, 
provided = 0xfd7fffdff234), line 651 in "ompi_mpi_init.c"
[5] PMPI_Init(argc = 0xfd7fffdff2ec, argv = 0xfd7fffdff2e0), line 
90 in "pinit.c"

[6] main(argc = 1, argv = 0xfd7fffdff318), line 82 in "buffer.c"

I believe the problem is that mca_btl_sm_component.shm_fifo[j] contains 
uninitialized data causes the loop on line 504 in btl_sm.c to think that a 
remote rank has set its fifo address.


Has anyone else seen the above happening?

--td









Re: [OMPI devel] SM initialization race condition

2008-08-21 Thread Jeff Squyres

IIRC, bzero is a gnu-ism.  We should probably use memset instead.


On Aug 21, 2008, at 5:40 AM, George Bosilca wrote:


Terry,

We use the feature defined by POSIX mmap where the area should be  
zero-filled when the file length is extended. What OS you're using  
when you see such problems ?


Just in case, here is a patch that set the beginning of the mmaped  
region to zero, in case this is not done automatically. As in most  
cases this is an unnecessary overhead, we should find the cases  
where we really need this, and possibly conditionally compile it.


Index: ompi/mca/common/sm/common_sm_mmap.c
===
--- ompi/mca/common/sm/common_sm_mmap.c (revision 19377)
+++ ompi/mca/common/sm/common_sm_mmap.c (working copy)
@@ -163,6 +163,7 @@

/* initialize the segment - only the first process
   to open the file */
+bzero( map->data_addr, size );
mem_offset = map->data_addr - (unsigned char *)map->map_seg;

map->map_seg->seg_offset = mem_offset;
map->map_seg->seg_size = size - mem_offset;

 george.

On Aug 21, 2008, at 1:22 PM, Terry Dontje wrote:

I've been seeing an intermittent (once every 4 hours looping on a  
quick initialization program) segv with the following stack trace.


=>[1] mca_btl_sm_add_procs(btl = 0xfd7ffdb67ef0, nprocs = 2U,  
procs = 0x591560, peers = 0x591580, reachability =  
0xfd7fffdff000), line 519 in "btl_sm.c"
[2] mca_bml_r2_add_procs(nprocs = 2U, procs = 0x591560,  
bml_endpoints = 0x591500, reachable = 0xfd7fffdff000), line 222  
in "bml_r2.c"
[3] mca_pml_ob1_add_procs(procs = 0x5914c0, nprocs = 2U), line 248  
in "pml_ob1.c"
[4] ompi_mpi_init(argc = 1, argv = 0xfd7fffdff318, requested =  
0, provided = 0xfd7fffdff234), line 651 in "ompi_mpi_init.c"
[5] PMPI_Init(argc = 0xfd7fffdff2ec, argv =  
0xfd7fffdff2e0), line 90 in "pinit.c"

[6] main(argc = 1, argv = 0xfd7fffdff318), line 82 in "buffer.c"

I believe the problem is that mca_btl_sm_component.shm_fifo[j]  
contains uninitialized data causes the loop on line 504 in btl_sm.c  
to think that a remote rank has set its fifo address.


Has anyone else seen the above happening?

--td






--
Jeff Squyres
Cisco Systems



Re: [OMPI devel] SM initialization race condition

2008-08-21 Thread George Bosilca

Terry,

We use the feature defined by POSIX mmap where the area should be
zero-filled when the file length is extended. What OS are you using when
you see such problems?

Just in case, here is a patch that sets the beginning of the mmaped
region to zero, in case this is not done automatically. As in most
cases this is an unnecessary overhead, we should find the cases where
we really need this, and possibly conditionally compile it.


Index: ompi/mca/common/sm/common_sm_mmap.c
===
--- ompi/mca/common/sm/common_sm_mmap.c (revision 19377)
+++ ompi/mca/common/sm/common_sm_mmap.c (working copy)
@@ -163,6 +163,7 @@

 /* initialize the segment - only the first process
to open the file */
+bzero( map->data_addr, size );
 mem_offset = map->data_addr - (unsigned char *)map->map_seg;

 map->map_seg->seg_offset = mem_offset;
 map->map_seg->seg_size = size - mem_offset;

  george.
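
To make the idea above concrete, here is a minimal stand-alone sketch. It
is not the code in common_sm_mmap.c. It assumes a file-backed segment that
is created, extended with ftruncate(), mapped with MAP_SHARED, and then
optionally zeroed by the creating process. The function name
create_zeroed_segment and the SM_ZERO_ON_CREATE switch are hypothetical
and only illustrate the "conditionally compile it" suggestion.

```c
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical build-time switch: set to 0 to rely solely on the
 * zero-fill that POSIX promises when a file is extended. */
#ifndef SM_ZERO_ON_CREATE
#define SM_ZERO_ON_CREATE 1
#endif

/* Hypothetical helper: create (or truncate) the backing file at `path`,
 * extend it to `size` bytes, and map it shared.
 * Returns the mapped address, or NULL on failure. */
void *create_zeroed_segment(const char *path, size_t size)
{
    void *addr;
    int fd;

    assert(size > 0);   /* mmap rejects a zero-length mapping */

    fd = open(path, O_CREAT | O_RDWR | O_TRUNC, 0600);
    if (fd < 0) {
        return NULL;
    }

    /* Extending the file: the new bytes should read back as zero. */
    if (ftruncate(fd, (off_t)size) != 0) {
        close(fd);
        return NULL;
    }

    addr = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    close(fd);          /* the mapping remains valid after close */
    if (addr == MAP_FAILED) {
        return NULL;
    }

#if SM_ZERO_ON_CREATE
    /* Explicit zeroing in case the platform does not do it for us.
     * Only the creating process should do this, before peers attach. */
    memset(addr, 0, size);
#endif
    return addr;
}
```

The explicit memset only helps if it runs before any other process starts
reading or writing the segment. Otherwise it could wipe data a peer has
already published.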

On Aug 21, 2008, at 1:22 PM, Terry Dontje wrote:

I've been seeing an intermittent (once every 4 hours looping on a  
quick initialization program) segv with the following stack trace.


=>[1] mca_btl_sm_add_procs(btl = 0xfd7ffdb67ef0, nprocs = 2U,  
procs = 0x591560, peers = 0x591580, reachability =  
0xfd7fffdff000), line 519 in "btl_sm.c"
[2] mca_bml_r2_add_procs(nprocs = 2U, procs = 0x591560,  
bml_endpoints = 0x591500, reachable = 0xfd7fffdff000), line 222  
in "bml_r2.c"
[3] mca_pml_ob1_add_procs(procs = 0x5914c0, nprocs = 2U), line 248  
in "pml_ob1.c"
[4] ompi_mpi_init(argc = 1, argv = 0xfd7fffdff318, requested =  
0, provided = 0xfd7fffdff234), line 651 in "ompi_mpi_init.c"
[5] PMPI_Init(argc = 0xfd7fffdff2ec, argv = 0xfd7fffdff2e0),  
line 90 in "pinit.c"

[6] main(argc = 1, argv = 0xfd7fffdff318), line 82 in "buffer.c"

I believe the problem is that mca_btl_sm_component.shm_fifo[j]
contains uninitialized data, which causes the loop on line 504 in
btl_sm.c to think that a remote rank has set its fifo address.
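
A small sketch of the failure mode described above, under stated
assumptions: I am assuming the loop treats a non-NULL slot as "this rank
has published its fifo address", which may not be exactly what line 504 of
btl_sm.c tests. The type fifo_table and all function names here are
hypothetical stand-ins, not Open MPI code.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define NPROCS 2   /* hypothetical number of local ranks */

/* Hypothetical stand-in for the shared table of per-rank fifo addresses. */
struct fifo_table {
    void *fifo_addr[NPROCS];
};

/* Assumed readiness test: a slot counts as published once it is non-NULL. */
int peer_has_published(const struct fifo_table *t, int rank)
{
    assert(rank >= 0 && rank < NPROCS);
    return t->fifo_addr[rank] != NULL;
}

/* Simulates stale bytes left in a segment that was never zeroed. */
void poison_fifo_table(struct fifo_table *t)
{
    memset(t, 0xAB, sizeof *t);
}

/* What the creating process has to guarantee before any peer polls. */
void init_fifo_table(struct fifo_table *t)
{
    int i;

    for (i = 0; i < NPROCS; i++) {
        t->fifo_addr[i] = NULL;
    }
}
```

With a poisoned table, peer_has_published() reports a rank as ready even
though nothing was ever stored, and the caller would then dereference a
garbage address. After init_fifo_table() the same check correctly reports
"not yet".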


Has anyone else seen the above happening?

--td




