[OMPI devel] Fwd: [Open MPI] #1101: MPI_ALLOC_MEM with 0 size must be valid

2007-07-23 Thread Jeff Squyres
Does anyone have any opinions on this?  If not, I'll go implement  
option #1.


Len: can we add this to the agenda for tomorrow?  It should take <5  
minutes to discuss.



Begin forwarded message:


From: "Open MPI" 
Date: July 23, 2007 9:13:47 PM EDT
Cc: b...@osl.iu.edu
Subject: [Open MPI] #1101: MPI_ALLOC_MEM with 0 size must be valid

#1101: MPI_ALLOC_MEM with 0 size must be valid
-------------------------+------------------------------------
 Reporter:  jsquyres     |      Owner:
     Type:  defect       |     Status:  new
 Priority:  minor        |  Milestone:  Open MPI 1.2.4
  Version:  trunk        |   Keywords:
-------------------------+------------------------------------

 From http://www.open-mpi.org/community/lists/devel/2007/07/1977.php,
 Lisandro Dalcin notes that MPI_ALLOC_MEM(0, ...) is a valid call, but
 OMPI returns a warning ("malloc debug: Request for 0 bytes
 (base/mpool_base_alloc.c, 194)").

 I see two choices for fixing this:

  1. Have the MPI_ALLOC_MEM function itself realize that the request was
 for 0 bytes and return a pointer to a global char array of size 1.  This
 allows the user to dereference the pointer, but not store anything there.
 To match this, MPI_FREE_MEM will have to notice that the base passed in
 is this sentinel array and not actually free it.
  2. Do pretty much the same thing as described above, but in the mpool
 base (we cannot malloc(0), because that's what [sometimes] generates the
 warning message).

 Personally, I like the former because the lower layers of OMPI should
 never be calling mpool_base_alloc(0, ...).  Any other comments?
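
For illustration, option 1 boils down to something like the following
sketch, written here as a PMPI-style wrapper so it stands alone (the
sentinel name is made up; the real change would live in OMPI's own
MPI_ALLOC_MEM/MPI_FREE_MEM bindings):

/* Illustrative sketch only: option 1 expressed as a PMPI-style wrapper
 * so it can be compiled and linked against any MPI.  The real fix would
 * live inside OMPI's MPI_Alloc_mem/MPI_Free_mem bindings themselves. */
#include <mpi.h>

static char zero_size_sentinel[1];   /* dereferenceable, holds nothing */

int MPI_Alloc_mem(MPI_Aint size, MPI_Info info, void *baseptr)
{
    if (0 == size) {
        /* Never pass a 0-byte request down to the mpool base. */
        *((void **) baseptr) = zero_size_sentinel;
        return MPI_SUCCESS;
    }
    return PMPI_Alloc_mem(size, info, baseptr);
}

int MPI_Free_mem(void *base)
{
    if (zero_size_sentinel == base) {
        return MPI_SUCCESS;   /* nothing was actually allocated */
    }
    return PMPI_Free_mem(base);
}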

--
Ticket URL: 
Open MPI 



--
Jeff Squyres
Cisco Systems



Re: [OMPI devel] MPI_ALLOC_MEM warning when requesting 0 (zero) bytes

2007-07-23 Thread Jeff Squyres

Excellent point.  Thanks!

I think that this will require a little tomfoolery to fix properly
because we can't simply return NULL (the user can't expect to store
anything through the pointer that we return, but they should be able to
dereference it without seg faulting).


I'll file a ticket for fixing this.


On Jul 23, 2007, at 8:56 PM, Lisandro Dalcin wrote:


If I understand the standard correctly,
http://www.mpi-forum.org/docs/mpi-20-html/node54.htm#Node54

MPI_ALLOC_MEM with size=0 is valid ('size' is a nonnegative integer)

Then, using branch v1.2, I've got the following warning at runtime:

malloc debug: Request for 0 bytes (base/mpool_base_alloc.c, 194)

As always, forgive me if this was already reported.

Regards,

--
Lisandro Dalcín
---
Centro Internacional de Métodos Computacionales en Ingeniería (CIMEC)
Instituto de Desarrollo Tecnológico para la Industria Química (INTEC)
Consejo Nacional de Investigaciones Científicas y Técnicas (CONICET)
PTLC - Güemes 3450, (3000) Santa Fe, Argentina
Tel/Fax: +54-(0)342-451.1594




--
Jeff Squyres
Cisco Systems




[OMPI devel] MPI_ALLOC_MEM warning when requesting 0 (zero) bytes

2007-07-23 Thread Lisandro Dalcin

If I understand the standard correctly,
http://www.mpi-forum.org/docs/mpi-20-html/node54.htm#Node54

MPI_ALLOC_MEM with size=0 is valid ('size' is a nonnegative integer)

Then, using branch v1.2, I've got the following warning at runtime:

malloc debug: Request for 0 bytes (base/mpool_base_alloc.c, 194)
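
For what it's worth, a minimal program along these lines (a hypothetical
reproducer, not the original test) is just:

/* Hypothetical minimal reproducer: a zero-byte MPI_ALLOC_MEM is legal
 * per MPI-2 and should not emit the "malloc debug" warning. */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    void *buf = NULL;

    MPI_Init(&argc, &argv);
    MPI_Alloc_mem(0, MPI_INFO_NULL, &buf);   /* size = 0 is valid */
    printf("MPI_Alloc_mem(0) returned %p\n", buf);
    MPI_Free_mem(buf);
    MPI_Finalize();
    return 0;
}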

As always, forgive me if this was already reported.

Regards,

--
Lisandro Dalcín
---
Centro Internacional de Métodos Computacionales en Ingeniería (CIMEC)
Instituto de Desarrollo Tecnológico para la Industria Química (INTEC)
Consejo Nacional de Investigaciones Científicas y Técnicas (CONICET)
PTLC - Güemes 3450, (3000) Santa Fe, Argentina
Tel/Fax: +54-(0)342-451.1594



[OMPI devel] MPI_APPNUM value for apps not started through mpiexec

2007-07-23 Thread Lisandro Dalcin

Using a fresh (2 hours ago) update of SVN branch v1.2, I've found
that the attribute MPI_APPNUM returns -1 (minus one) when a 'sequential'
application is launched without mpiexec.

Reading the MPI standard, I understand it should return a non-negative
integer if defined, or it should not be defined at all.

http://www.mpi-forum.org/docs/mpi-20-html/node113.htm#Node113
"""
If an application was not spawned with MPI_COMM_SPAWN or
MPI_COMM_SPAWN_MULTIPLE, and MPI_APPNUM doesn't make sense in the
context of the implementation-specific startup mechanism, MPI_APPNUM
is not set.
"""

I'm not sure if this is intended, but I'm reporting it anyway; sorry if
this issue was already reported.
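
For reference, the check described above amounts to roughly the following
(a minimal sketch, not the original test program):

/* Minimal sketch of the MPI_APPNUM check described above (not the
 * original test program).  Per the standard, either the attribute is
 * not set (flag == 0) or its value is a non-negative integer. */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int *appnum = NULL;
    int flag = 0;

    MPI_Init(&argc, &argv);
    MPI_Comm_get_attr(MPI_COMM_WORLD, MPI_APPNUM, &appnum, &flag);
    if (flag) {
        printf("MPI_APPNUM = %d\n", *appnum);   /* should never be -1 */
    } else {
        printf("MPI_APPNUM is not set\n");
    }
    MPI_Finalize();
    return 0;
}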

Regards,

--
Lisandro Dalcín
---
Centro Internacional de Métodos Computacionales en Ingeniería (CIMEC)
Instituto de Desarrollo Tecnológico para la Industria Química (INTEC)
Consejo Nacional de Investigaciones Científicas y Técnicas (CONICET)
PTLC - Güemes 3450, (3000) Santa Fe, Argentina
Tel/Fax: +54-(0)342-451.1594



Re: [OMPI devel] Fwd: lsf support / farm use models

2007-07-23 Thread Bill McMillan
Matt,

Sorry for the delay in replying.

> first of all, thanks for the info bill! i think i'm really starting to
> piece things together now. you are right in that i'm working with a 6.x
> (6.2 with 6.1 devel libs ;) install here at cadence, without the HPC
> extensions AFAIK. also, i think that our customers are mostly in the
> same position -- i assume that the HPC extensions cost extra? or perhaps
> admins just don't bother to install them.

 Since most apps in EDA are sequential, most admins haven't installed
 the extensions.

> i'll try to gather more data, but my feeling is that the market
> penetration of both HPC and LSF 7.0 is low in our market (EDA vendors
> and customers). i'd love to just stall until 7.0 is widely available,
> but perhaps in the mean time it would be nice to have some backward
> support for LSF 6.0 'base'. it seems like supporting LSF 6.x w/ HPC
> might not be too useful, since:
> a) it's not clear that the 'built in' "bsub -n N -a openmpi foo"
> support will work with an MPI-2 dynamic-spawning application like mine
> (or does it?),

 From an LSF perspective, you get allocated N slots, and how the
application uses them is pretty much at its
 discretion.   So in this case it would start orted on each allocated
node, and you can create whatever
 dynamic processes you like from your openmpi app within that
allocation.

 At present the actual allocation is fixed, but there will be support
for changing the actual allocation
 in a forthcoming release.
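
For concreteness, the dynamic-spawning pattern being discussed boils down
to something like the sketch below ("worker" is a made-up executable name;
this is not Matt's actual application):

/* Minimal sketch of the MPI-2 dynamic-spawning pattern discussed above:
 * the master starts on one slot of the LSF allocation and spawns its
 * workers onto the rest.  "worker" is a made-up executable name. */
#include <mpi.h>

int main(int argc, char **argv)
{
    MPI_Comm intercomm;

    MPI_Init(&argc, &argv);
    /* Spawn 4 workers within the existing allocation (orted already
     * occupies the allocated nodes, so no new bsub is needed). */
    MPI_Comm_spawn("worker", MPI_ARGV_NULL, 4, MPI_INFO_NULL,
                   0, MPI_COMM_SELF, &intercomm, MPI_ERRCODES_IGNORE);
    /* ... master/worker communication over intercomm ... */
    MPI_Comm_disconnect(&intercomm);
    MPI_Finalize();
    return 0;
}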

> b) i've heard that manually interfacing with the parallel application
> manager directly is tricky?

 If you don't use the supplied methods (such as the -a openmpi method)
then it can be a little tricky to
 set it up the first time.

> c) most importantly, it's not clear that any of our customers have the
> HPC support, and certainly not all of them, so i need to support LSF 6.0
> 'base' anyway -- it only needs to work until 7.0 is widely available
> (< 1 year? i really have no idea ... will Platform end support for 6.x
> at some particular time? or otherwise push customers to upgrade? perhaps
> cadence can help there too ...).

 The -a openmpi method works with LSF 6.x, and will be supported till at
 least the end of the decade.  It sounds like the simplest solution may
 be to make the HPC extensions available as a patch kit for everyone.

> 1) use bsub -n N, followed by N-1 ls_rtaske() calls (or similar).
> while ls_rtaske() may not 'force' me to follow the queuing rules, if i
> only launch on the proper machines, i should be okay, right? i don't
> think IO and process marshaling (i'm not sure exactly what you mean by
> that) are a problem since openmpi/orted handles those issues, i think?

 Yes, it will work; however, it has two drawbacks:
 * In this model you essentially become responsible for error handling
if a remote task dies, and cleaning up gracefully if
   the master process dies
 * From a process accounting (and hence scheduling) point of view,
resources consumed by the remote tasks are not attributed
   to the master task.
 The -a openmpi method (and blaunch) handles both these cases.

> 2) use only bsub's of single processes, using some initial wrapper
> script that bsub's all the jobs (master + N-1 slaves) needed to reach
> the desired static allocation for openmpi. this seems to be what my
> internal guy is suggesting is 'required'.

 Again, this will work, though you may not be too popular with your
 cluster admin if you are holding onto N-1 cpus while waiting for the
 Nth to be allocated.  Method (1) would be viewed as a true parallel job
 and could be backfilled, while (2) is just a loose collection of
 sequential tasks.  This would also suffer from the same drawbacks as (1).

 If your application could start with just 1 cpu and then deal with the
 rest as they are added, then you keep the cluster admin happy.


 I guess this discussion is becoming very LSF-specific; if you would
 prefer to discuss it offline, please let me know.

 Cheers
 Bill



Re: [OMPI devel] problems with openib finalize

2007-07-23 Thread Pavel Shamis (Pasha)
Just committed r15557, which adds a finalize flow to the mpool. So now
openib should be able to release all resources in the normal way.
Pasha


Pavel Shamis (Pasha) wrote:

Jeff Squyres wrote:
  
Background: Pasha added a call in the openib BTL finalize function  
that will only succeed if all registered memory has been released  
(ibv_dealloc_pd()).  Since the test app didn't call MPI_FREE_MEM,  
there was some memory that was still registered, and therefore the  
call in finalize failed.  We treated this as a fatal error.  Last  
night's MTT runs turned up several apps that exhibited this fatal error.


While we're examining this problem, Pasha has removed the call to  
ibv_dealloc_pd() in the trunk openib BTL finalize.
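
For context, the finalize-time check in question boils down to something
like the following sketch (the wrapper name and warning text are made up;
this is not the actual openib BTL code):

/* Hedged sketch of the finalize-time check being described.
 * ibv_dealloc_pd() fails if memory is still registered against the
 * protection domain, which is exactly the case the MTT runs hit. */
#include <infiniband/verbs.h>
#include <stdio.h>

int finalize_protection_domain(struct ibv_pd *pd, int warn_on_failure)
{
    int rc = ibv_dealloc_pd(pd);    /* returns 0 on success */
    if (0 != rc && warn_on_failure) {
        fprintf(stderr, "openib finalize: ibv_dealloc_pd() failed; "
                "registered memory is probably still outstanding\n");
    }
    return rc;
}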


I examined 1 of the tests that was failing last night in MTT:  
onesided/t.f90.  This test has an MPI_ALLOC_MEM with no corresponding  
MPI_FREE_MEM.  To investigate this problem, I restored the call to  
ibv_dealloc_pd() and re-ran the t.f90 test -- the problem still  
occurs.  Good.


However, once I got the right MPI_FREE_MEM call in t.f90, the test  
started passing.  I.e., ibv_dealloc_pd(hca->ib_pd) succeeds because  
all registered memory has been released.  Hence, the test itself was  
faulty.


However, I don't think we should *error* if we fail to
ibv_dealloc_pd(hca->ib_pd); it's a user error, but it's not catastrophic
unless we're trying to do an HCA restart scenario.  Specifically: during
a normal MPI_FINALIZE, who cares?


I think we should do the following:

1. If we're not doing an HCA restart/checkpoint and we fail to  
ibv_dealloc_pd(), just move on (i.e., it's not a warning/error unless  
we *want* a warning, such as if an MCA parameter  
btl_openib_warn_if_finalize_fail is enabled, or somesuch).


2. If we *are* doing an HCA restart/checkpoint and ibv_dealloc_pd()
fails, then we have to fail gracefully and notify upper layers that
Bad Things happened (I suspect that we need mpool finalize
implemented to properly implement checkpointing for RDMA networks).


3. Add a new MCA parameter named mpi_show_mpi_alloc_mem_leaks that,  
when enabled, shows a warning in ompi_mpi_finalize() if there is  
still memory allocated by MPI_ALLOC_MEM that was not freed by  
MPI_FREE_MEM (this MCA parameter will parallel the already-existing  
mpi_show_handle_leaks MCA param which displays warnings if the app  
creates MPI objects but does not free them).
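
A rough sketch of what point 3 could amount to (all names below are
hypothetical illustrations, not OMPI's implementation):

/* Hypothetical sketch of point 3: count MPI_ALLOC_MEM allocations in the
 * MPI layer and warn from ompi_mpi_finalize() when the proposed MCA
 * parameter is enabled.  All names are made up for illustration. */
#include <stdio.h>

static int alloc_mem_outstanding = 0;   /* bumped by MPI_Alloc_mem */
static int show_alloc_mem_leaks  = 0;   /* set from the MCA parameter */

void alloc_mem_track(void)  { alloc_mem_outstanding++; }
void free_mem_track(void)   { alloc_mem_outstanding--; }

void alloc_mem_leak_check(void)         /* called at finalize time */
{
    if (show_alloc_mem_leaks && alloc_mem_outstanding > 0) {
        fprintf(stderr,
                "WARNING: %d MPI_ALLOC_MEM allocation(s) were never passed "
                "to MPI_FREE_MEM\n", alloc_mem_outstanding);
    }
}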


My points:
- leaked MPI_ALLOC_MEM memory should be reported by the MPI layer,  
not a BTL or mpool
- failing to ibv_dealloc_pd() during MPI_FINALIZE should only trigger  
a warning if the user wants to see it
- failing to ibv_dealloc_pd() during an HCA restart or checkpoint  
should gracefully fail upwards


Comments?
  


Agree.

In addition, I will add code that will flush all user data from the mpool
and allow normal IB finalization.


