On Aug 17, 2006, at 7:49 PM, Pete Wyckoff wrote:

[EMAIL PROTECTED] wrote on Thu, 17 Aug 2006 18:14 -0500:
* BMI memory allocation.  Do we place any restrictions on when or how
frequently BMI_memalloc is called?  In the pvfs code, we always call
BMI_memalloc for a post_send or post_recv.  Would it be possible to
avoid the malloc on the client for a write and just use the user
buffer?  Or should we mandate that calls to post_send and post_recv
always pass in a pointer from BMI_memalloc?  (as a side note, if we
make that mandate, maybe we should have a BMI_buffer type that
memalloc returns and post_send/post_recv accept).

Both bmi_ib and bmi_gm define the BMI memalloc method to do
something other than simply malloc().  In the IB case, it pins the
memory early, and never unpins it until the corresponding
BMI_memfree() happens.  This is better than letting BMI do the
pinning explicitly, as it moves some of the messaging work out of
the critical path, if you can arrange to alloc/free before you do
send/recv.

Note that these alloc routines only do something special if the
buffer is big enough to be "worth it" (8 kB for IB).

There are no restrictions on how frequently you can call these things.
Each pinned memory region has some overhead in terms of in-pvfs data
structures, in-kernel data structures, and on-NIC data structures.
Ideally we'd try to limit the growth of these things and force old
entries to be freed, but in practice they mostly just grow and it's
not a big problem (unless you have lots of pvfs apps on a single
box, for instance).

You can certainly avoid the malloc and use the user buffer when you
have one instead.  I think this is the common case for MPI-IO
operations.  Point out what case you're talking about and I'll take
a look.

It looks like the mem_to_bmi code (client write) in flow always does a
memalloc for the intermediate buffer and then copies the user buffer
into that.  On reads (bmi_to_mem), flow does use the client's buffer,
so I guess that's a case that doesn't do memalloc.  I wonder if the
copy on a client write could be avoided as well, though.


We definitely cannot mandate that all memory is BMI_memalloc-ed.
Arbitrary MPI_File_write() and similar will pass in user buffers.
We don't want to copy them into BMI_memalloc-ed memory, and it's not
really practical to require that application writers use the MPI (or
PVFS) alloc routines.

If the bmi_buffer_type argument to the post_send and post_recv
routines is BMI_PRE_ALLOC, a BMI implementation can avoid pinning
the memory, as does GM.  For IB, it's just as fast to check the
address to see if it has already been pinned, either through
memalloc or implicitly by having been used as a user buffer.


Sounds cool.  Thanks for the good explanation.

-sam

                -- Pete


_______________________________________________
Pvfs2-developers mailing list
[email protected]
http://www.beowulf-underground.org/mailman/listinfo/pvfs2-developers
