I ran an entire battery of tests on these without any issues. Moreover, it is an
OMPI-related thing, and these error messages were never used. Anyway, please
let me know what exactly failed, and I'll fix it asap.
Thanks,
george.
On Oct 19, 2011, at 10:06 , Ralph Castain wrote:
> If you are go
OK, just saw your commit. It makes sense: an OMPI component should return OMPI
error codes. Thanks for the fix.
george.
On Oct 19, 2011, at 10:12 , George Bosilca wrote:
> I ran an entire battery of tests on these without any issues. Moreover, it is
> an OMPI-related thing, and these
Indeed, I removed some of the OMPI level error codes. As you can see in the
patch they were defined but never used.
I don't think they were worth an RFC, as they are never used, not only in the
trunk but also in 1.5 and 1.4. And I did check it because I was wondering why they
existed in the first pl
I don't know how you think that the error codes work in Open MPI, so I'll take
the liberty to describe it here so we all agree we're talking about the same
thing.
The opal_strerror is a nice feature: it allows registering a range of error
codes with a particular error converter. Every time you loo
Can I have an example of how the current trunk is broken due to this change?
Thanks,
george.
On Oct 19, 2011, at 16:32 , Ralph Castain wrote:
> I propose that we retain the rest of the changeset, but revert the OMPI
> constants to bring back their ORTE equivalents. We clearly should scrub tho
> should be discussed in an RFC. What the current code does is not
> consistent with the original intent.
>
> I don't agree that you shouldn't propagate error codes through OMPI; in
> fact, the original intent of the design was to allow such propagation.
> Again, suc
somebody feels like filing an RFC to remove them, please feel free to go
ahead.
george.
On Oct 19, 2011, at 18:41 , George Bosilca wrote:
> A careful reading of the committed patch would have pointed out that none of
> the concerns raised so far were true, the "old-way" beha
Wonderful!!! We've been waiting for such functionality for a while.
I do have some questions/remarks related to this patch.
What is the my_node_rank in the orte_proc_info_t structure? Is there any
difference between using the field my_node_rank or the vpid part of the
my_daemon? What is the cor
_Barrier(),
>>>> i got the following error :
>>>>
>>>> *** An error occurred in MPI_Barrier
>>>> *** on communicator
>>>> *** MPI_ERR_INTERN: internal error
>>>> *** MPI_ERRORS_ARE_FATAL: your
This is a much saner solution. We [mostly] stayed away from calling exit deep
inside our libraries; there is no reason to add it now. I'll vote in favor of
show_help + return code.
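A minimal sketch of the "show_help + return code" pattern, assuming hypothetical names (`my_show_help`, `my_lib_init`, `MY_ERR_NOT_AVAILABLE`) rather than the actual Open MPI functions and constants: the library prints a diagnostic and hands an error code back to its caller instead of calling exit() itself.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical error codes; not the real OPAL/OMPI constants. */
enum { MY_SUCCESS = 0, MY_ERR_NOT_AVAILABLE = -1 };

/* Hypothetical stand-in for a show_help-style diagnostic. */
static void my_show_help(const char *topic)
{
    assert(topic != NULL);
    fprintf(stderr, "help: %s\n", topic);
}

/* Library routine: report the problem and return an error code,
 * leaving the decision to abort to the caller. */
int my_lib_init(int resource_available)
{
    if (!resource_available) {
        my_show_help("resource not available");
        return MY_ERR_NOT_AVAILABLE;
    }
    return MY_SUCCESS;
}
```

The caller, not the library, then decides whether a failed `my_lib_init()` is fatal.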
george.
On Nov 1, 2011, at 15:14 , Jeff Squyres wrote:
> We talked about this on the call today.
>
> A good sugg
> 3. show_help. if OPAL_ENABLE_DEBUG, exit(1), else return error code.
>
>
>
> On Nov 1, 2011, at 4:50 PM, George Bosilca wrote:
>
>> This is a much saner solution. We [mostly] stayed away from calling exit
>> deep into our libraries, there is no reason to ad
as some opposition to having show_help messages possibly coming up all
> over the place I thought a fall back of only doing the show_help on
> enable_debug builds was a reasonable middle ground.
>
> --td
>>
>> On Nov 1, 2011, at 7:30 PM, George Bosilca wrote:
>>
>
I might have missed some of the phone conferences, but this is a highly
critical modification of one of the performance-critical subsystems of Open
MPI. There was no RFC about it and no prior warning. This change impacts every
other BTL and PML out there. Moreover, at this point there is no ass
This commit introduced quite a few warnings on Mac OS X. A snippet is attached
below. Btw, why do we need to build buffer event support in libevent? And why
ssl ...
../../../../../../ompi/opal/mca/event/libevent2013/libevent/bufferevent_openssl.c:
In function 'bio_bufferevent_read':
../../../..
t 11:25 PM, Nathan T. Hjelm wrote:
>
>> Hmm, I didn't come across that during testing. You are right that we
>> shouldn't be compiling with ssl support.
>>
>>
>> On Mon, 7 Nov 2011 01:17:50 -0500, George Bosilca
>> wrote:
>>> This c
tl.
>>>
>>> Anyone else care to weigh in or do some measurements?
>>>
>>> -Nathan
>>>
>>> On Sun, 6 Nov 2011 23:05:57 -0500, George Bosilca
>>> wrote:
>>>> I might have missed some of the phone conferences, but this is a
A little bit of history:
1. r25305: added 2 atomic operations to OPAL. However, they only exist on
amd64 and are only used in the vader BTL, which I assume only supports amd64.
2. r25334: The seg_key union got a new member ptr. This member is solely used
in the vader BTL, as all other BTL use
On Nov 7, 2011, at 10:37 , Jeff Squyres wrote:
> On Nov 7, 2011, at 10:16 AM, Nathan T. Hjelm wrote:
>
>> Yes, and I completely agree. I was simply trying to keep it consistent in
>> case there is something I don't know about the heterogeneous case.
>>
>> I increased the size of the 64 bit memb
On Nov 7, 2011, at 17:34 , Barrett, Brian W wrote:
> A number of OMPI developer institutions are working on a new BTL
> (different from vader) for the Cray XE series using the uGNI upper layer.
> The rkeys in uGNI are 128 bytes.
Thanks Brian for the clarification.
Obviously there was no need fo
On Nov 7, 2011, at 17:34 , Barrett, Brian W wrote:
> I'm not sure why they called it vader, but vader is a fairly
> straightforward shared memory BTL. It differs from sm in two important ways: 1)
> it uses the XPMEM driver instead of SysV for shared memory and 2) it uses
> the Nemesis queue
I was trying to understand how the debugger interface is supposed to work. And
if I was confused before, that feeling never disappeared.
There is one thing that I really can't figure out, and I hope that somebody
(Jeff/Ralph/Rolf based on svn blame) can enlighten me.
MPIR_debug_gate. In the doc
widely adopted.
>
> I'd suggest being a little careful about making changes without consulting
> people who use TV and "stat", at least - those are the ones most recently
> tested.
>
>
> On Nov 7, 2011, at 5:59 PM, George Bosilca wrote:
>
>> I was
t is broken...and I don't recall seeing that question raised
> to the community (and time given for a response) prior to the changes being
> committed. :-)
?
george.
>
>
> On Nov 7, 2011, at 8:20 PM, George Bosilca wrote:
>
>> They better do conform to what they
Nov 2011 17:18:42 -0500, George Bosilca
> wrote:
>> A little bit of history:
>>
>> 1. r25305: added 2 atomic operations to OPAL. However, they only exist on
>> amd64 and are only used in the vader BTL, which I assume only supports
>> amd64.
>
> Two thin
On Nov 8, 2011, at 07:52 , Jeff Squyres wrote:
> To be clear: that document simply standardizes what MPI implementations are
> supposed to provide in their MPIR implementation (prior to this, MPI
> implementations tended to have subtle differences between their MPIR
> implementations, which we
Folks,
Wednesday November 15th at 12:15 PST, we will have an Open MPI BOF. We will
have two guest speakers: Rolf vandeVaart from NVIDIA and Shinji Sumimoto from
the K-computer. If you are at SC, you are all invited to participate in this
annual event. Mingle for a moment with our user community,
MPIR_Breakpoint, as the name indicates, is just a breakpoint used by the
startup process or the MPI application to signal changes to the debugger. No
return value, nothing more than a breakpoint.
I wonder how the volatile got there; there is no such requirement on variables
that cannot be ch
Elements in an array are always stored in the expected [increasing] order,
regardless of the endianness of the architecture. Moreover, due to the alignment
rules, all members in a union will start at the same address.
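A small sketch of both claims, assuming a hypothetical `union my_key` loosely modelled on the key layout discussed in this thread (the member names are illustrative, not the actual Open MPI definition). It checks that every union member starts at the union's own address and that array elements sit at increasing addresses.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical key layout; not the real seg_key definition. */
union my_key {
    uint32_t key32[2];
    uint64_t key64;
    struct { uint64_t value[2]; } key128;
};

/* Returns 1 if every member of the union starts at the union's address. */
int members_share_address(const union my_key *k)
{
    assert(k != NULL);
    const void *base = (const void *)k;
    return (const void *)&k->key32 == base
        && (const void *)&k->key64 == base
        && (const void *)&k->key128 == base;
}

/* Returns 1 if array elements sit at increasing addresses (index order). */
int elements_are_increasing(const union my_key *k)
{
    assert(k != NULL);
    return &k->key32[0] < &k->key32[1]
        && &k->key128.value[0] < &k->key128.value[1];
}
```

What does change with endianness is the byte order inside each integer, not the order of the elements themselves.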
It turns out there is no endianness conversion on the keys, so I suppose both
p
ptmalloc2/malloc.c in the various versions of OpenMPI that
> are still being maintained.
>
> Larry Baker
> US Geological Survey
> 650-329-5608
> ba...@usgs.gov
>
> On 17 Oct 2011, at 8:18 PM, George Bosilca wrote:
>
>> Larry,
>>
>> Sorry for not
tarter = 0;
george.
On Nov 8, 2011, at 17:43 , Ashley Pittman wrote:
>
> I think the volatiles are there to ensure the compiler doesn't optimise away
> reads or function calls which has been a problem with this interface in the
> past.
>
> On 8 Nov 201
uld ask a
hard-core compiler guru about …
george.
>
> -Paul
>
> On 11/8/2011 2:46 PM, George Bosilca wrote:
>> This value is not even read by the debugger. It only checks for its
>> existence in the startup process, so I guess we're safe here as well.
>
32;
> uint64_t key64;
> struct { uint64_t value[2]; } key128;
> };
>
> -Nathan
>
> On Tue, 8 Nov 2011 17:22:48 -0500, George Bosilca
> wrote:
>> Elements in an array are always stored in the expected [increasing] order,
>> regardless of the endianness of the
>> -Paul
>>
>> On 11/8/2011 2:46 PM, George Bosilca wrote:
>>> This value is not even read by the debugger. It only checks for its
>>> existence in the startup process, so I guess we're safe here as well.
>>
>> --
>> Paul H. Hargrov
On Nov 8, 2011, at 10:36 , Nathan T. Hjelm wrote:
> On Tue, 8 Nov 2011 06:36:03 -0800, Rolf vandeVaart
> wrote:
>>> george.
>>>
>>> PS: Regarding the hand-copy instead of the memcpy, we tried to avoid
>> using
>>> memcpy in performance critical codes, especially when we know the size of
>>> the
This is supposed to be an intrinsic, automatically replaced by GCC during the
compilation process. If I do the same configure as you (on the same machine) I
have in my opal_config.h:
/* Whether C compiler supports __builtin_expect */
#define OPAL_C_HAVE_BUILTIN_EXPECT 1
/* Whether C++ compiler s
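As an illustration of how such a configure result is typically consumed, here is a sketch that wraps `__builtin_expect` behind branch-hint macros. `MY_LIKELY`, `MY_UNLIKELY` and `sum_non_negative` are hypothetical names, and the fallback definition of `OPAL_C_HAVE_BUILTIN_EXPECT` exists only so the sketch compiles without opal_config.h.

```c
#include <assert.h>
#include <stddef.h>

/* Normally provided by opal_config.h; guessed here from the compiler
 * only to keep this sketch self-contained. */
#if !defined(OPAL_C_HAVE_BUILTIN_EXPECT)
#  if defined(__GNUC__)
#    define OPAL_C_HAVE_BUILTIN_EXPECT 1
#  else
#    define OPAL_C_HAVE_BUILTIN_EXPECT 0
#  endif
#endif

#if OPAL_C_HAVE_BUILTIN_EXPECT
#  define MY_LIKELY(x)   __builtin_expect(!!(x), 1)
#  define MY_UNLIKELY(x) __builtin_expect(!!(x), 0)
#else
#  define MY_LIKELY(x)   (x)
#  define MY_UNLIKELY(x) (x)
#endif

/* Sum the non-negative entries; negative entries are assumed to be rare. */
long sum_non_negative(const int *v, int n)
{
    assert(v != NULL || n == 0);
    long sum = 0;
    for (int i = 0; i < n; i++) {
        if (MY_UNLIKELY(v[i] < 0)) {
            continue;  /* rare path */
        }
        sum += v[i];
    }
    return sum;
}
```

The hint only affects code layout and branch prediction; the result is the same with or without the builtin.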
I guess I reached one of those corner cases that didn't get tested. I can't start
any apps (not even a hostname) after this commit using the rsh PLM (as soon as
I add a hostfile). The mpirun is blocked in an infinite loop (after it spawned
the daemons) in orte_rmaps_base_compute_vpids. Attaching wi
.indiana.edu from app number 0
> universe size 8
>
>
> I'll get a fresh checkout and see if I can replicate from that...
>
> On Nov 17, 2011, at 7:42 PM, George Bosilca wrote:
>
>> I guess I reached one of those corner cases that didn't get tested. I can't
>
Hello, World, I am 0 of 2 on host odin090.cs.indiana.edu from app number 0
>>> universe size 8
>>>
>>>
>>> I'll get a fresh checkout and see if I can replicate from that...
>>>
>>> On Nov 17, 2011, at 7:42 PM, George Bosilca wrote:
>
t get it to fail, even with hostfile arguments. I'll try again in the
> morning.
>
> On Nov 17, 2011, at 8:49 PM, George Bosilca wrote:
>
>> Maybe the issue is generated by how the hostfile is specified. I used
>> orte_default_hostfile= in my mca-params.conf.
>>
Dear Yuki and Takahiro,
Thanks for the bug report and for the patch. I pushed a [nearly identical]
patch to the trunk in https://svn.open-mpi.org/trac/ompi/changeset/25488. A
special version for 1.4 has been prepared and attached to
ticket #2916 (https://svn.open-mpi.org/trac/o
On Nov 18, 2011, at 07:49 , Ralph Castain wrote:
> That's a condition which should never be reached, but just to be safe, I have
> added a "bozo check" that will cause the routine to error out with a message
> if that situation occurs. I have tried everything - hostfile, dash-host,
> bizarre co
On Nov 18, 2011, at 07:29 , Hugo Daniel Meyer wrote:
> Hello again.
>
> I was doing some tracing in the PML_OB1 files. I started to follow an MPI_Ssend(),
> trying to find where a message is stored (in the sender) if it is not sent
> until the receiver posts the recv, but I didn't find that place.
On Nov 18, 2011, at 11:14 , Hugo Daniel Meyer wrote:
> 2011/11/18 George Bosilca
>
> On Nov 18, 2011, at 07:29 , Hugo Daniel Meyer wrote:
>
>> Hello again.
>>
>> I was doing some tracing in the PML_OB1 files. I started to follow an
>> MPI_Ssend() trying to fi
On Nov 18, 2011, at 11:50 , Hugo Daniel Meyer wrote:
>
> 2011/11/18 George Bosilca
>
> On Nov 18, 2011, at 11:14 , Hugo Daniel Meyer wrote:
>
>> 2011/11/18 George Bosilca
>>
>> On Nov 18, 2011, at 07:29 , Hugo Daniel Meyer wrote:
>>
>>> Hell
-10!
george.
On Nov 22, 2011, at 08:38 , Jeff Squyres wrote:
> 1. Personally, I would love to rename the openib BTL to something that makes
> sense and is a current name. By "rename", I include "rename the directory"
> -- so it would be ompi/mca/btl/ofrc, or something like that.
>
> 2. Goo
aming xpmem? Or ...?
>
> On Nov 22, 2011, at 11:37 AM, George Bosilca wrote:
>
>> -10!
>>
>> george.
>>
>> On Nov 22, 2011, at 08:38 , Jeff Squyres wrote:
>>
>>> 1. Personally, I would love to rename the openib BTL to something that
>
'ing the whole concept? Or
>> just renaming xpmem? Or ...?
>>
>> On Nov 22, 2011, at 11:37 AM, George Bosilca wrote:
>>
>>> -10!
>>>
>>> george.
>>>
>>> On Nov 22, 2011, at 08:38 , Jeff Squyres wrote:
>>>
>
These two functions aim at defining a memory layout (contiguous or not) that
can be the target of a one-sided communication. I don't see why there is a need to
know what type of communication it will be … What is so different in
xpmem that requires the memory to be prepared based on the op
On Nov 29, 2011, at 15:52 , Nathan Hjelm wrote:
>
>
> On Tue, 29 Nov 2011, George Bosilca wrote:
>
>> These two functions aim at defining a memory layout (contiguous or not)
>> that can be the target of a one-sided communication. I don't see why there is a
is triggered right before returning above the MPI
layer, at the level where you placed your interception you have all the freedom
you need to handle the faults.
george.
>
> Thanks for the help.
>
> Hugo
>
> 2011/11/19 Hugo Daniel Meyer
>
>
> 2011/11/18 Geo
Do we have the same issue in the trunk?
george.
On Dec 12, 2011, at 12:49 , Mike Dubman wrote:
> after removing my debug prints - the correct line is 448
>
> On Mon, Dec 12, 2011 at 7:46 PM, Mike Dubman wrote:
>
>
> Hi guys,
>
> The latest v1.5 from trunk, compiled in debug mode yields
Broken !!!
george.
On Dec 12, 2011, at 15:52 , hje...@osl.iu.edu wrote:
> Author: hjelmn
> Date: 2011-12-12 15:52:51 EST (Mon, 12 Dec 2011)
> New Revision: 25621
> URL: https://svn.open-mpi.org/trac/ompi/changeset/25621
>
> Log:
> enable ptmalloc with using uGNI
> Added:
> trunk/orte/mca/rm
aps_lb.c:530:9: error: too few arguments to function
> ‘orte_rmaps_base_compute_vpids’
> ../../../../orte/mca/rmaps/base/rmaps_private.h:65:19: note: declared here
> make[1]: *** [rmaps_lb.lo] Error 1
> make[1]: Leaving directory
> `/home/jsquyres/svn/ompi3/orte/mca/rmaps/load_balance
We are investigating. A fix will hopefully be provided soon.
Thanks for the report,
george.
On Dec 13, 2011, at 00:25 , Mike Dubman wrote:
> nope
>
> On Mon, Dec 12, 2011 at 10:40 PM, George Bosilca wrote:
> Do we have the same issue in the trunk?
>
> george.
>
>
I noticed today a drastic change in how ORTE deals with the hostfile between
trunk and 1.5.
1. 1.5 and prior used the hostfile as a suggestion, a placeholder where to pick
the requested number of daemons during the launch. The current trunk spawns
daemons on all the nodes provided in the hostfile
Shiqing,
This file seems to be there.
$ pwd
/home/bosilca/unstable/1.5/ompi
$ svn info opal/mca/shmem/windows/.windows
Path: opal/mca/shmem/windows/.windows
Name: .windows
URL:
https://svn.open-mpi.org/svn/ompi/branches/v1.5/opal/mca/shmem/windows/.windows
Repository Root: https://svn.open-mp
A comment in the commit suggests that the symbols were not linked into
orterun if they were not accessed there. I guess this was the trick to make
sure MPIR_Breakpoint is in there.
Now that you pointed me to this commit, I have to disagree with it. Why have the MPI
debugging symbols been delete
ses, at a point where after a while
we have to reboot the machines to liberate pids.
On Dec 14, 2011, at 10:08 , Ralph Castain wrote:
> On Dec 13, 2011, at 9:10 PM, George Bosilca wrote:
>
>> I noticed today a drastic change in how ORTE deals with the hostfile between
>> trunk
This patch is not correct. All these variables have been moved into the ORTE
layer (they are declared in orte/mca/debugger/base/base.h), so they should
in fact be removed from the MPI level files.
While I don't think moving them all into ORTE was a good choice, changing
their definition in t
On Dec 15, 2011, at 16:55 , Ashley Pittman wrote:
> There is a problem with 1.5.5rc1 that prevents padb from loading the process
> table starting from the orterun process; what appears to be happening is that
> MPIR_proctable and MPIR_proctable_size are present in both orterun itself and
> also in
This is quite impressive. After digging a little bit more, it appears that
orte/tools/orterun/debuggers.c is in the repository but is not used for
compilation. Thus, I really don't see where the second definition is coming
from.
george.
On Dec 15, 2011, at 17:02 , George Bo
MPI_ERROR is set. Do you know where jumps the execution? or
>> at least in which error handler?
>>
>> Thanks in advance.
>>
>> Hugo
>>
>> 2011/12/9 George Bosilca
>>
>> On Dec 9, 2011, at 06:59 , Hugo Daniel Meyer wrote:
>>
>>> He
I checked the wait_any code, and I can only see one possible execution path that
returns MPI_UNDEFINED. All requests have to be marked as inactive, which only
happens after the OMPI request completion function is called.
This leads to the following question. Are your threads waiting on common
reque
The problem is correctly identified and solved. I already pushed the patch to
the trunk. I will create the CMR for both 1.5 and 1.4.
Kudos to the Fujitsu team, that was a tricky one to find. Thanks for your
contributions!
george.
On Jan 12, 2012, at 10:39 , Barrett, Brian W wrote:
> George -
Chris,
If the syscalls are there we should clearly take advantage of them. The patch
looks good, I vote for it!
george.
On Jan 12, 2012, at 04:34 , Christopher Yeoh wrote:
> Hi Brad,
>
> WHAT: Adds Cross Memory Attach support to the sm btl
>
> WHY: For faster intranode communication
>
>
This is a critical change, with a significant impact on the code base.
Basically, by moving the binding later in the code, after the modex is
completed, all memory allocated before (which is all memory allocated during
the registration of all OMPI modules) will end up being on the wrong NUMA node
How about instead of all the patches (r25758, r25762 and r25763) we just set
both LD_LIBRARY_PATH and DYLD_LIBRARY_PATH everywhere? One will get ignored on
Unix and the other on Darwin.
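A minimal sketch of that idea, assuming a hypothetical helper `export_library_path` (this is not the code from the patches named above): the same directory is exported through both variables, and each platform's dynamic loader simply ignores the one it does not know.

```c
#define _POSIX_C_SOURCE 200112L  /* for setenv() */
#include <assert.h>
#include <stdlib.h>

/* Hypothetical helper: export the same library directory through both
 * variables. Returns 0 on success, -1 if either setenv() call fails.
 * Note that this overwrites any existing value instead of prepending. */
int export_library_path(const char *libdir)
{
    assert(libdir != NULL);
    if (setenv("LD_LIBRARY_PATH", libdir, 1) != 0) {
        return -1;
    }
    if (setenv("DYLD_LIBRARY_PATH", libdir, 1) != 0) {
        return -1;
    }
    return 0;
}
```

A real launcher would more likely prepend to the existing values; overwriting keeps the sketch short.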
Another benefit will be to have a significantly cleaner...
george.
On Jan 21, 2012, at 18:48 , r...@osl.
> Doesn't strike me as all that complicated.
>
> On Jan 21, 2012, at 7:26 PM, George Bosilca wrote:
>
>> How about instead of all the patches (r25758, r25762 and r25763) we just set
>> both LD_LIBRARY_PATH and DYLD_LIBRARY_PATH everywhere? One will get ignored
>
Thanks Yuki,
All the submitted patches have been pushed into the trunk
(https://svn.open-mpi.org/trac/ompi/changeset/25781) and are pending on the
queue for 1.4 and 1.5.
Thanks,
george.
On Jan 25, 2012, at 05:25 , Y.MATSUMOTO wrote:
> Dear All,
>
> Next is about "MPI_FILE_SEEK_SHARED"
This doesn't sound like a very good idea, despite significant support from a
lot of institutions. There are no standardization efforts in the targeted
community, and championing broader support in the Java world was not one of
our main targets.
OMPI does not include the Boost bindings, despit
The self BTL is different from any other BTL. Any memcpy operation done by this
BTL is automatically protected behind the matching logic, and therefore does
not require extra thread safety protection. A mutex in the self BTL is mostly a
copy/paste mistake.
george.
On Feb 8, 2012, at 17:57 ,
be done on a branch and the RFC can be
>>>>>> reissued when there is both
>>>>>> a) a standard to which the bindings can claim to conform
>>>>>> b) an implementation which has been shown to be stable
>>>>>>
>>>>>> -Paul
t can succeed only
for one thread).
george.
On Feb 8, 2012, at 21:38 , Christopher Yeoh wrote:
> On Wed, 8 Feb 2012 20:58:52 -0500
> George Bosilca wrote:
>
>> The self BTL is different from any other BTL. Any memcpy operation
>> done by this BTL is automatically protected be
Thanks Paul. Fixed in r25946.
george.
On Feb 16, 2012, at 21:47 , Paul H. Hargrove wrote:
> I just tried to build yesterday's ompi trunk tarball (1.7a1r25937) with the
> Intel compilers.
> Sorry if this was fixed in the past 23 hours or so.
>
>
> My system has GM-2.1.30 installed, and icc w
Good catch!!! That's indeed a quite nasty bug.
If it fixes the IB issues it justifies a 1.4.6 release.
Thanks,
george.
On Mar 1, 2012, at 10:56 , Nathan Hjelm wrote:
> Found a pretty nasty frag leak (and a minor one) in ob1 (see commit below).
> If this fix addresses some hangs we are se
Please run "ompi_info --param btl sm" in your environment. The lazy_free
setting directs the internals of the SM BTL not to release the memory fragments used to
communicate until the lazy limit is reached. The default value was deemed
reasonable a while back when the number of default fragments was l
Yuki,
I applied your patch and added your copyright in the corresponding files
(r26080). I will make a CMR for the 1.4 and 1.5. However, as you might have
noticed we're trying to close the 1.4 and move forward.
Thanks,
george.
On Mar 1, 2012, at 02:33 , Y.MATSUMOTO wrote:
> Dear All,
>
Yuki,
r26084 should fix the issue with the dynamic rules file in the trunk. Thanks
for reporting it.
george.
On Mar 1, 2012, at 02:33 , Y.MATSUMOTO wrote:
> But, we found problem when over 2GiB message is written in rulefile as
> "message size".
> (over 2GiB message cannot read correctly.
On Mar 3, 2012, at 18:18 , Alex Margolin wrote:
> I've figured that what I really need is to write my own BTL component, rather
> than trying to manipulate the existing TCP one. I've started writing it using
> the 1.5.5rc3 tarball and some pdfs from 2006 I found on the website (anything
> else
Yuki,
I pushed a fix for this issue in the trunk (r26097). However, I disagree with
you on some of the topics below.
On Mar 5, 2012, at 04:02 , Y.MATSUMOTO wrote:
> Dear All,
>
> Next feedback is about "collective communications".
>
> Collective communication may be abend when it use over 2Gi
I was afraid of all those little intermediary steps. I asked a compiler guy
and apparently reversing the order (aka starting with the ptrdiff_t variable)
will not solve anything. The only portable way to solve this is to cast every
single member, to prevent __any__ compiler from hurting us.
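A short sketch of what "cast every single member" means in practice, using a hypothetical helper `element_offset` (not the code from the commit). With `int` operands, `index * extent` would be evaluated in `int` and could overflow before the result is widened; casting each operand first forces the multiplication to happen in `ptrdiff_t`.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: byte offset of element `index` in a buffer whose
 * elements are `extent` bytes apart. Each operand is cast before use so
 * the arithmetic is carried out in ptrdiff_t, never in int. */
ptrdiff_t element_offset(int index, int extent)
{
    assert(index >= 0 && extent >= 0);
    return (ptrdiff_t)index * (ptrdiff_t)extent;
}
```

On a platform with a 64-bit `ptrdiff_t`, this gives the right answer for offsets past 2 GiB, which is exactly the case the uncast `int` product gets wrong.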
I gave it a try (r26103). It was messy, and I hope I got it right. Let's soak
it for a few days with our nightly testing to see how it behaves.
george.
On Mar 5, 2012, at 16:37 , N.M. Maclaren wrote:
> On Mar 5 2012, George Bosilca wrote:
>>
>> I was afraid about all those
I didn't check the code thoroughly, but OMPI_ERR_OUT_OF_RESOURCES is not an
error. If the registration returns out of resources, the BTL will return
OUT_OF_RESOURCE (for example via mca_btl_openib_prepare_src). At the
upper level, the PML (in the mca_pml_ob1_send_request_start function) in
Josh,
Open MPI already has a similar function in the communicator part, a function
that is not exposed to the upper layer. I think that using the code in
ompi_comm_compare (the second part, which compares groups) is sound.
Moreover, if we now have an ompi_group_compare function you should use
On Mar 9, 2012, at 08:38 , Alex Margolin wrote:
> Hi,
>
> I'm implementing a new BTL component, and
>
> 1. I read the TCP code and ran into the three fragment lists:
>
>/* free list of fragment descriptors */
>ompi_free_list_t tcp_frag_eager;
>ompi_free_list_t tcp_frag_max;
>om
this is in the BTL registration code (openib_reg_mr), not in
>> the directly-invoked-by-the-PML code. So it's the mpool's fault -- not the
>> PML's fault.
>>
>>
>>
>> On Mar 6, 2012, at 10:05 AM, George Bosilca wrote:
>>
>>> I din'
On Mar 9, 2012, at 14:23 , Nathan Hjelm wrote:
> BTW, can anyone tell me why each mpool defines mca_mpool_base_resources_t
> instead of defining mca_mpool_blah_resources_t. The current design makes it
> impossible to support more than one mpool in a btl. I can delete a bunch of
> code if I can
; frag->hdr.base.tag;
>reg->cbfunc(&frag->btl->super, frag->hdr.base.tag,
> &frag->base, reg->cbdata);
>}
> This calls a callback function, which I assume notifies the upper layer of a
> message, but this is only for MCA_BTL
Josh,
I don't agree that these changes are required. In the current standard (2.2),
MPI_ERR_PENDING is only allowed to be returned by MPI_WAITALL, in some very
specific conditions. Here is the snippet from the MPI standard clarifying this
behavior.
> When one or more of the communications comp
equest_default_wait_all was
> incorrect. Would you care to elaborate as to what specifically is
> incorrect?
>
> -- Josh
>
> On Wed, Mar 21, 2012 at 2:17 PM, George Bosilca wrote:
>> Josh,
>>
>> I don't agree that these changes are required. In the current
is
supposed to achieve. Moreover, I guess your patch is indeed correct if the
MPICH test you were using passes.
> Do we both agree that before this patch Open MPI did not support
> MPI_ERR_PENDING?
Who dares claim the opposite?
george.
>
> -- Josh
>
>
> On Wed, Mar
ng in the default_wait_all of MPI_ERR_PENDING is
>> incorrect as well."
>> I'm glad that you no longer think the patch is "incorrect", but just
>> not implemented as well as it could be.
>>
>> Thanks,
>> Josh
>>
>> On Thu
On Mar 21, 2012, at 12:14 , Nathan Hjelm wrote:
> What: Change coll tuned default to pairwise exchange
>
> Why: The linear algorithm does not scale to any reasonable number of PEs
>
> When: Timeout in 2 days (Fri)
>
> Is there any reason the default should not be changed?
Nathan,
I can see w
Roswan,
There are simpler solutions to achieve this. We have a built-in mechanism to
select a specific collective implementation. Here is what you have to add in
your .openmpi/mca-params.conf (or as MCA argument on the command line):
coll_tuned_use_dynamic_rules = 1
coll_tuned_bcast_algorithm
nd of first collective
On Apr 3, 2012, at 09:01 , Pavel Mezentsev wrote:
> Is there a way to specify collective depending on the size of the message and
> number of processes?
>
> Regards,
> Pavel Mezentsev
>
> 2012/4/3 George Bosilca
> Roswan,
>
> There are simpler
Alex,
This is indeed quite strange. You're receiving an error about truncated data
during a barrier. The MPI_Barrier is the only MPI function that has a
synchronization meaning, and does not move data around, so I can hardly see how
this can generate a truncation.
You should put a breakpoint i
Adachi,
You're right indeed, I should have multiplied the displacement by the extent of
the datatype.
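The correction described here can be sketched as follows, with a hypothetical helper `displs_to_bytes` (not the code from the commit): displacements given in elements become byte offsets only after being multiplied by the datatype extent.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: convert displacements expressed in elements into
 * byte offsets by multiplying each one by the datatype extent. */
void displs_to_bytes(const int *displs, int count, ptrdiff_t extent,
                     ptrdiff_t *byte_offsets)
{
    assert(count == 0 || (displs != NULL && byte_offsets != NULL));
    for (int i = 0; i < count; i++) {
        byte_offsets[i] = (ptrdiff_t)displs[i] * extent;
    }
}
```

For a one-byte datatype the two values coincide, which is why the missing multiplication only shows up with multibyte datatypes.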
Thanks for catching this! Commit r26259 is supposed to fix this.
george.
On Apr 9, 2012, at 01:57 , ADACHI Tomoya wrote:
> Hi George,
>
> This fix seems insufficient for multibyte datatype
No strong opinion. However, the comment about the initial value of
opal_cache_line_size is wrong (opal/runtime/opal.h), as it states that the
default value is -1 while it is 128.
george.
On Apr 23, 2012, at 16:21 , Jeffrey Squyres wrote:
> No one replied to this RFC. Does anyone have an opi
I guess in the end it is a trade-off between performance and space. What we wanted
with this code is to avoid false sharing of cache lines between our data
(the internal header we add in front of elements in lists) and the content of the
data (whatever is coming from the upper layer).
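A sketch of that trade-off, assuming a hypothetical header type and a fixed 128-byte cache line (the names `my_item_header`, `MY_HEADER_SIZE` and `my_item_payload` are illustrative, not Open MPI's): the header is rounded up to a whole number of cache lines so the payload never shares a line with it, at the cost of the padding bytes.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed cache line size for this sketch. */
#define MY_CACHE_LINE_SIZE 128

/* Hypothetical list-item header placed in front of each payload. */
struct my_item_header {
    struct my_item_header *next;
    int                    flags;
};

/* Header size rounded up to a multiple of the cache line size. */
#define MY_HEADER_SIZE \
    (((sizeof(struct my_item_header) + MY_CACHE_LINE_SIZE - 1) \
      / MY_CACHE_LINE_SIZE) * MY_CACHE_LINE_SIZE)

/* Return the address of the payload that follows a header placed at `item`. */
void *my_item_payload(void *item)
{
    assert(item != NULL);
    return (char *)item + MY_HEADER_SIZE;
}
```

This only separates header and payload if each item itself starts on a cache-line boundary, which the sketch assumes rather than enforces.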
If the header is
Alex,
You got the banner of the FT benchmark, so I guess at least the rank 0
successfully completed the MPI_Init call. This is a hint that you should
investigate more into the point-to-point logic of your mosix BTL.
george.
On Apr 25, 2012, at 09:30 , Alex Margolin wrote:
> NAS Parallel Ben