Re: [OMPI devel] Using external libevent

2013-05-01 Thread Ralph Castain

On May 1, 2013, at 7:32 PM, Orion Poplawski  wrote:

> On 04/29/2013 11:04 AM, Ralph Castain wrote:
>> 
>> On Apr 27, 2013, at 7:37 PM, Orion Poplawski  wrote:
>> 
>>> On 04/26/2013 08:53 PM, Ralph Castain wrote:
>>>>
>>>> On Apr 26, 2013, at 7:40 PM, Orion Poplawski  wrote:
>>>>>
>>>>> So it looks like I will need to shortly be looking at how to link against
>>>>> an external libevent.  Any help with that would be greatly appreciated.
>>>>
>>>> As I said, I'll take a look at it, but can't commit to having it available
>>>> any time soon. It isn't something I would suggest someone try who isn't
>>>> fully versed in OMPI's code base.
>>> 
>>> Yeah, I'm not looking forward to it.  I get to at least wait until the 
>>> non-threaded version of libevent is available.
>> 
>> I hate to see someone suffer, so I went ahead and added the external 
>> libevent connection this morning. Not trivial, but it seems to work. It is 
>> in our developer's trunk if you want to test it. As Jeff has said, we would 
>> prefer you not do this until the 1.9 series is released, and we won't be 
>> porting this change to the 1.7 series anyway.
>> 
>> Just put it in so we can begin the investigation, and we always appreciate 
>> input and help in exploring the impacts!
>> Ralph
> 
> Great!  I'll try to take a look at next week.

You might wait a bit - Jeff is working on corner cases for it, so things will 
likely change. I'm not sure when he expects to finish.

> 
> I noticed another message about using a threaded libevent after all on the 
> devel list.  What is the status of that?  Do we still need to produce a 
> non-threaded libevent in Fedora?

I would hold off. I've been running some tests, and it looks to me like it 
punishes TCP messaging, but not too much (around 1%). Can't vouch that there 
won't be other problems, but it may prove to be okay. Let's see what happens 
once Jeff completes his work.


> 
> Thanks again.
> 
> -- 
> Orion Poplawski
> Technical Manager 303-415-9701 x222
> NWRA/CoRA Division    FAX: 303-415-9702
> 3380 Mitchell Lane  or...@cora.nwra.com
> Boulder, CO 80301  http://www.cora.nwra.com
> ___
> devel mailing list
> de...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/devel




Re: [OMPI devel] Using external libevent

2013-05-01 Thread Orion Poplawski

On 04/29/2013 11:04 AM, Ralph Castain wrote:
>
> On Apr 27, 2013, at 7:37 PM, Orion Poplawski  wrote:
>
>> On 04/26/2013 08:53 PM, Ralph Castain wrote:
>>>
>>> On Apr 26, 2013, at 7:40 PM, Orion Poplawski  wrote:
>>>
>>>> So it looks like I will need to shortly be looking at how to link against an
>>>> external libevent.  Any help with that would be greatly appreciated.
>>>
>>> As I said, I'll take a look at it, but can't commit to having it available any
>>> time soon. It isn't something I would suggest someone try who isn't fully
>>> versed in OMPI's code base.
>>
>> Yeah, I'm not looking forward to it.  I get to at least wait until the
>> non-threaded version of libevent is available.
>
> I hate to see someone suffer, so I went ahead and added the external libevent
> connection this morning. Not trivial, but it seems to work. It is in our
> developer's trunk if you want to test it. As Jeff has said, we would prefer you
> not do this until the 1.9 series is released, and we won't be porting this
> change to the 1.7 series anyway.
>
> Just put it in so we can begin the investigation, and we always appreciate
> input and help in exploring the impacts!
> Ralph

Great!  I'll try to take a look at it next week.

I noticed another message about using a threaded libevent after all on 
the devel list.  What is the status of that?  Do we still need to 
produce a non-threaded libevent in Fedora?


Thanks again.

--
Orion Poplawski
Technical Manager 303-415-9701 x222
NWRA/CoRA Division    FAX: 303-415-9702
3380 Mitchell Lane  or...@cora.nwra.com
Boulder, CO 80301  http://www.cora.nwra.com


[hwloc-devel] hwloc-1.7 woes

2013-05-01 Thread Pavan Balaji

One more issue with hwloc-1.7 on the Mac.

http://git.mpich.org/mpich.git/commitdiff/d9a67f40

This showed up when we did a strict build of MPICH.  I believe this can
be reproduced with "-Wall -Werror -O2", but I can find the exact set of
minimum required flags, if needed.

 -- Pavan

-- 
Pavan Balaji
http://www.mcs.anl.gov/~balaji


Re: [OMPI devel] [OMPI svn] svn:open-mpi r28435 - in trunk: . conf db db/revprops db/revprops/0 db/revs db/revs/0 db/transactions db/txn-protorevs hooks locks

2013-05-01 Thread Nathan Hjelm
Never mind. Figured it out.

-Nathan

On Wed, May 01, 2013 at 10:06:08AM -0600, Nathan Hjelm wrote:
> *&&*$# . Can someone undo this.
> 
> -Nathan
> 
> On Wed, May 01, 2013 at 12:01:48PM -0400, svn-commit-mai...@open-mpi.org 
> wrote:
> > Author: hjelmn (Nathan Hjelm)
> > Date: 2013-05-01 12:01:48 EDT (Wed, 01 May 2013)
> > New Revision: 28435
> > URL: https://svn.open-mpi.org/trac/ompi/changeset/28435
> > 
> > Log:
> > import
> > 
> > Added:
> >trunk/README.txt
> >trunk/conf/
> >trunk/conf/authz
> >trunk/conf/passwd
> >trunk/conf/svnserve.conf
> >trunk/db/
> >trunk/db/current
> >trunk/db/format
> >trunk/db/fs-type
> >trunk/db/fsfs.conf
> >trunk/db/min-unpacked-rev
> >trunk/db/revprops/
> >trunk/db/revprops/0/
> >trunk/db/revprops/0/0
> >trunk/db/revs/
> >trunk/db/revs/0/
> >trunk/db/revs/0/0
> >trunk/db/transactions/
> >trunk/db/txn-current
> >trunk/db/txn-current-lock
> >trunk/db/txn-protorevs/
> >trunk/db/uuid
> >trunk/db/write-lock
> >trunk/format
> >trunk/hooks/
> >trunk/hooks/post-commit.tmpl
> >trunk/hooks/post-lock.tmpl
> >trunk/hooks/post-revprop-change.tmpl
> >trunk/hooks/post-unlock.tmpl
> >trunk/hooks/pre-commit.tmpl
> >trunk/hooks/pre-lock.tmpl
> >trunk/hooks/pre-revprop-change.tmpl
> >trunk/hooks/pre-unlock.tmpl
> >trunk/hooks/start-commit.tmpl
> >trunk/locks/
> >trunk/locks/db-logs.lock
> >trunk/locks/db.lock
> > 
> > 
> > Diff not shown due to size (32936 bytes).
> > To see the diff, run the following command:
> > 
> > svn diff -r 28434:28435 --no-diff-deleted
> > 
> > ___
> > svn mailing list
> > s...@open-mpi.org
> > http://www.open-mpi.org/mailman/listinfo.cgi/svn


Re: [OMPI devel] [OMPI svn] svn:open-mpi r28435 - in trunk: . conf db db/revprops db/revprops/0 db/revs db/revs/0 db/transactions db/txn-protorevs hooks locks

2013-05-01 Thread Nathan Hjelm
*&&*$#. Can someone undo this?

-Nathan

On Wed, May 01, 2013 at 12:01:48PM -0400, svn-commit-mai...@open-mpi.org wrote:
> Author: hjelmn (Nathan Hjelm)
> Date: 2013-05-01 12:01:48 EDT (Wed, 01 May 2013)
> New Revision: 28435
> URL: https://svn.open-mpi.org/trac/ompi/changeset/28435
> 
> Log:
> import
> 
> Added:
>trunk/README.txt
>trunk/conf/
>trunk/conf/authz
>trunk/conf/passwd
>trunk/conf/svnserve.conf
>trunk/db/
>trunk/db/current
>trunk/db/format
>trunk/db/fs-type
>trunk/db/fsfs.conf
>trunk/db/min-unpacked-rev
>trunk/db/revprops/
>trunk/db/revprops/0/
>trunk/db/revprops/0/0
>trunk/db/revs/
>trunk/db/revs/0/
>trunk/db/revs/0/0
>trunk/db/transactions/
>trunk/db/txn-current
>trunk/db/txn-current-lock
>trunk/db/txn-protorevs/
>trunk/db/uuid
>trunk/db/write-lock
>trunk/format
>trunk/hooks/
>trunk/hooks/post-commit.tmpl
>trunk/hooks/post-lock.tmpl
>trunk/hooks/post-revprop-change.tmpl
>trunk/hooks/post-unlock.tmpl
>trunk/hooks/pre-commit.tmpl
>trunk/hooks/pre-lock.tmpl
>trunk/hooks/pre-revprop-change.tmpl
>trunk/hooks/pre-unlock.tmpl
>trunk/hooks/start-commit.tmpl
>trunk/locks/
>trunk/locks/db-logs.lock
>trunk/locks/db.lock
> 
> 
> Diff not shown due to size (32936 bytes).
> To see the diff, run the following command:
> 
>   svn diff -r 28434:28435 --no-diff-deleted
> 


Re: [OMPI devel] MPI_Mrecv(..., MPI_STATUS_IGNORE) in Open MPI 1.7.1

2013-05-01 Thread Jeff Squyres (jsquyres)
Right you are -- many thanks for finding the issue.  I just committed a fix to 
the trunk in SVN r28430; I'll CMR it over to v1.7.

On May 1, 2013, at 4:56 AM, Lisandro Dalcin  wrote:

> It seems that Mrecv() tries to write on the status arg, even when it
> is STATUS_IGNORE. Looking at the sources (pmrecv.c and pmprobe.c),
> there are some memcheck code paths that access status but do not check
> for STATUS_IGNORE, please review them.
> 
> $ cat tmp.c
> #include <mpi.h>
> 
> int main(int argc, char *argv[])
> {
>  MPI_Message message;
>  MPI_Init(&argc, &argv);
>  message = MPI_MESSAGE_NO_PROC;
>  MPI_Mrecv(NULL, 0, MPI_BYTE, &message, MPI_STATUS_IGNORE);
>  MPI_Finalize();
>  return 0;
> }
> 
> $ mpicc tmp.c
> $ valgrind ./a.out
> ...
> ==17489==
> ==17489== Invalid write of size 8
> ==17489==at 0x4CA811C: PMPI_Mrecv (pmrecv.c:62)
> ==17489==by 0x400816: main (in /tmp/a.out)
> ==17489==  Address 0x0 is not stack'd, malloc'd or (recently) free'd
> ==17489==
> [localhost:17489] *** Process received signal ***
> [localhost:17489] Signal: Segmentation fault (11)
> [localhost:17489] Signal code: Address not mapped (1)
> [localhost:17489] Failing at address: (nil)
> ...
> 
> 
> --
> Lisandro Dalcin
> ---
> CIMEC (INTEC/CONICET-UNL)
> Predio CONICET-Santa Fe
> Colectora RN 168 Km 472, Paraje El Pozo
> 3000 Santa Fe, Argentina
> Tel: +54-342-4511594 (ext 1011)
> Tel/Fax: +54-342-4511169


-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to: 
http://www.cisco.com/web/about/doing_business/legal/cri/




Re: [OMPI devel] [OMPI svn-full] svn:open-mpi r27880 - trunk/ompi/request

2013-05-01 Thread KAWASHIMA Takahiro
George,

As I wrote in the ticket a few minutes ago, your patch looks good and
it passed my test. My previous patch didn't handle generalized
requests, so your patch is better.

Thanks,
Takahiro Kawashima,
from my home
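
Ticket 3475 also covers the multi-request completion calls (points (1) and (2) in the quoted Jan 22 message below). As an illustration only, here is a minimal C sketch of a wait-all loop in which both null requests and inactive persistent requests report the empty status (source ANY_SOURCE, tag ANY_TAG, count 0), and the statuses array is written only when it is not the ignore sentinel. This is not Open MPI's ompi_request_default_wait_all: sk_request, sk_status, sk_wait_all and the SK_* constants are hypothetical, a NULL pointer stands in for MPI_REQUEST_NULL, and completion of active requests is assumed rather than modeled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins, not Open MPI's types. */
typedef struct {
    int source;
    int tag;
    size_t count;
} sk_status;

enum { SK_SUCCESS = 0, SK_ANY_SOURCE = -1, SK_ANY_TAG = -1 };

#define SK_STATUSES_IGNORE ((sk_status *) NULL)

typedef struct {
    bool persistent;
    bool active;        /* false for a persistent request that is not started */
    sk_status result;   /* status the operation reports once it completes */
} sk_request;

/* The "empty status": any source, any tag, zero count. */
static const sk_status sk_status_empty = { SK_ANY_SOURCE, SK_ANY_TAG, 0 };

/* A NULL entry in 'requests' plays the role of MPI_REQUEST_NULL. */
static int sk_wait_all(size_t count, sk_request **requests, sk_status *statuses)
{
    assert(count == 0 || requests != NULL);
    for (size_t i = 0; i < count; ++i) {
        sk_request *req = requests[i];
        sk_status st;

        if (req == NULL || (req->persistent && !req->active)) {
            /* Null or inactive request: empty status, so the
             * source field is ANY_SOURCE (point (1)). */
            st = sk_status_empty;
        } else {
            st = req->result;           /* completion assumed, not modeled */
            if (req->persistent) {
                req->active = false;    /* persistent: becomes inactive */
            } else {
                requests[i] = NULL;     /* non-persistent: becomes null (freeing omitted) */
            }
        }
        if (SK_STATUSES_IGNORE != statuses) {
            statuses[i] = st;
        }
    }
    return SK_SUCCESS;
}
```

The same "null or inactive means empty status" branch would apply to a test-all variant; only the completion check differs.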

> Takahiro,
> 
> I went over this ticket and attached a new patch. Basically I went over all 
> the possible cases, both in test and wait, and ensure the behavior is always 
> consistent. Please give it a try, and let us know of the outcome.
> 
>   Thanks,
> George.
> 
> 
> 
> On Jan 25, 2013, at 00:53 , "Kawashima, Takahiro" 
>  wrote:
> 
> > Jeff,
> > 
> > I've filed the ticket.
> > https://svn.open-mpi.org/trac/ompi/ticket/3475
> > 
> > Thanks,
> > Takahiro Kawashima,
> > MPI development team,
> > Fujitsu
> > 
> >> Many thanks for the summary!
> >> 
> >> Can you file tickets about this stuff against 1.7?  Included your patches, 
> >> etc. 
> >> 
> >> These are pretty obscure issues and I'm ok not fixing them in the 1.6 
> >> branch (unless someone has a burning desire to get them fixed in 1.6). 
> >> 
> >> But we should properly track and fix these in the 1.7 series. I'd mark 
> >> them as "critical" so that they don't get lost in the wilderness of other 
> >> bugs. 
> >> 
> >> Sent from my phone. No type good. 
> >> 
> >> On Jan 22, 2013, at 8:57 PM, "Kawashima, Takahiro" 
> >>  wrote:
> >> 
> >>> George,
> >>> 
> >>> I reported the bug three months ago.
> >>> Your commit r27880 resolved one of the bugs reported by me,
> >>> in another approach.
> >>> 
> >>> http://www.open-mpi.org/community/lists/devel/2012/10/11555.php
> >>> 
> >>> But other bugs are still open.
> >>> 
> >>> "(1) MPI_SOURCE of MPI_Status for a null request must be MPI_ANY_SOURCE."
> >>> in my previous mail is not fixed yet. This can be fixed by my patch
> >>> (ompi/mpi/c/wait.c and ompi/request/request.c part only) attached
> >>> in my another mail.
> >>> 
> >>> http://www.open-mpi.org/community/lists/devel/2012/10/11561.php
> >>> 
> >>> "(2) MPI_Status for an inactive request must be an empty status."
> >>> in my previous mail is partially fixed. MPI_Wait is fixed by your
> >>> r27880. But MPI_Waitall and MPI_Testall should be fixed.
> >>> Codes similar to your r27880 should be inserted to
> >>> ompi_request_default_wait_all and ompi_request_default_test_all.
> >>> 
> >>> You can confirm the fixes by the test program status.c attached in
> >>> my previous mail. Run with -n 2. 
> >>> 
> >>> http://www.open-mpi.org/community/lists/devel/2012/10/11555.php
> >>> 
> >>> Regards,
> >>> Takahiro Kawashima,
> >>> MPI development team,
> >>> Fujitsu
> >>> 
> >>>> To be honest it was hanging in one of my repos for some time. If I'm not
> >>>> mistaken it is somehow related to one active ticket (but I couldn't find
> >>>> the info). It might be good to push it upstream.
> >>>>
> >>>> George.
> >>>>
> >>>> On Jan 22, 2013, at 16:27 , "Jeff Squyres (jsquyres)"
> >>>>  wrote:
> >>>>
> >>>>> George --
> >>>>>
> >>>>> Is there any reason not to CMR this to v1.6 and v1.7?
> >>>>>
> >>>>>
> >>>>> On Jan 21, 2013, at 6:35 AM, svn-commit-mai...@open-mpi.org wrote:
> >>>>>
> >>>>>> Author: bosilca (George Bosilca)
> >>>>>> Date: 2013-01-21 06:35:42 EST (Mon, 21 Jan 2013)
> >>>>>> New Revision: 27880
> >>>>>> URL: https://svn.open-mpi.org/trac/ompi/changeset/27880
> >>>>>>
> >>>>>> Log:
> >>>>>> My understanding is that an MPI_WAIT() on an inactive request should
> >>>>>> return the empty status (MPI 3.0 page 52 line 46).
> >>>>>>
> >>>>>> Text files modified:
> >>>>>>    trunk/ompi/request/req_wait.c |     3 +++
> >>>>>>    1 files changed, 3 insertions(+), 0 deletions(-)
> >>>>>>
> >>>>>> Modified: trunk/ompi/request/req_wait.c
> >>>>>> ==
> >>>>>> --- trunk/ompi/request/req_wait.c  Sat Jan 19 19:33:42 2013  (r27879)
> >>>>>> +++ trunk/ompi/request/req_wait.c  2013-01-21 06:35:42 EST (Mon, 21 Jan 2013)  (r27880)
> >>>>>> @@ -61,6 +61,9 @@
> >>>>>>      }
> >>>>>>      if( req->req_persistent ) {
> >>>>>>          if( req->req_state == OMPI_REQUEST_INACTIVE ) {
> >>>>>> +            if (MPI_STATUS_IGNORE != status) {
> >>>>>> +                *status = ompi_status_empty;
> >>>>>> +            }
> >>>>>>              return OMPI_SUCCESS;
> >>>>>>          }
> >>>>>>          req->req_state = OMPI_REQUEST_INACTIVE;
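
For readers following the r27880 hunk, here is a self-contained C sketch of the single-request behavior it describes, under simplifying assumptions: waiting on an inactive persistent request returns at once with the empty status (source ANY_SOURCE, tag ANY_TAG, count 0), and the status is written only when the caller did not pass the ignore sentinel. model_request, model_status, model_wait and the MODEL_* constants are hypothetical and only mirror the shape of the patch, not Open MPI's request machinery; real completion (blocking until the operation finishes) is not modeled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins (not Open MPI's types). */
typedef struct {
    int source;
    int tag;
    size_t count;
} model_status;

enum { MODEL_SUCCESS = 0, MODEL_ANY_SOURCE = -1, MODEL_ANY_TAG = -1 };

#define MODEL_STATUS_IGNORE ((model_status *) NULL)

typedef struct {
    bool persistent;
    bool active;          /* false: persistent request that is not started */
    model_status result;  /* status of the operation once it completes */
} model_request;

/* The "empty status": any source, any tag, zero count. */
static const model_status model_status_empty = { MODEL_ANY_SOURCE, MODEL_ANY_TAG, 0 };

/* Same shape as the r27880 hunk: an inactive persistent request returns
 * immediately, filling in the empty status unless the caller ignores it. */
static int model_wait(model_request *req, model_status *status)
{
    assert(req != NULL);
    if (req->persistent && !req->active) {
        if (MODEL_STATUS_IGNORE != status) {
            *status = model_status_empty;
        }
        return MODEL_SUCCESS;
    }
    /* Completion is assumed here; a real wait would block until done. */
    if (MODEL_STATUS_IGNORE != status) {
        *status = req->result;
    }
    if (req->persistent) {
        req->active = false;  /* a completed persistent request goes inactive */
    }
    /* A non-persistent request would also be freed and set to the null
     * request; that is not modeled here. */
    return MODEL_SUCCESS;
}
```

Waiting twice on the same persistent request shows the difference: the first wait reports the operation's status, the second reports the empty status.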


[OMPI devel] MPI_Mrecv(..., MPI_STATUS_IGNORE) in Open MPI 1.7.1

2013-05-01 Thread Lisandro Dalcin
It seems that Mrecv() tries to write to the status argument even when it
is MPI_STATUS_IGNORE. Looking at the sources (pmrecv.c and pmprobe.c),
there are some memcheck code paths that access status without checking
for MPI_STATUS_IGNORE; please review them.

$ cat tmp.c
#include <mpi.h>

int main(int argc, char *argv[])
{
  MPI_Message message;
  MPI_Init(&argc, &argv);
  message = MPI_MESSAGE_NO_PROC;
  MPI_Mrecv(NULL, 0, MPI_BYTE, &message, MPI_STATUS_IGNORE);
  MPI_Finalize();
  return 0;
}

$ mpicc tmp.c
$ valgrind ./a.out
...
==17489==
==17489== Invalid write of size 8
==17489==at 0x4CA811C: PMPI_Mrecv (pmrecv.c:62)
==17489==by 0x400816: main (in /tmp/a.out)
==17489==  Address 0x0 is not stack'd, malloc'd or (recently) free'd
==17489==
[localhost:17489] *** Process received signal ***
[localhost:17489] Signal: Segmentation fault (11)
[localhost:17489] Signal code: Address not mapped (1)
[localhost:17489] Failing at address: (nil)
...
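
As an illustration of the guard this report asks for, here is a minimal, self-contained C sketch: every path that writes to the status argument, including any extra debugging or memcheck writes, first compares it against the ignore sentinel. It is not the r28430 fix. The types and constants (sketch_status, SKETCH_STATUS_IGNORE, SKETCH_PROC_NULL, SKETCH_ANY_TAG) and the helper sketch_mrecv_no_proc are hypothetical stand-ins, not Open MPI's internals. The fields written follow my reading of the MPI-3.0 rule that a receive on MPI_MESSAGE_NO_PROC behaves like a receive from MPI_PROC_NULL (source MPI_PROC_NULL, tag MPI_ANY_TAG, count 0); treat that as an assumption to check against the standard.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for MPI_Status and its sentinels (not Open MPI's). */
typedef struct {
    int source;
    int tag;
    size_t count;
} sketch_status;

#define SKETCH_STATUS_IGNORE ((sketch_status *) NULL)

enum { SKETCH_SUCCESS = 0, SKETCH_ANY_TAG = -1, SKETCH_PROC_NULL = -2 };

/* Status for a receive matched with a "no process" message handle:
 * source = PROC_NULL, tag = ANY_TAG, count = 0 (assumed, see above). */
static void sketch_set_proc_null_status(sketch_status *status)
{
    assert(status != NULL);  /* callers must filter the ignore sentinel first */
    status->source = SKETCH_PROC_NULL;
    status->tag = SKETCH_ANY_TAG;
    status->count = 0;
}

/* Hypothetical receive on a "no process" message handle. Every write
 * through 'status' sits behind the same sentinel check, so passing the
 * ignore sentinel never dereferences NULL. */
static int sketch_mrecv_no_proc(sketch_status *status)
{
    if (SKETCH_STATUS_IGNORE != status) {
        sketch_set_proc_null_status(status);
    }
    return SKETCH_SUCCESS;
}
```

With the guard in place, the analogue of the MPI_Mrecv(..., MPI_STATUS_IGNORE) call above simply returns success instead of writing through a NULL pointer.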


--
Lisandro Dalcin
---
CIMEC (INTEC/CONICET-UNL)
Predio CONICET-Santa Fe
Colectora RN 168 Km 472, Paraje El Pozo
3000 Santa Fe, Argentina
Tel: +54-342-4511594 (ext 1011)
Tel/Fax: +54-342-4511169