Re: [OMPI users] How does authentication between nodes work without password? (Newbie alert on)

2011-02-14 Thread Kevin . Buckley
This probably shows my lack of understanding as to how OpenMPI negotiates the connectivity between nodes when given a choice of interfaces but anyway: does dasher have any network interfaces that vixen does not? The scenario I am imagining would be that you ssh into dasher from vixen using a

Re: [OMPI users] OpenMPI 1.2.x segfault as regular user

2011-03-20 Thread Kevin . Buckley
> It's not hard to test whether or not SELinux is the problem. You can > turn SELinux off on the command-line with this command: > > setenforce 0 > > Of course, you need to be root in order to do this. > > After turning SELinux off, you can try reproducing the error. If it > still occurs, it's

[OMPI users] OpenIB error messages: reporting the default or telling you what's happening?

2011-09-11 Thread Kevin . Buckley
I have recently seen some OpenIB time out errors and see the following reported: * btl_openib_ib_retry_count - The number of times the sender will attempt to retry (defaulted to 7, the maximum value). * btl_openib_ib_timeout - The local ACK timeout parameter (defaulted to 10). The actual
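As a rough aid to reading those two parameters, here is a minimal sketch of how the timeout exponent maps to wall-clock time. It assumes the usual InfiniBand convention that the local ACK timeout is 4.096 microseconds * 2^btl_openib_ib_timeout; the function name is made up for illustration and is not part of Open MPI.

```python
# Sketch only: converts the btl_openib_ib_timeout exponent into seconds,
# assuming the InfiniBand formula 4.096 us * 2**exponent.
# local_ack_timeout_seconds is a hypothetical helper, not an Open MPI call.

BASE_SECONDS = 4.096e-6  # 4.096 microseconds


def local_ack_timeout_seconds(ib_timeout: int) -> float:
    """Return the local ACK timeout in seconds for a given exponent."""
    if ib_timeout < 0:
        raise ValueError("timeout exponent must be non-negative")
    return BASE_SECONDS * (2 ** ib_timeout)


if __name__ == "__main__":
    # Compare the two exponents that come up in this thread.
    for exponent in (10, 20):
        seconds = local_ack_timeout_seconds(exponent)
        print(f"btl_openib_ib_timeout={exponent}: {seconds:.6f} s")
```

Under that assumption the gap between an exponent of 10 and one of 20 is a factor of 1024, so which value a job actually ran with matters a good deal when chasing time out errors.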

Re: [OMPI users] OpenIB error messages: reporting the default or telling you what's happening?

2011-09-11 Thread Kevin . Buckley
Ralph, > Are you getting those messages from ompi_info? Or from an MPI app > (and if so, what are you doing to get them)? They're coming out of a user's application. Reason I just wanted to check about what the errors are saying is that things are still in testing mode wrt the IB kit though, as I

Re: [OMPI users] OpenIB error messages: reporting the default or telling you what's happening?

2011-09-13 Thread Kevin . Buckley
> So the error output is not showing what you two think should be > the default value, 20, but then nor is it showing what I think I > have set it to globally, again, 20. > > But anyroad, what I wanted from this is confirmation that the output > is telling me the value that the job was running

Re: [OMPI users] OpenIB error messages: reporting the default or telling you what's happening?

2011-09-14 Thread Kevin . Buckley
> That text message is hard-coded (and apparently out of date); it > does not show the current value. > > I agree that that is misleading. This error message needs to be > improved. OK, good to have that clarified Jeff, cheers. > This might suggest a hardware issue; let us know what you find.

[OMPI users] Pointers for understanding failure messages on NetBSD

2009-11-29 Thread Kevin . Buckley
Hi there, I recently compiled OpenMPI 1.3.3 for a NetBSD platform as part of an attempt to get some MPI-based codes running on the SGE cycle stealing grid we have in the School here. I should point out that this has not been done within the pkgsrc build system as yet but that I found I was able

Re: [OMPI users] Pointers for understanding failure messages on NetBSD

2009-12-01 Thread Kevin . Buckley
> "Jeff Squyres" > > > Oy. This is ick, because this error code is coming from horrendously > complex code deep in the depths of OMPI that is probing the OS to > figure out what ethernet interfaces you have. It may or may not be > simple to fix this. > > Do you mind diving

Re: [OMPI users] Pointers for understanding failure messages on NetBSD

2009-12-01 Thread Kevin . Buckley
>> I assume that both of you have seen the reply from Aleksej Saushev, >> who seems to be the bloke looking after the port of OpenMPI to the >> NetBSD platform. >> >> >> Aleksej suggested some mods he had partially looked at, in >> >> opal/util/if.c > > Nope - didn't see anything like that :-/

Re: [OMPI users] Pointers for understanding failure messages on NetBSD

2009-12-01 Thread Kevin . Buckley
> Interesting - especially since the existing code works quite well over a > wide range of platforms. So I'm not quite so eager to declare it incorrect > and only working by accident. > > However, I would welcome a proposed patch so we can look at it. This is > always an important area for us, so

Re: [OMPI users] Pointers for understanding failure messages on NetBSD

2009-12-02 Thread Kevin . Buckley
> I believe this line is incorrect: > >> opal_list_append(&opal_if_list, (opal_list_item_t*) >> intf_ptr); > > It needs to be > > opal_list_append(&opal_if_list, &intf_ptr->super); Didn't seem to change things. Any thoughts on the: /* * hardcoded netmask, adrian says that's

Re: [OMPI users] Pointers for understanding failure messages on NetBSD

2009-12-02 Thread Kevin . Buckley
> I would be leery of the hard-coded stuff. Indeed, so I changed it to: intf.if_mask = prefix( sin_addr->sin_addr.s_addr); which seems to match what the "old" code was doing: still blowing up though. > Reason: the IPv6 code has been a continual source of trouble, > while the IPv4 code has
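To make the hard-coded versus computed mask point concrete, here is a small sketch of deriving a prefix length from an IPv4 netmask. It is an assumption about what a helper like prefix() is meant to produce, not the Open MPI code, and prefix_length is a hypothetical name.

```python
# Sketch only: derive a CIDR prefix length from a dotted-quad IPv4 netmask
# by counting the leading one bits. Not taken from opal/util/if.c.
import ipaddress


def prefix_length(netmask: str) -> int:
    """Count the leading one bits of an IPv4 netmask."""
    bits = int(ipaddress.IPv4Address(netmask))
    length = 0
    for shift in range(31, -1, -1):
        if bits & (1 << shift):
            length += 1
        else:
            break  # stop at the first zero bit
    return length


if __name__ == "__main__":
    for mask in ("255.255.255.0", "255.255.0.0", "255.255.255.255"):
        print(mask, "->", prefix_length(mask))
```

Note that the argument here is a netmask, whereas the line quoted above passes the interface address field, so the two are not directly comparable.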

Re: [OMPI users] Pointers for understanding failure messages on NetBSD

2009-12-02 Thread Kevin . Buckley
Oh bugger, I did miss the obvious. The "old" code which I had ifdef'd out contained an actual construction of the list itself. OBJ_CONSTRUCT(&opal_if_list, opal_list_t); If I make sure I do one of those, I now get a different set of messages but we are back to running again. mpirun -v -n 1

Re: [OMPI users] Pointers for understanding failure messages on NetBSD

2009-12-03 Thread Kevin . Buckley
>> I have actually already taken the IPv6 block and simply tried to >> replace any IPv6 stuff with IPv4 "equivalents", eg: > > At the risk of showing a lot of ignorance, here's the block I cobbled > together based on the IPv6 block. > > I have tried to keep it looking as close to the original IPv6

Re: [OMPI users] Pointers for understanding failure messages on NetBSD

2009-12-08 Thread Kevin . Buckley
Aleksej Cc: to the OpenMPI list as the otfdump clash might be of interest elsewhere. > I attach a patch, but it doesn't work and I don't see where the > error lies now. It may be that I'm doing something stupid. > It produces working OpenMPI-1.3.4 package on Dragonfly though. Ok, I'll try and

Re: [OMPI users] Pointers for understanding failure messages on NetBSD

2009-12-08 Thread Kevin . Buckley
OK, it works although there are some temporary errors. This is the NetBSD wip openmpi package as downloaded from the webCVS a couple of days ago but with my patches as detailed before (I have not tried comparing yours with mine as yet) and the removal of the compilation and install of the

Re: [OMPI users] Pointers for understanding failure messages on NetBSD

2009-12-08 Thread Kevin . Buckley
>> I attach a patch, but it doesn't work and I don't see where the >> error lies now. It may be that I'm doing something stupid. >> It produces working OpenMPI-1.3.4 package on Dragonfly though. > > Ok, I'll try and merge it in to the working stuff we have here. > I, obviously, just #ifdef'd for

Re: [OMPI users] Pointers for understanding failure messages on NetBSD

2009-12-09 Thread Kevin . Buckley
>> 26a27 >>> CONFIGURE_ARGS+= --enable-contrib-no-build=vt >> >> I have no idea how NetBSD go about resolving such clashes in the long >> term though? > > I've disabled it the same way for this time, my local package differs > from what's in wip: > > --- PLIST 3 Dec 2009 10:18:00 -

Re: [OMPI users] Pointers for understanding failure messages on NetBSD

2009-12-10 Thread Kevin . Buckley
>> Are you going to upgrade the NetBSD port to build against OpenMPI 1.4 >> now that it is available? Might be a good time to check the fuzz in the >> existing patches. > > http://pkgsrc-wip.cvs.sourceforge.net/viewvc/pkgsrc-wip/wip/openmpi/Makefile Just to say that I built the NetBSD OpenMPI 1.4

Re: [OMPI users] Pointers for understanding failure messages on NetBSD

2009-12-16 Thread Kevin . Buckley
> Just to say that I built the NetBSD OpenMPI 1.4 port from the CVS, > so including all the recent work and get the examples to run, albeit > still with the: > > opal_sockaddr2str failed:Unknown error (return code 4) > > non-fatal errors. > > As promised, I'll do a bit more digging into this.

[OMPI users] Building on SPARC-Enterprise-T5120

2010-06-16 Thread Kevin . Buckley
Beyond what's documented at the FAQ (Questions 20 and 21) http://www.open-mpi.org/faq/?category=building#build-sun-compilers is there anything else worth tweaking for building on a SPARC-Enterprise-T5120 with the June 2010 Express compiler suite ? Perhaps, instead of -xtarget=ultra3 one

Re: [OMPI users] Fortran - MPI_WORLD_COMM - correction

2010-06-22 Thread Kevin . Buckley
> I think the problem is that you didn't include mpif.h in testsubr(). > Hence, the value of MPI_INTEGER was undefined -- I don't think it's a > problem with the value of MPI_Comm. That's correct. You also don't then need to pass MPI_Comm_World around, it is a parameter defined in mpif-common.h

Re: [OMPI users] Fortran - MPI_WORLD_COMM

2010-06-22 Thread Kevin . Buckley
> This is basic fortran programming issue, you may want to consult > some fortran programming book. > > A.Chan It is more an issue with understanding the usual implementations of the MPI Fortran bindings, namely, having to include mpif.h in ALL procedures that wish to make use of the MPI

Re: [OMPI users] mpiexec hangs - new install

2010-07-25 Thread Kevin . Buckley
> Here's what seems to be a solution that works for SuSE. May be > something similar for other systems: > >1) Edit the file /etc/sysconfig/SuseFirewall2 >2) Look for the keyword FW_TRUSTED_NETS >3) Add the IP addresses of your internal machines there. The format > for multiple