Re: [OMPI users] Pointers for understanding failure messages on NetBSD

2009-12-16 Thread Ralph Castain
You could confirm that it is the IPv6 loop by simply disabling IPv6 support: configure with --disable-ipv6 and see if you still get the error messages. Thanks for continuing to pursue this! Ralph On Dec 16, 2009, at 8:41 PM, kevin.buck...@ecs.vuw.ac.nz wrote: >> Just to say that I built the

Re: [OMPI users] Pointers for understanding failure messages on NetBSD

2009-12-16 Thread Kevin . Buckley
> Just to say that I built the NetBSD OpenMPI 1.4 port from the CVS, > so including all the recent work, and got the examples to run, albeit > still with the: > > opal_sockaddr2str failed:Unknown error (return code 4) > > non-fatal errors. > > As promised, I'll do a bit more digging into this.

Re: [OMPI users] Pointers for understanding failure messages on NetBSD

2009-12-10 Thread Kevin . Buckley
>> Are you going to upgrade the NetBSD port to build against OpenMPI 1.4 >> now that it is available? Might be a good time to check the fuzz in the >> existing patches. > > http://pkgsrc-wip.cvs.sourceforge.net/viewvc/pkgsrc-wip/wip/openmpi/Makefile Just to say that I built the NetBSD OpenMPI 1.4

Re: [OMPI users] Pointers for understanding failure messages on NetBSD

2009-12-09 Thread Aleksej Saushev
kevin.buck...@ecs.vuw.ac.nz writes: CONFIGURE_ARGS+= --enable-contrib-no-build=vt >>> >>> I have no idea how NetBSD go about resolving such clashes in the long >>> term though? >> >> I've disabled it the same way for this time, my local package differs >> from what's in wip: >> >> ---

Re: [OMPI users] Pointers for understanding failure messages on NetBSD

2009-12-09 Thread Kevin . Buckley
>> 26a27 >>> CONFIGURE_ARGS+= --enable-contrib-no-build=vt >> >> I have no idea how NetBSD go about resolving such clashes in the long >> term though? > > I've disabled it the same way for this time, my local package differs > from what's in wip: > > --- PLIST 3 Dec 2009 10:18:00 -

Re: [OMPI users] Pointers for understanding failure messages on NetBSD

2009-12-09 Thread Aleksej Saushev
kevin.buck...@ecs.vuw.ac.nz writes: > Cc: to the OpenMPI list as the otfdump clash might be of interest > elsewhere. > >> I attach a patch, but it doesn't work and I don't see where the >> error lies now. It may be that I'm doing something stupid. >> It produces working OpenMPI-1.3.4 package on

Re: [OMPI users] Pointers for understanding failure messages on NetBSD

2009-12-08 Thread Kevin . Buckley
>> I attach a patch, but it doesn't work and I don't see where the >> error lies now. It may be that I'm doing something stupid. >> It produces working OpenMPI-1.3.4 package on Dragonfly though. > > Ok, I'll try and merge it into the working stuff we have here. > I, obviously, just #ifdef'd for

Re: [OMPI users] Pointers for understanding failure messages on NetBSD

2009-12-08 Thread Kevin . Buckley
OK, it works although there are some temporary errors. This is the NetBSD wip openmpi package as downloaded from the webCVS a couple of days ago but with my patches as detailed before (I have not tried comparing yours with mine as yet) and the removal of the compilation and install of the

Re: [OMPI users] Pointers for understanding failure messages on NetBSD

2009-12-08 Thread Kevin . Buckley
Aleksej, Cc: to the OpenMPI list as the otfdump clash might be of interest elsewhere. > I attach a patch, but it doesn't work and I don't see where the > error lies now. It may be that I'm doing something stupid. > It produces working OpenMPI-1.3.4 package on Dragonfly though. Ok, I'll try and

Re: [OMPI users] Pointers for understanding failure messages on NetBSD

2009-12-04 Thread Jeff Squyres
Excellent! Once you get some more definitive results, could you send this in patch form? On Dec 3, 2009, at 7:05 PM, wrote: > >> I have actually already taken the IPv6 block and simply tried to > >> replace any IPv6 stuff with IPv4

Re: [OMPI users] Pointers for understanding failure messages on NetBSD

2009-12-03 Thread Kevin . Buckley
>> I have actually already taken the IPv6 block and simply tried to >> replace any IPv6 stuff with IPv4 "equivalents", eg: > > At the risk of showing a lot of ignorance, here's the block I cobbled > together based on the IPv6 block. > > I have tried to keep it looking as close to the original IPv6

Re: [OMPI users] Pointers for understanding failure messages on NetBSD

2009-12-02 Thread Kevin . Buckley
Oh bugger, I did miss the obvious. The "old" code which I had ifdef'd out contained an actual construction of the list itself. OBJ_CONSTRUCT(&opal_if_list, opal_list_t); If I make sure I do one of those, I now get a different set of messages but we are back to running again. mpirun -v -n 1

Re: [OMPI users] Pointers for understanding failure messages on NetBSD

2009-12-02 Thread Kevin . Buckley
> I would be leery of the hard-coded stuff. Indeed, so I changed it to: intf.if_mask = prefix( sin_addr->sin_addr.s_addr); which seems to match what the "old" code was doing: still blowing up though. > Reason: the IPv6 code has been a continual source of trouble, > while the IPv4 code has

Re: [OMPI users] Pointers for understanding failure messages on NetBSD

2009-12-02 Thread Ralph Castain
I would be leery of the hard-coded stuff. Reason: the IPv6 code has been a continual source of trouble, while the IPv4 code has worked quite well. Could be a lot of reasons, especially the fact that the IPv6 code is hardly exercised by the devel team...so changes that cause problems are rarely

Re: [OMPI users] Pointers for understanding failure messages on NetBSD

2009-12-02 Thread Kevin . Buckley
> I believe this line is incorrect: > >>opal_list_append(&opal_if_list, (opal_list_item_t*) >> intf_ptr); > > It needs to be > > opal_list_append(&opal_if_list, &intf_ptr->super); Didn't seem to change things. Any thoughts on the: /* * hardcoded netmask, adrian says that's

Re: [OMPI users] Pointers for understanding failure messages on NetBSD

2009-12-02 Thread Ralph Castain
Given that it is working for us at the moment, and my current priorities, I doubt I'll get to this over the next 2-3 weeks. So if you have time and care to look at it before then, please do! Thanks. On Dec 1, 2009, at 8:45 PM, kevin.buck...@ecs.vuw.ac.nz wrote: >> Interesting - especially

Re: [OMPI users] Pointers for understanding failure messages on NetBSD

2009-12-01 Thread Kevin . Buckley
> Interesting - especially since the existing code works quite well over a > wide range of platforms. So I'm not quite so eager to declare it incorrect > and only working by accident. > > However, I would welcome a proposed patch so we can look at it. This is > always an important area for us, so

Re: [OMPI users] Pointers for understanding failure messages on NetBSD

2009-12-01 Thread Ralph Castain
Interesting - especially since the existing code works quite well over a wide range of platforms. So I'm not quite so eager to declare it incorrect and only working by accident. However, I would welcome a proposed patch so we can look at it. This is always an important area for us, so the more

Re: [OMPI users] Pointers for understanding failure messages on NetBSD

2009-12-01 Thread Kevin . Buckley
>> I assume that both of you have seen the reply from Aleksej Saushev, >> who seems to be the bloke looking after the port of OpenMPI to the >> NetBSD platform. >> >> >> Aleksej suggested some mods he had partially looked at, in >> >> opal/util/if.c > > Nope - didn't see anything like that :-/

Re: [OMPI users] Pointers for understanding failure messages on NetBSD

2009-12-01 Thread Ralph Castain
On Dec 1, 2009, at 6:43 PM, kevin.buck...@ecs.vuw.ac.nz wrote: > >> "Jeff Squyres" >> >> >> Oy. This is ick, because this error code is coming from horrendously >> complex code deep in the depths of OMPI that is probing the OS to >> figure out what ethernet interfaces

Re: [OMPI users] Pointers for understanding failure messages on NetBSD

2009-12-01 Thread Kevin . Buckley
> "Jeff Squyres" > > > Oy. This is ick, because this error code is coming from horrendously > complex code deep in the depths of OMPI that is probing the OS to > figure out what ethernet interfaces you have. It may or may not be > simple to fix this. > > Do you mind diving

Re: [OMPI users] Pointers for understanding failure messages on NetBSD

2009-12-01 Thread Ralph Castain
I believe what this is saying is that we are not finding any TCP interfaces - the ioctl itself is failing. So yes - mpirun failing at that point is going to happen because we have no way to communicate for launch. Do you see interfaces if you do an /sbin/ifconfig? Do they have valid IP

Re: [OMPI users] Pointers for understanding failure messages on NetBSD

2009-12-01 Thread Jeff Squyres
On Nov 29, 2009, at 6:15 PM, wrote: $ mpirun -n 4 hello_f77 [somebox.ecs.vuw.ac.nz:04414] opal_ifinit: ioctl(SIOCGIFFLAGS) failed with errno=6 Oy. This is ick, because this error code is coming from horrendously complex code

[OMPI users] Pointers for understanding failure messages on NetBSD

2009-11-29 Thread Kevin . Buckley
Hi there, I recently compiled OpenMPI 1.3.3 for a NetBSD platform as part of an attempt to get some MPI-based codes running on the SGE cycle stealing grid we have in the School here. I should point out that this has not been done within the pkgsrc build system as yet but that I found I was able