On 8/29/08, Greg Kurtzer <[EMAIL PROTECTED]> wrote:
>
> Perceus only runs the provisioning part (xget) on a non-standard port
> so it doesn't interfere. The xcpu stuff is all standard.
Strange...
>
> As Abhishek mentioned, check out the ipaddr Perceus module if you're
> provisioning, otherwise just add entries in the hostfile and do a
> /etc/init.d/perceus reload. (Caos NSA should already set this up for
> the user automatically, let me know if it didn't work).
Is there a better description of the modules and their options and
configuration than what is written in the user guide?
I would much rather get static addresses, as it makes debugging and
problem solving a lot easier (as well as node identification).
>
> If you want to do the host resolution on the dynamic addresses (not
> something I recommend...) add "nameserver 127.0.0.1" to the top of the
> master's /etc/resolv.conf. The better fix is to add the entries in the
> /etc/hosts and the DHCP server itself will manage the static IP
> addresses.
The nameserver 127.0.0.1 entry did nothing. In fact, these are the
messages that I got in the system log:
Aug 29 13:59:53 dgk3 perceus-dnsmasq[3100]: reading /etc/resolv.conf
Aug 29 13:59:53 dgk3 perceus-dnsmasq[3100]: using nameserver 128.100.102.202#53
Aug 29 13:59:53 dgk3 perceus-dnsmasq[3100]: using nameserver 142.150.224.6#53
Aug 29 13:59:53 dgk3 perceus-dnsmasq[3100]: using nameserver 142.150.224.224#53
Aug 29 13:59:53 dgk3 perceus-dnsmasq[3100]: ignoring nameserver 127.0.0.1 - local interface
Aug 29 13:59:53 dgk3 perceus-dnsmasq[3100]: using local addresses only for unqualified domains
So, just to clarify, if I want static addresses for the nodes I need
to activate the ipaddr module AND add the addresses to /etc/hosts?
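If it does come down to maintaining /etc/hosts myself, I would probably
script it. Here is a minimal sketch, assuming a statfs.conf-style file with
one name=tcp!address!port entry per line (the format produced by the awk
one-liner quoted further down). The function name statfs_to_hosts is just
something I made up, and I have not checked what the ipaddr module itself
expects:

```shell
#!/bin/sh
# Hypothetical helper (not part of Perceus or xcpu): convert statfs.conf-style
# lines (name=tcp!ip!port) into /etc/hosts-style lines (ip<TAB>name).
# Reads stdin, writes stdout; lines that do not match the format are skipped.
statfs_to_hosts() {
    awk -F'[=!]' 'NF >= 4 && $2 == "tcp" { printf "%s\t%s\n", $3, $1 }'
}

# Example using the two nodes that xstat reports in this thread.
printf '%s\n' 'n0000=tcp!10.10.0.170!6667' 'n0001=tcp!10.10.0.185!6667' \
    | statfs_to_hosts
```

The output could then be appended to /etc/hosts on the master after
checking it by eye.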
I have another question, related to both perceus and xcpu: How does
one reboot a node? Is there a perceus command or an xcpu command to
do it?
Also, what does one do about time keeping in a setup like this? Is
ntp a possibility?
First I'd like to get this going as far as being able to run xrx -a
/bin/date.... :-)
Thanks,
Daniel
>
> Thanks,
>
> Greg
>
>
> On Fri, Aug 29, 2008 at 9:38 AM, Abhishek Kulkarni <[EMAIL PROTECTED]> wrote:
> >
> >
> > On Fri, Aug 29, 2008 at 10:13 AM, Daniel Gruner <[EMAIL PROTECTED]> wrote:
> >>
> >> Hi Ab
> >>
> >> On 8/29/08, Abhishek Kulkarni <[EMAIL PROTECTED]> wrote:
> >> > Hi Daniel,
> >> >
> >> > Understand the way in which XCPU is supposed to integrate with oneSIS
> >> > and/or Perceus. It uses these as a "launch vehicle" to build minimal
> >> > images with xcpufs running on them, and provision the nodes with these
> >> > images. In the best case, that's all that you need to be running on
> >> > the compute nodes.
> >>
> >> I understand.
> >>
> >> >
> >> > On Fri, Aug 29, 2008 at 8:46 AM, Daniel Gruner <[EMAIL PROTECTED]>
> >> > wrote:
> >> > >
> >> > > Hi Greg,
> >> > >
> >> > > I definitely have additional questions! :-)
> >> > >
> >> > > Ok, here we go:
> >> > >
> >> > > - assume I am totally new to this - what would one do in order to set
> >> > > up a perceus/xcpu cluster?
> >> >
> >> > As Greg said, you have two ways to go about it. You could choose either
> >> > of them or try both to see what works for ya. It's just a matter of
> >> > playing with different configurations and rebooting your nodes to try
> >> > them.
> >> >
> >> > >
> >> > >
> >> > > - now, I am not totally new to this game, and my background is with
> >> > > bproc clusters, so I would like to have a replacement for these, but
> >> > > with the same basic principle of having a minimal node installation,
> >> > > and basically no management of nodes needed. I definitely do not want
> >> > > to go to a model where the nodes have password files, and you ssh into
> >> > > them in order to run your codes.
> >> > >
> >> > > - in the caos-NSA installation, the warewulfd is started by default.
> >> > > I assume it needs to be stopped and perceus started, correct?
> >> >
> >> > You can enable Perceus from "sidekick" in NSA. Warewulf focuses on
> >> > cluster monitoring starting with 3.0.
> >>
> >> Ok, I am concentrating on my RHEL5 machine for now. It seems to be
> >> working, at least insofar as the nodes boot. I haven't been able to
> >> contact them to try to do anything, other than running xstat with a
> >> positive response:
> >>
> >> n0000 tcp!10.10.0.170!6667 /Linux/x86_64 up 0
> >> n0001 tcp!10.10.0.185!6667 /Linux/x86_64 up 0
> >>
> >> I'd like the nodes to get sequential IP addresses, for ease of
> >> identification and management, and I have yet to find out how you do
> >> that in perceus.
> >
> > Take a look at the ipaddr module in Perceus.
> >
> >>
> >> Now, when I try to do anything on the nodes I get, for example:
> >>
> >> xgroupset 10.10.0.170 root 0
> >> xgroupset: Error: Connection refused:10.10.0.170
> >
> > Whoops! What about telnet 10.10.0.170 6667?
> > Perceus might possibly be running xcpufs on some non-standard port. I'm not
> > sure about that but I remember seeing something like that a while back.
> >
> >>
> >> similarly with xrx.
> >>
> >> xrx 10.10.0.170 /bin/date
> >> Error: Connection refused:10.10.0.170
> >
> > Ditto with this, if it's running on a different port you would want to do
> > xrx 10.10.0.170!port /bin/date
> >
> > Alternatively you could specify the "-a" flag to retrieve the nodes from
> > the statfs.
> >
> >>
> >> I also don't get name resolution for the nXXXX names assigned to the
> >> nodes by perceus.
> >
> > Check your /etc/resolv.conf.
> > Probably try adding the following to it.
> > nameserver 127.0.0.1
> >
> > If that doesn't work, the right place to ask this would be the Perceus ML.
> >
> >>
> >>
> >> >
> >> > >
> >> > >
> >> > > - what initialization of perceus needs to be done (the first time it
> >> > > runs)? I know about the network interface specification, and that I
> >> > > want it to use xget (the default), but is running the "perceus module
> >> > > activate xcpu" enough to get the nodes booting into xcpu?
> >> >
> >> > Yes, it is enough to get xcpufs running on the compute nodes.
> >> >
> >> > >
> >> > >
> >> > > - what about configuring the resource manager (e.g. slurm) for use in
> >> > > the perceus/xcpu environment?
> >> >
> >> > XCPU only supports Moab Torque for now.
> >>
> >> Is this the open source torque, or just the commercial product?
> >>
> >>
> >> >
> >> > >
> >> > >
> >> > > - I don't see the xcpufs and statfs daemons running on the master
> >> > > after starting perceus even though I told it to activate xcpu. I
> >> > > haven't tried to boot nodes yet, but I'd like to understand what I am
> >> > > doing first (I hate black boxes...).
> >> > >
> >> >
> >> > You shouldn't need to run xcpufs on the master. As for statfs, you can
> >> > start it manually if it is not running already.
> >> >
> >> > Again, considering that you have fully configured the master and have
> >> > the nodes provisioned to the init state, this is what I would do to
> >> > generate my statfs.conf --
> >> >
> >> > perceus node status | awk 'NR > 2 {print $1 "=tcp!" $3 "!6667"}' > /etc/xcpu/statfs.conf
> >>
> >> I had to replace the part "NR>2" with "NR>0" for the above incantation
> >> to work (??).
> >
> > Strange, I may be running a different version of Perceus.
> >
> >>
> >> >
> >> > And then,
> >> >
> >> > statfs -c /etc/xcpu/statfs
> >>
> >> statfs seems to work. Here is the output from xstat:
> >>
> >> n0000 tcp!10.10.0.170!6667 /Linux/x86_64 up 0
> >> n0001 tcp!10.10.0.185!6667 /Linux/x86_64 up 0
> >>
> >> In any case, there is some progress, but it is not quite there yet...
> >
> > I'm glad you are almost there.
> >
> > Thanks,
> > -- Abhishek
> >
> >>
> >>
> >> Thanks,
> >> Daniel
> >>
> >>
> >>
> >>
> >>
> >>
> >> >
> >> >
> >> > >
> >> > > etc.
> >> > >
> >> > > I guess the main problem I have is not with perceus itself (I have
> >> > > read the manual), but rather with its integration and provisioning for
> >> > > xcpu, and for the subsequent configuration of those pieces that make
> >> > > the cluster useable in a production environment.
> >> > >
> >> > >
> >> > > Thanks for your help,
> >> > > Daniel
> >> >
> >> > Thanks
> >> > -- Abhishek
> >> >
> >> > >
> >> > >
> >> > >
> >> > >
> >> > >
> >> > >
> >> > > On 8/29/08, Greg Kurtzer <[EMAIL PROTECTED]> wrote:
> >> > > >
> >> > > > You have multiple choices on how to move forward.
> >> > > >
> >> > > > First you can run the xcpu Perceus module like:
> >> > > >
> >> > > > # perceus module activate xcpu
> >> > > >
> >> > > > That will interrupt the node provisioning process and instead of
> >> > > > copying the VNFS to the node it will just start up xcpu and start
> >> > > > accepting connections.
> >> > > >
> >> > > > The second option would be to run xcpu from within the VNFS of your
> >> > > > choice. That mechanism basically involves installing xcpu into the
> >> > > > mounted VNFS image and then provision your nodes with that.
> >> > > >
> >> > > > Let me know if that helps or if you have additional questions. :)
> >> > > >
> >> > > >
> >> > > > Greg
> >> > > >
> >> > > >
> >> > > >
> >> > > >
> >> > > > On Fri, Aug 29, 2008 at 6:45 AM, Daniel Gruner <[EMAIL PROTECTED]> wrote:
> >> > > > >
> >> > > > > Hi Kevin,
> >> > > > >
> >> > > > > Well, I've just completed installing xcpu2 and perceus into my
> >> > > > > RHEL5 machine, but now I am stumped with the configuration. How
> >> > > > > do you tell perceus that you want your cluster to run xcpu? I
> >> > > > > sure don't understand where this is configured (I assume
> >> > > > > somewhere in the /etc/perceus .conf files), and there is no
> >> > > > > mention of that in the manual other than saying that xcpu works.
> >> > > > >
> >> > > > > If you install xcpu2 you surely would need 9p, right?
> >> > > > >
> >> > > > > Also, how does slurm integrate into the perceus/xcpu world?
> >> > > > >
> >> > > > > I have also installed this on a caos-NSA test machine, but again
> >> > > > > I don't know how to configure the provisioning.
> >> > > > >
> >> > > > > Any help with this would be much appreciated...
> >> > > > >
> >> > > > > Daniel
> >> > > > >
> >> > > > >
> >> > > > > On 8/28/08, Kevin Tegtmeier <[EMAIL PROTECTED]> wrote:
> >> > > > >> We used RHEL5 + perceus successfully. I had to modify the
> >> > > > >> perceus boot image for x86_64, but it may have been a
> >> > > > >> kexec/hardware specific issue I ran into. If you run into an
> >> > > > >> issue with it I can help you along.
> >> > > > >>
> >> > > > >> I don't think the 9P module was built in, but I don't think you
> >> > > > >> would use it.
> >> > > > >>
> >> > > > >>
> >> > > > >> On Thu, Aug 28, 2008 at 11:31 AM, Daniel Gruner <[EMAIL PROTECTED]> wrote:
> >> > > > >>
> >> > > > >> >
> >> > > > >> > Thanks, Abhishek.
> >> > > > >> >
> >> > > > >> > I will try it and report on my success/lack thereof.
> >> > > > >> >
> >> > > > >> > Just for info, I am using a RHEL5 distribution, but with the
> >> > > > >> > 2.6.26 kernel so that it supports 9p. Has anybody been
> >> > > > >> > successful with this distribution? Otherwise, is there a
> >> > > > >> > preferred one?
> >> > > > >> >
> >> > > > >> > Daniel
> >> > > > >> >
> >> > > > >> >
> >> > > > >> >
> >> > > > >> >
> >> > > > >> > On 8/28/08, Abhishek Kulkarni <[EMAIL PROTECTED]> wrote:
> >> > > > >> > >
> >> > > > >> > > Daniel,
> >> > > > >> > >
> >> > > > >> > > It is _not_ necessary to install cAos Linux to use Perceus.
> >> > > > >> > > Perceus supports most, if not all, distributions.
> >> > > > >> > >
> >> > > > >> > > XCPU is bundled up as a module within Perceus. The
> >> > > > >> > > documentation at
> >> > > > >> > > http://www.perceus.org/docs/perceus-userguide-1.4.0.pdf is
> >> > > > >> > > quite extensive at that and has details on importing and
> >> > > > >> > > activating modules. It's quite simple even if you find
> >> > > > >> > > yourself wanting to tinker with the XCPU Perceus module (it's
> >> > > > >> > > just a shell script that runs at a specified provisioning
> >> > > > >> > > state/level)
> >> > > > >> > >
> >> > > > >> > >
> >> > > > >> > > -- Abhishek
> >> > > > >> > >
> >> > > > >> > >
> >> > > > >> > > On Thu, 2008-08-28 at 14:17 -0400, Daniel Gruner wrote:
> >> > > > >> > > > Yes, that is a possibility. Instructions on that, please?
> >> > > > >> > > > I tried installing caos linux, but it doesn't quite finish
> >> > > > >> > > > doing the install.
> >> > > > >> > > >
> >> > > > >> > > > Daniel
> >> > > > >> > > >
> >> > > > >> > > > On 8/28/08, ron minnich <[EMAIL PROTECTED]> wrote:
> >> > > > >> > > > >
> >> > > > >> > > > > Use perceus.
> >> > > > >> > > > >
> >> > > > >> > > > > Ron
> >> > > > >> > > > >
> >> > > > >> > > > >
> >> > > > >> > > > > On 8/28/08, Daniel Gruner <[EMAIL PROTECTED]> wrote:
> >> > > > >> > > > > >
> >> > > > >> > > > > > Hi All,
> >> > > > >> > > > > >
> >> > > > >> > > > > > The list has been very quiet lately... :-)
> >> > > > >> > > > > >
> >> > > > >> > > > > > I've been trying, yet again, to install the latest xcpu2
> >> > > > >> > > > > > in a test cluster. Ron's instructions on the xcpu.org
> >> > > > >> > > > > > site seem to be outdated, and partly buggy too. For
> >> > > > >> > > > > > instance, here are a couple of points:
> >> > > > >> > > > > >
> >> > > > >> > > > > > - After doing:
> >> > > > >> > > > > >
> >> > > > >> > > > > > make xcpu-tarball
> >> > > > >> > > > > >
> >> > > > >> > > > > > make ramfs-tarball
> >> > > > >> > > > > >
> >> > > > >> > > > > > make install
> >> > > > >> > > > > >
> >> > > > >> > > > > > I don't know whether xcpu2 has actually been built (I
> >> > > > >> > > > > > suspect not), and it certainly has not been installed
> >> > > > >> > > > > > (e.g. no xrx, or xcpufs, or any of that stuff has been
> >> > > > >> > > > > > installed).
> >> > > > >> > > > > >
> >> > > > >> > > > > > - The command
> >> > > > >> > > > > >
> >> > > > >> > > > > > export u=`uname -r`
> >> > > > >> > > > > > ./mk-initramfs-oneSIS -f initrd-$u.img $u -nn -rr \
> >> > > > >> > > > > > -o ../overlays/xcpu-64 \
> >> > > > >> > > > > > -w e1000 \
> >> > > > >> > > > > > -w forcedeth \
> >> > > > >> > > > > > -w ext3
> >> > > > >> > > > > >
> >> > > > >> > > > > > should really be
> >> > > > >> > > > > >
> >> > > > >> > > > > > ./mk-xcpu-oneSIS ....
> >> > > > >> > > > > >
> >> > > > >> > > > > > in order that the 9p and 9pnet modules get loaded into
> >> > > > >> > > > > > the initrd.
> >> > > > >> > > > > >
> >> > > > >> > > > > > Can someone please take a look and revise the
> >> > > > >> > > > > > instructions (and let us mere mortals know what to do)?
> >> > > > >> > > > > >
> >> > > > >> > > > > >
> >> > > > >> > > > > > Furthermore, is xcpu2 actually usable for production
> >> > > > >> > > > > > work? What about its integration with a
> >> > > > >> > > > > > scheduler/resource manager? What about MPI?
> >> > > > >> > > > > >
> >> > > > >> > > > > > Regards,
> >> > > > >> > > > > > Daniel
> >> > > > >> > > > > >
> >> > > > >> > > > >
> >> > > > >> > > > >
> >> > > > >> > > > >
> >> > > > >> > >
> >> > > > >> > >
> >> > > > >> >
> >> > > > >>
> >> > > > >>
> >> > > > >
> >> > > >
> >> > > >
> >> > > >
> >> > > >
> >> > > > --
> >> > > > Greg Kurtzer
> >> > > > http://www.infiscale.com/
> >> > > > http://www.runlevelzero.net/
> >> > > > http://www.perceus.org/
> >> > > > http://www.caoslinux.org/
> >> > > >
> >> > >
> >> >
> >> >
> >
> >
>
>
>
>
> --
>
> Greg Kurtzer
> http://www.infiscale.com/
> http://www.runlevelzero.net/
> http://www.perceus.org/
> http://www.caoslinux.org/
>