Hi Abhishek,
On 8/29/08, Abhishek Kulkarni <[EMAIL PROTECTED]> wrote:
> Hi Daniel,
>
> First, understand the way in which XCPU is supposed to integrate with oneSIS
> and/or Perceus. It uses these as a "launch vehicle" to build minimal images
> with xcpufs running on them, and to provision the nodes with those images. In
> the best case, that's all you need to be running on the compute nodes.
I understand.
>
> On Fri, Aug 29, 2008 at 8:46 AM, Daniel Gruner <[EMAIL PROTECTED]> wrote:
> >
> > Hi Greg,
> >
> > I definitely have additional questions! :-)
> >
> > Ok, here we go:
> >
> > - assume I am totally new to this - what would one do in order to set
> > up a perceus/xcpu cluster?
>
> As Greg said, you have two ways to go about it. You could choose either of
> them or try both to see what works for ya. It's just a matter of playing
> with different configurations and rebooting your nodes to try them.
>
> >
> >
> > - now, I am not totally new to this game, and my background is with
> > bproc clusters, so I would like to have a replacement for these, but
> > with the same basic principle of having a minimal node installation,
> > and basically no management of nodes needed. I definitely do not want
> > to go to a model where the nodes have password files, and you ssh into
> > them in order to run your codes.
> >
> > - in the caos-NSA installation, the warewulfd is started by default.
> > I assume it needs to be stopped and perceus started, correct?
>
> You can enable Perceus from "sidekick" in NSA. Warewulf focuses on cluster
> monitoring starting with 3.0.
Ok, I am concentrating on my RHEL5 machine for now. It seems to be
working, at least insofar as the nodes boot. I haven't been able to
contact them to run anything, though; the only positive response I get
is from xstat:
n0000 tcp!10.10.0.170!6667 /Linux/x86_64 up 0
n0001 tcp!10.10.0.185!6667 /Linux/x86_64 up 0
I'd like the nodes to get sequential IP addresses, for ease of
identification and management, and I have yet to find out how you do
that in perceus.
Now, when I try to do anything on the nodes I get, for example:
xgroupset 10.10.0.170 root 0
xgroupset: Error: Connection refused:10.10.0.170
Similarly with xrx:
xrx 10.10.0.170 /bin/date
Error: Connection refused:10.10.0.170
I also don't get name resolution for the nXXXX names assigned to the
nodes by perceus.
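Until that name resolution works, one stopgap (a sketch only, assuming the name=tcp!ip!port statfs.conf format implied by the awk command later in this thread) would be to derive /etc/hosts entries from statfs.conf:

```shell
# Hypothetical stopgap: turn statfs.conf lines such as
#   n0000=tcp!10.10.0.170!6667
# into /etc/hosts entries. Splitting fields on '=' and '!' yields
# $1 = node name, $2 = protocol, $3 = IP address, $4 = port.
awk -F'[=!]' '{print $3 "\t" $1}' /etc/xcpu/statfs.conf >> /etc/hosts
```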
>
> >
> >
> > - what initialization of perceus needs to be done (the first time it
> > runs)? I know about the network interface specification, and that I
> > want it to use xget (the default), but is running the "perceus module
> > activate xcpu" enough to get the nodes booting into xcpu?
>
> Yes, it is enough to get xcpufs running on the compute nodes.
>
> >
> >
> > - what about configuring the resource manager (e.g. slurm) for use in
> > the perceus/xcpu environment?
>
> XCPU only supports Moab Torque for now.
Is this the open source torque, or just the commercial product?
>
> >
> >
> > - I don't see the xcpufs and statfs daemons running on the master
> > after starting perceus even though I told it to activate xcpu. I
> > haven't tried to boot nodes yet, but I'd like to understand what I am
> > doing first (I hate black boxes...).
> >
>
> You shouldn't need to run xcpufs on the master. As for statfs, you can start
> it manually if it is not running already.
>
> Again, considering that you have fully configured the master and have the
> nodes provisioned to the init state, this is what I would do to generate my
> statfs.conf --
>
> perceus node status | awk 'NR > 2 {print $1 "=tcp!" $3 "!6667"}' > /etc/xcpu/statfs.conf
I had to replace the part "NR>2" with "NR>0" for the above incantation
to work (??).
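The difference is just how many leading lines awk skips. A minimal illustration (the three-line input here is made up; the real `perceus node status` columns may differ):

```shell
# NR is awk's current record (line) number, so 'NR > 2' drops the first
# two lines -- presumably header rows in "perceus node status" output.
# If a given Perceus version prints no header, the filter must be relaxed
# (e.g. 'NR > 0', i.e. keep every line), as observed above.
printf 'HEADER\n-----\nn0000 ready 10.10.0.170\n' \
  | awk 'NR > 2 {print $1 "=tcp!" $3 "!6667"}'
# emits: n0000=tcp!10.10.0.170!6667
```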
>
> And then,
>
> statfs -c /etc/xcpu/statfs.conf
statfs seems to work. Here is the output from xstat:
n0000 tcp!10.10.0.170!6667 /Linux/x86_64 up 0
n0001 tcp!10.10.0.185!6667 /Linux/x86_64 up 0
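That output can also be checked mechanically; a small sanity-check sketch over the exact xstat lines above (the column positions are taken from that output):

```shell
# Count the nodes that xstat reports as "up" (4th column above).
printf 'n0000 tcp!10.10.0.170!6667 /Linux/x86_64 up 0\nn0001 tcp!10.10.0.185!6667 /Linux/x86_64 up 0\n' \
  | awk '$4 == "up" {n++} END {print n " node(s) up"}'
# emits: 2 node(s) up
```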
In any case, there is some progress, but it is not quite there yet...
Thanks,
Daniel
>
>
> >
> > etc.
> >
> > I guess the main problem I have is not with perceus itself (I have
> > read the manual), but rather with its integration and provisioning for
> > xcpu, and for the subsequent configuration of those pieces that make
> > the cluster useable in a production environment.
> >
> >
> > Thanks for your help,
> > Daniel
>
> Thanks
> -- Abhishek
>
> >
> >
> >
> >
> >
> >
> > On 8/29/08, Greg Kurtzer <[EMAIL PROTECTED]> wrote:
> > >
> > > You have multiple choices on how to move forward.
> > >
> > > First you can run the xcpu Perceus module like:
> > >
> > > # perceus module activate xcpu
> > >
> > > That will interrupt the node provisioning process and instead of
> > > copying the VNFS to the node it will just start up xcpu and start
> > > accepting connections.
> > >
> > > The second option would be to run xcpu from within the VNFS of your
> > > choice. That mechanism basically involves installing xcpu into the
> > > mounted VNFS image and then provision your nodes with that.
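For concreteness, the second option might look roughly like this (a sketch only: the VNFS name "mycapsule", the mount path, and the exact perceus subcommands are assumptions that may differ between Perceus versions, so check `perceus --help` on your master):

```shell
# Sketch of installing xcpu into a VNFS capsule (all names hypothetical).
perceus vnfs mount mycapsule                   # expose the image for editing
cp /usr/sbin/xcpufs /mnt/mycapsule/usr/sbin/   # copy xcpufs into the image
# ...arrange for xcpufs to start at boot inside the image...
perceus vnfs umount mycapsule                  # repack the capsule
perceus node set vnfs mycapsule n0000          # provision a node with it
```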
> > >
> > > Let me know if that helps or if you have additional questions. :)
> > >
> > >
> > > Greg
> > >
> > >
> > >
> > >
> > > On Fri, Aug 29, 2008 at 6:45 AM, Daniel Gruner <[EMAIL PROTECTED]> wrote:
> > > >
> > > > Hi Kevin,
> > > >
> > > > Well, I've just completed installing xcpu2 and perceus into my RHEL5
> > > > machine, but now I am stumped with the configuration. How do you tell
> > > > perceus that you want your cluster to run xcpu? I sure don't
> > > > understand where this is configured (I assume somewhere in the
> > > > /etc/perceus .conf files), and there is no mention of that in the
> > > > manual other than saying that xcpu works.
> > > >
> > > > If you install xcpu2 you surely would need 9p, right?
> > > >
> > > > Also, how does slurm integrate into the perceus/xcpu world?
> > > >
> > > > I have also installed this on a caos-NSA test machine, but again I
> > > > don't know how to configure the provisioning.
> > > >
> > > > Any help with this would be much appreciated...
> > > >
> > > > Daniel
> > > >
> > > >
> > > > On 8/28/08, Kevin Tegtmeier <[EMAIL PROTECTED]> wrote:
> > > >> We used RHEL5 + perceus successfully. I had to modify the perceus boot
> > > >> image for x86_64, but it may have been a kexec/hardware-specific issue
> > > >> I ran into. If you run into an issue with it I can help you along.
> > > >>
> > > >> I don't think the 9P module was built in, but I don't think you
> > > >> would use it.
> > > >>
> > > >>
> > > >> On Thu, Aug 28, 2008 at 11:31 AM, Daniel Gruner <[EMAIL PROTECTED]> wrote:
> > > >>
> > > >> >
> > > >> > Thanks, Abhishek.
> > > >> >
> > > >> > I will try it and report on my success/lack thereof.
> > > >> >
> > > >> > Just for info, I am using a RHEL5 distribution, but with the 2.6.26
> > > >> > kernel so that it supports 9p. Has anybody been successful with this
> > > >> > distribution? Otherwise, is there a preferred one?
> > > >> >
> > > >> > Daniel
> > > >> >
> > > >> >
> > > >> >
> > > >> >
> > > >> > On 8/28/08, Abhishek Kulkarni <[EMAIL PROTECTED]> wrote:
> > > >> > >
> > > >> > > Daniel,
> > > >> > >
> > > >> > > It is _not_ necessary to install cAos Linux to use Perceus.
> > > >> > > Perceus supports most, if not all, distributions.
> > > >> > >
> > > >> > > XCPU is bundled up as a module within Perceus. The documentation at
> > > >> > > http://www.perceus.org/docs/perceus-userguide-1.4.0.pdf is quite
> > > >> > > extensive on that and has details on importing and activating
> > > >> > > modules. It's quite simple even if you find yourself wanting to
> > > >> > > tinker with the XCPU Perceus module (it's just a shell script that
> > > >> > > runs at a specified provisioning state/level).
> > > >> > >
> > > >> > >
> > > >> > > -- Abhishek
> > > >> > >
> > > >> > >
> > > >> > > On Thu, 2008-08-28 at 14:17 -0400, Daniel Gruner wrote:
> > > >> > > > Yes, that is a possibility. Instructions on that, please?
> > > >> > > > I tried installing cAos Linux, but it doesn't quite finish
> > > >> > > > doing the install.
> > > >> > > >
> > > >> > > > Daniel
> > > >> > > >
> > > >> > > > On 8/28/08, ron minnich <[EMAIL PROTECTED]> wrote:
> > > >> > > > >
> > > >> > > > > Use perceus.
> > > >> > > > >
> > > >> > > > > Ron
> > > >> > > > >
> > > >> > > > >
> > > >> > > > > On 8/28/08, Daniel Gruner <[EMAIL PROTECTED]> wrote:
> > > >> > > > > >
> > > >> > > > > > Hi All,
> > > >> > > > > >
> > > >> > > > > > The list has been very quiet lately... :-)
> > > >> > > > > >
> > > >> > > > > > I've been trying, yet again, to install the latest xcpu2 in a
> > > >> > > > > > test cluster. Ron's instructions on the xcpu.org site seem to
> > > >> > > > > > be outdated, and partly buggy too. For instance, here are a
> > > >> > > > > > couple of points:
> > > >> > > > > >
> > > >> > > > > > - After doing:
> > > >> > > > > >
> > > >> > > > > > make xcpu-tarball
> > > >> > > > > >
> > > >> > > > > > make ramfs-tarball
> > > >> > > > > >
> > > >> > > > > > make install
> > > >> > > > > >
> > > >> > > > > > I don't know whether xcpu2 has actually been built (I suspect
> > > >> > > > > > not), and it certainly has not been installed (e.g. no xrx, or
> > > >> > > > > > xcpufs, or any of that stuff has been installed).
> > > >> > > > > >
> > > >> > > > > > - The command
> > > >> > > > > >
> > > >> > > > > > export u=`uname -r`
> > > >> > > > > > ./mk-initramfs-oneSIS -f initrd-$u.img $u -nn -rr \
> > > >> > > > > > -o ../overlays/xcpu-64 \
> > > >> > > > > > -w e1000 \
> > > >> > > > > > -w forcedeth \
> > > >> > > > > > -w ext3
> > > >> > > > > >
> > > >> > > > > > should really be
> > > >> > > > > >
> > > >> > > > > > ./mk-xcpu-oneSIS ....
> > > >> > > > > >
> > > >> > > > > > in order that the 9p and 9pnet modules get loaded into the
> > > >> > > > > > initrd.
> > > >> > > > > >
> > > >> > > > > > Can someone please take a look and revise the instructions
> > > >> > > > > > (and let us mere mortals know what to do)?
> > > >> > > > > >
> > > >> > > > > >
> > > >> > > > > > Furthermore, is xcpu2 actually usable for production work?
> > > >> > > > > > What about its integration with a scheduler/resource manager?
> > > >> > > > > > What about MPI?
> > > >> > > > > >
> > > >> > > > > > Regards,
> > > >> > > > > > Daniel
> > > >> > > > > >
> > > >> > > > >
> > > >> > > > >
> > > >> > > > > --
> > > >> > > > > Sent from Gmail for mobile | mobile.google.com
> > > >> > > > >
> > > >> > >
> > > >> > >
> > > >> >
> > > >>
> > > >>
> > > >
> > >
> > >
> > >
> > >
> > > --
> > > Greg Kurtzer
> > > http://www.infiscale.com/
> > > http://www.runlevelzero.net/
> > > http://www.perceus.org/
> > > http://www.caoslinux.org/
> > >
> >
>
>