On 8/29/08, Abhishek Kulkarni <[EMAIL PROTECTED]> wrote:
>
>
>
> On Fri, Aug 29, 2008 at 10:13 AM, Daniel Gruner <[EMAIL PROTECTED]> wrote:
> >
> > Hi Ab
> >
> >
> > On 8/29/08, Abhishek Kulkarni <[EMAIL PROTECTED]> wrote:
> > > Hi Daniel,
> > >
> > > Understand the way in which XCPU is supposed to integrate with oneSIS
> > > and/or Perceus: it uses these as a "launch vehicle" to build minimal
> > > images with xcpufs running on them, and to provision the nodes with
> > > these images. In the best case, that's all you need to be running on
> > > the compute nodes.
> >
> > I understand.
> >
> >
> > >
> > > > On Fri, Aug 29, 2008 at 8:46 AM, Daniel Gruner <[EMAIL PROTECTED]> wrote:
> > > >
> > > > Hi Greg,
> > > >
> > > > I definitely have additional questions! :-)
> > > >
> > > > Ok, here we go:
> > > >
> > > > - assume I am totally new to this - what would one do in order to set
> > > > up a perceus/xcpu cluster?
> > >
> > > As Greg said, you have two ways to go about it. You could choose
> > > either of them or try both to see what works for ya. It's just a
> > > matter of playing with different configurations and rebooting your
> > > nodes to try them.
> > >
> > > >
> > > >
> > > > - now, I am not totally new to this game, and my background is with
> > > > bproc clusters, so I would like to have a replacement for these, but
> > > > with the same basic principle of having a minimal node installation,
> > > > and basically no management of nodes needed. I definitely do not want
> > > > to go to a model where the nodes have password files, and you ssh into
> > > > them in order to run your codes.
> > > >
> > > > - in the caos-NSA installation, the warewulfd is started by default.
> > > > I assume it needs to be stopped and perceus started, correct?
> > >
> > > You can enable Perceus from "sidekick" in NSA. Warewulf focuses on
> > > cluster monitoring starting with 3.0.
> >
> > Ok, I am concentrating on my RHEL5 machine for now. It seems to be
> > working, at least insofar as the nodes boot. I haven't been able to
> > contact them to try to do anything, other than running xstat with a
> > positive response:
> >
> > n0000 tcp!10.10.0.170!6667 /Linux/x86_64 up 0
> > n0001 tcp!10.10.0.185!6667 /Linux/x86_64 up 0
> >
> > I'd like the nodes to get sequential IP addresses, for ease of
> > identification and management, and I have yet to find out how you do
> > that in perceus.
>
> Take a look at the ipaddr module in Perceus.
Thanks for the pointer. I am looking at it, but the manual leaves a
lot to be desired in terms of describing what each module does and how
to configure them. I'll try the perceus list if I keep getting stuck
on this.
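In the meantime, here is a generic way to generate the kind of sequential name-to-IP mapping I'm after. The output format below is purely illustrative; it is not the actual ipaddr module syntax, which I still need to dig out of the docs.

```shell
# Print sequential node-name/IP pairs starting at 10.10.0.10.
# Illustrative only: the real Perceus ipaddr module has its own
# configuration syntax.
base=10
for i in 0 1 2 3; do
    printf 'n%04d 10.10.0.%d\n' "$i" "$((base + i))"
done
```

The zero-padded `n%04d` matches the nXXXX names Perceus assigns, so a list like this is easy to diff against `perceus node status` output.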
>
> >
> >
> > Now, when I try to do anything on the nodes I get, for example:
> >
> > xgroupset 10.10.0.170 root 0
> > xgroupset: Error: Connection refused:10.10.0.170
>
> Whoops! What about telnet 10.10.0.170 6667?
> Perceus might be running xcpufs on some non-standard port. I'm not
> sure about that, but I remember seeing something like that a while back.
You seem to have hit it! I can in fact telnet using port 6667
explicitly (can't do anything while in there...:-). I thought that
was the default port anyway, correct?
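For what it's worth, those node addresses are Plan 9-style dial strings of the form tcp!host!port, so the pieces split cleanly with POSIX parameter expansion:

```shell
# Split a Plan 9-style dial string (tcp!host!port) into its parts;
# the address is the one xstat reported earlier in the thread.
addr='tcp!10.10.0.170!6667'
proto=${addr%%!*}      # text before the first '!'
rest=${addr#*!}        # everything after the first '!'
host=${rest%%!*}
port=${rest#*!}
echo "proto=$proto host=$host port=$port"
```

That prints `proto=tcp host=10.10.0.170 port=6667`, which is handy when feeding the host and port separately to tools like telnet.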
>
> >
> >
> > similarly with xrx.
> >
> > xrx 10.10.0.170 /bin/date
> > Error: Connection refused:10.10.0.170
>
> Ditto with this, if it's running on a different port you would want to do
> xrx 10.10.0.170!port /bin/date
>
> Alternatively you could specify the "-a" flag to retrieve the nodes from the
> statfs.
I can get xgroupset and xuserset to work with the -a flag, with no
complaints. However, when I try to run anything on the nodes using
xrx, whether I use the -a flag or explicitly set the port (xrx
10.10.0.170\!6667 /bin/date) the command just hangs. I realize this
is some progress, but no cigar yet.
>
> >
> >
> > I also don't get name resolution for the nXXXX names assigned to the
> > nodes by perceus.
> >
> >
>
> Check your /etc/resolv.conf.
> Probably try adding the following to it.
> nameserver 127.0.0.1
>
> If that doesn't work, the right place to ask this would be the Perceus ML.
Doesn't work. I'll try the perceus gurus...
>
> >
> >
> >
> > >
> > > >
> > > >
> > > > - what initialization of perceus needs to be done (the first time it
> > > > runs)? I know about the network interface specification, and that I
> > > > want it to use xget (the default), but is running the "perceus module
> > > > activate xcpu" enough to get the nodes booting into xcpu?
> > >
> > > Yes, it is enough to get xcpufs running on the compute nodes.
> > >
> > > >
> > > >
> > > > - what about configuring the resource manager (e.g. slurm) for use in
> > > > the perceus/xcpu environment?
> > >
> > > XCPU only supports Moab Torque for now.
> >
> > Is this the open source torque, or just the commercial product?
Does anyone know which version of Torque this is? I can't afford
the commercial Moab right now...
> >
> > >
> > > >
> > > >
> > > > - I don't see the xcpufs and statfs daemons running on the master
> > > > after starting perceus even though I told it to activate xcpu. I
> > > > haven't tried to boot nodes yet, but I'd like to understand what I am
> > > > doing first (I hate black boxes...).
> > > >
> > >
> > > You shouldn't need to run xcpufs on the master. As for statfs, you
> > > can start it manually if it is not running already.
> > >
> > > Again, considering that you have fully configured the master and have
> > > the nodes provisioned to the init state, this is what I would do to
> > > generate my statfs.conf --
> > >
> > > perceus node status | awk 'NR > 2 {print $1 "=tcp!" $3 "!6667"}' > /etc/xcpu/statfs.conf
> >
> > I had to replace the part "NR>2" with "NR>0" for the above incantation
> > to work (??).
>
> Strange, I might be running a different version of Perceus.
Actually, what happens is that the first two lines of output from the
"perceus node status" command are output to stderr, and the rest to
stdout. If your '|' redirection included stderr then the command as
you wrote it would work. What shell are you using?
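A quick way to see the effect without a cluster handy (fake_status below is a stand-in for `perceus node status`, assuming the two header lines really do go to stderr as described):

```shell
# Stand-in for "perceus node status": two header lines on stderr,
# node rows on stdout (an assumption based on the behavior above).
fake_status() {
    echo "Perceus node status" >&2
    echo "-------------------" >&2
    echo "n0000 ready 10.10.0.170"
    echo "n0001 ready 10.10.0.185"
}

# A plain pipe carries only stdout, so awk sees nothing but node rows
# and NR > 0 matches them all:
fake_status 2>/dev/null | awk 'NR > 0 {print $1 "=tcp!" $3 "!6667"}'

# Merging stderr into the pipe restores the need for NR > 2:
fake_status 2>&1 | awk 'NR > 2 {print $1 "=tcp!" $3 "!6667"}'
```

Both pipelines emit the same two `name=tcp!ip!6667` lines, which is exactly why NR > 0 worked in my shell and NR > 2 worked in yours.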
Thanks,
Daniel
>
> >
> >
> >
> > >
> > > And then,
> > >
> > > statfs -c /etc/xcpu/statfs.conf
> >
> > statfs seems to work. Here is the output from xstat:
> >
> > n0000 tcp!10.10.0.170!6667 /Linux/x86_64 up 0
> > n0001 tcp!10.10.0.185!6667 /Linux/x86_64 up 0
> >
> > In any case, there is some progress, but it is not quite there yet...
>
> I'm glad you are almost there.
>
> Thanks,
> -- Abhishek
>
>
> >
> >
> > Thanks,
> > Daniel
> >
> > >
> > >
> > > >
> > > > etc.
> > > >
> > > > I guess the main problem I have is not with perceus itself (I have
> > > > read the manual), but rather with its integration and provisioning for
> > > > xcpu, and for the subsequent configuration of those pieces that make
> > > > the cluster useable in a production environment.
> > > >
> > > >
> > > > Thanks for your help,
> > > > Daniel
> > >
> > > Thanks
> > > -- Abhishek
> > >
> > > >
> > > > On 8/29/08, Greg Kurtzer <[EMAIL PROTECTED]> wrote:
> > > > >
> > > > > You have multiple choices on how to move forward.
> > > > >
> > > > > First you can run the xcpu Perceus module like:
> > > > >
> > > > > # perceus module activate xcpu
> > > > >
> > > > > That will interrupt the node provisioning process and instead of
> > > > > copying the VNFS to the node it will just start up xcpu and start
> > > > > accepting connections.
> > > > >
> > > > > The second option would be to run xcpu from within the VNFS of your
> > > > > choice. That mechanism basically involves installing xcpu into the
> > > > > mounted VNFS image and then provisioning your nodes with that.
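A hedged sketch of this second option, as I understand it. The VNFS name "rhel5-1", the mount point, and the binary path are all hypothetical, so this prints the commands instead of running them:

```shell
# Sketch of Greg's second option: install xcpufs inside a mounted VNFS
# image, then reprovision the nodes with it. Every name below is an
# assumption, so run() only echoes what would be executed.
run() { echo "+ $*"; }   # swap the echo for "$@" on a real head node

run perceus vnfs mount rhel5-1
run cp /usr/sbin/xcpufs /mnt/rhel5-1/usr/sbin/   # assumed mount point
run perceus vnfs umount rhel5-1
```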
> > > > >
> > > > > Let me know if that helps or if you have additional questions. :)
> > > > >
> > > > >
> > > > > Greg
> > > > >
> > > > >
> > > > >
> > > > >
> > > > > On Fri, Aug 29, 2008 at 6:45 AM, Daniel Gruner <[EMAIL PROTECTED]> wrote:
> > > > > >
> > > > > > Hi Kevin,
> > > > > >
> > > > > > Well, I've just completed installing xcpu2 and perceus on my
> > > > > > RHEL5 machine, but now I am stumped with the configuration. How
> > > > > > do you tell perceus that you want your cluster to run xcpu? I
> > > > > > sure don't understand where this is configured (I assume
> > > > > > somewhere in the /etc/perceus .conf files), and there is no
> > > > > > mention of that in the manual other than saying that xcpu works.
> > > > > >
> > > > > > If you install xcpu2 you surely would need 9p, right?
> > > > > >
> > > > > > Also, how does slurm integrate into the perceus/xcpu world?
> > > > > >
> > > > > > I have also installed this on a caos-NSA test machine, but again
> > > > > > I don't know how to configure the provisioning.
> > > > > >
> > > > > > Any help with this would be much appreciated...
> > > > > >
> > > > > > Daniel
> > > > > >
> > > > > >
> > > > > > On 8/28/08, Kevin Tegtmeier <[EMAIL PROTECTED]> wrote:
> > > > > >> We used RHEL5 + perceus successfully. I had to modify the
> > > > > >> perceus boot image for x86_64, but it may have been a
> > > > > >> kexec/hardware-specific issue I ran into. If you run into an
> > > > > >> issue with it I can help you along.
> > > > > >>
> > > > > >> I don't think the 9P module was built in, but I don't think you
> > > > > >> would use it.
> > > > > >>
> > > > > >>
> > > > > >> On Thu, Aug 28, 2008 at 11:31 AM, Daniel Gruner <[EMAIL PROTECTED]> wrote:
> > > > > >>
> > > > > >> >
> > > > > >> > Thanks, Abhishek.
> > > > > >> >
> > > > > >> > I will try it and report on my success/lack thereof.
> > > > > >> >
> > > > > >> > Just for info, I am using a RHEL5 distribution, but with the
> > > > > >> > 2.6.26 kernel so that it supports 9p. Has anybody been
> > > > > >> > successful with this distribution? Otherwise, is there a
> > > > > >> > preferred one?
> > > > > >> >
> > > > > >> > Daniel
> > > > > >> >
> > > > > >> >
> > > > > >> >
> > > > > >> >
> > > > > >> > On 8/28/08, Abhishek Kulkarni <[EMAIL PROTECTED]> wrote:
> > > > > >> > >
> > > > > >> > > Daniel,
> > > > > >> > >
> > > > > >> > > It is _not_ necessary to install cAos Linux to use Perceus.
> > > > > >> > > Perceus supports most, if not all, distributions.
> > > > > >> > >
> > > > > >> > > XCPU is bundled up as a module within Perceus. The
> > > > > >> > > documentation at
> > > > > >> > > http://www.perceus.org/docs/perceus-userguide-1.4.0.pdf
> > > > > >> > > is quite extensive at that and has details on importing and
> > > > > >> > > activating modules. It's quite simple even if you find
> > > > > >> > > yourself wanting to tinker with the XCPU Perceus module
> > > > > >> > > (it's just a shell script that runs at a specified
> > > > > >> > > provisioning state/level).
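Since the module is described as just a shell script keyed to a provisioning state, here is a toy sketch of what such a hook might look like. Everything here is hypothetical; the real xcpu module's interface and arguments may differ:

```shell
# Toy Perceus-style module hook: dispatch on the provisioning state.
# Hypothetical sketch only -- not the real xcpu module script.
xcpu_hook() {
    node="$1"
    state="$2"
    case "$state" in
        init) echo "$node: start xcpufs, skip the VNFS copy" ;;
        *)    echo "$node: nothing to do for state $state" ;;
    esac
}

xcpu_hook n0000 init
```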
> > > > > >> > >
> > > > > >> > >
> > > > > >> > > -- Abhishek
> > > > > >> > >
> > > > > >> > >
> > > > > >> > > On Thu, 2008-08-28 at 14:17 -0400, Daniel Gruner wrote:
> > > > > >> > > > Yes, that is a possibility. Instructions on that, please?
> > > > > >> > > > I tried installing caos linux, but it doesn't quite finish
> > > > > >> > > > doing the install.
> > > > > >> > > >
> > > > > >> > > > Daniel
> > > > > >> > > >
> > > > > >> > > > On 8/28/08, ron minnich <[EMAIL PROTECTED]> wrote:
> > > > > >> > > > >
> > > > > >> > > > > Use perceus.
> > > > > >> > > > >
> > > > > >> > > > > Ron
> > > > > >> > > > >
> > > > > >> > > > >
> > > > > >> > > > > On 8/28/08, Daniel Gruner <[EMAIL PROTECTED]> wrote:
> > > > > >> > > > > >
> > > > > >> > > > > > Hi All,
> > > > > >> > > > > >
> > > > > >> > > > > > The list has been very quiet lately... :-)
> > > > > >> > > > > >
> > > > > >> > > > > > I've been trying, yet again, to install the latest
> > > > > >> > > > > > xcpu2 in a test cluster. Ron's instructions on the
> > > > > >> > > > > > xcpu.org site seem to be outdated, and partly buggy
> > > > > >> > > > > > too. For instance, here are a couple of points:
> > > > > >> > > > > >
> > > > > >> > > > > > - After doing:
> > > > > >> > > > > >
> > > > > >> > > > > > make xcpu-tarball
> > > > > >> > > > > >
> > > > > >> > > > > > make ramfs-tarball
> > > > > >> > > > > >
> > > > > >> > > > > > make install
> > > > > >> > > > > >
> > > > > >> > > > > > I don't know whether xcpu2 has actually been built (I
> > > > > >> > > > > > suspect not), and it certainly has not been installed
> > > > > >> > > > > > (e.g. no xrx, xcpufs, or any of that stuff has been
> > > > > >> > > > > > installed).
> > > > > >> > > > > >
> > > > > >> > > > > > - The command
> > > > > >> > > > > >
> > > > > >> > > > > > export u=`uname -r`
> > > > > >> > > > > > ./mk-initramfs-oneSIS -f initrd-$u.img $u -nn -rr \
> > > > > >> > > > > > -o ../overlays/xcpu-64 \
> > > > > >> > > > > > -w e1000 \
> > > > > >> > > > > > -w forcedeth \
> > > > > >> > > > > > -w ext3
> > > > > >> > > > > >
> > > > > >> > > > > > should really be
> > > > > >> > > > > >
> > > > > >> > > > > > ./mk-xcpu-oneSIS ....
> > > > > >> > > > > >
> > > > > >> > > > > > in order that the 9p and 9pnet modules get loaded
> > > > > >> > > > > > into the initrd.
> > > > > >> > > > > >
> > > > > >> > > > > > Can someone please take a look and revise the
> > > > > >> > > > > > instructions (and let us mere mortals know what to
> > > > > >> > > > > > do)?
> > > > > >> > > > > >
> > > > > >> > > > > >
> > > > > >> > > > > > Furthermore, is xcpu2 actually usable for production
> > > > > >> > > > > > work? What about its integration with a
> > > > > >> > > > > > scheduler/resource manager? What about MPI?
> > > > > >> > > > > >
> > > > > >> > > > > > Regards,
> > > > > >> > > > > > Daniel
> > > > > >> > > > > >
> > > > > >> > > > >
> > > > > >> > > > >
> > > > > >> > > > > --
> > > > > >> > > > > Sent from Gmail for mobile | mobile.google.com
> > > > > >> > > > >
> > > > > >> > >
> > > > > >> > >
> > > > > >> >
> > > > > >>
> > > > > >>
> > > > > >
> > > > >
> > > > >
> > > > >
> > > > >
> > > > > --
> > > > > Greg Kurtzer
> > > > > http://www.infiscale.com/
> > > > > http://www.runlevelzero.net/
> > > > > http://www.perceus.org/
> > > > > http://www.caoslinux.org/
> > > > >
> > > >