Yes, I used the xrx command as normal.  The backslash is there to
escape the '!', which the shell (bash) would otherwise interpret.
That is not the problem.  I will try the debug flag and/or strace,
and post the output for you guys to decipher...
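For reference, the two quoting variants behave the same in bash; a minimal sketch using the example address from this thread:

```shell
# bash applies history expansion to '!' (in interactive shells),
# so the node address must be quoted or escaped.
addr='tcp!10.10.0.170!6667'    # single quotes: '!' is taken literally
echo "$addr"

# Backslash-escaped form, as used in the xrx command above:
echo tcp\!10.10.0.170\!6667
```

Both print the same literal address, so the escaping itself should not be the cause of the hang.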

Daniel

On Fri, Aug 29, 2008 at 6:11 PM, Abhishek Kulkarni <[EMAIL PROTECTED]> wrote:
>
>
> On Fri, Aug 29, 2008 at 12:45 PM, Daniel Gruner <[EMAIL PROTECTED]> wrote:
>>
>> On 8/29/08, Abhishek Kulkarni <[EMAIL PROTECTED]> wrote:
>> >
>> >
>> >
>> > On Fri, Aug 29, 2008 at 10:13 AM, Daniel Gruner <[EMAIL PROTECTED]>
>> > wrote:
>> > >
>> > > Hi Ab
>> > >
>> > >
>> > > On 8/29/08, Abhishek Kulkarni <[EMAIL PROTECTED]> wrote:
>> > > > Hi Daniel,
>> > > >
>> > > > Understand the way in which XCPU is supposed to integrate with
>> > > > oneSIS
>> > and/or
>> > > > Perceus. It uses these as a "launch vehicle" to build minimal images
>> > with
>> > > > xcpufs running on them, and provision the nodes with these images.
>> > > > In
>> > the
>> > > > best case, that's all that you need to be running on the compute
>> > > > nodes.
>> > >
>> > > I understand.
>> > >
>> > >
>> > > >
>> > > > On Fri, Aug 29, 2008 at 8:46 AM, Daniel Gruner <[EMAIL PROTECTED]>
>> > wrote:
>> > > > >
>> > > > > Hi Greg,
>> > > > >
>> > > > > I definitely have additional questions! :-)
>> > > > >
>> > > > > Ok, here we go:
>> > > > >
>> > > > > - assume I am totally new to this - what would one do in order to
>> > > > > set
>> > > > > up a perceus/xcpu cluster?
>> > > >
>> > > > As Greg said, you have two ways to go about it. You could choose
>> > > > either
>> > of
>> > > > them or try both to see what works for ya. It's just a matter of
>> > > > playing
>> > > > with different configurations and rebooting your nodes to try them.
>> > > >
>> > > > >
>> > > > >
>> > > > > - now, I am not totally new to this game, and my background is
>> > > > > with
>> > > > > bproc clusters, so I would like to have a replacement for these,
>> > > > > but
>> > > > > with the same basic principle of having a minimal node
>> > > > > installation,
>> > > > > and basically no management of nodes needed.  I definitely do not
>> > > > > want
>> > > > > to go to a model where the nodes have password files, and you ssh
>> > > > > into
>> > > > > them in order to run your codes.
>> > > > >
>> > > > > - in the caos-NSA installation, the warewulfd is started by
>> > > > > default.
>> > > > > I assume it needs to be stopped and perceus started, correct?
>> > > >
>> > > > You can enable Perceus from "sidekick" in NSA. Warewulf focuses on
>> > cluster
>> > > > monitoring starting with 3.0.
>> > >
>> > > Ok, I am concentrating on my RHEL5 machine for now.  It seems to be
>> > > working, at least insofar as the nodes boot.  I haven't been able to
>> > > contact them to try to do anything, other than running xstat with a
>> > > positive response:
>> > >
>> > > n0000   tcp!10.10.0.170!6667    /Linux/x86_64   up      0
>> > > n0001   tcp!10.10.0.185!6667    /Linux/x86_64   up      0
>> > >
>> > > I'd like the nodes to get sequential IP addresses, for ease of
>> > > identification and management, and I have yet to find out how you do
>> > > that in perceus.
>> >
>> > Take a look at the ipaddr module in Perceus.
>>
>> Thanks for the pointer.  I am looking at it, but the manual leaves a
>> lot to be desired in terms of describing what each module does and how
>> to configure them.  I'll try the perceus list if I keep getting stuck
>> on this.
>>
>> >
>> > >
>> > >
>> > > Now, when I try to do anything on the nodes I get, for example:
>> > >
>> > > xgroupset 10.10.0.170 root 0
>> > > xgroupset: Error: Connection refused:10.10.0.170
>> >
>> > Whoops! What about telnet 10.10.0.170 6667?
>> > Perceus might be running xcpufs on a non-standard port. I'm not
>> > sure about that, but I remember seeing something like that a while back.
>>
>> You seem to have hit it!  I can in fact telnet using port 6667
>> explicitly (can't do anything while in there...:-).  I thought that
>> was the default port anyway, correct?
>>
>> >
>> > >
>> > >
>> > > similarly with xrx.
>> > >
>> > > xrx 10.10.0.170 /bin/date
>> > > Error: Connection refused:10.10.0.170
>> >
>> > Ditto with this, if it's running on a different port you would want to
>> > do
>> >  xrx 10.10.0.170!port /bin/date
>> >
>> > Alternatively you could specify the "-a" flag to retrieve the nodes from
>> > the
>> > statfs.
>>
>> I can get xgroupset and xuserset to work with the -a flag, with no
>> complaints.  However, when I try to run anything on the nodes using
>> xrx, whether I use the -a flag or explicitly set the port (xrx
>> 10.10.0.170\!6667 /bin/date) the command just hangs.  I realize this
>> is some progress, but no cigar yet.
>
> Weird. Did you use the above xrx command as-is? If so, note the
> backslash in your command.
> I have never seen xrx hang that way. Try passing the "-d" switch
> along with -a to enable debug mode.
> If that doesn't help, run it under strace and post the output. It's not
> that tough. Really. It should just work.
>
>>
>> >
>> > >
>> > >
>> > > I also don't get name resolution for the nXXXX names assigned to the
>> > > nodes by perceus.
>> > >
>> > >
>> >
>> > Check your /etc/resolv.conf.
>> > Probably try adding the following to it.
>> > nameserver 127.0.0.1
>> >
>> > If that doesn't work, the right place to ask this would be the Perceus
>> > ML.
>>
>> Doesn't work.  I'll try the perceus gurus...
>>
>> >
>> > >
>> > >
>> > >
>> > > >
>> > > > >
>> > > > >
>> > > > > - what initialization of perceus needs to be done (the first time
>> > > > > it
>> > > > > runs)?  I know about the network interface specification, and that
>> > > > > I
>> > > > > want it to use xget (the default), but is running the "perceus
>> > > > > module
>> > > > > activate xcpu" enough to get the nodes booting into xcpu?
>> > > >
>> > > > Yes, it is enough to get xcpufs running on the compute nodes.
>> > > >
>> > > > >
>> > > > >
>> > > > > - what about configuring the resource manager (e.g. slurm) for use
>> > > > > in
>> > > > > the perceus/xcpu environment?
>> > > >
>> > > > XCPU only supports Moab Torque for now.
>> > >
>> > > Is this the open source torque, or just the commercial product?
>>
>>
>> Who would know about which version of Torque this is?  I can't afford
>> the commercial Moab right now...
>>
>>
>> > >
>> > >
>> > >
>> > >
>> > > >
>> > > > >
>> > > > >
>> > > > > - I don't see the xcpufs and statfs daemons running on the master
>> > > > > after starting perceus even though I told it to activate xcpu.  I
>> > > > > haven't tried to boot nodes yet, but I'd like to understand what I
>> > > > > am
>> > > > > doing first (I hate black boxes...).
>> > > > >
>> > > >
>> > > > You shouldn't need to run xcpufs on the master. As for statfs, you
>> > > > can
>> > start
>> > > > it manually if it is not running already.
>> > > >
>> > > > Again, considering that you have fully configured the master and
>> > > > have
>> > the
>> > > > nodes provisioned to the init state, this is what I would do to
>> > > > generate
>> > my
>> > > > statfs.conf --
>> > > >
>> > > > perceus node status | awk 'NR > 2 {print $1 "=tcp!" $3 "!6667"}' >
>> > > > /etc/xcpu/statfs.conf
>> > >
>> > > I had to replace the part "NR>2" with "NR>0" for the above incantation
>> > > to work (??).
>> >
>> > Strange, I'm probably running a different version of Perceus.
>>
>> Actually, what happens is that the first two lines of output from the
>> "perceus node status" command are output to stderr, and the rest to
>> stdout.  If your '|' redirection included stderr then the command as
>> you wrote it would work.  What shell are you using?
>
> Ahh, maybe. I'm using bash. I didn't realize that.
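A minimal illustration of why the NR offsets differed: a plain `|` passes only stdout, so header lines printed to stderr never reach awk. The `emit` function below is just a stand-in for `perceus node status`, with made-up output modeled on the node listing in this thread:

```shell
# Stand-in for `perceus node status`: two header lines on stderr,
# one data row per node on stdout.
emit() {
  echo "NODE   STATUS  IPADDR"   >&2
  echo "-----------------------" >&2
  echo "n0000  ready   10.10.0.170"
  echo "n0001  ready   10.10.0.185"
}

# Only stdout reaches awk, so data rows start at NR == 1
# and no "NR > 2" guard is needed:
emit | awk '{print $1 "=tcp!" $3 "!6667"}'

# If stderr were merged into the pipe, the guard would be needed:
emit 2>&1 | awk 'NR > 2 {print $1 "=tcp!" $3 "!6667"}'
```

Both pipelines print the same `n0000=tcp!10.10.0.170!6667`-style lines, matching the statfs.conf format used above.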
>
> Thanks,
>  -- Abhishek
>
>>
>> Thanks,
>>
>> Daniel
>>
>>
>>
>> >
>> > >
>> > >
>> > >
>> > > >
>> > > > And then,
>> > > >
>> > > > statfs -c /etc/xcpu/statfs
>> > >
>> > > statfs seems to work.  Here is the output from xstat:
>> > >
>> > > n0000   tcp!10.10.0.170!6667    /Linux/x86_64   up      0
>> > > n0001   tcp!10.10.0.185!6667    /Linux/x86_64   up      0
>> > >
>> > > In any case, there is some progress, but it is not quite there yet...
>> >
>> > I'm glad you are almost there.
>> >
>> > Thanks,
>> >   -- Abhishek
>> >
>> >
>> > >
>> > >
>> > > Thanks,
>> > > Daniel
>> > >
>> > >
>> > >
>> > >
>> > >
>> > >
>> > >
>> > >
>> > >
>> > > >
>> > > >
>> > > > >
>> > > > > etc.
>> > > > >
>> > > > > I guess the main problem I have is not with perceus itself (I have
>> > > > > read the manual), but rather with its integration and provisioning
>> > > > > for
>> > > > > xcpu, and for the subsequent configuration of those pieces that
>> > > > > make
>> > > > > the cluster useable in a production environment.
>> > > > >
>> > > > >
>> > > > > Thanks for your help,
>> > > > > Daniel
>> > > >
>> > > > Thanks
>> > > >  -- Abhishek
>> > > >
>> > > > >
>> > > > >
>> > > > >
>> > > > >
>> > > > >
>> > > > >
>> > > > > On 8/29/08, Greg Kurtzer <[EMAIL PROTECTED]> wrote:
>> > > > > >
>> > > > > >  You have multiple choices on how to move forward.
>> > > > > >
>> > > > > >  First you can run the xcpu Perceus module like:
>> > > > > >
>> > > > > >  # perceus module activate xcpu
>> > > > > >
>> > > > > >  That will interrupt the node provisioning process and instead
>> > > > > > of
>> > > > > >  copying the VNFS to the node it will just start up xcpu and
>> > > > > > start
>> > > > > >  accepting connections.
>> > > > > >
>> > > > > >  The second option would be to run xcpu from within the VNFS of
>> > > > > > your
>> > > > > >  choice. That mechanism basically involves installing xcpu into
>> > > > > > the
>> > > > > >  mounted VNFS image and then provision your nodes with that.
>> > > > > >
>> > > > > >  Let me know if that helps or if you have additional questions.
>> > > > > > :)
>> > > > > >
>> > > > > >
>> > > > > >  Greg
>> > > > > >
>> > > > > >
>> > > > > >
>> > > > > >
>> > > > > >  On Fri, Aug 29, 2008 at 6:45 AM, Daniel Gruner
>> > > > > > <[EMAIL PROTECTED]>
>> > > > wrote:
>> > > > > >  >
>> > > > > >  > Hi Kevin,
>> > > > > >  >
>> > > > > >  > Well, I've just completed installing xcpu2 and perceus into
>> > > > > > my
>> > RHEL5
>> > > > > >  > machine, but now I am stumped with the configuration.  How do
>> > > > > > you
>> > > > tell
>> > > > > >  > perceus that you want your cluster to run xcpu?  I sure don't
>> > > > > >  > understand where this is configured (I assume somewhere in
>> > > > > > the
>> > > > > >  > /etc/perceus .conf files), and there is no mention of that in
>> > > > > > the
>> > > > > >  > manual other than saying that xcpu works.
>> > > > > >  >
>> > > > > >  > If you install xcpu2 you surely would need 9p, right?
>> > > > > >  >
>> > > > > >  > Also, how does slurm integrate into the perceus/xcpu world?
>> > > > > >  >
>> > > > > >  > I have also installed this on a caos-NSA test machine, but
>> > > > > > again
>> > I
>> > > > > >  > don't know how to configure the provisioning.
>> > > > > >  >
>> > > > > >  > Any help with this would be much appreciated...
>> > > > > >  >
>> > > > > >  > Daniel
>> > > > > >  >
>> > > > > >  >
>> > > > > >  > On 8/28/08, Kevin Tegtmeier <[EMAIL PROTECTED]> wrote:
>> > > > > >  >> We used RHEL5 + perceus successfully.  I had to modify the
>> > perceus
>> > > > boot
>> > > > > >  >> image for x86_64, but it may have been a kexec/hardware
>> > > > > > specific
>> > > > issue I ran
>> > > > > >  >> into.  If you run into an issue with it I can help you
>> > > > > > along.
>> > > > > >  >>
>> > > > > >  >> I don't think the 9P module was built in, but I don't think
>> > > > > > you
>> > > > would use
>> > > > > >  >> it.
>> > > > > >  >>
>> > > > > >  >>
>> > > > > >  >> On Thu, Aug 28, 2008 at 11:31 AM, Daniel Gruner
>> > <[EMAIL PROTECTED]>
>> > > > wrote:
>> > > > > >  >>
>> > > > > >  >> >
>> > > > > >  >> > Thanks, Abhishek.
>> > > > > >  >> >
>> > > > > >  >> > I will try it and report on my success/lack thereof.
>> > > > > >  >> >
>> > > > > >  >> > Just for info, I am using a RHEL5 distribution, but with
>> > > > > > the
>> > > > 2.6.26
>> > > > > >  >> > kernel so that it supports 9p.  Has anybody been
>> > > > > > successful
>> > with
>> > > > this
>> > > > > >  >> > distribution?  Otherwise, is there a preferred one?
>> > > > > >  >> >
>> > > > > >  >> > Daniel
>> > > > > >  >> >
>> > > > > >  >> >
>> > > > > >  >> >
>> > > > > >  >> >
>> > > > > >  >> > On 8/28/08, Abhishek Kulkarni <[EMAIL PROTECTED]> wrote:
>> > > > > >  >> > >
>> > > > > >  >> > >  Daniel,
>> > > > > >  >> > >
>> > > > > >  >> > >  It is _not_ necessary to install cAos Linux to use
>> > > > > > Perceus.
>> > > > Perceus
>> > > > > >  >> > >  supports most, if not all, distributions.
>> > > > > >  >> > >
>> > > > > >  >> > >  XCPU is bundled up as a module within Perceus. The
>> > > > > >  >> > >  documentation at
>> > > > > >  >> > >  http://www.perceus.org/docs/perceus-userguide-1.4.0.pdf
>> > > > > >  >> > >  is quite extensive and has details on importing and
>> > > > > >  >> > >  activating modules.
>> > > > > >  >> > >  It's quite simple even if you find yourself wanting to
>> > tinker
>> > > > with the
>> > > > > >  >> > >  XCPU Perceus module (it's just a shell script that runs
>> > > > > > at
>> > a
>> > > > specified
>> > > > > >  >> > >  provisioning state/level)
>> > > > > >  >> > >
>> > > > > >  >> > >
>> > > > > >  >> > >   -- Abhishek
>> > > > > >  >> > >
>> > > > > >  >> > >
>> > > > > >  >> > >  On Thu, 2008-08-28 at 14:17 -0400, Daniel Gruner wrote:
>> > > > > >  >> > >  > Yes, that is a possibility.  Instructions on that,
>> > please?
>> > > > > >  >> > >  > I tried installing caos linux, but it doesn't quite
>> > finish
>> > > > doing the
>> > > > > >  >> install.
>> > > > > >  >> > >  >
>> > > > > >  >> > >  > Daniel
>> > > > > >  >> > >  >
>> > > > > >  >> > >  > On 8/28/08, ron minnich <[EMAIL PROTECTED]> wrote:
>> > > > > >  >> > >  > >
>> > > > > >  >> > >  > >  Use perceus.
>> > > > > >  >> > >  > >
>> > > > > >  >> > >  > >  Ron
>> > > > > >  >> > >  > >
>> > > > > >  >> > >  > >
>> > > > > >  >> > >  > >  On 8/28/08, Daniel Gruner <[EMAIL PROTECTED]>
>> > > > > > wrote:
>> > > > > >  >> > >  > >  >
>> > > > > >  >> > >  > >  > Hi All,
>> > > > > >  >> > >  > >  >
>> > > > > >  >> > >  > >  > The list has been very quiet lately... :-)
>> > > > > >  >> > >  > >  >
>> > > > > >  >> > >  > >  > I've been trying, yet again, to install the
>> > > > > > latest
>> > xcpu2
>> > > > in a
>> > > > > >  >> test
>> > > > > >  >> > >  > >  > cluster.  Ron's instructions on the xcpu.org
>> > > > > > site
>> > seem
>> > > > to be
>> > > > > >  >> outdated,
>> > > > > >  >> > >  > >  > and partly buggy too.  For instance, here are a
>> > couple
>> > > > of
>> > > > > >  >> points:
>> > > > > >  >> > >  > >  >
>> > > > > >  >> > >  > >  > - After doing:
>> > > > > >  >> > >  > >  >
>> > > > > >  >> > >  > >  > make xcpu-tarball
>> > > > > >  >> > >  > >  >
>> > > > > >  >> > >  > >  > make ramfs-tarball
>> > > > > >  >> > >  > >  >
>> > > > > >  >> > >  > >  > make install
>> > > > > >  >> > >  > >  >
>> > > > > >  >> > >  > >  > I don't know whether xcpu2 has actually been
>> > > > > > built
>> > (I
>> > > > suspect
>> > > > > >  >> not),
>> > > > > >  >> > >  > >  > and it certainly has not been installed (e.g. no
>> > xrx, or
>> > > > xcpufs,
>> > > > > >  >> or
>> > > > > >  >> > >  > >  > any of that stuff has been installed).
>> > > > > >  >> > >  > >  >
>> > > > > >  >> > >  > >  > - The command
>> > > > > >  >> > >  > >  >
>> > > > > >  >> > >  > >  > export u=`uname -r`
>> > > > > >  >> > >  > >  > ./mk-initramfs-oneSIS -f initrd-$u.img $u -nn
>> > > > > > -rr \
>> > > > > >  >> > >  > >  > -o ../overlays/xcpu-64 \
>> > > > > >  >> > >  > >  > -w e1000 \
>> > > > > >  >> > >  > >  > -w forcedeth \
>> > > > > >  >> > >  > >  > -w ext3
>> > > > > >  >> > >  > >  >
>> > > > > >  >> > >  > >  > should really be
>> > > > > >  >> > >  > >  >
>> > > > > >  >> > >  > >  > ./mk-xcpu-oneSIS ....
>> > > > > >  >> > >  > >  >
>> > > > > >  >> > >  > >  > in order that the 9p and 9pnet modules get
>> > > > > > loaded
>> > into
>> > > > the
>> > > > > >  >> initrd.
>> > > > > >  >> > >  > >  >
>> > > > > >  >> > >  > >  > Can someone please take a look and revise the
>> > > > instructions (and
>> > > > > >  >> let us
>> > > > > >  >> > >  > >  > mere mortals know what to do)?
>> > > > > >  >> > >  > >  >
>> > > > > >  >> > >  > >  >
>> > > > > >  >> > >  > >  > Furthermore, is xcpu2 actually usable for
>> > > > > > production
>> > > > work?  What
>> > > > > >  >> about
>> > > > > >  >> > >  > >  > its integration with a scheduler/resource
>> > > > > > manager?
>> > What
>> > > > about
>> > > > > >  >> MPI?
>> > > > > >  >> > >  > >  >
>> > > > > >  >> > >  > >  > Regards,
>> > > > > >  >> > >  > >  > Daniel
>> > > > > >  >> > >  > >  >
>> > > > > >  >> > >  > >
>> > > > > >  >> > >  > >
>> > > > > >  >> > >  > > --
>> > > > > >  >> > >  > >  Sent from Gmail for mobile | mobile.google.com
>> > > > > >  >> > >  > >
>> > > > > >  >> > >
>> > > > > >  >> > >
>> > > > > >  >> >
>> > > > > >  >>
>> > > > > >  >>
>> > > > > >  >
>> > > > > >
>> > > > > >
>> > > > > >
>> > > > > >
>> > > > > > --
>> > > > > >  Greg Kurtzer
>> > > > > >  http://www.infiscale.com/
>> > > > > >  http://www.runlevelzero.net/
>> > > > > >  http://www.perceus.org/
>> > > > > >  http://www.caoslinux.org/
>> > > > > >
>> > > > >
>> > > >
>> > > >
>> > >
>> >
>> >
>
>