Title: RE: [Oscar-devel] Anyone had problems with sis and monitor_server?
Hi Steve:
 
Please have a look at this document:

http://svn.oscar.openclustergroup.org/trac/oscar/wiki/SystemImager
 
regarding editing /tftpboot/pxelinux.cfg/default.
 
This should, however, be done for you automatically once you hit the "Setup Network Boot" button.  Does it currently have ramdisk_size=80000?  If so, perhaps it needs to be larger (you did say you have modified your Fedora installation, right?).  Try changing it to 170000 or so.  If this is indeed the cause, perhaps we need a better way to determine ramdisk_size instead of just arbitrarily hardcoding a number...
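
As an illustration of the edit described above, here is a minimal Python sketch that rewrites the ramdisk_size value in the text of a boot configuration. The function name and the sample entry are hypothetical; a real /tftpboot/pxelinux.cfg/default will look different, and only the ramdisk_size=80000 and 170000 figures come from this thread.

```python
import re


def set_ramdisk_size(config_text: str, size_kb: int) -> str:
    """Return config_text with every ramdisk_size=N value replaced by size_kb."""
    return re.sub(r"\bramdisk_size=\d+", f"ramdisk_size={size_kb}", config_text)


# Hypothetical sample entry, for illustration only.
sample = (
    "LABEL install\n"
    "KERNEL kernel\n"
    "APPEND initrd=initrd.img root=/dev/ram ramdisk_size=80000\n"
)

print(set_ramdisk_size(sample, 170000))
```

To apply this to the real file, one would read it, pass its contents through the function, and write the result back, ideally after keeping a backup copy.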
 
After this, everything _should_ work.
 
Regarding your last question: to remove the association between a client and an image, just delete the client with the "Delete OSCAR clients" step.  You can have multiple images within your installation; you just need to make sure that you select the correct image when you re-define the clients.
 
If you would like to delete an image, please have a look here:

http://svn.oscar.openclustergroup.org/trac/oscar/wiki/DeleteImage
 
BTW, you might want to run netbootmgr on the command line and make sure that the nodes are set to "Install"; otherwise they might just boot from the hard disk.  There's currently a bug filed on this issue for the case when clients are deleted:

http://svn.oscar.openclustergroup.org/trac/oscar/ticket/13
 
Cheers,
 
Bernard


From: Steven Blackburn [mailto:[EMAIL PROTECTED]]
Sent: Sun 23/07/2006 08:09
To: Bernard Li; [email protected]
Subject: RE: [Oscar-devel] Anyone had problems with sis and monitor_server?

Bernard,

I can raise a bug, although the reason I assumed the
type was the issue is that on the first image (using
the ide.disk), SystemImager said it couldn't find an
init partition to install on. At least I think that
was the gist of the error (I will make a better note
of the error if it happens again). I did notice that
the partition layout didn't resemble the one in
ide.disk, so I figured it needed the scsi one.

Regarding the MONITOR_SERVER problem: perhaps I am
showing my lack of networking knowledge, as I believed
that each NIC had an IP and would only accept traffic
for that IP. On this system, the external IP is on
eth0, the internal one is on eth1. Nodes are only
connected to eth1 on the head (by their eth0).

Maybe that was a red herring, because the last error I
saw was about failing to connect to the network, and
scrolling to where the kernel is booted displayed the
external IP address, which I wasn't expecting.

I am selecting UYOK, as I had to with the r5122 oscar
build, so the installer finds my SATA disk. The head
and the nodes are the same mobo, with the head having
an additional network card.

I would be glad to post imaging logs, but I have to
hand-write them as imaging fails before connecting to
the monitor server. On booting, nothing looks particularly
out of place until two lines after "Autodetecting RAID
arrays.". There is a:

  RAMDISK: Compressed image found at block 0
  RAMDISK: Incomplete write (-28 != 32768) 81920000

It then mounts root readonly, frees unused kernel
memory (200k), write protects kernel read-only data
(919k). The next lines are:

  Attempt to access beyond end of device
  ram0: rw=0, want=163872, limit=160000
  EXT2-fs error (device ram0): ext2_get_inode: unable to read inode block - inode=1623, block=81935
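
The numbers in these lines can be checked by hand. A rough sketch of the arithmetic follows, assuming the kernel reports want and limit in 512-byte sectors and that ramdisk_size is given in KiB; both are assumptions about this kernel, not something the log itself states.

```python
SECTOR_BYTES = 512  # assumed unit of the "want" and "limit" figures
KIB = 1024          # assumed unit of the ramdisk_size boot parameter

limit_sectors = 160000  # from "ram0: rw=0, want=163872, limit=160000"
want_sectors = 163872

limit_kib = limit_sectors * SECTOR_BYTES // KIB
want_kib = want_sectors * SECTOR_BYTES // KIB

print(limit_sectors * SECTOR_BYTES)  # 81920000, the figure on the "Incomplete write" line
print(limit_kib)                     # 80000
print(want_kib)                      # 81936
print(want_kib - limit_kib)          # 1936
```

Under those assumptions the device limit works out to exactly 80000 KiB, and the failed read ends 1936 KiB past it. This only shows how far this one read overshot; the full image may need more room than that.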

There are two more errors before it gives up:

  Killing off running processes.

  write_variables
  [autoinstall system failed blurb]
  Installation failed!! Stopping report task.
  nc: connect: Network unreachable
  nc: connect: Network unreachable

So is this saying the ramdisk is too small? The initrd
image in tftpboot is 15.6 MB and the kernel is 1.8 MB,
although I assume these are compressed and I don't know
how to uncompress them (or even get a file listing).
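
One way to find out how large the initrd is once uncompressed is to stream it through a decompressor and count the bytes. The sketch below assumes the initrd is a gzip-compressed image, which the "Compressed image found" line suggests but does not prove; the function name is hypothetical.

```python
import gzip


def uncompressed_size_kib(path: str, chunk_bytes: int = 1 << 20) -> int:
    """Stream a gzip file and return its uncompressed size in KiB, rounded up."""
    total = 0
    with gzip.open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_bytes)
            if not chunk:
                break
            total += len(chunk)
    return -(-total // 1024)  # ceiling division
```

Calling uncompressed_size_kib() on the initrd file under the tftpboot directory gives a figure to compare against ramdisk_size. Getting a file listing would additionally require mounting the decompressed image, which is not shown here.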

Cheers,

Steve.

PS: Is there a simple way to wipe the state data for
OSCAR, e.g. which images have been built, which
clients have which images? Sort of like the start_over
script but which is designed to take the setup back to
step 4 (i.e. just after oscar packages have been
built).

--- Bernard Li <[EMAIL PROTECTED]> wrote:

> Hi Steve:

> Actually it doesn't really matter whether you use
> ide.disk or scsi.disk, they will both be treated the
> same as SystemImager knows how to automatically
> detect what type of disks you have, so in a sense it
> is only used as a template.  I think we should
> probably make this more clear so that users will not
> run into the same issue...  if you do not mind, can
> you please file a bug on this?
>
> http://svn.oscar.openclustergroup.org/trac/oscar/newticket

> Regarding the MONITOR_SERVER issue, it again doesn't
> matter whether it goes to the external or the
> internal address, your clients will be able to reach
> it nevertheless.  Again, perhaps we should change
> this so that it uses the internal IP, can you file
> another issue? :)

> It would help if you can post more log messages
> during imaging.  I'm not sure if you have tried
> enabling UYOK during "Setup Networking" step.  If
> your headnode's nic is different from your compute
> node's nic, then you'll need to modprobe the module
> on the headnode before hitting "Setup Network Boot".
>  Anyways, report back more with log messages then I
> can give you further details.

> Cheers,

> Bernard
>
> ________________________________
>
> From: [EMAIL PROTECTED] on
> behalf of Steven Blackburn
> Sent: Thu 20/07/2006 17:05
> To: [email protected]
> Subject: [Oscar-devel] Anyone had problems with sis
> and monitor_server?
>
>
>
> I am learning all the ways in how not to install a
> cluster, like leaving the ide.disk layout but the
> nodes have SATA. I am using r5189 on a "respun" Fedora
> Core 5 and having problems imaging the nodes.
> SystemImager is failing with "Network is unreachable"
> which, seeing as it grabbed SystemImager okay, appears
> to be MONITOR_SERVER being set to the external IP
> address of the head node (i.e. 192.... instead of
> 10....). Is this a common problem? or did I do
> something stoopid (more likely).
>
> I checked that I ran "./install_cluster eth1",
> although I did make a second image and delete some
> nodes in the same session - that is I didn't exit the
> wizard. Oh and I cancelled building one of the
> images.
>
> I will rebuild the head (again) this weekend and try
> to get a clean run through.
>
> Thanks,
>
> Steve.
>
>
> -------------------------------------------------------------------------
> Take Surveys. Earn Cash. Influence the Future of IT
> Join SourceForge.net's Techsay panel and you'll get the chance to share your
> opinions on IT & business topics through brief surveys -- and earn cash
> http://www.techsay.com/default.php?page=join.php&p=sourceforge&CID=DEVDEV
> _______________________________________________
> Oscar-devel mailing list
> [email protected]
> https://lists.sourceforge.net/lists/listinfo/oscar-devel
