Title: Re: [Oscar-users] RE: 4.2.1b54423 : REPORT on Fedora Core 3 X86_64 and Ganglia is ON
Hi Jerome:
 
That oscarinstall.log looks fine - please re-test when b6 comes out and send along the log file if it happens again.
 
Thanks,
 
Bernard


From: Lefevre Jerome [mailto:[EMAIL PROTECTED]]
Sent: Tue 28/03/2006 19:51
To: Bernard Li; [email protected]
Subject: Re: [Oscar-users] RE: 4.2.1b54423 : REPORT on Fedora Core 3 X86_64 and Ganglia is ON

At 19:25 28/03/2006 -0800, Bernard Li wrote:


Sorry, but I didn't make a backup of oscar_install.log the first time,
before running start_over. Here is the oscar_install.log from my last
OSCAR installation.


>Hey Jerome:
>
>The apr package is listed in the "base" package's config.xml and the
>libaio package is listed in "lam" package's config.xml - these RPMs should
>be automatically installed by the system and you do not need to manually
>install them.
>
>Can you post your full oscarinstall.log so that we can find out what went
>wrong during the installation process?
>
>Cheers,
>
>Bernard
>
>
>----------
>From: Lefevre Jerome [mailto:[EMAIL PROTECTED]]
>Sent: Tue 28/03/2006 17:25
>To: Bernard Li; [email protected]
>Subject: 4.2.1b54423 : REPORT on Fedora Core 3 X86_64 and Ganglia is ON
>
>
>Cluster: 5 dual-Opteron Tyan S2885 nodes, 3ware SATA RAID
>Switch: 3Com gigabit
>OS: Fedora Core 3 x86_64, fresh install
>OSCAR 4.2.1b54423
>-------------
>29-march-2006
>
>
>
>OSCAR 4.2.1b54423
>
>------------------------------------------   INSTALLATION REPORT   ------------------------------------------
>
>Finally, my OSCAR cluster installation is up and running, but below you
>will find the errors I hit and my workarounds.
>
>Below is a summary:
>
>1 - STEP 3 (OSCAR server packages)    Failed dependencies  (Fedora Core 3 specific?)
>2 - STEP 3 (OSCAR server packages)    eth0 not up          (not OSCAR-related)
>3 - STEP 4 (Build OSCAR client image) Forced packages for i686/i386 failed (some dependencies are UNKNOWN)
>4 - STEP 6 (CLIENT Installation)      rsync hangs          (pfilter must be off before network deployment)
>5 - STEP 8 (TEST CLUSTER SETUP)       Ganglia test failed  (cause: gmond.conf defaults to multicast; it should be unicast on my cluster and switch)
>
>
>Below are the OSCAR log output and my workarounds:
>
>Many thanks, and enjoy OSCAR!
>
>
>
>./install_cluster eth0
>   ...
>---------------------------------------------------------------------------------------
>1 - STEP 3 (OSCAR server packages)    Failed dependencies  (Fedora Core 3 specific?)
>---------------------------------------------------------------------------------------
>--> Returning oscar_server packages for apitest: elementtree Twisted apitest-profiled apitest
>--> Returning oscar_server packages for perl-Qt: perl-Qt
>--> Returning oscar_server packages for sync_files: crontabs sync_files
>--> Installing server core RPMs
>warning: /tftpboot/rpm/apr-util-0.9.4-17.x86_64.rpm: V3 DSA signature: NOKEY, key ID 4f2a6fd2
>error: Failed dependencies:
>          libapr-0.so.0()(64bit) is needed by apr-util-0.9.4-17.x86_64
>          libapr-0.so.0()(64bit) is needed by httpd-2.0.52-3.x86_64
>$pm->install ($dm->query_required_by ()) failed at ./wizard_prep line 312
>Couldn't install packages needed for OSCAR Wizard to run at ./wizard_prep line 312
>Oscar Wizard preparation script failed to complete at ./install_cluster line 212.
>
>
>WORKAROUND:
>In a shell, I ran:  rpm -U /home/tftpboot/rpm/apr-0.9.4-23.x86_64.rpm
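These "Failed dependencies" blocks can be summarised mechanically. Below is a minimal Python sketch, assuming rpm's output has the shape shown in the logs here (one "X is needed by Y" line per missing capability). The function name missing_deps is hypothetical, not part of OSCAR, and it only parses text; it does not query the RPM database.

```python
import re

# Matches rpm lines such as:
#   libapr-0.so.0()(64bit) is needed by apr-util-0.9.4-17.x86_64
NEEDED_RE = re.compile(r"^\s*(\S+) is needed by (\S+)\s*$")


def missing_deps(rpm_output: str) -> dict[str, list[str]]:
    """Map each missing capability to the packages that require it."""
    deps: dict[str, list[str]] = {}
    in_block = False
    for line in rpm_output.splitlines():
        if line.startswith("error: Failed dependencies:"):
            in_block = True
            continue
        if in_block:
            m = NEEDED_RE.match(line)
            if not m:
                in_block = False  # the block ends at the first non-matching line
                continue
            capability, needed_by = m.groups()
            deps.setdefault(capability, []).append(needed_by)
    return deps


if __name__ == "__main__":
    sample = """error: Failed dependencies:
          libapr-0.so.0()(64bit) is needed by apr-util-0.9.4-17.x86_64
          libapr-0.so.0()(64bit) is needed by httpd-2.0.52-3.x86_64
$pm->install ($dm->query_required_by ()) failed at ./wizard_prep line 312"""
    for cap, pkgs in missing_deps(sample).items():
        print(cap, "<-", ", ".join(pkgs))
```

Finding which RPM actually provides a capability such as libapr-0.so.0 (here, apr) is still a separate step, for example with rpm -qp --provides on the candidate package file.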
>
>
>
>./install_cluster eth0
>...
>---------------------------------------------------------------------------------------
>1 - STEP 3 (OSCAR server packages)    Failed dependencies  (Fedora Core 3 specific?)
>---------------------------------------------------------------------------------------
>...
>--> Returning oscar_server packages for lam: lam-switcher-modulefile lam-oscar-modulefile lam-oscar libaio-devel libaio
>--> Installing server non-core RPMs (core RPMs already installed)
>warning: /tftpboot/rpm/libaio-devel-0.3.102-1.x86_64.rpm: V3 DSA signature: NOKEY, key ID 4f2a6fd2
>error: Failed dependencies:
>          libaio.so.1.0.0()(64bit) is needed by lam-oscar-7.0.6-3.x86_64
>$pm->install ($dm->query_required_by ()) failed at ./install_server line 73
>Couldn't install the required packages needed for OSCAR at ./install_server line 73
>--> Step 3: Failed to properly install OSCAR server; please check the logs
>
>WORKAROUND:
>In a shell, I ran:  rpm -U /home/tftpboot/rpm/libaio-0.3.102-1.x86_64.rpm
>
>
>
>---------------------------------------------------------------------------------------
>2 - STEP 3 (OSCAR server packages)    eth0 not up          (not OSCAR-related)
>---------------------------------------------------------------------------------------
>...
>Ganglia page is located at http://localhost/ganglia
>--> Successfully ran server non-core package post_server_install scripts
>--> Getting internal IP address
>Cannot update hosts without a valid ip.
>   at ./install_server line 194
>          main::update_hosts('undef') called at ./install_server line 154
>--> Got: [IP ]
>--> Got: [broadcast ]
>--> Got: [netmask ]
>--> Adding hosts to /etc/hosts
>--> Step 3: Failed to properly install OSCAR server; please check the logs
>--> Update Wizard Env (as needed)
>Update environment: ENV{MANPATH}
>Update environment: ENV{PVM_RSH}
>Update environment: ENV{PVM_ROOT}
>Update environment: ENV{PVM_ARCH}
>Update environment: ENV{PATH}
>Update environment: ENV{_LMFILES_}
>Update environment: ENV{LOADEDMODULES}
>
>WORKAROUND: eth0 was not up and ifcfg-eth0 was wrong.
>I edited ifcfg-eth0 by hand:
>
>DEVICE=eth0
>BOOTPROTO=static
>TYPE=Ethernet
>IPADDR=192.168.150.50
>GATEWAY=192.168.150.253
>BROADCAST=192.168.150.255
>NETMASK=255.255.255.0
>NETWORK=192.168.150.0
>HWADDR=00:0e:0c:60:84:4f
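A hand-edited ifcfg file like this can be sanity-checked: NETWORK and BROADCAST follow from IPADDR and NETMASK, and GATEWAY should lie inside that subnet. A minimal Python sketch using the standard ipaddress module; parse_ifcfg and check_ifcfg are hypothetical helpers, and they only check internal consistency, not whether the addresses suit your LAN or whether eth0 is actually up.

```python
import ipaddress

# The values entered by hand in ifcfg-eth0 (copied from the message above).
IFCFG = """DEVICE=eth0
BOOTPROTO=static
TYPE=Ethernet
IPADDR=192.168.150.50
GATEWAY=192.168.150.253
BROADCAST=192.168.150.255
NETMASK=255.255.255.0
NETWORK=192.168.150.0
HWADDR=00:0e:0c:60:84:4f
"""


def parse_ifcfg(text: str) -> dict[str, str]:
    """Parse simple KEY=value lines, ignoring blanks and comments."""
    cfg = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        cfg[key.strip()] = value.strip().strip('"')
    return cfg


def check_ifcfg(cfg: dict[str, str]) -> list[str]:
    """List inconsistencies between IPADDR/NETMASK and the other fields."""
    net = ipaddress.IPv4Interface(f"{cfg['IPADDR']}/{cfg['NETMASK']}").network
    problems = []
    if "NETWORK" in cfg and ipaddress.IPv4Address(cfg["NETWORK"]) != net.network_address:
        problems.append(f"NETWORK should be {net.network_address}")
    if "BROADCAST" in cfg and ipaddress.IPv4Address(cfg["BROADCAST"]) != net.broadcast_address:
        problems.append(f"BROADCAST should be {net.broadcast_address}")
    if "GATEWAY" in cfg and ipaddress.IPv4Address(cfg["GATEWAY"]) not in net:
        problems.append(f"GATEWAY {cfg['GATEWAY']} is outside {net}")
    return problems


if __name__ == "__main__":
    print(check_ifcfg(parse_ifcfg(IFCFG)) or "ifcfg-eth0 looks consistent")
```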
>
>
>
>---------------------------------------------------------------------------------------
>3 - STEP 4 (Build OSCAR client image) Forced packages for i686/i386 failed (some dependencies are UNKNOWN)
>---------------------------------------------------------------------------------------
>...
><== OK  gpm     /tftpboot/rpm/gpm-1.20.1-66.x86_64.rpm  -ihv --nodeps
><== OK  authconfig      /tftpboot/rpm/authconfig-4.6.5-3.1.x86_64.rpm   -ihv --nodeps
><== OK  glibc-headers   /tftpboot/rpm/glibc-headers-2.3.3-74.x86_64.rpm -ihv --nodeps
><== OK  sysklogd        /tftpboot/rpm/sysklogd-1.4.1-22.x86_64.rpm      -ihv --nodeps
><== OK  torque-mom      /tftpboot/rpm/torque-mom-1.2.0p5-2.x86_64.rpm   -ihv --nodeps
><== OK  module-init-tools /tftpboot/rpm/module-init-tools-3.1-0.pre5.3.x86_64.rpm -ihv --nodeps
><== OK  cpio    /tftpboot/rpm/cpio-2.5-7.x86_64.rpm     -ihv --nodeps
><== t=3s
><== $? 0
>4: Forced packages for i686: glibc
>==> /usr/bin/update-rpms '--root=none' '--cache=u' '--check' '--arch' 'i686' 'glibc'
><== NG  glibc   /tftpboot/rpm/glibc-2.3.3-74.i686.rpm   requires basesystem (UNKNOWN)   requires glibc-common = 2.3.3-74 (UNKNOWN)       requires libgcc (UNKNOWN)
><== t=1s
><== $? 1
>5: Forced packages for i386: tcl libstdc++ freetype fontconfig glibc-devel xorg-x11-libs expat ncurses xorg-x11-Mesa-libGL zlib gpm libgcc
>==> /usr/bin/update-rpms '--root=none' '--cache=u' '--check' '--arch' 'i386' 'tcl' 'libstdc++' 'freetype' 'fontconfig' 'glibc-devel' 'xorg-x11-libs' 'expat' 'ncurses' 'xorg-x11-Mesa-libGL' 'zlib' 'gpm' 'libgcc'
><== OK  libgcc  /tftpboot/rpm/libgcc-3.4.2-6.fc3.i386.rpm       -ihv
><== NG  expat   /tftpboot/rpm/expat-1.95.7-4.i386.rpm   requires libc.so.6(GLIBC_2.1) (UNKNOWN) requires libc.so.6(GLIBC_2.0) (UNKNOWN)
>
>
>WORKAROUND:
>I enabled debug mode with "export DEBUG_UPDATE_RPMS=1".
>Very strange: I saw no more trouble after running "start_over" and then
>"install_cluster eth0" from a fresh login.
>But this is not reproducible!
>Now I always get the UNKNOWN basesystem error. Is this package in
>tftpboot/rpm? Yes.
>
>Before step 4 ("Build OSCAR Client Image"), in
>/usr/lib/systeminstaller/SystemInstaller/Package/UpdateRPMs.pm,
>I added after line 99:
>
>                for my $farch (keys %{$forced}) {
>                 $cmd = "update-rpms --root=none --cache=u --list ";
>
>By default the command uses the --check option, as it does for the x86_64
>RPMs; I changed it to --list for the forced i386 and i686 packages.
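For comparison, here are the command lines before and after this edit, as a Python sketch modelled on the update-rpms call echoed in the step 4 log. The real logic is Perl inside UpdateRPMs.pm; update_rpms_argv is a hypothetical helper, and it assumes --arch and the package names are appended after the mode flag exactly as in the logged --check call. What --list does inside update-rpms is not covered here.

```python
import shlex


def update_rpms_argv(arch: str, packages: list[str], forced: bool) -> list[str]:
    """Build an update-rpms command line for one architecture (illustration only).

    Option order follows the command echoed in the OSCAR log; the only
    difference is the mode flag: --check (stock) vs --list (edited, forced arches).
    """
    mode = "--list" if forced else "--check"
    return ["update-rpms", "--root=none", "--cache=u", mode, "--arch", arch, *packages]


if __name__ == "__main__":
    for forced in (False, True):
        print(shlex.join(update_rpms_argv("i686", ["glibc"], forced)))
```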
>
>
>
>---------------------------------------------------------------------------------------
>4 - STEP 6 (CLIENT Installation)      rsync hangs          (pfilter must be off before network deployment)
>---------------------------------------------------------------------------------------
>   rsync hangs?
>
>I checked /var/log/systemimager/rsyncd:
>
>[EMAIL PROTECTED] ~]# cat /var/log/systemimager/rsyncd
>2006/03/28 04:21:09 [9409] rsyncd version 2.6.3 starting, listening on port 873
>2006/03/28 04:36:24 [12692] rsync on boot/x86_64/standard/boel_binaries.tar.gz from node1.cluster.ird.nc (192.168.150.10)
>2006/03/27 17:36:24 [12692] wrote 5003073 bytes  read 122 bytes  total size 5002338
>2006/03/28 04:36:25 [12693] rsync on scripts/ from node1.cluster.ird.nc (192.168.150.10)
>2006/03/27 17:36:25 [12693] wrote 22186 bytes  read 191 bytes  total size 21567
>2006/03/28 04:37:12 [12698] rsync on editr-bunch1 from node1.cluster.ird.nc (192.168.150.10)
>2006/03/28 04:37:13 [12698] wrote 1057372 bytes  read 81 bytes  total size 899587892
>2006/03/28 04:37:13 [12700] rsync on editr-bunch1/ from node1.cluster.ird.nc (192.168.150.10)
>2006/03/28 04:47:17 [12700] rsync error: timeout in data send/receive (code 30) at io.c(153)
>2006/03/28 04:48:17 [12700] rsync error: timeout in data send/receive (code 30) at io.c(153)
>2006/03/28 04:49:17 [12700] rsync error: timeout in data send/receive (code 30) at io.c(153)
>2006/03/28 04:50:17 [12700] rsync error: timeout in data send/receive (code 30) at io.c(153)
>2006/03/28 04:51:17 [12700] rsync error: timeout in data send/receive (code 30) at io.c(153)
>2006/03/28 04:52:17 [12700] rsync error: timeout in data send/receive (code 30) at io.c(153)
>2006/03/28 04:52:43 [12700] rsync: writefd_unbuffered failed to write 69 bytes: phase "unknown" [sender]: Connection timed out (110)
>2006/03/28 04:52:43 [12700] rsync error: error in rsync protocol data stream (code 12) at io.c(909)
>
>WORKAROUND:
>After some searching in the oscar-users archives, I stopped the pfilter
>service before cluster deployment, because the firewall interferes with
>the deployment.
>
>
>---------------------------------------------------------------------------------------
>5 - STEP 8 (TEST CLUSTER SETUP)       Ganglia test failed  (cause: gmond.conf defaults to multicast)
>---------------------------------------------------------------------------------------
>Ganglia setup test failed?
>
>I checked ganglia.err:
>
>[EMAIL PROTECTED] ganglia]# cat ganglia.err
>Client nodes: node1.cluster.ird.nc node2.cluster.ird.nc node3.cluster.ird.nc node4.cluster.ird.nc
>Match pattern:
>editr.cluster.ird.nc|node1.cluster.ird.nc|node2.cluster.ird.nc|node3.cluster.ird.nc|node4.cluster.ird.nc
>Number of hosts matched: 1
>Gstat output:
>CLUSTER INFORMATION
>         Name: EDITR Cluster
>        Hosts: 1
>Gexec Hosts: 0
>   Dead Hosts: 0
>    Localtime: Tue Mar 28 19:54:16 2006
>
>CLUSTER HOSTS
>Hostname                     LOAD                       CPU              Gexec
>   CPUs (Procs/Total) [     1,     5, 15min] [  User,  Nice, System, Idle]
>
>editr.cluster.ird.nc
>      2 (    0/  175) [  0.10,  0.07,  0.02] [   2.0,   0.0,   0.9,  96.9] OFF
>
>The number of nodes expected is different from the number of nodes detected.
>Check to see if gmond is running on all your nodes and make sure that you
>are not having any network issues.
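The check behind this message can be approximated: join the head node and client names into one pattern (the "Match pattern" line), see which of them appear as host lines in the gstat output, and compare with the list. This is a hypothetical re-creation for illustration, not the OSCAR test script itself; it assumes each hostname sits on a line of its own, as in the gstat output above.

```python
import re


def matched_hosts(gstat_output: str, hosts: list[str]) -> set[str]:
    """Return the names from `hosts` that appear as host lines in gstat output.

    Dots are escaped so that "." only matches a literal dot.
    """
    pattern = re.compile("^(" + "|".join(re.escape(h) for h in hosts) + ")$")
    found = set()
    for line in gstat_output.splitlines():
        m = pattern.match(line.strip())
        if m:
            found.add(m.group(1))
    return found
```

With the output above and the five names from the match pattern, only editr.cluster.ird.nc is found; the four compute nodes are the ones gstat does not see.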
>
>
>WORKAROUND:
>After some searching in the ganglia-general archives, I commented out all
>the multicast entries in gmond.conf on my compute nodes and master node,
>and added the master node IP, like this:
>
>udp_send_channel {
>  # mcast_join = 239.2.11.71
>  host = 192.168.150.50
>  port = 8649
>}
>
>udp_recv_channel {
>  # mcast_join = 239.2.11.71
>  port = 8649
>  # bind = 239.2.11.71
>}
>
>
>------------------------------------------------------------------------------------------------
>
