Hi Richard,

Sorry for not having been clearer. What I need is the install log from the 
node. You can retrieve it by clicking on "Monitor Cluster Deployment" and 
starting the imaging. Once imaging has started, double-click on the node being 
deployed and you should get a cloned console of the node. When it is finished, 
you can use the menu to save the content and send it to me (you can change 
information specific to your site, like IP addresses or hostnames, if you don't 
want to disclose it). What is important is that I can see all the steps performed.
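
One way to hide site-specific addresses while keeping every line of the saved 
console log is sketched below. This is only an illustration: the function name 
mask_ips and the IP_n placeholder scheme are assumptions, not part of OSCAR or 
SystemImager. Each distinct IPv4 address maps to the same placeholder everywhere, 
so the sequence of steps stays readable.

```python
import re

# Dotted-quad IPv4 addresses such as 172.16.11.72. Digit lookarounds are used
# instead of \b so that names like imaging_complete_172.16.11.72 are masked too.
IPV4 = re.compile(r"(?<!\d)(?:\d{1,3}\.){3}\d{1,3}(?!\d)")


def mask_ips(text):
    """Replace each distinct IPv4 address with a stable placeholder.

    Returns the masked text and the address-to-placeholder mapping.
    """
    seen = {}

    def substitute(match):
        ip = match.group(0)
        if ip not in seen:
            # Hypothetical placeholder scheme: IP_1, IP_2, ...
            seen[ip] = "IP_%d" % (len(seen) + 1)
        return seen[ip]

    return IPV4.sub(substitute, text), seen
```

Hostnames could be handled the same way with a second pattern suited to the 
site's naming scheme.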

On the OSCAR side, I see no problem so far from your script, so I suspect 
something is wrong in rsyncd.conf or maybe in /var/lib/systemimager/scripts.

Aside from that, I rebuilt all OSCAR packages on CentOS 6.6 yesterday, but I'm 
pretty sure that it has no impact on your problem, unfortunately.

I'm trying to reproduce the problem on my new CentOS 6.6 VM right now. 
Hopefully it can be reproduced. I'm confident that it's a simple thing to fix.

Cheers.
--
   Olivier LAHAYE
   CEA DRT/LIST/DIR

________________________________________
From: Richard Young [richard.yo...@usq.edu.au]
Sent: Thursday, 13 November 2014 02:54
To: oscar-users@lists.sourceforge.net
Subject: Re: [Oscar-users] Problem imaging nodes

Olivier
Thanks for your reply. The installed OSCAR packages are below:
Apitest
Base
Blcr
C3
Ganglia
Jobmonarch
Maui
Mtaconfig
Munge
Naemon
Netbootmgr
Ntpconfig
Oda
Sc3
Sis
Switcher
Sync-files
Torque
Yume

Hopefully below is what you are after from the monitor console:

[INFO - mkdhcpconf] Loaded OSCAR configuration (at Network.pm:482)
[DB - mkdhcpconf] DB Query: SELECT rfc1918 FROM Networks WHERE 
base_ip='172.16.11.0';
[INFO - oscar_wizard] Checking lease file (/var/lib/dhcpd/dhcpd.leases).
[INFO - oscar_wizard] DHCP lease file ready.
[INFO - oscar_wizard] Setting service dhcp to on...
[INFO - oscar_wizard] Called getitem with dhcp_service and returning dhcpd
[INFO - oscar_wizard] dhcp is already on
[INFO - oscar_wizard] Performing restart on dhcp service.
[INFO - oscar_wizard] Called getitem with dhcp_service and returning dhcpd
[INFO - oscar_wizard] About to run:  LC_ALL=C /sbin/service dhcpd restart
Shutting down dhcpd:                                       [  OK  ]
Starting dhcpd:                                            [  OK  ]
[INFO - oscar_wizard] DHCP service successfully set up for interface eth2.
[INFO - oscar_wizard] Loaded OSCAR configuration (at MAC.pm:952)
[INFO - oscar_wizard] Setup network boot (PXE)
[ACTION - oscar_wizard] About to run: /usr/bin/setup_pxe -v
[INFO - setup_pxe] Loaded OSCAR configuration (at Database.pm:981)
2014-11-13 9:14:27 [main :: Line 329] Checking arguments.
2014-11-13 9:14:27 [main :: Line 136] Restarting atftpd
[INFO - setup_pxe] Called getitem with tftp_dir and returning /tftpboot/
[INFO - setup_pxe] Performing restart on tftp socket service.
[INFO - setup_pxe] Performing restart on xinetd service.
Stopping xinetd:                                           [  OK  ]
Starting xinetd:                                           [  OK  ]
2014-11-13 9:14:28 [main :: Line 139] Enabling atftpd
[INFO - setup_pxe] Setting xinetd service tftp to on...
[INFO - setup_pxe] Performing restart on xinetd service.
Stopping xinetd:                                           [  OK  ]
Starting xinetd:                                           [  OK  ]
2014-11-13 9:14:28 [main :: Line 151] Creating directories.
2014-11-13 9:14:28 [main :: Line 202] Getting pxelinux.0.
2014-11-13 9:14:28 [main :: Line 206] Copying default pxelinux.cfg file
2014-11-13 9:14:28 [main :: Line 215] Updating /tftpboot/pxelinux.cfg//default 
file to skip local.cfg and support si_monitor.
2014-11-13 9:14:28 [main :: Line 234] Disabling nonexec mappings on x86_64
2014-11-13 9:14:28 [main :: Line 240] Copying SystemImager's message.txt to 
/tftpboot/pxelinux.cfg/
2014-11-13 9:14:28 [main :: Line 305] Copying SystemImager standard boot kernel 
and initrd.img to /tftpboot/
2014-11-13 9:14:29 [main :: Line 312] Symlinking SystemImager standard boot 
kernel and initrd.img to /tftpboot//kernel and /tftpboot//initrd.img 
respectively
[INFO - oscar_wizard] Successfully setup network boot (PXE).
------------------------ Step 8: Completed successfully ------------------------
[INFO - oscar_wizard] Called getitem with oscar_testing_path and returning 
/usr/lib/oscar/testing
[INFO - oscar_wizard] Called getitem with oscar_apitests_logdir and returning 
/var/log/oscar/apitests
[ACTION - oscar_wizard] About to run: LC_ALL=C /usr/bin/apitest -o 
/var/log/oscar/apitests -v -f apitests.d/before_monitor_deployment.apb
[INFO - oscar_wizard] Test before_monitor_deployment.apb succeeded.
[INFO - oscar_wizard] Ready to enter step "monitor_deployment"
[INFO - oscar_wizard] Performing start on monitor service.
[INFO - oscar_wizard] Called getitem with monitor_service and returning 
systemimager-server-monitord
[INFO - oscar_wizard] Performing status on monitor service.
[INFO - oscar_wizard] Called getitem with monitor_service and returning 
systemimager-server-monitord
[INFO - oscar_wizard] About to run:  LC_ALL=C /sbin/service 
systemimager-server-monitord status
Status of SystemImager's installation monitoring: si_monitor... running.
[INFO - oscar_wizard] About to run:  LC_ALL=C /sbin/service 
systemimager-server-monitord restart
Stopping SystemImager's installation monitoring: si_monitor... stopped.
Starting SystemImager's installation monitoring: si_monitor... ok.

Below is the output from the above attempt to install a node in 
/var/log/systemimager/rsyncd:

2014/11/13 09:36:06 [14172] connect from usqhpc12 (172.16.11.72)
2014/11/12 23:36:06 [14172] rsync on scripts/imaging_complete_172.16.11.72 from 
usqhpc12 (172.16.11.72)
2014/11/12 23:36:06 [14172] building file list
2014/11/12 23:36:06 [14172] rsync: link_stat "/imaging_complete_172.16.11.72" 
(in scripts) failed: No such file or directory (2)
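
To pull failures like the one above out of a longer rsyncd log, a small parser 
along these lines may help. It is a sketch that assumes each log entry is on a 
single line (unlike the wrapped copy in this email) and follows the layout seen 
here: date, time, PID in brackets, then the message. The function name 
find_failures is made up for illustration.

```python
import re

# Layout seen in the log above: "YYYY/MM/DD HH:MM:SS [pid] message"
LINE = re.compile(r"^(\d{4}/\d{2}/\d{2} \d{2}:\d{2}:\d{2}) \[(\d+)\] (.*)$")
# e.g. rsync: link_stat "/some/path" (in scripts) failed: reason (errno)
FAILURE = re.compile(r'^rsync: (\w+) "([^"]+)" \(in (\w+)\) failed: (.+)$')


def find_failures(log_text):
    """Return one dict per 'rsync: ... failed' entry found in log_text."""
    failures = []
    for line in log_text.splitlines():
        entry = LINE.match(line)
        if not entry:
            continue  # skip lines that do not match the assumed layout
        failure = FAILURE.match(entry.group(3))
        if failure:
            failures.append({
                "time": entry.group(1),
                "pid": int(entry.group(2)),
                "call": failure.group(1),    # e.g. link_stat, change_dir
                "path": failure.group(2),
                "module": failure.group(3),  # rsync module, e.g. scripts
                "reason": failure.group(4),
            })
    return failures
```

Grouping the results by module (scripts, overrides) shows quickly which rsync 
module the node was reading from when the request failed.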

Also below is the output from the above installation in /var/log/messages:

Nov 13 09:14:26 usqhpcadm dhcpd: Internet Systems Consortium DHCP Server 
4.1.1-P1
Nov 13 09:14:26 usqhpcadm dhcpd: Copyright 2004-2010 Internet Systems 
Consortium.
Nov 13 09:14:26 usqhpcadm dhcpd: All rights reserved.
Nov 13 09:14:26 usqhpcadm dhcpd: For info, please visit 
https://www.isc.org/software/dhcp/
Nov 13 09:14:26 usqhpcadm dhcpd: Not searching LDAP since ldap-server, 
ldap-port and ldap-base-dn were not specified in the config file
Nov 13 09:14:26 usqhpcadm dhcpd: Wrote 0 deleted host decls to leases file.
Nov 13 09:14:26 usqhpcadm dhcpd: Wrote 0 new dynamic host decls to leases file.
Nov 13 09:14:26 usqhpcadm dhcpd: Wrote 0 leases to leases file.
Nov 13 09:14:26 usqhpcadm dhcpd: Listening on 
LPF/eth2/00:23:8b:03:80:1f/172.16.11.0/24
Nov 13 09:14:26 usqhpcadm dhcpd: Sending on   
LPF/eth2/00:23:8b:03:80:1f/172.16.11.0/24
Nov 13 09:14:26 usqhpcadm dhcpd: Sending on   Socket/fallback/fallback-net
Nov 13 09:14:28 usqhpcadm xinetd[1969]: Exiting...
Nov 13 09:14:28 usqhpcadm xinetd[13944]: xinetd Version 2.3.14 started with 
libwrap loadavg labeled-networking options compiled in.
Nov 13 09:14:28 usqhpcadm xinetd[13944]: Started working: 1 available service
Nov 13 09:14:28 usqhpcadm xinetd[13944]: Starting reconfiguration
Nov 13 09:14:28 usqhpcadm xinetd[13944]: Swapping defaults
Nov 13 09:14:28 usqhpcadm xinetd[13944]: readjusting service tftp
Nov 13 09:14:28 usqhpcadm xinetd[13944]: Reconfigured: new=0 old=1 dropped=0 
(services)
Nov 13 09:14:28 usqhpcadm xinetd[13944]: Exiting...
Nov 13 09:14:28 usqhpcadm xinetd[13969]: xinetd Version 2.3.14 started with 
libwrap loadavg labeled-networking options compiled in.
Nov 13 09:14:28 usqhpcadm xinetd[13969]: Started working: 1 available service
Nov 13 09:33:08 usqhpcadm dhcpd: DHCPDISCOVER from 00:26:9e:0a:a7:03 via eth2
Nov 13 09:33:08 usqhpcadm dhcpd: DHCPOFFER on 172.16.11.72 to 00:26:9e:0a:a7:03 
via eth2
Nov 13 09:33:12 usqhpcadm dhcpd: DHCPREQUEST for 172.16.11.72 (172.16.11.27) 
from 00:26:9e:0a:a7:03 via eth2
Nov 13 09:33:12 usqhpcadm dhcpd: DHCPACK on 172.16.11.72 to 00:26:9e:0a:a7:03 
via eth2
Nov 13 09:33:12 usqhpcadm xinetd[13969]: START: tftp pid=14135 from=172.16.11.72
Nov 13 09:33:12 usqhpcadm atftpd[14135]: Advanced Trivial FTP server started 
(0.7.1)
Nov 13 09:33:12 usqhpcadm atftpd[14135]: Serving pxelinux.0 to 172.16.11.72:2070
Nov 13 09:33:12 usqhpcadm atftpd[14135]: Serving pxelinux.0 to 172.16.11.72:2071
Nov 13 09:33:12 usqhpcadm atftpd[14135]: Serving 
pxelinux.cfg/a984443b-6d7a-0010-91d8-00232bced6c0 to 172.16.11.72:49152
Nov 13 09:33:12 usqhpcadm atftpd[14135]: Serving 
pxelinux.cfg/01-00-26-9e-0a-a7-03 to 172.16.11.72:49153
Nov 13 09:33:12 usqhpcadm atftpd[14135]: Serving pxelinux.cfg/AC100B48 to 
172.16.11.72:49154
Nov 13 09:33:12 usqhpcadm atftpd[14135]: Serving pxelinux.cfg/AC100B4 to 
172.16.11.72:49155
Nov 13 09:33:12 usqhpcadm atftpd[14135]: Serving pxelinux.cfg/AC100B to 
172.16.11.72:49156
Nov 13 09:33:12 usqhpcadm atftpd[14135]: Serving pxelinux.cfg/AC100 to 
172.16.11.72:49157
Nov 13 09:33:12 usqhpcadm atftpd[14135]: Serving pxelinux.cfg/AC10 to 
172.16.11.72:49158
Nov 13 09:33:12 usqhpcadm atftpd[14135]: Serving pxelinux.cfg/AC1 to 
172.16.11.72:49159
Nov 13 09:33:12 usqhpcadm atftpd[14135]: Serving pxelinux.cfg/AC to 
172.16.11.72:49160
Nov 13 09:33:12 usqhpcadm atftpd[14135]: Serving pxelinux.cfg/A to 
172.16.11.72:49161
Nov 13 09:33:12 usqhpcadm atftpd[14135]: Serving pxelinux.cfg/default to 
172.16.11.72:49162
Nov 13 09:33:12 usqhpcadm atftpd[14135]: Serving message.txt to 
172.16.11.72:49163
Nov 13 09:33:15 usqhpcadm atftpd[14135]: Serving kernel to 172.16.11.72:49164
Nov 13 09:33:15 usqhpcadm atftpd[14135]: Serving initrd.img to 
172.16.11.72:49165
Nov 13 09:33:30 usqhpcadm dhcpd: DHCPDISCOVER from 00:26:9e:0a:a7:03 via eth2
Nov 13 09:33:30 usqhpcadm dhcpd: DHCPOFFER on 172.16.11.72 to 00:26:9e:0a:a7:03 
via eth2
Nov 13 09:33:30 usqhpcadm dhcpd: DHCPREQUEST for 172.16.11.72 (172.16.11.27) 
from 00:26:9e:0a:a7:03 via eth2
Nov 13 09:33:30 usqhpcadm dhcpd: DHCPACK on 172.16.11.72 to 00:26:9e:0a:a7:03 
via eth2
Nov 13 09:38:15 usqhpcadm atftpd[14135]: atftpd terminating after 300 seconds
Nov 13 09:38:15 usqhpcadm atftpd[14135]: Main thread exiting
Nov 13 09:38:15 usqhpcadm xinetd[13969]: EXIT: tftp status=0 pid=14135 
duration=303(sec)
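
The run of pxelinux.cfg/ requests in the atftpd lines above matches the order in 
which PXELINUX looks for a configuration file: the client UUID, then the MAC 
address prefixed with the ARP hardware type (01 for Ethernet), then the IP 
address in uppercase hex with one trailing digit dropped per attempt, and finally 
default. The sketch below reproduces that list for a given node; the function 
name is hypothetical and the UUID is passed in as-is rather than derived.

```python
def pxelinux_config_candidates(ip, mac, uuid=None):
    """List the config file names PXELINUX requests, in lookup order."""
    names = []
    if uuid:
        names.append(uuid)
    # "01-" is the ARP hardware type for Ethernet, followed by the MAC
    # in lowercase with dashes instead of colons.
    names.append("01-" + mac.lower().replace(":", "-"))
    # IPv4 address as 8 uppercase hex digits, e.g. 172.16.11.72 -> AC100B48.
    hex_ip = "".join("%02X" % int(octet) for octet in ip.split("."))
    # Drop one trailing hex digit per attempt: AC100B48, AC100B4, ..., A.
    for length in range(len(hex_ip), 0, -1):
        names.append(hex_ip[:length])
    names.append("default")
    return ["pxelinux.cfg/" + name for name in names]
```

For the node in this log, the last entry (pxelinux.cfg/default) is the file that 
setup_pxe updated, which is consistent with the kernel and initrd.img being 
served right afterwards.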

The output in /var/log/oscar/oscar_wizard.log is:
[DB - mkdhcpconf] querying ODA: Select Nodes.name From Nodes Where Nodes.id='31'
--------- SQL query: Select Nodes.name From Nodes Where Nodes.id='31'
 ---------
[DB - mkdhcpconf] Translated 31 to usqhpc30
[INFO - mkdhcpconf] Loaded OSCAR configuration (at Network.pm:482)
[DB - mkdhcpconf] DB Query: SELECT rfc1918 FROM Networks WHERE 
base_ip='172.16.11.0';
[INFO - oscar_wizard] Checking lease file (/var/lib/dhcpd/dhcpd.leases).
[INFO - oscar_wizard] DHCP lease file ready.
[INFO - oscar_wizard] Setting service dhcp to on...
[INFO - oscar_wizard] Called getitem with dhcp_service and returning dhcpd
[INFO - oscar_wizard] dhcp is already on
[INFO - oscar_wizard] Performing restart on dhcp service.
[INFO - oscar_wizard] Called getitem with dhcp_service and returning dhcpd
[INFO - oscar_wizard] About to run:  LC_ALL=C /sbin/service dhcpd restart
Shutting down dhcpd:                                       [  OK  ]
Starting dhcpd:                                            [  OK  ]
[INFO - oscar_wizard] DHCP service successfully set up for interface eth2.
[INFO - oscar_wizard] Loaded OSCAR configuration (at MAC.pm:952)
[INFO - oscar_wizard] Setup network boot (PXE)
[ACTION - oscar_wizard] About to run: /usr/bin/setup_pxe -v
[INFO - setup_pxe] Loaded OSCAR configuration (at Database.pm:981)
2014-11-13 9:14:27 [main :: Line 329] Checking arguments.
2014-11-13 9:14:27 [main :: Line 136] Restarting atftpd
[INFO - setup_pxe] Called getitem with tftp_dir and returning /tftpboot/
[INFO - setup_pxe] Performing restart on tftp socket service.
[INFO - setup_pxe] Performing restart on xinetd service.
Stopping xinetd:                                           [  OK  ]
Starting xinetd:                                           [  OK  ]
2014-11-13 9:14:28 [main :: Line 139] Enabling atftpd
[INFO - setup_pxe] Setting xinetd service tftp to on...
[INFO - setup_pxe] Performing restart on xinetd service.
Stopping xinetd:                                           [  OK  ]
Starting xinetd:                                           [  OK  ]
2014-11-13 9:14:28 [main :: Line 151] Creating directories.
2014-11-13 9:14:28 [main :: Line 202] Getting pxelinux.0.
2014-11-13 9:14:28 [main :: Line 206] Copying default pxelinux.cfg file
2014-11-13 9:14:28 [main :: Line 215] Updating /tftpboot/pxelinux.cfg//default 
file to skip local.cfg and support si_monitor.
2014-11-13 9:14:28 [main :: Line 234] Disabling nonexec mappings on x86_64
2014-11-13 9:14:28 [main :: Line 240] Copying SystemImager's message.txt to 
/tftpboot/pxelinux.cfg/
2014-11-13 9:14:28 [main :: Line 305] Copying SystemImager standard boot kernel 
and initrd.img to /tftpboot/
2014-11-13 9:14:29 [main :: Line 312] Symlinking SystemImager standard boot 
kernel and initrd.img to /tftpboot//kernel and /tftpboot//initrd.img 
respectively
[INFO - oscar_wizard] Successfully setup network boot (PXE).
------------------------ Step 8: Completed successfully ------------------------
[INFO - oscar_wizard] Called getitem with oscar_testing_path and returning 
/usr/lib/oscar/testing
[INFO - oscar_wizard] Called getitem with oscar_apitests_logdir and returning 
/var/log/oscar/apitests
[ACTION - oscar_wizard] About to run: LC_ALL=C /usr/bin/apitest -o 
/var/log/oscar/apitests -v -f apitests.d/before_monitor_deployment.apb
[INFO - oscar_wizard] Test before_monitor_deployment.apb succeeded.
[INFO - oscar_wizard] Ready to enter step "monitor_deployment"
[INFO - oscar_wizard] Performing start on monitor service.
[INFO - oscar_wizard] Called getitem with monitor_service and returning 
systemimager-server-monitord
[INFO - oscar_wizard] Performing status on monitor service.
[INFO - oscar_wizard] Called getitem with monitor_service and returning 
systemimager-server-monitord
[INFO - oscar_wizard] About to run:  LC_ALL=C /sbin/service 
systemimager-server-monitord status
Status of SystemImager's installation monitoring: si_monitor... running.
[INFO - oscar_wizard] About to run:  LC_ALL=C /sbin/service 
systemimager-server-monitord restart
Stopping SystemImager's installation monitoring: si_monitor... stopped.
Starting SystemImager's installation monitoring: si_monitor... ok.

From what I can see, the initial stage of re-partitioning the hard drive works 
but copying the image over doesn't.

Thanks
---------------------------------------------------------------------
Richard A. Young
ICT Services
Email: richard.yo...@usq.edu.au   Phone: (07) 46315557
Mob:   0437544370          Fax:   (07) 46312798
---------------------------------------------------------------------


-----Original Message-----
From: LAHAYE Olivier [mailto:olivier.lah...@cea.fr]
Sent: Wednesday, 12 November 2014 8:22 PM
To: oscar-users@lists.sourceforge.net
Subject: Re: [Oscar-users] Problem imaging nodes


Hi Richard,

The first error is normal; it means that no file specific to usqhpc10 has been 
found.

Could you post the full log from the deployment monitor? (You can hide 
details specific to your site by replacing IPs or hostnames with other ones, 
but please keep all the lines if possible.)
The second error indicates that something was KIA...

Also, tell me which OSCAR modules you chose.

Best regards,

Olivier

--
   Olivier LAHAYE
   CEA DRT/LIST/DIR

________________________________________
From: Richard Young [richard.yo...@usq.edu.au]
Sent: Wednesday, 12 November 2014 02:25
To: Oscar-User
Subject: [Oscar-users] Problem imaging nodes

Hello,
I have recently rebuilt our HPC using RHEL 6.6 and the latest (unstable) 
version of OSCAR, and am having some trouble imaging the nodes. After running 
through both the standard install guide and the RHEL quick guide, there seems 
to be a problem with the final stage of imaging the nodes. The oscar_wizard 
monitor says the installation is fine; however, when a node restarts you 
simply get a cursor on the screen, and basically nothing has been copied to 
disk. There don't seem to be any errors on the screen from DHCP or PXE; 
however, the systemimager/rsyncd logs contain the following errors:

rsync: change_dir "/usqhpc10" (in overrides) failed: No such file or directory 
(2)
rsync: link_stat "/imaging_complete_172.16.11.71" (in scripts) failed: No such 
file or directory (2)

Also, the screen sometimes says that rsyncd did not complete and not all files 
were copied. I have checked the FAQs and tips, and nothing covers this problem. 
Has anybody seen this before, and is there a solution?

Thanks

---------------------------------------------------------------------
Richard A. Young
ICT Services
HPC Support Officer
University of Southern Queensland
Toowoomba, Queensland 4350
Australia
Email: richard.yo...@usq.edu.au   Phone: (07) 46315557
Mob:   0437544370          Fax:   (07) 46312798
---------------------------------------------------------------------




_____________________________________________________________
This email (including any attached files) is confidential and is for the 
intended recipient(s) only. If you received this email by mistake, please, as a 
courtesy, tell the sender, then delete this email.

The views and opinions are the originator's and do not necessarily reflect 
those of the University of Southern Queensland. Although all reasonable 
precautions were taken to ensure that this email contained no viruses at the 
time it was sent we accept no liability for any losses arising from its receipt.

The University of Southern Queensland is a registered provider of education 
with the Australian Government.
(CRICOS Institution Code QLD 00244B / NSW 02225M, TEQSA PRV12081 )


------------------------------------------------------------------------------
Comprehensive Server Monitoring with Site24x7.
Monitor 10 servers for $9/Month.
Get alerted through email, SMS, voice calls or mobile push notifications.
Take corrective actions from your mobile device.
http://pubads.g.doubleclick.net/gampad/clk?id=154624111&iu=/4140/ostg.clktrk
_______________________________________________
Oscar-users mailing list
Oscar-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/oscar-users
