Hi Richard,

Sorry for not having been clearer. What I need is the install log from the node. You can retrieve it by clicking on "Monitor Cluster Deployment" and starting the imaging. Once imaging has started, double-click on the node being deployed and you should get a cloned console of the node. When it is finished, you can use the menu to save the content and send it to me (you can change information specific to your site, such as IP addresses or hostnames, if you don't want to disclose them). What is important is that I can see all the steps that were performed.
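If editing the saved log by hand is tedious, something along these lines could do the substitution while keeping every line in place. This is only a sketch: the function name `redact`, the `10.0.0.N` placeholder addresses and the `nodeNNN` placeholder names are illustrative choices, not part of OSCAR or SystemImager, and the hostnames to hide have to be supplied by hand.

```python
import re

# Dotted-quad IPv4 address not embedded in a longer run of digits.
# Lookarounds are used instead of \b so that addresses following an
# underscore (e.g. imaging_complete_172.16.11.72) are matched as well.
IP_RE = re.compile(r"(?<!\d)(?:\d{1,3}\.){3}\d{1,3}(?!\d)")


def redact(text, hostnames=()):
    """Replace each distinct IP and each listed hostname with a stable placeholder.

    The same input value always maps to the same placeholder, so the
    sequence of events in the log stays readable.
    """
    ip_map = {}

    def swap_ip(match):
        ip = match.group(0)
        if ip not in ip_map:
            # Hypothetical placeholder range; fine for a few hundred addresses.
            ip_map[ip] = "10.0.0.%d" % (len(ip_map) + 1)
        return ip_map[ip]

    text = IP_RE.sub(swap_ip, text)

    # Longest names first, so a short name never clobbers part of a longer one.
    ordered = sorted(hostnames, key=len, reverse=True)
    for index, name in enumerate(ordered, start=1):
        text = re.sub(r"\b%s\b" % re.escape(name), "node%03d" % index, text)
    return text


if __name__ == "__main__":
    sample = "connect from usqhpc12 (172.16.11.72)"
    print(redact(sample, hostnames=["usqhpc12"]))
```

MAC addresses are left untouched by this sketch; add a similar pattern if those should be hidden too.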

On the OSCAR side, I see no problem so far from your script, so I suspect something is wrong in rsyncd.conf or maybe in /var/lib/systemimager/scripts.

Aside from that, I rebuilt all the OSCAR packages on CentOS-6.6 yesterday, but unfortunately I'm pretty sure that it has no impact on your problem. I'm trying to reproduce the problem on my new CentOS-6.6 VM right now. Hopefully it can be reproduced. I'm confident that it's a simple thing to fix.

Cheers.

--
Olivier LAHAYE
CEA DRT/LIST/DIR
________________________________________
From: Richard Young [richard.yo...@usq.edu.au]
Sent: Thursday, 13 November 2014 02:54
To: oscar-users@lists.sourceforge.net
Subject: Re: [Oscar-users] Problem imaging nodes

Olivier

Thanks for your reply.

The installed Oscar packages are below:

Apitest
Base
Blcr
C3
Ganglia
Jobmonarch
Maui
Mtaconfig
Munge
Naemon
Netbootmgr
Ntpconfig
Oda
Sc3
Sis
Switcher
Sync-files
Torque
Yume

Hopefully below is what you are after from the monitor console:

[INFO - mkdhcpconf] Loaded OSCAR configuration (at Network.pm:482)
[DB - mkdhcpconf] DB Query: SELECT rfc1918 FROM Networks WHERE base_ip='172.16.11.0';
[INFO - oscar_wizard] Checking lease file (/var/lib/dhcpd/dhcpd.leases).
[INFO - oscar_wizard] DHCP lease file ready.
[INFO - oscar_wizard] Setting service dhcp to on...
[INFO - oscar_wizard] Called getitem with dhcp_service and returning dhcpd
[INFO - oscar_wizard] dhcp is already on
[INFO - oscar_wizard] Performing restart on dhcp service.
[INFO - oscar_wizard] Called getitem with dhcp_service and returning dhcpd
[INFO - oscar_wizard] About to run: LC_ALL=C /sbin/service dhcpd restart
Shutting down dhcpd: [ OK ]
Starting dhcpd: [ OK ]
[INFO - oscar_wizard] DHCP service successfully set up for interface eth2.
[INFO - oscar_wizard] Loaded OSCAR configuration (at MAC.pm:952)
[INFO - oscar_wizard] Setup network boot (PXE)
[ACTION - oscar_wizard] About to run: /usr/bin/setup_pxe -v
[INFO - setup_pxe] Loaded OSCAR configuration (at Database.pm:981)
2014-11-13 9:14:27 [main :: Line 329] Checking arguments.
2014-11-13 9:14:27 [main :: Line 136] Restarting atftpd
[INFO - setup_pxe] Called getitem with tftp_dir and returning /tftpboot/
[INFO - setup_pxe] Performing restart on tftp socket service.
[INFO - setup_pxe] Performing restart on xinetd service.
Stopping xinetd: [ OK ]
Starting xinetd: [ OK ]
2014-11-13 9:14:28 [main :: Line 139] Enabling atftpd
[INFO - setup_pxe] Setting xinetd service tftp to on...
[INFO - setup_pxe] Performing restart on xinetd service.
Stopping xinetd: [ OK ]
Starting xinetd: [ OK ]
2014-11-13 9:14:28 [main :: Line 151] Creating directories.
2014-11-13 9:14:28 [main :: Line 202] Getting pxelinux.0.
2014-11-13 9:14:28 [main :: Line 206] Copying default pxelinux.cfg file
2014-11-13 9:14:28 [main :: Line 215] Updating /tftpboot/pxelinux.cfg//default file to skip local.cfg and support si_monitor.
2014-11-13 9:14:28 [main :: Line 234] Disabling nonexec mappings on x86_64
2014-11-13 9:14:28 [main :: Line 240] Copying SystemImager's message.txt to /tftpboot/pxelinux.cfg/
2014-11-13 9:14:28 [main :: Line 305] Copying SystemImager standard boot kernel and initrd.img to /tftpboot/
2014-11-13 9:14:29 [main :: Line 312] Symlinking SystemImager standard boot kernel and initrd.img to /tftpboot//kernel and /tftpboot//initrd.img respectively
[INFO - oscar_wizard] Successfully setup network boot (PXE).
------------------------ Step 8: Completed successfully ------------------------
[INFO - oscar_wizard] Called getitem with oscar_testing_path and returning /usr/lib/oscar/testing
[INFO - oscar_wizard] Called getitem with oscar_apitests_logdir and returning /var/log/oscar/apitests
[ACTION - oscar_wizard] About to run: LC_ALL=C /usr/bin/apitest -o /var/log/oscar/apitests -v -f apitests.d/before_monitor_deployment.apb
[INFO - oscar_wizard] Test before_monitor_deployment.apb succeeded.
[INFO - oscar_wizard] Ready to enter step "monitor_deployment"
[INFO - oscar_wizard] Performing start on monitor service.
[INFO - oscar_wizard] Called getitem with monitor_service and returning systemimager-server-monitord
[INFO - oscar_wizard] Performing status on monitor service.
[INFO - oscar_wizard] Called getitem with monitor_service and returning systemimager-server-monitord
[INFO - oscar_wizard] About to run: LC_ALL=C /sbin/service systemimager-server-monitord status
Status of SystemImager's installation monitoring: si_monitor... running.
[INFO - oscar_wizard] About to run: LC_ALL=C /sbin/service systemimager-server-monitord restart
Stopping SystemImager's installation monitoring: si_monitor... stopped.
Starting SystemImager's installation monitoring: si_monitor... ok.

Below is the output from the above attempt to install a node in /var/log/systemimager/rsyncd:

2014/11/13 09:36:06 [14172] connect from usqhpc12 (172.16.11.72)
2014/11/12 23:36:06 [14172] rsync on scripts/imaging_complete_172.16.11.72 from usqhpc12 (172.16.11.72)
2014/11/12 23:36:06 [14172] building file list
2014/11/12 23:36:06 [14172] rsync: link_stat "/imaging_complete_172.16.11.72" (in scripts) failed: No such file or directory (2)

Also below is the output from the above installation in /var/log/messages:

Nov 13 09:14:26 usqhpcadm dhcpd: Internet Systems Consortium DHCP Server 4.1.1-P1
Nov 13 09:14:26 usqhpcadm dhcpd: Copyright 2004-2010 Internet Systems Consortium.
Nov 13 09:14:26 usqhpcadm dhcpd: All rights reserved.
Nov 13 09:14:26 usqhpcadm dhcpd: For info, please visit https://www.isc.org/software/dhcp/
Nov 13 09:14:26 usqhpcadm dhcpd: Not searching LDAP since ldap-server, ldap-port and ldap-base-dn were not specified in the config file
Nov 13 09:14:26 usqhpcadm dhcpd: Wrote 0 deleted host decls to leases file.
Nov 13 09:14:26 usqhpcadm dhcpd: Wrote 0 new dynamic host decls to leases file.
Nov 13 09:14:26 usqhpcadm dhcpd: Wrote 0 leases to leases file.
Nov 13 09:14:26 usqhpcadm dhcpd: Listening on LPF/eth2/00:23:8b:03:80:1f/172.16.11.0/24
Nov 13 09:14:26 usqhpcadm dhcpd: Sending on LPF/eth2/00:23:8b:03:80:1f/172.16.11.0/24
Nov 13 09:14:26 usqhpcadm dhcpd: Sending on Socket/fallback/fallback-net
Nov 13 09:14:28 usqhpcadm xinetd[1969]: Exiting...
Nov 13 09:14:28 usqhpcadm xinetd[13944]: xinetd Version 2.3.14 started with libwrap loadavg labeled-networking options compiled in.
Nov 13 09:14:28 usqhpcadm xinetd[13944]: Started working: 1 available service
Nov 13 09:14:28 usqhpcadm xinetd[13944]: Starting reconfiguration
Nov 13 09:14:28 usqhpcadm xinetd[13944]: Swapping defaults
Nov 13 09:14:28 usqhpcadm xinetd[13944]: readjusting service tftp
Nov 13 09:14:28 usqhpcadm xinetd[13944]: Reconfigured: new=0 old=1 dropped=0 (services)
Nov 13 09:14:28 usqhpcadm xinetd[13944]: Exiting...
Nov 13 09:14:28 usqhpcadm xinetd[13969]: xinetd Version 2.3.14 started with libwrap loadavg labeled-networking options compiled in.
Nov 13 09:14:28 usqhpcadm xinetd[13969]: Started working: 1 available service
Nov 13 09:33:08 usqhpcadm dhcpd: DHCPDISCOVER from 00:26:9e:0a:a7:03 via eth2
Nov 13 09:33:08 usqhpcadm dhcpd: DHCPOFFER on 172.16.11.72 to 00:26:9e:0a:a7:03 via eth2
Nov 13 09:33:12 usqhpcadm dhcpd: DHCPREQUEST for 172.16.11.72 (172.16.11.27) from 00:26:9e:0a:a7:03 via eth2
Nov 13 09:33:12 usqhpcadm dhcpd: DHCPACK on 172.16.11.72 to 00:26:9e:0a:a7:03 via eth2
Nov 13 09:33:12 usqhpcadm xinetd[13969]: START: tftp pid=14135 from=172.16.11.72
Nov 13 09:33:12 usqhpcadm atftpd[14135]: Advanced Trivial FTP server started (0.7.1)
Nov 13 09:33:12 usqhpcadm atftpd[14135]: Serving pxelinux.0 to 172.16.11.72:2070
Nov 13 09:33:12 usqhpcadm atftpd[14135]: Serving pxelinux.0 to 172.16.11.72:2071
Nov 13 09:33:12 usqhpcadm atftpd[14135]: Serving pxelinux.cfg/a984443b-6d7a-0010-91d8-00232bced6c0 to 172.16.11.72:49152
Nov 13 09:33:12 usqhpcadm atftpd[14135]: Serving pxelinux.cfg/01-00-26-9e-0a-a7-03 to 172.16.11.72:49153
Nov 13 09:33:12 usqhpcadm atftpd[14135]: Serving pxelinux.cfg/AC100B48 to 172.16.11.72:49154
Nov 13 09:33:12 usqhpcadm atftpd[14135]: Serving pxelinux.cfg/AC100B4 to 172.16.11.72:49155
Nov 13 09:33:12 usqhpcadm atftpd[14135]: Serving pxelinux.cfg/AC100B to 172.16.11.72:49156
Nov 13 09:33:12 usqhpcadm atftpd[14135]: Serving pxelinux.cfg/AC100 to 172.16.11.72:49157
Nov 13 09:33:12 usqhpcadm atftpd[14135]: Serving pxelinux.cfg/AC10 to 172.16.11.72:49158
Nov 13 09:33:12 usqhpcadm atftpd[14135]: Serving pxelinux.cfg/AC1 to 172.16.11.72:49159
Nov 13 09:33:12 usqhpcadm atftpd[14135]: Serving pxelinux.cfg/AC to 172.16.11.72:49160
Nov 13 09:33:12 usqhpcadm atftpd[14135]: Serving pxelinux.cfg/A to 172.16.11.72:49161
Nov 13 09:33:12 usqhpcadm atftpd[14135]: Serving pxelinux.cfg/default to 172.16.11.72:49162
Nov 13 09:33:12 usqhpcadm atftpd[14135]: Serving message.txt to 172.16.11.72:49163
Nov 13 09:33:15 usqhpcadm atftpd[14135]: Serving kernel to 172.16.11.72:49164
Nov 13 09:33:15 usqhpcadm atftpd[14135]: Serving initrd.img to 172.16.11.72:49165
Nov 13 09:33:30 usqhpcadm dhcpd: DHCPDISCOVER from 00:26:9e:0a:a7:03 via eth2
Nov 13 09:33:30 usqhpcadm dhcpd: DHCPOFFER on 172.16.11.72 to 00:26:9e:0a:a7:03 via eth2
Nov 13 09:33:30 usqhpcadm dhcpd: DHCPREQUEST for 172.16.11.72 (172.16.11.27) from 00:26:9e:0a:a7:03 via eth2
Nov 13 09:33:30 usqhpcadm dhcpd: DHCPACK on 172.16.11.72 to 00:26:9e:0a:a7:03 via eth2
Nov 13 09:38:15 usqhpcadm atftpd[14135]: atftpd terminating after 300 seconds
Nov 13 09:38:15 usqhpcadm atftpd[14135]: Main thread exiting
Nov 13 09:38:15 usqhpcadm xinetd[13969]: EXIT: tftp status=0 pid=14135 duration=303(sec)

The output in /var/log/oscar/oscar_wizard.log is:

[DB - mkdhcpconf] querying ODA: Select Nodes.name From Nodes Where Nodes.id='31'
--------- SQL query: Select Nodes.name From Nodes Where Nodes.id='31' ---------
[DB - mkdhcpconf] Translated 31 to usqhpc30
[INFO - mkdhcpconf] Loaded OSCAR configuration (at Network.pm:482)
[DB - mkdhcpconf] DB Query: SELECT rfc1918 FROM Networks WHERE base_ip='172.16.11.0';
[INFO - oscar_wizard] Checking lease file (/var/lib/dhcpd/dhcpd.leases).
[INFO - oscar_wizard] DHCP lease file ready.
[INFO - oscar_wizard] Setting service dhcp to on...
[INFO - oscar_wizard] Called getitem with dhcp_service and returning dhcpd
[INFO - oscar_wizard] dhcp is already on
[INFO - oscar_wizard] Performing restart on dhcp service.
[INFO - oscar_wizard] Called getitem with dhcp_service and returning dhcpd
[INFO - oscar_wizard] About to run: LC_ALL=C /sbin/service dhcpd restart
Shutting down dhcpd: [ OK ]
Starting dhcpd: [ OK ]
[INFO - oscar_wizard] DHCP service successfully set up for interface eth2.
[INFO - oscar_wizard] Loaded OSCAR configuration (at MAC.pm:952)
[INFO - oscar_wizard] Setup network boot (PXE)
[ACTION - oscar_wizard] About to run: /usr/bin/setup_pxe -v
[INFO - setup_pxe] Loaded OSCAR configuration (at Database.pm:981)
2014-11-13 9:14:27 [main :: Line 329] Checking arguments.
2014-11-13 9:14:27 [main :: Line 136] Restarting atftpd
[INFO - setup_pxe] Called getitem with tftp_dir and returning /tftpboot/
[INFO - setup_pxe] Performing restart on tftp socket service.
[INFO - setup_pxe] Performing restart on xinetd service.
Stopping xinetd: [ OK ]
Starting xinetd: [ OK ]
2014-11-13 9:14:28 [main :: Line 139] Enabling atftpd
[INFO - setup_pxe] Setting xinetd service tftp to on...
[INFO - setup_pxe] Performing restart on xinetd service.
Stopping xinetd: [ OK ]
Starting xinetd: [ OK ]
2014-11-13 9:14:28 [main :: Line 151] Creating directories.
2014-11-13 9:14:28 [main :: Line 202] Getting pxelinux.0.
2014-11-13 9:14:28 [main :: Line 206] Copying default pxelinux.cfg file
2014-11-13 9:14:28 [main :: Line 215] Updating /tftpboot/pxelinux.cfg//default file to skip local.cfg and support si_monitor.
2014-11-13 9:14:28 [main :: Line 234] Disabling nonexec mappings on x86_64
2014-11-13 9:14:28 [main :: Line 240] Copying SystemImager's message.txt to /tftpboot/pxelinux.cfg/
2014-11-13 9:14:28 [main :: Line 305] Copying SystemImager standard boot kernel and initrd.img to /tftpboot/
2014-11-13 9:14:29 [main :: Line 312] Symlinking SystemImager standard boot kernel and initrd.img to /tftpboot//kernel and /tftpboot//initrd.img respectively
[INFO - oscar_wizard] Successfully setup network boot (PXE).
------------------------ Step 8: Completed successfully ------------------------
[INFO - oscar_wizard] Called getitem with oscar_testing_path and returning /usr/lib/oscar/testing
[INFO - oscar_wizard] Called getitem with oscar_apitests_logdir and returning /var/log/oscar/apitests
[ACTION - oscar_wizard] About to run: LC_ALL=C /usr/bin/apitest -o /var/log/oscar/apitests -v -f apitests.d/before_monitor_deployment.apb
[INFO - oscar_wizard] Test before_monitor_deployment.apb succeeded.
[INFO - oscar_wizard] Ready to enter step "monitor_deployment"
[INFO - oscar_wizard] Performing start on monitor service.
[INFO - oscar_wizard] Called getitem with monitor_service and returning systemimager-server-monitord
[INFO - oscar_wizard] Performing status on monitor service.
[INFO - oscar_wizard] Called getitem with monitor_service and returning systemimager-server-monitord
[INFO - oscar_wizard] About to run: LC_ALL=C /sbin/service systemimager-server-monitord status
Status of SystemImager's installation monitoring: si_monitor... running.
[INFO - oscar_wizard] About to run: LC_ALL=C /sbin/service systemimager-server-monitord restart
Stopping SystemImager's installation monitoring: si_monitor... stopped.
Starting SystemImager's installation monitoring: si_monitor... ok.

From what I can see, the initial stage of re-partitioning the hard drive works, but copying the image over doesn't.

Thanks

---------------------------------------------------------------------
Richard A. Young
ICT Services
Email: richard.yo...@usq.edu.au
Phone: (07) 46315557
Mob: 0437544370
Fax: (07) 46312798
---------------------------------------------------------------------

-----Original Message-----
From: LAHAYE Olivier [mailto:olivier.lah...@cea.fr]
Sent: Wednesday, 12 November 2014 8:22 PM
To: oscar-users@lists.sourceforge.net
Subject: Re: [Oscar-users] Problem imaging nodes

Hi Richard,

The first error is normal; it means that no file specific to usqhpc10 was found.

Could you post the full log from the deployment monitor? (You can hide details specific to your site by replacing IPs or hostnames with other ones, but please keep all the lines if possible.)

The second error indicates that something was KIA...

Also, tell me which OSCAR modules you chose.

Best regards,

Olivier

--
Olivier LAHAYE
CEA DRT/LIST/DIR
________________________________________
From: Richard Young [richard.yo...@usq.edu.au]
Sent: Wednesday, 12 November 2014 02:25
To: Oscar-User
Subject: [Oscar-users] Problem imaging nodes

Hello,

I have recently rebuilt our HPC using RHEL 6.6 and the latest version of Oscar, i.e. unstable, and am having some trouble imaging the nodes. After running through both the standard install guide and the RHEL quick guide, there seems to be a problem with the final stage of imaging the nodes. The Oscar_wizard monitor says the installation is fine; however, when the node restarts you simply get a cursor on the screen. Basically, nothing has been copied to disk.

There don't seem to be any errors on the screen from dhcp or pxe; however, when checking the systemimager/rsyncd logs there are the following errors:

rsync: change_dir "/usqhpc10" (in overrides) failed: No such file or directory (2)
rsync: link_stat "/imaging_complete_172.16.11.71" (in scripts) failed: No such file or directory (2)

Also, the screen sometimes says: rsyncd not complete not all files copied.

I have checked the FAQs and tips, and nothing covers this problem. Has anybody seen this before, and is there a solution?

Thanks

---------------------------------------------------------------------
Richard A. Young
ICT Services
HPC Support Officer
University of Southern Queensland
Toowoomba, Queensland 4350 Australia
Email: richard.yo...@usq.edu.au
Phone: (07) 46315557
Mob: 0437544370
Fax: (07) 46312798
---------------------------------------------------------------------
_____________________________________________________________
This email (including any attached files) is confidential and is for the intended recipient(s) only. If you received this email by mistake, please, as a courtesy, tell the sender, then delete this email. The views and opinions are the originator's and do not necessarily reflect those of the University of Southern Queensland. Although all reasonable precautions were taken to ensure that this email contained no viruses at the time it was sent we accept no liability for any losses arising from its receipt. The University of Southern Queensland is a registered provider of education with the Australian Government.
(CRICOS Institution Code QLD 00244B / NSW 02225M, TEQSA PRV12081 )

_______________________________________________
Oscar-users mailing list
Oscar-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/oscar-users
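
For anyone picking this thread up later: the failed requests in /var/log/systemimager/rsyncd can be pulled out with a few lines of Python, which makes it easier to see which rsync module (overrides, scripts, ...) each missing path belongs to. This is only a sketch. The pattern is inferred from the two error lines quoted in this thread, not from any rsync specification, and the names `FAIL_RE` and `failed_requests` are made up for illustration.

```python
import re

# Matches lines such as:
#   rsync: link_stat "/imaging_complete_172.16.11.71" (in scripts) failed: No such file or directory (2)
# The layout is assumed from the log excerpts in this thread.
FAIL_RE = re.compile(
    r'rsync: (?P<op>\w+) "(?P<path>[^"]+)" \(in (?P<module>\w+)\) '
    r'failed: (?P<reason>.+?) \((?P<errno>\d+)\)'
)


def failed_requests(lines):
    """Yield (module, path, reason) for every failed rsync request in lines."""
    for line in lines:
        match = FAIL_RE.search(line)
        if match:
            yield match.group("module"), match.group("path"), match.group("reason")


if __name__ == "__main__":
    # Log path as given in the thread; adjust if your site differs.
    with open("/var/log/systemimager/rsyncd") as log:
        for module, path, reason in failed_requests(log):
            print("%-10s %-40s %s" % (module, path, reason))
```

As noted earlier in the thread, a failure in the overrides module for a node name is expected when no node-specific override exists, so the entries worth a closer look are those in the scripts module.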