Hi, I found the problem and it's an easy fix. My fault, the quick start guide is incomplete. You need to add some post-install scripts so grub is configured :-/
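If you want to check first whether your image server is affected, something like the sketch below reports whether any script in the post-install directory looks like a bootloader installer. It is only a sketch under assumptions: the function name has_grub_script is made up for this example, and it assumes the bootloader script has "grub" in its file name, as 15all.grub_install does.

```shell
#!/bin/sh
# Hypothetical helper (not part of SystemImager): succeeds if the given
# directory holds at least one regular file whose name contains "grub".
has_grub_script() {
    dir=$1
    for f in "$dir"/*grub*; do
        # If nothing matches, the pattern stays unexpanded, so test for a real file.
        [ -f "$f" ] && return 0
    done
    return 1
}

# Directory taken from this thread; override it if your install differs.
POST_INSTALL_DIR=${POST_INSTALL_DIR:-/var/lib/systemimager/scripts/post-install}

if has_grub_script "$POST_INSTALL_DIR"; then
    echo "bootloader script present in $POST_INSTALL_DIR"
else
    echo "no grub script in $POST_INSTALL_DIR"
fi
```

If it reports that no grub script is present, the two downloads below are the missing piece.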
cd /var/lib/systemimager/scripts/post-install/
sudo wget http://olivier.lahaye1.free.fr/OSCAR/SystemImager-scripts/15all.grub_install   # For grub-based distros (centos-6, ...)
sudo wget http://olivier.lahaye1.free.fr/OSCAR/SystemImager-scripts/16all.network_config # Edit this to update DNS config

I've updated the quick start guide accordingly. Sorry about that.
Now it should work.

Best regards,
Olivier.
--
Olivier LAHAYE
CEA DRT/LIST/DIR
________________________________________
From: Richard Young [richard.yo...@usq.edu.au]
Sent: Thursday, 13 November 2014 14:19
To: oscar-users@lists.sourceforge.net
Subject: Re: [Oscar-users] Problem imaging nodes

Olivier

Hopefully below is what you are after:

>>> virtual console started for client 00.26.9E.02.74.0E <<< gathering previous messages... [ 0.387010] vgaarb: loaded [ 0.387072] vgaarb: bridge control possible 0000:01:05.0 [ 0.387154] ACPI: bus type usb registered [ 0.387233] usbcore: registered new interface driver usbfs [ 0.387312] usbcore: registered new interface driver hub [ 0.387435] usbcore: registered new device driver usb [ 0.387520] PCI: Using ACPI for IRQ routing [ 0.392347] PCI: pci_cache_line_size set to 64 bytes [ 0.392389] e820: reserve RAM buffer [mem 0x0009fc00-0x0009ffff] [ 0.392390] e820: reserve RAM buffer [mem 0xdffb0000-0xdfffffff] [ 0.392504] Switching to clocksource refined-jiffies [ 0.392663] pnp: PnP ACPI init [ 0.392729] ACPI: bus type pnp registered [ 0.392860] pnp 00:00: [bus 00-ff] [ 0.392862] pnp 00:00: [io 0x0cf8-0x0cff] [ 0.392863] pnp 00:00: [io 0x0000-0x0cf7 window] [ 0.392865] pnp 00:00: [io 0x0d00-0xffff window] [ 0.392866] pnp 00:00: [mem 0x000a0000-0x000bffff window] [ 0.392867] pnp 00:00: [mem 0x000d0000-0x000dffff window] [ 0.392869] pnp 00:00: [mem 0xe0000000-0xdfffffff window disabled] [ 0.392870] pnp 00:00: [mem 0xf0000000-0xff6fffff window] [ 0.392889] pnp 00:00: Plug and Play ACPI device, IDs PNP0a03 (active) [ 0.392899] pnp 00:01: [dma 4] [ 0.392900] pnp 00:01: [io 0x0000-0x000f]
[ 0.392902] pnp 00:01: [io 0x0081-0x0083] [ 0.392903] pnp 00:01: [io 0x0087] [ 0.392904] pnp 00:01: [io 0x0089-0x008b] [ 0.392905] pnp 00:01: [io 0x008f] [ 0.392906] pnp 00:01: [io 0x00c0-0x00df] [ 0.392916] pnp 00:01: Plug and Play ACPI device, IDs PNP0200 (active) [ 0.392925] pnp 00:02: [io 0x0070-0x0071] [ 0.392935] pnp 00:02: [irq 8] [ 0.392946] pnp 00:02: Plug and Play ACPI device, IDs PNP0b00 (active) [ 0.392951] pnp 00:03: [io 0x0061] [ 0.392962] pnp 00:03: Plug and Play ACPI device, IDs PNP0800 (active) [ 0.392967] pnp 00:04: [io 0x00f0-0x00ff] [ 0.392973] pnp 00:04: [irq 13] [ 0.392982] pnp 00:04: Plug and Play ACPI device, IDs PNP0c04 (active) [ 0.393153] pnp 00:05: [io 0x03f8-0x03ff] [ 0.393159] pnp 00:05: [irq 4] [ 0.393160] pnp 00:05: [dma 0 disabled] [ 0.393199] pnp 00:05: Plug and Play ACPI device, IDs PNP0501 (active) [ 0.393362] pnp 00:06: [io 0x02f8-0x02ff] [ 0.393368] pnp 00:06: [irq 3] [ 0.393369] pnp 00:06: [dma 0 disabled] [ 0.393423] pnp 00:06: Plug and Play ACPI device, IDs PNP0501 (active) [ 0.393706] pnp 00:07: [mem 0x000d0000-0x000d3fff window] [ 0.393708] pnp 00:07: [mem 0x000d4000-0x000d7fff window] [ 0.393709] pnp 00:07: [mem 0x000de000-0x000dffff window] [ 0.393712] pnp 00:07: [io 0x0010-0x001f] [ 0.393713] pnp 00:07: [io 0x0022-0x003f] [ 0.393714] pnp 00:07: [io 0x0044-0x005f] [ 0.393716] pnp 00:07: [io 0x0062-0x0063] [ 0.393717] pnp 00:07: [io 0x0065-0x006f] [ 0.393718] pnp 00:07: [io 0x0072-0x007f] [ 0.393719] pnp 00:07: [io 0x0080] [ 0.393720] pnp 00:07: [io 0x0084-0x0086] [ 0.393721] pnp 00:07: [io 0x0088] [ 0.393722] pnp 00:07: [io 0x008c-0x008e] [ 0.393723] pnp 00:07: [io 0x0090-0x009f] [ 0.393725] pnp 00:07: [io 0x00a2-0x00bf] [ 0.393726] pnp 00:07: [io 0x00e0-0x00ef] [ 0.393727] pnp 00:07: [io 0x0ca0-0x0cbf] [ 0.393728] pnp 00:07: [io 0x04d0-0x04d1] [ 0.393729] pnp 00:07: [io 0x0800-0x080f] [ 0.393730] pnp 00:07: [io 0x2000-0x207f] [ 0.393732] pnp 00:07: [io 0x2080-0x20ff] [ 0.393733] pnp 00:07: [io 0x2400-0x247f] [ 0.393734] 
pnp 00:07: [io 0x2480-0x24ff] [ 0.393735] pnp 00:07: [io 0x2800-0x287f] [ 0.393736] pnp 00:07: [io 0x2880-0x28ff] [ 0.393737] pnp 00:07: [io 0x2f00-0x2f7f] [ 0.393739] pnp 00:07: [io 0x2f80-0x2fff] [ 0.393740] pnp 00:07: [mem 0xfcf80000-0xfcfbffff] [ 0.393741] pnp 00:07: [mem 0xfee01000-0xfeefffff] [ 0.393783] system 00:07: [io 0x0ca0-0x0cbf] has been reserved [ 0.393857] system 00:07: [io 0x04d0-0x04d1] has been reserved [ 0.393930] system 00:07: [io 0x0800-0x080f] has been reserved [ 0.394004] system 00:07: [io 0x2000-0x207f] has been reserved [ 0.394081] system 00:07: [io 0x2080-0x20ff] has been reserved [ 0.394154] system 00:07: [io 0x2400-0x247f] has been reserved [ 0.394228] system 00:07: [io 0x2480-0x24ff] has been reserved [ 0.394302] system 00:07: [io 0x2800-0x287f] has been reserved [ 0.394375] system 00:07: [io 0x2880-0x28ff] has been reserved [ 0.394449] system 00:07: [io 0x2f00-0x2f7f] has been reserved [ 0.394523] system 00:07: [io 0x2f80-0x2fff] has been reserved [ 0.394597] system 00:07: [mem 0x000d0000-0x000d3fff window] has been reserved [ 0.394675] system 00:07: [mem 0x000d4000-0x000d7fff window] has been reserved [ 0.394753] system 00:07: [mem 0x000de000-0x000dffff window] has been reserved [ 0.394832] system 00:07: [mem 0xfcf80000-0xfcfbffff] has been reserved [ 0.394908] system 00:07: [mem 0xfee01000-0xfeefffff] has been reserved [ 0.394985] system 00:07: Plug and Play ACPI device, IDs PNP0c02 (active) [ 0.395036] pnp 00:08: [io 0x0060] [ 0.395037] pnp 00:08: [io 0x0064] [ 0.395038] pnp 00:08: [mem 0xfec00000-0xfec00fff] [ 0.395040] pnp 00:08: [mem 0xfee00000-0xfee00fff] [ 0.395059] system 00:08: [mem 0xfec00000-0xfec00fff] could not be reserved [ 0.395137] system 00:08: [mem 0xfee00000-0xfee00fff] has been reserved [ 0.395213] system 00:08: Plug and Play ACPI device, IDs PNP0c02 (active) [ 0.395285] pnp 00:09: [io 0x0000-0xffffffffffffffff disabled] [ 0.395286] pnp 00:09: [io 0x0a00-0x0a0f] [ 0.395287] pnp 00:09: [io 0x0a7f-0x0a8e] [ 
0.395288] pnp 00:09: [io 0x0060] [ 0.395289] pnp 00:09: [io 0x0064] [ 0.395310] system 00:09: [io 0x0a00-0x0a0f] has been reserved [ 0.395384] system 00:09: [io 0x0a7f-0x0a8e] has been reserved [ 0.395458] system 00:09: Plug and Play ACPI device, IDs PNP0c02 (active) [ 0.395489] pnp 00:0a: [mem 0xe0000000-0xefffffff] [ 0.395507] system 00:0a: [mem 0xe0000000-0xefffffff] has been reserved [ 0.395583] system 00:0a: Plug and Play ACPI device, IDs PNP0c02 (active) [ 0.395689] pnp 00:0b: [mem 0x00000000-0x0009ffff] [ 0.395690] pnp 00:0b: [mem 0x000c0000-0x000cffff] [ 0.395691] pnp 00:0b: [mem 0x000e0000-0x000fffff] [ 0.395693] pnp 00:0b: [mem 0x00100000-0xdfffffff] [ 0.395694] pnp 00:0b: [mem 0xff700000-0xffffffff] [ 0.395717] system 00:0b: [mem 0x00000000-0x0009ffff] could not be reserved [ 0.395794] system 00:0b: [mem 0x000c0000-0x000cffff] could not be reserved [ 0.395872] system 00:0b: [mem 0x000e0000-0x000fffff] could not be reserved [ 0.395949] system 00:0b: [mem 0x00100000-0xdfffffff] could not be reserved [ 0.396027] system 00:0b: [mem 0xff700000-0xffffffff] has been reserved [ 0.396103] system 00:0b: Plug and Play ACPI device, IDs PNP0c01 (active) [ 0.396289] pnp: PnP ACPI: found 12 devices [ 0.396356] ACPI: ACPI bus type pnp unregistered [ 0.402024] Switching to clocksource acpi_pm [ 0.402194] pci 0000:00:06.0: PCI bridge to [bus 01] [ 0.402265] pci 0000:00:06.0: bridge window [io 0xe000-0xefff] [ 0.402340] pci 0000:00:06.0: bridge window [mem 0xfd000000-0xfdefffff] [ 0.402418] pci 0000:00:0a.0: PCI bridge to [bus 02] [ 0.402491] pci 0000:00:0b.0: PCI bridge to [bus 03] [ 0.402563] pci 0000:00:0c.0: PCI bridge to [bus 04] [ 0.402636] pci 0000:05:00.0: PCI bridge to [bus 06] [ 0.402709] pci 0000:05:00.0: bridge window [mem 0xfdf00000-0xfdffffff] [ 0.402789] pci 0000:00:0d.0: PCI bridge to [bus 05-06] [ 0.402861] pci 0000:00:0d.0: bridge window [mem 0xfdf00000-0xfdffffff] [ 0.402939] pci 0000:00:0f.0: PCI bridge to [bus 07] [ 0.403015] pci 0000:00:06.0: setting 
latency timer to 64 [ 0.403024] pci_bus 0000:00: resource 4 [io 0x0000-0x0cf7] [ 0.403026] pci_bus 0000:00: resource 5 [io 0x0d00-0xffff] [ 0.403027] pci_bus 0000:00: resource 6 [mem 0x000a0000-0x000bffff] [ 0.403029] pci_bus 0000:00: resource 7 [mem 0x000d0000-0x000dffff] [ 0.403030] pci_bus 0000:00: resource 8 [mem 0xf0000000-0xff6fffff] [ 0.403032] pci_bus 0000:01: resource 0 [io 0xe000-0xefff] [ 0.403033] pci_bus 0000:01: resource 1 [mem 0xfd000000-0xfdefffff] [ 0.403034] pci_bus 0000:01: resource 4 [io 0x0000-0x0cf7] [ 0.403036] pci_bus 0000:01: resource 5 [io 0x0d00-0xffff] [ 0.403037] pci_bus 0000:01: resource 6 [mem 0x000a0000-0x000bffff] [ 0.403038] pci_bus 0000:01: resource 7 [mem 0x000d0000-0x000dffff] [ 0.403040] pci_bus 0000:01: resource 8 [mem 0xf0000000-0xff6fffff] [ 0.403042] pci_bus 0000:05: resource 1 [mem 0xfdf00000-0xfdffffff] [ 0.403043] pci_bus 0000:06: resource 1 [mem 0xfdf00000-0xfdffffff] [ 0.403068] NET: Registered protocol family 2 [ 0.403263] TCP established hash table entries: 262144 (order: 10, 4194304 bytes) [ 0.404998] TCP bind hash table entries: 65536 (order: 8, 1048576 bytes) [ 0.405558] TCP: Hash tables configured (established 262144 bind 65536) [ 0.405697] TCP: reno registered [ 0.405765] UDP hash table entries: 8192 (order: 6, 262144 bytes) [ 0.405980] UDP-Lite hash table entries: 8192 (order: 6, 262144 bytes) [ 0.406238] NET: Registered protocol family 1 [ 0.406430] ACPI: PCI Interrupt Link [LUB0] enabled at IRQ 23 [ 0.696274] ACPI: PCI Interrupt Link [LUB2] enabled at IRQ 22 [ 0.696422] pci 0000:01:05.0: Boot video device [ 0.696429] PCI: CLS 64 bytes, default 64 [ 0.696469] Trying to unpack rootfs image as initramfs... [ 1.239949] Freeing initrd memory: 29848k freed [ 1.250402] PCI-DMA: Disabling AGP. [ 1.250607] PCI-DMA: aperture base @ d4000000 size 65536 KB [ 1.250679] PCI-DMA: using GART IOMMU. 
[ 1.250747] PCI-DMA: Reserving 64MB of IOMMU area in the AGP aperture [ 1.255516] LVT offset 1 assigned for vector 0x400 [ 1.255602] IBS: LVT offset 1 assigned [ 1.255694] perf: AMD IBS detected (0x0000001f) [ 1.257293] msgmni has been set to 32065 [ 1.257682] Block layer SCSI generic (bsg) driver version 0.4 loaded (major 254) [ 1.257761] io scheduler noop registered [ 1.257827] io scheduler deadline registered (default) [ 1.258128] input: Power Button as /devices/LNXSYSTM:00/LNXSYBUS:00/PNP0C0C:00/input/input0 [ 1.258213] ACPI: Power Button [PWRB] [ 1.258310] input: Power Button as /devices/LNXSYSTM:00/LNXPWRBN:00/input/input1 [ 1.258389] ACPI: Power Button [PWRF] [ 1.261017] Serial: 8250/16550 driver, 4 ports, IRQ sharing disabled [ 1.281426] 00:05: ttyS0 at I/O 0x3f8 (irq = 4) is a 16550A [ 1.301842] 00:06: ttyS1 at I/O 0x2f8 (irq = 3) is a 16550A [ 1.302085] Linux agpgart interface v0.103 [ 1.302966] brd: module loaded [ 1.303420] loop: module loaded [ 1.303533] i8042: PNP: No PS/2 controller found. Probing ports directly. [ 1.306084] serio: i8042 KBD port at 0x60,0x64 irq 1 [ 1.306175] serio: i8042 AUX port at 0x60,0x64 irq 12 [ 1.306435] mousedev: PS/2 mouse device common for all mice [ 1.306665] cpuidle: using governor ladder [ 1.306957] usbcore: registered new interface driver usbhid [ 1.307031] usbhid: USB HID core driver [ 1.307195] TCP: cubic registered [ 1.307262] NET: Registered protocol family 17 [ 1.307344] NET: Registered protocol family 15 [ 2.286940] tsc: Refined TSC clocksource calibration: 2713.906 MHz [ 2.287024] Switching to clocksource tsc [ 4.320267] floppy0: no floppy controllers found [ 4.320496] Freeing unused kernel memory: 540k freed get_arch adjust_arch ifconfig_loopback start_udevd load_my_modules variableize_kernel_append_parameters read_kernel_append_parameters read_local_cfg Skipping local.cfg: option SKIP_LOCAL_CFG=y has been specified read_kernel_append_parameters start_network IP Address not set with pre-boot settings. 
sleep 0: This is to give your switch (if you're using one) time to recognize your ethernet card before we try the network. Tip: You can use <ctrl>+<c> to pass the time (pun intended). dhclient Internet Systems Consortium DHCP Client V3.1.3 Copyright 2004-2009 Internet Systems Consortium. All rights reserved. For info, please visit https://www.isc.org/software/dhcp/ Listening on LPF/eth1/00:26:9e:02:74:0f Sending on LPF/eth1/00:26:9e:02:74:0f Listening on LPF/eth0/00:26:9e:02:74:0e Sending on LPF/eth0/00:26:9e:02:74:0e Sending on Socket/fallback DHCPDISCOVER on eth1 to 255.255.255.255 port 67 interval 6 DHCPDISCOVER on eth0 to 255.255.255.255 port 67 interval 6 DHCPDISCOVER on eth1 to 255.255.255.255 port 67 interval 10 DHCPDISCOVER on eth0 to 255.255.255.255 port 67 interval 8 DHCPOFFER from 172.16.11.27 DHCPREQUEST on eth0 to 255.255.255.255 port 67 DHCPACK from 172.16.11.27 Using option-140 as IMAGESERVER: 172.16.11.27 bound to 172.16.11.73 -- renewal in 20348 seconds. Overriding any DHCP settings with pre-boot settings from kernel append parameters. read_kernel_append_parameters ping_test Pinging your SystemImager server to ensure we have network connectivity. PING ATTEMPT 1: PING 172.16.11.27 (172.16.11.27): 56 data bytes 64 bytes from 172.16.11.27: seq=0 ttl=64 time=0.270 ms --- 172.16.11.27 ping statistics --- 1 packets transmitted, 1 packets received, 0% packet loss round-trip min/avg/max = 0.270/0.270/0.270 ms We have connectivity to your SystemImager server! Monitoring initialized. start_syslogd get_scripts_directory rsync -a 172.16.11.27::scripts/ /scripts/ >> Loaded kernel modules: sd_mod ata_generic pata_amd sata_nv libata scsi_mod hid_generic microcode pcspkr serio_raw tg3 amd74xx ide_pci_generic forcedeth ide_core ehci_hcd ohci_hcd evdev This hosts name is: usqhpc13 run_pre_install_scripts >>> 95all.nothing_to_do_script There is noting to do pre-installl. >>> 99all.harmless_example_script I live in /var/lib/systemimager/scripts/pre-install. 
choose_autoinstall_script Using autoinstall script: /scripts/usqhpc.master write_variables run_autoinstall_script >>> /scripts/usqhpc.master get_arch enumerate_disks sda DISKS=1 Partitioning /dev/sda... Old partition table for /dev/sda: Model: ATA SEAGATE ST32502N (scsi) Disk /dev/sda: 250GB Sector size (logical/physical): 512B/512B Partition Table: msdos Disk Flags: Number Start End Size Type File system Flags 1 512B 102MB 102MB primary ext3 boot 2 102MB 250GB 250GB extended 5 102MB 65.6GB 65.5GB logical linux-swap(v1) 6 65.6GB 250GB 184GB logical ext3 dd if=/dev/zero of=/dev/sda bs=512 count=1 || shellout 1+0 records in 1+0 records out 512 bytes (512 B) copied, 0.000102811 s, 5.0 MB/s blockdev --rereadpt /dev/sda parted -s -- /dev/sda mklabel msdos || shellout Creating partition /dev/sda1. parted -s -- /dev/sda mkpart primary linux-swap 1 65537 || shellout Creating partition /dev/sda2. parted -s -- /dev/sda mkpart primary 65537 250056 || shellout parted -s -- /dev/sda set 2 boot on || shellout parted -s -- /dev/sda set 2 boot on New partition table for /dev/sda: parted -s -- /dev/sda print Model: ATA SEAGATE ST32502N (scsi) Disk /dev/sda: 250GB Sector size (logical/physical): 512B/512B Partition Table: msdos Disk Flags: Number Start End Size Type File system Flags 1 1049kB 65.5GB 65.5GB primary 2 65.5GB 250GB 185GB primary Load software RAID modules. Load device mapper driver (for LVM). Load additional filesystem drivers. 
modprobe: module fat not found in modules.dep modprobe: module vfat not found in modules.dep mkswap -v1 /dev/sda1 || shellout Setting up swapspace version 1, size = 63999996 KiB no label, UUID=68940132-7c7a-46e8-86a9-eb1713f551b0 swapon /dev/sda1 || shellout mke2fs -q -t ext3 /dev/sda2 || shellout mkdir -p /a/ || shellout mount /dev/sda2 /a/ -t ext3 -o defaults || shellout mkdir -p /a/proc || shellout mount proc /a/proc -t proc -o defaults || shellout mkdir -p /a/sys || shellout mount sysfs /a/sys -t sysfs -o defaults || shellout Evaluating image size... --> Image size = 1509MiB Report task started. Quietly installing image... rsync -aHS --exclude=lost+found/ --exclude=/proc/* --numeric-ids 172.16.11.27::usqhpcimage/ /a/ Report task stopped. rsync -av --numeric-ids 172.16.11.27::overrides/usqhpcimage/ /a/ rsync -av --numeric-ids 172.16.11.27::overrides/usqhpc13/ /a/ rsync: change_dir "/usqhpc13" (in overrides) failed: No such file or directory (2) rsync error: some files could not be transferred (code 23) at main.c(1538) [receiver=3.0.0] Override directory usqhpc13 doesn't seem to exist, but that may be OK. rsync -av --numeric-ids 172.16.11.27::overrides/usqhpcimage/ /a/ Editing files for actual disk configuration... /dev/sda -> /dev/sda /etc/fstab /etc/systemconfig/systemconfig.conf /boot/grub/menu.lst run_post_install_scripts >>> 10all.fix_swap_uuids >>> 11all.replace_byid_device >>> 95all.monitord_rebooted >>> 99all.harmless_example_script I live in /var/lib/systemimager/scripts/post-install. See: /var/lib/systemimager/scripts/post-install/README for details. umount /a/sys || mount -no remount,ro /a//sys || shellout umount /a/proc || mount -no remount,ro /a//proc || shellout umount /a/ || mount -no remount,ro /a// || shellout umount: /a: target is busy. (In some cases useful info about processes that use the device is found by lsof(8) or fuser(1)) Imaging completed --------------------------------------------------------------------- Richard A. 
Young ICT Services Email: richard.yo...@usq.edu.au Phone: (07) 46315557 Mob: 0437544370 Fax: (07) 46312798
---------------------------------------------------------------------

-----Original Message-----
From: LAHAYE Olivier [mailto:olivier.lah...@cea.fr]
Sent: Thursday, 13 November 2014 7:06 PM
To: oscar-users@lists.sourceforge.net
Subject: Re: [Oscar-users] Problem imaging nodes

Hi Richard,

Sorry for not having been clearer. What I need is the install log from the node. You can retrieve it by clicking on "Monitor Cluster Deployment" and starting the imaging. Once imaging has started, double-click on the node being deployed and you should get a cloned console of the node. When it is finished, you can use the menu to save the content and send it to me (you can change information specific to your site, like IP addresses or hostnames, if you don't want to disclose it). What is important is that I can see all the steps that were done.

On the OSCAR side, I see no problem so far in your script, so I suspect something is wrong in rsyncd.conf or maybe in /var/lib/systemimager/scripts.

Aside from that, I rebuilt all OSCAR packages on CentOS-6.6 yesterday, but unfortunately I'm pretty sure that has no impact on your problem.

I'm trying to reproduce the problem on my new CentOS-6.6 VM right now. Hopefully it can be reproduced. I'm confident that it's a simple thing to fix.

Cheers.
--
Olivier LAHAYE
CEA DRT/LIST/DIR
________________________________________
From: Richard Young [richard.yo...@usq.edu.au]
Sent: Thursday, 13 November 2014 02:54
To: oscar-users@lists.sourceforge.net
Subject: Re: [Oscar-users] Problem imaging nodes

Olivier

Thanks for your reply.
The installed Oscar packages are below: Apitest Base Blcr C3 Ganglia Jobmonarch Maui Mtaconfig Munge Naemon Netbootmgr Ntpconfig Oda Sc3 Sis Switcher Sync-files Torque Yume Hopefully below is what you are after from the monitor console: [INFO - mkdhcpconf] Loaded OSCAR configuration (at Network.pm:482) [DB - mkdhcpconf] DB Query: SELECT rfc1918 FROM Networks WHERE base_ip='172.16.11.0'; [INFO - oscar_wizard] Checking lease file (/var/lib/dhcpd/dhcpd.leases). [INFO - oscar_wizard] DHCP lease file ready. [INFO - oscar_wizard] Setting service dhcp to on... [INFO - oscar_wizard] Called getitem with dhcp_service and returning dhcpd [INFO - oscar_wizard] dhcp is already on [INFO - oscar_wizard] Performing restart on dhcp service. [INFO - oscar_wizard] Called getitem with dhcp_service and returning dhcpd [INFO - oscar_wizard] About to run: LC_ALL=C /sbin/service dhcpd restart Shutting down dhcpd: [ OK ] Starting dhcpd: [ OK ] [INFO - oscar_wizard] DHCP service successfully set up for interface eth2. [INFO - oscar_wizard] Loaded OSCAR configuration (at MAC.pm:952) [INFO - oscar_wizard] Setup network boot (PXE) [ACTION - oscar_wizard] About to run: /usr/bin/setup_pxe -v [INFO - setup_pxe] Loaded OSCAR configuration (at Database.pm:981) 2014-11-13 9:14:27 [main :: Line 329] Checking arguments. 2014-11-13 9:14:27 [main :: Line 136] Restarting atftpd [INFO - setup_pxe] Called getitem with tftp_dir and returning /tftpboot/ [INFO - setup_pxe] Performing restart on tftp socket service. [INFO - setup_pxe] Performing restart on xinetd service. Stopping xinetd: [ OK ] Starting xinetd: [ OK ] 2014-11-13 9:14:28 [main :: Line 139] Enabling atftpd [INFO - setup_pxe] Setting xinetd service tftp to on... [INFO - setup_pxe] Performing restart on xinetd service. Stopping xinetd: [ OK ] Starting xinetd: [ OK ] 2014-11-13 9:14:28 [main :: Line 151] Creating directories. 2014-11-13 9:14:28 [main :: Line 202] Getting pxelinux.0. 
2014-11-13 9:14:28 [main :: Line 206] Copying default pxelinux.cfg file 2014-11-13 9:14:28 [main :: Line 215] Updating /tftpboot/pxelinux.cfg//default file to skip local.cfg and support si_monitor. 2014-11-13 9:14:28 [main :: Line 234] Disabling nonexec mappings on x86_64 2014-11-13 9:14:28 [main :: Line 240] Copying SystemImager's message.txt to /tftpboot/pxelinux.cfg/ 2014-11-13 9:14:28 [main :: Line 305] Copying SystemImager standard boot kernel and initrd.img to /tftpboot/ 2014-11-13 9:14:29 [main :: Line 312] Symlinking SystemImager standard boot kernel and initrd.img to /tftpboot//kernel and /tftpboot//initrd.img respectively [INFO - oscar_wizard] Successfully setup network boot (PXE). ------------------------ Step 8: Completed successfully ------------------------ [INFO - oscar_wizard] Called getitem with oscar_testing_path and returning /usr/lib/oscar/testing [INFO - oscar_wizard] Called getitem with oscar_apitests_logdir and returning /var/log/oscar/apitests [ACTION - oscar_wizard] About to run: LC_ALL=C /usr/bin/apitest -o /var/log/oscar/apitests -v -f apitests.d/before_monitor_deployment.apb [INFO - oscar_wizard] Test before_monitor_deployment.apb succeeded. [INFO - oscar_wizard] Ready to enter step "monitor_deployment" [INFO - oscar_wizard] Performing start on monitor service. [INFO - oscar_wizard] Called getitem with monitor_service and returning systemimager-server-monitord [INFO - oscar_wizard] Performing status on monitor service. [INFO - oscar_wizard] Called getitem with monitor_service and returning systemimager-server-monitord [INFO - oscar_wizard] About to run: LC_ALL=C /sbin/service systemimager-server-monitord status Status of SystemImager's installation monitoring: si_monitor... running. [INFO - oscar_wizard] About to run: LC_ALL=C /sbin/service systemimager-server-monitord restart Stopping SystemImager's installation monitoring: si_monitor... stopped. Starting SystemImager's installation monitoring: si_monitor... ok. 
Below is the output from the above attempt to install a node in /var/log/systemimager/rsyncd: 2014/11/13 09:36:06 [14172] connect from usqhpc12 (172.16.11.72) 2014/11/12 23:36:06 [14172] rsync on scripts/imaging_complete_172.16.11.72 from usqhpc12 (172.16.11.72) 2014/11/12 23:36:06 [14172] building file list 2014/11/12 23:36:06 [14172] rsync: link_stat "/imaging_complete_172.16.11.72" (in scripts) failed: No such file or directory (2) Also below is the output from the above installation in /var/log/messages: Nov 13 09:14:26 usqhpcadm dhcpd: Internet Systems Consortium DHCP Server 4.1.1-P1 Nov 13 09:14:26 usqhpcadm dhcpd: Copyright 2004-2010 Internet Systems Consortium. Nov 13 09:14:26 usqhpcadm dhcpd: All rights reserved. Nov 13 09:14:26 usqhpcadm dhcpd: For info, please visit https://www.isc.org/software/dhcp/ Nov 13 09:14:26 usqhpcadm dhcpd: Not searching LDAP since ldap-server, ldap-port and ldap-base-dn were not specified in the config file Nov 13 09:14:26 usqhpcadm dhcpd: Wrote 0 deleted host decls to leases file. Nov 13 09:14:26 usqhpcadm dhcpd: Wrote 0 new dynamic host decls to leases file. Nov 13 09:14:26 usqhpcadm dhcpd: Wrote 0 leases to leases file. Nov 13 09:14:26 usqhpcadm dhcpd: Listening on LPF/eth2/00:23:8b:03:80:1f/172.16.11.0/24 Nov 13 09:14:26 usqhpcadm dhcpd: Sending on LPF/eth2/00:23:8b:03:80:1f/172.16.11.0/24 Nov 13 09:14:26 usqhpcadm dhcpd: Sending on Socket/fallback/fallback-net Nov 13 09:14:28 usqhpcadm xinetd[1969]: Exiting... Nov 13 09:14:28 usqhpcadm xinetd[13944]: xinetd Version 2.3.14 started with libwrap loadavg labeled-networking options compiled in. 
Nov 13 09:14:28 usqhpcadm xinetd[13944]: Started working: 1 available service Nov 13 09:14:28 usqhpcadm xinetd[13944]: Starting reconfiguration Nov 13 09:14:28 usqhpcadm xinetd[13944]: Swapping defaults Nov 13 09:14:28 usqhpcadm xinetd[13944]: readjusting service tftp Nov 13 09:14:28 usqhpcadm xinetd[13944]: Reconfigured: new=0 old=1 dropped=0 (services) Nov 13 09:14:28 usqhpcadm xinetd[13944]: Exiting... Nov 13 09:14:28 usqhpcadm xinetd[13969]: xinetd Version 2.3.14 started with libwrap loadavg labeled-networking options compiled in. Nov 13 09:14:28 usqhpcadm xinetd[13969]: Started working: 1 available service Nov 13 09:33:08 usqhpcadm dhcpd: DHCPDISCOVER from 00:26:9e:0a:a7:03 via eth2 Nov 13 09:33:08 usqhpcadm dhcpd: DHCPOFFER on 172.16.11.72 to 00:26:9e:0a:a7:03 via eth2 Nov 13 09:33:12 usqhpcadm dhcpd: DHCPREQUEST for 172.16.11.72 (172.16.11.27) from 00:26:9e:0a:a7:03 via eth2 Nov 13 09:33:12 usqhpcadm dhcpd: DHCPACK on 172.16.11.72 to 00:26:9e:0a:a7:03 via eth2 Nov 13 09:33:12 usqhpcadm xinetd[13969]: START: tftp pid=14135 from=172.16.11.72 Nov 13 09:33:12 usqhpcadm atftpd[14135]: Advanced Trivial FTP server started (0.7.1) Nov 13 09:33:12 usqhpcadm atftpd[14135]: Serving pxelinux.0 to 172.16.11.72:2070 Nov 13 09:33:12 usqhpcadm atftpd[14135]: Serving pxelinux.0 to 172.16.11.72:2071 Nov 13 09:33:12 usqhpcadm atftpd[14135]: Serving pxelinux.cfg/a984443b-6d7a-0010-91d8-00232bced6c0 to 172.16.11.72:49152 Nov 13 09:33:12 usqhpcadm atftpd[14135]: Serving pxelinux.cfg/01-00-26-9e-0a-a7-03 to 172.16.11.72:49153 Nov 13 09:33:12 usqhpcadm atftpd[14135]: Serving pxelinux.cfg/AC100B48 to 172.16.11.72:49154 Nov 13 09:33:12 usqhpcadm atftpd[14135]: Serving pxelinux.cfg/AC100B4 to 172.16.11.72:49155 Nov 13 09:33:12 usqhpcadm atftpd[14135]: Serving pxelinux.cfg/AC100B to 172.16.11.72:49156 Nov 13 09:33:12 usqhpcadm atftpd[14135]: Serving pxelinux.cfg/AC100 to 172.16.11.72:49157 Nov 13 09:33:12 usqhpcadm atftpd[14135]: Serving pxelinux.cfg/AC10 to 172.16.11.72:49158 Nov 13 
09:33:12 usqhpcadm atftpd[14135]: Serving pxelinux.cfg/AC1 to 172.16.11.72:49159 Nov 13 09:33:12 usqhpcadm atftpd[14135]: Serving pxelinux.cfg/AC to 172.16.11.72:49160 Nov 13 09:33:12 usqhpcadm atftpd[14135]: Serving pxelinux.cfg/A to 172.16.11.72:49161 Nov 13 09:33:12 usqhpcadm atftpd[14135]: Serving pxelinux.cfg/default to 172.16.11.72:49162 Nov 13 09:33:12 usqhpcadm atftpd[14135]: Serving message.txt to 172.16.11.72:49163 Nov 13 09:33:15 usqhpcadm atftpd[14135]: Serving kernel to 172.16.11.72:49164 Nov 13 09:33:15 usqhpcadm atftpd[14135]: Serving initrd.img to 172.16.11.72:49165 Nov 13 09:33:30 usqhpcadm dhcpd: DHCPDISCOVER from 00:26:9e:0a:a7:03 via eth2 Nov 13 09:33:30 usqhpcadm dhcpd: DHCPOFFER on 172.16.11.72 to 00:26:9e:0a:a7:03 via eth2 Nov 13 09:33:30 usqhpcadm dhcpd: DHCPREQUEST for 172.16.11.72 (172.16.11.27) from 00:26:9e:0a:a7:03 via eth2 Nov 13 09:33:30 usqhpcadm dhcpd: DHCPACK on 172.16.11.72 to 00:26:9e:0a:a7:03 via eth2 Nov 13 09:38:15 usqhpcadm atftpd[14135]: atftpd terminating after 300 seconds Nov 13 09:38:15 usqhpcadm atftpd[14135]: Main thread exiting Nov 13 09:38:15 usqhpcadm xinetd[13969]: EXIT: tftp status=0 pid=14135 duration=303(sec) The output in /var/log/oscar/oscar_wizard.log is: [DB - mkdhcpconf] querying ODA: Select Nodes.name From Nodes Where Nodes.id='31' --------- SQL query: Select Nodes.name From Nodes Where Nodes.id='31' --------- [DB - mkdhcpconf] Translated 31 to usqhpc30 [INFO - mkdhcpconf] Loaded OSCAR configuration (at Network.pm:482) [DB - mkdhcpconf] DB Query: SELECT rfc1918 FROM Networks WHERE base_ip='172.16.11.0'; [INFO - oscar_wizard] Checking lease file (/var/lib/dhcpd/dhcpd.leases). [INFO - oscar_wizard] DHCP lease file ready. [INFO - oscar_wizard] Setting service dhcp to on... [INFO - oscar_wizard] Called getitem with dhcp_service and returning dhcpd [INFO - oscar_wizard] dhcp is already on [INFO - oscar_wizard] Performing restart on dhcp service. 
[INFO - oscar_wizard] Called getitem with dhcp_service and returning dhcpd [INFO - oscar_wizard] About to run: LC_ALL=C /sbin/service dhcpd restart Shutting down dhcpd: [ OK ] Starting dhcpd: [ OK ] [INFO - oscar_wizard] DHCP service successfully set up for interface eth2. [INFO - oscar_wizard] Loaded OSCAR configuration (at MAC.pm:952) [INFO - oscar_wizard] Setup network boot (PXE) [ACTION - oscar_wizard] About to run: /usr/bin/setup_pxe -v [INFO - setup_pxe] Loaded OSCAR configuration (at Database.pm:981) 2014-11-13 9:14:27 [main :: Line 329] Checking arguments. 2014-11-13 9:14:27 [main :: Line 136] Restarting atftpd [INFO - setup_pxe] Called getitem with tftp_dir and returning /tftpboot/ [INFO - setup_pxe] Performing restart on tftp socket service. [INFO - setup_pxe] Performing restart on xinetd service. Stopping xinetd: [ OK ] Starting xinetd: [ OK ] 2014-11-13 9:14:28 [main :: Line 139] Enabling atftpd [INFO - setup_pxe] Setting xinetd service tftp to on... [INFO - setup_pxe] Performing restart on xinetd service. Stopping xinetd: [ OK ] Starting xinetd: [ OK ] 2014-11-13 9:14:28 [main :: Line 151] Creating directories. 2014-11-13 9:14:28 [main :: Line 202] Getting pxelinux.0. 2014-11-13 9:14:28 [main :: Line 206] Copying default pxelinux.cfg file 2014-11-13 9:14:28 [main :: Line 215] Updating /tftpboot/pxelinux.cfg//default file to skip local.cfg and support si_monitor. 2014-11-13 9:14:28 [main :: Line 234] Disabling nonexec mappings on x86_64 2014-11-13 9:14:28 [main :: Line 240] Copying SystemImager's message.txt to /tftpboot/pxelinux.cfg/ 2014-11-13 9:14:28 [main :: Line 305] Copying SystemImager standard boot kernel and initrd.img to /tftpboot/ 2014-11-13 9:14:29 [main :: Line 312] Symlinking SystemImager standard boot kernel and initrd.img to /tftpboot//kernel and /tftpboot//initrd.img respectively [INFO - oscar_wizard] Successfully setup network boot (PXE). 
------------------------ Step 8: Completed successfully ------------------------ [INFO - oscar_wizard] Called getitem with oscar_testing_path and returning /usr/lib/oscar/testing [INFO - oscar_wizard] Called getitem with oscar_apitests_logdir and returning /var/log/oscar/apitests [ACTION - oscar_wizard] About to run: LC_ALL=C /usr/bin/apitest -o /var/log/oscar/apitests -v -f apitests.d/before_monitor_deployment.apb [INFO - oscar_wizard] Test before_monitor_deployment.apb succeeded. [INFO - oscar_wizard] Ready to enter step "monitor_deployment" [INFO - oscar_wizard] Performing start on monitor service. [INFO - oscar_wizard] Called getitem with monitor_service and returning systemimager-server-monitord [INFO - oscar_wizard] Performing status on monitor service. [INFO - oscar_wizard] Called getitem with monitor_service and returning systemimager-server-monitord [INFO - oscar_wizard] About to run: LC_ALL=C /sbin/service systemimager-server-monitord status Status of SystemImager's installation monitoring: si_monitor... running. [INFO - oscar_wizard] About to run: LC_ALL=C /sbin/service systemimager-server-monitord restart Stopping SystemImager's installation monitoring: si_monitor... stopped. Starting SystemImager's installation monitoring: si_monitor... ok.

From what I can see, the initial stage of re-partitioning the hard drive works, but copying the image over doesn't.

Thanks
---------------------------------------------------------------------
Richard A. Young ICT Services Email: richard.yo...@usq.edu.au Phone: (07) 46315557 Mob: 0437544370 Fax: (07) 46312798
---------------------------------------------------------------------

-----Original Message-----
From: LAHAYE Olivier [mailto:olivier.lah...@cea.fr]
Sent: Wednesday, 12 November 2014 8:22 PM
To: oscar-users@lists.sourceforge.net
Subject: Re: [Oscar-users] Problem imaging nodes

Hi Richard,

The first error is normal: it means that no file specific to usqhpc10 has been found.
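Since that message is expected, it can be filtered out when scanning the rsyncd log for real failures. A minimal sketch follows. The function name unexpected_rsync_errors and the grep patterns are assumptions based only on the two messages quoted in this thread, not something SystemImager provides.

```shell
#!/bin/sh
# Hypothetical filter: read an rsyncd log on stdin and print rsync failure
# lines, skipping the "change_dir ... (in overrides) failed" lines, which
# only mean that no host-specific override directory exists.
unexpected_rsync_errors() {
    grep 'rsync: .* failed:' \
        | grep -v 'change_dir .*(in overrides) failed:' \
        || true   # finding nothing is not an error
}
```

It could be used as `unexpected_rsync_errors < /var/log/systemimager/rsyncd`, where the log path is the one mentioned earlier in this thread.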
Could you post the full log from the deployment monitor? (You can hide details specific to your site by replacing IPs or hostnames with other ones, but please keep all the lines if possible.)
The second error indicates that something was KIA...
Also, tell me which OSCAR modules you chose.

Best regards,
Olivier.
--
Olivier LAHAYE
CEA DRT/LIST/DIR
________________________________________
From: Richard Young [richard.yo...@usq.edu.au]
Sent: Wednesday, 12 November 2014 02:25
To: Oscar-User
Subject: [Oscar-users] Problem imaging nodes

Hello,

I have recently rebuilt our HPC using RHEL 6.6 and the latest version of Oscar, i.e. unstable, and am having some trouble imaging the nodes. After running through both the standard install guide and the RHEL quick guide, there seems to be a problem with the final stage of imaging the nodes. The Oscar_wizard monitor says the installation is fine; however, when the nodes restart you simply get a cursor on the screen, so basically nothing has been copied to disk.

There don't seem to be any errors on the screen from dhcp or pxe; however, when checking the systemimager/rsyncd logs there are the following errors:

rsync: change_dir "/usqhpc10" (in overrides) failed: No such file or directory (2)
rsync: link_stat "/imaging_complete_172.16.11.71" (in scripts) failed: No such file or directory (2)

Also, the screen sometimes says: rsyncd not complete, not all files copied.

I have checked the FAQs and tips, and nothing covers this problem. Has anybody seen this before, and is there a solution?

Thanks
---------------------------------------------------------------------
Richard A. Young
ICT Services
HPC Support Officer
University of Southern Queensland
Toowoomba, Queensland 4350 Australia
Email: richard.yo...@usq.edu.au
Phone: (07) 46315557
Mob: 0437544370
Fax: (07) 46312798
---------------------------------------------------------------------
_____________________________________________________________
This email (including any attached files) is confidential and is for the intended recipient(s) only. If you received this email by mistake, please, as a courtesy, tell the sender, then delete this email. The views and opinions are the originator's and do not necessarily reflect those of the University of Southern Queensland. Although all reasonable precautions were taken to ensure that this email contained no viruses at the time it was sent we accept no liability for any losses arising from its receipt. The University of Southern Queensland is a registered provider of education with the Australian Government. (CRICOS Institution Code QLD 00244B / NSW 02225M, TEQSA PRV12081)
_______________________________________________
Oscar-users mailing list
Oscar-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/oscar-users
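For anyone reading the rsyncd log with the same symptoms, here is a minimal sketch (not part of OSCAR or SystemImager; the function name `summarize_rsync_errors` is made up for illustration) that reduces "No such file or directory" lines like the two quoted above to the rsync module and the missing path. It assumes the log lines have the same shape as the ones in this thread. As noted in the reply, a missing per-host directory in the overrides module is expected, so this only helps to see which lookups failed, not to diagnose them.

```shell
#!/bin/sh
# Hypothetical helper: read rsyncd log lines on stdin and print
# "<module> <path>" for each "No such file or directory" error.
# Lines that do not match this shape are silently skipped.
summarize_rsync_errors() {
    sed -n 's/^.*rsync: [a-z_]* "\([^"]*\)" (in \([^)]*\)) failed: No such file or directory.*$/\2 \1/p'
}

# Example using the two lines quoted in this thread.
printf '%s\n' \
    'rsync: change_dir "/usqhpc10" (in overrides) failed: No such file or directory (2)' \
    'rsync: link_stat "/imaging_complete_172.16.11.71" (in scripts) failed: No such file or directory (2)' \
    | summarize_rsync_errors
```

To use it on a real log, pipe the file into the function; the log location depends on the installation and is not given in this thread.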