Hi,
 
If the console output covers the whole process, it seems the initrd boot process did not reach the rootimg download phase. The log also contains this message: `[2019-03-14T10:20:39-03:00] [   37.280041] systemd-fstab-generator[261]: Could not find a root= entry on the kernel command line.`
 
Several questions:
 
1. What is the node status (`lsdef <node> -i status,statustime`)? Has it changed to "netbooting"?
2. Did you provision a batch of nodes with the same osimage? Does the issue always appear on the same node?
3. Please install xCAT-probe on your MN and run `xcatprobe xcatmn` to check for configuration issues. Then watch `xcatprobe osdeploy -n <failing node>` in one terminal session and kick off the provision from another. The provision progress will show up in the xcatprobe session; please send that session output.
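For convenience, the three checks can be collected in a small helper. This is only a sketch: the node name `node01` and the function name `diag_commands` are made up for the example, and the script prints the commands instead of running them, because `xcatprobe osdeploy` is meant to stay open in its own terminal while the provision runs.

```shell
#!/bin/sh
# Sketch: print the diagnostic commands for one node, in the order to run them.
# NODE is a hypothetical default; replace it with the failing node's name.

NODE="${NODE:-node01}"

diag_commands() {
    # $1 = node name
    printf '%s\n' \
        "lsdef $1 -i status,statustime" \
        "xcatprobe xcatmn" \
        "xcatprobe osdeploy -n $1"
}

# Print each command with a leading "+" so it is easy to copy and paste.
diag_commands "$NODE" | while IFS= read -r cmd; do
    echo "+ $cmd"
done
```

Run the first two commands directly, leave the third one running in a separate terminal, and only then start the provision.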
------------------------------------------------------------------------------
YANG Song (杨嵩)
IBM China System Technology Laboratory
Tel: 86-10-82452903
Email: yang...@cn.ibm.com
Address: Building 28, ZhongGuanCun Software Park,
No.8, Dong Bei Wang West Road, Haidian District Beijing 100193, PRC

北京市海淀区东北旺西路8号中关村软件园28号楼
邮编: 100193
 
 
----- Original message -----
From: Angelo Cavalcanti <angelo.cavalca...@gmail.com>
To: Song BJ Yang <yang...@cn.ibm.com>
Cc: xcat-user@lists.sourceforge.net
Subject: Re: [xcat-user] [External] Netboot process stuck
Date: Fri, Mar 15, 2019 10:37 AM
 
Thanks Song,
 
I added the following kernel parameters:
 
debug ignore_loglevel log_buf_len=10M print_fatal_signals=1
 
The console output file is attached. I noticed that the machine's devices were not found in the udev database.
 
Regards,
 
Angelo Cavalcanti
br.linkedin.com/in/angelocr

Sent from Gmail Android App
 
On Thu, Mar 14, 2019 at 06:41, Song BJ Yang <yang...@cn.ibm.com> wrote:
Hi,
 
We encountered a similar issue, https://github.com/xcat2/xcat-core/issues/274 , but in that case the console output revealed the root cause.
 
 
However, your console output does not show why the boot process hangs. I suggest you enable more verbose output during boot. For reference, this document describes how to get more debug information during kernel boot: https://www.askapache.com/linux/linux-debugging/
 
To apply the kernel options during diskless kernel boot, you can use the `addkcmdline` attribute. An example of addkcmdline usage:
chdef mid05tor12cn05 addkcmdline="debug ignore_loglevel log_buf_len=10M print_fatal_signals=1 LOGLEVEL=8 earlyprintk=vga,keep sched_debug"
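A rough sketch of the full sequence around that `chdef` call follows. The node name `node01`, the `<osimage>` placeholder and the helper name `kcmdline_commands` are invented for the example. It also assumes that `nodeset` has to be run again so the node's boot configuration picks up the new options; please verify that on your setup. The script only prints the commands.

```shell
#!/bin/sh
# Sketch: print the commands to set extra kernel options on a diskless node
# and to confirm they were stored. NODE, OSIMAGE and KOPTS are example values.

NODE="${NODE:-node01}"              # hypothetical node name
OSIMAGE="${OSIMAGE:-<osimage>}"     # placeholder, use your own osimage name
KOPTS="debug ignore_loglevel log_buf_len=10M print_fatal_signals=1"

kcmdline_commands() {
    # $1 = node, $2 = kernel options, $3 = osimage
    printf '%s\n' \
        "chdef $1 addkcmdline=\"$2\"" \
        "lsdef $1 -i addkcmdline" \
        "nodeset $1 osimage=$3"   # assumption: nodeset regenerates the boot config
}

kcmdline_commands "$NODE" "$KOPTS" "$OSIMAGE"
```

After the next boot attempt, the extra options should be visible on the kernel command line in the console output.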
 
 
Good luck.
 
 
 
----- Original message -----
From: Jarrod Johnson <jjohns...@lenovo.com>
To: xCAT Users Mailing list <xcat-user@lists.sourceforge.net>
Cc:
Subject: Re: [xcat-user] [External] Netboot process stuck
Date: Thu, Mar 14, 2019 5:33 AM
 

What does the boot kernel get on its command line (e.g. in /tftpboot/xcat/xnba/nodes/<nodename>)?
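One way to answer that quickly is to list the key=value tokens in the node's file and check whether a given parameter such as `root` is among them. This is a sketch under the assumption that the kernel arguments appear in that file as whitespace-separated key=value tokens. The helper names `list_kargs` and `has_karg` and the node name `node01` are made up for the example.

```shell
#!/bin/sh
# Sketch: show the key=value tokens found in a node's xNBA boot file.
# NODE is a hypothetical default; the path follows the one mentioned above.

NODE="${NODE:-node01}"
BOOTFILE="/tftpboot/xcat/xnba/nodes/$NODE"

list_kargs() {
    # $1 = file; print one whitespace-separated token per line, keeping
    # only tokens that contain "=" (assumed to be kernel parameters).
    tr -s ' \t' '\n\n' < "$1" | grep '='
}

has_karg() {
    # $1 = file, $2 = parameter name; succeeds if "name=" starts a token.
    list_kargs "$1" | grep -q "^$2="
}

if [ -r "$BOOTFILE" ]; then
    list_kargs "$BOOTFILE"
    has_karg "$BOOTFILE" root || echo "no root= parameter in $BOOTFILE"
fi
```

Comparing the output for the failing node with that of a node that boots correctly from the same osimage may show whether the command lines differ.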

 

From: Angelo Cavalcanti <angelo.cavalca...@gmail.com>
Sent: Wednesday, March 13, 2019 4:23 PM
To: xcat-user@lists.sourceforge.net
Subject: [External] [xcat-user] Netboot process stuck

 

Hi everyone,

 

I've set up several nodes to netboot an image, but one of them gets stuck during the boot process, as shown below:

 

[  485.127448] systemd[1]: Reached target Sockets.
[  515.134248] systemd[1]: Started Journal Service.
[  635.252091] RPC: Registered named UNIX socket transport module.
[  635.258163] RPC: Registered udp transport module.
[  635.263004] RPC: Registered tcp transport module.
[  635.267847] RPC: Registered tcp NFSv4.1 backchannel transport module.
[  845.308189] pps_core: LinuxPPS API ver. 1 registered
[  845.313333] pps_core: Software ver. 5.3.6 - Copyright 2005-2007 Rodolfo Giometti <giome...@linux.it>
[  845.325629] PTP clock support registered
[  845.332035] dca service started, version 1.12.1
[  845.343852] mlx4_core: Mellanox ConnectX core driver v4.0-0
[  845.349585] mlx4_core: Initializing 0000:04:00.0
[  845.364042] igb: Intel(R) Gigabit Ethernet Network Driver - version 5.4.0-k
[  845.368351] ahci 0000:00:1f.2: AHCI 0001.0300 32 slots 6 ports 6 Gbps 0x3f impl SATA mode
[  845.368355] ahci 0000:00:1f.2: flags: 64bit ncq sntf pm led clo pio slum part ems apst
[  845.380249] scsi host0: ahci
[  845.380938] scsi host1: ahci
[  845.382757] scsi host2: ahci
[  845.383161] scsi host3: ahci
[  845.386805] scsi host4: ahci
[  845.387739] scsi host5: ahci
[  845.387852] ata1: SATA max UDMA/133 abar m2048@0xde100000 port 0xde100100 irq 39
[  845.387855] ata2: SATA max UDMA/133 abar m2048@0xde100000 port 0xde100180 irq 39
[  845.387857] ata3: SATA max UDMA/133 abar m2048@0xde100000 port 0xde100200 irq 39
[  845.387860] ata4: SATA max UDMA/133 abar m2048@0xde100000 port 0xde100280 irq 39
[  845.387862] ata5: SATA max UDMA/133 abar m2048@0xde100000 port 0xde100300 irq 39
[  845.387865] ata6: SATA max UDMA/133 abar m2048@0xde100000 port 0xde100380 irq 39
[  845.451810] igb: Copyright (c) 2007-2014 Intel Corporation.
[  845.511186] igb 0000:81:00.0: added PHC on eth0
[  845.515869] igb 0000:81:00.0: Intel(R) Gigabit Ethernet Network Connection
[  845.522896] igb 0000:81:00.0: eth0: (PCIe:5.0Gb/s:Width x4) 00:25:90:6c:a8:a2
[  845.530236] igb 0000:81:00.0: eth0: PBA No: 104900-000
[  845.535510] igb 0000:81:00.0: Using MSI-X interrupts. 8 rx queue(s), 8 tx queue(s)
[  845.598681] igb 0000:81:00.1: added PHC on eth1
[  845.603379] igb 0000:81:00.1: Intel(R) Gigabit Ethernet Network Connection
[  845.610404] igb 0000:81:00.1: eth1: (PCIe:5.0Gb/s:Width x4) 00:25:90:6c:a8:a3
[  845.617768] igb 0000:81:00.1: eth1: PBA No: 104900-000
[  845.623050] igb 0000:81:00.1: Using MSI-X interrupts. 8 rx queue(s), 8 tx queue(s)
[  845.694180] ata4: SATA link down (SStatus 0 SControl 300)
[  845.699730] ata3: SATA link down (SStatus 0 SControl 300)
[  845.705276] ata2: SATA link down (SStatus 0 SControl 300)
[  845.710813] ata1: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
[  845.717162] ata5: SATA link down (SStatus 0 SControl 300)
[  845.722738] ata6: SATA link down (SStatus 0 SControl 300)
[  845.728523] ata1.00: ATA-8: ST91000640NS, SN03, max UDMA/133
[  845.734330] ata1.00: 1953525168 sectors, multi 16: LBA48 NCQ (depth 31/32)
[  845.742162] ata1.00: configured for UDMA/133
[  845.746833] scsi 0:0:0:0: Direct-Access     ATA      ST91000640NS     SN03 PQ: 0 ANSI: 5
[  845.791663] sd 0:0:0:0: [sda] 1953525168 512-byte logical blocks: (1.00 TB/931 GiB)
[  845.799600] sd 0:0:0:0: [sda] Write Protect is off
[  845.804545] sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
[  845.820557]  sda: sda1 sda2
[  845.823892] sd 0:0:0:0: [sda] Attached SCSI disk
[  851.888095] mlx4_core 0000:04:00.0: Old device ETS support detected
[  851.894490] mlx4_core 0000:04:00.0: Consider upgrading device FW.
[  852.632012] mlx4_core 0000:04:00.0: PCIe link speed is 8.0GT/s, device supports 8.0GT/s
[  852.640265] mlx4_core 0000:04:00.0: PCIe link width is x8, device supports x8
[  852.787509] mlx4_en: Mellanox ConnectX HCA Ethernet driver v4.0-0
[  945.585085] igb 0000:81:00.0: changing MTU from 1500 to 2044
[  949.821377] igb 0000:81:00.0 enp129s0f0: igb: enp129s0f0 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: RX
[  950.903525] random: crng init done

 

Notice that the boot process is slow.
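To see where the time goes, the kernel timestamps can be compared line by line. The sketch below prints every log line that follows a gap longer than a chosen number of seconds. The function name `find_stalls` and the default threshold of 30 seconds are arbitrary choices for this example, and the log file is assumed to be a saved copy of the console output.

```shell
#!/bin/sh
# Sketch: report large jumps between consecutive kernel log timestamps.
# Usage: sh stalls.sh <console log file> [threshold in seconds]

find_stalls() {
    # $1 = threshold in seconds; the log is read from stdin.
    awk -v limit="$1" '
        match($0, /\[ *[0-9]+\.[0-9]+\]/) {
            # Drop the brackets; awk ignores the leading padding spaces
            # when converting the remaining text to a number.
            ts = substr($0, RSTART + 1, RLENGTH - 2) + 0
            if (seen && ts - prev > limit)
                printf "%.1f s gap before: %s\n", ts - prev, $0
            prev = ts
            seen = 1
        }
    '
}

# Only run when a log file is given on the command line.
if [ -n "$1" ]; then
    find_stalls "${2:-30}" < "$1"
fi
```

In the excerpt above, the largest jump is between the timestamps 635.267847 and 845.308189, so the messages around that point are a reasonable place to start looking.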

 

The machine has the following configuration:

2x Intel Xeon E5-2670
256GB RAM
Motherboard Supermicro X9DRG-HF
HDD 1TB
Mellanox Infiniband ConnectX-3 card (MT27500)
GPGPU nVidia Tesla M2075

 

I removed all off-board cards and the HDD, but the boot process still gets stuck at the same stage. When I installed CentOS 7 from the minimal ISO onto the HDD, the problem did not occur.

 

Regards,

 

--

_______________________________________________
xCAT-user mailing list
xCAT-user@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/xcat-user
 
 
