Thanks Song, I added the following kernel parameters:
debug ignore_loglevel log_buf_len=10M print_fatal_signals=1

The console output file is attached. I noticed that the machine's devices were not found in the udev database.

Regards,
Angelo Cavalcanti
br.linkedin.com/in/angelocr

Sent from the Gmail Android App

On Thu, Mar 14, 2019 at 06:41, Song BJ Yang <yang...@cn.ibm.com> wrote:

> Hi,
>
> We encountered a similar issue
> (https://github.com/xcat2/xcat-core/issues/274), but in that case the
> console output revealed the root cause.
>
> However, your console output does not show why the boot process hangs.
> I suggest you enable more verbose output during boot. This is a reference
> doc on how to get more debug information during kernel boot:
> https://www.askapache.com/linux/linux-debugging/
>
> To apply the kernel options when booting a diskless kernel, you can
> use the `addkcmdline` attribute. An example of addkcmdline usage:
>
> chdef mid05tor12cn05 addkcmdline="debug ignore_loglevel log_buf_len=10M print_fatal_signals=1 LOGLEVEL=8 earlyprintk=vga,keep sched_debug"
>
> Good luck.
>
> ------------------------------------------------------------------------------
> YANG Song (杨嵩)
> IBM China System Technology Laboratory
> Tel: 86-10-82452903
> Email: yang...@cn.ibm.com
> Address: Building 28, ZhongGuanCun Software Park,
> No.8, Dong Bei Wang West Road, Haidian District Beijing 100193, PRC
>
> 北京市海淀区东北旺西路8号中关村软件园28号楼
> 邮编: 100193
>
> ----- Original message -----
> From: Jarrod Johnson <jjohns...@lenovo.com>
> To: xCAT Users Mailing list <xcat-user@lists.sourceforge.net>
> Cc:
> Subject: Re: [xcat-user] [External] Netboot process stuck
> Date: Thu, Mar 14, 2019 5:33 AM
>
> What does the boot kernel get on its command line (e.g.
> /tftpboot/xcat/xnba/nodes/<nodename>)?
>
> *From:* Angelo Cavalcanti <angelo.cavalca...@gmail.com>
> *Sent:* Wednesday, March 13, 2019 4:23 PM
> *To:* xcat-user@lists.sourceforge.net
> *Subject:* [External] [xcat-user] Netboot process stuck
>
> Hi everyone,
>
> I've set up several nodes to netboot an image, but one of them gets stuck
> during the boot process, as shown below:
>
> [ 485.127448] systemd[1]: Reached target Sockets.
> [ 515.134248] systemd[1]: Started Journal Service.
> [ 635.252091] RPC: Registered named UNIX socket transport module.
> [ 635.258163] RPC: Registered udp transport module.
> [ 635.263004] RPC: Registered tcp transport module.
> [ 635.267847] RPC: Registered tcp NFSv4.1 backchannel transport module.
> [ 845.308189] pps_core: LinuxPPS API ver. 1 registered
> [ 845.313333] pps_core: Software ver. 5.3.6 - Copyright 2005-2007 Rodolfo Giometti <giome...@linux.it>
> [ 845.325629] PTP clock support registered
> [ 845.332035] dca service started, version 1.12.1
> [ 845.343852] mlx4_core: Mellanox ConnectX core driver v4.0-0
> [ 845.349585] mlx4_core: Initializing 0000:04:00.0
> [ 845.364042] igb: Intel(R) Gigabit Ethernet Network Driver - version 5.4.0-k
> [ 845.368351] ahci 0000:00:1f.2: AHCI 0001.0300 32 slots 6 ports 6 Gbps 0x3f impl SATA mode
> [ 845.368355] ahci 0000:00:1f.2: flags: 64bit ncq sntf pm led clo pio slum part ems apst
> [ 845.380249] scsi host0: ahci
> [ 845.380938] scsi host1: ahci
> [ 845.382757] scsi host2: ahci
> [ 845.383161] scsi host3: ahci
> [ 845.386805] scsi host4: ahci
> [ 845.387739] scsi host5: ahci
> [ 845.387852] ata1: SATA max UDMA/133 abar m2048@0xde100000 port 0xde100100 irq 39
> [ 845.387855] ata2: SATA max UDMA/133 abar m2048@0xde100000 port 0xde100180 irq 39
> [ 845.387857] ata3: SATA max UDMA/133 abar m2048@0xde100000 port 0xde100200 irq 39
> [ 845.387860] ata4: SATA max UDMA/133 abar m2048@0xde100000 port 0xde100280 irq 39
> [ 845.387862] ata5: SATA max UDMA/133 abar m2048@0xde100000 port 0xde100300 irq 39
> [ 845.387865] ata6: SATA max UDMA/133 abar m2048@0xde100000 port 0xde100380 irq 39
> [ 845.451810] igb: Copyright (c) 2007-2014 Intel Corporation.
> [ 845.511186] igb 0000:81:00.0: added PHC on eth0
> [ 845.515869] igb 0000:81:00.0: Intel(R) Gigabit Ethernet Network Connection
> [ 845.522896] igb 0000:81:00.0: eth0: (PCIe:5.0Gb/s:Width x4) 00:25:90:6c:a8:a2
> [ 845.530236] igb 0000:81:00.0: eth0: PBA No: 104900-000
> [ 845.535510] igb 0000:81:00.0: Using MSI-X interrupts. 8 rx queue(s), 8 tx queue(s)
> [ 845.598681] igb 0000:81:00.1: added PHC on eth1
> [ 845.603379] igb 0000:81:00.1: Intel(R) Gigabit Ethernet Network Connection
> [ 845.610404] igb 0000:81:00.1: eth1: (PCIe:5.0Gb/s:Width x4) 00:25:90:6c:a8:a3
> [ 845.617768] igb 0000:81:00.1: eth1: PBA No: 104900-000
> [ 845.623050] igb 0000:81:00.1: Using MSI-X interrupts. 8 rx queue(s), 8 tx queue(s)
> [ 845.694180] ata4: SATA link down (SStatus 0 SControl 300)
> [ 845.699730] ata3: SATA link down (SStatus 0 SControl 300)
> [ 845.705276] ata2: SATA link down (SStatus 0 SControl 300)
> [ 845.710813] ata1: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
> [ 845.717162] ata5: SATA link down (SStatus 0 SControl 300)
> [ 845.722738] ata6: SATA link down (SStatus 0 SControl 300)
> [ 845.728523] ata1.00: ATA-8: ST91000640NS, SN03, max UDMA/133
> [ 845.734330] ata1.00: 1953525168 sectors, multi 16: LBA48 NCQ (depth 31/32)
> [ 845.742162] ata1.00: configured for UDMA/133
> [ 845.746833] scsi 0:0:0:0: Direct-Access ATA ST91000640NS SN03 PQ: 0 ANSI: 5
> [ 845.791663] sd 0:0:0:0: [sda] 1953525168 512-byte logical blocks: (1.00 TB/931 GiB)
> [ 845.799600] sd 0:0:0:0: [sda] Write Protect is off
> [ 845.804545] sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
> [ 845.820557] sda: sda1 sda2
> [ 845.823892] sd 0:0:0:0: [sda] Attached SCSI disk
> [ 851.888095] mlx4_core 0000:04:00.0: Old device ETS support detected
> [ 851.894490] mlx4_core 0000:04:00.0: Consider upgrading device FW.
> [ 852.632012] mlx4_core 0000:04:00.0: PCIe link speed is 8.0GT/s, device supports 8.0GT/s
> [ 852.640265] mlx4_core 0000:04:00.0: PCIe link width is x8, device supports x8
> [ 852.787509] mlx4_en: Mellanox ConnectX HCA Ethernet driver v4.0-0
> [ 945.585085] igb 0000:81:00.0: changing MTU from 1500 to 2044
> [ 949.821377] igb 0000:81:00.0 enp129s0f0: igb: enp129s0f0 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: RX
> [ 950.903525] random: crng init done
>
> Notice that the boot process is slow.
>
> The machine has the following configuration:
>
> 2x Intel Xeon E5-2670
> 256GB RAM
> Motherboard Supermicro X9DRG-HF
> HDD 1TB
> Mellanox Infiniband ConnectX-3 card (MT27500)
> GPGPU NVIDIA Tesla M2075
>
> I removed all add-in cards and the HDD, but the boot process still gets
> stuck at the same stage. I installed CentOS 7 from the minimal ISO on the
> HDD, and the problem did not occur.
>
> Regards,
>
> --
> Angelo Cavalcanti
> br.linkedin.com/in/angelocr
> _______________________________________________
> xCAT-user mailing list
> xCAT-user@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/xcat-user
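
Since the quoted console output carries kernel timestamps, the long pauses in the boot can be measured rather than eyeballed. The following is only a minimal sketch in Python, assuming the console log is available as plain text with the usual "[ seconds ]" prefix on each line. The function names, the sample excerpt, and the 10-second threshold are my own illustrative choices and are not part of xCAT or the kernel.

```python
import re

# Matches the "[ 845.308189] message" prefix of a kernel log line.
# A leading "> " (mail quoting) is tolerated so quoted logs can be pasted in.
LINE_RE = re.compile(r"^[>\s]*\[\s*(\d+\.\d+)\]\s*(.*)$")


def parse_log(text):
    """Return a list of (timestamp_seconds, message) for each kernel log line."""
    entries = []
    for line in text.splitlines():
        match = LINE_RE.match(line)
        if match:
            entries.append((float(match.group(1)), match.group(2)))
    return entries


def largest_gaps(entries, threshold=10.0):
    """Return (gap_seconds, previous_message, next_message) tuples for
    consecutive entries more than `threshold` seconds apart, largest first.
    The default threshold is an arbitrary illustrative value."""
    gaps = []
    for (t_prev, msg_prev), (t_next, msg_next) in zip(entries, entries[1:]):
        gap = t_next - t_prev
        if gap > threshold:
            gaps.append((gap, msg_prev, msg_next))
    return sorted(gaps, key=lambda g: g[0], reverse=True)


# A short excerpt of the console output quoted in this thread.
SAMPLE = """\
[ 485.127448] systemd[1]: Reached target Sockets.
[ 515.134248] systemd[1]: Started Journal Service.
[ 635.252091] RPC: Registered named UNIX socket transport module.
[ 635.267847] RPC: Registered tcp NFSv4.1 backchannel transport module.
[ 845.308189] pps_core: LinuxPPS API ver. 1 registered
[ 852.787509] mlx4_en: Mellanox ConnectX HCA Ethernet driver v4.0-0
[ 945.585085] igb 0000:81:00.0: changing MTU from 1500 to 2044
"""

if __name__ == "__main__":
    for gap, before, after in largest_gaps(parse_log(SAMPLE)):
        print(f"{gap:8.1f}s stall after: {before}")
        print(f"          resumed with: {after}")
```

Running this against the full console file (read with open(...).read() in place of SAMPLE) lists the stalls largest first, which may help narrow down which stage to look at once the more verbose kernel output is available.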