Re: [Lustre-discuss] RE : Lustre-2.4 VMs (EL6.4)
Often this problem is because the hostname in /etc/hosts is actually mapped to localhost on the node itself. Unfortunately, this is how some systems are set up by default.

Cheers, Andreas

> On Aug 19, 2014, at 12:39, "Abhay Dandekar" wrote:
>
> I came across a similar situation. Below is a log of the machine state.
> These steps worked on some setups, while on others they didn't.
>
> Armaan, were you able to get past the problem? Any workaround?
>
> Thanks in advance for all your help.
>
> Warm Regards,
> Abhay Dandekar
>
> [...]
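A quick way to test for the localhost mapping Andreas describes, and to see which NID LNET actually bound to (a minimal sketch; the hostname lfs-server and the address 192.168.122.50 are taken from the logs in this thread, and getent/lctl are assumed to be available on the server):

[root@lfs-server ~]# getent hosts lfs-server
127.0.0.1       localhost lfs-server          <- symptom: hostname resolves to loopback

A corrected /etc/hosts would map the hostname to the real interface address instead:

127.0.0.1       localhost
192.168.122.50  lfs-server

[root@lfs-server ~]# lctl list_nids           <- after fixing /etc/hosts and reloading the modules
192.168.122.50@tcp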
Re: [Lustre-discuss] Client build fails on Ubuntu 13.10 (3.11 kernel)
On our Debian systems we built 2.4.x like so:

sh autogen.sh
./configure --disable-modules --disable-server --enable-client --prefix=/path/to/prefix/you/want

A Lustre client kernel module already ships with the 3.11 kernel IIRC, though I don't know what the Lustre version compatibility of 3.11 is; we use a self-built 3.14 kernel and will probably switch to 3.16 in a few weeks. On our MDS/OSS nodes we use CentOS 6 with the Lustre RPMs (and OFED).

HTH,
Eli

On Tue, Aug 19, 2014 at 6:43 PM, Anjana Kar wrote:
> Hi,
>
> Has anyone succeeded in building the Lustre 2.5 client on an Ubuntu system?
> After a "configure --disable-server", the make starts, but fails rather
> quickly with these errors:
>
> [...]
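Spelled out end to end, the client-only build Eli describes would look roughly like this (a sketch, not a tested recipe: the repository URL and branch are illustrative, and build-essential plus the headers for the running kernel are assumed to be installed):

# Fetch the Lustre source (URL and branch are illustrative)
git clone git://git.hpdd.intel.com/fs/lustre-release.git
cd lustre-release
git checkout b2_4

# Userspace-only client build, as described above
sh autogen.sh
./configure --disable-modules --disable-server --enable-client --prefix=/opt/lustre-client
make -j"$(nproc)"
sudo make install

With --disable-modules this builds only the userspace tools; per Eli's note, the client kernel module on 3.11+ comes from the kernel itself.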
[Lustre-discuss] Client build fails on Ubuntu 13.10 (3.11 kernel)
Hi,

Has anyone succeeded in building the Lustre 2.5 client on an Ubuntu system?
After a "configure --disable-server", the make starts, but fails rather
quickly with these errors:

lustre-release.2.5/libcfs/include/libcfs/linux/linux-prim.h:100:1: error: unknown type name ‘read_proc_t’
 typedef read_proc_t cfs_read_proc_t;
lustre-release.2.5/libcfs/include/libcfs/linux/linux-prim.h:101:1: error: unknown type name ‘write_proc_t’
 typedef write_proc_t cfs_write_proc_t;
 ^
...
lustre-release.2.5/libcfs/libcfs/linux/linux-tracefile.o] Error 1

Thanks for any pointers.

-Anjana Kar
Pittsburgh Supercomputing Center
k...@psc.edu
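For context on why these particular errors appear (an observation, not from the thread): the read_proc_t/write_proc_t typedefs were removed from the kernel's proc_fs.h during the procfs interface rework (around kernel 3.10), so a Lustre tree that still defines cfs_read_proc_t cannot compile against a 3.11 kernel. A quick check against the installed headers (sketch; the header path is the usual Ubuntu location):

grep -n read_proc_t "/usr/src/linux-headers-$(uname -r)/include/linux/proc_fs.h"
# Prints nothing on 3.11; on older kernels the same grep finds the typedef, e.g.:
#   typedef int (read_proc_t)(char *page, char **start, off_t off, int count, int *eof, void *data);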
[Lustre-discuss] RE : Lustre-2.4 VMs (EL6.4)
I came across a similar situation. Below is a log of the machine state. These steps worked on some setups, while on others they didn't.

Armaan, were you able to get past the problem? Any workaround?

Thanks in advance for all your help.

Warm Regards,
Abhay Dandekar

-- Forwarded message --
From: Abhay Dandekar
Date: Wed, Aug 6, 2014 at 12:18 AM
Subject: Lustre configuration failure : lwp-MDT: Communicating with 0@lo, operation mds_connect failed with -11.
To: lustre-discuss@lists.lustre.org

Hi All,

I have come across a Lustre installation failure where the MGS always tries to reach the "lo" interface instead of the configured Ethernet interface. The same steps worked on a different machine; somehow they are failing here.

Here are the logs. The Lustre installation itself succeeds, with all packages installed without any error.

0. Lustre version

Aug 5 23:07:37 lfs-server kernel: LNet: HW CPU cores: 1, npartitions: 1
Aug 5 23:07:37 lfs-server modprobe: FATAL: Error inserting crc32c_intel (/lib/modules/2.6.32-431.17.1.el6_lustre.x86_64/kernel/arch/x86/crypto/crc32c-intel.ko): No such device
Aug 5 23:07:37 lfs-server kernel: alg: No test for crc32 (crc32-table)
Aug 5 23:07:37 lfs-server kernel: alg: No test for adler32 (adler32-zlib)
Aug 5 23:07:41 lfs-server modprobe: FATAL: Error inserting padlock_sha (/lib/modules/2.6.32-431.17.1.el6_lustre.x86_64/kernel/drivers/crypto/padlock-sha.ko): No such device
Aug 5 23:07:41 lfs-server kernel: padlock: VIA PadLock Hash Engine not detected.
Aug 5 23:07:45 lfs-server kernel: Lustre: Lustre: Build Version: 2.5.2-RC2--PRISTINE-2.6.32-431.17.1.el6_lustre.x86_64
Aug 5 23:07:45 lfs-server kernel: LNet: Added LNI 192.168.122.50@tcp [8/256/0/180]
Aug 5 23:07:45 lfs-server kernel: LNet: Accept secure, port 988

1. Mkfs

[root@lfs-server ~]# mkfs.lustre --fsname=lustre --mgs --mdt --index=0 /dev/sdb

   Permanent disk data:
Target:     lustre:MDT0000
Index:      0
Lustre FS:  lustre
Mount type: ldiskfs
Flags:      0x65
            (MDT MGS first_time update )
Persistent mount opts: user_xattr,errors=remount-ro
Parameters:

checking for existing Lustre data: not found
device size = 10240MB
formatting backing filesystem ldiskfs on /dev/sdb
        target name  lustre:MDT0000
        4k blocks    2621440
        options      -J size=400 -I 512 -i 2048 -q -O dirdata,uninit_bg,^extents,dir_nlink,quota,huge_file,flex_bg -E lazy_journal_init -F
mkfs_cmd = mke2fs -j -b 4096 -L lustre:MDT0000 -J size=400 -I 512 -i 2048 -q -O dirdata,uninit_bg,^extents,dir_nlink,quota,huge_file,flex_bg -E lazy_journal_init -F /dev/sdb 2621440
Aug 5 17:16:47 lfs-server kernel: LDISKFS-fs (sdb): mounted filesystem with ordered data mode. quota=on. Opts:
Writing CONFIGS/mountdata
[root@lfs-server ~]#

2. Mount

[root@lfs-server ~]# mount -t lustre /dev/sdb /mnt/mgs
Aug 5 17:18:01 lfs-server kernel: LDISKFS-fs (sdb): mounted filesystem with ordered data mode. quota=on. Opts:
Aug 5 17:18:01 lfs-server kernel: LDISKFS-fs (sdb): mounted filesystem with ordered data mode. quota=on. Opts:
Aug 5 17:18:02 lfs-server kernel: Lustre: ctl-lustre-MDT0000: No data found on store. Initialize space
Aug 5 17:18:02 lfs-server kernel: Lustre: lustre-MDT0000: new disk, initializing
Aug 5 17:18:02 lfs-server kernel: Lustre: MGS: non-config logname received: params
Aug 5 17:18:02 lfs-server kernel: LustreError: 11-0: lustre-MDT0000-lwp-MDT0000: Communicating with 0@lo, operation mds_connect failed with -11.
[root@lfs-server ~]#

3. Unmount

[root@lfs-server ~]# umount /dev/sdb
Aug 5 17:19:46 lfs-server kernel: Lustre: Failing over lustre-MDT0000
Aug 5 17:19:52 lfs-server kernel: Lustre: 1338:0:(client.c:1908:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1407239386/real 1407239386] req@88003d795c00 x1475596948340888/t0(0) o251->MGC192.168.122.50@tcp@0@lo:26/25 lens 224/224 e 0 to 1 dl 1407239392 ref 2 fl Rpc:XN/0/ rc 0/-1
Aug 5 17:19:53 lfs-server kernel: Lustre: server umount lustre-MDT0000 complete
[root@lfs-server ~]#

4. LNET module options

[root@mgs ~]# cat /etc/modprobe.d/lustre.conf
options lnet networks=tcp(eth0)
[root@mgs ~]#

5. Even though the LNET configuration is in place, it does not pick up the required eth0.

[root@mgs ~]# lctl dl
  0 UP osd-ldiskfs lustre-MDT0000-osd lustre-MDT0000-osd_UUID 8
  1 UP mgs MGS MGS 5
  2 UP mgc MGC192.168.122.50@tcp c6ea84c0-b3b2-9d25-8126-32d85956ae4d 5
  3 UP mds MDS MDS_uuid 3
  4 UP lod lustre-MDT0000-mdtlov lustre-MDT0000-mdtlov_UUID 4
  5 UP mdt lustre-MDT0000 lustre-MDT0000_UUID 5
  6 UP mdd lustre-MDD0000 lustre-MDD0000_UUID 4
  7 UP qmt lustre-QMT0000 lustre-QMT0000_UUID 4
  8 UP lwp lustre-MDT0000-lwp-MDT0000 lustre-MDT0000-lwp-MDT0000_UUID 5
[root@mgs ~]#

Any pointers to go ahead?

Warm Regards,
Abhay Dandekar
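To rule LNET itself out, the bound NID can be checked directly after reloading the modules (a sketch; lustre_rmmod ships with the Lustre utilities, and no targets should be mounted while doing this):

[root@mgs ~]# lustre_rmmod            # unload the whole stack so lustre.conf is re-read
[root@mgs ~]# modprobe lnet
[root@mgs ~]# lctl network up
LNET configured
[root@mgs ~]# lctl list_nids          # should show the eth0 address, not 0@lo
192.168.122.50@tcp

Here the output matches the "LNet: Added LNI 192.168.122.50@tcp" line in the logs above, so LNET did bind to eth0; the 0@lo traffic comes from the node talking to itself, which is why the /etc/hosts mapping suggested earlier in the thread is worth checking.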