Re: [Lustre-discuss] RE : Lustre-2.4 VMs (EL6.4)

2014-08-19 Thread Andreas Dilger
Often this problem is because the hostname in /etc/hosts is actually mapped to 
localhost on the node itself. 

Unfortunately, this is how some systems are set up by default. 
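A quick way to check is to look at what the hostname actually resolves to. Below is a self-contained sketch using a hypothetical hosts file; the hostname and addresses are examples only, not taken from any particular setup:

```shell
# Hypothetical /etc/hosts illustrating the problem: the node's hostname
# appears on the 127.0.0.1 line, so it resolves to localhost and LNet
# ends up talking to 0@lo.
cat > /tmp/hosts.bad <<'EOF'
127.0.0.1   lfs-server localhost localhost.localdomain
192.168.122.50   lfs-server
EOF
# The first matching line wins, so lfs-server resolves to 127.0.0.1:
awk '$2 == "lfs-server" { print $1; exit }' /tmp/hosts.bad
# The fix is to remove the hostname from the 127.0.0.1 line, leaving it
# only on the real address:
#   127.0.0.1        localhost localhost.localdomain
#   192.168.122.50   lfs-server
```

If the hostname shows up on the loopback line like this, removing it there (so it maps only to the real address) fixes the NID selection.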

Cheers, Andreas

> On Aug 19, 2014, at 12:39, "Abhay Dandekar"  wrote:
> 
> I came across a similar situation.

Re: [Lustre-discuss] Client build fails on Ubuntu 13.10 (3.11 kernel)

2014-08-19 Thread E.S. Rosenberg
On our Debian systems we built 2.4.x like so:

sh autogen.sh

./configure --disable-modules --disable-server --enable-client
--prefix=/path/to/prefix/you/want
The client kernel module already ships with 3.11, IIRC, though I don't know
which Lustre version the 3.11 in-kernel client corresponds to. We use a
self-built 3.14 kernel and will probably switch to 3.16 in a few weeks.

On our MDS/OSS nodes we use CentOS 6 with the Lustre RPMs (and OFED).

HTH,
Eli


___
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss


[Lustre-discuss] Client build fails on Ubuntu 13.10 (3.11 kernel)

2014-08-19 Thread Anjana Kar

Hi,

Has anyone succeeded in building the Lustre 2.5 client on an Ubuntu system?
After a "configure --disable-server", the make starts but fails rather
quickly with these errors:

lustre-release.2.5/libcfs/include/libcfs/linux/linux-prim.h:100:1: 
error: unknown type name ‘read_proc_t’

 typedef read_proc_t cfs_read_proc_t;
lustre-release.2.5/libcfs/include/libcfs/linux/linux-prim.h:101:1: 
error: unknown type name ‘write_proc_t’

 typedef write_proc_t cfs_write_proc_t;
 ^
...
lustre-release.2.5/libcfs/libcfs/linux/linux-tracefile.o] Error 1
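For context: `read_proc_t` and `write_proc_t` were removed from the kernel's procfs API in Linux 3.10, which is why this libcfs code cannot build against a 3.11 kernel. A tiny shell helper encoding that cutoff (the version strings here are just examples, not from any real build):

```shell
# read_proc_t existed only in kernels older than 3.10; GNU sort -V
# orders version strings, so compare against the 3.10 cutoff.
has_read_proc_t() {
    [ "$(printf '%s\n3.10\n' "$1" | sort -V | head -n1)" = "$1" ] && \
        [ "$1" != "3.10" ]
}
has_read_proc_t 3.8 && echo "3.8: read_proc_t present"
has_read_proc_t 3.11 || echo "3.11: read_proc_t removed"
```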

Thanks for any pointers.

-Anjana Kar
 Pittsburgh Supercomputing Center
 k...@psc.edu




[Lustre-discuss] RE : Lustre-2.4 VMs (EL6.4)

2014-08-19 Thread Abhay Dandekar
I came across a similar situation.

Below is the log of the machine state. These steps worked on some setups,
while on some they didn't.

Armaan,

Were you able to get over the problem? Any workaround?

Thanks in advance for all your help.


Warm Regards,
Abhay Dandekar


-- Forwarded message --
From: Abhay Dandekar 
Date: Wed, Aug 6, 2014 at 12:18 AM
Subject: Lustre configuration failure : lwp-MDT: Communicating with 0@lo,
operation mds_connect failed with -11.
To: lustre-discuss@lists.lustre.org



Hi All,

I have come across a Lustre installation failure where the MGS always tries
to reach the "lo" interface instead of the configured Ethernet interface.

The same steps worked on a different machine; somehow they are failing
here.

Here are the logs:

The Lustre installation succeeded, with all packages installed without
error.

0. Lustre version

Aug  5 23:07:37 lfs-server kernel: LNet: HW CPU cores: 1, npartitions: 1
Aug  5 23:07:37 lfs-server modprobe: FATAL: Error inserting crc32c_intel
(/lib/modules/2.6.32-431.17.1.el6_lustre.x86_64/kernel/arch/x86/crypto/crc32c-intel.ko):
No such device
Aug  5 23:07:37 lfs-server kernel: alg: No test for crc32 (crc32-table)
Aug  5 23:07:37 lfs-server kernel: alg: No test for adler32 (adler32-zlib)
Aug  5 23:07:41 lfs-server modprobe: FATAL: Error inserting padlock_sha
(/lib/modules/2.6.32-431.17.1.el6_lustre.x86_64/kernel/drivers/crypto/padlock-sha.ko):
No such device
Aug  5 23:07:41 lfs-server kernel: padlock: VIA PadLock Hash Engine not
detected.
Aug  5 23:07:45 lfs-server kernel: Lustre: Lustre: Build Version:
2.5.2-RC2--PRISTINE-2.6.32-431.17.1.el6_lustre.x86_64
Aug  5 23:07:45 lfs-server kernel: LNet: Added LNI 192.168.122.50@tcp
[8/256/0/180]
Aug  5 23:07:45 lfs-server kernel: LNet: Accept secure, port 988


1. Mkfs

[root@lfs-server ~]# mkfs.lustre --fsname=lustre --mgs --mdt --index=0
/dev/sdb

   Permanent disk data:
Target: lustre:MDT
Index:  0
Lustre FS:  lustre
Mount type: ldiskfs
Flags:  0x65
  (MDT MGS first_time update )
Persistent mount opts: user_xattr,errors=remount-ro
Parameters:

checking for existing Lustre data: not found
device size = 10240MB
formatting backing filesystem ldiskfs on /dev/sdb
target name  lustre:MDT
4k blocks 2621440
options-J size=400 -I 512 -i 2048 -q -O
dirdata,uninit_bg,^extents,dir_nlink,quota,huge_file,flex_bg -E
lazy_journal_init -F
mkfs_cmd = mke2fs -j -b 4096 -L lustre:MDT  -J size=400 -I 512 -i 2048
-q -O dirdata,uninit_bg,^extents,dir_nlink,quota,huge_file,flex_bg -E
lazy_journal_init -F /dev/sdb 2621440
Aug  5 17:16:47 lfs-server kernel: LDISKFS-fs (sdb): mounted filesystem
with ordered data mode. quota=on. Opts:
Writing CONFIGS/mountdata
[root@lfs-server ~]#

2. Mount

[root@lfs-server ~]# mount -t lustre /dev/sdb /mnt/mgs
Aug  5 17:18:01 lfs-server kernel: LDISKFS-fs (sdb): mounted filesystem
with ordered data mode. quota=on. Opts:
Aug  5 17:18:01 lfs-server kernel: LDISKFS-fs (sdb): mounted filesystem
with ordered data mode. quota=on. Opts:
Aug  5 17:18:02 lfs-server kernel: Lustre: ctl-lustre-MDT: No data
found on store. Initialize space
Aug  5 17:18:02 lfs-server kernel: Lustre: lustre-MDT: new disk,
initializing
Aug  5 17:18:02 lfs-server kernel: Lustre: MGS: non-config logname
received: params
Aug  5 17:18:02 lfs-server kernel: LustreError: 11-0:
lustre-MDT-lwp-MDT: Communicating with 0@lo, operation mds_connect
failed with -11.
[root@lfs-server ~]#
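For what it's worth, Lustre logs kernel errnos negated, so "failed with -11" is errno 11 (EAGAIN), which typically indicates the connection attempt will be retried rather than a hard failure. A quick way to decode such codes (python3 is used here purely as an errno lookup table):

```shell
# "failed with -11": Lustre prints negated kernel errnos, so look up 11.
python3 -c 'import errno, os; print(errno.errorcode[11], "-", os.strerror(11))'
```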


3. Unmount
[root@lfs-server ~]# umount /dev/sdb
Aug  5 17:19:46 lfs-server kernel: Lustre: Failing over lustre-MDT
Aug  5 17:19:52 lfs-server kernel: Lustre:
1338:0:(client.c:1908:ptlrpc_expire_one_request()) @@@ Request sent has
timed out for slow reply: [sent 1407239386/real 1407239386]
req@88003d795c00 x1475596948340888/t0(0) o251->MGC192.168.122.50@tcp
@0@lo:26/25 lens 224/224 e 0 to 1 dl 1407239392 ref 2 fl Rpc:XN/0/
rc 0/-1
[root@lfs-server ~]# Aug  5 17:19:53 lfs-server kernel: Lustre: server
umount lustre-MDT complete

[root@lfs-server ~]#


4. [root@mgs ~]# cat /etc/modprobe.d/lustre.conf
options lnet networks=tcp(eth0)
[root@mgs ~]#
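If hostname resolution is what is steering LNet to loopback, the NID can also be pinned independently of /etc/hosts. Both lines below use standard lnet module options, but the interface name and address range are assumptions for this setup, and the modules must be reloaded after editing:

```
# /etc/modprobe.d/lustre.conf -- hypothetical alternatives (pick one):
options lnet networks=tcp0(eth0)
# or select the interface by its address range instead of its name:
options lnet ip2nets="tcp0 192.168.122.*"
```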

5. Even though the LNet configuration is in place, it does not pick up the
required eth0.

[root@mgs ~]# lctl dl
  0 UP osd-ldiskfs lustre-MDT-osd lustre-MDT-osd_UUID 8
  1 UP mgs MGS MGS 5
  2 UP mgc MGC192.168.122.50@tcp c6ea84c0-b3b2-9d25-8126-32d85956ae4d 5
  3 UP mds MDS MDS_uuid 3
  4 UP lod lustre-MDT-mdtlov lustre-MDT-mdtlov_UUID 4
  5 UP mdt lustre-MDT lustre-MDT_UUID 5
  6 UP mdd lustre-MDD lustre-MDD_UUID 4
  7 UP qmt lustre-QMT lustre-QMT_UUID 4
  8 UP lwp lustre-MDT-lwp-MDT lustre-MDT-lwp-MDT_UUID 5
[root@mgs ~]#

Any pointers on how to proceed?


Warm Regards,
Abhay Dandekar