[Lustre-discuss] Speeding up configuration log regeneration?

2013-10-17 Thread Olli Lounela
Hi, We run four-node Lustre 2.3, and I needed to both change hardware under MGS/MDS and reassign an OSS ip. Just the same, I added a brand new 10GE network to the system, which was the reason for MDS hardware change. I ran tunefs.lustre --writeconf as per chapter 14.4 in Lustre Manual,

Re: [Lustre-discuss] Speeding up configuration log regeneration?

2013-10-17 Thread Dilger, Andreas
On 2013/10/17 5:34 AM, Olli Lounela olli.loun...@helsinki.fi wrote: Hi, We run four-node Lustre 2.3, and I needed to both change hardware under MGS/MDS and reassign an OSS ip. Just the same, I added a brand new 10GE network to the system, which was the reason for MDS hardware change. Note that

Re: [Lustre-discuss] lustre 1.8.5 client failed to mount lustre

2013-10-17 Thread Weilin Chang
I got the following message for lctl list_nids: Opening /dev/lnet failed: No such device Hint: the kernel modules may not be loaded IOC_LIBCFS_GET_NI error 19: No such device It looks like the Lustre package has been installed, but it does not hook into the kernel correctly. Which part went wrong?

Re: [Lustre-discuss] lustre 1.8.5 client failed to mount lustre

2013-10-17 Thread Weilin Chang
[root@localhost client]# uname -a Linux localhost.localdomain 2.6.18-194.el5PAE #1 SMP Fri Apr 2 15:37:44 EDT 2010 i686 i686 i386 GNU/Linux Is this version OK for Lustre 1.8.5? -Weilin -Original Message- From: Diep, Minh [mailto:minh.d...@intel.com] Sent: Thursday, October 17, 2013

Re: [Lustre-discuss] lustre 1.8.5 client failed to mount lustre

2013-10-17 Thread Dilger, Andreas
On 2013/10/16 4:45 PM, Weilin Chang weilin.ch...@huawei.com wrote: Hi, I am using Lustre 1.8.5. Servers are up and mounted without any problem. But the client failed to mount the Lustre file system. I also did not see lustre in /proc/filesystems. Is there another RPM I need to install on the

Re: [Lustre-discuss] lustre 1.8.5 client failed to mount lustre

2013-10-17 Thread Weilin Chang
lustre_rmmod failed with the following message: Open /proc/sys/lnet/dump_kernel failed: No such file or directory Open(dump_kernel) failed: No such file or directory The package did not install correctly, because the directory /proc/sys/lnet does not exist. -Weilin -Original Message- From:

Re: [Lustre-discuss] lustre 1.8.5 client failed to mount lustre

2013-10-17 Thread Weilin Chang
I agree with your observation. My purpose is to run Lustre on a 32-bit Linux system. The latest Lustre release does not support kernel patches for 32-bit Linux systems. I don't know how to build a 32-bit Linux kernel on a 64-bit Linux system without running into other problems. I think the

Re: [Lustre-discuss] lustre 1.8.5 client failed to mount lustre

2013-10-17 Thread Jeff Johnson
Weilin, Earlier in the week you were discussing trying to get Lustre running on 32-bit ARM. Are the systems on which you have installed 1.8.5 x86 systems, or are you doing this on ARM processor based platforms? --Jeff On 10/17/13 11:10 AM, Weilin Chang wrote: lustre_rmmod failed with the

Re: [Lustre-discuss] lustre 1.8.5 client failed to mount lustre

2013-10-17 Thread White, Cliff
On 10/17/13 11:10 AM, Weilin Chang weilin.ch...@huawei.com wrote: lustre_rmmod failed with the following message: Open /proc/sys/lnet/dump_kernel failed: No such file or directory Open(dump_kernel) failed: No such file or directory The package did not install correctly, because the directory

[Lustre-discuss] Broken communication between OSS and Client on Lustre 2.4

2013-10-17 Thread Eduardo Murrieta
Hello, this is my first post on this list. I hope someone can give me some advice on how to resolve the following issue. I'm using Lustre release 2.4.0 RC2 compiled from Whamcloud sources; this is an upgrade from Lustre 2.2.22 from the same sources. The situation is: There are several clients

Re: [Lustre-discuss] Broken communication between OSS and Client on Lustre 2.4

2013-10-17 Thread Jeff Johnson
Hola Eduardo, How are the OSTs connected to the OSS (SAS, FC, Infiniband SRP)? Are there any non-Lustre errors in the dmesg output of the OSS? Block device errors on the OSS (/dev/sd?)? If you are losing [scsi,sas,fc,srp] connectivity you may see this sort of thing. If the OSTs are connected to

Re: [Lustre-discuss] Broken communication between OSS and Client on Lustre 2.4

2013-10-17 Thread Eduardo Murrieta
Hello Jeff, No, this is a Lustre filesystem for Instituto de Ciencias Nucleares at UNAM. We are working on the installation for Alice at DGTIC too, but this problem is with our local filesystem. The OSTs are connected using an LSI-SAS controller; we have 8 OSTs on the same server, there are nodes

Re: [Lustre-discuss] Broken communication between OSS and Client on Lustre 2.4

2013-10-17 Thread Joseph Landman
Are there device or filesystem-level error messages on the server? This almost looks like a corrupted file system. Please pardon brevity and typos ... Sent from my iPhone On Oct 17, 2013, at 6:11 PM, Eduardo Murrieta emurri...@nucleares.unam.mx wrote: Hello Jeff, No, this is a Lustre

Re: [Lustre-discuss] Broken communication between OSS and Client on Lustre 2.4

2013-10-17 Thread Jeff Johnson
Ah, I understand. I performed the onsite Lustre installation of Alice and worked with JLG and his staff. Nice group of people! This seems like a backend issue: ldiskfs or the LSI RAID devices. Do you see any read/write failures reported on the OSS for the sd block devices where the OSTs reside?

Re: [Lustre-discuss] Broken communication between OSS and Client on Lustre 2.4

2013-10-17 Thread Eduardo Murrieta
I have this in the debug_file from my OSS: 0010:02000400:0.0:1382055634.785734:0:3099:0:(ost_handler.c:940:ost_brw_read()) lustre-OST: Bulk IO read error with 0afb2e4c-d870-47ef-c16f-4d2bce6dabf9 (at 10.2.64.4@o2ib), client will retry: rc -107
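
For readers following this thread, the quoted debug line can be split into its colon-separated fields with a few lines of Python. This is only a sketch: the field names (subsystem, mask, cpu, stack, pid, extpid) are assumptions inferred from the single line quoted above, not taken from Lustre documentation, and the errno table lists only the two Linux values that appear in this listing (19 and 107).

```python
import re

# Assumed layout, inferred from the one debug line quoted in this thread:
# subsystem:mask:cpu:timestamp:stack:pid:extpid:(file:line:func()) message
LINE_RE = re.compile(
    r"^(?P<subsystem>[0-9a-f]+):(?P<mask>[0-9a-f]+):(?P<cpu>[\d.]+[A-Z]?):"
    r"(?P<timestamp>\d+\.\d+):(?P<stack>\d+):(?P<pid>\d+):(?P<extpid>\d+):"
    r"\((?P<file>[^:]+):(?P<line>\d+):(?P<func>\w+)\(\)\)\s*(?P<message>.*)$"
)

# Linux errno values seen in this listing (hard-coded so the sketch does not
# depend on the errno numbering of the machine running it).
LINUX_ERRNO = {
    19: "ENODEV (No such device)",
    107: "ENOTCONN (Transport endpoint is not connected)",
}


def parse_debug_line(line):
    """Return a dict of fields for one debug line, or None if it does not match."""
    m = LINE_RE.match(line.strip())
    if m is None:
        return None
    rec = m.groupdict()
    # Pull out a trailing "rc -NNN" return code if the message carries one.
    rc = re.search(r"\brc (-?\d+)", rec["message"])
    if rc:
        rec["rc"] = int(rc.group(1))
        rec["rc_name"] = LINUX_ERRNO.get(abs(rec["rc"]), "unknown")
    return rec
```

Applied to the line above, this would report the source location ost_handler.c:940 in ost_brw_read() and decode rc -107 as ENOTCONN, that is, the bulk read failed because the connection to the client at 10.2.64.4@o2ib was not established at that moment.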

Re: [Lustre-discuss] Broken communication between OSS and Client on Lustre 2.4

2013-10-17 Thread Jeff Johnson
Eduardo, One or two E5506 CPUs in the OSS? What is the specific LSI controller and how many of them are in the OSS? I think the OSS is underprovisioned for 8 OSTs. I'm betting you run a high iowait on those sd devices during your problematic run. The iowait probably grows until deadlock. Can you