[lustre-discuss] upgrading from 2.5.2 to 2.9

2017-03-29 Thread Anjana Kar

Hi,

Has anyone upgraded lustre servers from 2.5.x to 2.9?

Our setup is one MDS (ldiskfs) hosting two MDTs, and one OSS (zfs OSTs)

running version 2.5.2 on Centos 6.x for two lustre filesystems, ~350TB.

Recently writes from a 2.9 client has been crashing the MDS, which prompted

us to look into the upgrade path. Is this upgrade feasible, and if so, 
is there


documentation we can follow for backing up the MDS, and doing the upgrade.

Thanks in advance,

-Anjana Kar

 Pittsburgh Supercomputing Center

 k...@psc.edu

___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org


[lustre-discuss] Lustre 2.5.x on Ubuntu 14.04

2015-12-07 Thread Anjana Kar

The compile fails on a 3.13 kernel with errors in libcfs.
Are patches available to fix these errors?

~/lustre-release/libcfs/include/libcfs/linux/linux-prim.h:100:1:
error: unknown type name ‘read_proc_t’
 typedef read_proc_t cfs_read_proc_t;

~/lustre-release/libcfs/include/libcfs/params_tree.h:85:17:
error: dereferencing pointer to incomplete type
  spin_lock(&(dp)->pde_unload_lock);


Thanks,
-Anjana Kar
 Pittsburgh Supercomputing Center
 k...@psc.edu


___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org


[lustre-discuss] removing zpool OSTs from a filesystem

2015-08-30 Thread Anjana Kar
After adding 4 new zpools/OSTs to an existing lustre filesystem, writes 
have

been hanging intermittently with similar messages showing up in the logs:

sd 2:0:5:0: [sdcr] Unhandled error code
sd 2:0:5:0: [sdcr] Result: hostbyte=DID_SOFT_ERROR driverbyte=DRIVER_OK
sd 2:0:5:0: [sdcr] CDB: Write(10): 2a 00 00 2b 06 70 00 00 f0 00
LustreError: 1252:0:(ost_handler.c:1775:ost_blocking_ast()) Error -2 
syncing data on lock cancel


The writes seem to recover after a while.
Has anyone seen this before? New disks are same as the existing ones, 
4TB Seagates.

Lustre version: 2.5.2

Since there is no real data on the OSTs, we are thinking of setting up 
a new test
filesystem with these 4 OSTs and a separate MDT. The new OSTs have been 
deactivated,
but they are still mounted according to zpool status on the OSS, and 
filesystem size

(df) reflects the increased size.

Should we try to  remove the new OSTs from the existing filesystem 
before creating the
test filesystem? Any pointers on how to do this would be greatly 
appreciated. If there

a better way to approach this problem please let me know.

Thanks in advance,
-Anjana Kar
 Pittsburgh Supercomputing Center
 k...@psc.edu
___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org


[Lustre-discuss] zero length files on lustre

2014-10-08 Thread Anjana Kar

After some low level SAS expander firmware updates on our JetStor shelves,
the lustre filesystem is reporting several zero length files. The raids show
normal status, and all OSTs mount on the OSS nodes w/o problems. The
clients also mount the filesystem. Read/write tests from the clients report
no errors, but several files which had content before are showing up zero
length. These files appear to be on different OSTs so it's not on any 
particular

shelf. Rebooting the servers and a few clients hasn't shown any change.
Servers are running CentOS 5.5, lustre 1.8.3.

Could low level disk settings cause lustre to report zero length files?
There have been no software changes on these systems, so not sure
what could be causing this. Any thoughts?

Thanks,
-Anjana Kar
 Pittsburgh Supercomputing Center
 k...@psc.edu
___
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss


[Lustre-discuss] Client build fails on Ubuntu 13.10 (3.11 kernel)

2014-08-19 Thread Anjana Kar

Hi,

Has anyone succeeded in building lustre 2.5 client on an Ubuntu system.
After a configure --disable-server, the make starts, but fails rather 
quickly

with these errors

lustre-release.2.5/libcfs/include/libcfs/linux/linux-prim.h:100:1: 
error: unknown type name ‘read_proc_t’

 typedef read_proc_t cfs_read_proc_t;
lustre-release.2.5/libcfs/include/libcfs/linux/linux-prim.h:101:1: 
error: unknown type name ‘write_proc_t’

 typedef write_proc_tcfs_write_proc_t;
 ^
...
lustre-release.2.5/libcfs/libcfs/linux/linux-tracefile.o] Error 1

Thanks for any pointers.

-Anjana Kar
 Pittsburgh Supercomputing Center
 k...@psc.edu


___
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss


Re: [Lustre-discuss] number of inodes in zfs MDT

2014-06-18 Thread Anjana Kar

We just finished testing zfs 0.6.3 and lustre 2.5.2 with zfs MDT  OSTs, but
still ran into the problem of running out of inodes on the MDT. The number
started at about 7.9M, and grew to 8.9M, but not beyond that, the MDT
being on a mirrored zpool of 2 80GB SSD drives. The filesystem size was 97TB
with 8 13TB raidz2 OSTs on a shared MDS/OSS node, and a second OSS.

It took ~5000 seconds to run out of inodes in our empty file test. But,
during that time it averaged about 1650/sec which is the best we've seen.
I'm not sure why the inodes have been an issue, but we ran out of time to
pursue this further.

Instead we have moved to ldiskfs MDT and zfs OSTs, with the same lustre/zfs
versions, and have a lot more inodes available.

FilesystemInodes   IUsed   IFree IUse% Mounted on
x.x.x.x@o2ib:/iconfs
 39049920 7455386 31594534   20% /iconfs

Performance has been reportedly better, but one problem was that when
the OSS nodes went down before the OSTs could be taken offline (as would
happen during a power outage), OSTs failed to mount after the reboot.

To get around that we added a zpool import -f line after the message
Unexpected return code from import of pool $pool in the lustre startup 
script
so the pools mount, then ran the lustre startup script to start the 
OSTs. If

there is a better way to handle this please let me know.

Another problem we ran into is that our 1.8.9 clients could not write 
into the new
filesystem with lustre 2.5.60 which came from 
git.hpdd.intel.com/fs/lustre-release.git.
Things worked after checking out track -b b2_5 origin/b2_5, and 
rebuilding kernel

for ldiskfs. OS on the lustre servers is CentOS 6.5, kernel 2.6.32-431.17.1.

Thanks again for all the responses.

-Anjana

On 06/12/2014 09:43 PM, Scott Nolin wrote:

Just a note, I see zfs-0.6.3 has just been annoounced:

https://groups.google.com/a/zfsonlinux.org/forum/#!topic/zfs-announce/Lj7xHtRVOM4 



I also see it is upgraded in the zfs/lustre repo.

The changelog notes the default as changed to 3/4 arc_c_max and a 
variety of other fixes, many focusing on performance.


So Anjana this is probably worth testing, especially if you're 
considering drastic measures.


We upgraded for our MDS, so this file create issue is harder for us to 
test now (literally started testing writes this afternoon, and it's 
not degraded yet, so far at 20 million writes). Since your problem 
still happens fairly quickly I'm sure any information you have will be 
very helpful to add to LU-2476. And if it helps, it may save you some 
pain.


We will likely install the upgrade but may not be able to test 
millions of writes any time soon, as the filesystem is needed for 
production.


Regards,
Scott


On Thu, 12 Jun 2014 16:41:14 +
 Dilger, Andreas andreas.dil...@intel.com wrote:
It looks like you've already increased arc_meta_limit beyond the 
default, which is c_max / 4. That was critical to performance in our 
testing.


There is also a patch from Brian that should help performance in your 
case:

http://review.whamcloud.com/10237

Cheers, Andreas

On Jun 11, 2014, at 12:53, Scott Nolin 
scott.no...@ssec.wisc.edumailto:scott.no...@ssec.wisc.edu wrote:


We tried a few arc tunables as noted here:

https://jira.hpdd.intel.com/browse/LU-2476

However, I didn't find any clear benefit in the long term. We were 
just trying a few things without a lot of insight.


Scott

On 6/9/2014 12:37 PM, Anjana Kar wrote:
Thanks for all the input.

Before we move away from zfs MDT, I was wondering if we can try 
setting zfs
tunables to test the performance. Basically what's a value we can use 
for
arc_meta_limit for our system? Are there are any others settings that 
can

be changed?

Generating small files on our current system, things started off at 500
files/sec,
then declined so it was about 1/20th of that after 2.45 million files.

-Anjana

On 06/09/2014 10:27 AM, Scott Nolin wrote:
We ran some scrub performance tests, and even without tunables set it
wasn't too bad, for our specific configuration. The main thing we did
was verify it made sense to scrub all OSTs simultaneously.

Anyway, indeed scrub or resilver aren't about Defrag.

Further, the mds performance issues aren't about fragmentation.

A side note, it's probably ideal to stay below 80% due to
fragmentation for ldiskfs too or performance degrades.

Sean, note I am dealing with specific issues for a very create intense
workload, and this is on the mds only where we may change. The data
integrity features of Zfs make it very attractive too. I fully expect
things will improve too with Zfs.

If you want a lot of certainty in your choices, you may want to
consult various vendors if lustre systems.

Scott




On June 8, 2014 11:42:15 AM CDT, Dilger, Andreas
andreas.dil...@intel.commailto:andreas.dil...@intel.com wrote:

  Scrub and resilver have nothing to so with defrag.

  Scrub is scanning of all the data blocks in the pool to verify 
their checksums and parity

[Lustre-discuss] directory creation fails on 1.8.9wc1 client

2014-06-12 Thread Anjana Kar

The server is running
build:  2.5.59-g47cde80-CHANGED-2.6.32-431.17.1.el6_lustre.netboot,

MDT is ldiskfs, OSTs are zfs.

The filesystem mounts on a client running
v1_8_9_WC1-g171bd56-CHANGED-2.6.32-431.17.1.el6.x86_64

We're not able to create directories from the client as root or user..

As root, it gives this message:
mkdir: cannot create directory `test2': Unknown error 524

As a user, we're able to create files at the top level, but no directories:
[kar@kollman2 kar]$ touch test2
[kar@kollman2 kar]$ mkdir test3
mkdir: cannot create directory `test3': Operation not permitted

Filesystem Inodes IUsedIFree IUse% Mounted on
10.10.101.160@o2ib:/iconfs
 39049920   273 390496471% /iconfs

There were no errors during the kernel or lustre builds.
Any ideas what the problem might be?

Thanks in advance,
-Anjana Kar
 Pittsburgh Supercomputing Center
 k...@psc.edu
___
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss


Re: [Lustre-discuss] number of inodes in zfs MDT

2014-06-09 Thread Anjana Kar

Thanks for all the input.

Before we move away from zfs MDT, I was wondering if we can try setting zfs
tunables to test the performance. Basically what's a value we can use for
arc_meta_limit for our system? Are there are any others settings that can
be changed?

Generating small files on our current system, things started off at 500 
files/sec,

then declined so it was about 1/20th of that after 2.45 million files.

-Anjana

On 06/09/2014 10:27 AM, Scott Nolin wrote:
We ran some scrub performance tests, and even without tunables set it 
wasn't too bad, for our specific configuration. The main thing we did 
was verify it made sense to scrub all OSTs simultaneously.


Anyway, indeed scrub or resilver aren't about Defrag.

Further, the mds performance issues aren't about fragmentation.

A side note, it's probably ideal to stay below 80% due to 
fragmentation for ldiskfs too or performance degrades.


Sean, note I am dealing with specific issues for a very create intense 
workload, and this is on the mds only where we may change. The data 
integrity features of Zfs make it very attractive too. I fully expect 
things will improve too with Zfs.


If you want a lot of certainty in your choices, you may want to 
consult various vendors if lustre systems.


Scott




On June 8, 2014 11:42:15 AM CDT, Dilger, Andreas 
andreas.dil...@intel.com wrote:


Scrub and resilver have nothing to so with defrag.

Scrub is scanning of all the data blocks in the pool to verify their 
checksums and parity to detect silent data corruption, and rewrite the bad 
blocks if necessary.

Resilver is reconstructing a failed disk onto a new disk using parity or 
mirror copies of all the blocks on the failed disk. This is similar to scrub.

Both scrub and resilver can be done online, though resilver of course 
requires a spare disk to rebuild onto, which may not be possible to add to a 
running system if your hardware does not support it.

Both of them do not improve the performance or layout of data on disk. 
They do impact performance because they cause a lot if random IO to the disks, though 
this impact can be limited by tunables on the pool.

Cheers, Andreas

On Jun 8, 2014, at 4:21, Sean Brisbane 
s.brisba...@physics.ox.ac.ukmailto:s.brisba...@physics.ox.ac.uk wrote:

Hi Scott,

We are considering running zfs backed lustre and the factor of 10ish 
performance hit you see worries me. I know zfs can splurge bits of files all 
over the place by design. The oracle docs do recommend scrubbing the volumes 
and keeping usage below 80% for maintenance and performance reasons, I'm going 
to call it 'defrag' but I'm sure someone who knows better will probably correct 
me as to why it is not the same.
So are these performance issues after scubbing and is it possible to scrub 
online - I.e. some reasonable level of performance is maintained while the 
scrub happens?
Resilvering is also recommended. Not sure if that is for performance 
reasons.

http://docs.oracle.com/cd/E23824_01/html/821-1448/zfspools-4.html



Sent from my HTC Desire C on Three

- Reply message -
From: Scott Nolin 
scott.no...@ssec.wisc.edumailto:scott.no...@ssec.wisc.edu
To: Anjana Kar k...@psc.edumailto:k...@psc.edu, 
lustre-discuss@lists.lustre.orgmailto:lustre-discuss@lists.lustre.org 
lustre-discuss@lists.lustre.orgmailto:lustre-discuss@lists.lustre.org
Subject: [Lustre-discuss] number of inodes in zfs MDT
Date: Fri, Jun 6, 2014 3:23 AM



Looking at some of our existing zfs filesystems, we have a couple with zfs 
mdts

One has 103M inodes and uses 152G of MDT space, another 12M and 19G. I’d 
plan for less than that I guess as Mr. Dilger suggests. It all depends on your 
expected average file size and number of files for what will work.

We have run into some unpleasant surprises with zfs for the MDT, I believe 
mostly documented in bug reports, or at least hinted at.

A serious issue we have is performance of the zfs arc cache over time. This 
is something we didn’t see in early testing, but with enough use it grinds 
things to a crawl. I believe this may be addressed in the newer version of ZFS, 
which we’re hopefully awaiting.

Another thing we’ve seen, which is mysterious to me is this it appears hat 
as the MDT begins to fill up file create rates go down. We don’t really have a 
strong handle on this (not enough for a bug report I think), but we see this:


   1.
The aforementioned 104M inode / 152GB MDT system has 4 SAS drives raid10. 
On initial testing file creates were about 2500 to 3000 IOPs per second. Follow 
up testing in it’s current state (about half full..) shows them at about 500 
IOPs now, but with a few iterations of mdtest those IOPs plummet quickly to 
unbearable levels (like 30…).
   2.
We took a snapshot of the filesystem and sent it to the backup MDS, this 
time with the MDT built on 4 SAS drives in a raid0 - really

[Lustre-discuss] number of inodes in zfs MDT

2014-06-03 Thread Anjana Kar

Is there a way to set the number of inodes for zfs MDT?

I've tried using --mkfsoptions=-N value mentioned in lustre 2.0 
manual, but it
fails to accept it. We are mirroring 2 80GB SSDs for the MDT, but the 
number of
inodes is getting set to 7 million, which is not enough for a 100TB 
filesystem.


Thanks in advance.

-Anjana Kar
 Pittsburgh Supercomputing Center
 k...@psc.edu
___
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss


[Lustre-discuss] lustre 2.5.58 MDT 100% full

2014-06-02 Thread Anjana Kar

The MGS/MDT pool and filesystem were created using these commands:

zpool create -f -o ashift=12 -O canmount=off lustre-mgs-mdt
mirror /dev/disk/by-path/pci-:00:1f.2-scsi-0:0:0:0 
/dev/disk/by-path/pci-:

00:1f.2-scsi-1:0:0:0

mkfs.lustre --mgs --mdt --fsname=iconfs --backfstype=zfs --dev
ice-size 131072 --index=0 lustre-mgs-mdt/mgsmdt0

However, now the MDT is full, and 8 OSTs are 86% full.

lustre-mgs-mdt/mgsmdt0
73G   73G 0 100% /mnt/lustre/local/mgsmdt
lustre-ost0/ost0   13T   11T  1.8T  86% /mnt/lustre/local/ost0


Is there a way to clear space, or will the filesystem/MDT need rebuilt?

Previously, the same disks were used for creating a mirrored MDT w/o
-ashift=12, and did not fill up even when the OSTs were at 89%.
Thanks for any advice.

-Anjana Kar
 Pittsburgh Supercomputing Center
 k...@psc.edu
___
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss


Re: [Lustre-discuss] Fwd: MDT fails to mount after lustre upgrade

2014-05-01 Thread Anjana Kar

This one is lustre over zfs.
Running tunefs.lustre with writeconf didn't seem to help either.
Still get the same messages trying to mount MDT, though
all pools have data.

NAME SIZE  ALLOC   FREECAP  DEDUP  HEALTH  ALTROOT
lustre-mgs-mdt  74.5G  47.2G  27.3G63%  1.00x  ONLINE  -
lustre-ost0 16.2T  15.1T  1.20T92%  1.00x  ONLINE  -
lustre-ost1 16.2T  14.8T  1.48T90%  1.00x  ONLINE  -
lustre-ost2 16.2T  15.1T  1.16T92%  1.00x  ONLINE  -
lustre-ost3 16.2T  15.0T  1.23T92%  1.00x  ONLINE  -


On 04/24/2014 11:06 AM, Parinay Kondekar wrote:

Anjana,

Does this look similar - https://jira.hpdd.intel.com/browse/LU-4634 ?

including hpdd-discuss.

HTH

-- Forwarded message --
From: *Anjana Kar* k...@psc.edu mailto:k...@psc.edu
Date: 24 April 2014 20:16
Subject: Re: [Lustre-discuss] MDT fails to mount after lustre upgrade
To: lustre-discuss@lists.lustre.org 
mailto:lustre-discuss@lists.lustre.org



Would it make sense to run
tunefs.lustre --mgs --writeconf --mgs --mdt /dev/sda /dev/sdb ?

The original mkfs command used to create the MDT was
mkfs.lustre --reformat --fsname=iconfs --mgs --mdt --backfstype=zfs 
--device-size 131072 \

--index=0 lustre-mgs-mdt/mgsmdt0 mirror /dev/sda /dev/sdb

Not sure if both device names should be included or the zpool name.

The latest 2.5.x version also fails to mount MDT with similar 
messages, though

zpool seems intact... is there anyway to get MDT to mount?

[root@icon0 kar]# more /proc/fs/lustre/version
lustre: 2.5.58
kernel: patchless_client
build: 2.5.58-g5565877-PRISTINE-2.6.32-431.11.2.el6.netboot

[root@icon0 kar]# /sbin/service lustre start mgsmdt

Mounting lustre-mgs-mdt/mgsmdt0 on /mnt/lustre/local/mgsmdt
mount.lustre: mount lustre-mgs-mdt/mgsmdt0 at /mnt/lustre/local/mgsmdt 
failed: File exists


[root@icon0 kar]# zpool list
NAME SIZE  ALLOC   FREECAP  DEDUP  HEALTH ALTROOT
lustre-mgs-mdt  74.5G  47.2G  27.3G63%  1.00x ONLINE  -

Corresponding console messages:
2014-04-24T10:36:53.736967-04:00 icon0.psc.edu http://icon0.psc.edu 
kernel: Lustre: Lustre: Build Version: 
2.5.58-g5565877-PRISTINE-2.6.32-431.11.2.el6.netboot
2014-04-24T10:36:53.986362-04:00 icon0.psc.edu http://icon0.psc.edu 
kernel: LNet: Added LNI 10.10.101.160@o2ib [8/256/0/180]
2014-04-24T10:36:54.005177-04:00 icon0.psc.edu http://icon0.psc.edu 
kernel: LNet: Added LNI 128.182.75.160@tcp10 [8/256/0/180]
2014-04-24T10:36:54.009556-04:00 icon0.psc.edu http://icon0.psc.edu 
kernel: LNet: Accept secure, port 988
2014-04-24T10:36:57.428805-04:00 icon0.psc.edu http://icon0.psc.edu 
kernel: LustreError: 11-0: iconfs-MDT-lwp-MDT: Communicating 
with 0@lo, operation mds_connect failed with -11.
2014-04-24T10:36:58.600958-04:00 icon0.psc.edu http://icon0.psc.edu 
kernel: LustreError: 11888:0:(mdd_device.c:1050:mdd_prepare()) 
iconfs-MDD: failed to initialize lfsck: rc = -17
2014-04-24T10:36:58.600978-04:00 icon0.psc.edu http://icon0.psc.edu 
kernel: LustreError: 
11888:0:(obd_mount_server.c:1776:server_fill_super()) Unable to start 
targets: -17
2014-04-24T10:36:58.614082-04:00 icon0.psc.edu http://icon0.psc.edu 
kernel: Lustre: Failing over iconfs-MDT
2014-04-24T10:37:04.885917-04:00 icon0.psc.edu http://icon0.psc.edu 
kernel: Lustre: 11888:0:(client.c:1912:ptlrpc_expire_one_request()) 
@@@ Request sent has timed out for slow reply: [sent 1398350218/real 
1398350218]  req@8802fca1a000 x1466276472946828/t0(0) 
o251-MGC10.10.101.160@o2ib@0@lo:26/25 lens 224/224 e 0 to 1 dl 
1398350224 ref 2 fl Rpc:XN/0/ rc 0/-1
2014-04-24T10:37:05.418516-04:00 icon0.psc.edu http://icon0.psc.edu 
kernel: Lustre: server umount iconfs-MDT complete
2014-04-24T10:37:05.418535-04:00 icon0.psc.edu http://icon0.psc.edu 
kernel: LustreError: 11888:0:(obd_mount.c:1338:lustre_fill_super()) 
Unable to mount  (-17)


TIA,

-Anjana Kar
 Pittsburgh Supercomputing Center
k...@psc.edu mailto:k...@psc.edu
___
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org mailto:Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss




___
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss


___
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss


Re: [Lustre-discuss] MDT fails to mount after lustre upgrade

2014-04-24 Thread Anjana Kar

Would it make sense to run
tunefs.lustre --mgs --writeconf --mgs --mdt /dev/sda /dev/sdb ?

The original mkfs command used to create the MDT was
mkfs.lustre --reformat --fsname=iconfs --mgs --mdt --backfstype=zfs 
--device-size 131072 \

--index=0 lustre-mgs-mdt/mgsmdt0 mirror /dev/sda /dev/sdb

Not sure if both device names should be included or the zpool name.

The latest 2.5.x version also fails to mount MDT with similar messages, 
though

zpool seems intact... is there anyway to get MDT to mount?

[root@icon0 kar]# more /proc/fs/lustre/version
lustre: 2.5.58
kernel: patchless_client
build: 2.5.58-g5565877-PRISTINE-2.6.32-431.11.2.el6.netboot

[root@icon0 kar]# /sbin/service lustre start mgsmdt
Mounting lustre-mgs-mdt/mgsmdt0 on /mnt/lustre/local/mgsmdt
mount.lustre: mount lustre-mgs-mdt/mgsmdt0 at /mnt/lustre/local/mgsmdt 
failed: File exists


[root@icon0 kar]# zpool list
NAME SIZE  ALLOC   FREECAP  DEDUP  HEALTH ALTROOT
lustre-mgs-mdt  74.5G  47.2G  27.3G63%  1.00x ONLINE  -

Corresponding console messages:
2014-04-24T10:36:53.736967-04:00 icon0.psc.edu kernel: Lustre: Lustre: 
Build Version: 2.5.58-g5565877-PRISTINE-2.6.32-431.11.2.el6.netboot
2014-04-24T10:36:53.986362-04:00 icon0.psc.edu kernel: LNet: Added LNI 
10.10.101.160@o2ib [8/256/0/180]
2014-04-24T10:36:54.005177-04:00 icon0.psc.edu kernel: LNet: Added LNI 
128.182.75.160@tcp10 [8/256/0/180]
2014-04-24T10:36:54.009556-04:00 icon0.psc.edu kernel: LNet: Accept 
secure, port 988
2014-04-24T10:36:57.428805-04:00 icon0.psc.edu kernel: LustreError: 
11-0: iconfs-MDT-lwp-MDT: Communicating with 0@lo, operation 
mds_connect failed with -11.
2014-04-24T10:36:58.600958-04:00 icon0.psc.edu kernel: LustreError: 
11888:0:(mdd_device.c:1050:mdd_prepare()) iconfs-MDD: failed to 
initialize lfsck: rc = -17
2014-04-24T10:36:58.600978-04:00 icon0.psc.edu kernel: LustreError: 
11888:0:(obd_mount_server.c:1776:server_fill_super()) Unable to start 
targets: -17
2014-04-24T10:36:58.614082-04:00 icon0.psc.edu kernel: Lustre: Failing 
over iconfs-MDT
2014-04-24T10:37:04.885917-04:00 icon0.psc.edu kernel: Lustre: 
11888:0:(client.c:1912:ptlrpc_expire_one_request()) @@@ Request sent has 
timed out for slow reply: [sent 1398350218/real 1398350218]  
req@8802fca1a000 x1466276472946828/t0(0) 
o251-MGC10.10.101.160@o2ib@0@lo:26/25 lens 224/224 e 0 to 1 dl 
1398350224 ref 2 fl Rpc:XN/0/ rc 0/-1
2014-04-24T10:37:05.418516-04:00 icon0.psc.edu kernel: Lustre: server 
umount iconfs-MDT complete
2014-04-24T10:37:05.418535-04:00 icon0.psc.edu kernel: LustreError: 
11888:0:(obd_mount.c:1338:lustre_fill_super()) Unable to mount  (-17)


TIA,
-Anjana Kar
 Pittsburgh Supercomputing Center
 k...@psc.edu
___
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss


[Lustre-discuss] MDT fails to mount after lustre upgrade

2014-04-22 Thread Anjana Kar

After upgrading a CentOS MGS/MDS server running zfs lustre

from
lustre: 2.3.64
kernel: patchless_client
build: v2_3_64-1dkms-1-PRISTINE-2.6.32-358.2.1.el6.netboot

to
lustre: 2.5.57
kernel: patchless_client
build: 2.5.57-g3423b5b-CHANGED-2.6.32-431.11.2.el6.netboot

the MDT fails to mount with the messages below.

/sbin/service lustre start mgsmdt brings the zpool online:

NAME USED  AVAIL  REFER  MOUNTPOINT
lustre-mgs-mdt  47.2G  26.2G30K /lustre-mgs-mdt
lustre-mgs-mdt/mgsmdt0  46.5G  26.2G  27.8G /lustre-mgs-mdt/mgsmdt0

lctl list_nids shows
10.10.101.160@o2ib
128.182.75.160@tcp10

Any suggestions on how to get MDT to mount?

Thanks,
-Anjana Kar
 Pittsburgh Supercomputing Center
 k...@psc.edu

--- 
2014-04-21T13:05:03.641223-04:00 icon0.psc.edu kernel: SPL: using hostid 
0xb680a04b
LustreError: 11-0: iconfs-MDT-lwp-MDT: Communicating with 0@lo, 
operation mds_connect failed with -11.
2014-04-21T13:05:06.820694-04:00 icon0.psc.edu kernel: LustreError: 
11-0: iconfs-MDT-lwp-MDT: Communicating with 0@lo, operation 
mds_connect failed with -11.
LustreError: 3537:0:(mdd_device.c:1050:mdd_prepare()) iconfs-MDD: 
failed to initialize lfsck: rc = -17
LustreError: 3537:0:(obd_mount_server.c:1776:server_fill_super()) Unable 
to start targets: -17 2014-04-21T13:05Lustre: Failing over 
iconfs-MDT :09.821666-04:00 icon0.psc.edu kernel: LustreError: 
3537:0:(mdd_device.c:1050:mdd_prepare()) iconfs-MDD: failed to 
initialize lfsck: rc = -17
2014-04-21T13:05:09.821688-04:00 icon0.psc.edu kernel: LustreError: 
3537:0:(obd_mount_server.c:1776:server_fill_super()) Unable to start 
targets: -17 2014-04-21T13:05:09.834743-04:00 icon0.psc.edu kernel: 
Lustre: Failing over iconfs-MDT



___
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss


Re: [Lustre-discuss] MDT fails to mount after lustre upgrade

2014-04-22 Thread Anjana Kar

Ran these commands.. corresponding messages (below) from console:

[root@icon0]# lustre_rmmod
[root@icon0]# /sbin/service lustre start mgsmdt
Mounting lustre-mgs-mdt/mgsmdt0 on /mnt/lustre/local/mgsmdt
mount.lustre: mount lustre-mgs-mdt/mgsmdt0 at /mnt/lustre/local/mgsmdt 
failed: File exists

[root@icon0 obdclass]# lctl dl
[root@icon0 obdclass]# more /proc/fs/lustre/version
lustre: 2.5.57
kernel: patchless_client
build:  2.5.57-g3423b5b-CHANGED-2.6.32-431.11.2.el6.netboot


2014-04-22T11:17:01.632211-04:00 icon0.psc.edu kernel: LNet: Removed LNI 
128.182.75.160@tcp10
2014-04-22T11:17:03.638989-04:00 icon0.psc.edu kernel: LNet: Removed LNI 
10.10.101.160@o2ib


2014-04-22T11:17:45.370586-04:00 icon0.psc.edu kernel: alg: No test for 
adler32 (adler32-zlib)
2014-04-22T11:17:45.375377-04:00 icon0.psc.edu kernel: alg: No test for 
crc32 (crc32-table)


2014-04-22T11:17:53.537558-04:00 icon0.psc.edu kernel: Lustre: Lustre: 
Build Version: 2.5.57-g3423b5b-CHANGED-2.6.32-431.11.2.el6.netboot
2014-04-22T11:17:53.780761-04:00 icon0.psc.edu kernel: LNet: Added LNI 
10.10.101.160@o2ib [8/256/0/180]
2014-04-22T11:17:53.799193-04:00 icon0.psc.edu kernel: LNet: Added LNI 
128.182.75.160@tcp10 [8/256/0/180]
2014-04-22T11:17:53.803483-04:00 icon0.psc.edu kernel: LNet: Accept 
secure, port 988


2014-04-22T11:17:56.016370-04:00 icon0.psc.edu kernel: LustreError: 
11-0: iconfs-MDT-lwp-MDT: Communicating with 0@lo, operation 
mds_connect failed with -11.
LustreError: 9303:0:(mdd_device.c:1050:mdd_prepare()) iconfs-MDD: 
failed to initialize lfsck: rc = -17
LustreError: 9303:0:(obd_mount_server.c:1776:server_fill_super()) Unable 
to start targets: -17
2014-04-22T11:17:58.112621-04:00 icon0.psc.edu kernel: LustreError: 
9303:0:(mdd_device.c:1050:mdd_prepare()) iconfs-MDD: failed to 
initialize lfsck: rc = -17
2014-04-22T11:17:58.112650-04:00 icon0.psc.edu kernel: LustreError: 
9303:0:(obd_mount_server.c:1776:server_fill_super()) Unable to start 
targets: -17
2014-04-22T11:17:58.125683-04:00 icon0.psc.edu kernel: Lustre: Failing 
over iconfs-MDT


2014-04-22T11:18:04.453442-04:00 icon0.psc.edu kernel: Lustre: 
9303:0:(client.c:1912:ptlrpc_expire_one_request()) @@@ Request sent has 
timed out for slow reply: [sent 1398179878/real 1398179878]  
req@880307c08800 x1466097858510988/t0(0) 
o251-MGC10.10.101.160@o2ib@0@lo:26/25 lens 224/224 e 0 to 1 dl 
1398179884 ref 2 fl Rpc:XN/0/ rc 0/-1


2014-04-22T11:18:04.715996-04:00 icon0.psc.edu kernel: Lustre: server 
umount iconfs-MDT complete
2014-04-22T11:18:04.716024-04:00 icon0.psc.edu kernel: LustreError: 
9303:0:(obd_mount.c:1338:lustre_fill_super()) Unable to mount  (-17)


On 04/22/2014 11:00 AM, Parinay Kondekar wrote:
LustreError: 3537:0:(obd_mount_server.c:1776:server_fill_super()) 
Unable to start targets: -17 2014-04-21T13:05Lustre: Failing over 
iconfs-MDT :09.821666-04:00 icon0.psc.edu 
http://icon0.psc.edu/ kernel: LustreError: 
3537:0:(mdd_device.c:1050:mdd_prepare()) iconfs-MDD: failed to 
initialize lfsck: rc = -17


 #defineEEXIST17 /* File exists */

Issue is present with 2.4 client.

- Try `lustre_rmmod` and mount again.
- lctl dl output
- lustre version

HTH



___
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss


[Lustre-discuss] ldiskfs for MDT and zfs for OSTs?

2013-10-07 Thread Anjana Kar
Is it possible to configure the MDT as ldiskfs and the OSTs with zfs
in lustre 2.4? The server is running a lustre kernel on a Centos 6.4
system, has both lustre-osd-ldiskfs and lustre-osd-zfs rpms installed.
The MDT is up as ldiskfs, but get an error trying to configure the ost:

mkfs.lustre --fsname=lustrefs --reformat --ost --backfstype=zfs .

mkfs.lustre FATAL: unable to prepare backend (22)
mkfs.lustre: exiting with 22 (Invalid argument)

Thanks,
-Anjana Kar
___
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss


Re: [Lustre-discuss] ldiskfs for MDT and zfs for OSTs?

2013-10-07 Thread Anjana Kar
On 10/07/2013 04:27 PM, Ned Bass wrote:
 On Mon, Oct 07, 2013 at 02:23:32PM -0400, Anjana Kar wrote:
 Here is the exact command used to create a raidz2 pool with 8+2 drives,
 followed by the error messages:

 mkfs.lustre --fsname=cajalfs --reformat --ost --backfstype=zfs
 --index=0 --mgsnode=10.10.101.171@o2ib lustre-ost0/ost0 raidz2
 /dev/sda /dev/sdc /dev/sde /dev/sdg /dev/sdi /dev/sdk /dev/sdm
 /dev/sdo /dev/sdq /dev/sds

 mkfs.lustre FATAL: Invalid filesystem name /dev/sds
 It seems that either the version of mkfs.lustre you are using has a
 parsing bug, or there was some sort of syntax error in the actual
 command entered.  If you are certain your command line is free from
 errors, please post the version of lustre you are using, or report the
 bug in the Lustre issue tracker.

 Thanks,
 Ned

For building this server, I followed steps from the walk-thru-build* for 
Centos 6.4,
and added --with-spl and --with-zfs when configuring lustre..
*https://wiki.hpdd.intel.com/pages/viewpage.action?pageId=8126821

spl and zfs modules were installed from source for the lustre 2.4 kernel
2.6.32.358.18.1.el6_lustre2.4

Device sds appears to be valid, but I will try issuing the command using 
by-path
names..

-Anjana
___
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss