[Lustre-discuss] Cannot get an OST to activate

2010-09-03 Thread Bob Ball
We added a new OSS to our 1.8.4 Lustre installation. It has 6 OST of 8.9TB each. Within a day of having these on-line, one OST stopped accepting new files. I cannot get it to activate. The other 5 seem fine. On the MDS lctl dl shows it IN, but not UP, and files can be read from it: 33 IN

Re: [Lustre-discuss] Cannot get an OST to activate

2010-09-03 Thread Bob Ball
files back from that command, but other problems on our cluster confused that result. We will recheck. bob Bernd Schubert wrote: On Friday, September 03, 2010, Bob Ball wrote: We added a new OSS to our 1.8.4 Lustre installation. It has 6 OST of 8.9TB each. Within a day of having the

Re: [Lustre-discuss] client modules not loading during boot

2010-09-08 Thread Bob Ball
Try adding _netdev as a mount option. bob Cliff White wrote: The mount command will automatically load the modules on the client. cliffw On 09/03/2010 11:56 AM, Ronald K Long wrote: We have installed lustre 1.8.2 and 1.8.4 client on Red Hat 5. The lustre modules are not
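A minimal fstab entry of the kind being suggested (the MGS NID and mount point here are made-up examples):

```shell
# /etc/fstab fragment -- hypothetical MGS NID and paths; adjust for your site.
# _netdev delays mounting until the network is up, and mount.lustre then
# loads the client modules on demand.
10.1.1.10@tcp0:/lustre  /mnt/lustre  lustre  defaults,_netdev  0 0
```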

Re: [Lustre-discuss] Cannot get an OST to activate

2010-09-10 Thread Bob Ball
000100 59227 59414 Thanks, bob Bob Ball wrote: Thank you, Bernd. "df" claims there is some 442MB of data on the volume, compared to neighbors with 285GB. That could well be a fragment of a single, unsuccessful transfer attempt. I can run lfs_find on it though and see what comes ba

Re: [Lustre-discuss] Cannot get an OST to activate

2010-09-10 Thread Bob Ball
I just made some random checks on the "lfs find" output for this OST from yesterday.  Each file I checked was one lost when we had problems a few months back.  The suggested "unlink" on these did not work in 1.8.3, worked fine on a whole set yesterday with 1.8.4, but I obviously did not find

Re: [Lustre-discuss] Cannot get an OST to activate

2010-09-10 Thread Bob Ball
down-time was involved. bob Bob Ball wrote: I just made some random checks on the "lfs find" output for this OST from yesterday.  Each file I checked was one lost when we had problems a few months back.  The suggested "unlink" on these did not work in 1.8.3, worked fine on a

Re: [Lustre-discuss] How do you monitor your lustre?

2010-09-30 Thread Bob Ball
syslog-ng bob On 9/30/2010 1:31 PM, Ben Evans wrote: Wouldn't it be better to log to a non-Lustre machine? If your MDS goes down, you lose all your logs, which would make things a bit trickier. -Original Message- From: lustre-discuss-boun...@lists.lustre.org

Re: [Lustre-discuss] Question about lfs find

2010-10-06 Thread Bob Ball
in my experience a single (or small number) of lfs_find with lots of obd arguments was faster than doing all of them individually. go to lustre 1.8.4 (at least) and use lfs_migrate with your lfs_find list. it wasn't REAL fast, but it was REAL reliable. bob On 10/6/2010 5:24 PM, Michael
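The workflow described above might look like the following sketch (the filesystem path and OST UUIDs are assumptions, and this needs a live Lustre mount):

```shell
# One lfs find pass covering several OSTs is cheaper than one pass per OST.
lfs find /mnt/lustre --obd lustre-OST0004_UUID \
                     --obd lustre-OST0005_UUID > /tmp/to_migrate.list
# lfs_migrate (shipped with 1.8.4 and later) restripes each listed file
# off those OSTs; it reads the file list from stdin.
lfs_migrate -y < /tmp/to_migrate.list
```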

Re: [Lustre-discuss] controlling which eth interface lustre uses

2010-10-21 Thread Bob Ball
Why do you need both active? If one is a backup to the other, then bond them as a primary/backup pair, meaning only one will be active at a time, i.e., your designated primary (unless it goes down). bob On 10/21/2010 9:51 AM, Brock Palen wrote: On Oct 21, 2010, at 9:48 AM, Joe Landman
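On RHEL-family systems of that era, the primary/backup pairing suggested here is bonding's active-backup mode; a sketch with assumed interface names and addresses:

```shell
# /etc/sysconfig/network-scripts/ifcfg-bond0 (fragment)
DEVICE=bond0
IPADDR=10.1.1.21
NETMASK=255.255.255.0
ONBOOT=yes
BOOTPROTO=none
# Only one slave carries traffic; eth0 is the designated primary and
# eth1 takes over only if eth0 goes down.
BONDING_OPTS="mode=active-backup miimon=100 primary=eth0"
```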

Re: [Lustre-discuss] controlling which eth interface lustre uses

2010-10-21 Thread Bob Ball
afterwards for information. bob On 10/21/2010 9:59 AM, Bob Ball wrote: Why do you need both active? If one is a backup to the other, then bond them as a primary/backup pair, meaning only one will be active at a time, i.e., your designated primary (unless it goes down). bob On 10/21/2010 9:51 AM

[Lustre-discuss] completely emptying an OST

2010-10-24 Thread Bob Ball
We may need to completely empty an OST, so that we can more efficiently format the underlying RAID-6 array (in light of recent discussions about the stripe on this list). What is the most efficient way to do this? I can use multiple clients running lfs_migrate from pre-prepared lists, but

[Lustre-discuss] questions about an OST content

2010-11-06 Thread Bob Ball
I am emptying a set of OST so that I can reformat the underlying RAID-6 more efficiently. Two questions: 1. Is there a quick way to tell if the OST is really empty? lfs_find takes many hours to run. 2. When I reformat, I want it to retain the same ID so as to not make holes in the list. From

Re: [Lustre-discuss] questions about an OST content

2010-11-06 Thread Bob Ball
Responding to everyone (and thanks to all) lfs df -i from a client or simply df -i from the OSS node ... This still shows on the order of 100 inodes after the OST was emptied. tunefs.lustre --print /dev/sdj will tell you the index in base 10. Yes, this worked. df -H /path/to/OST/mount_point This
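The checks mentioned in this thread, spelled out as commands (device names and mount points are assumptions; these need the OSS or a client):

```shell
lfs df -i /mnt/lustre     # per-OST inode counts, from any client
df -i /mnt/ost12          # or directly on the OSS node; a drained ldiskfs
                          # OST still shows ~100 inodes of internal objects
                          # (O/*, CONFIGS, last_rcvd), not user files
tunefs.lustre --print /dev/sdj   # reports the OST index in base 10
```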

Re: [Lustre-discuss] questions about an OST content

2010-11-08 Thread Bob Ball
already in use retries left: 0 mount.lustre: mount /dev/sdc at /mnt/ost12 failed: Address already in use The target service's index is already in use. (/dev/sdc) On 11/8/2010 5:01 AM, Andreas Dilger wrote: On 2010-11-07, at 12:32, Bob Ball b...@umich.edu

Re: [Lustre-discuss] questions about an OST content

2010-11-08 Thread Bob Ball
Yes, you are correct. That was the key here, did not put that file back in place. Back up and (so far) operating cleanly. Thanks, bob On 11/8/2010 3:04 PM, Andreas Dilger wrote: On 2010-11-08, at 11:39, Bob Ball wrote: Don't know if I sent to the whole list. One of those days. remade

Re: [Lustre-discuss] questions about an OST content

2010-11-10 Thread Bob Ball
, at 11:01, Bob Ball wrote: Well, we ran 2 days, migrating files off OST, then this morning, the MDT crashed. Could not get all clients reconnected before seeing another kernel panic on the mdt. did an e2fsck of the mdt db and tried again. crashed again, but this time the logged message

[Lustre-discuss] Problems with lfs find

2010-11-29 Thread Bob Ball
I have an odd problem. I am trying to empty all files from a set of OST as indicated below, by making a list via lfs find and then sending that list to lfs_migrate. However, I have just gotten this message back from the lfs find: llapi_semantic_traverse: Failed to open

Re: [Lustre-discuss] Problems with lfs find

2010-11-30 Thread Bob Ball
of this ldiskfs mount of the file system, back into some recovery directory in the real file system, so that users can pick through them? After they are moved, the file system will be reformatted and returned to use. bob On 11/30/2010 8:53 AM, Bob Ball wrote: OK, thanks. Scary, to see errors out of lfs

Re: [Lustre-discuss] Problems with lfs find

2010-11-30 Thread Bob Ball
On 11/30/2010 4:17 PM, Andreas Dilger wrote: On 2010-11-30, at 11:17, Bob Ball wrote: [r...@umdist03 d0]# ls -l total 182976 -rw-rw-rw- 1 daits users 45002956 Jul 5 20:52 1162976 -rw-rw-rw- 1 daits users 44569036 Jul 7 02:53 1200608 -rw-rw-rw- 1 daits users 49108913 Jun 28 04:43 1218976 -rw

[Lustre-discuss] OST error

2010-12-02 Thread Bob Ball
We were getting errors thrown by an OST. /var/log/messages contained a lot of these: 2010-11-28T17:05:34-05:00 umfs06.aglt2.org kernel: [2102640.735927] LDISKFS-fs error (device sdk): ldiskfs_mb_check_ondisk_bitmap: on-disk bitmap for group 639 corrupted: 440 blocks free in bitmap, 439 - in gd
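A common recovery path for this kind of ldiskfs bitmap/group-descriptor mismatch is an offline filesystem check; a hedged sketch with assumed device and mount point (the OST must be unmounted first, and Lustre's patched e2fsprogs should be used):

```shell
umount /mnt/ost11        # take the OST out of service
e2fsck -fy /dev/sdk      # full check; repairs bitmap vs. group-descriptor
                         # free-count mismatches like the one logged above
mount -t lustre /dev/sdk /mnt/ost11
```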

Re: [Lustre-discuss] OST error

2010-12-03 Thread Bob Ball
, that remain corrupted, and we'll probably never be able to come up with a complete list of them. bob On 12/2/2010 4:35 PM, Bob Ball wrote: It is a Dell PERC6 RAID array. OMSA monitoring is enabled and is not throwing errors. H, mptctl is old though, so maybe that is a contributing factor

Re: [Lustre-discuss] Getting around a Catch-22

2010-12-08 Thread Bob Ball
Thanks for the pointer to this. After thinking on this a bit, I believe I can see my way clear to using it. Testing time. bob On 12/7/2010 5:45 PM, Cliff White wrote: On 12/07/2010 06:51 AM, Bob Ball wrote: We have 6 OSS, each with at least 8 OST. It sometimes happens that I need to do

[Lustre-discuss] Renaming an OSS

2010-12-13 Thread Bob Ball
For administrative reasons, we want to rename an OSS. It will retain the same IP addresses, and have the same OST. Does this present any problems that I should be aware of, or is it a no-brainer? Thanks, bob ___ Lustre-discuss mailing list

[Lustre-discuss] question about routing between subnets

2011-01-21 Thread Bob Ball
Our lustre 1.8.4 system sits primarily on subnet A. However, we also have a small number of clients that sit on subnet B. In setting up the subnet B clients, we provided lnet router machines that have addresses on both subnet A and on subnet B, the MGS machine has addresses on both subnet A
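The layout described (clients on subnet B reaching servers on subnet A through dual-homed LNet routers) is normally expressed through lnet module options; the NIDs and interface names below are assumptions:

```shell
# /etc/modprobe.d/lustre.conf fragments
# On the router nodes, one NIC per subnet, with forwarding enabled:
options lnet networks="tcp0(eth0),tcp1(eth1)" forwarding="enabled"
# On the subnet-B clients, route to tcp0 via the router's subnet-B NID:
options lnet networks="tcp1(eth0)" routes="tcp0 10.20.0.1@tcp1"
```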

[Lustre-discuss] Fwd: question about routing between subnets

2011-01-25 Thread Bob Ball
Date: Fri, 21 Jan 2011 15:48:25 -0500 From: Bob Ball b...@umich.edu To: Lustre discussion lustre-discuss@lists.Lustre.org Our lustre 1.8.4 system sits primarily

Re: [Lustre-discuss] Fwd: question about routing between subnets

2011-01-25 Thread Bob Ball
OK, so, finally with time on my hands, I find I can make this work. Sorry about the message list traffic. bob On 1/25/2011 9:51 AM, Bob Ball wrote: Hi, no response on this the first time I sent it around. Can anyone help me

[Lustre-discuss] Migrating MDT volume to a new location

2011-02-02 Thread Bob Ball
Is there a recommended way to migrate an MDT (MGS is separate) volume from one location to another on the same server? This uses iSCSI volumes. Lustre 1.8.4 Thanks, bob ___ Lustre-discuss mailing list Lustre-discuss@lists.lustre.org

Re: [Lustre-discuss] Migrating MDT volume to a new location

2011-02-03 Thread Bob Ball
with resize2fs applied to a MDT ... Cheers, Thomas On 02/03/2011 06:35 PM, Bob Ball wrote: We used dd. It took about 12 hours. After the dd, we did an e2fsck on the new volume, remounted it as the MDT, and Lustre happily began serving files again. Thanks to everyone for their help. bob

Re: [Lustre-discuss] Help! Newbie trying to set up Lustre network

2011-02-22 Thread Bob Ball
Quite often, sunrpc takes the port needed by Lustre, before Lustre can get to it. That results in the messages below. No recourse but to reboot. Put the mount in your /etc/fstab as the simplest approach. This may not be the ONLY reason why this happens, but it is the one that has most

Re: [Lustre-discuss] Help! Newbie trying to set up Lustre network

2011-02-22 Thread Bob Ball
Make sure that the kernel you are running matches up with the rpms you installed then? [ball@umt3int01:gate01_b]$ rpm -qa|grep lustre lustre-client-1.8.4-2.6.18_194.17.4.el5.x86_64 lustre-client-modules-1.8.4-2.6.18_194.17.4.el5.x86_64 [ball@umt3int01:gate01_b]$ uname -r 2.6.18-194.17.4.el5 bob

Re: [Lustre-discuss] Client Kernel panic - not syncing. Lustre 1.8.5

2011-05-25 Thread Bob Ball
1.6.6 gave us lots of problems. We are using 1.8.4 here. Has better tools, for one thing, e.g., lfs_migrate. bob On 5/24/2011 7:37 PM, Mag Gam wrote: stick with 1.6.6, it's a great release! BTW, why did you decide to upgrade to 1.8.x? is there a feature you are looking for? On Fri, May 20,

Re: [Lustre-discuss] Emptied OSTs not empty

2011-06-27 Thread Bob Ball
I had the same issue with Lustre 1.8.4. Wash, rinse, repeat. In other words, do the lfs_find, do the lfs_migrate, then do the find a second time. That seemed to catch most everything of importance. I haven't a clue why this should be the case, but it was true on every OST that I

[Lustre-discuss] mkfs.lustre problem

2011-11-09 Thread Bob Ball
I'm hoping someone can help me out here. We are running Lustre 1.8.4 under SL5.7 (now, it has slowly upgraded over time from SL5.3 or SL5.4 and it started out at Lustre 1.8.3). A newly installed OSS running SL5.7 does not seem to show this issue, when making new OST (not reusing the index

Re: [Lustre-discuss] mkfs.lustre problem

2011-11-10 Thread Bob Ball
was cleared. Subsequent operations on the volume beginning with the mkfs.lustre then succeeded without a hitch. bob On 11/9/2011 4:08 PM, Bob Ball wrote: I'm hoping someone can help me out here. We are running Lustre 1.8.4 under SL5.7 (now, it has slowly upgraded over time from SL5.3 or SL5.4

Re: [Lustre-discuss] delete a undeletable file

2013-03-07 Thread Bob Ball
You could just unlink it instead. That will work when rm fails. bob On 3/7/2013 11:10 AM, Colin Faber wrote: Hi, If the file is disassociated with an OST which is offline, bring the OST back online, if the OST object it self is missing then you can remove the file using 'unlink' rather
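The rm-vs-unlink distinction can be shown on any POSIX system, not just Lustre: GNU rm stat()s the file before removing it (and on Lustre that stat fails when the file's OST object is gone), while unlink issues the bare unlink() call. The path below is a throwaway example:

```shell
touch /tmp/stale_demo      # stand-in for a file whose OST object is lost
unlink /tmp/stale_demo     # bare unlink(2); no stat() beforehand
test -e /tmp/stale_demo || echo "gone"   # prints "gone"
```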

[Lustre-discuss] OST not activating for write following upgrade

2013-07-16 Thread Bob Ball
I am just finishing an upgrade from Lustre 1.8.4 to 2.1.6, on SL5.8 to SL6.4. All OSS, MGS and MDT machines were upgraded. Following the upgrade I find that none of the OST will change to RW state. Could I get some advice from someone on this? We have separate mgs/mdt machines, and 8 OSS

Re: [Lustre-discuss] OST not activating for write following upgrade

2013-07-16 Thread Bob Ball
Sorry for the noise. This request can be cancelled. I left out one very important step in restoring the mdt, namely rm OBJECTS/* CATALOGS DOH! Umounted, removed via ldiskfs mount, remounted, and now all OST are showing UP instead of IN. bob On 7/16/2013 9:22 AM, Bob Ball wrote: I am just

Re: [Lustre-discuss] Can't install lustre-tests

2013-07-18 Thread Bob Ball
The documentation suggests using yum localinstall lustre-tests-2.1.5-2.6.32_279.19.1.el6_lustre.x86_64.x86_64.rpm That worked well for me, and found the dependent rpm just fine, openmpi-1.5.4-1.el6.x86_64 bob On 7/18/2013 8:00 PM, Dilger, Andreas wrote: On 2013/07/18 3:36 PM, Prakash Surya

[Lustre-discuss] root squash problem

2013-07-19 Thread Bob Ball
We have just installed Lustre 2.1.6 on SL6.4 systems. It is working well. However, I find that I am unable to apply root squash parameters. We have separate mgs and mdt machines. Under Lustre 1.8.4 this was not an issue for root squash commands applied on the mdt. However, when I modify

Re: [Lustre-discuss] [HPDD-discuss] root squash problem

2013-07-22 Thread Bob Ball
Of Bob Ball Sent: Saturday, July 20, 2013 12:35 AM To: hpdd-disc...@lists.01.org; Lustre discussion Subject: [HPDD-discuss] root squash problem We have just installed Lustre 2.1.6 on SL6.4 systems. It is working well. However, I find that I am unable to apply root squash parameters. We

Re: [Lustre-discuss] [HPDD-discuss] Failure to connect to some OST from a client machine

2013-09-05 Thread Bob Ball
tcp2 hops 1 gw 10.10.1.51@tcp down Succeeded On 9/5/2013 4:01 PM, Kris Howard wrote: Might check lctl show_route and look for downed routes. On Thu, Sep 5, 2013 at 12:56 PM, Bob Ball b...@umich.edu wrote: We are running Lustre

[Lustre-discuss] Failure to connect to some OST from a client machine

2013-09-05 Thread Bob Ball
We are running Lustre 2.1.6 on Scientific Linux 6.4, kernel 2.6.32-358.11.1.el6.x86_64. This was an upgrade from Lustre 1.8.4 on SL5. We have had a few situations lately where a client stops talking to some subset of the OST (about 58 of these total on 8 OSS, nearly 500TB in total). I have a

[Lustre-discuss] Files written to an OST are corrupted

2013-09-19 Thread Bob Ball
Hi, everyone, I need some help in figuring out what may have happened here, as newly created files on an OST are being corrupted. I don't know if this applies to all files written to this OST, or just to files of order 2GB size, but files are definitely being corrupted, with no errors

[Lustre-discuss] Recovering a failed OST

2014-05-19 Thread Bob Ball
I need to completely remake a failed OST. I have done this in the past, but this time, the disk failed in such a way that I cannot fully get recovery information from the OST before I destroy and recreate. In particular, I am unable to recover the LAST_ID file, but successfully retrieved the

Re: [Lustre-discuss] [HPDD-discuss] Recovering a failed OST

2014-05-19 Thread Bob Ball
must be stopped on all servers before performing this procedure. So, is this the best approach to follow, allowing for the fact that there is nothing at all left on the OST, or is there a better short cut to choosing an appropriate LAST_ID? Thanks again, bob On 5/19/2014 1:50 PM, Bob Ball

Re: [Lustre-discuss] [HPDD-discuss] Recovering a failed OST

2014-05-19 Thread Bob Ball
that this is a valid starting point, and proceed to get my file system back online. bob On 5/19/2014 2:05 PM, Bob Ball wrote: Google first, ask later. I found this in the manuals: 26.3.4 Fixing a Bad LAST_ID on an OST The procedures there spell out pretty well what I must do, so

Re: [Lustre-discuss] [HPDD-discuss] Recovering a failed OST

2014-05-22 Thread Bob Ball
had to run this several times in order to restore the structure below. best regards, Martin On 05/19/2014 08:24 PM, Bob Ball wrote: Oh, better still, as I kept looking, and the low-level panic retreated, I found this on the mdt: [root@lmd02 ~]# lctl get_param osc.*.prealloc_next_id ... osc.umt3

Re: [Lustre-discuss] Performance dropoff for a nearly full Lustre file system

2015-01-14 Thread Bob Ball
In my memory, it is not recommended to run Lustre more than 90% full. bob On 1/14/2015 2:43 PM, Mike Selway wrote: Hello, I’m looking for experiences for what has been observed to happen (performance drop offs, severity of drops, partial/full failures, …) when an operational

[lustre-discuss] Deactivate an OST for new file write operations

2015-05-04 Thread Bob Ball
We just built a new 2.7.0 Lustre file system. Overall I'm happy with performance, but something is confusing me. We have a combined mgs/mdt DataStore. On that server I issue: ... 20 UP osp umt3B-OST000b-osc-MDT umt3B-MDT-mdtlov_UUID 5 [root@mdtmgs ~]# lctl --device 20 deactivate
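Put together, the sequence under discussion (device number 20 is taken from the lctl dl line quoted above; this runs on the combined mgs/mdt node against a live filesystem):

```shell
lctl dl | grep OST000b        # find the MDT-side device number for the OST
lctl --device 20 deactivate   # MDT stops allocating new objects there;
                              # existing files remain readable
lctl --device 20 activate     # undo; neither setting persists across an
                              # MDS reboot
```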

Re: [lustre-discuss] Deactivate an OST for new file write operations

2015-05-04 Thread Bob Ball
index fail as expected, attempts to write to another, enabled index are fine. bob On 5/4/2015 1:39 PM, Bob Ball wrote: We just built a new 2.7.0 Lustre file system. Overall I'm happy with performance, but something is confusing me. We have a combined mgs/mdt DataStore. On that server I issue

[lustre-discuss] Can't get zfs OST to mount on a freshly built server

2015-05-13 Thread Bob Ball
OK, so, I am seeing EXACTLY the issue reported at the end of LU-6452 8 minutes after it was closed by Andreas Dilger. https://jira.hpdd.intel.com/browse/LU-6452 There is no response. Is there a solution? This is Lustre 2.7.0 with (now) zfs 0.6.4.1-1, which was current when the server was

Re: [lustre-discuss] upgrading zfs back end file system lustre doesn't mount anymore

2015-06-08 Thread Bob Ball
I had the same issue. I could not find/get an answer, posted to this same list about a month ago. I ended up down-grading to the same zfs I had previously used, i.e., 0.6.3.1 with Lustre 2.7.0 Maybe an answer has come round in the intervening month? But, it would appear that if you use zfs, you

[lustre-discuss] File writes blocking on Lustre 2.7.0

2015-06-05 Thread Bob Ball
OK, this was just odd to me. We have a Lustre 2.7.0 system running now, and, after setting up our first OSS, copying over all files from the old system, we brought the new file system online. The new backingstore is zfs. All was well with the world. Meanwhile, all the old file servers were

Re: [lustre-discuss] File migration from one OST to another for Lustre 2.5

2015-06-24 Thread Bob Ball
So, let's say I just want to empty the OST completely, reformat or remake or, then add the OST back in. If I never let the emptied OST, prior to re-X, reconnect, then the MDS cannot destroy the objects on the OST. Is this going to be an issue once the re-X OST is ready and is brought

[lustre-discuss] Problem on Lustre 2.7.0

2015-06-10 Thread Bob Ball
We are running Lustre 2.7.0. # uname -r 2.6.32-504.8.1.el6_lustre.x86_64 The combined mgsmdt load jumped up yesterday, and stayed high since, with a couple of really outrageous peaks. Ended up power cycling, as the mdt would not umount. It seems to be performing fine now, but while watching

Re: [lustre-discuss] lustre2 cannot set ost to read only

2015-06-15 Thread Bob Ball
cat /proc/fs/lustre/lov/*/target_obd I forget which version this bug appeared in, that the lctl command does not show the IN state. I am hoping it will be fixed in the 2.8 release as it is really annoying. bob On 6/15/2015 3:39 PM, Kurt Strosahl wrote: An update, Even though the MDT

Re: [lustre-discuss] Can't get zfs OST to mount on a freshly built server

2015-05-25 Thread Bob Ball
. Many thanks, bob On 5/13/2015 10:16 AM, Bob Ball wrote: OK, so, I am seeing EXACTLY the issue reported at the end of LU-6452 8 minutes after it was closed by Andreas Dilger. https://jira.hpdd.intel.com/browse/LU-6452 There is no response. Is there a solution? This is Lustre 2.7.0 with (now

[lustre-discuss] Trying to rebuild all Lustre rpms with zfs 0.6.4.2 -- and failing

2015-07-22 Thread Bob Ball
Hi, I'm looking for someone who can give me advice on a problem I am having rebuilding the full set (server and client) of Lustre rpms, including zfs support on the server side. The 2.7.0 rpms as distributed were built with zfs 0.6.3, and do not work with 0.6.4. So, I follow the directions

Re: [lustre-discuss] Removing large directory tree

2015-07-10 Thread Bob Ball
FWIW, last September in the context of reducing the memory usage on the mgs, Andreas Dilger had this to say: ls is itself fairly inefficient at directory traversal, because all of the GNU file utilities are bloated and do much more work than necessary (e.g. rm will stat() every file before

Re: [lustre-discuss] Expanding a zfsonlinux OST pool

2015-12-07 Thread Bob Ball
is using the disks. Also, how often are disks failing and how long does a replacement take to resilver, with your current disks? -Olaf *From:* Bob Ball [b...@umich.edu] *Sent:* Monday, November 23, 2015 12:22 PM

Re: [lustre-discuss] Building lustre with zfs only

2015-12-15 Thread Bob Ball
You may want to do this on your build machine? cd /usr/src/spl-0.6.4.2/ ./configure --with-config=kernel make all cd ../zfs-0.6.4.2/ ./configure --with-config=kernel make all bob On 12/15/2015 2:16 PM, Christopher J. Morrone wrote: Is your OS CentOS 6.7 then? You _might_ be hitting bug

Re: [lustre-discuss] Expanding a zfsonlinux OST pool

2015-11-24 Thread Bob Ball
*From:* Bob Ball [b...@umich.edu] *Sent:* Monday, November 23, 2015 12:22 PM *To:* Faaland, Olaf P.; Morrone, Chris *Cc:* Bob Ball *Subject:* Expanding a zfsonlinux OST pool Hi, We have some zfsonlinux pools in use with Lustre 2.7 that use some older disks, and we are rapidly running out

Re: [lustre-discuss] re-creating a zfs OST

2016-01-14 Thread Bob Ball
Thank you, Andreas. As always, your answers are the best. bob On 1/13/2016 10:14 PM, Dilger, Andreas wrote: On 2016/01/12, 12:50, "lustre-discuss on behalf of Bob Ball" <lustre-discuss-boun...@lists.lustre.org on behalf of b...@umich.edu> wrote: I have a zfs OST that I need

Re: [lustre-discuss] re-creating a zfs OST

2016-01-14 Thread Bob Ball
wrote: On 2016/01/12, 12:50, "lustre-discuss on behalf of Bob Ball" <lustre-discuss-boun...@lists.lustre.org on behalf of b...@umich.edu> wrote: I have a zfs OST that I need to drain and re-create. This is lustre 2.7.x. In the past, with ldiskfs OST, I did this a number of ti

[lustre-discuss] Error on a zpool underlying an OST

2016-03-11 Thread Bob Ball
Hi, we have Lustre 2.7.58 in place on our OST and MDT/MGS (combined). Underlying the lustre file system is a raid-z2 zfs pool. A few days ago, we lost 2 disks at once from the raid-z2. I replaced one and a resilver started, that seemed to choke. So, I put back both disks with replacements,

Re: [lustre-discuss] Lustre 2.8.0 released

2016-03-28 Thread Bob Ball
Has anyone else noticed that the lnetctl command is missing from this rpm set? I mean, an entire chapter of the manual dedicated to this command, and it is not present? Will this be fixed? bob On 3/16/2016 6:13 PM, Jones, Peter A wrote: We are pleased to announce that the Lustre 2.8.0

Re: [lustre-discuss] Lustre 2.8.0 released

2016-03-28 Thread Bob Ball
Glossman HPDD Software Engineer On 3/28/16, 1:11 PM, "lustre-discuss on behalf of Bob Ball" <lustre-discuss-boun...@lists.lustre.org on behalf of b...@umich.edu> wrote: Has anyone else noticed that the lnetctl command is missing from this rpm set? I mean, an entire chapt

Re: [lustre-discuss] ZFS version for lustre-2.8 binaries?

2016-04-12 Thread Bob Ball
What I recall hearing is that, because of IO issues with 0.6.5, official Lustre 2.8.0 support is with zfs 0.6.4.2 bob On 4/12/2016 11:21 AM, Nathan Smith wrote: In doing a test install of vanilla Lustre 2.8 [0] and zfs-0.6.5.6 I received the symbol mismatch error when attempting to start

Re: [lustre-discuss] Lustre 2.8.0 released

2016-04-04 Thread Bob Ball
to master as you can see. On Mar 28, 2016, at 5:27 PM, Bob Ball wrote: Thanks. Are you going to replace the 2.8 rpm sets in your repos, or just leave them that way until 2.9? Yeah, that is a provocative question. It just seems to me that it is such a fundamental thing, that a new rpm set s

Re: [lustre-discuss] ZFS not freeing disk space

2016-08-10 Thread Bob Ball
It is my understanding that when you set the OST deactivated, you also don't get updates on the used space, as of some recent version of Lustre. It has never been clear to me though if a simple re-activation is needed to run the logs, or if the reboot is required, once it is

Re: [lustre-discuss] Error on a zpool underlying an OST

2016-07-12 Thread Bob Ball
The answer came offline, and I guess I never replied back to the original posting. This is what I learned. It deals with only a single file, not 1000's. --bob --- On Mon, 14 Mar 2016, Bob Ball wrote: OK, it would seem the affected user has already deleted

Re: [lustre-discuss] scrubbing Lustre/ZFS OSSs

2017-01-31 Thread Bob Ball
Just "zpool scrub ". Scrub may slow down access, but it does not otherwise impact the OST, in my experience. bob On 1/30/2017 9:52 PM, Riccardo Veraldi wrote: Hello, I need to scrub the underlying ZFS data pools on my Lustre OSSs. May I do it safely when the Lustre filesystem is mounted?
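With an assumed pool name, the scrub and its progress check:

```shell
zpool scrub ost0pool       # safe with the OST mounted, though I/O may slow
zpool status -v ost0pool   # shows scan progress and any errors repaired
```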

Re: [lustre-discuss] ZFS PANIC

2017-02-17 Thread Bob Ball
, it seems as if it must be mdtmgs related, by the old "what else could it be?" argument. Is this OST index a dead loss? Fix this index, or destroy forever and introduce a new OST? bob On 2/13/2017 1:00 PM, Bob Ball wrote: OK, so, I tried some new system mounts today, and each ti

Re: [lustre-discuss] ZFS PANIC

2017-02-18 Thread Bob Ball
.50 and .90), so you would be better off to update to the latest release (e.g. 2.8.0 or 2.9.0), which will have had much more testing. Likewise, ZFS 0.6.4.x is quite old and many fixes have gone into ZFS 0.6.5.x. Cheers, Andreas On Feb 17, 2017, at 19:21, Bob Ball <b...@umich.edu>

Re: [lustre-discuss] ZFS PANIC

2017-02-10 Thread Bob Ball
: 18 UP obdfilter umt3B-OST000f umt3B-OST000f_UUID 403 bob On 2/10/2017 9:39 AM, Bob Ball wrote: Hi, I am getting this message PANIC: zfs: accessing past end of object 29/7 (size=33792 access=33792+128) The affected OST seems to reject new mounts from clients now, and the lctl dl count

Re: [lustre-discuss] ZFS PANIC

2017-02-13 Thread Bob Ball
completion using spare disks. It would be nice though if someone had a better way to fix this, or could truly point to a reason why this is consistently happening now. bob On 2/10/2017 11:23 AM, Bob Ball wrote: Well, I find this odd, to say the least. All of this below was from yesterday, and

[lustre-discuss] ZFS PANIC

2017-02-10 Thread Bob Ball
Hi, I am getting this message PANIC: zfs: accessing past end of object 29/7 (size=33792 access=33792+128) The affected OST seems to reject new mounts from clients now, and the lctl dl count of connections to the obdfilter process increases, but does not seem to decrease? This is Lustre

Re: [lustre-discuss] "Not on preferred path" error

2016-09-20 Thread Bob Ball
Stabbing in the dark, but this sounds like a multipath problem. Perhaps you have 2 or more paths to the storage, and one or more of them is down for some reason, perhaps the hardware itself, perhaps a cable is pulled. You could look for LEDs in a bad state. I always find it instructive to

Re: [lustre-discuss] lfs_migrate transfer question

2016-09-26 Thread Bob Ball
Pre-make your list of files, split it into 5 or so parts, give each of 5 clients one piece, and let them migrate in parallel. bob On 9/26/2016 9:00 AM, Jérôme BECOT wrote: Hello, As we had a inode usage issue, we did as Andreas advised : - add a new ost with much more inodes - disable the
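A runnable sketch of the list-splitting step (the real list would come from lfs find; a stand-in list is generated here so the example works anywhere, and all paths are illustrative):

```shell
seq 1 100 | sed 's|^|/mnt/lustre/file|' > /tmp/migrate.list  # stand-in list
split -l 20 /tmp/migrate.list /tmp/migrate.part.   # -> part.aa .. part.ae
wc -l /tmp/migrate.part.*                          # 5 pieces of 20 lines
# then on each of the 5 clients:  lfs_migrate -y < /tmp/migrate.part.aa
```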

Re: [lustre-discuss] Lustre Orphaned Chunks

2016-10-17 Thread Bob Ball
You'll need to restart the mds/mdt machine before the space will reclaim off the OST. bob On 10/17/2016 2:32 PM, DeWitt, Chad wrote: Hi All. I am still learning Lustre and I have run into an issue. I have referred to both the Lustre admin manual and Google, but I've had no luck in finding

Re: [lustre-discuss] set OSTs read only ?

2017-07-16 Thread Bob Ball
until the OST is activated again. On Sun, Jul 16, 2017, 9:29 AM E.S. Rosenberg <esr+lus...@mail.hebrew.edu> wrote: On Thu, Jul 13, 2017 at 5:49 AM, Bob Ball <b...@umich.edu> wrote: On the m

Re: [lustre-discuss] set OSTs read only ?

2017-07-17 Thread Bob Ball
en't tested this myself). The problem might be that this prevents new clients from mounting. It probably makes sense to add server-side read-only mounting as a feature. Could you please file a ticket in Jira about this? Cheers, Andreas On Jul 16, 2017, at 09:16, Bob Ball <b...@umich.edu

Re: [lustre-discuss] set OSTs read only ?

2017-07-12 Thread Bob Ball
On the mgs/mgt do something like: lctl --device -OST0019-osc-MDT deactivate No further files will be assigned to that OST. Reverse with "activate". Or reboot the mgs/mdt as this is not persistent. "lctl dl" will tell you exactly what that device name should be for you. bob On

Re: [lustre-discuss] Rhel 7 or centos 7 for lustre mds and oss

2017-10-18 Thread Bob Ball
We have successfully used Scientific Linux 7, a variant of CentOS. bob On 10/18/2017 5:59 AM, Amjad Syed wrote: Hello We are in process of purchasing a new lustre filesystem for our site that will be used for life sciences and genomics. We would like to know if we should buy rhel license or

Re: [lustre-discuss] Lustre clients do not mount storage automatically

2017-10-24 Thread Bob Ball
If mounting from /etc/fstab, try adding "_netdev" as a parameter. This forces the mount to wait until the network is ready. bob On 10/24/2017 5:58 AM, Ravi Konila wrote: Hi My lustre clients does not mount lustre storage automatically on reboot. I tried by adding in fstab as well as in

Re: [lustre-discuss] Lustre clients do not mount storage automatically

2017-10-24 Thread Bob Ball
    /home    lustre defaults,_netdev    0 0 If I add mount command in rc.local twice, it works..surprise.. my rc.local has mount -t lustre 192.168.0.50@o2ib:/lhome /home mount -t lustre 192.168.0.50@o2ib:/lhome /home Regards *Ravi Konila* *From:* Bob Ball *Sent:* Tuesday, October 24, 2017 6:10 PM

Re: [lustre-discuss] OST mount issue

2021-04-26 Thread Bob Ball via lustre-discuss
Secure Linux? bob On 4/26/2021 12:26 PM, Steve Thompson wrote: On Mon, 26 Apr 2021, Degremont, Aurelien wrote: Could you provide more debugging information, like 'rpm -qa | grep lustre' on both hosts? The actual mount command, etc... There must be something different, as the result is