Re: [Lustre-discuss] Performance dropoff for a nearly full Lustre file system

2015-01-14 Thread Bob Ball
In my memory, it is not recommended to run Lustre more than 90% full. bob On 1/14/2015 2:43 PM, Mike Selway wrote: Hello, I’m looking for experiences for what has been observed to happen (performance drop offs, severity of drops, partial/full failures, …) when an operational
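A rough way to watch for the 90% mark mentioned above is to filter `lfs df` output. This is only a sketch: the function name is made up here, and it assumes the usual `lfs df` column layout (target UUID in field 1, Use% in field 5).

```shell
#!/bin/sh
# Sketch: flag OSTs at or above a fullness threshold (default 90%).
# Reads `lfs df`-style text on stdin. The column layout (target name in
# field 1, Use% in field 5) is an assumption about that output.
osts_over_threshold() {
    threshold="${1:-90}"
    awk -v t="$threshold" '
        $1 ~ /OST[0-9a-fA-F]+_UUID$/ {
            pct = $5
            sub(/%/, "", pct)              # strip the trailing percent sign
            if (pct + 0 >= t) print $1, $5
        }'
}

# Typical use on a client (the mount point is a placeholder):
#   lfs df /lustre/umt3 | osts_over_threshold 90
```

MDT lines and the header are skipped because only names ending in `OSTxxxx_UUID` are matched.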

[Lustre-discuss] Recovering a reformatted OST

2015-01-29 Thread Bob Ball
This is Lustre 2.1.6. I did a dumb thing. I reformatted a drained OST BEFORE I saved off appropriate data to re-establish it at its old index. I don't know what I was thinking? The files that I would normally restore are, LAST_ID last_rcvd mountdata umt3-OST0020 I know how to recreate the

Re: [Lustre-discuss] Recovering a reformatted OST

2015-01-30 Thread Bob Ball
Thank you, Andreas. I feel a bit better now. bob On 1/30/2015 11:25 AM, Dilger, Andreas wrote: On 2015/01/29, 12:34 PM, "Bob Ball" wrote: This is Lustre 2.1.6. I did a dumb thing. I reformatted a drained OST BEFORE I saved off appropriate data to re-establish it at its old

Re: [lustre-discuss] OST partition sizes

2015-04-28 Thread Bob Ball
Our biggest is a raidz6 on 10 disks of 4TB each. 8 is a good divisor into 1MB, and the 10 disks is a good divisor as well for our total number of disks. bob On 4/28/2015 4:01 PM, Andrus, Brian Contractor wrote: Quick question/survey: What is the partition size folks use for their OSTs and

[lustre-discuss] Deactivate an OST for new file write operations

2015-05-04 Thread Bob Ball
We just built a new 2.7.0 Lustre file system. Overall I'm happy with performance, but something is confusing me. We have a combined mgs/mdt DataStore. On that server I issue: ... 20 UP osp umt3B-OST000b-osc-MDT umt3B-MDT-mdtlov_UUID 5 [root@mdtmgs ~]# lctl --device 20 deactivate /var

Re: [lustre-discuss] Deactivate an OST for new file write operations

2015-05-04 Thread Bob Ball
index fail as expected, attempts to write to another, enabled index are fine. bob On 5/4/2015 1:39 PM, Bob Ball wrote: We just built a new 2.7.0 Lustre file system. Overall I'm happy with performance, but something is confusing me. We have a combined mgs/mdt DataStore. On that server I

Re: [lustre-discuss] Deactivate an OST for new file write operations

2015-05-05 Thread Bob Ball
ted a documentation change and LUDOC-218 was created which is still open. Regards, Roland Am 04.05.2015 um 19:54 schrieb Bob Ball: Hmm, an interesting addendum to this, section 18.3.4 of the Lustre manual shows how to create a file on a given OST. If I try that to the disabled OST, it silently

[lustre-discuss] Can't get zfs OST to mount on a freshly built server

2015-05-13 Thread Bob Ball
OK, so, I am seeing EXACTLY the issue reported at the end of LU-6452 8 minutes after it was closed by Andreas Dilger. https://jira.hpdd.intel.com/browse/LU-6452 There is no response. Is there a solution? This is Lustre 2.7.0 with (now) zfs 0.6.4.1-1, which was current when the server was buil

Re: [lustre-discuss] Can't get zfs OST to mount on a freshly built server

2015-05-25 Thread Bob Ball
here. Many thanks, bob On 5/13/2015 10:16 AM, Bob Ball wrote: OK, so, I am seeing EXACTLY the issue reported at the end of LU-6452 8 minutes after it was closed by Andreas Dilger. https://jira.hpdd.intel.com/browse/LU-6452 There is no response. Is there a solution? This is Lustre 2.7.0 with

[lustre-discuss] File writes blocking on Lustre 2.7.0

2015-06-05 Thread Bob Ball
OK, this was just odd to me. We have a Lustre 2.7.0 system running now, and, after setting up our first OSS, copying over all files from the old system, we brought the new file system online. The new backingstore is zfs. All was well with the world. Meanwhile, all the old file servers were

Re: [lustre-discuss] upgrading zfs back end file system lustre doesn't mount anymore

2015-06-08 Thread Bob Ball
I had the same issue. I could not find/get an answer, posted to this same list about 1mo ago. I ended up down-grading to the same zfs I had previously used, ie, 0.6.3.1 with Lustre 2.7.0. Maybe an answer has come round in the intervening month? But, it would appear that if you use zfs, you a

[lustre-discuss] Problem on Lustre 2.7.0

2015-06-10 Thread Bob Ball
We are running Lustre 2.7.0. # uname -r 2.6.32-504.8.1.el6_lustre.x86_64 The combined mgsmdt load jumped up yesterday, and stayed high since, with a couple of really outrageous peaks. Ended up power cycling, as the mdt would not umount. It seems to be performing fine now, but while watching l

Re: [lustre-discuss] lustre2 cannot set ost to read only

2015-06-15 Thread Bob Ball
cat /proc/fs/lustre/lov/*/target_obd I forget which version this bug appeared in, that the lctl command does not show the IN state. I am hoping it will be fixed in the 2.8 release as it is really annoying. bob On 6/15/2015 3:39 PM, Kurt Strosahl wrote: An update, Even though the MDT i

Re: [lustre-discuss] File migration from one OST to another for Lustre 2.5

2015-06-24 Thread Bob Ball
So, let's say I just want to empty the OST completely, reformat or remake or, then add the OST back in. If I never let the emptied OST, prior to re-X, reconnect, then the MDS cannot destroy the objects on the OST. Is this going to be an issue once the re-X OST is ready and is brought bac

Re: [lustre-discuss] Removing large directory tree

2015-07-10 Thread Bob Ball
FWIW, last September in the context of reducing the memory usage on the mgs, Andreas Dilger had this to say "ls" is itself fairly inefficient at directory traversal, because all of the GNU file utilities are bloated and do much more work than necessary (e.g. "rm" will stat() every file before un

[lustre-discuss] Trying to rebuild all Lustre rpms with zfs 0.6.4.2 -- and failing

2015-07-22 Thread Bob Ball
Hi, I'm looking for someone who can give me advice on a problem I am having rebuilding the full set (server and client) of Lustre rpms, including zfs support on the server side. The 2.7.0 rpms as distributed were built with zfs 0.6.3, and do not work with 0.6.4. So, I follow the directions

Re: [lustre-discuss] Expanding a zfsonlinux OST pool

2015-11-24 Thread Bob Ball
-Olaf *From:* Bob Ball [b...@umich.edu] *Sent:* Monday, November 23, 2015 12:22 PM *To:* Faaland, Olaf P.; Morrone, Chris *Cc:* Bob Ball *Subject:* Expanding a zfsonlinux OST pool Hi, We have some zfsonlinux pools in use with Lustre 2.7 that use some older disks, and we are rapidly running o

Re: [lustre-discuss] Expanding a zfsonlinux OST pool

2015-12-07 Thread Bob Ball
w zfs is using the disks. Also, how often are disks failing and how long does a replacement take to resilver, with your current disks? -Olaf *From:* Bob Ball [b...@umich.edu] *Sent:* Monday, November 23, 2015 12:22 P

Re: [lustre-discuss] Building lustre with zfs only

2015-12-15 Thread Bob Ball
You may want to do this on your build machine? cd /usr/src/spl-0.6.4.2/ ./configure --with-config=kernel make all cd ../zfs-0.6.4.2/ ./configure --with-config=kernel make all bob On 12/15/2015 2:16 PM, Christopher J. Morrone wrote: Is your OS CentOS 6.7 then? You _might_ be hitting bug LU-7

[lustre-discuss] re-creating a zfs OST

2016-01-12 Thread Bob Ball
I have a zfs OST that I need to drain and re-create. This is lustre 2.7.x. In the past, with ldiskfs OST, I did this a number of times following critical failures, and came up with the following items to be replaced on the file system once the mkfs.lustre had been run, where the new OST was m

Re: [lustre-discuss] re-creating a zfs OST

2016-01-14 Thread Bob Ball
/12, 12:50, "lustre-discuss on behalf of Bob Ball" wrote: I have a zfs OST that I need to drain and re-create. This is lustre 2.7.x. In the past, with ldiskfs OST, I did this a number of times following critical failures, and came up with the following items to be replaced on the

Re: [lustre-discuss] re-creating a zfs OST

2016-01-14 Thread Bob Ball
Thank you, Andreas. As always, your answers are the best. bob On 1/13/2016 10:14 PM, Dilger, Andreas wrote: On 2016/01/12, 12:50, "lustre-discuss on behalf of Bob Ball" wrote: I have a zfs OST that I need to drain and re-create. This is lustre 2.7.x. In the past, with ldiskfs

[lustre-discuss] Error on a zpool underlying an OST

2016-03-11 Thread Bob Ball
Hi, we have Lustre 2.7.58 in place on our OST and MDT/MGS (combined). Underlying the lustre file system is a raid-z2 zfs pool. A few days ago, we lost 2 disks at once from the raid-z2. I replaced one and a resilver started, that seemed to choke. So, I put back both disks with replacements,

Re: [lustre-discuss] Error on a zpool underlying an OST

2016-03-12 Thread Bob Ball
broken zfs object, like in "zdb: Examining ZFS At Point-Blank Range," and see what it is (plain zfs file or else). Knowing zfs version can be helpful. Alex On Mar 11, 2016, at 7:19 PM, Bob Ball <b...@umich.edu> wrote: errors: Permanent errors have been de

Re: [lustre-discuss] Lustre 2.8.0 released

2016-03-28 Thread Bob Ball
Has anyone else noticed that the lnetctl command is missing from this rpm set? I mean, an entire chapter of the manual dedicated to this command, and it is not present? Will this be fixed? bob On 3/16/2016 6:13 PM, Jones, Peter A wrote: We are pleased to announce that the Lustre 2.8.0 Relea

Re: [lustre-discuss] Lustre 2.8.0 released

2016-03-28 Thread Bob Ball
Glossman HPDD Software Engineer On 3/28/16, 1:11 PM, "lustre-discuss on behalf of Bob Ball" wrote: Has anyone else noticed that the lnetctl command is missing from this rpm set? I mean, an entire chapter of the manual dedicated to this command, and it is not present? Will thi

Re: [lustre-discuss] Lustre 2.8.0 released

2016-04-04 Thread Bob Ball
ee. On Mar 28, 2016, at 5:27 PM, Bob Ball wrote: Thanks. Are you going to replace the 2.8 rpm sets in your repos, or just leave them that way until 2.9? Yeah, that is a provocative question. It just seems to me that it is such a fundamental thing, that a new rpm set should be created for dist

Re: [lustre-discuss] ZFS version for lustre-2.8 binaries?

2016-04-12 Thread Bob Ball
What I recall hearing is that, because of IO issues with 0.6.5, official Lustre 2.8.0 support is with zfs 0.6.4.2 bob On 4/12/2016 11:21 AM, Nathan Smith wrote: In doing a test install of vanilla Lustre 2.8 [0] and zfs-0.6.5.6 I received the symbol mismatch error when attempting to start lustr

Re: [lustre-discuss] Error on a zpool underlying an OST

2016-07-12 Thread Bob Ball
The answer came offline, and I guess I never replied back to the original posting. This is what I learned. It deals with only a single file, not 1000's. --bob --- On Mon, 14 Mar 2016, Bob Ball wrote: OK, it would seem the affected user has already deleted

Re: [lustre-discuss] ZFS not freeing disk space

2016-08-10 Thread Bob Ball
It is my understanding that when you set the OST deactivated, then you also don't get updates on the used space either, as of some recent version of Lustre. It has never been clear to me though if a simple re-activation is needed to run the logs, or if the reboot is required, once it is re-act

Re: [lustre-discuss] "Not on preferred path" error

2016-09-20 Thread Bob Ball
Stabbing in the dark, but this sounds like a multipath problem. Perhaps you have 2 or more paths to the storage, and one or more of them is down for some reason, perhaps the hardware itself, perhaps a cable is pulled. You could look for LEDs in a bad state. I always find it instructive to

Re: [lustre-discuss] lfs_migrate transfer question

2016-09-26 Thread Bob Ball
Pre-make your list of files, split it into 5 or so parts, give each of 5 clients one piece, and let them migrate in parallel. bob On 9/26/2016 9:00 AM, Jérôme BECOT wrote: Hello, As we had a inode usage issue, we did as Andreas advised : - add a new ost with much more inodes - disable the fi
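The split-and-parallelize advice above can be sketched as follows. The function name and output prefix are hypothetical, and the sketch relies on GNU coreutils `split -n l/N`, which divides a file into N pieces without breaking a line in two.

```shell
#!/bin/sh
# Sketch: split a pre-made file list into N line-aligned pieces so that
# each client can migrate one piece. Assumes GNU coreutils `split`.
split_file_list() {
    list="$1"                      # file with one path per line
    parts="${2:-5}"                # number of pieces, default 5
    prefix="${3:-migrate_part_}"   # output name prefix (hypothetical)
    split -n "l/${parts}" -d "$list" "$prefix"
}

# Each client would then take one piece, for example:
#   lfs_migrate -y < migrate_part_00
```

The pieces are balanced by size in bytes rather than by file count, so a list with very uneven path lengths will give slightly uneven pieces.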

Re: [lustre-discuss] Lustre Orphaned Chunks

2016-10-17 Thread Bob Ball
You'll need to restart the mds/mdt machine before the space will reclaim off the OST. bob On 10/17/2016 2:32 PM, DeWitt, Chad wrote: Hi All. I am still learning Lustre and I have run into an issue. I have referred to both the Lustre admin manual and Google, but I've had no luck in finding

Re: [lustre-discuss] scrubbing Lustre/ZFS OSSs

2017-01-31 Thread Bob Ball
Just "zpool scrub ". Scrub may slow down access, but it does not otherwise impact the OST, in my experience. bob On 1/30/2017 9:52 PM, Riccardo Veraldi wrote: Hello, I need to scrub the underlying ZFS data pools on my Lustre OSSs. May I do it safely when the Lustre filesystem is mounted ?

[lustre-discuss] ZFS PANIC

2017-02-10 Thread Bob Ball
Hi, I am getting this message PANIC: zfs: accessing past end of object 29/7 (size=33792 access=33792+128) The affected OST seems to reject new mounts from clients now, and the lctl dl count of connections to the obdfilter process increases, but does not seem to decrease? This is Lustre 2.7.

Re: [lustre-discuss] ZFS PANIC

2017-02-10 Thread Bob Ball
state: 18 UP obdfilter umt3B-OST000f umt3B-OST000f_UUID 403 bob On 2/10/2017 9:39 AM, Bob Ball wrote: Hi, I am getting this message PANIC: zfs: accessing past end of object 29/7 (size=33792 access=33792+128) The affected OST seems to reject new mounts from clients now, and the lctl dl cou

Re: [lustre-discuss] ZFS PANIC

2017-02-13 Thread Bob Ball
upon completion using spare disks. It would be nice though if someone had a better way to fix this, or could truly point to a reason why this is consistently happening now. bob On 2/10/2017 11:23 AM, Bob Ball wrote: Well, I find this odd, to say the least. All of this below was from yesterday

Re: [lustre-discuss] ZFS PANIC

2017-02-17 Thread Bob Ball
seems as if it must be mdtmgs related, by the old "what else could it be?" argument. Is this OST index a dead loss? Fix this index, or destroy forever and introduce a new OST? bob On 2/13/2017 1:00 PM, Bob Ball wrote: OK, so, I tried some new system mounts today, and each ti

Re: [lustre-discuss] ZFS PANIC

2017-02-18 Thread Bob Ball
tween .50 and .90), so you would be better off to update to the latest release (e.g. 2.8.0 or 2.9.0), which will have had much more testing. Likewise, ZFS 0.6.4.x is quite old and many fixes have gone into ZFS 0.6.5.x. Cheers, Andreas On Feb 17, 2017, at 19:21, Bob Ball wrote: No luck, remove

Re: [lustre-discuss] set OSTs read only ?

2017-07-12 Thread Bob Ball
On the mgs/mgt do something like: lctl --device -OST0019-osc-MDT deactivate No further files will be assigned to that OST. Reverse with "activate". Or reboot the mgs/mdt as this is not persistent. "lctl dl" will tell you exactly what that device name should be for you. bob On 7/12/201
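Since "lctl dl" is where the exact device name comes from, a small filter can pull it out for a given OST index. This is a sketch, not a tested procedure: the function name is invented, and it assumes the device name sits in field 4 of each `lctl dl` line, as in the listings quoted elsewhere in these threads.

```shell
#!/bin/sh
# Sketch: pick the MDS-side device name for one OST out of `lctl dl`-style
# text on stdin. Field 4 holding the device name is an assumption based on
# the listings quoted in these threads.
osc_device_for_ost() {
    ost="$1"                       # e.g. OST0019
    awk -v ost="$ost" '$4 ~ ("-" ost "-osc") { print $4 }'
}

# On the MDS, roughly:
#   dev=$(lctl dl | osc_device_for_ost OST0019)
#   lctl --device "$dev" deactivate     # reverse later with: activate
```

As noted above, the deactivation is not persistent, so it would need to be repeated after a restart of the MDS.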

Re: [lustre-discuss] set OSTs read only ?

2017-07-16 Thread Bob Ball
jects until the OST is activated again. On Sun, Jul 16, 2017, 9:29 AM E.S. Rosenberg <esr+lus...@mail.hebrew.edu> wrote: On Thu, Jul 13, 2017 at 5:49 AM, Bob Ball <b...@umich.edu> wrote: On the mgs/mgt do something like: lctl --device -OS

Re: [lustre-discuss] set OSTs read only ?

2017-07-17 Thread Bob Ball
en't tested this myself). The problem might be that this prevents new clients from mounting. It probably makes sense to add server-side read-only mounting as a feature. Could you please file a ticket in Jira about this? Cheers, Andreas On Jul 16, 2017, at 09:16, Bob Ball <mailto:b...@um

Re: [lustre-discuss] Rhel 7 or centos 7 for lustre mds and oss

2017-10-18 Thread Bob Ball
We have successfully used Scientific Linux 7, a variant of CentOS. bob On 10/18/2017 5:59 AM, Amjad Syed wrote: Hello We are in process of purchasing a new lustre filesystem for our site that will be used for life sciences and genomics. We would like to know if we should buy rhel license or g

Re: [lustre-discuss] Lustre clients do not mount storage automatically

2017-10-24 Thread Bob Ball
If mounting from /etc/fstab, try adding "_netdev" as a parameter. This forces the mount to wait until the network is ready. bob On 10/24/2017 5:58 AM, Ravi Konila wrote: Hi My lustre clients does not mount lustre storage automatically on reboot. I tried by adding in fstab as well as in /etc/rc
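To check whether existing fstab entries already carry the option, something like the following could be used. It is a sketch under the assumption of a standard six-field fstab layout; the function name is made up for illustration.

```shell
#!/bin/sh
# Sketch: list the mount points of lustre entries in an fstab-style file
# that do not have _netdev among their options. Assumes the usual layout:
# device, mount point, type, options, dump, pass.
lustre_mounts_missing_netdev() {
    awk '$1 !~ /^#/ && $3 == "lustre" && $4 !~ /(^|,)_netdev(,|$)/ { print $2 }' \
        "${1:-/etc/fstab}"
}

# Example:
#   lustre_mounts_missing_netdev /etc/fstab
```

An empty result means every lustre entry in the file already includes `_netdev`.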

Re: [lustre-discuss] Lustre clients do not mount storage automatically

2017-10-24 Thread Bob Ball
lhome    /home    lustre defaults,_netdev    0 0 If I add mount command in rc.local twice, it works..surprise.. my rc.local has mount -t lustre 192.168.0.50@o2ib:/lhome /home mount -t lustre 192.168.0.50@o2ib:/lhome /home Regards *Ravi Konila* *From:* Bob Ball *Sent:* Tuesday, October 24, 2017 6:10 P

Re: [lustre-discuss] Lustre 2.10.3 on RHEL7.5 can't correctly mount from Lustre 2.8 server

2018-05-11 Thread Bob Ball
No, that version of Lustre does not work with SL7.5.  See some previous threads on this.  I gather we need 2.11 for SL7.5 bob On 5/11/2018 10:15 AM, Kevin M. Hildebrand wrote: I'm not sure if this is a supported behavior or not, but I'm currently unable to mount a filesystem from my 2.8 server

[Lustre-discuss] Cannot get an OST to activate

2010-09-03 Thread Bob Ball
We added a new OSS to our 1.8.4 Lustre installation. It has 6 OST of 8.9TB each. Within a day of having these on-line, one OST stopped accepting new files. I cannot get it to activate. The other 5 seem fine. On the MDS "lctl dl" shows it IN, but not UP, and files can be read from it: 33 IN

Re: [Lustre-discuss] Cannot get an OST to activate

2010-09-03 Thread Bob Ball
files back from that command, but other problems on our cluster confused that result.  We will recheck. bob Bernd Schubert wrote: On Friday, September 03, 2010, Bob Ball wrote: We added a new OSS to our 1.8.4 Lustre installation. It has 6 OST of 8.9TB each. Within a day of havin

Re: [Lustre-discuss] client modules not loading during boot

2010-09-08 Thread Bob Ball
Try adding _netdev as a mount option. bob Cliff White wrote: The mount command will automatically load the modules on the client. cliffw On 09/03/2010 11:56 AM, Ronald K Long wrote: We have installed lustre 1.8.2 and 1.8.4 client on Red hat 5. The lustre modules are not loading

Re: [Lustre-discuss] Cannot get an OST to activate

2010-09-10 Thread Bob Ball
d0   292648   346413 e0    68225 -7137254917378053186 f0    59064    59607 000100    59227    59414 Thanks, bob Bob Ball wrote: Thank you, Bern.  "df" claims there is some 442MB of data on the v

Re: [Lustre-discuss] Cannot get an OST to activate

2010-09-10 Thread Bob Ball
I just made some random checks on the "lfs find" output for this OST from yesterday.  Each file I checked was one lost when we had problems a few months back.  The suggested "unlink" on these did not work in 1.8.3, worked fine on a whole set yesterday with 1.8.4, but I obviously did not find th

Re: [Lustre-discuss] Cannot get an OST to activate

2010-09-10 Thread Bob Ball
so no down-time was involved. bob Bob Ball wrote: I just made some random checks on the "lfs find" output for this OST from yesterday.  Each file I checked was one lost when we had problems a few months back.  The suggested "unlink" on these did not work in 1.8.3, worked fine

Re: [Lustre-discuss] Large directory performance

2010-09-13 Thread Bob Ball
Peter, can you comment on what you said here about RAID6? Are there Twiki or other entries somewhere about this? There are relevant bits of advice in 1.4.2.2 and 10.1.1-4 for example (some of them objectionable, such as recommending RAID6 for data storage, without the necessary qualifications at

Re: [Lustre-discuss] How do you monitor your lustre?

2010-09-30 Thread Bob Ball
syslog-ng bob On 9/30/2010 1:31 PM, Ben Evans wrote: > Wouldn't it be better to log to a non-Lustre machine? If your MDS goes down, > you lose all your logs, which would make things a bit trickier. > > -Original Message- > From: lustre-discuss-boun...@lists.lustre.org > [mailto:lustre

Re: [Lustre-discuss] Question about "lfs find"

2010-10-06 Thread Bob Ball
in my experience a single (or small number) of lfs_find with lots of obd arguments was faster than doing all of them individually. go to lustre 1.8.4 (at least) and use lfs_migrate with your lfs_find list. it wasn't REAL fast, but it was REAL reliable. bob On 10/6/2010 5:24 PM, Michael Barn

Re: [Lustre-discuss] controlling which eth interface lustre uses

2010-10-21 Thread Bob Ball
Why do you need both active? If one is a backup to the other, then bond them as a primary/backup pair, meaning only one will be active at a time, ie, your designated primary (unless it goes down). bob On 10/21/2010 9:51 AM, Brock Palen wrote: > On Oct 21, 2010, at 9:48 AM, Joe Landman wrote

Re: [Lustre-discuss] controlling which eth interface lustre uses

2010-10-21 Thread Bob Ball
afterwards for information. bob On 10/21/2010 9:59 AM, Bob Ball wrote: > Why do you need both active? If one is a backup to the other, then bond > them as a primary/backup pair, meaning only one will be active at a > time, ie, your designated primary (unless it goes down). > > bob >

[Lustre-discuss] completely emptying an OST

2010-10-24 Thread Bob Ball
We may need to completely empty an OST, so that we can more efficiently format the underlying RAID-6 array (in light of recent discussions about the stripe on this list). What is the most efficient way to do this? I can use multiple clients running lfs_migrate from pre-prepared lists, but tha

[Lustre-discuss] questions about an OST content

2010-11-06 Thread Bob Ball
I am emptying a set of OST so that I can reformat the underlying RAID-6 more efficiently. Two questions: 1. Is there a quick way to tell if the OST is really empty? lfs_find takes many hours to run. 2. When I reformat, I want it to retain the same ID so as to not make "holes" in the list. Fro

Re: [Lustre-discuss] questions about an OST content

2010-11-06 Thread Bob Ball
ount_point This still shows several hundred MB as well, in disparate amounts per OST I'm emptying. An indicator, but not perfect. Yes, the OST were near full to begin with. More below. On 11/6/2010 11:09 AM, Andreas Dilger wrote: > On 2010-11-06, at 8:24, Bob Ball wrote: >> I am

Re: [Lustre-discuss] questions about an OST content

2010-11-07 Thread Bob Ball
ex even matter if we restore the files below that you mention? That seems to be what you are saying. Thanks much, bob On 11/6/2010 11:09 AM, Andreas Dilger wrote: > On 2010-11-06, at 8:24, Bob Ball wrote: >> I am emptying a set of OST so that I can reformat the underlying RAID-6 >>

Re: [Lustre-discuss] questions about an OST content

2010-11-07 Thread Bob Ball
BTW, the new OST sizes will be much different from the original OST sizes. Is the "copy the old file" method below still valid in this case? bob On 11/7/2010 2:32 PM, Bob Ball wrote: > Hi, Andreas. > > Tomorrow, we will redo all 8 OST on the first file server we are >

Re: [Lustre-discuss] questions about an OST content

2010-11-07 Thread Bob Ball
Thanks, Ashley. No quotas, fortunately. Tomorrow will be "fun". bob On 11/7/2010 3:44 PM, Ashley Pittman wrote: > On 7 Nov 2010, at 19:32, Bob Ball wrote: >> So, while we are doing the reformat, is there any way to avoid this >> "hang" situation? > I bel

Re: [Lustre-discuss] questions about an OST content

2010-11-08 Thread Bob Ball
dress already in use retries left: 0 mount.lustre: mount /dev/sdc at /mnt/ost12 failed: Address already in use The target service's index is already in use. (/dev/sdc) On 11/8/2010 5:01 AM, Andreas Dilger wrote: On 2010-11-07, at 12:32, B

Re: [Lustre-discuss] questions about an OST content

2010-11-08 Thread Bob Ball
/8/2010 5:01 AM, Andreas Dilger wrote: On 2010-11-07, at 12:32, Bob Ball <b...@umich.edu> wrote: Tomorrow, we will redo all 8 OST on the first file server we are redoing.  I am very nervous about this, as a lot is riding on us doing this co

Re: [Lustre-discuss] questions about an OST content

2010-11-08 Thread Bob Ball
Yes, you are correct. That was the key here, did not put that file back in place. Back up and (so far) operating cleanly. Thanks, bob On 11/8/2010 3:04 PM, Andreas Dilger wrote: > On 2010-11-08, at 11:39, Bob Ball wrote: >> Don't know if I sent to the whole list. One

Re: [Lustre-discuss] questions about an OST content

2010-11-10 Thread Bob Ball
, but can't seem to find anything on it now, or what to do about it. help? bob On 11/8/2010 4:27 PM, Bob Ball wrote: > Yes, you are correct. That was the key here, did not put that file back > in place. Back up and (so far) operating cleanly. > > Thanks, > bob > > On 11

Re: [Lustre-discuss] questions about an OST content

2010-11-10 Thread Bob Ball
t syncing: Fatal exception On 11/10/2010 1:01 PM, Bob Ball wrote: > Well, we ran 2 days, migrating files off OST, then this morning, the MDT > crashed. Could not get all clients reconnected before seeing another > kernel panic on the mdt. did an e2fsck of the mdt db and tried again.

Re: [Lustre-discuss] questions about an OST content

2010-11-10 Thread Bob Ball
-10, at 11:01, Bob Ball wrote: >> Well, we ran 2 days, migrating files off OST, then this morning, the MDT >> crashed. Could not get all clients reconnected before seeing another >> kernel panic on the mdt. did an e2fsck of the mdt db and tried again. >> crashed agai

[Lustre-discuss] Problems with lfs find

2010-11-29 Thread Bob Ball
I have an odd problem. I am trying to empty all files from a set of OST as indicated below, by making a list via lfs find and then sending that list to lfs_migrate. However, I have just gotten this message back from the lfs find: llapi_semantic_traverse: Failed to open '/lustre/umt3/data13/d

Re: [Lustre-discuss] Problems with lfs find

2010-11-30 Thread Bob Ball
OK, thanks. Scary, to see errors out of lfs find. bob On 11/30/2010 1:47 AM, Andreas Dilger wrote: > On 2010-11-29, at 20:18, Bob Ball wrote: >> I have an odd problem. I am trying to empty all files from a set of OST >> as indicated below, by making a list via lfs find and th

Re: [Lustre-discuss] Problems with lfs find

2010-11-30 Thread Bob Ball
ply copy them out of this "ldiskfs" mount of the file system, back into some recovery directory in the real file system, so that users can pick through them? After they are moved, the file system will be reformatted and returned to use. bob On 11/30/2010 8:53 AM, Bob Ball wrote: >

Re: [Lustre-discuss] Problems with lfs find

2010-11-30 Thread Bob Ball
On 11/30/2010 4:17 PM, Andreas Dilger wrote: > On 2010-11-30, at 11:17, Bob Ball wrote: >> [r...@umdist03 d0]# ls -l >> total 182976 >> -rw-rw-rw- 1 daits users 45002956 Jul 5 20:52 1162976 >> -rw-rw-rw- 1 daits users 44569036 Jul 7 02:53 1200608 >> -rw-rw-rw- 1

[Lustre-discuss] OST error

2010-12-02 Thread Bob Ball
We were getting errors thrown by an OST. /var/log/messages contained a lot of these: 2010-11-28T17:05:34-05:00 umfs06.aglt2.org kernel: [2102640.735927] LDISKFS-fs error (device sdk): ldiskfs_mb_check_ondisk_bitmap: on-disk bitmap for group 639 corrupted: 440 blocks free in bitmap, 439 - in gd

Re: [Lustre-discuss] OST error

2010-12-02 Thread Bob Ball
fy the error counters > on that. > > -cf > > > On 12/02/2010 02:00 PM, Bob Ball wrote: >> We were getting errors thrown by an OST. /var/log/messages contained a >> lot of these: >> 2010-11-28T17:05:34-05:00 umfs06.aglt2.org kernel: [2102640.735927] >> LDI

Re: [Lustre-discuss] OST error

2010-12-03 Thread Bob Ball
remain corrupted, and we'll probably never be able to come up with a complete list of them. bob On 12/2/2010 4:35 PM, Bob Ball wrote: > It is a Dell PERC6 RAID array. OMSA monitoring is enabled and is not > throwing errors. H, mptctl is old though, so maybe that is a >

[Lustre-discuss] Getting around a Catch-22

2010-12-07 Thread Bob Ball
We have 6 OSS, each with at least 8 OST. It sometimes happens that I need to do maintenance on an OST, so to avoid hanging processes on the client machines, I use lctl to disable access to that OST on active client machines. So, now, it may happen during this maintenance that a client machine

Re: [Lustre-discuss] Getting around a Catch-22

2010-12-08 Thread Bob Ball
Thanks for the pointer to this. After thinking on this a bit, I believe I can see my way clear to using it. Testing time bob On 12/7/2010 5:45 PM, Cliff White wrote: > On 12/07/2010 06:51 AM, Bob Ball wrote: >> We have 6 OSS, each with at least 8 OST. It sometimes happens that

[Lustre-discuss] Renaming an OSS

2010-12-13 Thread Bob Ball
For administrative reasons, we want to rename an OSS. It will retain the same IP addresses, and have the same OST. Does this present any problems that I should be aware of, or is it a no-brainer? Thanks, bob ___ Lustre-discuss mailing list Lustre-di

Re: [Lustre-discuss] Renaming an OSS

2010-12-13 Thread Bob Ball
From: lustre-discuss-boun...@lists.lustre.org > [mailto:lustre-discuss-boun...@lists.lustre.org] On Behalf Of Bob Ball > Sent: Monday, December 13, 2010 1:49 PM > To: Lustre discussion > Subject: [Lustre-discuss] Renaming an OSS > > For administrative reasons, we want to rename

Re: [Lustre-discuss] Renaming an OSS

2010-12-13 Thread Bob Ball
hostnames. If you have heartbeat, you need to change names in configuration Best regards, Wojciech On 13 December 2010 21:34, Bob Ball <b...@umich.edu> wrote: Yes, just the DNS name.  From Andreas' respo

Re: [Lustre-discuss] Renaming an OSS

2010-12-13 Thread Bob Ball
.       From: Bob Ball [mailto:b...@umich.edu] Sent: Monday, December 13, 2010 3:02 PM To: Wojciech Turek Cc: Lundgren, Andrew; Lustre discussion Subject: Re: [Lustre-discuss

[Lustre-discuss] Trying to mount lustre on a client when one or more OST is disabled

2010-12-14 Thread Bob Ball
I am trying to get a lustre client to mount the service, but with one or more OST disabled. This does not appear to be working. Lustre version is 1.8.4. mount -o localflock,exclude=umt3-OST0019 -t lustre 10.10.1@tcp0:/umt3 /lustre/umt3 dmesg on this client shows the following during th

Re: [Lustre-discuss] Trying to mount lustre on a client when one or more OST is disabled

2010-12-14 Thread Bob Ball
bob On 12/14/2010 11:57 AM, Andreas Dilger wrote: > The error message shows a timeout connecting to umt3-MDT and not the OST. > The operation 38 is MDS_CONNECT, AFAIK. > > Cheers, Andreas > > On 2010-12-14, at 9:19, Bob Ball wrote: > >> I am trying to get a

Re: [Lustre-discuss] Trying to mount lustre on a client when one or more OST is disabled

2010-12-14 Thread Bob Ball
hat? And, most importantly, how do I fix this? bob On 12/14/2010 3:05 PM, Bob Ball wrote: > Well, you are absolutely right, it is a timeout talking to what it > THINKS is the MDT. The thing is, it is NOT! > > We were set up for HA for the MDT, with 10.10.1.48 and 10.10.1.49 > watch

Re: [Lustre-discuss] Trying to mount lustre on a client when one or more OST is disabled

2010-12-15 Thread Bob Ball
r. Are you sure you properly > specified the failover parameters > during mkfs on the MDT and did the first mount from the correct machine? > > If the NIDs are wrong, it is possible to correct it using > --writeconf. See the manual (or search > the list archives). > > Kevin >

Re: [Lustre-discuss] Trying to mount lustre on a client when one or more OST is disabled

2010-12-15 Thread Bob Ball
On 12/15/2010 1:33 PM, Bob Ball wrote: > And, the hole gets deeper. I was digging in the list archives, and in > the manual, and decided to look at what was stored in the file systems > using "tunefs.lustre --print". > > The mgs machine is fine: > [mgs:~]# tunefs.lustre --

Re: [Lustre-discuss] lustre routing between two IP networks (public private)

2011-01-04 Thread Bob Ball
We have 3 machines that live on both the private network where the bulk of the lustre clients and all the servers are located, and several clients on the second network of the 3 machines that also need to access the lustre file system. We do this via lnet routing. From the modprobe.conf on ea

Re: [Lustre-discuss] lustre routing between two IP networks (public private)

2011-01-04 Thread Bob Ball
I forgot to mention, the mgs also has addresses on both networks. options lnet networks="tcp0(eth0),tcp2(eth2)" The MDT does NOT have addresses on both networks. bob On 1/4/2011 10:00 AM, Bob Ball wrote: > We have 3 machines that live on both the private network where the bulk >

[Lustre-discuss] question about routing between subnets

2011-01-21 Thread Bob Ball
Our lustre 1.8.4 system sits primarily on subnet A. However, we also have a small number of clients that sit on subnet B. In setting up the subnet B clients, we provided lnet router machines that have addresses on both subnet A and on subnet B, the MGS machine has addresses on both subnet A a

[Lustre-discuss] Fwd: question about routing between subnets

2011-01-25 Thread Bob Ball
Date: Fri, 21 Jan 2011 15:48:25 -0500 From: Bob Ball To: Lustre discussion Our lustre 1.8.4 system sits primarily on subnet A. However, we also have a small number

Re: [Lustre-discuss] Fwd: question about routing between subnets

2011-01-25 Thread Bob Ball
OK, so, finally with time on my hands, I find I can make this work.  Sorry about the message list traffic. bob On 1/25/2011 9:51 AM, Bob Ball wrote: Hi, no response on this the first time I sent it around.  Can anyone help me on this

[Lustre-discuss] Migrating MDT volume to a new location

2011-02-02 Thread Bob Ball
Is there a recommended way to migrate an MDT (MGS is separate) volume from one location to another on the same server? This uses iSCSI volumes. Lustre 1.8.4 Thanks, bob

Re: [Lustre-discuss] Migrating MDT volume to a new location

2011-02-03 Thread Bob Ball
We used dd. It took about 12 hours. After the dd, we did an e2fsck on the new volume, remounted it as the MDT, and Lustre happily began serving files again. Thanks to everyone for their help. bob On 2/3/2011 12:21 PM, Frederik Ferner wrote: > Bob Ball wrote: >> Is there a recommend

Re: [Lustre-discuss] Migrating MDT volume to a new location

2011-02-03 Thread Bob Ball
w if someone has experience with resize2fs applied > to a MDT ... > > Cheers, > Thomas > > On 02/03/2011 06:35 PM, Bob Ball wrote: >> We used dd. It took about 12 hours. After the dd, we did an e2fsck on >> the new volume, remounted it as the MDT, and Lustre happily b

Re: [Lustre-discuss] osc_brw_redo_request error on clients

2011-02-09 Thread Bob Ball
Maybe clients should mount the file system with the "localflock" option? Please check the manual for information about this, but I think it was the same problem we had a while back where a dynamic link was failing. bob On 2/9/2011 7:24 PM, James Robnett wrote: >> Normally I've had no problems

Re: [Lustre-discuss] Lustre client error

2011-02-15 Thread Bob Ball
You can deactivate it on the MDT, which will make it read-only, but leave it alone on the clients so they can still access files from it. bob On 2/15/2011 1:57 PM, Jagga Soorma wrote: Hi Guys, One of my clients got a hung lustre mount this morning and I saw the following errors in my logs: -- ..sn

Re: [Lustre-discuss] Help! Newbie trying to set up Lustre network

2011-02-22 Thread Bob Ball
Quite often, sunrpc takes the port needed by Lustre, before Lustre can get to it. That results in the messages below. No recourse but to reboot. Put the mount in your /etc/fstab as the simplest approach. This may not be the ONLY reason why this happens, but it is the one that has most ofte

Re: [Lustre-discuss] Help! Newbie trying to set up Lustre network

2011-02-22 Thread Bob Ball
Make sure that the kernel you are running matches up with the rpms you installed then? [ball@umt3int01:gate01_b]$ rpm -qa|grep lustre lustre-client-1.8.4-2.6.18_194.17.4.el5.x86_64 lustre-client-modules-1.8.4-2.6.18_194.17.4.el5.x86_64 [ball@umt3int01:gate01_b]$ uname -r 2.6.18-194.17.4.el5 bob
