As I recall, it is not recommended to run Lustre more than 90% full.
bob
On 1/14/2015 2:43 PM, Mike Selway wrote:
Hello,
I’m looking for experiences for what has been observed
to happen (performance drop offs, severity of drops, partial/full
failures, …) when an operational
This is Lustre 2.1.6.
I did a dumb thing. I reformatted a drained OST BEFORE I saved off
appropriate data to re-establish it at its old index. I don't know what
I was thinking. The files that I would normally restore are:
LAST_ID
last_rcvd
mountdata
umt3-OST0020
I know how to recreate the
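A minimal sketch of the backup step being described, assuming an ldiskfs
OST; the device, mount point, and backup directory here are hypothetical:
mount -t ldiskfs /dev/sdX /mnt/ost_tmp        # hypothetical OST device
mkdir -p /root/ost_backup
cp -a /mnt/ost_tmp/O/0/LAST_ID /mnt/ost_tmp/last_rcvd /root/ost_backup/
cp -a /mnt/ost_tmp/CONFIGS/mountdata /mnt/ost_tmp/CONFIGS/umt3-OST0020 /root/ost_backup/
umount /mnt/ost_tmp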
Thank you, Andreas. I feel a bit better now.
bob
On 1/30/2015 11:25 AM, Dilger, Andreas wrote:
On 2015/01/29, 12:34 PM, "Bob Ball" wrote:
This is Lustre 2.1.6.
I did a dumb thing. I reformatted a drained OST BEFORE I saved off
appropriate data to re-establish it at its old
Our biggest is a raidz2 on 10 disks of 4TB each. The 8 data disks
divide evenly into 1MB, and 10 divides evenly into our total
number of disks.
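A hypothetical zpool layout matching that geometry (pool and disk names
are placeholders):
zpool create ostpool raidz2 sdb sdc sdd sde sdf sdg sdh sdi sdj sdk   # 8 data + 2 parity disks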
bob
On 4/28/2015 4:01 PM, Andrus, Brian Contractor wrote:
Quick question/survey:
What is the partition size folks use for their OSTs and
We just built a new 2.7.0 Lustre file system. Overall I'm happy with
performance, but something is confusing me.
We have a combined mgs/mdt DataStore. On that server I issue:
...
20 UP osp umt3B-OST000b-osc-MDT umt3B-MDT-mdtlov_UUID 5
[root@mdtmgs ~]# lctl --device 20 deactivate
/var
index fail as expected; attempts to write
to another, enabled index are fine.
bob
On 5/4/2015 1:39 PM, Bob Ball wrote:
We just built a new 2.7.0 Lustre file system. Overall I'm happy with
performance, but something is confusing me.
We have a combined mgs/mdt DataStore. On that server I
ted a documentation change and LUDOC-218
was created, which is still open.
Regards,
Roland
On 5/4/2015 7:54 PM, Bob Ball wrote:
Hmm, an interesting addendum to this, section 18.3.4 of the Lustre
manual shows how to create a file on a given OST. If I try that to the
disabled OST, it silently
OK, so, I am seeing EXACTLY the issue reported at the end of LU-6452, 8
minutes after it was closed by Andreas Dilger.
https://jira.hpdd.intel.com/browse/LU-6452
There is no response. Is there a solution?
This is Lustre 2.7.0 with (now) zfs 0.6.4.1-1, which was current when
the server was buil
here.
Many thanks,
bob
On 5/13/2015 10:16 AM, Bob Ball wrote:
OK, so, I am seeing EXACTLY the issue reported at the end of LU-6452, 8
minutes after it was closed by Andreas Dilger.
https://jira.hpdd.intel.com/browse/LU-6452
There is no response. Is there a solution?
This is Lustre 2.7.0 with
OK, this was just odd to me. We have a Lustre 2.7.0 system running now,
and, after setting up our first OSS, copying over all files from the old
system, we brought the new file system online. The new backing store is
zfs. All was well with the world.
Meanwhile, all the old file servers were
I had the same issue. I could not find/get an answer when I posted to this
same list about a month ago. I ended up down-grading to the same zfs I had
previously used, i.e., 0.6.3.1 with Lustre 2.7.0
Maybe an answer has come round in the intervening month? But, it would
appear that if you use zfs, you a
We are running Lustre 2.7.0.
# uname -r
2.6.32-504.8.1.el6_lustre.x86_64
The combined mgs/mdt load jumped up yesterday, and has stayed high since,
with a couple of really outrageous peaks. Ended up power cycling, as
the mdt would not umount. It seems to be performing fine now, but while
watching l
cat /proc/fs/lustre/lov/*/target_obd
I forget which version this bug appeared in, where the lctl command does
not show the IN state. I am hoping it will be fixed in the 2.8 release,
as it is really annoying.
bob
On 6/15/2015 3:39 PM, Kurt Strosahl wrote:
An update,
Even though the MDT i
So, let's say I just want to empty the OST completely, reformat or
remake it, then add the OST back in. If I never let the emptied
OST reconnect prior to re-creation, then the MDS cannot destroy the objects
on the OST.
Is this going to be an issue once the re-created OST is ready and is brought
bac
FWIW, last September in the context of reducing the memory usage on the
mgs, Andreas Dilger had this to say:
"ls" is itself fairly inefficient at directory traversal, because all of
the GNU file utilities are bloated and do much more work than necessary
(e.g. "rm" will stat() every file before un
Hi,
I'm looking for someone who can give me advice on a problem I am having
rebuilding the full set (server and client) of Lustre rpms, including
zfs support on the server side. The 2.7.0 rpms as distributed were
built with zfs 0.6.3, and do not work with 0.6.4.
So, I follow the directions
-Olaf
*From:* Bob Ball [b...@umich.edu]
*Sent:* Monday, November 23, 2015 12:22 PM
*To:* Faaland, Olaf P.; Morrone, Chris
*Cc:* Bob Ball
*Subject:* Expanding a zfsonlinux OST pool
Hi,
We have some zfsonlinux pools in use with Lustre 2.7 that use some
older disks, and we are rapidly running o
w zfs is using the disks.
Also, how often are disks failing and how long does a replacement take
to resilver, with your current disks?
-Olaf
*From:* Bob Ball [b...@umich.edu]
*Sent:* Monday, November 23, 2015 12:22 P
You may want to do this on your build machine?
cd /usr/src/spl-0.6.4.2/
./configure --with-config=kernel
make all
cd ../zfs-0.6.4.2/
./configure --with-config=kernel
make all
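To then rebuild Lustre against the freshly built spl/zfs, something along
these lines should work (a sketch; the lustre source path is assumed):
cd /usr/src/lustre-2.7.0
./configure --with-spl=/usr/src/spl-0.6.4.2 --with-zfs=/usr/src/zfs-0.6.4.2
make rpms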
bob
On 12/15/2015 2:16 PM, Christopher J. Morrone wrote:
Is your OS CentOS 6.7 then? You _might_ be hitting bug LU-7
I have a zfs OST that I need to drain and re-create. This is lustre
2.7.x. In the past, with ldiskfs OST, I did this a number of times
following critical failures, and came up with the following items to be
replaced on the file system once the mkfs.lustre had been run, where the
new OST was m
/12, 12:50, "lustre-discuss on behalf of Bob Ball"
wrote:
I have a zfs OST that I need to drain and re-create. This is lustre
2.7.x. In the past, with ldiskfs OST, I did this a number of times
following critical failures, and came up with the following items to be
replaced on the
Thank you, Andreas. As always, your answers are the best.
bob
On 1/13/2016 10:14 PM, Dilger, Andreas wrote:
On 2016/01/12, 12:50, "lustre-discuss on behalf of Bob Ball"
wrote:
I have a zfs OST that I need to drain and re-create. This is lustre
2.7.x. In the past, with ldiskfs
Hi, we have Lustre 2.7.58 in place on our OST and MDT/MGS (combined).
Underlying the lustre file system is a raid-z2 zfs pool.
A few days ago, we lost 2 disks at once from the raid-z2. I replaced
one and a resilver started, that seemed to choke. So, I put back both
disks with replacements,
broken zfs object, like in
"zdb: Examining ZFS At Point-Blank Range," and see what it is (plain
zfs file or else).
Knowing the zfs version can be helpful.
Alex
On Mar 11, 2016, at 7:19 PM, Bob Ball <b...@umich.edu> wrote:
errors: Permanent errors have been de
Has anyone else noticed that the lnetctl command is missing from this
rpm set? I mean, an entire chapter of the manual dedicated to this
command, and it is not present?
Will this be fixed?
bob
On 3/16/2016 6:13 PM, Jones, Peter A wrote:
We are pleased to announce that the Lustre 2.8.0 Relea
Glossman
HPDD Software Engineer
On 3/28/16, 1:11 PM, "lustre-discuss on behalf of Bob Ball"
wrote:
Has anyone else noticed that the lnetctl command is missing from this
rpm set? I mean, an entire chapter of the manual dedicated to this
command, and it is not present?
Will thi
ee.
On Mar 28, 2016, at 5:27 PM, Bob Ball wrote:
Thanks.
Are you going to replace the 2.8 rpm sets in your repos, or just leave
them that way until 2.9? Yeah, that is a provocative question. It just
seems to me that it is such a fundamental thing, that a new rpm set
should be created for dist
What I recall hearing is that, because of IO issues with 0.6.5, official
Lustre 2.8.0 support is with zfs 0.6.4.2
bob
On 4/12/2016 11:21 AM, Nathan Smith wrote:
In doing a test install of vanilla Lustre 2.8 [0] and zfs-0.6.5.6 I
received the symbol mismatch error when attempting to start lustr
The answer came offline, and I guess I never replied back to the
original posting. This is what I learned. It deals with only a single
file, not 1000s. --bob
---
On Mon, 14 Mar 2016, Bob Ball wrote:
OK, it would seem the affected user has already deleted
It is my understanding that when you set the OST deactivated, then you
also don't get updates on the used space either, as of some recent
version of Lustre. It has never been clear to me though if a simple
re-activation is needed to run the logs, or if the reboot is required,
once it is re-act
Stabbing in the dark, but this sounds like a multipath problem. Perhaps
you have 2 or more paths to the storage, and one or more of them is down
for some reason, perhaps the hardware itself, perhaps a cable is
pulled. You could look for LEDs in a bad state.
I always find it instructive to
Pre-make your list of files, split it into 5 or so parts, give each of 5
clients one piece, and let them migrate in parallel.
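A sketch of that workflow, with hypothetical filesystem and OST names:
lfs find /lustre/umt3 --obd umt3-OST0019_UUID > /tmp/migrate.list  # files on the old OST
split -n l/5 /tmp/migrate.list /tmp/migrate.part.                  # 5 roughly equal pieces
lfs_migrate -y < /tmp/migrate.part.aa                              # one piece per client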
bob
On 9/26/2016 9:00 AM, Jérôme BECOT wrote:
Hello,
As we had an inode usage issue, we did as Andreas advised:
- add a new ost with much more inodes
- disable the fi
You'll need to restart the mds/mdt machine before the space is reclaimed
from the OST.
bob
On 10/17/2016 2:32 PM, DeWitt, Chad wrote:
Hi All.
I am still learning Lustre and I have run into an issue. I have
referred to both the Lustre admin manual and Google, but I've had no
luck in finding
Just "zpool scrub ". Scrub may slow down access, but it does not
otherwise impact the OST, in my experience.
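For example (pool name hypothetical):
zpool scrub ostpool      # start the scrub
zpool status ostpool     # shows scrub progress and any errors found
zpool scrub -s ostpool   # stop it early if it does get in the way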
bob
On 1/30/2017 9:52 PM, Riccardo Veraldi wrote:
Hello,
I need to scrub the underlying ZFS data pools on my Lustre OSSs.
May I do it safely when the Lustre filesystem is mounted ?
Hi,
I am getting this message
PANIC: zfs: accessing past end of object 29/7 (size=33792 access=33792+128)
The affected OST seems to reject new mounts from clients now, and the
lctl dl count of connections to the obdfilter process increases, but
does not seem to decrease?
This is Lustre 2.7.
state:
18 UP obdfilter umt3B-OST000f umt3B-OST000f_UUID 403
bob
On 2/10/2017 9:39 AM, Bob Ball wrote:
Hi,
I am getting this message
PANIC: zfs: accessing past end of object 29/7 (size=33792
access=33792+128)
The affected OST seems to reject new mounts from clients now, and the
lctl dl cou
upon completion using spare disks. It would be
nice though if someone had a better way to fix this, or could truly
point to a reason why this is consistently happening now.
bob
On 2/10/2017 11:23 AM, Bob Ball wrote:
Well, I find this odd, to say the least. All of this below was from
yesterday
seems as if it must be mdtmgs
related, by the old "what else could it be?" argument.
Is this OST index a dead loss? Fix this index, or destroy forever and
introduce a new OST?
bob
On 2/13/2017 1:00 PM, Bob Ball wrote:
OK, so, I tried some new system mounts today, and each ti
tween .50 and .90), so you would be better off to update to the latest
release (e.g. 2.8.0 or 2.9.0), which will have had much more testing.
Likewise, ZFS 0.6.4.x is quite old and many fixes have gone into ZFS 0.6.5.x.
Cheers, Andreas
On Feb 17, 2017, at 19:21, Bob Ball wrote:
No luck, remove
On the mgs/mgt do something like:
lctl --device -OST0019-osc-MDT deactivate
No further files will be assigned to that OST. Reverse with
"activate". Or reboot the mgs/mdt as this is not persistent. "lctl dl"
will tell you exactly what that device name should be for you.
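A sketch of the full sequence; the device number here is hypothetical:
lctl dl | grep osc             # find the device for the OST in question
lctl --device 15 deactivate    # stop new file allocations to it
lctl --device 15 activate      # reverse when maintenance is done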
bob
On 7/12/201
jects until the OST is
activated again.
On Sun, Jul 16, 2017, 9:29 AM E.S. Rosenberg
<esr+lus...@mail.hebrew.edu> wrote:
On Thu, Jul 13, 2017 at 5:49 AM, Bob Ball <b...@umich.edu> wrote:
On the mgs/mgt do something like:
lctl --device -OS
en't tested this
myself). The problem might be that this prevents new clients from
mounting.
It probably makes sense to add server-side read-only mounting as a
feature. Could you please file a ticket in Jira about this?
Cheers, Andreas
On Jul 16, 2017, at 09:16, Bob Ball <b...@um
We have successfully used Scientific Linux 7, a variant of CentOS.
bob
On 10/18/2017 5:59 AM, Amjad Syed wrote:
Hello
We are in process of purchasing a new lustre filesystem for our site
that will be used for life sciences and genomics.
We would like to know if we should buy rhel license or g
If mounting from /etc/fstab, try adding "_netdev" as a parameter. This
forces the mount to wait until the network is ready.
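For example, an entry along these lines (using the NID from the message
below):
192.168.0.50@o2ib:/lhome /home lustre defaults,_netdev 0 0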
bob
On 10/24/2017 5:58 AM, Ravi Konila wrote:
Hi
My lustre clients do not mount the lustre storage automatically on reboot.
I tried by adding in fstab as well as in /etc/rc
lhome /home lustre defaults,_netdev 0 0
If I add the mount command in rc.local twice, it works... surprise...
my rc.local has
mount -t lustre 192.168.0.50@o2ib:/lhome /home
mount -t lustre 192.168.0.50@o2ib:/lhome /home
Regards
*Ravi Konila*
*From:* Bob Ball
*Sent:* Tuesday, October 24, 2017 6:10 P
No, that version of Lustre does not work with SL7.5. See some previous
threads on this. I gather we need 2.11 for SL7.5.
bob
On 5/11/2018 10:15 AM, Kevin M. Hildebrand wrote:
I'm not sure if this is a supported behavior or not, but I'm currently
unable to mount a filesystem from my 2.8 server
We added a new OSS to our 1.8.4 Lustre installation. It has 6 OST of
8.9TB each. Within a day of having these on-line, one OST stopped
accepting new files. I cannot get it to activate. The other 5 seem fine.
On the MDS "lctl dl" shows it IN, but not UP, and files can be read from it:
33 IN
files back from that command, but other problems
on our cluster confused that result. We will recheck.
bob
Bernd Schubert wrote:
On Friday, September 03, 2010, Bob Ball wrote:
We added a new OSS to our 1.8.4 Lustre installation. It has 6 OST of
8.9TB each. Within a day of havin
Try adding _netdev as a mount option.
bob
Cliff White wrote:
The mount command will automatically load the modules on the client.
cliffw
On 09/03/2010 11:56 AM, Ronald K Long wrote:
We have installed lustre 1.8.2 and 1.8.4 client on Red hat 5. The lustre
modules are not loading
d0 292648 346413
e0 68225 -7137254917378053186
f0 59064 59607
000100 59227 59414
Thanks,
bob
Bob Ball wrote:
Thank you, Bernd. "df" claims there is some 442MB of data on the
v
I just made some random checks on the "lfs find" output for this OST
from yesterday. Each file I checked was one lost when we had problems
a few months back. The suggested "unlink" on these did not work in
1.8.3, worked fine on a whole set yesterday with 1.8.4, but I obviously
did not find th
so no down-time was involved.
bob
Bob Ball wrote:
I just made some random checks on the "lfs find" output for this OST
from yesterday. Each file I checked was one lost when we had problems
a few months back. The suggested "unlink" on these did not work in
1.8.3, worked fine
Peter, can you comment on what you said here about RAID6? Are there
Twiki or other entries somewhere about this?
There are relevant bits of advice in 1.4.2.2 and 10.1.1-4 for
example (some of them objectionable, such as recommending RAID6
for data storage, without the necessary qualifications at
syslog-ng
bob
On 9/30/2010 1:31 PM, Ben Evans wrote:
> Wouldn't it be better to log to a non-Lustre machine? If your MDS goes down,
> you lose all your logs, which would make things a bit trickier.
>
> -----Original Message-----
> From: lustre-discuss-boun...@lists.lustre.org
> [mailto:lustre
In my experience, a single lfs_find (or a small number of them) with lots
of obd arguments was faster than doing all of them individually.
Go to Lustre 1.8.4 (at least) and use lfs_migrate with your lfs_find
list. It wasn't REAL fast, but it was REAL reliable.
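Something like this, with hypothetical names; a single find covers
several OSTs and feeds lfs_migrate:
lfs find /lustre/umt3 --obd umt3-OST0019_UUID --obd umt3-OST001a_UUID > /tmp/to_migrate
lfs_migrate -y < /tmp/to_migrate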
bob
On 10/6/2010 5:24 PM, Michael Barn
Why do you need both active? If one is a backup to the other, then bond
them as a primary/backup pair, meaning only one will be active at a
time, i.e., your designated primary (unless it goes down).
bob
On 10/21/2010 9:51 AM, Brock Palen wrote:
> On Oct 21, 2010, at 9:48 AM, Joe Landman wrote
afterwards for information.
bob
On 10/21/2010 9:59 AM, Bob Ball wrote:
> Why do you need both active? If one is a backup to the other, then bond
> them as a primary/backup pair, meaning only one will be active at a
> time, i.e., your designated primary (unless it goes down).
>
> bob
We may need to completely empty an OST, so that we can more efficiently
format the underlying RAID-6 array (in light of recent discussions about
the stripe on this list). What is the most efficient way to do this? I
can use multiple clients running lfs_migrate from pre-prepared lists,
but tha
I am emptying a set of OST so that I can reformat the underlying RAID-6
more efficiently. Two questions:
1. Is there a quick way to tell if the OST is really empty? lfs_find
takes many hours to run.
2. When I reformat, I want it to retain the same ID so as to not make
"holes" in the list. Fro
ount_point
This still shows several hundred MB as well, in disparate amounts per
OST I'm emptying. An indicator, but not perfect. Yes, the OST were
near full to begin with.
More below.
On 11/6/2010 11:09 AM, Andreas Dilger wrote:
> On 2010-11-06, at 8:24, Bob Ball wrote:
>> I am
ex even matter if we restore the files
below that you mention? That seems to be what you are saying.
Thanks much,
bob
On 11/6/2010 11:09 AM, Andreas Dilger wrote:
> On 2010-11-06, at 8:24, Bob Ball wrote:
>> I am emptying a set of OST so that I can reformat the underlying RAID-6
BTW, the new OST sizes will be much different from the original OST
sizes. Is the "copy the old file" method below still valid in this case?
bob
On 11/7/2010 2:32 PM, Bob Ball wrote:
> Hi, Andreas.
>
> Tomorrow, we will redo all 8 OST on the first file server we are
Thanks, Ashley. No quotas, fortunately. Tomorrow will be "fun".
bob
On 11/7/2010 3:44 PM, Ashley Pittman wrote:
> On 7 Nov 2010, at 19:32, Bob Ball wrote:
>> So, while we are doing the reformat, is there any way to avoid this
>> "hang" situation?
> I bel
dress already in use retries left: 0
mount.lustre: mount /dev/sdc at /mnt/ost12 failed: Address already in use
The target service's index is already in use. (/dev/sdc)
On 11/8/2010 5:01 AM, Andreas Dilger wrote:
On 2010-11-07, at 12:32, B
/8/2010 5:01 AM, Andreas Dilger wrote:
On 2010-11-07, at 12:32, Bob Ball <b...@umich.edu> wrote:
Tomorrow, we will redo all 8 OST on the first file
server we are redoing. I am very nervous about this, as a
lot is riding on us doing this co
Yes, you are correct. That was the key here; I did not put that file back
in place. Back up and (so far) operating cleanly.
Thanks,
bob
On 11/8/2010 3:04 PM, Andreas Dilger wrote:
> On 2010-11-08, at 11:39, Bob Ball wrote:
>> Don't know if I sent to the whole list. One
, but can't seem to find anything on it
now, or what to do about it.
help?
bob
On 11/8/2010 4:27 PM, Bob Ball wrote:
> Yes, you are correct. That was the key here, did not put that file back
> in place. Back up and (so far) operating cleanly.
>
> Thanks,
> bob
>
> On 11
t syncing: Fatal exception
On 11/10/2010 1:01 PM, Bob Ball wrote:
> Well, we ran 2 days, migrating files off OST, then this morning, the MDT
> crashed. Could not get all clients reconnected before seeing another
> kernel panic on the mdt. did an e2fsck of the mdt db and tried again.
-10, at 11:01, Bob Ball wrote:
>> Well, we ran 2 days, migrating files off OST, then this morning, the MDT
>> crashed. Could not get all clients reconnected before seeing another
>> kernel panic on the mdt. did an e2fsck of the mdt db and tried again.
>> crashed agai
I have an odd problem. I am trying to empty all files from a set of OST
as indicated below, by making a list via lfs find and then sending that
list to lfs_migrate. However, I have just gotten this message back from
the lfs find:
llapi_semantic_traverse: Failed to open
'/lustre/umt3/data13/d
OK, thanks. Scary to see errors out of lfs find.
bob
On 11/30/2010 1:47 AM, Andreas Dilger wrote:
> On 2010-11-29, at 20:18, Bob Ball wrote:
>> I have an odd problem. I am trying to empty all files from a set of OST
>> as indicated below, by making a list via lfs find and th
ply copy them out of this "ldiskfs" mount of the file system,
back into some recovery directory in the real file system, so that users
can pick through them? After they are moved, the file system will be
reformatted and returned to use.
bob
On 11/30/2010 8:53 AM, Bob Ball wrote:
>
On 11/30/2010 4:17 PM, Andreas Dilger wrote:
> On 2010-11-30, at 11:17, Bob Ball wrote:
>> [r...@umdist03 d0]# ls -l
>> total 182976
>> -rw-rw-rw- 1 daits users 45002956 Jul 5 20:52 1162976
>> -rw-rw-rw- 1 daits users 44569036 Jul 7 02:53 1200608
>> -rw-rw-rw- 1
We were getting errors thrown by an OST. /var/log/messages contained a
lot of these:
2010-11-28T17:05:34-05:00 umfs06.aglt2.org kernel: [2102640.735927]
LDISKFS-fs error (device sdk): ldiskfs_mb_check_ondisk_bitmap: on-disk
bitmap for group 639 corrupted: 440 blocks free in bitmap, 439 - in gd
fy the error counters
> on that.
>
> -cf
>
>
> On 12/02/2010 02:00 PM, Bob Ball wrote:
>> We were getting errors thrown by an OST. /var/log/messages contained a
>> lot of these:
>> 2010-11-28T17:05:34-05:00 umfs06.aglt2.org kernel: [2102640.735927]
>> LDI
remain
corrupted, and we'll probably never be able to come up with a complete
list of them.
bob
On 12/2/2010 4:35 PM, Bob Ball wrote:
> It is a Dell PERC6 RAID array. OMSA monitoring is enabled and is not
> throwing errors. Hmm, mptctl is old though, so maybe that is a
>
We have 6 OSS, each with at least 8 OST. It sometimes happens that I
need to do maintenance on an OST, so to avoid hanging processes I use
lctl to disable access to that OST on active client machines.
So, now, it may happen during this maintenance that a client machine
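The client-side disable step described above looks roughly like this
(device number hypothetical):
lctl dl | grep OST0019        # find the client-side osc device for the OST
lctl --device 12 deactivate   # client stops using the OST; reverse with activate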
Thanks for the pointer to this. After thinking on this a bit, I believe
I can see my way clear to using it. Testing time
bob
On 12/7/2010 5:45 PM, Cliff White wrote:
> On 12/07/2010 06:51 AM, Bob Ball wrote:
>> We have 6 OSS, each with at least 8 OST. It sometimes happens that
For administrative reasons, we want to rename an OSS. It will retain
the same IP addresses, and have the same OST. Does this present any
problems that I should be aware of, or is it a no-brainer?
Thanks,
bob
From: lustre-discuss-boun...@lists.lustre.org
> [mailto:lustre-discuss-boun...@lists.lustre.org] On Behalf Of Bob Ball
> Sent: Monday, December 13, 2010 1:49 PM
> To: Lustre discussion
> Subject: [Lustre-discuss] Renaming an OSS
>
> For administrative reasons, we want to rename
hostnames.
If you have heartbeat, you need to change the names in its configuration.
Best regards,
Wojciech
On 13 December 2010 21:34, Bob Ball <b...@umich.edu>
wrote:
Yes, just the DNS name. From Andreas'
respo
From: Bob Ball [mailto:b...@umich.edu]
Sent: Monday, December 13, 2010 3:02 PM
To: Wojciech Turek
Cc: Lundgren, Andrew; Lustre discussion
Subject: Re: [Lustre-discuss
I am trying to get a lustre client to mount the service, but with one or
more OST disabled. This does not appear to be working. Lustre version
is 1.8.4.
mount -o localflock,exclude=umt3-OST0019 -t lustre
10.10.1@tcp0:/umt3 /lustre/umt3
dmesg on this client shows the following during th
bob
On 12/14/2010 11:57 AM, Andreas Dilger wrote:
> The error message shows a timeout connecting to umt3-MDT and not the OST.
> The operation 38 is MDS_CONNECT, AFAIK.
>
> Cheers, Andreas
>
> On 2010-12-14, at 9:19, Bob Ball wrote:
>
>> I am trying to get a
hat? And, most importantly, how do I fix
this?
bob
On 12/14/2010 3:05 PM, Bob Ball wrote:
> Well, you are absolutely right, it is a timeout talking to what it
> THINKS is the MDT. The thing is, it is NOT!
>
> We were set up for HA for the MDT, with 10.10.1.48 and 10.10.1.49
> watch
r. Are you sure you properly
> specified the failover parameters
> during mkfs on the MDT and did the first mount from the correct machine?
>
> If the NIDs are wrong, it is possible to correct it using
> --writeconf. See the manual (or search
> the list archives).
>
> Kevin
&g
On 12/15/2010 1:33 PM, Bob Ball wrote:
> And, the hole gets deeper. I was digging in the list archives, and in
> the manual, and decided to look at what was stored in the file systems
> using "tunefs.lustre --print".
>
> The mgs machine is fine:
> [mgs:~]# tunefs.lustre --
We have 3 machines that live both on the private network, where the bulk
of the lustre clients and all the servers are located, and on a second
network; several clients on that second network also need to access
the lustre file system. We do this via lnet routing. From the
modprobe.conf on ea
I forgot to mention, the mgs also has addresses on both networks.
options lnet networks="tcp0(eth0),tcp2(eth2)"
The MDT does NOT have addresses on both networks.
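A sketch of the routing setup being described; the routes line and the
router NID are assumptions, not from the original post:
# on the dual-homed router machines:
options lnet networks="tcp0(eth0),tcp2(eth2)" forwarding=enabled
# on the subnet-B-only clients:
options lnet networks="tcp2(eth2)" routes="tcp0 10.10.2.1@tcp2"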
bob
On 1/4/2011 10:00 AM, Bob Ball wrote:
> We have 3 machines that live on both the private network where the bulk
&g
Our lustre 1.8.4 system sits primarily on subnet A. However, we also
have a small number of clients that sit on subnet B. In setting up the
subnet B clients, we provided lnet router machines that have addresses
on both subnet A and subnet B; the MGS machine has addresses on both
subnet A a
Date:
Fri, 21 Jan 2011 15:48:25 -0500
From:
Bob Ball
To:
Lustre discussion
Our lustre 1.8.4 system sits primarily on subnet A. However, we also
have a small number
OK, so, finally with time on my hands, I find I can make this work.
Sorry about the message list traffic.
bob
On 1/25/2011 9:51 AM, Bob Ball wrote:
Hi, no response on this the first time I sent it around. Can
anyone help me on this
Is there a recommended way to migrate an MDT (MGS is separate) volume
from one location to another on the same server? This uses iSCSI volumes.
Lustre 1.8.4
Thanks,
bob
We used dd. It took about 12 hours. After the dd, we did an e2fsck on
the new volume, remounted it as the MDT, and Lustre happily began
serving files again.
Thanks to everyone for their help.
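The sequence was roughly this (device names hypothetical; the MDT stayed
unmounted until the final step):
dd if=/dev/old_mdt of=/dev/new_mdt bs=4M
e2fsck -f /dev/new_mdt
mount -t lustre /dev/new_mdt /mnt/mdt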
bob
On 2/3/2011 12:21 PM, Frederik Ferner wrote:
> Bob Ball wrote:
>> Is there a recommend
w if someone has experience with resize2fs applied
> to a MDT ...
>
> Cheers,
> Thomas
>
> On 02/03/2011 06:35 PM, Bob Ball wrote:
>> We used dd. It took about 12 hours. After the dd, we did an e2fsck on
>> the new volume, remounted it as the MDT, and Lustre happily b
Maybe the clients should mount the file system with the "localflock"
parameter? Please check the manual for information about this, but I
think it was the same problem we had a while back where a dynamic link
was failing.
bob
On 2/9/2011 7:24 PM, James Robnett wrote:
>> Normally I've had no problems
You can deactivate it on the MDT; that will make it RO, but leave it
alone on the clients so they can still access files from it.
bob
On 2/15/2011 1:57 PM, Jagga Soorma wrote:
Hi Guys,
One of my clients got a hung lustre mount this morning and I saw the
following errors in my logs:
--
..sn
Quite often, sunrpc takes the port needed by Lustre, before Lustre can
get to it. That results in the messages below. No recourse but to
reboot. Put the mount in your /etc/fstab as the simplest approach.
This may not be the ONLY reason why this happens, but it is the one that
has most ofte
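One quick way to check is to see what is holding Lustre's acceptor port
(988 by default); a hedged example:
netstat -tlnp | grep :988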
Make sure that the kernel you are running matches up with the rpms you
installed then?
[ball@umt3int01:gate01_b]$ rpm -qa|grep lustre
lustre-client-1.8.4-2.6.18_194.17.4.el5.x86_64
lustre-client-modules-1.8.4-2.6.18_194.17.4.el5.x86_64
[ball@umt3int01:gate01_b]$ uname -r
2.6.18-194.17.4.el5
bob