Re: [Lustre-discuss] Problems with MDS Crashing

2010-05-20 Thread Andrew Godziuk
We have had another hang, but this time we had KVM access to the
machine (and the screen blanker wasn't on). I took some screenshots:
the first one shows an error I got after the reboot, the BMP one is
what I saw when I first logged in to the KVM, and the remaining ones
are what appeared when I tried to type 'root' - at that point it
started printing traces.

http://amber.leeware.com/wi/lustre-death/

After the reboot there was a command timeout message from the RAID
card; while the machine was hung, the message was "too little
hardware resources".

-- 
Andrew
http://CloudAccess.net/


[Lustre-discuss] Lustre client bug (?)

2010-04-22 Thread Andrew Godziuk
Hi,

I'm not sure where I should report this, but I couldn't find the error
text on Google, so I guess it's not in the bug tracker yet.

This appeared on a 64-bit CentOS client under light traffic: the
Lustre 1.8.2 patchless client from Sun on Linux 2.6.28.10 #4 SMP, both
without custom patches. I'm not sure what further details I could supply.

mx1 kernel: LustreError: 20716:0:(statahead.c:149:ll_sai_entry_cleanup()) ASSERTION(list_empty(&entry->se_list)) failed
Message from syslogd@ at Thu Apr 22 04:31:50 2010 ...
mx1 kernel: LustreError: 20716:0:(statahead.c:149:ll_sai_entry_cleanup()) LBUG

-- 
Andrzej Godziuk
http://CloudAccess.net/


Re: [Lustre-discuss] DRBD + active/active OST, again

2010-03-02 Thread Andrew Godziuk
On Tue, Mar 2, 2010 at 2:31 PM, Johann Lombardi  wrote:
> On Tue, Mar 02, 2010 at 02:01:06PM +0100, Andrew Godziuk wrote:
>> Then I guess this part of the manual should be changed:
...
>> to state explicitly that an active/active scenario is only possible when
>> each OSS is active for some OSTs and passive for others.
>
> Yes, I think this is explained in the next section:
> "For OST failover, multiple OSS nodes are configured to be able to serve the
> same OST. However, only one OSS node can serve the OST at a time. An OST can
> be moved between OSS nodes that have access to the same storage device using
> umount/mount commands."

That part sounded like a contradiction to me, which is what made me
ask the question here. Now that I understand it, it makes sense.
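
For my own notes, the umount/mount move that section describes would
look roughly like this on a DRBD-backed OST. The hostnames, the
resource name r0, the device and the mount point below are all
placeholders, not our actual setup:

  # on oss1 - stop serving the OST and give up the DRBD primary role
  umount /mnt/lustre/ost0
  drbdadm secondary r0

  # on oss2 - take over the shared device and mount the same OST
  drbdadm primary r0
  mount -t lustre /dev/drbd0 /mnt/lustre/ost0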

> BTW, in your case, since you did not specify a failover node for the OST at
> mkfs time, the Lustre clients are not aware of the alternative path and thus
> won't try to reach the OST through the 2nd OSS. So your filesystem should
> still be safe, since the 2nd mount instance should never receive any client
> connection. However, I would still recommend umounting the OST on the 2nd
> OSS ASAP.

This was just a test setup; I'll be specifying --failover in the live
setup for sure.
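
If I read the docs right, registering the failover partner when the
OST is formatted would look roughly like the sketch below. Everything
in it - the NIDs, the fsname, the DRBD device and even the exact
option spelling - is a placeholder rather than our real configuration:

  # format the OST on oss1, registering oss2 (192.168.0.2) as its
  # failover node; all addresses and paths are made-up examples
  mkfs.lustre --ost --fsname=lustre --mgsnode=192.168.0.10@tcp0 \
      --failnode=192.168.0.2@tcp0 /dev/drbd0

As far as I can tell, an already-formatted OST can have the same
setting added later with tunefs.lustre --failnode=..., but I'd
double-check the manual for our Lustre version first.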

Again, thank you very much for your help.

-- 
Andrzej Godziuk
http://CloudAccess.net/


Re: [Lustre-discuss] DRBD + active/active OST, again

2010-03-02 Thread Andrew Godziuk
Johann,

Thank you for your detailed answer; it made the picture much clearer.

Then I guess this part of the manual should be changed:

"The active/passive configuration is seldom used for OST servers as it
doubles hardware costs without improving performance. On the other
hand, an active/active cluster configuration can improve performance
by serving and providing arbitrary failover protection to a number of
OSTs."

to state explicitly that an active/active scenario is only possible
when each OSS is active for some OSTs and passive for others.
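
In other words, the only workable "active/active" layout seems to be a
crossed one, something like the illustration below (node and target
names are made up):

  oss1: serves OST0000 (mounted)     failover for OST0001 (not mounted)
  oss2: serves OST0001 (mounted)     failover for OST0000 (not mounted)

Both servers are doing useful work, so the hardware isn't idle, but
each individual OST is still strictly active/passive.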

-- 
Andrzej Godziuk
http://CloudAccess.net/


[Lustre-discuss] DRBD + active/active OST, again

2010-03-02 Thread Andrew Godziuk
Hi,

I know this topic has been discussed here many times, but all the
messages seem to be about Lustre 1.6.

Has anything changed in Lustre 1.8 that would make it possible to set
up two OSS nodes with an OST shared via DRBD in an active/active
configuration?

I have mounted a shared OST on two OSS nodes, neither of them marked
with "--failover", and it looked as if it was working, but I didn't
run any stress tests for reliability. Does such a setup ever have a
chance to work for real, or did it only look as if everything was OK?
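
For the record, the test looked roughly like this - the resource name,
device and mount point are from memory, so treat them as placeholders
rather than an exact transcript:

  # DRBD resource promoted to primary on both nodes
  # (assumes allow-two-primaries in the DRBD net section)
  drbdadm primary r0                      # run on oss1 and on oss2

  # the same OST then mounted on both nodes at once
  mount -t lustre /dev/drbd0 /mnt/ost0    # on oss1
  mount -t lustre /dev/drbd0 /mnt/ost0    # on oss2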

-- 
Andrzej Godziuk
http://CloudAccess.net/