Re: [Lustre-discuss] aacraid kernel panic caused failover

2011-04-06 Thread Thomas Roth
We have ~60 servers with these Adaptec controllers, and found that this problem simply happens from time to time. Upgrading the aacraid module did not help. We were in contact with Adaptec, but they had no clue either. The only good thing is that this adapter panic seems to happen in an instant, halting

[Lustre-discuss] e2fsck and related errors during recovering

2011-04-06 Thread Werner Dilling
Hello, after a crash of our Lustre system (1.6.4) we have problems repairing the filesystem. Running the 1.6.4 e2fsck failed on the MDS filesystem, so we tried the latest 1.8 version, which succeeded. But trying to mount the MDS as an ldiskfs filesystem failed with the standard error message:

Re: [Lustre-discuss] aacraid kernel panic caused failover

2011-04-06 Thread Jeff Johnson
I have seen similar behavior on these controllers, on dissimilar configs and systems of different ages. These happened to be non-Lustre standalone NFS and iSCSI target boxes. We went through controller and drive firmware upgrades, low-level firmware dumps, and analysis from dev engineers. In the end it

Re: [Lustre-discuss] aacraid kernel panic caused failover

2011-04-06 Thread Thomas Roth
Provided your card is actually an Adaptec RAID controller (it says Adaptec ASR 5405 on our cards, not Intel or Sun), this is definitely not the problem. We have had a number of broken or aged batteries among our 60 or so controller cards, but never any relation with the kernel panic and the

Re: [Lustre-discuss] aacraid kernel panic caused failover

2011-04-06 Thread David Noriega
It is Adaptec-based, just branded by Sun and built by Intel. Anyway, I reseated the card and will wait and see. If it still goes wonky, is there a card anyone recommends? It has to be a low-profile PCIe x8 with two x4 SAS internal connectors. On Wed, Apr 6, 2011 at 10:38 AM, Thomas Roth

Re: [Lustre-discuss] e2fsck and related errors during recovering

2011-04-06 Thread Andreas Dilger
Having the actual error messages makes this kind of problem much easier to solve. At a guess, if the journal was removed by e2fsck you can re-add it with tune2fs -J size=400 /dev/{mdsdev}. As for lfsck, if you still need to run it, you need to make sure the same version of e2fsprogs is on all

Re: [Lustre-discuss] e2fsck and related errors during recovering

2011-04-06 Thread Larry
Would updating e2fsprogs to the newest version help? I once had a problem during e2fsck; after updating e2fsprogs, it was OK. On Thu, Apr 7, 2011 at 2:29 AM, Andreas Dilger adil...@whamcloud.com wrote: Having the actual error messages makes this kind of problem much easier to