Re: [Lustre-discuss] aacraid kernel panic caused failover

2011-04-06 Thread Thomas Roth
We have ~60 servers with these Adaptec controllers and found this problem
to happen from time to time.
Upgrading the aacraid module didn't help. We were in contact with Adaptec,
but they had no clue either.
The only good thing is that this adapter panic happens in an instant,
halting the machine, with no prior phase of degradation: the controller
doesn't start dropping every second bit or writing only the '1's and not
the '0's - so whatever data made it to the disks before the crash seems
to be intact. Reboot and never buy Adaptec again.

Cheers,
Thomas

On 04/06/2011 07:03 AM, David Noriega wrote:
 Ok, I updated the aacraid driver and the RAID firmware, yet I still had
 the problem happen, so I did more research and applied the following
 tweaks:
 
 1) Rebuilt the initrd with mkinitrd using the following options:
 a) edited /etc/sysconfig/mkinitrd/multipath to contain MULTIPATH=yes
 b) mkinitrd initrd-2.6.18-194.3.1.el5_lustre.1.8.4.img
 2.6.18-194.3.1.el5_lustre.1.8.4 --preload=scsi_dh_rdac
 2) Added the local hard disk to the multipath blacklist
 3) Edited modprobe.conf to have the following aacraid options:
 options aacraid firmware_debug=2 startup_timeout=60  # the debug doesn't
 seem to print anything to dmesg
 4) Added pcie_aspm=off to the kernel boot options (see the sketch below)
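
For reference, the tweaks above boil down to roughly the following on a
RHEL5-era Lustre OSS. This is only a sketch: the paths and kernel version are
taken from the message itself, and the blacklisted device name is a
hypothetical example.

  # 1) rebuild the initrd with multipath support and scsi_dh_rdac preloaded
  mkdir -p /etc/sysconfig/mkinitrd
  echo "MULTIPATH=yes" > /etc/sysconfig/mkinitrd/multipath
  mkinitrd /boot/initrd-2.6.18-194.3.1.el5_lustre.1.8.4.img \
      2.6.18-194.3.1.el5_lustre.1.8.4 --preload=scsi_dh_rdac

  # 2) blacklist the local system disk in /etc/multipath.conf, e.g.:
  #      blacklist { devnode "^sda$" }     # hypothetical device name

  # 3) aacraid options in /etc/modprobe.conf:
  #      options aacraid firmware_debug=2 startup_timeout=60

  # 4) disable PCIe ASPM: append pcie_aspm=off to the kernel line in
  #    /boot/grub/grub.conf and reboot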
 
 So things looked good for a while. I did have a problem mounting the
 Lustre partitions, but that was my fault for misconfiguring some lnet
 options I was experimenting with. I fixed that and, just as a test,
 ran 'modprobe lustre', since I wasn't ready to fail back the partitions
 just yet (I wanted to wait until activity was at its lowest). That was
 earlier today. I was about to fail back tonight, yet when I checked
 the server again I saw in dmesg the same aacraid problems as before.
 Is it possible Lustre is interfering with aacraid? It's weird, since I
 have a duplicate machine and it's not having any of these problems.
 
 On Fri, Mar 25, 2011 at 9:55 AM, Temple Jason jtem...@cscs.ch wrote:
 Adaptec should have the firmware and drivers on their site for your card.
 If not Adaptec, then Oracle will have it available somewhere.

 The firmware and driver packages usually come with a utility that will check
 the current version and upgrade it for you.

 Hope this helps (I use different cards, so I can't tell you exactly).
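
 For an aacraid-based card, a few generic ways to check what is currently
 running (a sketch; arcconf is Adaptec's command-line tool and may not be
 installed on every system):

   # loaded driver (module) version and the current startup timeout
   modinfo aacraid | grep -i '^version'
   cat /sys/module/aacraid/parameters/startup_timeout

   # controller model, firmware and BIOS versions via Adaptec's CLI
   arcconf GETCONFIG 1 AD | grep -iE 'firmware|bios'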

 -Jason

 -Original Message-
 From: David Noriega [mailto:tsk...@my.utsa.edu]
 Sent: Friday, 25 March 2011 15:47
 To: Temple Jason
 Subject: Re: [Lustre-discuss] aacraid kernel panic caused failover

 Hmm, not sure. What's the best way to find out?

 On Fri, Mar 25, 2011 at 9:46 AM, Temple Jason jtem...@cscs.ch wrote:
 Hi,

 Are you using the latest firmware?  This sort of thing used to happen to 
 me, but with different raid cards.

 -Jason

 -Original Message-
 From: lustre-discuss-boun...@lists.lustre.org 
 [mailto:lustre-discuss-boun...@lists.lustre.org] On Behalf Of David Noriega
 Sent: Friday, 25 March 2011 15:38
 To: lustre-discuss@lists.lustre.org
 Subject: [Lustre-discuss] aacraid kernel panic caused failover

 Had some craziness happen to our Lustre system. We have two OSSs, both
 identical Sun X4140 servers, and only on one of them have I seen
 this pop up in the kernel messages, followed by a kernel panic. The panic
 seemed to spread and caused the network to go down and the second
 OSS to try to fail over (or fail back?). Anyway, 'split-brain' occurred
 and I was able to get in and set them straight. I researched these
 aacraid module messages, and so far all I can find says to increase the
 timeout, but those are old reports and the timeout is currently set to 60.
 Anyone else have any ideas?

 aacraid: Host adapter abort request (0,0,0,0)
 aacraid: Host adapter reset request. SCSI hang ?
 AAC: Host adapter BLINK LED 0xef
 AAC0: adapter kernel panic'd ef.

 --
 Personally, I liked the university. They gave us money and facilities,
 we didn't have to produce anything! You've never been out of college!
 You don't know what it's like out there! I've worked in the private
 sector. They expect results. -Ray Ghostbusters
 ___
 Lustre-discuss mailing list
 Lustre-discuss@lists.lustre.org
 http://lists.lustre.org/mailman/listinfo/lustre-discuss




 --
 Personally, I liked the university. They gave us money and facilities,
 we didn't have to produce anything! You've never been out of college!
 You don't know what it's like out there! I've worked in the private
 sector. They expect results. -Ray Ghostbusters

 
 
 

-- 

Thomas Roth
Department: Informationstechnologie
Location: SB3 1.262
Phone: +49-6159-71 1453  Fax: +49-6159-71 2986

GSI Helmholtzzentrum für Schwerionenforschung GmbH
Planckstraße 1
64291 Darmstadt
www.gsi.de

Limited liability company (GmbH)
Registered office: Darmstadt
Commercial register: Amtsgericht Darmstadt, HRB 1528

Managing Director: Professor Dr.

[Lustre-discuss] e2fsck and related errors during recovering

2011-04-06 Thread Werner Dilling

Hello,
after a crash of our Lustre system (1.6.4) we have problems repairing
the filesystem. Running the 1.6.4 e2fsck on the MDS filesystem failed,
so we tried the latest 1.8 version, which succeeded. But trying to
mount the MDS as an ldiskfs filesystem failed with the standard error
message: bad superblock on 

We tried to get more info; the command
file -s -L /dev/ reported an ext2 filesystem instead of the ext3
filesystem we get from all OST filesystems.
We were able to produce the MDS database which is needed to get info
for lfsck. But using this database to create the OST databases
failed with the error message: error getting mds_hdr (large number:8)
in /tmp/msdb: Cannot allocate memory.
So I assume the MDS database is in bad shape, and my question is how we
can proceed. I assume we have to create a correct version of the
MDS filesystem, but how to do this is unknown. Any help and info is
appreciated.
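
For reference, the database-generation steps described above normally look
roughly like this (a sketch following the Lustre 1.6/1.8 lfsck procedure;
device paths and database file names are placeholders):

  # on the MDS: read-only e2fsck pass that writes the MDS database
  e2fsck -n -v --mdsdb /tmp/mdsdb /dev/{mdsdev}

  # on each OSS, for every OST, with the mdsdb copied over first
  e2fsck -n -v --mdsdb /tmp/mdsdb --ostdb /tmp/ostNdb /dev/{ostdev}

  # on a client, once all databases are available
  lfsck -n -v --mdsdb /tmp/mdsdb --ostdb /tmp/ost1db /tmp/ost2db /mnt/lustre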


Thanks
w.dilling





___
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss


Re: [Lustre-discuss] aacraid kernel panic caused failover

2011-04-06 Thread Jeff Johnson
I have seen similar behavior on these controllers, on dissimilar configs and
systems of different ages. Those happened to be non-Lustre standalone NFS and
iSCSI target boxes.

We went through controller and drive firmware upgrades, low-level firmware
dumps, and analysis from their development engineers.

In the end it was never really explained or resolved. It appears that these 
controllers, like small children, have tantrums and fall apart. A power cycle 
clears the condition.

Not the best controller for an OSS.

--Jeff

---mobile signature---
Jeff Johnson - Aeon Computing
jeff.john...@aeoncomputing.com



Re: [Lustre-discuss] aacraid kernel panic caused failover

2011-04-06 Thread Thomas Roth
Provided your card is actually an Adaptec RAID controller (it says
Adaptec ASR 5405 on our cards, not Intel or Sun), this is definitely
not the problem. We have had a number of broken or aged batteries among
our 60 or so controller cards, but never any correlation between the kernel
panic and the controller complaining about its BBU.

Cheers,
Thomas

On 04/06/2011 04:58 PM, David Noriega wrote:
 Our Adaptec RAID card is a Sun StorageTek RAID INT card, made by Intel
 of all people. So I installed the RAID manager software, which of
 course doesn't say anything is wrong, but it does come with a
 monitoring daemon, and it printed this message after the last aacraid
 kernel panic:

 Sun StorageTek RAID Manager Agent: [203] The battery-backup cache
 device needs a new battery: controller 1.

 So could that be the problem?
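
 If you want to see what the controller itself reports about the battery,
 Adaptec's command-line tool can show it (a sketch; assumes arcconf from the
 StorageTek/Adaptec storage manager package is installed and the controller
 is number 1):

   # the 'Controller Battery Information' section of the adapter report
   # lists battery status, capacity and estimated time remaining
   arcconf GETCONFIG 1 AD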


Re: [Lustre-discuss] aacraid kernel panic caused failover

2011-04-06 Thread David Noriega
It is Adaptec-based, just branded by Sun and built by Intel. Anyway, I
reseated the card and will wait and see. If it still goes wonky, is
there a card anyone recommends? It has to be a low-profile PCIe x8 card
with two internal x4 SAS connectors.


Re: [Lustre-discuss] e2fsck and related errors during recovering

2011-04-06 Thread Andreas Dilger
Having the actual error messages makes this kind of problem much easier to 
solve.

At a guess, if the journal was removed by e2fsck you can re-add it with 
tune2fs -J size=400 /dev/{mdsdev}.

As for lfsck, if you still need to run it, you need to make sure the same 
version of e2fsprogs is installed on the MDS and all the OSS nodes.

Cheers, Andreas
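
A minimal sketch of the two suggestions above, assuming the journal really was
dropped by e2fsck (the MDS device path is a placeholder):

  # check whether the has_journal feature is still set
  dumpe2fs -h /dev/{mdsdev} | grep -i 'features'

  # re-create the journal (-j adds it, -J size=400 makes it 400 MB)
  tune2fs -j -J size=400 /dev/{mdsdev}

  # confirm the same Lustre-patched e2fsprogs version on the MDS and every OSS
  rpm -q e2fsprogs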

___
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss


Re: [Lustre-discuss] e2fsck and related errors during recovering

2011-04-06 Thread Larry
Would it help to update e2fsprogs to the newest version? I once had a
problem during e2fsck, and after updating e2fsprogs it was fine.


___
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss