On Wed, Oct 03, 2007 at 05:43:06PM -0700, Russ Allbery wrote:
> Jose Calhariz <[EMAIL PROTECTED]> writes:
>
> > I had an error message from the reiserfs, on Thursday night.  But the
> > corruption went bigger, and more and more volumes were going offline.
> > So I stop it to do an fsck.  That't when the fsck failed.  I didn't
> > stopped the fileserver on Friday because was production hours.  Maybe my
> > killing mistake.
>
> Ah, okay.
>
> I definitely recommend against using ReiserFS for any production purposes
> (completely apart from whether you use AFS or not).

I don't know what happened.  I have only two leads: an I/O error
message from reiserfs at the very beginning, and, after the loss,
strange behavior I found with the hardware RAID5.  I need to
investigate further.  Most importantly, I learned that I don't know
enough about reiserfs internals, so I really don't understand the error
messages from reiserfsck.  I will move to ext3, which I know very well,
or to XFS, for which I have a local expert who can help in case of
trouble.  I remember seeing an online presentation from an AFS workshop
where XFS was considered better than ext3 for /vicep partitions.

>
> > I can be wrong, but I need to use my root.afs.  I need a link on /afs as
> > a shortcut for my cellname.  So I can't use -dynroot on some clients.
> > Correct me if I am wrong.
>
> This is what the CellAlias configuration file is for.  It's hard to tell
> exactly why the client didn't work; it doesn't sound like you have much
> information about what failed or what could have been happening.

Thank you.  I didn't know about that file.

>
> > I am talking by memory, as I didn't saved the log files.  I had seen
> > messages of exit with various numbers, 0, 1 and maybe 15.  No core file,
> > how do I enable core files?
>
> Make sure that you don't have core limit size limited when you start the
> file server and they should happen automatically if the file server
> actually fails.

OK, I have "ulimit -c 0" by default.  I haven't depended on core files
for so many years that I forgot about it.  Now I am a sysadmin, not a
programmer.  I only program in bash, and I install gdb for other people
to use, not for myself :-)

> But if you don't have any exit status other than 0, 1,
> and 15, the file server isn't failing.  Which again raises the question of
> what the problem actually is.
>
> If the file server is not existing with any status other than those three,
> I'm 99% certain that the stack limit is not an issue for you.  What I
> would expect, were it to run into a stack limit, would be a bus error or
> segfault.

I have restarted my fileserver.  No problem this time with
"ulimit -s 8192", so I think you are right.  My last 3 VLDB servers
were in trouble that day and were creating more problems everywhere.
The salvage was taking 40 minutes, so I had time to solve the other
problems before I put all my effort into the last one: the failing file
server.

Thank you for your help on this issue.

--

P.S. The sig below is from my random sig-generator, which strangely
often seems to pick signatures which are appropriate to the message at
hand!

--
The advantage of being a millionaire is being able to say what you
want, to whom you want, and how you want
--Prince Johannes von Thurn und
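
For reference, the core-file point discussed in this message can be checked from the shell. This is only a minimal sketch: `describe_core_limit` is a hypothetical helper name made up for illustration, not part of OpenAFS, and how the file server is actually started (init script, bosserver, or otherwise) is an assumption that will differ per site.

```shell
#!/bin/sh
# Classify a core-size limit value as reported by `ulimit -c`.
# describe_core_limit is a hypothetical helper, not an OpenAFS tool.
describe_core_limit() {
    case "$1" in
        0)         echo "disabled" ;;
        unlimited) echo "unlimited" ;;
        *)         echo "limited" ;;
    esac
}

# Report the soft limit that this shell would pass on to child processes.
current=$(ulimit -c)
echo "core file size limit: $current ($(describe_core_limit "$current"))"

# To allow core dumps, raise the limit in the same shell or init script
# that starts the file server, before starting it:
#   ulimit -c unlimited
```

The limit is inherited by child processes, so it has to be raised in whatever shell or script launches the server; changing it in an unrelated login shell has no effect on an already running process.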
