Re: Filesystem corruption
On 2007-06-06 11:10, Xu CanHao wrote:
> So maybe I'd suggest anybody take the _official_ reiser4 patch-set and _vanilla_ kernel source, these things should provide the maximum stability. My root filesystem with reiser4 never loses data.

I fully agree, as long as there _exists_ a current official patch. That was not always the case in the recent past. No wonder people started to get their own hands dirty from time to time.

Btw: It's also fun to read / mess with the code ...

--
Ingo Bormuth, voicebox fax: +49-(0)-12125-10226517 public key 86326EC9, http://ibormuth.efil.de/contact
Re: Filesystem corruption
On 2007-06-04 13:41, Edward Shishkin wrote:
> When performing mapping read (needed for execution, etc) reiser4 converts small files from tails to extents and back (your /bin/sleep is less than 4 * blocksize, right?)

Yes, it's 15k. Is the conversion done on disk, even when mounted read-only? I'd like to see the logic in the code. In case you just know it by heart, it would be nice if you could give me a little hint where to start.

> Please, rebuild your kernel with the official patch [...] Please, report, if such data loss still takes place after upgrade.

I'll keep you informed ... Thanks.

--
Ingo Bormuth, voicebox fax: +49-(0)-12125-10226517 public key 86326EC9, http://ibormuth.efil.de/contact
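The size bound mentioned in this exchange can be written out as a small arithmetic sketch. It only illustrates the "less than 4 * blocksize" remark, assuming a 4096-byte block size (the thread does not state one) and reading "15k" as 15 * 1024 bytes. The function name is hypothetical; this is not reiser4's tail-policy code.

```python
def below_tail_threshold(file_size: int, blocksize: int = 4096) -> bool:
    """Return True if file_size is under 4 * blocksize, the bound quoted in the thread."""
    return file_size < 4 * blocksize


# "15k" read as 15 * 1024 bytes; the 4096-byte blocksize is an assumption.
sleep_size = 15 * 1024
print(below_tail_threshold(sleep_size))
```

Under these assumptions a 15k /bin/sleep falls below the bound (15360 < 16384), which is consistent with Edward's guess that the file is subject to tail conversion.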
Re: Filesystem corruption
So maybe I'd suggest anybody take the _official_ reiser4 patch-set and _vanilla_ kernel source, these things should provide the maximum stability. My root filesystem with reiser4 never loses data.
Re: Filesystem corruption
Ingo Bormuth wrote:
> On 2007-06-03 03:10, Edward Shishkin wrote:
>> Ingo Bormuth wrote:
>>> Hm, same here. I lost /bin/sleep several times.
>>
>> Would you please describe the problem in more detail? What kernel version? What does "I lost /bin/sleep" mean? Does it mean that:
>> 1. /bin/sleep was truncated to 0 bytes, i.e. ls -l /bin/sleep shows something like
>>    -rwxr-xr-x 1 root root 0 2005-04-20 18:32 /bin/sleep
>> 2. /bin/sleep disappeared (ls -l /bin doesn't show this file)
>> 3. /bin/sleep exists, but is filled with zeros, etc...
>
> The file was removed by 'fsck.reiser4 --fix', which emitted a message about deleting a corrupted file (case 2 in your list). This always happened after a system freeze or power loss. The machine freezes quite frequently - I think it has a DMA problem. Nevertheless I don't see how a file that was not written to can get corrupted.

When performing mapping read (needed for execution, etc) reiser4 converts small files from tails to extents and back (your /bin/sleep is less than 4 * blocksize, right?)

> Current kernel is 2.6.20.5 (the reiser4 patch I submitted to this list on May 2nd).

Please, rebuild your kernel with the official patch
http://ftp.namesys.com/pub/reiser4-for-2.6/2.6.20/
It contains a bugfix related to tail conversion (races when acquiring exclusive access). Please, report, if such data loss still takes place after upgrade.

Thanks,
Edward.

> Root is mounted rw,noatime,nodiratime,onerror=remount-ro,tmgr.atom_max_age=60
>
> Hope that helps.
Re: Filesystem corruption
On 2007-06-03 03:10, Edward Shishkin wrote:
> Ingo Bormuth wrote:
>> Hm, same here. I lost /bin/sleep several times.
>
> Would you please describe the problem in more detail? What kernel version? What does "I lost /bin/sleep" mean? Does it mean that:
> 1. /bin/sleep was truncated to 0 bytes, i.e. ls -l /bin/sleep shows something like
>    -rwxr-xr-x 1 root root 0 2005-04-20 18:32 /bin/sleep
> 2. /bin/sleep disappeared (ls -l /bin doesn't show this file)
> 3. /bin/sleep exists, but is filled with zeros, etc...

The file was removed by 'fsck.reiser4 --fix', which emitted a message about deleting a corrupted file (case 2 in your list). This always happened after a system freeze or power loss. The machine freezes quite frequently - I think it has a DMA problem. Nevertheless I don't see how a file that was not written to can get corrupted.

Current kernel is 2.6.20.5 (the reiser4 patch I submitted to this list on May 2nd).

Root is mounted rw,noatime,nodiratime,onerror=remount-ro,tmgr.atom_max_age=60

Hope that helps.

--
Ingo Bormuth, voicebox fax: +49-(0)-12125-10226517 public key 86326EC9, http://ibormuth.efil.de/contact
Re: Filesystem corruption
On 2007-05-30 15:03, David Masover wrote:
> Only, recently, these fsck-a-thons started happening more and more often, and I started to lose random files. They'd just be silently truncated to 0 bytes. And not files I was writing a lot -- I'm talking about things like /bin/mount.

Hm, same here. I lost /bin/sleep several times.

I have a little script printing status messages to the screen, sleeping two seconds and printing again - you name it. The probability that /bin/sleep is being accessed at the moment the system crashes is quite high (this is _no_ write access, the system is even mounted noatime).

How could pure execution of a file cause corruption of the file itself? Any idea?

Apart from that single file, I never had any serious problems with reiser4 on three busy systems for years - fsck.reiser4 works like a charm.

--
Ingo Bormuth, voicebox fax: +49-(0)-12125-10226517 public key 86326EC9, http://ibormuth.efil.de/contact
Re: Filesystem corruption
Ingo Bormuth wrote:
> On 2007-05-30 15:03, David Masover wrote:
>> Only, recently, these fsck-a-thons started happening more and more often, and I started to lose random files. They'd just be silently truncated to 0 bytes. And not files I was writing a lot -- I'm talking about things like /bin/mount.
>
> Hm, same here. I lost /bin/sleep several times.

Would you please describe the problem in more detail? What kernel version? What does "I lost /bin/sleep" mean? Does it mean that:
1. /bin/sleep was truncated to 0 bytes, i.e. ls -l /bin/sleep shows something like
   -rwxr-xr-x 1 root root 0 2005-04-20 18:32 /bin/sleep
2. /bin/sleep disappeared (ls -l /bin doesn't show this file)
3. /bin/sleep exists, but is filled with zeros, etc...

Thanks,
Edward.

> I have a little script printing status messages to the screen, sleeping two seconds and printing again - you name it. The probability that /bin/sleep is being accessed at the moment the system crashes is quite high (this is _no_ write access, the system is even mounted noatime).
>
> How could pure execution of a file cause corruption of the file itself? Any idea?
>
> Apart from that single file, I never had any serious problems with reiser4 on three busy systems for years - fsck.reiser4 works like a charm.
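The three cases Edward lists can be told apart with a few lines of Python. This is a minimal sketch, not a tool from the thread: the function name and the returned labels are hypothetical, and only the first megabyte of the file is inspected for the zero-filled case.

```python
import os


def classify_file(path: str, sample_bytes: int = 1 << 20) -> str:
    """Map a path to one of the cases listed in the message (labels are hypothetical)."""
    try:
        st = os.stat(path)
    except FileNotFoundError:
        return "missing"        # case 2: the directory entry is gone
    if st.st_size == 0:
        return "truncated"      # case 1: file exists but has length 0
    with open(path, "rb") as fh:
        data = fh.read(sample_bytes)
    if not any(data):
        return "zero-filled"    # case 3: only the first sample_bytes are checked
    return "looks-intact"


print(classify_file("/bin/sleep"))
```

Running something like this before fsck repairs anything would answer Edward's question directly, since after 'fsck.reiser4 --fix' removes the file all three cases look like "missing".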
Re: Filesystem corruption
On Tuesday 29 May 2007 07:36:13 Toby Thain wrote: but you can't mention using reiserfs in mixed company without someone accusing you of throwing your data away. People who repeat this rarely have any direct experience of Reiser; they repeat what they've heard; like all myths and legends they are transmitted orally rather than based on scientific observation. Well, there is one problem I vaguely remember that I don't think has been addressed, I think it was one of those lets-put-it-off-till-v4 things. It was the fact that there are a limited number of inodes (or keys, or whatever you call a unique file), and no way of knowing how many you have left until your FS will suddenly, one day refuse to create another file. (For comparison, ext3 seems to support not only telling you how many inodes you have left, but tuning that on the fly.) But, I haven't run into that, and the only problem I've had lately has been Reiser4 losing data, and crashing occasionally. I switched most of my data off of Reiser4 and onto XFS for that reason. I've also been using ext3 in some places, and Reiser3 in others (one place in particular where space is limited, but I will have tons of small files). I later learned that XFS does out-of-order writes by default, making me think I should give up and invest in UPS hardware. But, switching away from Reiser4 means I no longer see random files (including stuff in, for example, /sbin, that I hadn't touched in months) go up in smoke. Ordinarily I like to help debug things, but not at the risk of my data. Maybe I'll try again later, and see if I can reproduce it in a VM or somewhere safe... I do still follow the list, though, in case something interesting happens. It was fun while it lasted!
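For the "how many inodes do I have left" question raised here, the counts a filesystem chooses to report can be read through statvfs. The sketch below assumes a Unix-like system; the helper name is hypothetical. Filesystems that allocate file identifiers dynamically may simply report zero for both fields, so a zero does not necessarily mean the filesystem is full.

```python
import os
from typing import Tuple


def inode_usage(path: str = "/") -> Tuple[int, int]:
    """Return (total_inodes, free_inodes) as reported by statvfs for path."""
    vfs = os.statvfs(path)
    return vfs.f_files, vfs.f_ffree


total, free = inode_usage("/")
print(f"inodes: {total} total, {free} free")
```

This is the same information that `df -i` prints; it says nothing about whether the limit can be tuned on a live filesystem.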
Re: Filesystem corruption
Hello On Wednesday 30 May 2007 17:25, David Masover wrote: On Tuesday 29 May 2007 07:36:13 Toby Thain wrote: but you can't mention using reiserfs in mixed company without someone accusing you of throwing your data away. People who repeat this rarely have any direct experience of Reiser; they repeat what they've heard; like all myths and legends they are transmitted orally rather than based on scientific observation. Well, there is one problem I vaguely remember that I don't think has been addressed, I think it was one of those lets-put-it-off-till-v4 things. It was the fact that there are a limited number of inodes (or keys, or whatever you call a unique file), and no way of knowing how many you have left until your FS will suddenly, one day refuse to create another file. reiserfs is limited to ~2^32 file creations. It is possible to exhaust but I do not remember any reports about that. (For comparison, ext3 seems to support not only telling you how many inodes you have left, but tuning that on the fly.) But, I haven't run into that, and the only problem I've had lately has been Reiser4 losing data, and crashing occasionally. I switched most of my data off of Reiser4 and onto XFS for that reason. I've also been using ext3 in some places, and Reiser3 in others (one place in particular where space is limited, but I will have tons of small files). I later learned that XFS does out-of-order writes by default, making me think I should give up and invest in UPS hardware. But, switching away from Reiser4 means I no longer see random files (including stuff in, for example, /sbin, that I hadn't touched in months) go up in smoke. Ordinarily I like to help debug things, but not at the risk of my data. Maybe I'll try again later, and see if I can reproduce it in a VM or somewhere safe... that would be great, thanks I do still follow the list, though, in case something interesting happens. It was fun while it lasted!
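To put the "~2^32 file creations" figure in perspective, here is a back-of-the-envelope calculation. The creation rates are made-up inputs for illustration, and the helper name is hypothetical.

```python
LIMIT = 2 ** 32  # approximate number of file creations quoted in the message


def days_to_exhaust(creations_per_second: float) -> float:
    """Days of sustained file creation needed to reach LIMIT."""
    seconds = LIMIT / creations_per_second
    return seconds / 86_400


for rate in (1, 100, 1000):
    print(f"{rate:>5} creations/s -> {days_to_exhaust(rate):.1f} days")
```

At a sustained 100 creations per second the limit is reached after roughly 497 days, which fits Vladimir's remark that exhaustion is possible but was not something he remembered seeing reported.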
Re: Filesystem corruption
Hello

On Tuesday 29 May 2007 16:36, Toby Thain wrote:
>> I have always found reiser3 to be rock solid
>
> My experience too, over many server years.
>
>> but you can't mention using reiserfs in mixed company without someone accusing you of throwing your data away.
>
> People who repeat this rarely have any direct experience of Reiser; they repeat what they've heard; like all myths and legends they are transmitted orally rather than based on scientific observation.

Well, in the past there were several bad stories where reiserfsck was unable to restore filesystems because it could not find the reiserfs metadata. Later we found that sometimes, for an unknown reason (but not likely due to a reiserfs problem), the partition table changes so that the beginning of a partition gets shifted by a few sectors. So now, when a user reports that reiserfs metadata has disappeared from a device completely, restoring the partition table to its original state makes the data available again.

>> You would think the developers would be doing more to counter this but I have been following reiserfs for years and nobody seems to really care all that much.
>
> Can't do much about human nature. MySQL suffers from the same baseless poisoned folk wisdom.
>
> --Toby
Re: Filesystem corruption
On 30-May-07, at 10:25 AM, David Masover wrote: On Tuesday 29 May 2007 07:36:13 Toby Thain wrote: but you can't mention using reiserfs in mixed company without someone accusing you of throwing your data away. People who repeat this rarely have any direct experience of Reiser; they repeat what they've heard; like all myths and legends they are transmitted orally rather than based on scientific observation. Well, there is one problem I vaguely remember that I don't think has been addressed, I think it was one of those lets-put-it-off-till-v4 things. It was the fact that there are a limited number of inodes (or keys, or whatever you call a unique file), But does it cause data loss? One usually sees claims that reiserfs ate my data, or I heard reiserfs ate somebody's data, but without supplying a root cause - bad memory? powerfail? bad disk? etc. and no way of knowing how many you have left until your FS will suddenly, one day refuse to create another file. ... switching away from Reiser4 means I no longer see random files (including stuff in, for example, /sbin, that I hadn't touched in months) go up in smoke. I only wish sanity had prevailed over kernel inclusion, then we'd see it shaken down a lot quicker, like R3 was. Ordinarily I like to help debug things, but not at the risk of my data. Maybe I'll try again later, and see if I can reproduce it in a VM or somewhere safe... I do still follow the list, though, in case something interesting happens. Yeah, R4 is something interesting. :) I still hope it gets finished... --Toby It was fun while it lasted!
Re: Filesystem corruption
I think people just like to spread FUD without doing any analysis of what really caused the FS corruption. It can be anything from a bad 3rd party driver to bad hardware ('bad blocks', does anybody check for them before mkfs these days? I do). People also like to try those untested patchsets, containing every blah that's thrown out by so called 'kernel hackers' which makes your system 10x faster. Reiser4 seems like an easy candidate to vent their anger on afterwards. I have used R4 for a year now and I have had to reset my PC, troubleshooting problems with vmware/mythtv/cisco vpn client/nvidia, so many times that it's not even funny! And R4 didn't give me any problems even once. It boots right up, without any files lost and with a consistent FS, as a subsequent livecd boot and fsck proved every time. If I did that to ext or xfs, I would have lost big time. The only files I have ever lost were on ext3 during a sudden power failure. I don't trust the safety of my data on any FS but ReiserFS. I hope people don't leave this good piece of code to rot!! -devsk ----- Original Message ----- From: Toby Thain [EMAIL PROTECTED] To: David Masover [EMAIL PROTECTED] Cc: ReiserFS List reiserfs-list@namesys.com Sent: Wednesday, May 30, 2007 9:42:01 AM Subject: Re: Filesystem corruption On 30-May-07, at 10:25 AM, David Masover wrote: On Tuesday 29 May 2007 07:36:13 Toby Thain wrote: but you can't mention using reiserfs in mixed company without someone accusing you of throwing your data away. People who repeat this rarely have any direct experience of Reiser; they repeat what they've heard; like all myths and legends they are transmitted orally rather than based on scientific observation. Well, there is one problem I vaguely remember that I don't think has been addressed, I think it was one of those lets-put-it-off-till-v4 things. It was the fact that there are a limited number of inodes (or keys, or whatever you call a unique file), But does it cause data loss? 
One usually sees claims that reiserfs ate my data, or I heard reiserfs ate somebody's data, but without supplying a root cause - bad memory? powerfail? bad disk? etc. and no way of knowing how many you have left until your FS will suddenly, one day refuse to create another file. ... switching away from Reiser4 means I no longer see random files (including stuff in, for example, /sbin, that I hadn't touched in months) go up in smoke. I only wish sanity had prevailed over kernel inclusion, then we'd see it shaken down a lot quicker, like R3 was. Ordinarily I like to help debug things, but not at the risk of my data. Maybe I'll try again later, and see if I can reproduce it in a VM or somewhere safe... I do still follow the list, though, in case something interesting happens. Yeah, R4 is something interesting. :) I still hope it gets finished... --Toby It was fun while it lasted!
Re: Filesystem corruption
On 30-May-07, at 2:22 PM, devsk wrote: I think people just like to spread FUD without doing any analysis of what really caused the FS corruption. I fear you're right. OTOH, filesystem developers on this list (and others including ZFS list) tend to be extremely meticulous. --Toby
Re: Filesystem corruption
On Wednesday 30 May 2007 11:42:01 Toby Thain wrote: But does it cause data loss? One usually sees claims that reiserfs ate my data, or I heard reiserfs ate somebody's data, but without supplying a root cause - bad memory? powerfail? bad disk? etc. Power failure shouldn't kill a filesystem, and generally shouldn't eat data that was written to disk before the failure. (Although I could complain all day here about why corruption happens anyway when you do any kind of out-of-order operations... I am looking forward to that Reiser4 transaction API, so we can finally get rid of the tmpfile+rename hack.) But in any case, there were some kernels -- 2.4.16, I think? -- in which reiserfs was unstable and did corrupt easily. I believe that was tracked down to kernel bugs outside of reiserfs.
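For readers unfamiliar with the "tmpfile+rename hack" mentioned above, here is a minimal sketch of the usual pattern: write the new contents to a temporary file in the same directory, flush it to disk, then rename it over the target so that readers see either the old file or the new one. The function name is hypothetical, and the sketch deliberately leaves out the directory fsync that some applications add after the rename.

```python
import os
import tempfile


def atomic_write(path: str, data: bytes) -> None:
    """Replace the file at path with data using a temp file plus rename."""
    directory = os.path.dirname(os.path.abspath(path))
    # The temp file must live on the same filesystem for the rename to be atomic.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".tmp-")
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
            tmp.flush()
            os.fsync(tmp.fileno())   # push the data to disk before the rename
        os.replace(tmp_path, path)   # atomic rename on POSIX filesystems
    except BaseException:
        # Clean up the temp file if anything failed before the rename.
        try:
            os.unlink(tmp_path)
        except FileNotFoundError:
            pass
        raise
```

The fsync before the rename is the part that matters for the out-of-order-write concern raised in this thread: without it, a filesystem may commit the rename before the data blocks, leaving an empty file after a crash.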
Re: Filesystem corruption
On Wednesday 30 May 2007 12:22:17 devsk wrote: I have used R4 for a year now and I have had to reset my PC, troubleshooting problems with vmware/mythtv/cisco vpn client/nvidia, so many times that its not even funny! And R4 didn't give me any problems even once. It boots right up, without any files lost and consistent FS as a subsequent livecd boot and fsck proved it everytime. That happened to me for maybe a year or so, I'm not sure. Then, slowly, I started to get problems. The machine crashing due to some nvidia bug -- or even a reiser-specific oops or something -- then I'd have to fsck it, which would take an hour or more, then I'd boot, and apparently no problems. Only, recently, these fsck-a-thons started happening more and more often, and I started to lose random files. They'd just be silently truncated to 0 bytes. And not files I was writing a lot -- I'm talking about things like /bin/mount. Now, maybe it's an amd64-specific bug. Or (somehow) a dmraid-specific bug, or a dont_load_bitmap bug. (Who can blame me; without dont_load_bitmap, it takes at least 30 seconds, maybe a minute to mount.) Could even be, somehow, a Gentoo-specific bug. Could be a 350-gig-partition bug, or even a bug of the it-hates-me variety. (My server ran Reiser4 for awhile longer, with no problems, but I wasn't about to take chances there.) But, I switched a friend over to Ubuntu, and he had the same kind of problems. In fact, he had them first (I thought it was his computer, for awhile). Finally, we switched to stock Ubuntu kernels and XFS, me on dmraid, him on normal linux raid5 (md), and we now have no problems. It's even faster -- the biggest gain for Reiser4 was /usr/portage, which doesn't exist on Ubuntu. If I did that to ext or xfs, I would have lost big time. Well, I'm on XFS on my desktop now, and ext3 on my server. No problems at all so far. Also much faster, because my desktop now has a repacker (xfs_fsr). I hope people don't leave this good piece of code to rot!! 
Me too, but you know, I can no longer afford to spend a few hours running fsck for no apparent reason. I no longer have a machine that can do anything but just work. The killer feature of Reiser4, as implemented, is small file performance that makes ReiserFSv3 weep, and v3 makes XFS weep. All the other stuff we were promised is either planned for a later release (repacker, pseudofiles, transaction API) or barely working (cryptocompress). And on just about any setup I work on today, small file performance is a small enough priority that even the slightest hint of instability is a deal-breaker. Enough people feel the same way that ext3 is still widely used. And if it's ever really crucial, there's reiserfs3. So, you can blame it on my hardware, or on not getting kernel inclusion, or anything you want, but the only place I still use Reiser4 is on the gameserver at our LAN party, and we're thinking of moving that to something like ext3 or xfs, just so we don't need custom kernels. And after all, that's a gameserver, it's not like the filesystem is the bottleneck anyway.
Re: Filesystem corruption
On Wednesday 30 May 2007 11:02:26 Vladimir V. Saveliev wrote: Ordinarily I like to help debug things, but not at the risk of my data. Maybe I'll try again later, and see if I can reproduce it in a VM or somewhere safe... that would be great, thanks Keep in mind, it's unlikely, given I don't have much resembling my original setup left around. And it was fairly random, under fairly normal usage patterns -- just I'd suddenly notice my movie had stopped playing, and I'd hit ctrl+alt+f8 and find a bunch of reiser4 error messages. Is it at all likely that this is an amd64 bug? (The only two places I've seen it are on my box and my friend's, both amd64 on some sort of RAID.) If you don't have enough testers or hardware for amd64, I can try (again) to setup a working x86_64 VM for you to test on.
Re: Filesystem corruption
David, Its funny how my setup is very similar to yours: gentoo, amd64, nvraid using dmraid. mount/mkfs is VERY fast (less than a second) here, and I don't use any specific mount options except noatime. My partition is about 16GB though, hosting '/' and /home. what sources do you use? I use gentoo-sources (currently using 2.6.21-r2) with the latest stable patch (currently 2.6.21) from namesys, applied manually. Nothing else. I use suspend-to-ram (with a UPS) and the whole system is rock solid. -devsk - Original Message From: David Masover [EMAIL PROTECTED] To: devsk [EMAIL PROTECTED] Cc: Toby Thain [EMAIL PROTECTED]; ReiserFS List reiserfs-list@namesys.com Sent: Wednesday, May 30, 2007 1:03:14 PM Subject: Re: Filesystem corruption On Wednesday 30 May 2007 12:22:17 devsk wrote: I have used R4 for a year now and I have had to reset my PC, troubleshooting problems with vmware/mythtv/cisco vpn client/nvidia, so many times that its not even funny! And R4 didn't give me any problems even once. It boots right up, without any files lost and consistent FS as a subsequent livecd boot and fsck proved it everytime. That happened to me for maybe a year or so, I'm not sure. Then, slowly, I started to get problems. The machine crashing due to some nvidia bug -- or even a reiser-specific oops or something -- then I'd have to fsck it, which would take an hour or more, then I'd boot, and apparently no problems. Only, recently, these fsck-a-thons started happening more and more often, and I started to lose random files. They'd just be silently truncated to 0 bytes. And not files I was writing a lot -- I'm talking about things like /bin/mount. Now, maybe it's an amd64-specific bug. Or (somehow) a dmraid-specific bug, or a dont_load_bitmap bug. (Who can blame me; without dont_load_bitmap, it takes at least 30 seconds, maybe a minute to mount.) Could even be, somehow, a Gentoo-specific bug. Could be a 350-gig-partition bug, or even a bug of the it-hates-me variety. 
(My server ran Reiser4 for awhile longer, with no problems, but I wasn't about to take chances there.) But, I switched a friend over to Ubuntu, and he had the same kind of problems. In fact, he had them first (I thought it was his computer, for awhile). Finally, we switched to stock Ubuntu kernels and XFS, me on dmraid, him on normal linux raid5 (md), and we now have no problems. It's even faster -- the biggest gain for Reiser4 was /usr/portage, which doesn't exist on Ubuntu. If I did that to ext or xfs, I would have lost big time. Well, I'm on XFS on my desktop now, and ext3 on my server. No problems at all so far. Also much faster, because my desktop now has a repacker (xfs_fsr). I hope people don't leave this good piece of code to rot!! Me too, but you know, I can no longer afford to spend a few hours running fsck for no apparent reason. I no longer have a machine that can do anything but just work. The killer feature of Reiser4, as implemented, is small file performance that makes ReiserFSv3 weep, and v3 makes XFS weep. All the other stuff we were promised is either planned for a later release (repacker, pseudofiles, transaction API) or barely working (cryptocompress). And on just about any setup I work on today, small file performance is a small enough priority that even the slightest hint of instability is a deal-breaker. Enough people feel the same way that ext3 is still widely used. And if it's ever really crucial, there's reiserfs3. So, you can blame it on my hardware, or on not getting kernel inclusion, or anything you want, but the only place I still use Reiser4 is on the gameserver at our LAN party, and we're thinking of moving that to something like ext3 or xfs, just so we don't need custom kernels. And after all, that's a gameserver, it's not like the filesystem is the bottleneck anyway.
Re: Filesystem corruption
Hello On Tuesday 29 May 2007 08:18, Tracy R Reed wrote: Laurent CARON wrote: Seems to me it is a filesystem corruption. Did I miss it or did not a single person ask you if this happened with reiserfs 3 or 4? Laurent mentioned rebuild-tree mode of reiserfsck. So the problem happened with reiserfs 3. I would be quite surprised if this were reiser 3 and not so surprised if it were reiser 4 which is still beta afaik. Reiser has a nasty reputation for filesystem corruption more than any other fs. I have always found reiser3 to be rock solid but you can't mention using reiserfs in mixed company without someone accusing you of throwing your data away. You would think the developers would be doing more to counter this but I have been following reiserfs for years and nobody seems to really care all that much.
Re: Filesystem corruption
Hello

On Monday 28 May 2007 22:16, Laurent CARON wrote:
> Christian Kujau wrote:
>> Please try to check the fs with a current version of reiserfsprogs first. As the manpage advises, try --check first and use --rebuild-tree only if you know what you're doing, IOW: have a current backup.
>
> Over the past few years, I experienced a few reiser corruptions on various hardware (dell, hp, asus, sata, scsi, ide...) with the same symptoms (unreadable file/dir). I always ran check, which told me to run fix-fixable or rebuild-tree, which I did after ensuring backup reliability, and the error was corrected (after eventually losing a few files I fortunately had in the backups).

Would you run reiserfsck --check -l log and let us see the log? That may give a hint about which kind of corruption you have.

>> Also, which kernel/machine is this running on? Do you know *why* this corruption may have occurred? Any recent hardware issues? Is there anything in the logs regarding fs/device errors?
>
> Kernel is 2.6.19. The machine does not seem to have any HW issue, nothing strange in the logs. :$ This is just a plain Dell 2650 server with a bunch of SCSI HDD, software raid5 array, reiserfs on top of it.
>
> Laurent
Re: Filesystem corruption
> I have always found reiser3 to be rock solid

My experience too, over many server years.

> but you can't mention using reiserfs in mixed company without someone accusing you of throwing your data away.

People who repeat this rarely have any direct experience of Reiser; they repeat what they've heard; like all myths and legends they are transmitted orally rather than based on scientific observation.

> You would think the developers would be doing more to counter this but I have been following reiserfs for years and nobody seems to really care all that much.

Can't do much about human nature. MySQL suffers from the same baseless poisoned folk wisdom.

--Toby
Re: Filesystem corruption
Hello

On Sunday 27 May 2007 17:18, Laurent CARON wrote:
> Hi,
>
> A few days ago, one of my procmail recipes suddenly stopped working. I didn't care much since this only affected 1 or 2 mails. Yesterday, I took the time to dig a bit further and looked at the filesystem on my mail server.
>
> Here is the output of ls -al in the Maildir where my mails are stored:
>
> total 1341
> drwx------   6 lcaron mail       256 2007-05-24 10:35 ./
> drwx------ 363 lcaron mail     12184 2007-05-25 21:52 ../
> -rw-r--r--   1 lcaron mail        17 2004-05-25 09:19 courierimapacl
> drwx------   2 lcaron mail        48 2004-05-25 09:20 courierimapkeywords/
> -rw-r--r--   1 lcaron lcaron  169365 2007-05-24 10:35 courierimapuiddb
> drwx------   2 lcaron mail   1185016 2007-05-24 10:26 cur/
> -rw-------   1 lcaron mail         0 2004-05-25 09:19 maildirfolder
> ?---------   ? ?      ?            ?                ? new
> drwx------   2 lcaron mail        48 2007-05-24 19:16 tmp/
>
> The entry that scares me is
> ?--------- ? ? ? ? ? new
>
> Seems to me it is a filesystem corruption. Any other solution than rebuild-tree?

Did you try rm -rf new?

> Thanks
>
> Laurent
Re: Filesystem corruption
Vladimir V. Saveliev wrote:
> Did you try rm -rf new?

$ rm -rf new
rm: cannot lstat `new': Permission denied
Re: Filesystem corruption
Hello

On Monday 28 May 2007 18:10, Laurent CARON wrote:
> Vladimir V. Saveliev wrote:
>> Did you try rm -rf new?
>
> $ rm -rf new
> rm: cannot lstat `new': Permission denied

Is there anything from reiserfs in system logs?
Re: Filesystem corruption
Vladimir V. Saveliev wrote:
> Is there anything from reiserfs in system logs?

Nothing from reiserfs/kernel in [...]

I did experience a similar bug on another computer a while ago (this bug was fixed by rebuilding the tree).
Re: Filesystem corruption
[resending, because lncsa.com bounced my mail]

On Mon, 28 May 2007, Christian Kujau wrote:
> On Sun, 27 May 2007, Laurent CARON wrote:
>> The entry that scares me is
>> ?--------- ? ? ? ? ? new
>> Seems to me it is a filesystem corruption. Any other solution than rebuild-tree?
>
> Please try to check the fs with a current version of reiserfsprogs first. As the manpage advises, try --check first and use --rebuild-tree only if you know what you're doing, IOW: have a current backup.
>
> Also, which kernel/machine is this running on? Do you know *why* this corruption may have occurred? Any recent hardware issues? Is there anything in the logs regarding fs/device errors?

C.
--
BOFH excuse #448: vi needs to be upgraded to vii
Re: Filesystem corruption
Christian Kujau wrote:
> Please try to check the fs with a current version of reiserfsprogs first. As the manpage advises, try --check first and use --rebuild-tree only if you know what you're doing, IOW: have a current backup.

Over the past few years, I experienced a few reiser corruptions on various hardware (dell, hp, asus, sata, scsi, ide...) with the same symptoms (unreadable file/dir). I always ran check, which told me to run fix-fixable or rebuild-tree, which I did after ensuring backup reliability, and the error was corrected (after eventually losing a few files I fortunately had in the backups).

> Also, which kernel/machine is this running on? Do you know *why* this corruption may have occurred? Any recent hardware issues? Is there anything in the logs regarding fs/device errors?

Kernel is 2.6.19. The machine does not seem to have any HW issue, nothing strange in the logs. :$ This is just a plain Dell 2650 server with a bunch of SCSI HDD, software raid5 array, reiserfs on top of it.

Laurent
Re: Filesystem corruption
On Mon, 28 May 2007, Laurent CARON wrote:
> Always ran check which told me to run fix-fixable or rebuild-tree, which I did after ensuring of backup reliability, and the error was corrected (after eventually losing a few files i fortunately had in the backups).

Well, lucky you :)

> The machine does not seem to have any HW issue, nothing strange in the logs. :$ This is just a plain Dell 2650 server with a bunch of SCSI HDD, software raid5 array, reiserfs on top of it.

...and no power failures or bad memory whatsoever? Hm, too bad, since now it's unclear what *caused* the corruptions in the first place. You'll probably (hopefully) be able to correct this corruption with --rebuild-tree, but I'd keep a close eye on this filesystem for further corruptions.

Christian.
--
BOFH excuse #118: the router thinks its a printer.
Filesystem corruption
Hi, A few days ago, one of my procmail recipes suddenly stopped working. I didn't care much since this only affected 1 or 2 mails. Yesterday, I took the time to dig a bit further and looked at the filesystem on my mail server. Here is the output of ls -al in the Maildir where my mails are stored:
total 1341
drwx-- 6 lcaron mail 256 2007-05-24 10:35 ./
drwx-- 363 lcaron mail 12184 2007-05-25 21:52 ../
-rw-r--r-- 1 lcaron mail 17 2004-05-25 09:19 courierimapacl
drwx-- 2 lcaron mail 48 2004-05-25 09:20 courierimapkeywords/
-rw-r--r-- 1 lcaron lcaron 169365 2007-05-24 10:35 courierimapuiddb
drwx-- 2 lcaron mail 1185016 2007-05-24 10:26 cur/
-rw--- 1 lcaron mail 0 2004-05-25 09:19 maildirfolder
?- ? ? ??? new
drwx-- 2 lcaron mail 48 2007-05-24 19:16 tmp/
The entry that scares me is "?- ? ? ??? new". It looks to me like filesystem corruption. Is there any solution other than --rebuild-tree? Thanks, Laurent
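An entry that ls prints with question marks in place of its mode, owner and size is normally one whose name is readable from the directory but whose metadata cannot be fetched. As a rough sketch (the function name is made up for illustration and nothing here is specific to reiserfs), a script along these lines could list such entries before deciding whether a fsck run is needed:

```python
import os


def find_unstatable(directory):
    """Return (name, reason) pairs for entries whose metadata cannot be read.

    Hypothetical helper: it only reports what lstat() says and does not
    try to repair anything.
    """
    broken = []
    for name in sorted(os.listdir(directory)):
        path = os.path.join(directory, name)
        try:
            # lstat() does not follow symlinks, so a dangling link is not
            # mistaken for a damaged entry.
            os.lstat(path)
        except OSError as exc:
            broken.append((name, exc.strerror))
    return broken
```

On a healthy directory the result is an empty list; on the Maildir above one would expect "new" to show up, although that is a guess based on the ls output alone.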
Re: Filesystem corruption
Hello! On Thu, Aug 14, 2003 at 12:05:28AM +0800, Locke wrote: the files. I'm guessing the reason it recovered so little was that I was running a 7.8GB+40GB LVM and the 40GB physical volume wasn't working, which left it with only 7.8GB. Yes, of course. is_tree_node: node level 0 does not match to the expected one 1 vs-5150: search_by_key: invalid format found in block 8838461. Fsck? So LVM substitutes zero-filled blocks for the data if a physical volume is unavailable. Of course reiserfsck happily threw all of those blocks out of the tree. And also when rebooting after the corruption I saw several error messages for all drives, hda, hdb and hdg ** hda: dma_intr: status=0x51 { DriveReady SeekComplete Error } hda: dma_intr: error=0x84 { DriveStatusError BadCRC } hda: dma_intr: status=0x51 { DriveReady SeekComplete Error } hda: dma_intr: error=0x84 { DriveStatusError BadCRC } You should also consider replacing the noisy IDE cable on your primary IDE controller with a better one, or just run in a lower UDMA mode. **The messages are copied from the FAQ in namesys.com because they looked similar so I'm not sure if they're the exactly same. Well, if they are not the same, you'd better write them down on paper. Is there anything I can try to recover more data? You might try to get LVM up again and run reiserfsck --rebuild-tree. Some more stuff will be restored, though you will still have lost the content of lots of files, and there is no way to restore it anymore. Also, use reiserfsck 3.6.11. Bye, Oleg
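For readers wondering how the hex values in the dma_intr lines relate to the words in braces, here is a small sketch. The bit positions are an assumption based on the classic ATA status and error register layout, and the error table deliberately lists only the two bits that appear in the quoted message; the names are simply the words from the log.

```python
# Assumed ATA status register bits, named after the words in the kernel log.
STATUS_BITS = [
    (0x80, "Busy"),
    (0x40, "DriveReady"),
    (0x20, "DeviceFault"),
    (0x10, "SeekComplete"),
    (0x08, "DataRequest"),
    (0x04, "CorrectedError"),
    (0x02, "Index"),
    (0x01, "Error"),
]

# Only the two error bits seen in the quoted message; other bits are omitted.
ERROR_BITS = [
    (0x04, "DriveStatusError"),  # command aborted
    (0x80, "BadCRC"),            # interface CRC error, as reported above
]


def decode(value, table):
    """Return the names of all bits from `table` that are set in `value`."""
    return [name for mask, name in table if value & mask]


print(decode(0x51, STATUS_BITS))  # ['DriveReady', 'SeekComplete', 'Error']
print(decode(0x84, ERROR_BITS))   # ['DriveStatusError', 'BadCRC']
```

A CRC error on the interface points at the transfer path (cable, controller, UDMA mode) rather than at the platters, which fits the advice about the cable.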
Re: filesystem corruption ?
Hello, Though this machine will be replaced by a real server in a few months, I'm still rather worried about what happened. Even if it's 'only' a hardware memory problem this means lots of trouble for us -- on the one hand it seems not to be memtest86 detectable and on the other hand our programs really do need working memory, but of course this is not of your concern. Update: Yesterday I started our fallback server and ran another memtest86 on the suspect machine. A colleague just told me that memtest86 reported 3 errors in test 8; well, let's see what comes up in test 11. So this means either that the physicists have run some experiments today or that the memory became damaged within 2 weeks. Thanks a lot for your help in identifying this as a hardware problem. Best regards, Bernd
Re: filesystem corruption ?
On Friday 21 March 2003 08:32, you wrote: Hello! On Thu, Mar 20, 2003 at 07:23:48PM +0100, Bernd Schubert wrote: Hm, interesting. And what are the differences? How big are they? Since they are binary files, a colleague had the idea to use hexdump and diff, so the command for the attached file was: diff <(hexdump /worka/gdb) <(hexdump /usr/bin/gdb) | sort -k 2 > gdb.diff So the lines beginning with '<' are from the working gdb and the lines beginning with '>' are from the corrupted gdb. When you look into the diff file you will see that only some bits per line have changed. I see. Basically you have two pages of data corrupted. And the corruption indeed looks like bit corruption. How about rebooting that box and checking if the corruption pattern changes? Also I'd recommend you run memtest86 for some time, as this looks like a bad memory pattern. All of our machines have to pass a full memtest86 check before we use them. This machine is about 3 weeks old; of course it also had to run this test, and furthermore it has ECC memory. Any events happening between morning backup and time of problem discovery? Except that I recompiled a kernel and we installed some programs using aptitude (it's a Debian system), nothing happened to the filesystem. There was also no reboot, no crash, etc. Update: The corruption probably happened at 15:48, since at that time an xchat on one of the clients also crashed, and that was what we noticed first. The xchat binary was also affected by the corruption. So, the beam of X-rays run through the memory module corrupting some bits? There is the 'Environmental Physics Institut' on the floor below us and since we currently have an extremely high hardware failure rate, I have been joking for some time that they might be causing it (I believe they are indeed using X-ray beams). I should really ask them if their constructions are shielded properly ;-) ;) This stuff should not have been written to disk, so probably plain reboot should fix everything? Can you test that? Yes of course, if something goes wrong we still have our fallback machine :-) I will report in the afternoon if it worked. Best regards, Bernd
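The hexdump/diff comparison discussed in this message can also be done programmatically. The following is only a sketch of the same idea, not what was actually run: it XORs a known-good copy against the suspect copy and reports how many bits flipped and which pages they fall in. The 4096-byte page size is an assumption.

```python
def bit_flips(good: bytes, bad: bytes):
    """Yield (offset, xor_mask, flipped_bit_count) for each differing byte."""
    for offset, (a, b) in enumerate(zip(good, bad)):
        diff = a ^ b
        if diff:
            yield offset, diff, bin(diff).count("1")


def summarize(good: bytes, bad: bytes, page_size: int = 4096):
    """Summarize the differences between two equally sized byte strings."""
    flips = list(bit_flips(good, bad))
    return {
        "bytes_changed": len(flips),
        "bits_flipped": sum(count for _, _, count in flips),
        # Page indexes containing at least one changed byte.
        "pages": sorted({offset // page_size for offset, _, _ in flips}),
    }
```

Usage would be something like summarize(open("/worka/gdb", "rb").read(), open("/usr/bin/gdb", "rb").read()). A result with few flipped bits per changed byte, confined to one or two pages, looks more like bad memory in the page cache than like filesystem damage, which typically replaces whole blocks.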
Re: filesystem corruption ?
Hi, So, the beam of X-rays run through the memory module corrupting some bits? ;) This stuff should not have been written to disk, so probably plain reboot should fix everything? Can you test that? Indeed, after rebooting everything is fine again. We will run another memtest86 during the weekend, though I really don't believe we will find a problem. Though this machine will be replaced by a real server in a few months, I'm still rather worried about what happened. Even if it's 'only' a hardware memory problem this means lots of trouble for us -- on the one hand it seems not to be memtest86 detectable and on the other hand our programs really do need working memory, but of course this is not of your concern. Thanks for your help, Bernd
Re: filesystem corruption ?
Hello! On Fri, Mar 21, 2003 at 02:01:38PM +0100, Bernd Schubert wrote: So, the beam of X-rays run through the memory module corrupting some bits? ;) This stuff should not have been written to disk, so probably plain reboot should fix everything? Can you test that? indeed after rebooting everything is fine again. We will run another memtest86 So on-disk corruption is out of the question. during the weekend, though I really don't believe we will find a problem. Ask those physics guys to run some X-ray experiments while you are running memtest86 ;) Though this machine will be replaced by a real server in a few month, I'm still rather worried what happend. Even if its 'only' a hardware memory problem this means lots of trouble for us -- on the one hand it seems not to be memtest86 detectable and on the other hand our programs really do need Well, it may not be detectable because no high-energy beams are running around at the time of the test. working memory, but of course this is not of your concern. I learned in school that if you put some amount of plumbum between some area and a source of radiation, chances are the radiation that reaches the protected area will be of much lesser strength. In fact you might go to those guys and ask them what material (and how much of it) is best suited to shield against the stuff they generate. Bye, Oleg
Re: filesystem corruption ?
I've learn in the school that if you put some bit amount of plumbum in between some area and source of radiation, chances are radiation that will reach the protected area will be of much lesser strenght. In fact you might go to those guys and ask them what matherial (and how much of it) is best suited to shield against stuff they generate. We already discussed over lunch ordering something like this for our systems ;-) (it would be a rather strange order for a normal computer company, wouldn't it?) But in fact, I'm now really going to contact those guys and ask if they have some equipment to detect their beams. Have a nice weekend, Bernd
Re: filesystem corruption ?
On Fri, 21 Mar 2003 14:07, Oleg Drokin wrote: I've learn in the school that if you put some bit amount of plumbum in It's better known in English as lead. The problem with lead is that it's poisonous and soft. Having to wash your hands after touching your computer could get annoying. Other metals such as copper and steel will reduce the radiation and can also be used for protection against mechanical damage. The best way to reduce radiation is by distance. The inverse-square law applies, so moving the computer further away from the experiment will reduce the radiation more easily than anything else you may do. One thing to consider is diskless X-term machines if you need to operate a computer near the experiment, so that if the X-term crashes from radiation, your server with the data should continue running correctly. -- http://www.coker.com.au/selinux/ My NSA Security Enhanced Linux packages http://www.coker.com.au/bonnie++/ Bonnie++ hard drive benchmark http://www.coker.com.au/postal/ Postal SMTP/POP benchmark http://www.coker.com.au/~russell/ My home page
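To put a number on the inverse-square argument, here is a minimal sketch. It assumes an idealized point source with nothing in between, and the distances are hypothetical; a real beam setup will not follow this exactly.

```python
def relative_intensity(old_distance: float, new_distance: float) -> float:
    """Intensity at new_distance relative to old_distance (inverse-square law).

    Assumes a point source and no shielding; both distances must use the
    same unit and be positive.
    """
    if old_distance <= 0 or new_distance <= 0:
        raise ValueError("distances must be positive")
    return (old_distance / new_distance) ** 2


# Doubling the distance leaves a quarter of the original intensity.
print(relative_intensity(2.0, 4.0))  # 0.25
```

So moving a machine from, say, 2 m to 20 m away would cut the exposure to about 1% under these assumptions, which is why distance tends to be cheaper than shielding.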
Re: [reiserfs-list] Filesystem corruption after resize
Quoting Vitaly Fertman ([EMAIL PROTECTED]): Hi, Hello, The exact commands used are: resize_reiserfs -s 400G /dev/vg01/stuff lvreduce -l 16693 /dev/vg01/stuff pvmove -v /dev/md1 vgreduce -v vg01 /dev/md1 resize_reiserfs /dev/vg01/stuff reiserfsck --check /dev/vg01/stuff This all worked like a charm, until I noticed that a nightly script that scans all files, no longer was able to access about 20 files (access denied even though the script is running as root). Do you mean reiserfsck finished without any error/warning message? Yes, it did not detect any errors after the resize. The errors turned up a day later, so it is not 100% certain that the two events are linked. But since nothing else was done that could explain the corruption, that is the theory I am working on. The progs I sent you are what is going to be the next release. Please run --check and tell me what is in fsck.log. You can run --fix-fixable if it says so, but it would be better to run rebuild-tree on a copy (it is not a release). Or you can do the following: debugreiserfs/debugreiserfs -p /dev/vg01/stuff | gzip -p stuff.gz it will pack metadata (without filebodies), I will download it and test locally. I will send you those two files in a separate mail. I copied all the data over to the other RAID device, so I am not so much concerned about rescuing the filesystem - I could just reformat the whole thing and copy the files back. But I would very much like to find out what happened so I can take action to prevent it from happening again. In particular I need to know whether resizing on LVM devices works properly, since I will need to resize again shortly when the replacement disk arrives. Baldur
Re: [reiserfs-list] Filesystem Corruption
Thanks Oleg, sorry for the late response (I was out of the office). You may find the following information on the last crash useful: Jun 3 04:32:37 devo kernel: vs-13075: reiserfs_read_inode2: dead inode read from disk [854 1695654 0x0 SD]. This is likely to be race with knfsd. Ignore Jun 3 04:32:39 devo kernel: vs-13060: reiserfs_update_sd: stat data of object [854 1695654 0x0 SD] (nlink == 1) not found (pos 1) Jun 3 04:41:38 devo kernel: vs-13060: reiserfs_update_sd: stat data of object [854 1695654 0x0 SD] (nlink == 1) not found (pos 1) Jun 3 04:41:43 devo kernel: vs-13060: reiserfs_update_sd: stat data of object [854 1695654 0x0 SD] (nlink == 1) not found (pos 1) I will upgrade the kernel and reiserfs tools this week and inform you of the result after a fsck. -Kurt On Friday 07 June 2002 3:15 am, Oleg Drokin wrote: Hello! On Thu, Jun 06, 2002 at 02:00:01PM -0400, Kurt wrote: error stating the file pointed to nowhere. I was unable to complete a reiserfsck --fix-fixable because of the length of time that this (fsck) process took since this was an unscheduled downtime. During the weekend i will attempt to do the fsck again, however i really needed to know if this problem has been observed by anyone else, and what steps they took to fix the problem. We recommend you to upgrade your kernel to 2.4.18. To know what exact problem is it would be very useful if you'd posted excerpts from kernel logs with actual errors. Thank you. Bye, Oleg -- Kurt Palmer SysAdmin [EMAIL PROTECTED] Advance Internet 201-459-2846
[reiserfs-list] Filesystem corruption after resize
Hello, First something about my setup: md0: 8x80 GB in a RAID5 configuration md1: 4x160 GB in a RAID5 configuration /dev/vg01/stuff: the union of md0 and md1 done with lvm. dark:/mnt# reiserfsck -V -reiserfsck, 2002- reiserfsprogs 3.x.1a dark:/mnt# resize_reiserfs -v -resize_reiserfs, 2002- reiserfsprogs 3.x.1a Usage: resize_reiserfs [-s[+|-]#[G|M|K]] [-fqv] device dark:/mnt# cat /proc/version Linux version 2.4.18 (root@dark) (gcc version 2.95.4 20011006 (Debian prerelease)) #1 SMP Fri Apr 12 13:40:03 CEST 2002 The system is a dual AMD Athlon(tm) MP 1800+ (1533 MHz), with 1 GB memory. Now recently one of the 160 GB disks died. Since I still had enough free space and I wanted to preserve the redundancy, I used resize_reiserfs to shrink the filesystem. Then I used lvm to move it away from the non-redundant md1 device. The exact commands used are: resize_reiserfs -s 400G /dev/vg01/stuff lvreduce -l 16693 /dev/vg01/stuff pvmove -v /dev/md1 vgreduce -v vg01 /dev/md1 resize_reiserfs /dev/vg01/stuff reiserfsck --check /dev/vg01/stuff This all worked like a charm, until I noticed that a nightly script that scans all files, no longer was able to access about 20 files (access denied even though the script is running as root). Dmesg is full of this: vs-5150: search_by_key: invalid format found in block 66153. Fsck? vs-13070: reiserfs_read_inode2: i/o failure occurred trying to find stat data of [163330 163334 0x0 SD] is_leaf: free space seems wrong: level=1, nr_items=1, free_space=3040 rdkey vs-5150: search_by_key: invalid format found in block 72879. Fsck? vs-13070: reiserfs_read_inode2: i/o failure occurred trying to find stat data of [168724 168732 0x0 SD] is_tree_node: node level 29122 does not match to the expected one 1 vs-5150: search_by_key: invalid format found in block 70647. Fsck? 
vs-13070: reiserfs_read_inode2: i/o failure occurred trying to find stat data of [167220 167223 0x0 SD] is_tree_node: node level 2 does not match to the expected one 1 vs-5150: search_by_key: invalid format found in block 66153. Fsck? vs-13070: reiserfs_read_inode2: i/o failure occurred trying to find stat data of [163330 163334 0x0 SD] and so on; there is a lot of this stuff repeating. reiserfsck --fix-fixable /dev/vg01/stuff crashes. Btw., a separate problem: I am never able to unmount this filesystem properly. I always get this error: dark:/mnt# umount stuff umount: /mnt/stuff: device is busy dark:/mnt# fuser -v stuff USER PID ACCESS COMMAND stuff root kernel mount /mnt/stuff So without rebooting I can't quote the exact output from --fix-fixable, but it is approximately the same as when I just run it plain: dark:/mnt# reiserfsck -l /root/reiserfsck.log /dev/vg01/stuff -reiserfsck, 2002- reiserfsprogs 3.x.1a Will read-only check consistency of the filesystem on /dev/vg01/stuff Will put log info to '/root/reiserfsck.log' Do you want to run this program?[N/Yes] (note need to type Yes):Yes ### reiserfsck --check started at Tue Jun 11 16:36:38 2002 ### Filesystem seems mounted read-only. Skipping journal replay.. Checking S+tree../ 4 (of 6)/ 27 (of 132)/ 44 (of 152)bit 1359513587, bitsize 136749056 reiserfsck: bitmap.c:168: reiserfs_bitmap_test_bit: Assertion `bit_number < bm->bm_bit_size' failed. Aborted What can I do to resolve this? Thanks, Baldur
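The numbers in this report can be cross-checked with a little arithmetic. The sketch below assumes a 4096-byte reiserfs block and a 32 MiB LVM physical extent; neither value is stated in the message, they are simply the pair under which the figures line up. The range check mirrors the condition the failed assertion appears to test.

```python
BLOCK_SIZE = 4096             # assumed reiserfs block size, in bytes
PE_SIZE = 32 * 1024 * 1024    # assumed LVM physical extent size, in bytes

extents = 16693               # from "lvreduce -l 16693"
bitmap_bits = 136749056       # "bitsize" in the reiserfsck output
bad_bit = 1359513587          # "bit" in the reiserfsck output

lv_bytes = extents * PE_SIZE          # size of the logical volume
fs_bytes = bitmap_bits * BLOCK_SIZE   # size covered by the block bitmap


def bit_in_range(bit_number: int, bit_size: int) -> bool:
    """True if bit_number is a valid index into a bitmap of bit_size bits."""
    return 0 <= bit_number < bit_size


print(lv_bytes == fs_bytes)                # True
print(lv_bytes / 2**30)                    # 521.65625 (GiB)
print(bit_in_range(bad_bit, bitmap_bits))  # False
```

Under these assumptions the filesystem fills the reduced volume exactly, so the final resize_reiserfs seems to have grown it to the right size, and the 400G shrink target left headroom below the new volume size. The block number fsck tripped over is roughly ten times larger than the filesystem, which suggests some metadata still points far outside the device. This does not show what caused the damage; it only narrows down where to look.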
Re: [reiserfs-list] Filesystem Corruption
Hello! On Thu, Jun 06, 2002 at 02:00:01PM -0400, Kurt wrote: error stating the file pointed to nowhere. I was unable to complete a reiserfsck --fix-fixable because of the length of time that this (fsck) process took since this was an unscheduled downtime. During the weekend i will attempt to do the fsck again, however i really needed to know if this problem has been observed by anyone else, and what steps they took to fix the problem. We recommend that you upgrade your kernel to 2.4.18. To find out what the exact problem is, it would be very useful if you posted excerpts from the kernel logs with the actual errors. Thank you. Bye, Oleg
[reiserfs-list] Filesystem Corruption
Hello all, I currently have a system configured as follows :- 1) LVM version 1.0.1-rc4(ish)(03/10/2001) 2) /dev/PROJ/proj on /proj type reiserfs (rw,noatime,notail) 3) /dev/PROJ/proj 239G 142G 97G 60% /proj 4) 2.4.17 with reiserfs tools 3.x.0k 5) Reiserfs compiled in (CONFIG_REISERFS_CHECK set to NO) 6) 256 MB RAM (sar -r shows memory usage is not abnormal for this box) 7) Tons of very small files based on log processing. I am told by my co-worker that the system became unresponsive and showed reiserfs-related errors on the console. Upon restart they noticed that the file /proj/webtrends/receive/bama/www3/access.01Jun.r.gz was unreadable by root (permission denied). I ran reiserfsck on the drive and noticed that access.01Jun.r.gz returned an error stating that the file pointed to nowhere. I was unable to complete a reiserfsck --fix-fixable because of the length of time that the fsck process took, since this was an unscheduled downtime. During the weekend I will attempt the fsck again; however, I really need to know if this problem has been observed by anyone else, and what steps they took to fix it. -Kurt -- Kurt Palmer SysAdmin [EMAIL PROTECTED] Advance Internet 201-459-2846