Re: some testing questions
On 2006-08-14 16:15, Vladimir V. Saveliev wrote:
> reiser4progs includes a program measurefs.reiser4. It should be able to
> measure tree fragmentation. I am not sure how the portage tree evolves,
> but maybe it would be interesting to see how reiser4 tree fragmentation
> changes when the filesystem is loaded regularly.

This is a reiser4 partition holding the following:
- portage tree (synced every three days)
- ccache (compiler cache allowed to grow to 3GB - recently cleared)
- firefox's and opera's cache
- /tmp (portage builds everything in here)

The filesystem was created around 1.5 years ago (as far as I can tell).

# cat /proc/version
Linux version 2.6.17.8-reiser4-r3 ([EMAIL PROTECTED]) (gcc version 3.4.6
(Gentoo 3.4.6-r1, ssp-3.4.5-1.0, pie-8.7.9)) #2 Sat Aug 12 12:03:25 CEST 2006

# df:
/dev/hda8   6357768   3478716   2879052   55%   /cache

# cat /etc/fstab:
/dev/hda8  /cache  reiser4  noatime,nodiratime,nodev,nosuid,tmgr.atom_max_age=50  0 0

# measurefs.reiser4 -S:
measurefs.reiser4 1.0.5
Copyright (C) 2001, 2002, 2003, 2004 by Hans Reiser,
licensing governed by reiser4progs/COPYING.

Tree statistics ... done

Packing statistics:
  Formatted nodes:  3622.85b (88.45%)
  Branch nodes:     2792.00b (68.16%)
  Twig nodes:       3233.75b (78.95%)
  Leaf nodes:       3966.47b (96.84%)

Node statistics:
  Total nodes:       871653
  Formatted nodes:    75571
  Unformatted nodes: 796082
  Branch nodes:          23
  Twig nodes:          1360
  Leaf nodes:        870270

Item statistics:
  Total items:    542211
  Nodeptr items:   75570
  Statdata items: 214695
  Direntry items:  37432
  Tail items:     207819
  Extent items:     6695

Tree fragmentation: 0.074648
Data fragmentation: 0.039962

Last week I recompiled gcc and afterwards cleared 3GB of ccache data.
Before doing so, the partition was 90% full. My feeling is that now that
it's half empty, performance is much better. Emerge sync used to take
_ages_ rebuilding its cache and is now quite fast. Also, CPU usage during
compilation seems much lower. I can't remember ever hearing the CPU fan
running during recent compilations (700MHz PIII).
Before clearing the cache it ran continuously and still felt hot.

I know none of this is hard data. If you are interested in a follow-up,
just let me know.

BTW: Is it safe to run measurefs.reiser4 -S -T -D on a mounted fs?

--
Ingo Bormuth, voicebox telefax: +49-12125-10226517   '(~o-o~)'
public key 86326EC9, http://ibormuth.efil.de/contact --ooO--(.)--Ooo--
Re: some testing questions
Ingo Bormuth wrote:
> #df: /dev/hda8 6357768 3478716 2879052 55% /cache
> Before doing so, the partition was 90% full.

The performance difference between 90% full and 55% full will be large on
every filesystem. When we ship a repacker, that will be less true, because
we will have large chunks of unused space after the repacker runs. Oddly
enough, I don't know the statistics for reiser* filesystems, but I know
that for FFS you should not let it become more than 85% full before buying
a new disk (or cleaning your home directory) if you want good performance.
Re: the 'official' point of view expressed by kernelnewbies.org
> Anyone with serious need for data integrity already uses RAID, so why
> add brand new complexity for a solved problem?

RAID is great at recovering data, but not at detecting errors. The file
system can detect errors with a checksum. What is missing is an API
between layers for the filesystem to say "this sector is bad, go rebuild
it". This seems like a much more simple and useful thing than adding ECC
into the filesystem itself.

>> How about we switch to ecc, which would help with bit rot, not sector
>> loss?
>
> Interesting aspect. Yes, we can implement ECC as a special crypto
> transform that inflates data. As I mentioned earlier, it is possible
> via translation of key offsets with scale factor > 1. Of course, it is
> better than nothing, but the metadata would remain ecc-unprotected,
> and, hence, robustness is not increased.

_
On the road to retirement? Check out MSN Life Events for advice on how to
get there! http://lifeevents.msn.com/category.aspx?cid=Retirement
Re: some testing questions
Hans Reiser wrote:
> Ingo Bormuth wrote:
>> #df: /dev/hda8 6357768 3478716 2879052 55% /cache
>> Before doing so, the partition was 90% full.
>
> The performance difference between 90% full and 55% full will be large
> on every filesystem. When we ship a repacker, that will be less true,
> because we will have large chunks of unused space after the repacker
> runs.

Not always true. For one, doesn't Reiser4 arbitrarily reserve 5%? For
another, look at his results -- unless I'm wrong, that's 3-7%
fragmentation. If I'm wrong, it's more like .03-.07%. And lastly, at a
certain point, percentages aren't really that accurate. I've got a 350 or
400 gig partition which is 95% full according to df (and if I was right
about that 5%, it's more like 90% full), and that still leaves a solid
10-20 gigs free. I mean, yes, performance will eventually start to
suffer, but how much time and activity will it take to fragment 20 gigs
of free space, especially with lazy allocation?
Re: the 'official' point of view expressed by kernelnewbies.org
Tom Reinhart wrote:
>> Anyone with serious need for data integrity already uses RAID, so why
>> add brand new complexity for a solved problem?
>
> RAID is great at recovering data, but not at detecting errors. The
> file system can detect errors with a checksum. What is missing is an
> API between layers for the filesystem to say "this sector is bad, go
> rebuild it".

I agree that such an API is needed. I think there are a lot of systems
on desktops that lack RAID, though. Probably I should leave ECC for some
hopefully-next-year future release, though.
Re: some testing questions
David Masover wrote:
> that's 3-7% fragmentation.

Which is high enough to hurt performance. 50MB/s * 0.01 seconds = the
amount of transfer a seek costs. He needs a repacker. After we resolve
code review issues from akpm.
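The back-of-the-envelope arithmetic above can be made concrete with a small
sketch. The 50 MB/s throughput and 10 ms seek figures are the ones in the
message; the extent size and the seek-per-fragment model are illustrative
assumptions, not measurefs's actual model:

```python
# Cost of fragmentation, roughly: each fragment boundary costs a seek,
# and a seek is "worth" the data you could have streamed in that time.
throughput = 50 * 1024 * 1024   # sequential transfer rate: 50 MB/s
seek_time = 0.01                # one seek: ~10 ms

# Data you could have transferred during one seek.
seek_cost_bytes = throughput * seek_time
print(seek_cost_bytes / 1024)   # -> 512.0 (KB of "lost" transfer per seek)

def effective_throughput(fragmentation, extent_bytes=128 * 1024):
    """Rough effective read rate if `fragmentation` is the fraction of
    extents whose next extent needs a seek (hypothetical model)."""
    seeks_per_byte = fragmentation / extent_bytes
    time_per_byte = 1 / throughput + seeks_per_byte * seek_time
    return 1 / time_per_byte

# At ~7% fragmentation (the tree fragmentation figure reported earlier),
# the effective rate falls noticeably below the 50 MB/s streaming rate.
print(effective_throughput(0.07) / (1024 * 1024))
```

Under this toy model even single-digit fragmentation percentages shave a
measurable fraction off sequential throughput, which is the point Hans is
making.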
Re: the 'official' point of view expressed by kernelnewbies.org
Tom Reinhart wrote:
>> Anyone with serious need for data integrity already uses RAID, so why
>> add brand new complexity for a solved problem?
>
> RAID is great at recovering data, but not at detecting errors. The
> file system can detect errors with a checksum. What is missing is an
> API between layers for the filesystem to say "this sector is bad, go
> rebuild it".

Actually we don't need a special API: the kernel should warn and
recommend running fsck, which scans the whole tree and handles blocks
with bad checksums.

> This seems like a much more simple and useful thing than adding ECC
> into the filesystem itself.

Checksumming is _not_ much easier than ecc-ing from an implementation
standpoint; however, it would be nice if some part of the errors got
fixed without the massive surgery performed by fsck.
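The per-block checksum scheme being discussed can be sketched in a few
lines. This is only an illustration of the principle, not reiser4's actual
node plugin; CRC32 and the 4 KiB block size are assumptions:

```python
import zlib

BLOCK_SIZE = 4096   # assumed on-disk block size
CRC_SIZE = 4        # checksum stored in the last 4 bytes of the block

def seal_block(payload: bytes) -> bytes:
    """Pad the payload to block size and append a CRC32 over it."""
    assert len(payload) <= BLOCK_SIZE - CRC_SIZE
    body = payload.ljust(BLOCK_SIZE - CRC_SIZE, b"\0")
    crc = zlib.crc32(body)
    return body + crc.to_bytes(CRC_SIZE, "little")

def verify_block(block: bytes) -> bool:
    """On every read, recompute the CRC. A mismatch means the block is
    bad, and fsck (or a RAID rebuild, given an API for it) must step in."""
    body = block[:-CRC_SIZE]
    stored = int.from_bytes(block[-CRC_SIZE:], "little")
    return zlib.crc32(body) == stored

block = seal_block(b"some node contents")
assert verify_block(block)

# Simulate bit rot: flip one bit, and the read-time check catches it.
rotten = bytearray(block)
rotten[10] ^= 0x01
assert not verify_block(bytes(rotten))
```

This is the cheap half of the bargain: a CRC per block detects corruption
on every read for a trivial CPU cost, while repair is left to fsck or to a
redundancy layer that can reconstruct the block.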
Re: the 'official' point of view expressed by kernelnewbies.org
Edward Shishkin wrote:
> Tom Reinhart wrote:
>>> Anyone with serious need for data integrity already uses RAID, so
>>> why add brand new complexity for a solved problem?
>>
>> RAID is great at recovering data, but not at detecting errors. The
>> file system can detect errors with a checksum. What is missing is an
>> API between layers for the filesystem to say "this sector is bad, go
>> rebuild it".
>
> Actually we don't need a special API: the kernel should warn and
> recommend running fsck, which scans the whole tree and handles blocks
> with bad checksums.

Yes, but our fsck knows nothing about RAID currently, so...
Re: the 'official' point of view expressed by kernelnewbies.org
Edward Shishkin wrote:
> Tom Reinhart wrote:
>>> Anyone with serious need for data integrity already uses RAID, so
>>> why add brand new complexity for a solved problem?
>>
>> RAID is great at recovering data, but not at detecting errors. The
>> file system can detect errors with a checksum. What is missing is an
>> API between layers for the filesystem to say "this sector is bad, go
>> rebuild it".
>
> Actually we don't need a special API: the kernel should warn and
> recommend running fsck, which scans the whole tree and handles blocks
> with bad checksums.

What does this have to do with RAID, though?
Re: the 'official' point of view expressed by kernelnewbies.org
David Masover wrote:
> Edward Shishkin wrote:
>> Actually we don't need a special API: the kernel should warn and
>> recommend running fsck, which scans the whole tree and handles blocks
>> with bad checksums.
>
> What does this have to do with RAID, though?

I assumed we don't have raid: reiser4 can support its own checksums/ecc
signatures for (meta)data protection via the node plugin.
Re: the 'official' point of view expressed by kernelnewbies.org
Edward Shishkin wrote:
> David Masover wrote:
>> What does this have to do with RAID, though?
>
> I assumed we don't have raid: reiser4 can support its own
> checksums/ecc signatures for (meta)data protection via the node
> plugin.

We don't have a guaranteed raid; however, it would be nice to do the
right thing when there is raid.
Re: the 'official' point of view expressed by kernelnewbies.org
On 8/15/06, Edward Shishkin [EMAIL PROTECTED] wrote:
> checksumming is _not_ much easier than ecc-ing from an implementation
> standpoint; however, it would be nice if some part of the errors got
> fixed without the massive surgery performed by fsck

We need checksumming even with ecc-ing... ECC-ing on large spans of data
is too computationally costly to do unless we know something is wrong
(via a checksum).

Let's pause for a minute: when you talk about ECC, what are you actually
talking about? A Hamming code (used on RAM,
http://en.wikipedia.org/wiki/Hamming_code), a convolutional code (used on
telecom links, http://en.wikipedia.org/wiki/Convolutional_code), or an
erasure code like RS coding
(http://en.wikipedia.org/wiki/Reed-Solomon_code)?

I assume in these discussions that you're not talking about an RS-like
code... because RAID-5 and RAID-6 are, fundamentally, a form of RS
coding. They don't solve bit errors, but when you know you've lost a
block of data they can recover it.

Non-RS forms of ECC are very slow in software (especially decoding) and
really aren't that useful: most of the time HDDs will lose data in nice
big chunks that erasure codes handle well but other codes do not. The
challenge with erasure codes is that you must know that a block is bad...
most of the time the drive will tell you, but sometimes corruption leaks
through. This is where block-level checksums come into play: they allow
you to detect bad blocks, and then your erasure code allows you to
recover the data.

The checksum must be fast because you must perform it on every read from
disk... this makes ECC unsuitable, because although it could detect
errors, it is too slow. Also, the number of additional errors ECC could
fix is very small... it would simply be better to store more erasure code
blocks.

An optimal RS code which allows one block of N to fail (and requires one
extra block of storage) is computationally trivial. We call it RAID-5.
If your 'threat model' is bad sectors rather than bad disks (an
increasingly realistic shift), then N needs to have nothing to do with
the number of disks you have and can instead be related to how much
protection you want on a file. If 1:N isn't enough for you, RS can be
generalized to any number of redundant blocks. Unfortunately, doing so
requires modular arithmetic, which current CPUs are not too impressively
fast at. However, the Linux RAID-6 code demonstrates that two-part parity
can be done quite quickly in software.

As such, I think 'ecc' is useless... checksums are useful because they
are cheap and allow us to use cheap erasure coding (which could be in a
lower-level raid driver, or implemented in the FS) to achieve data
integrity.

The question of including error coding in the FS or in a lower level is,
as far as I'm concerned, so clear a matter that it is hardly worth
discussing anymore. In my view it is absolutely idiotic to place
redundancy in a lower level. The advantage of placing redundancy in a
lower level is code simplicity and sharing. The problems with doing so,
however, are manifold. The redundancy requirements for various parts of
the file system differ dramatically; without tight FS integration,
matching the need to the service is nearly impossible.

The most important reason, however, is performance. RAID-5 (and RAID-6)
suffer a tremendous performance hit because of the requirement to write a
full stripe OR execute a read-modify-write cycle. With FS-integrated
erasure codes it is possible to adjust the layout of the written blocks
so that every write is a full-stripe write; effectively, you adjust the
stripe width with every write to ensure that the write always spans all
the disks. Alternatively, you can reduce the number of stripe chunks
(i.e. the number of disks) in the parity computation to make the write
fit (although doing so wastes space)... FS redundancy integration also
solves the layout problem.

From my experience, most systems with hardware raid get far below optimal
performance because even when their FS is smart enough to do file
allocation in a raid-aware way (XFS, and to a lesser extent EXT2/3), this
is usually foiled by the partition table at the beginning of the raid
device, resulting in 1:N FS blocks actually spanning two disks! (Thus
reading such a block incurs potentially 2x disk latency.)

Separated FS and redundancy layers are an antiquated concept. The FS's
job is to provide reliable storage, full stop. It's shocking to see that
a dinosaur like SUN has figured this out but the free software community
still fights against it.
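The "one block of N may fail" parity scheme called trivial above (RAID-5,
the simplest RS code) can be sketched in a few lines. This illustrates the
principle only, not the Linux md implementation, and it shows why the
checksum matters: parity can only rebuild a block once you know which one
is bad:

```python
def parity(blocks):
    """XOR parity over equal-sized data blocks -- the trivial
    one-redundant-block erasure code the message calls RAID-5."""
    out = bytearray(len(blocks[0]))
    for block in blocks:
        for i, b in enumerate(block):
            out[i] ^= b
    return bytes(out)

def recover(surviving_blocks, parity_block):
    """Erasure decoding: given N-1 surviving blocks plus parity,
    reconstruct the single lost block. This only works if a checksum
    (or the drive itself) told us WHICH block was lost."""
    return parity(surviving_blocks + [parity_block])

stripe = [b"AAAA", b"BBBB", b"CCCC"]   # one stripe across 3 data disks
p = parity(stripe)

# Disk 1 dies (or its block fails a checksum): rebuild its block from
# the survivors and the parity block.
rebuilt = recover([stripe[0], stripe[2]], p)
assert rebuilt == stripe[1]
```

Note also the read-modify-write problem the message describes: updating
one block of the stripe in place requires reading the old block and
parity first, which is exactly the cost FS-integrated allocation avoids
by always writing full stripes.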
Re: the 'official' point of view expressed by kernelnewbies.org
>>> Anyone with serious need for data integrity already uses RAID, so
>>> why add brand new complexity for a solved problem?
>>
>> RAID is great at recovering data, but not at detecting errors. The
>> file system can detect errors with a checksum. What is missing is an
>> API between layers for the filesystem to say "this sector is bad, go
>> rebuild it".
>
> I agree that such an API is needed. I think there are a lot of systems
> on desktops that lack RAID, though. Probably I should leave ECC for
> some hopefully-next-year future release, though.

Of course, not everyone uses RAID. ECC would benefit some people in some
cases... no argument there. But as a businessman, you know about
targeting the right features to the right customers:

Customer 1 uses RAID. Obviously, reliability is very important to
customer 1; he is willing to take the extra expense to get it. Adding
another level of protection (checksumming/RAID restore) is a no-brainer,
especially since it adds very little overhead over what he already
sacrificed to RAID.

Customer 2 doesn't use RAID. You can add all the fancy features to the
filesystem you want; this customer is already vulnerable to total disk
loss. If he really cared about integrity, he would be customer 1. If he
won't pay for RAID, why would he pay for ECC (in money or disk-space
overhead)?

Having ECC without RAID recovery is simply targeting the wrong person.
(Having both wouldn't suck -- the more layers of protection, the better
-- although ECC would only be necessary against RAID failures, which
just adds more .9's to the reliability score. But you can also do this
by adding more redundancy disks to the array, so it's questionable
whether having both is even worth the development expense.)

_
FREE pop-up blocking with the new MSN Toolbar - get it now!
http://toolbar.msn.click-url.com/go/onm00200415ave/direct/01/
Re: the 'official' point of view expressed by kernelnewbies.org
From: Edward Shishkin [EMAIL PROTECTED]
> Actually we don't need a special API: the kernel should warn and
> recommend running fsck, which scans the whole tree and handles blocks
> with bad checksums.

Running fsck requires taking the filesystem offline and having downtime.
No fun. :( Correcting individual data errors can be done quickly and
on-line, as long as there exists a subset of the RAID that can
reconstruct the correct data (with the correct checksum).

_
Don't just search. Find. Check out the new MSN Search!
http://search.msn.click-url.com/go/onm00200636ave/direct/01/
Re: the 'official' point of view expressed by kernelnewbies.org
On 8/15/06, Tom Reinhart [EMAIL PROTECTED] wrote:
> Of course, not everyone uses RAID. ECC would benefit some people in
> some cases... no argument there.

We can use RAID mechanisms (an RS erasure code) on a single disk. You
could technically call it ECC, but if you do, you will confuse people.
"Block-level parity" would be correct.
Re: the 'official' point of view expressed by kernelnewbies.org
I am skeptical that bitflip errors above the storage layer are as common
as the ZFS authors say, and the statistics of theirs that I have seen
somehow lack a lot of detail about how they were gathered -- if, say, a
device with 100 errors counts as 100 instances for their statistics.
Well, it would be nice to know how they were gathered. Next time I meet
them I must ask.

That said, if users want it, there should be a plugin that checks the
bits. I agree that stripe awareness, and the need to signal the
underlying raid that a block needs to be recovered, is important.
Checksumming at the fs level seems like a reasonable plugin. I have no
opinion on the computational cost of ECC vs. checksums; I will trust that
you are correct.