Re: some testing questions

2006-08-15 Thread Ingo Bormuth
On 2006-08-14 16:15, Vladimir V. Saveliev wrote:
 reiser4progs includes a program measurefs.reiser4. It should be able to
 measure tree fragmentation. I am not sure how the portage tree evolves, but
 maybe it could be interesting to see how reiser4 tree fragmentation
 changes when the filesystem is loaded regularly.

This is a reiser4 partition holding the following:

  - portage tree (synced every three days)
  - ccache (compiler cache allowed to grow to 3GB - recently cleared)
  - firefox's and opera's cache
  - /tmp (portage builds everything in here)

The filesystem was created around 1.5 years ago (as far as I can tell).

#cat /proc/version
Linux version 2.6.17.8-reiser4-r3 ([EMAIL PROTECTED]) (gcc version 3.4.6 
(Gentoo 3.4.6-r1, ssp-3.4.5-1.0, pie-8.7.9)) #2 Sat Aug 12 12:03:25 CEST 2006

#df:
/dev/hda8  6357768   3478716   2879052  55% /cache

#cat /etc/fstab:
/dev/hda8  /cache  reiser4  
noatime,nodiratime,nodev,nosuid,tmgr.atom_max_age=50  0 0

#measurefs.reiser4 -S:
measurefs.reiser4 1.0.5
Copyright (C) 2001, 2002, 2003, 2004 by Hans Reiser, licensing governed by
reiser4progs/COPYING.

Tree statistics ... done
Packing statistics:
  Formatted nodes:3622.85b (88.45%)
  Branch nodes:   2792.00b (68.16%)
  Twig nodes: 3233.75b (78.95%)
  Leaf nodes: 3966.47b (96.84%)

Node statistics:
  Total nodes: 871653
  Formatted nodes:  75571
  Unformatted nodes:   796082
  Branch nodes:23
  Twig nodes:1360
  Leaf nodes:  870270

Item statistics:
  Total items: 542211
  Nodeptr items:75570
  Statdata items: 214695
  Direntry items:   37432
  Tail items:  207819
  Extent items:  6695


Tree fragmentation: 0.074648

Data fragmentation: 0.039962

Last week I recompiled gcc and afterwards cleared 3GB of ccache data. 
Before doing so, the partition was 90% full. My feeling is that now 
that it's half empty, performance is much better. Emerge sync used to 
take _ages_ rebuilding its cache and is now quite fast. Also, CPU usage 
during compilation seems much lower. I can't remember ever hearing the 
CPU fan running during recent compilations (700 MHz PIII). Before clearing 
the cache it ran continuously and the machine still felt hot.

I know none of this is hard data. If you are interested in a follow up,
just let me know.


BTW: Is it safe to run measurefs.reiser4 -S -T -D on a mounted fs?

-- 
Ingo Bormuth, voicebox  telefax: +49-12125-10226517   '(~o-o~)'
public key 86326EC9, http://ibormuth.efil.de/contact   --ooO--(.)--Ooo--



Re: some testing questions

2006-08-15 Thread Hans Reiser
Ingo Bormuth wrote:


 #df:
 /dev/hda8  6357768   3478716   2879052  55% /cache
  
 Before doing so, the partition was 90% full. 
The performance difference between 90% full and 55% full will be large
on every filesystem.  When we ship a repacker, that will be less true,
because we will have large chunks of unused space after the repacker runs.

Oddly enough, I don't know the statistics for reiser* filesystems, but I
know that for FFS you should not let it become more than 85% full before
buying a new disk (or cleaning your home directory) if you want good
performance.


Re: the 'official' point of view expressed by kernelnewbies.org

2006-08-15 Thread Tom Reinhart
Anyone with serious need for data integrity already uses RAID, so why add 
brand new complexity for a solved problem?


RAID is great at recovering data, but not at detecting errors.  A file 
system can detect errors with checksums.  What is missing is an API between 
layers for the filesystem to say this sector is bad, go rebuild it.


This seems like a much simpler and more useful thing than adding ECC into 
the filesystem itself.




How about we switch to ECC, which would help with bit rot, not sector
loss?



Interesting aspect.

Yes, we can implement ECC as a special crypto transform that inflates
data. As I mentioned earlier, it is possible via translation of key
offsets with a scale factor > 1.

Of course, it is better than nothing, but the meta-data anyway remains
ECC-unprotected, and, hence, robustness is not increased.
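
A minimal sketch of the key-offset translation being described, assuming
an inflating transform that stores k parity bytes for every n data bytes
(the helper name and layout are illustrative, not reiser4's actual
key-translation code):

#include <stdint.h>

/* Every n logical bytes occupy n + k bytes on disk, i.e. the scale
 * factor is (n + k) / n > 1.  Hypothetical helper, for illustration. */
static uint64_t logical_to_disk(uint64_t logical_off, uint64_t n, uint64_t k)
{
        return (logical_off / n) * (n + k) + (logical_off % n);
}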







Re: some testing questions

2006-08-15 Thread David Masover

Hans Reiser wrote:

Ingo Bormuth wrote:


#df:
/dev/hda8  6357768   3478716   2879052  55% /cache
 
Before doing so, the partition was 90% full. 

The performance difference between 90% full and 55% full will be large
on every filesystem.  When we ship a repacker, that will be less true,
because we will have large chunks of unused space after the repacker runs.


Not always true.  For one, doesn't Reiser4 arbitrarily reserve 5%?  For 
another, look at his results -- unless I'm wrong, that's 3-7% 
fragmentation.  If I'm wrong, it's more like 0.03-0.07%.


And lastly, at a certain point, percentages aren't really that accurate. 
 I've got a 350 or 400 gig partition which is 95% full according to df 
(and if I was right about that 5%, it's more like 90% full), and that 
still leaves a solid 10-20 gigs free.


I mean, yes, performance will eventually start to suffer, but how much 
time and activity will it take to fragment 20 gigs of free space, 
especially with lazy allocation?


Re: the 'official' point of view expressed by kernelnewbies.org

2006-08-15 Thread Hans Reiser
Tom Reinhart wrote:
 Anyone with serious need for data integrity already uses RAID, so why
 add brand new complexity for a solved problem?

 RAID is great at recovering data, but not at detecting errors.  A file
 system can detect errors with checksums.  What is missing is an API
 between layers for the filesystem to say this sector is bad, go rebuild it.
I agree that such an API is needed.  I think there are a lot of systems
on desktops that lack RAID, though.  Probably I should leave ECC for
some future release, hopefully next year.

 This seems like a much simpler and more useful thing than adding ECC
 into the filesystem itself.


 How about we switch to ECC, which would help with bit rot, not sector
 loss?


 Interesting aspect.

 Yes, we can implement ECC as a special crypto transform that inflates
 data. As I mentioned earlier, it is possible via translation of key
 offsets with a scale factor > 1.

 Of course, it is better than nothing, but the meta-data anyway remains
 ECC-unprotected, and, hence, robustness is not increased.








Re: some testing questions

2006-08-15 Thread Hans Reiser
David Masover wrote:
  that's 3-7% fragmentation.  
which is high enough to hurt performance: 50 MB/s * 0.01 s = 500 KB, which
is the amount of transfer a seek costs.  He needs a repacker.  After we
resolve code review issues from akpm.
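
For a rough sense of what that costs, a back-of-the-envelope model (the
50 MB/s and 10 ms figures are from above; treating the fragmentation
figure as extra seeks per block read is an assumption of this sketch):

#include <stdio.h>

int main(void)
{
        double bw   = 50e6;  /* sequential bandwidth, bytes/s */
        double seek = 0.01;  /* average seek time, seconds */
        double frag = 0.07;  /* assumed extra seeks per block read */
        double blk  = 4096;  /* block size, bytes */

        /* a seek costs as much time as transferring bw * seek bytes */
        printf("one seek == %.0f KB of transfer\n", bw * seek / 1024);

        /* effective throughput if a fraction `frag` of block reads
         * incurs a full seek */
        double t = blk / bw + frag * seek;
        printf("effective throughput: %.1f MB/s\n", blk / t / 1e6);
        return 0;
}

Under these assumptions the effective sequential throughput drops to a
few MB/s, which is why even single-digit fragmentation percentages hurt.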


Re: the 'official' point of view expressed by kernelnewbies.org

2006-08-15 Thread Edward Shishkin

Tom Reinhart wrote:
Anyone with serious need for data integrity already uses RAID, so why 
add brand new complexity for a solved problem?


RAID is great at recovering data, but not at detecting errors.  A file 
system can detect errors with checksums.  What is missing is an API between 
layers for the filesystem to say this sector is bad, go rebuild it.




Actually we don't need a special API: the kernel should warn and recommend
running fsck, which scans the whole tree and handles blocks with bad
checksums.

This seems like a much simpler and more useful thing than adding ECC into 
the filesystem itself.


Checksumming is _not_ much easier than ECC-ing from an implementation
standpoint; however, it would be nice if some part of the errors got
fixed without massive surgery performed by fsck.






How about we switch to ECC, which would help with bit rot, not sector
loss?




Interesting aspect.

Yes, we can implement ECC as a special crypto transform that inflates
data. As I mentioned earlier, it is possible via translation of key
offsets with a scale factor > 1.

Of course, it is better than nothing, but the meta-data anyway remains
ECC-unprotected, and, hence, robustness is not increased.











Re: the 'official' point of view expressed by kernelnewbies.org

2006-08-15 Thread Hans Reiser
Edward Shishkin wrote:
 Tom Reinhart wrote:
 Anyone with serious need for data integrity already uses RAID, so why
 add brand new complexity for a solved problem?

 RAID is great at recovering data, but not at detecting errors.  A file
 system can detect errors with checksums.  What is missing is an API
 between layers for the filesystem to say this sector is bad, go
 rebuild it.


 Actually we don't need a special API: the kernel should warn and recommend
 running fsck, which scans the whole tree and handles blocks with bad
 checksums.
Yes, but our fsck currently knows nothing about RAID, so...

 This seems like a much simpler and more useful thing than adding ECC
 into the filesystem itself.

 Checksumming is _not_ much easier than ECC-ing from an implementation
 standpoint; however, it would be nice if some part of the errors got
 fixed without massive surgery performed by fsck.




 How about we switch to ECC, which would help with bit rot, not sector
 loss?



 Interesting aspect.

 Yes, we can implement ECC as a special crypto transform that inflates
 data. As I mentioned earlier, it is possible via translation of key
 offsets with a scale factor > 1.

 Of course, it is better than nothing, but the meta-data anyway remains
 ECC-unprotected, and, hence, robustness is not increased.











Re: the 'official' point of view expressed by kernelnewbies.org

2006-08-15 Thread David Masover

Edward Shishkin wrote:

Tom Reinhart wrote:
Anyone with serious need for data integrity already uses RAID, so why 
add brand new complexity for a solved problem?


RAID is great at recovering data, but not at detecting errors.  A file 
system can detect errors with checksums.  What is missing is an API between 
layers for the filesystem to say this sector is bad, go rebuild it.




Actually we don't need a special API: the kernel should warn and recommend
running fsck, which scans the whole tree and handles blocks with bad
checksums.


What does this have to do with RAID, though?


Re: the 'official' point of view expressed by kernelnewbies.org

2006-08-15 Thread Edward Shishkin

David Masover wrote:

Edward Shishkin wrote:


Tom Reinhart wrote:

Anyone with serious need for data integrity already uses RAID, so why 
add brand new complexity for a solved problem?


RAID is great at recovering data, but not at detecting errors.  A file 
system can detect errors with checksums.  What is missing is an API 
between layers for the filesystem to say this sector is bad, go rebuild 
it.




Actually we don't need a special API: the kernel should warn and recommend
running fsck, which scans the whole tree and handles blocks with bad
checksums.



What does this have to do with RAID, though?




I assumed we don't have RAID: reiser4 can support its own checksum/ECC
signatures for (meta)data protection via a node plugin.


Re: the 'official' point of view expressed by kernelnewbies.org

2006-08-15 Thread David Masover

Edward Shishkin wrote:

David Masover wrote:

Edward Shishkin wrote:


Tom Reinhart wrote:

Anyone with serious need for data integrity already uses RAID, so 
why add brand new complexity for a solved problem?


RAID is great at recovering data, but not at detecting errors.  A file 
system can detect errors with checksums.  What is missing is an API 
between layers for the filesystem to say this sector is bad, go rebuild 
it.




Actually we don't need a special API: the kernel should warn and recommend
running fsck, which scans the whole tree and handles blocks with bad
checksums.



What does this have to do with RAID, though?




I assumed we don't have RAID: reiser4 can support its own checksum/ECC
signatures for (meta)data protection via a node plugin.


We don't have RAID guaranteed; however, it would be nice to do the 
right thing when there is RAID.


Re: the 'official' point of view expressed by kernelnewbies.org

2006-08-15 Thread Gregory Maxwell

On 8/15/06, Edward Shishkin [EMAIL PROTECTED] wrote:

Checksumming is _not_ much easier than ECC-ing from an implementation
standpoint; however, it would be nice if some part of the errors got
fixed without massive surgery performed by fsck.


We need checksumming even with ECC-ing... ECC-ing on large spans of data
is too computationally costly to do unless we know something is wrong
(via a checksum).

Let's pause for a minute: when you talk about ECC, what are you actually
talking about?  A Hamming code (used on RAM,
http://en.wikipedia.org/wiki/Hamming_code), a convolutional code (used
on telecom links, http://en.wikipedia.org/wiki/Convolutional_code), or
an erasure code like RS coding
(http://en.wikipedia.org/wiki/Reed-Solomon_code)?

I assume in this discussion that you're not talking about an RS-like
code... because RAID-5 and RAID-6 are, fundamentally, a form of RS
coding. They don't solve bit errors, but when you know you've lost a
block of data they can recover it.

Non-RS forms of ECC are very slow in software (especially decoding) and
really aren't that useful: most of the time HDDs will lose data in
nice big chunks that erasure codes handle well but other codes do not.

The challenge with erasure codes is that you must know that a block is
bad... most of the time the drives will tell you, but sometimes
corruption leaks through. This is where block-level checksums come
into play... they allow you to detect bad blocks, and then your erasure
code allows you to recover the data.   The checksum must be fast
because you must perform it on every read from disk... this makes ECC
unsuitable: although it could detect errors, it is too slow.
Also, the number of additional errors ECC could fix is very small;
it would simply be better to store more erasure-code blocks.
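
A minimal sketch of that read path, using zlib's crc32() as the cheap
per-block checksum (the block layout and the rebuild_block() hook are
illustrative assumptions, not any existing filesystem's API):

#include <stdint.h>
#include <zlib.h>                    /* crc32() */

#define BLK_DATA 4092                /* 4096-byte block, last 4 = checksum */

struct block {
        unsigned char data[BLK_DATA];
        uint32_t      csum;          /* crc32 of data[] */
};

/* provided by the erasure-code layer (lower level or FS-integrated) */
extern int rebuild_block(uint64_t blkno, struct block *b);

int read_block_checked(uint64_t blkno, struct block *b)
{
        if (crc32(0L, b->data, BLK_DATA) == b->csum)
                return 0;            /* block is good */
        /* bad checksum: "this sector is bad, go rebuild it" */
        return rebuild_block(blkno, b);
}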

An optimal RS code which allows one block of N to fail (and requires
one extra block of storage) is computationally trivial: we call it
RAID-5.  If your 'threat model' is bad sectors rather than bad disks
(an increasingly realistic shift), then N need have nothing to do
with the number of disks you have and can instead be related to how
much protection you want on a file.
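
The 1-of-N case really is just XOR; a sketch of the parity computation
RAID-5 uses (recovering a lost block is the same loop run over the
surviving blocks plus the parity block):

#include <stddef.h>

#define BLK 4096

void xor_parity(unsigned char *out, unsigned char *const blocks[], size_t n)
{
        size_t i, j;

        for (j = 0; j < BLK; j++)
                out[j] = 0;
        for (i = 0; i < n; i++)
                for (j = 0; j < BLK; j++)
                        out[j] ^= blocks[i][j];  /* parity = XOR of all */
}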

If 1:N isn't enough for you, RS can be generalized to any number of
redundant blocks. Unfortunately, doing so requires modular arithmetic
which current CPUs are not too impressively fast at. However, the
Linux RAID-6 code demonstrates that two-part parity can be done quite
quickly in software.
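
The "modular arithmetic" here is multiplication in GF(2^8).  A
straightforward bitwise version, using the same reduction polynomial
(0x11d) as the Linux RAID-6 code -- deliberately table-free, which is
exactly why it is slow:

#include <stdint.h>

static uint8_t gf256_mul(uint8_t a, uint8_t b)
{
        uint8_t p = 0;

        while (b) {
                if (b & 1)
                        p ^= a;      /* add (XOR) a into the product */
                b >>= 1;
                /* multiply a by x, reducing mod x^8+x^4+x^3+x^2+1 */
                a = (uint8_t)((a << 1) ^ ((a & 0x80) ? 0x1d : 0));
        }
        return p;
}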

As such, I think 'ECC' is useless... checksums are useful because they
are cheap and allow us to use cheap erasure coding (which could be in
a lower-level raid driver, or implemented in the FS) to achieve data
integrity.

The question of including error coding in the FS or in a lower level
is, as far as I'm concerned, so clear a matter that it is hardly worth
discussing anymore.  In my view it is absolutely idiotic to place
redundancy in a lower level.

The advantage of placing redundancy in a lower level is code
simplicity and sharing.

The problems with doing so, however, are manifold.

The redundancy requirements for various parts of the file system
differ dramatically; without tight FS integration, matching the need to
the service is nearly impossible.

The most important reason, however, is performance.  RAID-5 (and
RAID-6) suffer a tremendous performance hit because of the requirement
to write a full stripe OR execute a read-modify-write cycle.  With
FS-integrated erasure codes it is possible to adjust the layout of the
written blocks to ensure that every write is a full-stripe write;
effectively, you adjust the stripe width with every write to ensure
that the write always spans all the disks.  Alternatively, you can
reduce the number of stripe chunks (i.e. the number of disks) in the
parity computation to make the write fit (although doing so wastes
space), as sketched below...
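
A sketch of that second option -- shrinking the parity group so each
write is a whole number of full stripes (purely illustrative arithmetic,
not code from any real filesystem):

/* Pick a data width w <= ndisks - 1 that divides the write evenly;
 * each stripe is then w data blocks + 1 parity block, so no
 * read-modify-write cycle is ever needed. */
static unsigned stripe_width(unsigned nblocks, unsigned ndisks)
{
        unsigned w;

        for (w = ndisks - 1; w > 1; w--)
                if (nblocks % w == 0)
                        return w + 1;   /* +1 for the parity block */
        return 2;                       /* degenerate case: mirroring */
}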

FS redundancy integration also solves the layout problem. From my
experience, most systems with hardware RAID are getting far below
optimal performance because, even when the FS is smart enough to do
file allocation in a RAID-aware way (XFS and, to a lesser extent,
EXT2/3), this is usually foiled by the partition table at the beginning
of the RAID device, resulting in 1:N FS blocks actually spanning two
disks! (thus reading such a block incurs potentially 2x disk latency).
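
That misalignment is easy to demonstrate; a sketch (the sector counts
are illustrative -- a 63-sector partition offset, 64 KB chunks, 4 KB
blocks -- and the 1-in-16 result matches the 1:N spillover above):

#include <stdio.h>

int main(void)
{
        unsigned long start = 63;   /* classic DOS partition offset, sectors */
        unsigned long chunk = 128;  /* 64 KB raid chunk = 128 sectors */
        unsigned long fsblk = 8;    /* 4 KB fs block = 8 sectors */
        unsigned long i, bad = 0, total = 256;

        for (i = 0; i < total; i++) {
                unsigned long s = start + i * fsblk;
                /* does the block cross a chunk (disk) boundary? */
                if (s / chunk != (s + fsblk - 1) / chunk)
                        bad++;
        }
        printf("%lu of %lu blocks span two disks\n", bad, total);
        return 0;
}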

Separated FS and redundancy layers are an antiquated concept.  The
FS's job is to provide reliable storage, full stop.  It's shocking to
see that a dinosaur like Sun has figured this out while the free
software community still fights against it.


Re: the 'official' point of view expressed by kernelnewbies.org

2006-08-15 Thread Tom Reinhart



Anyone with serious need for data integrity already uses RAID, so why
add brand new complexity for a solved problem?

RAID is great at recovering data, but not at detecting errors.  A file
system can detect errors with checksums.  What is missing is an API
between layers for the filesystem to say this sector is bad, go rebuild it.



I agree that such an API is needed.  I think there are a lot of systems
on desktops that lack RAID, though.  Probably I should leave ECC for
some future release, hopefully next year.


Of course, not everyone uses RAID.  ECC would benefit some people in some 
cases... no argument there.


But as a businessman, you know about targeting the right features to the 
right customers:


Customer 1 uses RAID.  Obviously, reliability is very important to customer 
1; he is willing to take the extra expense to get it.  Adding another level 
of protection (checksumming/RAID restore) is a no-brainer, especially since 
it adds very little overhead over what he has already sacrificed to RAID.


Customer 2 doesn't use RAID.  You can add all the fancy features to the 
filesystem you want; this customer is already vulnerable to total disk loss. 
 If he really cared about integrity, he would be customer 1.  If he won't 
pay for RAID, why would he pay for ECC (in money or disk-space overhead)?


Having ECC without RAID recovery is simply targeting the wrong person.

(Having both wouldn't suck.  The more layers of protection, the better, 
although ECC would only be necessary against RAID failures, which just adds 
more .9's to the reliability score.  But you can also do this by adding 
more redundancy disks to the array, so it's questionable whether having both 
is even worth the development expense.)






Re: the 'official' point of view expressed by kernelnewbies.org

2006-08-15 Thread Tom Reinhart



From: Edward Shishkin [EMAIL PROTECTED]



Actually we don't need a special API: the kernel should warn and recommend
running fsck, which scans the whole tree and handles blocks with bad
checksums.


Running fsck requires taking the filesystem offline and having downtime.  No 
fun.  :(


Correcting individual data errors can be done quickly and on-line as long as 
there exists a subset of the RAID that can reconstruct the correct data 
(with the correct checksum).






Re: the 'official' point of view expressed by kernelnewbies.org

2006-08-15 Thread Gregory Maxwell

On 8/15/06, Tom Reinhart [EMAIL PROTECTED] wrote:

Of course, not everyone uses RAID.  ECC would benefit some people in some
cases... no argument there.


We can use RAID mechanisms (an RS erasure code) on a single disk. You
could technically call it ECC, but if you do so you will confuse
people.  'Block-level parity' would be the correct term.


Re: the 'official' point of view expressed by kernelnewbies.org

2006-08-15 Thread Hans Reiser
I am skeptical that bit-flip errors above the storage layer are as common
as the ZFS authors say, and the statistics of theirs that I have seen
somehow lack a lot of detail about how they were gathered.  Does a device
with 100 errors count as 100 instances in their statistics?  It would be
nice to know how they were gathered; next time I meet them
I must ask.

That said, if users want it, there should be a plugin that checks the bits.

I agree that stripe awareness, and the ability to signal the underlying
RAID that a block needs to be recovered, are important.  Checksumming at
the FS level seems like a reasonable plugin.

I have no opinion on the computational cost of ECC vs. checksums, I will
trust that you are correct.