Re: ChunkFS - measuring cross-chunk references

2007-05-19 Thread Karuna sagar K

On 4/25/07, Suparna Bhattacharya <[EMAIL PROTECTED]> wrote:

On Wed, Apr 25, 2007 at 05:50:55AM +0530, Karuna sagar K wrote:

One more set of numbers to calculate would be an estimate of cross-references
across chunks of block groups -- 1 (=128MB), 2 (=256MB), 4 (=512MB), 8(=1GB)
as suggested by Kalpak.



Here is the tool to make such calculations.

Result of running the tool on the / partition of an ext3 file system
(each chunk is 4 block groups, i.e. 512 MB):

./cref.sh /dev/hda1 dmp /mnt/test 4

---

Number of files = 221763

Number of directories = 24456

Total size = 8193116 KB

Total data stored = 7179200 KB

Size of block groups = 131072 KB

Number of inodes per block group = 16288

Chunk size = 524288 KB

No. of cross references between directories and sub-directories = 869

No. of cross references between directories and files = 584

Total no. of cross references = 13806 (dir ref = 1453, file ref = 12353)

---



Once we have that, it would be nice if we can get data on results with
the tool from other people, especially with larger filesystem sizes.




Thanks,
Karuna


cref.tar.bz2
Description: BZip2 compressed data


Re: Testing framework

2007-04-25 Thread Karuna sagar K

On 4/23/07, Avishay Traeger <[EMAIL PROTECTED]> wrote:

On Mon, 2007-04-23 at 02:16 +0530, Karuna sagar K wrote:


You may want to check out the paper "EXPLODE: A Lightweight, General
System for Finding Serious Storage System Errors" from OSDI 2006 (if you
haven't already).  The idea sounds very similar to me, although I
haven't read all the details of your proposal.


EXPLODE is a more generic tool, i.e. it is used to find a larger set
of errors/bugs in file systems, whereas the Test framework focuses
on the repair of file systems.

The Test framework is focused on the repairability of file systems:
it doesn't use the model-checking concept, it uses a replayable
corruption mechanism, and it is a user-space implementation. That's
why it is not similar to EXPLODE.



Avishay




Thanks,
Karuna


ChunkFS - measuring cross-chunk references

2007-04-24 Thread Karuna sagar K

On 4/24/07, Theodore Tso <[EMAIL PROTECTED]> wrote:

On Mon, Apr 23, 2007 at 02:53:33PM -0600, Andreas Dilger wrote:

.

It would also be good to distinguish between directories referencing
files in another chunk, and directories referencing subdirectories in
another chunk (which would be simpler to handle, given the topological
restrictions on directories, as compared to files and hard links).



Modified the tool to distinguish between
1. cross references between directories and files
2. cross references between directories and sub directories
3. cross references within a file (due to huge file size)

Below is the result from the / partition of an ext3 file system:

Number of files = 221794
Number of directories = 24457
Total size = 8193116 KB
Total data stored = 7187392 KB
Size of block groups = 131072 KB
Number of inodes per block group = 16288
No. of cross references between directories and sub-directories = 7791
No. of cross references between directories and files = 657
Total no. of cross references = 62018 (dir ref = 8448, file ref = 53570)

Thanks for the suggestions.


There may also be special things we will need to do to handle
scenarios such as BackupPC, where if it looks like a directory
contains a huge number of hard links to a particular chunk, we'll need
to make sure that directory is either created in the right chunk
(possibly with hints from the application) or migrated to the right
chunk (but this might cause the inode number of the directory to
change --- maybe we allow this as long as the directory has never been
stat'ed, so that the inode number has never been observed).

The other thing which we should consider is that chunkfs really
requires a 64-bit inode number space, which means either we only allow
it on 64-bit systems, or we need to consider a migration so that even
on 32-bit platforms, stat() functions like stat64(), insofar that it
uses a stat structure which returns a 64-bit ino_t.

   - Ted




Thanks,
Karuna


cref.tar.bz2
Description: BZip2 compressed data


Re: Testing framework

2007-04-23 Thread Karuna sagar K

On 4/23/07, Kalpak Shah <[EMAIL PROTECTED]> wrote:

On Mon, 2007-04-23 at 02:16 +0530, Karuna sagar K wrote:


.

The file system is looked upon as a set of blocks (more precisely
metadata blocks). We randomly choose from this set of blocks to
corrupt. Hence we would be able to overcome the deficiency of the
previous approach. However this approach makes it difficult to have a
replayable corruption. Further thought about this approach has to be
given.


Fill a test filesystem with data and save it. Corrupt it by copying a
chunk of data from random locations A to B. Save positions A and B so
that you can reproduce the corruption.



Hey, that's a nice idea :). But this wouldn't reproduce the same
corruption, right? Because, say, on the first run of the tool there is
metadata stored at locations A and B, and then on the second run there
may be user data present there. I mean, the allocation may be different.


Or corrupt random bits (ideally in metadata blocks) and maintain the
list of the bit numbers for reproducing the corruption.
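
A minimal sketch of the chunk-copy corruption suggested above
(hypothetical code, not part of the framework; the replay-log format is
an assumption): copy LEN bytes from offset A to offset B inside the
image and append (A, B, LEN) to a log so the same positions can be
replayed on another image.

/* corrupt_copy.c -- hypothetical sketch of replayable chunk-copy corruption */
#include <stdio.h>
#include <stdlib.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/types.h>

static int corrupt_copy(const char *image, const char *logfile,
                        off_t a, off_t b, size_t len)
{
    int fd = open(image, O_RDWR);
    if (fd < 0)
        return -1;

    char *buf = malloc(len);
    if (!buf) {
        close(fd);
        return -1;
    }

    /* Read the chunk at offset A and write it over the chunk at offset B. */
    if (pread(fd, buf, len, a) != (ssize_t)len ||
        pwrite(fd, buf, len, b) != (ssize_t)len) {
        free(buf);
        close(fd);
        return -1;
    }
    fsync(fd);
    free(buf);
    close(fd);

    /* Record (A, B, LEN) so the corruption can be replayed later. */
    FILE *log = fopen(logfile, "a");
    if (log) {
        fprintf(log, "%lld %lld %zu\n", (long long)a, (long long)b, len);
        fclose(log);
    }
    return 0;
}

Replaying is then just a matter of reading the log and calling the same
function on the target image; as noted above, the blocks at A and B may
hold different data there, so the replay is exact in position but not
necessarily in what it hits.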



.

The corrupted file system is repaired and recovered with 'fsck' or any
other tools; this phase considers the repair and recovery action on
the file system as a black box. The time taken to repair by the tool
is measured


I see that you are running fsck just once on the test filesystem. It
might be a good idea to run it twice and if second fsck does not find
the filesystem to be completely clean that means it is a bug in fsck.


You are right. Will modify that.
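
For reference, a minimal sketch of that double fsck pass (hypothetical;
the image name and e2fsck flags are assumptions, and fsck exit status 0
means the filesystem is clean):

/* fsck_twice.c -- hypothetical sketch of the repeated-fsck sanity check */
#include <stdio.h>
#include <stdlib.h>
#include <sys/wait.h>

int main(void)
{
    /* First pass: force a full check and fix everything it can. */
    system("fsck.ext3 -f -y test.img");

    /* Second pass: check only; a non-zero exit code means the repair
     * left the filesystem unclean, i.e. a likely bug in fsck. */
    int status = system("fsck.ext3 -f -n test.img");
    if (WIFEXITED(status) && WEXITSTATUS(status) != 0)
        fprintf(stderr, "fsck left the filesystem unclean: possible fsck bug\n");

    return 0;
}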







..

State of the either file system is stored, which may be huge, time
consuming and not necessary. So, we could have better ways of storing
the state.


Also, people may want to test with different mount options, so something
like "mount -t $fstype -o loop,$MOUNT_OPTIONS $imgname $mountpt" may be
useful. Similarly it may also be useful to have MKFS_OPTIONS while
formatting the filesystem.



Right. I didn't think of that. Will look into it.


Thanks,
Kalpak.

>
> Comments are welcome!!
>
> Thanks,
> Karuna




Thanks,
Karuna


Re: ChunkFS - measuring cross-chunk references

2007-04-23 Thread Karuna sagar K

Hi,

The tool estimates the cross-chunk references for an ext2/3 file
system. It considers a block group as one chunk and calculates how
many block groups a file spans. So, the block group size gives the
estimate of the chunk size.

The file systems were aged for about 3-4 months on a developer's laptop.

Should have given the background before. Below is the explanation of
the tool. Valh and others came up with this idea.

-
Chunkfs will only work if we have "few" cross-chunk references.  We
can estimate the effect of chunk size on the number of these
references using an existing ext2/3 file system and treating the block
groups as though they are chunks.  The basic idea is that we figure
out what the block group boundaries are and then find out which files
and directories span two or more block groups.

Step 1:
---

Get a real-world ext2/3 file system. A file system which has been in
use is required. One from a laptop or a server of any sort will do
fine.

Step 2:
---

Figure out where the block group boundaries are on disk. Two things
are to be known:

1. Which inode numbers are in which block group?
2. Which blocks are in which block group?

At the end of this step we should have a list that looks something like:

Block group 1: Inodes 11-343, blocks 1000-2
Block group 2: Inodes 344-576, blocks 2-4
[...]

Step 3:
---

For each file, get the inode number and use mapping from step 2 to
figure out which block group it is in.  Now use bmap() on each block
in the file, and find out the block number.  Use mapping from step 2
to figure out which block groups it has data in. For each file, record
the list of all block groups.

For each directory, get the inode number and map that to a block
group. Then get the inode numbers of all entries in the directory
(ignore symlinks) and map them to a block group.  For each directory,
record the list of all block groups.

Step 4:
---

Count the number of cross-chunk references this file system would
need.  This is done by going through each directory and file, and
adding up the number of block groups it uses MINUS one.  So if a file
was in block groups 3, 7, and 24, then you would add 2 to the total
number of cross-chunk references.  If a file was only in block group
2, then you would add 0 to the total.
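
To make Steps 2-4 concrete, here is a minimal user-space sketch (an
illustration under stated assumptions, not the attached cref tool
itself) that counts the extra block groups a single file touches, using
the FIBMAP ioctl for the block mapping. The per-filesystem constants
would normally be read from the superblock (e.g. via dumpe2fs); here
they are hard-coded assumptions, and FIBMAP requires root.

/* cross_refs.c -- hypothetical sketch, not the attached cref tool */
#include <stdio.h>
#include <stdlib.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/ioctl.h>
#include <sys/stat.h>
#include <linux/fs.h>                   /* FIBMAP, FIGETBSZ */

static long blocks_per_group = 32768;   /* assumed: 128 MB groups, 4 KB blocks */
static long inodes_per_group = 16288;   /* assumed: as reported in the results */

/* Return the file's contribution to the cross-reference count, i.e. the
 * number of block groups it touches minus one (Step 4). */
static int file_cross_refs(const char *path)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;

    struct stat st;
    int blksz;
    if (fstat(fd, &st) < 0 || ioctl(fd, FIGETBSZ, &blksz) < 0) {
        close(fd);
        return -1;
    }

    enum { MAX_GROUPS = 4096 };         /* plenty for a sketch */
    char seen[MAX_GROUPS] = { 0 };
    int groups = 0;

    /* Step 2: the inode number maps to the file's "home" block group. */
    long g = (st.st_ino - 1) / inodes_per_group;
    if (g < MAX_GROUPS) {
        seen[g] = 1;
        groups++;
    }

    /* Step 3: map each logical block to a physical block with FIBMAP,
     * then to a block group (s_first_data_block is ignored here). */
    long nblocks = (st.st_size + blksz - 1) / blksz;
    for (long i = 0; i < nblocks; i++) {
        int blk = i;                    /* logical block in, physical out */
        if (ioctl(fd, FIBMAP, &blk) < 0)
            break;
        if (blk == 0)                   /* hole: no block allocated */
            continue;
        g = blk / blocks_per_group;
        if (g < MAX_GROUPS && !seen[g]) {
            seen[g] = 1;
            groups++;
        }
    }

    close(fd);
    return groups - 1;                  /* Step 4: groups used minus one */
}

int main(int argc, char **argv)
{
    long total = 0;
    for (int i = 1; i < argc; i++) {
        int c = file_cross_refs(argv[i]);
        if (c > 0)
            total += c;
    }
    printf("Total no. of cross references = %ld\n", total);
    return 0;
}

Directory cross-references would be counted the same way: map the
directory's own inode to a group, map the inode number of each entry
to a group, and add the number of distinct groups minus one.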


On 4/22/07, Amit Gud <[EMAIL PROTECTED]> wrote:

Karuna sagar K wrote:
> Hi,
>
> The attached code contains program to estimate the cross-chunk
> references for ChunkFS file system (idea from Valh). Below are the
> results:
>

Nice to see some numbers! But would be really nice to know:

- what the chunk size is
- how the files were created or, more vaguely, how 'aged' the fs is
- what is the chunk allocation algorithm


Best,
AG
--
May the source be with you.
http://www.cis.ksu.edu/~gud





Thanks,
Karuna


Testing framework

2007-04-22 Thread Karuna sagar K

Hi,

For some time I had been working on this file system test framework.
Now I have an implementation of it, and below is the explanation.
Any comments are welcome.

Introduction:
The testing tools and benchmarks available today do not take into
account the repair and recovery aspects of file systems. The test
framework described here focuses on the repair and recovery
capabilities of file systems. Since most file systems use 'fsck' to
recover from inconsistencies, the test framework characterizes file
systems based on the outcome of running 'fsck'.

Overview:
The model can be described in brief as follows: prepare a file system,
record its state, corrupt it, run the repair and recovery tools, and
finally compare and report the status of the recovered file system
against its initial state.

Prepare Phase:
This is the first phase in the model. Here we prepare a file system to
carry out the subsequent phases. A new file system image is created
with the specified name, the 'mkfs' program is run on this image, and
the file system is aged after populating it sufficiently. This state
of the file system is considered the ideal state.

Corruption Phase:
The file system prepared in the prepare phase is corrupted to simulate
a system crash or, in general, an inconsistency in the file system.
Obviously we are more interested in corrupting the metadata
information. A random corruption would give results like those of
fs_mutator or fs_fuzzer. However, the corruption would then vary
between test runs, which would make comparisons between file systems
unfair and tedious. So, we would like to have a mechanism where the
corruption is replayable, ensuring that almost the same corruption is
reproduced across test runs. The techniques for corruption are:

Higher level perspective/approach:
In this approach the file system is viewed as a tree of nodes, where
nodes are either files or directories. The metadata information
corresponding to some randomly chosen nodes of the tree is corrupted.
Nodes which are corrupted are marked or recorded so that the
corruption can be replayed later. This file system is called the
source file system, while the file system on which we need to replay
the corruption is called the target file system. The assumption is
that the target file system contains a set of files and directories
which is a superset of that in the source file system. Hence, to
replay the corruption we need to point out which nodes were corrupted
in the source file system and corrupt the corresponding nodes in the
target file system.

A major disadvantage with this approach is that on-disk structures
(like superblocks, block group descriptors, etc.) are not considered
for corruption.

Lower level perspective/approach:
The file system is looked upon as a set of blocks (more precisely,
metadata blocks). We randomly choose blocks from this set to corrupt.
Hence we would be able to overcome the deficiency of the previous
approach. However, this approach makes it difficult to have replayable
corruption. Further thought has to be given to this approach.

We could have a blend of both approaches in the program, as a
compromise between corruption coverage and replayability.

Repair Phase:
The corrupted file system is repaired and recovered with 'fsck' or
any other tool; this phase treats the repair and recovery action on
the file system as a black box. The time taken by the repair tool is
measured.

Comparison Phase:
The current state of the file system is compared with the ideal state
of the file system. The metadata information of the file system is
checked against that of the ideal file system, and the outcome is
noted to summarize this test run. If the repair tool used is 100%
effective, then the current state of the file system should be exactly
the same as that of the ideal file system. Simply checking for
equality wouldn't be right because it doesn't take care of lost and
found files. Hence we need to check node by node for each node in the
ideal state of the file system.
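
A minimal sketch of such a node-by-node comparison (hypothetical record
layout, not the framework's actual state-file format): match files by
content checksum rather than by path, so a file that fsck moved into
lost+found is still recognized instead of being reported as lost.

/* compare_state.c -- hypothetical sketch of the logical comparison */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

struct frecord {
    char     path[4096];   /* where the file was seen              */
    long     size;         /* file size in bytes                   */
    unsigned checksum;     /* checksum over the file's data blocks */
};

static int by_checksum(const void *a, const void *b)
{
    const struct frecord *x = a, *y = b;
    return (x->checksum > y->checksum) - (x->checksum < y->checksum);
}

/* Report every ideal-state file that has no content match in the recovered
 * state; a match found under lost+found still counts as recovered. */
static void compare(struct frecord *ideal, size_t nideal,
                    struct frecord *recovered, size_t nrec)
{
    qsort(recovered, nrec, sizeof(*recovered), by_checksum);

    for (size_t i = 0; i < nideal; i++) {
        struct frecord *hit = bsearch(&ideal[i], recovered, nrec,
                                      sizeof(*recovered), by_checksum);
        if (!hit)
            printf("LOST: %s\n", ideal[i].path);
        else if (strcmp(hit->path, ideal[i].path) != 0)
            printf("MOVED: %s -> %s\n", ideal[i].path, hit->path);
    }
}

Duplicate or empty files would collide on the checksum alone, which is
why the node-by-node walk over the ideal tree, with size and link
information as tie-breakers, is still needed.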

State Record:
The comparison phase requires that the ideal state of the file system
be known. Replicating the whole file system would eat up a lot of disk
space, and storing the state of the file system in memory would be
costly for huge file systems. So, we need to store the state of the
file system on disk in a way that doesn't take up a lot of space. We
record the metadata information and store it in a file. One approach
is replicating the metadata blocks of the source file system and
storing the replica blocks in a single file called the state file.
Additional metadata, such as checksums of the data blocks, can be
stored in the same state file. However, this may store some
unnecessary metadata information in the state file and hence swell it
up for huge source file systems. So, instead of storing the metadata
blocks themselves, we would summarize the information in them before
storing it in the state 

ChunkFS - measuring cross-chunk references

2007-04-22 Thread Karuna sagar K

Hi,

The attached code contains a program to estimate the cross-chunk
references for the ChunkFS file system (idea from Valh). Below are the
results:

test on ext3, / partition-1 on 27 March 2007
Number of files = 217806
Number of directories = 24295
Total size = 8193116 KB
Total data stored = 7557892 KB
Size of block groups = 131072 KB
Number of inodes per block group = 16288
Total no. of cross references = 60657

test on ext3, / partition-1 on 22 March 2007
Number of files = 230615
Number of directories = 24243
Total size = 8193116 KB
Total data stored = 7167212 KB
Size of block groups = 131072 KB
Number of inodes per block group = 16288
Total no. of cross references = 62163

test on ext3, / partition-2 on 22 March 2007
Number of files = 79509
Number of directories = 6397
Total size = 3076888 KB
Total data stored = 1685100 KB
Size of block groups = 131072 KB
Number of inodes per block group = 16032
Total no. of cross references = 17996
---
test on ext3, /home partition-3 on 20 April 2007
Number of files = 157632
Number of directories = 13652
Total size = 10233404 KB
Total data stored = 9490196 KB
Size of block groups = 131072 KB
Number of inodes per block group = 16224
Total no. of cross references = 27184
---

Comments??

Thanks,
Karuna


cref.tar.bz2
Description: BZip2 compressed data


Testing framework

2006-12-28 Thread Karuna sagar k

Hi,

I am working on a testing framework for file systems focusing on
repair and recovery areas. Right now, I have been timing fsck and
trying to determine the effectiveness of fsck. The idea that I have is
below.

In abstract terms, I create a file system (ideal state), corrupt it,
run fsck on it and compare the recovered state with the ideal one. So
the whole process is divided into phases:

1. Prepare Phase - a new file system is created, populated and aged.
This state of the file system is consistent and is considered as ideal
state. This state is to be recorded for later comparisons (during the
comparison phase).

2. Corruption Phase - the file system on the disk is corrupted. We
should be able to reproduce such corruption on different test runs
(probably not very accurately). This way we would be able to compare
the file systems in a better way.

3. Repair Phase - fsck is run to repair and recover the disk to a
consistent state. The time taken by fsck is determined here.

4. Comparison Phase - the current state (recovered state) is compared
(a logical comparison) with the ideal state. The comparison tells what
fsck recovered, and how close are the two states.

Apart from this, we would also require a component to record the
state of the file system. For this we construct a tree (which
represents the state of the file system) where each node stores some
information about a file or directory, and the tree structure records
the parent-child relationships among the files and directories.
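
A minimal sketch of such a tree node (the field names are illustrative
assumptions, not the actual implementation):

/* fs_node.h -- hypothetical state-recording tree node */
#include <sys/types.h>

struct fs_node {
    char            name[256];     /* entry name                        */
    ino_t           ino;           /* inode number in the ideal state   */
    mode_t          mode;          /* file type and permissions         */
    off_t           size;          /* size in bytes                     */
    nlink_t         nlink;         /* hard link count                   */
    unsigned int    data_checksum; /* checksum over the file's data     */
    struct fs_node *parent;        /* parent directory                  */
    struct fs_node *children;      /* first child (directories only)    */
    struct fs_node *next_sibling;  /* next entry in the same directory  */
};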

Currently I am focusing on the corruption and comparison phases:

Corruption Phase:
Focus - the corruption should be reproducible. This gives the control
over the comparison between test runs and file systems. I am assuming
here that different test runs would deal with same files.

I have come across two approaches here:

Approach 1 -
1. Among the files present on the disk, we randomly choose a few files.
2. For each such file, we then find the metadata block info (inode block).
3. We seek to such blocks and corrupt them (the corruption is done by
randomly writing data to the block, or some predetermined info).
4. Such files are noted so that similar corruption can be reproduced.

The above approach looks at the file system from a very abstract
view, but there may be file-system-specific data structures on the
disk (e.g. group descriptors in the case of ext2) which are not
handled directly by this approach.

Approach 2 - We pick up meta data blocks from the disk, and form a
logical disk structure containing just these meta data blocks. The
blocks on the logical disk map onto the physical disk. We pick up
randomly blocks from this and corrupt them.

Comparison Phase:
We determine the logical equivalence between the recovered and ideal
states of the disk. For example, if a file was lost during corruption
and fsck recovered it and put it in the lost+found directory, we have
to recognize this as logically equivalent and not report that the file
is lost.

Right now I have a basic implementation of the above with random
corruption (using fsfuzzer logic).

Any suggestions?

Thanks,
Karuna

