Re: ChunkFS - measuring cross-chunk references
On 4/25/07, Suparna Bhattacharya <[EMAIL PROTECTED]> wrote:
> On Wed, Apr 25, 2007 at 05:50:55AM +0530, Karuna sagar K wrote:
>
> One more set of numbers to calculate would be an estimate of
> cross-references across chunks of block groups -- 1 (=128MB), 2 (=256MB),
> 4 (=512MB), 8 (=1GB), as suggested by Kalpak.

Here is the tool to make such calculations. Result of running the tool on
the / partition of an ext3 file system (each chunk is 4 times a block
group):

./cref.sh /dev/hda1 dmp /mnt/test 4
---
Number of files = 221763
Number of directories = 24456
Total size = 8193116 KB
Total data stored = 7179200 KB
Size of block groups = 131072 KB
Number of inodes per block group = 16288
Chunk size = 524288 KB
No. of cross references between directories and sub-directories = 869
No. of cross references between directories and files = 584
Total no. of cross references = 13806 (dir ref = 1453, file ref = 12353)
---

> Once we have that, it would be nice if we can get data on results with
> the tool from other people, especially with larger filesystem sizes.

Thanks,
Karuna

[Attachment: cref.tar.bz2 -- BZip2 compressed data]
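For readers trying the tool on their own file systems, here is a minimal
Python sketch (not taken from cref.sh; the function names are invented for
illustration) of how a groups-per-chunk factor like the last argument above
maps ext2/3 inodes and blocks to chunks:

# Hypothetical sketch: mapping ext2/3 inodes and blocks to chunks when a
# chunk is N block groups, as in "./cref.sh ... 4" above.

def block_group_of_inode(inode_no, inodes_per_group):
    # ext2/3 inode numbers start at 1
    return (inode_no - 1) // inodes_per_group

def block_group_of_block(block_no, blocks_per_group, first_data_block=1):
    # first_data_block is 1 on 1 KB-block file systems, 0 otherwise
    return (block_no - first_data_block) // blocks_per_group

def chunk_of_group(group_no, groups_per_chunk):
    # e.g. groups_per_chunk = 4 turns 128 MB block groups into 512 MB chunks
    return group_no // groups_per_chunk

With groups_per_chunk = 4 and the 131072 KB block groups shown above, this
gives the 524288 KB chunk size reported in the output.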
Re: Testing framework
On 4/23/07, Avishay Traeger <[EMAIL PROTECTED]> wrote:
> On Mon, 2007-04-23 at 02:16 +0530, Karuna sagar K wrote:
>
> You may want to check out the paper "EXPLODE: A Lightweight, General
> System for Finding Serious Storage System Errors" from OSDI 2006 (if you
> haven't already). The idea sounds very similar to me, although I haven't
> read all the details of your proposal.

EXPLODE is a more generic tool, i.e. it is used to find a larger set of
errors/bugs in file systems, whereas the test framework focuses on the
repair of file systems. The test framework is aimed at the repairability
of file systems: it does not use the model-checking concept, it uses a
replayable corruption mechanism, and it is a user-space implementation.
That is why it is not similar to EXPLODE.

> Avishay

Thanks,
Karuna
ChunkFS - measuring cross-chunk references
On 4/24/07, Theodore Tso <[EMAIL PROTECTED]> wrote:
> On Mon, Apr 23, 2007 at 02:53:33PM -0600, Andreas Dilger wrote:
>> It would also be good to distinguish between directories referencing
>> files in another chunk, and directories referencing subdirectories in
>> another chunk (which would be simpler to handle, given the topological
>> restrictions on directories, as compared to files and hard links).

Modified the tool to distinguish between
1. cross references between directories and files
2. cross references between directories and sub-directories
3. cross references within a file (due to huge file size)

Below is the result from the / partition of an ext3 file system:

Number of files = 221794
Number of directories = 24457
Total size = 8193116 KB
Total data stored = 7187392 KB
Size of block groups = 131072 KB
Number of inodes per block group = 16288
No. of cross references between directories and sub-directories = 7791
No. of cross references between directories and files = 657
Total no. of cross references = 62018 (dir ref = 8448, file ref = 53570)

Thanks for the suggestions.

> There may also be special things we will need to do to handle scenarios
> such as BackupPC, where if it looks like a directory contains a huge
> number of hard links to a particular chunk, we'll need to make sure that
> directory is either created in the right chunk (possibly with hints from
> the application) or migrated to the right chunk (but this might cause
> the inode number of the directory to change --- maybe we allow this as
> long as the directory has never been stat'ed, so that the inode number
> has never been observed).
>
> The other thing which we should consider is that chunkfs really requires
> a 64-bit inode number space, which means either we only allow it on
> 64-bit systems, or we need to consider a migration so that even on
> 32-bit platforms, stat() functions like stat64(), insofar that it uses a
> stat structure which returns a 64-bit ino_t.
>
> - Ted

Thanks,
Karuna

[Attachment: cref.tar.bz2 -- BZip2 compressed data]
Re: Testing framework
On 4/23/07, Kalpak Shah <[EMAIL PROTECTED]> wrote:
> On Mon, 2007-04-23 at 02:16 +0530, Karuna sagar K wrote:
>> The file system is looked upon as a set of blocks (more precisely
>> metadata blocks). We randomly choose from this set of blocks to corrupt.
>> Hence we would be able to overcome the deficiency of the previous
>> approach. However this approach makes it difficult to have a replayable
>> corruption. Further thought about this approach has to be given.
>
> Fill a test filesystem with data and save it. Corrupt it by copying a
> chunk of data from random locations A to B. Save positions A and B so
> that you can reproduce the corruption.

Hey, that's a nice idea :). But this wouldn't reproduce the same
corruption, right? Because, say, on the first run of the tool there is
metadata stored at locations A and B, and then on the second run there may
be user data present. I mean, the allocation may be different.

> Or corrupt random bits (ideally in metadata blocks) and maintain the
> list of the bit numbers for reproducing the corruption.
>
>> The corrupted file system is repaired and recovered with 'fsck' or any
>> other tools; this phase considers the repair and recovery action on the
>> file system as a black box. The time taken to repair by the tool is
>> measured.
>
> I see that you are running fsck just once on the test filesystem. It
> might be a good idea to run it twice; if the second fsck does not find
> the filesystem to be completely clean, that means it is a bug in fsck.

You are right. Will modify that.

>> State of the entire file system is stored, which may be huge,
>> time-consuming and not necessary. So, we could have better ways of
>> storing the state.
>
> Also, people may want to test with different mount options, so something
> like "mount -t $fstype -o loop,$MOUNT_OPTIONS $imgname $mountpt" may be
> useful. Similarly it may also be useful to have MKFS_OPTIONS while
> formatting the filesystem.

Right. I didn't think of that. Will look into it.

> Thanks,
> Kalpak
>
>> Comments are welcome!!
>>
>> Thanks,
>> Karuna

Thanks,
Karuna
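To illustrate Kalpak's suggestion, here is a small Python sketch (my own
illustration, not part of the framework; all names are invented) that
copies a block-sized chunk from a random offset A to a random offset B on
a saved image and logs the pairs so the same writes can be replayed later:

import json
import os
import random

def corrupt_copy(image_path, log_path, nruns=10, chunk=4096, seed=42):
    """Copy `chunk` bytes from random offset A to random offset B, nruns
    times, and record the (A, B) pairs for later replay."""
    rng = random.Random(seed)
    size = os.path.getsize(image_path)
    pairs = []
    with open(image_path, "r+b") as img:
        for _ in range(nruns):
            a = rng.randrange(0, size - chunk)
            b = rng.randrange(0, size - chunk)
            img.seek(a)
            data = img.read(chunk)
            img.seek(b)
            img.write(data)
            pairs.append((a, b))
    with open(log_path, "w") as log:
        json.dump(pairs, log)

def replay(image_path, log_path, chunk=4096):
    """Re-apply a recorded corruption log to a fresh copy of the saved image."""
    with open(log_path) as log:
        pairs = json.load(log)
    with open(image_path, "r+b") as img:
        for a, b in pairs:
            img.seek(a)
            data = img.read(chunk)
            img.seek(b)
            img.write(data)

As Karuna notes, the recorded (A, B) pairs are only meaningful against the
saved image they were generated from (or a byte-identical copy), since a
differently allocated file system would have other data at those offsets.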
Re: ChunkFS - measuring cross-chunk references
Hi,

The tool estimates the cross-chunk references from an ext2/3 file system.
It considers a block group as one chunk and calculates how many block
groups a file spans. So, the block group size gives the estimate of the
chunk size. The file systems were aged for about 3-4 months on a
developer's laptop. Should have given the background before.

Below is the explanation of the tool. Valh and others came up with this
idea.
---
Chunkfs will only work if we have "few" cross-chunk references. We can
estimate the effect of chunk size on the number of these references using
an existing ext2/3 file system and treating the block groups as though
they are chunks. The basic idea is that we figure out what the block group
boundaries are and then find out which files and directories span two or
more block groups.

Step 1:
---
Get a real-world ext2/3 file system. A file system which has been in use
is required. One from a laptop or a server of any sort will do fine.

Step 2:
---
Figure out where the block group boundaries are on disk. Two things are to
be known:
1. Which inode numbers are in which block group?
2. Which blocks are in which block group?
At the end of this step we should have a list that looks something like:
Block group 1: Inodes 11-343, blocks 1000-2
Block group 2: Inodes 344-576, blocks 2-4
[...]

Step 3:
---
For each file, get the inode number and use the mapping from step 2 to
figure out which block group it is in. Now use bmap() on each block in the
file and find out the block number. Use the mapping from step 2 to figure
out which block groups it has data in. For each file, record the list of
all block groups.

For each directory, get the inode number and map that to a block group.
Then get the inode numbers of all entries in the directory (ignore
symlinks) and map them to a block group. For each directory, record the
list of all block groups.

Step 4:
---
Count the number of cross-chunk references this file system would need.
This is done by going through each directory and file and adding up the
number of block groups it uses MINUS one. So if a file was in block groups
3, 7, and 24, then you would add 2 to the total number of cross-chunk
references. If a file was only in block group 2, then you would add 0 to
the total.

On 4/22/07, Amit Gud <[EMAIL PROTECTED]> wrote:
> Karuna sagar K wrote:
>> Hi,
>>
>> The attached code contains program to estimate the cross-chunk
>> references for ChunkFS file system (idea from Valh). Below are the
>> results:
>
> Nice to see some numbers! But would be really nice to know:
> - what the chunk size is
> - how the files were created or, more vaguely, how 'aged' the fs is
> - what is the chunk allocation algorithm
>
> Best,
> AG
>
> --
> May the source be with you.
> http://www.cis.ksu.edu/~gud

Thanks,
Karuna
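For what it's worth, here is a rough Python rendering of steps 2-4 above,
offered only as an illustration of the algorithm described in this mail and
not as the attached cref tool itself; the geometry parameters (block size,
blocks per group, inodes per group) are assumed to come from dumpe2fs, and
the FIBMAP ioctl used in place of an in-kernel bmap() needs root:

import fcntl
import os
import struct

FIBMAP = 1  # _IO(0x00, 1) from <linux/fs.h>; requires root

def groups_of_file(path, block_size, blocks_per_group, first_data_block=1):
    """Set of block groups holding this file's data blocks (step 3, bmap)."""
    groups = set()
    fd = os.open(path, os.O_RDONLY)
    try:
        nblocks = (os.fstat(fd).st_size + block_size - 1) // block_size
        for i in range(nblocks):
            buf = struct.pack("I", i)
            blk = struct.unpack("I", fcntl.ioctl(fd, FIBMAP, buf))[0]
            if blk:  # 0 means a hole, which occupies no block group
                groups.add((blk - first_data_block) // blocks_per_group)
    finally:
        os.close(fd)
    return groups

def count_cross_refs(mountpoint, block_size, blocks_per_group,
                     inodes_per_group):
    """Step 4: sum over files and directories of (block groups used - 1)."""
    def igroup(ino):
        return (ino - 1) // inodes_per_group  # step 2: inode -> block group

    cross = 0
    for root, dirs, files in os.walk(mountpoint):
        # Directory record: its own group plus the groups of its entries,
        # ignoring symlinks (step 3).
        dgroups = {igroup(os.lstat(root).st_ino)}
        for name in dirs + files:
            p = os.path.join(root, name)
            if not os.path.islink(p):
                dgroups.add(igroup(os.lstat(p).st_ino))
        cross += len(dgroups) - 1
        # File record: inode group plus the groups of its data blocks.
        for name in files:
            p = os.path.join(root, name)
            if os.path.islink(p) or not os.path.isfile(p):
                continue
            fgroups = groups_of_file(p, block_size, blocks_per_group)
            fgroups.add(igroup(os.lstat(p).st_ino))
            cross += len(fgroups) - 1
    return cross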
Testing framework
Hi,

For some time I have been working on this file system test framework. Now
I have an implementation for it; below is the explanation. Any comments
are welcome.

Introduction:

The testing tools and benchmarks available around do not take into account
the repair and recovery aspects of file systems. The test framework
described here focuses on the repair and recovery capabilities of file
systems. Since most file systems use 'fsck' to recover from file system
inconsistencies, the test framework characterizes file systems based on
the outcomes of running 'fsck'.

Overview:

The model can be described in brief as: prepare a file system, record the
state of the file system, corrupt it, use repair and recovery tools, and
finally compare and report the status of the recovered file system against
its initial state.

Prepare Phase:

This is the first phase in the model. Here we prepare a file system to
carry out the subsequent phases. A new file system image is created with
the specified name. The 'mkfs' program is run on this image and then the
file system is aged after populating it sufficiently. This state of the
file system is considered the ideal state.

Corruption Phase:

The file system prepared in the prepare phase is corrupted to simulate a
system crash or, in general, an inconsistency in the file system.
Obviously we are more interested in corrupting the metadata information. A
random corruption would provide us with results like those of fs_mutator
or fs_fuzzer. However, the corruption would then vary between test runs,
which makes a fair comparison between file systems tedious. So, we would
like to have a mechanism where the corruption is replayable, ensuring that
almost the same corruption is reproduced across test runs. The techniques
for corruption are:

Higher level perspective/approach: In this approach the file system is
viewed as a tree of nodes, where nodes are either files or directories.
The metadata information corresponding to some randomly chosen nodes of
the tree is corrupted. Nodes which are corrupted are marked or recorded so
that the corruption can be replayed later. This file system is called the
source file system, while the file system on which we need to replay the
corruption is called the target file system. The assumption is that the
target file system contains a set of files and directories which is a
superset of that in the source file system. Hence, to replay the
corruption, we point out which nodes were corrupted in the source file
system and corrupt the corresponding nodes in the target file system. A
major disadvantage of this approach is that on-disk structures (like
superblocks, block group descriptors, etc.) are not considered for
corruption.

Lower level perspective/approach: The file system is looked upon as a set
of blocks (more precisely, metadata blocks). We randomly choose from this
set of blocks to corrupt. Hence we would be able to overcome the
deficiency of the previous approach. However, this approach makes it
difficult to have a replayable corruption. Further thought has to be given
to this approach.

We could have a blend of both approaches in the program to compromise
between corruption coverage and replayability.

Repair Phase:

The corrupted file system is repaired and recovered with 'fsck' or any
other tools; this phase considers the repair and recovery action on the
file system as a black box. The time taken to repair by the tool is
measured.

Comparison Phase:

The current state of the file system is compared with the ideal state of
the file system. The metadata information of the file system is checked
against that of the ideal file system, and the outcome is noted to
summarize this test run. If the repair tool used is 100% effective, then
the current state of the file system should be exactly the same as the
ideal state. Simply checking for equality wouldn't be right because it
doesn't take care of lost and found files. Hence we need to check
node-by-node for each node in the ideal state of the file system.

State Record:

The comparison phase requires that the ideal state of the file system be
known. Replicating the whole file system would eat up a lot of disk space.
Storing the state of the file system in memory would be costly in the case
of huge file systems. So, we need to store the state of the file system on
disk such that it wouldn't take up a lot of disk space. We record the
metadata information and store it in a file. One approach is replicating
the metadata blocks of the source file system and storing the replica
blocks in a single file called the state file. Additional metadata such as
checksums of the data blocks can be stored in the same state file. However
this may store some unnecessary metadata information in the state file and
hence swell it up for huge source file systems. So, instead of storing the
metadata blocks themselves, we would summarize the information in them
before storing it in the state file.
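To make the phases above concrete, here is a minimal Python sketch of a
driver for the prepare and repair phases on a loopback image. It is my own
illustration rather than the posted implementation; it assumes ext2/3
(e2fsck), needs root for mount, and leaves the populate/age, corruption
and comparison steps as plug-in points:

import subprocess
import time

def run(cmd, check=True):
    # Small wrapper so each phase reads as a list of shell steps.
    return subprocess.run(cmd, shell=True, check=check)

def prepare(img, mnt, size_mb=512, fstype="ext3", mkfs_opts="",
            mount_opts=""):
    """Prepare phase: create, format, populate/age, record the ideal state."""
    run(f"dd if=/dev/zero of={img} bs=1M count={size_mb}")
    run(f"mkfs -t {fstype} {mkfs_opts} -F {img}")
    run(f"mount -t {fstype} -o loop,{mount_opts or 'defaults'} {img} {mnt}")
    # ... populate and age the file system here (omitted in this sketch) ...
    run(f"umount {mnt}")
    run(f"cp {img} {img}.ideal")  # crude state record: keep a full copy

def repair(img):
    """Repair phase: time fsck, then run it once more; if the second pass
    still finds problems, that points to a bug in fsck itself."""
    start = time.time()
    first = run(f"e2fsck -f -y {img}", check=False).returncode
    elapsed = time.time() - start
    second = run(f"e2fsck -f -y {img}", check=False).returncode
    return elapsed, first, second

The double fsck run and the mkfs_opts/mount_opts parameters mirror
Kalpak's suggestions earlier in the thread; the corruption and comparison
phases would plug in between prepare() and repair(), and a real state
record would summarize metadata instead of copying the whole image.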
ChunkFS - measuring cross-chunk references
Hi,

The attached code contains a program to estimate the cross-chunk
references for the ChunkFS file system (idea from Valh). Below are the
results:

test on ext3, / partition-1 on 27 March 2007
Number of files = 217806
Number of directories = 24295
Total size = 8193116 KB
Total data stored = 7557892 KB
Size of block groups = 131072 KB
Number of inodes per block group = 16288
Total no. of cross references = 60657
---
test on ext3, / partition-1 on 22 March 2007
Number of files = 230615
Number of directories = 24243
Total size = 8193116 KB
Total data stored = 7167212 KB
Size of block groups = 131072 KB
Number of inodes per block group = 16288
Total no. of cross references = 62163
---
test on ext3, / partition-2 on 22 March 2007
Number of files = 79509
Number of directories = 6397
Total size = 3076888 KB
Total data stored = 1685100 KB
Size of block groups = 131072 KB
Number of inodes per block group = 16032
Total no. of cross references = 17996
---
test on ext3, /home partition-3 on 20 April 2007
Number of files = 157632
Number of directories = 13652
Total size = 10233404 KB
Total data stored = 9490196 KB
Size of block groups = 131072 KB
Number of inodes per block group = 16224
Total no. of cross references = 27184
---

Comments??

Thanks,
Karuna

[Attachment: cref.tar.bz2 -- BZip2 compressed data]
Testing framework
Hi,

I am working on a testing framework for file systems, focusing on repair
and recovery areas. Right now, I have been timing fsck and trying to
determine the effectiveness of fsck. The idea that I have is below.

In abstract terms, I create a file system (ideal state), corrupt it, run
fsck on it and compare the recovered state with the ideal one. So the
whole process is divided into phases:

1. Prepare Phase - a new file system is created, populated and aged. This
state of the file system is consistent and is considered the ideal state.
This state is recorded for later comparisons (during the comparison
phase).

2. Corruption Phase - the file system on the disk is corrupted. We should
be able to reproduce such corruption on different test runs (probably not
very accurately). This way we would be able to compare the file systems in
a better way.

3. Repair Phase - fsck is run to repair and recover the disk to a
consistent state. The time taken by fsck is determined here.

4. Comparison Phase - the current state (recovered state) is compared (a
logical comparison) with the ideal state. The comparison tells what fsck
recovered, and how close the two states are.

Apart from this we would also require a component to record the state of
the file system. For this we construct a tree (which represents the state
of the file system) where each node stores some info on the files and
directories in the file system, and the tree structure records the
parent-child relationships among the files and directories.

Currently I am focusing on the corruption and comparison phases.

Corruption Phase:

Focus - the corruption should be reproducible. This gives us control over
the comparison between test runs and file systems. I am assuming here that
different test runs would deal with the same files. I have come across two
approaches here:

Approach 1 -
1. Among the files present on the disk, we randomly choose a few files.
2. For each such file, we then find the metadata block info (inode block).
3. We seek to such blocks and corrupt them (the corruption is done by
randomly writing data to the block, or some predetermined info).
4. Such files are noted so that similar corruption can be reproduced.

The above approach looks at the file system from a very abstract view. But
there may be file-system-specific data structures on the disk (e.g. group
descriptors in the case of ext2). These are not handled by this approach
directly. A sketch of this approach follows at the end of this mail.

Approach 2 - We pick up metadata blocks from the disk and form a logical
disk structure containing just these metadata blocks. The blocks on the
logical disk map onto the physical disk. We randomly pick blocks from this
and corrupt them.

Comparison Phase:

We determine the logical equivalence between the recovered and ideal
states of the disk, i.e. say a file was lost during corruption and fsck
recovered it and put it in the lost+found directory. We have to recognize
this as logically equivalent and not report that the file is lost.

Right now I have a basic implementation of the above with random
corruption (using fsfuzzer logic). Any suggestions?

Thanks,
Karuna
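As referenced above, here is a rough Python sketch of Approach 1 on an
ext2/3 image. It is only an illustration: the helper names and the use of
debugfs's imap command (from e2fsprogs) to locate inode blocks are my own
choices, not the posted implementation, and paths are assumed to be given
relative to the file system root inside the image:

import random
import re
import subprocess

def inode_location(image, path):
    """Ask debugfs where the on-disk inode for `path` lives.
    Returns (block number, byte offset within that block)."""
    out = subprocess.run(["debugfs", "-R", f"imap {path}", image],
                         capture_output=True, text=True).stdout
    m = re.search(r"located at block (\d+), offset 0x([0-9a-fA-F]+)", out)
    return int(m.group(1)), int(m.group(2), 16)

def corrupt_files(image, all_paths, count=5, block_size=4096, seed=1):
    """Approach 1: corrupt the inodes of `count` randomly chosen files and
    return the chosen paths so a similar corruption can be noted/replayed."""
    rng = random.Random(seed)
    chosen = rng.sample(all_paths, count)
    with open(image, "r+b") as img:
        for path in chosen:
            block, offset = inode_location(image, path)
            img.seek(block * block_size + offset)
            # scribble over part of the on-disk inode with random bytes
            img.write(bytes(rng.getrandbits(8) for _ in range(16)))
    return chosen

Recording the chosen paths (plus the seed) is what makes the run
repeatable, subject to the caveat discussed elsewhere in the thread that
the target image must have the same allocation as the source.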