Hubert Chan wrote:
>
> This may be a bit of an unusual request but...
>
> I'm just starting my Master's degree in Computer Science (actually a
> month ago), and I'm looking around for possible thesis topics (or maybe
> for my Ph.D. too). I thought that maybe the ReiserFS crowd may have run
> into some interesting questions/topics that they wouldn't mind
> offloading onto a graduate student. ;-)
>
> I'm interested in the more theoretical (or as my advisor says, the
> impractical) side of computing -- mostly algorithms, and in particular
> graph and/or combinatorial algorithms. So whatever I work on may not be
> immediately applicable to ReiserFS (or maybe it will).
>
> I'm impressed with ReiserFS (especially the features that aren't there
> yet :-) ), and I'd like to help out. Since I don't know much about
> programming filesystems, I thought this might be one way that I could
> help.
>
> (I'm at the University of Waterloo (Canada), in case anyone was
> wondering.)
>
> --
> Hubert Chan <[EMAIL PROTECTED]> - http://www.geocities.com/hubertchan/
> PGP/GnuPG key: 1024D/71FDA37F
> Fingerprint: 6CC5 822D 2E55 494C 81DD 6F2C 6518 54DF 71FD A37F
> Key available at wwwkeys.pgp.net. Please encrypt *all* e-mail to me.
One thing you can do, if you want something challenging, and suitable for a master's followed by a PhD thesis on the same topic, is the following:

Implement compressed internal nodes by compressing keys and compressing blocknumbers. (Linux is going to 64 bit blocknumbers, and annoying people like me are complaining about the effect of this on the size of metadata such as internal nodes; compression could be the best solution for everyone.)

Implement variable length keys, and store whole filenames in them rather than their hashes.

(Master's thesis completed here. It probably could go into Linux 2.8, unless 2.6 takes too long, in which case it might go into 2.6.... Your development will proceed in parallel to v4 development, and I estimate it will complete in 2003 if you work hard at it. v4.0 will ship Sep. 2002.)

Implement stem compression of directory entries.

Systematically review the leaf node format, implementing various compression techniques.

Implement effective compression of large numbers of small files in a directory.

See the textbook "Managing Gigabytes" to see how the hypertext folks compress indices, and selectively synthesize some of their techniques into our code, to the point that search engines can use reiserfs directories for their directories. Note that their techniques do not address the issue of dynamic insertion....

See the Future Vision button on our website to get an idea of what will happen after directories become efficient enough to serve as implementations of inverted files.

(At this point you have a PhD thesis, and have done something that technically enables radical filesystem semantics changes, and faster web search engines. It is likely that some search engine company or government funding agency will be sponsoring you at this time.)

(Very optional, probably not important enough to code:) Implement key suffix compression (see Gray & Reuter, page 863).
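For readers unfamiliar with the term, stem compression of directory entries can be illustrated roughly as follows. This is only a minimal sketch of the general idea (often called front coding), not reiserfs code and not a proposed on-disk format: the function names `stem_compress` and `stem_decompress` and the sample filenames are made up for the example. Each name in sorted order is stored as the length of the stem it shares with the previous name, plus the remaining suffix.

```python
import os


def stem_compress(names):
    """Front-code a sorted list of names.

    Returns a list of (shared_length, suffix) pairs, where shared_length
    is the number of leading characters shared with the previous name.
    Hypothetical helper for illustration only.
    """
    entries = []
    prev = ""
    for name in names:
        # Length of the common stem between this name and the previous one.
        shared = len(os.path.commonprefix([prev, name]))
        entries.append((shared, name[shared:]))
        prev = name
    return entries


def stem_decompress(entries):
    """Rebuild the original names from (shared_length, suffix) pairs."""
    names = []
    prev = ""
    for shared, suffix in entries:
        name = prev[:shared] + suffix
        names.append(name)
        prev = name
    return names


# Made-up directory contents, kept in sorted order as a directory would be.
names = sorted(["inode.c", "inode.h", "inode_ops.c", "journal.c", "journal.h"])
packed = stem_compress(names)
restored = stem_decompress(packed)
```

In this form each entry depends on the one before it, so inserting a name in the middle means re-encoding at least the entry that follows it. That is one small instance of the dynamic insertion issue mentioned above.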
Note that I have at this time no opinion on your coding skills, and I am sure you understand that your code will only go in if I like it when it is done.

You should be aware that my business model is based on selling licenses in addition to the GPL, and it is therefore important to the business model that I be able to license any core code that is integrated into our main distribution of reiserfs. I think taxing proprietary software to pay for free software is perversely clever of me, and amusing and profitable to do, but some view it as heretical (Stallman and Linus have no objection to it). If your code is good, it will be in my interest to finagle funding for you to produce more of it once I have seen how good it is.

If you aren't comfortable with my selling licenses in addition to the GPL, you might want to work on plugins (e.g. efficient insertion for files) instead. Plugins, being self-contained and not making the rest of the FS dependent on them, can better be made available as GPL only.

Hans
