Hubert Chan wrote:
> 
> This may be a bit of an unusual request but...
> 
> I'm just starting my Master's degree in Computer Science (actually a
> month ago), and I'm looking around for possible thesis topics (or maybe
> for my Ph.D. too).  I thought that maybe the ReiserFS crowd may have run
> into some interesting questions/topics that they wouldn't mind
> offloading onto a graduate student. ;-)
> 
> I'm interested in the more theoretical (or as my advisor says, the
> impractical) side of computing -- mostly algorithms, and in particular
> graph and/or combinatorial algorithms.  So whatever I work on may not be
> immediately applicable to ReiserFS (or maybe it will).
> 
> I'm impressed with ReiserFS (especially the features that aren't there
> yet :-) ), and I'd like to help out.  Since I don't know much about
> programming filesystems, I thought this might be one way that I could
> help.
> 
> (I'm at the University of Waterloo (Canada), in case anyone was
> wondering.)
> 
> --
> Hubert Chan <[EMAIL PROTECTED]> - http://www.geocities.com/hubertchan/
> PGP/GnuPG key: 1024D/71FDA37F
> Fingerprint: 6CC5 822D 2E55 494C 81DD  6F2C 6518 54DF 71FD A37F
> Key available at wwwkeys.pgp.net.   Please encrypt *all* e-mail to me.


One thing you can do, if you want something challenging, and suitable for a
master's followed by a PhD thesis on the same topic, is the following:

Implement compressed internal nodes by compressing keys and compressing
blocknumbers.  (Linux is going to 64-bit blocknumbers, and annoying people like
me are complaining about the effect of this on the size of metadata such as
internal nodes; compression could be the best solution for everyone.)
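
As a rough illustration of why compression helps with 64-bit blocknumbers, here
is a minimal sketch that stores the child pointers of one internal node as
signed deltas in a variable-length byte encoding.  The function names and the
choice of zigzag plus 7-bit varints are assumptions made for this example, not
anything ReiserFS does; it only shows that nearby block numbers can cost one or
two bytes each instead of eight.

```python
def encode_varint(n: int) -> bytes:
    """Encode a non-negative integer in 7-bit groups, lowest group first."""
    if n < 0:
        raise ValueError("varint encoding needs a non-negative integer")
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)  # high bit set: more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def decode_varint(buf: bytes, pos: int):
    """Decode one varint starting at pos; return (value, next position)."""
    value = 0
    shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        value |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return value, pos
        shift += 7


def zigzag(d: int) -> int:
    """Map a signed delta to a non-negative integer (0, -1, 1, -2, ...)."""
    return d << 1 if d >= 0 else ((-d) << 1) - 1


def unzigzag(z: int) -> int:
    """Inverse of zigzag."""
    return z >> 1 if z & 1 == 0 else -((z + 1) >> 1)


def pack_block_numbers(blocks):
    """Hypothetical on-disk form: varint of each delta from the previous
    pointer.  Pointers are ordered by key, not by block number, so deltas
    may be negative."""
    out = bytearray()
    prev = 0
    for block in blocks:
        out += encode_varint(zigzag(block - prev))
        prev = block
    return bytes(out)


def unpack_block_numbers(buf: bytes):
    """Rebuild the original list of block numbers."""
    blocks = []
    pos = 0
    prev = 0
    while pos < len(buf):
        z, pos = decode_varint(buf, pos)
        prev += unzigzag(z)
        blocks.append(prev)
    return blocks
```

The saving depends entirely on the allocator keeping the children of a node
close together on disk; widely scattered pointers gain little.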

Implement variable length keys, and store whole filenames in them rather than
their hashes.

(Master's thesis completed here, probably could go into Linux 2.8, unless 2.6
takes too long in which case it might go into 2.6....  Your development will
proceed in parallel to v4 development, and I estimate it will complete in 2003
if you work hard at it.  v4.0 will ship Sep. 2002.)

Implement stem compression of directory entries.
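
A minimal sketch of one reading of stem compression, assuming it means front
coding: entries are kept sorted, and each one records only how many leading
bytes it shares with its predecessor plus the remaining suffix.  The function
names and the (shared length, suffix) tuple layout are hypothetical.

```python
def common_prefix_len(a: bytes, b: bytes) -> int:
    """Number of leading bytes that a and b have in common."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def stem_compress(names):
    """Turn sorted names into (shared_prefix_length, suffix) pairs."""
    out = []
    prev = b""
    for name in names:
        shared = common_prefix_len(prev, name)
        out.append((shared, name[shared:]))
        prev = name
    return out


def stem_decompress(entries):
    """Rebuild the full names; each entry depends on the one before it."""
    names = []
    prev = b""
    for shared, suffix in entries:
        name = prev[:shared] + suffix
        names.append(name)
        prev = name
    return names
```

Because each entry is decoded relative to the previous one, a real node format
would probably need periodic uncompressed restart points so that a lookup does
not have to decode the whole node.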

Systematically review leaf node format implementing various compression
techniques.  Implement effective compression of large numbers of small files in
a directory.  See textbook "Managing Gigabytes" to see how the hypertext folks
compress indices, and selectively synthesize some of their techniques into our
code to the point that search engines can use reiserfs directories for their
directories.  Note that their techniques do not address the issue of dynamic
insertion....  See Future Vision button on our website to get an idea of what
will happen after directories become efficient enough to serve as
implementations of inverted files.
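
To give a flavor of the index compression techniques in question, here is a
minimal sketch of gap encoding with the Elias gamma code, one of the standard
codes for inverted-file postings.  Bits are kept in a Python string for
readability, the leading-zeros convention is used for the length prefix, and
all function names are assumptions made for this example.

```python
def gamma_encode(n: int) -> str:
    """Elias gamma code of n >= 1: (bit length - 1) zeros, then n in binary."""
    if n < 1:
        raise ValueError("gamma code is defined for integers >= 1")
    binary = bin(n)[2:]
    return "0" * (len(binary) - 1) + binary


def gamma_decode_all(bits: str):
    """Decode a concatenation of gamma codes into a list of integers."""
    values = []
    pos = 0
    while pos < len(bits):
        zeros = 0
        while bits[pos] == "0":
            zeros += 1
            pos += 1
        values.append(int(bits[pos:pos + zeros + 1], 2))
        pos += zeros + 1
    return values


def compress_postings(doc_ids):
    """Encode strictly increasing positive ids as gamma-coded gaps."""
    pieces = []
    prev = 0
    for doc_id in doc_ids:
        pieces.append(gamma_encode(doc_id - prev))
        prev = doc_id
    return "".join(pieces)


def decompress_postings(bits: str):
    """Rebuild the ids by summing the decoded gaps."""
    ids = []
    total = 0
    for gap in gamma_decode_all(bits):
        total += gap
        ids.append(total)
    return ids
```

The sketch also shows where dynamic insertion hurts: adding one id in the
middle changes the gap that follows it and shifts every later bit, so the list
has to be re-encoded from that point on.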

(At this point you have a PhD thesis, and have done something that technically
enables radical filesystem semantics changes, and faster web search engines.  It
is likely that some search engine company or government funding agency will be
sponsoring you at this time.)

(Very optional, probably not important enough to code.)  Implement key suffix
compression (see Gray & Reuter page 863).
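
For readers unfamiliar with the term, a minimal sketch of key suffix
compression as it is usually described for B-trees: an internal node does not
need a full key as its separator, only the shortest byte string that still
sorts after everything in the left subtree and no later than everything in the
right one.  The function name is hypothetical, and keys are assumed to compare
as plain byte strings.

```python
def shortest_separator(left_max: bytes, right_min: bytes) -> bytes:
    """Return the shortest prefix s of right_min with left_max < s <= right_min.

    left_max is the largest key in the left subtree and right_min the
    smallest key in the right subtree.
    """
    if not left_max < right_min:
        raise ValueError("left_max must sort strictly before right_min")
    for length in range(1, len(right_min) + 1):
        candidate = right_min[:length]
        if candidate > left_max:
            return candidate
    # Not reached: the full right_min always satisfies the condition.
    return right_min
```

For example, between b"mango" and b"melon" the separator b"me" is enough, so
the internal node stores two bytes instead of five.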

Note that I have at this time no opinion on your coding skills, and I am sure
you understand that your code will only go in if I like it when it is done.  You
should be aware of my business model being based on selling licenses in addition
to the GPL, and it is therefore important to the business model that I be able to
license any core code that is integrated into our main distribution of
reiserfs.  I think taxing proprietary software to pay for free software is
perversely clever of me and amusing and profitable to do, but some view it as
heretical (Stallman and Linus have no objection to it).  If your code is good,
it will be in my interest to finagle funding for you to produce more of it once
I have seen how good it is.  If you aren't comfortable with my selling licenses
in addition to the GPL you might want to work on plugins (e.g. efficient
insertion for files) instead.  Plugins, being self-contained and not making the
rest of the FS dependent on them, can better be made available as GPL only.

Hans
