Peter,

I think you misunderstood what I was doing and why.  Let me try to 
clarify.

My test is far from perfect.  It's merely an exercise to verify the basic idea.

> Just by copying you are allowing reiser to optimize the dir.
Exactly, but I am copying in a way that implicitly suggests what order those 
files will be accessed in.

I was attempting to reorder the data on disk to minimize disk 
seeks, using knowledge of the order in which that data will be accessed.  This 
was done by taking advantage of the way reiser assigns keys to files based on 
their names and its tendency to match key order with block order.  

> You're trying to duplicate what a tree-based design does automatically.
This works because of the tree-based design of reiser.

Reiser must assign each file (each item, actually) some key, so why not take 
advantage of knowledge of the order those items will be accessed in?  The 
current key assignment algorithm is a best guess at that given the limited 
information it has (file/directory name).  Remember key assignment roughly 
translates to on disk position.

The relocate script can leave the file system in the exact same state from a 
semantic standpoint (what files and directories are there) but relocate the 
data on disk.  Copying those files to a single directory with numeric names was 
a kludge to implicitly tell the file system to place those files in a 
specific order and near each other on disk.  The rename step is to switch the 
old unoptimized file position with the new more optimized position.
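
For illustration only, the copy-then-rename steps described above could be 
sketched roughly as follows in Python.  This is not the actual relocate 
script; the function name relocate and the staging_dir parameter are 
hypothetical.  It assumes the staging directory is on the same file system as 
the originals (so the rename is a plain rename) and it ignores hard links and 
file ownership.  Whether the new copies really end up adjacent on disk depends 
on the file system, as discussed above.

```python
import os
import shutil


def relocate(paths, staging_dir):
    """Rewrite each file in `paths` (given in expected access order).

    Step 1: copy every file into one staging directory under zero-padded
            numeric names, so name order matches access order.
    Step 2: rename each copy over its original, so the directory tree looks
            the same afterwards but the data lives in the new copies.
    """
    os.makedirs(staging_dir, exist_ok=True)
    width = len(str(len(paths)))  # enough digits for every index

    staged = []
    for index, path in enumerate(paths):
        tmp = os.path.join(staging_dir, f"{index:0{width}d}")
        shutil.copy2(path, tmp)  # copies data, mode and timestamps
        staged.append((tmp, path))

    for tmp, path in staged:
        os.replace(tmp, path)  # requires same file system as staging_dir
```

A call would look like relocate(ordered_paths, "/some/dir/on/same/fs"), where 
ordered_paths lists the files in the order they are expected to be read.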

> Moreover, remember that reiser packs
> files into clusters so that you may read more than just your one file from
> time to time which could end up adding time to your test.
The boot optimization was over 3885 files.  Ideally those files would be 
ordered head to tail in a sequence that perfectly matches the order they will 
be read.  As a result multiple items in a node will all need to be read at 
nearly the same time.  That didn't happen in my test, but it was much closer 
to that after I ran the relocate script than before.  Hence the performance 
improvement.  With this script, reiser4 and a repacker I have reason to 
believe the ordering will be nearly perfect.  Of course, that is excluding 
random access patterns inside the same file and the directory data needed to 
get at the files.

This basic technique can be made into a boot script much like the readahead 
script already in Ubuntu, just improved.  Boot once with a profile option, it 
measures read patterns (already does this), then reorders data on disk with 
this trick, or maybe something better.  Then the next time you boot it's 
1.5-2x faster.  Better yet, include this profile information in the distro 
packages.  When a package is installed this info is used to help assign item 
keys resulting in a better disk layout and faster boot times and no weird 
file copy rename mumbo jumbo.
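
As a rough sketch of the profiling half of that idea: given a log of file 
reads recorded during boot, the order to lay files out in is just each path's 
first appearance.  The log format here (a plain sequence of path strings) and 
the name first_access_order are assumptions for illustration, not the format 
the Ubuntu readahead script actually uses.

```python
def first_access_order(access_log):
    """Return each path once, in the order it was first accessed.

    `access_log` is assumed to be an iterable of path strings, one per read
    event, possibly with repeats.
    """
    seen = set()
    order = []
    for path in access_log:
        if path not in seen:
            seen.add(path)
            order.append(path)
    return order
```

The resulting list could then be handed to whatever does the actual 
reordering on disk.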

I bring this up here because I expect with reiser4, a repacker, and this 
trick, reiser4 could deliver at least 50% better reproducible real world boot 
and app load performance than any other file system.  At least until other 
file systems implement something similar, like what MS did with XP.  Can 
something similar be done (or has it been) on ext(2/3/4), XFS, JFS or other 
Linux file systems?

Windows XP boots much faster than Windows 2000 in part because it does what I 
am talking about.  File access is recorded at boot, then the disk is defragmented 
with this knowledge.  Check out
http://msdn.microsoft.com/msdnmag/issues/01/12/xpkernel/default.aspx
under "Prefetch".

Also look at http://kerneltrap.org/node/2157

MS's implementation required implementing a defrag utility with a specific 
feature that could position disk data based on access logs.  Reiser4 can do 
the same thing as part of its basic functionality with the addition of a much 
much simpler tool to help assign keys based on that access log.  Then a 
repacker (when it devaporizes) can further optimize for that access pattern 
without any code specific to that purpose.  Seems like good orthogonal design 
to me.

Hope that clarifies.  Like my previous post, whatever it did, it did it in way 
too many words.



On Wednesday 13 September 2006 15:10, Peter wrote:
> On Wed, 13 Sep 2006 14:51:39 -0600, Quinn Harris wrote:
> > Thoughts?
>
> Yes. Why on earth would you do this? By copying the files and renaming and
> hardlinking them is nothing a sysadmin would ever do. Just by copying you
> are allowing reiser to optimize the dir. You're trying to duplicate what a
> tree-based design does automatically. Moreover, remember that reiser packs
> files into clusters so that you may read more than just your one file from
> time to time which could end up adding time to your test.
>
> If reiser needs speedup it certainly won't be done by renaming files!
>
> JM$0.02

-- 
Quinn Harris
