Re: [ANNOUNCE] RAIF: Redundant Array of Independent Filesystems

2007-01-05 Thread Chaitanya Patti

> We are in the process of porting RAIF to 2.6.19 right now.  Should be done
> in early January.  The trick is that we are trying to keep the same source
> code working for a wide range of kernel versions.  In fact, not too long ago
> we were even able to compile it for 2.4.24!
>
> Nikolai.

We now have RAIF for the 2.6.19 kernel available at:

ftp://ftp.fsl.cs.sunysb.edu/pub/raif/raif-1.1.tar.gz

This version is more stable, but there are certainly still some remaining
bugs, and we would very much appreciate your feedback.

Thank you.

Chaitanya on behalf of the RAIF team.
Filesystems and Storage Laboratory
Stony Brook University
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [ANNOUNCE] RAIF: Redundant Array of Independent Filesystems

2006-12-25 Thread Nikolai Joukov
> > Every stackable file system caches the data at its own level and
> > copies it from/to the lower file system's cached pages when necessary.
> > ...
> > this effectively shrinks the system's cache memory by a factor of two or
> > more.
>
> It should not be that bad with a decent cache replacement policy; I
> wonder if observing the problem (that you corrected in the various ways
> you've described), you got some insight as to what exactly was happening.

I agree that appropriate replacement policies can partially alleviate
the double-caching problem for stackable file systems.  In fact, that's
exactly what RAIF does: it forces the data pages of the lower file
systems to be evicted right after they are written and are no longer
needed.  This solves the problem for most write-intensive workloads.
Without this optimization the situation is much worse, because Linux
tries to protect the caches of different file systems from each other.
But, as you mentioned, any cache replacement policy is optimized for some
set of workloads and performs poorly for others.  Also, caching the data
at multiple layers not only increases memory consumption but also adds
CPU overhead, because the data must be copied between the pages.  I
believe that the real solution to the problem is the ability to share
data pages between file systems.
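A toy Python model (not kernel code) shows why evicting the lower copies right after the write matters.  It assumes a single LRU list shared by both layers; the real Linux VM uses active/inactive lists, and PG_reclaim only marks a page for early reclaim once its writeback finishes.  The names LRUCache and cached_upper_pages are made up for this sketch.

```python
from collections import OrderedDict


class LRUCache:
    """A tiny LRU page cache shared by the upper and lower layers (toy model)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.pages = OrderedDict()  # keys ordered from least to most recently used

    def touch(self, key):
        """Insert or refresh a page, evicting the least recently used one if full."""
        if key in self.pages:
            self.pages.move_to_end(key)
            return
        if len(self.pages) >= self.capacity:
            self.pages.popitem(last=False)
        self.pages[key] = None

    def drop(self, key):
        self.pages.pop(key, None)


def cached_upper_pages(capacity, npages, reclaim_lower):
    """Write npages through a stacked layer and count upper pages still cached.

    Every write caches an upper page and then copies it into a lower page.
    With reclaim_lower=True the lower page is dropped once it has been
    written, which roughly approximates the effect described above.
    """
    cache = LRUCache(capacity)
    for p in range(npages):
        cache.touch(("upper", p))
        cache.touch(("lower", p))
        if reclaim_lower:
            cache.drop(("lower", p))
    return sum(("upper", p) in cache.pages for p in range(npages))


print(cached_upper_pages(100, 80, reclaim_lower=False))  # 50
print(cached_upper_pages(100, 80, reclaim_lower=True))   # 80
```

With 80 pages of data and room for 100, keeping both copies leaves only the 50 most recent upper pages cached, while dropping the lower copies keeps all 80.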

Nikolai.


Re: [ANNOUNCE] RAIF: Redundant Array of Independent Filesystems

2006-12-22 Thread Nikolai Joukov
> > 3. A known ideal solution for this problem is sharing of the cached pages
> >between file systems.  We attempted to do it for Tracefs but the
> >resulting code is not beautiful and is potentially racy:
> >http://marc.theaimsgroup.com/?l=linux-fsdevel&m=113193082115222&w=2
> >Unfortunately, for fan-out file systems this solution requires even
> >more support from the OS.  However, this is what most OSs do
> >(including BSD and Windows) but unfortunately not Linux :-(
>
> VFS-hooks seem to be the cleanest solution not only for a stacked-fs, but
> also for many other situations.  It's rather sad that linux hasn't seen the
> light yet.

Jeff Sipek just got his proposal for a paper/discussion topic accepted to
the Linux Storage and Filesystems workshop, co-located with FAST.  The
topic for discussion will be what "surgery" the Linux kernel needs to
support stackable file systems properly.  I hope this is a sign that
support for stackable file systems in Linux may improve soon.

Nikolai.


Re: [ANNOUNCE] RAIF: Redundant Array of Independent Filesystems

2006-12-16 Thread Nikolai Joukov
> I am looking at filling the net-pipe, and it only reaches 40-75% max, with
> some short 100% bursts, and a slow 10% start.  It seems that caching
> somewhat delays the writes, which then batch up and sync at various speeds.
> So you have the cache really hiding slow sync speeds.  To tune this, it may
> be helpful to turn off caching, which in turn could surface the actual
> bottlenecks.

Well, RAIF is an external kernel module and as such cannot change the
caching behavior much.  The notorious problem of all Linux stackable file
systems is double-caching of data.  Every stackable file system caches the
data at its own level and copies it from/to the lower file system's cached
pages when necessary.  This has some advantages for file systems that
change the data (e.g., encrypt or compress it).  However, it effectively
shrinks the system's cache memory by a factor of two or more.

Among all the existing stackable file systems, Tracefs and RAIF have the
strictest requirements for low overhead.  Here are the solutions to the
double-caching problem that we tried:

1. Redirect read and write requests to the lower file systems directly.
   This avoids caching data at the RAIF level.  However, this
   optimization must be turned off as soon as a file is mmap'ed to avoid
   cache inconsistencies.  Also, even if the file is not mmap'ed, RAIF1
   still keeps copies of the data for all the lower branches.  This
   optimization is implemented in RAIF but is not turned on by default.
   (We automatically strip this and many other #ifdef'ed code fragments
   from the code releases.)

2. We cache the data at the RAIF level.  When writing to the lower file
   systems, we still allocate the lower pages and copy the data, but we
   also mark the lower pages with the PG_reclaim flag before calling the
   lower writepage operation.  This releases all the lower pages right
   after the write completes.  It works fine for mmap'ed files and is
   now the default RAIF behavior.  It solves the problem for most
   workloads that mix reads and writes; for example, it improved
   Postmark's performance several-fold.  Unfortunately, this optimization
   does not improve performance for big sequential writes, which is the
   workload that you tried.  So essentially, you had a quarter of your
   original page cache while running your workload.

3. A known ideal solution for this problem is sharing the cached pages
   between file systems.  We attempted to do it for Tracefs, but the
   resulting code is not beautiful and is potentially racy:
   http://marc.theaimsgroup.com/?l=linux-fsdevel&m=113193082115222&w=2
   Unfortunately, for fan-out file systems this solution requires even
   more support from the OS.  However, this is what most OSs do
   (including BSD and Windows), but unfortunately not Linux :-(
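The "quarter" figure follows from simple arithmetic.  Assume RAIF1 (replication) over three branches, presumably the three-NFS-branch setup mentioned elsewhere in this thread, and ignore metadata.  Each page is then cached once at the RAIF level and once per lower branch, so only 1/(1 + n) of the page cache holds distinct data.  A minimal sketch under those assumptions:

```python
from fractions import Fraction


def distinct_cache_share(n_branches):
    """Fraction of the page cache holding distinct data when RAIF1 keeps one
    upper copy plus one copy per replicated lower branch (simplified model)."""
    copies = 1 + n_branches
    return Fraction(1, copies)


print(distinct_cache_share(3))  # 1/4
```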

> little overhead.  So for RAIF to be viable, it needs to have low overhead,
> which doesn't seem impossible to implement, given RAIF's simple but
> beautiful approach.

Thanks a lot!

Nikolai.
-
Nikolai Joukov, Ph.D.
Filesystems and Storage Laboratory
Stony Brook University


Re: [ANNOUNCE] RAIF: Redundant Array of Independent Filesystems

2006-12-15 Thread Nikolai Joukov
> >The idea behind the cloneset is that most of the blocks (or files)
> >do not change in either source or target.  This being the case, it's only
> >necessary to update the changed elements.  This means updates are
> >incremental.  Once the system has figured out what it needs to update, it's
> >usable, and if you access an element that should be updated, you will see
> >the correctly updated version, even though background resyncing is still
> >in progress.
>
> I still can't tell what you're describing.  With RAID1 as well, only
> changed elements ever get updated.  I have two identical filesystems,
> members of a RAIF set.  I change one file.  One file in each member
> filesystem gets updated, and I again have two identical filesystems.
>
> How would a cloneset work differently, and how would it be better?

Thanks, Bryan.  I was about to write almost the same.

> > This type of logic is great for backups.
>
> Can you give an example of using it for backup?

I guess you could mount Versionfs (yet another stackable file system)
below RAIF and above one of the lower file systems, or use some other
versioning file system such as ext3cow.  This would allow rolling back to
any older file system version at any time.

Nikolai.
-
Nikolai Joukov, Ph.D.
Filesystems and Storage Laboratory
Stony Brook University


Re: [ANNOUNCE] RAIF: Redundant Array of Independent Filesystems

2006-12-15 Thread Nikolai Joukov
> > We have designed a new stackable file system that we called RAIF:
> > Redundant Array of Independent Filesystems.
> >
> > Similar to Unionfs, RAIF is a fan-out file system and can be mounted over
> > many different disk-based, memory, network, and distributed file systems.
> > RAIF can use the stable and maintained code of the other file systems and
> > thus stay simple itself.  Similar to standard RAID, RAIF can replicate the
> > data or store it with parity on any subset of the lower file systems.  RAIF
> > has three main advantages over traditional driver-level RAID systems:
>
> this sounds very interesting. did you see the paper on chunkfs?
> http://www.usenix.org/events/hotdep06/tech/prelim_papers/henson/henson_html/

I saw Val at OSDI right before this HotDep talk and sure, I have seen the
paper :-)

> this sounds as if it may be something that you would be able to make a
> functional equivalent to chunkfs with your raid0 mode.

I also have this feeling.  RAIF0 is similar to chunkfs and allows more
flexibility.  Not only can RAIF stripe the data across many small local
file systems (possibly located on multiple drives), but it can also stripe
the data across remote file systems.  In addition, it can keep parity, use
per-file-type storage policies, etc.  However, such a configuration would
mean lots and lots of lower file systems (= branches = chunks).  I am
afraid that in this case RAIF's performance would not be so great due to
VFS API restrictions for operations like lookup.
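For readers unfamiliar with striping, a generic RAID-0 style mapping from a file offset to a branch is sketched below.  The stripe size and the per-file start_branch (for example, derived from a hash of the file name so that small files spread across branches) are assumptions for illustration; RAIF's actual layout may differ.  Unlike block-level RAID0, each "disk" here is an ordinary file on a lower file system.

```python
def stripe_location(offset, stripe_size, n_branches, start_branch=0):
    """Map a byte offset in the RAIF-level file to (branch, offset in that
    branch's lower file) for a simple round-robin striping scheme."""
    stripe_no = offset // stripe_size
    branch = (start_branch + stripe_no) % n_branches
    # Each branch holds every n-th stripe, stored back to back in its lower file.
    lower_offset = (stripe_no // n_branches) * stripe_size + offset % stripe_size
    return branch, lower_offset


STRIPE = 64 * 1024  # hypothetical 64 KiB stripe unit
for off in (0, STRIPE, 3 * STRIPE + 10):
    print(off, stripe_location(off, STRIPE, n_branches=3))
```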

Nikolai.
-
Nikolai Joukov, Ph.D.
Filesystems and Storage Laboratory
Stony Brook University


Re: [ANNOUNCE] RAIF: Redundant Array of Independent Filesystems

2006-12-15 Thread David Lang

On Wed, 13 Dec 2006, Nikolai Joukov wrote:

> We have designed a new stackable file system that we called RAIF:
> Redundant Array of Independent Filesystems.
>
> Similar to Unionfs, RAIF is a fan-out file system and can be mounted over
> many different disk-based, memory, network, and distributed file systems.
> RAIF can use the stable and maintained code of the other file systems and
> thus stay simple itself.  Similar to standard RAID, RAIF can replicate the
> data or store it with parity on any subset of the lower file systems.  RAIF
> has three main advantages over traditional driver-level RAID systems:

this sounds very interesting. did you see the paper on chunkfs?
http://www.usenix.org/events/hotdep06/tech/prelim_papers/henson/henson_html/

this sounds as if it may be something that you would be able to make a
functional equivalent to chunkfs with your raid0 mode.

David Lang


Re: [ANNOUNCE] RAIF: Redundant Array of Independent Filesystems

2006-12-15 Thread Ed Tomlinson
On Friday 15 December 2006 15:11, Nikolai Joukov wrote:
> > On Wednesday 13 December 2006 12:47, Nikolai Joukov wrote:
> > > We have designed a new stackable file system that we called RAIF:
> > > Redundant Array of Independent Filesystems
> >
> > Do you have a function similar to an EMC cloneset?  Basically, a cloneset
> > tracks what has changed in both the source and target LUNs (drives).  When
> > one updates the cloneset, the target is made identical to the source.  It's
> > a great way to do backups.  It's an important feature to be able to write
> > to the target drives.
> > I would love to see this working at a filesystem level.
> 
> Well, if you mount RAIF over your file system and a for-backups file
> system, RAIF can replicate the files on both of them automatically.  I
> guess that's what you need.

Yes and no.  The idea behind the cloneset is that most of the blocks (or files)
do not change in either source or target.  This being the case, it's only
necessary to update the changed elements.  This means updates are incremental.
Once the system has figured out what it needs to update, it's usable, and if
you access an element that should be updated, you will see the correctly
updated version, even though background resyncing is still in progress.  This
type of logic is great for backups.
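The behavior described above can be modeled in a few lines of Python.  This is a hypothetical sketch of the general idea (change tracking, incremental copy, and read-through during resync).  It is not EMC's implementation and not an existing RAIF feature; the CloneSet class and its methods are invented for illustration, and files are modeled as dictionary entries.

```python
class CloneSet:
    """Toy model of an incremental, cloneset-style resync between two trees."""

    def __init__(self, files):
        self.source = dict(files)
        self.target = dict(files)  # the target starts as an identical copy
        self.changed = set()       # names modified on either side since the last update
        self.resyncing = set()     # names still to be copied by the running update

    def write_source(self, name, data):
        self.source[name] = data
        self.changed.add(name)

    def write_target(self, name, data):
        # The target stays writable; the next update will overwrite this change.
        self.target[name] = data
        self.changed.add(name)

    def start_update(self):
        # Only elements that changed need copying, so the update is incremental.
        self.resyncing |= self.changed
        self.changed.clear()

    def read_target(self, name):
        # While an element is still queued, serve the source version so readers
        # already see the updated data before background copying finishes.
        if name in self.resyncing:
            return self.source.get(name)
        return self.target.get(name)

    def resync_step(self):
        """Copy one queued element; a background thread would call this in a loop."""
        if not self.resyncing:
            return False
        name = self.resyncing.pop()
        if name in self.source:
            self.target[name] = self.source[name]
        else:
            self.target.pop(name, None)  # exists only on the target: drop it
        return True


cs = CloneSet({"a": 1, "b": 2, "c": 3})
cs.write_source("a", 10)
cs.write_target("c", 99)
cs.start_update()
print(sorted(cs.resyncing))    # ['a', 'c']; "b" never changed, so it is not copied
print(cs.read_target("a"))     # 10, although "a" has not been copied yet
while cs.resync_step():
    pass
print(cs.target == cs.source)  # True
```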

Thanks
Ed Tomlinson


Re: [ANNOUNCE] RAIF: Redundant Array of Independent Filesystems

2006-12-15 Thread Nikolai Joukov
> On Wednesday 13 December 2006 12:47, Nikolai Joukov wrote:
> > We have designed a new stackable file system that we called RAIF:
> > Redundant Array of Independent Filesystems
>
> Do you have a function similar to an EMC cloneset?  Basically, a cloneset
> tracks what has changed in both the source and target LUNs (drives).  When one
> updates the cloneset, the target is made identical to the source.  It's a great
> way to do backups.  It's an important feature to be able to write to the
> target drives.
> I would love to see this working at a filesystem level.

Well, if you mount RAIF over your file system and a for-backups file
system, RAIF can replicate the files on both of them automatically.  I
guess that's what you need.

Nikolai.
-
Nikolai Joukov, Ph.D.
Filesystems and Storage Laboratory
Stony Brook University


Re: [ANNOUNCE] RAIF: Redundant Array of Independent Filesystems

2006-12-15 Thread Nikolai Joukov
> Nikolai Joukov wrote:
> > > > We started the project in April 2004.  Right now I am using it as my
> > > > /home/kolya file system at home.  We believe that at this stage RAIF
> > > > is mature enough for others to try it out.  The code is available at:
> > > >
> > > > 
> > > >
> > > > The code requires no kernel patches and compiles for a wide range of
> > > > kernels as a module.  The latest kernel we used it for is 2.6.13 and
> > > > we are in the process of porting it to 2.6.19.
> > > >
> > > > We will be happy to hear back from you.
> > >
> > > When removing a file from the underlying branch, the oops below happens.
> > > Wouldn't it be possible to just fail the branch instead of oopsing?
> >
> > This is a known problem of all Linux stackable file systems.  Users are
> > not supposed to change the file systems below mounted stackable file
> > systems (but they can read them).  One of the ways to enforce it is to use
> > overlay mounts.  For example, mount the lower file systems at
> > /raif/b0 ... /raif/bN and then mount RAIF at /raif.  Stackable file
> > systems recently started getting into the kernel and we hope that there
> > will be a better solution for this problem in the future.  Having said
> > that, you are right: failing the branch would be the right thing to do.
>
> Good.  It seems that there is also some tmpfs/raif-over-nfs deadlock
> situation.  Can't really tell if it's the kernel or the raif, but when do
> you think the patches could be brought into sync with the current mainline?

It would be great if you could send us more details about how to reproduce
this deadlock, and we will take a look at it.  It would be even better if
you and everybody else who finds bugs in RAIF submitted bug reports to:

  

We are in the process of porting RAIF to 2.6.19 right now.  It should be done
in early January.  The trick is that we are trying to keep the same source
code working for a wide range of kernel versions.  In fact, not too long ago
we were even able to compile it for 2.4.24!

Nikolai.
-
Nikolai Joukov, Ph.D.
Filesystems and Storage Laboratory
Stony Brook University


Re: [ANNOUNCE] RAIF: Redundant Array of Independent Filesystems

2006-12-15 Thread Nikolai Joukov
> Nikolai Joukov wrote:
> > > Nikolai Joukov wrote:
> > > > We have designed a new stackable file system that we called RAIF:
> > > > Redundant Array of Independent Filesystems.
> > >
> > > Great!
> > >
> > > > We have performed some benchmarking on a 3GHz PC with 2GB of RAM and
> > > > U320 SCSI disks.  Compared to the Linux RAID driver, RAIF has
> > > > overheads of about 20-25% under the Postmark v1.5 benchmark in case of
> > > > striping and replication.  In case of RAID4 and RAID5-like
> > > > configurations, RAIF performed about two times *better* than software
> > > > RAID and even better than an Adaptec 2120S RAID5 controller.
> > >
> > > I am not surprised.  RAID 4/5/6 performance is highly sensitive to the
> > > underlying hw, and thus needs a fair amount of fine tuning.
> >
> > Nevertheless, performance is not the biggest advantage of RAIF.  For
> > read-biased workloads RAID is always slightly faster than RAIF.  The
> > biggest advantages of RAIF are flexible configurations (e.g., can combine
> > NFS and local file systems), per-file-type storage policies, and the fact
> > that files are stored as files on the lower file systems (which is
> > convenient).
>
> Ok, I was just about to inform you of a three-nfs-branch raif which was
> unable to fill the net pipe.  So it looks like a 25% performance hit across
> the board.  Should be possible to reduce to sub 3% though once RAIF matures,
> don't you think?

Hmmm.  Which workload did you try?  Which RAIF level did you use: RAIF0
(striping), replication (RAIF1, default), or striping with parity (RAIF4,
5, 6)?  Which hardware did you use?  RAIF has to consume extra CPU time,
and on older machines the overheads seem to be higher (CPU speeds are
improving faster than I/O device speeds).  Also, I guess you are
comparing RAIF mounted over three NFS branches with NFS alone, right?
That doesn't seem very fair to me :-)

Recently we solved the double-caching problem, which improved RAIF's
performance by an order of magnitude under I/O-intensive workloads.
(Normally, Linux stackable file systems cache the data twice.)
Unfortunately, many VFS meta-operations (e.g., lookup) are synchronous.
RAIF has to wait on such operations in sequence for every branch
involved.  (This is different from, say, the readpage operation: we
call readpage on all the branches right away and then wait for them to
complete in parallel.)  Sequential waiting on lookups should be OK for
multi-threaded workloads but may add elapsed time for single-threaded
workloads.  Again, elegant solutions may require VFS API changes.
Alternatively, we can create kernel threads for every branch.
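The difference between waiting on each branch in turn and dispatching to all branches at once can be sketched in user space.  In this hedged Python model, threads stand in for the per-branch kernel threads mentioned above.  fake_lookup is a hypothetical stand-in that just sleeps to mimic a network round trip; it is not a VFS call.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def lookup_all_sequential(branches, lookup, name):
    """Synchronous style: wait for each branch before asking the next one."""
    return [lookup(branch, name) for branch in branches]


def lookup_all_parallel(branches, lookup, name):
    """Dispatch to every branch at once, then wait for all of them.

    Elapsed time is roughly that of the slowest branch rather than the sum.
    """
    with ThreadPoolExecutor(max_workers=len(branches)) as pool:
        return list(pool.map(lambda branch: lookup(branch, name), branches))


def fake_lookup(branch, name, delay=0.1):
    """Hypothetical stand-in for a per-branch lookup with a fixed latency."""
    time.sleep(delay)
    return f"{branch}:{name}"


branches = ["b0", "b1", "b2"]
for dispatch in (lookup_all_sequential, lookup_all_parallel):
    start = time.monotonic()
    results = dispatch(branches, fake_lookup, "foo")
    elapsed = time.monotonic() - start
    print(f"{dispatch.__name__}: {results} in {elapsed:.2f}s")
```

On an otherwise idle machine the sequential version should take about 0.3 s and the parallel one about 0.1 s, since the three simulated latencies add up in the first case and overlap in the second.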

I am not sure that 3% overhead compared to NFS alone is achievable in all
cases.  On the one hand, there should be some price to pay for the extra
functionality.  On the other hand, for some workloads RAIF should even
improve performance compared to a single NFS mount because of the load
distribution.  In general, I agree that there are still many things we can
optimize.

Nikolai.
-
Nikolai Joukov, Ph.D.
Filesystems and Storage Laboratory
Stony Brook University


Re: [ANNOUNCE] RAIF: Redundant Array of Independent Filesystems

2006-12-15 Thread Nikolai Joukov
> On Friday 15 December 2006 10:01, Nikolai Joukov wrote:
> > > Nikolai Joukov wrote:
> > > > We have designed a new stackable file system that we called RAIF:
> > > > Redundant Array of Independent Filesystems.
> > >
> > > Great!
>
> Yes, definitely...
>
> I see the major benefit being in the mobile, industrial and embedded systems
> arena. Perhaps this might come as a surprise to people, but a very large and
> ever-growing number (perhaps even most) of Linux devices don't use block
> devices for storage. Instead they use flash file systems or NFS, neither of
> which uses local block devices.
>
> It looks like RAIF gives a way to provide redundancy etc on these devices.

Good point!  Also, RAIF can store different file types differently.
Therefore, it is possible to mount RAIF over file systems with lots of
storage space and a flash file system (which usually has less space).  In
this case, RAIF can be configured to use flash to keep replicas of only the
most important data.  And yes, thanks to the stackable nature of RAIF, no
explicit flash support is required.  RAIF can reuse existing file systems
designed for flash media (e.g., JFFS2).
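As an illustration of per-file-type placement, here is a hypothetical rule table mapping file-name patterns to the branches that receive a copy.  The rule format, branch names, and patterns are invented for this sketch and do not reflect RAIF's actual configuration syntax.

```python
from fnmatch import fnmatchcase

# Hypothetical rules: (file-name pattern, branches that receive a copy).
# The first matching pattern wins, so the catch-all rule goes last.
POLICY = [
    ("*.avi", ["disk0"]),                    # bulky media: a single copy
    ("*.odt", ["disk0", "disk1", "flash"]),  # important documents: also on flash
    ("*", ["disk0", "disk1"]),               # everything else: mirrored on disks
]


def branches_for(path, policy=POLICY):
    """Return the branches that should store a copy of the file at path."""
    name = path.rsplit("/", 1)[-1]
    for pattern, branches in policy:
        if fnmatchcase(name, pattern):
            return branches
    raise LookupError(f"no rule matches {path!r}")


print(branches_for("/home/user/thesis.odt"))
print(branches_for("/home/user/movie.avi"))
print(branches_for("/home/user/notes.txt"))
```

Keeping the flash branch out of the default rule is what limits it to the most important data.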

Nikolai.


Re: [ANNOUNCE] RAIF: Redundant Array of Independent Filesystems

2006-12-15 Thread Ed Tomlinson
On Wednesday 13 December 2006 12:47, Nikolai Joukov wrote:
> We have designed a new stackable file system that we called RAIF:
> Redundant Array of Independent Filesystems

Do you have a function similar to an EMC cloneset?  Basically, a cloneset
tracks what has changed in both the source and target LUNs (drives).  When one
updates the cloneset, the target is made identical to the source.  It's a great
way to do backups.  It's an important feature to be able to write to the
target drives.
I would love to see this working at a filesystem level.

Thanks
Ed Tomlinson


Re: [ANNOUNCE] RAIF: Redundant Array of Independent Filesystems

2006-12-15 Thread David Lang

On Wed, 13 Dec 2006, Nikolai Joukov wrote:


We have designed a new stackable file system that we called RAIF:
Redundant Array of Independent Filesystems.

Similar to Unionfs, RAIF is a fan-out file system and can be mounted over
many different disk-based, memory, network, and distributed file systems.
RAIF can use the stable and maintained code of the other file systems and
thus stay simple itself.  Similar to standard RAID, RAIF can replicate the
data or store it with parity on any subset of the lower file systems.  RAIF
has three main advantages over traditional driver-level RAID systems:


This sounds very interesting. Did you see the paper on chunkfs?
http://www.usenix.org/events/hotdep06/tech/prelim_papers/henson/henson_html/


This sounds as if you may be able to make a functional equivalent of
chunkfs with your raid0 mode.


David Lang


Re: [ANNOUNCE] RAIF: Redundant Array of Independent Filesystems

2006-12-15 Thread Nikolai Joukov
  We have designed a new stackable file system that we called RAIF:
  Redundant Array of Independent Filesystems.
 
  Similar to Unionfs, RAIF is a fan-out file system and can be mounted over
  many different disk-based, memory, network, and distributed file systems.
  RAIF can use the stable and maintained code of the other file systems and
  thus stay simple itself.  Similar to standard RAID, RAIF can replicate the
  data or store it with parity on any subset of the lower file systems.  RAIF
  has three main advantages over traditional driver-level RAID systems:

 This sounds very interesting. Did you see the paper on chunkfs?
 http://www.usenix.org/events/hotdep06/tech/prelim_papers/henson/henson_html/

I saw Val at OSDI right before this HotDep talk and sure, I have seen the
paper :-)

 this sounds as if it may be something that you would be able to make a
 functional equivalent to chunkfs with your raid0 mode.

I also have this feeling.  RAIF0 is similar to chunkfs and allows more
flexibility.  Not only can RAIF stripe the data on many small local file
systems (possibly located on multiple drives), but it can also stripe the
data on remote file systems.  In addition, it can keep parity, use
per-file-type storage policies, etc.  However, such a configuration would
mean lots and lots of lower file systems ( = branches = chunks).  I am
afraid that in this case RAIF's performance would not be great due to
VFS API restrictions on operations like lookup.
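To make the RAIF0/chunkfs comparison concrete, here is a minimal sketch of round-robin striping of one file's bytes across N branches. It is only an illustration under assumed parameters: the stripe unit size, the helper names `stripe`, `locate` and `unstripe`, and the in-memory representation are hypothetical and not taken from the RAIF sources, which may lay out stripes differently.

```python
from typing import List, Tuple


def stripe(data: bytes, stripe_size: int, n_branches: int) -> List[bytes]:
    """Split one file's contents round-robin into per-branch pieces."""
    parts = [bytearray() for _ in range(n_branches)]
    for start in range(0, len(data), stripe_size):
        branch = (start // stripe_size) % n_branches
        parts[branch] += data[start:start + stripe_size]
    return [bytes(p) for p in parts]


def locate(offset: int, stripe_size: int, n_branches: int) -> Tuple[int, int]:
    """Map a byte offset in the upper file to (branch, offset in that branch's piece)."""
    unit = offset // stripe_size      # which stripe unit holds the byte
    branch = unit % n_branches        # round-robin branch choice
    row = unit // n_branches          # units that precede it on the same branch
    return branch, row * stripe_size + offset % stripe_size


def unstripe(parts: List[bytes], stripe_size: int) -> bytes:
    """Reassemble the upper file by reading stripe units back in order."""
    total = sum(len(p) for p in parts)
    out = bytearray()
    unit = 0
    while len(out) < total:
        branch, row = unit % len(parts), unit // len(parts)
        out += parts[branch][row * stripe_size:(row + 1) * stripe_size]
        unit += 1
    return bytes(out)
```

For example, `stripe(b"abcdefghij", 2, 3)` places `ab` and `gh` on branch 0, `cd` and `ij` on branch 1, and `ef` on branch 2. In this scheme every branch holds a piece of any file larger than N stripe units, so name operations have to touch every branch, which is consistent with the concern about lookup cost when there are very many branches.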

Nikolai.
-
Nikolai Joukov, Ph.D.
Filesystems and Storage Laboratory
Stony Brook University


Re: [ANNOUNCE] RAIF: Redundant Array of Independent Filesystems

2006-12-15 Thread Nikolai Joukov
 On Friday 15 December 2006 10:01, Nikolai Joukov wrote:
   Nikolai Joukov wrote:
We have designed a new stackable file system that we called RAIF:
Redundant Array of Independent Filesystems.
  
   Great!

 Yes, definitely...

 I see the major benefit being in the mobile, industrial and embedded systems
 arena. Perhaps this might come as a surprise to people, but a very large and
 ever-growing number (perhaps even most) of Linux devices don't use block
 devices for storage. Instead they use flash file systems or NFS, neither of
 which use local block devices.

 It looks like RAIF gives a way to provide redundancy etc. on these devices.

Good point!  Also, RAIF can store different file types differently.
Therefore, it is possible to mount RAIF over file systems with lots of
storage space and a flash file system (which usually has less space).  In
this case, RAIF can be configured to use flash to keep replicas of only the
most important data.  And yes, thanks to the stackable nature of RAIF, no
explicit flash support is required.  RAIF can reuse existing file systems
designed for flash media (e.g., JFFS2).
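A per-file-type policy like the one described above can be pictured as an ordered rule table that maps file-name patterns to a RAIF level and a set of branches. The sketch below is hypothetical: the rule format, the branch names `disk0`, `disk1` and `flash`, and the level strings are illustrative assumptions, not RAIF's actual configuration syntax.

```python
from dataclasses import dataclass
from fnmatch import fnmatchcase
from typing import List, Tuple


@dataclass(frozen=True)
class Policy:
    level: str                  # illustrative labels: "raif1" = replicate, "raif0" = stripe
    branches: Tuple[str, ...]   # lower branches this file type may use


# Ordered rules; the first matching pattern wins (hypothetical example).
RULES: List[Tuple[str, Policy]] = [
    ("*.tex", Policy("raif1", ("disk0", "flash"))),   # important: keep a replica on flash
    ("*.c", Policy("raif1", ("disk0", "flash"))),
    ("*.avi", Policy("raif0", ("disk0", "disk1"))),   # bulky: stripe over the big disks only
]
DEFAULT = Policy("raif0", ("disk0", "disk1"))


def policy_for(name: str, rules: List[Tuple[str, Policy]] = RULES,
               default: Policy = DEFAULT) -> Policy:
    """Return the storage policy for a file name."""
    for pattern, policy in rules:
        if fnmatchcase(name, pattern):
            return policy
    return default
```

With these rules, `policy_for("thesis.tex")` replicates the file onto `disk0` and `flash`, while `movie.avi` and anything unmatched stay off the small flash branch. Using `fnmatchcase` keeps the matching case-sensitive on every platform.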

Nikolai.


Re: [ANNOUNCE] RAIF: Redundant Array of Independent Filesystems

2006-12-15 Thread Nikolai Joukov
 Nikolai Joukov wrote:
   Nikolai Joukov wrote:
We have designed a new stackable file system that we called RAIF:
Redundant Array of Independent Filesystems.
  
   Great!
  
We have performed some benchmarking on a 3GHz PC with 2GB of RAM and
U320 SCSI disks.  Compared to the Linux RAID driver, RAIF has
overheads of about 20-25% under the Postmark v1.5 benchmark in case of
striping and replication.  In case of RAID4 and RAID5-like
configurations, RAIF performed about two times *better* than software
RAID and even better than an Adaptec 2120S RAID5 controller.
  
   I am not surprised.  RAID 4/5/6 performance is highly sensitive to the
   underlying hw, and thus needs a fair amount of fine tuning.
 
  Nevertheless, performance is not the biggest advantage of RAIF.  For
  read-biased workloads RAID is always slightly faster than RAIF.  The
  biggest advantages of RAIF are flexible configurations (e.g., can combine
  NFS and local file systems), per-file-type storage policies, and the fact
  that files are stored as files on the lower file systems (which is
  convenient).

 OK, I was just about to tell you about a three-NFS-branch RAIF that was
 unable to fill the net pipe.  So it looks like a 25% performance hit across
 the board.  It should be possible to reduce that to under 3% once RAIF
 matures, though, don't you think?

Hmmm.  Which workload did you try?  Which RAIF level did you use: RAIF0
(striping), replication (RAIF1, default), or striping with parity (RAIF4,
5, 6)?  Which hardware did you use?  RAIF has to consume extra CPU time
and on older machines the overheads seem to be higher (CPUs are getting
faster than I/O devices at a faster pace).  Also, I guess you are
comparing RAIF mounted over three NFS branches with NFS alone, right?
It doesn't seem to be very fair to me :-)
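For readers less familiar with the parity levels (RAIF4, 5, 6) mentioned above: assuming RAIF follows the usual RAID convention, a parity unit is the XOR of the data units in a row, so any single lost unit can be rebuilt from the survivors. RAID4 keeps parity on one fixed branch, while RAID5 rotates it from row to row. This is a generic sketch with hypothetical helper names; it does not show RAIF's actual layout, and RAID6's second parity is not covered.

```python
from typing import List


def xor_units(units: List[bytes]) -> bytes:
    """XOR stripe units together (a shorter unit behaves as if zero-padded)."""
    out = bytearray(max(len(u) for u in units))
    for unit in units:
        for i, byte in enumerate(unit):
            out[i] ^= byte
    return bytes(out)


def parity_branch(row: int, n_branches: int, rotate: bool) -> int:
    """Branch holding parity for a row: fixed (RAID4-like) or rotating (RAID5-like)."""
    if not rotate:
        return n_branches - 1
    return (n_branches - 1 - row) % n_branches


def rebuild(surviving: List[bytes], parity: bytes) -> bytes:
    """Recover the single missing data unit of a row."""
    return xor_units(surviving + [parity])
```

A small write also has to update the row's parity unit, which is why being able to cache parity like ordinary data (a point made elsewhere in this thread) can matter for RAIF4/5 write performance.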

Recently we solved the double-caching problem, which improved RAIF's
performance by an order of magnitude under I/O-intensive workloads.
(Normally, Linux stackable file systems cache the data twice.)
Unfortunately, many VFS meta-operations are synchronous (e.g., lookup).
RAIF has to wait on such operations in sequence for every branch
involved.  (This is different from, say, the readpage operation: we
call readpage on all the branches right away and then wait for them to
complete in parallel.)  Sequential waiting on lookups should be OK for
multi-threaded workloads but may add elapsed time for single-threaded
workloads.  Again, elegant solutions may require VFS API changes.
Alternatively, we can create kernel threads for every branch.
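The difference between waiting on each branch in turn (as with lookup) and issuing the request to all branches before waiting (as with readpage) can be illustrated in user space. This is only an analogy under stated assumptions: `fake_lookup` and its fixed 50 ms latency are hypothetical stand-ins for a synchronous per-branch operation, and the thread pool plays the role of the per-branch kernel threads mentioned above; it is not RAIF code.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from typing import List


def fake_lookup(branch: str, name: str, latency: float = 0.05) -> str:
    """Hypothetical synchronous per-branch lookup with a fixed delay."""
    time.sleep(latency)
    return f"{branch}:{name}"


def lookup_sequential(branches: List[str], name: str) -> List[str]:
    # Wait for each branch before asking the next: total ~ sum of latencies.
    return [fake_lookup(b, name) for b in branches]


def lookup_fanout(branches: List[str], name: str) -> List[str]:
    # Start all branches at once, then wait: total ~ the slowest branch.
    with ThreadPoolExecutor(max_workers=len(branches)) as pool:
        return list(pool.map(lambda b: fake_lookup(b, name), branches))


if __name__ == "__main__":
    branches = ["b0", "b1", "b2"]
    for fn in (lookup_sequential, lookup_fanout):
        start = time.perf_counter()
        fn(branches, "foo")
        print(f"{fn.__name__}: {time.perf_counter() - start:.3f} s")
```

With three branches the sequential version should take roughly three per-branch latencies and the fan-out version roughly one, which mirrors the extra elapsed time for single-threaded workloads described above.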

I am not sure we can get to 3% overhead in all cases compared to NFS
alone.  On one hand, there should be some price to pay for the extra
functionality.  On the other hand, for some workloads RAIF should even
improve performance compared to a single NFS mount because of the load
distribution.  In general, I agree that there are still many things we can
optimize.

Nikolai.
-
Nikolai Joukov, Ph.D.
Filesystems and Storage Laboratory
Stony Brook University


Re: [ANNOUNCE] RAIF: Redundant Array of Independent Filesystems

2006-12-15 Thread Nikolai Joukov
 Nikolai Joukov wrote:
We started the project in April 2004.  Right now I am using it as my
/home/kolya file system at home.  We believe that at this stage RAIF
is mature enough for others to try it out.  The code is available at:
   
ftp://ftp.fsl.cs.sunysb.edu/pub/raif/
   
The code requires no kernel patches and compiles for a wide range of
kernels as a module.  The latest kernel we used it for is 2.6.13 and
we are in the process of porting it to 2.6.19.
   
We will be happy to hear back from you.
  
   When removing a file from the underlying branch, the oops below happens.
   Wouldn't it be possible to just fail the branch instead of oopsing?
 
  This is a known problem of all Linux stackable file systems.  Users are
  not supposed to change the file systems below mounted stackable file
  systems (but they can read them).  One of the ways to enforce it is to use
  overlay mounts.  For example, mount the lower file systems at
  /raif/b0 ... /raif/bN and then mount RAIF at /raif.  Stackable file
  systems recently started getting into the kernel and we hope that there
  will be a better solution for this problem in the future.  Having said
  that, you are right: failing the branch would be the right thing to do.

 Good.  It seems that there is also some tmpfs/raif-over-nfs deadlock
 situation.  Can't really tell if it's the kernel or the raif, but when do
 you think the patches could be brought into sync with the current mainline?

It would be great if you could send us more details about how to recreate
this deadlock, and we will take a look at it.  It would be even better if
you and everybody else who finds bugs in RAIF submitted bug reports to:

  http://bugzilla.fsl.cs.sunysb.edu

We are in the process of porting RAIF to 2.6.19 right now.  Should be done
in early January.  The trick is that we are trying to keep the same source
good for a wide range of kernel versions.  In fact, not too long ago we
were even able to compile it for 2.4.24!

Nikolai.
-
Nikolai Joukov, Ph.D.
Filesystems and Storage Laboratory
Stony Brook University


Re: [ANNOUNCE] RAIF: Redundant Array of Independent Filesystems

2006-12-15 Thread Nikolai Joukov
 On Wednesday 13 December 2006 12:47, Nikolai Joukov wrote:
  We have designed a new stackable file system that we called RAIF:
  Redundant Array of Independent Filesystems

 Do you have a function similar to an EMC cloneset?  Basically, a cloneset
 tracks what has changed in both the source and target LUNs (drives).  When one
 updates the cloneset, the target is made identical to the source.  It's a great
 way to do backups.  It's an important feature to be able to write to the
 target drives.
 I would love to see this working at a filesystem level.

Well, if you mount RAIF over your file system and a for-backups file
system, RAIF can replicate the files on both of them automatically.  I
guess that's what you need.
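Because RAIF stores files as ordinary files on the lower file systems, replication onto a primary and a backup branch amounts to writing the same file under each branch root and reading from any surviving copy. The sketch below models that in user space with two plain directories; `replicated_write`, `replicated_read` and the directory layout are hypothetical illustrations, not RAIF's implementation, which works inside the kernel on VFS objects.

```python
import os
from typing import List


def replicated_write(branch_roots: List[str], relpath: str, data: bytes) -> None:
    """Write the same file under every branch root (RAIF1-style mirroring)."""
    for root in branch_roots:
        path = os.path.join(root, relpath)
        os.makedirs(os.path.dirname(path), exist_ok=True)
        with open(path, "wb") as f:
            f.write(data)


def replicated_read(branch_roots: List[str], relpath: str) -> bytes:
    """Read from the first branch that still has a readable copy."""
    for root in branch_roots:
        try:
            with open(os.path.join(root, relpath), "rb") as f:
                return f.read()
        except OSError:
            continue  # treat this branch as failed and try the next replica
    raise FileNotFoundError(relpath)
```

Only the files that are actually written get copied, so unchanged files on the backup branch cost nothing extra.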

Nikolai.
-
Nikolai Joukov, Ph.D.
Filesystems and Storage Laboratory
Stony Brook University


Re: [ANNOUNCE] RAIF: Redundant Array of Independent Filesystems

2006-12-15 Thread Ed Tomlinson
On Friday 15 December 2006 15:11, Nikolai Joukov wrote:
  On Wednesday 13 December 2006 12:47, Nikolai Joukov wrote:
   We have designed a new stackable file system that we called RAIF:
   Redundant Array of Independent Filesystems
 
  Do you have a function similar to an EMC cloneset?  Basically, a cloneset
  tracks what has changed in both the source and target LUNs (drives).  When
  one updates the cloneset, the target is made identical to the source.  It's
  a great way to do backups.  It's an important feature to be able to write
  to the target drives.
  I would love to see this working at a filesystem level.
 
 Well, if you mount RAIF over your file system and a for-backups file
 system, RAIF can replicate the files on both of them automatically.  I
 guess that's what you need.

Yes and no.  The idea behind the cloneset is that most of the blocks (or
files) do not change in either source or target.  This being the case, it's
only necessary to update the changed elements.  This means updates are
incremental.  Once the system has figured out what it needs to update, it's
usable, and if you access an element that should be updated you will see
the correctly updated version, even though background resyncing is still in
progress.  This type of logic is great for backups.
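The cloneset behaviour described above (track what changed on either side, copy only those elements when an update starts, resync in the background, and bring an element up to date on access if it is still pending) can be modelled with two sets. This is a toy, in-memory sketch under assumed semantics: the `CloneSet` class and its method names are hypothetical, whole files stand in for blocks, and it does not describe how EMC or RAIF actually implement anything.

```python
from typing import Dict, Optional, Set


class CloneSet:
    """Toy file-level cloneset; source and target map names to contents."""

    def __init__(self, source: Dict[str, bytes], target: Dict[str, bytes]) -> None:
        self.source = source
        self.target = target
        self.dirty: Set[str] = set()    # changed on either side since the last update
        self.pending: Set[str] = set()  # still to be copied by the current update

    def write_source(self, name: str, data: bytes) -> None:
        self.source[name] = data
        self.dirty.add(name)

    def delete_source(self, name: str) -> None:
        self.source.pop(name, None)
        self.dirty.add(name)

    def write_target(self, name: str, data: bytes) -> None:
        # The target stays writable; the next update overwrites this change.
        self.target[name] = data
        self.dirty.add(name)

    def update(self) -> None:
        """Start an update: only names changed since the last one need copying."""
        self.pending |= self.dirty
        self.dirty.clear()

    def _sync_one(self, name: str) -> None:
        if name in self.source:
            self.target[name] = self.source[name]
        else:
            self.target.pop(name, None)  # deleted on the source side
        self.pending.discard(name)

    def read_target(self, name: str) -> Optional[bytes]:
        # A name still pending is brought up to date on access, so readers
        # see the updated version even while background resync continues.
        if name in self.pending:
            self._sync_one(name)
        return self.target.get(name)

    def resync_step(self, budget: int = 1) -> int:
        """Background resync of up to `budget` pending names; returns how many remain."""
        for name in sorted(self.pending)[:budget]:
            self._sync_one(name)
        return len(self.pending)
```

The difference from plain mirroring is that the target may diverge and be written to between updates; when an update starts, only names in the dirty set are copied.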

Thanks
Ed Tomlinson


Re: [ANNOUNCE] RAIF: Redundant Array of Independent Filesystems

2006-12-15 Thread Nikolai Joukov
 The idea behind the cloneset is that most of the blocks (or files)
 do not change in either source or target.  This being the case, it's only
 necessary to update the changed elements.  This means updates are
 incremental.  Once the system has figured out what it needs to update, it's
 usable, and if you access an element that should be updated you will see
 the correctly updated version, even though background resyncing is still
 in progress.

 I still can't tell what you're describing.  With RAID1 as well, only
 changed elements ever get updated.  I have two identical filesystems,
 members of a RAIF set.  I change one file.  One file in each member
 filesystem gets updated, and I again have two identical filesystems.

 How would a cloneset work differently, and how would it be better?

Thanks, Bryan.  I was about to write almost the same.

  This type of logic is great for backups.

 Can you give an example of using it for backup?

I guess you can mount Versionfs (yet another stackable file system)
between RAIF and one of the lower file systems, or use some other
versioning file system such as ext3cow.  This will allow rolling back to
any older file system version at any time.

Nikolai.
-
Nikolai Joukov, Ph.D.
Filesystems and Storage Laboratory
Stony Brook University


Re: [ANNOUNCE] RAIF: Redundant Array of Independent Filesystems

2006-12-14 Thread Al Boldi
Nikolai Joukov wrote:
> > Nikolai Joukov wrote:
> > > We have designed a new stackable file system that we called RAIF:
> > > Redundant Array of Independent Filesystems.
> >
> > Great!
> >
> > > We have performed some benchmarking on a 3GHz PC with 2GB of RAM and
> > > U320 SCSI disks.  Compared to the Linux RAID driver, RAIF has
> > > overheads of about 20-25% under the Postmark v1.5 benchmark in case of
> > > striping and replication.  In case of RAID4 and RAID5-like
> > > configurations, RAIF performed about two times *better* than software
> > > RAID and even better than an Adaptec 2120S RAID5 controller.
> >
> > I am not surprised.  RAID 4/5/6 performance is highly sensitive to the
> > underlying hw, and thus needs a fair amount of fine tuning.
>
> Nevertheless, performance is not the biggest advantage of RAIF.  For
> read-biased workloads RAID is always slightly faster than RAIF.  The
> biggest advantages of RAIF are flexible configurations (e.g., can combine
> NFS and local file systems), per-file-type storage policies, and the fact
> that files are stored as files on the lower file systems (which is
> convenient).

OK, I was just about to tell you about a three-NFS-branch RAIF that was
unable to fill the net pipe.  So it looks like a 25% performance hit across
the board.  It should be possible to reduce that to under 3% once RAIF
matures, though, don't you think?


> > > This is because RAIF is located above
> > > file system caches and can cache parity as normal data when needed. 
> > > We have more performance details in a technical report, if anyone is
> > > interested.
> >
> > Definitely interested.  Can you give a link?
>
> The main focus of the paper is on a general OS profiling method and not
> on RAIF.  However, it has some details about the RAIF benchmarking with
> Postmark in Chapter 9:
>
>   http://www.fsl.cs.sunysb.edu/docs/joukov-phdthesis/thesis.pdf
>
> Figures 9.7 and 9.8 also show profiles of the Linux RAID5 and RAIF5
> operation under the same Postmark workload.


Thanks!

--
Al



Re: [ANNOUNCE] RAIF: Redundant Array of Independent Filesystems

2006-12-14 Thread Al Boldi
Nikolai Joukov wrote:
> > > We started the project in April 2004.  Right now I am using it as my
> > > /home/kolya file system at home.  We believe that at this stage RAIF
> > > is mature enough for others to try it out.  The code is available at:
> > >
> > >   ftp://ftp.fsl.cs.sunysb.edu/pub/raif/
> > >
> > > The code requires no kernel patches and compiles for a wide range of
> > > kernels as a module.  The latest kernel we used it for is 2.6.13 and
> > > we are in the process of porting it to 2.6.19.
> > >
> > > We will be happy to hear back from you.
> >
> > When removing a file from the underlying branch, the oops below happens.
> > Wouldn't it be possible to just fail the branch instead of oopsing?
>
> This is a known problem of all Linux stackable file systems.  Users are
> not supposed to change the file systems below mounted stackable file
> systems (but they can read them).  One of the ways to enforce it is to use
> overlay mounts.  For example, mount the lower file systems at
> /raif/b0 ... /raif/bN and then mount RAIF at /raif.  Stackable file
> systems recently started getting into the kernel and we hope that there
> will be a better solution for this problem in the future.  Having said
> that, you are right: failing the branch would be the right thing to do.

Good.  It seems that there is also some tmpfs/raif-over-nfs deadlock 
situation.  Can't really tell if it's the kernel or the raif, but when do 
you think the patches could be brought into sync with the current mainline?


Thanks!

--
Al



Re: [ANNOUNCE] RAIF: Redundant Array of Independent Filesystems

2006-12-14 Thread Nikolai Joukov
> Well, Congratulations, Doctor!!  [Must be nice to be exiled to Stony
> Brook!!  Oh, well, not I]

Long Island is a very nice place with lots of vineyards and perfect sandy
beaches, so don't be envious :-)

> Here's hoping that source exists, and that it is available for us.

I guess you are subscribed only to the linux-raid list.  Unfortunately, I
didn't CC my post to that list, and one of the replies was CC'd there
without the link.  The original post is available here:

  

And the link to the sources is:

  

Nikolai.
-
Nikolai Joukov, Ph.D.
Filesystems and Storage Laboratory
Stony Brook University


Re: [ANNOUNCE] RAIF: Redundant Array of Independent Filesystems

2006-12-14 Thread Nikolai Joukov
> > We started the project in April 2004.  Right now I am using it as my
> > /home/kolya file system at home.  We believe that at this stage RAIF is
> > mature enough for others to try it out.  The code is available at:
> >
> >   ftp://ftp.fsl.cs.sunysb.edu/pub/raif/
> >
> > The code requires no kernel patches and compiles for a wide range of
> > kernels as a module.  The latest kernel we used it for is 2.6.13 and we
> > are in the process of porting it to 2.6.19.
> >
> > We will be happy to hear back from you.
>
> When removing a file from the underlying branch, the oops below happens.
> Wouldn't it be possible to just fail the branch instead of oopsing?

This is a known problem of all Linux stackable file systems.  Users are
not supposed to change the file systems below mounted stackable file
systems (but they can read them).  One of the ways to enforce it is to use
overlay mounts.  For example, mount the lower file systems at
/raif/b0 ... /raif/bN and then mount RAIF at /raif.  Stackable file
systems recently started getting into the kernel and we hope that there
will be a better solution for this problem in the future.  Having said
that, you are right: failing the branch would be the right thing to do.
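The overlay-mount workaround relies only on ordering: each lower file system is mounted at a subdirectory of the future RAIF mount point, and RAIF is mounted over that directory last, so the branch directories end up hidden behind RAIF's view of /raif. The sketch below just computes such a plan; the device names are hypothetical, no mount is performed, and RAIF's own mount options are deliberately left out because they are not given here.

```python
import posixpath
from typing import List, Tuple


def overlay_plan(devices: List[str], top: str = "/raif") -> List[Tuple[str, str]]:
    """Return (what, where) mount steps: branches under `top` first, RAIF over `top` last."""
    top = top.rstrip("/") or "/"
    plan = [(dev, posixpath.join(top, f"b{i}")) for i, dev in enumerate(devices)]
    plan.append(("raif", top))  # mounting RAIF here hides /raif/b0 ... /raif/bN
    return plan


def branches_hidden(plan: List[Tuple[str, str]]) -> bool:
    """True if every branch mount point lies under the directory RAIF is mounted on last."""
    top = plan[-1][1]
    prefix = top if top.endswith("/") else top + "/"
    return all(where.startswith(prefix) for _, where in plan[:-1])
```

For example, `overlay_plan(["/dev/sdb1", "/dev/sdc1"])` yields `/raif/b0`, `/raif/b1` and finally `/raif` for RAIF itself; a branch mounted elsewhere (say under `/mnt`) would stay writable by path and make `branches_hidden` return `False`.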

Nikolai.
-
Nikolai Joukov, Ph.D.
Filesystems and Storage Laboratory
Stony Brook University


Re: [ANNOUNCE] RAIF: Redundant Array of Independent Filesystems

2006-12-14 Thread berk walker



Nikolai Joukov wrote:


  http://www.fsl.cs.sunysb.edu/docs/joukov-phdthesis/thesis.pdf

Figures 9.7 and 9.8 also show profiles of the Linux RAID5 and RAIF5
operation under the same Postmark workload.

Nikolai.
-
Nikolai Joukov, Ph.D.
Filesystems and Storage Laboratory
Stony Brook University

  


Well, Congratulations, Doctor!!  [Must be nice to be exiled to Stony 
Brook!!  Oh, well, not I]


For some reason, I cannot connect to the above link, but I may not need 
to.  Does [should] it contain a link/pointer to the underlying source 
code?  This concept sounds very interesting, and I am sure that many of 
us would like to look closer, and maybe even get a taste.



Here's hoping that source exists, and that it is available for us.

Thanks
b-

  


Re: [ANNOUNCE] RAIF: Redundant Array of Independent Filesystems

2006-12-14 Thread Charles Manning
On Friday 15 December 2006 10:01, Nikolai Joukov wrote:
> > Nikolai Joukov wrote:
> > > We have designed a new stackable file system that we called RAIF:
> > > Redundant Array of Independent Filesystems.
> >
> > Great!

Yes, definitely...

I see the major benefit being in the mobile, industrial and embedded systems
arena. Perhaps this might come as a surprise to people, but a very large and
ever-growing number (perhaps even most) of Linux devices don't use block
devices for storage. Instead they use flash file systems or NFS, neither of
which use local block devices.

It looks like RAIF gives a way to provide redundancy etc. on these devices.


> >
> > > We have performed some benchmarking on a 3GHz PC with 2GB of RAM and
> > > U320 SCSI disks.  Compared to the Linux RAID driver, RAIF has overheads
> > > of about 20-25% under the Postmark v1.5 benchmark in case of striping
> > > and replication.  In case of RAID4 and RAID5-like configurations, RAIF
> > > performed about two times *better* than software RAID and even better
> > > than an Adaptec 2120S RAID5 controller.
> >
> > I am not surprised.  RAID 4/5/6 performance is highly sensitive to the
> > underlying hw, and thus needs a fair amount of fine tuning.
>
> Nevertheless, performance is not the biggest advantage of RAIF.  For
> read-biased workloads RAID is always slightly faster than RAIF.  The
> biggest advantages of RAIF are flexible configurations (e.g., can combine
> NFS and local file systems), per-file-type storage policies, and the fact
> that files are stored as files on the lower file systems (which is
> convenient).
>
> > > This is because RAIF is located above
> > > file system caches and can cache parity as normal data when needed.  We
> > > have more performance details in a technical report, if anyone is
> > > interested.
> >
> > Definitely interested.  Can you give a link?
>
> The main focus of the paper is on a general OS profiling method and not
> on RAIF.  However, it has some details about the RAIF benchmarking with
> Postmark in Chapter 9:
>
>   http://www.fsl.cs.sunysb.edu/docs/joukov-phdthesis/thesis.pdf
>
> Figures 9.7 and 9.8 also show profiles of the Linux RAID5 and RAIF5
> operation under the same Postmark workload.
>
> Nikolai.
> -
> Nikolai Joukov, Ph.D.
> Filesystems and Storage Laboratory
> Stony Brook University


Re: [ANNOUNCE] RAIF: Redundant Array of Independent Filesystems

2006-12-14 Thread Nikolai Joukov
> Nikolai Joukov wrote:
> > We have designed a new stackable file system that we called RAIF:
> > Redundant Array of Independent Filesystems.
>
> Great!
>
> > We have performed some benchmarking on a 3GHz PC with 2GB of RAM and U320
> > SCSI disks.  Compared to the Linux RAID driver, RAIF has overheads of
> > about 20-25% under the Postmark v1.5 benchmark in case of striping and
> > replication.  In case of RAID4 and RAID5-like configurations, RAIF
> > performed about two times *better* than software RAID and even better than
> > an Adaptec 2120S RAID5 controller.
>
> I am not surprised.  RAID 4/5/6 performance is highly sensitive to the
> underlying hw, and thus needs a fair amount of fine tuning.

Nevertheless, performance is not the biggest advantage of RAIF.  For
read-biased workloads RAID is always slightly faster than RAIF.  The
biggest advantages of RAIF are flexible configurations (e.g., can combine
NFS and local file systems), per-file-type storage policies, and the fact
that files are stored as files on the lower file systems (which is
convenient).

> > This is because RAIF is located above
> > file system caches and can cache parity as normal data when needed.  We
> > have more performance details in a technical report, if anyone is
> > interested.
>
> Definitely interested.  Can you give a link?

The main focus of the paper is on a general OS profiling method and not
on RAIF.  However, it has some details about the RAIF benchmarking with
Postmark in Chapter 9:

  http://www.fsl.cs.sunysb.edu/docs/joukov-phdthesis/thesis.pdf

Figures 9.7 and 9.8 also show profiles of the Linux RAID5 and RAIF5
operation under the same Postmark workload.

Nikolai.
-
Nikolai Joukov, Ph.D.
Filesystems and Storage Laboratory
Stony Brook University


Re: [ANNOUNCE] RAIF: Redundant Array of Independent Filesystems

2006-12-14 Thread Nikolai Joukov
 Nikolai Joukov wrote:
  We have designed a new stackable file system that we called RAIF:
  Redundant Array of Independent Filesystems.

 Great!

  We have performed some benchmarking on a 3GHz PC with 2GB of RAM and U320
  SCSI disks.  Compared to the Linux RAID driver, RAIF has overheads of
  about 20-25% under the Postmark v1.5 benchmark in case of striping and
  replication.  In case of RAID4 and RAID5-like configurations, RAIF
  performed about two times *better* than software RAID and even better than
  an Adaptec 2120S RAID5 controller.

 I am not surprised.  RAID 4/5/6 performance is highly sensitive to the
 underlying hw, and thus needs a fair amount of fine tuning.

Nevertheless, performance is not the biggest advantage of RAIF.  For
read-biased workloads RAID is always slightly faster than RAIF.  The
biggest advantages of RAIF are flexible configurations (e.g., can combine
NFS and local file systems), per-file-type storage policies, and the fact
that files are stored as files on the lower file systems (which is
convenient).

  This is because RAIF is located above
  file system caches and can cache parity as normal data when needed.  We
  have more performance details in a technical report, if anyone is
  interested.

 Definitely interested.  Can you give a link?

The main focus of the paper is on a general OS profiling method and not
on RAIF.  However, it has some details about the RAIF benchmarking with
Postmark in Chapter 9:

  http://www.fsl.cs.sunysb.edu/docs/joukov-phdthesis/thesis.pdf

Figures 9.7 and 9.8 also show profiles of the Linux RAID5 and RAIF5
operation under the same Postmark workload.

Nikolai.
-
Nikolai Joukov, Ph.D.
Filesystems and Storage Laboratory
Stony Brook University
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [ANNOUNCE] RAIF: Redundant Array of Independent Filesystems

2006-12-14 Thread Charles Manning
On Friday 15 December 2006 10:01, Nikolai Joukov wrote:
  Nikolai Joukov wrote:
   We have designed a new stackable file system that we called RAIF:
   Redundant Array of Independent Filesystems.
 
  Great!

Yes, definitely...

I see the major benefit being in the mobile, industrial and embedded systems 
arena. Perhaps this might come as a suprise to people, but a very large and 
ever growing number (perhaps even most) Linux devices don't use block devices 
for storage. Instead they use flash file systems or nfs, niether of which use 
local block devices.

It looks like RAIF gives a way to provide redundancy etc on these devices.


 
   We have performed some benchmarking on a 3GHz PC with 2GB of RAM and
   U320 SCSI disks.  Compared to the Linux RAID driver, RAIF has overheads
   of about 20-25% under the Postmark v1.5 benchmark in case of striping
   and replication.  In case of RAID4 and RAID5-like configurations, RAIF
   performed about two times *better* than software RAID and even better
   than an Adaptec 2120S RAID5 controller.
 
  I am not surprised.  RAID 4/5/6 performance is highly sensitive to the
  underlying hw, and thus needs a fair amount of fine tuning.

 Nevertheless, performance is not the biggest advantage of RAIF.  For
 read-biased workloads RAID is always slightly faster than RAIF.  The
 biggest advantages of RAIF are flexible configurations (e.g., can combine
 NFS and local file systems), per-file-type storage policies, and the fact
 that files are stored as files on the lower file systems (which is
 convenient).

   This is because RAIF is located above
   file system caches and can cache parity as normal data when needed.  We
   have more performance details in a technical report, if anyone is
   interested.
 
  Definitely interested.  Can you give a link?

 The main focus of the paper is on a general OS profiling method and not
 on RAIF.  However, it has some details about the RAIF benchmarking with
 Postmark in Chapter 9:

   http://www.fsl.cs.sunysb.edu/docs/joukov-phdthesis/thesis.pdf

 Figures 9.7 and 9.8 also show profiles of the Linux RAID5 and RAIF5
 operation under the same Postmark workload.

 Nikolai.
 -
 Nikolai Joukov, Ph.D.
 Filesystems and Storage Laboratory
 Stony Brook University
 -
 To unsubscribe from this list: send the line unsubscribe linux-fsdevel in
 the body of a message to [EMAIL PROTECTED]
 More majordomo info at  http://vger.kernel.org/majordomo-info.html
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [ANNOUNCE] RAIF: Redundant Array of Independent Filesystems

2006-12-14 Thread berk walker

Nikolai Joukov wrote:
>   http://www.fsl.cs.sunysb.edu/docs/joukov-phdthesis/thesis.pdf
>
> Figures 9.7 and 9.8 also show profiles of the Linux RAID5 and RAIF5
> operation under the same Postmark workload.
>
> Nikolai.
> -
> Nikolai Joukov, Ph.D.
> Filesystems and Storage Laboratory
> Stony Brook University

Well, Congratulations, Doctor!!  [Must be nice to be exiled to Stony
Brook!!  Oh, well, not I]

For some reason, I cannot connect to the above link, but I may not need
to.  Does [should] it contain a link/pointer to the underlying source
code?  This concept sounds very interesting, and I am sure that many of
us would like to look closer, and maybe even get a taste.

Here's hoping that source exists, and that it is available for us.

Thanks
b-



Re: [ANNOUNCE] RAIF: Redundant Array of Independent Filesystems

2006-12-14 Thread Nikolai Joukov
> > We started the project in April 2004.  Right now I am using it as my
> > /home/kolya file system at home.  We believe that at this stage RAIF is
> > mature enough for others to try it out.  The code is available at:
> >
> > ftp://ftp.fsl.cs.sunysb.edu/pub/raif/
> >
> > The code requires no kernel patches and compiles for a wide range of
> > kernels as a module.  The latest kernel we used it for is 2.6.13 and we
> > are in the process of porting it to 2.6.19.
> >
> > We will be happy to hear your back.
>
> When removing a file from the underlying branch, the oops below happens.
> Wouldn't it be possible to just fail the branch instead of oopsing?

This is a known problem of all Linux stackable file systems.  Users are
not supposed to change the file systems below mounted stackable file
systems (but they can read them).  One of the ways to enforce it is to use
overlay mounts.  For example, mount the lower file systems at
/raif/b0 ... /raif/bN and then mount RAIF at /raif.  Stackable file
systems recently started getting into the kernel and we hope that there
will be a better solution for this problem in the future.  Having said
that, you are right: failing the branch would be the right thing to do.
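The following toy Python model of path lookup with stacked mounts shows why the overlay mount keeps users away from the branches. It simplifies by assuming that the newest mount covering a path wins, which matches the simple cases here. `MountTable` and the file system labels are hypothetical, and nothing in it talks to RAIF or the kernel.

```python
import posixpath


class MountTable:
    """Toy model: a newer mount covering a path shadows older mounts below it."""

    def __init__(self):
        self.mounts = []  # (mount_point, fs_name) pairs in mount order

    def mount(self, mount_point, fs_name):
        self.mounts.append((posixpath.normpath(mount_point), fs_name))

    def resolve(self, path):
        """Name of the file system a new lookup of `path` ends up in."""
        path = posixpath.normpath(path)
        # Newest mounts first: the first one whose mount point covers the path wins.
        for mount_point, fs_name in reversed(self.mounts):
            if mount_point == "/" or path == mount_point or path.startswith(mount_point + "/"):
                return fs_name
        return None


table = MountTable()
table.mount("/", "rootfs")
table.mount("/raif/b0", "ext3 branch")
table.mount("/raif/b1", "nfs branch")
before = table.resolve("/raif/b0/notes.txt")  # "ext3 branch": branch still reachable
table.mount("/raif", "raif")                   # overlay mount on top of the branches
after = table.resolve("/raif/b0/notes.txt")   # "raif": the path now lands inside RAIF
```

In this model, once RAIF is mounted at /raif, a lookup of /raif/b0/... reaches RAIF instead of the branch. Overmounting only affects new path lookups. A process that already holds an open file or working directory inside a branch can still modify it.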

Nikolai.
-
Nikolai Joukov, Ph.D.
Filesystems and Storage Laboratory
Stony Brook University


Re: [ANNOUNCE] RAIF: Redundant Array of Independent Filesystems

2006-12-14 Thread Nikolai Joukov
> Well, Congratulations, Doctor!!  [Must be nice to be exiled to Stony
> Brook!!  Oh, well, not I]

Long Island is a very nice place with lots of vineyards and perfect sandy
beaches - don't envy :-)

> Here's hoping that source exists, and that it is available for us.

I guess you are subscribed to the linux-raid list only.  Unfortunately, I
didn't CC my post to that list, and one of the replies was CC'd there
without the link.  The original post is available here:

  http://marc.theaimsgroup.com/?l=linux-fsdevel&m=116603282106036&w=2

And the link to the sources is:

  ftp://ftp.fsl.cs.sunysb.edu/pub/raif/

Nikolai.
-
Nikolai Joukov, Ph.D.
Filesystems and Storage Laboratory
Stony Brook University


Re: [ANNOUNCE] RAIF: Redundant Array of Independent Filesystems

2006-12-14 Thread Al Boldi
Nikolai Joukov wrote:
> > > We started the project in April 2004.  Right now I am using it as my
> > > /home/kolya file system at home.  We believe that at this stage RAIF
> > > is mature enough for others to try it out.  The code is available at:
> > >
> > > ftp://ftp.fsl.cs.sunysb.edu/pub/raif/
> > >
> > > The code requires no kernel patches and compiles for a wide range of
> > > kernels as a module.  The latest kernel we used it for is 2.6.13 and
> > > we are in the process of porting it to 2.6.19.
> > >
> > > We will be happy to hear your back.
> >
> > When removing a file from the underlying branch, the oops below happens.
> > Wouldn't it be possible to just fail the branch instead of oopsing?
>
> This is a known problem of all Linux stackable file systems.  Users are
> not supposed to change the file systems below mounted stackable file
> systems (but they can read them).  One of the ways to enforce it is to use
> overlay mounts.  For example, mount the lower file systems at
> /raif/b0 ... /raif/bN and then mount RAIF at /raif.  Stackable file
> systems recently started getting into the kernel and we hope that there
> will be a better solution for this problem in the future.  Having said
> that, you are right: failing the branch would be the right thing to do.

Good.  It seems there is also some kind of tmpfs/RAIF-over-NFS deadlock.
I can't really tell whether it's the kernel or RAIF, but when do you think
the patches could be brought into sync with the current mainline?

Thanks!

--
Al



Re: [ANNOUNCE] RAIF: Redundant Array of Independent Filesystems

2006-12-14 Thread Al Boldi
Nikolai Joukov wrote:
> > Nikolai Joukov wrote:
> > > We have designed a new stackable file system that we called RAIF:
> > > Redundant Array of Independent Filesystems.
> >
> > Great!
> >
> > > We have performed some benchmarking on a 3GHz PC with 2GB of RAM and
> > > U320 SCSI disks.  Compared to the Linux RAID driver, RAIF has
> > > overheads of about 20-25% under the Postmark v1.5 benchmark in case of
> > > striping and replication.  In case of RAID4 and RAID5-like
> > > configurations, RAIF performed about two times *better* than software
> > > RAID and even better than an Adaptec 2120S RAID5 controller.
> >
> > I am not surprised.  RAID 4/5/6 performance is highly sensitive to the
> > underlying hw, and thus needs a fair amount of fine tuning.
>
> Nevertheless, performance is not the biggest advantage of RAIF.  For
> read-biased workloads RAID is always slightly faster than RAIF.  The
> biggest advantages of RAIF are flexible configurations (e.g., can combine
> NFS and local file systems), per-file-type storage policies, and the fact
> that files are stored as files on the lower file systems (which is
> convenient).

Ok, I was just about to tell you about a three-NFS-branch RAIF that was
unable to fill the network pipe.  So it looks like a 25% performance hit
across the board.  It should be possible to reduce that to under 3% once
RAIF matures, though, don't you think?

> > > This is because RAIF is located above
> > > file system caches and can cache parity as normal data when needed.
> > > We have more performance details in a technical report, if anyone is
> > > interested.
> >
> > Definitely interested.  Can you give a link?
>
> The main focus of the paper is on a general OS profiling method and not
> on RAIF.  However, it has some details about the RAIF benchmarking with
> Postmark in Chapter 9:
>
>   http://www.fsl.cs.sunysb.edu/docs/joukov-phdthesis/thesis.pdf
>
> Figures 9.7 and 9.8 also show profiles of the Linux RAID5 and RAIF5
> operation under the same Postmark workload.

Thanks!

--
Al



Re: [ANNOUNCE] RAIF: Redundant Array of Independent Filesystems

2006-12-13 Thread Nikolai Joukov
> >We have designed a new stackable file system that we called RAIF:
> >Redundant Array of Independent Filesystems.
> >
> >Similar to Unionfs, RAIF is a fan-out file system and can be mounted over
> >many different disk-based, memory, network, and distributed file systems.
> >RAIF can use the stable and maintained code of the other file systems and
> >thus stay simple itself.  Similar to standard RAID, RAIF can replicate the
> >data or store it with parity on any subset of the lower file systems.  RAIF
> >has three main advantages over traditional driver-level RAID systems:
> >
> >1. RAIF can be mounted over any set of file systems.  This allows users to
> >   create many more useful configurations.  For example, it is possible to
> >   replicate the data on the local and remote disks, and stripe the data on
> >   the local hard drives and keep the parity (or even ECC to tolerate
> >   multiple failures) on the remote server(s).  In the latter case, all the
> >   read requests will be satisfied from the fast local disks and no local
> >   disk space will be spent on parity.
>
> As for striping on a simplistic level, look at the Equal File
> Distribution patch for unionfs :-)
>
> http://www.mail-archive.com/unionfs@mail.fsl.cs.sunysb.edu/msg01936.html
>
> Files are stored normally so that after the union is unmounted, the
> files appear in one piece (unlike real RAID0 over two block devices).

RAIF supports rules that describe how to store particular files or groups
of files.  A rule with RAIF level 0 (which is similar to RAID level 0) and
a special striping unit size = '-1' will do the same (distribute the
files on the lower file systems) for files that match any given file name
pattern.  A rule with level 4 and striping unit size = '-1' will
distribute files on several file systems and store an extra copy of the
files on a dedicated file system (e.g., an NFS mount with lots of space).
Now guess what RAIF's level 6 will do with a special striping unit
size = '-1' :-)
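Below is a minimal Python sketch of the rule dispatch described above. Several parts are hypothetical illustrations, not RAIF's actual rule syntax or placement policy:

* the `Rule` record,
* the first-match-wins ordering,
* the CRC32-based choice of a branch.

It models only the two cases spelled out here: level 0 and level 4 with a striping unit of -1, meaning whole files.

```python
import fnmatch
import zlib
from dataclasses import dataclass

WHOLE_FILE = -1  # special striping unit size: keep each file in one piece


@dataclass
class Rule:
    pattern: str      # shell-style file name pattern, e.g. "*.mp3"
    level: int        # RAIF level (only 0 and 4 are modelled here)
    stripe_unit: int  # striping unit in bytes, or WHOLE_FILE


def match_rule(rules, name):
    """Return the first rule whose pattern matches the file name."""
    for rule in rules:
        if fnmatch.fnmatch(name, rule.pattern):
            return rule
    raise LookupError(f"no rule matches {name!r}")


def place_whole_file(rule, name, data_branches, parity_branch=None):
    """Branches that receive a complete copy of `name` (stripe_unit == -1 only)."""
    if rule.stripe_unit != WHOLE_FILE:
        raise ValueError("only the whole-file case is modelled here")
    # Hypothetical placement: a stable hash of the name picks one data branch.
    home = data_branches[zlib.crc32(name.encode()) % len(data_branches)]
    if rule.level == 0:
        return [home]                 # files are distributed, no redundancy
    if rule.level == 4:
        if parity_branch is None:
            raise ValueError("level 4 needs a dedicated parity branch")
        # XOR parity of a stripe holding a single unit equals that unit,
        # so the "parity" of a whole file is simply an extra copy of it.
        return [home, parity_branch]
    raise NotImplementedError(f"level {rule.level} is not modelled")
```

Consider the rules `[Rule("*.mp3", 0, WHOLE_FILE), Rule("*", 4, WHOLE_FILE)]`. Under them, an MP3 file lands on exactly one data branch. Every other file lands on one data branch plus the dedicated parity branch.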

Nikolai.

Nikolai Joukov, Ph.D.
Filesystems and Storage Laboratory
Stony Brook University


Re: [ANNOUNCE] RAIF: Redundant Array of Independent Filesystems

2006-12-13 Thread Jan Engelhardt

>We have designed a new stackable file system that we called RAIF:
>Redundant Array of Independent Filesystems.
>
>Similar to Unionfs, RAIF is a fan-out file system and can be mounted over
>many different disk-based, memory, network, and distributed file systems.
>RAIF can use the stable and maintained code of the other file systems and
>thus stay simple itself.  Similar to standard RAID, RAIF can replicate the
>data or store it with parity on any subset of the lower file systems.  RAIF
>has three main advantages over traditional driver-level RAID systems:
>
>1. RAIF can be mounted over any set of file systems.  This allows users to
>   create many more useful configurations.  For example, it is possible to
>   replicate the data on the local and remote disks, and stripe the data on
>   the local hard drives and keep the parity (or even ECC to tolerate
>   multiple failures) on the remote server(s).  In the latter case, all the
>   read requests will be satisfied from the fast local disks and no local
>   disk space will be spent on parity.

As for striping on a simplistic level, look at the Equal File 
Distribution patch for unionfs :-)

http://www.mail-archive.com/unionfs@mail.fsl.cs.sunysb.edu/msg01936.html

Files are stored normally so that after the union is unmounted, the 
files appear in one piece (unlike real RAID0 over two block devices).
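The following minimal Python sketch makes the contrast concrete. It shows textbook RAID0-style striping arithmetic: fixed-size striping units placed round-robin over N branches. The function names are hypothetical, and the layout is the generic one, not necessarily RAIF's exact format on the lower file systems.

```python
def stripe_location(offset, stripe_unit, n_branches):
    """Map a logical file offset to (branch index, offset inside that branch's file)."""
    unit_index, within_unit = divmod(offset, stripe_unit)
    branch = unit_index % n_branches   # units are dealt round-robin over branches
    row = unit_index // n_branches     # full rows of units that precede this one
    return branch, row * stripe_unit + within_unit


def branches_touched(file_size, stripe_unit, n_branches):
    """Set of branch indices holding some part of a file of `file_size` bytes."""
    units = -(-file_size // stripe_unit)  # ceiling division
    return {u % n_branches for u in range(units)}
```

Take a 4096-byte striping unit over two branches. `stripe_location(10000, 4096, 2)` returns `(0, 5904)`, and a 10000-byte file touches both branches (`{0, 1}`), so neither branch holds a readable copy on its own. Whole-file distribution instead keeps each file in one piece on a single branch.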


-`J'
-- 


Re: [ANNOUNCE] RAIF: Redundant Array of Independent Filesystems

2006-12-13 Thread Nikolai Joukov
> > Nikolai Joukov wrote:
> > > replication.  In case of RAID4 and RAID5-like configurations, RAIF 
> > > performed
> > > about two times *better* than software RAID and even better than an 
> > > Adaptec
> > > 2120S RAID5 controller.  This is because RAIF is located above file system
> > > caches and can cache parity as normal data when needed.  We have more
> > > performance details in a technical report, if anyone is interested.
> >
> > This doesn't make sense to me.  You do not want to cache the parity
> > data.  It only needs to be used to validate the data blocks when the
> > stripe is read, and after that, you only want to cache the data, and
> > throw out the parity.  Caching the parity as well will pollute the cache
> > and thus, should lower performance due to more important data being
> > thrown out.
>
> This happens automatically: unused parity pages are treated as unused
> pages and get reused to cache something else.  Also, the parity
> never gets cached if you do not write the data (or recover the data).
> However, if you use the same parity page over and over you do not need to
> fetch it from the disk again.

To avoid confusion here: data recovery is not the only situation in which
the parity must be read.  Existing parity is also needed for writes that
are smaller than the page size.
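The standard RAID-5 read-modify-write identity shows why a small write needs the existing parity:

new parity = old parity XOR old data XOR new data

Below is a minimal Python sketch of it over byte strings. The function names are hypothetical, and it illustrates the general technique, not RAIF's code.

```python
def xor_bytes(a, b):
    """Byte-wise XOR of two equal-length buffers."""
    if len(a) != len(b):
        raise ValueError("buffers must be the same length")
    return bytes(x ^ y for x, y in zip(a, b))


def full_parity(units):
    """Parity of a whole stripe: XOR of all of its data units."""
    parity = bytes(len(units[0]))
    for unit in units:
        parity = xor_bytes(parity, unit)
    return parity


def small_write_parity(old_parity, old_data, new_data, offset):
    """Parity after writing `new_data` at `offset` inside one data unit.

    Only the touched byte range changes: old_parity ^ old_data ^ new_data.
    Both the old parity and the old data must be read first.
    """
    end = offset + len(new_data)
    delta = xor_bytes(old_data[offset:end], new_data)
    patched = xor_bytes(old_parity[offset:end], delta)
    return old_parity[:offset] + patched + old_parity[end:]
```

The alternative is to recompute parity from every data unit in the stripe, which means reading from all the other branches. If the parity page is already cached, the read-modify-write path avoids a disk read.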

Nikolai.
-
Nikolai Joukov, Ph.D.
Filesystems and Storage Laboratory
Stony Brook University


Re: [ANNOUNCE] RAIF: Redundant Array of Independent Filesystems

2006-12-13 Thread Nikolai Joukov
> Nikolai Joukov wrote:
> > replication.  In case of RAID4 and RAID5-like configurations, RAIF performed
> > about two times *better* than software RAID and even better than an Adaptec
> > 2120S RAID5 controller.  This is because RAIF is located above file system
> > caches and can cache parity as normal data when needed.  We have more
> > performance details in a technical report, if anyone is interested.
>
> This doesn't make sense to me.  You do not want to cache the parity
> data.  It only needs to be used to validate the data blocks when the
> stripe is read, and after that, you only want to cache the data, and
> throw out the parity.  Caching the parity as well will pollute the cache
> and thus, should lower performance due to more important data being
> thrown out.

This happens automatically: unused parity pages are treated as unused
pages and get reused to cache something else.  Also, the parity
never gets cached if you do not write the data (or recover the data).
However, if you use the same parity page over and over you do not need to
fetch it from the disk again.

By the way, unlike most other stackable file systems, RAIF does not cache
the data (or parity) multiple times: it only caches the data at its own
level and not at the level of the lower file systems.
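An ordinary LRU page cache, in which parity pages are just more keys, illustrates the point that unused parity pages are simply reclaimed. This minimal Python sketch assumes:

* strict LRU eviction (the Linux page cache actually uses active/inactive lists),
* a hypothetical `read_page` callback standing in for the lower file system.

```python
from collections import OrderedDict


class PageCache:
    """Toy LRU page cache; data pages and parity pages are treated identically."""

    def __init__(self, capacity, read_page):
        self.capacity = capacity
        self.read_page = read_page   # called on a miss: key -> page contents
        self.pages = OrderedDict()
        self.misses = 0

    def get(self, key):
        if key in self.pages:
            self.pages.move_to_end(key)      # mark as most recently used
            return self.pages[key]
        self.misses += 1
        page = self.read_page(key)           # fetch from the lower file system
        self.pages[key] = page
        if len(self.pages) > self.capacity:
            self.pages.popitem(last=False)   # evict the least recently used page
        return page
```

Repeated writes to the same stripe hit the cached parity page and need no further reads. A parity page that stops being used ages out like any other page.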

Nikolai.
-
Nikolai Joukov, Ph.D.
Filesystems and Storage Laboratory
Stony Brook University


Re: [ANNOUNCE] RAIF: Redundant Array of Independent Filesystems

2006-12-13 Thread Phillip Susi

Nikolai Joukov wrote:
> replication.  In case of RAID4 and RAID5-like configurations, RAIF performed
> about two times *better* than software RAID and even better than an Adaptec
> 2120S RAID5 controller.  This is because RAIF is located above file system
> caches and can cache parity as normal data when needed.  We have more
> performance details in a technical report, if anyone is interested.

This doesn't make sense to me.  You do not want to cache the parity 
data.  It only needs to be used to validate the data blocks when the 
stripe is read, and after that, you only want to cache the data, and 
throw out the parity.  Caching the parity as well will pollute the cache 
and thus, should lower performance due to more important data being 
thrown out.


