Re: interesting use case for multiple devices and delayed raid?

2009-04-02 Chris Mason
On Thu, 2009-04-02 at 16:41 +1100, Dmitri Nikulin wrote:
 On Thu, Apr 2, 2009 at 8:04 AM, Brian J. Murrell br...@interlinx.bc.ca wrote:
  A more complete solution, that requires no software changes, would be
  to have 3 or 4 disks. A stripe for really fast reads and writes, and
  another disk (or another stripe) to act as a slave to the data being
  written to the primary stripe. This seems to do what you want, at a
  small price premium.
 
  No.  That's not really what I am describing at all.
 
 Well you get the bandwidth of 2 disks when reading and writing, and
 still mirrored to a second stripe as time permits. Kind of like
 delayed RAID10.
 
  I apologize if my original description was unclear.  Hopefully it is
  more so now.
 
 Yes. It'll be up to the actual filesystem devs to weigh in on whether
 it's worth implementing.
 

It's an interesting idea, but I think we've got fast front-end devices
higher up on the to-do list.  That will still support the idea of
destaging to slower disks, but will be more flexible overall.
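The "fast front-end device with destaging" idea is essentially a write-back cache tier: writes complete once they reach the fast device, and dirty blocks are copied to the slower disk later. A minimal sketch, assuming dict-backed devices and a caller that invokes `destage` whenever the slow disk has spare bandwidth; the class and method names are hypothetical and say nothing about how btrfs implements, or would implement, this.

```python
from collections import OrderedDict


class WriteBackTier:
    """Hypothetical two-tier store: a fast front-end device absorbs writes,
    and dirty blocks are destaged to a slower backing disk later.
    Devices are modelled as plain dicts mapping block number -> data."""

    def __init__(self):
        self.fast = {}              # fast front-end device
        self.slow = {}              # slower backing disk
        self.dirty = OrderedDict()  # blocks not yet destaged, least recently written first

    def write(self, block, data):
        # The write completes as soon as it reaches the fast device.
        self.fast[block] = data
        self.dirty[block] = None
        self.dirty.move_to_end(block)

    def read(self, block):
        # Prefer the fast copy; fall back to the backing disk.
        if block in self.fast:
            return self.fast[block]
        return self.slow[block]

    def destage(self, max_blocks):
        """Copy up to max_blocks of the oldest dirty blocks to the slow disk.
        Meant to be called when the slow disk is otherwise idle."""
        moved = 0
        while self.dirty and moved < max_blocks:
            block, _ = self.dirty.popitem(last=False)
            self.slow[block] = self.fast[block]
            moved += 1
        return moved
```

Bursts are absorbed at the fast device's speed, and the slow disk only needs enough average bandwidth to keep `dirty` from growing without bound, which is why the idea suits peaky workloads.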

-chris


--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: interesting use case for multiple devices and delayed raid?

2009-04-01 Dmitri Nikulin
On Wed, Apr 1, 2009 at 8:17 PM, Brian J. Murrell br...@interlinx.bc.ca wrote:
 I have a use case that I wonder if anyone might find interesting
 involving multiple device support and delayed raid.

 Let's say I have a system with two disks of equal size (to make it easy)
 which has sporadic, heavy write requirements.  At some points in time
 there will be multiple files being appended to simultaneously and at
 other times, there will be no activity at all.

 The write activity is time sensitive, however, so the filesystem must be
 able to provide guaranteed (only in a loose sense -- not looking for
 real QoS reservation semantics) bandwidths at times.  Let's say slightly
 (but within the realm of reality) less than the bandwidth of the two
 disks combined.

I assume you mean read bandwidth, since write bandwidth cannot be
increased by mirroring, only striping. If you intend to stripe first,
then mirror later as time permits, this is the kind of sophistication
you will need to write in the program code itself.
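To put rough numbers on the bandwidth question: a back-of-the-envelope sketch, assuming identical disks, perfectly balanced I/O, and that reads and writes share each disk's bandwidth equally. The function names and layout labels are hypothetical, not taken from btrfs or any other filesystem.

```python
def burst_write_bandwidth(layout, disk_bw, n_disks=2):
    """User-visible write bandwidth during a burst (same units as disk_bw).

    Hypothetical model: identical disks, perfectly balanced I/O.
    """
    if layout == "raid1":
        # Every block is written to every disk, so the mirror is no faster
        # than a single disk.
        return disk_bw
    if layout in ("raid0", "delayed_mirror"):
        # Each block is written once during the burst; for the delayed
        # mirror the second copy is deferred to idle time.
        return n_disks * disk_bw
    raise ValueError(f"unknown layout: {layout!r}")


def catch_up_seconds(burst_bytes, disk_bw):
    """Idle time a two-disk delayed mirror needs after a burst.

    Each disk holds half of the burst; to mirror it, each disk reads its own
    half and writes the other disk's half, i.e. burst_bytes of transfers per
    disk, with both disks working in parallel.
    """
    return burst_bytes / disk_bw
```

For example, with disks that each sustain 100 MB/s and a requirement of 180 MB/s, `raid1` gives 100 and `delayed_mirror` gives 200; a 60-second burst (10,800 MB) then needs about 108 seconds of idle time before it is fully mirrored under these assumptions. The extra write bandwidth comes from striping, not from mirroring.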

A filesystem is a handy abstraction, but you are by no means limited
to using it. If you have very special needs, you can get pretty far by
writing your own meta-filesystem to add semantics you don't have in
your kernel filesystem of choice. That's what every single database
application does. You can get even further by writing a complete
user-space filesystem as part of your program, or a shared daemon, and
the performance isn't really that bad.

 I also want both the metadata and file data mirrored between the two
 disks so that I can afford to lose one of the disks and not lose (most
 of) my data.  It is not a strict requirement that all data be
 immediately mirrored however.

This is handled by DragonFly BSD's HAMMER filesystem. A master gets
written to, and asynchronously updates a slave, even over a network.
It is transactionally consistent and virtually impossible to corrupt
as long as the disk media is stable. However as far as I know it won't
spread reads, so you'll still get the performance of one disk.
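The asynchronous master/slave idea can be illustrated generically: the master stamps each write with an increasing transaction id, and the slave remembers the last id it applied, so each sync ships only newer changes and applies them in commit order. This is an assumption-laden sketch, not HAMMER's actual mirroring code or on-disk format; the class and method names are hypothetical.

```python
class Master:
    """Hypothetical master volume: every write is stamped with an
    increasing transaction id (tid) and appended to a change log."""

    def __init__(self):
        self.data = {}
        self.log = []        # (tid, key, value) tuples in commit order
        self.next_tid = 1

    def write(self, key, value):
        self.data[key] = value
        self.log.append((self.next_tid, key, value))
        self.next_tid += 1


class Slave:
    """Hypothetical slave: remembers the last tid it applied, so each sync
    ships only the changes it has not seen yet."""

    def __init__(self):
        self.data = {}
        self.last_tid = 0

    def sync_from(self, master, max_records=None):
        """Apply pending changes in commit order; optionally cap the batch
        size to model limited spare bandwidth. Returns the number applied."""
        pending = [rec for rec in master.log if rec[0] > self.last_tid]
        if max_records is not None:
            pending = pending[:max_records]
        for tid, key, value in pending:
            self.data[key] = value
            self.last_tid = tid
        return len(pending)
```

Because changes are applied strictly in tid order, a slave that stops part-way through a sync still holds a state the master actually passed through, just an older one. Note that all writes still land on the master alone, so nothing here spreads reads or writes across devices.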

A more complete solution, that requires no software changes, would be
to have 3 or 4 disks. A stripe for really fast reads and writes, and
another disk (or another stripe) to act as a slave to the data being
written to the primary stripe. This seems to do what you want, at a
small price premium.
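The 3-4 disk layout suggested here (a primary stripe for fast foreground I/O plus a secondary stripe that is updated as time permits) might be modelled as follows. It is a hypothetical in-memory sketch: disks are dicts, blocks are striped by block number modulo 2, and the class and method names are invented for illustration.

```python
from collections import deque


class DelayedRaid10:
    """Hypothetical 4-disk layout: blocks are striped across a primary pair
    (disks 0 and 1) and copied to a secondary pair (disks 2 and 3) later."""

    def __init__(self):
        self.disks = [{} for _ in range(4)]  # each disk: block number -> data
        self.pending = deque()               # blocks not yet on the secondary stripe

    @staticmethod
    def _primary(block):
        return block % 2                     # simple two-way striping

    def write(self, block, data):
        # Only the primary stripe is written in the foreground.
        self.disks[self._primary(block)][block] = data
        self.pending.append(block)

    def mirror_idle(self, max_blocks):
        """Process up to max_blocks pending blocks, copying each one to the
        secondary stripe if its primary copy still exists."""
        done = 0
        while self.pending and done < max_blocks:
            block = self.pending.popleft()
            p = self._primary(block)
            if block in self.disks[p]:       # the primary copy may have been lost
                self.disks[2 + p][block] = self.disks[p][block]
            done += 1
        return done

    def fail(self, disk):
        self.disks[disk].clear()             # model a dead disk as an empty one

    def read(self, block):
        p = self._primary(block)
        for disk in (p, 2 + p):              # prefer the primary, fall back to the copy
            if block in self.disks[disk]:
                return self.disks[disk][block]
        raise OSError(f"block {block} lost")
```

A real implementation would also have to track stale secondary copies: in this sketch, if a block is rewritten after being mirrored and its primary disk fails before the rewrite is re-mirrored, `read` returns the older mirrored version.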

-- 
Dmitri Nikulin

Centre for Synchrotron Science
Monash University
Victoria 3800, Australia


Re: interesting use case for multiple devices and delayed raid?

2009-04-01 Brian J. Murrell
On Wed, 2009-04-01 at 21:13 +1100, Dmitri Nikulin wrote:
 
 I assume you mean read bandwidth, since write bandwidth cannot be
 increased by mirroring, only striping.

No, I mean write bandwidth.  You can get increased write bandwidth from a
RAID 1 pair if each block is initially written to only one side of the
mirror, which is effectively striping.  You would update the other half
of the mirror lazily (iow, delayed) when the filesystem has idle
bandwidth.  One of the stipulations was that the usage pattern is peaks
and valleys, not sustained usage.

Yes, you would lose any data written to a failed disk before the
filesystem got a chance to do the lazy mirror update.  That was a
stipulation in my original requirements too.
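The two-disk delayed mirror described here, including its loss window, can be sketched as follows. Assumptions: disks are modelled as dicts, the single initial copy alternates between disks so a burst is effectively striped, and `idle_sync` stands in for whatever idle-time detection a real filesystem would use. All names are hypothetical; this is not btrfs code.

```python
class DelayedMirror:
    """Hypothetical two-disk volume: during a burst each block is written to
    only one disk (alternating, so the burst is effectively striped); copies
    to the other disk are made lazily when there is spare bandwidth."""

    def __init__(self):
        self.disks = [{}, {}]  # each disk: block number -> data
        self.unmirrored = {}   # block -> index of the only disk holding it
        self.next_disk = 0

    def write(self, block, data):
        d = self.next_disk
        self.next_disk ^= 1                  # alternate disks: striping during the burst
        self.disks[d][block] = data
        self.disks[d ^ 1].pop(block, None)   # any copy on the other disk is now stale
        self.unmirrored[block] = d

    def idle_sync(self, max_blocks):
        """Mirror up to max_blocks outstanding blocks to the other disk."""
        done = 0
        for block in list(self.unmirrored)[:max_blocks]:
            d = self.unmirrored.pop(block)
            self.disks[d ^ 1][block] = self.disks[d][block]
            done += 1
        return done

    def read(self, block):
        # Stale copies are dropped on write, so any copy found is current.
        for disk in self.disks:
            if block in disk:
                return disk[block]
        raise OSError(f"block {block} lost")

    def fail(self, disk):
        """Simulate losing a disk (replaced by an empty one); return the
        blocks that had no mirror yet and are therefore gone."""
        lost = sorted(b for b, d in self.unmirrored.items() if d == disk)
        self.disks[disk] = {}
        for b in lost:
            del self.unmirrored[b]
        return lost
```

The size of `unmirrored` at any moment is exactly the data at risk from a single disk failure, so the loss window shrinks to nothing whenever the idle periods are long enough for `idle_sync` to catch up.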

 If you intend to stripe first,
 then mirror later as time permits,

Yeah, that's one way to describe it.

 this is the kind of sophistication
 you will need to write in the program code itself.

Why?  A filesystem that already does its own mirroring and striping (as
I understand btrfs does) should be able to handle this itself.  It is
much better to do it once in the filesystem than to have each
application handle it.

 A filesystem is a handy abstraction, but you are by no means limited
 to using it. If you have very special needs, you can get pretty far by
 writing your own meta-filesystem to add semantics you don't have in
 your kernel filesystem of choice.

Of course.  But I am floating this idea as a feature of btrfs given that
it already has many of the components needed.

 This is handled by DragonFly BSD's HAMMER filesystem. A master gets
 written to, and asynchronously updates a slave, even over a network.
 It is transactionally consistent and virtually impossible to corrupt
 as long as the disk media is stable. However as far as I know it won't
 spread reads, so you'll still get the performance of one disk.

More importantly, it won't spread writes.

 A more complete solution, that requires no software changes, would be
 to have 3 or 4 disks. A stripe for really fast reads and writes, and
 another disk (or another stripe) to act as a slave to the data being
 written to the primary stripe. This seems to do what you want, at a
 small price premium.

No.  That's not really what I am describing at all.

I apologize if my original description was unclear.  Hopefully it is
more so now.

b.




Re: interesting use case for multiple devices and delayed raid?

2009-04-01 Dmitri Nikulin
On Thu, Apr 2, 2009 at 8:04 AM, Brian J. Murrell br...@interlinx.bc.ca wrote:
 A more complete solution, that requires no software changes, would be
 to have 3 or 4 disks. A stripe for really fast reads and writes, and
 another disk (or another stripe) to act as a slave to the data being
 written to the primary stripe. This seems to do what you want, at a
 small price premium.

 No.  That's not really what I am describing at all.

Well you get the bandwidth of 2 disks when reading and writing, and
still mirrored to a second stripe as time permits. Kind of like
delayed RAID10.

 I apologize if my original description was unclear.  Hopefully it is
 more so now.

Yes. It'll be up to the actual filesystem devs to weigh in on whether
it's worth implementing.

-- 
Dmitri Nikulin
