>> In my opinion, the whole thing comes up from the idea of using cheap hardware
>> and out-of-the-box configurations to keep promises of reliability and
>> availability which are not realistic. There is a reason why there are more
>> expensive HDDs, RAIDs, SANs with volume mirroring, multipathing
On Wed, 22 Oct 2008 16:35:55 +0200
"dbz" <[EMAIL PROTECTED]> wrote:
> concerning this discussion, I'd like to put up some "requests" which
> strongly oppose those brought up initially:
>
> - if you run into an error in the fs structure or any IO error that prevents
> you from bringing the fs
On Thu, 23 Oct 2008 12:50:33 am Chris Mason wrote:
> As someone else replied, NFS is stateless
NFS up to and including v3 is, but NFSv4 is stateful.
--
Chris Samuel : http://www.csamuel.org/ : Melbourne, VIC
This email may come with a PGP signature as a file. Do not panic.
For more info s
On Wed, 22 Oct 2008 11:19:06 pm Stephan von Krawczynski wrote:
> You have a NFS server with clients. Your NFS server dies, your backup
> server cannot take over the clients without them resetting their NFS-link
> (which means reboot to many applications) - no way.
We're getting way off btrfs here
On Wed, 22 Oct 2008 11:56:58 -0400
"Michel Salim" <[EMAIL PROTECTED]> wrote:
> > [...]
> > Let's agree that the market for drives, arrays and related stuff is big and
> > contains just about any example one needs for arguing :-)
> > Nevertheless we probably agree that if john doe meets big-player a
Ric Wheeler wrote:
>> FS waiting for completion of all the dependent writes isn't too good
>> latency and throughput-wise tho. It would be best if FS can indicate
>> dependencies between write commands and barrier so that barrier
>> doesn't have to empty the whole queue. Hmm... Can someone tell m
Eric Anopolsky wrote:
On Thu, 2008-10-23 at 01:14 +0900, Tejun Heo wrote:
Ric Wheeler wrote:
Waiting for the target to ack an IO is not sufficient, since the target
ack does not (with write cache enabled) mean that it is on persistent
storage.
FS waiting for completion of all th
On Thu, 2008-10-23 at 01:14 +0900, Tejun Heo wrote:
> Ric Wheeler wrote:
> > Waiting for the target to ack an IO is not sufficient, since the target
> > ack does not (with write cache enabled) mean that it is on persistent
> > storage.
>
> FS waiting for completion of all the dependent writes isn'
Avi Kivity wrote:
Tejun Heo wrote:
For most SATA drives, disabling write back cache seems to take high
toll on write throughput. :-(
I measured this yesterday. This is true for pure write workloads; for
mixed read/write workloads the throughput decrease is negligible.
Depends on your
jim owens wrote:
For most SATA drives, disabling write back cache seems to take high
toll on write throughput. :-(
I measured this yesterday. This is true for pure write workloads;
for mixed read/write workloads the throughput decrease is negligible.
Different tests on different hardware
g
Avi Kivity wrote:
Tejun Heo wrote:
For most SATA drives, disabling write back cache seems to take high
toll on write throughput. :-(
I measured this yesterday. This is true for pure write workloads; for
mixed read/write workloads the throughput decrease is negligible.
Different tests on d
Tejun Heo wrote:
For most SATA drives, disabling write back cache seems to take high
toll on write throughput. :-(
I measured this yesterday. This is true for pure write workloads; for
mixed read/write workloads the throughput decrease is negligible.
As long as the error status is sti
Ric Wheeler wrote:
For any given set of disks, you "just" need to do the math to compute
the utilized capacity, the expected rate of drive failure, the rebuild
time and then see whether you can recover from your first failure
before a 2nd disk dies.
Spare disks have the advantage of a fully
Michel Salim wrote:
Though it would be nice to have a tool that would provide enough
information to make a warranty claim -- does btrfs keep enough
information for such a tool to be written?
Failed device I/O (rather than bad checksums and other
fs-specific error detections) should be logged a
Tejun Heo wrote:
Ric Wheeler wrote:
Waiting for the target to ack an IO is not sufficient, since the target
ack does not (with write cache enabled) mean that it is on persistent
storage.
FS waiting for completion of all the dependent writes isn't too good
latency and throughput-wise th
Ric Wheeler wrote:
> Waiting for the target to ack an IO is not sufficient, since the target
> ack does not (with write cache enabled) mean that it is on persistent
> storage.
FS waiting for completion of all the dependent writes isn't too good
latency and throughput-wise tho. It would be best if
On Wed, 2008-10-22 at 11:25 -0400, Ric Wheeler wrote:
> Avi Kivity wrote:
> > Ric Wheeler wrote:
> >>>
> >>> Well, btrfs is not about duplicating how most storage works today.
> >>> Spare capacity has significant advantages over spare disks, such as
> >>> being able to mix disk sizes, RAID level
On Wed, Oct 22, 2008 at 9:52 AM, Stephan von Krawczynski
<[EMAIL PROTECTED]> wrote:
> On Wed, 22 Oct 2008 09:15:45 -0400
> Chris Mason <[EMAIL PROTECTED]> wrote:
>
>> On Wed, 2008-10-22 at 14:27 +0200, Stephan von Krawczynski wrote:
>> > On Tue, 21 Oct 2008 13:31:37 -0400
>> > Ric Wheeler <[EMAIL P
Avi Kivity wrote:
Chris Mason wrote:
One problem with the spare
capacity model is the general trend where drives from the same batch
that get hammered on in the same way tend to die at the same time. Some
shops will sleep better knowing there's a hot spare and that's fine by
me.
How does h
Chris Mason wrote:
One problem with the spare
capacity model is the general trend where drives from the same batch
that get hammered on in the same way tend to die at the same time. Some
shops will sleep better knowing there's a hot spare and that's fine by
me.
How does hot sparing help? A
Ric Wheeler wrote:
I think that the btrfs plan is still to push more complicated RAID
schemes off to MD (RAID6, etc) so this is an issue even with a JBOD.
It will be interesting to map out the possible ways to use built in
mirroring, etc vs the external RAID and actually measure the utilized
c
Avi Kivity wrote:
Ric Wheeler wrote:
Well, btrfs is not about duplicating how most storage works today.
Spare capacity has significant advantages over spare disks, such as
being able to mix disk sizes, RAID levels, and better performance.
Sure, there are advantages that go in favour of one
Ric Wheeler wrote:
Well, btrfs is not about duplicating how most storage works today.
Spare capacity has significant advantages over spare disks, such as
being able to mix disk sizes, RAID levels, and better performance.
Sure, there are advantages that go in favour of one or the other
appr
Avi Kivity wrote:
Ric Wheeler wrote:
You want to have spare capacity, enough for one or two (or fifteen)
drives' worth of data. When a drive goes bad, you rebuild into the
spare capacity you have.
That is a different model (and one that makes sense, we used that in
Centera for object level
Ric Wheeler wrote:
You want to have spare capacity, enough for one or two (or fifteen)
drives' worth of data. When a drive goes bad, you rebuild into the
spare capacity you have.
That is a different model (and one that makes sense, we used that in
Centera for object level protection schemes)
Avi Kivity wrote:
Ric Wheeler wrote:
One key is not to replace the drives too early - you often can
recover significant amounts of data from a drive that is on its last
legs. This can be useful even in RAID rebuilds since with today's
enormous drive capacities, you might hit a latent error dur
Ric Wheeler wrote:
Matthias Wächter wrote:
On 10/22/2008 3:50 PM, Chris Mason wrote:
Let me reword my answer ;). The next write will always succeed unless
the drive is out of remapping sectors. If the drive is out, it is only
good for reads and holding down paper on your desk.
I hav
Chris Mason wrote:
You want to have spare capacity, enough for one or two (or fifteen)
drives' worth of data. When a drive goes bad, you rebuild into the
spare capacity you have.
You want spare capacity that does not degrade your raid levels if you
move the data onto it. In some confi
On Wed, 2008-10-22 at 16:32 +0200, Avi Kivity wrote:
> Ric Wheeler wrote:
> > One key is not to replace the drives too early - you often can recover
> > significant amounts of data from a drive that is on its last legs.
> > This can be useful even in RAID rebuilds since with today's enormous
>
concerning this discussion, I'd like to put up some "requests" which
strongly oppose those brought up initially:
- if you run into an error in the fs structure or any IO error that prevents
you from bringing the fs into a consistent state, please simply oops. If a
user feels that availabili
Matthias Wächter wrote:
On 10/22/2008 3:50 PM, Chris Mason wrote:
Let me reword my answer ;). The next write will always succeed unless
the drive is out of remapping sectors. If the drive is out, it is only
good for reads and holding down paper on your desk.
I have a fairly new SATA
Ric Wheeler wrote:
One key is not to replace the drives too early - you often can recover
significant amounts of data from a drive that is on its last legs.
This can be useful even in RAID rebuilds since with today's enormous
drive capacities, you might hit a latent error during the rebuild on
jim owens wrote:
Avi Kivity wrote:
jim owens wrote:
Remember that the device bandwidth is the limiter so even
when each host has a dedicated path to the device (as in
dual port SAS or FC), that 2nd host cuts the throughput by
more than 1/2 with uncoordinated seeks and transfers.
That's only
Chris Mason wrote:
On Wed, 2008-10-22 at 09:38 -0400, Ric Wheeler wrote:
Chris Mason wrote:
On Wed, 2008-10-22 at 22:15 +0900, Tejun Heo wrote:
Ric Wheeler wrote:
I think that we do handle a failure in the case that you outline above
since the FS will be able t
Avi Kivity wrote:
jim owens wrote:
Remember that the device bandwidth is the limiter so even
when each host has a dedicated path to the device (as in
dual port SAS or FC), that 2nd host cuts the throughput by
more than 1/2 with uncoordinated seeks and transfers.
That's only a problem if there
On 10/22/2008 3:50 PM, Chris Mason wrote:
> Let me reword my answer ;). The next write will always succeed unless
> the drive is out of remapping sectors. If the drive is out, it is only
> good for reads and holding down paper on your desk.
I have a fairly new SATA disk with about 3000 hours of
On Wed, 22 Oct 2008 05:48:30 -0700
"Jeff Schroeder" <[EMAIL PROTECTED]> wrote:
> > NFS is a good example for a fs that never got redesigned for modern world. I
> > hope it will, but currently it's like Model T on a highway.
> > You have a NFS server with clients. Your NFS server dies, your backup
On Wed, 2008-10-22 at 09:38 -0400, Ric Wheeler wrote:
> Chris Mason wrote:
> > On Wed, 2008-10-22 at 22:15 +0900, Tejun Heo wrote:
> >
> >> Ric Wheeler wrote:
> >>
> >>> I think that we do handle a failure in the case that you outline above
> >>> since the FS will be able to notice the erro
On Wed, 22 Oct 2008 09:15:45 -0400
Chris Mason <[EMAIL PROTECTED]> wrote:
> On Wed, 2008-10-22 at 14:27 +0200, Stephan von Krawczynski wrote:
> > On Tue, 21 Oct 2008 13:31:37 -0400
> > Ric Wheeler <[EMAIL PROTECTED]> wrote:
> >
> > > [...]
> > > If you have remapped a big chunk of the sectors (sa
On Wed, 2008-10-22 at 14:19 +0200, Stephan von Krawczynski wrote:
> On Tue, 21 Oct 2008 13:49:43 -0400
> Chris Mason <[EMAIL PROTECTED]> wrote:
>
> > On Tue, 2008-10-21 at 18:27 +0200, Stephan von Krawczynski wrote:
> >
> > > > > 2. general requirements
> > > > > - fs errors without file/dir
Chris Mason wrote:
On Wed, 2008-10-22 at 22:15 +0900, Tejun Heo wrote:
Ric Wheeler wrote:
I think that we do handle a failure in the case that you outline above
since the FS will be able to notice the error before it sends a commit
down (and that commit is wrapped in the barrier flush c
Chris Mason wrote:
On Wed, 2008-10-22 at 14:27 +0200, Stephan von Krawczynski wrote:
On Tue, 21 Oct 2008 13:31:37 -0400
Ric Wheeler <[EMAIL PROTECTED]> wrote:
[...]
If you have remapped a big chunk of the sectors (say more than 10%), you
should grab the data off the disk asap and repl
Tejun Heo wrote:
Ric Wheeler wrote:
I think that we do handle a failure in the case that you outline above
since the FS will be able to notice the error before it sends a commit
down (and that commit is wrapped in the barrier flush calls). This is
the easy case since we still have the context
On Wed, 2008-10-22 at 22:15 +0900, Tejun Heo wrote:
> Ric Wheeler wrote:
> > I think that we do handle a failure in the case that you outline above
> > since the FS will be able to notice the error before it sends a commit
> > down (and that commit is wrapped in the barrier flush calls). This is
>
Ric Wheeler wrote:
Scrubbing is key for many scenarios since errors can "grow" even in
places where previous IO has been completed without flagging an error.
Some neat tricks are:
(1) use block level scrubbing to detect any media errors. If you
can map that sector level error into a file s
Ric Wheeler wrote:
> I think that we do handle a failure in the case that you outline above
> since the FS will be able to notice the error before it sends a commit
> down (and that commit is wrapped in the barrier flush calls). This is
> the easy case since we still have the context for the IO.
I
On Wed, 2008-10-22 at 14:27 +0200, Stephan von Krawczynski wrote:
> On Tue, 21 Oct 2008 13:31:37 -0400
> Ric Wheeler <[EMAIL PROTECTED]> wrote:
>
> > [...]
> > If you have remapped a big chunk of the sectors (say more than 10%), you
> > should grab the data off the disk asap and replace it. Worry
On Wed, 2008-10-22 at 09:03 -0400, Ric Wheeler wrote:
> Avi Kivity wrote:
> > Stephan von Krawczynski wrote:
> >>
> >>>- filesystem autodetects, isolates, and (possibly) repairs errors
> >>>- online "scan, check, repair filesystem" tool initiated by admin
> >>>- Reliability so high that
Avi Kivity wrote:
Stephan von Krawczynski wrote:
- filesystem autodetects, isolates, and (possibly) repairs errors
- online "scan, check, repair filesystem" tool initiated by admin
- Reliability so high that they never run that check-and-fix tool
That is _wrong_ (to a certain e
Tejun Heo wrote:
Ric Wheeler wrote:
The cache flush command for ATA devices will block and wait until all of
the device's write cache has been written back.
What I assume Tejun was referring to here is that some IO might have
been written out to the device and an error happened when the devi
Tejun Heo wrote:
Ric Wheeler wrote:
The cache flush command for ATA devices will block and wait until all of
the device's write cache has been written back.
What I assume Tejun was referring to here is that some IO might have
been written out to the device and an error happened when the devi
On Wed, Oct 22, 2008 at 5:19 AM, Stephan von Krawczynski
<[EMAIL PROTECTED]> wrote:
> On Tue, 21 Oct 2008 13:49:43 -0400
> Chris Mason <[EMAIL PROTECTED]> wrote:
>
>> On Tue, 2008-10-21 at 18:27 +0200, Stephan von Krawczynski wrote:
>>
>> > > > 2. general requirements
>> > > > - fs errors witho
On Tue, 21 Oct 2008 13:31:37 -0400
Ric Wheeler <[EMAIL PROTECTED]> wrote:
> [...]
> If you have remapped a big chunk of the sectors (say more than 10%), you
> should grab the data off the disk asap and replace it. Worry less about
> errors during read, writes indicate more serious errors.
Ok, n
On Tue, 21 Oct 2008 13:49:43 -0400
Chris Mason <[EMAIL PROTECTED]> wrote:
> On Tue, 2008-10-21 at 18:27 +0200, Stephan von Krawczynski wrote:
>
> > > > 2. general requirements
> > > > - fs errors without file/dir names are useless
> > > > - errors in parts of the fs are no reason for a fs
Stephan von Krawczynski wrote:
- filesystem autodetects, isolates, and (possibly) repairs errors
- online "scan, check, repair filesystem" tool initiated by admin
- Reliability so high that they never run that check-and-fix tool
That is _wrong_ (to a certain extent). You _want t
On Tue, 21 Oct 2008 18:59:26 +0200
Andi Kleen <[EMAIL PROTECTED]> wrote:
> Stephan von Krawczynski <[EMAIL PROTECTED]> writes:
> >
> > Yes, we hear and say that all the time, name one linux fs doing it, please.
>
> ext[234] support it to some extent. It has some limitations
> (especially when the
On Tue, 21 Oct 2008 18:09:40 +0200
Andi Kleen <[EMAIL PROTECTED]> wrote:
> While that's true today, I'm not sure it has to be true always.
> I always thought traditional fsck user interfaces were a
> UI disaster and could be done much better with some simple tweaks.
> [...]
You are completely ri
On Tue, 21 Oct 2008 13:15:13 -0400
Christoph Hellwig <[EMAIL PROTECTED]> wrote:
> On Tue, Oct 21, 2008 at 07:01:36PM +0200, Stephan von Krawczynski wrote:
> > Sure, but what you say only reflects the ideal world. On a file service, you
> > never have that. In fact you do not even have good control
On Tue, 21 Oct 2008 11:34:20 -0400
jim owens <[EMAIL PROTECTED]> wrote:
> Hearing what users think they want is always good, but...
>
> Stephan von Krawczynski wrote:
> >
> > thanks for your feedback. Understand "minimum requirement" as "minimum
> > requirement to drop the current installation
Ric Wheeler wrote:
> The cache flush command for ATA devices will block and wait until all of
> the device's write cache has been written back.
>
> What I assume Tejun was referring to here is that some IO might have
> been written out to the device and an error happened when the device
> tried to
Eric Anopolsky wrote:
On Tue, 2008-10-21 at 18:18 -0400, Ric Wheeler wrote:
Eric Anopolsky wrote:
On Tue, 2008-10-21 at 09:59 -0400, Chris Mason wrote:
- power loss at any time must not corrupt the fs (atomic fs modification)
(new-data loss is acceptable)
jim owens wrote:
Remember that the device bandwidth is the limiter so even
when each host has a dedicated path to the device (as in
dual port SAS or FC), that 2nd host cuts the throughput by
more than 1/2 with uncoordinated seeks and transfers.
That's only a problem if there is a single shared
On Tue, 2008-10-21 at 18:18 -0400, Ric Wheeler wrote:
> Eric Anopolsky wrote:
> > On Tue, 2008-10-21 at 09:59 -0400, Chris Mason wrote:
> >
> >>> - power loss at any time must not corrupt the fs (atomic fs
> >>> modification)
> >>> (new-data loss is acceptable)
> >>>
> >> Done.
Eric Anopolsky wrote:
On Tue, 2008-10-21 at 09:59 -0400, Chris Mason wrote:
- power loss at any time must not corrupt the fs (atomic fs modification)
(new-data loss is acceptable)
Done. Btrfs already uses barriers as required for sata drives.
Aren't there situations
On Tue, 2008-10-21 at 09:59 -0400, Chris Mason wrote:
> > - power loss at any time must not corrupt the fs (atomic fs
> > modification)
> > (new-data loss is acceptable)
>
> Done. Btrfs already uses barriers as required for sata drives.
Aren't there situations in which write barriers
calin wrote:
question is: if you had such an implementation, are there
drawbacks expectable for the single-mount case? If not I'd vote for it
because there are not really many alternatives "on the market".
As I understand it, the largest issue is in locking and boundaries.
Correct, that is t
On Tue, 2008-10-21 at 18:27 +0200, Stephan von Krawczynski wrote:
> > > 2. general requirements
> > > - fs errors without file/dir names are useless
> > > - errors in parts of the fs are no reason for a fs to go offline as a
> > > whole
> >
> > These two are in progress. Btrfs won't alw
> question is: if you had such an implementation, are there
> drawbacks expectable for the single-mount case? If not I'd vote for it
> because there are not really many alternatives "on the market".
As I understand it, the largest issue is in locking and boundaries. Two
different systems could m
Christoph Hellwig wrote:
On Tue, Oct 21, 2008 at 07:01:36PM +0200, Stephan von Krawczynski wrote:
Sure, but what you say only reflects the ideal world. On a file service, you
never have that. In fact you do not even have good control about what is going
on. Let's say you have a setup that crea
On Tue, Oct 21, 2008 at 07:01:36PM +0200, Stephan von Krawczynski wrote:
> Sure, but what you say only reflects the ideal world. On a file service, you
> never have that. In fact you do not even have good control about what is going
> on. Let's say you have a setup that creates, reads and deletes fi
On Tue, 21 Oct 2008 09:20:16 -0400
jim owens <[EMAIL PROTECTED]> wrote:
> btrfs has many of the same goals... but they are goals not code
> so when you might see them is indeterminate.
no big issue, my pension is 20 years away, I got time ;-)
> I believe these should not be in btrfs:
>
> Steph
Stephan von Krawczynski <[EMAIL PROTECTED]> writes:
>
> Yes, we hear and say that all the time, name one linux fs doing it, please.
ext[234] support it to some extent. It has some limitations
(especially when the files are large and you shouldn't do too much followon
IO to prevent the data from be
Hello Chris,
let me clarify some things a bit, see ...
On Tue, 21 Oct 2008 09:59:40 -0400
Chris Mason <[EMAIL PROTECTED]> wrote:
> Thanks for this input and for taking the time to post it.
>
> > 1. filesystem-check
> > 1.1 it should not
> > - delay boot process (we have to wait for hours c
Chris Mason <[EMAIL PROTECTED]> writes:
>
> Started interactively? I'm not entirely sure what that means, but in
> general when you ask the user a question about if/how to fix a
> corruption, they will have no idea what the correct answer is.
While that's true today, I'm not sure it has to be tru
Hearing what users think they want is always good, but...
Stephan von Krawczynski wrote:
thanks for your feedback. Understand "minimum requirement" as "minimum
requirement to drop the current installation and migrate the data to a
new fs platform".
I would sure like to know what existing pla
On Tue, 21 Oct 2008 14:13:33 +0200
Andi Kleen <[EMAIL PROTECTED]> wrote:
> Stephan von Krawczynski <[EMAIL PROTECTED]> writes:
>
> > reading the list for a while it looks like all kinds of implementational
> > topics are covered but no basic user requests or talks are going on. Since I
> > have f
On Tue, 2008-10-21 at 13:23 +0200, Stephan von Krawczynski wrote:
> Hello all,
>
> reading the list for a while it looks like all kinds of implementational
> topics are covered but no basic user requests or talks are going on. Since I
> have found no other list on vger covering these issues I choo
btrfs has many of the same goals... but they are goals not code
so when you might see them is indeterminate.
I believe these should not be in btrfs:
Stephan von Krawczynski wrote:
- parallel mounts (very important!)
as Andi said, you want a cluster or distributed fs. There
are layered d
Stephan von Krawczynski <[EMAIL PROTECTED]> writes:
> reading the list for a while it looks like all kinds of implementational
> topics are covered but no basic user requests or talks are going on. Since I
> have found no other list on vger covering these issues I choose this one,
> forgive my ign
Hello all,
reading the list for a while it looks like all kinds of implementational
topics are covered but no basic user requests or talks are going on. Since I
have found no other list on vger covering these issues I choose this one,
forgive my ignorance if it is the wrong place.
Like many people