Re: Truncation Data Loss

2009-11-11 Thread David Vasek

On Tue, 10 Nov 2009, Nick Guenther wrote:


[ext3 data= / FFS]
journal ~= sync (ensures consistency of both metadata and file data)
ordered ~= softdep (ensures consistency of metadata both internally
and with file data)
writeback ~= default (ensures consistency of metadata internally but
real file data may not agree, e.g. my empty file)
Additionally FFS has the async flag which turns off the internal
consistency of the metadata structures; I guess there's no equivalent
for this in ext?


Isn't it rather
default ~= async ?

For ext2, at least.

Regards,
David



Re: Truncation Data Loss

2009-11-11 Thread Janne Johansson
Nick Guenther wrote:

 [...]

 See, since it seems that BSD doesn't have this file-data consistency
 guarantee, are Linus' worries about ext4's potential data loss just
 being alarmist? It seems to me that the case described in
 https://bugs.edge.launchpad.net/ubuntu/+source/linux/+bug/317781/comments/45
 is just as likely to happen on OpenBSD--if I run KDE or GNOME and mess
 around with my settings then quickly murder the system, the files will
 be resurrected empty, right?

It seems like some posters in this thread somehow miss the fact that
if you have outstanding writes and the box dies, some of your data
dies too. New or old data, something will be missing.

From the point your app does a write(), it gets buffered in the I/O
handling, it gets buffered by the device driver for the card, it gets
buffered in the card probably, it gets buffered in the on-disk memory
cache and then it serially hits the platter one bit at a time until it's
all written. If you have data in this long pipe and the power goes, you
will lose data, period.
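
To make the point concrete, here is a minimal sketch in plain POSIX C
(the file name is made up): write() returning success only means the
data reached the kernel's buffers, and fsync(2) is the call that pushes
it down toward the platter, though the drive's own cache is beyond the
OS's reach.

#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int
main(void)
{
	/* Hypothetical file name, for illustration only. */
	int fd = open("/tmp/demo", O_WRONLY | O_CREAT | O_TRUNC, 0644);
	if (fd == -1) {
		perror("open");
		return 1;
	}
	/* "Success" here means buffered somewhere, not on the platter. */
	if (write(fd, "important\n", 10) != 10) {
		perror("write");
		return 1;
	}
	/* Drain the pipe as far down as the OS can push it. */
	if (fsync(fd) == -1)
		perror("fsync");
	close(fd);
	return 0;
}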

OpenBSD has chosen to try harder to keep the metadata intact, and ext4
doesn't try at all, for the love of speed. Still, you are only moving
around the window of opportunity for failure, sometimes making it
larger or smaller, but it is always there.

The last comment above should really only read: "If I quickly murder my
system, the files might be gone." Nothing else.

If you have writes going, data loss is a reality. Sometimes more,
sometimes less, but it's all games with statistics. If ext4 has a 50%
chance of killing your files and FFS on OpenBSD has 1%, you might still
get to keep your KDE settings on either system or you may lose them all.
It shouldn't be news to anyone that Linux always went for
fast-and-insecure whereas the BSDs opted for slower-but-safer
filesystems. Making a fuss about how insecure the penguins are this week
feels like a waste of time to me.

If you care about your data, you have backups. That holds regardless of
whether the probability is 1% or 50%, because for someone out there,
the percentages will be against you.



Re: Truncation Data Loss

2009-11-11 Thread Michal
Janne Johansson wrote:
 [...]
 It seems like some posters in this thread somehow miss the fact that
 if you have outstanding writes and the box dies, some of your data
 dies too. New or old data, something will be missing.
 
 [...]
 

I know this is a bit off topic, but storage devices have batteries on
RAID cards for a reason. If you are worried about reads/writes etc.
when a system dies, there are measures you can take.



Re: Truncation Data Loss

2009-11-11 Thread Russell Howe

Michal wrote, sometime around 11/11/09 11:40:


I know this is a bit off topic, but storage devices have batteries on
RAID cards for a reason. If you are worried about reads/writes etc.
when a system dies, there are measures you can take.


Probably even more OT, but...

Some (most?) RAID cards which have a battery option will only let you 
enable the write cache if a battery is installed. Certainly the HP P400 
cards we have do.


There has been endless discussion about data loss in these types of 
scenarios on the XFS mailing list - it journals metadata but not data, 
so if your application (e.g. vim) overwrites files by first truncating 
them to 0 length and then writing out the data, you'll find that the 
truncate and the resize of the file are all nicely replayed from the 
journal after the crash, but if the machine died before your data hit 
the disk, all you'll get when you read() is \0\0\0\0...
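
In rough C, the overwrite pattern described above looks something like
this (a sketch only, not vim's actual code; the function name is
illustrative):

#include <sys/types.h>
#include <fcntl.h>
#include <unistd.h>

int
save_in_place(const char *path, const char *buf, size_t len)
{
	/* O_TRUNC discards the old contents immediately... */
	int fd = open(path, O_WRONLY | O_TRUNC);
	if (fd == -1)
		return -1;
	/* ...but this data can sit in caches long after the truncate's
	 * metadata has been journalled.  Crash in between and a later
	 * read() returns zeros or nothing at all. */
	ssize_t n = write(fd, buf, len);
	close(fd);
	return (n == (ssize_t)len) ? 0 : -1;
}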


Since ext4 has started to implement similar features in similar ways to 
XFS, the ext4 folk are running into the same old problems.


--
Russell Howe, IT Manager. rh...@bmtmarinerisk.com
BMT Marine & Offshore Surveys Ltd.



Re: Truncation Data Loss

2009-11-11 Thread Marco Peereboom
On Wed, Nov 11, 2009 at 12:24:25PM +0100, Janne Johansson wrote:
 [...]
 It seems like some posters in this thread somehow miss the fact that
 if you have outstanding writes and the box dies, some of your data
 dies too. New or old data, something will be missing.

And that all SATA drives enable the write cache (or else they are
glacial), and that SCSI disables it to enhance the perception that one
is safer.

Buy a UPS! Your laptop has a built-in one.



Re: Truncation Data Loss

2009-11-11 Thread Marco Peereboom
EXT was and still is a joke.  I remember reading about the 2 minute
drain and I almost peed my pants.  EXT3 had the nice feature of randomly
failing to boot after enough reboots on enough machines.  Thankfully I
no longer run any volume of this crap.

On Wed, Nov 11, 2009 at 11:55:30AM +, Russell Howe wrote:
 [...]



Re: Truncation Data Loss

2009-11-11 Thread Nick Guenther
On Wed, Nov 11, 2009 at 3:35 AM, David Vasek va...@fido.cz wrote:
 On Tue, 10 Nov 2009, Nick Guenther wrote:

 [ext3 data= / FFS]
 journal ~= sync (ensures consistency of both metadata and file data)
 ordered ~= softdep (ensures consistency of metadata both internally
 and with file data)
 writeback ~= default (ensures consistency of metadata internally but
 real file data may not agree, e.g. my empty file)
 Additionally FFS has the async flag which turns off the internal
 consistency of the metadata structures; I guess there's no equivalent
 for this in ext?

 Isn't it rather
 default ~= async ?

 For ext2, at least.


Well, I'm not sure, because no one seems to really know. Linux's
mount(8) has this to say:
   writeback
          Data ordering is not preserved - data may be written into
          the main file system after its metadata has been committed
          to the journal.  This is rumoured to be the highest-
          throughput option.  It guarantees internal file system
          integrity, however it can allow old data to appear in
          files after a crash and journal recovery.
which seems to imply that metadata is written synchronously (because
it only talks about data appearing in files, not about the whole
filesystem getting trashed).

And BSD's mount(8) says:
     async   Metadata I/O to the file system should be done asyn-
             chronously.  By default, only regular data is read/writ-
             ten asynchronously.

             This is a dangerous flag to set since it does not guaran-
             tee to keep a consistent file system structure on the
             disk.  You should not use this flag unless you are pre-
             pared to recreate the file system should your system
             crash.  The most common use of this flag is to speed up
             restore(8) where it can give a factor of two speed in-
             crease.



Re: Truncation Data Loss

2009-11-11 Thread Nick Guenther
On Wed, Nov 11, 2009 at 1:16 PM, Ted Unangst ted.unan...@gmail.com wrote:
 On Tue, Nov 10, 2009 at 10:50 PM, Nick Guenther kou...@gmail.com wrote:
 See, since it seems that BSD doesn't have this file-data consistency
 guarantee, are Linus' worries about ext4's potential data loss just
 being alarmist? It seems to me that the case described in

https://bugs.edge.launchpad.net/ubuntu/+source/linux/+bug/317781/comments/45
 is just as likely to happen on OpenBSD--if I run KDE or GNOME and mess
 around with my settings then quickly murder the system, the files will
 be resurrected empty, right?

 Yes, if you cut power before things are written to disk, they will not
 be written to disk.  Snark aside, it really is that simple.  Different
 filesystems have different definitions of what "written to disk"
 means, or more accurately, *when*, but in all cases, if you cared you
 used fsync or tried a little harder to not crash.

 What is the reason softdep isn't on by default?

 It changes the expected behavior.  FFS without softdep is a lot
 closer to the semantics people and most applications expect.



Okay, one last question: one of the original softdep papers
(http://www.usenix.org/publications/library/proceedings/bsdcon02/mckusick.html)
is all about how softdeps can avoid fsck, but I just set softdep on
all my filesystems, rebooted (to start fresh), wrote some files, wrote
some more files, edited the first files, and jacked the power plug
right after it said "wrote". When the system came back up, fsck ran.
What gives? Does OpenBSD only implement softdep for the write speedups?

I'm just really confused about what softdep -is- I guess. What
semantics get changed? Do all the BSDs use the same softdep code? Did
they pick and choose ideas from the original softdep papers?

Thanks for letting me pick your brain, Ted,
-Nick



Re: Truncation Data Loss

2009-11-11 Thread David Vasek

On Wed, 11 Nov 2009, Nick Guenther wrote:


On Wed, Nov 11, 2009 at 3:35 AM, David Vasek va...@fido.cz wrote:

On Tue, 10 Nov 2009, Nick Guenther wrote:


 [...]


Isn't it rather
default ~= async ?

For ext2, at least.



Well, I'm not sure, because no one seems to really know. Linux's
mount(8) has this to say:
   writeback
          Data ordering is not preserved - data may be written into
          the main file system after its metadata has been committed
          to the journal.  This is rumoured to be the highest-
          throughput option.  It guarantees internal file system
          integrity, however it can allow old data to appear in
          files after a crash and journal recovery.
which seems to imply that metadata is written synchronously (because
it only talks about data appearing in files, not about the whole
filesystem getting trashed).


The paragraph from Linux's mount(8) you cited is related to ext3/ext4, 
which, as tedu@ has already written, use journaling. Ext2, in contrast, 
is mounted async by default.


If still unsure, look at the following document about ext2 from Linux 
(kernel) documentation, section Metadata (line 281 and below).


http://www.mjmwired.net/kernel/Documentation/filesystems/ext2.txt

Regards,
David



Re: Truncation Data Loss

2009-11-11 Thread Theo de Raadt
 Okay, one last question: one of the original softdep papers
 (http://www.usenix.org/publications/library/proceedings/bsdcon02/mckusick.html)
 is all about how softdeps can avoid fsck, but I just set softdep on
 all my filesystems, rebooted (to start fresh), wrote some files, wrote
 some more files, edited the first files, and jacked the power plug
 right after it said "wrote". When the system came back up, fsck ran.
 What gives? Does OpenBSD only implement softdep for the write speedups?
 
 I'm just really confused about what softdep -is- I guess. What
 semantics get changed? Do all the BSDs use the same softdep code? Did
 they pick and choose ideas from the original softdep papers?

You are misreading the fsck manual page.  Let me take you through it:

 fsck - file system consistency check and interactive repair

Note carefully what it says: it says "file system".  It does not say
"file consistency check and interactive repair".

If you want your files to not lose a byte, use the sync option.
Come on.  Just do it.  Give it a try.  Then you will understand.



Re: Truncation Data Loss

2009-11-11 Thread Ted Unangst
On Wed, Nov 11, 2009 at 7:08 PM, Nick Guenther kou...@gmail.com wrote:
 Okay, one last question: one of the original softdep papers
 (http://www.usenix.org/publications/library/proceedings/bsdcon02/mckusick.html)
 is all about how softdeps can avoid fsck, but I just set softdep on
 all my filesystems, rebooted (to start fresh), wrote some files, wrote
 some more files, edited the first files, and jacked the power plug
 right after it said "wrote". When the system came back up, fsck ran.
 What gives? Does OpenBSD only implement softdep for the write speedups?

All the softdep code comes from FreeBSD, though imported at various
points in time; and things that work in papers never work the same for
people who aren't writing papers.  Nobody added the background fsck code
to OpenBSD because it didn't exist when we imported softdep, and nobody
has cared enough to do so since.



Truncation Data Loss

2009-11-10 Thread Nick Guenther
So, as nicely summarized at
http://www.h-online.com/open/news/item/Possible-data-loss-in-Ext4-740467.html,
ext4 is kind of broken. It won't honor fsync and, as a /feature/, will
wait up to two minutes to write out data, leading to lots of files
emptied to the great bitbucket in the sky if the machine goes down in
that period. Why is this relevant to OpenBSD? Well sometimes I've been
writing a file in vi or mg and had my machine go down, and when it
comes back I find that the file is empty and I'm just trying to figure
out if this is just because the data wasn't fsync'd or if it's because
of softdep or what.

I know this is kind of a newbish question but I have no idea how I'd
go about researching it. And I'd like to sort this out because it's a
big gap in my knowledge. I thought there was a paper on softdep but
http://openbsd.org/papers doesn't have it.

NetBSD's summary http://www.netbsd.org/docs/misc/#softdep-impact says:
The FFS takes care to correctly order all metadata operations, as
well as to ensure that all metadata operations precede operations on
the data to which they refer, so that the file system may be
guaranteed to be recoverable after a crash. The last N seconds of file
data may not be recoverable, where N is the syncer interval, but the
file system metadata will be. N is usually 30.

So my interpretation of this is that my missing file is a
to-be-expected, ancient part of POSIX, unless I run sync after every
write. Is this right? Out of curiosity, what would happen if I ran
sync and pulled the power at the same time (that is, what cases can
cause the filesystem to get inconsistent)?


But I still don't get how softdep fits into all this. That page goes on:

With softdeps running, you've got almost the same guarantee. With
softdeps, you have the guarantee that you will get a consistent
snapshot of the file system as it was at some particular point in time
before the crash. So you don't know, as you did without softdeps,
that, for example, if you did an atomic operation such as a rename of
a lock file, the lock file will actually be there; but you do know
that the directory it was in won't be trashed and you do know that
ordering dependencies between that atomic operation and future atomic
operations will have been preserved, so if you are depending on atomic
operations to control, say, some database-like process (e.g. writing
mail spool files in batches, gathering data from a transaction system,
etc.) you can safely start back up where you appear to have left off.

but while I kind of grasp the details, I can't seem to figure out what
they mean in context.

Enlightenment appreciated! I don't want to be that guy in 20 years who
rewrites the filesystem to be more efficient by not actually writing
the quantum-light-platter.

(and btw, why isn't http://openbsd.org/papers linked from the front page?)

-Nick



Re: Truncation Data Loss

2009-11-10 Thread Ted Unangst
On Tue, Nov 10, 2009 at 4:29 AM, Nick Guenther kou...@gmail.com wrote:
 So, as nicely summarized at
 http://www.h-online.com/open/news/item/Possible-data-loss-in-Ext4-740467.html,
 ext4 is kind of broken. It won't honor fsync and, as a /feature/, will
 wait up to two minutes to write out data, leading to lots of files
 emptied to the great bitbucket in the sky if the machine goes down in
 that period. Why is this relevant to OpenBSD? Well sometimes I've been
 writing a file in vi or mg and had my machine go down, and when it
 comes back I find that the file is empty and I'm just trying to figure
 out if this is just because the data wasn't fsync'd or if it's because
 of softdep or what.

softdep has that effect.  The file was created and then data written.
But softdep cares more about the first op than the second, so there's
a window where crashing will cause you to wake up with empty files.

Without softdep, it's more likely you'll have your data (though it may
even be the old version, and you may have to look in lost+found for
it).  softdep works fine with fsync, but the old unix trick of write
data then rename leads to empty files, because the rename is sped up
but the data isn't.
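
For what it's worth, the write-then-rename trick can be repaired by
fsyncing the data before the rename, so the data cannot lag behind the
fast metadata operation. A minimal sketch in plain POSIX C (the
function and parameter names are illustrative):

#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int
save_atomically(const char *path, const char *tmp,
    const char *buf, size_t len)
{
	int fd = open(tmp, O_WRONLY | O_CREAT | O_TRUNC, 0644);
	if (fd == -1)
		return -1;
	/* Make the data durable before the rename commits it; without
	 * the fsync, the rename can be ordered ahead of the data. */
	if (write(fd, buf, len) != (ssize_t)len || fsync(fd) == -1) {
		close(fd);
		unlink(tmp);
		return -1;
	}
	close(fd);
	return rename(tmp, path);	/* rename(2) is the commit point */
}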



Re: Truncation Data Loss

2009-11-10 Thread Theo de Raadt
[...]

There is a very simple explanation for why things are so.

Actual data file loss has never been what these things were coded for.

filesystem *tree and meta-data*, ie. the structure of how things are
knit together, is the main concern.  If you lose the filesystem tree
structure, you've lost all your files, not just the newest ones.
Therefore the goal is safe metadata handling.  The result is you can
lose specific data in specific (newly written to) files, but the
structure of the filesystem is consistent enough for fsck to not damage
it.

If you want to never lose data, you have an option.  Make the filesystem
synchronous, using the -o sync option.

If you can't accept the performance hit from that, then please accept
that all the work done over the ages is only on ensuring metadata-safety
for a low performance penalty.  It has never been about trying to
promise file data consistency when that could only be achieved by
synchronous file data writing.



Re: Truncation Data Loss

2009-11-10 Thread Jussi Peltola
On Tue, Nov 10, 2009 at 11:18:57AM -0700, Theo de Raadt wrote:
 If you want to never lose data, you have an option.  Make the filesystem
 synchronous, using the -o sync option.
 
 If you can't accept the performance hit from that, then please accept
 that all the work done over the ages is only on ensuring metadata-safety
 for a low performance penalty.  It has never been about trying to
 promise file data consistency when that could only be achieved by
 synchronous file data writing.
 
And the more or less correct solution to improve the performance is a
battery-backed RAID write cache, but it's no silver bullet.



Re: Truncation Data Loss

2009-11-10 Thread Bob Beck
2009/11/10 Jussi Peltola pe...@pelzi.net:
 On Tue, Nov 10, 2009 at 11:18:57AM -0700, Theo de Raadt wrote:
 [...]

 And the more or less correct solution to improve the performance is a
 battery-backed RAID write cache, but it's no silver bullet.



 Other than that it will still blow goats, because it will be bashing
all that data synchronously over the bus.

The best silver bullets are the bullets that just shoot the users that
care about this and/or performance. Once you shoot enough of them,
performance improves to an acceptable level.



Re: Truncation Data Loss

2009-11-10 Thread Bryan Irvine
I lost a picture of Bob Beck's ass this same exact way.

-B

On Tue, Nov 10, 2009 at 1:29 AM, Nick Guenther kou...@gmail.com wrote:
 [...]



Re: Truncation Data Loss

2009-11-10 Thread Brad Tilley
On Tue, Nov 10, 2009 at 5:25 PM, Bob Beck b...@openbsd.org wrote:
 2009/11/10 Jussi Peltola pe...@pelzi.net:
 [...]

 And the more or less correct solution to improve the performance is a
 battery-backed RAID write cache, but it's no silver bullet.


  Other than that it will still blow goats, because it will be bashing
 all that data synchronously over the bus.

On a positive note, when a RAID controller bug corrupts data, the
corruption is much more efficient than it would have been without
RAID.

Brad



Re: Truncation Data Loss

2009-11-10 Thread Nick Guenther
On Tue, Nov 10, 2009 at 1:18 PM, Theo de Raadt dera...@cvs.openbsd.org
wrote:
[...]

 If you want to never lose data, you have an option.  Make the filesystem
 synchronous, using the -o sync option.

 If you can't accept the performance hit from that, then please accept
 that all the work done over the ages is only on ensuring metadata-safety
 for a low performance penalty.  It has never been about trying to
 promise file data consistency when that could only be achieved by
 synchronous file data writing.


Thank you Ted and Theo for setting the record straight. I'm still a
bit confused so in the hopes of enlightening us all I'd like to keep
asking.

See, since it seems that BSD doesn't have this file-data consistency
guarantee, are Linus' worries about ext4's potential data loss just
being alarmist? It seems to me that the case described in
https://bugs.edge.launchpad.net/ubuntu/+source/linux/+bug/317781/comments/45
is just as likely to happen on OpenBSD--if I run KDE or GNOME and mess
around with my settings then quickly murder the system, the files will
be resurrected empty, right?

Another summary article,
http://www.h-online.com/open/news/item/Kernel-developers-squabble-over-Ext3-and-Ext4-740787.html,
says that with ext3 mounted data=ordered, "changes to metadata only
become valid after writing the payload data". My understanding is that
the way this works is the metadata gets journalled to a scratch area
on the disk, then once the syncer gets around to actually writing the
file data (the 'payload') to some new unused location on disk, the
metadata in the journal gets written to the disk too. If the system
goes down before the payload gets written (or even after, but before
the metadata) then the old version of the file is the one still in the
filesystem. This way a file is either in its old state or new state,
never in-between. So then where would my empty file example fit in? Is
it impossible on ext3?

I know I'm getting off topic a bit, but I know this list is clear
enough to clean up the mud puddle. I'm trying to understand the
implementation choices of my chosen OS, so that I can defend them to
Linux zealots. This table summarizes my understanding of the
approximate equivalencies between the various ext and ffs modes.
Please, if I'm totally off, hit me:
[ext3 data= / FFS]
journal ~= sync (ensures consistency of both metadata and file data)
ordered ~= softdep (ensures consistency of metadata both internally
and with file data)
writeback ~= default (ensures consistency of metadata internally but
real file data may not agree, e.g. my empty file)
Additionally FFS has the async flag which turns off the internal
consistency of the metadata structures; I guess there's no equivalent
for this in ext?
What is the reason softdep isn't on by default?

Sorry for being long winded, but I'm thinking some people will be
tempted into geeking out with me,
-Nick

[My own personal See Also section:
ext3 from the horse's mouth
http://olstrans.sourceforge.net/release/OLS2000-ext3/OLS2000-ext3.html
ext3 internals in reality
http://www.sans.org/reading_room/whitepapers/forensics/taking_advantage_of_ext3_journaling_file_system_in_a_forensic_investigation_2011
softdep from the horse's mouth 

Re: Truncation Data Loss

2009-11-10 Thread Daniel Ouellet
Bryan Irvine wrote:
 I lost a picture of Bob Becks ass this same exact way.

Very popular piece of art!

And a collector's item these days, especially in Germany it looks like! (;

Might be the next hot item on some stickers coming your way next release! (;

Probably would, however, need a disclaimer requiring buyers to be 18 to
open the new packages.