Re: fsync in glib/gio

2009-03-25 Thread Freddie Unpenstein
 fsync() performance will always be crappy on notebooks. If the disk is
 spun down, fsync() will take 5 seconds or more...

This is why it needs to be opt-in.  An environment variable for those 
concerned, to enable fsync().

That also helps answer the question of how much, or how little.  Let the 
environment variable be a fsynchronisation level.
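
A minimal sketch of the idea, assuming a made-up variable name
EXAMPLE_FSYNC_LEVEL and three made-up levels. Nothing here is an actual
glib setting; the names are only for illustration.

```python
import os

# Hypothetical levels; neither the variable name nor the values come from glib.
FSYNC_NEVER = 0      # never call fsync()
FSYNC_OVERWRITE = 1  # fsync() only when replacing an existing file
FSYNC_ALWAYS = 2     # fsync() on every save


def fsync_level(environ=os.environ):
    """Read the opt-in level from the hypothetical EXAMPLE_FSYNC_LEVEL variable."""
    raw = environ.get("EXAMPLE_FSYNC_LEVEL", "0")
    try:
        level = int(raw)
    except ValueError:
        return FSYNC_NEVER  # unparsable value: stay opted out
    # Clamp out-of-range values into the known levels.
    return max(FSYNC_NEVER, min(FSYNC_ALWAYS, level))


def should_fsync(target_exists, environ=os.environ):
    """Decide whether a save should fsync(), given whether the target exists."""
    level = fsync_level(environ)
    if level == FSYNC_ALWAYS:
        return True
    if level == FSYNC_OVERWRITE:
        return target_exists
    return False
```

With the variable unset the default is "never", which keeps the behaviour
opt-in for those who are concerned.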


Fredderic


gtk-devel-list mailing list
gtk-devel-list@gnome.org
http://mail.gnome.org/mailman/listinfo/gtk-devel-list


Re: fsync in glib/gio

2009-03-24 Thread Pavel Machek

 Yeah, I'd totally agree.  But in the absence of an ability to change the
 spec, it's best to try to make things work as well as they can within
 the spec, no?  It seems like some people are advocating "well, today
 everyone uses ext3, and there's no problem, so we shouldn't do this
 because it'll reduce performance there."  And of course, a year from now
 (or less!  obviously some already are), I'm sure most desktop distros
 will be shipping with ext4 default.  (And I could be wrong, but it seems
 to me that ext3 is the only FS that, by coincidence, will usually be
 immune to this problem, and, also coincidentally, is one of the only
 FSes that has crappy fsync() performance.)

fsync() performance will always be crappy on notebooks. If the disk is
spun down, fsync() will take 5 seconds or more...
Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) 
http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html


Re: fsync in glib/gio

2009-03-24 Thread Pavel Machek
Hi!

   I think you don't understand the problem.
  
  That might very well be the case. I had a look at the presentation that
  Alex linked to in the initial post in this thread. But I would have
  preferred a document that doesn't look at the issue from a database
  developer point of view.
 
 Here is a comment from the save file point of view.
 https://bugs.edge.launchpad.net/ubuntu/+source/linux/+bug/317781/comments/54
 
  It seems wrong to work around broken file-systems on the application
  level. That only takes away pressure from the file-system developers to
  address the problem properly.
 
 I don't disagree, but on the other hand. Users are losing data as we
 speak. (See above ubuntu bug report)

It got fixed in ext4...

 One compromise we could make is to only fsync in the case we're actually
 overwriting an existing file. This would mean that we don't risk
 losing

You should fsync just before doing rename, preferably in some special
way so that we can tell it apart from 'normal' fsync.

If you are overwriting files with truncate that is broken and
always was. You need to use rename. 
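
A minimal sketch of the fsync-then-rename save, assuming a POSIX system.
The function name save_atomically is made up for illustration and is not
the gio implementation. Note that mkstemp() creates the temp file with
mode 0600, so a real save would also have to carry over the permissions.

```python
import os
import tempfile


def save_atomically(path, data):
    """Replace `path` with `data` (bytes): temp file, fsync, then rename."""
    directory = os.path.dirname(os.path.abspath(path))
    # Create the temp file in the target directory so that rename()
    # never has to cross a filesystem boundary.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".tmp-")
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
            tmp.flush()             # push Python's buffer to the kernel
            os.fsync(tmp.fileno())  # ask the kernel to push the data to disk
        os.rename(tmp_path, path)   # swap the new file into place
    except BaseException:
        # On any failure, leave the old file alone and drop the temp file.
        try:
            os.unlink(tmp_path)
        except FileNotFoundError:
            pass
        raise
```

The old file is never truncated: until the rename happens, `path` still
names the complete old contents.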
Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) 
http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html


Re: fsync in glib/gio

2009-03-24 Thread Alexander Larsson
On Tue, 2009-03-24 at 10:01 +0100, Pavel Machek wrote:
 
   It seems wrong to work around broken file-systems on the application
   level. That only takes away pressure from the file-system developers to
   address the problem properly.
  
  I don't disagree, but on the other hand. Users are losing data as we
  speak. (See above ubuntu bug report)
 
 It got fixed in ext4...

Yes, but not in e.g. XFS.

  One compromise we could make is to only fsync in the case we're actually
  overwriting an existing file. This would mean that we don't risk
  losing
 
 You should fsync just before doing rename, preferably in some special
 way so that we can tell it apart from 'normal' fsync.

This is what I mean of course. And there is no such special way
unfortunately.

 If you are overwriting files with truncate that is broken and
 always was. You need to use rename. 

Of course.



Re: fsync in glib/gio

2009-03-24 Thread Alexander Larsson
On Tue, 2009-03-24 at 10:38 +0100, Pavel Machek wrote:
 On Tue 2009-03-24 10:30:52, Alexander Larsson wrote:
  On Tue, 2009-03-24 at 10:01 +0100, Pavel Machek wrote:
   
     It seems wrong to work around broken file-systems on the application
     level. That only takes away pressure from the file-system developers
     to address the problem properly.

    I don't disagree, but on the other hand. Users are losing data as we
    speak. (See above ubuntu bug report)
   
   It got fixed in ext4...
  
  Yes, but not in e.g. XFS.
 
 Well, given enough pressure, I'm sure XFS can be fixed too. (And then
 it can perhaps gain its reliable filesystem badge).

Even if it is fixed there would be other filesystems like that flash
filesystem that nokia uses that do the same thing.

    One compromise we could make is to only fsync in the case we're actually
    overwriting an existing file. This would mean that we don't risk
    losing
   
   You should fsync just before doing rename, preferably in some special
   way so that we can tell it apart from 'normal' fsync.
  
  This is what I mean of course. And there is no such special way
  unfortunately.
 
 One proposal was to create replace() call (doing proper replacement
 of one file with another). It could start up as fsync(); rename()
 initially, but it should slowly move up into glibc and kernel, and do
 the right thing.

I proposed something similar, and I've seen multiple other proposals
too. However, I'll believe it when I can call it on a released distro.
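
For reference, the "start as fsync(); rename()" version of such a call
could look roughly like this. The name replace() and its signature are
hypothetical; no such call exists in glibc or the kernel, and the fsync()
here is only a stand-in for whatever cheaper barrier might replace it.

```python
import os


def replace(src, dst):
    """Hypothetical replace(): flush src's data to disk, then rename it over dst."""
    # Opened for writing because some systems only accept fsync() on a
    # writable descriptor; nothing is written or truncated here.
    fd = os.open(src, os.O_WRONLY)
    try:
        os.fsync(fd)  # stand-in for a cheaper ordering barrier
    finally:
        os.close(fd)
    os.rename(src, dst)
```

Callers would only ever see the single call, so the body could later be
swapped for something lighter without touching applications.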




Re: fsync in glib/gio

2009-03-24 Thread Xavier Bestel
On Tue, 2009-03-24 at 10:38 +0100, Pavel Machek wrote:
    One compromise we could make is to only fsync in the case we're actually
    overwriting an existing file. This would mean that we don't risk
    losing
   
   You should fsync just before doing rename, preferably in some special
   way so that we can tell it apart from 'normal' fsync.
  
  This is what I mean of course. And there is no such special way
  unfortunately.
 
 One proposal was to create replace() call (doing proper replacement
 of one file with another). It could start up as fsync(); rename()
 initially, but it should slowly move up into glibc and kernel, and do
 the right thing.

Isn't that a bit heavyweight? The minimal thing needed would be a sort
of barrier, to ensure the file state is coherent before and after the
rename() - i.e. replace()+unlink() shouldn't do more I/O than unlink().

IMHO the kernel should guarantee a safe rename() (more than the minimal
POSIX requirement) with either the old or new file on disk. But maybe
that's too hard to fix.

Xav




Re: fsync in glib/gio

2009-03-24 Thread Alexander Larsson
On Tue, 2009-03-24 at 11:09 +0100, Xavier Bestel wrote:
 On Tue, 2009-03-24 at 10:38 +0100, Pavel Machek wrote:
     One compromise we could make is to only fsync in the case we're actually
     overwriting an existing file. This would mean that we don't risk
     losing
    
    You should fsync just before doing rename, preferably in some special
    way so that we can tell it apart from 'normal' fsync.
   
   This is what I mean of course. And there is no such special way
   unfortunately.
  
  One proposal was to create replace() call (doing proper replacement
  of one file with another). It could start up as fsync(); rename()
  initially, but it should slowly move up into glibc and kernel, and do
  the right thing.
 
 Isn't that a bit heavyweight? The minimal thing needed would be a sort
 of barrier, to ensure the file state is coherent before and after the
 rename() - i.e. replace()+unlink() shouldn't do more I/O than unlink().

The idea is of course that replace() would eventually be replaced with
something that did less when that is available.

 IMHO the kernel should guarantee a safe rename() (more than the minimal
 POSIX requirement) with either the old or new file on disk. But maybe
 that's too hard to fix.

Just because some kernel guarantees this doesn't mean that glib can rely
on that though.



Re: fsync in glib/gio

2009-03-17 Thread Sven Neumann
Hi,

On Mon, 2009-03-16 at 17:04 +0100, Alexander Larsson wrote:

 I committed something like this limited patch to svn. However, I didn't
 add the public API parts yet.

I've also given in and accepted that using fsync() is needed at least
when replacing a file. I've done a similar change for GIMP in
libgimpconfig/gimpconfigwriter.c.


Sven




Re: fsync in glib/gio

2009-03-16 Thread Michael Meeks

On Sun, 2009-03-15 at 10:19 +0100, Alexander Larsson wrote:
 The debate is far from over with this. gio should be made slower and do
 unnecessary synchronous I/O in order to fulfill the standards, yes.

Sure, it should fsync on ext4-before-it-was-fixed systems - it sucks to
lose data; though I'm still unconvinced this is a standards issue :-)

 But there are millions of lines of code that do the rename as
 atomic replace and the chances that anywhere near a majority of those
 are fixed are extremely slim. Therefore everyone should be aware of
 the broken filesystems that don't give data-before-metadata-on-rename
 guarantees so that sane people can stay away from them.

Out of interest, what distributions are shipping with ext4 configured
in this helpful "lose your data" mode? Can we not simply inject the
"go slower" patch into these ext4 distros [ since it won't affect their
performance quite as badly as everywhere else ], as a temporary
workaround; and then sit back and let the default glib behaviour be to
work well on all sane systems? :-)

Regards,

Michael.

-- 
 michael.me...@novell.com  , Pseudo Engineer, itinerant idiot




Re: fsync in glib/gio

2009-03-16 Thread Alexander Larsson
On Mon, 2009-03-16 at 10:23 +, Michael Meeks wrote:
 On Sun, 2009-03-15 at 10:19 +0100, Alexander Larsson wrote:
  The debate is far from over with this. gio should be made slower and do
  unnecessary synchronous I/O in order to fulfill the standards, yes.
 
 Sure, it should fsync on ext4-before-it-was-fixed systems - it sucks to
 lose data; though I'm still unconvinced this is a standards issue :-)

There are no requirements in any standard as to what happens on system
crashes.

  But there are millions of lines of code that do the rename as
  atomic replace and the chances that anywhere near a majority of those
  are fixed are extremely slim. Therefore everyone should be aware of
  the broken filesystems that don't give data-before-metadata-on-rename
  guarantees so that sane people can stay away from them.
 
 Out of interest, what distributions are shipping with ext4 configured
 in this helpful "lose your data" mode? Can we not simply inject the
 "go slower" patch into these ext4 distros [ since it won't affect their
 performance quite as badly as everywhere else ], as a temporary
 workaround; and then sit back and let the default glib behaviour be to
 work well on all sane systems? :-)

It seems both Ubuntu and F11 will backport the ext4 fixes. However, even
with these fixed there is a risk of data loss on e.g. XFS, and even ext3
if you configure it in data=writeback mode. There were also reports from
Nokia about it being unsafe on the flash file system they were using.



Re: fsync in glib/gio

2009-03-16 Thread Sven Herzberg
On Monday, 16.03.2009, at 11:49 +0100, Alexander Larsson wrote:
 On Mon, 2009-03-16 at 10:23 +, Michael Meeks wrote:
  On Sun, 2009-03-15 at 10:19 +0100, Alexander Larsson wrote:
   But there are millions of lines of code that do the rename as
   atomic replace and the chances that anywhere near a majority of those
   are fixed are extremely slim. Therefore everyone should be aware of
   the broken filesystems that don't give data-before-metadata-on-rename
   guarantees so that sane people can stay away from them.
  
  Out of interest, what distributions are shipping with ext4 configured
  in this helpful "lose your data" mode? Can we not simply inject the
  "go slower" patch into these ext4 distros [ since it won't affect their
  performance quite as badly as everywhere else ], as a temporary
  workaround; and then sit back and let the default glib behaviour be to
  work well on all sane systems? :-)
 
 It seems both Ubuntu and F11 will backport the ext4 fixes. However, even
 with these fixed there is a risk of data loss on e.g. XFS, and even ext3
 if you configure it in data=writeback mode. There were also reports from
 Nokia about it being unsafe on the flash file system they were using.

If the flash file system is this one (they mention Nokia on the website -
2nd link):

http://www.linux-mtd.infradead.org/faq/ubifs.html#L_powercut
http://www.linux-mtd.infradead.org/doc/ubifs.html

This might be either not noticed by the developers or not an issue with
their file system.

Regards,
  Sven



Re: fsync in glib/gio

2009-03-16 Thread Alexander Larsson
On Mon, 2009-03-16 at 11:55 +0100, Sven Herzberg wrote:
 On Monday, 16.03.2009, at 11:49 +0100, Alexander Larsson wrote:
  On Mon, 2009-03-16 at 10:23 +, Michael Meeks wrote:
   On Sun, 2009-03-15 at 10:19 +0100, Alexander Larsson wrote:
    But there are millions of lines of code that do the rename as
    atomic replace and the chances that anywhere near a majority of those
    are fixed are extremely slim. Therefore everyone should be aware of
    the broken filesystems that don't give data-before-metadata-on-rename
    guarantees so that sane people can stay away from them.
   
   Out of interest, what distributions are shipping with ext4 configured
   in this helpful "lose your data" mode? Can we not simply inject the
   "go slower" patch into these ext4 distros [ since it won't affect their
   performance quite as badly as everywhere else ], as a temporary
   workaround; and then sit back and let the default glib behaviour be to
   work well on all sane systems? :-)
  
  It seems both Ubuntu and F11 will backport the ext4 fixes. However, even
  with these fixed there is a risk of data loss on e.g. XFS, and even ext3
  if you configure it in data=writeback mode. There were also reports from
  Nokia about it being unsafe on the flash file system they were using.
 
 If flash file system is this one (they mention Nokia on the website -
 2nd link):
 
 http://www.linux-mtd.infradead.org/faq/ubifs.html#L_powercut
 http://www.linux-mtd.infradead.org/doc/ubifs.html
 
 This might be either not noticed by the developers or not an issue with
 their file system.

Yes, this was it. See:
http://bugzilla.gnome.org/show_bug.cgi?id=562976



Re: fsync in glib/gio

2009-03-15 Thread Stef Walter
Mark Mielke wrote:
 I think fsync() is absolutely necessary to be explicit in this
 situation, because the application needs to assert that all data is
 written *before* using rename to perform the atomic-change-in-place
 effect. I think that anybody who thinks fsync() is unnecessary is
 failing to see the principle that fsync() exists solely for the purpose
 of guaranteeing this state, and that if you think fsync() should be
 unnecessary here, you should also think fsync() should be unnecessary
 anywhere else. Why have an fsync() at all? Why shouldn't all operations
 be synchronous by nature? Change the specification to force all I/O
 operations to be ordered that way no application developer will ever
 have to be surprised or ever call a synchronization primitive again. Right?

fsync() was really broken on ext3. Now, all of a sudden it's teh
awesome FTW!!!

There's a reason people haven't been using it. It could take an obscene
amount of time to complete depending on what you happened to be doing
elsewhere in the (multi-tasking, no less) OS.

Stef



Re: fsync in glib/gio

2009-03-15 Thread Alexander Larsson
On Sat, 2009-03-14 at 21:02 -0400, Mark Mielke wrote:

 The debate should be over. Debating about other file systems and some 
 theoretical change to the spec is quite pointless in gtk-devel-list. At 
 best, it's a legitimate rant. At worst, it's an ignorant rant. In any 
 case, it's a rant. Fix glib/gio for the rename atomic change-in-place 
 case specifically. Everybody is happy from a glib/gio perspective. If 
 thousands of other applications are still broken - who cares?

The debate is far from over with this. gio should be made slower and do
unnecessary synchronous I/O in order to fulfill the standards, yes. But
there are millions of lines of code that do the rename as atomic
replace and the chances that anywhere near a majority of those are
fixed are extremely slim. Therefore everyone should be aware of the
broken filesystems that don't give data-before-metadata-on-rename
guarantees so that sane people can stay away from them.




Re: fsync in glib/gio

2009-03-15 Thread Alexander Larsson
On Sat, 2009-03-14 at 14:30 -0700, Brian J. Tarricone wrote:

  2) such filesystems are broken
  
  Clearly the answer to 1 is yes. Anything else would be a disservice to
  our users data. However, that doesn't mean such filesystems aren't
  broken, in the sense that I would never let a filesystem like that
  near any of my data.
 
 Well, you're certainly entitled to your opinion.  Personally I don't
 think the filesystem is broken.  It's behaving within spec.  You can
 argue that the spec is broken, or that it specifies behavior that
 defies common sense, but it's there, and it's been there for quite a
 while, so I don't quite understand why this is suddenly such a big
 deal.  (Well, sure I do: no one decided to use a FS that behaves like
 this as a distro default before, I guess.)

All filesystems, including ext2 which may lose all filesystem data in
the case of a crash, are behaving within spec. Any guarantee beyond
this is a quality/robustness of implementation issue in the particular
filesystem in use. Now, people did not think the robustness of ext2 was
enough, so ext3 added journaling. 

The commonly used idiom of write-to-tempfile-then-rename-over-target is
atomic as per POSIX spec (i.e. when the system doesn't crash, as that is
outside the spec), in that any app opening the file at any time gets
either the full old data or the full new data.
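
A small sketch of that no-crash guarantee, assuming a POSIX system (the
function name and file names are made up for illustration). A reader that
opened the file before the rename keeps seeing the complete old data, and
a reader that opens it afterwards sees the complete new data. It says
nothing about what survives a crash.

```python
import os
import tempfile


def demo_reader_sees_whole_file():
    """Return (bytes seen by an already-open reader, bytes seen by a new open)."""
    with tempfile.TemporaryDirectory() as d:
        target = os.path.join(d, "settings")
        tmp = os.path.join(d, "settings.tmp")
        with open(target, "wb") as f:
            f.write(b"old contents")
        old_reader = open(target, "rb")   # opened before the replace
        try:
            with open(tmp, "wb") as f:
                f.write(b"new contents")
            os.rename(tmp, target)        # the name now points at the new file
            seen_by_old = old_reader.read()
        finally:
            old_reader.close()
        with open(target, "rb") as f:
            seen_by_new = f.read()
    return seen_by_old, seen_by_new
```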

Now, a *good* (i.e. not only correct) filesystem should imho carry over
this property to a system-crash level. I understand that this is not
required by any document, and that it might cause some performance
overhead, but the same is true of e.g. journaling in the first place.
If we're merely going after the minimum requirements per the spec, why
even add journaling?

 That sounds pretty awful, to me, to be honest.  So every FS -- no, wait
 -- every FS that's going to be pushed as a mainstream FS -- is going
 to have to be closely monitored to make sure it doesn't have this
 behavior?  Everyone's going to be putting little band-aids over this
 issue, but only in FSes we care about?  The underlying issue, the
 root cause -- that the spec allows what many consider very unsafe
 behavior -- is just going to be ignored?

Well, ideally there should be a way to say you need data-before-metadata
guarantees so that you don't have to force a fsync. However, I don't see
that happening really.




Re: fsync in glib/gio

2009-03-15 Thread Mark Mielke

Stef Walter wrote:
 Mark Mielke wrote:
  I think fsync() is absolutely necessary to be explicit in this
  situation, because the application needs to assert that all data is
  written *before* using rename to perform the atomic-change-in-place
  effect. I think that anybody who thinks fsync() is unnecessary is
  failing to see the principle that fsync() exists solely for the purpose
  of guaranteeing this state, and that if you think fsync() should be
  unnecessary here, you should also think fsync() should be unnecessary
  anywhere else. Why have an fsync() at all? Why shouldn't all operations
  be synchronous by nature? Change the specification to force all I/O
  operations to be ordered that way no application developer will ever
  have to be surprised or ever call a synchronization primitive again. Right?
 
 fsync() was really broken on ext3. Now, all of a sudden it's teh
 awesome FTW!!!
 
 There's a reason people haven't been using it. It could take an obscene
 amount of time to complete depending on what you happened to be doing
 elsewhere in the (multi-tasking, no less) OS.
  


This depends on what your priority is. If your priority is to be 
absolutely certain that the file be intact on power failure - then you 
really have no other option.


Most people have not had this requirement in the past.

The rename to effect atomic-change-in-place is a scenario where you want 
a stronger guarantee. It's not about fsync() being awesome - it's 
about it being necessary to achieve this guarantee in a portable way - 
whatever the cost.


Cheers,
mark

--
Mark Mielke m...@mielke.cc



Re: fsync in glib/gio

2009-03-15 Thread Mark Mielke

Alexander Larsson wrote:
 Of course, given that POSIX allows this behaviour we should
 probably use the fsync hammer to make the risks for data loss less at
 least in some cases. But to argue that such behaviour from the
 filesystem is *good*. It boggles the mind.
 
 Anyway, this argument is over for me. XFS has long had problems with
 this but they have now changed so that rename overwrite is safe (they
 even verify this in their QA runs these days), ext4 will have patches
 for this in 2.6.30, and the btrfs maintainer said he will queue similar
 patches for 2.6.30. We'll probably add the fsync to glib saving in the
 file-already-exists case in order to protect against this on other
 systems, but the main future linux filesystems at least are sane in
 their default configurations.


I think we agree more than disagree on the principles. That is, fsync() 
is a sort of hammer, but it probably should be used to explicitly ensure 
that operation is safe, even on systems that do not have built-in 
protection for this particular sequence of operations.


The part we disagree on, I think, is the best course of action. Fighting 
with each file system provider to provide a guarantee not listed in the 
spec, is going to be painful whether you are right or wrong.


I somewhat liked the ReiserFS approach - although I never used it 
myself. I believe they allowed you to demarcate transaction boundaries 
and commit the whole transaction or not at all. This has less of the 
baggage that comes with fsync(), and makes the intent of the application 
clear to the file system. This even works for more complicated 
situations than the rename() case people are fussing over. All 
non-standard extensions of course. If you really feel that the current 
behaviour is wrong - the best approach would be to change the spec.


Cheers,
mark

--
Mark Mielke m...@mielke.cc



Re: fsync in glib/gio

2009-03-14 Thread Alexander Larsson
On Fri, 2009-03-13 at 18:20 -0600, Federico Mena Quintero wrote:
 On Fri, 2009-03-13 at 22:16 +0100, Alexander Larsson wrote:
 
  It's well explained in the various discussions about this. Essentially,
  the metadata for the rename is written to disk, but the data in the file
  is not (yet, due to delayed allocation) and then the system crashes. On
  fsck we discover the file is broken (no data) and set the file size to
  0.
 
 This reminds me a lot of
 http://bugzilla.gnome.org/show_bug.cgi?id=562396 - a problem with
 Nautilus metadata.  You start a copy operation, but if you do a read
 before the copy is done, then you get the old data.  You should wait
 for the copy to be done first, but anyway, my point is...
 
 My point is that the kernel could perfectly well ensure that metadata
 operations that depend on data operations will not be reordered.  Don't
 rename a file in a directory if we have outstanding writes for the
 inode, or something.  (After the rename, do you need to open the
 directory and fsync it?  You can't open directories for writing...)

It could, but that is a high cost for filesystem performance, so it's not
always there (e.g. in XFS), or it is optional (possibly enabled by default).

I don't think the directory fsync matters to get a safe save: if the
directory metadata is not written we would get the old file. There is no
risk of destroying both files if directory metadata is not written. Of
course, if you want to guarantee the new file is on disk you need to fsync
the directory too.
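
On the question of opening the directory: it does not need to be opened
for writing. A minimal sketch, assuming Linux behaviour where a directory
opened read-only can be passed to fsync() (the helper name is made up,
and some filesystems may refuse the call):

```python
import os


def fsync_directory(path):
    """fsync the directory containing `path` so a completed rename reaches disk."""
    directory = os.path.dirname(os.path.abspath(path))
    fd = os.open(directory, os.O_RDONLY)  # directories are opened read-only
    try:
        os.fsync(fd)
    finally:
        os.close(fd)
```

This would be called after the rename, and only when the caller needs the
new name itself to be durable rather than just "old or new, never empty".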




Re: fsync in glib/gio

2009-03-14 Thread Alexander Larsson
On Fri, 2009-03-13 at 14:34 -0700, Brian J. Tarricone wrote:

 
  Now, we don't actually really need the data to be on the disk at a
  certain time. On the contrary, it's really fine if it's delayed. But, what
  we want is either the old file in place, or the new file in place, not
  the old file deleted, the metadata for the new file and the new file
  being empty. That's what is broken, even if it's allowed by POSIX.
 
 Sure, but that's just a special case.  So you (as the app developer)
 recognise this, understand how the spec interacts with your use-case,
 and write robust code accordingly.
 
 Or, you take the "the spec/kernel/FS is broken" approach, and try to get
 a guarantee specified for the special case, something like "in the case
 where a file is renamed over top of an existing file, the source file
 must be flushed to disk before the rename takes place."  And then the
 app developer doesn't have to worry about it, because the implementation
 should do the right thing.

I think you are conflating two issues.

1) Should glib/gio fsync, at least in the really bad data loss case, to
make sure users are not losing data on such filesystems?

2) such filesystems are broken

Clearly the answer to 1 is yes. Anything else would be a disservice to
our users' data. However, that doesn't mean such filesystems aren't
broken, in the sense that I would never let a filesystem like that near
any of my data.

For instance, any script doing sed -i s/foo/bar/ file.conf on such a
filesystem risks ending up with a zero byte file.conf. (sed uses rename
but doesn't fsync.) Is this what users expect? Should that script be
rewritten in C so it can use fsync? Should sed fsync? That kind of
reasoning will lead to all apps implementing fsync-on-close manually,
and we're then worse off than if the fs just guaranteed
data-before-metadata-on-rename.

 Yeah, I'd totally agree.  But in the absence of an ability to change the
 spec, it's best to try to make things work as well as they can within
 the spec, no?  It seems like some people are advocating "well, today
 everyone uses ext3, and there's no problem, so we shouldn't do this
 because it'll reduce performance there."  And of course, a year from now
 (or less!  obviously some already are), I'm sure most desktop distros
 will be shipping with ext4 default.  (And I could be wrong, but it seems
 to me that ext3 is the only FS that, by coincidence, will usually be
 immune to this problem, and, also coincidentally, is one of the only
 FSes that has crappy fsync() performance.)

ext4 from the next kernel release will have patches that make the
rename case safe, although it will be an option that can be disabled.
Not sure about btrfs. It's unsafe at the moment, but Chris Mason is
talking about possible fixes in the LWN thread.



Re: fsync in glib/gio

2009-03-14 Thread Mark Mielke

Alexander Larsson wrote:

 2) such filesystems are broken
 
 Clearly the answer to 1 is yes. Anything else would be a disservice to
 our users' data. However, that doesn't mean such filesystems aren't
 broken, in the sense that I would never let a filesystem like that near
 any of my data.
 
 For instance, any script doing sed -i s/foo/bar/ file.conf on such a
 filesystem risks ending up with a zero byte file.conf. (sed uses rename
 but doesn't fsync.) Is this what users expect? Should that script be
 rewritten in C so it can use fsync? Should sed fsync? That kind of
 reasoning will lead to all apps implementing fsync-on-close manually,
 and we're then worse off than if the fs just guaranteed
 data-before-metadata-on-rename.
  


Broken file system or not - all portable applications that require the 
file system to be in a particular state before continuing should use 
fsync(). rename() to accomplish the effect of atomic-change-in-place is 
exactly the sort of scenario where you want to guarantee a particular 
state before continuing, and it is exactly the sort of place where 
fsync() should be performed explicitly. This is without regard to 
whatever general safety is provided by the file system. It is wrong to 
conclude that fsync() is unnecessary. Should sed -i use fsync()? If it 
is promising atomic-change-in-place, then it certainly should.


Also, if the user chooses a file system which makes fewer guarantees 
during a "pull the plug" test, they should be willing to live with their 
choices. In ext2 days and FAT16/32 days, the effects could be very bad.


This thread has focused on the rename() case, often used to have the 
atomic-change-in-place effect. There are other cases that even your most 
favourite file system mode may not protect you from. Most file systems 
won't guarantee a write() order to disk, as I listed before. Heck, even 
if you write() 2 x 512-byte blocks in a row - you are not guaranteed 
that the first block will be written before the second. The system 
probably tunes for sequential writes to improve performance, but it's 
not a guarantee, and if you wrote the second before the first - you 
might find that the first still writes before the second. This is 
probably worse in the mmap() case where pages are dirtied. Which page 
will be flushed to disk first? The only way you know for sure is with 
barriers like fsync(). This promises that it will not proceed until the 
state has been sent to disk.
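
A minimal sketch of using fsync() as an ordering barrier between two
writes, assuming a POSIX system (the function name is made up, and short
writes are not handled). Without the first fsync() nothing says which
block reaches the disk first.

```python
import os


def write_in_order(path, first, second):
    """Write `first`, force it out, then write `second` directly after it."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT, 0o644)
    try:
        os.pwrite(fd, first, 0)
        os.fsync(fd)   # barrier: do not continue until `first` has been sent to disk
        os.pwrite(fd, second, len(first))
        os.fsync(fd)   # and the same again for `second`
    finally:
        os.close(fd)
```

For example, write_in_order("journal.bin", block_a, block_b) would ensure
block_b is never on disk without block_a, at the cost of two synchronous
flushes.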


File system journalling was introduced to improve file system recovery 
speed and accuracy. It was not introduced to provide the ACID guarantees 
associated with database systems. There is room for it to take this 
direction - but the expectation that it must is unrealistic. If a lot of 
code out there happens to be buggy (for example - close()/rename() for 
atomic-change-in-place), then the file system can try to work around 
these bugs, but I think it's wrong to say that it must.


Cheers,
mark

--
Mark Mielke m...@mielke.cc



Re: fsync in glib/gio

2009-03-14 Thread Alexander Larsson
On Sat, 2009-03-14 at 13:38 -0400, Mark Mielke wrote:

 Should sed -i use fsync()? If it 
 is promising atomic-change-in-place, then it certainly should.

This is the same kind of reasoning that says it's OK to do something
because it's specified by POSIX. If it's not defined somewhere that sed -i
must use fsync, is it then OK to lose users' data? It's certainly per the
spec, but it's a pretty sucky system that I wouldn't want to use.

 Also, if the user chooses a file system which makes fewer guarantees 
 during a pull the plug test, they should be willing to live with their 
 choices. In ext2 days and FAT16/32 days, the effects could be very bad.

Sure, way back we had even crappier software. This shouldn't be the
reason to not hold today's standards higher in the robustness area.

 This thread has focused on the rename() case, often used to have the 
 atomic-change-in-place effect. There are other cases that even your most 
 favourite file system mode may not protect you from. Most file systems 
 won't guarantee a write() order to disk, as I listed before.

We focus on that one because it's of incredible importance, being the
historical way that one implements save on Unix (witness e.g. GNU sed
using it for in-place replace), and it's a pretty simple, common
operation. If you do all sorts of weird database or mmap operations
everyone expects that you have to handle these details. Especially,
nobody ever assumed the filesystem was a database or that it provides
ACID guarantees. Please don't use such strawman arguments.

We all understand that it is per-spec to not guarantee
data-before-metadata on rename; we're not stupid, and we can read a
manpage as well as you. But we still think it's a bad idea and not a sign
of robust software.



Re: fsync in glib/gio

2009-03-14 Thread Alexander Larsson
On Sat, 2009-03-14 at 19:21 +0100, Alexander Larsson wrote:

 We all understand that it is per-spec to not guarantee
 data-before-metadata on rename; we're not stupid, and we can read a
 manpage as well as you. But we still think it's a bad idea and not a sign
 of robust software.

Additionally, I'm not saying glib should not fsync in the rename case.
We should of course follow the spec so that glib apps don't lose data on
such filesystems. But that doesn't make such filesystem behaviour a good
idea.



Re: fsync in glib/gio

2009-03-14 Thread Brian J. Tarricone
On Sat, 14 Mar 2009 13:16:45 +0100 Alexander Larsson wrote:

 On Fri, 2009-03-13 at 14:34 -0700, Brian J. Tarricone wrote:
 
 I think you are conflating two issues.

No, I don't think I am.  I think I'm just replying to a subset of the
email that's slightly off topic.  Or rather, the fact that I'm only
replying to someone's assertion that the filesystem is broken and that
I was ignoring the issue of gio initially made my reply a bit off topic.

 1) Should glib/gio fsync at least in the really bad data loss case to
 make sure users are not losing data on such filesystem

Right, I think we're in perfect agreement that this is a 'yes', though
we may not agree that gio should fsync quite as often (I'd advocate to
do it *only* in the rename-over-existing case, and to have the default
be no-fsync everywhere else).

 2) such filesystems are broken
 
 Clearly the answer to 1 is yes. Anything else would be a disservice to
 our users' data. However, that doesn't mean such filesystems aren't
 broken, in the sense that I would never let a filesystem like that
 near any of my data.

Well, you're certainly entitled to your opinion.  Personally I don't
think the filesystem is broken.  It's behaving within spec.  You can
argue that the spec is broken, or that it specifies behavior that
defies common sense, but it's there, and it's been there for quite a
while, so I don't quite understand why this is suddenly such a big
deal.  (Well, sure I do: no one decided to use a FS that behaves like
this as a distro default before, I guess.)

This is, btw, why I wouldn't use XFS on a partition with data I care
about.  XFS was designed with priorities other than data integrity in
mind.  I accept that, use something else, and move on.  You don't see
anyone declaring XFS broken (well, at least, not anyone knowledgeable),
do you?  So why is ext4 broken?  Because it's being touted as an ext3
replacement, I guess...

 For instance, any script doing sed -i s/foo/bar/ file.conf on such a
 filesystem risks ending up with a zero byte file.conf. (sed uses
 rename but doesn't fsync.) Is this what users expect? Should that
 script be rewritten in C so it can use fsync? Should sed fsync?

Yes, it should!  Given the spec, if it does anything less, it's
risking user data.

 That
 kind of reasoning will lead to all apps implementing fsync-on-close
 manually, and we're then worse off than if the fs just guaranteed
 data-before-metadata-on-rename.

Yep, probably true.  But the right move here is to get the spec
changed, not say any filesystem that follows the spec but doesn't work
the way *I* think it should is broken.  How is that approach at all
useful in the big picture?

  Yeah, I'd totally agree.  But in the absence of an ability to
  change the spec, it's best to try to make things work as well as
  they can within the spec, no?  It seems like some people are
  advocating well, today everyone uses ext3, and there's no problem,
  so we shouldn't do this because it'll reduce performance there.
  And of course, a year from now (or less!  obviously some already
  are), I'm sure most desktop distros will be shipping with ext4
  default.  (And I could be wrong, but it seems to me that ext3 is
  the only FS that, by coincidence will usually be immune to this
  problem, and, also coincidentally, is one of the only FSes that has
  crappy fsync() performance.)
 
 ext4 from the next kernel release will have patches that make the
 rename case safe, although it will be an option that can be disabled.
 Not sure about btrfs. It's unsafe at the moment, but Chris Mason is
 talking about possible fixes in the LWN thread.

That sounds pretty awful, to me, to be honest.  So every FS -- no, wait
-- every FS that's going to be pushed as a mainstream FS -- is going
to have to be closely monitored to make sure it doesn't have this
behavior?  Everyone's going to be putting little band-aids over this
issue, but only in FSes we care about?  The underlying issue, the
root cause -- that the spec allows what many consider very unsafe
behavior -- is just going to be ignored?

Sigh.  Well, I guess I'm getting even more off topic here, so I give
up.  I see farther down in the replies Mark argues pretty much the same
thing I am, only doing a better job of it.

-brian


Re: fsync in glib/gio

2009-03-14 Thread Mark Mielke

Alexander Larsson wrote:

On Sat, 2009-03-14 at 19:21 +0100, Alexander Larsson wrote:
  

We all understand that it is per-spec to not guarantee
data-before-metadata on rename; we're not stupid, and we can read a
manpage as well as you. But we still think it's a bad idea and not a sign
of robust software.



Additionally, I'm not saying glib should not fsync in the rename case.
We should of course follow the spec so that glib apps don't lose data on
such filesystems. But that doesn't make such filesystem behaviour a good
idea.
  


Let's be clear about what you think is a bad idea. You think it's a bad 
idea for a file system to optimize a situation that is legal under the 
specification, but not well known by the application developers. You 
think that, because it is commonly done, the operating system 
and/or file system should guess what your intent is, and disable the 
optimization for this situation.


Let's take this away from file systems for a second and to a similar 
situation. A lot of Java designers still don't know about the Java 
memory model, and how changes to a variable in one thread may not be 
visible to another thread, or if visible, the changes may not be made in 
the same order unless synchronization primitives are used. This is the 
exact same situation. It's an allowance in the specification that is not 
well known by application developers, that requires careful use of 
synchronization primitives in order to function reliably and portably. 
fsync() is a synchronization primitive. In case people don't like Java 
here - similar things can happen in C, which is why compiler 
instructions like 'volatile' become necessary.


It's a newbie mistake of sorts. Here's another one - people who don't 
check the return result of close()/fclose(). I bet if you look through 
many of the people who do close()/rename(), you'll find that a lot of 
them don't check the result of close() before they call rename(). I note 
that the glib/gio as referenced in the patch properly checks the result 
of fclose(). It's very common for applications to not check this. Will 
you also say that any file system where the last write() succeeds, but 
the final close() fails is somehow broken? You used the argument that if 
every application must do it - shouldn't the file system implement it? 
Well, why doesn't this apply to close()? Why stop at rename()?


I'm not calling you stupid or saying you can't read a manpage. I am 
saying that any expectation on your part that file systems everywhere 
will one day universally implement your particular expectations is 
unreasonable, and not everybody agrees with you that the behaviour is 
wrong. I don't agree with you. The specification does not agree with 
you. I think fsync() is absolutely necessary to be explicit in this 
situation, because the application needs to assert that all data is 
written *before* using rename to perform the atomic-change-in-place 
effect. I think that anybody who thinks fsync() is unnecessary is 
failing to see the principle that fsync() exists solely for the purpose 
of guaranteeing this state, and that if you think fsync() should be 
unnecessary here, you should also think fsync() should be unnecessary 
anywhere else. Why have an fsync() at all? Why shouldn't all operations 
be synchronous by nature? Change the specification to force all I/O 
operations to be ordered; that way no application developer will ever 
have to be surprised or ever call a synchronization primitive again. Right?


I value asynchronous operations, and see synchronization primitives as 
being a necessary evil to allow for asynchronous operations to be 
possible. write() and close() never promise to store content to disk. 
rename() has nothing to do with content. The only way to guarantee the 
operation is safe is using fsync() before close(). Relying on the file 
system to guess your intent is unreasonable.


Cheers,
mark

--
Mark Mielke m...@mielke.cc



Re: fsync in glib/gio

2009-03-14 Thread Mark Mielke

Alexander Larsson wrote:

On Sat, 2009-03-14 at 13:38 -0400, Mark Mielke wrote:
  
Should sed -i use fsync()? If it 
is promising atomic-change-in-place, then it certainly should.



This is the same kind of reasoning that says its ok to do something
because its specified by posix. If its not defined somewhere that sed -i
must use fsync, is it then ok to lose users data. Its certainly per the
spec, but its a pretty sucky system that I wouldn't want to use.
  


If sed -i does NOT make strong promises about your data, and you trust 
sed -i with your data, then the person who lost the data is you, for 
choosing to use sed -i and trusting it with your data.


If sed -i DOES make strong promises about your data, then it should 
absolutely use fsync() to explicitly confirm that the data is safe 
before doing the rename().


Blaming it on the file system for not guessing your intent is 
unreasonable. If you truly have such expectations, make sure that all of 
your mounts are synchronous and that you only use fully journalled file 
systems. You will probably have to reduce your expectations about 
overall system performance as a result.


Personally, I've ALWAYS been wary of perl -i (same thing as sed -i). I 
have never read anywhere that it guarantees the safety of my data. I 
always do a backup of the files before I start, and carefully check the 
results after I finish. But, that's me...


Cheers,
mark

--
Mark Mielke m...@mielke.cc



Re: fsync in glib/gio

2009-03-14 Thread Morten Welinder
This is crazy.

People are actually advocating that thousands upon thousands of applications
need to be changed.

Yes, POSIX allows this particular idiotic behaviour.  So what?  It probably also
allows free() to do nothing, yet no-one in their right mind would want that.  Or
maybe you would be upset if the code fragment

const char *s = x;
int i = (s+1)-s;

formatted your hard drive.  Yes, the C standard really does allow that
to happen.  (C99 section 6.5.6 #9, if you really want the details.)  I
don't know about you, but I would return the compiler with a big
Broken! label if that happened.

The mere fact that a standard allows an idiotic implementation doesn't mean
we should play ball with it.  The same standard also allows sane
implementations.

We could litter fsync() calls all over, but...

1. It describes a semantic that isn't really what we want.  In fact, there
is no way to get exactly the semantics we want with POSIX.  We have to ask
for the please-wait-for-the-disk semantics we don't want.  That's a sure
way of getting sluggish programs.

2. Shell scripts, Makefiles, and other languages without explicit fsync
control will really kill you.  Instead of...

foo file file.new
mv file.new file

...you get to write...

foo file file.new
sync
mv file.new file

Performance might be affected.

3. Auditing and changing thousands of programs?  Expect bugs.

We already break the strict letter of POSIX and the C standard in fifty
different ways.  If someone shows up with an environment that doesn't
behave as we want, we say sorry, no ball.  Just add stupid file systems
to the list.

Morten


Re: fsync in glib/gio

2009-03-14 Thread Brian J. Tarricone
On Sat, 14 Mar 2009 20:10:32 -0400 Morten Welinder wrote:

 This is crazy.
 
 People are actually advocating that thousands upon thousands of
 applications need to be changed.

If they're behaving incorrectly, yes.

But I don't think most of them are.  The only case where *not* doing an
fsync() turns into a problem is if you have a power failure or system
crash at an inopportune time.  So for many applications, that risk
might be acceptable (not all files have equal value).

And the funny thing is: what you think is crazy is EXACTLY what's
being done here.  We're adding a pre-rename fsync() to gio in the read
file - write tmp file - rename tmp over original file case.  So
there's one of those alleged thousands upon thousands of apps that just
got fixed.

 Yes, POSIX allows this particular idiotic behaviour.  So what?  It
 probably also allows free() to do nothing, yet no-one in their right
 mind would want that.

Actually you might in some particularly weird scenarios.  free()
usually causes memory fragmentation, which might not be desirable.  If
you have a fixed amount of RAM and fixed amount of data you're
operating on, a controlled memory 'leak' might be what you want for
best performance.

(Hey, it's a stupid example, but your example was pretty stupid too.  I
figure a stupid example deserves a stupid response.)

  Or maybe you would be upset if the code
 fragment
 
 const char *s = x;
 int i = (s+1)-s;
 
 formatted your hard drive.  Yes, the C standard really does allow that
 to happen.
 (C99 section 6.5.6 #9, if you really want  the details.)

Sorry, dude, but now you're just not making sense.  I just looked up
the C99 spec[1] and read that section.  It doesn't say anything about
formatting your hard drive.  It just talks about how pointer arith must
result in well-defined values inside the object the pointer points to.
Nothing new there.

 The mere fact that a standard allows an idiotic implementation
 doesn't mean we should play ball with it.  The same standard also
 allows sane implementations.

Well what's idiotic and sane is a matter of opinion.  A risk of file
corruption in certain corner cases if you don't follow the spec, in the
name of better performance seems reasonable to me -- for the right
kind of app and the right kind of data.

 We could litter fsync() calls all over, but...
 
 1. It describes a semantic that isn't really what we want.  In fact,
 there is no way to get exactly the semantics we want with POSIX.  We
 have to ask for the please-wait-for-the-disk semantics we don't want.
 That's a sure way of getting sluggish programs.

True, in some cases.

 2. Shell scripts, Makefiles, and other languages without explicit
 fsync control will really kill you.  Instead of...
 
 foo file file.new
 mv file.new file
 
 ...you get to write...
 
 foo file file.new
 sync
 mv file.new file

That's a deficiency of the shell, that it doesn't provide something
analogous to fsync, not of the requirement that it must be done.

 Performance might be affected.

Guess what?  You can't have everything.  Performance and reliability
are often at odds, and you need to find a trade-off you're happy with.
If you have a level of reliability that you can't live with, then you
may have to reduce performance to get what you want.  That's life.
Deal with it.

 3. Auditing and changing thousands of programs?  Expect bugs.

Arguably, they're already buggy for expecting behavior that isn't
guaranteed.

 We already break the strict letter of POSIX and the C standard in
 fifty different ways.

Wow, this statement is pretty useless and hand-wavy.  Maybe those fifty
different ways are reasonable and ok, and don't risk user data.  Maybe
those fifty different ways don't actually exist and you're just
trash-talking.

 If someone shows up with an environment that doesn't behave as we
 want, we say sorry, no ball.  Just add stupid file systems to the
 list.

Well apparently 'we' didn't do that: ext4 came out, distros started
using it by default, this issue occurred, and there's pain involved.
Apparently the ext4 devs have caved to pressure and will be adding a
hack to the next version of the driver to order writes to avoid this
specific failure case.  That's all well and good... until the next
filesystem comes along and does something similar.  Or even something
different, but with the same effect.

Again: if you don't like the spec, get it changed or amended!  Then,
later, when this happens again, you can clearly point a finger at the
FS developer and say yes, this really is your bug.  And who knows,
maybe it won't happen again if there's better behavior defined in the
spec.

-brian

[1] http://www.open-std.org/JTC1/SC22/WG14/www/docs/n1256.pdf


Re: fsync in glib/gio

2009-03-14 Thread Mark Mielke

Morten Welinder wrote:

This is crazy.

People are actually advocating that thousands upon thousands of applications
need to be changed.
  


No. The crazy part is that people care so much at all. Nobody cared a 
year ago - why care today?


This isn't a *new* problem in any way.

The question as originally raised was whether glib/gio should do fsync() 
to be proper. Not whether to fix thousands of applications - whether to 
fix glib/gio. The answer is universally yes, even amongst the people who 
are calling the spec broken.


According to the spec, fsync() is proper. If glib/gio wants to use 
rename as atomic change-in-place, and have the best chance of passing 
the pull the plug test, glib/gio should do fsync() before close() and 
rename().


The debate should be over. Debating about other file systems and some 
theoretical change to the spec is quite pointless in gtk-devel-list. At 
best, it's a legitimate rant. At worst, it's an ignorant rant. In any 
case, it's a rant. Fix glib/gio for the rename atomic change-in-place 
case specifically. Everybody is happy from a glib/gio perspective. If 
thousands of other applications are still broken - who cares?


Cheers,
mark

--
Mark Mielke m...@mielke.cc



Re: fsync in glib/gio

2009-03-14 Thread Freddie Unpenstein
From: Brian J. Tarricone, Date: 15/03/2009 07:31

 That sounds pretty awful, to me, to be honest. So every FS -- no, wait
 -- every FS that's going to be pushed as a mainstream FS -- is going
 to have to be closely monitored to make sure it doesn't have this
 behavior? Everyone's going to be putting little band-aids over this
 issue, but only in FSes we care about? The underlying issue, the
 root cause -- that the spec allows what many consider very unsafe
 behavior -- is just going to be ignored?

In the absence of a "sync this file" or "do this / tell me after file safely 
sync'd" call, it's the next best thing.  The ability to stack file system 
operations and/or obtain completion status would probably be a much better way 
of dealing with this.  A quick at-exit cleanup routine could check for 
completion of a two-step save-and-rename operation, and force a flush if not, 
etc.

Although, the idea of ordered operations does seem to suggest that something 
like this WOULD be considered.  Making sure it's writing the data for a file 
before using it to overwrite another file, does seem fairly in order to me, 
and seems like rather a glaring omission from any FSes that don't.  
I've always just kind of assumed that they WOULD do that.

I've got a generally pretty damn slow system, and a fair bit of file system 
activity going on, including an unwholesome amount of swapping.  I'd really 
rather not have a bunch of unnecessary syncs going on.  We have probably four 
or five power failures a year, and I've yet to lose anything, even on the old 
FAT partition.  I, for one, don't want this being pushed on me. If it's a 
problem, I'll change my file system settings to make them safer, or look to 
another file system that is.


Fredderic




Re: fsync in glib/gio

2009-03-13 Thread Alexander Larsson
On Thu, 2009-03-12 at 21:27 +, Michael Meeks wrote:
 On Thu, 2009-03-12 at 21:11 +0100, Alexander Larsson wrote:
  With all the recent yahoo about ext4 data loss and fsync I felt I had to
  look at glib and make sure we're doing this right.
 
   Hmm; is this not just a database guy ? ;-) presumably if -all- file I/O
 should be synchronous, the kernel would do this for us ?

If you want to you can make all i/o sync by mounting it as such. But
that's of course really slow. Generally the gio file write operations are
used for saving files, and people sort of expect that when save returns
the file is ok on disk. 

And to make matters worse, it's perfectly OK for a filesystem (ext4, xfs
and btrfs do this atm) to reorder the metadata and the data writes such
that writing to a temp file, renaming over the target file and then crashing
can end up with an empty file. This happens if metadata was saved but
not the new file contents, and the window for this is about a minute, so
it's not a small race condition.

So, you save and the system hangs 10 seconds later. What do you expect?
Ideally the new file, less problematic the old file. But, if you're left
with *no* version of the file I'd be pretty pissed off.

  Attached is a patch that makes sure we fsync before closing in the gio
  file saving code and in g_file_set_contents().
 
   Isn't it the case that with ext3 and below fsync is an impossibly
 expensive operation that gums up the whole system - by taking some
 obscure kernel lock on some other piece of somethingummy and causes
 everything to grind to a halt, your audio to skip, and instant hair
 loss ? ;-)

With the data=ordered setting in ext3 (the default), any fsync will
result in all dirty data being flushed, not just the data in that file.
This can be pretty expensive if there is a lot of outstanding I/O.
However, this is only a problem if such an operation happens often, and
file saving is just not that common. And if something is constantly
saving, that is a problem for multiple other reasons too and should be
fixed. 

Of course, not all file writes are saves. For example, it could be
nautilus copying 1 files. This is why I added the ASYNC_WRITE flag
and used it in the file copy case.

   I believe they fixed this for ext4, which is nice for them; but ... for
 everyone else ? What data-loss case are we really trying to protect
 against ? of course, if you hard yank the power, bad things can happen;
 but how often does that occur ?

It occurs often enough that there were several people in the Ubuntu
ext4-eats-my-data bug that had it happen to them multiple times.

   AFAIR we spent some cycles in evolution recently to reduce the
 ridiculous number of fsync's that sqlite was injecting into each
 transaction to make the message store perform reasonably and not grind
 the whole system to a halt ;-) at 10ms per fsync, that makes some sense.

The sqlite case is slightly different, basically same as the firefox
case:
http://shaver.off.net/diary/2008/05/25/fsyncers-and-curveballs/

Basically, once you're syncing the database regularly we're talking
about constantly syncing, not just syncing when you're finished saving.
So, the problem is much worse. I think in the firefox case it synced for
every key you pressed in the awesome bar.




Re: fsync in glib/gio

2009-03-13 Thread Michael Meeks
Hi Alex,

On Fri, 2009-03-13 at 08:38 +0100, Alexander Larsson wrote:
 If you want to you can make all i/o sync by mounting it as such. But
 that's of course really slow. Generally the gio file write operations are
 used for saving files, and people sort of expect that when save returns
 the file is ok on disk. 

Sure - which is basically what ext3 provides with its default ordered
mode right ? metadata hits the disk after data - ok, so all pending data
writes were flushed as well as a side-effect(?) but ... ;-)

 And to make matters worse, it's perfectly OK for a filesystem (ext4, xfs
 and btrfs do this atm) to reorder the metadata and the data writes such
 that writing to a temp file, renaming over the target file and then crashing
 can end up with an empty file. This happens if metadata was saved but
 not the new file contents, and the window for this is about a minute, so
 it's not a small race condition.

Sure, sure - and I guess this is why they had to make 'fsync' not suck
for these filing systems.

 So, you save and the system hangs 10 seconds later. What do you expect?
 Ideally the new file, less problematic the old file. But, if you're left
 with *no* version of the file I'd be pretty pissed off.

Yep; of course this is not good.

 With the data=ordered setting in ext3 (the default), any fsync will
 result in all dirty data being flushed, not just the data in that file.
 This can be pretty expensive if there is a lot of outstanding I/O.

Sure - on ext3 it's a 'sync()' call effectively, with the added bonus
that this has a terrible effect on other applications trying to use the
I/O subsystem to eg. read audio to play your mp3, or swap, or allocate
memory, or ... ;-)

Calling 'fsync' regularly on ext3 will bring your system to a grinding
halt, quite regularly, cf. my comments on hair-loss etc. Lots of our
users will be on these systems, and last I checked [ did you poke your
kernel guys on this ? ] this was not recommended.

 However, this is only a problem if such an operation happens often, and
 file saving is just not that common.

Sure; although amusingly, if it happens often ~enough (say every few
hundred ms) - then we almost move the OS into a semi-synchronous writes
mode, and have somewhat less to write each time, and so don't suffer
multi-second glitches in the I/O subsystem ;-)

  And if something constantly is saving something that is a problem
 for multiple other reasons too and should be fixed.

Oh - well, of course there is regular autosave, and setting of gconf
settings [ we set ~200+ keys on login amazingly ;-], then of course the
IM infrastructure will want to make sure your IM logs are *really*
on-disk each time you get a message, the E-mail client for every mail
message ;-) pretty soon if everyone calls 'fsync' we end up in a fairly
bad place (IMHO).

 Of course, not all file writes are saves. For example, it could be
 nautilus copying 1 files. This is why I added the ASYNC_WRITE flag
 and used it in the file copy case.

I guess.

 It occurs often enough that there were several people in the Ubuntu
 ext4-eats-my-data bug that had it happen to them multiple times.

Nasty indeed, and this is the only solution. I suppose the nutshell of
my concern is this:

* can we in some (evil or otherwise) way avoid hurting
  desktop performance, interactivity and playback for
  all ext2/3 systems, and still keep ext4 users happy ? :-)

Of course, perhaps I'm just way off / out of date wrt. my dislike of
fsync, let me hunt down some kernely people for a sane 2nd opinion.

Regards,

Michael.

-- 
 michael.me...@novell.com  , Pseudo Engineer, itinerant idiot




Re: fsync in glib/gio

2009-03-13 Thread Steve Frécinaux

Alexander Larsson wrote:

Attached is a patch that makes sure we fsync before closing in the gio
file saving code and in g_file_set_contents().


Wouldn't fdatasync be sufficient in most case ?


Re: fsync in glib/gio

2009-03-13 Thread Alexander Larsson
On Fri, 2009-03-13 at 11:37 +0100, Steve Frécinaux wrote:
 Alexander Larsson wrote:
  Attached is a patch that makes sure we fsync before closing in the gio
  file saving code and in g_file_set_contents().
 
 Wouldn't fdatasync be sufficient in most case ?

In practical gio use there really is no difference. As soon as the size
of the file changes, the metadata (st_size) changes, so fdatasync() has
to flush that metadata as well and ends up costing about as much as
fsync(). The difference is only apparent when you change a file in
place, which would happen e.g. in a database file.




Re: fsync in glib/gio

2009-03-13 Thread Mathias Hasselmann
Am Freitag, den 13.03.2009, 12:18 +0100 schrieb Sven Neumann:
 Hi,
 
 On Fri, 2009-03-13 at 08:38 +0100, Alexander Larsson wrote:
 
  If you want to you can make all i/o sync by mounting it as such. But
  that's of course really slow. Generally the gio file write operations are
  used for saving files, and people sort of expect that when save returns
  the file is ok on disk. 
 
 Do they?? Doing file I/O asynchronously is a feature, in particular for
 laptop users. It improves I/O performance and it saves power. Of course
 it's a risk and may result in data loss under certain rare
 circumstances. But it's a risk that people are willing to take. Please
 do not ruin this by implicitly enforcing fsync.

I think you don't understand the problem.

Other file systems but ext3 in order=data mode are that brain dead and
broken, that they lose __both__ the old and new document on power loss!
This is __not__ acceptable, in no way.

Maybe the time kernel hackers will realize some day, that they lost any
sense for real world applications and over-optimized their file systems
for write performance benchmarks. Well, but until this happens we have
to suffer from fsync().

Really, loosing both versions of files really isn't an option.

Ciao,
Mathias
-- 
Mathias Hasselmann mathias.hasselm...@gmx.de
Personal Blog: http://taschenorakel.de/mathias/
Openismus GmbH: http://www.openismus.com/



Re: fsync in glib/gio

2009-03-13 Thread Morten Welinder
I think I am in line with what Michael is saying here: there is a
non-trivial risk that littering fsync all over the place will badly
affect existing systems.

The ext4 attitude is interesting, btw.  They are saying that
"POSIX allows this behaviour, so it's your problem".  But when
the gcc people say "The C standard allows this or that", then
the kernel people question the sanity of gcc developers several
generations back.

F*** "POSIX allows this"!  A program that does open-write-close-rename
should not be left with an empty file in case something
goes wrong.  The old file, or the new file.  Anything else is insane,
and by extension so are the kernel developers and their ancestors.

The world is full of crappy hardware, drivers, electricity suppliers,
cable-pulling clumsy people, etc.  They are all out to get you,
so there is no reason why the kernel should hand them your
data too.

Morten


Re: fsync in glib/gio

2009-03-13 Thread Sven Neumann
Hi,

On Fri, 2009-03-13 at 14:11 +0100, Mathias Hasselmann wrote:

> I think you don't understand the problem.

That might very well be the case. I had a look at the presentation that
Alex linked to in the initial post in this thread. But I would have
preferred a document that doesn't look at the issue from a database
developer's point of view.

> File systems other than ext3 in data=ordered mode are so brain-dead and
> broken that they lose __both__ the old and the new document on power loss!
> This is __not__ acceptable, in any way.

But ext3 is what everyone uses. And as far as I understand the next
generation Linux file-system btrfs is going to provide similar
functionality:
http://btrfs.wiki.kernel.org/index.php/FAQ#Does_Btrfs_have_data.3Dordered_mode_like_Ext3.3F

It seems wrong to work around broken file-systems on the application
level. That only takes away pressure from the file-system developers to
address the problem properly.


Sven




Re: fsync in glib/gio

2009-03-13 Thread Andrew W. Nosenko
On Fri, Mar 13, 2009 at 12:37 PM, Steve Frécinaux <nudr...@gmail.com> wrote:
> Alexander Larsson wrote:
>
> > Attached is a patch that makes sure we fsync before closing in the gio
> > file saving code and in g_file_set_contents().
>
> Wouldn't fdatasync be sufficient in most cases?

Narrow reply: not all OSes have fdatasync().  Therefore, you will
need at least a fallback to fsync().
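
The fallback described above could look roughly like the following
minimal sketch. HAVE_FDATASYNC is a hypothetical configure-time macro
(in the spirit of the HAVE_FSYNC check discussed elsewhere in this
thread), and flush_to_disk() is an illustrative name, not a glib
function.

```c
#include <unistd.h>

/* Flush one descriptor to stable storage.
 * HAVE_FDATASYNC is a hypothetical macro a configure check would define;
 * when it is absent we fall back to plain fsync(). */
static int
flush_to_disk (int fd)
{
#ifdef HAVE_FDATASYNC
  /* May skip metadata that is not needed to read the data back. */
  return fdatasync (fd);
#else
  /* Portable fallback: flushes data and metadata. */
  return fsync (fd);
#endif
}
```

Both calls return 0 on success and -1 with errno set on failure, so the
caller does not need to know which one was used.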

General reply:  People, are you sure that trashing all OSes and
all filesystems from a general-purpose library is a good way to work around
an _application_ problem on one specific OS plus one specific FS
combination?

IMHO, it is indeed an application-level problem.  For databases -- yes, we
need to save any and every piece of data at any cost.  But that is an
extreme case, and people who write DB servers do syncs anyway, independently
of OS and FS, just because of the nature of the application.

But when paranoia mode begins to drive anything and everything on the
desktop...  I'm not sure that is good.  If we follow it, then all partitions
should be remounted with -o sync and all other modes should be
declared obsolete and removed.  I'm not sure that is good (again)...

-- 
Andrew W. Nosenko andrew.w.nose...@gmail.com


Re: fsync in glib/gio

2009-03-13 Thread Behdad Esfahbod

On 03/13/2009 09:15 AM, Morten Welinder wrote:

> I think I am in line with what Michael is saying here: there is a
> non-trivial risk that littering fsync all over the place will badly
> affect existing systems.
>
> The ext4 attitude is interesting, btw.  They are saying that
> "POSIX allows this behaviour, so it's your problem".  But when
> the gcc people say "The C standard allows this or that", then
> the kernel people question the sanity of gcc developers several
> generations back.
>
> F*** "POSIX allows this"!  A program that does open-write-close-rename
> should not be left with an empty file in case something
> goes wrong.  The old file, or the new file.  Anything else is insane,
> and by extension so are the kernel developers and their ancestors.


Very well said.

Great way to start a Friday :).  Keep it coming guys!

Cheers,

behdad


> The world is full of crappy hardware, drivers, electricity suppliers,
> cable-pulling clumsy people, etc.  They are all out to get you,
> so there is no reason why the kernel should hand them your
> data too.
>
> Morten



Re: fsync in glib/gio

2009-03-13 Thread Alexander Larsson
On Fri, 2009-03-13 at 14:35 +0100, Sven Neumann wrote:
> Hi,
>
> On Fri, 2009-03-13 at 14:11 +0100, Mathias Hasselmann wrote:
>
> > I think you don't understand the problem.
>
> That might very well be the case. I had a look at the presentation that
> Alex linked to in the initial post in this thread. But I would have
> preferred a document that doesn't look at the issue from a database
> developer's point of view.

Here is a comment from the "save file" point of view:
https://bugs.edge.launchpad.net/ubuntu/+source/linux/+bug/317781/comments/54

> It seems wrong to work around broken file-systems on the application
> level. That only takes away pressure from the file-system developers to
> address the problem properly.

I don't disagree, but on the other hand, users are losing data as we
speak. (See the Ubuntu bug report above.)

One compromise we could make is to only fsync in the case we're actually
overwriting an existing file. This would mean that we don't risk losing
both the old and the new version of the file; you only lose new files.
This case is far less common so the performance aspects are not as bad,
and it also gets rid of the worst failure mode.
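
One way to detect the overwrite case is to try an exclusive create first
and treat EEXIST as the signal that an existing file is about to be
replaced. The sketch below only illustrates that decision;
create_new_or_flag_overwrite() is a hypothetical helper name, not part
of gio.

```c
#include <errno.h>
#include <fcntl.h>
#include <stdbool.h>

/* Try to create `path` as a brand-new file.
 * On success the new descriptor is returned and *overwriting is false.
 * If the file already exists, -1 is returned and *overwriting is true,
 * telling the caller to take the overwrite path and fsync before close. */
static int
create_new_or_flag_overwrite (const char *path, bool *overwriting)
{
  int fd = open (path, O_CREAT | O_EXCL | O_WRONLY, 0666);

  *overwriting = (fd == -1 && errno == EEXIST);
  return fd;
}
```

New files then keep the cheap, unsynced path, and only replacements of
existing files pay for the fsync.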






Re: fsync in glib/gio

2009-03-13 Thread Brian J. Tarricone

Sven Neumann wrote:


> It seems wrong to work around broken file-systems on the application
> level. That only takes away pressure from the file-system developers to
> address the problem properly.


How is the file system broken?  Read the man page for write().  If you 
want to guarantee that file data will hit disk (or at least the disk's 
HW buffer) by a certain time, you need to call fsync() (or fdatasync(), 
where available).


This isn't a Linux idiosyncrasy, even.  POSIX specifies this.

The only thing that's actually broken IIRC is ext3, in that a fsync() 
effectively acts as a full-FS sync() (see the Firefox 3.0/sqlite 
fiasco[1]), which is ridiculous.  If anything should be fixed, *that* 
should be... as well as naive applications that think that open() ->
write() -> close() is sufficient to get data to disk in a known amount
of time.


(Of course, ext3 won't ever be fixed, so... I guess we wait for ext4 use 
to become more widespread, and for btrfs to go stable.)


-brian

[1] http://shaver.off.net/diary/2008/05/25/fsyncers-and-curveballs/


Re: fsync in glib/gio

2009-03-13 Thread Alexander Larsson
On Fri, 2009-03-13 at 11:11 -0700, Brian J. Tarricone wrote:
> Sven Neumann wrote:
>
> > It seems wrong to work around broken file-systems on the application
> > level. That only takes away pressure from the file-system developers to
> > address the problem properly.
>
> How is the file system broken?  Read the man page for write().  If you
> want to guarantee that file data will hit disk (or at least the disk's
> HW buffer) by a certain time, you need to call fsync() (or fdatasync(),
> where available).

The fact that it's documented doesn't make it not broken. If you read the
POSIX specs you'll see that an empty implementation of fsync() is
allowed by the specification.

Now, we don't actually need the data to be on the disk at a
certain time. On the contrary, it's really fine if it's delayed. But what
we want is either the old file in place, or the new file in place, not
the old file deleted, the metadata for the new file written, and the new
file being empty. That's what is broken, even if it's allowed by POSIX.
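
A minimal sketch of the "old file or new file, never an empty one"
sequence: write a temporary file, fsync it, and only then rename it over
the destination. replace_file_contents() and its tmp_path parameter are
illustrative assumptions; this is not the code from the gio patch.

```c
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/* Replace `path` with `len` bytes of `data`, going through `tmp_path`
 * (assumed to live in the same directory, so rename() stays on one
 * file system). Returns 0 on success, -1 on failure. */
static int
replace_file_contents (const char *path, const char *tmp_path,
                       const void *data, size_t len)
{
  int fd = open (tmp_path, O_CREAT | O_TRUNC | O_WRONLY, 0666);
  if (fd == -1)
    return -1;

  const char *p = data;
  while (len > 0)
    {
      ssize_t n = write (fd, p, len);
      if (n < 0)
        goto fail;              /* simplification: EINTR is not retried */
      p += n;
      len -= (size_t) n;
    }

  /* The data must reach the disk before the rename can become visible. */
  if (fsync (fd) != 0)
    goto fail;

  if (close (fd) != 0 || rename (tmp_path, path) != 0)
    {
      unlink (tmp_path);
      return -1;
    }
  return 0;

fail:
  close (fd);
  unlink (tmp_path);
  return -1;
}
```

If the machine crashes before the rename, the old file is still in
place; after it, the new data has already been flushed.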

> This isn't a Linux idiosyncrasy, even.  POSIX specifies this.
>
> The only thing that's actually broken IIRC is ext3, in that a fsync()
> effectively acts as a full-FS sync() (see the Firefox 3.0/sqlite
> fiasco[1]), which is ridiculous.  If anything should be fixed, *that*
> should be... as well as naive applications that think that open() ->
> write() -> close() is sufficient to get data to disk in a known amount
> of time.

Broken is a wider concept than you think. Things that are fully up to
some well documented spec can also be broken from the point of view of
common sense.



Re: fsync in glib/gio

2009-03-13 Thread Alexander Larsson
On Fri, 2009-03-13 at 18:45 +0100, Alexander Larsson wrote:
> One compromise we could make is to only fsync in the case we're actually
> overwriting an existing file. This would mean that we don't risk losing
> both the old and the new version of the file; you only lose new files.
> This case is far less common so the performance aspects are not as bad,
> and it also gets rid of the worst failure mode.

Attached is a patch that does this. It adds a
G_FILE_CREATE_SYNC_ON_CLOSE flag, and turns it on by default when
handling an overwrite. The same thing happens in g_file_set_contents().

I think we need to do this, at the minimum, because even if there are
some ext4 patches to fix up this issue they can be disabled, and from
what I understand XFS and btrfs have similar issues.

The question is, do we want this or the full patch?

Index: gio/glocalfileoutputstream.c
===================================================================
--- gio/glocalfileoutputstream.c	(revision 7974)
+++ gio/glocalfileoutputstream.c	(working copy)
@@ -69,6 +69,7 @@ struct _GLocalFileOutputStreamPrivate {
   char *original_filename;
   char *backup_filename;
   char *etag;
+  gboolean sync_on_close;
   int fd;
 };
 
@@ -81,7 +82,7 @@ static gboolean   g_local_file_output_st
 			   GCancellable   *cancellable,
 			   GError**error);
 static GFileInfo *g_local_file_output_stream_query_info   (GFileOutputStream  *stream,
-			   char   *attributes,
+			   const char *attributes,
 			   GCancellable   *cancellable,
 			   GError**error);
 static char * g_local_file_output_stream_get_etag (GFileOutputStream  *stream);
@@ -190,6 +191,22 @@ g_local_file_output_stream_close (GOutpu
 
   file = G_LOCAL_FILE_OUTPUT_STREAM (stream);
 
+#ifdef HAVE_FSYNC
+  if (file->priv->sync_on_close)
+    {
+      if (fsync (file->priv->fd) != 0)
+        {
+          int errsv = errno;
+
+          g_set_error (error, G_IO_ERROR,
+                       g_io_error_from_errno (errno),
+                       _("Error writing to file: %s"),
+                       g_strerror (errsv));
+          goto err_out;
+        }
+    }
+#endif
+ 
 #ifdef G_OS_WIN32
 
   /* Must close before renaming on Windows, so just do the close first
@@ -459,7 +476,7 @@ g_local_file_output_stream_truncate (GFi
 
 static GFileInfo *
 g_local_file_output_stream_query_info (GFileOutputStream  *stream,
-   char   *attributes,
+   const char *attributes,
    GCancellable   *cancellable,
    GError**error)
 {
@@ -517,6 +534,8 @@ _g_local_file_output_stream_create  (con
   
   stream = g_object_new (G_TYPE_LOCAL_FILE_OUTPUT_STREAM, NULL);
   stream->priv->fd = fd;
+  if (flags & G_FILE_CREATE_SYNC_ON_CLOSE)
+    stream->priv->sync_on_close = TRUE;
   return G_FILE_OUTPUT_STREAM (stream);
 }
 
@@ -562,6 +581,8 @@ _g_local_file_output_stream_append  (con
   
   stream = g_object_new (G_TYPE_LOCAL_FILE_OUTPUT_STREAM, NULL);
   stream->priv->fd = fd;
+  if (flags & G_FILE_CREATE_SYNC_ON_CLOSE)
+    stream->priv->sync_on_close = TRUE;
   
   return G_FILE_OUTPUT_STREAM (stream);
 }
@@ -992,6 +1013,7 @@ _g_local_file_output_stream_replace (con
 
   if (fd == -1 && errno == EEXIST)
     {
+      flags |= G_FILE_CREATE_SYNC_ON_CLOSE;
       /* The file already exists */
       fd = handle_overwrite_open (filename, etag, create_backup, &temp_file,
                                   flags, cancellable, error);
@@ -1022,6 +1044,8 @@ _g_local_file_output_stream_replace (con
  
   stream = g_object_new (G_TYPE_LOCAL_FILE_OUTPUT_STREAM, NULL);
   stream->priv->fd = fd;
+  if (flags & G_FILE_CREATE_SYNC_ON_CLOSE)
+    stream->priv->sync_on_close = TRUE;
   stream->priv->tmp_filename = temp_file;
   if (create_backup)
     stream->priv->backup_filename = create_backup_filename (filename);
Index: gio/gioenums.h
===================================================================
--- gio/gioenums.h	(revision 7974)
+++ gio/gioenums.h	(working copy)
@@ -164,13 +164,17 @@ typedef enum {
  *You can think of it as unlink destination before
  *writing to it, although the implementation may not
  *be exactly like that. Since 2.20
+ * @G_FILE_CREATE_SYNC_ON_CLOSE: If possible, try to ensure
+ *that all data is on disk before returning. For local
+ *files this means calling fsync() before close.
  *
  * Flags used when an operation may create a file.
  */
 typedef enum {
   G_FILE_CREATE_NONE    = 0,
   G_FILE_CREATE_PRIVATE = (1 << 0),
-  G_FILE_CREATE_REPLACE_DESTINATION = (1 << 1)
+  G_FILE_CREATE_REPLACE_DESTINATION = (1 << 1),
+  G_FILE_CREATE_SYNC_ON_CLOSE = (1 << 2)
 } GFileCreateFlags;
 
 
Index: configure.in
===================================================================
--- configure.in	(revision 7974)
+++ configure.in	(working copy)
@@ -563,6 +563,7 @@ AC_CHECK_FUNCS(mmap)
 AC_CHECK_FUNCS(posix_memalign)
 AC_CHECK_FUNCS(memalign)
 AC_CHECK_FUNCS(valloc)
+AC_CHECK_FUNCS(fsync)
 
 AC_CHECK_FUNCS(atexit on_exit)
 
Index: glib/gfileutils.c

Re: fsync in glib/gio

2009-03-13 Thread Federico Mena Quintero
On Fri, 2009-03-13 at 09:15 -0400, Morten Welinder wrote:

> F*** "POSIX allows this"!  A program that does open-write-close-rename
> should not be left with an empty file in case something
> goes wrong.  The old file, or the new file.  Anything else is insane,
> and by extension so are the kernel developers and their ancestors.

100% agreed.

Has anyone actually debugged why this happens?  The kernel must surely
ensure that even if it reorders data/metadata requests, it will do so in
sensible ways only, doesn't it?

  Federico



Re: fsync in glib/gio

2009-03-13 Thread Behdad Esfahbod

On 03/13/2009 05:16 PM, Alexander Larsson wrote:

> On Fri, 2009-03-13 at 15:05 -0600, Federico Mena Quintero wrote:
>
> > On Fri, 2009-03-13 at 09:15 -0400, Morten Welinder wrote:
> >
> > > F*** "POSIX allows this"!  A program that does open-write-close-rename
> > > should not be left with an empty file in case something
> > > goes wrong.  The old file, or the new file.  Anything else is insane,
> > > and by extension so are the kernel developers and their ancestors.
> >
> > 100% agreed.
> >
> > Has anyone actually debugged why this happens?  The kernel must surely
> > ensure that even if it reorders data/metadata requests, it will do so in
> > sensible ways only, doesn't it?
>
> It's well explained in the various discussions about this. Essentially,
> the metadata for the rename is written to disk, but the data in the file
> is not (yet, due to delayed allocation) and then the system crashes. On
> fsck we discover the file is broken (no data) and set the file size to
> 0.

That's clearly a broken filesystem (screw standards.  If it doesn't do what
users expect, it's broken).  Why work around it in gio?  Have the filesystem
guys fix it for whatever that means.

All we need is for the few major distros to handle it properly.

behdad


Re: fsync in glib/gio

2009-03-13 Thread Brian J. Tarricone

Alexander Larsson wrote:

> On Fri, 2009-03-13 at 11:11 -0700, Brian J. Tarricone wrote:
>
> > Sven Neumann wrote:
> >
> > > It seems wrong to work around broken file-systems on the application
> > > level. That only takes away pressure from the file-system developers to
> > > address the problem properly.
> >
> > How is the file system broken?  Read the man page for write().  If you
> > want to guarantee that file data will hit disk (or at least the disk's
> > HW buffer) by a certain time, you need to call fsync() (or fdatasync(),
> > where available).
>
> The fact that it's documented doesn't make it not broken. If you read the
> POSIX specs you'll see that an empty implementation of fsync() is
> allowed by the specification.


That's not the point I'm trying to make.  It may be 'stupid' behavior,
but it's at least specified.  Saying "the filesystem guys should fix
their filesystems to be less lame" just doesn't work, as they're
compliant with the spec.  So either the app developer can write their 
save routines to be robust *in the face of the spec*, or they can 'hope' 
that every new FS adopts a restriction on behavior that isn't specified 
anywhere, and every old FS is modified and updated to follow this 
fantasy restriction.  Doesn't that sound a bit like unreasonable wishful 
thinking?



> Now, we don't actually need the data to be on the disk at a
> certain time. On the contrary, it's really fine if it's delayed. But what
> we want is either the old file in place, or the new file in place, not
> the old file deleted, the metadata for the new file written, and the new
> file being empty. That's what is broken, even if it's allowed by POSIX.


Sure, but that's just a special case.  So you (as the app developer) 
recognise this, understand how the spec interacts with your use-case, 
and write robust code accordingly.


Or, you take the "the spec/kernel/FS is broken" approach, and try to get
a guarantee specified for the special case, something like "in the case
where a file is renamed over top of an existing file, the source file
must be flushed to disk before the rename takes place."  And then the
app developer doesn't have to worry about it, because the implementation
should do the right thing.


Your patch to gio takes the first approach, which is fine, I think, if 
unfortunate in the sense that it forces behavior that may not be 
desired.  A user of g_file_set_contents() may be writing a temp file or 
something that they don't care all that much about, and doing so 
arguably reduces performance.  Of course, g_file_set_contents() is a 
decently high-level abstraction, so one could argue that people who want 
finer control over how the file gets written should use gio or 
open/write/close directly.



> > This isn't a Linux idiosyncrasy, even.  POSIX specifies this.
> >
> > The only thing that's actually broken IIRC is ext3, in that a fsync()
> > effectively acts as a full-FS sync() (see the Firefox 3.0/sqlite
> > fiasco[1]), which is ridiculous.  If anything should be fixed, *that*
> > should be... as well as naive applications that think that open() ->
> > write() -> close() is sufficient to get data to disk in a known amount
> > of time.
>
> Broken is a wider concept than you think. Things that are fully up to
> some well documented spec can also be broken from the point of view of
> common sense.


Yeah, I'd totally agree.  But in the absence of an ability to change the
spec, it's best to try to make things work as well as they can within
the spec, no?  It seems like some people are advocating "well, today
everyone uses ext3, and there's no problem, so we shouldn't do this
because it'll reduce performance there."  And of course, a year from now
(or less!  obviously some already are), I'm sure most desktop distros 
will be shipping with ext4 default.  (And I could be wrong, but it seems 
to me that ext3 is the only FS that, by coincidence will usually be 
immune to this problem, and, also coincidentally, is one of the only 
FSes that has crappy fsync() performance.)


I dunno...  my vote/opinion would be to have a _SYNC flag, leave async 
as the default, and force _SYNC for g_file_set_contents() (maybe?) and 
for cases in gio where we know a rename is going to overwrite an 
existing file (if it's possible to know that without a perf hit).


-brian


Re: fsync in glib/gio

2009-03-13 Thread Mark Mielke

Behdad Esfahbod wrote:

> > It's well explained in the various discussions about this. Essentially,
> > the metadata for the rename is written to disk, but the data in the file
> > is not (yet, due to delayed allocation) and then the system crashes. On
> > fsck we discover the file is broken (no data) and set the file size to
> > 0.
>
> That's clearly a broken filesystem (screw standards.  If it doesn't do
> what users expect, it's broken).  Why work around it in gio?  Have the
> filesystem guys fix it for whatever that means.
>
> All we need is for the few major distros to handle it properly.


There are different definitions of "broken". I might consider it "broken"
if my glib/gio application, which writes out thousands of little files,
suddenly starts taking twice as long as my Perl program. fsync() has
a real measurable cost.


The documentation for the file systems is usually quite clear, although 
people may not understand it. When a file system such as ext3 offers 
different journal modes, it's presumed that the user understands the 
effect of their choice. If the user must be absolutely safe - they
should use 'journal' mode - but this may be slow, as all data is written
to disk at least twice. If 'ordered' mode is used, this means they are 
willing to accept lesser guarantees for increased performance. The ext3 
'ordered' mode works pretty well - data before metadata, and metadata 
updates are ordered. But - it's not perfect! What order is the data 
written in?


Do you intend to patch glib/gio if somebody reports that glib/gio used 
write() to one part of a file, then another part of a file, then pulls 
the plug, and is able to prove to you that it is possible for the second 
write to finish while the first write hasn't started? Will you call it 
absolutely broken and demand a file system fix?


The ext3 'writeback' mode provides even fewer guarantees. Meta data is 
ordered, data is not. The behaviour you are talking about right now 
seems to be 'writeback' mode (not sure - I guess ext4 is doing 
'writeback' mode by default otherwise I don't understand the complaint?).


Tell your users, if they expect "the right thing", to use 'journal' mode.
glib/gio cannot and should not be second guessing the file system choice
of the user. Taking this argument to its extreme, you may as well run
fsync() after every single I/O operation that performs a modification.
This would be horrible for performance, and the user has this capability 
already by defining the file system as completely journalled and using 
synchronous writes. They don't need glib/gio to simulate this.


My opinion is that glib/gio shouldn't be doing this stuff. The problem 
is not with glib/gio. glib/gio should offer an fsync() wrapper (not sure 
if it does or not - I don't use it), such that applications with special 
requirements such as a database application, can use fsync() at 
strategic points where the application wishes to make greater promises 
than the file system. A database file applies here. For databases 
specifically, fsync() on close() is not good enough. fsync() needs to be 
done at any point that the data needs to be consistent and written to 
disk before the application continues to do another write(). glib/gio 
cannot guess where these points are.
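
To illustrate why only the application knows where these points are,
here is a minimal sketch of a journal-then-update sequence. It is a
generic pattern under stated assumptions, not how any particular
database does it; commit_record() and its parameters are hypothetical.

```c
#include <sys/types.h>
#include <unistd.h>

/* Record a change in a journal, make the journal durable, and only then
 * modify the data file in place. The fsync() between the two writes is
 * the consistency point the application has to choose itself. */
static int
commit_record (int journal_fd, int data_fd, off_t offset,
               const void *rec, size_t len)
{
  /* 1. Log the intended change. */
  if (write (journal_fd, rec, len) != (ssize_t) len)
    return -1;

  /* 2. The log entry must be on disk before the data file is touched. */
  if (fsync (journal_fd) != 0)
    return -1;

  /* 3. Now update the data file in place. */
  if (pwrite (data_fd, rec, len, offset) != (ssize_t) len)
    return -1;

  /* 4. Flush the in-place update as well. */
  return fsync (data_fd);
}
```

An fsync() issued only at close() time would give no ordering between
steps 1 and 3, which is the point made above.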


Putting fsync() on close() is a hack.

Just my opinion. :-)

Cheers,
mark

--
Mark Mielke m...@mielke.cc



Re: fsync in glib/gio

2009-03-13 Thread Mark Mielke

Mark Mielke wrote:

> Putting fsync() on close() is a hack.


Hmm - Looking at the patch, I don't see it doing fsync() on close() - I 
should have read from the beginning instead of reacting to the one 
person calling the file system semantics broken. :-)


Definitely - any file system operation that *requires* file system
integrity should make use of fsync(). The "write new file and rename
into place" pattern is a good candidate, and fsync() should definitely be done
here. Wishful thinking about some theoretical system which is both 
efficient, deployed on all systems everywhere, and does fully 
synchronous writes for both data and metadata is pointless. :-)


Note that this doesn't matter what file system is used. Older 
non-journalled file systems had the same problem. The only file systems 
that are completely safe are the ones that do full journalling of all 
data. (Those also tend to be the slowest file systems or modes although 
some file systems may be breaking this trend...)


Just please don't *always* fsync(). Only where it matters. As per the 
patch, I think it tries to do it where it matters and looks fine?


Cheers,
mark

--
Mark Mielke m...@mielke.cc



Re: fsync in glib/gio

2009-03-13 Thread Federico Mena Quintero
On Fri, 2009-03-13 at 22:16 +0100, Alexander Larsson wrote:

> It's well explained in the various discussions about this. Essentially,
> the metadata for the rename is written to disk, but the data in the file
> is not (yet, due to delayed allocation) and then the system crashes. On
> fsck we discover the file is broken (no data) and set the file size to
> 0.

This reminds me a lot of
http://bugzilla.gnome.org/show_bug.cgi?id=562396 - a problem with
Nautilus metadata.  You start a copy operation, but if you do a read
before the copy is done, then you get the old data.  You should wait
for the copy to be done first, but anyway, my point is...

My point is that the kernel could perfectly well ensure that metadata
operations that depend on data operations will not be reordered.  Don't
rename a file in a directory if we have outstanding writes for the
inode, or something.  (After the rename, do you need to open the
directory and fsync it?  You can't open directories for writing...)
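
On the directory question: fsync() does not need a writable descriptor,
so a directory opened read-only can be passed to it. The Linux fsync(2)
man page describes an explicit fsync() on a descriptor for the directory
as the way to make sure the directory entry itself has reached disk. A
minimal sketch, with sync_directory() as an illustrative name:

```c
#include <fcntl.h>
#include <unistd.h>

/* Flush a directory's entries (e.g. after a rename into it) by opening
 * the directory read-only and calling fsync() on that descriptor.
 * Returns 0 on success, -1 on failure. */
static int
sync_directory (const char *dir_path)
{
  int fd = open (dir_path, O_RDONLY);
  if (fd == -1)
    return -1;

  int rc = fsync (fd);

  close (fd);
  return rc;
}
```

Whether this is accepted for directories on other systems is not
guaranteed, so portable code should treat a failure here as non-fatal.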

  Federico



Re: fsync in glib/gio

2009-03-13 Thread Alberto Garcia
On Fri, Mar 13, 2009 at 06:20:57PM -0600, Federico Mena Quintero wrote:

> My point is that the kernel could perfectly well ensure that
> metadata operations that depend on data operations will not be
> reordered.

I think that's how ext3 works by default (data=ordered mount option).

http://en.wikipedia.org/wiki/Ext3#Journaling_levels

Don't know if the same behaviour is available in ext4

Berto


Re: fsync in glib/gio

2009-03-12 Thread Sven Herzberg
Hi,

On Thursday, 12.03.2009 at 21:11 +0100, Alexander Larsson wrote:
>  typedef enum {
>    G_FILE_CREATE_NONE    = 0,
>    G_FILE_CREATE_PRIVATE = (1 << 0),
> -  G_FILE_CREATE_REPLACE_DESTINATION = (1 << 1)
> +  G_FILE_CREATE_REPLACE_DESTINATION = (1 << 1),
> +  G_FILE_CREATE_ASYNC_WRITE = (1 << 2),
>  } GFileCreateFlags;

IIRC we have commits in GNOME canvas that remove trailing commas in
enums because of some compiler compatibility issue.

Regards,
  Sven



Re: fsync in glib/gio

2009-03-12 Thread Ray Strode
Hi,

2009/3/12 Alexander Larsson <al...@redhat.com>:
> With all the recent yahoo about ext4 data loss and fsync I felt I had to
> look at glib and make sure we're doing this right.
> ...
> It might also be interesting to look over the rest of our platform for
> similar places where fsync is missing.
We had to fix this in gconf a few months ago:

http://bugzilla.gnome.org/show_bug.cgi?id=562976

(was for some other fs, though, not ext4)

--Ray


Re: fsync in glib/gio

2009-03-12 Thread Colin Walters
2009/3/12 Alexander Larsson <al...@redhat.com>:
> With all the recent yahoo about ext4 data loss and fsync I felt I had to
> look at glib and make sure we're doing this right.
>
> Attached is a patch that makes sure we fsync before closing in the gio
> file saving code and in g_file_set_contents().
>
> It also adds G_FILE_CREATE_ASYNC_WRITE flag to disable the fsync in the
> gio saving code. This is used in the file copy code as I think the fsync
> may significantly affect performance when copying lots of files. This is
> similar to cp, which also doesn't fsync, so I think this is a decent
> compromise. At least we do the safe thing when apps save a document.

Does this affect things like nautilus bulk file operations (copy+paste
directory)?  If so, is there a patch for nautilus to use the new flag,
or are we saying that these operations should also be sync?