Re: Safe File Update (atomic)

2011-01-09 Thread Olaf van der Spek
On Thu, Jan 6, 2011 at 7:59 PM, Enrico Weigelt weig...@metux.de wrote:
 * Olaf van der Spek olafvds...@gmail.com schrieb:

 A transaction to update multiple files in one atomic go?

 Yes. The application first starts a transaction, creates/writes/
 removes a bunch of files and then sends a commit. The changes
 should become visible atomically and the call returns when the
 commit() is completed (and written out to disk). If there are
 conflicts, the transaction is aborted with a proper error code.
 So, in case of a package manager, the update will run completely
 in one shot (from userland view) or not at all.

 I could live with:

 a) relatively slow performance (commit taking a second or so)
 b) abort as soon as a conflict arises
 c) files changed within the transaction are actually new ones
   (sane package managers will have to unlink text files instead
   of simply overwriting them anyway)

That would be nice, but the single file case appears to be difficult
enough already. So we might want to focus on that first.

Olaf


--
To UNSUBSCRIBE, email to debian-devel-requ...@lists.debian.org
with a subject of unsubscribe. Trouble? Contact listmas...@lists.debian.org
Archive: 
http://lists.debian.org/aanlktimegmsjdukzghfpnikvipahnz6hb3bjpemtb...@mail.gmail.com



Re: Safe File Update (atomic)

2011-01-06 Thread Olaf van der Spek
On Thu, Jan 6, 2011 at 1:54 AM, Ted Ts'o ty...@mit.edu wrote:
 I was thinking, doesn't ext have this kind of dependency tracking already?
 It has to write the inode after writing the data, otherwise the inode
 might point to garbage.

 No, it doesn't.  We use journaling, and forced data writeouts, to
 ensure consistency.

Suppose I append one byte to an existing file and don't use fsync.
Will it commit the inode with the increased size before the data byte
is written?
In that case, garbage might show up in my file.
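As a hedged illustration, the portable way to rule out that window is an explicit fsync() after the append; a minimal C sketch (the function name is invented):

```c
#include <fcntl.h>
#include <unistd.h>

/* Append one byte and force it to disk. Without the fsync(), POSIX
 * itself gives no ordering guarantee after a crash between the data
 * block write and the inode size update; with it, both are durable
 * by the time the call returns. */
int append_byte_durably(const char *path, char c)
{
    int fd = open(path, O_WRONLY | O_APPEND | O_CREAT, 0644);
    if (fd < 0)
        return -1;
    int ok = (write(fd, &c, 1) == 1) && fsync(fd) == 0;
    if (close(fd) != 0)
        ok = 0;
    return ok ? 0 : -1;
}
```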

Olaf





Re: Safe File Update (atomic)

2011-01-06 Thread Bernhard R. Link
* Ted Ts'o ty...@mit.edu [110105 19:26]:
 So one of the questions is how much we should be penalizing programs that
 are doing things right (i.e., using fsync), versus programs which are
 doing things wrong (i.e., using rename and trusting to luck).

Please do not call it wrong. All those programs are doing is not
requesting some specific protection. They are doing file system
operations that are totally within the normal abstraction level of
file system interfaces. While some programs might be expected to
anticipate cases not within that interface (i.e. the case that
due to some external event the filesystem is interrupted in what it
does and cannot complete its work), that is definitely not the
responsibility of the average program, especially if there is no
interface for this specific problem (i.e. requesting a barrier to only
do a rename after the new file is actually committed to disk).

So the question is: how much should the filesystem protect my data in
case of sudden power loss? Should it only protect data where the program
explicitly requested protection, or should it also do what it reasonably
can to protect all data?

Having some performance knobs so users can choose between performance
and data safety is good. This way users can make decisions depending on
what they want.

But a filesystem losing data so easily, or losing it so easily with a
default setting, is definitely not something to give unsuspecting users.

Bernhard R. Link





Re: Safe File Update (atomic)

2011-01-06 Thread Olaf van der Spek
On Thu, Jan 6, 2011 at 5:01 AM, Ted Ts'o ty...@mit.edu wrote:
 On Thu, Jan 06, 2011 at 12:57:07AM +, Ian Jackson wrote:
 Ted Ts'o writes (Re: Safe File Update (atomic)):
  Then I invite you to implement it, and start discovering all of the
  corner cases for yourself.  :-)  As I predicted, you're not going to
  believe me when I tell you it's too hard.

 How about you reimplement all of Unix userland, first, so that it
 doesn't have what you apparently think is a bug!

 I think you are forgetting the open source way, which is you scratch
 your own itch.

Most of the time one writes software because it's useful to oneself
and to others, not because writing software itself is so much fun.
It's about the result.
So the focus should be on what those users need and want.

 The main programs I use where I'd care about this (e.g., emacs)
 got this right two decades ago; I even remember being around during
 the MIT Project Athena days, almost 25 years ago, when we needed to
 add error checking to the fsync() call because Transarc's AFS didn't
 actually try to send the file you were saving to the file server until
 the fsync() or the close() call, and so if you got an over-quota
 error, it was reflected back at fsync() time, and not at the write()
 system call which was what emacs had been expecting and checking.
 (All of which is POSIX compliant, so the bug was clearly with emacs;
 it was fixed, and we moved on.)

Would you classify the emacs implementation of safe file write
semantics as simple or complex?
Why didn't they get it right the first time?
IMO it's because the API is hard to use and easy to misuse, while it
should be the other way around.
Hiding behind POSIX semantics is easy but doesn't solve the problem.

 Note that all of the modern file systems (and all of the historical
 ones too, with the exception of ext3) have always had the same
 property.  If you care about the data, you use fsync().  If you don't,
 then you can take advantage of the fact that compiles are really,
 really fast.  (After all, in the very unlikely case that you crash,
 you can always rebuild, and why should you optimize for an unlikely
 case?  And if you have crappy proprietary drivers that cause you to
 crash all the time, then maybe you should rethink using said
 proprietary drivers.)

 That's the open source way --- you scratch your own itch.  I'm
 perfectly satisfied with the open source tools that I use.  Unless you
 think the programmers two decades ago were smarter, and people have
 gotten dumber since then (Are we not men?  We are Devo!), it really
 isn't that hard to follow the rules.

I think the number of programmers today is much larger than it was two
decades ago, and I also think the average experience of a programmer
has gone down.

Olaf





Re: Safe File Update (atomic)

2011-01-06 Thread Bastien ROUCARIES
On Thu, Jan 6, 2011 at 12:39 PM, Olaf van der Spek olafvds...@gmail.com wrote:
 On Thu, Jan 6, 2011 at 5:01 AM, Ted Ts'o ty...@mit.edu wrote:
 On Thu, Jan 06, 2011 at 12:57:07AM +, Ian Jackson wrote:
 Ted Ts'o writes (Re: Safe File Update (atomic)):
  Then I invite you to implement it, and start discovering all of the
  corner cases for yourself.  :-)  As I predicted, you're not going to
  believe me when I tell you it's too hard.

 How about you reimplement all of Unix userland, first, so that it
 doesn't have what you apparently think is a bug!

 I think you are forgetting the open source way, which is you scratch
 your own itch.

 Most of the time one is writing software because it's useful for
 oneselves and others. Not because writing software itself is so much
 fun. It's about the result.
 So focus should be on what those users need/want.

 The main programs I use where I'd care about this (e.g., emacs)
 got this right two decades ago; I even remember being around during
 the MIT Project Athena days, almost 25 years ago, when we needed to
 add error checking to the fsync() call because Transarc's AFS didn't
 actually try to send the file you were saving to the file server until
 the fsync() or the close() call, and so if you got an over-quota
 error, it was reflected back at fsync() time, and not at the write()
 system call which was what emacs had been expecting and checking.
 (All of which is POSIX compliant, so the bug was clearly with emacs;
 it was fixed, and we moved on.)

Could you point to the code snippet? It could be worth adding to
gnulib, for instance.

Another idea is to create a FUSE filesystem for testing error
conditions on filesystems: for instance, one that randomly returns EIO
or ENOSPC errors. It would improve the quality of our software.
I have written such a filesystem; I will surely post it in a few days
(hopefully).
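Bastien's FUSE layer targets whole programs; as a much smaller, in-process approximation of the same idea, error paths can be exercised by routing writes through an injectable wrapper. A C sketch (all names invented; this only simulates fault injection and is not a FUSE filesystem):

```c
#include <errno.h>
#include <unistd.h>

/* Route writes through a wrapper that can be told to fail with a
 * chosen errno, so application error-handling paths can be unit
 * tested without a faulty disk. */
static int inject_errno = 0;   /* 0 = pass writes through, else fail */

void set_write_fault(int err) { inject_errno = err; }

ssize_t checked_write(int fd, const void *buf, size_t len)
{
    if (inject_errno) {
        errno = inject_errno;  /* simulate e.g. EIO or ENOSPC */
        return -1;
    }
    return write(fd, buf, len);
}
```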

Bastien





Re: Safe File Update (atomic)

2011-01-06 Thread Enrico Weigelt

 Getting people to believe that you can't square a circle[1] 
 is very hard, 

Just allow an infinite number of steps and it's almost trivial ;-)

 It's like trying to teach a pig to sing. 

Well, that works, just sounds a bit like vogon poetry ;-o

 If you give me a specific approach, I can tell you why it won't work,
 or why it won't be accepted by the kernel maintainers (for example,
 because it involves pouring far too much complexity into the kernel).

To come back to the original question, I'd like to know which concrete
real-world problems should be solved by that. One place a database-like
transactional filesystem (w/ MVCC) would be nice is package managers:
we still have the problem that within the update process there may be
inconsistent states (yes, this has already bitten me!) - if it were
possible to make an update visible atomically, that would be a
big win for critical 24/7 systems.

My approach to this would be a special unionfs with transactional
semantics (I admit: no idea how complex implementing this would be)


cu
-- 
--
 Enrico Weigelt, metux IT service -- http://www.metux.de/

 phone:  +49 36207 519931  email: weig...@metux.de
 mobile: +49 151 27565287  icq:   210169427 skype: nekrad666
--
 Embedded-Linux / Portierung / Opensource-QM / Verteilte Systeme
--





Re: Safe File Update (atomic)

2011-01-06 Thread Olaf van der Spek
On Thu, Jan 6, 2011 at 7:33 PM, Enrico Weigelt weig...@metux.de wrote:
 To come back to the original question, I'd like to know which concrete
 real-world problems should be solved by that. One place a database-like
 transactional filesystem (w/ MVCC) would be nice is package managers:
 we still have the problem that within the update process there may be
 inconsistent states (yes, this has already bitten me!) - if it were
 possible to make an update visible atomically, that would be a
 big win for critical 24/7 systems.

 My approach to this would be a special unionfs with transactional
 semantics (I admit: no idea how complex implementing this would be)

A transaction to update multiple files in one atomic go?
Nah, this request is for just a single file, although a future
extension to multiple files shouldn't be too hard.

Olaf





Re: Safe File Update (atomic)

2011-01-06 Thread Enrico Weigelt
* Olaf van der Spek olafvds...@gmail.com schrieb:

 A transaction to update multiple files in one atomic go?

Yes. The application first starts a transaction, creates/writes/
removes a bunch of files and then sends a commit. The changes
should become visible atomically and the call returns when the
commit() is completed (and written out to disk). If there are
conflicts, the transaction is aborted with a proper error code.
So, in case of a package manager, the update will run completely
in one shot (from userland view) or not at all.

I could live with:

a) relatively slow performance (commit taking a second or so)
b) abort as soon as a conflict arises
c) files changed within the transaction are actually new ones
   (sane package managers will have to unlink text files instead
   of simply overwriting them anyway)
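Enrico's begin/commit flow could be sketched as a toy userland API (txn_begin/txn_write/txn_commit are invented names). Note the limitation that motivates kernel support: each rename below is individually atomic, but the commit as a whole is not, so a crash mid-commit can expose a mixed state.

```c
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

#define TXN_MAX   16
#define TXN_PATH  256

/* A staged write set: each file is written to "<path>.txn" first and
 * renamed into place at commit. */
struct txn {
    char staged[TXN_MAX][TXN_PATH];
    char dest[TXN_MAX][TXN_PATH];
    int  n;
};

void txn_begin(struct txn *t) { t->n = 0; }

int txn_write(struct txn *t, const char *path, const void *buf, size_t len)
{
    if (t->n == TXN_MAX)
        return -1;
    snprintf(t->staged[t->n], TXN_PATH, "%s.txn", path);
    snprintf(t->dest[t->n], TXN_PATH, "%s", path);
    int fd = open(t->staged[t->n], O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0)
        return -1;
    int ok = write(fd, buf, len) == (ssize_t)len && fsync(fd) == 0;
    if (close(fd) != 0)
        ok = 0;
    if (!ok)
        return -1;
    t->n++;
    return 0;
}

int txn_commit(struct txn *t)
{
    /* NOT atomic as a whole: a crash between renames exposes a mix. */
    for (int i = 0; i < t->n; i++)
        if (rename(t->staged[i], t->dest[i]) < 0)
            return -1;
    return 0;
}
```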


cu
-- 
--
 Enrico Weigelt, metux IT service -- http://www.metux.de/

 phone:  +49 36207 519931  email: weig...@metux.de
 mobile: +49 151 27565287  icq:   210169427 skype: nekrad666
--
 Embedded-Linux / Portierung / Opensource-QM / Verteilte Systeme
--





Re: Safe File Update (atomic)

2011-01-05 Thread Olaf van der Spek
On Wed, Jan 5, 2011 at 1:25 AM, Ted Ts'o ty...@mit.edu wrote:
 On Wed, Jan 05, 2011 at 01:05:03AM +0100, Olaf van der Spek wrote:

 Why is it that you ignore all my responses to technical questions you asked?


 In general, because they are either (a) not well-formed, or (b) you
 are asking me to prove a negative.  Getting people to believe that you

Saying that instead of ignoring half of my response would be more constructive.

 If you give me a specific approach, I can tell you why it won't work,
 or why it won't be accepted by the kernel maintainers (for example,
 because it involves pouring far too much complexity into the kernel).

Let's consider the temp file workaround, since a lot of existing apps
use it. A request is to commit the source data before committing the
rename. Seems quite simple.

 But for me to list all possible approaches and tell you why each one
 is not going to work?  You'll have to pay me before I'm willing to
 invest that kind of time.

That's not what I asked.

Olaf





Re: Safe File Update (atomic)

2011-01-05 Thread Ted Ts'o
On Wed, Jan 05, 2011 at 12:55:22PM +0100, Olaf van der Spek wrote:
  If you give me a specific approach, I can tell you why it won't work,
  or why it won't be accepted by the kernel maintainers (for example,
  because it involves pouring far too much complexity into the kernel).
 
 Let's consider the temp file workaround, since a lot of existing apps
 use it. A request is to commit the source data before committing the
 rename. Seems quite simple.

Currently ext4 is initiating writeback on the source file at the time
of the rename.  Given performance measurements others (maybe it was
you, I can't remember, and I don't feel like going through the
literally hundreds of messages on this and related threads) have
cited, it seems that btrfs is doing something similar.  The problem
with doing a full commit, which means surviving a power failure, is
that you have to request a barrier operation to make sure the data
goes all the way down to the disk platter --- and this is expensive
(on the order of at least 20-30ms, more if you've written a lot to the
disk).

We have had experience with forcing data writeback (what you call
"commit the source data") before the rename --- ext3 did that.  And it
had some very nasty performance problems which showed up on very busy
systems where people were doing a lot of different things at the same
time: large background writes from bittorrents and/or DVD ripping,
compiles, web browsing, etc.  If you force a large amount of data out
when you do a commit, everything else that tries to write to the file
system at that point stops, and if you have stupid programs (i.e.,
firefox trying to do database updates on its UI loop), it can cause
programs to apparently lock up, and users get really upset.

So one of the questions is how much we should be penalizing programs that
are doing things right (i.e., using fsync), versus programs which are
doing things wrong (i.e., using rename and trusting to luck).  This is
a policy question, for which you might have a different opinion than I
might have on the subject.

We could also simply force a synchronous data writeback at rename
time, instead of merely starting writeback at the point of the rename.
In the case of a program which has already done an fsync(), the
synchronous data writeback would be a no-op, so that's good in terms
of not penalizing programs which do things right.  But the problem
there is that there could be some renames where forcing data writeback
is not needed, and so we would be forcing the performance hit of the
commit the source data even when it might not be needed (or wanted)
by the user.

How often does it happen that someone does a rename on top of an
already-existing file, where the fsync() isn't wanted?  Well, I can
think up scenarios, such as where an existing .iso image is corrupted
or needs to be updated, and so the user creates a new one and then
renames it on top of the old .iso image, but then gets surprised when
the rename ends up taking minutes to complete.  Is that a common
occurrence?  Probably not, but the case of the system crashing right
after the rename() is somewhat unusual as well.

Humans in general suck at reasoning about low-probability events;
that's why we are allowing low-paid TSA workers to grope
air-travellers to avoid terrorist blowing up planes midflight, while
not being up in arms over the number of deaths every year due to
automobile accidents.

For this reason, I'm cautious about going overboard at forcing commits
on renames; doing this has real performance implications, and it is a
computer science truism that optimizing for the uncommon/failure case
is a bad thing to do.

OK, what about simply deferring the commit of the rename until the
file writeback has naturally completed?  The problem with that is
entangled updates.  Suppose there is another file which is written
to the same directory block as the one affected by the rename, and
*that* file is fsync()'ed?  Keeping track of all of the data
dependencies is **hard**.   See: http://lwn.net/Articles/339337/

  But for me to list all possible approaches and tell you why each one
  is not going to work?  You'll have to pay me before I'm willing to
  invest that kind of time.
 
 That's not what I asked.

Actually, it is, although maybe you didn't realize it.  Look above
at how I had to present multiple alternatives, and then shoot them
all down, one at a time.  There are hundreds of solutions, all of them
wrong.

Hence why *my* counter is --- submit patches.  The mere act of
actually trying to code an alternative will allow you to determine why
your approach won't work, or failing that, others can take your patch,
apply them, and then demonstrate use cases where your idea completely
falls apart.  But it means that you do most of the work, which is fair
since you're the one demanding the feature.

It doesn't scale for me to spend a huge amount of time composing
e-mails like this, which is why it's rare that I do that.  You've
tricked me into 

Re: Safe File Update (atomic)

2011-01-05 Thread Olaf van der Spek
On Wed, Jan 5, 2011 at 7:26 PM, Ted Ts'o ty...@mit.edu wrote:
 On Wed, Jan 05, 2011 at 12:55:22PM +0100, Olaf van der Spek wrote:
  If you give me a specific approach, I can tell you why it won't work,
  or why it won't be accepted by the kernel maintainers (for example,
  because it involves pouring far too much complexity into the kernel).

 Let's consider the temp file workaround, since a lot of existing apps
 use it. A request is to commit the source data before committing the
 rename. Seems quite simple.

 Currently ext4 is initiating writeback on the source file at the time
 of the rename.  Given performance measurements others (maybe it was
 you, I can't remember, and I don't feel like going through the
 literally hundreds of messages on this and related threads) have
 cited, it seems that btrfs is doing something similar.  The problem
 with doing a full commit, which means surviving a power failure, is
 that you have to request a barrier operation to make sure the data
 goes all the way down to the disk platter --- and this is expensive
 (on the order of at least 20-30ms, more if you've written a lot to the
 disk).

 We have had experience with forcing data writeback (what you call
 "commit the source data") before the rename --- ext3 did that.  And it
 had some very nasty performance problems which showed up on very busy
 systems where people were doing a lot of different things at the same
 time: large background writes from bittorrents and/or DVD ripping,
 compiles, web browsing, etc.  If you force a large amount of data out
 when you do a commit, everything else that tries to write to the file
 system at that point stops, and if you have stupid programs (i.e.,
 firefox trying to do database updates on its UI loop), it can cause
 programs to apparently lock up, and users get really upset.

I'm not sure why other IO would be affected. Isn't this equivalent to
fsync on the source file?
It almost sounds like you lock the entire FS during the data
writeback, which shouldn't be necessary.

 So one of the questions is how much we should be penalizing programs that
 are doing things right (i.e., using fsync), versus programs which are
 doing things wrong (i.e., using rename and trusting to luck).  This is
 a policy question, for which you might have a different opinion than I
 might have on the subject.

 We could also simply force a synchronous data writeback at rename
 time, instead of merely starting writeback at the point of the rename.
 In the case of a program which has already done an fsync(), the
 synchronous data writeback would be a no-op, so that's good in terms
 of not penalizing programs which do things right.  But the problem
 there is that there could be some renames where forcing data writeback
 is not needed, and so we would be forcing the performance hit of the
 "commit the source data" even when it might not be needed (or wanted)
 by the user.

 How often does it happen that someone does a rename on top of an
 already-existing file, where the fsync() isn't wanted?  Well, I can
 think up scenarios, such as where an existing .iso image is corrupted
 or needs to be updated, and so the user creates a new one and then
 renames it on top of the old .iso image, but then gets surprised when
 the rename ends up taking minutes to complete.  Is that a common

Would this be an example of an atomic non-durable use case? ;)
I thought those didn't exist?

 occurrence?  Probably not, but the case of the system crashing right
 after the rename() is somewhat unusual as well.

Given the reports of empty files, not that unusual.
The delay in this unusual case seems like a small price to pay.

 For this reason, I'm cautious about going overboard at forcing commits
 on renames; doing this has real performance implications, and it is a
 computer science truism that optimizing for the uncommon/failure case
 is a bad thing to do.

Performance is important, I agree.
But you're trading performance for safety here.
And on rename, you have to guess the user's intention: just rename or
atomic file update.

 OK, what about simply deferring the commit of the rename until the
 file writeback has naturally completed?  The problem with that is
 entangled updates.  Suppose there is another file which is written
 to the same directory block as the one affected by the rename, and
 *that* file is fsync()'ed?  Keeping track of all of the data
 dependencies is **hard**.   See: http://lwn.net/Articles/339337/

Ah. So performance isn't the problem, it's just hard to implement.
Would've been a lot faster if you said that earlier.
Instead, you require apps to use fsync, even if they don't need/want
it, which introduces a performance hit.
Wasn't there a big problem with fsync in ext3 anyway?

BTW, with O_ATOMIC, you could avoid the updates to directory blocks
and would only have to track other updates to the same inode.

  But for me to list all possible approaches and tell you why each one
  is not going to work?  You'll have to pay me 

Re: Safe File Update (atomic)

2011-01-05 Thread Ted Ts'o
On Wed, Jan 05, 2011 at 09:38:30PM +0100, Olaf van der Spek wrote:
 
 Performance is important, I agree.
 But you're trading performance for safety here.

... but if the safety is not needed, then you're paying for no good
reason.  And if the safety is needed, then use fsync().

  OK, what about simply deferring the commit of the rename until the
  file writeback has naturally completed?  The problem with that is
  entangled updates.  Suppose there is another file which is written
  to the same directory block as the one affected by the rename, and
  *that* file is fsync()'ed?  Keeping track of all of the data
  dependencies is **hard**.   See: http://lwn.net/Articles/339337/
 
 Ah. So performance isn't the problem, it's just hard to implement.
 Would've been a lot faster if you said that earlier.

"Too hard to implement" doesn't go far enough.  It's also a matter of
near impossibility to add new features later.  BSD FFS didn't get
ACLs, extended attributes, and many other features until ***years*** after
Linux had them.  Complexity is evil; it leads to bugs, makes things
hard to maintain, and it makes it harder to add new features later.

But hey, if you're so smart, you go ahead and implement them yourself.
You can demonstrate how you can do it better than everyone else.
Otherwise you're just wasting everybody's time.  Complex ideas are not
valid ones; or at least they certainly aren't good ones.

   - Ted





Re: Safe File Update (atomic)

2011-01-05 Thread Olaf van der Spek
On Wed, Jan 5, 2011 at 10:37 PM, Ted Ts'o ty...@mit.edu wrote:
 Ah. So performance isn't the problem, it's just hard to implement.
 Would've been a lot faster if you said that earlier.

 "Too hard to implement" doesn't go far enough.  It's also a matter of
 near impossibility to add new features later.  BSD FFS didn't get
 ACLs, extended attributes, and many other features until ***years*** after
 Linux had them.  Complexity is evil; it leads to bugs, makes things
 hard to maintain, and it makes it harder to add new features later.

That was about soft updates. I'm not sure this is just as complex.
I was thinking, doesn't ext have this kind of dependency tracking already?
It has to write the inode after writing the data, otherwise the inode
might point to garbage.

 But hey, if you're so smart, you go ahead and implement them yourself.
 You can demonstrate how you can do it better than everyone else.
 Otherwise you're just wasting everybody's time.  Complex ideas are not
 valid ones; or at least they certainly aren't good ones.

Nobody said FSs are simple.

Olaf





Re: Safe File Update (atomic)

2011-01-05 Thread Ted Ts'o
On Wed, Jan 05, 2011 at 10:47:03PM +0100, Olaf van der Spek wrote:
 
 That was about soft updates. I'm not sure this is just as complex.

Then I invite you to implement it, and start discovering all of the
corner cases for yourself.  :-)  As I predicted, you're not going to
believe me when I tell you it's too hard.

 I was thinking, doesn't ext have this kind of dependency tracking already?
 It has to write the inode after writing the data, otherwise the inode
 might point to garbage.

No, it doesn't.  We use journaling, and forced data writeouts, to
ensure consistency.

- Ted





Re: Safe File Update (atomic)

2011-01-05 Thread Ian Jackson
Ted Ts'o writes (Re: Safe File Update (atomic)):
 Then I invite you to implement it, and start discovering all of the
 corner cases for yourself.  :-)  As I predicted, you're not going to
 believe me when I tell you it's too hard.

How about you reimplement all of Unix userland, first, so that it
doesn't have what you apparently think is a bug!

Ian.





Re: Safe File Update (atomic)

2011-01-05 Thread Ted Ts'o
On Thu, Jan 06, 2011 at 12:57:07AM +, Ian Jackson wrote:
 Ted Ts'o writes (Re: Safe File Update (atomic)):
  Then I invite you to implement it, and start discovering all of the
  corner cases for yourself.  :-)  As I predicted, you're not going to
  believe me when I tell you it's too hard.
 
 How about you reimplement all of Unix userland, first, so that it
 doesn't have what you apparently think is a bug!

I think you are forgetting the open source way, which is you scratch
your own itch.

The main programs I use where I'd care about this (e.g., emacs)
got this right two decades ago; I even remember being around during
the MIT Project Athena days, almost 25 years ago, when we needed to
add error checking to the fsync() call because Transarc's AFS didn't
actually try to send the file you were saving to the file server until
the fsync() or the close() call, and so if you got an over-quota
error, it was reflected back at fsync() time, and not at the write()
system call which was what emacs had been expecting and checking.
(All of which is POSIX compliant, so the bug was clearly with emacs;
it was fixed, and we moved on.)
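The lesson of the Athena/AFS story is that a save is only known good once write(), fsync() and close() have all been checked; a minimal C sketch of that checking discipline (the function name is invented):

```c
#include <fcntl.h>
#include <unistd.h>

/* A networked or quota-enforcing filesystem (as in the AFS story) may
 * defer the real error until the flush, so the return values of
 * fsync() and close() must be checked, not just write()'s. */
int save_checked(const char *path, const void *buf, size_t len)
{
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0)
        return -1;
    int ok = write(fd, buf, len) == (ssize_t)len; /* may succeed locally */
    if (ok && fsync(fd) != 0)    /* e.g. EDQUOT can first appear here */
        ok = 0;
    if (close(fd) != 0)          /* ...or here; must be checked too */
        ok = 0;
    return ok ? 0 : -1;
}
```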

If there was a program that I used and where I'd care about it, I'd
scratch my own itch and fix it.  Olaf seems to be really concerned about
this theoretical use case, and if he cares so much, he can either
stick with ext3, which has the property he wants purely by accident,
but which has terrible performance problem under some circumstances as
a result, or he can fix it in the programs that he cares about --- or
he can try to create his own file system (and he can either impress us
if he actually can solve it without disastrous performance problems,
or he can be depressed when no one uses it because it is dog slow).

Note that all of the modern file systems (and all of the historical
ones too, with the exception of ext3) have always had the same
property.  If you care about the data, you use fsync().  If you don't,
then you can take advantage of the fact that compiles are really,
really fast.  (After all, in the very unlikely case that you crash,
you can always rebuild, and why should you optimize for an unlikely
case?  And if you have crappy proprietary drivers that cause you to
crash all the time, then maybe you should rethink using said
proprietary drivers.)

That's the open source way --- you scratch your own itch.  I'm
perfectly satisfied with the open source tools that I use.  Unless you
think the programmers two decades ago were smarter, and people have
gotten dumber since then (Are we not men?  We are Devo!), it really
isn't that hard to follow the rules.

- Ted


-- 
To UNSUBSCRIBE, email to debian-devel-requ...@lists.debian.org
with a subject of unsubscribe. Trouble? Contact listmas...@lists.debian.org
Archive: http://lists.debian.org/20110106040123.ga27...@thunk.org



Re: Safe File Update (atomic)

2011-01-04 Thread Olaf van der Spek
On Mon, Jan 3, 2011 at 3:43 PM, Ted Ts'o ty...@mit.edu wrote:
 On Mon, Jan 03, 2011 at 12:26:29PM +0100, Olaf van der Spek wrote:

 Given that the issue has come up before so often, I expected there to
 be a FAQ about it.

 Your asking the question over (and over... and over...)  doesn't make
 it an FAQ.  :-)

Hi Ted,

Why is it that you ignore all my responses to technical questions you asked?

Olaf


Archive: 
http://lists.debian.org/aanlktin0v4sl2zjqkxdekqtuowz3fazkrdbbne=wc...@mail.gmail.com



Re: Safe File Update (atomic)

2011-01-04 Thread Ted Ts'o
On Wed, Jan 05, 2011 at 01:05:03AM +0100, Olaf van der Spek wrote:
 
 Why is it that you ignore all my responses to technical questions you asked?
 

In general, because they are either (a) not well-formed, or (b) you
are asking me to prove a negative.  Getting people to believe that you
can't square a circle[1] is very hard, and when I was one of the
postmasters at MIT, we'd get kooks every so often saying that they had
a proof that they could square the circle, but everyone was being
unfair and ignoring them, and could we please forward this to the head
of MIT's math department with their amazing discovery.  We learned a
long time ago that it's not worth trying to argue with kooks like
that.  It's like trying to teach a pig to sing.  It frustrates you,
and it annoys the pig.

[1] http://en.wikipedia.org/wiki/Squaring_the_circle

If you give me a specific approach, I can tell you why it won't work,
or why it won't be accepted by the kernel maintainers (for example,
because it involves pouring far too much complexity into the kernel).
But for me to list all possible approaches and tell you why each one
is not going to work?  You'll have to pay me before I'm willing to
invest that kind of time.

Best regards,

- Ted


Archive: http://lists.debian.org/20110105002537.gi2...@thunk.org



Re: Safe File Update (atomic)

2011-01-03 Thread Shachar Shemesh

On 02/01/11 17:37, Olaf van der Spek wrote:


A userspace lib is fine with me. In fact, I've been asking for it
multiple times. Result: no response.

   

Excuse me?

You (well, Henrique, but you were CCed) said "how about a user space
lib?"  I said "I'm working on one, will be ready about this weekend."  I
even gave a URL to watch (https://github.com/Shachar/safewrite).  If you
check it out right now, you will find there a fully implemented and
fairly debugged user space solution, even including a build tool and man
page.


BTW - feedback welcome.

Shachar

--
Shachar Shemesh
Lingnu Open Source Consulting Ltd.
http://www.lingnu.com


Archive: http://lists.debian.org/4d21a67b.50...@debian.org



Re: Safe File Update (atomic)

2011-01-03 Thread Olaf van der Spek
On Sun, Jan 2, 2011 at 6:14 PM, Henrique de Moraes Holschuh
h...@debian.org wrote:
 Whether this should map to O_ATOMIC in glibc or be something new, I don't
 care.  But if it is a flag, I'd highly suggest naming it O_CREATEUNLINKED or
 something else that won't give people wrong ideas, as _nothing_ but the
 final inode linking is atomic.

How does this solve the meta-data issue?

Olaf


Archive: 
http://lists.debian.org/aanlkti=vvjs5yh84dfq-oeumym7btjng+yu0pkk-r...@mail.gmail.com



Re: Safe File Update (atomic)

2011-01-03 Thread Olaf van der Spek
On Sun, Jan 2, 2011 at 7:55 PM, Adam Borowski kilob...@angband.pl wrote:
 Note that on the other side of the fence there's something called TxF

Not GA AFAIK.

 And what if you're changing one byte inside a 50 GB file?
 I see an easy implementation on btrfs/ocfs2 (assuming no other writers),
 but on ext3/4, that'd be tricky.

My proposal is explicitly only for complete file data updates.

  what should an application do as a fallback?  And given that it is

 Fallback could be implemented in the kernel or in userland. Using rename
 as a fallback sounds reasonable. Implementations could switch to
 O_ATOMIC when available.

 For large files using reflink (currently implemented as fs-specific ioctls)
 can ensure performance.  It can give you anything but the abuse for
 preserving owner (ie, the topic of this thread).  To get that, you'd need
 in-kernel support, but for example http://lwn.net/Articles/331808/ proposes
 an API which is just a thin wrapper over existing functionality in multiple
 filesystems.  It basically duplicates an inode, preserving all current
 attributes but making any new writes CoWed.  If you make the old one
 immutable, you get the TxF semantics (mandatory write lock), if you don't,
 you'll get the above-mentioned "one of the updates will win" data loss.

Data loss? If you overwrite a file, losing the old contents isn't data loss.

  And what are the use cases where this really makes sense?  Will people

 Lots of files are written in one go. They could all use this interface.

 I don't see how O_ATOMIC helps there.  TxF transactions would work (all
 writes either succeed together or none does), but O_ATOMIC can't do more
 than one file.

I mean that each app that writes a file in one go could use the O_ATOMIC API.
Extending O_ATOMIC to support multiple files seems simple too by using
a vector variant of close.

 Uhm, but you didn't answer the question.  These two use cases Ted Tso
 mentioned are certainly not worth the complexity of in-kernel support,
 O_ATOMIC doesn't bring other goodies, and the rest can be done by an
 userspace library which is indeed a good idea.

Someone is working on such a lib, let's see the code complexity and
exceptions it has.

 Not true. I've asked (you) for just such a lib, but I'm still waiting
 for an answer.

 Shachar Shemesh is already working on it, when he finishes, Ted Tso will
 point out what's wrong in it (if something is).  What else do you need?

Don't know yet. Let's wait for that lib.

 Why would anyone work on an implementation if there's no agreement about it?

 Because one implementation after research is better than many naive and
 possibly wrong ones.

True, but it'd still be nice to have some agreement before doing all
that hard work.

Olaf


Archive: 
http://lists.debian.org/aanlktinpqeqrybqtx4avr1aa=t2t04yo5++r0_ynd...@mail.gmail.com



Re: Safe File Update (atomic)

2011-01-03 Thread Olaf van der Spek
On Mon, Jan 3, 2011 at 4:25 AM, Ted Ts'o ty...@mit.edu wrote:
 On Sun, Jan 02, 2011 at 04:14:15PM +0100, Olaf van der Spek wrote:

 Last time you ignored my response, but let's try again.
 The implementation would be comparable to using a temp file, so
 there's no need to keep 2 g in memory.
 Write the 2 g to disk, wait one day, append the 1 k, fsync, update inode.

 Write the 2g to disk *where*?  Some random assigned blocks?  And using

A random allocation strategy would work, but better options are available. ;)

 *what* to keep track of the where to find all of the metadata blocks?

Implementation detail, but a new temp inode might work.

 That information is normally stored in the inode, but you don't want
 to touch it.  So we need to store it someplace, and you haven't
 specified where.  Some alternate universe?  Another inode, which is
 only tied to that file descriptor?  That's *possible*, but it's (a)
 not at all trivial, and (b) won't work for all file systems.  It
 definitely won't work for FAT-based file systems, so your blithe "oh,
 just emulate it in the kernel" is rather laughable.

You'd have to decide what you want to do in that case. One option is
to fall back to the non-atomic variant.
Another is to fall back to a temp file with a name. But then I assume
the kernel is still able to preserve meta-data and to ensure atomic
operation.

 If you think it's so easy, *you* go implement it.

  How exactly do the semantics for O_ATOMIC work?
 
  And given at the momment ***zero*** file systems implement O_ATOMIC,
  what should an application do as a fallback?  And given that it is

 Fallback could be implemented in the kernel or in userland. Using rename
 as a fallback sounds reasonable. Implementations could switch to
 O_ATOMIC when available.

 Using rename as a fallback means exposing random temp file names into
 the directory.  Which could conflict with files that the userspace
 might want to create.

They don't need to be in the same dir.

 It could be done, but again, it's an awful lot
 of complexity to shove into the kernel.

That's unfortunate, but I think it's the only option.

  highly unlikely this could ever be implemented for various file
  systems including NFS, I'll observe this won't really reduce
  application complexity, since you'll always need to have a fallback
  for file systems and kernels that don't support O_ATOMIC.

 I don't see a reason why this couldn't be implemented by NFS.

 Try it; it should become obvious fairly quickly.  Or just go read the
 NFS protocol specifications.

In that case: update the NFS protocol (yes, long-term solution)

 As you've said yourself, a lot of apps don't get this right. Why not?
 Because the safe way is much more complex than the unsafe way. APIs
 should be easy to use right and hard to misuse. With O_ATOMIC, I feel
 this is the case. Without, it's the opposite and the consequences are
 obvious. There shouldn't be a tradeoff between safety and potential
 problems.

 Application programmers have in the past been unwilling to change
 their applications.

Why not?

 If they are willing to change their applications,
 they can just as easily use a userspace library, or use fsync() and
 rename() properly.  If they aren't willing to change their programs

Fsync, rename (and preserving meta-data) is a lot more complex than
their current code, which to me is a disadvantage.
O_ATOMIC is a single flag that doesn't increase their code complexity.
A new lib dependency is also a disadvantage.

 and recompile (and the last time we've been around this block, they
 weren't; they just blamed the file system), asking them to use
 O_ATOMIC probably won't work, given the portability issues.

If they're happy to blame the FS they're probably also happy to
#define O_ATOMIC 0 if O_ATOMIC isn't available.
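A sketch of that fallback pattern. Note the heavy assumption: O_ATOMIC exists in no released kernel, so on today's systems the define turns the flag into a no-op and the open degrades to an ordinary (non-atomic) truncate-and-write:

```c
#include <fcntl.h>
#include <unistd.h>

/* O_ATOMIC is hypothetical.  Where the header doesn't define it,
 * make it 0 so it is simply ignored by open() and the program still
 * compiles and runs, just without the atomicity guarantee. */
#ifndef O_ATOMIC
#define O_ATOMIC 0
#endif

/* Open path for a whole-file replacement, atomically if the kernel
 * supports it, as a plain truncating open otherwise. */
static int open_for_replace(const char *path)
{
    return open(path, O_WRONLY | O_CREAT | O_TRUNC | O_ATOMIC, 0666);
}
```

This compile-time degradation is precisely why Ted argues the flag doesn't remove the need for a userspace fallback path.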

 Not true. I've asked (you) for just such a lib, but I'm still waiting
 for an answer.

 Pay someone enough money, and they'll write you the library.  Whining
 about it petulantly and expecting someone else to write it is probably
 not going to work.

Given that the issue has come up before so often, I expected there to
be a FAQ about it.
I didn't say you had to write such a lib, just saying you weren't
aware of any existing lib would've been enough.
But given that you have so much experience on this issue, pointing to
a few apps that got this right shouldn't be so hard.
Unless getting it right is currently impossible...

 Quite frankly, if you're competent enough to use it, you should be
 able to write such a library yourself.  If you aren't going to be
 using it yourself, they why are you wasting everyone's time on this?

Because this is still a real world problem that needs to be solved.
Stopping this conversation isn't going to solve the problem.

Olaf


Archive: 

Re: Safe File Update (atomic)

2011-01-03 Thread Henrique de Moraes Holschuh
Ted,

Thanks for the reply and detailed analysis.

 Which gets me back to the question of use cases.  When are we going to
 be using this monster?  For many use cases, where the original reason

Where implicit rollbacks are desirable, I suppose.  It is incompatible
with edit-in-place, anyway.  Which asks for all the fsyncs on the link
thing.  Anyone that wants something different is welcome to do it the old
way IMHO.

The first part (get an unlinked fd) is useful without fsyncs or any
guarantees for temp files.

 by the kernel.  But if you make the system call synchronous, now
 there's no performance advantage over simply doing the fsync() and
 rename() in userspace.  And if we do this using O_ATOMIC, or your

I understand this is far more about ease of use (read: more difficult to
misuse) than much higher performance.

 1) You care about data loss in the case of power failure, but not in
 the case of hard drive or storage failure, *AND* you are writing tons
 and tons of tiny 3-4 byte files and so you are worried about
 performance because you're doing something insane with large number of
 small files.

That usage pattern cannot be made both safe and fast outside of a full-blown
ACID database, so let's skip it.

 2) You are specifically worried about the case where you are replacing
 the contents of a file that is owned by different uid than the user
 doing the data file update, in a safe way where you don't want a
 partially written file to replace the old, complete file, *AND* you
 care about the file's ownership after the data update.

I am not sure about the file ownership, but this is the useful usecase IMO.

 3) You care about the temp file used by the userspace library, or
 application which is doing the write temp file, fsync(), rename()
 scheme, being automatically deleted in case of a system crash or a
 process getting sent an uncatchable signal and getting terminated.

This is always useful, as well.

 Is it worth it?  I'd say no; and suggest that someone who really cares
 should create a userspace application helper library first, since
 you'll need it as a fallback for the cases listed above where this
 scheme won't work.  (Even if you do the fallback in the kernel, you'll
 still need userspace fallback for non-Linux systems, and for when the
 application is run on an older Linux kernel that doesn't have all of
 this O_ATOMIC or link/unlink magic).

That's what I suggested, as well.

 The reality is we've lived without this capability in Unix and Linux
 system for something like three decades.  I suspect we can live

But not very well.  And the usage patterns of *nix systems have changed in
the last decade.

-- 
  One disk to rule them all, One disk to find them. One disk to bring
  them all and in the darkness grind them. In the Land of Redmond
  where the shadows lie. -- The Silicon Valley Tarot
  Henrique Holschuh


Archive: http://lists.debian.org/20110103114940.ga9...@khazad-dum.debian.net



Re: Safe File Update (atomic)

2011-01-03 Thread Ted Ts'o
On Mon, Jan 03, 2011 at 09:49:40AM -0200, Henrique de Moraes Holschuh wrote:
 
  1) You care about data loss in the case of power failure, but not in
  the case of hard drive or storage failure, *AND* you are writing tons
  and tons of tiny 3-4 byte files and so you are worried about
  performance because you're doing something insane with large number of
  small files.
 
 That usage pattern cannot be made both safe and fast outside of a full-blown
  ACID database, so let's skip it.

Agreed.

 
  2) You are specifically worried about the case where you are replacing
  the contents of a file that is owned by different uid than the user
  doing the data file update, in a safe way where you don't want a
  partially written file to replace the old, complete file, *AND* you
  care about the file's ownership after the data update.
 
 I am not sure about the file ownership, but this is the useful usecase IMO.

But if you don't care about file ownership, then you can do the write
a temp file, fsync, and rename trick.  If it's about ease of use, as
you suggest, a userspace library solves that problem.  It's *only* if
you care about the file ownership remaining the same that (2) comes
into play.

  3) You care about the temp file used by the userspace library, or
  application which is doing the write temp file, fsync(), rename()
  scheme, being automatically deleted in case of a system crash or a
  process getting sent an uncatchable signal and getting terminated.
 
 This is always useful, as well.

 and (3) is the recovery after a power failure/crash scenario

If you don't care about the file ownership issue, then recovering
after a powerfailure/crash is the last remaining case --- and you
could solve this by creating a file with an mktemp-style name in a
mode 1777 directory, where the contents of the file contains the temp
file name to be deleted by an init.d script.  This could be done in
the userspace library, and if you crash after the rename, but before
you have a chance to delete the file containing the
temp-filename-to-be-deleted, that's not a problem, since the init.d
file will find no file with that name to be deleted, and then
continue.
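The scheme above can be sketched as two helpers: the writer records its temp-file name in a well-known spool directory before starting, and a boot-time sweep unlinks whatever the surviving records mention. All the names here (the spool path, record format, helper names) are illustrative assumptions, not an existing interface:

```c
#include <dirent.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

#define SPOOL "/tmp/safewrite-spool"   /* stand-in for a mode-1777 dir */

/* Record the name of a temp file so a boot-time sweep can delete it
 * if we crash before the rename.  Returns 0 on success. */
static int record_tempname(const char *record, const char *tempname)
{
    char path[512];
    snprintf(path, sizeof path, "%s/%s", SPOOL, record);
    FILE *f = fopen(path, "w");
    if (!f)
        return -1;
    int ok = fprintf(f, "%s\n", tempname) > 0;
    if (fclose(f) != 0)
        ok = 0;
    return ok ? 0 : -1;
}

/* The init.d-style sweep: unlink every temp file named by a record,
 * then the record itself.  A missing temp file is harmless: it just
 * means the writer finished and already cleaned up, as Ted notes. */
static void sweep_spool(void)
{
    DIR *d = opendir(SPOOL);
    if (!d)
        return;
    struct dirent *e;
    while ((e = readdir(d)) != NULL) {
        if (e->d_name[0] == '.')
            continue;
        char path[512], temp[512];
        snprintf(path, sizeof path, "%s/%s", SPOOL, e->d_name);
        FILE *f = fopen(path, "r");
        if (f) {
            if (fgets(temp, sizeof temp, f)) {
                temp[strcspn(temp, "\n")] = '\0';
                unlink(temp);          /* ENOENT is fine */
            }
            fclose(f);
        }
        unlink(path);                  /* drop the record itself */
    }
    closedir(d);
}
```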

Hence, all of these problems can be solved in userspace, with a
userspace library, with the exception of the file ownership issue,
which you've admitted may not be all that critical.

  Is it worth it?  I'd say no; and suggest that someone who really cares
  should create a userspace application helper library first, since
  you'll need it as a fallback for the cases listed above where this
  scheme won't work.  (Even if you do the fallback in the kernel, you'll
  still need userspace fallback for non-Linux systems, and for when the
  application is run on an older Linux kernel that doesn't have all of
  this O_ATOMIC or link/unlink magic).
 
 That's what I suggested, as well.

Then we're in agreement.  :-)

- Ted


Archive: http://lists.debian.org/2011010319.gg11...@thunk.org



Re: Safe File Update (atomic)

2011-01-03 Thread Olaf van der Spek
On Mon, Jan 3, 2011 at 6:28 AM, Enrico Weigelt weig...@metux.de wrote:
 * Ted Ts'o ty...@mit.edu schrieb:

 This is possible.  It would be specific only to file systems that
 support inodes (i.e., ix-nay for NFS, FAT, etc.).

 FAT supports inodes ?

ix-nay: no/except

Olaf


Archive: 
http://lists.debian.org/aanlktimc5atwf5j8mcrbkpw+kp4chxp2ey=xurplj...@mail.gmail.com



Re: Safe File Update (atomic)

2011-01-03 Thread Olaf van der Spek
On Mon, Jan 3, 2011 at 11:35 AM, Shachar Shemesh shac...@debian.org wrote:
 On 02/01/11 17:37, Olaf van der Spek wrote:

 A userspace lib is fine with me. In fact, I've been asking for it
 multiple times. Result: no response.



 Excuse me?

 You (well, Henrique, but you were CCed) said how about a user space lib? I

Yes, sorry, you are. It was aimed at the people on the Linux lists.

 said I'm working on one, will be ready about this weekend. I even gave a
 URL to watch (https://github.com/Shachar/safewrite). If you check it out
 right now, you will find there a fully implemented and fairly debugged user
 space solution, even including a build tool and man page.

I did look into it when I read the original post. Will look again and
provide feedback.

Olaf


Archive: 
http://lists.debian.org/aanlktinqrmx3gqx0sg-9pqbg6snittab_4cehx5qc...@mail.gmail.com



Re: Safe File Update (atomic)

2011-01-03 Thread Ted Ts'o
On Mon, Jan 03, 2011 at 12:26:29PM +0100, Olaf van der Spek wrote:
 
 Given that the issue has come up before so often, I expected there to
 be a FAQ about it.

Your asking the question over (and over... and over...)  doesn't make
it an FAQ.  :-)

Aside from your asking over and over, it hasn't come up that often,
actually.  The right answer has been known for decades, and it is
very simple: write a temp file, copy over the xattr's and ACL's if you
care (in many cases, such as an application's private state files, it
won't care, so it can skip this step --- it's only the more generic
file editors that would need to worry about such things --- but when's
the last time anyone has really worried about xattr's on a .c file?),
fsync(), and rename().

This is *not* hard.   People who get it wrong are just being lazy.
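For concreteness, a minimal sketch of that recipe (xattr/ACL copying omitted; also note mkstemp() creates the temp file with mode 0600, so a real implementation would fchmod()/fchown() to match the target where it cares):

```c
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/* Atomically replace path's contents with buf: write a temp file in
 * the same directory, fsync it, then rename over the target.  A crash
 * at any point leaves either the complete old file or the complete
 * new one.  Returns 0 on success. */
static int safe_replace(const char *path, const char *buf, size_t len)
{
    char tmp[4096];
    if (snprintf(tmp, sizeof tmp, "%s.tmp.XXXXXX", path) >= (int)sizeof tmp)
        return -1;
    int fd = mkstemp(tmp);             /* temp file in the same dir, so
                                          rename() can't cross devices */
    if (fd < 0)
        return -1;

    /* (copy xattrs/ACLs from path here, if the application cares) */

    if (write(fd, buf, len) != (ssize_t)len ||
        fsync(fd) != 0) {              /* data durable before rename */
        close(fd);
        unlink(tmp);
        return -1;
    }
    if (close(fd) != 0 || rename(tmp, path) != 0) {
        unlink(tmp);                   /* atomic replacement failed */
        return -1;
    }
    return 0;
}
```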

In the special case of dpkg, where they are writing a moderate number
of large files, and they care about syncing the files without causing
journal commits, the use of sync_file_range() on the files followed by
a series of fdatasync() calls has solved their issues as far as I know.
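A sketch of that pattern as I understand it (Linux-specific: sync_file_range() starts writeback without forcing a journal commit, and the later fdatasync() calls wait, so the per-file flushes overlap instead of serializing):

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <unistd.h>

/* dpkg-style batched flush: first kick off asynchronous writeback on
 * every file, then wait for each one with fdatasync().  fds[] must be
 * open descriptors whose data has already been written. */
static int flush_batch(const int *fds, int n)
{
    for (int i = 0; i < n; i++)        /* start writeback, don't wait */
        if (sync_file_range(fds[i], 0, 0, SYNC_FILE_RANGE_WRITE) != 0)
            return -1;
    for (int i = 0; i < n; i++)        /* now wait (data, not metadata) */
        if (fdatasync(fds[i]) != 0)
            return -1;
    return 0;
}
```

The n sequential fsync-style waits collapse into roughly one device flush for the whole batch, which is the effect Ted attributes to dpkg's fix.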

- Ted


Archive: http://lists.debian.org/20110103144335.gd6...@thunk.org



Re: Safe File Update (atomic)

2011-01-03 Thread Olaf van der Spek
On Mon, Jan 3, 2011 at 3:43 PM, Ted Ts'o ty...@mit.edu wrote:
 On Mon, Jan 03, 2011 at 12:26:29PM +0100, Olaf van der Spek wrote:

 Given that the issue has come up before so often, I expected there to
 be a FAQ about it.

 Your asking the question over (and over... and over...)  doesn't make
 it an FAQ.  :-)

Haha, right. But file loss issues have come up before. Let's wait for
the userspace lib.

 Aside from your asking over and over, it hasn't come up that often,
 actually.  The right answer has been known for decades, and it is
 very simple: write a temp file, copy over the xattr's and ACL's if you
 care (in many cases, such as an application's private state files, it
 won't care, so it can skip this step --- it's only the more generic
 file editors that would need to worry about such things --- but when's
 the last time anyone has really worried about xattr's on a .c file?),
 fsync(), and rename().

 This is *not* hard.   People who get it wrong are just being lazy.

True, that's why right/safe-by-default would be nice to have.

 In the special case of dpkg, where they are writing a moderate number
 of large files, and they care about syncing the files without causing
 journal commits, the use of sync_file_range() on the files followed by
 a series of fdatasync() calls has solved their issues as far as I know.

Olaf


Archive: 
http://lists.debian.org/aanlktikfyvny9oaq9ren5wxztjrodmogfjuj7opz+...@mail.gmail.com



Re: Safe File Update (atomic)

2011-01-03 Thread Uoti Urpala
Ted Ts'o tytso at mit.edu writes:
 actually.  The right answer has been known for decades, and it is
 very simple: write a temp file, copy over the xattr's and ACL's if you
 care (in many cases, such as an application's private state files, it
 won't care, so it can skip this step --- it's only the more generic
 file editors that would need to worry about such things --- but when's
 the last time anyone has really worried about xattr's on a .c file?),
 fsync(), and rename().
 
 This is *not* hard.   People who get it wrong are just being lazy.

IMO calling a recipe containing fsync() "the right answer" is wrong. For the
clear majority of programs, waiting for a disk-level write is not the correct
semantics, and using fsync does cause real problems; the recent dpkg issues
are just one example. IMO telling people to use fsync does more harm than
good; rather, we should be telling them not to use fsync unless they really
know what they're doing.

In another post in this thread you also talk about how we've managed to live
with the current functionality for three decades. We've managed to live, but
what exactly is the practice we've lived with? I'd say an essential part of it
has been the recipe of write temp file + rename, _without_ doing an fsync.
Yes, it may not have been theoretically crash-safe on all filesystems; but in
the practice that has allowed things to work for decades, the filesystems have
either been safe enough or the machines stable enough for it not to become an
issue. If this is no longer true, then that is a reason why things are
now different from previous decades and why it's now necessary to add new
functionality.


Archive: http://lists.debian.org/loom.20110103t195257-...@post.gmane.org



Re: Safe File Update (atomic)

2011-01-02 Thread Henrique de Moraes Holschuh
On Sun, 02 Jan 2011, Ted Ts'o wrote:
 And of course, Olaf isn't actually offerring to implement this
 hypothetical O_ATOMIC.  Oh, no!  He's just petulantly demanding it,
 even though he can't give us any concrete use cases where this would
 actually be a huge win over a userspace safe-write library that
 properly uses fsync() and rename().

Olaf, O_ATOMIC is difficult in the kernel sense and in the long run.  It
is an API that is too hard to implement in a sane way, with too many
boundary conditions.

OTOH, you don't need O_ATOMIC.  You need a way for easy application access
to a saner/simpler way to deal with files that require atomic replacement.
Time to switch to a plan B that can achieve it.  Do not lose track of your
final goal, and stop wasting time with O_ATOMIC (and aggravating fs
developers, which can only hurt your goal in the end).

Maybe there are ways to actually let the kernel detect usage patterns and do
the right thing, but nobody found any that is complete (and the incomplete
ones are implemented in ext3 and ext4 AFAIK).

If a userspace library is built to do all the dances required using only
POSIX APIs (you can use extensions where they are available to enhance
performance) you will have an EXACT list of boundary conditions and choke
points.

With that exact list of requirements in hand and something that can be
easily regression-tested, it gets a LOT easier to talk to any fs developer
and to the glibc developers, and come up with the kernel and glibc
enhancements needed to accelerate it (or remove boundary conditions) that
are acceptable to both sides.

In the end, because POSIX _is_ crap in many ways, you will have some
boundary conditions that cannot be removed or worked around.  It is likely
that they will not be such serious flaws as to make the whole idea
unusable.  Maybe they will apply only to some filesystems (just like right
now there are some things you simply don't use NFSv3 for).

If you have other ideas that have no weird side-effects or troublesome
semantics, I am sure you'd have a better chance of it happening.  They're
probably not going to take the form of open() flags, for the same reason
O_ATOMIC has problems, but who knows.  If I had a good idea about how to
solve this problem, I'd have already written a paper about it or something.

Well, that's it.  I have nothing else to contribute to this thread.

-- 
  One disk to rule them all, One disk to find them. One disk to bring
  them all and in the darkness grind them. In the Land of Redmond
  where the shadows lie. -- The Silicon Valley Tarot
  Henrique Holschuh


Archive: http://lists.debian.org/20110102125258.ga6...@khazad-dum.debian.net



Re: Safe File Update (atomic)

2011-01-02 Thread Olaf van der Spek
On Sun, Jan 2, 2011 at 8:09 AM, Ted Ts'o ty...@mit.edu wrote:
 You could ask for a new (non-POSIX?) API that does not ask of a
 POSIX-like filesystem something it cannot provide (i.e. don't ask for
 something that requires inode-path reverse mappings).  You could ask
 for syscalls to copy inodes, etc.  You could ask for whatever is needed
 to do a (open+write+close) that is atomic if the target already exists.
 Maybe one of those has a better chance than O_ATOMIC.

 The O_ATOMIC open flag is highly problematic, and it's not fully
 specified.  What if the system is under a huge amount of memory
 pressure, and the badly behaved application program does:

        fd = open(file, O_ATOMIC | O_TRUNC);
        write(fd, buf, 2*1024*1024*1024); // write 2 gigs, heh, heh heh
        sleep for one day
        write(fd, buf2, 1024);
        close(fd);

Last time you ignored my response, but let's try again.
The implementation would be comparable to using a temp file, so
there's no need to keep 2 g in memory.
Write the 2 g to disk, wait one day, append the 1 k, fsync, update inode.

 What happens if another program opens file for reading during the
 one day sleep period?  Does it get the the old contents of file?

Of course, according to the definition of atomic.

 The partially written, incomplete new version of file?  What happens
 if the file is currently mmap'ed, as Henrique has asked?

Didn't I respond to that too? Again, old file.
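This is the same semantics the rename() trick already provides under POSIX, and it's easy to demonstrate: a descriptor opened before the replacement keeps referring to the old inode, while a fresh open sees the new one. A self-contained sketch (file names are illustrative):

```c
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Demonstrates "readers keep seeing the old file" across rename():
 * returns 1 when a pre-existing reader fd still sees "old" after the
 * replacement while a fresh open sees "new". */
static int demo_old_reader(const char *path)
{
    FILE *f = fopen(path, "w");
    if (!f || fputs("old", f) < 0 || fclose(f) != 0)
        return 0;

    int reader = open(path, O_RDONLY); /* opened before the update */
    if (reader < 0)
        return 0;

    char tmp[4096];
    snprintf(tmp, sizeof tmp, "%s.tmp", path);
    f = fopen(tmp, "w");
    if (!f || fputs("new", f) < 0 || fclose(f) != 0 ||
        rename(tmp, path) != 0) {      /* atomic replacement */
        close(reader);
        return 0;
    }

    char a[4] = {0}, b[4] = {0};
    int ok = read(reader, a, 3) == 3;  /* old inode still readable */
    close(reader);
    f = fopen(path, "r");
    ok = ok && f && fread(b, 1, 3, f) == 3;  /* new inode */
    if (f)
        fclose(f);
    return ok && strcmp(a, "old") == 0 && strcmp(b, "new") == 0;
}
```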

 What if another program opens the file O_ATOMIC during the one day
 sleep period, so the file is in the middle of getting updated by two
 different processes using O_ATOMIC?

Again equivalent to using the rename trick. One of the updates will
win, and since they don't depend on the old contents, there is no
trouble.

 How exactly do the semantics for O_ATOMIC work?

 And given at the momment ***zero*** file systems implement O_ATOMIC,
 what should an application do as a fallback?  And given that it is

Fallback could be implemented in the kernel or in userland. Using rename
as a fallback sounds reasonable. Implementations could switch to
O_ATOMIC when available.

 highly unlikely this could ever be implemented for various file
 systems including NFS, I'll observe this won't really reduce
 application complexity, since you'll always need to have a fallback
 for file systems and kernels that don't support O_ATOMIC.

I don't see a reason why this couldn't be implemented by NFS.

 And what are the use cases where this really makes sense?  Will people

Lots of files are written in one go. They could all use this interface.

 really code to this interface, knowing that it only works on Linux
 (there are other operating systems, out there, like FreeBSD and

FreeBSD, Solaris and AIX probably also care about file consistency.
Discussing this proposal with them would be a good idea.

 Solaris and AIX, you know, and some application programmers _do_ care
 about portability), and the only benefits are (a) a marginal
 performance boost for insane people who like to write vast number of
 2-4 byte files without any need for atomic updates across a large
 number of these small files, and (b) the ability to keep the the file
 owner unchanged when someone other than the owner updates said file
 (how important is this _really_; what is the use case where this
 really matters?).

As you've said yourself, a lot of apps don't get this right. Why not?
Because the safe way is much more complex than the unsafe way. APIs
should be easy to use right and hard to misuse. With O_ATOMIC, I feel
this is the case. Without, it's the opposite and the consequences are
obvious. There shouldn't be a tradeoff between safety and potential
problems.

O_ATOMIC is merely a proposed way to solve this problem. I've asked
(you) for a concrete code example to do it without O_ATOMIC support,
but nobody has been able to provide one yet.

 And of course, Olaf isn't actually offering to implement this
 hypothetical O_ATOMIC.  Oh, no!  He's just petulantly demanding it,
 even though he can't give us any concrete use cases where this would
 actually be a huge win over a userspace safe-write library that
 properly uses fsync() and rename().

Not true. I've asked (you) for just such a lib, but I'm still waiting
for an answer.

Why would anyone work on an implementation if there's no agreement about it?

Olaf


--
To UNSUBSCRIBE, email to debian-devel-requ...@lists.debian.org
with a subject of unsubscribe. Trouble? Contact listmas...@lists.debian.org
Archive: 
http://lists.debian.org/aanlktiml0-7go=rfyt+wtonjeinqh9zqo5rpnf3c8...@mail.gmail.com



Re: Safe File Update (atomic)

2011-01-02 Thread Olaf van der Spek
On Sun, Jan 2, 2011 at 1:52 PM, Henrique de Moraes Holschuh
h...@debian.org wrote:
 Olaf, O_ATOMIC is difficult in the kernel sense and in the long run.  It
 is an API that is too hard to implement in a sane way, with too many
 boundary conditions.

 OTOH, you don't need O_ATOMIC.  You need a way for easy application access
 to a saner/simpler way to deal with files that require atomic replacement.
 Time to switch to a plan B that can achieve it.  Do not lose track of your
 final goal, and stop wasting time with O_ATOMIC (and aggravating fs
 developers, which can only hurt your goal in the end).

Maybe I wasn't clear, in that case I'm sorry. To me, O_ATOMIC is
mostly about the userspace API. The implementation isn't (that)
important, so you're right.

 Maybe there are ways to actually let the kernel detect usage patterns and do
 the right thing, but nobody found any that is complete (and the incomplete
 ones are implemented in ext3 and ext4 AFAIK).

 If a userspace library is built to do all the dances required using only
 POSIX APIs (you can use extensions where they are available to enhance
 performance) you will have an EXACT list of boundary conditions and choke
 points.

A userspace lib is fine with me. In fact, I've been asking for it
multiple times. Result: no response.

Olaf


Archive: 
http://lists.debian.org/aanlktikmkh=bxzwybhqdn_geog=qkkbs-efkxgex3...@mail.gmail.com



Re: Safe File Update (atomic)

2011-01-02 Thread Henrique de Moraes Holschuh
On Sun, 02 Jan 2011, Olaf van der Spek wrote:
 On Sun, Jan 2, 2011 at 1:52 PM, Henrique de Moraes Holschuh
 h...@debian.org wrote:
  Olaf, O_ATOMIC is difficult in the kernel sense and in the long run.  It
  is an API that is too hard to implement in a sane way, with too many
  boundary conditions.
 
  OTOH, you don't need O_ATOMIC.  You need a way for easy application access
  to a saner/simpler way to deal with files that require atomic replacement.
  Time to switch to a plan B that can achieve it.  Do not lose track of your
  final goal, and stop wasting time with O_ATOMIC (and aggravating fs
  developers, which can only hurt your goal in the end).
 
 Maybe I wasn't clear, in that case I'm sorry. To me, O_ATOMIC is
 mostly about the userspace API. The implementation isn't (that)
 important, so you're right.

Ok.  Here is one meta-API that could be useful (and yes, it is likely mostly
exactly what you call O_ATOMIC).  Whatever, my body is at 38.4°C right now
and the fever is still climbing, so I don't even claim perfect sanity at
the moment.

Ted, if I could impose on you a single question, please either reply with a
short "no, already explained why the idea below is bogus elsewhere", "no,
new idea but wouldn't work because of a, b, c", "no, but I don't care to
explain why right now", or "yes, could work depending on the details".  I
won't pester you about it.

1. Create an unlinked file fd (benefits from kernel support, but doesn't
require it).  If a filesystem cannot support this or the boundary conditions
are unacceptable, fail.  Needs to know the destination name to do the unlinked
create on the right fs and directory (otherwise attempts to link the file
later would have to fail if the fs is different).

2. fd works as any normal fd to an unlinked regular file.

3. create a link() that can do unlink+link atomically.  Maybe this already
exists, otherwise needs kernel support.

The behaviour of (3) should allow a synchronous wait of an fsync() and a sync
of the metadata of the parent dir.  It doesn't matter much if it does
everything, or just calling fsync(), or creating a fclose() variant that
does it.

Whether this should map to O_ATOMIC in glibc or be something new, I don't
care.  But if it is a flag, I'd highly suggest naming it O_CREATEUNLINKED or
something else that won't give people wrong ideas, as _nothing_ but the
final inode linking is atomic.

This will work for other uses, too.  It is a safe and easy way to create
temporary files for ipc, etc.
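
As historical context: steps (1) and (2) closely match what Linux later gained as O_TMPFILE (kernel 3.11, with filesystem support), so the idea is sketched below in those terms. Step (3) remains the missing piece: linkat() can give the unlinked inode a name, but fails with EEXIST if the name is already taken, so an atomic unlink+link replacement still has no syscall. The function name is invented and error handling is minimal.

```c
/* Hedged sketch of steps (1)-(2) using O_TMPFILE (Linux 3.11+, needs
 * fs support, e.g. ext4/tmpfs/XFS).  Function name is invented. */
#define _GNU_SOURCE
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int create_unlinked_then_link(const char *dir, const char *name)
{
    /* (1) an unlinked file, created on the right filesystem/directory */
    int fd = open(dir, O_TMPFILE | O_WRONLY, 0644);
    if (fd < 0)
        return -1;

    /* (2) it behaves like any fd to an unlinked regular file */
    if (write(fd, "data\n", 5) != 5 || fsync(fd) != 0) {
        close(fd);
        return -1;
    }

    /* (3), partially: give the inode a name.  AT_EMPTY_PATH needs
     * CAP_DAC_READ_SEARCH, so unprivileged code goes via /proc/self/fd.
     * This fails with EEXIST if 'name' already exists; the atomic
     * unlink+link of step (3) still has no syscall. */
    char proc[64], dst[4096];
    snprintf(proc, sizeof proc, "/proc/self/fd/%d", fd);
    snprintf(dst, sizeof dst, "%s/%s", dir, name);
    int r = linkat(AT_FDCWD, proc, AT_FDCWD, dst, AT_SYMLINK_FOLLOW);
    close(fd);
    return r;
}
```

One side benefit matches the garbage-collection wish elsewhere in this thread: if the process crashes before the linkat(), the unlinked inode simply disappears.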

Or not, maybe it is completely broken and I should not write while in a
fever.

 A userspace lib is fine with me. In fact, I've been asking for it
 multiple times. Result: no response.

You will need to actually find someone who wants to write such lib, or pay
someone to, or fire up a public funds campaign and contract it from someone
the community would trust to actually be able to complete the job, etc.

-- 
  One disk to rule them all, One disk to find them. One disk to bring
  them all and in the darkness grind them. In the Land of Redmond
  where the shadows lie. -- The Silicon Valley Tarot
  Henrique Holschuh


Archive: http://lists.debian.org/20110102171441.ga6...@khazad-dum.debian.net



Re: Safe File Update (atomic)

2011-01-02 Thread Adam Borowski
On Sun, Jan 02, 2011 at 04:14:15PM +0100, Olaf van der Spek wrote:
 On Sun, Jan 2, 2011 at 8:09 AM, Ted Ts'o ty...@mit.edu wrote:
  The O_ATOMIC open flag is highly problematic, and it's not fully
  specified.

Note that on the other side of the fence there's something called TxF
(Transactional NTFS).  I don't know how fast or reliable it is, but 
browsing the docs shows some interesting things.  In particular, it is
not limited to a single file but can handle any number of changes to the
filesystem.

  What if the system is under a huge amount of memory
  pressure, and the badly behaved application program does:
 
         fd = open(file, O_ATOMIC | O_TRUNC);
         write(fd, buf, 2*1024*1024*1024); // write 2 gigs, heh, heh heh
         sleep for one day
         write(fd, buf2, 1024);
         close(fd);
 
 Last time you ignored my response, but let's try again.
 The implementation would be comparable to using a temp file, so
 there's no need to keep 2 g in memory.
 Write the 2 g to disk, wait one day, append the 1 k, fsync, update inode.

And what if you're changing one byte inside a 50 GB file?
I see an easy implementation on btrfs/ocfs2 (assuming no other writers),
but on ext3/4, that'd be tricky.

  What if another program opens the file O_ATOMIC during the one day
  sleep period, so the file is in the middle of getting updated by two
  different processes using O_ATOMIC?
 
 Again equivalent to using the rename trick. One of the updates will
 win and since they don't depend on the old contents there are no
 troubles.

On NTFS, an attempt to open a file for writing twice fails if at least one
of you and the other writer use TxF.  This goes contrary to the usual Unix
semantics (where you can always open the file for writing) but it is how SQL
works.  NTFS has bad lock granularity (the whole file rather than a row,
page or a byte range), but is straightforward.
 
  How exactly do the semantics for O_ATOMIC work?
 
  And given at the moment ***zero*** file systems implement O_ATOMIC,

I'd count TxF as an implementation.

  what should an application do as a fallback?  And given that it is
 
 Fallback could be implemented in the kernel or in userland. Using rename
 as a fallback sounds reasonable. Implementations could switch to
 O_ATOMIC when available.

For large files, using reflink (currently implemented as fs-specific ioctls)
can ensure performance.  It can give you anything but the abuse for
preserving owner (ie, the topic of this thread).  To get that, you'd need
in-kernel support, but for example http://lwn.net/Articles/331808/ proposes
an API which is just a thin wrapper over existing functionality in multiple
filesystems.  It basically duplicates an inode, preserving all current
attributes but making any new writes CoWed.  If you make the old one
immutable, you get the TxF semantics (mandatory write lock), if you don't,
you'll get the above-mentioned "one of the updates will win" data loss.
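
The fs-specific clone ioctls mentioned here were later unified under a generic FICLONE ioctl (same request value as the original btrfs clone ioctl). A hedged sketch, with an invented function name: it duplicates a file's blocks copy-on-write where the filesystem supports it, and reports "unsupported" elsewhere, since most filesystems cannot reflink.

```c
/* Hedged sketch of CoW duplication via the generic FICLONE ioctl (the
 * later name for the btrfs clone ioctl).  Only reflink-capable
 * filesystems (btrfs, ocfs2, XFS with reflink) support it, so
 * "unsupported" is an expected, non-error outcome.  try_reflink() is
 * an invented name. */
#include <errno.h>
#include <fcntl.h>
#include <sys/ioctl.h>
#include <unistd.h>

#ifndef FICLONE
#define FICLONE _IOW(0x94, 9, int)   /* from <linux/fs.h> on newer systems */
#endif

/* Returns 1 if dst now shares src's blocks copy-on-write,
 * 0 if the filesystem cannot reflink, -1 on other errors. */
int try_reflink(const char *src, const char *dst)
{
    int sfd = open(src, O_RDONLY);
    if (sfd < 0)
        return -1;
    int dfd = open(dst, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (dfd < 0) {
        close(sfd);
        return -1;
    }
    int r = ioctl(dfd, FICLONE, sfd);
    int err = errno;                 /* save before close() can clobber it */
    close(sfd);
    close(dfd);
    if (r == 0)
        return 1;
    return (err == EOPNOTSUPP || err == EINVAL ||
            err == ENOTTY || err == EXDEV) ? 0 : -1;
}
```

On a CoW filesystem the clone is cheap regardless of file size, which is what makes the change-one-byte-in-a-50 GB-file case above tractable there.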

  highly unlikely this could ever be implemented for various file
  systems including NFS, I'll observe this won't really reduce
  application complexity, since you'll always need to have a fallback
  for file systems and kernels that don't support O_ATOMIC.
 
 I don't see a reason why this couldn't be implemented by NFS.

Not sure how extensible NFS is, but it's just a matter of passing these
calls over network to the underlying filesystem.  Ie, the problem can be
divided into doing this locally (see above) and extending NFS.

  And what are the use cases where this really makes sense?  Will people
 
 Lots of files are written in one go. They could all use this interface.

I don't see how O_ATOMIC helps there.  TxF transactions would work (all
writes either succeed together or none does), but O_ATOMIC can't do more
than one file.
 
  the only benefits are (a) a marginal performance boost for insane people
  who like to write vast numbers of 2-4 byte files without any need for
  atomic updates across a large number of these small files, and (b) the
  ability to keep the file owner unchanged when someone other than the
  owner updates said file (how important is this _really_; what is the use
  case where this really matters?).

 As you've said yourself, a lot of apps don't get this right. Why not?
 Because the safe way is much more complex than the unsafe way. APIs
 should be easy to use right and hard to misuse. With O_ATOMIC, I feel
 this is the case. Without, it's the opposite and the consequences are
 obvious. There shouldn't be a tradeoff between safety and potential
 problems.

Uhm, but you didn't answer the question.  These two use cases Ted Ts'o
mentioned are certainly not worth the complexity of in-kernel support,
O_ATOMIC doesn't bring other goodies, and the rest can be done by a
userspace library which is indeed a good idea.

 O_ATOMIC is merely a proposed way to solve this problem. I've asked
 (you) for a concrete code example to do it without O_ATOMIC support,
 but nobody has been able to provide one yet.

Re: Safe File Update (atomic)

2011-01-02 Thread Ted Ts'o
On Sun, Jan 02, 2011 at 04:14:15PM +0100, Olaf van der Spek wrote:
 
 Last time you ignored my response, but let's try again.
 The implementation would be comparable to using a temp file, so
 there's no need to keep 2 g in memory.
 Write the 2 g to disk, wait one day, append the 1 k, fsync, update inode.

Write the 2g to disk *where*?  Some random assigned blocks?  And using
*what* to keep track of the where to find all of the metadata blocks?
That information is normally stored in the inode, but you don't want
to touch it.  So we need to store it someplace, and you haven't
specified where.  Some alternate universe?  Another inode, which is
only tied to that file descriptor?  That's *possible*, but it's (a)
not at all trivial, and (b) won't work for all file systems.  It
definitely won't work for FAT-based file systems, so your blithe "oh,
just emulate it in the kernel" is rather laughable.

If you think it's so easy, *you* go implement it.

  How exactly do the semantics for O_ATOMIC work?
 
  And given at the moment ***zero*** file systems implement O_ATOMIC,
  what should an application do as a fallback?  And given that it is
 
 Fallback could be implemented in the kernel or in userland. Using rename
 as a fallback sounds reasonable. Implementations could switch to
 O_ATOMIC when available.

Using rename as a fallback means exposing random temp file names into
the directory.  Which could conflict with files that the userspace
might want to create.  It could be done, but again, it's an awful lot
of complexity to shove into the kernel.

  highly unlikely this could ever be implemented for various file
  systems including NFS, I'll observe this won't really reduce
  application complexity, since you'll always need to have a fallback
  for file systems and kernels that don't support O_ATOMIC.
 
 I don't see a reason why this couldn't be implemented by NFS.

Try it; it should become obvious fairly quickly.  Or just go read the
NFS protocol specifications.

 As you've said yourself, a lot of apps don't get this right. Why not?
 Because the safe way is much more complex than the unsafe way. APIs
 should be easy to use right and hard to misuse. With O_ATOMIC, I feel
 this is the case. Without, it's the opposite and the consequences are
 obvious. There shouldn't be a tradeoff between safety and potential
 problems.

Application programmers have in the past been unwilling to change
their applications.  If they are willing to change their applications,
they can just as easily use a userspace library, or use fsync() and
rename() properly.  If they aren't willing to change their programs
and recompile (and the last time we've been around this block, they
weren't; they just blamed the file system), asking them to use
O_ATOMIC probably won't work, given the portability issues.

  And of course, Olaf isn't actually offering to implement this
  hypothetical O_ATOMIC.  Oh, no!  He's just petulantly demanding it,
  even though he can't give us any concrete use cases where this would
  actually be a huge win over a userspace safe-write library that
  properly uses fsync() and rename().
 
 Not true. I've asked (you) for just such a lib, but I'm still waiting
 for an answer.

Pay someone enough money, and they'll write you the library.  Whining
about it petulantly and expecting someone else to write it is probably
not going to work.

Quite frankly, if you're competent enough to use it, you should be
able to write such a library yourself.  If you aren't going to be
using it yourself, then why are you wasting everyone's time on this?

- Ted


Archive: http://lists.debian.org/20110103032549.gc11...@thunk.org



Re: Safe File Update (atomic)

2011-01-02 Thread Ted Ts'o
On Sun, Jan 02, 2011 at 03:14:41PM -0200, Henrique de Moraes Holschuh wrote:
 
 1. Create an unlinked file fd (benefits from kernel support, but doesn't
 require it).  If a filesystem cannot support this or the boundary conditions
 are unacceptable, fail.  Needs to know the destination name to do the unlinked
 create on the right fs and directory (otherwise attempts to link the file
 later would have to fail if the fs is different).

This is possible.  It would be specific only to file systems that
support inodes (i.e., ix-nay for NFS, FAT, etc.).  Some file systems
would want to know a likely directory where the file would be linked,
so that their inode and block allocation policies can optimize the
inode and block placement.

 2. fd works as any normal fd to an unlinked regular file.
 
 3. create a link() that can do unlink+link atomically.  Maybe this already
 exists, otherwise needs kernel support.
 
 The behaviour of (3) should allow a synchronous wait of an fsync() and a sync of
 the metadata of the parent dir.  It doesn't matter much if it does
 everything, or just calling fsync(), or creating a fclose() variant that
 does it.

OK, so this is where things get tricky.  The first is that you are asking
for the ability to take a file descriptor and link it into some
directory.  The inode associated with the fd might or might not be
already linked to some other directory, and it might or might not be
owned by the user trying to do the link.  The latter could get
problematic if quota is enabled, since it does open up a new
potential security exposure.

A user might pass a file descriptor to another process in a different
security domain, and that process could create a link to some
directory which the original user doesn't have access to.  The user
would no longer be able to delete file and drop quota, and the process
would retain permanent access to the file, which it might not
otherwise have if the inode was protected by a parent directory's
permissions.  It's for the same reason that we can't just implement
open-by-inode-number; even if you use the inode's permissions and
ACL's to do a access check, this allows someone to bypass security
controls based on the containing directory's permissions.  It might
not be a security exposure, but for some scenarios (i.e., a mode 600
~/Private directory that contains world-readable files), it changes
accessibility of some files.

We could control for this by only allowing the link to happen if the
user executing this new system call owns the inode being linked, so
this particular problem is addressable.

The larger problem is that this doesn't give you any performance
benefits over simply creating a temporary file, fsync'ing it, and then
doing the rename.  And it doesn't solve the problem that userspace is
responsible for copying over the extended attributes and ACL
information.  So in exchange for doing something non-portable which is
Linux specific, and won't work on FAT, NFS, and other non-inode based
file systems at all, and which requires special file-system
modifications for inode-based file systems --- the only real benefit
you get is that the temp file gets cleaned up automatically if you
crash before the link/unlink new magical system call is completed.

Is it worth it?   I'm not at all convinced.

Can this be fixed?  Well, I suppose we could have this magical
link/unlink system call also magically copy over the xattr and acl's.

And if you don't care about when things happen, you could have the
kernel fork off a kernel thread, which does the fsync, followed by the
magic ACL and xattr copying, and once all of this completes, it could
do the magic link/unlink.

So we could bundle all of this into a system call.  *Theoretically*.
But then someone else will say that they want to know when this magic
link/unlink system call actually completes.  Others might say that
they don't care about the fsync happening right away, but would rather
wait some arbitrary time, and let the system writeback algorithms write
back the file *whenever*, but only when the file is written back
*whenever*, should the rest of the magical link/unlink happen.

So now we have an explosion of complexity, with all sorts of different
variants.  And there's also the problem that if you don't
make the system call synchronous (where it does an fsync() and waits
for it to complete), you'll lose the ability to report errors back to
userspace.

Which gets me back to the question of use cases.  When are we going to
be using this monster?  For many of the use cases (where the original
reason we said people were doing it wrong was that they weren't doing
things safely), the risk was losing data.
synchronously, and use fsync(), you'll also end up risking losing data
because you won't know about write failures --- specifically, your
program may have long exited by the time the write failure is noticed
by the kernel.  But if you make the system call synchronous, now
there's no 

Re: Safe File Update (atomic)

2011-01-02 Thread Enrico Weigelt
* Ted Ts'o ty...@mit.edu schrieb:

 This is possible.  It would be specific only to file systems that
 support inodes (i.e., ix-nay for NFS, FAT, etc.).  

FAT supports inodes ?
IIRC it puts all file information (including attributes and first
data block) directly into the dirent ...

 Some file systems would want to know a likely directory where the
 file would be linked so for their inode and block allocation
 policies can optimize the inode and block placement.

Interesting. Do you know of some which do that and maybe some studies
on whether that's worth it ?
 
 A user might pass a file descriptor to another process in a different
 security domain, and that process could create a link to some
 directory which the original user doesn't have access to.  The user
 would no longer be able to delete file and drop quota, and the process
 would retain permanent access to the file, which it might not
 otherwise have if the inode was protected by a parent directory's
 permissions. 

Just curious: does the fd passing duplicate the fd or pass it as-is ?
(so multiple processes have access to the same fd instance instead
of just the same inode ?)

 1) You care about data loss in the case of power failure, but not in
 the case of hard drive or storage failure, *AND* you are writing tons
 and tons of tiny 3-4 byte files and so you are worried about
 performance because you're doing something insane with large number of
 small files.

I'd be careful w/ declaring use of tons of small files insane.
Sure, this might call for a database, but hierarchical filesystems
also might be a good interface for hierarchical key-value-lists.
(for this, a read-at-once/write-at-once syscall would be nice ;-)).

 3) You care about the temp file used by the userspace library, or
 application which is doing the write temp file, fsync(), rename()
 scheme, being automatically deleted in case of a system crash or a
 process getting sent an uncatchable signal and getting terminated.

Indeed, an automatic garbage collection for temp files would be nice.
But that also could be done by a new flag which tells the kernel
to automatically remove those files when the holding process terminates.
But this information would also have to be permanently recorded somewhere,
so the gc can still clean them up after hard reboot. 
 

cu
-- 
--
 Enrico Weigelt, metux IT service -- http://www.metux.de/

 phone:  +49 36207 519931  email: weig...@metux.de
 mobile: +49 151 27565287  icq:   210169427 skype: nekrad666
--
 Embedded-Linux / Portierung / Opensource-QM / Verteilte Systeme
--


Archive: http://lists.debian.org/20110103052826.ga14...@nibiru.local



Re: Safe File Update (atomic)

2011-01-01 Thread Olaf van der Spek
On Fri, Dec 31, 2010 at 5:08 PM, Enrico Weigelt weig...@metux.de wrote:
 Not true. Renaming a running executable works just fine, for example.

 Well, has been quite a while since I last used Windows, but IIRC
 renaming an running executable was denied.

Maybe on FAT. However, that's OT.

   Why not design a new (overlay'ing) filesystem for that ?
 
  Increased complexity, lower performance, little benefit.
 
  Why that ? Currently applications (try to) implement that all on
  their own, which needs great efforts for multiprocess synchronization.
  Having that in a little fileserver eases this synchronization and
  moves the complexity to a single point.

 I mean compared to implementing it properly in the kernel.

 Doing it in the kernel would be fine (maybe DLM could be used here),

What's DLM?

 but would be a nonportable solution for quite a long time ;-o

Since it's the only proper solution I don't think that's a problem.

Olaf


Archive: 
http://lists.debian.org/aanlktinzgo=u85r4mjaxuslkzdha7_yhbfz2ylfu7...@mail.gmail.com



Re: Safe File Update (atomic)

2011-01-01 Thread Cyril Brulebois
Olaf van der Spek olafvds...@gmail.com (01/01/2011):
  Doing it in the kernel would be fine (maybe DLM could be used here),
 
 What's DLM?

CONFIG_DLM.

KiBi.


signature.asc
Description: Digital signature


Re: Safe File Update (atomic)

2011-01-01 Thread Olaf van der Spek
On Sat, Jan 1, 2011 at 7:13 PM, Cyril Brulebois k...@debian.org wrote:
 Olaf van der Spek olafvds...@gmail.com (01/01/2011):
  Doing it in the kernel would be fine (maybe DLM could be used here),

 What's DLM?

 CONFIG_DLM.

DLM seems independent of atomic updates.

Olaf


Archive: 
http://lists.debian.org/aanlktim5=w67itcnx7fkrmsuddstt=05zggbuwq32...@mail.gmail.com



Re: Safe File Update (atomic)

2011-01-01 Thread Enrico Weigelt
* Olaf van der Spek olafvds...@gmail.com schrieb:

  Doing it in the kernel would be fine (maybe DLM could be used here),
 
 What's DLM?

Distributed lock manager.

  but would be a nonportable solution for quite a long time ;-o
 
 Since it's the only proper solution I don't think that's a problem.

I doubt that's the only proper solution. As said, a (userland)
filesystem could also do fine.


cu
-- 
--
 Enrico Weigelt, metux IT service -- http://www.metux.de/

 phone:  +49 36207 519931  email: weig...@metux.de
 mobile: +49 151 27565287  icq:   210169427 skype: nekrad666
--
 Embedded-Linux / Portierung / Opensource-QM / Verteilte Systeme
--


Archive: http://lists.debian.org/20110101230343.gd10...@nibiru.local



Re: Safe File Update (atomic)

2011-01-01 Thread Olaf van der Spek
On Sun, Jan 2, 2011 at 12:03 AM, Enrico Weigelt weig...@metux.de wrote:
 I doubt that the only proper solution. As said, an (userland)
 filesystem could also do fine.

Do you think distros like Debian would install such a setup by default?

Olaf


Archive: 
http://lists.debian.org/aanlktinsexfkrcdzf1d1ia44z=logx0cic46z39vf...@mail.gmail.com



Re: Safe File Update (atomic)

2011-01-01 Thread Ted Ts'o
On Fri, Dec 31, 2010 at 09:51:50AM -0200, Henrique de Moraes Holschuh wrote:
 On Fri, 31 Dec 2010, Olaf van der Spek wrote:
  Ah, hehe. BTW, care to respond to the mail I sent to you?
 
 There is nothing more I can add to this thread.  You want O_ATOMIC.  It
 cannot be implemented for all use cases of the POSIX API, so it will not
 be implemented by the kernel.  That's all there is to it, AFAIK.
 
 You could ask for a new (non-POSIX?) API that does not ask of a
 POSIX-like filesystem something it cannot provide (i.e. don't ask for
 something that requires inode-path reverse mappings).  You could ask
 for syscalls to copy inodes, etc.  You could ask for whatever is needed
 to do a (open+write+close) that is atomic if the target already exists.
 Maybe one of those has a better chance than O_ATOMIC.

The O_ATOMIC open flag is highly problematic, and it's not fully
specified.  What if the system is under a huge amount of memory
pressure, and the badly behaved application program does:

fd = open(file, O_ATOMIC | O_TRUNC);
write(fd, buf, 2*1024*1024*1024); // write 2 gigs, heh, heh heh
sleep for one day
write(fd, buf2, 1024);
close(fd);

What happens if another program opens "file" for reading during the
one day sleep period?  Does it get the old contents of "file"?
The partially written, incomplete new version of "file"?  What happens
if the file is currently mmap'ed, as Henrique has asked?

What if another program opens the file O_ATOMIC during the one day
sleep period, so the file is in the middle of getting updated by two
different processes using O_ATOMIC?

How exactly do the semantics for O_ATOMIC work?

And given at the moment ***zero*** file systems implement O_ATOMIC,
what should an application do as a fallback?  And given that it is
highly unlikely this could ever be implemented for various file
systems including NFS, I'll observe this won't really reduce
application complexity, since you'll always need to have a fallback
for file systems and kernels that don't support O_ATOMIC.

And what are the use cases where this really makes sense?  Will people
really code to this interface, knowing that it only works on Linux
(there are other operating systems, out there, like FreeBSD and
Solaris and AIX, you know, and some application programmers _do_ care
about portability), and the only benefits are (a) a marginal
performance boost for insane people who like to write vast numbers of
2-4 byte files without any need for atomic updates across a large
number of these small files, and (b) the ability to keep the file
owner unchanged when someone other than the owner updates said file
(how important is this _really_; what is the use case where this
really matters?).

And of course, Olaf isn't actually offering to implement this
hypothetical O_ATOMIC.  Oh, no!  He's just petulantly demanding it,
even though he can't give us any concrete use cases where this would
actually be a huge win over a userspace safe-write library that
properly uses fsync() and rename().

If someone were to pay me a huge amount of money, and told me what was
the file size range where such a thing would be used, and what sort of
application would need it, and what kind of update frequency it should
be optimized for, and other semantic details about parallel O_ATOMIC
updates, what happens to users who are in the middle of reading the
file, what are the implications for quota, etc., it's certainly
something I can entertain.  But at the moment, it's a vague
specification (not even a solution) looking for a problem.

- Ted


Archive: http://lists.debian.org/20110102070922.ga6...@thunk.org



Re: Safe File Update (atomic)

2010-12-31 Thread Olaf van der Spek
On Fri, Dec 31, 2010 at 3:17 AM, Henrique de Moraes Holschuh
h...@debian.org wrote:
 On Thu, 30 Dec 2010, Henrique de Moraes Holschuh wrote:
 BTW: safely removing a file is also tricky.  AFAIK, one must open it RW,
 in exclusive mode. stat it by fd and check whether it is what one
 expects (regular file, ownership).  unlink it by fd.  close the fd.

 Eh, as it was pointed to me by private mail, this is obviously a load of
 crap :p  There is no unlink by fd.  Sorry about that.

 The attacks here are races by messing with intermediate path components,
 which are either not worth bothering with, or have to be avoided in a
 much more convoluted manner.

Ah, hehe. BTW, care to respond to the mail I sent to you?


Archive: 
http://lists.debian.org/aanlktinnyxtf2czhkfrmkw_gpp39h5uqu2j8oz1cs...@mail.gmail.com



Re: Safe File Update (atomic)

2010-12-31 Thread Henrique de Moraes Holschuh
On Fri, 31 Dec 2010, Olaf van der Spek wrote:
 Ah, hehe. BTW, care to respond to the mail I sent to you?

There is nothing more I can add to this thread.  You want O_ATOMIC.  It
cannot be implemented for all use cases of the POSIX API, so it will not
be implemented by the kernel.  That's all there is to it, AFAIK.

You could ask for a new (non-POSIX?) API that does not ask of a
POSIX-like filesystem something it cannot provide (i.e. don't ask for
something that requires inode-path reverse mappings).  You could ask
for syscalls to copy inodes, etc.  You could ask for whatever is needed
to do a (open+write+close) that is atomic if the target already exists.
Maybe one of those has a better chance than O_ATOMIC.

It is up to you and the fs developers to find some common ground.

-- 
  One disk to rule them all, One disk to find them. One disk to bring
  them all and in the darkness grind them. In the Land of Redmond
  where the shadows lie. -- The Silicon Valley Tarot
  Henrique Holschuh


-- 
Archive: http://lists.debian.org/20101231115150.gb31...@khazad-dum.debian.net



Re: Safe File Update (atomic)

2010-12-31 Thread Olaf van der Spek
On Fri, Dec 31, 2010 at 12:51 PM, Henrique de Moraes Holschuh
h...@debian.org wrote:
 On Fri, 31 Dec 2010, Olaf van der Spek wrote:
 Ah, hehe. BTW, care to respond to the mail I sent to you?

 There is nothing more I can add to this thread.  You want O_ATOMIC.  It

That's a shame. I thought I provided pretty concrete answers.

 cannot be implemented for all use cases of the POSIX API, so it will not
 be implemented by the kernel.  That's all there is to it, AFAIK.

 You could ask for a new (non-POSIX?) API that does not ask of a
 POSIX-like filesystem something it cannot provide (i.e. don't ask for

What's the definition of a POSIX-like FS?

 something that requires inode-path reverse mappings).  You could ask
 for syscalls to copy inodes, etc.  You could ask for whatever is needed

To me, inodes are an implementation detail that shouldn't be exposed.

 to do a (open+write+close) that is atomic if the target already exists.
 Maybe one of those has a better chance than O_ATOMIC.

 It is up to you and the fs developers to find some common ground.

The FS devs are happy with all the regressions of the workaround, so
they're unlikely to do anything.

Olaf


--
Archive: http://lists.debian.org/aanlktikw9372od-eufevczv8dtxorbagslq3mc...@mail.gmail.com



Re: Safe File Update (atomic)

2010-12-31 Thread Enrico Weigelt
* Olaf van der Spek olafvds...@gmail.com schrieb:

  something that requires inode-path reverse mappings).  You could ask
  for syscalls to copy inodes, etc.  You could ask for whatever is needed
 
 To me, inodes are an implementation detail that shouldn't be exposed.

Well, they're a fundamental concept which sometimes *IS* significant
to the applications. It's very different from systems where each
file has exactly one name (e.g. DOS/Windows) or where there are just
filenames that point to opaque stream objects that can be virtually
anything (e.g. Plan9).

  to do a (open+write+close) that is atomic if the target already exists.
  Maybe one of those has a better chance than O_ATOMIC.
 
  It is up to you and the fs developers to find some common ground.
 
 The FS devs are happy with all the regressions of the workaround, so
 they're unlikely to do anything.

Why not design a new (overlaying) filesystem for that?


cu
-- 
--
 Enrico Weigelt, metux IT service -- http://www.metux.de/

 phone:  +49 36207 519931  email: weig...@metux.de
 mobile: +49 151 27565287  icq:   210169427 skype: nekrad666
--
 Embedded-Linux / Portierung / Opensource-QM / Verteilte Systeme
--


--
Archive: http://lists.debian.org/20101231135711.gb10...@nibiru.local



Re: Safe File Update (atomic)

2010-12-31 Thread Olaf van der Spek
On Fri, Dec 31, 2010 at 2:57 PM, Enrico Weigelt weig...@metux.de wrote:
 To me, inodes are an implementation detail that shouldn't be exposed.

 Well, they're an fundamental concept which sometimes *IS* significant
 to the applications. It's very different from systems where each
 file has exactly one name (eg. DOS/Windows) or where there're just
 filesnames that point to opaque stream objects that can be virtually
 anything (eg. Plan9).

Sometimes, indeed. This number of times should be as low as possible.

  to do a (open+write+close) that is atomic if the target already exists.
  Maybe one of those has a better chance than O_ATOMIC.
 
  It is up to you and the fs developers to find some common ground.

 The FS devs are happy with all the regressions of the workaround, so
 they're unlikely to do anything.

 Why not designing an new (overlay'ing) filesystem for that ?

Increased complexity, lower performance, little benefit.

Olaf


-- 
Archive: http://lists.debian.org/aanlktinq1aucfw2fkjiqwz=y2k4hoor87zbhfq8nb...@mail.gmail.com



Re: Safe File Update (atomic)

2010-12-31 Thread Enrico Weigelt
* Olaf van der Spek olafvds...@gmail.com schrieb:

  Well, they're an fundamental concept which sometimes *IS* significant
  to the applications. It's very different from systems where each
  file has exactly one name (eg. DOS/Windows) or where there're just
  filesnames that point to opaque stream objects that can be virtually
  anything (eg. Plan9).
 
 Sometimes, indeed. This number of times should be as low as possible.

These cases aren't that rare. Windows, for example, tends to deny
renames on open files, as they're also identified by the filename.
(Yes, there are other solutions for this problem, e.g. having some
internal-only inode numbering, etc.)

It's important to understand that on *nix, filenames do not represent
files directly, but are just pointers to them (somewhat comparable
to DNS entries), whereas other platforms use the filename directly as
the primary identification (sometimes even as a primary key). This has
great implications for the semantics of the filesystem.

  Why not designing an new (overlay'ing) filesystem for that ?
 
 Increased complexity, lower performance, little benefit.

Why that? Currently applications (try to) implement all of that on
their own, which requires great effort for multiprocess synchronization.
Having it in a little fileserver eases this synchronization and
moves the complexity to a single point.


cu
-- 
--
 Enrico Weigelt, metux IT service -- http://www.metux.de/

 phone:  +49 36207 519931  email: weig...@metux.de
 mobile: +49 151 27565287  icq:   210169427 skype: nekrad666
--
 Embedded-Linux / Portierung / Opensource-QM / Verteilte Systeme
--


-- 
Archive: http://lists.debian.org/20101231144455.ga29...@nibiru.local



Re: Safe File Update (atomic)

2010-12-31 Thread brian m. carlson
On Fri, Dec 31, 2010 at 03:44:56PM +0100, Enrico Weigelt wrote:
 * Olaf van der Spek olafvds...@gmail.com schrieb:
 
   Well, they're an fundamental concept which sometimes *IS* significant
   to the applications. It's very different from systems where each
   file has exactly one name (eg. DOS/Windows) or where there're just
   filesnames that point to opaque stream objects that can be virtually
   anything (eg. Plan9).
  
  Sometimes, indeed. This number of times should be as low as possible.
 
 These cases aren't that rare. Windows, for example, tends to deny
 renames on open files, as they're also identified by the filename.
 (yes, there're other solutions for this problem, eg. having some
 internal-only inode numbering, etc).

I would like to point out that this specific issue is why Windows needs
to be rebooted so often compared to Unix systems.  This is one situation
where inodes really shine.

-- 
brian m. carlson / brian with sandals: Houston, Texas, US
+1 832 623 2791 | http://www.crustytoothpaste.net/~bmc | My opinion only
OpenPGP: RSA v4 4096b: 88AC E9B2 9196 305B A994 7552 F1BA 225C 0223 B187




Re: Safe File Update (atomic)

2010-12-31 Thread Olaf van der Spek
On Fri, Dec 31, 2010 at 3:44 PM, Enrico Weigelt weig...@metux.de wrote:
 * Olaf van der Spek olafvds...@gmail.com schrieb:

  Well, they're an fundamental concept which sometimes *IS* significant
  to the applications. It's very different from systems where each
  file has exactly one name (eg. DOS/Windows) or where there're just
  filesnames that point to opaque stream objects that can be virtually
  anything (eg. Plan9).

 Sometimes, indeed. This number of times should be as low as possible.

 These cases aren't that rare. Windows, for example, tends to deny

I mean that apps shouldn't have to know about inodes.

 renames on open files, as they're also identified by the filename.

Not true. Renaming a running executable works just fine, for example.

 (yes, there're other solutions for this problem, eg. having some
 internal-only inode numbering, etc).

 It's important to understand, that on *nix, filenames are not representing
 the files directly, but just a pointer to them (somewhat comparable
 to DNS entries), where other platforms directly use the filename as
 primary identification (sometimes even as primary key). This has great
 implications on the semantics of the filesystem.

  Why not designing an new (overlay'ing) filesystem for that ?

 Increased complexity, lower performance, little benefit.

 Why that ? Currently applications (try to) implement that all on
 their own, which needs great efforts for multiprocess synchronization.
 Having that in a little fileserver eases this sychronization and
 moves the complexity to a single point.

I mean compared to implementing it properly in the kernel.

Olaf


-- 
Archive: http://lists.debian.org/aanlktimzsvy_g8+r2zooz=skb0tza86kot2qb-eh8...@mail.gmail.com



Re: Safe File Update (atomic)

2010-12-31 Thread Olaf van der Spek
On Fri, Dec 31, 2010 at 3:58 PM, brian m. carlson
sand...@crustytoothpaste.net wrote:
 These cases aren't that rare. Windows, for example, tends to deny
 renames on open files, as they're also identified by the filename.
 (yes, there're other solutions for this problem, eg. having some
 internal-only inode numbering, etc).

 I would like to point out that this specific issue is why Windows needs
 to be rebooted so often compared to Unix systems.  This is one situation
 where inodes really shine.

I didn't say inodes are bad. I said apps shouldn't have to know about them.

Olaf


--
Archive: http://lists.debian.org/aanlkti=upmdxkkfmx5ly8nfxndmobg55f3yrpuygy...@mail.gmail.com



Re: Safe File Update (atomic)

2010-12-31 Thread Enrico Weigelt
* Olaf van der Spek olafvds...@gmail.com schrieb:

  renames on open files, as they're also identified by the filename.
 
 Not true. Renaming a running executable works just fine, for example.

Well, it has been quite a while since I last used Windows, but IIRC
renaming a running executable was denied.

   Why not designing an new (overlay'ing) filesystem for that ?
 
  Increased complexity, lower performance, little benefit.
 
  Why that ? Currently applications (try to) implement that all on
  their own, which needs great efforts for multiprocess synchronization.
  Having that in a little fileserver eases this sychronization and
  moves the complexity to a single point.
 
 I mean compared to implementing it properly in the kernel.

Doing it in the kernel would be fine (maybe DLM could be used here),
but would be a nonportable solution for quite a long time ;-o


cu
-- 
--
 Enrico Weigelt, metux IT service -- http://www.metux.de/

 phone:  +49 36207 519931  email: weig...@metux.de
 mobile: +49 151 27565287  icq:   210169427 skype: nekrad666
--
 Embedded-Linux / Portierung / Opensource-QM / Verteilte Systeme
--


-- 
Archive: http://lists.debian.org/20101231160803.gc10...@nibiru.local



Re: Safe File Update (atomic)

2010-12-30 Thread Henrique de Moraes Holschuh
On Wed, 29 Dec 2010, Olaf van der Spek wrote:
 Writing a temp file, fsync, rename is often proposed. However, the

It is:
  write temp file (in same directory as file to be replaced), fsync temp
  file[1], rename (atomic), fsync directory[2].

[1] Makes sure file data has been committed to the backend device before
the metadata update.

[2] Makes sure the metadata has been committed to permanent storage.
Can often be ignored when you don't really care to know you will
get the new contents (as opposed to the old contents) in case of
a crash.  MTAs and spools, for example, MUST do it.

Which steps you can skip is filesystem-options/filesystem/
kernel-version/kernel dependent.  When the rename acts as a barrier, [1]
can be skipped, for example.  Tracking this is a losing proposition.

If we could use some syscall to make [1] into a simple barrier request
(guaranteed to degrade to fsync if barriers are not operating), it would
be better performance-wise.  This is what one should request of libc and
the kernels with a non-zero chance of getting it implemented (in fact,
it might even already exist).
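The recipe above can be sketched with Python's os-level POSIX wrappers. This is an illustration of steps [1] and [2], not a hardened library: the temp-file naming and error handling are deliberately simplified.

```python
import os

def safe_replace(path, data):
    """Replace `path` with `data` via the temp-file/fsync/rename dance."""
    dirname = os.path.dirname(path) or "."
    # Temp file in the SAME directory, so the rename stays on one volume.
    tmp = os.path.join(dirname,
                       ".%s.tmp.%d" % (os.path.basename(path), os.getpid()))
    fd = os.open(tmp, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o644)
    try:
        os.write(fd, data)
        os.fsync(fd)        # [1] data on the backend device before the rename
    finally:
        os.close(fd)
    os.rename(tmp, path)    # atomic: readers see either old or new contents
    dfd = os.open(dirname, os.O_RDONLY)
    try:
        os.fsync(dfd)       # [2] commit the directory entry itself
    finally:
        os.close(dfd)
```

After a crash at any point, `path` holds either the complete old or the complete new contents; at worst a stale temp file is left behind for cleanup.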

 I've brought this up on linux-fsdevel and linux-ext4 but they (Ted)
 claim those exceptions aren't really a problem.

Indeed they are not.  Code has been dealing with them for years.  You
name the temp file properly, and teach your program to clean old ones up
*safely* (see vim swap file handling for an example) when it starts.

vim is a good example: nobody gets surprised by vim swap-files left over
when vim/computer crashes. And vim will do something smart with them if
it finds them in the current directory when it is started.

BTW: safely removing a file is also tricky.  AFAIK, one must open it RW,
in exclusive mode. stat it by fd and check whether it is what one
expects (regular file, ownership).  unlink it by fd.  close the fd.

 Is there a code snippet or lib function that handles this properly?

I don't know.  I'd be interested in the answer, though :-)

-- 
  One disk to rule them all, One disk to find them. One disk to bring
  them all and in the darkness grind them. In the Land of Redmond
  where the shadows lie. -- The Silicon Valley Tarot
  Henrique Holschuh


-- 
Archive: http://lists.debian.org/20101230114655.ga19...@khazad-dum.debian.net



Re: Safe File Update (atomic)

2010-12-30 Thread Olaf van der Spek
On Thu, Dec 30, 2010 at 12:46 PM, Henrique de Moraes Holschuh
h...@debian.org wrote:
  write temp file (in same directory as file to be replaced), fsync temp

What if the target name is actually a symlink? To a different volume?
What if you're not allowed to create a file in that dir?

 If we could use some syscall to make [1] into a simple barrier request
 (guaranteed to degrade to fsync if barriers are not operating), it would
 be better performance-wise.  This is what one should request of libc and
 the kernels with a non-zero chance of getting it implemented (in fact,
 it might even already exist).

My proposal was O_ATOMIC:
// begin transaction
open(fname, O_ATOMIC | O_TRUNC);
write; // 0+ times
close;

Seems like the ideal API from the app's point of view.
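No kernel implements O_ATOMIC; until one does, the closest userspace equivalent is a wrapper that hides the temp-file-and-rename workaround behind a transaction-like interface. A hedged sketch (the name `atomic_update` is made up for illustration):

```python
import os
import tempfile
from contextlib import contextmanager

@contextmanager
def atomic_update(path):
    """Emulate the proposed open(O_ATOMIC | O_TRUNC) / write / close
    transaction: on normal exit the target is replaced atomically,
    on any exception it is left untouched."""
    dirname = os.path.dirname(path) or "."
    fd, tmp = tempfile.mkstemp(dir=dirname, prefix=".atomic-")
    f = os.fdopen(fd, "wb")
    try:
        yield f                  # caller write()s zero or more times
        f.flush()
        os.fsync(f.fileno())
        f.close()
        os.rename(tmp, path)     # commit
    except BaseException:
        f.close()
        os.unlink(tmp)           # abort: discard the partial contents
        raise
```

Usage: `with atomic_update(fname) as f: f.write(data)`.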

 I've brought this up on linux-fsdevel and linux-ext4 but they (Ted)
 claim those exceptions aren't really a problem.

 Indeed they are not.  Code has been dealing with them for years.  You

Code has been wrong for years too, based on the recent reports about
file corruption with ext4.

 name the temp file properly, and teach your program to clean old ones up
 *safely* (see vim swap file handling for an example) when it starts.

What about restoring meta-data? File-owner?

 vim is a good example: nobody gets surprised by vim swap-files left over
 when vim/computer crashes. And vim will do something smart with them if
 it finds them in the current directory when it is started.

I'm sure the vim code is far from trivial. I think this complexity is
part of the reason most apps don't bother.

 BTW: safely removing a file is also tricky.  AFAIK, one must open it RW,
 in exclusive mode. stat it by fd and check whether it is what one

Exclusive mode? Linux doesn't know about mandatory locking (AFAIK).

 expects (regular file, ownership).  unlink it by fd.  close the fd.

 Is there a code snippet or lib function that handles this properly?

 I don't know.  I'd be interested in the answer, though :-)

I'll ask glibc.

Olaf


--
Archive: http://lists.debian.org/aanlktikm+dacfnq7lort9vo7p-m-gvn0dgqxup5au...@mail.gmail.com



Re: Safe File Update (atomic)

2010-12-30 Thread Shachar Shemesh

On 30/12/10 13:46, Henrique de Moraes Holschuh wrote:

  Is there a code snippet or lib function that handles this properly?

 I don't know.  I'd be interested in the answer, though :-)

I'm working on one under the MIT license. Will probably release it by
the end of this week. Will also handle copying the permissions over and
following symlinks.


Shachar

--
Shachar Shemesh
Lingnu Open Source Consulting Ltd.
http://www.lingnu.com


--
Archive: http://lists.debian.org/4d1c9d3b.6060...@debian.org



Re: Safe File Update (atomic)

2010-12-30 Thread Olaf van der Spek
On Thu, Dec 30, 2010 at 3:51 PM, Shachar Shemesh shac...@shemesh.biz wrote:
 I'm working on one under the MIT license. Will probably release it by the
 end of this week. Will also handle copying the permissions over and
 following symlinks.

Sounds great!
Got a project page already?
What about file owner? Meta-data (ACL)?

Olaf


-- 
Archive: http://lists.debian.org/aanlktik-o2mu47dfdvm8kedobjfhw7swkxcwy9fwh...@mail.gmail.com



Re: Safe File Update (atomic)

2010-12-30 Thread Mike Hommey
On Thu, Dec 30, 2010 at 03:30:29PM +0100, Olaf van der Spek wrote:
  name the temp file properly, and teach your program to clean old ones up
  *safely* (see vim swap file handling for an example) when it starts.
 
 What about restoring meta-data? File-owner?

owner, permissions, acl, xattrs, and whatever other future stuff can be
stored about files, which then all applications should be made aware of?
Yay for simplicity.

Mike


-- 
Archive: http://lists.debian.org/20101230151011.ga12...@glandium.org



Re: Safe File Update (atomic)

2010-12-30 Thread Shachar Shemesh

On 30/12/10 17:02, Olaf van der Spek wrote:

 On Thu, Dec 30, 2010 at 3:51 PM, Shachar Shemesh shac...@shemesh.biz wrote:

  I'm working on one under the MIT license. Will probably release it by the
  end of this week. Will also handle copying the permissions over and
  following symlinks.

 Sounds great!
 Got a project page already?

No. I was doing it as code to accompany an article on my company's site
about how it should be done. I originally set out to write the article,
and then decided to add code. A good thing, too, as recursively
resolving symbolic links is not trivial. There is an extremely simple
way to do it on Linux, but it will not work on all platforms (the *BSD
platforms, including Mac, do not have /proc by default).

 What about file owner? Meta-data (ACL)?

The current code (I'm still working on it, or I would have released it
already, but it's about 80% done) does copy owner data over (but ignores
failures), but does not handle ACLs. I decided to postpone this
particular hot potato until I can get a chance to see how to do it (i.e.
never had a chance on Linux) AND how to do it in a cross-platform way
(the code is designed to work on any Posix). Pointers/patches once
released are, of course, welcome :-)
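The "extremely simple way" on Linux is presumably reading the /proc/self/fd symlink for an open descriptor, which yields the fully resolved path. This is a guess at the technique, not Shachar's actual code, and it is Linux-only because the BSDs do not mount /proc by default:

```python
import os

def fd_path(fd):
    """Return the fully-resolved path behind an open fd (Linux only):
    /proc/self/fd/N is a symlink to the file the descriptor refers to."""
    return os.readlink("/proc/self/fd/%d" % fd)
```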


Shachar

--
Shachar Shemesh
Lingnu Open Source Consulting Ltd.
http://www.lingnu.com


--
Archive: http://lists.debian.org/4d1ca143.9020...@debian.org



Re: Safe File Update (atomic)

2010-12-30 Thread Olaf van der Spek
On Thu, Dec 30, 2010 at 4:12 PM, Shachar Shemesh shac...@debian.org wrote:
 No. I was doing it as code to accompany an article on my company's site
 about how it should be done. I was originally out to write the article, and
 then decided to add code. A good thing, too, as recursively resolving
 symbolic links is not trivial. There is an extremely simple way to do it on
 Linux, but it will not work on all platforms (the *BSD platforms, including
 Mac, do not have /proc by default).

Depending on /proc is probably not reasonable.
Are you sure it will be atomic? ;)

 What about file owner? Meta-data (ACL)?

 Olaf


 The current code (I'm still working on it, or I would have released it
 already, but it's about 80% done) does copy owner data over (but ignores
 failures), but does not handle ACLs. I decided to postpone this particular

How do you preserve owner (as non-root)?

 hot potato until I can get a chance to see how to do it (i.e. - never had a
 chance on Linux) AND how to do it in a cross-platform way (the code is
 designed to work on any Posix). Pointers/patches once released are, of
 course, welcome :-)

The reason I asked for a kernelland solution is because it's hard if
not impossible to do properly in userland. But some kernel devs (Ted
and others) don't agree. They reason that the desire to preserve all
meta-data isn't reasonable by itself.

Olaf


-- 
Archive: http://lists.debian.org/aanlktik93zn1yjf5xyq_+rhaonrj1bszcafpnmkrt...@mail.gmail.com



Re: Safe File Update (atomic)

2010-12-30 Thread Henrique de Moraes Holschuh
On Thu, 30 Dec 2010, Olaf van der Spek wrote:
 On Thu, Dec 30, 2010 at 12:46 PM, Henrique de Moraes Holschuh
 h...@debian.org wrote:
   write temp file (in same directory as file to be replaced), fsync temp
 
 What if the target name is actually a symlink? To a different volume?

Indeed. You have to check that first, of course :-(  This is about safe
handling of such functions: symlinks always have to be dereferenced and
their target checked.  After that, you operate on the target; if the
symlink changes, your operations will not.
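That dereference-first order can be sketched as follows; the window between realpath() and rename() is exactly the race being accepted here, and the function name is illustrative only:

```python
import os

def replace_following_symlinks(path, data):
    """Resolve symlinks first, then do the temp-file dance next to the
    real target, so the rename stays on the target's volume and the
    symlink itself is never replaced."""
    real = os.path.realpath(path)     # dereference the whole chain once
    tmp = real + ".tmp.%d" % os.getpid()
    fd = os.open(tmp, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o644)
    try:
        os.write(fd, data)
        os.fsync(fd)
    finally:
        os.close(fd)
    os.rename(tmp, real)              # replaces the target, not the link
```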

 What if you're not allowed to create a file in that dir.

You fail the write.  Or the user has to request the unsafe handling
(truncate + write).  Or you have to detect it will happen and switch modes
if you're allowed to.

  If we could use some syscall to make [1] into a simple barrier request
  (guaranteed to degrade to fsync if barriers are not operating), it would
  be better performance-wise.  This is what one should request of libc and
  the kernels with a non-zero chance of getting it implemented (in fact,
  it might even already exist).
 
 My proposal was O_ATOMIC:
 // begin transaction
 open(fname, O_ATOMIC | O_TRUNC);
 write; // 0+ times
 close;
 
 Seems like the ideal API from the app's point of view.

POSIX filesystems do not support it, so glibc would have to do everything
your application would otherwise do to get that atomicity.  I.e. it should
go in a separate lib anyway, and you will have to code for it in the app :(

It is not transparent.  It cannot be.  What about mmap()?  What about
read+write patterns?

At most you could have an open+write+close function that encapsulates most
of the crap, with a few options to tell it what to do if it finds a symlink
or mismatched owner, what to do if it cannot do it in an atomic way, etc.

I suppose one could actually ask for a non-posix interface to do all those
three operations in one syscall, but I don't think the kernel people will
want to implement it.  It would make sense only if object stores become
commonplace (where this thing is likely an object store primitive, anyway).

  I've brought this up on linux-fsdevel and linux-ext4 but they (Ted)
  claim those exceptions aren't really a problem.
 
  Indeed they are not.  Code has been dealing with them for years.  You
 
 Code has been wrong for years to, based on the reason reports about
 file corruption with ext4.

Code written to *deal with files safely* by people who wanted to get it
right and actually checked what needs to be done, has been right for years.
And has piss-poor performance.

Code written by random joe which has no clue about the braindamages of POSIX
and Unix, well... this thread shows how much crap is really needed.

One can, obviously, have most filesystems be super-safe, and create a new
fadvise or something to say this is crap, be unsafe if you can.
Performance will be poor, everything will be safe, and the extra fsyncs()
will not hurt much because the fs would do it anyway.

  name the temp file properly, and teach your program to clean old ones up
  *safely* (see vim swap file handling for an example) when it starts.
 
 What about restoring meta-data? File-owner?

Hmm, yes, more steps if you want to do something like that, as you must do
it with the target open in exclusive mode.  Close the target only after the
rename went OK.

But if the file owner is not yourself, you really should change it, not to
mention you might not want to complete the operation in the first place.

A lib for this is a really good idea :p

  vim is a good example: nobody gets surprised by vim swap-files left over
  when vim/computer crashes. And vim will do something smart with them if
  it finds them in the current directory when it is started.
 
 I'm sure the vim code is far from trivial. I think this complexity is
 part of the reason most apps don't bother.

That I agree with completely.

  BTW: safely removing a file is also tricky.  AFAIK, one must open it RW,
  in exclusive mode. stat it by fd and check whether it is what one
 
 Exclusive mode? Linux doesn't know about mandatory locking (AFAIK).

Yeah... races everywhere...

  expects (regular file, ownership).  unlink it by fd.  close the fd.
 
  Is there a code snippet or lib function that handles this properly?
 
  I don't know.  I'd be interested in the answer, though :-)
 
 I'll ask glibc.

This really should be in a separate lib.  You want it to be usable outside
of glibc systems, and you CAN implement it (slow that it will be) on
anything POSIX.  You need only some help of the kernel to speed it up, and
that has to be detected at compile time (support) and runtime (availability
of the feature) anyway.

-- 
  One disk to rule them all, One disk to find them. One disk to bring
  them all and in the darkness grind them. In the Land of Redmond
  where the shadows lie. -- The Silicon Valley Tarot
  Henrique Holschuh


-- 

Re: Safe File Update (atomic)

2010-12-30 Thread Henrique de Moraes Holschuh
On Thu, 30 Dec 2010, Olaf van der Spek wrote:
 The reason I asked for a kernelland solution is because it's hard if
 not impossible to do properly in userland. But some kernel devs (Ted
 and others) don't agree. They reason that the desire to preserve all
 meta-data isn't reasonable by itself.

It isn't.  And you can do it anyway:

1. open target, keep it open.
2. do the safe open+write dance on the temp target.
3. get metadata from target by fd
4. apply metadata to temp target by fd
5. atomic rename
6. close both fds
7. sync parent dir.
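Read literally, those steps might look like the sketch below. fchown to a different owner fails for unprivileged processes (the crux of the disagreement in this thread); failures here are left to propagate rather than hidden:

```python
import os

def replace_preserving_metadata(path, data):
    """Steps 1-7: copy mode and ownership from the old inode to the
    new one by fd before the atomic rename."""
    target_fd = os.open(path, os.O_RDONLY)          # 1. keep old inode open
    try:
        tmp = path + ".tmp.%d" % os.getpid()
        fd = os.open(tmp, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o600)
        try:
            os.write(fd, data)                      # 2. safe write dance
            st = os.fstat(target_fd)                # 3. metadata by fd
            os.fchmod(fd, st.st_mode & 0o7777)      # 4. apply to temp by fd
            os.fchown(fd, st.st_uid, st.st_gid)     #    (may need privilege)
            os.fsync(fd)
            os.rename(tmp, path)                    # 5. atomic rename
        finally:
            os.close(fd)                            # 6. close both fds
    finally:
        os.close(target_fd)
    dfd = os.open(os.path.dirname(path) or ".", os.O_RDONLY)
    try:
        os.fsync(dfd)                               # 7. sync parent dir
    finally:
        os.close(dfd)
```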

-- 
  One disk to rule them all, One disk to find them. One disk to bring
  them all and in the darkness grind them. In the Land of Redmond
  where the shadows lie. -- The Silicon Valley Tarot
  Henrique Holschuh


-- 
Archive: http://lists.debian.org/20101230152401.gb4...@khazad-dum.debian.net



Re: Safe File Update (atomic)

2010-12-30 Thread Olaf van der Spek
On Thu, Dec 30, 2010 at 4:20 PM, Henrique de Moraes Holschuh
h...@debian.org wrote:
 What if the target name is actually a symlink? To a different volume?

 Indeed. You have to check that first, of course :-(  This is about safe
 handling of such functions, symlinks always have to be derreferenced and
 their target checked.  After that, you operate on the target, if the symlink
 changes, your operations will not.

That's not really atomic.

 What if you're not allowed to create a file in that dir.

 You fail the write.

That's a regression from the non-atomic case.

 Or the user has to request the unsafe handling
 (truncate + write).  Or you have to detect it will happen and switch modes
 if you're allowed to.

  If we could use some syscall to make [1] into a simple barrier request
  (guaranteed to degrade to fsync if barriers are not operating), it would
  be better performance-wise.  This is what one should request of libc and
  the kernels with a non-zero chance of getting it implemented (in fact,
  it might even already exist).

 My proposal was O_ATOMIC:
 // begin transaction
 open(fname, O_ATOMIC | O_TRUNC);
 write; // 0+ times
 close;

 Seems like the ideal API from the app's point of view.

 POSIX filesystems do not support it, so you'd need glibc to do everything

Not yet, but I assume it'll be added when there's enough demand.

 your application would have to get that atomicity.  I.e. it should go in a
 separate lib, anyway, and you will have to code for it in the app :(

Why would it have to go in a separate lib?

 It is not transparent.  It cannot be.  What about mmap()?  What about
 read+write patterns?

They either happen before or after this atomic transaction. Comparable
to the rename workaround.

 At most you could have an open+write+close function that encapsulate most
 of the crap, with a few options to tell it what to do if it finds a symlink
 or mismatched owner, what to do if it cannot do it in an atomic way, etc.

 I suppose one could actually ask for a non-posix interface to do all those
 three operations in one syscall, but I don't think the kernel people will

There's no need for a single syscall.

 want to implement it.  It would make sense only if object stores become
 commonplace (where this thing is likely an object store primitive, anyway).

Nah. Tons of files are written in one go. All could use this atomic flag.

  I've brought this up on linux-fsdevel and linux-ext4 but they (Ted)
  claim those exceptions aren't really a problem.
 
  Indeed they are not.  Code has been dealing with them for years.  You

 Code has been wrong for years to, based on the reason reports about
 file corruption with ext4.

 Code written to *deal with files safely* by people who wanted to get it
 right and actually checked what needs to be done, has been right for years.
 And has piss-poor performance.

Isn't fixing / improving that a good thing?

 Code written by random joe which has no clue about the braindamages of POSIX
 and Unix, well... this thread shows how much crap is really needed.

So you agree that this should be improved?

 One can, obviously, have most filesystems be super-safe, and create a new
 fadvise or something to say this is crap, be unsafe if you can.
 Performance will be poor, everything will be safe, and the extra fsyncs()
 will not hurt much because the fs would do it anyway.

I actually think this can be done with better performance than the
rename workaround.

  name the temp file properly, and teach your program to clean old ones up
  *safely* (see vim swap file handling for an example) when it starts.

 What about restoring meta-data? File-owner?

 Hmm, yes, more steps if you want to do something like that, as you must do
 it with the target open in exclusive mode.  close target only after the
 rename went ok.

 But if the file owner is not yourself, you really should change it, not to
 mention you might not want to complete the operation in the first place.

Why? Of course write access to the file is required.

 I'll ask glibc.

 This really should be in a separate lib.  You want it to be usable outside
 of glibc systems, and you CAN implement it (slow that it will be) on
 anything POSIX.  You need only some help of the kernel to speed it up, and
 that has to be detected at compile time (support) and runtime (availability
 of the feature) anyway.

Olaf


--
To UNSUBSCRIBE, email to debian-devel-requ...@lists.debian.org
with a subject of unsubscribe. Trouble? Contact listmas...@lists.debian.org
Archive: 
http://lists.debian.org/aanlktinhoftnychhjsd6og04jrvyube8ul55szyyl...@mail.gmail.com



Re: Safe File Update (atomic)

2010-12-30 Thread Olaf van der Spek
On Thu, Dec 30, 2010 at 4:24 PM, Henrique de Moraes Holschuh
h...@debian.org wrote:
 On Thu, 30 Dec 2010, Olaf van der Spek wrote:
 The reason I asked for a kernelland solution is because it's hard if
 not impossible to do properly in userland. But some kernel devs (Ted
 and others) don't agree. They reason that the desire to preserve all
 meta-data isn't reasonable by itself.

 It isn't.

Why not?

 And you can do it anyway:

 1. open target, keep it open.
 2. do the safe open+write dance on the temp target.
 3. get metadata from target by fd
 4. apply metadata to temp target by fd
 5. atomic rename
 6. close both fds
 7. sync parent dir.
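
The seven steps above can be sketched in Python, whose os module wraps the same POSIX calls (a sketch, not a hardened implementation: the temp-file name and error handling are simplified assumptions):

```python
import os

def replace_file_preserving_metadata(target, data):
    """Sketch of the seven-step dance above."""
    tmp = target + ".tmp"  # assumed temp name in the same directory
    # 1. open target, keep it open
    target_fd = os.open(target, os.O_RDONLY)
    try:
        # 2. do the safe open+write dance on the temp target
        tmp_fd = os.open(tmp, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o600)
        try:
            os.write(tmp_fd, data)
            os.fsync(tmp_fd)
            # 3. get metadata from target by fd
            st = os.fstat(target_fd)
            # 4. apply metadata to temp target by fd (chown is best
            #    effort: it fails with EPERM for non-root callers)
            os.fchmod(tmp_fd, st.st_mode & 0o7777)
            try:
                os.fchown(tmp_fd, st.st_uid, st.st_gid)
            except PermissionError:
                pass
            # 5. atomic rename
            os.rename(tmp, target)
        finally:
            os.close(tmp_fd)  # 6. close both fds
    finally:
        os.close(target_fd)
    # 7. sync the parent dir so the rename itself survives a crash
    dir_fd = os.open(os.path.dirname(target) or ".", os.O_RDONLY)
    try:
        os.fsync(dir_fd)
    finally:
        os.close(dir_fd)
```

Note the chown step is best effort: as non-root you cannot restore a foreign owner, which is exactly the objection raised in this thread.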

Doesn't work for file-owner.
How does it handle meta-data you don't know about yet?

Olaf





Re: Safe File Update (atomic)

2010-12-30 Thread Shachar Shemesh

On 30/12/10 13:46, Henrique de Moraes Holschuh wrote:




  Is there a code snippet or lib function that handles this properly?

 I don't know.  I'd be interested in the answer, though :-)

I'm working on one under the MIT license. Will probably release it by
the end of this week. Will also handle copying the permissions over and
following symlinks.


Shachar

--
Shachar Shemesh
Lingnu Open Source Consulting Ltd.
http://www.lingnu.com





Re: Safe File Update (atomic)

2010-12-30 Thread Henrique de Moraes Holschuh
On Thu, 30 Dec 2010, Olaf van der Spek wrote:
 On Thu, Dec 30, 2010 at 4:24 PM, Henrique de Moraes Holschuh
 h...@debian.org wrote:
  On Thu, 30 Dec 2010, Olaf van der Spek wrote:
  The reason I asked for a kernelland solution is because it's hard if
  not impossible to do properly in userland. But some kernel devs (Ted
  and others) don't agree. They reason that the desire to preserve all
  meta-data isn't reasonable by itself.
 
  It isn't.
 
 Why not?

You touched it, it is not the same file/inode anymore.

 How does it handle meta-data you don't know about yet?

It doesn't.  You need a copy inode without the file data filesystem
interface to be able to do that in the first place.  It might exist, but I
never heard of it.

-- 
  One disk to rule them all, One disk to find them. One disk to bring
  them all and in the darkness grind them. In the Land of Redmond
  where the shadows lie. -- The Silicon Valley Tarot
  Henrique Holschuh





Re: Safe File Update (atomic)

2010-12-30 Thread Olaf van der Spek
On Thu, Dec 30, 2010 at 6:48 PM, Henrique de Moraes Holschuh
h...@debian.org wrote:
 Why not?

 You touched it, it is not the same file/inode anymore.

That's again a regression from the non-atomic case.

 How does it handle meta-data you don't know about yet?

 It doesn't.  You need a copy inode without the file data filesystem
 interface to be able to do that in the first place.  It might exist, but I
 never heard of it.

You wouldn't need that with O_ATOMIC.

Olaf





Re: Safe File Update (atomic)

2010-12-30 Thread Shachar Shemesh

On 30/12/10 19:48, Henrique de Moraes Holschuh wrote:


 It doesn't.  You need a copy inode without the file data filesystem
 interface to be able to do that in the first place.  It might exist, but I
 never heard of it.

If my (extremely leaky) memory serves me right, Windows has it. It's
called delete and then rename. It is not atomic (since when does Windows
care about not breaking stuff), but it does exactly that.


If you delete a file and quickly (yes, this feature is time based) 
rename a different file to the same name, the new file will receive all 
metadata information the old file had (including owner, permissions etc.)


Just thought I'd share this little nugget to show you how much worse
non-POSIX has it.


Shachar

--
Shachar Shemesh
Lingnu Open Source Consulting Ltd.
http://www.lingnu.com





Re: Safe File Update (atomic)

2010-12-30 Thread Shachar Shemesh

On 30/12/10 17:17, Olaf van der Spek wrote:

 On Thu, Dec 30, 2010 at 4:12 PM, Shachar Shemesh shac...@debian.org wrote:
  No. I was doing it as code to accompany an article on my company's site
  about how it should be done. I was originally out to write the article, and
  then decided to add code. A good thing, too, as recursively resolving
  symbolic links is not trivial. There is an extremely simple way to do it on
  Linux, but it will not work on all platforms (the *BSD platforms, including
  Mac, do not have /proc by default).

 Depending on /proc is probably not reasonable.
 Are you sure it will be atomic? ;)

open old file, get fd (we'll assume it's 5). Do readlink on
/proc/self/fd/5, and get the file's real path. Do everything in said path.
It's atomic, in the sense that the determining point in time is the
point at which you opened the old file.
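
A minimal sketch of that trick (Linux-specific, since it assumes /proc is mounted; the function name is illustrative):

```python
import os

def path_of_open_fd(fd):
    # readlink on /proc/self/fd/<n> yields the current real path of an
    # already-open descriptor (Linux-only; the *BSD platforms, including
    # Mac, lack /proc by default, as noted above)
    return os.readlink("/proc/self/fd/%d" % fd)
```

The determining point in time is the open(); later operations against the returned path happen on whatever the descriptor referred to at that moment.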


 How do you preserve owner (as non-root)?

I thought I answered that. Best effort. You perform the chown, but do
not bother with the return code. If it succeeded, great. If not, well,
you did your best.


 The reason I asked for a kernelland solution is because it's hard if
 not impossible to do properly in userland. But some kernel devs (Ted
 and others) don't agree. They reason that the desire to preserve all
 meta-data isn't reasonable by itself.

I'm with Henrique on that one. I am more concerned with the amount of
non-POSIX code that needs to go into this than preserving all attributes.


Shachar

--
Shachar Shemesh
Lingnu Open Source Consulting Ltd.
http://www.lingnu.com





Re: Safe File Update (atomic)

2010-12-30 Thread Olaf van der Spek
On Thu, Dec 30, 2010 at 7:15 PM, Shachar Shemesh shac...@debian.org wrote:
 If my (extremely leaky) memory serves me right, Windows has it. It's called
 delete and then rename. It is not atomic (since when do Windows care about
 not breaking stuff), but it does exactly that.

 If you delete a file and quickly (yes, this feature is time based) rename a
 different file to the same name, the new file will receive all metadata
 information the old file had (including owner, permissions etc.)

 Just thought I'd share this little nugget to show you how much worse
 non-posix has it.

You're kidding me. Got any source to back this up?

Olaf





Re: Safe File Update (atomic)

2010-12-30 Thread Olaf van der Spek
On Thu, Dec 30, 2010 at 7:20 PM, Shachar Shemesh shac...@debian.org wrote:
  Depending on /proc is probably not reasonable.
  Are you sure it will be atomic? ;)

 open old file, get fd (we'll assume it's 5). Do readlink on /proc/self/fd/5,
 and get file's real path. Do everything in said path. It's atomic, in the
 sense that the determining point in time is the point in which you opened
 the old file.

  How do you preserve owner (as non-root)?

 I thought I answered that. Best effort. You perform the chown, but do not
 bother with the return code. If it succeeded, great. If not, well, you did
 your best.

Ah. Another regression.


  The reason I asked for a kernelland solution is because it's hard if
  not impossible to do properly in userland. But some kernel devs (Ted
  and others) don't agree. They reason that the desire to preserve all
  meta-data isn't reasonable by itself.

 I'm with Henrique on that one. I am more concerned with the amount of
 non-Posix code that needs to go into this than preserving all attributes.

With kernel support you would only need a single non-POSIX flag.
Please be sure to document all assumptions / limitations of your variant.

Olaf





Re: Safe File Update (atomic)

2010-12-30 Thread Ben Hutchings
On Thu, 2010-12-30 at 19:29 +0100, Olaf van der Spek wrote:
 On Thu, Dec 30, 2010 at 7:15 PM, Shachar Shemesh shac...@debian.org wrote:
  If my (extremely leaky) memory serves me right, Windows has it. It's called
  delete and then rename. It is not atomic (since when do Windows care about
  not breaking stuff), but it does exactly that.
 
  If you delete a file and quickly (yes, this feature is time based) rename a
  different file to the same name, the new file will receive all metadata
  information the old file had (including owner, permissions etc.)
 
  Just thought I'd share this little nugget to show you how much worse
  non-posix has it.
 
 You're kidding me. Got any source to back this up?

http://support.microsoft.com/?kbid=172190

Ben.

-- 
Ben Hutchings
Once a job is fouled up, anything done to improve it makes it worse.




Re: Safe File Update (atomic)

2010-12-30 Thread Olaf van der Spek
On Thu, Dec 30, 2010 at 7:46 PM, Ben Hutchings b...@decadent.org.uk wrote:
 You're kidding me. Got any source to back this up?

 http://support.microsoft.com/?kbid=172190

Interesting. Although no longer available on Vista / 7.

Olaf





Re: Safe File Update (atomic)

2010-12-30 Thread Henrique de Moraes Holschuh
On Thu, 30 Dec 2010, Henrique de Moraes Holschuh wrote:
 BTW: safely removing a file is also tricky.  AFAIK, one must open it RW,
 in exclusive mode. stat it by fd and check whether it is what one
 expects (regular file, ownership).  unlink it by fd.  close the fd.

Eh, as it was pointed to me by private mail, this is obviously a load of
crap :p  There is no unlink by fd.  Sorry about that.

The attacks here are races by messing with intermediate path components,
which are either not worth bothering with, or have to be avoided in a
much more convoluted manner.

-- 
  One disk to rule them all, One disk to find them. One disk to bring
  them all and in the darkness grind them. In the Land of Redmond
  where the shadows lie. -- The Silicon Valley Tarot
  Henrique Holschuh





Re: Safe File Update (atomic)

2010-12-30 Thread Shachar Shemesh

On 30/12/10 17:02, Olaf van der Spek wrote:

 Got a project page already?

Watch this space. Actual code coming soon(tm).

https://github.com/Shachar/safewrite

Shachar

--
Shachar Shemesh
Lingnu Open Source Consulting Ltd.
http://www.lingnu.com





Safe File Update (atomic)

2010-12-29 Thread Olaf van der Spek
Since the introduction of ext4, some apps/users have had issues with
file corruption after a system crash. It's not a bug in the FS AFAIK
and it's not exclusive to ext4.
Writing a temp file, fsync, rename is often proposed. However, the
durable aspect of fsync isn't always required and this way has other
issues, like resetting file owner, maybe losing meta-data, requiring
permission to create the temp file and having the temp file visible
(shortly, or permanently after a crash).
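
For reference, the commonly proposed workaround looks roughly like this in Python (a sketch; names are illustrative, and it exhibits exactly the drawbacks listed above):

```python
import os, tempfile

def write_file_atomic(path, data):
    # write temp file, fsync, rename: the proposed workaround, with the
    # drawbacks noted above (owner reset, other metadata possibly lost,
    # temp file briefly visible in the directory)
    d = os.path.dirname(path) or "."
    fd, tmp = tempfile.mkstemp(dir=d, prefix=os.path.basename(path) + ".")
    try:
        os.write(fd, data)
        os.fsync(fd)          # make the data durable before the rename
    finally:
        os.close(fd)
    os.rename(tmp, path)      # atomically replaces any existing file
    dfd = os.open(d, os.O_RDONLY)
    try:
        os.fsync(dfd)         # make the rename itself durable
    finally:
        os.close(dfd)
```

Readers either see the old file or the complete new one, never a partial write; but the file they see is a new inode, with metadata reset.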

I've brought this up on linux-fsdevel and linux-ext4 but they (Ted)
claim those exceptions aren't really a problem.
Is there a code snippet or lib function that handles this properly?
What do you think about the exceptions?

Olaf

