Re: [PATCH] enable core.fsyncObjectFiles by default

2018-01-23 Thread Jeff King
On Tue, Jan 23, 2018 at 12:45:53AM -0500, Theodore Ts'o wrote: > What I was thinking about instead is that in cases where we know we > are likely to be creating a large number of loose objects (whether > they referenced or not), in a world where we will be calling fsync(2) > after every single loo

Re: [PATCH] enable core.fsyncObjectFiles by default

2018-01-22 Thread Theodore Ts'o
On Mon, Jan 22, 2018 at 07:47:10PM -0500, Jeff King wrote: > > I think Ævar is talking about the case of: > > 1. You make 100 objects that aren't referenced. They're loose. > > 2. You run git-gc. They're still too recent to be deleted. > > Right now those recent loose objects sit loose, and

Re: [PATCH] enable core.fsyncObjectFiles by default

2018-01-22 Thread Jeff King
On Mon, Jan 22, 2018 at 01:09:03PM -0500, Theodore Ts'o wrote: > > Wouldn't it also make gc pruning more expensive? Now you can repack > > regularly and loose objects will be left out of the pack, and then just > > rm'd, whereas now it would entail creating new packs (unless the whole > > pack was

Re: [PATCH] enable core.fsyncObjectFiles by default

2018-01-22 Thread Jeff King
On Mon, Jan 22, 2018 at 04:09:23PM +0100, Ævar Arnfjörð Bjarmason wrote: > > Yes, a "cruft pack" that holds unreachable object has been discussed > > a few times recently on list, and I do agree that it is a desirable > > thing to have in the longer run. > > > > What's tricky is to devise a way to

Re: [PATCH] enable core.fsyncObjectFiles by default

2018-01-22 Thread Theodore Ts'o
On Mon, Jan 22, 2018 at 04:09:23PM +0100, Ævar Arnfjörð Bjarmason wrote: > > What's tricky is to devise a way to allow us to salvage objects that > > are placed in a cruft pack because they are accessed recently, > > proving themselves to be no longer crufts. It could be that a good > > way to res

Re: [PATCH] enable core.fsyncObjectFiles by default

2018-01-22 Thread Ævar Arnfjörð Bjarmason
On Sat, Jan 20 2018, Junio C. Hamano jotted: > Theodore Ts'o writes: > >> I've never been fond of the "git repack -A" behavior >> where it can generate huge numbers of loose files. I'd much prefer it >> if the other objects ended up in a separate pack file, and then some >> other provisio

Re: [PATCH] enable core.fsyncObjectFiles by default

2018-01-21 Thread Chris Mason
On 01/18/2018 11:27 AM, Christoph Hellwig wrote: [adding Chris to the Cc list - this is about the awful ext3 data=ordered behavior of syncing the whole file system data and metadata on each fsync] On Wed, Jan 17, 2018 at 03:57:53PM -0800, Linus Torvalds wrote: On Wed, Jan 17, 2018 at 3:52 PM, T

Re: [PATCH] enable core.fsyncObjectFiles by default

2018-01-20 Thread Junio C Hamano
Theodore Ts'o writes: > I've never been fond of the "git repack -A" behavior > where it can generate huge numbers of loose files. I'd much prefer it > if the other objects ended up in a separate pack file, and then some > other provision made for nuking that pack file some time later

Re: [PATCH] enable core.fsyncObjectFiles by default

2018-01-20 Thread Theodore Ts'o
On Fri, Jan 19, 2018 at 11:08:46AM -0800, Junio C Hamano wrote: > So..., is it fair to say that the one you sent in > > https://public-inbox.org/git/20180117193510.ga30...@lst.de/ > > is the best variant we have seen in this thread so far? I'll keep > that in my inbox so that I do not forget,

Re: [PATCH] enable core.fsyncObjectFiles by default

2018-01-19 Thread Junio C Hamano
Christoph Hellwig writes: > [adding Chris to the Cc list - this is about the awful ext3 data=ordered > behavior of syncing the whole file system data and metadata on each > fsync] > > On Wed, Jan 17, 2018 at 03:57:53PM -0800, Linus Torvalds wrote: >> On Wed, Jan 17, 2018 at 3:52 PM, Theodore Ts'o

Re: [PATCH] enable core.fsyncObjectFiles by default

2018-01-18 Thread Christoph Hellwig
[adding Chris to the Cc list - this is about the awful ext3 data=ordered behavior of syncing the whole file system data and metadata on each fsync] On Wed, Jan 17, 2018 at 03:57:53PM -0800, Linus Torvalds wrote: > On Wed, Jan 17, 2018 at 3:52 PM, Theodore Ts'o wrote: > > > > Well, let's be fair;

Re: [PATCH] enable core.fsyncObjectFiles by default

2018-01-17 Thread Linus Torvalds
On Wed, Jan 17, 2018 at 3:52 PM, Theodore Ts'o wrote: > > Well, let's be fair; this is something *ext3* got wrong, and it was > the default file system back them. I'm pretty sure reiserfs and btrfs did too.. Linus

Re: [PATCH] enable core.fsyncObjectFiles by default

2018-01-17 Thread Theodore Ts'o
On Wed, Jan 17, 2018 at 02:07:22PM -0800, Linus Torvalds wrote: > > Now re-do the test while another process writes to a totally unrelated > a huge file (say, do a ISO file copy or something). > > That was the thing that several filesystems get completely and > horribly wrong. Generally _particul

Re: [PATCH] enable core.fsyncObjectFiles by default

2018-01-17 Thread Linus Torvalds
On Wed, Jan 17, 2018 at 3:16 PM, Ævar Arnfjörð Bjarmason wrote: > > Or does overall FS activity and raw throughput (e.g. with an ISO copy) > matter more than general FS contention? Traditionally, yes. Also note that none of this is about "throughput". It's about waiting for a second or two when

Re: [PATCH] enable core.fsyncObjectFiles by default

2018-01-17 Thread Ævar Arnfjörð Bjarmason
On Wed, Jan 17 2018, Linus Torvalds jotted: > On Wed, Jan 17, 2018 at 1:44 PM, Ævar Arnfjörð Bjarmason > wrote: >> >> I ran a small test myself on CentOS 7 (3.10) with ext4 data=ordered >> on the tests I thought might do a lot of loose object writes: >> >> $ GIT_PERF_REPEAT_COUNT=1

Re: [PATCH] enable core.fsyncObjectFiles by default

2018-01-17 Thread Linus Torvalds
On Wed, Jan 17, 2018 at 2:07 PM, Linus Torvalds wrote: > > The original git design was very much to write each object file > without any syncing, because they don't matter since a new object file > - by definition - isn't really reachable. Then sync before writing the > index file or a new ref. .

Re: [PATCH] enable core.fsyncObjectFiles by default

2018-01-17 Thread Linus Torvalds
On Wed, Jan 17, 2018 at 1:44 PM, Ævar Arnfjörð Bjarmason wrote: > > I ran a small test myself on CentOS 7 (3.10) with ext4 data=ordered > on the tests I thought might do a lot of loose object writes: > > $ GIT_PERF_REPEAT_COUNT=10 GIT_PERF_LARGE_REPO=~/g/linux > GIT_PERF_MAKE_OPTS="

Re: [PATCH] enable core.fsyncObjectFiles by default

2018-01-17 Thread Ævar Arnfjörð Bjarmason
On Wed, Jan 17 2018, Junio C. Hamano jotted: > Christoph Hellwig writes: > >> fsync is required for data integrity as there is no gurantee that >> data makes it to disk at any specified time without it. Even for >> ext3 with data=ordered mode the file system will only commit all >> data at some

Re: [PATCH] enable core.fsyncObjectFiles by default

2018-01-17 Thread Christoph Hellwig
On Wed, Jan 17, 2018 at 03:55:09PM -0500, Jeff King wrote: > I'm definitely sympathetic, and I've contemplated a patch like this a > few times. But I'm not sure we're "safe by default" here after this > patch. In particular: > > 1. This covers only loose objects. We generally sync pack writes >

Re: [PATCH] enable core.fsyncObjectFiles by default

2018-01-17 Thread Jeff King
On Wed, Jan 17, 2018 at 07:48:28PM +0100, Christoph Hellwig wrote: > fsync is required for data integrity as there is no gurantee that > data makes it to disk at any specified time without it. Even for > ext3 with data=ordered mode the file system will only commit all > data at some point in time

Re: [PATCH] enable core.fsyncObjectFiles by default

2018-01-17 Thread Andreas Schwab
On Jan 17 2018, Christoph Hellwig wrote: > I've lost data on development machines with various times countless > times due to the lack of this option, and now lost trees on a Too many times. :-) Andreas. -- Andreas Schwab, sch...@linux-m68k.org GPG Key fingerprint = 58CA 54C7 6D53 942B 1756

Re: [PATCH] enable core.fsyncObjectFiles by default

2018-01-17 Thread Christoph Hellwig
On Wed, Jan 17, 2018 at 11:37:31AM -0800, Matthew Wilcox wrote: > How about this instead? > > This option is enabled by default and ensures data integrity by calling > fsync after writing object files. It is not necessary on filesystems > which journal data writes, but is still necessary on files

Re: [PATCH] enable core.fsyncObjectFiles by default

2018-01-17 Thread Matthew Wilcox
On Wed, Jan 17, 2018 at 11:04:32AM -0800, Junio C Hamano wrote: > Christoph Hellwig writes: > > @@ -866,10 +866,8 @@ core.whitespace:: > > core.fsyncObjectFiles:: > > This boolean will enable 'fsync()' when writing object files. > > + > > -This is a total waste of time and effort on a filesy

Re: [PATCH] enable core.fsyncObjectFiles by default

2018-01-17 Thread Christoph Hellwig
On Wed, Jan 17, 2018 at 11:04:32AM -0800, Junio C Hamano wrote: > I am somewhat sympathetic to the desire to flip the default to > "safe" and allow those who know they are already safe to tweak the > knob for performance, and it also makes sense to document that the > default is "true" here. But I

Re: [PATCH] enable core.fsyncObjectFiles by default

2018-01-17 Thread Junio C Hamano
Christoph Hellwig writes: > fsync is required for data integrity as there is no gurantee that > data makes it to disk at any specified time without it. Even for > ext3 with data=ordered mode the file system will only commit all > data at some point in time that is not guaranteed. It comes from

[PATCH] enable core.fsyncObjectFiles by default

2018-01-17 Thread Christoph Hellwig
fsync is required for data integrity as there is no gurantee that data makes it to disk at any specified time without it. Even for ext3 with data=ordered mode the file system will only commit all data at some point in time that is not guaranteed. I've lost data on development machines with variou

Re: [PATCH] Enable core.fsyncObjectFiles by default

2015-06-24 Thread Theodore Ts'o
On Tue, Jun 23, 2015 at 10:32:08PM -0700, Junio C Hamano wrote: > > Regarding loose object files, given that we write to a temporary, > optionally fsync, close and then move to the final name, would we > still see partially written file if we omit the fsync, or would the > corruption be limited to

Re: [PATCH] Enable core.fsyncObjectFiles by default

2015-06-23 Thread Junio C Hamano
Theodore Ts'o writes: > The main issue is that non-expert users might not realize that they > really need to run "git fsck" after a crash; otherwise, what might > happen is that although git is only appending, that you might have > some zero-length (or partially written) git object or pack files,

Re: [PATCH] Enable core.fsyncObjectFiles by default

2015-06-23 Thread Junio C Hamano
Jeff King writes: > I had always assumed this was fine on ext4 with data=ordered (i.e., > either the rename and its pointed-to content will go through, or not; so > you either get your update or the old state, but not a garbage or empty > file). But it sounds from what Ted wrote in: > > http://

Re: [PATCH] Enable core.fsyncObjectFiles by default

2015-06-23 Thread Jeff King
On Tue, Jun 23, 2015 at 02:57:23PM -0700, Stefan Beller wrote: > Linus Torvalds started a discussion[1] if we want to play rather safe > than use defaults which make sense only for the most power users of Git: > > > So git is "safe" in the sense that you won't really lose any data, > > but you ma

Re: [PATCH] Enable core.fsyncObjectFiles by default

2015-06-23 Thread Duy Nguyen
On Wed, Jun 24, 2015 at 4:57 AM, Stefan Beller wrote: > Linus Torvalds started a discussion[1] if we want to play rather safe > than use defaults which make sense only for the most power users of Git: > >> So git is "safe" in the sense that you won't really lose any data, >> but you may well be in

Re: [PATCH] Enable core.fsyncObjectFiles by default

2015-06-23 Thread Theodore Ts'o
The main issue is that non-expert users might not realize that they really need to run "git fsck" after a crash; otherwise, what might happen is that although git is only appending, that you might have some zero-length (or partially written) git object or pack files, and while you might not notice

Re: [PATCH] Enable core.fsyncObjectFiles by default

2015-06-23 Thread Junio C Hamano
Stefan Beller writes: > Linus Torvalds started a discussion[1] if we want to play rather safe > than use defaults which make sense only for the most power users of Git: > >> So git is "safe" in the sense that you won't really lose any data, >> but you may well be inconvenienced. The "fsync each

[PATCH] Enable core.fsyncObjectFiles by default

2015-06-23 Thread Stefan Beller
Linus Torvalds started a discussion[1] if we want to play rather safe than use defaults which make sense only for the most power users of Git: > So git is "safe" in the sense that you won't really lose any data, > but you may well be inconvenienced. The "fsync each object" config > option is ther