On Tue, Jan 23, 2018 at 12:45:53AM -0500, Theodore Ts'o wrote:
> What I was thinking about instead is that in cases where we know we
> are likely to be creating a large number of loose objects (whether
> they are referenced or not), in a world where we will be calling fsync(2)
> after every single loose object
On Mon, Jan 22, 2018 at 07:47:10PM -0500, Jeff King wrote:
>
> I think Ævar is talking about the case of:
>
> 1. You make 100 objects that aren't referenced. They're loose.
>
> 2. You run git-gc. They're still too recent to be deleted.
>
> Right now those recent loose objects sit loose, and
On Mon, Jan 22, 2018 at 01:09:03PM -0500, Theodore Ts'o wrote:
> > Wouldn't it also make gc pruning more expensive? Now you can repack
> > regularly and loose objects will be left out of the pack, and then just
> > rm'd, whereas then it would entail creating new packs (unless the whole
> > pack was
On Mon, Jan 22, 2018 at 04:09:23PM +0100, Ævar Arnfjörð Bjarmason wrote:
> > Yes, a "cruft pack" that holds unreachable object has been discussed
> > a few times recently on list, and I do agree that it is a desirable
> > thing to have in the longer run.
> >
> > What's tricky is to devise a way to
On Mon, Jan 22, 2018 at 04:09:23PM +0100, Ævar Arnfjörð Bjarmason wrote:
> > What's tricky is to devise a way to allow us to salvage objects that
> > are placed in a cruft pack because they are accessed recently,
> > proving themselves to be no longer crufts. It could be that a good
> > way to res
On Sat, Jan 20 2018, Junio C. Hamano jotted:
> Theodore Ts'o writes:
>
>> I've never been fond of the "git repack -A" behavior
>> where it can generate huge numbers of loose files. I'd much prefer it
>> if the other objects ended up in a separate pack file, and then some
>> other provision made for nuking that pack file some time later
On 01/18/2018 11:27 AM, Christoph Hellwig wrote:
[adding Chris to the Cc list - this is about the awful ext3 data=ordered
behavior of syncing the whole file system data and metadata on each
fsync]
On Wed, Jan 17, 2018 at 03:57:53PM -0800, Linus Torvalds wrote:
On Wed, Jan 17, 2018 at 3:52 PM, Theodore Ts'o wrote:
Theodore Ts'o writes:
> I've never been fond of the "git repack -A" behavior
> where it can generate huge numbers of loose files. I'd much prefer it
> if the other objects ended up in a separate pack file, and then some
> other provision made for nuking that pack file some time later
On Fri, Jan 19, 2018 at 11:08:46AM -0800, Junio C Hamano wrote:
> So..., is it fair to say that the one you sent in
>
> https://public-inbox.org/git/20180117193510.ga30...@lst.de/
>
> is the best variant we have seen in this thread so far? I'll keep
> that in my inbox so that I do not forget,
Christoph Hellwig writes:
> [adding Chris to the Cc list - this is about the awful ext3 data=ordered
> behavior of syncing the whole file system data and metadata on each
> fsync]
>
> On Wed, Jan 17, 2018 at 03:57:53PM -0800, Linus Torvalds wrote:
>> On Wed, Jan 17, 2018 at 3:52 PM, Theodore Ts'o
[adding Chris to the Cc list - this is about the awful ext3 data=ordered
behavior of syncing the whole file system data and metadata on each
fsync]
On Wed, Jan 17, 2018 at 03:57:53PM -0800, Linus Torvalds wrote:
> On Wed, Jan 17, 2018 at 3:52 PM, Theodore Ts'o wrote:
> >
> > Well, let's be fair;
On Wed, Jan 17, 2018 at 3:52 PM, Theodore Ts'o wrote:
>
> Well, let's be fair; this is something *ext3* got wrong, and it was
> the default file system back then.
I'm pretty sure reiserfs and btrfs did too..
Linus
On Wed, Jan 17, 2018 at 02:07:22PM -0800, Linus Torvalds wrote:
>
> Now re-do the test while another process writes to a totally unrelated
> a huge file (say, do a ISO file copy or something).
>
> That was the thing that several filesystems get completely and
> horribly wrong. Generally _particul
On Wed, Jan 17, 2018 at 3:16 PM, Ævar Arnfjörð Bjarmason
wrote:
>
> Or does overall FS activity and raw throughput (e.g. with an ISO copy)
> matter more than general FS contention?
Traditionally, yes.
Also note that none of this is about "throughput". It's about waiting
for a second or two when
On Wed, Jan 17 2018, Linus Torvalds jotted:
> On Wed, Jan 17, 2018 at 1:44 PM, Ævar Arnfjörð Bjarmason
> wrote:
>>
>> I ran a small test myself on CentOS 7 (3.10) with ext4 data=ordered
>> on the tests I thought might do a lot of loose object writes:
>>
>> $ GIT_PERF_REPEAT_COUNT=10 GIT_PERF_LARGE_REPO=~/g/linux
On Wed, Jan 17, 2018 at 2:07 PM, Linus Torvalds
wrote:
>
> The original git design was very much to write each object file
> without any syncing, because they don't matter since a new object file
> - by definition - isn't really reachable. Then sync before writing the
> index file or a new ref.
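The design Linus describes can be sketched in C. This is a minimal illustration, not git's actual code: the function names and paths here are invented, and a real implementation would also sync the object data and the containing directories before the ref update becomes durable.

```c
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/* Illustrative sketch only (names invented, not git's code).
 * A loose object is written with no fsync: until a ref points at
 * it, losing it in a crash is harmless by definition. */
static int write_object_unsynced(const char *path, const char *data)
{
    int fd = open(path, O_WRONLY | O_CREAT, 0444);
    if (fd < 0)
        return -1;
    ssize_t n = write(fd, data, strlen(data));
    close(fd);               /* deliberately no fsync here */
    return n == (ssize_t)strlen(data) ? 0 : -1;
}

/* The sync happens once, when the new ref is written.  Only after
 * fsync succeeds does the new state become "reachable"; a full
 * implementation would flush the objects first as well. */
static int write_ref_synced(const char *path, const char *hex)
{
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0)
        return -1;
    if (write(fd, hex, strlen(hex)) != (ssize_t)strlen(hex) ||
        fsync(fd) != 0) {
        close(fd);
        return -1;
    }
    return close(fd);
}
```

The point of the split is that many object writes amortize into a single flush at ref-update time, instead of one fsync per object file.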
On Wed, Jan 17, 2018 at 1:44 PM, Ævar Arnfjörð Bjarmason
wrote:
>
> I ran a small test myself on CentOS 7 (3.10) with ext4 data=ordered
> on the tests I thought might do a lot of loose object writes:
>
> $ GIT_PERF_REPEAT_COUNT=10 GIT_PERF_LARGE_REPO=~/g/linux
> GIT_PERF_MAKE_OPTS="
On Wed, Jan 17 2018, Junio C. Hamano jotted:
> Christoph Hellwig writes:
>
>> fsync is required for data integrity as there is no guarantee that
>> data makes it to disk at any specified time without it. Even for
>> ext3 with data=ordered mode the file system will only commit all
>> data at some point in time that is not guaranteed.
On Wed, Jan 17, 2018 at 03:55:09PM -0500, Jeff King wrote:
> I'm definitely sympathetic, and I've contemplated a patch like this a
> few times. But I'm not sure we're "safe by default" here after this
> patch. In particular:
>
> 1. This covers only loose objects. We generally sync pack writes
>
On Wed, Jan 17, 2018 at 07:48:28PM +0100, Christoph Hellwig wrote:
> fsync is required for data integrity as there is no guarantee that
> data makes it to disk at any specified time without it. Even for
> ext3 with data=ordered mode the file system will only commit all
> data at some point in time that is not guaranteed.
On Jan 17 2018, Christoph Hellwig wrote:
> I've lost data on development machines with various times countless
> times due to the lack of this option, and now lost trees on a
Too many times. :-)
Andreas.
--
Andreas Schwab, sch...@linux-m68k.org
GPG Key fingerprint = 58CA 54C7 6D53 942B 1756
On Wed, Jan 17, 2018 at 11:37:31AM -0800, Matthew Wilcox wrote:
> How about this instead?
>
> This option is enabled by default and ensures data integrity by calling
> fsync after writing object files. It is not necessary on filesystems
> which journal data writes, but is still necessary on filesystems
On Wed, Jan 17, 2018 at 11:04:32AM -0800, Junio C Hamano wrote:
> Christoph Hellwig writes:
> > @@ -866,10 +866,8 @@ core.whitespace::
> > core.fsyncObjectFiles::
> > This boolean will enable 'fsync()' when writing object files.
> > +
> > -This is a total waste of time and effort on a filesystem
On Wed, Jan 17, 2018 at 11:04:32AM -0800, Junio C Hamano wrote:
> I am somewhat sympathetic to the desire to flip the default to
> "safe" and allow those who know they are already safe to tweak the
> knob for performance, and it also makes sense to document that the
> default is "true" here. But I
Christoph Hellwig writes:
> fsync is required for data integrity as there is no guarantee that
> data makes it to disk at any specified time without it. Even for
> ext3 with data=ordered mode the file system will only commit all
> data at some point in time that is not guaranteed.
It comes from
fsync is required for data integrity as there is no guarantee that
data makes it to disk at any specified time without it. Even for
ext3 with data=ordered mode the file system will only commit all
data at some point in time that is not guaranteed.
I've lost data on development machines with various times countless
times due to the lack of this option, and now lost trees on a
On Tue, Jun 23, 2015 at 10:32:08PM -0700, Junio C Hamano wrote:
>
> Regarding loose object files, given that we write to a temporary,
> optionally fsync, close and then move to the final name, would we
> still see partially written file if we omit the fsync, or would the
> corruption be limited to
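Junio's description of the loose-object write path maps onto a small, self-contained sketch. This is an assumed-names illustration, not git's real code (which lives elsewhere and differs in detail); it shows the write-temporary, optional-fsync, close, rename sequence being discussed:

```c
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/* Minimal sketch of the write path under discussion: write to a
 * temporary, optionally fsync, close, then rename into place.
 * With the fsync, a crash leaves either the old state or the
 * complete new file; without it, some filesystems may expose a
 * zero-length or partially written file after the rename. */
static int write_file_atomic(const char *path, const char *buf,
                             size_t len, int do_fsync)
{
    char tmp[4096];
    snprintf(tmp, sizeof(tmp), "%s.tmp", path);

    int fd = open(tmp, O_WRONLY | O_CREAT | O_EXCL, 0444);
    if (fd < 0)
        return -1;
    if (write(fd, buf, len) != (ssize_t)len ||
        (do_fsync && fsync(fd) != 0)) {
        close(fd);
        unlink(tmp);
        return -1;
    }
    if (close(fd) != 0 || rename(tmp, path) != 0) {
        unlink(tmp);
        return -1;
    }
    return 0;
}
```

The rename makes the *name* transition atomic, but it says nothing about when the *data* reaches disk; that ordering is exactly what the fsync (or a journaling mode like data=ordered) is supposed to provide.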
Theodore Ts'o writes:
> The main issue is that non-expert users might not realize that they
> really need to run "git fsck" after a crash; otherwise, what might
> happen is that although git is only appending, that you might have
> some zero-length (or partially written) git object or pack files,
Jeff King writes:
> I had always assumed this was fine on ext4 with data=ordered (i.e.,
> either the rename and its pointed-to content will go through, or not; so
> you either get your update or the old state, but not a garbage or empty
> file). But it sounds from what Ted wrote in:
>
> http://
On Tue, Jun 23, 2015 at 02:57:23PM -0700, Stefan Beller wrote:
> Linus Torvalds started a discussion[1] if we want to play rather safe
> than use defaults which make sense only for the most power users of Git:
>
> > So git is "safe" in the sense that you won't really lose any data,
> > but you may well be inconvenienced.
On Wed, Jun 24, 2015 at 4:57 AM, Stefan Beller wrote:
> Linus Torvalds started a discussion[1] if we want to play rather safe
> than use defaults which make sense only for the most power users of Git:
>
>> So git is "safe" in the sense that you won't really lose any data,
>> but you may well be inconvenienced.
The main issue is that non-expert users might not realize that they
really need to run "git fsck" after a crash; otherwise, what might
happen is that although git is only appending, that you might have
some zero-length (or partially written) git object or pack files, and
while you might not notice
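Ted's point about zero-length or partially written files after a crash can be illustrated with a crude sweep over a directory. This is a toy diagnostic under assumed paths, and emphatically not a replacement for `git fsck`, which actually verifies object contents and connectivity:

```c
#include <dirent.h>
#include <stdio.h>
#include <sys/stat.h>

/* Crude post-crash sanity sweep (illustrative only, NOT a
 * substitute for `git fsck`): count zero-length regular files in
 * a directory, since a crash without fsync can leave truncated or
 * empty object files behind.  Returns -1 if the directory cannot
 * be opened. */
static int count_empty_files(const char *dir)
{
    DIR *d = opendir(dir);
    if (!d)
        return -1;

    int empties = 0;
    struct dirent *de;
    while ((de = readdir(d)) != NULL) {
        if (de->d_name[0] == '.')
            continue;       /* skip ".", "..", and dotfiles */
        char path[4096];
        snprintf(path, sizeof(path), "%s/%s", dir, de->d_name);
        struct stat st;
        if (stat(path, &st) == 0 && S_ISREG(st.st_mode) &&
            st.st_size == 0)
            empties++;
    }
    closedir(d);
    return empties;
}
```

A real check would recurse into the fan-out subdirectories of `.git/objects` and, more importantly, validate each object's zlib stream and hash, which is what `git fsck` does.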
Stefan Beller writes:
> Linus Torvalds started a discussion[1] if we want to play rather safe
> than use defaults which make sense only for the most power users of Git:
>
>> So git is "safe" in the sense that you won't really lose any data,
>> but you may well be inconvenienced. The "fsync each
Linus Torvalds started a discussion[1] if we want to play rather safe
than use defaults which make sense only for the most power users of Git:
> So git is "safe" in the sense that you won't really lose any data,
> but you may well be inconvenienced. The "fsync each object" config
> option is there