Re: PATCH reduce impact of FIFREEZE on userland processes

2012-12-09 Thread Dave Chinner
On Sat, Dec 08, 2012 at 08:47:34AM +, Alun wrote:
> On Sat, 8 Dec 2012 12:20:29 +1100
> Dave Chinner  wrote:
> 
> First off, thanks for the examples. I'll answer your one question and
> then I'll shut up!
> 
> > > I'll try and chase this up by submitting patches to lvcreate and
> > > fsfreeze (in the former case, I think there's no reason not to run
> > > syncfs; in the latter perhaps it should be a command line option).
> > 
> > Is that even necessary? Users can issue the sync themselves if
> > necessary.
> 
> I think it's necessary for the issue to be better documented in LVM at
> the very least. I've dabbled with LVM for nearly 10 years, and used it
> in a busy production environment for around 6. For nearly 2 years I've
> been seeing, every now and then, these odd cases where taking a snapshot
> caused irrecoverable high load on the server.

Irrecoverable in what way?

> I've never seen any
> mention anywhere of the advisability of manually running sync prior to
> taking a snapshot on a busy system, and I had to get down to looking at
> the kernel sources before I got an inkling this might be the issue. I'd
> imagine that the vast majority of end users think the same way as I
> did, viz that taking a snapshot was designed to have minimal effect on
> any other users of the filesystem.

Right - minimal effect, not "no effect".

> There's also the issue that AFAIK there's no commonly distributed
> program which will allow you to call syncfs() on a filesystem. Running
> sync is a bit of a sledgehammer approach for a busy system with
> multiple large filesystems.

I have no doubt that you could write the 20 lines of C code needed
to use syncfs ;)

> I've submitted a patch to util-linux, adding a --sync option to
> fsfreeze which, if specified, will syncfs the requested filesystem
> prior to any freeze operation. Hopefully they'll accept this, though
> the only comment I've received so far suggested that I should be
> submitting a kernel patch rather than band-aiding it in userland!

Perhaps that tells you something - that both sides are telling you
it's a band aid for your specific issue? :/

fsfreeze is a data integrity operation and some people rely on it to
take immediate effect as it currently does. IMO, that's the bar that
any generic freeze optimisation has to clear.

Cheers,

Dave.
-- 
Dave Chinner
da...@fromorbit.com
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: PATCH reduce impact of FIFREEZE on userland processes

2012-12-09 Thread Dave Chinner
On Sat, Dec 08, 2012 at 07:12:04AM -0500, Christoph Hellwig wrote:
> On Fri, Dec 07, 2012 at 11:42:55AM +1100, Dave Chinner wrote:
> > The problem with doing this is that the sync can delay the freeze
> > process by quite some time under the exact conditions you describe.
> > If you want freeze to take effect immediately (i.e. instantly stop
> > new modifications), then adding a sync will break this semantic.
> > There are existing users of freeze that require this behaviour...
> 
> But that's only because he uses the big hammer sync_filesystem() which
> actually waits for I/O completion.  I agree that this is a bad idea,
> but if we'd just do a writeback_inodes_sb() call in this place that
> starts asynchronous writeout I think everyone would benefit.

The problem with that is that async writeback will block on IO
submission as soon as the disk backs up on congestion. It's
effectively still waiting on IO completions to occur, only now
indirectly through the request queue submission process.

Hence I think for the heavily loaded situations that are causing
freeze latency related issues, sync or async pre-flushes are going
to cause exactly the same delays to freezing writes.

Cheers,

Dave.
-- 
Dave Chinner
da...@fromorbit.com


Re: PATCH reduce impact of FIFREEZE on userland processes

2012-12-08 Thread Christoph Hellwig
On Fri, Dec 07, 2012 at 11:42:55AM +1100, Dave Chinner wrote:
> The problem with doing this is that the sync can delay the freeze
> process by quite some time under the exact conditions you describe.
> If you want freeze to take effect immediately (i.e. instantly stop
> new modifications), then adding a sync will break this semantic.
> There are existing users of freeze that require this behaviour...

But that's only because he uses the big hammer sync_filesystem() which
actually waits for I/O completion.  I agree that this is a bad idea,
but if we'd just do a writeback_inodes_sb() call in this place that
starts asynchronous writeout I think everyone would benefit.



Re: PATCH reduce impact of FIFREEZE on userland processes

2012-12-08 Thread Alun
On Sat, 8 Dec 2012 12:20:29 +1100
Dave Chinner  wrote:

First off, thanks for the examples. I'll answer your one question and
then I'll shut up!

> > I'll try and chase this up by submitting patches to lvcreate and
> > fsfreeze (in the former case, I think there's no reason not to run
> > syncfs; in the latter perhaps it should be a command line option).
> 
> Is that even necessary? Users can issue the sync themselves if
> necessary.

I think it's necessary for the issue to be better documented in LVM at
the very least. I've dabbled with LVM for nearly 10 years, and used it
in a busy production environment for around 6. For nearly 2 years I've
been seeing, every now and then, these odd cases where taking a snapshot
caused irrecoverable high load on the server. I've never seen any
mention anywhere of the advisability of manually running sync prior to
taking a snapshot on a busy system, and I had to get down to looking at
the kernel sources before I got an inkling this might be the issue. I'd
imagine that the vast majority of end users think the same way as I
did, viz that taking a snapshot was designed to have minimal effect on
any other users of the filesystem.

There's also the issue that AFAIK there's no commonly distributed
program which will allow you to call syncfs() on a filesystem. Running
sync is a bit of a sledgehammer approach for a busy system with
multiple large filesystems.

I've submitted a patch to util-linux, adding a --sync option to
fsfreeze which, if specified, will syncfs the requested filesystem
prior to any freeze operation. Hopefully they'll accept this, though
the only comment I've received so far suggested that I should be
submitting a kernel patch rather than band-aiding it in userland!

Looking at the LVM sources, it would appear that the freezing of 
affected filesystems is done in the kernel side of device mapper. I'm
not going there!

Anyway, thanks for your time.

Cheers,
Alun.


Re: PATCH reduce impact of FIFREEZE on userland processes

2012-12-07 Thread Dave Chinner
On Fri, Dec 07, 2012 at 08:59:52AM +, Alun wrote:
> Dave Chinner  said, in message
> 20121207004255.GC27172@dastard:
> > 
> > The problem with doing this is that the sync can delay the freeze
> > process by quite some time under the exact conditions you describe.
> > If you want freeze to take effect immediately (i.e. instantly stop
> > new modifications), then adding a sync will break this semantic.
> > There are existing users of freeze that require this behaviour...
> 
> Ahh, that would be the subtlety I was worried might exist! Thanks.
> 
> The specific issue that brought me here was that, on a fairly heavily
> loaded file server (>1000 connected Windows clients), taking an LVM
> snapshot caused enough of an interruption to service that many of the
> Windows clients disconnected and reconnected, so causing a huge process
> load on the server - enough that we'd completely lose service and have
> to reboot. Chasing this down, I noticed that FIFREEZE does a filesystem
> sync, and it seemed to me that adding another one prior to blocking
> writes was an easy hit. 

Yup, that's typical.

> I'm not trying to argue my case here - you've convinced me that this
> change in semantics is risky and removes flexibility.
> 
> I'll try and chase this up by submitting patches to lvcreate and
> fsfreeze (in the former case, I think there's no reason not to run
> syncfs; in the latter perhaps it should be a command line option).

Is that even necessary? Users can issue the sync themselves if
necessary.

> > That, to me, is irrelevant, because something is normally done while
> > the filesystem is frozen. It's not uncommon for freeze periods to
> > extend to minutes while work is done by whatever required the
> > freeze. Hence the few seconds it takes to achieve the frozen state is
> > mostly irrelevant.
> 
> You've referred twice to existing systems that would break in the
> presence of this change.  I'm really having trouble thinking of a
> situation where it's critical to have writes suspended *NOW* and where
> it's valid to keep them suspended for minutes.

Say you get your filesystem reporting a read error in a directory.
There are people out there that will immediately freeze the
filesystem (to prevent potential damage from being propagated) while
they investigate the problem and determine their next action. This
may even involve running non-modifying fsck on the underlying block
device while the filesystem is frozen...

Then there are systems like HA servers that share a filesystem in a
primary/secondary setup - freezes are often used in failover
situations. This ensures all cached dirty data is written to disk
in preparation for the other node to mount it. Freezing the
filesystem ensures that spurious errors are not returned to
applications/clients while the failover takes place. Hence the
filesystem can remain frozen for some time while everything on the
new primary node is started up and fences/STONITHs the frozen
node.

Then there's co-ordinating management operations on filesystems that
span multiple storage arrays (e.g. for hardware based snapshots,
cloning, etc), VM guest migration between two physical hosts, and so
on. Freeze is used for a lot more things than LVM snapshots...

> I'd have thought that,
> in the vast majority of cases, the critical thing was to minimise the
> time for which writes were suspended.

In the obvious use cases, yes. Once you look outside snapshots to
consider applications that need a stable, unchanging filesystem in
an application-transparent manner, you'll find lots of interesting
uses for FIFREEZE.

Cheers,

Dave.
-- 
Dave Chinner
da...@fromorbit.com


Re: PATCH reduce impact of FIFREEZE on userland processes

2012-12-07 Thread Alun
Dave Chinner  said, in message
20121207004255.GC27172@dastard:
> 
> The problem with doing this is that the sync can delay the freeze
> process by quite some time under the exact conditions you describe.
> If you want freeze to take effect immediately (i.e. instantly stop
> new modifications), then adding a sync will break this semantic.
> There are existing users of freeze that require this behaviour...

Ahh, that would be the subtlety I was worried might exist! Thanks.

The specific issue that brought me here was that, on a fairly heavily
loaded file server (>1000 connected Windows clients), taking an LVM
snapshot caused enough of an interruption to service that many of the
Windows clients disconnected and reconnected, so causing a huge process
load on the server - enough that we'd completely lose service and have
to reboot. Chasing this down, I noticed that FIFREEZE does a filesystem
sync, and it seemed to me that adding another one prior to blocking
writes was an easy hit. 

I'm not trying to argue my case here - you've convinced me that this
change in semantics is risky and removes flexibility.

I'll try and chase this up by submitting patches to lvcreate and
fsfreeze (in the former case, I think there's no reason not to run
syncfs; in the latter perhaps it should be a command line option).

> That, to me, is irrelevant, because something is normally done while
> the filesystem is frozen. It's not uncommon for freeze periods to
> extend to minutes while work is done by whatever required the
> freeze. Hence the few seconds it takes to achieve the frozen state is
> mostly irrelevant.

You've referred twice to existing systems that would break in the
presence of this change.  I'm really having trouble thinking of a
situation where it's critical to have writes suspended *NOW* and where
it's valid to keep them suspended for minutes. I'd have thought that,
in the vast majority of cases, the critical thing was to minimise the
time for which writes were suspended. Would you mind describing the use
case you're thinking of?

Cheers,
Alun.


Re: PATCH reduce impact of FIFREEZE on userland processes

2012-12-06 Thread Dave Chinner
On Wed, Dec 05, 2012 at 09:17:07PM +, Alun wrote:
> 
> This patch is against kernel version 3.7-rc7.
> 
> The FIFREEZE ioctl blocks userland writes, then calls sync_filesystem.
> If there is a large amount of dirty data, this sync can take a
> substantial time to complete, with corresponding loss of responsiveness
> to any userland processes wishing to write.
> 
> This patch simply adds an extra call to sync_filesystem prior to
> blocking writes, so that (hopefully) the majority of outstanding dirty
> data has been flushed before we impact on userland.

The problem with doing this is that the sync can delay the freeze
process by quite some time under the exact conditions you describe.
If you want freeze to take effect immediately (i.e. instantly stop
new modifications), then adding a sync will break this semantic.
There are existing users of freeze that require this behaviour...

> I'm a complete kernel newbie and have only done some pretty minimal
> testing on my own machine, but with the patch in place the impact of
> running "fsfreeze -f" immediately followed by "fsfreeze -u" on a
> moderately loaded filesystem (as measured by time taken for a write()
> to complete) was reduced from 2.5 to 0.2 seconds.

That, to me, is irrelevant, because something is normally done while
the filesystem is frozen. It's not uncommon for freeze periods to
extend to minutes while work is done by whatever required the
freeze. Hence the few seconds it takes to achieve the frozen state is
mostly irrelevant.

If you are really concerned with minimising the amount of time it
takes to freeze, then "syncfs; fsfreeze -f; fsfreeze -u" will get
you exactly the same result as your patch, without having any bad
side effects for other users.

Cheers,

Dave.
-- 
Dave Chinner
da...@fromorbit.com


PATCH reduce impact of FIFREEZE on userland processes

2012-12-05 Thread Alun

This patch is against kernel version 3.7-rc7.

The FIFREEZE ioctl blocks userland writes, then calls sync_filesystem.
If there is a large amount of dirty data, this sync can take a
substantial time to complete, with corresponding loss of responsiveness
to any userland processes wishing to write.

This patch simply adds an extra call to sync_filesystem prior to
blocking writes, so that (hopefully) the majority of outstanding dirty
data has been flushed before we impact on userland.

I'm a complete kernel newbie and have only done some pretty minimal
testing on my own machine, but with the patch in place the impact of
running "fsfreeze -f" immediately followed by "fsfreeze -u" on a
moderately loaded filesystem (as measured by time taken for a write()
to complete) was reduced from 2.5 to 0.2 seconds. Hopefully there's no
subtlety in how all this works, and that adding the extra call has no
scary implications...

Signed-off-by: Alun Jones 

--- linux-3.7-rc7/fs/super.c.orig	2012-11-29 17:35:37.0
+++ linux-3.7-rc7/fs/super.c	2012-12-05 20:56:38.730631855
@@ -1314,6 +1314,11 @@ int freeze_super(struct super_block *sb)
 		return 0;
 	}
 
+	/* Sync before we block writes to reduce the amount of
+	 * work that has to be done afterwards.
+	 */
+	sync_filesystem(sb);
+
 	/* From now on, no new normal writers can start */
 	sb->s_writers.frozen = SB_FREEZE_WRITE;
 	smp_wmb();