Re: [Cluster-devel] [Lsf-pc] [LSF/MM ATTEND] [TOPIC] fs/block interface discussions

2014-12-12 Thread Steven Whitehouse

Hi,

On 11/12/14 00:52, Alasdair G Kergon wrote:

On Wed, Dec 10, 2014 at 07:46:51PM +0100, Jan Kara wrote:

   But still you first need to stop all writes to the filesystem, then do a
sync, and then allow writing again - which is exactly what freeze does.

And with device-mapper, we were asked to support the taking of snapshots
of multiple volumes simultaneously (e.g. where the application data is
stored across more than one filesystem). Thin dm snapshots can handle
this (the original non-thin ones can't).

Alasdair



That's good to know, and a useful feature. One of the issues I can see is
that because there are a number of different layers involved
(application/fs/storage), coordinating requirements between them is
not easy. To try to answer Jan's question earlier in the thread: no, I
don't know any specific application developers, but I can certainly help
to propose some kind of solution and then get some feedback. I think it
is probably going to be easier to start with a specific proposal, albeit
tentative, and then ask for feedback than to just ask "how should we do
this?", which is a lot more open-ended.


Going back to the other point above regarding freeze, it is not
necessarily a requirement to stop all writes in order to take a snapshot;
what is needed is, in effect, a barrier between operations which should be
represented in the snapshot and those which should not, because they 
happen after the snapshot has been taken. Not that I'm particularly 
attached to that proposal as it stands, but I hope it demonstrates the 
kind of thing I had in mind for discussion. I hope also that it will be 
possible to come up with a better solution during and/or following the 
discussion.


The goal would really be to figure out which bits we already have,
which bits are missing, where the problems are, what can be done better,
and so forth, while we have at least two of the three layers represented
and in the same room. This is very much something for the long term
rather than a quick discussion followed by a few patches kind of thing,
I think.


Steve.



Re: [Cluster-devel] [Lsf-pc] [LSF/MM ATTEND] [TOPIC] fs/block interface discussions

2014-12-12 Thread Jan Kara
  Hi,

On Fri 12-12-14 11:46:34, Steven Whitehouse wrote:
 On 11/12/14 00:52, Alasdair G Kergon wrote:
 On Wed, Dec 10, 2014 at 07:46:51PM +0100, Jan Kara wrote:
But still you first need to stop all writes to the filesystem, then do a
 sync, and then allow writing again - which is exactly what freeze does.
 And with device-mapper, we were asked to support the taking of snapshots
 of multiple volumes simultaneously (e.g. where the application data is
 stored across more than one filesystem). Thin dm snapshots can handle
 this (the original non-thin ones can't).
 
 That's good to know, and a useful feature. One of the issues I can
 see is that because there are a number of different layers involved
 (application/fs/storage), coordinating requirements between them
 is not easy. To try to answer Jan's question earlier in the thread:
 no, I don't know any specific application developers, but I can
 certainly help to propose some kind of solution and then get some
 feedback. I think it is probably going to be easier to start with a
 specific proposal, albeit tentative, and then ask for feedback than
 to just ask "how should we do this?", which is a lot more open-ended.
 
 Going back to the other point above regarding freeze, it is not
 necessarily a requirement to stop all writes in order to take a
 snapshot; what is needed is, in effect, a barrier between operations
 which should be represented in the snapshot and those which should
 not, because they happen after the snapshot has been taken. Not
 that I'm particularly attached to that proposal as it stands, but I
 hope it demonstrates the kind of thing I had in mind for discussion.
 I hope also that it will be possible to come up with a better
 solution during and/or following the discussion.
  I think I understand your idea with a 'barrier'. It's just that I have
trouble seeing how it would actually get implemented - how do you make
sure that, e.g., after writing back the block allocation bitmap and while
writing back other metadata, no one can allocate new blocks to file 'foo'
and still write back the file's inode before you submit the barrier?
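Jan's objection can be made concrete with a toy model. This is plain Python, not kernel code; `ToyDevice`, `consistent()` and the block names are hypothetical. The device captures everything submitted before the barrier, yet because the writer was never stopped, the captured inode points at block 7 while the captured bitmap still marks it free:

```python
# Toy model (hypothetical, not kernel code): a "device" maps block names to
# contents, and a snapshot captures every write submitted before the barrier.


class ToyDevice:
    def __init__(self):
        self.blocks = {}
        self.snapshot = None

    def submit(self, name, contents):
        self.blocks[name] = contents

    def barrier(self):
        # Everything submitted so far belongs to the snapshot.
        self.snapshot = dict(self.blocks)


def consistent(image):
    """Every block referenced by foo's inode must be marked used in the bitmap."""
    used = image.get("bitmap", frozenset())
    return all(block in used for block in image.get("inode:foo", ()))


dev = ToyDevice()
bitmap = set()    # in-memory allocation bitmap
foo_blocks = []   # in-memory block list of file 'foo'

dev.submit("bitmap", frozenset(bitmap))      # 1. fs writes back the bitmap
bitmap.add(7)                                # 2. a writer allocates block 7...
foo_blocks.append(7)
dev.submit("inode:foo", tuple(foo_blocks))   # 3. ...and foo's inode goes out
dev.barrier()                                # 4. only now the barrier is sent

print(consistent(dev.snapshot))  # False: inode references a "free" block
```

Stopping writers between steps 1 and 4 (which is what freeze does), or writing the bitmap back again before the barrier, would make the check pass.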

 The goal would really be to figure out which bits we already have,
 which bits are missing, where the problems are, what can be done
 better, and so forth, while we have at least two of the three layers
 represented and in the same room. This is very much something for
 the long term rather than a quick discussion followed by a few
 patches kind of thing, I think.
  Sure, if you have some proposal (not necessarily patches) then it's
probably worth talking about.

Honza
-- 
Jan Kara j...@suse.cz
SUSE Labs, CR



Re: [Cluster-devel] [Lsf-pc] [LSF/MM ATTEND] [TOPIC] fs/block interface discussions

2014-12-12 Thread Alex Elsayed
Alex Elsayed wrote:

 Jan Kara wrote:
 
   Hi,
 
 On Fri 12-12-14 11:46:34, Steven Whitehouse wrote:
 On 11/12/14 00:52, Alasdair G Kergon wrote:
 On Wed, Dec 10, 2014 at 07:46:51PM +0100, Jan Kara wrote:
But still you first need to stop all writes to the filesystem, then do a
 sync, and then allow writing again - which is exactly what freeze does.
 And with device-mapper, we were asked to support the taking of
 snapshots of multiple volumes simultaneously (e.g. where the
 application data is stored across more than one filesystem). Thin dm
 snapshots can handle this (the original non-thin ones can't).
 
 That's good to know, and a useful feature. One of the issues I can
 see is that because there are a number of different layers involved
 (application/fs/storage), coordinating requirements between them
 is not easy. To try to answer Jan's question earlier in the thread:
 no, I don't know any specific application developers, but I can
 certainly help to propose some kind of solution and then get some
 feedback. I think it is probably going to be easier to start with a
 specific proposal, albeit tentative, and then ask for feedback than
 to just ask "how should we do this?", which is a lot more open-ended.
 
 Going back to the other point above regarding freeze, it is not
 necessarily a requirement to stop all writes in order to take a
 snapshot; what is needed is, in effect, a barrier between operations
 which should be represented in the snapshot and those which should
 not, because they happen after the snapshot has been taken. Not
 that I'm particularly attached to that proposal as it stands, but I
 hope it demonstrates the kind of thing I had in mind for discussion.
 I hope also that it will be possible to come up with a better
 solution during and/or following the discussion.
   I think I understand your idea with a 'barrier'. It's just that I have
 trouble seeing how it would actually get implemented - how do you make
 sure that, e.g., after writing back the block allocation bitmap and while
 writing back other metadata, no one can allocate new blocks to file 'foo'
 and still write back the file's inode before you submit the barrier?
 
 Actually, I suspect something could be (relatively) trivially implemented
 using a similar strategy to dm-era. Snapshots increment the era; blocks
 from previous eras cannot be overwritten or removed, and the target could
 be mapped to view a past era. With that, you have essentially
 instantaneous snapshots (increment a counter) with only a barrier
 constraint, not freezing.

Thinking on it further, I'd suspect dm-thinp would also be fine with some
form of barrier rather than full freezing - generally speaking, if
snapshots are (roughly) instantaneous, then we don't need to care so much
about what happens during the snapshot; ensuring a consistent state at one
instant and preventing reordering across it should be sufficient.

 The goal would really be to figure out which bits we already have,
 which bits are missing, where the problems are, what can be done
 better, and so forth, while we have at least two of the three layers
 represented and in the same room. This is very much something for
 the long term rather than a quick discussion followed by a few
 patches kind of thing, I think.
   Sure, if you have some proposal (not necessarily patches) then it's
 probably worth talking about.
 
 Honza
 



Re: [Cluster-devel] [Lsf-pc] [LSF/MM ATTEND] [TOPIC] fs/block interface discussions

2014-12-12 Thread Alex Elsayed
Jan Kara wrote:

   Hi,
 
 On Fri 12-12-14 11:46:34, Steven Whitehouse wrote:
 On 11/12/14 00:52, Alasdair G Kergon wrote:
 On Wed, Dec 10, 2014 at 07:46:51PM +0100, Jan Kara wrote:
But still you first need to stop all writes to the filesystem, then do a
 sync, and then allow writing again - which is exactly what freeze does.
 And with device-mapper, we were asked to support the taking of snapshots
 of multiple volumes simultaneously (e.g. where the application data is
 stored across more than one filesystem). Thin dm snapshots can handle
 this (the original non-thin ones can't).
 
 That's good to know, and a useful feature. One of the issues I can
 see is that because there are a number of different layers involved
 (application/fs/storage), coordinating requirements between them
 is not easy. To try to answer Jan's question earlier in the thread:
 no, I don't know any specific application developers, but I can
 certainly help to propose some kind of solution and then get some
 feedback. I think it is probably going to be easier to start with a
 specific proposal, albeit tentative, and then ask for feedback than
 to just ask "how should we do this?", which is a lot more open-ended.
 
 Going back to the other point above regarding freeze, it is not
 necessarily a requirement to stop all writes in order to take a
 snapshot; what is needed is, in effect, a barrier between operations
 which should be represented in the snapshot and those which should
 not, because they happen after the snapshot has been taken. Not
 that I'm particularly attached to that proposal as it stands, but I
 hope it demonstrates the kind of thing I had in mind for discussion.
 I hope also that it will be possible to come up with a better
 solution during and/or following the discussion.
   I think I understand your idea with a 'barrier'. It's just that I have
 trouble seeing how it would actually get implemented - how do you make
 sure that, e.g., after writing back the block allocation bitmap and while
 writing back other metadata, no one can allocate new blocks to file 'foo'
 and still write back the file's inode before you submit the barrier?

Actually, I suspect something could be (relatively) trivially implemented 
using a similar strategy to dm-era. Snapshots increment the era; blocks from 
previous eras cannot be overwritten or removed, and the target could be 
mapped to view a past era. With that, you have essentially instantaneous 
snapshots (increment a counter) with only a barrier constraint, not 
freezing.
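A minimal sketch of the era idea as a toy versioned block store, assuming each write is tagged with the current era and versions from closed eras are never overwritten. `EraDevice` is hypothetical and is not how dm-era itself works: dm-era only records which blocks were written in each era and does not keep old contents. Taking a snapshot is just a counter increment; the filesystem still has to bump the era at a consistent point, which is the remaining barrier constraint.

```python
class EraDevice:
    """Toy versioned block store (hypothetical): writes are tagged with an era.

    Versions written in an earlier era are never overwritten, so any past
    era can still be read. A snapshot is just an increment of the counter.
    """

    def __init__(self):
        self.era = 0
        self.versions = {}  # block number -> list of (era, data), oldest first

    def write(self, block, data):
        history = self.versions.setdefault(block, [])
        if history and history[-1][0] == self.era:
            # Same era: overwriting in place is allowed.
            history[-1] = (self.era, data)
        else:
            # The previous era's version is preserved (copy-on-write).
            history.append((self.era, data))

    def snapshot(self):
        """Close the current era and return its number as the snapshot id."""
        snap = self.era
        self.era += 1
        return snap

    def read(self, block, era=None):
        """Return the newest version of `block` written at or before `era`."""
        era = self.era if era is None else era
        for written_era, data in reversed(self.versions.get(block, [])):
            if written_era <= era:
                return data
        return None


dev = EraDevice()
dev.write(0, "v1")
snap = dev.snapshot()         # era 0 is now closed
dev.write(0, "v2")
print(dev.read(0, era=snap))  # v1
print(dev.read(0))            # v2
```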

 The goal would really be to figure out which bits we already have,
 which bits are missing, where the problems are, what can be done
 better, and so forth, while we have at least two of the three layers
 represented and in the same room. This is very much something for
 the long term rather than a quick discussion followed by a few
 patches kind of thing, I think.
   Sure, if you have some proposal (not necessarily patches) then it's
 probably worth talking about.
 
 Honza



Re: [Cluster-devel] [Lsf-pc] [LSF/MM ATTEND] [TOPIC] fs/block interface discussions

2014-12-10 Thread Jan Kara
  Hi,

On Wed 10-12-14 11:49:48, Steven Whitehouse wrote:
 I'm interested generally in topics related to integration between
 components, one example being snapshots. We have snapshots at
 various different layers (can be done at array level or dm/lvm level
 and also we have filesystem support in the form of fs freezing).
  Well, usually snapshots at LVM layer are using fs freezing to get a
consistent image of a filesystem. So these two are integrated AFAICS.

 There are a few thoughts that spring to mind - one being how this
 should integrate with applications - in order to make it easier to
 use, and another being whether we could introduce snapshots which do
 not require freezing the fs (as per btrfs) for other filesystems too
 - possibly by passing down a special kind of flush from the
 filesystem layer.
  So btrfs is special in its COW nature. For filesystems which do updates
in place you can do COW in the block layer (after all that's what
dm snapshotting does) but you still have to get fs into consistent state
(that's fsfreeze), then take snapshot of the device (by setting up proper
COW structures), and only then you can allow further modifications of the
filesystem by unfreezing it. I don't see a way around that...
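The sequence Jan describes (freeze, set up COW structures, thaw) can be sketched with a toy block-layer snapshot in the spirit of dm snapshots. `CowSnapshot` is hypothetical, not the dm-snapshot code: the origin keeps being updated in place, and the first write to each block after the snapshot copies its old contents into an exception store. The freeze matters only at creation time, so that `origin` holds a consistent image when the snapshot starts tracking it.

```python
class CowSnapshot:
    """Toy copy-on-write snapshot of an origin that is updated in place."""

    def __init__(self, origin):
        self.origin = origin    # list of block contents, updated in place
        self.exceptions = {}    # block number -> contents at snapshot time

    def write_origin(self, block, data):
        # Copy the old contents out only on the first write after the snapshot.
        if block not in self.exceptions:
            self.exceptions[block] = self.origin[block]
        self.origin[block] = data

    def read_snapshot(self, block):
        # Preserved copy if the block has changed, else the unchanged origin.
        if block in self.exceptions:
            return self.exceptions[block]
        return self.origin[block]


origin = ["super", "bitmap", "inode"]
# 1. freeze the fs so that `origin` is consistent (not modelled here)
snap = CowSnapshot(origin)
# 2. thaw: writes to the origin resume
snap.write_origin(1, "bitmap'")
snap.write_origin(1, "bitmap''")
print(snap.read_snapshot(1), origin[1])  # bitmap bitmap''
```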

 A more general topic is proposed changes to the fs/block interface,
 of which the above may possibly be one example. There are a number
 of proposals for new classes of block device, and new features which
 will potentially require a different (or extended) interface at the
 fs/block layer. These have largely been discussed to date as
 individual features, and I wonder whether it might be useful to try
 and bring together the various proposals to see if there is
 commonality between at least some of them at the fs/block interface
 level. I know that there have been discussions going on relating to
 the individual proposals, so the idea I had was to try and look at
 them from a slightly different angle by bringing as many of them as
 possible together and concentrating on how they would be used from a
 filesystem perspective,
  Could you elaborate on which combination of features you'd like to
discuss?
Honza
-- 
Jan Kara j...@suse.cz
SUSE Labs, CR



Re: [Cluster-devel] [Lsf-pc] [LSF/MM ATTEND] [TOPIC] fs/block interface discussions

2014-12-10 Thread Steven Whitehouse

Hi,

On 10/12/14 12:48, Jan Kara wrote:

   Hi,

On Wed 10-12-14 11:49:48, Steven Whitehouse wrote:

I'm interested generally in topics related to integration between
components, one example being snapshots. We have snapshots at
various different layers (can be done at array level or dm/lvm level
and also we have filesystem support in the form of fs freezing).

   Well, usually snapshots at LVM layer are using fs freezing to get a
consistent image of a filesystem. So these two are integrated AFAICS.


There are a few thoughts that spring to mind - one being how this
should integrate with applications - in order to make it easier to
use, and another being whether we could introduce snapshots which do
not require freezing the fs (as per btrfs) for other filesystems too
- possibly by passing down a special kind of flush from the
filesystem layer.

   So btrfs is special in its COW nature. For filesystems which do updates
in place you can do COW in the block layer (after all that's what
dm snapshotting does) but you still have to get fs into consistent state
(that's fsfreeze), then take snapshot of the device (by setting up proper
COW structures), and only then you can allow further modifications of the
filesystem by unfreezing it. I don't see a way around that...
Well, I think it should be possible to get the fs into a consistent state
without needing to do the freeze/snapshot/unfreeze procedure. Instead we
might have (there are no doubt other solutions too, so this is just an
example to get discussion started) an extra flag on a bio, which would
only be valid with some combination of flush flags. Then it is just a
case of telling the block layer that we want to do a snapshot; it would
then spot the marked bio when the fs sends it down, and know that
everything before and including that bio should be in the snapshot, and
everything after it should not. So the fs would basically do a special
form of sync, setting the flag on the bio once it is consistent - the
question being how that should then be triggered. It also means that there
is no longer any possibility of a problem if the unfreeze does not
happen for any reason.
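The marked-bio idea can be modelled in a few lines. This is a toy that assumes bios complete in submission order; `Bio.snapshot_mark` stands in for the proposed extra flag and is hypothetical (no such flag exists in the block layer). Once a snapshot is requested, the queue captures the device image at the first marked flush bio, so that bio and everything before it are in the snapshot and later writes are not. As Jan's reply in this thread points out, the hard part remains on the filesystem side: issuing the mark only at a genuinely consistent point.

```python
from dataclasses import dataclass


@dataclass
class Bio:
    block: int
    data: str
    flush: bool = False
    snapshot_mark: bool = False  # hypothetical flag from the proposal


class SnapshottingQueue:
    """Toy request queue; assumes bios complete in submission order."""

    def __init__(self):
        self.armed = False     # set once userspace asks for a snapshot
        self.live = {}         # current device contents
        self.snapshot = None   # device image as of the marked bio

    def request_snapshot(self):
        self.armed = True

    def submit(self, bio):
        if bio.snapshot_mark and not bio.flush:
            raise ValueError("snapshot mark is only valid on a flush bio")
        self.live[bio.block] = bio.data
        if self.armed and bio.snapshot_mark:
            # This bio and everything before it belong to the snapshot.
            self.snapshot = dict(self.live)
            self.armed = False


q = SnapshottingQueue()
q.request_snapshot()
q.submit(Bio(1, "journal commit"))
q.submit(Bio(2, "clean superblock", flush=True, snapshot_mark=True))
q.submit(Bio(1, "later write"))
print(q.snapshot)  # {1: 'journal commit', 2: 'clean superblock'}
```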


Perhaps the more important question, though, is how it would/could be
integrated with applications? The ultimate goal that I had in mind is 
that we could have a tool which is run to create a snapshot which will 
with a single command deal with all three (application/fs/block) layers, 
and it should not matter whether the snapshot is done via any particular 
fs or block device, it should work in the same way. So how could we send 
a message to a process to say that a snapshot is about to be taken, and 
to get a message back when the app has produced a consistent set of 
data, and to coordinate between multiple applications using the same 
block device, or even across multiple block devices, being used by a 
single app?
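One possible shape for the application side, sketched with hypothetical hooks: `prepare()`/`resume()` and `coordinated_snapshot()` are illustrations only, and no such kernel or userspace interface exists. The coordinator asks each participating application to reach a consistent state, takes the snapshot only once all of them have agreed, and always resumes the applications it already paused, even if a later one refuses or the snapshot fails:

```python
from typing import Callable, Iterable, Protocol


class SnapshotParticipant(Protocol):
    """Hypothetical application hooks, not an existing API."""

    def prepare(self) -> bool: ...  # quiesce; True once data is consistent

    def resume(self) -> None: ...


def coordinated_snapshot(apps: Iterable[SnapshotParticipant],
                         take_snapshot: Callable[[], str]) -> str:
    prepared = []
    try:
        for app in apps:
            if not app.prepare():
                raise RuntimeError(f"{app!r} could not reach a consistent state")
            prepared.append(app)
        # Every app is quiescent: snapshot all fs/block layers in one go.
        return take_snapshot()
    finally:
        # Resume in reverse order, whatever happened above.
        for app in reversed(prepared):
            app.resume()


class ToyApp:
    def __init__(self, name, ready=True):
        self.name, self.ready, self.log = name, ready, []

    def prepare(self):
        self.log.append("prepare")
        return self.ready

    def resume(self):
        self.log.append("resume")


db, web = ToyApp("db"), ToyApp("web")
print(coordinated_snapshot([db, web], lambda: "snap-1"))  # snap-1
print(db.log)  # ['prepare', 'resume']
```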




A more general topic is proposed changes to the fs/block interface,
of which the above may possibly be one example. There are a number
of proposals for new classes of block device, and new features which
will potentially require a different (or extended) interface at the
fs/block layer. These have largely been discussed to date as
individual features, and I wonder whether it might be useful to try
and bring together the various proposals to see if there is
commonality between at least some of them at the fs/block interface
level. I know that there have been discussions going on relating to
the individual proposals, so the idea I had was to try and look at
them from a slightly different angle by bringing as many of them as
possible together and concentrating on how they would be used from a
filesystem perspective,

   Could you elaborate on which combination of features you'd like to
discuss?
Honza
Well, there are a number that I'm aware of that are currently in
development, but I suspect that this list is not complete:

 - SMR drives
 - persistent memory (various different types)
 - hinting from fs to block layer for various different reasons
   (layout, compression, snapshots, anything else?)
 - better I/O error reporting/recovery
 - copy offload
 - anything I forgot?

Steve.



Re: [Cluster-devel] [Lsf-pc] [LSF/MM ATTEND] [TOPIC] fs/block interface discussions

2014-12-10 Thread Jan Kara
  Hi,

On Wed 10-12-14 14:13:02, Steven Whitehouse wrote:
 On 10/12/14 12:48, Jan Kara wrote:
 On Wed 10-12-14 11:49:48, Steven Whitehouse wrote:
 I'm interested generally in topics related to integration between
 components, one example being snapshots. We have snapshots at
 various different layers (can be done at array level or dm/lvm level
 and also we have filesystem support in the form of fs freezing).
Well, usually snapshots at LVM layer are using fs freezing to get a
 consistent image of a filesystem. So these two are integrated AFAICS.
 
 There are a few thoughts that spring to mind - one being how this
 should integrate with applications - in order to make it easier to
 use, and another being whether we could introduce snapshots which do
 not require freezing the fs (as per btrfs) for other filesystems too
 - possibly by passing down a special kind of flush from the
 filesystem layer.
So btrfs is special in its COW nature. For filesystems which do updates
 in place you can do COW in the block layer (after all that's what
 dm snapshotting does) but you still have to get fs into consistent state
 (that's fsfreeze), then take snapshot of the device (by setting up proper
 COW structures), and only then you can allow further modifications of the
 filesystem by unfreezing it. I don't see a way around that...
 Well I think it should be possible to get the fs into a consistent
 state without needing to do the freeze/snapshot/unfreeze procedure.
 Instead we might have (there are no doubt other solutions too, so
 this is just an example to get discussion started) an extra flag on
 a bio, which would only be valid with some combination of flush
 flags. Then it is just a case of telling the block layer that we
 want to do a snapshot, and it would then spot the marked bio when
 the fs sends it down, and know that everything before and including
 that bio should be in the snapshot, and everything after that is
 not. So the fs would do basically a special form of sync, setting
 the flag on the bio when it is consistent - the question being how
 should that then be triggered? It means that there is no longer any
 possibility of having a problem if the unfreeze does not happen for
 any reason.
  But still you first need to stop all writes to the filesystem, then do a
sync, and then allow writing again - which is exactly what freeze does.
Without stopping writers, you cannot be sure you don't have a mix of old
and new files in the snapshot, and guaranteeing some finite completion
time is also difficult (although doable)... So it seems to me that what
you describe is a freeze-snapshot-unfreeze cycle, just one that is fully
controlled by the kernel.
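From userspace, the freeze-sync-thaw cycle described above maps onto the `FIFREEZE`/`FITHAW` ioctls, the same ones used by `fsfreeze -f`/`fsfreeze -u` from util-linux. A minimal sketch, assuming Linux, CAP_SYS_ADMIN, and the generic ioctl number encoding (x86, arm64); the numeric constants are spelled out here, and the `ioctl` parameter exists only so the sketch can be exercised without root:

```python
import contextlib
import fcntl
import os

# From <linux/fs.h>: FIFREEZE = _IOWR('X', 119, int), FITHAW = _IOWR('X', 120, int).
# These values assume the generic ioctl encoding; some architectures differ.
FIFREEZE = 0xC0045877
FITHAW = 0xC0045878


@contextlib.contextmanager
def frozen(mountpoint, ioctl=fcntl.ioctl):
    """Freeze the filesystem at `mountpoint` for the duration of the block.

    FIFREEZE blocks new writers and syncs the filesystem; FITHAW is issued
    even if the body raises.
    """
    fd = os.open(mountpoint, os.O_RDONLY)
    try:
        ioctl(fd, FIFREEZE, 0)
        try:
            yield
        finally:
            ioctl(fd, FITHAW, 0)
    finally:
        os.close(fd)
```

Usage would be `with frozen("/mnt/data"): take_block_snapshot()`, where `take_block_snapshot` is a placeholder. Note that the frozen state belongs to the superblock, not the file descriptor: if the process is killed between the two ioctls, the filesystem stays frozen until someone thaws it, which is exactly the failure mode the marked-bio proposal tries to remove.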

 Perhaps the more important question though, is how it would/could be
 integrated with applications? The ultimate goal that I had in mind
 is that we could have a tool which is run to create a snapshot which
 will with a single command deal with all three
 (application/fs/block) layers, and it should not matter whether the
 snapshot is done via any particular fs or block device, it should
 work in the same way. So how could we send a message to a process to
 say that a snapshot is about to be taken, and to get a message back
 when the app has produced a consistent set of data, and to
 coordinate between multiple applications using the same block
 device, or even across multiple block devices, being used by a
 single app?
  Yeah, this would be nice. But it requires buy in from the applications
which is always difficult. Do you know any application whose developers
would be interested in something like this?

 A more general topic is proposed changes to the fs/block interface,
 of which the above may possibly be one example. There are a number
 of proposals for new classes of block device, and new features which
 will potentially require a different (or extended) interface at the
 fs/block layer. These have largely been discussed to date as
 individual features, and I wonder whether it might be useful to try
 and bring together the various proposals to see if there is
 commonality between at least some of them at the fs/block interface
 level. I know that there have been discussions going on relating to
 the individual proposals, so the idea I had was to try and look at
 them from a slightly different angle by bringing as many of them as
 possible together and concentrating on how they would be used from a
 filesystem perspective,
Could you elaborate on which combination of features you'd like to
 discuss?
  Honza
 Well there are a number that I'm aware of that are currently in
 development, but I suspect that this list is not complete:
  - SMR drives
  - persistent memory (various different types)
  - Hinting from fs to block layer for various different reasons
 (layout, compression, snapshots, anything else?)
  - Better i/o error reporting/recovery
  - copy offload
  - anything I forgot?
  I see. 

Re: [Cluster-devel] [Lsf-pc] [LSF/MM ATTEND] [TOPIC] fs/block interface discussions

2014-12-10 Thread Alasdair G Kergon
On Wed, Dec 10, 2014 at 07:46:51PM +0100, Jan Kara wrote:
   But still you first need to stop all writes to the filesystem, then do a
 sync, and then allow writing again - which is exactly what freeze does.

And with device-mapper, we were asked to support the taking of snapshots
of multiple volumes simultaneously (e.g. where the application data is
stored across more than one filesystem). Thin dm snapshots can handle
this (the original non-thin ones can't).

Alasdair
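The multi-volume requirement is that all snapshots reflect the same instant. Where the snapshot layer cannot do that atomically itself, one userspace approximation is to freeze every filesystem first, snapshot each one, and then thaw them all. A minimal sketch; `freeze`, `thaw` and `snapshot_one` are caller-supplied placeholders (for example wrappers around `fsfreeze` and the volume manager), not real APIs:

```python
import contextlib


@contextlib.contextmanager
def all_frozen(volumes, freeze, thaw):
    """Freeze every volume, yield, then thaw whatever was frozen (in reverse).

    If freezing a later volume fails, the ones already frozen are thawed.
    """
    with contextlib.ExitStack() as stack:
        for vol in sorted(volumes):  # deterministic order
            freeze(vol)
            stack.callback(thaw, vol)
        yield


def snapshot_group(volumes, freeze, thaw, snapshot_one):
    """Take one snapshot per volume while all of them are frozen together."""
    with all_frozen(volumes, freeze, thaw):
        return {vol: snapshot_one(vol) for vol in volumes}


log = []
snaps = snapshot_group(
    ["/srv/db", "/srv/logs"],
    freeze=lambda v: log.append(("freeze", v)),
    thaw=lambda v: log.append(("thaw", v)),
    snapshot_one=lambda v: f"{v}@snap",
)
print(snaps)  # {'/srv/db': '/srv/db@snap', '/srv/logs': '/srv/logs@snap'}
```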