Re: CoW behavior when writing same content

2018-10-09 Thread Chris Murphy
On Tue, Oct 9, 2018 at 11:25 AM, Andrei Borzenkov wrote:
> 09.10.2018 18:52, Chris Murphy wrote:

>>> In this case, do root/big_file and snapshot/big_file still share the same
>>> data?
>>
>> You'll be left with three files. /big_file and root/big_file will
>> share extents,
>
> How come they share extents? That requires --reflink; is it the default now?

Good catch. It's not the default. I meant to write that initially only
root/big_file and snapshot/big_file have shared extents, and that the
shared extents are lost when snapshot/big_file is "overwritten" by the
copy into snapshot/.


>> and snapshot/big_file will have its own extents. You'd
>> need to copy with --reflink for snapshot/big_file to have shared
>> extents with /big_file - or deduplicate.
>>
> This still overwrites the whole file, in the sense that the original
> content of "snapshot/big_file" is lost. That the new content happens to
> be identical, and will probably be reflinked, does not change the fact
> that the original file is gone.

Agreed.

-- 
Chris Murphy


Re: CoW behavior when writing same content

2018-10-09 Thread Andrei Borzenkov
09.10.2018 18:52, Chris Murphy wrote:
> On Tue, Oct 9, 2018 at 8:48 AM, Gervais, Francois wrote:
>> Hi,
>>
>> If I have a snapshot where I overwrite a big file of which only a
>> small portion is different, will the whole file be rewritten in
>> the snapshot? Or only the part of the file that differs?
> 

If you overwrite the whole file, the whole file will be overwritten.

> Depends on how the application modifies files. Many applications write
> out a whole new file with a pseudorandom filename, fsync, then rename.
> 
>>
>> Something like:
>>
>> $ dd if=/dev/urandom of=/big_file bs=1M count=1024
>> $ cp /big_file root/
>> $ btrfs sub snap root snapshot
>> $ cp /big_file snapshot/
>>

And which portion of these three files is different? They must be
identical. Not that it really matters, but that does not match your
question.

>> In this case, do root/big_file and snapshot/big_file still share the same
>> data?
> 
> You'll be left with three files. /big_file and root/big_file will
> share extents,

How come they share extents? That requires --reflink; is it the default now?

> and snapshot/big_file will have its own extents. You'd
> need to copy with --reflink for snapshot/big_file to have shared
> extents with /big_file - or deduplicate.
> 
This still overwrites the whole file, in the sense that the original
content of "snapshot/big_file" is lost. That the new content happens to
be identical, and will probably be reflinked, does not change the fact
that the original file is gone.


Re: CoW behavior when writing same content

2018-10-09 Thread Roman Mamedov
On Tue, 9 Oct 2018 09:52:00 -0600
Chris Murphy wrote:

> You'll be left with three files. /big_file and root/big_file will
> share extents, and snapshot/big_file will have its own extents. You'd
> need to copy with --reflink for snapshot/big_file to have shared
> extents with /big_file - or deduplicate.

Or use rsync for copying, in the mode where it reads and checksums blocks of
both files, to copy only the non-matching portions.

rsync --inplace

  This option is useful for transferring large files with block-based
  changes or appended data, and also on systems that are disk bound,
  not network bound. It can also help keep a copy-on-write filesystem
  snapshot from diverging the entire contents of a file that only has
  minor changes.
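
A minimal sketch of that approach, not part of the original message. The
temporary paths stand in for /big_file and snapshot/big_file and are
hypothetical. Two flags are added on top of --inplace as assumptions about
what a local test needs: --no-whole-file, because rsync turns the delta
algorithm off by default when source and destination are both local paths,
and --ignore-times, because the demo files are created within moments of
each other and have the same size, so rsync's size-and-mtime quick check
might otherwise skip the transfer.

```shell
#!/bin/sh
# Sketch: update an existing copy in place with rsync's delta algorithm.
set -eu
dir=$(mktemp -d)
src="$dir/big_file"            # stand-in for /big_file
dst="$dir/snapshot_big_file"   # stand-in for snapshot/big_file

# Create a 1 MiB source, copy it, then change one 4 KiB block in the source.
dd if=/dev/urandom of="$src" bs=1024 count=1024 2>/dev/null
cp "$src" "$dst"
dd if=/dev/urandom of="$src" bs=4096 count=1 seek=10 conv=notrunc 2>/dev/null

inode_before=$(ls -i "$dst" | awk '{print $1}')
result=skipped   # stays "skipped" if rsync is not installed

if command -v rsync >/dev/null 2>&1; then
    # --inplace:       write into the existing destination file (no temp file + rename)
    # --no-whole-file: use the block-checksum delta algorithm for a local copy
    # --ignore-times:  do not skip files that match in size and mtime
    rsync --inplace --no-whole-file --ignore-times "$src" "$dst"
    inode_after=$(ls -i "$dst" | awk '{print $1}')
    if cmp -s "$src" "$dst" && [ "$inode_before" = "$inode_after" ]; then
        result=updated-in-place
    else
        result=unexpected
    fi
fi

echo "rsync --inplace: $result"
rm -rf "$dir"
```

The script only confirms that the destination keeps its inode and ends up
with the same content. How many extents stay shared with a snapshot still
has to be checked on the Btrfs filesystem itself.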

-- 
With respect,
Roman


Re: CoW behavior when writing same content

2018-10-09 Thread Chris Murphy
On Tue, Oct 9, 2018 at 8:48 AM, Gervais, Francois wrote:
> Hi,
>
> If I have a snapshot where I overwrite a big file of which only a
> small portion is different, will the whole file be rewritten in
> the snapshot? Or only the part of the file that differs?

Depends on how the application modifies files. Many applications write
out a whole new file with a pseudorandom filename, fsync, then rename.
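
To make the two write patterns concrete, here is a small sketch that is not
from the original thread. The file names are hypothetical and the script
runs on any filesystem. It only shows that an in-place write keeps the same
inode while write-new-then-rename replaces it. On a CoW filesystem the
second pattern means the replacement file shares no extents with the
snapshot, unless the copy was reflinked or is deduplicated later.

```shell
#!/bin/sh
# Sketch: in-place partial overwrite vs. write-a-new-file-then-rename.
set -eu
dir=$(mktemp -d)
f="$dir/big_file"

# Create a 1 MiB test file and remember its inode number.
dd if=/dev/zero of="$f" bs=1024 count=1024 2>/dev/null
inode_before=$(ls -i "$f" | awk '{print $1}')

# Pattern 1: overwrite 4 KiB at offset 8 KiB in place.
# conv=notrunc keeps dd from truncating the rest of the file.
dd if=/dev/urandom of="$f" bs=4096 count=1 seek=2 conv=notrunc 2>/dev/null
inode_inplace=$(ls -i "$f" | awk '{print $1}')
size_inplace=$(($(wc -c < "$f")))

# Pattern 2: write a complete new file, then rename it over the original.
cp "$f" "$f.tmp"
mv "$f.tmp" "$f"
inode_rename=$(ls -i "$f" | awk '{print $1}')

echo "in-place: inode $inode_before -> $inode_inplace (size $size_inplace)"
echo "rename:   inode $inode_inplace -> $inode_rename"
rm -rf "$dir"
```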

>
> Something like:
>
> $ dd if=/dev/urandom of=/big_file bs=1M count=1024
> $ cp /big_file root/
> $ btrfs sub snap root snapshot
> $ cp /big_file snapshot/
>
> In this case, do root/big_file and snapshot/big_file still share the same data?

You'll be left with three files. /big_file and root/big_file will
share extents, and snapshot/big_file will have its own extents. You'd
need to copy with --reflink for snapshot/big_file to have shared
extents with /big_file - or deduplicate.
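
A hedged sketch of the --reflink copy mentioned above, added for
illustration and assuming GNU cp. The temporary paths are hypothetical
stand-ins for /big_file and snapshot/big_file. It uses --reflink=auto so it
also runs on filesystems without reflink support. The optional
"btrfs filesystem du" call only reports anything useful on a Btrfs mount
with btrfs-progs installed, and a sync may be needed before the numbers
settle.

```shell
#!/bin/sh
# Sketch: copy a file with reflinks so source and copy can share extents.
set -eu
dir=$(mktemp -d)
src="$dir/big_file"        # stand-in for /big_file
dst="$dir/big_file.copy"   # stand-in for snapshot/big_file

dd if=/dev/urandom of="$src" bs=1024 count=1024 2>/dev/null

# --reflink=auto shares extents where the filesystem supports it (e.g. Btrfs)
# and falls back to an ordinary copy elsewhere.
# --reflink=always would fail instead of falling back.
cp --reflink=auto "$src" "$dst"

if cmp -s "$src" "$dst"; then same=yes; else same=no; fi
echo "contents identical: $same"

# On Btrfs, this reports total, exclusive, and shared bytes per file.
if command -v btrfs >/dev/null 2>&1; then
    btrfs filesystem du "$dir" 2>/dev/null || echo "not on Btrfs; skipping du"
fi
rm -rf "$dir"
```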


-- 
Chris Murphy