Re: How to find (out if) files sharing content?

2012-11-05 Thread David Sterba
On Wed, Oct 31, 2012 at 09:02:15PM +0800, Jeff Liu wrote:
 I propose this because OCFS2 reports shared space this way, in combination with du(1).
 
 An old patch set to make du(1) aware of reflinked files:
 https://oss.oracle.com/pipermail/ocfs2-devel/2010-September/007293.html

Patch looks ok, the shared size is requested by an option.

 Do you mean that checking the extent status of each file from userland is very expensive?

The most expensive part is IMO not in userspace; it's the in-memory lookups.

  And without any possibility to turn this off, I'm afraid this will render
  FIEMAP unusable in practice.
 For OCFS2, the FIEMAP_EXTENT_SHARED flag is set by the fiemap ioctl(2) when
 an extent is OCFS2_EXT_REFCOUNTED (i.e. reflinked or cloned), which means that
 FIEMAP_EXTENT_SHARED is not a persistent flag, but I have no idea how Btrfs
 handles this. :(

After some research, I think this could work for btrfs without
unwanted performance penalties.

There's the fiemap::fm_flags field that can be extended to request the
shared extent info from fiemap, so the information is not computed
unconditionally (that was my concern before). The rest is only
implementation detail: how to speed up the lookups from file extents to
their refcount info.

david
--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: How to find (out if) files sharing content?

2012-11-05 Thread Jeff Liu
On 11/06/2012 06:45 AM, David Sterba wrote:
 On Wed, Oct 31, 2012 at 09:02:15PM +0800, Jeff Liu wrote:
 I propose this because OCFS2 reports shared space this way, in combination with du(1).

 An old patch set to make du(1) aware of reflinked files:
 https://oss.oracle.com/pipermail/ocfs2-devel/2010-September/007293.html
 
 Patch looks ok, the shared size is requested by an option.
 
 Do you mean that checking the extent status of each file from userland is very expensive?
 
 The most expensive part is IMO not in userspace; it's the in-memory lookups.
 
 And without any possibility to turn this off, I'm afraid this will render
 FIEMAP unusable in practice.
 For OCFS2, the FIEMAP_EXTENT_SHARED flag is set by the fiemap ioctl(2) when
 an extent is OCFS2_EXT_REFCOUNTED (i.e. reflinked or cloned), which means that
 FIEMAP_EXTENT_SHARED is not a persistent flag, but I have no idea how Btrfs
 handles this. :(
 
 After some research, I think this could work for btrfs without
 unwanted performance penalties.
 
 There's the fiemap::fm_flags field that can be extended to request the
 shared extent info from fiemap, so the information is not computed
 unconditionally (that was my concern before). The rest is only
 implementation detail: how to speed up the lookups from file extents to
 their refcount info.
Thanks for your confirmation.

-Jeff
 
 david


Re: How to find (out if) files sharing content?

2012-10-31 Thread David Sterba
On Wed, Oct 31, 2012 at 10:30:22AM +0800, Jeff Liu wrote:
 One idea is to mark those cloned extents as FIEMAP_EXTENT_SHARED so that
 we can go through a file to figure out how many extents are shared
 through fiemap(2), and calculate the real storage (fs/subvolume) footprint
 in the end.

This will cost at least one more seek per extent to find out that the
extent is shared, could be quite expensive. And without any possibility
to turn this off, I'm afraid this will render FIEMAP unusable in
practice.

david


Re: How to find (out if) files sharing content?

2012-10-31 Thread Jeff Liu
On 10/31/2012 07:31 PM, David Sterba wrote:
 On Wed, Oct 31, 2012 at 10:30:22AM +0800, Jeff Liu wrote:
 One idea is to mark those cloned extents as FIEMAP_EXTENT_SHARED so that
 we can go through a file to figure out how many extents are shared
 through fiemap(2), and calculate the real storage (fs/subvolume) footprint
 in the end.
 
 This will cost at least one more seek per extent to find out that the
 extent is shared, could be quite expensive.
I propose this because OCFS2 reports shared space this way, in combination with du(1).

An old patch set to make du(1) aware of reflinked files:
https://oss.oracle.com/pipermail/ocfs2-devel/2010-September/007293.html
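For illustration, a du(1)-style total that counts shared space only once can be computed by merging the extents of all files by address. This is a minimal C sketch, not the patch linked above: struct prange and unique_bytes() are hypothetical names, and the input ranges are assumed to be (fe_physical, fe_length) pairs collected via FIEMAP for extents that have a real address.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical type: one extent's address range, e.g. from
 * struct fiemap_extent's fe_physical and fe_length. */
struct prange {
    uint64_t start;
    uint64_t len;
};

static int cmp_start(const void *x, const void *y)
{
    const struct prange *a = x, *b = y;

    return (a->start > b->start) - (a->start < b->start);
}

/* Hypothetical helper: bytes covered by the union of all ranges, so an
 * extent referenced by several files (or twice by one) counts once.
 * Note: sorts the caller's array in place. */
uint64_t unique_bytes(struct prange *r, size_t n)
{
    assert(r != NULL || n == 0);
    if (n == 0)
        return 0;
    qsort(r, n, sizeof *r, cmp_start);

    uint64_t total = 0;
    uint64_t cur_s = r[0].start, cur_e = r[0].start + r[0].len;

    for (size_t i = 1; i < n; i++) {
        uint64_t s = r[i].start, e = r[i].start + r[i].len;

        if (s <= cur_e) {            /* overlaps or touches: extend */
            if (e > cur_e)
                cur_e = e;
        } else {                     /* gap: close the current run */
            total += cur_e - cur_s;
            cur_s = s;
            cur_e = e;
        }
    }
    total += cur_e - cur_s;
    return total;
}
```

With the four ranges {8192, 4096}, {0, 4096}, {2048, 4096} and {8192, 4096}, the plain sum is 16384 bytes, while unique_bytes() returns 10240 because the overlapping and repeated ranges are counted once.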

Do you mean that checking the extent status of each file from userland is very
expensive? If so: I once tested a 50 GB OCFS2 partition filled with reflinked
files on an old laptop, and if I recall correctly it took around 4 minutes to
show the total results, but this definitely depends on the real-world scenario.

 And without any possibility to turn this off, I'm afraid this will render
 FIEMAP unusable in practice.
For OCFS2, the FIEMAP_EXTENT_SHARED flag is set by the fiemap ioctl(2) when an
extent is OCFS2_EXT_REFCOUNTED (i.e. reflinked or cloned), which means that
FIEMAP_EXTENT_SHARED is not a persistent flag, but I have no idea how Btrfs
handles this. :(

Thanks,
-Jeff
 
 david


How to find (out if) files sharing content?

2012-10-30 Thread Gábor Nyers
Hi,

How could one find out if 2 files share any extents on a btrfs file system?

A more generic variation of the above: How to list files on the same
file system/subvolume sharing content?

Thanks,
Gábor


Re: How to find (out if) files sharing content?

2012-10-30 Thread Hugo Mills
On Tue, Oct 30, 2012 at 04:20:05PM +0100, Gábor Nyers wrote:
 Hi,
 
 How could one find out if 2 files share any extents on a btrfs file system?
 
 A more generic variation of the above: How to list files on the same
 file system/subvolume sharing content?

   You have direct (read-only) access to the metadata trees through
the TREE_SEARCH ioctl. It should be possible to walk through the
extents of a given file, and (I think) follow back-refs from the
extent back to the other files that share it.

   There's no simple code to do that right now, though.

   Hugo.

-- 
=== Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk ===
  PGP key: 515C238D from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
  --- And what rough beast,  its hour come round at last / slouches ---  
 towards Bethlehem,  to be born? 




Re: How to find (out if) files sharing content?

2012-10-30 Thread Jan Schmidt
On Tue, October 30, 2012 at 16:39 (+0100), Hugo Mills wrote:
 It should be possible to walk through the
 extents of a given file, and (I think) follow back-refs from the
 extent back to the other files that share it.

You wish :-) Backrefs are not designed to be walked while the file system is
online. However, btrfs inspect logical manages quite well; at least I haven't
heard otherwise so far. You still need to get the logical block numbers, either
via the TREE_SEARCH ioctl or via filefrag.

-Jan


Re: How to find (out if) files sharing content?

2012-10-30 Thread Liu Bo
On 10/30/2012 11:20 PM, Gábor Nyers wrote:
 Hi,
 
 How could one find out if 2 files share any extents on a btrfs file system?
 
 A more generic variation of the above: How to list files on the same
 file system/subvolume sharing content?
 

Indeed, ocfs2 already has a feature that shows the shared parts via 'du';
we're planning to support this in btrfs, too.

thanks,
liubo

 Thanks,
 Gábor


Re: How to find (out if) files sharing content?

2012-10-30 Thread Jeff Liu
On 10/31/2012 08:40 AM, Liu Bo wrote:
 On 10/30/2012 11:20 PM, Gábor Nyers wrote:
 Hi,

 How could one find out if 2 files share any extents on a btrfs file system?

 A more generic variation of the above: How to list files on the same
 file system/subvolume sharing content?
One idea is to mark those cloned extents as FIEMAP_EXTENT_SHARED so that
we can go through a file to figure out how many extents are shared
through fiemap(2), and calculate the real storage (fs/subvolume) footprint
in the end.
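A minimal C sketch of the userspace half of this idea, assuming the filesystem sets FIEMAP_EXTENT_SHARED on shared extents (as described for OCFS2 in this thread; what btrfs should do is exactly what is being discussed). get_fiemap() and shared_bytes() are hypothetical helper names; FS_IOC_FIEMAP, struct fiemap and the FIEMAP_* flags are the real definitions from <linux/fs.h> and <linux/fiemap.h>.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <sys/ioctl.h>
#include <linux/fs.h>      /* FS_IOC_FIEMAP */
#include <linux/fiemap.h>  /* struct fiemap, FIEMAP_* flags */

/* Hypothetical helper: map up to max_extents extents of an open file.
 * Returns a malloc'd struct fiemap (free() it) or NULL on error.
 * If fm_mapped_extents == max_extents, more extents may remain and a
 * real tool would call again starting after the last returned extent. */
struct fiemap *get_fiemap(int fd, uint32_t max_extents)
{
    size_t size = sizeof(struct fiemap) +
                  (size_t)max_extents * sizeof(struct fiemap_extent);
    struct fiemap *fm = calloc(1, size);

    if (fm == NULL)
        return NULL;
    fm->fm_start = 0;
    fm->fm_length = FIEMAP_MAX_OFFSET;   /* map the whole file */
    fm->fm_flags = FIEMAP_FLAG_SYNC;     /* flush dirty data first */
    fm->fm_extent_count = max_extents;
    if (ioctl(fd, FS_IOC_FIEMAP, fm) < 0) {
        free(fm);
        return NULL;
    }
    return fm;
}

/* Hypothetical helper: bytes in extents flagged FIEMAP_EXTENT_SHARED. */
uint64_t shared_bytes(const struct fiemap_extent *ext, uint32_t n)
{
    uint64_t total = 0;

    assert(ext != NULL || n == 0);
    for (uint32_t i = 0; i < n; i++)
        if (ext[i].fe_flags & FIEMAP_EXTENT_SHARED)
            total += ext[i].fe_length;
    return total;
}
```

After open(2) on a file, shared_bytes(fm->fm_extents, fm->fm_mapped_extents) gives that file's shared byte count, which can be set against the sum of all fe_length values to estimate the file's exclusive footprint.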

Thanks,
-Jeff

 
 Indeed, ocfs2 already has a feature that shows the shared parts via 'du';
 we're planning to support this in btrfs, too.
 
 thanks,
 liubo
 
 Thanks,
 Gábor