Re: [PATCH v7 01/20] btrfs: dedup: Introduce dedup framework and its header

2016-03-15 Thread Duncan
Nicholas D Steeves posted on Tue, 15 Mar 2016 18:08:41 -0400 as excerpted:

> I'm not sure to what degree the following is a relevant concern, and I'm
> guessing it's not, other than for laughs, but to me "dedupe" reads as
> "de-dupe" or "undupe".  While it functions as the inverse of the verb
> "to dupe", I don't think one can "be unduped" or "be unfooled". What is
> that old aphorism?  "Once duped twice shy"? ;-)

That's the obvious association, yes, and the negative connotations of 
dupe are surely why I have such a personal negative reaction to dedupe.  
But precedent and current usage being what they are...

-- 
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v7 01/20] btrfs: dedup: Introduce dedup framework and its header

2016-03-15 Thread Nicholas D Steeves
On 13 March 2016 at 12:55, Duncan <1i5t5.dun...@cox.net> wrote:
> NeilBrown posted on Sun, 13 Mar 2016 22:33:22 +1100 as excerpted:
>
>> On Sun, Mar 13 2016, Qu Wenruo wrote:
>>
>>> BTW, I am always interested in why de-duplication can be shortened to
>>> 'dedupe'.
>
>>> I didn't see any 'e' in the whole word "DUPlication".
>>> Or it's an abbreviation of "DUPlicatE" instead of "DUPlication"?
>>
>> The "u" in "duplicate" is pronounced as a long vowel sound, almost like
>> d-you-plicate.
>
>> To make a vowel long you can add an 'e' at the end of a word.
>
>> by analogy, "dupe" has a long "u" and so sounds like the first syllable
>> of "duplicate".
>
> As a native (USian but with some years growing up in the then recently
> independent former Crown colony of Kenya, influencing my personal
> preferences) English speaker, while what Neil says about short "u" vs.
> long "u" is correct, I agree with Qu that the "e" in dupe doesn't make so
> much sense, and would, other things being equal, vastly prefer dedup to
> dedupe, myself.
>
> However, there's some value in consistency, and given the previous dedupe
> precedent in-kernel, sticking to that for consistency reasons makes sense.
>
> But were this debate to have been about the original usage, I'd have
> definitely favored dedup all the way, as notwithstanding Neil's argument
> above, adding the "e" makes little sense to me either.  So only because
> it's already in use in kernel code, but if this /were/ the original
> kernel code...
>
> So I definitely understand your confusion, Qu, and have the same personal
> preference even as a native English speaker. =:^)

I'm not sure to what degree the following is a relevant concern, and
I'm guessing it's not, other than for laughs, but to me "dedupe" reads
as "de-dupe" or "undupe".  While it functions as the inverse of the
verb "to dupe", I don't think one can "be unduped" or "be unfooled".
What is that old aphorism?  "Once duped twice shy"? ;-)

Honestly I'm surprised that a verb-form of "tuple" hasn't yet emerged,
because if it had we might be saying "detup" instead of "dedup".

Best regards,
Nicholas


Re: [PATCH v7 01/20] btrfs: dedup: Introduce dedup framework and its header

2016-03-13 Thread Duncan
NeilBrown posted on Sun, 13 Mar 2016 22:33:22 +1100 as excerpted:

> On Sun, Mar 13 2016, Qu Wenruo wrote:
> 
>> BTW, I am always interested in why de-duplication can be shortened to
>> 'dedupe'.

>> I didn't see any 'e' in the whole word "DUPlication".
>> Or it's an abbreviation of "DUPlicatE" instead of "DUPlication"?
> 
> The "u" in "duplicate" is pronounced as a long vowel sound, almost like
> d-you-plicate.

> To make a vowel long you can add an 'e' at the end of a word.

> by analogy, "dupe" has a long "u" and so sounds like the first syllable
> of "duplicate".

As a native (USian but with some years growing up in the then recently 
independent former Crown colony of Kenya, influencing my personal 
preferences) English speaker, while what Neil says about short "u" vs. 
long "u" is correct, I agree with Qu that the "e" in dupe doesn't make so 
much sense, and would, other things being equal, vastly prefer dedup to 
dedupe, myself.

However, there's some value in consistency, and given the previous dedupe 
precedent in-kernel, sticking to that for consistency reasons makes sense.

But were this debate to have been about the original usage, I'd have 
definitely favored dedup all the way, as notwithstanding Neil's argument 
above, adding the "e" makes little sense to me either.  So only because 
it's already in use in kernel code, but if this /were/ the original 
kernel code...

So I definitely understand your confusion, Qu, and have the same personal 
preference even as a native English speaker. =:^)




Re: [PATCH v7 01/20] btrfs: dedup: Introduce dedup framework and its header

2016-03-13 Thread NeilBrown
On Sun, Mar 13 2016, Qu Wenruo wrote:

> Qu Wenruo wrote on 2016/03/12 16:16 +0800:
>>
>>
>> On 03/11/2016 07:43 PM, David Sterba wrote:
>>> On Thu, Mar 10, 2016 at 08:57:12AM +0800, Qu Wenruo wrote:
>>>>> The ioctl FIDEDUPERANGE and the tool duperemove both use "dupe".
>>>>> It would be nice if we could be consistent and all use the same
>>>>> abbreviation.
>>>>
>>>> Yes, current kernel VFS level offline dedup uses the name "dedupe".
>>>> But on the other hand, ZFS uses the name "dedup" for their online dedup.
>>>>
>>>> And personally speaking, I'd like some difference to distinguish inline
>>>> dedup and offline dedup.
>>>> In that case, the extra "e" seems somewhat useful.
>>>> With "e", it's intended for offline use. Without "e", it's intended for
>>>> online use.
>>>
>>> Such difference is very subtle and I think we should stick to just one
>>> spelling, which shall be 'dedupe'.
>>
>> OK, I'll change them to 'dedupe' in next bug fix version.
>>
>> Thanks,
>> Qu
>>
>>
> BTW, I am always interested in why de-duplication can be shortened to
> 'dedupe'.

The "u" in "duplicate" is pronounced as a long vowel sound, almost like
   d-you-plicate.

Normal pronunciation rules for English indicate that "dup" should be
pronounced with a short vowel sound, like "cup".  So "dup" sounds wrong.

To make a vowel long you can add an 'e' at the end of a word.
So:
   tub or cub  have a short "u"
   tube or cube  have a long "u".

by analogy, "dupe" has a long "u" and so sounds like the first syllable
of "duplicate".

NeilBrown


>
> I didn't see any 'e' in the whole word "DUPlication".
> Or it's an abbreviation of "DUPlicatE" instead of "DUPlication"?
>
> Thanks,
> Qu
>
>
>




Re: [PATCH v7 01/20] btrfs: dedup: Introduce dedup framework and its header

2016-03-12 Thread Qu Wenruo



Qu Wenruo wrote on 2016/03/12 16:16 +0800:
>
> On 03/11/2016 07:43 PM, David Sterba wrote:
>> On Thu, Mar 10, 2016 at 08:57:12AM +0800, Qu Wenruo wrote:
>>>> The ioctl FIDEDUPERANGE and the tool duperemove both use "dupe".
>>>> It would be nice if we could be consistent and all use the same
>>>> abbreviation.
>>>
>>> Yes, current kernel VFS level offline dedup uses the name "dedupe".
>>> But on the other hand, ZFS uses the name "dedup" for their online dedup.
>>>
>>> And personally speaking, I'd like some difference to distinguish inline
>>> dedup and offline dedup.
>>> In that case, the extra "e" seems somewhat useful.
>>> With "e", it's intended for offline use. Without "e", it's intended for
>>> online use.
>>
>> Such difference is very subtle and I think we should stick to just one
>> spelling, which shall be 'dedupe'.
>
> OK, I'll change them to 'dedupe' in next bug fix version.
>
> Thanks,
> Qu

BTW, I am always interested in why de-duplication can be shortened to
'dedupe'.

I didn't see any 'e' in the whole word "DUPlication".
Or it's an abbreviation of "DUPlicatE" instead of "DUPlication"?

Thanks,
Qu





Re: [PATCH v7 01/20] btrfs: dedup: Introduce dedup framework and its header

2016-03-12 Thread Qu Wenruo



On 03/11/2016 07:43 PM, David Sterba wrote:
> On Thu, Mar 10, 2016 at 08:57:12AM +0800, Qu Wenruo wrote:
>>> The ioctl FIDEDUPERANGE and the tool duperemove both use "dupe".
>>> It would be nice if we could be consistent and all use the same
>>> abbreviation.
>>
>> Yes, current kernel VFS level offline dedup uses the name "dedupe".
>> But on the other hand, ZFS uses the name "dedup" for their online dedup.
>>
>> And personally speaking, I'd like some difference to distinguish inline
>> dedup and offline dedup.
>> In that case, the extra "e" seems somewhat useful.
>> With "e", it's intended for offline use. Without "e", it's intended for
>> online use.
>
> Such difference is very subtle and I think we should stick to just one
> spelling, which shall be 'dedupe'.

OK, I'll change them to 'dedupe' in next bug fix version.

Thanks,
Qu


Re: [PATCH v7 01/20] btrfs: dedup: Introduce dedup framework and its header

2016-03-11 Thread David Sterba
On Thu, Mar 10, 2016 at 08:57:12AM +0800, Qu Wenruo wrote:
> > The ioctl FIDEDUPERANGE and the tool duperemove both use "dupe".
> > It would be nice if we could be consistent and all use the same
> > abbreviation.
> 
> Yes, current kernel VFS level offline dedup uses the name "dedupe".
> But on the other hand, ZFS uses the name "dedup" for their online dedup.
> 
> And personally speaking, I'd like some difference to distinguish inline 
> dedup and offline dedup.
> In that case, the extra "e" seems somewhat useful.
> With "e", it's intended for offline use. Without "e", it's intended for 
> online use.

Such difference is very subtle and I think we should stick to just one
spelling, which shall be 'dedupe'.


Re: [PATCH v7 01/20] btrfs: dedup: Introduce dedup framework and its header

2016-03-09 Thread Qu Wenruo



NeilBrown wrote on 2016/03/10 08:27 +1100:
> On Thu, Feb 18 2016, Qu Wenruo wrote:
>
>> +
>> +/*
>> + * Dedup storage backend
>> + * On disk is persist storage but overhead is large
>> + * In memory is fast but will lose all its hash on umount
>> + */
>> +#define BTRFS_DEDUP_BACKEND_INMEMORY   0
>> +#define BTRFS_DEDUP_BACKEND_ONDISK 1
>> +#define BTRFS_DEDUP_BACKEND_LAST   2
>
> Hi,
>
> This may seem petty, but I'm here to complain about the names. :-)

Any complaint is better than no complaint. :)

> Firstly, "2" is *not* the LAST backend.  The LAST backend is clearly
> "ONDISK" which is "1".
> "2" is the number of backends, or the count of them.
> So
>
>> +#define BTRFS_DEDUP_BACKEND_LAST   1
>
> would be OK, as would
>
>> +#define BTRFS_DEDUP_BACKEND_COUNT  2
>
> but what you have is wrong.
>
> The place where you use this define:
>
> +   if (backend >= BTRFS_DEDUP_BACKEND_LAST)
> +           return -EINVAL;
>
> is correct, but it looks wrong.  It looks like it is saying that it is
> invalid to use the LAST backend!

Makes sense, I'll use BACKEND_COUNT as the name.

> Secondly, you use "dup" as an abbreviation of "duplicate".
> The ioctl FIDEDUPERANGE and the tool duperemove both use "dupe".
> It would be nice if we could be consistent and all use the same
> abbreviation.

Yes, current kernel VFS level offline dedup uses the name "dedupe".
But on the other hand, ZFS uses the name "dedup" for their online dedup.

And personally speaking, I'd like some difference to distinguish inline
dedup and offline dedup.

In that case, the extra "e" seems somewhat useful.
With "e", it's intended for offline use. Without "e", it's intended for
online use.

Thanks,
Qu

> Thanks,
> NeilBrown






Re: [PATCH v7 01/20] btrfs: dedup: Introduce dedup framework and its header

2016-03-09 Thread NeilBrown
On Thu, Feb 18 2016, Qu Wenruo wrote:

> +
> +/*
> + * Dedup storage backend
> + * On disk is persist storage but overhead is large
> + * In memory is fast but will lose all its hash on umount
> + */
> +#define BTRFS_DEDUP_BACKEND_INMEMORY 0
> +#define BTRFS_DEDUP_BACKEND_ONDISK   1
> +#define BTRFS_DEDUP_BACKEND_LAST 2

Hi,

This may seem petty, but I'm here to complain about the names. :-)

Firstly, "2" is *not* the LAST backend.  The LAST backend is clearly
"ONDISK" which is "1".
"2" is the number of backends, or the count of them.
So
> +#define BTRFS_DEDUP_BACKEND_LAST 1

would be OK, as would

> +#define BTRFS_DEDUP_BACKEND_COUNT 2

but what you have is wrong.

The place where you use this define:

+   if (backend >= BTRFS_DEDUP_BACKEND_LAST)
+           return -EINVAL;

is correct, but it looks wrong.  It looks like it is saying that it is
invalid to use the LAST backend!
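
To illustrate the naming point, here is a minimal sketch, assuming the
define is renamed to COUNT.  The helper name dedup_check_backend() is
hypothetical and does not appear in the patch; only the three define
values and the `>=` check come from the text above.

```c
#include <assert.h>
#include <errno.h>

/* Backend ids as in the patch, with LAST renamed to COUNT. */
#define BTRFS_DEDUP_BACKEND_INMEMORY	0
#define BTRFS_DEDUP_BACKEND_ONDISK	1
#define BTRFS_DEDUP_BACKEND_COUNT	2	/* number of backends, not an id */

/* COUNT must stay one past the highest valid backend id. */
static_assert(BTRFS_DEDUP_BACKEND_ONDISK == BTRFS_DEDUP_BACKEND_COUNT - 1,
	      "COUNT must be one past the last backend");

/* Hypothetical helper, not part of the patch: validate a backend id. */
static int dedup_check_backend(unsigned int backend)
{
	/* Reads naturally: any id at or beyond the count is invalid. */
	if (backend >= BTRFS_DEDUP_BACKEND_COUNT)
		return -EINVAL;
	return 0;
}
```

With COUNT the comparison says what it means, and adding a third
backend only requires bumping the count.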

Secondly, you use "dup" as an abbreviation of "duplicate".
The ioctl FIDEDUPERANGE and the tool duperemove both use "dupe".
It would be nice if we could be consistent and all use the same
abbreviation.

Thanks,
NeilBrown




[PATCH v7 01/20] btrfs: dedup: Introduce dedup framework and its header

2016-02-17 Thread Qu Wenruo
From: Wang Xiaoguang 

Introduce the btrfs online (write time) de-duplication framework and
the header it needs.

The new de-duplication framework is going to support 2 different dedup
methods and 1 dedup hash.

Signed-off-by: Qu Wenruo 
Signed-off-by: Wang Xiaoguang 
---
 fs/btrfs/ctree.h   |   5 +++
 fs/btrfs/dedup.h   | 127 +
 fs/btrfs/disk-io.c |   1 +
 3 files changed, 133 insertions(+)
 create mode 100644 fs/btrfs/dedup.h

diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index bc6a87e..094db5c 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -1866,6 +1866,11 @@ struct btrfs_fs_info {
    struct list_head pinned_chunks;
 
    int creating_free_space_tree;
+
+   /* Inband de-duplication related structures*/
+   unsigned int dedup_enabled:1;
+   struct btrfs_dedup_info *dedup_info;
+   struct mutex dedup_ioctl_lock;
 };
 
 struct btrfs_subvolume_writers {
diff --git a/fs/btrfs/dedup.h b/fs/btrfs/dedup.h
new file mode 100644
index 000..8e1ff03
--- /dev/null
+++ b/fs/btrfs/dedup.h
@@ -0,0 +1,127 @@
+/*
+ * Copyright (C) 2015 Fujitsu.  All rights reserved.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public
+ * License v2 as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public
+ * License along with this program; if not, write to the
+ * Free Software Foundation, Inc., 59 Temple Place - Suite 330,
+ * Boston, MA 021110-1307, USA.
+ */
+
+#ifndef __BTRFS_DEDUP__
+#define __BTRFS_DEDUP__
+
+#include 
+#include 
+#include 
+
+/*
+ * Dedup storage backend
+ * On disk is persist storage but overhead is large
+ * In memory is fast but will lose all its hash on umount
+ */
+#define BTRFS_DEDUP_BACKEND_INMEMORY   0
+#define BTRFS_DEDUP_BACKEND_ONDISK 1
+#define BTRFS_DEDUP_BACKEND_LAST   2
+
+/* Dedup block size limit and default value */
+#define BTRFS_DEDUP_BLOCKSIZE_MAX  (8 * 1024 * 1024)
+#define BTRFS_DEDUP_BLOCKSIZE_MIN  (16 * 1024)
+#define BTRFS_DEDUP_BLOCKSIZE_DEFAULT  (128 * 1024)
+
+/* Hash algorithm, only support SHA256 yet */
+#define BTRFS_DEDUP_HASH_SHA256        0
+
+static int btrfs_dedup_sizes[] = { 32 };
+
+/*
+ * For caller outside of dedup.c
+ *
+ * Different dedup backends should have their own hash structure
+ */
+struct btrfs_dedup_hash {
+   u64 bytenr;
+   u32 num_bytes;
+
+   /* last field is a variable length array of dedup hash */
+   u8 hash[];
+};
+
+struct btrfs_dedup_info {
+   /* dedup blocksize */
+   u64 blocksize;
+   u16 backend;
+   u16 hash_type;
+
+   struct crypto_shash *dedup_driver;
+   struct mutex lock;
+
+   /* following members are only used in in-memory dedup mode */
+   struct rb_root hash_root;
+   struct rb_root bytenr_root;
+   struct list_head lru_list;
+   u64 limit_nr;
+   u64 current_nr;
+};
+
+struct btrfs_trans_handle;
+
+int btrfs_dedup_hash_size(u16 type);
+struct btrfs_dedup_hash *btrfs_dedup_alloc_hash(u16 type);
+
+/*
+ * Initial inband dedup info
+ * Called at dedup enable time.
+ */
+int btrfs_dedup_enable(struct btrfs_fs_info *fs_info, u16 type, u16 backend,
+  u64 blocksize, u64 limit_nr);
+
+/*
+ * Disable dedup and invalidate all its dedup data.
+ * Called at dedup disable time.
+ */
+int btrfs_dedup_disable(struct btrfs_fs_info *fs_info);
+
+/*
+ * Calculate hash for dedup.
+ * Caller must ensure [start, start + dedup_bs) has valid data.
+ */
+int btrfs_dedup_calc_hash(struct btrfs_fs_info *fs_info,
+ struct inode *inode, u64 start,
+ struct btrfs_dedup_hash *hash);
+
+/*
+ * Search for duplicated extents by calculated hash
+ * Caller must call btrfs_dedup_calc_hash() first to get the hash.
+ *
+ * @inode: the inode for we are writing
+ * @file_pos: offset inside the inode
+ * As we will increase extent ref immediately after a hash match,
+ * we need @file_pos and @inode in this case.
+ *
+ * Return > 0 for a hash match, and the extent ref will be
+ * *INCREASED*, and hash->bytenr/num_bytes will record the existing
+ * extent data.
+ * Return 0 for a hash miss. Nothing is done
+ */
+int btrfs_dedup_search(struct btrfs_fs_info *fs_info,
+  struct inode *inode, u64 file_pos,
+  struct btrfs_dedup_hash *hash);
+
+/* Add a dedup hash into dedup info */
+int btrfs_dedup_add(struct btrfs_trans_handle *trans,
+   struct btrfs_fs_info *fs_info,
+   struct btrfs_dedup_hash *hash);
+
+/* Remove a dedup hash from dedup info */
+int btrfs_dedup