Re: [PATCH v7 01/20] btrfs: dedup: Introduce dedup framework and its header
Nicholas D Steeves posted on Tue, 15 Mar 2016 18:08:41 -0400 as excerpted:

> I'm not sure to what degree the following is a relevant concern, and I'm
> guessing it's not, other than for laughs, but to me "dedupe" reads as
> "de-dupe" or "undupe". While it functions as the inverse of the verb
> "to dupe", I don't think one can "be unduped" or "be unfooled". What is
> that old aphorism? "Once duped twice shy"? ;-)

That's the obvious association, yes, and the negative connotations of
dupe are surely why I have such a personal negative reaction to dedupe.

But precedent and current usage being what they are...

--
Duncan - List replies preferred. No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master." Richard Stallman

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v7 01/20] btrfs: dedup: Introduce dedup framework and its header
On 13 March 2016 at 12:55, Duncan <1i5t5.dun...@cox.net> wrote:
> NeilBrown posted on Sun, 13 Mar 2016 22:33:22 +1100 as excerpted:
>
>> On Sun, Mar 13 2016, Qu Wenruo wrote:
>>
>>> BTW, I am always interested in, why de-duplication can be shorted as
>>> 'dedupe'.
>
>>> I didn't see any 'e' in the whole word "DUPlication".
>>> Or it's an abbreviation of "DUPlicatE" instead of "DUPlication"?
>>
>> The "u" in "duplicate" is pronounced as a long vowel sound, almost like
>> d-you-plicate.
>
>> To make a vowel long you can add an 'e' at the end of a word.
>
>> by analogy, "dupe" has a long "u" and so sounds like the first syllable
>> of "duplicate".
>
> As a native (USian but with some years growing up in the then recently
> independent former Crown colony of Kenya, influencing my personal
> preferences) English speaker, while what Neil says about short "u" vs.
> long "u" is correct, I agree with Qu that the "e" in dupe doesn't make so
> much sense, and would, other things being equal, vastly prefer dedup to
> dedupe, myself.
>
> However, there's some value in consistency, and given the previous dedupe
> precedent in-kernel, sticking to that for consistency reasons makes sense.
>
> But were this debate to have been about the original usage, I'd have
> definitely favored dedup all the way, as not withstanding Neil's argument
> above, adding the "e" makes little sense to me either. So only because
> it's already in use in kernel code, but if this /were/ the original
> kernel code...
>
> So I definitely understand your confusion, Qu, and have the same personal
> preference even as a native English speaker. =:^)

I'm not sure to what degree the following is a relevant concern, and I'm
guessing it's not, other than for laughs, but to me "dedupe" reads as
"de-dupe" or "undupe". While it functions as the inverse of the verb
"to dupe", I don't think one can "be unduped" or "be unfooled". What is
that old aphorism? "Once duped twice shy"? ;-)

Honestly I'm surprised that a verb-form of "tuple" hasn't yet emerged,
because if it had we might be saying "detup" instead of "dedup".

Best regards,
Nicholas
Re: [PATCH v7 01/20] btrfs: dedup: Introduce dedup framework and its header
NeilBrown posted on Sun, 13 Mar 2016 22:33:22 +1100 as excerpted:

> On Sun, Mar 13 2016, Qu Wenruo wrote:
>
>> BTW, I am always interested in, why de-duplication can be shorted as
>> 'dedupe'.

>> I didn't see any 'e' in the whole word "DUPlication".
>> Or it's an abbreviation of "DUPlicatE" instead of "DUPlication"?
>
> The "u" in "duplicate" is pronounced as a long vowel sound, almost like
> d-you-plicate.

> To make a vowel long you can add an 'e' at the end of a word.

> by analogy, "dupe" has a long "u" and so sounds like the first syllable
> of "duplicate".

As a native (USian but with some years growing up in the then recently
independent former Crown colony of Kenya, influencing my personal
preferences) English speaker, while what Neil says about short "u" vs.
long "u" is correct, I agree with Qu that the "e" in dupe doesn't make so
much sense, and would, other things being equal, vastly prefer dedup to
dedupe, myself.

However, there's some value in consistency, and given the previous dedupe
precedent in-kernel, sticking to that for consistency reasons makes sense.

But were this debate to have been about the original usage, I'd have
definitely favored dedup all the way, as not withstanding Neil's argument
above, adding the "e" makes little sense to me either. So only because
it's already in use in kernel code, but if this /were/ the original
kernel code...

So I definitely understand your confusion, Qu, and have the same personal
preference even as a native English speaker. =:^)

--
Duncan - List replies preferred. No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master." Richard Stallman
Re: [PATCH v7 01/20] btrfs: dedup: Introduce dedup framework and its header
On Sun, Mar 13 2016, Qu Wenruo wrote:

> Qu Wenruo wrote on 2016/03/12 16:16 +0800:
>>
>>
>> On 03/11/2016 07:43 PM, David Sterba wrote:
>>> On Thu, Mar 10, 2016 at 08:57:12AM +0800, Qu Wenruo wrote:
>>>>> The ioctl FIDEDUPERANGE and the tool duperemove both use "dupe".
>>>>> It would be nice if we could be consistent and all use the same
>>>>> abbreviation.
>>>>
>>>> Yes, current kernel VFS level offline dedup uses the name "dedupe".
>>>> But on the other hand, ZFS uses the name "dedup" for their online dedup.
>>>>
>>>> And personally speaking, I'd like some difference to distinguish inline
>>>> dedup and offline dedup.
>>>> In that case, the extra "e" seems somewhat useful.
>>>> With "e", it's intended for offline use. Without "e", it's intended for
>>>> online use.
>>>
>>> Such difference is very subtle and I think we should stick to just one
>>> spelling, which shall be 'dedupe'.
>>
>> OK, I'll change them to 'dedupe' in next bug fix version.
>>
>> Thanks,
>> Qu
>>
>>
> BTW, I am always interested in, why de-duplication can be shorted as
> 'dedupe'.

The "u" in "duplicate" is pronounced as a long vowel sound, almost like
d-you-plicate.
Normal pronunciation rules for English indicate that "dup" should be
pronounced with a short vowel sound, like "cup". So "dup" sounds wrong.

To make a vowel long you can add an 'e' at the end of a word.
So:
   tub or cub have a short "u"
   tube or cube have a long "u".
by analogy, "dupe" has a long "u" and so sounds like the first syllable
of "duplicate".

NeilBrown

>
> I didn't see any 'e' in the whole word "DUPlication".
> Or it's an abbreviation of "DUPlicatE" instead of "DUPlication"?
>
> Thanks,
> Qu
Re: [PATCH v7 01/20] btrfs: dedup: Introduce dedup framework and its header
Qu Wenruo wrote on 2016/03/12 16:16 +0800:
>
>
> On 03/11/2016 07:43 PM, David Sterba wrote:
>> On Thu, Mar 10, 2016 at 08:57:12AM +0800, Qu Wenruo wrote:
>>>> The ioctl FIDEDUPERANGE and the tool duperemove both use "dupe".
>>>> It would be nice if we could be consistent and all use the same
>>>> abbreviation.
>>>
>>> Yes, current kernel VFS level offline dedup uses the name "dedupe".
>>> But on the other hand, ZFS uses the name "dedup" for their online dedup.
>>>
>>> And personally speaking, I'd like some difference to distinguish inline
>>> dedup and offline dedup.
>>> In that case, the extra "e" seems somewhat useful.
>>> With "e", it's intended for offline use. Without "e", it's intended for
>>> online use.
>>
>> Such difference is very subtle and I think we should stick to just one
>> spelling, which shall be 'dedupe'.
>
> OK, I'll change them to 'dedupe' in next bug fix version.
>
> Thanks,
> Qu

BTW, I am always interested in, why de-duplication can be shorted as
'dedupe'.

I didn't see any 'e' in the whole word "DUPlication".
Or it's an abbreviation of "DUPlicatE" instead of "DUPlication"?

Thanks,
Qu
Re: [PATCH v7 01/20] btrfs: dedup: Introduce dedup framework and its header
On 03/11/2016 07:43 PM, David Sterba wrote:
> On Thu, Mar 10, 2016 at 08:57:12AM +0800, Qu Wenruo wrote:
>>> The ioctl FIDEDUPERANGE and the tool duperemove both use "dupe".
>>> It would be nice if we could be consistent and all use the same
>>> abbreviation.
>>
>> Yes, current kernel VFS level offline dedup uses the name "dedupe".
>> But on the other hand, ZFS uses the name "dedup" for their online dedup.
>>
>> And personally speaking, I'd like some difference to distinguish inline
>> dedup and offline dedup.
>> In that case, the extra "e" seems somewhat useful.
>> With "e", it's intended for offline use. Without "e", it's intended for
>> online use.
>
> Such difference is very subtle and I think we should stick to just one
> spelling, which shall be 'dedupe'.

OK, I'll change them to 'dedupe' in next bug fix version.

Thanks,
Qu
Re: [PATCH v7 01/20] btrfs: dedup: Introduce dedup framework and its header
On Thu, Mar 10, 2016 at 08:57:12AM +0800, Qu Wenruo wrote:
> > The ioctl FIDEDUPERANGE and the tool duperemove both use "dupe".
> > It would be nice if we could be consistent and all use the same
> > abbreviation.
>
> Yes, current kernel VFS level offline dedup uses the name "dedupe".
> But on the other hand, ZFS uses the name "dedup" for their online dedup.
>
> And personally speaking, I'd like some difference to distinguish inline
> dedup and offline dedup.
> In that case, the extra "e" seems somewhat useful.
> With "e", it's intended for offline use. Without "e", it's intended for
> online use.

Such difference is very subtle and I think we should stick to just one
spelling, which shall be 'dedupe'.
Re: [PATCH v7 01/20] btrfs: dedup: Introduce dedup framework and its header
NeilBrown wrote on 2016/03/10 08:27 +1100:
> On Thu, Feb 18 2016, Qu Wenruo wrote:
>
>> +
>> +/*
>> + * Dedup storage backend
>> + * On disk is persist storage but overhead is large
>> + * In memory is fast but will lose all its hash on umount
>> + */
>> +#define BTRFS_DEDUP_BACKEND_INMEMORY		0
>> +#define BTRFS_DEDUP_BACKEND_ONDISK		1
>> +#define BTRFS_DEDUP_BACKEND_LAST		2
>
> Hi,
> This may seem petty, but I'm here to complain about the names. :-)

Any complaint is better than no complaint. :)

>
> Firstly, "2" is *not* the LAST backend. The LAST backend is clearly
> "ONDISK" which is "1". "2" is the number of backends, or the count of
> them.
> So
>> +#define BTRFS_DEDUP_BACKEND_LAST		1
> would be OK, as would
>> +#define BTRFS_DEDUP_BACKEND_COUNT		2
> but what you have is wrong.
> The place where you use this define:
>
> +	if (backend >= BTRFS_DEDUP_BACKEND_LAST)
> +		return -EINVAL;
>
> is correct, but it looks wrong. It looks like it is saying that it is
> invalid to use the LAST backend!

Makes sense, I'll use BACKEND_COUNT as the name.

>
> Secondly, you use "dup" as an abbreviation of "duplicate".
> The ioctl FIDEDUPERANGE and the tool duperemove both use "dupe".
> It would be nice if we could be consistent and all use the same
> abbreviation.

Yes, current kernel VFS level offline dedup uses the name "dedupe".
But on the other hand, ZFS uses the name "dedup" for their online dedup.

And personally speaking, I'd like some difference to distinguish inline
dedup and offline dedup.
In that case, the extra "e" seems somewhat useful.
With "e", it's intended for offline use. Without "e", it's intended for
online use.

Thanks,
Qu

>
> Thanks,
> NeilBrown
Re: [PATCH v7 01/20] btrfs: dedup: Introduce dedup framework and its header
On Thu, Feb 18 2016, Qu Wenruo wrote:

> +
> +/*
> + * Dedup storage backend
> + * On disk is persist storage but overhead is large
> + * In memory is fast but will lose all its hash on umount
> + */
> +#define BTRFS_DEDUP_BACKEND_INMEMORY		0
> +#define BTRFS_DEDUP_BACKEND_ONDISK		1
> +#define BTRFS_DEDUP_BACKEND_LAST		2

Hi,
This may seem petty, but I'm here to complain about the names. :-)

Firstly, "2" is *not* the LAST backend. The LAST backend is clearly
"ONDISK" which is "1". "2" is the number of backends, or the count of
them.
So
> +#define BTRFS_DEDUP_BACKEND_LAST		1
would be OK, as would
> +#define BTRFS_DEDUP_BACKEND_COUNT		2
but what you have is wrong.
The place where you use this define:

+	if (backend >= BTRFS_DEDUP_BACKEND_LAST)
+		return -EINVAL;

is correct, but it looks wrong. It looks like it is saying that it is
invalid to use the LAST backend!

Secondly, you use "dup" as an abbreviation of "duplicate".
The ioctl FIDEDUPERANGE and the tool duperemove both use "dupe".
It would be nice if we could be consistent and all use the same
abbreviation.

Thanks,
NeilBrown
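For illustration only, here is a minimal sketch of the COUNT naming suggested in the message above. It assumes the backends are expressed as an enum rather than the patch's #defines; the enum name and the helper dedup_check_backend() are hypothetical and do not appear in the patch.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical enum form of the patch's backend #defines. */
enum btrfs_dedup_backend {
	BTRFS_DEDUP_BACKEND_INMEMORY = 0,
	BTRFS_DEDUP_BACKEND_ONDISK = 1,
	/* Number of backends; not itself a usable backend. */
	BTRFS_DEDUP_BACKEND_COUNT
};

/* The count follows the last real backend automatically. */
static_assert(BTRFS_DEDUP_BACKEND_COUNT == 2, "two backends are defined");

/*
 * Hypothetical helper: returns 0 for a known backend and -EINVAL for
 * anything at or beyond the count, mirroring the check quoted above.
 */
static int dedup_check_backend(unsigned int backend)
{
	if (backend >= BTRFS_DEDUP_BACKEND_COUNT)
		return -EINVAL;
	return 0;
}
```

With this naming the range check reads as "reject anything at or past the number of backends", so it no longer looks as if the last valid backend is being refused.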
[PATCH v7 01/20] btrfs: dedup: Introduce dedup framework and its header
From: Wang Xiaoguang

Introduce the header for btrfs online(write time) de-duplication
framework and needed header.

The new de-duplication framework is going to support 2 different dedup
method and 1 dedup hash.

Signed-off-by: Qu Wenruo
Signed-off-by: Wang Xiaoguang
---
 fs/btrfs/ctree.h   |   5 +++
 fs/btrfs/dedup.h   | 127 +
 fs/btrfs/disk-io.c |   1 +
 3 files changed, 133 insertions(+)
 create mode 100644 fs/btrfs/dedup.h

diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index bc6a87e..094db5c 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -1866,6 +1866,11 @@ struct btrfs_fs_info {
 	struct list_head pinned_chunks;
 
 	int creating_free_space_tree;
+
+	/* Inband de-duplication related structures*/
+	unsigned int dedup_enabled:1;
+	struct btrfs_dedup_info *dedup_info;
+	struct mutex dedup_ioctl_lock;
 };
 
 struct btrfs_subvolume_writers {
diff --git a/fs/btrfs/dedup.h b/fs/btrfs/dedup.h
new file mode 100644
index 000..8e1ff03
--- /dev/null
+++ b/fs/btrfs/dedup.h
@@ -0,0 +1,127 @@
+/*
+ * Copyright (C) 2015 Fujitsu. All rights reserved.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public
+ * License v2 as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ * General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public
+ * License along with this program; if not, write to the
+ * Free Software Foundation, Inc., 59 Temple Place - Suite 330,
+ * Boston, MA 021110-1307, USA.
+ */
+
+#ifndef __BTRFS_DEDUP__
+#define __BTRFS_DEDUP__
+
+#include
+#include
+#include
+
+/*
+ * Dedup storage backend
+ * On disk is persist storage but overhead is large
+ * In memory is fast but will lose all its hash on umount
+ */
+#define BTRFS_DEDUP_BACKEND_INMEMORY		0
+#define BTRFS_DEDUP_BACKEND_ONDISK		1
+#define BTRFS_DEDUP_BACKEND_LAST		2
+
+/* Dedup block size limit and default value */
+#define BTRFS_DEDUP_BLOCKSIZE_MAX	(8 * 1024 * 1024)
+#define BTRFS_DEDUP_BLOCKSIZE_MIN	(16 * 1024)
+#define BTRFS_DEDUP_BLOCKSIZE_DEFAULT	(128 * 1024)
+
+/* Hash algorithm, only support SHA256 yet */
+#define BTRFS_DEDUP_HASH_SHA256		0
+
+static int btrfs_dedup_sizes[] = { 32 };
+
+/*
+ * For caller outside of dedup.c
+ *
+ * Different dedup backends should have their own hash structure
+ */
+struct btrfs_dedup_hash {
+	u64 bytenr;
+	u32 num_bytes;
+
+	/* last field is a variable length array of dedup hash */
+	u8 hash[];
+};
+
+struct btrfs_dedup_info {
+	/* dedup blocksize */
+	u64 blocksize;
+	u16 backend;
+	u16 hash_type;
+
+	struct crypto_shash *dedup_driver;
+	struct mutex lock;
+
+	/* following members are only used in in-memory dedup mode */
+	struct rb_root hash_root;
+	struct rb_root bytenr_root;
+	struct list_head lru_list;
+	u64 limit_nr;
+	u64 current_nr;
+};
+
+struct btrfs_trans_handle;
+
+int btrfs_dedup_hash_size(u16 type);
+struct btrfs_dedup_hash *btrfs_dedup_alloc_hash(u16 type);
+
+/*
+ * Initial inband dedup info
+ * Called at dedup enable time.
+ */
+int btrfs_dedup_enable(struct btrfs_fs_info *fs_info, u16 type, u16 backend,
+		       u64 blocksize, u64 limit_nr);
+
+/*
+ * Disable dedup and invalidate all its dedup data.
+ * Called at dedup disable time.
+ */
+int btrfs_dedup_disable(struct btrfs_fs_info *fs_info);
+
+/*
+ * Calculate hash for dedup.
+ * Caller must ensure [start, start + dedup_bs) has valid data.
+ */
+int btrfs_dedup_calc_hash(struct btrfs_fs_info *fs_info,
+			  struct inode *inode, u64 start,
+			  struct btrfs_dedup_hash *hash);
+
+/*
+ * Search for duplicated extents by calculated hash
+ * Caller must call btrfs_dedup_calc_hash() first to get the hash.
+ *
+ * @inode: the inode for we are writing
+ * @file_pos: offset inside the inode
+ * As we will increase extent ref immediately after a hash match,
+ * we need @file_pos and @inode in this case.
+ *
+ * Return > 0 for a hash match, and the extent ref will be
+ * *INCREASED*, and hash->bytenr/num_bytes will record the existing
+ * extent data.
+ * Return 0 for a hash miss. Nothing is done
+ */
+int btrfs_dedup_search(struct btrfs_fs_info *fs_info,
+		       struct inode *inode, u64 file_pos,
+		       struct btrfs_dedup_hash *hash);
+
+/* Add a dedup hash into dedup info */
+int btrfs_dedup_add(struct btrfs_trans_handle *trans,
+		    struct btrfs_fs_info *fs_info,
+		    struct btrfs_dedup_hash *hash);
+
+/* Remove a dedup hash from dedup info */
+int btrfs_dedup