Chris Mason wrote:
On Thu, Jun 04, 2009 at 10:49:19AM +0200, Thomas Glanzmann wrote:
Hello Chris,
My question is now, how often can a block in btrfs be referenced?
The exact answer depends on if we are referencing it from a single
file or from multiple files. But either way it is roughly
On Fri, Jun 05, 2009 at 02:20:48PM +0200, Tomasz Chmielewski wrote:
Chris Mason wrote:
On Thu, Jun 04, 2009 at 10:49:19AM +0200, Thomas Glanzmann wrote:
Hello Chris,
My question is now, how often can a block in btrfs be referenced?
The exact answer depends on if we are referencing it from
Chris Mason wrote:
I wonder how well deduplication would work with defragmentation? One
excludes the other to some extent.
Very much so ;) Ideally we end up doing dedup in large extents, but it
will definitely increase the overall fragmentation of the FS.
Defragmentation could lead to
Hello Chris,
My question is now, how often can a block in btrfs be referenced?
The exact answer depends on if we are referencing it from a single
file or from multiple files. But either way it is roughly 2^32.
could you please explain to me what underlying datastructure is used to
Hello Heinz,
Hi, during the last half year I thought a little bit about doing dedup
for my backup program: not only with fixed blocks (which is
implemented), but with moving blocks (with all offsets in a file: 1
byte, 2 byte, ...). That means I have to do *lots* of comparisons
(size of
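Matching blocks at every byte offset, as described above, is usually done with a rolling checksum so the window can slide one byte at a time in O(1) instead of rehashing the whole block. A hedged sketch in the rsync/Adler-32 style; all names and constants here are illustrative, not taken from storeBackup:

```python
# Minimal rolling-checksum sketch (rsync/Adler-32 style) for matching
# blocks at every byte offset without rehashing the whole window.
# BLOCK and MOD are illustrative choices, not from any particular tool.

BLOCK = 4096
MOD = 1 << 16

def weak_sum(data):
    # a = plain byte sum, b = positionally weighted sum (Fletcher style)
    a = sum(data) % MOD
    b = sum((len(data) - i) * c for i, c in enumerate(data)) % MOD
    return (b << 16) | a

def roll(old_sum, out_byte, in_byte, length=BLOCK):
    # Slide the window one byte: drop out_byte, append in_byte, O(1) update.
    a = old_sum & 0xFFFF
    b = old_sum >> 16
    a = (a - out_byte + in_byte) % MOD
    b = (b - length * out_byte + a) % MOD
    return (b << 16) | a
```

A scanner would roll this sum across the file and consult a table of known block checksums at each offset, falling back to a strong hash only on a weak match.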
Heinz-Josef Claes wrote (ao):
Am Dienstag, 28. April 2009 19:38:24 schrieb Chris Mason:
On Tue, 2009-04-28 at 19:34 +0200, Thomas Glanzmann wrote:
Hello,
I wouldn't rely on crc32: it is not a strong hash.
Such deduplication can lead to various problems,
including security
On Tue, 5 May 2009 07:29:45 +1000
Dmitri Nikulin dniku...@gmail.com wrote:
On Tue, May 5, 2009 at 5:11 AM, Heinz-Josef Claes hjcl...@web.de wrote:
Hi, during the last half year I thought a little bit about doing dedup for
my backup program: not only with fixed blocks (which is implemented),
Hello Jan,
* Jan-Frode Myklebust janfr...@tanso.net [090504 20:20]:
"thin or shallow clones" sounds more like sparse images. I believe
"linked clones" is the term for running multiple virtual machines off
a single gold image. Ref, the VMware View Composer section of:
not exactly. VMware has one
Ric Wheeler schrieb:
One thing in the above scheme that would be really interesting for all
possible hash functions is maintaining good stats on hash collisions,
effectiveness of the hash, etc. There has been a lot of press about MD5
hash collisions for example - it would be really neat to be
Hello Ric,
(1) Block level or file level dedup?
what is the difference between the two?
(2) Inband dedup (during a write) or background dedup?
I think inband dedup is way too intensive on resources (memory) and also
would kill every performance benchmark. So I think the offline dedup is
the
Thomas Glanzmann wrote:
Hello Ric,
(1) Block level or file level dedup?
what is the difference between the two?
(2) Inband dedup (during a write) or background dedup?
I think inband dedup is way too intensive on resources (memory) and also
would kill every performance benchmark. So I
On 05/04/2009 10:39 AM, Tomasz Chmielewski wrote:
Ric Wheeler schrieb:
One thing in the above scheme that would be really interesting for
all possible hash functions is maintaining good stats on hash
collisions, effectiveness of the hash, etc. There has been a lot of
press about MD5 hash
On 04/28/2009 01:41 PM, Michael Tharp wrote:
Thomas Glanzmann wrote:
no, I just used the md5 checksum. And even if I have a hash collision,
which is highly unlikely, it still gives a good ballpark figure.
I'd start with a crc32 and/or MD5 to find candidate blocks, then do a
bytewise comparison
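The two-stage scheme described here, a weak checksum to nominate candidate blocks followed by a bytewise comparison to confirm them, can be sketched as follows (function names are mine):

```python
# Sketch of the two-stage scheme: a cheap checksum nominates candidate
# duplicate blocks, and a bytewise comparison confirms them before any
# merge would take place.
import zlib
from collections import defaultdict

def find_duplicates(blocks):
    """blocks: list of bytes objects. Returns (i, j) index pairs that
    are verified byte-for-byte identical."""
    candidates = defaultdict(list)
    for i, blk in enumerate(blocks):
        candidates[zlib.crc32(blk)].append(i)  # cheap weak hash
    dups = []
    for idxs in candidates.values():
        for a in range(len(idxs)):
            for b in range(a + 1, len(idxs)):
                i, j = idxs[a], idxs[b]
                if blocks[i] == blocks[j]:      # authoritative bytewise check
                    dups.append((i, j))
    return dups
```

The bytewise check makes the weak hash's collision rate a performance question rather than a correctness one, which is the point being argued in the thread.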
Hello Andrey,
As far as I understand, VMware already ships this gold image feature
(as they call it) for Windows environments and claims it to be very
efficient.
they call it "thin or shallow clones" and ship it with desktop
virtualization (one vm per thinclient user) and for VMware lab
Ric,
I would not categorize it as offline, but just not as inband (i.e., you can
run a low priority background process to handle dedup).
Offline windows are extremely rare in production sites these days and
it could take a very long time to do dedup at the block level over a
large file
On 2009-05-04, Thomas Glanzmann tho...@glanzmann.de wrote:
As far as I understand, VMware already ships this gold image feature
(as they call it) for Windows environments and claims it to be very
efficient.
they call it "thin or shallow clones"
thin or shallow clones sounds more like
Thomas Glanzmann schrieb:
Ric,
I would not categorize it as offline, but just not as inband (i.e., you can
run a low priority background process to handle dedup).
Offline windows are extremely rare in production sites these days and
it could take a very long time to do dedup at
On Mon, May 4, 2009 at 10:06 PM, Jan-Frode Myklebust janfr...@tanso.net wrote:
Looking at the website content, it also revealed that VMware will have a
similar feature for their workhorse "ESX server" in the upcoming
release; however, my point still stands. Ship out a service pack for
On Tue, May 5, 2009 at 5:11 AM, Heinz-Josef Claes hjcl...@web.de wrote:
Hi, during the last half year I thought a little bit about doing dedup for
my backup program: not only with fixed blocks (which is implemented), but
with moving blocks (with all offsets in a file: 1 byte, 2 byte, ...). That
Thomas Glanzmann wrote:
Looking at this picture, when I'm going to implement the dedup code, do I also
have to take care to spread the blocks over the different devices or is
there already infrastructure in place that automates that process?
If you somehow had blocks duplicated exactly across
On Wed, 2009-04-29 at 14:03 +0200, Thomas Glanzmann wrote:
Hello Chris,
You can start with the code documentation section on
http://btrfs.wiki.kernel.org
I read through this, and at the moment one question comes to mind:
Hello Chris,
But, in your ioctls you want to deal with [file, offset, len], not
directly with block numbers. COW means that blocks can move around
without you knowing, and some of the btrfs internals will COW files in
order to relocate storage.
So, what you want is a dedup file (or files)
On Wed, 2009-04-29 at 15:58 +0200, Thomas Glanzmann wrote:
Hello Chris,
But, in your ioctls you want to deal with [file, offset, len], not
directly with block numbers. COW means that blocks can move around
without you knowing, and some of the btrfs internals will COW files in
order to
Hello Chris,
Your database should know, and the ioctl could check to see if the
source and destination already point to the same thing before doing
anything expensive.
I see.
So, if I only have file, offset, len and not the block number, is there
a way from userland to tell if two blocks
Andrey Kuzmin wrote:
On Tue, Apr 28, 2009 at 2:02 PM, Chris Mason chris.ma...@oracle.com wrote:
On Tue, 2009-04-28 at 07:22 +0200, Thomas Glanzmann wrote:
Hello Chris,
There is a btrfs ioctl to clone individual files, and this could be used
to implement an online dedup. But, since it is
Chris,
what blocksizes can I choose with btrfs? Do you think that it is
possible for an outsider like me to submit patches to btrfs which enable
dedup in three full-time days?
Thomas
--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to
Thomas Glanzmann schrieb:
300 Gbyte of used storage of several productive VMs with the following
operating systems running:
\begin{itemize}
\item Red Hat Linux 32 and 64 Bit (Release 3, 4 and 5)
\item SuSE Linux 32 and 64 Bit (SLES 9 and 10)
\item Windows 2003 Std.
Tomasz Chmielewski wrote:
Thomas Glanzmann schrieb:
300 Gbyte of used storage of several productive VMs with the following
operating systems running:
\begin{itemize}
\item Red Hat Linux 32 and 64 Bit (Release 3, 4 and 5)
\item SuSE Linux 32 and 64 Bit (SLES 9 and 10)
Hello,
I wouldn't rely on crc32: it is not a strong hash.
Such deduplication can lead to various problems,
including security ones.
sure thing. Did you think of replacing crc32 with sha1 or md5? Is this
even possible (is there enough space reserved so that the change can be
done without
Hello Chris,
Is there a checksum for every block in btrfs?
Yes, but they are only crc32c.
I see, is it easily possible to exchange that with sha-1 or md5?
Is it possible to retrieve these checksums from userland?
Not today. The sage developers sent a patch to make an ioctl for
this,
On Tue, 2009-04-28 at 19:34 +0200, Thomas Glanzmann wrote:
Hello,
I wouldn't rely on crc32: it is not a strong hash.
Such deduplication can lead to various problems,
including security ones.
sure thing. Did you think of replacing crc32 with sha1 or md5? Is this
even possible (is there
On Tue, 2009-04-28 at 19:37 +0200, Thomas Glanzmann wrote:
Hello Chris,
Is there a checksum for every block in btrfs?
Yes, but they are only crc32c.
I see, is it easily possible to exchange that with sha-1 or md5?
Yes, but for the purposes of dedup, it's not exactly what you want.
Hello,
It is possible, there's room in the metadata for about 4k of
checksum for each 4k of data. The initial btrfs code used sha256, but
the real limiting factor is the CPU time used.
I see. There are very efficient md5 implementations out there, for example,
especially if the code is
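Since CPU time is named above as the real limiting factor, the trade-off is easy to measure. A quick sketch using only Python's stdlib; plain `zlib.crc32` stands in for crc32c, which the stdlib does not provide, and the numbers are machine-dependent:

```python
# Rough timing of the checksum choices discussed in the thread, over
# 4 KiB blocks. zlib.crc32 stands in for crc32c; hashlib covers the
# cryptographic candidates.
import hashlib
import time
import zlib

def benchmark(block=b"\x00" * 4096, rounds=10000):
    results = {}
    t0 = time.perf_counter()
    for _ in range(rounds):
        zlib.crc32(block)
    results["crc32"] = time.perf_counter() - t0
    for name in ("md5", "sha1", "sha256"):
        t0 = time.perf_counter()
        for _ in range(rounds):
            hashlib.new(name, block).digest()
        results[name] = time.perf_counter() - t0
    return results
```

On typical hardware the CRC is several times cheaper per block than the cryptographic hashes, which is exactly the inband-write cost being weighed here.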
Am Dienstag, 28. April 2009 19:38:24 schrieb Chris Mason:
On Tue, 2009-04-28 at 19:34 +0200, Thomas Glanzmann wrote:
Hello,
I wouldn't rely on crc32: it is not a strong hash.
Such deduplication can lead to various problems,
including security ones.
sure thing. Did you think of
Thomas Glanzmann wrote:
no, I just used the md5 checksum. And even if I have a hash collision,
which is highly unlikely, it still gives a good ballpark figure.
I'd start with a crc32 and/or MD5 to find candidate blocks, then do a
bytewise comparison before actually merging them. Even the risk of
Hello Chris,
Right now the blocksize can only be the same as the page size. For
this external dedup program you have in mind, you could use any
multiple of the page size.
perfect. Exactly what I need.
Three days is probably not quite enough ;) I'd honestly prefer the
dedup happen
Hello Heinz,
It's not only cpu time, it's also memory. You need 32 bytes for each 4k
block. It needs to be in RAM for performance reasons.
exactly and that is not going to scale.
Thomas
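The scaling concern above is easy to quantify. A minimal sketch (the function name is mine): at 32 bytes of checksum per 4 KiB block, the in-RAM table grows to 8 GiB per TiB of data.

```python
# Back-of-the-envelope for the in-memory checksum table: 32 bytes of
# checksum per 4 KiB block, all resident in RAM.
def checksum_table_bytes(fs_bytes, block=4096, csum=32):
    return (fs_bytes // block) * csum

TIB = 1 << 40
GIB = 1 << 30
# 1 TiB -> 2**28 blocks * 32 B = 8 GiB of checksums in RAM
```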
Hello,
* Thomas Glanzmann tho...@glanzmann.de [090428 22:10]:
exactly. And if there is a way to retrieve the already calculated
checksums from kernel land, then it would be possible to implement a
"system call" that gives the kernel a hint about a possibly duplicated
block (like providing a
Am Dienstag, 28. April 2009 22:16:19 schrieb Thomas Glanzmann:
Hello Heinz,
It's not only cpu time, it's also memory. You need 32 bytes for each 4k
block. It needs to be in RAM for performance reasons.
exactly and that is not going to scale.
Thomas
Hi Thomas,
I wrote a backup
On Tue, 2009-04-28 at 22:52 +0200, Thomas Glanzmann wrote:
Hello Heinz,
I wrote a backup tool which uses dedup, so I know a little bit about
the problem and the performance impact if the checksums are not in
memory (optionally in that tool).
http://savannah.gnu.org/projects/storebackup
Hello,
- Implement a system call that reports all checksums and unique
block identifiers for all stored blocks.
This would require storing the larger checksums in the filesystem. It
is much better done in the dedup program.
I think I misunderstood something here. I
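Chris's suggestion above, keeping the expensive checksums in the dedup program rather than in the filesystem, might look like this minimal sketch (class and method names are mine, not from any proposed interface):

```python
# Sketch of an external dedup index: the userland dedup program, not the
# filesystem, stores the strong checksums and looks up candidates.
import hashlib

class DedupIndex:
    def __init__(self):
        self._by_digest = {}  # sha256 digest -> (path, offset)

    def probe(self, path, offset, block):
        """Return the earlier (path, offset) whose block hashed
        identically, or record this block and return None."""
        digest = hashlib.sha256(block).digest()
        seen = self._by_digest.get(digest)
        if seen is not None:
            return seen  # candidate pair for a clone/dedup ioctl
        self._by_digest[digest] = (path, offset)
        return None
```

On a hit, the program would hand both ranges to the kernel, which does the final double-checking before sharing extents.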
On Wed, Apr 29, 2009 at 3:43 AM, Chris Mason chris.ma...@oracle.com wrote:
So you need an extra index either way. It makes sense to keep the
crc32c csums for fast verification of the data read from disk and only
use the expensive csums for dedup.
What about self-healing? With only a CRC32 to
On Tue, 2009-04-28 at 23:12 +0200, Thomas Glanzmann wrote:
Hello,
- Implement a system call that reports all checksums and unique
block identifiers for all stored blocks.
This would require storing the larger checksums in the filesystem. It
is much better done in
On Wed, 2009-04-29 at 00:14 +0200, Thomas Glanzmann wrote:
Hello Chris,
They are, but only the crc32c are stored today.
maybe crc32c is good enough to identify duplicated blocks, I mean we
only need a hint, since the dedup ioctl does the double-checking. I will
write a perl script tomorrow and
On Tue, Apr 28, 2009 at 04:58:15PM -0400, Chris Mason wrote:
Assuming a 4 kbyte block size that would mean for a 1 Tbyte
filesystem:
1Tbyte / 4096 / 8 = 32 Mbyte of memory (this should of course
be saved to disk from time to time and be restored on
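The arithmetic quoted above corresponds to a one-bit-per-block bitmap. A sketch that reproduces the figure (binary units assumed, matching the 32 Mbyte result):

```python
# One bit per 4 KiB block: a compact in-memory structure that can be
# flushed to disk periodically and restored at mount time.
def bitmap_bytes(fs_bytes, block=4096):
    blocks = fs_bytes // block
    return (blocks + 7) // 8  # one bit per block, rounded up

# 1 TiB / 4096 / 8 = 32 MiB
```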
On Mon, 2009-04-27 at 05:33 +0200, Thomas Glanzmann wrote:
Hello,
I would like to know if it would be possible to implement the following
feature in btrfs:
Have an online filesystem check which accounts for possible duplicated
data blocks (maybe with the help of already implemented
On Thu, Oct 16, 2008 at 03:30:49PM -0400, Chris Mason wrote:
On Thu, 2008-10-16 at 15:25 -0400, Valerie Aurora Henson wrote:
Both deduplication and compression have an interesting side effect in
which a write to a previously allocated block can return ENOSPC.
This is even more exciting
On Thu, Oct 16, 2008 at 03:25:01PM -0400, Valerie Aurora Henson wrote:
Both deduplication and compression have an interesting side effect in
which a write to a previously allocated block can return ENOSPC.
This is even more exciting when you factor in mmap. Any thoughts on
how to handle this?
On Wed, Oct 15, 2008 at 03:39:16PM +0200, Avi Kivity wrote:
Andi Kleen wrote:
Ray Van Dolson [EMAIL PROTECTED] writes:
I recall there being a thread here a number of months back regarding
data-deduplication support for btrfs.
Did anyone end up picking that up and giving it a go?
On Wed, Oct 15, 2008 at 3:15 PM, Andi Kleen [EMAIL PROTECTED] wrote:
On Wed, Oct 15, 2008 at 03:39:16PM +0200, Avi Kivity wrote:
Andi Kleen wrote:
Ray Van Dolson [EMAIL PROTECTED] writes:
I recall there being a thread here a number of months back regarding
data-deduplication support for
Andi Kleen wrote:
There are some patches to do this in QEMU's cow format for KVM. That's
user level only.
And thus, doesn't work for sharing between different images, especially
at runtime.
It would work if the images are all based once on a reference image, wouldn't it?
Yes
On Sat, 2008-10-11 at 19:06 -0700, Ray Van Dolson wrote:
I recall there being a thread here a number of months back regarding
data-deduplication support for btrfs.
Did anyone end up picking that up and giving it a go? Block level
data dedup would be *awesome* in a Linux filesystem. It
Ray Van Dolson [EMAIL PROTECTED] writes:
I recall there being a thread here a number of months back regarding
data-deduplication support for btrfs.
Did anyone end up picking that up and giving it a go? Block level
data dedup would be *awesome* in a Linux filesystem. It does wonders
for