Re: Offline Deduplication for Btrfs

2011-01-06 Thread Gordan Bobic
Chris Mason wrote: Excerpts from Gordan Bobic's message of 2011-01-05 12:42:42 -0500: Josef Bacik wrote: Basically I think online dedup is a huge waste of time and completely useless. I couldn't disagree more. First, let's consider what is the general-purpose use-case of data deduplication.

Re: Offline Deduplication for Btrfs

2011-01-06 Thread Gordan Bobic
Spelic wrote: On 01/06/2011 02:03 AM, Gordan Bobic wrote: That's just alarmist. AES is being cryptanalyzed because everything uses it. And the news of its insecurity is somewhat exaggerated (for now at least). Who cares... the fact that it is not widely used is a benefit for RIPEMD

Re: Offline Deduplication for Btrfs

2011-01-06 Thread Gordan Bobic
Tomasz Chmielewski wrote: I have been thinking a lot about de-duplication for a backup application I am writing. I wrote a little script to figure out how much it would save me. For my laptop home directory, about 100 GiB of data, it was a couple of percent, depending a bit on the size of the
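
A minimal sketch of that kind of savings-estimation script (my own rough equivalent, not Tomasz's actual code): it hashes fixed-size blocks across a directory tree with Python's hashlib and reports how much space duplicate blocks would reclaim. The 4KiB block size is an assumption.

    import hashlib, os, sys

    BLOCK = 4096  # assumed dedup block size

    def blocks(path):
        with open(path, 'rb') as f:
            while True:
                buf = f.read(BLOCK)
                if not buf:
                    break
                yield buf

    seen = set()
    total = dupes = 0
    for root, dirs, files in os.walk(sys.argv[1]):
        for name in files:
            path = os.path.join(root, name)
            if not os.path.isfile(path) or os.path.islink(path):
                continue
            try:
                for buf in blocks(path):
                    digest = hashlib.sha256(buf).digest()
                    total += len(buf)
                    if digest in seen:
                        dupes += len(buf)   # block already seen elsewhere
                    else:
                        seen.add(digest)
            except (IOError, OSError):
                pass

    print("scanned %d MiB, %d MiB (%.1f%%) duplicated" %
          (total // 2**20, dupes // 2**20, 100.0 * dupes / max(total, 1)))

Run it against a home directory and the percentage it prints is exactly the "couple of percent" figure being discussed.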

Re: Offline Deduplication for Btrfs

2011-01-06 Thread Gordan Bobic
Simon Farnsworth wrote: The basic idea is to use fanotify/inotify (whichever of the notification systems works for this) to track which inodes have been written to. It can then mmap() the changed data (before it's been dropped from RAM) and do the same process as an offline dedupe (hash,
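
A rough sketch of the notification side of that idea, assuming the third-party pyinotify package (fanotify has no comparable pure-Python binding). It simplifies Simon's description: instead of mmap()ing the data before it leaves the page cache, it just re-reads and hashes the changed file's blocks when the file is closed after a write.

    import hashlib
    import pyinotify  # third-party

    BLOCK = 4096

    def hash_blocks(path):
        # Stand-in for the real dedup step: hash each block so the digests
        # can be compared against an index of existing blocks.
        digests = []
        with open(path, 'rb') as f:
            while True:
                buf = f.read(BLOCK)
                if not buf:
                    break
                digests.append(hashlib.sha256(buf).hexdigest())
        return digests

    class Handler(pyinotify.ProcessEvent):
        def process_IN_CLOSE_WRITE(self, event):
            try:
                digests = hash_blocks(event.pathname)
            except (IOError, OSError):
                return
            print("%s: %d blocks hashed" % (event.pathname, len(digests)))

    wm = pyinotify.WatchManager()
    wm.add_watch('/srv/data', pyinotify.IN_CLOSE_WRITE, rec=True, auto_add=True)
    pyinotify.Notifier(wm, Handler()).loop()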

Re: Offline Deduplication for Btrfs

2011-01-06 Thread Gordan Bobic
Peter A wrote: On Thursday, January 06, 2011 05:48:18 am you wrote: Can you elaborate on what you're talking about here? How does the length of a directory name affect alignment of file block contents? I don't see how variability of length matters, other than to make things a lot more complicated.

Re: Offline Deduplication for Btrfs

2011-01-06 Thread Gordan Bobic
Ondřej Bílka wrote: Then again, for a lot of use-cases there are perhaps better ways to achieve the target goal than deduping on FS level, e.g. snapshotting or something like fl-cow: http://www.xmailserver.org/flcow.html As far as VMs are concerned, fl-cow is a poor replacement for deduping. Depends on

Re: Offline Deduplication for Btrfs

2011-01-06 Thread Gordan Bobic
Tomasz Torcz wrote: On Thu, Jan 06, 2011 at 02:19:04AM +0100, Spelic wrote: CPU can handle considerably more than 250 block hashings per second. You could argue that this changes in cases of sequential I/O on big files, but a 1.86GHz Core2 can churn through 111MB/s of SHA256, which even
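
For anyone who wants to reproduce that sort of figure on their own hardware, a trivial in-memory benchmark (numbers will obviously vary with the CPU and the hashlib/OpenSSL build):

    import hashlib, time

    CHUNK = 1024 * 1024          # hash 1 MiB at a time
    TOTAL = 512 * 1024 * 1024    # 512 MiB of in-memory data, so disk speed is not a factor

    data = b'\xaa' * CHUNK
    h = hashlib.sha256()
    start = time.time()
    for _ in range(TOTAL // CHUNK):
        h.update(data)
    elapsed = time.time() - start

    print("SHA256: %.1f MB/s" % (TOTAL / elapsed / 1e6))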

Re: Offline Deduplication for Btrfs

2011-01-06 Thread Gordan Bobic
Peter A wrote: On Thursday, January 06, 2011 09:00:47 am you wrote: Peter A wrote: I'm saying in a filesystem it doesn't matter - if you bundle everything into a backup stream, it does. Think of tar. 512-byte alignment. I tar up a directory with 8TB total size. No big deal. Now I create a
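
To make the alignment point concrete, a toy demonstration (not tar itself, just a byte stream): shifting the same data by 512 bytes, as adding or growing a file early in a tar archive would, leaves essentially no common fixed-size block hashes, so fixed-block dedup of the stream finds nothing.

    import hashlib, os

    BLOCK = 4096
    payload = os.urandom(8 * 1024 * 1024)        # stand-in for the archived data

    def block_hashes(stream):
        return {hashlib.sha256(stream[i:i + BLOCK]).digest()
                for i in range(0, len(stream), BLOCK)}

    original = block_hashes(payload)
    # Prepend 512 bytes, as tar would if a small file appeared at the front.
    shifted = block_hashes(b'\0' * 512 + payload)

    common = len(original & shifted)
    print("%d of %d blocks still dedup after the shift" % (common, len(original)))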

Re: Synching a Backup Server

2011-01-06 Thread Gordan Bobic
Unfortunately, we don't use btrfs or LVM on remote servers, so there's no snapshotting available during the backup run. In a perfect world, btrfs would be production-ready, ZFS would be available on Linux, and we'd no longer need the abomination called LVM. :) As a matter of fact, ZFS _IS_

Re: Synching a Backup Server

2011-01-06 Thread Gordan Bobic
On 01/06/2011 09:44 PM, Carl Cook wrote: On Thu 06 January 2011 12:07:17 C Anthony Risinger wrote: as for the DB stuff, you definitely need to snapshot _before_ rsync. roughly: 1) read lock and flush tables 2) snapshot 3) unlock tables 4) mount snapshot 5) rsync from snapshot i.e. the same as
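
A rough sketch of that sequence, assuming MySQL via the third-party pymysql module and an LVM snapshot (a btrfs snapshot command would slot into the same place); hosts, paths, credentials and volume names are placeholders:

    import subprocess
    import pymysql  # third-party MySQL client

    conn = pymysql.connect(host='localhost', user='backup', password='secret')
    cur = conn.cursor()

    # 1) read lock and flush tables -- the lock only holds while this connection is open
    cur.execute("FLUSH TABLES WITH READ LOCK")
    try:
        # 2) snapshot the volume the databases live on
        subprocess.check_call(['lvcreate', '--snapshot', '--size', '5G',
                               '--name', 'mysql-snap', '/dev/vg0/mysql'])
    finally:
        # 3) unlock tables as soon as the snapshot exists
        cur.execute("UNLOCK TABLES")
        conn.close()

    # 4) mount the snapshot, 5) rsync from it, then drop it
    subprocess.check_call(['mount', '/dev/vg0/mysql-snap', '/mnt/mysql-snap'])
    subprocess.check_call(['rsync', '-a', '/mnt/mysql-snap/', 'backuphost:/backups/mysql/'])
    subprocess.check_call(['umount', '/mnt/mysql-snap'])
    subprocess.check_call(['lvremove', '-f', '/dev/vg0/mysql-snap'])

The point of keeping the DB connection open across the snapshot is that the read lock is released the moment the session ends, so the snapshot must be taken before the connection is closed.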

Re: Synching a Backup Server

2011-01-06 Thread Gordan Bobic
On 01/06/2011 10:26 PM, Carl Cook wrote: On Thu 06 January 2011 13:58:41 Freddie Cash wrote: Simplest solution is to write a script to create a mysqldump of all databases into a directory, add that to cron so that it runs at the same time every day, 10-15 minutes before the rsync run is done.
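
Something along these lines would do as the cron-driven dump step Freddie describes (paths and credentials are placeholders; schedule it 10-15 minutes before the rsync run):

    import datetime, os, subprocess

    DUMP_DIR = '/var/backups/mysqldump'
    stamp = datetime.date.today().isoformat()
    outfile = os.path.join(DUMP_DIR, 'all-databases-%s.sql.gz' % stamp)

    # Pipe mysqldump through gzip so the rsync'd file stays small.
    dump = subprocess.Popen(
        ['mysqldump', '--all-databases', '--user=backup', '--password=secret'],
        stdout=subprocess.PIPE)
    with open(outfile, 'wb') as out:
        gz = subprocess.Popen(['gzip'], stdin=dump.stdout, stdout=out)
        dump.stdout.close()        # let mysqldump see a broken pipe if gzip dies
        gz.wait()
    if dump.wait() != 0:
        raise SystemExit("mysqldump failed")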

Re: Offline Deduplication for Btrfs

2011-01-05 Thread Gordan Bobic
Josef Bacik wrote: Basically I think online dedup is a huge waste of time and completely useless. I couldn't disagree more. First, let's consider what is the general-purpose use-case of data deduplication. What are the resource requirements to perform it? How do these resource requirements

Re: Offline Deduplication for Btrfs

2011-01-05 Thread Gordan Bobic
On 01/05/2011 06:41 PM, Diego Calleja wrote: On Wednesday, 5 January 2011 18:42:42, Gordan Bobic wrote: So by doing the hash indexing offline, the total amount of disk I/O required effectively doubles, and the amount of CPU spent on doing the hashing is in no way reduced

Re: Offline Deduplication for Btrfs

2011-01-05 Thread Gordan Bobic
On 01/05/2011 07:01 PM, Ray Van Dolson wrote: On Wed, Jan 05, 2011 at 07:41:13PM +0100, Diego Calleja wrote: On Wednesday, 5 January 2011 18:42:42, Gordan Bobic wrote: So by doing the hash indexing offline, the total amount of disk I/O required effectively doubles, and the amount of CPU

Re: Offline Deduplication for Btrfs

2011-01-05 Thread Gordan Bobic
On 01/05/2011 07:46 PM, Josef Bacik wrote: Blah blah blah, I'm not having an argument about which is better because I simply do not care. I think dedup is silly to begin with, and online dedup even sillier. Offline dedup is more expensive - so why are you of the opinion that it is less

Re: Offline Deduplication for Btrfs

2011-01-05 Thread Gordan Bobic
On 01/05/2011 09:14 PM, Diego Calleja wrote: In fact, there are cases where online dedup is clearly much worse. For example, cases where people suffer duplication, but it takes a lot of time (several months) to hit it. With online dedup, you need to enable it all the time to get deduplication,

Re: Offline Deduplication for Btrfs

2011-01-05 Thread Gordan Bobic
On 01/06/2011 12:22 AM, Spelic wrote: On 01/05/2011 09:46 PM, Gordan Bobic wrote: On 01/05/2011 07:46 PM, Josef Bacik wrote: Offline dedup is more expensive - so why are you of the opinion that it is less silly? And comparison by silliness quotient still sounds like an argument over which

Re: SSD optimizations

2010-12-13 Thread Gordan Bobic
On 12/13/2010 05:11 AM, Sander wrote: Gordan Bobic wrote (ao): On 12/12/2010 17:24, Paddy Steed wrote: In a few weeks parts for my new computer will be arriving. The storage will be a 128GB SSD. A few weeks after that I will order three large disks for a RAID array. I understand that BTRFS

Re: SSD optimizations

2010-12-13 Thread Gordan Bobic
On 13/12/2010 14:33, Peter Harris wrote: On Mon, Dec 13, 2010 at 4:25 AM, Gordan Bobic wrote: I suggest you back your opinion up with some hard data before making such statements. Here's a quick test - make an ext2 fs and a btrfs on two similar disk partitions (any disk, for the sake
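
For what it's worth, the kind of quick comparison I have in mind can be as simple as timing a small-file workload on two already-created, already-mounted filesystems (mount points are placeholders; run it a few times on otherwise idle disks):

    import os, sys, time

    def small_file_run(mountpoint, count=5000, size=4096):
        testdir = os.path.join(mountpoint, 'fs-test')
        os.makedirs(testdir)
        payload = b'x' * size
        start = time.time()
        for i in range(count):
            with open(os.path.join(testdir, 'f%05d' % i), 'wb') as f:
                f.write(payload)
                f.flush()
                os.fsync(f.fileno())
        return time.time() - start

    for mnt in sys.argv[1:]:   # e.g. /mnt/ext2-test /mnt/btrfs-test
        print("%s: %.1fs for 5000 fsynced 4KiB files" % (mnt, small_file_run(mnt)))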

Re: SSD optimizations

2010-12-13 Thread Gordan Bobic
On 13/12/2010 15:17, cwillu wrote: In a few weeks parts for my new computer will be arriving. The storage will be a 128GB SSD. A few weeks after that I will order three large disks for a RAID array. I understand that BTRFS RAID 5 support will be available shortly. What is the best possible way

Re: SSD optimizations

2010-12-13 Thread Gordan Bobic
On 13/12/2010 17:17, Paddy Steed wrote: So, no-one has any ideas on how to implement the cache. Would making it all swap work, does the OS cache files in swap? No, it doesn't. I don't believe there are any plans to implement hierarchical storage in BTRFS, but perhaps one of the developers

Re: SSD optimizations

2010-12-12 Thread Gordan Bobic
On 12/12/2010 17:24, Paddy Steed wrote: In a few weeks parts for my new computer will be arriving. The storage will be a 128GB SSD. A few weeks after that I will order three large disks for a RAID array. I understand that BTRFS RAID 5 support will be available shortly. What is the best possible

Re: VFS support for fast copy on deduplicating FSes

2010-11-25 Thread Gordan Bobic
David Nicol wrote: unresearched question/suggestion: Is there general support for a fast copy ioctl in the VFS layer, which would be hooked by file systems that support COW or other forms of deduplication and can provide copy semantics by manipulating metadata only? What would be nice to have
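
What btrfs already exposes (though not as a generic VFS-level hook) is its clone ioctl, which creates a copy that shares the source file's extents instead of duplicating the data. A minimal sketch from Python; as I understand it the request number below is how BTRFS_IOC_CLONE (_IOW(0x94, 9, int)) encodes on Linux, and both files must be on the same btrfs filesystem:

    import fcntl, sys

    # BTRFS_IOC_CLONE: make dst share src's extents (metadata-only copy).
    BTRFS_IOC_CLONE = 0x40049409

    src_path, dst_path = sys.argv[1], sys.argv[2]
    with open(src_path, 'rb') as src, open(dst_path, 'wb') as dst:
        fcntl.ioctl(dst.fileno(), BTRFS_IOC_CLONE, src.fileno())

A VFS-level "fast copy" hook would essentially let cp and friends reach this path without knowing filesystem-specific ioctl numbers.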

Re: Update to Project_ideas wiki page

2010-11-18 Thread Gordan Bobic
Bart Kus wrote: On 11/17/2010 10:07 AM, Gordan Bobic wrote: On 11/17/2010 05:56 PM, Hugo Mills wrote: On Wed, Nov 17, 2010 at 04:12:29PM +0100, Bart Noordervliet wrote: Can I suggest we combine this new RAID level management with a modernisation of the terminology for storage redundancy

Re: Update to Project_ideas wiki page

2010-11-18 Thread Gordan Bobic
Bart Noordervliet wrote: On Wed, Nov 17, 2010 at 19:07, Gordan Bobic gor...@bobich.net wrote: Since BTRFS is already doing some relatively radical things, I would like to suggest that RAID5 and RAID6 be deemed obsolete. RAID5 isn't safely usable for arrays bigger than about 5TB with disks
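
The usual back-of-the-envelope arithmetic behind that 5TB figure, assuming the commonly quoted consumer-disk spec of one unrecoverable read error per 10^14 bits: rebuilding a degraded RAID5 has to read back the entire surviving array, and the chance of hitting a URE somewhere in that read grows uncomfortably large.

    import math

    URE_RATE = 1e-14   # unrecoverable read errors per bit (typical consumer spec)
    array_tb = 5       # data that must be read to rebuild one failed disk

    bits_read = array_tb * 1e12 * 8
    expected_errors = bits_read * URE_RATE
    p_failure = 1 - math.exp(-expected_errors)   # Poisson approximation

    print("rebuild reads %.0e bits, P(at least one URE) ~ %.0f%%"
          % (bits_read, 100 * p_failure))

For 5TB that comes out to roughly a one-in-three chance of the rebuild tripping over an unreadable sector, which is why RAID6 or multi-way mirroring looks more attractive as arrays grow.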

Re: Update to Project_ideas wiki page

2010-11-17 Thread Gordan Bobic
On 11/17/2010 05:56 PM, Hugo Mills wrote: On Wed, Nov 17, 2010 at 04:12:29PM +0100, Bart Noordervliet wrote: Can I suggest we combine this new RAID level management with a modernisation of the terminology for storage redundancy, as has been discussed previously in the Raid1 with 3 drives thread

Linear (JBOD) Array Mode

2010-11-12 Thread Gordan Bobic
Is there an option in btrfs for this mode of RAID? I know it supports the equivalent of RAID10, but what I am after is JBOD of mirrors. The reason I want this is for making a really low power home NAS, typically for home theater/media use. I believe this would yield better power savings in the

Copy-on-write hard-links

2010-06-10 Thread Gordan Bobic
Is there a feature in btrfs to manually/explicitly mark hard-links to be copy-on-write? My understanding is that this is what happens when a snapshot is mounted rw and files modified. Consider this scenario: I have a base template fs. I make two snapshots of it that are identical. The files

Re: Copy-on-write hard-links

2010-06-10 Thread Gordan Bobic
On 06/10/2010 09:00 PM, Chris Mason wrote: On Thu, Jun 10, 2010 at 06:11:40PM +0100, Gordan Bobic wrote: Is there a feature in btrfs to manually/explicitly mark hard-links to be copy-on-write? My understanding is that this is what happens when a snapshot is mounted rw and files modified

Re: Poor performance with qemu

2010-04-08 Thread Gordan Bobic
Avi Kivity wrote: On 03/30/2010 03:56 PM, Chris Mason wrote: On Sun, Mar 28, 2010 at 05:18:03PM +0200, Diego Calleja wrote: Hi, I'm using KVM, and the virtual disk (a 20 GB file using the raw qemu format according to virt-manager and, of course, placed on a btrfs filesystem, running the

Re: SSD Optimizations

2010-03-11 Thread Gordan Bobic
On Thu, 11 Mar 2010 08:38:53 +0100, Sander san...@humilis.net wrote: Are there options available comparable to ext2/ext3 to help reduce wear and improve performance? With SSDs you don't have to worry about wear. And if you believe that, you clearly swallowed the marketing spiel hook, line and

Re: SSD Optimizations

2010-03-11 Thread Gordan Bobic
On Thu, 11 Mar 2010 13:59:09 +0100, Stephan von Krawczynski sk...@ithnet.com wrote: On Wed, Mar 10, 2010 at 11:49 AM, Gordan Bobic gor...@bobich.net wrote: Are there options available comparable to ext2/ext3 to help reduce wear and improve performance? With SSDs you

Re: SSD Optimizations

2010-03-11 Thread Gordan Bobic
On Thu, 11 Mar 2010 16:35:33 +0100, Stephan von Krawczynski sk...@ithnet.com wrote: Besides, why shouldn't we help the drive firmware by - writing the data only in erase-block sizes - trying to write blocks that are smaller than the erase-block in a way that won't cross the erase-block
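
The splitting being asked for is easy to express, whether or not it actually helps any given drive's firmware: given an erase-block size, break a write up so no piece straddles an erase-block boundary. A toy helper (the erase-block size here is an assumption; real drives rarely publish it):

    ERASE_BLOCK = 512 * 1024   # assumed erase-block size; real values are drive-specific

    def split_on_erase_blocks(offset, length):
        """Split a write so that no piece crosses an erase-block boundary."""
        pieces = []
        while length > 0:
            room = ERASE_BLOCK - (offset % ERASE_BLOCK)   # bytes left in this erase block
            chunk = min(room, length)
            pieces.append((offset, chunk))
            offset += chunk
            length -= chunk
        return pieces

    # A 300KiB write starting 400KiB into the device gets split at the 512KiB boundary:
    print(split_on_erase_blocks(400 * 1024, 300 * 1024))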

Re: SSD Optimizations

2010-03-11 Thread Gordan Bobic
On Thu, 11 Mar 2010 14:42:40 +0100, Asdo a...@shiftmail.org wrote: 1- I think the SSD would rewrite once-written blocks to other locations, so as to reuse the same physical blocks for wear levelling. The written-once blocks are very good candidates because their write-count is 1. There are

Re: SSD Optimizations

2010-03-11 Thread Gordan Bobic
On Thu, 11 Mar 2010 09:21:30 -0500, Chris Mason chris.ma...@oracle.com wrote: On Wed, Mar 10, 2010 at 07:49:34PM +, Gordan Bobic wrote: I'm looking to try BTRFS on a SSD, and I would like to know what SSD optimizations it applies. Is there a comprehensive list of what ssd mount option does

SSD Optimizations

2010-03-10 Thread Gordan Bobic
I'm looking to try BTRFS on a SSD, and I would like to know what SSD optimizations it applies. Is there a comprehensive list of what ssd mount option does? How are the blocks and metadata arranged? Are there options available comparable to ext2/ext3 to help reduce wear and improve performance?

Re: SSD Optimizations

2010-03-10 Thread Gordan Bobic
Marcus Fritzsch wrote: Hi there, On Wed, Mar 10, 2010 at 8:49 PM, Gordan Bobic gor...@bobich.net wrote: [...] Are there similar optimizations available in BTRFS? There is an SSD mount option available[1]. [1] http://btrfs.wiki.kernel.org/index.php/Getting_started#Mount_Options But what

Re: SSD Optimizations

2010-03-10 Thread Gordan Bobic
Mike Fedyk wrote: On Wed, Mar 10, 2010 at 11:49 AM, Gordan Bobic gor...@bobich.net wrote: I'm looking to try BTRFS on a SSD, and I would like to know what SSD optimizations it applies. Is there a comprehensive list of what ssd mount option does? How are the blocks and metadata arranged