Re: [CentOS] Deduplication data for CentOS?

2012-10-01 Thread joel billy
At our shop we have used QUADStor - http://www.quadstor.com - with a good
amount of success, though our use is specifically for VMware environments
over a SAN. However, it is possible (I have tried this a couple of
times) to use the QUADStor virtual disks as a local block device,
format it with ext4 or btrfs etc., and get the benefits of
deduplication, compression and so on. Yes, btrfs deduplication is
possible this way :-); I have tried it.
You might want to check the memory requirements for NAS/local
filesystems. We use 8 GB in our SAN box and so far things are fine.

- jb

Rainer Traut tr.ml@... writes:


 Hi list,

 is there any working solution for deduplication of data for centos?
 We are trying to find a solution for our backup server which runs a bash
 script invoking xdelta(3). But having this functionality in fs is much
 more friendly...

 We have looked into lessfs, sdfs and ddar.
 Are these filesystems ready to use (on centos)?
 ddar is something different, I know.

 Thx
 Rainer

___
CentOS mailing list
CentOS@centos.org
http://lists.centos.org/mailman/listinfo/centos


Re: [CentOS] Deduplication data for CentOS?

2012-09-13 Thread Ryan Palamara
The better option for ZFS would be to get an SSD and move the dedup table onto 
that drive instead of keeping it in RAM, because it can become massive.

Thank you,

Ryan Palamara
ZAIS Group, LLC
2 Bridge Avenue, Suite 322
Red Bank, New Jersey 07701
Phone: (732) 450-7444
ryan.palam...@zaisgroup.com


-Original Message-
From: centos-boun...@centos.org [mailto:centos-boun...@centos.org] On Behalf Of 
Dean Jones
Sent: Monday, August 27, 2012 11:45 AM
To: CentOS mailing list
Subject: Re: [CentOS] Deduplication data for CentOS?

Deduplication with ZFS takes a lot of RAM.

I would not yet trust any of the linux zfs projects for data that I
wanted to keep long term.

On Mon, Aug 27, 2012 at 8:26 AM, Les Mikesell lesmikes...@gmail.com wrote:
 On Mon, Aug 27, 2012 at 9:23 AM, John R Pierce pie...@hogranch.com wrote:
 On 08/27/12 4:55 AM, Rainer Traut wrote:
 is there any working solution for deduplication of data for centos?
 We are trying to find a solution for our backup server which runs a bash
 script invoking xdelta(3). But having this functionality in fs is much
 more friendly...

 BackupPC does exactly this. It's not a generalized solution to
 deduplication of a file system; instead, it's a backup system, designed
 to back up multiple targets, that implements deduplication on the backup
 tree it maintains.

 Not _exactly_, but maybe close enough and it is very easy to install
 and try.   Backuppc will use rsync for transfers and thus only uses
 bandwidth for the differences, but it uses hardlinks to files to dedup
 the storage.  It will find and link duplicate content even from
 different sources, but the complete file must be identical.  It does
 not store deltas, so large files that change even slightly between
 backups end up stored as complete copies (with optional compression).

 --
Les Mikesell
  lesmikes...@gmail.com





Re: [CentOS] Deduplication data for CentOS?

2012-09-13 Thread Les Mikesell
On Thu, Sep 13, 2012 at 12:06 PM, Ryan Palamara
ryan.palam...@zaisgroup.com wrote:
 The better option for ZFS would be to get an SSD and move the dedup table 
 onto that drive instead of keeping it in RAM, because it can become massive.

What's 'massive' in dollars these days?

-- 
   Les Mikesell
 lesmikes...@gmail.com


Re: [CentOS] Deduplication data for CentOS?

2012-09-13 Thread Ryan Palamara
It depends on the size of the data you are storing and the block size. Here is 
a good primer on it: 
http://constantin.glez.de/blog/2011/07/zfs-dedupe-or-not-dedupe

As a quick estimate, about 5GB per 1TB of storage for the SSD. However, I 
believe you would need even more RAM, since only a quarter of the RAM will be 
used for the dedup table with ZFS.
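
For a sense of where such estimates come from: the dedup table (DDT) grows with the number of unique blocks. A minimal sketch of the arithmetic, assuming the commonly cited ~320 bytes per DDT entry and a 64 KiB average block size (both assumptions, not figures from this thread):

```shell
# Rough ZFS dedup-table (DDT) sizing: cost scales with the number of
# unique blocks, at roughly 320 bytes per DDT entry (assumed figure).
TiB=$((1024 * 1024 * 1024 * 1024))
GiB=$((1024 * 1024 * 1024))
blocks=$(( TiB / (64 * 1024) ))       # unique 64 KiB blocks in 1 TiB
ddt_bytes=$(( blocks * 320 ))         # ~320 bytes of DDT per block
echo "$(( ddt_bytes / GiB )) GiB of DDT per TiB of unique data"
```

At a 64 KiB average block size this lands exactly on the "about 5GB per 1TB" figure; smaller blocks push it up proportionally.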

Thank you,

Ryan Palamara
ZAIS Group, LLC
2 Bridge Avenue, Suite 322
Red Bank, New Jersey 07701
Phone: (732) 450-7444
ryan.palam...@zaisgroup.com


-Original Message-
From: centos-boun...@centos.org [mailto:centos-boun...@centos.org] On Behalf Of 
Les Mikesell
Sent: Thursday, September 13, 2012 3:09 PM
To: CentOS mailing list
Subject: Re: [CentOS] Deduplication data for CentOS?

On Thu, Sep 13, 2012 at 12:06 PM, Ryan Palamara ryan.palam...@zaisgroup.com 
wrote:
 The better option for ZFS would be to get an SSD and move the dedup table 
 onto that drive instead of keeping it in RAM, because it can become massive.

What's 'massive' in dollars these days?

--
   Les Mikesell
 lesmikes...@gmail.com





Re: [CentOS] Deduplication data for CentOS?

2012-09-11 Thread Bob Hepple
Rainer Traut tr.ml@... writes:

 
 Hi list,
 
 is there any working solution for deduplication of data for centos?
 We are trying to find a solution for our backup server which runs a bash 
 script invoking xdelta(3). But having this functionality in fs is much 
 more friendly...
 
 We have looked into lessfs, sdfs and ddar.
 Are these filesystems ready to use (on centos)?
 ddar is something different, I know.
 
 Thx
 Rainer
 


Not sure if it's already been mentioned, but storeBackup uses rsync and hardlinks
to minimise storage - and it breaks up big files and backs up the fragments
separately. May help ...
http://www.nongnu.org/storebackup/en/node2.html



Re: [CentOS] Deduplication data for CentOS?

2012-08-29 Thread Rainer Traut
On 28.08.2012 21:26, Les Mikesell wrote:
 On Tue, Aug 28, 2012 at 2:04 PM, John R Pierce pie...@hogranch.com wrote:
 On 08/28/12 11:41 AM, Les Mikesell wrote:
 On Tue, Aug 28, 2012 at 3:03 AM, Rainer Trauttr...@gmx.de  wrote:

 Rsync is of no use for us. We have mainly big Domino .nsf files which
 only change slightly. So rsync  would not be able to make many hardlinks. 
 :)
 Rdiff-backup might work for this since it stores deltas.   Are you
 doing something to snapshot the filesystem during the copy or are
 these just growing logs where consistency doesn't matter?

 NSF files are a proprietary database format used by Lotus Notes and
 Domino, very complex, there's a pile of versions, and they are totally
 opaque.  Pretty sure that if they are being accessed or updated while
 being copied the copy is invalid, so yes, some form of snapshotting is
 required.

 commercial backup software uses Domino/Notes APIs to do incremental
 backups, for example
 http://www.symantec.com/business/support/index?page=content&id=TECH46513

 If there is a command-line way to generate an incremental backup file,
 backuppc could run it via ssh as a pre-backup command.


Yes, there is commercial software to do incremental backups, but I do not 
know of command-line options to do this. Does anyone?

Les is right: I stop the server, take the snapshot, start the server, and 
do the xdelta on the snapshot NSF files.
That minimal downtime is OK and acknowledged by the customer.
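
The stop/snapshot/start/xdelta cycle described here could be scripted roughly as below with LVM. Every name in it (volume group, service name, mount point, backup paths) is a hypothetical placeholder; treat it as an illustrative sketch, not a tested runbook:

```shell
#!/bin/sh
# Illustrative sketch only -- all names are hypothetical placeholders.
set -e
VG=vg0; LV=domino                            # assumed volume group / LV

service domino stop                          # downtime starts
lvcreate -s -L 10G -n domino-snap "/dev/$VG/$LV"
service domino start                         # downtime ends

mount -o ro "/dev/$VG/domino-snap" /mnt/snap
for nsf in /mnt/snap/notesdata/*.nsf; do
    name=$(basename "$nsf")
    # keep one full base copy per database, then store only deltas
    [ -f "/backup/base/$name" ] || cp "$nsf" "/backup/base/$name"
    xdelta3 -e -f -s "/backup/base/$name" "$nsf" \
        "/backup/deltas/$name.$(date +%F).xd3"
done
umount /mnt/snap
lvremove -f "/dev/$VG/domino-snap"
```

The key point is that the server is only down for the lvcreate step; the xdelta pass runs against the read-only snapshot while Domino is back up.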



Re: [CentOS] Deduplication data for CentOS?

2012-08-29 Thread John R Pierce
On 08/29/12 2:43 AM, Rainer Traut wrote:
 Yes, there is commercial software to do incremental backups but I do not
 know of commandline options to do this. Maybe anyone?

 Les is right, I stop the server, take the snapshot, start the server and
 do the xdelta on the snapshot NSF files.
 Having that minimal downtime is ok and acknowledged by the customer.

I found some more material on an IBM site talking about the API (it has 
to be called from software, not the command line) to generate and keep 
track of the transaction log files which the backup software archives. 
Nothing about de-dup, though.



-- 
john r pierce    N 37, W 122
santa cruz ca mid-left coast



Re: [CentOS] Deduplication data for CentOS?

2012-08-28 Thread Rainer Traut
On 27.08.2012 16:04, Janne Snabb wrote:
 On 08/27/2012 07:23 PM, Rainer Traut wrote:

 Yeah I know it has this feature, but is there a working zfs
 implementation for linux?

 I have heard some positive feedback about http://zfsonlinux.org/ but I
 have not had time to test myself yet. It probably depends on your
 intended usage. It is a new in-kernel ZFS implementation (different from
 the old FUSE implementation).

 RHEL 6.2 x86_64 is listed as one of the supported OSes, so it probably
 works fine with CentOS too.

 There is some positive and negative feedback in the following links:

 https://groups.google.com/a/zfsonlinux.org/group/zfs-discuss/browse_thread/thread/5a739039623f8fb1

 http://pingd.org/2012/installing-zfs-raid-z-on-centos-6-2-with-ssd-caching.html

 Please share your results if you do any testing :)

The website looks promising. They are using a thing called SPL, the 
Sun/Solaris Porting Layer, to be able to use the Solaris ZFS code.
But OpenSolaris no longer exists, does it? Does that mean they have to 
stay with the ZFS code from when it was still open?


Re: [CentOS] Deduplication data for CentOS?

2012-08-28 Thread Rainer Traut
On 27.08.2012 18:04, Les Mikesell wrote:
 On Mon, Aug 27, 2012 at 6:55 AM, Rainer Traut tr...@gmx.de wrote:

 is there any working solution for deduplication of data for centos?
 We are trying to find a solution for our backup server which runs a bash
 script invoking xdelta(3). But having this functionality in fs is much
 more friendly...


 Below forwarded on behalf of mroth:

 Les,

  A favor, please?  Could you post this for me? Spamhaus is bouncing me
 again, this time because *they* have a bug (see below). I tried asking
 Karanbir, but I guess he's not online yet

 Thanks in advance.

 John R Pierce wrote:
 On 08/27/12 4:55 AM, Rainer Traut wrote:
 is there any working solution for deduplication of data for centos? We
 are trying to find a solution for our backup server which runs a bash
 script invoking xdelta(3). But having this functionality in fs is much
 more friendly...


 I've tried, twice, to suggest that a workaround that doesn't involve a
 new, and possibly experimental f/s would be to use rsync with hard links,
 which is what we do. There's no way we have enough disk space for 5 weeks
 of terabytes of data

Rsync is of no use for us. We have mainly big Domino .nsf files which 
only change slightly, so rsync would not be able to make many hardlinks. :)


Re: [CentOS] Deduplication data for CentOS?

2012-08-28 Thread Rainer Traut
On 27.08.2012 22:55, Adam Tauno Williams wrote:
 On Mon, 2012-08-27 at 14:32 -0400, Brian Mathis wrote:
 On Mon, Aug 27, 2012 at 7:55 AM, Rainer Traut tr...@gmx.de wrote:
 We have looked into lessfs, sdfs and ddar.
 Are these filesystems ready to use (on centos)?
 ddar is sthg different, I know.
 This is something I have been thinking about peripherally for a while
 now.  What are your impressions of SDFS (OpenDedupe)?  I had been
 hoping it would be pretty good.  Any issues with it on CentOS?

 I've used it for backups; it works reliably.  It is memory hungry
 however [sort of the nature of block-level deduplication].
 http://www.wmmi.net/documents/OpenDedup.pdf

I have read the PDF, and one thing strikes me:
--io-chunk-size SIZE in kB; use 4 for VMDKs, defaults to 128

and later:
● Memory
● 2GB allocation OK for:
● 200GB@4KB chunks
● 6TB@128KB chunks
...
32TB of data at 128KB requires 8GB of RAM. 1TB @ 4KB equals the same 8GB.

We are using ESXi 5 in a SAN environment, right now with a 2TB backup volume.
You are right, 16GB of RAM is still a lot...
And why a 4KB chunk size for VMDKs?
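
A quick back-of-envelope check (plain arithmetic, not OpenDedup internals) shows why "32TB of data at 128KB" and "1TB @ 4KB" land on the same 8GB: RAM tracks the number of chunks, and those two cases have exactly the same chunk count:

```shell
# RAM for block-level dedup scales with the chunk count, so dividing
# data size by chunk size explains the figures quoted from the PDF.
GiB=$((1024 * 1024 * 1024)); TiB=$((GiB * 1024))
c1=$(( 32 * TiB / (128 * 1024) ))   # chunks in 32TB @ 128KB
c2=$(( 1 * TiB / (4 * 1024) ))      # chunks in 1TB  @ 4KB
echo "$c1 $c2"                      # identical chunk counts
# implied per-chunk overhead for the "8GB of RAM" cases:
echo "$(( 8 * GiB / c1 )) bytes per chunk"
```

That works out to roughly 32 bytes of RAM per chunk, which is why halving the chunk size doubles the memory needed for the same data.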



Re: [CentOS] Deduplication data for CentOS?

2012-08-28 Thread Fajar Priyanto
Sorry for the top posting.
Dedup is just hype. After a while, the table that manages the deduped
data becomes too big. Don't use it for the long term.

Sent from Samsung Galaxy ^^


Re: [CentOS] Deduplication data for CentOS?

2012-08-28 Thread John R Pierce
On 08/28/12 12:58 AM, Rainer Traut wrote:
 The website looks promising. They are using a thing called SPL,
 Sun/Solaris Porting Layer to be able to use the Solaris ZFS code.
 But OpenSolaris no longer exists, does it? Does that mean they have to stay
 with the ZFS code from when it was still open?

OpenSolaris spawned illumos (the kernel) and OpenIndiana (a complete OS 
based on illumos and OpenSolaris), as well as some other illumos-based 
distributions like Nexenta.





-- 
john r pierce    N 37, W 122
santa cruz ca mid-left coast



Re: [CentOS] Deduplication data for CentOS?

2012-08-28 Thread John R Pierce
On 08/28/12 1:03 AM, Rainer Traut wrote:
 Rsync is of no use for us. We have mainly big Domino .nsf files which
 only change slightly. So rsync  would not be able to make many hardlinks. :)

So you need block-level dedup?  Good luck with that.  I've never seen a 
scheme yet that wasn't full of issues or didn't have really bad performance.



-- 
john r pierce    N 37, W 122
santa cruz ca mid-left coast



Re: [CentOS] Deduplication data for CentOS?

2012-08-28 Thread Leon Fauster
On 28.08.2012 at 10:03, Rainer Traut wrote:
 Rsync is of no use for us. We have mainly big Domino .nsf files which 
 only change slightly. So rsync  would not be able to make many hardlinks. :)


Can this approach ensure the consistency of these database files? 

--
LF





Re: [CentOS] Deduplication data for CentOS?

2012-08-28 Thread Les Mikesell
On Tue, Aug 28, 2012 at 3:03 AM, Rainer Traut tr...@gmx.de wrote:

 Rsync is of no use for us. We have mainly big Domino .nsf files which
 only change slightly. So rsync  would not be able to make many hardlinks. :)

Rdiff-backup might work for this since it stores deltas.   Are you
doing something to snapshot the filesystem during the copy, or are
these just growing logs where consistency doesn't matter?

I'd probably look at FreeBSD with ZFS on a machine with a boatload of
RAM if I needed dedup in the filesystem right now.   Or put together
some scripts that would copy and split the large files into chunks in a
directory and let BackupPC take it from there.
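
The "split the large files into chunks" idea can be sketched with split(1); the chunk size and paths below are arbitrary choices for illustration:

```shell
# Split a big file into fixed-size chunks so a whole-file deduplicator
# (like BackupPC) can match the pieces that did not change.
mkdir -p /tmp/chunkdemo
head -c 100000 /dev/zero > /tmp/chunkdemo/big.nsf   # stand-in "big file"
split -b 4096 -d /tmp/chunkdemo/big.nsf /tmp/chunkdemo/big.nsf.part.
ls /tmp/chunkdemo | wc -l   # the original plus 25 chunks (24 full + 1 partial)
```

On the next backup, only the chunk files covering changed regions differ, so the unchanged chunks dedup via hardlinks instead of the whole file being stored again.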


-- 
   Les Mikesell
 lesmikes...@gmail.com


Re: [CentOS] Deduplication data for CentOS?

2012-08-28 Thread John R Pierce
On 08/28/12 11:41 AM, Les Mikesell wrote:
 On Tue, Aug 28, 2012 at 3:03 AM, Rainer Trauttr...@gmx.de  wrote:
 
 Rsync is of no use for us. We have mainly big Domino .nsf files which
 only change slightly. So rsync  would not be able to make many hardlinks. :)
 Rdiff-backup might work for this since it stores deltas.   Are you
 doing something to snapshot the filesystem during the copy or are
 these just growing logs where consistency doesn't matter?

NSF files are a proprietary database format used by Lotus Notes and 
Domino: very complex, with a pile of versions, and totally opaque.  
Pretty sure that if they are being accessed or updated while being 
copied, the copy is invalid, so yes, some form of snapshotting is 
required.

commercial backup software uses Domino/Notes APIs to do incremental 
backups, for example 
http://www.symantec.com/business/support/index?page=content&id=TECH46513



-- 
john r pierce    N 37, W 122
santa cruz ca mid-left coast



Re: [CentOS] Deduplication data for CentOS?

2012-08-28 Thread Les Mikesell
On Tue, Aug 28, 2012 at 2:04 PM, John R Pierce pie...@hogranch.com wrote:
 On 08/28/12 11:41 AM, Les Mikesell wrote:
 On Tue, Aug 28, 2012 at 3:03 AM, Rainer Trauttr...@gmx.de  wrote:
 
 Rsync is of no use for us. We have mainly big Domino .nsf files which
 only change slightly. So rsync  would not be able to make many hardlinks. 
 :)
 Rdiff-backup might work for this since it stores deltas.   Are you
 doing something to snapshot the filesystem during the copy or are
 these just growing logs where consistency doesn't matter?

 NSF files are a proprietary database format used by Lotus Notes and
 Domino, very complex, there's a pile of versions, and they are totally
 opaque.  Pretty sure that if they are being accessed or updated while
 being copied the copy is invalid, so yes, some form of snapshotting is
 required.

 commercial backup software uses Domino/Notes APIs to do incremental
 backups, for example
 http://www.symantec.com/business/support/index?page=content&id=TECH46513

If there is a command-line way to generate an incremental backup file,
backuppc could run it via ssh as a pre-backup command.

-- 
  Les Mikesell
lesmikes...@gmail.com


[CentOS] Deduplication data for CentOS?

2012-08-27 Thread Rainer Traut
Hi list,

is there any working solution for deduplication of data for centos?
We are trying to find a solution for our backup server which runs a bash 
script invoking xdelta(3). But having this functionality in fs is much 
more friendly...

We have looked into lessfs, sdfs and ddar.
Are these filesystems ready to use (on centos)?
ddar is something different, I know.

Thx
Rainer


Re: [CentOS] Deduplication data for CentOS?

2012-08-27 Thread John Doe
From: Rainer Traut tr...@gmx.de

 is there any working solution for deduplication of data for centos?
 We are trying to find a solution for our backup server which runs a bash 
 script invoking xdelta(3). But having this functionality in fs is much 
 more friendly...
 
 We have looked into lessfs, sdfs and ddar.
 Are these filesystems ready to use (on centos)?
 ddar is something different, I know.

Never tried it, but what about ZFS?

JD


Re: [CentOS] Deduplication data for CentOS?

2012-08-27 Thread Rainer Traut
On 27.08.2012 14:15, John Doe wrote:
 From: Rainer Traut tr...@gmx.de

 is there any working solution for deduplication of data for centos?
 We are trying to find a solution for our backup server which runs a bash
 script invoking xdelta(3). But having this functionality in fs is much
 more friendly...

 We have looked into lessfs, sdfs and ddar.
 Are these filesystems ready to use (on centos)?
 ddar is something different, I know.

 Never tried but what about zfs?

Yeah, I know it has this feature, but is there a working ZFS 
implementation for Linux?
Linux is a must, because the data we are backing up are Domino 
databases, and it is also a customer's requirement.

And btrfs has not implemented this feature yet, I think.




Re: [CentOS] Deduplication data for CentOS?

2012-08-27 Thread Janne Snabb
On 08/27/2012 07:23 PM, Rainer Traut wrote:

 Yeah I know it has this feature, but is there a working zfs 
 implementation for linux?

I have heard some positive feedback about http://zfsonlinux.org/ but I
have not had time to test myself yet. It probably depends on your
intended usage. It is a new in-kernel ZFS implementation (different from
the old FUSE implementation).

RHEL 6.2 x86_64 is listed as one of the supported OSes, so it probably
works fine with CentOS too.

There is some positive and negative feedback in the following links:

https://groups.google.com/a/zfsonlinux.org/group/zfs-discuss/browse_thread/thread/5a739039623f8fb1

http://pingd.org/2012/installing-zfs-raid-z-on-centos-6-2-with-ssd-caching.html

Please share your results if you do any testing :)

-- 
Janne Snabb / EPIPE Communications
sn...@epipe.com - http://epipe.com/


Re: [CentOS] Deduplication data for CentOS?

2012-08-27 Thread John R Pierce
On 08/27/12 4:55 AM, Rainer Traut wrote:
 is there any working solution for deduplication of data for centos?
 We are trying to find a solution for our backup server which runs a bash
 script invoking xdelta(3). But having this functionality in fs is much
 more friendly...

BackupPC does exactly this. It's not a generalized solution to 
deduplication of a file system; instead, it's a backup system, designed 
to back up multiple targets, that implements deduplication on the backup 
tree it maintains.



-- 
john r pierce    N 37, W 122
santa cruz ca mid-left coast



Re: [CentOS] Deduplication data for CentOS?

2012-08-27 Thread Les Mikesell
On Mon, Aug 27, 2012 at 9:23 AM, John R Pierce pie...@hogranch.com wrote:
 On 08/27/12 4:55 AM, Rainer Traut wrote:
 is there any working solution for deduplication of data for centos?
 We are trying to find a solution for our backup server which runs a bash
 script invoking xdelta(3). But having this functionality in fs is much
 more friendly...

 BackupPC does exactly this. It's not a generalized solution to
 deduplication of a file system; instead, it's a backup system, designed
 to back up multiple targets, that implements deduplication on the backup
 tree it maintains.

Not _exactly_, but maybe close enough and it is very easy to install
and try.   Backuppc will use rsync for transfers and thus only uses
bandwidth for the differences, but it uses hardlinks to files to dedup
the storage.  It will find and link duplicate content even from
different sources, but the complete file must be identical.  It does
not store deltas, so large files that change even slightly between
backups end up stored as complete copies (with optional compression).
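
The hardlink scheme described here is easy to demonstrate directly with rsync's --link-dest option (directory names below are made up for the demo):

```shell
# Whole-file dedup via hardlinks: unchanged files in the new backup
# become hardlinks into the previous one and cost no extra space.
mkdir -p /tmp/linkdemo/src
echo "unchanged content" > /tmp/linkdemo/src/a.txt

# first full backup
rsync -a /tmp/linkdemo/src/ /tmp/linkdemo/backup-1/

# second backup: compare against backup-1, hardlink identical files
rsync -a --link-dest=/tmp/linkdemo/backup-1 \
      /tmp/linkdemo/src/ /tmp/linkdemo/backup-2/

# same inode in both backups => one copy on disk
ls -i /tmp/linkdemo/backup-1/a.txt /tmp/linkdemo/backup-2/a.txt
```

As noted above, the match is all-or-nothing: a file that changed by one byte gets stored whole, which is why delta-storing tools suit big, slightly-changing files better.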

-- 
   Les Mikesell
 lesmikes...@gmail.com


Re: [CentOS] Deduplication data for CentOS?

2012-08-27 Thread Dean Jones
Deduplication with ZFS takes a lot of RAM.

I would not yet trust any of the linux zfs projects for data that I
wanted to keep long term.

On Mon, Aug 27, 2012 at 8:26 AM, Les Mikesell lesmikes...@gmail.com wrote:
 On Mon, Aug 27, 2012 at 9:23 AM, John R Pierce pie...@hogranch.com wrote:
 On 08/27/12 4:55 AM, Rainer Traut wrote:
 is there any working solution for deduplication of data for centos?
 We are trying to find a solution for our backup server which runs a bash
 script invoking xdelta(3). But having this functionality in fs is much
 more friendly...

 BackupPC does exactly this. It's not a generalized solution to
 deduplication of a file system; instead, it's a backup system, designed
 to back up multiple targets, that implements deduplication on the backup
 tree it maintains.

 Not _exactly_, but maybe close enough and it is very easy to install
 and try.   Backuppc will use rsync for transfers and thus only uses
 bandwidth for the differences, but it uses hardlinks to files to dedup
 the storage.  It will find and link duplicate content even from
 different sources, but the complete file must be identical.  It does
 not store deltas, so large files that change even slightly between
 backups end up stored as complete copies (with optional compression).

 --
Les Mikesell
  lesmikes...@gmail.com


Re: [CentOS] Deduplication data for CentOS?

2012-08-27 Thread Leon Fauster
On 27.08.2012 at 16:23, John R Pierce wrote:
 On 08/27/12 4:55 AM, Rainer Traut wrote:
 is there any working solution for deduplication of data for centos?
 We are trying to find a solution for our backup server which runs a bash
 script invoking xdelta(3). But having this functionality in fs is much
 more friendly...
 
 BackupPC does exactly this. It's not a generalized solution to 
 deduplication of a file system; instead, it's a backup system, designed 
 to back up multiple targets, that implements deduplication on the backup 
 tree it maintains.


AFAIK, Bacula has deduplication capabilities.

--
LF
 


Re: [CentOS] Deduplication data for CentOS?

2012-08-27 Thread Les Mikesell
On Mon, Aug 27, 2012 at 6:55 AM, Rainer Traut tr...@gmx.de wrote:

 is there any working solution for deduplication of data for centos?
 We are trying to find a solution for our backup server which runs a bash
 script invoking xdelta(3). But having this functionality in fs is much
 more friendly...


Below forwarded on behalf of mroth:

Les,

   A favor, please?  Could you post this for me? Spamhaus is bouncing me
again, this time because *they* have a bug (see below). I tried asking
Karanbir, but I guess he's not online yet

   Thanks in advance.

John R Pierce wrote:
 On 08/27/12 4:55 AM, Rainer Traut wrote:
 is there any working solution for deduplication of data for centos? We
are trying to find a solution for our backup server which runs a bash
script invoking xdelta(3). But having this functionality in fs is much
more friendly...

 BackupPC does exactly this. It's not a generalized solution to
deduplication of a file system; instead, it's a backup system, designed to
back up multiple targets, that implements deduplication on the backup tree
it maintains.

I've tried, twice, to suggest that a workaround that doesn't involve a
new, and possibly experimental f/s would be to use rsync with hard links,
which is what we do. There's no way we have enough disk space for 5 weeks
of terabytes of data

However, the reason I haven't been able to suggest it is that I'm being
blocked by Spamhaus. And when I go there, it asserts I'm listed in the
CBL. And when I go *THERE*, it tells me I'm not.

Oh, and now, when I try to go to the CBL, it's down.

I don't suppose the CentOS list has a whitelist

 mark


Re: [CentOS] Deduplication data for CentOS?

2012-08-27 Thread David C. Miller
- Original Message -
 From: Rainer Traut tr...@gmx.de
 To: centos@centos.org
 Sent: Monday, August 27, 2012 4:55:03 AM
 Subject: [CentOS] Deduplication data for CentOS?
 
 Hi list,
 
 is there any working solution for deduplication of data for centos?
 We are trying to find a solution for our backup server which runs a
 bash
 script invoking xdelta(3). But having this functionality in fs is
 much
 more friendly...
 
 We have looked into lessfs, sdfs and ddar.
 Are these filesystems ready to use (on centos)?
 ddar is something different, I know.
 
 Thx
 Rainer

Although not open source, CrashPlan PROe only costs $365 for a perpetual five-
client license. I use it to back up some of my Linux boxes. It has very good 
deduplication, compression, and encryption. For example, I have 1.7TB of data on 
one Linux system and another system that has 1.5TB. I NFS-mount one of the 
systems on the other and use a single CrashPlan client to back up both data sets 
to a single backup archive. The backup archive is only 1.2TB, and it also 
spans 90 days' worth of file modifications and deletions that I can recover. 

David.


Re: [CentOS] Deduplication data for CentOS?

2012-08-27 Thread Brian Mathis
On Mon, Aug 27, 2012 at 7:55 AM, Rainer Traut tr...@gmx.de wrote:
 Hi list,

 is there any working solution for deduplication of data for centos?
 We are trying to find a solution for our backup server which runs a bash
 script invoking xdelta(3). But having this functionality in fs is much
 more friendly...

 We have looked into lessfs, sdfs and ddar.
 Are these filesystems ready to use (on centos)?
 ddar is something different, I know.

 Thx
 Rainer


This is something I have been thinking about peripherally for a while
now.  What are your impressions of SDFS (OpenDedup)?  I had been
hoping it would be pretty good.  Any issues with it on CentOS?


❧ Brian Mathis


Re: [CentOS] Deduplication data for CentOS?

2012-08-27 Thread Adam Tauno Williams
On Mon, 2012-08-27 at 14:32 -0400, Brian Mathis wrote: 
 On Mon, Aug 27, 2012 at 7:55 AM, Rainer Traut tr...@gmx.de wrote:
  We have looked into lessfs, sdfs and ddar.
  Are these filesystems ready to use (on centos)?
  ddar is something different, I know.
 This is something I have been thinking about peripherally for a while
 now.  What are your impressions of SDFS (OpenDedupe)?  I had been
 hoping it would be pretty good.  Any issues with it on CentOS?

I've used it for backups; it works reliably.  It is memory hungry
however [sort of the nature of block-level deduplication].
http://www.wmmi.net/documents/OpenDedup.pdf

