[gentoo-user] Re: WARNING: Crucial MX300 drives SUUUUUCK!!!!

2017-03-07 Thread Kai Krakow
On Mon, 6 Mar 2017 09:09:48 -0500, "Poison BL." wrote:

> On Mon, Mar 6, 2017 at 2:23 AM, Kai Krakow wrote:
> 
> > On Tue, 14 Feb 2017 16:14:23 -0500, "Poison BL." wrote:
> > > I actually see both sides of it... as nice as it is to have a
> > > chance to recover the information from between the last backup
> > > and the death of the drive, the reduced chance of corrupt data
> > > from a silently failing (spinning) disk making it into backups is
> > > a bit of a good balancing point for me.  
> >
> > I've seen borgbackup giving me good protection against this. First,
> > it doesn't back up files that are already in the backup. So if data
> > changed silently (without the metadata changing), the corrupted
> > version won't make it into the backup. Second, it does incremental
> > backups. Even if something broke and made it into the backup, you
> > can still go back weeks or months to get the file back. The
> > algorithm is very efficient. And every incremental backup is a full
> > backup at the same time, so you can thin out the backup history by
> > deleting any backup at any time (it's not like traditional
> > incremental backups, which always need the parent backup).
> >
> > OTOH, this means that every data block is only stored once. If
> > silent data corruption hits there, you lose the complete history of
> > that file (and maybe of others sharing the same deduplicated block).
> >
> > For some numbers: I'm storing my 1.7 TB system on a 3 TB disk which
> > is now 2.2 TB full - and that holds almost a year of backup history
> > (daily backups).
> >
> > As a sort of protection against silent data corruption, you could
> > rsync the borgbackup repository to a remote location. The
> > differences are usually small, so that should be a fast operation -
> > maybe to some cloud storage, or to a RAID-protected NAS that can
> > detect and correct silent data corruption (like ZFS- or btrfs-based
> > systems).
> >
> >
> > --
> > Regards,
> > Kai
> >
> > Replies to list-only preferred.
> >  
> 
> That's some impressive backup density... I haven't looked into
> borgbackup, but it sounds like it runs on the same principles as the
> rsync+hardlink based scripts I've seen. Those will back up files that
> have silently changed, since the checksums won't match any more, but
> they won't blow away previous copies of the file either. I'll have to
> give it a try!

Borgbackup seems to check inode metadata to get a list of changed files
really fast. For me it only needs a few minutes to scan through millions
of files; rsync is way slower, and even "find" feels slower. A daily
backup usually takes 8-12 minutes (depending on the delta), and thinning
old backups out of the backup set takes another 1-2 minutes.
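
Roughly, the idea behind that speed is something like this (an
illustrative Python sketch of the principle only, not borg's actual
files cache; the cache filename is made up for the example):

import json
import os

CACHE = "file-cache.json"   # hypothetical cache location, example only

def files_to_back_up(root):
    """Return files whose (inode, size, mtime) changed since the last
    run; everything else can be skipped without reading its contents."""
    try:
        with open(CACHE) as f:
            old = json.load(f)
    except FileNotFoundError:
        old = {}
    changed, new = [], {}
    for dirpath, _dirs, names in os.walk(root):
        for name in names:
            path = os.path.join(dirpath, name)
            st = os.stat(path)
            key = [st.st_ino, st.st_size, int(st.st_mtime)]
            new[path] = key
            if old.get(path) != key:
                changed.append(path)   # only these get read and chunked
    with open(CACHE, "w") as f:
        json.dump(new, f)
    return changed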

> As for protecting the backup set itself against silent corruption, an
> rsync to a remote location would help, but you would have to ensure it
> doesn't overwrite anything already there that may have changed - only
> create new files.

Use a timestamp-only check in rsync, not a content check. This should
work for borgbackup since it only ever creates newer files, never older
ones.
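
In other words, the mirror rule is just "copy what is missing or newer,
never replace anything with older content" - roughly the following
sketch (illustrative only; rsync with the right options already does
this for you):

import os
import shutil

def mirror_new_only(src_root, dst_root):
    """One-way sync: create missing files, update only when the source
    is newer by mtime, never clobber existing data with older content."""
    for dirpath, _dirs, names in os.walk(src_root):
        rel = os.path.relpath(dirpath, src_root)
        dst_dir = os.path.join(dst_root, rel)
        os.makedirs(dst_dir, exist_ok=True)
        for name in names:
            src = os.path.join(dirpath, name)
            dst = os.path.join(dst_dir, name)
            if not os.path.exists(dst) or \
                    os.path.getmtime(src) > os.path.getmtime(dst):
                shutil.copy2(src, dst)   # copy2 also preserves timestamps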

> Also, making the initial clone would
> take ages, I suspect, since it would have to rebuild the hardlink set
> for everything (again, assuming that's the trick borgbackup's using).

No, that's not the trick. Files are stored as chunks, and chunks are cut
using a moving-window checksum (a rolling hash) so that duplicate blocks
can be detected. Deduplication therefore happens not at the file level
but at the sub-file level (block level, with variable block sizes).

Additionally, those chunks can be compressed with lz4, gzip, and I
think xz (the latter being painfully slow of course).
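
The chunking principle looks roughly like this (an illustrative Python
sketch only - the real chunker uses a proper rolling hash (buzhash)
updated in O(1) per byte, and different window and target sizes):

import hashlib

WINDOW = 64            # bytes in the sliding window (illustrative size)
MASK = (1 << 12) - 1   # boundary about every 4 KiB on average (illustrative)

def split_into_chunks(data: bytes):
    """Cut a chunk boundary wherever the window hash matches the mask,
    so identical data yields identical chunks even if it shifts around
    inside a file."""
    chunks, start = [], 0
    for i in range(WINDOW, len(data) + 1):
        # Re-hashing the whole window keeps the example short; a rolling
        # hash would update incrementally instead.
        h = int.from_bytes(hashlib.sha1(data[i - WINDOW:i]).digest()[:4], "big")
        if h & MASK == 0:
            chunks.append(data[start:i])
            start = i
    if start < len(data) or not chunks:
        chunks.append(data[start:])
    return chunks

def deduplicate(chunks):
    """Store every distinct chunk only once, keyed by its digest."""
    return {hashlib.sha256(c).hexdigest(): c for c in chunks}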

> One of the best options is to house the base backup set itself on
> something like zfs or btrfs on a system with ecc ram, and maintain
> checksums of everything on the side (crc32 would likely suffice, but
> sha1 is fast enough these days that there's almost no excuse not to
> use it). It might be possible to task tripwire to keep tabs on that
> side
> of it, now that I consider it. While the filesystem itself in that
> case is trying its best to prevent issues, there's always that slim
> risk that there's a bug in the filesystem code itself that eats
> something, hence the added layer of paranoia. Also, with ZFS for the
> base data set,
> you gain in-place compression,

That's already done by borgbackup.

> dedup

That's also done by borgbackup.

> if you're feeling
> adventurous

You don't have to, because you can use a simpler filesystem for
borgbackup. I'm storing on xfs and still plan to sync to a remote
location.

> (not really worth it unless you have multiple very
> similar backup sets for different systems), block-level checksums,
> redundancy across physical disks, in-place snapshots, and the ability
> to use zfs send/receive to do snapshot backups of the backup set
> itself.
> 
> I managed to corrupt some data with zfs (w/ dedup, on gentoo) shared
> out over nfs a while back, on a box with way too little RAM (nothing
> important, throwaway VM images) - hence the paranoia of secondary
> checksum auditing, and still replicating the backup set for things
> that might be important.

Re: [gentoo-user] Re: WARNING: Crucial MX300 drives SUUUUUCK!!!!

2017-03-06 Thread Poison BL.
On Mon, Mar 6, 2017 at 2:23 AM, Kai Krakow wrote:

> On Tue, 14 Feb 2017 16:14:23 -0500, "Poison BL." wrote:
> > I actually see both sides of it... as nice as it is to have a chance
> > to recover the information from between the last backup and the death
> > of the drive, the reduced chance of corrupt data from a silently
> > failing (spinning) disk making it into backups is a bit of a good
> > balancing point for me.
>
> I've seen borgbackup giving me good protection against this. First, it
> doesn't back up files that are already in the backup. So if data
> changed silently (without the metadata changing), the corrupted version
> won't make it into the backup. Second, it does incremental backups.
> Even if something broke and made it into the backup, you can still go
> back weeks or months to get the file back. The algorithm is very
> efficient. And every incremental backup is a full backup at the same
> time, so you can thin out the backup history by deleting any backup at
> any time (it's not like traditional incremental backups, which always
> need the parent backup).
>
> OTOH, this means that every data block is only stored once. If silent
> data corruption hits there, you lose the complete history of that file
> (and maybe of others sharing the same deduplicated block).
>
> For some numbers: I'm storing my 1.7 TB system on a 3 TB disk which is
> now 2.2 TB full - and that holds almost a year of backup history (daily
> backups).
>
> As a sort of protection against silent data corruption, you could rsync
> the borgbackup repository to a remote location. The differences are
> usually small, so that should be a fast operation - maybe to some cloud
> storage, or to a RAID-protected NAS that can detect and correct silent
> data corruption (like ZFS- or btrfs-based systems).
>
>
> --
> Regards,
> Kai
>
> Replies to list-only preferred.
>

That's some impressive backup density... I haven't looked into
borgbackup, but it sounds like it runs on the same principles as the
rsync+hardlink based scripts I've seen. Those will back up files that
have silently changed, since the checksums won't match any more, but
they won't blow away previous copies of the file either. I'll have to
give it a try!

As for protecting the backup set itself against silent corruption, an
rsync to a remote location would help, but you would have to ensure it
doesn't overwrite anything already there that may have changed - only
create new files. Also, making the initial clone would take ages, I
suspect, since it would have to rebuild the hardlink set for everything
(again, assuming that's the trick borgbackup's using). One of the best
options is to house the base backup set itself on something like zfs or
btrfs on a system with ECC RAM, and maintain checksums of everything on
the side (crc32 would likely suffice, but sha1 is fast enough these days
that there's almost no excuse not to use it). It might be possible to
task tripwire to keep tabs on that side of it, now that I consider it.
While the filesystem itself in that case is trying its best to prevent
issues, there's always that slim risk that a bug in the filesystem code
itself eats something, hence the added layer of paranoia. Also, with ZFS
for the base data set, you gain in-place compression, dedup if you're
feeling adventurous (not really worth it unless you have multiple very
similar backup sets for different systems), block-level checksums,
redundancy across physical disks, in-place snapshots, and the ability to
use zfs send/receive to do snapshot backups of the backup set itself.
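
A side-channel audit like that could be as simple as the following
sketch (illustrative only; the manifest filename and layout are made up
for the example):

import hashlib
import os

MANIFEST = "backup-manifest.sha1"   # hypothetical filename, example only

def sha1_of(path, bufsize=1 << 20):
    h = hashlib.sha1()
    with open(path, "rb") as f:
        while block := f.read(bufsize):
            h.update(block)
    return h.hexdigest()

def write_manifest(root):
    """Record a digest for every file in the backup tree."""
    with open(MANIFEST, "w") as out:
        for dirpath, _dirs, names in os.walk(root):
            for name in names:
                path = os.path.join(dirpath, name)
                out.write(sha1_of(path) + "  " + path + "\n")

def audit_manifest():
    """Re-hash everything and report files that no longer match."""
    with open(MANIFEST) as f:
        for line in f:
            digest, path = line.rstrip("\n").split("  ", 1)
            if sha1_of(path) != digest:
                print("MISMATCH:", path)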

I managed to corrupt some data with zfs (w/ dedup, on gentoo) shared out
over nfs a while back, on a box with way too little RAM (nothing
important, throwaway VM images) - hence the paranoia of secondary
checksum auditing, and still replicating the backup set for things that
might be important.

-- 
Poison [BLX]
Joshua M. Murphy


[gentoo-user] Re: WARNING: Crucial MX300 drives SUUUUUCK!!!!

2017-03-05 Thread Kai Krakow
On Tue, 14 Feb 2017 16:14:23 -0500, "Poison BL." wrote:

> On Tue, Feb 14, 2017 at 3:46 PM, Daniel Frey wrote:
> 
> > On 02/13/2017 10:17 AM, Poison BL. wrote:  
> > >
> > > I've had more than one spinning rust drive fail hard over the
> > > years as well, though yes, you do usually have some chance of
> > > recovery from those. Gambling on that chance by leaving a given
> > > disk as a single point of failure is still a bad idea, spinning
> > > disk or not. The point that you went from single-disk SSD back to
> > > raid10 makes me question why, if your uptime requirements (even
> > > if only for your own desires on a personal machine) justify
> > > raid10, you weren't on at least raid1 with the SSD setup.
> >
> > I finally got tired and replaced my old laptop with a ThinkPad P70,
> > and boy is it so much faster than anything else I own. Compile
> > times are crazy fast on this new laptop of mine, but it came
> > equipped with an i7 with 8 threads and 16GB of RAM, which I'm sure
> > helps A LOT.
> >
> > I'm going to get an SSD (or maybe an NVMe drive) for the new laptop
> > and leave /home on ol' reliable rust disks.
> >
> > I do have backups. That's not the concern - the concern for me was
> > turning on the PC and having it completely crap out.
> >
> > I used to have an SSD on my mythtv backend server, and it started
> > behaving strangely one day. I could not log in to the console. X
> > froze. Logged in via ssh and files appeared to be missing on the
> > root partition. Rebooted the backend server and it was completely
> > dead, no warnings or anything.
> >
> > Dan
> >
> >
> >
> >  
> I actually see both sides of it... as nice as it is to have a chance
> to recover the information from between the last backup and the death
> of the drive, the reduced chance of corrupt data from a silently
> failing (spinning) disk making it into backups is a bit of a good
> balancing point for me.

I've seen borgbackup giving me good protection against this. First, it
doesn't back up files that are already in the backup. So if data changed
silently (without the metadata changing), the corrupted version won't
make it into the backup. Second, it does incremental backups. Even if
something broke and made it into the backup, you can still go back weeks
or months to get the file back. The algorithm is very efficient. And
every incremental backup is a full backup at the same time, so you can
thin out the backup history by deleting any backup at any time (it's not
like traditional incremental backups, which always need the parent
backup).

OTOH, this means that every data block is only stored once. If silent
data corruption hits there, you lose the complete history of that file
(and maybe of others sharing the same deduplicated block).
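
To illustrate that design (a rough sketch only, not borgbackup's actual
data structures): each archive is essentially a list of chunk IDs and
chunks are reference-counted, which is why any archive can be deleted
independently - and also why a silently corrupted chunk hurts every
archive that references it.

chunks = {}    # chunk_id -> (refcount, data); each chunk stored only once
archives = {}  # archive_name -> list of chunk_ids

def add_archive(name, chunk_map):
    """chunk_map: {chunk_id: data} as produced by the chunker."""
    archives[name] = list(chunk_map)
    for cid, data in chunk_map.items():
        refs, _ = chunks.get(cid, (0, data))
        chunks[cid] = (refs + 1, data)

def delete_archive(name):
    """Drop any archive without touching the others."""
    for cid in archives.pop(name):
        refs, data = chunks[cid]
        if refs == 1:
            del chunks[cid]          # no other archive needs this chunk
        else:
            chunks[cid] = (refs - 1, data)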

For some numbers: I'm storing my 1.7 TB system on a 3 TB disk which is
now 2.2 TB full - and that holds almost a year of backup history (daily
backups).

As a sort of protection against silent data corruption, you could rsync
the borgbackup repository to a remote location. The differences are
usually small, so that should be a fast operation - maybe to some cloud
storage, or to a RAID-protected NAS that can detect and correct silent
data corruption (like ZFS- or btrfs-based systems).


-- 
Regards,
Kai

Replies to list-only preferred.