Re: HDD long-term data storage with ensured integrity

2024-05-04 Thread Marc SCHAEFER
On Fri, May 03, 2024 at 01:50:52PM -0700, David Christensen wrote:
> Thank you for devising a benchmark and posting some data.  :-)

I did not do the comparison hosted on GitHub.  I just wrote the
script that tests error detection and correction with dm-integrity
on dm-raid.

> FreeBSD also offers a layered solution.  From the top down:

I prefer this approach, indeed.



Re: HDD long-term data storage with ensured integrity

2024-05-03 Thread David Christensen

On 5/3/24 04:26, Marc SCHAEFER wrote:

On Mon, Apr 08, 2024 at 10:04:01PM +0200, Marc SCHAEFER wrote:

For off-site long-term offline archiving, no, I am not using RAID.


Now, as I had to think a bit about ONLINE integrity, I found this
comparison:

https://github.com/t13a/dm-integrity-benchmarks

Contenders are btrfs, zfs, and notably ext4+dm-integrity+dm-raid

I tend to have a bias favoring UNIX layered solutions over
"all-in-one" solutions, and it seems that performance-wise
it's also quite good.

I wrote this script to convince myself of auto-correction
of the ext4+dm-integrity+dm-raid layered approach.



Thank you for devising a benchmark and posting some data.  :-)


FreeBSD also offers a layered solution.  From the top down:

* UFS2 file system, which supports snapshots (requires partitions with 
soft updates enabled).


* gpart(8) for partitions (volumes).

* graid(8) for redundancy and self-healing.

* geli(8) providers with continuous integrity checking.


AFAICT the FreeBSD stack is mature and production quality, which I find 
very appealing.  But the feature set is not as sophisticated as ZFS, 
which leaves me wanting.  Notably, I have not found a way to replicate 
UFS snapshots directly -- the best I can dream up is synchronizing a 
snapshot to a backup UFS2 filesystem and then taking a snapshot with the 
same name.
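
Something like the following sketch is what I have in mind (paths and 
snapshot names are made up, and this assumes the one-argument 
mksnap_ffs(8) syntax and rsync from ports -- untested):

# take a snapshot of the live UFS2 filesystem
mksnap_ffs /data/.snap/snap-20240503

# attach the snapshot as a read-only memory disk and mount it
md=$(mdconfig -a -t vnode -o readonly -f /data/.snap/snap-20240503)
mount -o ro /dev/$md /mnt/snap

# synchronize the snapshot contents into the backup UFS2 filesystem
rsync -a --delete /mnt/snap/ /backup/

# take a snapshot of the backup under the same name
mksnap_ffs /backup/.snap/snap-20240503

# clean up
umount /mnt/snap
mdconfig -d -u ${md#md}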



I am coming to the conclusion that the long-term survivability of data 
requires several components -- good live file system, good backups, good 
archives, continuous internal integrity checking with self-healing, 
periodic external integrity checking (e.g. mtree(1)) with some form of 
recovery (e.g. manual), etc.  If I get the other pieces right, I could 
go with OpenZFS for the live and backup systems, and worry less about 
data corruption bugs.
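
For the periodic external check, a minimal mtree(1) sketch (BSD mtree 
shown; Debian's mtree-netbsd package takes similar options, and the 
paths are made up):

# record a specification with sha256 digests of everything under /data
mtree -c -K sha256digest -p /data > /var/db/data.mtree

# later: verify the live tree against the stored specification
mtree -f /var/db/data.mtree -p /data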



David



Re: HDD long-term data storage with ensured integrity

2024-05-03 Thread Michael Kjörling
On 3 May 2024 13:26 +0200, from schae...@alphanet.ch (Marc SCHAEFER):
> https://github.com/t13a/dm-integrity-benchmarks
> 
> Contenders are btrfs, zfs, and notably ext4+dm-integrity+dm-raid

ZFS' selling point is not performance, _especially_ on rotational
drives. It's fairly widely accepted that ZFS is inferior in
performance to pretty much everything else modern, even at the best
of times; and some of its features help mitigate its lower raw disk
performance.

ZFS' value proposition lies elsewhere.

Which is fine. It's the right choice for some people; for others,
other alternatives provide better trade-offs.

-- 
Michael Kjörling  https://michael.kjorling.se
“Remember when, on the Internet, nobody cared that you were a dog?”



Re: HDD long-term data storage with ensured integrity

2024-05-03 Thread Marc SCHAEFER
On Mon, Apr 08, 2024 at 10:04:01PM +0200, Marc SCHAEFER wrote:
> For off-site long-term offline archiving, no, I am not using RAID.

Now, as I had to think a bit about ONLINE integrity, I found this
comparison:

https://github.com/t13a/dm-integrity-benchmarks

Contenders are btrfs, zfs, and notably ext4+dm-integrity+dm-raid

I tend to have a bias favoring UNIX layered solutions over
"all-in-one" solutions, and it seems that performance-wise
it's also quite good.

I wrote this script to convince myself of auto-correction
of the ext4+dm-integrity+dm-raid layered approach.

It gives:

[ ... ]
[  390.249699] md/raid1:mdX: read error corrected (8 sectors at 21064 on dm-11)
[  390.249701] md/raid1:mdX: redirecting sector 20488 to other mirror: dm-7
[  390.293807] md/raid1:mdX: dm-11: rescheduling sector 262168
[  390.293988] md/raid1:mdX: read error corrected (8 sectors at 262320 on dm-11)
[  390.294040] md/raid1:mdX: read error corrected (8 sectors at 262368 on dm-11)
[  390.294125] md/raid1:mdX: read error corrected (8 sectors at 262456 on dm-11)
[  390.294209] md/raid1:mdX: read error corrected (8 sectors at 262544 on dm-11)
[  390.294287] md/raid1:mdX: read error corrected (8 sectors at 262624 on dm-11)
[  390.294586] md/raid1:mdX: read error corrected (8 sectors at 263000 on dm-11)
[  390.294712] md/raid1:mdX: redirecting sector 262168 to other mirror: dm-7

Pretty much convincing.

So after testing btrfs and not being convinced, and after doing some
tests on a production zfs -- not convinced either -- I am going to try
ext4+dm-integrity+dm-raid.

#! /bin/bash

set -e
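
# Rough outline: create two loop-backed LVM PVs, build a raid1 LV with
# dm-integrity (--raidintegrity y), fill it with test files and record
# their md5sums, corrupt one PV underneath, then read the LV back so that
# dm-integrity flags the bad sectors and md/raid1 rewrites them from the
# other mirror (see the dmesg extract above).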

function create_lo {
   local f

   f=$(losetup -f)

   losetup $f $1
   echo $f
}

# beware of the rm -r below!
tmp_dir=/tmp/$(basename $0)
mnt=/mnt

mkdir $tmp_dir

declare -a pvs
for p in pv1 pv2
do
   truncate -s 250M $tmp_dir/$p
   
   l=$(create_lo $tmp_dir/$p)
   
   pvcreate $l
   
   pvs+=($l)
done

vg=$(basename $0)-test
lv=test

vgcreate $vg ${pvs[*]}

vgdisplay $vg

lvcreate --type raid1 --raidintegrity y -m 1 -L 200M -n $lv $vg

lvdisplay $vg

# sync/integrity complete?
sleep 10
cat /proc/mdstat
echo
lvs -a -o name,copy_percent,devices $vg
echo
echo -n Type ENTER
read ignore

mkfs.ext4 -I 256 /dev/$vg/$lv
mount /dev/$vg/$lv $mnt

for f in $(seq 1 10)
do
   # ignore errors
   head -c 20M < /dev/random > $mnt/f_$f || true
done

(cd $mnt && find . -type f -print0 | xargs -0 md5sum > $tmp_dir/MD5SUMS)

# corrupting some data in one PV
count=5000
blocks=$(blockdev --getsz ${pvs[1]})
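# $RANDOM only goes up to 32767, so scale offsets to span the whole PV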
if [ $blocks -lt 32767 ]; then
   factor=1
else
   factor=$(( ($blocks - 1) / 32767))
fi

p=1
for i in $(seq 1 $count)
do
  offset=$(($RANDOM * $factor))
  echo ${pvs[$p]} $offset
  dd if=/dev/random of=${pvs[$p]} bs=$(blockdev --getpbsz ${pvs[$p]}) seek=$offset count=1
  # only corrupting PV 1, never PV 0, since we have no way to avoid
  # hitting the same sector on both mirrors!
  #p=$((1 - p))
done
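
# read the whole LV back: dm-integrity flags the corrupted sectors and
# md/raid1 rewrites them from the intact mirror (see dmesg just below)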

dd if=/dev/$vg/$lv of=/dev/null bs=32M
dmesg | tail

umount $mnt

lvremove -y $vg/$lv

vgremove -y $vg

for p in ${pvs[*]}
do
   pvremove $p
   losetup -d $p
done

rm -r $tmp_dir



Re: HDD long-term data storage with ensured integrity

2024-04-12 Thread David Christensen

On 4/12/24 08:14, piorunz wrote:

On 10/04/2024 12:10, David Christensen wrote:

Those sound like some compelling features.


I believe the last time I tried Btrfs was Debian 9 (?).  I ran into
problems because I did not do the required manual maintenance
(rebalancing).  Does the Btrfs in Debian 11 or Debian 12 still require
manual maintenance?  If so, what and how often?


I don't do balance at all; it's not required.

Scrub is recommended because it will detect any bit-rot due to hardware
errors on HDD media. It scans the entire surface of allocated sectors on
all drives. I usually scrub monthly.



Thank you for the information.


David




Re: HDD long-term data storage with ensured integrity

2024-04-12 Thread piorunz

On 10/04/2024 12:10, David Christensen wrote:

Those sound like some compelling features.


I believe the last time I tried Btrfs was Debian 9 (?).  I ran into
problems because I did not do the required manual maintenance
(rebalancing).  Does the Btrfs in Debian 11 or Debian 12 still require
manual maintenance?  If so, what and how often?


I don't do balance at all; it's not required.

Scrub is recommended because it will detect any bit-rot due to hardware
errors on HDD media. It scans the entire surface of allocated sectors on
all drives. I usually scrub monthly.
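
A cron entry along these lines is enough to automate it (mount point is
a placeholder; -B waits for completion, -d prints per-device stats):

# /etc/cron.d/btrfs-scrub
0 3 1 * * root /usr/bin/btrfs scrub start -Bd /srv/data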

--
With kindest regards, Piotr.

⢀⣴⠾⠻⢶⣦⠀
⣾⠁⢠⠒⠀⣿⡁ Debian - The universal operating system
⢿⡄⠘⠷⠚⠋⠀ https://www.debian.org/
⠈⠳⣄



Re: HDD long-term data storage with ensured integrity

2024-04-10 Thread David Christensen

On 4/10/24 08:49, Paul Leiber wrote:

On 10.04.2024 at 13:10, David Christensen wrote:
Does the Btrfs in Debian 11 or Debian 12 still require 
manual maintenance?  If so, what and how often?


Scrub and balance are actions which have been recommended. I am using 
btrfsmaintenance scripts [1][2] to automate this. I am doing a weekly 
balance and a monthly scrub. After some reading today, I am getting 
unsure whether this approach is correct, especially whether balance is 
necessary anymore (it usually doesn't find anything to do anyway), so 
please take these periods with caution. My main message is that such 
operations can be automated using the linked scripts.


Best regards,

Paul

[1] https://packages.debian.org/bookworm/btrfsmaintenance
[2] https://github.com/kdave/btrfsmaintenance



Thank you.  Those scripts should be useful.


David



Re: HDD long-term data storage with ensured integrity

2024-04-10 Thread Paul Leiber

On 10.04.2024 at 13:10, David Christensen wrote:

On 4/9/24 17:08, piorunz wrote:

On 02/04/2024 13:53, David Christensen wrote:


Does anyone have any comments or suggestions regarding how to use
magnetic hard disk drives, commodity x86 computers, and Debian for
long-term data storage with ensured integrity?


I use Btrfs on all my systems, including some servers, with soft Raid1
and Raid10 modes (because these modes are considered stable and
production ready). I decided on Btrfs rather than ZFS because Btrfs
allows migrating drives on the fly while the partition is live and
heavily used, replacing them with different sizes and types, mixing
capacities, changing Raid levels, and changing the number of drives
too. I could go from a single drive to Raid10 on 4 drives and back
while my data stays 100% available at all times.

It has saved my bacon many times, including hard checksum corruption on
an NVMe drive which I would otherwise never have known about. Thanks to
Btrfs I located the corrupted files, fixed them, and got the hardware
replaced under warranty.
It also helped with corrupted RAM: Btrfs simply refused to save a file
because the saved copy couldn't match the checksum read from the source,
due to RAM bit flips. I diagnosed it, replaced the memory, all good.
I like it a lot when one of the drives gets an ATA reset for whatever
reason and all the other drives continue to read and write; I can keep
using the system for hours, if I even notice. Not possible in normal
circumstances without Raid. Once the problematic drive is back, or after
a reboot if it's more serious, I run the "scrub" command and everything
is resynced again. Even if I don't, Btrfs corrects checksum errors
dynamically on the fly anyway.
And the list goes on - I've been using Btrfs for the last 5 years
without a single problem to date; it has survived hard resets, power
losses, drive failures, and countless migrations.



Those sound like some compelling features.


I believe the last time I tried Btrfs was Debian 9 (?).  I ran into 
problems because I did not do the required manual maintenance 
(rebalancing).  Does the Btrfs in Debian 11 or Debian 12 still require 
manual maintenance?  If so, what and how often?


Scrub and balance are actions which have been recommended. I am using 
btrfsmaintenance scripts [1][2] to automate this. I am doing a weekly 
balance and a monthly scrub. After some reading today, I am getting 
unsure whether this approach is correct, especially whether balance is 
necessary anymore (it usually doesn't find anything to do anyway), so 
please take these periods with caution. My main message is that such 
operations can be automated using the linked scripts.
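
For reference, on Debian the periods should live in
/etc/default/btrfsmaintenance (variable names as in the upstream
sysconfig file; the mount points here are just placeholders):

# /etc/default/btrfsmaintenance (excerpt)
BTRFS_BALANCE_PERIOD="weekly"
BTRFS_BALANCE_MOUNTPOINTS="/srv/data"
BTRFS_SCRUB_PERIOD="monthly"
BTRFS_SCRUB_MOUNTPOINTS="/srv/data"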


Best regards,

Paul

[1] https://packages.debian.org/bookworm/btrfsmaintenance
[2] https://github.com/kdave/btrfsmaintenance



Re: HDD long-term data storage with ensured integrity

2024-04-10 Thread Curt
On 2024-04-10, David Christensen  wrote:
>> 
>> I use Btrfs on all my systems, including some servers, with soft Raid1
>> and Raid10 modes (because these modes are considered stable and
>> production ready). I decided on Btrfs rather than ZFS because Btrfs
>> allows migrating drives on the fly while the partition is live and
>> heavily used, replacing them with different sizes and types, mixing
>> capacities, changing Raid levels, and changing the number of drives
>> too. I could go from a single drive to Raid10 on 4 drives and back
>> while my data stays 100% available at all times.
>> It has saved my bacon many times, including hard checksum corruption on
>> an NVMe drive which I would otherwise never have known about. Thanks to
>> Btrfs I located the corrupted files, fixed them, and got the hardware
>> replaced under warranty.
>> It also helped with corrupted RAM: Btrfs simply refused to save a file
>> because the saved copy couldn't match the checksum read from the source,
>> due to RAM bit flips. I diagnosed it, replaced the memory, all good.
>> I like it a lot when one of the drives gets an ATA reset for whatever
>> reason and all the other drives continue to read and write; I can keep
>> using the system for hours, if I even notice. Not possible in normal
>> circumstances without Raid. Once the problematic drive is back, or after
>> a reboot if it's more serious, I run the "scrub" command and everything
>> is resynced again. Even if I don't, Btrfs corrects checksum errors
>> dynamically on the fly anyway.
>> And the list goes on - I've been using Btrfs for the last 5 years
>> without a single problem to date; it has survived hard resets, power
>> losses, drive failures, and countless migrations.
>
>
> Those sound like some compelling features.

I don't believe in immortality. After many a summer dies the swan.



Re: HDD long-term data storage with ensured integrity

2024-04-10 Thread David Christensen

On 4/9/24 17:08, piorunz wrote:

On 02/04/2024 13:53, David Christensen wrote:


Does anyone have any comments or suggestions regarding how to use
magnetic hard disk drives, commodity x86 computers, and Debian for
long-term data storage with ensured integrity?


I use Btrfs on all my systems, including some servers, with soft Raid1
and Raid10 modes (because these modes are considered stable and
production ready). I decided on Btrfs rather than ZFS because Btrfs
allows migrating drives on the fly while the partition is live and
heavily used, replacing them with different sizes and types, mixing
capacities, changing Raid levels, and changing the number of drives
too. I could go from a single drive to Raid10 on 4 drives and back
while my data stays 100% available at all times.

It has saved my bacon many times, including hard checksum corruption on
an NVMe drive which I would otherwise never have known about. Thanks to
Btrfs I located the corrupted files, fixed them, and got the hardware
replaced under warranty.
It also helped with corrupted RAM: Btrfs simply refused to save a file
because the saved copy couldn't match the checksum read from the source,
due to RAM bit flips. I diagnosed it, replaced the memory, all good.
I like it a lot when one of the drives gets an ATA reset for whatever
reason and all the other drives continue to read and write; I can keep
using the system for hours, if I even notice. Not possible in normal
circumstances without Raid. Once the problematic drive is back, or after
a reboot if it's more serious, I run the "scrub" command and everything
is resynced again. Even if I don't, Btrfs corrects checksum errors
dynamically on the fly anyway.
And the list goes on - I've been using Btrfs for the last 5 years
without a single problem to date; it has survived hard resets, power
losses, drive failures, and countless migrations.



Those sound like some compelling features.


I believe the last time I tried Btrfs was Debian 9 (?).  I ran into 
problems because I did not do the required manual maintenance 
(rebalancing).  Does the Btrfs in Debian 11 or Debian 12 still require 
manual maintenance?  If so, what and how often?




[1] https://github.com/openzfs/zfs/issues/15526

[2] https://github.com/openzfs/zfs/issues/15933


Problems reported here are from Linux kernels 6.5 and 6.7 on a Gentoo
system. Does this even affect Debian Stable with 6.1 LTS?



I do not know.



--
With kindest regards, Piotr.

⢀⣴⠾⠻⢶⣦⠀
⣾⠁⢠⠒⠀⣿⡁ Debian - The universal operating system
⢿⡄⠘⠷⠚⠋⠀ https://www.debian.org/
⠈⠳⣄



David



Re: HDD long-term data storage with ensured integrity

2024-04-09 Thread piorunz

On 02/04/2024 13:53, David Christensen wrote:


Does anyone have any comments or suggestions regarding how to use
magnetic hard disk drives, commodity x86 computers, and Debian for
long-term data storage with ensured integrity?


I use Btrfs on all my systems, including some servers, with soft Raid1
and Raid10 modes (because these modes are considered stable and
production ready). I decided on Btrfs rather than ZFS because Btrfs
allows migrating drives on the fly while the partition is live and
heavily used, replacing them with different sizes and types, mixing
capacities, changing Raid levels, and changing the number of drives
too. I could go from a single drive to Raid10 on 4 drives and back
while my data stays 100% available at all times.
It has saved my bacon many times, including hard checksum corruption on
an NVMe drive which I would otherwise never have known about. Thanks to
Btrfs I located the corrupted files, fixed them, and got the hardware
replaced under warranty.
It also helped with corrupted RAM: Btrfs simply refused to save a file
because the saved copy couldn't match the checksum read from the source,
due to RAM bit flips. I diagnosed it, replaced the memory, all good.
I like it a lot when one of the drives gets an ATA reset for whatever
reason and all the other drives continue to read and write; I can keep
using the system for hours, if I even notice. Not possible in normal
circumstances without Raid. Once the problematic drive is back, or after
a reboot if it's more serious, I run the "scrub" command and everything
is resynced again. Even if I don't, Btrfs corrects checksum errors
dynamically on the fly anyway.
And the list goes on - I've been using Btrfs for the last 5 years
without a single problem to date; it has survived hard resets, power
losses, drive failures, and countless migrations.


[1] https://github.com/openzfs/zfs/issues/15526

[2] https://github.com/openzfs/zfs/issues/15933


Problems reported here are from Linux kernels 6.5 and 6.7 on a Gentoo
system. Does this even affect Debian Stable with 6.1 LTS?

--
With kindest regards, Piotr.

⢀⣴⠾⠻⢶⣦⠀
⣾⠁⢠⠒⠀⣿⡁ Debian - The universal operating system
⢿⡄⠘⠷⠚⠋⠀ https://www.debian.org/
⠈⠳⣄



Re: HDD long-term data storage with ensured integrity

2024-04-08 Thread David Christensen

On 4/8/24 13:04, Marc SCHAEFER wrote:

Hello,

On Mon, Apr 08, 2024 at 11:28:04AM -0700, David Christensen wrote:

So, an ext4 file system on an LVM logical volume?

Why LVM?  Are you implementing redundancy (RAID)?  Is your data larger than
a single disk (concatenation/ JBOD)?  Something else?


For off-site long-term offline archiving, no, I am not using RAID.

No, it's not LVM+md, just plain LVM for flexibility.

Typically I use 16 TB hard drives, and I tend to use one LV per data
source, the LV name being the data source and the date of the copy.
Or sometimes I just copy a raw volume (ext4 or something else)
to a LV.

With smaller drives (4 TB) I tend to not use LVM, just plain ext4 on the
raw disk.

I almost never use partitioning.

However, I tend to use LUKS encryption (per ext4 filesystem) when the
drives are stored off-site.  So it's either LVM -> LV -> LUKS -> ext4
or raw disk -> LUKS -> ext4.

You can find some of the scripts I use to automate this off-site
long-term archiving here:

https://git.alphanet.ch/gitweb/?p=various;a=tree;f=offsite-archival/LVM-LUKS



Thank you for the clarification.  :-)


David



Re: Why LVM (was: HDD long-term data storage with ensured integrity)

2024-04-08 Thread DdB
On 08.04.2024 at 23:08, Stefan Monnier wrote:
> David Christensen [2024-04-08 11:28:04] wrote:
>> Why LVM?
> 
> Personally, I've been using LVM everywhere I can (i.e. everywhere
> except on my OpenWRT router, tho I've also used LVM there back when my
> router had an HDD.  I also use LVM on my 2GB USB rescue image).
> 
> To me the question is rather the reverse: why not?
> I basically see it as a more flexible form of partitioning.

As an LVM newbie (never used it before; I am more familiar with ZFS), I
have already collected quite a few misconceptions of my own / design
problems with LVM. Therefore I would rather renew the question: why?

Just one example:
In order to be able to use thin snapshots on my root partition, I did
everything I could to have it inside a thin pool... until I noticed
some weird problems booting from it (attributed to grub), so I set up a
separate /boot outside, but the problems stayed (due to LVM's limitations).

I came to use it to gain some flexibility (although it is an experiment)
and found myself setting up ZFS for its data integrity + flexibility,
just to have a quality backup of the LVM volume(s) on a ZFS pool.

> 
> Even in the worst cases where I have a single LV volume, I appreciate
> the fact that it forces me to name things, isolating me from issues
> linked to predicting the name of the device and the issues that plague
> UUIDs (the fact they're hard to remember, and that they're a bit too
> magical/hidden for my taste, so they sometimes change when I don't want
> them to and vice versa).

Even GPT gives you the chance to name things (like part_label), only it
does not force you to. But I have been using that routinely for 10+ years.

DdB



Why LVM (was: HDD long-term data storage with ensured integrity)

2024-04-08 Thread Stefan Monnier
David Christensen [2024-04-08 11:28:04] wrote:
> Why LVM?

Personally, I've been using LVM everywhere I can (i.e. everywhere
except on my OpenWRT router, tho I've also used LVM there back when my
router had an HDD.  I also use LVM on my 2GB USB rescue image).

To me the question is rather the reverse: why not?
I basically see it as a more flexible form of partitioning.

Even in the worst cases where I have a single LV volume, I appreciate
the fact that it forces me to name things, isolating me from issues
linked to predicting the name of the device and the issues that plague
UUIDs (the fact they're hard to remember, and that they're a bit too
magical/hidden for my taste, so they sometimes change when I don't want
them to and vice versa).


Stefan



Re: HDD long-term data storage with ensured integrity

2024-04-08 Thread Marc SCHAEFER
Hello,

On Mon, Apr 08, 2024 at 11:28:04AM -0700, David Christensen wrote:
> So, an ext4 file system on an LVM logical volume?
> 
> Why LVM?  Are you implementing redundancy (RAID)?  Is your data larger than
> a single disk (concatenation/ JBOD)?  Something else?

For off-site long-term offline archiving, no, I am not using RAID.

No, it's not LVM+md, just plain LVM for flexibility.

Typically I use 16 TB hard drives, and I tend to use one LV per data
source, the LV name being the data source and the date of the copy.
Or sometimes I just copy a raw volume (ext4 or something else)
to a LV.

With smaller drives (4 TB) I tend to not use LVM, just plain ext4 on the
raw disk.

I almost never use partitioning.

However, I tend to use LUKS encryption (per ext4 filesystem) when the
drives are stored off-site.  So it's either LVM -> LV -> LUKS -> ext4
or raw disk -> LUKS -> ext4.
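
In shell terms the per-filesystem layering is roughly this (VG, LV and
device names are placeholders; the actual scripts are linked below):

# LVM -> LV -> LUKS -> ext4
lvcreate -L 2T -n src1-20240408 archive-vg
cryptsetup luksFormat /dev/archive-vg/src1-20240408
cryptsetup open /dev/archive-vg/src1-20240408 src1-20240408
mkfs.ext4 /dev/mapper/src1-20240408

# or raw disk -> LUKS -> ext4
cryptsetup luksFormat /dev/sdX
cryptsetup open /dev/sdX archive-sdX
mkfs.ext4 /dev/mapper/archive-sdX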

You can find some of the scripts I use to automate this off-site
long-term archiving here:

https://git.alphanet.ch/gitweb/?p=various;a=tree;f=offsite-archival/LVM-LUKS



Re: HDD long-term data storage with ensured integrity

2024-04-08 Thread David Christensen

On 4/8/24 02:38, Marc SCHAEFER wrote:

For offline storage:

On Tue, Apr 02, 2024 at 05:53:15AM -0700, David Christensen wrote:

Does anyone have any comments or suggestions regarding how to use magnetic
hard disk drives, commodity x86 computers, and Debian for long-term data
storage with ensured integrity?


I use LVM on ext4, and I add a MD5SUMS file at the root.

I then power up the drives at least once a year and check the MD5SUMS.

A simple CRC could also work, obviously.

So far, I have not detected MORE corruption with this method than the
drive ECC itself (current drives & buses are much better than they
used to be).  When errors are detected, I replace the file with
another copy (I usually have multiple off-site copies, and sometimes
even on-site online copies, but not always).  When the errors add
up, it is time to buy another drive, usually after 5+ years or
even sometimes 10+ years.

So, just re-reading the content might be enough, once a year or so.

This is for HDD (for SSD I have no offline storage experience; it
could be shorter).



Thank you for the reply.


So, an ext4 file system on an LVM logical volume?


Why LVM?  Are you implementing redundancy (RAID)?  Is your data larger 
than a single disk (concatenation/ JBOD)?  Something else?



David



Re: HDD long-term data storage with ensured integrity

2024-04-08 Thread Marc SCHAEFER
For offline storage:

On Tue, Apr 02, 2024 at 05:53:15AM -0700, David Christensen wrote:
> Does anyone have any comments or suggestions regarding how to use magnetic
> hard disk drives, commodity x86 computers, and Debian for long-term data
> storage with ensured integrity?

I use LVM on ext4, and I add a MD5SUMS file at the root.

I then power up the drives at least once a year and check the MD5SUMS.

A simple CRC could also work, obviously.
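
Concretely, the MD5SUMS handling is just (run from the filesystem root):

# when writing the archive
find . -type f ! -name MD5SUMS -print0 | xargs -0 md5sum > MD5SUMS

# at the yearly check (only failures are printed)
md5sum -c --quiet MD5SUMS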

So far, I have not detected MORE corruption with this method than the
drive ECC itself (current drives & buses are much better than they
used to be).  When errors are detected, I replace the file with
another copy (I usually have multiple off-site copies, and sometimes
even on-site online copies, but not always).  When the errors add
up, it is time to buy another drive, usually after 5+ years or
even sometimes 10+ years.

So, just re-reading the content might be enough, once a year or so.

This is for HDD (for SSD I have no offline storage experience; it
could be shorter).



Re: HDD long-term data storage with ensured integrity

2024-04-03 Thread Jonathan Dowland
On Tue Apr 2, 2024 at 10:57 PM BST, David Christensen wrote:
> AIUI neither LVM nor ext4 have data and metadata checksum and correction 
> features.  But, it should be possible to achieve such by including 
> dm-integrity (for checksumming) and some form of RAID (for correction) 
> in the storage stack.  I need to explore that possibility further.

It would be nice to have checksumming and parity stuff in the filesystem
layer, as BTRFS and XFS offer, but failing that, you can do it above
that layer using tried-and-tested tools such as sha1sum, par2, etc.
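
For example, with par2cmdline run inside the archive directory (file
names and the 10% redundancy are arbitrary):

# create parity blocks covering ~10% of the data
par2 create -r10 archive.par2 *.tar

# later: check, and repair from the parity blocks if something rotted
par2 verify archive.par2
par2 repair archive.par2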

I personally would not rely upon RAID for anything except availability.
My advice is once you've detected corruption, which is exceedingly rare,
restore from backup.

-- 
Please do not CC me for listmail.

  Jonathan Dowland
✎j...@debian.org
   https://jmtd.net



Re: HDD long-term data storage with ensured integrity

2024-04-03 Thread David Christensen

On 4/2/24 14:57, David Christensen wrote:
AIUI neither LVM nor ext4 have data and metadata checksum and correction 
features.  But, it should be possible to achieve such by including 
dm-integrity (for checksumming) and some form of RAID (for correction) 
in the storage stack.  I need to explore that possibility further.



I have RTFM'd dm-integrity before, and it is still experimental.  I need 
something that is production ready:


https://manpages.debian.org/bookworm/cryptsetup-bin/cryptsetup.8.en.html

Authenticated disk encryption (EXPERIMENTAL)


David



Re: HDD long-term data storage with ensured integrity

2024-04-02 Thread David Christensen

On 4/2/24 06:55, Stefan Monnier wrote:

The most obvious alternative to ZFS on Debian would be Btrfs.  Does anyone
have any comments or suggestions regarding Btrfs and data corruption bugs,
concurrency, CMM level, PSP, etc.?


If you're worried about such things, I'd think "the most obvious
alternative" is LVM+ext4.  Both Btrfs and ZFS share the same underlying
problem: more features => more code => more bugs.


 Stefan



AIUI neither LVM nor ext4 have data and metadata checksum and correction 
features.  But, it should be possible to achieve such by including 
dm-integrity (for checksumming) and some form of RAID (for correction) 
in the storage stack.  I need to explore that possibility further.
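
On Linux the plumbing would be either LVM's integrated option or the
standalone tools -- a rough, untested sketch (device and VG names are
made up):

# option 1: let LVM put dm-integrity under each leg of a raid1 LV
lvcreate --type raid1 --raidintegrity y -m 1 -L 100G -n data myvg

# option 2: standalone dm-integrity devices under an md raid1
integritysetup format /dev/sda1
integritysetup open /dev/sda1 int-sda1
integritysetup format /dev/sdb1
integritysetup open /dev/sdb1 int-sdb1
mdadm --create /dev/md0 --level=1 --raid-devices=2 \
    /dev/mapper/int-sda1 /dev/mapper/int-sdb1
mkfs.ext4 /dev/md0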



David



Re: HDD long-term data storage with ensured integrity

2024-04-02 Thread Stefan Monnier
> The most obvious alternative to ZFS on Debian would be Btrfs.  Does anyone
> have any comments or suggestions regarding Btrfs and data corruption bugs,
> concurrency, CMM level, PSP, etc.?

If you're worried about such things, I'd think "the most obvious
alternative" is LVM+ext4.  Both Btrfs and ZFS share the same underlying
problem: more features => more code => more bugs.


Stefan



HDD long-term data storage with ensured integrity

2024-04-02 Thread David Christensen

On 3/31/24 02:18, DdB wrote:
> i intend to create a huge backup server from some oldish hardware.
> Hardware has been partly refurbished and offers 1 SSD + 8 HDD on a
> 6core Intel with 64 GB RAM. ... the [Debian] installer ... aborts.

On 4/1/24 11:35, DdB wrote:
> A friend of mine just let me use an external CD-Drive with the netboot
> image. ... all is well.


Now you get to solve the same problem I have been stuck on since last 
November -- how to use those HDD's.



ZFS has been my bulk storage solution of choice for the past ~4 years, 
but the recent data corruption bugs [1, 2] have me worried.  From a 
technical perspective, it's about incorrect concurrent execution of GNU 
cp(1), Linux, and/or OpenZFS.  From a management perspective, it's about 
Capability Maturity Model (CMM) [3] and Programming Systems Product 
(PSP) [4].



The most obvious alternative to ZFS on Debian would be Btrfs.  Does 
anyone have any comments or suggestions regarding Btrfs and data 
corruption bugs, concurrency, CMM level, PSP, etc.?



Does anyone have any comments or suggestions regarding how to use 
magnetic hard disk drives, commodity x86 computers, and Debian for 
long-term data storage with ensured integrity?



David


[1] https://github.com/openzfs/zfs/issues/15526

[2] https://github.com/openzfs/zfs/issues/15933

[3] https://en.wikipedia.org/wiki/Capability_maturity_model

[4] https://en.wikipedia.org/wiki/The_Mythical_Man-Month



large, shared data storage ???

2002-10-01 Thread Michael D. Schleif


Where can I find information on setting up and running large disk arrays
to be shared across multiple Debian servers?

We need to review all open-source options, but will also consider
pointers to viable paid resources . . .

-- 

Best Regards,

mds
mds resource
888.250.3987

Dare to fix things before they break . . .

Our capacity for understanding is inversely proportional to how much we
think we know.  The more I know, the more I know I don't know . . .






Re: Data Storage

1999-02-25 Thread Peter Ludwig

On Thu, 25 Feb 1999, Stephen Lavelle wrote:

> We are soon going to be installing a Linux box on our Win98 network as a
> file server -

You'll notice a vast difference :)  Plug here for Linux sponsored by
no-one.

> and I want to know of a good backup medium supported by Debian and easy to
> configure:
> something like - Zip or Jaz drives.

Well, I run an LS-120 (from Imation, external) drive on my machine here at
home. I've updated the kernel to 2.2.1, but if you just run the internal
version, it works fine with earlier kernel versions (i.e. 2.0.34, the base
kernel that comes with hamm, recognised it and allowed me to use it okay
internally, but I wanted it external so I upgraded <grin>).

The external (i.e. parallel-port) versions of both the Jaz and Zip drives are
also supported by kernel 2.2.1, and I believe that you might be able to
use any number of external tape drives.

The number one rule of buying hardware for Linux is simply this:

Do not buy any hardware for a machine that is to run Linux permanently
that has or is called something Windows-specific / Win<insert-name-here>.
This sort of hardware is (normally) unable to run under the Linux
operating system, as the required information for writing the driver is
not always released to the public domain; as such, it cannot be
distributed, ergo no Linux driver.

If you are looking for a particular style of hardware, try looking at the
supported hardware list at either www.debian.org, or more importantly (as
you can always update the kernel if you need to) www.kernel.org or
www.linux.org.

I noticed that you are from Australia (as I am). If you are in Brisbane
and you want, I can give you a list of people who have good-priced
hardware (which runs very well under Linux) for sale.

Catch ya l8r,
Peter Ludwig



Data Storage

1999-02-24 Thread Stephen Lavelle
We are soon going to be installing a Linux box on our Win98 network as a
file server -
and I want to know of a good backup medium supported by Debian and easy to
configure:
something like - Zip or Jaz drives.
Any suggestions?
Regards,
Stephen Lavelle
Austanners Wet Blue Pty Ltd.
~ Australian Tanned Wet Blue Leather ~
110 Heales Road,
Lara, Geelong, Australia
3212
Tel:++(03)52742232
Fax:++(03)52742350
mailto:[EMAIL PROTECTED]
The information contained in this email is privileged and confidential
and intended for the addressee only. If you are not the intended
recipient, you are asked to respect that confidentiality and not
disclose, copy or make use of its contents. If received in error you are
asked to destroy this email and contact the sender immediately. Your
assistance is appreciated.



Re: Data Storage

1999-02-24 Thread Wojciech Zabolotny
On Thu, 25 Feb 1999, Stephen Lavelle wrote:
> We are soon going to be installing a Linux box on our Win98 network as a
> file server -
> and I want to know of a good backup medium supported by Debian and easy to
> configure:
> something like - Zip or Jaz drives.
> Any suggestions?

What about a CD recorder? It has one big advantage - no way to overwrite
old backups...
Wojtek Zabolotny
[EMAIL PROTECTED]