Re: Backup Times on a Linux desktop

2019-11-12 Thread Charles Curley
Thanks for the feedback.

On Tue, 05 Nov 2019 23:35:05 +0100
Linux-Fan  wrote:

> Charles Curley writes:
> 

> > https://charlescurley.com/blog/posts/2019/Nov/02/backups-on-linux/index.html
> >   
> 
> [...]
> 
> Thanks for sharing! I appreciate that I am not the only one with a
> backup system composed of multiple tools with different timings and
> occasions of invocation :)

They just metastasize over the years.

> 
> One point where my opinion is slightly different (might boil down to
> taste, but that's part of the feedback?). Quoting from the blog:
> 
> > Some stuff isn't worth the disk space to back up because you can
> > regenerate it or re-install it just as easily. Caches, such as a web
> > proxy's. Executables you can re-install, like your office suite.  
> 
> I personally think it is (especially today) not so easy to keep track
> of all the programs one actually needs and where to get them.
> Additionally, one should take into consideration whether the
> availability of Internet access (needed for software re-installation
> unless other measures are taken) is really part of the assumptions
> for backup restoration. I try to put some effort into
> 100%-offline restoration.

I see your point. I certainly expect to do bare metal restoration with
local resources only, and see to it that everything I need to do that
is available.

> 
> At the same time, I try to avoid "disk image"-style backups, because
> they are hard to make (usually the system needs to be offline for
> this) and they are hard to restore: What if my server with 4x2T HDDs
> just dies? By tomorrow, I will not have another server; a humble
> laptop with a 500 GB HDD might be all there is for the moment.
> Restoring images is infeasible in that situation; a normal
> "reinstallation" is less so (but it might be: consider borrowing a
> computer from someone else for some time. In that case it will likely
> be impossible to change the OS, and thus the software installation
> might be limited...)

Both good points. That is part of your disaster recovery planning: can
you get a replacement box quickly enough? I've had clients buy a spare
box and keep it off site. Another key part of disaster recovery is: how
quickly do you have to be back up and running?

One reason I like amanda is that you can restore without having amanda
on the machine with the files. I've never actually had to do that,
fortunately. Also amanda lets you select individual files to restore.


> 
> YMMV
> Linux-Fan
> 



-- 
Does anybody read signatures any more?

https://charlescurley.com
https://charlescurley.com/blog/



Re: Backup Times on a Linux desktop

2019-11-05 Thread Linux-Fan

Charles Curley writes:


> On Sat, 02 Nov 2019 20:24:52 +0100
> Konstantin Nebel  wrote:


[...]


>> So now I am thinking: how should I approach backups? On Windows it
>> does backups magically and reminds me when they didn't run for a while.
>> I like that attitude.
>>
>> On Linux, with all that decision freedom, it can be good and bad because
>> you have to think about things :D

> I started writing a reply to this several days ago, and realized it
> would make a good blog entry. I'd appreciate feedback.
>
> https://charlescurley.com/blog/posts/2019/Nov/02/backups-on-linux/index.html


[...]

Thanks for sharing! I appreciate that I am not the only one with a backup
system composed of multiple tools with different timings and occasions of
invocation :)

One point where my opinion is slightly different (might boil down to taste,
but that's part of the feedback?). Quoting from the blog:


Some stuff isn't worth the disk space to back up because you can
regenerate it or re-install it just as easily. Caches, such as a web
proxy's. Executables you can re-install, like your office suite.


I personally think it is (especially today) not so easy to keep track of all
the programs one actually needs and where to get them. Additionally, one
should take into consideration whether the availability of Internet access
(needed for software re-installation unless other measures are taken) is
really part of the assumptions for backup restoration. I try to put some
effort into 100%-offline restoration.

At the same time, I try to avoid "disk image"-style backups, because they
are hard to make (usually the system needs to be offline for this) and they
are hard to restore: What if my server with 4x2T HDDs just dies? By
tomorrow, I will not have another server; a humble laptop with a 500 GB HDD
might be all there is for the moment. Restoring images is infeasible in that
situation; a normal "reinstallation" is less so (but it might be: consider
borrowing a computer from someone else for some time. In that case it will
likely be impossible to change the OS, and thus the software installation
might be limited...)

YMMV
Linux-Fan



Re: Backup Times on a Linux desktop

2019-11-05 Thread Charles Curley
On Sat, 02 Nov 2019 20:24:52 +0100
Konstantin Nebel  wrote:

> Now I attached a 4 TB drive to my Pi and I decided: what the heck, why
> not do backups now?
> 
> So now I am thinking: how should I approach backups? On Windows it
> does backups magically and reminds me when they didn't run for a while.
> I like that attitude.
> 
> On Linux, with all that decision freedom, it can be good and bad because
> you have to think about things :D

I started writing a reply to this several days ago, and realized it
would make a good blog entry. I'd appreciate feedback.

https://charlescurley.com/blog/posts/2019/Nov/02/backups-on-linux/index.html

-- 
Does anybody read signatures any more?

https://charlescurley.com
https://charlescurley.com/blog/




Re: Backup Times on a Linux desktop

2019-11-05 Thread Stefan Monnier
> On Linux, with all that decision freedom, it can be good and bad because
> you have to think about things :D

All the answers I've seen mention the use of "cron", but I'm not sure
what they mean by that, nor am I sure what your typical use of the
desktop is (e.g. is it always on?), so I think it's worth mentioning the
use of /etc/cron.daily and /etc/cron.weekly as well as `anacron`
(nowadays provided by `systemd-cron`), which will run those tasks "when
possible" rather than at fixed times.

Regarding backup software I use `bup` for some systems and `rsync`
for others.


Stefan



Re: Backup Times on a Linux desktop

2019-11-05 Thread Stefan Monnier
> Suppose that you back up 2000 files in a day and inside this backup a chunk
> is deduped and referenced by 300 files. If the deduped chunk is broken,
> I think you will lose it in all 300 referencing files/chunks. This is not
> good for me.

I don't know what other backup software does, but at least `bup`
addresses this risk by recommending the use of `par2` (and of course
Git's content-addressed storage makes it easy to detect corruption).
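
For example, a minimal sketch of the `par2` approach (file names are
examples): recovery volumes with roughly 10% redundancy make an archive
repairable as long as the damage stays under that budget:

    # Create PAR2 recovery files with about 10% redundancy:
    par2 create -r10 backup.tar.gz

    # Later, verify the archive and repair it from the recovery
    # files if some blocks have gone bad:
    par2 verify backup.tar.gz.par2
    par2 repair backup.tar.gz.par2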


Stefan



Re: Backup Times on a Linux desktop

2019-11-05 Thread Alessandro Baggi

On 04/11/19 20:43, deloptes wrote:

> Alessandro Baggi wrote:
>
>> If I'm not wrong, deduplication "is a technique for eliminating duplicate
>> copies of repeating data".
>>
>> I'm not a borg expert, but it performs deduplication on data chunks.
>>
>> Suppose that you back up 2000 files in a day and inside this backup a
>> chunk is deduped and referenced by 300 files. If the deduped chunk is
>> broken, I think you will lose it in all 300 referencing files/chunks.
>> This is not good for me.
>
> Look at the explanation by Linux-Fan. I think it is pretty good. It fits
> one scenario; however, if your backup system (disks or whatever) is
> broken, it cannot be considered a backup system at all.



Linux-Fan's reply is interesting, but there is nothing new in it for me.


> I think deduplication is a great thing nowadays - people need to back up
> TBs, take care of retention, etc. I do not share your concerns at all.
>
>> If your main dataset has a broken file, no problem, you can recover
>> from backups.
>>
>> If your saved deduped chunk is broken, all files that reference it could
>> be broken. I think also that the same chunk will be used for successive
>> backups (always for deduplication), so this single chunk could be used
>> from backup1 to backupN.
>
> This is not true.



What is not true? That the same single chunk will not be used inside other
backups? So a deduped chunk is related only to one backup?




>> It also has an integrity check, but I don't know if it checks this. I
>> read also that an integrity check on a big dataset could require too
>> much time.
>>
>> In my mind a backup is a copy of a file at a point in time, and if
>> needed, another copy from another point in time can be picked; it should
>> not be a reference to a previous copy. Today there are people that make
>> backups on tape (expensive) for reliability. I run backups on disks.
>> Disks are cheap, so compression (which requires time in backup and
>> restore) and deduplication (which adds complexity) are not needed for
>> me, and they don't really affect my free disk space because I can add a
>> disk.
>
> I think it depends how far you want to go - how precious the data is.
> Magnetic disks and tapes can be destroyed by an EMP or similar. An SSD,
> despite its price, can fail, and if it fails you cannot recover anything.
> So ... there are some rules for securely preserving backups - but all of
> this is very expensive.



An EMP or similar? You are right, but in my experience I have seen only one
case where a similar event broke storage, and that was a laptop disk near a
radar station. How often could this happen?



>> Rsnapshot uses hardlinks, which is similar.
>>
>> All these solutions are valid if they fit your needs. You must choose how
>> important the data inside your backups is, and whether losing a deduped
>> chunk could damage your backup dataset across the timeline.
>
> No, unless the corruption is on the backup server; but if that happens ...
> well, you should consider the backup server broken - I do not think it has
> anything to do with deduplication.
>
>> Ah, if you have multiple servers to back up, I prefer bacula because it
>> can pull data from hosts and can back up multiple servers from the same
>> point (maybe using for each client a separate bacula-sd daemon with
>> dedicated storage).








Re: Backup Times on a Linux desktop

2019-11-04 Thread Charles Curley
On Mon, 4 Nov 2019 06:01:54 -1000
Joel Roth  wrote:

> These days I use rsync with the --link-dest option to make
> complete Time-Machine(tm) style backups using hardlinks to
> avoid file duplication in the common case.  In this
> scenario, the top-level directory is typically named based
> on date and time, e.g. back-2019.11.04-05:32:06.

Take a look at rsnapshot. You have pretty well described it.


> 
> I usually make backups while the system is running, although
> I'm not sure it's considered kosher. It takes around 10% of
> CPU on my i5 system.

It's kosher except in a few places where referential integrity is an
issue. The classic here is a database that extends across multiple
files, which means almost all of them.

Referential integrity means keeping the data consistent. Suppose you
send an INSERT statement to a SQL database, and it affects multiple
files. The database writes to the first file. Then your backup comes
along and grabs the files for backup. Then your database writes the
other files. Your backups are broken, and you won't know it until you
restore and test.

There are work-arounds. Shut the database down during backups, or make
it read only during backups. Or tell it to accept writes from clients
but not actually write them out to the files until the backup is over.

Obviously this requires some sort of co-ordination between the backup
software and the software maintaining the files.

Or use SQLite, which I believe avoids this issue entirely.
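
A minimal sketch of the dump-first workaround (database name and paths
are examples; mysqldump is shown, but any dumper works): dump the
database to a single consistent file before the file-level backup runs,
and back that up instead of the live files:

    #!/bin/sh
    # Hypothetical pre-backup hook: dump the database atomically so the
    # file-level backup never sees a half-written set of database files.
    set -e
    # --single-transaction takes a consistent snapshot (InnoDB) without
    # locking the database for the duration of the dump.
    mysqldump --single-transaction mydb > /var/backups/mydb.sql
    # The regular backup then picks up /var/backups/mydb.sql and can
    # exclude the live files under /var/lib/mysql.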


-- 
Does anybody read signatures any more?

https://charlescurley.com
https://charlescurley.com/blog/



Re: Backup Times on a Linux desktop

2019-11-04 Thread Joel Roth
On Mon, Nov 04, 2019, Charles Curley wrote:
> On Mon, 4 Nov 2019 06:01:54 -1000
> Joel Roth  wrote:
> 
> > These days I use rsync with the --link-dest option to make
> > complete Time-Machine(tm) style backups using hardlinks to
> > avoid file duplication in the common case.  In this
> > scenario, the top-level directory is typically named based
> > on date and time, e.g. back-2019.11.04-05:32:06.
> 
> Take a look at rsnapshot. You have pretty well described it.

Looks like a featureful, capable, and thoroughly debugged
front end to rsync with the --link-dest option. 

Thanks, I'll fool around with this. 

Thanks also for the explanations about file integrity issues when
databases are involved. 

--
Joel Roth



Re: Backup Times on a Linux desktop

2019-11-04 Thread deloptes
Alessandro Baggi wrote:

> If I'm not wrong, deduplication "is a technique for eliminating duplicate
> copies of repeating data".
> 
> I'm not a borg expert, but it performs deduplication on data chunks.
> 
> Suppose that you back up 2000 files in a day and inside this backup a
> chunk is deduped and referenced by 300 files. If the deduped chunk is
> broken, I think you will lose it in all 300 referencing files/chunks. This
> is not good for me.
> 

Look at the explanation by Linux-Fan. I think it is pretty good. It fits one
scenario; however, if your backup system (disks or whatever) is broken, it
cannot be considered a backup system at all.

I think deduplication is a great thing nowadays - people need to back up
TBs, take care of retention, etc. I do not share your concerns at all.

> If your main dataset has a broken file, no problem, you can recover
> from backups.
> 
> If your saved deduped chunk is broken, all files that reference it
> could be broken. I think also that the same chunk will be used for
> successive backups (always for deduplication), so this single chunk could
> be used from backup1 to backupN.
> 

This is not true.

> It also has an integrity check, but I don't know if it checks this. I
> read also that an integrity check on a big dataset could require too much
> time.
> 
> In my mind a backup is a copy of a file at a point in time, and if needed,
> another copy from another point in time can be picked; it should not be a
> reference to a previous copy. Today there are people that make backups
> on tape (expensive) for reliability. I run backups on disks. Disks are
> cheap, so compression (which requires time in backup and restore) and
> deduplication (which adds complexity) are not needed for me, and they
> don't really affect my free disk space because I can add a disk.
> 

I think it depends how far you want to go - how precious the data is.
Magnetic disks and tapes can be destroyed by an EMP or similar. An SSD,
despite its price, can fail, and if it fails you cannot recover anything.
So ... there are some rules for securely preserving backups - but all of
this is very expensive.

> Rsnapshot uses hardlinks, which is similar.
> 
> All these solutions are valid if they fit your needs. You must choose how
> important the data inside your backups is, and whether losing a deduped
> chunk could damage your backup dataset across the timeline.
> 

No, unless the corruption is on the backup server; but if that happens ...
well, you should consider the backup server broken - I do not think it has
anything to do with deduplication.

> Ah, if you have multiple servers to back up, I prefer bacula because it
> can pull data from hosts and can back up multiple servers from the same
> point (maybe using for each client a separate bacula-sd daemon with
> dedicated storage).




Re: Backup Times on a Linux desktop

2019-11-04 Thread Joel Roth
On Sat, Nov 02, 2019, Konstantin Nebel wrote:
> So now I am thinking: how should I approach backups? On Windows it does
> backups magically and reminds me when they didn't run for a while. I like
> that attitude.
(...) 
>  I like to turn off
> my computer at night. So a backup running at night is not really an option
> unless I do wake-on-LAN, run the backup, and then turn off.
(...)

Someone already recommended setting up a cron job for triggering backups
on a regular schedule. That takes care of the automagic part.

These days I use rsync with the --link-dest option to make
complete Time-Machine(tm) style backups using hardlinks to
avoid file duplication in the common case.  In this
scenario, the top-level directory is typically named based
on date and time, e.g. back-2019.11.04-05:32:06.
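
A minimal sketch of that scheme (destination paths are examples): each
run creates a new dated tree and hardlinks unchanged files against the
previous snapshot via --link-dest:

    #!/bin/sh
    # Hypothetical Time-Machine-style snapshot with rsync --link-dest:
    # unchanged files become hardlinks into the previous snapshot, so
    # each run stores only what actually changed.
    set -e
    dest=/srv/backups
    new="$dest/back-$(date +%Y.%m.%d-%H:%M:%S)"
    # On the very first run the missing link-dest target only causes a
    # harmless warning.
    rsync -a --delete --link-dest="$dest/latest" /home/ "$new/"
    ln -snf "$new" "$dest/latest"   # re-point "latest" at the new snapshot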

I usually make backups while the system is running, although
I'm not sure it's considered kosher. It takes around 10% of
CPU on my i5 system.

> Whoever read till the end Im thankful and ready to hear your opinion.
 
> Cheers
> Konstantin

--
Joel Roth



Re: Backup Times on a Linux desktop

2019-11-04 Thread Alessandro Baggi

On 04/11/19 15:41, deloptes wrote:

> Not sure if true - for example, you make daily, weekly and monthly backups
> (classical). Let's focus on the daily part. On day 3 the file is broken.
> You have to recover from day 2. The file is not broken for day 2 - correct?!


If I'm not wrong, deduplication "is a technique for eliminating duplicate 
copies of repeating data".


I'm not a borg expert, but it performs deduplication on data chunks.

Suppose that you back up 2000 files in a day and inside this backup a 
chunk is deduped and referenced by 300 files. If the deduped chunk is 
broken, I think you will lose it in all 300 referencing files/chunks. This 
is not good for me.


If your main dataset has a broken file, no problem, you can recover 
from backups.


If your saved deduped chunk is broken, all files that reference it 
could be broken. I think also that the same chunk will be used for 
successive backups (always for deduplication), so this single chunk could 
be used from backup1 to backupN.


It also has an integrity check, but I don't know if it checks this. I read 
also that an integrity check on a big dataset could require too much time.


In my mind a backup is a copy of a file at a point in time, and if needed, 
another copy from another point in time can be picked; it should not be a 
reference to a previous copy. Today there are people that make backups 
on tape (expensive) for reliability. I run backups on disks. Disks are 
cheap, so compression (which requires time in backup and restore) and 
deduplication (which adds complexity) are not needed for me, and they don't 
really affect my free disk space because I can add a disk.


Rsnapshot uses hardlinks, which is similar.

All these solutions are valid if they fit your needs. You must choose how 
important the data inside your backups is, and whether losing a deduped 
chunk could damage your backup dataset across the timeline.


Ah, if you have multiple servers to back up, I prefer bacula because it 
can pull data from hosts and can back up multiple servers from the same 
point (maybe using for each client a separate bacula-sd daemon with 
dedicated storage).




Re: Backup Times on a Linux desktop

2019-11-04 Thread Linux-Fan

deloptes writes:


> Alessandro Baggi wrote:
>
>> Borg seems very promising, but it performs only push requests at the
>> moment and I need pull requests. It offers deduplication, encryption and
>> much more.
>>
>> One word on deduplication: it is a great feature to save space; with
>> deduplication, compression ops (which could require much time) are
>> avoided. But remember that with deduplication, across multiple backups
>> only one version of a file is stored. So if this file gets corrupted
>> (for whatever reason), it will be compromised in all previous backup
>> jobs performed, so the file is lost. For this reason I try to avoid
>> deduplication on important backup datasets.
>
> Not sure if true - for example, you make daily, weekly and monthly backups
> (classical). Let's focus on the daily part. On day 3 the file is broken.
> You have to recover from day 2. The file is not broken for day 2 -
> correct?!


[...]

I'd argue that you are both right about this. It just depends on where the
file corruption occurs.

Consider a deduplicated system which stores backups in /fs/backup and reads
the input files from /fs/data. Then if a file in /fs/data is corrupted, you
could always extract it from the backup successfully. If that file were
changed and corrupted, the backup system would no longer consider it a
"duplicate" and thus store the corrupted content of the file as a new
version. Effectively, while the newest version of the file is corrupted and
thus not useful, it is still possible to recover the old version of the file
from the (deduplicated or not) backup.

The other consideration is a corruption on the backup storage volume like
some files in /fs/backup go bad. In a deduplicated setting, if a single
piece of data in /fs/backup corresponds to a lot of restored files with the
same contents, all of these files are no longer successfully recoverable,
because the backup's internal structure contains corrupted data.

In a non-deduplicated (so to say: redundant) backup system, if parts of the
backup store become corrupted, the damage is likely (but not necessarily)
restricted to only some files upon restoration and as there is no
deduplication, it is likely that the "amount of data non-restorable" is
somehow related to the "amount of data corrupted"...

As these considerations about a corrupted backup store are mostly on such a
blurry level as described, the benefit of avoiding deduplication because
of the risk of losing more files upon corruption of the backup store is
possibly limited. However, given some concrete systems, the picture might
change entirely. A basic file-based (e.g. rsync) backup is as tolerant to
corruption as the original "naked" files. For any system maintaining its own
filesystem, the respective system needs to be studied extensively to find
out how partial corruption affects restorability. In theory, it could have
additional redundancy data to restore files even in the presence of a
certain level of corruption (e.g. in percent bytes changed or similar).

This whole thing was actually a reason for writing my own system: file-based
rsync backup was slow, space-inefficient and did not provide encryption.
However, more advanced systems (like borg, obnam?) split files into
multiple chunks and maintain their own filesystem. For me it is not really
obvious how a partially corrupted backup restores with these systems. For
my tool, I chose an approach between these: I store only "whole" files and
do not deduplicate them in any way. However, I put multiple small files into
archives such that I can compress and encrypt them. In my case, a partial
corruption would lose exactly the files from the corrupted archives, which
establishes a relation between the amount of data corrupted and lost
(although in the worst case - "each archive slightly corrupted" - all is
lost... to avoid that one needs error correction, but my tool does not do it
[yet?])
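
A minimal sketch of that archive-based middle ground (this is not
Linux-Fan's actual tool; paths and the passphrase file are invented):
whole files are grouped into per-directory archives that are compressed
and encrypted, so corruption stays confined to individual archives:

    #!/bin/sh
    # Sketch: pack one directory per archive, compress, and encrypt
    # symmetrically. Losing one archive loses only the files inside it.
    set -e
    for dir in /home/user/projects/*/; do
        name=$(basename "$dir")
        tar -C "$dir" -cz . |
            gpg --batch --pinentry-mode loopback \
                --passphrase-file /root/backup.pass \
                --symmetric --cipher-algo AES256 \
                -o "/srv/backups/$name.tar.gz.gpg"
    done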

HTH
Linux-Fan



Re: Backup Times on a Linux desktop

2019-11-04 Thread deloptes
Alessandro Baggi wrote:

> Borg seems very promising, but it performs only push requests at the moment
> and I need pull requests. It offers deduplication, encryption and much
> more.
> 
> One word on deduplication: it is a great feature to save space; with
> deduplication, compression ops (which could require much time) are avoided.
> But remember that with deduplication, across multiple backups only one
> version of a file is stored. So if this file gets corrupted (for whatever
> reason), it will be compromised in all previous backup jobs performed, so
> the file is lost. For this reason I try to avoid deduplication on important
> backup datasets.

Not sure if true - for example, you make daily, weekly and monthly backups
(classical). Let's focus on the daily part. On day 3 the file is broken.
You have to recover from day 2. The file is not broken for day 2 - correct?!

> But remember that with deduplication, across multiple backups only one
> version of a file is stored.

I do not know how you came to this conclusion. This is not how
deduplication works - at least not according to my understanding. The
documentation describes the process of backing up and deduplication such
that file chunks are read and compared. If they differ, the new chunk is
backed up. Remember this is done for each backup. If you want to restore a
previous one, the file will obviously be reconstructed from previously
stored/backed-up information.

regards





Re: Backup Times on a Linux desktop

2019-11-04 Thread Jonathan Dowland

On Sun, Nov 03, 2019 at 02:47:46AM -0500, Gene Heskett wrote:

> Just 4 or 5 days ago, I had to recover the linuxcnc configs from a backup
> of the pi3: I made a scratch dir here at home, scanned my database
> for the last level 0 of the pi3b, pulled that out with amrecover, then
> copied what I needed back to the rpi4 now living in my Sheldon lathe's
> control box. File moving was done by an old friend, mc, and sshfs mounts.
> Totally painless,


As a former Amanda user in a professional setting (thankfully now deep
in my past), I read most of this with a mixed sense of nostalgia (oh yes
I remember that) and pleasure that I am no longer having to put up with
it, although once I got to "totally painless" I almost spat out my tea. 


Since you are all set up already and it's working great for you, I
wouldn't suggest you change anything; but for anyone who isn't already
invested in Amanda, the process you describe there is considerably more
awkward than what many more modern tools offer.



Re: Backup Times on a Linux desktop

2019-11-04 Thread Jonathan Dowland



I'll respond on the issue of triggering the backup, rather than the
specific backup software itself, because my solution for triggering
is separate from the backup software I use (rdiff-backup).

I trigger (some) backup jobs via systemd units that are triggered by
the insertion of my removable backup drive. So, I would suggest that
instead of doing a network backup to your 4T drive on the other side of
your Pi, you could attach the drive directly to your computer when you
want to initiate a backup. This doesn't address your desire to have it
happen in the background, though, because you would still need to
remember (or prompt yourself) to attach the drive. I provide the details
anyway just in case they are interesting.

My "backup-exthdd.service" is what performs the actual backup job:

   [Unit]
   OnFailure=status-email-user@%n.service blinkstick-fail.service
   Requires=systemd-cryptsetup@extbackup.service
   After=systemd-cryptsetup@extbackup.service

   [Service]
   Type=oneshot
   ExecStart=/bin/mount /extbackup
   ExecStart=
   ExecStop=/bin/umount /extbackup
   ExecStop=/usr/local/bin/blinkstick --index 1 --limit 10 --set-color green

   [Install]
   WantedBy=dev-disk-by\x2duuid-e0eed9b6\x2d03f1\x2d41ed\x2d80a4\x2dc7cc4ff013c3.device

(the mount and umount Execs there shouldn't be needed, they should be
addressed by systemd unit dependencies, but in practice they were
necessary when I set this up. This was a while ago and systemd may
perform differently now.)

My external backup disk has an encrypted partition on it. So, the job
above actually depends upon the decrypted partition. The job
"systemd-cryptsetup@extbackup.service" handles that. The skeleton of the
job was written by systemd-cryptsetup-generator automatically, based on
content in /etc/crypttab; I then had to adapt it further. The entirety
of it is:

   [Unit]
   Description=Cryptography Setup for %I
   SourcePath=/etc/crypttab
   DefaultDependencies=no
   Conflicts=umount.target
   BindsTo=dev-mapper-%i.device
   IgnoreOnIsolate=true
   After=systemd-readahead-collect.service systemd-readahead-replay.service cryptsetup-pre.target
   Before=cryptsetup.target
   BindsTo=dev-disk-by\x2duuid-e0eed9b6\x2d03f1\x2d41ed\x2d80a4\x2dc7cc4ff013c3.device
   After=dev-disk-by\x2duuid-e0eed9b6\x2d03f1\x2d41ed\x2d80a4\x2dc7cc4ff013c3.device
   Before=umount.target
   StopWhenUnneeded=true

   [Service]
   Type=oneshot
   RemainAfterExit=yes
   TimeoutSec=0
   ExecStart=/lib/systemd/systemd-cryptsetup attach 'extbackup' '/dev/disk/by-uuid/e0eed9b6-03f1-41ed-80a4-c7cc4ff013c3' '/root/exthdd.key' 'luks,noauto'
   ExecStop=/lib/systemd/systemd-cryptsetup detach 'extbackup'

So when the removable disk device with the UUID
e0eed9b6-03f1-41ed-80a4-c7cc4ff013c3 appears on the system, its
appearance causes systemd to start the "backup-exthdd.service" job,
which depends upon the bits that enable the encrypted volume.

(the "blinkstick-fail.service" and ExecStop=/usr/local/bin/blinkstick…
line relate to a notification system I have: this is my headless NAS,
and the "blinkstick" is a little multicolour LED attached via USB. In
normal circumstances it is switched off. When a job is running it
changes to a particular colour; when the job finished successfully, it's
green - indicating I can unplug the drive (it's all unmounted etc.), if
anything goes wrong it turns red.)



Re: Backup Times on a Linux desktop

2019-11-04 Thread Alessandro Baggi

On 02/11/19 20:24, Konstantin Nebel wrote:

> Hi,
>
> this is basically a question about what you guys prefer and do. I have a
> Linux desktop, and recently I decided to buy a Raspberry Pi 4 (great
> device); already after a couple of days I do not know how I lived without
> it. So why the Raspberry Pi?
>
> In the past I decided not to do backups on purpose. I decided that the
> data on my local computer is not important, and to store my important
> stuff in a Nextcloud I host for myself and do backups of that. And for a
> long period of time I was just fine with it.
>
> Now I attached a 4 TB drive to my Pi and I decided: what the heck, why
> not do backups now?
>
> So now I am thinking: how should I approach backups? On Windows it does
> backups magically and reminds me when they didn't run for a while. I like
> that attitude.
>
> On Linux, with all that decision freedom, it can be good and bad because
> you have to think about things :D
>
> (SKIP THIS IF YOU DON'T WANT TO READ TOO MUCH) ;)
> So I could do the backup on logout, for example, but I am not sure
> whether that would be annoying, so I'd like to have your opinion. Oh, and
> yeah: I like to turn off my computer at night, so a backup running at
> night is not really an option unless I do wake-on-LAN, run the backup,
> and then turn off. But right now I dual boot with Windows as the default
> (for games, shame on me), and I might switch, because first, gaming on
> Linux is really becoming good, and second, I could buy a second GPU for
> my Linux system and forward my GPU to a Windows VM running my games in
> 3D... especially after buying a Ryzen 3900X (that's a monster of a CPU).
>
> Whoever read till the end, I'm thankful and ready to hear your opinion.
>
> Cheers
> Konstantin



Hi Konstantin,
in my Linux experience I have found several solutions for backup.
First of all, rsync.

Scripted rsync is well suited to your situation. Remember that rsync is 
not a backup tool/system on its own; it is very helpful when you need to 
sync files between hosts. On top of this you can use the --backup option, 
which saves the last copy of a file in a different dir before it is 
overwritten by the new copy. You can use SSH to add encryption during 
transfer. If you add a catalog and configuration, you can use it for 
multiple clients.
In the past I ran my own scripted rsync backup tool, with catalog, 
prejob/postjob scripts, etc.
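
A minimal sketch of that --backup pattern over SSH (host name and paths
are examples): the current copy is synced, and any file that would be
overwritten or deleted is first moved into a dated directory on the
receiver:

    #!/bin/sh
    # Hypothetical scripted rsync with --backup: files that would be
    # overwritten or deleted are preserved under a per-run directory
    # on the destination instead of being lost.
    set -e
    rsync -a --delete \
          --backup --backup-dir="/srv/backups/old-$(date +%F)" \
          -e ssh /home/user/ backuppi:/srv/backups/current/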



Then I encountered bacula. Bacula is a beast: complex, hard to configure 
the first time, but very powerful. It permits pooling, scheduling, 
mailing, encryption, multiple clients, prejob/postjob scripts on server 
and client, storage on tape or disks; it has its own scheduler (like cron) 
that works very well, volume recycling, a client GUI, a Windows client, a 
web interface and much more.
I used it for several servers and it works great. In some situations I 
prefer to run rsync to a local machine before running the backup, because 
on large datasets the backup requires more time and more network bandwidth 
overhead, plus all the operations like stopping services, creating LVM 
snapshots, etc. With large datasets rsync permits syncing files very 
quickly, so I can block my service for a very small amount of time and 
then perform the backup locally on the synced dataset.



There are also other backup tools like rsnapshot (based on rsync), and I 
think this is the best solution for you. There are also bareOS (a clone of 
bacula), amanda, restic, duplicity, BackupPC and borg.


Borg seems very promising, but it performs only push requests at the moment 
and I need pull requests. It offers deduplication, encryption and much more.


One word on deduplication: it is a great feature to save space; with 
deduplication, compression ops (which could require much time) are avoided. 
But remember that with deduplication, across multiple backups only one 
version of a file is stored. So if this file gets corrupted (for whatever 
reason), it will be compromised in all previous backup jobs performed, so 
the file is lost. For this reason I try to avoid deduplication on 
important backup datasets.


My 2 cents.





Re: Backup Times on a Linux desktop

2019-11-03 Thread Gene Heskett
On Sunday 03 November 2019 01:49:15 ghe wrote:

> > On Nov 2, 2019, at 05:42 PM, Linux-Fan  wrote:
> >
> > Konstantin Nebel writes:
> >> this is basically a question about what you guys prefer and do. I have
> >> a Linux desktop, and recently I decided to buy a Raspberry Pi 4 (great
> >> device) and
> >
> > [...]
> >
> >> Now I attached a 4 TB drive to my Pi and I decided: what the heck,
> >> why not do backups now?
> >>
> >> So now I am thinking: how should I approach backups? On Windows it
> >> does backups magically and reminds me when they didn't run for a
> >> while. I like that attitude.
>
> I've used Amanda (in a shell script like Gene does) for going on 20
> years. It's been rock solid. I use it with tape, but I hear it backs
> up to disks too. The only thing I don't like about it (coming from
> memory of the experience when I configured it these many years ago) is
> that it's pretty difficult to get going. Now that it's going, though,
> I easily change things all the time.
>
> Recovering from a backup is a reasonable job. I don't do it very
> often, but the recovery software is pretty good about asking questions
> and providing help if you haven't used it for a few months.

Just 4 or 5 days ago, I had to recover the linuxcnc configs from a backup 
of the pi3: I made a scratch dir here at home, scanned my database 
for the last level 0 of the pi3b, pulled that out with amrecover, then 
copied what I needed back to the rpi4 now living in my Sheldon lathe's 
control box. File moving was done by an old friend, mc, and sshfs mounts. 
Totally painless, and all the magic I had written on the pi3 is now 
running considerably better on the pi4. It saves me from having to 
re-invent at least 2 wheels you won't find on any other cnc lathes, which 
took me over 6 months to develop.

> It doesn't do magic, like Winders does, though. You have to tell it
> what kind of magic you'd like. I have a cron job that runs the backup
> every couple days (in the middle of the night). And another that
> reminds me to change the tape -- amanda whines when I ignore the
> reminder/forget to change the tape. Very thoughtful and well done
> software.
>
> Oh, and it backs up all the computers on my LAN, including my 'Pi. And
> as best I know, it's strictly CLI.


I don't use tapes, but virtual tapes - 60 of them on a 2TB drive - which 
are nothing but directory entries on the disk. So I have backups up to 60 
days old instantly available. Using tapes nearly bankrupted me, as hard 
drives will outlast tapes by at least 1000 to 1. Amanda ages out the old 
ones and reuses the space on the drives with no drama. This wouldn't do 
for business records, of course, but it is ideal for a home workshop. I 
started out with a 1T drive, but after several years it was getting 
crowded, so it got replaced with a 2T about a year back. The retired 1T 
has nearly 80,000 spinning hours on it with a perfect bill of health from 
smartctl.
Cheers, Gene Heskett
-- 
"There are four boxes to be used in defense of liberty:
 soap, ballot, jury, and ammo. Please use in that order."
-Ed Howdershelt (Author)
If we desire respect for the law, we must first make the law respectable.
 - Louis D. Brandeis
Genes Web page 



Re: Backup Times on a Linux desktop

2019-11-03 Thread ghe



> On Nov 2, 2019, at 05:42 PM, Linux-Fan  wrote:
> 
> Konstantin Nebel writes:
> 
>> this is basically a question about what you guys prefer and do. I have a Linux
>> desktop, and recently I decided to buy a Raspberry Pi 4 (great device) and
> 
> [...]
> 
>> Now I attached a 4 TB drive to my Pi and I decided: what the heck, why not
>> do backups now?
>> 
>> So now I am thinking: how should I approach backups? On Windows it does
>> backups magically and reminds me when they didn't run for a while. I like
>> that attitude.

I've used Amanda (in a shell script like Gene does) for going on 20 years. It's 
been rock solid. I use it with tape, but I hear it backs up to disks too. The 
only thing I don't like about it (coming from memory of the experience when I 
configured it these many years ago) is that it's pretty difficult to get going. 
Now that it's going, though, I easily change things all the time.

Recovering from a backup is a reasonable job. I don't do it very often, but the 
recovery software is pretty good about asking questions and providing help if 
you haven't used it for a few months.

It doesn't do magic, like Winders does, though. You have to tell it what kind 
of magic you'd like. I have a cron job that runs the backup every couple days 
(in the middle of the night). And another that reminds me to change the tape -- 
amanda whines when I ignore the reminder/forget to change the tape. Very 
thoughtful and well done software.

Oh, and it backs up all the computers on my LAN, including my 'Pi. And as best 
I know, it's strictly CLI.

-- 
Glenn English





Re: Backup Times on a Linux desktop

2019-11-02 Thread elvis



On 3/11/19 5:24 am, Konstantin Nebel wrote:

> Hi,
>
> this is basically a question about what you guys prefer and do. I have a
> Linux desktop, and recently I decided to buy a Raspberry Pi 4 (great
> device); already after a couple of days I do not know how I lived without
> it. So why the Raspberry Pi?
>
> In the past I decided not to do backups on purpose. I decided that the
> data on my local computer is not important, and to store my important
> stuff in a Nextcloud I host for myself and do backups of that. And for a
> long period of time I was just fine with it.
>
> Now I attached a 4 TB drive to my Pi and I decided: what the heck, why
> not do backups now?
>
> So now I am thinking: how should I approach backups? On Windows it does
> backups magically and reminds me when they didn't run for a while. I like
> that attitude.
>
> On Linux, with all that decision freedom, it can be good and bad because
> you have to think about things :D
>
> (SKIP THIS IF YOU DON'T WANT TO READ TOO MUCH) ;)
> So I could do the backup on logout, for example, but I am not sure
> whether that would be annoying, so I'd like to have your opinion. Oh, and
> yeah: I like to turn off my computer at night, so a backup running at
> night is not really an option unless I do wake-on-LAN, run the backup,
> and then turn off. But right now I dual boot with Windows as the default
> (for games, shame on me), and I might switch, because first, gaming on
> Linux is really becoming good, and second, I could buy a second GPU for
> my Linux system and forward my GPU to a Windows VM running my games in
> 3D... especially after buying a Ryzen 3900X (that's a monster of a CPU).
>
> Whoever read till the end, I'm thankful and ready to hear your opinion.


I use Bacula, which runs the backups on a schedule, but you can also 
trigger them with scripting if you want them done during the day.






> Cheers
> Konstantin


--
Auntie Em:  Hate you, Hate Kansas, Taking the dog.  -Dorothy.



Re: Backup Times on a Linux desktop

2019-11-02 Thread Rick Thomas



On Sat, Nov 2, 2019, at 1:30 PM, Konstantin Nebel wrote:
> Hi,
> 
> > Anyway, from my experience borg is the best and I can recommend it warmly.
> 
> I appreciate you answering in the fullest how you do backups, and I used borg
> in the past, which I can recommend as well. But I really want to focus on how
> to trigger the backup in an automated way, and not on which tool is
> recommended.
> 
> > My use case is quite the opposite: I shut down the backup server, as I do
> > only weekly and monthly backups.
> 
> I assume you trigger them manually then? I would like to do them
> automatically, hopefully forget that they exist, and do the occasional
> check whether they work or not :)

For this I use "rsnapshot," which schedules its backup runs via Linux "cron".
It's pretty much "fire and forget" once you've done the configuration details.
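
The scheduling itself is just a pair of crontab entries matching the
retain levels in rsnapshot.conf; a minimal sketch (times are examples):

    # /etc/cron.d/rsnapshot (hypothetical): the interval names must
    # match the "retain" lines in /etc/rsnapshot.conf.
    30 2 * * *   root   /usr/bin/rsnapshot daily
    0  4 * * 0   root   /usr/bin/rsnapshot weekly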

Hope that helps!
Rick



Re: Backup Times on a Linux desktop

2019-11-02 Thread Linux-Fan

Konstantin Nebel writes:


> this is basically a question about what you guys prefer and do. I have a Linux
> desktop, and recently I decided to buy a Raspberry Pi 4 (great device) and


[...]


> Now I attached a 4 TB drive to my Pi and I decided: what the heck, why not
> do backups now?
>
> So now I am thinking: how should I approach backups? On Windows it does
> backups magically and reminds me when they didn't run for a while. I like
> that attitude.


[...]


> So I could do the backup on logout, for example, but I am not sure whether
> that would be annoying, so I'd like to have your opinion. Oh, and yeah: I like to turn


[...]


> Whoever read till the end, I'm thankful and ready to hear your opinion.


[...]

My opinion on the matter is this: go for a good (fast) tool and trigger it
often :) Borg has been mentioned and might be very good (I wrote my backup
tool myself, but that one is probably less good :) )

About the times of triggering: for me (also a "Desktop" user of sorts), I
actually do a variant of backup-on-logout, which is really "backup before
shutdown". I do it by using a custom script called `mahalt`, which I invoke
to shut down my computer. Before triggering the actual shutdown, it invokes
the backup procedure.
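
A minimal sketch of such a wrapper (the backup command is a placeholder;
only the idea comes from the mail):

    #!/bin/sh
    # "Backup before shutdown": run the backup first and only power
    # off if it succeeded.
    set -e
    /usr/local/bin/run-backup    # placeholder for the actual backup tool
    systemctl poweroff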

As a "Laptop" user (current situation), I can really not be sure that I will
have the time to await the backup (on my "Desktop" it takes about 2 minutes
or so which is really acceptable for shutdown). Thus for the "Laptop" usage,
I trigger backup manually once per day (usually as the last action of the
day's computer usage) and it runs slightly faster because everything is on
SSD.

Whether those triggering intervals are good for you also depends on the
amount of data. If it is a lot (say >50 GiB or so), the detection of changed
files (even with good tools) will take a considerable amount of time, and
thus a rarer triggering interval might be better. Still, triggering entirely
in the background (so as to have enough "time" for the backup) should be
considered with extreme care, because unexpectedly triggered backups can not
only impair system performance but also back up an "inconsistent" state of
sorts (e.g. open files, or partial directory structures if large parts of
the structure are just being copied/renamed, etc.).

Btw., as I am paranoid when it comes to backups, I always do multi-level
backups: the first copy is on the very same computer's HDD (same for
Laptop/Desktop); the second copy goes to a "mini" computer similar to a
Raspberry Pi. Then, for the "Desktop", I usually have all HDDs in RAID1, and
the "mini" computer synchronizes to an online file storage service. For the
"Laptop" I sync to the "mini" computer and to a separate SD card (via rsync
at the moment, also triggered manually once per day). In any case, at least
one of the storage locations is offsite...

HTH
Linux-Fan



Re: Backup Times on a Linux desktop

2019-11-02 Thread Charles Curley
On Sat, 02 Nov 2019 21:30:32 +0100
Konstantin Nebel  wrote:

> But I really want to focus on how
> to trigger the backup in an automated way, and not on which tool is
> recommended.

cron, unless the program you select has a built-in equivalent.

But it's hard to get good backups while the machine is shut down.

-- 
Does anybody read signatures any more?

https://charlescurley.com
https://charlescurley.com/blog/




Re: Backup Times on a Linux desktop

2019-11-02 Thread deloptes
Konstantin Nebel wrote:

>> Anyway, from my experience borg is the best and I can recommend it warmly.
> 
> I appreciate you answering in the fullest how you do backups, and I used
> borg in the past, which I can recommend as well. But I really want to focus
> on how to trigger the backup in an automated way, and not on which tool is
> recommended.
> 

I think I am not the only one who misunderstood the point of the question.

>> My use case is quite the opposite: I shut down the backup server, as I do
>> only weekly and monthly backups.
> 
> I assume you trigger them manually then? I would like to do them
> automatically, hopefully forget that they exist, and do the
> occasional check whether they work or not :)

Yes, it is still on my list to find a way to trigger one single backup in a
time window when the backup server is available. I also want to explore the
snapshot function of LVM. This is my use case.

In any case a backup is a time-consuming process, and what many suggest is
to sync the content you are interested in permanently to a local machine and
then do your regular backup from there.

This would mean you use something like inotify to sync the files to the Pi 4
and have a cronjob that does the backup from the local directory at some
point in time. To me it looks very clean and simple.
https://linuxhint.com/inotofy-rsync-bash-live-backups/
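
A minimal sketch of the inotify half (using inotifywait from the
inotify-tools package; paths and host name are examples): re-sync to the
Pi whenever something changes, and let the Pi's cronjob do the real
backup from its local copy:

    #!/bin/sh
    # Sketch: continuously mirror a directory to the Pi as it changes.
    while inotifywait -r -e close_write,create,delete,move /home/user/data; do
        rsync -a --delete /home/user/data/ pi4:/srv/mirror/data/
    done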

There are a couple of other articles as well.

regards




Re: Backup Times on a Linux desktop

2019-11-02 Thread Dominic Knight
On Sat, 2019-11-02 at 20:24 +0100, Konstantin Nebel wrote:
> Hi,
>
>

> dual boot with Windows as the default (for games, shame on me) and I might
> switch, because first, gaming on Linux is really becoming good, and second,
> I could buy

Leaving aside the answers to your actual question :)
Yes, gaming on Linux is becoming really good. Running Flatpak Steam on
Debian testing, only one of my Linux games is dead (and even that
might well work if I really bothered) - less than 0.5%. Proton also
works very well; the few Windows games I have run as well as, and
sometimes better than, they do on their target OS.

Cheers,
Dom.



Re: Backup Times on a Linux desktop

2019-11-02 Thread Christopher David Howie

On 11/2/2019 3:24 PM, Konstantin Nebel wrote:

> Whoever read till the end, I'm thankful and ready to hear your opinion.


I use restic (the static binaries from the GitHub release page, not the 
Debian package, which falls out of date too quickly) and invoke it from 
crontab. On my LAN server, the script creates LVM snapshots of the 
volumes to back up, and the backup runs against those to get a true 
point-in-time backup.
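
A minimal sketch of that snapshot dance (volume group, snapshot size and
repository path are examples; it assumes the repository was created with
`restic init` and that RESTIC_PASSWORD is set in the environment):

    #!/bin/sh
    # Sketch: point-in-time backup of a live LVM volume with restic.
    set -e
    lvcreate --snapshot --size 5G --name data-snap /dev/vg0/data
    mount -o ro /dev/vg0/data-snap /mnt/snap
    restic -r /srv/restic-repo backup /mnt/snap
    umount /mnt/snap
    lvremove -f /dev/vg0/data-snap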


The local backup repository is stored on a RAID1.  A remote system 
regularly runs "rclone copy --immutable" to pull new data, but ignore 
deleted and changed data (preventing corruption/loss of data already 
copied off-site).


The remote system also syncs its local copy to B2, so I have three 
geographically-distributed copies.


--
Chris Howie
http://www.chrishowie.com
http://en.wikipedia.org/wiki/User:Crazycomputers

If you correspond with me on a regular basis, please read this document: 
http://www.chrishowie.com/email-preferences/


PGP fingerprint: 2B7A B280 8B12 21CC 260A DF65 6FCE 505A CF83 38F5






Re: Backup Times on a Linux desktop

2019-11-02 Thread Gene Heskett
On Saturday 02 November 2019 16:30:32 Konstantin Nebel wrote:

> Hi,
>
> > Anyway, from my experience borg is the best and I can recommend it
> > warmly.
>
> I appreciate you answering in the fullest how you do backups, and I
> used borg in the past, which I can recommend as well. But I really want
> to focus on how to trigger the backup in an automated way, and not on
> which tool is recommended.
>
> > My use case is quite the opposite: I shut down the backup server, as
> > I do only weekly and monthly backups.
>
> I assume you trigger them manually then? I would like to do them
> automatically, hopefully forget that they exist, and do
> the occasional check whether they work or not :)

I use GenesAmandaHelper, a wrapper around amanda that I wrote 15+ years 
ago and which takes care of the housekeeping better than amanda does, all 
fired off to do my 5 machines at about 2 am every morning. It has reduced 
me to a lazy bum, since all I have to do is read the emails it sends me.

Everything else is automatic.  Computers should work for you, not make 
more work for you.

> --
> Cheers
> Konstantin


Cheers, Gene Heskett
-- 
"There are four boxes to be used in defense of liberty:
 soap, ballot, jury, and ammo. Please use in that order."
-Ed Howdershelt (Author)
If we desire respect for the law, we must first make the law respectable.
 - Louis D. Brandeis
Genes Web page 



Re: Backup Times on a Linux desktop

2019-11-02 Thread Konstantin Nebel
Hi,

> Anyway, from my experience borg is the best and I can recommend it warmly.

I appreciate you answering in the fullest how you do backups, and I used borg
in the past, which I can recommend as well. But I really want to focus on how
to trigger the backup in an automated way, and not on which tool is
recommended.

> My use case is quite the opposite: I shut down the backup server, as I do
> only weekly and monthly backups.

I assume you trigger them manually then? I would like to do them
automatically, hopefully forget that they exist, and do the occasional check
whether they work or not :)

--
Cheers
Konstantin




Re: Backup Times on a Linux desktop

2019-11-02 Thread deloptes
Konstantin Nebel wrote:

> Whoever read till the end, I'm thankful and ready to hear your opinion.

There are many good solutions out there. I cannot say anything about your
specific use case. Usually you would take a snapshot of the partition and
back that up, but I am not that far here. I do classical file-based backup.

I resumed doing backups last year after a 10-year break. I back up only
important data (accounting, personal information). I found that the best
tool for me is borg. It has compression, password protection and
deduplication, so currently 4 backup copies of 2 clients and a server (a
total of approx. 2 TB of data) occupy 860 GB on the backup server.

There are also a few GUIs and WebUIs, which I do not use.

My use case is quite the opposite: I shut down the backup server, as I do only
weekly and monthly backups.

Anyway, from my experience borg is the best and I can recommend it warmly.

hope it helps
regards



Backup Times on a Linux desktop

2019-11-02 Thread Konstantin Nebel
Hi,

this is basically a question about what you guys prefer and do. I have a Linux
desktop, and recently I decided to buy a Raspberry Pi 4 (great device);
already after a couple of days I do not know how I lived without it. So why
the Raspberry Pi?

In the past I decided not to do backups on purpose. I decided that the data on
my local computer is not important, and to store my important stuff in a
Nextcloud I host for myself and do backups of that. And for a long period of
time I was just fine with it.

Now I attached a 4 TB drive to my Pi and I decided: what the heck, why not
do backups now?

So now I am thinking: how should I approach backups? On Windows it does
backups magically and reminds me when they didn't run for a while. I like
that attitude.

On Linux, with all that decision freedom, it can be good and bad because you
have to think about things :D

(SKIP THIS IF YOU DON'T WANT TO READ TOO MUCH) ;)
So I could do the backup on logout, for example, but I am not sure whether
that would be annoying, so I'd like to have your opinion. Oh, and yeah: I
like to turn off my computer at night, so a backup running at night is not
really an option unless I do wake-on-LAN, run the backup, and then turn off.
But right now I dual boot with Windows as the default (for games, shame on
me), and I might switch, because first, gaming on Linux is really becoming
good, and second, I could buy a second GPU for my Linux system and forward
my GPU to a Windows VM running my games in 3D... especially after buying a
Ryzen 3900X (that's a monster of a CPU).

Whoever read till the end, I'm thankful and ready to hear your opinion.


Cheers
Konstantin

