On Tue, Sep 10, 2019 at 03:25:12PM -0400, Joe Steele wrote:
> On 9/9/2019 9:53 PM, Walt Mankowski wrote:
> > I found a file named
> >
> > rdiff-backup-data/current_mirror.2019-09-08T03:01:02-04:00.data
> >
> > which contained
> >
> > 4351
> >
> > I moved it out of the way and reran the backup command. This time it
> > threw an exception. The output is in the attached log file.
> >
> (Some of the following echoes what Eric Lavarde wrote a few minutes ago.)
>
> Moving a current_mirror file out of the way is never a good thing to do.
> Having 2 current_mirror files is how rdiff-backup knows that the last
> backup failed and that a regression is necessary in order to reestablish
> a consistent state for the backup repository.
>
> Fortunately, it looks as though your attempt to run another backup after
> removing the current_mirror file did not get anywhere (based on your log).
>
> I suggest putting the 'current_mirror.2019-09-08T03:01:02-04:00.data' back
> in place (and possibly restarting systemd-resolved, as commented further
> below). After that, I would look to see what current_mirror files you now
> have. My guess is that you will find the following:
>
> current_mirror.2019-09-07T03:01:01-04:00.data
> current_mirror.2019-09-08T03:01:02-04:00.data
> current_mirror.2019-09-09T21:46:29-04:00.data
>
> 9/7/19 is your last good backup. 9/8 was the backup that failed. 9/9 was
> your most recent attempt to fix things.
>
> *Assuming* that I am correct about the current_mirror files that exist,
> I would remove the last of those files
> (current_mirror.2019-09-09T21:46:29-04:00.data). Yes, that's contrary to
> my admonition above. But rdiff-backup cannot deal with 3 such files, and
> this last file is from your most recent backup that did not get anywhere,
> according to your log.
>
> I would then again try 'rdiff-backup --check-destination-dir' (and cross
> your fingers).
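
For anyone who finds this thread later, the concrete version of that advice
for my setup would be roughly

$ ls -l /backup/scruffy/rdiff-backup-data/current_mirror.*.data
$ sudo rdiff-backup --check-destination-dir /backup/scruffy

(substitute your own repository path). Two current_mirror.*.data files mean
the last backup didn't finish and a regression is pending; one means the
repository is in a consistent state.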
Oops! What I ended up doing was moving the file back and removing these
files:

current_mirror.2019-09-09T21:46:28-04:00.data
file_statistics.2019-09-09T21:46:28-04:00.data.gz

Then, since I hadn't read your email yet, I reran the backup command again:

$ sudo rdiff-backup -v9 --print-statistics --exclude-filelist /usr/local/etc/rdiff_exclude / /backup/scruffy 2>&1 | tee rdiff-backup3.txt

That seems to have done the trick. It printed out a bunch of text and
finally said

Previous backup seems to have failed, regressing destination now.

Looks like I lucked out! So I'm just going to let it run now and see what
happens.

> Your original concern was that this was taking forever (12+ hours and
> counting). For what it is worth, my experience is that regressions do
> take many hours (depending on the size of your current mirror), and they
> leave you wondering if anything is actually happening.
>
> It seems like 296 GB would take me 4-8 hours to regress (I can't really
> remember -- it's been a while). If your backup is 527 GB (i.e., that's
> what shows up for 'MirrorFileSize' in your session_statistics.* files),
> then yes, I imagine that would take quite some time to regress. There are
> probably other factors besides size that affect the speed -- disk speed,
> processor speed, load, etc. I don't know whether rdiff-backup logging
> verbosity is a factor or not -- I would think that it might be.

Thanks. It's really good to know the time wasn't out of line. Some sort of
diagnostic messages, especially at -v9, would be really helpful for knowing
that it's still working!

> None of the above addresses your problem with "No space left on device".
> I would try to restore your repository to a consistent state before
> investigating that further. (Of course, the really frustrating thing is
> that if the backup fails again, you are forced to wait many hours while
> you repeat the regression of the failed backup.)

Agreed. But considering that at this point I haven't even been able to
regress the backups, I'm willing to cross that bridge when I come to it.
There was some weirdness with my box that day, including GNOME crashing,
so maybe it just got into a weird state. Even if it fails again, I'm no
worse off than I am now.

> <snip>
>
> > On Mon, Sep 09, 2019 at 08:17:04PM -0400, Walt Mankowski wrote:
> > > I ran
> > >
> > > $ sudo rdiff-backup -v9 --print-statistics --exclude-filelist
> > > /usr/local/etc/rdiff_exclude / /backup/scruffy 2>&1 | tee rdiff-backup.txt
> > >
> > > This time it exited right away. I've attached the log file, where the
> > > key message is
> > >
> > > Fatal Error: It appears that a previous rdiff-backup session with
> > > process id 4351 is still running.
> > >
> > > Process 4351 is /lib/systemd/systemd-resolved
> > >
>
> It would seem that you had a bit of bad luck in that a process ID that
> had been used for a crashed rdiff-backup session happened to now be in
> use again for an unrelated process (systemd-resolved).

That is bizarre! Especially because it was a systemd process with a lowish
PID, I figured it had been running since boot.

> > > Is it safe to rerun it with --force?
> > >
>
> Using --force would have gotten around the Fatal Error, but it would have
> also forced other things to happen that you may not want. In this
> instance, I would probably have restarted systemd-resolved so that it
> used a different PID. That should have gotten rdiff-backup past that
> particular error.

Good to know.
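
For the record, the workaround you describe would look something like this
(plain ps/systemctl, nothing rdiff-backup-specific):

$ ps -p 4351 -o pid,comm                   # confirm what currently owns the recorded PID
$ sudo systemctl restart systemd-resolved  # resolved comes back under a new PID
$ ps -p 4351 -o pid,comm                   # should now come back empty

After that, the stale-PID check should no longer trip, without needing --force.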
> > > > > The latter was when I killed it when I woke up and saw that both of
> > > > > them were running.
> > > > >
>
> That's interesting. That points out that rdiff-backup does not check
> whether a regression is already in progress before starting another one.
> That needs fixing.
>
> --Joe
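
Until that's fixed, it's probably worth checking by hand that nothing is
still running before kicking off another backup or regression; something
like

$ pgrep -af rdiff-backup

(or any other process listing) should show whether an earlier rdiff-backup
is still at work.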