Re: [BackupPC-users] Bad md5sums due to zero size (uncompressed) cpool files - WEIRD BUG

2011-10-07 Thread Arnold Krille
On Friday 07 October 2011 01:41:45 Holger Parplies wrote:
 Hi,
 
 Les Mikesell wrote on 2011-10-06 18:17:06 -0500 [Re: [BackupPC-users] Bad 
md5sums due to zero size (uncompressed) cpool files - WEIRD BUG]:
  On Thu, Oct 6, 2011 at 5:21 PM, Arnold Krille arn...@arnoldarts.de 
wrote:
No, it makes perfect sense for backuppc where the point is to keep
as much history as possible online in a given space.
   
   No, the point of backup is to be able to *restore* as much historical
   data as possible.  Keeping the data is not the important part.
Restoring it is.  Anything that is between storing data and
   *restoring* that data is in the way of that job.
   
   Actually the point of a backup is to restore the most recent version of
   something from just before the trouble (whatever that might be).
  
  Yes, but throw in the fact that it may take some unpredictable amount
  of time after the 'trouble' (which could have been accidentally
  deleting a rarely used file) before anyone notices and you see why you
  need some history available to restore from the version just before
  the trouble.
 
 I think you've all got it wrong. The real *point* of a backup is ...
 whatever the person doing the backup wants it for. For some people that
 might just be being able to say, hey, we did all we could to preserve the
 data as long as legally required - too bad it didn't work out.

No, that case of archiving documents and communications to fulfill legal 
requirements is called an _archive_! And while such a thing works well on 
paper (and for paper documents, provided you are good friends with the 
archive lady), try to access any old electronic communication in a company 
after they switched from Lotus to SAP in between. The data is still archived 
according to the law.

 Usually, it
 seems to be sufficient that the data is stored, but some of us really *do*
 want to be able to *restore* it, too, while others are doing backups mainly
 for watching the progress. Fine. That's what the flexibility of BackupPC is
 for, right?

No one wants a backup. Everyone only wants restore.

Good thing BackupPC can do backups that don't get in your way, restores that 
work simply by clicking in a web interface, and additionally write complete 
dumps to archive media...

Have fun,

Arnold




Re: [BackupPC-users] Bad md5sums due to zero size (uncompressed) cpool files - WEIRD BUG

2011-10-07 Thread Tim Fletcher
On Thu, 2011-10-06 at 17:54 +0200, Holger Parplies wrote:

 To be honest, I would *hope* that only you had these issues and everyone
 else's backups are fine, i.e. that your hardware and not the BackupPC software
 was the trigger (though it would probably need some sort of software bug to
 come up with the exact symptoms).

So far my scan of one of my systems has finished and has given:

758080 files in 4096 directories checked, 0 had wrong digests, of these
0 zero-length.

Another system is currently up to a/f/d of a full scan and has found the 
following errors

tim@carbon:~$ grep -v ok /tmp/pool-check.txt ; tail -n 1 /tmp/pool-check.txt
[335403] 1ef2238fe0d1e5ffb7abe1696a32ae91 (   384) != 
5c6bec8866797c63a7fbdc92f8678c1f
[397563] 2429be9ee43ac9c7d0cb8f0f4f759cd8 (   364) != 
12351030008ccf626abd83090c0e5efa
[761269] 452017085ec5f0a21b272dac9cbaf51c (  2801) != 
b4f9ab837e47574f145289faddc38ca2
[1260873] 72ed33567c8fbda29d63ade20f13778d (   364) != 
8521efc754784ac13db47545edb22fcd
[1380912] 7d264e0aedb7d6693946594b583643d6 (   270) != 
c15a891ef0ab8d4842196fcbaf3e6b9f
[1534431] 8a7e659dd6d0a4f45464cc3f55372323 (58) != 
de0ac1b424d9a022b4f3415896817ec4
[1598997] 90097fdb369c0152e737c7b88c0f6ff6 (   282) != 
16d682e1c10bee80844e6966eaabbbcf
[1873732] a9d171972fca12bf03c082b7fba542d1 (   364) != 
ee42dab9abc3486325a674779beaabcc
[1940164] afd73ec3463ea92cfd509ead19f938f5 (  5102) ok

Once the scan has finished I'll do a bit more digging into when the
files were created and from which host I'm backing up.

The hardware in both cases is the same, an HP MicroServer with a RAID5
disk set.

The one without errors is running Fedora 15 32-bit, but with a pool that
was moved from Ubuntu 10.04 32-bit a few months ago; the pool dates
from 20/09/2010.

The one with errors has always been on current Ubuntu 32-bit; the pool
dates back to 08/01/2010.

-- 
Tim Fletcher t...@night-shade.org.uk




Re: [BackupPC-users] Bad md5sums due to zero size (uncompressed) cpool files - WEIRD BUG

2011-10-07 Thread Holger Parplies
Hi,

Tim Fletcher wrote on 2011-10-07 10:21:29 +0100 [Re: [BackupPC-users] Bad 
md5sums due to zero size (uncompressed) cpool files - WEIRD BUG]:
 [...]
 Another system is currently up to a/f/d of a full scan and has found the 
 following errors
 
 tim@carbon:~$ grep -v ok /tmp/pool-check.txt ; tail -n 1 /tmp/pool-check.txt

you might be better off with terse progress output (-p) instead of verbose
(-v), though that doesn't combine well with logging. I should add a logging
option to just output the mismatches to a file.

 [335403] 1ef2238fe0d1e5ffb7abe1696a32ae91 (   384) != 
 5c6bec8866797c63a7fbdc92f8678c1f

In case that needs explanation, the output format is

[counter] file-name (uncompressed-file-length) != computed-hash

Normally, file-name and hash match (except for a possible _n suffix of the
file-name in case of a hash collision). These lines indicate for which files
that is not true, and what hash chain they should really be in (though no
attempt is made to fix that, and you should only do so yourself if you know
what you are doing). The (uncompressed-file-length) was added yesterday and
breaks 80 character output width. Sorry.

 [761269] 452017085ec5f0a21b272dac9cbaf51c (  2801) != 
 b4f9ab837e47574f145289faddc38ca2

My mismatches all have an uncompressed file length between 64 bytes and 163
bytes. I haven't checked, but that does seem to fit well with the known
top-level-attrib-file-bug fixed in BackupPC 3.2.0.
2801 bytes does sound rather long for a top-level attrib file, unless you have
many shares with long names :). You might want to check what the file
uncompresses to with something like

sudo .../BackupPC_zcat 
$TopDir/cpool/4/5/2/452017085ec5f0a21b272dac9cbaf51c | less

A top-level attrib file would include the share names and a lot of control
characters.
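
If you want to eyeball all of the mismatches from your log in one go, something
along these lines should work (a rough sketch, not part of BackupPC - adjust
$TopDir, the BackupPC_zcat path and the log file name to your installation):

  #!/usr/bin/perl
  # Sketch: show the first bytes of every mismatched pool file listed in
  # the check log, so you can see whether they look like top-level attrib
  # files (share names plus control characters) or something else.
  use strict;
  use warnings;

  my $topdir = '/var/lib/backuppc';                      # adjust
  my $zcat   = '/usr/share/backuppc/bin/BackupPC_zcat';  # adjust
  my $log    = '/tmp/pool-check.txt';                    # adjust

  open my $fh, '<', $log or die "$log: $!";
  while (<$fh>) {
      next if / ok$/;                                    # skip good files
      next unless /^\[\d+\] ([0-9a-f]{32}(?:_\d+)?) /;
      my $name = $1;
      my ($d1, $d2, $d3) = $name =~ /^(.)(.)(.)/;
      my $head = `$zcat $topdir/cpool/$d1/$d2/$d3/$name 2>/dev/null | head -c 80`;
      $head =~ s/[^\x20-\x7e]/./g;                       # make control chars visible
      printf "%-36s %s\n", $name, $head;
  }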

If the mismatches are only top-level attrib files, there's not much point in
investigating any further (or worrying about it - the data should be ok). If
they're not, there definitely is.

 The one without errors is running Fedora 15 32bit but with a pool that
 has been moved from Ubuntu 10.04 32bit a few months ago, the pool dates
 from the 20/09/2010. 

My guess would be that either that pool was created and used with BackupPC
3.2.0 or newer, or that you only have a single share for each host that is
backed up.

 The one with errors has always been on current Ubuntu 32bit, the pool
 dates back to 08/01/2010. 

This would seem to be an older pool (started pre 3.2.0 - which makes sense,
because 3.2.0 hadn't been released yet :). Switching to 3.2.0 would not have
*corrected* the errors, but only prevented new ones from appearing.

Regards,
Holger



Re: [BackupPC-users] Bad md5sums due to zero size (uncompressed) cpool files - WEIRD BUG

2011-10-07 Thread Holger Parplies
Hi,

Jeffrey J. Kosowsky wrote on 2011-10-07 01:08:03 -0400 [Re: [BackupPC-users] 
Bad md5sums due to zero size (uncompressed) cpool files - WEIRD BUG]:
 Holger Parplies wrote at about 05:46:36 +0200 on Friday, October 7, 2011:
   Jeffrey J. Kosowsky wrote on 2011-10-06 22:54:44 -0400 [Re: 
 [BackupPC-users] Bad md5sums due to zero size (uncompressed) cpool files - 
 WEIRD BUG]:
[...]
Why would PoolWrite.pm change the mod time of a pool file that is not
in the actual backup?

ok, so by now we seem to have concluded that Xfer::Rsync *might* modify a file
in a previous backup. Without looking at the code, it's just speculation, and
I haven't currently got the time, considering the code is quite complex.

After some reflection, it *does* make sense to add checksums to a file in a
previous backup, even if the file is found to have changed in the current
backup, because the *previous* backup may still be the reference for a future
backup (also, the pool file might be reused).

   Also: can you give a better resolution on the mod times, i.e. which one is
   older?
 
 OK...
 #82: Modify: 2011-04-27 03:05:04.551226502 -0400
 #110: Modify: 2011-04-27 03:05:19.813321479 -040
 
 So #110 was modified 15 seconds after #82. Hmmm

Strange time zone on #110 ;-).
15 seconds seems to be *ages*, considering we're talking about small files
(right?). I agree with you: Hmmm.

 Note both of those files have rsync checksums.
 
 When I looked at a couple of files without the rsync checksums, the
 mod times differed by a day.

Meaning they corresponded to the backup times?

 As an aside, I noticed that when I looked at a version without the rsync
 checksum, the corrected version also doesn't have an rsync
 checksum even after having been backed up many times subsequently --
 Now I thought that the rsync checksum should be added after the 2nd or
 3rd time the file is read... This makes me wonder whether there is
 potentially an issue with the rsync checksum...

I had thought the same (2nd backup - for the 3rd they should be present and
give a speedup). This is another thing we could check - which files in our
backups have checksum caches. What makes *me* wonder is that this still only
seems to happen to you.
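
Checking which pool files have a checksum cache doesn't need any decompression,
by the way: compressed pool files start with a 0x78 byte, and as far as I can
tell that first byte is rewritten to 0xd7 once the rsync checksum cache has
been appended (compare the 8-byte/57-byte observation elsewhere in this
thread). A hand-rolled check could look like this (just a sketch, not a
BackupPC tool):

  # Sketch: report whether a cpool file has a cached rsync checksum
  # (first byte 0xd7) or not (plain zlib header, first byte 0x78).
  use strict;
  use warnings;

  my $file = shift @ARGV or die "usage: $0 /path/to/cpool/X/Y/Z/hash\n";
  open my $fh, '<', $file or die "$file: $!";
  binmode $fh;
  read($fh, my $byte, 1) == 1 or die "$file: empty file\n";

  my $b = ord $byte;
  if    ($b == 0xd7) { print "$file: rsync checksums cached\n" }
  elsif ($b == 0x78) { print "$file: compressed, no checksum cache\n" }
  else               { printf "%s: unexpected first byte 0x%02x\n", $file, $b }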

Also, the XferLOG entries for both backups #82 and #110 have the line:
 pool 644   0/0 252 
 usr/share/FlightGear/Timezone/America/Port-au-Prince

But this doesn't make sense since if the new pool file was created as
part of backup #110, shouldn't it say 'create' and not 'pool'?
   
   Considering the mtime, yes.

Or, put differently: no.

If the file *has* rsync checksums, they wouldn't have been added on the first
backup. mtime says they were added by #110, thus the file would have been in
the pool without checksums then. Or would it have been the *reference file* at
the time checksums were added!?

 And we know BackupPC *thinks* it's a new file since it creates a new
 pool file chain member. But how and why did the original file get
 clobbered just before then?

No, we don't really know what the situation was then. We're trying to
reconstruct it from evidence, which we are having a hard time interpreting
(at least I am).

None of this makes sense to me but somehow I suspect that herein may be a
clue to the problem...
   
   Xfer::Rsync opening the reference file?
 
 But what would cause it to truncate the data portion?

Use the force, read the source. I don't think it's *meant* to clobber the data
portion.

 Maybe it's something with rsync checksum caching/seeding when it tries
 to add a checksum?

There does seem to be a connection, though there are those 8-byte files
*without* checksum cache. Are these failed attempts of some sort, or are they
- like you first said - an indication that it's *not* checksum caching?

 I'm just guessing here...

So am I.

Regards,
Holger



Re: [BackupPC-users] Bad md5sums due to zero size (uncompressed) cpool files - WEIRD BUG

2011-10-07 Thread Tim Fletcher
On Fri, 2011-10-07 at 10:21 +0100, Tim Fletcher wrote:

 Another system is currently up to a/f/d of a full scan and has found the 
 following errors

The final answer off the server with the larger and older install of
backuppc is:

2836949 files in 4096 directories checked, 13 had wrong digests, of
these 0 zero-length.

What sort of information would be helpful for follow up?

-- 
Tim Fletcher t...@night-shade.org.uk




Re: [BackupPC-users] Bad md5sums due to zero size (uncompressed) cpool files - WEIRD BUG

2011-10-06 Thread Tim Fletcher
On Wed, 2011-10-05 at 21:35 -0400, Jeffrey J. Kosowsky wrote:

 Finally, remember it's possible that many people are having this
 problem but just don't know it, since the only way one would know
 would be if one actually computed the partial file md5sums of all the
 pool files and/or restored & tested one's backups. Since the error
 affects only 71 out of 1.1 million files it's possible that no one has
 ever noticed...
 
 It would be interesting if other people would run a test on their
 pools to see if they have similar such issues (remember I only tested
 my pool in response to the recent thread of the guy who was having
 issues with his pool)...

Do you have a script or series of commands to do this check with?

I have access to a couple of backuppc installs of various ages and sizes
that I can test.

-- 
Tim Fletcher t...@night-shade.org.uk




Re: [BackupPC-users] Bad md5sums due to zero size (uncompressed) cpool files - WEIRD BUG

2011-10-06 Thread Timothy J Massey
Tim Fletcher t...@night-shade.org.uk wrote on 10/06/2011 05:17:03 AM:

 Do you have a script or series of commands to do this check with?
 
 I have access to a couple of backuppc installs of various ages and sizes
 that I can test.

Me too, if it can run in a reasonable amount of time.  I'd hate to find 
out during a major restore that something is corrupt.

Tim Massey

 
Out of the Box Solutions, Inc. 
Creative IT Solutions Made Simple!
http://www.OutOfTheBoxSolutions.com
tmas...@obscorp.com 
 
22108 Harper Ave.
St. Clair Shores, MI 48080
Office: (800)750-4OBS (4627)
Cell: (586)945-8796 


Re: [BackupPC-users] Bad md5sums due to zero size (uncompressed) cpool files - WEIRD BUG

2011-10-06 Thread Holger Parplies
Hi,

Tim Fletcher wrote on 2011-10-06 10:17:03 +0100 [Re: [BackupPC-users] Bad 
md5sums due to zero size (uncompressed) cpool files - WEIRD BUG]:
 On Wed, 2011-10-05 at 21:35 -0400, Jeffrey J. Kosowsky wrote:
  Finally, remember it's possible that many people are having this
  problem but just don't know it,

perfectly possible. I was just saying what possible cause came to my mind (and
many people *could* be running with an almost full disk). As you (Jeffrey)
said, the fact that the errors appeared only within a small time frame may or
may not be significant. I guess I don't need to ask whether you are *sure*
that the disk wasn't almost full back then.

To be honest, I would *hope* that only you had these issues and everyone
else's backups are fine, i.e. that your hardware and not the BackupPC software
was the trigger (though it would probably need some sort of software bug to
come up with the exact symptoms).

  since the only way one would know would be if one actually computed the
  partial file md5sums of all the pool files and/or restored & tested one's
  backups.

Almost.

  Since the error affects only 71 out of 1.1 million files it's possible
  that no one has ever noticed...

Well, let's think about that for a moment. We *have* had multiple issues that
*sounded* like corrupt attrib files. What would happen, if you had an attrib
file that decompresses to nothing in the reference backup?

  It would be interesting if other people would run a test on their
  pools to see if they have similar such issues (remember I only tested
  my pool in response to the recent thread of the guy who was having
  issues with his pool)...
 
 Do you have a script or series of commands to do this check with?

Actually, what I would propose in response to what you have found would be to
test for pool files that decompress to zero length. That should be
computationally less expensive than computing hashes - in particular, you can
stop decompressing once you have decompressed any content at all. Sure, that
just checks for this issue, not for possible different ones. On the one hand,
having the *correct* content in the pool under an incorrect hash would not be
a *serious* issue - it wouldn't prevent restoring your data, it would just
make pooling not work correctly (for the files affected). On the other,
different instances of this problem might point toward a common cause. And I
guess it would be possible to have *truncated* data (i.e. not zero-length, but
incomplete just the same) in your files as well.
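
To illustrate the kind of zero-length test I mean, here is a quick sketch (it
uses Compress::Zlib directly instead of BackupPC's FileZIO code, so treat it
as an approximation rather than the appended BackupPC_verifyPool script; the
0xd7 handling follows the checksum-cache convention discussed elsewhere in
this thread):

  # Sketch: does a cpool file decompress to zero length?
  # Stops inflating as soon as any output at all appears.
  use strict;
  use warnings;
  use Compress::Zlib;

  sub decompresses_empty {
      my ($path) = @_;
      open my $fh, '<', $path or return undef;
      binmode $fh;
      read($fh, my $data, 4096) or return undef;
      # Files with a cached rsync checksum have the leading 0x78 replaced
      # by 0xd7; flip it back so plain zlib accepts the stream.
      substr($data, 0, 1) = "\x78" if substr($data, 0, 1) eq "\xd7";
      my $inf = inflateInit() or return undef;
      while (1) {
          my ($out, $status) = $inf->inflate($data);
          return 0 if defined $out && length $out;     # real content found
          last if $status != Z_OK;                     # stream end or error
          read($fh, $data, 4096) or last;
      }
      return 1;                                        # decompressed to nothing
  }

  print decompresses_empty($ARGV[0]) ? "empty\n" : "has content\n";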

You weren't asking me, but, yes, I wrote a script to check pool file contents
against the file names back in 2007. I'll append it here, but it would really
be interesting to add information on whether the file decompressed to
zero-length. I could easily add the decompressed file length to the output,
but it would make lines longer than 80 characters. Ok, I did that (and added
counting of zero-length files) - please make your terminals at least 93
characters wide :). I just scanned 1/16th of my pool and found various
mismatches, though none of them zero-length. Probably top-level attrib files.
Link counts might be interesting - I'll add them later.

 I have access to a couple of backuppc installs of various ages and sizes
 that I can test.

Try something like

BackupPC_verifyPool -s -p

to scan the whole pool, or

BackupPC_verifyPool -s -p -r 0

to test it on the 0/0/0 - 0/0/f pool subdirectories (-r takes a Perl
expression evaluating to an array of numbers between 0 and 255, e.g. '0',
'0 .. 255' (the default), or '0, 1, 10 .. 15, 5'; note the quotes to make your
shell pass it as a single argument). If you have switched off compression,
you'll have to add a '-u' (though I'm not sure this test makes much sense in
that case). You'll want either '-p' (progress) or '-v' (verbose) to see
anything happening. It *will* take time to traverse the pool, but you can
safely interrupt the script at any time and use the range parameter to resume
it later (though not at the exact place) - or just suspend and resume it (^Z).

You might need to change the 'use lib' statement in line 64 to match your
distribution.

Hope that helps.

Regards,
Holger
#!/usr/bin/perl
#= -*-perl-*-
#
# BackupPC_verifyPool: Verify pool integrity
#
# DESCRIPTION
#
#   BackupPC_verifyPool tries to verify the integrity of the pool files,
#   based on their file names, which are supposed to correspond to MD5
#   digests calculated from the (uncompressed) length and parts of their
#   (again uncompressed) contents.
#   Needs to be run as backuppc user for access to the pool files and
#   meta data.
#
#   Usage: BackupPC_verifyPool [-v] [-p] [-u] [-r range] [-s]
#
#   Options:
#
#     -v    Show what is going on. Without this flag, only errors found
#           are displayed, which might be very boring. Use '-p' for a
#           bit of entertainment without causing your tty to scroll so

Re: [BackupPC-users] Bad md5sums due to zero size (uncompressed) cpool files - WEIRD BUG

2011-10-06 Thread Timothy J Massey
Holger Parplies wb...@parplies.de wrote on 10/06/2011 11:54:05 AM:

 If you have switched off compression,
 you'll have to add a '-u' (though I'm not sure this test makes much 
sense in
 that case).

Well, then, it won't make much sense in *my* case:  I missed that this is 
unique to compressed pools.  (Is it?)

Personally, I feel that compression has no place in backups.  Back when we 
were highly limited in capacity by terrible analog devices (i.e. tape!) I 
used it from necessity.  Now, I just throw bigger hard drives at it and am 
thankful.  :)

Timothy J. Massey

 
Out of the Box Solutions, Inc. 
Creative IT Solutions Made Simple!
http://www.OutOfTheBoxSolutions.com
tmas...@obscorp.com 
 
22108 Harper Ave.
St. Clair Shores, MI 48080
Office: (800)750-4OBS (4627)
Cell: (586)945-8796 


Re: [BackupPC-users] Bad md5sums due to zero size (uncompressed) cpool files - WEIRD BUG

2011-10-06 Thread Les Mikesell
On Thu, Oct 6, 2011 at 11:56 AM, Timothy J Massey tmas...@obscorp.com wrote:

 Personally, I feel that compression has no place in backups.  Back when we
 were highly limited in capacity by terrible analog devices (i.e. tape!) I
 used it from necessity.  Now, I just throw bigger hard drives at it and am
 thankful.  :)


No, it makes perfect sense for backuppc where the point is to keep as much
history as possible online in a given space.  If you have trouble with
compression, just throw a faster CPU at it.  Just anecdotally, I saw 95%
compression recently on a system where someone requested including their web
content directory and forgot to mention the 40Gb of log files that happened
to be there.

-- 
   Les Mikesell
 lesmikes...@gmail.com


Re: [BackupPC-users] Bad md5sums due to zero size (uncompressed) cpool files - WEIRD BUG

2011-10-06 Thread Timothy J Massey
Les Mikesell lesmikes...@gmail.com wrote on 10/06/2011 01:21:29 PM:

 On Thu, Oct 6, 2011 at 11:56 AM, Timothy J Massey tmas...@obscorp.com 
wrote:
 Personally, I feel that compression has no place in backups.  Back 
 when we were highly limited in capacity by terrible analog devices 
 (i.e. tape!) I used it from necessity.  Now, I just throw bigger 
 hard drives at it and am thankful.  :) 
 
 No, it makes perfect sense for backuppc where the point is to keep 
 as much history as possible online in a given space.

No, the point of backup is to be able to *restore* as much historical data 
as possible.  Keeping the data is not the important part.  Restoring it 
is.  Anything that is between storing data and *restoring* that data is in 
the way of that job.

Obviously, there *are* things that have to go between it:  a filesystem to 
store the data, for example.  But if I can avoid something in between 
storing my data and using my data, I absolutely will.

Compression falls in that area.

  If you have 
 trouble with compression, just throw a faster CPU at it.  Just 
 anecdotally, I saw 95% compression recently on a system where 
 someone requested including their web content directory and forgot 
 to mention the 40Gb of log files that happened to be there.

That's all well and good.  My issue is *NOT* performance.  Or capacity, 
for that matter.  I'm not saying that there is no value to compression. 
I'm saying that my objective for a backup server is FIRST to be as 
simple and reliable as possible, and THEN only to have other features. 
Features that detract from that first requirement are considered 
skeptically.

This entire thread is a *PERFECT* example of why I have my reasons.  I 
have avoided an entire category of failure simply by throwing more disk at 
it (or by having a smaller window of backups).  Seeing as I have, at a 
minimum, 4 months of data (with varying gaps between the backups) within 
the backup server itself, and archive data in long-term storage every 
three months, I have what I (and my clients) feel to be enough data. Extra 
capacity would have no value.  Extra reliability *always* has value.

YMMV, of course.

Timothy J. Massey

 
Out of the Box Solutions, Inc. 
Creative IT Solutions Made Simple!
http://www.OutOfTheBoxSolutions.com
tmas...@obscorp.com 
 
22108 Harper Ave.
St. Clair Shores, MI 48080
Office: (800)750-4OBS (4627)
Cell: (586)945-8796 


Re: [BackupPC-users] Bad md5sums due to zero size (uncompressed) cpool files - WEIRD BUG

2011-10-06 Thread Les Mikesell
On Thu, Oct 6, 2011 at 1:04 PM, Timothy J Massey tmas...@obscorp.com wrote:


  On Thu, Oct 6, 2011 at 11:56 AM, Timothy J Massey tmas...@obscorp.com
 wrote:

  Personally, I feel that compression has no place in backups.  Back
  when we were highly limited in capacity by terrible analog devices
  (i.e. tape!) I used it from necessity.  Now, I just throw bigger
  hard drives at it and am thankful.  :)
 
  No, it makes perfect sense for backuppc where the point is to keep
  as much history as possible online in a given space.

 No, the point of backup is to be able to *restore* as much historical data
 as possible.  Keeping the data is not the important part.  Restoring it is.
  Anything that is between storing data and *restoring* that data is in the
 way of that job.


 Obviously, there *are* things that have to go between it:  a filesystem to
 store the data, for example.  But if I can avoid something in between
 storing my data and using my data, I absolutely will.

 Compression falls in that area.


My experience is that the failures are more likely in the parts underneath
storing the data than in the compression process.   Admittedly, that goes
all the way back to storing zip files on floppies vs. large uncompressed
text files and media reliability has improved a bit.


   If you have
  trouble with compression, just throw a faster CPU at it.  Just
  anecdotally, I saw 95% compression recently on a system where
  someone requested including their web content directory and forgot
  to mention the 40Gb of log files that happened to be there.

 That's all well and good.  My issue is *NOT* performance.  Or capacity, for
 that matter.  I'm not saying that there is no value to compression.  I'm
 saying that my objective for a backup server is FIRST to be as simple and
 reliable as possible, and THEN only to have other features.  Features that
 detract from that first requirement are considered skeptically.


Media fails.  Things that reduce the media necessary to hold a given amount
of data reduce the chances of failure.  The CPU and RAM can fail too, but
if those go you are fried whether you were compressing or not.



 This entire thread is a *PERFECT* example of why I have my reasons.  I have
 avoided an entire category of failure simply by throwing more disk at it (or
 by having a smaller window of backups).  Seeing as I have, at a minimum, 4
 months of data (with varying gaps between the backups) within the backup
 server itself, and archive data in long-term storage every three months, I
 have what I (and my clients) feel to be enough data.  Extra capacity would
 have no value.  Extra reliability *always* has value.

 YMMV, of course.


With compressible data you increase both capacity and reliability by
compressing before storage.   There's no magical difference between the
reliability of 'cat' vs 'zcat'.  Either one could fail.

--
   Les Mikesell
 lesmikes...@gmail.com


Re: [BackupPC-users] Bad md5sums due to zero size (uncompressed) cpool files - WEIRD BUG

2011-10-06 Thread Arnold Krille
On Thursday 06 October 2011 20:04:57 Timothy J Massey wrote:
 Les Mikesell lesmikes...@gmail.com wrote on 10/06/2011 01:21:29 PM:
  On Thu, Oct 6, 2011 at 11:56 AM, Timothy J Massey tmas...@obscorp.com
 
 wrote:
  Personally, I feel that compression has no place in backups.  Back
  when we were highly limited in capacity by terrible analog devices
  (i.e. tape!) I used it from necessity.  Now, I just throw bigger
  hard drives at it and am thankful.  :)
  
  No, it makes perfect sense for backuppc where the point is to keep
  as much history as possible online in a given space.
 
 No, the point of backup is to be able to *restore* as much historical data
 as possible.  Keeping the data is not the important part.  Restoring it
 is.  Anything that is between storing data and *restoring* that data is in
 the way of that job.

Actually the point of a backup is to restore the most recent version of 
something from just before the trouble (whatever that might be).

Storing or restoring historical data is called an archive. Interestingly, most 
commercial archive solutions advertise their (certified) long-term archiving but 
never the ability to get that data back. Makes you wonder...

Have fun,

Arnold




Re: [BackupPC-users] Bad md5sums due to zero size (uncompressed) cpool files - WEIRD BUG

2011-10-06 Thread Holger Parplies
Hi,

Les Mikesell wrote on 2011-10-06 13:42:09 -0500 [Re: [BackupPC-users] Bad 
md5sums due to zero size (uncompressed) cpool files - WEIRD BUG]:
 On Thu, Oct 6, 2011 at 1:04 PM, Timothy J Massey tmas...@obscorp.com wrote:
 
  No, the point of backup is to be able to *restore* as much historical data
  as possible.  Keeping the data is not the important part.  Restoring it is.
   Anything that is between storing data and *restoring* that data is in the
  way of that job.
 [...]
 My experience is that the failures are more likely in the parts underneath
 storing the data than in the compression process.   Admittedly, that goes
 all the way back to storing zip files on floppies vs. large uncompressed
 text files and media reliability has improved a bit.
 [...]
 Media fails.  Things that reduce the media necessary to hold a given amount
 of data reduce the chances of failure.  The CPU and RAM can fail too, but
 if those go you are fried whether you were compressing or not.
 [...]
 With compressible data you increase both capacity and reliability by
 compressing before storage.   There's no magical difference between the
 reliability of 'cat' vs 'zcat'.  Either one could fail.

the problem, I believe, is not 'cat' or 'zcat' failing, it's a *media* error,
as you pointed out, rendering a complete compressed file unusable instead of
only the erroneous bytes/sectors. Yes, there are compression algorithms that
are able to recover after an error, but I don't think BackupPC uses any of
these.

Sure, the common case might be losing a complete disk rather than having a few
bytes altered, but in that case, you can either recover from the remaining
disks (presuming you have some form of redundancy), or you lose your complete
pool, whether or not compressed.

While you might reduce the chances of failure with compression, you increase
the impact of failure.

  This entire thread is a *PERFECT* example of why I have my reasons.

I agree with your reasons, but it remains to be seen whether compression makes
any difference in the context of this thread. But I'll reply to that
separately.

Regards,
Holger



Re: [BackupPC-users] Bad md5sums due to zero size (uncompressed) cpool files - WEIRD BUG

2011-10-06 Thread Holger Parplies
Hi,

Timothy J Massey wrote on 2011-10-06 12:56:42 -0400 [Re: [BackupPC-users] Bad 
md5sums due to zero size (uncompressed) cpool files - WEIRD BUG]:
 Holger Parplies wrote on 10/06/2011 11:54:05 AM:
 
  If you have switched off compression, you'll have to add a '-u' (though
  I'm not sure this test makes much sense in that case).
 
 Well, then, it won't make much sense in *my* case:  I missed that this is 
 unique to compressed pools.  (Is it?)

I don't really think so, but an empty *uncompressed* file is trivial to find
with find, presuming you can get yourself to accept its syntax ;-).
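
For the find-averse, roughly the same check as a short Perl sketch (the pool
path below is just an example):

  # Sketch: list zero-byte files under the (uncompressed) pool,
  # i.e. what a 'find ... -type f -size 0' would report.
  use strict;
  use warnings;
  use File::Find;

  my $pool = '/var/lib/backuppc/pool';   # example location of $TopDir/pool
  find(sub { print "$File::Find::name\n" if -f && !-s _ }, $pool);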

I don't think BackupPC distinguishes much between compressed and uncompressed
files (it's just a parameter to most methods). Of course, the bug could be in
the code that finally handles compressed files, but my gut feeling is that it's
in the pooling code (presuming there actually *is* a bug, that is).

 Personally, I feel that compression has no place in backups.  Back when we 
 were highly limited in capacity by terrible analog devices (i.e. tape!) I 
 used it from necessity.  Now, I just throw bigger hard drives at it and am 
 thankful.  :)

Don't you feel uncomfortable about deduplication, too, then? After all, it
introduces a single point of failure for common data. If you can't get back
your file from the most recent backup, because it has somehow been corrupted,
there's not much chance to get the same content from any other backup. In
other words, deduplication *is* a form of compression ;-).

Regards,
Holger



Re: [BackupPC-users] Bad md5sums due to zero size (uncompressed) cpool files - WEIRD BUG

2011-10-06 Thread Les Mikesell
On Thu, Oct 6, 2011 at 5:21 PM, Arnold Krille arn...@arnoldarts.de wrote:

  No, it makes perfect sense for backuppc where the point is to keep
  as much history as possible online in a given space.

 No, the point of backup is to be able to *restore* as much historical data
 as possible.  Keeping the data is not the important part.  Restoring it
 is.  Anything that is between storing data and *restoring* that data is in
 the way of that job.

 Actually the point of a backup is to restore the most recent version of
 something from just before the trouble (whatever that might be).

Yes, but throw in the fact that it may take some unpredictable amount
of time after the 'trouble' (which could have been accidentally
deleting a rarely used file) before anyone notices and you see why you
need some history available to restore from the version just before
the trouble.

-- 
  Les Mikesell
lesmikes...@gmail.com



Re: [BackupPC-users] Bad md5sums due to zero size (uncompressed) cpool files - WEIRD BUG

2011-10-06 Thread Les Mikesell
On Thu, Oct 6, 2011 at 5:42 PM, Holger Parplies wb...@parplies.de wrote:

 [...]
 With compressible data you increase both capacity and reliability by
 compressing before storage.   There's no magical difference between the
 reliability of 'cat' vs 'zcat'.  Either one could fail.

 the problem, I believe, is not 'cat' or 'zcat' failing, it's a *media* error,
 as you pointed out, rendering a complete compressed file unusable instead of
 only the erroneous bytes/sectors. Yes, there are compression algorithms that
 are able to recover after an error, but I don't think BackupPC uses any of
 these.

 Sure, the common case might be losing a complete disk rather than having a few
 bytes altered, but in that case, you can either recover from the remaining
 disks (presuming you have some form of redundancy), or you lose your complete
 pool, whether or not compressed.

I like RAID1 where you can recover from any single surviving disk.

 While you might reduce the chances of failure with compression, you increase
 the impact of failure.

Maybe, maybe not.   You might find something usable if you scrape some
plain text or maybe even part of a tar file off a disk past a media
error - which is pretty hard to do anyway - but most other file types
won't have much chance of working.

-- 
   Les Mikesell
lesmikes...@gmail.com



Re: [BackupPC-users] Bad md5sums due to zero size (uncompressed) cpool files - WEIRD BUG

2011-10-06 Thread Jeffrey J. Kosowsky
Holger Parplies wrote at about 17:54:05 +0200 on Thursday, October 6, 2011:
  Hi,
  
  Tim Fletcher wrote on 2011-10-06 10:17:03 +0100 [Re: [BackupPC-users] Bad 
  md5sums due to zero size (uncompressed) cpool files - WEIRD BUG]:
   On Wed, 2011-10-05 at 21:35 -0400, Jeffrey J. Kosowsky wrote:
Finally, remember it's possible that many people are having this
problem but just don't know it,
  
  perfectly possible. I was just saying what possible cause came to my mind 
  (and
  many people *could* be running with an almost full disk). As you (Jeffrey)
  said, the fact that the errors appeared only within a small time frame may or
  may not be significant. I guess I don't need to ask whether you are *sure*
  that the disk wasn't almost full back then.

Disk was *less* full then...

  To be honest, I would *hope* that only you had these issues and everyone
  else's backups are fine, i.e. that your hardware and not the BackupPC 
  software
  was the trigger (though it would probably need some sort of software bug to
  come up with the exact symptoms).
  
since the only way one would know would be if one actually computed the
    partial file md5sums of all the pool files and/or restored & tested one's
backups.
  
  Almost.
  
Since the error affects only 71 out of 1.1 million files it's possible
that no one has ever noticed...
  
  Well, let's think about that for a moment. We *have* had multiple issues that
  *sounded* like corrupt attrib files. What would happen, if you had an attrib
   file that decompresses to nothing in the reference backup?
  
It would be interesting if other people would run a test on their
pools to see if they have similar such issues (remember I only tested
my pool in response to the recent thread of the guy who was having
issues with his pool)...
   
   Do you have a script or series of commands to do this check with?
  
  Actually, what I would propose in response to what you have found would be to
  test for pool files that decompress to zero length. That should be
  computationally less expensive than computing hashes - in particular, you can
  stop decompressing once you have decompressed any content at all.

Actually this could be made even faster since there seem to be 2
cases:
1. Files of length 8 bytes with first byte = 78 [no rsync checksums]
2. Files of length 57 bytes with first byte = d7 [rsync checksums]

So, all you need to do is to stat the size and then test the
first byte.
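
Something along these lines should do it (an untested sketch; the cpool path
is an example, and anything it flags is only a candidate that still deserves
a look with BackupPC_zcat):

  # Sketch: flag candidate "empty" cpool files without decompressing them:
  #   size  8 bytes, first byte 0x78  -> empty, no rsync checksums
  #   size 57 bytes, first byte 0xd7  -> empty, with rsync checksums
  use strict;
  use warnings;
  use File::Find;

  my $cpool = '/var/lib/backuppc/cpool';   # example location
  find(sub {
      return unless -f;
      my $size = -s _;
      return unless $size == 8 || $size == 57;
      open my $fh, '<', $_ or return;
      binmode $fh;
      read($fh, my $byte, 1) == 1 or return;
      print "$File::Find::name ($size bytes)\n"
          if ($size == 8  && $byte eq "\x78")
          || ($size == 57 && $byte eq "\xd7");
  }, $cpool);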



Re: [BackupPC-users] Bad md5sums due to zero size (uncompressed) cpool files - WEIRD BUG

2011-10-06 Thread Holger Parplies
Hi,

Les Mikesell wrote on 2011-10-06 18:17:06 -0500 [Re: [BackupPC-users] Bad 
md5sums due to zero size (uncompressed) cpool files - WEIRD BUG]:
 On Thu, Oct 6, 2011 at 5:21 PM, Arnold Krille arn...@arnoldarts.de wrote:
 
   No, it makes perfect sense for backuppc where the point is to keep
   as much history as possible online in a given space.
 
  No, the point of backup is to be able to *restore* as much historical data
  as possible.  Keeping the data is not the important part.  Restoring it
  is.  Anything that is between storing data and *restoring* that data is in
  the way of that job.
 
  Actually the point of a backup is to restore the most recent version of
  something from just before the trouble (whatever that might be).
 
 Yes, but throw in the fact that it may take some unpredictable amount
 of time after the 'trouble' (which could have been accidentally
 deleting a rarely used file) before anyone notices and you see why you
 need some history available to restore from the version just before
 the trouble.

I think you've all got it wrong. The real *point* of a backup is ...
whatever the person doing the backup wants it for. For some people that
might just be being able to say, hey, we did all we could to preserve the
data as long as legally required - too bad it didn't work out. Usually, it
seems to be sufficient that the data is stored, but some of us really *do*
want to be able to *restore* it, too, while others are doing backups mainly
for watching the progress. Fine. That's what the flexibility of BackupPC is
for, right?

Regards,
Holger



Re: [BackupPC-users] Bad md5sums due to zero size (uncompressed) cpool files - WEIRD BUG

2011-10-06 Thread John Rouillard
On Thu, Oct 06, 2011 at 05:54:05PM +0200, Holger Parplies wrote:
 Try something like
 
   BackupPC_verifyPool -s -p
 
 to scan the whole pool, or
 
   BackupPC_verifyPool -s -p -r 0
 
 to test it on the 0/0/0 - 0/0/f pool subdirectories (-r takes a Perl
  expression evaluating to an array of numbers between 0 and 255, e.g. '0',
  '0 .. 255' (the default), or '0, 1, 10 .. 15, 5'; note the quotes to make your
 shell pass it as a single argument). If you have switched off compression,
 you'll have to add a '-u' (though I'm not sure this test makes much sense in
 that case). You'll want either '-p' (progress) or '-v' (verbose) to see
 anything happening. It *will* take time to traverse the pool, but you can
 safely interrupt the script at any time and use the range parameter to resume
 it later (though not at the exact place) - or just suspend and resume it (^Z).
 
 You might need to change the 'use lib' statement in line 64 to match your
 distribution.

I ran this with -r 0 and got as a summary:

  39000 files in 16 directories checked, 4 had wrong digests, of these 0
  zero-length.

running it with -r 1,2 now.

-- 
-- rouilj

John Rouillard   System Administrator
Renesys Corporation  603-244-9084 (cell)  603-643-9300 x 111



Re: [BackupPC-users] Bad md5sums due to zero size (uncompressed) cpool files - WEIRD BUG

2011-10-06 Thread Holger Parplies
Hi,

Jeffrey J. Kosowsky wrote on 2011-10-06 19:28:38 -0400 [Re: [BackupPC-users] 
Bad md5sums due to zero size (uncompressed) cpool files - WEIRD BUG]:
 Holger Parplies wrote at about 17:54:05 +0200 on Thursday, October 6, 2011:
 [...]
   Actually, what I would propose [...] would be to
   test for pool files that decompress to zero length. [...]
 
 Actually this could be made even faster since there seem to be 2
 cases:
 1. Files of length 8 bytes with first byte = 78 [no rsync checksums]
 2. Files of length 57 bytes with first byte = d7 [rsync checksums]
 
 So, all you need to do is to stat the size and then test the
 first-byte

I'm surprised that that isn't faster by orders of magnitude. Running both
BackupPC_verifyPool and the modified version which does exactly this in
parallel, it's only about 3 times as fast (faster, though, when traversing
directories currently in cache). An additionally running 'find' does report
some 57-byte files, but they don't seem to decompress to nothing. Let's see how
this continues. I still haven't found a single zero-length file in my pool
so far (BackupPC_verifyPool at 3/6/*, above check at 2/0/*).

Regards,
Holger



Re: [BackupPC-users] Bad md5sums due to zero size (uncompressed) cpool files - WEIRD BUG

2011-10-06 Thread Timothy J Massey
Holger Parplies wb...@parplies.de wrote on 10/06/2011 06:58:06 PM:

 Don't you feel uncomfortable about deduplication, too, then? After all, 
it
 introduces a single point of failure for common data.

No.  Dedupe is merely a side effect of a filesystem.  Dedupe errors are no 
different than any of 1,000 other possible filesystem errors.  If I want 
to defend against a dedupe error, I can do so by *also* protecting against 
a filesystem error, too, and get dedupe safety for free.

Nearly all of my servers are virtualized today.  These servers are backed 
up two ways:  a file-level backup and a snapshot-based backup.  So I have 
redundancy that protects against a *LOT* of failures.  While it would be 
harder to get a single file from a snapshot backup, it's very doable.  So 
I have that redundancy.  Even if BackupPC were to decide to *maliciously* 
destroy my data, no problem:  my snapshots don't use BackupPC!  :)

(The big reason for the snapshot backups is that Windows systems are *way* 
easier to bare-metal restore from a snapshot.  But I get all the other 
advantages, too.)

 If you can't get back
 your file from the most recent backup, because it has somehow been 
corrupted,
 there's not much chance to get the same content from any other backup. 
In
 other words, deduplication *is* a form of compression ;-).

That's true.  But I would consider that accidental (or maybe incidental) 
redundancy.  It's still all on the same disk, with the same filesystem.  I 
consider dedupe a problem to defend against, but I also consider 
filesystem (or disk!) failure a problem, too.  I protect against *all* of 
them, but not necessarily each individually.

Redundancy is a good thing.

(While we're on the subject, I've considered before Les' argument that 
compressed files take less space on the disk and are therefore less likely 
to be corrupted.  It's true, but like dedupe errors, it's just *one* 
possible failure--and to me, not a very likely one.  It's not one worth 
defending against by *itself*.  Having uncompressed files makes, e.g., 
scanning a badly scrambled filesystem for salvageable data *much* easier. 
When it comes to backup, I will almost *always* choose simple over fancy, 
even if fancy gives me other advantages but not additional safety.)

Timothy J. Massey

 
Out of the Box Solutions, Inc. 
Creative IT Solutions Made Simple!
http://www.OutOfTheBoxSolutions.com
tmas...@obscorp.com 
 
22108 Harper Ave.
St. Clair Shores, MI 48080
Office: (800)750-4OBS (4627)
Cell: (586)945-8796 


Re: [BackupPC-users] Bad md5sums due to zero size (uncompressed) cpool files - WEIRD BUG

2011-10-06 Thread Les Mikesell
On Thu, Oct 6, 2011 at 8:06 PM, Timothy J Massey tmas...@obscorp.com wrote:

 Redundancy is a good thing.

 (While we're on the subject, I've considered Les' argument that compressed
 files take less space on the disk and are therefore less likely to be
 corrupted before.  It's true, but like dedupe errors, it's just *one*
 possible failure--and to me, not a very likely one.  It's not one worth
 defending against by *itself*.  Having uncompressed files makes, e.g.,
 scanning a badly scrambled filesystem for salvagable data *much* easier.


I've seen orders of magnitude more media errors than filesystem errors.  In
fact I can barely recall a filesystem error that was a big problem for fsck
to fix - well except for one case that was really caused by bad RAM where
the file contents would also have been randomly bad.

 When it comes to backup, I will almost *always* choose simple over fancy,
 even if fancy gives me other advantages but not additional safety.)


Simple to me means that the result fits on one disk which I can raid-mirror,
split, and keep several extra snapshot copies. Compression makes that a lot
easier.  And I'd probably go back to an earlier copy instead of groveling
through the live one in the unlikely scenario that the filesystem fails.
Plus, it covers the case of a building disaster with one of the copies
offsite.

-- 
  Les Mikesell
 lesmikes...@gmail.com


Re: [BackupPC-users] Bad md5sums due to zero size (uncompressed) cpool files - WEIRD BUG

2011-10-06 Thread Jeffrey J. Kosowsky
Holger Parplies wrote at about 02:45:56 +0200 on Friday, October 7, 2011:
  Hi,
  
  Jeffrey J. Kosowsky wrote on 2011-10-06 19:28:38 -0400 [Re: [BackupPC-users] 
  Bad md5sums due to zero size (uncompressed) cpool files - WEIRD BUG]:
   Holger Parplies wrote at about 17:54:05 +0200 on Thursday, October 6, 2011:
   [...]
 Actually, what I would propose [...] would be to
 test for pool files that decompress to zero length. [...]
   
   Actually this could be made even faster since there seem to be 2
   cases:
   1. Files of length 8 bytes with first byte = 78 [no rsync checksums]
   2. Files of length 57 bytes with first byte = d7 [rsync checksums]
   
   So, all you need to do is to stat the size and then test the
   first-byte
  
  I'm surprised that that isn't faster by orders of magnitude. Running both
  BackupPC_verifyPool and the modified version which does exactly this in
  parallel, it's only about 3 times as fast (faster, though, when traversing
  directories currently in cache). An additionally running 'find' does report
  some 57-byte files, but they don't seem to decompress to zero length. Let's see how
  this continues. I still haven't found a single zero-length file in my pool
  so far (BackupPC_verifyPool at 3/6/*, above check at 2/0/*).
  

Do those 57 byte files have rsync checksums or are they just
compressed files that happen to be 57 bytes long?

Given that the rsync checksums have both block and file checksums,
it's hard to believe that a 57 byte file including rsync checksums
would have much if any data. Even with no blocks of data, you have:
- 0xb3 separator (1 byte)
- File digest which is 2 copies of the full 16 byte MD4 digest (32 bytes)
- Digest info consisting of block size, checksum seed, length of the block 
digest and the magic number (16 bytes)

The above total 49 bytes which is exactly the delta between a 57 byte
empty compressed file with rsync checksums and an 8 byte empty
compressed file without rsync checksums. The common 8 bytes is
presumably the zlib header (which I think is 2 bytes) and the trailer
which would then be 6 bytes.

Note: If you have any data, then you would have 20 bytes (consisting of a 4-byte
Adler32 and a 16-byte MD4 digest) for each block of data.
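
For illustration, here is a back-of-the-envelope Perl sketch of those numbers
(the byte counts are the ones stated above; I haven't verified them against the
BackupPC source):

#!/usr/bin/perl
# Back-of-the-envelope arithmetic for the sizes discussed above.
# Byte counts are the ones stated in this thread, not verified against
# the BackupPC source.
use strict;
use warnings;

my $zlib_header  = 2;    # zlib stream header (estimate)
my $zlib_trailer = 6;    # remainder of the 8-byte empty compressed file
my $separator    = 1;    # 0xb3 separator byte
my $file_digest  = 32;   # two copies of the 16-byte MD4 full-file digest
my $digest_info  = 16;   # block size, checksum seed, block-digest length, magic

my $empty_plain  = $zlib_header + $zlib_trailer;                            # 8
my $empty_cached = $empty_plain + $separator + $file_digest + $digest_info; # 57
my $per_block    = 4 + 16;   # Adler32 + MD4 per block of data

printf "empty, no rsync checksums:   %2d bytes\n", $empty_plain;
printf "empty, with rsync checksums: %2d bytes\n", $empty_cached;
printf "each data block adds:        %2d bytes\n", $per_block;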




Re: [BackupPC-users] Bad md5sums due to zero size (uncompressed) cpool files - WEIRD BUG

2011-10-06 Thread Holger Parplies
Hi,

Jeffrey J. Kosowsky wrote on 2011-10-06 22:09:52 -0400 [Re: [BackupPC-users] 
Bad md5sums due to zero size (uncompressed) cpool files - WEIRD BUG]:
 Holger Parplies wrote at about 02:45:56 +0200 on Friday, October 7, 2011:
   Jeffrey J. Kosowsky wrote on 2011-10-06 19:28:38 -0400 [Re: 
 [BackupPC-users] Bad md5sums due to zero size (uncompressed)?cpool files - 
 WEIRD BUG]:

Actually this could be made even faster since there seem to be 2
cases:
1. Files of length 8 bytes with first byte = 78 [no rsync checksums]
2. Files of length 57 bytes with first byte = d7 [rsync checksums]

So, all you need to do is to stat the size and then test the
first-byte
   [...]
   An additionally running 'find' does report some 57-byte files, but they
   don't seem to decompress to zero length.
 
 Do those 57 byte files have rsync checksums or are they just
 compressed files that happen to be 57 bytes long?

well, I implemented your suggestion quoted above to determine whether they
would decompress empty, so they are just compressed files that happen to be 57
bytes long. Actually, I included a debug option to output 57 byte files
with a first byte \x78, and they seem to show up there (I didn't check if all
do, I was only interested in checking my implementation).

My point was that 'find cpool \( -size 8c -o -size 57c \)' does show quite a
number of hits - so many that it's hard to see whether there are any 8-byte
files in between - and the 57-byte ones are pointless, because you'd have to
individually determine whether they are just compressed files that happen to
be 57 bytes long or empty compressed files with checksums. I simply hadn't
expected that. 'find cpool -size 8c' should still be useful as a lightweight
check for your first case.
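
For anyone who wants to script that, here is a rough sketch of the stat-size
plus first-byte test (plain File::Find, not derived from BackupPC_verifyPool;
57-byte hits would still need decompressing, e.g. with BackupPC_zcat, to
confirm that they really are empty files with checksums):

#!/usr/bin/perl
# Rough sketch of the stat-size / first-byte test suggested above.
# Flags the two candidate shapes:
#    8 bytes, first byte 0x78 -> possibly an empty file, no rsync checksums
#   57 bytes, first byte 0xd7 -> possibly an empty file with rsync checksums
# 57-byte hits still need decompressing (e.g. with BackupPC_zcat) to rule
# out ordinary compressed files that just happen to be 57 bytes long.
use strict;
use warnings;
use File::Find;

my $cpool = shift @ARGV or die "usage: $0 /path/to/cpool\n";

find(sub {
    return unless -f $_;
    my $size = -s _;
    return unless $size == 8 || $size == 57;
    open my $fh, '<', $_ or do { warn "can't open $File::Find::name: $!\n"; return };
    binmode $fh;
    my $n = read($fh, my $first, 1);
    close $fh;
    return unless defined $n && $n == 1;
    my $byte = ord $first;
    print "$File::Find::name: 8-byte candidate (0x78)\n"  if $size == 8  && $byte == 0x78;
    print "$File::Find::name: 57-byte candidate (0xd7)\n" if $size == 57 && $byte == 0xd7;
}, $cpool);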

 Given that the rsync checksums have both block and file checksums,
 it's hard to believe that a 57 byte file including rsync checksums
 would have much if any data.

I thought you were positive that it can't. Actually, your reasoning seems to
say that it can't, but what about an 8-byte file without checksums? There's
not much point in looking for 8-byte files with a \x78 if it's uncertain that
they're really empty - at least we'd need to decompress to check.

Regards,
Holger



Re: [BackupPC-users] Bad md5sums due to zero size (uncompressed) cpool files - WEIRD BUG

2011-10-06 Thread Jeffrey J. Kosowsky
Jeffrey J. Kosowsky wrote at about 18:58:51 -0400 on Tuesday, October 4, 2011:
  After the recent thread on bad md5sum file names, I ran a check on all
  my 1.1 million cpool files to check whether the md5sum file names are
  correct.
  
  I got a total of 71 errors out of 1.1 million files:
  - 3 had data in it (though each file was only a few hundred bytes
long)
  
  - 68 of the 71 were *zero* sized when decompressed
29 were 8 bytes long corresponding to zlib compression of a zero
length file
  
39 were 57 bytes long corresponding to a zero length file with an
rsync checksum
  
  Each such cpool file has anywhere from 2 to several thousand links
  
  The 68 *zero* length files should *not* be in the pool since zero
  length files are not pooled. So, something is really messed up here.
  
  It turns out though that none of those zero-length decompressed cpool
  files were originally zero length but somehow they were stored in the
  pool as zero length with an md5sum that is correct for the original
  non-zero length file.
  
  Some are attrib files and some are regular files.
  
  Now it seems unlikely that the files were corrupted after the backups
  were completed since the header and trailers are correct and there is
  no way that the filesystem would just happen to zero out the data
  while leaving the header and trailers intact (including checksums).
  
  Also, it's not the rsync checksum caching causing the problem since
  some of the zero length files are without checksums.
  
  Now the fact that the md5sum file names are correct relative to the
  original data means that the file was originally read correctly by
  BackupPC..
  
  So it seems that for some reason the data was truncated when
  compressing and writing the cpool/pc file but after the partial file
  md5sum was calculated. And it seems to have happened multiple times
  for some of these files since there are multiple pc files linked to
  the same pool file (and before linking to a cpool file, the actual
  content of the files are compared since the partial file md5sum is not
  unique).
  
  Also, on my latest full backup a spot check shows that the files are
  backed up correctly to the right non-zero length cpool file which of
  course has the same (now correct) partial file md5sum. Though as you
  would expect, that cpool file has a _0 suffix since the earlier zero
  length is already stored (incorrectly) as the base of the chain.
  
  I am not sure what is going on with the other 3 files since I have yet
  to find them in the pc tree (my 'find' routine is still running)
  
  I will continue to investigate this but this is very strange and
  worrying since truncated cpool files means data loss!
  
  In summary, what could possibly cause BackupPC to truncate the data
  sometime between reading the file/calculating the partial file md5sum
  and compressing/writing the file to the cpool?
  

OK... this is a little weird maybe...

I looked at one file which is messed up:
 /f%2f/fusr/fshare/fFlightGear/fTimezone/fAmerica/fPort-au-Prince

On all (saved) backups, up to backup 82, the file (and the
corresponding cpool file e/f/0/ef0bd9db744f651b9640ea170b07225a) is
zero length decompressed.

My next saved backup is #110 which is non-zero length and has the
correct contents. This is true for all subsequent saved backups. The
corresponding pool file is as might be expected:
e/f/0/ef0bd9db744f651b9640ea170b07225a_0
which makes sense: PoolWrite.pm sees that while the partial file md5sum
matches the existing chain root, the contents differ (since the root is
empty), so it creates a new pool file with the same stem but with index 0.
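
To spell out the chain logic I mean, here is a very simplified sketch
(definitely not the actual PoolWrite.pm code; the same_contents() helper is a
stand-in that compares raw bytes, whereas the real code compares the
decompressed contents):

#!/usr/bin/perl
# Very simplified sketch of the pool-chain rule described above; this is
# NOT PoolWrite.pm, just an illustration of why a corrected file ends up
# at <digest>_0 when the (empty) base file already occupies the stem.
use strict;
use warnings;

# Stand-in comparison: compares raw bytes. The real code would compare
# the *decompressed* contents of the candidate pool file.
sub same_contents {
    my ($pool_file, $new_content) = @_;
    open my $fh, '<', $pool_file or return 0;
    binmode $fh;
    my $old = do { local $/; <$fh> };
    close $fh;
    return defined $old && $old eq $new_content;
}

# Return the pool file to link against ('pool') or to create ('create').
sub choose_pool_file {
    my ($pool_dir, $digest, $new_content) = @_;
    my @chain = ("$pool_dir/$digest", map { "$pool_dir/${digest}_$_" } 0 .. 999);
    for my $file (@chain) {
        return ($file, 'create') unless -e $file;   # end of chain: new member
        return ($file, 'pool')   if same_contents($file, $new_content);
    }
    die "pool chain for $digest unexpectedly long\n";
}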

Note the original file itself was unchanged between #82 and #110.

BUT WHAT IS INTERESTING is that both pool files have the same
modification time of: 2011-04-27 03:05
which according to the logs is during the time at which backup #110
was backing up the relevant share.

I don't understand this: why would backup #110 change the mod time of
the root file, which was created during an earlier backup?

Why would PoolWrite.pm change the mod time of a pool file that is not
in the actual backup?

Could it be that this backup somehow destroyed the data in the file?
(but even so, what would cause this to happen)

Also, the XferLOG entries for both backups #82 and #110 have the line:
 pool 644   0/0 252 
usr/share/FlightGear/Timezone/America/Port-au-Prince

But this doesn't make sense since if the new pool file was created as
part of backup #110, shouldn't it say 'create' and not 'pool'?

None of this makes sense to me but somehow I suspect that herein may be a
clue to the problem...


Re: [BackupPC-users] Bad md5sums due to zero size (uncompressed) cpool files - WEIRD BUG

2011-10-06 Thread Holger Parplies
Hi,

Jeffrey J. Kosowsky wrote on 2011-10-06 22:54:44 -0400 [Re: [BackupPC-users] 
Bad md5sums due to zero size (uncompressed) cpool?files - WEIRD BUG]:
 OK... this is a little weird maybe...
 [...]
 On all (saved) backups, up to backup 82, the file (and the
 corresponding cpool file e/f/0/ef0bd9db744f651b9640ea170b07225a) is
 zero length decompressed.
 
 My next saved backup is #110 which is non-zero length and has the
 correct contents. This is true for all subsequent saved backups.
 [...]
 BUT WHAT IS INTERESTING is that both pool files have the same
 modification time of: 2011-04-27 03:05
 which according to the logs is during the time at which backup #110
 was backing up the relevant share.

you'll hate me asking this, but: do any of your repair scripts touch the
modification time?

Also: can you give a better resolution on the mod times, i.e. which one is
older?

 Why would PoolWrite.pm change the mod time of a pool file that is not
 in the actual backup?

PoolWrite normally wouldn't, unless something is going wrong somewhere (and it
probably wouldn't use utime() but rather open the file for writing).

This is an rsync backup, right?

 Could it be that this backup somehow destroyed the data in the file?
 (but even so, what would cause this to happen)

Hmm, let's see ... a bug?

 Also, the XferLOG entry for both backups #82 and #110 have the line:
  pool 644   0/0 252 
 usr/share/FlightGear/Timezone/America/Port-au-Prince
 
 But this doesn't make sense since if the new pool file was created as
 part of backup #110, shouldn't it say 'create' and not 'pool'?

Considering the mtime, yes. If it's rsync, an *identical* file should be
'same', if it's tar, an identical file would be 'pool'. Could this be an
indication that it wasn't BackupPC that clobbered the file?

 None of this makes sense to me but somehow I suspect that herein may be a
 clue to the problem...

Xfer::Rsync opening the reference file?

So far, *none* of my 1.1 million pool files (875000 checked so far) seem to be
empty. I'm using a different BackupPC version (2.1.2), but others seem to be
checking their pools, too. I'd still like to know whether this has happened
*anywhere* else, and if yes, what those BackupPC setups have in common with
yours.

Regards,
Holger

P.S.: If anyone wants a copy of the quick pool check - I've named it
  BackupPC_jeffrey for lack of a better idea - please ask. It's mainly
  a modified copy of BackupPC_verifyPool, hacked together in a few
  minutes. I *have* used open FILE, '<', ..., so I'm fairly sure I'm
  not clobbering all 57-byte files in the pool, but I can't say I did
  much testing, and I was already tired when writing it, so you might
  prefer to wait a day ;-). I might even add IO::Dirent support ...



Re: [BackupPC-users] Bad md5sums due to zero size (uncompressed) cpool files - WEIRD BUG

2011-10-06 Thread Holger Parplies
Hi,

I wrote on 2011-10-07 05:46:36 +0200 [Re: [BackupPC-users] Bad md5sums due to 
zero size (uncompressed) cpool files - WEIRD BUG]:
 [...]
 So far, *none* of my 1.1 million pool files (875000 checked so far) seem to be
 empty.

1148398 files in 4096 directories checked, 0 zero-length with, 0 without
checksums.

(according to your suggestions - 8 byte \x78 and 57 byte \xd7). Check took
about 4 hours including interruptions (suspended for about 45 minutes for
BackupPC_nightly, among others).

Regards,
Holger



Re: [BackupPC-users] Bad md5sums due to zero size (uncompressed) cpool files - WEIRD BUG

2011-10-06 Thread Jeffrey J. Kosowsky
Holger Parplies wrote at about 05:46:36 +0200 on Friday, October 7, 2011:
  Hi,
  
  Jeffrey J. Kosowsky wrote on 2011-10-06 22:54:44 -0400 [Re: [BackupPC-users] 
  Bad md5sums due to zero size (uncompressed) cpool?files - WEIRD BUG]:
   OK... this is a little weird maybe...
   [...]
   On all (saved) backups, up to backup 82, the file (and the
   corresponding cpool file e/f/0/ef0bd9db744f651b9640ea170b07225a) is
   zero length decompressed.
   
   My next saved backup is #110 which is non-zero length and has the
   correct contents. This is true for all subsequent saved backups.
   [...]
   BUT WHAT IS INTERESTING is that both pool files have the same
   modification time of: 2011-04-27 03:05
   which according to the logs is during the time at which backup #110
   was backing up the relevant share.
  
  you'll hate me asking this, but: do any of your repair scripts touch the
  modification time

None of them set the modification time (except the pool/pc copy script,
which sets the mod time to the original mod time). But I didn't run
the repair scripts during this time period, and both files were modified
*exactly* during the time of backup #110.

  
  Also: can you give a better resolution on the mod times, i.e. which one is
  older?

OK...
#82: Modify: 2011-04-27 03:05:04.551226502 -0400
#110: Modify: 2011-04-27 03:05:19.813321479 -0400

So #110 was modified 15 seconds after #82. Hmmm
Note both of those files have rsync checksums.

When I looked at a couple of files without the rsync checksums, the
mod times differed by a day.

As an aside, I noticed that when I looked at a version without the rsync
checksum, the corrected version also doesn't have an rsync checksum,
even after having been backed up many times subsequently. Now, I thought
that the rsync checksum should be added after the 2nd or 3rd time the
file is read... This makes me wonder whether there is potentially an
issue with the rsync checksum...

  
   Why would PoolWrite.pm change the mod time of a pool file that is not
   in the actual backup?
  
  PoolWrite normally wouldn't, unless something is going wrong somewhere (and 
  it
  probably wouldn't use utime() but rather open the file for writing).
  
  This is an rsync backup, right?

Yes...

   Could it be that this backup somehow destroyed the data in the file?
   (but even so, what would cause this to happen)
  
  Hmm, let's see ... a bug?

   Also, the XferLOG entry for both backups #82 and #110 have the line:
pool 644   0/0 252 
   usr/share/FlightGear/Timezone/America/Port-au-Prince
   
   But this doesn't make sense since if the new pool file was created as
   part of backup #110, shouldn't it say 'create' and not 'pool'?
  
  Considering the mtime, yes. If it's rsync, an *identical* file should be
  'same', if it's tar, an identical file would be 'pool'. Could this be an
  indication that it wasn't BackupPC that clobbered the file?

Well, it is 'rsync'...
And we know BackupPC *thinks* it's a new file since it creates a new
pool file chain member. But what clobbered the original file, and why,
just before then?

  
   None of this makes sense to me but somehow I suspect that herein may be a
   clue to the problem...
  
  Xfer::Rsync opening the reference file?

But what would cause it to truncate the data portion?

Maybe it's something with rsync checksum caching/seeding when it tries
to add a checksum? I'm just guessing here...




Re: [BackupPC-users] Bad md5sums due to zero size (uncompressed) cpool files - WEIRD BUG

2011-10-05 Thread Holger Parplies
Hi,

Jeffrey J. Kosowsky wrote on 2011-10-04 18:58:51 -0400 [[BackupPC-users] Bad 
md5sums due to zero size (uncompressed) cpool files - WEIRD BUG]:
 After the recent thread on bad md5sum file names, I ran a check on all
 my 1.1 million cpool files to check whether the md5sum file names are
 correct.
 
 I got a total of 71 errors out of 1.1 million files:
 [...]
 - 68 of the 71 were *zero* sized when decompressed
 [...]
 Each such cpool file has anywhere from 2 to several thousand links
 [...]
 It turns out though that none of those zero-length decompressed cpool
 files were originally zero length but somehow they were stored in the
 pool as zero length with an md5sum that is correct for the original
 non-zero length file.
 [...]
 Now it seems unlikely that the files were corrupted after the backups
 were completed since the header and trailers are correct and there is
 no way that the filesystem would just happen to zero out the data
 while leaving the header and trailers intact (including checksums).
 [...]
 Also, on my latest full backup a spot check shows that the files are
 backed up correctly to the right non-zero length cpool file which of
 course has the same (now correct) partial file md5sum. Though as you
 would expect, that cpool file has a _0 suffix since the earlier zero
 length is already stored (incorrectly) as the base of the chain.
 [...]
 In summary, what could possibly cause BackupPC to truncate the data
 sometime between reading the file/calculating the partial file md5sum
 and compressing/writing the file to the cpool?

the first and only thing that springs to my mind is a full disk. In some
situations, BackupPC needs to create a temporary file (RStmp, I think) to
reconstruct the remote file contents. This file can become quite large, I
suppose. Independent of that, I remember there is *at least* an incorrect
size fixup which needs to copy already written content to a different hash
chain (because the hash turns out to be incorrect *after*
transmission/compression). Without looking closely at the code, I could
imagine (but am not sure) that this could interact badly with a full disk:

* output file is already open, headers have been written
* huge RStmp file is written, filling up the disk
* received file contents are for some reason written to disk (which doesn't
  work - no space left) and read back for writing into the output file (giving
  zero-length contents)
* trailing information is written to the output file - this works, because
  there is enough space left in the already allocated block for the file
* RStmp file gets removed and the rest of the backup continues without
  apparent error

Actually, for the case I tried to invent above, this doesn't seem to fit, but
the general idea could apply - at least the symptoms match: correct content
stored somewhere but read back incorrectly. This would mean the result of a
write operation would have to go unchecked by BackupPC somewhere (or be handled
incorrectly).
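
Just to make the 'unchecked write' point concrete, here is a minimal sketch in
plain Perl (nothing BackupPC-specific): on a full filesystem, the failure only
shows up in the return values of print/syswrite and, for buffered output, of
close, so a writer that ignores those can silently leave behind a truncated or
empty file:

#!/usr/bin/perl
# Minimal illustration (not BackupPC code): on ENOSPC, print/syswrite and
# close are the only places the failure is reported. Ignore their return
# values and you can end up with a silently truncated or empty file.
use strict;
use warnings;

sub write_file_checked {
    my ($path, $data) = @_;
    open my $fh, '>', $path or die "open $path: $!\n";
    binmode $fh;
    print {$fh} $data       or die "write $path: $!\n";   # may fail with ENOSPC
    close $fh               or die "close $path: $!\n";   # buffered errors surface here
    return 1;
}

write_file_checked('/tmp/checked-write-demo.out', "some payload\n");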

So, the question is: have you been running BackupPC with an almost full disk?
Would there be at least one file in the backup set whose *uncompressed* size
is large in comparison to the reserved space (-> DfMaxUsagePct)?

For the moment, that's the most concrete thing I can think of. Of course,
writing to a temporary location might be fine and reading could fail (you
haven't modified your BackupPC code to use a signal handler for some arbitrary
purposes, have you? ;-). Or your Perl version could have an obscure bug that
occasionally trashes the contents of a string. Doesn't sound very likely,
though.

What *size* are the original files?

Ah, yes. How many backups are (or rather were) you running in parallel? No one
said the RStmp needs to be created by the affected backup ...

Regards,
Holger



Re: [BackupPC-users] Bad md5sums due to zero size (uncompressed) cpool files - WEIRD BUG

2011-10-05 Thread Jeffrey J. Kosowsky
Holger Parplies wrote at about 17:41:48 +0200 on Wednesday, October 5, 2011:
  Hi,
  
  Jeffrey J. Kosowsky wrote on 2011-10-04 18:58:51 -0400 [[BackupPC-users] Bad 
  md5sums due to zero size (uncompressed) cpool files - WEIRD BUG]:
   After the recent thread on bad md5sum file names, I ran a check on all
   my 1.1 million cpool files to check whether the md5sum file names are
   correct.
   
   I got a total of 71 errors out of 1.1 million files:
   [...]
   - 68 of the 71 were *zero* sized when decompressed
   [...]
   Each such cpool file has anywhere from 2 to several thousand links
   [...]
   It turns out though that none of those zero-length decompressed cpool
   files were originally zero length but somehow they were stored in the
   pool as zero length with an md5sum that is correct for the original
   non-zero length file.
   [...]
   Now it seems unlikely that the files were corrupted after the backups
   were completed since the header and trailers are correct and there is
   no way that the filesystem would just happen to zero out the data
   while leaving the header and trailers intact (including checksums).
   [...]
   Also, on my latest full backup a spot check shows that the files are
   backed up correctly to the right non-zero length cpool file which of
   course has the same (now correct) partial file md5sum. Though as you
   would expect, that cpool file has a _0 suffix since the earlier zero
   length is already stored (incorrectly) as the base of the chain.
   [...]
   In summary, what could possibly cause BackupPC to truncate the data
   sometime between reading the file/calculating the partial file md5sum
   and compressing/writing the file to the cpool?
  
  the first and only thing that springs to my mind is a full disk. In some
  situations, BackupPC needs to create a temporary file (RStmp, I think) to
  reconstruct the remote file contents. This file can become quite large, I
  suppose. Independent of that, I remember there is *at least* an incorrect
  size fixup which needs to copy already written content to a different hash
  chain (because the hash turns out to be incorrect *after*
  transmission/compression). Without looking closely at the code, I could
  imagine (but am not sure) that this could interact badly with a full disk:
  
  * output file is already open, headers have been written
  * huge RStmp file is written, filling up the disk
  * received file contents are for some reason written to disk (which doesn't
work - no space left) and read back for writing into the output file 
  (giving
zero-length contents)
  * trailing information is written to the output file - this works, because
there is enough space left in the already allocated block for the file
  * RStmp file gets removed and the rest of the backup continues without
apparent error
  
  Actually, for the case I tried to invent above, this doesn't seem to fit, but
  the general idea could apply - at least the symptoms are correct content
  stored somewhere but read back incorrectly. This would mean the result of a
  write operation would have to be unchecked by BackupPC somewhere (or handled
  incorrectly).
  
  So, the question is: have you been running BackupPC with an almost full disk?

Nope - disk has plenty of space...

  Would there be at least one file in the backup set whose *uncompressed*
  size is large in comparison to the reserved space (-> DfMaxUsagePct)?

Nothing large by today's standard - I don't backup any large databases
or video files.

  
  For the moment, that's the most concrete thing I can think of. Of course,
  writing to a temporary location might be fine and reading could fail (you
  haven't modified your BackupPC code to use a signal handler for some 
  arbitrary
  purposes, have you? ;-). Or your Perl version could have an obscure bug that
  occasionally trashes the contents of a string. Doesn't sound very likely,
  though.
  
  What *size* are the original files?

About half are attrib files of normal directories so they are quite
small. One I just checked was a kernel Documentation file of < 20K

  
  Ah, yes. How many backups are (or rather were) you running in parallel? No one
  said the RStmp needs to be created by the affected backup ...

I don't run more than 2-3 in parallel.
And again my disk is far from full (about 60% of a 250GB partition)
and the files with errors so far all seem to be small.

I do have the partition mounted over NFS but I'm now using an updated
kernel on both machines (kernel 2.6.32) so it's not the same buggy
stuff I had years ago with an old 2.6.12 kernel.

But still, I would think an NFS error would trash the entire file, not
just the data portion of a compressed file...

Looking at the timestamps of the bad pool files, the errors occurred in
the Feb-April time frame (note this pool was started in February) and
there have been no errors since then. But the errors are sprinkled
across ~10 different days during that time period

[BackupPC-users] Bad md5sums due to zero size (uncompressed) cpool files - WEIRD BUG

2011-10-04 Thread Jeffrey J. Kosowsky
After the recent thread on bad md5sum file names, I ran a check on all
my 1.1 million cpool files to check whether the md5sum file names are
correct.

I got a total of 71 errors out of 1.1 million files:
- 3 had data in it (though each file was only a few hundred bytes
  long)

- 68 of the 71 were *zero* sized when decompressed
 29 were 8 bytes long corresponding to zlib compression of a zero
 length file

 39 were 57 bytes long corresponding to a zero length file with an
 rsync checksum

Each such cpool file has anywhere from 2 to several thousand links

The 68 *zero* length files should *not* be in the pool since zero
length files are not pooled. So, something is really messed up here.

It turns out though that none of those zero-length decompressed cpool
files were originally zero length but somehow they were stored in the
pool as zero length with an md5sum that is correct for the original
non-zero length file.

Some are attrib files and some are regular files.

Now it seems unlikely that the files were corrupted after the backups
were completed since the header and trailers are correct and there is
no way that the filesystem would just happen to zero out the data
while leaving the header and trailers intact (including checksums).

Also, it's not the rsync checksum caching causing the problem since
some of the zero length files are without checksums.

Now the fact that the md5sum file names are correct relative to the
original data means that the file was originally read correctly by
BackupPC..

So it seems that for some reason the data was truncated when
compressing and writing the cpool/pc file but after the partial file
md5sum was calculated. And it seems to have happened multiple times
for some of these files since there are multiple pc files linked to
the same pool file (and before linking to a cpool file, the actual
content of the files are compared since the partial file md5sum is not
unique).

Also, on my latest full backup a spot check shows that the files are
backed up correctly to the right non-zero length cpool file which of
course has the same (now correct) partial file md5sum. Though as you
would expect, that cpool file has a _0 suffix since the earlier zero
length is already stored (incorrectly) as the base of the chain.

I am not sure what is going on with the other 3 files since I have yet
to find them in the pc tree (my 'find' routine is still running)

I will continue to investigate this but this is very strange and
worrying since truncated cpool files means data loss!

In summary, what could possibly cause BackupPC to truncate the data
sometime between reading the file/calculating the partial file md5sum
and compressing/writing the file to the cpool?
