Re: [BackupPC-users] Duplicate files in pool with same CHECKSUM and same CONTENTS

2008-10-31 Thread Jeffrey J. Kosowsky
John Rouillard wrote at about 20:13:15 + on Thursday, October 30, 2008:
 > On Thu, Oct 30, 2008 at 10:04:26AM -0400, Jeffrey J. Kosowsky wrote:
 > > Holger Parplies wrote at about 11:29:49 +0100 on Thursday, October 30, 
 > > 2008:
 > >  > Hi,
 > >  > 
 > >  > Jeffrey J. Kosowsky wrote on 2008-10-30 03:55:16 -0400 
 > > [[BackupPC-users] Duplicate files in pool with same CHECKSUM and same 
 > > CONTENTS]:
 > >  > > I have found a number of files in my pool that have the same checksum
 > >  > > (other than a trailing _0 or _1) and also the SAME CONTENT. Each copy
 > >  > > has a few links to it by the way.
 > >  > > 
 > >  > > Why is this happening? 
 > >  > 
 > >  > presumably creating a link sometimes fails, so BackupPC copies the file,
 > >  > assuming the hard link limit has been reached. I suspect problems with 
 > > your
 > >  > NFS server, though not a "stale NFS file handle" in this case,
 > >  > since copying the file succeeds. Strange.
 > > 
 > > Yes - I am beginning to think that may be true. However as I mentioned
 > > in the other thread, the syslog on the nfs server is clean and the one
 > > on the client shows only about a dozen or so nfs timeouts over the
 > > past 12 hours which is the time period I am looking at now. Otherwise,
 > > I don't see any nfs errors.
 > > So if it is a nfs problem, something seems to be happening somewhat
 > > randomly and invisibly to the filesystem.
 > 
 > IIRC you are using a soft nfs mount option right? If you are writing
 > to an NFS share that is not recommended. Try changing it to a hard
 > mount and see if the problem goes away. I only used soft mounts on
 > read only filesystems.
 > 
Unfortunately, this did not help. I assume the problem is somewhere in
the hardware or software.



Re: [BackupPC-users] Duplicate files in pool with same CHECKSUM and same CONTENTS

2008-10-30 Thread Holger Parplies
Hi,

[could we agree on a subject line without tabs? ;-]

Jeffrey J. Kosowsky wrote on 2008-10-30 20:31:15 -0400 [Re: [BackupPC-users] 
Duplicate files in pool with same CHECKSUM and same CONTENTS]:
> Jeffrey J. Kosowsky wrote at about 20:26:35 -0400 on Thursday, October 30, 
> 2008:
>  > It's really weird in that it seems to work the first time a directory
>  > is read but after a directory has been read a few times, it starts
>  > messing up. It's almost like the results are being stored in cache and
>  > then the cache is corrupted.
> 
> In fact, I have found two ways to assuredly allow me to read the
> directory again (at least for a few minutes or tries until it gets
> corrupted again):
> 1. Remount the nfs share
> 2. Read the directory directly on the server (without nfs)

bad memory on either client or server? Bug in the NFS implementation on client
or server? You said you built a kernel for the NAS device. Could anything have
gone wrong?

Have you tried the 'noac' mount option? Which NFS version are you using? Over
TCP or UDP?
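
For reference, a mount invocation exercising those suggestions might look
like the line below (purely illustrative; the server name 'nas' and the
export path are placeholders):

  mount -t nfs -o rw,hard,intr,noac,tcp,nfsvers=3 nas:/export/backuppc /mnt/test

'noac' disables client-side attribute caching, 'tcp' avoids UDP
retransmission problems, and 'nfsvers' pins the protocol version so it is
clear what is actually being tested.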

Have you found out anything about ATAoE (or iSCSI, for that matter)
capabilities of the device?

Regards,
Holger



Re: [BackupPC-users] Duplicate files in pool with same CHECKSUM and same CONTENTS

2008-10-30 Thread Jeffrey J. Kosowsky
Jeffrey J. Kosowsky wrote at about 20:26:35 -0400 on Thursday, October 30, 2008:
 > John Rouillard wrote at about 20:13:15 + on Thursday, October 30, 2008:
 >  > On Thu, Oct 30, 2008 at 10:04:26AM -0400, Jeffrey J. Kosowsky wrote:
 >  > > Holger Parplies wrote at about 11:29:49 +0100 on Thursday, October 30, 
 > 2008:
 >  > >  > Hi,
 >  > >  > 
 >  > >  > Jeffrey J. Kosowsky wrote on 2008-10-30 03:55:16 -0400 
 > [[BackupPC-users] Duplicate files in pool with same CHECKSUM and same 
 > CONTENTS]:
 >  > >  > > I have found a number of files in my pool that have the same 
 > checksum
 >  > >  > > (other than a trailing _0 or _1) and also the SAME CONTENT. Each 
 > copy
 >  > >  > > has a few links to it by the way.
 >  > >  > > 
 >  > >  > > Why is this happening? 
 >  > >  > 
 >  > >  > presumably creating a link sometimes fails, so BackupPC copies the 
 > file,
 >  > >  > assuming the hard link limit has been reached. I suspect problems 
 > with your
 >  > >  > NFS server, though not a "stale NFS file handle" in this case,
 >  > >  > since copying the file succeeds. Strange.
 >  > > 
 >  > > Yes - I am beginning to think that may be true. However as I mentioned
 >  > > in the other thread, the syslog on the nfs server is clean and the one
 >  > > on the client shows only about a dozen or so nfs timeouts over the
 >  > > past 12 hours which is the time period I am looking at now. Otherwise,
 >  > > I don't see any nfs errors.
 >  > > So if it is a nfs problem, something seems to be happening somewhat
 >  > > randomly and invisibly to the filesystem.
 >  > 
 >  > IIRC you are using a soft nfs mount option right? If you are writing
 >  > to an NFS share that is not recommended. Try changing it to a hard
 >  > mount and see if the problem goes away. I only used soft mounts on
 >  > read only filesystems.
 > 
 > True -- I changed it to 'hard' but am still encountering the
 > problem... FRUSTRATING...
 > 
 > It's really weird in that it seems to work the first time a directory
 > is read but after a directory has been read a few times, it starts
 > messing up. It's almost like the results are being stored in cache and
 > then the cache is corrupted.

In fact, I have found two ways to assuredly allow me to read the
directory again (at least for a few minutes or tries until it gets
corrupted again):
1. Remount the nfs share
2. Read the directory directly on the server (without nfs)
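
For example, both checks can be scripted roughly like this (the paths and
the 'nas' host name are only placeholders):

  umount /var/lib/backuppc && mount /var/lib/backuppc   # 1. remount the share
  ls /var/lib/backuppc/cpool/0/0 | wc -l                # ...then re-read the directory
  ssh nas 'ls /export/backuppc/cpool/0/0 | wc -l'       # 2. same directory, read locally on the server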



Re: [BackupPC-users] Duplicate files in pool with same CHECKSUM and same CONTENTS

2008-10-30 Thread Jeffrey J. Kosowsky
John Rouillard wrote at about 20:13:15 + on Thursday, October 30, 2008:
 > On Thu, Oct 30, 2008 at 10:04:26AM -0400, Jeffrey J. Kosowsky wrote:
 > > Holger Parplies wrote at about 11:29:49 +0100 on Thursday, October 30, 
 > > 2008:
 > >  > Hi,
 > >  > 
 > >  > Jeffrey J. Kosowsky wrote on 2008-10-30 03:55:16 -0400 
 > > [[BackupPC-users] Duplicate files in pool with same CHECKSUM and same 
 > > CONTENTS]:
 > >  > > I have found a number of files in my pool that have the same checksum
 > >  > > (other than a trailing _0 or _1) and also the SAME CONTENT. Each copy
 > >  > > has a few links to it by the way.
 > >  > > 
 > >  > > Why is this happening? 
 > >  > 
 > >  > presumably creating a link sometimes fails, so BackupPC copies the file,
 > >  > assuming the hard link limit has been reached. I suspect problems with 
 > > your
 > >  > NFS server, though not a "stale NFS file handle" in this case,
 > >  > since copying the file succeeds. Strange.
 > > 
 > > Yes - I am beginning to think that may be true. However as I mentioned
 > > in the other thread, the syslog on the nfs server is clean and the one
 > > on the client shows only about a dozen or so nfs timeouts over the
 > > past 12 hours which is the time period I am looking at now. Otherwise,
 > > I don't see any nfs errors.
 > > So if it is a nfs problem, something seems to be happening somewhat
 > > randomly and invisibly to the filesystem.
 > 
 > IIRC you are using a soft nfs mount option right? If you are writing
 > to an NFS share that is not recommended. Try changing it to a hard
 > mount and see if the problem goes away. I only used soft mounts on
 > read only filesystems.

True -- I changed it to 'hard' but am still encountering the
problem... FRUSTRATING...

It's really weird in that it seems to work the first time a directory
is read but after a directory has been read a few times, it starts
messing up. It's almost like the results are being stored in cache and
then the cache is corrupted.



Re: [BackupPC-users] Duplicate files in pool with same CHECKSUM and same CONTENTS

2008-10-30 Thread Jeffrey J. Kosowsky
Craig Barratt wrote at about 11:27:41 -0700 on Thursday, October 30, 2008:
 > Jeffrey writes:
 > 
 > > Except that in my case some of the duplicated checksums truly are the
 > > same file (probably due to the link issue I am having)...
 > 
 > Yes.  Just as Holger mentions, if the hardlink attempt fails,
 > a new file is created in the pool.  You appear to have some
 > unreliability in your NFS or network setup.
 > 
 > The only other time identical files will have different pool
 > entries, as people noted, is when $Conf{HardLinkMax} is hit.
 > Subsequent expiry of backups might cause the identical files
 > to move below $Conf{HardLinkMax}.
 > 
 > It's not worth the trouble to try to combine those files since
 > the frequency is so small and the effort to relink them is very
 > high.
 > 
 > Craig

OK - Definitely seems to be an NFS problem -- sorry for having
troubled the BackupPC list.

When I do a shell command 'find | wc' on the cpool directory, I usually get
the right number of results but sometimes whole subdirectories are not
found.  This problem seems to come and go: sometimes I get the
right results and sometimes I don't. This makes it even harder to
troubleshoot since I can't reliably reproduce the problem every time.
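
One way to make that visible is to repeat the count on the client and
compare it against a reference count taken locally on the server, e.g.
(host name and paths are placeholders):

  for i in 1 2 3 4 5; do find /var/lib/backuppc/cpool -type f | wc -l; sleep 60; done
  ssh nas 'find /export/backuppc/cpool -type f | wc -l'   # server-side reference count

If the client-side numbers jump around while the server-side count stays
constant, the problem is clearly on the client side of the NFS path.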

I am confused though why I'm not seeing any notation of this problem in my
log files (either on the nfs server or client)...

Thanks!!!



Re: [BackupPC-users] Duplicate files in pool with same CHECKSUM and same CONTENTS

2008-10-30 Thread John Rouillard
On Thu, Oct 30, 2008 at 10:04:26AM -0400, Jeffrey J. Kosowsky wrote:
> Holger Parplies wrote at about 11:29:49 +0100 on Thursday, October 30, 2008:
>  > Hi,
>  > 
>  > Jeffrey J. Kosowsky wrote on 2008-10-30 03:55:16 -0400 [[BackupPC-users] 
> Duplicate files in pool with same CHECKSUM and same CONTENTS]:
>  > > I have found a number of files in my pool that have the same checksum
>  > > (other than a trailing _0 or _1) and also the SAME CONTENT. Each copy
>  > > has a few links to it by the way.
>  > > 
>  > > Why is this happening? 
>  > 
>  > presumably creating a link sometimes fails, so BackupPC copies the file,
>  > assuming the hard link limit has been reached. I suspect problems with your
>  > NFS server, though not a "stale NFS file handle" in this case,
>  > since the file succeeds. Strange.
> 
> Yes - I am beginning to think that may be true. However as I mentioned
> in the other thread, the syslog on the nfs server is clean and the one
> on the client shows only about a dozen or so nfs timeouts over the
> past 12 hours which is the time period I am looking at now. Otherwise,
> I don't see any nfs errors.
> So if it is a nfs problem, something seems to be happening somewhat
> randomly and invisibly to the filesystem.

IIRC you are using a soft nfs mount option right? If you are writing
to an NFS share that is not recommended. Try changing it to a hard
mount and see if the problem goes away. I only used soft mounts on
read only filesystems.
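
For anyone following along, switching from a soft to a hard mount might
look like this (server name and paths are placeholders):

  umount /var/lib/backuppc
  mount -t nfs -o rw,hard,intr nas:/export/backuppc /var/lib/backuppc
  # or the equivalent /etc/fstab line:
  # nas:/export/backuppc  /var/lib/backuppc  nfs  rw,hard,intr  0  0

The 'intr' option keeps processes on a hung hard mount interruptible.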

-- 
-- rouilj

John Rouillard
System Administrator
Renesys Corporation
603-244-9084 (cell)
603-643-9300 x 111



Re: [BackupPC-users] Duplicate files in pool with same CHECKSUM and same CONTENTS

2008-10-30 Thread Craig Barratt
Jeffrey writes:

> Except that in my case some of the duplicated checksums truly are the
> same file (probably due to the link issue I am having)...

Yes.  Just as Holger mentions, if the hardlink attempt fails,
a new file is created in the pool.  You appear to have some
unreliability in your NFS or network setup.
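
Conceptually, the pooling step amounts to something like the shell
fragment below. This is only an analogue of the behaviour described here,
not BackupPC's actual Perl code, and $pool, $hash and $newfile are
placeholder variables:

  # try to pool the freshly backed-up file by hard-linking it against the
  # matching pool entry; if the link fails (EMLINK, an NFS error, ...), the
  # file is kept as an extra pool entry, which is how hash_0, hash_1, ... appear
  ln -f "$pool/$hash" "$newfile" || ln "$newfile" "$pool/${hash}_0"

So a failed link quietly becomes a duplicate pool entry rather than a
backup error, which matches what Jeffrey is seeing.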

The only other time identical files will have different pool
entries, as people noted, is when $Conf{HardLinkMax} is hit.
Subsequent expiry of backups might cause the identical files
to move below $Conf{HardLinkMax}.

It's not worth the trouble to try to combine those files since
the frequency is so small and the effort to relink them is very
high.

Craig



Re: [BackupPC-users] Duplicate files in pool with same CHECKSUM and same CONTENTS

2008-10-30 Thread Jeffrey J. Kosowsky
Jeffrey J. Kosowsky wrote at about 10:04:26 -0400 on Thursday, October 30, 2008:
 > Holger Parplies wrote at about 11:29:49 +0100 on Thursday, October 30, 2008:
 >  > Hi,
 >  > 
 >  > Jeffrey J. Kosowsky wrote on 2008-10-30 03:55:16 -0400 [[BackupPC-users] 
 > Duplicate files in pool with same CHECKSUM and same CONTENTS]:
 >  > > I have found a number of files in my pool that have the same checksum
 >  > > (other than a trailing _0 or _1) and also the SAME CONTENT. Each copy
 >  > > has a few links to it by the way.
 >  > > 
 >  > > Why is this happening? 
 >  > 
 >  > presumably creating a link sometimes fails, so BackupPC copies the file,
 >  > assuming the hard link limit has been reached. I suspect problems with 
 > your
 >  > NFS server, though not a "stale NFS file handle" in this case, since 
 > copying
 >  > the file succeeds. Strange.
 > 
 > Yes - I am beginning to think that may be true. However as I mentioned
 > in the other thread, the syslog on the nfs server is clean and the one
 > on the client shows only about a dozen or so nfs timeouts over the
 > past 12 hours which is the time period I am looking at now. Otherwise,
 > I don't see any nfs errors.
Actually, I traced these errors to a timeout due to the disks on the NAS
spinning up. They appear to be just soft timeouts (and not related to
this link problem).

 > So if it is a nfs problem, something seems to be happening somewhat
 > randomly and invisibly to the filesystem.
 > 
 >  > 
 >  > >   Isn't this against the whole theory of pooling.
 >  > 
 >  > Well, yes :). But the action of copying the file when the method to 
 > implement
 >  > pooling (hard links) does not work for some reason (max link count 
 > reached, or
 >  > NFS file server errors if you think about it - you *do* get some level of
 >  > >  > pooling; otherwise you'd have an independent copy or a missing file each 
 > time
 >  > linking fails) is perfectly reasonable.
 >  > 
 >  > >   It also doesn't seem
 >  > >   to get cleaned up by BackupPC_nightly since that has run several times
 >  > >   and the pool files are now several days old.
 >  > 
 >  > BackupPC_nightly is not supposed to clean up that situation. It could be
 >  > designed to do so (the situation may arise when a "link count overflow" is
 >  > resolved by expired backups), but it would be horribly inefficient: for 
 > the
 >  > file to be eliminated, you would have to find() every occurrence of the 
 > inode
 >  > in all pc/* trees and replace them with links to the duplicate(s) to be 
 > kept.
 >  > You don't want that.
 > 
 > Yes but it would be nice to have a switch perhaps that allowed this
 > more comprehensive cleanup.
 > Even in a non-error case, I can imagine situations where at some point
 > the max file links may have been exceeded and then backups were
 > deleted so that the link count came back down below the max.
 > 
 > The logic wouldn't seem to be that horrendous. Since you would only
 > need to walk down the pc/* trees once -- i.e. first walk down
 > (c)pool/* to compile list of repeated but identical checksums. Then
 > walk down the pc/* tree to find the files on the list.
 > 
 >  > 
 >  > > What can I do to clean it up?
 >  > 
 >  > Fix your NFS server? :) Is there a consistent maximum number of links, or 
 > do
 >  > the copies seem to happen randomly? Honestly, I don't think the savings 
 > you
 >  > may gain from storing the pool over NFS are worth the headaches. What is
 >  > cheaper about putting a large disk into a NAS device than into your 
 > BackupPC
 >  > server? Well, yes, you can share it ... how about exporting part of the 
 > disk
 >  > from the BackupPC server (I would still recommend distinct partitions)?
 >  > 
 > 
 > You are right in theory. But I would still like to get NFS working for
 > various reasons and it is always a good "learning experience" to
 > troubleshoot such things ;)
 > 

Now this is interesting...
Looking through my BackupPC log files, I noticed that this problem
*FIRST* occurred on Oct 27 and has affected every backup since. The
errors are only occurring when BackupPC_link runs (and I didn't have
any problems with BackupPC_link in the 10 or so previous days that I
have been using BackupPC).

So, I used both find and the incremental backups themselves to see
what happened between the last error-free backup at 18:08 on Oct 26
and the first bad one at 1AM on Oct 27. But it doesn't seem like any
files changed on either the BackupPC server or the NFS server.

Also, interestingly, this problem occ

Re: [BackupPC-users] Duplicate files in pool with same CHECKSUM and same CONTENTS

2008-10-30 Thread Adam Goryachev

Jeffrey J. Kosowsky wrote:
> Holger Parplies wrote at about 11:29:49 +0100 on Thursday, October 30, 2008:
>  > Hi,
>  > 
>  > Jeffrey J. Kosowsky wrote on 2008-10-30 03:55:16 -0400 [[BackupPC-users] 
> Duplicate files in pool with same CHECKSUM and same CONTENTS]:
>  > > I have found a number of files in my pool that have the same checksum
>  > > (other than a trailing _0 or _1) and also the SAME CONTENT. Each copy
>  > > has a few links to it by the way.
>  > > 
>  > > Why is this happening? 
>  > 
>  > presumably creating a link sometimes fails, so BackupPC copies the file,
>  > assuming the hard link limit has been reached. I suspect problems with your
>  > NFS server, though not a "stale NFS file handle" in this case, since 
> copying
>  > the file succeeds. Strange.
> 
> Yes - I am beginning to think that may be true. However as I mentioned
> in the other thread, the syslog on the nfs server is clean and the one
> on the client shows only about a dozen or so nfs timeouts over the
> past 12 hours which is the time period I am looking at now. Otherwise,
> I don't see any nfs errors.
> So if it is a nfs problem, something seems to be happening somewhat
> randomly and invisibly to the filesystem.

See this URL, which helped me improve performance and reduce NFS
errors in my environment.
http://billharlan.com/pub/papers/NFS_for_clusters.html

It was written a long time ago, but most of it is still very relevant (I
guess NFS has not changed much).

In my case, the actual problem was faulty memory in a new server plus
some sort of strange network card driver problem corrupting the NFS
packets.

It truly surprised me just how many errors I was getting, even from my
existing load, that I had never noticed.

Regards,
Adam



Re: [BackupPC-users] Duplicate files in pool with same CHECKSUM and same CONTENTS

2008-10-30 Thread Jeffrey J. Kosowsky
Tino Schwarze wrote at about 15:08:29 +0100 on Thursday, October 30, 2008:
 > On Thu, Oct 30, 2008 at 09:56:15AM -0400, Jeffrey J. Kosowsky wrote:
 > 
 > >  > I'm not sure though, how the file name is derived, I found another file
 > >  > with same name but different MD5 sum:
 > >  > .../cpool/0/0 # md5sum 8/0084734e7242df0fbc186ba6741d1bab*
 > >  > db224998946bac7859f2448f41c58f88  8/0084734e7242df0fbc186ba6741d1bab
 > >  > d1d8f3a86ae5492de0bf11f5cfb45860  8/0084734e7242df0fbc186ba6741d1bab_0
 > >  > 
 > >  > IIRC, BackupPC_nightly should perform chain cleaning.
 > > 
 > > Well, I haven't noticed any change after it runs...
 > > I think I'm even more confused now ;)
 > > How can I troubleshoot this further?
 > 
 > There's no trouble to shoot! ;-)
 > 
 > Holger explained that the pool file name is based on a checksum of the
 > first 256k of the file's content and the file's length, so collisions
 > are normal and expected.
 > 
Except that in my case some of the duplicated checksums truly are the
same file (probably due to the link issue I am having)...



Re: [BackupPC-users] Duplicate files in pool with same CHECKSUM and same CONTENTS

2008-10-30 Thread Tino Schwarze
On Thu, Oct 30, 2008 at 09:56:15AM -0400, Jeffrey J. Kosowsky wrote:

>  > I'm not sure though, how the file name is derived, I found another file
>  > with same name but different MD5 sum:
>  > .../cpool/0/0 # md5sum 8/0084734e7242df0fbc186ba6741d1bab*
>  > db224998946bac7859f2448f41c58f88  8/0084734e7242df0fbc186ba6741d1bab
>  > d1d8f3a86ae5492de0bf11f5cfb45860  8/0084734e7242df0fbc186ba6741d1bab_0
>  > 
>  > IIRC, BackupPC_nightly should perform chain cleaning.
> 
> Well, I haven't noticed any change after it runs...
> I think I'm even more confused now ;)
> How can I troubleshoot this further?

There's no trouble to shoot! ;-)

Holger explained that the pool file name is based on a checksum of the
first 256k of the file's content and the file's length, so collisions
are normal and expected.

HTH,

Tino.

-- 
"What we nourish flourishes." - "Was wir nähren erblüht."

www.lichtkreis-chemnitz.de
www.craniosacralzentrum.de



Re: [BackupPC-users] Duplicate files in pool with same CHECKSUM and same CONTENTS

2008-10-30 Thread Jeffrey J. Kosowsky
Holger Parplies wrote at about 11:29:49 +0100 on Thursday, October 30, 2008:
 > Hi,
 > 
 > Jeffrey J. Kosowsky wrote on 2008-10-30 03:55:16 -0400 [[BackupPC-users] 
 > Duplicate files in pool with same CHECKSUM and same CONTENTS]:
 > > I have found a number of files in my pool that have the same checksum
 > > (other than a trailing _0 or _1) and also the SAME CONTENT. Each copy
 > > has a few links to it by the way.
 > > 
 > > Why is this happening? 
 > 
 > presumably creating a link sometimes fails, so BackupPC copies the file,
 > assuming the hard link limit has been reached. I suspect problems with your
 > NFS server, though not a "stale NFS file handle" in this case, since copying
 > the file succeeds. Strange.

Yes - I am beginning to think that may be true. However as I mentioned
in the other thread, the syslog on the nfs server is clean and the one
on the client shows only about a dozen or so nfs timeouts over the
past 12 hours which is the time period I am looking at now. Otherwise,
I don't see any nfs errors.
So if it is a nfs problem, something seems to be happening somewhat
randomly and invisibly to the filesystem.

 > 
 > >   Isn't this against the whole theory of pooling.
 > 
 > Well, yes :). But the action of copying the file when the method to implement
 > pooling (hard links) does not work for some reason (max link count reached, 
 > or
 > NFS file server errors if you think about it - you *do* get some level of
 >  > pooling; otherwise you'd have an independent copy or a missing file each time
 > linking fails) is perfectly reasonable.
 > 
 > >   It also doesn't seem
 > >   to get cleaned up by BackupPC_nightly since that has run several times
 > >   and the pool files are now several days old.
 > 
 > BackupPC_nightly is not supposed to clean up that situation. It could be
 > designed to do so (the situation may arise when a "link count overflow" is
 > resolved by expired backups), but it would be horribly inefficient: for the
 > file to be eliminated, you would have to find() every occurrence of the inode
 > in all pc/* trees and replace them with links to the duplicate(s) to be kept.
 > You don't want that.

Yes but it would be nice to have a switch perhaps that allowed this
more comprehensive cleanup.
Even in a non-error case, I can imagine situations where at some point
the max file links may have been exceeded and then backups were
deleted so that the link count came back down below the max.

The logic wouldn't seem to be that horrendous. Since you would only
need to walk down the pc/* trees once -- i.e. first walk down
(c)pool/* to compile list of repeated but identical checksums. Then
walk down the pc/* tree to find the files on the list.
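
A rough sketch of that first pass is shown below. This is not an existing
BackupPC tool, just an illustration; it assumes an uncompressed pool
(byte-comparing cpool files is less clear-cut) and should only be run
while BackupPC is stopped:

  cd /var/lib/backuppc/pool            # path is an example
  find . -type f -name '*_[0-9]*' | while read dup; do
      base="${dup%_*}"                 # chain head without the trailing _N
      [ -f "$base" ] || continue
      # same content in head and chain member => link failure, not a hash clash
      cmp -s "$base" "$dup" && echo "identical: $base <= $dup"
  done

The second pass, finding and re-linking every corresponding entry under
pc/*, is the part Holger describes as prohibitively expensive.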

 > 
 > > What can I do to clean it up?
 > 
 > Fix your NFS server? :) Is there a consistent maximum number of links, or do
 > the copies seem to happen randomly? Honestly, I don't think the savings you
 > may gain from storing the pool over NFS are worth the headaches. What is
 > cheaper about putting a large disk into a NAS device than into your BackupPC
 > server? Well, yes, you can share it ... how about exporting part of the disk
 > from the BackupPC server (I would still recommend distinct partitions)?
 > 

You are right in theory. But I would still like to get NFS working for
various reasons and it is always a good "learning experience" to
troubleshoot such things ;)



Re: [BackupPC-users] Duplicate files in pool with same CHECKSUM and same CONTENTS

2008-10-30 Thread Jeffrey J. Kosowsky
Tino Schwarze wrote at about 11:13:27 +0100 on Thursday, October 30, 2008:
 > Hi Jeffrey,
 > 
 > On Thu, Oct 30, 2008 at 03:55:16AM -0400, Jeffrey J. Kosowsky wrote:
 > 
 > > I have found a number of files in my pool that have the same checksum
 > > (other than a trailing _0 or _1) and also the SAME CONTENT. Each copy
 > > has a few links to it by the way.
 > 
 > That's intentional - what are the link counts for the files? 
 > If you look at BackupPC's status page, there is a line like:
 > 
 > * Pool hashing gives 649 repeated files with longest chain 28, 
Ah I was wondering what that line meant... (for real :)
Mine says:
 Pool hashing gives 9676 repeated files with longest chain 4
HOWEVER: my config has:
 $Conf{HardLinkMax} = 31999
And when I look at some of the "repeated" pool files, I see that they
only have 2-3 links each.

 > 
 > > Why is this happening? 
 > >   Isn't this against the whole theory of pooling.  It also doesn't seem
 > >   to get cleaned up by BackupPC_nightly since that has run several times
 > >   and the pool files are now several days old.
 > 
 > Because there is a file-system dependent limit to the number of hard
 > links a file may have. Look at $Conf{HardLinkMax} in config.pl.
 > 
 > Hm. I just took a look in my cpool and found some files which didn't
 > hit the hardlink count yet, but have a _0 and _1:
 > .../cpool/0/0 # ls -l c/00cd83be1ea3c1ffa3c6af2f4e310206*
 > -rw-r- 4371 backuppc users 34 2005-01-14 17:01 c/00cd83be1ea3c1ffa3c6af2f4e310206
 > -rw-r- 3536 backuppc users 34 2005-03-02 02:22 c/00cd83be1ea3c1ffa3c6af2f4e310206_0
 > -rw-r-  439 backuppc users 34 2006-03-11 02:04 c/00cd83be1ea3c1ffa3c6af2f4e310206_1
 > 
 > MD5Sums are not equal for all files, so maybe something got corrupted
 > (or I updated BackupPC during the time - the files are rather old!):
 > .../cpool/0/0 # md5sum c/00cd83be1ea3c1ffa3c6af2f4e310206*
 > 51ef559d1d7fa02c05fa032729c85804  c/00cd83be1ea3c1ffa3c6af2f4e310206
 > 51ef559d1d7fa02c05fa032729c85804  c/00cd83be1ea3c1ffa3c6af2f4e310206_0
 > 7e2276750fc478fa30142aa808df2a1f  c/00cd83be1ea3c1ffa3c6af2f4e310206_1
 > 
 > AFAIK, I started with $Conf{HardLinkMax} set to 32.000. As the files are
 > very old, a lot of links might have expired already.
 > 
 > I'm not sure though, how the file name is derived, I found another file
 > with same name but different MD5 sum:
 > .../cpool/0/0 # md5sum 8/0084734e7242df0fbc186ba6741d1bab*
 > db224998946bac7859f2448f41c58f88  8/0084734e7242df0fbc186ba6741d1bab
 > d1d8f3a86ae5492de0bf11f5cfb45860  8/0084734e7242df0fbc186ba6741d1bab_0
 > 
 > IIRC, BackupPC_nightly should perform chain cleaning.

Well, I haven't noticed any change after it runs...
I think I'm even more confused now ;)
How can I troubleshoot this further?



Re: [BackupPC-users] Duplicate files in pool with same CHECKSUM and same CONTENTS

2008-10-30 Thread Tino Schwarze
Apropos link count, I just did a quick check of my pool. Here are the
top linked files:

-rw-r- 987537 backuppc users 359 2007-05-19 23:43 ./0/d/1/0d16a8f0ce1b516044a3f015b7d5ee06
-rw-r- 437446 backuppc users 98 2007-02-07 03:21 ./b/c/8/bc891581e99fb3729ea3d239a52d2b9a
-rw-r- 340062 backuppc users 98 2007-12-22 02:50 ./6/5/9/659e6651b59c8d8de4ffacdb9a27eb9f
-rw-r- 266646 backuppc users 122 2007-12-22 10:15 ./c/e/a/ceaf858b5f9ef4fdbd1b2132a9d8b14e

So almost one million links for... *drum roll* our CVS commit message template!
2nd place got *drum roll* a CVS/Tag file.
And the third is... a CVS/Root.
The fourth is another CVS/Root still featuring a quarter million links.
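
(For reference, a listing like the one above can be produced with GNU
find, e.g.

  find /var/lib/backuppc/pool -type f -links +100000 -printf '%n %s %p\n' | sort -rn | head -4

where %n prints the link count; the pool path and the 100000-link
threshold are of course just examples.)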

Bye,

Tino.

-- 
"What we nourish flourishes." - "Was wir nähren erblüht."

www.lichtkreis-chemnitz.de
www.craniosacralzentrum.de



Re: [BackupPC-users] Duplicate files in pool with same CHECKSUM and same CONTENTS

2008-10-30 Thread Tino Schwarze
Hi Holger,

On Thu, Oct 30, 2008 at 12:11:43PM +0100, Holger Parplies wrote:

> > I'm not sure though, how the file name is derived,
> 
> It's in the docs. Up to 256 KB of file contents (from the first 1 MB) and the
> file length are taken into account, so it's quite easy to produce hash clashes
> if you want to: take a file > 1 MB and change the last byte. BackupPC resolves
> them and they're probably infrequent enough not to be a problem (and you get
> to see whether they are on the status page). Taking the length (of the
> uncompressed file) into account avoids things like growing logfiles from
> causing problems.

Thank you for the clarification!

Tino, never having bothered about that before. ;-)

-- 
"What we nourish flourishes." - "Was wir nähren erblüht."

www.lichtkreis-chemnitz.de
www.craniosacralzentrum.de



Re: [BackupPC-users] Duplicate files in pool with same CHECKSUM and same CONTENTS

2008-10-30 Thread Holger Parplies
Hi,

Tino Schwarze wrote on 2008-10-30 11:13:27 +0100 [Re: [BackupPC-users] 
Duplicate files in pool with same CHECKSUM and same CONTENTS]:
> [...]
> Hm. I just took a look in my cpool and found some files which didn't
> hit the hardlink count yet, but have a _0 and _1:
> .../cpool/0/0 # ls -l c/00cd83be1ea3c1ffa3c6af2f4e310206*
> -rw-r- 4371 backuppc users 34 2005-01-14 17:01 c/00cd83be1ea3c1ffa3c6af2f4e310206
> -rw-r- 3536 backuppc users 34 2005-03-02 02:22 c/00cd83be1ea3c1ffa3c6af2f4e310206_0
> -rw-r-  439 backuppc users 34 2006-03-11 02:04 c/00cd83be1ea3c1ffa3c6af2f4e310206_1
> 
> MD5Sums are not equal for all files,

that's intentional :-). Those files have different content but hash to the
same BackupPC hash. Quoting you:

> If you look at BackupPC's status page, there is a line like:
> 
> * Pool hashing gives 649 repeated files with longest chain 28, 

That is what this line is about - you have up to 28 different files hashing to
the same BackupPC hash (some of these may coincidentally have identical
content due to link count overflows, but that would be the exception).

> AFAIK, I started with $Conf{HardLinkMax} set to 32.000. As the files are
> very old, a lot of links might have expired already.

True, but keep in mind how much 32000 really is. Unless you have many files
with identical content in your backup set (CVS/Root maybe), it will take very
many backups to reach so many links.

> I'm not sure though, how the file name is derived,

It's in the docs. Up to 256 KB of file contents (from the first 1 MB) and the
file length are taken into account, so it's quite easy to produce hash clashes
if you want to: take a file > 1 MB and change the last byte. BackupPC resolves
them and they're probably infrequent enough not to be a problem (and you get
to see whether they are on the status page). Taking the length (of the
uncompressed file) into account avoids things like growing logfiles from
causing problems.
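
As a rough shell illustration of that idea (this is not BackupPC's actual
hashing code, only a sketch of "length plus leading content"):

  poolhash() {
      # hash the file length together with up to the first 256 KB of content
      { stat -c '%s' "$1"; head -c 262144 "$1"; } | md5sum | cut -d' ' -f1
  }
  poolhash /path/to/some/file

Two files of equal length that differ only outside the sampled region
would collide under such a scheme, which is exactly why the _0/_1 chains
and the full comparison of candidate files exist.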

> IIRC, BackupPC_nightly should perform chain cleaning.

Unused files (i.e. link count = 1) are removed and chains renumbered. Like I
wrote, relinking identical files does not make sense.

Regards,
Holger



Re: [BackupPC-users] Duplicate files in pool with same CHECKSUM and same CONTENTS

2008-10-30 Thread Holger Parplies
Hi,

Jeffrey J. Kosowsky wrote on 2008-10-30 03:55:16 -0400 [[BackupPC-users] 
Duplicate files in pool with same CHECKSUM and same CONTENTS]:
> I have found a number of files in my pool that have the same checksum
> (other than a trailing _0 or _1) and also the SAME CONTENT. Each copy
> has a few links to it by the way.
> 
> Why is this happening? 

presumably creating a link sometimes fails, so BackupPC copies the file,
assuming the hard link limit has been reached. I suspect problems with your
NFS server, though not a "stale NFS file handle" in this case, since copying
the file succeeds. Strange.

>   Isn't this against the whole theory of pooling.

Well, yes :). But the action of copying the file when the method to implement
pooling (hard links) does not work for some reason (max link count reached, or
NFS file server errors if you think about it - you *do* get some level of
pooling; otherwise you'd have an independent copy or a missing file each time
linking fails) is perfectly reasonable.

>   It also doesn't seem
>   to get cleaned up by BackupPC_nightly since that has run several times
>   and the pool files are now several days old.

BackupPC_nightly is not supposed to clean up that situation. It could be
designed to do so (the situation may arise when a "link count overflow" is
resolved by expired backups), but it would be horribly inefficient: for the
file to be eliminated, you would have to find() every occurrence of the inode
in all pc/* trees and replace them with links to the duplicate(s) to be kept.
You don't want that.

> What can I do to clean it up?

Fix your NFS server? :) Is there a consistent maximum number of links, or do
the copies seem to happen randomly? Honestly, I don't think the savings you
may gain from storing the pool over NFS are worth the headaches. What is
cheaper about putting a large disk into a NAS device than into your BackupPC
server? Well, yes, you can share it ... how about exporting part of the disk
from the BackupPC server (I would still recommend distinct partitions)?

Regards,
Holger



Re: [BackupPC-users] Duplicate files in pool with same CHECKSUM and same CONTENTS

2008-10-30 Thread Tino Schwarze
Hi Jeffrey,

On Thu, Oct 30, 2008 at 03:55:16AM -0400, Jeffrey J. Kosowsky wrote:

> I have found a number of files in my pool that have the same checksum
> (other than a trailing _0 or _1) and also the SAME CONTENT. Each copy
> has a few links to it by the way.

That's intentional - what are the link counts for the files? 
If you look at BackupPC's status page, there is a line like:

* Pool hashing gives 649 repeated files with longest chain 28, 

> Why is this happening? 
>   Isn't this against the whole theory of pooling.  It also doesn't seem
>   to get cleaned up by BackupPC_nightly since that has run several times
>   and the pool files are now several days old.

Because there is a file-system dependent limit to the number of hard
links a file may have. Look at $Conf{HardLinkMax} in config.pl.

Hm. I just took a look in my cpool and found some files which didn't
hit the hardlink count yet, but have a _0 and _1:
.../cpool/0/0 # ls -l c/00cd83be1ea3c1ffa3c6af2f4e310206*
-rw-r- 4371 backuppc users 34 2005-01-14 17:01 c/00cd83be1ea3c1ffa3c6af2f4e310206
-rw-r- 3536 backuppc users 34 2005-03-02 02:22 c/00cd83be1ea3c1ffa3c6af2f4e310206_0
-rw-r-  439 backuppc users 34 2006-03-11 02:04 c/00cd83be1ea3c1ffa3c6af2f4e310206_1

MD5Sums are not equal for all files, so maybe something got corrupted
(or I updated BackupPC during the time - the files are rather old!):
.../cpool/0/0 # md5sum c/00cd83be1ea3c1ffa3c6af2f4e310206*
51ef559d1d7fa02c05fa032729c85804  c/00cd83be1ea3c1ffa3c6af2f4e310206
51ef559d1d7fa02c05fa032729c85804  c/00cd83be1ea3c1ffa3c6af2f4e310206_0
7e2276750fc478fa30142aa808df2a1f  c/00cd83be1ea3c1ffa3c6af2f4e310206_1

AFAIK, I started with $Conf{HardLinkMax} set to 32.000. As the files are
very old, a lot of links might have expired already.

I'm not sure though, how the file name is derived, I found another file
with same name but different MD5 sum:
.../cpool/0/0 # md5sum 8/0084734e7242df0fbc186ba6741d1bab*
db224998946bac7859f2448f41c58f88  8/0084734e7242df0fbc186ba6741d1bab
d1d8f3a86ae5492de0bf11f5cfb45860  8/0084734e7242df0fbc186ba6741d1bab_0

IIRC, BackupPC_nightly should perform chain cleaning.

Tino.

-- 
"What we nourish flourishes." - "Was wir nähren erblüht."

www.lichtkreis-chemnitz.de
www.craniosacralzentrum.de



[BackupPC-users] Duplicate files in pool with same CHECKSUM and same CONTENTS

2008-10-30 Thread Jeffrey J. Kosowsky
I have found a number of files in my pool that have the same checksum
(other than a trailing _0 or _1) and also the SAME CONTENT. Each copy
has a few links to it by the way.

Why is this happening? 
  Isn't this against the whole theory of pooling?  It also doesn't seem
  to get cleaned up by BackupPC_nightly since that has run several times
  and the pool files are now several days old.

What can I do to clean it up?
  Is there a script that goes through looking for identical-checksum
  pool files that have the same content and then coalesces them all
  into one inode?

Thanks!
