Re: [BackupPC-users] Storing Identical Files?

2023-02-11 Thread Christian Völker

Hi,

thanks for your ideas.

> So unless you have either an md5sum collision (extremely unlikely
> unless creating them intentionally -- as in (number of files)*2^-128
> unlikely), you shouldn't have any files in your pool with an
> underscore in them.

Under pool/ I do not have any files with an underscore. Under pc/ there
are loads of them, all named "attrib_*". I guess this is ok so far.
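
For the record, this is roughly how I checked (a sketch assuming the
default top directory /var/lib/backuppc; adjust to your $Conf{TopDir}):

    # list any pool files with an underscore suffix (collision chains)
    find /var/lib/backuppc/pool -type f -name '*_*'

    # the attrib_* files under pc/ are per-directory metadata, not pool data
    find /var/lib/backuppc/pc -name 'attrib_*' | head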



> If the contents are the same, then indeed for some reason
> de-duplication isn't working.

The contents are the same -- I rsync'ed them (with -avH). Some
attributes (last access time and so on) might differ, and the paths
differ: /srv/pics on clientA vs. /srv/share/pics on clientB. But those
are for sure the only differences.
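
To rule out actual content differences, I also compared checksums of a
sample file on both clients (the file name here is just an example):

    # both commands should print the same digest if the contents match
    ssh clientA md5sum /srv/pics/example.jpg
    ssh clientB md5sum /srv/share/pics/example.jpg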



> The only thing that I could think of
> that could possibly cause duplicates is if the compression is set
> differently on the different backups -- but I'm not sure that would
> even create a problem.

They are all uncompressed (cpool is empty and compression is disabled):

$Conf{CompressLevel} = 0;
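
To be sure no per-host override re-enables compression, I grepped the
whole config directory (on Debian it is /etc/backuppc):

    # shows the global setting plus any clientA.pl / clientB.pl overrides
    grep -r 'CompressLevel' /etc/backuppc/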


> Also, confirm that all your backups are in a v4 pool...

Yes, they are all in v4; v3 is disabled:

$Conf{PoolV3Enabled} = 0;
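
A quick sanity check of where the data actually sits (paths again
assume the default top directory):

    # the uncompressed pool should hold the data; cpool should be empty
    du -sh /var/lib/backuppc/pool /var/lib/backuppc/cpool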


Meanwhile I realized I had a different issue. The share was backed up
one day; the next day (for unrelated reasons) clientB.pl was
overwritten with a previous version and the share did not get backed
up. However, in the graphs I notice the red line shrinking:

[pool usage graph]

So it is doing deduplication and I am just not patient enough?
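
If I understand correctly, unused pool files are only removed by the
nightly cleanup (BackupPC_nightly), so the pool graph can lag a day or
more behind. To avoid waiting, it can apparently be triggered manually
via the running server, as the backuppc user:

    # ask the server to schedule a nightly cleanup pass now
    su - backuppc -c 'BackupPC_serverMesg BackupPC_nightly run'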


Still really unsure -- thanks for all the hints!

Greetings
/KNEBB





Re: [BackupPC-users] Storing Identical Files?

2023-02-11 Thread Greg Harris
Dumb question: they aren’t encrypted on the drive, are they?

Thanks,

Greg Harris



Re: [BackupPC-users] Storing Identical Files?

2023-02-11 Thread backuppc
It does, and in fact it almost has to, since pool files are stored
according to their md5sum.
So unless you have an md5sum collision (extremely unlikely unless you
create one intentionally -- on the order of (number of files)*2^-128),
you shouldn't have any files in your pool with an underscore in them.

If you have any such files, use BackupPC_zcat to compare their
contents. If they are different, then congrats, you have
(unintentionally) created a once-in-a-blue-moon md5sum collision.
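
For example, something along these lines, where $file1 and $file2 are
placeholders for an actual digest file and its underscore-suffixed
sibling in the pool:

    # decompress both pool entries and compare byte-for-byte
    cmp <(BackupPC_zcat "$file1") <(BackupPC_zcat "$file2") && echo identical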

If the contents are the same, then indeed for some reason
de-duplication isn't working. The only thing that I could think of
that could possibly cause duplicates is if the compression is set
differently on the different backups -- but I'm not sure that would
even create a problem.

Also, confirm that all your backups are in a v4 pool...



[BackupPC-users] Storing Identical Files?

2023-02-11 Thread Christian Völker

Hi,

I have been using BackupPC for years now. It is really great. Meanwhile
I am on v4.4.0 on Debian.


As far as I understand, it is very efficient at storing identical data.
Now I have noticed something which makes me doubt this. I guess there
is an explanation. So what do I have?


I have two clients which hold a large share. These two (Debian) clients
sync this share on a daily basis through rsync (via a third clientC,
but this should not make a difference). On clientA a cron job rsyncs to
clientC, and on clientB a cron job rsyncs from clientC. So in the end
all three hosts have identical data. The rsync commands run over ssh
and use "-avH".
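
Roughly like this (schedule times and the clientC path are simplified
examples, not the exact entries):

    # crontab on clientA: push the share to clientC
    0 2 * * * rsync -avH -e ssh /srv/pics/ clientC:/srv/pics/

    # crontab on clientB: pull it back from clientC
    0 4 * * * rsync -avH -e ssh clientC:/srv/pics/ /srv/share/pics/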


BackupPC itself has only been backing up host clientA so far (for
months now), so the data is stored in /var/lib/backuppc.


Now I added the clientB share to BackupPC and expected the filesystem
usage on /var/lib/backuppc to stay more or less the same after the
backup of clientB, as the data is already stored from clientA -- at
least after a while, once some cleanups have run.


Unfortunately, the pool usage increased by approximately the size of
the share and has not dropped since (more than a week now).


So my questions are:

 * Is there dupe detection in BackupPC?
 * If so, why does my pool size not decrease after a while?
 * If it should decrease by default, is there an explanation why it
   does not on my host?

Thanks a lot!


/KNEBB



___
BackupPC-users mailing list
BackupPC-users@lists.sourceforge.net
List: https://lists.sourceforge.net/lists/listinfo/backuppc-users
Wiki: https://github.com/backuppc/backuppc/wiki
Project: https://backuppc.github.io/backuppc/