Re: [Bacula-users] [Storage-Space usage] Prevent duplicate file-backups?

2006-10-26 Thread Tijl Van den Broeck
It would indeed be a great space saving feature.

But I wouldn't bet on md5sums alone: it has been shown that false
matches -can- occur. There has to be some additional checking as
well, starting with the filename. The chances of two files with
matching names having the same md5sum but different contents are so
small I doubt it would ever occur. Yes yes, Murphy's Law, I know,
but realistically... would it ever occur?

The filename match doesn't necessarily need to be a 1-1 check; it can
be more of a pattern check, since when a file is copied and renamed,
part of the original name is usually kept.
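
To make the idea concrete, here is a minimal sketch of such a combined check. It is only an illustration, not anything Bacula does: the function names and the 0.6 similarity threshold are assumptions picked for the example, and the "pattern check" is approximated with Python's difflib.

```python
import hashlib
from difflib import SequenceMatcher
from pathlib import PurePath


def md5_of(data: bytes) -> str:
    """Hex md5 digest of the file contents."""
    return hashlib.md5(data).hexdigest()


def names_similar(path_a: str, path_b: str, threshold: float = 0.6) -> bool:
    """Loose pattern check on the base names (threshold is an assumed value)."""
    name_a = PurePath(path_a).name.lower()
    name_b = PurePath(path_b).name.lower()
    return SequenceMatcher(None, name_a, name_b).ratio() >= threshold


def probably_duplicate(path_a: str, data_a: bytes,
                       path_b: str, data_b: bytes) -> bool:
    """Treat two files as duplicates only if size, md5 and name all agree."""
    if len(data_a) != len(data_b):
        return False
    if md5_of(data_a) != md5_of(data_b):
        return False
    return names_similar(path_a, path_b)
```

With this, "report.zip" and "report_copy.zip" holding identical bytes would count as duplicates, while identical bytes under two unrelated names would not. The price of the extra safety is that a copy renamed beyond recognition is stored twice.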

Tijl Van den Broeck


On 10/25/06, Hristo Benev [EMAIL PROTECTED] wrote:
 If this is not possible with current version it is a very good request
 for feature.

 Probably this can be done with md5sum'ing - works even if files are
 renamed, and just linking files in catalog...
 Yes, it will require little bit more processing power, but it could save
 a lot of space.

 --
 Hristo Benev
 IT Manager

 WAVEROAD
 Partners in Telecommunications

 514-935-2020 x225 T
 514-935-1001 F
 www.waveroad.ca
 [EMAIL PROTECTED]


 -
 Using Tomcat but need to do more? Need to support web services, security?
 Get stuff done quickly with pre-integrated technology to make your job easier
 Download IBM WebSphere Application Server v.1.0.1 based on Apache Geronimo
 http://sel.as-us.falkag.net/sel?cmd=lnkkid=120709bid=263057dat=121642
 ___
 Bacula-users mailing list
 Bacula-users@lists.sourceforge.net
 https://lists.sourceforge.net/lists/listinfo/bacula-users




Re: [Bacula-users] [Storage-Space usage] Prevent duplicate file-backups?

2006-10-26 Thread Hristo Benev





SHA1SUM could be used instead of md5sum, but it takes about 4 times as
long (measured on a 31M tar.gz file). Probably the best way will be
diff (fastest).
I do not think that 2 different files with the same size will have the
same md5sum, but as Tijl Van den Broeck said, Murphy's law is here :).
Having a checksum (md5 or sha1) will also help when the same file is
on 2 servers (like the i386 folder in Windows): only the checksum needs
to be sent, and the director could prevent sending the file over the
network (imagine the bandwidth savings if this is over a WAN link).
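
A rough sketch of that exchange, under the assumption that the client first offers a checksum and only ships the contents when the director does not already hold them. The Director class and backup_file function are hypothetical names for this example and do not correspond to Bacula's actual protocol.

```python
import hashlib


class Director:
    """Hypothetical stand-in for the director's view of stored file data."""

    def __init__(self):
        self.stored = {}         # checksum -> file contents already held
        self.catalog = []        # (client, path, checksum) entries
        self.bytes_received = 0  # payload bytes that crossed the network

    def has(self, checksum: str) -> bool:
        return checksum in self.stored

    def record(self, client: str, path: str, checksum: str, data=None) -> None:
        # Contents are supplied only for checksums not seen before.
        if data is not None:
            self.stored[checksum] = data
            self.bytes_received += len(data)
        self.catalog.append((client, path, checksum))


def backup_file(director: Director, client: str, path: str, data: bytes) -> bool:
    """Return True if the contents were sent, False if only metadata was."""
    checksum = hashlib.sha1(data).hexdigest()
    if director.has(checksum):
        director.record(client, path, checksum)
        return False
    director.record(client, path, checksum, data)
    return True
```

If two servers back up the same file, the second call records a catalog entry but transfers no file contents, which is where the bandwidth saving over a WAN link would come from.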

And this could help bacula add a feature - sort of CDP (continuous data
protection).



Re: [Bacula-users] [Storage-Space usage] Prevent duplicate file-backups?

2006-10-26 Thread John Drescher
I have seen this discussed on this list before, and I believe there are
several problems on top of the small chance that a file will have the
same size and same md5sum but different contents. One is: do we only
search (for dups) in the current backup job or volume, or do we include
other backups and other volumes? If we include other backups, how do we
handle the case where a file from job X is on a volume from job Y
because of a duplicate, and now some user has purged the volume that
contains job Y?
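
The purge problem can be illustrated with a small sketch. The Catalog class below is a hypothetical model made up for this example (it is not Bacula's catalog schema): duplicate data is written once, and later jobs only point at the volume that already holds it.

```python
from collections import defaultdict


class Catalog:
    """Hypothetical catalog that stores duplicate data only once."""

    def __init__(self):
        self.data_volume = {}            # checksum -> volume holding the data
        self.job_volume = {}             # job -> volume the job itself wrote to
        self.files = defaultdict(list)   # job -> [(path, checksum)]

    def add_file(self, job, volume, path, checksum):
        self.job_volume[job] = volume
        # Only the first copy is written; later copies just point at it.
        self.data_volume.setdefault(checksum, volume)
        self.files[job].append((path, checksum))

    def broken_by_purge(self, volume):
        """List (job, path) entries from other volumes that lose their data."""
        broken = []
        for job, entries in self.files.items():
            if self.job_volume[job] == volume:
                continue  # this job is being purged anyway
            for path, checksum in entries:
                if self.data_volume[checksum] == volume:
                    broken.append((job, path))
        return sorted(broken)
```

If job Y stores a file on its volume and job X later records the same checksum, purging job Y's volume leaves job X with an entry it can no longer restore. A real implementation would need something like this check, or reference counting, before allowing the purge.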
John


Re: [Bacula-users] [Storage-Space usage] Prevent duplicate file-backups?

2006-10-26 Thread Hristo Benev




  

My opinion is that, as a backup solution, we could check for duplicates
only within one volume, though one volume could hold data from several
file daemons.
Otherwise we will have problems with volume retention, and this is true
mainly for removable storage (tapes, external drives...). With file
system storage we could use an algorithm similar to CDP to limit the
number or age of the copies that are held in storage, and because a file
system is randomly accessible and always available it will be easy to
copy data.
Or, if database-type storage is used (why not), we could just
create/delete links to a row.

As for searching for duplicates, it would just be a checksum comparison.
This could be done fast in SQL with b-tree indexes (I think), and if a
match is found the file is not transmitted over the network, just the
relevant info (filename, location, permissions etc...)
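
A minimal sketch of the indexed lookup, using SQLite (whose indexes are b-trees) purely for illustration. The table and column names are assumptions for this example and are not Bacula's catalog schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE file_data (
        id       INTEGER PRIMARY KEY,
        checksum TEXT NOT NULL,
        size     INTEGER NOT NULL
    );
    -- The index makes the duplicate lookup a b-tree search, not a table scan.
    CREATE UNIQUE INDEX idx_file_data_checksum ON file_data (checksum, size);
    CREATE TABLE file_entry (
        path    TEXT NOT NULL,
        mode    INTEGER NOT NULL,
        data_id INTEGER NOT NULL REFERENCES file_data (id)
    );
""")


def register(path, mode, checksum, size):
    """Return True if the contents must be transmitted, False if a link suffices."""
    row = conn.execute(
        "SELECT id FROM file_data WHERE checksum = ? AND size = ?",
        (checksum, size),
    ).fetchone()
    needs_data = row is None
    if needs_data:
        data_id = conn.execute(
            "INSERT INTO file_data (checksum, size) VALUES (?, ?)",
            (checksum, size),
        ).lastrowid
    else:
        data_id = row[0]
    # The per-file info (name, permissions) is always recorded.
    conn.execute(
        "INSERT INTO file_entry (path, mode, data_id) VALUES (?, ?, ?)",
        (path, mode, data_id),
    )
    return needs_data
```

Each file always gets its own file_entry row, so restores see every path, while identical contents share a single file_data row. Deleting a duplicate then amounts to deleting a link to that row.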



[Bacula-users] [Storage-Space usage] Prevent duplicate file-backups?

2006-10-25 Thread Jens Classen
Hello list!

I could not find hints towards this in the manual, and wonder if this 
can be done with bacula:

I would like to prevent duplicate files from being backed up to the 
Storage. For example: User A has a zip-File in his Download folder, and 
the same zip-File in his user folder. Can I prevent bacula from backing 
up both files, or can it be configured to recognise the file as already 
backed up and drop the 2nd copy (like, noting the same file is in 
location B, but can be fetched from location A if needed)?

Any information on this would be highly appreciated. Thanks in advance!


Regards,

Jens




Re: [Bacula-users] [Storage-Space usage] Prevent duplicate file-backups?

2006-10-25 Thread Hristo Benev
If this is not possible with the current version, it is a very good
feature request.

This can probably be done with md5sum'ing (which works even if files are
renamed) and just linking files in the catalog...
Yes, it will require a little more processing power, but it could save
a lot of space.
