Re: [Bacula-users] [Storage-Space usage] Prevent duplicate file-backups?
It would indeed be a great space-saving feature. But I wouldn't bet on md5sums alone: it has been proven that false matches (collisions) -could- occur. There has to be some additional checking as well, starting with the filename. The chances of two files with the same name having the same md5sum but different contents are so small that I doubt it would ever occur. Yes yes, Murphy's Law, I know, but realistically... would it ever occur?

The filename match doesn't necessarily need to be a 1-1 check; it could be more of a pattern check, since when a file is copied and renamed, part of the original name is usually kept.

Tijl Van den Broeck

On 10/25/06, Hristo Benev wrote:
> If this is not possible with the current version, it is a very good
> feature request. Probably this can be done with md5sum'ing - works even
> if files are renamed, and just linking files in catalog... Yes, it will
> require a little bit more processing power, but it could save a lot of
> space.
> --
> Hristo Benev
> IT Manager
> WAVEROAD Partners in Telecommunications
> 514-935-2020 x225 T
> 514-935-1001 F
> www.waveroad.ca

___
Bacula-users mailing list
Bacula-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/bacula-users
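The layered check argued for in this message (a checksum plus something more than the checksum) could be sketched roughly as follows. This is only an illustration in Python and not anything Bacula does: `md5_of`, `names_look_related`, `is_true_duplicate` and the 0.6 similarity threshold are made-up names and values. The final byte-for-byte comparison is an assumption that goes beyond what the message proposes; it is there so that an md5 collision cannot produce a false match.

```python
import difflib
import filecmp
import hashlib
import os


def md5_of(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def names_look_related(name_a, name_b, threshold=0.6):
    """Loose 'pattern' match on file names (hypothetical threshold)."""
    matcher = difflib.SequenceMatcher(None, name_a.lower(), name_b.lower())
    return matcher.ratio() >= threshold


def is_true_duplicate(candidate, stored):
    """Cheap checks first, full content comparison last."""
    if os.path.getsize(candidate) != os.path.getsize(stored):
        return False
    if md5_of(candidate) != md5_of(stored):
        return False
    # Final safeguard: compare the actual bytes, so a collision cannot slip through.
    return filecmp.cmp(candidate, stored, shallow=False)
```

In this sketch the name comparison is kept separate from `is_true_duplicate`: it could serve as a hint for which stored files to compare first, while the content checks make the actual decision.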
Re: [Bacula-users] [Storage-Space usage] Prevent duplicate file-backups?
Tijl Van den Broeck wrote:
> But I wouldn't place my bet only on md5sums, it has been proven that
> there -could- occur false matches. There has to be some additional
> checking as well, starting with the filename.

SHA-1 (sha1sum) could be used instead of md5sum, but it took about 4 times as long (on a 31M tar.gz file). Probably the best way will be diff (fastest).

I do not think that 2 different files with the same size will have the same md5sum, but as Tijl Van den Broeck said, Murphy's law is here :).

Having a checksum (md5 or sha1) will also help when the same file is on 2 servers (like the i386 folder in Windows): just the checksum could be sent, and the director could prevent the file from being sent over the network (imagine the bandwidth savings if this is over a WAN link). This could also help Bacula add a feature: a sort of CDP (continuous data protection).

--
Hristo Benev
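The "send only the checksum first" idea from this message could look roughly like the sketch below. It is a toy model in Python, not Bacula's protocol: the `Director` class, its methods, and `backup_file` are hypothetical, and the "network" is just a byte counter. SHA-1 is used here, but any digest from `hashlib` would fit the same shape.

```python
import hashlib


class Director:
    """Hypothetical stand-in for the director side; not Bacula's real API."""

    def __init__(self):
        self.blobs = {}          # digest -> file contents, stored once
        self.entries = []        # (client, path, digest) catalog records
        self.bytes_received = 0  # payload bytes that crossed the "network"

    def knows(self, digest):
        return digest in self.blobs

    def receive_data(self, digest, data):
        self.blobs[digest] = data
        self.bytes_received += len(data)

    def record(self, client, path, digest):
        self.entries.append((client, path, digest))


def backup_file(director, client, path, data):
    """Send the checksum first; send the contents only if they are new."""
    digest = hashlib.sha1(data).hexdigest()
    sent = False
    if not director.knows(digest):
        director.receive_data(digest, data)
        sent = True
    # The file's metadata is always recorded, even when no data is sent.
    director.record(client, path, digest)
    return sent
```

When two clients back up identical contents, the second call to `backup_file` returns `False` and `bytes_received` does not grow, which is the bandwidth saving described above.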
Re: [Bacula-users] [Storage-Space usage] Prevent duplicate file-backups?
I have seen this discussed on this list before, and I believe there are several problems on top of the small chance that two files will have the same size and the same md5sum but different contents. One is: do we only search (for dups) in the current backup job or volume, or do we include other backups and other volumes? If we include other backups, how do we handle the case where a file from job X is on a volume from job Y because of a duplicate, and now some user has purged the volume that contains job Y?

John
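The purge problem raised here is essentially a reference-tracking problem. One possible way to detect it, sketched in Python under the assumption that the catalog knows which job references which stored copy, is shown below. `DedupCatalog` and its methods are hypothetical names, not part of Bacula.

```python
from collections import defaultdict


class DedupCatalog:
    """Hypothetical catalog tracking which jobs rely on which volume."""

    def __init__(self):
        self.home_volume = {}          # digest -> volume holding the single stored copy
        self.users = defaultdict(set)  # digest -> jobs that reference that copy

    def record(self, job, volume, digest):
        """Register a file for `job`; return True if its data must be written."""
        is_new = digest not in self.home_volume
        if is_new:
            self.home_volume[digest] = volume
        self.users[digest].add(job)
        return is_new

    def jobs_blocking_purge(self, volume, jobs_being_purged):
        """Jobs that would lose data if `volume` were purged."""
        blocked = set()
        for digest, home in self.home_volume.items():
            if home == volume:
                blocked |= self.users[digest] - set(jobs_being_purged)
        return blocked
```

In the scenario above, job Y writes the file to its volume and job X only references it; asking `jobs_blocking_purge` about job Y's volume then reports job X, so the purge could be refused or the data copied first.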
Re: [Bacula-users] [Storage-Space usage] Prevent duplicate file-backups?
John Drescher wrote:
> One is do we only search (for dups) in the current backup job or volume
> or do we include other backups and other volumes.

My opinion is that, as a backup solution, we could check for duplicates only within one volume, though one volume could hold data from several file daemons. Otherwise we will have problems with volume retention, and this is true mainly for removable storage (tapes, external drives...).

With file system storage we could use an algorithm similar to CDP to limit the number or the age of the copies held in storage, and because a file system is randomly accessible and always available, it will be easy to copy data. Or, if database-type storage is used (why not), we could just create/delete links to a row.

Searching for duplicates would just mean comparing a checksum. This could be done fast in SQL with b-tree indexes (I think), and if a match is found, the file is not transmitted over the network, just the relevant info (filename, location, permissions etc...).

--
Hristo Benev
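An indexed checksum lookup limited to a single volume, as suggested in this message, might look like the following sketch. It uses Python's built-in `sqlite3` module with an in-memory database purely for illustration; the table `stored_file`, its columns, and the helper functions are assumptions and do not reflect Bacula's catalog schema.

```python
import sqlite3


def open_catalog():
    """Create an in-memory catalog with an index on (volume, digest)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE stored_file ("
        " id INTEGER PRIMARY KEY,"
        " volume TEXT NOT NULL,"
        " digest TEXT NOT NULL,"
        " path TEXT NOT NULL)"
    )
    # SQLite implements indexes as B-trees, so this lookup avoids a full scan.
    conn.execute("CREATE INDEX idx_volume_digest ON stored_file (volume, digest)")
    return conn


def find_on_volume(conn, volume, digest):
    """Return the id of a stored copy on this volume, or None."""
    row = conn.execute(
        "SELECT id FROM stored_file WHERE volume = ? AND digest = ? LIMIT 1",
        (volume, digest),
    ).fetchone()
    return row[0] if row else None


def add_file(conn, volume, digest, path):
    """Return (row_id, data_needed): data is needed only for new content."""
    existing = find_on_volume(conn, volume, digest)
    if existing is not None:
        return existing, False
    cursor = conn.execute(
        "INSERT INTO stored_file (volume, digest, path) VALUES (?, ?, ?)",
        (volume, digest, path),
    )
    return cursor.lastrowid, True
```

Because the lookup is keyed on the volume as well as the digest, the same content on a different volume is treated as new, which keeps each volume self-contained for retention purposes. A fuller version would also record the metadata of the duplicate (filename, location, permissions) against the existing row.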
[Bacula-users] [Storage-Space usage] Prevent duplicate file-backups?
Hello list!

I could not find hints towards this in the manual, and wonder if this can be done with Bacula: I would like to prevent duplicate files from being backed up to the Storage.

For example: User A has a zip file in his Download folder, and the same zip file in his user folder. Can I prevent Bacula from backing up both files, or can it be configured to recognise the file as already backed up and drop the 2nd copy (like, noting that the same file is in location B, but can be fetched from location A if needed)?

Any information on this would be highly appreciated. Thanks in advance!

Regards,
Jens
Re: [Bacula-users] [Storage-Space usage] Prevent duplicate file-backups?
Jens Classen wrote:
> I would like to prevent duplicate files from being backed up to the
> Storage.

If this is not possible with the current version, it is a very good feature request.

Probably this can be done with md5sum'ing - works even if files are renamed, and just linking files in catalog... Yes, it will require a little bit more processing power, but it could save a lot of space.

--
Hristo Benev
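To make the "md5sum plus links in the catalog" idea concrete, here is a small Python sketch that walks a directory tree, keeps one stored copy per checksum, and records every other path as a link to that copy. It is an illustration only, not how Bacula works: `file_md5`, `build_catalog` and `restore_source` are hypothetical names, and the sketch relies on the checksum alone, so the collision concern raised elsewhere in the thread still applies.

```python
import hashlib
import os


def file_md5(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_catalog(root):
    """Walk `root` and return (stored, links).

    stored: digest -> path of the one copy whose data is kept
    links:  path -> digest, one entry for every file seen
    """
    stored, links = {}, {}
    for folder, dirnames, filenames in os.walk(root):
        dirnames.sort()  # visit subfolders in a predictable order
        for name in sorted(filenames):
            path = os.path.join(folder, name)
            digest = file_md5(path)
            stored.setdefault(digest, path)  # the first copy seen holds the data
            links[path] = digest
    return stored, links


def restore_source(stored, links, path):
    """Path whose stored data would be used to restore `path`."""
    return stored[links[path]]
```

With a zip file present in both a Download folder and a user folder, `stored` ends up with a single entry for it, and `restore_source` maps the second location back to the first, which matches the "fetch from location A if needed" behaviour asked about in the original question.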