Bug#284274: Re: Bug#284274: Patch for the hardlink replacement bug request
Hi,

for the time being, it would probably be much more reasonable to limit that function to the current local filesystem only, instead of trying to crack a nut with a sledgehammer. Finding duplicates across filesystems should be considered a special use case that could/should be handled separately.

In the meantime, lots of users (not only yours truly) would be very happy, if not delighted, to be able to deduplicate files via hardlinking within the boundaries of a single filesystem.

Best regards,
Paul

On 06/21/2012 02:24 AM, Javier Fernandez-Sanguino wrote:
> I still have to investigate if detecting files across different
> filesystems is something easy or not, but the approach above should
> work (although there are some race conditions)
>
> Regards
>
> Javier

-- 
To UNSUBSCRIBE, email to debian-bugs-dist-requ...@lists.debian.org
with a subject of "unsubscribe". Trouble? Contact listmas...@lists.debian.org
Bug#284274: Patch for the hardlink replacement bug request
Getting rid of the only race condition that matters:

1. Create the link first, with an unused name.
2. Instead of relying on the return code, which may be wrong for NFS, call stat to find out if you created the file.
3. Rename the link over top of the file it is intended to replace.
4. In case that fails, remove the temporary file.

I suggest temporary names that look like these:

.fdupes-vtYoH1PGPa4lj^LIOfL_i~
.fdupes-wz_7uNXC2R4-ftNq-gl,Z~
.fdupes-kf9_9EQmw-v0nv_-HcyKS~
.fdupes-BTR6AlGWjz@rVSC^+@j+-~
.fdupes--SaeXuxNfj1U0mltgmWNN~

(dotfile, the string "fdupes", 128 random bits, a tilde on the end, and nothing that would be likely to trip up a bash shell)

Note that you can't hope to support all the crazy things that exist in current and **future** kernels. There are numerous security modules, mount --bind tricks such as file-on-file mounting, union filesystems, network fs servers running on non-Linux systems, and so on. You have to draw the line somewhere; just make a note in the man page that the tool is intended for single-user use in non-crazy situations.

Perfection is the enemy of the good; we need this option working again. Right now I'm desperately rewriting this tool as a pile of nasty shell scripts, and I assure you that I totally don't care about cross-filesystem issues.
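The four steps above can be sketched in a few lines of C. This is only an illustration of the scheme, not a patch: `safe_hardlink` is a made-up name, and the fixed `.fdupes-tmp~` suffix stands in for the 128-random-bit names suggested above.

```c
#include <stdio.h>
#include <sys/stat.h>
#include <unistd.h>

/* Sketch: replace `dst` with a hardlink to `src` without ever being in a
 * state where dst's contents are lost. Illustrative names throughout. */
static int safe_hardlink(const char *src, const char *dst)
{
    char tmp[4096];
    struct stat st_src, st_tmp;

    /* Step 1: pick an unused temporary name in the same directory as dst
     * (real code should use 128 random bits here). */
    if (snprintf(tmp, sizeof(tmp), "%s.fdupes-tmp~", dst) >= (int)sizeof(tmp))
        return -1;                       /* name too long */
    unlink(tmp);                         /* ignore result: tmp may not exist */

    /* Create the link first; the return code may be wrong over NFS,
     * so it is deliberately not trusted here. */
    (void)link(src, tmp);

    /* Step 2: stat both names to find out whether the link was created. */
    if (stat(src, &st_src) != 0 || stat(tmp, &st_tmp) != 0 ||
        st_src.st_dev != st_tmp.st_dev || st_src.st_ino != st_tmp.st_ino) {
        unlink(tmp);                     /* the link was not created */
        return -1;
    }

    /* Step 3: rename the link over the file it replaces (atomic). */
    if (rename(tmp, dst) != 0) {
        unlink(tmp);                     /* step 4: clean up on failure */
        return -1;
    }
    return 0;
}
```

Because the rename happens last, a failure at any step leaves the original file intact, which is the whole point of this ordering.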
Bug#284274: Patch for the hardlink replacement bug request
Hi,

On Thu, Jun 21, 2012 at 2:24 AM, Javier Fernandez-Sanguino <j...@computer.org> wrote:
> Maybe you did not understand the proposed algorithm, no copies are
> involved, just file renaming.

Ah indeed, I didn't understand that - that's much better than what I had understood from the first reply (delete + try to hardlink; if that fails, copy the files back from another location). Thanks for clarifying and working on it.

-- 
Sandro Tosi (aka morph, morpheus, matrixhasu)
My website: http://matrixhasu.altervista.org/
Me at Debian: http://wiki.debian.org/SandroTosi
Bug#284274: Patch for the hardlink replacement bug request
On Mon, Jun 18, 2012 at 7:56 PM, Javier Fernández-Sanguino Peña <j...@computer.org> wrote:
> It might not be too difficult to introduce a check in the patch that
> tries the hard link and, if it fails, it restores the file and
> complains. I'll see what I can do.

Having a working -L option would be awesome! I think that the above algorithm would cause a lot of I/O if the files to link are big (on the order of GBs). Maybe we can check whether the files are on two different filesystems and not try to hardlink them? I know it can cause a race condition, but that's probably better than copying gigabytes around.

Cheers,
-- 
Sandro Tosi (aka morph, morpheus, matrixhasu)
My website: http://matrixhasu.altervista.org/
Me at Debian: http://wiki.debian.org/SandroTosi
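The cross-filesystem check proposed here is cheap to sketch: two paths are on the same filesystem exactly when stat(2) reports the same `st_dev` for both. A minimal illustration (`same_filesystem` is an assumed name, not fdupes code):

```c
#include <sys/stat.h>

/* Sketch: compare st_dev from stat(2) to tell whether two paths live on
 * the same filesystem, so a tool could skip hardlinking across mounts. */
static int same_filesystem(const char *p1, const char *p2)
{
    struct stat s1, s2;

    if (stat(p1, &s1) != 0 || stat(p2, &s2) != 0)
        return -1;                     /* can't tell: report an error */
    return s1.st_dev == s2.st_dev;     /* 1 = same fs, 0 = different */
}
```

As noted in the thread, this check is racy (a mount can change between the stat and the link), so it is a heuristic to avoid wasted work, not a safety guarantee.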
Bug#284274: Patch for the hardlink replacement bug request
Hi there,

Maybe you did not understand the proposed algorithm: no copies are involved, just file renaming. I was thinking more along the lines of doing this:

IF A and B are the same THEN
  - move B to a temporary file (predefined name or random name, as long as the file does not exist)
  - create a hardlink B to A
  IF the hard link fails THEN
    * restore the temporary file to B (rename, not copy) and complain
  ELSE
    * remove the temporary file and declare success

I still have to investigate whether detecting files across different filesystems is easy or not, but the approach above should work (although there are some race conditions).

Regards

Javier
Bug#284274: Patch for the hardlink replacement bug request
On Sun, Jun 17, 2012 at 09:40:49PM +0200, Sandro Tosi wrote:
> On Fri, Jul 31, 2009 at 12:56 AM, Javier Fernández-Sanguino Peña
> <j...@computer.org> wrote:
>> tags 284274 patch
>> thanks
>>
>> Attached is a patch to the program sources (through the use of a dpatch
>> patch in the Debian package) that adds a new -L / --linkhard option to
>> fdupes. This option will replace all duplicate files with hardlinks,
>> which is useful in order to reduce space.
>>
>> It has been tested only slightly, but the code looks (to me) about
>> right. Please consider this patch and include it in the Debian package.
>
> As it turned out [1] this patch loses data if some of the files to
> replace are on different filesystems. I'm going to remove it from the
> package for now, but I'd be happy to evaluate a new patch.

It might not be too difficult to introduce a check in the patch that tries the hard link and, if it fails, restores the file and complains. I'll see what I can do.

Regards

Javier
Bug#284274: Patch for the hardlink replacement bug request
Resending to the bug too, now that the bug is reopened.

On Sun, Jun 17, 2012 at 9:40 PM, Sandro Tosi <mo...@debian.org> wrote:
> unarchive 284274
> reopen 284274
> thanks
>
> On Fri, Jul 31, 2009 at 12:56 AM, Javier Fernández-Sanguino Peña
> <j...@computer.org> wrote:
>> tags 284274 patch
>> thanks
>>
>> Attached is a patch to the program sources (through the use of a dpatch
>> patch in the Debian package) that adds a new -L / --linkhard option to
>> fdupes. This option will replace all duplicate files with hardlinks,
>> which is useful in order to reduce space.
>>
>> It has been tested only slightly, but the code looks (to me) about
>> right. Please consider this patch and include it in the Debian package.
>
> As it turned out [1] this patch loses data if some of the files to
> replace are on different filesystems. I'm going to remove it from the
> package for now, but I'd be happy to evaluate a new patch.
>
> [1] http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=677419

Regards,
-- 
Sandro Tosi (aka morph, morpheus, matrixhasu)
My website: http://matrixhasu.altervista.org/
Me at Debian: http://wiki.debian.org/SandroTosi
Bug#284274: Patch for the hardlink replacement bug request
Hi Javier,

2009/7/31 Javier Fernández-Sanguino Peña <j...@computer.org>:
> tags 284274 patch
> thanks
>
> Attached is a patch to the program sources (through the use of a dpatch
> patch in the Debian package) that adds a new -L / --linkhard option to
> fdupes. This option will replace all duplicate files with hardlinks,
> which is useful in order to reduce space.

Thanks for the patch!

> It has been tested only slightly, but the code looks (to me) about
> right. Please consider this patch and include it in the Debian package.

I've put upstream in the loop, so he can comment.

Hi Adrian, a fellow Debian Developer sent me a patch for fdupes to replace duplicate files with hardlinks. It would be nice if you could merge it into the fdupes original source code.

Cheers,
-- 
Sandro Tosi (aka morph, morpheus, matrixhasu)
My website: http://matrixhasu.altervista.org/
Me at Debian: http://wiki.debian.org/SandroTosi
Bug#284274: Patch for the hardlink replacement bug request
tags 284274 patch
thanks

Attached is a patch to the program sources (through the use of a dpatch patch in the Debian package) that adds a new -L / --linkhard option to fdupes. This option will replace all duplicate files with hardlinks, which is useful in order to reduce space.

It has been tested only slightly, but the code looks (to me) about right. Please consider this patch and include it in the Debian package.

Regards

Javier

diff -Nru fdupes-1.50-PR2/debian/changelog fdupes-1.50-PR2-2/debian/changelog
--- fdupes-1.50-PR2/debian/changelog	2009-07-31 00:47:23.0 +0200
+++ fdupes-1.50-PR2-2/debian/changelog	2009-07-31 00:44:27.0 +0200
@@ -1,3 +1,11 @@
+fdupes (1.50-PR2-2.1) unstable; urgency=low
+
+  * debian/patches/50_bts284274_hardlinkreplace.dpatch created.
+    - added -L / --linkhard to make fdupes replace files with hardlinks. Also
+      update the manual page; Closes: 284274
+
+ -- Javier Fernandez-Sanguino Pen~a <j...@debian.org>  Fri, 31 Jul 2009 00:43:11 +0200
+
 fdupes (1.50-PR2-2) unstable; urgency=low
 
   * debian/control
diff -Nru fdupes-1.50-PR2/debian/patches/00list fdupes-1.50-PR2-2/debian/patches/00list
--- fdupes-1.50-PR2/debian/patches/00list	2009-07-31 00:47:23.0 +0200
+++ fdupes-1.50-PR2-2/debian/patches/00list	2009-07-31 00:43:02.0 +0200
@@ -5,3 +5,4 @@
 20_bts447601_lfs_support
 30_bts481809_manpage_summarize
 40_bts511702_nohidden_support
+50_bts284274_hardlinkreplace
diff -Nru fdupes-1.50-PR2/debian/patches/50_bts284274_hardlinkreplace.dpatch fdupes-1.50-PR2-2/debian/patches/50_bts284274_hardlinkreplace.dpatch
--- fdupes-1.50-PR2/debian/patches/50_bts284274_hardlinkreplace.dpatch	1970-01-01 01:00:00.0 +0100
+++ fdupes-1.50-PR2-2/debian/patches/50_bts284274_hardlinkreplace.dpatch	2009-07-31 00:42:42.0 +0200
@@ -0,0 +1,228 @@
+#! /bin/sh /usr/share/dpatch/dpatch-run
+## 50_bts284274_hardlinkreplace.dpatch by j...@debian.org
+##
+## All lines beginning with `## DP:' are a description of the patch.
+## DP: No description.
+
+@DPATCH@
+diff -urNad fdupes-1.50-PR2~/fdupes.1 fdupes-1.50-PR2/fdupes.1
+--- fdupes-1.50-PR2~/fdupes.1	2009-07-31 00:38:28.0 +0200
++++ fdupes-1.50-PR2/fdupes.1	2009-07-31 00:42:14.0 +0200
+@@ -58,10 +58,17 @@
+ .B CAVEATS
+ below)
+ .TP
++.B -L --hardlink
++replace all duplicate files with hardlinks to the
++first file in each set of duplicates
++.TP
+ .B -N --noprompt
+ when used together with \-\-delete, preserve the first file in each
+ set of duplicates and delete the others without prompting the user
+ .TP
++.B -D --debug
++provide debugging information
++.TP
+ .B -v --version
+ display fdupes version
+ .TP
+diff -urNad fdupes-1.50-PR2~/fdupes.c fdupes-1.50-PR2/fdupes.c
+--- fdupes-1.50-PR2~/fdupes.c	2009-07-31 00:38:28.0 +0200
++++ fdupes-1.50-PR2/fdupes.c	2009-07-31 00:41:08.0 +0200
+@@ -53,6 +53,8 @@
+ #define F_NOPROMPT          0x0400
+ #define F_SUMMARIZEMATCHES  0x0800
+ #define F_EXCLUDEHIDDEN     0x1000
++#define F_HARDLINKFILES     0x2000
++#define F_DEBUGINFO         0x4000
+ 
+ char *program_name;
+ 
+@@ -881,6 +883,88 @@
+   free(preservestr);
+ }
+ 
++void hardlinkfiles(file_t *files, int debug)
++{
++  int counter;
++  int groups = 0;
++  int curgroup = 0;
++  file_t *tmpfile;
++  file_t *curfile;
++  file_t **dupelist;
++  int max = 0;
++  int x = 0;
++
++  curfile = files;
++
++  while (curfile) {
++    if (curfile->hasdupes) {
++      counter = 1;
++      groups++;
++
++      tmpfile = curfile->duplicates;
++      while (tmpfile) {
++        counter++;
++        tmpfile = tmpfile->duplicates;
++      }
++
++      if (counter > max) max = counter;
++    }
++
++    curfile = curfile->next;
++  }
++
++  max++;
++
++  dupelist = (file_t**) malloc(sizeof(file_t*) * max);
++
++  if (!dupelist) {
++    errormsg("out of memory\n");
++    exit(1);
++  }
++
++  while (files) {
++    if (files->hasdupes) {
++      curgroup++;
++      counter = 1;
++      dupelist[counter] = files;
++
++      if (debug) printf("[%d] %s\n", counter, files->d_name);
++
++      tmpfile = files->duplicates;
++
++      while (tmpfile) {
++        dupelist[++counter] = tmpfile;
++        if (debug) printf("[%d] %s\n", counter, tmpfile->d_name);
++        tmpfile = tmpfile->duplicates;
++      }
++
++      if (debug) printf("\n");
++
++      /* preserve only the first file */
++
++      printf("   [+] %s\n", dupelist[1]->d_name);
++      for (x = 2; x <= counter; x++) {
++        if (unlink(dupelist[x]->d_name) == 0) {
++          if ( link(dupelist[1]->d_name, dupelist[x]->d_name) == 0 ) {
++            printf("   [h] %s\n", dupelist[x]->d_name);
++          } else {
++            printf("-- unable to create a hardlink for the file: %s\n", strerror(errno));
++            printf("   [!] %s ", dupelist[x]->d_name);
++          }
++        } else {
++          printf("   [!] %s ", dupelist[x]->d_name);
++          printf("-- unable to delete the file!\n");
++        }
++      }
++      printf("\n");
++    }
++
++    files = files->next;
++  }
++
++