Bug#284274: Re: Bug#284274: Patch for the hardlink replacement bug request

2013-03-10 Thread Paul Seelig
Hi,

for the time being, it would probably be much more reasonable to limit
that function to the current local filesystem only, instead of trying to
crack a nut with a sledgehammer.

Finding duplicates across filesystems should be considered a special
use case which could, or should, be handled separately.

In the meantime, lots of users (not only yours truly) would be very
happy, if not delighted, to be able to deduplicate files via
hardlinking within the boundaries of a single filesystem.

Best regards,
Paul

On 06/21/2012 02:24 AM, Javier Fernandez-Sanguino wrote:
> I still have to investigate if detecting files across different
> filesystems is something easy or not, but the approach above should
> work (although there are some race conditions)
>
> Regards
>
> Javier


-- 
To UNSUBSCRIBE, email to debian-bugs-dist-requ...@lists.debian.org
with a subject of unsubscribe. Trouble? Contact listmas...@lists.debian.org



Bug#284274: Patch for the hardlink replacement bug request

2012-10-21 Thread Albert Cahalan
Getting rid of the only race condition that matters:

Create the link first, with an unused name. Instead of
relying on the return code, which may be wrong for NFS,
call stat to find out if you created the file. Rename the
link over top of the file it is intended to replace. In case
that fails, remove the temporary file.

I suggest temporary names that look like these:

.fdupes-vtYoH1PGPa4lj^LIOfL_i~
.fdupes-wz_7uNXC2R4-ftNq-gl,Z~
.fdupes-kf9_9EQmw-v0nv_-HcyKS~
.fdupes-BTR6AlGWjz@rVSC^+@j+-~
.fdupes--SaeXuxNfj1U0mltgmWNN~

(dotfile, string fdupes, 128 random bits, tilde on end,
and nothing that would be likely to trip up a bash shell)
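Under the assumption that rename(2) atomically replaces its target on POSIX filesystems, the scheme above might be sketched as follows. The helper name is made up for illustration, and the temporary-name scheme is simplified (a suffix on the duplicate's own path rather than a dotfile with 128 random bits), which also keeps the temporary on the same filesystem as the file it replaces:

```c
#include <stdio.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper, not fdupes code: make `dupe` a hardlink to
 * `keep` with no window in which `dupe` is missing. */
static int replace_with_hardlink(const char *keep, const char *dupe)
{
    char tmp[4096];
    struct stat ks, ts;

    /* Simplified temporary name next to the duplicate. */
    snprintf(tmp, sizeof(tmp), "%s.fdupes-%ld~", dupe, (long)getpid());

    /* Create the link first.  Don't rely on link(2)'s return code,
     * which may be wrong over NFS: stat both names and check that
     * they really are the same inode on the same device. */
    (void)link(keep, tmp);
    if (stat(keep, &ks) != 0 || stat(tmp, &ts) != 0 ||
        ks.st_dev != ts.st_dev || ks.st_ino != ts.st_ino) {
        unlink(tmp);
        return -1;              /* the link did not take effect */
    }

    /* rename(2) atomically replaces dupe; on failure, clean up. */
    if (rename(tmp, dupe) != 0) {
        unlink(tmp);
        return -1;
    }
    return 0;
}
```

Verifying with stat(2) instead of trusting link(2)'s return code is what sidesteps the NFS issue Albert mentions: a retransmitted request can make a successful link appear to have failed.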

Note that you can't hope to support all the crazy things
that exist in current and **future** kernels. There are
numerous security modules, mount --bind tricks such
as file-on-file mounting, union filesystems, network fs
servers running on non-Linux systems, and so on. You
have to draw the line somewhere; just make a note in
the man page that the tool is intended for single-user
use in non-crazy situations.

Perfection is the enemy of good; we need this option
working again. Right now I'm desperately rewriting this
tool as a pile of nasty shell scripts, and I assure you
that I totally don't care about cross-filesystem issues.





Bug#284274: Patch for the hardlink replacement bug request

2012-06-21 Thread Sandro Tosi
Hi,

On Thu, Jun 21, 2012 at 2:24 AM, Javier Fernandez-Sanguino
j...@computer.org wrote:
> Maybe you did not understand the proposed algorithm: no copies are
> involved, just file renaming.

Ah, indeed, I didn't understand that; it's much better than what I
had understood from the first reply (delete + try to hardlink; if that
fails, copy the files back from another location).

Thanks for clarifying and working on it.
-- 
Sandro Tosi (aka morph, morpheus, matrixhasu)
My website: http://matrixhasu.altervista.org/
Me at Debian: http://wiki.debian.org/SandroTosi






Bug#284274: Patch for the hardlink replacement bug request

2012-06-20 Thread Sandro Tosi
On Mon, Jun 18, 2012 at 7:56 PM, Javier Fernández-Sanguino Peña
j...@computer.org wrote:
> It might not be too difficult to introduce a check in the patch that tries
> the hard link and, if it fails, it restores the file and complains. I'll see
> what I can do.

Having a working -L option would be awesome! I think that the above
algorithm would cause a lot of I/O if the files to link are big (on the
order of GBs). Maybe we could check whether the files are on two
different filesystems and not try to hardlink them? I know it can cause
a race condition, but that's probably better than copying gigs around.
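A minimal sketch of such a check, under the assumption that two paths are on the same filesystem exactly when stat(2) reports the same st_dev for both (the function name is illustrative, not from any fdupes patch):

```c
#include <sys/stat.h>

/* Hypothetical check: return 1 iff both paths are on the same
 * filesystem, 0 otherwise (including on stat errors, so the caller
 * refuses to hardlink when in doubt). */
static int same_filesystem(const char *a, const char *b)
{
    struct stat sa, sb;

    if (stat(a, &sa) != 0 || stat(b, &sb) != 0)
        return 0;               /* on error, don't attempt the link */
    return sa.st_dev == sb.st_dev;
}
```

Skipping cross-filesystem pairs this way avoids the data-loss path entirely, since link(2) would fail with EXDEV for them anyway.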

Cheers,
-- 
Sandro Tosi (aka morph, morpheus, matrixhasu)
My website: http://matrixhasu.altervista.org/
Me at Debian: http://wiki.debian.org/SandroTosi






Bug#284274: Patch for the hardlink replacement bug request

2012-06-20 Thread Javier Fernandez-Sanguino
Hi there,

Maybe you did not understand the proposed algorithm: no copies are
involved, just file renaming. I was thinking more along the lines of
doing this:

IF A and B are the same THEN
- move B to a temporary file (predefined name or random name, as long
  as the file does not exist)
- create a hardlink B to A
IF the hard link fails THEN
   * restore the temporary file to B (rename, not copy) and complain
ELSE
   * remove the temporary file and declare success
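The steps above might look like this in C. The helper name and the fixed backup suffix are illustrative assumptions; a real patch would generate a temporary name that is guaranteed not to exist:

```c
#include <stdio.h>
#include <unistd.h>

/* Hypothetical sketch of the rename-based algorithm: make `b` a
 * hardlink to `a`, restoring `b` from its backup if the link fails. */
static int link_with_backup(const char *a, const char *b)
{
    char tmp[4096];

    /* Move B aside under a name assumed not to exist yet. */
    snprintf(tmp, sizeof(tmp), "%s.fdupes-backup~", b);
    if (rename(b, tmp) != 0)
        return -1;

    /* Create a hardlink B -> A. */
    if (link(a, b) != 0) {
        rename(tmp, b);         /* failed: restore B and complain */
        return -1;
    }

    unlink(tmp);                /* success: remove the backup */
    return 0;
}
```

Because the failure path is a rename rather than a copy, no file data ever moves, which is what makes this safe even for multi-gigabyte duplicates.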

I still have to investigate whether detecting files across different
filesystems is easy or not, but the approach above should work
(although there are some race conditions).


Regards

Javier






Bug#284274: Patch for the hardlink replacement bug request

2012-06-18 Thread Javier Fernández-Sanguino Peña
On Sun, Jun 17, 2012 at 09:40:49PM +0200, Sandro Tosi wrote:
> On Fri, Jul 31, 2009 at 12:56 AM, Javier Fernández-Sanguino Peña
> j...@computer.org wrote:
>
>> tags 284274 patch
>> thanks
>>
>> Attached is a patch to the program sources (through the use of a dpatch patch
>> in the Debian package) that adds a new -L / --linkhard option to fdupes. This
>> option will replace all duplicate files with hardlinks which is useful in
>> order to reduce space.
>>
>> It has been tested only slightly, but the code looks (to me) about right.
>>
>> Please consider this patch and include it in the Debian package.
>
> As it turned out[1] this patch loses data if some of the files to
> replace are on different filesystems. I'm going to remove it from the
> package for now, but I'd be happy to evaluate a new patch.

It might not be too difficult to introduce a check in the patch that tries
the hard link and, if it fails, it restores the file and complains. I'll see
what I can do.

Regards

Javier




Bug#284274: Patch for the hardlink replacement bug request

2012-06-17 Thread Sandro Tosi
Resending to the bug too, now that the bug is reopened.

On Sun, Jun 17, 2012 at 9:40 PM, Sandro Tosi mo...@debian.org wrote:
> unarchive 284274
> reopen 284274
> thanks
>
> On Fri, Jul 31, 2009 at 12:56 AM, Javier Fernández-Sanguino Peña
> j...@computer.org wrote:
>
>> tags 284274 patch
>> thanks
>>
>> Attached is a patch to the program sources (through the use of a dpatch patch
>> in the Debian package) that adds a new -L / --linkhard option to fdupes. This
>> option will replace all duplicate files with hardlinks which is useful in
>> order to reduce space.
>>
>> It has been tested only slightly, but the code looks (to me) about right.
>>
>> Please consider this patch and include it in the Debian package.
>
> As it turned out[1] this patch loses data if some of the files to
> replace are on different filesystems. I'm going to remove it from the
> package for now, but I'd be happy to evaluate a new patch.
>
> [1] http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=677419
>
> Regards,
> --
> Sandro Tosi (aka morph, morpheus, matrixhasu)
> My website: http://matrixhasu.altervista.org/
> Me at Debian: http://wiki.debian.org/SandroTosi



-- 
Sandro Tosi (aka morph, morpheus, matrixhasu)
My website: http://matrixhasu.altervista.org/
Me at Debian: http://wiki.debian.org/SandroTosi






Bug#284274: Patch for the hardlink replacement bug request

2009-07-31 Thread Sandro Tosi
Hi Javier,

2009/7/31 Javier Fernández-Sanguino Peña j...@computer.org:

> tags 284274 patch
> thanks
>
> Attached is a patch to the program sources (through the use of a dpatch patch
> in the Debian package) that adds a new -L / --linkhard option to fdupes. This
> option will replace all duplicate files with hardlinks which is useful in
> order to reduce space.

Thanks for the patch!

> It has been tested only slightly, but the code looks (to me) about right.
>
> Please consider this patch and include it in the Debian package.

I've added upstream in the loop, so he can comment.

Hi Adrian,
a fellow Debian Developer sent me a patch for fdupes to replace
duplicate files with hardlinks. It would be nice if you could merge it
into the fdupes original source code.

Cheers,
-- 
Sandro Tosi (aka morph, morpheus, matrixhasu)
My website: http://matrixhasu.altervista.org/
Me at Debian: http://wiki.debian.org/SandroTosi
Bug#284274: Patch for the hardlink replacement bug request

2009-07-30 Thread Javier Fernández-Sanguino Peña

tags 284274 patch
thanks

Attached is a patch to the program sources (through the use of a dpatch patch
in the Debian package) that adds a new -L / --linkhard option to fdupes. This
option will replace all duplicate files with hardlinks which is useful in
order to reduce space.

It has been tested only slightly, but the code looks (to me) about right.

Please consider this patch and include it in the Debian package.

Regards

Javier
diff -Nru fdupes-1.50-PR2/debian/changelog fdupes-1.50-PR2-2/debian/changelog
--- fdupes-1.50-PR2/debian/changelog	2009-07-31 00:47:23.0 +0200
+++ fdupes-1.50-PR2-2/debian/changelog	2009-07-31 00:44:27.0 +0200
@@ -1,3 +1,11 @@
+fdupes (1.50-PR2-2.1) unstable; urgency=low
+
+  * debian/patches/50_bts284274_hardlinkreplace.dpatch created.
+- added -L / --linkhard to make fdupes replace files with hardlinks. Also
+  update the manual page; Closes: 284274
+
+ -- Javier Fernandez-Sanguino Pen~a j...@debian.org  Fri, 31 Jul 2009 00:43:11 +0200
+
 fdupes (1.50-PR2-2) unstable; urgency=low
 
   * debian/control
diff -Nru fdupes-1.50-PR2/debian/patches/00list fdupes-1.50-PR2-2/debian/patches/00list
--- fdupes-1.50-PR2/debian/patches/00list	2009-07-31 00:47:23.0 +0200
+++ fdupes-1.50-PR2-2/debian/patches/00list	2009-07-31 00:43:02.0 +0200
@@ -5,3 +5,4 @@
 20_bts447601_lfs_support
 30_bts481809_manpage_summarize
 40_bts511702_nohidden_support
+50_bts284274_hardlinkreplace
diff -Nru fdupes-1.50-PR2/debian/patches/50_bts284274_hardlinkreplace.dpatch fdupes-1.50-PR2-2/debian/patches/50_bts284274_hardlinkreplace.dpatch
--- fdupes-1.50-PR2/debian/patches/50_bts284274_hardlinkreplace.dpatch	1970-01-01 01:00:00.0 +0100
+++ fdupes-1.50-PR2-2/debian/patches/50_bts284274_hardlinkreplace.dpatch	2009-07-31 00:42:42.0 +0200
@@ -0,0 +1,228 @@
+#! /bin/sh /usr/share/dpatch/dpatch-run
+## 50_bts284274_hardlinkreplace.dpatch by  j...@debian.org
+##
+## All lines beginning with `## DP:' are a description of the patch.
+## DP: No description.
+
+@DPATCH@
+diff -urNad fdupes-1.50-PR2~/fdupes.1 fdupes-1.50-PR2/fdupes.1
+--- fdupes-1.50-PR2~/fdupes.1	2009-07-31 00:38:28.0 +0200
++++ fdupes-1.50-PR2/fdupes.1	2009-07-31 00:42:14.0 +0200
+@@ -58,10 +58,17 @@
+ .B CAVEATS
+ below)
+ .TP
++.B -L --hardlink
++replace all duplicate files with hardlinks to the
++first file in each set of duplicates
++.TP
+ .B -N --noprompt
+ when used together with \-\-delete, preserve the first file in each
+ set of duplicates and delete the others without prompting the user 
+ .TP
++.B -D --debug
++provide debugging information
++.TP
+ .B -v --version
+ display fdupes version
+ .TP
+diff -urNad fdupes-1.50-PR2~/fdupes.c fdupes-1.50-PR2/fdupes.c
+--- fdupes-1.50-PR2~/fdupes.c	2009-07-31 00:38:28.0 +0200
++++ fdupes-1.50-PR2/fdupes.c	2009-07-31 00:41:08.0 +0200
+@@ -53,6 +53,8 @@
+ #define F_NOPROMPT  0x0400
+ #define F_SUMMARIZEMATCHES  0x0800
+ #define F_EXCLUDEHIDDEN 0x1000
++#define F_HARDLINKFILES 0x2000
++#define F_DEBUGINFO 0x4000
+ 
+ char *program_name;
+ 
+@@ -881,6 +883,88 @@
+   free(preservestr);
+ }
+ 
++void hardlinkfiles(file_t *files, int debug)
++{
++  int counter;
++  int groups = 0;
++  int curgroup = 0;
++  file_t *tmpfile;
++  file_t *curfile;
++  file_t **dupelist;
++  int max = 0;
++  int x = 0;
++
++  curfile = files;
++  
++  while (curfile) {
++if (curfile->hasdupes) {
++  counter = 1;
++  groups++;
++
++  tmpfile = curfile->duplicates;
++  while (tmpfile) {
++	counter++;
++	tmpfile = tmpfile->duplicates;
++  }
++
++  if (counter > max) max = counter;
++}
++
++curfile = curfile->next;
++  }
++
++  max++;
++
++  dupelist = (file_t**) malloc(sizeof(file_t*) * max);
++
++  if (!dupelist) {
++errormsg("out of memory\n");
++exit(1);
++  }
++
++  while (files) {
++if (files->hasdupes) {
++  curgroup++;
++  counter = 1;
++  dupelist[counter] = files;
++
++  if (debug) printf("[%d] %s\n", counter, files->d_name);
++
++  tmpfile = files->duplicates;
++
++  while (tmpfile) {
++	dupelist[++counter] = tmpfile;
++	if (debug) printf("[%d] %s\n", counter, tmpfile->d_name);
++	tmpfile = tmpfile->duplicates;
++  }
++
++  if (debug) printf("\n");
++
++  /* preserve only the first file */
++
++  printf("   [+] %s\n", dupelist[1]->d_name);
++  for (x = 2; x <= counter; x++) {
++	  if (unlink(dupelist[x]->d_name) == 0) {
++if ( link(dupelist[1]->d_name, dupelist[x]->d_name) == 0 ) {
++printf("   [h] %s\n", dupelist[x]->d_name);
++} else {
++printf("-- unable to create a hardlink for the file: %s\n", strerror(errno));
++printf("   [!] %s ", dupelist[x]->d_name);
++}
++	  } else {
++	printf("   [!] %s ", dupelist[x]->d_name);
++	printf("-- unable to delete the file!\n");
++	  }
++	}
++  printf("\n");
++}
++
++files = files->next;
++  }
++
++