On Thu, 2012-05-24 at 12:29 +0200, Tom Kazimiers wrote:
> Hi,
>
> On 22.05.2012 13:02, Patrick Ohly wrote:
> > On Tue, 2012-05-22 at 11:53 +0200, Tom Kazimiers wrote:
> >> Hi Patrick,
> >>
> >> On 22.05.2012 09:18, Patrick Ohly wrote:
> >>> On Tue, 2012-05-22 at 00:53 +0200, Tom Kazimiers wrote:
> > I have an inkling of what it could be. Are these duplicates byte-identical?
>
> Yes, they are.
>
> > The before/after comparisons are based on copying all items into the
> > session directories. To save space, hard links are used between
> > identical files. I suspect that this de-duplication is a bit too
> > aggressive and fails to reproduce your duplicates, although they are
> > still in your real data set.
>
> Sounds reasonable. If you have a commit to try out for me, I could give
> you some feedback on whether that was the problem.
I was able to reproduce and fix the problem. As I suspected,
byte-identical duplicates were the trigger. Syncing and the creation of
the data dumps work fine; only the dump comparison code in synccompare
was faulty. It made the incorrect assumption that each inode in a dump
is used only once.
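
To illustrate the faulty assumption, here is a stand-alone sketch (file
names and content are made up, not actual SyncEvolution code): two
byte-identical items become hard links to the same inode, so a map that
stores a single filename per inode silently drops one of them.

#!/usr/bin/env perl
# Two hard links share one inode, so the second stat() overwrites
# the first entry in an inode-keyed map.
use strict;
use warnings;
use File::Temp qw(tempdir);

my $dir = tempdir(CLEANUP => 1);
open(my $fh, ">", "$dir/item1.vcf") or die "$dir/item1.vcf: $!";
print $fh "BEGIN:VCARD\nEND:VCARD\n";
close($fh);
# de-duplication in the session directories uses hard links
link("$dir/item1.vcf", "$dir/item2.vcf") or die "link: $!";

my %files;
foreach my $entry ("item1.vcf", "item2.vcf") {
    my $inode = (stat("$dir/$entry"))[1];
    $files{$inode} = $entry;   # second duplicate replaces the first
}
print scalar(keys %files), " of 2 items tracked\n";   # prints "1 of 2"
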
Attached is a patch. It should apply (with some fuzz) against the
compiled synccompare in a SyncEvolution distribution. I've included it
in an automated test run and will commit to master soon.
--
Best Regards, Patrick Ohly
The content of this message is my personal opinion only and although
I am an employee of Intel, the statements I make here in no way
represent Intel's position on the issue, nor am I authorized to speak
on behalf of Intel on this matter.
diff --git a/test/synccompare.pl b/test/synccompare.pl
index a46f70d..470e5e8 100644
--- a/test/synccompare.pl
+++ b/test/synccompare.pl
@@ -766,7 +766,7 @@ if($#ARGV > 1) {
# Both "files" are really directories of individual files.
# Don't include files in the comparison which are known
# to be identical because they refer to the same inode.
- # - build map from inode to filename
+ # - build map from inode to filename(s) (each inode might be used more than once!)
my %files1;
my %files2;
my @content1;
@@ -778,7 +778,10 @@ if($#ARGV > 1) {
foreach $entry (grep { -f "$file1/$_" } readdir($dh)) {
$fullname = "$file1/$entry";
$inode = (stat($fullname))[1];
- $files1{$inode} = $entry;
+ if (!$files1{$inode}) {
+ $files1{$inode} = [];
+ }
+ push(@{$files1{$inode}}, $entry);
}
closedir($dh);
# - remove common files, read others
@@ -786,18 +789,21 @@ if($#ARGV > 1) {
foreach $entry (grep { -f "$file2/$_" } readdir($dh)) {
$fullname = "$file2/$entry";
$inode = (stat($fullname))[1];
- if ($files1{$inode}) {
- delete $files1{$inode};
+ if ($files1{$inode} && @{$files1{$inode}}) {
+ # arbitrarily match against the last file with this inode
+ pop @{$files1{$inode}};
} else {
open(IN, "<:utf8", "$fullname") || die "$fullname: $!";
push @content2, <IN>;
}
}
# - read remaining entries from first dir
- foreach $entry (values %files1) {
- $fullname = "$file1/$entry";
- open(IN, "<:utf8", "$fullname") || die "$fullname: $!";
- push @content1, <IN>;
+ foreach my $array (values %files1) {
+ foreach $entry (@{$array}) {
+ $fullname = "$file1/$entry";
+ open(IN, "<:utf8", "$fullname") || die "$fullname: $!";
+ push @content1, <IN>;
+ }
}
my $content1 = join("", @content1);
my $content2 = join("", @content2);
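
For completeness, the same scenario run through the patched logic
(again a stand-alone sketch with a made-up inode number, not the real
synccompare code): each file in the second dump now consumes exactly
one filename from the inode's list, so a surplus byte-identical copy
stays behind and shows up in the comparison.

#!/usr/bin/env perl
# An inode now maps to a list of filenames; each match pops one entry.
use strict;
use warnings;

my %files1;
foreach my $entry ("item1.vcf", "item2.vcf") {   # two duplicates, one inode
    push(@{$files1{1234}}, $entry);
}

# The second dump contains only one copy; matching it consumes one entry:
my $inode = 1234;
if ($files1{$inode} && @{$files1{$inode}}) {
    pop @{$files1{$inode}};
}

# The surviving duplicate is still reported as a difference:
print scalar(@{$files1{1234}}), " unmatched duplicate(s)\n";   # prints 1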