Hi,

Two points.
1. zfs diff knows which blocks changed, but it has to look up the filename 
(or multiple filenames) for each block. This is a computationally heavy 
operation and can take a very long time.
2. A snapshot can consist of only metadata changes, such as atime updates. So 
although the snapshot contains a few KB or MB of changes, the diff doesn't show 
any changed data.

You might try rsync to find the differences between snapshots. The following 
dry run lists the changes between two snapshots on my disk.

# time rsync -an -v --stats --delete \
    /data/jails/_ports/.zfs/snapshot/repl-2025-02-12_00-00/ \
    /data/jails/_ports/.zfs/snapshot/test/
sending incremental file list
./
MOVED
UPDATING
.git/
.git/FETCH_HEAD
... [snip long list of files] ...
x11/xwayland-satellite/distinfo
x11/yakuake/
x11/yakuake/distinfo

Number of files: 349,690 (reg: 283,943, dir: 65,236, link: 511)
Number of created files: 145 (reg: 120, dir: 25)
Number of deleted files: 267 (reg: 207, dir: 60)
Number of regular files transferred: 2,342
Total file size: 6,873,671,941 bytes
Total transferred file size: 43,376,750 bytes
Literal data: 0 bytes
Matched data: 0 bytes
File list size: 524,271
File list generation time: 0.006 seconds
File list transfer time: 0.000 seconds
Total bytes sent: 10,499,359
Total bytes received: 92,348

sent 10,499,359 bytes  received 92,348 bytes  282,445.52 bytes/sec
total size is 6,873,671,941  speedup is 648.97 (DRY RUN)

real    0m37.316s
user    0m27.882s
sys    0m33.888s
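
If only the created, modified and deleted paths are of interest, the dry run 
is easier to post-process with rsync's -i (--itemize-changes) option. Below is 
a minimal sketch, not something taken from the run above: classify_changes is 
a hypothetical helper name, and it assumes the 11-character itemize format 
printed by rsync 3.x.

```sh
#!/bin/sh
# Hypothetical helper: reads `rsync -n -i` (itemize-changes) output on stdin
# and prints "<kind><TAB><path>" for regular files only.
# Assumes the 11-character itemize strings of rsync 3.x.
classify_changes() {
    awk '
        # "*deleting   path": exists only in the destination tree
        /^[*]deleting / { sub(/^[*]deleting +/, ""); print "deleted\t" $0; next }
        # ">f+++++++++ path": regular file missing from the destination
        /^>f[+]/        { sub(/^[^ ]+ /, "");        print "created\t" $0; next }
        # ">f.st...... path": regular file that differs (size, mtime, ...)
        /^>f/           { sub(/^[^ ]+ /, "");        print "modified\t" $0; next }
        # directories, symlinks and attribute-only lines are ignored
    '
}
```

Usage would look like "rsync -an -i --delete SRC/ DST/ | classify_changes". 
rsync reports what it would change in DST to make it match SRC, so with the 
newer snapshot as SRC, "created" means present only in the newer snapshot and 
"deleted" means present only in the older one. Note that without --checksum 
rsync compares only size and modification time.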


This is O(#files): it checks every file. zfs diff might only check the 
changed blocks, but it needs to do an O(#changes * #files) lookup [1]. So 
which one is faster depends on your situation.

NB: quite a lot of information about the speed of zfs diff turns up when I 
google "zfs diff takes too long".
[1] 
https://zfsonlinux.topicbox.com/groups/zfs-discuss/T3d7c034221b1220a-Mac8fdaa32ad829183baad855
https://bugs.launchpad.net/ubuntu/+source/zfs-linux/+bug/1881748
https://github.com/openzfs/zfs/pull/12837
https://github.com/openzfs/zfs/pull/10391
https://github.com/openzfs/zfs/issues/6920

Regards,
Ronald.


From: "Eugene M. Zheganin" <[email protected]>
Date: Wednesday, 12 February 2025 17:41
To: [email protected]
Subject: zfs diff

Hello,

I have a 13.2-RELEASE-p3 system with a large storage attached:

===Cut===

NAME    SIZE  ALLOC   FREE  CKPOINT  EXPANDSZ   FRAG    CAP DEDUP    HEALTH  ALTROOT
tank    135T   122T  12.8T        -         -    66%    90% 1.27x    ONLINE  -
zroot  31.5G  27.8G  3.74G        -         -    70%    88% 1.00x    ONLINE  -

===Cut===

In order to process newly incoming files, I'd like to use the zfs diff 
functionality to get the list of files created or modified. So I wrote a 
simple script (/root/periodic/zfsdiff) that diffs two snapshots of a dataset, 
today's and yesterday's. Most of these runs work fine, but not all of them. 
Some (around 15%) just wait indefinitely for something while seemingly doing 
nothing:

===Cut===

39935  -  I      18101:52,29 zfs diff tank/data/tank2@2025-01-20 tank/data/tank2@2025-01-21
46118  -  Is         0:00,00 /bin/sh /root/periodic/zfsdiff
46126  -  I        354:34,75 zfs diff tank/data/tank0@2025-02-03 tank/data/tank0@2025-02-04
49620  -  I       2155:14,42 zfs diff tank/data/tank1@2025-02-10 tank/data/tank1@2025-02-11
53243  -  Is         0:00,00 /bin/sh /root/periodic/zfsdiff
53255  -  I       3607:34,83 zfs diff tank/data/tank0@2025-02-09 tank/data/tank0@2025-02-10
56849  -  Is         0:00,00 /bin/sh /root/periodic/zfsdiff
59725  -  I       3630:23,01 zfs diff tank/data/tank2@2025-01-27 tank/data/tank2@2025-01-28
65460  -  I       1425:25,55 zfs diff tank/data/tank1@2025-02-03 tank/data/tank1@2025-02-04
82371  -  I        111:25,63 zfs diff tank/data/tank3@2025-02-11 tank/data/tank3@2025-02-12
98172  -  Is         0:00,00 /bin/sh /root/periodic/zfsdiff
98223  -  I       4792:11,99 zfs diff tank/data/tank3@2025-02-04 tank/data/tank3@2025-02-05
40589  2  IN     18108:48,07 zfs diff tank/data/tank2@2025-01-20 tank/data/tank2@2025-01-21
28649  6  I+       471:24,81 zfs diff tank/data/tank1@2025-02-03

===Cut===

Surprisingly, this has little to no correlation with the size of the snapshot. 
For instance, here is a relatively small snapshot pair whose diff fails to 
complete (notice the idle process above):

===Cut===

tank/data/tank1@2025-02-03                       31.6M      - 16.0T  -
tank/data/tank1@2025-02-04                       32.5M      - 16.0T  -

===Cut===

Also, some of these runs produce no output at all, with no trace of the script 
being killed or crashing, which is very suspicious as well. You could say this 
probably means there were no changes, but the snapshot size indicates there 
were some.

Is there a trick to this? Does this look like a race condition? Do I have to 
run these sequentially, one diff at a time? Can they interfere only with 
their fellow diffs, or also with snapshot creation?


Thanks.

Eugene.



