Re: [Bacula-users] Fix documentation on deduplication
On 25/04/2024 00:47, Martin Simmons wrote:
> On Wed, 24 Apr 2024 23:40:31 +1000, Gary R Schmidt said:
>> On 24/04/2024 22:33, Gary R. Schmidt wrote:
>>> You use zpool list to examine filespace.
>>
>> Or zfs list.
>
> On FreeBSD at least, zfs list will show the same as df (i.e. will include
> all copies of the deduplicated data in the USED column). I think the
> reason is that deduplication is done at the pool level, so there is no
> single definition of which dataset owns each deduplicated block. As a
> result, the duplicates have to be counted multiple times. This is
> different from a cloned dataset, where the original dataset owns any
> blocks that are shared.

That's correct, zfs list gives the logical filespace in use. Sorry.

If you do "zfs get used,compressratio filesystem" then you can play with the values returned...

$ for i in `zfs list -r zpool | sed 1d | awk '{print $1}'`
do
    zfs get used,compressratio $i | sed 1d
done

gives a list of very interesting numbers. :-)

Cheers,
GaryB-)
___
Bacula-users mailing list
Bacula-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/bacula-users
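For anyone who would rather collect those numbers in a script, here is a rough Python sketch of the same idea as the shell loop above. It assumes the scripted-output flags of zfs get (-H for tab-separated output without a header, -r to recurse, -t to restrict the dataset types, -o to pick the columns). The helper names and the sample text are made up for illustration and do not come from this thread.

```python
import subprocess


def run_zfs_get(pool: str) -> str:
    """Run zfs get in scripted mode and return its raw output.

    Assumption: the zfs binary is on PATH and the caller may query the pool.
    """
    cmd = ["zfs", "get", "-H", "-r", "-t", "filesystem,volume",
           "-o", "name,property,value", "used,compressratio", pool]
    return subprocess.run(cmd, check=True, capture_output=True, text=True).stdout


def parse_zfs_get(text: str) -> dict[str, dict[str, str]]:
    """Group tab-separated name/property/value lines by dataset name."""
    datasets: dict[str, dict[str, str]] = {}
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        name, prop, value = line.split("\t")[:3]
        datasets.setdefault(name, {})[prop] = value
    return datasets


# Made-up sample output, for illustration only (not taken from the thread).
SAMPLE = (
    "tank\tused\t850G\n"
    "tank\tcompressratio\t1.00x\n"
    "tank/mnt\tused\t2.9T\n"
    "tank/mnt\tcompressratio\t1.00x\n"
)

for name, props in parse_zfs_get(SAMPLE).items():
    print(name, props["used"], props["compressratio"])
```

On a real system you would pass run_zfs_get("yourpool") to parse_zfs_get instead of the sample string.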
Re: [Bacula-users] Fix documentation on deduplication
> On Wed, 24 Apr 2024 23:40:31 +1000, Gary R Schmidt said:
>
> On 24/04/2024 22:33, Gary R. Schmidt wrote:
> > You use zpool list to examine filespace.
>
> Or zfs list.

On FreeBSD at least, zfs list will show the same as df (i.e. will include all copies of the deduplicated data in the USED column).

I think the reason is that deduplication is done at the pool level, so there is no single definition of which dataset owns each deduplicated block. As a result, the duplicates have to be counted multiple times. This is different from a cloned dataset, where the original dataset owns any blocks that are shared.

__Martin
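To make the accounting argument above concrete, here is a toy Python model. It is only an illustration of the reasoning, not how ZFS actually tracks blocks: each dataset is a list of block identifiers, the per-dataset figure counts every reference, and the pool figure counts each distinct block once. The function name and the block labels are hypothetical.

```python
def toy_accounting(datasets: dict[str, list[str]]) -> tuple[dict[str, int], int]:
    """Return (blocks referenced per dataset, distinct blocks in the pool)."""
    per_dataset = {name: len(blocks) for name, blocks in datasets.items()}
    unique_blocks = {block for blocks in datasets.values() for block in blocks}
    return per_dataset, len(unique_blocks)


# Two datasets holding identical data: every block is shared.
per_dataset, pool_blocks = toy_accounting({
    "pool/a": ["b1", "b2", "b3"],
    "pool/b": ["b1", "b2", "b3"],
})

logical = sum(per_dataset.values())
print("per-dataset view:", per_dataset)
print("pool view:", pool_blocks, "blocks, ratio", logical / pool_blocks)
```

In this model neither dataset can be said to own the shared blocks, so each one reports all three, while the pool only stores three in total.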
Re: [Bacula-users] Fix documentation on deduplication
> On Wed, 24 Apr 2024 09:30:15 +0200, Radosław Korzeniewski said:
>
> I have not used ZFS Dedup for a long time (I'm a ZFS user from the first
> beta in Solaris), so I'm curious - if your zpool is 2TB in size and you
> have a 20:1 dedup ratio with 20TB saved and 1TB allocated, then what does
> df show for you?
> Something like this?
> Size: 2TB
> Used: 20TB
> Avail: 1TB
> Use%: 2000%

No, the Size will say 21TB in that situation (on FreeBSD at least).

__Martin
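The 21TB figure follows if df derives Size as Used plus Avail for a ZFS dataset. That is an assumption based on the statement above, though it also roughly matches the zfs/mnt line in Roberto's original df output (2,9T used plus 6,4T available, shown as 9,2T). A small Python sketch of that arithmetic, with a hypothetical function name:

```python
def df_view(used_tb: float, avail_tb: float) -> tuple[float, float]:
    """Return (size, use_percent), assuming size = used + avail."""
    size_tb = used_tb + avail_tb
    use_percent = 100.0 * used_tb / size_tb
    return size_tb, use_percent


# The hypothetical pool from the question: 20TB logical data, 1TB still free.
size, pct = df_view(20.0, 1.0)
print(f"Size: {size:.0f}TB  Used: 20TB  Avail: 1TB  Use%: about {pct:.0f}%")
```

So instead of an impossible 2000%, the usage percentage stays below 100% because the reported size grows along with the logical data.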
Re: [Bacula-users] Fix documentation on deduplication
On 24/04/2024 22:33, Gary R. Schmidt wrote:
> On 24/04/2024 21:30, Roberto Greiner wrote:
>> My actual numbers are:
>>
>> df: 2.9TB used
>> zpool list: 862GB used, 3.4x dedup level.
>> Actual partition size: 7.2TB
>
> You use zpool list to examine filespace.

Or zfs list.
Re: [Bacula-users] Fix documentation on deduplication
On 24/04/2024 21:30, Roberto Greiner wrote:
> No, the values are quite different. I wrote 20TB to stay with the
> example previously given. My actual numbers are:
>
> df: 2.9TB used
> zpool list: 862GB used, 3.4x dedup level.
> Actual partition size: 7.2TB

You use zpool list to examine filespace.

Cheers,
GaryB-)
Re: [Bacula-users] Fix documentation on deduplication
On 24/04/2024 04:30, Radosław Korzeniewski wrote:
> I have not used ZFS Dedup for a long time (I'm a ZFS user from the first
> beta in Solaris), so I'm curious - if your zpool is 2TB in size and you
> have a 20:1 dedup ratio with 20TB saved and 1TB allocated, then what does
> df show for you?
> Something like this?
> Size: 2TB
> Used: 20TB
> Avail: 1TB
> Use%: 2000%

No, the values are quite different. I wrote 20TB to stay with the example previously given. My actual numbers are:

df: 2.9TB used
zpool list: 862GB used, 3.4x dedup level.
Actual partition size: 7.2TB

Roberto

--
Marcos Roberto Greiner

Optimists think we are in the best of all worlds
Pessimists are afraid that this is true
James Branch Cabell
Re: [Bacula-users] Fix documentation on deduplication
Hello,

Tue, 23 Apr 2024 at 17:10, Martin Simmons wrote:
> > But that is the exact problem I had. df did NOT show 1TB allocated. It
> > indicated 20TB allocated (yes, in ZFS).
>
> Yes, that is how df works with ZFS unfortunately (it doesn't know about
> dedup). See also
>
> https://c0t0d0s0.org/oracle/solaris/english/2009/12/02/df-considered-problematic.c0t0d0s0.html

Thanks for the link. I was almost certain that df was working well for ZFS + Dedup when I used it.

Radek
--
Radosław Korzeniewski
rados...@korzeniewski.net
Re: [Bacula-users] Fix documentation on deduplication
Hello,

Tue, 23 Apr 2024 at 13:33, Roberto Greiner wrote:
> On 23/04/2024 04:34, Radosław Korzeniewski wrote:
> > This command (df -h) shows how much allocated and free space is available
> > on the filesystem. So when you have a dedup ratio 20:1, and you wrote 20TB,
> > then your df command shows 1TB allocated.
>
> But that is the exact problem I had. df did NOT show 1TB allocated. It
> indicated 20TB allocated (yes, in ZFS).

I have not used ZFS Dedup for a long time (I'm a ZFS user from the first beta in Solaris), so I'm curious - if your zpool is 2TB in size and you have a 20:1 dedup ratio with 20TB saved and 1TB allocated, then what does df show for you?
Something like this?
Size: 2TB
Used: 20TB
Avail: 1TB
Use%: 2000%

Radek
--
Radosław Korzeniewski
rados...@korzeniewski.net
Re: [Bacula-users] Fix documentation on deduplication
> On Tue, 23 Apr 2024 08:31:59 -0300, Roberto Greiner said:
>
> On 23/04/2024 04:34, Radosław Korzeniewski wrote:
> > This command (df -h) shows how much allocated and free space is
> > available on the filesystem. So when you have a dedup ratio 20:1, and
> > you wrote 20TB, then your df command shows 1TB allocated.
>
> But that is the exact problem I had. df did NOT show 1TB allocated. It
> indicated 20TB allocated (yes, in ZFS).

Yes, that is how df works with ZFS unfortunately (it doesn't know about dedup). See also

https://c0t0d0s0.org/oracle/solaris/english/2009/12/02/df-considered-problematic.c0t0d0s0.html

__Martin
Re: [Bacula-users] Fix documentation on deduplication
Hello,

Wed, 17 Apr 2024 at 14:01, Roberto Greiner wrote:
> The error is at the end of the page, where it says that you can see how
> much space is being used using 'df -h', but the problem is that df can't
> actually see the space gain from dedup, it shows how much would be used
> without dedup.

This command (df -h) shows how much allocated and free space is available on the filesystem. So when you have a dedup ratio of 20:1 and you wrote 20TB, your df command shows 1TB allocated.

Yes, zpool list shows you the exact dedup ratio achieved without additional checking or counting. But this command (as mentioned by Heitor) will work with ZFS only. Aligned volumes can be used with external deduplication appliances where the zpool command is unavailable. Then you can quickly check with the df -h command.

Radek
--
Radosław Korzeniewski
rados...@korzeniewski.net
Re: [Bacula-users] Fix documentation on deduplication
On 23/04/2024 04:34, Radosław Korzeniewski wrote:
> This command (df -h) shows how much allocated and free space is available
> on the filesystem. So when you have a dedup ratio 20:1, and you wrote
> 20TB, then your df command shows 1TB allocated.

But that is the exact problem I had. df did NOT show 1TB allocated. It indicated 20TB allocated (yes, in ZFS).

> Yes, zpool list shows you the exact Dedup ratio achieved without
> additional checking or counting. But this command (as mentioned by
> Heitor) will work with ZFS only. Aligned volumes can be used with
> external deduplication appliances where zpool command is unavailable.
> Then you can quickly check with the df -h command.

Yes, zpool list showed all the information properly, both the actually allocated space and the dedup ratio. As I said, with ZFS, df is not showing the correct information (on Ubuntu 22.04 with ZFS).

Thank you,
Roberto
Re: [Bacula-users] Fix documentation on deduplication
Hello Roberto,

This guide was written by me, and it is not part of the bacula.org project. That said, the step-by-step deployment was done using ddumbfs, although I briefly mention that ZFS could also be used.

Rgds.

MSc, MBA Heitor Faria (Miami/USA)
Bacula LATAM CIO
bacula.lat | bacula.com.br

From: "Roberto Greiner"
To: "bacula-users"
Sent: Wednesday, April 17, 2024 9:06 AM
Subject: [Bacula-users] Fix documentation on deduplication

Hi,

I've installed a Bacula system using ZFS deduplication on an Ubuntu 22.04 server, and one thing that made me lose a lot of time is an error in the documentation, more specifically on this page:

https://www.bacula.lat/community/block-level-file-system-deduplication-with-aligned-volumes-tutorial-bacula-9-0-8-and-above/?lang=en

The same page is available in Portuguese, with the same problem, at the following address:

https://www.bacula.lat/community/dedup-alinhado/

The error is at the end of the page, where it says that you can see how much space is being used using 'df -h', but the problem is that df can't actually see the space gain from dedup, it shows how much would be used without dedup.

After some searching, I found in chapter 1.7 of 'https://bacula.org/whitepapers/DedupVolumes.pdf' that the proper command for checking dedup usage in ZFS is 'zpool list', and that command did show that dedup was working properly.

These are my outputs with the two commands:

user@bacula2:~$ df -h
Filesystem                         Size  Used Avail Use% Mounted on
tmpfs                              788M  2,8M  786M   1% /run
/dev/mapper/ubuntu--vg-ubuntu--lv  910G   52G  812G   6% /
tmpfs                              3,9G     0  3,9G   0% /dev/shm
tmpfs                              5,0M     0  5,0M   0% /run/lock
/dev/sda2                          2,0G  252M  1,6G  14% /boot
zfs                                6,4T  128K  6,4T   1% /zfs
zfs/mnt                            9,2T  2,9T  6,4T  31% /zfs/mnt
tmpfs                              788M  4,0K  788M   1% /run/user/0
tmpfs                              788M  4,0K  788M   1% /run/user/1000

user@bacula2:~$ zpool list
NAME   SIZE  ALLOC   FREE  CKPOINT  EXPANDSZ  FRAG  CAP  DEDUP  HEALTH  ALTROOT
zfs   7.27T   850G  6.44T        -         -    3%  11%  3.41x  ONLINE  -

So, could someone please correct the two pages mentioned above? It would save others from having the same problem.

Thank you,
Roberto
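As a rough cross-check of the two outputs above, dividing the logical usage that df reports by the dedup ratio from zpool list should land near the ALLOC column. The Python sketch below does only that arithmetic. The function name is hypothetical, and the calculation ignores compression, metadata and the one-decimal rounding in df, so an approximate match is the most it can show.

```python
GIB_PER_TIB = 1024


def estimated_physical_gib(logical_tib: float, dedup_ratio: float) -> float:
    """Estimate physically allocated GiB from logical TiB and a dedup ratio."""
    return logical_tib * GIB_PER_TIB / dedup_ratio


# Values read from the outputs above: df Used = 2,9T, zpool DEDUP = 3.41x.
estimate = estimated_physical_gib(2.9, 3.41)
print(f"estimated allocation: about {estimate:.0f}G (zpool list shows 850G)")
```

The estimate comes out within a few percent of what zpool list reports, which fits the reading that df is showing the pre-dedup figure.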