Re: [Bacula-users] Fix documentation on deduplication

2024-04-24 Thread Gary R. Schmidt

On 25/04/2024 00:47, Martin Simmons wrote:

On Wed, 24 Apr 2024 23:40:31 +1000, Gary R Schmidt said:


On 24/04/2024 22:33, Gary R. Schmidt wrote:

On 24/04/2024 21:30, Roberto Greiner wrote:


On 24/04/2024 04:30, Radosław Korzeniewski wrote:

Hello,

On Tue, 23 Apr 2024 at 13:33, Roberto Greiner wrote:


     On 23/04/2024 04:34, Radosław Korzeniewski wrote:

     Hello,

     On Wed, 17 Apr 2024 at 14:01, Roberto Greiner wrote:


     The error is at the end of the page, where it says that you can
     see how much space is being used using 'df -h', but the problem
     is that df can't actually see the space gain from dedup, it shows
     how much would be used without dedup.


     This command (df -h) shows how much allocated and free space is
     available on the filesystem. So when you have a dedup ratio 20:1,
     and you wrote 20TB, then your df command shows 1TB allocated.


     But that is the exact problem I had. df did NOT show 1TB
     allocated. It indicated 20TB allocated (yes, in ZFS).

I have not used ZFS Dedup for a long time (I'm a ZFS user from the
first beta in Solaris), so I'm curious - if your zpool is 2TB in size
and you have a 20:1 dedup ratio with 20TB saved and 1TB allocated
then what df shows for you?
Something like this?
Size: 2TB
Used: 20TB
Avail: 1TB
Use%: 2000%


No, the values are quite different. I wrote 20tb to stay with the
example previously given. My actual numbers are:

df: 2,9TB used
zpool list: 862GB used, 3.4x dedup level.
Actual partition size: 7.2TB


You use zpool list to examine filespace.
Or zfs list.


On FreeBSD at least, zfs list will show the same as df (i.e. will include all
copies of the deduplicated data in the USED column).

I think the reason is that deduplication is done at the pool level, so there
is no single definition of which dataset owns each deduplicated block.  As a
result, the duplicates have to be counted multiple times.  This is different
from a cloned dataset, where the original dataset owns any blocks that are
shared.

That's correct: zfs list reports the logical space in use.  Sorry.


If you do "zfs get used,compressratio filesystem" then you can play with 
the values returned...


# "zpool" here stands for the pool name; -H drops the header line.
for i in $(zfs list -H -o name -r zpool)
do
    zfs get -H used,compressratio "$i"
done
gives a list of very interesting numbers.  :-)
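
The loop prints two lines per dataset, which gets hard to scan on a pool with many datasets. A minimal sketch of one way to tidy it up, assuming the tab-separated layout that "zfs get -H" emits (name, property, value, source). The function name pivot_zfs_get and the dataset names tank/a and tank/b are made up for illustration, and the sample values are not from a real pool.

```shell
#!/bin/sh
# Hypothetical helper: pivot "zfs get -H used,compressratio ..." output
# (tab-separated: name, property, value, source) into one line per dataset.
pivot_zfs_get() {
    awk -F '\t' '
        # Remember datasets in the order they first appear.
        !($1 in seen) { seen[$1] = 1; order[++n] = $1 }
        # Store each property value under (dataset, property).
        { val[$1, $2] = $3 }
        END {
            for (i = 1; i <= n; i++) {
                d = order[i]
                printf "%s used=%s compressratio=%s\n", d, val[d, "used"], val[d, "compressratio"]
            }
        }
    '
}

# Made-up sample input in the same shape, so the sketch runs without a pool.
printf 'tank/a\tused\t2.9T\t-\ntank/a\tcompressratio\t1.00x\t-\n' | pivot_zfs_get
```

On a real system the input would presumably come from something like "zfs get -r -H used,compressratio zpool | pivot_zfs_get". Note that compressratio reports compression only; the dedup ratio is a pool-level figure shown by zpool list.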

Cheers,
Gary    B-)


___
Bacula-users mailing list
Bacula-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/bacula-users


Re: [Bacula-users] Fix documentation on deduplication

2024-04-24 Thread Martin Simmons
> On Wed, 24 Apr 2024 23:40:31 +1000, Gary R Schmidt said:
> 
> On 24/04/2024 22:33, Gary R. Schmidt wrote:
> > On 24/04/2024 21:30, Roberto Greiner wrote:
> >>
> >> On 24/04/2024 04:30, Radosław Korzeniewski wrote:
> >>> Hello,
> >>>
> >>> On Tue, 23 Apr 2024 at 13:33, Roberto Greiner wrote:
> >>>
> >>>
> >>>     On 23/04/2024 04:34, Radosław Korzeniewski wrote:
>      Hello,
> 
>      On Wed, 17 Apr 2024 at 14:01, Roberto Greiner wrote:
> 
> 
>      The error is at the end of the page, where it says that you
>      can see how much space is being used using 'df -h', but the
>      problem is that df can't actually see the space gain from
>      dedup, it shows how much would be used without dedup.
> 
> 
>      This command (df -h) shows how much allocated and free space is
>      available on the filesystem. So when you have a dedup ratio 20:1,
>      and you wrote 20TB, then your df command shows 1TB allocated.
> >>>
> >>>     But that is the exact problem I had. df did NOT show 1TB
> >>>     allocated. It indicated 20TB allocated (yes, in ZFS).
> >>>
> >>> I have not used ZFS Dedup for a long time (I'm a ZFS user from the 
> >>> first beta in Solaris), so I'm curious - if your zpool is 2TB in size 
> >>> and you have a 20:1 dedup ratio with 20TB saved and 1TB allocated 
> >>> then what df shows for you?
> >>> Something like this?
> >>> Size: 2TB
> >>> Used: 20TB
> >>> Avail: 1TB
> >>> Use%: 2000%
> >>>
> >> No, the values are quite different. I wrote 20tb to stay with the 
> >> example previously given. My actual numbers are:
> >>
> >> df: 2,9TB used
> >> zpool list: 862GB used, 3.4x dedup level.
> >> Actual partition size: 7.2TB
> >>
> > You use zpool list to examine filespace.
> > Or zfs list.

On FreeBSD at least, zfs list will show the same as df (i.e. will include all
copies of the deduplicated data in the USED column).

I think the reason is that deduplication is done at the pool level, so there
is no single definition of which dataset owns each deduplicated block.  As a
result, the duplicates have to be counted multiple times.  This is different
from a cloned dataset, where the original dataset owns any blocks that are
shared.
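
A toy model of the accounting described above, as a sketch only: each input line is a hypothetical "dataset block-checksum" pair, every reference is charged to the dataset that holds it, and the pool stores each distinct block once. The function name dedup_accounting and the sample names are invented for illustration; real ZFS accounting is more involved.

```shell
#!/bin/sh
# Toy model: each input line is "<dataset> <block-checksum>".
# Per-dataset usage counts every reference; the pool keeps one copy per checksum.
dedup_accounting() {
    LC_ALL=C awk '
        {
            total++                                       # every reference is counted
            if (!($2 in stored)) { stored[$2] = 1; unique++ }   # first copy is stored
        }
        END {
            if (unique)
                printf "referenced=%d stored=%d ratio=%.2fx\n", total, unique, total / unique
        }
    '
}

# Two datasets holding the same two blocks: 4 references, 2 blocks stored.
printf 'tank/a b1\ntank/a b2\ntank/b b1\ntank/b b2\n' | dedup_accounting
```

In this model neither dataset "owns" b1 or b2, so each is charged in full; only the pool-wide view can show the saving.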

__Martin




Re: [Bacula-users] Fix documentation on deduplication

2024-04-24 Thread Martin Simmons
> On Wed, 24 Apr 2024 09:30:15 +0200, Radosław Korzeniewski said:
> 
> Hello,
> 
> On Tue, 23 Apr 2024 at 13:33, Roberto Greiner wrote:
> 
> >
> > On 23/04/2024 04:34, Radosław Korzeniewski wrote:
> >
> > Hello,
> >
> > On Wed, 17 Apr 2024 at 14:01, Roberto Greiner wrote:
> >
> >>
> >> The error is at the end of the page, where it says that you can see how
> >> much space is being used using 'df -h', but the problem is that df can't
> >> actually see the space gain from dedup, it shows how much would be used
> >> without dedup.
> >>
> >>
> > This command (df -h) shows how much allocated and free space is available
> > on the filesystem. So when you have a dedup ratio 20:1, and you wrote 20TB,
> > then your df command shows 1TB allocated.
> >
> > But that is the exact problem I had. df did NOT show 1TB allocated. It
> > indicated 20TB allocated (yes, in ZFS).
> >
> I have not used ZFS Dedup for a long time (I'm a ZFS user from the first
> beta in Solaris), so I'm curious - if your zpool is 2TB in size and you
> have a 20:1 dedup ratio with 20TB saved and 1TB allocated then what df
> shows for you?
> Something like this?
> Size: 2TB
> Used: 20TB
> Avail: 1TB
> Use%: 2000%

No, the Size will say 21TB in that situation (on FreeBSD at least).
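
The arithmetic behind that figure, as a rough sketch: it assumes the Size that df reports for a ZFS dataset is simply Used plus Avail, which matches the 21TB above and, within rounding, the 9,2T / 2,9T / 6,4T line in Roberto's output. The function name df_size is made up, and df's own rounding of the percentage may differ.

```shell
#!/bin/sh
# Hypothetical helper: reconstruct the Size and Use% that df would derive
# from Used and Avail (both given in T), assuming Size = Used + Avail.
df_size() {
    LC_ALL=C awk -v used="$1" -v avail="$2" 'BEGIN {
        size = used + avail
        printf "Size: %gT  Use%%: %.1f%%\n", size, 100 * used / size
    }'
}

# The 20:1 example from the thread: 20TB of logical data, 1TB still available.
df_size 20 1
```

So Use% can never exceed 100%: the logical (pre-dedup) data inflates Size along with Used.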

__Martin




Re: [Bacula-users] Fix documentation on deduplication

2024-04-24 Thread Gary R. Schmidt

On 24/04/2024 22:33, Gary R. Schmidt wrote:

On 24/04/2024 21:30, Roberto Greiner wrote:


On 24/04/2024 04:30, Radosław Korzeniewski wrote:

Hello,

On Tue, 23 Apr 2024 at 13:33, Roberto Greiner wrote:


    On 23/04/2024 04:34, Radosław Korzeniewski wrote:

    Hello,

    On Wed, 17 Apr 2024 at 14:01, Roberto Greiner wrote:


    The error is at the end of the page, where it says that you can
    see how much space is being used using 'df -h', but the problem
    is that df can't actually see the space gain from dedup, it shows
    how much would be used without dedup.


    This command (df -h) shows how much allocated and free space is
    available on the filesystem. So when you have a dedup ratio 20:1,
    and you wrote 20TB, then your df command shows 1TB allocated.


    But that is the exact problem I had. df did NOT show 1TB
    allocated. It indicated 20TB allocated (yes, in ZFS).

I have not used ZFS Dedup for a long time (I'm a ZFS user from the 
first beta in Solaris), so I'm curious - if your zpool is 2TB in size 
and you have a 20:1 dedup ratio with 20TB saved and 1TB allocated 
then what df shows for you?

Something like this?
Size: 2TB
Used: 20TB
Avail: 1TB
Use%: 2000%

No, the values are quite different. I wrote 20tb to stay with the 
example previously given. My actual numbers are:


df: 2,9TB used
zpool list: 862GB used, 3.4x dedup level.
Actual partition size: 7.2TB


You use zpool list to examine filespace.
Or zfs list.





Re: [Bacula-users] Fix documentation on deduplication

2024-04-24 Thread Gary R. Schmidt

On 24/04/2024 21:30, Roberto Greiner wrote:


On 24/04/2024 04:30, Radosław Korzeniewski wrote:

Hello,

On Tue, 23 Apr 2024 at 13:33, Roberto Greiner wrote:


On 23/04/2024 04:34, Radosław Korzeniewski wrote:

Hello,

On Wed, 17 Apr 2024 at 14:01, Roberto Greiner wrote:


The error is at the end of the page, where it says that you can
see how much space is being used using 'df -h', but the problem
is that df can't actually see the space gain from dedup, it shows
how much would be used without dedup.


This command (df -h) shows how much allocated and free space is
available on the filesystem. So when you have a dedup ratio 20:1,
and you wrote 20TB, then your df command shows 1TB allocated.


But that is the exact problem I had. df did NOT show 1TB
allocated. It indicated 20TB allocated (yes, in ZFS).

I have not used ZFS Dedup for a long time (I'm a ZFS user from the 
first beta in Solaris), so I'm curious - if your zpool is 2TB in size 
and you have a 20:1 dedup ratio with 20TB saved and 1TB allocated then 
what df shows for you?

Something like this?
Size: 2TB
Used: 20TB
Avail: 1TB
Use%: 2000%

No, the values are quite different. I wrote 20tb to stay with the 
example previously given. My actual numbers are:


df: 2,9TB used
zpool list: 862GB used, 3.4x dedup level.
Actual partition size: 7.2TB


You use zpool list to examine filespace.

Cheers,
Gary    B-)




Re: [Bacula-users] Fix documentation on deduplication

2024-04-24 Thread Roberto Greiner


On 24/04/2024 04:30, Radosław Korzeniewski wrote:

Hello,

On Tue, 23 Apr 2024 at 13:33, Roberto Greiner wrote:


On 23/04/2024 04:34, Radosław Korzeniewski wrote:

Hello,

On Wed, 17 Apr 2024 at 14:01, Roberto Greiner wrote:


The error is at the end of the page, where it says that you can
see how much space is being used using 'df -h', but the problem
is that df can't actually see the space gain from dedup, it shows
how much would be used without dedup.


This command (df -h) shows how much allocated and free space is
available on the filesystem. So when you have a dedup ratio 20:1,
and you wrote 20TB, then your df command shows 1TB allocated.


But that is the exact problem I had. df did NOT show 1TB
allocated. It indicated 20TB allocated (yes, in ZFS).

I have not used ZFS Dedup for a long time (I'm a ZFS user from the 
first beta in Solaris), so I'm curious - if your zpool is 2TB in size 
and you have a 20:1 dedup ratio with 20TB saved and 1TB allocated then 
what df shows for you?

Something like this?
Size: 2TB
Used: 20TB
Avail: 1TB
Use%: 2000%

No, the values are quite different. I wrote 20TB to stay with the
example previously given. My actual numbers are:


df: 2,9TB used
zpool list: 862GB used, 3.4x dedup level.
Actual partition size: 7.2TB

Roberto


--

-
Marcos Roberto Greiner

   Optimists think we are in the best of all worlds
Pessimists are afraid that this is true
 James Branch Cabell
  -


Re: [Bacula-users] Fix documentation on deduplication

2024-04-24 Thread Radosław Korzeniewski
Hello,

On Tue, 23 Apr 2024 at 17:10, Martin Simmons wrote:

>
> > But that is the exact problem I had. df did NOT show 1TB allocated. It
> > indicated 20TB allocated (yes, in ZFS).
>
> Yes, that is how df works with ZFS unfortunately (it doesn't know about
> dedup).  See also
>
> https://c0t0d0s0.org/oracle/solaris/english/2009/12/02/df-considered-problematic.c0t0d0s0.html
>

Thanks for the link. I was almost certain that df was working well for ZFS
+ Dedup when I used it.

Radek
-- 
Radosław Korzeniewski
rados...@korzeniewski.net


Re: [Bacula-users] Fix documentation on deduplication

2024-04-24 Thread Radosław Korzeniewski
Hello,

On Tue, 23 Apr 2024 at 13:33, Roberto Greiner wrote:

>
> On 23/04/2024 04:34, Radosław Korzeniewski wrote:
>
> Hello,
>
> On Wed, 17 Apr 2024 at 14:01, Roberto Greiner wrote:
>
>>
>> The error is at the end of the page, where it says that you can see how
>> much space is being used using 'df -h', but the problem is that df can't
>> actually see the space gain from dedup, it shows how much would be used
>> without dedup.
>>
>>
> This command (df -h) shows how much allocated and free space is available
> on the filesystem. So when you have a dedup ratio 20:1, and you wrote 20TB,
> then your df command shows 1TB allocated.
>
> But that is the exact problem I had. df did NOT show 1TB allocated. It
> indicated 20TB allocated (yes, in ZFS).
>
I have not used ZFS dedup for a long time (I have been a ZFS user since the
first beta in Solaris), so I'm curious: if your zpool is 2TB in size and you
have a 20:1 dedup ratio with 20TB saved and 1TB allocated, what does df
show for you?
Something like this?
Size: 2TB
Used: 20TB
Avail: 1TB
Use%: 2000%

Radek
-- 
Radosław Korzeniewski
rados...@korzeniewski.net


Re: [Bacula-users] Fix documentation on deduplication

2024-04-23 Thread Martin Simmons
> On Tue, 23 Apr 2024 08:31:59 -0300, Roberto Greiner said:
> 
> On 23/04/2024 04:34, Radosław Korzeniewski wrote:
> > Hello,
> >
> > On Wed, 17 Apr 2024 at 14:01, Roberto Greiner wrote:
> >
> >
> > The error is at the end of the page, where it says that you can
> > see how much space is being used using 'df -h', but the problem
> > is that df can't actually see the space gain from dedup, it shows
> > how much would be used without dedup.
> >
> >
> > This command (df -h) shows how much allocated and free space is 
> > available on the filesystem. So when you have a dedup ratio 20:1, and 
> > you wrote 20TB, then your df command shows 1TB allocated.
> 
> But that is the exact problem I had. df did NOT show 1TB allocated. It 
> indicated 20TB allocated (yes, in ZFS).

Yes, that is how df works with ZFS unfortunately (it doesn't know about
dedup).  See also
https://c0t0d0s0.org/oracle/solaris/english/2009/12/02/df-considered-problematic.c0t0d0s0.html

__Martin




Re: [Bacula-users] Fix documentation on deduplication

2024-04-23 Thread Radosław Korzeniewski
Hello,

On Wed, 17 Apr 2024 at 14:01, Roberto Greiner wrote:

>
> The error is at the end of the page, where it says that you can see how
> much space is being used using 'df -h', but the problem is that df can't
> actually see the space gain from dedup, it shows how much would be used
> without dedup.
>
>
This command (df -h) shows how much space is allocated and how much is free
on the filesystem. So when you have a dedup ratio of 20:1 and you wrote 20TB,
your df command shows 1TB allocated.

Yes, zpool list shows you the exact dedup ratio achieved without additional
checking or counting. But this command (as mentioned by Heitor) works with
ZFS only.

Aligned volumes can be used with external deduplication appliances where
the zpool command is unavailable. There you can quickly check with the
df -h command.

Radek
-- 
Radosław Korzeniewski
rados...@korzeniewski.net


Re: [Bacula-users] Fix documentation on deduplication

2024-04-23 Thread Roberto Greiner


On 23/04/2024 04:34, Radosław Korzeniewski wrote:

Hello,

On Wed, 17 Apr 2024 at 14:01, Roberto Greiner wrote:


The error is at the end of the page, where it says that you can
see how much space is being used using 'df -h', but the problem
is that df can't actually see the space gain from dedup, it shows
how much would be used without dedup.


This command (df -h) shows how much allocated and free space is 
available on the filesystem. So when you have a dedup ratio 20:1, and 
you wrote 20TB, then your df command shows 1TB allocated.


But that is the exact problem I had. df did NOT show 1TB allocated. It 
indicated 20TB allocated (yes, in ZFS).



Yes, zpool list shows you the exact Dedup ratio achieved without 
additional checking or counting. But this command (as mentioned by 
Heitor) will work with ZFS only.
Aligned volumes can be used with external deduplication appliances 
where zpool command is unavailable. Then you can quickly check with 
the df -h command.


Yes, zpool list showed all the information properly, both the actually
allocated space and the dedup ratio. As I said, on ZFS df does not show
the correct information (this is on Ubuntu 22.04 with ZFS).


Thank you,

Roberto


--
-
Marcos Roberto Greiner

   Optimists think we are in the best of all worlds
Pessimists are afraid that this is true
 James Branch Cabell
  -


Re: [Bacula-users] Fix documentation on deduplication

2024-04-17 Thread Heitor Faria
Hello Roberto,



This guide was written by me, and it is not part of the bacula.org project.

That said, the step-by-step deployment was done using ddumbfs, although
I briefly mention that ZFS could also be used.



Rgds.

MSc,MBA Heitor Faria (Miami/USA)
Bacula LATAM CIO

mobile1: + 1 909 655-8971
mobile2: + 55 61 98268-4220



bacula.lat | bacula.com.br








From: "Roberto Greiner" 
To: "bacula-users" 
Sent: Wednesday, April 17, 2024 9:06 AM
Subject: [Bacula-users] Fix documentation on deduplication

Hi,

I've installed a Bacula system using ZFS deduplication on an Ubuntu 22.04
server, and one thing that cost me a lot of time is an error in the
documentation, specifically on this page:

https://www.bacula.lat/community/block-level-file-system-deduplication-with-aligned-volumes-tutorial-bacula-9-0-8-and-above/?lang=en

The same page is available in Portuguese, with the same problem, in the
following address:

https://www.bacula.lat/community/dedup-alinhado/

The error is at the end of the page, where it says that you can see how
much space is being used using 'df -h', but the problem is that df can't
actually see the space gain from dedup, it shows how much would be used
without dedup.

After some searching, I found in chapter 1.7 of
'https://bacula.org/whitepapers/DedupVolumes.pdf' that the proper
command for checking dedup usage in ZFS is 'zpool list', and that
command did show that dedup was working properly.

These are my outputs with the two commands:

user@bacula2:~$ df -h
Filesystem                         Size  Used Avail Use% Mounted on
tmpfs                              788M  2,8M  786M   1% /run
/dev/mapper/ubuntu--vg-ubuntu--lv  910G   52G  812G   6% /
tmpfs                              3,9G     0  3,9G   0% /dev/shm
tmpfs                              5,0M     0  5,0M   0% /run/lock
/dev/sda2                          2,0G  252M  1,6G  14% /boot
zfs                                6,4T  128K  6,4T   1% /zfs
zfs/mnt                            9,2T  2,9T  6,4T  31% /zfs/mnt
tmpfs                              788M  4,0K  788M   1% /run/user/0
tmpfs                              788M  4,0K  788M   1% /run/user/1000
user@bacula2:~$ zpool list
NAME   SIZE  ALLOC   FREE  CKPOINT  EXPANDSZ   FRAG    CAP  DEDUP    HEALTH  ALTROOT
zfs   7.27T   850G  6.44T        -         -     3%    11%  3.41x    ONLINE  -

So, could someone please correct the two pages mentioned above? It would
save others from running into the same problem.

Thank you,

Roberto



--
-
Marcos Roberto Greiner

Optimists think we are in the best of all worlds
Pessimists are afraid that this is true
James Branch Cabell
-





[Bacula-users] Fix documentation on deduplication

2024-04-17 Thread Roberto Greiner

Hi,

I've installed a Bacula system using ZFS deduplication on an Ubuntu 22.04
server, and one thing that cost me a lot of time is an error in the
documentation, specifically on this page:


https://www.bacula.lat/community/block-level-file-system-deduplication-with-aligned-volumes-tutorial-bacula-9-0-8-and-above/?lang=en

The same page is available in Portuguese, with the same problem, in the 
following address:


https://www.bacula.lat/community/dedup-alinhado/

The error is at the end of the page, where it says that you can see how 
much space is being used using 'df -h', but the problem is that df can't 
actually see the space gain from dedup, it shows how much would be used 
without dedup.


After some searching, I found in chapter 1.7 of
'https://bacula.org/whitepapers/DedupVolumes.pdf' that the proper
command for checking dedup usage in ZFS is 'zpool list', and that
command did show that dedup was working properly.


These are my outputs with the two commands:

user@bacula2:~$ df -h
Filesystem                         Size  Used Avail Use% Mounted on
tmpfs                              788M  2,8M  786M   1% /run
/dev/mapper/ubuntu--vg-ubuntu--lv  910G   52G  812G   6% /
tmpfs                              3,9G     0  3,9G   0% /dev/shm
tmpfs                              5,0M     0  5,0M   0% /run/lock
/dev/sda2                          2,0G  252M  1,6G  14% /boot
zfs                                6,4T  128K  6,4T   1% /zfs
zfs/mnt                            9,2T  2,9T  6,4T  31% /zfs/mnt
tmpfs                              788M  4,0K  788M   1% /run/user/0
tmpfs                              788M  4,0K  788M   1% /run/user/1000
user@bacula2:~$ zpool list
NAME   SIZE  ALLOC   FREE  CKPOINT  EXPANDSZ   FRAG    CAP  DEDUP    HEALTH  ALTROOT
zfs   7.27T   850G  6.44T        -         -     3%    11%  3.41x    ONLINE  -
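
As a rough cross-check of the two outputs, the space that zpool list reports as ALLOC, multiplied by the DEDUP ratio, should land near the Used figure from df. The sketch below does that arithmetic with the numbers above; the function name logical_tib is made up, and the estimate ignores compression, metadata overhead and rounding, so it is only approximate.

```shell
#!/bin/sh
# Hypothetical helper: estimate the logical (pre-dedup) data in T from the
# ALLOC value (given in G) and the DEDUP ratio (without the trailing "x").
logical_tib() {
    LC_ALL=C awk -v alloc="$1" -v ratio="$2" \
        'BEGIN { printf "%.1f\n", alloc * ratio / 1024 }'
}

# Values copied from the zpool list output: ALLOC 850G, DEDUP 3.41x.
logical_tib 850 3.41
```

The result is in the same ballpark as the 2,9T that df reports as Used for zfs/mnt, which is consistent with df counting the data as if it had not been deduplicated.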


So, could someone please correct the two pages mentioned above? It would
save others from running into the same problem.


Thank you,

Roberto



--
-
Marcos Roberto Greiner

   Optimists think we are in the best of all worlds
Pessimists are afraid that this is true
 James Branch Cabell
  -


