Re: [lustre-discuss] free space on ldiskfs vs. zfs
Dear All,

I'm sorry, I cannot provide verbose zpool information anymore. I was a bit in a hurry to put the file system into production, and that's why I have reformatted the servers with ldiskfs.

On Tue, Aug 25, 2015 at 5:54 AM, Alexander I Kulyavtsev wrote:
> I was assuming the question was about total space, as I struggled for some
> time to understand why I had 99 TB of total available space per OSS after
> installing zfs lustre, while the ldiskfs OSTs have 120 TB on the same hardware.
> The 20% difference was partially (10%) accounted for by the different raid6 /
> raidz2 configuration, but I was not able to explain the other 10%.
> For the question in the original post, I cannot get 24 TB from the "available"
> field of the df output: roughly 207.7 billion KiB "available" on his zfs lustre
> vs. 198.1 billion KiB on the ldiskfs lustre. At the same time the difference of
> the total space is 233548424256 - 207693153280 = 25855270976 KiB ≈ 24.1 TiB.
> Götz, could you please tell us what you meant by "available"?

I was comparing the Lustre file system size of the two configurations, that is, the space available for user data. I expected it to be the same, about 218T for both file systems. I understand that you have the same issue.

Regards, Götz Waschk
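As a quick sanity check on the 24 TB figure discussed above, here is a minimal Python sketch; it uses only the "filesystem summary" 1K-block totals from the two lfs df outputs quoted later in this thread:

ldiskfs_total_kib = 233548424256   # fs18 "filesystem summary" 1K-blocks (ldiskfs)
zfs_total_kib = 207693153280       # fs19 "filesystem summary" 1K-blocks (ZFS)

KIB_PER_TIB = 1024 ** 3            # 1 TiB = 1024^3 KiB

print("ldiskfs total: %.1f TiB" % (ldiskfs_total_kib / KIB_PER_TIB))                    # ~217.5
print("zfs total:     %.1f TiB" % (zfs_total_kib / KIB_PER_TIB))                        # ~193.4
print("difference:    %.1f TiB" % ((ldiskfs_total_kib - zfs_total_kib) / KIB_PER_TIB))  # ~24.1

So the ~218T expectation matches the ldiskfs totals, and the gap to the ZFS file system is about 24 TiB, consistent with the numbers quoted in the reply above.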
Re: [lustre-discuss] free space on ldiskfs vs. zfs
Hmm, I was assuming the question was about total space, as I struggled for some time to understand why I had 99 TB of total available space per OSS after installing zfs lustre, while the ldiskfs OSTs have 120 TB on the same hardware. The 20% difference was partially (10%) accounted for by the different raid6 / raidz2 configuration, but I was not able to explain the other 10%.

For the question in the original post, I cannot get 24 TB from the "available" field of the df output: roughly 207.7 billion KiB "available" on his zfs lustre vs. 198.1 billion KiB on the ldiskfs lustre. At the same time the difference of the total space is 233548424256 - 207693153280 = 25855270976 KiB ≈ 24.1 TiB.

Götz, could you please tell us what you meant by "available"?

Also, in my case the output of linux df on the OSS for the zfs pool looks strange: the zpool size is reported as 25T (why?), while the formatted OST taking all the space on this pool shows 33T:

[root@lfs1 ~]# df -h /zpla- /mnt/OST
Filesystem  Size  Used  Avail Use% Mounted on
zpla-        25T  256K    25T   1% /zpla-
zpla-/OST    33T  8.3T    25T  26% /mnt/OST
[root@lfs1 ~]#

in bytes:

[root@lfs1 ~]# df --block-size=1 /zpla- /mnt/OST
Filesystem       1B-blocks           Used      Available Use% Mounted on
zpla-       26769344561152         262144 26769344299008   1% /zpla-
zpla-/OST   35582552834048  9093386076160 26489164660736  26% /mnt/OST

the same OST as reported by lustre:

[root@lfsa scripts]# lfs df
UUID             1K-blocks        Used    Available Use% Mounted on
lfs-MDT_UUID     974961920      275328    974684544   0% /mnt/lfsa[MDT:0]
lfs-OST_UUID   34748586752  8880259840  25868324736  26% /mnt/lfsa[OST:0]
...

Compare:

[root@lfs1 ~]# zpool list
NAME        SIZE  ALLOC   FREE  EXPANDSZ  FRAG  CAP  DEDUP  HEALTH  ALTROOT
zpla-      43.5T  10.9T  32.6T         -   16%  24%  1.00x  ONLINE  -
zpla-0001  43.5T  11.0T  32.5T         -   17%  25%  1.00x  ONLINE  -
zpla-0002  43.5T  10.8T  32.7T         -   17%  24%  1.00x  ONLINE  -

I realize zfs reports raw disk space including parity blocks (48 TB = 43.5 TiB) and everything else (like metadata and space for xattr inodes). But I cannot explain the difference between 40 TB (decimal) of data space (10 * 4 TB drives) and the 35,582,552,834,048 bytes shown by df for the OST.

Best regards, Alex.

On Aug 24, 2015, at 7:52 PM, Christopher J. Morrone wrote:
> I could be wrong, but I don't think that the original poster was asking
> why the SIZE field of zpool list was wrong, but rather why the AVAIL
> space in zfs list was lower than he expected.
>
> I would find it easier to answer the question if I knew his drive count
> and drive size.
>
> Chris
>
> On 08/24/2015 02:12 PM, Alexander I Kulyavtsev wrote:
>> Same question here.
>>
>> 6TB/65TB is 11%. In our case about the same fraction was "missing."
>>
>> My speculation was: it may happen if at some point between zpool and linux
>> the value reported in TB is interpreted as TiB and then converted back to TB,
>> or an unneeded MB to MiB conversion is done twice, etc.
>>
>> Here are my numbers:
>> We have 12 * 4 TB drives per pool, which is 48 TB (decimal).
>> The zpool was created as raidz2 10+2.
>> zpool reports 43.5T.
>> The pool size should be either 48T = 4T*12 or 40T = 4T*10, depending on
>> whether zpool shows the space before or after raiding.
>> From the Oracle ZFS documentation, "zpool list" returns the total space
>> without overheads, thus 48 TB should be reported by zpool instead of 43.5T.
>>
>> In my case, it looked like a conversion/interpretation issue between TB
>> and TiB:
>>
>> 48*1000*1000*1000*1000/1024/1024/1024/1024 = 43.65574568510055541992
>>
>> At the disk level:
>>
>> ~/sas2ircu 0 display
>>
>> Device is a Hard disk
>> Enclosure #               : 2
>> Slot #                    : 12
>> SAS Address               : 5003048-0-015a-a918
>> State                     : Ready (RDY)
>> Size (in MB)/(in sectors) : 3815447/7814037167
>> Manufacturer              : ATA
>> Model Number              : HGST HUS724040AL
>> Firmware Revision         : AA70
>> Serial No                 : PN2334PBJPW14T
>> GUID                      : 5000cca23de6204b
>> Protocol                  : SATA
>> Drive Type                : SATA_HDD
>>
>> One disk size is about 4 TB (decimal):
>>
>> 3815447*1024*1024 = 4000786153472
>> 7814037167*512   = 4000787029504
>>
>> The vdev presents the whole disk to zpool. There is some overhead; some space
>> is left on sdq9.
>>
>> [root@lfs1 scripts]# head -4 /etc/zfs/vdev_id.conf
>> alias s0 /dev/disk/by-path/pci-:03:00.0-sas-0x50030480015aa90c-lun-0
>> alias s1 /dev/disk/by-path/pci-:03:00.0-sas-0x50030480015aa90d-lun-0
>> alias s2 /dev/disk/by-path/pci-:03:00.0-sas-0x50030480015aa90e-lun-0
>> alias s3 /dev/disk/by-path/pci-:03:00.0-sas-
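For Alexander's pool, a small Python sketch that quantifies, but does not explain, the roughly 10% gap mentioned above; it uses only numbers quoted in this message (the sas2ircu sector count and the byte figure df shows for the OST dataset), and the variable names are of course my own:

drive_bytes = 7814037167 * 512           # one 4 TB drive, from the sas2ircu sector count
raw_bytes = 12 * drive_bytes             # whole raidz2 10+2 vdev, parity included
data_bytes = 10 * drive_bytes            # naive data space after subtracting 2 parity drives
df_ost_bytes = 35582552834048            # 1B-blocks that df shows for zpla-/OST

TB = 1000 ** 4
print("raw:      %.2f TB" % (raw_bytes / TB))        # ~48.01
print("data:     %.2f TB" % (data_bytes / TB))       # ~40.01
print("df shows: %.2f TB" % (df_ost_bytes / TB))     # ~35.58
print("gap:      %.1f%%" % (100 * (1 - df_ost_bytes / data_bytes)))   # ~11%

Some of that ~11% is presumably the metadata and xattr space Alexander mentions, but as he says, the full amount is not accounted for in this arithmetic.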
Re: [lustre-discuss] free space on ldiskfs vs. zfs
I could be wrong, but I don't think that the original poster was asking why the SIZE field of zpool list was wrong, but rather why the AVAIL space in zfs list was lower than he expected.

I would find it easier to answer the question if I knew his drive count and drive size.

Chris

On 08/24/2015 02:12 PM, Alexander I Kulyavtsev wrote:

Same question here.

6TB/65TB is 11%. In our case about the same fraction was "missing."

My speculation was: it may happen if at some point between zpool and linux the value reported in TB is interpreted as TiB and then converted back to TB, or an unneeded MB to MiB conversion is done twice, etc.

Here are my numbers: We have 12 * 4 TB drives per pool, which is 48 TB (decimal). The zpool was created as raidz2 10+2. zpool reports 43.5T. The pool size should be either 48T = 4T*12 or 40T = 4T*10, depending on whether zpool shows the space before or after raiding. From the Oracle ZFS documentation, "zpool list" returns the total space without overheads, thus 48 TB should be reported by zpool instead of 43.5T.

In my case, it looked like a conversion/interpretation issue between TB and TiB:

48*1000*1000*1000*1000/1024/1024/1024/1024 = 43.65574568510055541992

At the disk level:

~/sas2ircu 0 display

Device is a Hard disk
Enclosure #               : 2
Slot #                    : 12
SAS Address               : 5003048-0-015a-a918
State                     : Ready (RDY)
Size (in MB)/(in sectors) : 3815447/7814037167
Manufacturer              : ATA
Model Number              : HGST HUS724040AL
Firmware Revision         : AA70
Serial No                 : PN2334PBJPW14T
GUID                      : 5000cca23de6204b
Protocol                  : SATA
Drive Type                : SATA_HDD

One disk size is about 4 TB (decimal):

3815447*1024*1024 = 4000786153472
7814037167*512   = 4000787029504

The vdev presents the whole disk to zpool. There is some overhead; some space is left on sdq9.

[root@lfs1 scripts]# head -4 /etc/zfs/vdev_id.conf
alias s0 /dev/disk/by-path/pci-:03:00.0-sas-0x50030480015aa90c-lun-0
alias s1 /dev/disk/by-path/pci-:03:00.0-sas-0x50030480015aa90d-lun-0
alias s2 /dev/disk/by-path/pci-:03:00.0-sas-0x50030480015aa90e-lun-0
alias s3 /dev/disk/by-path/pci-:03:00.0-sas-0x50030480015aa90f-lun-0
...
alias s12 /dev/disk/by-path/pci-:03:00.0-sas-0x50030480015aa918-lun-0
...

[root@lfs1 scripts]# ls -l /dev/disk/by-path/
...
lrwxrwxrwx 1 root root  9 Jul 23 16:27 pci-:03:00.0-sas-0x50030480015aa918-lun-0 -> ../../sdq
lrwxrwxrwx 1 root root 10 Jul 23 16:27 pci-:03:00.0-sas-0x50030480015aa918-lun-0-part1 -> ../../sdq1
lrwxrwxrwx 1 root root 10 Jul 23 16:27 pci-:03:00.0-sas-0x50030480015aa918-lun-0-part9 -> ../../sdq9

Pool report:

[root@lfs1 scripts]# zpool list
NAME        SIZE  ALLOC   FREE  EXPANDSZ  FRAG  CAP  DEDUP  HEALTH  ALTROOT
zpla-      43.5T  10.9T  32.6T         -   16%  24%  1.00x  ONLINE  -
zpla-0001  43.5T  11.0T  32.5T         -   17%  25%  1.00x  ONLINE  -
zpla-0002  43.5T  10.8T  32.7T         -   17%  24%  1.00x  ONLINE  -
[root@lfs1 scripts]#

[root@lfs1 ~]# zpool list -v zpla-0001
NAME        SIZE  ALLOC   FREE  EXPANDSZ  FRAG  CAP  DEDUP  HEALTH  ALTROOT
zpla-0001  43.5T  11.0T  32.5T         -   17%  25%  1.00x  ONLINE  -
  raidz2   43.5T  11.0T  32.5T         -   17%  25%
    s12        -      -      -         -     -    -
    s13        -      -      -         -     -    -
    s14        -      -      -         -     -    -
    s15        -      -      -         -     -    -
    s16        -      -      -         -     -    -
    s17        -      -      -         -     -    -
    s18        -      -      -         -     -    -
    s19        -      -      -         -     -    -
    s20        -      -      -         -     -    -
    s21        -      -      -         -     -    -
    s22        -      -      -         -     -    -
    s23        -      -      -         -     -    -
[root@lfs1 ~]#

[root@lfs1 ~]# zpool get all zpla-0001
NAME       PROPERTY       VALUE            SOURCE
zpla-0001  size           43.5T            -
zpla-0001  capacity       25%              -
zpla-0001  altroot        -                default
zpla-0001  health         ONLINE           -
zpla-0001  guid           547290297520142  default
zpla-0001  version        -                default
zpla-0001  bootfs         -                default
zpla-0001  delegation     on               default
zpla-0001  autoreplace    off              default
zpla-0001  cachefile      -
Re: [lustre-discuss] free space on ldiskfs vs. zfs
Same question here.

6TB/65TB is 11%. In our case about the same fraction was "missing."

My speculation was: it may happen if at some point between zpool and linux the value reported in TB is interpreted as TiB and then converted back to TB, or an unneeded MB to MiB conversion is done twice, etc.

Here are my numbers: We have 12 * 4 TB drives per pool, which is 48 TB (decimal). The zpool was created as raidz2 10+2. zpool reports 43.5T. The pool size should be either 48T = 4T*12 or 40T = 4T*10, depending on whether zpool shows the space before or after raiding. From the Oracle ZFS documentation, "zpool list" returns the total space without overheads, thus 48 TB should be reported by zpool instead of 43.5T.

In my case, it looked like a conversion/interpretation issue between TB and TiB:

48*1000*1000*1000*1000/1024/1024/1024/1024 = 43.65574568510055541992

At the disk level:

~/sas2ircu 0 display

Device is a Hard disk
Enclosure #               : 2
Slot #                    : 12
SAS Address               : 5003048-0-015a-a918
State                     : Ready (RDY)
Size (in MB)/(in sectors) : 3815447/7814037167
Manufacturer              : ATA
Model Number              : HGST HUS724040AL
Firmware Revision         : AA70
Serial No                 : PN2334PBJPW14T
GUID                      : 5000cca23de6204b
Protocol                  : SATA
Drive Type                : SATA_HDD

One disk size is about 4 TB (decimal):

3815447*1024*1024 = 4000786153472
7814037167*512   = 4000787029504

The vdev presents the whole disk to zpool. There is some overhead; some space is left on sdq9.

[root@lfs1 scripts]# head -4 /etc/zfs/vdev_id.conf
alias s0 /dev/disk/by-path/pci-:03:00.0-sas-0x50030480015aa90c-lun-0
alias s1 /dev/disk/by-path/pci-:03:00.0-sas-0x50030480015aa90d-lun-0
alias s2 /dev/disk/by-path/pci-:03:00.0-sas-0x50030480015aa90e-lun-0
alias s3 /dev/disk/by-path/pci-:03:00.0-sas-0x50030480015aa90f-lun-0
...
alias s12 /dev/disk/by-path/pci-:03:00.0-sas-0x50030480015aa918-lun-0
...

[root@lfs1 scripts]# ls -l /dev/disk/by-path/
...
lrwxrwxrwx 1 root root  9 Jul 23 16:27 pci-:03:00.0-sas-0x50030480015aa918-lun-0 -> ../../sdq
lrwxrwxrwx 1 root root 10 Jul 23 16:27 pci-:03:00.0-sas-0x50030480015aa918-lun-0-part1 -> ../../sdq1
lrwxrwxrwx 1 root root 10 Jul 23 16:27 pci-:03:00.0-sas-0x50030480015aa918-lun-0-part9 -> ../../sdq9

Pool report:

[root@lfs1 scripts]# zpool list
NAME        SIZE  ALLOC   FREE  EXPANDSZ  FRAG  CAP  DEDUP  HEALTH  ALTROOT
zpla-      43.5T  10.9T  32.6T         -   16%  24%  1.00x  ONLINE  -
zpla-0001  43.5T  11.0T  32.5T         -   17%  25%  1.00x  ONLINE  -
zpla-0002  43.5T  10.8T  32.7T         -   17%  24%  1.00x  ONLINE  -
[root@lfs1 scripts]#

[root@lfs1 ~]# zpool list -v zpla-0001
NAME        SIZE  ALLOC   FREE  EXPANDSZ  FRAG  CAP  DEDUP  HEALTH  ALTROOT
zpla-0001  43.5T  11.0T  32.5T         -   17%  25%  1.00x  ONLINE  -
  raidz2   43.5T  11.0T  32.5T         -   17%  25%
    s12        -      -      -         -     -    -
    s13        -      -      -         -     -    -
    s14        -      -      -         -     -    -
    s15        -      -      -         -     -    -
    s16        -      -      -         -     -    -
    s17        -      -      -         -     -    -
    s18        -      -      -         -     -    -
    s19        -      -      -         -     -    -
    s20        -      -      -         -     -    -
    s21        -      -      -         -     -    -
    s22        -      -      -         -     -    -
    s23        -      -      -         -     -    -
[root@lfs1 ~]#

[root@lfs1 ~]# zpool get all zpla-0001
NAME       PROPERTY       VALUE            SOURCE
zpla-0001  size           43.5T            -
zpla-0001  capacity       25%              -
zpla-0001  altroot        -                default
zpla-0001  health         ONLINE           -
zpla-0001  guid           547290297520142  default
zpla-0001  version        -                default
zpla-0001  bootfs         -                default
zpla-0001  delegation     on               default
zpla-0001  autoreplace    off              default
zpla-0001  cachefile      -                default
zpla-0001  failmode       wait             default
zpla-0001  listsnapshots  off              default
zpla-0001  autoexpand     off              default
zpla-0001  dedupditto     0                default
zpla-0001  dedupratio     1.0
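A short Python sketch of the unit question raised above: if zpool list reports the raw, pre-parity capacity in binary units, then 12 drives of the size shown by sas2ircu should appear as roughly 43.7T, close to the 43.5T that zpool prints. The per-drive byte count is taken from the sector count above; the variable names are my own.

drive_bytes = 7814037167 * 512   # per-drive size from the sas2ircu output above
raw_bytes = 12 * drive_bytes     # whole raidz2 vdev, parity included

print("raw capacity, decimal TB: %.2f" % (raw_bytes / 1000 ** 4))   # ~48.01
print("raw capacity, binary TiB: %.2f" % (raw_bytes / 1024 ** 4))   # ~43.67, vs. 43.5T from zpool list

The small remainder (~43.7 vs. the 43.5T zpool prints) presumably goes to labels, the partition-9 reserve and alignment, but that part is only a guess.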
Re: [lustre-discuss] free space on ldiskfs vs. zfs
If you provide the "zpool list -v" output it might give us a little clearer view of what you have going on.

Chris

On 08/19/2015 06:18 AM, Götz Waschk wrote:

Dear Lustre experts,

I have configured two different Lustre instances, both using Lustre 2.5.3, one with ldiskfs on hardware RAID-6 and one using ZFS and RAID-Z2, on the same type of hardware. I was wondering why I have 24 TB less space available, when the same amount of space should be used for parity:

# lfs df
UUID                   1K-blocks         Used     Available Use% Mounted on
fs19-MDT_UUID           50322916       472696      46494784   1% /testlustre/fs19[MDT:0]
fs19-OST_UUID        51923288320        12672   51923273600   0% /testlustre/fs19[OST:0]
fs19-OST0001_UUID    51923288320        12672   51923273600   0% /testlustre/fs19[OST:1]
fs19-OST0002_UUID    51923288320        12672   51923273600   0% /testlustre/fs19[OST:2]
fs19-OST0003_UUID    51923288320        12672   51923273600   0% /testlustre/fs19[OST:3]
filesystem summary: 207693153280        50688  207693094400   0% /testlustre/fs19

UUID                   1K-blocks         Used     Available Use% Mounted on
fs18-MDT_UUID           47177700       482152      43550028   1% /lustre/fs18[MDT:0]
fs18-OST_UUID        58387106064   6014088200   49452733560  11% /lustre/fs18[OST:0]
fs18-OST0001_UUID    58387106064   5919753028   49547068928  11% /lustre/fs18[OST:1]
fs18-OST0002_UUID    58387106064   5944542316   49522279640  11% /lustre/fs18[OST:2]
fs18-OST0003_UUID    58387106064   5906712004   49560109952  11% /lustre/fs18[OST:3]
filesystem summary: 233548424256  23785095548  198082192080  11% /lustre/fs18

fs18 is using ldiskfs, while fs19 is ZFS:

# zpool list
NAME          SIZE  ALLOC   FREE  CAP  DEDUP  HEALTH  ALTROOT
lustre-ost1    65T  18,1M  65,0T   0%  1.00x  ONLINE  -

# zfs list
NAME               USED  AVAIL  REFER  MOUNTPOINT
lustre-ost1       13,6M  48,7T   311K  /lustre-ost1
lustre-ost1/ost1  12,4M  48,7T  12,4M  /lustre-ost1/ost1

Any idea on where my 6 TB per OST went?

Regards, Götz Waschk
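A per-OST version of the same arithmetic, as a minimal Python sketch using the OST 1K-block sizes from the lfs df output quoted above (variable names are my own):

ldiskfs_ost_kib = 58387106064   # one fs18 OST, 1K-blocks (ldiskfs)
zfs_ost_kib = 51923288320       # one fs19 OST, 1K-blocks (ZFS)

KIB_PER_TIB = 1024 ** 3
diff_kib = ldiskfs_ost_kib - zfs_ost_kib
print("per OST: %.2f TiB smaller on ZFS" % (diff_kib / KIB_PER_TIB))   # ~6.02
print("x4 OSTs: %.2f TiB" % (4 * diff_kib / KIB_PER_TIB))              # ~24.1, the gap in the summary lines

This is consistent with the "6 TB per OST" in the original post: about 6 TiB per OST, or roughly 24 TiB across the four OSTs.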
[lustre-discuss] free space on ldiskfs vs. zfs
Dear Lustre experts,

I have configured two different Lustre instances, both using Lustre 2.5.3, one with ldiskfs on hardware RAID-6 and one using ZFS and RAID-Z2, on the same type of hardware. I was wondering why I have 24 TB less space available, when the same amount of space should be used for parity:

# lfs df
UUID                   1K-blocks         Used     Available Use% Mounted on
fs19-MDT_UUID           50322916       472696      46494784   1% /testlustre/fs19[MDT:0]
fs19-OST_UUID        51923288320        12672   51923273600   0% /testlustre/fs19[OST:0]
fs19-OST0001_UUID    51923288320        12672   51923273600   0% /testlustre/fs19[OST:1]
fs19-OST0002_UUID    51923288320        12672   51923273600   0% /testlustre/fs19[OST:2]
fs19-OST0003_UUID    51923288320        12672   51923273600   0% /testlustre/fs19[OST:3]
filesystem summary: 207693153280        50688  207693094400   0% /testlustre/fs19

UUID                   1K-blocks         Used     Available Use% Mounted on
fs18-MDT_UUID           47177700       482152      43550028   1% /lustre/fs18[MDT:0]
fs18-OST_UUID        58387106064   6014088200   49452733560  11% /lustre/fs18[OST:0]
fs18-OST0001_UUID    58387106064   5919753028   49547068928  11% /lustre/fs18[OST:1]
fs18-OST0002_UUID    58387106064   5944542316   49522279640  11% /lustre/fs18[OST:2]
fs18-OST0003_UUID    58387106064   5906712004   49560109952  11% /lustre/fs18[OST:3]
filesystem summary: 233548424256  23785095548  198082192080  11% /lustre/fs18

fs18 is using ldiskfs, while fs19 is ZFS:

# zpool list
NAME          SIZE  ALLOC   FREE  CAP  DEDUP  HEALTH  ALTROOT
lustre-ost1    65T  18,1M  65,0T   0%  1.00x  ONLINE  -

# zfs list
NAME               USED  AVAIL  REFER  MOUNTPOINT
lustre-ost1       13,6M  48,7T   311K  /lustre-ost1
lustre-ost1/ost1  12,4M  48,7T  12,4M  /lustre-ost1/ost1

Any idea on where my 6 TB per OST went?

Regards, Götz Waschk

___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org