# Re: [zfs-discuss] Digging in the bowels of ZFS

```On 2012-12-05 05:52, Jim Klimov wrote:
```
```For undersized allocations, i.e. of compressed data, it is possible
to see P-sizes not divisible by 4 (disks) in 4KB sectors, however,
some sectors do apparently get wasted because the A-size in the DVA
is divisible by 6*4KB. With columnar allocation of disks, it is
easier to see why full stripes have to be used:```
```
p1 p2 d1 d2 d3 d4
.  ,  1  5  9   13
.  ,  2  6  10  14
.  ,  3  7  11  x
.  ,  4  8  12  x

In this illustration a 14-sector-long block is saved, with X being
the empty leftovers, on which we can't really save (as would be the
case with the other allocation, which is likely less efficient for
CPU and IOs).
```
```
Getting more and more puzzled with this... I have seen DVA values
matching both theories now...

Interestingly, all the allocations I looked over involved the number
of sectors divisible by 3... rounding to half of my 6-disk RAID set -
is it merely a coincidence, or some means of balancing IOs?

Anyhow, with 4KB sectors involved, I saw many 128KB logical blocks
compressed into just half a dozen sectors of userdata payload, so
wasting one or two sectors here is quite a large percentage of my

Exposition of found evidence follows:

Say, this one from my original post:
DVA[0]=<0:594928b8000:9000> ... size=20000L/4800P

It has 5 data sectors (@4Kb) over 4 data disks in my raidz2 set,
so it spills over to a second row and requires additional parity
sectors - overall 5d+4p = 9 sectors, which we see in DVA A-size.
This is normal, like expected.

These ones however differ:

DVA[0]=<0:acef500e000:c000> ... size=20000L/6a00P
DVA[0]=<0:acef501a000:c000> ... size=20000L/7200P
DVA[0]=<0:acef5026000:c000> ... size=20000L/5c00P

These neighbors, with 7, 8 and 6 sectors worth of data all occupy
12 sectors on disk along with their parities.

DVA[0]=<0:59492a92000:6000> ... size=20000L/2800P

With 3*4Kb sectors worth of data and 2 parity sectors, this block
is allocated over 6 not 5 sectors.

DVA[0]=<0:5996bf7c000:12000> ... size=20000L/a800P

Likewise, with 11 sectors of data and likely 6 sectors of parity,
this one is given 18, not 17 sectors of storage allocation.

DVA[0]=<0:5996be32000:1e000> ... size=20000L/12c00P

Here, 19 sectors of data and 10 of parity occupy 30 sectors on disk.

I did not yet research where exactly the "unused" sectors are
allocated - "vertically" on the last strip, like in my yesterdays
depiction quoted above, or "horizontally" across several disks,
space with no apparent gain. I mean, the raidz code does tricks
to ensure that parities are located on different disks, and in
normal conditions the userdata sector reads land on all disks
in a uniform manner. Why forfeit the natural "rotation" thanks
to P-sizes smaller than the multiple of number of data-disks?
Writes are anyway streamed and coalesced, so by not allocating
these unused blocks we'd only reduce the needed write IOPS by
some portion - and save disk space...

In short: can someone explain the rationale - why are allocations
such as they are now, and can it be discussed as a bug or should
this be rationalized as a feature?

Thanks,
//Jim
_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
```