Re: Regarding AHCI_MAX_SG and (ATA_HORKAGE_MAX_SEC_1024)

2016-08-10 Thread Tom Yan
On 10 August 2016 at 15:41, David Milburn  wrote:
> Hi,
>
> The 168 makes AHCI_CMD_TBL_SZ equal to 2816
>
> AHCI_CMD_TBL_SZ = AHCI_CMD_TBL_HDR_SZ + (AHCI_MAX_SG * 16)
> AHCI_CMD_TBL_SZ = 128 + (168 * 16)
>
> I think if you add in AHCI_CMD_SLOT_SZ (1024) and AHCI_RX_FIS_SZ (256)
> the DMA is 4K aligned, I think that is where the 168 came from.

Looks like the right guess. Though AHCI_PORT_PRIV_DMA_SZ is not:

AHCI_CMD_SLOT_SZ (1024) + AHCI_CMD_TBL_SZ (2816) + AHCI_RX_FIS_SZ (256) = 4096

but:

AHCI_CMD_SLOT_SZ (1024) + AHCI_CMD_TBL_AR_SZ (2816 * 32 = 90112) +
AHCI_RX_FIS_SZ (256) = 91392

and AHCI_PORT_PRIV_FBS_DMA_SZ is:

AHCI_CMD_SLOT_SZ (1024) + AHCI_CMD_TBL_AR_SZ (2816 * 32 = 90112) +
AHCI_RX_FIS_SZ * 16 (4096) = 95232
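
The three sums above can be checked mechanically. This is a minimal sketch in C11, not the driver's actual header: the constant names and values are the ones quoted in this thread, AHCI_MAX_CMDS = 32 is inferred from the "* 32" above, and is_4k_multiple() is a hypothetical helper.

```c
#include <assert.h>   /* static_assert (C11) */
#include <stdbool.h>
#include <stddef.h>

/* Values as quoted in this thread; the driver's own definitions may
 * differ between kernel versions. */
enum {
	AHCI_MAX_CMDS             = 32,  /* assumption: from the "* 32" above */
	AHCI_MAX_SG               = 168,
	AHCI_CMD_SLOT_SZ          = 1024,
	AHCI_RX_FIS_SZ            = 256,
	AHCI_CMD_TBL_HDR_SZ       = 128,
	AHCI_CMD_TBL_SZ           = AHCI_CMD_TBL_HDR_SZ + AHCI_MAX_SG * 16,
	AHCI_CMD_TBL_AR_SZ        = AHCI_CMD_TBL_SZ * AHCI_MAX_CMDS,
	AHCI_PORT_PRIV_DMA_SZ     = AHCI_CMD_SLOT_SZ + AHCI_CMD_TBL_AR_SZ +
				    AHCI_RX_FIS_SZ,
	AHCI_PORT_PRIV_FBS_DMA_SZ = AHCI_CMD_SLOT_SZ + AHCI_CMD_TBL_AR_SZ +
				    AHCI_RX_FIS_SZ * 16,
};

/* Compile-time checks of the arithmetic quoted in the message. */
static_assert(AHCI_CMD_TBL_SZ == 2816, "one command table");
static_assert(AHCI_CMD_TBL_AR_SZ == 90112, "32 command tables");
static_assert(AHCI_PORT_PRIV_DMA_SZ == 91392, "non-FBS total");
static_assert(AHCI_PORT_PRIV_FBS_DMA_SZ == 95232, "FBS total");

/* Hypothetical helper: true if a size is a whole number of 4 KiB pages. */
static inline bool is_4k_multiple(size_t bytes)
{
	return bytes % 4096 == 0;
}
```

Only the single-table sum (1024 + 2816 + 256) is a multiple of 4096; neither of the two real totals is.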

>
> Thanks,
> David
>
>


Re: Regarding AHCI_MAX_SG and (ATA_HORKAGE_MAX_SEC_1024)

2016-08-10 Thread David Milburn

On 08/10/2016 12:19 PM, Tom Yan wrote:
> On 10 August 2016 at 15:41, David Milburn  wrote:
> > Hi,
> >
> > The 168 makes AHCI_CMD_TBL_SZ equal to 2816
> >
> > AHCI_CMD_TBL_SZ = AHCI_CMD_TBL_HDR_SZ + (AHCI_MAX_SG * 16)
> > AHCI_CMD_TBL_SZ = 128 + (168 * 16)
> >
> > I think if you add in AHCI_CMD_SLOT_SZ (1024) and AHCI_RX_FIS_SZ (256)
> > the DMA is 4K aligned, I think that is where the 168 came from.
>
> Looks like the right guess. Though AHCI_PORT_PRIV_DMA_SZ is not:
>
> AHCI_CMD_SLOT_SZ (1024) + AHCI_CMD_TBL_SZ (2816) + AHCI_RX_FIS_SZ (256) = 4096
>
> but:
>
> AHCI_CMD_SLOT_SZ (1024) + AHCI_CMD_TBL_AR_SZ (2816 * 32 = 90112) +
> AHCI_RX_FIS_SZ (256) = 91392
>
> and AHCI_PORT_PRIV_FBS_DMA_SZ is:
>
> AHCI_CMD_SLOT_SZ (1024) + AHCI_CMD_TBL_AR_SZ (2816 * 32 = 90112) +
> AHCI_RX_FIS_SZ * 16 (4096) = 95232

Yes, but in both cases mem_dma gets adjusted for AHCI_CMD_SLOT_SZ (1024)
and rx_fis_sz (256 or 4096 in fbs case).
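
A minimal sketch of that adjustment, assuming the per-port DMA region is laid out as the command slots, then the received-FIS area, then the command tables. The struct and function names are hypothetical and only illustrate the offsets; this is not the driver's code.

```c
#include <assert.h>   /* static_assert (C11) */
#include <stdbool.h>
#include <stddef.h>

enum {
	CMD_SLOT_SZ = 1024,  /* AHCI_CMD_SLOT_SZ as quoted in this thread */
	RX_FIS_SZ   = 256,   /* AHCI_RX_FIS_SZ as quoted in this thread */
};

static_assert(CMD_SLOT_SZ + RX_FIS_SZ == 1280, "non-FBS prefix size");

/* Hypothetical description of where each area starts in the region. */
struct port_dma_layout {
	size_t cmd_slot_off;
	size_t rx_fis_off;
	size_t cmd_tbl_off;
};

/* Walk an offset through the region the way the message describes:
 * skip the command slots, then the received-FIS area (16 times larger
 * in the FBS case); the command tables start after both. */
static inline struct port_dma_layout compute_port_dma_layout(bool fbs)
{
	size_t rx_fis_sz = fbs ? (size_t)RX_FIS_SZ * 16 : (size_t)RX_FIS_SZ;
	struct port_dma_layout l;
	size_t off = 0;

	l.cmd_slot_off = off;
	off += CMD_SLOT_SZ;
	l.rx_fis_off = off;
	off += rx_fis_sz;
	l.cmd_tbl_off = off;
	return l;
}
```

Under these assumptions the first command table starts at offset 1280 (non-FBS) or 5120 (FBS). In the non-FBS case a 2816-byte table then ends exactly at 4096.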

Thanks,
David






Re: Regarding AHCI_MAX_SG and (ATA_HORKAGE_MAX_SEC_1024)

2016-08-10 Thread Tom Yan
On 10 August 2016 at 11:26, Tejun Heo  wrote:
> Hmmm.. why not?  The hardware limit is 64k and the driver is using a

Is that referring to the maximum number of entries allowed in the
PRDT, Physical Region Descriptor Table (which is, more precisely,
65535)?

> lower limit of 168 most likely because it doesn't make noticeable
> difference beyond certain point and it determines the size of
> contiguous memory which has to be allocated for the command table.
> Each sg entry is 16 bytes.  Pushing it to the hardware limit would
> require an order 9 allocation for each port.

That makes sense to me, and I didn't have the intention to push it to
the limit anyway.

> Not necessarily.  A single sg entry can point to an area larger than
> PAGE_SIZE.

You mean the 4MB limit of "Data Byte Count" in "DW3: Description
Information" of the PRDT? Is that what max_segment_size (which is set
to a general fallback of 65536:
http://lxr.free-electrons.com/ident?i=dma_get_max_seg_size) is about
in this case?

And my point was, it will be a multiple of 168 anyway, if 1344 is just
an example.

> As written above, that probably makes the ahci command table size
> nicely aligned.

I think that's what bothers me ultimately, cause I don't see how 168
makes it (more) nicely aligned (or even, aligned to what?).

I even checked out the AHCI driver of FreeBSD
(https://github.com/freebsd/freebsd/blob/master/sys/dev/ahci/ahci.h):

...
#define MAXPHYS 512 * 1024
...
#define AHCI_SG_ENTRIES (roundup(btoc(MAXPHYS) + 1, 8))
...
#define AHCI_CT_SIZE (128 + AHCI_SG_ENTRIES * 16)
...

I couldn't make sense of the `+ 1` and the rounding up to a multiple of 8 either.
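
For what the quoted macros evaluate to, here is a small sketch in C11. The sketch_btoc() and sketch_roundup() macros are local stand-ins for the BSD btoc() and roundup() macros, and 4 KiB pages are an assumption. One possible reading of the `+ 1`, offered as a guess rather than something the FreeBSD source states, is that a MAXPHYS-sized buffer that does not start on a page boundary touches one page more than btoc(MAXPHYS).

```c
#include <assert.h>   /* static_assert (C11) */

#define SKETCH_PAGE_SIZE 4096u            /* assumption: 4 KiB pages */
#define SKETCH_MAXPHYS   (512u * 1024u)   /* value quoted above, parenthesized */

/* Local stand-ins for btoc() (bytes to pages, rounding up) and roundup(). */
#define sketch_btoc(x)       (((x) + SKETCH_PAGE_SIZE - 1) / SKETCH_PAGE_SIZE)
#define sketch_roundup(x, y) ((((x) + ((y) - 1)) / (y)) * (y))

#define SKETCH_SG_ENTRIES sketch_roundup(sketch_btoc(SKETCH_MAXPHYS) + 1, 8)
#define SKETCH_CT_SIZE    (128 + SKETCH_SG_ENTRIES * 16)

static_assert(sketch_btoc(SKETCH_MAXPHYS) == 128, "512 KiB is 128 pages");
static_assert(SKETCH_SG_ENTRIES == 136, "129 rounded up to a multiple of 8");
static_assert(SKETCH_CT_SIZE == 2304, "128-byte header plus 136 entries");

/* Number of pages touched by a buffer of len bytes that starts
 * offset_in_page bytes into a page. */
static inline unsigned pages_spanned(unsigned offset_in_page, unsigned len)
{
	return (offset_in_page + len + SKETCH_PAGE_SIZE - 1) / SKETCH_PAGE_SIZE;
}
```

With these values an aligned 512 KiB buffer spans 128 pages and a misaligned one spans 129, which rounds up to 136 entries and a 2304-byte command table.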


Re: Regarding AHCI_MAX_SG and (ATA_HORKAGE_MAX_SEC_1024)

2016-08-10 Thread Tejun Heo
Hello, Tom.

On Wed, Aug 10, 2016 at 06:04:10PM +0800, Tom Yan wrote:
> On 10 August 2016 at 11:26, Tejun Heo  wrote:
> > Hmmm.. why not?  The hardware limit is 64k and the driver is using a
> 
> Is that referring to the maximum number of entries allowed in the
> PRDT, Physical Region Descriptor Table (which is, more precisely,
> 65535)?

Yeap.

> > Not necessarily.  A single sg entry can point to an area larger than
> > PAGE_SIZE.
> 
> You mean the 4MB limit of "Data Byte Count" in "DW3: Description
> Information" of the PRDT? Is that what max_segment_size (which is set
> to a general fallback of 65536:
> http://lxr.free-electrons.com/ident?i=dma_get_max_seg_size) is about
> in this case?

Ah, ahci isn't setting the hardware limit properly but yeah that's the
maximum segment size.
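
To put numbers on the segment-size point, here is a rough sketch in C11 of the upper bound that the scatter/gather limits alone place on one request. The function name is hypothetical, and the bound ignores every other limit (max_sectors and so on).

```c
#include <assert.h>   /* static_assert (C11) */
#include <stdint.h>

enum { SKETCH_SECTOR_SIZE = 512 };

/* A 22-bit byte-count field, as in the "Data Byte Count" mentioned
 * above, covers 4 MiB. */
static_assert((1u << 22) == 4u * 1024u * 1024u, "22 bits span 4 MiB");

/* Upper bound, in 512-byte sectors, on a single request when it may use
 * at most max_segments sg entries of at most max_segment_size bytes. */
static inline uint64_t max_request_sectors(uint32_t max_segments,
					   uint64_t max_segment_size)
{
	return (uint64_t)max_segments * max_segment_size / SKETCH_SECTOR_SIZE;
}
```

With 168 entries this gives 1344 sectors if every entry covers one 4096-byte page, 21504 sectors with the 65536-byte fallback, and 1376256 sectors if every entry could cover 4 MiB.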

> And my point was, it will be a multiple of 168 anyway, if 1344 is just
> an example.
> 
> > As written above, that probably makes the ahci command table size
> > nicely aligned.
> 
> I think that's what bothers me ultimately, cause I don't see how 168
> makes it (more) nicely aligned (or even, aligned to what?).

Hmmm... Looked at the sizes and they don't seem to align to anything
meaningful.  No idea.

Thanks.

-- 
tejun


Re: Regarding AHCI_MAX_SG and (ATA_HORKAGE_MAX_SEC_1024)

2016-08-10 Thread David Milburn

Hi,

On 08/10/2016 10:14 AM, Tejun Heo wrote:
> Hello, Tom.
>
> On Wed, Aug 10, 2016 at 06:04:10PM +0800, Tom Yan wrote:
> > On 10 August 2016 at 11:26, Tejun Heo  wrote:
> > > Hmmm.. why not?  The hardware limit is 64k and the driver is using a
> >
> > Is that referring to the maximum number of entries allowed in the
> > PRDT, Physical Region Descriptor Table (which is, more precisely,
> > 65535)?
>
> Yeap.
>
> > > Not necessarily.  A single sg entry can point to an area larger than
> > > PAGE_SIZE.
> >
> > You mean the 4MB limit of "Data Byte Count" in "DW3: Description
> > Information" of the PRDT? Is that what max_segment_size (which is set
> > to a general fallback of 65536:
> > http://lxr.free-electrons.com/ident?i=dma_get_max_seg_size) is about
> > in this case?
>
> Ah, ahci isn't setting the hardware limit properly but yeah that's the
> maximum segment size.
>
> > And my point was, it will be a multiple of 168 anyway, if 1344 is just
> > an example.
> >
> > > As written above, that probably makes the ahci command table size
> > > nicely aligned.
> >
> > I think that's what bothers me ultimately, cause I don't see how 168
> > makes it (more) nicely aligned (or even, aligned to what?).
>
> Hmmm... Looked at the sizes and they don't seem to align to anything
> meaningful.  No idea.

The 168 makes AHCI_CMD_TBL_SZ equal to 2816

AHCI_CMD_TBL_SZ = AHCI_CMD_TBL_HDR_SZ + (AHCI_MAX_SG * 16)
AHCI_CMD_TBL_SZ = 128 + (168 * 16)

I think if you add in AHCI_CMD_SLOT_SZ (1024) and AHCI_RX_FIS_SZ (256)
the DMA is 4K aligned, I think that is where the 168 came from.
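
Read backwards, the same arithmetic yields 168 as the largest entry count that fits one command table plus the slot and FIS areas in 4096 bytes. This is a minimal C11 sketch of that derivation; max_sg_for_target() is a hypothetical helper, and the sizes are the ones quoted above.

```c
#include <assert.h>   /* static_assert (C11) */

enum {
	CMD_SLOT_SZ    = 1024,  /* AHCI_CMD_SLOT_SZ */
	RX_FIS_SZ      = 256,   /* AHCI_RX_FIS_SZ */
	CMD_TBL_HDR_SZ = 128,   /* AHCI_CMD_TBL_HDR_SZ */
	SG_ENTRY_SZ    = 16,    /* bytes per sg entry */
};

/* 4096 - 1024 - 256 - 128 = 2688 bytes left for sg entries; 2688 / 16 = 168. */
static_assert((4096 - CMD_SLOT_SZ - RX_FIS_SZ - CMD_TBL_HDR_SZ) / SG_ENTRY_SZ
	      == 168, "168 entries fill a 4 KiB budget exactly");

/* Hypothetical helper: largest number of sg entries such that command
 * slots + received FIS + one command table fit in target bytes. */
static inline int max_sg_for_target(int target)
{
	return (target - CMD_SLOT_SZ - RX_FIS_SZ - CMD_TBL_HDR_SZ) / SG_ENTRY_SZ;
}
```

This accounts for a single command table only.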

Thanks,
David




Re: Regarding AHCI_MAX_SG and (ATA_HORKAGE_MAX_SEC_1024)

2016-08-09 Thread Tejun Heo
Hello, Tom.

On Sun, Aug 07, 2016 at 10:10:17PM +0800, Tom Yan wrote:
> So the (not so) recent bump of BLK_DEF_MAX_SECTORS from 1024 to 2560
> (commit d2be537c3ba3) seemed to have caused trouble to some of the ATA
> devices, which were then worked around with ATA_HORKAGE_MAX_SEC_1024.
> 
> However, I am suspecting that the bump of BLK_DEF_MAX_SECTORS is not
> the "real" cause of the trouble, but the fact that AHCI_MAX_SG has
> been set to a weird value of 168 (with a comment "hardware max is
> 64K"), neither of which seems to make any sense.

Hmmm.. why not?  The hardware limit is 64k and the driver is using a
lower limit of 168 most likely because it doesn't make noticeable
difference beyond certain point and it determines the size of
contiguous memory which has to be allocated for the command table.
Each sg entry is 16 bytes.  Pushing it to the hardware limit would
require an order 9 allocation for each port.
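
As a rough illustration of the sizes involved, here is a sketch in C11 that computes the size of one command table for a given sg limit and the page order needed to hold it. It assumes 4 KiB pages and the 128-byte header quoted elsewhere in this thread. alloc_order() is a hypothetical helper, similar in spirit to the kernel's get_order() but not the same code. The sketch covers a single command table only, not the whole per-port region.

```c
#include <assert.h>   /* static_assert (C11) */
#include <stddef.h>

enum {
	SKETCH_PAGE_SZ = 4096,  /* assumption: 4 KiB pages */
	CMD_TBL_HDR_SZ = 128,   /* header size quoted elsewhere in the thread */
	SG_ENTRY_SZ    = 16,    /* "Each sg entry is 16 bytes" */
};

static_assert(CMD_TBL_HDR_SZ + 168 * SG_ENTRY_SZ == 2816,
	      "table size with the current limit of 168");

/* Size in bytes of one command table holding max_sg entries. */
static inline size_t cmd_tbl_bytes(size_t max_sg)
{
	return CMD_TBL_HDR_SZ + max_sg * SG_ENTRY_SZ;
}

/* Smallest order such that (SKETCH_PAGE_SZ << order) >= bytes. */
static inline unsigned alloc_order(size_t bytes)
{
	size_t span = SKETCH_PAGE_SZ;
	unsigned order = 0;

	while (span < bytes) {
		span <<= 1;
		order++;
	}
	return order;
}
```

With 168 entries one table is 2816 bytes and fits in a single page (order 0). With 65535 entries it is 1048688 bytes, just over 1 MiB, so it needs order 9 under these assumptions.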

> AHCI_MAX_SG is used to set the sg_tablesize (i.e. max_segments,
> apparently), which is apparently used to derive the actual "request
> size" (that is, if it is lower than max_sectors(_kb), it will be the
> limiting factor instead).
>
> For example, no matter if the drive has max_sectors set to 2560, or to
> 65535 (by adding it as the Optimal Transfer Length to libata's SATL,
> which is also max_hw_sectors that is set from ATA_MAX_SECTORS_LBA48),
> "avgrq-sz" in `iostat` will be capped at 1344 (168 * 8).

Not necessarily.  A single sg entry can point to an area larger than
PAGE_SIZE.

> However, if I change AHCI_MAX_SG to 128 (which is also the
> sg_tablesize set in libata.h from LIBATA_MAX_PRD), "avgrq-sz" in
> `iostat` will be capped at 1024 (128 * 8), which should make
> ATA_HORKAGE_MAX_SEC_1024 unnecessary.
> 
> So why has AHCI_MAX_SG been set to 168 anyway?

As written above, that probably makes the ahci command table size
nicely aligned.

Thanks.

-- 
tejun


Regarding AHCI_MAX_SG and (ATA_HORKAGE_MAX_SEC_1024)

2016-08-07 Thread Tom Yan
So the (not so) recent bump of BLK_DEF_MAX_SECTORS from 1024 to 2560
(commit d2be537c3ba3) seemed to have caused trouble to some of the ATA
devices, which were then worked around with ATA_HORKAGE_MAX_SEC_1024.

However, I am suspecting that the bump of BLK_DEF_MAX_SECTORS is not
the "real" cause of the trouble, but the fact that AHCI_MAX_SG has
been set to a weird value of 168 (with a comment "hardware max is
64K"), neither of which seems to make any sense.

AHCI_MAX_SG is used to set the sg_tablesize (i.e. max_segments,
apparently), which is apparently used to derive the actual "request
size" (that is, if it is lower than max_sectors(_kb), it will be the
limiting factor instead).

For example, no matter if the drive has max_sectors set to 2560, or to
65535 (by adding it as the Optimal Transfer Length to libata's SATL,
which is also max_hw_sectors that is set from ATA_MAX_SECTORS_LBA48),
"avgrq-sz" in `iostat` will be capped at 1344 (168 * 8).

However, if I change AHCI_MAX_SG to 128 (which is also the
sg_tablesize set in libata.h from LIBATA_MAX_PRD), "avgrq-sz" in
`iostat` will be capped at 1024 (128 * 8), which should make
ATA_HORKAGE_MAX_SEC_1024 unnecessary.
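
The reasoning behind those two caps can be sketched as follows, under the assumption that every segment covers exactly one 4096-byte page (8 sectors of 512 bytes). request_cap_sectors() is a hypothetical helper used only to illustrate the arithmetic; it is not a block-layer function.

```c
#include <assert.h>   /* static_assert (C11) */
#include <stdint.h>

enum {
	SKETCH_PAGE_SIZE   = 4096,  /* assumption: one page per segment */
	SKETCH_SECTOR_SIZE = 512,
	SECTORS_PER_PAGE   = SKETCH_PAGE_SIZE / SKETCH_SECTOR_SIZE,
};

static_assert(SECTORS_PER_PAGE == 8, "the '* 8' used in the message");

/* Effective request size cap in sectors: whichever is lower of
 * max_sectors and the size reachable with max_segments one-page
 * segments. */
static inline uint32_t request_cap_sectors(uint32_t max_sectors,
					   uint32_t max_segments)
{
	uint32_t sg_cap = max_segments * SECTORS_PER_PAGE;

	return sg_cap < max_sectors ? sg_cap : max_sectors;
}
```

Under that assumption, request_cap_sectors(2560, 168) and request_cap_sectors(65535, 168) both give 1344, while request_cap_sectors(2560, 128) gives 1024.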

So why has AHCI_MAX_SG been set to 168 anyway?

