Re: [PATCH v2 3/3] nvme: Enable autonomous power state transitions
On Fri, 2016-09-02 at 14:43 -0700, Andy Lutomirski wrote:
> On Fri, Sep 2, 2016 at 2:15 PM, J Freyensee wrote:
> >
> > On Tue, 2016-08-30 at 14:59 -0700, Andy Lutomirski wrote:
> > >
> > > NVME devices can advertise multiple power states.  These states
> > > can be either "operational" (the device is fully functional but
> > > possibly slow) or "non-operational" (the device is asleep until
> > > woken up).  Some devices can automatically enter a
> > > non-operational state when idle for a specified amount of time
> > > and then automatically wake back up when needed.
> > >
> > > The hardware configuration is a table.  For each state, an entry
> > > in the table indicates the next deeper non-operational state, if
> > > any, to autonomously transition to and the idle time required
> > > before transitioning.
> > >
> > > This patch teaches the driver to program APST so that each
> > > successive non-operational state will be entered after an idle
> > > time equal to 100% of the total latency (entry plus exit)
> > > associated with that state.  A sysfs attribute
> > > 'apst_max_latency_us' gives the maximum acceptable latency in
> > > microseconds; non-operational states with total latency greater
> > > than this value will not be used.  As a special case,
> > > apst_max_latency_us=0 will disable APST entirely.
> >
> > May I ask a dumb question?
> >
> > How does this work with multiple NVMe devices plugged into a
> > system?  I would have thought we'd want one apst_max_latency_us
> > entry per NVMe controller for individual control of each device?
> > I have two Fultondale-class devices plugged into a system I tried
> > these patches on (the 4.8-rc4 kernel) and I'm not sure how the
> > single /sys/module/nvme_core/parameters/apst_max_latency_us would
> > work per my 2 devices (and the value is using the default 25000).
>
> Ah, I faked you out :(
>
> The module parameter (nvme_core/parameters/apst_max_latency_us) just
> sets the default for newly probed devices.  The actual setting is in
> /sys/devices/whatever (symlinked from /sys/block/nvme0n1/devices,
> for example).  Perhaps I should name the former
> default_apst_max_latency_us.

It would certainly make it clearer what the entry is for, but then the
name is also getting longer.  Just "default_apst_latency_us"?  Or
maybe it's fine as-is.
Re: [PATCH v2 3/3] nvme: Enable autonomous power state transitions
On Fri, Sep 2, 2016 at 2:15 PM, J Freyensee wrote:
> On Tue, 2016-08-30 at 14:59 -0700, Andy Lutomirski wrote:
>> NVME devices can advertise multiple power states.  These states can
>> be either "operational" (the device is fully functional but possibly
>> slow) or "non-operational" (the device is asleep until woken up).
>> Some devices can automatically enter a non-operational state when
>> idle for a specified amount of time and then automatically wake back
>> up when needed.
>>
>> The hardware configuration is a table.  For each state, an entry in
>> the table indicates the next deeper non-operational state, if any,
>> to autonomously transition to and the idle time required before
>> transitioning.
>>
>> This patch teaches the driver to program APST so that each
>> successive non-operational state will be entered after an idle time
>> equal to 100% of the total latency (entry plus exit) associated with
>> that state.  A sysfs attribute 'apst_max_latency_us' gives the
>> maximum acceptable latency in microseconds; non-operational states
>> with total latency greater than this value will not be used.  As a
>> special case, apst_max_latency_us=0 will disable APST entirely.
>
> May I ask a dumb question?
>
> How does this work with multiple NVMe devices plugged into a system?
> I would have thought we'd want one apst_max_latency_us entry per NVMe
> controller for individual control of each device?  I have two
> Fultondale-class devices plugged into a system I tried these patches
> on (the 4.8-rc4 kernel) and I'm not sure how the single
> /sys/module/nvme_core/parameters/apst_max_latency_us would work per
> my 2 devices (and the value is using the default 25000).

Ah, I faked you out :(

The module parameter (nvme_core/parameters/apst_max_latency_us) just
sets the default for newly probed devices.  The actual setting is in
/sys/devices/whatever (symlinked from /sys/block/nvme0n1/devices, for
example).  Perhaps I should name the former
default_apst_max_latency_us.

>> On hardware without APST support, apst_max_latency_us will not be
>> exposed in sysfs.
>
> Not sure that is true, as from what I see so far, Fultondales don't
> support apst yet I still see:
>
> [root@nvme-fabric-host01 nvme-cli]# cat
> /sys/module/nvme_core/parameters/apst_max_latency_us
> 25000

That will be there regardless.  It's the value in the sysfs device
directory that won't be there, which is hopefully why you couldn't
find it.

>> In theory, the device can expose a "default" APST table, but this
>> doesn't seem to function correctly on my device (Samsung 950), nor
>> does it seem particularly useful.  There is also an optional
>> mechanism by which a configuration can be "saved" so it will be
>> automatically loaded on reset.  This can be configured from
>> userspace, but it doesn't seem useful to support in the driver.
>>
>> On my laptop, enabling APST seems to save nearly 1W.
>>
>> The hardware tables can be decoded in userspace with nvme-cli.
>> 'nvme id-ctrl /dev/nvmeN' will show the power state table and
>> 'nvme get-feature -f 0x0c -H /dev/nvme0' will show the current APST
>> configuration.
>
> nvme get-feature -f 0x0c -H /dev/nvme0
>
> isn't working for me, I get a:
>
> [root@nvme-fabric-host01 nvme-cli]# ./nvme get-feature -f 0x0c -H
> /dev/nvme0
> NVMe Status:INVALID_FIELD(2)
>
> I don't have the time right now to investigate further, but I'll
> assume it's because I have Fultondales (though I would have thought
> this patch would have provided enough code for the latest nvme-cli
> code to do this new get-feature as-is).

I'm assuming it doesn't work because your hardware doesn't support
APST.  nvme get-feature works even without my patches, since it mostly
bypasses the driver.

--Andy
Re: [PATCH v2 3/3] nvme: Enable autonomous power state transitions
On Tue, 2016-08-30 at 14:59 -0700, Andy Lutomirski wrote:
> NVME devices can advertise multiple power states.  These states can
> be either "operational" (the device is fully functional but possibly
> slow) or "non-operational" (the device is asleep until woken up).
> Some devices can automatically enter a non-operational state when
> idle for a specified amount of time and then automatically wake back
> up when needed.
>
> The hardware configuration is a table.  For each state, an entry in
> the table indicates the next deeper non-operational state, if any,
> to autonomously transition to and the idle time required before
> transitioning.
>
> This patch teaches the driver to program APST so that each
> successive non-operational state will be entered after an idle time
> equal to 100% of the total latency (entry plus exit) associated with
> that state.  A sysfs attribute 'apst_max_latency_us' gives the
> maximum acceptable latency in microseconds; non-operational states
> with total latency greater than this value will not be used.  As a
> special case, apst_max_latency_us=0 will disable APST entirely.

May I ask a dumb question?

How does this work with multiple NVMe devices plugged into a system?
I would have thought we'd want one apst_max_latency_us entry per NVMe
controller for individual control of each device?  I have two
Fultondale-class devices plugged into a system I tried these patches
on (the 4.8-rc4 kernel) and I'm not sure how the single
/sys/module/nvme_core/parameters/apst_max_latency_us would work per my
2 devices (and the value is using the default 25000).

Now from nvme id-ctrl /dev/nvme0 (or nvme1):

NVME Identify Controller:
vid     : 0x8086
ssvid   : 0x8086
sn      : CVFT41720018800HGN
mn      : INTEL SSDPE2MD800G4
fr      : 8DV10151
rab     : 0
ieee    : 5cd2e4
cmic    : 0
mdts    : 5
cntlid  : 0
ver     : 0
rtd3r   : 0
rtd3e   : 0
oaes    : 0
oacs    : 0x6
acl     : 3
aerl    : 3
frmw    : 0x2
lpa     : 0x2
elpe    : 63
npss    : 0
avscc   : 0
apsta   : 0   <- the Fultondales don't support apst.

But I'd still like to ask the dumb question :-).

> On hardware without APST support, apst_max_latency_us will not be
> exposed in sysfs.

Not sure that is true, as from what I see so far, Fultondales don't
support apst yet I still see:

[root@nvme-fabric-host01 nvme-cli]# cat
/sys/module/nvme_core/parameters/apst_max_latency_us
25000

> In theory, the device can expose a "default" APST table, but this
> doesn't seem to function correctly on my device (Samsung 950), nor
> does it seem particularly useful.  There is also an optional
> mechanism by which a configuration can be "saved" so it will be
> automatically loaded on reset.  This can be configured from
> userspace, but it doesn't seem useful to support in the driver.
>
> On my laptop, enabling APST seems to save nearly 1W.
>
> The hardware tables can be decoded in userspace with nvme-cli.
> 'nvme id-ctrl /dev/nvmeN' will show the power state table and
> 'nvme get-feature -f 0x0c -H /dev/nvme0' will show the current APST
> configuration.

nvme get-feature -f 0x0c -H /dev/nvme0

isn't working for me, I get a:

[root@nvme-fabric-host01 nvme-cli]# ./nvme get-feature -f 0x0c -H
/dev/nvme0
NVMe Status:INVALID_FIELD(2)

I don't have the time right now to investigate further, but I'll
assume it's because I have Fultondales (though I would have thought
this patch would have provided enough code for the latest nvme-cli
code to do this new get-feature as-is).

Jay
[PATCH v2 3/3] nvme: Enable autonomous power state transitions
NVME devices can advertise multiple power states.  These states can
be either "operational" (the device is fully functional but possibly
slow) or "non-operational" (the device is asleep until woken up).
Some devices can automatically enter a non-operational state when
idle for a specified amount of time and then automatically wake back
up when needed.

The hardware configuration is a table.  For each state, an entry in
the table indicates the next deeper non-operational state, if any,
to autonomously transition to and the idle time required before
transitioning.

This patch teaches the driver to program APST so that each
successive non-operational state will be entered after an idle time
equal to 100% of the total latency (entry plus exit) associated with
that state.  A sysfs attribute 'apst_max_latency_us' gives the
maximum acceptable latency in microseconds; non-operational states
with total latency greater than this value will not be used.  As a
special case, apst_max_latency_us=0 will disable APST entirely.

On hardware without APST support, apst_max_latency_us will not be
exposed in sysfs.

In theory, the device can expose a "default" APST table, but this
doesn't seem to function correctly on my device (Samsung 950), nor
does it seem particularly useful.  There is also an optional
mechanism by which a configuration can be "saved" so it will be
automatically loaded on reset.  This can be configured from
userspace, but it doesn't seem useful to support in the driver.

On my laptop, enabling APST seems to save nearly 1W.

The hardware tables can be decoded in userspace with nvme-cli.
'nvme id-ctrl /dev/nvmeN' will show the power state table and
'nvme get-feature -f 0x0c -H /dev/nvme0' will show the current APST
configuration.
Signed-off-by: Andy Lutomirski
---
 drivers/nvme/host/core.c | 167 +++
 drivers/nvme/host/nvme.h |   6 ++
 include/linux/nvme.h     |   6 ++
 3 files changed, 179 insertions(+)

diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index 9260d2971176..8aea8dfacda6 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -56,6 +56,12 @@ EXPORT_SYMBOL_GPL(nvme_max_retries);
 static int nvme_char_major;
 module_param(nvme_char_major, int, 0);
 
+static unsigned long default_apst_max_latency_us = 25000;
+module_param_named(apst_max_latency_us, default_apst_max_latency_us,
+		   ulong, 0644);
+MODULE_PARM_DESC(apst_max_latency_us,
+	"default max APST latency; overridden per device in sysfs");
+
 static LIST_HEAD(nvme_ctrl_list);
 static DEFINE_SPINLOCK(dev_list_lock);
 
@@ -1209,6 +1215,98 @@ static void nvme_set_queue_limits(struct nvme_ctrl *ctrl,
 	blk_queue_write_cache(q, vwc, vwc);
 }
 
+static void nvme_configure_apst(struct nvme_ctrl *ctrl)
+{
+	/*
+	 * APST (Autonomous Power State Transition) lets us program a
+	 * table of power state transitions that the controller will
+	 * perform automatically.  We configure it with a simple
+	 * heuristic: we are willing to spend at most 2% of the time
+	 * transitioning between power states.  Therefore, when running
+	 * in any given state, we will enter the next lower-power
+	 * non-operational state after waiting 100 * (enlat + exlat)
+	 * microseconds, as long as that state's total latency is under
+	 * the requested maximum latency.
+	 *
+	 * We will not autonomously enter any non-operational state for
+	 * which the total latency exceeds apst_max_latency_us.  Users
+	 * can set apst_max_latency_us to zero to turn off APST.
+	 */
+
+	unsigned apste;
+	struct nvme_feat_auto_pst *table;
+	int ret;
+
+	if (!ctrl->apsta)
+		return;		/* APST isn't supported. */
+
+	if (ctrl->npss > 31) {
+		dev_warn(ctrl->device, "NPSS is invalid; not using APST\n");
+		return;
+	}
+
+	table = kzalloc(sizeof(*table), GFP_KERNEL);
+	if (!table)
+		return;
+
+	if (ctrl->apst_max_latency_us == 0) {
+		/* Turn off APST. */
+		apste = 0;
+	} else {
+		__le64 target = cpu_to_le64(0);
+		int state;
+
+		/*
+		 * Walk through all states from lowest- to highest-power.
+		 * According to the spec, lower-numbered states use more
+		 * power.  NPSS, despite the name, is the index of the
+		 * lowest-power state, not the number of states.
+		 */
+		for (state = (int)ctrl->npss; state >= 0; state--) {
+			u64 total_latency_us, transition_ms;
+
+			if (target)
+				table->entries[state] = target;
+
+			/*
+			 * Is this state a useful non-operational state for
+			 * higher-power states to autonomously transition to?