Re: [PATCH v4 0/3] nvme power saving

2016-10-27 Thread Christoph Hellwig
On Thu, Oct 27, 2016 at 05:06:16PM -0700, Andy Lutomirski wrote:
> It looks like there is at least one NVMe disk in existence (a
> different Samsung device) that sporadically dies when APST is on.
> This device appears to also sporadically die when APST is off, but it
> lasts considerably longer before dying with APST off.

Judy, can you help Andy to find someone in Samsung to report this
to?

> So here's what I'm tempted to do:
> 
>  - For devices that report NVMe version 1.2 support, APST is on by
> default.  I hope this is safe.

It should be safe.  That being said, NVMe is being driven more and more
into consumer markets, so we will inevitably find some device we need
to work around, but that's life.

>  - For devices that don't report NVMe 1.2 or higher but do report
> APSTA (which implies NVMe 1.1), then we can have a blacklist or a
> whitelist.  A blacklist is nicer, but a whitelist is safer.

We just had a discussion in the NVMe technical working group about
advertising features before claiming conformance to the spec revision
where they appear.  The general consensus was that it should be safe.
I'm thus tempted to start out with the blacklist.
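
(As a rough illustration of that policy -- the field and quirk names below
are made up for this sketch and are not from the posted patches:)

static bool nvme_apst_allowed(struct nvme_ctrl *ctrl)
{
	/* Device does not advertise APST at all. */
	if (!ctrl->apsta)
		return false;
	/* Hypothetical blacklist entry for known-bad devices. */
	if (ctrl->quirks & NVME_QUIRK_NO_APST)
		return false;
	/*
	 * Everything left either claims NVMe 1.2+ or is a 1.1 device
	 * advertising APSTA; per the consensus above, trust it unless
	 * it ends up on the blacklist later.
	 */
	return true;
}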

>  - A sysfs and/or module control allows overriding this.
> 
>  - Implement dev_pm_qos latency control.  The chosen latency (if APST
> is enabled) will be the lesser of the dev_pm_qos setting and a module
> parameter.
> 
> How does that sound?

Great!


Re: [PATCH v4 0/3] nvme power saving

2016-10-27 Thread Andy Lutomirski
On Thu, Sep 22, 2016 at 3:15 PM, Andy Lutomirski  wrote:
> On Thu, Sep 22, 2016 at 2:33 PM, J Freyensee
>  wrote:
>> On Thu, 2016-09-22 at 14:43 -0600, Jens Axboe wrote:
>>> On 09/22/2016 02:11 PM, Andy Lutomirski wrote:
>>> >
>>> > On Thu, Sep 22, 2016 at 7:23 AM, Jens Axboe  wrote:
>>> > >
>>> > >
>>> > > On 09/16/2016 12:16 PM, Andy Lutomirski wrote:
>>> > > >
>>> > > >
>>> > > > Hi all-
>>> > > >
>>> > > > Here's v4 of the APST patch set.  The biggest bikesheddable
>>> > > > thing (I
>>> > > > think) is the scaling factor.  I currently have it hardcoded so
>>> > > > that
>>> > > > we wait 50x the total latency before entering a power saving
>>> > > > state.
>>> > > > On my Samsung 950, this means we enter state 3 (70mW, 0.5ms
>>> > > > entry
>>> > > > latency, 5ms exit latency) after 275ms and state 4 (5mW, 2ms
>>> > > > entry
>>> > > > latency, 22ms exit latency) after 1200ms.  I have the default
>>> > > > max
>>> > > > latency set to 25ms.
>>> > > >
>>> > > > FWIW, in practice, the latency this introduces seems to be well
>>> > > > under 22ms, but my benchmark is a bit silly and I might have
>>> > > > measured it wrong.  I certainly haven't observed a slowdown
>>> > > > just
>>> > > > using my laptop.
>>> > > >
>>> > > > This time around, I changed the names of parameters after Jay
>>> > > > Freyensee got confused by the first try.  Now they are:
>>> > > >
>>> > > >  - ps_max_latency_us in sysfs: actually controls it.
>>> > > >  - nvme_core.default_ps_max_latency_us: sets the default.
>>> > > >
>>> > > > Yeah, they're mouthfuls, but they should be clearer now.
>>> > >
>>> > >
>>> > > The only thing I don't like about this is the fact that it's a
>>> > > driver-private thing. Similar to ALPM on SATA, it's yet another
>>> > > knob that needs to be set. If we put it somewhere generic, then
>>> > > at least we could potentially use it in a generic fashion.
>>> >
>>> > Agreed.  I'm hoping to hear back from Rafael soon about the
>>> > dev_pm_qos
>>> > thing.
>>> >
>>> > >
>>> > >
>>> > > Additionally, it should not be on by default.
>>> >
>>> > I think I disagree with this.  Since we don't have anything like
>>> > laptop-mode AFAIK, I think we do want it on by default.  For the
>>> > server workloads that want to consume more idle power for faster
>>> > response when idle, I think the servers should be willing to make
>>> > this
>>> > change, just like they need to disable overly deep C states, etc.
>>> > (Admittedly, unifying the configuration would be nice.)
>>>
>>> I can see two reasons why we don't want it to be the default:
>>>
>>> 1) Changes like this have a tendency to cause issues on various types of
>>> hardware. How many NVMe devices have you tested this on? ALPM on SATA
>>> had a lot of initial problems, where it slowed down some SSDs unbearably.
>
> I'm reasonably optimistic that the NVMe situation will be a lot better
> for a couple of reasons:
>
> 1. There's only one player involved.  With ALPM, the controller and
> the drive need to cooperate on entering and leaving various idle
> states.  With NVMe, the controller *is* the drive, so there's no issue
> where a drive manufacturer might not have tested with the relevant
> controller or vice versa.
>
> 2. Windows appears to use it.  I haven't tested directly, but the
> Internet seems to think that Windows uses APST and maybe even manual
> state transitions, and that NVMe power states are even mandatory for
> Connected Standby logo compliance.
>
> 3. The feature is new.  NVMe 1.0 didn't support APST at all, so the
> driver is unlikely to cause problems with older drives.
>
>>
>> ...and some SSDs don't even support this feature yet, so the number of
>> different NVMe devices available to test initially will most likely be
>> small (like the Fultondales I have, all I could check is to see if the
>> code broke anything if the device did not have this power-save
>> feature).
>>
>> I agree with Jens, makes a lot of sense to start with this feature
>> 'off'.
>>
>> To 'advertise' the feature, maybe make the feature a new selection in
>> Kconfig?  Example, initially make it "EXPERIMENTAL", and later when
>> more devices implement this feature it can be integrated more tightly
>> into the NVMe solution and default to on.
>>
>
> How about having a config option that's "default n" that changes the
> default?  I could also add a log message when APST is first enabled on
> a device to make it easier to notice a change.
>

It looks like there is at least one NVMe disk in existence (a
different Samsung device) that sporadically dies when APST is on.
This device appears to also sporadically die when APST is off, but it
lasts considerably longer before dying with APST off.

So here's what I'm tempted to do:

 - For devices that report NVMe version 1.2 support, APST is on by
default.  I hope this is safe.

 - For devices that don't report NVMe 1.2 or higher but do report
APSTA (which implies NVMe 1.1), then we can have a blacklist or a
whitelist.  A blacklist is nicer, but a whitelist is safer.

 - A sysfs and/or module control allows overriding this.

 - Implement dev_pm_qos latency control.  The chosen latency (if APST
is enabled) will be the lesser of the dev_pm_qos setting and a module
parameter.

How does that sound?
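
(As a rough sketch of the dev_pm_qos point above -- the accessor name is a
placeholder, not a real kernel API, and the field names are illustrative:)

static u64 nvme_effective_max_latency_us(struct nvme_ctrl *ctrl)
{
	/* Stand-in for whichever dev_pm_qos accessor ends up being used. */
	u64 qos_us = read_dev_pm_qos_latency_us(ctrl->device);

	/*
	 * The effective latency budget is the lesser of the dev_pm_qos
	 * request and the nvme_core.default_ps_max_latency_us module
	 * parameter.
	 */
	return min_t(u64, qos_us, default_ps_max_latency_us);
}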

Re: [PATCH v4 0/3] nvme power saving

2016-09-24 Thread Jens Axboe

On 09/23/2016 05:42 PM, Christoph Hellwig wrote:

Jens,

can we at least get patches 1 and 2 in while pondering the fate
of the right interface for patch 3?



Yes definitely, I have no beef with the first two patches. I'll add them
for 4.9.

--
Jens Axboe



Re: [PATCH v4 0/3] nvme power saving

2016-09-23 Thread Christoph Hellwig
Jens,

can we at least get patches 1 and 2 in while pondering the fate
of the right interface for patch 3?


Re: [PATCH v4 0/3] nvme power saving

2016-09-22 Thread Andy Lutomirski
On Thu, Sep 22, 2016 at 2:33 PM, J Freyensee
 wrote:
> On Thu, 2016-09-22 at 14:43 -0600, Jens Axboe wrote:
>> On 09/22/2016 02:11 PM, Andy Lutomirski wrote:
>> >
>> > On Thu, Sep 22, 2016 at 7:23 AM, Jens Axboe  wrote:
>> > >
>> > >
>> > > On 09/16/2016 12:16 PM, Andy Lutomirski wrote:
>> > > >
>> > > >
>> > > > Hi all-
>> > > >
>> > > > Here's v4 of the APST patch set.  The biggest bikesheddable
>> > > > thing (I
>> > > > think) is the scaling factor.  I currently have it hardcoded so
>> > > > that
>> > > > we wait 50x the total latency before entering a power saving
>> > > > state.
>> > > > On my Samsung 950, this means we enter state 3 (70mW, 0.5ms
>> > > > entry
>> > > > latency, 5ms exit latency) after 275ms and state 4 (5mW, 2ms
>> > > > entry
>> > > > latency, 22ms exit latency) after 1200ms.  I have the default
>> > > > max
>> > > > latency set to 25ms.
>> > > >
>> > > > FWIW, in practice, the latency this introduces seems to be well
>> > > > under 22ms, but my benchmark is a bit silly and I might have
>> > > > measured it wrong.  I certainly haven't observed a slowdown
>> > > > just
>> > > > using my laptop.
>> > > >
>> > > > This time around, I changed the names of parameters after Jay
>> > > > Freyensee got confused by the first try.  Now they are:
>> > > >
>> > > >  - ps_max_latency_us in sysfs: actually controls it.
>> > > >  - nvme_core.default_ps_max_latency_us: sets the default.
>> > > >
>> > > > Yeah, they're mouthfuls, but they should be clearer now.
>> > >
>> > >
>> > > The only thing I don't like about this is the fact that it's a
>> > > driver-private thing. Similar to ALPM on SATA, it's yet another
>> > > knob that needs to be set. If we put it somewhere generic, then
>> > > at least we could potentially use it in a generic fashion.
>> >
>> > Agreed.  I'm hoping to hear back from Rafael soon about the
>> > dev_pm_qos
>> > thing.
>> >
>> > >
>> > >
>> > > Additionally, it should not be on by default.
>> >
>> > I think I disagree with this.  Since we don't have anything like
>> > laptop-mode AFAIK, I think we do want it on by default.  For the
>> > server workloads that want to consume more idle power for faster
>> > response when idle, I think the servers should be willing to make
>> > this
>> > change, just like they need to disable overly deep C states, etc.
>> > (Admittedly, unifying the configuration would be nice.)
>>
>> I can see two reasons why we don't want it to be the default:
>>
>> 1) Changes like this have a tendency to cause issues on various types of
>> hardware. How many NVMe devices have you tested this on? ALPM on SATA
>> had a lot of initial problems, where it slowed down some SSDs unbearably.

I'm reasonably optimistic that the NVMe situation will be a lot better
for a couple of reasons:

1. There's only one player involved.  With ALPM, the controller and
the drive need to cooperate on entering and leaving various idle
states.  With NVMe, the controller *is* the drive, so there's no issue
where a drive manufacturer might not have tested with the relevant
controller or vice versa.

2. Windows appears to use it.  I haven't tested directly, but the
Internet seems to think that Windows uses APST and maybe even manual
state transitions, and that NVMe power states are even mandatory for
Connected Standby logo compliance.

3. The feature is new.  NVMe 1.0 didn't support APST at all, so the
driver is unlikely to cause problems with older drives.

>
> ...and some SSDs don't even support this feature yet, so the number of
> different NVMe devices available to test initially will most likely be
> small (like the Fultondales I have, all I could check is to see if the
> code broke anything if the device did not have this power-save
> feature).
>
> I agree with Jens, makes a lot of sense to start with this feature
> 'off'.
>
> To 'advertise' the feature, maybe make the feature a new selection in
> Kconfig?  Example, initially make it "EXPERIMENTAL", and later when
> more devices implement this feature it can be integrated more tightly
> into the NVMe solution and default to on.
>

How about having a config option that's "default n" that changes the
default?  I could also add a log message when APST is first enabled on
a device to make it easier to notice a change.
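
(Sketching what that could look like -- the Kconfig symbol is invented here
for illustration; only the parameter name and the 25ms default come from the
patch set:)

/* Default to off unless a hypothetical config option flips it. */
static unsigned long default_ps_max_latency_us =
	IS_ENABLED(CONFIG_NVME_DEFAULT_APST) ? 25000 : 0;
module_param(default_ps_max_latency_us, ulong, 0644);
MODULE_PARM_DESC(default_ps_max_latency_us,
		 "default max APST latency in microseconds; 0 disables APST");

/* ...and wherever APST actually gets programmed: */
if (apst_enabled)
	dev_info(ctrl->device, "APST enabled, max latency %lu us\n",
		 max_latency_us);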

--Andy


Re: [PATCH v4 0/3] nvme power saving

2016-09-22 Thread Jens Axboe

On 09/22/2016 04:16 PM, Keith Busch wrote:

On Thu, Sep 22, 2016 at 02:33:36PM -0700, J Freyensee wrote:

...and some SSDs don't even support this feature yet, so the number of
different NVMe devices available to test initially will most likely be
small (like the Fultondales I have, all I could check is to see if the
code broke anything if the device did not have this power-save
feature).

I agree with Jens, makes a lot of sense to start with this feature
'off'.

To 'advertise' the feature, maybe make the feature a new selection in
Kconfig?  Example, initially make it "EXPERIMENTAL", and later when
more devices implement this feature it can be integrated more tightly
into the NVMe solution and default to on.


Should we just leave the kernel out of this then? I bet we could script
this feature in user space.


That actually might be the sanest approach. Then we can tie it into some
generic PM latency setup in the future, from the kernel, when it's
available.

That way we don't get left with some odd NVMe specific PM sysfs knob
that is exposed to userland.

--
Jens Axboe



Re: [PATCH v4 0/3] nvme power saving

2016-09-22 Thread Keith Busch
On Thu, Sep 22, 2016 at 02:33:36PM -0700, J Freyensee wrote:
> ...and some SSDs don't even support this feature yet, so the number of
> different NVMe devices available to test initially will most likely be
> small (like the Fultondales I have, all I could check is to see if the
> code broke anything if the device did not have this power-save
> feature).
> 
> I agree with Jens, makes a lot of sense to start with this feature
> 'off'.
> 
> To 'advertise' the feature, maybe make the feature a new selection in
> Kconfig?  Example, initially make it "EXPERIMENTAL", and later when
> more devices implement this feature it can be integrated more tightly
> into the NVMe solution and default to on.

Should we just leave the kernel out of this then? I bet we could script
this feature in user space.


Re: [PATCH v4 0/3] nvme power saving

2016-09-22 Thread J Freyensee
On Thu, 2016-09-22 at 14:43 -0600, Jens Axboe wrote:
> On 09/22/2016 02:11 PM, Andy Lutomirski wrote:
> > 
> > On Thu, Sep 22, 2016 at 7:23 AM, Jens Axboe  wrote:
> > > 
> > > 
> > > On 09/16/2016 12:16 PM, Andy Lutomirski wrote:
> > > > 
> > > > 
> > > > Hi all-
> > > > 
> > > > Here's v4 of the APST patch set.  The biggest bikesheddable
> > > > thing (I
> > > > think) is the scaling factor.  I currently have it hardcoded so
> > > > that
> > > > we wait 50x the total latency before entering a power saving
> > > > state.
> > > > On my Samsung 950, this means we enter state 3 (70mW, 0.5ms
> > > > entry
> > > > latency, 5ms exit latency) after 275ms and state 4 (5mW, 2ms
> > > > entry
> > > > latency, 22ms exit latency) after 1200ms.  I have the default
> > > > max
> > > > latency set to 25ms.
> > > > 
> > > > FWIW, in practice, the latency this introduces seems to be well
> > > > under 22ms, but my benchmark is a bit silly and I might have
> > > > measured it wrong.  I certainly haven't observed a slowdown
> > > > just
> > > > using my laptop.
> > > > 
> > > > This time around, I changed the names of parameters after Jay
> > > > Freyensee got confused by the first try.  Now they are:
> > > > 
> > > >  - ps_max_latency_us in sysfs: actually controls it.
> > > >  - nvme_core.default_ps_max_latency_us: sets the default.
> > > > 
> > > > Yeah, they're mouthfuls, but they should be clearer now.
> > > 
> > > 
> > > The only thing I don't like about this is the fact that it's a
> > > driver-private thing. Similar to ALPM on SATA, it's yet another
> > > knob that needs to be set. If we put it somewhere generic, then
> > > at least we could potentially use it in a generic fashion.
> > 
> > Agreed.  I'm hoping to hear back from Rafael soon about the
> > dev_pm_qos
> > thing.
> > 
> > > 
> > > 
> > > Additionally, it should not be on by default.
> > 
> > I think I disagree with this.  Since we don't have anything like
> > laptop-mode AFAIK, I think we do want it on by default.  For the
> > server workloads that want to consume more idle power for faster
> > response when idle, I think the servers should be willing to make
> > this
> > change, just like they need to disable overly deep C states, etc.
> > (Admittedly, unifying the configuration would be nice.)
> 
> I can see two reasons why we don't want it to be the default:
> 
> 1) Changes like this have a tendency to cause issues on various types of
> hardware. How many NVMe devices have you tested this on? ALPM on SATA
> had a lot of initial problems, where it slowed down some SSDs unbearably.

...and some SSDs don't even support this feature yet, so the number of
different NVMe devices available to test initially will most likely be
small (like the Fultondales I have, all I could check is to see if the
code broke anything if the device did not have this power-save
feature).

I agree with Jens, makes a lot of sense to start with this feature
'off'.

To 'advertise' the feature, maybe make the feature a new selection in
Kconfig?  Example, initially make it "EXPERIMENTAL", and later when
more devices implement this feature it can be integrated more tightly
into the NVMe solution and default to on.



Re: [PATCH v4 0/3] nvme power saving

2016-09-22 Thread Jens Axboe

On 09/22/2016 02:11 PM, Andy Lutomirski wrote:

On Thu, Sep 22, 2016 at 7:23 AM, Jens Axboe  wrote:


On 09/16/2016 12:16 PM, Andy Lutomirski wrote:


Hi all-

Here's v4 of the APST patch set.  The biggest bikesheddable thing (I
think) is the scaling factor.  I currently have it hardcoded so that
we wait 50x the total latency before entering a power saving state.
On my Samsung 950, this means we enter state 3 (70mW, 0.5ms entry
latency, 5ms exit latency) after 275ms and state 4 (5mW, 2ms entry
latency, 22ms exit latency) after 1200ms.  I have the default max
latency set to 25ms.

FWIW, in practice, the latency this introduces seems to be well
under 22ms, but my benchmark is a bit silly and I might have
measured it wrong.  I certainly haven't observed a slowdown just
using my laptop.

This time around, I changed the names of parameters after Jay
Freyensee got confused by the first try.  Now they are:

 - ps_max_latency_us in sysfs: actually controls it.
 - nvme_core.default_ps_max_latency_us: sets the default.

Yeah, they're mouthfuls, but they should be clearer now.



The only thing I don't like about this is the fact that it's a driver-private
thing. Similar to ALPM on SATA, it's yet another knob that needs to be set. If
we put it somewhere generic, then at least we could potentially use it in a
generic fashion.


Agreed.  I'm hoping to hear back from Rafael soon about the dev_pm_qos
thing.



Additionally, it should not be on by default.


I think I disagree with this.  Since we don't have anything like
laptop-mode AFAIK, I think we do want it on by default.  For the
server workloads that want to consume more idle power for faster
response when idle, I think the servers should be willing to make this
change, just like they need to disable overly deep C states, etc.
(Admittedly, unifying the configuration would be nice.)


I can see two reasons why we don't want it to be the default:

1) Changes like this have a tendency to cause issues on various types of
hardware. How many NVMe devices have you tested this on? ALPM on SATA
had a lot of initial problems, where it slowed down some SSDs unbearably.

2) Rolling out a new kernel and seeing a weird slowdown on some
workloads usually costs a LOT of time to investigate and finally get to
the bottom of. It's not that server setups don't want to make this
change, it's usually that they don't know about it until it's caused
some issue in production (e.g. a slowdown, or otherwise).

Either one of those is enough, in my book, to default it to off. I ran
it on my laptop and saw no power saving wins, unfortunately, for what
it's worth.

--
Jens Axboe



Re: [PATCH v4 0/3] nvme power saving

2016-09-22 Thread Andy Lutomirski
On Thu, Sep 22, 2016 at 7:23 AM, Jens Axboe  wrote:
>
> On 09/16/2016 12:16 PM, Andy Lutomirski wrote:
>>
>> Hi all-
>>
>> Here's v4 of the APST patch set.  The biggest bikesheddable thing (I
>> think) is the scaling factor.  I currently have it hardcoded so that
>> we wait 50x the total latency before entering a power saving state.
>> On my Samsung 950, this means we enter state 3 (70mW, 0.5ms entry
>> latency, 5ms exit latency) after 275ms and state 4 (5mW, 2ms entry
>> latency, 22ms exit latency) after 1200ms.  I have the default max
>> latency set to 25ms.
>>
>> FWIW, in practice, the latency this introduces seems to be well
>> under 22ms, but my benchmark is a bit silly and I might have
>> measured it wrong.  I certainly haven't observed a slowdown just
>> using my laptop.
>>
>> This time around, I changed the names of parameters after Jay
>> Freyensee got confused by the first try.  Now they are:
>>
>>  - ps_max_latency_us in sysfs: actually controls it.
>>  - nvme_core.default_ps_max_latency_us: sets the default.
>>
>> Yeah, they're mouthfuls, but they should be clearer now.
>
>
> The only thing I don't like about this is the fact that it's a driver-private
> thing. Similar to ALPM on SATA, it's yet another knob that needs to
> be set. If we put it somewhere generic, then at least we could potentially
> use it in a generic fashion.

Agreed.  I'm hoping to hear back from Rafael soon about the dev_pm_qos thing.

>
> Additionally, it should not be on by default.

I think I disagree with this.  Since we don't have anything like
laptop-mode AFAIK, I think we do want it on by default.  For the
server workloads that want to consume more idle power for faster
response when idle, I think the servers should be willing to make this
change, just like they need to disable overly deep C states, etc.
(Admittedly, unifying the configuration would be nice.)


Re: [PATCH v4 0/3] nvme power saving

2016-09-22 Thread Jens Axboe

On 09/16/2016 12:16 PM, Andy Lutomirski wrote:

Hi all-

Here's v4 of the APST patch set.  The biggest bikesheddable thing (I
think) is the scaling factor.  I currently have it hardcoded so that
we wait 50x the total latency before entering a power saving state.
On my Samsung 950, this means we enter state 3 (70mW, 0.5ms entry
latency, 5ms exit latency) after 275ms and state 4 (5mW, 2ms entry
latency, 22ms exit latency) after 1200ms.  I have the default max
latency set to 25ms.

FWIW, in practice, the latency this introduces seems to be well
under 22ms, but my benchmark is a bit silly and I might have
measured it wrong.  I certainly haven't observed a slowdown just
using my laptop.

This time around, I changed the names of parameters after Jay
Freyensee got confused by the first try.  Now they are:

 - ps_max_latency_us in sysfs: actually controls it.
 - nvme_core.default_ps_max_latency_us: sets the default.

Yeah, they're mouthfuls, but they should be clearer now.


The only thing I don't like about this is the fact that it's a driver-private
thing. Similar to ALPM on SATA, it's yet another knob that needs
to be set. If we put it somewhere generic, then at least we could
potentially use it in a generic fashion.


Additionally, it should not be on by default.

--
Jens Axboe



Re: [PATCH v4 0/3] nvme power saving

2016-09-22 Thread Christoph Hellwig
On Wed, Sep 21, 2016 at 05:11:03PM -0700, Andy Lutomirski wrote:
> Anything I can/should do to help this make it for 4.9? :)

Maybe you should have added the Reviewed-by: tags you already got to
this repost? :)

Here we go again:

Reviewed-by: Christoph Hellwig 



Re: [PATCH v4 0/3] nvme power saving

2016-09-21 Thread Andy Lutomirski
On Fri, Sep 16, 2016 at 11:16 AM, Andy Lutomirski  wrote:
> Hi all-
>
> Here's v4 of the APST patch set.  The biggest bikesheddable thing (I
> think) is the scaling factor.  I currently have it hardcoded so that
> we wait 50x the total latency before entering a power saving state.
> On my Samsung 950, this means we enter state 3 (70mW, 0.5ms entry
> latency, 5ms exit latency) after 275ms and state 4 (5mW, 2ms entry
> latency, 22ms exit latency) after 1200ms.  I have the default max
> latency set to 25ms.

Anything I can/should do to help this make it for 4.9? :)

--Andy


Re: [PATCH v4 0/3] nvme power saving

2016-09-16 Thread J Freyensee
On Fri, 2016-09-16 at 11:16 -0700, Andy Lutomirski wrote:
> Hi all-
> 
> Here's v4 of the APST patch set.  The biggest bikesheddable thing (I
> think) is the scaling factor.  I currently have it hardcoded so that
> we wait 50x the total latency before entering a power saving state.
> On my Samsung 950, this means we enter state 3 (70mW, 0.5ms entry
> latency, 5ms exit latency) after 275ms and state 4 (5mW, 2ms entry
> latency, 22ms exit latency) after 1200ms.  I have the default max
> latency set to 25ms.
> 
> FWIW, in practice, the latency this introduces seems to be well
> under 22ms, but my benchmark is a bit silly and I might have
> measured it wrong.  I certainly haven't observed a slowdown just
> using my laptop.
> 
> This time around, I changed the names of parameters after Jay
> Freyensee got confused by the first try.  Now they are:
> 
>  - ps_max_latency_us in sysfs: actually controls it.
>  - nvme_core.default_ps_max_latency_us: sets the default.
> 
> Yeah, they're mouthfuls, but they should be clearer now.
> 

I took the patches and applied them to one of my NVMe fabric hosts on
my NVMe-over-Fabrics setup.  Basically, it doesn't test much other than
Andy's explanation that "ps_max_latency_us" does not appear in any of
/sys/block/nvmeXnY sysfs nodes (I have 7), so it seems good to me on this
front.

Tested-by: Jay Freyensee 
[jpf: defaults benign to NVMe-over-Fabrics]


[PATCH v4 0/3] nvme power saving

2016-09-16 Thread Andy Lutomirski
Hi all-

Here's v4 of the APST patch set.  The biggest bikesheddable thing (I
think) is the scaling factor.  I currently have it hardcoded so that
we wait 50x the total latency before entering a power saving state.
On my Samsung 950, this means we enter state 3 (70mW, 0.5ms entry
latency, 5ms exit latency) after 275ms and state 4 (5mW, 2ms entry
latency, 22ms exit latency) after 1200ms.  I have the default max
latency set to 25ms.
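
(To make the arithmetic explicit -- this is just the 50x rule applied to the
latencies quoted above, not code from the patches:)

/* Idle time before a transition, per the 50x heuristic. */
static u64 apst_idle_timeout_us(u64 entry_latency_us, u64 exit_latency_us)
{
	return 50 * (entry_latency_us + exit_latency_us);
}

/*
 * Samsung 950 numbers from above:
 *   state 3: 50 * (500 + 5000)   =  275000 us =  275 ms
 *   state 4: 50 * (2000 + 22000) = 1200000 us = 1200 ms
 */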

FWIW, in practice, the latency this introduces seems to be well
under 22ms, but my benchmark is a bit silly and I might have
measured it wrong.  I certainly haven't observed a slowdown just
using my laptop.

This time around, I changed the names of parameters after Jay
Freyensee got confused by the first try.  Now they are:

 - ps_max_latency_us in sysfs: actually controls it.
 - nvme_core.default_ps_max_latency_us: sets the default.

Yeah, they're mouthfuls, but they should be clearer now.
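
(For concreteness, a sketch of the sysfs side -- only the attribute name comes
from this cover letter; the ctrl->ps_max_latency_us field and the rest are
illustrative, and a real store handler would also re-program the APST table:)

static ssize_t ps_max_latency_us_show(struct device *dev,
				      struct device_attribute *attr, char *buf)
{
	struct nvme_ctrl *ctrl = dev_get_drvdata(dev);

	return sprintf(buf, "%llu\n", ctrl->ps_max_latency_us);
}

static ssize_t ps_max_latency_us_store(struct device *dev,
				       struct device_attribute *attr,
				       const char *buf, size_t count)
{
	struct nvme_ctrl *ctrl = dev_get_drvdata(dev);

	if (kstrtoull(buf, 10, &ctrl->ps_max_latency_us))
		return -EINVAL;
	return count;
}
static DEVICE_ATTR_RW(ps_max_latency_us);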

Changes from v3:
 - Remove const from nvme_set_feature()'s parameter.  My inner C++
   programmer cringes a little...

Changes from v2:
 - Rename the parameters.

Changes from v1:
 - Get rid of feature buffer alignment warnings.
 - Change the error message if NPSS is bogus.
 - Rename apst_max_latency_ns to apst_max_latency_us because module params
   don't like u64 or unsigned long long and I wanted to make it fit more
   comfortably in a ulong module param.  (And the nanoseconds were useless.)
 - Add a module parameter for the default max latency.

Andy Lutomirski (3):
  nvme/scsi: Remove power management support
  nvme: Pass pointers, not dma addresses, to nvme_get/set_features()
  nvme: Enable autonomous power state transitions

 drivers/nvme/host/core.c | 180 ---
 drivers/nvme/host/nvme.h |  10 ++-
 drivers/nvme/host/scsi.c |  80 ++---
 include/linux/nvme.h |   6 ++
 4 files changed, 192 insertions(+), 84 deletions(-)

-- 
2.7.4