Re: [PATCH v2 00/91] drm/vc4: Support BCM2711 Display Pipeline

2020-05-27 Thread Daniel Drake
On Wed, May 27, 2020 at 5:13 PM Maxime Ripard  wrote:
> I'm about to send a v3 today or tomorrow, I can Cc you (and Jian-Hong) if you
> want.

That would be great, although given the potentially inconsistent
results we've been seeing so far, it would also help if you could
push a git branch somewhere.
That way we can have higher confidence that we are applying exactly
the same patches to the same base etc.

Thanks
Daniel


Re: [PATCH v2 00/91] drm/vc4: Support BCM2711 Display Pipeline

2020-05-26 Thread Daniel Drake
Hi Maxime,

On Tue, May 26, 2020 at 6:20 PM Maxime Ripard  wrote:
> I gave it a try with U-Boot with my latest work and couldn't reproduce it, so 
> it
> seems that I fixed it along the way

Is your latest work available in a git branch anywhere that we could
test directly?

Thanks
Daniel


Re: [PATCH v4 0/3] Replace private domain with per-group default domain

2020-05-05 Thread Daniel Drake
On Wed, May 6, 2020 at 10:03 AM Lu Baolu  wrote:
> https://lkml.org/lkml/2020/4/14/616
> [This has been applied in iommu/next.]
>
> Hence, there is no need to keep the private domain implementation
> in the Intel IOMMU driver. This patch series aims to remove it.

I applied these patches on top of Joerg's branch and confirmed that
they fix the issue discussed in the thread:

[PATCH v2] iommu/vt-d: consider real PCI device when checking if
mapping is needed
(the patch there is no longer needed)

Tested-by: Daniel Drake 

Thanks!


Re: [PATCH] PCI: PM: Fix pci_power_up()

2019-10-14 Thread Daniel Drake
On Mon, Oct 14, 2019 at 7:25 PM Rafael J. Wysocki  wrote:
> Since there is no reason for that difference to exist, modify
> pci_power_up() to follow pci_set_power_state() more closely and
> invoke __pci_start_power_transition() from there to call the
> platform firmware to power up the device (in case that's necessary).
>
> Fixes: db288c9c5f9d ("PCI / PM: restore the original behavior of 
> pci_set_power_state()")
> Reported-by: Daniel Drake 
> Link: 
> https://lore.kernel.org/linux-pm/cad8lp44tyxrmgplkhcqf9hv6smeurmxvmmvmtyfhz6q4se+...@mail.gmail.com/T/#m21be74af263c6a34f36e0fc5c77c5449d9406925
> Signed-off-by: Rafael J. Wysocki 
> ---
>
> Daniel, please test this one.

This one is working too, thanks

Daniel


Re: [RFD] x86/tsc: Loosen the requirements for watchdog - (was x86/hpet: Disable HPET on Intel Coffe Lake)

2019-08-29 Thread Daniel Drake
Hi Thomas,

On Fri, Aug 30, 2019 at 5:38 AM Thomas Gleixner  wrote:
> So if we have to disable the HPET on Kaby Lake alltogether unless Intel
> comes up with the clever fix, i.e. poking at the right registers, then I
> think we should also lift the TSC watchdog restrictions on these machines
> if they are single socket, which they are as the affected CPUs so far are
> mobile and client types.
>
> Also given the fact that we get more and more 'reduced' hardware exposed
> via ACPI and we already dealt with quite some fallout with various related
> issues due to that, I fear we need to bite this bullet anyway anytime soon.

Thanks for the explanation here!

My experience in this area is basically limited to the clock-related
issues that I've sent your way recently, so I don't have deep wisdom
to draw upon, but what you wrote here makes sense to me.

If you can outline a testing procedure, we can test upcoming patches
on Coffee Lake and Kaby Lake consumer laptops.

Thanks,
Daniel


Re: Early EFI-related boot freeze in parse_setup_data()

2019-08-27 Thread Daniel Drake
On Fri, Aug 16, 2019 at 2:14 PM Daniel Drake  wrote:
> Anyway, the system freeze occurs in parse_setup_data(), specifically:
>
> data = early_memremap(pa_data, sizeof(*data));
> data_len = data->len + sizeof(struct setup_data);
>
> Dereferencing data->len causes the system to hang. I presume it
> triggers an exception handler due to some kind of invalid memory
> access.
>
> By returning early in that function, boot continues basically fine. So
> I could then log the details: pa_data has value 0x892bb018 and
> early_memremap returns address 0xff200018. Accessing just a
> single byte at that address causes the system hang.

I noticed a complaint about NX in the logs, right where it does the
early_memremap of this data (which is now at address 0x893c0018):

 Notice: NX (Execute Disable) protection missing in CPU!
 e820: update [mem 0x893c0018-0x893cec57] usable ==> usable
 e820: update [mem 0x893c0018-0x893cec57] usable ==> usable
 e820: update [mem 0x893b3018-0x893bf057] usable ==> usable
 e820: update [mem 0x893b3018-0x893bf057] usable ==> usable

Indeed, in the BIOS setup menu, "NX Mode" was Disabled.
Setting it to Enabled avoids the hang and Linux boots as normal. Weird!

Daniel


Re: [PATCH v2 1/2] ASoC: es8316: fix headphone mixer volume table

2019-08-26 Thread Daniel Drake
On Mon, Aug 26, 2019 at 11:39 PM Katsuhiro Suzuki
 wrote:
>
> This patch fix setting table of Headphone mixer volume.
> Current code uses 4 ... 7 values but these values are prohibited.
>
> Correct settings are the following:
>   0000 -12dB
>   0001 -10.5dB
>   0010 -9dB
>   0011 -7.5dB
>   0100 -6dB
>   1000 -4.5dB
>   1001 -3dB
>   1010 -1.5dB
>   1011 0dB
>
> Signed-off-by: Katsuhiro Suzuki 
Reviewed-by: Daniel Drake 


Re: [PATCH v2 2/2] ASoC: es8316: fix inverted L/R of headphone mixer volume

2019-08-26 Thread Daniel Drake
On Mon, Aug 26, 2019 at 11:39 PM Katsuhiro Suzuki
 wrote:
>
> This patch fixes inverted Left-Right channel of headphone mixer
> volume by wrong shift_left, shift_right values.
>
> Signed-off-by: Katsuhiro Suzuki 

Agrees with the spec

Reviewed-by: Daniel Drake 


Re: [PATCH] ASoC: es8316: limit headphone mixer volume

2019-08-25 Thread Daniel Drake
On Mon, Aug 26, 2019 at 1:38 AM Hans de Goede  wrote:
> On 24-08-19 23:04, Katsuhiro Suzuki wrote:
> > This patch limits Headphone mixer volume to 4 from 7.
> > Because output sound suddenly becomes very loudly with many noise if
> > set volume over 4.

That sounds like something that should be limited in UCM.

> Higher then 4 not working matches my experience, see this comment from
> the UCM file: alsa-lib/src/conf/ucm/codecs/es8316/EnableSeq.conf :
>
> # Set HP mixer vol to -6 dB (4/7) louder does not work
> cset "name='Headphone Mixer Volume' 4"

What does "does not work" mean more precisely?

I checked the spec, there is indeed something wrong in the kernel driver here.
The dB scale is not the simple linear scale that the kernel source suggests.

Instead it is:
0000 – -12dB
0001 – -10.5dB
0010 – -9dB
0011 – -7.5dB
0100 – -6dB
1000 – -4.5dB
1001 – -3dB
1010 – -1.5dB
1011 – 0dB

So perhaps we can fix the kernel to follow this table and then use UCM
to limit the volume if it's too high on a given platform?
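
For reference, a non-uniform scale like this would normally be described
with ALSA's TLV range macros. Below is only a sketch of the idea, not
Katsuhiro's actual patch; the register name and shift values are
placeholders rather than verified es8316 definitions:

#include <sound/soc.h>
#include <sound/tlv.h>

/* Codes 0-4 map to -12dB..-6dB and codes 8-11 to -4.5dB..0dB, both in
 * 1.5dB steps; the prohibited codes 5-7 are simply left out.
 * TLV values are in 0.01dB units.
 */
static const DECLARE_TLV_DB_RANGE(hpmixer_gain_tlv,
	0, 4, TLV_DB_SCALE_ITEM(-1200, 150, 0),
	8, 11, TLV_DB_SCALE_ITEM(-450, 150, 0)
);

static const struct snd_kcontrol_new es8316_hpmixer_example[] = {
	/* ES8316_HPMIX_VOL and the 4/0 shifts are placeholders here */
	SOC_DOUBLE_TLV("Headphone Mixer Volume", ES8316_HPMIX_VOL,
		       4, 0, 11, 0, hpmixer_gain_tlv),
};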

Thanks
Daniel


Re: Let's talk about the elephant in the room - the Linux kernel's inability to gracefully handle low memory pressure

2019-08-22 Thread Daniel Drake
On Fri, Aug 23, 2019 at 9:54 AM ndrw  wrote:
> That's obviously a lot better than hard freezes but I wouldn't call such
> system lock-ups an excellent result. PSI-triggered OOM killer would have
> indeed been very useful as an emergency brake, and IMHO such mechanism
> should be built in the kernel and enabled by default. But in my
> experience it does a very poor job at detecting imminent freezes on
> systems without swap or with very fast swap (zram).

Perhaps you could share your precise test environment and the PSI
condition you are expecting to hit (that is not being hit). Except for
the single failure report mentioned, it's been working fine here in
all setups, including with zram which is shipped out of the box.

The nice thing about psi is that it's based on how much real-world
time the kernel is spending doing memory management. So it's very well
poised to handle differences in swap speed etc. You effectively just
set the threshold for how much time you view as excessive for the
kernel to be busy doing MM, and psi tells you when that's hit.
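
As a rough illustration (a sketch written for this mail, not our
production code): the "full avg10" field in /proc/pressure/memory is the
percentage of wall time over the last 10 seconds in which all non-idle
tasks were stalled on memory management, so a crude monitor only needs to
poll it and compare against a threshold:

#include <stdio.h>
#include <unistd.h>

/* Return the "full" avg10 value from /proc/pressure/memory, i.e. the
 * percentage of the last 10s where all non-idle tasks were stalled on
 * memory management, or -1.0 on error.
 */
static double read_full_avg10(void)
{
	double avg10 = -1.0;
	char line[256];
	FILE *f = fopen("/proc/pressure/memory", "r");

	if (!f)
		return -1.0;
	while (fgets(line, sizeof(line), f))
		if (sscanf(line, "full avg10=%lf", &avg10) == 1)
			break;
	fclose(f);
	return avg10;
}

int main(void)
{
	for (;;) {
		double pct = read_full_avg10();

		/* ~1s of full stall per 10s window */
		if (pct >= 10.0)
			printf("high memory pressure: full avg10=%.2f%%\n", pct);
		sleep(1);
	}
}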

> > There's just one issue we've seen so far: a single report of psi reporting
> > memory pressure on a desktop system with 4GB RAM which is only running
> > the normal desktop components plus a single gmail tab in the web browser.
> > psi occasionally reports high memory pressure, so then psi-monitor steps in 
> > and
> > kills the browser tab, which seems erroneous.
>
> Is it Chrome/Chromium? If so, that's a known bug
> (https://bugs.chromium.org/p/chromium/issues/detail?id=333617)

The issue does not concern which process is being killed. The issue is
that in the single report we have of this, psi is apparently reporting
high memory pressure even though the system has plenty of free memory.

Daniel


Re: Let's talk about the elephant in the room - the Linux kernel's inability to gracefully handle low memory pressure

2019-08-20 Thread Daniel Drake
Hi,

Artem S. Tashkinov wrote:
> Once you hit a situation when opening a new tab requires more RAM than
> is currently available, the system will stall hard. You will barely  be
> able to move the mouse pointer. Your disk LED will be flashing
> incessantly (I'm not entirely sure why). You will not be able to run new
> applications or close currently running ones.
> 
> This little crisis may continue for minutes or even longer. I think
> that's not how the system should behave in this situation. I believe
> something must be done about that to avoid this stall.

Thanks for reviving this discussion. Indeed, this is a real pain point in
the Linux experience.

For Endless, we sunk some time into this and emerged with psi being the best
solution we could find. The way it works on a time basis seems very appropriate
when what we're ultimately interested in is maintaining desktop UI 
interactivity.
With psi enabled in the kernel, we add a small userspace daemon to kill a 
process
when psi reports that *all* userspace tasks are being blocked on kernel memory
management work for (at least) 1 second in a 10 second period.

https://github.com/endlessm/eos-boot-helper/blob/master/psi-monitor/psi-monitor.c
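
For illustration, the same "at least 1 second of full stall within a 10
second window" condition can also be expressed via the kernel's psi
trigger interface (merged in 5.2). This is just an untested sketch of
that interface, not how psi-monitor itself is implemented:

#include <fcntl.h>
#include <poll.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

int main(void)
{
	/* Ask for an event when all tasks are stalled on memory management
	 * for >= 1s within any 10s window. Times are in microseconds. */
	const char trigger[] = "full 1000000 10000000";
	struct pollfd fds;

	fds.fd = open("/proc/pressure/memory", O_RDWR | O_NONBLOCK);
	if (fds.fd < 0 || write(fds.fd, trigger, strlen(trigger) + 1) < 0) {
		perror("psi trigger setup");
		return 1;
	}

	fds.events = POLLPRI;
	while (poll(&fds, 1, -1) >= 0) {
		if (fds.revents & POLLERR)
			break;		/* monitored file went away */
		if (fds.revents & POLLPRI)
			printf("memory pressure threshold crossed\n");
	}
	return 0;
}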

To share our results so far, despite this daemon being a quick initial
implementation, we find that it is bringing excellent results, no more memory
pressure hangs. The system recovers in less than 30 seconds, usually in more
like 10-15 seconds. Sadly a process got killed along the way, but that's a lot
better than the user having no option other than pulling the plug.
The system may not always recover to a totally smooth state, but the
responsiveness to mouse movements and clicks is still decent, so at that point
the user can close some more windows to restore full UI performance again. 

There's just one issue we've seen so far: a single report of psi reporting
memory pressure on a desktop system with 4GB RAM which is only running
the normal desktop components plus a single gmail tab in the web browser.
psi occasionally reports high memory pressure, so then psi-monitor steps in and
kills the browser tab, which seems erroneous. We haven't had a chance to look at
this in detail yet. Here's a log from the kernel OOM killer showing the memory 
and
process state at this point.
https://gist.github.com/dsd/b338bab0206dcce78263f6bb87de7d4a

> I'm almost sure some sysctl parameters could be changed to avoid this
> situation but something tells me this could be done for everyone and
> made default because some non tech-savvy users will just give up on
> Linux if they ever get in a situation like this and they won't be keen
> or even be able to Google for solutions.

As you anticipated, myself and others already jumped in with solutions
appropriate for tech-savvy users. Getting solutions widely deployed is indeed
another important aspect to tackle.

If you're curious to see how this can look from a "just works" standpoint, you
might be interested in downloading Endless (www.endlessos.com) and running your
tests there; we have the above solution running and active out of the box.

Bastien Nocera has recently adapted and extended our solution, presumably
with an eye towards getting this more widely deployed as a standard part
of the Linux desktop.
https://gitlab.freedesktop.org/hadess/low-memory-monitor/

And if there is a meaningful way to make the kernel behave better, that would
obviously be of huge value too.

Thanks
Daniel


Re: [PATCH] x86/apic: Handle missing global clockevent gracefully

2019-08-14 Thread Daniel Drake
On Mon, Aug 12, 2019 at 2:16 PM Daniel Drake  wrote:
> I can do a bit of testing on other platforms too. Are there any
> specific tests I should run, other than checking that the system boots
> and doesn't have any timer watchdog complaints in the log?

Tested this on 2 AMD platforms that were not affected by the crash here.
In addition to confirming that they boot fine without timer complaints
in the logs, I checked the calibrate_APIC_clock() result before and
after this patch. I repeated each test twice.

Asus E402YA (AMD E2-7015)
Before: 99811, 99811
After: 99812, 99812

Acer Aspire A315-21G (AMD A9-9420e)
Before: 99811, 99811
After: 99807, 99820

Those new numbers seem very close to the previous ones and I didn't
observe any problems.

Thanks
Daniel


Re: [RFC PATCH v7] rtl8xxxu: Improve TX performance of RTL8723BU on rtl8xxxu driver

2019-08-13 Thread Daniel Drake
On Mon, Aug 12, 2019 at 11:21 PM Jes Sorensen  wrote:
> On 8/12/19 10:32 AM, Kalle Valo wrote:
> > This is marked as RFC so I'm not sure what's the plan. Should I apply
> > this?
>
> I think it's at a point where it's worth applying - I kinda wish I had
> had time to test it, but I won't be near my stash of USB dongles for a
> little while.

The last remaining reason it was RFC was pending feedback from Jes, to
check that his earlier comments had been adequately addressed. So yes
I think it's good to apply now.

Thanks,
Daniel


Re: [PATCH] x86/apic: Handle missing global clockevent gracefully

2019-08-12 Thread Daniel Drake
On Fri, Aug 9, 2019 at 8:54 PM Thomas Gleixner  wrote:
> Some newer machines do not advertise legacy timers. The kernel can handle
> that situation if the TSC and the CPU frequency are enumerated by CPUID or
> MSRs and the CPU supports TSC deadline timer. If the CPU does not support
> TSC deadline timer the local APIC timer frequency has to be known as well.
>
> Some Ryzens machines do not advertize legacy timers, but there is no
> reliable way to determine the bus frequency which feeds the local APIC
> timer when the machine allows overclocking of that frequency.
>
> As there is no legacy timer the local APIC timer calibration crashes due to
> a NULL pointer dereference when accessing the not installed global clock
> event device.
>
> Switch the calibration loop to a non interrupt based one, which polls
> either TSC (frequency known) or jiffies. The latter requires a global
> clockevent. As the machines which do not have a global clockevent installed
> have a known TSC frequency this is a non issue. For older machines where
> TSC frequency is not known, there is no known case where the legacy timers
> do not exist as that would have been reported long ago.

This solves the problem I described in the thread:
setup_boot_APIC_clock() NULL dereference during early boot on
reduced hardware platforms

Thanks for your quick support!

> Note: Only lightly tested, but of course not on an affected machine.
>
>   If that works reliably, then this needs some exhaustive testing
>   on a wide range of systems so we can risk backports to stable
>   kernels.

I can do a bit of testing on other platforms too. Are there any
specific tests I should run, other than checking that the system boots
and doesn't have any timer watchdog complaints in the log?

Thanks
Daniel


Re: setup_boot_APIC_clock() NULL dereference during early boot on reduced hardware platforms

2019-08-01 Thread Daniel Drake
On Thu, Aug 1, 2019 at 3:16 PM Aubrey Li  wrote:
> No, the platform needs a global clock event, can you turn on some other
> clock source on your platform, like HPET?

Thanks Aubrey and Thomas for the quick hints!

I double checked under Windows - it seems to be using a HPET there.
Also there is the HPET ACPI table. So I think this is the right angle
to look at.

Under Linux, hpet_legacy_clockevent_register() is the function where
global_clock_event can be set to HPET.

However, the only way this can be called is from hpet_enable().

hpet_enable() is called from 2 places:
 1. From hpet_time_init(). This is the default x86 timer_init that
acpi_generic_reduced_hw_init() took out of action here.
 2. From hpet_late_init(). However that function is only called late,
after calibrate_APIC_clock() has already crashed the kernel. Also,
even if moved earlier it would also not call hpet_enable() here
because the ACPI HPET table parsing has already populated
hpet_address.

I tried slotting in a call to hpet_enable() at an earlier point
regardless, but I still end up with the kernel hanging later during
boot, probably because irq0 fails to be setup and this error is hit:
if (setup_irq(0, &irq0))
pr_info("Failed to register legacy timer interrupt\n");

I'll go deeper into that; further hints welcome too.

Thanks
Daniel


setup_boot_APIC_clock() NULL dereference during early boot on reduced hardware platforms

2019-08-01 Thread Daniel Drake
Hi,

Working with a new consumer laptop based on AMD R7-3700U, we are
seeing a kernel panic during early boot (before the display
initializes). It's a new product and there is no previous known
working kernel version (tested 5.0, 5.2 and current linus master).

We may have also seen this problem on a MiniPC based on AMD APU 7010
from another vendor, but we don't have it in hands right now to
confirm that it's the exact same crash.

earlycon shows the details: a NULL dereference under
setup_boot_APIC_clock(), which actually happens in
calibrate_APIC_clock():

/* Replace the global interrupt handler */
real_handler = global_clock_event->event_handler;
global_clock_event->event_handler = lapic_cal_handler;

global_clock_event is NULL here. This is a "reduced hardware" ACPI
platform so acpi_generic_reduced_hw_init() has set timer_init to NULL,
avoiding the usual codepaths that would set up global_clock_event.

I tried the obvious:
 if (!global_clock_event)
return -1;

However I'm probably missing part of the big picture here, as this
only makes boot fail later on. It continues until the next point that
something leads to schedule(), such as a driver calling msleep() or
mark_readonly() calling rcu_barrier(), etc. Then it hangs.

Is something missing in terms of timer setup here? Suggestions appreciated...

Thanks
Daniel


Re: [PATCH AUTOSEL 5.1 013/219] x86/tsc: Use CPUID.0x16 to calculate missing crystal frequency

2019-07-15 Thread Daniel Drake
Hi,

On Mon, Jul 15, 2019 at 9:38 PM Sasha Levin  wrote:
> From: Daniel Drake 
>
> [ Upstream commit 604dc9170f2435d27da5039a3efd757dceadc684 ]

In my opinion this is not stable kernel material.

It alone does not solve a particular bug. It's part of a larger effort
to decrease usage of the 8254 PIT on modern platforms, which solves a
real problem (see "x86: skip PIT initialization on modern chipsets"),
but I'd conservatively recommend just leaving these patches out of the
stable tree. The problem has existed for a while without a
particularly wide impact, and there is a somewhat documented
workaround of changing BIOS settings.

Daniel


Re: [PATCH v2] rtl8xxxu: Fix wifi low signal strength issue of RTL8723BU

2019-07-12 Thread Daniel Drake
On Fri, Jul 5, 2019 at 10:27 AM Chris Chiu  wrote:
> Per the code before REG_S0S1_PATH_SWITCH setting, the driver has told
> the co-processor the antenna is inverse.
> memset(&h2c, 0, sizeof(struct h2c_cmd));
> h2c.ant_sel_rsv.cmd = H2C_8723B_ANT_SEL_RSV;
> h2c.ant_sel_rsv.ant_inverse = 1;
> h2c.ant_sel_rsv.int_switch_type = 0;
> rtl8xxxu_gen2_h2c_cmd(priv, &h2c, sizeof(h2c.ant_sel_rsv));
>
> At least the current modification is consistent with the antenna
> inverse setting.
> I'll verify on vendor driver about when/how the inverse be determined.

I checked this out. The codepath hit hardcodes it to the AUX port,
i.e. "inverted" setup:

EXhalbtc8723b1ant_PowerOnSetting():
if(pBtCoexist->chipInterface == BTC_INTF_USB)
{
// fixed at S0 for USB interface
pBtCoexist->fBtcWrite4Byte(pBtCoexist, 0x948, 0x0);

u1Tmp |= 0x1;// antenna inverse
pBtCoexist->fBtcWriteLocalReg1Byte(pBtCoexist, 0xfe08, u1Tmp);

pBoardInfo->btdmAntPos = BTC_ANTENNA_AT_AUX_PORT;
  }

So I'm further convinced that these performance-enhancing changes are
increasing consistency with the vendor driver.

Daniel


Re: [PATCH] rtl8xxxu: Fix wifi low signal strength issue of RTL8723BU

2019-07-04 Thread Daniel Drake
On Wed, Jul 3, 2019 at 8:59 PM Jes Sorensen  wrote:
> My point is this seems to be very dongle dependent :( We have to be
> careful not breaking it for some users while fixing it for others.

Do you still have your device?

Once we get to the point when you are happy with Chris's two patches
here on a code review level, we'll reach out to other driver
contributors plus people who previously complained about these types
of problems, and see if we can get some wider testing.

Larry, do you have these devices, can you help with testing too?

Thanks
Daniel


Re: [PATCH v2] rtl8xxxu: Fix wifi low signal strength issue of RTL8723BU

2019-07-04 Thread Daniel Drake
On Thu, Jul 4, 2019 at 6:55 PM Chris Chiu  wrote:
> The WiFi tx power of RTL8723BU is extremely low after booting. So
> the WiFi scan gives very limited AP list and it always fails to
> connect to the selected AP. This module only supports 1x1 antenna
> and the antenna is switched to bluetooth due to some incorrect
> register settings.
>
> Compare with the vendor driver https://github.com/lwfinger/rtl8723bu,
> we realized that the 8723bu's enable_rf() does the same thing as
> rtw_btcoex_HAL_Initialize() in vendor driver. And it by default
> sets the antenna path to BTC_ANT_PATH_BT which we verified it's
> the cause of the wifi weak tx power. The vendor driver will set
> the antenna path to BTC_ANT_PATH_PTA in the consequent btcoexist
> mechanism, by the function halbtc8723b1ant_PsTdma.

Checking these details in the vendor driver:
EXhalbtc8723b1ant_PowerOnSetting sets:
pBoardInfo->btdmAntPos = BTC_ANTENNA_AT_AUX_PORT;

Following the code flow from rtw_btcoex_HAL_Initialize(), this has a
bWifiOnly parameter which will ultimately influence the final register
value.
Continuing the flow, we reach halbtc8723b1ant_SetAntPath() with
bInitHwCfg=TRUE, bWifiOff=FALSE. antPosType is set to WIFI in the
bWifiOnly case, and BT otherwise.

I'm assuming that bUseExtSwitch = FALSE (existing rtl8xxxu code also
seems to make the same assumption).
For the bWifiOnly=FALSE case, it uses BT,
pBtCoexist->fBtcWrite4Byte(pBtCoexist, 0x948, 0x0);
and rtl8xxxu seems to do the same - seemingly routing the antenna path
for BT only.

As for halbtc8723b1ant_PsTdma() then being called in a way that causes
it to switch to the PTA path a little later, it's more difficult to
point out how that happens in an email, but I think we can trust you
on that :) There are certainly many callsites that would pass those
parameters.

> +* Different settings per different antenna position.
> +*  Antenna Position:   | Normal   Inverse
> +* --
> +* Antenna switch to BT:|  0x280,   0x00
> +* Antenna switch to WiFi:  |  0x0, 0x280
> +* Antenna switch to PTA:   |  0x200,   0x80
>  */
> -   rtl8xxxu_write32(priv, REG_S0S1_PATH_SWITCH, 0x00);
> +   rtl8xxxu_write32(priv, REG_S0S1_PATH_SWITCH, 0x80);

I don't really understand what we mean by an "inverse" antenna and my
reading of the vendor driver leads me to another interpretation.

The logic is based around an antenna position - btdmAntPos. It takes
one of two values:
BTC_ANTENNA_AT_MAIN_PORT= 0x1,
BTC_ANTENNA_AT_AUX_PORT= 0x2,

So the chip has pins to support two antennas - a "main" antenna and an
"aux" one.

We know we're dealing with a single antenna, so the actual module is
going to only be using one of those antenna interfaces. If the chip
tries to use the other antenna interface, it's effectively not using
the antenna. So it's rather important to tell the chip to use the
right interface.

And that's exactly what happens here. btdmAntPos is hardcoded that the
antenna is on the aux port for these devices, and this code is telling
the chip that this is how things are wired up.

The alternative way of calling this an antenna inverse (which the
vendor driver also does in another section), i.e. "antenna is not
connected to the main port but instead it's connected to the other
one", seems less clear to me.

Even if we don't fully understand what's going on here, I'm convinced
that your code change is fixing an inconsistency with the vendor
driver, and most significantly, making the signal level actually
usable on our devices. But if you agree with my interpretation of
these values then maybe you could update the comment here!

Daniel


Re: [PATCH] rtl8xxxu: Fix wifi low signal strength issue of RTL8723BU

2019-07-03 Thread Daniel Drake
On Tue, Jul 2, 2019 at 4:01 PM Chris Chiu  wrote:
> When the vendor driver invokes rtw_btcoex_HAL_Initialize, which will then
> call halbtc8723b1ant_SetAntPath to configure the registers in this patch.
> From the code, the registers will have different register settings per the
> antenna position and the phase. If the driver is in the InitHwConfig phase,
> the register value is identical to what rtl8xxxu driver does in enable_rf().
> However, the vendor driver will do halbtc8723b1ant_PsTdma() twice by
> halbtc8723b1ant_ActionWifiNotConnected() with the type argument 8 for
> PTA control about 200ms after InitHwConfig. The _ActionWifiNotConnected
> is invoked by the BTCOEXIST. I keep seeing the halbtc8723b1ant_PsTdma
> with type 8 been called every 2 seconds.

I see. So this is a measured step towards consistency with the vendor
driver. Maybe you can mention these details in the commit message.

> Yes, it ends up with 0x0c not matter what antenna position type is. Unless
> it's configured wifi only.

Also worth mentioning in the commit message then, that the 0xc
ACT_CONTROL value is effectively what the working vendor driver uses.

> > > -* 0x280, 0x00, 0x200, 0x80 - not clear
> > > +* Different settings per different antenna position.
> > > +* Antenna switch to BT: 0x280, 0x00 (inverse)
> > > +* Antenna switch to WiFi: 0x0, 0x280 (inverse)
> > > +* Antenna controlled by PTA: 0x200, 0x80 (inverse)
> > >  */
> > > -   rtl8xxxu_write32(priv, REG_S0S1_PATH_SWITCH, 0x00);
> > > +   rtl8xxxu_write32(priv, REG_S0S1_PATH_SWITCH, 0x80);
> >
> > I don't quite follow the comment here. Why are there 2 values listed
> > for each possibility, what do you mean by inverse? You say the
> > register settings were incorrect, but the previous value was 0x00
> > which you now document as "antenna switch to wifi" which sounds like
> > it was already correct?
> >
> > Which value does the vendor driver use?
> >
> The first column means the value for normal antenna installation, wifi
> on the main port. The second column is the value for inverse antenna
> installation. So if I want to manually switch the antenna for BT use,
> and the antenna installation is inverse, I need to set to 0x280. So 0x80
> means I want to switch to PTA and the antenna installation in inverse.

Still not quite clear what you mean by "inverse" here, but maybe I
just don't know anything about antennas. Is it that an antenna
connector has two pins and this one swaps the meaning of each pin?

Does the new value of 0x80 reflect what the vendor driver does in practice?

Thanks
Daniel


Re: [PATCH] rtl8xxxu: Fix wifi low signal strength issue of RTL8723BU

2019-07-03 Thread Daniel Drake
On Tue, Jul 2, 2019 at 8:42 PM Jes Sorensen  wrote:
> We definitely don't want to bring over the vendor code, since it's a
> pile of spaghetti, but we probably need to get something sorted. This
> went down the drain when the bluetooth driver was added without taking
> it into account - long after this driver was merged.

Yeah, I didn't mean bring over quite so literally.. Chris is studying
it and figuring out the neatest way to reimplement the required bits.

As for the relationship with bluetooth.. actually the bug that Chris
is working on here is that the rtl8xxxu wifi signal is totally
unusable *until* the bluetooth driver is loaded.
Once the bluetooth driver is loaded, at the point of bluetooth
firmware upload, the rtl8xxxu signal strength magically becomes good.
I think this is consistent with other rtl8xxxu problem reports that we
saw lying around, although they had not been diagnosed in so much
detail.
The rtl8723bu vendor driver does not suffer this problem, it works
fine with or without the bluetooth driver in place.

Daniel


Re: [PATCH] rtl8xxxu: Fix wifi low signal strength issue of RTL8723BU

2019-07-01 Thread Daniel Drake
Hi Chris,

On Thu, Jun 27, 2019 at 5:53 PM Chris Chiu  wrote:
> The WiFi tx power of RTL8723BU is extremely low after booting. So
> the WiFi scan gives very limited AP list and it always fails to
> connect to the selected AP. This module only supports 1x1 antenna
> and the antenna is switched to bluetooth due to some incorrect
> register settings.
>
> This commit hand over the antenna control to PTA, the wifi signal
> will be back to normal and the bluetooth scan can also work at the
> same time. However, the btcoexist still needs to be handled under
> different circumstances. If there's a BT connection established,
> the wifi still fails to connect until disconneting the BT.
>
> Signed-off-by: Chris Chiu 

Really nice work finding this!

I know that after this change, you plan to bring over the btcoexist
code from the vendor driver (or at least the minimum required code)
for a more complete fix, but I'm curious how you found these magic
register values and how they compare to the values used by the vendor
driver with btcoexist?

What's PTA? A type of firmware-implemented btcoexist that works for
scanning but doesn't work when a BT connection is actually
established?

> diff --git a/drivers/net/wireless/realtek/rtl8xxxu/rtl8xxxu_8723b.c 
> b/drivers/net/wireless/realtek/rtl8xxxu/rtl8xxxu_8723b.c
> index 3adb1d3d47ac..6c3c70d93ac1 100644
> --- a/drivers/net/wireless/realtek/rtl8xxxu/rtl8xxxu_8723b.c
> +++ b/drivers/net/wireless/realtek/rtl8xxxu/rtl8xxxu_8723b.c
> @@ -1525,7 +1525,7 @@ static void rtl8723b_enable_rf(struct rtl8xxxu_priv 
> *priv)
> /*
>  * WLAN action by PTA
>  */
> -   rtl8xxxu_write8(priv, REG_WLAN_ACT_CONTROL_8723B, 0x04);
> +   rtl8xxxu_write8(priv, REG_WLAN_ACT_CONTROL_8723B, 0x0c);

The comment above this still says "WLAN action by PTA" and the vendor
driver has:
//set wlan_act control by PTA
pBtCoexist->fBtcWrite1Byte(pBtCoexist, 0x76e, 0x4);

but then also:
//set wlan_act control by PTA
pBtCoexist->fBtcWrite1Byte(pBtCoexist, 0x76e, 0xc);

So this change seems to be at least consistent with the ambiguity of the
vendor driver; do you have any understanding of the extra bit that is
now set here?

It's not easy to follow the code flow of the vendor driver to see what
actually happens, have you checked that, does it end up using the 0xc
value?

> -* 0x280, 0x00, 0x200, 0x80 - not clear
> +* Different settings per different antenna position.
> +* Antenna switch to BT: 0x280, 0x00 (inverse)
> +* Antenna switch to WiFi: 0x0, 0x280 (inverse)
> +* Antenna controlled by PTA: 0x200, 0x80 (inverse)
>  */
> -   rtl8xxxu_write32(priv, REG_S0S1_PATH_SWITCH, 0x00);
> +   rtl8xxxu_write32(priv, REG_S0S1_PATH_SWITCH, 0x80);

I don't quite follow the comment here. Why are there 2 values listed
for each possibility, what do you mean by inverse? You say the
register settings were incorrect, but the previous value was 0x00
which you now document as "antenna switch to wifi" which sounds like
it was already correct?

Which value does the vendor driver use?

> /*
>  * Software control, antenna at WiFi side
> diff --git a/drivers/net/wireless/realtek/rtl8xxxu/rtl8xxxu_core.c 
> b/drivers/net/wireless/realtek/rtl8xxxu/rtl8xxxu_core.c
> index 8136e268b4e6..87b2179a769e 100644
> --- a/drivers/net/wireless/realtek/rtl8xxxu/rtl8xxxu_core.c
> +++ b/drivers/net/wireless/realtek/rtl8xxxu/rtl8xxxu_core.c
> @@ -3891,12 +3891,13 @@ static int rtl8xxxu_init_device(struct ieee80211_hw 
> *hw)
>
> /* Check if MAC is already powered on */
> val8 = rtl8xxxu_read8(priv, REG_CR);
> +   val16 = rtl8xxxu_read16(priv, REG_SYS_CLKR);
>
> /*
>  * Fix 92DU-VC S3 hang with the reason is that secondary mac is not
>  * initialized. First MAC returns 0xea, second MAC returns 0x00
>  */
> -   if (val8 == 0xea)
> +   if (val8 == 0xea || !(val16 & BIT(11)))
> macpower = false;
> else
> macpower = true;

At a glance I can't see which code this corresponds to in the vendor
driver, can you point that out?

Thanks
Daniel


Re: [RFC PATCH v5] rtl8xxxu: Improve TX performance of RTL8723BU on rtl8xxxu driver

2019-07-01 Thread Daniel Drake
On Mon, Jun 17, 2019 at 2:56 PM Chris Chiu  wrote:
> With this commit, the tx rate of each data and qos data packet will
> be 39Mbps (MCS4) with the 0xF0 as the tx rate mask. The 20th bit
> to 23th bit means MCS4 to MCS7. It means that the firmware still picks
> the lowest rate from the rate mask and explains why the tx rate of
> data and qos data is always lowest 1Mbps because the default rate mask
> passed is always 0xFFF ranges from the basic CCK rate, OFDM rate,
> and MCS rate. However, with Realtek's driver, the tx rate observed from
> wireshark under the same condition is almost 65Mbps or 72Mbps
suggestion: add:
, indicating that rtl8xxxu could still be further improved.

Then remove this paragraph; I think we're in agreement on the approach here:
> I believe the firmware of RTL8723BU may need fix. And I think we
> can still bring in the dm_watchdog as rtlwifi to improve from the
> driver side. Please leave precious comments for my commits and
> suggest what I can do better. Or suggest if there's any better idea
> to fix this. Thanks.

> Signed-off-by: Chris Chiu 
Reviewed-by: Daniel Drake 

> +* is supported and no iface_combinations are providec.

typo: provided


Re: [PATCH v3] Bluetooth: btrtl: HCI reset on close for Realtek BT chip

2019-07-01 Thread Daniel Drake
On Tue, Jun 25, 2019 at 4:32 PM Jian-Hong Pan  wrote:
> Realtek RTL8822BE BT chip on ASUS X420FA cannot be turned on correctly
> after on-off several times. Bluetooth daemon sets BT mode failed when
> this issue happens. Scanning must be active while turning off for this
> bug to be hit.
>
> bluetoothd[1576]: Failed to set mode: Failed (0x03)
>
> If BT is turned off, then turned on again, it works correctly again.
>
> According to the vendor driver, the HCI_QUIRK_RESET_ON_CLOSE flag is set
> during probing. So, this patch makes Realtek's BT reset on close to fix
> this issue.
>
> Link: https://bugzilla.kernel.org/show_bug.cgi?id=203429
> Signed-off-by: Jian-Hong Pan 

Reviewed-by: Daniel Drake 


Re: [PATCH] x86: skip PIT initialization on modern chipsets

2019-06-28 Thread Daniel Drake
Oops, please also add a reference to the main thread:

On Fri, Jun 28, 2019 at 3:23 PM Daniel Drake  wrote:
> Detect modern setups where we have no need for the PIT (e.g. where
> we already know the TSC and LAPIC timer frequencies, so no need
> to calibrate them against the PIT), and skip initialization PIT in
> such cases.
>
> Skip the IO-APIC timer-checking code when we don't have a PIT
> to test against (this was causing the panic).

Link: 
https://lore.kernel.org/lkml/CAD8Lp45fedoPLnK=umuhhtkjy5u2h04sykrx3u+m04u6fpv...@mail.gmail.com

> Tested-by: Daniel Drake 


[PATCH] x86: skip PIT initialization on modern chipsets

2019-06-28 Thread Daniel Drake
From: Thomas Gleixner 

Recent Intel chipsets including Skylake and ApolloLake have
a special ITSSPRC register which allows the 8254 PIT to be gated.
When gated, the 8254 registers can still be programmed as normal, but
there are no IRQ0 timer interrupts.

Some products such as the Connex L1430 and exone go Rugged E11
use this register to ship with the PIT gated by default. This causes
Linux to fail to boot:
  Kernel panic - not syncing: IO-APIC + timer doesn't work! Boot with
  apic=debug and send a report.

The panic happens before the framebuffer is initialized, so to the user,
it appears as an early boot hang on a black screen.

Affected products typically have a BIOS option that can be used
to enable the 8254 and make Linux work (Chipset -> South Cluster
Configuration -> Miscellaneous Configuration -> 8254 Clock Gating),
however it would be best to make Linux support the no-8254 case.

Detect modern setups where we have no need for the PIT (e.g. where
we already know the TSC and LAPIC timer frequencies, so no need
to calibrate them against the PIT), and skip PIT initialization in
such cases.

Skip the IO-APIC timer-checking code when we don't have a PIT
to test against (this was causing the panic).

Tested-by: Daniel Drake 
---
 arch/x86/include/asm/apic.h|  2 ++
 arch/x86/include/asm/time.h|  1 +
 arch/x86/kernel/apic/apic.c| 27 +++
 arch/x86/kernel/apic/io_apic.c |  4 
 arch/x86/kernel/i8253.c| 25 -
 arch/x86/kernel/time.c |  7 +--
 6 files changed, 63 insertions(+), 3 deletions(-)

Thomas, please add your sign off before pushing.

Depends on commits in tip.git x86/apic
  x86/tsc: Use CPUID.0x16 to calculate missing crystal frequency
  x86/apic: Rename 'lapic_timer_frequency' to 'lapic_timer_period'
  x86/tsc: Set LAPIC timer period to crystal clock frequency

diff --git a/arch/x86/include/asm/apic.h b/arch/x86/include/asm/apic.h
index 71026eba9ba4..3e66b82454ce 100644
--- a/arch/x86/include/asm/apic.h
+++ b/arch/x86/include/asm/apic.h
@@ -175,6 +175,7 @@ extern void lapic_assign_system_vectors(void);
 extern void lapic_assign_legacy_vector(unsigned int isairq, bool replace);
 extern void lapic_online(void);
 extern void lapic_offline(void);
+extern bool apic_needs_pit(void);
 
 #else /* !CONFIG_X86_LOCAL_APIC */
 static inline void lapic_shutdown(void) { }
@@ -188,6 +189,7 @@ static inline void init_bsp_APIC(void) { }
 static inline void apic_intr_mode_init(void) { }
 static inline void lapic_assign_system_vectors(void) { }
 static inline void lapic_assign_legacy_vector(unsigned int i, bool r) { }
+static inline bool apic_needs_pit(void) { return true; }
 #endif /* !CONFIG_X86_LOCAL_APIC */
 
 #ifdef CONFIG_X86_X2APIC
diff --git a/arch/x86/include/asm/time.h b/arch/x86/include/asm/time.h
index cef818b16045..8ac563abb567 100644
--- a/arch/x86/include/asm/time.h
+++ b/arch/x86/include/asm/time.h
@@ -7,6 +7,7 @@
 
 extern void hpet_time_init(void);
 extern void time_init(void);
+extern bool pit_timer_init(void);
 
 extern struct clock_event_device *global_clock_event;
 
diff --git a/arch/x86/kernel/apic/apic.c b/arch/x86/kernel/apic/apic.c
index 8956072f677d..abe67c3ae071 100644
--- a/arch/x86/kernel/apic/apic.c
+++ b/arch/x86/kernel/apic/apic.c
@@ -821,6 +821,33 @@ static int __init lapic_init_clockevent(void)
return 0;
 }
 
+bool __init apic_needs_pit(void)
+{
+   /*
+* If the frequencies are not known, PIT is required for both TSC
+* and apic timer calibration.
+*/
+   if (!tsc_khz || !cpu_khz)
+   return true;
+
+   /* Is there an APIC at all? */
+   if (!boot_cpu_has(X86_FEATURE_APIC))
+   return true;
+
+   /* Deadline timer is based on TSC so no further PIT action required */
+   if (boot_cpu_has(X86_FEATURE_TSC_DEADLINE_TIMER))
+   return false;
+
+   /* APIC timer disabled? */
+   if (disable_apic_timer)
+   return true;
+   /*
+* The APIC timer frequency is known already, no PIT calibration
+* required. If unknown, let the PIT be initialized.
+*/
+   return lapic_timer_period == 0;
+}
+
 static int __init calibrate_APIC_clock(void)
 {
struct clock_event_device *levt = this_cpu_ptr(&lapic_events);
diff --git a/arch/x86/kernel/apic/io_apic.c b/arch/x86/kernel/apic/io_apic.c
index 53aa234a6803..1bb864798800 100644
--- a/arch/x86/kernel/apic/io_apic.c
+++ b/arch/x86/kernel/apic/io_apic.c
@@ -58,6 +58,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -2083,6 +2084,9 @@ static inline void __init check_timer(void)
unsigned long flags;
int no_pin1 = 0;
 
+   if (!global_clock_event)
+   return;
+
local_irq_save(flags);
 
/*
diff --git a/arch/x86/kernel/i8253.c b/arch/x86/kernel/i8253.c
index 0d307a657abb..2b7999a1a50a 100644
--- a/arch/x86/kernel/i8253.c
+++ b/arch/x86/kernel/i8253.

Re: No 8254 PIT & no HPET on new Intel N3350 platforms causes kernel panic during early boot

2019-06-27 Thread Daniel Drake
On Thu, Jun 27, 2019 at 10:07 PM Thomas Gleixner  wrote:
> Nah. That extra timer works thing is just another bandaid.
>
> What I had in mind is something like the below. That's on top of
>
>git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git x86/apic
>
> Be warned. It's neither compiled nor tested, so keep a fire extinguisher
> handy. If it explodes you own the pieces.

Thanks, it works, and makes sense!

I'll add a commit message and send it as a patch, just one quick
function naming question... did you mean apic_and_tsc_needs_pit() or
apic_needs_pit()? That's the only compile fix needed.

Daniel


Re: No 8254 PIT & no HPET on new Intel N3350 platforms causes kernel panic during early boot

2019-06-27 Thread Daniel Drake
Hi Thomas,

Picking up this issue again after a break!

We made some progress last time on reducing PIT usage in the TSC
calibration code, but we still have the bigger issue to resolve:
IO-APIC code panicing when the PIT isn't ticking.

On Wed, Apr 3, 2019 at 7:21 PM Thomas Gleixner  wrote:
> For newer CPUs we might assume that:
>
>  1) The TSC and APIC timer are actually usable
>
>  2) The frequencies can be retrieved from CPUID or MSRs
>
> If #1 and #2 are reliable we can avoid the whole calibration and interrupt
> delivery mess.
>
> That means we need the following decision logic:
>
>   1) If HPET is available in ACPI, boot normal.
>
>   2) If HPET is not available, verify that the PIT actually counts. If it
>  does, boot normal.
>
>  If it does not either:
>
>  2A) Verify that this is a PCH 300/C240 and fiddle with that ISST bit.
>
>  But that means that we need to chase PCH ids forever...
>
>  2B) Shrug and just avoid the whole PIT/HPET magic all over the place:
>
>  - Avoid the interrupt delivery check in the IOAPIC code as it's
>uninteresting in that case. Trivial to do.

I tried to explore this idea here:
https://lore.kernel.org/patchwork/patch/1064972/
https://lore.kernel.org/patchwork/patch/1064971/

But I can't say I really knew what I was doing there, and you
pointed out some problems.

Any new ideas that I can experiment with?

Being more conservative, how about something like this?
---
 arch/x86/kernel/apic/io_apic.c | 23 +++
 1 file changed, 19 insertions(+), 4 deletions(-)

diff --git a/arch/x86/kernel/apic/io_apic.c b/arch/x86/kernel/apic/io_apic.c
index 53aa234a6803..36b1e7d5b657 100644
--- a/arch/x86/kernel/apic/io_apic.c
+++ b/arch/x86/kernel/apic/io_apic.c
@@ -2073,7 +2073,7 @@ static int mp_alloc_timer_irq(int ioapic, int pin)
  *
  * FIXME: really need to revamp this for all platforms.
  */
-static inline void __init check_timer(void)
+static inline void __init check_timer(int timer_was_working)
 {
struct irq_data *irq_data = irq_get_irq_data(0);
struct mp_chip_data *data = irq_data->chip_data;
@@ -2216,8 +2216,15 @@ static inline void __init check_timer(void)
apic_printk(APIC_QUIET, KERN_INFO
"Perhaps problem with the pre-enabled x2apic mode\n"
"Try booting with x2apic and interrupt-remapping 
disabled in the bios.\n");
-   panic("IO-APIC + timer doesn't work!  Boot with apic=debug and send a "
-   "report.  Then try booting with the 'noapic' option.\n");
+
+   if (timer_was_working)
+   panic("IO-APIC + timer doesn't work!  Boot with apic=debug and 
send a "
+   "report.  Then try booting with the 'noapic' 
option.\n");
+   else
+   apic_printk(APIC_QUIET, KERN_INFO
+   "Continuing anyway with no 8254 timer, as it was 
not working even before IO-APIC setup was attempted.\n"
+   "Boot will fail unless another working clocksource 
is found.\n");
+
 out:
local_irq_restore(flags);
 }
@@ -2304,12 +2311,20 @@ static void ioapic_destroy_irqdomain(int idx)
 void __init setup_IO_APIC(void)
 {
int ioapic;
+   int timer_was_working = 0;
 
if (skip_ioapic_setup || !nr_ioapics)
return;
 
io_apic_irqs = nr_legacy_irqs() ? ~PIC_IRQS : ~0UL;
 
+   /*
+* Record if the timer was in working state before we do any
+* IO-APIC setup.
+*/
+   if (nr_legacy_irqs())
+   timer_was_working = timer_irq_works();
+
apic_printk(APIC_VERBOSE, "ENABLING IO-APIC IRQs\n");
for_each_ioapic(ioapic)
BUG_ON(mp_irqdomain_create(ioapic));
@@ -2323,7 +2338,7 @@ void __init setup_IO_APIC(void)
setup_IO_APIC_irqs();
init_IO_APIC_traps();
if (nr_legacy_irqs())
-   check_timer();
+   check_timer(timer_was_working);
 
ioapic_initialized = 1;
 }
-- 
2.20.1



Re: [PATCH v2] Bluetooth: btrtl: HCI reset on close for Realtek BT chip

2019-06-24 Thread Daniel Drake
On Mon, Jun 24, 2019 at 2:24 PM Jian-Hong Pan  wrote:
> Realtek RTL8822BE BT chip on ASUS X420FA cannot be turned on correctly
> after on-off several times. Bluetooth daemon sets BT mode failed when
> this issue happens.

You could also mention that scanning must be active while turning off
for this bug to be hit.

> bluetoothd[1576]: Failed to set mode: Failed (0x03)
>
> If BT is tunred off, then turned on again, it works correctly again.

Typo: turned

> According to the vendor driver, the HCI_QUIRK_RESET_ON_CLOSE flag is set
> during probing. So, this patch makes Realtek's BT reset on close to fix
> this issue.

Checked the vendor driver - I see what you are referring to, so the
change seems correct.

#if HCI_VERSION_CODE >= KERNEL_VERSION(3, 7, 1)
if (!reset)
set_bit(HCI_QUIRK_RESET_ON_CLOSE, &hdev->quirks);
RTKBT_DBG("set_bit(HCI_QUIRK_RESET_ON_CLOSE, &hdev->quirks);");
#endif

However I'm pretty sure this is not saying that kernel 3.7.0 did not
need the reset. I think it just means that the flag did not exist
before Linux-3.7.1, so they added the ifdef to add some level of
compatibility with older kernel versions. I think you can remove
"since kernel v3.7.1." from the comment.

After those changes you can add:

Link: https://bugzilla.kernel.org/show_bug.cgi?id=203429
Reviewed-by: Daniel Drake 


> Signed-off-by: Jian-Hong Pan 
> ---
> v2:
>  - According to the vendor driver, it makes "all" Realtek's BT reset on
>close. So, this version makes it the same.
>  - Change to the new subject for all Realtek BT chips.
>
>  drivers/bluetooth/btrtl.c | 4 
>  1 file changed, 4 insertions(+)
>
> diff --git a/drivers/bluetooth/btrtl.c b/drivers/bluetooth/btrtl.c
> index 208feef63de4..be6d5f7e1e44 100644
> --- a/drivers/bluetooth/btrtl.c
> +++ b/drivers/bluetooth/btrtl.c
> @@ -630,6 +630,10 @@ int btrtl_setup_realtek(struct hci_dev *hdev)
> return PTR_ERR(btrtl_dev);
>
> ret = btrtl_download_firmware(hdev, btrtl_dev);
> +   /* According to the vendor driver, BT must be reset on close to avoid
> +* firmware crash since kernel v3.7.1.
> +*/
> +   set_bit(HCI_QUIRK_RESET_ON_CLOSE, &hdev->quirks);
>
> btrtl_free(btrtl_dev);
>
> --
> 2.22.0
>


Re: [PATCH v2 2/5] nvme: rename "pci" operations to "mmio"

2019-06-24 Thread Daniel Drake
On Mon, Jun 24, 2019 at 2:16 PM Christoph Hellwig  wrote:
> IFF we want to support it it has to be done at the PCIe layer.  But
> even that will require actual documentation and support from Intel.
>
> If Intel still believes this scheme is their magic secret to control
> the NVMe market and give themselves and unfair advantage over their
> competitors there is not much we can do.

Since the 2016 discussion, more documentation has been published:
https://www.intel.com/content/dam/www/public/us/en/documents/datasheets/300-series-chipset-pch-datasheet-vol-2.pdf
Chapter 15 is entirely new, and section 15.2 provides a nice clarity
improvement of the magic regs in the AHCI BAR, which I have used in
these patches to clean up the code and add documentation in the header
(see patch 1 in this series, ahci-remap.h).

I believe there's room for further improvement in the docs here, but
it would be nice to know what you see as the blocking questions or
documentation gaps that would prevent us from continuing to develop
the fake PCI bridge approach
(https://marc.info/?l=linux-pci=156015271021614=2). We are going
to try and push Intel on this via other channels to see if we can get
a contact to help us, so it would be useful if I can include a
concrete list of what we need.

Bearing in mind that we've already been told that the NVMe device
config space is inaccessible, and the new docs show exactly how the
BIOS enforces such inaccessibility during early boot, the remaining
points you mentioned recently were:

 b) reset handling, including the PCI device removal as the last
escalation step
 c) SR-IOV VFs and their management
 d) power management

Are there other blocking questions you would require answers to?

Thanks,
Daniel


Re: [PATCH] Bluetooth: btrtl: HCI reset on close for RTL8822BE

2019-06-23 Thread Daniel Drake
Hi Jian-Hong,

On Fri, Jun 21, 2019 at 4:59 PM Jian-Hong Pan  wrote:
> Realtek RTL8822BE BT chip on ASUS X420FA cannot be turned on correctly
> after on-off several times.  Bluetooth daemon sets BT mode failed when
> this issue happens.
>
> bluetoothd[1576]: Failed to set mode: Failed (0x03)
>
> If BT is tunred off, then turned on again, it works correctly again.
> This patch makes RTL8822BE BT reset on close to fix this issue.

I know we've been trying to understand why Realtek's own bluetooth
driver avoids this bug, but is this solution based upon code in the
vendor driver?
At a glance I can't see the flag (or equivalent) being set there.

Daniel


Re: [PATCH v2 2/5] nvme: rename "pci" operations to "mmio"

2019-06-20 Thread Daniel Drake
On Thu, Jun 20, 2019 at 2:11 PM Christoph Hellwig  wrote:
> The Linux NVMe driver will deal with NVMe as specified plus whatever
> minor tweaks we'll need for small bugs.  Hiding it behind an AHCI
> device is completely out of scope and will not be accepted.

Do you have any new suggestions for alternative ways we can implement
support for this storage configuration?

I tried to follow your earlier suggestions regarding faking a PCI bus
here: (or let me know if you had something different in mind...)
https://marc.info/?l=linux-pci=156015271021614=2
but it looks like that is not going to fly either :(

Daniel


[PATCH v2 4/5] nvme: move common definitions to pci.h

2019-06-19 Thread Daniel Drake
From: Dan Williams 

A platform-driver for nvme resources needs access to struct nvme_dev and
other definitions that are currently local to pci.c.

Signed-off-by: Dan Williams 
Signed-off-by: Daniel Drake 
---
 drivers/nvme/host/pci.c | 125 +---
 drivers/nvme/host/pci.h | 136 
 2 files changed, 137 insertions(+), 124 deletions(-)
 create mode 100644 drivers/nvme/host/pci.h

diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
index 23bda524f16b..bed6c91b6b7c 100644
--- a/drivers/nvme/host/pci.c
+++ b/drivers/nvme/host/pci.c
@@ -26,6 +26,7 @@
 
 #include "trace.h"
 #include "nvme.h"
+#include "pci.h"
 
 #define SQ_SIZE(depth) (depth * sizeof(struct nvme_command))
 #define CQ_SIZE(depth) (depth * sizeof(struct nvme_completion))
@@ -83,97 +84,9 @@ static int poll_queues = 0;
 module_param_cb(poll_queues, &queue_count_ops, &poll_queues, 0644);
 MODULE_PARM_DESC(poll_queues, "Number of queues to use for polled IO.");
 
-struct nvme_dev;
-struct nvme_queue;
-
 static void nvme_dev_disable(struct nvme_dev *dev, bool shutdown);
 static bool __nvme_disable_io_queues(struct nvme_dev *dev, u8 opcode);
 
-struct nvme_dev_ops {
-   /* Enable device (required) */
-   int (*enable)(struct nvme_dev *dev);
-
-   /* Disable device (required) */
-   void (*disable)(struct nvme_dev *dev);
-
-   /* Allocate IRQ vectors for given number of io queues (required) */
-   int (*setup_irqs)(struct nvme_dev *dev, int nr_io_queues);
-
-   /* Get the IRQ vector for a specific queue */
-   int (*q_irq)(struct nvme_queue *q);
-
-   /* Allocate device-specific SQ command buffer (optional) */
-   int (*cmb_alloc_sq_cmds)(struct nvme_queue *nvmeq, size_t size,
-struct nvme_command **sq_cmds,
-dma_addr_t *sq_dma_addr);
-
-   /* Free device-specific SQ command buffer (optional) */
-   void (*cmb_free_sq_cmds)(struct nvme_queue *nvmeq,
-struct nvme_command *sq_cmds, size_t size);
-
-   /* Device-specific mapping of blk queues to CPUs (optional) */
-   int (*map_queues)(struct nvme_dev *dev, struct blk_mq_queue_map *map,
- int offset);
-
-   /* Check if device is enabled on the bus (required) */
-   int (*is_enabled)(struct nvme_dev *dev);
-
-   /* Check if channel is in running state (required) */
-   int (*is_offline)(struct nvme_dev *dev);
-
-   /* Check if device is present and responding (optional) */
-   bool (*is_present)(struct nvme_dev *dev);
-
-   /* Check & log device state before it gets reset (optional) */
-   void (*warn_reset)(struct nvme_dev *dev);
-};
-
-/*
- * Represents an NVM Express device.  Each nvme_dev is a PCI function.
- */
-struct nvme_dev {
-   const struct resource *res;
-   const struct nvme_dev_ops *ops;
-   struct nvme_queue *queues;
-   struct blk_mq_tag_set tagset;
-   struct blk_mq_tag_set admin_tagset;
-   u32 __iomem *dbs;
-   struct device *dev;
-   struct dma_pool *prp_page_pool;
-   struct dma_pool *prp_small_pool;
-   unsigned online_queues;
-   unsigned max_qid;
-   unsigned io_queues[HCTX_MAX_TYPES];
-   unsigned int num_vecs;
-   int q_depth;
-   u32 db_stride;
-   void __iomem *bar;
-   unsigned long bar_mapped_size;
-   struct work_struct remove_work;
-   struct mutex shutdown_lock;
-   bool subsystem;
-   u64 cmb_size;
-   bool cmb_use_sqes;
-   u32 cmbsz;
-   u32 cmbloc;
-   struct nvme_ctrl ctrl;
-
-   mempool_t *iod_mempool;
-
-   /* shadow doorbell buffer support: */
-   u32 *dbbuf_dbs;
-   dma_addr_t dbbuf_dbs_dma_addr;
-   u32 *dbbuf_eis;
-   dma_addr_t dbbuf_eis_dma_addr;
-
-   /* host memory buffer support: */
-   u64 host_mem_size;
-   u32 nr_host_mem_descs;
-   dma_addr_t host_mem_descs_dma;
-   struct nvme_host_mem_buf_desc *host_mem_descs;
-   void **host_mem_desc_bufs;
-};
-
 static int io_queue_depth_set(const char *val, const struct kernel_param *kp)
 {
int n = 0, ret;
@@ -213,42 +126,6 @@ static inline struct nvme_dev *to_nvme_dev(struct 
nvme_ctrl *ctrl)
return container_of(ctrl, struct nvme_dev, ctrl);
 }
 
-/*
- * An NVM Express queue.  Each device has at least two (one for admin
- * commands and one for I/O commands).
- */
-struct nvme_queue {
-   struct nvme_dev *dev;
-   char irqname[24];   /* nvme4294967295-65535\0 */
-   spinlock_t sq_lock;
-   struct nvme_command *sq_cmds;
-/* only used for poll queues: */
-   spinlock_t cq_poll_lock ____cacheline_aligned_in_smp;
-   volatile struct nvme_completion *cqes;
-   struct blk_mq_tags **tags;
-   dma_addr_t sq_dma_addr;
-   dma_addr_t cq_dma_addr;
-   u32 __iomem *q_db;
-   u16

[PATCH v2 5/5] nvme: Intel AHCI remap support

2019-06-19 Thread Daniel Drake
Provide a platform driver for the nvme resources that may be remapped
behind an ahci bar on common Intel platforms. The implementation relies
on the standard nvme driver, but reimplements the nvme_dev_ops accordingly.

As the original NVMe PCI device is inaccessible, this driver is somewhat
limited: we always assume the device is present & online, can't
detect PCI errors, can't reset, power management is limited, etc.

A single shared legacy interrupt is used, although there is some
hope that MSI-X support could be added later.

Based on previous code by Dan Williams.

Signed-off-by: Daniel Drake 
---
 drivers/ata/Kconfig  |   1 +
 drivers/nvme/host/Kconfig|   3 +
 drivers/nvme/host/Makefile   |   3 +
 drivers/nvme/host/intel-ahci-remap.c | 185 +++
 drivers/nvme/host/pci.c  |  21 +--
 drivers/nvme/host/pci.h  |   9 ++
 6 files changed, 214 insertions(+), 8 deletions(-)
 create mode 100644 drivers/nvme/host/intel-ahci-remap.c

diff --git a/drivers/ata/Kconfig b/drivers/ata/Kconfig
index 6e82d66d7516..fb64e690d325 100644
--- a/drivers/ata/Kconfig
+++ b/drivers/ata/Kconfig
@@ -113,6 +113,7 @@ config SATA_AHCI_INTEL_NVME_REMAP
bool "AHCI: Intel Remapped NVMe device support"
depends on SATA_AHCI
depends on BLK_DEV_NVME
+   select NVME_INTEL_AHCI_REMAP
help
  Support access to remapped NVMe devices that appear in AHCI PCI
  memory space.
diff --git a/drivers/nvme/host/Kconfig b/drivers/nvme/host/Kconfig
index ec43ac9199e2..a8aefb18eb15 100644
--- a/drivers/nvme/host/Kconfig
+++ b/drivers/nvme/host/Kconfig
@@ -26,6 +26,9 @@ config NVME_MULTIPATH
 config NVME_FABRICS
tristate
 
+config NVME_INTEL_AHCI_REMAP
+   tristate
+
 config NVME_RDMA
tristate "NVM Express over Fabrics RDMA host driver"
depends on INFINIBAND && INFINIBAND_ADDR_TRANS && BLOCK
diff --git a/drivers/nvme/host/Makefile b/drivers/nvme/host/Makefile
index 8a4b671c5f0c..2010169880b7 100644
--- a/drivers/nvme/host/Makefile
+++ b/drivers/nvme/host/Makefile
@@ -8,6 +8,7 @@ obj-$(CONFIG_NVME_FABRICS)  += nvme-fabrics.o
 obj-$(CONFIG_NVME_RDMA)+= nvme-rdma.o
 obj-$(CONFIG_NVME_FC)  += nvme-fc.o
 obj-$(CONFIG_NVME_TCP) += nvme-tcp.o
+obj-$(CONFIG_NVME_INTEL_AHCI_REMAP)+= nvme-intel-ahci-remap.o
 
 nvme-core-y:= core.o
 nvme-core-$(CONFIG_TRACING)+= trace.o
@@ -24,3 +25,5 @@ nvme-rdma-y   += rdma.o
 nvme-fc-y  += fc.o
 
 nvme-tcp-y += tcp.o
+
+nvme-intel-ahci-remap-y+= intel-ahci-remap.o
diff --git a/drivers/nvme/host/intel-ahci-remap.c 
b/drivers/nvme/host/intel-ahci-remap.c
new file mode 100644
index ..7194d9dd0016
--- /dev/null
+++ b/drivers/nvme/host/intel-ahci-remap.c
@@ -0,0 +1,185 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Intel AHCI remapped NVMe platform driver
+ *
+ * Copyright (c) 2011-2016, Intel Corporation.
+ * Copyright (c) 2019, Endless Mobile, Inc.
+ *
+ * Support platform devices created by the ahci driver, corresponding to
+ * NVMe devices that have been remapped into the ahci device memory space.
+ *
+ * This scheme is rather peculiar, as NVMe is inherently based on PCIe,
+ * however we only have access to the NVMe device MMIO space and an
+ * interrupt. Without access to the pci_device, many features are
+ * unavailable; this driver only intends to offer basic functionality.
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include "pci.h"
+
+struct ahci_remap_data {
+   atomic_t enabled;
+};
+
+static struct ahci_remap_data *to_ahci_remap_data(struct nvme_dev *dev)
+{
+   return dev->dev->platform_data;
+}
+
+static int ahci_remap_enable(struct nvme_dev *dev)
+{
+   int rc;
+   struct resource *res;
+   struct device *ddev = dev->dev;
+   struct device *parent = ddev->parent;
+   struct ahci_remap_data *adata = to_ahci_remap_data(dev);
+   struct platform_device *pdev = to_platform_device(ddev);
+
+   res = platform_get_resource(pdev, IORESOURCE_IRQ, 0);
+   if (!res)
+   return -ENXIO;
+
+   /* parent ahci device determines the dma mask */
+   if (dma_supported(parent, DMA_BIT_MASK(64)))
+   rc = dma_coerce_mask_and_coherent(ddev, DMA_BIT_MASK(64));
+   else if (dma_supported(parent, DMA_BIT_MASK(32)))
+   rc = dma_coerce_mask_and_coherent(ddev, DMA_BIT_MASK(32));
+   else
+   rc = -ENXIO;
+   if (rc)
+   return rc;
+
+   rc = nvme_enable(dev);
+   if (rc)
+   return rc;
+
+   atomic_inc(&adata->enabled);
+
+   return 0;
+}
+
+static int ahci_remap_is_enabled(struct nvme_dev *dev)
+{
+   struct ahci_remap_data *adata = to_ahci_remap_da

[PATCH v2 2/5] nvme: rename "pci" operations to "mmio"

2019-06-19 Thread Daniel Drake
From: Dan Williams 

In preparation for adding a platform_device nvme host, rename to a more
generic "mmio" prefix.

Signed-off-by: Dan Williams 
Signed-off-by: Daniel Drake 
---
 drivers/nvme/host/pci.c | 28 ++--
 1 file changed, 14 insertions(+), 14 deletions(-)

diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
index 524d6bd6d095..42990b93349d 100644
--- a/drivers/nvme/host/pci.c
+++ b/drivers/nvme/host/pci.c
@@ -1108,7 +1108,7 @@ static int nvme_poll(struct blk_mq_hw_ctx *hctx)
return found;
 }
 
-static void nvme_pci_submit_async_event(struct nvme_ctrl *ctrl)
+static void nvme_mmio_submit_async_event(struct nvme_ctrl *ctrl)
 {
struct nvme_dev *dev = to_nvme_dev(ctrl);
struct nvme_queue *nvmeq = &dev->queues[0];
@@ -2448,7 +2448,7 @@ static void nvme_release_prp_pools(struct nvme_dev *dev)
dma_pool_destroy(dev->prp_small_pool);
 }
 
-static void nvme_pci_free_ctrl(struct nvme_ctrl *ctrl)
+static void nvme_mmio_free_ctrl(struct nvme_ctrl *ctrl)
 {
struct nvme_dev *dev = to_nvme_dev(ctrl);
 
@@ -2610,42 +2610,42 @@ static void nvme_remove_dead_ctrl_work(struct 
work_struct *work)
nvme_put_ctrl(&dev->ctrl);
 }
 
-static int nvme_pci_reg_read32(struct nvme_ctrl *ctrl, u32 off, u32 *val)
+static int nvme_mmio_reg_read32(struct nvme_ctrl *ctrl, u32 off, u32 *val)
 {
*val = readl(to_nvme_dev(ctrl)->bar + off);
return 0;
 }
 
-static int nvme_pci_reg_write32(struct nvme_ctrl *ctrl, u32 off, u32 val)
+static int nvme_mmio_reg_write32(struct nvme_ctrl *ctrl, u32 off, u32 val)
 {
writel(val, to_nvme_dev(ctrl)->bar + off);
return 0;
 }
 
-static int nvme_pci_reg_read64(struct nvme_ctrl *ctrl, u32 off, u64 *val)
+static int nvme_mmio_reg_read64(struct nvme_ctrl *ctrl, u32 off, u64 *val)
 {
*val = readq(to_nvme_dev(ctrl)->bar + off);
return 0;
 }
 
-static int nvme_pci_get_address(struct nvme_ctrl *ctrl, char *buf, int size)
+static int nvme_mmio_get_address(struct nvme_ctrl *ctrl, char *buf, int size)
 {
struct pci_dev *pdev = to_pci_dev(to_nvme_dev(ctrl)->dev);
 
return snprintf(buf, size, "%s", dev_name(&pdev->dev));
 }
 
-static const struct nvme_ctrl_ops nvme_pci_ctrl_ops = {
+static const struct nvme_ctrl_ops nvme_mmio_ctrl_ops = {
.name   = "pcie",
.module = THIS_MODULE,
.flags  = NVME_F_METADATA_SUPPORTED |
  NVME_F_PCI_P2PDMA,
-   .reg_read32 = nvme_pci_reg_read32,
-   .reg_write32= nvme_pci_reg_write32,
-   .reg_read64 = nvme_pci_reg_read64,
-   .free_ctrl  = nvme_pci_free_ctrl,
-   .submit_async_event = nvme_pci_submit_async_event,
-   .get_address= nvme_pci_get_address,
+   .reg_read32 = nvme_mmio_reg_read32,
+   .reg_write32= nvme_mmio_reg_write32,
+   .reg_read64 = nvme_mmio_reg_read64,
+   .free_ctrl  = nvme_mmio_free_ctrl,
+   .submit_async_event = nvme_mmio_submit_async_event,
+   .get_address= nvme_mmio_get_address,
 };
 
 static int nvme_dev_map(struct nvme_dev *dev)
@@ -2758,7 +2758,7 @@ static int nvme_probe(struct pci_dev *pdev, const struct 
pci_device_id *id)
goto release_pools;
}
 
-   result = nvme_init_ctrl(&dev->ctrl, &pdev->dev, &nvme_pci_ctrl_ops,
+   result = nvme_init_ctrl(&dev->ctrl, &pdev->dev, &nvme_mmio_ctrl_ops,
quirks);
if (result)
goto release_mempool;
-- 
2.20.1



[PATCH v2 1/5] ahci: Discover Intel remapped NVMe devices

2019-06-19 Thread Daniel Drake
Intel SATA AHCI controllers support a strange mode where NVMe devices
disappear from the PCI bus, and instead are remapped into AHCI PCI memory
space.

Many current and upcoming consumer products ship with the AHCI controller
in this "RAID" or "Intel RST Premium with Intel Optane System Acceleration"
mode by default. Without Linux support for this remapped mode,
the default out-of-the-box experience is that the NVMe storage device
is inaccessible (which in many cases is the only internal storage device).

Using partial information provided by Intel in datasheets, emails,
and previous patches, extend the AHCI driver to detect the remapped NVMe
devices and create corresponding platform devices, to be picked up
by the nvme driver.

Our knowledge of the design and workings of this remapping scheme
has been collected in ahci-remap.h, which can be considered the best
spec we have at the moment.

Based on earlier work by Dan Williams.

Signed-off-by: Daniel Drake 
---
 drivers/ata/Kconfig|  32 +++
 drivers/ata/ahci.c | 173 -
 drivers/ata/ahci.h |  14 +++
 include/linux/ahci-remap.h | 140 +++---
 4 files changed, 329 insertions(+), 30 deletions(-)

diff --git a/drivers/ata/Kconfig b/drivers/ata/Kconfig
index a6beb2c5a692..6e82d66d7516 100644
--- a/drivers/ata/Kconfig
+++ b/drivers/ata/Kconfig
@@ -109,6 +109,38 @@ config SATA_MOBILE_LPM_POLICY
  Note "Minimum power" is known to cause issues, including disk
  corruption, with some disks and should not be used.
 
+config SATA_AHCI_INTEL_NVME_REMAP
+   bool "AHCI: Intel Remapped NVMe device support"
+   depends on SATA_AHCI
+   depends on BLK_DEV_NVME
+   help
+ Support access to remapped NVMe devices that appear in AHCI PCI
+ memory space.
+
+ You'll need this in order to access your NVMe storage if you are
+ running an Intel AHCI controller in "RAID" or "Intel RST Premium
+ with Intel Optane System Acceleration" mode. This is the default
+ configuration of many consumer products. If you have storage devices
+ being affected by this, you'll have noticed that such devices are
+ absent, and you'll see a warning in your kernel logs about remapped
+ NVMe devices.
+
+ Instead of enabling this option, it is recommended to go into the
+ BIOS menu and change the SATA device into "AHCI" mode in order to
+ gain access to the affected devices, while also enjoying all
+ available NVMe features and performance.
+
+ However, if you do want to access the NVMe devices in remapped
+ mode, say Y. Negative consequences of remapped device access
+ include:
+ - No NVMe device power management
+ - No NVMe reset support
+ - No NVMe quirks based on PCI ID
+ - No SR-IOV VFs
+ - Reduced performance through a shared, legacy interrupt
+
+ If unsure, say N.
+
 config SATA_AHCI_PLATFORM
tristate "Platform AHCI SATA support"
help
diff --git a/drivers/ata/ahci.c b/drivers/ata/ahci.c
index f7652baa6337..b58316347539 100644
--- a/drivers/ata/ahci.c
+++ b/drivers/ata/ahci.c
@@ -23,6 +23,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -1499,11 +1500,11 @@ static irqreturn_t ahci_thunderx_irq_handler(int irq, 
void *dev_instance)
 }
 #endif
 
-static void ahci_remap_check(struct pci_dev *pdev, int bar,
+static int ahci_remap_init(struct pci_dev *pdev, int bar,
struct ahci_host_priv *hpriv)
 {
int i, count = 0;
-   u32 cap;
+   u32 supported_devs;
 
/*
 * Check if this device might have remapped nvme devices.
@@ -1511,33 +1512,68 @@ static void ahci_remap_check(struct pci_dev *pdev, int 
bar,
if (pdev->vendor != PCI_VENDOR_ID_INTEL ||
pci_resource_len(pdev, bar) < SZ_512K ||
bar != AHCI_PCI_BAR_STANDARD ||
-   !(readl(hpriv->mmio + AHCI_VSCAP) & 1))
-   return;
+   !(readl(hpriv->mmio + AHCI_VS_CAP) & AHCI_VS_CAP_NRMBE))
+   return -ENODEV;
 
-   cap = readq(hpriv->mmio + AHCI_REMAP_CAP);
-   for (i = 0; i < AHCI_MAX_REMAP; i++) {
-   if ((cap & (1 << i)) == 0)
+   supported_devs = readl(hpriv->mmio + AHCI_REMAP_RCR_L)
+& AHCI_REMAP_RCR_L_NRS_MASK;
+   for_each_set_bit(i, (unsigned long *)&supported_devs, AHCI_MAX_REMAP) {
+   struct ahci_remap *rdev;
+   u32 dcc;
+
+   /* Check that the remapped device is NVMe */
+   dcc = readl(hpriv->mmio + ahci_remap_dcc(i));
+   if ((dcc & AHCI_REMAP_DCC_DT) != AHCI_REMAP_DCC_DT_NVME)
continue;
-   if (readl(hpriv->mm

[PATCH v2 3/5] nvme: introduce nvme_dev_ops

2019-06-19 Thread Daniel Drake
In preparation for a platform device nvme driver, move the bus specific
portions of nvme to nvme_dev_ops, or otherwise rewrite routines to use a
generic 'struct device' instead of 'struct pci_dev'.

Based on earlier work by Dan Williams.

Signed-off-by: Daniel Drake 
---
 drivers/nvme/host/pci.c | 410 +++-
 1 file changed, 275 insertions(+), 135 deletions(-)

 I took Dan Williams' earlier patch here and refreshed it for the
latest nvme driver, which has gained a few more places where it uses
the PCI device, so nvme_dev_ops grew a bit more.

Is this a suitable way of handling this case? It feels a little
unclean to have both the NVMe host layer and the PCI-specific dev ops
in the same file. Maybe it makes sense because NVMe is inherently a PCI
thing under normal circumstances? Or would it be cleaner for me to
rename "pci.c" to "mmio.c" and then separate the pci dev ops into
a new "pci.c"?

diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
index 42990b93349d..23bda524f16b 100644
--- a/drivers/nvme/host/pci.c
+++ b/drivers/nvme/host/pci.c
@@ -89,10 +89,51 @@ struct nvme_queue;
 static void nvme_dev_disable(struct nvme_dev *dev, bool shutdown);
 static bool __nvme_disable_io_queues(struct nvme_dev *dev, u8 opcode);
 
+struct nvme_dev_ops {
+   /* Enable device (required) */
+   int (*enable)(struct nvme_dev *dev);
+
+   /* Disable device (required) */
+   void (*disable)(struct nvme_dev *dev);
+
+   /* Allocate IRQ vectors for given number of io queues (required) */
+   int (*setup_irqs)(struct nvme_dev *dev, int nr_io_queues);
+
+   /* Get the IRQ vector for a specific queue */
+   int (*q_irq)(struct nvme_queue *q);
+
+   /* Allocate device-specific SQ command buffer (optional) */
+   int (*cmb_alloc_sq_cmds)(struct nvme_queue *nvmeq, size_t size,
+struct nvme_command **sq_cmds,
+dma_addr_t *sq_dma_addr);
+
+   /* Free device-specific SQ command buffer (optional) */
+   void (*cmb_free_sq_cmds)(struct nvme_queue *nvmeq,
+struct nvme_command *sq_cmds, size_t size);
+
+   /* Device-specific mapping of blk queues to CPUs (optional) */
+   int (*map_queues)(struct nvme_dev *dev, struct blk_mq_queue_map *map,
+ int offset);
+
+   /* Check if device is enabled on the bus (required) */
+   int (*is_enabled)(struct nvme_dev *dev);
+
+   /* Check if channel is in running state (required) */
+   int (*is_offline)(struct nvme_dev *dev);
+
+   /* Check if device is present and responding (optional) */
+   bool (*is_present)(struct nvme_dev *dev);
+
+   /* Check & log device state before it gets reset (optional) */
+   void (*warn_reset)(struct nvme_dev *dev);
+};
+
 /*
  * Represents an NVM Express device.  Each nvme_dev is a PCI function.
  */
 struct nvme_dev {
+   const struct resource *res;
+   const struct nvme_dev_ops *ops;
struct nvme_queue *queues;
struct blk_mq_tag_set tagset;
struct blk_mq_tag_set admin_tagset;
@@ -178,6 +219,7 @@ static inline struct nvme_dev *to_nvme_dev(struct nvme_ctrl 
*ctrl)
  */
 struct nvme_queue {
struct nvme_dev *dev;
+   char irqname[24];   /* nvme4294967295-65535\0 */
spinlock_t sq_lock;
struct nvme_command *sq_cmds;
 /* only used for poll queues: */
@@ -384,6 +426,11 @@ static unsigned int nvme_pci_iod_alloc_size(struct 
nvme_dev *dev,
return alloc_size + sizeof(struct scatterlist) * nseg;
 }
 
+static int nvme_pci_q_irq(struct nvme_queue *nvmeq)
+{
+   return pci_irq_vector(to_pci_dev(nvmeq->dev->dev), nvmeq->cq_vector);
+}
+
 static int nvme_admin_init_hctx(struct blk_mq_hw_ctx *hctx, void *data,
unsigned int hctx_idx)
 {
@@ -444,7 +491,14 @@ static int queue_irq_offset(struct nvme_dev *dev)
return 0;
 }
 
-static int nvme_pci_map_queues(struct blk_mq_tag_set *set)
+static int nvme_pci_map_queues(struct nvme_dev *dev,
+  struct blk_mq_queue_map *map,
+  int offset)
+{
+   return blk_mq_pci_map_queues(map, to_pci_dev(dev->dev), offset);
+}
+
+static int nvme_map_queues(struct blk_mq_tag_set *set)
 {
struct nvme_dev *dev = set->driver_data;
int i, qoff, offset;
@@ -464,8 +518,8 @@ static int nvme_pci_map_queues(struct blk_mq_tag_set *set)
 * affinity), so use the regular blk-mq cpu mapping
 */
map->queue_offset = qoff;
-   if (i != HCTX_TYPE_POLL && offset)
-   blk_mq_pci_map_queues(map, to_pci_dev(dev->dev), 
offset);
+   if (i != HCTX_TYPE_POLL && offset && dev->ops->map_queues)
+   dev->ops->map_queues(dev, map, offset);
 

[PATCH v2 0/5] Support Intel AHCI remapped NVMe devices

2019-06-19 Thread Daniel Drake
Intel SATA AHCI controllers support a strange mode where NVMe devices
disappear from the PCI bus, and instead are remapped into AHCI PCI memory
space.

Many current and upcoming consumer products ship with the AHCI controller
in this "RAID" or "Intel RST Premium with Intel Optane System Acceleration"
mode by default. Without Linux support for this remapped mode,
the default out-of-the-box experience is that the NVMe storage device
is inaccessible (which in many cases is the only internal storage device).

In most cases, the SATA configuration can be changed in the BIOS menu to
"AHCI", resulting in the AHCI & NVMe devices appearing as separate
devices as you would ordinarily expect. Changing this configuration
is the recommendation for power users because there are several limitations
of the remapped mode (now documented in Kconfig help text).

However, it's also important to support the remapped mode given that
it is an increasingly common product default. We cannot expect ordinary
users of consumer PCs to find out about this situation and then
confidently go into the BIOS menu to change options.

This patch set implements support for the remapped mode.

v1 of these patches was originally posted by Dan Williams in 2016.
https://marc.info/?l=linux-ide=147709610621480=2
Since then:

 - Intel stopped developing these patches & hasn't been responding to
   my emails on this topic.

 - More register documentation appeared in
   
https://www.intel.com/content/dam/www/public/us/en/documents/datasheets/300-series-chipset-pch-datasheet-vol-2.pdf

 - I tried Christoph's suggestion of exposing the devices on a fake PCI bus,
   hence not requiring NVMe driver changes, but Bjorn Helgaas does not think
   it's the right approach and instead recommends the approach taken here.
   https://marc.info/?l=linux-pci=156034736822205=2

 - More consumer devices have appeared with this setting as the default,
   and with the decreasing cost of NVMe storage, it appears that a whole
   bunch more consumer PC products currently in development are going to
   ship in RAID/remapped mode, with only a single NVMe disk, which Linux
   will otherwise be unable to access by default.

 - We heard from hardware vendors that this Linux incompatibility is
   causing them to consider discontinuing Linux support on affected
   products. Changing the BIOS setting is too much of a logistics
   challenge.

 - I updated Dan's patches for current kernels. I added docs and references
   and incorporated the new register info. I incorporated feedback to push
   the recommendation that the user goes back to AHCI mode via the BIOS
   setting (in kernel logs and Kconfig help). And made some misc minor
   changes that I think are sensible.

 - I investigated MSI-X support. Can't quite get it working, but I'm hopeful
   that we can figure it out and add it later. With these patches shared
   I'll follow up with more details about that. With the focus on
   compatibility with default configuration of common consumer products,
   I'm hoping we could land an initial version without MSI support before
   tending to those complications.

Dan Williams (2):
  nvme: rename "pci" operations to "mmio"
  nvme: move common definitions to pci.h

Daniel Drake (3):
  ahci: Discover Intel remapped NVMe devices
  nvme: introduce nvme_dev_ops
  nvme: Intel AHCI remap support

 drivers/ata/Kconfig  |  33 ++
 drivers/ata/ahci.c   | 173 --
 drivers/ata/ahci.h   |  14 +
 drivers/nvme/host/Kconfig|   3 +
 drivers/nvme/host/Makefile   |   3 +
 drivers/nvme/host/intel-ahci-remap.c | 185 ++
 drivers/nvme/host/pci.c  | 490 ++-
 drivers/nvme/host/pci.h  | 145 
 include/linux/ahci-remap.h   | 140 +++-
 9 files changed, 922 insertions(+), 264 deletions(-)
 create mode 100644 drivers/nvme/host/intel-ahci-remap.c
 create mode 100644 drivers/nvme/host/pci.h

-- 
2.20.1



Re: [RFC PATCH v3] rtl8xxxu: Improve TX performance of RTL8723BU on rtl8xxxu driver

2019-05-30 Thread Daniel Drake
On Wed, May 29, 2019 at 10:52 PM Chris Chiu  wrote:
> You mean moving the ratr_index to be the 4th function parameter of
> update_rate_mask which has 2 candidates rtl8xxxu_update_rate_mask
> and rtl8xxxu_gen2_update_rate_mask? I was planning to keep the
> rtl8xxxu_update_rate_mask the same because old chips seems don't
> need the rate index when invoking H2C command to change rate mask.
> And rate index is not a common phrase/term for rate adaptive. Theoretically
> we just need packet error rate, sgi and other factors to determine the rate
> mask. This rate index seems to be only specific to newer Realtek drivers
> or firmware for rate adaptive algorithm.  I'd like to keep this for gen2 but
> I admit it's ugly to put it in the priv structure. Any suggestion is
> appreciated.

I think it's cleaner to have it as a function parameter, even if the
old chips don't use it.
The rest of the implementation is in the core, so we aren't exactly
dealing with chip-specific code here, at least the way it's currently
done.
The vendor driver also has it as a function parameter from what I can see.
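
In other words, something roughly like this (the existing parameter
list here is written from memory, so treat the exact signature as
approximate):

    void rtl8xxxu_gen2_update_rate_mask(struct rtl8xxxu_priv *priv,
                                        u32 ramask, u8 ratr_idx, int sgi)
    {
            /* ... build the H2C command as before ... */
            h2c.b_macid_cfg.data1 = ratr_idx;
            /* ... */
    }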

Daniel


Re: [RFC PATCH v3] rtl8xxxu: Improve TX performance of RTL8723BU on rtl8xxxu driver

2019-05-29 Thread Daniel Drake
Hi Chris,

On Tue, May 28, 2019 at 11:03 PM Chris Chiu  wrote:
> +   /*
> +* Single virtual interface permitted since the driver supports 
> STATION
> +* mode only.

I think you can be a bit more explicit by saying e.g.:

Only one virtual interface permitted because only STA mode is
supported and no iface_combinations are provided.

> diff --git a/drivers/net/wireless/realtek/rtl8xxxu/rtl8xxxu_core.c 
> b/drivers/net/wireless/realtek/rtl8xxxu/rtl8xxxu_core.c
> index 039e5ca9d2e4..2d612c2df5b2 100644
> --- a/drivers/net/wireless/realtek/rtl8xxxu/rtl8xxxu_core.c
> +++ b/drivers/net/wireless/realtek/rtl8xxxu/rtl8xxxu_core.c
> @@ -4345,7 +4345,7 @@ void rtl8xxxu_gen2_update_rate_mask(struct 
> rtl8xxxu_priv *priv,
> h2c.b_macid_cfg.ramask3 = (ramask >> 24) & 0xff;
>
> h2c.ramask.arg = 0x80;
> -   h2c.b_macid_cfg.data1 = 0;
> +   h2c.b_macid_cfg.data1 = priv->ratr_index;

I think ratr_index can be moved to be a function parameter of the
update_rate_mask function. It looks like all callsites already know
which value they want to set. Then you don't have to store it in the
priv structure.

> @@ -5471,6 +5509,10 @@ static int rtl8xxxu_add_interface(struct ieee80211_hw 
> *hw,
>
> switch (vif->type) {
> case NL80211_IFTYPE_STATION:
> +   if (!priv->vif)
> +   priv->vif = vif;
> +   else
> +   return -EOPNOTSUPP;
> rtl8xxxu_stop_tx_beacon(priv);

rtl8xxxu_remove_interface should also set priv->vif back to NULL.

> @@ -6183,6 +6259,8 @@ static void rtl8xxxu_disconnect(struct usb_interface 
> *interface)
> mutex_destroy(&priv->usb_buf_mutex);
> mutex_destroy(&priv->h2c_mutex);
>
> +   cancel_delayed_work_sync(&priv->ra_watchdog);

Given that the work was started in rtl8xxxu_start, I think it should
be cancelled in rtl8xxxu_stop() instead.

Daniel


Re: [RFC PATCH 2/2] rtl8xxxu: Add watchdog to update rate mask by signal strength

2019-05-27 Thread Daniel Drake
On Mon, May 27, 2019 at 12:38 AM Chris Chiu  wrote:
> The -EBUSY is returned by the ieee80211_check_combinations() in the
> ieee80211_check_concurrent_iface() function which is invoked each time
> doing ieee80211_open().
> The ieee80211_check_combinations() returns the -EBUSY because of
> cfg80211_check_combinations() will iterate all interfaces of different types
> then checks the combination is valid or not, which in this case the number
> of interface combination accumulated by cfg80211_iter_sum_ifcombos is 0
> when I'm trying to bring up the second station interface.

Thanks for clarifying. I see, rtl8xxxu does not provide any
iface_combinations so the default is to reject parallel interfaces.

Given that we can now confidently say that multiple interfaces are not
supported, I think that inside rtl8xxxu_add_interface() you could
store a pointer to the vif inside the rtl8xxxu_priv structure (and
clear it in rtl8xxxu_remove_interface). Plus for clarity, add a
comment pointing out that only a single interface is supported, and
make rtl8xxxu_add_interface() acually fail if we already had a
non-NULL pointer stored in the priv structure.

Then you can simplify the patch to remove the ugly storing of vif
inside rtl8xxxu_watchdog. You can store the delayed_work in the main
priv struct too, dropping rtl8xxxu_watchdog altogether.
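
i.e. something of this shape in rtl8xxxu_add_interface() (just a
sketch of the check, not a tested diff):

    case NL80211_IFTYPE_STATION:
            if (priv->vif)
                    return -EOPNOTSUPP;     /* only one STA vif supported */
            priv->vif = vif;
            rtl8xxxu_stop_tx_beacon(priv);
            break;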

Daniel


Re: [PATCH v4 04/13] platform/x86: wmi: Add function to get _UID of WMI device

2019-05-24 Thread Daniel Drake
On Tue, May 14, 2019 at 12:59 PM Yurii Pavlovskyi
 wrote:
>
> Add a new function to acpi.h / wmi.c that returns _UID of the ACPI WMI
> device. For example, it returns "ATK" for the following declaration in
> DSDT:
> Device (ATKD)
> {
> Name (_HID, "PNP0C14" /* Windows Management Instrumentation Device */)
>   // _HID: Hardware ID
> Name (_UID, "ATK")  // _UID: Unique ID
> ..
>
> Generally, it is possible that multiple PNP0C14 ACPI devices are present in
> the system as mentioned in the commit message of commit bff431e49ff5
> ("ACPI: WMI: Add ACPI-WMI mapping driver").
>
> Therefore the _UID is returned for a specific ACPI device that declares the
> given GUID, to which it is also mapped by other methods of wmi module.
>
> Signed-off-by: Yurii Pavlovskyi 

Some extra background may be useful when reviewing this.

As researched in
https://marc.info/?l=linux-kernel=155498017207933=2, we are
dealing with a tricky situation.

asus-wmi currently serves two different classes of device: eeepc-wmi
and asus-nb-wmi.

The eeepc devices have:
  _WDG : includes a METHOD block with GUID ASUS_WMI_MGMT_GUID, and an
         EVENT block with GUID EEEPC_WMI_EVENT_GUID
  _UID : ASUSWMI

The asus-nb-wmi devices have:
  _WDG : includes a METHOD block with GUID ASUS_WMI_MGMT_GUID (same as
         eeepc), and an EVENT block with GUID ASUS_NB_WMI_EVENT_GUID
  _UID : ATK

To support new devices we now need to start concretely identifying
which of these we are working with. But complications include:
 - The main MGMT_GUID used for matching at the moment is shared over
both device types
 - Some Asus products have both of these (via two separate PNP0C14
   WMI devices).

Currently eeepc-wmi and asus-nb-wmi register themselves with
asus-wmi, which registers a platform device for each one. The platform
device probe then succeeds when it finds any _WDG entry for the main
MGMT_GUID and the _WDG entry for the corresponding event GUID (not
necessarily as part of the same underlying ACPI Device). In the case
of both devices being present with duplicate MGMT, the first one that
is parsed wins, and the other is ignored (see guid_already_parsed()).

Sticking with the current approach, which is imperfect for machines
that have both devices, adding a method to detect the _UID seems
reasonable. Although actually I just realised you could probably also
detect the difference by using wmi_has_guid() on the EVENT GUID
without having to add a new function.
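
For example something like this, reusing the event GUID constants the
two drivers already define (sketch only):

    if (wmi_has_guid(ASUS_NB_WMI_EVENT_GUID)) {
            /* "ATK" style firmware, i.e. the asus-nb-wmi flavour */
    } else if (wmi_has_guid(EEEPC_WMI_EVENT_GUID)) {
            /* "ASUSWMI" style firmware, i.e. the eeepc-wmi flavour */
    }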

I'll keep thinking about how to improve the situation around two
devices present (but don't want to needlessly conflate this patch
series with that).

Daniel


Re: [PATCH v4 03/13] platform/x86: asus-wmi: Increase input buffer size of WMI methods

2019-05-24 Thread Daniel Drake
On Tue, May 14, 2019 at 12:54 PM Yurii Pavlovskyi
 wrote:
>
> The asus-nb-wmi driver is matched by WMI alias but fails to load on TUF
> Gaming series laptops producing multiple ACPI errors in the kernel log.
>
> The input buffer for WMI method invocation size is 2 dwords, whereas
> 3 are expected by this model.
>
> FX505GM:
> ..
> Method (WMNB, 3, Serialized)
> {
> P8XH (Zero, 0x11)
> CreateDWordField (Arg2, Zero, IIA0)
> CreateDWordField (Arg2, 0x04, IIA1)
> CreateDWordField (Arg2, 0x08, IIA2)
> Local0 = (Arg1 & 0x)
> ...
>
> Compare with older K54C:
> ...
> Method (WMNB, 3, NotSerialized)
> {
> CreateDWordField (Arg2, 0x00, IIA0)
> CreateDWordField (Arg2, 0x04, IIA1)
> Local0 = (Arg1 & 0x)
> ...
>
> Increase buffer size to 3 dwords. No negative consequences of this change
> are expected, as the input buffer size is not verified. The original
> function is replaced by a wrapper for a new method passing value 0 for the
> last parameter. The new function will be used to control RGB keyboard
> backlight.
>
> Signed-off-by: Yurii Pavlovskyi 

Reviewed-by: Daniel Drake 


Re: [PATCH v4 02/13] platform/x86: asus-wmi: Fix preserving keyboard backlight intensity on load

2019-05-24 Thread Daniel Drake
On Tue, May 14, 2019 at 12:51 PM Yurii Pavlovskyi
 wrote:
>
> The error code and return value are mixed up. The intensity is always set
> to 0 on load as kbd_led_read returns either 0 or negative value. To
> reproduce set backlight to maximum, reload driver and try to increase it
> using keyboard hotkey, the intensity will drop as a result. Correct the
> implementation.
>
> Signed-off-by: Yurii Pavlovskyi 

Reviewed-by: Daniel Drake 


Re: [PATCH v4 01/13] platform/x86: asus-wmi: Fix hwmon device cleanup

2019-05-24 Thread Daniel Drake
On Tue, May 14, 2019 at 12:50 PM Yurii Pavlovskyi
 wrote:
>
> The driver does not clean up the hwmon device on exit or error. To
> reproduce the bug, repeat rmmod, insmod to verify that device number
> /sys/devices/platform/asus-nb-wmi/hwmon/hwmon?? grows every time. Replace
> call for registering device with devm_* version that unregisters it
> automatically.
>
> Signed-off-by: Yurii Pavlovskyi 

Reviewed-by: Daniel Drake 


Re: [RFC PATCH 2/2] rtl8xxxu: Add watchdog to update rate mask by signal strength

2019-05-21 Thread Daniel Drake
On Fri, May 10, 2019 at 2:37 AM Chris Chiu  wrote:
> I've verified that multiple virtual interface can not work simultaneously in
> STA mode. I assigned different mac address for different vifs, I can only
> bring only one interface up. If I want to bring the second vif up, it always
> complains "SIOCSIFFLAGS: Device or resource busy".

Interesting. Can you go deeper into that so that we can be more
confident of this limitation?

ieee80211_open() is the starting point.
ieee80211_check_concurrent_iface() is one candidate to generate -EBUSY
but from inspection, I don't think that's happening in this case,
perhaps you can keep following through in order to figure out which
part of the code is not allowing the 2nd STA interface to come up.

Daniel


Re: [RFC PATCH 2/2] rtl8xxxu: Add watchdog to update rate mask by signal strength

2019-05-09 Thread Daniel Drake
On Thu, May 9, 2019 at 5:17 PM Chris Chiu  wrote:
> I need the vif because there's seems no easy way to get RSSI. Please
> suggest if there's any better idea for this. I believe multiple vifs is for AP
> mode (with more than 1 virtual AP/SSIDs) and the Station+AP coexist
> mode. But the rtl8xxxu driver basically supports only Station mode.

Yes, the driver only lets you create station interfaces, but it lets
you create several of them.
I'm not sure if that is intentional (and meaningful), or if it's a bug.
Maybe you can experiment with multiple station interfaces and see if
it works in a meaningful way?

Daniel


Re: [PATCH] i915: disable framebuffer compression on GeminiLake

2019-05-09 Thread Daniel Drake
Hi,


On Thu, Apr 25, 2019 at 4:27 AM Paulo Zanoni  wrote:
>
> Em qua, 2019-04-24 às 20:58 +0100, Chris Wilson escreveu:
> > Quoting Jian-Hong Pan (2019-04-23 10:28:10)
> > > From: Daniel Drake 
> > >
> > > On many (all?) the Gemini Lake systems we work with, there is frequent
> > > momentary graphical corruption at the top of the screen, and it seems
> > > that disabling framebuffer compression can avoid this.
> > >
> > > The ticket was reported 6 months ago and has already affected a
> > > multitude of users, without any real progress being made. So, lets
> > > disable framebuffer compression on GeminiLake until a solution is found.
> > >
> > > Buglink: https://bugs.freedesktop.org/show_bug.cgi?id=108085
> > > Signed-off-by: Daniel Drake 
> > > Signed-off-by: Jian-Hong Pan 
> >
> > Fixes: fd7d6c5c8f3e ("drm/i915: enable FBC on gen9+ too") ?
> > Cc: Paulo Zanoni 
> > Cc: Daniel Vetter 
> > Cc: Jani Nikula 
> > Cc:  # v4.11+
> >
> > glk landed 1 month before, so that seems the earliest broken point.
> >
>
> The bug is well reported, the bug author is helpful and it even has a
> description of "steps to reproduce" that looks very easy (although I
> didn't try it). Everything suggests this is a bug the display team
> could actually solve with not-so-many hours of debugging.
>
> In the meantime, unbreak the systems:
> Reviewed-by: Paulo Zanoni 

Quick ping here. Any further comments on this patch? Can it be applied?

Thanks
Daniel


[tip:x86/apic] x86/tsc: Use CPUID.0x16 to calculate missing crystal frequency

2019-05-09 Thread tip-bot for Daniel Drake
Commit-ID:  604dc9170f2435d27da5039a3efd757dceadc684
Gitweb: https://git.kernel.org/tip/604dc9170f2435d27da5039a3efd757dceadc684
Author: Daniel Drake 
AuthorDate: Thu, 9 May 2019 13:54:15 +0800
Committer:  Ingo Molnar 
CommitDate: Thu, 9 May 2019 11:06:48 +0200

x86/tsc: Use CPUID.0x16 to calculate missing crystal frequency

native_calibrate_tsc() had a data table mapping Intel CPU families
to crystal clock speeds, but hardcoded tables are not ideal, and this
approach was already problematic at least in the Skylake X case, as
seen in commit:

  b51120309348 ("x86/tsc: Fix erroneous TSC rate on Skylake Xeon")

By examining CPUID data from http://instlatx64.atw.hu/ and units
in the lab, we have found that 3 different scenarios need to be dealt
with, and we can eliminate most of the hardcoded data using an approach a
little more advanced than before:

 1. ApolloLake, GeminiLake, CannonLake (and presumably all new chipsets
from this point) report the crystal frequency directly via CPUID.0x15.
That's definitive data that we can rely upon.

 2. Skylake, Kabylake and all variants of those two chipsets report a
crystal frequency of zero, however we can calculate the crystal clock
speed by considering data from CPUID.0x16.

This method correctly distinguishes between the two crystal clock
frequencies present on different Skylake X variants that caused
headaches before.

As the calculations do not quite match the previously-hardcoded values
in some cases (e.g. 23913043Hz instead of 24MHz), TSC refinement is
enabled on all platforms where we had to calculate the crystal
frequency in this way.

 3. Denverton (GOLDMONT_X) reports a crystal frequency of zero and does
not support CPUID.0x16, so we leave this entry hardcoded.

Suggested-by: Thomas Gleixner 
Signed-off-by: Daniel Drake 
Reviewed-by: Thomas Gleixner 
Cc: Andy Lutomirski 
Cc: Borislav Petkov 
Cc: H. Peter Anvin 
Cc: Linus Torvalds 
Cc: Peter Zijlstra 
Cc: len.br...@intel.com
Cc: li...@endlessm.com
Cc: rafael.j.wyso...@intel.com
Link: http://lkml.kernel.org/r/20190509055417.13152-1-dr...@endlessm.com
Link: https://lkml.kernel.org/r/20190419083533.32388-1-dr...@endlessm.com
Signed-off-by: Ingo Molnar 
---
 arch/x86/kernel/tsc.c | 47 +++
 1 file changed, 27 insertions(+), 20 deletions(-)

diff --git a/arch/x86/kernel/tsc.c b/arch/x86/kernel/tsc.c
index 15b5e98a86f9..6e6d933fb99c 100644
--- a/arch/x86/kernel/tsc.c
+++ b/arch/x86/kernel/tsc.c
@@ -631,31 +631,38 @@ unsigned long native_calibrate_tsc(void)
 
crystal_khz = ecx_hz / 1000;
 
-   if (crystal_khz == 0) {
-   switch (boot_cpu_data.x86_model) {
-   case INTEL_FAM6_SKYLAKE_MOBILE:
-   case INTEL_FAM6_SKYLAKE_DESKTOP:
-   case INTEL_FAM6_KABYLAKE_MOBILE:
-   case INTEL_FAM6_KABYLAKE_DESKTOP:
-   crystal_khz = 24000;/* 24.0 MHz */
-   break;
-   case INTEL_FAM6_ATOM_GOLDMONT_X:
-   crystal_khz = 25000;/* 25.0 MHz */
-   break;
-   case INTEL_FAM6_ATOM_GOLDMONT:
-   crystal_khz = 19200;/* 19.2 MHz */
-   break;
-   }
-   }
+   /*
+* Denverton SoCs don't report crystal clock, and also don't support
+* CPUID.0x16 for the calculation below, so hardcode the 25MHz crystal
+* clock.
+*/
+   if (crystal_khz == 0 &&
+   boot_cpu_data.x86_model == INTEL_FAM6_ATOM_GOLDMONT_X)
+   crystal_khz = 25000;
 
-   if (crystal_khz == 0)
-   return 0;
/*
-* TSC frequency determined by CPUID is a "hardware reported"
+* TSC frequency reported directly by CPUID is a "hardware reported"
 * frequency and is the most accurate one so far we have. This
 * is considered a known frequency.
 */
-   setup_force_cpu_cap(X86_FEATURE_TSC_KNOWN_FREQ);
+   if (crystal_khz != 0)
+   setup_force_cpu_cap(X86_FEATURE_TSC_KNOWN_FREQ);
+
+   /*
+* Some Intel SoCs like Skylake and Kabylake don't report the crystal
+* clock, but we can easily calculate it to a high degree of accuracy
+* by considering the crystal ratio and the CPU speed.
+*/
+   if (crystal_khz == 0 && boot_cpu_data.cpuid_level >= 0x16) {
+   unsigned int eax_base_mhz, ebx, ecx, edx;
+
+   cpuid(0x16, _base_mhz, , , );
+   crystal_khz = eax_base_mhz * 1000 *
+   eax_denominator / ebx_numerator;
+   }
+
+   if (crystal_khz == 0)
+   return 0;
 
/*
 * For Atom SoCs TSC is the only reliable clocksource.


[tip:x86/apic] x86/apic: Rename 'lapic_timer_frequency' to 'lapic_timer_period'

2019-05-09 Thread tip-bot for Daniel Drake
Commit-ID:  52ae346bd26c7a8b17ea82e9a09671e98c5402b7
Gitweb: https://git.kernel.org/tip/52ae346bd26c7a8b17ea82e9a09671e98c5402b7
Author: Daniel Drake 
AuthorDate: Thu, 9 May 2019 13:54:16 +0800
Committer:  Ingo Molnar 
CommitDate: Thu, 9 May 2019 11:06:49 +0200

x86/apic: Rename 'lapic_timer_frequency' to 'lapic_timer_period'

This variable is a period unit (number of clock cycles per jiffy),
not a frequency (which is number of cycles per second).

Give it a more appropriate name.

Suggested-by: Ingo Molnar 
Signed-off-by: Daniel Drake 
Reviewed-by: Thomas Gleixner 
Cc: Andy Lutomirski 
Cc: Borislav Petkov 
Cc: H. Peter Anvin 
Cc: Linus Torvalds 
Cc: Peter Zijlstra 
Cc: len.br...@intel.com
Cc: li...@endlessm.com
Cc: rafael.j.wyso...@intel.com
Link: http://lkml.kernel.org/r/20190509055417.13152-2-dr...@endlessm.com
Signed-off-by: Ingo Molnar 
---
 arch/x86/include/asm/apic.h|  2 +-
 arch/x86/kernel/apic/apic.c| 20 ++--
 arch/x86/kernel/cpu/mshyperv.c |  4 ++--
 arch/x86/kernel/cpu/vmware.c   |  2 +-
 arch/x86/kernel/jailhouse.c|  2 +-
 arch/x86/kernel/tsc_msr.c  |  4 ++--
 6 files changed, 17 insertions(+), 17 deletions(-)

diff --git a/arch/x86/include/asm/apic.h b/arch/x86/include/asm/apic.h
index 130e81e10fc7..fc505a84aa93 100644
--- a/arch/x86/include/asm/apic.h
+++ b/arch/x86/include/asm/apic.h
@@ -52,7 +52,7 @@ extern unsigned int apic_verbosity;
 extern int local_apic_timer_c2_ok;
 
 extern int disable_apic;
-extern unsigned int lapic_timer_frequency;
+extern unsigned int lapic_timer_period;
 
 extern enum apic_intr_mode_id apic_intr_mode;
 enum apic_intr_mode_id {
diff --git a/arch/x86/kernel/apic/apic.c b/arch/x86/kernel/apic/apic.c
index ab6af775f06c..93de7862eef8 100644
--- a/arch/x86/kernel/apic/apic.c
+++ b/arch/x86/kernel/apic/apic.c
@@ -194,7 +194,7 @@ static struct resource lapic_resource = {
.flags = IORESOURCE_MEM | IORESOURCE_BUSY,
 };
 
-unsigned int lapic_timer_frequency = 0;
+unsigned int lapic_timer_period = 0;
 
 static void apic_pm_activate(void);
 
@@ -500,7 +500,7 @@ lapic_timer_set_periodic_oneshot(struct clock_event_device 
*evt, bool oneshot)
if (evt->features & CLOCK_EVT_FEAT_DUMMY)
return 0;
 
-   __setup_APIC_LVTT(lapic_timer_frequency, oneshot, 1);
+   __setup_APIC_LVTT(lapic_timer_period, oneshot, 1);
return 0;
 }
 
@@ -804,11 +804,11 @@ calibrate_by_pmtimer(long deltapm, long *delta, long 
*deltatsc)
 
 static int __init lapic_init_clockevent(void)
 {
-   if (!lapic_timer_frequency)
+   if (!lapic_timer_period)
return -1;
 
/* Calculate the scaled math multiplication factor */
-   lapic_clockevent.mult = div_sc(lapic_timer_frequency/APIC_DIVISOR,
+   lapic_clockevent.mult = div_sc(lapic_timer_period/APIC_DIVISOR,
TICK_NSEC, lapic_clockevent.shift);
lapic_clockevent.max_delta_ns =
clockevent_delta2ns(0x7FFFFFFF, &lapic_clockevent);
@@ -838,7 +838,7 @@ static int __init calibrate_APIC_clock(void)
 */
if (!lapic_init_clockevent()) {
apic_printk(APIC_VERBOSE, "lapic timer already calibrated %d\n",
-   lapic_timer_frequency);
+   lapic_timer_period);
/*
 * Direct calibration methods must have an always running
 * local APIC timer, no need for broadcast timer.
@@ -883,13 +883,13 @@ static int __init calibrate_APIC_clock(void)
pm_referenced = !calibrate_by_pmtimer(lapic_cal_pm2 - lapic_cal_pm1,
  &delta, &deltatsc);
 
-   lapic_timer_frequency = (delta * APIC_DIVISOR) / LAPIC_CAL_LOOPS;
+   lapic_timer_period = (delta * APIC_DIVISOR) / LAPIC_CAL_LOOPS;
lapic_init_clockevent();
 
apic_printk(APIC_VERBOSE, ". delta %ld\n", delta);
apic_printk(APIC_VERBOSE, ". mult: %u\n", lapic_clockevent.mult);
apic_printk(APIC_VERBOSE, ". calibration result: %u\n",
-   lapic_timer_frequency);
+   lapic_timer_period);
 
if (boot_cpu_has(X86_FEATURE_TSC)) {
apic_printk(APIC_VERBOSE, ". CPU clock speed is "
@@ -900,13 +900,13 @@ static int __init calibrate_APIC_clock(void)
 
apic_printk(APIC_VERBOSE, ". host bus clock speed is "
"%u.%04u MHz.\n",
-   lapic_timer_frequency / (1000000 / HZ),
-   lapic_timer_frequency % (1000000 / HZ));
+   lapic_timer_period / (1000000 / HZ),
+   lapic_timer_period % (1000000 / HZ));
 
/*
 * Do a sanity check on the APIC calibration result
 */
-   if (lapic_timer_frequency < (1000000 / HZ)) {
+   if (lapic_timer_period < (100 / HZ)) {
local_irq_enable();
pr_warning("

[tip:x86/apic] x86/tsc: Set LAPIC timer period to crystal clock frequency

2019-05-09 Thread tip-bot for Daniel Drake
Commit-ID:  2420a0b1798d7a78d1f9b395f09f3c80d92cc588
Gitweb: https://git.kernel.org/tip/2420a0b1798d7a78d1f9b395f09f3c80d92cc588
Author: Daniel Drake 
AuthorDate: Thu, 9 May 2019 13:54:17 +0800
Committer:  Ingo Molnar 
CommitDate: Thu, 9 May 2019 11:06:49 +0200

x86/tsc: Set LAPIC timer period to crystal clock frequency

The APIC timer calibration (calibrate_APIC_timer()) can be skipped
in cases where we know the APIC timer frequency. On Intel SoCs,
we believe that the APIC is fed by the crystal clock; this would make
sense, and the crystal clock frequency has been verified against the
APIC timer calibration result on ApolloLake, GeminiLake, Kabylake,
CoffeeLake, WhiskeyLake and AmberLake.

Set lapic_timer_period based on the crystal clock frequency
accordingly.

APIC timer calibration would normally be skipped on modern CPUs
by nature of the TSC deadline timer being used instead,
however this change is still potentially useful, e.g. if the
TSC deadline timer has been disabled with a kernel parameter.
calibrate_APIC_timer() uses the legacy timer, but we are seeing
new platforms that omit such legacy functionality, so avoiding
such codepaths is becoming more important.

Suggested-by: Thomas Gleixner 
Signed-off-by: Daniel Drake 
Reviewed-by: Thomas Gleixner 
Cc: Andy Lutomirski 
Cc: Borislav Petkov 
Cc: H. Peter Anvin 
Cc: Linus Torvalds 
Cc: Peter Zijlstra 
Cc: len.br...@intel.com
Cc: li...@endlessm.com
Cc: rafael.j.wyso...@intel.com
Link: http://lkml.kernel.org/r/20190509055417.13152-3-dr...@endlessm.com
Link: https://lkml.kernel.org/r/20190419083533.32388-1-dr...@endlessm.com
Link: 
https://lkml.kernel.org/r/alpine.deb.2.21.1904031206440.1...@nanos.tec.linutronix.de
Signed-off-by: Ingo Molnar 
---
 arch/x86/kernel/tsc.c | 10 ++
 1 file changed, 10 insertions(+)

diff --git a/arch/x86/kernel/tsc.c b/arch/x86/kernel/tsc.c
index 6e6d933fb99c..8f47c4862c56 100644
--- a/arch/x86/kernel/tsc.c
+++ b/arch/x86/kernel/tsc.c
@@ -671,6 +671,16 @@ unsigned long native_calibrate_tsc(void)
if (boot_cpu_data.x86_model == INTEL_FAM6_ATOM_GOLDMONT)
setup_force_cpu_cap(X86_FEATURE_TSC_RELIABLE);
 
+#ifdef CONFIG_X86_LOCAL_APIC
+   /*
+* The local APIC appears to be fed by the core crystal clock
+* (which sounds entirely sensible). We can set the global
+* lapic_timer_period here to avoid having to calibrate the APIC
+* timer later.
+*/
+   lapic_timer_period = crystal_khz * 1000 / HZ;
+#endif
+
return crystal_khz * ebx_numerator / eax_denominator;
 }
 


Re: [RFC PATCH 1/2] rtl8xxxu: Add rate adaptive related data

2019-05-09 Thread Daniel Drake
Hi Chris,

Thanks for this! Some suggestions below, although let me know if any
don't make sense.

On Fri, May 3, 2019 at 3:22 PM Chris Chiu  wrote:
>
> Add wireless mode, signal strength level, and rate table index
> to tell the firmware that we need to adjust the tx rate bitmap
> accordingly.
> ---
>  .../net/wireless/realtek/rtl8xxxu/rtl8xxxu.h  | 45 +++
>  .../wireless/realtek/rtl8xxxu/rtl8xxxu_core.c | 45 ++-
>  2 files changed, 89 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/net/wireless/realtek/rtl8xxxu/rtl8xxxu.h 
> b/drivers/net/wireless/realtek/rtl8xxxu/rtl8xxxu.h
> index 8828baf26e7b..771f58aa7cae 100644
> --- a/drivers/net/wireless/realtek/rtl8xxxu/rtl8xxxu.h
> +++ b/drivers/net/wireless/realtek/rtl8xxxu/rtl8xxxu.h
> @@ -1195,6 +1195,50 @@ struct rtl8723bu_c2h {
>
>  struct rtl8xxxu_fileops;
>
> +/*mlme related.*/
> +enum wireless_mode {
> +   WIRELESS_MODE_UNKNOWN = 0,
> +   //Sub-Element

Run these patches through checkpatch.pl, it'll have some suggestions
to bring the coding style in line, for example not using // style
comments.

> +   WIRELESS_MODE_B = BIT(0), // tx: cck only , rx: cck only, hw: cck
> +   WIRELESS_MODE_G = BIT(1), // tx: ofdm only, rx: ofdm & cck, hw: cck & 
> ofdm
> +   WIRELESS_MODE_A = BIT(2), // tx: ofdm only, rx: ofdm only, hw: ofdm 
> only
> +   WIRELESS_MODE_N_24G = BIT(3), // tx: MCS only, rx: MCS & cck, hw: MCS 
> & cck
> +   WIRELESS_MODE_N_5G = BIT(4), // tx: MCS only, rx: MCS & ofdm, hw: 
> ofdm only
> +   WIRELESS_AUTO = BIT(5),
> +   WIRELESS_MODE_AC = BIT(6),
> +   WIRELESS_MODE_MAX = 
> (WIRELESS_MODE_B|WIRELESS_MODE_G|WIRELESS_MODE_A|WIRELESS_MODE_N_24G|WIRELESS_MODE_N_5G|WIRELESS_MODE_AC),
> +};
> +
> +/* from rtlwifi/wifi.h */
> +enum ratr_table_mode_new {
> +RATEID_IDX_BGN_40M_2SS = 0,
> +RATEID_IDX_BGN_40M_1SS = 1,
> +RATEID_IDX_BGN_20M_2SS_BN = 2,
> +RATEID_IDX_BGN_20M_1SS_BN = 3,
> +RATEID_IDX_GN_N2SS = 4,
> +RATEID_IDX_GN_N1SS = 5,
> +RATEID_IDX_BG = 6,
> +RATEID_IDX_G = 7,
> +RATEID_IDX_B = 8,
> +RATEID_IDX_VHT_2SS = 9,
> +RATEID_IDX_VHT_1SS = 10,
> +RATEID_IDX_MIX1 = 11,
> +RATEID_IDX_MIX2 = 12,
> +RATEID_IDX_VHT_3SS = 13,
> +RATEID_IDX_BGN_3SS = 14,
> +};
> +
> +#define RTL8XXXU_RATR_STA_INIT 0
> +#define RTL8XXXU_RATR_STA_HIGH 1
> +#define RTL8XXXU_RATR_STA_MID  2
> +#define RTL8XXXU_RATR_STA_LOW  3
> +
> +struct rtl8xxxu_rate_adaptive {
> +   u16 wireless_mode;
> +   u8 ratr_index;
> +   u8 rssi_level;  /* INIT, HIGH, MIDDLE, LOW */
> +} __packed;

It would be better/cleaner to avoid storing data in per-device
structures if at all possible.

For wireless_mode, I think you should just call
rtl8xxxu_wireless_mode() every time from rtl8723b_refresh_rate_mask().
The work done there is simple (i.e. it's not expensive to call) and
then we avoid having to store data (which might become stale etc).

For ratr_index, I believe you can just make it a parameter passed to
rtl8xxxu_gen2_update_rate_mask which is the only consumer of this
variable. The two callsites (rtl8xxxu_bss_info_changed and
rtl8723b_refresh_rate_mask) already know which value they want to be
used.

rssi_level is the one that you probably do want to store, since I see
the logic in patch 2 - if the rssi level hasn't changed then you don't
need to set the rate mask again, and that's a good idea since it
reduces bus traffic. You could move this into rtl8xxxu_priv rather
than having its own separate structure.
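
Roughly like this (the helper and the surrounding locals are made up,
only the caching idea matters):

    /* in the watchdog / refresh path */
    u8 level = rtl8xxxu_signal_to_level(signal);    /* hypothetical helper */

    if (level == priv->rssi_level)
            return;         /* unchanged: skip the H2C, save bus traffic */

    priv->rssi_level = level;
    rtl8xxxu_gen2_update_rate_mask(priv, ramask, ratr_idx, sgi);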

After making those changes it might even make sense to collapse this
all into a single patch; you can judge!


Re: [PATCH v3 04/11] platform/x86: asus-wmi: Improve DSTS WMI method ID detection

2019-05-09 Thread Daniel Drake
> > -#define ASUS_WMI_METHODID_DSTS 0x53544344 /* Device STatuS */
> > -#define ASUS_WMI_METHODID_DSTS2    0x53545344 /* Device STatuS #2*/
> > #2*/
>
> > +#define ASUS_WMI_METHODID_DSTS 0x53544344 /* Device STatuS (DCTS) 
> > */
>
> It's not clear from the description what 'C' stands for.
> Is there any specification which describes the difference and actual
> abbreviations?

The (recent) spec I have here doesn't mention 0x53544344 (DCTS).
However the spec changelog does mention that EEEPC stuff was removed
from the spec a while ago.
The spec does mention 0x53545344 (DSTS), labelled as "Get device status".

For clarity I think the constants could be renamed as
ASUS_WMI_METHODID_DCTS and ASUS_WMI_METHODID_DSTS.
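
i.e. something like this, with the values copied from the hunk above:

    #define ASUS_WMI_METHODID_DCTS  0x53544344 /* Device status (DCTS) */
    #define ASUS_WMI_METHODID_DSTS  0x53545344 /* Device status (DSTS) */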

Daniel


[PATCH v2 1/3] x86/tsc: use CPUID.0x16 to calculate missing crystal frequency

2019-05-08 Thread Daniel Drake
native_calibrate_tsc() had a data table mapping Intel CPU families
to crystal clock speeds, but hardcoded tables are not ideal, and this
approach was already problematic at least in the Skylake X case, as
seen in commit b51120309348 ("x86/tsc: Fix erroneous TSC rate on Skylake
Xeon").

By examining CPUID data from http://instlatx64.atw.hu/ and units
in the lab, we have found that 3 different scenarios need to be dealt
with, and we can eliminate most of the hardcoded data using an approach a
little more advanced than before:

 1. ApolloLake, GeminiLake, CannonLake (and presumably all new chipsets
from this point) report the crystal frequency directly via CPUID.0x15.
That's definitive data that we can rely upon.

 2. Skylake, Kabylake and all variants of those two chipsets report a
crystal frequency of zero, however we can calculate the crystal clock
speed by considering data from CPUID.0x16.

This method correctly distinguishes between the two crystal clock
frequencies present on different Skylake X variants that caused
headaches before.

As the calculations do not quite match the previously-hardcoded values
in some cases (e.g. 23913043Hz instead of 24MHz), TSC refinement is
enabled on all platforms where we had to calculate the crystal
frequency in this way.

 3. Denverton (GOLDMONT_X) reports a crystal frequency of zero and does
not support CPUID.0x16, so we leave this entry hardcoded.

Link: https://lkml.kernel.org/r/20190419083533.32388-1-dr...@endlessm.com
Suggested-by: Thomas Gleixner 
Signed-off-by: Daniel Drake 
---

Notes:
v2:
 - Clarify the situation around Skylake X better.
 - Enable TSC refinement when we had to calculate the crystal clock,
   in case slight differences in the calculation result cause problems
   similar to those reported earlier on Skylake X.

 arch/x86/kernel/tsc.c | 47 +--
 1 file changed, 27 insertions(+), 20 deletions(-)

diff --git a/arch/x86/kernel/tsc.c b/arch/x86/kernel/tsc.c
index 15b5e98a86f9..6e6d933fb99c 100644
--- a/arch/x86/kernel/tsc.c
+++ b/arch/x86/kernel/tsc.c
@@ -631,31 +631,38 @@ unsigned long native_calibrate_tsc(void)
 
crystal_khz = ecx_hz / 1000;
 
-   if (crystal_khz == 0) {
-   switch (boot_cpu_data.x86_model) {
-   case INTEL_FAM6_SKYLAKE_MOBILE:
-   case INTEL_FAM6_SKYLAKE_DESKTOP:
-   case INTEL_FAM6_KABYLAKE_MOBILE:
-   case INTEL_FAM6_KABYLAKE_DESKTOP:
-   crystal_khz = 24000;/* 24.0 MHz */
-   break;
-   case INTEL_FAM6_ATOM_GOLDMONT_X:
-   crystal_khz = 25000;/* 25.0 MHz */
-   break;
-   case INTEL_FAM6_ATOM_GOLDMONT:
-   crystal_khz = 19200;/* 19.2 MHz */
-   break;
-   }
-   }
+   /*
+* Denverton SoCs don't report crystal clock, and also don't support
+* CPUID.0x16 for the calculation below, so hardcode the 25MHz crystal
+* clock.
+*/
+   if (crystal_khz == 0 &&
+   boot_cpu_data.x86_model == INTEL_FAM6_ATOM_GOLDMONT_X)
+   crystal_khz = 25000;
 
-   if (crystal_khz == 0)
-   return 0;
/*
-* TSC frequency determined by CPUID is a "hardware reported"
+* TSC frequency reported directly by CPUID is a "hardware reported"
 * frequency and is the most accurate one so far we have. This
 * is considered a known frequency.
 */
-   setup_force_cpu_cap(X86_FEATURE_TSC_KNOWN_FREQ);
+   if (crystal_khz != 0)
+   setup_force_cpu_cap(X86_FEATURE_TSC_KNOWN_FREQ);
+
+   /*
+* Some Intel SoCs like Skylake and Kabylake don't report the crystal
+* clock, but we can easily calculate it to a high degree of accuracy
+* by considering the crystal ratio and the CPU speed.
+*/
+   if (crystal_khz == 0 && boot_cpu_data.cpuid_level >= 0x16) {
+   unsigned int eax_base_mhz, ebx, ecx, edx;
+
+   cpuid(0x16, &eax_base_mhz, &ebx, &ecx, &edx);
+   crystal_khz = eax_base_mhz * 1000 *
+   eax_denominator / ebx_numerator;
+   }
+
+   if (crystal_khz == 0)
+   return 0;
 
/*
 * For Atom SoCs TSC is the only reliable clocksource.
-- 
2.20.1



[PATCH v2 2/3] x86/apic: rename lapic_timer_frequency to lapic_timer_period

2019-05-08 Thread Daniel Drake
This variable is a period unit (number of clock cycles per jiffy),
not a frequency (which is number of cycles per second).

Give it a more appropriate name.

Suggested-by: Ingo Molnar 
Signed-off-by: Daniel Drake 
---
 arch/x86/include/asm/apic.h|  2 +-
 arch/x86/kernel/apic/apic.c| 20 ++--
 arch/x86/kernel/cpu/mshyperv.c |  4 ++--
 arch/x86/kernel/cpu/vmware.c   |  2 +-
 arch/x86/kernel/jailhouse.c|  2 +-
 arch/x86/kernel/tsc_msr.c  |  4 ++--
 6 files changed, 17 insertions(+), 17 deletions(-)

diff --git a/arch/x86/include/asm/apic.h b/arch/x86/include/asm/apic.h
index 130e81e10fc7..fc505a84aa93 100644
--- a/arch/x86/include/asm/apic.h
+++ b/arch/x86/include/asm/apic.h
@@ -52,7 +52,7 @@ extern unsigned int apic_verbosity;
 extern int local_apic_timer_c2_ok;
 
 extern int disable_apic;
-extern unsigned int lapic_timer_frequency;
+extern unsigned int lapic_timer_period;
 
 extern enum apic_intr_mode_id apic_intr_mode;
 enum apic_intr_mode_id {
diff --git a/arch/x86/kernel/apic/apic.c b/arch/x86/kernel/apic/apic.c
index ab6af775f06c..93de7862eef8 100644
--- a/arch/x86/kernel/apic/apic.c
+++ b/arch/x86/kernel/apic/apic.c
@@ -194,7 +194,7 @@ static struct resource lapic_resource = {
.flags = IORESOURCE_MEM | IORESOURCE_BUSY,
 };
 
-unsigned int lapic_timer_frequency = 0;
+unsigned int lapic_timer_period = 0;
 
 static void apic_pm_activate(void);
 
@@ -500,7 +500,7 @@ lapic_timer_set_periodic_oneshot(struct clock_event_device 
*evt, bool oneshot)
if (evt->features & CLOCK_EVT_FEAT_DUMMY)
return 0;
 
-   __setup_APIC_LVTT(lapic_timer_frequency, oneshot, 1);
+   __setup_APIC_LVTT(lapic_timer_period, oneshot, 1);
return 0;
 }
 
@@ -804,11 +804,11 @@ calibrate_by_pmtimer(long deltapm, long *delta, long 
*deltatsc)
 
 static int __init lapic_init_clockevent(void)
 {
-   if (!lapic_timer_frequency)
+   if (!lapic_timer_period)
return -1;
 
/* Calculate the scaled math multiplication factor */
-   lapic_clockevent.mult = div_sc(lapic_timer_frequency/APIC_DIVISOR,
+   lapic_clockevent.mult = div_sc(lapic_timer_period/APIC_DIVISOR,
TICK_NSEC, lapic_clockevent.shift);
lapic_clockevent.max_delta_ns =
clockevent_delta2ns(0x7FFFFFFF, &lapic_clockevent);
@@ -838,7 +838,7 @@ static int __init calibrate_APIC_clock(void)
 */
if (!lapic_init_clockevent()) {
apic_printk(APIC_VERBOSE, "lapic timer already calibrated %d\n",
-   lapic_timer_frequency);
+   lapic_timer_period);
/*
 * Direct calibration methods must have an always running
 * local APIC timer, no need for broadcast timer.
@@ -883,13 +883,13 @@ static int __init calibrate_APIC_clock(void)
pm_referenced = !calibrate_by_pmtimer(lapic_cal_pm2 - lapic_cal_pm1,
  &delta, &deltatsc);
 
-   lapic_timer_frequency = (delta * APIC_DIVISOR) / LAPIC_CAL_LOOPS;
+   lapic_timer_period = (delta * APIC_DIVISOR) / LAPIC_CAL_LOOPS;
lapic_init_clockevent();
 
apic_printk(APIC_VERBOSE, ". delta %ld\n", delta);
apic_printk(APIC_VERBOSE, ". mult: %u\n", lapic_clockevent.mult);
apic_printk(APIC_VERBOSE, ". calibration result: %u\n",
-   lapic_timer_frequency);
+   lapic_timer_period);
 
if (boot_cpu_has(X86_FEATURE_TSC)) {
apic_printk(APIC_VERBOSE, ". CPU clock speed is "
@@ -900,13 +900,13 @@ static int __init calibrate_APIC_clock(void)
 
apic_printk(APIC_VERBOSE, ". host bus clock speed is "
"%u.%04u MHz.\n",
-   lapic_timer_frequency / (1000000 / HZ),
-   lapic_timer_frequency % (1000000 / HZ));
+   lapic_timer_period / (1000000 / HZ),
+   lapic_timer_period % (1000000 / HZ));
 
/*
 * Do a sanity check on the APIC calibration result
 */
-   if (lapic_timer_frequency < (1000000 / HZ)) {
+   if (lapic_timer_period < (1000000 / HZ)) {
local_irq_enable();
pr_warning("APIC frequency too slow, disabling apic timer\n");
return -1;
diff --git a/arch/x86/kernel/cpu/mshyperv.c b/arch/x86/kernel/cpu/mshyperv.c
index 3fa238a137d2..faae6115ddef 100644
--- a/arch/x86/kernel/cpu/mshyperv.c
+++ b/arch/x86/kernel/cpu/mshyperv.c
@@ -270,9 +270,9 @@ static void __init ms_hyperv_init_platform(void)
 
rdmsrl(HV_X64_MSR_APIC_FREQUENCY, hv_lapic_frequency);
hv_lapic_frequency = div_u64(hv_lapic_frequency, HZ);
-   lapic_timer_frequency = hv_lapic_frequency;
+   lapic_timer_period = hv_lapic_frequency;
p

[PATCH v2 3/3] x86/tsc: set LAPIC timer period to crystal clock frequency

2019-05-08 Thread Daniel Drake
The APIC timer calibration (calibrate_APIC_timer()) can be skipped
in cases where we know the APIC timer frequency. On Intel SoCs,
we believe that the APIC is fed by the crystal clock; this would make
sense, and the crystal clock frequency has been verified against the
APIC timer calibration result on ApolloLake, GeminiLake, Kabylake,
CoffeeLake, WhiskeyLake and AmberLake.

Set lapic_timer_period based on the crystal clock frequency
accordingly.

APIC timer calibration would normally be skipped on modern CPUs
by nature of the TSC deadline timer being used instead,
however this change is still potentially useful, e.g. if the
TSC deadline timer has been disabled with a kernel parameter.
calibrate_APIC_timer() uses the legacy timer, but we are seeing
new platforms that omit such legacy functionality, so avoiding
such codepaths is becoming more important.

Link: https://lkml.kernel.org/r/20190419083533.32388-1-dr...@endlessm.com
Link: 
https://lkml.kernel.org/r/alpine.deb.2.21.1904031206440.1...@nanos.tec.linutronix.de
Suggested-by: Thomas Gleixner 
Signed-off-by: Daniel Drake 
---

Notes:
v2:
 - remove unnecessary braces
 - use newly renamed variable lapic_timer_period

 arch/x86/kernel/tsc.c | 10 ++
 1 file changed, 10 insertions(+)

diff --git a/arch/x86/kernel/tsc.c b/arch/x86/kernel/tsc.c
index 6e6d933fb99c..8f47c4862c56 100644
--- a/arch/x86/kernel/tsc.c
+++ b/arch/x86/kernel/tsc.c
@@ -671,6 +671,16 @@ unsigned long native_calibrate_tsc(void)
if (boot_cpu_data.x86_model == INTEL_FAM6_ATOM_GOLDMONT)
setup_force_cpu_cap(X86_FEATURE_TSC_RELIABLE);
 
+#ifdef CONFIG_X86_LOCAL_APIC
+   /*
+* The local APIC appears to be fed by the core crystal clock
+* (which sounds entirely sensible). We can set the global
+* lapic_timer_period here to avoid having to calibrate the APIC
+* timer later.
+*/
+   lapic_timer_period = crystal_khz * 1000 / HZ;
+#endif
+
return crystal_khz * ebx_numerator / eax_denominator;
 }
 
-- 
2.20.1



Re: sg_dma_page_iter offset & length considerations

2019-04-25 Thread Daniel Drake
On Wed, Apr 24, 2019 at 7:22 PM Jason Gunthorpe  wrote:
> A driver that simply wants a SGL with a capped max size (ie 4k?)
> should use the dma_set_max_seg_size() API and just never get a SGE
> with a larger length.

That sounds like exactly what we want here. However I tried that
function and it seems to have no effect. Larger SGEs still appear just
as readily as before.

I tried to find the code that honours the value set here, but didn't
find anything. Sure there are some non-x86 archs and some iommu
drivers that appear to do something with it, but in this case
(standard Intel x86_64) it seems like nothing observes/honours this
value.

That did lead me back to the similarly-named mmc_host.max_seg_size
though, which is passed through to blk_queue_max_segment_size().
That is documented as "Enables a low level driver to set an upper
limit on the size of a coalesced segment", but perhaps I don't fully
grasp what it means by "coalesced segment".

The alcor driver currently has:
mmc->max_segs = 64;
mmc->max_seg_size = 240 * 512;

which I had interpreted to mean "we can accept a sglist with maximum
64 entries, and if you sum the size of all entries in that sglist, the
total must not exceed 240 sectors". i.e. with "coalesced" referring to
the overall sum; the description does not suggest to me that we can
influence the size of an individual segment here.

Other mmc drivers may also share a similar interpretation, e.g. see atmel-mci.c:
mmc->max_seg_size = mmc->max_blk_size * mmc->max_segs;

but I just experimented with the driver again, and setting
mmc_host.max_seg_size to 4096 does seem to do what we want here. Now
we get multi-entry sg lists but each sge is max 4096 bytes in size.

I'll do some more verification and then send a patch along these
lines. Thanks for your comments!

Daniel


sg_dma_page_iter offset & length considerations

2019-04-24 Thread Daniel Drake
Hi,

In drivers/mmc/host/alcor.c we're working with an MMC controller which
supports DMA transfers split up into page-sized chunks. A DMA transfer
consists of multiple pages, after each page is transferred there is an
interrupt so that the driver can program the DMA address of the next
page in the sequence. All pages must be complete in length, only the
last one can be a partial transfer.

I thought that the sg_dma_page_iter API looked like a great fit here:
the driver can accept any old sglist, and then use this new API to
collapse it into a list of pages that can be easily iterated over, and
fed to the hardware one at a time.
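
In other words I was hoping the driver could end up doing something
roughly like this (simplified sketch - in alcor the "next page" step is
really driven from the per-page completion interrupt rather than a plain
loop, and data->sg / sg_count are just placeholders for the mapped
sglist):

	struct sg_dma_page_iter iter;
	dma_addr_t addr;

	for_each_sg_dma_page(data->sg, &iter, sg_count, 0) {
		addr = sg_page_iter_dma_address(&iter);
		/* program addr as the next page-sized chunk,
		 * wait for the completion interrupt, repeat */
	}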

But looking closer I think I may have made some bad assumptions, and
I'm left with some fundamental questions about this API.

Specifically I can see userspace generates requests which present a
sglist such as:
 - first entry with offset=1536 length=2560
 - 7 entries with offset=0 length=4096
 - last entry with offset=0 length=1536

I gather that dma_map_sg() will take care off the offsets, i.e. any
physical address I get with sg_page_iter_dma_address() will already
have the offset applied, so I don't have to worry about tracking that.

But what about the length? For every page returned by the iterator, I
can't assume that I am being asked to work with the full page, right?
Such as the first and last page in the above example. I need to go
back to the sglist to check the corresponding length variable, and
having to go back to check the sglist seems to defeat the convenience
of having the sglist collapsed into a list of pages by the iterator.

Any comments?

Thanks
Daniel


Re: [PATCH 1/2] x86/tsc: use CPUID.0x16 to calculate missing crystal frequency

2019-04-24 Thread Daniel Drake
On Tue, Apr 23, 2019 at 5:08 PM Peter Zijlstra  wrote:
> Well, SKX isn't exactly 'missing'; it would be very good if we can
> confirm the new code is still correct under the below mentioned
> conditions.
>
> commit b511203093489eb1829cb4de86e8214752205ac6
> Author: Len Brown 
> Date:   Fri Dec 22 00:27:55 2017 -0500
>
> x86/tsc: Fix erroneous TSC rate on Skylake Xeon
>
> The INTEL_FAM6_SKYLAKE_X hardcoded crystal_khz value of 25MHZ is
> problematic:
>
>  - SKX workstations (with same model # as server variants) use a 24 MHz
>crystal.  This results in a -4.0% time drift rate on SKX workstations.

Checking http://instlatx64.atw.hu/ there are 11 platforms listed as
00050654 (skylake X), doing the calculations manually:
18-Core Intel Xeon Gold 6154, 3000 MHz: 25000000
2x 16-Core Intel Xeon Gold 6130, 2100 MHz: 25000000
2x 18-Core Intel Xeon Gold 6154, 3000 MHz: 25000000
2x 28-Core Intel Xeon Platinum 8180, 2500 MHz: 25000000
2x HexaCore Intel Xeon Bronze 3106: 25000000
2x OctalCore Intel Xeon Silver 4108, 3000 MHz: 25000000
10-Core Xeon W-2155: 23913043
HexaCore Intel Core i7-7800X: 23972602
10-Core Intel Core i9-7900X, 4000 MHz: 23913043
18-Core Intel Core i9-7980XE: 24074074
28-Core Intel Xeon W-3175X: 25000000

So given that the results include 24MHz and 25MHz crystal clocks
calculated on different products then it looks like this variance is
correctly accounted for.

>  - SKX servers subject the crystal to an EMI reduction circuit that 
> reduces its
>actual frequency by (approximately) -0.25%.  This results in -1 second 
> per
>10 minute time drift as compared to network time.
>
> This issue can also trigger a timer and power problem, on configurations
> that use the LAPIC timer (versus the TSC deadline timer).  Clock ticks
> scheduled with the LAPIC timer arrive a few usec before the time they are
> expected (according to the slow TSC).  This causes Linux to poll-idle, 
> when
> it should be in an idle power saving state.  The idle and clock code do 
> not
> graciously recover from this error, sometimes resulting in significant
> polling and measurable power impact.
>
> Stop using native_calibrate_tsc() for INTEL_FAM6_SKYLAKE_X.
> native_calibrate_tsc() will return 0, boot will run with tsc_khz = 
> cpu_khz,
> and the TSC refined calibration will update tsc_khz to correct for the
> difference.

It sounds like I should add a condition:
if (boot_cpu_data.x86_model != INTEL_FAM6_SKYLAKE_X)
setup_force_cpu_cap(X86_FEATURE_TSC_KNOWN_FREQ);
so that refined TSC calibration will run on SKYLAKE_X.

That does raise another question though. Taking other platforms such
as KABYLAKE_MOBILE, the previous code hardcoded a precise 24MHz value,
but my new code calculates 23893805Hz. Is that small discrepancy small
enough to trigger the timer and power problem as described for
SKYLAKE_X above?

If so, maybe the logic should instead be:
if (CPUID.0x15 provided Crystal Hz, i.e. we didn't have to calculate it)
setup_force_cpu_cap(X86_FEATURE_TSC_KNOWN_FREQ);
This would enable TSC refinement on all variants of skylake and kabylake.
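
Concretely, in native_calibrate_tsc() that would look something like this
(untested sketch):

	crystal_khz = ecx_hz / 1000;

	/* Trust the frequency enough to skip the refined calibration
	 * only when CPUID.0x15 reported the crystal directly. */
	if (crystal_khz)
		setup_force_cpu_cap(X86_FEATURE_TSC_KNOWN_FREQ);

	/* ...then fall back to deriving crystal_khz from CPUID.0x16
	 * without setting the flag, so refinement still runs. */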

Daniel


[PATCH 1/2] x86/time: check usability of IRQ0 PIT timer

2019-04-22 Thread Daniel Drake
Modern Intel SoCs now include a special ITSSPRC register that can be
used to "gate" the PIT such that IRQ0 interrupts do not fire.

With Intel Apollo Lake we are starting to see consumer products that
have a BIOS option to apply this (defaulting to gated). Some such
products also lack the HPET ACPI table, so there is no HPET either.

At this point, Linux needs to stop assuming that the IRQ0 timer is
available.

Move APIC code to check IRQ0 to time.c, then check and record the IRQ0
PIT timer usability after it is set up. If it does not produce any
interrupts, unregister the clock event source.

Signed-off-by: Daniel Drake 
Link: 
https://lkml.kernel.org/r/CAD8Lp45fedoPLnK=umuhhtkjy5u2h04sykrx3u+m04u6fpv...@mail.gmail.com
---
 arch/x86/include/asm/time.h|   2 +
 arch/x86/kernel/apic/io_apic.c | 101 ---
 arch/x86/kernel/i8253.c|   6 ++
 arch/x86/kernel/time.c | 106 -
 drivers/clocksource/i8253.c|   6 ++
 include/linux/clockchips.h |   3 +
 include/linux/i8253.h  |   2 +
 kernel/time/tick-internal.h|   2 -
 8 files changed, 134 insertions(+), 94 deletions(-)

diff --git a/arch/x86/include/asm/time.h b/arch/x86/include/asm/time.h
index cef818b16045..e6e00d18b39f 100644
--- a/arch/x86/include/asm/time.h
+++ b/arch/x86/include/asm/time.h
@@ -10,4 +10,6 @@ extern void time_init(void);
 
 extern struct clock_event_device *global_clock_event;
 
+extern bool irq0_timer_works(void);
+
 #endif /* _ASM_X86_TIME_H */
diff --git a/arch/x86/kernel/apic/io_apic.c b/arch/x86/kernel/apic/io_apic.c
index 53aa234a6803..ae46da48c07b 100644
--- a/arch/x86/kernel/apic/io_apic.c
+++ b/arch/x86/kernel/apic/io_apic.c
@@ -57,6 +57,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -1573,92 +1574,6 @@ void __init setup_ioapic_ids_from_mpc(void)
 }
 #endif
 
-int no_timer_check __initdata;
-
-static int __init notimercheck(char *s)
-{
-   no_timer_check = 1;
-   return 1;
-}
-__setup("no_timer_check", notimercheck);
-
-static void __init delay_with_tsc(void)
-{
-   unsigned long long start, now;
-   unsigned long end = jiffies + 4;
-
-   start = rdtsc();
-
-   /*
-* We don't know the TSC frequency yet, but waiting for
-	 * 40000000000/HZ TSC cycles is safe:
-* 4 GHz == 10 jiffies
-* 1 GHz == 40 jiffies
-*/
-   do {
-   rep_nop();
-   now = rdtsc();
-	} while ((now - start) < 40000000000ULL / HZ &&
-   time_before_eq(jiffies, end));
-}
-
-static void __init delay_without_tsc(void)
-{
-   unsigned long end = jiffies + 4;
-   int band = 1;
-
-   /*
-* We don't know any frequency yet, but waiting for
-	 * 40940000000/HZ cycles is safe:
-* 4 GHz == 10 jiffies
-* 1 GHz == 40 jiffies
-* 1 << 1 + 1 << 2 +...+ 1 << 11 = 4094
-*/
-   do {
-		__delay(((1U << band++) * 10000000UL) / HZ);
-   } while (band < 12 && time_before_eq(jiffies, end));
-}
-
-/*
- * There is a nasty bug in some older SMP boards, their mptable lies
- * about the timer IRQ. We do the following to work around the situation:
- *
- * - timer IRQ defaults to IO-APIC IRQ
- * - if this function detects that timer IRQs are defunct, then we fall
- *   back to ISA timer IRQs
- */
-static int __init timer_irq_works(void)
-{
-   unsigned long t1 = jiffies;
-   unsigned long flags;
-
-   if (no_timer_check)
-   return 1;
-
-   local_save_flags(flags);
-   local_irq_enable();
-
-   if (boot_cpu_has(X86_FEATURE_TSC))
-   delay_with_tsc();
-   else
-   delay_without_tsc();
-
-   local_irq_restore(flags);
-
-   /*
-* Expect a few ticks at least, to be sure some possible
-* glue logic does not lock up after one or two first
-* ticks in a non-ExtINT mode.  Also the local APIC
-* might have cached one ExtINT interrupt.  Finally, at
-* least one tick may be lost due to delays.
-*/
-
-   /* jiffies wrap? */
-   if (time_after(jiffies, t1 + 4))
-   return 1;
-   return 0;
-}
-
 /*
  * In the SMP+IOAPIC case it might happen that there are an unspecified
  * number of pending IRQ events unhandled. These cases are very rare,
@@ -2066,6 +1981,12 @@ static int mp_alloc_timer_irq(int ioapic, int pin)
 }
 
 /*
+ * In the SMP+IOAPIC case it might happen that there are an unspecified
+ * number of pending IRQ events unhandled. These cases are very rare,
+ * so we 'resend' these IRQs via IPIs, to the same CPU. It's much
+ * better to do it this way as thus we do not have to be aware of
+ * 'pending' interrupts in the IRQ path, except at this point.
+ *
  * This code may look a bit paranoid, but it's supposed to cooperate with
  * a wide range of boards and BIOS bugs. 

[PATCH 2/2] x86/ioapic: avoid timer manipulation when IRQ0 timer is unavailable

2019-04-22 Thread Daniel Drake
New products based on Intel Apollo Lake are appearing where the HPET is
not present in ACPI, and the legacy 8254 PIT is "gated" by default in
the BIOS setup menu.

This leads to an early boot "IO-APIC + timer doesn't work!" kernel panic
on a black screen (before the framebuffer is initialized).

Avoid the IO-APIC IRQ0 timer manipulation & verification on platforms
where the legacy IRQ0 timer has been determined as unavailable.

This fixes boot on Connex L1430 and Scope SN116PYA with default BIOS
settings.

Signed-off-by: Daniel Drake 
Link: 
https://lkml.kernel.org/r/CAD8Lp45fedoPLnK=umuhhtkjy5u2h04sykrx3u+m04u6fpv...@mail.gmail.com
---
 arch/x86/kernel/apic/io_apic.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/x86/kernel/apic/io_apic.c b/arch/x86/kernel/apic/io_apic.c
index ae46da48c07b..2d29c62abbcb 100644
--- a/arch/x86/kernel/apic/io_apic.c
+++ b/arch/x86/kernel/apic/io_apic.c
@@ -2243,7 +2243,7 @@ void __init setup_IO_APIC(void)
sync_Arb_IDs();
setup_IO_APIC_irqs();
init_IO_APIC_traps();
-   if (nr_legacy_irqs())
+   if (global_clock_event && nr_legacy_irqs())
check_timer();
 
ioapic_initialized = 1;
-- 
2.19.1



Re: [PATCH 2/2] x86/tsc: set LAPIC timer frequency to crystal clock frequency

2019-04-22 Thread Daniel Drake
On Mon, Apr 22, 2019 at 8:04 PM Ingo Molnar  wrote:
> Minor style nit: the parentheses are unnecessary, integer expressions
> like this are evaluated left to right and multiplication and division has
> the same precedence.

Fair point, although the same could be said for cpu_khz_from_msr().

> But it might also make sense to actually store crystal_mhz instead of
> crystal_khz, because both CPUID 15H and 16H provides MHz values.
>
> That way the above expression would simplify to:
>
> lapic_timer_frequency = crystal_mhz / HZ;

Wouldn't it be
lapic_timer_frequency = crystal_mhz * 1000000 / HZ;
?
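
(As a sanity check with numbers from this thread: a 24 MHz crystal with
HZ=250 should give 24000000 / 250 = 96000 clocks per tick, which is what
the APIC calibration measures on the CoffeeLake/WhiskeyLake machines, so
the 10^6 factor is definitely needed.)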

Thanks
Daniel


[PATCH 2/2] x86/tsc: set LAPIC timer frequency to crystal clock frequency

2019-04-22 Thread Daniel Drake
The APIC timer calibration (calibrate_APIC_clock()) can be skipped
in cases where we know the APIC timer frequency. On Intel SoCs,
we believe that the APIC is fed by the crystal clock; this would make
sense, and the crystal clock frequency has been verified against the
APIC timer calibration result on ApolloLake, GeminiLake, Kabylake,
CoffeeLake, WhiskeyLake and AmberLake.

Set lapic_timer_frequency based on the crystal clock frequency
accordingly.

APIC timer calibration would normally be skipped on modern CPUs
by nature of the TSC deadline timer being used instead,
however this change is still potentially useful, e.g. if the
TSC deadline timer has been disabled with a kernel parameter.
calibrate_APIC_clock() uses the legacy timer, but we are seeing
new platforms that omit such legacy functionality, so avoiding
such codepaths is becoming more important.

Link: https://lkml.kernel.org/r/20190419083533.32388-1-dr...@endlessm.com
Link: 
https://lkml.kernel.org/r/alpine.deb.2.21.1904031206440.1...@nanos.tec.linutronix.de
Suggested-by: Thomas Gleixner 
Signed-off-by: Daniel Drake 
---
 arch/x86/kernel/tsc.c | 10 ++
 1 file changed, 10 insertions(+)

diff --git a/arch/x86/kernel/tsc.c b/arch/x86/kernel/tsc.c
index 3971c837584a..8750543287fc 100644
--- a/arch/x86/kernel/tsc.c
+++ b/arch/x86/kernel/tsc.c
@@ -665,6 +665,16 @@ unsigned long native_calibrate_tsc(void)
if (boot_cpu_data.x86_model == INTEL_FAM6_ATOM_GOLDMONT)
setup_force_cpu_cap(X86_FEATURE_TSC_RELIABLE);
 
+#ifdef CONFIG_X86_LOCAL_APIC
+   /*
+* The local APIC appears to be fed by the core crystal clock
+* (which sounds entirely sensible). We can set the global
+* lapic_timer_frequency here to avoid having to calibrate the APIC
+* timer later.
+*/
+   lapic_timer_frequency = (crystal_khz * 1000) / HZ;
+#endif
+
return crystal_khz * ebx_numerator / eax_denominator;
 }
 
-- 
2.19.1



[PATCH 1/2] x86/tsc: use CPUID.0x16 to calculate missing crystal frequency

2019-04-22 Thread Daniel Drake
native_calibrate_tsc() had a hardcoded table of Intel CPU families
and crystal clock, but we have found that it is possible to
calculate the crystal clock speed, and this is preferred to a hardcoded
table.

Where crystal clock frequency was not reported by CPUID.0x15,
use CPUID.0x16 data to calculate the crystal clock.

Using CPUID dump data from http://instlatx64.atw.hu/, the calculation
results can be seen to be sufficiently close to the previously hardcoded
values:
SKYLAKE_MOBILE: 24074074 Hz
SKYLAKE_DESKTOP: 23913043 Hz
KABYLAKE_MOBILE: 23893805 Hz
KABYLAKE_DESKTOP: 24050632 Hz
GOLDMONT: 19.2MHz crystal clock correctly reported by CPUID.0x15

Additionally, crystal clock frequency for platforms that were missing
from the list (e.g. SKYLAKE_X) will now be provided.

GOLDMONT_X was left as a hardcoded value, as the CPUID data on that site
indicates that the hardware does not report crystal clock nor CPUID.0x16
data.

Going forward, Intel have hopefully now started providing crystal
frequency in CPUID.0x15. At least ApolloLake, GeminiLake and CannonLake
CPUs all provide the relevant data directly.

Link: https://lkml.kernel.org/r/20190419083533.32388-1-dr...@endlessm.com
Suggested-by: Thomas Gleixner 
Signed-off-by: Daniel Drake 
---
 arch/x86/kernel/tsc.c | 33 ++---
 1 file changed, 18 insertions(+), 15 deletions(-)

diff --git a/arch/x86/kernel/tsc.c b/arch/x86/kernel/tsc.c
index 3fae23834069..3971c837584a 100644
--- a/arch/x86/kernel/tsc.c
+++ b/arch/x86/kernel/tsc.c
@@ -629,23 +629,26 @@ unsigned long native_calibrate_tsc(void)
 
crystal_khz = ecx_hz / 1000;
 
-   if (crystal_khz == 0) {
-   switch (boot_cpu_data.x86_model) {
-   case INTEL_FAM6_SKYLAKE_MOBILE:
-   case INTEL_FAM6_SKYLAKE_DESKTOP:
-   case INTEL_FAM6_KABYLAKE_MOBILE:
-   case INTEL_FAM6_KABYLAKE_DESKTOP:
-   crystal_khz = 24000;/* 24.0 MHz */
-   break;
-   case INTEL_FAM6_ATOM_GOLDMONT_X:
-   crystal_khz = 25000;/* 25.0 MHz */
-   break;
-   case INTEL_FAM6_ATOM_GOLDMONT:
-   crystal_khz = 19200;/* 19.2 MHz */
-   break;
-   }
+   /*
+* Some Intel SoCs like Skylake and Kabylake don't report the crystal
+* clock, but we can easily calculate it by considering the crystal
+* ratio and the CPU speed.
+*/
+   if (crystal_khz == 0 && boot_cpu_data.cpuid_level >= 0x16) {
+   unsigned int eax_base_mhz, ebx, ecx, edx;
+		cpuid(0x16, &eax_base_mhz, &ebx, &ecx, &edx);
+   crystal_khz = eax_base_mhz * 1000 * \
+   eax_denominator / ebx_numerator;
}
 
+   /*
+* Denverton SoCs don't report crystal clock, and also don't support
+* CPUID.0x16, so hardcode the 25MHz crystal clock.
+*/
+   if (crystal_khz == 0 &&
+   boot_cpu_data.x86_model == INTEL_FAM6_ATOM_GOLDMONT_X)
+   crystal_khz = 25000;
+
if (crystal_khz == 0)
return 0;
/*
-- 
2.19.1



Re: [PATCH] platform/x86: asus-wmi: Add fn-lock mode switch support

2019-04-19 Thread Daniel Drake
On Thu, Apr 18, 2019 at 2:47 PM Chris Chiu  wrote:
>
> Some of latest ASUS laptops support new fn-lock mode switching.
> This commit detect whether if the fn-lock option is enabled in
> BIOS setting, and toggle the fn-lock mode via a new WMI DEVID
> 0x00100023 when the corresponding notify code captured.
>
> The ASUS fn-lock mode switch is activated by pressing Fn+Esc.
> When on, keys F1 to F12 behave as applicable, with meanings
> defined by the application being used at the time. When off,
> F1 to F12 directly triggers hardware features, well known audio
> volume up/down, brightness up/down...etc, which were triggered
> by holding down Fn key and F-keys.
>
> Because there's no way to retrieve the fn-lock mode via existing
> WMI methods per ASUS spec, driver need to initialize and keep the
> fn-lock mode by itself.
>
> Signed-off-by: Chris Chiu 

Reviewed-by: Daniel Drake 


Re: Detecting x86 LAPIC timer frequency from CPUID data

2019-04-19 Thread Daniel Drake
On Fri, Apr 19, 2019 at 6:30 AM Thomas Gleixner  wrote:
>Time Stamp Counter/Core Crystal Clock Information (0x15):
>   TSC/clock ratio = 168/2
>   nominal core crystal clock = 0 Hz
>
>Processor Frequency Information (0x16):
>   Core Base Frequency (MHz) = 0x834 (2100)
>   Core Maximum Frequency (MHz) = 0xed8 (3800)
>   Bus (Reference) Frequency (MHz) = 0x64 (100)
>
> Assuming that TSC and local APIC timer run from the same frequency on these
> modern machines.
>
>2100MHz * 2 / 168 = 25MHz
>
> and disabling the tsc deadline timer tells me:
>
>   . calibration result: 24999
>
> Close enough.

I tested all the Intel SoC generations we have on hand. The assumption that
the core crystal clock feeds the APIC seems to be consistently true.

(Please note that all the following results are done with CONFIG_HZ=250,
 which is why the "calibration result" is 4x higher than HZ=1000 as used
 in previous mails)

In the easy case, the low cost platforms do not support CPUID.0x16 (so no
CPU frequency reporting), but they do tell us the core crystal clock, which
is consistent with the APIC calibration result:

N5000 (Gemini Lake)
[0.122948] ... lapic delta = 11
[0.122948] ... PM-Timer delta = 357950
[0.122948] ... PM-Timer result ok
[0.122948] . delta 11
[0.122948] . mult: 5153917
[0.122948] . calibration result: 76799
[0.122948] . CPU clock speed is 1094.1542 MHz.
[0.122948] . host bus clock speed is 19.0799 MHz.
   Time Stamp Counter/Core Crystal Clock Information (0x15):
  TSC/clock ratio = 171/3
  nominal core crystal clock = 19200000 Hz
   Processor Frequency Information (0x16):
  Core Base Frequency (MHz) = 0x0 (0)
  Core Maximum Frequency (MHz) = 0x0 (0)
  Bus (Reference) Frequency (MHz) = 0x0 (0)

N3350 (Apollo Lake)
[0.248894] ... lapic delta = 11
[0.248894] ... PM-Timer delta = 357949
[0.248894] ... PM-Timer result ok
[0.248894] . delta 11
[0.248894] . mult: 5153917
[0.248894] . calibration result: 76799
[0.248894] . CPU clock speed is 1094.1540 MHz.
[0.248894] . host bus clock speed is 19.0799 MHz.
   Time Stamp Counter/Core Crystal Clock Information (0x15):
  TSC/clock ratio = 171/3
  nominal core crystal clock = 19200000 Hz
(CPUID 0x16 not supported at all)

And the 4 higher-end SoCs that we have available for testing all report
crystal clock 0Hz from CPUID 0x15, but by combining the CPUID.0x16 base
frequency with the CPUID.0x15 TSC/clock ratio, the crystal frequency can
be calculated as you describe, and it consistently matches the APIC timer
calibration result.
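
(Worked example with the i9-9980HK data below: crystal = 2400 MHz * 2 /
200 = 24 MHz, and 24000000 / 250 (HZ) = 96000, which is within a couple
of ticks of the measured calibration result of 95998.)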

i9-9980HK (Coffee Lake HR)
[0.379421] ... lapic delta = 149998
[0.379421] ... PM-Timer delta = 357950
[0.379421] ... PM-Timer result ok
[0.379421] . delta 149998
[0.379421] . mult: 6442365
[0.379421] . calibration result: 95998
[0.379421] . CPU clock speed is 2399.3902 MHz.
[0.379421] . host bus clock speed is 23.3998 MHz.
   Time Stamp Counter/Core Crystal Clock Information (0x15):
  TSC/clock ratio = 200/2
  nominal core crystal clock = 0 Hz
   Processor Frequency Information (0x16):
  Core Base Frequency (MHz) = 0x960 (2400)
  Core Maximum Frequency (MHz) = 0x1388 (5000)
  Bus (Reference) Frequency (MHz) = 0x64 (100)

i7-8565U (Whiskey Lake)
[0.173776] ... lapic delta = 149998
[0.173776] ... PM-Timer delta = 357950
[0.173776] ... PM-Timer result ok
[0.173776] . delta 149998
[0.173776] . mult: 6442365
[0.173776] . calibration result: 95998
[0.173776] . CPU clock speed is 1991.3903 MHz.
[0.173776] . host bus clock speed is 23.3998 MHz.
   Time Stamp Counter/Core Crystal Clock Information (0x15):
  TSC/clock ratio = 166/2
  nominal core crystal clock = 0 Hz
   Processor Frequency Information (0x16):
  Core Base Frequency (MHz) = 0x7d0 (2000)
  Core Maximum Frequency (MHz) = 0x11f8 (4600)
  Bus (Reference) Frequency (MHz) = 0x64 (100)

i5-7200U (Kabylake)
[0.219142] ... lapic delta = 149998
[0.219142] ... PM-Timer delta = 357951
[0.219142] ... PM-Timer result ok
[0.219142] . delta 149998
[0.219142] . mult: 6442365
[0.219142] . calibration result: 95998
[0.219142] . CPU clock speed is 2711.3880 MHz.
[0.219142] . host bus clock speed is 23.3998 MHz.
   Time Stamp Counter/Core Crystal Clock Information (0x15):
  TSC/clock ratio = 226/2
  nominal core crystal clock = 0 Hz
   Processor Frequency Information (0x16):
  Core Base Frequency (MHz) = 0xa8c (2700)
  Core Maximum Frequency (MHz) = 0xc1c (3100)
  Bus (Reference) Frequency (MHz) = 0x64 (100)

m3-8110Y (Amber Lake)
[0.102289] ... lapic delta = 149998
[0.102289] ... PM-Timer delta = 357951
[0.102289] ... PM-Timer result ok
[0.102289] . delta 149998
[0.102289] . mult: 6442365
[  

Detecting x86 LAPIC timer frequency from CPUID data

2019-04-16 Thread Daniel Drake
The CPUID.0x16 leaf provides "Bus (Reference) Frequency (in MHz)".

In the thread "No 8254 PIT & no HPET on new Intel N3350 platforms
causes kernel panic during early boot" we are exploring ways to have
the kernel avoid using the PIT/HPET IRQ0 timer in more cases, and
Thomas Gleixner suggested that we could use this CPUID data to set
lapic_timer_frequency, avoiding the need for calibrate_APIC_clock()
to measure the APIC clock against the IRQ0 timer.

I'm thinking of the the following code change, however I get
unexpected results on Intel i7-8565U (Whiskey Lake). When
booting without this change, and with apic=notscdeadline (so that
APIC clock gets calibrated and used), the bus speed is detected as
23MHz:

 ... lapic delta = 149994
 ... PM-Timer delta = 357939
 ... PM-Timer result ok
 . delta 149994
 . mult: 6442193
 . calibration result: 23999
 . CPU clock speed is 1991.0916 MHz.
 . host bus clock speed is 23.0999 MHz.

However the CPUID.0x16 ECX reports a 100MHz bus speed on this device,
so this code change would produce a significantly different calibration.
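
(In numbers: 23999 ticks per jiffy at HZ=1000 means the APIC timer is
being fed at roughly 24 MHz, whereas the change below would set
lapic_timer_frequency = 100 * 1000000 / 1000 = 100000, i.e. it assumes
the timer runs at the 100MHz reference clock - about 4x too fast.)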

Am I doing anything obviously wrong?

diff --git a/arch/x86/kernel/tsc.c b/arch/x86/kernel/tsc.c
index 3fae23834069..6c51ce842f86 100644
--- a/arch/x86/kernel/tsc.c
+++ b/arch/x86/kernel/tsc.c
@@ -679,6 +679,16 @@ static unsigned long cpu_khz_from_cpuid(void)
 
	cpuid(0x16, &eax_base_mhz, &ebx_max_mhz, &ecx_bus_mhz, &edx);
 
+#ifdef CONFIG_X86_LOCAL_APIC
+   /*
+* If bus frequency is provided in CPUID data, set
+* global lapic_timer_frequency to bus_clock_cycles/jiffy.
+* This avoids having to calibrate the APIC timer later.
+*/
+   if (ecx_bus_mhz)
+		lapic_timer_frequency = (ecx_bus_mhz * 1000000) / HZ;
+#endif
+
return eax_base_mhz * 1000;
 }
 
-- 
2.19.1



Re: No 8254 PIT & no HPET on new Intel N3350 platforms causes kernel panic during early boot

2019-04-15 Thread Daniel Drake
On Wed, Apr 10, 2019 at 8:54 PM Thomas Gleixner  wrote:
> On Tue, 9 Apr 2019, Daniel Drake wrote:
> > On Wed, Apr 3, 2019 at 7:21 PM Thomas Gleixner  wrote:
> > >  - Prevent the TSC calibration code from touching PIT/HPET. It
> > >should do that already when the TSC frequency can be retrieved
> > >via CPUID or MSR. Should work, emphasis on should ...
> >
> > From above, this seems to be working acceptably already. It does touch
> > the PIT, but ultimately ignores the information that it provided.
>
> Yes, but we might actually be smarter than that.

Do you have anything specific in mind?

You originally laid out this idea in the context of doing this if the
PIT/HPET is not working. However, I can't immediately see how to judge
that because:
- According to the analysis in my last mail, the PIT is actually
ticking even when it is gated in the BIOS. The BIOS setting just seems
to make it tick 4 times slower and not generate any IRQ0 interrupts.
- TSC calibration code runs really early during boot. To make it
detect this situation we could make it check if IRQ0 is working,
however setup_default_timer_irq() only happens a lot later, so I'm not
sure how this could be checked at such an early stage.

I think I'm now fairly clear on the other suggestions you have made so
I'll see if I can come up with some patches.

Thanks!
Daniel


Re: [PATCH v2 10/11] platform/x86: asus-wmi: Switch fan boost mode

2019-04-12 Thread Daniel Drake
On Thu, Apr 11, 2019 at 1:47 PM Yurii Pavlovskyi
 wrote:
> * 0x00 - is normal,
> * 0x01 - is obviously turbo by the amount of noise, might be useful to
> avoid CPU frequency throttling on high load,
> * 0x02 - the meaning is unknown at the time as modes are not named
> in the vendor documentation, but it does look like a quiet mode as CPU
> temperature does increase about 10 degrees on maximum load.

I'm curious which vendor documentation you're working with here.

From the spec,
0 = normal
1 = overboost
2 = silent

Also you can use DSTS on the 0x00110018 device to check the exact
capabilities supported, which you should use to refine which modes can
be cycled through.
Bit 0 = overboost supported
Bit 1 = silent supported
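
i.e. in the probe path something roughly like this (untested sketch, the
variable names are mine, and I haven't checked how these bits interact
with the usual DSTS presence bit):

	u32 result;
	int err;

	err = asus_wmi_get_devstate(asus, 0x00110018, &result);
	if (err)
		return err;

	overboost_supported = result & BIT(0);
	silent_supported = result & BIT(1);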

Thanks
Daniel


Re: [PATCH v2 05/11] platform/x86: asus-wmi: Support queued WMI event codes

2019-04-12 Thread Daniel Drake
On Thu, Apr 11, 2019 at 1:44 PM Yurii Pavlovskyi
 wrote:
> Event codes are expected to be polled from a queue on at least some
> models.

Maybe avoid the word "poll" since it suggests something else (at least to me).

> The fix flushes the old key codes out of the queue on load and after
> receiving event the queue is read until either .. or 1 is encountered.
>
> It might be considered a minor issue and no normal user would likely to
> observe this (there is little reason unloading the driver), but it does
> significantly frustrate a developer who is unlucky enough to encounter
> this.
>
> Introduce functionality for flushing and processing queued codes, which is
> enabled via quirk flag for ASUS7000. It might be considered if it is
> reasonable to enable it everywhere (might introduce regressions) or always
> try to flush the queue on module load and try to detect if this quirk is
> present in the future.
>
> This patch limits the effect to the specific hardware defined by ASUS7000
> device that is used for driver detection by vendor driver of Fx505. The
> fallback is also implemented in case initial flush fails.

Which vendor driver are you referring to here?

Researching more:
In our database of 144 Asus models, 142 of them have GANQ.

Of those 142, 9 of them return One in the empty-queue case. The other
133 match your FX505GM device exactly. So it seems valid to interpret
both 0x00000000 and 0x1 as queue-end markers.

The 2 devices that do not have GANQ are not laptops. They also do not
have the _UID "ATK" WMI device, they only have "ASUSWMI" and their WMI
_WED method does not use a queue.
There are a few more units that have both ASUSWMI and ATK devices, and
the ASUSWMI device appears to never be queued.
Another observation is that the ASUSWMI device works with notify code
0xD2, and the ATK device works with 0xFF.

Nailing this down a bit further, I found a DSDT for EEEPC X101H: that
one only has ASUSWMI and it is also not queued.

So I think you should apply this queue behaviour more
generically, but either avoid it when the WMI device _UID is not "ATK"
(as discussed in the DCTS/DSTS thread), or avoid it when the notify
code is not 0xFF.
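
i.e. conceptually something like this (rough sketch only - the helpers
here just stand in for however your patch reads one code from the
firmware queue and dispatches it):

	if (notify_code == 0xFF) {
		do {
			code = read_next_event_code();	/* placeholder */
			if (code == 0x0 || code == 0x1)
				break;			/* queue empty */
			handle_event_code(code);	/* placeholder */
		} while (--retries);
	}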

Thanks
Daniel


Re: [PATCH 04/11] platform/x86: asus-wmi: Add quirk to force DSTS WMI method detection

2019-04-11 Thread Daniel Drake
On Thu, Apr 11, 2019 at 4:28 AM Yurii Pavlovskyi
 wrote:
> The DSTS method detection fails, as nothing is returned if method is not
> defined in WMNB. As a result the control of keyboard backlight is not
> functional for TUF Gaming series laptops (at the time the only
> functionality of the driver on this model implemented with WMI methods).
>
> Patch was tested on a newer TUF Gaming FX505GM and older K54C model.
>
> FX505GM:
> Method (WMNB, 3, Serialized)
> { ...
> If ((Local0 == 0x53545344))
> {
> ...
> Return (Zero)
> }
> ...
> // No return
> }
>
> K54C:
> Method (WMNB, 3, Serialized)
> { ...
> If ((Local0 == 0x53545344))
> {
> ...
> Return (0x02)
> }
> ...
> Return (0xFFFE)
> }
>
> The non-existing method ASUS_WMI_METHODID_DSTS=0x53544344 (actually it is
> DCTS in little endian ASCII) is selected in asus->dsts.
>
> One way to fix this would be to call both for every known device ID until
> some answers - this would increase module load time.
>
> Another option is to check some device that is known to exist on every
> model - none known at the time.
>
> Last option, which is implemented, is to check for presence of the
> ASUS7000 device in ACPI tree (it is a dummy device), which is the
> condition used for loading the vendor driver for this model. This might
> not fix every affected model ever produced, but it likely does not
> introduce any regressions. The patch introduces a quirk that is enabled
> when ASUS7000 is found.
>
> Scope (_SB)
> {
> Device (ATK)
> {
> Name (_HID, "ASUS7000")  // _HID: Hardware ID
> }
> }

Hmm, tricky! But about time we did something about the DSTS vs DCTS guessing.

While we don't have definitive knowledge of the right thing to do
here, I think I have enough info available to solidify some
assumptions we'd otherwise be afraid to make, and then we can
implement a better approach here.

In our database of 144 Asus DSDTs, 14 of them respond to DCTS:

AS_D520MT
D425MC
D640SA
D320SF-K
D415MT
D830MT
G11DF
M32CD4-K
V221ID
V272UN_SKU3
Z240IE
ZN220IC-K
ZN241IC
ZN270IE

Of those 14, they all additionally respond to DSTS, except for D415MT
and AS_D520MT.

In all 14 cases, the DCTS handling is done inside a device with _UID
ASUSWMI. None of the other 130 products have a device with that _UID.

Furthermore, we recently received access to the ASUS spec, which
confirms that the Instance Name for EeePC is "ACPI\PNP0C14\ASUSWMI_0"
whereas the Instance Name for other notebooks is "ACPI\PNP0C14\ATK_0".

The 12 devices that respond to both DCTS and DSTS do so on two separate
devices, albeit with the same _WDG UUID. The one with _UID
ASUSWMI responds to DCTS, and the one with _UID ATK responds to DSTS.

So that seems fairly convincing. My thinking is that we can check the
_UID of the underlying device, and use DCTS for ASUSWMI, DSTS
otherwise. To do that, I think you'd have to rework the driver probing
so that it happens through wmi_driver_register(). Then from the probe
function onwards you will get a wmi_device, and hopefully you can get
the _UID through that instance somehow.
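
I haven't checked exactly where the ACPI companion hangs off a
wmi_device, but the end result would be something like this (untested,
and the DCTS/DSTS constant names are just for illustration):

	struct acpi_device *adev = ACPI_COMPANION(wdev->dev.parent);
	const char *uid = adev ? acpi_device_uid(adev) : NULL;

	if (uid && !strcmp(uid, "ASUSWMI"))
		asus->dsts_id = ASUS_WMI_METHODID_DCTS;
	else
		asus->dsts_id = ASUS_WMI_METHODID_DSTS;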

There's a separate issue of what happens on those 12 machines where
there are two WMI devs, with the same _WDG GUID.
drivers/platform/x86/wmi.c drops duplicates, so one of them is being
ignored in that case. We'd ideally find a way to ignore the ASUSWMI
node and go with ATK in that situation. But I think this can be
separated from your work here.

Thanks for these patches and welcome to the world of kernel development!

Daniel


Re: [PATCH 01/11] platform/x86: asus-wmi: Fix hwmon device cleanup

2019-04-11 Thread Daniel Drake
On Thu, Apr 11, 2019 at 4:21 AM Yurii Pavlovskyi
 wrote:
>
> The asus-wmi driver does not clean up the hwmon device on exit or error.
> To reproduce the bug, repeat rmmod, insmod to verify that device number
> /sys/devices/platform/asus-nb-wmi/hwmon/hwmon?? grows every time. Add
> pointer to the device in module state and call cleanup on error.

I wonder if this can be fixed more cleanly by using
devm_hwmon_device_register_with_groups() ?
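
i.e. something like (untested, from memory of asus_wmi_hwmon_init()):

	hwmon = devm_hwmon_device_register_with_groups(&asus->platform_device->dev,
						       "asus", asus,
						       hwmon_attribute_groups);
	if (IS_ERR(hwmon)) {
		pr_err("Could not register asus hwmon device\n");
		return PTR_ERR(hwmon);
	}
	return 0;

Then no unregister call (and no extra pointer in the module state) should
be needed at all.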

Thanks
Daniel


Re: No 8254 PIT & no HPET on new Intel N3350 platforms causes kernel panic during early boot

2019-04-08 Thread Daniel Drake
On Wed, Apr 3, 2019 at 7:21 PM Thomas Gleixner  wrote:
> Btw, one of those links you provided
>
>   https://www.manualslib.com/manual/1316475/Ecs-Ed20pa2.html?page=23
>
> claims that you have to disable MWAIT as well. No idea why. Is MWAIT
> disabled on your platform?

I don't have that option in the BIOS. However there's no mention of
"mwait" nor "mwaitx" in /proc/cpuinfo. Checking our more general
database of 202 x86_64 consumer products released over the last few
years, only 19 of them have mwait/mwaitx listed there and they tend to
be older platforms.

> We have early-quirks.c in arch/x86/kernel/ for that.

Nice, we should be able to work around the issue there, but I hope we
can find something better...

> For newer CPUs we might assume that:
>
>  1) The TSC and APIC timer are actually usable
>
>  2) The frequencies can be retrieved from CPUID or MSRs
>
> If #1 and #2 are reliable we can avoid the whole calibration and interrupt
> delivery mess.

Let's take a step back and re-examine the wider sequence of events
here (which I've now done thanks to your pointers).

1. In very early boot, we face the TSC calibration challenge, arriving
at determine_cpu_tsc_frequencies(). This function calculates CPU
frequency and TSC frequency separately.

For the CPU frequency, native_calibrate_cpu_early() tries to do it via
cpu_khz_from_cpuid() with CPUID leaf 0x16, but this is not available
on the platforms in question, which have max cpuid level 0x15.
cpu_khz_from_msr() is then tried, but that doesn't support this
platform either (looks like it only supports older SoC generations).
So now we arrive in quick_pit_calibrate(), which directly programs the
PIT and measures the TSC rate against the PIT ticks.

When the 8254 is ungated in the BIOS, this function fails early because:
if (pit_expect_msb(0xff, , )) {
/* returned at count=13, d1 is now 32118 */
for (i = 1; i <= MAX_QUICK_PIT_ITERATIONS; i++) {
if (!pit_expect_msb(0xff-i, , ))
/* returned at count=13, d2 is now 48595 */
break;

delta -= tsc;
/* delta is now 246741 */

/*
 * Extrapolate the error and fail fast if the error will
 * never be below 500 ppm.
 */
if (i == 1 &&
d1 + d2 >= (delta * MAX_QUICK_PIT_ITERATIONS) >> 11)
return 0;
/* this return statement is hit, the calculation is:
32118 + 48595 >= (246741 * 233) >> 11
*/

so the error was too high (I'm not sure why) and
determine_cpu_tsc_frequencies() records the CPU frequency as 0.

Then, comparing to when the 8254 is gated via the BIOS option, the
behaviour is surprising too.
In that case, the quick_pit_calibrate() loop runs up to i=128,
at which point the error level is low enough to be accepted,
calculating the CPU frequency as 4448MHz (4x higher than reality).
During each loop iteration, pit_expect_msb() returns when the value
changes at count=63 (compared to 13 in the PIT-ungated case). Does
this suggest the PIT is not actually fully gated, but just ticking a
lot slower than otherwise?

Anyway, back in determine_cpu_tsc_frequencies() with the CPU frequency
calibration done, we now do TSC calibration. This one succeeds in all
cases via native_calibrate_tsc()  using CPUID leaf 0x15 to read the
correct value. The TSC is 1094MHz.

Then, in both cases (8254 gated or not), the CPU frequency calculation
is discarded here, because it's wildly different from the TSC rate:
else if (abs(cpu_khz - tsc_khz) * 10 > tsc_khz)
cpu_khz = tsc_khz;

So it seems that this code already behaves along the lines you
describe: it gives more trust to the TSC value read in the modern way,
and does not get upset if the CPU frequency calibration against the
PIT didn't produce a meaningful result.


2. Significantly later during boot, x86_late_time_init() calls
hpet_time_init() which sets up either the PIT or HPET. However, as far
as I can see, there is no checking that the selected clock is actually
ticking. In the case of these affected platforms with the 8254 gated,
we sail right pass this point without a working clock source.

3. x86_late_time_init() then calls apic_intr_mode_init() ->
apic_bsp_setup() -> setup_IO_APIC() and at this point we reach
check_timer(), which attempts to verify/fixup IRQ0 delivery via the
IO-APIC. At this point we check that jiffies increments, and if not,
panic.

4. Some time later, native_smp_prepare_cpus() calls
setup_boot_APIC_clock() -> setup_APIC_timer() which registers the
local APIC clocksource, replacing the previous PIT/HPET clocksource.
There's no check to make sure that the new clocksource is ticking, as
far as I can see.

5. Some time later, start_secondary() calls
setup_secondary_APIC_clock() -> setup_APIC_timer() registering the
APIC clocksource (again? or just for another CPU core?).


Hopefully that analysis helps refine/elaborate the plan a bit more...

> That means we need the following decision logic:
>
>   1) If HPET is 

No 8254 PIT & no HPET on new Intel N3350 platforms causes kernel panic during early boot

2019-04-03 Thread Daniel Drake
Hi,

I already wrote about this problem in the thread "APIC timer checked
before it is set up, boot fails on Connex L1430"
https://lkml.org/lkml/2018/12/28/10
However my initial diagnosis was misguided, and I have some new
findings to share now, so I'm starting over in this new thread.

Also CCing Hans, who also often attracts this class of problem on low
cost hardware!

The problem is that on affected platforms, all Linux distros (and all
known kernel versions) fail to boot, hanging on a black screen. EFI
earlyprintk can be used to see the panic:

APIC: switch to symmetric I/O mode setup
x2apic: IRQ remapping doesn't support X2APIC mode
..TIMER: vector=0x30 apic1=0 pin1=2 apic2=-1 pin2=-1
..MP-BIOS bug: 8254 timer not connected to IO-APIC
...trying to set up timer (IRQ0) through the 8259A ...
. (found apic 0 pin 2) ...
... failed.
...trying to set up timer as Virtual Wire IRQ...
. failed.
...trying to set up timer as ExtINT IRQ...
do_IRQ: 0.55 No irq handler for vector
. failed :(.
Kernel panic - not syncing: IO-APIC + timer doesn't work! Boot with
apic=debug and send a report.

After encountering this on Connex L1430 last time, we have now
encountered another affected product, from a different vendor (SCOPE
SN116PYA). They both have Intel Apollo Lake N3350 and AMI BIOS.

The code in question is making sure that the IRQ0 timer works, by
waiting for an interrupt. In this case there is no interrupt.

The x86 platform code in hpet_time_init() tries to enable the HPET
timer for this, however that is not available on these affected
platforms (no HPET ACPI table). So it then falls back on the 8253/8254
legacy PIT. The i8253.c driver is invoked to program the PIT
accordingly, however in this case it does not result in any IRQ0
interrupts being generated --> panic.

I found a relevant setting in the BIOS: Chipset -> South Cluster
Configuration -> Miscellaneous Configuration -> 8254 Clock Gating
This option is set to Enabled by default. Setting it to Disabled makes
the PIT tick and Linux boot finally works.

It's nice to have a workaround but I would hope we could do better -
especially because it seems like this problem is spreading. In
addition to the two products we found here, searching around finds
several other product manuals and discussions that tell you to go into
the BIOS and change this option if you want Linux to boot, some
examples:
https://blog.csdn.net/qhtsm/article/details/88600316
https://www.manualslib.com/manual/1316475/Ecs-Ed20pa2.html?page=23
https://tools.exone.de/live/shop/img/produkte/fs_112124_2.pdf page 11

As another data point, Windows 10 boots fine in this no-PIT no-HPET
configuation.

Going deeper, I found the clock_gate_8254 option in the coreboot
source code. This pointed me to the ITSSPRC register, which is
documented on page 1694 of
https://www.intel.com/content/dam/www/public/us/en/documents/datasheets/300-series-chipset-pch-datasheet-vol-2.pdf

"8254 Static Clock Gating Enable (CGE8254): When set, the 8254 timer
is disabled statically. This bit shall be set by BIOS if the 8254
feature is not needed in the system or before BIOS hands off the
system that supports C11. Normal operation of 8254 requires this bit
to 0."

(what's C11?)

I verified that the BIOS setting controls this specific bit value, and
I also created and verified a workaround that unsets this bit - now
Linux boots fine regardless of the BIOS setting:

#define INTEL_APL_PSR_BASE	0xd0000000
#define INTEL_APL_PID_ITSS	0xd0
#define INTEL_PCR_PORTID_SHIFT	16
#define INTEL_APL_PCR_ITSSPRC	0x3300
static void quirk_intel_apl_8254(void)
{
	u32 addr = INTEL_APL_PSR_BASE | \
		   (INTEL_APL_PID_ITSS << INTEL_PCR_PORTID_SHIFT) | \
		   INTEL_APL_PCR_ITSSPRC;
	u32 value;
	void __iomem *itssprc = ioremap_nocache(addr, 4);

	if (!itssprc)
		return;

	value = readl(itssprc);
	if (value & 4) {
		value &= ~4;
		writel(value, itssprc);
	}
	iounmap(itssprc);
}

I was hoping I could send a workaround patch here, but I'm not sure of
an appropriate way to detect that we are on an Intel Apollo Lake
platform. This timer stuff happens during early boot; the early quirks
in pci/quirks.c run too late for this. Suggestions appreciated.
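
The obvious thing would be a CPU model check, if boot_cpu_data is
populated early enough at that point - purely illustrative:

	if (boot_cpu_data.x86_vendor == X86_VENDOR_INTEL &&
	    boot_cpu_data.x86 == 6 &&
	    boot_cpu_data.x86_model == INTEL_FAM6_ATOM_GOLDMONT)
		quirk_intel_apl_8254();

but I don't know whether that holds at the point where the timer setup
runs.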

Poking at other angles, I tried taking the HPET ACPI table from
another (working) Intel N3350 system and putting it in the initrd as
an override. This makes the HPET work fine, at which point Linux boots
OK without having to touch the (BIOS-crippled) PIT.

I also spotted that GRUB was previously affected by this BIOS-level
behaviour change.
http://git.savannah.gnu.org/cgit/grub.git/commit/?id=446794de8da4329ea532cbee4ca877bcafd0e534
Apparently GRUB used to rely on the 8254 PIT too, but it now uses the
pmtimer for TSC calibration instead. I guess the originally-affected
platforms only ran into GRUB freezing here (as opposed to both GRUB
and Linux freezing) because those platforms had a working HPET,
meaning that Linux was 

Re: [PATCH] pinctrl: intel: save HOSTSW_OWN register over suspend/resume

2019-03-28 Thread Daniel Drake
On Thu, Mar 28, 2019 at 5:17 PM Andy Shevchenko
 wrote:
> Hmm... Can you confirm that laptop you declared as a fixed case and the
> mentioned here is the same one?

They are definitely not the same exact unit - originally we had a
pre-production sample, and now we briefly diagnosed a real production
unit that was sold to a customer. There could be subtle motherboard
variations as you mention.

> If it's the case, I recommend to ping Asus again and make them check and fix.

We'll keep an eye open for any opportunities to go deeper here.
However further investigation on both our side and theirs is blocked
by not having any of the affected hardware (since the models are now
so old), so I'm not very optimistic that we'll be able to make
progress there.

> Meanwhile, Mika's proposal sounds feasible and not so intrusive. We may
> implement this later on.

Chris will work on implementing this for your consideration.

Thanks for the quick feedback!
Daniel


Re: [PATCH] pinctrl: intel: save HOSTSW_OWN register over suspend/resume

2019-03-27 Thread Daniel Drake
Hi Mika,

Digging up this old thread again...

On Tue, Nov 21, 2017 at 8:13 PM Mika Westerberg
 wrote:
>
> On Tue, Nov 21, 2017 at 07:54:26PM +0800, Chris Chiu wrote:
> > Yup, I checked the value of the corresponded pin. It shows following before
> > suspend
> > pin 18 (GPIO_18) GPIO 0x40800102 0x00024075
> >
> > Then after resume
> > pin 18 (GPIO_18) GPIO 0x40800102 0x00024075 [ACPI]
>
> OK, so ownership is changed to ACPI.
>
> > What else register do you suggest me to compare? The PADCFG2 is invalid
>
> It's fine APL does not have PADCFG2.
>
> Hmm, I don't understand how this can work in Windows either. The Windows
> people told me that they don't save and restore anything else than
> padcfg registers + ie. If the ownership is changed to ACPI it means you
> don't get interrupts anymore (only GPEs) and that applies to Windows as
> well.

In the mails after the one quoted above, we reported back to you that
the new BIOS from Asus solved the issue.

However, during the time that has followed, we have had numerous user
reports from Asus E403NA, X540NA, and X541NA laptops (basically the
same models that we originally discussed) where the touchpad stops
working after suspend/resume, even with the latest BIOS. We managed to
get an affected E403NA unit in-hands again, and confirmed that
HOSTSW_OWN was being lost like we had observed before.

Unfortunately as this was a customer laptop we had to return it
immediately, before we could investigate further. We don't have access
to any more units since they are old models now.

However I'm wondering if you have any other ideas or if you think
something like our workaround patch might be acceptable under these
circumstances:
https://github.com/endlessm/linux/commit/f391452299f62a3d0cbe5333be90f69e9895d8ff

Thanks
Daniel


Re: [PATCH 3/3] Bluetooth: btrtl: Skip initialization if firmware is already loaded

2019-01-24 Thread Daniel Drake
On Thu, Jan 24, 2019 at 11:23 PM Kai-Heng Feng
 wrote:
> Realtek bluetooth may not work after reboot:
> [   12.446130] Bluetooth: hci0: RTL: rtl: unknown IC info, lmp subver a99e, 
> hci rev 826c, hci ver 0008
>
> The power is not cut during system reboot, so the firmware is kept in
> Bluetooth controller.
>
> Realtek bluetooth doesn't have the ability to check firmware loading
> status. but the version queried by HCI_OP_READ_LOCAL_VERSION will be
> different if firmware is already loaded. Realtek's own fork, rtk_btusb
> also use this method to detect the loading status.
>
> So let's assume the firmware is already loaded when we can't find
> matching IC info.

This logic was already present in the driver - but it looks like this
regressed at this point:

commit 26503ad25de8c7c93a2037f919c2e49a62cf65f1
Author: Martin Blumenstingl 
Date:   Thu Aug 2 16:57:13 2018 +0200

Bluetooth: btrtl: split the device initialization into smaller parts

After your patch it is effectively there in two places now, since it
is also in btrtl_download_firmware() (although not really effective
after the above commit). I wonder if that can be cleaned up to avoid
duplication.

Regarding the other patches that move away from the style of returning
either a useful value or an error, is this purely a stylistic thing or
is it needed for your 3rd patch? I don't have strong feelings either
way but I have the impression that the currently implemented approach
is a common style within kernel code and I don't see benefit in
splitting off a separate out parameter.

Daniel


> Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=201921
> Signed-off-by: Kai-Heng Feng 
> ---
>  drivers/bluetooth/btrtl.c | 10 ++
>  1 file changed, 6 insertions(+), 4 deletions(-)
>
> diff --git a/drivers/bluetooth/btrtl.c b/drivers/bluetooth/btrtl.c
> index c36f500d8313..e2f89d57dd14 100644
> --- a/drivers/bluetooth/btrtl.c
> +++ b/drivers/bluetooth/btrtl.c
> @@ -546,9 +546,10 @@ int btrtl_initialize(struct hci_dev *hdev,
> hdev->bus);
>
> if (!btrtl_dev->ic_info) {
> -   rtl_dev_err(hdev, "rtl: unknown IC info, lmp subver %04x, hci 
> rev %04x, hci ver %04x",
> -   lmp_subver, hci_rev, hci_ver);
> -   return -EINVAL;
> +   rtl_dev_info(hdev, "rtl: unknown IC info, lmp subver %04x, 
> hci rev %04x, hci ver %04x",
> +lmp_subver, hci_rev, hci_ver);
> +   rtl_dev_info(hdev, "rtl: firmware may be already loaded, or 
> it's an unsupported IC.");
> +   return 0;
> }
>
> if (btrtl_dev->ic_info->has_rom_version) {
> @@ -621,7 +622,8 @@ int btrtl_setup_realtek(struct hci_dev *hdev)
> if (ret)
> return ret;
>
> -   ret = btrtl_download_firmware(hdev, &btrtl_dev);
> +   if (btrtl_dev.ic_info)
> +   ret = btrtl_download_firmware(hdev, &btrtl_dev);
>
> return ret;
>  }
> --
> 2.17.1
>


[tip:x86/urgent] x86/kaslr: Fix incorrect i8254 outb() parameters

2019-01-11 Thread tip-bot for Daniel Drake
Commit-ID:  7e6fc2f50a3197d0e82d1c0e86282976c9e6c8a4
Gitweb: https://git.kernel.org/tip/7e6fc2f50a3197d0e82d1c0e86282976c9e6c8a4
Author: Daniel Drake 
AuthorDate: Mon, 7 Jan 2019 11:40:24 +0800
Committer:  Thomas Gleixner 
CommitDate: Fri, 11 Jan 2019 21:35:47 +0100

x86/kaslr: Fix incorrect i8254 outb() parameters

The outb() function takes parameters value and port, in that order.  Fix
the parameters used in the kaslr i8254 fallback code.

Fixes: 5bfce5ef55cb ("x86, kaslr: Provide randomness functions")
Signed-off-by: Daniel Drake 
Signed-off-by: Thomas Gleixner 
Cc: b...@alien8.de
Cc: h...@zytor.com
Cc: li...@endlessm.com
Cc: sta...@vger.kernel.org
Link: https://lkml.kernel.org/r/20190107034024.15005-1-dr...@endlessm.com

---
 arch/x86/lib/kaslr.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/x86/lib/kaslr.c b/arch/x86/lib/kaslr.c
index 79778ab200e4..a53665116458 100644
--- a/arch/x86/lib/kaslr.c
+++ b/arch/x86/lib/kaslr.c
@@ -36,8 +36,8 @@ static inline u16 i8254(void)
u16 status, timer;
 
do {
-   outb(I8254_PORT_CONTROL,
-I8254_CMD_READBACK | I8254_SELECT_COUNTER0);
+   outb(I8254_CMD_READBACK | I8254_SELECT_COUNTER0,
+I8254_PORT_CONTROL);
status = inb(I8254_PORT_COUNTER0);
timer  = inb(I8254_PORT_COUNTER0);
timer |= inb(I8254_PORT_COUNTER0) << 8;


Re: APIC timer checked before it is set up, boot fails on Connex L1430

2019-01-06 Thread Daniel Drake
On Fri, Dec 28, 2018 at 2:11 PM Daniel Drake  wrote:
> On the Connex L1430 laptop based on Intel Apollo Lake N3350, Linux
> doesn't boot. It hangs early on a blank screen. Reproduced with Linus
> git, 4.18 and 4.19 (there is no previous known working kernel
> version). EFI earlyprintk shows:
>
> APIC: switch to symmetric I/O mode setup
> x2apic: IRQ remapping doesn't support X2APIC mode
> ..TIMER: vector=0x30 apic1=0 pin1=2 apic2=-1 pin2=-1
> ..MP-BIOS bug: 8254 timer not connected to IO-APIC
> ...trying to set up timer (IRQ0) through the 8259A ...
> . (found apic 0 pin 2) ...
> ... failed.
> ...trying to set up timer as Virtual Wire IRQ...
> . failed.
> ...trying to set up timer as ExtINT IRQ...
> do_IRQ: 0.55 No irq handler for vector
> . failed :(.
> Kernel panic - not syncing: IO-APIC + timer doesn't work! Boot with
> apic=debug and send a report.
>
> Looking closer, check_timer() is observing that the IOAPIC timer
> doesn't tick, so it then tries some other approaches but doesn't
> manage to get them working either.
>
> The strange thing is, I booted with the no_timer_check parameter and
> the system works fine! With this parameter it assumes the IOAPIC timer
> is ticking and just continues the boot sequence anyway. Here is the
> boot log with apic=debug no_timer_check:
> https://gist.github.com/dsd/6f40d8ecc7102dd5dcb90c5dedc69214#file-dmesg-txt
>
> /proc/interrupts shows that APIC Local timer interrupts are working
> fine on both CPUs:
> https://gist.github.com/dsd/6f40d8ecc7102dd5dcb90c5dedc69214#file-interrupts-txt
>
> So, check_timer() is incorrectly deducing that the IOAPIC timer isn't
> working. The way it checks this is to do a delay loop and then check
> if jiffies has advanced. I presume the expectation here is that during
> this delay, the hardware IRQ will invoke local_apic_timer_interrupt()
> which will then increment jiffies. Indeed, during check_timer()
> execution this interrupt does not fire, however by using
> no_timer_check and adding a log message I can see that it fires for
> the first time quite some time after check_timer() is done:
>
>  ..TIMER: vector=0x30 apic1=0 pin1=2 apic2=-1 pin2=-1
>  clocksource: tsc-early: mask: 0x max_cycles:
> 0xfc66f4fc7c, max_idle_ns: 440795224246 ns
>  Calibrating delay loop (skipped), value calculated using timer
> frequency.. 2188.80 BogoMIPS (lpj=1094400)
>  pid_max: default: 32768 minimum: 301
>  LSM: Security Framework initializing
>  SELinux:  Initializing.
>  Dentry cache hash table entries: 262144 (order: 9, 2097152 bytes)
>  Inode-cache hash table entries: 131072 (order: 8, 1048576 bytes)
>  Mount-cache hash table entries: 4096 (order: 3, 32768 bytes)
>  Mountpoint-cache hash table entries: 4096 (order: 3, 32768 bytes)
>  mce: CPU supports 7 MCE banks
>  mce: CPU0: Thermal monitoring enabled (TM1)
>  Last level iTLB entries: 4KB 48, 2MB 0, 4MB 0
>  Last level dTLB entries: 4KB 0, 2MB 0, 4MB 0, 1GB 0
>  Spectre V2 : Spectre mitigation: kernel not compiled with retpoline;
> no mitigation available!
>  Freeing SMP alternatives memory: 44K
>  TSC deadline timer enabled
>  smpboot: CPU0: Intel(R) Celeron(R) CPU N3350 @ 1.10GHz (family: 0x6,
> model: 0x5c, stepping: 0x9)
>  Performance Events: PEBS fmt3+, Goldmont events, 32-deep LBR,
> full-width counters, Intel PMU driver.
>  ... version:4
>  ... bit width:  48
>  ... generic registers:  4
>  ... value mask: 
>  ... max period: 7fff
>  ... fixed-purpose events:   3
>  ... event mask: 0007000f
>  rcu: Hierarchical SRCU implementation.
>  smp: Bringing up secondary CPUs ...
>  !!! local_apic_timer_interrupt for the first time cpu0 !!!
>
> Experimenting further, I used the same approach of adding delays and
> checking for the interrupt during the delay to figure out at which
> precise point during the boot sequence the timer interrupt starts
> working. It's here:
>
> static void setup_APIC_timer(void)
> {
> [...]
> if (this_cpu_has(X86_FEATURE_TSC_DEADLINE_TIMER)) {
> [...]
> clockevents_config_and_register(levt,
> tsc_khz * (1000 / TSC_DIVISOR),
> 0xF, ~0UL);
> }
> }
>
> We reach clockevents_register_device() which does:
>  1. Take a spinlock and disable IRQs
>  2. lapic_set_oneshot() which leads to "TSC deadline timer enabled" message
>  3. lapic_next_deadline()
>  4. Spin unlock & re-enable IRQs
>
> At the exact point where IRQs are re-enabled above, which is at the
> time of return from clockevents_config_and_register(), timer
> interrupts start working.
>
>
> The ov

[PATCH] kaslr: fix incorrect i8254 outb parameters

2019-01-06 Thread Daniel Drake
The outb call takes parameters value and port, in that order.
Fix the parameters used in the kaslr i8254 fallback code.

Signed-off-by: Daniel Drake 
---
 arch/x86/lib/kaslr.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/x86/lib/kaslr.c b/arch/x86/lib/kaslr.c
index 79778ab200e4..a53665116458 100644
--- a/arch/x86/lib/kaslr.c
+++ b/arch/x86/lib/kaslr.c
@@ -36,8 +36,8 @@ static inline u16 i8254(void)
u16 status, timer;
 
do {
-   outb(I8254_PORT_CONTROL,
-I8254_CMD_READBACK | I8254_SELECT_COUNTER0);
+   outb(I8254_CMD_READBACK | I8254_SELECT_COUNTER0,
+I8254_PORT_CONTROL);
status = inb(I8254_PORT_COUNTER0);
timer  = inb(I8254_PORT_COUNTER0);
timer |= inb(I8254_PORT_COUNTER0) << 8;
-- 
2.19.1
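
For reference, the x86 port I/O helpers take the value first and the
port second, i.e. roughly (prototypes paraphrased, not copied from the
headers):

void outb(u8 value, u16 port);   /* write 'value' to I/O port 'port' */
u8 inb(u16 port);                /* read one byte from I/O port 'port' */

so the original call simply had its two arguments swapped.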



APIC timer checked before it is set up, boot fails on Connex L1430

2018-12-27 Thread Daniel Drake
Hi,

On the Connex L1430 laptop based on Intel Apollo Lake N3350, Linux
doesn't boot. It hangs early on a blank screen. Reproduced with Linus
git, 4.18 and 4.19 (there is no previous known working kernel
version). EFI earlyprintk shows:

APIC: switch to symmetric I/O mode setup
x2apic: IRQ remapping doesn't support X2APIC mode
..TIMER: vector=0x30 apic1=0 pin1=2 apic2=-1 pin2=-1
..MP-BIOS bug: 8254 timer not connected to IO-APIC
...trying to set up timer (IRQ0) through the 8259A ...
. (found apic 0 pin 2) ...
... failed.
...trying to set up timer as Virtual Wire IRQ...
. failed.
...trying to set up timer as ExtINT IRQ...
do_IRQ: 0.55 No irq handler for vector
. failed :(.
Kernel panic - not syncing: IO-APIC + timer doesn't work! Boot with
apic=debug and send a report.

Looking closer, check_timer() is observing that the IOAPIC timer
doesn't tick, so it then tries some other approaches but doesn't
manage to get them working either.

The strange thing is, I booted with the no_timer_check parameter and
the system works fine! With this parameter it assumes the IOAPIC timer
is ticking and just continues the boot sequence anyway. Here is the
boot log with apic=debug no_timer_check:
https://gist.github.com/dsd/6f40d8ecc7102dd5dcb90c5dedc69214#file-dmesg-txt

/proc/interrupts shows that APIC Local timer interrupts are working
fine on both CPUs:
https://gist.github.com/dsd/6f40d8ecc7102dd5dcb90c5dedc69214#file-interrupts-txt

So, check_timer() is incorrectly deducing that the IOAPIC timer isn't
working. The way it checks this is to do a delay loop and then check
if jiffies has advanced. I presume the expectation here is that during
this delay, the hardware IRQ will invoke local_apic_timer_interrupt()
which will then increment jiffies. Indeed, during check_timer()
execution this interrupt does not fire, however by using
no_timer_check and adding a log message I can see that it fires for
the first time quite some time after check_timer() is done:

 ..TIMER: vector=0x30 apic1=0 pin1=2 apic2=-1 pin2=-1
 clocksource: tsc-early: mask: 0x max_cycles:
0xfc66f4fc7c, max_idle_ns: 440795224246 ns
 Calibrating delay loop (skipped), value calculated using timer
frequency.. 2188.80 BogoMIPS (lpj=1094400)
 pid_max: default: 32768 minimum: 301
 LSM: Security Framework initializing
 SELinux:  Initializing.
 Dentry cache hash table entries: 262144 (order: 9, 2097152 bytes)
 Inode-cache hash table entries: 131072 (order: 8, 1048576 bytes)
 Mount-cache hash table entries: 4096 (order: 3, 32768 bytes)
 Mountpoint-cache hash table entries: 4096 (order: 3, 32768 bytes)
 mce: CPU supports 7 MCE banks
 mce: CPU0: Thermal monitoring enabled (TM1)
 Last level iTLB entries: 4KB 48, 2MB 0, 4MB 0
 Last level dTLB entries: 4KB 0, 2MB 0, 4MB 0, 1GB 0
 Spectre V2 : Spectre mitigation: kernel not compiled with retpoline;
no mitigation available!
 Freeing SMP alternatives memory: 44K
 TSC deadline timer enabled
 smpboot: CPU0: Intel(R) Celeron(R) CPU N3350 @ 1.10GHz (family: 0x6,
model: 0x5c, stepping: 0x9)
 Performance Events: PEBS fmt3+, Goldmont events, 32-deep LBR,
full-width counters, Intel PMU driver.
 ... version:4
 ... bit width:  48
 ... generic registers:  4
 ... value mask: 
 ... max period: 7fff
 ... fixed-purpose events:   3
 ... event mask: 0007000f
 rcu: Hierarchical SRCU implementation.
 smp: Bringing up secondary CPUs ...
 !!! local_apic_timer_interrupt for the first time cpu0 !!!

Experimenting further, I used the same approach of adding delays and
checking for the interrupt during the delay to figure out at which
precise point during the boot sequence the timer interrupt starts
working. It's here:

static void setup_APIC_timer(void)
{
[...]
if (this_cpu_has(X86_FEATURE_TSC_DEADLINE_TIMER)) {
[...]
clockevents_config_and_register(levt,
tsc_khz * (1000 / TSC_DIVISOR),
0xF, ~0UL);
}
}

We reach clockevents_register_device() which does:
 1. Take a spinlock and disable IRQs
 2. lapic_set_oneshot() which leads to "TSC deadline timer enabled" message
 3. lapic_next_deadline()
 4. Spin unlock & re-enable IRQs

At the exact point where IRQs are re-enabled above, which is at the
time of return from clockevents_config_and_register(), timer
interrupts start working.
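
For reference, the jiffies probe that check_timer() relies on boils
down to something like this (a simplified sketch of timer_irq_works()
in arch/x86/kernel/apic/io_apic.c, not the exact code):

static int timer_probe_works(void)
{
        unsigned long t1 = jiffies;

        local_irq_enable();
        mdelay((10 * 1000) / HZ);   /* wait roughly ten ticks of wall time */
        local_irq_disable();

        /* jiffies only advances if the timer interrupt actually fired */
        return time_after(jiffies, t1 + 4);
}

which is why a timer interrupt that only starts firing after
clockevents_config_and_register() returns cannot satisfy it.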


The overall ordering here seems surprising. check_timer() is probing
whether the APIC timer works well before setup_APIC_timer() has been
called. Shouldn't the timer be checked only after it has been set up?

Or is Linux assuming that the BIOS will boot with the APIC timer
already running?

Any other debugging suggestions much appreciated.

Thanks
Daniel


Re: [PATCH] ACPI / battery: Fix reporting "Not charging" when capacity is 100%

2018-11-06 Thread Daniel Drake
On Mon, Nov 5, 2018 at 1:19 AM Pavel Machek  wrote:
> Plus, I don't think "100% charge" is right test for "battery full". At
> least on thinkpads, there's configuration option, and it is common
> _not_ to charge batterry above 95% or so (to increase its lifetime).

Hans also touched on this area in his response:

> As for this kernel-side fix I do not believe that fixing thus in
> the kernel is the right thing to do. We try to stay away from
> heuristics using full_charge_capacity in the kernel since that
> is not really reliable / deterministic.

I'm not fully convinced by this argument though.

The ACPI spec is not very clear on what conditions you should apply to
decide when the battery is full. Instead, ACPI seems to provide a
pretty decent amount of data, and the decision about whether to
interpret that as "battery full" is left for consumers.

The best that the ACPI spec offers here is the _BIF/_BIX "Last Full
Charge Capacity" field (known as full_charge_capacity in this driver),
and it does appear to account for the "only charge to 95%" case,
although it is only documented as a prediction. There is also a sample
calculation (section 3.9.3) of how to calculate the Remaining Battery
Percentage which uses "Last Full Charge Capacity" too.
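
For reference, the spec's sample calculation is roughly the following
(paraphrasing from memory, so double-check against section 3.9.3):

Remaining Battery Percentage [%] =
    Battery Remaining Capacity [mAh/mWh] / Last Full Charge Capacity [mAh/mWh] * 100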

Without a crystal clear definition of battery full, it seems up to the
consumers of the data to decide when the battery is full via some form
of heuristic. And this is exactly what acpi-battery already does in
acpi_battery_is_charged() - and the main aspect is looking at
full_charge_capacity. The result of this is then sent to userspace.
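
To make that concrete, the existing heuristic is roughly of this shape
(an illustrative sketch from memory, not a verbatim copy of
drivers/acpi/battery.c; battery_looks_full is a made-up name):

static bool battery_looks_full(struct acpi_battery *battery)
{
        /* any charging/discharging/critical flag means "not full" */
        if (battery->state != 0)
                return false;

        /* main heuristic: we have caught up with the last full charge */
        if (battery->capacity_now == battery->full_charge_capacity)
                return true;

        return false;
}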

So it does not seem fair to say that "you can't use basic heuristics
around full_charge_capacity" because the kernel already does (and this
is what the ACPI spec encourages), although JP's patch could be
improved to more clearly and more completely follow the logic in
acpi_battery_is_charged().

The question of whether we would want to interpret the numbers in this
way when DISCHARGING is set is a good one though. The ACPI spec
doesn't shine much light on this, but the calculations related to
battery percentage and battery life in section 3.9.3 at least do not
seem to consider charging/discharging status.

Thanks
Daniel


Re: [PATCH 0/9] psi: pressure stall information for CPU, memory, and IO v4

2018-09-16 Thread Daniel Drake
Hi Suren

On Fri, Sep 7, 2018 at 11:58 PM, Suren Baghdasaryan  wrote:
> Thanks for the new patchset! Backported to 4.9 and retested on ARMv8 8
> code system running Android. Signals behave as expected reacting to
> memory pressure, no jumps in "total" counters that would indicate an
> overflow/underflow issues. Nicely done!

Can you share your Linux v4.9 psi backport somewhere?

Thanks
Daniel


Re: [PATCH 0/9] psi: pressure stall information for CPU, memory, and IO v4

2018-09-07 Thread Daniel Drake
On Thu, Sep 6, 2018 at 5:43 AM, Johannes Weiner  wrote:
> Peter, do the changes from v3 look sane to you?
>
> If there aren't any further objections, I was hoping we could get this
> lined up for 4.20.

That would be excellent. I just retested the latest version at
http://git.cmpxchg.org/cgit.cgi/linux-psi.git (Linux 4.18) and the
results are great.

Test setup:
Endless OS
GeminiLake N4200 low end laptop
2GB RAM
swap (and zram swap) disabled

Baseline test: open a handful of large-ish apps and several website
tabs in Google Chrome.
Results: after a couple of minutes, system is excessively thrashing,
mouse cursor can barely be moved, UI is not responding to mouse
clicks, so it's impractical to recover from this situation as an
ordinary user

Add my simple killer:
https://gist.github.com/dsd/a8988bf0b81a6163475988120fe8d9cd
Results: when the thrashing causes the UI to become sluggish, the
killer steps in and kills something (usually a chrome tab), and the
system remains usable. I repeatedly opened more apps and more websites
over a 15 minute period but I wasn't able to get the system to a point
of UI unresponsiveness.
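
For anyone who doesn't want to follow the gist link, the killer is
conceptually no more than this (a rough C sketch with made-up
thresholds and a stubbed-out kill step; the real thing lives in the
gist above):

#include <stdio.h>
#include <unistd.h>

int main(void)
{
        for (;;) {
                /* /proc/pressure/memory starts with a line like:
                 * some avg10=0.00 avg60=0.00 avg300=0.00 total=0 */
                FILE *f = fopen("/proc/pressure/memory", "r");
                float some_avg10 = 0.0f;

                if (f) {
                        fscanf(f, "some avg10=%f", &some_avg10);
                        fclose(f);
                }

                /* arbitrary threshold: sustained stalls => kill a victim */
                if (some_avg10 > 30.0f)
                        printf("pressure %.2f: would pick and kill a victim here\n",
                               some_avg10);

                sleep(1);
        }
}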

Thanks,
Daniel


Re: Built in PS2 keyboard in new ASUS/acer laptops can not wake up system after s2idle

2018-08-16 Thread Daniel Drake
On Mon, Aug 6, 2018 at 7:17 PM, Rafael J. Wysocki  wrote:
>> 'echo enabled > /sys/devices/platform/i8042/serio0/power/wakeup' can get the
>> keyboard wake up the system as expected. We considered to work out a DMI
>> based quirk for this. But based on the information that EC would not signal 
>> SCI
>> event for WiskyLake models, we'd like to know if there's any generic 
>> solution?
>> Maybe a 'udev' rule to identify WiskyLake NoteBook then enable the keyboard
>> wake up?
>
> A udev rule sounds like a good idea to me.

What would the udev rule look like though?

Match for Intel CPU generation (WhiskyLake) and laptop chassis type
and then enable i8042 wakeups? While that seems like the most accurate
reflection of the situation which we are seeing across multiple
vendors, it doesn't feel right and seems unlikely to be accepted by
systemd upstream.
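
For illustration only, the most literal rule I can think of would be
something along these lines (hypothetical and untested, and note it has
no way to express the "only on these new platforms" part, which is
exactly the problem):

# enable wakeup on the i8042 KBD port unconditionally
ACTION=="add", SUBSYSTEM=="serio", KERNEL=="serio0", ATTR{power/wakeup}="enabled"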

In previous designs, pressing a key while the system was in S3 sleep
would cause a SCI interrupt due to the firing of the EC GPE, which
effectively meant that keyboard wakeups were on by default and could
not be disabled. Also USB keyboards have wakeups on by default (see
usbhid_start()). Just these new platforms have this
unfortunate/accidental behaviour change...

Would it make sense to turn i8042 wakeups on by default on the kernel
side? I don't know if any particular conditions are applied, but that
would appear to be the default Win10 behaviour here.

Daniel


Making direct reclaim fail when thrashing

2018-07-27 Thread Daniel Drake
Split from the thread
  [PATCH 0/10] psi: pressure stall information for CPU, memory, and IO v2
where we were discussing if/how to make the direct reclaim codepath
fail if we're excessively thrashing, so that the OOM killer might
step in. This is potentially desirable when the thrashing is so bad
that the UI stops responding, causing the user to pull the plug.

On Tue, Jul 17, 2018 at 7:23 AM, Michal Hocko  wrote:
> mm/workingset.c allows for tracking when an actual page got evicted.
> workingset_refault tells us whether a give filemap fault is a recent
> refault and activates the page if that is the case. So what you need is
> to note how many refaulted pages we have on the active LRU list. If that
> is a large part of the list and if the inactive list is really small
> then we know we are trashing. This all sounds much easier than it will
> eventually turn out to be of course but I didn't really get to play with
> this much.

Apologies in advance for any silly mistakes or terrible code that
follows, as I am not familiar with this part of the kernel.

As mentioned in my last mail, knowing if a page on the active list was
refaulted into place appears not trivial, because the eviction information
was lost upon refault (it was stored in the page cache shadow entry).

Here I'm experimenting by adding another tag to the page cache radix tree,
tagging pages that were activated in the refault path.

And then in get_scan_count I'm checking how many active pages have that
tag, and also looking at the size of the active and inactive lists.

It takes a performance hit (probably due to looping over the whole
active list and doing lots of locking?) but I figured it might serve
as one step forward.

The results are not exactly as I would expect. Upon launching 20
processes that allocate and memset 100mb RAM each, exhausting all RAM
(and no swap available), the kernel starts thrashing and I get numbers
like:

 get_scan_count lru1 active=422714 inactive=19595 refaulted=0
 get_scan_count lru3 active=832 inactive=757 refaulted=21

Lots of active anonymous pages (lru1), and none refaulted, perhaps
not surprising because it can't swap them out, no swap available.

But only a few file pages on the lists (lru3), and only a tiny number
of refaulted ones, which doesn't line up with your suggestion of
detecting when a large part of the active list is made up of refaulted
pages.

Any further suggestions appreciated.

Thanks
Daniel
---
 include/linux/fs.h |  1 +
 include/linux/radix-tree.h |  2 +-
 mm/filemap.c   |  2 ++
 mm/vmscan.c| 37 +
 4 files changed, 41 insertions(+), 1 deletion(-)

diff --git a/include/linux/fs.h b/include/linux/fs.h
index d85ac9d24bb3..45f94ffd1c67 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -462,6 +462,7 @@ struct block_device {
 #define PAGECACHE_TAG_DIRTY0
 #define PAGECACHE_TAG_WRITEBACK1
 #define PAGECACHE_TAG_TOWRITE  2
+#define PAGECACHE_TAG_REFAULTED3
 
 int mapping_tagged(struct address_space *mapping, int tag);
 
diff --git a/include/linux/radix-tree.h b/include/linux/radix-tree.h
index 34149e8b5f73..86eccb71ef7e 100644
--- a/include/linux/radix-tree.h
+++ b/include/linux/radix-tree.h
@@ -65,7 +65,7 @@ static inline bool radix_tree_is_internal_node(void *ptr)
 
 /*** radix-tree API starts here ***/
 
-#define RADIX_TREE_MAX_TAGS 3
+#define RADIX_TREE_MAX_TAGS 4
 
 #ifndef RADIX_TREE_MAP_SHIFT
 #define RADIX_TREE_MAP_SHIFT   (CONFIG_BASE_SMALL ? 4 : 6)
diff --git a/mm/filemap.c b/mm/filemap.c
index 250f675dcfb2..9a686570dc75 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -917,6 +917,8 @@ int add_to_page_cache_lru(struct page *page, struct 
address_space *mapping,
 */
if (!(gfp_mask & __GFP_WRITE) &&
shadow && workingset_refault(shadow)) {
+   radix_tree_tag_set(&mapping->i_pages, page_index(page),
+  PAGECACHE_TAG_REFAULTED);
SetPageActive(page);
workingset_activation(page);
} else
diff --git a/mm/vmscan.c b/mm/vmscan.c
index 03822f86f288..79bc810b43bb 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -2102,6 +2102,30 @@ enum scan_balance {
SCAN_FILE,
 };
 
+
+static int count_refaulted(struct lruvec *lruvec, enum lru_list lru) {
+   int nr_refaulted = 0;
+   struct page *page;
+
+   list_for_each_entry(page, &lruvec->lists[lru], lru) {
+   /* Lookup page cache entry from page following the approach
+* taken in __set_page_dirty_nobuffers */
+   unsigned long flags;
+   struct address_space *mapping = page_mapping(page);
+   if (!mapping)
+   continue;
+
+   xa_lock_irqsave(&mapping->i_pages, flags);
+   BUG_ON(page_mapping(page) != mapping);
+   nr_refaulted += radix_tree_tag_get(&mapping->i_pages,
+

Re: [PATCH 2/2] pinctrl/amd: use byte access to clear irq/wake status bits

2018-07-24 Thread Daniel Drake
On Fri, Jul 20, 2018 at 6:38 PM, Daniel Kurtz  wrote:
> Sounds reasonable.  How about:
>
> -   /* Clear interrupt.
> -* We must read the pin register again, in case the
> -* value was changed while executing
> -* generic_handle_irq() above.
> +   /*
> +* Write-1-to-clear irq/wake status bits in MSByte.
> +* All other bits in this byte are read-only.
> +* This avoids modifying the lower 24-bits
> because they may have
> +*  changed while executing generic_handle_irq() 
> above.
>  */

That looks good. Thanks

Daniel


Re: [PATCH 2/2] pinctrl/amd: use byte access to clear irq/wake status bits

2018-07-17 Thread Daniel Drake
On Mon, Jul 16, 2018 at 7:57 PM, Daniel Kurtz  wrote:
> Commit 6afb10267c1692 ("pinctrl/amd: fix masking of GPIO interrupts")
> changed to the clearing of interrupt status bits to a RMW in a critical
> section.  This works, but is a bit overkill.
>
> The relevant interrupt/wake status bits are in the Most Significant Byte
> of a 32-bit word.  These two are the only write-able bits in this byte.

I don't have the hardware to test this any more, and I also don't have
any docs to double-check if those are really the only writable bits, but
looking at the existing driver code it does seem to be the case.
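
For anyone skimming this later, the proposed byte access amounts to
something like the following (an illustrative sketch, not the actual
patch; names are made up):

static void clear_irq_wake_status(void __iomem *pin_reg)
{
        u32 regval = readl(pin_reg);

        /* write back only the MSByte: the two write-1-to-clear status
         * bits live there and the other bits in that byte are read-only,
         * so the lower 24 bits are never rewritten with a stale value */
        writeb((u8)(regval >> 24), pin_reg + 3);
}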

I think you should retain the comment noting that the value of the
register may have changed since it was read just a few lines above
(and hence explaining more precisely why we make the special effort
just to modify the MSB), just in case there is further rework of this
code in future and we end up walking into the same trap. It was one of
those issues that took a frustratingly long time to figure out...

Thanks
Daniel


Re: [PATCH v2 1/2] platform/x86: asus-wmi: Call led hw_changed API on kbd brightness change

2018-06-19 Thread Daniel Drake
Hi,

On Thu, Jun 14, 2018 at 1:58 AM, Chris Chiu  wrote:
>
> On Wed, Jun 13, 2018 at 8:49 PM, Andy Shevchenko
>  wrote:
> > On Mon, Jun 11, 2018 at 10:18 AM, Chris Chiu  wrote:
> >> Make asus-wmi notify on hotkey kbd brightness changes, listen for
> >> brightness events and update the brightness directly in the driver.
> >
> >> For this purpose, bound check on brightness in kbd_led_set must be
> >> based on the same data type to prevent illegal value been set.
> >
> >> @@ -497,9 +498,9 @@ static void kbd_led_set(struct led_classdev *led_cdev,
> >>
> >> asus = container_of(led_cdev, struct asus_wmi, kbd_led);
> >>
> >> -   if (value > asus->kbd_led.max_brightness)
> >> +   if ((int)value > (int)asus->kbd_led.max_brightness)
> >> value = asus->kbd_led.max_brightness;
> >> -   else if (value < 0)
> >> +   else if ((int)value < 0)
> >> value = 0;
> >
> > I didn't quite understand this part of the problem.
> > Does it exist in the current code? Can it be split to a separate change?
> >
> > Can we avoid those ugly castings?
> >
>
> I'd like to remove the ugly castings but there's a concern I may need some
> advices. I don't know whether if the bound check logic ever verified before.
> Maybe the value passed via sysfs is already correctly bounded?

The casts come from the underlying need to limit the minimum and
maximum brightness within available bounds, plus difficulties doing
that when you are dealing with an enum data type.

kbd_led_set is a function pointer (from led_classdev.brightness_set)
which has this definition:

void (*brightness_set)(struct led_classdev *led_cdev, enum
led_brightness brightness);

It seems that the compiler has the choice of whether to use a signed
or unsigned type for enums. According to your tests, and a quick test
app below, it seems like gcc is using unsigned.

  #include <stdio.h>
  enum led_brightness { LED_OFF = 0 };
  int main(void) {
enum led_brightness tmp = -1;
if (tmp < 0)
  printf("less than zero\n");
else
  printf("gt zero\n");
  }

This test app prints "gt zero"

led-class.c:brightness_store() uses kstrtoul() so there is no chance
of passing a negative value through this codepath, as you suspected.
But we do need to do something with negative bounds for the code that
you are adding here.

I suggest doing it like this:

static void __kbd_led_set(struct led_classdev *led_cdev, int value)
{
struct asus_wmi *asus;

asus = container_of(led_cdev, struct asus_wmi, kbd_led);

if (value > asus->kbd_led.max_brightness)
value = asus->kbd_led.max_brightness;
else if (value < 0)
value = 0;

asus->kbd_led_wk = value;
queue_work(asus->led_workqueue, &asus->kbd_led_work);
}

static void kbd_led_set(struct led_classdev *led_cdev,
enum led_brightness value)
{
return __kbd_led_set(led_cdev, value);
}

Now kbd_led_set can continue being a correctly typed function pointer
for led_classdev.brightness_set. And from the code you are adding here
you can call __kbd_led_set directly with signed integer values, and
rely on correct bounds correction without ugly casts.

Andy, what do you think?

Thanks
Daniel


Re: [PATCH 1/2] platform/x86: asus-wmi: Call new led hw_changed API on kbd brightness change

2018-06-04 Thread Daniel Drake
On Mon, Jun 4, 2018 at 7:22 AM, Hans de Goede  wrote:
> Is this really a case of the hardware itself processing the
> keypress and then changing the brightness *itself* ?
>
> From the "[PATCH 2/2] platform/x86: asus-wmi: Add keyboard backlight
> toggle support" patch I get the impression that the driver is
> modifying the brightness from within the kernel rather then the
> keyboard controller are ACPI embeddec-controller doing it itself.
>
> If that is the case then the right fix is for the driver to stop
> mucking with the brighness itself, it should simply report the
> right keyboard events and export a led interface and then userspace
> will do the right thing (and be able to offer flexible policies
> to the user).

Before this modification, the driver reports the brightness keypresses
to userspace and then userspace can respond by changing the brightness
level, as you describe.

You are right in that the hardware doesn't change the brightness
directly itself, which is the normal usage of LED_BRIGHT_HW_CHANGED.
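
For context, the hw_changed path boils down to the driver advertising
the capability and then emitting the event itself, roughly (a sketch,
not the actual asus-wmi patch):

/* at LED registration time */
asus->kbd_led.flags |= LED_BRIGHT_HW_CHANGED;

/* later, when the driver changes the level in response to the hotkey */
led_classdev_notify_brightness_hw_changed(&asus->kbd_led, new_value);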

However this approach was suggested by Benjamin Berg and Bastien
Nocera in the thread: Re: [PATCH v2] platform/x86: asus-wmi: Add
keyboard backlight toggle support
https://marc.info/?l=linux-kernel&m=152639169210655&w=2

The issue is that we need to support a new "keyboard backlight
brightness cycle" key (in the patch that follows this one) which
doesn't fit into any definitions of keys recognised by the kernel and
likewise there's no userspace code to handle it.

If preferred we could leave the standard brightness keys behaving as
they are (input events) and make the new special key type directly
handled by the kernel?

Daniel

