RE: [RFC PATCH v6 1/3] ACPI / EC: Fix possible driver order issue by moving EC event handling earlier

2017-11-23 Thread Zheng, Lv
Hi, Rui

> From: Zhang, Rui
> Subject: RE: [RFC PATCH v6 1/3] ACPI / EC: Fix possible driver order issue by
> moving EC event handling earlier
> 
> 
> > From: linux-acpi-ow...@vger.kernel.org [mailto:linux-acpi-
> > Subject: [RFC PATCH v6 1/3] ACPI / EC: Fix possible driver order issue by
> > moving EC event handling earlier
> >
> > This patch tries to detect EC events earlier after resume, so that if an
> > event occurred before invoking acpi_ec_unblock_transactions(), it could be
> > detected by acpi_ec_unblock_transactions(), which is the earliest EC driver
> > call after resume.
> >
> > However, after the noirq stage, if an event occurred after
> > acpi_ec_unblock_transactions() and before acpi_ec_resume(), there was no
> > means to detect and trigger it right then; it can only be detected and
> > handled after acpi_ec_resume().
> >
> > Now the final logic is:
> > 1. If ec_freeze_events=Y, event handling is stopped in acpi_ec_suspend(),
> >    restarted in acpi_ec_resume();
> > 2. If ec_freeze_events=N, event handling is stopped in
> >    acpi_ec_block_transactions(), restarted in
> >    acpi_ec_unblock_transactions();
> > 3. In order to handle the conflict between the edge-triggered nature of
> >    the EC IRQ and the Linux noirq stage, advance_transaction() is invoked
> >    where the event handling is enabled and the noirq stage is ended.
> >
> > Known issue:
> > 1. An event occurring between acpi_ec_unblock_transactions() and
> >    acpi_ec_resume() may still lead to the order issue. This can only be
> >    fixed by adding a periodic detection mechanism during the noirq stage.
> >
> > Signed-off-by: Lv Zheng 
> > Tested-by: Tomislav Ivek 
> > Tested-by: Luya Tshimbalanga 
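
[Editorial sketch] Points 1 and 2 of the "final logic" in the patch
description can be read as a single decision. The following is a minimal
userspace sketch of that decision, not kernel code: the sketch_* names are
hypothetical, and the mapping from the helper's result to the two cases is an
interpretation of the quoted description and diff, not something the patch
states in this form.

```c
#include <stdbool.h>

/* Where EC event handling is stopped and restarted across a sleep cycle. */
enum sketch_ec_toggle {
	/* stopped in acpi_ec_suspend(), restarted in acpi_ec_resume() */
	SKETCH_TOGGLE_SUSPEND_RESUME,
	/* stopped in acpi_ec_block_transactions(),
	 * restarted in acpi_ec_unblock_transactions() */
	SKETCH_TOGGLE_BLOCK_UNBLOCK,
};

/*
 * Mirrors the shape of the helper added by the quoted diff. Both inputs are
 * passed in as plain booleans (an assumption of this sketch) so that it has
 * no kernel dependencies.
 */
static bool sketch_no_sleep_events(bool sleep_no_ec_events,
				   bool ec_freeze_events)
{
	return sleep_no_ec_events && ec_freeze_events;
}

/* Pick the stop/restart pair according to the helper's result. */
static enum sketch_ec_toggle sketch_toggle_point(bool sleep_no_ec_events,
						 bool ec_freeze_events)
{
	if (sketch_no_sleep_events(sleep_no_ec_events, ec_freeze_events))
		return SKETCH_TOGGLE_SUSPEND_RESUME;
	return SKETCH_TOGGLE_BLOCK_UNBLOCK;
}
```

Only when both conditions hold does the sketch choose the suspend/resume
pair; in every other combination event handling follows the
block/unblock-transactions pair.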
> 
> I don't know what issue this patch has been tested for. Lv, can you please 
> clarify?

The testers' names are listed here because they tried the commit and found no
regressions on their test platforms.

> 
> I agree with Lv that it can probably fix some issues caused by the device
> order issue.
> And I'll be glad to push this after we have verified it is really helpful.
> Lv,
> Do you still remember the bug report for the lid issue?

Maybe you should ask Benjamin.
Let me Cc him for further investigation.

Thanks,
Lv

> 
> Thanks,
> rui
> > ---
> >  drivers/acpi/ec.c | 35 ++-
> >  1 file changed, 26 insertions(+), 9 deletions(-)
> >
> > diff --git a/drivers/acpi/ec.c b/drivers/acpi/ec.c
> > index df84246..f1f320b 100644
> > --- a/drivers/acpi/ec.c
> > +++ b/drivers/acpi/ec.c
> > @@ -249,6 +249,11 @@ static bool acpi_ec_started(struct acpi_ec *ec)
> >  	       !test_bit(EC_FLAGS_STOPPED, &ec->flags);
> >  }
> >
> > +static bool acpi_ec_no_sleep_events(void)
> > +{
> > +	return acpi_sleep_no_ec_events() && ec_freeze_events;
> > +}
> > +
> >  static bool acpi_ec_event_enabled(struct acpi_ec *ec)
> >  {
> >  	/*
> > @@ -260,14 +265,14 @@ static bool acpi_ec_event_enabled(struct acpi_ec *ec)
> >  		return false;
> >  	/*
> >  	 * However, disabling the event handling is experimental for late
> > -	 * stage (suspend), and is controlled by the boot parameter of
> > -	 * "ec_freeze_events":
> > +	 * stage (suspend), and is controlled by
> > +	 * "acpi_ec_no_sleep_events()":
> >  	 * 1. true:  The EC event handling is disabled before entering
> >  	 *           the noirq stage.
> >  	 * 2. false: The EC event handling is automatically disabled as
> >  	 *           soon as the EC driver is stopped.
> >  	 */
> > -	if (ec_freeze_events)
> > +	if (acpi_ec_no_sleep_events())
> >  		return acpi_ec_started(ec);
> >  	else
> >  		return test_bit(EC_FLAGS_STARTED, &ec->flags);
> > @@ -524,8 +529,8 @@ static bool acpi_ec_query_flushed(struct acpi_ec *ec)
> >  static void __acpi_ec_flush_event(struct acpi_ec *ec)
> >  {
> >  	/*
> > -	 * When ec_freeze_events is true, we need to flush events in
> > -	 * the proper position before entering the noirq stage.
> > +	 * When acpi_ec_no_sleep_events() is true, we need to flush events
> > +	 * in the proper position before entering the noirq stage.
> >  	 */
> >  	wait_event(ec->wait, acpi_ec_query_flushed(ec));
> >  	if (ec_query_wq)
> > @@ -948,7 +953,8 @@ static void acpi_ec_start(struct acpi_ec *ec, bool resuming)
> >  		if (!resuming) {
> >  			acpi_ec_submit_request(ec);
> >  			ec_dbg_ref(ec, "Increase driver");
> > -		}
> > +		} else if (!acpi_ec_no_sleep_events())
> > +			__acpi_ec_enable_event(ec);
> >  		ec_log_drv("EC started");
> >  	}
> >  	spin_unlock_irqrestore(&ec->lock, flags);
> > @@ -980,7 +986,7 @@ static void acpi_ec_stop(struct acpi_ec *ec, bool suspending)
> >  		if (!suspending) {
> >  			acpi_ec_complete_request(ec);
> >  			ec_dbg_ref(ec, "Decrease driver");
> > -		} else if (!ec_freeze_events)
> > +		} else if 

RE: linux-4.14-rc2/drivers/acpi/acpica/utmath.c: 2 * suspicious expression ?

2017-09-27 Thread Zheng, Lv
Hi, David

It is not meant to be fancy; it simply clears the bits.
OK, the style has been changed here:
https://github.com/acpica/acpica/pull/321
And the bug is recorded here:
https://bugs.acpica.org/show_bug.cgi?id=1422

Thanks for the report.

Best regards
Lv

> From: David Binderman [mailto:dcb...@hotmail.com]
> Subject: linux-4.14-rc2/drivers/acpi/acpica/utmath.c: 2 * suspicious 
> expression ?
> 
> Hello there,
> 
> [linux-4.14-rc2/drivers/acpi/acpica/utmath.c:137] -> 
> [linux-4.14-rc2/drivers/acpi/acpica/utmath.c:137]:
> (style) Same expression on both sides of '^='.
> [linux-4.14-rc2/drivers/acpi/acpica/utmath.c:174] -> 
> [linux-4.14-rc2/drivers/acpi/acpica/utmath.c:174]:
> (style) Same expression on both sides of '^='.
> 
> operand_ovl.part.lo ^= operand_ovl.part.lo;
> 
> Is this just a fancy way to zero the field or should something more clever
> be happening ?
> 
> Regards
> 
> David Binderman
> 
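
[Editorial sketch] To illustrate the point of this exchange: XOR-ing a field
with itself always yields zero, so the flagged statement has the same effect
as assigning 0. The sketch below uses a hypothetical overlay union
(demo_overlay is not the real ACPICA type; only the field names mirror the
quoted statement) to compare the two spellings.

```c
#include <stdint.h>

/*
 * Hypothetical stand-in for a 64-bit value overlaid with two 32-bit halves.
 * The layout is an assumption made for illustration only.
 */
union demo_overlay {
	uint64_t full;
	struct {
		uint32_t lo;
		uint32_t hi;
	} part;
};

/* Clear the "lo" half by XOR-ing it with itself (the form cppcheck flagged). */
static inline void clear_lo_xor(union demo_overlay *ovl)
{
	ovl->part.lo ^= ovl->part.lo;
}

/* Clear the "lo" half by plain assignment (the more readable form). */
static inline void clear_lo_assign(union demo_overlay *ovl)
{
	ovl->part.lo = 0;
}
```

Both helpers leave the other half untouched and produce the same value; the
assignment form simply avoids the static-analysis warning.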


RE: [PATCH v4 0/3] ACPI / EC: Fix EC event handling issues

2017-09-26 Thread Zheng, Lv
Hi, Rafael

I'm now working on v5 of this series, which:
1. splits the root causes in a more detailed way,
2. clarifies the root causes in the patch descriptions with real bugs, and
3. is safer according to the known EC FW behaviors.
So you can discard v3/v4 from the patchwork site.
And I'll post v5 when everything is clear to me.

Thanks and best regards
Lv

> From: Zheng, Lv
> Subject: [PATCH v4 0/3] ACPI / EC: Fix EC event handling issues
> 
> EC events are special and are required to be handled during suspend/resume.
> But there is special logic in Linux causing several issues related to EC
> event handling:
> 1. During the noirq stage, Linux cannot detect EC events which are target
>    driven.
> 2. When EC event handling is enabled later than the other drivers, an order
>    problem could be observed.
> When fixing these problems, care should be taken not to trigger
> regressions of the following problem (which has already been fixed using
> a different approach):
> 3. When EC event handling is enabled before the end of the noirq stage,
>    EC event handling may get stuck.
> This patchset fixes these issues.
> 
> v4 of this patch series re-orders the fixes so that the fix of problem 2
> is independent of the fix of problem 1. This is done by refining the fix
> of problem 2, making it immune to problem 3.
> 
> Lv Zheng (3):
>   ACPI / EC: Fix possible driver order issue by moving EC event handling
>     earlier
>   ACPI / EC: Add event detection support for noirq stages
>   ACPI / EC: Enable noirq stage event detection
> 
>  drivers/acpi/ec.c       | 116 +---
>  drivers/acpi/internal.h |   1 +
>  2 files changed, 111 insertions(+), 6 deletions(-)
> 
> --
> 2.7.4



RE: [PATCH v3 1/4] ACPI / EC: Cleanup EC GPE mask flag

2017-08-22 Thread Zheng, Lv
Hi,

> From: Rafael J. Wysocki [mailto:r...@rjwysocki.net]
> Subject: Re: [PATCH v3 1/4] ACPI / EC: Cleanup EC GPE mask flag
> 
> On Friday, August 11, 2017 8:36:28 AM CEST Lv Zheng wrote:
> > EC_FLAGS_COMMAND_STORM is actually used to mask GPE during IRQ processing.
> > This patch cleans it up using more readable flag/function names.
> >
> > Signed-off-by: Lv Zheng 
> > Tested-by: Tomislav Ivek 
> 
> Applied, thanks!

Thanks!

> 
> I'm not sure about the rest of the series, though, but let me comment on the
> specific patches.

Though it is a long bug fix story, it's actually simple and can be summarized
in one line:
The patchset polls EC IRQs during the noirq stage to fix EC event handling
issues.

I submitted this solution several years ago using an IRQ polling kernel
thread.
You asked me to use a timer instead, so here you are.

Over these years, I tried solutions other than "polling the IRQ in the noirq
stage", but they didn't work perfectly.
The noirq stage makes the EC event handling fixes mutually exclusive:
fixing one issue can trigger a regression for the other.
And bug reports prove that we must handle EC events during the noirq stage.

So finally I picked the IRQ polling solution up again, and refreshed it to
follow your comment (using a timer instead of a kthread).

If you apply this patch and enable the EC debugging log with
"dyndbg='file ec.c +p'", you should be able to see some event handling logs
in the noirq stages.
Without this solution applied, the EC event handling log is silent during
the noirq stages.
That could be the only difference.

Thanks and best regards
Lv
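
[Editorial sketch] The one-line summary above can be made concrete with a
small userspace simulation (not kernel code). It shows why polling helps: an
edge-triggered interrupt raised while delivery is suppressed is never
delivered, but the status bit it leaves behind can still be found by a
periodic check. All names (sim_ec, sim_raise_event, sim_poll_tick) are
hypothetical and the model is deliberately simplified; the real patches use a
kernel timer.

```c
#include <stdbool.h>

struct sim_ec {
	bool noirq;        /* true while IRQ delivery is suppressed */
	bool sci_evt;      /* status bit: an event is pending in the EC */
	unsigned handled;  /* number of events the driver has handled */
};

/* Handle a pending event, if any, and clear the status bit. */
static void sim_handle(struct sim_ec *ec)
{
	if (ec->sci_evt) {
		ec->sci_evt = false;
		ec->handled++;
	}
}

/* Firmware raises an event: the edge fires once, at this moment only. */
static void sim_raise_event(struct sim_ec *ec)
{
	ec->sci_evt = true;
	if (!ec->noirq)
		sim_handle(ec);  /* edge delivered, handler runs */
	/* otherwise the edge is lost and only the status bit remains set */
}

/* One tick of the polling timer: look at the status bit directly. */
static void sim_poll_tick(struct sim_ec *ec)
{
	sim_handle(ec);
}
```

In this model, an event raised with noirq set stays unhandled until
sim_poll_tick() runs, which corresponds to the "silent" log described above
when no polling is in place.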


RE: [PATCH] actbl1.h: use tab instead of seven spaces as the indentation

2017-08-22 Thread Zheng, Lv
Hi,

> From: linux-acpi-ow...@vger.kernel.org 
> [mailto:linux-acpi-ow...@vger.kernel.org] On Behalf Of Chao Fan
> Subject: [PATCH] actbl1.h: use tab instead of seven spaces as the indentation
> 
> The indentation of these two lines is seven spaces, but not tab.
> So fix it.
> 
> Signed-off-by: Chao Fan 
> ---
>  include/acpi/actbl1.h | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/include/acpi/actbl1.h b/include/acpi/actbl1.h
> index b4ce55c008b0..d13e5b416a7e 100644
> --- a/include/acpi/actbl1.h
> +++ b/include/acpi/actbl1.h
> @@ -1223,9 +1223,9 @@ struct acpi_srat_mem_affinity {
>   u16 reserved;   /* Reserved, must be zero */
>   u64 base_address;
>   u64 length;
> -   u32 reserved1;
> + u32 reserved1;
>   u32 flags;
> -   u64 reserved2;   /* Reserved, must be zero */
> + u64 reserved2; /* Reserved, must be zero */
>  };
> 
>  /* Flags */
> --

You needn't do this manually.
An indentation fix commit can be generated easily and automatically by the
ACPICA linuxize script, divergence.sh.
See Documentation/acpi/linuxized-acpica.txt for detailed information.

Thanks
Lv


RE: [PATCH 3/3] ACPI / scan: Enable GPEs before scanning the namespace

2017-08-16 Thread Zheng, Lv
Hi, Rafael

> From: Rafael J. Wysocki [mailto:r...@rjwysocki.net]
> Subject: Re: [PATCH 3/3] ACPI / scan: Enable GPEs before scanning the 
> namespace
> 
> On Tuesday, August 15, 2017 4:12:24 AM CEST Zheng, Lv wrote:
> > Hi, Rafael
> >
> > > From: linux-acpi-ow...@vger.kernel.org
> > > [mailto:linux-acpi-ow...@vger.kernel.org] On Behalf Of Rafael J. Wysocki
> > > Subject: [PATCH 3/3] ACPI / scan: Enable GPEs before scanning the
> > > namespace
> > >
> > > From: Rafael J. Wysocki <rafael.j.wyso...@intel.com>
> > >
> > > On some systems the platform firmware expects GPEs to be enabled
> > > before the enumeration of devices and if that expectation is not
> > > met, the systems in question may not boot in some situations.
> > >
> > > For this reason, change the initialization ordering of the ACPI
> > > subsystem to make it enable GPEs before scanning the namespace
> > > for the first time in order to enumerate devices.
> > >
> > > Reported-by: Mika Westerberg <mika.westerb...@linux.intel.com>
> > > Suggested-by: Mika Westerberg <mika.westerb...@linux.intel.com>
> > > Signed-off-by: Rafael J. Wysocki <rafael.j.wyso...@intel.com>
> > > ---
> > >  drivers/acpi/scan.c |8 
> > >  1 file changed, 4 insertions(+), 4 deletions(-)
> > >
> > > Index: linux-pm/drivers/acpi/scan.c
> > > ===
> > > --- linux-pm.orig/drivers/acpi/scan.c
> > > +++ linux-pm/drivers/acpi/scan.c
> > > @@ -2139,6 +2139,10 @@ int __init acpi_scan_init(void)
> > >   acpi_get_spcr_uart_addr();
> > >   }
> > >
> > > + acpi_gpe_apply_masked_gpes();
> > > + acpi_update_all_gpes();
> > > + acpi_ec_ecdt_start();
> > > +
> >
> > Just for your information.
> > A recent internal bug reveals that acpi_ec_ecdt_start() should only be
> > invoked after the enumeration (acpi_ec_add()) for now.
> > The function contains logic that needs to be altered by acpi_ec_add().
> >
> > So it seems we can only make a less aggressive change: moving just the
> > 2 GPE-related lines up.
> 
> OK, done.
> 
> Please check my linux-next branch and see if that's what it should be.

Confirmed.
I have also refreshed my EC regression fix on top of that, with v2 tagged in
the subjects.

Thanks and best regards
Lv
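
[Editorial sketch] The ordering agreed on in this exchange can be written
down as a small table: the two GPE calls move ahead of the namespace scan,
while acpi_ec_ecdt_start() stays after it. This is only a model of the
ordering as described in the thread, not the code in linux-next; the
SKETCH_* identifiers and sketch_position() are hypothetical.

```c
/* Init steps discussed in the thread, as symbolic names. */
enum sketch_step {
	SKETCH_APPLY_MASKED_GPES,  /* acpi_gpe_apply_masked_gpes() */
	SKETCH_UPDATE_ALL_GPES,    /* acpi_update_all_gpes() */
	SKETCH_SCAN_NAMESPACE,     /* namespace scan; acpi_ec_add() runs here */
	SKETCH_ECDT_START,         /* acpi_ec_ecdt_start() */
	SKETCH_NSTEPS
};

/* Assumed resulting order: GPEs enabled first, ECDT start kept last. */
static const enum sketch_step sketch_init_order[SKETCH_NSTEPS] = {
	SKETCH_APPLY_MASKED_GPES,
	SKETCH_UPDATE_ALL_GPES,
	SKETCH_SCAN_NAMESPACE,
	SKETCH_ECDT_START,
};

/* Return the position of a step in the order, or -1 if it is absent. */
static int sketch_position(enum sketch_step step)
{
	for (int i = 0; i < SKETCH_NSTEPS; i++)
		if (sketch_init_order[i] == step)
			return i;
	return -1;
}
```

The two constraints from the thread then read as simple comparisons: GPE
enabling precedes the scan, and the scan precedes acpi_ec_ecdt_start().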


RE: [PATCH 1/3] ACPICA: Dispatch active GPEs at init time

2017-08-16 Thread Zheng, Lv
Hi,

> From: linux-acpi-ow...@vger.kernel.org
> [mailto:linux-acpi-ow...@vger.kernel.org] On Behalf Of Rafael J. Wysocki
> Subject: Re: [PATCH 1/3] ACPICA: Dispatch active GPEs at init time
> 
> On Tuesday, August 15, 2017 11:59:00 AM CEST Zheng, Lv wrote:
> > Hi,
> >
> > > From: Rafael J. Wysocki [mailto:r...@rjwysocki.net]
> > > Subject: Re: [PATCH 1/3] ACPICA: Dispatch active GPEs at init time
> > >
> > > On Friday, August 11, 2017 7:40:56 AM CEST Zheng, Lv wrote:
> > > > Hi, Rafael
> > > >
> > > > > From: Rafael J. Wysocki [mailto:r...@rjwysocki.net]
> > > > > Subject: Re: [PATCH 1/3] ACPICA: Dispatch active GPEs at init time
> > > > >
> > > > > On Thursday, August 10, 2017 3:48:58 AM CEST Zheng, Lv wrote:
> > > > > > Hi, Rafael
> > > > > >
> > > > > > > From: Rafael J. Wysocki [mailto:r...@rjwysocki.net]
> > > > > > > Subject: [PATCH 1/3] ACPICA: Dispatch active GPEs at init time
> > > > > > >
> > > > > > > From: Rafael J. Wysocki 
> > > > > > >
> > > > > > > In some cases GPEs are already active when they are enabled by
> > > > > > > acpi_ev_initialize_gpe_block() and whatever happens next may
> > > > > > > depend on the result of handling the events signaled by them, so
> > > > > > > the events should not be discarded (which is what happens
> > > > > > > currently) and they should be handled as soon as reasonably
> > > > > > > possible.
> > > > > > >
> > > > > > > For this reason, modify acpi_ev_initialize_gpe_block() to
> > > > > > > dispatch GPEs with the status flag set in-band right after
> > > > > > > enabling them.
> > > > > >
> > > > > > In fact, what we need seems to be invoking acpi_ev_gpe_dispatch()
> > > > > > right after enabling a GPE. So there are 2 related conditions:
> > > > > > 1. GPE is enabled for the first time.
> > > > > > 2. GPE is initialized.
> > > > > >
> > > > > > And we need to make sure that before acpi_update_all_gpes() is
> > > > > > invoked, all GPE EN bits are actually disabled.
> > > > >
> > > > > But we don't do it today, do we?
> > > >
> > > > We don't do that.
> > > >
> > > > >
> > > > > And still calling _dispatch() should not be incorrect even if the GPE
> > > > > has been enabled already at this point.  Worst case it just will
> > > > > queue up the execution of _Lxx/_Exx which may or may not do anything
> > > > > useful.
> > > > >
> > > > > And BTW this is all done under acpi_gbl_gpe_lock so
> > > > > acpi_ev_gpe_detect() will block on it if run concurrently and we've
> > > > > checked the status, so we know that the GPE *should* be dispatched,
> > > > > so I sort of fail to see the problem.
> > > >
> > > > There is another related problem:
> > > > ACPICA clears a GPE before enabling it.
> > > > This is proven to be wrong, and we have to fix it:
> > > > https://bugzilla.kernel.org/show_bug.cgi?id=196249
> > > >
> > > > Without fixing this issue, in this solution, we surely need to save
> > > > the GPE STS bit before incrementing the GPE reference count, and poll
> > > > it according to the saved STS bit. Because if we poll it after
> > > > enabling, the STS bit will have been wrongly cleared.
> > >
> > > I'm not sure if I understand you correctly, but why would we poll it?
> > >
> > > In the $subject patch the status is checked and then
> > > acpi_ev_add_gpe_reference() is called to add a reference to the GPE.
> > >
> > > If this is the first reference (which will be the case in the majority
> > > of cases), acpi_ev_enable_gpe() will be called and that will clear the
> > > status.
> > >
> > > Then, acpi_ev_gpe_dispatch() is called if the status was set and that
> > > itself doesn't check the status.  It disables the GPE upfront (so the
> > > status doesn't matter from now on until the GPE is enabled again) and
> > > clears the status unconditionally if the GPE 

RE: [PATCH 1/3] ACPICA: Dispatch active GPEs at init time

2017-08-15 Thread Zheng, Lv
Hi,

> From: Rafael J. Wysocki [mailto:r...@rjwysocki.net]
> Subject: Re: [PATCH 1/3] ACPICA: Dispatch active GPEs at init time
> 
> On Friday, August 11, 2017 7:40:56 AM CEST Zheng, Lv wrote:
> > Hi, Rafael
> >
> > > From: Rafael J. Wysocki [mailto:r...@rjwysocki.net]
> > > Subject: Re: [PATCH 1/3] ACPICA: Dispatch active GPEs at init time
> > >
> > > On Thursday, August 10, 2017 3:48:58 AM CEST Zheng, Lv wrote:
> > > > Hi, Rafael
> > > >
> > > > > From: Rafael J. Wysocki [mailto:r...@rjwysocki.net]
> > > > > Subject: [PATCH 1/3] ACPICA: Dispatch active GPEs at init time
> > > > >
> > > > > From: Rafael J. Wysocki 
> > > > >
> > > > > In some cases GPEs are already active when they are enabled by
> > > > > acpi_ev_initialize_gpe_block() and whatever happens next may depend
> > > > > on the result of handling the events signaled by them, so the
> > > > > events should not be discarded (which is what happens currently) and
> > > > > they should be handled as soon as reasonably possible.
> > > > >
> > > > > For this reason, modify acpi_ev_initialize_gpe_block() to
> > > > > dispatch GPEs with the status flag set in-band right after
> > > > > enabling them.
> > > >
> > > > In fact, what we need seems to be invoking acpi_ev_gpe_dispatch()
> > > > right after enabling a GPE. So there are 2 related conditions:
> > > > 1. GPE is enabled for the first time.
> > > > 2. GPE is initialized.
> > > >
> > > > And we need to make sure that before acpi_update_all_gpes() is invoked,
> > > > all GPE EN bits are actually disabled.
> > >
> > > But we don't do it today, do we?
> >
> > We don't do that.
> >
> > >
> > > And still calling _dispatch() should not be incorrect even if the GPE
> > > has been enabled already at this point.  Worst case it just will
> > > queue up the execution of _Lxx/_Exx which may or may not do anything
> > > useful.
> > >
> > > And BTW this is all done under acpi_gbl_gpe_lock so acpi_ev_gpe_detect()
> > > will block on it if run concurrently and we've checked the status, so
> > > we know that the GPE *should* be dispatched, so I sort of fail to see
> > > the problem.
> >
> > There is another related problem:
> > ACPICA clears GPEs before enabling them.
> > This is proven to be wrong, and we have to fix it:
> > https://bugzilla.kernel.org/show_bug.cgi?id=196249
> >
> > Without fixing this issue, in this solution, we surely need to save the
> > GPE STS bit before incrementing the GPE reference count, and poll it
> > according to the saved STS bit, because if we poll it after enabling,
> > the STS bit will already have been wrongly cleared.
> 
> I'm not sure if I understand you correctly, but why would we poll it?
> 
> In the $subject patch the status is checked and then
> acpi_ev_add_gpe_reference() is called to add a reference to the GPE.
> 
> If this is the first reference (which will be the case in the majority
> of cases), acpi_ev_enable_gpe() will be called and that will clear the
> status.
> 
> Then, acpi_ev_gpe_dispatch() is called if the status was set and that
> itself doesn't check the status.  It disables the GPE upfront (so the
> status doesn't matter from now on until the GPE is enabled again) and
> clears the status unconditionally if the GPE is edge-triggered.  This
> means that for edge-triggered GPEs the clearing of the status by
> acpi_ev_enable_gpe() doesn't matter here.
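
The sequence described in the quoted text above (check the status, add a reference, dispatch if the status was set) can be modelled in a few lines of user-space C. This is only a toy sketch under stated assumptions: every toy_* name is hypothetical and none of this is ACPICA code. It assumes, as the quoted text says, that the first reference enables the GPE and clears its status, and that dispatch does not check the status itself.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of one GPE; field and function names are illustrative only. */
struct toy_gpe {
    bool status;          /* models the hardware STS bit */
    bool enabled;         /* models the hardware EN bit */
    bool edge_triggered;
    unsigned int refs;    /* runtime reference count */
    unsigned int handled; /* how many times the handler was queued */
};

/* The first reference enables the GPE; enabling clears the status bit. */
static void toy_add_reference(struct toy_gpe *gpe)
{
    if (gpe->refs++ == 0) {
        gpe->status = false;
        gpe->enabled = true;
    }
}

/* Dispatch does not look at the status bit; it trusts the caller. */
static void toy_dispatch(struct toy_gpe *gpe)
{
    assert(gpe->refs > 0);      /* only referenced GPEs are dispatched here */
    gpe->enabled = false;       /* disable up front */
    if (gpe->edge_triggered)
        gpe->status = false;    /* unconditional clear for edge GPEs */
    gpe->handled++;             /* stands in for queuing _Lxx/_Exx */
    gpe->enabled = true;        /* models the re-enable after the handler completes */
}

/* Init-time path: sample the status, take the reference, then dispatch. */
static void toy_init_gpe(struct toy_gpe *gpe)
{
    bool was_active = gpe->status;  /* checked before enabling */

    toy_add_reference(gpe);
    if (was_active)
        toy_dispatch(gpe);
}
```

In this model a GPE that was active before toy_init_gpe() is handled exactly once, even though enabling it has already cleared the status bit.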

No problem, I understood.
And I think this patch [edge GPE dispatch fix] should be
correct and good for current Linux upstream.

What I meant is:
PATCH [edge GPE clear fix] https://patchwork.kernel.org/patch/9894983/
is a fix we need for upstream, as it is the only possible fix for the
issue it addresses.
On top of that, when acpi_ev_enable_gpe() is called, the GPE won't be
cleared, and then things can be done in a simpler way:
PATCH [edge GPE enable fix] https://patchwork.kernel.org/patch/9894989/

As [edge GPE clear fix] is risky, I think [edge GPE dispatch fix] is OK
for Linux upstream.

So we can have 2 processes:
1. Merge [edge GPE dispatch fix] and let [edge GPE clear fix] and
   [edge GPE enable fix] be released from ACPICA upstream so that:
   1. We can enhance them in ACPICA upstream.
   2. It will be safer, regression-wise, for us to merge
      [edge GPE clear fix].
2. Merge [edge GPE clear fix] and [edge GPE enable fix] without
   merging [edge GPE dispatch fix].

What I meant is:
It's up to you to decide which pr

RE: [PATCH 3/3] ACPI / scan: Enable GPEs before scanning the namespace

2017-08-14 Thread Zheng, Lv
Hi, Rafael

> From: linux-acpi-ow...@vger.kernel.org 
> [mailto:linux-acpi-ow...@vger.kernel.org] On Behalf Of Rafael J.
> Wysocki
> Subject: [PATCH 3/3] ACPI / scan: Enable GPEs before scanning the namespace
> 
> From: Rafael J. Wysocki 
> 
> On some systems the platform firmware expects GPEs to be enabled
> before the enumeration of devices and if that expectation is not
> met, the systems in question may not boot in some situations.
> 
> For this reason, change the initialization ordering of the ACPI
> subsystem to make it enable GPEs before scanning the namespace
> for the first time in order to enumerate devices.
> 
> Reported-by: Mika Westerberg 
> Suggested-by: Mika Westerberg 
> Signed-off-by: Rafael J. Wysocki 
> ---
>  drivers/acpi/scan.c |    8 ++++----
>  1 file changed, 4 insertions(+), 4 deletions(-)
> 
> Index: linux-pm/drivers/acpi/scan.c
> ===
> --- linux-pm.orig/drivers/acpi/scan.c
> +++ linux-pm/drivers/acpi/scan.c
> @@ -2139,6 +2139,10 @@ int __init acpi_scan_init(void)
>   acpi_get_spcr_uart_addr();
>   }
> 
> + acpi_gpe_apply_masked_gpes();
> + acpi_update_all_gpes();
> + acpi_ec_ecdt_start();
> +

Just for your information.
A recent internal bug reveals that acpi_ec_ecdt_start() should only be
invoked after the enumeration (acpi_ec_add()) for now.
The function contains logic that needs to be altered by acpi_ec_add().

So it seems we can only make a less aggressive change by moving the 2
GPE-related lines up.

Thanks and best regards
Lv
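
To make the suggested ordering concrete, here is a minimal user-space sketch of the less aggressive variant. It is an assumption-laden toy, not kernel code: the toy_* names and the step enum are hypothetical, and the steps only record the order in which they ran. Only the two GPE-related calls move ahead of the enumeration, while the ECDT start stays after it.

```c
#include <assert.h>

/* Hypothetical step recorder; none of these names exist in the kernel. */
enum toy_step { APPLY_MASKED_GPES, UPDATE_ALL_GPES, ENUMERATE, ECDT_START };

#define TOY_MAX_STEPS 8

static enum toy_step toy_log[TOY_MAX_STEPS];
static int toy_len;

/* Record that a step ran; the real functions would do the work here. */
static void toy_run(enum toy_step s)
{
    assert(toy_len < TOY_MAX_STEPS);
    toy_log[toy_len++] = s;
}

/* Less aggressive ordering: only the two GPE calls move before the scan. */
static void toy_scan_init(void)
{
    toy_len = 0;
    toy_run(APPLY_MASKED_GPES);
    toy_run(UPDATE_ALL_GPES);
    toy_run(ENUMERATE);     /* acpi_ec_add() would run during this step */
    toy_run(ECDT_START);    /* stays after the enumeration */
}

/* Return the position of a step in the recorded order, or -1 if absent. */
static int toy_index_of(enum toy_step s)
{
    for (int i = 0; i < toy_len; i++)
        if (toy_log[i] == s)
            return i;
    return -1;
}
```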

>   mutex_lock(&acpi_scan_lock);
>   /*
>    * Enumerate devices in the ACPI namespace.
> @@ -2163,10 +2167,6 @@ int __init acpi_scan_init(void)
>   }
>   }
> 
> - acpi_gpe_apply_masked_gpes();
> - acpi_update_all_gpes();
> - acpi_ec_ecdt_start();
> -
>   acpi_scan_initialized = true;
> 
>   out:
> 
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-acpi" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html


RE: [PATCH 2/3] ACPICA: Make it possible to enable runtime GPEs earlier

2017-08-11 Thread Zheng, Lv
Hi,

> From: Rafael J. Wysocki [mailto:r...@rjwysocki.net]
> Subject: Re: [PATCH 2/3] ACPICA: Make it possible to enable runtime GPEs 
> earlier
> 
> On Thursday, August 10, 2017 3:52:05 AM CEST Zheng, Lv wrote:
> > Hi, Rafael
> >
> > For this patch, I have a concern.
> >
> > > From: Rafael J. Wysocki [mailto:r...@rjwysocki.net]
> > > Subject: [PATCH 2/3] ACPICA: Make it possible to enable runtime GPEs 
> > > earlier
> > >
> > > From: Rafael J. Wysocki 
> > >
> > > Runtime GPEs have corresponding _Lxx/_Exx methods and are enabled
> > > automatically during the initialization of the ACPI subsystem through
> > > acpi_update_all_gpes() with the assumption that acpi_setup_gpe_for_wake()
> > > will be called in advance for all of the GPEs pointed to by _PRW
> > > objects in the namespace that may be affected by acpi_update_all_gpes().
> > > That is, acpi_ev_initialize_gpe_block() can only be called for a GPE
> > > block after acpi_setup_gpe_for_wake() has been called for all of the
> > > _PRW (wakeup) GPEs in it.
> > >
> > > The platform firmware on some systems, however, expects GPEs to be
> > > enabled before the enumeration of devices which is when
> > > acpi_setup_gpe_for_wake() is called and that goes against the above
> > > assumption.
> > >
> > > For this reason, introduce a new flag to be set by
> > > acpi_ev_initialize_gpe_block() when automatically enabling a GPE
> > > to indicate to acpi_setup_gpe_for_wake() that it needs to drop the
> > > reference to the GPE coming from acpi_ev_initialize_gpe_block()
> > > and modify acpi_setup_gpe_for_wake() accordingly.  These changes
> > > allow acpi_setup_gpe_for_wake() and acpi_ev_initialize_gpe_block()
> > > to be invoked in any order.
> > >
> > > Signed-off-by: Rafael J. Wysocki 
> > > ---
> > >  drivers/acpi/acpica/evgpeblk.c |    2 ++
> > >  drivers/acpi/acpica/evxfgpe.c  |    8 ++++++++
> > >  include/acpi/actypes.h         |    3 ++-
> > >  3 files changed, 12 insertions(+), 1 deletion(-)
> > >
> > > Index: linux-pm/drivers/acpi/acpica/evgpeblk.c
> > > ===
> > > --- linux-pm.orig/drivers/acpi/acpica/evgpeblk.c
> > > +++ linux-pm/drivers/acpi/acpica/evgpeblk.c
> > > @@ -496,6 +496,8 @@ acpi_ev_initialize_gpe_block(struct acpi
> > >   continue;
> > >   }
> > >
> > > + gpe_event_info->flags |= ACPI_GPE_AUTO_ENABLED;
> > > +
> > >   if (event_status & ACPI_EVENT_FLAG_STATUS_SET) {
> > >   ACPI_INFO(("GPE 0x%02X active on init",
> > >  gpe_number));
> > > Index: linux-pm/include/acpi/actypes.h
> > > ===
> > > --- linux-pm.orig/include/acpi/actypes.h
> > > +++ linux-pm/include/acpi/actypes.h
> > > @@ -783,7 +783,7 @@ typedef u32 acpi_event_status;
> > >   *   |  | | |  +-- Type of dispatch:to method, handler, notify, or none
> > >   *   |  | | +----- Interrupt type: edge or level triggered
> > >   *   |  | +------- Is a Wake GPE
> > > - *   |  +--------- Is GPE masked by the software GPE masking mechanism
> > > + *   |  +--------- Has been enabled automatically at init time
> > >   *   +------------ <Reserved>
> > >   */
> > >  #define ACPI_GPE_DISPATCH_NONE  (u8) 0x00
> > > @@ -799,6 +799,7 @@ typedef u32 acpi_event_status;
> > >  #define ACPI_GPE_XRUPT_TYPE_MASK        (u8) 0x08
> > >
> > >  #define ACPI_GPE_CAN_WAKE   (u8) 0x10
> > > +#define ACPI_GPE_AUTO_ENABLED   (u8) 0x20
> > >
> > >  /*
> > >   * Flags for GPE and Lock interfaces
> > > Index: linux-pm/drivers/acpi/acpica/evxfgpe.c
> > > ===
> > > --- linux-pm.orig/drivers/acpi/acpica/evxfgpe.c
> > > +++ linux-pm/drivers/acpi/acpica/evxfgpe.c
> > > @@ -435,6 +435,14 @@ acpi_setup_gpe_for_wake(acpi_handle wake
> > >*/
> > >   gpe_event_info->flags =
> > >   (ACPI_GPE_DISPATCH_NOTIFY | ACPI_GPE_LEVEL_TRIGGERED);
> > > + } else if (gpe_event_info->flags & ACPI_GPE_AUTO_ENABLED) {
> > > + /*
> > > +  * A reference to this GPE has been added during the GPE block
> > > +  * initialization, so drop it now to prevent the GPE from being
> > > +  * permanently enabled and clear its ACPI_GPE_AUTO_ENABLED flag.
> > > +  */
> > > + (void)acpi_ev_remove_gpe_reference(gpe_event_info);
> > > + gpe_event_info->flags &= ~ACPI_GPE_AUTO_ENABLED;
> >
> > The problem is: if the GPE is shared, how can we know that decrementing
> > the reference once is sufficient to convert it into a GPE owned by the
> > wakeup dispatcher?
> 
> Even if it is shared, the current code will not enable it if it sees
> ACPI_GPE_CAN_WAKE set.
> 
> We can change that logic, but that should be a separate patch IMO and
> this is not related to the problem at hand.

OK, I see.
We can enhance that on top of these fixes.
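
For readers following the reference counting, the flag mechanism from the quoted patch can be sketched as a small user-space model. This is a simplified illustration under stated assumptions: the two flag values are the ones quoted in the patch, but the toy_* functions and the struct are hypothetical stand-ins, and the shared-GPE case raised above is deliberately not modelled.

```c
#include <assert.h>
#include <stdint.h>

/* Flag values as quoted in the patch; everything else here is a toy. */
#define TOY_GPE_CAN_WAKE      0x10u
#define TOY_GPE_AUTO_ENABLED  0x20u

struct toy_gpe {
    uint8_t flags;
    unsigned int refs;   /* runtime reference count; > 0 means enabled */
};

/* Models the auto-enable step of the GPE block initialization. */
static void toy_init_block(struct toy_gpe *gpe)
{
    if (gpe->flags & TOY_GPE_CAN_WAKE)
        return;                       /* wake GPEs are not auto-enabled */
    gpe->refs++;
    gpe->flags |= TOY_GPE_AUTO_ENABLED;
}

/* Models marking a GPE as a wake GPE, before or after the block init. */
static void toy_setup_for_wake(struct toy_gpe *gpe)
{
    if (gpe->flags & TOY_GPE_AUTO_ENABLED) {
        assert(gpe->refs > 0);
        gpe->refs--;                  /* drop the init-time reference */
        gpe->flags &= (uint8_t)~TOY_GPE_AUTO_ENABLED;
    }
    gpe->flags |= TOY_GPE_CAN_WAKE;
}
```

Calling the two functions in either order leaves the model GPE with no runtime reference and only the wake flag set, which is the order independence the patch description claims.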

Thanks,
Lv


RE: [PATCH 1/3] ACPICA: Dispatch active GPEs at init time

2017-08-10 Thread Zheng, Lv
Hi, Rafael

> From: Rafael J. Wysocki [mailto:r...@rjwysocki.net]
> Subject: Re: [PATCH 1/3] ACPICA: Dispatch active GPEs at init time
> 
> On Thursday, August 10, 2017 3:48:58 AM CEST Zheng, Lv wrote:
> > Hi, Rafael
> >
> > > From: Rafael J. Wysocki [mailto:r...@rjwysocki.net]
> > > Subject: [PATCH 1/3] ACPICA: Dispatch active GPEs at init time
> > >
> > > From: Rafael J. Wysocki <rafael.j.wyso...@intel.com>
> > >
> > > In some cases GPEs are already active when they are enabled by
> > > acpi_ev_initialize_gpe_block() and whatever happens next may depend
> > > on the result of handling the events signaled by them, so the
> > > events should not be discarded (which is what happens currently) and
> > > they should be handled as soon as reasonably possible.
> > >
> > > For this reason, modify acpi_ev_initialize_gpe_block() to
> > > dispatch GPEs with the status flag set in-band right after
> > > enabling them.
> >
> > In fact, what we need seems to be invoking acpi_ev_gpe_dispatch()
> > right after enabling a GPE. So there are 2 related conditions:
> > 1. GPE is enabled for the first time.
> > 2. GPE is initialized.
> >
> > And we need to make sure that before acpi_update_all_gpes() is invoked,
> > all GPE EN bits are actually disabled.
> 
> But we don't do it today, do we?

We don't do that.

> 
> And still calling _dispatch() should not be incorrect even if the GPE
> has been enabled already at this point.  Worst case it just will
> queue up the execution of _Lxx/_Exx which may or may not do anything
> useful.
> 
> And BTW this is all done under acpi_gbl_gpe_lock so acpi_ev_gpe_detect()
> will block on it if run concurrently and we've checked the status, so
> we know that the GPE *should* be dispatched, so I sort of fail to see
> the problem.

There is another related problem:
ACPICA clears GPEs before enabling them.
This is proven to be wrong, and we have to fix it:
https://bugzilla.kernel.org/show_bug.cgi?id=196249

Without fixing this issue, in this solution, we surely need to save the
GPE STS bit before incrementing the GPE reference count, and poll it
according to the saved STS bit, because if we poll it after enabling,
the STS bit will already have been wrongly cleared.
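
The difference between the two orderings can be shown with a tiny user-space model. This is a sketch only: the register struct and the toy_* helpers are hypothetical, and toy_enable() simply assumes the clear-before-enable behaviour discussed here rather than reproducing any ACPICA function.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy GPE register pair; names are illustrative, not ACPICA's. */
struct toy_gpe_regs {
    bool sts;   /* status bit */
    bool en;    /* enable bit */
};

/* Models the behaviour under discussion: enabling clears STS first. */
static void toy_enable(struct toy_gpe_regs *r)
{
    assert(r);
    r->sts = false;
    r->en = true;
}

/* Polls after enabling: the pending event has already been wiped. */
static bool toy_poll_after_enable(struct toy_gpe_regs *r)
{
    toy_enable(r);
    return r->sts;
}

/* Saves STS before enabling and decides based on the saved copy. */
static bool toy_poll_saved_status(struct toy_gpe_regs *r)
{
    bool saved = r->sts;

    toy_enable(r);
    return saved;
}
```

With a pending event, the first helper reports nothing to handle while the second still sees it, which is why the status has to be sampled before the reference count is incremented.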

So if we can do this on top of the "GPE clear" fix, things can be done
in a simpler way - invoke acpi_ev_gpe_detect() after fully initializing
GPEs (as in what I pasted).

However, I should say that merging the "GPE clear" fix might be risky.
So this patch can go upstream prior to the simpler solution, to leave
us a stable baseline.

I'll send out the simpler solution along with the "GPE clear" fix.
Maybe you can consider shipping it after merging this patch.

Thanks and best regards
Lv


RE: [PATCH 3/3] ACPI / scan: Enable GPEs before scanning the namespace

2017-08-10 Thread Zheng, Lv
Hi,

> From: Lukas Wunner [mailto:lu...@wunner.de]
> Subject: Re: [PATCH 3/3] ACPI / scan: Enable GPEs before scanning the 
> namespace
> 
> On Thu, Aug 10, 2017 at 12:34:23AM +0200, Rafael J. Wysocki wrote:
> > --- linux-pm.orig/drivers/acpi/scan.c
> > +++ linux-pm/drivers/acpi/scan.c
> > @@ -2139,6 +2139,10 @@ int __init acpi_scan_init(void)
> > acpi_get_spcr_uart_addr();
> > }
> >
> > +   acpi_gpe_apply_masked_gpes();
> > +   acpi_update_all_gpes();
> > +   acpi_ec_ecdt_start();
> > +
> > mutex_lock(&acpi_scan_lock);
> > /*
> >  * Enumerate devices in the ACPI namespace.
> 
> I notice this is called from a subsys_initcall().  We scan the PCI bus
> much earlier in arch/x86/kernel/early-quirks.c and it would be possible
> to identify presence of Thunderbolt host controllers in an early quirk
> (using the method of pci_is_thunderbolt_attached()) and, if found,
> enable their GPEs or all GPEs.

We have 2 choices here:
1. GPE is a part of device enumeration.
   GPEs must be enabled one by one after making sure that all related
   buses/devices are powered on.
   But it seems there is no such relationship between GPEs and
   buses/devices in the ACPI spec.
   However, if Windows works in this way, we'll regress by applying this
   patch.
2. GPE is not a part of device enumeration.
   Then probably we can even move GPE enabling into
   acpi_initialize_objects(), before calling
   acpi_ns_initialize_devices(). After the ACPI namespace has been
   prepared, _Lxx/_Exx are ready, and the ACPI subsystem should be able
   to handle early GPEs.
   And the ACPI subsystem's device enumeration starts from
   acpi_ns_initialize_devices().

So this patch should be correct in theory. ;)

Thanks and best regards
Lv

> 
> Just as an aside in case your method doesn't work, I'm not affected by
> this issue being a Mac user... ;-)
> 
> Thanks,
> 
> Lukas


RE: [PATCH 3/3] ACPI / scan: Enable GPEs before scanning the namespace

2017-08-09 Thread Zheng, Lv
Hi, Rafael

> From: linux-acpi-ow...@vger.kernel.org 
> [mailto:linux-acpi-ow...@vger.kernel.org] On Behalf Of Rafael J.
> Wysocki
> Subject: [PATCH 3/3] ACPI / scan: Enable GPEs before scanning the namespace
> 
> From: Rafael J. Wysocki 
> 
> On some systems the platform firmware expects GPEs to be enabled
> before the enumeration of devices and if that expectation is not
> met, the systems in question may not boot in some situations.
> 
> For this reason, change the initialization ordering of the ACPI
> subsystem to make it enable GPEs before scanning the namespace
> for the first time in order to enumerate devices.

This is indeed worth a try.
Acked-by: Lv Zheng 

Thanks and best regards
Lv

> 
> Reported-by: Mika Westerberg 
> Suggested-by: Mika Westerberg 
> Signed-off-by: Rafael J. Wysocki 
> ---
>  drivers/acpi/scan.c |8 
>  1 file changed, 4 insertions(+), 4 deletions(-)
> 
> Index: linux-pm/drivers/acpi/scan.c
> ===
> --- linux-pm.orig/drivers/acpi/scan.c
> +++ linux-pm/drivers/acpi/scan.c
> @@ -2139,6 +2139,10 @@ int __init acpi_scan_init(void)
>   acpi_get_spcr_uart_addr();
>   }
> 
> + acpi_gpe_apply_masked_gpes();
> + acpi_update_all_gpes();
> + acpi_ec_ecdt_start();
> +
>   mutex_lock(&acpi_scan_lock);
>   /*
>* Enumerate devices in the ACPI namespace.
> @@ -2163,10 +2167,6 @@ int __init acpi_scan_init(void)
>   }
>   }
> 
> - acpi_gpe_apply_masked_gpes();
> - acpi_update_all_gpes();
> - acpi_ec_ecdt_start();
> -
>   acpi_scan_initialized = true;
> 
>   out:
> 
> 


RE: [PATCH 2/3] ACPICA: Make it possible to enable runtime GPEs earlier

2017-08-09 Thread Zheng, Lv
Hi, Rafael

For this patch, I have a concern.

> From: Rafael J. Wysocki [mailto:r...@rjwysocki.net]
> Subject: [PATCH 2/3] ACPICA: Make it possible to enable runtime GPEs earlier
> 
> From: Rafael J. Wysocki 
> 
> Runtime GPEs have corresponding _Lxx/_Exx methods and are enabled
> automatically during the initialization of the ACPI subsystem through
> acpi_update_all_gpes() with the assumption that acpi_setup_gpe_for_wake()
> will be called in advance for all of the GPEs pointed to by _PRW
> objects in the namespace that may be affected by acpi_update_all_gpes().
> That is, acpi_ev_initialize_gpe_block() can only be called for a GPE
> block after acpi_setup_gpe_for_wake() has been called for all of the
> _PRW (wakeup) GPEs in it.
> 
> The platform firmware on some systems, however, expects GPEs to be
> enabled before the enumeration of devices which is when
> acpi_setup_gpe_for_wake() is called and that goes against the above
> assumption.
> 
> For this reason, introduce a new flag to be set by
> acpi_ev_initialize_gpe_block() when automatically enabling a GPE
> to indicate to acpi_setup_gpe_for_wake() that it needs to drop the
> reference to the GPE coming from acpi_ev_initialize_gpe_block()
> and modify acpi_setup_gpe_for_wake() accordingly.  These changes
> allow acpi_setup_gpe_for_wake() and acpi_ev_initialize_gpe_block()
> to be invoked in any order.
> 
> Signed-off-by: Rafael J. Wysocki 
> ---
>  drivers/acpi/acpica/evgpeblk.c |2 ++
>  drivers/acpi/acpica/evxfgpe.c  |8 
>  include/acpi/actypes.h |3 ++-
>  3 files changed, 12 insertions(+), 1 deletion(-)
> 
> Index: linux-pm/drivers/acpi/acpica/evgpeblk.c
> ===
> --- linux-pm.orig/drivers/acpi/acpica/evgpeblk.c
> +++ linux-pm/drivers/acpi/acpica/evgpeblk.c
> @@ -496,6 +496,8 @@ acpi_ev_initialize_gpe_block(struct acpi
>   continue;
>   }
> 
> + gpe_event_info->flags |= ACPI_GPE_AUTO_ENABLED;
> +
>   if (event_status & ACPI_EVENT_FLAG_STATUS_SET) {
>   ACPI_INFO(("GPE 0x%02X active on init",
>  gpe_number));
> Index: linux-pm/include/acpi/actypes.h
> ===
> --- linux-pm.orig/include/acpi/actypes.h
> +++ linux-pm/include/acpi/actypes.h
> @@ -783,7 +783,7 @@ typedef u32 acpi_event_status;
>   *   |  | | |  +-- Type of dispatch:to method, handler, notify, or none
>   *   |  | | +- Interrupt type: edge or level triggered
>   *   |  | +--- Is a Wake GPE
> - *   |  +- Is GPE masked by the software GPE masking mechanism
> + *   |  +- Has been enabled automatically at init time
>   *   + 
>   */
>  #define ACPI_GPE_DISPATCH_NONE  (u8) 0x00
> @@ -799,6 +799,7 @@ typedef u32 acpi_event_status;
>  #define ACPI_GPE_XRUPT_TYPE_MASK(u8) 0x08
> 
>  #define ACPI_GPE_CAN_WAKE   (u8) 0x10
> +#define ACPI_GPE_AUTO_ENABLED   (u8) 0x20
> 
>  /*
>   * Flags for GPE and Lock interfaces
> Index: linux-pm/drivers/acpi/acpica/evxfgpe.c
> ===
> --- linux-pm.orig/drivers/acpi/acpica/evxfgpe.c
> +++ linux-pm/drivers/acpi/acpica/evxfgpe.c
> @@ -435,6 +435,14 @@ acpi_setup_gpe_for_wake(acpi_handle wake
>*/
>   gpe_event_info->flags =
>   (ACPI_GPE_DISPATCH_NOTIFY | ACPI_GPE_LEVEL_TRIGGERED);
> + } else if (gpe_event_info->flags & ACPI_GPE_AUTO_ENABLED) {
> + /*
> +  * A reference to this GPE has been added during the GPE block
> +  * initialization, so drop it now to prevent the GPE from being
> +  * permanently enabled and clear its ACPI_GPE_AUTO_ENABLED flag.
> +  */
> + (void)acpi_ev_remove_gpe_reference(gpe_event_info);
> + gpe_event_info->flags &= ~ACPI_GPE_AUTO_ENABLED;

The problem is: if the GPE is shared, how can we know that decrementing
the reference once is sufficient to convert it into a GPE owned by the
wakeup dispatcher?
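
To make the concern concrete, here is a toy model of the reference
counting. It is not ACPICA code: the struct and the toy_* helpers are
hypothetical, and only the two flag values are taken from the quoted patch.
It assumes a GPE stays enabled for as long as its runtime count is non-zero.

```c
#include <assert.h>
#include <stdint.h>

/* Flag values as in the quoted patch. */
#define TOY_GPE_CAN_WAKE      0x10u
#define TOY_GPE_AUTO_ENABLED  0x20u

/* Hypothetical, simplified stand-in for struct acpi_gpe_event_info. */
struct toy_gpe {
	uint8_t flags;
	uint8_t runtime_count;	/* GPE is enabled while this is non-zero */
};

static void toy_add_reference(struct toy_gpe *gpe)
{
	gpe->runtime_count++;
}

static void toy_remove_reference(struct toy_gpe *gpe)
{
	assert(gpe->runtime_count > 0);
	gpe->runtime_count--;
}

/* Models the auto-enable done at GPE block initialization. */
static void toy_auto_enable(struct toy_gpe *gpe)
{
	toy_add_reference(gpe);
	gpe->flags |= TOY_GPE_AUTO_ENABLED;
}

/* Models the new branch in the wake setup path: drop one reference. */
static void toy_setup_for_wake(struct toy_gpe *gpe)
{
	if (gpe->flags & TOY_GPE_AUTO_ENABLED) {
		toy_remove_reference(gpe);
		gpe->flags &= (uint8_t)~TOY_GPE_AUTO_ENABLED;
	}
	gpe->flags |= TOY_GPE_CAN_WAKE;
}

static int toy_is_enabled(const struct toy_gpe *gpe)
{
	return gpe->runtime_count != 0;
}
```

If some other user calls toy_add_reference() on the same GPE between
toy_auto_enable() and toy_setup_for_wake(), the count is still 1 afterwards
and the GPE remains enabled; with no other user it drops to 0 as intended.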

Thanks and best regards
Lv



RE: [PATCH 1/3] ACPICA: Dispatch active GPEs at init time

2017-08-09 Thread Zheng, Lv
Hi, Rafael

> From: Rafael J. Wysocki [mailto:r...@rjwysocki.net]
> Subject: [PATCH 1/3] ACPICA: Dispatch active GPEs at init time
> 
> From: Rafael J. Wysocki 
> 
> In some cases GPEs are already active when they are enabled by
> acpi_ev_initialize_gpe_block() and whatever happens next may depend
> on the result of handling the events signaled by them, so the
> events should not be discarded (which is what happens currently) and
> they should be handled as soon as reasonably possible.
> 
> For this reason, modify acpi_ev_initialize_gpe_block() to
> dispatch GPEs with the status flag set in-band right after
> enabling them.

In fact, what we need seems to be to invoke acpi_ev_gpe_dispatch()
right after enabling a GPE. So there are 2 related conditions:
1. The GPE is enabled for the first time.
2. The GPE is initialized.

We also need to make sure that all GPE EN bits are actually disabled
before acpi_update_all_gpes() is invoked.
What if we do it this way:

Index: linux-acpica/drivers/acpi/acpica/evxfgpe.c
===
--- linux-acpica.orig/drivers/acpi/acpica/evxfgpe.c
+++ linux-acpica/drivers/acpi/acpica/evxfgpe.c
@@ -97,6 +97,14 @@ acpi_status acpi_update_all_gpes(void)
 unlock_and_exit:
(void)acpi_ut_release_mutex(ACPI_MTX_EVENTS);
 
+   /*
+* Poll GPEs to handle already triggered events.
+* It is not sufficient to trigger edge-triggered GPE with specific
+* GPE chips, software need to poll once after enabling.
+*/
+   if (acpi_gbl_all_gpes_initialized) {
+   acpi_ev_gpe_detect(acpi_gbl_gpe_xrupt_list_head);
+   }
return_ACPI_STATUS(status);
 }
 
@@ -120,6 +128,7 @@ acpi_status acpi_enable_gpe(acpi_handle
acpi_status status = AE_BAD_PARAMETER;
struct acpi_gpe_event_info *gpe_event_info;
acpi_cpu_flags flags;
+   u8 poll_gpes = FALSE;
 
ACPI_FUNCTION_TRACE(acpi_enable_gpe);
 
@@ -135,12 +144,25 @@ acpi_status acpi_enable_gpe(acpi_handle
if (ACPI_GPE_DISPATCH_TYPE(gpe_event_info->flags) !=
ACPI_GPE_DISPATCH_NONE) {
status = acpi_ev_add_gpe_reference(gpe_event_info);
+   if (ACPI_SUCCESS(status) &&
+   gpe_event_info->runtime_count == 1) {
+   poll_gpes = TRUE;
+   }
} else {
status = AE_NO_HANDLER;
}
}
 
acpi_os_release_lock(acpi_gbl_gpe_lock, flags);
+
+   /*
+* Poll GPEs to handle already triggered events.
+* It is not sufficient to trigger edge-triggered GPE with specific
+* GPE chips, software need to poll once after enabling.
+*/
+   if (poll_gpes && acpi_gbl_all_gpes_initialized) {
+   acpi_ev_gpe_detect(acpi_gbl_gpe_xrupt_list_head);
+   }
return_ACPI_STATUS(status);
 }
 ACPI_EXPORT_SYMBOL(acpi_enable_gpe)
Index: linux-acpica/drivers/acpi/acpica/utxfinit.c
===
--- linux-acpica.orig/drivers/acpi/acpica/utxfinit.c
+++ linux-acpica/drivers/acpi/acpica/utxfinit.c
@@ -284,6 +284,13 @@ acpi_status ACPI_INIT_FUNCTION acpi_init
}
 
/*
+* Cleanup GPE enabling status to make sure that the GPE settings are
+* as what the OSPMs expect. This should be done before enumerating
+* ACPI devices and operation region drivers.
+*/
+   (void)acpi_hw_disable_all_gpes();
+
+   /*
 * Initialize all device/region objects in the namespace. This runs
 * the device _STA and _INI methods and region _REG methods.
 */
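
The reason for polling once after enabling can be sketched with a toy
model of one status/enable register pair. This is only an illustration
under stated assumptions: the struct and toy_* functions are hypothetical,
and the "hardware" is assumed to latch the status bit silently when an
event arrives while the GPE is still disabled, so no edge is seen later.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of one 8-bit GPE register pair (status + enable). */
struct toy_gpe_regs {
	uint8_t status;		/* set by "hardware" when an event occurs */
	uint8_t enable;		/* set by software */
	unsigned handled;	/* number of events dispatched so far */
};

/* Dispatch every GPE that is both pending and enabled, then clear it. */
static unsigned toy_gpe_detect(struct toy_gpe_regs *r)
{
	uint8_t pending = r->status & r->enable;
	unsigned n = 0;

	for (unsigned bit = 0; bit < 8; bit++) {
		if (pending & (1u << bit)) {
			r->status &= (uint8_t)~(1u << bit);
			r->handled++;
			n++;
		}
	}
	return n;
}

/*
 * Hardware raises an event. An edge-triggered interrupt is generated only
 * if the GPE is already enabled at that moment; otherwise the status bit
 * is latched without any interrupt.
 */
static void toy_hw_raise(struct toy_gpe_regs *r, unsigned bit)
{
	assert(bit < 8);
	r->status |= (uint8_t)(1u << bit);
	if (r->enable & (1u << bit))
		toy_gpe_detect(r);	/* the interrupt handler runs */
}

/* Enable a GPE; optionally poll once afterwards. */
static void toy_enable(struct toy_gpe_regs *r, unsigned bit, int poll)
{
	assert(bit < 8);
	r->enable |= (uint8_t)(1u << bit);
	if (poll)
		toy_gpe_detect(r);
}
```

With poll == 0, an event raised before toy_enable() stays latched in the
status register and is never handled; with poll == 1 it is picked up
immediately, which is the effect the acpi_ev_gpe_detect() calls in the
diff above are after.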

Thanks and best regards
Lv

> 
> Signed-off-by: Rafael J. Wysocki 
> ---
>  drivers/acpi/acpica/evgpeblk.c |   28 +++-
>  1 file changed, 19 insertions(+), 9 deletions(-)
> 
> Index: linux-pm/drivers/acpi/acpica/evgpeblk.c
> ===
> --- linux-pm.orig/drivers/acpi/acpica/evgpeblk.c
> +++ linux-pm/drivers/acpi/acpica/evgpeblk.c
> @@ -440,9 +440,11 @@ acpi_ev_initialize_gpe_block(struct acpi
>void *ignored)
>  {
>   acpi_status status;
> + acpi_event_status event_status;
>   struct acpi_gpe_event_info *gpe_event_info;
>   u32 gpe_enabled_count;
>   u32 gpe_index;
> + u32 gpe_number;
>   u32 i;
>   u32 j;
> 
> @@ -470,30 +472,38 @@ acpi_ev_initialize_gpe_block(struct acpi
> 
>   gpe_index = (i * ACPI_GPE_REGISTER_WIDTH) + j;
>   gpe_event_info = &gpe_block->event_info[gpe_index];
> + gpe_number = gpe_block->block_base_number + gpe_index;
> 
>   /*
>* Ignore GPEs that have no corresponding _Lxx/_Exx 
> method
> -  * and GPEs that are used to wake the system
> +

RE: [PATCH V2] ACPI, APEI: Fixup incorrect 16-bit access width firmware bug

2017-07-30 Thread Zheng, Lv
Hi,

> From: linux-acpi-ow...@vger.kernel.org 
> [mailto:linux-acpi-ow...@vger.kernel.org] On Behalf Of Song
> liwei
> Subject: [PATCH V2] ACPI, APEI: Fixup incorrect 16-bit access width firmware 
> bug
> 
> From: Liwei Song 
> 
> This is a follow up to commit f712c71f7b2b ("ACPI, APEI: Fixup common
> access width firmware bug") fix the following firmware bug:
> 
> [Firmware Bug]: APEI: Invalid bit width + offset in GAR [0xb2/16/0/1/1]
> 
> This is due to an 8-bit access width is specified for a 16-bit register,
> Rearrange the condition and add 8-bit width check.
> 
> Signed-off-by: Liwei Song 
> ---
>  drivers/acpi/apei/apei-base.c | 12 ++--
>  1 file changed, 6 insertions(+), 6 deletions(-)
> 
> diff --git a/drivers/acpi/apei/apei-base.c b/drivers/acpi/apei/apei-base.c
> index da370e1..eca3d7a 100644
> --- a/drivers/acpi/apei/apei-base.c
> +++ b/drivers/acpi/apei/apei-base.c
> @@ -604,12 +604,12 @@ static int apei_check_gar(struct acpi_generic_address 
> *reg, u64 *paddr,
>   *access_bit_width = 1UL << (access_size_code + 2);
> 
>   /* Fixup common BIOS bug */
> - if (bit_width == 32 && bit_offset == 0 && (*paddr & 0x03) == 0 &&
> - *access_bit_width < 32)
> - *access_bit_width = 32;
> - else if (bit_width == 64 && bit_offset == 0 && (*paddr & 0x07) == 0 &&
> - *access_bit_width < 64)
> - *access_bit_width = 64;
> + if (*access_bit_width < bit_width && bit_offset == 0) {
> + if ((bit_width == 16 && (*paddr & 0x01) == 0) ||
> + (bit_width == 32 && (*paddr & 0x03) == 0) ||
> + (bit_width == 64 && (*paddr & 0x07) == 0))
> + *access_bit_width = bit_width;
> + }
> 
>   if ((bit_width + bit_offset) > *access_bit_width) {
>   pr_warning(FW_BUG APEI_PFX

IMO, this kind of problem could also be fixed by the commit below, together
with a cleanup of the APEI GAR code so that it invokes the generic ACPICA
GAR API, acpi_read()/acpi_write(), directly:
https://github.com/acpica/acpica/pull/209
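
For reference, the width check that the quoted patch reworks can be
sketched as a standalone function. The conditions are copied from the
quoted hunk; the function name, the return convention (0 for "does not
fit") and the restriction to access size codes 1..4 are assumptions made
for this sketch, not the kernel's interface.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Returns the access width in bits to use for a GAR, or 0 if the
 * register does not fit into that access width (the case that is
 * reported as a firmware bug in the quoted log line).
 */
static uint32_t toy_gar_access_width(uint64_t paddr, uint32_t bit_width,
				     uint32_t bit_offset,
				     uint32_t access_size_code)
{
	uint32_t access_bit_width;

	assert(access_size_code >= 1 && access_size_code <= 4);
	/* Codes 1..4 map to 8, 16, 32 and 64 bits. */
	access_bit_width = 1u << (access_size_code + 2);

	/* Widen a too-narrow access if the register is naturally aligned. */
	if (access_bit_width < bit_width && bit_offset == 0) {
		if ((bit_width == 16 && (paddr & 0x01) == 0) ||
		    (bit_width == 32 && (paddr & 0x03) == 0) ||
		    (bit_width == 64 && (paddr & 0x07) == 0))
			access_bit_width = bit_width;
	}

	if (bit_width + bit_offset > access_bit_width)
		return 0;
	return access_bit_width;
}
```

For the GAR from the log, [0xb2/16/0/1/1], this is
toy_gar_access_width(0xb2, 16, 0, 1): the 8-bit access width is widened to
16 bits because the address is 2-byte aligned, so the check passes.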

Thanks and best regards
Lv



RE: [PATCH v7 12/13] ACPI / init: Invoke early ACPI initialization earlier

2017-07-27 Thread Zheng, Lv
Hi,

> From: Dou Liyang [mailto:douly.f...@cn.fujitsu.com]
> Sent: Tuesday, July 18, 2017 5:44 PM
> Subject: Re: [PATCH v7 12/13] ACPI / init: Invoke early ACPI initialization 
> earlier
> 
> Hi Baoquan,
> 
> At 07/18/2017 04:45 PM, b...@redhat.com wrote:
> > On 07/18/17 at 02:08pm, Dou Liyang wrote:
> >> Hi, Zheng
> >>
> >> At 07/18/2017 01:18 PM, Zheng, Lv wrote:
> >>> Hi,
> >>>
> >>> Can the problem be fixed by invoking acpi_put_table() for mapped DMAR 
> >>> table?
> >>
> >> Invoking acpi_put_table() is my first choice. But it made the kernel
> >> *panic* when we try to get the table again in intel_iommu_init() in
> >> late stage.
> >>
> >> I am also confused that:
> >>
> >> There are two places where we used DMAR table in Linux:
> >>
> >> 1) In detect_intel_iommu() in ACPI early stage:
> >>
> >> ...
> >> status = acpi_get_table(ACPI_SIG_DMAR, 0, &dmar_tbl);
> >> 
> >> if (dmar_tbl) {
> >>     acpi_put_table(dmar_tbl);
> >>     dmar_tbl = NULL;
> >> }
> >>
> >> 2) In dmar_table_init() in ACPI late stage:
> >>
> >> ...
> >> status = acpi_get_table(ACPI_SIG_DMAR, 0, &dmar_tbl);
> >> ...
> >>
> >> As we know, dmar_table_init() is called by intel_iommu_init() and
> >> intel_prepare_irq_remapping().
> >>
> >> When I invoked acpi_put_table() in the intel_prepare_irq_remapping() in
> >> early stage like 1) shows, kernel will panic.
> >
> > That's because acpi_put_table() will make the table pointer be NULL,
> > while dmar_table_init() will skip parse_dmar_table() calling if
> > dmar_table_initialized is set to 1 in intel_prepare_irq_remapping().
> >
> 
> Correctly.
> 
> I have considered and removed the *dmar_table_initialized* in this
> situation. So, dmar_table_init() didn't skip parse_dmar_table()
> calling.
> 
> I didn't dig into the cause, I think it is interesting, I will do it
> right now and share with you later.
> 
> > Dmar hardware support interrupt remapping and io remapping separately. But
> > intel_iommu_init() is called later than intel_prepare_irq_remapping().
> > So what if make dmar_table_init() a reentrant function? You can just
> > have a try, but maybe not a good idea, the dmar table will be parsed
> > twice.
> 
> Yes, It is precisely one reason that I gave up invoking
> acpi_put_table().

Parsing a table twice is not a problem on x86.
If you check the code, there are many examples.
It's actually required if you want to use a table in both the early stage
and the late stage.
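
As a rough illustration of the get/put pattern, assuming reference-counted
mapping semantics: the descriptor struct and the toy_* functions below are
hypothetical stand-ins, not the ACPICA implementation. The early user maps
the table, parses it and puts it; the late user simply gets it again.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for an ACPI table descriptor. */
struct toy_table_desc {
	const char *firmware_data;	/* where the table lives in firmware memory */
	const char *mapping;		/* non-NULL while the table is mapped */
	unsigned refcount;		/* number of outstanding users */
	unsigned map_calls;		/* how many times a mapping was created */
};

/* Map the table on first use and hand out a counted reference. */
static const char *toy_get_table(struct toy_table_desc *d)
{
	if (d->refcount == 0) {
		d->mapping = d->firmware_data;	/* "map" */
		d->map_calls++;
	}
	d->refcount++;
	return d->mapping;
}

/* Drop a reference; unmap when the last user is gone. */
static void toy_put_table(struct toy_table_desc *d)
{
	assert(d->refcount > 0);
	if (--d->refcount == 0)
		d->mapping = NULL;	/* "unmap" */
}
```

In this model a caller must not keep using a pointer obtained before its
own toy_put_table(); it has to call toy_get_table() again and re-parse,
which is the "parse twice" cost discussed above.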

Thanks

> 
> Thanks,
> 
>   dou.
> 
> >
> >>
> >>
> >> Thanks,
> >>
> >>dou.
> >>>
> >>> Thanks
> >>> Lv
> >>>
> >>>> From: Dou Liyang [mailto:douly.f...@cn.fujitsu.com]
> >>>> Sent: Friday, July 14, 2017 1:53 PM
> >>>> To: x...@kernel.org; linux-kernel@vger.kernel.org
> >>>> Cc: t...@linutronix.de; mi...@kernel.org; h...@zytor.com;
> >>>> ebied...@xmission.com; b...@redhat.com;
> >>>> pet...@infradead.org; izumi.t...@jp.fujitsu.com;
> >>>> tokunaga.kei...@jp.fujitsu.com; Dou Liyang;
> >>>> linux-a...@vger.kernel.org; Rafael J. Wysocki;
> >>>> Zheng, Lv; Julian Wollrath
> >>>> Subject: [PATCH v7 12/13] ACPI / init: Invoke early ACPI initialization 
> >>>> earlier
> >>>>
> >>>> Linux uses acpi_early_init() to put the ACPI table management into
> >>>> the late stage from the early stage where the mapped ACPI tables is
> >>>> temporary and should be unmapped.
> >>>>
> >>>> But, now initializing interrupt delivery mode should map and parse the
> >>>> DMAR table earlier in the early stage. This causes an ACPI error when
> >>>> Linux reallocates the ACPI root tables. Because Linux doesn't unmapped
> >>>> the DMAR table after using in the early stage.
> >>>>
> >>>> Invoke acpi_early_init() earlier before late_time_init(), Keep the DMAR
> >>>> be mapped and parsed in late stage like before.
> >>>>
> >>>> Reported-by: Xiaolong Ye 
> >>>> Signed-off-by: Dou Liyang 
> >>>> Cc: linux-a...@vger.kernel.org
> >>>> Cc: Rafael J. Wysocki 
> >>>> Cc: Zheng, Lv 
> >>>> Cc: Julian Wollrath 
> >>>> ---
> >>>> Test in my own PC(Lenovo M4340).
> >>>> Ask help for doing regression testing for the bug said in commit 
> >>>> c4e1acbb35e4
> >>>> ("ACPI / init: Invoke early ACPI initialization later").
> >>>>
> >>>>  init/main.c | 2 +-
> >>>>  1 file changed, 1 insertion(+), 1 deletion(-)
> >>>>
> >>>> diff --git a/init/main.c b/init/main.c
> >>>> index df58a41..7a09467 100644
> >>>> --- a/init/main.c
> >>>> +++ b/init/main.c
> >>>> @@ -654,12 +654,12 @@ asmlinkage __visible void __init start_kernel(void)
> >>>>  	kmemleak_init();
> >>>>  	setup_per_cpu_pageset();
> >>>>  	numa_policy_init();
> >>>> +	acpi_early_init();
> >>>>  	if (late_time_init)
> >>>>  		late_time_init();
> >>>>  	calibrate_delay();
> >>>>  	pidmap_init();
> >>>>  	anon_vma_init();
> >>>> -	acpi_early_init();
> >>>>  #ifdef CONFIG_X86
> >>>>  	if (efi_enabled(EFI_RUNTIME_SERVICES))
> >>>>  		efi_enter_virtual_mode();
> >>>> --
> >>>> 2.5.5



RE: [PATCH v7 12/13] ACPI / init: Invoke early ACPI initialization earlier

2017-07-17 Thread Zheng, Lv
Hi,

Can the problem be fixed by invoking acpi_put_table() for the mapped DMAR table?

Thanks
Lv

> From: Dou Liyang [mailto:douly.f...@cn.fujitsu.com]
> Sent: Friday, July 14, 2017 1:53 PM
> To: x...@kernel.org; linux-kernel@vger.kernel.org
> Cc: t...@linutronix.de; mi...@kernel.org; h...@zytor.com; 
> ebied...@xmission.com; b...@redhat.com;
> pet...@infradead.org; izumi.t...@jp.fujitsu.com; 
> tokunaga.kei...@jp.fujitsu.com; Dou Liyang
> <douly.f...@cn.fujitsu.com>; linux-a...@vger.kernel.org; Rafael J. Wysocki 
> <r...@rjwysocki.net>; Zheng,
> Lv <lv.zh...@intel.com>; Julian Wollrath <jwollr...@web.de>
> Subject: [PATCH v7 12/13] ACPI / init: Invoke early ACPI initialization 
> earlier
> 
> Linux uses acpi_early_init() to move ACPI table management from the
> early stage, where mapped ACPI tables are temporary and must be
> unmapped after use, to the late stage.
> 
> But now, initializing the interrupt delivery mode requires mapping and
> parsing the DMAR table in the early stage. This causes an ACPI error
> when Linux reallocates the ACPI root tables, because Linux doesn't
> unmap the DMAR table after using it in the early stage.
> 
> Invoke acpi_early_init() earlier, before late_time_init(), so that the
> DMAR table is mapped and parsed in the late stage like before.
> 
> Reported-by: Xiaolong Ye <xiaolong...@intel.com>
> Signed-off-by: Dou Liyang <douly.f...@cn.fujitsu.com>
> Cc: linux-a...@vger.kernel.org
> Cc: Rafael J. Wysocki <r...@rjwysocki.net>
> Cc: Zheng, Lv <lv.zh...@intel.com>
> Cc: Julian Wollrath <jwollr...@web.de>
> ---
> Tested on my own PC (Lenovo M4340).
> Asking for help with regression testing for the bug described in commit
> c4e1acbb35e4 ("ACPI / init: Invoke early ACPI initialization later").
> 
>  init/main.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/init/main.c b/init/main.c
> index df58a41..7a09467 100644
> --- a/init/main.c
> +++ b/init/main.c
> @@ -654,12 +654,12 @@ asmlinkage __visible void __init start_kernel(void)
>  	kmemleak_init();
>  	setup_per_cpu_pageset();
>  	numa_policy_init();
> +	acpi_early_init();
>  	if (late_time_init)
>  		late_time_init();
>  	calibrate_delay();
>  	pidmap_init();
>  	anon_vma_init();
> -	acpi_early_init();
>  #ifdef CONFIG_X86
>  	if (efi_enabled(EFI_RUNTIME_SERVICES))
>  		efi_enter_virtual_mode();
> --
> 2.5.5
> 
> 
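
As a rough illustration of the acpi_get_table()/acpi_put_table() pairing asked about in the reply above, here is a toy user-space model in C. It assumes that a table stays mapped while any user still holds it, and that the switch from the early stage to the late stage fails if an early mapping is left behind. All toy_* names are hypothetical; this is a sketch of the idea, not the ACPICA implementation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy user-space model; every name here is hypothetical, not the ACPICA API. */
struct toy_table {
	const char *signature;   /* e.g. "DMAR" */
	unsigned int map_count;  /* number of outstanding "get" calls */
	bool mapped;             /* true while an early mapping exists */
};

/* Models a "get": map on first use and count every user. */
static void toy_get_table(struct toy_table *t)
{
	if (t->map_count++ == 0)
		t->mapped = true;
}

/* Models a "put": drop one user and unmap when the last one is gone. */
static void toy_put_table(struct toy_table *t)
{
	assert(t->map_count > 0);   /* an unbalanced put is a caller bug */
	if (--t->map_count == 0)
		t->mapped = false;
}

/*
 * Models the early-to-late transition: it succeeds only when no table
 * is still mapped from the early stage.
 */
static bool toy_reallocate_root_table(const struct toy_table *tables, size_t n)
{
	for (size_t i = 0; i < n; i++)
		if (tables[i].mapped)
			return false;   /* a leftover early mapping */
	return true;
}
```

In this model, an early DMAR user that calls toy_get_table() without a matching toy_put_table() makes toy_reallocate_root_table() fail, which is the situation the patch description reports. Balancing the get with a put lets the transition succeed without moving acpi_early_init().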



RE: [PATCH 3/3] ACPI: EC: Change EC noirq tuning to be an optional behavior

2017-07-02 Thread Zheng, Lv
Hi, Rafael

> From: Rafael J. Wysocki [mailto:r...@rjwysocki.net]
> Subject: Re: [PATCH 3/3] ACPI: EC: Change EC noirq tuning to be an optional 
> behavior
> 
> On Wednesday, June 14, 2017 01:59:24 PM Lv Zheng wrote:
> > According to the bug report, though the busy polling mode can make noirq
> > stages execute faster, it causes abnormal fan blowing in noirq stages.
> >
> > This patch prepares an option so that the automatic busy polling mode
> > switching for noirq stages can be enabled by those who want to tune it,
> > rather than for all users.
> > Note that the new global option cannot be changed during noirq stages,
> > so there is no need to lock its value changes to sync with polling mode
> > setting switches.
> >
> > For reporters and testers in the thread, as there are too many reporters
> > on the bug link, this patch only picks names from the most active commenters.
> > Sorry for the neglect.
> >
> > Link: https://bugzilla.kernel.org/show_bug.cgi?id=191181
> > Reported-by: Tatsuyuki Ishi 
> > Reported-by: Claudio Sacerdoti Coen 
> > Tested-by: Nicolo' 
> > Reported-by: Jens Axboe 
> > Tested-by: Gjorgji Jankovski 
> > Tested-by: Damjan Georgievski 
> > Tested-by: Fernando Chaves 
> > Signed-off-by: Lv Zheng 
> 
> First of all, this seems to be a fix for commit c3a696b6e8f8 (ACPI / EC: Use
> busy polling mode when GPE is not enabled), so there should be a Fixes: tag
> pointing to that one.
> 
> Moreover, if that is just a performance optimization and not a matter of
> correctness, why don't we simply drop acpi_ec_enter/leave_noirq() entirely?
> 
> What is going to break if we do that?

Let me Cc Yu for justification.
I just added busy poll support for suspend/boot according to the root cause
reported by him.
He should know the end user requirements better than I do.

Thanks and best regards
Lv


RE: [PATCH] ACPI / sleep: EC-based wakeup from suspend-to-idle on recent systems

2017-06-23 Thread Zheng, Lv
Hi, Rafael

> From: Rafael J. Wysocki [mailto:r...@rjwysocki.net]
> Subject: [PATCH] ACPI / sleep: EC-based wakeup from suspend-to-idle on recent 
> systems
> 
> From: Rafael J. Wysocki 
> 
> Some recent Dell laptops, including the XPS13 model numbers 9360 and
> 9365, cannot be woken up from suspend-to-idle by pressing the power
> button which is unexpected and makes that feature less usable on
> those systems.  Moreover, on the 9365 ACPI S3 (suspend-to-RAM) is
> not expected to be used at all (the OS these systems ship with never
> exercises the ACPI S3 path in the firmware) and suspend-to-idle is
> the only viable system suspend mechanism there.
> 
> The reason why the power button wakeup from suspend-to-idle doesn't
> work on those systems is because their power button events are
> signaled by the EC (Embedded Controller), whose GPE (General Purpose
> Event) line is disabled during suspend-to-idle transitions in Linux.
> That is done on purpose, because in general the EC tends to be noisy
> for various reasons (battery and thermal updates and similar, for
> example) and all events signaled by it would kick the CPUs out of
> deep idle states while in suspend-to-idle, which effectively might
> defeat its purpose.
> 
> Of course, on the Dell systems in question the EC GPE must be enabled
> during suspend-to-idle transitions for the button press events to
> be signaled while suspended at all, but fortunately there is a way
> out of this puzzle.
> 
> First of all, those systems have the ACPI_FADT_LOW_POWER_S0 flag set
> in their ACPI tables, which means that the OS is expected to prefer
> the "low power S0 idle" system state over ACPI S3 on them.  That
> causes the most recent versions of other OSes to simply ignore ACPI
> S3 on those systems, so it is reasonable to expect that it should not
> be necessary to block GPEs during suspend-to-idle on them.
> 
> Second, in addition to that, the systems in question provide a special
> firmware interface that can be used to indicate to the platform that
> the OS is transitioning into a system-wide low-power state in which
> certain types of activity are not desirable or that it is leaving
> such a state and that (in principle) should allow the platform to
> adjust its operation mode accordingly.
> 
> That interface is a special _DSM object under a System Power
> Management Controller device (PNP0D80).  The expected way to use it
> is to invoke function 0 from it on system initialization, functions
> 3 and 5 during suspend transitions and functions 4 and 6 during
> resume transitions (to reverse the actions carried out by the
> former).  In particular, function 5 from the "Low-Power S0" device
> _DSM is expected to cause the platform to put itself into a low-power
> operation mode which should include making the EC less verbose (so to
> speak).  Next, on resume, function 6 switches the platform back to
> the "working-state" operation mode.
> 
> In accordance with the above, modify the ACPI suspend-to-idle code
> to look for the "Low-Power S0" _DSM interface on platforms with the
> ACPI_FADT_LOW_POWER_S0 flag set in the ACPI tables.  If it's there,
> use it during suspend-to-idle transitions as prescribed and avoid
> changing the GPE configuration in that case.  [That should reflect
> what the most recent versions of other OSes do.]
> 
> Also modify the ACPI EC driver to make it handle events during
> suspend-to-idle in the usual way if the "Low-Power S0" _DSM interface
> is going to be used to make the power button events work while
> suspended on the Dell machines mentioned above.
> 
> Link: http://www.uefi.org/sites/default/files/resources/Intel_ACPI_Low_Power_S0_Idle.pdf
> Signed-off-by: Rafael J. Wysocki 
> ---
> 
> This is a replacement for https://patchwork.kernel.org/patch/9797909/
> 
> The changelog describes what is going on (and now the "Low-Power S0" _DSM
> specification is public, so it can be used officially here) and it gets the
> job done on the XPS13 9360.  [The additional sort of "bonus" is that the machine
> looks "suspended" in s2idle now, as one of the effects of the _DSM appears
> to be turning off the lights in a quite literal sense.]
> 
> The patch is based on https://patchwork.kernel.org/patch/9797913/ and
> https://patchwork.kernel.org/patch/9797903/ on top of the current linux-next.
> 
> Thanks,
> Rafael
> 
> ---
>  drivers/acpi/ec.c       |    2
>  drivers/acpi/internal.h |    2
>  drivers/acpi/sleep.c    |  107 ++--
>  3 files changed, 107 insertions(+), 4 deletions(-)
> 
> Index: linux-pm/drivers/acpi/sleep.c
> ===
> --- linux-pm.orig/drivers/acpi/sleep.c
> +++ linux-pm/drivers/acpi/sleep.c
> @@ -652,6 +652,84 @@ static const struct platform_suspend_ops
> 
>  static bool s2idle_wakeup;
> 
> +/*
> + * On platforms supporting the Low Power S0 Idle interface there is an ACPI
> + * device object with the PNP0D80 compatible 
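
The _DSM call ordering described in the quoted changelog can be sketched in C as follows. The function indices (0, 3, 4, 5, 6) are taken from the changelog. The enum names, the trace structure that stands in for actually evaluating the _DSM, and the assumption that resume undoes the suspend steps in reverse order (6, then 4) are illustrative only and are not taken from the patch.

```c
#include <assert.h>
#include <stddef.h>

/* Indices come from the changelog; the names are hypothetical labels. */
enum lps0_func {
	LPS0_FUNC_INIT    = 0,  /* invoked once at system initialization */
	LPS0_FUNC_PREPARE = 3,  /* first suspend step */
	LPS0_FUNC_RESTORE = 4,  /* undoes function 3 on resume */
	LPS0_FUNC_ENTRY   = 5,  /* platform enters low-power operation mode */
	LPS0_FUNC_EXIT    = 6,  /* platform returns to working-state mode */
};

#define LPS0_TRACE_MAX 8

/* Records which functions were "invoked"; stands in for evaluating the _DSM. */
struct lps0_trace {
	int funcs[LPS0_TRACE_MAX];
	size_t len;
};

static void lps0_invoke(struct lps0_trace *t, enum lps0_func func)
{
	assert(t->len < LPS0_TRACE_MAX);
	t->funcs[t->len++] = func;
}

/* Suspend-to-idle entry: functions 3 and 5, in that order. */
static void lps0_suspend(struct lps0_trace *t)
{
	lps0_invoke(t, LPS0_FUNC_PREPARE);
	lps0_invoke(t, LPS0_FUNC_ENTRY);
}

/* Resume: reverse the suspend steps (assumed order: 6, then 4). */
static void lps0_resume(struct lps0_trace *t)
{
	lps0_invoke(t, LPS0_FUNC_EXIT);
	lps0_invoke(t, LPS0_FUNC_RESTORE);
}
```

Running lps0_suspend() followed by lps0_resume() on an empty trace records 3, 5, 6, 4, so each resume step undoes the most recent suspend step first.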

RE: [systemd-devel] [WIP PATCH 0/4] Rework the unreliable LID switch exported by ACPI

2017-06-21 Thread Zheng, Lv
Hi,

> From: Bastien Nocera [mailto:had...@hadess.net]
> Subject: Re: [systemd-devel] [WIP PATCH 0/4] Rework the unreliable LID switch 
> exported by ACPI
> 
> On Tue, 2017-06-20 at 02:45 +, Zheng, Lv wrote:
> > Hi,
> >
> > > From: Bastien Nocera [mailto:had...@hadess.net]
> > > Subject: Re: [systemd-devel] [WIP PATCH 0/4] Rework the unreliable
> > > LID switch exported by ACPI
> > >
> > > On Mon, 2017-06-19 at 01:43 +, Zheng, Lv wrote:
> > > > 
> > > > >
> > > > > If you implement it in such a way that GNOME settings daemon
> > > > > behaves weirdly, you'll get my revert
> > > > > request in the mail. Do. Not. Ever. Lie.
> > > >
> > > > First, I don't know what should be reverted...
> > > > I have 2 solutions here for review, and Benjamin has 1.
> > > > And none of them has been upstreamed.
> > > > We are just discussing.
> > >
> > > The discussion is getting tiring quite frankly. We've been over this
> > > for nearly a year now, and with no end in sight.
> >
> > We have concerns about introducing overly complicated logic into such a
> > simple button driver, especially as the logic is related to platform
> > firmware, input ABI and user space behaviors.
> >
> > I understand the situation.
> > Anyway this shouldn't be a big deal.
> > Let's prepare a smarter series to collect all fixes and solutions
> > with runtime configurables and get that to the end users,
> > so that we can figure out which is the simplest solution.
> >
> > But before that, let me ask several questions about
> > gnome-settings-daemon.
> >
> > >
> > > > However we need to get 1 of them upstreamed in the next cycle.
> > > >
> > > > I think users won't start up gnome-settings-daemon right after resume.
> > > > It should have already been started.
> > > >
> > > > There is only 1 platform that may see a delayed state update after
> > > > resume.
> > > > Let's see if there is a practical issue.
> > > > 1. Before suspend, the "lid state" is "close", and
> > > > 2. After resume, the state might remain "close" for a while.
> > > >    Since libinput won't deliver close to userspace,
> > > >    and gnome-settings-daemon listens to key switches, there is no
> > > >    wrong behavior.
> > >
> > > It doesn't. It listens to UPower, which tells user-space whether there
> > > is a lid switch, and whether it's opened or closed.
> >
> > Thanks for the information.
> > However I don't see differences here.
> >
> > >
> > > > 3. Then after several seconds, "open" arrives.
> > > >    gnome-settings-daemon re-arranges monitors and screen layouts in
> > > >    response to the new event.
> > >
> > > Just how is anyone supposed to know that there is an event coming?
> >
> > Will UPower deliver EV_SW key events to gnome-settings-daemon?
> >
> > >
> > > > So there is no problem. IMO, there is no need to improve the
> > > > post-resume case.
> > > >
> > > > Users will just start up gnome-settings-daemon once after boot.
> > > > And it's likely that when it is started, the state is correct.
> > >
> > > You cannot rely on when gnome-settings-daemon will be started to make
> > > *any* decision. Certainly not decisions on how the kernel should
> > > behave.
> >
> > My bad wording, I just meant:
> > When gnome-settings-daemon is started is not related to what we are
> > discussing.
> >
> > Do you want to fix regressions?
> > Or do you want to fix new issues on recent platforms?
> > If you want to fix regressions, I think Benjamin has submitted a revision
> > to use the old method mode, so there shouldn't be regressions for
> > gnome-settings-daemon.
> >
> > What else we want to do is to fix regressions related to systemd when
> > we go back to the default method mode. Since there is no issue with
> > systemd 233, and systemd 229 can also be worked around after just
> > applying a small change, I mean dynamically adding/removing the input
> > node is not strictly required for achieving our purposes.
> >
> > But if you want to fix new issues on new platforms, we can discuss
> > further and determine which program should be changed and which
> > p

RE: [PATCH v2 3/3] ACPI / sleep: EC-based wakeup from suspend-to-idle on recent Dell systems

2017-06-20 Thread Zheng, Lv
Hi, Rafael

> From: linux-acpi-ow...@vger.kernel.org 
> [mailto:linux-acpi-ow...@vger.kernel.org] On Behalf Of Rafael J.
> Wysocki
> Subject: Re: [PATCH v2 3/3] ACPI / sleep: EC-based wakeup from 
> suspend-to-idle on recent Dell systems
> 
> On Tue, Jun 20, 2017 at 2:07 AM, Linus Torvalds wrote:
> > On Tue, Jun 20, 2017 at 5:53 AM, Rafael J. Wysocki wrote:
> >>
> >> -> v2: Added acpi_sleep=no_ec_wakeup to prevent EC events from waking up
> >>   the system from s2idle on systems where they do that by default.
> >
> > This seems a bit hacky.
> >
> > Is there no way to simply make acpi_ec_suspend() smarter while going
> > to sleep? Instead of just unconditionally disabling every EC GPE, can
> > we see that "this gpe is the power button" somehow?
> 
> Unfortunately, the connection between the GPE and the power button is
> not direct.
> 
> The EC GPE handler has no idea that it will generate power button
> events.  It simply executes an AML method doing that.
> 
> The AML method, in turn, executes Notify(power button device) and the
> "power button device" driver has to register a notify handler that
> will recognize and process the events.  It doesn't know in principle
> where the events will come from, though.  They may come from the EC or
> from a different GPE etc.
> 
> Neither the EC driver, nor the "power button device" driver can figure
> out that the connection is there.

The EC driver can only get an event number after querying the firmware,
and it has no idea whether handling this event by executing _Qxx, where
xx is the event number, can result in Notify(power button device).

Traditional ACPI power button events are ACPI fixed events, not EC GPE:

“Power button signal
 A power button can be supplied in two ways.
  One way is to simply use the fixed status bit, and
  The other uses the declaration of an ACPI power device and AML code to
  determine the event.
 For more information about the alternate-device based power button, see
 Section 4.8.2.2.1.2, Control Method Power Button.”

If it is not designed as a fixed event, the OS has no idea which GPE or
EC event number is related to the power button.
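
To make the naming concrete: ACPI names EC query handlers _Qxx, where xx is the event number returned by the EC query command, written as two hexadecimal digits. A minimal C sketch of that mapping follows. The helper name is hypothetical, and nothing in the resulting name tells the driver what the AML method will do, which is the point made above.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Build the 4-character ACPI method name for an EC query event number,
 * e.g. 0x1D -> "_Q1D". The helper name is hypothetical.
 */
static void ec_query_method_name(unsigned char query, char name[5])
{
	int n = snprintf(name, 5, "_Q%02X", (unsigned int)query);

	assert(n == 4);   /* always "_Q" plus two hex digits */
	(void)n;          /* keep builds with NDEBUG warning-free */
}
```

The name only identifies which method to evaluate. Whether that method ends in Notify() on the power button device is known only to the firmware's AML.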

> 
> > Disabling the power button event sounds fundamentally broken, and it
> > sounds like Windows doesn't do that. I doubt Windows has some hacky
> > whitelist. So I'd rather fix a deeper issue than have these kinds of
> > hacks, if at all possible.
> 
> My understanding is that Windows uses the ACPI_FADT_LOW_POWER_S0 flag.
> It generally enables non-S3 suspend/resume when this flag is set and
> it doesn't touch S3 then.  Keeping the EC GPE (and other GPEs for that
> matter) enabled over suspend/resume is part of that if my
> understanding is correct.

This sounds reasonable, but I have a question.

On Surface notebooks, an EC GPE wake capable setting is prepared:
Device (EC0)
{
    Name (_HID, EisaId ("PNP0C09"))  // _HID: Hardware ID
    ...
    Method (_STA, 0, NotSerialized)  // _STA: Status
    {
        ...
        Return (0x0F)
    }
    Name (_GPE, 0x38)  // _GPE: General Purpose Events
    Name (_PRW, Package (0x02)  // _PRW: Power Resources for Wake
    {
        0x38,
        0x03
    })

The _PRW means GPE 0x38 (the EC GPE) can wake up the system from S3-S0.
And the platform only supports s2idle.
Decoding its FADT, we can see the flag is set:
[070h 0112   4]Flags (decoded below) : 002384B5
...
  Low Power S0 Idle (V5) : 1

If the EC GPE should always be enabled when the flag is set, why does MS
(Surface Pros are manufactured by MS) prepare a _PRW for its EC?
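
For reference, the flag check behind the FADT dump above fits in a few lines of C. Bit 21 is the position ACPICA uses for ACPI_FADT_LOW_POWER_S0; the macro and helper names below are hypothetical stand-ins so that the sketch builds outside the kernel.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit 21 of the FADT Flags field: Low Power S0 Idle capable (ACPI 5.0+). */
#define TOY_FADT_LOW_POWER_S0 (UINT32_C(1) << 21)

/* Returns true when the firmware advertises low-power S0 idle. */
static bool toy_fadt_low_power_s0(uint32_t fadt_flags)
{
	return (fadt_flags & TOY_FADT_LOW_POWER_S0) != 0;
}
```

With the value from the dump, toy_fadt_low_power_s0(0x002384B5) is true, which matches the "Low Power S0 Idle (V5) : 1" line in the decoded output.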

Thanks,
Lv

> 
> During suspend we generally disable all GPEs that are not expected to
> generate wakeup events in order to avoid spurious wakeups, but we can
> try to keep them enabled if ACPI_FADT_LOW_POWER_S0 is set.  That will
> reduce the ugliness, but the cost may be more energy used while
> suspended on some systems.
> 
> Thanks,
> Rafael
> --
> To unsubscribe from this list: send the line "unsubscribe linux-acpi" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html


RE: [PATCH v2 3/3] ACPI / sleep: EC-based wakeup from suspend-to-idle on recent Dell systems

2017-06-20 Thread Zheng, Lv
Hi,

> From: rjwyso...@gmail.com [mailto:rjwyso...@gmail.com] On Behalf Of Rafael J. 
> Wysocki
> Subject: Re: [PATCH v2 3/3] ACPI / sleep: EC-based wakeup from 
> suspend-to-idle on recent Dell systems
> 
> On Tue, Jun 20, 2017 at 1:37 AM, Zheng, Lv wrote:
> > Hi, Rafael
> >
> >> From: linux-acpi-ow...@vger.kernel.org 
> >> [mailto:linux-acpi-ow...@vger.kernel.org] On Behalf Of Rafael J. Wysocki
> >> Subject: [PATCH v2 3/3] ACPI / sleep: EC-based wakeup from suspend-to-idle 
> >> on recent Dell systems
> >>
> >> From: Rafael J. Wysocki 
> >>
> >> Some recent Dell laptops, including the XPS13 model numbers 9360 and
> >> 9365, cannot be woken up from suspend-to-idle by pressing the power
> >> button which is unexpected and makes that feature less usable on
> >> those systems.  Moreover, on the 9365 ACPI S3 (suspend-to-RAM) is
> >> not expected to be used at all (the OS these systems ship with never
> >> exercises the ACPI S3 path) and suspend-to-idle is the only viable
> >> system suspend mechanism in there.
> >>
> >> The reason why the power button wakeup from suspend-to-idle doesn't
> >> work on those systems is because their power button events are
> >> signaled by the EC (Embedded Controller), whose GPE (General Purpose
> >> Event) line is disabled during suspend-to-idle transitions in Linux.
> >> That is done on purpose, because in general the EC tends to generate
> >> tons of events for various reasons (battery and thermal updates and
> >> similar, for example) and all of them would kick the CPUs out of deep
> >> idle states while in suspend-to-idle, which effectively would defeat
> >> its purpose.
> >>
> >> Of course, on the Dell systems in question the EC GPE must be enabled
> >> during suspend-to-idle transitions for the button press events to
> >> be signaled while suspended at all.  For this reason, add a DMI
> >> switch to the ACPI system suspend infrastructure to treat the EC
> >> GPE as a wakeup one on the affected Dell systems.  In case the
> >> users would prefer not to do that after all, add a new kernel
> >> command line switch, acpi_sleep=no_ec_wakeup, to disable that new
> >> behavior.
> >>
> 
> [cut]
> 
> >>
> >> Index: linux-pm/drivers/acpi/sleep.c
> >> ===
> >> --- linux-pm.orig/drivers/acpi/sleep.c
> >> +++ linux-pm/drivers/acpi/sleep.c
> >> @@ -160,6 +160,23 @@ static int __init init_nvs_nosave(const
> >>   return 0;
> >>  }
> >>
> >> +/* If set, it is allowed to use the EC GPE to wake up the system. */
> >> +static bool ec_gpe_wakeup_allowed __initdata = true;
> >> +
> >> +void __init acpi_disable_ec_gpe_wakeup(void)
> >> +{
> >> + ec_gpe_wakeup_allowed = false;
> >> +}
> >> +
> >> +/* If set, the EC GPE will be configured to wake up the system. */
> >> +static bool ec_gpe_wakeup;
> >> +
> >> +static int __init init_ec_gpe_wakeup(const struct dmi_system_id *d)
> >> +{
> >> + ec_gpe_wakeup = ec_gpe_wakeup_allowed;
> >> + return 0;
> >> +}
> >> +
> >>  static struct dmi_system_id acpisleep_dmi_table[] __initdata = {
> >>   {
> >>   .callback = init_old_suspend_ordering,
> >> @@ -343,6 +360,26 @@ static struct dmi_system_id acpisleep_dm
> >>   DMI_MATCH(DMI_PRODUCT_NAME, "80E3"),
> >>   },
> >>   },
> >> + /*
> >> +  * Enable the EC to wake up the system from suspend-to-idle to allow
> >> +  * power button events to wake it up.
> >> +  */
> >> + {
> >> +  .callback = init_ec_gpe_wakeup,
> >> +  .ident = "Dell XPS 13 9360",
> >> +  .matches = {
> >> + DMI_MATCH(DMI_SYS_VENDOR, "Dell Inc."),
> >> + DMI_MATCH(DMI_PRODUCT_NAME, "XPS 13 9360"),
> >> + },
> >> + },
> >> + {
> >> +  .callback = init_ec_gpe_wakeup,
> >> +  .ident = "Dell XPS 13 9365",
> >> +  .matches = {
> >> + DMI_MATCH(DMI_SYS_VENDOR, "Dell Inc."),
> >> + DMI_MATCH(DMI_PRODUCT_NAME, "XPS 13 9365"),
> >> + },
> >> + },
> >>   {},
> >>  };
> >>
> >
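
The interaction between the DMI quirk table and acpi_sleep=no_ec_wakeup in
the quoted hunk can be modelled in userspace roughly as below. This is only a
sketch under assumptions: the struct, the function name and the naive
command-line check are hypothetical, and the substring comparison is meant to
mirror how DMI_MATCH() is believed to behave.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a dmi_system_id entry. */
struct quirk {
    const char *vendor;
    const char *product;
};

static const struct quirk ec_wakeup_quirks[] = {
    { "Dell Inc.", "XPS 13 9360" },
    { "Dell Inc.", "XPS 13 9365" },
};

/*
 * Returns true if the EC GPE should be set up as a wakeup source:
 * the machine must match a quirk entry and the user must not have
 * passed acpi_sleep=no_ec_wakeup.
 */
static bool ec_gpe_wakeup_wanted(const char *vendor, const char *product,
                                 const char *cmdline)
{
    /* Simplified option check; the kernel parses acpi_sleep= properly. */
    bool allowed = strstr(cmdline, "no_ec_wakeup") == NULL;
    size_t i;

    for (i = 0; i < sizeof(ec_wakeup_quirks) / sizeof(ec_wakeup_quirks[0]); i++) {
        const struct quirk *q = &ec_wakeup_quirks[i];

        /* Substring match on both DMI fields. */
        if (strstr(vendor, q->vendor) && strstr(product, q->product))
            return allowed;
    }
    return false;
}
```

The point of the two-flag design is that the command line wins: the quirk
callback only copies the "allowed" flag, so a matching machine booted with
no_ec_wakeup ends up with EC wakeup disabled.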

RE: [systemd-devel] [WIP PATCH 0/4] Rework the unreliable LID switch exported by ACPI

2017-06-19 Thread Zheng, Lv
Hi,

> From: Bastien Nocera [mailto:had...@hadess.net]
> Subject: Re: [systemd-devel] [WIP PATCH 0/4] Rework the unreliable LID switch 
> exported by ACPI
> 
> On Mon, 2017-06-19 at 01:43 +, Zheng, Lv wrote:
> > 
> > >
> > > If you implement it in such a way that GNOME settings daemon
> > > behaves weirdly, you'll get my revert
> > > request in the mail. Do. Not. Ever. Lie.
> >
> > First, I don't know what should be reverted...
> > I have 2 solutions here for review, and Benjamin has 1.
> > And none of them has been upstreamed.
> > We are just discussing.
> 
> The discussion is getting tiring quite frankly. We've been over this
> for nearly a year now, and with no end in sight.

We are concerned about introducing overly complicated logic into such a
simple button driver, especially since that logic touches platform
firmware, the input ABI and user space behavior.

I understand the situation.
Anyway this shouldn't be a big deal.
Let's prepare a smarter series that collects all fixes and solutions
behind runtime configurables and get that to the end users,
so that we can figure out which is the simplest solution.

But before that, let me ask several questions about gnome-settings-daemon.

> 
> > However we need to get 1 of them upstreamed in next cycle.
> >
> > I think users won't startup gnome-setting-daemon right after resume.
> > It should have already been started.
> >
> > There is only 1 platform may see delayed state update after resume.
> > Let's see if there is a practical issue.
> > 1. Before suspend, the "lid state" is "close", and
> > 2. After resume, the state might remain "close" for a while
> >Since libinput won't deliver close to userspace,
> >and gnome-setting-daemon listens to key switches, there is no
> > wrong behavior.
> 
> It doesn't. It listens to UPower, which tells user-space whether there
> is a lid switch, and whether it's opened or closed.

Thanks for the information.
However, I don't see a difference here.

> 
> > 3. Then after several seconds, "open" arrives.
> >gnome-setting-daemon re-arrange monitors and screen layouts in
> > response to the new event.
> 
> Just how is anyone supposed to know that there is an event coming?

Will UPower deliver EV_SW switch events to gnome-settings-daemon?

> 
> > So there is no problem. IMO, there is no need to improve for post-
> > resume case.
> >
> > Users will just startup gnome-setting-daemon once after boot.
> > And it's likely that when it is started, the state is correct.
> 
> You cannot rely on when gnome-settings-daemon will be started to make
> *any* decision. Certainly not decisions on how the kernel should
> behave.

Sorry for the bad wording, I just meant that
when gnome-settings-daemon is started is unrelated to what we are
discussing.

Do you want to fix regressions?
Or do you want to fix new issues on recent platforms?
If you want to fix regressions, I think Benjamin has submitted a revision
that uses the old method mode, so there shouldn't be regressions for
gnome-settings-daemon.

The other thing we want to do is fix the systemd regressions that appear
when we go back to the default method mode. There is no issue with systemd
233, and systemd 229 can be worked around with a small change, so
dynamically adding/removing the input node is not strictly required for
our purposes.

But if you want to fix new issues on new platforms, we can discuss
further and determine which program should be changed and which one
is the best place to fix all the problems: the ACPI button driver or
user space.

Cheers
Lv


RE: [PATCH v2 3/3] ACPI / sleep: EC-based wakeup from suspend-to-idle on recent Dell systems

2017-06-19 Thread Zheng, Lv
Hi, Rafael

> From: linux-acpi-ow...@vger.kernel.org 
> [mailto:linux-acpi-ow...@vger.kernel.org] On Behalf Of Rafael J.
> Wysocki
> Subject: [PATCH v2 3/3] ACPI / sleep: EC-based wakeup from suspend-to-idle on 
> recent Dell systems
> 
> From: Rafael J. Wysocki 
> 
> Some recent Dell laptops, including the XPS13 model numbers 9360 and
> 9365, cannot be woken up from suspend-to-idle by pressing the power
> button which is unexpected and makes that feature less usable on
> those systems.  Moreover, on the 9365 ACPI S3 (suspend-to-RAM) is
> not expected to be used at all (the OS these systems ship with never
> exercises the ACPI S3 path) and suspend-to-idle is the only viable
> system suspend mechanism in there.
> 
> The reason why the power button wakeup from suspend-to-idle doesn't
> work on those systems is because their power button events are
> signaled by the EC (Embedded Controller), whose GPE (General Purpose
> Event) line is disabled during suspend-to-idle transitions in Linux.
> That is done on purpose, because in general the EC tends to generate
> tons of events for various reasons (battery and thermal updates and
> similar, for example) and all of them would kick the CPUs out of deep
> idle states while in suspend-to-idle, which effectively would defeat
> its purpose.
> 
> Of course, on the Dell systems in question the EC GPE must be enabled
> during suspend-to-idle transitions for the button press events to
> be signaled while suspended at all.  For this reason, add a DMI
> switch to the ACPI system suspend infrastructure to treat the EC
> GPE as a wakeup one on the affected Dell systems.  In case the
> users would prefer not to do that after all, add a new kernel
> command line switch, acpi_sleep=no_ec_wakeup, to disable that new
> behavior.
> 
> Signed-off-by: Rafael J. Wysocki 
> ---
> 
> -> v2: Added acpi_sleep=no_ec_wakeup to prevent EC events from waking up
>   the system from s2idle on systems where they do that by default.
> 
> ---
>  Documentation/admin-guide/kernel-parameters.txt |    6 ++-
>  arch/x86/kernel/acpi/sleep.c                    |    2 +
>  drivers/acpi/ec.c                               |   19 ++
>  drivers/acpi/internal.h                         |    2 +
>  drivers/acpi/sleep.c                            |   43
>  include/linux/acpi.h                            |    1
>  6 files changed, 71 insertions(+), 2 deletions(-)
> 
> Index: linux-pm/drivers/acpi/ec.c
> ===
> --- linux-pm.orig/drivers/acpi/ec.c
> +++ linux-pm/drivers/acpi/ec.c
> @@ -40,6 +40,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  #include 
> 
>  #include "internal.h"
> @@ -1493,6 +1494,16 @@ static int acpi_ec_setup(struct acpi_ec
>   acpi_handle_info(ec->handle,
>"GPE=0x%lx, EC_CMD/EC_SC=0x%lx, EC_DATA=0x%lx\n",
>ec->gpe, ec->command_addr, ec->data_addr);
> +
> + /*
> +  * On some platforms the EC GPE is used for waking up the system from
> +  * suspend-to-idle, so mark it as a wakeup one.
> +  *
> +  * This can be done unconditionally, as the setting does not matter
> +  * until acpi_set_gpe_wake_mask() is called for the GPE.
> +  */
> + acpi_mark_gpe_for_wake(NULL, ec->gpe);
> +
>   return ret;
>  }
> 
> @@ -1835,8 +1846,11 @@ static int acpi_ec_suspend(struct device
>   struct acpi_ec *ec =
>   acpi_driver_data(to_acpi_device(dev));
> 
> - if (ec_freeze_events)
> + if (!pm_suspend_via_firmware() && acpi_sleep_ec_gpe_may_wakeup())
> + acpi_set_gpe_wake_mask(NULL, ec->gpe, ACPI_GPE_ENABLE);
> + else if (ec_freeze_events)
>   acpi_ec_disable_event(ec);
> +
>   return 0;
>  }
> 
> @@ -1846,6 +1860,9 @@ static int acpi_ec_resume(struct device
>   acpi_driver_data(to_acpi_device(dev));
> 
>   acpi_ec_enable_event(ec);
> + if (!pm_resume_via_firmware() && acpi_sleep_ec_gpe_may_wakeup())
> + acpi_set_gpe_wake_mask(NULL, ec->gpe, ACPI_GPE_DISABLE);
> +
>   return 0;
>  }
>  #endif
> Index: linux-pm/drivers/acpi/internal.h
> ===
> --- linux-pm.orig/drivers/acpi/internal.h
> +++ linux-pm/drivers/acpi/internal.h
> @@ -199,9 +199,11 @@ void acpi_ec_remove_query_handler(struct
>-- 
> */
>  #ifdef CONFIG_ACPI_SYSTEM_POWER_STATES_SUPPORT
>  extern bool acpi_s2idle_wakeup(void);
> +extern bool acpi_sleep_ec_gpe_may_wakeup(void);
>  extern int acpi_sleep_init(void);
>  #else
>  static inline bool acpi_s2idle_wakeup(void) { return false; }
> +static inline bool acpi_sleep_ec_gpe_may_wakeup(void) { return false; }
>  static inline int acpi_sleep_init(void) { return -ENXIO; }
>  #endif
> 
> Index: linux-pm/drivers/acpi/sleep.c
> 
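
The decision made in the acpi_ec_suspend() and acpi_ec_resume() hunks quoted
above can be summarised by the following userspace model. It is a sketch
only: the enum and function names are hypothetical, and the three booleans
stand in for pm_suspend_via_firmware() (or pm_resume_via_firmware()),
acpi_sleep_ec_gpe_may_wakeup() and ec_freeze_events.

```c
#include <stdbool.h>

enum ec_suspend_action {
    EC_KEEP_EVENTS,    /* leave EC event handling as it is */
    EC_ARM_GPE_WAKEUP, /* acpi_set_gpe_wake_mask(..., ACPI_GPE_ENABLE) */
    EC_DISABLE_EVENTS, /* acpi_ec_disable_event() */
};

/* Mirrors the if / else-if chain added to acpi_ec_suspend(). */
static enum ec_suspend_action
ec_suspend_decision(bool via_firmware, bool gpe_may_wakeup, bool freeze_events)
{
    if (!via_firmware && gpe_may_wakeup)
        return EC_ARM_GPE_WAKEUP;
    if (freeze_events)
        return EC_DISABLE_EVENTS;
    return EC_KEEP_EVENTS;
}

/*
 * Resume always re-enables event handling; the wake mask is cleared
 * only under the same condition that set it during suspend.
 */
static bool ec_resume_clears_wake_mask(bool via_firmware, bool gpe_may_wakeup)
{
    return !via_firmware && gpe_may_wakeup;
}
```

Note that with this ordering ec_freeze_events is ignored on s2idle when the
EC GPE may wake up the system, so EC events keep being handled while
suspended on the quirked machines.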

RE: [systemd-devel] [WIP PATCH 0/4] Rework the unreliable LID switch exported by ACPI

2017-06-18 Thread Zheng, Lv
Hi, Lennart

> From: Lennart Poettering [mailto:mzxre...@0pointer.de]
> Subject: Re: [systemd-devel] [WIP PATCH 0/4] Rework the unreliable LID switch 
> exported by ACPI
> 
> On Fri, 16.06.17 11:06, Bastien Nocera (had...@hadess.net) wrote:
> 
> > > Let's consider this case with delay:
> > > After resume, gnome-setting-daemon queries SW_LID and got "close".
> > > Then it lights up the wrong monitors.
> > > Then I believe "open" will be delivered to it several seconds later.
> > > Should gnome-setting-daemon light-up correct monitors this time?
> > > So it just looks like user programs behave with a delay accordingly
> > > because of the "platform turnaround" delay.
> >
> > If you implement it in such a way that GNOME settings daemon behaves
> > weirdly, you'll get my revert request in the mail. Do. Not. Ever. Lie.
> 
> Just to mention this:
> 
> the reason logind applies the timeout and doesn't immediately react to
> lid changes is to be friendly to users, if they quickly close and
> reopen the lid. It's not supposed to be a work-around around broken
> input drivers.

I see, it's the same reason the button driver provides "lid_report_interval".

I think all old user reports are meaningless to us.
At that time, we found 2 problems in systemd (versions below 229):
1. If no "open" event is received after resume, systemd suspends the platform.
2. If an "open" event is received after a "close" event, the suspend cannot be
   cancelled, and systemd still suspends the platform.
It looks like the 2 problems are a single issue that has already been fixed in
recent systemd (I confirmed that this has been fixed in 233).
It's hard for a kernel driver to work around these 2 problems.

> 
> I am very sure that input drivers shouldn't lie to userspace. If you
> don't know the state of the switch, then you don't know it, and should
> clarify that to userspace somehow.

Leaving aside the "Surface Pro 1" case, which requires a quirk anyway:

With my version 2 solution, there is no "lie" on any other platform.
There is only a delay, and it's likely that only 1 platform suffers
from such a delay.

Consider a platform that suffers from such a delay:
Before the platform sends the "open" event, the old cached state is "close",
and the input layer automatically filters redundant "close" reports.
Thus systemd won't see any event after resume.
 
Then after several seconds (configurable via HoldoffTimeoutSec),
the lid status becomes correct and systemd sees an "open".
So there won't be a problem.
I can tell you that I tested systemd 229/233, and no problem can be seen with
the version 2 solution on all those platforms.
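
The filtering behavior described above can be illustrated with a small
userspace model. This is a sketch under the assumption that the input core
drops an EV_SW report whose value equals the cached switch state; the struct
and function names are hypothetical.

```c
#include <stdbool.h>

/* Hypothetical model of one switch as the input layer tracks it. */
struct lid_switch {
    bool closed; /* cached state last reported to userspace */
};

/*
 * Report a new state. Returns true if an event would reach userspace,
 * false if the report repeats the cached state and is dropped.
 */
static bool lid_report(struct lid_switch *sw, bool closed)
{
    if (sw->closed == closed)
        return false;
    sw->closed = closed;
    return true;
}
```

Starting from a cached "close", a second "close" after resume produces no
event, and the delayed "open" is the first thing userspace sees.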

Since everything works, I don't see why we need to change the ACPI-driven
SW_LID into a "fade-in/out" input node.

On the contrary:
1. I feel the delay is common:
If an HID device is built on top of USBIP, there are always delays
(several seconds of network turnaround) for its SW_xxx switches, if any.
So do we need to change all HID device drivers to export SW keys in
fade-in/out style just because the underlying transport layer may change?
In the case we are discussing, the underlying transport layer just happens
to be the platform hardware/firmware, and some of it has a huge delay.
2. I feel the delay is inevitable:
If the kernel had to resume userspace only after determining the wakeup
reason, and after the driver of the related wakeup source hardware or
firmware had synchronized its state, the suspend/resume cycle would take
a long time and could not meet the fast idle entry/exit requirements of
modern idle platforms. Even worse, most hardware/firmware drivers don't
even know that the hardware/firmware they drive is what woke the
platform up.

I feel it's too early to say that we need such a big change.
We can wait and see if there are any further use cases requiring us to
handle before making such a big change.

Cheers,
Lv


RE: [systemd-devel] [WIP PATCH 0/4] Rework the unreliable LID switch exported by ACPI

2017-06-18 Thread Zheng, Lv
Hi,

> From: Bastien Nocera [mailto:had...@hadess.net]
> Subject: Re: [systemd-devel] [WIP PATCH 0/4] Rework the unreliable LID switch 
> exported by ACPI
> 
> 
> 
> > On 16 Jun 2017, at 10:53, Zheng, Lv <lv.zh...@intel.com> wrote:
> >
> > Hi,
> >
> >> From: Benjamin Tissoires [mailto:benjamin.tissoi...@redhat.com]
> >> Subject: Re: [systemd-devel] [WIP PATCH 0/4] Rework the unreliable LID 
> >> switch exported by ACPI
> >>
> >>> On Jun 16 2017 or thereabouts, Zheng, Lv wrote:
> >>> Hi, Benjamin
> >>>
> >>> Let me just say something one more time.
> >>>
> >>>> From: Benjamin Tissoires [mailto:benjamin.tissoi...@redhat.com]
> >>
> >> [snip]
> >>>>>>>
> >>>>>>> We can see:
> >>>>>>> "logind" has already implemented a timeout, and will not respond lid 
> >>>>>>> state
> >>>>>>> unless it can be stable within this timeout period.
> >>>>>>> I'm not an expert of logind, maybe this is because of 
> >>>>>>> "HoldOffTimeoutSec"?
> >>>>>>>
> >>>>>>> I feel "removing the input node for a period where its state is not 
> >>>>>>> trustful"
> >>>>>>> is technically identical to this mechanism.
> >>>>>>
> >>>>>> but you'd be making kernel policy based on one userspace 
> >>>>>> implementation.
> >>>>>> e.g. libinput doesn't have a timeout period, it assumes the state is
> >>>>>> correct when an input node is present.
> >>>>>
> >>>>> Do you see practical issues?
> >>>>
> >>>> Yes, libinput can't rely on the LID switch information to disable
> >>>> touchpads/touchscreens that are potentially sending false positive.
> >>>
> >>> "potential" doesn't mean "practical", right?
> >>
> >> I was using potential to say that some actual devices are sending
> >> rightful states, while others are not (we already named them a lot in
> >> those countless threads). So potential here is from a user space
> >> perspective where you are not sure if the state is reliable or not
> >> (given we currently don't have this information about reliability).
> >>
> >>> After applying my last version.
> >>> There are no false-positives IMO.
> >>> There are only delays for the reliable key events.
> >>>   ^^
> >>> While the "delay" is very common in computing world.
> >>
> >> No, if there is a delay, there is a false positive, because the initial
> >> state is wrong with respect to the device physical state.
> >>
> >>>
> >>>>> After resume, SW_LID state could remain unreliable "close" for a while.
> >>>>
> >>>> This is not an option. It is not part of the protocol, having an
> >>>> unreliable state.
> >>>>
> >>>>> But that's just a kind of delay happens in all computing programs.
> >>>>> I suppose all power managing programs have already handled that.
> >>>>> I confirmed no breakage for systemd 233.
> >>>>> For systemd 229, it cannot handle it well due to bugs.
> >>>>> But my latest patch series has worked the bug around.
> >>>>> So I don't see any breakage related to post-resume incorrect state 
> >>>>> period.
> >>>>> Do you see problems that my tests haven't covered?
> >>>>
> >>>> The problems are that you are not following the protocol. And if systemd
> >>>> 233 works around it, that's good, but systemd is not the only listener
> >>>> of the LID switch input node, and you are still breaking those by
> >>>> refusing to follow the specification of the evdev protocol.
> >>>
> >>> As you are talking about protocol, let me just ask once.
> >>>
> >>> In computing world,
> >>> 1. delay is very common
> >>>   There are bus turnaround, network turnaround, ...
> >>>   Even measurement itself has delay described by Shannon sampling.
> >>>   Should the delay be a part of the protocol?
> >>
> >> Please, you are either trolling or just kidding. If the

RE: [systemd-devel] [WIP PATCH 0/4] Rework the unreliable LID switch exported by ACPI

2017-06-16 Thread Zheng, Lv
Hi,

> From: Benjamin Tissoires [mailto:benjamin.tissoi...@redhat.com]
> Subject: Re: [systemd-devel] [WIP PATCH 0/4] Rework the unreliable LID switch 
> exported by ACPI
> 
> On Jun 16 2017 or thereabouts, Zheng, Lv wrote:
> > Hi, Benjamin
> >
> > Let me just say something one more time.
> >
> > > From: Benjamin Tissoires [mailto:benjamin.tissoi...@redhat.com]
> 
> [snip]
> > > > > >
> > > > > > We can see:
> > > > > > "logind" has already implemented a timeout, and will not respond 
> > > > > > lid state
> > > > > > unless it can be stable within this timeout period.
> > > > > > I'm not an expert of logind, maybe this is because of 
> > > > > > "HoldOffTimeoutSec"?
> > > > > >
> > > > > > I feel "removing the input node for a period where its state is not 
> > > > > > trustful"
> > > > > > is technically identical to this mechanism.
> > > > >
> > > > > but you'd be making kernel policy based on one userspace 
> > > > > implementation.
> > > > > e.g. libinput doesn't have a timeout period, it assumes the state is
> > > > > correct when an input node is present.
> > > >
> > > > Do you see practical issues?
> > >
> > > Yes, libinput can't rely on the LID switch information to disable
> > > touchpads/touchscreens that are potentially sending false positive.
> >
> > "potential" doesn't mean "practical", right?
> 
> I was using potential to say that some actual devices are sending
> rightful states, while others are not (we already named them a lot in
> those countless threads). So potential here is from a user space
> perspective where you are not sure if the state is reliable or not
> (given we currently don't have this information about reliability).
> 
> > After applying my last version.
> > There are no false-positives IMO.
> > There are only delays for the reliable key events.
> >^^
> > While the "delay" is very common in computing world.
> 
> No, if there is a delay, there is a false positive, because the initial
> state is wrong with respect to the device physical state.
> 
> >
> > > > After resume, SW_LID state could remain unreliable "close" for a while.
> > >
> > > This is not an option. It is not part of the protocol, having an
> > > unreliable state.
> > >
> > > > But that's just a kind of delay happens in all computing programs.
> > > > I suppose all power managing programs have already handled that.
> > > > I confirmed no breakage for systemd 233.
> > > > For systemd 229, it cannot handle it well due to bugs.
> > > > But my latest patch series has worked the bug around.
> > > > So I don't see any breakage related to post-resume incorrect state 
> > > > period.
> > > > Do you see problems that my tests haven't covered?
> > >
> > > The problems are that you are not following the protocol. And if systemd
> > > 233 works around it, that's good, but systemd is not the only listener
> > > of the LID switch input node, and you are still breaking those by
> > > refusing to follow the specification of the evdev protocol.
> >
> > As you are talking about protocol, let me just ask once.
> >
> > In computing world,
> > 1. delay is very common
> >    There are bus turnaround, network turnaround, ...
> >    Even measurement itself has delay described by Shannon sampling.
> >    Should the delay be a part of the protocol?
> 
> Please, you are either trolling or just kidding. If there are delays in
> the "computing world", these has to be handled by the kernel, and not
> exported to the user space if the kernel protocol says that the state is
> reliable.
> 
> > 2. programs are acting according to rules (we call state machines)
> >    States are only determined after measurement (like "quantum states")
> >    I have Schroedinger's cat in my mind.
> >    Events are determined as they always occur after measurement to trigger
> >    "quantum jumps".
> >    So for EV_SW protocol,
> >    Should programs rely on the reliable "quantum jumps",
> >    Or should programs rely on the unreliable "quantum states"?
> 
> No comments, this won't get us anywhere.
> 
> >
> > I think most UI programs care no about state store

RE: [systemd-devel] [WIP PATCH 0/4] Rework the unreliable LID switch exported by ACPI

2017-06-16 Thread Zheng, Lv
Hi, Benjamin

Let me just say something one more time.

> From: Benjamin Tissoires [mailto:benjamin.tissoi...@redhat.com]
> Subject: Re: [systemd-devel] [WIP PATCH 0/4] Rework the unreliable LID switch 
> exported by ACPI
> 
> On Jun 16 2017 or thereabouts, Zheng, Lv wrote:
> > Hi,
> >
> > > From: linux-acpi-ow...@vger.kernel.org
> > > [mailto:linux-acpi-ow...@vger.kernel.org] On Behalf Of Peter Hutterer
> > > Subject: Re: [systemd-devel] [WIP PATCH 0/4] Rework the unreliable LID 
> > > switch exported by ACPI
> > >
> > > On Thu, Jun 15, 2017 at 07:33:58AM +, Zheng, Lv wrote:
> > > > Hi, Peter
> > > >
> > > > > From: Peter Hutterer [mailto:peter.hutte...@who-t.net]
> > > > > Subject: Re: [systemd-devel] [WIP PATCH 0/4] Rework the unreliable 
> > > > > LID switch exported by ACPI
> > > > >
> > > > > On Thu, Jun 15, 2017 at 02:52:57AM +, Zheng, Lv wrote:
> > > > > > Hi, Benjamin
> > > > > >
> > > > > > > From: Benjamin Tissoires [mailto:benjamin.tissoi...@redhat.com]
> > > > > > > > Subject: Re: [systemd-devel] [WIP PATCH 0/4] Rework the
> > > > > > > > unreliable LID switch exported by ACPI
> > > > > > >
> > > > > > > Hi,
> > > > > > >
> > > > > > > [Sorry for the delay, I have been sidetracked from this]
> > > > > > >
> > > > > > > On Jun 07 2017 or thereabouts, Lennart Poettering wrote:
> > > > > > > > On Thu, 01.06.17 20:46, Benjamin Tissoires 
> > > > > > > > (benjamin.tissoi...@redhat.com) wrote:
> > > > > > > >
> > > > > > > > > Hi,
> > > > > > > > >
> > > > > > > > > Sending this as a WIP as it still need a few changes, but it 
> > > > > > > > > mostly works as
> > > > > > > > > expected (still not fully compliant yet).
> > > > > > > > >
> > > > > > > > > So this is based on Lennart's comment in [1]: if the LID 
> > > > > > > > > state is not reliable,
> > > > > > > > > the kernel should not export the LID switch device as long as 
> > > > > > > > > we are not sure
> > > > > > > > > about its state.
> > > > > > > >
> > > > > > > > Ah nice! I (obviously) like this approach.
> > > > > > >
> > > > > > > Heh. Now I just need to convince Lv that it's the right approach.
> > > > > >
> > > > > > I feel we don't have big conflicts.
> > > > > > And I already took part of your idea into this patchset:
> > > > > > https://patchwork.kernel.org/patch/9771121/
> > > > > > https://patchwork.kernel.org/patch/9771119/
> > > > > > I tested my surface pros with Ubuntu, they are working as expected.
> > > > > >
> > > > > > > > > Note that systemd currently doesn't sync the state when the 
> > > > > > > > > input node just
> > > > > > > > > appears. This is a systemd bug, and it should not be handled 
> > > > > > > > > by the kernel
> > > > > > > > > community.
> > > > > > > >
> > > > > > > > Uh if this is borked, we should indeed fix this in systemd. Is 
> > > > > > > > there
> > > > > > > > already a systemd github bug about this? If not, please create 
> > > > > > > > one,
> > > > > > > > and we'll look into it!
> > > > > > >
> > > > > > > I don't think there is. I haven't raised it yet because I am not 
> > > > > > > so sure
> > > > > > > this will not break again those worthless unreliable LID, and if 
> > > > > > > we play
> > > > > > > whack a mole between the kernel and user space, things are going 
> > > > > > > to be
> > > > > > > nasty. So I'd rather have this fixed in systemd along with the
> > > > > > > unreliable LID switch knowledge, so we are sure that the kernel 
> > > > > > > behaves
> > > > > > > >

RE: [systemd-devel] [WIP PATCH 0/4] Rework the unreliable LID switch exported by ACPI

2017-06-15 Thread Zheng, Lv
Hi,

> From: linux-acpi-ow...@vger.kernel.org 
> [mailto:linux-acpi-ow...@vger.kernel.org] On Behalf Of Peter
> Hutterer
> Subject: Re: [systemd-devel] [WIP PATCH 0/4] Rework the unreliable LID switch 
> exported by ACPI
> 
> On Thu, Jun 15, 2017 at 07:33:58AM +, Zheng, Lv wrote:
> > Hi, Peter
> >
> > > From: Peter Hutterer [mailto:peter.hutte...@who-t.net]
> > > Subject: Re: [systemd-devel] [WIP PATCH 0/4] Rework the unreliable LID 
> > > switch exported by ACPI
> > >
> > > On Thu, Jun 15, 2017 at 02:52:57AM +, Zheng, Lv wrote:
> > > > Hi, Benjamin
> > > >
> > > > > From: Benjamin Tissoires [mailto:benjamin.tissoi...@redhat.com]
> > > > > Subject: Re: [systemd-devel] [WIP PATCH 0/4] Rework the unreliable 
> > > > > LID switch exported by ACPI
> > > > >
> > > > > Hi,
> > > > >
> > > > > [Sorry for the delay, I have been sidetracked from this]
> > > > >
> > > > > On Jun 07 2017 or thereabouts, Lennart Poettering wrote:
> > > > > > On Thu, 01.06.17 20:46, Benjamin Tissoires 
> > > > > > (benjamin.tissoi...@redhat.com) wrote:
> > > > > >
> > > > > > > Hi,
> > > > > > >
> > > > > > > Sending this as a WIP as it still need a few changes, but it 
> > > > > > > mostly works as
> > > > > > > expected (still not fully compliant yet).
> > > > > > >
> > > > > > > So this is based on Lennart's comment in [1]: if the LID state is 
> > > > > > > not reliable,
> > > > > > > the kernel should not export the LID switch device as long as we 
> > > > > > > are not sure
> > > > > > > about its state.
> > > > > >
> > > > > > Ah nice! I (obviously) like this approach.
> > > > >
> > > > > Heh. Now I just need to convince Lv that it's the right approach.
> > > >
> > > > I feel we don't have big conflicts.
> > > > And I already took part of your idea into this patchset:
> > > > https://patchwork.kernel.org/patch/9771121/
> > > > https://patchwork.kernel.org/patch/9771119/
> > > > I tested my surface pros with Ubuntu, they are working as expected.
> > > >
> > > > > > > Note that systemd currently doesn't sync the state when the input 
> > > > > > > node just
> > > > > > > appears. This is a systemd bug, and it should not be handled by 
> > > > > > > the kernel
> > > > > > > community.
> > > > > >
> > > > > > Uh if this is borked, we should indeed fix this in systemd. Is there
> > > > > > already a systemd github bug about this? If not, please create one,
> > > > > > and we'll look into it!
> > > > >
> > > > > I don't think there is. I haven't raised it yet because I am not so 
> > > > > sure
> > > > > this will not break again those worthless unreliable LID, and if we 
> > > > > play
> > > > > whack a mole between the kernel and user space, things are going to be
> > > > > nasty. So I'd rather have this fixed in systemd along with the
> > > > > unreliable LID switch knowledge, so we are sure that the kernel 
> > > > > behaves
> > > > > the way we expect it to be.
> > > >
> > > > This is my feeling:
> > > > We needn't go that far.
> > > > We can interpret "input node appears" into "default input node state".
> > >
> > > Sorry, can you clarify this bit please? I'm not sure what you mean here.
> > > Note that there's an unknown amount of time between "device node appearing
> > > in the system" and when a userspace process actually opens it and looks at
> > > its state. By then, the node may have changed state again.
> >
> > We can see:
> > "logind" has already implemented a timeout, and will not respond lid state
> > unless it can be stable within this timeout period.
> > I'm not an expert of logind, maybe this is because of "HoldOffTimeoutSec"?
> >
> > I feel "removing the input node for a period where its state is not 
> > trustful"
> > is technically identical to this mechanism.
> 
> but you'd be making kernel policy based on one userspace implementation.
> e.g. libinput doesn't have a timeout period, it assumes the state is
> correct when an input node is present.

Do you see practical issues?
If not, should we avoid over-engineering at this moment?

After resume, the SW_LID state could remain an unreliable "close" for a while.
But that's just the kind of delay that happens in all computing programs.
I suppose all power management programs already handle that.
I confirmed there is no breakage with systemd 233.
systemd 229 cannot handle it well due to bugs,
but my latest patch series works around them.
So I don't see any breakage related to the post-resume incorrect-state period.
Do you see problems that my tests haven't covered?
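
The two positions in this exchange can be put side by side on a toy timeline.
This is only a sketch: the 3-second platform delay and the 30-second hold-off
are hypothetical numbers, and `state_at` is an illustrative helper, not logind
or libinput code.

```python
def state_at(changes, initial, t):
    """Return the cached switch state at time t (seconds after resume).

    changes: iterable of (time, state) pairs for delivered state changes.
    """
    state = initial
    for when, new_state in sorted(changes):
        if when <= t:
            state = new_state
    return state


# Hypothetical platform: the real "open" arrives 3 s after resume.
changes = [(3.0, "open")]

# A consumer that trusts the node as soon as it can read it.
immediate = state_at(changes, "close", 0.0)
# A consumer that holds off for 30 s after resume before acting.
held_off = state_at(changes, "close", 30.0)
```

In this model the hold-off consumer never notices the stale state, while the
immediate one acts on "close" until the delayed event arrives.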

So I wonder if you mean:
After boot, the button driver should create the input node right before sending
the first input report.
Is this exactly what you want me to improve?
If so, please also let me know whether you have seen real issues related to this.

Cheers,
Lv


RE: [systemd-devel] [WIP PATCH 0/4] Rework the unreliable LID switch exported by ACPI

2017-06-15 Thread Zheng, Lv
Hi, Peter

> From: Peter Hutterer [mailto:peter.hutte...@who-t.net]
> Subject: Re: [systemd-devel] [WIP PATCH 0/4] Rework the unreliable LID switch 
> exported by ACPI
> 
> On Thu, Jun 15, 2017 at 02:52:57AM +, Zheng, Lv wrote:
> > Hi, Benjamin
> >
> > > From: Benjamin Tissoires [mailto:benjamin.tissoi...@redhat.com]
> > > Subject: Re: [systemd-devel] [WIP PATCH 0/4] Rework the unreliable LID 
> > > switch exported by ACPI
> > >
> > > Hi,
> > >
> > > [Sorry for the delay, I have been sidetracked from this]
> > >
> > > On Jun 07 2017 or thereabouts, Lennart Poettering wrote:
> > > > On Thu, 01.06.17 20:46, Benjamin Tissoires 
> > > > (benjamin.tissoi...@redhat.com) wrote:
> > > >
> > > > > Hi,
> > > > >
> > > > > Sending this as a WIP as it still needs a few changes, but it mostly
> > > > > works as expected (still not fully compliant yet).
> > > > >
> > > > > So this is based on Lennart's comment in [1]: if the LID state is not
> > > > > reliable, the kernel should not export the LID switch device as long as
> > > > > we are not sure about its state.
> > > >
> > > > Ah nice! I (obviously) like this approach.
> > >
> > > Heh. Now I just need to convince Lv that it's the right approach.
> >
> > I feel we don't have big conflicts.
> > And I already took part of your idea into this patchset:
> > https://patchwork.kernel.org/patch/9771121/
> > https://patchwork.kernel.org/patch/9771119/
> > I tested my surface pros with Ubuntu, they are working as expected.
> >
> > > > > Note that systemd currently doesn't sync the state when the input
> > > > > node just appears. This is a systemd bug, and it should not be handled
> > > > > by the kernel community.
> > > >
> > > > Uh if this is borked, we should indeed fix this in systemd. Is there
> > > > already a systemd github bug about this? If not, please create one,
> > > > and we'll look into it!
> > >
> > > I don't think there is. I haven't raised it yet because I am not so sure
> > > this will not break again those worthless unreliable LID, and if we play
> > > whack a mole between the kernel and user space, things are going to be
> > > nasty. So I'd rather have this fixed in systemd along with the
> > > unreliable LID switch knowledge, so we are sure that the kernel behaves
> > > the way we expect it to be.
> >
> > This is my feeling:
> > We needn't go that far.
> > We can interpret "input node appears" into "default input node state".
> 
> Sorry, can you clarify this bit please? I'm not sure what you mean here.
> Note that there's an unknown amount of time between "device node appearing
> in the system" and when a userspace process actually opens it and looks at
> its state. By then, the node may have changed state again.

As far as we can see,
"logind" already implements a timeout, and will not respond to the lid state
unless it stays stable within this timeout period.
I'm not a logind expert; maybe this is because of "HoldOffTimeoutSec"?

I feel "removing the input node for the period in which its state is not
trustworthy" is technically identical to this mechanism.
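
To illustrate the comparison, here is a minimal sketch in C of a hold-off
filter of the kind described above: a lid state is only acted upon once it has
stayed unchanged for a whole timeout window. This is not logind's actual
implementation; the names lid_filter, lid_filter_init, lid_filter_report and
lid_filter_trusted, and the millisecond timestamps, are assumptions made up
for this example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical hold-off filter; NOT taken from logind or the kernel. */
struct lid_filter {
	bool raw_closed;        /* last raw lid value that was reported */
	uint64_t changed_at_ms; /* time at which raw_closed last changed */
	uint64_t hold_off_ms;   /* how long the value must stay unchanged */
};

void lid_filter_init(struct lid_filter *f, bool closed,
		     uint64_t now_ms, uint64_t hold_off_ms)
{
	assert(f);
	f->raw_closed = closed;
	f->changed_at_ms = now_ms;
	f->hold_off_ms = hold_off_ms;
}

/* Record a raw lid report; only a real change restarts the window. */
void lid_filter_report(struct lid_filter *f, bool closed, uint64_t now_ms)
{
	assert(f);
	if (closed != f->raw_closed) {
		f->raw_closed = closed;
		f->changed_at_ms = now_ms;
	}
}

/* True once the raw value has been stable for the whole hold-off window. */
bool lid_filter_trusted(const struct lid_filter *f, uint64_t now_ms)
{
	assert(f);
	return now_ms - f->changed_at_ms >= f->hold_off_ms;
}
```

With a 3000 ms hold-off, a change reported at t=2000 is not yet trusted at
t=4000 but is trusted at t=5000. Hiding the input node during the same window
aims at the same effect from the kernel side.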

Cheers,
Lv

> 
> Cheers,
>Peter
> 
> > That's what you want for acpi button driver - we now defaults to "method" 
> > mode.
> >
> > What's your opinion?
> >
> > Thanks
> > Lv
> >


RE: [systemd-devel] [WIP PATCH 0/4] Rework the unreliable LID switch exported by ACPI

2017-06-14 Thread Zheng, Lv
Hi, Benjamin

> From: Benjamin Tissoires [mailto:benjamin.tissoi...@redhat.com]
> Subject: Re: [systemd-devel] [WIP PATCH 0/4] Rework the unreliable LID switch 
> exported by ACPI
> 
> Hi,
> 
> [Sorry for the delay, I have been sidetracked from this]
> 
> On Jun 07 2017 or thereabouts, Lennart Poettering wrote:
> > On Thu, 01.06.17 20:46, Benjamin Tissoires (benjamin.tissoi...@redhat.com) 
> > wrote:
> >
> > > Hi,
> > >
> > > Sending this as a WIP as it still needs a few changes, but it mostly
> > > works as expected (still not fully compliant yet).
> > >
> > > So this is based on Lennart's comment in [1]: if the LID state is not
> > > reliable, the kernel should not export the LID switch device as long as
> > > we are not sure about its state.
> >
> > Ah nice! I (obviously) like this approach.
> 
> Heh. Now I just need to convince Lv that it's the right approach.

I feel we don't have big conflicts.
I have already taken part of your idea into this patchset:
https://patchwork.kernel.org/patch/9771121/
https://patchwork.kernel.org/patch/9771119/
I tested my Surface Pros with Ubuntu; they are working as expected.

> > > Note that systemd currently doesn't sync the state when the input node
> > > just appears. This is a systemd bug, and it should not be handled by the
> > > kernel community.
> >
> > Uh if this is borked, we should indeed fix this in systemd. Is there
> > already a systemd github bug about this? If not, please create one,
> > and we'll look into it!
> 
> I don't think there is. I haven't raised it yet because I am not so sure
> this will not break again those worthless unreliable LID, and if we play
> whack a mole between the kernel and user space, things are going to be
> nasty. So I'd rather have this fixed in systemd along with the
> unreliable LID switch knowledge, so we are sure that the kernel behaves
> the way we expect it to be.

My feeling is this:
We needn't go that far.
We can interpret "input node appears" as "default input node state".
That's what you want from the ACPI button driver - we now default to "method" mode.

What's your opinion?

Thanks
Lv


RE: [systemd-devel] [WIP PATCH 0/4] Rework the unreliable LID switch exported by ACPI

2017-06-14 Thread Zheng, Lv
Hi,

> From: Lennart Poettering [mailto:mzxre...@0pointer.de]
> Subject: Re: [systemd-devel] [WIP PATCH 0/4] Rework the unreliable LID switch 
> exported by ACPI
> 
> On Thu, 01.06.17 20:46, Benjamin Tissoires (benjamin.tissoi...@redhat.com) 
> wrote:
> 
> > Hi,
> >
> > Sending this as a WIP as it still needs a few changes, but it mostly works
> > as expected (still not fully compliant yet).
> >
> > So this is based on Lennart's comment in [1]: if the LID state is not
> > reliable, the kernel should not export the LID switch device as long as we
> > are not sure about its state.
> 
> Ah nice! I (obviously) like this approach.
> 
> > Note that systemd currently doesn't sync the state when the input node just
> > appears. This is a systemd bug, and it should not be handled by the kernel
> > community.
> 
> Uh if this is borked, we should indeed fix this in systemd. Is there
> already a systemd github bug about this? If not, please create one,
> and we'll look into it!

This is not my opinion.
My opinion is as follows.

We checked the systemd shipped by Ubuntu (version 229) on both "reliable" and
"unreliable" platforms.
We can see 2 problems:
1. LID_OPEN cannot cancel an ongoing suspend sequence
   After boot, if user space receives a "LID_CLOSE" key event,
   systemd may not suspend the platform right after seeing the event;
   it may suspend the platform several seconds later.
   This is not a problem.

   The problem is that if "LID_OPEN" is sent within this deferral period,
   systemd doesn't cancel the previously scheduled "suspend",
   and the platform may be suspended with the lid open.
   Users then need to close and re-open the lid to wake the system up,
   which delivers another "LID_CLOSE/LID_OPEN" sequence to user space
   after resume.
   Users can then see a suspend/resume loop.
   This problem can be seen even on "reliable" platforms.
   It can easily be triggered by user actions.
2. An explicit LID_OPEN is needed to stay awake
   After boot, systemd seems to wait for a significant "LID_OPEN".
   If it cannot see a "LID_OPEN" within several seconds,
   it suspends the platform.
   So if a platform doesn't send "LID_OPEN", or fails to send it within
   this period,
   users can see a suspend/resume loop.
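
The behavior that problem 1 asks for can be sketched as a small C model: a
"LID_CLOSE" schedules a deferred suspend, and a "LID_OPEN" arriving inside the
deferral period cancels it. This is only an illustration, not systemd code;
the names suspend_sched, sched_on_lid and sched_should_suspend, and the
millisecond clock, are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical deferred-suspend scheduler; NOT taken from systemd. */
struct suspend_sched {
	bool pending;         /* a suspend is scheduled and not yet cancelled */
	uint64_t deadline_ms; /* time at which the pending suspend fires */
	uint64_t delay_ms;    /* deferral period applied after LID_CLOSE */
};

/* Feed one lid event into the scheduler. */
void sched_on_lid(struct suspend_sched *s, bool closed, uint64_t now_ms)
{
	assert(s);
	if (closed) {
		s->pending = true;
		s->deadline_ms = now_ms + s->delay_ms;
	} else {
		/* LID_OPEN inside the deferral period cancels the suspend. */
		s->pending = false;
	}
}

/* Poll: returns true exactly once when a pending suspend is due. */
bool sched_should_suspend(struct suspend_sched *s, uint64_t now_ms)
{
	assert(s);
	if (!s->pending || now_ms < s->deadline_ms)
		return false;
	s->pending = false;
	return true;
}
```

In this model a close at t=1000 followed by an open at t=4000 never suspends,
so the platform cannot end up suspended with the lid open.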

However, we have also tested with systemd cloned from GitHub (version 233).
The 2 problems seem to have been fixed.
It works well with the current ACPI button driver,
but you need to boot the Linux kernel with button.lid_init_state=ignore.
I don't know the story behind the improvement.
The systemd developers should know that better than I do.

So IMO, systemd needn't make any further improvement.

But the kernel button driver implements several "lid_init_state" modes,
and "method" mode appears to have been chosen as the default.
Thus we need to:
1. improve the button driver's "method" mode so that systemd 233 works well with it.
2. determine whether we need to improve the button driver so that it works well
   with systemd 229.
Thanks and best regards
Lv 



RE: [PATCH] acpi: acpica: dsutils: fix an off-by-one index

2017-06-08 Thread Zheng, Lv
Hi,

> From: Seraphime Kirkovski
> 
> On Wed, Jun 07, 2017 at 03:14:46PM +, Moore, Robert wrote:
> > I believe that the rationale for this is that at that point in the code, it
> > is *guaranteed* that there is at least one operand; therefore the -1 would
> > always be valid.
> >
> > In the end, we just deleted that call to
> > acpi_db_display_argument_object.

Yes, the AcpiDbDisplayArgumentObject() invocation here is itself improper.
It can mess up the debugging messages.

In the debugging console, AcpiDbDisplayArgumentObject() can be enabled to display
stacked objects.
In single-step mode, it dumps the stacked objects on the debugging console at
each step.
This means AcpiDbDisplayArgumentObject() should be invoked once for each
AcpiDsObjStackPush().
There are only 2 AcpiDsObjStackPush() invocations,
and both of them are already paired with AcpiDbDisplayArgumentObject()
in AcpiDsCreateOperand().

This isolated AcpiDbDisplayArgumentObject() tries to recover missing stacked
object debugging information:
ACPI_DEBUG_PRINT((ACPI_DB_DISPATCH,
  "Argument previously created, already stacked\n"));
However, no object can be missing, as all operands are created via
AcpiDsCreateOperand().
In the end, this invocation only generates redundant debugging logs on the console:
 ArgObj: 00ADA610 Integer 
 ArgObj: 00ADA4F0 Integer 00FF
 ArgObj: 00ADA4F0 Integer 00FF <= redundant log generated by this line
 ArgObj: 00ADA580 Integer 
 ResultObj: 00ADA710 Integer 
So we just removed it.

> > I don't know if this change has made it into Linux yet.
> >
> The latest rc actually produces the UBSAN splat in my previous message.
> So I suppose, I have some buggy hardware/firmware.

It's in a recent ACPICA release and hasn't been merged into Linux upstream yet.
It's still on the community review list:
https://patchwork.kernel.org/project/linux-acpi/list/
under the name:
[33/53] ACPICA: Dispatcher: Remove unnecessary call to debugger
You'll see it merged in the next 1-2 RCs if everything goes smoothly.

Thanks
Lv


RE: [PATCH v5] ACPICA: Tables: Add mechanism to allow to balance late stage acpi_get_table() independently

2017-06-07 Thread Zheng, Lv
Hi, Dan

> From: Dan Williams [mailto:dan.j.willi...@intel.com]
> Subject: Re: [PATCH v5] ACPICA: Tables: Add mechanism to allow to balance 
> late stage acpi_get_table()
> independently
> 
> On Wed, Jun 7, 2017 at 2:14 PM, Rafael J. Wysocki  wrote:
> > On Wed, Jun 7, 2017 at 8:41 AM, Dan Williams  
> > wrote:
> >> On Tue, Jun 6, 2017 at 9:54 PM, Lv Zheng  wrote:
> >>> Considering this case:
> >>> 1. A program opens a sysfs table file 65535 times, it can increase
> >>>    validation_count and first increment cause the table to be mapped:
> >>>     validation_count = 65535
> >>> 2. AML execution causes "Load" to be executed on the same table, this time
> >>>    it cannot increase validation_count, so validation_count remains:
> >>>     validation_count = 65535
> >>> 3. The program closes sysfs table file 65535 times, it can decrease
> >>>    validation_count and the last decrement cause the table to be unmapped:
> >>>     validation_count = 0
> >>> 4. AML code still accessing the loaded table, kernel crash can be
> >>>    observed.
> >>>
> >>> This is because originally ACPICA doesn't support unmapping tables during
> >>> OS late stage. So the current code only allows unmapping tables during OS
> >>> early stage, and for late stage, no acpi_put_table() clones should be
> >>> invoked, especially cases that can trigger frequent invocations of
> >>> acpi_get_table()/acpi_put_table() are forbidden:
> >>> 1. sysfs table accesses
> >>> 2. dynamic Load/Unload opcode executions
> >>> 3. acpi_load_table()
> >>> 4. etc.
> >>> Such frequent acpi_put_table() balance changes have to be done altogether.
> >>>
> >>> This philosophy is not convenient for Linux driver writers. Since the API
> >>> is just there, developers will start to use acpi_put_table() during late
> >>> stage. So we need to consider a better mechanism to allow them to safely
> >>> invoke acpi_put_table().
> >>>
> >>> This patch provides such a mechanism by adding a validation_count
> >>> threshold. When it is reached, the validation_count can no longer be
> >>> incremented/decremented to invalidate the table descriptor (means
> >>> preventing table unmappings) so that acpi_put_table() balance changes can
> >>> be done independently to each others.
> >>>
> >>> Note: code added in acpi_tb_put_table() is actually a no-op but changes
> >>> the warning message into a warning once message. Lv Zheng.
> >>>
> >>
> >> This still seems to be unnecessary gymnastics to keep the validation
> >> count around and make it work for random drivers.
> >
> > Well, I'm not sure I agree here.
> >
> > If we can make it work at one point, it should not be too hard to
> > maintain that status.
> >
> 
> I agree with that, my concern was with driver writers needing to be
> worried about when it is safe to call acpi_put_table(). This reference
> count behaves differently than other reference counts like kobjects.

I don't think they behave differently.

"kref" needn't consider unbalanced "get/put",
because when drivers (users) deploy "kref",
they are responsible for ensuring balanced "get/put".
"kref" needn't care much about "overflow/underflow" either:
if all users ensure balanced "get/put",
"overflow/underflow" is not possible.
An occurrence of "overflow/underflow" means there is a bug,
which can then be caught as a "panic".

If "kref" wanted to "warn_once" users that overflow/underflow,
the logic in this commit could also be introduced into kref.
However, that would be useless, as all users already ensure balanced "get/put".
Putting a useless check, rather than a panic, on a hot path would be a waste.

> The difference is not necessarily bad, but hopefully it can be
> contained within the acpi core.

The old warning logic for the table descriptor is simply derived from utdelete.c,
which reduces the communication cost when the mechanism is upstreamed.

The ACPICA table "validation_count" is deployed on top of the old design,
in which "table unmap" is forbidden in the late stage.
Thus there are no users ensuring balanced "get/put".
Under these circumstances, when we start to deploy balanced "get/put",
we need to consider all users as a whole.
You cannot say that the current unbalanced "get/put" users have bugs.
They are there just for historical reasons.

Fortunately, after applying this patch,
drivers should have a better environment in which to use the new APIs.
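
As a rough illustration of the threshold idea from the patch description,
here is a minimal user-space sketch in C. It is not the ACPICA code: the
names table_desc, table_get, table_put and MAX_VALIDATIONS are hypothetical,
and the 16-bit counter is an assumption based on the 65535 figure used in the
scenario. Once the count reaches the threshold it is pinned, so later "put"
calls can no longer drive it to zero and unmap the table.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical threshold; chosen to match the 65535 scenario. */
#define MAX_VALIDATIONS UINT16_MAX

/* Hypothetical stand-in for a table descriptor. */
struct table_desc {
	uint16_t validation_count;
	bool mapped;
};

void table_get(struct table_desc *t)
{
	assert(t);
	if (t->validation_count >= MAX_VALIDATIONS)
		return; /* pinned: stop counting, keep the mapping forever */
	if (t->validation_count == 0)
		t->mapped = true; /* the first user maps the table */
	t->validation_count++;
}

void table_put(struct table_desc *t)
{
	assert(t);
	if (t->validation_count >= MAX_VALIDATIONS)
		return; /* pinned: never decrement, so never unmap */
	if (t->validation_count == 0)
		return; /* unbalanced put; a real implementation would warn once */
	if (--t->validation_count == 0)
		t->mapped = false; /* the last balanced user unmaps the table */
}
```

Replaying the scenario against this model (65535 gets, one more get for
"Load", then 65535 puts) leaves the table mapped, while a balanced get/put
pair below the threshold still maps and unmaps as usual.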

Cheers,
Lv


RE: [PATCH v5] ACPICA: Tables: Add mechanism to allow to balance late stage acpi_get_table() independently

2017-06-07 Thread Zheng, Lv
Hi, Dan

> From: Dan Williams [mailto:dan.j.willi...@intel.com]
> Subject: Re: [PATCH v5] ACPICA: Tables: Add mechanism to allow to balance 
> late stage acpi_get_table()
> independently
> 
> On Wed, Jun 7, 2017 at 2:14 PM, Rafael J. Wysocki  wrote:
> > On Wed, Jun 7, 2017 at 8:41 AM, Dan Williams  
> > wrote:
> >> On Tue, Jun 6, 2017 at 9:54 PM, Lv Zheng  wrote:
> >>> Considering this case:
> >>> 1. A program opens a sysfs table file 65535 times, it can increase
> >>>validation_count and first increment cause the table to be mapped:
> >>> validation_count = 65535
> >>> 2. AML execution causes "Load" to be executed on the same table, this time
> >>>it cannot increase validation_count, so validation_count remains:
> >>> validation_count = 65535
> >>> 3. The program closes sysfs table file 65535 times, it can decrease
> >>>validation_count and the last decrement cause the table to be unmapped:
> >>> validation_count = 0
> >>> 4. AML code still accessing the loaded table, kernel crash can be 
> >>> observed.
> >>>
> >>> This is because orginally ACPICA doesn't support unmapping tables during
> >>> OS late stage. So the current code only allows unmapping tables during OS
> >>> early stage, and for late stage, no acpi_put_table() clones should be
> >>> invoked, especially cases that can trigger frequent invocations of
> >>> acpi_get_table()/acpi_put_table() are forbidden:
> >>> 1. sysfs table accesses
> >>> 2. dynamic Load/Unload opcode executions
> >>> 3. acpi_load_table()
> >>> 4. etc.
> >>> Such frequent acpi_put_table() balance changes have to be done altogether.
> >>>
> >>> This philosophy is not convenient for Linux driver writers. Since the API
> >>> is just there, developers will start to use acpi_put_table() during late
> >>> stage. So we need to consider a better mechanism to allow them to safely
> >>> invoke acpi_put_table().
> >>>
> >>> This patch provides such a mechanism by adding a validation_count
> >>> threshold. When it is reached, the validation_count can no longer be
> >>> incremented/decremented to invalidate the table descriptor (which means
> >>> preventing table unmappings) so that acpi_put_table() balance changes can
> >>> be done independently of each other.
> >>>
> >>> Note: code added in acpi_tb_put_table() is actually a no-op but changes
> >>> the warning message into a warning once message. Lv Zheng.
> >>>
> >>
> >> This still seems to be unnecessary gymnastics to keep the validation
> >> count around and make it work for random drivers.
> >
> > Well, I'm not sure I agree here.
> >
> > If we can make it work at one point, it should not be too hard to
> > maintain that status.
> >
> 
> I agree with that, my concern was with driver writers needing to be
> worried about when it is safe to call acpi_put_table(). This reference
> count behaves differently than other reference counts like kobjects.

I don't think they behave differently.

"kref" needn't consider unbalanced "get/put",
because when the drivers (users) deploy "kref",
they are responsible for ensuring balanced "get/put".
"kref" needn't take much care about "overflow/underflow" either:
if all users ensure balanced "get/put",
"overflow/underflow" is not possible.
An occurrence of "overflow/underflow" means a bug,
and can further be caught as a "panic".

If "kref" wanted to "warn_once" on overflow/underflow users,
the logic in this commit could also be introduced to kref.
However, it would be useless as all users already ensure balanced "get/put".
Putting a useless check, rather than a panic, on the hot path would be a waste.

> The difference is not necessarily bad, but hopefully it can be
> contained within the acpi core.

The old warning logic for table descriptors is just derived from utdelete.c,
which reduces communication cost when the mechanism is upstreamed.

The ACPICA table "validation_count" is deployed on top of the old design,
where "table unmap" is forbidden for the late stage.
Thus there are no users ensuring balanced "get/put".
Under these circumstances, when we start to deploy balanced "get/put",
we need to consider all users as a whole.
You cannot say the current unbalanced "get/put" users have bugs.
They are there just because of historical reasons.

Fortunately after applying this patch,
drivers should be able to have a better environment to use the new APIs.
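
As an illustration only (this is not the ACPICA code), the threshold
behaviour described in the commit message can be modelled in userspace C.
All names here (demo_table_desc, demo_get_table, demo_put_table,
DEMO_MAX_VALIDATION_COUNT) are hypothetical, and the ceiling of 65535 is
assumed from the example in the commit message:

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical ceiling, taken from the 65535-opens example in the thread. */
#define DEMO_MAX_VALIDATION_COUNT UINT16_MAX

/* Hypothetical stand-in for a table descriptor. */
struct demo_table_desc {
    uint16_t validation_count;
    bool mapped;
};

/* Take a reference: map on the first one, pin the count at the ceiling. */
static void demo_get_table(struct demo_table_desc *t)
{
    if (t->validation_count == DEMO_MAX_VALIDATION_COUNT)
        return; /* saturated: the count is no longer incremented */
    if (t->validation_count++ == 0)
        t->mapped = true;
}

/* Drop a reference: a saturated count is never decremented again. */
static void demo_put_table(struct demo_table_desc *t)
{
    if (t->validation_count == DEMO_MAX_VALIDATION_COUNT)
        return; /* pinned: the table stays mapped */
    if (t->validation_count == 0)
        return; /* unbalanced put: ignored instead of underflowing */
    if (--t->validation_count == 0)
        t->mapped = false;
}
```

In this model, 65535 opens followed by 65535 closes leave the table mapped,
which is the case the commit message wants to make safe, while a single
balanced get/put pair still unmaps the table.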

Cheers,
Lv


RE: [WIP PATCH 2/4] ACPI: button: remove the LID input node when the state is unknown

2017-06-07 Thread Zheng, Lv
Hi, Benjamin

> From: linux-acpi-ow...@vger.kernel.org 
> [mailto:linux-acpi-ow...@vger.kernel.org] On Behalf Of Benjamin
> Tissoires
> Subject: Re: [WIP PATCH 2/4] ACPI: button: remove the LID input node when the 
> state is unknown
> 
> Hi Lv,
> 
> On Jun 05 2017 or thereabouts, Zheng, Lv wrote:
> > Hi, Benjamin
> >
> > > From: Benjamin Tissoires [mailto:benjamin.tissoi...@redhat.com]
> > > Subject: [WIP PATCH 2/4] ACPI: button: remove the LID input node when the 
> > > state is unknown
> 
> > My dell latitude 6430u test platform sends multiple Notify(lid) before 
> > suspend and after resume.
> 
> Does this platform require the not lid_reliable check as per this
> series? Because if it doesn't, then we should not care.

No need to mark lid_reliable.

> > This is because the aml table puts many Notify(LID, 0x80) in various 
> > control methods.
> > And not one of them but multiple of them will be invoked by various OS
> > drivers during the suspend/resume period.
> > I think this is not an isolated platform that will invoke multiple 
> > redundant "Notify(lid)".
> >
> > Fortunately, the lid state for the multiple Notify(lid) should be the same as
> > the first "Notify(lid)".
> > I suppose this is why SW_LID was invented, as it can really filter such
> > redundant events.
> > And user space finally can only see 1 "close" event.
> >
> > But unconditionally prepending "open" before all "close" events surely can 
> > break the logic by
> > delivering multiple "close" events to the user space.
> 
> That doesn't matter. What matters is the state of the switch, not the
> event. So if user space receives (in case we marked the switch as not
> reliable) several close events, all user space will do is realize that
> the state is still closed and will act accordingly.

OK, I tried to address this here:
https://patchwork.kernel.org/patch/9771121/
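
To make the concern concrete, here is a small userspace model. It assumes a
switch that forwards a report only when the state changes, which is how this
thread describes SW_LID. The names (demo_switch, demo_report,
demo_notify_plain, demo_notify_prepend_open) are made up for the sketch and
are not kernel APIs:

```c
#include <stdbool.h>

/* Hypothetical model of a two-state switch with change filtering. */
struct demo_switch {
    bool closed;      /* last state recorded for the switch */
    int delivered;    /* state changes passed on to listeners */
    int closes_seen;  /* "close" transitions passed on to listeners */
};

/* Record a report; forward it only if the state actually changes. */
static void demo_report(struct demo_switch *sw, bool closed)
{
    if (sw->closed == closed)
        return; /* redundant report: filtered out */
    sw->closed = closed;
    sw->delivered++;
    if (closed)
        sw->closes_seen++;
}

/* Plain policy: pass each Notify(lid) result straight to the switch. */
static void demo_notify_plain(struct demo_switch *sw, bool closed)
{
    demo_report(sw, closed);
}

/* "Unreliable lid" policy: force an "open" before every "close". */
static void demo_notify_prepend_open(struct demo_switch *sw, bool closed)
{
    if (closed)
        demo_report(sw, false);
    demo_report(sw, closed);
}
```

Starting from an open lid, three redundant close notifications produce one
forwarded close with the plain policy and three with the prepend-open policy.
The final state is closed either way, which is the point Benjamin makes about
states versus events.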

> > Another issue is, for case 5, when we use button.lid_init_state=method.
> > Unconditionally prepending "open" before driver initiated "close" event
> > sent due to acpi_lid_initialize_state(), we will see suspend/resume cycles.
> 
> Case 5 is broken anyway and needs to be handled specially. It was not
> targeted in this WIP series.

It was addressed by button.lid_init_state=open and newer systemd.
It's not broken any more.


> > Thus if we consider both cases, we should:
> > 1. put a frequency check to filter possible redundant events.
> 
> This doesn't work and should be avoided. The state of the input switch
> is known to the input layer only, and given there are spinlocks, you can
> not know if the state is actually the one you expected beforehand.
> 
> You can however add frequency checks in the input handler, but that
> would assume the input layer is not doing its job properly and so should
> be avoided.

OK, I dropped the frequency check mechanism.

> > 2. distinguish AML "Notify" call and button driver initiated lid 
> > notification.
> 
> Again, we don't care if the "event" comes from ACPI, the driver itself or
> user space (libinput). All that matters is the current state of the
> input node switch, that needs to match the physical world at any time.

That depends on the final test result.
However I managed to get systemd working with cases 2 and 4 using this commit:
https://patchwork.kernel.org/patch/9771121/

> > This is another major difference between your proposal and mine.
> >
> > First of all, I think it should be in a separate patch.
> 
> Well, that's already a patch on its own :/
> 
> >
> > Second, I have concerns related to such a change:
> > I can see that, you are trying to address a problem that:
> > The input layer requires a determined initial SW_LID state while ACPI 
> > button driver cannot offer.
> > So by adding/removing input node, you can introduce a tristate SW_LID input 
> > node.
> 
> You can put it that way. I prefer putting it: "when we export the LID
> switch input node, you are guaranteed to have the proper state".
> 
> > However I doubt if this is necessary and can solve real issues, as:
> > systemd now works fine with button driver for all cases,
> 
> I do not care about systemd or the suspend loops introduced by systemd.
> All I care about is that the kernel provides correct behavior. If systemd can
> work around some issues we see because we are too lazy to fix them in
> the kernel (this is not a personal attack, sometimes being lazy is the
> right solution), fine. But the current state of this driver doesn't
> follow the specification

RE: [WIP PATCH 4/4] ACPI: button: Fix lid notification locks

2017-06-07 Thread Zheng, Lv
Hi, Benjamin

> From: Benjamin Tissoires [mailto:benjamin.tissoi...@redhat.com]
> Subject: Re: [WIP PATCH 4/4] ACPI: button: Fix lid notification locks
> 
> On Jun 05 2017 or thereabouts, Zheng, Lv wrote:
> > Hi,
> >
> > > From: Benjamin Tissoires [mailto:benjamin.tissoi...@redhat.com]
> > > Subject: [WIP PATCH 4/4] ACPI: button: Fix lid notification locks
> > >
> > > From: Lv Zheng <lv.zh...@intel.com>
> > >
> > > acpi/button.c now contains the logic to avoid frequently replayed events
> > > which originally was ensured by using blocking notifier.
> > > On the contrary, using a blocking notifier is wrong as it could keep on
> > > returning NOTIFY_DONE, causing events lost.
> > >
> > > This patch thus changes lid notification to raw notifier in order not to
> > > have any events lost.
> >
> > This patch is on top of the following:
> > https://patchwork.kernel.org/patch/9756467/
> > where button driver implements a frequency check and
> > thus is capable of filtering redundant events itself:
> > I saw you have deleted it from PATCH 02.
> > So this patch is not applicable now.
> 
> I actually rebased it in this series. I kept your SoB line given that
> the idea came from you and the resulting patch was rather similar (only
> one hunk differs, but the meaning is the same).
> 
> >
> > Is the input layer capable of filtering redundant events?
> 
> I don't think it does, and it should not. If an event is emitted, it has
> to be forwarded. However, the logic of the protocol makes that the only
> state that matters is when an EV_SYN is emitted. So if a SW_LID 0 then 1
> is sent between the 2 EV_SYN, and the state was 1 before, from a
> protocol point of view it's a no-op.
> 
> > I saw you unconditionally prepend "open" before "close",
> > which may make input layer incapable of filtering redundant close events.
> 
> Again, we don't care about events. We care about states, and those are
> only emitted when the lid is marked as non reliable.
> 
> >
> > If input layer is capable of filtering redundant events,
> > why don't you:
> > 1. drop this commit;
> > 2. remove all ACPI lid notifier APIs;
> > 3. change lid notifier callers to register notification via input layer?
> 
> Having the i915 driver listening to the input events is actually a good
> solution. Let me think about it a little bit more and I'll come back.

OK, then I'll drop the frequency check mechanism and drop patch 4/5.
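
Benjamin's EV_SYN argument can be modelled in a few lines of userspace C.
This is only a sketch: the struct and function names are hypothetical, and
the numeric codes are copied from linux/input-event-codes.h so that the
example builds without kernel headers:

```c
#include <stddef.h>

/* Values as in linux/input-event-codes.h; the names are local to this sketch. */
enum { DEMO_EV_SYN = 0x00, DEMO_EV_SW = 0x05 };
enum { DEMO_SYN_REPORT = 0, DEMO_SW_LID = 0x00 };

/* Hypothetical stand-in for one (type, code, value) input triple. */
struct demo_value {
    unsigned short type;
    unsigned short code;
    int value;
};

/*
 * Walk a batch of values and return how many SYN_REPORT frames ended with
 * a lid state different from the state committed before that frame.
 * *lid_closed holds the committed state and is updated in place.
 */
static int demo_count_state_changes(const struct demo_value *vals,
                                    size_t count, int *lid_closed)
{
    int pending = *lid_closed;
    int changes = 0;

    for (size_t i = 0; i < count; i++) {
        const struct demo_value *v = &vals[i];

        if (v->type == DEMO_EV_SW && v->code == DEMO_SW_LID) {
            /* Only remember the value; nothing is committed yet. */
            pending = v->value ? 1 : 0;
        } else if (v->type == DEMO_EV_SYN && v->code == DEMO_SYN_REPORT) {
            /* The frame ends here: compare against the committed state. */
            if (pending != *lid_closed) {
                *lid_closed = pending;
                changes++;
            }
        }
    }
    return changes;
}
```

With the committed state at 1, a frame of SW_LID 0, SW_LID 1, SYN_REPORT
yields no change, while a frame of SW_LID 0, SYN_REPORT yields one.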

Cheers,
Lv

> 
> Cheers,
> Benjamin
> 
> >
> > Thanks and best regards
> > Lv
> >
> > >
> > > Signed-off-by: Lv Zheng <lv.zh...@intel.com>
> > > Signed-off-by: Benjamin Tissoires <benjamin.tissoi...@redhat.com>
> > > ---
> > >  drivers/acpi/button.c | 68 
> > > ---
> > >  1 file changed, 27 insertions(+), 41 deletions(-)
> > >
> > > diff --git a/drivers/acpi/button.c b/drivers/acpi/button.c
> > > index 03e5981..1927b08 100644
> > > --- a/drivers/acpi/button.c
> > > +++ b/drivers/acpi/button.c
> > > @@ -114,7 +114,7 @@ struct acpi_button {
> > >
> > >  static DEFINE_MUTEX(button_input_lock);
> > >
> > > -static BLOCKING_NOTIFIER_HEAD(acpi_lid_notifier);
> > > +static RAW_NOTIFIER_HEAD(acpi_lid_notifier);
> > >  static struct acpi_device *lid_device;
> > >  static u8 lid_init_state = ACPI_BUTTON_LID_INIT_METHOD;
> > >
> > > @@ -179,14 +179,12 @@ static int acpi_lid_evaluate_state(struct 
> > > acpi_device *device)
> > >   return lid_state ? 1 : 0;
> > >  }
> > >
> > > -static int acpi_lid_notify_state(struct acpi_device *device, int state)
> > > +static void acpi_lid_notify_state(struct acpi_device *device, int state)
> > >  {
> > >   struct acpi_button *button = acpi_driver_data(device);
> > >
> > > - /* button_input_lock must be held */
> > > -
> > >   if (!button->input)
> > > - return 0;
> > > + return;
> > >
> > >   /*
> > >* If the lid is unreliable, always send an "open" event before any
> > > @@ -201,8 +199,6 @@ static int acpi_lid_notify_state(struct acpi_device *device, int state)
> > >
> > >   if (state)
> > >   pm_wakeup_hard_event(&device->dev);
> > > -
> > > - return 0;
> > >  }
> > >
> > >  /*
> > > @@ -214,28 +210,14 @

RE: [WIP PATCH 3/4] ACPI: button: Let input filter out the LID events

2017-06-04 Thread Zheng, Lv
Hi, Benjamin

> From: Benjamin Tissoires [mailto:benjamin.tissoi...@redhat.com]
> Subject: [WIP PATCH 3/4] ACPI: button: Let input filter out the LID events
> 
> The input stack already filters out the LID events. So instead of
> filtering them out at the source, we can hook up after the input
> processing and propagate the lid switch events when the input stack
> tells us to.
> 
> Another benefit is that if userspace (think libinput) "fixes" the lid
> switch state by some heuristics, this new state is forwarded to the
> listeners in the kernel.

See my comments on PATCH 4.
IMO, it sounds better that
1. ACPI lid works as a driver of SW_LID, and
2. i915 (the only user) registers for notifications via the input layer.
So it looks like i915 rather than the button driver should call input_register_handler().
And the input layer may help by providing a simplified interface for drivers to
register key notifications.
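
For reference, the registration pattern being discussed can be sketched as a
minimal userspace callback chain. This is a model built on assumptions and
not the kernel's notifier or input handler API; demo_chain, demo_notifier and
demo_consumer are hypothetical names:

```c
#include <stddef.h>

/* Hypothetical callback node, loosely modelled on a notifier block. */
struct demo_notifier {
    int (*call)(struct demo_notifier *nb, unsigned long state, void *data);
    struct demo_notifier *next;
};

struct demo_chain {
    struct demo_notifier *head;
};

/* Add a listener at the front of the chain. */
static void demo_chain_register(struct demo_chain *c, struct demo_notifier *nb)
{
    nb->next = c->head;
    c->head = nb;
}

/* Remove a listener; returns 0 on success, -1 if it was not registered. */
static int demo_chain_unregister(struct demo_chain *c, struct demo_notifier *nb)
{
    for (struct demo_notifier **p = &c->head; *p; p = &(*p)->next) {
        if (*p == nb) {
            *p = nb->next;
            nb->next = NULL;
            return 0;
        }
    }
    return -1;
}

/* Deliver a lid state to every listener; returns how many were called. */
static int demo_chain_call(struct demo_chain *c, unsigned long state, void *data)
{
    int calls = 0;
    struct demo_notifier *nb = c->head;

    while (nb) {
        struct demo_notifier *next = nb->next; /* saved before the callback runs */
        nb->call(nb, state, data);
        calls++;
        nb = next;
    }
    return calls;
}

/* Example consumer (think of a graphics driver) that tracks the lid state. */
struct demo_consumer {
    struct demo_notifier nb; /* must stay first so the cast below is valid */
    unsigned long last_state;
    int notifications;
};

static int demo_consumer_cb(struct demo_notifier *nb, unsigned long state,
                            void *data)
{
    struct demo_consumer *c = (struct demo_consumer *)nb;

    (void)data;
    c->last_state = state;
    c->notifications++;
    return 0;
}
```

The open question in the thread is only where such a registration should
live: in the ACPI button driver, or in the input layer with the consumer
registering its own handler.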

Thanks and best regards
Lv

> 
> Signed-off-by: Benjamin Tissoires 
> ---
>  drivers/acpi/button.c | 156 
> --
>  1 file changed, 139 insertions(+), 17 deletions(-)
> 
> diff --git a/drivers/acpi/button.c b/drivers/acpi/button.c
> index 9ad7604..03e5981 100644
> --- a/drivers/acpi/button.c
> +++ b/drivers/acpi/button.c
> @@ -109,8 +109,6 @@ struct acpi_button {
>   struct input_dev *input;
>   char phys[32];  /* for input device */
>   unsigned long pushed;
> - int last_state;
> - ktime_t last_time;
>   bool suspended;
>  };
> 
> @@ -184,7 +182,6 @@ static int acpi_lid_evaluate_state(struct acpi_device 
> *device)
>  static int acpi_lid_notify_state(struct acpi_device *device, int state)
>  {
>   struct acpi_button *button = acpi_driver_data(device);
> - int ret;
> 
>   /* button_input_lock must be held */
> 
> @@ -205,20 +202,129 @@ static int acpi_lid_notify_state(struct acpi_device *device, int state)
>   if (state)
>   pm_wakeup_hard_event(&device->dev);
> 
> - ret = blocking_notifier_call_chain(&acpi_lid_notifier, state, device);
> - if (ret == NOTIFY_DONE)
> - ret = blocking_notifier_call_chain(&acpi_lid_notifier, state,
> -device);
> - if (ret == NOTIFY_DONE || ret == NOTIFY_OK) {
> - /*
> -  * It is also regarded as success if the notifier_chain
> -  * returns NOTIFY_OK or NOTIFY_DONE.
> -  */
> - ret = 0;
> + return 0;
> +}
> +
> +/*
> + * Pass incoming event to all connected clients.
> + */
> +static void acpi_button_lid_events(struct input_handle *handle,
> +const struct input_value *vals,
> +unsigned int count)
> +{
> + const struct input_value *v;
> + int state = -1;
> + int ret;
> +
> + for (v = vals; v != vals + count; v++) {
> + switch (v->type) {
> + case EV_SYN:
> + if (v->code == SYN_REPORT && state >= 0) {
> + ret = blocking_notifier_call_chain(&acpi_lid_notifier,
> + state,
> + lid_device);
> + if (ret == NOTIFY_DONE)
> + ret = blocking_notifier_call_chain(&acpi_lid_notifier,
> + state,
> + lid_device);
> + if (ret == NOTIFY_DONE || ret == NOTIFY_OK) {
> + /*
> +  * It is also regarded as success if
> +  * the notifier_chain returns NOTIFY_OK
> +  * or NOTIFY_DONE.
> +  */
> + ret = 0;
> + }
> + }
> + break;
> + case EV_SW:
> + if (v->code == SW_LID)
> + state = !v->value;
> + break;
> + }
>   }
> - return ret;
>  }
> 
> +static int acpi_button_lid_connect(struct input_handler *handler,
> +struct input_dev *dev,
> +const struct input_device_id *id)
> +{
> + struct input_handle *handle;
> + int error;
> +
> + handle = kzalloc(sizeof(struct input_handle), GFP_KERNEL);
> + if (!handle)
> + return -ENOMEM;
> +
> + handle->dev = dev;
> + handle->handler = handler;
> + handle->name = "acpi-button-lid";
> +
> + error = input_register_handle(handle);
> + if (error) {
> + dev_err(&lid_device->dev, "Error installing input handle\n");
> + goto err_free;
> + }
> +
> + 

RE: [WIP PATCH 4/4] ACPI: button: Fix lid notification locks

2017-06-04 Thread Zheng, Lv
Hi,

> From: Benjamin Tissoires [mailto:benjamin.tissoi...@redhat.com]
> Subject: [WIP PATCH 4/4] ACPI: button: Fix lid notification locks
> 
> From: Lv Zheng 
> 
> acpi/button.c now contains the logic to avoid frequently replayed events
> which originally was ensured by using blocking notifier.
> On the contrary, using a blocking notifier is wrong as it could keep on
> returning NOTIFY_DONE, causing events lost.
> 
> This patch thus changes lid notification to raw notifier in order not to
> have any events lost.

This patch is on top of the following:
https://patchwork.kernel.org/patch/9756467/
where button driver implements a frequency check and
thus is capable of filtering redundant events itself:
I saw you have deleted it from PATCH 02.
So this patch is not applicable now.

Is the input layer capable of filtering redundant events?
I saw you unconditionally prepend "open" before "close",
which may make input layer incapable of filtering redundant close events.

If input layer is capable of filtering redundant events,
why don't you:
1. drop this commit;
2. remove all ACPI lid notifier APIs;
3. change lid notifier callers to register notification via input layer?

Thanks and best regards
Lv 

> 
> Signed-off-by: Lv Zheng 
> Signed-off-by: Benjamin Tissoires 
> ---
>  drivers/acpi/button.c | 68 
> ---
>  1 file changed, 27 insertions(+), 41 deletions(-)
> 
> diff --git a/drivers/acpi/button.c b/drivers/acpi/button.c
> index 03e5981..1927b08 100644
> --- a/drivers/acpi/button.c
> +++ b/drivers/acpi/button.c
> @@ -114,7 +114,7 @@ struct acpi_button {
> 
>  static DEFINE_MUTEX(button_input_lock);
> 
> -static BLOCKING_NOTIFIER_HEAD(acpi_lid_notifier);
> +static RAW_NOTIFIER_HEAD(acpi_lid_notifier);
>  static struct acpi_device *lid_device;
>  static u8 lid_init_state = ACPI_BUTTON_LID_INIT_METHOD;
> 
> @@ -179,14 +179,12 @@ static int acpi_lid_evaluate_state(struct acpi_device 
> *device)
>   return lid_state ? 1 : 0;
>  }
> 
> -static int acpi_lid_notify_state(struct acpi_device *device, int state)
> +static void acpi_lid_notify_state(struct acpi_device *device, int state)
>  {
>   struct acpi_button *button = acpi_driver_data(device);
> 
> - /* button_input_lock must be held */
> -
>   if (!button->input)
> - return 0;
> + return;
> 
>   /*
>* If the lid is unreliable, always send an "open" event before any
> @@ -201,8 +199,6 @@ static int acpi_lid_notify_state(struct acpi_device *device, int state)
> 
>   if (state)
>   pm_wakeup_hard_event(&device->dev);
> -
> - return 0;
>  }
> 
>  /*
> @@ -214,28 +210,14 @@ static void acpi_button_lid_events(struct input_handle 
> *handle,
>  {
>   const struct input_value *v;
>   int state = -1;
> - int ret;
> 
>   for (v = vals; v != vals + count; v++) {
>   switch (v->type) {
>   case EV_SYN:
> - if (v->code == SYN_REPORT && state >= 0) {
> - ret = blocking_notifier_call_chain(&acpi_lid_notifier,
> + if (v->code == SYN_REPORT && state >= 0)
> + (void)raw_notifier_call_chain(&acpi_lid_notifier,
>   state,
>   lid_device);
> - if (ret == NOTIFY_DONE)
> - ret = blocking_notifier_call_chain(&acpi_lid_notifier,
> - state,
> - lid_device);
> - if (ret == NOTIFY_DONE || ret == NOTIFY_OK) {
> - /*
> -  * It is also regarded as success if
> -  * the notifier_chain returns NOTIFY_OK
> -  * or NOTIFY_DONE.
> -  */
> - ret = 0;
> - }
> - }
>   break;
>   case EV_SW:
>   if (v->code == SW_LID)
> @@ -433,13 +415,25 @@ static int acpi_button_remove_fs(struct acpi_device 
> *device)
> 
> -- */
>  int acpi_lid_notifier_register(struct notifier_block *nb)
>  {
> - return blocking_notifier_chain_register(&acpi_lid_notifier, nb);
> + return raw_notifier_chain_register(&acpi_lid_notifier, nb);
>  }
>  EXPORT_SYMBOL(acpi_lid_notifier_register);
> 
> +static inline int __acpi_lid_notifier_unregister(struct notifier_block *nb,
> +  bool sync)
> +{
> + int ret;
> +
> + ret = raw_notifier_chain_unregister(&acpi_lid_notifier, nb);
> 

RE: [WIP PATCH 2/4] ACPI: button: remove the LID input node when the state is unknown

2017-06-04 Thread Zheng, Lv
Hi, Benjamin

> From: Benjamin Tissoires [mailto:benjamin.tissoi...@redhat.com]
> Subject: [WIP PATCH 2/4] ACPI: button: remove the LID input node when the 
> state is unknown
> 
> Because of the variation of firmware implementation, there is a chance
> the LID state is unknown:
> 1. Some platforms send an "open" ACPI notification to the OS and the event
>    arrives before the button driver is resumed;
> 2. Some platforms send an "open" ACPI notification to the OS, but the event
>    arrives after the button driver is resumed, e.g., Samsung N210+;
> 3. Some platforms never send an "open" ACPI notification to the OS, but
>    update the cached _LID return value to "open", and this update arrives
>    before the button driver is resumed;
> 4. Some platforms never send an "open" ACPI notification to the OS, but
>    update the cached _LID return value to "open", but this update arrives
>    after the button driver is resumed, e.g., Surface Pro 3;
> 5. Some platforms never send an "open" ACPI notification to the OS, and
>    the _LID ACPI method returns a value which stays at "close", e.g.,
>    Surface Pro 1.
> 
> We can mark the unreliable platforms (cases 2, 4, 5 above) as such and make
> sure we do not export an input node with an unknown state to prevent
> suspend loops.
> 
> The database of unreliable devices is left to userspace to handle with
> a hwdb file and a udev rule.
> 
> Note that this patch removes the filtering of duplicate events when
> calling blocking_notifier_call_chain(), but this will be addressed in
> a following patch.
> 
> Signed-off-by: Benjamin Tissoires 
> ---
>  drivers/acpi/button.c | 207 
> --
>  1 file changed, 131 insertions(+), 76 deletions(-)
> 
> diff --git a/drivers/acpi/button.c b/drivers/acpi/button.c
> index 48bcdca..9ad7604 100644
> --- a/drivers/acpi/button.c
> +++ b/drivers/acpi/button.c
> @@ -25,6 +25,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  #include 
>  #include 
>  #include 
> @@ -79,6 +80,8 @@ MODULE_DEVICE_TABLE(acpi, button_device_ids);
>  static int acpi_button_add(struct acpi_device *device);
>  static int acpi_button_remove(struct acpi_device *device);
>  static void acpi_button_notify(struct acpi_device *device, u32 event);
> +static int acpi_button_add_input(struct acpi_device *device);
> +static int acpi_lid_update_reliable(struct acpi_device *device);
> 
>  #ifdef CONFIG_PM_SLEEP
>  static int acpi_button_suspend(struct device *dev);
> @@ -111,6 +114,8 @@ struct acpi_button {
>   bool suspended;
>  };
> 
> +static DEFINE_MUTEX(button_input_lock);
> +
>  static BLOCKING_NOTIFIER_HEAD(acpi_lid_notifier);
>  static struct acpi_device *lid_device;
>  static u8 lid_init_state = ACPI_BUTTON_LID_INIT_METHOD;
> @@ -119,6 +124,44 @@ static unsigned long lid_report_interval __read_mostly = 500;
>  module_param(lid_report_interval, ulong, 0644);
>  MODULE_PARM_DESC(lid_report_interval, "Interval (ms) between lid key events");
> 
> +static bool lid_reliable = true;
> +
> +static int param_set_lid_reliable(const char *val,
> +   const struct kernel_param *kp)
> +{
> + bool prev_lid_reliable = lid_reliable;
> + int ret;
> +
> + mutex_lock(&button_input_lock);
> +
> + ret = param_set_bool(val, kp);
> + if (ret) {
> + mutex_unlock(&button_input_lock);
> + return ret;
> + }
> +
> + /*
> +  * prevent a loop when we show up the device to userspace because
> +  * of an acpi notification, and userspace immediately removes it
> +  * by marking it as unreliable when this was already known.
> +  */
> + if (lid_device && prev_lid_reliable != lid_reliable) {
> + ret = acpi_lid_update_reliable(lid_device);
> + if (ret)
> + lid_reliable = prev_lid_reliable;
> + }
> +
> + mutex_unlock(&button_input_lock);
> + return ret;
> +}
> +
> +static const struct kernel_param_ops lid_reliable_ops = {
> + .get = param_get_bool,
> + .set = param_set_lid_reliable,
> +};
> +module_param_cb(lid_reliable, &lid_reliable_ops, &lid_reliable, 0644);
> +MODULE_PARM_DESC(lid_reliable, "Is the LID switch reliable (true|false)?");
> +
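>  /* --------------------------------------------------------------------------
>                                FS Interface (/proc)
>     -------------------------------------------------------------------------- */
> @@ -142,79 +185,22 @@ static int acpi_lid_notify_state(struct acpi_device *device, int state)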
>  {
>   struct acpi_button *button = acpi_driver_data(device);
>   int ret;
> - ktime_t next_report;
> - bool do_update;
> +
> + /* button_input_lock must be held */
> +
> + if (!button->input)
> + return 0;
> 
>   /*
> -  * In lid_init_state=ignore mode, if user opens/closes lid
> -  * frequently with "open" missing, and "last_time" is also updated
> -  * frequently, "close" cannot 

RE: [WIP PATCH 0/4] Rework the unreliable LID switch exported by ACPI

2017-06-04 Thread Zheng, Lv
Hi, Benjamin

> From: Benjamin Tissoires [mailto:benjamin.tissoi...@redhat.com]
> Subject: [WIP PATCH 0/4] Rework the unreliable LID switch exported by ACPI
> 
> Hi,
> 
> Sending this as a WIP as it still needs a few changes, but it mostly works as
> expected (still not fully compliant yet).
> 
> So this is based on Lennart's comment in [1]: if the LID state is not reliable,
> the kernel should not export the LID switch device as long as we are not sure
> about its state.
> 
> That is the basic idea, and here are some more general comments:
> Lv described the 5 cases in "RFC PATCH v3" regarding the LID switch.
> Let me rewrite them here (they are in patch 2):
> 
> 1. Some platforms send an "open" ACPI notification to the OS and the event
>    arrives before the button driver is resumed;
> 2. Some platforms send an "open" ACPI notification to the OS, but the event
>    arrives after the button driver is resumed, e.g., Samsung N210+;
> 3. Some platforms never send an "open" ACPI notification to the OS, but
>    update the cached _LID return value to "open", and this update arrives
>    before the button driver is resumed;
> 4. Some platforms never send an "open" ACPI notification to the OS, but
>    update the cached _LID return value to "open", but this update arrives
>    after the button driver is resumed, e.g., Surface Pro 3;
> 5. Some platforms never send an "open" ACPI notification to the OS, and
>    the _LID ACPI method returns a value which stays at "close", e.g.,
>    Surface Pro 1.
> 
> If we consider that we can mark the LID switch as unreliable and make it
> disappear when we are not certain of the state, we can consider cases 1, 2, 3
> are solved:

I have concerns with case 2.

> cases 1 and 3 are solved when the LID state is reliable (majority
> of existing laptops),

Agreed.

> and case 2 is solved just by marking when the LID is not
> reliable. When we go to sleep, we unregister the input node. We wait for
> the next ACPI notification to re-export the LID switch input node with the
> correct state.

According to the tests, cases 2, 4 and 5 have all already been solved in systemd.
So we needn't do anything in the kernel.

If you still want to improve the ACPI button driver,
IMO we do have a chance to improve cases 2 and 4.

For example, we could just start a timer right after resume.
If we see the BIOS notification before the timer expires, we delete the timer.
If the timer expires first, we report the "lid init value" to the input layer.
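
A rough sketch of that timer idea, written as a small userspace C model
rather than driver code. Every name below (lid_resume_timer, lid_timer_arm,
and so on) is made up for illustration, and the timeout value is left to
the caller because no value has been agreed on:

```c
#include <stdbool.h>

/* Hypothetical userspace model of the post-resume lid timer; not kernel code. */
enum lid_report { LID_REPORT_NONE, LID_REPORT_BIOS, LID_REPORT_INIT_VALUE };

struct lid_resume_timer {
	bool armed;			/* started at resume, not yet resolved */
	unsigned int remaining_ms;	/* time left before we stop waiting */
};

/* Called right after resume: start waiting for a BIOS notification. */
static void lid_timer_arm(struct lid_resume_timer *t, unsigned int timeout_ms)
{
	t->armed = true;
	t->remaining_ms = timeout_ms;
}

/* BIOS notification arrived: delete the timer and report the BIOS state. */
static enum lid_report lid_timer_on_bios_notify(struct lid_resume_timer *t)
{
	t->armed = false;
	return LID_REPORT_BIOS;
}

/* Time passes: on expiry, fall back to reporting the "lid init value". */
static enum lid_report lid_timer_tick(struct lid_resume_timer *t,
				      unsigned int elapsed_ms)
{
	if (!t->armed)
		return LID_REPORT_NONE;
	if (elapsed_ms < t->remaining_ms) {
		t->remaining_ms -= elapsed_ms;
		return LID_REPORT_NONE;
	}
	t->armed = false;
	return LID_REPORT_INIT_VALUE;
}
```

The point of the model is that exactly one of the two reports is produced
per resume: either the BIOS one or the fallback one, never both.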

> Given that the "close" event is reliable, on platforms where the LID switch is
> not reliable for "open", we will get the "close" event when we will start
> exporting the switch at the input level.
> 
> Note that systemd currently doesn't sync the state when the input node just
> appears. This is a systemd bug, and it should not be handled by the kernel
> community.

According to the tests, systemd should be OK now.
Why do we need to change it again?

> 
> For case 4, we are not aware at the acpi/button.c level when the state is valid.
> We can solve this by polling every second for, let's say, 1 min, and if we detect
> a change, then we can re-export the input node (this hasn't been implemented
> yet). After this delay, we can consider the state as valid and export the input
> node with the current reported state in the ACPI.

This looks similar to the timer solution mentioned above.
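
To make the comparison concrete, here is a minimal sketch of the quoted
polling proposal as plain C. It is only a model: lid_read_fn stands in for
an _LID evaluation, lid_script is a demo reader that replays fixed values,
and all of these names are assumptions rather than existing driver symbols.

```c
#include <stddef.h>

/* Hypothetical stand-in for an _LID evaluation: 1 for open, 0 for closed. */
typedef int (*lid_read_fn)(void *ctx);

/*
 * Poll up to max_polls times (one poll per second in the proposal) and
 * stop early as soon as a value different from initial_state is read.
 * Returns the number of polls performed; *final_state receives the last
 * value read (or initial_state if max_polls is 0). The caller compares
 * *final_state with initial_state to learn whether a change was seen.
 */
static unsigned int lid_poll_for_change(lid_read_fn read, void *ctx,
					int initial_state,
					unsigned int max_polls,
					int *final_state)
{
	unsigned int i;

	*final_state = initial_state;
	for (i = 0; i < max_polls; i++) {
		*final_state = read(ctx);
		if (*final_state != initial_state)
			return i + 1;	/* change detected: re-export now */
	}
	return max_polls;		/* window over: treat state as valid */
}

/* Demo reader: replays a fixed, non-empty script of _LID values. */
struct lid_script {
	const int *values;
	size_t len;
	size_t pos;
};

static int lid_script_read(void *ctx)
{
	struct lid_script *s = ctx;

	/* Keep returning the last scripted value once the script runs out. */
	if (s->pos < s->len)
		return s->values[s->pos++];
	return s->values[s->len - 1];
}
```

With a script of closed, closed, open and a 60-poll window, the loop stops
after the third poll, which is the early re-export the proposal asks for.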

> 
> However, this will conflict with case 5 where the ACPI value reported by
> the _LID method can be wrong anytime. We will need to treat this separately
> or find some other magic to make cases 4 and 5 compatible.

Case 5 is not compliant with SW_LID anyway.
However, it works well with the latest systemd.
Maybe we should just let it be and wait for further user requests.

> 
> libinput will help cases 4 and 5 to restore the proper state, but that's
> assuming we have exported a wrong state. It might happen in case 5, but
> shouldn't in case 4.

IMO, if we improve cases 2 and 4, libinput should only need to help with case 5,
which is not SW_LID compliant at all.

Thanks
Lv

> 
> Anyway, that is just a WIP which IMO is less hacky than the few other series.
> I still need to work on the udev/hwdb rules to have the list of problematic
> platforms in hwdb to not have them in the kernel, but that shouldn't be much
> of an issue. I also need to work on the polling but I'd like to get some input
> from Lv, Peter and others before spending too much time on it.
> 
> Note: yes, there is a lot of boilerplate for the input handler and for the
> reliable state, but I think this simplifies the logic as we are all relying
> on the input stack to filter duplicate events.
> One other benefit of this boilerplate is that when libinput changes the LID
> state, i915 and nouveau will get notified.
> 
> Cheers,
> Benjamin
> 
> 
> [1] https://github.com/systemd/systemd/issues/2807
> 
> Benjamin Tissoires (3):
>   ACPI: button: extract input creation/destruction helpers
>   ACPI: button: remove the LID input node when the state is 

RE: [GIT PULL] ACPI fixes for v4.12-rc4

2017-06-04 Thread Zheng, Lv
Hi,

> From: linux-acpi-ow...@vger.kernel.org [mailto:linux-acpi-ow...@vger.kernel.org] On Behalf Of Rafael J. Wysocki
> Subject: [GIT PULL] ACPI fixes for v4.12-rc4
> 
> Hi Linus,
> 
> Please pull from the tag
> 
>  git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm.git \
>  acpi-4.12-rc4
> 
> with top-most commit 60319130254084b337e02439d3b4ec301b6328bb
> 
>  Merge branches 'acpi-button', 'acpica' and 'acpi-sysfs'
> 
> on top of commit 5ed02dbb497422bf225783f46e6eadd237d23d6b
> 
>  Linux 4.12-rc3
> 
> to receive ACPI fixes for v4.12-rc4.
> 
> These revert one more problematic commit related to the ACPI-based
> handling of laptop lids and make some unuseful error messages coming
> from ACPICA go away.
> 
> Specifics:
> 
>  - Revert one more commit related to the ACPI-based handling of
>    laptop lids that changed the default behavior on laptops that
>    booted with closed lids and introduced a regression there
>    (Benjamin Tissoires).
> 
>  - Add a missing acpi_put_table() to the code implementing the
>    /sys/firmware/acpi/tables interface to prevent a counter in
>    the ACPICA core from overflowing (Dan Williams).
> 
>  - Drop error messages printed by ACPICA on acpi_get_table()
>    reference counting mismatches as they need not indicate real
>    errors at this point (Lv Zheng).
> 
> Thanks!
> 
> 
> ---
> 
> Benjamin Tissoires (1):
>   Revert "ACPI / button: Change default behavior to lid_init_state=open"
> 
> Dan Williams (1):
>   ACPI / sysfs: fix acpi_get_table() leak / acpi-sysfs denial of service

This commit can trigger a regression, as mentioned in this discussion:
https://patchwork.kernel.org/patch/9717073/
So if this commit is accepted upstream, the patch from that discussion should
also go upstream in order not to regress.
Do you need me to refine it and re-send it to the community?
Things are a bit slow in ACPICA upstream, as ACPICA upstream is frozen for
spec 6.2 support.
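
For readers following the acpi_get_table() / acpi_put_table() point, here
is a minimal model of why every get needs a matching put. This is not the
ACPICA implementation: struct table_desc, table_get(), table_put() and the
16-bit counter width are assumptions chosen for the example.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical table descriptor with a reference ("validation") counter. */
struct table_desc {
	uint16_t validation_count;
	bool mapped;
};

/* Take a reference; map the table on first use. Fails instead of wrapping. */
static bool table_get(struct table_desc *d)
{
	if (d->validation_count == UINT16_MAX)
		return false;	/* counter would overflow: unbalanced gets */
	if (d->validation_count++ == 0)
		d->mapped = true;
	return true;
}

/* Drop a reference; unmap when the last user is gone. */
static bool table_put(struct table_desc *d)
{
	if (d->validation_count == 0)
		return false;	/* put without a matching get */
	if (--d->validation_count == 0)
		d->mapped = false;
	return true;
}
```

A caller that only ever calls table_get(), for example once per read of a
sysfs file, will eventually hit the counter limit. A mismatch in either
direction is what the reference counting messages are about.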

Thanks and best regards
Lv

> 
> Lv Zheng (1):
>   ACPICA: Tables: Fix regression introduced by a too early mechanism enabling
> 
> ---
> 
>  drivers/acpi/acpica/tbutils.c | 4 
>  drivers/acpi/button.c | 2 +-
>  drivers/acpi/sysfs.c  | 7 +--
>  3 files changed, 6 insertions(+), 7 deletions(-)
> --
> To unsubscribe from this list: send the line "unsubscribe linux-acpi" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html


RE: [RFC PATCH v3 5/5] ACPI: button: Always notify kernel space using _LID returning value

2017-05-31 Thread Zheng, Lv
Hi,

> From: Benjamin Tissoires [mailto:benjamin.tissoi...@redhat.com]
> Subject: Re: [RFC PATCH v3 5/5] ACPI: button: Always notify kernel space 
> using _LID returning value
> 
> Hi Lv,
> 
> On May 27 2017 or thereabouts, Lv Zheng wrote:
> > Both nouveau and i915, the only 2 kernel space lid notification listeners,
> > invoke acpi_lid_open() API to obtain _LID returning value instead of using
> > the notified value.
> >
> > So this patch moves this logic from listeners to lid driver, always notify
> > kernel space listeners using _LID returning value.
> >
> > This is a no-op cleanup, but facilitates administrators to configure to
> > notify kernel drivers with faked lid init states via command line
> > "button.lid_notify_init_state=Y".
> >
> > Cc: 
> > Cc: 
> > Cc: Benjamin Tissoires 
> > Cc: Peter Hutterer 
> > Signed-off-by: Lv Zheng 
> > ---
> >  drivers/acpi/button.c | 16 ++--
> >  drivers/gpu/drm/i915/intel_lvds.c |  2 +-
> >  2 files changed, 15 insertions(+), 3 deletions(-)
> >
> > diff --git a/drivers/acpi/button.c b/drivers/acpi/button.c
> > index 4abf8ae..e047d34 100644
> > --- a/drivers/acpi/button.c
> > +++ b/drivers/acpi/button.c
> > @@ -119,6 +119,9 @@ static u8 lid_init_state = ACPI_BUTTON_LID_INIT_OPEN;
> >  static unsigned long lid_report_interval __read_mostly = 500;
> >  module_param(lid_report_interval, ulong, 0644);
> >  MODULE_PARM_DESC(lid_report_interval, "Interval (ms) between lid key events");
> > +static bool lid_notify_init_state __read_mostly = false;
> > +module_param(lid_notify_init_state, bool, 0644);
> > +MODULE_PARM_DESC(lid_notify_init_state, "Notify init lid state to kernel drivers after boot/resume");
> >
> >  /* --------------------------------------------------------------------------
> >                                FS Interface (/proc)
> > @@ -224,6 +227,15 @@ static void acpi_lid_notify_state(struct acpi_device *device,
> > if (state)
> > pm_wakeup_event(&device->dev, 0);
> >
> > +   if (!lid_notify_init_state) {
> > +   /*
> > +* There are cases "state" is not a _LID return value, so
> > +* correct it before notification.
> > +*/
> > +   if (!bios_notify &&
> > +   lid_init_state != ACPI_BUTTON_LID_INIT_METHOD)
> > +   state = acpi_lid_evaluate_state(device);
> > +   }
> > acpi_lid_notifier_call(device, state);
> >  }
> >
> > @@ -572,10 +584,10 @@ static int param_set_lid_init_state(const char *val, struct kernel_param *kp)
> >
> > if (!strncmp(val, "open", sizeof("open") - 1)) {
> > lid_init_state = ACPI_BUTTON_LID_INIT_OPEN;
> > -   pr_info("Notify initial lid state as open\n");
> > +   pr_info("Notify initial lid state to users space as open and kernel drivers with _LID return value\n");
> > } else if (!strncmp(val, "method", sizeof("method") - 1)) {
> > lid_init_state = ACPI_BUTTON_LID_INIT_METHOD;
> > -   pr_info("Notify initial lid state with _LID return value\n");
> > +   pr_info("Notify initial lid state to user/kernel space with _LID return value\n");
> > } else if (!strncmp(val, "ignore", sizeof("ignore") - 1)) {
> > lid_init_state = ACPI_BUTTON_LID_INIT_IGNORE;
> > pr_info("Do not notify initial lid state\n");
> > diff --git a/drivers/gpu/drm/i915/intel_lvds.c b/drivers/gpu/drm/i915/intel_lvds.c
> > index 9ca4dc4..8ca9080 100644
> > --- a/drivers/gpu/drm/i915/intel_lvds.c
> > +++ b/drivers/gpu/drm/i915/intel_lvds.c
> > @@ -548,7 +548,7 @@ static int intel_lid_notify(struct notifier_block *nb, unsigned long val,
> > /* Don't force modeset on machines where it causes a GPU lockup */
> > if (dmi_check_system(intel_no_modeset_on_lid))
> > goto exit;
> > -   if (!acpi_lid_open()) {
> > +   if (!val) {
> > /* do modeset on next lid open event */
> > dev_priv->modeset_restore = MODESET_ON_LID_OPEN;
> > goto exit;
> 
> This last hunk should really be in its own patch because the intel GPU
> folks would need to apply the rest of the series for their CI suite, and
> also because there is no reason for this change to be alongside any
> other acpi/button.c change.

OK, I'll drop the i915-related changes.
However, I can see cleanup opportunities in button.c.
I feel I should at least do some minimal tuning in the button driver to allow
future improvements.
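
For reference, the decision made by the quoted button.c hunk can be modelled in a few lines of stand-alone C. This is only a sketch under assumptions: kernel_notify_value() and the enum names are hypothetical stand-ins for the driver's ACPI_BUTTON_LID_INIT_* values and its internal state, not the actual kernel code, and 1 is taken to mean "open" as with the _LID return value.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the driver's ACPI_BUTTON_LID_INIT_* values. */
enum lid_init_mode { LID_INIT_IGNORE, LID_INIT_OPEN, LID_INIT_METHOD };

/*
 * Pick the value passed to in-kernel lid listeners (1 = open, 0 = closed).
 *   state              value about to be reported, possibly faked
 *   lid_value          what evaluating _LID would return right now
 *   bios_notify        true when the report comes from a firmware Notify()
 *   notify_init_state  models button.lid_notify_init_state
 */
int kernel_notify_value(int state, int lid_value, bool bios_notify,
                        bool notify_init_state, enum lid_init_mode mode)
{
    assert(state == 0 || state == 1);
    assert(lid_value == 0 || lid_value == 1);

    /* A faked init state is replaced by the _LID value for kernel users. */
    if (!notify_init_state && !bios_notify && mode != LID_INIT_METHOD)
        return lid_value;
    return state;
}
```

With the parameter left at its default (false), a faked "open" in "open" mode is replaced by whatever _LID returns, while real firmware notifications and "method" mode pass the value through unchanged.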

Cheers,
Lv

> Cheers,
> Benjamin


RE: [RFC PATCH v3 1/5] ACPI: button: Add indication of BIOS notification and faked events

2017-05-31 Thread Zheng, Lv
Hi,

> From: linux-acpi-ow...@vger.kernel.org [mailto:linux-acpi-ow...@vger.kernel.org] On Behalf Of Benjamin Tissoires
> Subject: Re: [RFC PATCH v3 1/5] ACPI: button: Add indication of BIOS notification and faked events
> 
> Hi Lv,
> 
> On May 27 2017 or thereabouts, Lv Zheng wrote:
> > This patch adds a parameter to acpi_lid_notify_state() so that it can act
> > differently against BIOS notification and kernel faked events.
> >
> > Cc: 
> > Cc: Benjamin Tissoires 
> > Cc: Peter Hutterer 
> > Signed-off-by: Lv Zheng 
> > ---
> 
> Answering to this one for the entire series:
> last week was a mix of public holidays and PTO from me. I was only
> able to review this series today, so sorry for the delay.

We were having the Dragon Boat Festival here.
But there is really no chance of seeing dragon boats in the major cities.

> I still have a feeling this driver is far too engineered for a simple
> input node. There are internal states, defers, mangle of events and too
> many kernel parameters.

That's the firmware world and the Windows compliance world. :)

> I still need to get my head around it, but the more I think of it, the
> more I think the solution provided by Lennart in
> https://github.com/systemd/systemd/issues/2807 is the simplest one:
> when we are not sure about the state of the LID switch because _LID
> might be wrong, we shouldn't export a LID input node.
> Which means that all broken cases would be fixed by just a quirk
> "unreliable lid switch".

I read the post but could not tell what was going on.
However, my test shows that systemd 233 works fine with button.lid_init_state=ignore.
I don't know what has been improved.

One noticeable thing with the old systemd 229, though:
Even with button.lid_init_state=open, systemd still behaves strangely.
It looks like systemd 229 really has a problem with its timeout logic.
In systemd 233, the timeout mechanism seems to work better.

Anyway, the problem disappears with user space changes alone.

> Give me a day or two to get this in a better shape.

Sure.

Cheers,
Lv

> 
> Cheers,
> Benjamin
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-acpi" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html


RE: [PATCH 2/2] Revert "ACPI / button: Change default behavior to lid_init_state=open"

2017-05-25 Thread Zheng, Lv
Hi,

> >> >> >> Benjamin, my understanding is that this is the case, is it correct?
> >> >> >
> >> >> > That is correct. This patch I reverted introduces regression for
> >> >> > professional laptops that expect the LID switch to be reported
> >> >> > accurately.
> >> >>
> >> >> And from a user's perspective, what does not work any more?
> >> >
> >> > If you boot or resume your laptop with the lid closed on a docking
> >> > station while using an external monitor connected to it, both internal
> >> > and external displays will light on, while only the external should.
> >> >
> >> > There is a design choice in gdm to only provide the greeter on the
> >> > internal display when lit on, so users only see a gray area on the
> >> > external monitor. Also, the cursor will not show up as it's by default
> >> > on the internal display too.
> >> >
> >> > To "fix" that, users have to open the laptop once and close it once
> >> > again to sync the state of the switch with the hardware state.
> >>
> >> OK
> >>
> >> Yeah, that sucks.
> >>
> >> So without the Lv's patch the behavior (on the systems in question) is
> >> as expected, right?
> >
> > Would you agree to take both these reverts without Lv's ACK? We already
> > tried to explain for 2 weeks that they are valuable, but it seems we
> > > can't make him change his mind.

It's not that difficult to get an agreement.
We just didn't communicate well.

> One of the reverts actually is already in (as a patch from Lv) and
> I'll most probably push the other one for -rc4 next week.

If we really want to go back to "method" mode, we need one more patch, and it
is not in Benjamin's series.

There are 3 known broken cases. 2 of them are order related; the last one is
not:
1. Surface Pro 3: open arrives very early to update cached value, not notification
2. Samsung N210+: open arrives very late to update cached value, not notification
3. Surface Pro 1: no open event, _LID keeps on returning close

The order problem is (considering method mode):

_Qxx <- Invoked due to EC events
  Update _LID return value
  Notify(LID, close)
input_report(SW_LID, 1) -> captured by user space and system starts to suspend
acpi_button_suspend
acpi_ec_suspend
  acpi_ec_disable_event
acpi_button_resume
  if (method)
    input_report(SW_LID, _LID return value, would be 1 for cached value)
acpi_ec_resume
  acpi_ec_enable_event
_Qxx <- Invoked due to EC events, for broken case 3, no such event
  Update _LID return value
  Notify(LID, open) <- for broken case 1, 2, 3, no such notification, thus open cannot be delivered to user space.
input_report(SW_LID, 0)

The order of acpi_button_resume()/acpi_ec_resume() is determined by the
enumeration order, so it can vary between platforms.
Consider case 1, Surface Pro 3, where acpi_button_resume() is invoked before
acpi_ec_resume().
The button driver sends a false "close" to user space, and the updated "open"
state is never delivered to user space.
If we stay in method mode, the system can only be suspended once; follow-up
"close" events won't reach user space.
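
The ordering problem above can be sketched as a small stand-alone C model. It is only an illustration under simplifying assumptions: sw_lid_after_resume() and its two flags are hypothetical names rather than kernel code, the lid is assumed to be closed before suspend and opened during it, and SW_LID uses 1 for "closed" while the cached _LID value uses 1 for "open".

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Returns the SW_LID value user space holds after resume (1 = closed,
 * 0 = open) when the lid was closed before suspend and opened during it.
 *   button_first      acpi_button_resume() runs before acpi_ec_resume()
 *   fw_notifies_open  firmware issues Notify(LID, open) from _Qxx
 */
int sw_lid_after_resume(bool button_first, bool fw_notifies_open)
{
    int cached_lid = 0; /* _LID return value: 0 = closed, 1 = open */
    int sw_lid = 1;     /* user space saw "closed" and suspended */

    if (button_first) {
        sw_lid = !cached_lid;   /* method mode reports the stale cache */
        cached_lid = 1;         /* EC events re-enabled, _Qxx updates _LID */
        if (fw_notifies_open)
            sw_lid = 0;
    } else {
        cached_lid = 1;         /* _Qxx runs first and updates _LID */
        if (fw_notifies_open)
            sw_lid = 0;
        sw_lid = !cached_lid;   /* method mode reports the fresh value */
    }

    assert(cached_lid == 1);    /* _LID itself is right in both orders */
    return sw_lid;
}
```

In this model the only combination that leaves user space stuck on "closed" is button-first resume with no open notification from firmware, which is the Surface Pro 3 situation described above.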

Even if we add many workarounds to make sure acpi_ec_resume() is executed
before acpi_button_resume() on such platforms, we still cannot fix case 2 and
case 3.
So in the end this order still cannot be ensured, and the solution is still not
stable.
I would imagine the order problem is the key reason why MS stopped sending
"open" on these platforms.

Then, given that this order issue is not fixable, we need to fix broken case 1
by adding a "complement event" in "method" mode.
Why don't we do this first, before reverting to "method" mode?

And broken platforms like case 2/case 3 can, IMO, only be solved by using
"ignore" mode plus changes in user space.
If we don't use this solution, we still need libinput quirks to handle them
(and may encounter some new unfixable problems).
If you want to stay in "method" mode, will you be responsible for responding to
end users on the kernel Bugzilla using libinput quirks?
We have a Power-Lid category now; we can have you guys as the default assignee.

Cheers,
Lv


RE: [PATCH 2/2] Revert "ACPI / button: Change default behavior to lid_init_state=open"

2017-05-17 Thread Zheng, Lv
Hi, Benjamin

> > What's that?
> > I mean, the bad faith?
> I already explained 4 times why we need to revert these two patches and
> why we need to keep 'method'. And you keep answering with long emails
> that you would rather not. I call it bad faith, sorry.

Those 4 explanations didn't answer my questions.
But that's OK, let's clarify it again.

> > > This is a REGRESSION. It used to work on thousands of devices, it
> > > doesn't anymore. So any regression has to be chased down and no good
> > > reason can justify such a regression.
> > I triggered many such kind of layered regressions and did fix them 1 by 1 
> > in different places.
> > However, this might be different.
> No. It is a regression. It used to work for thousands of devices before
> v4.11, and now it's broken for those devices. It's a regression.
> Some new devices are broken with "method", it's a bug, and we can't fix
> them by regressing on all the others.
...
> I call this "fixing by users", and this is wrong. It used to work for
> years for almost everybody, you can not ask users to fix this one by
> one.

What about the regressions triggered by this commit:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=23de5d9ef2a4
Before that (in 2007), "ignore" was the default mode.

Other than this, I just had concerns about fixing things back and forth, but
you didn't reply properly.
Again, that's OK, let's just clarify it.

> Yes, it's called a quirk. And the good practice is to register those
> quirks and make them available to everybody. Being in hwdb in user space
> or in acpi/button in kernel space doesn't matter, we need them.

I have no objections, only concerns about the combination of "default mode"
and who is responsible for the quirks.
From my point of view, these are my conclusions:
1. If you want to use libinput to generate quirks, you should use "ignore"
   rather than "method" as the default mode;
2. If you want to use the button driver to generate quirks, we need a "close"
   mode;
3. If GDM can change, or users are OK with using command lines, we can keep
   "open" as the default behavior.
(I'll send the technical details about these conclusions in private.)
But you seem to always:
1. Say no to "ignore" which makes 1 impossible;
2. Say no to "close" which makes 2 impossible;
3. Say no to "open" which makes 3 impossible.

> > We haven't asked user space to change.
> > We are just discussing the correctness of some user space behaviors.
> 
> They *are* correct.
> They are following the exported ACPI documentation

I doubt it. In the ACPI world, Windows is the only standard.

> and the input node documentation.
> Quoting the input doc:
> file Documentation/input/event-codes.rst:
> EV_SW
> -----
> 
> EV_SW events describe stateful binary switches. For example, the SW_LID code is
> used to denote when a laptop lid is closed.
> 
> Upon binding to a device or resuming from suspend, a driver must report
> the current switch state. This ensures that the device, kernel, and userspace
> state is in sync.
> 
> Upon resume, if the switch state is the same as before suspend, then the input
> subsystem will filter out the duplicate switch state reports. The driver does
> not need to keep the state of the switch at any time.
> 

That's really a convenient feature for drivers.
If I were the driver writer, I would very much appreciate being able to use
such a feature.
So you see, I don't have objections to having this feature.
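
As a rough illustration of the duplicate filtering described in the quoted documentation, here is a simplified stand-alone C model. It is a sketch only: struct sw_model and sw_report() are hypothetical names, and the real input core tracks switch state per device in a bitmap rather than in a struct like this.

```c
#include <assert.h>
#include <stdbool.h>

/* Simplified model of one switch (e.g. SW_LID) as seen by the input core. */
struct sw_model {
    int value;      /* last state forwarded to user space (0 or 1) */
    int delivered;  /* number of reports actually forwarded */
};

/* Returns true if the report is forwarded, false if it is a duplicate. */
bool sw_report(struct sw_model *sw, int value)
{
    assert(sw);

    value = !!value;            /* switches are binary */
    if (sw->value == value)
        return false;           /* same state as before: filtered out */

    sw->value = value;
    sw->delivered++;
    return true;
}
```

Under this model a driver can report the current state on every resume without tracking the previous one; only real changes reach user space.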

I just have concerns related to:
1. Is it required to have a timeout in systemd that forces the platform to
   suspend again, just because of event delays?
2. Is it required to use SW_LID to determine whether an internal display should
   be lit?
I don't see any conflict between the EV_SW ABI and these 2 questions.

> So no, you can't have 'ignore' or 'open' to be the default, because user
> space expects the switch to reflect the current state of the hardware.

Then what's the benefit of having 'method' as the default, given that it still
cannot reliably deliver the current state of the hardware?
Actually, all of the 'ignore/open/method' modes are trying to be compliant with
EV_SW.
Among them, "ignore" does the best IMO.
And the cases broken in "ignore" mode but not broken in "method" mode all come
down to one issue:
 - Platform doesn't send notification after boot/resume.
   IMO, we should also collect them and indicate them to desktop managers.

So in the end, we just differ on which default mode to pick.

> > > You can not also change the semantic of an input switch. An input
> > > switch, as per the input subsystem is supposed to forward an actual
> > > state of the underlying hardware. Any fake information is bad and has to
> > > be avoided.
> > Since fake events are harmful, why do we fake an event after boot/resume?
> > button.lid_init_state=method seems can fake such an event.
> We don't fake an event, we are syncing the input switch state with the
> hardware.
> Faking an event is when you send "switch is open" while 

RE: [PATCH 2/2] Revert "ACPI / button: Change default behavior to lid_init_state=open"

2017-05-16 Thread Zheng, Lv
Hi, Benjamin

> > > > > >> >> > > > > For example, such a hwdb entry is:
> > > > > >> >> > > > > libinput:name:*Lid Switch*:dmi:*svnMicrosoftCorporation:pnSurface3:*
> > > > > >> >> > > > >  LIBINPUT_ATTR_LID_SWITCH_RELIABILITY=write_open
> > > > > >> >> Well, if it worked in a specific way that users depended on
> > > > > >> >> before the commit in question and now it works differently,
> > > > > >> >> then it does break things.
> > > > > >> >> Benjamin, my understanding is that this is the case, is it
> > > > > >> >> correct?
> > > > > >> > That is correct. This patch I reverted introduces regression for
> > > > > >> > professional laptops that expect the LID switch to be reported
> > > > > >> > accurately.
> > > > > >> And from a user's perspective, what does not work any more?
> > > > > > If you boot or resume your laptop with the lid closed on a docking
> > > > > > station while using an external monitor connected to it, both
> > > > > > internal and external displays will light on, while only the
> > > > > > external should.
> > > > > > There is a design choice in gdm to only provide the greeter on the
> > > > > > internal display when lit on, so users only see a gray area on the
> > > > > > external monitor. Also, the cursor will not show up as it's by
> > > > > > default on the internal display too.
> > > > > > To "fix" that, users have to open the laptop once and close it once
> > > > > > again to sync the state of the switch with the hardware state.
> > > > > OK
> > > > > Yeah, that sucks.
> > > > > So without the Lv's patch the behavior (on the systems in question) is
> > > > > as expected, right?
> > > > Yes, reverting these 2 patches restores the pre v4.11 kernel behavior.
> > > I would make an argument that:
> > > A. Is this necessarily a button driver regression?
> > > 1. Users already configured to not using internal display, why gdm need
> > >    to determine it again instead of users choice?
> > > 2. Can gdm/graphics driver saves state before suspend, and restores saved
> > >    state after resume?
> > >    If users didn't change state during suspend, then everything should be
> > >    correct.
> > >    If users changed state during suspend, it should be acceptable for
> > >    users to change it again to make the state correct.
> > > See, this is obviously a case that is not so strictly related to ACPI
> > > button driver.
> > > Why do we need to force button driver to marry external monitors.
> > > B. Bug reporters are all ok with using quirk modes as boot parameters to
> > > work this around.
> > > Why should we change our default behavior aimlessly?
> >
> > I have one more concern:
> > In button.lid_init_state=method mode,
> > Is that possible for libinput to work things around if _LID return value is
> > not correct?
> > How libinput ensures correct timing of overwriting the input node value?
> > Will button driver faked event value overwrites what libinput has written?
> >
> > From this point of view, button.lid_init_state=ignore might be a better
> > choice than button.lid_init_state=method to allow libinput to deal with all
> > kind of cases.
> >
> 
> This is my last email on this topic, I don't even want to fully read/answer
> the one in 1/2 given the amount of bad faith you put in that.

What's that?
I mean, the bad faith?

> This is a REGRESSION. It used to work on thousands of devices, it
> doesn't anymore. So any regression has to be chased down and no good
> reason can justify such a regression.

I have triggered many layered regressions of this kind and fixed them one by
one in different places.
However, this one might be different.
That depends on what we agree on.

> The only solution is to revert both these changes. We can not ask user
> space to fix a kernel regression, it's not how it works.

Yes, I know.
We just asked users to use the quirk modes of the button driver.
And in fact there is always one of them that works.
We haven't asked user space to change.
We are just discussing the correctness of some user space behaviors.

> You can not also change the semantic of an input switch. An input
> switch, as per the input subsystem is supposed to forward an actual
> state of the underlying hardware. Any fake information is bad and has to
> be avoided.

Since fake events are harmful, why do we fake an event after boot/resume?
button.lid_init_state=method seems able to fake such an event.

> I already gave you 2 solutions to fix the 7 machines you see that are
> problematic, and you just seem to ignore them:
> - revert to the v4.10 behavior and let libinput fix that for you

I already chose this.
But I just raised a concern that button.lid_init_state=method could cause
trouble for the libinput quirks.
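
To illustrate the concern, here is a minimal sketch (not libinput or kernel code). It assumes that both libinput's write_open quirk and the button driver write to the same switch and that the last writer wins. The function name, the event tuples and the timings are all hypothetical.

```python
from typing import List, Tuple

# Hypothetical event model: (time in seconds, writer, new switch state).
# Assumption for illustration only: whoever writes last decides the state.
def final_switch_state(events: List[Tuple[float, str, str]],
                       initial: str = "closed") -> str:
    state = initial
    # Apply the writes in time order, regardless of who issued them.
    for _time, _writer, new_state in sorted(events, key=lambda e: e[0]):
        state = new_state
    return state


# libinput's write_open quirk fires on a key press at t=1.0; the driver's
# initial report (firmware _LID wrongly says "closed") lands later, at t=2.0.
events = [(1.0, "libinput", "open"), (2.0, "button", "closed")]
print(final_switch_state(events))  # closed
```

If the order were reversed, the libinput write would win instead, which is why the timing question matters.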

> - revert to the v4.10 behavior and have a quirk database in acpi/button
> 
> I also proposed to take maintainership on this particular module because
> you said you were assigned this by default because you were the last
> 

RE: [PATCH 2/2] Revert "ACPI / button: Change default behavior to lid_init_state=open"

2017-05-15 Thread Zheng, Lv
Hi,

> From: linux-acpi-ow...@vger.kernel.org 
> [mailto:linux-acpi-ow...@vger.kernel.org] On Behalf Of Zheng,
> Lv
> Subject: RE: [PATCH 2/2] Revert "ACPI / button: Change default behavior to 
> lid_init_state=open"
> 
> Hi, Guys
> 
> > From: Benjamin Tissoires [mailto:benjamin.tissoi...@redhat.com]
> > Subject: Re: [PATCH 2/2] Revert "ACPI / button: Change default behavior to 
> > lid_init_state=open"
> >
> > On May 15 2017 or thereabouts, Rafael J. Wysocki wrote:
> > > On Mon, May 15, 2017 at 11:37 AM, Benjamin Tissoires
> > > <benjamin.tissoi...@redhat.com> wrote:
> > > > On May 15 2017 or thereabouts, Rafael J. Wysocki wrote:
> > > >> On Mon, May 15, 2017 at 9:45 AM, Benjamin Tissoires
> > > >> <benjamin.tissoi...@redhat.com> wrote:
> > > >> > On May 12 2017 or thereabouts, Rafael J. Wysocki wrote:
> > > >> >> On Friday, May 12, 2017 02:36:20 AM Zheng, Lv wrote:
> > > >> >> > Hi,
> > > >> >> >
> > > >> >> > > From: Benjamin Tissoires [mailto:benjamin.tissoi...@redhat.com]
> > > >> >> > > Subject: Re: [PATCH 2/2] Revert "ACPI / button: Change default
> > > >> >> > > behavior to lid_init_state=open"
> > > >> >> > >
> > > >> >> > > On May 11 2017 or thereabouts, Zheng, Lv wrote:
> > > >> >> > > > Hi,
> > > >> >> > > >
> > > >> >> > > > > From: Benjamin Tissoires 
> > > >> >> > > > > [mailto:benjamin.tissoi...@redhat.com]
> > > >> >> > > > > Subject: [PATCH 2/2] Revert "ACPI / button: Change default
> > > >> >> > > > > behavior to lid_init_state=open"
> > > >> >> > > > >
> > > >> >> > > > > This reverts commit 
> > > >> >> > > > > 77e9a4aa9de10cc1418bf9a892366988802a8025.
> > > >> >> > > > >
> > > >> >> > > > > Even if the method implementation can be buggy on some 
> > > >> >> > > > > platform,
> > > >> >> > > > > the "open" choice is worse. It breaks docking stations 
> > > >> >> > > > > basically
> > > >> >> > > > > and there is no way to have a user-space hwdb to fix that.
> > > >> >> > > > >
> > > >> >> > > > > On the contrary, it's rather easy in user-space to have a 
> > > >> >> > > > > hwdb
> > > >> >> > > > > with the problematic platforms. Then, libinput (1.7.0+) can 
> > > >> >> > > > > fix
> > > >> >> > > > > the state of the LID switch for us: you need to set the udev
> > > >> >> > > > > property LIBINPUT_ATTR_LID_SWITCH_RELIABILITY to 
> > > >> >> > > > > 'write_open'.
> > > >> >> > > > >
> > > >> >> > > > > When libinput detects internal keyboard events, it will
> > > >> >> > > > > overwrite the state of the switch to open, making it 
> > > >> >> > > > > reliable
> > > >> >> > > > > again. Given that logind only checks the LID switch value 
> > > >> >> > > > > after
> > > >> >> > > > > a timeout, we can assume the user will use the internal 
> > > >> >> > > > > keyboard
> > > >> >> > > > > before this timeout expires.
> > > >> >> > > > >
> > > >> >> > > > > For example, such a hwdb entry is:
> > > >> >> > > > >
> > > >> >> > > > > libinput:name:*Lid Switch*:dmi:*svnMicrosoftCorporation:pnSurface3:*
> > > >> >> > > > >  LIBINPUT_ATTR_LID_SWITCH_RELIABILITY=write_open
> > > >> >> > > >
> > > >> >
> > > >> > [...]
> > > >> >
> > > >> >>
> > > >> >> Well, if it worked in a specific way that users depended on befor

RE: [PATCH 2/2] Revert "ACPI / button: Change default behavior to lid_init_state=open"

2017-05-15 Thread Zheng, Lv
Hi, Guys

> From: Benjamin Tissoires [mailto:benjamin.tissoi...@redhat.com]
> Subject: Re: [PATCH 2/2] Revert "ACPI / button: Change default behavior to 
> lid_init_state=open"
> 
> On May 15 2017 or thereabouts, Rafael J. Wysocki wrote:
> > On Mon, May 15, 2017 at 11:37 AM, Benjamin Tissoires
> > <benjamin.tissoi...@redhat.com> wrote:
> > > On May 15 2017 or thereabouts, Rafael J. Wysocki wrote:
> > >> On Mon, May 15, 2017 at 9:45 AM, Benjamin Tissoires
> > >> <benjamin.tissoi...@redhat.com> wrote:
> > >> > On May 12 2017 or thereabouts, Rafael J. Wysocki wrote:
> > >> >> On Friday, May 12, 2017 02:36:20 AM Zheng, Lv wrote:
> > >> >> > Hi,
> > >> >> >
> > >> >> > > From: Benjamin Tissoires [mailto:benjamin.tissoi...@redhat.com]
> > >> >> > > Subject: Re: [PATCH 2/2] Revert "ACPI / button: Change default
> > >> >> > > behavior to lid_init_state=open"
> > >> >> > >
> > >> >> > > On May 11 2017 or thereabouts, Zheng, Lv wrote:
> > >> >> > > > Hi,
> > >> >> > > >
> > >> >> > > > > From: Benjamin Tissoires 
> > >> >> > > > > [mailto:benjamin.tissoi...@redhat.com]
> > >> >> > > > > Subject: [PATCH 2/2] Revert "ACPI / button: Change default
> > >> >> > > > > behavior to lid_init_state=open"
> > >> >> > > > >
> > >> >> > > > > This reverts commit 77e9a4aa9de10cc1418bf9a892366988802a8025.
> > >> >> > > > >
> > >> >> > > > > Even if the method implementation can be buggy on some 
> > >> >> > > > > platform,
> > >> >> > > > > the "open" choice is worse. It breaks docking stations 
> > >> >> > > > > basically
> > >> >> > > > > and there is no way to have a user-space hwdb to fix that.
> > >> >> > > > >
> > >> >> > > > > On the contrary, it's rather easy in user-space to have a hwdb
> > >> >> > > > > with the problematic platforms. Then, libinput (1.7.0+) can 
> > >> >> > > > > fix
> > >> >> > > > > the state of the LID switch for us: you need to set the udev
> > >> >> > > > > property LIBINPUT_ATTR_LID_SWITCH_RELIABILITY to 'write_open'.
> > >> >> > > > >
> > >> >> > > > > When libinput detects internal keyboard events, it will
> > >> >> > > > > overwrite the state of the switch to open, making it reliable
> > >> >> > > > > again. Given that logind only checks the LID switch value 
> > >> >> > > > > after
> > >> >> > > > > a timeout, we can assume the user will use the internal 
> > >> >> > > > > keyboard
> > >> >> > > > > before this timeout expires.
> > >> >> > > > >
> > >> >> > > > > For example, such a hwdb entry is:
> > >> >> > > > >
> > >> >> > > > > libinput:name:*Lid Switch*:dmi:*svnMicrosoftCorporation:pnSurface3:*
> > >> >> > > > >  LIBINPUT_ATTR_LID_SWITCH_RELIABILITY=write_open
> > >> >> > > >
> > >> >
> > >> > [...]
> > >> >
> > >> >>
> > >> >> Well, if it worked in a specific way that users depended on before 
> > >> >> the commit in
> > >> >> question and now it works differently, then it does break things.
> > >> >>
> > >> >> Benjamin, my understanding is that this is the case, is it correct?
> > >> >
> > >> > That is correct. This patch I reverted introduces regression for 
> > >> > professional
> > >> > laptops that expect the LID switch to be reported accurately.
> > >>
> > >> And from a user's perspective, what does not work any more?
> > >
> > > If you boot or resume your laptop with the lid closed on a docking
> > > station while using an external monitor connected to it, both internal
> > > and external displays will light on, while only the external should.
> > >
> > > There is a design choice in gdm to only provide the greeter on the
> > > internal display when lit on, so users only see a gray area on the
> > > external monitor. Also, the cursor will not show up as it's by default
> > > on the internal display too.
> > >
> > > To "fix" that, users have to open the laptop once and close it once
> > > again to sync the state of the switch with the hardware state.
> >
> > OK
> >
> > Yeah, that sucks.
> >
> > So without the Lv's patch the behavior (on the systems in question) is
> > as expected, right?
> >
> 
> Yes, reverting these 2 patches restores the pre v4.11 kernel behavior.

I would argue the following:
A. Is this necessarily a button driver regression?
1. Users have already configured the system not to use the internal display,
so why does gdm need to determine this again instead of honoring the users'
choice?
2. Can gdm or the graphics driver save the state before suspend and restore
the saved state after resume?
   If users didn't change the state during suspend, then everything should be
correct.
   If users changed the state during suspend, it should be acceptable for users
to change it again to make the state correct.
See, this is obviously a case that is not strictly related to the ACPI button
driver.
Why do we need to force the button driver to be married to external monitors?
B. Bug reporters are all OK with using quirk modes as boot parameters to work
around this.
Why should we change our default behavior aimlessly?
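
For reference, the quirk modes mentioned in this thread can be modelled roughly as follows. This is only a sketch of what each button.lid_init_state value reports at boot/resume as described in the discussion, not the driver code; the function name and its parameters are hypothetical.

```python
from typing import Optional

# Hypothetical model of the button.lid_init_state modes discussed here.
# It only encodes which initial lid state each mode reports to user space.
def initial_lid_report(mode: str, lid_method_says_open: bool) -> Optional[str]:
    """Return the initial lid state reported, or None if nothing is reported."""
    if mode == "method":
        # Report whatever the firmware _LID method returns, right or wrong.
        return "open" if lid_method_says_open else "closed"
    if mode == "open":
        # Always report "open", regardless of the real lid position.
        return "open"
    if mode == "ignore":
        # Send no initial event; user space keeps the state it last saw.
        return None
    raise ValueError(f"unknown lid_init_state mode: {mode!r}")


# Docked laptop with the lid physically closed and an accurate _LID method:
for mode in ("method", "open", "ignore"):
    print(mode, "->", initial_lid_report(mode, lid_method_says_open=False))
```

In the docking station case, only the "open" mode reports a state that contradicts the hardware, while "method" is only as good as the firmware's _LID value.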

Thanks and best regards
Lv


RE: [PATCH 1/2] Revert "ACPI / button: Remove lid_init_state=method mode"

2017-05-15 Thread Zheng, Lv
Hi, Benjamin

I reordered the discussion to group the topics and deleted things to make the
discussion shorter.

1. root caused issue:

> > It seems we just need to determine the following first:
> > 1. Who should be responsible for solving bugs triggered by the conflict
> > between bios and linux user space expectations:
> >    button driver? libinput? Some other user space programs? Users?
> Hopefully libinput or systemd (through a udev rule). If things gets
> worse a acpi/button quirk might be used, but in a second time.

I have concerns about what's in your mind. :)

So let me highlight a root-caused issue:
https://bugzilla.kernel.org/show_bug.cgi?id=106151
If we use any of the "open" modes, the suspend/resume loop can be fixed.
Neither the "ignore" nor the "method" mode can fix the problem.
In this bug, the lid open event has a huge delay, but it does arrive correctly.
However, systemd will force a 2nd suspend if it cannot see the "open" event
instantly after resume.
So why doesn't systemd fix the issue (the enforcement) before leaving us
(input layer/button driver) to invent workarounds?
IMO, this is a root-caused problem and should be the first priority.
Until it is addressed in systemd, any changes made to libinput or the button
driver may not be proper.
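
The race described above can be sketched as follows. This is a simplified model under stated assumptions, not systemd's actual logic: the function name and both timing parameters are hypothetical, and no value here is taken from systemd.

```python
# Hypothetical timings in seconds, measured from the moment of resume.
def suspends_again(open_event_delay: float, recheck_after: float) -> bool:
    """True if user space re-checks the lid before the delayed "open" arrives."""
    lid_state = "closed"              # state left over from before suspend
    if open_event_delay <= recheck_after:
        lid_state = "open"            # the event arrived before the check
    # Still "closed" at check time means a second suspend is forced.
    return lid_state == "closed"


print(suspends_again(open_event_delay=5.0, recheck_after=1.0))
print(suspends_again(open_event_delay=0.2, recheck_after=1.0))
```

In this model the event is never lost; the loop comes only from checking too early, which is why the enforcement itself looks like the thing to fix.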

So, in my mind, the order for fixing all of these troubles is the following:
1. systemd -> should eliminate the enforcement first
2. libinput -> may change lid event type (see my reply below for topic 2)
3. button driver

> > > > > The issue we are fixing here is the fact that the switch state is 
> > > > > wrong,
> > > > > which makes user space assumptions wrong too (and users angry).
> > > > Considering no platform quirks.
> > > > If ACPI button driver sends SW_LID, users are likely angry.
> > > > Unless the user space programs are changed to "ignore open event".
> > > > If ACPI button driver doesn't send switch events, but key events.
> > > > The user space programs need to change to handle the new events.
> > > > So finally whatever we do, user space need to change a bit related to
> > > > ACPI control method lid device.
> > > Or we fix the switch to report accurate events/state.
> > > You do realise that all the energy you are spending, answering to me,
> > > talking to user space maintainers, users, all comes down to the fact
> > > that you refuse to have hardware quirks?

Yes, as we have a root caused but unfixed issue in systemd first.
It's pointless to introduce hardware quirks at this point.

> > Thus there is no possible __FIX__ for acpi button driver and libinput.
> I never talked about a fix. I know the situation is unsolvable, which is
> why I talked about quirks or workarounds.
> > While user space programs can just fix their usage models.
> You can't expect user space to change anything from the kernel point of
> view without a long enough warning.

Why cannot we expect so?
The above issue has already been root caused.

> > > > > > However, is that possible to not introduce platform quirks?
> > > > > > For example, for systemd:
> > > > > > If it detected ACPI lid device, automatically switch to an
> > > > > > "IgnoreLidOpenEvent" mode (like nouveau drivers' ignorelid=Y option).
> > > > > Well, given that 99.9% of laptops have this ACPI lid button, you'll 
> > > > > just
> > > > > remove the feature from all laptops.
> > > > No, I only removed the wrong usage models related to the strict "open" 
> > > > event.
> > > > All laptops are still capable of sending correct "close" event.
> > > My bad, I read too fast and missed the "...Open..." part of
> > > "IgnoreLidOpenEvent".
> > > Though I am not sure IgnoreLidOpenEvent is accurate.
> > > "OpenEventNeverSent" seems to reflect more the reality. But again,
> > > that's not of our (kernel developers) business.
> > IMO this is the only root cause fix. :)
> > It's the only way that the user can use without changing its quirk modes 
> > for different usage models.
> Yes and no. There is a design issue in logind, but this is based on the
> design choice made in acpi/button: we use an input switch. A switch has
> a state, and the kernel ought to report correct state. With this in
> mind, the design choice in logind is correct. But now we are
> seeing that some OEM are not providing everything we need, and we need
> to find the solution.

We can stop arguing about this.
First of all, we just need to check whether systemd can remove the enforcement
mentioned above, and see what happens next.
It's likely that there will be no user issues or button driver design issues
then.
Thus, in the end, we may not need to introduce any "hardware quirks".

2. keep "method" mode:

> > Button driver default behavior should be (not 100% sure if this is your 
> > opinion):
> button.lid_init_state=method
> 
> Yes, I'd like to revert to the old behavior (see below for a rationale).
...
> > 2. What should be the default button driver behavior?
> >button.lid_init_state=ignore? button.lid_init_state=method?
> button.lid_init_state=method:
> - this 
