Re: rtc-opal: Fix handling of firmware error codes, prevent busy loops

2018-02-06 Thread Michael Ellerman
Alexandre Belloni  writes:

> On 06/02/2018 at 16:22:47 +1100, Michael Ellerman wrote:
>> > Just a note to let you know that this patch should have gone through my
>> > tree but it was not sent to linux-rtc or me.
>> 
>> Sorry, I saw it had been languishing for a long time and assumed you'd
>> missed it.
>> 
>> Happy to revert/rework it if you're not happy with it.
>> 
>
> No, that's fine. It's just that the commit title stands out when using
> git log --oneline and that triggered my OCD ;)

Oh no! Now I'm *really* sorry, that's the worst! :)

cheers


Re: rtc-opal: Fix handling of firmware error codes, prevent busy loops

2018-02-06 Thread Alexandre Belloni
On 06/02/2018 at 16:22:47 +1100, Michael Ellerman wrote:
> > Just a note to let you know that this patch should have gone through my
> > tree but it was not sent to linux-rtc or me.
> 
> Sorry, I saw it had been languishing for a long time and assumed you'd
> missed it.
> 
> Happy to revert/rework it if you're not happy with it.
> 

No, that's fine. It's just that the commit title stands out when using
git log --oneline and that triggered my OCD ;)

> > I guess what happened is that Michael cleaned up the Linux PPC patchwork
> > queue.
> 
> Yeah I did.
> 
> In future I'll ping you if there's something that seems to have fallen
> through the cracks.
> 

Great thanks!


-- 
Alexandre Belloni, Bootlin (formerly Free Electrons)
Embedded Linux and Kernel engineering
http://bootlin.com


Re: rtc-opal: Fix handling of firmware error codes, prevent busy loops

2018-02-06 Thread Alexandre Belloni
Hi,

On 02/08/2016 at 11:50:16 +1000, Stewart Smith wrote:
> According to the OPAL docs:
> https://github.com/open-power/skiboot/blob/skiboot-5.2.5/doc/opal-api/opal-rtc-read-3.txt
> https://github.com/open-power/skiboot/blob/skiboot-5.2.5/doc/opal-api/opal-rtc-write-4.txt
> OPAL_HARDWARE may be returned from OPAL_RTC_READ or OPAL_RTC_WRITE and this
> indicates either a transient or permanent error.
> 
> Prior to this patch, Linux was not dealing with OPAL_HARDWARE being a
> permanent error particularly well, in that you could end up in a busy
> loop.
> 
> This was not too hard to trigger on an AMI BMC based OpenPOWER machine
> doing a continuous "ipmitool mc reset cold" to the BMC, the result of
> that being that we'd get stuck in an infinite loop in opal_get_rtc_time.
> 
> We now retry a few times before returning the error higher up the stack.
> 
> Cc: sta...@vger.kernel.org
> Signed-off-by: Stewart Smith 
> ---
>  drivers/rtc/rtc-opal.c | 12 ++--
>  1 file changed, 10 insertions(+), 2 deletions(-)
> 

Just a note to let you know that this patch should have gone through my
tree but it was not sent to linux-rtc or me.

I guess what happened is that Michael cleaned up the Linux PPC patchwork
queue.

-- 
Alexandre Belloni, Bootlin (formerly Free Electrons)
Embedded Linux and Kernel engineering
http://bootlin.com


Re: rtc-opal: Fix handling of firmware error codes, prevent busy loops

2018-02-05 Thread Stewart Smith
Alexandre Belloni  writes:
> On 02/08/2016 at 11:50:16 +1000, Stewart Smith wrote:
>> According to the OPAL docs:
>> https://github.com/open-power/skiboot/blob/skiboot-5.2.5/doc/opal-api/opal-rtc-read-3.txt
>> https://github.com/open-power/skiboot/blob/skiboot-5.2.5/doc/opal-api/opal-rtc-write-4.txt
>> OPAL_HARDWARE may be returned from OPAL_RTC_READ or OPAL_RTC_WRITE and this
>> indicates either a transient or permanent error.
>> 
>> Prior to this patch, Linux was not dealing with OPAL_HARDWARE being a
>> permanent error particularly well, in that you could end up in a busy
>> loop.
>> 
>> This was not too hard to trigger on an AMI BMC based OpenPOWER machine
>> doing a continuous "ipmitool mc reset cold" to the BMC, the result of
>> that being that we'd get stuck in an infinite loop in opal_get_rtc_time.
>> 
>> We now retry a few times before returning the error higher up the stack.
>> 
>> Cc: sta...@vger.kernel.org
>> Signed-off-by: Stewart Smith 
>> ---
>>  drivers/rtc/rtc-opal.c | 12 ++--
>>  1 file changed, 10 insertions(+), 2 deletions(-)
>> 
>
> Just a note to let you know that this patch should have gone through my
> tree but it was not sent to linux-rtc or me.
>
> I guess what happened is that Michael cleaned up the Linux PPC patchwork
> queue.

Apologies for not sending there. My (18 month ago self) bad.

-- 
Stewart Smith
OPAL Architect, IBM.



Re: rtc-opal: Fix handling of firmware error codes, prevent busy loops

2018-02-05 Thread Michael Ellerman
Alexandre Belloni  writes:

> Hi,
>
> On 02/08/2016 at 11:50:16 +1000, Stewart Smith wrote:
>> According to the OPAL docs:
>> https://github.com/open-power/skiboot/blob/skiboot-5.2.5/doc/opal-api/opal-rtc-read-3.txt
>> https://github.com/open-power/skiboot/blob/skiboot-5.2.5/doc/opal-api/opal-rtc-write-4.txt
>> OPAL_HARDWARE may be returned from OPAL_RTC_READ or OPAL_RTC_WRITE and this
>> indicates either a transient or permanent error.
>> 
>> Prior to this patch, Linux was not dealing with OPAL_HARDWARE being a
>> permanent error particularly well, in that you could end up in a busy
>> loop.
>> 
>> This was not too hard to trigger on an AMI BMC based OpenPOWER machine
>> doing a continuous "ipmitool mc reset cold" to the BMC, the result of
>> that being that we'd get stuck in an infinite loop in opal_get_rtc_time.
>> 
>> We now retry a few times before returning the error higher up the stack.
>> 
>> Cc: sta...@vger.kernel.org
>> Signed-off-by: Stewart Smith 
>> ---
>>  drivers/rtc/rtc-opal.c | 12 ++--
>>  1 file changed, 10 insertions(+), 2 deletions(-)
>> 
>
> Just a note to let you know that this patch should have gone through my
> tree but it was not sent to linux-rtc or me.

Sorry, I saw it had been languishing for a long time and assumed you'd
missed it.

Happy to revert/rework it if you're not happy with it.

> I guess what happened is that Michael cleaned up the Linux PPC patchwork
> queue.

Yeah I did.

In future I'll ping you if there's something that seems to have fallen
through the cracks.

cheers


Re: rtc-opal: Fix handling of firmware error codes, prevent busy loops

2018-01-28 Thread Michael Ellerman
On Tue, 2016-08-02 at 01:50:16 UTC, Stewart Smith wrote:
> According to the OPAL docs:
> https://github.com/open-power/skiboot/blob/skiboot-5.2.5/doc/opal-api/opal-rtc-read-3.txt
> https://github.com/open-power/skiboot/blob/skiboot-5.2.5/doc/opal-api/opal-rtc-write-4.txt
> OPAL_HARDWARE may be returned from OPAL_RTC_READ or OPAL_RTC_WRITE and this
> indicates either a transient or permanent error.
> 
> Prior to this patch, Linux was not dealing with OPAL_HARDWARE being a
> permanent error particularly well, in that you could end up in a busy
> loop.
> 
> This was not too hard to trigger on an AMI BMC based OpenPOWER machine
> doing a continuous "ipmitool mc reset cold" to the BMC, the result of
> that being that we'd get stuck in an infinite loop in opal_get_rtc_time.
> 
> We now retry a few times before returning the error higher up the stack.
> 
> Cc: sta...@vger.kernel.org
> Signed-off-by: Stewart Smith 

Applied to powerpc next, thanks.

https://git.kernel.org/powerpc/c/5b8b58063029f02da573120ef4dc90

cheers


Re: [PATCH] rtc-opal: Fix handling of firmware error codes, prevent busy loops

2016-08-03 Thread Michael Ellerman
Stewart Smith  writes:

> According to the OPAL docs:
> https://github.com/open-power/skiboot/blob/skiboot-5.2.5/doc/opal-api/opal-rtc-read-3.txt
> https://github.com/open-power/skiboot/blob/skiboot-5.2.5/doc/opal-api/opal-rtc-write-4.txt
> OPAL_HARDWARE may be returned from OPAL_RTC_READ or OPAL_RTC_WRITE and this
> indicates either a transient or permanent error.
>
> Prior to this patch, Linux was not dealing with OPAL_HARDWARE being a
> permanent error particularly well, in that you could end up in a busy
> loop.
>
> This was not too hard to trigger on an AMI BMC based OpenPOWER machine
> doing a continuous "ipmitool mc reset cold" to the BMC, the result of
> that being that we'd get stuck in an infinite loop in opal_get_rtc_time.
>
> We now retry a few times before returning the error higher up the stack.

Looks like this has always been broken, so:

Fixes: 16b1d26e77b1 ("rtc/tpo: Driver to support rtc and wakeup on PowerNV 
platform")

> Cc: sta...@vger.kernel.org

And therefore that should be:

Cc: sta...@vger.kernel.org # v3.19+

> Signed-off-by: Stewart Smith 
> ---
>  drivers/rtc/rtc-opal.c | 12 ++--
>  1 file changed, 10 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/rtc/rtc-opal.c b/drivers/rtc/rtc-opal.c
> index 9c18d6fd8107..fab19e3e2fba 100644
> --- a/drivers/rtc/rtc-opal.c
> +++ b/drivers/rtc/rtc-opal.c
> @@ -58,6 +58,7 @@ static void tm_to_opal(struct rtc_time *tm, u32 *y_m_d, u64 
> *h_m_s_ms)
>  static int opal_get_rtc_time(struct device *dev, struct rtc_time *tm)
>  {
>   long rc = OPAL_BUSY;
> + int retries = 10;
>   u32 y_m_d;
>   u64 h_m_s_ms;
>   __be32 __y_m_d;
> @@ -67,8 +68,11 @@ static int opal_get_rtc_time(struct device *dev, struct 
> rtc_time *tm)
>   rc = opal_rtc_read(&__y_m_d, &__h_m_s_ms);
>   if (rc == OPAL_BUSY_EVENT)
>   opal_poll_events(NULL);
> - else
> + else if (retries-- && (rc == OPAL_HARDWARE
> +|| rc == OPAL_INTERNAL_ERROR))
>   msleep(10);
> + else if (rc != OPAL_BUSY && rc != OPAL_BUSY_EVENT)
> + break;
>   }

This is a pretty gross API at this point.

That's basically a score of 2 on Rusty's API usability index ("Read the 
implementation
and you'll get it right" - 
http://ozlabs.org/~rusty/index.cgi/tech/2008-03-30.html).

The docs don't mention OPAL_INTERNAL_ERROR being transient, nor do they mention
OPAL_BUSY.

Can we at least do a wrapper function in opal.h for drivers to use that handles
some or all of these cases?

cheers
___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

[PATCH] rtc-opal: Fix handling of firmware error codes, prevent busy loops

2016-08-01 Thread Stewart Smith
According to the OPAL docs:
https://github.com/open-power/skiboot/blob/skiboot-5.2.5/doc/opal-api/opal-rtc-read-3.txt
https://github.com/open-power/skiboot/blob/skiboot-5.2.5/doc/opal-api/opal-rtc-write-4.txt
OPAL_HARDWARE may be returned from OPAL_RTC_READ or OPAL_RTC_WRITE and this
indicates either a transient or permanent error.

Prior to this patch, Linux was not dealing with OPAL_HARDWARE being a
permanent error particularly well, in that you could end up in a busy
loop.

This was not too hard to trigger on an AMI BMC based OpenPOWER machine
doing a continuous "ipmitool mc reset cold" to the BMC, the result of
that being that we'd get stuck in an infinite loop in opal_get_rtc_time.

We now retry a few times before returning the error higher up the stack.

Cc: sta...@vger.kernel.org
Signed-off-by: Stewart Smith 
---
 drivers/rtc/rtc-opal.c | 12 ++--
 1 file changed, 10 insertions(+), 2 deletions(-)

diff --git a/drivers/rtc/rtc-opal.c b/drivers/rtc/rtc-opal.c
index 9c18d6fd8107..fab19e3e2fba 100644
--- a/drivers/rtc/rtc-opal.c
+++ b/drivers/rtc/rtc-opal.c
@@ -58,6 +58,7 @@ static void tm_to_opal(struct rtc_time *tm, u32 *y_m_d, u64 
*h_m_s_ms)
 static int opal_get_rtc_time(struct device *dev, struct rtc_time *tm)
 {
long rc = OPAL_BUSY;
+   int retries = 10;
u32 y_m_d;
u64 h_m_s_ms;
__be32 __y_m_d;
@@ -67,8 +68,11 @@ static int opal_get_rtc_time(struct device *dev, struct 
rtc_time *tm)
rc = opal_rtc_read(&__y_m_d, &__h_m_s_ms);
if (rc == OPAL_BUSY_EVENT)
opal_poll_events(NULL);
-   else
+   else if (retries-- && (rc == OPAL_HARDWARE
+  || rc == OPAL_INTERNAL_ERROR))
msleep(10);
+   else if (rc != OPAL_BUSY && rc != OPAL_BUSY_EVENT)
+   break;
}
 
if (rc != OPAL_SUCCESS)
@@ -84,6 +88,7 @@ static int opal_get_rtc_time(struct device *dev, struct 
rtc_time *tm)
 static int opal_set_rtc_time(struct device *dev, struct rtc_time *tm)
 {
long rc = OPAL_BUSY;
+   int retries = 10;
u32 y_m_d = 0;
u64 h_m_s_ms = 0;
 
@@ -92,8 +97,11 @@ static int opal_set_rtc_time(struct device *dev, struct 
rtc_time *tm)
rc = opal_rtc_write(y_m_d, h_m_s_ms);
if (rc == OPAL_BUSY_EVENT)
opal_poll_events(NULL);
-   else
+   else if (retries-- && (rc == OPAL_HARDWARE
+  || rc == OPAL_INTERNAL_ERROR))
msleep(10);
+   else if (rc != OPAL_BUSY && rc != OPAL_BUSY_EVENT)
+   break;
}
 
return rc == OPAL_SUCCESS ? 0 : -EIO;
-- 
2.1.4

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev