Re: rtc-opal: Fix handling of firmware error codes, prevent busy loops
Alexandre Belloniwrites: > On 06/02/2018 at 16:22:47 +1100, Michael Ellerman wrote: >> > Just a note to let you know that this patch should have gone through my >> > tree but it was not sent to linux-rtc or me. >> >> Sorry, I saw it had been languishing for a long time and assumed you'd >> missed it. >> >> Happy to revert/rework it if you're not happy with it. >> > > No, that's fine. It's just that the commit title stands out when using > git log --oneline and that triggered my OCD ;) Oh no! Now I'm *really* sorry, that's the worst! :) cheers
Re: rtc-opal: Fix handling of firmware error codes, prevent busy loops
On 06/02/2018 at 16:22:47 +1100, Michael Ellerman wrote: > > Just a note to let you know that this patch should have gone through my > > tree but it was not sent to linux-rtc or me. > > Sorry, I saw it had been languishing for a long time and assumed you'd > missed it. > > Happy to revert/rework it if you're not happy with it. > No, that's fine. It's just that the commit title stands out when using git log --oneline and that triggered my OCD ;) > > I guess what happened is that Michael cleaned up the Linux PPC patchwork > > queue. > > Yeah I did. > > In future I'll ping you if there's something that seems to have fallen > through the cracks. > Great thanks! -- Alexandre Belloni, Bootlin (formerly Free Electrons) Embedded Linux and Kernel engineering http://bootlin.com
Re: rtc-opal: Fix handling of firmware error codes, prevent busy loops
Hi, On 02/08/2016 at 11:50:16 +1000, Stewart Smith wrote: > According to the OPAL docs: > https://github.com/open-power/skiboot/blob/skiboot-5.2.5/doc/opal-api/opal-rtc-read-3.txt > https://github.com/open-power/skiboot/blob/skiboot-5.2.5/doc/opal-api/opal-rtc-write-4.txt > OPAL_HARDWARE may be returned from OPAL_RTC_READ or OPAL_RTC_WRITE and this > indicates either a transient or permanent error. > > Prior to this patch, Linux was not dealing with OPAL_HARDWARE being a > permanent error particularly well, in that you could end up in a busy > loop. > > This was not too hard to trigger on an AMI BMC based OpenPOWER machine > doing a continuous "ipmitool mc reset cold" to the BMC, the result of > that being that we'd get stuck in an infinite loop in opal_get_rtc_time. > > We now retry a few times before returning the error higher up the stack. > > Cc: sta...@vger.kernel.org > Signed-off-by: Stewart Smith> --- > drivers/rtc/rtc-opal.c | 12 ++-- > 1 file changed, 10 insertions(+), 2 deletions(-) > Just a note to let you know that this patch should have gone through my tree but it was not sent to linux-rtc or me. I guess what happened is that Michael cleaned up the Linux PPC patchwork queue. -- Alexandre Belloni, Bootlin (formerly Free Electrons) Embedded Linux and Kernel engineering http://bootlin.com
Re: rtc-opal: Fix handling of firmware error codes, prevent busy loops
Alexandre Belloniwrites: > On 02/08/2016 at 11:50:16 +1000, Stewart Smith wrote: >> According to the OPAL docs: >> https://github.com/open-power/skiboot/blob/skiboot-5.2.5/doc/opal-api/opal-rtc-read-3.txt >> https://github.com/open-power/skiboot/blob/skiboot-5.2.5/doc/opal-api/opal-rtc-write-4.txt >> OPAL_HARDWARE may be returned from OPAL_RTC_READ or OPAL_RTC_WRITE and this >> indicates either a transient or permanent error. >> >> Prior to this patch, Linux was not dealing with OPAL_HARDWARE being a >> permanent error particularly well, in that you could end up in a busy >> loop. >> >> This was not too hard to trigger on an AMI BMC based OpenPOWER machine >> doing a continuous "ipmitool mc reset cold" to the BMC, the result of >> that being that we'd get stuck in an infinite loop in opal_get_rtc_time. >> >> We now retry a few times before returning the error higher up the stack. >> >> Cc: sta...@vger.kernel.org >> Signed-off-by: Stewart Smith >> --- >> drivers/rtc/rtc-opal.c | 12 ++-- >> 1 file changed, 10 insertions(+), 2 deletions(-) >> > > Just a note to let you know that this patch should have gone through my > tree but it was not sent to linux-rtc or me. > > I guess what happened is that Michael cleaned up the Linux PPC patchwork > queue. Apologies for not sending there. My (18 month ago self) bad. -- Stewart Smith OPAL Architect, IBM.
Re: rtc-opal: Fix handling of firmware error codes, prevent busy loops
Alexandre Belloniwrites: > Hi, > > On 02/08/2016 at 11:50:16 +1000, Stewart Smith wrote: >> According to the OPAL docs: >> https://github.com/open-power/skiboot/blob/skiboot-5.2.5/doc/opal-api/opal-rtc-read-3.txt >> https://github.com/open-power/skiboot/blob/skiboot-5.2.5/doc/opal-api/opal-rtc-write-4.txt >> OPAL_HARDWARE may be returned from OPAL_RTC_READ or OPAL_RTC_WRITE and this >> indicates either a transient or permanent error. >> >> Prior to this patch, Linux was not dealing with OPAL_HARDWARE being a >> permanent error particularly well, in that you could end up in a busy >> loop. >> >> This was not too hard to trigger on an AMI BMC based OpenPOWER machine >> doing a continuous "ipmitool mc reset cold" to the BMC, the result of >> that being that we'd get stuck in an infinite loop in opal_get_rtc_time. >> >> We now retry a few times before returning the error higher up the stack. >> >> Cc: sta...@vger.kernel.org >> Signed-off-by: Stewart Smith >> --- >> drivers/rtc/rtc-opal.c | 12 ++-- >> 1 file changed, 10 insertions(+), 2 deletions(-) >> > > Just a note to let you know that this patch should have gone through my > tree but it was not sent to linux-rtc or me. Sorry, I saw it had been languishing for a long time and assumed you'd missed it. Happy to revert/rework it if you're not happy with it. > I guess what happened is that Michael cleaned up the Linux PPC patchwork > queue. Yeah I did. In future I'll ping you if there's something that seems to have fallen through the cracks. cheers
Re: rtc-opal: Fix handling of firmware error codes, prevent busy loops
On Tue, 2016-08-02 at 01:50:16 UTC, Stewart Smith wrote: > According to the OPAL docs: > https://github.com/open-power/skiboot/blob/skiboot-5.2.5/doc/opal-api/opal-rtc-read-3.txt > https://github.com/open-power/skiboot/blob/skiboot-5.2.5/doc/opal-api/opal-rtc-write-4.txt > OPAL_HARDWARE may be returned from OPAL_RTC_READ or OPAL_RTC_WRITE and this > indicates either a transient or permanent error. > > Prior to this patch, Linux was not dealing with OPAL_HARDWARE being a > permanent error particularly well, in that you could end up in a busy > loop. > > This was not too hard to trigger on an AMI BMC based OpenPOWER machine > doing a continuous "ipmitool mc reset cold" to the BMC, the result of > that being that we'd get stuck in an infinite loop in opal_get_rtc_time. > > We now retry a few times before returning the error higher up the stack. > > Cc: sta...@vger.kernel.org > Signed-off-by: Stewart SmithApplied to powerpc next, thanks. https://git.kernel.org/powerpc/c/5b8b58063029f02da573120ef4dc90 cheers
Re: [PATCH] rtc-opal: Fix handling of firmware error codes, prevent busy loops
Stewart Smithwrites: > According to the OPAL docs: > https://github.com/open-power/skiboot/blob/skiboot-5.2.5/doc/opal-api/opal-rtc-read-3.txt > https://github.com/open-power/skiboot/blob/skiboot-5.2.5/doc/opal-api/opal-rtc-write-4.txt > OPAL_HARDWARE may be returned from OPAL_RTC_READ or OPAL_RTC_WRITE and this > indicates either a transient or permanent error. > > Prior to this patch, Linux was not dealing with OPAL_HARDWARE being a > permanent error particularly well, in that you could end up in a busy > loop. > > This was not too hard to trigger on an AMI BMC based OpenPOWER machine > doing a continuous "ipmitool mc reset cold" to the BMC, the result of > that being that we'd get stuck in an infinite loop in opal_get_rtc_time. > > We now retry a few times before returning the error higher up the stack. Looks like this has always been broken, so: Fixes: 16b1d26e77b1 ("rtc/tpo: Driver to support rtc and wakeup on PowerNV platform") > Cc: sta...@vger.kernel.org And therefore that should be: Cc: sta...@vger.kernel.org # v3.19+ > Signed-off-by: Stewart Smith > --- > drivers/rtc/rtc-opal.c | 12 ++-- > 1 file changed, 10 insertions(+), 2 deletions(-) > > diff --git a/drivers/rtc/rtc-opal.c b/drivers/rtc/rtc-opal.c > index 9c18d6fd8107..fab19e3e2fba 100644 > --- a/drivers/rtc/rtc-opal.c > +++ b/drivers/rtc/rtc-opal.c > @@ -58,6 +58,7 @@ static void tm_to_opal(struct rtc_time *tm, u32 *y_m_d, u64 > *h_m_s_ms) > static int opal_get_rtc_time(struct device *dev, struct rtc_time *tm) > { > long rc = OPAL_BUSY; > + int retries = 10; > u32 y_m_d; > u64 h_m_s_ms; > __be32 __y_m_d; > @@ -67,8 +68,11 @@ static int opal_get_rtc_time(struct device *dev, struct > rtc_time *tm) > rc = opal_rtc_read(&__y_m_d, &__h_m_s_ms); > if (rc == OPAL_BUSY_EVENT) > opal_poll_events(NULL); > - else > + else if (retries-- && (rc == OPAL_HARDWARE > +|| rc == OPAL_INTERNAL_ERROR)) > msleep(10); > + else if (rc != OPAL_BUSY && rc != OPAL_BUSY_EVENT) > + break; > } This is a pretty gross API at this point. That's basically a score of 2 on Rusty's API usability index ("Read the implementation and you'll get it right" - http://ozlabs.org/~rusty/index.cgi/tech/2008-03-30.html). The docs don't mention OPAL_INTERNAL_ERROR being transient, nor do they mention OPAL_BUSY. Can we at least do a wrapper function in opal.h for drivers to use that handles some or all of these cases? cheers ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
[PATCH] rtc-opal: Fix handling of firmware error codes, prevent busy loops
According to the OPAL docs: https://github.com/open-power/skiboot/blob/skiboot-5.2.5/doc/opal-api/opal-rtc-read-3.txt https://github.com/open-power/skiboot/blob/skiboot-5.2.5/doc/opal-api/opal-rtc-write-4.txt OPAL_HARDWARE may be returned from OPAL_RTC_READ or OPAL_RTC_WRITE and this indicates either a transient or permanent error. Prior to this patch, Linux was not dealing with OPAL_HARDWARE being a permanent error particularly well, in that you could end up in a busy loop. This was not too hard to trigger on an AMI BMC based OpenPOWER machine doing a continuous "ipmitool mc reset cold" to the BMC, the result of that being that we'd get stuck in an infinite loop in opal_get_rtc_time. We now retry a few times before returning the error higher up the stack. Cc: sta...@vger.kernel.org Signed-off-by: Stewart Smith--- drivers/rtc/rtc-opal.c | 12 ++-- 1 file changed, 10 insertions(+), 2 deletions(-) diff --git a/drivers/rtc/rtc-opal.c b/drivers/rtc/rtc-opal.c index 9c18d6fd8107..fab19e3e2fba 100644 --- a/drivers/rtc/rtc-opal.c +++ b/drivers/rtc/rtc-opal.c @@ -58,6 +58,7 @@ static void tm_to_opal(struct rtc_time *tm, u32 *y_m_d, u64 *h_m_s_ms) static int opal_get_rtc_time(struct device *dev, struct rtc_time *tm) { long rc = OPAL_BUSY; + int retries = 10; u32 y_m_d; u64 h_m_s_ms; __be32 __y_m_d; @@ -67,8 +68,11 @@ static int opal_get_rtc_time(struct device *dev, struct rtc_time *tm) rc = opal_rtc_read(&__y_m_d, &__h_m_s_ms); if (rc == OPAL_BUSY_EVENT) opal_poll_events(NULL); - else + else if (retries-- && (rc == OPAL_HARDWARE + || rc == OPAL_INTERNAL_ERROR)) msleep(10); + else if (rc != OPAL_BUSY && rc != OPAL_BUSY_EVENT) + break; } if (rc != OPAL_SUCCESS) @@ -84,6 +88,7 @@ static int opal_get_rtc_time(struct device *dev, struct rtc_time *tm) static int opal_set_rtc_time(struct device *dev, struct rtc_time *tm) { long rc = OPAL_BUSY; + int retries = 10; u32 y_m_d = 0; u64 h_m_s_ms = 0; @@ -92,8 +97,11 @@ static int opal_set_rtc_time(struct device *dev, struct rtc_time *tm) rc = opal_rtc_write(y_m_d, h_m_s_ms); if (rc == OPAL_BUSY_EVENT) opal_poll_events(NULL); - else + else if (retries-- && (rc == OPAL_HARDWARE + || rc == OPAL_INTERNAL_ERROR)) msleep(10); + else if (rc != OPAL_BUSY && rc != OPAL_BUSY_EVENT) + break; } return rc == OPAL_SUCCESS ? 0 : -EIO; -- 2.1.4 ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev