Re: [PATCH 1/3] thermal: ti-soc-thermal: Fix stuck sensor with continuous mode for 4430

2021-01-08 Thread Adam Ford
On Fri, Jan 8, 2021 at 12:31 PM Adam Ford  wrote:
>
> On Fri, Jan 8, 2021 at 7:45 AM Adam Ford  wrote:
> >
> > On Fri, Jan 8, 2021 at 1:22 AM Tony Lindgren  wrote:
> > >
> > > * H. Nikolaus Schaller  [201230 13:29]:
> > > > > Am 30.12.2020 um 13:55 schrieb Adam Ford :
> > > > > On Wed, Dec 30, 2020 at 2:43 AM Tony Lindgren  
> > > > > wrote:
> > > > >>
> > > > >> At least for 4430, trying to use the single conversion mode 
> > > > >> eventually
> > > > >> hangs the thermal sensor. This can be quite easily seen with errors:
> > > > >>
> > > > >> thermal thermal_zone0: failed to read out thermal zone (-5)
> > > ...
> > >
> > > > > I don't have an OMAP4, but if you want, I can test a DM3730.
> > > >
> > > > Indeed I remember a similar discussion from the DM3730 [1]. temp values 
> > > > were
> > > > always those from the last measurement. E.g. the first one was done
> > > > during (cold) boot and the first request after 10 minutes did show a
> > > > quite cold system... The next one did show a hot system independent
> > > > of what had been between (suspend or high activity).
> > > >
> > > > It seems as if it was even reproducible with a very old kernel on a 
> > > > BeagleBoard.
> > > > So it is quite fundamental.
> > > >
> > > > We tried to fix it but did not come to a solution [2]. So we opened an 
> > > > issue
> > > > in our tracker [3] and decided to stay with continuous conversion 
> > > > although this
> > > > raises idle mode processor load.
> > >
> > > Hmm so maybe eocz high always times out in single mode since it also
> > > triggers at least on dra7?
> > >
> > > Yes it would be great if you guys can the $subject patch a try at
> > > least on your omap36xx and omap5 boards and see if you see eocz
> > > time out warnings in dmesg.


I do see chatter.

[   15.531005] ti-soc-thermal 48002524.bandgap: eocz timed out waiting low
[   16.571075] ti-soc-thermal 48002524.bandgap: eocz timed out waiting low
[   17.610961] ti-soc-thermal 48002524.bandgap: eocz timed out waiting low

and it repeats quite often.

I would say this patch series would cause a regression on the DM3730.

adam


> >
> > I should be able to try it on the dm3730 logicpd-torpedo kit this weekend.
>
> I am going to be a bit delayed testing this.  I cannot boot omap2plus
> using Linux version 5.11.0-rc2.
>
> [2.666748] nand: device found, Manufacturer ID: 0x2c, Chip ID: 0xbc
> [2.673309] nand: Micron MT29F4G16ABBDA3W
> [2.677368] nand: 512 MiB, SLC, erase size: 128 KiB, page size:
> 2048, OOB size: 64
> [2.685119] nand: using OMAP_ECC_BCH8_CODE_HW_DETECTION_SW
> [2.693237] Invalid ECC layout
> [2.696350] omap2-nand 3000.nand: unable to use BCH library
> [2.702575] omap2-nand: probe of 3000.nand failed with error -22
> [2.716094] 8<--- cut here ---
> [2.719207] Unable to handle kernel NULL pointer dereference at
> virtual address 0018
> [2.727600] pgd = (ptrval)
> ...
> [3.050933] ---[ end trace 59640c7399a80a07 ]---
> [3.055603] Kernel panic - not syncing: Attempted to kill init!
> exitcode=0x000b
> [3.063323] ---[ end Kernel panic - not syncing: Attempted to kill
> init! exitcode=0x000b ]---
>
> Once I get past this, I'll try to test the thermal stuff.
>
> adam
>
> >
> > adam
> > >
> > > Regards,
> > >
> > > Tony


Re: [PATCH 1/3] thermal: ti-soc-thermal: Fix stuck sensor with continuous mode for 4430

2021-01-08 Thread Adam Ford
On Fri, Jan 8, 2021 at 7:45 AM Adam Ford  wrote:
>
> On Fri, Jan 8, 2021 at 1:22 AM Tony Lindgren  wrote:
> >
> > * H. Nikolaus Schaller  [201230 13:29]:
> > > > Am 30.12.2020 um 13:55 schrieb Adam Ford :
> > > > On Wed, Dec 30, 2020 at 2:43 AM Tony Lindgren  wrote:
> > > >>
> > > >> At least for 4430, trying to use the single conversion mode eventually
> > > >> hangs the thermal sensor. This can be quite easily seen with errors:
> > > >>
> > > >> thermal thermal_zone0: failed to read out thermal zone (-5)
> > ...
> >
> > > > I don't have an OMAP4, but if you want, I can test a DM3730.
> > >
> > > Indeed I remember a similar discussion from the DM3730 [1]. temp values 
> > > were
> > > always those from the last measurement. E.g. the first one was done
> > > during (cold) boot and the first request after 10 minutes did show a
> > > quite cold system... The next one did show a hot system independent
> > > of what had been between (suspend or high activity).
> > >
> > > It seems as if it was even reproducible with a very old kernel on a 
> > > BeagleBoard.
> > > So it is quite fundamental.
> > >
> > > We tried to fix it but did not come to a solution [2]. So we opened an 
> > > issue
> > > in our tracker [3] and decided to stay with continuous conversion 
> > > although this
> > > raises idle mode processor load.
> >
> > Hmm so maybe eocz high always times out in single mode since it also
> > triggers at least on dra7?
> >
> > Yes it would be great if you guys can the $subject patch a try at
> > least on your omap36xx and omap5 boards and see if you see eocz
> > time out warnings in dmesg.
>
> I should be able to try it on the dm3730 logicpd-torpedo kit this weekend.

I am going to be a bit delayed testing this.  I cannot boot omap2plus
using Linux version 5.11.0-rc2.

[2.666748] nand: device found, Manufacturer ID: 0x2c, Chip ID: 0xbc
[2.673309] nand: Micron MT29F4G16ABBDA3W
[2.677368] nand: 512 MiB, SLC, erase size: 128 KiB, page size:
2048, OOB size: 64
[2.685119] nand: using OMAP_ECC_BCH8_CODE_HW_DETECTION_SW
[2.693237] Invalid ECC layout
[2.696350] omap2-nand 3000.nand: unable to use BCH library
[2.702575] omap2-nand: probe of 3000.nand failed with error -22
[2.716094] 8<--- cut here ---
[2.719207] Unable to handle kernel NULL pointer dereference at
virtual address 0018
[2.727600] pgd = (ptrval)
...
[3.050933] ---[ end trace 59640c7399a80a07 ]---
[3.055603] Kernel panic - not syncing: Attempted to kill init!
exitcode=0x000b
[3.063323] ---[ end Kernel panic - not syncing: Attempted to kill
init! exitcode=0x000b ]---

Once I get past this, I'll try to test the thermal stuff.

adam

>
> adam
> >
> > Regards,
> >
> > Tony


Re: [PATCH 1/3] thermal: ti-soc-thermal: Fix stuck sensor with continuous mode for 4430

2021-01-08 Thread Adam Ford
On Fri, Jan 8, 2021 at 1:22 AM Tony Lindgren  wrote:
>
> * H. Nikolaus Schaller  [201230 13:29]:
> > > Am 30.12.2020 um 13:55 schrieb Adam Ford :
> > > On Wed, Dec 30, 2020 at 2:43 AM Tony Lindgren  wrote:
> > >>
> > >> At least for 4430, trying to use the single conversion mode eventually
> > >> hangs the thermal sensor. This can be quite easily seen with errors:
> > >>
> > >> thermal thermal_zone0: failed to read out thermal zone (-5)
> ...
>
> > > I don't have an OMAP4, but if you want, I can test a DM3730.
> >
> > Indeed I remember a similar discussion from the DM3730 [1]. temp values were
> > always those from the last measurement. E.g. the first one was done
> > during (cold) boot and the first request after 10 minutes did show a
> > quite cold system... The next one did show a hot system independent
> > of what had been between (suspend or high activity).
> >
> > It seems as if it was even reproducible with a very old kernel on a 
> > BeagleBoard.
> > So it is quite fundamental.
> >
> > We tried to fix it but did not come to a solution [2]. So we opened an issue
> > in our tracker [3] and decided to stay with continuous conversion although 
> > this
> > raises idle mode processor load.
>
> Hmm so maybe eocz high always times out in single mode since it also
> triggers at least on dra7?
>
> Yes it would be great if you guys can the $subject patch a try at
> least on your omap36xx and omap5 boards and see if you see eocz
> time out warnings in dmesg.

I should be able to try it on the dm3730 logicpd-torpedo kit this weekend.

adam
>
> Regards,
>
> Tony


Re: [PATCH 1/3] thermal: ti-soc-thermal: Fix stuck sensor with continuous mode for 4430

2021-01-07 Thread Tony Lindgren
* H. Nikolaus Schaller  [201230 13:29]:
> > Am 30.12.2020 um 13:55 schrieb Adam Ford :
> > On Wed, Dec 30, 2020 at 2:43 AM Tony Lindgren  wrote:
> >> 
> >> At least for 4430, trying to use the single conversion mode eventually
> >> hangs the thermal sensor. This can be quite easily seen with errors:
> >> 
> >> thermal thermal_zone0: failed to read out thermal zone (-5)
...

> > I don't have an OMAP4, but if you want, I can test a DM3730.
> 
> Indeed I remember a similar discussion from the DM3730 [1]. temp values were
> always those from the last measurement. E.g. the first one was done
> during (cold) boot and the first request after 10 minutes did show a
> quite cold system... The next one did show a hot system independent
> of what had been between (suspend or high activity).
> 
> It seems as if it was even reproducible with a very old kernel on a 
> BeagleBoard.
> So it is quite fundamental.
> 
> We tried to fix it but did not come to a solution [2]. So we opened an issue
> in our tracker [3] and decided to stay with continuous conversion although 
> this
> raises idle mode processor load.

Hmm so maybe eocz high always times out in single mode since it also
triggers at least on dra7?

Yes it would be great if you guys can the $subject patch a try at
least on your omap36xx and omap5 boards and see if you see eocz
time out warnings in dmesg.

Regards,

Tony


Re: [PATCH 1/3] thermal: ti-soc-thermal: Fix stuck sensor with continuous mode for 4430

2021-01-07 Thread Tony Lindgren
* Péter Ujfalusi  [201231 12:55]:
> On 12/30/20 10:43 AM, Tony Lindgren wrote:
> > @@ -58,7 +58,8 @@ omap4430_adc_to_temp[OMAP4430_ADC_END_VALUE - 
> > OMAP4430_ADC_START_VALUE + 1] = {
> >  const struct ti_bandgap_data omap4430_data = {
> > .features = TI_BANDGAP_FEATURE_MODE_CONFIG |
> > TI_BANDGAP_FEATURE_CLK_CTRL |
> > -   TI_BANDGAP_FEATURE_POWER_SWITCH,
> > +   TI_BANDGAP_FEATURE_POWER_SWITCH |
> > +   TI_BANDGAP_FEATURE_CONT_MODE_ONLY,
> 
> Can we add a comment with the observations?

Sure, and I also noticed that the timeout triggers also on dra7
too. I need to recheck what all are affected.. At least we now
see warnings on the SoCs affected.

> > @@ -605,8 +606,10 @@ ti_bandgap_force_single_read(struct ti_bandgap *bgp, 
> > int id)
> > u32 counter = 1000;
> > struct temp_sensor_registers *tsr;
> >  
> > -   /* Select single conversion mode */
> > -   if (TI_BANDGAP_HAS(bgp, MODE_CONFIG))
> > +   /* Select continuous or single conversion mode */
> > +   if (TI_BANDGAP_HAS(bgp, CONT_MODE_ONLY))
> > +   RMW_BITS(bgp, id, bgap_mode_ctrl, mode_ctrl_mask, 1);
> > +   else if (TI_BANDGAP_HAS(bgp, MODE_CONFIG))
> > RMW_BITS(bgp, id, bgap_mode_ctrl, mode_ctrl_mask, 0);
> 
> Would not be better to:
> if (TI_BANDGAP_HAS(bgp, MODE_CONFIG)) {
>   if (TI_BANDGAP_HAS(bgp, CONT_MODE_ONLY))
>   RMW_BITS(bgp, id, bgap_mode_ctrl, mode_ctrl_mask, 1);
>   else
>   RMW_BITS(bgp, id, bgap_mode_ctrl, mode_ctrl_mask, 0);
> }
> 
> One can only switch to cont/single mode if the mode config is possible.

Yup makes sense thanks for spotting that.

Regards,

Tony


Re: [PATCH 1/3] thermal: ti-soc-thermal: Fix stuck sensor with continuous mode for 4430

2020-12-31 Thread Péter Ujfalusi
Hi Tony,

On 12/30/20 10:43 AM, Tony Lindgren wrote:
> At least for 4430, trying to use the single conversion mode eventually
> hangs the thermal sensor. This can be quite easily seen with errors:
> 
> thermal thermal_zone0: failed to read out thermal zone (-5)
> 
> Also, trying to read the temperature shows a stuck value with:
> 
> $ while true; do cat /sys/class/thermal/thermal_zone0/temp; done
> 
> Where the temperature is not rising at all with the busy loop.
> 
> Additionally, the EOCZ (end of conversion) bit is not rising on 4430 in
> single conversion mode while it works fine in continuous conversion mode.
> It is also possible that the hung temperature sensor can affect the
> thermal shutdown alert too.
> 
> Let's fix the issue by adding TI_BANDGAP_FEATURE_CONT_MODE_ONLY flag and
> use it for 4430.
> 
> Note that we also need to add udelay to for the EOCZ (end of conversion)
> bit polling as otherwise we have it time out too early on 4430. We'll be
> changing the loop to use iopoll in the following clean-up patch.

I don't yet have my setup in working condition, so I can not test these.

> Cc: Adam Ford 
> Cc: Carl Philipp Klemm 
> Cc: Eduardo Valentin 
> Cc: Merlijn Wajer 
> Cc: Pavel Machek 
> Cc: Peter Ujfalusi 
> Cc: Sebastian Reichel 
> Signed-off-by: Tony Lindgren 
> ---
>  drivers/thermal/ti-soc-thermal/omap4-thermal-data.c | 3 ++-
>  drivers/thermal/ti-soc-thermal/ti-bandgap.c | 9 +++--
>  drivers/thermal/ti-soc-thermal/ti-bandgap.h | 2 ++
>  3 files changed, 11 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/thermal/ti-soc-thermal/omap4-thermal-data.c 
> b/drivers/thermal/ti-soc-thermal/omap4-thermal-data.c
> --- a/drivers/thermal/ti-soc-thermal/omap4-thermal-data.c
> +++ b/drivers/thermal/ti-soc-thermal/omap4-thermal-data.c
> @@ -58,7 +58,8 @@ omap4430_adc_to_temp[OMAP4430_ADC_END_VALUE - 
> OMAP4430_ADC_START_VALUE + 1] = {
>  const struct ti_bandgap_data omap4430_data = {
>   .features = TI_BANDGAP_FEATURE_MODE_CONFIG |
>   TI_BANDGAP_FEATURE_CLK_CTRL |
> - TI_BANDGAP_FEATURE_POWER_SWITCH,
> + TI_BANDGAP_FEATURE_POWER_SWITCH |
> + TI_BANDGAP_FEATURE_CONT_MODE_ONLY,

Can we add a comment with the observations?

>   .fclock_name = "bandgap_fclk",
>   .div_ck_name = "bandgap_fclk",
>   .conv_table = omap4430_adc_to_temp,
> diff --git a/drivers/thermal/ti-soc-thermal/ti-bandgap.c 
> b/drivers/thermal/ti-soc-thermal/ti-bandgap.c
> --- a/drivers/thermal/ti-soc-thermal/ti-bandgap.c
> +++ b/drivers/thermal/ti-soc-thermal/ti-bandgap.c
> @@ -15,6 +15,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  #include 
>  #include 
>  #include 
> @@ -605,8 +606,10 @@ ti_bandgap_force_single_read(struct ti_bandgap *bgp, int 
> id)
>   u32 counter = 1000;
>   struct temp_sensor_registers *tsr;
>  
> - /* Select single conversion mode */
> - if (TI_BANDGAP_HAS(bgp, MODE_CONFIG))
> + /* Select continuous or single conversion mode */
> + if (TI_BANDGAP_HAS(bgp, CONT_MODE_ONLY))
> + RMW_BITS(bgp, id, bgap_mode_ctrl, mode_ctrl_mask, 1);
> + else if (TI_BANDGAP_HAS(bgp, MODE_CONFIG))
>   RMW_BITS(bgp, id, bgap_mode_ctrl, mode_ctrl_mask, 0);

Would not be better to:
if (TI_BANDGAP_HAS(bgp, MODE_CONFIG)) {
if (TI_BANDGAP_HAS(bgp, CONT_MODE_ONLY))
RMW_BITS(bgp, id, bgap_mode_ctrl, mode_ctrl_mask, 1);
else
RMW_BITS(bgp, id, bgap_mode_ctrl, mode_ctrl_mask, 0);
}

One can only switch to cont/single mode if the mode config is possible.

>  
>   /* Start of Conversion = 1 */
> @@ -619,6 +622,7 @@ ti_bandgap_force_single_read(struct ti_bandgap *bgp, int 
> id)
>   if (ti_bandgap_readl(bgp, tsr->temp_sensor_ctrl) &
>   tsr->bgap_eocz_mask)
>   break;
> + udelay(1);
>   }
>  
>   /* Start of Conversion = 0 */
> @@ -630,6 +634,7 @@ ti_bandgap_force_single_read(struct ti_bandgap *bgp, int 
> id)
>   if (!(ti_bandgap_readl(bgp, tsr->temp_sensor_ctrl) &
> tsr->bgap_eocz_mask))
>   break;
> + udelay(1);
>   }
>  
>   return 0;
> diff --git a/drivers/thermal/ti-soc-thermal/ti-bandgap.h 
> b/drivers/thermal/ti-soc-thermal/ti-bandgap.h
> --- a/drivers/thermal/ti-soc-thermal/ti-bandgap.h
> +++ b/drivers/thermal/ti-soc-thermal/ti-bandgap.h
> @@ -280,6 +280,7 @@ struct ti_temp_sensor {
>   *   has Errata 814
>   * TI_BANDGAP_FEATURE_UNRELIABLE - used when the sensor readings are too
>   *   inaccurate.
> + * TI_BANDGAP_FEATURE_CONT_MODE_ONLY - used when single mode hangs the sensor
>   * TI_BANDGAP_HAS(b, f) - macro to check if a bandgap device is capable of a
>   *  specific feature (above) or not. Return non-zero, if yes.
>   */
> @@ -295,6 +296,7 @@ struct ti_temp_sensor {
>  #define TI_BANDGAP_FEATURE_HISTORY_BUFFERBIT(9)
>  #define 

Re: [PATCH 1/3] thermal: ti-soc-thermal: Fix stuck sensor with continuous mode for 4430

2020-12-30 Thread H. Nikolaus Schaller
Hi Adam and Tony,

> Am 30.12.2020 um 13:55 schrieb Adam Ford :
> 
> On Wed, Dec 30, 2020 at 2:43 AM Tony Lindgren  wrote:
>> 
>> At least for 4430, trying to use the single conversion mode eventually
>> hangs the thermal sensor. This can be quite easily seen with errors:
>> 
>> thermal thermal_zone0: failed to read out thermal zone (-5)
>> 
>> Also, trying to read the temperature shows a stuck value with:
>> 
>> $ while true; do cat /sys/class/thermal/thermal_zone0/temp; done
>> 
>> Where the temperature is not rising at all with the busy loop.
>> 
>> Additionally, the EOCZ (end of conversion) bit is not rising on 4430 in
>> single conversion mode while it works fine in continuous conversion mode.
>> It is also possible that the hung temperature sensor can affect the
>> thermal shutdown alert too.
>> 
>> Let's fix the issue by adding TI_BANDGAP_FEATURE_CONT_MODE_ONLY flag and
>> use it for 4430.
>> 
>> Note that we also need to add udelay to for the EOCZ (end of conversion)
>> bit polling as otherwise we have it time out too early on 4430. We'll be
>> changing the loop to use iopoll in the following clean-up patch.
>> 
>> Cc: Adam Ford 
> 
> I don't have an OMAP4, but if you want, I can test a DM3730.

Indeed I remember a similar discussion from the DM3730 [1]. temp values were
always those from the last measurement. E.g. the first one was done
during (cold) boot and the first request after 10 minutes did show a
quite cold system... The next one did show a hot system independent
of what had been between (suspend or high activity).

It seems as if it was even reproducible with a very old kernel on a BeagleBoard.
So it is quite fundamental.

We tried to fix it but did not come to a solution [2]. So we opened an issue
in our tracker [3] and decided to stay with continuous conversion although this
raises idle mode processor load.

BR,
Nikolaus

[1]: 
https://lists.goldelico.com/pipermail/letux-kernel/2019-September/003958.html
[2]: 
https://lists.goldelico.com/pipermail/letux-kernel/2019-September/003975.html
[3]: https://projects.goldelico.com/p/gta04-kernel/issues/928/

> adam
> 
>> Cc: Carl Philipp Klemm 
>> Cc: Eduardo Valentin 
>> Cc: Merlijn Wajer 
>> Cc: Pavel Machek 
>> Cc: Peter Ujfalusi 
>> Cc: Sebastian Reichel 
>> Signed-off-by: Tony Lindgren 
>> ---
>> drivers/thermal/ti-soc-thermal/omap4-thermal-data.c | 3 ++-
>> drivers/thermal/ti-soc-thermal/ti-bandgap.c | 9 +++--
>> drivers/thermal/ti-soc-thermal/ti-bandgap.h | 2 ++
>> 3 files changed, 11 insertions(+), 3 deletions(-)
>> 
>> diff --git a/drivers/thermal/ti-soc-thermal/omap4-thermal-data.c 
>> b/drivers/thermal/ti-soc-thermal/omap4-thermal-data.c
>> --- a/drivers/thermal/ti-soc-thermal/omap4-thermal-data.c
>> +++ b/drivers/thermal/ti-soc-thermal/omap4-thermal-data.c
>> @@ -58,7 +58,8 @@ omap4430_adc_to_temp[OMAP4430_ADC_END_VALUE - 
>> OMAP4430_ADC_START_VALUE + 1] = {
>> const struct ti_bandgap_data omap4430_data = {
>>.features = TI_BANDGAP_FEATURE_MODE_CONFIG |
>>TI_BANDGAP_FEATURE_CLK_CTRL |
>> -   TI_BANDGAP_FEATURE_POWER_SWITCH,
>> +   TI_BANDGAP_FEATURE_POWER_SWITCH |
>> +   TI_BANDGAP_FEATURE_CONT_MODE_ONLY,
>>.fclock_name = "bandgap_fclk",
>>.div_ck_name = "bandgap_fclk",
>>.conv_table = omap4430_adc_to_temp,
>> diff --git a/drivers/thermal/ti-soc-thermal/ti-bandgap.c 
>> b/drivers/thermal/ti-soc-thermal/ti-bandgap.c
>> --- a/drivers/thermal/ti-soc-thermal/ti-bandgap.c
>> +++ b/drivers/thermal/ti-soc-thermal/ti-bandgap.c
>> @@ -15,6 +15,7 @@
>> #include 
>> #include 
>> #include 
>> +#include 
>> #include 
>> #include 
>> #include 
>> @@ -605,8 +606,10 @@ ti_bandgap_force_single_read(struct ti_bandgap *bgp, 
>> int id)
>>u32 counter = 1000;
>>struct temp_sensor_registers *tsr;
>> 
>> -   /* Select single conversion mode */
>> -   if (TI_BANDGAP_HAS(bgp, MODE_CONFIG))
>> +   /* Select continuous or single conversion mode */
>> +   if (TI_BANDGAP_HAS(bgp, CONT_MODE_ONLY))
>> +   RMW_BITS(bgp, id, bgap_mode_ctrl, mode_ctrl_mask, 1);
>> +   else if (TI_BANDGAP_HAS(bgp, MODE_CONFIG))
>>RMW_BITS(bgp, id, bgap_mode_ctrl, mode_ctrl_mask, 0);
>> 
>>/* Start of Conversion = 1 */
>> @@ -619,6 +622,7 @@ ti_bandgap_force_single_read(struct ti_bandgap *bgp, int 
>> id)
>>if (ti_bandgap_readl(bgp, tsr->temp_sensor_ctrl) &
>>tsr->bgap_eocz_mask)
>>break;
>> +   udelay(1);
>>}
>> 
>>/* Start of Conversion = 0 */
>> @@ -630,6 +634,7 @@ ti_bandgap_force_single_read(struct ti_bandgap *bgp, int 
>> id)
>>if (!(ti_bandgap_readl(bgp, tsr->temp_sensor_ctrl) &
>>  tsr->bgap_eocz_mask))
>>break;
>> +   udelay(1);
>>}
>> 
>>return 0;
>> diff --git 

Re: [PATCH 1/3] thermal: ti-soc-thermal: Fix stuck sensor with continuous mode for 4430

2020-12-30 Thread Adam Ford
On Wed, Dec 30, 2020 at 2:43 AM Tony Lindgren  wrote:
>
> At least for 4430, trying to use the single conversion mode eventually
> hangs the thermal sensor. This can be quite easily seen with errors:
>
> thermal thermal_zone0: failed to read out thermal zone (-5)
>
> Also, trying to read the temperature shows a stuck value with:
>
> $ while true; do cat /sys/class/thermal/thermal_zone0/temp; done
>
> Where the temperature is not rising at all with the busy loop.
>
> Additionally, the EOCZ (end of conversion) bit is not rising on 4430 in
> single conversion mode while it works fine in continuous conversion mode.
> It is also possible that the hung temperature sensor can affect the
> thermal shutdown alert too.
>
> Let's fix the issue by adding TI_BANDGAP_FEATURE_CONT_MODE_ONLY flag and
> use it for 4430.
>
> Note that we also need to add udelay to for the EOCZ (end of conversion)
> bit polling as otherwise we have it time out too early on 4430. We'll be
> changing the loop to use iopoll in the following clean-up patch.
>
> Cc: Adam Ford 

I don't have an OMAP4, but if you want, I can test a DM3730.

adam

> Cc: Carl Philipp Klemm 
> Cc: Eduardo Valentin 
> Cc: Merlijn Wajer 
> Cc: Pavel Machek 
> Cc: Peter Ujfalusi 
> Cc: Sebastian Reichel 
> Signed-off-by: Tony Lindgren 
> ---
>  drivers/thermal/ti-soc-thermal/omap4-thermal-data.c | 3 ++-
>  drivers/thermal/ti-soc-thermal/ti-bandgap.c | 9 +++--
>  drivers/thermal/ti-soc-thermal/ti-bandgap.h | 2 ++
>  3 files changed, 11 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/thermal/ti-soc-thermal/omap4-thermal-data.c 
> b/drivers/thermal/ti-soc-thermal/omap4-thermal-data.c
> --- a/drivers/thermal/ti-soc-thermal/omap4-thermal-data.c
> +++ b/drivers/thermal/ti-soc-thermal/omap4-thermal-data.c
> @@ -58,7 +58,8 @@ omap4430_adc_to_temp[OMAP4430_ADC_END_VALUE - 
> OMAP4430_ADC_START_VALUE + 1] = {
>  const struct ti_bandgap_data omap4430_data = {
> .features = TI_BANDGAP_FEATURE_MODE_CONFIG |
> TI_BANDGAP_FEATURE_CLK_CTRL |
> -   TI_BANDGAP_FEATURE_POWER_SWITCH,
> +   TI_BANDGAP_FEATURE_POWER_SWITCH |
> +   TI_BANDGAP_FEATURE_CONT_MODE_ONLY,
> .fclock_name = "bandgap_fclk",
> .div_ck_name = "bandgap_fclk",
> .conv_table = omap4430_adc_to_temp,
> diff --git a/drivers/thermal/ti-soc-thermal/ti-bandgap.c 
> b/drivers/thermal/ti-soc-thermal/ti-bandgap.c
> --- a/drivers/thermal/ti-soc-thermal/ti-bandgap.c
> +++ b/drivers/thermal/ti-soc-thermal/ti-bandgap.c
> @@ -15,6 +15,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  #include 
>  #include 
>  #include 
> @@ -605,8 +606,10 @@ ti_bandgap_force_single_read(struct ti_bandgap *bgp, int 
> id)
> u32 counter = 1000;
> struct temp_sensor_registers *tsr;
>
> -   /* Select single conversion mode */
> -   if (TI_BANDGAP_HAS(bgp, MODE_CONFIG))
> +   /* Select continuous or single conversion mode */
> +   if (TI_BANDGAP_HAS(bgp, CONT_MODE_ONLY))
> +   RMW_BITS(bgp, id, bgap_mode_ctrl, mode_ctrl_mask, 1);
> +   else if (TI_BANDGAP_HAS(bgp, MODE_CONFIG))
> RMW_BITS(bgp, id, bgap_mode_ctrl, mode_ctrl_mask, 0);
>
> /* Start of Conversion = 1 */
> @@ -619,6 +622,7 @@ ti_bandgap_force_single_read(struct ti_bandgap *bgp, int 
> id)
> if (ti_bandgap_readl(bgp, tsr->temp_sensor_ctrl) &
> tsr->bgap_eocz_mask)
> break;
> +   udelay(1);
> }
>
> /* Start of Conversion = 0 */
> @@ -630,6 +634,7 @@ ti_bandgap_force_single_read(struct ti_bandgap *bgp, int 
> id)
> if (!(ti_bandgap_readl(bgp, tsr->temp_sensor_ctrl) &
>   tsr->bgap_eocz_mask))
> break;
> +   udelay(1);
> }
>
> return 0;
> diff --git a/drivers/thermal/ti-soc-thermal/ti-bandgap.h 
> b/drivers/thermal/ti-soc-thermal/ti-bandgap.h
> --- a/drivers/thermal/ti-soc-thermal/ti-bandgap.h
> +++ b/drivers/thermal/ti-soc-thermal/ti-bandgap.h
> @@ -280,6 +280,7 @@ struct ti_temp_sensor {
>   * has Errata 814
>   * TI_BANDGAP_FEATURE_UNRELIABLE - used when the sensor readings are too
>   * inaccurate.
> + * TI_BANDGAP_FEATURE_CONT_MODE_ONLY - used when single mode hangs the sensor
>   * TI_BANDGAP_HAS(b, f) - macro to check if a bandgap device is capable of a
>   *  specific feature (above) or not. Return non-zero, if yes.
>   */
> @@ -295,6 +296,7 @@ struct ti_temp_sensor {
>  #define TI_BANDGAP_FEATURE_HISTORY_BUFFER  BIT(9)
>  #define TI_BANDGAP_FEATURE_ERRATA_814  BIT(10)
>  #define TI_BANDGAP_FEATURE_UNRELIABLE  BIT(11)
> +#define TI_BANDGAP_FEATURE_CONT_MODE_ONLY  BIT(12)
>  #define TI_BANDGAP_HAS(b, f)   \
> ((b)->conf->features & TI_BANDGAP_FEATURE_ ## f)
>
> --
> 2.29.2


[PATCH 1/3] thermal: ti-soc-thermal: Fix stuck sensor with continuous mode for 4430

2020-12-30 Thread Tony Lindgren
At least for 4430, trying to use the single conversion mode eventually
hangs the thermal sensor. This can be quite easily seen with errors:

thermal thermal_zone0: failed to read out thermal zone (-5)

Also, trying to read the temperature shows a stuck value with:

$ while true; do cat /sys/class/thermal/thermal_zone0/temp; done

Where the temperature is not rising at all with the busy loop.

Additionally, the EOCZ (end of conversion) bit is not rising on 4430 in
single conversion mode while it works fine in continuous conversion mode.
It is also possible that the hung temperature sensor can affect the
thermal shutdown alert too.

Let's fix the issue by adding TI_BANDGAP_FEATURE_CONT_MODE_ONLY flag and
use it for 4430.

Note that we also need to add udelay to for the EOCZ (end of conversion)
bit polling as otherwise we have it time out too early on 4430. We'll be
changing the loop to use iopoll in the following clean-up patch.

Cc: Adam Ford 
Cc: Carl Philipp Klemm 
Cc: Eduardo Valentin 
Cc: Merlijn Wajer 
Cc: Pavel Machek 
Cc: Peter Ujfalusi 
Cc: Sebastian Reichel 
Signed-off-by: Tony Lindgren 
---
 drivers/thermal/ti-soc-thermal/omap4-thermal-data.c | 3 ++-
 drivers/thermal/ti-soc-thermal/ti-bandgap.c | 9 +++--
 drivers/thermal/ti-soc-thermal/ti-bandgap.h | 2 ++
 3 files changed, 11 insertions(+), 3 deletions(-)

diff --git a/drivers/thermal/ti-soc-thermal/omap4-thermal-data.c 
b/drivers/thermal/ti-soc-thermal/omap4-thermal-data.c
--- a/drivers/thermal/ti-soc-thermal/omap4-thermal-data.c
+++ b/drivers/thermal/ti-soc-thermal/omap4-thermal-data.c
@@ -58,7 +58,8 @@ omap4430_adc_to_temp[OMAP4430_ADC_END_VALUE - 
OMAP4430_ADC_START_VALUE + 1] = {
 const struct ti_bandgap_data omap4430_data = {
.features = TI_BANDGAP_FEATURE_MODE_CONFIG |
TI_BANDGAP_FEATURE_CLK_CTRL |
-   TI_BANDGAP_FEATURE_POWER_SWITCH,
+   TI_BANDGAP_FEATURE_POWER_SWITCH |
+   TI_BANDGAP_FEATURE_CONT_MODE_ONLY,
.fclock_name = "bandgap_fclk",
.div_ck_name = "bandgap_fclk",
.conv_table = omap4430_adc_to_temp,
diff --git a/drivers/thermal/ti-soc-thermal/ti-bandgap.c 
b/drivers/thermal/ti-soc-thermal/ti-bandgap.c
--- a/drivers/thermal/ti-soc-thermal/ti-bandgap.c
+++ b/drivers/thermal/ti-soc-thermal/ti-bandgap.c
@@ -15,6 +15,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -605,8 +606,10 @@ ti_bandgap_force_single_read(struct ti_bandgap *bgp, int 
id)
u32 counter = 1000;
struct temp_sensor_registers *tsr;
 
-   /* Select single conversion mode */
-   if (TI_BANDGAP_HAS(bgp, MODE_CONFIG))
+   /* Select continuous or single conversion mode */
+   if (TI_BANDGAP_HAS(bgp, CONT_MODE_ONLY))
+   RMW_BITS(bgp, id, bgap_mode_ctrl, mode_ctrl_mask, 1);
+   else if (TI_BANDGAP_HAS(bgp, MODE_CONFIG))
RMW_BITS(bgp, id, bgap_mode_ctrl, mode_ctrl_mask, 0);
 
/* Start of Conversion = 1 */
@@ -619,6 +622,7 @@ ti_bandgap_force_single_read(struct ti_bandgap *bgp, int id)
if (ti_bandgap_readl(bgp, tsr->temp_sensor_ctrl) &
tsr->bgap_eocz_mask)
break;
+   udelay(1);
}
 
/* Start of Conversion = 0 */
@@ -630,6 +634,7 @@ ti_bandgap_force_single_read(struct ti_bandgap *bgp, int id)
if (!(ti_bandgap_readl(bgp, tsr->temp_sensor_ctrl) &
  tsr->bgap_eocz_mask))
break;
+   udelay(1);
}
 
return 0;
diff --git a/drivers/thermal/ti-soc-thermal/ti-bandgap.h 
b/drivers/thermal/ti-soc-thermal/ti-bandgap.h
--- a/drivers/thermal/ti-soc-thermal/ti-bandgap.h
+++ b/drivers/thermal/ti-soc-thermal/ti-bandgap.h
@@ -280,6 +280,7 @@ struct ti_temp_sensor {
  * has Errata 814
  * TI_BANDGAP_FEATURE_UNRELIABLE - used when the sensor readings are too
  * inaccurate.
+ * TI_BANDGAP_FEATURE_CONT_MODE_ONLY - used when single mode hangs the sensor
  * TI_BANDGAP_HAS(b, f) - macro to check if a bandgap device is capable of a
  *  specific feature (above) or not. Return non-zero, if yes.
  */
@@ -295,6 +296,7 @@ struct ti_temp_sensor {
 #define TI_BANDGAP_FEATURE_HISTORY_BUFFER  BIT(9)
 #define TI_BANDGAP_FEATURE_ERRATA_814  BIT(10)
 #define TI_BANDGAP_FEATURE_UNRELIABLE  BIT(11)
+#define TI_BANDGAP_FEATURE_CONT_MODE_ONLY  BIT(12)
 #define TI_BANDGAP_HAS(b, f)   \
((b)->conf->features & TI_BANDGAP_FEATURE_ ## f)
 
-- 
2.29.2