Re: [PATCH] media: venus: hfi: fix error handling in hfi_sys_init_done()

2017-07-09 Thread Rob Clark
On Sun, Jul 9, 2017 at 3:49 PM, Stanimir Varbanov
 wrote:
> Hi Rob,
>
> On 07/09/2017 04:19 PM, Rob Clark wrote:
>> Not entirely sure what triggers it, but with venus build as kernel
>> module and in initrd, we hit this crash:
>
> Is it happens occasionally or everytime in the initrd? And also with
> your patch it will bail out on venus_probe, does it crash again on next
> venus_probe?

seems to happen every time.. (module is ending up in initrd, but not
sure if fw is.. which might be triggering this?)

I could not boot successfully without this patch, but otoh I haven't
yet tried if venus actually works after boot.

BR,
-R

>>
>>   Unable to handle kernel paging request at virtual address 80003c039000
>>   pgd = 0a14f000
>>   [80003c039000] *pgd=bd9f7003, *pud=bd9f6003, 
>> *pmd=bd9f0003, *pte=
>>   Internal error: Oops: 9607 [#1] SMP
>>   Modules linked in: qcom_wcnss_pil(E+) crc32_ce(E) qcom_common(E) 
>> venus_core(E+) remoteproc(E) snd_soc_msm8916_digital(E) virtio_ring(E) 
>> cdc_ether(E) snd_soc_lpass_apq8016(E) snd_soc_lpass_cpu(E) 
>> snd_soc_apq8016_sbc(E) snd_soc_lpass_platform(E) v4l2_mem2mem(E) virtio(E) 
>> snd_soc_core(E) ac97_bus(E) snd_pcm_dmaengine(E) snd_seq(E) leds_gpio(E) 
>> videobuf2_v4l2(E) videobuf2_core(E) snd_seq_device(E) sndi_pcm(E) 
>> videodev(E) media(E) nvmem_qfprom(E) msm(E) snd_timer(E) snd(E) soundcore(E) 
>> spi_qup(E) mdt_loader(E) qcom_tsens(E) qcom_spmi_temp_alarm(E) nvmem_core(E) 
>> msm_rng(E) uas(E) usb_storage(E) dm9601(E) usbnet(E) mii(E) mmc_block(E) 
>> adv7511(E) drm_kms_helper(E) syscopyarea(E) sysfillrect(E) sysimgblt(E) 
>> fb_sys_fops(E) qcom_spmi_vadc(E) qcom_vadc_common(PE) industrialio(E) 
>> pinctrl_spmi_mpp(E)
>>pinctrl_spmi_gpio(E) rtc_pm8xxx(E) clk_smd_rpm(E) sdhci_msm(E) 
>> sdhci_pltfm(E) qcom_smd_regulator(E) drm(E) smd_rpm(E) qcom_spmi_pmic(E) 
>> regmap_spmi(E) ci_hdrc_msm(E) ci_hdrc(E) usb3503(E) extcon_usb_gpio(E) 
>> phy_msm_usb(E) udc_core(E) qcom_hwspinlock(E) extcon_core(E) ehci_msm(E) 
>> i2c_qup(E) sdhci(E) mmc_core(E) spmi_pmic_arb(E) spmi(E) qcom_smd(E) smsm(E) 
>> rpmsg_core(E) smp2p(E) smem(E) hwspinlock_core(E) gpio_keys(E)
>>   CPU: 2 PID: 551 Comm: irq/150-venus Tainted: PE   4.12.0+ #1625
>>   Hardware name: qualcomm dragonboard410c/dragonboard410c, BIOS 
>> 2017.07-rc2-00144-ga97bdbdf72-dirty 07/08/2017
>>   task: 800037338000 task.stack: 800038e0
>>   PC is at hfi_sys_init_done+0x64/0x140 [venus_core]
>>   LR is at hfi_process_msg_packet+0xcc/0x1e8 [venus_core]
>>   pc : [] lr : [] pstate: 20400145
>>   sp : 800038e03c60
>>   x29: 800038e03c60 x28: 
>>   x27: 000df018 x26: 0118f4d0
>>   x25: 00020003 x24: 80003a8d3010
>>   x23: 0118f760 x22: 800037b40028
>>   x21: 8000382981f0 x20: 800037b40028
>>   x19: 80003c039000 x18: 0020
>>   x17:  x16: 800037338000
>>   x15:  x14: 00100014
>>   x13: 00011007 x12: 00010020
>>   x11: 100e x10: 0001
>>   x9 : 0002 x8 : 00140001
>>   x7 : 1010 x6 : 0148
>>   x5 : 1009 x4 : 80003c039000
>>   x3 : cd770abb x2 : 0042
>>   x1 : 0788 x0 : 0002
>>   Process irq/150-venus (pid: 551, stack limit = 0x800038e0)
>>   Call trace:
>>   [] hfi_sys_init_done+0x64/0x140 [venus_core]
>>   [] hfi_process_msg_packet+0xcc/0x1e8 [venus_core]
>>   [] venus_isr_thread+0x1b4/0x208 [venus_core]
>>   [] hfi_isr_thread+0x28/0x38 [venus_core]
>>   [] irq_thread_fn+0x30/0x70
>>   [] irq_thread+0x14c/0x1c8
>>   [] kthread+0x138/0x140
>>   [] ret_from_fork+0x10/0x40
>>   Code: 52820125 52820207 7a431820 54000249 (b9400263)
>>   ---[ end trace c963460f20a984b6 ]---
>>
>> The problem is that in the error case, we've incremented the data ptr
>> but not decremented rem_bytes, and keep reading (presumably garbage)
>> until eventually we go beyond the end of the buffer.
>>
>> Instead, on first error, we should probably just bail out.  Other
>> option is to increment read_bytes by sizeof(u32) before the switch,
>> rather than only accounting for the ptype header in the non-error
>> case.  Note that in this case it is HFI_ERR_SYS_INVALID_PARAMETER,
>> ie. an unrecognized/unsupported parameter, so interpreting the next
>> word as a property type would be bogus.  The other error cases are
>> due to truncated buffer, so there isn't likely to be anything valid
>> to interpret in the remainder of the buffer.  So just bailing seems
>> like a reasonable solution.
>
> I have a WIP patch which rewrite the message parsing, it would be nice
> if I can reproduce this crash.
>
>>
>> Signed-off-by: Rob Clark 
>> ---
>>  drivers/media/platform/qcom/venus/hfi_msgs.c | 11 ++-
>>  1 file changed, 6 insertions(+), 5 

Re: [PATCH] media: venus: hfi: fix error handling in hfi_sys_init_done()

2017-07-09 Thread Stanimir Varbanov
Hi Rob,

On 07/09/2017 04:19 PM, Rob Clark wrote:
> Not entirely sure what triggers it, but with venus build as kernel
> module and in initrd, we hit this crash:

Is it happens occasionally or everytime in the initrd? And also with
your patch it will bail out on venus_probe, does it crash again on next
venus_probe?

> 
>   Unable to handle kernel paging request at virtual address 80003c039000
>   pgd = 0a14f000
>   [80003c039000] *pgd=bd9f7003, *pud=bd9f6003, 
> *pmd=bd9f0003, *pte=
>   Internal error: Oops: 9607 [#1] SMP
>   Modules linked in: qcom_wcnss_pil(E+) crc32_ce(E) qcom_common(E) 
> venus_core(E+) remoteproc(E) snd_soc_msm8916_digital(E) virtio_ring(E) 
> cdc_ether(E) snd_soc_lpass_apq8016(E) snd_soc_lpass_cpu(E) 
> snd_soc_apq8016_sbc(E) snd_soc_lpass_platform(E) v4l2_mem2mem(E) virtio(E) 
> snd_soc_core(E) ac97_bus(E) snd_pcm_dmaengine(E) snd_seq(E) leds_gpio(E) 
> videobuf2_v4l2(E) videobuf2_core(E) snd_seq_device(E) sndi_pcm(E) videodev(E) 
> media(E) nvmem_qfprom(E) msm(E) snd_timer(E) snd(E) soundcore(E) spi_qup(E) 
> mdt_loader(E) qcom_tsens(E) qcom_spmi_temp_alarm(E) nvmem_core(E) msm_rng(E) 
> uas(E) usb_storage(E) dm9601(E) usbnet(E) mii(E) mmc_block(E) adv7511(E) 
> drm_kms_helper(E) syscopyarea(E) sysfillrect(E) sysimgblt(E) fb_sys_fops(E) 
> qcom_spmi_vadc(E) qcom_vadc_common(PE) industrialio(E) pinctrl_spmi_mpp(E)
>pinctrl_spmi_gpio(E) rtc_pm8xxx(E) clk_smd_rpm(E) sdhci_msm(E) 
> sdhci_pltfm(E) qcom_smd_regulator(E) drm(E) smd_rpm(E) qcom_spmi_pmic(E) 
> regmap_spmi(E) ci_hdrc_msm(E) ci_hdrc(E) usb3503(E) extcon_usb_gpio(E) 
> phy_msm_usb(E) udc_core(E) qcom_hwspinlock(E) extcon_core(E) ehci_msm(E) 
> i2c_qup(E) sdhci(E) mmc_core(E) spmi_pmic_arb(E) spmi(E) qcom_smd(E) smsm(E) 
> rpmsg_core(E) smp2p(E) smem(E) hwspinlock_core(E) gpio_keys(E)
>   CPU: 2 PID: 551 Comm: irq/150-venus Tainted: PE   4.12.0+ #1625
>   Hardware name: qualcomm dragonboard410c/dragonboard410c, BIOS 
> 2017.07-rc2-00144-ga97bdbdf72-dirty 07/08/2017
>   task: 800037338000 task.stack: 800038e0
>   PC is at hfi_sys_init_done+0x64/0x140 [venus_core]
>   LR is at hfi_process_msg_packet+0xcc/0x1e8 [venus_core]
>   pc : [] lr : [] pstate: 20400145
>   sp : 800038e03c60
>   x29: 800038e03c60 x28: 
>   x27: 000df018 x26: 0118f4d0
>   x25: 00020003 x24: 80003a8d3010
>   x23: 0118f760 x22: 800037b40028
>   x21: 8000382981f0 x20: 800037b40028
>   x19: 80003c039000 x18: 0020
>   x17:  x16: 800037338000
>   x15:  x14: 00100014
>   x13: 00011007 x12: 00010020
>   x11: 100e x10: 0001
>   x9 : 0002 x8 : 00140001
>   x7 : 1010 x6 : 0148
>   x5 : 1009 x4 : 80003c039000
>   x3 : cd770abb x2 : 0042
>   x1 : 0788 x0 : 0002
>   Process irq/150-venus (pid: 551, stack limit = 0x800038e0)
>   Call trace:
>   [] hfi_sys_init_done+0x64/0x140 [venus_core]
>   [] hfi_process_msg_packet+0xcc/0x1e8 [venus_core]
>   [] venus_isr_thread+0x1b4/0x208 [venus_core]
>   [] hfi_isr_thread+0x28/0x38 [venus_core]
>   [] irq_thread_fn+0x30/0x70
>   [] irq_thread+0x14c/0x1c8
>   [] kthread+0x138/0x140
>   [] ret_from_fork+0x10/0x40
>   Code: 52820125 52820207 7a431820 54000249 (b9400263)
>   ---[ end trace c963460f20a984b6 ]---
> 
> The problem is that in the error case, we've incremented the data ptr
> but not decremented rem_bytes, and keep reading (presumably garbage)
> until eventually we go beyond the end of the buffer.
> 
> Instead, on first error, we should probably just bail out.  Other
> option is to increment read_bytes by sizeof(u32) before the switch,
> rather than only accounting for the ptype header in the non-error
> case.  Note that in this case it is HFI_ERR_SYS_INVALID_PARAMETER,
> ie. an unrecognized/unsupported parameter, so interpreting the next
> word as a property type would be bogus.  The other error cases are
> due to truncated buffer, so there isn't likely to be anything valid
> to interpret in the remainder of the buffer.  So just bailing seems
> like a reasonable solution.

I have a WIP patch which rewrite the message parsing, it would be nice
if I can reproduce this crash.

> 
> Signed-off-by: Rob Clark 
> ---
>  drivers/media/platform/qcom/venus/hfi_msgs.c | 11 ++-
>  1 file changed, 6 insertions(+), 5 deletions(-)
> 
> diff --git a/drivers/media/platform/qcom/venus/hfi_msgs.c 
> b/drivers/media/platform/qcom/venus/hfi_msgs.c
> index debf80a92797..4190825b20a1 100644
> --- a/drivers/media/platform/qcom/venus/hfi_msgs.c
> +++ b/drivers/media/platform/qcom/venus/hfi_msgs.c
> @@ -239,11 +239,12 @@ static void hfi_sys_init_done(struct venus_core *core, 
> struct venus_inst *inst,
>   break;
>   }
>  

[PATCH] media: venus: hfi: fix error handling in hfi_sys_init_done()

2017-07-09 Thread Rob Clark
Not entirely sure what triggers it, but with venus build as kernel
module and in initrd, we hit this crash:

  Unable to handle kernel paging request at virtual address 80003c039000
  pgd = 0a14f000
  [80003c039000] *pgd=bd9f7003, *pud=bd9f6003, 
*pmd=bd9f0003, *pte=
  Internal error: Oops: 9607 [#1] SMP
  Modules linked in: qcom_wcnss_pil(E+) crc32_ce(E) qcom_common(E) 
venus_core(E+) remoteproc(E) snd_soc_msm8916_digital(E) virtio_ring(E) 
cdc_ether(E) snd_soc_lpass_apq8016(E) snd_soc_lpass_cpu(E) 
snd_soc_apq8016_sbc(E) snd_soc_lpass_platform(E) v4l2_mem2mem(E) virtio(E) 
snd_soc_core(E) ac97_bus(E) snd_pcm_dmaengine(E) snd_seq(E) leds_gpio(E) 
videobuf2_v4l2(E) videobuf2_core(E) snd_seq_device(E) snd_pcm(E) videodev(E) 
media(E) nvmem_qfprom(E) msm(E) snd_timer(E) snd(E) soundcore(E) spi_qup(E) 
mdt_loader(E) qcom_tsens(E) qcom_spmi_temp_alarm(E) nvmem_core(E) msm_rng(E) 
uas(E) usb_storage(E) dm9601(E) usbnet(E) mii(E) mmc_block(E) adv7511(E) 
drm_kms_helper(E) syscopyarea(E) sysfillrect(E) sysimgblt(E) fb_sys_fops(E) 
qcom_spmi_vadc(E) qcom_vadc_common(PE) industrialio(E) pinctrl_spmi_mpp(E)
   pinctrl_spmi_gpio(E) rtc_pm8xxx(E) clk_smd_rpm(E) sdhci_msm(E) 
sdhci_pltfm(E) qcom_smd_regulator(E) drm(E) smd_rpm(E) qcom_spmi_pmic(E) 
regmap_spmi(E) ci_hdrc_msm(E) ci_hdrc(E) usb3503(E) extcon_usb_gpio(E) 
phy_msm_usb(E) udc_core(E) qcom_hwspinlock(E) extcon_core(E) ehci_msm(E) 
i2c_qup(E) sdhci(E) mmc_core(E) spmi_pmic_arb(E) spmi(E) qcom_smd(E) smsm(E) 
rpmsg_core(E) smp2p(E) smem(E) hwspinlock_core(E) gpio_keys(E)
  CPU: 2 PID: 551 Comm: irq/150-venus Tainted: PE   4.12.0+ #1625
  Hardware name: qualcomm dragonboard410c/dragonboard410c, BIOS 
2017.07-rc2-00144-ga97bdbdf72-dirty 07/08/2017
  task: 800037338000 task.stack: 800038e0
  PC is at hfi_sys_init_done+0x64/0x140 [venus_core]
  LR is at hfi_process_msg_packet+0xcc/0x1e8 [venus_core]
  pc : [] lr : [] pstate: 20400145
  sp : 800038e03c60
  x29: 800038e03c60 x28: 
  x27: 000df018 x26: 0118f4d0
  x25: 00020003 x24: 80003a8d3010
  x23: 0118f760 x22: 800037b40028
  x21: 8000382981f0 x20: 800037b40028
  x19: 80003c039000 x18: 0020
  x17:  x16: 800037338000
  x15:  x14: 00100014
  x13: 00011007 x12: 00010020
  x11: 100e x10: 0001
  x9 : 0002 x8 : 00140001
  x7 : 1010 x6 : 0148
  x5 : 1009 x4 : 80003c039000
  x3 : cd770abb x2 : 0042
  x1 : 0788 x0 : 0002
  Process irq/150-venus (pid: 551, stack limit = 0x800038e0)
  Call trace:
  [] hfi_sys_init_done+0x64/0x140 [venus_core]
  [] hfi_process_msg_packet+0xcc/0x1e8 [venus_core]
  [] venus_isr_thread+0x1b4/0x208 [venus_core]
  [] hfi_isr_thread+0x28/0x38 [venus_core]
  [] irq_thread_fn+0x30/0x70
  [] irq_thread+0x14c/0x1c8
  [] kthread+0x138/0x140
  [] ret_from_fork+0x10/0x40
  Code: 52820125 52820207 7a431820 54000249 (b9400263)
  ---[ end trace c963460f20a984b6 ]---

The problem is that in the error case, we've incremented the data ptr
but not decremented rem_bytes, and keep reading (presumably garbage)
until eventually we go beyond the end of the buffer.

Instead, on first error, we should probably just bail out.  Other
option is to increment read_bytes by sizeof(u32) before the switch,
rather than only accounting for the ptype header in the non-error
case.  Note that in this case it is HFI_ERR_SYS_INVALID_PARAMETER,
ie. an unrecognized/unsupported parameter, so interpreting the next
word as a property type would be bogus.  The other error cases are
due to truncated buffer, so there isn't likely to be anything valid
to interpret in the remainder of the buffer.  So just bailing seems
like a reasonable solution.

Signed-off-by: Rob Clark 
---
 drivers/media/platform/qcom/venus/hfi_msgs.c | 11 ++-
 1 file changed, 6 insertions(+), 5 deletions(-)

diff --git a/drivers/media/platform/qcom/venus/hfi_msgs.c 
b/drivers/media/platform/qcom/venus/hfi_msgs.c
index debf80a92797..4190825b20a1 100644
--- a/drivers/media/platform/qcom/venus/hfi_msgs.c
+++ b/drivers/media/platform/qcom/venus/hfi_msgs.c
@@ -239,11 +239,12 @@ static void hfi_sys_init_done(struct venus_core *core, 
struct venus_inst *inst,
break;
}
 
-   if (!error) {
-   rem_bytes -= read_bytes;
-   data += read_bytes;
-   num_properties--;
-   }
+   if (error)
+   break;
+
+   rem_bytes -= read_bytes;
+   data += read_bytes;
+   num_properties--;
}
 
 err_no_prop:
-- 
2.13.0