Hi everyone, sorry for the delay on this. We dropped this kernel tree and moved on to 4.10-rc6. So far I have not had any issues there.
Except:
[ 8305.931816] cc1[31947]: unhandled level 0 translation fault (11) at 0x10004df5cfc19, esr 0x92000004
But this time it happened inside a VM running Debian arm64. The host seems untroubled by this.

So for now, this story is closed. As for the errors inside the guest, I will ignore them for now, but if necessary Debian seems to be the place to ask about it.

On 10.01.2017 at 16:12, Matthias Brugger wrote:
>
>
> On 10/01/17 15:08, Josua Mayer wrote:
>> Hi Alex,
>>
>>
>> On 08.01.2017 at 22:46, Alexander Graf wrote:
>>>
>>>
>>> On 07/01/2017 19:50, Josua Mayer wrote:
>>>> Hi everybody,
>>>>
>>>> I am approaching you with a collection of unusual crashes that occur on
>>>> my test machine:
>>>> It is an early version of the 8040 Community Board by SolidRun running a
>>>> patched 4.9.0 kernel with the 42.2 rootfs from
>>>> http://download.opensuse.org/ports/aarch64/distribution/leap/42.2/appliances/openSUSE-Leap42.2-ARM-JeOS.aarch64-rootfs.aarch64-2017.01.02-Build1.2.tbz
>>>>
>>>>
>>>>
>>>> Now what I see are the following situations:
>>>>
>>>> 1) unhandled level XY translation fault (11) at 0xYYYYYYYY, esr
>>>> 0xYYYYYYYY
>>>> I have seen this with level 0, 1, 2 and 3 so far.
>>>> So far I have seen it with zypper, where it freezes seemingly at
>>>> random, and ctrl+x produces this kind of error in dmesg.
>>>
>>> That's really just a kernel log entry for a user space segmentation
>>> fault. The addresses that faulted were:
>>>
>>> [ 2312.480811] zypper[5524]: unhandled level 2 translation fault (11)
>>> at 0x00000000, esr 0x82000006
>>>
>>> -->
>>>
>>> ESR 0x82000006 means "Instruction Abort from a lower Exception level"
>>> Fault type: Translation fault. (in EL2)
>>> The faulting address is 0x00000000 (PC)
>> I see.
>>>
>>>
>>>
>>> [ 2321.136185] zypper[9319]: unhandled level 2 translation fault (11)
>>> at 0xffffe5720449, esr 0x92000006
>>>
>>> -->
>>>
>>> ESR 0x92000006 means "Data Abort from a lower Exception level"
>>> Fault type: Translation fault. (in EL2)
>>> The faulting address is 0xffffe5720449 (x0)
>>>
>>>
>>> I'm curious. You seem to be running in EL2. Do you have the following
>>> option enabled in your kernel?
>>>
>>> CONFIG_ARM64_VHE
>>>
>>> If so, please try to disable it and see whether that makes things work.
>> It is indeed enabled, will comment below.
>>>
>>>>
>>>> 2) undefined instruction: pc=...
>>>> This one causes a program to crash and exit without any delay. I
>>>> actively observed this twice with zypper, but the kernel log suggests it
>>>> can happen to other applications too.
>>>>
>>>> Attached to this mail you can find a full system log with both kinds of
>>>> crashes. You will find that I played with qemu when this log was
>>>> produced. Sadly I did not save the initial log with the zypper crashes,
>>>> except for one extract.
>>>>
>>>> I am quite unsure what to do about this. Has anybody seen such behaviour
>>>> with other boards?
>>>
>>> I have seen it on a different CPU type with VHE enabled, yes.
>> I see
>>>
>>>
>> So after some additional tests I can conclude that it is some kernel
>> option triggering the defect. I have one .config that works just fine,
>> and one that is broken. Unfortunately the diff is not small.
>> Both have VHE enabled, so I am not sure if that is the root cause.
>> However I will try disabling VHE and let you know how it goes.
>
> From my experience I would start by changing the page size in your
> broken config.
>
> Regards,
> Matthias
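
For reference, the ESR values quoted above can be decoded by hand. Below is a minimal Python sketch, assuming the ESR_ELx layout from the Arm architecture manual (exception class in bits [31:26], instruction length bit 25, fault status code in bits [5:0] for aborts). The function `decode_esr` and its lookup table are hypothetical helpers written for this mail, not part of any kernel tool, and only the abort classes and translation fault codes seen in this thread are named.

```python
# Sketch: decode the fields of an AArch64 ESR value as printed in dmesg.
# Assumption: ESR_ELx layout with EC in bits [31:26], IL in bit 25 and,
# for instruction/data aborts, the fault status code in bits [5:0].

EXCEPTION_CLASSES = {
    0x20: "Instruction Abort from a lower Exception level",
    0x21: "Instruction Abort taken without a change in Exception level",
    0x24: "Data Abort from a lower Exception level",
    0x25: "Data Abort taken without a change in Exception level",
}


def decode_esr(esr):
    """Return a dict with the exception class and fault description."""
    ec = (esr >> 26) & 0x3F   # exception class
    il = (esr >> 25) & 0x1    # 1 = 32-bit instruction trapped
    fsc = esr & 0x3F          # fault status code (valid for aborts only)

    if ec in EXCEPTION_CLASSES and (fsc & 0x3C) == 0x04:
        # Codes 0b0001LL encode a translation fault at table level LL.
        fault = "translation fault, level %d" % (fsc & 0x3)
    else:
        fault = "other (fsc=0x%02x)" % fsc

    return {
        "ec": ec,
        "class": EXCEPTION_CLASSES.get(ec, "other (ec=0x%02x)" % ec),
        "il": il,
        "fault": fault,
    }


if __name__ == "__main__":
    for value in (0x82000006, 0x92000006, 0x92000004):
        info = decode_esr(value)
        print("0x%08x: %s, %s" % (value, info["class"], info["fault"]))
```

Run on the three values from this thread, it reports an instruction abort and a data abort from a lower Exception level, each with a level 2 translation fault, and for the guest's 0x92000004 a data abort with a level 0 translation fault. The "level" printed in the kernel message is the translation table level taken from the status code.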
