Re: offlist or1k printf causes crash
On Thu, Aug 21, 2014 at 11:44 PM, Joel Sherrill joel.sherr...@oarcorp.com wrote:
On 8/21/2014 4:04 PM, Christian Svensson wrote:
On Thu, Aug 21, 2014 at 9:56 PM, Joel Sherrill joel.sherr...@oarcorp.com wrote:

The sp must be updated before the memory can be used. This is just a bug otherwise.

No, the 128-byte redzone is an ABI thing that both OpenRISC and x86-64 have. The bug in GCC was that the redzone was not respected, IIRC (if it's the same bug that I have in mind).

OK. Would it possibly need to be respected at the beginning of the interrupt? And at the transfer via _ISR_Dispatch?

I did so at the beginning of the interrupt, and before jumping to _Thread_Dispatch from the _ISR_Handler, but I got the same behavior.

R1 [SP] The stack pointer holds the limit of the current stack frame. The first 128 bytes below the stack pointer are reserved for leaf functions, and below that are undefined. Stack pointer must be word aligned at all times.

Christian.. can you review that code?

Could you point me to the code? I don't know exactly which code and version is being used.

http://git.rtems.org/rtems/tree/cpukit/score/cpu/or1k/rtems/score/cpu.h around line 548.

That is implemented using _OR1K_mtspr which is around line 336 of http://git.rtems.org/rtems/tree/cpukit/score/cpu/or1k/rtems/score/or1k-utility.h

These asm constraints are always so hard to get right.

--
Joel Sherrill, Ph.D. Director of Research Development
joel.sherr...@oarcorp.com On-Line Applications Research
Ask me about RTEMS: a free RTOS Huntsville AL 35805
Support Available (256) 722-9985
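For readers following the asm-constraint question, here is a rough sketch of what interrupt disable/enable helpers built on SPR accessors could look like. It is not the code in or1k-utility.h. The `sketch_*` names are made up for illustration, the SR register number (17) and the IEE/TEE bit positions are taken from my reading of the OpenRISC 1000 architecture manual, and the "memory" clobber is one possible way to stop gcc from moving memory accesses across the SR update. Whether a missing clobber is actually behind the crash is not established in this thread. A host-side stand-in for SR is included so the logic can be compiled and checked off-target.

```c
#include <assert.h>
#include <stdint.h>

/* Assumptions (not from the thread): SR is SPR 17 in group 0, and the
 * tick-timer and interrupt enable bits are TEE (bit 1) and IEE (bit 2). */
#define SPR_SR 17u
#define SR_TEE 0x0002u
#define SR_IEE 0x0004u

#if !defined(__or1k__)
/* Host-side stand-in for the SR register, for off-target testing only. */
static uint32_t fake_sr = 0x8007u;
#endif

/* Hypothetical accessor: read a special-purpose register. */
static inline uint32_t sketch_mfspr(uint32_t reg)
{
#if defined(__or1k__)
  uint32_t value;
  /* "=r" marks value as an output; "memory" keeps gcc from caching
   * memory contents in registers across this instruction. */
  __asm__ volatile ("l.mfspr %0, %1, 0" : "=r" (value) : "r" (reg) : "memory");
  return value;
#else
  assert(reg == SPR_SR);
  return fake_sr;
#endif
}

/* Hypothetical accessor: write a special-purpose register. */
static inline void sketch_mtspr(uint32_t reg, uint32_t value)
{
#if defined(__or1k__)
  /* Both operands are inputs; nothing is written to a C variable. */
  __asm__ volatile ("l.mtspr %0, %1, 0" : : "r" (reg), "r" (value) : "memory");
#else
  assert(reg == SPR_SR);
  fake_sr = value;
#endif
}

/* Disable interrupts and return the previous SR value as the "level". */
static inline uint32_t sketch_isr_disable(void)
{
  uint32_t level = sketch_mfspr(SPR_SR);
  sketch_mtspr(SPR_SR, level & ~(SR_IEE | SR_TEE));
  return level;
}

/* Restore the SR value saved by sketch_isr_disable(). */
static inline void sketch_isr_enable(uint32_t level)
{
  sketch_mtspr(SPR_SR, level);
}
```

The point of the sketch is that the saved level is just another register value; if the constraints let gcc believe a register is unchanged when it is not, a saved SR value can end up where a pointer was expected.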
Re: offlist or1k printf causes crash
On 8/21/2014 4:15 PM, Hesham Moustafa wrote:
On Thu, Aug 21, 2014 at 10:56 PM, Joel Sherrill joel.sherr...@oarcorp.com wrote:
On 8/21/2014 2:44 PM, Hesham Moustafa wrote:

Hi, I have been debugging the or1k code for a while, hoping to find what's wrong. Here's what I got.

First I am moving this to devel@ so others can chime in.

First, I asked about this problem on the #openrisc IRC channel. They told me the problem might be that I have to take the red-zone into account. I asked what the red-zone is, and Stefan said the following: the first 128 bytes of the stack have to be stepped over; leaf functions might use that without modifying the stack pointer, and gcc takes advantage of the fact that there is a red zone in non-leaf function prologues too, i.e. it stores things on the stack and *then* updates the stack pointer.

This is a bug in gcc. We have seen it on the ARM and there was a recent dust up from the Linux kernel community because it happened on x86-64. My understanding is that there was rework/improvement which triggered bugs in backends. But this needs to be fixed. The sp must be updated before the memory can be used. This is just a bug otherwise.

He suggested that I add 128 bytes to the stack pointer before I jump to _ISR_Handler (from start.S). I tried this solution and I was not lucky. You may have some ideas where/when this red-zone causes problems.

You probably need to

Second, I discovered that an unusual (unaligned) exception happens when using printf (which does not happen with printk). When I traced the stack, I found out the problem happens in rtems_semaphore_obtain(), when trying to access the_semaphore data, whose (invalid) pointer is returned from a call to _Objects_Get_isr_disable(). This exception only happens after DISPATCH_NEEDED is true and _ISR_Handler jumps to _Thread_Dispatch, makes a successful context switch and runs the first task. The following is a snapshot of the output when encountering this problem.

What's the alignment of the task stack in the port? The stack may not be properly aligned for the widest access of the or1k.

If you mean the following:

#define CPU_STACK_ALIGNMENT 0

but even with this macro set to 4 or 8, I got the same problem. And from linkcmds.base:

bsp_stack_align = DEFINED (bsp_stack_align) ? bsp_stack_align : 8;

Hmm.. ok .. then we need to know the instruction. 8 is normally a wide enough alignment since the widest access is usually a double or 64-bit access.

*** BEGIN OF TEST CLOCK TICK ***
TA1 - rtems_clock_get_tod - 09:00:00 12/31/1988
TA2 - rtems_clock_get_tod - 09:00:00 12/31/1988
TA3 - rtems_clock_get_tod - 09:00:00 12/31/1988
Fatal Error 263572 Halted

Can you tell what the instruction is? And the address it is trying to access.

The _Objects_Get_isr_disable() function returns a weird address for Object (which in turn should be the_semaphore). This address is 0x8007; it looks like the value of the SR register. All previous Object/the_semaphore addresses returned from _Objects_Get_isr_disable() are higher addresses, which is why I say that the last Object address (0x8007) is invalid.

_Objects_Get_isr_disable() will return an address from the RTEMS Workspace which would tend to be a higher RAM address.

Random thought. Temporarily disable the real hardware clock tick driver in your BSP and add the simulated clock tick driver. See h8sim BSP's Makefile for an example. We need to rule out a problem in your ISR code. You could be getting an interrupt at the wrong time and just clobbering a register. Doing this will let the test run without interrupts.

What is the value of _Watchdog_Ticks_since_boot at this fault?

I set a break point at a call to _Objects_Get_isr_disable() and continued until the call that returns the invalid Object pointer, and typed bt to get the following stack:

Another possibility is that the register/memory constraints on enable/disable interrupts aren't right and are confusing gcc. You could be randomly clobbering registers anytime ISRs are disabled/enabled. Christian.. can you review that code?

#0 _Objects_Get_isr_disable ( information=0x3ba54 <_Semaphore_Information>, id=436273156, location=0x406b4, level_p=0x406b0) at ../../../../../../rtems/c/src/../../cpukit/score/src/objectgetisr.c:34
#1 0x00014294 in _Semaphore_Get_interrupt_disable ( id=436273156, location=0x406b4, level=0x406b0) at ../../cpukit/../../../or1k_or1ksim/lib/include/rtems/rtems/semimpl.h:196
#2 0x000142e0 in rtems_semaphore_obtain (id=436273156, option_set=0, timeout=0) at ../../../../../../rtems/c/src/../../cpukit/rtems/src/semobtain.c:47
#3 0xd648 in rtems_termios_write (arg=0x40730) at ../../../../../../rtems/c/src/../../cpukit/libcsupport/src/termios.c:1099
#4 0x4380 in console_write (major=0, minor=0, arg=0x40730) at
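As a sanity check on the numbers in this message, the small sketch below decodes the semaphore id from the backtrace (436273156) and the suspicious pointer (0x8007). Two things here are assumptions on my part rather than something stated in the thread: the 32-bit RTEMS object id layout (class in bits 31..27, API in bits 26..24, node in bits 23..16, index in bits 15..0) and the OpenRISC SR bit positions (SM bit 0, TEE bit 1, IEE bit 2, FO bit 15). The helper names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Fields of a 32-bit object id under the assumed layout described above. */
typedef struct {
  uint32_t object_class; /* bits 31..27 */
  uint32_t api;          /* bits 26..24 */
  uint32_t node;         /* bits 23..16 */
  uint32_t index;        /* bits 15..0  */
} object_id_fields;

/* Split an object id into its fields. */
static object_id_fields decode_object_id(uint32_t id)
{
  object_id_fields f;
  f.object_class = (id >> 27) & 0x1Fu;
  f.api          = (id >> 24) & 0x07u;
  f.node         = (id >> 16) & 0xFFu;
  f.index        = id & 0xFFFFu;
  return f;
}

/* True if addr is a multiple of alignment (alignment must be a power of 2). */
static bool is_aligned(uintptr_t addr, uintptr_t alignment)
{
  assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
  return (addr & (alignment - 1)) == 0;
}

/* Assumed OpenRISC SR bits. */
#define SR_SM  0x0001u /* supervisor mode */
#define SR_TEE 0x0002u /* tick timer exception enable */
#define SR_IEE 0x0004u /* interrupt exception enable */
#define SR_FO  0x8000u /* fixed one */

/* True if both interrupt sources are enabled in an SR-like value. */
static bool sr_interrupts_enabled(uint32_t sr)
{
  return (sr & (SR_IEE | SR_TEE)) == (SR_IEE | SR_TEE);
}
```

Under these assumptions the id 436273156 is 0x1A010004, i.e. class 3, API 2, node 1, index 4, which as far as I recall corresponds to a Classic API semaphore, so the id itself looks sane. The pointer 0x8007 is not word aligned, which would explain the unaligned exception, and it equals FO|IEE|TEE|SM, which fits the observation that it looks like an SR value with interrupts enabled.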
Re: offlist or1k printf causes crash
On Thu, Aug 21, 2014 at 11:54 PM, Joel Sherrill joel.sherr...@oarcorp.com wrote:
What is the value of _Watchdog_Ticks_since_boot at this fault?

5.
Please note that I replaced the ticker wake_after call (to avoid waiting a long time) with the following:

status = rtems_task_wake_after( task_index * 5 );

The context switch to the first task was made; the unaligned exception happens when task 1 (after the context switch) tries to use printf and semaphore obtain.
Re: offlist or1k printf causes crash
On 8/21/2014 5:00 PM, Hesham Moustafa wrote:
Please note that I replaced the ticker wake_after call (to avoid waiting a long time) with the following:

status = rtems_task_wake_after( task_index * 5 );

The context switch to the first task was made; the unaligned exception happens when task 1 (after the context switch) tries to use printf and semaphore obtain.

Either your context switch or ISR code appears to be messing the stack up. Switching out the real HW tick for the simulated idle task one will help us figure out which one. Plus you can run most tests with this feature and see how things go.
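Since the ISR code is one of the two suspects, here is a minimal sketch of the address arithmetic an interrupt entry stub would need if it has to step over the 128-byte red zone described earlier in the thread. It assumes a downward-growing stack, so "stepping over" means moving the stack pointer down by 128 bytes before saving anything. The function names and the alignment parameter are hypothetical, and this is only the arithmetic, not the start.S or _ISR_Handler code.

```c
#include <assert.h>
#include <stdint.h>

/* Size of the red zone below the stack pointer, per the ABI text quoted
 * in the thread. */
#define RED_ZONE_BYTES 128u

/* Round addr down to a multiple of alignment (a power of two). */
static uintptr_t align_down(uintptr_t addr, uintptr_t alignment)
{
  assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
  return addr & ~(alignment - 1);
}

/* Highest address the interrupt frame may start from without touching the
 * interrupted code's red zone. Assumes the stack grows toward lower
 * addresses. */
static uintptr_t isr_frame_top(uintptr_t interrupted_sp, uintptr_t alignment)
{
  assert(interrupted_sp >= RED_ZONE_BYTES);
  return align_down(interrupted_sp - RED_ZONE_BYTES, alignment);
}
```

For example, with an interrupted stack pointer of 0x406b4 and 8-byte alignment, the frame would start at or below 0x40630, leaving 0x40634 through 0x406b3 untouched for whatever the interrupted function stored there.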