Re: offlist or1k printf causes crash

2014-08-21 Thread Hesham Moustafa
On Thu, Aug 21, 2014 at 11:44 PM, Joel Sherrill
joel.sherr...@oarcorp.com wrote:

 On 8/21/2014 4:04 PM, Christian Svensson wrote:
 On Thu, Aug 21, 2014 at 9:56 PM, Joel Sherrill
 joel.sherr...@oarcorp.com wrote:
 The sp must be updated before the memory can be used. This is just
 a bug otherwise.
 No, the 128-byte red zone is an ABI feature that both OpenRISC and x86-64 have.
 The bug in GCC was that redzone was not respected IIRC (if it's the
 same bug that I have in mind).
 OK.

 Would it possibly need to be respected at the beginning of the interrupt?
 And at the transfer via _ISR_Dispatch?
I did so at the beginning of the interrupt, and before jumping to
_Thread_Dispatch from the _ISR_Handler; but I got the same behavior.
 R1 [SP]
 The stack pointer holds the limit of the current stack frame. The first 128 
 bytes below the stack pointer are reserved for leaf functions, and below 
 that are undefined. Stack pointer must be word aligned at all times.
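
As a rough illustration of the ABI text quoted above, the following C sketch computes where an interrupt handler could start storing its own frame when the interrupted code may still be using the red zone. The names `SKETCH_RED_ZONE_SIZE`, `SKETCH_WORD_SIZE`, and `sketch_isr_frame_top` are made up for this note and are not taken from the RTEMS or1k port. It assumes a stack that grows toward lower addresses, as the phrase "below the stack pointer" implies.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed values, taken from the ABI text quoted above. */
#define SKETCH_RED_ZONE_SIZE 128u  /* bytes below SP that leaf code may use */
#define SKETCH_WORD_SIZE     4u    /* SP must stay word aligned */

/*
 * Given the stack pointer of the interrupted code, return the address
 * below which an interrupt frame can be stored without touching the
 * red zone. Hypothetical helper, not part of RTEMS.
 */
static inline uintptr_t sketch_isr_frame_top(uintptr_t interrupted_sp)
{
  /* The ABI requires the incoming SP to be word aligned. */
  assert((interrupted_sp % SKETCH_WORD_SIZE) == 0);

  /* Step over the red zone; the stack grows toward lower addresses. */
  return interrupted_sp - SKETCH_RED_ZONE_SIZE;
}
```

With an interrupted SP of 0x40800, the helper yields 0x40780. In assembly entry code the same adjustment would presumably be a single subtraction from r1 before any register is saved.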
 Christian.. can you review that code?
 Could you point me to the code? I don't know exactly which code and
 version is being used.
 http://git.rtems.org/rtems/tree/cpukit/score/cpu/or1k/rtems/score/cpu.h

 around line 548.  That is implemented using _OR1K_mtspr which is
 around line 336 of

 http://git.rtems.org/rtems/tree/cpukit/score/cpu/or1k/rtems/score/or1k-utility.h

 These asm constraints are always so hard to get right.
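
 To make the concern about constraints concrete, here is a minimal sketch of an interrupt disable/enable pair, written so it runs on a host machine. The special-purpose register is simulated by an ordinary variable, and every `sketch_` name is hypothetical; the real port accesses the SR through helpers such as `_OR1K_mtspr`. The one point it illustrates is the `"memory"` clobber (a GCC/Clang extension), which keeps the compiler from moving memory accesses across the point where interrupts are masked or restored.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_SR_TEE 0x2u  /* tick timer exception enable */
#define SKETCH_SR_IEE 0x4u  /* interrupt exception enable */

/* Stand-in for the hardware SR; a real port would use l.mfspr/l.mtspr. */
static volatile uint32_t sketch_sr = 0x8007u;

static inline uint32_t sketch_mfspr(void) { return sketch_sr; }
static inline void sketch_mtspr(uint32_t value) { sketch_sr = value; }

/* Compiler-only barrier: emits no instructions, forbids reordering. */
#define SKETCH_BARRIER() __asm__ volatile ("" ::: "memory")

/* Mask interrupts and return the previous SR so it can be restored. */
static inline uint32_t sketch_isr_disable(void)
{
  uint32_t level = sketch_mfspr();
  sketch_mtspr(level & ~(SKETCH_SR_IEE | SKETCH_SR_TEE));
  SKETCH_BARRIER();  /* protected accesses must stay after this point */
  return level;
}

/* Restore the SR value saved by sketch_isr_disable(). */
static inline void sketch_isr_enable(uint32_t level)
{
  SKETCH_BARRIER();  /* protected accesses must stay before this point */
  sketch_mtspr(level);
}

/* Example critical section: the saved level must round-trip unchanged. */
static inline void sketch_critical_section_demo(void)
{
  uint32_t level = sketch_isr_disable();
  assert((sketch_mfspr() & (SKETCH_SR_IEE | SKETCH_SR_TEE)) == 0);
  sketch_isr_enable(level);
  assert(sketch_mfspr() == level);
}
```

 If the real inline asm omits an output constraint or the clobber, the compiler may keep the saved level in a register it believes is free, which is one way a stray SR value could end up where a pointer was expected.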

 --
 Joel Sherrill, Ph.D.             Director of Research & Development
 joel.sherr...@oarcorp.com        On-Line Applications Research
 Ask me about RTEMS: a free RTOS  Huntsville AL 35805
 Support Available                (256) 722-9985



Re: offlist or1k printf causes crash

2014-08-21 Thread Joel Sherrill

On 8/21/2014 4:15 PM, Hesham Moustafa wrote:
 On Thu, Aug 21, 2014 at 10:56 PM, Joel Sherrill
 joel.sherr...@oarcorp.com wrote:
 On 8/21/2014 2:44 PM, Hesham Moustafa wrote:
 Hi,

 I have been debugging the or1k code for a while, hoping to find
 what's wrong. Here's what I got.
 First I am moving this to devel@ so others can chime in.
 First, I asked about this problem on the #openrisc IRC channel. They told
 me the problem might be that I have to take the red zone into account. I
 asked what the red zone is, and Stefan said the following:
 the first 128 bytes of the stack has to be stepped over, leaf
 functions might use that without modifying the stack pointer, and gcc
 takes advantage of the fact that there is a red zone in non-leaf
 functions prologues too. i.e. it stores things on the stack and *then*
 update the stack pointer
 This is a bug in gcc. We have seen it on the ARM and there was a recent
 dust up from the Linux kernel community because it happened on x86-64.
 My understanding is that there was rework/improvement which triggered
 bugs in backends. But this needs to be fixed.

 The sp must be updated before the memory can be used. This is just
 a bug otherwise.
 He suggested that I add 128 bytes to the stack pointer before I jump to
 _ISR_Handler (from start.S). I tried this solution and had no luck. You
 may have some ideas about where or when this red zone causes a problem.
 You probably need to
 Second, I discovered that an unusual (unaligned access) exception happens
 when using printf (it does not happen with printk). When I examined the
 stack, I found that the problem happens in rtems_semaphore_obtain(), when
 it tries to access the_semaphore data through the invalid pointer
 returned from a call to _Objects_Get_isr_disable(). This exception
 only happens after DISPATCH_NEEDED is true and _ISR_Handler jumps to
 _Thread_Dispatch, which makes a successful context switch and runs the
 first task. The following is a snapshot of the output when
 encountering this problem.
 What's the alignment of the task stack in the port? The stack may not be
 properly aligned for the widest access of the or1k.
 If you mean the following:
 #define CPU_STACK_ALIGNMENT 0
 then even with this macro set to 4 or 8, I got the same problem.
 And from linkcmds.base:
 bsp_stack_align = DEFINED (bsp_stack_align) ? bsp_stack_align : 8;
Hmm.. ok .. then we need to know the instruction. 8 is normally a wide
enough alignment since that is usually the width of a double or other
64-bit access.
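
For reference, the alignment checks being discussed can be sketched in C as follows. The helper names are hypothetical and do not come from the RTEMS sources; the sketch only shows the usual power-of-two mask arithmetic behind settings such as CPU_STACK_ALIGNMENT and bsp_stack_align.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* True when addr is a multiple of alignment (alignment: a power of two). */
static inline bool sketch_is_aligned(uintptr_t addr, uintptr_t alignment)
{
  assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
  return (addr & (alignment - 1)) == 0;
}

/* Round addr down to the previous multiple of alignment. */
static inline uintptr_t sketch_align_down(uintptr_t addr, uintptr_t alignment)
{
  assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
  return addr & ~(alignment - 1);
}
```

For example, 0x406b4 is aligned for a 4-byte access but not for an 8-byte one, and rounding 0x406b7 down to an 8-byte boundary gives 0x406b0.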
 *** BEGIN OF TEST CLOCK TICK ***
 TA1  - rtems_clock_get_tod - 09:00:00   12/31/1988
 TA2  - rtems_clock_get_tod - 09:00:00   12/31/1988
 TA3  - rtems_clock_get_tod - 09:00:00   12/31/1988
 Fatal Error 263572 Halted
 Can you tell what the instruction is? And the address it is trying to
 access.
 The _Objects_Get_isr_disable() function returns a weird address for
 Object (which in turn should be the_semaphore). This address is
 0x8007, which looks like the value of the SR register. All previous
 Object/the_semaphore addresses returned from
 _Objects_Get_isr_disable() are higher addresses, which is why I say
 that the last (0x8007) Object address is invalid.
_Objects_Get_isr_disable() will return an address from the RTEMS Workspace
which would tend to be a higher RAM address.
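
The suspicion that 0x8007 is an SR value can be checked by decoding it. The sketch below uses the Supervision Register bit positions from the OpenRISC 1000 architecture manual; the macro and function names are made up for this note and are not the ones used in the RTEMS port.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Supervision Register bit masks (OpenRISC 1000 architecture manual). */
#define SKETCH_SR_SM  0x00000001u  /* supervisor mode */
#define SKETCH_SR_TEE 0x00000002u  /* tick timer exception enable */
#define SKETCH_SR_IEE 0x00000004u  /* interrupt exception enable */
#define SKETCH_SR_FO  0x00008000u  /* fixed one, always reads as 1 */

/* True when every bit of mask is set in sr. */
static inline bool sketch_sr_has(uint32_t sr, uint32_t mask)
{
  assert(mask != 0);
  return (sr & mask) == mask;
}

/* True when value is exactly FO | IEE | TEE | SM. */
static inline bool sketch_looks_like_running_sr(uint32_t value)
{
  const uint32_t expected =
    SKETCH_SR_FO | SKETCH_SR_IEE | SKETCH_SR_TEE | SKETCH_SR_SM;
  return value == expected;
}
```

0x8007 decodes to exactly FO, IEE, TEE, and SM, which is what a supervisor-mode SR with interrupts and the tick timer enabled would look like. The value is also odd, so a word access through it as a pointer would not be word aligned, which is consistent with the reported exception.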

Random thought. Temporarily disable the real hardware clock tick driver
in your BSP and add the simulated clock tick driver. See the h8sim BSP's
Makefile for an example. We need to rule out that your ISR code is doing
the wrong thing. You could be getting an interrupt at the wrong time and
just clobbering a register. Doing this will let the test run without
interrupts.
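
The idea behind a simulated tick can be sketched in a few lines of host-runnable C. This is only a model of the approach under the assumption that the idle loop announces ticks itself; all names are hypothetical and the actual driver in the RTEMS tree may be organized differently.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in for the kernel's tick counter. */
static uint32_t sketch_ticks_since_boot;

/* Stand-in for the kernel's clock tick entry point. */
static inline void sketch_clock_tick(void)
{
  /* The sketch does not model counter wrap-around. */
  assert(sketch_ticks_since_boot != UINT32_MAX);
  sketch_ticks_since_boot++;
}

/*
 * Simulated clock driver: instead of a timer interrupt, the idle loop
 * announces one tick on every pass. Bounded here so the sketch ends.
 */
static inline uint32_t sketch_idle_loop(uint32_t passes)
{
  for (uint32_t i = 0; i < passes; i++) {
    sketch_clock_tick();
  }
  return sketch_ticks_since_boot;
}
```

Because no interrupt is ever taken in this arrangement, a crash that still occurs points at the context switch code, while a crash that disappears points at the ISR entry and exit path.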

What is the value of _Watchdog_Ticks_since_boot at this fault?

 I set a breakpoint at the call to _Objects_Get_isr_disable(),
 continued until the call that returns the invalid Object pointer, and
 typed bt to get the following stack:
 Another possibility is that the register/memory constraints on the
 enable/disable interrupts code aren't right and are confusing gcc. You
 could be randomly clobbering registers anytime ISRs are
 disabled/enabled.

 Christian.. can you review that code?
 
 #0  _Objects_Get_isr_disable (information=0x3ba54 <_Semaphore_Information>,
     id=436273156, location=0x406b4, level_p=0x406b0)
     at ../../../../../../rtems/c/src/../../cpukit/score/src/objectgetisr.c:34
 #1  0x00014294 in _Semaphore_Get_interrupt_disable (id=436273156,
     location=0x406b4, level=0x406b0)
     at ../../cpukit/../../../or1k_or1ksim/lib/include/rtems/rtems/semimpl.h:196
 #2  0x000142e0 in rtems_semaphore_obtain (id=436273156, option_set=0,
     timeout=0)
     at ../../../../../../rtems/c/src/../../cpukit/rtems/src/semobtain.c:47
 #3  0xd648 in rtems_termios_write (arg=0x40730)
     at ../../../../../../rtems/c/src/../../cpukit/libcsupport/src/termios.c:1099
 #4  0x4380 in console_write (major=0, minor=0, arg=0x40730)
 at 
 

Re: offlist or1k printf causes crash

2014-08-21 Thread Hesham Moustafa
On Thu, Aug 21, 2014 at 11:54 PM, Joel Sherrill
joel.sherr...@oarcorp.com wrote:

 Random thought. Temporarily disable the real hardware clock tick driver
 in your BSP and add the simulated clock tick driver. See h8sim BSP's
 Makefile
 for an example. We need to eliminate that your ISR code is doing the
 right thing. You could be getting an interrupt at the wrong time and
 just clobbering a register. Doing this will let the test run without
 interrupts.

 What is the value of _Watchdog_Ticks_since_boot at this fault?

5. Please note that I replaced the ticker test's wake_after call (to avoid
a long wait) with the following:
status = rtems_task_wake_after(
  task_index * 5
);

It did make the context switch to the first task. The unaligned access
happens when task 1 (after the context switch) tries to use printf and
obtain the semaphore.

Re: offlist or1k printf causes crash

2014-08-21 Thread Joel Sherrill

On 8/21/2014 5:00 PM, Hesham Moustafa wrote:
 On Thu, Aug 21, 2014 at 11:54 PM, Joel Sherrill
 joel.sherr...@oarcorp.com wrote:
 What is the value of _Watchdog_Ticks_since_boot at this fault?

 5. Please note that I replaced the ticker test's wake_after call (to avoid
 a long wait) with the following:
 status = rtems_task_wake_after(
   task_index * 5
 );

 It did make the context switch to the first task. The unaligned access
 happens when task 1 (after the context switch) tries to use printf and
 obtain the semaphore.
Either your context switch or your ISR code appears to be corrupting the
stack. Switching out the real HW tick for the simulated one driven from
the idle task will help us figure out which one. Plus you can run most
tests with this feature and see how things go.