Re: Time spent in ticks...

2016-10-13 Thread Pavel Pisa
Hello Joel,

On Friday 14 of October 2016 00:56:21 Joel Sherrill wrote:
> On Thu, Oct 13, 2016 at 1:37 PM, Joel Sherrill  wrote:
> > On Thu, Oct 13, 2016 at 11:21 AM, Jakob Viketoft <
> >
> > jakob.viket...@aacmicrotec.com> wrote:
>> From: Joel Sherrill [j...@rtems.org]
>> Sent: Thursday, October 13, 2016 17:38
>> To: Jakob Viketoft
>> Cc: devel@rtems.org
>> Subject: Re: Time spent in ticks...
> >>
> >> >I don't have an or1k handy so I ran on a sparc/erc32 simulator.
> >> >It is a SPARC v7 at 15 MHz.
> >> >
> >> >These times are in microseconds and based on the tmtests.
> >> >Specifically tm08 and tm27.
> >> >
> >> >(1) rtems_clock_tick: only case - 52
> >> >(2) rtems interrupt: entry overhead returns to interrupted task - 12
> >> >(3) rtems interrupt: exit overhead returns to interrupted task - 4
> >> >(4) rtems interrupt: entry overhead returns to nested interrupt - 11
> >> >(5) rtems interrupt: exit overhead returns to nested interrupt - 3
> >
> > The above was from the master with SMP enabled. I repeated it with
> > SMP disabled and it had no impact.
> >
> > Since the timing change is post 4.11, I decided to try 4.11 with SMP
> > disabled:
> >
> > rtems_clock_tick: only case - 42
> > rtems interrupt: entry overhead returns to interrupted task - 11
> > rtems interrupt: exit overhead returns to interrupted task - 4
> > rtems interrupt: entry overhead returns to nested interrupt - 11
> > rtems interrupt: exit overhead returns to nested interrupt - 3
> >
> > So 42 + 11 + 4 = 57 microseconds, 57 * 15 = 855 cycles
> >
> > So the overhead has gone up some, but as Pavel says, it is quite likely
> > that some mathematical operation on 64-bit types is slow on your CPU.
> >
> > HINT: If you can write a benchmark for 64-bit operations,
> > it would be a good comparison between CPUs and might
> > highlight where the software implementation needs improvement.
>
> I decided that another good point of reference was the powerpc/psim BSP. It
> reports the benchmarks in instructions:
>
> (1) rtems_clock_tick: only case - 229
> (2) rtems interrupt: entry overhead returns to interrupted task - 102
> (3) rtems interrupt: exit overhead returns to interrupted task - 95
> (4) rtems interrupt: entry overhead returns to nested interrupt - 105
> (5) rtems interrupt: exit overhead returns to nested interrupt - 85
>
> 229 + 102 + 95 = 426 instructions.
>
> That seems roughly in line with the erc32, which takes 1 cycle for all
> instructions except loads (3 cycles) and stores (2 cycles). And the SPARC
> has register windows, so entering and exiting an ISR can potentially save
> and restore a lot of registers.
>
> So I am still leaning toward Pavel's explanation that some primitive operation
> is really inefficient.

These numbers look good.

I would expect that in the case of or1k there can be a real penalty if the
CPU is synthesized without a multiplier or barrel shifter, or if it has
them but the compiler is set not to use them. If that cannot be corrected
(for example, because a hardware multiplier or shifter would make the
design no longer fit in the FPGA), then there is a real problem and a
mismatch between RTEMS and the CPU target area. This could be solved by a
configurable time measurement data type: for example, use only a 32-bit
tick count and switch even the timer queues to this type. It cannot be
unconditional, because today's RTEMS users expect better time resolution
and a time representation that does not overflow over a long range,
ideally supporting the year 2100 or beyond.
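
For illustration only, such a configuration switch could look roughly like
this (purely hypothetical; RTEMS has no such option today):

  #include <stdint.h>

  /* Hypothetical build-time choice of the watchdog time representation. */
  #ifdef RTEMS_USE_32BIT_TICK_TIME
  typedef uint32_t Watchdog_Time;  /* plain tick count; at a 1 kHz tick it
                                      wraps after about 49 days */
  #else
  typedef uint64_t Watchdog_Time;  /* high resolution, no overflow worry
                                      until far beyond the year 2100 */
  #endif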

As for the actual code, if I remember correctly, I did not like the
conversions from monotonic time to ticks in nanosleep, and there was some
division there. The division is not in the tick code (at least I think
so), so that should be OK. The packed seconds-and-fraction format of the
timespec used for one of the queues has some interesting properties, but
on the other hand its repacking adds some overhead even in the tick
processing.
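
As I understand it, the packed format keeps the seconds in the upper bits
and the nanoseconds in the low 30 bits (10^9 < 2^30), so packing and
unpacking need only shifts and masks, no division. A sketch with
illustrative names, not the actual RTEMS identifiers:

  #include <stdint.h>
  #include <time.h>

  #define BITS_FOR_1E9_NS 30

  static inline uint64_t pack_timespec(const struct timespec *ts)
  {
    /* Packing is one shift and one OR... */
    return ((uint64_t) ts->tv_sec << BITS_FOR_1E9_NS) | (uint32_t) ts->tv_nsec;
  }

  static inline void unpack_timespec(uint64_t t, struct timespec *ts)
  {
    /* ...and unpacking one shift and one mask. */
    ts->tv_sec  = (time_t) (t >> BITS_FOR_1E9_NS);
    ts->tv_nsec = (long) (t & ((UINT64_C(1) << BITS_FOR_1E9_NS) - 1));
  }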

If we take the time spent in the tick on some CPU to be, for example,
50 usec, then it is not a problem as long as there are no deadlines in a
similar range. For example, with tolerated latencies of 500 or 1000 usec
and a critical task execution time of 300 usec, it is OK. But if the tick
rate is set to 1 kHz, then 5% of the CPU time consumed by timekeeping
looks like quite a lot. If the application timing can tolerate a tick
period of 0.1 sec (10 Hz), then the load contribution of tick processing
is negligible.

So all these numbers are relative to the needs of the planned target application.

Best wishes,

Pavel




Re: Time spent in ticks...

2016-10-13 Thread Joel Sherrill
On Thu, Oct 13, 2016 at 1:37 PM, Joel Sherrill  wrote:

>
>
> On Thu, Oct 13, 2016 at 11:21 AM, Jakob Viketoft <
> jakob.viket...@aacmicrotec.com> wrote:
>
>>
>> From: Joel Sherrill [j...@rtems.org]
>> Sent: Thursday, October 13, 2016 17:38
>> To: Jakob Viketoft
>> Cc: devel@rtems.org
>> Subject: Re: Time spent in ticks...
>>
>> >I don't have an or1k handy so I ran on a sparc/erc32 simulator.
>> >It is a SPARC v7 at 15 MHz.
>>
>> >These times are in microseconds and based on the tmtests.
>> >Specifically tm08 and tm27.
>>
>> >(1) rtems_clock_tick: only case - 52
>> >(2) rtems interrupt: entry overhead returns to interrupted task - 12
>> >(3) rtems interrupt: exit overhead returns to interrupted task - 4
>> >(4) rtems interrupt: entry overhead returns to nested interrupt - 11
>> >(5) rtems interrupt: exit overhead returns to nested interrupt - 3
>>
>>
> The above was from the master with SMP enabled. I repeated it with
> SMP disabled and it had no impact.
>
> Since the timing change is post 4.11, I decided to try 4.11 with SMP
> disabled:
>
> rtems_clock_tick: only case - 42
> rtems interrupt: entry overhead returns to interrupted task - 11
> rtems interrupt: exit overhead returns to interrupted task - 4
> rtems interrupt: entry overhead returns to nested interrupt - 11
> rtems interrupt: exit overhead returns to nested interrupt - 3
>
> So 42 + 11 + 4 = 57 microseconds, 57 * 15 = 855 cycles
>
> So the overhead has gone up some, but as Pavel says, it is quite likely
> that some mathematical operation on 64-bit types is slow on your CPU.
>
> HINT: If you can write a benchmark for 64-bit operations,
> it would be a good comparison between CPUs and might
> highlight where the software implementation needs improvement.
>

I decided that another good point of reference was the powerpc/psim BSP. It
reports the benchmarks in instructions:

(1) rtems_clock_tick: only case - 229
(2) rtems interrupt: entry overhead returns to interrupted task - 102
(3) rtems interrupt: exit overhead returns to interrupted task - 95
(4) rtems interrupt: entry overhead returns to nested interrupt - 105
(5) rtems interrupt: exit overhead returns to nested interrupt - 85

229 + 102 + 95 = 426 instructions.

That seems roughly in line with the erc32, which takes 1 cycle for all
instructions except loads (3 cycles) and stores (2 cycles). And the SPARC
has register windows, so entering and exiting an ISR can potentially save
and restore a lot of registers.

So I am still leaning toward Pavel's explanation that some primitive operation
is really inefficient.


>
>
>> >The clock tick test has 100 tasks but it looks like they are blocked on
>> >a semaphore without timeout.
>>
>> >Your times look WAY too high. Maybe the interrupt is stuck on and
>> >not being cleared.
>>
>> >On the erc32, a nominal "nothing to do clock tick" would be 1+2+3 from
>> >above or 52+12+4 = 68 microseconds. 68 * 15 = 1020 machine cycles.
>> >So at a higher clock rate, it should be even less time.
>>
>> >My gut feeling is that I think something is wrong with the ISR handler
>> >and it is stuck. But the overhead is definitely way too high.
>>
>> >--joel
>>
>> (Sorry if the format got somewhat garbled; anything but top-posting
>> has to be done manually...)
>>
>> I re-tested my case using an -O3 optimization (we have been using -O0
>> during development for debugging purposes) and I got a good performance
>> boost, but I'm still nowhere near your numbers. I can vouch that the
>> interrupt (exception, really) isn't stuck, but the code unfortunately
>> takes a long time to compute. I have a subsecond counter (1/16 of a second)
>> which I'm sampling at various places in the code, storing its numbers to a
>> buffer in memory so as to interfere with the program as little as possible.
>>
>> With -O3, a tick handling still takes ~320 us to perform, but the weight
>> has now shifted. tc_windup takes ~214 us and the rest is obviously
>> _Watchdog_Tick(). When fragmenting the tc_windup function to find the worst
>> speed bumps, the biggest contribution (~122 us) seems to be coming from the scale
>> factor recalculation. Since it's 64 bits, it's turned into a software
>> function which can be quite time-consuming apparently.
>>
>> Even though _Watchdog_Tick() "only" takes ~100 us now, it still sounds
>> much higher than your total tick with a slower system (we're running at 50
>> MHz).
>>
>> Is there anything we can do to improve these numbers? Is Clock_isr
>> intended to be run uninterrupted as it is now? I can't see that much of the
>> BSP patch code has anything to do with the speed of what I'm looking at
>> right now...
>>
>>  /Jakob
>>
>>
>>
>> Jakob Viketoft
>> Senior Engineer in RTL and embedded software
>>
>> ÅAC Microtec AB
>> Dag Hammarskjölds väg 48
>> SE-751 83 Uppsala, Sweden
>>
>> T: +46 702 80 95 97
>> http://www.aacmicrotec.com
>>
>
>

Re: Time spent in ticks...

2016-10-13 Thread Joel Sherrill
On Thu, Oct 13, 2016 at 11:21 AM, Jakob Viketoft <
jakob.viket...@aacmicrotec.com> wrote:

>
> From: Joel Sherrill [j...@rtems.org]
> Sent: Thursday, October 13, 2016 17:38
> To: Jakob Viketoft
> Cc: devel@rtems.org
> Subject: Re: Time spent in ticks...
>
> >I don't have an or1k handy so I ran on a sparc/erc32 simulator.
> >It is a SPARC v7 at 15 MHz.
>
> >These times are in microseconds and based on the tmtests.
> >Specifically tm08 and tm27.
>
> >(1) rtems_clock_tick: only case - 52
> >(2) rtems interrupt: entry overhead returns to interrupted task - 12
> >(3) rtems interrupt: exit overhead returns to interrupted task - 4
> >(4) rtems interrupt: entry overhead returns to nested interrupt - 11
> >(5) rtems interrupt: exit overhead returns to nested interrupt - 3
>
>
The above was from the master with SMP enabled. I repeated it with
SMP disabled and it had no impact.

Since the timing change is post 4.11, I decided to try 4.11 with SMP
disabled:

rtems_clock_tick: only case - 42
rtems interrupt: entry overhead returns to interrupted task - 11
rtems interrupt: exit overhead returns to interrupted task - 4
rtems interrupt: entry overhead returns to nested interrupt - 11
rtems interrupt: exit overhead returns to nested interrupt - 3

So 42 + 11 + 4 = 57 microseconds, 57 * 15 = 855 cycles

So the overhead has gone up some, but as Pavel says, it is quite likely
that some mathematical operation on 64-bit types is slow on your CPU.

HINT: If you can write a benchmark for 64-bit operations,
it would be a good comparison between CPUs and might
highlight where the software implementation needs improvement.
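
Something as simple as the following sketch would do (hypothetical and
untested; the timing itself would use whatever cycle or microsecond
counter the BSP provides):

  #include <stdint.h>

  volatile uint64_t sink;  /* keeps the loop from being optimized away */

  void benchmark_64bit_ops(uint64_t a, uint64_t b)
  {
    uint64_t acc = a;
    int i;

    /* Read the counter before and after the loop, then divide the
       elapsed time by the iteration count per operation kind. */
    for (i = 0; i < 100000; i++) {
      acc = acc * b + i;    /* 64-bit multiply: __muldi3 when done in SW */
      acc = acc >> 3;       /* 64-bit shift: slow without a barrel shifter */
      acc = acc / (b | 1);  /* 64-bit divide: __udivdi3, usually the worst */
    }
    sink = acc;
  }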


> >The clock tick test has 100 tasks but it looks like they are blocked on a
> >semaphore without timeout.
>
> >Your times look WAY too high. Maybe the interrupt is stuck on and
> >not being cleared.
>
> >On the erc32, a nominal "nothing to do clock tick" would be 1+2+3 from
> >above or 52+12+4 = 68 microseconds. 68 * 15 = 1020 machine cycles.
> >So at a higher clock rate, it should be even less time.
>
> >My gut feeling is that I think something is wrong with the ISR handler
> >and it is stuck. But the overhead is definitely way too high.
>
> >--joel
>
> (Sorry if the format got somewhat garbled; anything but top-posting has
> to be done manually...)
>
> I re-tested my case using an -O3 optimization (we have been using -O0
> during development for debugging purposes) and I got a good performance
> boost, but I'm still nowhere near your numbers. I can vouch that the
> interrupt (exception, really) isn't stuck, but the code unfortunately
> takes a long time to compute. I have a subsecond counter (1/16 of a second)
> which I'm sampling at various places in the code, storing its numbers to a
> buffer in memory so as to interfere with the program as little as possible.
>
> With -O3, a tick handling still takes ~320 us to perform, but the weight
> has now shifted. tc_windup takes ~214 us and the rest is obviously
> _Watchdog_Tick(). When fragmenting the tc_windup function to find the worst
> speed bumps, the biggest contribution (~122 us) seems to be coming from the scale
> factor recalculation. Since it's 64 bits, it's turned into a software
> function which can be quite time-consuming apparently.
>
> Even though _Watchdog_Tick() "only" takes ~100 us now, it still sounds much
> higher than your total tick with a slower system (we're running at 50 MHz).
>
> Is there anything we can do to improve these numbers? Is Clock_isr
> intended to be run uninterrupted as it is now? I can't see that much of the
> BSP patch code has anything to do with the speed of what I'm looking at
> right now...
>
>  /Jakob
>
>
>
> Jakob Viketoft
> Senior Engineer in RTL and embedded software
>
> ÅAC Microtec AB
> Dag Hammarskjölds väg 48
> SE-751 83 Uppsala, Sweden
>
> T: +46 702 80 95 97
> http://www.aacmicrotec.com
>

Re: Time spent in ticks...

2016-10-13 Thread Pavel Pisa
Hello Jakob,

On Thursday 13 of October 2016 18:21:05 Jakob Viketoft wrote:
> I re-tested my case using an -O3 optimization (we have been using -O0
> during development for debugging purposes) and I got a good performance
> boost, but I'm still nowhere near your numbers. I can vouch that the
> interrupt (exception, really) isn't stuck, but the code unfortunately
> takes a long time to compute. I have a subsecond counter (1/16 of a second)
> which I'm sampling at various places in the code, storing its numbers to a
> buffer in memory so as to interfere with the program as little as possible.
>
> With -O3, a tick handling still takes ~320 us to perform, but the weight
> has now shifted. tc_windup takes ~214 us and the rest is obviously
> _Watchdog_Tick(). When fragmenting the tc_windup function to find the worst
> speed bumps, the biggest contribution (~122 us) seems to be coming from the scale
> factor recalculation. Since it's 64 bits, it's turned into a software
> function which can be quite time-consuming apparently.
>
> Even though _Watchdog_Tick() "only" takes ~100 us now, it still sounds much
> higher than your total tick with a slower system (we're running at 50 MHz).
>
> Is there anything we can do to improve these numbers? Is Clock_isr intended
> to be run uninterrupted as it is now? I can't see that much of the BSP patch
> code has anything to do with the speed of what I'm looking at right now...

the time measurement and the timer queues use 64-bit types for the time
representation. When a higher measurement resolution than the tick is
requested, that is a reasonable (even optimal) choice, but it can be a
problem for 16-bit CPUs and some 32-bit ones as well.

How have you configured the or1k CPU? Do you have a hardware multiplier
and barrel shifter available, or only shift-by-one and multiplication in
software? Do the CFLAGS match the available instructions?

I am not sure whether there is a 64-bit division in the time computation
either; that would be a killer for your CPU. High-resolution time sources
and even tickless timer support can be implemented with full scaling and
adjustment using only shifts, additions and multiplications in the hot
paths.
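
For illustration, the classic conversion from a counter delta to
nanoseconds that stays inside those operations (a generic sketch, not the
RTEMS code):

  #include <stdint.h>

  static uint64_t scale;  /* precomputed once, outside the hot path */

  void scale_init(uint32_t counter_frequency_hz)
  {
    /* (10^9 << 32) fits in 64 bits; the one division happens at setup. */
    scale = (UINT64_C(1000000000) << 32) / counter_frequency_hz;
  }

  /* Hot path: one 64-bit multiply and one shift, no division. Valid while
     ticks * scale fits in 64 bits, i.e. for short deltas -- which a
     timecounter design guarantees by winding up every tick. */
  static inline uint64_t counter_to_ns(uint32_t ticks)
  {
    return ((uint64_t) ticks * scale) >> 32;
  }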

I tried to understand the actual RTEMS timekeeping code some time ago
when nanosleep was introduced, and I tried to analyze it, proposed some
changes and compared it to Linux. See the threads following these
messages:

  https://lists.rtems.org/pipermail/devel/2016-August/015720.html

  https://lists.rtems.org/pipermail/devel/2016-August/015721.html

Some of the discussed changes to nanosleep have already been implemented.

Generally, try to measure how many times multiplication and division are
called in the ISR. I think I am capable of designing an implementation
restricted to mul, add and shr that minimizes the number of
transformations, but if it turns out that the RTEMS implementation needs
to be optimized or changed, then it can be a task measured in man-months.
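
One cheap way to count them, assuming the GNU toolchain: route the libgcc
soft helpers through linker wrappers (link with
-Wl,--wrap=__muldi3 -Wl,--wrap=__udivdi3) and bump a counter per call. A
rough, untested sketch:

  #include <stdint.h>

  volatile uint32_t muldi3_calls;   /* sample these around the tick ISR */
  volatile uint32_t udivdi3_calls;

  long long __real___muldi3(long long a, long long b);
  unsigned long long __real___udivdi3(unsigned long long a,
                                      unsigned long long b);

  long long __wrap___muldi3(long long a, long long b)
  {
    muldi3_calls++;                 /* count, then defer to the real helper */
    return __real___muldi3(a, b);
  }

  unsigned long long __wrap___udivdi3(unsigned long long a,
                                      unsigned long long b)
  {
    udivdi3_calls++;
    return __real___udivdi3(a, b);
  }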

Generally, if the tick interrupt lasts more than 10 (maybe 20) usec, then
there is a problem. One source can be an inefficient software
implementation; another is that the features selected by the OS, and
possibly required by the application, exceed the capabilities of the
selected CPU.

Best wishes,


Pavel


RE: Time spent in ticks...

2016-10-13 Thread Jakob Viketoft

From: Joel Sherrill [j...@rtems.org]
Sent: Thursday, October 13, 2016 17:38
To: Jakob Viketoft
Cc: devel@rtems.org
Subject: Re: Time spent in ticks...

>I don't have an or1k handy so I ran on a sparc/erc32 simulator.
>It is a SPARC v7 at 15 MHz.

>These times are in microseconds and based on the tmtests.
>Specifically tm08 and tm27.

>(1) rtems_clock_tick: only case - 52
>(2) rtems interrupt: entry overhead returns to interrupted task - 12
>(3) rtems interrupt: exit overhead returns to interrupted task - 4
>(4) rtems interrupt: entry overhead returns to nested interrupt - 11
>(5) rtems interrupt: exit overhead returns to nested interrupt - 3

>The clock tick test has 100 tasks but it looks like they are blocked on a
>semaphore without timeout.

>Your times look WAY too high. Maybe the interrupt is stuck on and
>not being cleared.

>On the erc32, a nominal "nothing to do clock tick" would be 1+2+3 from
>above or 52+12+4 = 68 microseconds. 68 * 15 = 1020 machine cycles.
>So at a higher clock rate, it should be even less time.

>My gut feeling is that I think something is wrong with the ISR handler
>and it is stuck. But the overhead is definitely way too high.

>--joel

(Sorry if the format got somewhat garbled; anything but top-posting has to
be done manually...)

I re-tested my case using an -O3 optimization (we have been using -O0 during 
development for debugging purposes) and I got a good performance boost, but I'm 
still nowhere near your numbers. I can vouch that the interrupt (exception,
really) isn't stuck, but the code unfortunately takes a long time to
compute. I have a subsecond counter (1/16 of a second) which I'm sampling at 
various places in the code, storing its numbers to a buffer in memory so as to 
interfere with the program as little as possible.
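
In outline, the probing looks like the sketch below (illustrative names,
not our actual code; read_subsecond_counter() stands in for the hardware
counter register):

  #include <stdint.h>

  #define NPROBES 256

  static volatile uint32_t probe_buf[NPROBES];
  static volatile unsigned probe_idx;

  extern uint32_t read_subsecond_counter(void);  /* BSP-specific, assumed */

  /* Store a raw sample; post-processing happens offline so the probe
     itself disturbs the timing as little as possible. */
  static inline void probe(void)
  {
    if (probe_idx < NPROBES)
      probe_buf[probe_idx++] = read_subsecond_counter();
  }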

With -O3, a tick handling still takes ~320 us to perform, but the weight has 
now shifted. tc_windup takes ~214 us and the rest is obviously 
_Watchdog_Tick(). When fragmenting the tc_windup function to find the worst
speed bumps, the biggest contribution (~122 us) seems to be coming from the scale
factor recalculation. Since it's 64 bits, it's turned into a software function 
which can be quite time-consuming apparently.
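
From what I can tell, the recalculation is along these lines (paraphrased
from memory of the kern_tc.c windup code, not verbatim):

  #include <stdint.h>

  /* Paraphrase only -- check kern_tc.c for the real thing. The 64-bit
     divide by the counter frequency is what becomes a software routine
     (__udivdi3) on CPUs without a hardware divider. */
  uint64_t recompute_scale(int64_t adjustment, uint64_t counter_frequency)
  {
    uint64_t scale = (uint64_t) 1 << 63;
    scale += (adjustment / 1024) * 2199;  /* NTP-style rate adjustment */
    scale /= counter_frequency;           /* 64-bit software division */
    return scale * 2;
  }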

Even though _Watchdog_Tick() "only" takes ~100 us now, it still sounds much
higher than your total tick with a slower system (we're running at 50 MHz).

Is there anything we can do to improve these numbers? Is Clock_isr intended to 
be run uninterrupted as it is now? I can't see that much of the BSP patch code
has anything to do with the speed of what I'm looking at right now...

 /Jakob


Jakob Viketoft
Senior Engineer in RTL and embedded software

ÅAC Microtec AB
Dag Hammarskjölds väg 48
SE-751 83 Uppsala, Sweden

T: +46 702 80 95 97
http://www.aacmicrotec.com

Re: Time spent in ticks...

2016-10-13 Thread Joel Sherrill
On Thu, Oct 13, 2016 at 3:51 AM, Jakob Viketoft <
jakob.viket...@aacmicrotec.com> wrote:

> Hello everyone,
>
> We're running on an or1k-based BSP off of 4.11 (with the patches I've
> forwarded in February last year) and have seen some strange sluggishness in
> the system. When measuring using a standalone peripheral clock, I can see
> that we spend between 0.8 and 1.4 ms just handling the tick. This sounds a
> bit absurd to me and I just wanted to send out a couple of questions to see
> if anyone has an inkling of what is going on. I haven't been able to test
> with the or1k-simulator (and the generic_or1k BSP) as it won't easily
> compile with a newer gcc, but I'm running on real hardware. The patches I
> made don't sound like big hold-ups to me either, but a second pair of eyes
> is of course always welcome.
>
> To the questions:
> 1. On the or1k-cpu RTEMS bsp, timer ticks are using the cpu-internal
> timer, which when timing out results in a timer exception. Clock_isr is
> installed as the exception handler for this and thus has complete control
> of the CPU for its duration. Is this how Clock_isr is intended to run,
> i.e. no other tasks or interrupts are allowed during tick handling? Just
> want to make sure there is no mismatch between the or1k setup in RTEMS and
> how Clock_isr is intended to run.
>
> 2. Running a very simple test application with three tasks, I delved into
> the _Timecounter_Tick part of the Clock_isr, and I have seen that tc_windup()
> is using ~340 us quite consistently and _Watchdog_Tick() ~630 us when
> all tasks are started. What numbers can be seen on other systems, i.e. what
> should I expect as normal here? Any ideas on what can be wrong? I'll keep
> digging and try to discern any individual culprits as well.
>
>
I don't have an or1k handy so I ran on a sparc/erc32 simulator.
It is a SPARC v7 at 15 MHz.

These times are in microseconds and based on the tmtests.
Specifically tm08 and tm27.

(1) rtems_clock_tick: only case - 52
(2) rtems interrupt: entry overhead returns to interrupted task - 12
(3) rtems interrupt: exit overhead returns to interrupted task - 4
(4) rtems interrupt: entry overhead returns to nested interrupt - 11
(5) rtems interrupt: exit overhead returns to nested interrupt - 3

The clock tick test has 100 tasks but it looks like they are blocked on a
semaphore without timeout.

Your times look WAY too high. Maybe the interrupt is stuck on and
not being cleared.

On the erc32, a nominal "nothing to do clock tick" would be 1+2+3 from
above or 52+12+4 = 68 microseconds. 68 * 15 = 1020 machine cycles.
So at a higher clock rate, it should be even less time.

My gut feeling is that I think something is wrong with the ISR handler
and it is stuck. But the overhead is definitely way too high.

--joel


> Oh, and we use 1 as base for the tick quantum.
>
> (If anyone is interested in looking at our code, bsps and toolchains can
> be downloaded at repo.aacmicrotec.com.)
>
> Best regards,
>
>   /Jakob
>
>
> Jakob Viketoft
> Senior Engineer in RTL and embedded software
>
> ÅAC Microtec AB
> Dag Hammarskjölds väg 48
> SE-751 83 Uppsala, Sweden
>
> T: +46 702 80 95 97
> http://www.aacmicrotec.com
>

Re: [PATCH 3/3] libchip/network/if_fxp.c: do not use rtems_interrupt_disable.

2016-10-13 Thread Pavel Pisa
Some typo corrections for an e-mail written after I returned late at
night from a meeting with friends.

And some more clarification as well.

On Thursday 13 of October 2016 01:55:30 Pavel Pisa wrote:
> Hello Chris,
>
> On Wednesday 12 of October 2016 23:05:30 Chris Johns wrote:
> > On 13/10/2016 03:22, Pavel Pisa wrote:
> > > But RTEMS i8259 support has been broken by the generic IRQ
> > > processing code disabling the vector for level-triggered
> > > interrupts.
> >
> > I am not sure where the blame should be placed. We need to disable at
> > the PIC when using libbsd with shared PCI interrupts because an
> > interrupt server is used that is common to a few architectures. Some
> > legacy drivers like this one assume processing inside the interrupt
> > context. It is not clear to me shared interrupts were ever supported
> > with these drivers. I would assume it means some type of per driver
> > interrupt chaining.
> >
> > > So I have introduced reenable
> > > bsp_interrupt_vector_enable to ensure that driver
> > > can work even with that setup.
> >
> > I am not sure we can mix both models without some changes.
>
> I hope that the interrupt server should work after the committed change.
> At least, I have a feeling that this was the outcome of the previous debate.
>
> The IRQ server's bsp_interrupt_server_trigger() disables the given IRQ
> vector at the PIC level, in hard IRQ context, via
> bsp_interrupt_vector_disable().
>
> See
>
>   https://git.rtems.org/rtems/tree/c/src/lib/libbsp/shared/src/irq-server.c#n64
>
> I would not have pushed the changes if that were not the case.
>
> > > classic networking: adapt FXP driver to work with actual PCI and IRQ
> > > code.
> > >
> > > The hack is not required after
> >
> > Which hack?
>
> The re-enabling of the PIC-level interrupt in the driver code. Generally
> I consider that the functions bsp_interrupt_vector_disable() and
> bsp_interrupt_vector_enable() should be used as a pair, and the API
> should allow using them even if implemented as a counting disable clock.

spellchecker ...

s/clock/lock/

The counting of disable calls is required if the vector is shared: if
multiple hard IRQ handlers need to disable the source at the controller
level (generally bad practice; handling at the device level is better if
possible), and the source is re-enabled at the controller level when the
first worker thread finishes processing while the cause of the shared
level-triggered IRQ is another device whose threaded handler has not
finished yet, then there is a complete system dead/livelock.

Linux provides the following functions to maintain controller-side
interrupt disable and enable:

https://www.kernel.org/doc/htmldocs/kernel-api/hardware.html#idp11592384

 disable_irq - guarantees that after it returns, the corresponding IRQ
   source handlers are neither invoked nor running in parallel (it waits
   for them to finish). This function cannot be called from the handler
   itself (deadlock).

 disable_irq_nosync - disables the vector at the controller level; it does
   not guarantee that the last running instance of each shared handler has
   finished before the call returns. This one can be called for a source
   from its own hard-context handler.

 enable_irq - undoes the effect of the corresponding disable_irq; it must
   be called as many times as disable_irq was called before. Only when all
   calls are balanced does the controller enable the source.

I think that when all possible HW and SW constellations must be
supported, this is the only usable API.
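
A counted variant for the RTEMS-side pair could look like this sketch
(illustrative names only; pic_mask_vector()/pic_unmask_vector() stand in
for the BSP's PIC operations, and callers are assumed to run with
interrupts masked):

  #define NVECTORS 32  /* illustrative size */

  extern void pic_mask_vector(unsigned v);
  extern void pic_unmask_vector(unsigned v);

  static unsigned disable_depth[NVECTORS];

  void vector_disable_counted(unsigned v)
  {
    if (disable_depth[v]++ == 0)
      pic_mask_vector(v);    /* mask only on the first disable */
  }

  void vector_enable_counted(unsigned v)
  {
    if (disable_depth[v] > 0 && --disable_depth[v] == 0)
      pic_unmask_vector(v);  /* unmask only when fully balanced */
  }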

And yes, there are strange things in the world.

I once debugged my CAN driver over e-mail at another university, where I
found that the PCI card had a level-triggered IRQ output, but the multiple
CAN controllers connected to the local bus behind the card's PCI bridge
shared interrupts, and the bridge asserted the interrupt only on the
rising edge of the shared signal. So if the PCI interrupt processing
finished without being sure that all chips behind the bridge had their
outputs inactive, then the device, and the CAN control/monitoring inside
some intelligent van on the street, was lost. Fortunately the van was not
without a driver at that time.

> That is, an implementation where bsp_interrupt_vector_enable() enables
> the vector only after the same number of calls as there have been calls
> to bsp_interrupt_vector_disable().
>
> > > bsps/i386: Separate variable for i8259 IRQs disable due to in progress
> > > state.
> > >
> > > so I have removed unneeded reenable from daemon hot path.
> > > I have left it in the setup to be sure that it is enabled
> > > after some driver stop start cycles.
> > >
> > > In theory, this occurrence should be deleted as well.
> > >
> > > Generally, I am not sure if/how much I have broken/I am
> > > breaking i386 support by all these changes.
> >
> > I have not testing the i386 with libbsd with your recent changes. I will
> > see what I can do. I did not notice the enables/disables had been
> > changed.
> >
> > > I believe 

Time spent in ticks...

2016-10-13 Thread Jakob Viketoft
Hello everyone,

We're running on an or1k-based BSP off of 4.11 (with the patches I've forwarded 
in February last year) and have seen some strange sluggishness in the system. 
When measuring using a standalone peripheral clock, I can see that we spend 
between 0.8 and 1.4 ms just handling the tick. This sounds a bit absurd to me and
I just wanted to send out a couple of questions to see if anyone has an inkling 
of what is going on. I haven't been able to test with the or1k-simulator (and 
the generic_or1k BSP) as it won't easily compile with a newer gcc, but I'm 
running on real hardware. The patches I made don't sound like big hold-ups to 
me either, but a second pair of eyes is of course always welcome.

To the questions:
1. On the or1k-cpu RTEMS bsp, timer ticks are using the cpu-internal timer, 
which when timing out results in a timer exception. Clock_isr is installed as 
the exception handler for this and thus has complete control of the CPU for
its duration. Is this how Clock_isr is intended to run, i.e. no other tasks
or interrupts are allowed during tick handling? Just want to make sure there is 
no mismatch between the or1k setup in RTEMS and how Clock_isr is intended to 
run.

2. Running a very simple test application with three tasks, I delved into the 
_Timecounter_Tick part of the Clock_isr, and I have seen that tc_windup() is
using ~340 us quite consistently and _Watchdog_Tick() ~630 us when all
tasks are started. What numbers can be seen on other systems, i.e. what should
I expect as normal here? Any ideas on what can be wrong? I'll keep digging and 
try to discern any individual culprits as well. 

Oh, and we use 1 as base for the tick quantum.

(If anyone is interested in looking at our code, bsps and toolchains can be 
downloaded at repo.aacmicrotec.com.)

Best regards,

  /Jakob


Jakob Viketoft
Senior Engineer in RTL and embedded software

ÅAC Microtec AB
Dag Hammarskjölds väg 48
SE-751 83 Uppsala, Sweden

T: +46 702 80 95 97
http://www.aacmicrotec.com