Re: Time spent in ticks...
Hello Joel, On Friday 14 of October 2016 00:56:21 Joel Sherrill wrote: > On Thu, Oct 13, 2016 at 1:37 PM, Joel Sherrillwrote: > > On Thu, Oct 13, 2016 at 11:21 AM, Jakob Viketoft < > > > > jakob.viket...@aacmicrotec.com> wrote: > >> *From:* Joel Sherrill [j...@rtems.org] > >> *Sent:* Thursday, October 13, 2016 17:38 > >> *To:* Jakob Viketoft > >> *Cc:* devel@rtems.org > >> *Subject:* Re: Time spent in ticks... > >> > >> >I don't have an or1k handy so ran on a sparc/erc32 simulator/ > >> >It is is a SPARC v7 at 15 Mhz. > >> > > >> >These times are in microseconds and based on the tmtests. > >> >Specifically tm08and tm27. > >> > > >> >(1) rtems_clock_tick: only case - 52 > >> >(2) rtems interrupt: entry overhead returns to interrupted task - 12 > >> >(3) rtems interrupt: exit overhead returns to interrupted task - 4 > >> >(4) rtems interrupt: entry overhead returns to nested interrupt - 11 > >> >(5) rtems interrupt: exit overhead returns to nested interrupt - 3 > > > > The above was from the master with SMP enabled. I repeated it with > > SMP disabled and it had no impact. > > > > Since the timing change is post 4.11, I decided to try 4.11 with SMP > > disabled: > > > > rtems_clock_tick: only case - 42 > > rtems interrupt: entry overhead returns to interrupted task - 11 > > rtems interrupt: exit overhead returns to interrupted task - 4 > > rtems interrupt: entry overhead returns to nested interrupt - 11 > > rtems interrupt: exit overhead returns to nested interrupt - 3 > > > > So 42 + 12 + 4 = 58 microseconds, 58 * 15 = 870 cycles > > > > So the overhead has gone up some but as Pavel says it is quite likely > > some mathematical operation on 64 bit types is slow on your CPU. > > > > HINT: If you can write a benchmark for 64-bit operations, > > it would be a good comparison between CPUs and might > > highlight where the software implementation needs improvement. > > I decided that another good point of reference was the powerpc/psim BSP. 
It > reports the benchmarks in instructions: > > (1) rtems_clock_tick: only case - 229 > (2) rtems interrupt: entry overhead returns to interrupted task - 102 > (3) rtems interrupt: exit overhead returns to interrupted task - 95 > (4) rtems interrupt: entry overhead returns to nested interrupt - 105 > (5) rtems interrupt: exit overhead returns to nested interrupt - 85 > > 229 + 102 + 96 = 427 instructions. > > That seems roughly inline with the erc32 which is 1 cycle for all > instructions > except loads which are 3 and stores which are 2. And the sparc has > register windows so entering and exiting an ISR can potentially save > and restore a lot of registers. > > So I am still leaning to Pavel's explanation that some primitive operation > is really inefficient. These numbers look good. I would expect that in the case of or1k there can be a real penalty if it is synthesized without a multiplier or barrel shifter, or the CPU has these but the compiler is set not to use them. If that cannot be corrected (for example, a hardware multiplier or shifter would cause the design not to fit in the FPGA), then there is a real problem and a mismatch between RTEMS and the CPU target area. This could be solved by a configurable time-measurement data type. For example, use only ticks in a 32-bit number and change even the timer queues to this type. It cannot be unconditional, because today users of RTEMS expect that the time resolution is better and that time does not overflow over a longer range, ideally with the year 2100 or more supported. As for the actual code, if I remember correctly, I did not like the conversions from monotonic to ticks in nanosleep, and there was some division there. The division is not in the tick code (at least I think so), so this should be OK. The packed sec-and-fraction format of timespec used for one of the queues has some interesting properties, but on the other hand its repacking has some overhead even in the tick processing. 
If, for some CPU, the time spent in the tick is for example 50 usec, then it is not a problem as long as there are no deadlines in a similar range. For example, with tolerated latencies of 500 or 1000 usec and a critical task execution time of 300 usec, it is OK. If the tick rate is set to 1 kHz, then 5% of CPU time consumed by timekeeping looks like quite a lot. If the timing of the application can tolerate a tick period of 0.1 sec (10 Hz), then the load contribution of tick processing is negligible. So all these numbers are relative to the needs of the planned target application. Best wishes, Pavel ___ devel mailing list devel@rtems.org http://lists.rtems.org/mailman/listinfo/devel
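Joel's hint above about benchmarking 64-bit operations could be sketched as a small micro-benchmark. This is a hypothetical sketch, not one of the RTEMS tmtests: the function names are invented, and hosted `clock()` stands in for a cycle counter, so on a bare-metal target the timing calls would need to be replaced with reads of the board's timer register.

```c
#include <stdint.h>
#include <stdio.h>
#include <time.h>

/* Hypothetical micro-benchmark sketch: repeat each 64-bit operation
 * enough times that the software helper cost dominates. On a target
 * without a hardware multiplier or barrel shifter these loops go
 * through libgcc routines such as __muldi3, __lshrdi3 and __udivdi3. */
#define ITERATIONS 100000

static uint64_t bench_mul(uint64_t seed)
{
    uint64_t acc = seed;
    for (int i = 0; i < ITERATIONS; i++)
        acc = acc * 2862933555777941757ULL + 3037000493ULL; /* 64-bit LCG step */
    return acc;
}

static uint64_t bench_shift(uint64_t seed)
{
    uint64_t acc = seed;
    for (int i = 0; i < ITERATIONS; i++)
        acc = (acc >> 7) ^ (acc << 9); /* multi-bit shifts both directions */
    return acc;
}

static uint64_t bench_div(uint64_t seed)
{
    uint64_t acc = seed | 1;
    for (int i = 0; i < ITERATIONS; i++)
        acc = acc / 3 + 0x9E3779B97F4A7C15ULL; /* one 64-bit divide per step */
    return acc;
}

/* clock() is a placeholder for a real cycle counter. */
static void run_benchmarks(void)
{
    struct { const char *name; uint64_t (*fn)(uint64_t); } ops[] = {
        { "64-bit mul",   bench_mul },
        { "64-bit shift", bench_shift },
        { "64-bit div",   bench_div },
    };
    for (unsigned i = 0; i < sizeof(ops) / sizeof(ops[0]); i++) {
        clock_t t0 = clock();
        uint64_t r = ops[i].fn(12345);
        clock_t t1 = clock();
        printf("%s: %ld clock units (acc=%llu)\n", ops[i].name,
               (long)(t1 - t0), (unsigned long long)r);
    }
}
```

Comparing the three timings across CPUs (and across -mhard-mul/-msoft-mul style compiler options) would show which primitive is the outlier on a given core.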
Re: Time spent in ticks...
On Thu, Oct 13, 2016 at 1:37 PM, Joel Sherrillwrote: > > > On Thu, Oct 13, 2016 at 11:21 AM, Jakob Viketoft < > jakob.viket...@aacmicrotec.com> wrote: > >> >> *From:* Joel Sherrill [j...@rtems.org] >> *Sent:* Thursday, October 13, 2016 17:38 >> *To:* Jakob Viketoft >> *Cc:* devel@rtems.org >> *Subject:* Re: Time spent in ticks... >> >> >I don't have an or1k handy so ran on a sparc/erc32 simulator/ >> >It is is a SPARC v7 at 15 Mhz. >> >> >These times are in microseconds and based on the tmtests. >> >Specifically tm08and tm27. >> >> >(1) rtems_clock_tick: only case - 52 >> >(2) rtems interrupt: entry overhead returns to interrupted task - 12 >> >(3) rtems interrupt: exit overhead returns to interrupted task - 4 >> >(4) rtems interrupt: entry overhead returns to nested interrupt - 11 >> >(5) rtems interrupt: exit overhead returns to nested interrupt - 3 >> >> > The above was from the master with SMP enabled. I repeated it with > SMP disabled and it had no impact. > > Since the timing change is post 4.11, I decided to try 4.11 with SMP > disabled: > > rtems_clock_tick: only case - 42 > rtems interrupt: entry overhead returns to interrupted task - 11 > rtems interrupt: exit overhead returns to interrupted task - 4 > rtems interrupt: entry overhead returns to nested interrupt - 11 > rtems interrupt: exit overhead returns to nested interrupt - 3 > > So 42 + 12 + 4 = 58 microseconds, 58 * 15 = 870 cycles > > So the overhead has gone up some but as Pavel says it is quite likely > some mathematical operation on 64 bit types is slow on your CPU. > > HINT: If you can write a benchmark for 64-bit operations, > it would be a good comparison between CPUs and might > highlight where the software implementation needs improvement. > I decided that another good point of reference was the powerpc/psim BSP. 
It reports the benchmarks in instructions: (1) rtems_clock_tick: only case - 229 (2) rtems interrupt: entry overhead returns to interrupted task - 102 (3) rtems interrupt: exit overhead returns to interrupted task - 95 (4) rtems interrupt: entry overhead returns to nested interrupt - 105 (5) rtems interrupt: exit overhead returns to nested interrupt - 85 229 + 102 + 95 = 426 instructions. That seems roughly in line with the erc32, which is 1 cycle for all instructions except loads, which are 3, and stores, which are 2. And the sparc has register windows, so entering and exiting an ISR can potentially save and restore a lot of registers. So I am still leaning to Pavel's explanation that some primitive operation is really inefficient. > > >> >The clock tick test has 100 tasks but it looks like they are blocked on >> a semaphore >> >without timeout. >> >> >Your times look WAY too high. Maybe the interrupt is stuck on and >> >not being cleared. >> >> >On the erc32, a nominal "nothing to do clock tick" would be 1+2+3 from >> >above or 52+12+4 = 68 microseconds. 68 * 15 = 1020 machine cycles. >> >So at a higher clock rate, it should be even less time. >> >> >My gut feeling is that I think something is wrong with the ISR handler >> >and it is stuck. But the performance is definitely way too high. >> >> >--joel >> >> (Sorry if the format got somewhat I garbled, anything but top-posting >> have to be done manually...) >> >> I re-tested my case using an -O3 optimization (we have been using -O0 >> during development for debugging purposes) and I got a good performance >> boost, but I'm still nowhere near your numbers. I can vouch for that the >> interrupt (exception really) isn't stuck, but that the code unfortunately >> takes a long time to compute. I have a subsecond counter (1/16 of a second) >> which I'm sampling at various places in the code, storing its numbers to a >> buffer in memory so as to interfere with the program as little as possible. 
>> >> With -O3, a tick handling still takes ~320 us to perform, but the weight >> has now shifted. tc_windup takes ~214 us and the rest is obviously >> _Watchdog_Tick(). When fragmenting the tc_windup function to find the worst >> speed bumps the biggest contribution (~122 us) seem to be coming from scale >> factor recalculation. Since it's 64 bits, it's turned into a software >> function which can be quite time-consuming apparently. >> >> Even though _Watchdog_Tick() "only" takes ~100 us now, it still sound >> much higher than your total tick with a slower system (we're running at 50 >> MHz). >> >> Is there anything we can do to improve these numbers? Is Clock_isr >> intended to be run uninterrupted as it is now? Can't see that much of the >> BSP patch code has anything to do with the speed of what I'm looking at >> right now... >> >> /Jakob >> >> >> >> *Jakob Viketoft *Senior Engineer in RTL and embedded software >> >> ÅAC Microtec AB >> Dag Hammarskjölds väg 48 >> SE-751 83 Uppsala, Sweden >> >> T: +46 702 80 95 97 >> http://www.aacmicrotec.com >> > > ___ devel mailing list devel@rtems.org
Re: Time spent in ticks...
On Thu, Oct 13, 2016 at 11:21 AM, Jakob Viketoft < jakob.viket...@aacmicrotec.com> wrote: > > *From:* Joel Sherrill [j...@rtems.org] > *Sent:* Thursday, October 13, 2016 17:38 > *To:* Jakob Viketoft > *Cc:* devel@rtems.org > *Subject:* Re: Time spent in ticks... > > >I don't have an or1k handy so ran on a sparc/erc32 simulator/ > >It is is a SPARC v7 at 15 Mhz. > > >These times are in microseconds and based on the tmtests. > >Specifically tm08and tm27. > > >(1) rtems_clock_tick: only case - 52 > >(2) rtems interrupt: entry overhead returns to interrupted task - 12 > >(3) rtems interrupt: exit overhead returns to interrupted task - 4 > >(4) rtems interrupt: entry overhead returns to nested interrupt - 11 > >(5) rtems interrupt: exit overhead returns to nested interrupt - 3 > > The above was from the master with SMP enabled. I repeated it with SMP disabled and it had no impact. Since the timing change is post 4.11, I decided to try 4.11 with SMP disabled: rtems_clock_tick: only case - 42 rtems interrupt: entry overhead returns to interrupted task - 11 rtems interrupt: exit overhead returns to interrupted task - 4 rtems interrupt: entry overhead returns to nested interrupt - 11 rtems interrupt: exit overhead returns to nested interrupt - 3 So 42 + 12 + 4 = 58 microseconds, 58 * 15 = 870 cycles So the overhead has gone up some but as Pavel says it is quite likely some mathematical operation on 64 bit types is slow on your CPU. HINT: If you can write a benchmark for 64-bit operations, it would be a good comparison between CPUs and might highlight where the software implementation needs improvement. > >The clock tick test has 100 tasks but it looks like they are blocked on a > semaphore > >without timeout. > > >Your times look WAY too high. Maybe the interrupt is stuck on and > >not being cleared. > > >On the erc32, a nominal "nothing to do clock tick" would be 1+2+3 from > >above or 52+12+4 = 68 microseconds. 68 * 15 = 1020 machine cycles. 
> >So at a higher clock rate, it should be even less time. > > >My gut feeling is that I think something is wrong with the ISR handler > >and it is stuck. But the performance is definitely way too high. > > >--joel > > (Sorry if the format got somewhat I garbled, anything but top-posting have > to be done manually...) > > I re-tested my case using an -O3 optimization (we have been using -O0 > during development for debugging purposes) and I got a good performance > boost, but I'm still nowhere near your numbers. I can vouch for that the > interrupt (exception really) isn't stuck, but that the code unfortunately > takes a long time to compute. I have a subsecond counter (1/16 of a second) > which I'm sampling at various places in the code, storing its numbers to a > buffer in memory so as to interfere with the program as little as possible. > > With -O3, a tick handling still takes ~320 us to perform, but the weight > has now shifted. tc_windup takes ~214 us and the rest is obviously > _Watchdog_Tick(). When fragmenting the tc_windup function to find the worst > speed bumps the biggest contribution (~122 us) seem to be coming from scale > factor recalculation. Since it's 64 bits, it's turned into a software > function which can be quite time-consuming apparently. > > Even though _Watchdog_Tick() "only" takes ~100 us now, it still sound much > higher than your total tick with a slower system (we're running at 50 MHz). > > Is there anything we can do to improve these numbers? Is Clock_isr > intended to be run uninterrupted as it is now? Can't see that much of the > BSP patch code has anything to do with the speed of what I'm looking at > right now... > > /Jakob > > > > *Jakob Viketoft *Senior Engineer in RTL and embedded software > > ÅAC Microtec AB > Dag Hammarskjölds väg 48 > SE-751 83 Uppsala, Sweden > > T: +46 702 80 95 97 > http://www.aacmicrotec.com > ___ devel mailing list devel@rtems.org http://lists.rtems.org/mailman/listinfo/devel
Re: Time spent in ticks...
Hello Jakob, On Thursday 13 of October 2016 18:21:05 Jakob Viketoft wrote: > I re-tested my case using an -O3 optimization (we have been using -O0 > during development for debugging purposes) and I got a good performance > boost, but I'm still nowhere near your numbers. I can vouch for that the > interrupt (exception really) isn't stuck, but that the code unfortunately > takes a long time to compute. I have a subsecond counter (1/16 of a second) > which I'm sampling at various places in the code, storing its numbers to a > buffer in memory so as to interfere with the program as little as possible. > > With -O3, a tick handling still takes ~320 us to perform, but the weight > has now shifted. tc_windup takes ~214 us and the rest is obviously > _Watchdog_Tick(). When fragmenting the tc_windup function to find the worst > speed bumps the biggest contribution (~122 us) seem to be coming from scale > factor recalculation. Since it's 64 bits, it's turned into a software > function which can be quite time-consuming apparently. > > Even though _Watchdog_Tick() "only" takes ~100 us now, it still sound much > higher than your total tick with a slower system (we're running at 50 MHz). > > Is there anything we can do to improve these numbers? Is Clock_isr intended > to be run uninterrupted as it is now? Can't see that much of the BSP patch > code has anything to do with the speed of what I'm looking at right now... the time measurement and the timer queues use 64-bit types for time representation. When higher time-measurement resolution than the tick is requested, this is a reasonable (optimal) choice, but it can be a problem for 16-bit CPUs and some 32-bit ones as well. How have you configured the or1k CPU? Do you have a hardware multiplier and barrel shifter available, or only shift-by-one and multiplication in SW? Do the CFLAGS match the available instructions? I am not sure whether there is a 64-bit division in the time computation either. That would be a killer for your CPU. 
High-resolution time sources, and even tickless timer support, can be implemented with full scaling and adjustment using only shifts, additions and multiplications in the hot paths. I tried to understand the actual RTEMS timekeeping code some time ago when nanosleep was introduced, and I tried to analyze it, propose some changes, and compare it to Linux. See the thread following these messages: https://lists.rtems.org/pipermail/devel/2016-August/015720.html https://lists.rtems.org/pipermail/devel/2016-August/015721.html Some of the discussed changes to nanosleep have been implemented already. Generally, try to measure how many times multiplication and division are called in the ISR. I think I am capable of designing an implementation restricted to mul, add and shr that minimizes the number of transformations, but if it is found that the RTEMS implementation needs to be optimized/changed, then it can be a task counted in man-months. Generally, if the tick interrupt lasts more than 10 (maybe 20) usec, then there is a problem. One source can be inefficiency in the SW implementation; another is that the OS-selected, and possibly application-required, features are beyond the selected CPU's capabilities. Best wishes, Pavel
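Pavel's point that the hot path can avoid division entirely is the standard fixed-point trick: pay for one 64-bit division at configuration time, then convert counter deltas with a multiply and a shift. The sketch below is illustrative only; the names are invented and it is not the actual RTEMS or FreeBSD timecounter code. The comment on the delta range also hints at one reason real timecounters re-base ("wind up") periodically.

```c
#include <stdint.h>

/* Hypothetical sketch: precompute a 32.32 fixed-point "nanoseconds per
 * counter tick" factor once, so the conversion hot path needs only a
 * multiply and a shift. Not the actual RTEMS/FreeBSD implementation. */
typedef struct {
    uint64_t scale; /* ns per tick, 32.32 fixed point */
} timeconv;

/* Slow path: run once per clock (re)configuration; the only division. */
static void timeconv_init(timeconv *tc, uint32_t counter_hz)
{
    tc->scale = (1000000000ULL << 32) / counter_hz;
}

/* Hot path: multiply and shift only. The product must fit in 64 bits,
 * so delta_ticks is limited to roughly 2^64 / scale (a few seconds'
 * worth at 50 MHz) -- one reason timecounters re-base periodically. */
static uint64_t timeconv_to_ns(const timeconv *tc, uint32_t delta_ticks)
{
    return ((uint64_t)delta_ticks * tc->scale) >> 32;
}
```

For a 50 MHz counter the scale is exactly 20 ns/tick, so `timeconv_to_ns` turns 50,000,000 elapsed ticks into 1,000,000,000 ns with no division at all.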
RE: Time spent in ticks...
From: Joel Sherrill [j...@rtems.org] Sent: Thursday, October 13, 2016 17:38 To: Jakob Viketoft Cc: devel@rtems.org Subject: Re: Time spent in ticks... >I don't have an or1k handy so ran on a sparc/erc32 simulator/ >It is is a SPARC v7 at 15 Mhz. >These times are in microseconds and based on the tmtests. >Specifically tm08and tm27. >(1) rtems_clock_tick: only case - 52 >(2) rtems interrupt: entry overhead returns to interrupted task - 12 >(3) rtems interrupt: exit overhead returns to interrupted task - 4 >(4) rtems interrupt: entry overhead returns to nested interrupt - 11 >(5) rtems interrupt: exit overhead returns to nested interrupt - 3 >The clock tick test has 100 tasks but it looks like they are blocked on a >semaphore >without timeout. >Your times look WAY too high. Maybe the interrupt is stuck on and >not being cleared. >On the erc32, a nominal "nothing to do clock tick" would be 1+2+3 from >above or 52+12+4 = 68 microseconds. 68 * 15 = 1020 machine cycles. >So at a higher clock rate, it should be even less time. >My gut feeling is that I think something is wrong with the ISR handler >and it is stuck. But the performance is definitely way too high. >--joel (Sorry if the format got somewhat garbled; anything but top-posting has to be done manually...) I re-tested my case using -O3 optimization (we have been using -O0 during development for debugging purposes) and I got a good performance boost, but I'm still nowhere near your numbers. I can vouch that the interrupt (exception, really) isn't stuck, but that the code unfortunately takes a long time to compute. I have a subsecond counter (1/16 of a second) which I'm sampling at various places in the code, storing its numbers to a buffer in memory so as to interfere with the program as little as possible. With -O3, tick handling still takes ~320 us to perform, but the weight has now shifted. tc_windup takes ~214 us and the rest is obviously _Watchdog_Tick(). 
When fragmenting the tc_windup function to find the worst speed bumps, the biggest contribution (~122 us) seems to be coming from the scale factor recalculation. Since it's 64 bits, it's turned into a software function which can apparently be quite time-consuming. Even though _Watchdog_Tick() "only" takes ~100 us now, it still sounds much higher than your total tick on a slower system (we're running at 50 MHz). Is there anything we can do to improve these numbers? Is Clock_isr intended to be run uninterrupted as it is now? I can't see that much of the BSP patch code has anything to do with the speed of what I'm looking at right now... /Jakob Jakob Viketoft Senior Engineer in RTL and embedded software ÅAC Microtec AB Dag Hammarskjölds väg 48 SE-751 83 Uppsala, Sweden T: +46 702 80 95 97 http://www.aacmicrotec.com
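For context on where the ~122 us likely goes: a FreeBSD-style timecounter recomputes, on every windup, a 64-bit scale factor of roughly 2^64 divided by the counter frequency. The sketch below is a simplified, hypothetical rendition (the real tc_windup also folds in NTP-style adjustment terms); the single 64-bit divide is exactly the operation that becomes a slow software routine (e.g. libgcc's __udivdi3) on a CPU without a hardware divider.

```c
#include <stdint.h>

/* Simplified sketch of a tc_windup()-style scale recalculation; the
 * real code also applies clock adjustment before dividing. The 64-bit
 * division below is the expensive part on soft-divide CPUs. */
static uint64_t recompute_scale(uint32_t counter_frequency)
{
    /* Approximate 2^64 / frequency as (2^63 / frequency) * 2 so the
     * intermediate value stays representable in 64 bits. */
    uint64_t scale = (1ULL << 63) / counter_frequency;
    return scale * 2;
}
```

Since the frequency only changes when the clock hardware is reconfigured, one mitigation direction is caching this result rather than re-deriving it every tick.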
Re: Time spent in ticks...
On Thu, Oct 13, 2016 at 3:51 AM, Jakob Viketoft < jakob.viket...@aacmicrotec.com> wrote: > Hello everyone, > > We're running on an or1k-based BSP off of 4.11 (with the patches I've > forwarded in February last year) and have seen some strange sluggishness in > the system. When measuring using a standalone peripheral clock, I can see > that we spend between 0.8 - 1.4 ms just handling the tick. This sounds a > bit absurd to me and I just wanted to send out a couple of questions to see > if anyone has an inkling of what is going on. I haven't been able to test > with the or1k-simulator (and the generic_or1k BSP) as it won't easily > compile with a newer gcc, but I'm running on real hardware. The patches I > made don't sound like big hold-ups to me either, but a second pair of eyes > is of course always welcome. > > To the questions: > 1. On the or1k-cpu RTEMS bsp, timer ticks are using the cpu-internal > timer, which when timing out results in a timer exception. Clock_isr is > installed as the exception handler for this and thus have complete control > of the cpu for it's duration. Is this as the Clock_isr is intended to run, > i.e. no other tasks or interrupts are allowed during tick handling? Just > want to make sure there is no mismatch between the or1k setup in RTEMS and > how Clock_isr is intended to run. > > 2. Running a very simple test application with three tasks, I delved into > the _Timecounter_Tick part of the Clock_isr and I have seen the tc_windup() > is using ~340 us quite consistently and _Watchdog_Tick() is using ~630 when > all tasks are started. What numbers can be seen at other systems, i.e. what > should I expect as normal here? Any ideas on what can be wrong? I'll keep > digging and try to discern any individual culprits as well. > > I don't have an or1k handy, so I ran on a sparc/erc32 simulator. It is a SPARC v7 at 15 MHz. These times are in microseconds and based on the tmtests, specifically tm08 and tm27. 
(1) rtems_clock_tick: only case - 52 (2) rtems interrupt: entry overhead returns to interrupted task - 12 (3) rtems interrupt: exit overhead returns to interrupted task - 4 (4) rtems interrupt: entry overhead returns to nested interrupt - 11 (5) rtems interrupt: exit overhead returns to nested interrupt - 3 The clock tick test has 100 tasks but it looks like they are blocked on a semaphore without timeout. Your times look WAY too high. Maybe the interrupt is stuck on and not being cleared. On the erc32, a nominal "nothing to do clock tick" would be 1+2+3 from above or 52+12+4 = 68 microseconds. 68 * 15 = 1020 machine cycles. So at a higher clock rate, it should be even less time. My gut feeling is that I think something is wrong with the ISR handler and it is stuck. But the performance is definitely way too high. --joel > Oh, and we use 1 as base for the tick quantum. > > (If anyone is interested in looking at our code, bsps and toolchains can > be downloaded at repo.aacmicrotec.com.) > > Best regards, > > /Jakob > > > Jakob Viketoft > Senior Engineer in RTL and embedded software > > ÅAC Microtec AB > Dag Hammarskjölds väg 48 > SE-751 83 Uppsala, Sweden > > T: +46 702 80 95 97 > http://www.aacmicrotec.com > ___ > devel mailing list > devel@rtems.org > http://lists.rtems.org/mailman/listinfo/devel > ___ devel mailing list devel@rtems.org http://lists.rtems.org/mailman/listinfo/devel
Re: [PATCH 3/3] libchip/network/if_fxp.c: do not use rtems_interrupt_disable.
Some typo corrections to an e-mail written after I returned late at night from a meeting with friends. And some more clarification as well. On Thursday 13 of October 2016 01:55:30 Pavel Pisa wrote: > Hello Chris, > > On Wednesday 12 of October 2016 23:05:30 Chris Johns wrote: > > On 13/10/2016 03:22, Pavel Pisa wrote: > > > But RTEMS i8259 support has been broken to disable > > > vector for level triggered interrupts in generic > > > IRQ processing code. > > > > I am not sure where the blame should be placed. We need to disable at > > the PIC when using libbsd with shared PCI interrupts because an > > interrupt server is used that is common to a few architectures. Some > > legacy drivers like this one assume processing inside the interrupt > > context. It is not clear to me shared interrupts were ever supported > > with these drivers. I would assume it means some type of per driver > > interrupt chaining. > > > > > So I have introduced reenable > > > bsp_interrupt_vector_enable to ensure that driver > > > can work even with that setup. > > > > I am not sure we can mix both models without some changes. > > I hope that interrupt server should work after the committed change. > At least, I have feeling that it has been outcome of previous debate. > > The IRQ server bsp_interrupt_server_trigger() disables given IRQ > vector on PIC level in hardware IRQ context by > bsp_interrupt_vector_disable() > > See > > > https://git.rtems.org/rtems/tree/c/src/lib/libbsp/shared/src/irq-server.c#n >64 > > I would not push the changes if it has not be a case. > > > > classic networking: adapt FXP driver to work with actual PCI and IRQ > > > code. > > > > > > The hack is not required after > > > > Which hack? > > The reenabling of PIC level ISR in driver code. Generally I consider > the functions bsp_interrupt_vector_disable() and > bsp_interrupt_vector_enable() should be used as paired and use should allow > to use them even > if implemented as counting disable clock> . 
spellchecker ... s/clock/lock/ The counting of disable calls is required if the vector is shared: if multiple hard IRQ handlers need to disable the source at the controller level (generally bad practice; handling at the device level is better if possible), and the source is re-enabled at the controller level when the first worker thread finishes processing while the cause of the shared level-triggered IRQ is another device whose threaded handler has not finished yet, then there is a complete system dead/livelock. Linux provides the following functions to maintain controller-side interrupt disable and enable https://www.kernel.org/doc/htmldocs/kernel-api/hardware.html#idp11592384 disable_irq - this function guarantees that after it finishes, the corresponding IRQ source handlers are not invoked and are not running in parallel (it waits for them to finish). This function cannot be called from the handler itself (deadlock). disable_irq_nosync - disables the vector at the controller level; it does not guarantee that the last actually running instances of each of the shared handlers have finished before the call returns. This can be called for a source from its own hard-context handler. enable_irq - undoes the effect of the corresponding disable_irq; it is necessary to call it as many times as disable_irq has been called before. Only when all calls are balanced does the controller enable the source. I think that when all possible HW and SW constellations have to be supported, this is the only usable API. And yes, there are strange things in the world. I once debugged my CAN driver over e-mail at another university, where I found that a PCI card had a level-triggered IRQ output, but multiple CAN controllers connected to the local bus behind the card's PCI bridge shared interrupts, and the bridge asserted the interrupt only on the rising edge of the shared signal. So if PCI interrupt processing finished without being sure that all chips behind the bridge had their outputs inactive, then the device, and CAN control/monitoring inside some intelligent van on the street, was lost. 
Fortunately not without driver at that time. > That is implementation where bsp_interrupt_vector_enable() enables vector > only after same number of calls as was the number of calls > bsp_interrupt_vector_disable() > > > > bsps/i386: Separate variable for i8259 IRQs disable due to in progress > > state. > > > > > > so I have removed unneeded reenable from daemon hot path. > > > I have left it in the setup to be sure that it is enabled > > > after some driver stop start cycles. > > > > > > In theory, this occurrence should be deleted as well. > > > > > > Generally, I am not sure if/how much I have broken/I am > > > breaking i386 support by all these changes. > > > > I have not tested the i386 with libbsd with your recent changes. I will > > see what I can do. I did not notice the enables/disables had been > > changed. > > > > > I believe
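The "counting disable lock" semantics described above (mirroring the Linux disable_irq_nosync()/enable_irq() pairing) can be modelled in a few lines. This is a toy model, not an existing RTEMS API: the struct and function names are invented, and the enabled_at_pic flag merely stands in for the controller's mask bit.

```c
#include <assert.h>

/* Toy model of a counted interrupt-disable lock for a shared vector.
 * Hypothetical names; not an actual RTEMS or Linux structure. */
struct irq_line {
    unsigned depth;     /* number of outstanding disables */
    int enabled_at_pic; /* models the controller's mask bit (1 = unmasked) */
};

static void vector_disable(struct irq_line *l)
{
    if (l->depth++ == 0)
        l->enabled_at_pic = 0; /* first disable masks at the PIC */
}

static void vector_enable(struct irq_line *l)
{
    assert(l->depth > 0);      /* unbalanced enable is a bug */
    if (--l->depth == 0)
        l->enabled_at_pic = 1; /* last balancing enable unmasks */
}
```

With two shared handlers each calling vector_disable(), the source stays masked at the controller until both have called vector_enable(), which is exactly what prevents the livelock described for shared level-triggered interrupts.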
Time spent in ticks...
Hello everyone, We're running on an or1k-based BSP off of 4.11 (with the patches I've forwarded in February last year) and have seen some strange sluggishness in the system. When measuring using a standalone peripheral clock, I can see that we spend between 0.8 and 1.4 ms just handling the tick. This sounds a bit absurd to me, and I just wanted to send out a couple of questions to see if anyone has an inkling of what is going on. I haven't been able to test with the or1k simulator (and the generic_or1k BSP) as it won't easily compile with a newer gcc, but I'm running on real hardware. The patches I made don't sound like big hold-ups to me either, but a second pair of eyes is of course always welcome. To the questions: 1. On the or1k-cpu RTEMS BSP, timer ticks use the CPU-internal timer, which when timing out results in a timer exception. Clock_isr is installed as the exception handler for this and thus has complete control of the CPU for its duration. Is this how the Clock_isr is intended to run, i.e. no other tasks or interrupts are allowed during tick handling? I just want to make sure there is no mismatch between the or1k setup in RTEMS and how Clock_isr is intended to run. 2. Running a very simple test application with three tasks, I delved into the _Timecounter_Tick part of the Clock_isr, and I have seen that tc_windup() uses ~340 us quite consistently and _Watchdog_Tick() uses ~630 us when all tasks are started. What numbers can be seen on other systems, i.e. what should I expect as normal here? Any ideas on what can be wrong? I'll keep digging and try to discern any individual culprits as well. Oh, and we use 1 as the base for the tick quantum. (If anyone is interested in looking at our code, bsps and toolchains can be downloaded at repo.aacmicrotec.com.) 
Best regards, /Jakob Jakob Viketoft Senior Engineer in RTL and embedded software ÅAC Microtec AB Dag Hammarskjölds väg 48 SE-751 83 Uppsala, Sweden T: +46 702 80 95 97 http://www.aacmicrotec.com