[lng-odp] [Bug 1449] odp_timer_test core dump
https://bugs.linaro.org/show_bug.cgi?id=1449

Maxim Uvarov changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|CONFIRMED                   |RESOLVED
         Resolution|---                         |FIXED
                 CC|                            |maxim.uva...@linaro.org

--- Comment #14 from Maxim Uvarov ---
f73b184 linux-generic: timer use SIGEV_THREAD_ID

-- 
You are receiving this mail because:
You are on the CC list for the bug.
___
lng-odp mailing list
lng-odp@lists.linaro.org
https://lists.linaro.org/mailman/listinfo/lng-odp
[lng-odp] [Bug 1449] odp_timer_test core dump
https://bugs.linaro.org/show_bug.cgi?id=1449

--- Comment #13 from Ivan Khoronzhuk ---
Sent a series that combines the previously mentioned updated patches:

[lng-odp] [PATCH 0/3] add warnings to improve timer usage
https://lists.linaro.org/pipermail/lng-odp/2015-November/017160.html
[lng-odp] [Bug 1449] odp_timer_test core dump
https://bugs.linaro.org/show_bug.cgi?id=1449

--- Comment #12 from Ivan Khoronzhuk ---
As was said previously, we shouldn't set the period to less than the resolution. It's incorrect and makes the example work on its own time. The actual resolution of the timer is not 1 ns; it's much coarser, since it includes the code path from thread creation until the event is sent to the queue, and it depends on the CPU frequency. It can differ from system to system and with circumstances. On my PC it was ~2 ms (I filtered out the peaks spent on scheduling, so that's the best case, as if CPU0 were isolated).

If the resolution is set to more than a jiffy, say 15 ms, with a timeout of 30 ms, the error in question is still present. That's because the Linux scheduler can switch out the worker task that is trying to set the absolute time; meanwhile CPU0 can count several ticks while the worker thread is sleeping. When the worker is back, it sees that the time has already passed and the timer cannot be set.

    tick = odp_timer_current_tick(gbls->tp);
    tick += period;
    > here we can be interrupted, for instance by the scheduler
    < here, when back, the period has expired
    odp_timer_set_abs(ttp->tim, tick, &ttp->ev);
    > here an error after the check

The only way to make it work correctly is under a kernel-style spin_lock (which is not needed in a real app) or with isolated CPUs, which is also a problem of the OS. So this is an issue of non-real-time Linux, not of the timer. Maybe it can be improved a little, but it can really be fixed only if each core (including CPU0) is isolated.

So, for this example test to work correctly, the timeout should be set to include the time spent by the scheduler on handling other tasks, etc. In my case, on a system with 4 CPUs (2 real, 2 virtual) overloaded by other tasks, it took about 4 jiffies, like:

"./odp_timer_test -p 8 -r 2"

The question is rather: why set a period shorter than your system can handle? Also, don't forget that the actual resolution of the timer directly depends on how well CPU0 is isolated, as it handles the timer notifications that update the ticks. The minimum timeout (the period set as an argument) depends on how well the CPUs of the worker threads are isolated.

The following patches, already on review, improve the situation a little. Maybe I will send some more, but it's definitely not a problem of the timer, and this bug should be closed.

[lng-odp] [PATCH] example: timer: don't set timeout less than resolution
https://lists.linaro.org/pipermail/lng-odp/2015-October/016772.html

[lng-odp] [PATCH] linux-generic: odp_timer: abort if tick is lost
https://lists.linaro.org/pipermail/lng-odp/2015-October/016773.html

[lng-odp] [PATCH] linux-generic: cpumask: exclude CPU0 from odp_cpumask_default_worker
https://lists.linaro.org/pipermail/lng-odp/2015-October/016542.html
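
The race described in comment #12 can be modelled without any ODP calls. The following is only a sketch under stated assumptions: `arm_abs`, `arm_after_stall` and `min_period_ticks` are hypothetical names, and the rule "arming fails unless the target tick is strictly ahead of the current tick" is an assumed simplification of the real "too early" check. It shows why a 30 ms period on a 15 ms resolution (2 ticks) fails once the worker loses the CPU for 2 or more ticks, and how much margin a given stall would need.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins for the timer API's return codes. */
enum arm_result { ARM_OK, ARM_TOO_EARLY };

/* Assumed rule: arming succeeds only if the target tick is still ahead. */
static enum arm_result arm_abs(uint64_t now_tick, uint64_t abs_tick)
{
	return abs_tick > now_tick ? ARM_OK : ARM_TOO_EARLY;
}

/*
 * Model of the sequence in the comment: read the current tick, add the
 * period, lose the CPU for stall_ticks, then try to arm the timer.
 */
static enum arm_result arm_after_stall(uint64_t tick_at_read, uint64_t period,
				       uint64_t stall_ticks)
{
	uint64_t target = tick_at_read + period;
	uint64_t now = tick_at_read + stall_ticks;

	return arm_abs(now, target);
}

/*
 * Smallest period, in ticks, that still lies in the future after a stall
 * of up to worst_stall_ns: ceil(stall / resolution) ticks may elapse, and
 * the target must be at least one tick beyond that.
 */
static uint64_t min_period_ticks(uint64_t worst_stall_ns,
				 uint64_t resolution_ns)
{
	assert(resolution_ns > 0);
	return (worst_stall_ns + resolution_ns - 1) / resolution_ns + 1;
}
```

For example, `arm_after_stall(100, 2, 1)` still succeeds, while a stall of 2 ticks with the same 2-tick period does not. In a real application the worst-case stall is not known in advance, so any margin computed this way is an estimate, not a guarantee.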
[lng-odp] [Bug 1449] odp_timer_test core dump
https://bugs.linaro.org/show_bug.cgi?id=1449

--- Comment #11 from Ivan Khoronzhuk ---
I will send a patch series soon with fixes and corrections that eliminate the resolution impact demonstrated by the previous comment and reduce the impact of scheduling delays.
[lng-odp] [Bug 1449] odp_timer_test core dump
https://bugs.linaro.org/show_bug.cgi?id=1449

--- Comment #8 from Ivan Khoronzhuk ---
What is the timer resolution? Could you please add the init log? In my case it's 10 ms. You are requesting 5 us. Also we should forget timer accuracy of +-1 timer tick.

I need to find some time to look a little deeper and debug it on my platform; for convenience, I still reuse the linux-generic timer API.
[lng-odp] [Bug 1449] odp_timer_test core dump
https://bugs.linaro.org/show_bug.cgi?id=1449

--- Comment #9 from Ivan Khoronzhuk ---
*should -> shouldn't. Hands always forget about it...
[lng-odp] [Bug 1449] odp_timer_test core dump
https://bugs.linaro.org/show_bug.cgi?id=1449

--- Comment #10 from Ivan Khoronzhuk ---
The following correction makes it possible to demonstrate the resolution impact visually:

[lng-odp] [PATCH v2 1/5] example: timer: print timer ticks/ns table instead of cycles/ns
https://lists.linaro.org/pipermail/lng-odp/2015-September/015351.html
[lng-odp] [Bug 1449] odp_timer_test core dump
https://bugs.linaro.org/show_bug.cgi?id=1449

Mike Holmes changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Assignee|ola.liljed...@linaro.org    |ivan.khoronz...@linaro.org

--- Comment #7 from Mike Holmes ---
Ivan,
You have been digging into the timers; can you correct this simple case?
[lng-odp] [Bug 1449] odp_timer_test core dump
https://bugs.linaro.org/show_bug.cgi?id=1449

--- Comment #6 from Mike Holmes <mike.hol...@linaro.org> ---
./odp_timer_test -p 5

odp_timer_test.c:94:test_abs_timeouts(): [6] test_timeouts
odp_timer_test.c:102:test_abs_timeouts(): [6] period 0 ticks, 5000 ns
odp_timer_test.c:102:test_abs_timeouts(): [7] period 0 ticks, 5000 ns
odp_timer_test.c:105:test_abs_timeouts(): [7] current tick 0
odp_timer_test.c:105:test_abs_timeouts(): [6] current tick 0
odp_timer_test.c:171:test_abs_timeouts(): [2] timeout, tick 0
odp_timer_test.c:132:test_abs_timeouts(): odp_timer_set_abs() failed: too early
Aborted (core dumped)
[lng-odp] [Bug 1449] odp_timer_test core dump
https://bugs.linaro.org/show_bug.cgi?id=1449

--- Comment #4 from Ola Liljedahl <ola.liljed...@linaro.org> ---
I see now that the reason for the abort and core dump is that the timeout period is smaller than the tick (the resolution of the timer). This does not make sense and is also not supported by the timer example, which aborts when the timer API returns "too early".

The solution is to add a check to the timer example to ensure that the user-specified configuration is correct. However, I don't know if we can know all the restrictions in advance; I suspect we might detect more invalid configurations in the future.
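
A rough illustration of such a check, not the actual example code: with truncating integer division, a 5000 ns period on a timer with a 10 ms resolution converts to 0 ticks, which is consistent with the "period 0 ticks, 5000 ns" line in the log from comment #6. The helper names `ns_to_ticks` and `check_timer_config` are hypothetical, and the 10 ms resolution is an assumption borrowed from comment #8.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: convert a duration to whole timer ticks (truncating). */
static uint64_t ns_to_ticks(uint64_t ns, uint64_t resolution_ns)
{
	assert(resolution_ns > 0);
	return ns / resolution_ns;
}

/*
 * Hypothetical configuration check: returns 0 if the period is usable,
 * -1 if the resolution is zero or the period is shorter than one tick.
 */
static int check_timer_config(uint64_t period_ns, uint64_t resolution_ns)
{
	if (resolution_ns == 0)
		return -1;
	if (ns_to_ticks(period_ns, resolution_ns) == 0)
		return -1;
	return 0;
}
```

With these helpers, `check_timer_config(5000, 10000000)` rejects the configuration from the failing run, while a 30 ms period on a 15 ms resolution passes. Passing this check only rules out the zero-tick case; it does not guarantee that arming the timer will always succeed on a loaded system.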
[lng-odp] [Bug 1449] odp_timer_test core dump
https://bugs.linaro.org/show_bug.cgi?id=1449

--- Comment #5 from Ola Liljedahl <ola.liljed...@linaro.org> ---
I am testing with periods 1x-2x the length of the timer resolution and still get intermittent "too early" failures. I assume this is caused by the non-determinism of Linux: the ODP threads may not execute immediately (e.g. the threads are swapped out and Linux is running something else on the CPUs).

This means that the timer example needs to be more robust. It cannot expect to have full control over the CPU and to run immediately after a timer has expired and the timeout has been delivered. "Too early" errors from odp_timer_set() should be handled gracefully and not cause an abort.
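
One possible way to handle a late wake-up gracefully, sketched here with plain integers and no ODP calls: instead of aborting when the next absolute expiry is already in the past, skip the missed periods and re-arm at the first expiry that is still ahead. The function name `next_expiry` and the rule "the target must be strictly greater than the current tick" are assumptions made for illustration; this is not how the example was eventually changed.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: given the previous expiry tick, the current tick
 * and the period (all in ticks), return the next expiry that is strictly
 * in the future. The number of periods that had to be skipped because
 * the thread woke up late is stored in *missed.
 */
static uint64_t next_expiry(uint64_t prev_expiry, uint64_t now,
			    uint64_t period, uint64_t *missed)
{
	uint64_t next;

	assert(period > 0);
	next = prev_expiry + period;
	*missed = 0;

	if (next <= now) {
		/* Woke up late: jump over every period that has already passed */
		*missed = (now - next) / period + 1;
		next += *missed * period;
	}
	return next;
}
```

A caller could log or count `missed` rather than abort, so that a busy system shows up as skipped periods in the statistics instead of a core dump.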
[lng-odp] [Bug 1449] odp_timer_test core dump
https://bugs.linaro.org/show_bug.cgi?id=1449

Mike Holmes <mike.hol...@linaro.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
     Ever confirmed|0                           |1
             Status|UNCONFIRMED                 |CONFIRMED

--- Comment #2 from Mike Holmes <mike.hol...@linaro.org> ---
(In reply to Ola Liljedahl from comment #1)
> Does this error occur all of the time for the specific configuration (e.g. 5
> CPUs?) or is this problem intermittent? I was thinking that if the test (it
> is just the timer example?) is running in a busy environment (other
> processes, VM migration etc), timeliness cannot be guaranteed and possibly
> the timer example expects some high level of timeliness.

Ola, did you have time to look at the core file I was able to produce? This is still happening in the nightly regression.
[lng-odp] [Bug 1449] odp_timer_test core dump
https://bugs.linaro.org/show_bug.cgi?id=1449

--- Comment #3 from Ola Liljedahl <ola.liljed...@linaro.org> ---
Not yet, I am sorry.
[lng-odp] [Bug 1449] odp_timer_test core dump
https://bugs.linaro.org/show_bug.cgi?id=1449

--- Comment #1 from Ola Liljedahl <ola.liljed...@linaro.org> ---
Does this error occur all of the time for the specific configuration (e.g. 5 CPUs?) or is this problem intermittent? I was thinking that if the test (it is just the timer example?) is running in a busy environment (other processes, VM migration etc), timeliness cannot be guaranteed and possibly the timer example expects some high level of timeliness.