I've tested your patch on x86_64 and ARM (I picked the weakest HW I
have). I've gathered some statistics:
A "good" tuner here means one that returns bigger chunks on
average and generally consumes fewer CPU cycles with a vanilla (not
patched) VDR.
CPU usage means the CPU usage of the TS buffer thread as reported by top -d2.
Bit rate figures are coarse averages.
I didn't include dumps for the unconditional delay. They are "smooth",
like the 8 and 15 Mbit/s ARM figures. CPU usage on x86_64 is typically
1.5-2 times lower.
1. Small reads are really expensive and must be avoided.
2. "Good" tuners are not that "good" at higher bit rates.
3. A "non-delayed" read() (the last read() in the sequence
poll()->read()->sleep()->poll()->read()->poll()->read()) is very
likely to return a small chunk.
With bit rates higher than 30 Mbit/s I got device buffer overflows on
ARM: the ring buffer processing couldn't catch up.
So, generally, your patch solves the issue, but it is not as efficient
as an unconditional delay. Compared to the unconditional patch:
1. CPU usage is higher and not as steady.
2. The read() sizes are not smooth either.
Those CPU usage percentages (without the patch) may look small, but
they amount to 20-35% of the whole VDR CPU usage, and they are several
times lower with the patch, conditional or not.
I understand that an unconditional delay here looks a bit scary. How
about increasing the threshold value to, say, 100000, or 500 * TS_SIZE,
or even higher? In other words, treat a delayed read() as the normal
operation, and a non-delayed one as the emergency case. I can test it on
x86_64 with 50-60 Mbit/s. What do you think?
2016-03-10 12:41 GMT+03:00 Klaus Schmidinger <klaus.schmidin...@tvdr.de>:
> On 10.03.2016 02:54, glenvt18 wrote:
>> Hi folks,
>> I've found that with some DVB tuner drivers poll() returns when there
>> is only a small number (1-3) of TS packets available to read(). This
>> results in high CPU usage of the TS buffer thread, which is busy
>> reading small chunks of data in cTSBuffer::Action(). 2 out of 3 tuners
>> I tested were affected by this issue. Even with a "good" tuner, the TS
>> buffer thread's CPU usage is up to 10% per HD stream on ARM Cortex-A7.
>> With the proposed patch it is below 1-2% on all ARM and x86_64
>> platforms I've tested. The delay value of 10 ms can be considered
>> safe. The DVB driver defines the device buffer size as:
>> #define DVR_BUFFER_SIZE (10*188*1024)
>> It would take a tuner receiving (10*188*1024)*8*(1000/10) / 1000000 =
>> 1540 Mbit/s to overflow the device buffer within the 10 ms interval. A
>> smaller delay is not enough for ARM. cDvbDevice has a ring buffer of
>> 5 MB, which is larger still.
>> This patch was made against VDR 2.3.1, but it can be applied to VDR
>> 2.2.0 as well.
>> Please review.
>> Index: b/device.c
>> --- a/device.c 2015-09-18 01:04:12.000000000 +0300
>> +++ b/device.c 2016-03-10 03:38:50.078400715 +0300
>> @@ -1768,6 +1768,8 @@
>> + else
>> + cCondWait::SleepMs(10);
> I'm not too fond of the idea of introducing an additional, unconditional
> wait here. The actual problem should be fixed within the drivers.
> However, maybe waiting in case there is only a small number of TS packets
> available is acceptable:
> --- device.c 2015/09/05 11:42:17 4.2
> +++ device.c 2016/03/10 09:34:11
> @@ -1768,6 +1768,8 @@
> + else if (r < MIN_TS_PACKETS_FOR_FRAME_DETECTOR * TS_SIZE)
> + cCondWait::SleepMs(10);
> The number MIN_TS_PACKETS_FOR_FRAME_DETECTOR * TS_SIZE is just a random pick
> to avoid using a concrete literal number here.
> Can you please test if this still solves your problem?
> vdr mailing list