Re: [DE] Re: [CN] Re: [DE] Re: coda: i.MX6 decoding performance issues for multi-streaming
Hi Javier,

On Mon, 2018-04-23 at 11:29 +0200, Javier Martin wrote:
> Sorry for resurrecting this thread, but I'm still quite interested in
> making this scenario work:
>
> > OK, I've performed some tests with several resolutions and GOP sizes,
> > here is the table with the results:
> >
> > Always playing 3 streams
> >
> > | Resolution | QP | GopSize | Kind of content | Result |
> > | 640x368    | 25 | 16      | Waving hands    | time shifts, no DEC_PIC_SUCCESS |
> > | 640x368    | 25 | 0       | Waving hands    | time shifts, no DEC_PIC_SUCCESS |
> > | 320x192    | 25 | 0       | Waving hands    | time shifts, no DEC_PIC_SUCCESS |
> > | 320x192    | 25 | 16      | Waving hands    | time shifts, no DEC_PIC_SUCCESS |
> > | 1280x720   | 25 | 16      | Waving hands    | macroblock artifacts and lots of DEC_PIC_SUCCESS messages |
> > | 1280x720   | 25 | 0       | Waving hands    | Surprisingly smooth, no artifacts, time shifts or DEC_PIC_SUCCESS |
> >
> > * The issue always happens in the first stream, the other 2 streams
> >   are fine.
> > * With GopSize = 0 I can even decode 4 720p streams with no artifacts.
> >
> > It looks like for small resolutions the decoder suffers from time shifts
> > when multi-streaming, always affecting the first stream for some reason.
> > In this case GOP size doesn't seem to make any difference.
> >
> > For higher resolutions like 720p, using GopSize = 0 seems to improve
> > things a lot.

I've tried to reproduce this with GStreamer 1.14.0:

  gst-launch-1.0 filesrc location=test_720p.mp4 ! qtdemux ! h264parse ! tee name=t \
    t. ! v4l2h264dec ! fakesink \
    t. ! v4l2h264dec ! fakesink \
    t. ! v4l2h264dec ! fakesink \
    t. ! v4l2h264dec ! fakesink

I tried it with sync=false and sync=true, with waylandsink instead of
fakesink, and with various streams, all the same or all different:

  gst-launch-1.0 \
    filesrc location=a.mp4 ! qtdemux ! h264parse ! v4l2h264dec ! fakesink \
    filesrc location=b.mp4 ! qtdemux ! h264parse ! v4l2h264dec ! fakesink \
    filesrc location=c.mp4 ! qtdemux ! h264parse ! v4l2h264dec ! fakesink \
    filesrc location=d.mp4 ! qtdemux ! h264parse ! v4l2h264dec ! fakesink

I can't seem to trigger the DEC_PIC_SUCCESS issue with this setup, using
files pre-encoded by CODA. The result is the same when I split this into a
UDP sender and receiver via RTP:

  gst-launch-1.0 filesrc location=test_720p.mp4 ! qtdemux ! h264parse ! rtph264pay ! udpsink host=10.0.0.1 port=12345

  gst-launch-1.0 udpsrc port=12345 ! application/x-rtp,payload=96 ! rtph264depay ! h264parse ! tee name=t \
    t. ! v4l2h264dec ! fakesink \
    t. ! v4l2h264dec ! fakesink \
    t. ! v4l2h264dec ! fakesink \
    t. ! v4l2h264dec ! fakesink

Could you try to recreate the issue either with GStreamer or with a simple
test program that I can look at? Alternatively, could you provide a test
stream that triggers the issue somewhere for me to download?

> Philipp, you mentioned some possible issue with context switches in a
> previous e-mail:
>
> > I fear this may be some interaction between coda context switches and
> > bitstream reader unit state.
>
> Philipp, do these results confirm your theory? Are there any more tests
> I could prepare to help get to the bottom of this, or is this something
> that belongs entirely to the coda firmware domain? Does anyone know if
> the official BSP from NXP is able to decode 4 streams without issues?

I still have no idea. Maybe print coda_get_bitstream_payload(ctx) when the
DEC_PIC_SUCCESS error is emitted, to check whether this could be some kind
of buffer underrun issue. I assume you are not dropping any buffers.

regards
Philipp
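To vary the number of parallel decoders without editing the command each time, the tee pipeline above can be generated by a small POSIX shell helper. This is a sketch only. The function name tee_pipeline is made up. Like the original command, it puts no queue elements after the tee. It also assumes the file name contains no spaces, because the output is meant to be word-split into gst-launch-1.0 arguments.

```shell
# Print a pipeline description that demuxes and parses one H.264 file once,
# then feeds it to N v4l2h264dec instances through a tee.
#   $1 = input file
#   $2 = number of decoder branches
#   $3 = sink element plus properties (optional, defaults to "fakesink")
# Hypothetical helper; note that plain POSIX sh has no "local", so the
# variables below are global.
tee_pipeline() {
	file=$1
	n=$2
	sink=${3:-fakesink}
	desc="filesrc location=$file ! qtdemux ! h264parse ! tee name=t"
	i=0
	while [ "$i" -lt "$n" ]; do
		desc="$desc t. ! v4l2h264dec ! $sink"
		i=$((i + 1))
	done
	printf '%s\n' "$desc"
}
```

Usage, for example with four decoders and unsynchronized sinks: `gst-launch-1.0 $(tee_pipeline test_720p.mp4 4 'fakesink sync=false')`. The command substitution is deliberately left unquoted so that the description is split into separate arguments. gst-launch-1.0 joins those arguments back together.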
Re: [DE] Re: [CN] Re: [DE] Re: coda: i.MX6 decoding performance issues for multi-streaming
Sorry for resurrecting this thread, but I'm still quite interested in
making this scenario work:

> OK, I've performed some tests with several resolutions and GOP sizes,
> here is the table with the results:
>
> Always playing 3 streams
>
> | Resolution | QP | GopSize | Kind of content | Result |
> | 640x368    | 25 | 16      | Waving hands    | time shifts, no DEC_PIC_SUCCESS |
> | 640x368    | 25 | 0       | Waving hands    | time shifts, no DEC_PIC_SUCCESS |
> | 320x192    | 25 | 0       | Waving hands    | time shifts, no DEC_PIC_SUCCESS |
> | 320x192    | 25 | 16      | Waving hands    | time shifts, no DEC_PIC_SUCCESS |
> | 1280x720   | 25 | 16      | Waving hands    | macroblock artifacts and lots of DEC_PIC_SUCCESS messages |
> | 1280x720   | 25 | 0       | Waving hands    | Surprisingly smooth, no artifacts, time shifts or DEC_PIC_SUCCESS |
>
> * The issue always happens in the first stream, the other 2 streams
>   are fine.
> * With GopSize = 0 I can even decode 4 720p streams with no artifacts.
>
> It looks like for small resolutions the decoder suffers from time shifts
> when multi-streaming, always affecting the first stream for some reason.
> In this case GOP size doesn't seem to make any difference.
>
> For higher resolutions like 720p, using GopSize = 0 seems to improve
> things a lot.

Philipp, you mentioned some possible issue with context switches in a
previous e-mail:

> I fear this may be some interaction between coda context switches and
> bitstream reader unit state.

Philipp, do these results confirm your theory? Are there any more tests
I could prepare to help get to the bottom of this, or is this something
that belongs entirely to the coda firmware domain? Does anyone know if
the official BSP from NXP is able to decode 4 streams without issues?
Re: [DE] Re: [CN] Re: [DE] Re: coda: i.MX6 decoding performance issues for multi-streaming
Hello Philipp,

On 14/03/18 16:11, Philipp Zabel wrote:
> Hi Javier,
>
> On Wed, 2018-03-14 at 15:35 +0100, Javier Martin wrote:
> [...]
>> The encoder is running on a different system with an older 4.1.0 kernel.
>> However, the firmware version in the code is 3.1.1 as well.
>>
>> Do you think I should try updating the system in the encoder to kernel
>> 4.15 too and see if that makes any difference?
>
> I don't think that should matter. It'd be more interesting to know whether
> GOP size has a significant influence. Does the problem also appear in
> I-frame-only streams?

OK, I've performed some tests with several resolutions and GOP sizes,
here is the table with the results:

Always playing 3 streams

| Resolution | QP | GopSize | Kind of content | Result |
| 640x368    | 25 | 16      | Waving hands    | time shifts, no DEC_PIC_SUCCESS |
| 640x368    | 25 | 0       | Waving hands    | time shifts, no DEC_PIC_SUCCESS |
| 320x192    | 25 | 0       | Waving hands    | time shifts, no DEC_PIC_SUCCESS |
| 320x192    | 25 | 16      | Waving hands    | time shifts, no DEC_PIC_SUCCESS |
| 1280x720   | 25 | 16      | Waving hands    | macroblock artifacts and lots of DEC_PIC_SUCCESS messages |
| 1280x720   | 25 | 0       | Waving hands    | Surprisingly smooth, no artifacts, time shifts or DEC_PIC_SUCCESS |

* The issue always happens in the first stream, the other 2 streams are fine.
* With GopSize = 0 I can even decode 4 720p streams with no artifacts.

It looks like for small resolutions the decoder suffers from time shifts
when multi-streaming, always affecting the first stream for some reason.
In this case GOP size doesn't seem to make any difference.

For higher resolutions like 720p, using GopSize = 0 seems to improve
things a lot.

Regards,
Javier.
Re: [CN] Re: [DE] Re: coda: i.MX6 decoding performance issues for multi-streaming
Hi Javier,

On Wed, 2018-03-14 at 15:35 +0100, Javier Martin wrote:
[...]
> The encoder is running on a different system with an older 4.1.0 kernel.
> However, the firmware version in the code is 3.1.1 as well.
>
> Do you think I should try updating the system in the encoder to kernel
> 4.15 too and see if that makes any difference?

I don't think that should matter. It'd be more interesting to know whether
GOP size has a significant influence. Does the problem also appear in
I-frame-only streams?

regards
Philipp
Re: [CN] Re: [DE] Re: coda: i.MX6 decoding performance issues for multi-streaming
Hello,

On 14/03/18 14:57, Philipp Zabel wrote:
> On Wed, 2018-03-14 at 13:05 +0100, Javier Martin wrote:
>> Sorry everyone about my previous e-mail with all the HTML garbage. Here
>> is the plain text answer instead.
>>
>> Hi Philipp,
>>
>> thanks for your answer.
>>
>> On 13/03/18 12:20, Philipp Zabel wrote:
>>> Hi Javier,
>>>
>>> On Mon, 2018-03-12 at 17:54 +0100, Javier Martin wrote:
>>>> Hi,
>>>> we have an i.MX6 Solo based board running the latest mainline kernel
>>>> (4.15.3).
>>>>
>>>> As part of our development we were measuring the decoding performance of
>>>> the i.MX6 coda chip.
>>>>
>>>> For that purpose we are feeding the decoder with 640x368 @ 30fps H.264
>>>> streams that have been generated by another i.MX6 coda encoder
>>>> configured with fixed qp = 25 and gopsize = 16.
>
> Those are the defaults. Is the encoder running on the same system, at the
> same time? Or are you decoding a previously encoded stream (multiple
> previously encoded streams)?

The encoder is running on a different system with an older 4.1.0 kernel.
However, the firmware version in the code is 3.1.1 as well.

Do you think I should try updating the system in the encoder to kernel
4.15 too and see if that makes any difference?

> [...]
>> I'm currently using 3.1.1 both for encoding and decoding. I think I got
>> it from the latest BSP provided by NXP. Now that you mention it the
>> driver is printing these messages at probe time which I had ignored so far:
>>
>> coda 204.vpu: Firmware code revision: 46056
>> coda 204.vpu: Initialized CODA960.
>> coda 204.vpu: Unsupported firmware version: 3.1.1
>> coda 204.vpu: codec registered as /dev/video[3-4]
>
> That is strange, commit be7f1ab26f42 ("media: coda: mark CODA960 firmware
> versions 2.3.10 and 3.1.1 as supported") was merged in v4.14.

You are right, those messages were taken from an old 4.1 kernel and not
from the latest 4.15, where they no longer appear. Sorry for the noise.

>> Do you think I should use an older version instead?
>
> Unfortunately I have no indication that this would help.
>
>> Also, do you think it would be worth trying different parameters in the
>> encoder to see how the decoder responds in those cases?
>
> Possibly. It would be interesting to know if this happens more often for
> low resolutions / low quality / static frames than high resolutions /
> high quality / high movement.

I can easily prepare a test matrix with several resolutions, QPs and
content and let you know the results. But first I'd like to know your
opinion on whether I should update the encoder to kernel 4.15 too.

Regards,
Javier.
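A test matrix like the one proposed above can be enumerated up front with a small POSIX shell loop. This is only a sketch, and several of its choices are assumptions:

- The function name test_matrix and its output format are made up.
- The resolutions and GOP sizes are the ones that appear in the results table in this thread.
- QP defaults to the fixed value of 25 used throughout.
- The content dimension is left out, because the thread does not name the clips.
- No encoder command is included, because the thread does not show how the streams were encoded.

```shell
# Print one line per encoder configuration: resolution, QP and GOP size.
# QP values may be given as arguments; without arguments only QP 25 is used.
# Hypothetical helper for planning runs; it does not encode anything.
test_matrix() {
	if [ "$#" -gt 0 ]; then
		qps=$*
	else
		qps=25
	fi
	for res in 320x192 640x368 1280x720; do
		for qp in $qps; do
			for gop in 0 16; do
				printf '%s qp=%s gop=%s\n' "$res" "$qp" "$gop"
			done
		done
	done
}
```

Each line can then be consumed by whatever drives the encoder, for example `test_matrix 25 35 | while read -r res qp gop; do ...; done`.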
Re: [DE] Re: coda: i.MX6 decoding performance issues for multi-streaming
On Wed, 2018-03-14 at 13:05 +0100, Javier Martin wrote:
> Sorry everyone about my previous e-mail with all the HTML garbage. Here
> is the plain text answer instead.
>
> Hi Philipp,
>
> thanks for your answer.
>
> On 13/03/18 12:20, Philipp Zabel wrote:
> > Hi Javier,
> >
> > On Mon, 2018-03-12 at 17:54 +0100, Javier Martin wrote:
> >> Hi,
> >> we have an i.MX6 Solo based board running the latest mainline kernel
> >> (4.15.3).
> >>
> >> As part of our development we were measuring the decoding performance of
> >> the i.MX6 coda chip.
> >>
> >> For that purpose we are feeding the decoder with 640x368 @ 30fps H.264
> >> streams that have been generated by another i.MX6 coda encoder
> >> configured with fixed qp = 25 and gopsize = 16.

Those are the defaults. Is the encoder running on the same system, at the
same time? Or are you decoding a previously encoded stream (multiple
previously encoded streams)?

[...]
> I'm currently using 3.1.1 both for encoding and decoding. I think I got
> it from the latest BSP provided by NXP. Now that you mention it the
> driver is printing these messages at probe time which I had ignored so far:
>
> coda 204.vpu: Firmware code revision: 46056
> coda 204.vpu: Initialized CODA960.
> coda 204.vpu: Unsupported firmware version: 3.1.1
> coda 204.vpu: codec registered as /dev/video[3-4]

That is strange, commit be7f1ab26f42 ("media: coda: mark CODA960 firmware
versions 2.3.10 and 3.1.1 as supported") was merged in v4.14.

> Do you think I should use an older version instead?

Unfortunately I have no indication that this would help.

> Also, do you think it would be worth trying different parameters in the
> encoder to see how the decoder responds in those cases?

Possibly. It would be interesting to know if this happens more often for
low resolutions / low quality / static frames than high resolutions /
high quality / high movement.

I fear this may be some interaction between coda context switches and
bitstream reader unit state.

regards
Philipp
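The commit mentioned above implies a simple rule: kernels from v4.14 on treat CODA960 firmware 3.1.1 as supported, and older kernels print the "Unsupported firmware version" line for it. Here is a minimal POSIX shell sketch of that check. The function name is hypothetical. The only fact it encodes is the v4.14 cut-off stated in this thread. It assumes a version string such as the one printed by uname -r.

```shell
# Print "yes" if a kernel of the given version is expected to warn about
# CODA960 firmware 3.1.1 (i.e. it predates v4.14), otherwise "no".
# Only the major and minor fields are compared; suffixes such as "-rc1"
# or local version strings are stripped.
firmware_warning_expected() {
	major=${1%%.*}
	major=${major%%[!0-9]*}
	rest=${1#*.}
	minor=${rest%%.*}
	minor=${minor%%[!0-9]*}
	if [ "$major" -gt 4 ] || { [ "$major" -eq 4 ] && [ "$minor" -ge 14 ]; }; then
		echo no
	else
		echo yes
	fi
}
```

On the board itself this would be called as `firmware_warning_expected "$(uname -r)"`. For the two kernels in this thread it prints "yes" for 4.1.0 and "no" for 4.15.3. This matches the warning appearing only in the old logs.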
Re: [DE] Re: coda: i.MX6 decoding performance issues for multi-streaming
Sorry everyone about my previous e-mail with all the HTML garbage. Here
is the plain text answer instead.

Hi Philipp,

thanks for your answer.

On 13/03/18 12:20, Philipp Zabel wrote:
> Hi Javier,
>
> On Mon, 2018-03-12 at 17:54 +0100, Javier Martin wrote:
>> Hi,
>> we have an i.MX6 Solo based board running the latest mainline kernel
>> (4.15.3).
>>
>> As part of our development we were measuring the decoding performance of
>> the i.MX6 coda chip.
>>
>> For that purpose we are feeding the decoder with 640x368 @ 30fps H.264
>> streams that have been generated by another i.MX6 coda encoder
>> configured with fixed qp = 25 and gopsize = 16.
>>
>> For 1-2 streams it works smoothly. However, when adding the 3rd stream
>> the first decoder instance starts to output this kind of error:
>>
>> DEC_PIC_SUCCESS = 2097153 -> 0x200001
>> DEC_PIC_SUCCESS = 2621441 -> 0x280001
>
> I think these might be (recoverable?) error flags, but so far I have
> never seen them myself.
> I've had reports of those occurring occasionally with certain streams
> (not encoded by coda, regardless of the number of running decoder
> instances) though.
>
> What is the coda firmware version you are using?

I'm currently using 3.1.1 both for encoding and decoding. I think I got
it from the latest BSP provided by NXP. Now that you mention it the
driver is printing these messages at probe time which I had ignored so far:

coda 204.vpu: Firmware code revision: 46056
coda 204.vpu: Initialized CODA960.
coda 204.vpu: Unsupported firmware version: 3.1.1
coda 204.vpu: codec registered as /dev/video[3-4]

Do you think I should use an older version instead?

Also, do you think it would be worth trying different parameters in the
encoder to see how the decoder responds in those cases?

Regards,
Javier.
Re: coda: i.MX6 decoding performance issues for multi-streaming
Hi Javier,

On Mon, 2018-03-12 at 17:54 +0100, Javier Martin wrote:
> Hi,
> we have an i.MX6 Solo based board running the latest mainline kernel
> (4.15.3).
>
> As part of our development we were measuring the decoding performance of
> the i.MX6 coda chip.
>
> For that purpose we are feeding the decoder with 640x368 @ 30fps H.264
> streams that have been generated by another i.MX6 coda encoder
> configured with fixed qp = 25 and gopsize = 16.
>
> For 1-2 streams it works smoothly. However, when adding the 3rd stream
> the first decoder instance starts to output this kind of error:
>
> DEC_PIC_SUCCESS = 2097153 -> 0x200001
> DEC_PIC_SUCCESS = 2621441 -> 0x280001

I think these might be (recoverable?) error flags, but so far I have
never seen them myself.
I've had reports of those occurring occasionally with certain streams
(not encoded by coda, regardless of the number of running decoder
instances) though.

What is the coda firmware version you are using?

regards
Philipp
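If these values are flags, they are easier to compare in hex and as bit positions. In hex, 2097153 is 0x200001 and 2621441 is 0x280001. Here is a small POSIX shell sketch that converts the decimal value printed by the driver and lists the set bits. It can also scan a kernel log. The function names are hypothetical. The thread does not say what any individual bit means, beyond the guess above that they might be error flags.

```shell
# Print a DEC_PIC_SUCCESS value in decimal and hex, followed by the bit
# positions (0 = least significant) that are set in it.
decode_status() {
	v=$1
	bits=""
	b=0
	while [ "$b" -lt 32 ]; do
		if [ $(( (v >> b) & 1 )) -eq 1 ]; then
			bits="$bits $b"
		fi
		b=$((b + 1))
	done
	printf '%s = 0x%x, bits set:%s\n' "$v" "$v" "$bits"
}

# Run decode_status on every "DEC_PIC_SUCCESS = <number>" found on stdin,
# e.g. the output of dmesg.
scan_log() {
	sed -n 's/.*DEC_PIC_SUCCESS = \([0-9][0-9]*\).*/\1/p' |
	while read -r val; do
		decode_status "$val"
	done
}
```

Usage: `dmesg | scan_log`. For the two reported values this gives bits 0 and 21, and bits 0, 19 and 21, respectively. Bit 0 is set in both, so only bits 19 and 21 differ from a plain status value of 1.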
Re: coda: i.MX6 decoding performance issues for multi-streaming
Hi Javier,

On Mon, Mar 12, 2018 at 1:54 PM, Javier Martin wrote:
> Hi,
> we have an i.MX6 Solo based board running the latest mainline kernel
> (4.15.3).
>
> As part of our development we were measuring the decoding performance of the
> i.MX6 coda chip.
>
> For that purpose we are feeding the decoder with 640x368 @ 30fps H.264
> streams that have been generated by another i.MX6 coda encoder configured
> with fixed qp = 25 and gopsize = 16.
>
> For 1-2 streams it works smoothly. However, when adding the 3rd stream the
> first decoder instance starts to output this kind of error:
>
> DEC_PIC_SUCCESS = 2097153 -> 0x200001
> DEC_PIC_SUCCESS = 2621441 -> 0x280001
>
> Every time one of these errors appears we can observe a weird artifact in
> the decoded video (pixelated macroblocks and/or jumps back in time).
>
> I tried looking at the original VPU lib implementation by Freescale [1], but
> they don't seem to handle these errors either. As I don't have access to any
> kind of Coda IP documentation, it's quite hard for me to perform any
> additional debugging.
>
> Has anyone experienced this kind of performance issue too? I'm open to any
> suggestions and willing to perform extra tests to get to the bottom of this.

Are you passing 'capture-io-mode=dmabuf' in your GStreamer pipeline? This
really improves the performance of video decoding.
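For reference, the dmabuf suggestion can be applied to a pipeline that decodes several files in parallel, one v4l2h264dec instance per file. The helper below generates such a pipeline. It is a sketch only:

- The function name and the SINK override are hypothetical.
- capture-io-mode is a property of the GStreamer V4L2 video decoders, but the accepted values depend on the GStreamer version. `gst-inspect-1.0 v4l2h264dec` lists them.
- This thread does not establish whether the setting affects the DEC_PIC_SUCCESS errors.
- File names are assumed to contain no spaces.

```shell
# Print one pipeline description that decodes every given H.264 MP4 file in
# its own v4l2h264dec instance with the capture queue in DMABUF mode.
# The sink defaults to fakesink; set SINK (e.g. SINK=waylandsink) to change it.
dmabuf_pipeline() {
	sink=${SINK:-fakesink}
	desc=""
	for f in "$@"; do
		desc="$desc filesrc location=$f ! qtdemux ! h264parse ! v4l2h264dec capture-io-mode=dmabuf ! $sink"
	done
	# Drop the leading space left by the first iteration.
	printf '%s\n' "${desc# }"
}
```

Usage: `gst-launch-1.0 $(dmabuf_pipeline a.mp4 b.mp4 c.mp4)`. The substitution is left unquoted so that gst-launch-1.0 receives the description as separate arguments.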