Re: [libav-devel] [PATCH 00/38] Convert to the new bitstream reader, set 3
On 26/05/16 16:32, Ronald S. Bultje wrote:
> I think this misses the point. I tested 3 decoders, and 2 were slower. It
> looks like something between 10 and 100 decoders were changed. I could blow
> this up and say that this suggests that there may, in fact, be a
> performance problem that might significantly affect the majority of the
> decoders on a significant subset of systems.

I don't see it this way. A large speed gain (10-30%) on x86_64 (and on power8) for mezzanine formats dwarfs the fact that they are 1-5% slower on platforms (arm32, x86_32) that are slow at playing back that kind of content no matter what you do, since they have tiny I/O, IMHO.

Decoding non-mezzanine formats spends more time doing math operations, so the speed gains/slowdowns in reading bits aren't as marked, but you are welcome to spend time on testing exhaustively. Bonus points if the method is reproducible and just uses avconv and perf.

> I'm not going to make that claim just yet because I feel like I don't have
> enough data. But claiming that this only affects dnxhd/prores decoding on
> VLC on "some tablets" misses the point. I prefer Luca's approach (thanks!)
> of testing arm32, even if it takes a little bit of effort.

Well, you seem to miss the fact that I was not happy to spend more time on that, since to me the trade-off is already in the good-enough area and it took me about half a day to set up an arm32 chroot on the odroid...

For prores there is a 1% slowdown with a 0.4% uncertainty. huffyuv is either 20% or 5% slower depending on the compiler (in the worst case it takes 26s vs 21s to decode 10s of test stream). From what I can see, much of it depends on which functions the compiler chooses to inline where: on x86_32 (and x86_64) the compiler does a better job once the constraint is relaxed from force-inline to inline, while on arm32 the compiler does not seem to be as good.

On arm32 I tested h264 cavlc (that's the only codec with some bitreader impact I would consider for an arm32 platform) and there is no speed difference (0.6% difference with a 0.7% uncertainty).

I'm still convinced the set is good enough to enter the tree. Help in making huffyuv behave in a less unpredictable way is welcome independently, since it seems to be quite hairy for compilers in one way or another (see the ancient gcc used by travis or the current clang from git master).

lu
___
libav-devel mailing list
libav-devel@libav.org
https://lists.libav.org/mailman/listinfo/libav-devel
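As a reading aid for the arm32 figures quoted in the message above, here is a minimal Python sketch of how a measured change can be compared against its stated uncertainty. The helper names `relative_change` and `is_significant` are hypothetical and are not part of avconv or perf; the rule "a change counts only if it exceeds the uncertainty" is a simplifying assumption, not a statistical test used in the thread. Treating 26s as the new reader and 21s as the old one is also an assumption.

```python
def relative_change(old_seconds, new_seconds):
    """Fractional change in run time; positive means the new build is slower."""
    return (new_seconds - old_seconds) / old_seconds


def is_significant(change_pct, uncertainty_pct):
    """Assumed rule of thumb: a change is measurable only above the uncertainty."""
    return abs(change_pct) > uncertainty_pct


# prores on arm32: 1% slowdown with 0.4% uncertainty
print("prores measurable:", is_significant(1.0, 0.4))

# h264 cavlc on arm32: 0.6% difference with 0.7% uncertainty
print("h264 cavlc measurable:", is_significant(0.6, 0.7))

# huffyuv worst case: 26 s vs 21 s (assumed new vs old)
print("huffyuv change: %.1f%%" % (relative_change(21.0, 26.0) * 100))
```

Under this rule the prores slowdown sits above the noise and the h264 cavlc difference does not. The huffyuv percentage depends on which of the two timings is taken as the base, which may explain why it differs slightly from the rounded figure in the message.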
Re: [libav-devel] [PATCH 00/38] Convert to the new bitstream reader, set 3
Hi,

On Wed, May 25, 2016 at 7:38 PM, Kieran Kunhya wrote:
> On Wed, 25 May 2016 at 19:37 Vittorio Giovara wrote:
>
> > On Wed, May 25, 2016 at 10:49 AM, Ronald S. Bultje wrote:
> > > Hi,
> > >
> > > On Wed, May 25, 2016 at 9:54 AM, Luca Barbato wrote:
> > >
> > >> On 25/05/16 15:32, Ronald S. Bultje wrote:
> > >> > I agree, ARM 32bit results would be very interesting.
> > >>
> > >> The odroid images I have are arm64 only (and I'm still figuring out how
> > >> to set it up properly), if you have a fast arm32 to try you are welcome
> > >> to beat me at it =)
> > >
> > > Hm, sorry, x86-32 was the best I could do ;-). I can try to help by finding
> > > other decoders that became slower (on x86-32) maybe? I can also try to
> > > debug why decoders that became slower, actually are slower (on x86-32 - but
> > > then again you already did quite some work in that area). But I also kind
> > > of agree with Anton that x86-32 isn't exactly top priority (although
> > > Chromebooks...), arm32 is more interesting.
> >
> > Are there people processing dnxhd or prores on arm32?
> > If so, they should volunteer to run the benchmarks.
>
> VLC on some tablets I guess.

I think this misses the point. I tested 3 decoders, and 2 were slower. It looks like something between 10 and 100 decoders were changed. I could blow this up and say that this suggests that there may, in fact, be a performance problem that might significantly affect the majority of the decoders on a significant subset of systems.

I'm not going to make that claim just yet because I feel like I don't have enough data. But claiming that this only affects dnxhd/prores decoding on VLC on "some tablets" misses the point. I prefer Luca's approach (thanks!) of testing arm32, even if it takes a little bit of effort.

Ronald
Re: [libav-devel] [PATCH 00/38] Convert to the new bitstream reader, set 3
On Wed, 25 May 2016 at 19:37 Vittorio Giovara wrote:
> On Wed, May 25, 2016 at 10:49 AM, Ronald S. Bultje wrote:
> > Hi,
> >
> > On Wed, May 25, 2016 at 9:54 AM, Luca Barbato wrote:
> >
> >> On 25/05/16 15:32, Ronald S. Bultje wrote:
> >> > I agree, ARM 32bit results would be very interesting.
> >>
> >> The odroid images I have are arm64 only (and I'm still figuring out how
> >> to set it up properly), if you have a fast arm32 to try you are welcome
> >> to beat me at it =)
> >
> > Hm, sorry, x86-32 was the best I could do ;-). I can try to help by finding
> > other decoders that became slower (on x86-32) maybe? I can also try to
> > debug why decoders that became slower, actually are slower (on x86-32 - but
> > then again you already did quite some work in that area). But I also kind
> > of agree with Anton that x86-32 isn't exactly top priority (although
> > Chromebooks...), arm32 is more interesting.
>
> Are there people processing dnxhd or prores on arm32?
> If so, they should volunteer to run the benchmarks.

VLC on some tablets I guess.

Kieran
Re: [libav-devel] [PATCH 00/38] Convert to the new bitstream reader, set 3
On Wed, May 25, 2016 at 10:49 AM, Ronald S. Bultje wrote:
> Hi,
>
> On Wed, May 25, 2016 at 9:54 AM, Luca Barbato wrote:
>
>> On 25/05/16 15:32, Ronald S. Bultje wrote:
>> > I agree, ARM 32bit results would be very interesting.
>>
>> The odroid images I have are arm64 only (and I'm still figuring out how
>> to set it up properly), if you have a fast arm32 to try you are welcome
>> to beat me at it =)
>
> Hm, sorry, x86-32 was the best I could do ;-). I can try to help by finding
> other decoders that became slower (on x86-32) maybe? I can also try to
> debug why decoders that became slower, actually are slower (on x86-32 - but
> then again you already did quite some work in that area). But I also kind
> of agree with Anton that x86-32 isn't exactly top priority (although
> Chromebooks...), arm32 is more interesting.

Are there people processing dnxhd or prores on arm32?
If so, they should volunteer to run the benchmarks.
--
Vittorio
Re: [libav-devel] [PATCH 00/38] Convert to the new bitstream reader, set 3
Hi,

On Wed, May 25, 2016 at 10:59 AM, Diego Biurrun wrote:
> On Wed, May 25, 2016 at 10:49:12AM -0400, Ronald S. Bultje wrote:
> > It's not like this is to be committed tomorrow, right? So take your time,
> > not urgent...
>
> Ummm, no; it's overdue already and was slated for pushing (after far too
> many delays) a few days ago. So if you want to chip in (which you are
> welcome to), you should do it sooner rather than later.

OK, so it seems Luca is still volunteering to do the arm32 stuff. So, what can I best help with to "chip in"? I can help figure out what change in the bitstream reader makes it slower for the decoders that became slower on x86-32, or alternatively I can test more decoders and try to find more that became slower (on x86-32). Which of these is more useful?

Ronald
Re: [libav-devel] [PATCH 00/38] Convert to the new bitstream reader, set 3
On 5/25/2016 9:26 AM, Diego Biurrun wrote:
> On Wed, May 25, 2016 at 03:03:08PM +0300, Martin Storsjö wrote:
>> On Wed, 25 May 2016, Diego Biurrun wrote:
>>
>>> More seriously, can you provide (hints at) numbers? I was one of the
>>> last x86_32 holdouts, but the SSD in my notebook died and won't get
>>> replaced. The Windows box I have here gathering dust between occasional
>>> portability tests and doing taxes once per year is an ancient single
>>> core machine with Windows 7 - and it runs 64-bit Windows 7, which no
>>> longer has (basic) support from Microsoft. So what's the actual
>>> usecase? Where are those 32-bit machines and what do they do that is
>>> affected by the new bit reader?
>>
>> It's not (so much) about 32 bit machines, but more about 32 bit
>> applications. For windows, it's still very common to ship 32 bit
>> binaries only.
>
> Is VLC 64-bit on Windows? One more reason to go with VLC :)

It is, but it defaults to a 32-bit installer if you click on the big blue "Download VLC" button, and forces you to do five extra clicks browsing two extra pages to finally get a "64 bits installer" link. The result of this is that 99% of downloads will be of the 32-bit version.
Re: [libav-devel] [PATCH 00/38] Convert to the new bitstream reader, set 3
On Wed, May 25, 2016 at 10:49:12AM -0400, Ronald S. Bultje wrote:
> It's not like this is to be committed tomorrow, right? So take your time,
> not urgent...

Ummm, no; it's overdue already and was slated for pushing (after far too many delays) a few days ago. So if you want to chip in (which you are welcome to), you should do it sooner rather than later.

Diego
Re: [libav-devel] [PATCH 00/38] Convert to the new bitstream reader, set 3
Hi,

On Wed, May 25, 2016 at 9:54 AM, Luca Barbato wrote:
> On 25/05/16 15:32, Ronald S. Bultje wrote:
> > I agree, ARM 32bit results would be very interesting.
>
> The odroid images I have are arm64 only (and I'm still figuring out how
> to set it up properly), if you have a fast arm32 to try you are welcome
> to beat me at it =)

Hm, sorry, x86-32 was the best I could do ;-). I can try to help by finding other decoders that became slower (on x86-32) maybe? I can also try to debug why decoders that became slower, actually are slower (on x86-32 - but then again you already did quite some work in that area). But I also kind of agree with Anton that x86-32 isn't exactly top priority (although Chromebooks...), arm32 is more interesting.

It's not like this is to be committed tomorrow, right? So take your time, not urgent...

Ronald
Re: [libav-devel] [PATCH 00/38] Convert to the new bitstream reader, set 3
On 25/05/16 15:32, Ronald S. Bultje wrote:
> I agree, ARM 32bit results would be very interesting.

The odroid images I have are arm64 only (and I'm still figuring out how to set it up properly), if you have a fast arm32 to try you are welcome to beat me at it =)

lu
Re: [libav-devel] [PATCH 00/38] Convert to the new bitstream reader, set 3
Hi,

On Wed, May 25, 2016 at 8:20 AM, Luca Barbato wrote:
> On 25/05/16 12:31, Anton Khirnov wrote:
> > All the results quoted here seem to be from x86 and 32bit x86 is not
> > really all that relevant these days. Did anyone do any ARM tests?
>
> On power8 I'm seeing a 17% speedup for huffyuv, 10% speedup for prores,
> 4% speedup for dnxhd.
>
> Testing on ARM is going to be more contrived than for x86_32: high
> bitrate over tiny platforms mixes not so well.
>
> I'll try to get proper results from the odroid since it is the fastest
> system I have access to, but will take more time.

I agree, ARM 32bit results would be very interesting.

Thanks,
Ronald
Re: [libav-devel] [PATCH 00/38] Convert to the new bitstream reader, set 3
On Wed, May 25, 2016 at 03:03:08PM +0300, Martin Storsjö wrote:
> On Wed, 25 May 2016, Diego Biurrun wrote:
>
> > More seriously, can you provide (hints at) numbers? I was one of the
> > last x86_32 holdouts, but the SSD in my notebook died and won't get
> > replaced. The Windows box I have here gathering dust between occasional
> > portability tests and doing taxes once per year is an ancient single
> > core machine with Windows 7 - and it runs 64-bit Windows 7, which no
> > longer has (basic) support from Microsoft. So what's the actual
> > usecase? Where are those 32-bit machines and what do they do that is
> > affected by the new bit reader?
>
> It's not (so much) about 32 bit machines, but more about 32 bit
> applications. For windows, it's still very common to ship 32 bit
> binaries only.

Is VLC 64-bit on Windows? One more reason to go with VLC :)

> (For instance I'm not sure if Chrome switched to 64 bit already - if
> they did, it wasn't too long ago.)

21 months ago, according to Wikipedia.

Diego
Re: [libav-devel] [PATCH 00/38] Convert to the new bitstream reader, set 3
On 25/05/16 12:31, Anton Khirnov wrote:
> All the results quoted here seem to be from x86 and 32bit x86 is not
> really all that relevant these days. Did anyone do any ARM tests?

On power8 I'm seeing a 17% speedup for huffyuv, 10% speedup for prores, 4% speedup for dnxhd.

Testing on ARM is going to be more contrived than for x86_32: high bitrates and tiny platforms do not mix well.

I'll try to get proper results from the odroid since it is the fastest system I have access to, but it will take more time.

lu
Re: [libav-devel] [PATCH 00/38] Convert to the new bitstream reader, set 3
On Wed, 25 May 2016, Diego Biurrun wrote:

> More seriously, can you provide (hints at) numbers? I was one of the
> last x86_32 holdouts, but the SSD in my notebook died and won't get
> replaced. The Windows box I have here gathering dust between occasional
> portability tests and doing taxes once per year is an ancient single
> core machine with Windows 7 - and it runs 64-bit Windows 7, which no
> longer has (basic) support from Microsoft. So what's the actual
> usecase? Where are those 32-bit machines and what do they do that is
> affected by the new bit reader?

It's not (so much) about 32 bit machines, but more about 32 bit applications. For windows, it's still very common to ship 32 bit binaries only.

(For instance I'm not sure if Chrome switched to 64 bit already - if they did, it wasn't too long ago.)

// Martin
Re: [libav-devel] [PATCH 00/38] Convert to the new bitstream reader, set 3
Quoting Hendrik Leppkes (2016-05-25 13:36:52)
> On Wed, May 25, 2016 at 12:31 PM, Anton Khirnov wrote:
> > Quoting Luca Barbato (2016-05-25 11:35:58)
> >> On 24/05/16 22:10, Ronald S. Bultje wrote:
> >> > But 2 out of 3 are still slower. I can try to look somewhat more into this,
> >> > but I think that not understanding what makes it slower is fundamentally
> >> > flawed. If we understand why it's slower and we decide that that's OK,
> >> > that's an entirely different thing.
> >>
> >> Well, the new code is quite straightforward to read and looking at
> >> the perf report it shows that it is more execution-efficient and less
> >> prone to branch miss as expected.
> >>
> >> x86_32 is register-constrained so having a 64bit cache can have an
> >> impact if you are already in a tight loop (dnxhd), and force-inlining if
> >> you are calling many time read_vlc in the same function can be a
> >> detriment as well (huffyuv).
> >>
> >> In general on 32bit it results in more instructions and more register
> >> usage. In some cases the extra efficiency offsets that increase, in
> >> other nearly does.
> >>
> >> That said I'm not seeing much usage of high bitrate mezzanine formats on
> >> tiny systems. Codecs that aren't using the vlc reader so intensively are
> >> obviously impacted less.
> >
> > All the results quoted here seem to be from x86 and 32bit x86 is not
> > really all that relevant these days. Did anyone do any ARM tests?
>
> Unfortunately, 32-bit x86 is still quite relevant on Windows, as
> migration to full 64-bit systems is a very slow process, so any
> performance degradations in 32-bit x86 should not be brushed aside on
> an argument of relevance.

I'm not saying x86-32 should be completely ignored, but it's clearly deprecated, and it's almost always run on 64bit cpus for legacy sw reasons. So while it should continue to run reasonably well, it's not that important to spend a lot of effort to get the best possible performance there.

So I'd think performance on 32bit ARM would be more important.

--
Anton Khirnov
Re: [libav-devel] [PATCH 00/38] Convert to the new bitstream reader, set 3
On Wed, May 25, 2016 at 01:36:52PM +0200, Hendrik Leppkes wrote:
> On Wed, May 25, 2016 at 12:31 PM, Anton Khirnov wrote:
> > Quoting Luca Barbato (2016-05-25 11:35:58)
> >> On 24/05/16 22:10, Ronald S. Bultje wrote:
> >> > But 2 out of 3 are still slower. I can try to look somewhat more into this,
> >> > but I think that not understanding what makes it slower is fundamentally
> >> > flawed. If we understand why it's slower and we decide that that's OK,
> >> > that's an entirely different thing.
> >>
> >> Well, the new code is quite straightforward to read and looking at
> >> the perf report it shows that it is more execution-efficient and less
> >> prone to branch miss as expected.
> >>
> >> x86_32 is register-constrained so having a 64bit cache can have an
> >> impact if you are already in a tight loop (dnxhd), and force-inlining if
> >> you are calling many time read_vlc in the same function can be a
> >> detriment as well (huffyuv).
> >>
> >> In general on 32bit it results in more instructions and more register
> >> usage. In some cases the extra efficiency offsets that increase, in
> >> other nearly does.
> >>
> >> That said I'm not seeing much usage of high bitrate mezzanine formats on
> >> tiny systems. Codecs that aren't using the vlc reader so intensively are
> >> obviously impacted less.
> >
> > All the results quoted here seem to be from x86 and 32bit x86 is not
> > really all that relevant these days. Did anyone do any ARM tests?
>
> Unfortunately, 32-bit x86 is still quite relevant on Windows, as
> migration to full 64-bit systems is a very slow process, so any
> performance degradations in 32-bit x86 should not be brushed aside on
> an argument of relevance.

I could play devil's advocate and say "do we care?".

More seriously, can you provide (hints at) numbers? I was one of the last x86_32 holdouts, but the SSD in my notebook died and won't get replaced. The Windows box I have here gathering dust between occasional portability tests and doing taxes once per year is an ancient single core machine with Windows 7 - and it runs 64-bit Windows 7, which no longer has (basic) support from Microsoft. So what's the actual usecase? Where are those 32-bit machines and what do they do that is affected by the new bit reader?

Diego
Re: [libav-devel] [PATCH 00/38] Convert to the new bitstream reader, set 3
On Wed, May 25, 2016 at 12:31 PM, Anton Khirnov wrote:
> Quoting Luca Barbato (2016-05-25 11:35:58)
>> On 24/05/16 22:10, Ronald S. Bultje wrote:
>> > But 2 out of 3 are still slower. I can try to look somewhat more into this,
>> > but I think that not understanding what makes it slower is fundamentally
>> > flawed. If we understand why it's slower and we decide that that's OK,
>> > that's an entirely different thing.
>>
>> Well, the new code is quite straightforward to read and looking at
>> the perf report it shows that it is more execution-efficient and less
>> prone to branch miss as expected.
>>
>> x86_32 is register-constrained so having a 64bit cache can have an
>> impact if you are already in a tight loop (dnxhd), and force-inlining if
>> you are calling many time read_vlc in the same function can be a
>> detriment as well (huffyuv).
>>
>> In general on 32bit it results in more instructions and more register
>> usage. In some cases the extra efficiency offsets that increase, in
>> other nearly does.
>>
>> That said I'm not seeing much usage of high bitrate mezzanine formats on
>> tiny systems. Codecs that aren't using the vlc reader so intensively are
>> obviously impacted less.
>
> All the results quoted here seem to be from x86 and 32bit x86 is not
> really all that relevant these days. Did anyone do any ARM tests?

Unfortunately, 32-bit x86 is still quite relevant on Windows, as migration to full 64-bit systems is a very slow process, so any performance degradations in 32-bit x86 should not be brushed aside on an argument of relevance.

- Hendrik
Re: [libav-devel] [PATCH 00/38] Convert to the new bitstream reader, set 3
Quoting Luca Barbato (2016-05-25 11:35:58)
> On 24/05/16 22:10, Ronald S. Bultje wrote:
> > But 2 out of 3 are still slower. I can try to look somewhat more into this,
> > but I think that not understanding what makes it slower is fundamentally
> > flawed. If we understand why it's slower and we decide that that's OK,
> > that's an entirely different thing.
>
> Well, the new code is quite straightforward to read and looking at
> the perf report it shows that it is more execution-efficient and less
> prone to branch miss as expected.
>
> x86_32 is register-constrained so having a 64bit cache can have an
> impact if you are already in a tight loop (dnxhd), and force-inlining if
> you are calling many time read_vlc in the same function can be a
> detriment as well (huffyuv).
>
> In general on 32bit it results in more instructions and more register
> usage. In some cases the extra efficiency offsets that increase, in
> other nearly does.
>
> That said I'm not seeing much usage of high bitrate mezzanine formats on
> tiny systems. Codecs that aren't using the vlc reader so intensively are
> obviously impacted less.

All the results quoted here seem to be from x86 and 32bit x86 is not really all that relevant these days. Did anyone do any ARM tests?

--
Anton Khirnov
Re: [libav-devel] [PATCH 00/38] Convert to the new bitstream reader, set 3
On 24/05/16 22:10, Ronald S. Bultje wrote:
> But 2 out of 3 are still slower. I can try to look somewhat more into this,
> but I think that not understanding what makes it slower is fundamentally
> flawed. If we understand why it's slower and we decide that that's OK,
> that's an entirely different thing.

Well, the new code is quite straightforward to read, and the perf report shows that it is more execution-efficient and less prone to branch misses, as expected.

x86_32 is register-constrained, so having a 64bit cache can have an impact if you are already in a tight loop (dnxhd), and force-inlining can be a detriment as well if you are calling read_vlc many times in the same function (huffyuv).

In general on 32bit it results in more instructions and more register usage. In some cases the extra efficiency offsets that increase, in others it nearly does.

That said I'm not seeing much usage of high bitrate mezzanine formats on tiny systems. Codecs that aren't using the vlc reader so intensively are obviously impacted less.

lu
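To make the "64bit cache" remark in the message above concrete, here is a toy Python sketch of an MSB-first bit reader that keeps its unread bits in a 64-bit word. It is an illustration only: the class `BitReader`, its fields, and the byte-wise refill are assumptions made for this sketch and are not the libav bitstream reader or its API. The point it illustrates is that every read shifts and masks a 64-bit value, which on a 32-bit CPU has to be held in two registers.

```python
class BitReader:
    """Toy MSB-first bit reader with a 64-bit cache (hypothetical, not libav's API)."""

    CACHE_BITS = 64

    def __init__(self, data):
        self.data = data
        self.pos = 0        # index of the next byte to load into the cache
        self.cache = 0      # unread bits, left-aligned in a 64-bit word
        self.bits_left = 0  # number of valid bits currently in the cache

    def _refill(self):
        # Top up the cache one byte at a time while a whole byte still fits.
        while self.bits_left <= self.CACHE_BITS - 8 and self.pos < len(self.data):
            shift = self.CACHE_BITS - 8 - self.bits_left
            self.cache |= self.data[self.pos] << shift
            self.pos += 1
            self.bits_left += 8

    def read(self, n):
        """Return the next n bits (1 <= n <= 32) as an unsigned integer."""
        if self.bits_left < n:
            self._refill()
        if self.bits_left < n:
            raise EOFError("not enough bits left")
        value = self.cache >> (self.CACHE_BITS - n)
        # Drop the consumed bits and keep the word within 64 bits.
        self.cache = (self.cache << n) & ((1 << self.CACHE_BITS) - 1)
        self.bits_left -= n
        return value


reader = BitReader(bytes([0b10110000, 0xFF]))
print(reader.read(3), reader.read(5), reader.read(8))
```

A wide cache means fewer refills per decoded symbol, which fits the gains reported on 64-bit targets; the register pressure it adds on 32-bit targets is the trade-off the message describes.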
Re: [libav-devel] [PATCH 00/38] Convert to the new bitstream reader, set 3
On Wed, May 25, 2016 at 07:23:11AM +0200, Anton Khirnov wrote:
> Quoting Kostya Shishkov (2016-05-25 07:19:24)
> > On Tue, May 24, 2016 at 04:10:07PM -0400, Ronald S. Bultje wrote:
> > > Hi,
> > >
> > > On Tue, May 24, 2016 at 3:47 PM, Luca Barbato wrote:
> > >
> > > > On 23/05/16 17:01, Ronald S. Bultje wrote:
> > > > > Howdy,
> > > >
> > > > Interesting. I spent a bit of time on it myself.
> > > >
> > > > I run some benchmark using a yuv422 file of the right size from the
> > > > Tim's collection [directly][1] and looped/cut to have a length that
> > > > works fine (1minute and 10 minutes) and I used `perf stat -r 30` on a
> > > > system that surely has a cpu unencumbered by random process on a server,
> > > > so it does not have random quirks like a laptop one.
> > > >
> > > > The benchmark shown that force-inlining bitstream_read_vlc is not
> > > > exactly helpful on the poor constrained x86_32, and its implementation
> > > > could spare few branches.
> > > >
> > > > With that change in, looks like the gains for x86_64 get even larger.
> > > >
> > > > I get the dnxhd to be about 3% slower on x86_32 and 20% faster on x86_64.
> > >
> > > [..]
> > >
> > > > And with that I guess we are set =)
> > >
> > > But 2 out of 3 are still slower. I can try to look somewhat more into this,
> > > but I think that not understanding what makes it slower is fundamentally
> > > flawed.
> >
> > So you admit you're fundamentally flawed? Sane approach would be to study
> > proper performance tools output, like perf on Linux. Or look at generated
> > assembly and referring to instruction timings and latencies if one wants it
> > hardcore way.
> >
> > > If we understand why it's slower and we decide that that's OK,
> > > that's an entirely different thing.
> >
> > Is that royal we or "we, FFmpeg developers"? I'm pretty sure nobody has
> > elected you Libav leader.
>
> Not this shit again. Can't we have a civilized discussion like normal
> people?

So far there was not much of it. But don't mind me, I shan't bother you again.
Re: [libav-devel] [PATCH 00/38] Convert to the new bitstream reader, set 3
Quoting Kostya Shishkov (2016-05-25 07:19:24)
> On Tue, May 24, 2016 at 04:10:07PM -0400, Ronald S. Bultje wrote:
> > Hi,
> >
> > On Tue, May 24, 2016 at 3:47 PM, Luca Barbato wrote:
> >
> > > On 23/05/16 17:01, Ronald S. Bultje wrote:
> > > > Howdy,
> > >
> > > Interesting. I spent a bit of time on it myself.
> > >
> > > I run some benchmark using a yuv422 file of the right size from the
> > > Tim's collection [directly][1] and looped/cut to have a length that
> > > works fine (1minute and 10 minutes) and I used `perf stat -r 30` on a
> > > system that surely has a cpu unencumbered by random process on a server,
> > > so it does not have random quirks like a laptop one.
> > >
> > > The benchmark shown that force-inlining bitstream_read_vlc is not
> > > exactly helpful on the poor constrained x86_32, and its implementation
> > > could spare few branches.
> > >
> > > With that change in, looks like the gains for x86_64 get even larger.
> > >
> > > I get the dnxhd to be about 3% slower on x86_32 and 20% faster on x86_64.
> >
> > [..]
> >
> > > And with that I guess we are set =)
> >
> > But 2 out of 3 are still slower. I can try to look somewhat more into this,
> > but I think that not understanding what makes it slower is fundamentally
> > flawed.
>
> So you admit you're fundamentally flawed? Sane approach would be to study
> proper performance tools output, like perf on Linux. Or look at generated
> assembly and referring to instruction timings and latencies if one wants it
> hardcore way.
>
> > If we understand why it's slower and we decide that that's OK,
> > that's an entirely different thing.
>
> Is that royal we or "we, FFmpeg developers"? I'm pretty sure nobody has
> elected you Libav leader.

Not this shit again. Can't we have a civilized discussion like normal people?

--
Anton Khirnov
Re: [libav-devel] [PATCH 00/38] Convert to the new bitstream reader, set 3
On Tue, May 24, 2016 at 04:10:07PM -0400, Ronald S. Bultje wrote:
> Hi,
>
> On Tue, May 24, 2016 at 3:47 PM, Luca Barbato wrote:
>
> > On 23/05/16 17:01, Ronald S. Bultje wrote:
> > > Howdy,
> >
> > Interesting. I spent a bit of time on it myself.
> >
> > I run some benchmark using a yuv422 file of the right size from the
> > Tim's collection [directly][1] and looped/cut to have a length that
> > works fine (1minute and 10 minutes) and I used `perf stat -r 30` on a
> > system that surely has a cpu unencumbered by random process on a server,
> > so it does not have random quirks like a laptop one.
> >
> > The benchmark shown that force-inlining bitstream_read_vlc is not
> > exactly helpful on the poor constrained x86_32, and its implementation
> > could spare few branches.
> >
> > With that change in, looks like the gains for x86_64 get even larger.
> >
> > I get the dnxhd to be about 3% slower on x86_32 and 20% faster on x86_64.
>
> [..]
>
> > And with that I guess we are set =)
>
> But 2 out of 3 are still slower. I can try to look somewhat more into this,
> but I think that not understanding what makes it slower is fundamentally
> flawed.

So you admit you're fundamentally flawed? Sane approach would be to study proper performance tools output, like perf on Linux. Or look at generated assembly and referring to instruction timings and latencies if one wants it hardcore way.

> If we understand why it's slower and we decide that that's OK,
> that's an entirely different thing.

Is that royal we or "we, FFmpeg developers"? I'm pretty sure nobody has elected you Libav leader.
Re: [libav-devel] [PATCH 00/38] Convert to the new bitstream reader, set 3
Hi,

On Tue, May 24, 2016 at 3:47 PM, Luca Barbato wrote:
> On 23/05/16 17:01, Ronald S. Bultje wrote:
> > Howdy,
>
> Interesting. I spent a bit of time on it myself.
>
> I run some benchmark using a yuv422 file of the right size from the
> Tim's collection [directly][1] and looped/cut to have a length that
> works fine (1minute and 10 minutes) and I used `perf stat -r 30` on a
> system that surely has a cpu unencumbered by random process on a server,
> so it does not have random quirks like a laptop one.
>
> The benchmark shown that force-inlining bitstream_read_vlc is not
> exactly helpful on the poor constrained x86_32, and its implementation
> could spare few branches.
>
> With that change in, looks like the gains for x86_64 get even larger.
>
> I get the dnxhd to be about 3% slower on x86_32 and 20% faster on x86_64.

[..]

> And with that I guess we are set =)

But 2 out of 3 are still slower. I can try to look somewhat more into this, but I think that not understanding what makes it slower is fundamentally flawed. If we understand why it's slower and we decide that that's OK, that's an entirely different thing.

Ronald
Re: [libav-devel] [PATCH 00/38] Convert to the new bitstream reader, set 3
On 23/05/16 17:01, Ronald S. Bultje wrote:
> Howdy,

Interesting. I spent a bit of time on it myself.

I ran some benchmarks using a yuv422 file of the right size taken [directly][1] from Tim's collection, looped/cut to a length that works fine (1 minute and 10 minutes), and I used `perf stat -r 30` on a server whose CPU is not encumbered by random processes, so it does not have random quirks like a laptop would.

The benchmark showed that force-inlining bitstream_read_vlc is not exactly helpful on the poor constrained x86_32, and that its implementation could spare a few branches.

With that change in, it looks like the gains for x86_64 get even larger.

I get dnxhd to be about 3% slower on x86_32 and 20% faster on x86_64.

huffyuv gets quite an absurd boost on both arches. I tested twice, and even with 10x longer samples just in case, and I still get some 30% boost on x86_64 and about 14% on x86_32.

prores has a 1% slowdown on x86_32 and a nice 14% speedup on x86_64.

And with that I guess we are set =)

[1]: https://media.xiph.org/video/derf/y4m/old_town_cross_422_720p50.y4m

lu

PS: below some data from the benchmark script.
## dnxhd x86_64 (20% speedup) Performance counter stats for '../libav-master/avconv-new -threads 1 -nostats -v quiet -i testcase-dnxhd-1.mov -f null -' (30 runs): 20.545996490 seconds time elapsed ( +- 0.09% ) Performance counter stats for '../libav-master/avconv-no-inline -threads 1 -nostats -v quiet -i testcase-dnxhd-1.mov -f null -' (30 runs): 20.548445395 seconds time elapsed ( +- 0.04% ) Performance counter stats for '../libav-master/avconv-old -threads 1 -nostats -v quiet -i testcase-dnxhd-1.mov -f null -' (30 runs): 25.639981062 seconds time elapsed ( +- 0.06% ) x86_32 (3% slowdown) Performance counter stats for '../libav-master-x86/avconv-no-inline -threads 1 -nostats -v quiet -i testcase-dnxhd-1.mov -f null -' (30 runs): 26.258827740 seconds time elapsed ( +- 0.16% ) Performance counter stats for '../libav-master-x86/avconv-old -threads 1 -nostats -v quiet -i testcase-dnxhd-1.mov -f null -' (30 runs): 25.308070817 seconds time elapsed ( +- 0.09% ) ## huffyuv x86_64 (30% speedup) Performance counter stats for '../libav-master/avconv-new -threads 1 -nostats -v quiet -i testcase-huffyuv-1.mkv -f null -' (30 runs): 4.135589121 seconds time elapsed ( +- 0.15% ) Performance counter stats for '../libav-master/avconv-no-inline -threads 1 -nostats -v quiet -i testcase-huffyuv-1.mkv -f null -' (30 runs): 3.962414301 seconds time elapsed ( +- 0.16% ) Performance counter stats for '../libav-master/avconv-old -threads 1 -nostats -v quiet -i testcase-huffyuv-1.mkv -f null -' (30 runs): 5.695144348 seconds time elapsed ( +- 0.11% ) x86_32 (13% speedup) Performance counter stats for '../libav-master-x86/avconv-no-inline -threads 1 -nostats -v quiet -i testcase-huffyuv-1.mkv -f null -' (30 runs): 5.252524363 seconds time elapsed ( +- 0.13% ) Performance counter stats for '../libav-master-x86/avconv-old -threads 1 -nostats -v quiet -i testcase-huffyuv-1.mkv -f null -' (30 runs): 6.075662940 seconds time elapsed ( +- 0.10% ) ## prores x86_64 (14% speedup) Performance counter 
stats for '../libav-master/avconv-new -threads 1 -nostats -v quiet -i testcase-prores-1.mov -f null -' (30 runs): 23.248159307 seconds time elapsed ( +- 0.05% ) Performance counter stats for '../libav-master/avconv-no-inline -threads 1 -nostats -v quiet -i testcase-prores-1.mov -f null -' (30 runs): 23.363347522 seconds time elapsed ( +- 0.06% ) Performance counter stats for '../libav-master/avconv-old -threads 1 -nostats -v quiet -i testcase-prores-1.mov -f null -' (30 runs): 27.200361179 seconds time elapsed ( +- 0.03% ) x86_32 (1% slowdown) Performance counter stats for '../libav-master-x86/avconv-no-inline -threads 1 -nostats -v quiet -i testcase-prores-1.mov -f null -' (30 runs): 47.914316875 seconds time elapsed ( +- 0.04% ) Performance counter stats for '../libav-master-x86/avconv-old -threads 1 -nostats -v quiet -i testcase-prores-1.mov -f null -' (30 runs): 47.306030476 seconds time elapsed ( +- 0.02% ) ___ libav-devel mailing list libav-devel@libav.org https://lists.libav.org/mailman/listinfo/libav-devel
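For readers who want to check the percentages quoted in the section labels, here is a minimal Python sketch that recomputes them from the elapsed times above. It assumes that "speedup" means the share of the old wall time that was saved, and it compares the `avconv-no-inline` build against `avconv-old` on both architectures, since that is the only new build measured on x86_32. The names `ELAPSED`, `time_saved_percent` and `report` are illustrative and not part of the original benchmark script.

```python
# Elapsed seconds copied from the perf stat output in this message:
# (avconv-no-inline, avconv-old) per codec and architecture.
ELAPSED = {
    ("dnxhd", "x86_64"): (20.548445395, 25.639981062),
    ("dnxhd", "x86_32"): (26.258827740, 25.308070817),
    ("huffyuv", "x86_64"): (3.962414301, 5.695144348),
    ("huffyuv", "x86_32"): (5.252524363, 6.075662940),
    ("prores", "x86_64"): (23.363347522, 27.200361179),
    ("prores", "x86_32"): (47.914316875, 47.306030476),
}


def time_saved_percent(new_s, old_s):
    """Share of the old wall time saved by the new build, in percent.

    Positive means the new build is faster, negative means it is slower.
    """
    return (old_s - new_s) / old_s * 100.0


def report(elapsed=ELAPSED):
    """Return {(codec, arch): percent saved} for every measured pair."""
    return {key: time_saved_percent(new, old) for key, (new, old) in elapsed.items()}


if __name__ == "__main__":
    for (codec, arch), pct in report().items():
        print(f"{codec:8s} {arch}: {pct:+.1f}%")
```

With this definition the results land close to the labels in the data, for example roughly 20% for dnxhd on x86_64 and a slowdown of a few percent for dnxhd on x86_32. A different definition (such as throughput ratio) would give somewhat different numbers.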
Re: [libav-devel] [PATCH 00/38] Convert to the new bitstream reader, set 3
Howdy,

On Sun, May 22, 2016 at 5:27 AM, Alexandra Hájková <
alexandra.khirn...@gmail.com> wrote:

> > Do you have a tree for testing somewhere?
>
> Yes, there's a github branch:
> https://github.com/sasshka/libav/tree/get_bits3.
>

Thanks!

> > If I find decoders that are slower on 32bit after the patch, will you fix
> > it?
> >
> If you find something, it will be discussed with the other
> developers and potentially fixed.

So, I tested a few decoders for which it's easy to generate test files. It
looks like dnxhd got about 20% slower. avconv-new is
5ab5ff1f0783daf0924fdbd25333ea63a7faeb54 (i.e. tip of your get_bits3
branch), and avconv is 3399a26d3f57d462e839c0ee51223ae9aca20852 (branch
point of get_bits3 branch from upstream). Both are compiled using
"../configure --arch=i386 --extra-cflags='-arch i386' --extra-ldflags='-arch
i386' --enable-gpl && make -j4". Input file was generated from [1],
downsampled to 720p30/yuv420p [2] and then encoded using [3].

This is decoding time (single-threaded) of the two binaries:

bash-4.3$ for n in {1..5}; do ( time ./avconv -threads 1 -i /tmp/sat-dnxhd.mov -f null -v 0 -nostats - ) 2>&1|grep user; done
user    0m3.138s
user    0m3.057s
user    0m3.122s
user    0m3.120s
user    0m3.095s
bash-4.3$ for n in {1..5}; do ( time ./avconv-new -threads 1 -i /tmp/sat-dnxhd.mov -f null -v 0 -nostats - ) 2>&1|grep user; done
user    0m3.769s
user    0m3.767s
user    0m3.761s
user    0m3.711s
user    0m3.745s

I also tested prores (which looks like it got about 5% faster), and
huffyuv, which seems to be about 10% slower (input generated using [4]):

bash-4.3$ for n in {1..5}; do ( time ./avconv -threads 1 -i /tmp/sat-huvvyuv.avi -f null -v 0 -nostats - ) 2>&1|grep user; done
user    0m3.782s
user    0m3.776s
user    0m3.780s
user    0m3.835s
user    0m3.773s
bash-4.3$ for n in {1..5}; do ( time ./avconv-new -threads 1 -i /tmp/sat-huvvyuv.avi -f null -v 0 -nostats - ) 2>&1|grep user; done
user    0m4.127s
user    0m4.162s
user    0m4.159s
user    0m4.134s
user    0m4.124s

I think the speed regression in these 2 decoders (dnxhd/huffyuv) should be
addressed, since this might go beyond just x86-32 and affect other 32-bit
platforms also.

Ronald

[1] https://media.xiph.org/video/derf/ElFuente/Netflix_SquareAndTimelapse_4096x2160_60fps_10bit_420.y4m
[2] ffmpeg -i Netflix_SquareAndTimelapse_4096x2160_60fps_10bit_420.y4m -vf framestep=2 -s 1280x720 -pix_fmt yuv420p -c:v ffv1 SquareAndTimelapse.ffv1.mkv
[3] ffmpeg -i SquareAndTimelapse.ffv1.mkv -pix_fmt yuv422p -b:v 75M -c:v dnxhd /tmp/sat-dnxhd.mov
[4] ffmpeg -i SquareAndTimelapse.ffv1.mkv -c:v huffyuv /tmp/sat-huffyuv.avi

(PS ffmpeg in [2-4] is whatever ships by default in the latest macports,
seems to be 2.8.6.)
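The "about 20%" and "about 10%" figures in this message can be reproduced from the `time` output with a small Python sketch. It assumes the slowdown is the ratio of the mean user times of the five runs. The helper names (`user_seconds`, `slowdown_percent`) and the constants holding the pasted output are illustrative only.

```python
import re
from statistics import mean

# Matches the "user" line printed by bash's `time`, e.g. "user 0m3.138s".
USER_RE = re.compile(r"user\s+(\d+)m(\d+(?:\.\d+)?)s")


def user_seconds(log):
    """Extract every user time in the log and convert it to seconds."""
    return [int(minutes) * 60 + float(seconds)
            for minutes, seconds in USER_RE.findall(log)]


def slowdown_percent(old_log, new_log):
    """Relative increase of the mean user time, in percent (positive = slower)."""
    old = mean(user_seconds(old_log))
    new = mean(user_seconds(new_log))
    return (new / old - 1.0) * 100.0


# User times copied from the shell transcripts in this message.
DNXHD_OLD = "user 0m3.138s user 0m3.057s user 0m3.122s user 0m3.120s user 0m3.095s"
DNXHD_NEW = "user 0m3.769s user 0m3.767s user 0m3.761s user 0m3.711s user 0m3.745s"
HUFFYUV_OLD = "user 0m3.782s user 0m3.776s user 0m3.780s user 0m3.835s user 0m3.773s"
HUFFYUV_NEW = "user 0m4.127s user 0m4.162s user 0m4.159s user 0m4.134s user 0m4.124s"

if __name__ == "__main__":
    print(f"dnxhd:   {slowdown_percent(DNXHD_OLD, DNXHD_NEW):+.1f}%")
    print(f"huffyuv: {slowdown_percent(HUFFYUV_OLD, HUFFYUV_NEW):+.1f}%")
```

Five runs of about three seconds each give only a rough estimate. A tool that reports run-to-run variance, such as `perf stat -r`, gives a better idea of how much of the difference is noise.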
Re: [libav-devel] [PATCH 00/38] Convert to the new bitstream reader, set 3
Hello,

> Do you have a tree for testing somewhere?

Yes, there's a github branch:
https://github.com/sasshka/libav/tree/get_bits3.

> If I find decoders that are slower on 32bit after the patch, will you fix
> it?
>

If you find something, it will be discussed with the other developers and
potentially fixed.
Re: [libav-devel] [PATCH 00/38] Convert to the new bitstream reader, set 3
Hi,

On Sat, May 21, 2016 at 2:39 AM, Alexandra Hájková <
alexandra.khirn...@gmail.com> wrote:

> >
> > I noticed proresdec (for example) is not converted to the new bitstream
> > reader. Is there a reason for that?
>
> Not all the sets have been sent yet; I'm sending them gradually to make
> reviewing easier.
> >
> > Also, since this patch basically converts the bitstream reader to 64bits,
> > do people think it would be useful to do some speed tests on 32bit as well?
> > I feel that on 32bits, the 64bit emulation might actually slow the thing
> > down considerably, even if it's faster on 64bits.
> We ran benchmarks on 32 bits for several decoders, and the new bitreader is
> faster than, or as fast as, the old one on 32-bit CPUs.

Do you have a tree for testing somewhere? If I find decoders that are
slower on 32bit after the patch, will you fix it?

Ronald
Re: [libav-devel] [PATCH 00/38] Convert to the new bitstream reader, set 3
>
> I noticed proresdec (for example) is not converted to the new bitstream
> reader. Is there a reason for that?

Not all the sets have been sent yet; I'm sending them gradually to make
reviewing easier.

>
> Also, since this patch basically converts the bitstream reader to 64bits,
> do people think it would be useful to do some speed tests on 32bit as well?
> I feel that on 32bits, the 64bit emulation might actually slow the thing
> down considerably, even if it's faster on 64bits.

We ran benchmarks on 32 bits for several decoders, and the new bitreader is
faster than, or as fast as, the old one on 32-bit CPUs.
Re: [libav-devel] [PATCH 00/38] Convert to the new bitstream reader, set 3
Hi,

On Fri, May 20, 2016 at 4:11 PM, Alexandra Hájková <
alexandra.khirn...@gmail.com> wrote:

> This set is compilable together only.

I noticed proresdec (for example) is not converted to the new bitstream
reader. Is there a reason for that?

Also, since this patch basically converts the bitstream reader to 64bits,
do people think it would be useful to do some speed tests on 32bit as well?
I feel that on 32bits, the 64bit emulation might actually slow the thing
down considerably, even if it's faster on 64bits.

(That doesn't mean the patch doesn't have merit, but rather it might mean
that you might want a state size that depends on the bit width of the
architecture. While I agree 32bit x86 is on its way out and possibly
somewhat irrelevant, some users (chromebook or x86-android are examples)
still care about it, and on non-x86, 32bit may actually be a more
predominant target.)

Btw don't get my comments wrong, I'm not criticizing the direction you guys
take, work in this area is good and seems to have merit (as measured on
64bits), so thanks!

Ronald
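To make the "state size that depends on the bit width" idea concrete, here is a toy Python model of an MSB-first bit reader whose cache width is a parameter. It is only a sketch: the class `BitReader`, its fields and its refill policy are hypothetical and are not the libav bitstream reader API. It also models only how often the cache is refilled, not the cost of emulating 64-bit arithmetic on a 32-bit CPU, which is the concern raised in this message.

```python
class BitReader:
    """Toy MSB-first bit reader with a configurable cache width (hypothetical)."""

    def __init__(self, data, cache_bits):
        self.data = data
        self.pos = 0                  # index of the next byte to load
        self.cache = 0                # unread bits, right-aligned
        self.bits_left = 0            # number of valid bits in the cache
        self.cache_bits = cache_bits  # e.g. 32 or 64
        self.refills = 0              # how many times the cache was refilled

    def _refill(self):
        # Load whole bytes for as long as one more byte still fits in the cache.
        self.refills += 1
        while self.bits_left <= self.cache_bits - 8 and self.pos < len(self.data):
            self.cache = (self.cache << 8) | self.data[self.pos]
            self.pos += 1
            self.bits_left += 8

    def read(self, n):
        """Return the next n bits as an unsigned integer."""
        if n > self.bits_left:
            self._refill()
        if n > self.bits_left:
            raise EOFError("not enough bits left")
        self.bits_left -= n
        value = (self.cache >> self.bits_left) & ((1 << n) - 1)
        self.cache &= (1 << self.bits_left) - 1  # drop the bits just consumed
        return value


if __name__ == "__main__":
    data = bytes(range(256))
    for width in (32, 64):
        reader = BitReader(data, width)
        for _ in range(400):
            reader.read(5)
        print(f"{width}-bit cache: {reader.refills} refills")
```

In this model the wider cache needs fewer refills for the same stream, which is the expected benefit on 64-bit hardware. Whether that still pays off when every 64-bit shift has to be split into 32-bit operations can only be settled by measurements on the target.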
[libav-devel] [PATCH 00/38] Convert to the new bitstream reader, set 3
This set is compilable together only.

Alexandra Hájková (38):
  h261dec: Convert to the new bitstream reader
  flv: Convert to the new bitstream reader
  mpeg4video: Convert to the new bitstream reader
  h263dec: Convert to the new bitstream reader
  intrax8: Convert to the new bitstream reader
  ituh263dec: Convert to the new bitstream reader
  mss1, mss2, mss4: Convert to the new bitstream reader
  msmpeg4: Convert to the new bitstream reader
  vc1: Convert to the new bitstream reader
  wmv2dec: Convert to the new bitstream reader
  rv10, rv30, rv40: Convert to the new bitstream reader
  unary.h: Convert to the new bitstream reader
  aic: Convert to the new bitstream reader
  wavpack: Convert to the new bitstream reader
  tak: Convert to the new bitstream reader
  ralf: Convert to the new bitstream reader
  dxtory: Convert to the new bitstream reader
  dca: Convert to the new bitstream reader
  apedec: Convert to the new bitstream reader
  alsdec, bgmc: Convert to the new bitstream reader
  mpc8 (demuxer): Convert to the new bitstream reader
  golomb: Convert to the new bitstream reader
  golomb.h: Optimise get_ue/se_golomb
  golomb.h: Optimise svq3_get_ue_golomb
  h264: Convert to the new bitstream reader
  on2avc: Convert to the new bitstream reader
  svq3: Convert to the new bitstream reader
  cavsdec: Convert to the new bitstream reader
  dirac: Convert to the new bitstream reader
  ffv1dec: Convert to the new bitstream reader
  fic: Convert to the new bitstream reader
  flac: Convert to the new bitstream reader
  jpeglsdec, mjpegdec: Convert to the new bitstream reader
  loco: Convert to the new bitstream reader
  shorten: Convert to the new bitstream reader
  hevc: Convert to the new bitstream reader
  mxpegdec: Convert to the new bitstream reader
  alac: Convert to the new bitstream reader

 libavcodec/aic.c               |  36 +--
 libavcodec/alac.c              |  62 ++---
 libavcodec/alsdec.c            | 228 +++
 libavcodec/apedec.c            |  44 +--
 libavcodec/bgmc.c              |  15 +-
 libavcodec/bgmc.h              |   8 +-
 libavcodec/cavs.c              |   6 +-
 libavcodec/cavs.h              |   4 +-
 libavcodec/cavsdec.c           | 176 ++--
 libavcodec/dca.h               |   6 +-
 libavcodec/dca_exss.c          | 166 +--
 libavcodec/dca_parser.c        |  16 +-
 libavcodec/dca_xll.c           | 158 +--
 libavcodec/dcadec.c            | 219 +++
 libavcodec/dirac.c             |  87 +++--
 libavcodec/dxtory.c            |  86 +++--
 libavcodec/dxva2_vc1.c         |   3 +-
 libavcodec/ffv1.c              |   2 +-
 libavcodec/ffv1.h              |   4 +-
 libavcodec/ffv1dec.c           |  22 +-
 libavcodec/fic.c               |  16 +-
 libavcodec/flac.c              |  67 ++---
 libavcodec/flac.h              |   6 +-
 libavcodec/flac_parser.c       |   6 +-
 libavcodec/flacdec.c           |  56 ++--
 libavcodec/flacenc.c           |   1 -
 libavcodec/flv.h               |   4 +-
 libavcodec/flvdec.c            |  39 +--
 libavcodec/golomb.h            | 236 +++-
 libavcodec/h261dec.c           |  90 +++--
 libavcodec/h263.h              |   3 +-
 libavcodec/h263dec.c           |  54 ++--
 libavcodec/h264.c              |  15 +-
 libavcodec/h264.h              |  13 +-
 libavcodec/h2645_parse.c       |  21 +-
 libavcodec/h2645_parse.h       |   4 +-
 libavcodec/h264_cabac.c        |   2 +
 libavcodec/h264_cavlc.c        | 167 +--
 libavcodec/h264_mb_template.c  |  16 +-
 libavcodec/h264_parse.c        |  32 +--
 libavcodec/h264_parse.h        |   6 +-
 libavcodec/h264_parser.c       |  64 ++---
 libavcodec/h264_ps.c           | 260 -
 libavcodec/h264_refs.c         |  23 +-
 libavcodec/h264_sei.c          | 192 ++---
 libavcodec/h264_sei.h          |   4 +-
 libavcodec/h264_slice.c        |  68 ++---
 libavcodec/hevc.c              | 164 +--
 libavcodec/hevc.h              |  14 +-
 libavcodec/hevc_cabac.c        |  11 +-
 libavcodec/hevc_parser.c       |  14 +-
 libavcodec/hevc_ps.c           | 455 +++---
 libavcodec/hevc_sei.c          |  69 ++---
 libavcodec/hevcdsp.h           |   4 +-
 libavcodec/hevcdsp_template.c  |   6 +-
 libavcodec/intelh263dec.c      |  69 ++---
 libavcodec/intrax8.c           |  37 +--
 libavcodec/intrax8.h           |   7 +-
 libavcodec/ituh263dec.c        | 327 +++---
 libavcodec/jpeglsdec.c         |  46 +--
 libavcodec/loco.c              |  12 +-
 libavcodec/mjpegbdec.c         |  41 +--
 libavcodec/mjpegdec.c          | 266 --
 libavcodec/mjpegdec.h          |   5 +-
 libavcodec/mpeg4video.h        |   4 +-
 libavcodec/mpeg4video_parser.c |  11 +-
 libavcodec/mpeg4videodec.c     | 620 -
 libavcodec/mpegvideo.h         |   4 +-
 libavcodec/msmpeg4data.h       |   3 +-
 libavcodec/msmpeg4dec.c        | 195 ++---
 libavcodec/mss1.c