Re: [FFmpeg-devel] [PATCH 09/13] avcodec/svq1dec: clear MMX state after MB decode loop

2016-10-26 Thread u-9iep
On Wed, Oct 26, 2016 at 06:17:27PM +0200, wm4 wrote:
> > With all respect, I find such an argument hardly appropriate for two 
> > reasons:
> > it is formally incorrect (see below),
> > it shifts the attention from the functionality/usability to emotionally
> > charged aspects like age.
> 
> It's not an "emotionally charged" aspect. Phasing out old hardware is
> common sense. Of course this doesn't appeal to the "keep all the trash"

You do not seem to notice that you make the same mistake,
using emotionally charged expressions.

You are insulting the people who use older hardware, which is
_not_necessarily_ "trash" and who do this for their reasons - good or
bad is not in the realm of one's competence unless one is involved in
the actual case.

You the developers know everything between CPU, memory and I/O. The user
acts in her real world. For the developers it is unfounded to pretend
to be familiar with all the worlds out there.

If not otherwise, I mentioned the good reasons to run on the existing
hardware:

On Wed, Oct 26, 2016 at 06:04:24PM +0200, u-9...@aetey.se wrote:
> Such arguments can not be passed on to the users without an offer of
> a corresponding amount in cache. :)
> 
> More seriously, the act of replacement itself has its costs too, in
> cases like this one such cost is much higher than the cost of hardware.

To make it clear why I do not offer the expected spared money to the
developers, let me point out the phrase which may have gone unnoticed:

> (regrettably, this cost or gain is not trivial to convert to a
> compensation for the ffmpeg developers, unless video is one's main
> business, sorry for that!)
=>^^

I did my best to contribute some small bits of code and of understanding.

To conclude, I would like to friendly suggest to every bright
programmer on this list (sincerely highly respecting your skills),
try to notice when you tread outside your competence area.

Good luck to the ffmpeg project, my humble effort here is over.

Regards,
Rune

___
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
http://ffmpeg.org/mailman/listinfo/ffmpeg-devel


Re: [FFmpeg-devel] [PATCH 09/13] avcodec/svq1dec: clear MMX state after MB decode loop

2016-10-26 Thread wm4
On Wed, 26 Oct 2016 18:04:24 +0200
u-9...@aetey.se wrote:

> On Wed, Oct 26, 2016 at 05:08:56PM +0200, Hendrik Leppkes wrote:
> > > One of the highly appreciated virtues of ffmpeg is its efficiency,
> > > this makes hardware much more useful. Please do not cut off the low
> > > end systems when possible.  
> 
> > Its not about low-end, its about 15+ years old hardware. At some point  
> 
> With all respect, I find such an argument hardly appropriate for two reasons:
> it is formally incorrect (see below),
> it shifts the attention from the functionality/usability to emotionally
> charged aspects like age.

It's not an "emotionally charged" aspect. Phasing out old hardware is
common sense. Of course this doesn't appeal to the "keep all the trash"
crowd, which for some reason seems to be going strong in FFmpeg.
___
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
http://ffmpeg.org/mailman/listinfo/ffmpeg-devel


Re: [FFmpeg-devel] [PATCH 09/13] avcodec/svq1dec: clear MMX state after MB decode loop

2016-10-26 Thread Ronald S. Bultje
Hi,

On Wed, Oct 26, 2016 at 12:04 PM,  wrote:

> On Wed, Oct 26, 2016 at 05:08:56PM +0200, Hendrik Leppkes wrote:
> > you just have to replace this stuff.
> > For $100 you can buy a system at least 10 times as fast then those.
> > Heck even a RPi might rival those for $20 or so.
>
> Such arguments can not be passed on to the users without an offer of
> a corresponding amount in cache. :)


Will you pay us for code maintenance?

Ronald
___
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
http://ffmpeg.org/mailman/listinfo/ffmpeg-devel


Re: [FFmpeg-devel] [PATCH 09/13] avcodec/svq1dec: clear MMX state after MB decode loop

2016-10-26 Thread u-9iep
On Wed, Oct 26, 2016 at 05:08:56PM +0200, Hendrik Leppkes wrote:
> > One of the highly appreciated virtues of ffmpeg is its efficiency,
> > this makes hardware much more useful. Please do not cut off the low
> > end systems when possible.

> Its not about low-end, its about 15+ years old hardware. At some point

With all respect, I find such an argument hardly appropriate for two reasons:
it is formally incorrect (see below),
it shifts the attention from the functionality/usability to emotionally
charged aspects like age.

I am participating in the discussion as a concerned _packager_.
Nevertheless I am a ffmpeg "user" too.

As a matter of fact I am running several low power units which have mmx but no 
sse.
Several of them exist for handling video, either for playback or encoding or 
both.

They have been set up this way because they fit the bill very nicely,
among others thanks to ffmpeg.

The one I looked at recently is manufactured 10 years ago and is not
youngest in its generation. So we have to wait at least extra 5 years
to make it 15+.

> you just have to replace this stuff.
> For $100 you can buy a system at least 10 times as fast then those.
> Heck even a RPi might rival those for $20 or so.

Such arguments can not be passed on to the users without an offer of
a corresponding amount in cache. :)

More seriously, the act of replacement itself has its costs too, in
cases like this one such cost is much higher than the cost of hardware.

(regrettably, this cost or gain is not trivial to convert to a
compensation for the ffmpeg developers, unless video is one's main
business, sorry for that!)

Regards,
Rune

___
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
http://ffmpeg.org/mailman/listinfo/ffmpeg-devel


Re: [FFmpeg-devel] [PATCH 09/13] avcodec/svq1dec: clear MMX state after MB decode loop

2016-10-26 Thread Michael Niedermayer
On Wed, Oct 26, 2016 at 04:21:14PM +0200, Hendrik Leppkes wrote:
> On Wed, Oct 26, 2016 at 3:54 PM, Michael Niedermayer
>  wrote:
> > On Tue, Oct 25, 2016 at 12:00:01AM +0200, Hendrik Leppkes wrote:
> >> On Mon, Oct 24, 2016 at 10:31 PM, Ronald S. Bultje  
> >> wrote:
> >> > Hi,
> >> >
> >> > On Mon, Oct 24, 2016 at 4:26 PM, Henrik Gramner  
> >> > wrote:
> >> >
> >> >> On Mon, Oct 24, 2016 at 9:59 PM, Ronald S. Bultje 
> >> >> wrote:
> >> >> > Good idea to reference Hendrik Gramner here, who keeps insisting we 
> >> >> > get
> >> >> rid
> >> >> > of all MMX code in ffmpeg (at least as an option) for future Intel 
> >> >> > CPUs
> >> >> in
> >> >> > which MMX will be deprecated.
> >> >>
> >> >> Replacing MMX with SSE2 is indeed the most "proper" fix in my opinion,
> >> >> but it's a fair amount of work and not done in an evening.
> >> >>
> >> >> The fact that a lot of assembly lacks unit tests is certainly not
> >> >> helping in that regard.
> >> >>
> >> >> Some MMX instructions are slower than the equivalent SSE2 code on
> >> >> Skylake. Intel hasn't officially commented on (as far as I know at
> >> >> least) if we should expect this trend to continue, but they certainly
> >> >> seem to treat MMX as legacy.
> >> >>
> >> >> I doubt they would completely remove support for it though, backwards
> >> >> compatibility is a big selling-point for x86.
> >> >
> >> >
> >> > Well, it gives us another way of fixing this issue (on x86-64 only): have
> >> > sse2 implementations for all code that has a mmx (register) path right 
> >> > now.
> >> >
> >>
> >> I don't think the argument for pre-sse2 CPUs is that strong on 32-bit
> >> systems, either.
> >
> > SSE2 was initially not faster than MMX as CPUs implemented it as 2
> > MMX operations internally not having a full width SIMD unit for SSE*
> > so there would be a performace loss on some x86-32 CPUs if MMX was
> > replaced by "half-width SSE2" there
> >
> 
> You can add "not caring about first-gen sse2 CPUs" to the list as

its more like 3 or 4 generations than 1 according to the instruction
tables from Agner Fog

core 2 (Merom) seems the first that has partial full width support
shift/pack/unpack/shuffle still are faster as MMX
PM, P4, P4E all seem half speed at SSE* than MMX


> well, if you want. Those are way old as well.


> There is going to be a performance loss either way, except that emms
> slows it down everywhere, while using sse2 is likely to be fine on

minor detail being that there is a factor of around
ten thousand in the speed loss between the 2 cases you compare
(0.001% vs maybe 50%)

Droping MMX will cause pre SSE2 CPUs to be alot slower, maybe half
speed overall or less, they loose all SIMD optimizations. On older
SSE2 cpus its still going to be a hefty hit too.
adding emms at a video frame or slice level which is what the patches
posted do pretty much has no real effect but dont belive me look at the
timings worst case i see in agners tables are 18 clock cycles that at
60fps and 1slice and a slow 100mhz cpu is 0.001%
even if there are 100 times more emms (due to slice level EMMS) it
still at the edge of being
hard to meassure. Doing EMMS per function call is of course not
prcatical.

theres an additional penalty for the first float instruction after emms
on some cpus, 58 clock cycles (on P4) but thats still just 0.003% in the
example above.

anyway, i wantd to stay out of this and ill do that, just wanted to
comment on the technical details

[...]
-- 
Michael GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB

Those who are too smart to engage in politics are punished by being
governed by those who are dumber. -- Plato 


signature.asc
Description: Digital signature
___
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
http://ffmpeg.org/mailman/listinfo/ffmpeg-devel


Re: [FFmpeg-devel] [PATCH 09/13] avcodec/svq1dec: clear MMX state after MB decode loop

2016-10-26 Thread Hendrik Leppkes
On Wed, Oct 26, 2016 at 4:52 PM,   wrote:
> On Wed, Oct 26, 2016 at 04:21:14PM +0200, Hendrik Leppkes wrote:
>> You can add "not caring about first-gen sse2 CPUs" to the list as
>> well, if you want. Those are way old as well.
>> There is going to be a performance loss either way, except that emms
>> slows it down everywhere, while using sse2 is likely to be fine on
>> modern CPUs. So IMHO thats the better course to take, if any.
>
> Note that making MMX-acceleration unavailable would hurt at most on
> those slowest, non-sse systems, which otherwise e.g. often struggle for
> real-time decoding of heavier streams.
>
> One of the highly appreciated virtues of ffmpeg is its efficiency,
> this makes hardware much more useful. Please do not cut off the low
> end systems when possible.
>

Its not about low-end, its about 15+ years old hardware. At some point
you just have to replace this stuff.
For $100 you can buy a system at least 10 times as fast then those.
Heck even a RPi might rival those for $20 or so.

If you want to convince developers to fix the MMX situation properly,
which would cost a lot of time to do, then the best course is to look
forward, which the SSE2 option imho is doing.
All other options are just band-aids.

- Hendrik
___
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
http://ffmpeg.org/mailman/listinfo/ffmpeg-devel


Re: [FFmpeg-devel] [PATCH 09/13] avcodec/svq1dec: clear MMX state after MB decode loop

2016-10-26 Thread u-9iep
On Wed, Oct 26, 2016 at 04:21:14PM +0200, Hendrik Leppkes wrote:
> You can add "not caring about first-gen sse2 CPUs" to the list as
> well, if you want. Those are way old as well.
> There is going to be a performance loss either way, except that emms
> slows it down everywhere, while using sse2 is likely to be fine on
> modern CPUs. So IMHO thats the better course to take, if any.

Note that making MMX-acceleration unavailable would hurt at most on
those slowest, non-sse systems, which otherwise e.g. often struggle for
real-time decoding of heavier streams.

One of the highly appreciated virtues of ffmpeg is its efficiency,
this makes hardware much more useful. Please do not cut off the low
end systems when possible.

Regards,
Rune

___
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
http://ffmpeg.org/mailman/listinfo/ffmpeg-devel


Re: [FFmpeg-devel] [PATCH 09/13] avcodec/svq1dec: clear MMX state after MB decode loop

2016-10-26 Thread Hendrik Leppkes
On Wed, Oct 26, 2016 at 3:54 PM, Michael Niedermayer
 wrote:
> On Tue, Oct 25, 2016 at 12:00:01AM +0200, Hendrik Leppkes wrote:
>> On Mon, Oct 24, 2016 at 10:31 PM, Ronald S. Bultje  
>> wrote:
>> > Hi,
>> >
>> > On Mon, Oct 24, 2016 at 4:26 PM, Henrik Gramner  wrote:
>> >
>> >> On Mon, Oct 24, 2016 at 9:59 PM, Ronald S. Bultje 
>> >> wrote:
>> >> > Good idea to reference Hendrik Gramner here, who keeps insisting we get
>> >> rid
>> >> > of all MMX code in ffmpeg (at least as an option) for future Intel CPUs
>> >> in
>> >> > which MMX will be deprecated.
>> >>
>> >> Replacing MMX with SSE2 is indeed the most "proper" fix in my opinion,
>> >> but it's a fair amount of work and not done in an evening.
>> >>
>> >> The fact that a lot of assembly lacks unit tests is certainly not
>> >> helping in that regard.
>> >>
>> >> Some MMX instructions are slower than the equivalent SSE2 code on
>> >> Skylake. Intel hasn't officially commented on (as far as I know at
>> >> least) if we should expect this trend to continue, but they certainly
>> >> seem to treat MMX as legacy.
>> >>
>> >> I doubt they would completely remove support for it though, backwards
>> >> compatibility is a big selling-point for x86.
>> >
>> >
>> > Well, it gives us another way of fixing this issue (on x86-64 only): have
>> > sse2 implementations for all code that has a mmx (register) path right now.
>> >
>>
>> I don't think the argument for pre-sse2 CPUs is that strong on 32-bit
>> systems, either.
>
> SSE2 was initially not faster than MMX as CPUs implemented it as 2
> MMX operations internally not having a full width SIMD unit for SSE*
> so there would be a performace loss on some x86-32 CPUs if MMX was
> replaced by "half-width SSE2" there
>

You can add "not caring about first-gen sse2 CPUs" to the list as
well, if you want. Those are way old as well.
There is going to be a performance loss either way, except that emms
slows it down everywhere, while using sse2 is likely to be fine on
modern CPUs. So IMHO thats the better course to take, if any.

- Hendrik
___
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
http://ffmpeg.org/mailman/listinfo/ffmpeg-devel


Re: [FFmpeg-devel] [PATCH 09/13] avcodec/svq1dec: clear MMX state after MB decode loop

2016-10-26 Thread Michael Niedermayer
On Tue, Oct 25, 2016 at 12:00:01AM +0200, Hendrik Leppkes wrote:
> On Mon, Oct 24, 2016 at 10:31 PM, Ronald S. Bultje  wrote:
> > Hi,
> >
> > On Mon, Oct 24, 2016 at 4:26 PM, Henrik Gramner  wrote:
> >
> >> On Mon, Oct 24, 2016 at 9:59 PM, Ronald S. Bultje 
> >> wrote:
> >> > Good idea to reference Hendrik Gramner here, who keeps insisting we get
> >> rid
> >> > of all MMX code in ffmpeg (at least as an option) for future Intel CPUs
> >> in
> >> > which MMX will be deprecated.
> >>
> >> Replacing MMX with SSE2 is indeed the most "proper" fix in my opinion,
> >> but it's a fair amount of work and not done in an evening.
> >>
> >> The fact that a lot of assembly lacks unit tests is certainly not
> >> helping in that regard.
> >>
> >> Some MMX instructions are slower than the equivalent SSE2 code on
> >> Skylake. Intel hasn't officially commented on (as far as I know at
> >> least) if we should expect this trend to continue, but they certainly
> >> seem to treat MMX as legacy.
> >>
> >> I doubt they would completely remove support for it though, backwards
> >> compatibility is a big selling-point for x86.
> >
> >
> > Well, it gives us another way of fixing this issue (on x86-64 only): have
> > sse2 implementations for all code that has a mmx (register) path right now.
> >
> 
> I don't think the argument for pre-sse2 CPUs is that strong on 32-bit
> systems, either.

SSE2 was initially not faster than MMX as CPUs implemented it as 2
MMX operations internally not having a full width SIMD unit for SSE*
so there would be a performace loss on some x86-32 CPUs if MMX was
replaced by "half-width SSE2" there

[...]
-- 
Michael GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB

Everything should be made as simple as possible, but not simpler.
-- Albert Einstein


signature.asc
Description: Digital signature
___
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
http://ffmpeg.org/mailman/listinfo/ffmpeg-devel


Re: [FFmpeg-devel] [PATCH 09/13] avcodec/svq1dec: clear MMX state after MB decode loop

2016-10-25 Thread Ronald S. Bultje
Hi,

On Tue, Oct 25, 2016 at 5:12 PM, Moritz Barsnick  wrote:

> On Tue, Oct 25, 2016 at 14:28:38 +0200, u-9...@aetey.se wrote:
> > In a perfect world the user might be offered a build time choice:
> > either lose N% of the performance or take the consequences of not
> > following the ABI.
>
> That was exactly my thought too, assuming it doesn't pollute the code
> even more:
> --disable-fast-simd or --enable-compliant-mmx or something. "auto"
> would detect musl and/or test for the necessity to use emms_c() or
> other measures.


If you don't care about performance, issue a emms upon RET in INIT_MMX
functions (just like we do for vzeroupper in INIT_YMM functions) using
x86inc.asm. Of course, this should be disabled for x86-64 or non-musl
builds.

(That doesn't yet deal with inline asm.)

Ronald
___
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
http://ffmpeg.org/mailman/listinfo/ffmpeg-devel


Re: [FFmpeg-devel] [PATCH 09/13] avcodec/svq1dec: clear MMX state after MB decode loop

2016-10-25 Thread Moritz Barsnick
On Tue, Oct 25, 2016 at 14:28:38 +0200, u-9...@aetey.se wrote:
> In a perfect world the user might be offered a build time choice:
> either lose N% of the performance or take the consequences of not
> following the ABI.

That was exactly my thought too, assuming it doesn't pollute the code
even more:
--disable-fast-simd or --enable-compliant-mmx or something. "auto"
would detect musl and/or test for the necessity to use emms_c() or
other measures.

Moritz
___
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
http://ffmpeg.org/mailman/listinfo/ffmpeg-devel


Re: [FFmpeg-devel] [PATCH 09/13] avcodec/svq1dec: clear MMX state after MB decode loop

2016-10-25 Thread u-9iep
On Tue, Oct 25, 2016 at 05:48:50PM +0200, Henrik Gramner wrote:
> On Tue, Oct 25, 2016 at 2:28 PM,   wrote:
> > It would be nice to look at a benchmarking comparison, to be able to
> > see the actual practical performance gain of the decision not to follow
> > the ABI.
> 
> Just a quick comparison from adding EMMS to a random MMX function
> (from x264, because I happened to have the source for that but not
> ffmpeg on this machine and I was too lazy to download it).
> 
> sad_4x4_c: 359
> sad_4x4_mmx: 94
> sad_4x4_mmx (emms): 186

This is unfortunatly not exactly the numbers a user or a packager would
need to know. The relevant ones would be what time it takes to actually
compress or decompress a given file.

(if most of the time is spent calling this very function, then we can
extrapolate the numbers to the net result, otherwise we shouldn't)

If we nevertheless assume that the net performance is proportional
to the numbers above, we learn that omission of MMX acceleration
compatible with the standard ABI would lead to 50% performance loss
for the ABI-compliant users (pure C instead of compliant MMX).

This would be a pity, even if the circle of the users who depend
on strict ABI-compliance is limited, for the moment.

Regards,
Rune

___
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
http://ffmpeg.org/mailman/listinfo/ffmpeg-devel


Re: [FFmpeg-devel] [PATCH 09/13] avcodec/svq1dec: clear MMX state after MB decode loop

2016-10-25 Thread Henrik Gramner
On Tue, Oct 25, 2016 at 2:28 PM,   wrote:
> It would be nice to look at a benchmarking comparison, to be able to
> see the actual practical performance gain of the decision not to follow
> the ABI.

Just a quick comparison from adding EMMS to a random MMX function
(from x264, because I happened to have the source for that but not
ffmpeg on this machine and I was too lazy to download it).

sad_4x4_c: 359
sad_4x4_mmx: 94
sad_4x4_mmx (emms): 186
___
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
http://ffmpeg.org/mailman/listinfo/ffmpeg-devel


Re: [FFmpeg-devel] [PATCH 09/13] avcodec/svq1dec: clear MMX state after MB decode loop

2016-10-25 Thread u-9iep
On Mon, Oct 24, 2016 at 09:53:22PM +0200, Henrik Gramner wrote:
> The decision to issue emms manually instead of after every MMX
> function was a deliberate decision. I'd hardly call it "misdesigned"
> to make SIMD code twice as fast at the cost of technically abusing the

It would be nice to look at a benchmarking comparison, to be able to
see the actual practical performance gain of the decision not to follow
the ABI.

My expectation is among others that using MMX in a compliant way would
remain remarkably beneficial compared to no SIMD.

In a perfect world the user might be offered a build time choice:
either lose N% of the performance or take the consequences of not
following the ABI. Apparently, different users/packagers would make
different choices, but the crucial thing would be knowing the N.

Of course, offering to disable MMX _is_ such a choice, but having instead
an option of "somewhat slower, compliant MMX" would be much nicer.

my 2c
Rune

___
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
http://ffmpeg.org/mailman/listinfo/ffmpeg-devel


Re: [FFmpeg-devel] [PATCH 09/13] avcodec/svq1dec: clear MMX state after MB decode loop

2016-10-24 Thread Hendrik Leppkes
On Mon, Oct 24, 2016 at 10:31 PM, Ronald S. Bultje  wrote:
> Hi,
>
> On Mon, Oct 24, 2016 at 4:26 PM, Henrik Gramner  wrote:
>
>> On Mon, Oct 24, 2016 at 9:59 PM, Ronald S. Bultje 
>> wrote:
>> > Good idea to reference Hendrik Gramner here, who keeps insisting we get
>> rid
>> > of all MMX code in ffmpeg (at least as an option) for future Intel CPUs
>> in
>> > which MMX will be deprecated.
>>
>> Replacing MMX with SSE2 is indeed the most "proper" fix in my opinion,
>> but it's a fair amount of work and not done in an evening.
>>
>> The fact that a lot of assembly lacks unit tests is certainly not
>> helping in that regard.
>>
>> Some MMX instructions are slower than the equivalent SSE2 code on
>> Skylake. Intel hasn't officially commented on (as far as I know at
>> least) if we should expect this trend to continue, but they certainly
>> seem to treat MMX as legacy.
>>
>> I doubt they would completely remove support for it though, backwards
>> compatibility is a big selling-point for x86.
>
>
> Well, it gives us another way of fixing this issue (on x86-64 only): have
> sse2 implementations for all code that has a mmx (register) path right now.
>

I don't think the argument for pre-sse2 CPUs is that strong on 32-bit
systems, either.
I wouldn't necessarily limit this to x86-64 at that, considering its
probably another half year at least, probably longer, until such a
conversion could be completed.

- Hendrik
___
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
http://ffmpeg.org/mailman/listinfo/ffmpeg-devel


Re: [FFmpeg-devel] [PATCH 09/13] avcodec/svq1dec: clear MMX state after MB decode loop

2016-10-24 Thread Ronald S. Bultje
Hi,

On Mon, Oct 24, 2016 at 4:26 PM, Henrik Gramner  wrote:

> On Mon, Oct 24, 2016 at 9:59 PM, Ronald S. Bultje 
> wrote:
> > Good idea to reference Hendrik Gramner here, who keeps insisting we get
> rid
> > of all MMX code in ffmpeg (at least as an option) for future Intel CPUs
> in
> > which MMX will be deprecated.
>
> Replacing MMX with SSE2 is indeed the most "proper" fix in my opinion,
> but it's a fair amount of work and not done in an evening.
>
> The fact that a lot of assembly lacks unit tests is certainly not
> helping in that regard.
>
> Some MMX instructions are slower than the equivalent SSE2 code on
> Skylake. Intel hasn't officially commented on (as far as I know at
> least) if we should expect this trend to continue, but they certainly
> seem to treat MMX as legacy.
>
> I doubt they would completely remove support for it though, backwards
> compatibility is a big selling-point for x86.


Well, it gives us another way of fixing this issue (on x86-64 only): have
sse2 implementations for all code that has a mmx (register) path right now.

Given the complexity of this issue, it's worth considering that fix along
with the other possibilities...

Ronald
___
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
http://ffmpeg.org/mailman/listinfo/ffmpeg-devel


Re: [FFmpeg-devel] [PATCH 09/13] avcodec/svq1dec: clear MMX state after MB decode loop

2016-10-24 Thread Henrik Gramner
On Mon, Oct 24, 2016 at 9:59 PM, Ronald S. Bultje  wrote:
> Good idea to reference Hendrik Gramner here, who keeps insisting we get rid
> of all MMX code in ffmpeg (at least as an option) for future Intel CPUs in
> which MMX will be deprecated.

Replacing MMX with SSE2 is indeed the most "proper" fix in my opinion,
but it's a fair amount of work and not done in an evening.

The fact that a lot of assembly lacks unit tests is certainly not
helping in that regard.

Some MMX instructions are slower than the equivalent SSE2 code on
Skylake. Intel hasn't officially commented on (as far as I know at
least) if we should expect this trend to continue, but they certainly
seem to treat MMX as legacy.

I doubt they would completely remove support for it though, backwards
compatibility is a big selling-point for x86.
___
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
http://ffmpeg.org/mailman/listinfo/ffmpeg-devel


Re: [FFmpeg-devel] [PATCH 09/13] avcodec/svq1dec: clear MMX state after MB decode loop

2016-10-24 Thread wm4
On Mon, 24 Oct 2016 21:53:22 +0200
Henrik Gramner  wrote:

> On Mon, Oct 24, 2016 at 9:34 PM, wm4  wrote:
> > a ASM function must, according to the calling convention, reset the
> > MMX state when returning.
> >
> > What FFmpeg does here was misdesigned from the very start.  
> 
> The decision to issue emms manually instead of after every MMX
> function was a deliberate decision. I'd hardly call it "misdesigned"
> to make SIMD code twice as fast at the cost of technically abusing the
> ABI, which has worked flawlessly for years until very recently.

It's still clearly not a good idea because it violates the ABI directly.

OK let's get back to the "practical" argument that we have not much of
a choice but to violate the ABI.

Then at least the "weaker" assumption, that emms should be called before
calling (or returning to) functions that don't explicitly support MMX,
should be followed.
___
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
http://ffmpeg.org/mailman/listinfo/ffmpeg-devel


Re: [FFmpeg-devel] [PATCH 09/13] avcodec/svq1dec: clear MMX state after MB decode loop

2016-10-24 Thread Ronald S. Bultje
Hi,

On Mon, Oct 24, 2016 at 3:34 PM, wm4  wrote:

> On Mon, 24 Oct 2016 21:19:46 +0200
> Andreas Cadhalpun  wrote:
>
> > On 24.10.2016 16:14, Ronald S. Bultje wrote:
> > > On Mon, Oct 24, 2016 at 8:47 AM, wm4  wrote:
> > >> On Mon, 24 Oct 2016 07:54:47 -0400
> > >> "Ronald S. Bultje"  wrote:
> > >>> On Mon, Oct 24, 2016 at 3:36 AM, wm4  wrote:
> >  I was under the impression that it is UB to have the FPU in MMX
> state
> >  at any time while in C, not just while e.g. calling the stdlib.
> Maybe I
> >  got that wrong (how would MMX intrinsics even work?) - can anyone
> shed
> >  light on the exact requirements? (Possibly again, sorry.)
> > >>>
> > >>> I'm under the impression that it's part of the calling convention.
> That
> > >> is,
> > >>> any code anywhere (including mmx intrinsics, indeed) can - when
> called -
> > >>> expect the state to be cleared by the caller, just like you'd expect
> > >>> eax/edx to be caller-save (whereas esi/edi are callee-save).
> > >>>
> > >>> However, if you never call external code (including intrinsics), you
> can
> > >>> ignore this, just as you can ignore / create your own calling
> > >>> convention (remember fastcall etc.?). However, when calling any
> external
> > >>> code, this could (in theory) crash; it's just that right now it only
> > >>> crashes with musl when calling malloc/free. So basically, ffmpeg has
> its
> > >>> own calling convention, and manually calling emms_c() fixes "ffmpeg"
> > >>> calling convention to be compatible with "standard" calling
> > >> convention...?
> > >>
> > >> It can't really be a calling convention unless the compiler is aware
> of
> > >> it?
> >
> > It is defined as part of the System V Application Binary Interface [1]:
> > "The CPU shall be in x87 mode upon entry to a function. Therefore, every
> > function that uses the MMX registers is required to issue an emms or
> femms
> > instruction after using MMX registers, before returning or calling
> another
> > function."
>
> I mean FFmpeg can't make up its own calling convention without the
> compiler's knowledge.
>
> But thanks for reminding me about this but of the sysv ABI. The
> paragraph you quoted is actually very clear about the requirements. It
> means FFmpeg can barely do anything and remain standard compliant: a
> ASM function must, according to the calling convention, reset the MMX
> state when returning.
>
> What FFmpeg does here was misdesigned from the very start.


Good idea to reference Hendrik Gramner here, who keeps insisting we get rid
of all MMX code in ffmpeg (at least as an option) for future Intel CPUs in
which MMX will be deprecated.

Food for thought, indeed.

Ronald
___
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
http://ffmpeg.org/mailman/listinfo/ffmpeg-devel


Re: [FFmpeg-devel] [PATCH 09/13] avcodec/svq1dec: clear MMX state after MB decode loop

2016-10-24 Thread Henrik Gramner
On Mon, Oct 24, 2016 at 9:34 PM, wm4  wrote:
> a ASM function must, according to the calling convention, reset the
> MMX state when returning.
>
> What FFmpeg does here was misdesigned from the very start.

The decision to issue emms manually instead of after every MMX
function was a deliberate decision. I'd hardly call it "misdesigned"
to make SIMD code twice as fast at the cost of technically abusing the
ABI, which has worked flawlessly for years until very recently.
___
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
http://ffmpeg.org/mailman/listinfo/ffmpeg-devel


Re: [FFmpeg-devel] [PATCH 09/13] avcodec/svq1dec: clear MMX state after MB decode loop

2016-10-24 Thread Andreas Cadhalpun
On 24.10.2016 21:34, wm4 wrote:
> On Mon, 24 Oct 2016 21:19:46 +0200
> Andreas Cadhalpun  wrote:
>> On 24.10.2016 16:14, Ronald S. Bultje wrote:
>>> On Mon, Oct 24, 2016 at 8:47 AM, wm4  wrote:  
 The next safest assumption is that it's fine as long as we explicitly
 don't use floating point or call external functions that aren't
 MMX-aware. This would mean calling any libc functions or user callbacks
 (including indirectly through e.g. av_log) is not fine.  
>>
>> This is probably OK in practice and likely has a significant performance
>> benefit.
> 
> Seems like it.
> 
> The compiler still could generate code using the FPU state at any
> point, though, unless maybe there is an inline asm block. No function
> can put the FPU into MMX mode, because any MMX using function must have
> called emms before returning. Consequently, only compiler-known
> intrinsics or opaque inline asm block could clobber the FPU state. The
> latter because the compiler doesn't inspect asm blocks.

I agree. However, I think that this is currently only a theoretical
issue. At least I'm not aware of this causing problems with a real compiler.

Best regards,
Andreas
___
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
http://ffmpeg.org/mailman/listinfo/ffmpeg-devel


Re: [FFmpeg-devel] [PATCH 09/13] avcodec/svq1dec: clear MMX state after MB decode loop

2016-10-24 Thread Andreas Cadhalpun
On 24.10.2016 16:14, Ronald S. Bultje wrote:
> On Mon, Oct 24, 2016 at 8:47 AM, wm4  wrote:
>> On Mon, 24 Oct 2016 07:54:47 -0400
>> "Ronald S. Bultje"  wrote:
>>> On Mon, Oct 24, 2016 at 3:36 AM, wm4  wrote:
 I was under the impression that it is UB to have the FPU in MMX state
 at any time while in C, not just while e.g. calling the stdlib. Maybe I
 got that wrong (how would MMX intrinsics even work?) - can anyone shed
 light on the exact requirements? (Possibly again, sorry.)
>>>
>>> I'm under the impression that it's part of the calling convention. That
>> is,
>>> any code anywhere (including mmx intrinsics, indeed) can - when called -
>>> expect the state to be cleared by the caller, just like you'd expect
>>> eax/edx to be caller-save (whereas esi/edi are callee-save).
>>>
>>> However, if you never call external code (including intrinsics), you can
>>> ignore this, just as you can ignore / create your own calling
>>> convention (remember fastcall etc.?). However, when calling any external
>>> code, this could (in theory) crash; it's just that right now it only
>>> crashes with musl when calling malloc/free. So basically, ffmpeg has its
>>> own calling convention, and manually calling emms_c() fixes "ffmpeg"
>>> calling convention to be compatible with "standard" calling
>> convention...?
>>
>> It can't really be a calling convention unless the compiler is aware of
>> it?

It is defined as part of the System V Application Binary Interface [1]:
"The CPU shall be in x87 mode upon entry to a function. Therefore, every
function that uses the MMX registers is required to issue an emms or femms
instruction after using MMX registers, before returning or calling another
function."

>> We're doing things behind the compiler's back here. The safest
>> assumption would be that leaving the FPU state invalid while in C is
>> not allowed, period.

That's what the standard says.

>> The next safest assumption is that it's fine as long as we explicitly
>> don't use floating point or call external functions that aren't
>> MMX-aware. This would mean calling any libc functions or user callbacks
>> (including indirectly through e.g. av_log) is not fine.

This is probably OK in practice and likely has a significant performance
benefit.

Best regards,
Andreas


1: 
https://software.intel.com/sites/default/files/article/402129/mpx-linux64-abi.pdf

___
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
http://ffmpeg.org/mailman/listinfo/ffmpeg-devel


Re: [FFmpeg-devel] [PATCH 09/13] avcodec/svq1dec: clear MMX state after MB decode loop

2016-10-24 Thread Andreas Cadhalpun
On 23.10.2016 20:18, Carl Eugen Hoyos wrote:
> 2016-10-23 12:14 GMT+02:00 Andreas Cadhalpun 
> :
> 
>> I also don't like adding emms_c in various places, but I don't see a
>> realistic alternative
> 
> (+1!)
> 
>> other than not fixing the problem. And I think that would be worse
>> than this patch set.
> 
> I don't (strongly) disagree but could you elaborate on why this would
> be such a bad alternative?

For one thing FFmpeg shouldn't rely on undefined behavior (a theoretical
issue) and for another thing it should work with musl libc (a practical
problem). I think not fixing the latter is bad for our users.

Best regards,
Andreas
___
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
http://ffmpeg.org/mailman/listinfo/ffmpeg-devel


Re: [FFmpeg-devel] [PATCH 09/13] avcodec/svq1dec: clear MMX state after MB decode loop

2016-10-24 Thread Ronald S. Bultje
Hi,

On Mon, Oct 24, 2016 at 8:47 AM, wm4  wrote:

> On Mon, 24 Oct 2016 07:54:47 -0400
> "Ronald S. Bultje"  wrote:
>
> > Hi,
> >
> > On Mon, Oct 24, 2016 at 3:36 AM, wm4  wrote:
> >
> > > On Sun, 23 Oct 2016 13:02:01 -0400
> > > "Ronald S. Bultje"  wrote:
> > >
> > > > Hi,
> > > >
> > > > On Sat, Oct 22, 2016 at 11:36 PM, Michael Niedermayer <
> > > > mich...@niedermayer.cc> wrote:
> > > >
> > > > > On Sat, Oct 22, 2016 at 10:10:01PM -0400, Ronald S. Bultje wrote:
> > > > > > Hi,
> > > > > >
> > > > > > general comment about all other dec patches.
> > > > > >
> > > > > > On Sat, Oct 22, 2016 at 3:02 PM, Michael Niedermayer
> > > > >  > > > > > > wrote:
> > > > > >
> > > > > > > Signed-off-by: Michael Niedermayer 
> > > > > > > ---
> > > > > > >  libavcodec/svq1dec.c | 2 ++
> > > > > > >  1 file changed, 2 insertions(+)
> > > > > > >
> > > > > > > diff --git a/libavcodec/svq1dec.c b/libavcodec/svq1dec.c
> > > > > > > index 2b72e08..0fe222e 100644
> > > > > > > --- a/libavcodec/svq1dec.c
> > > > > > > +++ b/libavcodec/svq1dec.c
> > > > > > > @@ -744,6 +744,7 @@ static int svq1_decode_frame(AVCodecConte
> xt
> > > > > *avctx,
> > > > > > > void *data,
> > > > > > >  }
> > > > > > >  }
> > > > > > >  }
> > > > > > > +emms_c();
> > > > > >
> > > > > >
> > > > > > This is hideous, you're sprinkling emms_c in various places to
> make
> > > some
> > > > > > stupid test pass. The test is morbidly stupid and there is no
> general
> > > > > > consensus on patterns to be followed as for where to place
> emms_c.
> > > > > Someone
> > > > > > who doesn't know any better will litter each new decoder with
> 10-20
> > > calls
> > > > > > to emms_c just because he found that other decoders do it in
> > > > > undocumented,
> > > > > > unexplained and unclear locations also.
> > > > > >
> > > > > > If you want this to be a "thing", you need to design and document
> > > > > carefully
> > > > > > where emms_c is necessary. Then come up with some system that
> makes
> > > this
> > > > > > work by itself.
> > > > >
> > > > > Your mail sounds quite aggressive to me, did i say something to
> anger
> > > > > you ? It was certainly not intended
> > > > >
> > > > > About this patchset
> > > > > FFmpeg is broken ATM (both in theory and practice with some libcs),
> > > > > the way MMX code is used is not correct, emms_c()
> > > > > calls are missing and required. The obvious way to fix that
> > > > > (in practice) is adding emms_c() calls where they are missing.
> > > > > I can document why each call is needed, if thats wanted?
> > > >
> > > >
> > > > Your representation of facts is strange, to say the least. Let's
> explore
> > > > two related claims:
> > > >
> > > > - (A) ffmpeg is broken in practice when calling musl malloc/free
> > > functions
> > > > after calling MMX SIMD functions on x86-32 (which crashes).
> > > > - (B) ffmpeg is broken in theory because we don't clear FPU state (as
> > > > required) at the end of MMX SIMD functions.
> > >
> > > I was under the impression that it is UB to have the FPU in MMX state
> > > at any time while in C, not just while e.g. calling the stdlib. Maybe I
> > > got that wrong (how would MMX intrinsics even work?) - can anyone shed
> > > light on the exact requirements? (Possibly again, sorry.)
> >
> >
> > I'm under the impression that it's part of the calling convention. That
> is,
> > any code anywhere (including mmx intrinsics, indeed) can - when called -
> > expect the state to be cleared by the caller, just like you'd expect
> > eax/edx to be caller-save (whereas esi/edi are callee-save).
> >
> > However, if you never call external code (including intrinsics), you can
> > ignore this, just as you can ignore / create your own calling
> > convention (remember fastcall etc.?). However, when calling any external
> > code, this could (in theory) crash; it's just that right now it only
> > crashes with musl when calling malloc/free. So basically, ffmpeg has its
> > own calling convention, and manually calling emms_c() fixes "ffmpeg"
> > calling convention to be compatible with "standard" calling
> convention...?
>
> It can't really be a calling convention unless the compiler is aware of
> it?
>
> We're doing things behind the compiler's back here. The safest
> assumption would be that leaving the FPU state invalid while in C is
> not allowed, period.
>
> The next safest assumption is that it's fine as long as we explicitly
> don't use floating point or call external functions that aren't
> MMX-aware. This would mean calling any libc functions or user callbacks
> (including indirectly through e.g. av_log) is not fine.
>

Yes.

And while I don't agree with it, I feel we should also mention Carl Eugen's
alternative proposal, which is to document that ffmpeg doesn't formally
adhere to this calling convention requirement [1].

Of course 

Re: [FFmpeg-devel] [PATCH 09/13] avcodec/svq1dec: clear MMX state after MB decode loop

2016-10-24 Thread wm4
On Mon, 24 Oct 2016 07:54:47 -0400
"Ronald S. Bultje"  wrote:

> Hi,
> 
> On Mon, Oct 24, 2016 at 3:36 AM, wm4  wrote:
> 
> > On Sun, 23 Oct 2016 13:02:01 -0400
> > "Ronald S. Bultje"  wrote:
> >  
> > > Hi,
> > >
> > > On Sat, Oct 22, 2016 at 11:36 PM, Michael Niedermayer <  
> > > mich...@niedermayer.cc> wrote:  
> > >  
> > > > On Sat, Oct 22, 2016 at 10:10:01PM -0400, Ronald S. Bultje wrote:  
> > > > > Hi,
> > > > >
> > > > > general comment about all other dec patches.
> > > > >
> > > > > On Sat, Oct 22, 2016 at 3:02 PM, Michael Niedermayer  
> > > >  > > > > > wrote:  
> > > > >  
> > > > > > Signed-off-by: Michael Niedermayer 
> > > > > > ---
> > > > > >  libavcodec/svq1dec.c | 2 ++
> > > > > >  1 file changed, 2 insertions(+)
> > > > > >
> > > > > > diff --git a/libavcodec/svq1dec.c b/libavcodec/svq1dec.c
> > > > > > index 2b72e08..0fe222e 100644
> > > > > > --- a/libavcodec/svq1dec.c
> > > > > > +++ b/libavcodec/svq1dec.c
> > > > > > @@ -744,6 +744,7 @@ static int svq1_decode_frame(AVCodecContext  
> > > > *avctx,  
> > > > > > void *data,
> > > > > >  }
> > > > > >  }
> > > > > >  }
> > > > > > +emms_c();  
> > > > >
> > > > >
> > > > > This is hideous, you're sprinkling emms_c in various places to make  
> > some  
> > > > > stupid test pass. The test is morbidly stupid and there is no general
> > > > > consensus on patterns to be followed as for where to place emms_c.  
> > > > Someone  
> > > > > who doesn't know any better will litter each new decoder with 10-20  
> > calls  
> > > > > to emms_c just because he found that other decoders do it in  
> > > > undocumented,  
> > > > > unexplained and unclear locations also.
> > > > >
> > > > > If you want this to be a "thing", you need to design and document  
> > > > carefully  
> > > > > where emms_c is necessary. Then come up with some system that makes  
> > this  
> > > > > work by itself.  
> > > >
> > > > Your mail sounds quite aggressive to me, did i say something to anger
> > > > you ? It was certainly not intended
> > > >
> > > > About this patchset
> > > > FFmpeg is broken ATM (both in theory and practice with some libcs),
> > > > the way MMX code is used is not correct, emms_c()
> > > > calls are missing and required. The obvious way to fix that
> > > > (in practice) is adding emms_c() calls where they are missing.
> > > > I can document why each call is needed, if thats wanted?  
> > >
> > >
> > > Your representation of facts is strange, to say the least. Let's explore
> > > two related claims:
> > >
> > > - (A) ffmpeg is broken in practice when calling musl malloc/free  
> > functions  
> > > after calling MMX SIMD functions on x86-32 (which crashes).
> > > - (B) ffmpeg is broken in theory because we don't clear FPU state (as
> > > required) at the end of MMX SIMD functions.  
> >
> > I was under the impression that it is UB to have the FPU in MMX state
> > at any time while in C, not just while e.g. calling the stdlib. Maybe I
> > got that wrong (how would MMX intrinsics even work?) - can anyone shed
> > light on the exact requirements? (Possibly again, sorry.)  
> 
> 
> I'm under the impression that it's part of the calling convention. That is,
> any code anywhere (including mmx intrinsics, indeed) can - when called -
> expect the state to be cleared by the caller, just like you'd expect
> eax/edx to be caller-save (whereas esi/edi are callee-save).
> 
> However, if you never call external code (including intrinsics), you can
> ignore this, just as you can ignore / create your own calling
> convention (remember fastcall etc.?). However, when calling any external
> code, this could (in theory) crash; it's just that right now it only
> crashes with musl when calling malloc/free. So basically, ffmpeg has its
> own calling convention, and manually calling emms_c() fixes "ffmpeg"
> calling convention to be compatible with "standard" calling convention...?

It can't really be a calling convention unless the compiler is aware of
it?

We're doing things behind the compiler's back here. The safest
assumption would be that leaving the FPU state invalid while in C is
not allowed, period.

The next safest assumption is that it's fine as long as we explicitly
don't use floating point or call external functions that aren't
MMX-aware. This would mean calling any libc functions or user callbacks
(including indirectly through e.g. av_log) is not fine.

Of course this is not as easy to verify as adding a hack-assert to
av_malloc/av_free.
___
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
http://ffmpeg.org/mailman/listinfo/ffmpeg-devel


Re: [FFmpeg-devel] [PATCH 09/13] avcodec/svq1dec: clear MMX state after MB decode loop

2016-10-24 Thread wm4
On Sun, 23 Oct 2016 13:02:01 -0400
"Ronald S. Bultje"  wrote:

> Hi,
> 
> On Sat, Oct 22, 2016 at 11:36 PM, Michael Niedermayer <
> mich...@niedermayer.cc> wrote:  
> 
> > On Sat, Oct 22, 2016 at 10:10:01PM -0400, Ronald S. Bultje wrote:  
> > > Hi,
> > >
> > > general comment about all other dec patches.
> > >
> > > On Sat, Oct 22, 2016 at 3:02 PM, Michael Niedermayer  
> >  > > > wrote:  
> > >  
> > > > Signed-off-by: Michael Niedermayer 
> > > > ---
> > > >  libavcodec/svq1dec.c | 2 ++
> > > >  1 file changed, 2 insertions(+)
> > > >
> > > > diff --git a/libavcodec/svq1dec.c b/libavcodec/svq1dec.c
> > > > index 2b72e08..0fe222e 100644
> > > > --- a/libavcodec/svq1dec.c
> > > > +++ b/libavcodec/svq1dec.c
> > > > @@ -744,6 +744,7 @@ static int svq1_decode_frame(AVCodecContext  
> > *avctx,  
> > > > void *data,
> > > >  }
> > > >  }
> > > >  }
> > > > +emms_c();  
> > >
> > >
> > > This is hideous, you're sprinkling emms_c in various places to make some
> > > stupid test pass. The test is morbidly stupid and there is no general
> > > consensus on patterns to be followed as for where to place emms_c.  
> > Someone  
> > > who doesn't know any better will litter each new decoder with 10-20 calls
> > > to emms_c just because he found that other decoders do it in  
> > undocumented,  
> > > unexplained and unclear locations also.
> > >
> > > If you want this to be a "thing", you need to design and document  
> > carefully  
> > > where emms_c is necessary. Then come up with some system that makes this
> > > work by itself.  
> >
> > Your mail sounds quite aggressive to me, did i say something to anger
> > you ? It was certainly not intended
> >
> > About this patchset
> > FFmpeg is broken ATM (both in theory and practice with some libcs),
> > the way MMX code is used is not correct, emms_c()
> > calls are missing and required. The obvious way to fix that
> > (in practice) is adding emms_c() calls where they are missing.
> > I can document why each call is needed, if thats wanted?  
> 
> 
> Your representation of facts is strange, to say the least. Let's explore
> two related claims:
> 
> - (A) ffmpeg is broken in practice when calling musl malloc/free functions
> after calling MMX SIMD functions on x86-32 (which crashes).
> - (B) ffmpeg is broken in theory because we don't clear FPU state (as
> required) at the end of MMX SIMD functions.

I was under the impression that it is UB to have the FPU in MMX state
at any time while in C, not just while e.g. calling the stdlib. Maybe I
got that wrong (how would MMX intrinsics even work?) - can anyone shed
light on the exact requirements? (Possibly again, sorry.)
___
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
http://ffmpeg.org/mailman/listinfo/ffmpeg-devel


Re: [FFmpeg-devel] [PATCH 09/13] avcodec/svq1dec: clear MMX state after MB decode loop

2016-10-23 Thread Ronald S. Bultje
Hi,

On Sat, Oct 22, 2016 at 11:36 PM, Michael Niedermayer <
mich...@niedermayer.cc> wrote:

> On Sat, Oct 22, 2016 at 10:10:01PM -0400, Ronald S. Bultje wrote:
> > Hi,
> >
> > general comment about all other dec patches.
> >
> > On Sat, Oct 22, 2016 at 3:02 PM, Michael Niedermayer
>  > > wrote:
> >
> > > Signed-off-by: Michael Niedermayer 
> > > ---
> > >  libavcodec/svq1dec.c | 2 ++
> > >  1 file changed, 2 insertions(+)
> > >
> > > diff --git a/libavcodec/svq1dec.c b/libavcodec/svq1dec.c
> > > index 2b72e08..0fe222e 100644
> > > --- a/libavcodec/svq1dec.c
> > > +++ b/libavcodec/svq1dec.c
> > > @@ -744,6 +744,7 @@ static int svq1_decode_frame(AVCodecContext
> *avctx,
> > > void *data,
> > >  }
> > >  }
> > >  }
> > > +emms_c();
> >
> >
> > This is hideous, you're sprinkling emms_c in various places to make some
> > stupid test pass. The test is morbidly stupid and there is no general
> > consensus on patterns to be followed as for where to place emms_c.
> Someone
> > who doesn't know any better will litter each new decoder with 10-20 calls
> > to emms_c just because he found that other decoders do it in
> undocumented,
> > unexplained and unclear locations also.
> >
> > If you want this to be a "thing", you need to design and document
> carefully
> > where emms_c is necessary. Then come up with some system that makes this
> > work by itself.
>
> Your mail sounds quite aggressive to me, did i say something to anger
> you ? It was certainly not intended
>
> About this patchset
> FFmpeg is broken ATM (both in theory and practice with some libcs),
> the way MMX code is used is not correct, emms_c()
> calls are missing and required. The obvious way to fix that
> (in practice) is adding emms_c() calls where they are missing.
> I can document why each call is needed, if thats wanted?


Your representation of facts is strange, to say the least. Let's explore
two related claims:

- (A) ffmpeg is broken in practice when calling musl malloc/free functions
after calling MMX SIMD functions on x86-32 (which crashes).
- (B) ffmpeg is broken in theory because we don't clear FPU state (as
required) at the end of MMX SIMD functions.

Which are you trying to fix? You make it sound like you're fixing (B) (you
use the plural "libcs" and often use the word "required" or "standard"),
but your patchset only addresses a small subset of (A). For example, you're
not clearing FPU state at the end of a SIMD function if it's followed by a
SIMD function that calls memcpy in the C implementation.

But more importantly, in doing so (regardless whether you're trying to fix
"A" or "B"), this patchset:
- has no design (or at least, I can't find it);
- it litters the code with unstructured calls to emms_c(), which seem
specifically placed to fix make fate with an assert added in av_malloc/free;
- it adds unnecessary (to fix "A") calls to emms_c() on x86-64;
- it doesn't address pretty much anything in "B";
- it does not address any scenario not tested by make fate (e.g. external
lib encoders/decoders/filters).

I'm mostly asking about design here. "Litter around emms_c() calls until
make fate passes with an assert added" is not a good design, as much as it
may decrease the number of av_assert2_fpu()s triggered by "make fate". Do
you think we can do better?

You're calling me aggressive, so to address that, let me try to be
substantive and give suggestions on how this could be done better. Please
note that this list is incomplete and is meant to start a discussion on
this subject, it's not like "if you address these specific issues, I'm OK
with the patchset". I'm happy to think more about this and give more
suggestions if you want to hear them. My suggestions:
- define whether we're trying to fix "A" or "B" from above, because it
affects the design. I'm assuming you're trying to fix A.
- #define emms_c() do { /* nothing */ } while (0) on x86-64;
- add a call to emms_c() in frame allocation routines (this should prevent
the emms_c() insertions in vp56.c and svq1dec.c);
- add a routine to signal that frame processing is complete (like the
INT_MAX check you're trying to add with frame-threading), and call that in
all decoders/encoders after block processing (which calls SIMD) is
complete, so we can context-switch (and emms_c()) regardless of the
presence of frame-mt or not. It also gets rid of the magic value INT_MAX;
- consider potential issues with external lib en/decoders which don't have
much of a presence in "make fate";
- think about libavfilter.

I'm fully aware that this is not a complete design that gets rid of calls
to emms_c() in all decoders. I'm also fully aware that some decoders
already call emms_c() because they signal when one line of blocks is
completed, which is a user-callback (which may need a cleared FPU state).
I'm not asking you to get rid of any of that, I'm just asking you to not
make it exponentially worse by designing this a little bit more 

Re: [FFmpeg-devel] [PATCH 09/13] avcodec/svq1dec: clear MMX state after MB decode loop

2016-10-22 Thread Michael Niedermayer
On Sat, Oct 22, 2016 at 10:10:01PM -0400, Ronald S. Bultje wrote:
> Hi,
> 
> general comment about all other dec patches.
> 
> On Sat, Oct 22, 2016 at 3:02 PM, Michael Niedermayer  > wrote:
> 
> > Signed-off-by: Michael Niedermayer 
> > ---
> >  libavcodec/svq1dec.c | 2 ++
> >  1 file changed, 2 insertions(+)
> >
> > diff --git a/libavcodec/svq1dec.c b/libavcodec/svq1dec.c
> > index 2b72e08..0fe222e 100644
> > --- a/libavcodec/svq1dec.c
> > +++ b/libavcodec/svq1dec.c
> > @@ -744,6 +744,7 @@ static int svq1_decode_frame(AVCodecContext *avctx,
> > void *data,
> >  }
> >  }
> >  }
> > +emms_c();
> 
> 
> This is hideous, you're sprinkling emms_c in various places to make some
> stupid test pass. The test is morbidly stupid and there is no general
> consensus on patterns to be followed as for where to place emms_c. Someone
> who doesn't know any better will litter each new decoder with 10-20 calls
> to emms_c just because he found that other decoders do it in undocumented,
> unexplained and unclear locations also.
> 
> If you want this to be a "thing", you need to design and document carefully
> where emms_c is necessary. Then come up with some system that makes this
> work by itself.

Your mail sounds quite aggressive to me, did i say something to anger
you ? It was certainly not intended

About this patchset
FFmpeg is broken ATM (both in theory and practice with some libcs),
the way MMX code is used is not correct, emms_c()
calls are missing and required. The obvious way to fix that
(in practice) is adding emms_c() calls where they are missing.
I can document why each call is needed, if thats wanted?

I dont think hiding the calls by some trick would make it 
better. The calls must be carefully placed and knowing what the calls
do makes that easier to maintain than a frame_end() function which
would have to be ordered so it comes before any alloc/free/ref/unref/
... as well but for which thie requirement is less obvious
in fact even the emms in the thread progress is probably not that great
of an idea because of this.

[...]
-- 
Michael GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB

If you fake or manipulate statistics in a paper in physics you will never
get a job again.
If you fake or manipulate statistics in a paper in medicin you will get
a job for life at the pharma industry.


signature.asc
Description: Digital signature
___
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
http://ffmpeg.org/mailman/listinfo/ffmpeg-devel


Re: [FFmpeg-devel] [PATCH 09/13] avcodec/svq1dec: clear MMX state after MB decode loop

2016-10-22 Thread Ronald S. Bultje
Hi,

general comment about all other dec patches.

On Sat, Oct 22, 2016 at 3:02 PM, Michael Niedermayer  wrote:

> Signed-off-by: Michael Niedermayer 
> ---
>  libavcodec/svq1dec.c | 2 ++
>  1 file changed, 2 insertions(+)
>
> diff --git a/libavcodec/svq1dec.c b/libavcodec/svq1dec.c
> index 2b72e08..0fe222e 100644
> --- a/libavcodec/svq1dec.c
> +++ b/libavcodec/svq1dec.c
> @@ -744,6 +744,7 @@ static int svq1_decode_frame(AVCodecContext *avctx,
> void *data,
>  }
>  }
>  }
> +emms_c();


This is hideous, you're sprinkling emms_c in various places to make some
stupid test pass. The test is morbidly stupid and there is no general
consensus on patterns to be followed as for where to place emms_c. Someone
who doesn't know any better will litter each new decoder with 10-20 calls
to emms_c just because he found that other decoders do it in undocumented,
unexplained and unclear locations also.

If you want this to be a "thing", you need to design and document carefully
where emms_c is necessary. Then come up with some system that makes this
work by itself. I've said from the beginning that I highly dislike
littering the code with emms_c in individual decoders, and that's exactly
what you're doing here. This is insane.

Ronald
___
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
http://ffmpeg.org/mailman/listinfo/ffmpeg-devel