Re: [FFmpeg-devel] [PATCH 5/9] x86: simple_idct10_template: fix overflow in pass

2015-10-13 Thread Christophe Gisquet
Hi, 2015-10-13 2:26 GMT+02:00 Michael Niedermayer: > On Mon, Oct 12, 2015 at 07:37:46PM +0200, Christophe Gisquet wrote: >> When the input of a pass has 15 or 16 bits of precision (in particular >> the column pass), the addition of a bias to W4 may lead to overflows >> in

Re: [FFmpeg-devel] [PATCH 5/9] x86: simple_idct10_template: fix overflow in pass

2015-10-13 Thread Michael Niedermayer
On Tue, Oct 13, 2015 at 01:33:07PM +0200, Christophe Gisquet wrote: > Hi, > > 2015-10-13 13:10 GMT+02:00 Michael Niedermayer: > > hmm, I am a bit concerned that adding the rounder (which effectively is > > 0.5) causes an overflow, that would if I am not mistaken imply that >

Re: [FFmpeg-devel] [PATCH 5/9] x86: simple_idct10_template: fix overflow in pass

2015-10-13 Thread Michael Niedermayer
On Tue, Oct 13, 2015 at 09:01:44AM +0200, Christophe Gisquet wrote: > Hi, > > 2015-10-13 2:26 GMT+02:00 Michael Niedermayer: > > On Mon, Oct 12, 2015 at 07:37:46PM +0200, Christophe Gisquet wrote: > >> When the input of a pass has 15 or 16 bits of precision (in particular

Re: [FFmpeg-devel] [PATCH 5/9] x86: simple_idct10_template: fix overflow in pass

2015-10-13 Thread Christophe Gisquet
Hi, 2015-10-13 13:10 GMT+02:00 Michael Niedermayer: > hmm, I am a bit concerned that adding the rounder (which effectively is > 0.5) causes an overflow, that would if I am not mistaken imply that > things are very close to overflowing already without it It's true, but the

Re: [FFmpeg-devel] [PATCH 5/9] x86: simple_idct10_template: fix overflow in pass

2015-10-13 Thread Christophe Gisquet
2015-10-13 15:43 GMT+02:00 Michael Niedermayer: > On Tue, Oct 13, 2015 at 01:33:07PM +0200, Christophe Gisquet wrote: >> Hi, >> >> 2015-10-13 13:10 GMT+02:00 Michael Niedermayer: >> > hmm, I am a bit concerned that adding the rounder (which

[FFmpeg-devel] [PATCH 5/9] x86: simple_idct10_template: fix overflow in pass

2015-10-12 Thread Christophe Gisquet
When the input of a pass has 15 or 16 bits of precision (in particular the column pass), the addition of a bias to W4 may lead to overflows in the input to pmaddwd. This requires postponing the adding of the bias to after the first butterfly. To do so, the fact that m15, unused although zeroed,
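
The kind of overflow described in the patch message above can be illustrated with a scalar C sketch. This is not the code from simple_idct10_template: the constant `W4_EXAMPLE`, the helper names, and the way the bias is folded into the 16-bit operand are all assumptions made for illustration. pmaddwd multiplies packed signed 16-bit words and sums adjacent products into 32-bit results, so any rounding term folded in before the multiply has to fit in 16 bits, whereas a term added after it only has to fit in 32 bits.

```c
#include <stdint.h>

/* Hypothetical coefficient, chosen so that one input unit equals the bias.
 * It is NOT the W4 value used by FFmpeg. */
#define W4_EXAMPLE   16384
#define BIAS_EXAMPLE 16384   /* rounding term in the 32-bit domain */

/* Reduce a value to a signed 16-bit word the way a packed-word add would
 * (modulo 2^16), without relying on implementation-defined conversions. */
static int16_t wrap16(int32_t v)
{
    uint16_t u = (uint16_t)v;   /* well-defined: reduction modulo 2^16 */
    return (int16_t)(u >= 0x8000 ? (int32_t)u - 0x10000 : (int32_t)u);
}

/* Bias folded into the 16-bit operand BEFORE the multiply.
 * BIAS_EXAMPLE / W4_EXAMPLE == 1, so the bias is one input unit here. */
static int32_t bias_before_multiply(int16_t x)
{
    int16_t biased = wrap16((int32_t)x + BIAS_EXAMPLE / W4_EXAMPLE);
    return (int32_t)biased * W4_EXAMPLE;
}

/* Bias added to the 32-bit product AFTER the multiply. */
static int32_t bias_after_multiply(int16_t x)
{
    return (int32_t)x * W4_EXAMPLE + BIAS_EXAMPLE;
}
```

For a small input such as 1000 both helpers return the same value. For an input that already uses the full 16-bit range, such as 32767, the first helper wraps the operand to -32768 and returns a product with the wrong sign, while the second stays correct. In this simplified model, that is the reason for postponing the bias.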

Re: [FFmpeg-devel] [PATCH 5/9] x86: simple_idct10_template: fix overflow in pass

2015-10-12 Thread Michael Niedermayer
On Mon, Oct 12, 2015 at 07:37:46PM +0200, Christophe Gisquet wrote: > When the input of a pass has 15 or 16 bits of precision (in particular > the column pass), the addition of a bias to W4 may lead to overflows > in the input to pmaddwd. > > This requires postponing the adding of the bias to