Re: [PATCH v2 08/12] intel-ipu3: params: compute and program ccs

2017-10-13 Thread Sakari Ailus
Hi Tuukka, Andy, others,

On Wed, Oct 11, 2017 at 05:27:27PM +0300, Andy Shevchenko wrote:
> On Wed, Oct 11, 2017 at 5:01 PM, Tuukka Toivonen
>  wrote:
> > On Wed, 2017-10-11 at 16:31 +0300, Andy Shevchenko wrote:
> >> On Wed, Oct 11, 2017 at 10:29 AM, sakari.ai...@linux.intel.com
> >>  wrote:
> >> > On Wed, Oct 11, 2017 at 04:14:37AM +, Zhi, Yong wrote:
> 
> >> > > > > +static unsigned int ipu3_css_scaler_get_exp(unsigned int
> >> > > > > counter,
> >> > > > > +   unsigned int
> >> > > > > divider) {
> >> > > > > +   unsigned int i = 0;
> >> > > > > +
> >> > > > > +   while (counter <= divider / 2) {
> >> > > > > +   divider /= 2;
> >> > > > > +   i++;
> >> > > > > +   }
> >> > > > > +
> >> > > > > +   return i;
> 
> >> Roughly like
> >>
> >> if (!counter || divider < counter)
> >>  return 0;
> >> return order_base_2(divider) - order_base_2(counter);
> >
> > The original loop is typical ran just couple of times, so I think
> > that fls or division are probably slower than the original loop.
> > Furthermore, these "optimizations" are also harder to read, so in
> > my opinion there's no advantage in using them.
> 
> Honestly I'm opposing that.
> It took me about minute to be clear what is going on on that loop
> while fls() / ffs() / ilog2() like stuff can be read fast.
> 
> Like
> 
> int shift = order_base_2(divider) - order_base_2(counter);
> 
> return shift > 0 ? shift : 0;
> 
> And frankly I don't care about under the hoods of order_base_2(). I
> care about this certain piece of code to be simpler.
> 
> One may put a comment line:
> 
> # Get log2 of how divider bigger than counter
> 
> And thinking more while writing this message the use of order_base_2()
> actually explains what's going on here.

I guess this isn't really worth spending much time on; either way, please
at least fix the current implementation so it won't end up in an infinite
loop. This happens now if counter is zero.

-- 
Regards,

Sakari Ailus
e-mail: sakari.ai...@iki.fi


Re: [PATCH v2 08/12] intel-ipu3: params: compute and program ccs

2017-10-11 Thread Andy Shevchenko
On Wed, Oct 11, 2017 at 5:01 PM, Tuukka Toivonen
 wrote:
> On Wed, 2017-10-11 at 16:31 +0300, Andy Shevchenko wrote:
>> On Wed, Oct 11, 2017 at 10:29 AM, sakari.ai...@linux.intel.com
>>  wrote:
>> > On Wed, Oct 11, 2017 at 04:14:37AM +, Zhi, Yong wrote:

>> > > > > +static unsigned int ipu3_css_scaler_get_exp(unsigned int
>> > > > > counter,
>> > > > > +   unsigned int
>> > > > > divider) {
>> > > > > +   unsigned int i = 0;
>> > > > > +
>> > > > > +   while (counter <= divider / 2) {
>> > > > > +   divider /= 2;
>> > > > > +   i++;
>> > > > > +   }
>> > > > > +
>> > > > > +   return i;

>> Roughly like
>>
>> if (!counter || divider < counter)
>>  return 0;
>> return order_base_2(divider) - order_base_2(counter);
>
> The original loop is typical ran just couple of times, so I think
> that fls or division are probably slower than the original loop.
> Furthermore, these "optimizations" are also harder to read, so in
> my opinion there's no advantage in using them.

Honestly I'm opposing that.
It took me about minute to be clear what is going on on that loop
while fls() / ffs() / ilog2() like stuff can be read fast.

Like

int shift = order_base_2(divider) - order_base_2(counter);

return shift > 0 ? shift : 0;

And frankly I don't care about under the hoods of order_base_2(). I
care about this certain piece of code to be simpler.

One may put a comment line:

# Get log2 of how divider bigger than counter

And thinking more while writing this message the use of order_base_2()
actually explains what's going on here.

-- 
With Best Regards,
Andy Shevchenko


Re: [PATCH v2 08/12] intel-ipu3: params: compute and program ccs

2017-10-11 Thread Tuukka Toivonen
On Wed, 2017-10-11 at 16:31 +0300, Andy Shevchenko wrote:
> On Wed, Oct 11, 2017 at 10:29 AM, sakari.ai...@linux.intel.com
>  wrote:
> > On Wed, Oct 11, 2017 at 04:14:37AM +, Zhi, Yong wrote:
> > > > > +static unsigned int ipu3_css_scaler_get_exp(unsigned int
> > > > > counter,
> > > > > +   unsigned int
> > > > > divider) {
> > > > > +   unsigned int i = 0;
> > > > > +
> > > > > +   while (counter <= divider / 2) {
> > > > > +   divider /= 2;
> > > > > +   i++;
> > > > > +   }
> > > > > +
> > > > > +   return i;
> > return (!counter || divider < counter) ?
> >    0 : fls(divider / counter) - 1;
> 
> Extra division is here (I dunno if counter is always power of 2 but
> it
> doesn't matter for compiler).
> 
> Basically above calculates how much bits we need to shift divider to
> get it less than counter.
> 
> I would consider to use something from log2.h.
> 
> Roughly like
> 
> if (!counter || divider < counter)
>  return 0;
> return order_base_2(divider) - order_base_2(counter);

The original loop is typical ran just couple of times, so I think
that fls or division are probably slower than the original loop.
Furthermore, these "optimizations" are also harder to read, so in
my opinion there's no advantage in using them.

- Tuukka



Re: [PATCH v2 08/12] intel-ipu3: params: compute and program ccs

2017-10-11 Thread Andy Shevchenko
On Wed, Oct 11, 2017 at 10:29 AM, sakari.ai...@linux.intel.com
 wrote:
> On Wed, Oct 11, 2017 at 04:14:37AM +, Zhi, Yong wrote:

>> > > +static unsigned int ipu3_css_scaler_get_exp(unsigned int counter,
>> > > +   unsigned int divider) {
>> > > +   unsigned int i = 0;
>> > > +
>> > > +   while (counter <= divider / 2) {
>> > > +   divider /= 2;
>> > > +   i++;
>> > > +   }
>> > > +
>> > > +   return i;

> return (!counter || divider < counter) ?
>0 : fls(divider / counter) - 1;

Extra division is here (I dunno if counter is always power of 2 but it
doesn't matter for compiler).

Basically above calculates how much bits we need to shift divider to
get it less than counter.

I would consider to use something from log2.h.

Roughly like

if (!counter || divider < counter)
 return 0;
return order_base_2(divider) - order_base_2(counter);

-- 
With Best Regards,
Andy Shevchenko


Re: [PATCH v2 08/12] intel-ipu3: params: compute and program ccs

2017-10-11 Thread sakari.ai...@linux.intel.com
On Wed, Oct 11, 2017 at 04:14:37AM +, Zhi, Yong wrote:
> Hi, Andy,
> 
> > -Original Message-
> > From: linux-media-ow...@vger.kernel.org [mailto:linux-media-
> > ow...@vger.kernel.org] On Behalf Of Andy Shevchenko
> > Sent: Friday, June 16, 2017 3:53 PM
> > To: Zhi, Yong <yong@intel.com>
> > Cc: Linux Media Mailing List <linux-media@vger.kernel.org>;
> > sakari.ai...@linux.intel.com; Zheng, Jian Xu <jian.xu.zh...@intel.com>;
> > tf...@chromium.org; Mani, Rajmohan <rajmohan.m...@intel.com>;
> > Toivonen, Tuukka <tuukka.toivo...@intel.com>
> > Subject: Re: [PATCH v2 08/12] intel-ipu3: params: compute and program ccs
> > 
> > On Thu, Jun 15, 2017 at 1:19 AM, Yong Zhi <yong@intel.com> wrote:
> > > A collection of routines that are mainly responsible to calculate the
> > > acc parameters.
> > 
> > > +static unsigned int ipu3_css_scaler_get_exp(unsigned int counter,
> > > +   unsigned int divider) {
> > > +   unsigned int i = 0;
> > > +
> > > +   while (counter <= divider / 2) {
> > > +   divider /= 2;
> > > +   i++;
> > > +   }
> > > +
> > > +   return i;
> > 
> > We have a lot of different helpers including one you may use instead of this
> > function.
> > 
> > It's *highly* recommended you learn what we have under lib/ (and not only
> > there) in kernel bewfore submitting a new version.
> > 
> 
> Tried to identify more places that could be re-implemented with lib
> helpers or more generic method, but we failed to spot any obvious
> candidate thus far.

How about:

return (!counter || divider < counter) ?
   0 : fls(divider / counter) - 1;

-- 
Sakari Ailus
sakari.ai...@linux.intel.com


RE: [PATCH v2 08/12] intel-ipu3: params: compute and program ccs

2017-10-10 Thread Zhi, Yong
Hi, Andy,

> -Original Message-
> From: linux-media-ow...@vger.kernel.org [mailto:linux-media-
> ow...@vger.kernel.org] On Behalf Of Andy Shevchenko
> Sent: Friday, June 16, 2017 3:53 PM
> To: Zhi, Yong <yong@intel.com>
> Cc: Linux Media Mailing List <linux-media@vger.kernel.org>;
> sakari.ai...@linux.intel.com; Zheng, Jian Xu <jian.xu.zh...@intel.com>;
> tf...@chromium.org; Mani, Rajmohan <rajmohan.m...@intel.com>;
> Toivonen, Tuukka <tuukka.toivo...@intel.com>
> Subject: Re: [PATCH v2 08/12] intel-ipu3: params: compute and program ccs
> 
> On Thu, Jun 15, 2017 at 1:19 AM, Yong Zhi <yong@intel.com> wrote:
> > A collection of routines that are mainly responsible to calculate the
> > acc parameters.
> 
> > +static unsigned int ipu3_css_scaler_get_exp(unsigned int counter,
> > +   unsigned int divider) {
> > +   unsigned int i = 0;
> > +
> > +   while (counter <= divider / 2) {
> > +   divider /= 2;
> > +   i++;
> > +   }
> > +
> > +   return i;
> 
> We have a lot of different helpers including one you may use instead of this
> function.
> 
> It's *highly* recommended you learn what we have under lib/ (and not only
> there) in kernel bewfore submitting a new version.
> 

Tried to identify more places that could be re-implemented with lib helpers or 
more generic method, but we failed to spot any obvious candidate thus far.

> --
> With Best Regards,
> Andy Shevchenko


Re: [PATCH v2 08/12] intel-ipu3: params: compute and program ccs

2017-06-16 Thread Andy Shevchenko
On Thu, Jun 15, 2017 at 1:19 AM, Yong Zhi  wrote:
> A collection of routines that are mainly responsible
> to calculate the acc parameters.

> +static unsigned int ipu3_css_scaler_get_exp(unsigned int counter,
> +   unsigned int divider)
> +{
> +   unsigned int i = 0;
> +
> +   while (counter <= divider / 2) {
> +   divider /= 2;
> +   i++;
> +   }
> +
> +   return i;

We have a lot of different helpers including one you may use instead
of this function.

It's *highly* recommended you learn what we have under lib/ (and not
only there) in kernel bewfore submitting a new version.

-- 
With Best Regards,
Andy Shevchenko


[PATCH v2 08/12] intel-ipu3: params: compute and program ccs

2017-06-14 Thread Yong Zhi
A collection of routines that are mainly responsible
to calculate the acc parameters.

Signed-off-by: Yong Zhi 
---
 drivers/media/pci/intel/ipu3/ipu3-css-params.c | 3119 
 drivers/media/pci/intel/ipu3/ipu3-css-params.h |  105 +
 drivers/media/pci/intel/ipu3/ipu3-css.h|   92 +
 3 files changed, 3316 insertions(+)
 create mode 100644 drivers/media/pci/intel/ipu3/ipu3-css-params.c
 create mode 100644 drivers/media/pci/intel/ipu3/ipu3-css-params.h

diff --git a/drivers/media/pci/intel/ipu3/ipu3-css-params.c 
b/drivers/media/pci/intel/ipu3/ipu3-css-params.c
new file mode 100644
index 000..0dcdfed
--- /dev/null
+++ b/drivers/media/pci/intel/ipu3/ipu3-css-params.c
@@ -0,0 +1,3119 @@
+/*
+ * Copyright (c) 2017 Intel Corporation.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License version
+ * 2 as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ */
+
+#include 
+#include 
+
+#include "ipu3-abi.h"
+#include "ipu3-css.h"
+#include "ipu3-css-fw.h"
+#include "ipu3-css-params.h"
+#include "ipu3-tables.h"
+
+static unsigned int ipu3_css_scaler_get_exp(unsigned int counter,
+   unsigned int divider)
+{
+   unsigned int i = 0;
+
+   while (counter <= divider / 2) {
+   divider /= 2;
+   i++;
+   }
+
+   return i;
+}
+
+/* Set up the CSS scaler look up table */
+static void ipu3_css_scaler_setup_lut(unsigned int taps,
+ unsigned int input_width,
+ unsigned int output_width,
+ int phase_step_correction,
+ const int *coeffs,
+ unsigned int coeffs_size,
+ s8 coeff_lut[IMGU_SCALER_PHASES *
+  IMGU_SCALER_FILTER_TAPS],
+ struct ipu3_css_scaler_info *info)
+{
+   int tap;
+   int phase;
+   int exponent = ipu3_css_scaler_get_exp(output_width, input_width);
+   int mantissa = (1 << exponent) * output_width;
+   unsigned int phase_step = 0;
+   int phase_sum_left = 0;
+   int phase_sum_right = 0;
+
+   for (phase = 0; phase < IMGU_SCALER_PHASES; phase++) {
+   for (tap = 0; tap < taps; tap++) {
+   /* flip table to for convolution reverse indexing */
+   s64 coeff =  coeffs[coeffs_size -
+   ((tap * (coeffs_size / taps)) +
+   phase) - 1];
+   coeff *= mantissa;
+   coeff /= input_width;
+
+   /* Add +"0.5" */
+   coeff += 1 << (IMGU_SCALER_COEFF_BITS - 1);
+   coeff >>= IMGU_SCALER_COEFF_BITS;
+
+   coeff_lut[phase * IMGU_SCALER_FILTER_TAPS + tap] =
+   coeff;
+   }
+   }
+
+   phase_step = IMGU_SCALER_PHASES *
+   (1 << IMGU_SCALER_PHASE_COUNTER_PREC_REF)
+   * output_width / input_width;
+   phase_step += phase_step_correction;
+   phase_sum_left = (taps / 2 * IMGU_SCALER_PHASES *
+   (1 << IMGU_SCALER_PHASE_COUNTER_PREC_REF))
+   - (1 << (IMGU_SCALER_PHASE_COUNTER_PREC_REF - 1));
+   phase_sum_right = (taps / 2 * IMGU_SCALER_PHASES *
+   (1 << IMGU_SCALER_PHASE_COUNTER_PREC_REF))
+   + (1 << (IMGU_SCALER_PHASE_COUNTER_PREC_REF - 1));
+
+   info->exp_shift = IMGU_SCALER_MAX_EXPONENT_SHIFT - exponent;
+   info->pad_left = (phase_sum_left % phase_step == 0) ?
+   phase_sum_left / phase_step - 1 : phase_sum_left / phase_step;
+   info->pad_right = (phase_sum_right % phase_step == 0) ?
+   phase_sum_right / phase_step - 1 : phase_sum_right / phase_step;
+   info->phase_init = phase_sum_left - phase_step * info->pad_left;
+   info->phase_step = phase_step;
+   info->crop_left = taps - 1;
+   info->crop_top = taps - 1;
+}
+
+/*
+ * Calculates the exact output image width/height, based on phase_step setting
+ * (must be perfectly aligned with hardware).
+ */
+static unsigned int ipu3_css_scaler_calc_scaled_output(unsigned int input,
+   struct ipu3_css_scaler_info *info)
+{
+   unsigned int arg1 = input * info->phase_step +
+   (1 - IMGU_SCALER_TAPS_Y / 2) *
+   IMGU_SCALER_FIR_PHASES -