Re: [PATCH v2 08/12] intel-ipu3: params: compute and program ccs
Hi Tuukka, Andy, others, On Wed, Oct 11, 2017 at 05:27:27PM +0300, Andy Shevchenko wrote: > On Wed, Oct 11, 2017 at 5:01 PM, Tuukka Toivonen >wrote: > > On Wed, 2017-10-11 at 16:31 +0300, Andy Shevchenko wrote: > >> On Wed, Oct 11, 2017 at 10:29 AM, sakari.ai...@linux.intel.com > >> wrote: > >> > On Wed, Oct 11, 2017 at 04:14:37AM +, Zhi, Yong wrote: > > >> > > > > +static unsigned int ipu3_css_scaler_get_exp(unsigned int > >> > > > > counter, > >> > > > > + unsigned int > >> > > > > divider) { > >> > > > > + unsigned int i = 0; > >> > > > > + > >> > > > > + while (counter <= divider / 2) { > >> > > > > + divider /= 2; > >> > > > > + i++; > >> > > > > + } > >> > > > > + > >> > > > > + return i; > > >> Roughly like > >> > >> if (!counter || divider < counter) > >> return 0; > >> return order_base_2(divider) - order_base_2(counter); > > > > The original loop is typical ran just couple of times, so I think > > that fls or division are probably slower than the original loop. > > Furthermore, these "optimizations" are also harder to read, so in > > my opinion there's no advantage in using them. > > Honestly I'm opposing that. > It took me about minute to be clear what is going on on that loop > while fls() / ffs() / ilog2() like stuff can be read fast. > > Like > > int shift = order_base_2(divider) - order_base_2(counter); > > return shift > 0 ? shift : 0; > > And frankly I don't care about under the hoods of order_base_2(). I > care about this certain piece of code to be simpler. > > One may put a comment line: > > # Get log2 of how divider bigger than counter > > And thinking more while writing this message the use of order_base_2() > actually explains what's going on here. I guess this isn't really worth spending much time on; either way, please at least fix the current implementation so it won't end up in an infinite loop. This happens now if counter is zero. -- Regards, Sakari Ailus e-mail: sakari.ai...@iki.fi
Re: [PATCH v2 08/12] intel-ipu3: params: compute and program ccs
On Wed, Oct 11, 2017 at 5:01 PM, Tuukka Toivonenwrote: > On Wed, 2017-10-11 at 16:31 +0300, Andy Shevchenko wrote: >> On Wed, Oct 11, 2017 at 10:29 AM, sakari.ai...@linux.intel.com >> wrote: >> > On Wed, Oct 11, 2017 at 04:14:37AM +, Zhi, Yong wrote: >> > > > > +static unsigned int ipu3_css_scaler_get_exp(unsigned int >> > > > > counter, >> > > > > + unsigned int >> > > > > divider) { >> > > > > + unsigned int i = 0; >> > > > > + >> > > > > + while (counter <= divider / 2) { >> > > > > + divider /= 2; >> > > > > + i++; >> > > > > + } >> > > > > + >> > > > > + return i; >> Roughly like >> >> if (!counter || divider < counter) >> return 0; >> return order_base_2(divider) - order_base_2(counter); > > The original loop is typical ran just couple of times, so I think > that fls or division are probably slower than the original loop. > Furthermore, these "optimizations" are also harder to read, so in > my opinion there's no advantage in using them. Honestly I'm opposing that. It took me about minute to be clear what is going on on that loop while fls() / ffs() / ilog2() like stuff can be read fast. Like int shift = order_base_2(divider) - order_base_2(counter); return shift > 0 ? shift : 0; And frankly I don't care about under the hoods of order_base_2(). I care about this certain piece of code to be simpler. One may put a comment line: # Get log2 of how divider bigger than counter And thinking more while writing this message the use of order_base_2() actually explains what's going on here. -- With Best Regards, Andy Shevchenko
Re: [PATCH v2 08/12] intel-ipu3: params: compute and program ccs
On Wed, 2017-10-11 at 16:31 +0300, Andy Shevchenko wrote: > On Wed, Oct 11, 2017 at 10:29 AM, sakari.ai...@linux.intel.com >wrote: > > On Wed, Oct 11, 2017 at 04:14:37AM +, Zhi, Yong wrote: > > > > > +static unsigned int ipu3_css_scaler_get_exp(unsigned int > > > > > counter, > > > > > + unsigned int > > > > > divider) { > > > > > + unsigned int i = 0; > > > > > + > > > > > + while (counter <= divider / 2) { > > > > > + divider /= 2; > > > > > + i++; > > > > > + } > > > > > + > > > > > + return i; > > return (!counter || divider < counter) ? > > 0 : fls(divider / counter) - 1; > > Extra division is here (I dunno if counter is always power of 2 but > it > doesn't matter for compiler). > > Basically above calculates how much bits we need to shift divider to > get it less than counter. > > I would consider to use something from log2.h. > > Roughly like > > if (!counter || divider < counter) > return 0; > return order_base_2(divider) - order_base_2(counter); The original loop is typical ran just couple of times, so I think that fls or division are probably slower than the original loop. Furthermore, these "optimizations" are also harder to read, so in my opinion there's no advantage in using them. - Tuukka
Re: [PATCH v2 08/12] intel-ipu3: params: compute and program ccs
On Wed, Oct 11, 2017 at 10:29 AM, sakari.ai...@linux.intel.comwrote: > On Wed, Oct 11, 2017 at 04:14:37AM +, Zhi, Yong wrote: >> > > +static unsigned int ipu3_css_scaler_get_exp(unsigned int counter, >> > > + unsigned int divider) { >> > > + unsigned int i = 0; >> > > + >> > > + while (counter <= divider / 2) { >> > > + divider /= 2; >> > > + i++; >> > > + } >> > > + >> > > + return i; > return (!counter || divider < counter) ? >0 : fls(divider / counter) - 1; Extra division is here (I dunno if counter is always power of 2 but it doesn't matter for compiler). Basically above calculates how much bits we need to shift divider to get it less than counter. I would consider to use something from log2.h. Roughly like if (!counter || divider < counter) return 0; return order_base_2(divider) - order_base_2(counter); -- With Best Regards, Andy Shevchenko
Re: [PATCH v2 08/12] intel-ipu3: params: compute and program ccs
On Wed, Oct 11, 2017 at 04:14:37AM +, Zhi, Yong wrote: > Hi, Andy, > > > -Original Message- > > From: linux-media-ow...@vger.kernel.org [mailto:linux-media- > > ow...@vger.kernel.org] On Behalf Of Andy Shevchenko > > Sent: Friday, June 16, 2017 3:53 PM > > To: Zhi, Yong <yong@intel.com> > > Cc: Linux Media Mailing List <linux-media@vger.kernel.org>; > > sakari.ai...@linux.intel.com; Zheng, Jian Xu <jian.xu.zh...@intel.com>; > > tf...@chromium.org; Mani, Rajmohan <rajmohan.m...@intel.com>; > > Toivonen, Tuukka <tuukka.toivo...@intel.com> > > Subject: Re: [PATCH v2 08/12] intel-ipu3: params: compute and program ccs > > > > On Thu, Jun 15, 2017 at 1:19 AM, Yong Zhi <yong@intel.com> wrote: > > > A collection of routines that are mainly responsible to calculate the > > > acc parameters. > > > > > +static unsigned int ipu3_css_scaler_get_exp(unsigned int counter, > > > + unsigned int divider) { > > > + unsigned int i = 0; > > > + > > > + while (counter <= divider / 2) { > > > + divider /= 2; > > > + i++; > > > + } > > > + > > > + return i; > > > > We have a lot of different helpers including one you may use instead of this > > function. > > > > It's *highly* recommended you learn what we have under lib/ (and not only > > there) in kernel bewfore submitting a new version. > > > > Tried to identify more places that could be re-implemented with lib > helpers or more generic method, but we failed to spot any obvious > candidate thus far. How about: return (!counter || divider < counter) ? 0 : fls(divider / counter) - 1; -- Sakari Ailus sakari.ai...@linux.intel.com
RE: [PATCH v2 08/12] intel-ipu3: params: compute and program ccs
Hi, Andy, > -Original Message- > From: linux-media-ow...@vger.kernel.org [mailto:linux-media- > ow...@vger.kernel.org] On Behalf Of Andy Shevchenko > Sent: Friday, June 16, 2017 3:53 PM > To: Zhi, Yong <yong@intel.com> > Cc: Linux Media Mailing List <linux-media@vger.kernel.org>; > sakari.ai...@linux.intel.com; Zheng, Jian Xu <jian.xu.zh...@intel.com>; > tf...@chromium.org; Mani, Rajmohan <rajmohan.m...@intel.com>; > Toivonen, Tuukka <tuukka.toivo...@intel.com> > Subject: Re: [PATCH v2 08/12] intel-ipu3: params: compute and program ccs > > On Thu, Jun 15, 2017 at 1:19 AM, Yong Zhi <yong@intel.com> wrote: > > A collection of routines that are mainly responsible to calculate the > > acc parameters. > > > +static unsigned int ipu3_css_scaler_get_exp(unsigned int counter, > > + unsigned int divider) { > > + unsigned int i = 0; > > + > > + while (counter <= divider / 2) { > > + divider /= 2; > > + i++; > > + } > > + > > + return i; > > We have a lot of different helpers including one you may use instead of this > function. > > It's *highly* recommended you learn what we have under lib/ (and not only > there) in kernel bewfore submitting a new version. > Tried to identify more places that could be re-implemented with lib helpers or more generic method, but we failed to spot any obvious candidate thus far. > -- > With Best Regards, > Andy Shevchenko
Re: [PATCH v2 08/12] intel-ipu3: params: compute and program ccs
On Thu, Jun 15, 2017 at 1:19 AM, Yong Zhiwrote: > A collection of routines that are mainly responsible > to calculate the acc parameters. > +static unsigned int ipu3_css_scaler_get_exp(unsigned int counter, > + unsigned int divider) > +{ > + unsigned int i = 0; > + > + while (counter <= divider / 2) { > + divider /= 2; > + i++; > + } > + > + return i; We have a lot of different helpers including one you may use instead of this function. It's *highly* recommended you learn what we have under lib/ (and not only there) in kernel bewfore submitting a new version. -- With Best Regards, Andy Shevchenko
[PATCH v2 08/12] intel-ipu3: params: compute and program ccs
A collection of routines that are mainly responsible to calculate the acc parameters. Signed-off-by: Yong Zhi--- drivers/media/pci/intel/ipu3/ipu3-css-params.c | 3119 drivers/media/pci/intel/ipu3/ipu3-css-params.h | 105 + drivers/media/pci/intel/ipu3/ipu3-css.h| 92 + 3 files changed, 3316 insertions(+) create mode 100644 drivers/media/pci/intel/ipu3/ipu3-css-params.c create mode 100644 drivers/media/pci/intel/ipu3/ipu3-css-params.h diff --git a/drivers/media/pci/intel/ipu3/ipu3-css-params.c b/drivers/media/pci/intel/ipu3/ipu3-css-params.c new file mode 100644 index 000..0dcdfed --- /dev/null +++ b/drivers/media/pci/intel/ipu3/ipu3-css-params.c @@ -0,0 +1,3119 @@ +/* + * Copyright (c) 2017 Intel Corporation. + * + * This program is free software; you can redistribute it and/or + * modify it under the terms of the GNU General Public License version + * 2 as published by the Free Software Foundation. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + */ + +#include +#include + +#include "ipu3-abi.h" +#include "ipu3-css.h" +#include "ipu3-css-fw.h" +#include "ipu3-css-params.h" +#include "ipu3-tables.h" + +static unsigned int ipu3_css_scaler_get_exp(unsigned int counter, + unsigned int divider) +{ + unsigned int i = 0; + + while (counter <= divider / 2) { + divider /= 2; + i++; + } + + return i; +} + +/* Set up the CSS scaler look up table */ +static void ipu3_css_scaler_setup_lut(unsigned int taps, + unsigned int input_width, + unsigned int output_width, + int phase_step_correction, + const int *coeffs, + unsigned int coeffs_size, + s8 coeff_lut[IMGU_SCALER_PHASES * + IMGU_SCALER_FILTER_TAPS], + struct ipu3_css_scaler_info *info) +{ + int tap; + int phase; + int exponent = ipu3_css_scaler_get_exp(output_width, input_width); + int mantissa = (1 << exponent) * output_width; + unsigned int phase_step = 0; + int phase_sum_left = 0; + int phase_sum_right = 0; + + for (phase = 0; phase < IMGU_SCALER_PHASES; phase++) { + for (tap = 0; tap < taps; tap++) { + /* flip table to for convolution reverse indexing */ + s64 coeff = coeffs[coeffs_size - + ((tap * (coeffs_size / taps)) + + phase) - 1]; + coeff *= mantissa; + coeff /= input_width; + + /* Add +"0.5" */ + coeff += 1 << (IMGU_SCALER_COEFF_BITS - 1); + coeff >>= IMGU_SCALER_COEFF_BITS; + + coeff_lut[phase * IMGU_SCALER_FILTER_TAPS + tap] = + coeff; + } + } + + phase_step = IMGU_SCALER_PHASES * + (1 << IMGU_SCALER_PHASE_COUNTER_PREC_REF) + * output_width / input_width; + phase_step += phase_step_correction; + phase_sum_left = (taps / 2 * IMGU_SCALER_PHASES * + (1 << IMGU_SCALER_PHASE_COUNTER_PREC_REF)) + - (1 << (IMGU_SCALER_PHASE_COUNTER_PREC_REF - 1)); + phase_sum_right = (taps / 2 * IMGU_SCALER_PHASES * + (1 << IMGU_SCALER_PHASE_COUNTER_PREC_REF)) + + (1 << (IMGU_SCALER_PHASE_COUNTER_PREC_REF - 1)); + + info->exp_shift = IMGU_SCALER_MAX_EXPONENT_SHIFT - exponent; + info->pad_left = (phase_sum_left % phase_step == 0) ? + phase_sum_left / phase_step - 1 : phase_sum_left / phase_step; + info->pad_right = (phase_sum_right % phase_step == 0) ? + phase_sum_right / phase_step - 1 : phase_sum_right / phase_step; + info->phase_init = phase_sum_left - phase_step * info->pad_left; + info->phase_step = phase_step; + info->crop_left = taps - 1; + info->crop_top = taps - 1; +} + +/* + * Calculates the exact output image width/height, based on phase_step setting + * (must be perfectly aligned with hardware). + */ +static unsigned int ipu3_css_scaler_calc_scaled_output(unsigned int input, + struct ipu3_css_scaler_info *info) +{ + unsigned int arg1 = input * info->phase_step + + (1 - IMGU_SCALER_TAPS_Y / 2) * + IMGU_SCALER_FIR_PHASES -