RE: [EXT] RE: [PATCH v8 10/15] crypto/fsl: Improve hwrng performance in kernel

2022-01-10 Thread Gaurav Jain
Hi Andrey

For now I will remove this particular patch from my series and will send a 
separate patch addressing your comments.

Regards
Gaurav Jain

> -Original Message-
> From: ZHIZHIKIN Andrey 
> Sent: Monday, January 10, 2022 7:31 PM
> To: Gaurav Jain ; u-boot@lists.denx.de
> Cc: Stefano Babic ; Fabio Estevam ;
> Peng Fan ; Simon Glass ; Priyanka
> Jain ; Ye Li ; Horia Geanta
> ; Ji Luo ; Franck Lenormand
> ; Silvano Di Ninno ;
> Sahil Malhotra ; Pankaj Gupta
> ; Varun Sethi ; dl-uboot-imx
> ; Shengzhou Liu ; Mingkai Hu
> ; Rajesh Bhagat ; Meenakshi
> Aggarwal ; Wasim Khan
> ; Alison Wang ; Pramod
> Kumar ; Andy Tang ;
> Adrian Alonso ; Vladimir Oltean
> ; mich...@walle.cc
> Subject: [EXT] RE: [PATCH v8 10/15] crypto/fsl: Improve hwrng performance in
> kernel
> 
> Caution: EXT Email
> 
> Hello Gaurav,
> 
> Cc: Michael Walle
> 
> > -Original Message-
> > From: U-Boot  On Behalf Of Gaurav Jain
> > Sent: Monday, January 10, 2022 1:27 PM
> > To: u-boot@lists.denx.de
> > Cc: Stefano Babic ; Fabio Estevam
> > ; Peng Fan ; Simon Glass
> > ; Priyanka Jain ; Ye Li
> > ; Horia Geanta ; Ji Luo
> > ; Franck Lenormand ; Silvano
> > Di Ninno ; Sahil malhotra
> > ; Pankaj Gupta ; Varun
> > Sethi ; NXP i . MX U-Boot Team ;
> > Shengzhou Liu ; Mingkai Hu
> > ; Rajesh Bhagat ;
> Meenakshi
> > Aggarwal ; Wasim Khan
> > ; Alison Wang ; Pramod
> Kumar
> > ; Tang Yuantian ; Adrian
> > Alonso ; Vladimir Oltean 
> > Subject: [PATCH v8 10/15] crypto/fsl: Improve hwrng performance in
> > kernel
> >
> > From: Ye Li 
> >
> > RNG parameters are reconfigured.
> > - For TRNG to generate 256 bits of entropy, RNG TRNG Seed Control register
> >   is configured to have reduced SAMP_SIZE from default 2500 to 512. it is
> >   number of entropy samples that will be taken during Entropy generation.
> > - self-test registers(Monobit Limit, Poker Range, Run Length Limit)
> >   are synchronized with new RTSDCTL[SAMP_SIZE] of 512.
> >
> > TRNG time is caluculated based on sample size.
> 
> Typo: caluculated -> calculated
> 
> > time required to generate entropy is reduced and hwrng performance
> > improved from 0.3 kB/s to 1.3 kB/s.
> 
> Is there any degradation in passed/failed FIPS 140-2 test count? Can you 
> provide
> some results from at least rngtest run?
> 
> >
> > Signed-off-by: Ye Li 
> > Acked-by: Gaurav Jain >
> > ---
> >  drivers/crypto/fsl/jr.c | 102 +---
> >  include/fsl_sec.h   |   1 +
> >  2 files changed, 87 insertions(+), 16 deletions(-)
> >
> > diff --git a/drivers/crypto/fsl/jr.c b/drivers/crypto/fsl/jr.c index
> > a84440ab10..e5346a84a4 100644
> > --- a/drivers/crypto/fsl/jr.c
> > +++ b/drivers/crypto/fsl/jr.c
> > @@ -603,30 +603,100 @@ static u8 get_rng_vid(ccsr_sec_t *sec)
> >   */
> >  static void kick_trng(int ent_delay, ccsr_sec_t *sec)  {
> > + u32 samples  = 512; /* number of bits to generate and test */
> > + u32 mono_min = 195;
> > + u32 mono_max = 317;
> > + u32 mono_range  = mono_max - mono_min;
> > + u32 poker_min = 1031;
> > + u32 poker_max = 1600;
> > + u32 poker_range = poker_max - poker_min + 1;
> > + u32 retries= 2;
> > + u32 lrun_max   = 32;
> > + s32 run_1_min   = 27;
> > + s32 run_1_max   = 107;
> > + s32 run_1_range = run_1_max - run_1_min;
> > + s32 run_2_min   = 7;
> > + s32 run_2_max   = 62;
> > + s32 run_2_range = run_2_max - run_2_min;
> > + s32 run_3_min   = 0;
> > + s32 run_3_max   = 39;
> > + s32 run_3_range = run_3_max - run_3_min;
> > + s32 run_4_min   = -1;
> > + s32 run_4_max   = 26;
> > + s32 run_4_range = run_4_max - run_4_min;
> > + s32 run_5_min   = -1;
> > + s32 run_5_max   = 18;
> > + s32 run_5_range = run_5_max - run_5_min;
> > + s32 run_6_min   = -1;
> > + s32 run_6_max   = 17;
> > + s32 run_6_range = run_6_max - run_6_min;
> 
> I have a feeling that this whole block of local variables can be simplified. 
> I'm not
> sure it is required to list this so detailed.
> 
> You can attempt to define those values in header file and use macros to
> compute bound conditions, rather than allocating this on the stack here.
> 
> > + u32 val;
> > +
> >   struct rng4tst __iomem *rng =
> >   (struct rng4tst __iomem *)&sec->rng;
> > - u32 val;
> >
> > - /* 

RE: [PATCH v8 10/15] crypto/fsl: Improve hwrng performance in kernel

2022-01-10 Thread ZHIZHIKIN Andrey
Hello Gaurav,

Cc: Michael Walle

> -Original Message-
> From: U-Boot  On Behalf Of Gaurav Jain
> Sent: Monday, January 10, 2022 1:27 PM
> To: u-boot@lists.denx.de
> Cc: Stefano Babic ; Fabio Estevam ; Peng 
> Fan
> ; Simon Glass ; Priyanka Jain
> ; Ye Li ; Horia Geanta
> ; Ji Luo ; Franck Lenormand
> ; Silvano Di Ninno ; Sahil
> malhotra ; Pankaj Gupta ; Varun
> Sethi ; NXP i . MX U-Boot Team ; Shengzhou
> Liu ; Mingkai Hu ; Rajesh Bhagat
> ; Meenakshi Aggarwal ; 
> Wasim
> Khan ; Alison Wang ; Pramod Kumar
> ; Tang Yuantian ; Adrian Alonso
> ; Vladimir Oltean 
> Subject: [PATCH v8 10/15] crypto/fsl: Improve hwrng performance in kernel
> 
> From: Ye Li 
> 
> RNG parameters are reconfigured.
> - For TRNG to generate 256 bits of entropy, RNG TRNG Seed Control register
>   is configured to have reduced SAMP_SIZE from default 2500 to 512. it is
>   number of entropy samples that will be taken during Entropy generation.
> - self-test registers(Monobit Limit, Poker Range, Run Length Limit)
>   are synchronized with new RTSDCTL[SAMP_SIZE] of 512.
> 
> TRNG time is caluculated based on sample size.

Typo: caluculated -> calculated

> time required to generate entropy is reduced and
> hwrng performance improved from 0.3 kB/s to 1.3 kB/s.

Is there any degradation in passed/failed FIPS 140-2 test count? Can you
provide some results from at least rngtest run?

> 
> Signed-off-by: Ye Li 
> Acked-by: Gaurav Jain >
> ---
>  drivers/crypto/fsl/jr.c | 102 +---
>  include/fsl_sec.h   |   1 +
>  2 files changed, 87 insertions(+), 16 deletions(-)
> 
> diff --git a/drivers/crypto/fsl/jr.c b/drivers/crypto/fsl/jr.c
> index a84440ab10..e5346a84a4 100644
> --- a/drivers/crypto/fsl/jr.c
> +++ b/drivers/crypto/fsl/jr.c
> @@ -603,30 +603,100 @@ static u8 get_rng_vid(ccsr_sec_t *sec)
>   */
>  static void kick_trng(int ent_delay, ccsr_sec_t *sec)
>  {
> + u32 samples  = 512; /* number of bits to generate and test */
> + u32 mono_min = 195;
> + u32 mono_max = 317;
> + u32 mono_range  = mono_max - mono_min;
> + u32 poker_min = 1031;
> + u32 poker_max = 1600;
> + u32 poker_range = poker_max - poker_min + 1;
> + u32 retries= 2;
> + u32 lrun_max   = 32;
> + s32 run_1_min   = 27;
> + s32 run_1_max   = 107;
> + s32 run_1_range = run_1_max - run_1_min;
> + s32 run_2_min   = 7;
> + s32 run_2_max   = 62;
> + s32 run_2_range = run_2_max - run_2_min;
> + s32 run_3_min   = 0;
> + s32 run_3_max   = 39;
> + s32 run_3_range = run_3_max - run_3_min;
> + s32 run_4_min   = -1;
> + s32 run_4_max   = 26;
> + s32 run_4_range = run_4_max - run_4_min;
> + s32 run_5_min   = -1;
> + s32 run_5_max   = 18;
> + s32 run_5_range = run_5_max - run_5_min;
> + s32 run_6_min   = -1;
> + s32 run_6_max   = 17;
> + s32 run_6_range = run_6_max - run_6_min;

I have a feeling that this whole block of local variables can be
simplified. I'm not sure it is required to list this so detailed.

You can attempt to define those values in header file and use
macros to compute bound conditions, rather than allocating this on
the stack here.

> + u32 val;
> +
>   struct rng4tst __iomem *rng =
>   (struct rng4tst __iomem *)&sec->rng;
> - u32 val;
> 
> - /* put RNG4 into program mode */
> - sec_setbits32(&rng->rtmctl, RTMCTL_PRGM);
> - /* rtsdctl bits 0-15 contain "Entropy Delay, which defines the
> -  * length (in system clocks) of each Entropy sample taken
> -  * */
> + /* Put RNG in program mode */
> + /* Setting both RTMCTL:PRGM and RTMCTL:TRNG_ACC causes TRNG to
> +  * properly invalidate the entropy in the entropy register and
> +  * force re-generation.
> +  */
> + sec_setbits32(&rng->rtmctl, RTMCTL_PRGM | RTMCTL_ACC);
> +
> + /* Configure the RNG Entropy Delay
> +  * Performance-wise, it does not make sense to
> +  * set the delay to a value that is lower
> +  * than the last one that worked (i.e. the state handles
> +  * were instantiated properly. Thus, instead of wasting
> +  * time trying to set the values controlling the sample
> +  * frequency, the function simply returns.
> +  */
>   val = sec_in32(&rng->rtsdctl);
> - val = (val & ~RTSDCTL_ENT_DLY_MASK) |
> -   (ent_delay << RTSDCTL_ENT_DLY_SHIFT);
> + val &= RTSDCTL_ENT_DLY_MASK;
> + val >>= RTSDCTL_ENT_DLY_SHIFT;
> + if (ent_delay < val) {
> + /* Put RNG4 into run mode */
> + sec_clrbits32(&rng->rtmctl, RTMCTL_PRGM | RTMCTL_ACC);
> + return;
> + }
> +
> + val = (ent_delay << RTSDCTL_ENT_DLY_SHIFT) | samples;
>   sec_out32(&rng->rtsdctl, val);
> - /* min. freq. count, equal to 1/4 of the entropy sample length */
> - sec_out32(&rng->rtfreqmin, ent_delay >> 2);
> - /* disable maximum frequency count */
> - sec_out32(&rng->rtfreqmax, RTFRQMAX_DISABLE);
> +
> + /*
>