Re: [VOLK] a += b*c ?
Do we have documentation that an `add` implementation must be able to work in-place? Otherwise, we should probably write that down :)

Also, on the API: C99-wise, I'm pretty sure this is a strict aliasing rule violation: pointers of different types mustn't point to the same data. The compiler is totally allowed to assume that the first and second arguments to volk_32f_x2_add_32f point to *distinct* objects, and hence could optimize as if the (const) a never changes once the function has been entered. But exactly that happens.

I don't see how this can go wrong for an operation like addition, but I honestly think the semantics of type_x2_operation_type should be that two inputs, neither of which is the output, are passed. If we want in-place kernels, we should probably have them separately.

Cheers,
Marcus

On 8/16/22 14:13, Johannes Demel wrote:
> Hi Randall,
>
> in your case,
> https://github.com/gnuradio/volk/blob/main/kernels/volk/volk_32f_x2_multiply_32f.h
> followed by
> https://github.com/gnuradio/volk/blob/main/kernels/volk/volk_32f_x2_add_32f.h
> would be the way to go at the moment.
>
> ```
> volk_32f_x2_multiply_32f(multiply_result, b, c, num_samples);
> volk_32f_x2_add_32f(a, a, multiply_result, num_samples);
> ```
>
> You're welcome to start a new kernel
>
> ```
> volk_32f_x3_multiply_add_32f(out, a, b, c, num_samples);
> ```
>
> In fact, it would be a great addition to VOLK.
>
> Cheers
> Johannes
>
> On 16.08.22 01:38, Randall Wayth wrote:
>> Thanks for the suggestions and apologies for not being 100% clear at the start. I'm not looking for a dot product. I'm looking for
>>
>> a[i] += b[i]*c[i]
>>
>> specifically for floating point. So it would be the equivalent of IPP's ippsAddProduct_32f.
>>
>> The application is to apply a window to a set of samples before accumulating, to implement a weighted overlap add PFB. In my case the samples are real-valued, but I could also see a case for a and b being complex, or the case for b being 8 or 16-bit ints with a and c being floating point.
>>
>> Cheers,
>> Randall
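A separate in-place kernel, as suggested above, could make the read-and-write role of the accumulator explicit in its signature. The following is only a minimal sketch in plain C: the name `accumulate_product_32f` and its argument order are hypothetical and are not part of VOLK.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical in-place kernel (not a VOLK function): acc[i] += b[i] * c[i].
 * The accumulator is the only buffer that is written and it is not const,
 * so the signature itself says that it is both an input and the output. */
void accumulate_product_32f(float* acc,
                            const float* b,
                            const float* c,
                            unsigned int num_points)
{
    /* Null buffers are only acceptable when there is nothing to process. */
    assert(num_points == 0 || (acc != NULL && b != NULL && c != NULL));

    for (unsigned int i = 0; i < num_points; i++) {
        acc[i] += b[i] * c[i];
    }
}
```

With such a signature, the x2-style kernels could keep the rule that neither input is the output.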
Re: [VOLK] a += b*c ?
Hi Randall,

in your case,
https://github.com/gnuradio/volk/blob/main/kernels/volk/volk_32f_x2_multiply_32f.h
followed by
https://github.com/gnuradio/volk/blob/main/kernels/volk/volk_32f_x2_add_32f.h
would be the way to go at the moment.

```
volk_32f_x2_multiply_32f(multiply_result, b, c, num_samples);
volk_32f_x2_add_32f(a, a, multiply_result, num_samples);
```

You're welcome to start a new kernel

```
volk_32f_x3_multiply_add_32f(out, a, b, c, num_samples);
```

In fact, it would be a great addition to VOLK.

Cheers
Johannes

On 16.08.22 01:38, Randall Wayth wrote:
> Thanks for the suggestions and apologies for not being 100% clear at the start. I'm not looking for a dot product. I'm looking for
>
> a[i] += b[i]*c[i]
>
> specifically for floating point. So it would be the equivalent of IPP's ippsAddProduct_32f.
>
> The application is to apply a window to a set of samples before accumulating, to implement a weighted overlap add PFB. In my case the samples are real-valued, but I could also see a case for a and b being complex, or the case for b being 8 or 16-bit ints with a and c being floating point.
>
> Cheers,
> Randall
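The thread gives only the name and argument list of the proposed kernel. As a rough illustration, a generic (non-SIMD) version might look like the sketch below. The function name `multiply_add_32f_generic` is hypothetical, and the semantics out[i] = a[i] + b[i]*c[i] are an assumption read off the argument order (out, a, b, c).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical generic version of the proposed multiply-add kernel.
 * Assumed semantics: out[i] = a[i] + b[i] * c[i].
 * Unlike the multiply-then-add sequence, no scratch buffer is needed. */
void multiply_add_32f_generic(float* out,
                              const float* a,
                              const float* b,
                              const float* c,
                              unsigned int num_points)
{
    /* Null buffers are only acceptable when there is nothing to process. */
    assert(num_points == 0 ||
           (out != NULL && a != NULL && b != NULL && c != NULL));

    for (unsigned int i = 0; i < num_points; i++) {
        out[i] = a[i] + b[i] * c[i];
    }
}
```

Compared with the two-call sequence, a fused loop makes a single pass over the data and avoids the `multiply_result` temporary.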
Re: [VOLK] a += b*c ?
Thanks for the suggestions and apologies for not being 100% clear at the start. I'm not looking for a dot product. I'm looking for

a[i] += b[i]*c[i]

specifically for floating point. So it would be the equivalent of IPP's ippsAddProduct_32f.

The application is to apply a window to a set of samples before accumulating, to implement a weighted overlap add PFB. In my case the samples are real-valued, but I could also see a case for a and b being complex, or the case for b being 8 or 16-bit ints with a and c being floating point.

Cheers,
Randall
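For illustration, the window-and-accumulate step described here might be sketched as follows. This is an assumption about the data layout, not code from the thread: the input is taken to be `num_segments` consecutive segments of `num_channels` real samples each, with a window of the same total length, and the name `wola_fold_32f` is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical window-and-fold step of a weighted overlap-add filterbank:
 *   acc[i] = sum over p of window[p * num_channels + i]
 *                        * samples[p * num_channels + i]
 * The inner loop is exactly the a[i] += b[i] * c[i] operation. */
void wola_fold_32f(float* acc,
                   const float* samples,
                   const float* window,
                   unsigned int num_channels,
                   unsigned int num_segments)
{
    assert(acc != NULL && samples != NULL && window != NULL);

    /* Start from a cleared accumulator. */
    for (unsigned int i = 0; i < num_channels; i++) {
        acc[i] = 0.0f;
    }

    /* Accumulate one windowed segment at a time. */
    for (unsigned int p = 0; p < num_segments; p++) {
        const float* s = samples + (size_t)p * num_channels;
        const float* w = window + (size_t)p * num_channels;
        for (unsigned int i = 0; i < num_channels; i++) {
            acc[i] += w[i] * s[i];
        }
    }
}
```

In a filterbank of this kind the folded `acc` buffer would then typically be passed to an FFT, which is outside the scope of this sketch.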
Re: [VOLK] a += b*c ?
Hi,

the kernels are type specific. However, if you want a dot product:

a = \sum_{i=0}^{N-1} b[i]c[i],

https://github.com/gnuradio/volk/blob/main/kernels/volk/volk_32fc_x2_dot_prod_32fc.h

A [i] += B [i] * C [i], for i = 0...N-1

This would implement the above formula:

1. https://github.com/gnuradio/volk/blob/main/kernels/volk/volk_32fc_x2_multiply_32fc.h
2. https://github.com/gnuradio/volk/blob/main/kernels/volk/volk_32fc_x2_add_32fc.h

It would be interesting to see how much faster an integrated kernel would be.

The IPP `AddProduct` function:
https://www.intel.com/content/www/us/en/develop/documentation/ipp-dev-reference/top/volume-1-signal-and-data-processing/essential-functions/arithmetic-functions/addproduct.html

I assume IPP aims for:
https://www.intel.com/content/www/us/en/docs/intrinsics-guide/index.html#ig_expand=3202,3202,3202,3206,3206,3206=_mm256_fmadd_ps

Which version are you looking for?

Cheers
Johannes
Re: [VOLK] a += b*c ?
On Mon, Aug 15, 2022 at 12:03:25PM +0200, Marcus Müller wrote:

> Just to be sure you mean, for N-long b and c,
>
> a = \sum_{i=0}^{N-1} b[i]c[i],

I suspect the OP meant

A [i] += B [i] * C [i], for i = 0...N-1

--
FA
Re: [VOLK] a += b*c ?
Just to be sure you mean, for N-long b and c,

a = \sum_{i=0}^{N-1} b[i]c[i],

right? That's the dot product, and that exists for a couple in/output types; the kernels you're looking for are all called dot_prod.

Cheers,
Marcus

On 8/15/22 04:41, Randall Wayth wrote:
> Hi Folks,
>
> Hopefully I am just missing this, but is there a kernel that does vectorised a += b*c ? Something like the IPP "AddProduct" function?
>
> Cheers,
> Randall.
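To make the dot-product reading concrete: it reduces two N-long vectors to a single scalar, whereas an element-wise a[i] += b[i]*c[i] keeps N outputs. A minimal plain-C sketch of the reduction follows; the name `dot_prod_32f_generic` is hypothetical and this is not the implementation of VOLK's dot_prod kernels.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical scalar reference for a = sum_{i=0}^{N-1} b[i] * c[i].
 * Returns one value for the whole vector pair. */
float dot_prod_32f_generic(const float* b,
                           const float* c,
                           unsigned int num_points)
{
    /* Null buffers are only acceptable when there is nothing to process. */
    assert(num_points == 0 || (b != NULL && c != NULL));

    float sum = 0.0f;
    for (unsigned int i = 0; i < num_points; i++) {
        sum += b[i] * c[i];
    }
    return sum;
}
```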
[VOLK] a += b*c ?
Hi Folks, Hopefully I am just missing this, but is there a kernel that does vectorised a += b*c ? Something like the IPP "AddProduct" function? Cheers, Randall.