Re: [Discuss-gnuradio] Segfault in volk_32fc_x2_multiply_32fc_a_avx2_fma

2016-03-09 Thread West, Nathan
Good news!

That branch now belongs in GNU Radio.

Cheers,
Nathan

On Wed, Mar 9, 2016 at 8:45 AM, devin kelly wrote:

> Thanks for the help, I don't think I could have figured this out on my own.
>
> This is because I'm on RHEL7 (argh!).  My libfftw.so doesn't contain any
> references to AVX. For me there are a couple of options for fixing this:
>
> 1) Use Nathan's branch.
> 2) Rebuild fftw with AVX support
> 3) Rebuild GR and Volk without AVX.
>
> I tried 2) first and noticed this in the spec file that was in the source
> RPM I was trying to rebuild:
>
> %ifarch %{ix86} x86_64
> # Enable SSE2 support for x86 and x86_64
> # (no avx as it is claimed to drastically slower)
> for((i=0;i<2;i++)); do
>  prec_flags[i]+=" --enable-sse2"
> done
> %endif
>
> Is the spec file author right?  Now I'm a little confused about the
> approach I should take.  I'll probably just go with 1) in the meantime.
>
> Thanks again Nathan,
> Devin

Re: [Discuss-gnuradio] Segfault in volk_32fc_x2_multiply_32fc_a_avx2_fma

2016-03-09 Thread Anon Lister
To answer your question, however: he's misguided. FFTW and VOLK both have
tools (volk_profile, fftw-wisdom) that profile the available implementations
and pick the best instructions to use when there are multiple options, so it's
not going to get noticeably slower by compiling the extra stuff in.

Re: [Discuss-gnuradio] Segfault in volk_32fc_x2_multiply_32fc_a_avx2_fma

2016-03-09 Thread Anon Lister
I would use the FFTW source from their site, not RHEL's source RPM, unless
you need to deploy it on a large number of machines. (Even then I would pull
the latest source and update the SRPM.)

You can just build FFTW from source. Enable almost all the configure options.
We do this on RHEL 6 because their version doesn't support wisdom or take
advantage of any recent CPU releases. It should install in /usr/local by
default. Just add the library path to /etc/ld.so.conf and run ldconfig and
you'll be set.

Re: [Discuss-gnuradio] Segfault in volk_32fc_x2_multiply_32fc_a_avx2_fma

2016-03-09 Thread devin kelly
Thanks for the help, I don't think I could have figured this out on my own.

This is because I'm on RHEL7 (argh!).  My libfftw.so doesn't contain any
references to AVX. For me there are a couple of options for fixing this:

1) Use Nathan's branch.
2) Rebuild fftw with AVX support
3) Rebuild GR and Volk without AVX.

I tried 2) first and noticed this in the spec file that was in the source
RPM I was trying to rebuild:

%ifarch %{ix86} x86_64
# Enable SSE2 support for x86 and x86_64
# (no avx as it is claimed to drastically slower)
for((i=0;i<2;i++)); do
 prec_flags[i]+=" --enable-sse2"
done
%endif

Is the spec file author right?  Now I'm a little confused about the
approach I should take.  I'll probably just go with 1) in the meantime.

Thanks again Nathan,
Devin


Re: [Discuss-gnuradio] Segfault in volk_32fc_x2_multiply_32fc_a_avx2_fma

2016-03-08 Thread West, Nathan
The a and c vectors come from gr::fft objects' internal buffers. These are
internally created with fftwf_malloc (lines 152/156 of gr-fft/lib/fft.cc).
fftwf_malloc is obviously not generating buffers with proper alignment, so
you're seeing a 50% chance (per buffer) that this segfaults. I'll note that
this is also only an issue with fftwf buffers when fftwf isn't built with AVX
support (and therefore nothing in fftwf requires a 32-byte aligned buffer).
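The 50% figure can be sketched with a few lines of C. This is only an idealized illustration: it assumes the buffers are guaranteed 16-byte but not 32-byte alignment (the thread does not state the 16-byte figure outright), and count_aligned is a hypothetical helper, not part of FFTW, VOLK, or GNU Radio.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: walk n addresses starting at `start`, spaced `step`
 * bytes apart, and count how many are multiples of `alignment`.
 * `alignment` must be a power of two. */
size_t count_aligned(uintptr_t start, size_t n, size_t step, size_t alignment)
{
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    size_t hits = 0;
    for (size_t i = 0; i < n; i++) {
        uintptr_t addr = start + (uintptr_t)(i * step);
        /* For a power-of-two alignment, the low bits are the remainder. */
        if ((addr & (uintptr_t)(alignment - 1)) == 0)
            hits++;
    }
    return hits;
}
```

Calling count_aligned(0x3b05000, 256, 16, 32) walks 256 consecutive 16-byte-aligned addresses and finds that 128 of them, exactly half, are also 32-byte aligned. A real allocator need not spread its results that evenly, so treat the 50% as a rule of thumb.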

Andy Walls (thanks!) pointed out on IRC that we had a similar issue years
ago with a QT sink.

I have a branch that should fix this (
https://github.com/n-west/gnuradio/tree/fft-avx-alignment). I also suggest
you look into getting a version of fftwf built with AVX. I don't know if
there's a good way to tell, but if I run readelf -a on my libfftw3.so I see
some functions with avx in the name.

Cheers,
nw



Re: [Discuss-gnuradio] Segfault in volk_32fc_x2_multiply_32fc_a_avx2_fma

2016-03-08 Thread devin kelly
OK, here's my C program:

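#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>
#include <volk/volk.h>

int main() {
    size_t alignment = volk_get_alignment();

    uint8_t* ptr;

    ptr = (uint8_t*)volk_malloc(1000 * sizeof(uint8_t), alignment);
    /* %zu matches size_t; the pointer is cast so that %lx prints it as hex */
    printf("alignment = %zu, ptr = %lx, *ptr = %u\n", alignment,
           (unsigned long)(uintptr_t)ptr, *ptr);
    volk_free((void*)ptr);
    ptr = NULL;

    return 0;
}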


Compile:

$ gcc volk_test.c -o volk_test -lvolk -L/local_disk/gr_3.7.9_debug/lib

Its output:

$ ./volk_test
Using Volk machine: avx2_64_mmx_orc
alignment = 32, ptr = 151b040, *ptr = 00

Also, I've attached the output from the preprocessor, from this command:

$ /usr/bin/cc -DHAVE_AVX_CVTPI32_PS -DHAVE_CPUID_H -DHAVE_DLFCN_H \
    -DHAVE_FENV_H -DHAVE_POSIX_MEMALIGN -DHAVE_XGETBV -Wall -fvisibility=hidden \
    -g -I/local_disk/gr_3.7.9_src/volk/build_debug/include \
    -I/local_disk/gr_3.7.9_src/volk/include \
    -I/local_disk/gr_3.7.9_src/volk/kernels \
    -I/local_disk/gr_3.7.9_src/volk/build_debug/lib \
    -I/local_disk/gr_3.7.9_src/volk/lib -I/usr/include/orc-0.4 -E -fPIC -o \
    volk_malloc_preprocessed -c \
    /local_disk/gr_3.7.9_src/volk/lib/volk_malloc.c

I found the compiler step by running 'VERBOSE=1 make', then changed the
output file and added -E.  I attached volk_malloc_preprocessed as well.

It looks like this is my volk_malloc():


void *volk_malloc(size_t size, size_t alignment)
{
  void *ptr;

  if (alignment == 1)
    return malloc(size);

  int err = posix_memalign(&ptr, alignment, size);
  if(err == 0) {
    return ptr;
  }
  else {
    fprintf(stderr,
            "VOLK: Error allocating memory "
            "(posix_memalign: error %d: %s)\n", err, strerror(err));
    return ((void *)0);
  }
}
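
For anyone who wants to check the posix_memalign path without linking against VOLK, here is a minimal standalone sketch for a POSIX system. alloc_aligned_checked is a hypothetical name, not a VOLK function; it follows the same pattern as the preprocessed volk_malloc() above and additionally asserts that the returned address really is a multiple of the requested alignment.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper (not part of VOLK): allocate `size` bytes aligned to
 * `alignment`, which must be a power of two and a multiple of sizeof(void *).
 * Returns NULL on failure; release the buffer with free(). */
void *alloc_aligned_checked(size_t size, size_t alignment)
{
    void *ptr = NULL;
    int err = posix_memalign(&ptr, alignment, size);
    if (err != 0) {
        fprintf(stderr, "posix_memalign: error %d: %s\n", err, strerror(err));
        return NULL;
    }
    /* Verify the guarantee that aligned SIMD loads depend on. */
    assert(((uintptr_t)ptr % alignment) == 0);
    return ptr;
}
```

With alignment = 32 (the value volk_get_alignment() reported on this machine), a successful call should always return an address whose low five bits are zero, which is what the aligned AVX kernels expect.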



Devin



Re: [Discuss-gnuradio] Segfault in volk_32fc_x2_multiply_32fc_a_avx2_fma

2016-03-08 Thread West, Nathan
On Tue, Mar 8, 2016 at 10:58 AM, devin kelly wrote:

> Calling 'info variables' (or args or locals) in the last few frames didn't
> give me any real info, so I built a copy of GR/Volk with debug symbols.  I
> ran the FG again, this time from GDB; here's my backtrace.  In this
> backtrace you can see the arguments passed in each call.  I have an
> i7-5600U CPU @ 2.60GHz; the volk_profile is appended at the bottom.
>

Excellent. Thanks for going through that extra step. It really helps.


>
> Here are the links for the relevant code:
>
>
> https://github.com/gnuradio/volk/blob/f0b722392950bf7ede7b32f5ff60019bce7a8592/kernels/volk/volk_32fc_x2_multiply_32fc.h#L232
>
> https://github.com/gnuradio/gnuradio/blob/master/gr-filter/lib/fft_filter.cc#L323
>
> https://github.com/gnuradio/gnuradio/blob/222e0003f9797a1b92d64855bd2b93f0d9099f93/gr-digital/lib/corr_est_cc_impl.cc#L214
>
> Could the problem be that nitems is 257 and num_points is 512?  Or should
> nitems really be 256 and not 257?
>

I don't think so. I'm not familiar with the details of the fft_filter
implementations, but usually these things will take in some history if they
don't have enough points to operate on (in this case 512).

The much more worrying thing is your vector addresses.


>
> Thanks,
> Devin
>
> (gdb) bt
> #0  0x7fffdcaccb57 in volk_32fc_x2_multiply_32fc_a_avx2_fma
> (__P=0x3b051b0)
> at /usr/lib/gcc/x86_64-redhat-linux/4.8.5/include/avxintrin.h:835
> #1  0x7fffdcaccb57 in volk_32fc_x2_multiply_32fc_a_avx2_fma
> (cVector=0x3b1f770, aVector=0x3b051b0, bVector=0x3b240e0, num_points=512)
>

0x3b1f770 % 32 = 16 (bad)
0x3b051b0 % 32 = 16 (bad)
0x3b240e0 % 32 = 0 (good)
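
The same check can be done in C. This is a small sketch, with misalignment as a hypothetical helper name (nothing in VOLK is called that); it returns how many bytes an address sits past the previous alignment boundary, so 0 means the pointer is safe for an aligned kernel.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: return addr modulo alignment, i.e. the number of
 * bytes by which addr overshoots the previous alignment boundary.
 * `alignment` must be a power of two. */
size_t misalignment(uintptr_t addr, size_t alignment)
{
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    /* For a power-of-two alignment, masking the low bits equals addr % alignment. */
    return (size_t)(addr & (uintptr_t)(alignment - 1));
}
```

Applied to the backtrace values, misalignment(0x3b1f770, 32) and misalignment(0x3b051b0, 32) both give 16, while misalignment(0x3b240e0, 32) gives 0, matching the hand calculation above.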

Unfortunately it looks like volk_get_alignment is returning the wrong thing
or there's a bug in volk_malloc. Can you tell us what volk_get_alignment
returns? The easiest thing is probably to write a simple C program that
prints out the result (hmm, I should add that to volk-config-info). I'd
also like to know which volk_malloc implementation you're using.
Unfortunately I don't think we have an easy way to discover that (hmm,
something else that should be added to volk-config-info). I think the best
way might be to look at volk_malloc.c intermediate files after the
preprocessor has done its work.

If you want to move on while we figure this out then you can edit
~/.volk/volk_config and replace the avx2_fma with sse3 on the line that has
this kernel name on it.


> at
> /local_disk/gr_3.7.9_src/volk/kernels/volk/volk_32fc_x2_multiply_32fc.h:242
> #2  0x7fffdc945a75 in __volk_32fc_x2_multiply_32fc_a
> (cVector=0x3b1f770, aVector=0x3b051b0, bVector=0x3b240e0, num_points=512)
> at /local_disk/gr_3.7.9_src/volk/build_debug/lib/volk.c:7010
> #3  0x7fffd3f8e360 in gr::filter::kernel::fft_filter_ccc::filter(int,
> std::complex const*, std::complex*) (this=0x3b02f40,
> nitems=nitems@entry=257, input=input@entry=0x7fffc9cc7000,
> output=output@entry=0x3b36460)
> at /local_disk/gr_3.7.9_src/gnuradio/gr-filter/lib/fft_filter.cc:323
> #4  0x7fffd42910df in gr::digital::corr_est_cc_impl::work(int,
> std::vector >&, std::vector std::allocator >&) (this=0x3b01560, noutput_items=257,
> input_items=..., output_items=std::vector of length 1, capacity 1 = {...})
> at
> /local_disk/gr_3.7.9_src/gnuradio/gr-digital/lib/corr_est_cc_impl.cc:237
> #5  0x7fffdd064907 in gr::sync_block::general_work(int,
> std::vector&, std::vector std::allocator >&, std::vector
> >&) (this=0x3b015b8, noutput_items=, ninput_items=...,
> input_items=..., output_items=...) at
> /local_disk/gr_3.7.9_src/gnuradio/gnuradio-runtime/lib/sync_block.cc:66
> #6  0x7fffdd02f70f in gr::block_executor::run_one_iteration()
> (this=this@entry=0x7fff83ffedb0)
> at
> /local_disk/gr_3.7.9_src/gnuradio/gnuradio-runtime/lib/block_executor.cc:438
> #7  0x7fffdd06da8a in
> gr::tpb_thread_body::tpb_thread_body(boost::shared_ptr, int)
> (this=0x7fff83ffedb0, block=..., max_noutput_items=) at
> /local_disk/gr_3.7.9_src/gnuradio/gnuradio-runtime/lib/tpb_thread_body.cc:122
> #8  0x7fffdd062761 in
> boost::detail::function::void_function_obj_invoker0 void>::invoke(boost::detail::function::function_buffer&) (this=0x3bc3ec0)
> at
> /local_disk/gr_3.7.9_src/gnuradio/gnuradio-runtime/lib/scheduler_tpb.cc:44
> #9  0x7fffdd062761 in
> boost::detail::function::void_function_obj_invoker0 void>::invoke(boost::detail::function::function_buffer&) (this=0x3bc3ec0)
> at
> /local_disk/gr_3.7.9_src/gnuradio/gnuradio-runtime/include/gnuradio/thread/thread_body_wrapper.h:51
> #10 0x7fffdd062761 in
> boost::detail::function::void_function_obj_invoker0 void>::invoke(boost::detail::function::function_buffer&)
> (function_obj_ptr=...) at
> /usr/include/boost/function/function_template.hpp:153
> #11 

Re: [Discuss-gnuradio] Segfault in volk_32fc_x2_multiply_32fc_a_avx2_fma

2016-03-08 Thread devin kelly
Here's a simpler FG that I can use to reproduce this error.  Hopefully it
actually works for someone else.

Devin

On Tue, Mar 8, 2016 at 10:58 AM, devin kelly  wrote:

> Calling 'info variables' (or 'info args' / 'info locals') on the last few
> frames didn't give me any real info, so I built a copy of GR/Volk with
> debug symbols.  I ran the FG again, this time from GDB; here's my
> backtrace, in which you can see the arguments passed in each call.  I have
> an i7-5600U CPU @ 2.60GHz; the volk_profile output is appended at the
> bottom.
>
> Here are the links for the relevant code:
>
>
> https://github.com/gnuradio/volk/blob/f0b722392950bf7ede7b32f5ff60019bce7a8592/kernels/volk/volk_32fc_x2_multiply_32fc.h#L232
>
> https://github.com/gnuradio/gnuradio/blob/master/gr-filter/lib/fft_filter.cc#L323
>
> https://github.com/gnuradio/gnuradio/blob/222e0003f9797a1b92d64855bd2b93f0d9099f93/gr-digital/lib/corr_est_cc_impl.cc#L214
>
> Could the problem be that nitems is 257 and num_points is 512?  Or should
> nitems really be 256 and not 257?
>
> Thanks,
> Devin
>
> (gdb) bt
> #0  0x7fffdcaccb57 in volk_32fc_x2_multiply_32fc_a_avx2_fma
> (__P=0x3b051b0)
> at /usr/lib/gcc/x86_64-redhat-linux/4.8.5/include/avxintrin.h:835
> #1  0x7fffdcaccb57 in volk_32fc_x2_multiply_32fc_a_avx2_fma
> (cVector=0x3b1f770, aVector=0x3b051b0, bVector=0x3b240e0, num_points=512)
> at
> /local_disk/gr_3.7.9_src/volk/kernels/volk/volk_32fc_x2_multiply_32fc.h:242
> #2  0x7fffdc945a75 in __volk_32fc_x2_multiply_32fc_a
> (cVector=0x3b1f770, aVector=0x3b051b0, bVector=0x3b240e0, num_points=512)
> at /local_disk/gr_3.7.9_src/volk/build_debug/lib/volk.c:7010
> #3  0x7fffd3f8e360 in gr::filter::kernel::fft_filter_ccc::filter(int,
> std::complex const*, std::complex*) (this=0x3b02f40,
> nitems=nitems@entry=257, input=input@entry=0x7fffc9cc7000,
> output=output@entry=0x3b36460)
> at /local_disk/gr_3.7.9_src/gnuradio/gr-filter/lib/fft_filter.cc:323
> #4  0x7fffd42910df in gr::digital::corr_est_cc_impl::work(int,
> std::vector >&, std::vector std::allocator >&) (this=0x3b01560, noutput_items=257,
> input_items=..., output_items=std::vector of length 1, capacity 1 = {...})
> at
> /local_disk/gr_3.7.9_src/gnuradio/gr-digital/lib/corr_est_cc_impl.cc:237
> #5  0x7fffdd064907 in gr::sync_block::general_work(int,
> std::vector&, std::vector std::allocator >&, std::vector
> >&) (this=0x3b015b8, noutput_items=, ninput_items=...,
> input_items=..., output_items=...) at
> /local_disk/gr_3.7.9_src/gnuradio/gnuradio-runtime/lib/sync_block.cc:66
> #6  0x7fffdd02f70f in gr::block_executor::run_one_iteration()
> (this=this@entry=0x7fff83ffedb0)
> at
> /local_disk/gr_3.7.9_src/gnuradio/gnuradio-runtime/lib/block_executor.cc:438
> #7  0x7fffdd06da8a in
> gr::tpb_thread_body::tpb_thread_body(boost::shared_ptr, int)
> (this=0x7fff83ffedb0, block=..., max_noutput_items=) at
> /local_disk/gr_3.7.9_src/gnuradio/gnuradio-runtime/lib/tpb_thread_body.cc:122
> #8  0x7fffdd062761 in
> boost::detail::function::void_function_obj_invoker0 void>::invoke(boost::detail::function::function_buffer&) (this=0x3bc3ec0)
> at
> /local_disk/gr_3.7.9_src/gnuradio/gnuradio-runtime/lib/scheduler_tpb.cc:44
> #9  0x7fffdd062761 in
> boost::detail::function::void_function_obj_invoker0 void>::invoke(boost::detail::function::function_buffer&) (this=0x3bc3ec0)
> at
> /local_disk/gr_3.7.9_src/gnuradio/gnuradio-runtime/include/gnuradio/thread/thread_body_wrapper.h:51
> #10 0x7fffdd062761 in
> boost::detail::function::void_function_obj_invoker0 void>::invoke(boost::detail::function::function_buffer&)
> (function_obj_ptr=...) at
> /usr/include/boost/function/function_template.hpp:153
> #11 0x7fffdd016cd0 in
> boost::detail::thread_data::run() (this= out>)
> at /usr/include/boost/function/function_template.hpp:767
> #12 0x7fffdd016cd0 in
> boost::detail::thread_data::run() (this= out>)
> at /usr/include/boost/thread/detail/thread.hpp:117
> #13 0x7fffdbe4f24a in thread_proxy () at
> /lib64/libboost_thread-mt.so.1.53.0
> #14 0x77800dc5 in start_thread () at /lib64/libpthread.so.0
> #15 0x76e2528d in clone () at /lib64/libc.so.6
>
> Here are the locals on the last few frames:
>
> (gdb) f 0
> #0  0x7fffdcaccb57 in _mm256_load_ps (__P=0x3b051b0) at
> /usr/lib/gcc/x86_64-redhat-linux/4.8.5/include/avxintrin.h:835
> 835   return *(__m256 *)__P;
> (gdb) info locals
> No locals.
> (gdb) f 1
> #1  volk_32fc_x2_multiply_32fc_a_avx2_fma (cVector=0x3b1f770,
> aVector=0x3b051b0, bVector=0x3b240e0, num_points=512)
> at
> /local_disk/gr_3.7.9_src/volk/kernels/volk/volk_32fc_x2_multiply_32fc.h:242
> 242 const __m256 x = _mm256_load_ps((float*)a); // Load the ar +
> ai, br + bi 

Re: [Discuss-gnuradio] Segfault in volk_32fc_x2_multiply_32fc_a_avx2_fma

2016-03-08 Thread devin kelly
Calling 'info variables' (or 'info args' / 'info locals') on the last few
frames didn't give me any real info, so I built a copy of GR/Volk with
debug symbols.  I ran the FG again, this time from GDB; here's my
backtrace, in which you can see the arguments passed in each call.  I have
an i7-5600U CPU @ 2.60GHz; the volk_profile output is appended at the
bottom.

Here are the links for the relevant code:

https://github.com/gnuradio/volk/blob/f0b722392950bf7ede7b32f5ff60019bce7a8592/kernels/volk/volk_32fc_x2_multiply_32fc.h#L232
https://github.com/gnuradio/gnuradio/blob/master/gr-filter/lib/fft_filter.cc#L323
https://github.com/gnuradio/gnuradio/blob/222e0003f9797a1b92d64855bd2b93f0d9099f93/gr-digital/lib/corr_est_cc_impl.cc#L214

Could the problem be that nitems is 257 and num_points is 512?  Or should
nitems really be 256 and not 257?

Thanks,
Devin

(gdb) bt
#0  0x7fffdcaccb57 in volk_32fc_x2_multiply_32fc_a_avx2_fma
(__P=0x3b051b0)
at /usr/lib/gcc/x86_64-redhat-linux/4.8.5/include/avxintrin.h:835
#1  0x7fffdcaccb57 in volk_32fc_x2_multiply_32fc_a_avx2_fma
(cVector=0x3b1f770, aVector=0x3b051b0, bVector=0x3b240e0, num_points=512)
at
/local_disk/gr_3.7.9_src/volk/kernels/volk/volk_32fc_x2_multiply_32fc.h:242
#2  0x7fffdc945a75 in __volk_32fc_x2_multiply_32fc_a
(cVector=0x3b1f770, aVector=0x3b051b0, bVector=0x3b240e0, num_points=512)
at /local_disk/gr_3.7.9_src/volk/build_debug/lib/volk.c:7010
#3  0x7fffd3f8e360 in gr::filter::kernel::fft_filter_ccc::filter(int,
std::complex const*, std::complex*) (this=0x3b02f40,
nitems=nitems@entry=257, input=input@entry=0x7fffc9cc7000,
output=output@entry=0x3b36460)
at /local_disk/gr_3.7.9_src/gnuradio/gr-filter/lib/fft_filter.cc:323
#4  0x7fffd42910df in gr::digital::corr_est_cc_impl::work(int,
std::vector >&, std::vector >&) (this=0x3b01560, noutput_items=257,
input_items=..., output_items=std::vector of length 1, capacity 1 = {...})
at
/local_disk/gr_3.7.9_src/gnuradio/gr-digital/lib/corr_est_cc_impl.cc:237
#5  0x7fffdd064907 in gr::sync_block::general_work(int,
std::vector&, std::vector >&, std::vector
>&) (this=0x3b015b8, noutput_items=, ninput_items=...,
input_items=..., output_items=...) at
/local_disk/gr_3.7.9_src/gnuradio/gnuradio-runtime/lib/sync_block.cc:66
#6  0x7fffdd02f70f in gr::block_executor::run_one_iteration()
(this=this@entry=0x7fff83ffedb0)
at
/local_disk/gr_3.7.9_src/gnuradio/gnuradio-runtime/lib/block_executor.cc:438
#7  0x7fffdd06da8a in
gr::tpb_thread_body::tpb_thread_body(boost::shared_ptr, int)
(this=0x7fff83ffedb0, block=..., max_noutput_items=) at
/local_disk/gr_3.7.9_src/gnuradio/gnuradio-runtime/lib/tpb_thread_body.cc:122
#8  0x7fffdd062761 in
boost::detail::function::void_function_obj_invoker0::invoke(boost::detail::function::function_buffer&) (this=0x3bc3ec0)
at
/local_disk/gr_3.7.9_src/gnuradio/gnuradio-runtime/lib/scheduler_tpb.cc:44
#9  0x7fffdd062761 in
boost::detail::function::void_function_obj_invoker0::invoke(boost::detail::function::function_buffer&) (this=0x3bc3ec0)
at
/local_disk/gr_3.7.9_src/gnuradio/gnuradio-runtime/include/gnuradio/thread/thread_body_wrapper.h:51
#10 0x7fffdd062761 in
boost::detail::function::void_function_obj_invoker0::invoke(boost::detail::function::function_buffer&)
(function_obj_ptr=...) at
/usr/include/boost/function/function_template.hpp:153
#11 0x7fffdd016cd0 in boost::detail::thread_data::run() (this=)
at /usr/include/boost/function/function_template.hpp:767
#12 0x7fffdd016cd0 in boost::detail::thread_data::run() (this=)
at /usr/include/boost/thread/detail/thread.hpp:117
#13 0x7fffdbe4f24a in thread_proxy () at
/lib64/libboost_thread-mt.so.1.53.0
#14 0x77800dc5 in start_thread () at /lib64/libpthread.so.0
#15 0x76e2528d in clone () at /lib64/libc.so.6

Here are the locals on the last few frames:

(gdb) f 0
#0  0x7fffdcaccb57 in _mm256_load_ps (__P=0x3b051b0) at
/usr/lib/gcc/x86_64-redhat-linux/4.8.5/include/avxintrin.h:835
835   return *(__m256 *)__P;
(gdb) info locals
No locals.
(gdb) f 1
#1  volk_32fc_x2_multiply_32fc_a_avx2_fma (cVector=0x3b1f770,
aVector=0x3b051b0, bVector=0x3b240e0, num_points=512)
at
/local_disk/gr_3.7.9_src/volk/kernels/volk/volk_32fc_x2_multiply_32fc.h:242
242 const __m256 x = _mm256_load_ps((float*)a); // Load the ar +
ai, br + bi as ar,ai,br,bi
(gdb) info locals
y = {-4.87433296e+17, 4.59163468e-41, -3.92813517e+17, 4.59163468e-41,
5.15677835e-43, 0, 5.26888223e-43, 0}
tmp2x = {6.389921e-43, 0, -512.314453, 4.59163468e-41, 1.26116862e-44, 0,
-4.87433296e+17, 4.59163468e-41}
x = {-512.314453, 4.59163468e-41, 0, 0, 2.76102662, -3.64918089,
-4.92134571, -1.06491208}
yl = {4.14784345e-43, 0, 1.26116862e-44, 0, -4.87442367e+17,

Re: [Discuss-gnuradio] Segfault in volk_32fc_x2_multiply_32fc_a_avx2_fma

2016-03-07 Thread West, Nathan
On Mon, Mar 7, 2016 at 10:18 PM, West, Nathan 
wrote:

> On Mon, Mar 7, 2016 at 2:32 PM, devin kelly  wrote:
>
>> Hello,
>>
>> I've built a flowgraph (GRC and Python attached) that usually (but not
>> always) segfaults in volk_32fc_x2_multiply_32fc_a_avx2_fma.  The
>> segfault occurs in the FFT filter inside the correlation estimator block.
>> I'm not sure whether the cause is the Volk code, the GR code calling it,
>> or what I'm putting into GR.  I've got a backtrace below if that helps.
>>
>> Also, at the tag gate in my flowgraph I would usually have a USRP
>> transmitter and a USRP receiver before the Correlation Estimator in a
>> separate flowgraph, but since I can re-create the segfault in this
>> flowgraph I simplified it.
>>
>> I'm using GR 3.7.9, Volk 1.2.1 and UHD 3.9.2 (though I don't call any UHD
>> code here).
>>
>> Thanks for any help,
>> Devin
>>
>> $ gdb /usr/bin/python core.13408
>> GNU gdb (GDB) Red Hat Enterprise Linux 7.6.1-80.el7
>> Copyright (C) 2013 Free Software Foundation, Inc.
>> License GPLv3+: GNU GPL version 3 or later <
>> http://gnu.org/licenses/gpl.html>
>> This is free software: you are free to change and redistribute it.
>> There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
>> and "show warranty" for details.
>> This GDB was configured as "x86_64-redhat-linux-gnu".
>> For bug reporting instructions, please see:
>> ...
>> Reading symbols from /usr/bin/python2.7...Reading symbols from
>> /usr/bin/python2.7...(no debugging symbols found)...done.
>> (no debugging symbols found)...done.
>>
>> [Thread debugging using libthread_db enabled]
>> Using host libthread_db library "/lib64/libthread_db.so.1".
>> Core was generated by `python2 ./segfault_test.py'.
>> Program terminated with signal 11, Segmentation fault.
>> #0  0x7f11a4d064c0 in volk_32fc_x2_multiply_32fc_a_avx2_fma () from
>> /local_disk/gr_3.7.9/lib/libvolk.so.1.2.1
>> Missing separate debuginfos, use: debuginfo-install
>> python-2.7.5-34.el7.x86_64
>> (gdb) bt
>> #0  0x7f11a4d064c0 in volk_32fc_x2_multiply_32fc_a_avx2_fma () at
>> /local_disk/gr_3.7.9/lib/libvolk.so.1.2.1
>> #1  0x7f119c1d0006 in gr::filter::kernel::fft_filter_ccc::filter(int,
>> std::complex const*, std::complex*) ()
>> at /local_disk/gr_3.7.9/lib64/libgnuradio-filter-3.7.9.so.0.0.0
>> #2  0x7f119c4d724f in gr::digital::corr_est_cc_impl::work(int,
>> std::vector >&, std::vector> std::allocator >&) () at
>> /local_disk/gr_3.7.9/lib64/libgnuradio-digital-3.7.9.so.0.0.0
>> #3  0x7f11a52b1f57 in gr::sync_block::general_work(int,
>> std::vector&, std::vector> std::allocator >&, std::vector
>> >&) () at /local_disk/gr_3.7.9/lib64/libgnuradio-runtime-3.7.9.so.0.0.0
>> #4  0x7f11a527a6bd in gr::block_executor::run_one_iteration() () at
>> /local_disk/gr_3.7.9/lib64/libgnuradio-runtime-3.7.9.so.0.0.0
>> #5  0x7f11a52baf60 in
>> gr::tpb_thread_body::tpb_thread_body(boost::shared_ptr, int) ()
>> at /local_disk/gr_3.7.9/lib64/libgnuradio-runtime-3.7.9.so.0.0.0
>> #6  0x7f11a52aebc1 in
>> boost::detail::function::void_function_obj_invoker0> void>::invoke(boost::detail::function::function_buffer&) () at
>> /local_disk/gr_3.7.9/lib64/libgnuradio-runtime-3.7.9.so.0.0.0
>> #7  0x7f11a5260910 in
>> boost::detail::thread_data::run() ()
>> at /local_disk/gr_3.7.9/lib64/libgnuradio-runtime-3.7.9.so.0.0.0
>> #8  0x7f11a40b824a in thread_proxy () at
>> /lib64/libboost_thread-mt.so.1.53.0
>> #9  0x7f11bfa38dc5 in start_thread () at /lib64/libpthread.so.0
>> #10 0x7f11bf05d28d in clone () at /lib64/libc.so.6
>>
>>
>> ___
>> Discuss-gnuradio mailing list
>> Discuss-gnuradio@gnu.org
>> https://lists.gnu.org/mailman/listinfo/discuss-gnuradio
>>
>>
> I couldn't reproduce this on my machine with avx2. Your GRC flowgraph
> didn't draw anything, so I replaced your FG with random source(0,255) ->
> const. modulator -> correlation est -> sinks.
>
> I'm wondering if a bad vector length is getting passed in by accident. Can
> you get the parameters to the VOLK call when it crashes? If you're not
> familiar with gdb do something like the following:
> gdb> f 0
> gdb> info variables
>


Also, I'd like to know what processor you have. I don't think it has
anything to do with this bug, but my processor (i7-4700MQ) doesn't
generally see much benefit from AVX, much less AVX2. What times does
volk_profile give you for this kernel? (Use volk_profile -R
32fc_x2_multiply.)


Re: [Discuss-gnuradio] Segfault in volk_32fc_x2_multiply_32fc_a_avx2_fma

2016-03-07 Thread West, Nathan
On Mon, Mar 7, 2016 at 2:32 PM, devin kelly  wrote:

> Hello,
>
> I've built a flowgraph (GRC and Python attached) that usually (but not
> always) segfaults in volk_32fc_x2_multiply_32fc_a_avx2_fma.  The
> segfault occurs in the FFT filter inside the correlation estimator block.
> I'm not sure whether the cause is the Volk code, the GR code calling it,
> or what I'm putting into GR.  I've got a backtrace below if that helps.
>
> Also, at the tag gate in my flowgraph I would usually have a USRP
> transmitter and a USRP receiver before the Correlation Estimator in a
> separate flowgraph, but since I can re-create the segfault in this
> flowgraph I simplified it.
>
> I'm using GR 3.7.9, Volk 1.2.1 and UHD 3.9.2 (though I don't call any UHD
> code here).
>
> Thanks for any help,
> Devin
>
> $ gdb /usr/bin/python core.13408
> GNU gdb (GDB) Red Hat Enterprise Linux 7.6.1-80.el7
> Copyright (C) 2013 Free Software Foundation, Inc.
> License GPLv3+: GNU GPL version 3 or later <
> http://gnu.org/licenses/gpl.html>
> This is free software: you are free to change and redistribute it.
> There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
> and "show warranty" for details.
> This GDB was configured as "x86_64-redhat-linux-gnu".
> For bug reporting instructions, please see:
> ...
> Reading symbols from /usr/bin/python2.7...Reading symbols from
> /usr/bin/python2.7...(no debugging symbols found)...done.
> (no debugging symbols found)...done.
>
> [Thread debugging using libthread_db enabled]
> Using host libthread_db library "/lib64/libthread_db.so.1".
> Core was generated by `python2 ./segfault_test.py'.
> Program terminated with signal 11, Segmentation fault.
> #0  0x7f11a4d064c0 in volk_32fc_x2_multiply_32fc_a_avx2_fma () from
> /local_disk/gr_3.7.9/lib/libvolk.so.1.2.1
> Missing separate debuginfos, use: debuginfo-install
> python-2.7.5-34.el7.x86_64
> (gdb) bt
> #0  0x7f11a4d064c0 in volk_32fc_x2_multiply_32fc_a_avx2_fma () at
> /local_disk/gr_3.7.9/lib/libvolk.so.1.2.1
> #1  0x7f119c1d0006 in gr::filter::kernel::fft_filter_ccc::filter(int,
> std::complex const*, std::complex*) ()
> at /local_disk/gr_3.7.9/lib64/libgnuradio-filter-3.7.9.so.0.0.0
> #2  0x7f119c4d724f in gr::digital::corr_est_cc_impl::work(int,
> std::vector >&, std::vector std::allocator >&) () at
> /local_disk/gr_3.7.9/lib64/libgnuradio-digital-3.7.9.so.0.0.0
> #3  0x7f11a52b1f57 in gr::sync_block::general_work(int,
> std::vector&, std::vector std::allocator >&, std::vector
> >&) () at /local_disk/gr_3.7.9/lib64/libgnuradio-runtime-3.7.9.so.0.0.0
> #4  0x7f11a527a6bd in gr::block_executor::run_one_iteration() () at
> /local_disk/gr_3.7.9/lib64/libgnuradio-runtime-3.7.9.so.0.0.0
> #5  0x7f11a52baf60 in
> gr::tpb_thread_body::tpb_thread_body(boost::shared_ptr, int) ()
> at /local_disk/gr_3.7.9/lib64/libgnuradio-runtime-3.7.9.so.0.0.0
> #6  0x7f11a52aebc1 in
> boost::detail::function::void_function_obj_invoker0 void>::invoke(boost::detail::function::function_buffer&) () at
> /local_disk/gr_3.7.9/lib64/libgnuradio-runtime-3.7.9.so.0.0.0
> #7  0x7f11a5260910 in
> boost::detail::thread_data::run() ()
> at /local_disk/gr_3.7.9/lib64/libgnuradio-runtime-3.7.9.so.0.0.0
> #8  0x7f11a40b824a in thread_proxy () at
> /lib64/libboost_thread-mt.so.1.53.0
> #9  0x7f11bfa38dc5 in start_thread () at /lib64/libpthread.so.0
> #10 0x7f11bf05d28d in clone () at /lib64/libc.so.6
>
>
I couldn't reproduce this on my machine with avx2. Your GRC flowgraph
didn't draw anything, so I replaced your FG with random source(0,255) ->
const. modulator -> correlation est -> sinks.

I'm wondering if a bad vector length is getting passed in by accident. Can
you get the parameters to the VOLK call when it crashes? If you're not
familiar with gdb do something like the following:
gdb> f 0
gdb> info variables