osmo-trx[master]: ssedetect: Add runtime CPU detection
Patch Set 3:

> > > Do we officially support anything besides gcc?
> >
> > not really, but then it is also nice to be portable. My vote would
> > be to merge the current patch under discussion, but open a ticket
> > as a reminder that this should be made more portable. I suppose
> > mplayer/ffmpeg/fftw or other libs with heavily optimized algorithms
> > also have a solution to that.
>
> As I just figured out, this call is supported in recent clang
> versions, so IMHO it would be better not to break compatibility
> with older compilers by this commit. It should be fairly easy to
> add a new configure check for whether __builtin_cpu_supports is
> supported by the compiler. I'll try to do it in libosmocore.

Have a look at: https://gerrit.osmocom.org/#/c/2519/

--
To view, visit https://gerrit.osmocom.org/2100
To unsubscribe, visit https://gerrit.osmocom.org/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Iba74f8a6e4e921ff31e4bd9f0c7c881fe547423a
Gerrit-PatchSet: 3
Gerrit-Project: osmo-trx
Gerrit-Branch: master
Gerrit-Owner: dexter
Gerrit-Reviewer: Alexander Chemeris
Gerrit-Reviewer: Harald Welte
Gerrit-Reviewer: Jenkins Builder
Gerrit-Reviewer: Max
Gerrit-Reviewer: Tom Tsou
Gerrit-Reviewer: Vadim Yanitskiy
Gerrit-Reviewer: dexter
Gerrit-HasComments: No
osmo-trx[master]: ssedetect: Add runtime CPU detection
Patch Set 3: Code-Review-1

> > Do we officially support anything besides gcc?
>
> not really, but then it is also nice to be portable. My vote would
> be to merge the current patch under discussion, but open a ticket
> as a reminder that this should be made more portable. I suppose
> mplayer/ffmpeg/fftw or other libs with heavily optimized algorithms
> also have a solution to that.

As I just figured out, this call is supported in recent clang versions, so IMHO it would be better not to break compatibility with older compilers by this commit. It should be fairly easy to add a new configure check for whether __builtin_cpu_supports is supported by the compiler. I'll try to do it in libosmocore.
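The configure check proposed above could be consumed on the C side roughly as follows. This is only a sketch: the macro name HAVE___BUILTIN_CPU_SUPPORTS, the CPU_SUPPORTS() wrapper and use_sse3() are assumptions made for illustration and are not taken from the patch or from libosmocore.

```c
#include <stdbool.h>

/*
 * Probe that a configure script could try to compile and link
 * (shown as a comment, it is not part of this translation unit):
 *
 *     int main(void) { return __builtin_cpu_supports("sse") ? 0 : 1; }
 *
 * If the probe builds, configure would define the (hypothetical)
 * macro HAVE___BUILTIN_CPU_SUPPORTS in config.h.
 */

#ifdef HAVE___BUILTIN_CPU_SUPPORTS
/* The builtin requires a string literal, so wrap it in a macro. */
#define CPU_SUPPORTS(feature) (__builtin_cpu_supports(feature) != 0)
#else
/* Compiler without the builtin: report "not supported" and let the
 * caller stay on the generic code path. */
#define CPU_SUPPORTS(feature) (false)
#endif

/* Hypothetical helper: true only when SSE3 can be used at runtime. */
static bool use_sse3(void)
{
	return CPU_SUPPORTS("sse3");
}
```

With this arrangement a compiler that lacks the builtin still produces a working binary; it simply never selects the optimized routines.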
osmo-trx[master]: ssedetect: Add runtime CPU detection
Patch Set 3: Code-Review+2
osmo-trx[master]: ssedetect: Add runtime CPU detection
Patch Set 3: Code-Review-1

To recap the discussion at OsmoDevCon, the last remaining request before the patch can be merged is to add an AVX target, which is a minor change. Regarding clang support, I also vote for moving this out of this ticket into a new one.
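As a rough illustration of why adding an AVX target can be a small change once runtime detection exists, the sketch below adds one more level to a detection function and one more case to a selector. All names here (simd_level, detect_simd_level, select_convert, convert_sse4_1, convert_avx) are hypothetical, the "optimized" routines are stand-ins that call the generic loop, and the x86 architecture guard is an added assumption; none of this is the code from the patch.

```c
#include <stddef.h>

typedef void (*convert_fn)(float *out, const short *in, int len);

/* Portable fallback, always available. */
static void convert_generic(float *out, const short *in, int len)
{
	for (int i = 0; i < len; i++)
		out[i] = in[i];
}

/* Stand-ins for hand-optimized routines (hypothetical). A real build
 * would place SSE4.1 and AVX intrinsics here. */
static void convert_sse4_1(float *out, const short *in, int len)
{
	convert_generic(out, in, len);
}

static void convert_avx(float *out, const short *in, int len)
{
	convert_generic(out, in, len);
}

enum simd_level { SIMD_NONE, SIMD_SSE4_1, SIMD_AVX };

/* Ask the CPU at runtime; the highest supported level wins. */
static enum simd_level detect_simd_level(void)
{
#if defined(__GNUC__) && !defined(__clang__) && \
    (defined(__x86_64__) || defined(__i386__))
	if (__builtin_cpu_supports("avx"))
		return SIMD_AVX;
	if (__builtin_cpu_supports("sse4.1"))
		return SIMD_SSE4_1;
#endif
	return SIMD_NONE;
}

/* Map a detected level to an implementation. */
static convert_fn select_convert(enum simd_level level)
{
	switch (level) {
	case SIMD_AVX:
		return convert_avx;
	case SIMD_SSE4_1:
		return convert_sse4_1;
	default:
		return convert_generic;
	}
}
```

A caller would run select_convert(detect_simd_level()) once at startup and keep the returned pointer, so the per-sample path pays only for an indirect call.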
osmo-trx[master]: ssedetect: Add runtime CPU detection
Patch Set 3:

> Do we officially support anything besides gcc?

not really, but then it is also nice to be portable. My vote would be to merge the current patch under discussion, but open a ticket as a reminder that this should be made more portable. I suppose mplayer/ffmpeg/fftw or other libs with heavily optimized algorithms also have a solution to that.
osmo-trx[master]: ssedetect: Add runtime CPU detection
Patch Set 3:

Do we officially support anything besides gcc?
osmo-trx[master]: ssedetect: Add runtime CPU detection
Patch Set 3: Code-Review-1

(1 comment)

https://gerrit.osmocom.org/#/c/2100/3/Transceiver52M/x86/convert.c
File Transceiver52M/x86/convert.c:

Line 197: if (__builtin_cpu_supports("sse4.1")) {

It is only supported by GCC, so building with another compiler, for example clang, fails:

    convert.c:197:6: error: use of unknown builtin '__builtin_cpu_supports' [-Wimplicit-function-declaration]
            if (__builtin_cpu_supports("sse4.1")) {
                ^
    1 error generated.

I don't know whether there is any way to determine the supported instruction sets in clang, but for now we can go this way:

    #if (defined(__GNUC__) && !defined(__clang__))
            if (__builtin_cpu_supports("sse4.1"))
                    // ...
    #endif
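The guard suggested in this comment could be wrapped in a small helper so that callers never touch the builtin directly. The following is a minimal sketch under stated assumptions: the function name cpu_has_sse4_1() is made up for illustration, and the extra x86 architecture test is an addition that is not part of the review comment.

```c
#include <stdbool.h>

/*
 * Returns true if the running CPU reports SSE4.1.
 *
 * The compiler test mirrors the guard from the review comment: only
 * plain GCC takes the builtin path. The x86 check is an extra
 * assumption so that the file also builds on other architectures.
 */
static bool cpu_has_sse4_1(void)
{
#if defined(__GNUC__) && !defined(__clang__) && \
    (defined(__x86_64__) || defined(__i386__))
	return __builtin_cpu_supports("sse4.1") != 0;
#else
	/* Unknown compiler or architecture: fall back to generic code. */
	return false;
#endif
}
```

The cost of this approach is that a compiler excluded by the guard always gets the generic code, even on a CPU that does have SSE4.1.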
osmo-trx[master]: ssedetect: Add runtime CPU detection
Patch Set 1: Code-Review+1
[PATCH] osmo-trx[master]: ssedetect: Add runtime CPU detection
Review at https://gerrit.osmocom.org/2100

ssedetect: Add runtime CPU detection

The current implementation can select the SSE support level at
compile time only. This commit adds functionality to automatically
detect the SSE support level and to switch the implementation if the
CPU does not support the required SSE level.

Change-Id: Iba74f8a6e4e921ff31e4bd9f0c7c881fe547423a
---
M Transceiver52M/arm/convolve.c
M Transceiver52M/common/convert.h
M Transceiver52M/common/convolve.h
M Transceiver52M/osmo-trx.cpp
M Transceiver52M/x86/convert.c
M Transceiver52M/x86/convolve.c
6 files changed, 142 insertions(+), 49 deletions(-)

git pull ssh://gerrit.osmocom.org:29418/osmo-trx refs/changes/00/2100/1

diff --git a/Transceiver52M/arm/convolve.c b/Transceiver52M/arm/convolve.c
index 2b42090..912d0c2 100644
--- a/Transceiver52M/arm/convolve.c
+++ b/Transceiver52M/arm/convolve.c
@@ -58,6 +58,13 @@
 }
 #endif

+/* API: Initalize convolve module */
+void convolve_init(void)
+{
+	/* Stub */
+	return;
+}
+
 /* API: Aligned complex-real */
 int convolve_real(float *x, int x_len,
 		  float *h, int h_len,
diff --git a/Transceiver52M/common/convert.h b/Transceiver52M/common/convert.h
index 4827c28..1d3a180 100644
--- a/Transceiver52M/common/convert.h
+++ b/Transceiver52M/common/convert.h
@@ -3,5 +3,6 @@

 void convert_float_short(short *out, const float *in, float scale, int len);
 void convert_short_float(float *out, const short *in, int len);
+void convert_init(void);

 #endif /* _CONVERT_H_ */
diff --git a/Transceiver52M/common/convolve.h b/Transceiver52M/common/convolve.h
index 08bda0c..43db577 100644
--- a/Transceiver52M/common/convolve.h
+++ b/Transceiver52M/common/convolve.h
@@ -27,4 +27,6 @@
 			  int start, int len,
 			  int step, int offset);

+void convolve_init(void);
+
 #endif /* _CONVOLVE_H_ */
diff --git a/Transceiver52M/osmo-trx.cpp b/Transceiver52M/osmo-trx.cpp
index 5e81586..dff482e 100644
--- a/Transceiver52M/osmo-trx.cpp
+++ b/Transceiver52M/osmo-trx.cpp
@@ -32,6 +32,11 @@
 #include
 #include

+extern "C" {
+#include "convolve.h"
+#include "convert.h"
+}
+
 /* Samples-per-symbol for downlink path
  *     4 - Uses precision modulator (more computation, less distortion)
  *     1 - Uses minimized modulator (less computation, more distortion)
@@ -498,6 +503,9 @@
 	RadioDevice::InterfaceType iface = RadioDevice::NORMAL;
 	struct trx_config config;

+	convolve_init();
+	convert_init();
+
 	handle_options(argc, argv, );

 	setup_signal_handlers();
diff --git a/Transceiver52M/x86/convert.c b/Transceiver52M/x86/convert.c
index 862a2e7..db1c0fc 100644
--- a/Transceiver52M/x86/convert.c
+++ b/Transceiver52M/x86/convert.c
@@ -25,6 +25,17 @@
 #include "config.h"
 #endif

+/* Architecture dependant function pointers */
+struct convert_cpu_context {
+	void (*convert_si16_ps_16n) (float *, const short *, int);
+	void (*convert_si16_ps) (float *, const short *, int);
+	void (*convert_scale_ps_si16_16n)(short *, const float *, float, int);
+	void (*convert_scale_ps_si16_8n)(short *, const float *, float, int);
+	void (*convert_scale_ps_si16)(short *, const float *, float, int);
+};
+
+static struct convert_cpu_context c;
+
 #ifdef HAVE_SSE3
 #include
 #include
@@ -157,53 +168,61 @@
 		_mm_storeu_si128((__m128i *) [16 * i + 8], m7);
 	}
 }
-#else /* HAVE_SSE3 */
+#endif
+
+__attribute__((optimize("no-tree-vectorize")))
 static void convert_scale_ps_si16(short *out, const float *in,
 				  float scale, int len)
 {
 	for (int i = 0; i < len; i++)
 		out[i] = in[i] * scale;
 }
-#endif

-#ifndef HAVE_SSE4_1
+__attribute__((optimize("no-tree-vectorize")))
 static void convert_si16_ps(float *out, const short *in, int len)
 {
 	for (int i = 0; i < len; i++)
 		out[i] = in[i];
 }
+
+void convert_init(void)
+{
+	c.convert_scale_ps_si16_16n = convert_scale_ps_si16;
+	c.convert_scale_ps_si16_8n = convert_scale_ps_si16;
+	c.convert_scale_ps_si16 = convert_scale_ps_si16;
+	c.convert_si16_ps_16n = convert_si16_ps;
+	c.convert_si16_ps = convert_si16_ps;
+
+#ifdef HAVE_SSE4_1
+	if (__builtin_cpu_supports("sse4.1")) {
+		c.convert_si16_ps_16n = &_sse_convert_si16_ps_16n;
+		c.convert_si16_ps = &_sse_convert_si16_ps;
+	}
 #endif
+
+#ifdef HAVE_SSE3
+	if (__builtin_cpu_supports("sse3")) {
+		c.convert_scale_ps_si16_16n = _sse_convert_scale_ps_si16_16n;
+		c.convert_scale_ps_si16_8n = _sse_convert_scale_ps_si16_8n;
+		c.convert_scale_ps_si16 = _sse_convert_scale_ps_si16;
+	}
+#endif
+}

 void convert_float_short(short *out, const float *in, float scale, int len)
 {
-	void (*conv_func)(short *, const float *, float, int);
-
-#ifdef HAVE_SSE3
 	if (!(len %