Re: [Development] Updating x86 SIMD support in Qt

2022-03-25 Thread Lorn Potter




On 19/1/2022 1:01 PM, Thiago Macieira wrote:

For Qt 6.4, I'd like to propose we change the way we detect and enable SIMD
support. TL;DR:


[snip]

On a side track. Not knowing too much regarding simd, what would be the 
best benchmarks to get a comparison of Qt WebAssembly simd support vs. 
non simd (default)? 10 or 15


Particularly looking for speed ups or slow downs, as some wasm simd is 
native opcode provided by the browsers and others are emulated by 
emscripten.



https://emscripten.org/docs/porting/simd.html
___
Development mailing list
Development@qt-project.org
https://lists.qt-project.org/listinfo/development


Re: [Development] Updating x86 SIMD support in Qt

2022-03-24 Thread Thiago Macieira
On Thursday, 24 March 2022 04:14:38 -03 Volker Hilsheimer wrote:
> Interesting, I have no problems building Qt in VMs running on VirtualBox. I
> build qtbase on several VMs every day, and just checked that Qt Multimedia
> builds on https://app.vagrantup.com/generic/boxes/opensuse15 without
> problems (or at least without that particular problem).

This only happens if you pass -march=native for it, which will include AVX2 
but not the other ones.

-- 
Thiago Macieira - thiago.macieira (AT) intel.com
  Software Architect - Intel DPG Cloud Engineering





Re: [Development] Updating x86 SIMD support in Qt

2022-03-24 Thread Volker Hilsheimer


> On 24 Mar 2022, at 02:59, Thiago Macieira  wrote:
> 
> On Wednesday, 23 March 2022 20:34:07 -03 Volker Hilsheimer wrote:
>> I have a possibly wrong hunch that building Qt Multimedia fails because of
>> this. This is in a Ubuntu 20.04 VM in the VMware Fusion 12 provider, and
>> the hardware version is maxed out (it wasn’t earlier, but didn’t help to
>> put it to level 18). So there’s not a whole lot I can do on the VM
>> provisioning side, I think.
> 
>> /home/vagrant/dev-build/qtbase/include/QtCore/6.4.0/QtCore/private/../../../
>> ../../../../qt/dev/qtbase/src/corelib/global/qsimd_p.h:256:8: error: #error
>> "Please enable all x86-64-v3 extensions; you probably want to use
>> -march=haswell or -march=x86-64-v3 instead of -mavx2"
> 
>> /home/vagrant/dev-build/qtbase/include/QtCore/6.4.0/QtCore/private/../../../
>> ../../../../qt/dev/qtbase/src/corelib/global/qsimd_p.h:253:81: error:
>> ‘__FMA__’ was not declared in this scope
> 
> By the way, please note that VirtualBox enables AVX2 but not FMA and some of 
> the other required functionality for x86-64-v3. So a VirtualBox environment 
> is *not* x86-64-v3 and therefore you cannot run such binaries and you cannot 
> compile Qt with -march=native in it.
> 
> As far as I know, this is a problem exclusive to VirtualBox. It wouldn't 
> affect VMWare or qemu KVM-accelerated virtualisation. I also haven't noticed 
> it on Parallels on my Mac. It's been reported but hasn't been fixed.


Interesting, I have no problems building Qt in VMs running on VirtualBox. I 
build qtbase on several VMs every day, and just checked that Qt Multimedia 
builds on https://app.vagrantup.com/generic/boxes/opensuse15 without problems 
(or at least without that particular problem).

Volker



Re: [Development] Updating x86 SIMD support in Qt

2022-03-23 Thread Thiago Macieira
On Wednesday, 23 March 2022 20:34:07 -03 Volker Hilsheimer wrote:
> I have a possibly wrong hunch that building Qt Multimedia fails because of
> this. This is in a Ubuntu 20.04 VM in the VMware Fusion 12 provider, and
> the hardware version is maxed out (it wasn’t earlier, but didn’t help to
> put it to level 18). So there’s not a whole lot I can do on the VM
> provisioning side, I think.

> /home/vagrant/dev-build/qtbase/include/QtCore/6.4.0/QtCore/private/../../../
> ../../../../qt/dev/qtbase/src/corelib/global/qsimd_p.h:256:8: error: #error
> "Please enable all x86-64-v3 extensions; you probably want to use
> -march=haswell or -march=x86-64-v3 instead of -mavx2"

> /home/vagrant/dev-build/qtbase/include/QtCore/6.4.0/QtCore/private/../../../
> ../../../../qt/dev/qtbase/src/corelib/global/qsimd_p.h:253:81: error:
> ‘__FMA__’ was not declared in this scope

By the way, please note that VirtualBox enables AVX2 but not FMA and some of 
the other required functionality for x86-64-v3. So a VirtualBox environment is 
*not* x86-64-v3 and therefore you cannot run such binaries and you cannot 
compile Qt with -march=native in it.

As far as I know, this is a problem exclusive to VirtualBox. It wouldn't 
affect VMWare or qemu KVM-accelerated virtualisation. I also haven't noticed 
it on Parallels on my Mac. It's been reported but hasn't been fixed.
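
A minimal sketch of the point above, assuming a hypothetical bitmask of CPU features (the enum, masks and function are illustrative names, not Qt's). It models the fact that a level only counts when every feature in its set is present, so AVX2 without FMA is still v2 at best. The masks are simplified and omit CMPXCHG16B, LAHF/SAHF and OSXSAVE.

```cpp
#include <cstdint>

// Hypothetical feature bits; the names are illustrative, not Qt's.
enum CpuFeature : std::uint32_t {
    Sse3 = 1u << 0, Ssse3 = 1u << 1, Sse4_1 = 1u << 2, Sse4_2 = 1u << 3,
    Popcnt = 1u << 4, Avx = 1u << 5, Avx2 = 1u << 6, Bmi1 = 1u << 7,
    Bmi2 = 1u << 8, F16c = 1u << 9, Fma = 1u << 10, Lzcnt = 1u << 11,
    Movbe = 1u << 12
};

// Simplified feature sets for the x86-64-v2 and x86-64-v3 levels.
constexpr std::uint32_t V2Mask = Sse3 | Ssse3 | Sse4_1 | Sse4_2 | Popcnt;
constexpr std::uint32_t V3Mask =
    V2Mask | Avx | Avx2 | Bmi1 | Bmi2 | F16c | Fma | Lzcnt | Movbe;

// Returns the highest level (1, 2 or 3) whose features are all present.
constexpr int x86_64Level(std::uint32_t features)
{
    if ((features & V3Mask) == V3Mask)
        return 3;
    if ((features & V2Mask) == V2Mask)
        return 2;
    return 1;
}
```

With these definitions, `x86_64Level(V3Mask & ~std::uint32_t(Fma))` evaluates to 2: dropping a single v3 feature is enough to fall back a whole level.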

-- 
Thiago Macieira - thiago.macieira (AT) intel.com
  Software Architect - Intel DPG Cloud Engineering





Re: [Development] Updating x86 SIMD support in Qt

2022-03-23 Thread Volker Hilsheimer


> On 24 Mar 2022, at 00:34, Volker Hilsheimer  wrote:
> 
>> On 24 Jan 2022, at 17:04, Thiago Macieira  wrote:
>> 
>> On Monday, 24 January 2022 05:30:46 PST Konrad Rosenbaum wrote:
>>> I have absolutely no problem with stuff running faster and more
>>> efficient on my two laptops (which are significantly more modern), but I
>>> would have a major problem with it not running at all on my workstation
>>> that I use for 95% of all my Open Source work. And I would also not like
>>> my applications to crash on my downstream users' computers (which are on
>>> average just as old as mine) - every crash means hours of work for
>>> someone (usually me) to find out what the problem was.
>> 
>> At least I can promise you not to make it a silent crash. Either QtCore or 
>> the dynamic linker would say it can't run on that machine.
>> 
>> https://code.woboq.org/qt6/qtbase/src/corelib/global/
>> qsimd.cpp.html#_Z16qDumpCPUFeaturesv
> 
> I have a possibly wrong hunch that building Qt Multimedia fails because of 
> this. This is in a Ubuntu 20.04 VM in the VMware Fusion 12 provider, and the 
> hardware version is maxed out (it wasn’t earlier, but didn’t help to put it 
> to level 18). So there’s not a whole lot I can do on the VM provisioning 
> side, I think.
> 
> Volker
> 


Never mind, my dev branch wasn’t rebased.

Time to hit the hay, evidently.

Cheers,
Volker



Re: [Development] Updating x86 SIMD support in Qt

2022-03-23 Thread Volker Hilsheimer
> On 24 Jan 2022, at 17:04, Thiago Macieira  wrote:
> 
> On Monday, 24 January 2022 05:30:46 PST Konrad Rosenbaum wrote:
>> I have absolutely no problem with stuff running faster and more
>> efficient on my two laptops (which are significantly more modern), but I
>> would have a major problem with it not running at all on my workstation
>> that I use for 95% of all my Open Source work. And I would also not like
>> my applications to crash on my downstream users' computers (which are on
>> average just as old as mine) - every crash means hours of work for
>> someone (usually me) to find out what the problem was.
> 
> At least I can promise you not to make it a silent crash. Either QtCore or 
> the dynamic linker would say it can't run on that machine.
> 
> https://code.woboq.org/qt6/qtbase/src/corelib/global/
> qsimd.cpp.html#_Z16qDumpCPUFeaturesv

I have a possibly wrong hunch that building Qt Multimedia fails because of 
this. This is in a Ubuntu 20.04 VM in the VMware Fusion 12 provider, and the 
hardware version is maxed out (it wasn’t earlier, but didn’t help to put it to 
level 18). So there’s not a whole lot I can do on the VM provisioning side, I 
think.

Volker



In file included from 
/home/vagrant/dev-build/qtbase/include/QtCore/6.4.0/QtCore/private/qsimd_p.h:1,
 from 
/home/vagrant/qt/dev/qtmultimedia/src/multimedia/video/qvideoframeconversionhelper_p.h:55,
 from 
/home/vagrant/qt/dev/qtmultimedia/src/multimedia/video/qvideoframeconversionhelper_avx2.cpp:40:
/home/vagrant/dev-build/qtbase/include/QtCore/6.4.0/QtCore/private/../../../../../../../qt/dev/qtbase/src/corelib/global/qsimd_p.h:256:8:
 error: #error "Please enable all x86-64-v3 extensions; you probably want to 
use -march=haswell or -march=x86-64-v3 instead of -mavx2"
  256 | #  error "Please enable all x86-64-v3 extensions; you probably want 
to use -march=haswell or -march=x86-64-v3 instead of -mavx2"
  |^
/home/vagrant/dev-build/qtbase/include/QtCore/6.4.0/QtCore/private/../../../../../../../qt/dev/qtbase/src/corelib/global/qsimd_p.h:253:49:
 error: ‘__BMI__’ was not declared in this scope
  253 | #  define ARCH_HASWELL_MACROS   (__AVX2__ + __BMI__ + __BMI2__ + 
__F16C__ + __FMA__ + __LZCNT__)
  | ^~~
/home/vagrant/dev-build/qtbase/include/QtCore/6.4.0/QtCore/private/../../../../../../../qt/dev/qtbase/src/corelib/global/qsimd_p.h:258:15:
 note: in expansion of macro ‘ARCH_HASWELL_MACROS’
  258 | static_assert(ARCH_HASWELL_MACROS, "Undeclared identifiers indicate 
which features are missing.");
  |   ^~~
/home/vagrant/dev-build/qtbase/include/QtCore/6.4.0/QtCore/private/../../../../../../../qt/dev/qtbase/src/corelib/global/qsimd_p.h:253:59:
 error: ‘__BMI2__’ was not declared in this scope
  253 | #  define ARCH_HASWELL_MACROS   (__AVX2__ + __BMI__ + __BMI2__ + 
__F16C__ + __FMA__ + __LZCNT__)
  |   ^~~~
/home/vagrant/dev-build/qtbase/include/QtCore/6.4.0/QtCore/private/../../../../../../../qt/dev/qtbase/src/corelib/global/qsimd_p.h:258:15:
 note: in expansion of macro ‘ARCH_HASWELL_MACROS’
  258 | static_assert(ARCH_HASWELL_MACROS, "Undeclared identifiers indicate 
which features are missing.");
  |   ^~~
/home/vagrant/dev-build/qtbase/include/QtCore/6.4.0/QtCore/private/../../../../../../../qt/dev/qtbase/src/corelib/global/qsimd_p.h:253:70:
 error: ‘__F16C__’ was not declared in this scope
  253 | #  define ARCH_HASWELL_MACROS   (__AVX2__ + __BMI__ + __BMI2__ + 
__F16C__ + __FMA__ + __LZCNT__)
  |  
^~~~
/home/vagrant/dev-build/qtbase/include/QtCore/6.4.0/QtCore/private/../../../../../../../qt/dev/qtbase/src/corelib/global/qsimd_p.h:258:15:
 note: in expansion of macro ‘ARCH_HASWELL_MACROS’
  258 | static_assert(ARCH_HASWELL_MACROS, "Undeclared identifiers indicate 
which features are missing.");
  |   ^~~
/home/vagrant/dev-build/qtbase/include/QtCore/6.4.0/QtCore/private/../../../../../../../qt/dev/qtbase/src/corelib/global/qsimd_p.h:253:81:
 error: ‘__FMA__’ was not declared in this scope
  253 | #  define ARCH_HASWELL_MACROS   (__AVX2__ + __BMI__ + __BMI2__ + 
__F16C__ + __FMA__ + __LZCNT__)
  | 
^~~
/home/vagrant/dev-build/qtbase/include/QtCore/6.4.0/QtCore/private/../../../../../../../qt/dev/qtbase/src/corelib/global/qsimd_p.h:258:15:
 note: in expansion of macro ‘ARCH_HASWELL_MACROS’
  258 | static_assert(ARCH_HASWELL_MACROS, "Undeclared identifiers indicate 
which features are missing.");
  |   ^~~
/home/vagrant/dev-build/qtbase/include/QtCore/6.4.0/QtCore/private/../../../../../../../qt/dev/qtbase/src/corelib/global/qsimd_p.h:253:91:
 error: 
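
The technique visible in the log above (summing feature macros inside a static_assert so that a missing one shows up as an undeclared identifier) can be sketched as follows. This is only an illustration: the DEMO_ macros are hypothetical stand-ins for compiler-defined macros such as __AVX2__ and __FMA__, and are defined here by hand so the snippet compiles anywhere.

```cpp
// Hypothetical stand-ins for compiler-defined macros such as __AVX2__ or
// __FMA__; a real compiler defines those only when the feature is enabled.
#define DEMO_AVX2 1
#define DEMO_BMI2 1
#define DEMO_FMA 1

// If any macro were undefined, its name would stay a plain identifier and
// the compiler would report it as undeclared, naming the missing feature.
#define DEMO_REQUIRED_MACROS (DEMO_AVX2 + DEMO_BMI2 + DEMO_FMA)
static_assert(DEMO_REQUIRED_MACROS,
              "Undeclared identifiers indicate which features are missing.");

// Exposes the sum so the result can be inspected at run time.
constexpr int demoRequiredCount() { return DEMO_REQUIRED_MACROS; }
```

Removing one of the three #define lines would make the static_assert fail with an "undeclared identifier" style error naming that macro, which is the same kind of diagnostic as the `__BMI__`, `__BMI2__`, `__F16C__` and `__FMA__` errors in the log.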

Re: [Development] Updating x86 SIMD support in Qt

2022-01-24 Thread Thiago Macieira
On Monday, 24 January 2022 05:30:46 PST Konrad Rosenbaum wrote:
> I have absolutely no problem with stuff running faster and more
> efficient on my two laptops (which are significantly more modern), but I
> would have a major problem with it not running at all on my workstation
> that I use for 95% of all my Open Source work. And I would also not like
> my applications to crash on my downstream users' computers (which are on
> average just as old as mine) - every crash means hours of work for
> someone (usually me) to find out what the problem was.

At least I can promise you not to make it a silent crash. Either QtCore or the 
dynamic linker would say it can't run on that machine.

https://code.woboq.org/qt6/qtbase/src/corelib/global/
qsimd.cpp.html#_Z16qDumpCPUFeaturesv
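
A rough sketch of what "not a silent crash" could look like, assuming a hypothetical list of required features and their detected state. The struct, function name and message text are invented for illustration and are not taken from qsimd.cpp.

```cpp
#include <string>
#include <vector>

// Hypothetical record: a feature name and whether the CPU reports it.
struct Requirement {
    const char *name;
    bool present;
};

// Builds the diagnostic a program could print before exiting, instead of
// crashing later on an unsupported instruction. Empty means "all present".
std::string missingFeatureMessage(const std::vector<Requirement> &reqs)
{
    std::string missing;
    for (const Requirement &r : reqs) {
        if (!r.present) {
            if (!missing.empty())
                missing += ' ';
            missing += r.name;
        }
    }
    if (missing.empty())
        return {};
    return "Incompatible processor. This build requires: " + missing;
}
```

A caller would run this check once at startup, print the message to stderr if it is non-empty, and exit, so the user sees the reason rather than an illegal-instruction crash.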

-- 
Thiago Macieira - thiago.macieira (AT) intel.com
  Software Architect - Intel DPG Cloud Engineering





Re: [Development] Updating x86 SIMD support in Qt

2022-01-24 Thread Rui Oliveira

Hey,

Since Qt has its own Conan/Artifactory setup, maybe it would be viable 
to have some different SIMD setups available as options for the Qt 
package?


This would require Conan as a dependency, of course. On most systems 
that means Python (Conan 1.x still supports py2; the upcoming Conan 2.0, 
with all its breaking changes, will require py3, iirc), but on Windows 
they have .exe versions available.


I think this would have little impact on the overall build process: 
`conan install qt ` and just point your project to look for Qt 
in ~/.conan... With Qt Creator, it should just be a matter of creating a 
new kit. If the Windows installer for Qt helped with this, even better.



Rui

At 13:30 on 24/01/2022, Konrad Rosenbaum wrote:

Hi,

On 23/01/2022 19:39, Thiago Macieira wrote:
I expect that most of those tools are therefore simply using whatever 
binaries they obtained from The Qt Company and didn't rebuild from 
source. I think this is what we at Intel do for the installers for the 
oneAPI SDK on Linux and macOS (the Windows installer got rewritten a 
few years ago).


Correct. Only a handful of application developers know how to compile 
the entire framework and (esp. on Windows) a lot of them would not 
even have the idea to do it. Some of my Windows-based colleagues would 
panic if I told them to compile their frameworks. I have had paid 
customer projects to help them compile Qt for some OS version where 
binaries were not readily available.


Then there are bastards like me who know perfectly well how to compile 
Qt, who compile all kinds of large programs all the time, but refuse 
to do so with Qt because I am too lazy to tune my build process, I 
think it takes too long and I like the excuse of using "the standard 
configuration"... ;-)


So if my proposal had been in effect for those releases, it's quite 
likely the tools wouldn't have run on your 13+-year-old computer.

But it isn't. We're talking about the next release, 6.4. There won't 
be any tools built with it until the second half of this year, and 
commercial customers may even want to wait for the LTS release after 
that.


And what makes you think that we stingy, cheap and lazy bastards will 
buy a new computer to replace this outdated piece of scrap metal this 
year when we haven't done so in more than 10 years? ;-)


The cruel fact is that around 2010 computers got so darn efficient 
that you can run a moderately good machine from slightly before 2010 
with minor upgrades to this day and call it "the kid's computer" or 
even your "main workstation" without blushing. I know plenty of people 
(both laymen and CS professionals) who run such old hardware (or 
worse) and don't even notice that it is old - if I hadn't looked it up 
I would have sworn my workstation is less than 6 years old.


[Yes, as someone whose income depends on people buying the newest 
chips the irony of not replacing my own computer in >10 years is not 
entirely lost on me...]


Speaking of LTS: with LTS not available to Open Source users anymore - 
sticking with older versions of Qt is not exactly a good option 
either. Unless I restrict myself to Qt 5.15 until I'm satisfied all my 
downstream users are likely to have bought a new computer (if they are 
as stingy as I am, then Qt7 will be close to being released before 
that happens).



On Linux, we can have the multiple versions. I proposed a minimum of 
v2 and an option of v3, but we can always choose v1+v2+v3. But I really 
want v2 and v3 for the critical libraries.


Please please choose v1 as the pre-built minimum. There are plenty of v1 
systems out there that need to run Qt applications and plenty of 
developers who will never re-compile Qt.


If the physical appearance of the "please" makes a difference: Pretty 
Please!


I have absolutely no problem with stuff running faster and more 
efficient on my two laptops (which are significantly more modern), but 
I would have a major problem with it not running at all on my 
workstation that I use for 95% of all my Open Source work. And I would 
also not like my applications to crash on my downstream users' 
computers (which are on average just as old as mine) - every crash 
means hours of work for someone (usually me) to find out what the 
problem was.




    Konrad




Re: [Development] Updating x86 SIMD support in Qt

2022-01-23 Thread Thiago Macieira
On Saturday, 22 January 2022 10:41:50 PST Lisandro Damián Nicanor Pérez Meyer 
wrote:
> Keeping the distros able to use whatever baseline is indeed a first
> good step, but I'm also thinking about proprietary apps that might be
> using tQtC's build for their offerings. We used this very same machine
> with Zoom meetings for our kids' school meetings while they couldn't
> go to school due to COVID. It proved very useful.

That's an interesting and good point. It used to be possible to distinguish a 
commercial build from an open source one back in Qt 3 and 4 days by using the 
"strings" command and searching for the strings that were added by configure 
during the initial build. I think that changed part-way through Qt 4 and by 
4.8, which is the last release I was still at Nokia for, the binaries were 
indistinguishable (evaluation binaries were different).

I expect that most of those tools are therefore simply using whatever binaries 
they obtained from The Qt Company and didn't rebuild from source. I think this 
is what we at Intel do for the installers for the oneAPI SDK on Linux and macOS 
(the Windows installer got rewritten a few years ago).

Open Source tools probably just ship what they got from download.qt.io. Makes 
things much simpler... which is also part of the reason why I want to raise 
the minimum and do multi-arch.

So if my proposal had been in effect for those releases, it's quite likely the 
tools wouldn't have run on your 13+-year-old computer.

But it isn't. We're talking about the next release, 6.4. There won't be any 
tools built with it until the second half of this year, and commercial 
customers may even want to wait for the LTS release after that.

> I don't know if the linker capability in using the right library
> version can be used in these specific cases, but if somehow it could
> be done it would be just wonderful.

It can be, it's a matter of our deciding what the minimum and other 
optimisation options are.

On Windows, with MSVC, there's no question to be answered. It's only the 
baseline. On Windows with either of the MinGW-capable compilers, we can choose 
the minimum, but there's no option for runtime selection. In either case, 
since Microsoft isn't interested in providing the tools to make code run fast 
in their OS, I say we agree and keep on wasting CPU.

On Linux, we can have the multiple versions. I proposed a minimum of v2 and an 
option of v3, but we can always choose v1+v2+v3. But I really want v2 and v3 
for the critical libraries.

On macOS, the minimum today is already v2, but I am proposing raising it to v3 
because we no longer support the OSes that supported v2 CPUs.
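
To illustrate the multi-version idea on Linux, here is a small sketch that assumes the glibc-hwcaps directory layout used by the dynamic linker in glibc 2.33 and later (subdirectories named x86-64-v2, x86-64-v3 and so on under a library directory). Whether Qt packages would use exactly this layout is an assumption; the function name is hypothetical.

```cpp
#include <string>
#include <vector>

// Returns the directories, in preference order, in which a loader would look
// for a library, given a base directory and the CPU's x86-64 level (1 to 4).
std::vector<std::string> searchOrder(const std::string &baseDir, int cpuLevel)
{
    std::vector<std::string> dirs;
    for (int level = cpuLevel; level >= 2; --level)
        dirs.push_back(baseDir + "/glibc-hwcaps/x86-64-v" + std::to_string(level));
    dirs.push_back(baseDir); // baseline build, always the last resort
    return dirs;
}
```

For a v3-capable CPU and a base of /usr/lib64, this yields the v3 directory, then the v2 directory, then /usr/lib64 itself, so a v1+v2+v3 package would simply install three copies of each critical library.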

-- 
Thiago Macieira - thiago.macieira (AT) intel.com
  Software Architect - Intel DPG Cloud Engineering





Re: [Development] Updating x86 SIMD support in Qt

2022-01-22 Thread Lisandro Damián Nicanor Pérez Meyer
Hi!

On Wed, 19 Jan 2022 at 14:50, Allan Sandfeld Jensen  wrote:
[snip]
> I have a ~10 year old Phenom II that I use as a media server, it also lacks
> SSE4 (only having AMDs so-called SSE4a). With 3 cores and 4GB of memory it
> runs a modern Qt5 based Linux desktop just fine, even if I don't regularly use
> it as such. So while it is no great loss for me if I needed to replace it with
> a 200€ NUC, it is certainly plausible people have such working old machines. I
> think it is fine to let the default not work on such machines, and let the
> distros that want to support it use v1.

I'm actually replying to this thread using a very similar machine, a
Core 2 Duo T7250 with just SSSE3, running (almost) the latest Plasma and
used as a media center, kids' playground, for checking email outside
office hours...

Keeping the distros able to use whatever baseline is indeed a first
good step, but I'm also thinking about proprietary apps that might be
using tQtC's build for their offerings. We used this very same machine
with Zoom meetings for our kids' school meetings while they couldn't
go to school due to COVID. It proved very useful.

I don't know if the linker capability in using the right library
version can be used in these specific cases, but if somehow it could
be done it would be just wonderful.


-- 
Lisandro Damián Nicanor Pérez Meyer
https://perezmeyer.com.ar/


Re: [Development] Updating x86 SIMD support in Qt

2022-01-20 Thread Thiago Macieira
On Thursday, 20 January 2022 13:10:08 PST Lorn Potter wrote:
> well, from https://emscripten.org/docs/porting/simd.html
> "also turn on LLVM’s autovectorization passes, so no source
> modifications are necessary to benefit from SIMD."
> so emscripten's simd support is more than just sse2, sse3, etc.

True, but this is the autovectoriser. It operates on pure scalar C++ code and 
applies what it can to speed up. Conceivably, the compiler knows what is fast 
on the target environment and what isn't.
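
As an illustration of "pure scalar C++ code", the kind of loop an autovectoriser typically handles looks like the sketch below. The function name is made up, and whether a given compiler actually vectorises it depends on the optimisation flags and the target.

```cpp
#include <cstddef>

// Plain scalar code with no intrinsics. With optimisation enabled, a
// compiler's autovectoriser may turn this loop into SIMD instructions
// suited to the target; the source stays the same either way.
void addArrays(const float *a, const float *b, float *out, std::size_t n)
{
    for (std::size_t i = 0; i < n; ++i)
        out[i] = a[i] + b[i];
}
```

The result is identical whether or not the loop is vectorised, which is why no source modifications are needed to benefit.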

> well, CI doesn't build wasm simd, so in this respect, it doesn't concern me.
> 
> 256-bit intrinsics won't work in wasm/emscripten so don't enable them
> for those platforms that don't support them like wasm.
> 
> Just don't stop platforms from using 128bit intrinsics such as sse2.

Please see the code I'm submitting, like
https://codereview.qt-project.org/c/qt/qtbase/+/387217
https://codereview.qt-project.org/c/qt/qtbase/+/387414
https://codereview.qt-project.org/c/qt/qtbase/+/380895

That means:
* the new functions are inside an #ifdef __SSE2__ block
* the new functions' content and their calls are behind
if constexpr (UseAvx2) {
or
if constexpr (UseAvx256) {
* the new content uses __m256i among other things without #if

This is the request: if you #define __SSE2__, then you MUST provide __m256i 
and the 256-bit x86 intrinsics (up to and including the new ones added in 
AVX512). All or nothing.

You don't have to use them. They just need to be declared.
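
The pattern in the list above can be sketched like this. DemoVec256 and demoLoad256 are hypothetical stand-ins for __m256i and a 256-bit load intrinsic, so the snippet compiles on any platform; the value of UseAvx2 is hard-coded here, whereas in real code it would be derived from the build configuration.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical stand-ins for __m256i and a 256-bit load intrinsic. On a real
// x86 build these would come from <immintrin.h>; here they only need to exist.
struct DemoVec256 { std::uint8_t bytes[32]; };
inline DemoVec256 demoLoad256(const std::uint8_t *p)
{
    DemoVec256 v{};
    for (int i = 0; i < 32; ++i)
        v.bytes[i] = p[i];
    return v;
}

constexpr bool UseAvx2 = false; // assumption: set from build flags in real code

// Outside a template, the discarded branch of `if constexpr` is still fully
// checked by the compiler, so the 256-bit names must be declared even though
// the branch is never executed.
unsigned sumBytes(const std::uint8_t *data, std::size_t n)
{
    unsigned sum = 0;
    std::size_t i = 0;
    if constexpr (UseAvx2) {
        for (; i + 32 <= n; i += 32) {
            DemoVec256 v = demoLoad256(data + i);
            for (std::uint8_t b : v.bytes)
                sum += b;
        }
    }
    for (; i < n; ++i) // scalar tail, or the whole input when UseAvx2 is false
        sum += data[i];
    return sum;
}
```

Deleting the DemoVec256 definition breaks the build even with UseAvx2 set to false, which is the "all or nothing" requirement: the declarations have to be there, the code path does not have to run.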

-- 
Thiago Macieira - thiago.macieira (AT) intel.com
  Software Architect - Intel DPG Cloud Engineering





Re: [Development] Updating x86 SIMD support in Qt

2022-01-20 Thread Lorn Potter



On 21/1/2022 6:10 AM, Thiago Macieira wrote:

On Thursday, 20 January 2022 11:22:19 PST Lorn Potter wrote:

It's like Jean-Michael says, 32 bit, but it's complicated.


Yeah, I can get that :)

From what I've understood so far, WASM is not __x86_64__ but it might be
__i386__. So far, that's not a problem. I don't think we use any assembly or
non-vector intrinsic in general code (qnumeric.h would come to mind but those
functions need to be constexpr).

That means we could support WASM as i386 ABI version 0: no vector operation
support. We currently default to turning that on, so keeping an option to turn
it back off is in the plans.


well, from https://emscripten.org/docs/porting/simd.html
"also turn on LLVM’s autovectorization passes, so no source 
modifications are necessary to benefit from SIMD."

so emscripten's simd support is more than just sse2, sse3, etc.



Additionally, looks like Emscripten has an option to transform the SSE 128-bit
intrinsics into WASM 128-bit vector content. I don't personally have a problem
with that, but WASM users might if the translation / emulation isn't very
good. From a link that Morten provided, there are several operations we do use
in our code today that are emulated slowly.


From the few benchmarks I ran, there were minimal slowdowns and more 
often a performance enhancement. But I am no expert in simd.




I do have a problem if you prevent me from using the 256-bit intrinsics. This
is the whole point of this thread. I need the 256-bit intrinsics from AVX,
AVX2 and AVX512 to be available and compile.

So my request is that the CI not prevent me from using those intrinsics in
native x86. If they work with Emscripten, great; if they don't, then the
support gets disabled in the CI.


well, CI doesn't build wasm simd, so in this respect, it doesn't concern me.

256-bit intrinsics won't work in wasm/emscripten so don't enable them 
for those platforms that don't support them like wasm.


Just don't stop platforms from using 128bit intrinsics such as sse2.


--
Lorn Potter
Freelance Qt Developer. Platform Maintainer Qt WebAssembly, Maintainer 
QtSensors

Author, Hands-on Mobile and Embedded Development with Qt 5



Re: [Development] Updating x86 SIMD support in Qt

2022-01-20 Thread Thiago Macieira
On Thursday, 20 January 2022 11:22:19 PST Lorn Potter wrote:
> It's like Jean-Michael says, 32 bit, but it's complicated.

Yeah, I can get that :)

From what I've understood so far, WASM is not __x86_64__ but it might be 
__i386__. So far, that's not a problem. I don't think we use any assembly or 
non-vector intrinsic in general code (qnumeric.h would come to mind but those 
functions need to be constexpr).

That means we could support WASM as i386 ABI version 0: no vector operation 
support. We currently default to turning that on, so keeping an option to turn 
it back off is in the plans.

Additionally, looks like Emscripten has an option to transform the SSE 128-bit 
intrinsics into WASM 128-bit vector content. I don't personally have a problem 
with that, but WASM users might if the translation / emulation isn't very 
good. From a link that Morten provided, there are several operations we do use 
in our code today that are emulated slowly.

I do have a problem if you prevent me from using the 256-bit intrinsics. This 
is the whole point of this thread. I need the 256-bit intrinsics from AVX, 
AVX2 and AVX512 to be available and compile.

So my request is that the CI not prevent me from using those intrinsics in 
native x86. If they work with Emscripten, great; if they don't, then the 
support gets disabled in the CI.

-- 
Thiago Macieira - thiago.macieira (AT) intel.com
  Software Architect - Intel DPG Cloud Engineering





Re: [Development] Updating x86 SIMD support in Qt

2022-01-20 Thread Lorn Potter




On 20/1/2022 1:29 AM, Thiago Macieira wrote:

On Tuesday, 18 January 2022 20:56:10 PST Lorn Potter wrote:

wasm is a special case, as we turn it off by default, regardless of
detection. We cannot allow detection by default (specified by some
configure argument which is currently -sse2) because browsers do not
support it by default, and there is no way to just not use it once it is
compiled in.


Hello Lorn

Please explain. What architecture is WASM producing binaries for? Is it 32-bit
i386? Or is it 64-bit x86-64? Because the latter requires SSE2 to do floating
point.


You could think of webassembly as javascript.

It's like Jean-Michael says, 32 bit, but it's complicated.







--
Lorn Potter
Freelance Qt Developer. Platform Maintainer Qt WebAssembly, Maintainer 
QtSensors

Author, Hands-on Mobile and Embedded Development with Qt 5



Re: [Development] Updating x86 SIMD support in Qt

2022-01-19 Thread Thiago Macieira
On Tuesday, 18 January 2022 19:01:06 PST Thiago Macieira wrote:
> 4) up the defaults from where they are today
> 
> Today, your default Qt build will always target the x86-64 baseline[*],
> including for i386, despite as I said no CPU failing to meet the next level
> for 9 years. I'd like to request we up that minimum.
> 
> By default, I'd like us to produce x86-64 v2 code, which is SSE4.

I've just realised the above cannot be done for Visual Studio because it can't 
generate code for SSE4. The /arch option[1] only has selections for AVX, AVX2 
and AVX512. This means Visual Studio binaries must remain where they are 
today.

So what shall we do for MinGW? Both compilers (Clang and GCC) *can* generate 
SSE4 code. But is it worth having a different minimum for MinGW compared to 
MSVC?

[1] https://docs.microsoft.com/en-us/cpp/build/reference/arch-x64
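
To make the compiler difference concrete, here is a small sketch that reports which vector baseline a translation unit was compiled for, using only predefined macros. The function name is hypothetical. With MSVC on x64 the result can only be SSE2, AVX, AVX2 or AVX512, since there is no /arch setting between SSE2 and AVX.

```cpp
#include <string>

// Reports the vector baseline this translation unit was compiled for.
// GCC and Clang define __SSE4_2__ and similar macros from -m/-march flags;
// MSVC defines __AVX__, __AVX2__ and __AVX512F__ according to /arch.
std::string compiledBaseline()
{
#if defined(__AVX512F__)
    return "AVX512";
#elif defined(__AVX2__)
    return "AVX2";
#elif defined(__AVX__)
    return "AVX";
#elif defined(__SSE4_2__)
    return "SSE4.2";
#elif defined(__SSE2__) || defined(_M_X64) || (defined(_M_IX86_FP) && _M_IX86_FP >= 2)
    return "SSE2";
#else
    return "none or unknown";
#endif
}
```

Built with GCC or Clang and -march=x86-64-v2, this would return "SSE4.2"; no MSVC /arch value leads to that branch.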

-- 
Thiago Macieira - thiago.macieira (AT) intel.com
  Software Architect - Intel DPG Cloud Engineering





Re: [Development] Updating x86 SIMD support in Qt

2022-01-19 Thread Thiago Macieira
On Wednesday, 19 January 2022 08:48:37 PST Thiago Macieira wrote:
> If Microsoft wants their OS to have better-performing content, they'll have
> to come up with a solution. I do plan to reach out to them via the team
> that works with them at Intel, but I don't expect to see any solution, at
> least not before 2030. There are some workarounds for this (search for
> "delay-loaded DLL") but they've left me with a bad taste in my mouth.

Confirmed there is no current solution. I'm still trying to find out and 
influence a possible future solution, but anything I may learn wouldn't be 
usable for 6.4, so let's count this out.

In my reply to Eddy:
> * not all libc/libm are good. In particular, MinGW's support lacks those
>   optimisations (I've just checked). I haven't disassembled MSVC's Runtime
>   to find out what it does.


While MinGW's runtime library (libmingwex) has NO code above SSE2, Microsoft 
Visual Studio's does in quite a few math library functions (looks like all 
transcendental ones). But not simpler ones like floor() and definitely not any 
of the string functions. In fact, quite a few string functions don't even have 
any vector code at all, like for example memcmp().

So my conclusion so far is that if you care about CPU performance, you're not 
using Windows.


-- 
Thiago Macieira - thiago.macieira (AT) intel.com
  Software Architect - Intel DPG Cloud Engineering





Re: [Development] Updating x86 SIMD support in Qt

2022-01-19 Thread Kevin Kofler via Development
Thiago Macieira wrote:
> Whether Fedora decides to up the minimum requirement for a future edition
> is something you'll be in a better position to answer than I am.

So far, a few others and I have successfully blocked attempts at upping the 
minimum requirement for Fedora. (The requirement for the eln branch matches 
the EL one though. That was originally the rationale for the branch's 
creation, after a proposal to up the requirement throughout Fedora failed, 
but eln has since evolved to also include other EL-only changes to Rawhide 
before it branches to CentOS Stream.)

> The next question is whether you do (or would do) Qt 6 development on that
> laptop, while travelling and, if so, if you use (or would use) the binary
> packages from qt.io. If you build from source, it would not be a problem;
> if you use a distribution package, it's out of scope.

As already mentioned: I personally have no problem with you bumping the 
minimum requirement on the prebuilt upstream binaries as long as 
distribution packages can still be built for older x86 "versions", because I 
only use the latter anyway.

>> But it is not my primary computer, my primary
>> computer is the desktop on which I am typing this: Sandy Bridge Core
>> i7-2600K (released 2011, supports up to AVX(1)), 16 GiB RAM.
> 
> Sandy Bridges and Ivy Bridges are more like the previous generation
> (Westmere) than the next (Haswell). For example, even though SNB does have
> a 32-byte-load instruction and it can issue two load instructions per
> cycle, it can't do two 32-byte loads in the same cycle. That's something
> Haswell can and it is a cornerstone of the optimisations I've just made.
> So for those two, we'll keep the previous generation of optimisations,
> that focus on 16-byte (128-bit) loads and stores.

So both my computers are exactly one generation short of the next major 
improvement. Sad. :-(

> I used to have an SNB-based Mac Mini, but since Apple stopped providing
> updates for that generation, I was forced to upgrade.

Thankfully, we are not Apple! I am happy to be able to still use 10+ year 
old hardware without planned obsolescence.

> In my other $DAYJOB, I still need to support the SNB and IVB-based servers
> (Jaketown and Ivytown, see [1]).
> 
> [1] https://en.wikipedia.org/wiki/List_of_Intel_codenames

And that's good. :-)

Kevin Kofler



Re: [Development] Updating x86 SIMD support in Qt

2022-01-19 Thread Thiago Macieira
On Wednesday, 19 January 2022 14:03:41 PST Kevin Kofler via Development wrote:
> Thiago Macieira wrote:
> > Clear Linux attempts to use a heuristic to guess which libraries it thinks
> > are worth keeping the AVX2 version of. To see which ones it thought of
> > qtbase, see
> > https://github.com/clearlinux-pkgs/qtbase/blob/e16f08be736d28351219b05e807a6468ea39341b/qtbase.spec#L5771-L5902
> 
> Why is the baseline libQt5Sql in -extras and the haswell one in -lib?

Unintended.

First of all, this file no longer applies. Clear Linux switched to a different 
solution for providing v3 and v4 content.

The reason is that the previous solution involved the heuristic script[1] I 
had mentioned: it tried to guess whether it was worth keeping the v3 and v4 
files. If it thought it wasn't worth it, it simply deleted the file. Then 
Autospec wouldn't package it.

The qtbase-extras package exists to enable Quassel Core without bringing the 
full X11 dependency. It is a manual solution. See
https://github.com/clearlinux-pkgs/qtbase/blob/master/extras

So my guess is that an update to the compiler caused it to generate different 
code in QtSql that the script now concluded was worth keeping. As it isn't 
mentioned in the "extras" file, it got packaged as a regular library.

[1] https://github.com/clearlinux/clr-avx-tools

-- 
Thiago Macieira - thiago.macieira (AT) intel.com
  Software Architect - Intel DPG Cloud Engineering





Re: [Development] Updating x86 SIMD support in Qt

2022-01-19 Thread Thiago Macieira
On Wednesday, 19 January 2022 13:52:21 PST Kevin Kofler via Development wrote:
> The notebook has 4 GiB RAM. (That was the maximum available. I picked it
> because I wanted the notebook to last.) Fedora 35 with KDE Plasma runs fine
> on it. Not fast, but usable.

Whether Fedora decides to up the minimum requirement for a future edition is 
something you'll be in a better position to answer than I am.

The next question is whether you do (or would do) Qt 6 development on that 
laptop, while travelling and, if so, if you use (or would use) the binary 
packages from qt.io. If you build from source, it would not be a problem; if 
you use a distribution package, it's out of scope.

> But it is not my primary computer, my primary
> computer is the desktop on which I am typing this: Sandy Bridge Core
> i7-2600K (released 2011, supports up to AVX(1)), 16 GiB RAM.

Sandy Bridges and Ivy Bridges are more like the previous generation (Westmere) 
than the next (Haswell). For example, even though SNB does have a 32-byte-load 
instruction and it can issue two load instructions per cycle, it can't do two 
32-byte loads in the same cycle. That's something Haswell can and it is a 
cornerstone of the optimisations I've just made. So for those two, we'll keep 
the previous generation of optimisations, that focus on 16-byte (128-bit) 
loads and stores.

I used to have an SNB-based Mac Mini, but since Apple stopped providing 
updates for that generation, I was forced to upgrade.

In my other $DAYJOB, I still need to support the SNB and IVB-based servers 
(Jaketown and Ivytown, see [1]).

[1] https://en.wikipedia.org/wiki/List_of_Intel_codenames

-- 
Thiago Macieira - thiago.macieira (AT) intel.com
  Software Architect - Intel DPG Cloud Engineering





Re: [Development] Updating x86 SIMD support in Qt

2022-01-19 Thread Kevin Kofler via Development
Thiago Macieira wrote:
> Clear Linux attempts to use a heuristic to guess which libraries it thinks
> are worth keeping the AVX2 version of. To see which ones it thought of
> qtbase, see
> https://github.com/clearlinux-pkgs/qtbase/blob/e16f08be736d28351219b05e807a6468ea39341b/qtbase.spec#L5771-L5902

Why is the baseline libQt5Sql in -extras and the haswell one in -lib?

Kevin Kofler



Re: [Development] Updating x86 SIMD support in Qt

2022-01-19 Thread Kevin Kofler via Development
Thiago Macieira wrote:
> I understand. I have one of those in a cabinet, but it doesn't power on
> (the PSU is bust).

My notebook's power supply adapter went bust in 2019, but thankfully, the 
circuitry inside the notebook is fine, only the external adapter had broken 
down. So I replaced it with a cheap universal one, which fixed the problem.

> How much RAM do you have? How usable is a modern Linux desktop on it?

The notebook has 4 GiB RAM. (That was the maximum available. I picked it 
because I wanted the notebook to last.) Fedora 35 with KDE Plasma runs fine 
on it. Not fast, but usable. But it is not my primary computer, my primary 
computer is the desktop on which I am typing this: Sandy Bridge Core 
i7-2600K (released 2011, supports up to AVX(1)), 16 GiB RAM. That said, the 
2008 Core 2 Duo notebook goes with me when I travel and/or give 
presentations, and still serves me well for those purposes. And it has more 
RAM than my third GNU/Linux machine, a PinePhone with 3 GiB RAM (edition 
"with convergence package") – but that one is of course ARM aarch64, not 
x86, so it is not affected by this thread at all.

Kevin Kofler



Re: [Development] Updating x86 SIMD support in Qt

2022-01-19 Thread Thiago Macieira
On Wednesday, 19 January 2022 08:55:46 PST Thiago Macieira wrote:
> I'm not proposing we change Xcode project file generation. Only Qt's own
> CMake-based build.

I've just tried an x86_64h;arm64 universal build and it failed because of the 
ARM build. There's something wrong with my SDK or command-line tools.

So I tried an x86_64h non-universal build and it worked, so long as I disable 
either PCH or ccache (the two together interfere with each other).

-- 
Thiago Macieira - thiago.macieira (AT) intel.com
  Software Architect - Intel DPG Cloud Engineering





Re: [Development] Updating x86 SIMD support in Qt

2022-01-19 Thread Thiago Macieira
On Wednesday, 19 January 2022 09:51:38 PST Alexandru Croitor wrote:
> > On 19. Jan 2022, at 18:34, Thiago Macieira 
> > wrote:
> > 
> > Indeed. I'm hoping it's a matter of making qt_internal_add_module()
> > create two CMake targets instead of one, and modifying the C and C++
> > compiler flags as well as output dir for one of them.
> 
> Which library names appear in the dynamic section of v3/v4 libraries, as
> reported by readelf -d? Is it the original non-suffixed v2 library? Put
> differently, when building the v4 version of QtGui, does it need to link to
> QtCore-v4 or QtCore? 

They should link to the baseline, whichever baseline the user chose. 
In fact, the "libQt6Xxxx.so" symlink only needs to exist at the top-dir 
because the regular linker doesn't search the subdirs. So it's actually 
important that both binaries have the same ABI and export the same functions, 
even if they wouldn't be used by a matching build. I made this mistake in 
Clear Linux by suppressing the qfloat16 tables in the AVX2 build.

If you install my libqt5-qtbase RPMs for OpenSUSE, you get:

/usr/lib64/haswell/libQt5Core.so.5@
/usr/lib64/haswell/libQt5Core.so.5.15@
/usr/lib64/haswell/libQt5Core.so.5.15.2
/usr/lib64/haswell/libQt5Gui.so.5@
/usr/lib64/haswell/libQt5Gui.so.5.15@
/usr/lib64/haswell/libQt5Gui.so.5.15.2
/usr/lib64/libQt5Core.prl
/usr/lib64/libQt5Core.so@
/usr/lib64/libQt5Core.so.5@
/usr/lib64/libQt5Core.so.5.15@
/usr/lib64/libQt5Core.so.5.15.2
/usr/lib64/libQt5Gui.prl
/usr/lib64/libQt5Gui.so@
/usr/lib64/libQt5Gui.so.5@
/usr/lib64/libQt5Gui.so.5.15@
/usr/lib64/libQt5Gui.so.5.15.2
/usr/lib64/pkgconfig/Qt5Core.pc
/usr/lib64/pkgconfig/Qt5Gui.pc

@ are symlinks. I didn't include the CMake files in the paste, but they are 
like prl and pc files.

-- 
Thiago Macieira - thiago.macieira (AT) intel.com
  Software Architect - Intel DPG Cloud Engineering





Re: [Development] Updating x86 SIMD support in Qt

2022-01-19 Thread Thiago Macieira
On Wednesday, 19 January 2022 09:28:40 PST Edward Welbourne wrote:
> Thiago Macieira (19 January 2022 17:48) replied:
> > That's a misconception. AVX and especially AVX2 introduce a lot of
> > codegen opportunities for the compilers, which they've been able to
> > use for years.
> 
> Is the difference here:
> * We have code that overtly conditions on the availability of CPU
>   features (for example in the places Lars mentioned) vs
> * The compiler can achieve some optimizations, on which we currently
>   miss out, if we pass a relevant command-line option telling it to do
>   so (or omit one telling it not to) ?
> 
> (Hoping you'll educate me if I'm being dense.)

Hello Eddy

It's both.

The compilers can generate better code if the relevant flags are passed on the 
command-line or are built-in. We build QtCore and QtGui (or is it maybe all 
libraries now in Qt 6?) with -O3, which enables the auto-vectoriser in GCC and 
Clang. Raising the minimum targeted architecture allows them more options to 
generate faster code.

But it's not just vector code. Compare
 https://gcc.godbolt.org/z/h3WfWsGEz (x86-64 baseline)
with
 https://gcc.godbolt.org/z/vPcKqbT7P (x86-64-v2, except for MSVC)

The floor() function in any good libc/libm will have the runtime detection and 
use the instruction to implement the functionality. glibc does:
https://code.woboq.org/userspace/glibc/sysdeps/x86_64/fpu/multiarch/s_floor-sse4_1.S.html

The problems are:
* not all libc/libm are good. In particular, MinGW's support lacks those 
  optimisations (I've just checked). I haven't disassembled MSVC's Runtime to 
  find out what it does.

* some compilers may opt to not make a function call, like GCC did in the 
  example above. When running on a pre-SSE4 CPU, the GCC generated code is 
  actually better. However, since the overwhelming majority of CPUs this code 
  will run on do have SSE4, GCC's codegen is actually a pessimisation.

* even in the best case scenario (the other compilers), you still have a 
  function call, which on ELF platforms means going through the PLT. So 
  instead of a single instruction, we have a CALL, then an indirect JMP, then 
  that instruction. That's unnecessary overhead for 99.9% of all users.
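A minimal sketch of the compile-time alternative to that function call, assuming GCC or Clang (which define __SSE4_1__ when SSE4.1 is part of the target) and a hypothetical helper name, fast_floor; it is not Qt code. When the baseline includes SSE4.1, the body reduces to a single ROUNDSD; otherwise it falls back to the libm call:

```cpp
#include <cmath>
#if defined(__SSE4_1__)
#  include <smmintrin.h>   // SSE4.1 intrinsics
#endif

// Hypothetical helper (illustration only): floor of a double, with the
// implementation chosen at compile time from the target baseline.
inline double fast_floor(double x)
{
#if defined(__SSE4_1__)
    // Baseline has SSE4.1: round the low lane toward negative infinity.
    const __m128d v = _mm_set_sd(x);
    return _mm_cvtsd_f64(_mm_floor_sd(v, v));
#else
    // No SSE4.1 in the baseline: call into libm, which may or may not
    // dispatch at runtime.
    return std::floor(x);
#endif
}
```

In practice the compilers can usually do this on their own for std::floor() once the target allows it, so the helper only makes the mechanism visible.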

This is not an isolated example. If I disassemble my QtCore, I see a lot of 
other scalar instructions:
$ objdump -d libQt6Core.so | egrep -c '(movbe|sarx|shrx|shlx|[tl]zcnt|popcnt)'
638

Each of those saves a cycle here and there, so it's not worth making a runtime 
decision to use them. Instead, they must be used opportunistically. This would 
be especially beneficial for math-heavy libraries like Qt3D.
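As an illustration of "opportunistic" use, here is a small sketch assuming GCC or Clang and a hypothetical helper name, count_set_bits. The source is identical for every target; whether it becomes one POPCNT instruction or a longer generic sequence depends only on the flags the library is built with:

```cpp
#include <cstdint>

// Hypothetical helper, not a Qt API: counts the set bits in a 64-bit value.
inline int count_set_bits(std::uint64_t v)
{
#if defined(__GNUC__) || defined(__clang__)
    // Typically a single POPCNT when the target allows it (for example
    // -mpopcnt or -march=x86-64-v2); a generic sequence otherwise.
    return __builtin_popcountll(v);
#else
    // Portable fallback: clear the lowest set bit until none remain.
    int n = 0;
    while (v) {
        v &= v - 1;
        ++n;
    }
    return n;
#endif
}
```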


Then there's our optimised code. qstring.cpp has a lot of it and it's not 
selected at runtime, for the same reason: the overhead of selecting is higher 
than the benefit. Most of our strings are fairly small: a histogram of all 
calls to those functions from a Qt Creator start shows they peak around 5-10 
characters, then drop sharply with a long tail. This means those operations 
suffer greatly from overhead and what matters most is latency, not throughput. 
That's very different from image manipulation, in QtGui's drawhelpers: even a 
small 16x16 image is 1024 bytes. So any overhead in making a selection is 
quickly amortised there, but not so for strings.

Would it be worth it for some of those operations in qstring.cpp? Probably, 
particularly after my last round of optimisations. I especially think so if I 
could use GNU IFUNC support, which would mean all callers would jump directly 
into one of the optimised functions, instead of calling a function that then 
calls another. But optimising the entire library means we get more, at the 
cost of some extra time building and some more files in your system. It's also 
a generic solution, instead of targeting particular functions. So it should 
be a win-win: better performance at lower maintenance cost.
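To make the "function that then calls another" overhead concrete, here is a rough sketch of per-function runtime selection without IFUNC. All names are hypothetical and this is not how qstring.cpp is written; it assumes GCC or Clang on x86 for the CPU query and uses a stand-in instead of a real AVX2 routine:

```cpp
#include <cstddef>

// All names below are hypothetical, for illustration only.
using CountFn = std::size_t (*)(const char16_t *, std::size_t, char16_t);

// Baseline implementation, valid on every CPU.
static std::size_t count_baseline(const char16_t *s, std::size_t n, char16_t c)
{
    std::size_t hits = 0;
    for (std::size_t i = 0; i < n; ++i)
        hits += (s[i] == c);
    return hits;
}

// Stand-in for a hand-written AVX2 version; it just reuses the baseline loop.
static std::size_t count_avx2_standin(const char16_t *s, std::size_t n, char16_t c)
{
    return count_baseline(s, n, c);
}

// Asks the CPU whether AVX2 is available (GCC/Clang builtin, x86 only).
static bool cpu_has_avx2()
{
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
    __builtin_cpu_init();
    return __builtin_cpu_supports("avx2") != 0;
#else
    return false;
#endif
}

static CountFn select_count_impl()
{
    return cpu_has_avx2() ? count_avx2_standin : count_baseline;
}

// Public entry point: every call pays for the initialisation guard of the
// static plus one indirect call before any real work starts.
std::size_t count_char(const char16_t *s, std::size_t n, char16_t c)
{
    static const CountFn impl = select_count_impl();   // resolved on first use
    return impl(s, n, c);
}
```

For a 5-10 character string that fixed cost is a noticeable share of the total, which is the argument above; with IFUNC the selection would instead happen once, when the dynamic linker resolves the symbol.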

(*) we've long-since shortened it to 4 characters. And AVX512 has a trick that 
allows us to use it even down to a single byte (see my outgoing changes in 
Gerrit).

-- 
Thiago Macieira - thiago.macieira (AT) intel.com
  Software Architect - Intel DPG Cloud Engineering





Re: [Development] Updating x86 SIMD support in Qt

2022-01-19 Thread Alexandru Croitor



> On 19. Jan 2022, at 18:34, Thiago Macieira  wrote:
> 
> Indeed. I'm hoping it's a matter of making qt_internal_add_module() create 
> two CMake targets instead of one, and modifying the C and C++ compiler flags 
> as well as output dir for one of them. 

Which library names appear in the dynamic section of v3/v4 libraries, as 
reported by readelf -d? Is it the original non-suffixed v2 library?
Put differently, when building the v4 version of QtGui, does it need to link to 
QtCore-v4 or QtCore?


Re: [Development] Updating x86 SIMD support in Qt

2022-01-19 Thread Allan Sandfeld Jensen
On Mittwoch, 19. Januar 2022 16:41:11 CET Thiago Macieira wrote:
> On Tuesday, 18 January 2022 22:43:40 PST Kevin Kofler via Development wrote:
> > Thiago Macieira wrote:
> > > By default, I'd like us to produce x86-64 v2 code, which is SSE4.
> > 
> > But v1 will still be available for distribution packaging? As long as that
> > is the case, I do not see a major issue, it will just be one more caveat
> > for distribution packaging. (Distributions still supporting v1, which I
> > think is most of the distros these days, will have to enable it
> > explicitly, possibly along with newer vn (n>1) if optimized builds are
> > desired.) But dropping support for v1 entirely causes headaches for
> > distributions.
> Yes, that's the idea. I'm just looking at raising our defaults in this case,
> not stop the older solutions, mostly because the compatibility we're
> talking about is with very old machines: the Intel Core line got SSE4.2 in
> 2008 with Nehalem, AMD got it with Bulldozer in 2011 and the Atom line got
> it in 2013 with Silvermont. Meanwhile, foregoing the SSE4 optimisations
> afforded by v2 leaves some performance on the table. Yes, most
> distributions still target v1 today but that's mostly for inertia reasons.
> As I said, I understand Red Hat 9 is going to up the minimum to v2.
> 
> Additionally, I'd like to make it easy to have both v1+v2 or, better yet,
> v1+v2+v3, so you can have your cake and eat it too. Some libraries like
> QtCore, QtGui and the Qt 3D ones make extensive use of math and would
> benefit from the extra operations, especially those of AVX.
> 
> > There are still (end) users of old hardware. E.g., my notebook is a Core 2
> > Duo that supports up to SSSE3 (so v1 + SSE3 + SSSE3), but no SSE4. So it
> > unfortunately falls one generation short of v2. (My desktop supports v2,
> > but not v3, because it is missing at least AVX2.) But as long as the
> > distribution packages work on it, I do not really care about what vn or
> > SSEn the Qt upstream binaries require.
> 
> I understand. I have one of those in a cabinet, but it doesn't power on (the
> PSU is bust). How much RAM do you have? How usable is a modern Linux
> desktop on it?

I have a ~10 year old Phenom II that I use as a media server, it also lacks 
SSE4 (only having AMD's so-called SSE4a). With 3 cores and 4GB of memory it 
runs a modern Qt5 based Linux desktop just fine, even if I don't regularly use 
it as such. So while it is no great loss for me if I needed to replace it with 
a 200€ NUC, it is certainly plausible people have such working old machines. I 
think it is fine to let the default not work on such machines, and let the 
distros that want to support it use v1.

'Allan




Re: [Development] Updating x86 SIMD support in Qt

2022-01-19 Thread Thiago Macieira
On Wednesday, 19 January 2022 00:57:04 PST Alexandru Croitor wrote:
> I believe this will pessimize optimisations for certain platforms as pointed
> out in the linked change.
> 
> Specifically QNX and iOS simulator builds.
> 
> https://codereview.qt-project.org/c/qt/qtbase/+/386738/11#message-5015480c07228dd2088d6b1aba137927725a06b2

iOS simulator builds should benefit from enabling extra target support, 
because they're *simulator* builds, not *emulator*. They are running native 
code, after all.

QNX failed to detect support for AVX512 because we have to declare explicitly 
in cmake/QtCompilerOptimization.cmake that the compiler supports it. 
Since I don't have access to the toolchain, I'd never noticed this failing. 
Fixed by https://codereview.qt-project.org/c/qt/qtbase/+/386954

It also failed to detect RDSEED because of a compiler bug. That can be worked 
around easily because it's a single instruction. Or we can disable FIPS-
compliant HW-based random number generation on QNX until Blackberry fixes 
their compiler, which is also fine with me.

As a side note, companies that keep their toolchains behind paywalls are 
basically telling developers "we're fine with our platforms getting little 
support". I'm looking at you too, Green Hills.

> Perhaps WASM and Integrity as well.

WASM needs looking into. See discussion with Lorn.

INTEGRITY wouldn't be affected. Its only build in the CI is for ARM. If it 
supports x86, then they would benefit from having better codegen and 
optimisations like everyone. Since INTEGRITY is almost always device-specific, 
it shouldn't enable runtime detection in the first place.

> > 3) add a way to have multi-arch glibc-based Linux builds
> 
> Who is going to implement this and how?
> 
> CMake basically has no support for proper multi-arch builds.
> 
> For macOS / iOS we rely on Apple clang supporting it. That's not available
> for Linux.

Indeed. I'm hoping it's a matter of making qt_internal_add_module() create 
two CMake targets instead of one, and modifying the C and C++ compiler flags 
as well as output dir for one of them. 

I have yet to prototype this, though. I have no idea how difficult it would 
be.

-- 
Thiago Macieira - thiago.macieira (AT) intel.com
  Software Architect - Intel DPG Cloud Engineering





Re: [Development] Updating x86 SIMD support in Qt

2022-01-19 Thread Edward Welbourne
On Wednesday, 19 January 2022 00:13:32 PST Lars Knoll wrote:
>> AVX is only used by a couple of classes in Qt Core and the drawhelper
>> in Qt Gui. Qt Gui already does runtime detection, so it would be only
>> about adding that to the methods in Qt Core.

Thiago Macieira (19 January 2022 17:48) replied:
> That's a misconception. AVX and especially AVX2 introduce a lot of
> codegen opportunities for the compilers, which they've been able to
> use for years.

Is the difference here:
* We have code that overtly conditions on the availability of CPU
  features (for example in the places Lars mentioned) vs
* The compiler can achieve some optimizations, on which we currently
  miss out, if we pass a relevant command-line option telling it to do
  so (or omit one telling it not to) ?

(Hoping you'll educate me if I'm being dense.)

Eddy.


Re: [Development] Updating x86 SIMD support in Qt

2022-01-19 Thread Thiago Macieira
On Wednesday, 19 January 2022 04:23:22 PST Allan Sandfeld Jensen wrote:
> On Mittwoch, 19. Januar 2022 04:01:06 CET Thiago Macieira wrote:
> > 5) for glibc-based Linux, add v3 sub-arch by default
> > 
> > I'd like to raise the default on Linux from baseline to v2 *and* add a v3
> > sub- arch build, as described by point #3 above.
> > 
> > Device-specific Qt builds (Yocto Project, Boot2Qt) would need to turn this
> > off and select a single architecture, if they don't want the extra files.
> 
> I am also sceptical what we would gain from that. It seems mostly like
> something that could benefit QtCore, so perhaps only do v3 sub-arch  there?

That is what I'm proposing: do the extra, v3 sub-arch for only a few select 
libraries. Off the top of my head, that's QtCore, QtGui and some Qt 3D ones. 
Anything that is math-heavy would benefit.

> Do Clear Linux have numbers for what they have gained by using a v3 like
> default?

v3 is not the default, v2 is. But we do build qtbase and qt3d as v2+v3, plus 
quite a few more packages (and a handful are v2+v3+v4). As I mentioned in the 
reply to Lars, just search for benchmarks on phoronix.com.

The benefit isn't a lot in the general case. It's a few percent here and 
there, but it's consistent.

You get far more with dedicated algorithms, which is why I am optimising 
qstring.cpp for AVX512VL. Yesterday, while debugging QTBUG-91739, I got the 
opportunity to benchmark ucstrncmp (QtPrivate::compareStrings). See [1] for 
details, but it showed a 20% improvement on v3 over v2 and an additional 10.7% 
on v4 over v3 (for 28.8% gain overall). Please note that the v2 and v3 code 
are already using the new epilogue code I added in [2] plus all the cross-
platform optimisations listed in that JIRA comment, so this is all already 
better than what you have.

However, note that all strings had the same 90-byte (45-character) length, so 
this is not very representative of the real world. In particular, the AVX512VL 
code should show an extra improvement for strings under 16 characters and this 
wasn't exercised.

[1] https://bugreports.qt.io/browse/QTBUG-91739?focusedCommentId=641671#comment-641671
[2] https://codereview.qt-project.org/c/qt/qtbase/+/390698
-- 
Thiago Macieira - thiago.macieira (AT) intel.com
  Software Architect - Intel DPG Cloud Engineering





Re: [Development] Updating x86 SIMD support in Qt

2022-01-19 Thread Thiago Macieira
On Wednesday, 19 January 2022 05:04:07 PST Tor Arne Vestbø wrote:
> Hey hey,
> 
> On 19 Jan 2022, at 04:01, Thiago Macieira <thiago.macie...@intel.com> wrote:
>
> 3) add a way to have multi-arch glibc-based Linux builds
> 
> If we go down this road I ask that we align both the porcelain and plumbing
> (configure, build system, C++ APIs, etc) . We already have universal builds
> on macOS and iOS done one way, and multi-arch/abi on Android another way.
> Let’s not add a third slightly different way, but instead use the
> opportunity to pay off some of the technical debts in this area.

I don't know how Android does this. I'll take some time to study and get back 
to you.

macOS is different because it's done by the compiler: you ask for a universal 
build and the compiler does it for you. We have to be careful of the extra 
files that are arch-specific, but that can be solved at the source level.

I'm not sure how much can be shared because they're trying to solve different 
problems.
 
> One data point here, that I don’t know is worth anything, is that on macOS
> 12 at least, none of the system binaries in /bin or /sbin are x86_64h, they
> are all x86_64+arm64e (arm64e reserved for Apple for now).

I don't think they ever were x86_64h. The trick is in the libraries in 
/usr/lib and in /System/Library/Frameworks. For example:

$ (cd /System/Library/Frameworks/CoreFoundation.framework/; file -L CoreFoundation)
CoreFoundation: Mach-O universal binary with 2 architectures: [x86_64:Mach-O 64-bit dynamically linked shared library x86_64] [x86_64h]
CoreFoundation (for architecture x86_64):   Mach-O 64-bit dynamically linked shared library x86_64
CoreFoundation (for architecture x86_64h):  Mach-O 64-bit dynamically linked shared library x86_64h

This is what I'm proposing: that we update *libraries* not applications.
 
> It might also need some more work than just changing our default. E.g,
> changing the arch in a simple Xcode project gives:
>
> The run destination My Mac is not valid for Running the scheme
> 'rasterwindow'. My Mac doesn’t support any of rasterwindow.app’s
> architectures. You can set rasterwindow.app’s Architectures build setting
> to Standard Architectures to support My Mac.
>
> This is on a 2019 MBP

That means the Xcode solution is lacking, not the OS.

$ cat hello.c 
#include <stdio.h>
int main() { puts("Hello, World!"); }
$ /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/clang \
    -isysroot /Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX.sdk \
    -arch x86_64h -arch arm64e hello.c
$ file a.out 
a.out: Mach-O universal binary with 2 architectures: [x86_64h:Mach-O 64-bit executable x86_64h] [arm64e:Mach-O 64-bit executable arm64e]
a.out (for architecture x86_64h):   Mach-O 64-bit executable x86_64h
a.out (for architecture arm64e):    Mach-O 64-bit executable arm64e
$ ./a.out 
Hello, World!

I'm not proposing we change Xcode project file generation. Only Qt's own 
CMake-based build.

> I believe we detect that situation at runtime and explicitly turn off AVX
> support, so we wouldn’t be hitting any of those AVX code paths, if I
> understand things correctly?

No, that's not what I meant. You're referring to the current solution, which 
is that macOS binaries are x86-64-v2 and runtime select some extra features in 
QtGui. When run inside Rosetta2, the CPUID will tell the binaries that AVX is 
unavailable and they wouldn't be enabled.

What I'm proposing is that we skip the CPUID detection and enable AVX2 
everywhere, by using x86_64h for all of Qt's content. This would apply to 
QtCore and qt3d, which do have AVX2 content that isn't getting enabled and 
thus is leaving performance on the table. But if run inside Rosetta2, it will 
either crash or fail to load in the first place due to the missing x86_64 
architecture inside the fat binary.
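For illustration only, a hedged sketch of the kind of startup guard an application could add when it ships a build whose baseline includes AVX2. The function names are hypothetical, the CPU query assumes GCC or Clang on x86, and this is not Qt's actual implementation. It is also only best-effort: the guard itself is compiled for the raised baseline, and it cannot help if the loader refuses the binary before any code runs.

```cpp
#include <cstdio>

// Hypothetical check: does the running CPU provide what this build assumes?
bool cpu_meets_build_baseline()
{
#if defined(__AVX2__) && defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
    // Built with AVX2 in the baseline: confirm the CPU reports it.
    __builtin_cpu_init();
    return __builtin_cpu_supports("avx2") != 0;
#else
    // Baseline does not assume AVX2 (or we cannot query): nothing to verify.
    return true;
#endif
}

// Hypothetical wrapper: prints a diagnostic instead of letting the process
// die later on an illegal instruction. Returns false if the CPU is too old.
bool check_cpu_or_report()
{
    if (cpu_meets_build_baseline())
        return true;
    std::fputs("This build requires a CPU with AVX2 support.\n", stderr);
    return false;
}
```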

Why am I even asking this if we have ARM builds? I was thinking of 
applications that are forced to remain on x86-64 because of some other content 
(theirs or, usually, a proprietary third-party) that does not offer an ARM 
version. This was the reason that Apple kept the 32-bit x86 for a long time 
after they stopped supporting that processor with updates and so did we. Those 
people would need to download Qt sources and rebuild using x86_64 as the 
target (they can probably disable the ARM builds too).

I'm thinking that those are the minority and exception to the rule. They're 
often commercial customers, though, but I guess Qt Company will happily take 
their money and make a special build for them. Meanwhile, everyone else is 
leaving performance on the table.

-- 
Thiago Macieira - thiago.macieira (AT) intel.com
  Software Architect - Intel DPG Cloud Engineering





Re: [Development] Updating x86 SIMD support in Qt

2022-01-19 Thread Jean-Michaël Celerier
Re. wasm, as far as I know it's 32-bit (not x86 at all) but with a
compatibility layer provided by emscripten
which allows sse & avx intrinsics to be translated to either equivalent
vector instructions in the wasm bytecode,
or shims which do it manually.

Cheers,

--
Jean-Michaël Celerier
*cto* ossia.io | *consulting inquiries* celtera.dev | *personal*
jcelerier.name
t: +336 81 31 53 08


On Wed, Jan 19, 2022 at 4:30 PM Thiago Macieira 
wrote:

> On Tuesday, 18 January 2022 20:56:10 PST Lorn Potter wrote:
> > wasm is a special case, as we turn it off by default, regardless of
> > detection. We cannot allow detection by default (specified by some
> > configure argument which is currently -sse2) because browsers do not
> > support it by default, and there is no way to just not use it once it is
> > compiled in.
>
> Hello Lorn
>
> Please explain. What architecture is WASM producing binaries for? Is it
> 32-bit
> i386? Or is it 64-bit x86-64? Because the latter requires SSE2 to do
> floating
> point.
>
> --
> Thiago Macieira - thiago.macieira (AT) intel.com
>   Software Architect - Intel DPG Cloud Engineering
>
>
>
>


Re: [Development] Updating x86 SIMD support in Qt

2022-01-19 Thread Thiago Macieira
On Wednesday, 19 January 2022 00:13:32 PST Lars Knoll wrote:
> The main thing I’m wondering about is how much performance we gain from a
> multi-arch build of Qt for different x86_64 architectures as opposed to building
> maybe for v2 and detecting/using AVX and AVX512 at runtime. I would assume
> it’s little we gain, as there are very few places where the compiler's
> auto-vectorizer will be able to emit AVX instructions, but I might be wrong
> here.
 
> AVX is only used by a couple of classes in Qt Core and the drawhelper in Qt
> Gui. Qt Gui already does runtime detection, so it would be only about
> adding that to the methods in Qt Core. 

Hello Lars

That's a misconception. AVX and especially AVX2 introduce a lot of codegen 
opportunities for the compilers, which they've been able to use for years. The 
v3 level also introduces some lesser-known features like MOVBE and BMI, which 
are new scalar instructions, not SIMD. But even v2 brings in interesting 
instructions that match practically 1:1 some C library functions, like  
ROUNDPS and ROUNDPD.
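A small sketch of how a translation unit can see which level it is being built for, assuming the GCC/Clang predefined macro names and my reading of the feature lists for the x86-64 levels (only a subset of each level is checked, so treat it as an approximation). The helper name is hypothetical. MSVC does not define most of these macros, so there it would report 1 on x86-64:

```cpp
// Hypothetical helper: the x86-64 level this translation unit targets,
// judged from predefined macros. 0 means "not an x86-64 build".
inline int compiled_x86_64_level()
{
#if !defined(__x86_64__) && !defined(_M_X64)
    return 0;
#else
    int level = 1;                       // baseline x86-64 (SSE2)
#  if defined(__SSE4_2__) && defined(__SSSE3__) && defined(__POPCNT__)
    level = 2;                           // v2: up to SSE4.2 plus POPCNT
#    if defined(__AVX2__) && defined(__BMI__) && defined(__BMI2__) \
        && defined(__FMA__) && defined(__F16C__) && defined(__LZCNT__) \
        && defined(__MOVBE__)
    level = 3;                           // v3: AVX2, BMI1/2, FMA, MOVBE, ...
#      if defined(__AVX512F__) && defined(__AVX512BW__) && defined(__AVX512CD__) \
          && defined(__AVX512DQ__) && defined(__AVX512VL__)
    level = 4;                           // v4: the common AVX512 subset
#      endif
#    endif
#  endif
    return level;
#endif
}
```

Building the same file with -march=x86-64-v2 and then -march=x86-64-v3 should change the reported value without any source change, which is the property a multi-arch library build relies on.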

QtGui draw helpers do runtime detection, but QtCore does not. And this is the 
issue: each of those individual optimisations is small enough that the 
overhead of selecting at runtime is higher than the benefit. We're talking 
about very hot functions such as QString::fromLatin1 taking an (extra) 
indirect function call for everything. But the aggregate of those 
optimisations is worth it, for negligible cost in producing them: build the 
entire library twice and let the dynamic linker choose.
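
The per-call cost of runtime selection can be sketched as follows. This is an illustration only, not how QtCore is written: the function names are hypothetical, and the "AVX2" variant simply forwards to the baseline one so that the sketch runs on any machine. __builtin_cpu_init() and __builtin_cpu_supports() are GCC/Clang built-ins for x86.

```cpp
#include <cstddef>

// Baseline implementation: counts bytes outside the ASCII range.
static std::size_t countNonAsciiBaseline(const unsigned char *s, std::size_t n)
{
    std::size_t count = 0;
    for (std::size_t i = 0; i < n; ++i)
        count += (s[i] >= 0x80);
    return count;
}

// Stand-in for a variant that would live in a file built with -mavx2.
// It forwards to the baseline code here so the sketch is runnable anywhere.
static std::size_t countNonAsciiAvx2(const unsigned char *s, std::size_t n)
{
    return countNonAsciiBaseline(s, n);
}

using CountFn = std::size_t (*)(const unsigned char *, std::size_t);

// Picks an implementation once, based on what the running CPU reports.
static CountFn resolveCountNonAscii()
{
#if (defined(__GNUC__) || defined(__clang__)) && (defined(__x86_64__) || defined(__i386__))
    __builtin_cpu_init();
    if (__builtin_cpu_supports("avx2"))
        return countNonAsciiAvx2;
#endif
    return countNonAsciiBaseline;
}

// Every call pays for the static-initialisation guard plus an indirect call
// through the function pointer, which is the overhead discussed above.
std::size_t countNonAscii(const unsigned char *s, std::size_t n)
{
    static const CountFn fn = resolveCountNonAscii();
    return fn(s, n);
}
```

For a function that processes a handful of bytes per call, that guard and indirect call can cost as much as the work itself, which is why a whole-library rebuild selected by the dynamic linker is attractive.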

I do plan on looking into runtime detection in QtCore for 6.5, but I'm not 
convinced I can make it work with sufficiently low overhead on all platforms. 
I know I can for Linux with IFUNC support, but doing so for Linux only is not 
worth our collective time (development, review, long-term maintenance).

For the QtGui runtime detection, we have a ticking time bomb. See below.
 
> > I propose we remove the tests for the intrinsics of each individual CPU 
> > feature. Instead, let's just assume they all have everything up to 2016.
> > This  will shorten cmake time a little and fix the macOS universal
> > builds. It'll also change how 32-bit non-SSE2 builds are selected (see
> > below).
> > 
> > The change https://codereview.qt-project.org/c/qt/qtbase/+/386738 is going
> > in  this direction but retains a test (all or nothing). I'm proposing
> > now we remove the test completely and just assume.
> 
> 
> I’m fine with that, I don’t think we need to support a compiler that doesn’t
> support those. 

As I said, for the record: all compilers we support do support all of them, 
based on the CI runs of those changes. There's only one issue and it's the QCC 
compiler lacking the RDSEED intrinsics, but that's easily worked around (it 
does support inline assembly and it might support the low-level 
__builtin_ia32_rdseed_si_step() intrinsic).
 
> Can we at the same time do the same thing for NEON btw. While there are some
> platforms that don’t support NEON, I believe all compilers do support
> them.

Neon is different.

First, it's never selected at runtime. The issue that affected the NEON code 
generation is actually present in x86 too, but we've been lucky to avoid it. 
This is the ticking time bomb I was talking about: whenever you compile C++ 
sources and the compiler decides not to inline an inlineable function, it will 
create a copy of it and call that. This may happen in multiple translation 
units, so the linker chooses any one of them to emit in the final binary 
(usually, it's the one from the first .o that offered it, but that's just 
implementation behaviour). 

So what happens if the copy came from the higher-target .o? Kaboom. This is 
what affected our Neon builds and this is why we don't detect it at runtime. 
I've spoken to GCC and GNU Binutils maintainers about this issue. There's 
little incentive for them to provide a workaround for this, so it won't 
happen. The only solution they offer is to have different libraries or 
plugins.
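
The hazard can be sketched in a single file. The file names in the comments and both helper names are hypothetical, and the internal-linkage variant at the end is an assumption of this sketch rather than something proposed in this thread.

```cpp
#include <cstdint>

// shared_helper.h (hypothetical): an ordinary inline function with external
// linkage. If the compiler declines to inline it, each translation unit that
// uses it emits its own out-of-line copy and the linker keeps only one.
inline std::uint32_t sumBytes(const std::uint8_t *p, std::uint32_t n)
{
    std::uint32_t sum = 0;
    for (std::uint32_t i = 0; i < n; ++i)
        sum += p[i];
    return sum;
}

// Imagine draw_baseline.cpp built with default flags and draw_avx2.cpp built
// with -mavx2, both calling sumBytes(). If the surviving copy is the one from
// draw_avx2.o, the baseline callers may run auto-vectorised AVX2 code on a
// CPU that does not have AVX2.

// Giving the helper internal linkage means each translation unit keeps a
// private copy compiled with its own flags, so no cross-flag merge can occur.
namespace {
inline std::uint32_t sumBytesLocal(const std::uint8_t *p, std::uint32_t n)
{
    std::uint32_t sum = 0;
    for (std::uint32_t i = 0; i < n; ++i)
        sum += p[i];
    return sum;
}
} // namespace
```

The internal-linkage approach trades code size for safety and only helps for helpers you control; it does nothing for inline functions pulled in from third-party or standard library headers.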

So, can we change how we detect Neon? Sure. We can simply assume it's there 
all the time, since it is there all the time anyway. I know the ARMv8 
architecture (read: 64-bit, AArch64) requires it, like 64-bit x86 requires 
SSE2, so that's inescapable. The question is therefore whether we want to 
always enable it on 32-bit ARMv7. I'd say it should get the same answer as 
32-bit i386: yes, enable it by default but allow disabling it.
 
> See my comment above. We also need to think about non Linux platforms.
> Multi-arch is difficult on Windows as far as I know, so a v2 baseline build
> and runtime detection might be preferable.

Indeed, this solution is specific to glibc-based Linux. It does not apply to 
other OSes because it's a solution predicated on the system's dynamic linker 
being able to select different files or different sections of a file based on 
CPU identification.

If Microsoft wants their OS to have better-performing 

Re: [Development] Updating x86 SIMD support in Qt

2022-01-19 Thread Thiago Macieira
On Tuesday, 18 January 2022 22:43:40 PST Kevin Kofler via Development wrote:
> Thiago Macieira wrote:
> > By default, I'd like us to produce x86-64 v2 code, which is SSE4.
> 
> But v1 will still be available for distribution packaging? As long as that
> is the case, I do not see a major issue, it will just be one more caveat for
> distribution packaging. (Distributions still supporting v1, which I think
> is most of the distros these days, will have to enable it explicitly,
> possibly along with newer vn (n>1) if optimized builds are desired.) But
> dropping support for v1 entirely causes headaches for distributions.

Yes, that's the idea. I'm just looking at raising our defaults in this case, 
not at dropping the older options, mostly because the compatibility we're 
talking about is with very old machines: the Intel Core line got SSE4.2 in 
2008 with Nehalem, AMD got it with Bulldozer in 2011 and the Atom line got it 
in 2013 with Silvermont. Meanwhile, foregoing the SSE4 optimisations afforded 
by v2 leaves some performance on the table. Yes, most distributions still 
target v1 today but that's mostly out of inertia. As I said, I understand Red 
Hat 9 is going to up the minimum to v2.

Additionally, I'd like to make it easy to have both v1+v2 or, better yet, 
v1+v2+v3, so you can have your cake and eat it too. Some libraries like 
QtCore, QtGui and the Qt 3D ones make extensive use of math and would benefit 
from the extra operations, especially those of AVX.

> There are still (end) users of old hardware. E.g., my notebook is a Core 2
> Duo that supports up to SSSE3 (so v1 + SSE3 + SSSE3), but no SSE4. So it
> unfortunately falls one generation short of v2. (My desktop supports v2, but
> not v3, because it is missing at least AVX2.) But as long as the
> distribution packages work on it, I do not really care about what vn or SSEn
> the Qt upstream binaries require.

I understand. I have one of those in a cabinet, but it doesn't power on (the 
PSU is bust). How much RAM do you have? How usable is a modern Linux desktop 
on it?

-- 
Thiago Macieira - thiago.macieira (AT) intel.com
  Software Architect - Intel DPG Cloud Engineering





Re: [Development] Updating x86 SIMD support in Qt

2022-01-19 Thread Thiago Macieira
On Tuesday, 18 January 2022 20:56:10 PST Lorn Potter wrote:
> wasm is a special case, as we turn it off by default, regardless of
> detection. We cannot allow detection by default (specified by some
> configure argument which is currently -sse2) because browsers do not
> support it by default, and there is no way to just not use it once it is
> compiled in.

Hello Lorn

Please explain. What architecture is WASM producing binaries for? Is it 32-bit 
i386? Or is it 64-bit x86-64? Because the latter requires SSE2 to do floating 
point.

-- 
Thiago Macieira - thiago.macieira (AT) intel.com
  Software Architect - Intel DPG Cloud Engineering





Re: [Development] Updating x86 SIMD support in Qt

2022-01-19 Thread Tor Arne Vestbø
Hey hey,

On 19 Jan 2022, at 04:01, Thiago Macieira wrote:

> 3) add a way to have multi-arch glibc-based Linux builds

If we go down this road I ask that we align both the porcelain and plumbing 
(configure, build system, C++ APIs, etc.). We already have universal builds on 
macOS and iOS done one way, and multi-arch/abi on Android another way. Let’s 
not add a third slightly different way, but instead use the opportunity to pay 
off some of the technical debts in this area.


> 4) up the defaults from where they are today
>
> Question:
> - iOS simulator builds are x86, but currently only SSE2. Does anyone know if
> raising to SSE4, which *ALL* 64-bit Mac machines support, would be a problem?

That should be fine. The only reason SSE2 was chosen at the time was to ensure 
we built the draw helpers for the simulator build.

> 6) for macOS, raise the minimum to v3 (x86_64h)
>
> macOS has supported an extra architecture called "x86_64h" for some time (the
> "h" stands for "haswell"). Apple ceased offering macOS updates to processors
> without AVX2 back with the Mojave release (10.14) in 2018. Since that's the
> minimum version we require for Qt, it means all Intel-based Macs Qt can run on
> also support this sub-arch.

One data point here, that I don’t know is worth anything, is that on macOS 12 
at least, none of the system binaries in /bin or /sbin are x86_64h, they are 
all x86_64+arm64e (arm64e reserved for Apple for now).

On the other hand, the dyld shared cache (of all the system frameworks) 
provides x86_64h:

❯ ls /System/Library/dyld/
aot_shared_cache.0            aot_shared_cache.1            aot_shared_cache.2
aot_shared_cache.3            aot_shared_cache.t8027.0      aot_shared_cache.t8027.1
aot_shared_cache.t8027.2      aot_shared_cache.t8027.3      dyld_shared_cache_arm64e
dyld_shared_cache_arm64e.1    dyld_shared_cache_arm64e.map  dyld_shared_cache_x86_64
dyld_shared_cache_x86_64.1    dyld_shared_cache_x86_64.2    dyld_shared_cache_x86_64.3
dyld_shared_cache_x86_64.map  dyld_shared_cache_x86_64h     dyld_shared_cache_x86_64h.1
dyld_shared_cache_x86_64h.2   dyld_shared_cache_x86_64h.3   dyld_shared_cache_x86_64h.map

It might also need more work than just changing our default. E.g., changing 
the arch in a simple Xcode project gives:

The run destination My Mac is not valid for Running the scheme 'rasterwindow'.
My Mac doesn’t support any of rasterwindow.app’s architectures. You can set 
rasterwindow.app’s Architectures build setting to Standard Architectures to 
support My Mac.

This is on a 2019 MBP

Xcode’s ARCH_STANDARD build variable is [x86_64, arm64].

The same problem persists after updating VALID_ARCHS from the default [arm64 
arm64e i386 x86_64] to [arm64 arm64e i386 x86_64 x86_64h].

> I'd like to do this for all libraries and by default on binaries from qt.io.
> However, I understand the ARM translation application cannot deal with the AVX
> instructions, so it would fail to run our default binaries for the
> applications that couldn't rebuild as ARM. Is it acceptable to require those
> application developers to rebuild Qt from source?

I believe we detect that situation at runtime and explicitly turn off AVX 
support, so we wouldn’t be hitting any of those AVX code paths, if I understand 
things correctly?

Cheers,
Tor Arne



Re: [Development] Updating x86 SIMD support in Qt

2022-01-19 Thread Allan Sandfeld Jensen
On Mittwoch, 19. Januar 2022 04:01:06 CET Thiago Macieira wrote:
> 5) for glibc-based Linux, add v3 sub-arch by default
> 
> I'd like to raise the default on Linux from baseline to v2 *and* add a v3
> sub- arch build, as described by point #3 above.
> 
> Device-specific Qt builds (Yocto Project, Boot2Qt) would need to turn this
> off and select a single architecture, if they don't want the extra files.
> 
I am also sceptical about what we would gain from that. It seems mostly like 
something that could benefit QtCore, so perhaps only do the v3 sub-arch there?

Does Clear Linux have numbers for what they have gained by using a v3-like 
default?

Regards
Allan




Re: [Development] Updating x86 SIMD support in Qt

2022-01-19 Thread Alexandru Croitor



> On 19. Jan 2022, at 04:01, Thiago Macieira wrote:
> 
> 1) assume all compilers support what we need
> 
> I propose we remove the tests for the intrinsics of each individual CPU 
> feature. Instead, let's just assume they all have everything up to 2016. 
> 
> The change https://codereview.qt-project.org/c/qt/qtbase/+/386738 is going in 
> this direction but retains a test (all or nothing). I'm proposing now we 
> remove the test completely and just assume.

I believe this will pessimize optimisations for certain platforms as pointed 
out in the linked change.

Specifically QNX and iOS simulator builds.

https://codereview.qt-project.org/c/qt/qtbase/+/386738/11#message-5015480c07228dd2088d6b1aba137927725a06b2

Perhaps WASM and Integrity as well.

This may or may not matter to those platform users.

> 3) add a way to have multi-arch glibc-based Linux builds

Who is going to implement this and how? 

CMake basically has no support for proper multi-arch builds.

For macOS / iOS we rely on Apple clang supporting it. That's not available for 
Linux.

One could try using ExternalProjects, but this very quickly gets hairy for 
multiple reasons, especially if it's not a complete Qt build (all libraries, 
and not just a selection of libraries).



Re: [Development] Updating x86 SIMD support in Qt

2022-01-19 Thread Lars Knoll
Hi Thiago,

I’m absolutely in favour of upping the SIMD support in Qt. Compilers support 
everything we need, and we should make better use of that.

The main thing I’m wondering about is how much performance we gain from a 
multi-arch build of Qt for different x86_64 architectures, as opposed to 
building maybe for v2 and detecting/using AVX and AVX512 at runtime. I would 
assume it’s little we gain, as there are very few places where the compilers’ 
auto-vectorizer will be able to emit AVX instructions, but I might be wrong 
here.

AVX is only used by a couple of classes in Qt Core and the drawhelper in Qt 
Gui. Qt Gui already does runtime detection, so it would be only about adding 
that to the methods in Qt Core. 

Couple more comments inline below.

> On 19 Jan 2022, at 04:01, Thiago Macieira  wrote:
> 
> For Qt 6.4, I'd like to propose we change the way we detect and enable SIMD 
> support. TL;DR:
> 
> * Assume all compilers support 5-year-old stuff
> * Up the minimum CPU for Linux, Windows and macOS/x86
> * Fix macOS Universal builds to use the minimum
> * Add an option to cmake to choose a minimum matching one of the Linux x86-64 
>   ABI revisions
>   * Make it easy to build QtCore, QtGui and Qt3D multi-arch on Linux
> 
> Long version:
> 
> 1) assume all compilers support what we need
> 
> Our current tests for compiler support go all the way back to SSE2, which is 
> mandatory on x86-64. While testing some changes, I've confirmed that all 
> compilers in the CI support x86 CPU features matching the Intel Cannon Lake 
> architecture, which is more than we need, except for the QCC compiler missing 
> one intrinsic that we can work around.
> 
> I've also found that macOS universal builds, WASM, Android and maybe some 
> more are improperly detecting support. Specifically for universal builds, 
> what we detect depends on the order in which you specify the architectures. 
> This is buggy at a minimum, surprising at best.
> 
> I propose we remove the tests for the intrinsics of each individual CPU 
> feature. Instead, let's just assume they all have everything up to 2016. This 
> will shorten cmake time a little and fix the macOS universal builds. It'll 
> also change how 32-bit non-SSE2 builds are selected (see below).
> 
> The change https://codereview.qt-project.org/c/qt/qtbase/+/386738 is going in 
> this direction but retains a test (all or nothing). I'm proposing now we 
> remove the test completely and just assume.

I’m fine with that, I don’t think we need to support a compiler that doesn’t 
support those. 

Can we at the same time do the same thing for NEON btw. While there are some 
platforms that don’t support NEON, I believe all compilers do support them.

> Question:
> - the QT_COMPILER_SUPPORTS_xxx macros are in qconfig.h (public config). Do we 
>  keep compatibility? We can easily just move them to qprocessordetection.

These are also to some extent used to differentiate between SSE and NEON. I 
think we can hardcode those in qprocessordetection for source compatibility.
> 
> 2) add options to select the target architecture revision
> 
> Linux established 3 new revisions of the architecture:
> * x86-64 v1 (baseline): SSE2 support
> * x86-64 v2: baseline + SSE3, SSSE3, SSE 4
> * x86-64 v3: v2 + AVX + AVX2 + FMA + BMI + F16C
> * x86-64 v4: v3 + AVX512F + BW + CD + DQ + VL
> 
> For i386, we can consider a "v0" of the non-SSE2 original baseline from the 
> 1980s.

Fine for me. I don’t really care that much about i386, as it’s quickly dying 
out and we’re not providing any binaries for it anymore.
> 
> I propose adding a CMake option to make it easy to opt in to one of those. 
> Yes, you can just set CMAKE_C(XX)FLAGS_{RELEASE,DEBUG,RELWITHDEBINFO}, so 
> this part would be a convenience.
> 
> For the default, see #4.
> 
> 3) add a way to have multi-arch glibc-based Linux builds
> 
> The revisions also match subdirectory searches by the Linux dynamic linker. 
> The subdirectories "x86-64-v2", "x86-64-v3" and "x86-64-v4" are new in glibc 
> 2.33, but glibc has supported "haswell" (for v3) and "avx512_1" (for v4) for 
> a number of years prior to that.
> 
> The proposal is to allow the user to specify more than one architecture in 
> the list above. We can query the dynamic linker to find out if it supports 
> the new names and, if not, use the old ones.
> 
> For example, if I specified QT_X86_SUBARCH="v2;v3;v4", it would compile 
> QtCore three times. The build products would be:
>  lib/libQt6Core.so.6.4.0
>  lib/haswell/libQt6Core.so.6.4.0  OR
>   lib/glibc-hwcaps/x86-64-v3/libQt6Core.so.6.4.0
>  lib/haswell/avx512_1/libQt6Core.so.6.4.0 OR
>   lib/glibc-hwcaps/x86-64-v4/libQt6Core.so.6.4.0
> with their matching symlinks.
> 
> This would apply to only a few select libraries. I'm thinking QtCore, QtGui, 
> QtQml and some of the Qt3D libraries.
> 
> I don't currently see a need to do this for any plugins and there is no 
> standardised way to name them anyway.
> 
> This 

Re: [Development] Updating x86 SIMD support in Qt

2022-01-18 Thread Kevin Kofler via Development
Thiago Macieira wrote:
> By default, I'd like us to produce x86-64 v2 code, which is SSE4.

But v1 will still be available for distribution packaging? As long as that 
is the case, I do not see a major issue, it will just be one more caveat for 
distribution packaging. (Distributions still supporting v1, which I think is 
most of the distros these days, will have to enable it explicitly, possibly 
along with newer vn (n>1) if optimized builds are desired.) But dropping 
support for v1 entirely causes headaches for distributions.

There are still (end) users of old hardware. E.g., my notebook is a Core 2 
Duo that supports up to SSSE3 (so v1 + SSE3 + SSSE3), but no SSE4. So it 
unfortunately falls one generation short of v2. (My desktop supports v2, but 
not v3, because it is missing at least AVX2.) But as long as the 
distribution packages work on it, I do not really care about what vn or SSEn 
the Qt upstream binaries require.

Kevin Kofler



Re: [Development] Updating x86 SIMD support in Qt

2022-01-18 Thread Lorn Potter




On 19/1/2022 1:01 PM, Thiago Macieira wrote:

> I've also found that macOS universal builds, WASM, Android and maybe some more
> are improperly detecting support. Specifically for universal builds, what we
> detect depends on the order in which you specify the architectures. This is
> buggy at a minimum, surprising at best.


wasm is a special case, as we turn it off by default, regardless of 
detection. We cannot allow detection by default (specified by some 
configure argument which is currently -sse2) because browsers do not 
support it by default, and there is no way to just not use it once it is 
compiled in.



--
Lorn Potter
Freelance Qt Developer. Platform Maintainer Qt WebAssembly, Maintainer 
QtSensors

Author, Hands-on Mobile and Embedded Development with Qt 5



[Development] Updating x86 SIMD support in Qt

2022-01-18 Thread Thiago Macieira
For Qt 6.4, I'd like to propose we change the way we detect and enable SIMD 
support. TL;DR:

* Assume all compilers support 5-year-old stuff
* Up the minimum CPU for Linux, Windows and macOS/x86
* Fix macOS Universal builds to use the minimum
* Add an option to cmake to choose a minimum matching one of the Linux x86-64 
   ABI revisions
   * Make it easy to build QtCore, QtGui and Qt3D multi-arch on Linux

Long version:

1) assume all compilers support what we need

Our current tests for compiler support go all the way back to SSE2, which is 
mandatory on x86-64. While testing some changes, I've confirmed that all 
compilers in the CI support x86 CPU features matching the Intel Cannon Lake 
architecture, which is more than we need, except for the QCC compiler missing 
one intrinsic that we can work around.

I've also found that macOS universal builds, WASM, Android and maybe some more 
are improperly detecting support. Specifically for universal builds, what we 
detect depends on the order in which you specify the architectures. This is 
buggy at a minimum, surprising at best.

I propose we remove the tests for the intrinsics of each individual CPU 
feature. Instead, let's just assume they all have everything up to 2016. This 
will shorten cmake time a little and fix the macOS universal builds. It'll 
also change how 32-bit non-SSE2 builds are selected (see below).

The change https://codereview.qt-project.org/c/qt/qtbase/+/386738 is going in 
this direction but retains a test (all or nothing). I'm proposing now we 
remove the test completely and just assume.

Question:
- the QT_COMPILER_SUPPORTS_xxx macros are in qconfig.h (public config). Do we 
  keep compatibility? We can easily just move them to qprocessordetection.

2) add options to select the target architecture revision

Linux established 3 new revisions of the architecture:
* x86-64 v1 (baseline): SSE2 support
* x86-64 v2: baseline + SSE3, SSSE3, SSE 4
* x86-64 v3: v2 + AVX + AVX2 + FMA + BMI + F16C
* x86-64 v4: v3 + AVX512F + BW + CD + DQ + VL

For i386, we can consider a "v0" of the non-SSE2 original baseline from the 
1980s.

I propose adding a CMake option to make it easy to opt in to one of those. 
Yes, you can just set CMAKE_C(XX)FLAGS_{RELEASE,DEBUG,RELWITHDEBINFO}, so this 
part would be a convenience.
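
As an aside, code can check which revision it was actually compiled for. The following is a minimal sketch, assuming GCC or Clang (which define these feature macros, and since GCC 11 / Clang 12 accept -march=x86-64-v2, -v3 and -v4 directly). The function name compiledX86Level is hypothetical and not a Qt API, and only a representative subset of each level's features is tested.

```cpp
// Hypothetical helper, not a Qt API: reports which x86-64 ABI revision this
// translation unit was compiled for, based on GCC/Clang predefined macros.
// Each branch checks only a subset of the features the level requires.
constexpr int compiledX86Level()
{
#if defined(__AVX512F__) && defined(__AVX512BW__) && defined(__AVX512DQ__) && defined(__AVX512VL__)
    return 4;
#elif defined(__AVX2__) && defined(__FMA__) && defined(__BMI2__) && defined(__F16C__)
    return 3;
#elif defined(__SSE4_2__) && defined(__SSSE3__) && defined(__POPCNT__)
    return 2;
#elif defined(__SSE2__)
    return 1;
#else
    return 0; // non-x86, or i386 without SSE2 (the "v0" case above)
#endif
}
```

Building the same file with -march=x86-64-v3 versus no -march flag should move the result from 1 to 3 on a 64-bit x86 toolchain.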

For the default, see #4.

3) add a way to have multi-arch glibc-based Linux builds

The revisions also match subdirectory searches by the Linux dynamic linker. 
The subdirectories "x86-64-v2", "x86-64-v3" and "x86-64-v4" are new in glibc 
2.33, but glibc has supported "haswell" (for v3) and "avx512_1" (for v4) for a 
number of years prior to that.

The proposal is to allow the user to specify more than one architecture in the 
list above. We can query the dynamic linker to find out if it supports the new 
names and, if not, use the old ones.

For example, if I specified QT_X86_SUBARCH="v2;v3;v4", it would compile QtCore 
three times. The build products would be:
  lib/libQt6Core.so.6.4.0
  lib/haswell/libQt6Core.so.6.4.0   OR
lib/glibc-hwcaps/x86-64-v3/libQt6Core.so.6.4.0
  lib/haswell/avx512_1/libQt6Core.so.6.4.0  OR
lib/glibc-hwcaps/x86-64-v4/libQt6Core.so.6.4.0
with their matching symlinks.
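
The directory choice above can be expressed as a small function. This is a sketch only: subArchLibDir is a hypothetical name and not part of Qt's build system, and it assumes the lowest requested level always stays in plain lib/, as in the example.

```cpp
#include <string>

// Hypothetical helper, not part of Qt's build system. Returns the directory
// (relative to the install prefix) for one sub-arch build.
//   level:     2, 3 or 4
//   isMinimum: true for the lowest level requested, which stays in lib/
//   newNames:  true if the dynamic linker understands glibc-hwcaps (glibc 2.33+)
std::string subArchLibDir(int level, bool isMinimum, bool newNames)
{
    if (isMinimum)
        return "lib";
    if (newNames)
        return "lib/glibc-hwcaps/x86-64-v" + std::to_string(level);
    switch (level) {
    case 3: return "lib/haswell";
    case 4: return "lib/haswell/avx512_1";
    default: return "lib"; // no legacy directory name is mentioned for other levels
    }
}
```

For QT_X86_SUBARCH="v2;v3;v4" on an older glibc this yields lib, lib/haswell and lib/haswell/avx512_1, matching the list above.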

This would apply to only a few select libraries. I'm thinking QtCore, QtGui, 
QtQml and some of the Qt3D libraries.

I don't currently see a need to do this for any plugins and there is no 
standardised way to name them anyway.

This would replace the current "-mno-sse2" option that is required to turn 
i386 32-bit builds from SSE2 support back to the original baseline. For a 
32-bit build, one would use QT_X86_SUBARCH="v0;v1" and get both baseline and 
the SSE2-optimised version.

4) up the defaults from where they are today

Today, your default Qt build will always target the x86-64 baseline[*], 
including for i386, even though, as I said, no CPU released in the last 9 
years has failed to meet the next level. I'd like to request we up that minimum.

By default, I'd like us to produce x86-64 v2 code, which is SSE4. There are a 
number of optimisations in QtCore and QtGui that get automatically enabled. In 
particular, qstring.cpp does not do runtime detection, so you've been leaving 
performance on the table on your computers, unless you build Qt from source 
yourself and set -march= to match your CPU.
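
Compile-time selection of this kind looks roughly like the sketch below. It is a hypothetical stand-in, not the code in qstring.cpp, and it keys on __SSE2__ (already guaranteed on x86-64) so that it runs everywhere; the same pattern keyed on __SSE4_1__ or __AVX2__ is what a higher default would switch on.

```cpp
#include <cstddef>
#if defined(__SSE2__)
#  include <emmintrin.h>
#endif

// Hypothetical stand-in for a hot conversion routine; not Qt's actual code.
// Widens Latin-1 bytes to UTF-16 code units.
void latin1ToUtf16(char16_t *dst, const unsigned char *src, std::size_t n)
{
    std::size_t i = 0;
#if defined(__SSE2__)
    // Path chosen by the compiler flags: no CPU check and no indirect call.
    const __m128i zero = _mm_setzero_si128();
    for (; i + 16 <= n; i += 16) {
        const __m128i chunk =
            _mm_loadu_si128(reinterpret_cast<const __m128i *>(src + i));
        // Interleave each byte with a zero byte to form 16-bit code units.
        _mm_storeu_si128(reinterpret_cast<__m128i *>(dst + i),
                         _mm_unpacklo_epi8(chunk, zero));
        _mm_storeu_si128(reinterpret_cast<__m128i *>(dst + i + 8),
                         _mm_unpackhi_epi8(chunk, zero));
    }
#endif
    // Scalar tail, and the only path when SSE2 is not enabled at build time.
    for (; i < n; ++i)
        dst[i] = src[i];
}
```

Because the choice is baked in at build time, a binary built for the baseline never takes the faster path, however capable the CPU it runs on.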

I'm told that Red Hat 9 will increase their minimum to v2, which is why the 
architecture selection features now exist.

This would apply to source and binary builds from qt.io. Android and macOS 
would be unaffected because they already default to this level.

Question:
- iOS simulator builds are x86, but currently only SSE2. Does anyone know if 
raising to SSE4, which *ALL*  64-bit Mac machines support, would be a problem?

5) for glibc-based Linux, add v3 sub-arch by default

I'd like to raise the default on Linux from baseline to v2 *and* add a v3 
sub-arch build, as described by point #3 above.

Device-specific Qt builds (Yocto Project, Boot2Qt)