Re: [Development] Sub-arch optimisations (was: How qAsConst and qExchange lead to qNN)

2023-07-24 Thread Thiago Macieira
On Sunday, 20 November 2022 18:38:08 PDT Thiago Macieira wrote:
> ("QString: replace #if with if constexpr...") and ending at
> https://codereview.qt-project.org/c/qt/qtbase/+/386952 ("
> QString::toLatin1: do the same as..."). The first six commits are merely
> clean-ups and reorganisation.
> 
> I'll defer the AVX2 and AVX512VL improvements for 6.6.

Well, it didn't happen because I haven't had the time to run the benchmarks 
and don't have a line of sight to when I'll have time for this. Therefore, the 
changes are still in limbo.

Meanwhile, Intel announced today (yesterday GMT) the AVX10[1] and APX[2] 
instruction set extensions. Our objective is to get that baseline established 
as yet another sub-arch and get Linux distributors not only to adopt it, but 
to rebuild a considerable chunk of their software using it.

[1] https://cdrdv2.intel.com/v1/dl/getContent/784267
[2] https://cdrdv2.intel.com/v1/dl/getContent/784266

AVX10 is basically all of AVX512 in 256-bit form. The similarity with the work
starting in https://codereview.qt-project.org/c/qt/qtbase/+/387415 (see also
https://lists.qt-project.org/pipermail/development/2022-January/042083.html
for the discussion in January 2022) is NOT a coincidence. I'd known this was
coming and was planning on simply updating the CPUID detection to enable it.
In any case, the work is done and I've been running my own code for 2 years
now, but it's still unmerged.

Therefore, help with benchmarking would be appreciated. Obviously, we won't be
able to benchmark the AVX10.2 implementations for Atoms and laptop CPUs until
those are on the market.

My recommendations for the rest of this discussion have also evolved since I
last posted:

> But the short story is:
> * On all x86-64 builds, the new default will be the v2 sub-architecture,
> which is this month 14 years old, and is the minimum on all x86-64 Android
> and Macs anyway, and is the new minimum on Red Hat 9. This can be
> overridden up or down by the user with the new QT_BUILD_SUBARCH variable.

I will make one last push to the WIP changes in gerrit, for the record, then 
abandon. This functionality needs to be in CMake itself, not inside the Qt 
CMake files. I plan on facilitating this discussion with their devs.

Should CMake not make such changes, Linux distributions will probably just 
build the software twice and install one on top of the other, like Clear Linux 
was doing 7 years ago already.

I recommend the Qt Company apply either technique on its own binary builds, 
but I won't make the actual changes. That's SEP.

> * On Macs, the new default will be the v3 sub-architecture (Apple calls it
> "x86-64h") and can similarly be overridden with either that variable or the
> CMAKE_OSX_ARCHITECTURES variable. It should be possible to extend my code to
> do both x86-64 and x86-64h multiarch on macOS, but I don't plan on spending
> time on this, because ALL currently supported Macs can run AVX2.

In line with the "SEP to make it happen" above for Linux and the reduced 
importance of x86 for the Mac world, I have already abandoned all the relevant 
changes.

Qt Company and Qt users can build either x86_64h or x86_64+x86_64h fat 
binaries for their software if they want. I recommend that and the 
recommendation hasn't changed. 

But I won't be implementing the changes like the patches-to-be-abandoned were 
doing -- just build twice and lipo everything together. Maybe CMake can be 
updated to do it for us, but driving that change is SEP.

Patches abandoned:
https://codereview.qt-project.org/c/qt/qtbase/+/444968
https://codereview.qt-project.org/c/qt/qtbase/+/444969
https://codereview.qt-project.org/c/qt/qtbase/+/444538
-- 
Thiago Macieira - thiago.macieira (AT) intel.com
  Cloud Software Architect - Intel DCAI Cloud Engineering


-- 
Development mailing list
Development@qt-project.org
https://lists.qt-project.org/listinfo/development


Re: [Development] Sub-arch optimisations (was: How qAsConst and qExchange lead to qNN)

2022-11-24 Thread Thiago Macieira
On Thursday, 24 November 2022 02:04:49 PST Edward Welbourne via Development 
wrote:
> Thiago Macieira (23 November 2022 22:11) wrote:
> >> I'll fix it.
> > 
> > That was easy. I just had to remove code to make it work.
> 
> Always a satisfying solution to a problem ;^>

Well, it turns out that I spoke slightly too soon. The x86 portion was easy, 
like I wrote... but it kept failing on universal macOS builds because the test 
for "is this x86" kept being true. I had to redo it in a way that the code in 
the library compiled even if not on x86-64 (it simply expands to empty now, 
like qstring.cpp), and made the unit test not be enabled in CMake at all if 
you're doing multi-arch macOS.
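
A minimal sketch of the "expands to empty" approach described above, not the code from the actual change. The helper names tryX86FastFill and fillBytes are hypothetical. The point it illustrates is that the architecture test is made by the preprocessor in each compiler invocation, so every slice of a universal build sees the right answer, whereas a single configure-time test cannot.

```cpp
#include <cstddef>

#if defined(__x86_64__) || defined(_M_X64)
#  include <emmintrin.h>   // SSE2 intrinsics, part of the x86-64 baseline
#endif

// Hypothetical helper: returns true if the x86-64 fast path handled the data.
// The check is evaluated per compiler invocation, i.e. per slice of a
// universal build, not once at configure time.
inline bool tryX86FastFill(unsigned char *dst, std::size_t len, unsigned char value)
{
#if defined(__x86_64__) || defined(_M_X64)
    const __m128i v = _mm_set1_epi8(static_cast<char>(value));
    std::size_t i = 0;
    for (; i + 16 <= len; i += 16)
        _mm_storeu_si128(reinterpret_cast<__m128i *>(dst + i), v);
    for (; i < len; ++i)        // tail shorter than one 16-byte vector
        dst[i] = value;
    return true;
#else
    // On any other architecture the x86 code is simply absent.
    (void)dst; (void)len; (void)value;
    return false;
#endif
}

// Portable entry point: always compiles, uses the fast path when it exists.
inline void fillBytes(unsigned char *dst, std::size_t len, unsigned char value)
{
    if (tryX86FastFill(dst, len, value))
        return;
    for (std::size_t i = 0; i < len; ++i)
        dst[i] = value;
}
```

Callers only ever use fillBytes(), so nothing outside this file needs to know which architecture is being compiled.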

-- 
Thiago Macieira - thiago.macieira (AT) intel.com
  Cloud Software Architect - Intel DCAI Cloud Engineering





Re: [Development] Sub-arch optimisations (was: How qAsConst and qExchange lead to qNN)

2022-11-24 Thread Edward Welbourne via Development
Thiago Macieira (23 November 2022 22:11) wrote:
>> I'll fix it.

> That was easy. I just had to remove code to make it work.

Always a satisfying solution to a problem ;^>

Eddy.


Re: [Development] Sub-arch optimisations (was: How qAsConst and qExchange lead to qNN)

2022-11-23 Thread Thiago Macieira
On Wednesday, 23 November 2022 11:55:01 PST Thiago Macieira wrote:
> abstractpickingjob.cpp:74:33: error: cannot convert ‘const Matrix4x4’ {aka
> ‘const Qt3DCore::Matrix4x4_SSE’} to ‘const Qt3DCore::Matrix4x4_AVX2&’
> 
> I'll fix it.

That was easy. I just had to remove code to make it work.
https://codereview.qt-project.org/c/qt/qt3d/+/444980

-- 
Thiago Macieira - thiago.macieira (AT) intel.com
  Cloud Software Architect - Intel DCAI Cloud Engineering





Re: [Development] Sub-arch optimisations (was: How qAsConst and qExchange lead to qNN)

2022-11-23 Thread Thiago Macieira
On Tuesday, 22 November 2022 07:10:28 PST Thiago Macieira wrote:
> > Are the changes that enable multi subarch builds already up?
> > I did not manage to find them.
> 
> Not yet, only the initial clean-ups. I want to get them working first. I
> fixed the qml_register_types issue last night, now I need to figure out why
> half of QtLocation is missing.

They're now up. The qtbase one ends at
https://codereview.qt-project.org/c/qt/qtbase/+/444969
With support at
https://codereview.qt-project.org/c/qt/qtdeclarative/+/444970
https://codereview.qt-project.org/c/qt/qtwayland/+/444971

All modules except for qt3d build and link now. qt3d failed because it has a 
static configuration on whether to use SSE or AVX and of course that won't work 
when we build with AVX.

abstractpickingjob.cpp:74:33: error: cannot convert ‘const Matrix4x4’ {aka 
‘const Qt3DCore::Matrix4x4_SSE’} to ‘const Qt3DCore::Matrix4x4_AVX2&’

I'll fix it.
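
A minimal sketch of how a mismatch like the one in the error message can arise. It is an assumption about the general shape of the problem, not a reproduction of the Qt 3D headers. Only the class names Matrix4x4_SSE and Matrix4x4_AVX2 come from the error above; the macro SKETCH_CONFIGURED_FOR_AVX2, the alias FlagSelectedMatrix and the function firstElement are hypothetical.

```cpp
#include <type_traits>

// Placeholders for the two classes named in the error message.
struct Matrix4x4_SSE  { float m[16]; };
struct Matrix4x4_AVX2 { float m[16]; };

// Hypothetical configure-time switch: decided once for the whole build and
// written into a generated header.
#ifndef SKETCH_CONFIGURED_FOR_AVX2
#  define SKETCH_CONFIGURED_FOR_AVX2 0
#endif

#if SKETCH_CONFIGURED_FOR_AVX2
using Matrix4x4 = Matrix4x4_AVX2;
#else
using Matrix4x4 = Matrix4x4_SSE;
#endif

// Hypothetical per-translation-unit choice: follows the compiler flags, so it
// changes when the same source is rebuilt for another sub-architecture.
#if defined(__AVX2__)
using FlagSelectedMatrix = Matrix4x4_AVX2;
#else
using FlagSelectedMatrix = Matrix4x4_SSE;
#endif

inline float firstElement(const FlagSelectedMatrix &m) { return m.m[0]; }

// True only while both choices agree. When it is false, a call such as
// firstElement(Matrix4x4{}) fails with a "cannot convert" error.
constexpr bool choicesAgree =
        std::is_same<Matrix4x4, FlagSelectedMatrix>::value;
```

In a build that compiles the same sources once per sub-architecture, the configure-time choice stays fixed while the flag-driven one flips, so the two can no longer be assumed to agree.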

-- 
Thiago Macieira - thiago.macieira (AT) intel.com
  Cloud Software Architect - Intel DCAI Cloud Engineering





Re: [Development] Sub-arch optimisations (was: How qAsConst and qExchange lead to qNN)

2022-11-22 Thread Thiago Macieira
On Tuesday, 22 November 2022 04:35:35 PST Joerg Bornemann via Development 
wrote:
> On 11/21/22 03:38, Thiago Macieira wrote:
> > I've just finished a qtbase build on Linux with two sub-architectures and
> > the symbol comparison of all the resulting libraries has shown zero
> > difference. Tomorrow I will test all other modules (except qtwebengine).
> > The code is ugly, so I'd appreciate guidance from the CMake experts. I've
> > already submitted a few preliminary clean-ups.
> 
> Are the changes that enable multi subarch builds already up?
> I did not manage to find them.

Not yet, only the initial clean-ups. I want to get them working first. I fixed 
the qml_register_types issue last night, now I need to figure out why half of 
QtLocation is missing.

-- 
Thiago Macieira - thiago.macieira (AT) intel.com
  Cloud Software Architect - Intel DCAI Cloud Engineering





Re: [Development] Sub-arch optimisations (was: How qAsConst and qExchange lead to qNN)

2022-11-22 Thread Joerg Bornemann via Development

On 11/21/22 03:38, Thiago Macieira wrote:
> I've just finished a qtbase build on Linux with two sub-architectures and the
> symbol comparison of all the resulting libraries has shown zero difference.
> Tomorrow I will test all other modules (except qtwebengine). The code is ugly,
> so I'd appreciate guidance from the CMake experts. I've already submitted a
> few preliminary clean-ups.

Are the changes that enable multi subarch builds already up?
I did not manage to find them.


Cheers,

Joerg


Re: [Development] Sub-arch optimisations (was: How qAsConst and qExchange lead to qNN)

2022-11-21 Thread Thiago Macieira
On Sunday, 20 November 2022 18:38:08 PST Thiago Macieira wrote:
> I've just finished a qtbase build on Linux with two sub-architectures and
> the symbol comparison of all the resulting libraries has shown zero
> difference..

Done. All modules now built in multi-subarch mode. I've submitted a few 
cleanup commits to the modules to fix issues that don't depend on the new 
functionality.

My script is showing symbol differences between the two builds. It looks like 
all the qml_register_types_* symbols are missing and half of QtLocation. I'll 
need to investigate tomorrow.

Meanwhile, I've also completed the first part of the macOS switch to v3:
https://codereview.qt-project.org/c/qt/qtbase/+/444538
Tested locally and the build is ok. I'll add the QT_BUILD_SUBARCH support to it
when I'm finished on Linux.

> [*] I had an idea an hour ago, thinking about the qxcb plugin and remembered
> the old KDE Brockenbores solution.

Didn't work so well. It's possible to link to an executable, but requires 
removing one bit from the dynamic section. I managed it, but it's probably not 
worth the hassle:

$ ls -l libexec/moc
-rwxr-xr-x 1 tjmaciei users 2592 Nov 21 18:48 libexec/moc
$ ldd libexec/moc | sed 's/(.*//'
 linux-vdso.so.1 
 moc.so => /home/tjmaciei/obj/qt/qt6/qtbase/libexec/binlib/haswell/moc.so 
 libpcre2-16.so.0 => /lib64/libpcre2-16.so.0 
 libstdc++.so.6 => /lib64/libstdc++.so.6 
 libm.so.6 => /lib64/libm.so.6 
 libgcc_s.so.1 => /lib64/libgcc_s.so.1 
 libc.so.6 => /lib64/libc.so.6 
 /lib64/ld-linux-x86-64.so.2 
$ libexec/moc --version
moc 6.5.0
$ libexec/binlib/haswell/moc.so --version
moc.so 6.5.0

-- 
Thiago Macieira - thiago.macieira (AT) intel.com
  Cloud Software Architect - Intel DCAI Cloud Engineering





[Development] Sub-arch optimisations (was: How qAsConst and qExchange lead to qNN)

2022-11-20 Thread Thiago Macieira
On Thursday, 17 November 2022 10:56:22 PST Thiago Macieira wrote:
> The algorithms available are:
> * baseline SSE2: no comparisons

I realised yesterday that, since there will be no benchmarking to prove that 
the new SSE2 code is better than the old one, it is by definition ready. So 
I've rebased, reordered the SSE2 portion only and pushed.

The changes start at https://codereview.qt-project.org/c/qt/qtbase/+/386952
("QString: replace #if with if constexpr...") and end at
https://codereview.qt-project.org/c/qt/qtbase/+/386952
("QString::toLatin1: do the same as..."). The first six commits are merely
clean-ups and reorganisation.
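
For readers unfamiliar with the pattern named in the first commit title, here is a minimal sketch of replacing #if with if constexpr. It is not the code from the change itself. The names HasAvx2 and describeLatin1Path are hypothetical and the returned strings only label which branch was selected.

```cpp
#include <string>

// Hypothetical feature flag: convert the compiler's predefined macro into a
// constant once, so the rest of the code can use ordinary C++ syntax.
#if defined(__AVX2__)
constexpr bool HasAvx2 = true;
#else
constexpr bool HasAvx2 = false;
#endif

// Both branches are parsed on every build, but only the selected one is
// instantiated and generates code. With #if, the disabled branch would not
// be seen by the compiler at all, so it can silently stop compiling.
template <bool UseAvx2 = HasAvx2>
std::string describeLatin1Path()
{
    if constexpr (UseAvx2)
        return "avx2";
    else
        return "sse2";
}
```

Calling describeLatin1Path() picks the branch matching the build flags, while describeLatin1Path<true>() and describeLatin1Path<false>() let a test exercise both branches from a single build.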

I'll defer the AVX2 and AVX512VL improvements for 6.6.

Meanwhile, I did make some progress on upping our default minimum sub-arch 
targets. For the long discussion, see the thread at
https://lists.qt-project.org/pipermail/development/2022-March/042320.html

But the short story is:
* On all x86-64 builds, the new default will be the v2 sub-architecture, which 
is this month 14 years old, and is the minimum on all x86-64 Android and Macs 
anyway, and is the new minimum on Red Hat 9. This can be overridden up or down 
by the user with the new QT_BUILD_SUBARCH variable.

* On Macs, the new default will be the v3 sub-architecture (Apple calls it 
"x86-64h") and can similarly be overridden with either that variable or the 
CMAKE_OSX_ARCHITECTURES variable. It should be possible to extend my code to 
do both x86-64 and x86-64h multiarch on macOS, but I don't plan on spending 
time on this, because ALL currently supported Macs can run AVX2.

* On Linux, we gain the ability to create multi-arch builds of modules when 
compiled to shared libraries. The default on x86-64 will be to build the v2 
and v3 sub-architectures. The CMake variable again allows you to add v1 and 
v4, though v1 + v2 only works with glibc 2.33 (Feb 2021) and up. All other
combinations work since 2.28 (Feb 2018).

* The option can be controlled per module, so Linux distributors could choose 
to do a dual-, triple-, or (in Debian's case) quadruple-arch build of qtbase, 
qtdeclarative and qt3d, but not the other modules.
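
To make the v1 to v4 levels in the list above concrete, here is a rough sketch that estimates which level the running CPU reaches and names the matching compiler flag. It is not how the Qt build or glibc selects a sub-architecture. It relies on the GCC/Clang builtin __builtin_cpu_supports and checks only a subset of each level's required features (for example, it does not check CMPXCHG16B for v2 or F16C, LZCNT and MOVBE for v3), so treat the result as an approximation. The function names are hypothetical.

```cpp
#include <string>

// Returns the highest x86-64 sub-architecture level (1 to 4) whose checked
// features the running CPU reports, or 0 when this is not an x86-64 build
// with GCC or Clang. Only a subset of each level's features is tested.
inline int detectedSubArchLevel()
{
#if defined(__x86_64__) && (defined(__GNUC__) || defined(__clang__))
    __builtin_cpu_init();
    const bool v2 = __builtin_cpu_supports("sse3")
            && __builtin_cpu_supports("ssse3")
            && __builtin_cpu_supports("sse4.1")
            && __builtin_cpu_supports("sse4.2")
            && __builtin_cpu_supports("popcnt");
    const bool v3 = v2
            && __builtin_cpu_supports("avx")
            && __builtin_cpu_supports("avx2")
            && __builtin_cpu_supports("bmi")
            && __builtin_cpu_supports("bmi2")
            && __builtin_cpu_supports("fma");
    const bool v4 = v3
            && __builtin_cpu_supports("avx512f")
            && __builtin_cpu_supports("avx512bw")
            && __builtin_cpu_supports("avx512cd")
            && __builtin_cpu_supports("avx512dq")
            && __builtin_cpu_supports("avx512vl");
    return v4 ? 4 : v3 ? 3 : v2 ? 2 : 1;
#else
    return 0;
#endif
}

// Maps a level to the -march value understood by GCC 11+ and Clang 12+.
// Returns an empty string for anything outside 1 to 4.
inline std::string marchFlagForLevel(int level)
{
    switch (level) {
    case 1: return "-march=x86-64";
    case 2: return "-march=x86-64-v2";
    case 3: return "-march=x86-64-v3";
    case 4: return "-march=x86-64-v4";
    default: return std::string();
    }
}
```

Calling marchFlagForLevel(detectedSubArchLevel()) on a build machine gives a quick idea of the highest level that machine could run, which is useful when deciding which variants can be tested locally.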

I've just finished a qtbase build on Linux with two sub-architectures and the 
symbol comparison of all the resulting libraries has shown zero difference. 
Tomorrow I will test all other modules (except qtwebengine). The code is ugly, 
so I'd appreciate guidance from the CMake experts. I've already submitted a 
few preliminary clean-ups.

I only implemented multi-arch for modules when compiled as shared libraries. 
There's currently no solution for multi-arch binaries on Linux[*], so there's 
no sense in making that solution work for modules as static libraries right 
now. I might revisit this for non-module static libraries. QPluginLoader can
load multi-arch plugins, but right now they're not worth it; plugins can do
what the qxcb plugin did and move their functionality into a library.

[*] I had an idea an hour ago, thinking about the qxcb plugin and remembered 
the old KDE Brockenbores solution.
-- 
Thiago Macieira - thiago.macieira (AT) intel.com
  Cloud Software Architect - Intel DCAI Cloud Engineering


