Re: [Qt5-feedback] Merging the vector/array headers in QtCore

lars.knoll Mon, 17 Oct 2011 01:25:14 -0700

Do we really believe we need all 32 bits of flags? We need capacity. sharable
is only used internally right now (in qiterator.h). Have we thought about
whether there are ways to avoid needing it?


I like the idea of getting the header into 16 bytes at least on 32bit
platforms. We don't have to be as strict on 64bit IMO, as these platforms
should usually have some more memory available anyway.

Cheers,
Lars

On 10/16/11 9:55 PM, "ext Thiago Macieira" <[email protected]> wrote:

>On Sunday, 16 October 2011 21:09:44 Thiago Macieira wrote:
>> Here's an idea:
>>         QAtomicInt ref;
>>         int alloc;
>>         union {
>>                 qptrdiff offset;
>>                 struct { int begin; int end; };
>>         };
>>         // size = 16 bytes
>
>And here are two possibilities admitting defeat and going over 16 bytes:
>
>Option 1:
>
>       QAtomicInt ref;
>       int alloc;
>       union {
>               qptrdiff begin;
>               qint64 dummy;
>       };
>       int end;
>       int flags;
>       // size = 24 bytes
>
>Advantages:
> * 32 bits of flags available, reserving room for future expansion
> * no fiddling with sign bits anywhere
>
>Disadvantages:
> * 32 bits wasted on 32-bit platforms, which will never be used
> * assuming an allocator aligning to 16 bytes, the start of the data will
>    always be 8 bytes off, incurring a performance penalty with SSE2
>    operations (> 99% of the cases)
> * QVectors of SSE types will have 8 bytes of padding
>
>Option 2:
>
>       QAtomicInt ref;
>       int flags;
>       union {
>               qptrdiff alloc;
>               qint64 dummy;
>       };
>       qptrdiff begin;
>       qptrdiff end;
>       // size = 24 (32) bytes
>
>Advantages:
> * 32 bits of flags available
> * size multiple of 16 on 64-bit platforms, for best SSE2 performance
> * full 64-bit sizes for 64-bit machines, allowing for allocation of more
>   than 2 GB of data. The same header could be used for a QHugeVector
>   class that operates on signed 64-bit sizes, allowing up to 8388608 TB
>   of data
> * No padding required for QVectors of SSE types
>
>Disadvantages:
> * 100% bigger than the original structure, 50% bigger than Option 1
> * 32 bits wasted on 32-bit platforms
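
The sizes quoted for the three layouts above can be checked at compile time. The following is only a sketch under assumptions: std::atomic<int>, std::ptrdiff_t and std::int64_t stand in for QAtomicInt, qptrdiff and qint64, the struct names are invented for illustration, and the anonymous struct of the 16-byte idea is given a name (Range) so that the code stays within standard C++. It also assumes a mainstream ABI with a 4-byte int.

```cpp
#include <atomic>
#include <cstddef>
#include <cstdint>

// Named replacement for the anonymous { begin, end } struct (assumption:
// anonymous structs are a compiler extension, so a named type is used).
struct Range { int begin; int end; };

// The 16-byte idea: offset and { begin, end } share storage.
struct CompactHeader {
    std::atomic<int> ref;
    int alloc;
    union {
        std::ptrdiff_t offset;
        Range range;
    };
};

// Option 1: the dummy member keeps the union 8 bytes wide on 32-bit.
struct Option1Header {
    std::atomic<int> ref;
    int alloc;
    union {
        std::ptrdiff_t begin;
        std::int64_t dummy;
    };
    int end;
    int flags;
};

// Option 2: pointer-sized begin and end.
struct Option2Header {
    std::atomic<int> ref;
    int flags;
    union {
        std::ptrdiff_t alloc;
        std::int64_t dummy;
    };
    std::ptrdiff_t begin;
    std::ptrdiff_t end;
};

static_assert(sizeof(CompactHeader) == 16, "16 bytes on 32- and 64-bit");
static_assert(sizeof(Option1Header) == 24, "24 bytes on 32- and 64-bit");
static_assert(sizeof(Option2Header) == 16 + 2 * sizeof(std::ptrdiff_t),
              "24 bytes on 32-bit, 32 bytes on 64-bit");
```

If one of the assertions fails on a given platform, that platform's padding or type sizes differ from what the figures in the message assume.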
>
>On 32-bit machines, if the allocator produces 16-byte-aligned memory
>regions, we'll be wrong in >95% of the cases, causing SSE2 performance
>penalties. However, if the allocator produces 8-byte-aligned memory
>regions, as malloc in glibc does, we'll be wrong in just over 50% of the
>cases whether the structure is 24 or 32 bytes long. So we gain nothing by
>making it 32 bytes long on 32-bit machines.
>
>The percentages of the use cases are based on my experience with
>attempting SIMD optimisations on QString. Over a large sample, I found
>that 95%-99% of the data comes from QString's own allocations and the
>rest (1-5%) comes from fromRawData. The strings in fromRawData are evenly
>distributed across all possible alignments, while the strings allocated
>by QString are evenly distributed across both possibilities on 32-bit
>machines.
>
>In other words, the histogram of QString data alignments on a 32-bit
>machine with an 8-byte-aligning allocator (like glibc's) should be
>roughly like the following, whether the header is 16, 24 or 32 bytes:
>
>        0      48.5%
>        2      0.5%
>        4      0.5%
>        6      0.5%
>        8      48.5%
>       10      0.5%
>       12      0.5%
>       14      0.5%
>
>With a 16- or 32-byte header and an allocator giving memory regions
>aligned to 16 bytes, we should see:
>
>        0      96.5%
>        2      0.5%
>        4      0.5%
>        6      0.5%
>        8      0.5%
>       10      0.5%
>       12      0.5%
>       14      0.5%
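
The two histograms above follow from simple arithmetic, which the sketch below reproduces. It is a hypothetical model, not measured data: alignmentHistogram and its parameters are invented names, 4% of strings are assumed to come from fromRawData (0.5% per bucket, matching the tables), and QString's own allocations are assumed to be spread evenly over the addresses the allocator can return.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Expected share of string data at each offset modulo 16 (buckets 0, 2,
// ..., 14). Assumes headerSize is even and allocAlign is a power of two
// that is at least 2.
std::array<double, 8> alignmentHistogram(std::size_t headerSize,
                                         std::size_t allocAlign,
                                         double rawFraction)
{
    assert(headerSize % 2 == 0 && allocAlign >= 2);

    std::array<double, 8> hist{};
    // fromRawData strings: evenly spread over all eight buckets.
    for (double &share : hist)
        share = rawFraction / 8.0;

    // Own allocations: the header can start at any multiple of the
    // allocator alignment modulo 16; the data follows the header.
    const std::size_t step = allocAlign >= 16 ? 16 : allocAlign;
    const double ownShare = (1.0 - rawFraction) / (16 / step);
    for (std::size_t d = 0; d < 16; d += step)
        hist[((d + headerSize) % 16) / 2] += ownShare;

    return hist;
}
```

With these assumptions, alignmentHistogram(24, 8, 0.04) gives 48.5% at offsets 0 and 8, and alignmentHistogram(16, 16, 0.04) gives 96.5% at offset 0, in line with the tables.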
>
>To make the 32-bit structure fit the latter profile above, we'd need to
>add another 8 bytes to the header (bringing the total wastage to 12
>bytes) and hope for an allocator that aligns to 16 bytes. Using
>posix_memalign or equivalent functions is likely to simply cause another
>8 bytes of overhead inside the allocators.
>
>An alternative, and IMHO better, approach would be to always allocate 8
>bytes more than strictly needed and force d->begin to the 16-byte
>boundary. That means that d->begin == 4 whenever d is misaligned. This
>approach would allow us to achieve the above profile even on systems
>with allocators giving 8-byte-aligned pointers, such as glibc 32-bit.
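
The over-allocation approach described above could look roughly like the sketch below. Everything in it is an assumption made for illustration: the Header layout is a simplified, non-atomic version of Option 1, the names allocateAligned, payload and release are invented, and begin is counted in bytes here (the message does not state the unit of d->begin). The sketch also relies on malloc returning memory aligned to at least 8 bytes.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <new>

// Simplified 24-byte header (assumption: plain int instead of QAtomicInt).
struct Header {
    int ref;
    int alloc;
    std::int64_t begin;  // byte offset from the end of the header to the data
    int end;
    int flags;
};
static_assert(sizeof(Header) % 8 == 0, "header must keep 8-byte alignment");

// Allocate 8 bytes more than strictly needed and choose begin so that the
// data starts on a 16-byte boundary.
Header *allocateAligned(std::size_t payloadBytes)
{
    const std::size_t slack = 8;
    void *raw = std::malloc(sizeof(Header) + payloadBytes + slack);
    if (!raw)
        return nullptr;

    const std::uintptr_t dataStart =
        reinterpret_cast<std::uintptr_t>(raw) + sizeof(Header);
    assert(dataStart % 8 == 0);  // relies on an 8-byte-aligning allocator

    Header *d = new (raw) Header{};
    d->ref = 1;
    d->alloc = static_cast<int>(payloadBytes);
    // dataStart % 16 is 0 or 8, so begin is 0 or 8.
    d->begin = static_cast<std::int64_t>((16 - dataStart % 16) % 16);
    return d;
}

// First byte of the 16-byte-aligned data area.
char *payload(Header *d)
{
    return reinterpret_cast<char *>(d + 1) + d->begin;
}

void release(Header *d)
{
    std::free(d);
}
```

Because begin is computed from the address actually returned, the same code yields aligned data whether the allocator aligns to 8 or to 16 bytes.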
>
>It would also allow us to adapt on the fly if the allocator is updated
>and starts to give us 16-byte-aligned pointers on 32-bit, which would
>otherwise be the worst-case scenario below:
>
>        0      0.5%
>        2      0.5%
>        4      0.5%
>        6      0.5%
>        8      96.5%
>       10      0.5%
>       12      0.5%
>       14      0.5%
>
>-- 
>Thiago Macieira - thiago (AT) macieira.info - thiago (AT) kde.org
>   Software Architect - Intel Open Source Technology Center
>      PGP/GPG: 0x6EF45358; fingerprint:
>      E067 918B B660 DBD1 105C  966C 33F5 F005 6EF4 5358
>_______________________________________________
>Qt5-feedback mailing list
>[email protected]
>http://lists.qt.nokia.com/mailman/listinfo/qt5-feedback

