On Sunday, 16 October 2011 21:09:44 Thiago Macieira wrote:
> Here's an idea:
>         QAtomicInt ref;
>         int alloc;
>         union {
>                 qptrdiff offset;
>                 struct { int begin; int end; };
>         };
>         // size = 16 bytes

And here are two possibilities that admit defeat and go over 16 bytes:

Option 1:

        QAtomicInt ref;
        int alloc;
        union {
                qptrdiff begin;
                qint64 dummy;
        };
        int end;
        int flags;
        // size = 24 bytes

Advantages:
 * 32 bits of flags available, reserving room for future expansion
 * no fiddling with sign bits anywhere

Disadvantages:
 * 32 bits wasted on 32-bit platforms, which will never be used
 * assuming an allocator aligning to 16 bytes, the start of the data will 
    always be 8 bytes off, incurring a performance penalty with SSE2 operations
    (> 99% of the cases)
 * QVectors of SSE types will have 8 bytes of padding
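
To make the 8-byte offset concrete, here is a minimal sketch of the Option 1
layout. It uses standard types as stand-ins (std::atomic<int> for QAtomicInt,
std::ptrdiff_t for qptrdiff, std::int64_t for qint64), and the names
HeaderOption1 and dataOffsetMod16 are hypothetical, not anything from Qt. It
assumes a mainstream ABI where int and std::atomic<int> are 4 bytes.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Stand-in for the Option 1 header (hypothetical name, standard types
// substituted for the Qt ones).
struct HeaderOption1 {
    std::atomic<int> ref;
    int alloc;
    union {
        std::ptrdiff_t begin;
        std::int64_t dummy;   // keeps this slot 8 bytes wide on 32-bit too
    };
    int end;
    int flags;
};

static_assert(sizeof(HeaderOption1) == 24, "Option 1 header should be 24 bytes");

// Remainder mod 16 of the first data byte, given the remainder mod 16 of
// the header's own address and the header size in bytes.
constexpr std::size_t dataOffsetMod16(std::size_t headerAddrMod16,
                                      std::size_t headerSize)
{
    return (headerAddrMod16 + headerSize) % 16;
}
```

With a 16-byte-aligned header, dataOffsetMod16(0, sizeof(HeaderOption1)) gives
8, which is the 8 bytes of padding a QVector of SSE types would need.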

Option 2:

        QAtomicInt ref;
        int flags;
        union {
                qptrdiff alloc;
                qint64 dummy;
        };
        qptrdiff begin;
        qptrdiff end;
        // size = 24 (32) bytes

Advantages:
 * 32 bits of flags available
 * size multiple of 16 on 64-bit platforms, for best SSE2 performance
 * full 64-bit sizes for 64-bit machines, allowing for allocation of more than 
   2 GB of data. The same header could be used for a QHugeVector class that
   operates on signed 64-bit sizes, allowing up to 8388608 TB of data
 * No padding required for QVectors of SSE types

Disadvantages:
 * on 64-bit platforms, 100% bigger than the original structure and 33% bigger
   than Option 1 (32 bytes vs. 16 and 24)
 * 32 bits wasted on 32-bit platforms
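
The sizes and the 8388608 TB figure for Option 2 can be checked with a small
sketch along the same lines. Again the Qt types are replaced by standard
stand-ins, and HeaderOption2, expectedSize and maxSignedSizeInTiB are
hypothetical names. The static_asserts assume a mainstream ABI, and "TB" is
read here as 2^40 bytes.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Stand-in for the Option 2 header (hypothetical name, standard types
// substituted for the Qt ones).
struct HeaderOption2 {
    std::atomic<int> ref;
    int flags;
    union {
        std::ptrdiff_t alloc;
        std::int64_t dummy;   // keeps this slot 8 bytes wide on 32-bit too
    };
    std::ptrdiff_t begin;
    std::ptrdiff_t end;
};

// 24 bytes with 4-byte pointers, 32 bytes with 8-byte pointers.
constexpr std::size_t expectedSize = sizeof(void *) == 8 ? 32 : 24;
static_assert(sizeof(HeaderOption2) == expectedSize, "unexpected header size");

// A signed 64-bit size tops out just under 2^63 bytes; in units of
// 2^40 bytes that is 2^23.
constexpr std::uint64_t maxSignedSizeInTiB = (std::uint64_t{1} << 63) >> 40;
static_assert(maxSignedSizeInTiB == 8388608, "2^63 bytes is 8388608 * 2^40");
```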

On 32-bit machines, if the allocator produces 16-byte-aligned memory regions, 
we'll be wrong in >95% of the cases with a 24-byte header, causing SSE2 
performance penalties. However, if the allocator produces 8-byte-aligned 
memory regions, as malloc in glibc does, we'll be wrong in just over 50% of 
the cases whether the structure is 24 or 32 bytes long. So we gain nothing by 
making it 32 bytes long on 32-bit machines.
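
The reasoning above can be sketched by enumerating where the data can land
for a given allocator alignment and header size. This only models QString's
own allocations (no fromRawData), and possibleDataAlignments is a
hypothetical helper, not an existing function.

```cpp
#include <cassert>
#include <cstddef>
#include <set>

// Every possible value of (data address mod 16), for an allocator returning
// addresses aligned to allocAlign and a header of headerSize bytes placed
// directly before the data.
std::set<std::size_t> possibleDataAlignments(std::size_t allocAlign,
                                             std::size_t headerSize)
{
    assert(allocAlign > 0 && 16 % allocAlign == 0);   // 1, 2, 4, 8 or 16
    std::set<std::size_t> result;
    for (std::size_t base = 0; base < 16; base += allocAlign)
        result.insert((base + headerSize) % 16);
    return result;
}
```

For an 8-byte-aligning allocator this yields {0, 8} with both a 24-byte and a
32-byte header, so neither size helps. For a 16-byte-aligning allocator it
yields {8} with 24 bytes and {0} with 32 bytes.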

The percentages are based on my experience with attempting SIMD 
optimisations on QString. Over a large sample, I found that 95%-99% of the 
data comes from QString's own allocations and the rest (1-5%) comes from 
fromRawData. The strings in fromRawData are evenly distributed across all 
possible alignments, while the strings allocated by QString are evenly 
distributed across both possibilities on 32-bit machines.

In other words, the histogram of QString data alignments, on a 32-bit machine 
with an 8-byte-aligning allocator (like glibc's) should be roughly like the 
following, with any of a 16-, 24- or 32-byte header:

         0      48.5%
         2      0.5%
         4      0.5%
         6      0.5%
         8      48.5%
        10      0.5%
        12      0.5%
        14      0.5%

With a 16- or 32-byte header and an allocator giving 16-byte-aligned memory 
regions, we should see:

         0      96.5%
         2      0.5%
         4      0.5%
         6      0.5%
         8      0.5%
        10      0.5%
        12      0.5%
        14      0.5%

To make the 32-bit structure fit the latter profile above, we'd need to add 
another 8 bytes to the header (bringing the total wastage to 12 bytes) and 
hope for an allocator that aligns to 16 bytes. Using posix_memalign or 
equivalent functions is likely to simply cause another 8 bytes of overhead 
inside the allocators.

An alternative, and IMHO better, approach would be to always allocate 8 bytes 
more than strictly needed and force d->begin to the 16-byte boundary. That 
means that d->begin == 4 whenever d is misaligned. This approach would allow 
us to achieve the above profile even on systems with allocators giving 8-byte-
aligned pointers, such as glibc 32-bit.
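
A rough sketch of that over-allocation idea, using plain malloc and byte
offsets. paddingFor and allocateAligned are hypothetical helpers and the
header is treated as an opaque block of headerSize bytes. It assumes malloc
returns at least 8-byte-aligned memory, and a real implementation would store
the padding in d->begin rather than return a raw pointer.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>

constexpr std::size_t Slack = 8;   // extra bytes always allocated

// Bytes to skip after the header so that the data starts on a 16-byte
// boundary. With an 8-byte-aligned header address this is 0 or 8.
std::size_t paddingFor(std::uintptr_t headerAddr, std::size_t headerSize)
{
    assert(headerAddr % 8 == 0);    // assumed allocator guarantee
    assert(headerSize % 8 == 0);
    return (16 - (headerAddr + headerSize) % 16) % 16;
}

// Allocates header + payload + slack. Stores the block start in *header
// (to be passed to std::free later) and returns the 16-byte-aligned data
// pointer, or nullptr on allocation failure.
void *allocateAligned(std::size_t headerSize, std::size_t payloadBytes,
                      void **header)
{
    void *d = std::malloc(headerSize + payloadBytes + Slack);
    if (!d)
        return nullptr;
    *header = d;
    const std::size_t pad =
        paddingFor(reinterpret_cast<std::uintptr_t>(d), headerSize);
    return static_cast<char *>(d) + headerSize + pad;
}
```

With 2-byte elements such as QChar, the 8-byte padding corresponds to 4
elements, which is one way to read "d->begin == 4". The padding is computed
from the actual address, so the same code covers a 24-byte header on a
16-byte-aligned block (pad is always 8) and on an 8-byte-aligned one (pad is
0 or 8).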

It would also allow us to adapt on-the-fly if the allocator is updated and 
starts to give us 16-byte aligned pointers on 32-bit, which would otherwise be 
the worst-case scenario below:

         0      0.5%
         2      0.5%
         4      0.5%
         6      0.5%
         8      96.5%
        10      0.5%
        12      0.5%
        14      0.5%

-- 
Thiago Macieira - thiago (AT) macieira.info - thiago (AT) kde.org
   Software Architect - Intel Open Source Technology Center
      PGP/GPG: 0x6EF45358; fingerprint:
      E067 918B B660 DBD1 105C  966C 33F5 F005 6EF4 5358

_______________________________________________
Qt5-feedback mailing list
http://lists.qt.nokia.com/mailman/listinfo/qt5-feedback
