[Qt5-feedback] QList in-array item size

2011-10-16 Thread Thiago Macieira
Hello

Currently QList is an array of void* and it stores items that fit that size and 
are movable in the array itself. This has a couple of inefficiencies:

 - on a 64-bit platform, there's a 50% overhead for QListint 
   (QListshort and QListchar are unlikely)

 - QListdouble uses indirect allocation on 32-bit platforms

 - QVariant has one member of 12 bytes on all platforms, QSharedPointer and 
   QWeakPointer are always twice the size of void*, so QList of them always 
   uses indirect allocation (including QVariantList)

So I'd like to propose a change to QList's allocation strategy.

Option 1: large  2*sizeof(void*)
 - pros: will include double, QSharedPointer and QWeakPointer
 - cons: does not include QVariant on 32-bit platforms, 75% overhead on int on 
   32-bit

Option 2: large  16 bytes
 - pros: includes double, the pointers and QVariant
 - cons: overhead of 75% of int and pointers on 32-bit; 50% overhead of 
   pointers on 64-bit

Option 3: make it QListT be an actual QVectorT for movable types, maybe 
with an upper limit of size (32 bytes, 128 bytes?).
 - pros: suitable for all types, shares code
 - cons: more complex to implement, more code to compile and parse in headers

One big difference between QList and QVector today is that QList has prepend / 
takeFirst optimisation, whereas QVector must move all elements to accommodate. 
I would prefer if that optimisation remained present.

-- 
Thiago Macieira - thiago (AT) macieira.info - thiago (AT) kde.org
   Software Architect - Intel Open Source Technology Center
  PGP/GPG: 0x6EF45358; fingerprint:
  E067 918B B660 DBD1 105C  966C 33F5 F005 6EF4 5358


signature.asc
Description: This is a digitally signed message part.
___
Qt5-feedback mailing list
Qt5-feedback@qt.nokia.com
http://lists.qt.nokia.com/mailman/listinfo/qt5-feedback


[Qt5-feedback] Merging the vector/array headers in QtCore

2011-10-16 Thread Thiago Macieira
When I talked to João this week, he expressed the desire to unify the header
code of QVector, QByteArray and QString. Since QByteArray is a vector of char
and QString is a vector of QChar / ushort, it makes sense to unify the code to
simplify maintenance. Whether specific optimisations in allocation policies are
needed, we can figure out later.

But given my last email, about also bringing in QList for small and movable
types into this merging, I pointed out that QList has one optimisation that
QVector doesn't have.

To support that feature, the QListData header currently is:

QAtomicInt ref;
int alloc;
int begin;
int end;
// size = 16 bytes, alignment = 4 bytes

There's also an 1-bit flag which I have not included. It can be stored in the
sign bit of one of the ints in the future.

In addition, the current QStringData and QByteArrayData headers are (not
including the flags):

QAtomicint ref;
int size;
int alloc;
qptrdiff offset;
// size = 16 (24) bytes, alignment = 4 (8) bytes

Note how there's a 4-byte padding gap before the offset member on 64-bit
platforms and how the data starts at an offset of 24 bytes. Given a 16-byte
aligned allocator, which is common on 64-bit platforms, the data starts at an
8 byte offset from the nearest 16-byte alignment, which is a problem for SSE2
optimisations.

I'd like to pick all of your brains for a solution that has the following
properties:

 - size always 16 bytes
 - contains a qptrdiff offset, to still allow for fromRawData (which we'd
   then introduce to QVector too, but not QList)
 - can store at least one 1-bit flag, but 2 would be preferred
 - size calculation reasonably fast

--
Thiago Macieira - thiago (AT) macieira.info - thiago (AT) kde.org
   Software Architect - Intel Open Source Technology Center
  PGP/GPG: 0x6EF45358; fingerprint:
  E067 918B B660 DBD1 105C  966C 33F5 F005 6EF4 5358


signature.asc
Description: This is a digitally signed message part.
___
Qt5-feedback mailing list
Qt5-feedback@qt.nokia.com
http://lists.qt.nokia.com/mailman/listinfo/qt5-feedback


Re: [Qt5-feedback] Merging the vector/array headers in QtCore

2011-10-16 Thread Olivier Goffart
On Sunday 16 October 2011 16:30:39 Thiago Macieira wrote:
 When I talked to João this week, he expressed the desire to unify the header
 code of QVector, QByteArray and QString. Since QByteArray is a vector of
 char and QString is a vector of QChar / ushort, it makes sense to unify the
 code to simplify maintenance. Whether specific optimisations in allocation
 policies are needed, we can figure out later.

Appart from consistency in out internal structures, what is the gain?
What is the code that can be reuse? (Appart the declaration itself, i can't 
think of any)

As this is an internal thing, this do not give us anything regarding the API 
point of view.
And this keep us to have nice per/data structure optimisation.

In other word, i'm not really in favor of this.

(I think the sharing should happen at code level, by providing the same 
interface which enable templated code (like QStringBuilder) to work of all 
types.)
 
 But given my last email, about also bringing in QList for small and movable
 types into this merging, I pointed out that QList has one optimisation that
 QVector doesn't have.
 
 To support that feature, the QListData header currently is:
 
   QAtomicInt ref;
   int alloc;
   int begin;
   int end;
   // size = 16 bytes, alignment = 4 bytes
 
 There's also an 1-bit flag which I have not included. It can be stored in
 the sign bit of one of the ints in the future.
 
 In addition, the current QStringData and QByteArrayData headers are (not
 including the flags):
 
   QAtomicint ref;
   int size;
   int alloc;
   qptrdiff offset;
   // size = 16 (24) bytes, alignment = 4 (8) bytes
 
 Note how there's a 4-byte padding gap before the offset member on 64-bit
 platforms and how the data starts at an offset of 24 bytes. Given a 16-byte
 aligned allocator, which is common on 64-bit platforms, the data starts at
 an 8 byte offset from the nearest 16-byte alignment, which is a problem for
 SSE2 optimisations.

One thing I would like to add is the hash, for QString (and potentialy 
QByteArray)
see: http://qt.gitorious.org/qt/qtbase/merge_requests/62


 I'd like to pick all of your brains for a solution that has the following
 properties:
 
  - size always 16 bytes

Not possible on 64bit
Or did you mean multiple of?

  - contains a qptrdiff offset, to still allow for fromRawData (which we'd
then introduce to QVector too, but not QList)

And there again you are saying it will be different for QList and QVector
Or you want to put a lot of unions there (which negates the hole purpose)

  - can store at least one 1-bit flag, but 2 would be preferred

The more flag you have, the more future proof you are if you want to keep 
binary compatibility.

  - size calculation reasonably fast

And add stuff like: 
 - can be generate at compile time and put in the .rodata (like we did for 
QString)


___
Qt5-feedback mailing list
Qt5-feedback@qt.nokia.com
http://lists.qt.nokia.com/mailman/listinfo/qt5-feedback


Re: [Qt5-feedback] QList in-array item size

2011-10-16 Thread Olivier Goffart
On Sunday 16 October 2011 16:21:40 Thiago Macieira wrote:
[...]
 Option 3: make it QListT be an actual QVectorT for movable types, maybe
 with an upper limit of size (32 bytes, 128 bytes?).
  - pros: suitable for all types, shares code
  - cons: more complex to implement, more code to compile and parse in
 headers

This is the obvious solutions...

 One big difference between QList and QVector today is that QList has prepend
 / takeFirst optimisation, whereas QVector must move all elements to
 accommodate. I would prefer if that optimisation remained present.

It has to (it enable QList to be used for queue like data structure (and it 
is, see QQueue)

QList was also supposed to expands to less code (than QVector) (at least that 
is what Jasmin used to advertise some years ago)

___
Qt5-feedback mailing list
Qt5-feedback@qt.nokia.com
http://lists.qt.nokia.com/mailman/listinfo/qt5-feedback


Re: [Qt5-feedback] QList in-array item size

2011-10-16 Thread Thiago Macieira
On Sunday, 16 de October de 2011 17:12:33 Olivier Goffart wrote:
  One big difference between QList and QVector today is that QList has
  prepend / takeFirst optimisation, whereas QVector must move all elements
  to accommodate. I would prefer if that optimisation remained present.
 It has to (it enable QList to be used for queue like data structure (and it
 is, see QQueue)
 
 QList was also supposed to expands to less code (than QVector) (at least
 that is what Jasmin used to advertise some years ago)

I don't see how it can expand to less code than a simple vector. The best case 
scenario is to expand to the same code or to similar complexity -- assuming of 
course that QVector is optimised to produce minimal code too.

-- 
Thiago Macieira - thiago (AT) macieira.info - thiago (AT) kde.org
   Software Architect - Intel Open Source Technology Center
  PGP/GPG: 0x6EF45358; fingerprint:
  E067 918B B660 DBD1 105C  966C 33F5 F005 6EF4 5358


signature.asc
Description: This is a digitally signed message part.
___
Qt5-feedback mailing list
Qt5-feedback@qt.nokia.com
http://lists.qt.nokia.com/mailman/listinfo/qt5-feedback


Re: [Qt5-feedback] QList in-array item size

2011-10-16 Thread Olivier Goffart
On Sunday 16 October 2011 18:59:08 Thiago Macieira wrote:
 On Sunday, 16 de October de 2011 17:12:33 Olivier Goffart wrote:
   One big difference between QList and QVector today is that QList has
   prepend / takeFirst optimisation, whereas QVector must move all
   elements to accommodate. I would prefer if that optimisation
   remained present. 
  It has to (it enable QList to be used for queue like data structure (and
  it is, see QQueue)
  
  QList was also supposed to expands to less code (than QVector) (at least
  that is what Jasmin used to advertise some years ago)
 
 I don't see how it can expand to less code than a simple vector. The best
 case scenario is to expand to the same code or to similar complexity --
 assuming of course that QVector is optimised to produce minimal code too.

Because QList make use of much more stuff in QListData that is not template 
type (hence, not generated for every types)
However, I beleive it is possible to do the same optimisation for movable type 
within QVector, if we want to.

___
Qt5-feedback mailing list
Qt5-feedback@qt.nokia.com
http://lists.qt.nokia.com/mailman/listinfo/qt5-feedback


Re: [Qt5-feedback] QList in-array item size

2011-10-16 Thread Thiago Macieira
On Sunday, 16 de October de 2011 19:51:39 Olivier Goffart wrote:
  I don't see how it can expand to less code than a simple vector. The best
  case scenario is to expand to the same code or to similar complexity --
  assuming of course that QVector is optimised to produce minimal code too.
 
 Because QList make use of much more stuff in QListData that is not template 
 type (hence, not generated for every types)
 However, I beleive it is possible to do the same optimisation for movable
 type  within QVector, if we want to.

The biggest difference between current QList and QVector is that the size of 
the element is constant in QList, whereas it varies in QVector. The static 
code to manage the array, however, need not know the difference: it only needs 
to know the total size and the alignment requirement.

-- 
Thiago Macieira - thiago (AT) macieira.info - thiago (AT) kde.org
   Software Architect - Intel Open Source Technology Center
  PGP/GPG: 0x6EF45358; fingerprint:
  E067 918B B660 DBD1 105C  966C 33F5 F005 6EF4 5358


signature.asc
Description: This is a digitally signed message part.
___
Qt5-feedback mailing list
Qt5-feedback@qt.nokia.com
http://lists.qt.nokia.com/mailman/listinfo/qt5-feedback


Re: [Qt5-feedback] Merging the vector/array headers in QtCore

2011-10-16 Thread Thiago Macieira
On Sunday, 16 de October de 2011 16:30:39 Thiago Macieira wrote:
  - size always 16 bytes
  - contains a qptrdiff offset, to still allow for fromRawData (which we'd 
then introduce to QVector too, but not QList)
  - can store at least one 1-bit flag, but 2 would be preferred
  - size calculation reasonably fast

Here's an idea:
QAtomicInt ref;
int alloc;
union {
qptrdiff offset;
struct { int begin; int end; };
};
// size = 16 bytes

Allocation sizes are always positive, so the sign bit in alloc is free for our 
use. If the sign bit is set (alloc  0) then we're dealing with fromRawData. 
If it's not set, we're dealing with data we allocated ourselves.

The two 1-bit flags can be stored in the sign bits of begin and end, but only 
for own-allocated data. Currently, the two flags we use are sharable and 
capacity, neither of which makes sense for raw data.

We'd have:

int size() const { return d-alloc  0 ? -d-alloc : d-end - d-begin; }
T *data() { return d-array() + (d-alloc  0 ? d-offset : d-begin); }

Advantages:
 * 16 bytes on all platforms, regardless of pointer size
 * 2 bits available for flags
 * raw data is possible
 * size calculation involves one extra comparison (the subtraction would have 
   been there anyway)
 * the data() function above does not actually do any comparison on 32-bit 
   platforms, as offset and begin always occupy the same space
 * the calculation to increase the size upon reallocation like:
if (newAlloc  d-alloc)
   does not require special code to deal with raw data

Disadvantages:
 * No wiggle room for *any* future expansion
 * Storing the flags in the sign bits require bit fields or other manipulations

-- 
Thiago Macieira - thiago (AT) macieira.info - thiago (AT) kde.org
   Software Architect - Intel Open Source Technology Center
  PGP/GPG: 0x6EF45358; fingerprint:
  E067 918B B660 DBD1 105C  966C 33F5 F005 6EF4 5358


signature.asc
Description: This is a digitally signed message part.
___
Qt5-feedback mailing list
Qt5-feedback@qt.nokia.com
http://lists.qt.nokia.com/mailman/listinfo/qt5-feedback


Re: [Qt5-feedback] Merging the vector/array headers in QtCore

2011-10-16 Thread Thiago Macieira
On Sunday, 16 de October de 2011 21:09:44 Thiago Macieira wrote:
 Here's an idea:
 QAtomicInt ref;
 int alloc;
 union {
 qptrdiff offset;
 struct { int begin; int end; };
 };
 // size = 16 bytes

And here are two possibilities admitting defeat and going over 16 bytes:

Option 1:

QAtomicInt ref;
int alloc;
union {
qptrdiff begin;
qint64 dummy;
};
int end;
int flags;
// size = 24 bytes

Advantages:
 * 32 bits of flags available, reserving room for future expansion
 * no fiddling with sign bits anywhere

Disadvantages:
 * 32 bits wasted on 32-bit platforms, which will never be used
 * assuming an allocator aligning to 16 bytes, the start of the data will 
always be 8 bytes off, incurring performance penalty with SSE2 operations
( 99% of the cases)
 * QVectors of SSE types will have 8 bytes of padding

Option 2:

QAtomicInt ref;
int flags;
union {
qptrdiff alloc;
qint64 dummy;
};
qptrdiff begin;
qptrdiff end;
// size = 24 (32) bytes

Advantages:
 * 32 bits of flags available
 * size multiple of 16 on 64-bit platforms, for best SSE2 performance
 * full 64-bit sizes for 64-bit machines, allowing for allocation of more than 
   2 GB of data. The same header could be used for a QHugeVector class that
   operates on signed 64-bit sizes, allowing up to 8388608 TB of data
 * No padding required for QVectors of SSE types

Disadvantages:
 * 100% bigger than the original structure, 50% bigger than the Option 1
 * 32 bits wasted on 32-bit platforms

On 32-bit machines, if the allocator produces 16-byte-aligned memory regions, 
we'll be wrong on 95% of the cases, causing SSE2 performance penalties. 
However, if the allocator produces 8-byte-algined memory regions, as malloc in 
glibc does, we'll be wrong just over 50% of the cases whether the structure is 
24 or 32 bytes long. So we gain nothing by making it 32 bytes long on 32-bit 
machines.

The %-age of the use-cases is based on my experience with attempting SIMD 
optimisations on QString. Over a large sample, I found out that 95%-99% of the 
data comes from QString's own allocations and the rest (1-5%) comes from 
fromRawData. The strings in fromRawData are evenly distributed across all 
possible alignments, the strings allocated by QString are evenly distributed 
across both possibilities on 32-bit machines.

In other words, the histogram of QString data alignments, on a 32-bit machine 
with an 8-byte-aligning allocator (like glibc's) should be roughly like the 
following, with both a 16, 24 or 32-byte header:

 0  48.5%
 2  0.5%
 4  0.5%
 6  0.5%
 8  48.5%
10  0.5%
12  0.5%
14  0.5%

With a 16- or 32-byte header with an allocator giving aligned-to-16 memory 
regions, we should see:

 0  96.5%
 2  0.5%
 4  0.5%
 6  0.5%
 8  0.5%
10  0.5%
12  0.5%
14  0.5%

To make the 32-bit structure fit the latter profile above, we'd need to add 
another 8 bytes to the header (bringing the total wastage to 12 bytes) and 
hope for an allocator that aligns to 16 bytes. Using posix_memalign or 
equivalent functions is likely to simply cause another 8 bytes of overhead 
inside the allocators.

An alternative, and IMHO better, approach would be to always allocate 8 bytes 
more than strictly needed and force d-begin to the 16-byte boundary. That 
means that d-begin == 4 whenever d is misaligned. This approach would allow 
us to achieve the above profile even on systems with allocators giving 8-byte-
aligned pointers, such as glibc 32-bit.

It would also allow us to adapt on-the-fly if the allocator is updated and 
starts to give us 16-byte aligned pointers on 32-bit, which would otherwise be 
the worst case scanerio below:

 0  0.5%
 2  0.5%
 4  0.5%
 6  0.5%
 8  96.5%
10  0.5%
12  0.5%
14  0.5%

-- 
Thiago Macieira - thiago (AT) macieira.info - thiago (AT) kde.org
   Software Architect - Intel Open Source Technology Center
  PGP/GPG: 0x6EF45358; fingerprint:
  E067 918B B660 DBD1 105C  966C 33F5 F005 6EF4 5358


signature.asc
Description: This is a digitally signed message part.
___
Qt5-feedback mailing list
Qt5-feedback@qt.nokia.com
http://lists.qt.nokia.com/mailman/listinfo/qt5-feedback