[Qt5-feedback] QList in-array item size
Hello Currently QList is an array of void* and it stores items that fit that size and are movable in the array itself. This has a couple of inefficiencies: - on a 64-bit platform, there's a 50% overhead for QListint (QListshort and QListchar are unlikely) - QListdouble uses indirect allocation on 32-bit platforms - QVariant has one member of 12 bytes on all platforms, QSharedPointer and QWeakPointer are always twice the size of void*, so QList of them always uses indirect allocation (including QVariantList) So I'd like to propose a change to QList's allocation strategy. Option 1: large 2*sizeof(void*) - pros: will include double, QSharedPointer and QWeakPointer - cons: does not include QVariant on 32-bit platforms, 75% overhead on int on 32-bit Option 2: large 16 bytes - pros: includes double, the pointers and QVariant - cons: overhead of 75% of int and pointers on 32-bit; 50% overhead of pointers on 64-bit Option 3: make it QListT be an actual QVectorT for movable types, maybe with an upper limit of size (32 bytes, 128 bytes?). - pros: suitable for all types, shares code - cons: more complex to implement, more code to compile and parse in headers One big difference between QList and QVector today is that QList has prepend / takeFirst optimisation, whereas QVector must move all elements to accommodate. I would prefer if that optimisation remained present. -- Thiago Macieira - thiago (AT) macieira.info - thiago (AT) kde.org Software Architect - Intel Open Source Technology Center PGP/GPG: 0x6EF45358; fingerprint: E067 918B B660 DBD1 105C 966C 33F5 F005 6EF4 5358 signature.asc Description: This is a digitally signed message part. ___ Qt5-feedback mailing list Qt5-feedback@qt.nokia.com http://lists.qt.nokia.com/mailman/listinfo/qt5-feedback
[Qt5-feedback] Merging the vector/array headers in QtCore
When I talked to João this week, he expressed the desire to unify the header code of QVector, QByteArray and QString. Since QByteArray is a vector of char and QString is a vector of QChar / ushort, it makes sense to unify the code to simplify maintenance. Whether specific optimisations in allocation policies are needed, we can figure out later. But given my last email, about also bringing in QList for small and movable types into this merging, I pointed out that QList has one optimisation that QVector doesn't have. To support that feature, the QListData header currently is: QAtomicInt ref; int alloc; int begin; int end; // size = 16 bytes, alignment = 4 bytes There's also an 1-bit flag which I have not included. It can be stored in the sign bit of one of the ints in the future. In addition, the current QStringData and QByteArrayData headers are (not including the flags): QAtomicint ref; int size; int alloc; qptrdiff offset; // size = 16 (24) bytes, alignment = 4 (8) bytes Note how there's a 4-byte padding gap before the offset member on 64-bit platforms and how the data starts at an offset of 24 bytes. Given a 16-byte aligned allocator, which is common on 64-bit platforms, the data starts at an 8 byte offset from the nearest 16-byte alignment, which is a problem for SSE2 optimisations. I'd like to pick all of your brains for a solution that has the following properties: - size always 16 bytes - contains a qptrdiff offset, to still allow for fromRawData (which we'd then introduce to QVector too, but not QList) - can store at least one 1-bit flag, but 2 would be preferred - size calculation reasonably fast -- Thiago Macieira - thiago (AT) macieira.info - thiago (AT) kde.org Software Architect - Intel Open Source Technology Center PGP/GPG: 0x6EF45358; fingerprint: E067 918B B660 DBD1 105C 966C 33F5 F005 6EF4 5358 signature.asc Description: This is a digitally signed message part. ___ Qt5-feedback mailing list Qt5-feedback@qt.nokia.com http://lists.qt.nokia.com/mailman/listinfo/qt5-feedback
Re: [Qt5-feedback] Merging the vector/array headers in QtCore
On Sunday 16 October 2011 16:30:39 Thiago Macieira wrote: When I talked to João this week, he expressed the desire to unify the header code of QVector, QByteArray and QString. Since QByteArray is a vector of char and QString is a vector of QChar / ushort, it makes sense to unify the code to simplify maintenance. Whether specific optimisations in allocation policies are needed, we can figure out later. Appart from consistency in out internal structures, what is the gain? What is the code that can be reuse? (Appart the declaration itself, i can't think of any) As this is an internal thing, this do not give us anything regarding the API point of view. And this keep us to have nice per/data structure optimisation. In other word, i'm not really in favor of this. (I think the sharing should happen at code level, by providing the same interface which enable templated code (like QStringBuilder) to work of all types.) But given my last email, about also bringing in QList for small and movable types into this merging, I pointed out that QList has one optimisation that QVector doesn't have. To support that feature, the QListData header currently is: QAtomicInt ref; int alloc; int begin; int end; // size = 16 bytes, alignment = 4 bytes There's also an 1-bit flag which I have not included. It can be stored in the sign bit of one of the ints in the future. In addition, the current QStringData and QByteArrayData headers are (not including the flags): QAtomicint ref; int size; int alloc; qptrdiff offset; // size = 16 (24) bytes, alignment = 4 (8) bytes Note how there's a 4-byte padding gap before the offset member on 64-bit platforms and how the data starts at an offset of 24 bytes. Given a 16-byte aligned allocator, which is common on 64-bit platforms, the data starts at an 8 byte offset from the nearest 16-byte alignment, which is a problem for SSE2 optimisations. One thing I would like to add is the hash, for QString (and potentialy QByteArray) see: http://qt.gitorious.org/qt/qtbase/merge_requests/62 I'd like to pick all of your brains for a solution that has the following properties: - size always 16 bytes Not possible on 64bit Or did you mean multiple of? - contains a qptrdiff offset, to still allow for fromRawData (which we'd then introduce to QVector too, but not QList) And there again you are saying it will be different for QList and QVector Or you want to put a lot of unions there (which negates the hole purpose) - can store at least one 1-bit flag, but 2 would be preferred The more flag you have, the more future proof you are if you want to keep binary compatibility. - size calculation reasonably fast And add stuff like: - can be generate at compile time and put in the .rodata (like we did for QString) ___ Qt5-feedback mailing list Qt5-feedback@qt.nokia.com http://lists.qt.nokia.com/mailman/listinfo/qt5-feedback
Re: [Qt5-feedback] QList in-array item size
On Sunday 16 October 2011 16:21:40 Thiago Macieira wrote: [...] Option 3: make it QListT be an actual QVectorT for movable types, maybe with an upper limit of size (32 bytes, 128 bytes?). - pros: suitable for all types, shares code - cons: more complex to implement, more code to compile and parse in headers This is the obvious solutions... One big difference between QList and QVector today is that QList has prepend / takeFirst optimisation, whereas QVector must move all elements to accommodate. I would prefer if that optimisation remained present. It has to (it enable QList to be used for queue like data structure (and it is, see QQueue) QList was also supposed to expands to less code (than QVector) (at least that is what Jasmin used to advertise some years ago) ___ Qt5-feedback mailing list Qt5-feedback@qt.nokia.com http://lists.qt.nokia.com/mailman/listinfo/qt5-feedback
Re: [Qt5-feedback] QList in-array item size
On Sunday, 16 de October de 2011 17:12:33 Olivier Goffart wrote: One big difference between QList and QVector today is that QList has prepend / takeFirst optimisation, whereas QVector must move all elements to accommodate. I would prefer if that optimisation remained present. It has to (it enable QList to be used for queue like data structure (and it is, see QQueue) QList was also supposed to expands to less code (than QVector) (at least that is what Jasmin used to advertise some years ago) I don't see how it can expand to less code than a simple vector. The best case scenario is to expand to the same code or to similar complexity -- assuming of course that QVector is optimised to produce minimal code too. -- Thiago Macieira - thiago (AT) macieira.info - thiago (AT) kde.org Software Architect - Intel Open Source Technology Center PGP/GPG: 0x6EF45358; fingerprint: E067 918B B660 DBD1 105C 966C 33F5 F005 6EF4 5358 signature.asc Description: This is a digitally signed message part. ___ Qt5-feedback mailing list Qt5-feedback@qt.nokia.com http://lists.qt.nokia.com/mailman/listinfo/qt5-feedback
Re: [Qt5-feedback] QList in-array item size
On Sunday 16 October 2011 18:59:08 Thiago Macieira wrote: On Sunday, 16 de October de 2011 17:12:33 Olivier Goffart wrote: One big difference between QList and QVector today is that QList has prepend / takeFirst optimisation, whereas QVector must move all elements to accommodate. I would prefer if that optimisation remained present. It has to (it enable QList to be used for queue like data structure (and it is, see QQueue) QList was also supposed to expands to less code (than QVector) (at least that is what Jasmin used to advertise some years ago) I don't see how it can expand to less code than a simple vector. The best case scenario is to expand to the same code or to similar complexity -- assuming of course that QVector is optimised to produce minimal code too. Because QList make use of much more stuff in QListData that is not template type (hence, not generated for every types) However, I beleive it is possible to do the same optimisation for movable type within QVector, if we want to. ___ Qt5-feedback mailing list Qt5-feedback@qt.nokia.com http://lists.qt.nokia.com/mailman/listinfo/qt5-feedback
Re: [Qt5-feedback] QList in-array item size
On Sunday, 16 de October de 2011 19:51:39 Olivier Goffart wrote: I don't see how it can expand to less code than a simple vector. The best case scenario is to expand to the same code or to similar complexity -- assuming of course that QVector is optimised to produce minimal code too. Because QList make use of much more stuff in QListData that is not template type (hence, not generated for every types) However, I beleive it is possible to do the same optimisation for movable type within QVector, if we want to. The biggest difference between current QList and QVector is that the size of the element is constant in QList, whereas it varies in QVector. The static code to manage the array, however, need not know the difference: it only needs to know the total size and the alignment requirement. -- Thiago Macieira - thiago (AT) macieira.info - thiago (AT) kde.org Software Architect - Intel Open Source Technology Center PGP/GPG: 0x6EF45358; fingerprint: E067 918B B660 DBD1 105C 966C 33F5 F005 6EF4 5358 signature.asc Description: This is a digitally signed message part. ___ Qt5-feedback mailing list Qt5-feedback@qt.nokia.com http://lists.qt.nokia.com/mailman/listinfo/qt5-feedback
Re: [Qt5-feedback] Merging the vector/array headers in QtCore
On Sunday, 16 de October de 2011 16:30:39 Thiago Macieira wrote: - size always 16 bytes - contains a qptrdiff offset, to still allow for fromRawData (which we'd then introduce to QVector too, but not QList) - can store at least one 1-bit flag, but 2 would be preferred - size calculation reasonably fast Here's an idea: QAtomicInt ref; int alloc; union { qptrdiff offset; struct { int begin; int end; }; }; // size = 16 bytes Allocation sizes are always positive, so the sign bit in alloc is free for our use. If the sign bit is set (alloc 0) then we're dealing with fromRawData. If it's not set, we're dealing with data we allocated ourselves. The two 1-bit flags can be stored in the sign bits of begin and end, but only for own-allocated data. Currently, the two flags we use are sharable and capacity, neither of which makes sense for raw data. We'd have: int size() const { return d-alloc 0 ? -d-alloc : d-end - d-begin; } T *data() { return d-array() + (d-alloc 0 ? d-offset : d-begin); } Advantages: * 16 bytes on all platforms, regardless of pointer size * 2 bits available for flags * raw data is possible * size calculation involves one extra comparison (the subtraction would have been there anyway) * the data() function above does not actually do any comparison on 32-bit platforms, as offset and begin always occupy the same space * the calculation to increase the size upon reallocation like: if (newAlloc d-alloc) does not require special code to deal with raw data Disadvantages: * No wiggle room for *any* future expansion * Storing the flags in the sign bits require bit fields or other manipulations -- Thiago Macieira - thiago (AT) macieira.info - thiago (AT) kde.org Software Architect - Intel Open Source Technology Center PGP/GPG: 0x6EF45358; fingerprint: E067 918B B660 DBD1 105C 966C 33F5 F005 6EF4 5358 signature.asc Description: This is a digitally signed message part. ___ Qt5-feedback mailing list Qt5-feedback@qt.nokia.com http://lists.qt.nokia.com/mailman/listinfo/qt5-feedback
Re: [Qt5-feedback] Merging the vector/array headers in QtCore
On Sunday, 16 de October de 2011 21:09:44 Thiago Macieira wrote: Here's an idea: QAtomicInt ref; int alloc; union { qptrdiff offset; struct { int begin; int end; }; }; // size = 16 bytes And here are two possibilities admitting defeat and going over 16 bytes: Option 1: QAtomicInt ref; int alloc; union { qptrdiff begin; qint64 dummy; }; int end; int flags; // size = 24 bytes Advantages: * 32 bits of flags available, reserving room for future expansion * no fiddling with sign bits anywhere Disadvantages: * 32 bits wasted on 32-bit platforms, which will never be used * assuming an allocator aligning to 16 bytes, the start of the data will always be 8 bytes off, incurring performance penalty with SSE2 operations ( 99% of the cases) * QVectors of SSE types will have 8 bytes of padding Option 2: QAtomicInt ref; int flags; union { qptrdiff alloc; qint64 dummy; }; qptrdiff begin; qptrdiff end; // size = 24 (32) bytes Advantages: * 32 bits of flags available * size multiple of 16 on 64-bit platforms, for best SSE2 performance * full 64-bit sizes for 64-bit machines, allowing for allocation of more than 2 GB of data. The same header could be used for a QHugeVector class that operates on signed 64-bit sizes, allowing up to 8388608 TB of data * No padding required for QVectors of SSE types Disadvantages: * 100% bigger than the original structure, 50% bigger than the Option 1 * 32 bits wasted on 32-bit platforms On 32-bit machines, if the allocator produces 16-byte-aligned memory regions, we'll be wrong on 95% of the cases, causing SSE2 performance penalties. However, if the allocator produces 8-byte-algined memory regions, as malloc in glibc does, we'll be wrong just over 50% of the cases whether the structure is 24 or 32 bytes long. So we gain nothing by making it 32 bytes long on 32-bit machines. The %-age of the use-cases is based on my experience with attempting SIMD optimisations on QString. Over a large sample, I found out that 95%-99% of the data comes from QString's own allocations and the rest (1-5%) comes from fromRawData. The strings in fromRawData are evenly distributed across all possible alignments, the strings allocated by QString are evenly distributed across both possibilities on 32-bit machines. In other words, the histogram of QString data alignments, on a 32-bit machine with an 8-byte-aligning allocator (like glibc's) should be roughly like the following, with both a 16, 24 or 32-byte header: 0 48.5% 2 0.5% 4 0.5% 6 0.5% 8 48.5% 10 0.5% 12 0.5% 14 0.5% With a 16- or 32-byte header with an allocator giving aligned-to-16 memory regions, we should see: 0 96.5% 2 0.5% 4 0.5% 6 0.5% 8 0.5% 10 0.5% 12 0.5% 14 0.5% To make the 32-bit structure fit the latter profile above, we'd need to add another 8 bytes to the header (bringing the total wastage to 12 bytes) and hope for an allocator that aligns to 16 bytes. Using posix_memalign or equivalent functions is likely to simply cause another 8 bytes of overhead inside the allocators. An alternative, and IMHO better, approach would be to always allocate 8 bytes more than strictly needed and force d-begin to the 16-byte boundary. That means that d-begin == 4 whenever d is misaligned. This approach would allow us to achieve the above profile even on systems with allocators giving 8-byte- aligned pointers, such as glibc 32-bit. It would also allow us to adapt on-the-fly if the allocator is updated and starts to give us 16-byte aligned pointers on 32-bit, which would otherwise be the worst case scanerio below: 0 0.5% 2 0.5% 4 0.5% 6 0.5% 8 96.5% 10 0.5% 12 0.5% 14 0.5% -- Thiago Macieira - thiago (AT) macieira.info - thiago (AT) kde.org Software Architect - Intel Open Source Technology Center PGP/GPG: 0x6EF45358; fingerprint: E067 918B B660 DBD1 105C 966C 33F5 F005 6EF4 5358 signature.asc Description: This is a digitally signed message part. ___ Qt5-feedback mailing list Qt5-feedback@qt.nokia.com http://lists.qt.nokia.com/mailman/listinfo/qt5-feedback