Hmmm. The natural thing would be to have some type to describe these 128-bit values (akin to __m128 in gcc, Intel and MS compilers) and use sizeof on that. I don't see that D has any MMX/SSE intrinsics, so I don't know if there is a standard type. If you don't have such a thing defined by the compiler, I'd be tempted to define it, based on which version of the compiler will compile this code (i.e. 32- or 64-bit dmd). Then you can use that in your sizeof. Maybe you'll get lucky, and that will become standard :)
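To make the idea concrete, here is a minimal sketch in C rather than D, since it only illustrates the sizeof arithmetic. The type name vec128 and both helper functions are hypothetical names made up for this example; they are not anything dmd or phobos defines.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a compiler-provided 128-bit type (compare __m128).
   The name vec128 is an assumption made for this sketch. */
typedef struct {
    unsigned char bytes[16];
} vec128;

/* Refuse to compile if the stand-in type is ever not exactly 16 bytes. */
static_assert(sizeof(vec128) == 16, "vec128 must be 128 bits wide");

/* Padding derived from the vector type: the same on 32- and 64-bit targets. */
size_t array_pad_bytes(void) {
    return sizeof(vec128);
}

/* The size_t-based alternative from the thread: grows with the pointer width. */
size_t size_t_based_pad_bytes(void) {
    return sizeof(size_t) * 4;
}
```

With this, array_pad_bytes() stays at 16 regardless of the target, while size_t_based_pad_bytes() gives 16 where size_t is 4 bytes and 32 where it is 8 bytes, which is the difference Steve is asking about.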
Jason

----- Original Message ----
> From: Steve Schveighoffer <[email protected]>
> To: Discuss the phobos library for D <[email protected]>
> Sent: Mon, June 28, 2010 1:35:59 PM
> Subject: Re: [phobos] byte alignment for arrays
>
> Thanks, this information helps a lot! I will make the change to 16-byte
> alignment. I'm already using 8 bytes for a 4-byte length. Using 16
> bytes isn't much different, especially when the block size is 4096+
> bytes.
>
> One final question -- I currently use sizeof(size_t) * 2, which
> could now be sizeof(size_t) * 4, but of course, this changes to 32 bytes on
> 64-bit dmd. Would it make sense to just use 16 instead of some multiple of
> size_t?
>
> -Steve
>
> ----- Original Message ----
> > From: Jason Spencer <[email protected]>
> > To: Discuss the phobos library for D <[email protected]>
> > Sent: Mon, June 28, 2010 4:09:01 PM
> > Subject: Re: [phobos] byte alignment for arrays
> >
> > Sorry, I forgot to address the every-other-one concern. The MMX registers
> > are 64 bits, so you can only do 1 double at a time. Those instructions
> > only require 8-byte aligned memory. The SSE instructions use 128-bit
> > registers, so they take 2 doubles at a time. As long as the first one is
> > 16-byte aligned, you can iterate through in 16-byte (128-bit) chunks, and
> > you'll be good. That's why element 0 should be 128-bit aligned. If it's
> > not, the processor will either have an alignment fault (if the instruction
> > requires alignment) or will do a bunch of split loads across cache lines,
> > which kill performance.
> >
> > One other thought: If you wanted to be tricky, you could do a general,
> > 4-byte allocation and, based on the address you get, assign your storage
> > pointer to the next 128-bit aligned address. But you're offloading lots of
> > housekeeping to run time. Again, maybe tolerable for just these large
> > arrays. But it starts to add a lot of corner cases.
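The over-allocate-and-round-up trick described in the quoted message can be sketched roughly as follows, again in C rather than D. All names here (ALIGNMENT, align_up, alloc_aligned16, is_aligned16) are hypothetical and exist only for this example; this is not how the D runtime allocates arrays.

```c
#include <stdint.h>
#include <stdlib.h>

enum { ALIGNMENT = 16 };  /* 128 bits, the SSE register width */

/* Round addr up to the next multiple of ALIGNMENT (a power of two). */
uintptr_t align_up(uintptr_t addr) {
    return (addr + (ALIGNMENT - 1)) & ~(uintptr_t)(ALIGNMENT - 1);
}

/* Nonzero if p sits on a 16-byte boundary. */
int is_aligned16(const void *p) {
    return ((uintptr_t)p & (ALIGNMENT - 1)) == 0;
}

/* Over-allocate by ALIGNMENT - 1 bytes so an aligned address always fits.
   *raw_out receives the original pointer, which is the one to pass to free(). */
void *alloc_aligned16(size_t size, void **raw_out) {
    void *raw = malloc(size + ALIGNMENT - 1);
    *raw_out = raw;
    if (raw == NULL) {
        return NULL;
    }
    return (void *)align_up((uintptr_t)raw);
}
```

Having to carry the raw pointer around next to the aligned one is the run-time housekeeping mentioned above. It also shows the every-other-one point: if element 0 of a double array is 16-byte aligned, elements 0, 2, 4, ... begin each 16-byte chunk, and the odd elements are the second half of a chunk rather than misaligned loads of their own.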
> > Walter might have some good suggestions here.
> >
> > Jason
> >
> > ----- Original Message ----
> > > From: Steve Schveighoffer
> > >
> > > A question then -- let's say you have an array of doubles, which are
> > > 8 bytes wide, and you want to use these SSE instructions. Even if the
> > > first one is aligned on a 16-byte boundary, wouldn't every other
> > > double be misaligned?

_______________________________________________
phobos mailing list
[email protected]
http://lists.puremagic.com/mailman/listinfo/phobos
