Correction: The test I'm running is actually using V4<V4<Float>>. Manually unrolling the loop makes adding V4<V4<Float>> as fast as adding SIMD float4x4, while the un-unrolled for loop is about 4 times slower. My question still stands: shouldn't the optimizer be able to handle that for loop and make my manual unrolling unnecessary?

/Jens
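PS: In case someone wants to reproduce this, below is a rough, self-contained sketch of the nested case. The AddableZero protocol, its Float conformance and the + operator are scaffolding I've added just so the snippet stands on its own; they're not exactly what my real code does (which elides those parts with /.../), and the exact syntax and conformances may need tweaking depending on your Swift version.

protocol AddableZero {
    static var zero: Self { get }
    static func + (lhs: Self, rhs: Self) -> Self
}

// Float already provides zero and +, so the conformance body is empty.
extension Float: AddableZero {}

struct V4<T: AddableZero>: AddableZero {
    var elements: (T, T, T, T)

    static var zero: V4 {
        return V4(elements: (T.zero, T.zero, T.zero, T.zero))
    }

    subscript(index: Int) -> T {
        get {
            switch index {
            case 0: return elements.0
            case 1: return elements.1
            case 2: return elements.2
            default: return elements.3
            }
        }
        set {
            switch index {
            case 0: elements.0 = newValue
            case 1: elements.1 = newValue
            case 2: elements.2 = newValue
            default: elements.3 = newValue
            }
        }
    }

    func addedTo(_ other: V4) -> V4 {
        var r = V4.zero
        // The for loop I expected the optimizer to unroll:
        // for i in 0 ..< 4 { r[i] = self[i] + other[i] }
        // The manual unrolling that is ~4 times faster:
        r[0] = self[0] + other[0]
        r[1] = self[1] + other[1]
        r[2] = self[2] + other[2]
        r[3] = self[3] + other[3]
        return r
    }

    // Needed so V4 itself satisfies AddableZero and can be nested.
    static func + (lhs: V4, rhs: V4) -> V4 {
        return lhs.addedTo(rhs)
    }
}

// The nested case from the correction above:
typealias M4x4 = V4<V4<Float>>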
On Fri, Dec 11, 2015 at 8:28 AM, Jens Persson <j...@bitcycle.com> wrote:

> I've been doing a lot of performance testing related to generic value
> types and SIMD lately, and I've built Swift from sources in order to get an
> idea of what's coming up optimizerwise. Things have improved and the
> optimizer is impressive overall. But I still see no improvement in the case
> exemplified below.
>
> Manually unrolling the simple for loop will make it ~ 4 times faster (and
> exactly the same as when SIMD float4):
>
> struct V4<T> {
>     var elements: (T, T, T, T)
>     /.../
>     subscript(index: Int) -> T { /.../ }
>     /.../
>     func addedTo(other: V4) -> V4 {
>         var r = V4()
>         // Manually unrolling makes code ~ 4 times faster:
>         // for i in 0 ..< 4 { r[i] = self[i] + other[i] }
>         r[0] = self[0] + other[0]
>         r[1] = self[1] + other[1]
>         r[2] = self[2] + other[2]
>         r[3] = self[3] + other[3]
>         return r
>     }
>     /.../
> }
>
> Shouldn't the optimizer be able to handle that for loop and make the
> manual unrolling unnecessary?
>
> (compiled the test with -O -whole-module-optimizations, also tried
> -Ounchecked but with same results.)
>
> /Jens
>
> --
> bitCycle AB | Smedjegatan 12 | 742 32 Östhammar | Sweden
> http://www.bitcycle.com/
> Phone: +46-73-753 24 62
> E-mail: j...@bitcycle.com
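And for completeness, this is roughly how I drive the comparison. The DispatchTime-based harness, the iteration count and the float4x4 initializers below are just one plausible way to set it up, not my exact test code; as mentioned above, I compile with -O and whole-module optimization.

import simd
import Dispatch

let iterations = 10_000_000

func time(_ label: String, _ body: () -> Void) {
    let start = DispatchTime.now()
    body()
    let elapsed = Double(DispatchTime.now().uptimeNanoseconds - start.uptimeNanoseconds) / 1e9
    print(label, elapsed, "seconds")
}

// The generic nested vector from the sketch above the quote.
let a = M4x4.zero
var accGeneric = M4x4.zero

// The SIMD reference type.
let sa = float4x4(diagonal: SIMD4<Float>(1, 1, 1, 1))
var accSIMD = float4x4(diagonal: SIMD4<Float>(0, 0, 0, 0))

time("V4<V4<Float>> addition") {
    for _ in 0 ..< iterations { accGeneric = accGeneric + a }
}

time("float4x4 addition") {
    for _ in 0 ..< iterations { accSIMD = accSIMD + sa }
}

// Use the results so the loops can't be eliminated as dead code.
// (In a real test you'd also want non-constant inputs so nothing gets constant-folded.)
print(accGeneric[0][0], accSIMD[0][0])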