Tim Armstrong has posted comments on this change.

Change subject: IMPALA-4300: Speed up BloomFilter::Or with SIMD
......................................................................


Patch Set 1:

(13 comments)

http://gerrit.cloudera.org:8080/#/c/4813/1/be/src/util/bloom-filter.cc
File be/src/util/bloom-filter.cc:

Line 167
One reason auto-vectorisation must fail here is that the two directories may 
overlap (as far as the compiler knows). 

There's a pragma that asserts that this isn't the case.
https://gcc.gnu.org/onlinedocs/gcc-4.9.2/gcc/Loop-Specific-Pragmas.html 

It may also generate better code if you partially unroll the loop so that it 
operates on 32 bytes at a time - this avoids the need for the compiler to 
unroll into two loops with strides of 32 bytes and 8 bytes.


PS1, Line 162: DCHECK
DCHECK_EQ


Line 163:   auto simd_in = reinterpret_cast<const double*>(in);
I think double*/const double* is more readable than auto here - I'm used to 
looking at the left end of the line to find the type instead of having to parse 
the cast expression.


PS1, Line 166: int
Use size_t for consistency? I had to think about whether the implicit casts 
caused any problems.


PS1, Line 168: _mm256_loadu_pd
We should be able to use the aligned versions here, right, since we allocated 
the memory with posix_memalign? I.e. 
https://software.intel.com/en-us/node/524101


PS1, Line 181: should
It shouldn't actually with the code in it's current form since it has to assume 
the the directories may overlap.


Line 184:   // TODO: Tune gcc flags to auto-vectorize. This might not be 
possible.
See my other comment - should be possible.


Line 186:   // TODO: IMPALA-4312: Evaluate clang for builds other than ASAN.
Just remove this TODO? I don't think it's informative for people trying to 
understand this code.


Line 190:     auto simd_in = reinterpret_cast<const __m128i*>(&in.directory[0]);
This code is simple enough that I'm fine with moving forward with it even if we 
*could* spend some time getting the auto-vectorisation to work. So it WFM if 
you want to clean up the comments and leave this code as-is, or if you want to 
play around with auto-vectorisation futher.


PS1, Line 191: auto
I think just repeating the types const__m128i* or __m128i* is more readable 
than auto since it takes slightly less effort to figure out the type of the 
variable.


PS1, Line 192: auto
I'd prefer the int type was explicit here.


PS1, Line 196: int
size_t for consistency?


PS1, Line 199: _mm_storeu_si128
We should be able to use the aligned versions here too.


-- 
To view, visit http://gerrit.cloudera.org:8080/4813
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I840799d9cfb81285c796e2abfe2029bb869b0f67
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Jim Apple <[email protected]>
Gerrit-Reviewer: Tim Armstrong <[email protected]>
Gerrit-HasComments: Yes

Reply via email to