Tim Armstrong has posted comments on this change. Change subject: IMPALA-5266 Impala ABM / LZCNT support ......................................................................
Patch Set 8: (6 comments) http://gerrit.cloudera.org:8080/#/c/5821/8/be/src/benchmarks/bit-intrinsics-benchmark.cc File be/src/benchmarks/bit-intrinsics-benchmark.cc: Line 33: // PopcountSlowNoHw 78.8 79.6 80.3 0.0782X 0.0788X 0.0787X This should be the baseline for the popcount functions right? I think you'll get the right relative timings for those if you divide this up into multiple suites, one per function. http://gerrit.cloudera.org:8080/#/c/5821/8/be/src/codegen/CMakeLists.txt File be/src/codegen/CMakeLists.txt: Line 67: COMMAND ${LLVM_CLANG_EXECUTABLE} ${CLANG_IR_CXX_FLAGS} "-msse4.2" "-mlzcnt" ${CLANG_INCLUDE_FLAGS} ${IR_INPUT_FILES} -o ${IR_SSE_TMP_OUTPUT_FILE} I don't think this is safe - this tells clang it's safe to emit lzcnt anywhere. Ivy Bridge has SSE4.2 but not LZCNT from what I can tell and we only check for SSE4.2 before loading this version of the module (in LlvmCodeGen::CreateFromMemory()). Currently we have two ways of dealing with these architectural differences - this compile-time approach where we generate different versions of the module, some of which have the unsupported instructions removed at compile-time (see be/src/util/sse-util.h) - and a run-time approach where we apply a target attribute to the individual function and have a runtime check before calling the function. I think both approaches have downsides so I'm not sure what's right to do here. For the compile-time approach we need an extra IR module for every configuration. I wish we had a better solution for cases like this. http://gerrit.cloudera.org:8080/#/c/5821/8/be/src/util/bit-util.h File be/src/util/bit-util.h: Line 323: static inline int Log2FloorNonZero(uint32_t n) { > This actually ended up unused, as the 32-bit version of the RoundUpToPowerO Can we remove it? Line 387: static inline int32_t RoundUpToPowerOfTwo(int32_t v) { Do we need the 32-bit version? I think in most cases we should be doing the kind of calculations that use this (buffer sizes, etc) in 64-bit arithmetic - having the overloads could lead to subtle bugs. http://gerrit.cloudera.org:8080/#/c/5821/8/be/src/util/fixed-size-hash-table.h File be/src/util/fixed-size-hash-table.h: Line 53: DCHECK_EQ(capacity_, BitUtil::RoundUpToPowerOfTwo(static_cast<int64_t>(capacity_))); > Uses of unsigned integers actually get caught now. This should really be DCHECK(BitUtil::IsPowerOfTwo()) - I think this was written before that function existed. http://gerrit.cloudera.org:8080/#/c/5821/8/be/src/util/sse-util.h File be/src/util/sse-util.h: Line 190: #define POPCNT_popcnt_u32 _mm_popcnt_u32 I mentioned this elsewhere but I don't think these are always available when SSE4.2 is present, so it's not necessarilysafe to have these under the SSE4.2 ifdef. It looks like POPCNT is (mostly?) available when SSE42 is available: https://en.wikipedia.org/wiki/SSE4#Supporting_CPUs but LZCNT definitely isn't. I've also seen some cases where VMs report weird CPU info that doesn't match any real processor (e.g. "haswell" without AVX support), so we should be a little careful about what we assume. I can't think of a reason why SSE4.2 would be enabled but POPCNT disabled but who knows. -- To view, visit http://gerrit.cloudera.org:8080/5821 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I9f6a465ab4a9ee4f582847f8e211a779bdede3d2 Gerrit-PatchSet: 8 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Zach Amsden <[email protected]> Gerrit-Reviewer: Attila Jeges <[email protected]> Gerrit-Reviewer: Dan Hecht <[email protected]> Gerrit-Reviewer: Jim Apple <[email protected]> Gerrit-Reviewer: Marcel Kornacker <[email protected]> Gerrit-Reviewer: Tim Armstrong <[email protected]> Gerrit-Reviewer: Zach Amsden <[email protected]> Gerrit-HasComments: Yes
