Bankim Bhavsar has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/17376 )
Change subject: [util] Add special handling for empty strings in FastHash ...................................................................... [util] Add special handling for empty strings in FastHash Impala uses special handling for empty strings while hashing https://github.com/apache/impala/blob/master/be/src/runtime/raw-value.inline.h#L352 This leads to discrepancy between hash values for empty strings between Impala and Kudu. This change basically matches the behavior with Impala. Changes to the implementation of hashing function for empty strings generates a different value and hence incorrect results across different Kudu client and server versions. To resolve this issue, a new feature flag BLOOM_FILTER_PREDICATE_V2 has been added and the older one BLOOM_FILTER_PREDICATE has been marked as deprecated. Since FastHash is not used for any persistent data structures so there are no backward compatibility issues that needs special handling during upgrade. Discovered bugs in the Java implementation when a negative byte is promoted to a long and left shifted. Discovery was because of the magic value for empty strings/buffer. Thankfully the Java implementation of the FastHash is not used currently. Also fixed a bug in predicate test where flush would be missed due to increase in number of values which affected TSAN. Change-Id: Idb2d401ab692decd5bc03b48a96d710546c0039e Reviewed-on: http://gerrit.cloudera.org:8080/17376 Tested-by: Kudu Jenkins Reviewed-by: Grant Henke <[email protected]> Reviewed-by: Wenzhe Zhou <[email protected]> Reviewed-by: Andrew Wong <[email protected]> --- M java/kudu-client/src/main/java/org/apache/kudu/util/HashUtil.java M java/kudu-client/src/test/java/org/apache/kudu/util/TestFashHash.java M src/kudu/client/CMakeLists.txt M src/kudu/client/predicate-test.cc M src/kudu/client/scanner-internal.cc M src/kudu/tserver/tablet_service.cc M src/kudu/tserver/tserver.proto M src/kudu/util/hash_util-test.cc M src/kudu/util/hash_util.h 9 files changed, 99 insertions(+), 32 deletions(-) Approvals: Kudu Jenkins: Verified Grant Henke: Looks good to me, but someone else must approve Wenzhe Zhou: Looks good to me, but someone else must approve Andrew Wong: Looks good to me, approved -- To view, visit http://gerrit.cloudera.org:8080/17376 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: merged Gerrit-Change-Id: Idb2d401ab692decd5bc03b48a96d710546c0039e Gerrit-Change-Number: 17376 Gerrit-PatchSet: 6 Gerrit-Owner: Bankim Bhavsar <[email protected]> Gerrit-Reviewer: Andrew Wong <[email protected]> Gerrit-Reviewer: Bankim Bhavsar <[email protected]> Gerrit-Reviewer: Grant Henke <[email protected]> Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Reviewer: Wenzhe Zhou <[email protected]>
