Greg Rahn has uploaded a new patch set (#3). ( http://gerrit.cloudera.org:8080/13794 )
Change subject: IMPALA-8709: Add Damerau-Levenshtein edit distance built-in function
......................................................................

IMPALA-8709: Add Damerau-Levenshtein edit distance built-in function

This patch adds new built-in functions to calculate the restricted
Damerau-Levenshtein edit distance (optimal string alignment), implemented
as dle_dst() and damerau_levenshtein().

If either argument is NULL, or both are, the functions return NULL. This
differs from Netezza's dle_dst(), which returns the length of the non-NULL
value, or 0 if both values are NULL. The NULL behavior matches the existing
levenshtein() function.

Also clean up the levenshtein tests.

Testing:
- Added unit tests to expr-test.cc
- Manual testing on over 1400 string pairs from
  http://marvin.cs.uidaho.edu/misspell.html; results match Netezza

Change-Id: Ib759817ec15e7075bf49d51e494e45c8af4db94d
---
M be/src/exprs/expr-test.cc
M be/src/exprs/string-functions-ir.cc
M be/src/exprs/string-functions.h
M common/function-registry/impala_functions.py
4 files changed, 107 insertions(+), 30 deletions(-)

git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/94/13794/3
--
To view, visit http://gerrit.cloudera.org:8080/13794

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ib759817ec15e7075bf49d51e494e45c8af4db94d
Gerrit-Change-Number: 13794
Gerrit-PatchSet: 3
Gerrit-Owner: Greg Rahn <gr...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <csringho...@cloudera.com>
Gerrit-Reviewer: Greg Rahn <gr...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <borokna...@cloudera.com>
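For reviewers unfamiliar with the "restricted" variant, here is a minimal standalone C++ sketch of optimal string alignment (OSA) distance with the NULL semantics described in the commit message. It is not the code in string-functions-ir.cc and does not use Impala's UDF types; the function name OsaDistance and the use of std::optional<std::string> to stand in for a nullable SQL string are assumptions made for illustration. It compares bytes, so a multi-byte UTF-8 character counts as several characters.

```cpp
#include <algorithm>
#include <cassert>
#include <optional>
#include <string>
#include <vector>

// Hypothetical sketch (not Impala's implementation): restricted
// Damerau-Levenshtein, a.k.a. optimal string alignment (OSA), distance.
// std::nullopt models SQL NULL.
std::optional<int> OsaDistance(const std::optional<std::string>& a,
                               const std::optional<std::string>& b) {
  // NULL if either input is NULL, matching levenshtein() rather than
  // Netezza's dle_dst().
  if (!a || !b) return std::nullopt;
  const std::string& s = *a;
  const std::string& t = *b;
  const size_t n = s.size();
  const size_t m = t.size();
  // d[i][j] = distance between the first i bytes of s and first j bytes of t.
  std::vector<std::vector<int>> d(n + 1, std::vector<int>(m + 1, 0));
  for (size_t i = 0; i <= n; ++i) d[i][0] = static_cast<int>(i);
  for (size_t j = 0; j <= m; ++j) d[0][j] = static_cast<int>(j);
  for (size_t i = 1; i <= n; ++i) {
    for (size_t j = 1; j <= m; ++j) {
      const int cost = (s[i - 1] == t[j - 1]) ? 0 : 1;
      d[i][j] = std::min({d[i - 1][j] + 1,          // deletion
                          d[i][j - 1] + 1,          // insertion
                          d[i - 1][j - 1] + cost}); // substitution
      // Adjacent transposition. Only looking back at d[i-2][j-2] is what
      // makes this "restricted": a transposed pair is never edited again.
      if (i > 1 && j > 1 && s[i - 1] == t[j - 2] && s[i - 2] == t[j - 1]) {
        d[i][j] = std::min(d[i][j], d[i - 2][j - 2] + 1);
      }
    }
  }
  return d[n][m];
}
```

With this sketch, "abcd" vs. "acbd" has distance 1 (one transposition, where plain Levenshtein gives 2), while "ca" vs. "abc" has distance 3 under OSA even though the unrestricted Damerau-Levenshtein distance is 2, because OSA cannot insert a character between a transposed pair.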