Change in asterixdb[master]: ASTERIXDB-1778: Optimize the edit-distance-check function
Taewoo Kim has submitted this change and it was merged.

Change subject: ASTERIXDB-1778: Optimize the edit-distance-check function
..

ASTERIXDB-1778: Optimize the edit-distance-check function

 - Only calculate 2 * (threshold + 1) cells, rather than all cells per row.
 - Terminate the calculation steps early when it becomes obvious that
   the possible edit-distance value is greater than the given threshold.
   There is no reason to compute all cells in the two-dimensional array.
 - Move the location of IListIterator to Hyracks since we now have
   a CharacterIterator in a String. Change the name to ISequenceIterator.
 - Add the section for the function in the manual.
 - Remove the letter-counting filtering method since it is only applicable
   to strings in the ASCII range (0 ~ 127).

Change-Id: Ibc8729c4514bb87c347dd7d50358fd897b769977
Reviewed-on: https://asterix-gerrit.ics.uci.edu/1481
Sonar-Qube: Jenkins
Tested-by: Jenkins
BAD: Jenkins
Integration-Tests: Jenkins
Reviewed-by: Jianfeng Jia
---
M asterixdb/asterix-doc/src/main/markdown/builtins/5_similarity.md
M asterixdb/asterix-fuzzyjoin/pom.xml
M asterixdb/asterix-fuzzyjoin/src/main/java/org/apache/asterix/fuzzyjoin/similarity/IGenericSimilarityMetric.java
M asterixdb/asterix-fuzzyjoin/src/main/java/org/apache/asterix/fuzzyjoin/similarity/SimilarityMetric.java
M asterixdb/asterix-fuzzyjoin/src/main/java/org/apache/asterix/fuzzyjoin/similarity/SimilarityMetricEditDistance.java
M asterixdb/asterix-fuzzyjoin/src/main/java/org/apache/asterix/fuzzyjoin/similarity/SimilarityMetricJaccard.java
A asterixdb/asterix-fuzzyjoin/src/test/java/org/apache/asterix/fuzzyjoin/similarity/SimilarityMetricEditDistanceTest.java
M asterixdb/asterix-runtime/src/main/java/org/apache/asterix/runtime/evaluators/common/AbstractAsterixListIterator.java
M asterixdb/asterix-runtime/src/main/java/org/apache/asterix/runtime/evaluators/common/EditDistanceCheckEvaluator.java
M asterixdb/asterix-runtime/src/main/java/org/apache/asterix/runtime/evaluators/common/EditDistanceEvaluator.java
M asterixdb/asterix-runtime/src/main/java/org/apache/asterix/runtime/evaluators/common/SimilarityJaccardSortedCheckEvaluator.java
M asterixdb/asterix-runtime/src/main/java/org/apache/asterix/runtime/evaluators/common/SimilarityJaccardSortedEvaluator.java
R hyracks-fullstack/hyracks/hyracks-data/hyracks-data-std/src/main/java/org/apache/hyracks/data/std/util/ISequenceIterator.java
A hyracks-fullstack/hyracks/hyracks-data/hyracks-data-std/src/main/java/org/apache/hyracks/data/std/util/UTF8StringCharByCharIterator.java
14 files changed, 355 insertions(+), 281 deletions(-)

Approvals:
  Jianfeng Jia: Looks good to me, approved
  Jenkins: Verified; No violations found; No violations found; Verified

diff --git a/asterixdb/asterix-doc/src/main/markdown/builtins/5_similarity.md b/asterixdb/asterix-doc/src/main/markdown/builtins/5_similarity.md
index 89ef0f7..cb3318f 100644
--- a/asterixdb/asterix-doc/src/main/markdown/builtins/5_similarity.md
+++ b/asterixdb/asterix-doc/src/main/markdown/builtins/5_similarity.md
@@ -47,6 +47,36 @@
 2
+### edit_distance_check ###
+* Syntax:
+
+edit_distance_check(expression1, expression2, threshold)
+
+* Checks whether the edit distance of `expression1` and `expression2` is within a given threshold.
+
+* Arguments:
+* `expression1` : a `string` or a homogeneous `array` of a comparable item type.
+* `expression2` : The same type as `expression1`.
+* `threshold` : a `bigint` that represents the distance threshold.
+* Return Value:
+* an `array` with two items:
+* The first item contains a `boolean` value representing whether the edit distance of `expression1` and `expression2` is within the given threshold.
+* The second item contains an `integer` that represents the edit distance of `expression1` and `expression2` if the first item is true.
+* If the first item is false, then the second item is set to 2147483647.
+* `missing` if any argument is a `missing` value,
+* `null` if any argument is a `null` value but no argument is a `missing` value,
+* a type error will be raised if:
+* the first or second argument is any other non-string value,
+* or, the third argument is any other non-bigint value.
+* Note: an [n_gram index](similarity.html#UsingIndexesToSupportSimilarityQueries) can be utilized for this function.
+* Example:
+
+edit_distance_check("happy","hapr",2);
+
+
+* The expected result is:
+
+[ true, 2 ]
 ### edit_distance_contains ###
 * Syntax:
diff --git a/asterixdb/asterix-fuzzyjoin/pom.xml b/asterixdb/asterix-fuzzyjoin/pom.xml
index 0539782..9485852 100644
--- a/asterixdb/asterix-fuzzyjoin/pom.xml
+++
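The two optimizations named in the change description (computing only a band of cells around the diagonal in each row, and stopping as soon as every cell in the band exceeds the threshold) can be illustrated with the following sketch. It is not the code from SimilarityMetricEditDistance.java; the class name `BandedEditDistanceSketch`, the method `check`, and the constant `NOT_WITHIN` are hypothetical names chosen for illustration, and the sketch works on `CharSequence` rather than on an ISequenceIterator. The sentinel follows the manual text above, which says 2147483647 is reported when the check fails.

```java
final class BandedEditDistanceSketch {
    /** Hypothetical sentinel, mirroring the 2147483647 mentioned in the manual section. */
    static final int NOT_WITHIN = Integer.MAX_VALUE;

    /** Returns the edit distance of a and b if it is <= threshold, otherwise NOT_WITHIN. */
    static int check(CharSequence a, CharSequence b, int threshold) {
        int n = a.length();
        int m = b.length();
        // The length difference is a lower bound on the edit distance.
        if (threshold < 0 || Math.abs(n - m) > threshold) {
            return NOT_WITHIN;
        }
        // The distance never exceeds max(n, m), so capping avoids overflow for huge thresholds.
        int t = Math.min(threshold, Math.max(n, m));
        final int inf = t + 1; // every value above t is treated the same
        int[] prev = new int[m + 1];
        int[] curr = new int[m + 1];
        for (int j = 0; j <= m; j++) {
            prev[j] = Math.min(j, inf);
        }
        for (int i = 1; i <= n; i++) {
            // Only cells with |i - j| <= t can hold a value <= t.
            int lo = Math.max(1, i - t);
            int hi = Math.min(m, i + t);
            // Cell just left of the band: real column 0 if visible, else "infinity".
            curr[lo - 1] = (lo == 1) ? Math.min(i, inf) : inf;
            int rowMin = curr[lo - 1];
            for (int j = lo; j <= hi; j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                int best = Math.min(prev[j - 1] + cost,
                        Math.min(prev[j] + 1, curr[j - 1] + 1));
                curr[j] = Math.min(best, inf);
                rowMin = Math.min(rowMin, curr[j]);
            }
            // Cell just right of the band, read by the next row as prev[j].
            if (hi < m) {
                curr[hi + 1] = inf;
            }
            // Early termination: values never decrease along a path, so no later row can recover.
            if (rowMin > t) {
                return NOT_WITHIN;
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[m] <= t ? prev[m] : NOT_WITHIN;
    }

    public static void main(String[] args) {
        int d = check("happy", "hapr", 2);
        System.out.println("[ " + (d != NOT_WITHIN) + ", " + d + " ]");
    }
}
```

Run on the manual's example, `check("happy", "hapr", 2)` yields 2, which agrees with the documented result `[ true, 2 ]`. The band here spans at most 2 * t + 1 cells per row plus one boundary cell on each side, which is in the same spirit as the "2 * (threshold + 1) cells" figure in the description; the exact bookkeeping in the merged code may differ.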
Change in asterixdb[master]: ASTERIXDB-1778: Optimize the edit-distance-check function
Jenkins has posted comments on this change.

Change subject: ASTERIXDB-1778: Optimize the edit-distance-check function
..

Patch Set 8: Integration-Tests+1

Integration Tests Successful

https://asterix-jenkins.ics.uci.edu/job/asterix-gerrit-integration-tests/1803/ : SUCCESS

--
To view, visit https://asterix-gerrit.ics.uci.edu/1481
To unsubscribe, visit https://asterix-gerrit.ics.uci.edu/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ibc8729c4514bb87c347dd7d50358fd897b769977
Gerrit-PatchSet: 8
Gerrit-Project: asterixdb
Gerrit-Branch: master
Gerrit-Owner: Taewoo Kim
Gerrit-Reviewer: Chen Li
Gerrit-Reviewer: Jenkins
Gerrit-Reviewer: Jianfeng Jia
Gerrit-Reviewer: Steven Jacobs
Gerrit-Reviewer: Taewoo Kim
Gerrit-HasComments: No
Change in asterixdb[master]: ASTERIXDB-1778: Optimize the edit-distance-check function
Jenkins has posted comments on this change.

Change subject: ASTERIXDB-1778: Optimize the edit-distance-check function
..

Patch Set 8: BAD+1

BAD Compatibility Tests Successful

https://asterix-jenkins.ics.uci.edu/job/asterixbad-compat/472/ : SUCCESS

--
To view, visit https://asterix-gerrit.ics.uci.edu/1481
To unsubscribe, visit https://asterix-gerrit.ics.uci.edu/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ibc8729c4514bb87c347dd7d50358fd897b769977
Gerrit-PatchSet: 8
Gerrit-Project: asterixdb
Gerrit-Branch: master
Gerrit-Owner: Taewoo Kim
Gerrit-Reviewer: Chen Li
Gerrit-Reviewer: Jenkins
Gerrit-Reviewer: Jianfeng Jia
Gerrit-Reviewer: Steven Jacobs
Gerrit-Reviewer: Taewoo Kim
Gerrit-HasComments: No
Change in asterixdb[master]: ASTERIXDB-1778: Optimize the edit-distance-check function
Jenkins has posted comments on this change.

Change subject: ASTERIXDB-1778: Optimize the edit-distance-check function
..

Patch Set 8:

BAD Compatibility Tests Started

https://asterix-jenkins.ics.uci.edu/job/asterixbad-compat/472/

--
To view, visit https://asterix-gerrit.ics.uci.edu/1481
To unsubscribe, visit https://asterix-gerrit.ics.uci.edu/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ibc8729c4514bb87c347dd7d50358fd897b769977
Gerrit-PatchSet: 8
Gerrit-Project: asterixdb
Gerrit-Branch: master
Gerrit-Owner: Taewoo Kim
Gerrit-Reviewer: Chen Li
Gerrit-Reviewer: Jenkins
Gerrit-Reviewer: Jianfeng Jia
Gerrit-Reviewer: Steven Jacobs
Gerrit-Reviewer: Taewoo Kim
Gerrit-HasComments: No
Change in asterixdb[master]: ASTERIXDB-1778: Optimize the edit-distance-check function
Jenkins has posted comments on this change.

Change subject: ASTERIXDB-1778: Optimize the edit-distance-check function
..

Patch Set 8:

Integration Tests Started

https://asterix-jenkins.ics.uci.edu/job/asterix-gerrit-integration-tests/1803/

--
To view, visit https://asterix-gerrit.ics.uci.edu/1481
To unsubscribe, visit https://asterix-gerrit.ics.uci.edu/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ibc8729c4514bb87c347dd7d50358fd897b769977
Gerrit-PatchSet: 8
Gerrit-Project: asterixdb
Gerrit-Branch: master
Gerrit-Owner: Taewoo Kim
Gerrit-Reviewer: Chen Li
Gerrit-Reviewer: Jenkins
Gerrit-Reviewer: Jianfeng Jia
Gerrit-Reviewer: Steven Jacobs
Gerrit-Reviewer: Taewoo Kim
Gerrit-HasComments: No
Change in asterixdb[master]: ASTERIXDB-1778: Optimize the edit-distance-check function
Jenkins has posted comments on this change.

Change subject: ASTERIXDB-1778: Optimize the edit-distance-check function
..

Patch Set 8:

WARNING: THIS CHANGE CONTAINS CROSS-PRODUCT CHANGES IN:
* asterixdb
* hyracks-fullstack

PLEASE REVIEW CAREFULLY AND LOOK FOR API CHANGES!

--
To view, visit https://asterix-gerrit.ics.uci.edu/1481
To unsubscribe, visit https://asterix-gerrit.ics.uci.edu/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ibc8729c4514bb87c347dd7d50358fd897b769977
Gerrit-PatchSet: 8
Gerrit-Project: asterixdb
Gerrit-Branch: master
Gerrit-Owner: Taewoo Kim
Gerrit-Reviewer: Chen Li
Gerrit-Reviewer: Jenkins
Gerrit-Reviewer: Jianfeng Jia
Gerrit-Reviewer: Steven Jacobs
Gerrit-Reviewer: Taewoo Kim
Gerrit-HasComments: No
Change in asterixdb[master]: ASTERIXDB-1778: Optimize the edit-distance-check function
Jenkins has posted comments on this change.

Change subject: ASTERIXDB-1778: Optimize the edit-distance-check function
..

Patch Set 8:

Build Started https://asterix-jenkins.ics.uci.edu/job/asterix-gerrit-notopic/4180/

--
To view, visit https://asterix-gerrit.ics.uci.edu/1481
To unsubscribe, visit https://asterix-gerrit.ics.uci.edu/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ibc8729c4514bb87c347dd7d50358fd897b769977
Gerrit-PatchSet: 8
Gerrit-Project: asterixdb
Gerrit-Branch: master
Gerrit-Owner: Taewoo Kim
Gerrit-Reviewer: Chen Li
Gerrit-Reviewer: Jenkins
Gerrit-Reviewer: Jianfeng Jia
Gerrit-Reviewer: Steven Jacobs
Gerrit-Reviewer: Taewoo Kim
Gerrit-HasComments: No
Change in asterixdb[master]: ASTERIXDB-1778: Optimize the edit-distance-check function
Hello Jenkins,

I'd like you to reexamine a change. Please visit

    https://asterix-gerrit.ics.uci.edu/1481

to look at the new patch set (#8).

Change subject: ASTERIXDB-1778: Optimize the edit-distance-check function
..

ASTERIXDB-1778: Optimize the edit-distance-check function

 - Only calculate 2 * (threshold + 1) cells, rather than all cells per row.
 - Terminate the calculation steps early when it becomes obvious that
   the possible edit-distance value is greater than the given threshold.
   There is no reason to compute all cells in the two-dimensional array.
 - Move the location of IListIterator to Hyracks since we now have
   a CharacterIterator in a String. Change the name to ISequenceIterator.
 - Add the section for the function in the manual.
 - Remove the letter-counting filtering method since it is only applicable
   to strings in the ASCII range (0 ~ 127).

Change-Id: Ibc8729c4514bb87c347dd7d50358fd897b769977
---
M asterixdb/asterix-doc/src/main/markdown/builtins/5_similarity.md
M asterixdb/asterix-fuzzyjoin/pom.xml
M asterixdb/asterix-fuzzyjoin/src/main/java/org/apache/asterix/fuzzyjoin/similarity/IGenericSimilarityMetric.java
M asterixdb/asterix-fuzzyjoin/src/main/java/org/apache/asterix/fuzzyjoin/similarity/SimilarityMetric.java
M asterixdb/asterix-fuzzyjoin/src/main/java/org/apache/asterix/fuzzyjoin/similarity/SimilarityMetricEditDistance.java
M asterixdb/asterix-fuzzyjoin/src/main/java/org/apache/asterix/fuzzyjoin/similarity/SimilarityMetricJaccard.java
A asterixdb/asterix-fuzzyjoin/src/test/java/org/apache/asterix/fuzzyjoin/similarity/SimilarityMetricEditDistanceTest.java
M asterixdb/asterix-runtime/src/main/java/org/apache/asterix/runtime/evaluators/common/AbstractAsterixListIterator.java
M asterixdb/asterix-runtime/src/main/java/org/apache/asterix/runtime/evaluators/common/EditDistanceCheckEvaluator.java
M asterixdb/asterix-runtime/src/main/java/org/apache/asterix/runtime/evaluators/common/EditDistanceEvaluator.java
M asterixdb/asterix-runtime/src/main/java/org/apache/asterix/runtime/evaluators/common/SimilarityJaccardSortedCheckEvaluator.java
M asterixdb/asterix-runtime/src/main/java/org/apache/asterix/runtime/evaluators/common/SimilarityJaccardSortedEvaluator.java
R hyracks-fullstack/hyracks/hyracks-data/hyracks-data-std/src/main/java/org/apache/hyracks/data/std/util/ISequenceIterator.java
A hyracks-fullstack/hyracks/hyracks-data/hyracks-data-std/src/main/java/org/apache/hyracks/data/std/util/UTF8StringCharByCharIterator.java
14 files changed, 355 insertions(+), 281 deletions(-)

  git pull ssh://asterix-gerrit.ics.uci.edu:29418/asterixdb refs/changes/81/1481/8

--
To view, visit https://asterix-gerrit.ics.uci.edu/1481
To unsubscribe, visit https://asterix-gerrit.ics.uci.edu/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ibc8729c4514bb87c347dd7d50358fd897b769977
Gerrit-PatchSet: 8
Gerrit-Project: asterixdb
Gerrit-Branch: master
Gerrit-Owner: Taewoo Kim
Gerrit-Reviewer: Chen Li
Gerrit-Reviewer: Jenkins
Gerrit-Reviewer: Jianfeng Jia
Gerrit-Reviewer: Steven Jacobs
Gerrit-Reviewer: Taewoo Kim
Change in asterixdb[master]: ASTERIXDB-1778: Optimize the edit-distance-check function
Jenkins has posted comments on this change.

Change subject: ASTERIXDB-1778: Optimize the edit-distance-check function
..

Patch Set 7:

Build Started https://asterix-jenkins.ics.uci.edu/job/asterix-gerrit-notopic/4179/

--
To view, visit https://asterix-gerrit.ics.uci.edu/1481
To unsubscribe, visit https://asterix-gerrit.ics.uci.edu/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ibc8729c4514bb87c347dd7d50358fd897b769977
Gerrit-PatchSet: 7
Gerrit-Project: asterixdb
Gerrit-Branch: master
Gerrit-Owner: Taewoo Kim
Gerrit-Reviewer: Chen Li
Gerrit-Reviewer: Jenkins
Gerrit-Reviewer: Jianfeng Jia
Gerrit-Reviewer: Steven Jacobs
Gerrit-Reviewer: Taewoo Kim
Gerrit-HasComments: No
Change in asterixdb[master]: ASTERIXDB-1778: Optimize the edit-distance-check function
Hello Jenkins,

I'd like you to reexamine a change. Please visit

    https://asterix-gerrit.ics.uci.edu/1481

to look at the new patch set (#7).

Change subject: ASTERIXDB-1778: Optimize the edit-distance-check function
..

ASTERIXDB-1778: Optimize the edit-distance-check function

 - Only calculate 2 * (threshold + 1) cells, rather than all cells per row.
 - Terminate the calculation steps early when it becomes obvious that
   the possible edit-distance value is greater than the given threshold.
   There is no reason to compute all cells in the two-dimensional array.
 - Move the location of IListIterator to Hyracks since we now have
   a CharacterIterator in a String. Change the name to ISequenceIterator.
 - Add the section for the function in the manual.
 - Remove the letter-counting filtering method since it is only applicable
   to strings in the ASCII range (0 ~ 127).

Change-Id: Ibc8729c4514bb87c347dd7d50358fd897b769977
---
M asterixdb/asterix-doc/src/main/markdown/builtins/5_similarity.md
M asterixdb/asterix-fuzzyjoin/pom.xml
M asterixdb/asterix-fuzzyjoin/src/main/java/org/apache/asterix/fuzzyjoin/similarity/IGenericSimilarityMetric.java
M asterixdb/asterix-fuzzyjoin/src/main/java/org/apache/asterix/fuzzyjoin/similarity/SimilarityMetric.java
M asterixdb/asterix-fuzzyjoin/src/main/java/org/apache/asterix/fuzzyjoin/similarity/SimilarityMetricEditDistance.java
M asterixdb/asterix-fuzzyjoin/src/main/java/org/apache/asterix/fuzzyjoin/similarity/SimilarityMetricJaccard.java
A asterixdb/asterix-fuzzyjoin/src/test/java/org/apache/asterix/fuzzyjoin/similarity/SimilarityMetricEditDistanceTest.java
M asterixdb/asterix-runtime/src/main/java/org/apache/asterix/runtime/evaluators/common/AbstractAsterixListIterator.java
M asterixdb/asterix-runtime/src/main/java/org/apache/asterix/runtime/evaluators/common/EditDistanceCheckEvaluator.java
M asterixdb/asterix-runtime/src/main/java/org/apache/asterix/runtime/evaluators/common/EditDistanceEvaluator.java
M asterixdb/asterix-runtime/src/main/java/org/apache/asterix/runtime/evaluators/common/SimilarityJaccardSortedCheckEvaluator.java
M asterixdb/asterix-runtime/src/main/java/org/apache/asterix/runtime/evaluators/common/SimilarityJaccardSortedEvaluator.java
R hyracks-fullstack/hyracks/hyracks-data/hyracks-data-std/src/main/java/org/apache/hyracks/data/std/util/ISequenceIterator.java
A hyracks-fullstack/hyracks/hyracks-data/hyracks-data-std/src/main/java/org/apache/hyracks/data/std/util/UTF8StringCharByCharIterator.java
14 files changed, 360 insertions(+), 281 deletions(-)

  git pull ssh://asterix-gerrit.ics.uci.edu:29418/asterixdb refs/changes/81/1481/7

--
To view, visit https://asterix-gerrit.ics.uci.edu/1481
To unsubscribe, visit https://asterix-gerrit.ics.uci.edu/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ibc8729c4514bb87c347dd7d50358fd897b769977
Gerrit-PatchSet: 7
Gerrit-Project: asterixdb
Gerrit-Branch: master
Gerrit-Owner: Taewoo Kim
Gerrit-Reviewer: Chen Li
Gerrit-Reviewer: Jenkins
Gerrit-Reviewer: Jianfeng Jia
Gerrit-Reviewer: Steven Jacobs
Gerrit-Reviewer: Taewoo Kim
Change in asterixdb[master]: ASTERIXDB-1778: Optimize the edit-distance-check function
Taewoo Kim has posted comments on this change.

Change subject: ASTERIXDB-1778: Optimize the edit-distance-check function
..

Patch Set 6:

@Jianfeng: I now see what you mean. Since the main function is private and not exposed through the public interface, yes, I will add a unit test case for it. Makes sense.

--
To view, visit https://asterix-gerrit.ics.uci.edu/1481
To unsubscribe, visit https://asterix-gerrit.ics.uci.edu/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ibc8729c4514bb87c347dd7d50358fd897b769977
Gerrit-PatchSet: 6
Gerrit-Project: asterixdb
Gerrit-Branch: master
Gerrit-Owner: Taewoo Kim
Gerrit-Reviewer: Chen Li
Gerrit-Reviewer: Jenkins
Gerrit-Reviewer: Jianfeng Jia
Gerrit-Reviewer: Steven Jacobs
Gerrit-Reviewer: Taewoo Kim
Gerrit-HasComments: No
Change in asterixdb[master]: ASTERIXDB-1778: Optimize the edit-distance-check function
Jenkins has posted comments on this change.

Change subject: ASTERIXDB-1778: Optimize the edit-distance-check function
..

Patch Set 6:

Integration Tests Started

https://asterix-jenkins.ics.uci.edu/job/asterix-gerrit-integration-tests/1798/

--
To view, visit https://asterix-gerrit.ics.uci.edu/1481
To unsubscribe, visit https://asterix-gerrit.ics.uci.edu/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ibc8729c4514bb87c347dd7d50358fd897b769977
Gerrit-PatchSet: 6
Gerrit-Project: asterixdb
Gerrit-Branch: master
Gerrit-Owner: Taewoo Kim
Gerrit-Reviewer: Chen Li
Gerrit-Reviewer: Jenkins
Gerrit-Reviewer: Jianfeng Jia
Gerrit-Reviewer: Steven Jacobs
Gerrit-Reviewer: Taewoo Kim
Gerrit-HasComments: No
Change in asterixdb[master]: ASTERIXDB-1778: Optimize the edit-distance-check function
Taewoo Kim has posted comments on this change.

Change subject: ASTERIXDB-1778: Optimize the edit-distance-check function
..

Patch Set 6:

Regarding your comment, the edit-distance-check_strings test case already covers that corner case. The latter two queries trigger the early termination; I just verified this using "println".

    let $a := "Nalini Venkatasubramanian"
    let $b := "Nalini Wekatasupramanian"
    let $results :=
    [
      edit-distance-check($a, $b, 3),
      edit-distance-check($b, $a, 3),
      edit-distance-check($a, $b, 2),
      edit-distance-check($b, $a, 2)
    ]
    for $i in $results
    return $i

--
To view, visit https://asterix-gerrit.ics.uci.edu/1481
To unsubscribe, visit https://asterix-gerrit.ics.uci.edu/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ibc8729c4514bb87c347dd7d50358fd897b769977
Gerrit-PatchSet: 6
Gerrit-Project: asterixdb
Gerrit-Branch: master
Gerrit-Owner: Taewoo Kim
Gerrit-Reviewer: Chen Li
Gerrit-Reviewer: Jenkins
Gerrit-Reviewer: Jianfeng Jia
Gerrit-Reviewer: Steven Jacobs
Gerrit-Reviewer: Taewoo Kim
Gerrit-HasComments: No
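As a cross-check of why thresholds 3 and 2 behave differently for the pair of strings in this test, the distance can be recomputed with a plain full-matrix Levenshtein routine. This is a standalone illustration, not the evaluator under review; the class and method names are hypothetical. One edit script that turns the first name into the second is: substitute V with W, delete n, substitute b with p, which gives a distance of 3, so the checks with threshold 3 succeed and those with threshold 2 cannot.

```java
final class EditDistanceCornerCase {
    /** Textbook full-matrix Levenshtein distance (no banding, no early exit). */
    static int editDistance(String a, String b) {
        int[][] d = new int[a.length() + 1][b.length() + 1];
        for (int i = 0; i <= a.length(); i++) {
            d[i][0] = i; // delete all of a's first i characters
        }
        for (int j = 0; j <= b.length(); j++) {
            d[0][j] = j; // insert all of b's first j characters
        }
        for (int i = 1; i <= a.length(); i++) {
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                d[i][j] = Math.min(d[i - 1][j - 1] + cost,
                        Math.min(d[i - 1][j] + 1, d[i][j - 1] + 1));
            }
        }
        return d[a.length()][b.length()];
    }

    public static void main(String[] args) {
        String a = "Nalini Venkatasubramanian";
        String b = "Nalini Wekatasupramanian";
        int dist = editDistance(a, b);
        for (int threshold : new int[] {3, 2}) {
            System.out.println("threshold " + threshold + ": within = " + (dist <= threshold));
        }
    }
}
```

Because the distance is symmetric, swapping `$a` and `$b` in the query does not change the outcome; it only changes which string indexes the rows of the matrix.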
Change in asterixdb[master]: ASTERIXDB-1778: Optimize the edit-distance-check function
Jianfeng Jia has posted comments on this change.

Change subject: ASTERIXDB-1778: Optimize the edit-distance-check function
..

Patch Set 6:

Looks good. But I still feel there should be some simple *JUnit* tests for the edit distance, not only the AQL ones. The AQL (or SQL++) tests are too far removed from the code, and it is usually very difficult to hit the corner cases with them.

--
To view, visit https://asterix-gerrit.ics.uci.edu/1481
To unsubscribe, visit https://asterix-gerrit.ics.uci.edu/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ibc8729c4514bb87c347dd7d50358fd897b769977
Gerrit-PatchSet: 6
Gerrit-Project: asterixdb
Gerrit-Branch: master
Gerrit-Owner: Taewoo Kim
Gerrit-Reviewer: Chen Li
Gerrit-Reviewer: Jenkins
Gerrit-Reviewer: Jianfeng Jia
Gerrit-Reviewer: Steven Jacobs
Gerrit-Reviewer: Taewoo Kim
Gerrit-HasComments: No
Change in asterixdb[master]: ASTERIXDB-1778: Optimize the edit-distance-check function
Jenkins has posted comments on this change.

Change subject: ASTERIXDB-1778: Optimize the edit-distance-check function
..

Patch Set 6:

Integration Tests Started

https://asterix-jenkins.ics.uci.edu/job/asterix-gerrit-integration-tests/1791/

--
To view, visit https://asterix-gerrit.ics.uci.edu/1481
To unsubscribe, visit https://asterix-gerrit.ics.uci.edu/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ibc8729c4514bb87c347dd7d50358fd897b769977
Gerrit-PatchSet: 6
Gerrit-Project: asterixdb
Gerrit-Branch: master
Gerrit-Owner: Taewoo Kim
Gerrit-Reviewer: Chen Li
Gerrit-Reviewer: Jenkins
Gerrit-Reviewer: Jianfeng Jia
Gerrit-Reviewer: Steven Jacobs
Gerrit-Reviewer: Taewoo Kim
Gerrit-HasComments: No
Change in asterixdb[master]: ASTERIXDB-1778: Optimize the edit-distance-check function
Jenkins has posted comments on this change.

Change subject: ASTERIXDB-1778: Optimize the edit-distance-check function
..

Patch Set 6: Integration-Tests+1

Integration Tests Successful

https://asterix-jenkins.ics.uci.edu/job/asterix-gerrit-integration-tests/1788/ : SUCCESS

--
To view, visit https://asterix-gerrit.ics.uci.edu/1481
To unsubscribe, visit https://asterix-gerrit.ics.uci.edu/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ibc8729c4514bb87c347dd7d50358fd897b769977
Gerrit-PatchSet: 6
Gerrit-Project: asterixdb
Gerrit-Branch: master
Gerrit-Owner: Taewoo Kim
Gerrit-Reviewer: Chen Li
Gerrit-Reviewer: Jenkins
Gerrit-Reviewer: Jianfeng Jia
Gerrit-Reviewer: Steven Jacobs
Gerrit-Reviewer: Taewoo Kim
Gerrit-HasComments: No
Change in asterixdb[master]: ASTERIXDB-1778: Optimize the edit-distance-check function
Jenkins has posted comments on this change.

Change subject: ASTERIXDB-1778: Optimize the edit-distance-check function
..

Patch Set 6:

Integration Tests Started

https://asterix-jenkins.ics.uci.edu/job/asterix-gerrit-integration-tests/1788/

--
To view, visit https://asterix-gerrit.ics.uci.edu/1481
To unsubscribe, visit https://asterix-gerrit.ics.uci.edu/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ibc8729c4514bb87c347dd7d50358fd897b769977
Gerrit-PatchSet: 6
Gerrit-Project: asterixdb
Gerrit-Branch: master
Gerrit-Owner: Taewoo Kim
Gerrit-Reviewer: Chen Li
Gerrit-Reviewer: Jenkins
Gerrit-Reviewer: Jianfeng Jia
Gerrit-Reviewer: Steven Jacobs
Gerrit-Reviewer: Taewoo Kim
Gerrit-HasComments: No
Change in asterixdb[master]: ASTERIXDB-1778: Optimize the edit-distance-check function
Jenkins has posted comments on this change.

Change subject: ASTERIXDB-1778: Optimize the edit-distance-check function
..

Patch Set 6: BAD+1

BAD Compatibility Tests Successful

https://asterix-jenkins.ics.uci.edu/job/asterixbad-compat/456/ : SUCCESS

--
To view, visit https://asterix-gerrit.ics.uci.edu/1481
To unsubscribe, visit https://asterix-gerrit.ics.uci.edu/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ibc8729c4514bb87c347dd7d50358fd897b769977
Gerrit-PatchSet: 6
Gerrit-Project: asterixdb
Gerrit-Branch: master
Gerrit-Owner: Taewoo Kim
Gerrit-Reviewer: Chen Li
Gerrit-Reviewer: Jenkins
Gerrit-Reviewer: Jianfeng Jia
Gerrit-Reviewer: Steven Jacobs
Gerrit-Reviewer: Taewoo Kim
Gerrit-HasComments: No
Change in asterixdb[master]: ASTERIXDB-1778: Optimize the edit-distance-check function
Jenkins has posted comments on this change.

Change subject: ASTERIXDB-1778: Optimize the edit-distance-check function
..

Patch Set 6: Integration-Tests-1

Integration Tests Failed

https://asterix-jenkins.ics.uci.edu/job/asterix-gerrit-integration-tests/1785/ : UNSTABLE

--
To view, visit https://asterix-gerrit.ics.uci.edu/1481
To unsubscribe, visit https://asterix-gerrit.ics.uci.edu/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ibc8729c4514bb87c347dd7d50358fd897b769977
Gerrit-PatchSet: 6
Gerrit-Project: asterixdb
Gerrit-Branch: master
Gerrit-Owner: Taewoo Kim
Gerrit-Reviewer: Anon. E. Moose #1000151
Gerrit-Reviewer: Jenkins
Gerrit-Reviewer: Jianfeng Jia
Gerrit-Reviewer: Steven Jacobs
Gerrit-Reviewer: Taewoo Kim
Gerrit-HasComments: No
Change in asterixdb[master]: ASTERIXDB-1778: Optimize the edit-distance-check function
Jenkins has posted comments on this change.

Change subject: ASTERIXDB-1778: Optimize the edit-distance-check function
..

Patch Set 6:

BAD Compatibility Tests Started

https://asterix-jenkins.ics.uci.edu/job/asterixbad-compat/456/

--
To view, visit https://asterix-gerrit.ics.uci.edu/1481
To unsubscribe, visit https://asterix-gerrit.ics.uci.edu/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ibc8729c4514bb87c347dd7d50358fd897b769977
Gerrit-PatchSet: 6
Gerrit-Project: asterixdb
Gerrit-Branch: master
Gerrit-Owner: Taewoo Kim
Gerrit-Reviewer: Anon. E. Moose #1000151
Gerrit-Reviewer: Jenkins
Gerrit-Reviewer: Jianfeng Jia
Gerrit-Reviewer: Steven Jacobs
Gerrit-Reviewer: Taewoo Kim
Gerrit-HasComments: No
Change in asterixdb[master]: ASTERIXDB-1778: Optimize the edit-distance-check function
Jenkins has posted comments on this change.

Change subject: ASTERIXDB-1778: Optimize the edit-distance-check function
..

Patch Set 6: BAD-1

BAD Compatibility Tests Failed

https://asterix-jenkins.ics.uci.edu/job/asterixbad-compat/449/ : FAILURE

--
To view, visit https://asterix-gerrit.ics.uci.edu/1481
To unsubscribe, visit https://asterix-gerrit.ics.uci.edu/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ibc8729c4514bb87c347dd7d50358fd897b769977
Gerrit-PatchSet: 6
Gerrit-Project: asterixdb
Gerrit-Branch: master
Gerrit-Owner: Taewoo Kim
Gerrit-Reviewer: Anon. E. Moose #1000151
Gerrit-Reviewer: Jenkins
Gerrit-Reviewer: Jianfeng Jia
Gerrit-Reviewer: Steven Jacobs
Gerrit-Reviewer: Taewoo Kim
Gerrit-HasComments: No
Change in asterixdb[master]: ASTERIXDB-1778: Optimize the edit-distance-check function
Jenkins has posted comments on this change.

Change subject: ASTERIXDB-1778: Optimize the edit-distance-check function
..

Patch Set 6:

BAD Compatibility Tests Started

https://asterix-jenkins.ics.uci.edu/job/asterixbad-compat/449/

--
To view, visit https://asterix-gerrit.ics.uci.edu/1481
To unsubscribe, visit https://asterix-gerrit.ics.uci.edu/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ibc8729c4514bb87c347dd7d50358fd897b769977
Gerrit-PatchSet: 6
Gerrit-Project: asterixdb
Gerrit-Branch: master
Gerrit-Owner: Taewoo Kim
Gerrit-Reviewer: Anon. E. Moose #1000151
Gerrit-Reviewer: Jenkins
Gerrit-Reviewer: Jianfeng Jia
Gerrit-Reviewer: Steven Jacobs
Gerrit-Reviewer: Taewoo Kim
Gerrit-HasComments: No
Change in asterixdb[master]: ASTERIXDB-1778: Optimize the edit-distance-check function
Jenkins has posted comments on this change.

Change subject: ASTERIXDB-1778: Optimize the edit-distance-check function
..

Patch Set 6:

Integration Tests Started

https://asterix-jenkins.ics.uci.edu/job/asterix-gerrit-integration-tests/1785/

--
To view, visit https://asterix-gerrit.ics.uci.edu/1481
To unsubscribe, visit https://asterix-gerrit.ics.uci.edu/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ibc8729c4514bb87c347dd7d50358fd897b769977
Gerrit-PatchSet: 6
Gerrit-Project: asterixdb
Gerrit-Branch: master
Gerrit-Owner: Taewoo Kim
Gerrit-Reviewer: Anon. E. Moose #1000151
Gerrit-Reviewer: Jenkins
Gerrit-Reviewer: Jianfeng Jia
Gerrit-Reviewer: Steven Jacobs
Gerrit-Reviewer: Taewoo Kim
Gerrit-HasComments: No
Change in asterixdb[master]: ASTERIXDB-1778: Optimize the edit-distance-check function
Jenkins has posted comments on this change.

Change subject: ASTERIXDB-1778: Optimize the edit-distance-check function
..

Patch Set 6:

WARNING: THIS CHANGE CONTAINS CROSS-PRODUCT CHANGES IN:
* asterixdb
* hyracks-fullstack

PLEASE REVIEW CAREFULLY AND LOOK FOR API CHANGES!

--
To view, visit https://asterix-gerrit.ics.uci.edu/1481
To unsubscribe, visit https://asterix-gerrit.ics.uci.edu/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ibc8729c4514bb87c347dd7d50358fd897b769977
Gerrit-PatchSet: 6
Gerrit-Project: asterixdb
Gerrit-Branch: master
Gerrit-Owner: Taewoo Kim
Gerrit-Reviewer: Anon. E. Moose #1000151
Gerrit-Reviewer: Jenkins
Gerrit-Reviewer: Jianfeng Jia
Gerrit-Reviewer: Steven Jacobs
Gerrit-Reviewer: Taewoo Kim
Gerrit-HasComments: No
Change in asterixdb[master]: ASTERIXDB-1778: Optimize the edit-distance-check function
Jenkins has posted comments on this change.

Change subject: ASTERIXDB-1778: Optimize the edit-distance-check function
..

Patch Set 5: BAD-1

BAD Compatibility Tests Failed

https://asterix-jenkins.ics.uci.edu/job/asterixbad-compat/448/ : FAILURE

--
To view, visit https://asterix-gerrit.ics.uci.edu/1481
To unsubscribe, visit https://asterix-gerrit.ics.uci.edu/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ibc8729c4514bb87c347dd7d50358fd897b769977
Gerrit-PatchSet: 5
Gerrit-Project: asterixdb
Gerrit-Branch: master
Gerrit-Owner: Taewoo Kim
Gerrit-Reviewer: Anon. E. Moose #1000151
Gerrit-Reviewer: Jenkins
Gerrit-Reviewer: Jianfeng Jia
Gerrit-Reviewer: Steven Jacobs
Gerrit-Reviewer: Taewoo Kim
Gerrit-HasComments: No
Change in asterixdb[master]: ASTERIXDB-1778: Optimize the edit-distance-check function
Taewoo Kim has posted comments on this change.

Change subject: ASTERIXDB-1778: Optimize the edit-distance-check function
..

Patch Set 5:

@Jianfeng: the early termination logic is in place, and the existing test cases already cover it (e.g., edit-distance-check_strings).

--
To view, visit https://asterix-gerrit.ics.uci.edu/1481
To unsubscribe, visit https://asterix-gerrit.ics.uci.edu/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ibc8729c4514bb87c347dd7d50358fd897b769977
Gerrit-PatchSet: 5
Gerrit-Project: asterixdb
Gerrit-Branch: master
Gerrit-Owner: Taewoo Kim
Gerrit-Reviewer: Anon. E. Moose #1000151
Gerrit-Reviewer: Jenkins
Gerrit-Reviewer: Jianfeng Jia
Gerrit-Reviewer: Steven Jacobs
Gerrit-Reviewer: Taewoo Kim
Gerrit-HasComments: No
Change in asterixdb[master]: ASTERIXDB-1778: Optimize the edit-distance-check function
Jenkins has posted comments on this change.

Change subject: ASTERIXDB-1778: Optimize the edit-distance-check function
..

Patch Set 6:

Build Started https://asterix-jenkins.ics.uci.edu/job/asterix-gerrit-notopic/4155/

--
To view, visit https://asterix-gerrit.ics.uci.edu/1481
To unsubscribe, visit https://asterix-gerrit.ics.uci.edu/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ibc8729c4514bb87c347dd7d50358fd897b769977
Gerrit-PatchSet: 6
Gerrit-Project: asterixdb
Gerrit-Branch: master
Gerrit-Owner: Taewoo Kim
Gerrit-Reviewer: Anon. E. Moose #1000151
Gerrit-Reviewer: Jenkins
Gerrit-Reviewer: Jianfeng Jia
Gerrit-Reviewer: Steven Jacobs
Gerrit-Reviewer: Taewoo Kim
Gerrit-HasComments: No
Change in asterixdb[master]: ASTERIXDB-1778: Optimize the edit-distance-check function
Jenkins has posted comments on this change.

Change subject: ASTERIXDB-1778: Optimize the edit-distance-check function
..

Patch Set 5:

BAD Compatibility Tests Started

https://asterix-jenkins.ics.uci.edu/job/asterixbad-compat/448/

--
To view, visit https://asterix-gerrit.ics.uci.edu/1481
To unsubscribe, visit https://asterix-gerrit.ics.uci.edu/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ibc8729c4514bb87c347dd7d50358fd897b769977
Gerrit-PatchSet: 5
Gerrit-Project: asterixdb
Gerrit-Branch: master
Gerrit-Owner: Taewoo Kim
Gerrit-Reviewer: Anon. E. Moose #1000151
Gerrit-Reviewer: Jenkins
Gerrit-Reviewer: Jianfeng Jia
Gerrit-Reviewer: Steven Jacobs
Gerrit-Reviewer: Taewoo Kim
Gerrit-HasComments: No
Change in asterixdb[master]: ASTERIXDB-1778: Optimize the edit-distance-check function
Anon. E. Moose #1000151 has posted comments on this change.

Change subject: ASTERIXDB-1778: Optimize the edit-distance-check function
..

Patch Set 5:

(1 comment)

https://asterix-gerrit.ics.uci.edu/#/c/1481/5/asterixdb/asterix-runtime/src/main/java/org/apache/asterix/runtime/evaluators/common/EditDistanceEvaluator.java
File asterixdb/asterix-runtime/src/main/java/org/apache/asterix/runtime/evaluators/common/EditDistanceEvaluator.java:

PS5, Line 109: -1
Add a comment to the function to explain the purpose of "-1".

--
To view, visit https://asterix-gerrit.ics.uci.edu/1481
To unsubscribe, visit https://asterix-gerrit.ics.uci.edu/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ibc8729c4514bb87c347dd7d50358fd897b769977
Gerrit-PatchSet: 5
Gerrit-Project: asterixdb
Gerrit-Branch: master
Gerrit-Owner: Taewoo Kim
Gerrit-Reviewer: Anon. E. Moose #1000151
Gerrit-Reviewer: Jenkins
Gerrit-Reviewer: Jianfeng Jia
Gerrit-Reviewer: Steven Jacobs
Gerrit-Reviewer: Taewoo Kim
Gerrit-HasComments: Yes
Change in asterixdb[master]: ASTERIXDB-1778: Optimize the edit-distance-check function
Jenkins has posted comments on this change. Change subject: ASTERIXDB-1778: Optimize the edit-distance-check function .. Patch Set 5: WARNING: THIS CHANGE CONTAINS CROSS-PRODUCT CHANGES IN: * asterixdb * hyracks-fullstack PLEASE REVIEW CAREFULLY AND LOOK FOR API CHANGES! -- To view, visit https://asterix-gerrit.ics.uci.edu/1481 To unsubscribe, visit https://asterix-gerrit.ics.uci.edu/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ibc8729c4514bb87c347dd7d50358fd897b769977 Gerrit-PatchSet: 5 Gerrit-Project: asterixdb Gerrit-Branch: master Gerrit-Owner: Taewoo KimGerrit-Reviewer: Anon. E. Moose #1000151 Gerrit-Reviewer: Jenkins Gerrit-Reviewer: Jianfeng Jia Gerrit-Reviewer: Steven Jacobs Gerrit-Reviewer: Taewoo Kim Gerrit-HasComments: No
Change in asterixdb[master]: ASTERIXDB-1778: Optimize the edit-distance-check function
Anon. E. Moose #1000151 has posted comments on this change.

Change subject: ASTERIXDB-1778: Optimize the edit-distance-check function

Patch Set 4:

(5 comments)

First set of comments

https://asterix-gerrit.ics.uci.edu/#/c/1481/4/asterixdb/asterix-fuzzyjoin/src/main/java/org/apache/asterix/fuzzyjoin/similarity/IGenericSimilarityMetric.java
File asterixdb/asterix-fuzzyjoin/src/main/java/org/apache/asterix/fuzzyjoin/similarity/IGenericSimilarityMetric.java:

PS4, Line 29: float
> this function doesn't have to be exposed.
Is it better to rename "get" to "compute" since "get" seems to suggest it's a "getter"?

https://asterix-gerrit.ics.uci.edu/#/c/1481/4/asterixdb/asterix-fuzzyjoin/src/main/java/org/apache/asterix/fuzzyjoin/similarity/SimilarityMetricEditDistance.java
File asterixdb/asterix-fuzzyjoin/src/main/java/org/apache/asterix/fuzzyjoin/similarity/SimilarityMetricEditDistance.java:

PS4, Line 49: lists
"lists" -> "sequences" to be consistent with the parameter type?

PS4, Line 51: entire cells
"entire cells" -> "all the cells in the row"

PS4, Line 53: less than
"less than" -> "within"?

Line 99: if (canTerminateEarly) {
Where is this "canTerminateEarly" decided? I couldn't find it.

--
To view, visit https://asterix-gerrit.ics.uci.edu/1481
To unsubscribe, visit https://asterix-gerrit.ics.uci.edu/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ibc8729c4514bb87c347dd7d50358fd897b769977
Gerrit-PatchSet: 4
Gerrit-Project: asterixdb
Gerrit-Branch: master
Gerrit-Owner: Taewoo Kim
Gerrit-Reviewer: Anon. E. Moose #1000151
Gerrit-Reviewer: Jenkins
Gerrit-Reviewer: Jianfeng Jia
Gerrit-Reviewer: Steven Jacobs
Gerrit-Reviewer: Taewoo Kim
Gerrit-HasComments: Yes
Change in asterixdb[master]: ASTERIXDB-1778: Optimize the edit-distance-check function
Jenkins has posted comments on this change. Change subject: ASTERIXDB-1778: Optimize the edit-distance-check function .. Patch Set 5: Build Started https://asterix-jenkins.ics.uci.edu/job/asterix-gerrit-notopic/4154/ -- To view, visit https://asterix-gerrit.ics.uci.edu/1481 To unsubscribe, visit https://asterix-gerrit.ics.uci.edu/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ibc8729c4514bb87c347dd7d50358fd897b769977 Gerrit-PatchSet: 5 Gerrit-Project: asterixdb Gerrit-Branch: master Gerrit-Owner: Taewoo KimGerrit-Reviewer: Anon. E. Moose #1000151 Gerrit-Reviewer: Jenkins Gerrit-Reviewer: Jianfeng Jia Gerrit-Reviewer: Steven Jacobs Gerrit-Reviewer: Taewoo Kim Gerrit-HasComments: No
Change in asterixdb[master]: ASTERIXDB-1778: Optimize the edit-distance-check function
Hello Jenkins,

I'd like you to reexamine a change. Please visit https://asterix-gerrit.ics.uci.edu/1481 to look at the new patch set (#5).

Change subject: ASTERIXDB-1778: Optimize the edit-distance-check function

ASTERIXDB-1778: Optimize the edit-distance-check function

- Only calculate 2 * (threshold + 1) cells, rather than all cells per row.
- Terminate the calculation steps early when it becomes obvious that the possible edit-distance value is greater than the given threshold. There is no reason to compute all cells in the 2 dimensional array.
- Move the location of IListIterator to Hyracks since we now have a CharacterIterator in a String. Change the name to ISequenceIterator.
- Add the section for the function in the manual.

Change-Id: Ibc8729c4514bb87c347dd7d50358fd897b769977
---
M asterixdb/asterix-doc/src/main/markdown/builtins/5_similarity.md
M asterixdb/asterix-fuzzyjoin/pom.xml
M asterixdb/asterix-fuzzyjoin/src/main/java/org/apache/asterix/fuzzyjoin/similarity/IGenericSimilarityMetric.java
M asterixdb/asterix-fuzzyjoin/src/main/java/org/apache/asterix/fuzzyjoin/similarity/SimilarityMetric.java
M asterixdb/asterix-fuzzyjoin/src/main/java/org/apache/asterix/fuzzyjoin/similarity/SimilarityMetricEditDistance.java
M asterixdb/asterix-fuzzyjoin/src/main/java/org/apache/asterix/fuzzyjoin/similarity/SimilarityMetricJaccard.java
M asterixdb/asterix-runtime/src/main/java/org/apache/asterix/runtime/evaluators/common/AbstractAsterixListIterator.java
M asterixdb/asterix-runtime/src/main/java/org/apache/asterix/runtime/evaluators/common/EditDistanceCheckEvaluator.java
M asterixdb/asterix-runtime/src/main/java/org/apache/asterix/runtime/evaluators/common/EditDistanceEvaluator.java
R hyracks-fullstack/hyracks/hyracks-data/hyracks-data-std/src/main/java/org/apache/hyracks/data/std/util/ISequenceIterator.java
A hyracks-fullstack/hyracks/hyracks-data/hyracks-data-std/src/main/java/org/apache/hyracks/data/std/util/UTF8StringCharByCharIterator.java
11 files changed, 291 insertions(+), 239 deletions(-)

git pull ssh://asterix-gerrit.ics.uci.edu:29418/asterixdb refs/changes/81/1481/5

--
To view, visit https://asterix-gerrit.ics.uci.edu/1481
To unsubscribe, visit https://asterix-gerrit.ics.uci.edu/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ibc8729c4514bb87c347dd7d50358fd897b769977
Gerrit-PatchSet: 5
Gerrit-Project: asterixdb
Gerrit-Branch: master
Gerrit-Owner: Taewoo Kim
Gerrit-Reviewer: Anon. E. Moose #1000151
Gerrit-Reviewer: Jenkins
Gerrit-Reviewer: Jianfeng Jia
Gerrit-Reviewer: Steven Jacobs
Gerrit-Reviewer: Taewoo Kim
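For readers following the commit message above, its first two points (compute only a band of cells around the diagonal in each row, and stop as soon as every cell in a row exceeds the threshold) can be sketched roughly as below. This is a minimal standalone illustration, not the code in SimilarityMetricEditDistance: the class name `BandedEditDistance`, the method `check`, and the constant `THRESHOLD_NOT_SATISFIED` are assumptions made for the example, and it works on `CharSequence` rather than on an ISequenceIterator.

```java
import java.util.Arrays;

// Illustrative sketch only; names here are hypothetical, not AsterixDB APIs.
final class BandedEditDistance {
    // Hypothetical sentinel meaning "the distance is greater than the threshold".
    static final int THRESHOLD_NOT_SATISFIED = -1;
    // Large value for cells outside the band; halved so that INF + 1 cannot overflow.
    private static final int INF = Integer.MAX_VALUE / 2;

    static int check(CharSequence a, CharSequence b, int threshold) {
        if (threshold < 0) {
            throw new IllegalArgumentException("threshold must be non-negative");
        }
        int n = a.length();
        int m = b.length();
        // The length difference is a lower bound on the edit distance.
        if (Math.abs(n - m) > threshold) {
            return THRESHOLD_NOT_SATISFIED;
        }
        int[] prev = new int[m + 1];
        int[] curr = new int[m + 1];
        Arrays.fill(prev, INF);
        Arrays.fill(curr, INF);
        // Row 0: only the first (threshold + 1) cells can be within the threshold.
        for (int j = 0; j <= Math.min(m, threshold); j++) {
            prev[j] = j;
        }
        for (int i = 1; i <= n; i++) {
            // Only cells with |i - j| <= threshold can hold a value <= threshold.
            int lo = Math.max(1, i - threshold);
            int hi = Math.min(m, i + threshold);
            // Cell just left of the band: column 0 while still reachable, else INF.
            curr[lo - 1] = (i <= threshold) ? i : INF;
            int rowMin = curr[lo - 1];
            for (int j = lo; j <= hi; j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                int v = Math.min(prev[j - 1] + cost,
                        Math.min(prev[j] + 1, curr[j - 1] + 1));
                curr[j] = v;
                rowMin = Math.min(rowMin, v);
            }
            // Early termination: no later row can get back under the threshold.
            if (rowMin > threshold) {
                return THRESHOLD_NOT_SATISFIED;
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[m] <= threshold ? prev[m] : THRESHOLD_NOT_SATISFIED;
    }

    public static void main(String[] args) {
        System.out.println(check("kitten", "sitting", 3));
        System.out.println(check("kitten", "sitting", 2));
    }
}
```

In this sketch each row touches at most 2 * threshold + 1 band cells plus one boundary cell, which is in line with the "2 * (threshold + 1) cells" figure in the commit message; the exact bookkeeping in the real patch may differ.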
Change in asterixdb[master]: ASTERIXDB-1778: Optimize the edit-distance-check function
Taewoo Kim has posted comments on this change.

Change subject: ASTERIXDB-1778: Optimize the edit-distance-check function

Patch Set 3:

(11 comments)

@Jianfeng: Thanks!

https://asterix-gerrit.ics.uci.edu/#/c/1481/4/asterixdb/asterix-fuzzyjoin/src/main/java/org/apache/asterix/fuzzyjoin/similarity/IGenericSimilarityMetric.java
File asterixdb/asterix-fuzzyjoin/src/main/java/org/apache/asterix/fuzzyjoin/similarity/IGenericSimilarityMetric.java:

PS4, Line 26:
> use javadoc syntax?
Done

PS4, Line 29: float
> this function doesn't have to be exposed.
Done

PS4, Line 32: returns
> use javadoc?
Done

https://asterix-gerrit.ics.uci.edu/#/c/1481/2/asterixdb/asterix-fuzzyjoin/src/main/java/org/apache/asterix/fuzzyjoin/similarity/SimilarityMetric.java
File asterixdb/asterix-fuzzyjoin/src/main/java/org/apache/asterix/fuzzyjoin/similarity/SimilarityMetric.java:

Line 28: public static int getIntersectSize(ISequenceIterator tokensX, ISequenceIterator tokensY)
> MAJOR SonarQube violation:
Done

https://asterix-gerrit.ics.uci.edu/#/c/1481/3/asterixdb/asterix-fuzzyjoin/src/main/java/org/apache/asterix/fuzzyjoin/similarity/SimilarityMetricEditDistance.java
File asterixdb/asterix-fuzzyjoin/src/main/java/org/apache/asterix/fuzzyjoin/similarity/SimilarityMetricEditDistance.java:

PS3, Line 63: public
> is it necessary to have a `public interface`?
Agreed and done.

Line 70: boolean canTerminateEarly = edThresh >= 0 ? true : false;
> *boolean canTerminateEarly = edThresh >= 0* is enough.
The caller of this function already checks your if condition. Since we changed this to a private function, I think it's OK not to add the if condition.

PS3, Line 131: 1
> can you define a static variable and give `-1` a good name?
Done

PS3, Line 144: Gets
> do we really need this comment? :-)
Done

PS3, Line 157: -
> it's worth explaining the meaning of -1
Done

PS3, Line 168: public
> is it necessary to be a public method?
Yes. It is being called from outside this class.

PS3, Line 219: public
> public -> private?
It is being called from outside.

--
To view, visit https://asterix-gerrit.ics.uci.edu/1481
To unsubscribe, visit https://asterix-gerrit.ics.uci.edu/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ibc8729c4514bb87c347dd7d50358fd897b769977
Gerrit-PatchSet: 3
Gerrit-Project: asterixdb
Gerrit-Branch: master
Gerrit-Owner: Taewoo Kim
Gerrit-Reviewer: Anon. E. Moose #1000151
Gerrit-Reviewer: Jenkins
Gerrit-Reviewer: Jianfeng Jia
Gerrit-Reviewer: Steven Jacobs
Gerrit-Reviewer: Taewoo Kim
Gerrit-HasComments: Yes
Change in asterixdb[master]: ASTERIXDB-1778: Optimize the edit-distance-check function
Jenkins has posted comments on this change. Change subject: ASTERIXDB-1778: Optimize the edit-distance-check function .. Patch Set 4: Integration-Tests-1 Integration Tests Failed https://asterix-jenkins.ics.uci.edu/job/asterix-gerrit-integration-tests/1775/ : UNSTABLE -- To view, visit https://asterix-gerrit.ics.uci.edu/1481 To unsubscribe, visit https://asterix-gerrit.ics.uci.edu/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ibc8729c4514bb87c347dd7d50358fd897b769977 Gerrit-PatchSet: 4 Gerrit-Project: asterixdb Gerrit-Branch: master Gerrit-Owner: Taewoo KimGerrit-Reviewer: Anon. E. Moose #1000151 Gerrit-Reviewer: Jenkins Gerrit-Reviewer: Jianfeng Jia Gerrit-Reviewer: Steven Jacobs Gerrit-Reviewer: Taewoo Kim Gerrit-HasComments: No
Change in asterixdb[master]: ASTERIXDB-1778: Optimize the edit-distance-check function
Jenkins has posted comments on this change. Change subject: ASTERIXDB-1778: Optimize the edit-distance-check function .. Patch Set 4: BAD-1 BAD Compatibility Tests Failed https://asterix-jenkins.ics.uci.edu/job/asterixbad-compat/442/ : FAILURE -- To view, visit https://asterix-gerrit.ics.uci.edu/1481 To unsubscribe, visit https://asterix-gerrit.ics.uci.edu/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ibc8729c4514bb87c347dd7d50358fd897b769977 Gerrit-PatchSet: 4 Gerrit-Project: asterixdb Gerrit-Branch: master Gerrit-Owner: Taewoo KimGerrit-Reviewer: Jenkins Gerrit-Reviewer: Jianfeng Jia Gerrit-Reviewer: Steven Jacobs Gerrit-Reviewer: Taewoo Kim Gerrit-HasComments: No
Change in asterixdb[master]: ASTERIXDB-1778: Optimize the edit-distance-check function
Jenkins has posted comments on this change. Change subject: ASTERIXDB-1778: Optimize the edit-distance-check function .. Patch Set 4: BAD Compatibility Tests Started https://asterix-jenkins.ics.uci.edu/job/asterixbad-compat/442/ -- To view, visit https://asterix-gerrit.ics.uci.edu/1481 To unsubscribe, visit https://asterix-gerrit.ics.uci.edu/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ibc8729c4514bb87c347dd7d50358fd897b769977 Gerrit-PatchSet: 4 Gerrit-Project: asterixdb Gerrit-Branch: master Gerrit-Owner: Taewoo KimGerrit-Reviewer: Jenkins Gerrit-Reviewer: Jianfeng Jia Gerrit-Reviewer: Steven Jacobs Gerrit-Reviewer: Taewoo Kim Gerrit-HasComments: No
Change in asterixdb[master]: ASTERIXDB-1778: Optimize the edit-distance-check function
Jenkins has posted comments on this change. Change subject: ASTERIXDB-1778: Optimize the edit-distance-check function .. Patch Set 4: Integration Tests Started https://asterix-jenkins.ics.uci.edu/job/asterix-gerrit-integration-tests/1775/ -- To view, visit https://asterix-gerrit.ics.uci.edu/1481 To unsubscribe, visit https://asterix-gerrit.ics.uci.edu/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ibc8729c4514bb87c347dd7d50358fd897b769977 Gerrit-PatchSet: 4 Gerrit-Project: asterixdb Gerrit-Branch: master Gerrit-Owner: Taewoo KimGerrit-Reviewer: Jenkins Gerrit-Reviewer: Jianfeng Jia Gerrit-Reviewer: Steven Jacobs Gerrit-Reviewer: Taewoo Kim Gerrit-HasComments: No
Change in asterixdb[master]: ASTERIXDB-1778: Optimize the edit-distance-check function
Jenkins has posted comments on this change. Change subject: ASTERIXDB-1778: Optimize the edit-distance-check function .. Patch Set 3: Integration-Tests+1 Integration Tests Successful https://asterix-jenkins.ics.uci.edu/job/asterix-gerrit-integration-tests/1773/ : SUCCESS -- To view, visit https://asterix-gerrit.ics.uci.edu/1481 To unsubscribe, visit https://asterix-gerrit.ics.uci.edu/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ibc8729c4514bb87c347dd7d50358fd897b769977 Gerrit-PatchSet: 3 Gerrit-Project: asterixdb Gerrit-Branch: master Gerrit-Owner: Taewoo KimGerrit-Reviewer: Jenkins Gerrit-Reviewer: Jianfeng Jia Gerrit-Reviewer: Steven Jacobs Gerrit-Reviewer: Taewoo Kim Gerrit-HasComments: No
Change in asterixdb[master]: ASTERIXDB-1778: Optimize the edit-distance-check function
Jianfeng Jia has posted comments on this change.

Change subject: ASTERIXDB-1778: Optimize the edit-distance-check function

Patch Set 4:

(10 comments)

Just some minor comments.

https://asterix-gerrit.ics.uci.edu/#/c/1481/4/asterixdb/asterix-fuzzyjoin/src/main/java/org/apache/asterix/fuzzyjoin/similarity/IGenericSimilarityMetric.java
File asterixdb/asterix-fuzzyjoin/src/main/java/org/apache/asterix/fuzzyjoin/similarity/IGenericSimilarityMetric.java:

PS4, Line 26:
use javadoc syntax?

PS4, Line 29: float
this function doesn't have to be exposed.

PS4, Line 32: returns
use javadoc?

https://asterix-gerrit.ics.uci.edu/#/c/1481/3/asterixdb/asterix-fuzzyjoin/src/main/java/org/apache/asterix/fuzzyjoin/similarity/SimilarityMetricEditDistance.java
File asterixdb/asterix-fuzzyjoin/src/main/java/org/apache/asterix/fuzzyjoin/similarity/SimilarityMetricEditDistance.java:

PS3, Line 63: public
is it necessary to have a `public interface`? I think it can just be a private function of this class.

Line 70: boolean canTerminateEarly = edThresh >= 0 ? true : false;
*boolean canTerminateEarly = edThresh >= 0* is enough. and if edThresh > min(flLen, slLen) should also be false?

PS3, Line 131: 1
can you define a static variable and give `-1` a good name?

PS3, Line 144: Gets
do we really need this comment? :-)

PS3, Line 157: -
it's worth explaining the meaning of -1

PS3, Line 168: public
is it necessary to be a public method?

PS3, Line 219: public
public -> private?

--
To view, visit https://asterix-gerrit.ics.uci.edu/1481
To unsubscribe, visit https://asterix-gerrit.ics.uci.edu/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ibc8729c4514bb87c347dd7d50358fd897b769977
Gerrit-PatchSet: 4
Gerrit-Project: asterixdb
Gerrit-Branch: master
Gerrit-Owner: Taewoo Kim
Gerrit-Reviewer: Jenkins
Gerrit-Reviewer: Jianfeng Jia
Gerrit-Reviewer: Steven Jacobs
Gerrit-Reviewer: Taewoo Kim
Gerrit-HasComments: Yes
Change in asterixdb[master]: ASTERIXDB-1778: Optimize the edit-distance-check function
Jenkins has posted comments on this change. Change subject: ASTERIXDB-1778: Optimize the edit-distance-check function .. Patch Set 4: WARNING: THIS CHANGE CONTAINS CROSS-PRODUCT CHANGES IN: * asterixdb * hyracks-fullstack PLEASE REVIEW CAREFULLY AND LOOK FOR API CHANGES! -- To view, visit https://asterix-gerrit.ics.uci.edu/1481 To unsubscribe, visit https://asterix-gerrit.ics.uci.edu/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ibc8729c4514bb87c347dd7d50358fd897b769977 Gerrit-PatchSet: 4 Gerrit-Project: asterixdb Gerrit-Branch: master Gerrit-Owner: Taewoo KimGerrit-Reviewer: Jenkins Gerrit-Reviewer: Jianfeng Jia Gerrit-Reviewer: Steven Jacobs Gerrit-Reviewer: Taewoo Kim Gerrit-HasComments: No
Change in asterixdb[master]: ASTERIXDB-1778: Optimize the edit-distance-check function
Jenkins has posted comments on this change. Change subject: ASTERIXDB-1778: Optimize the edit-distance-check function .. Patch Set 4: Build Started https://asterix-jenkins.ics.uci.edu/job/asterix-gerrit-notopic/4147/ -- To view, visit https://asterix-gerrit.ics.uci.edu/1481 To unsubscribe, visit https://asterix-gerrit.ics.uci.edu/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ibc8729c4514bb87c347dd7d50358fd897b769977 Gerrit-PatchSet: 4 Gerrit-Project: asterixdb Gerrit-Branch: master Gerrit-Owner: Taewoo KimGerrit-Reviewer: Jenkins Gerrit-Reviewer: Jianfeng Jia Gerrit-Reviewer: Steven Jacobs Gerrit-Reviewer: Taewoo Kim Gerrit-HasComments: No
Change in asterixdb[master]: ASTERIXDB-1778: Optimize the edit-distance-check function
Hello Jenkins,

I'd like you to reexamine a change. Please visit https://asterix-gerrit.ics.uci.edu/1481 to look at the new patch set (#4).

Change subject: ASTERIXDB-1778: Optimize the edit-distance-check function

ASTERIXDB-1778: Optimize the edit-distance-check function

- Only calculate 2 * (threshold + 1) cells, rather than all cells per row.
- Terminate the calculation steps early when it becomes obvious that the possible edit-distance value is greater than the given threshold. There is no reason to compute all cells in the 2 dimensional array.
- Move the location of IListIterator to Hyracks since we now have a CharacterIterator in a String. Change the name to ISequenceIterator.
- Add the section for the function in the manual.

Change-Id: Ibc8729c4514bb87c347dd7d50358fd897b769977
---
M asterixdb/asterix-doc/src/main/markdown/builtins/5_similarity.md
M asterixdb/asterix-fuzzyjoin/pom.xml
M asterixdb/asterix-fuzzyjoin/src/main/java/org/apache/asterix/fuzzyjoin/similarity/IGenericSimilarityMetric.java
M asterixdb/asterix-fuzzyjoin/src/main/java/org/apache/asterix/fuzzyjoin/similarity/SimilarityMetric.java
M asterixdb/asterix-fuzzyjoin/src/main/java/org/apache/asterix/fuzzyjoin/similarity/SimilarityMetricEditDistance.java
M asterixdb/asterix-fuzzyjoin/src/main/java/org/apache/asterix/fuzzyjoin/similarity/SimilarityMetricJaccard.java
M asterixdb/asterix-runtime/src/main/java/org/apache/asterix/runtime/evaluators/common/AbstractAsterixListIterator.java
M asterixdb/asterix-runtime/src/main/java/org/apache/asterix/runtime/evaluators/common/EditDistanceCheckEvaluator.java
M asterixdb/asterix-runtime/src/main/java/org/apache/asterix/runtime/evaluators/common/EditDistanceEvaluator.java
M asterixdb/asterix-runtime/src/main/java/org/apache/asterix/runtime/evaluators/common/SimilarityJaccardSortedEvaluator.java
R hyracks-fullstack/hyracks/hyracks-data/hyracks-data-std/src/main/java/org/apache/hyracks/data/std/util/ISequenceIterator.java
A hyracks-fullstack/hyracks/hyracks-data/hyracks-data-std/src/main/java/org/apache/hyracks/data/std/util/UTF8StringCharByCharIterator.java
12 files changed, 255 insertions(+), 226 deletions(-)

git pull ssh://asterix-gerrit.ics.uci.edu:29418/asterixdb refs/changes/81/1481/4

--
To view, visit https://asterix-gerrit.ics.uci.edu/1481
To unsubscribe, visit https://asterix-gerrit.ics.uci.edu/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ibc8729c4514bb87c347dd7d50358fd897b769977
Gerrit-PatchSet: 4
Gerrit-Project: asterixdb
Gerrit-Branch: master
Gerrit-Owner: Taewoo Kim
Gerrit-Reviewer: Jenkins
Gerrit-Reviewer: Jianfeng Jia
Gerrit-Reviewer: Steven Jacobs
Gerrit-Reviewer: Taewoo Kim
Change in asterixdb[master]: ASTERIXDB-1778: optimize the edit-distance-check function
Jenkins has posted comments on this change. Change subject: ASTERIXDB-1778: optimize the edit-distance-check function .. Patch Set 3: BAD-1 BAD Compatibility Tests Failed https://asterix-jenkins.ics.uci.edu/job/asterixbad-compat/440/ : FAILURE -- To view, visit https://asterix-gerrit.ics.uci.edu/1481 To unsubscribe, visit https://asterix-gerrit.ics.uci.edu/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ibc8729c4514bb87c347dd7d50358fd897b769977 Gerrit-PatchSet: 3 Gerrit-Project: asterixdb Gerrit-Branch: master Gerrit-Owner: Taewoo KimGerrit-Reviewer: Jenkins Gerrit-Reviewer: Jianfeng Jia Gerrit-Reviewer: Steven Jacobs Gerrit-Reviewer: Taewoo Kim Gerrit-HasComments: No
Change in asterixdb[master]: ASTERIXDB-1778: optimize the edit-distance-check function
Jenkins has posted comments on this change. Change subject: ASTERIXDB-1778: optimize the edit-distance-check function .. Patch Set 3: BAD Compatibility Tests Started https://asterix-jenkins.ics.uci.edu/job/asterixbad-compat/440/ -- To view, visit https://asterix-gerrit.ics.uci.edu/1481 To unsubscribe, visit https://asterix-gerrit.ics.uci.edu/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ibc8729c4514bb87c347dd7d50358fd897b769977 Gerrit-PatchSet: 3 Gerrit-Project: asterixdb Gerrit-Branch: master Gerrit-Owner: Taewoo KimGerrit-Reviewer: Jenkins Gerrit-Reviewer: Jianfeng Jia Gerrit-Reviewer: Steven Jacobs Gerrit-Reviewer: Taewoo Kim Gerrit-HasComments: No
Change in asterixdb[master]: ASTERIXDB-1778: optimize the edit-distance-check function
Jenkins has posted comments on this change. Change subject: ASTERIXDB-1778: optimize the edit-distance-check function .. Patch Set 3: Integration Tests Started https://asterix-jenkins.ics.uci.edu/job/asterix-gerrit-integration-tests/1773/ -- To view, visit https://asterix-gerrit.ics.uci.edu/1481 To unsubscribe, visit https://asterix-gerrit.ics.uci.edu/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ibc8729c4514bb87c347dd7d50358fd897b769977 Gerrit-PatchSet: 3 Gerrit-Project: asterixdb Gerrit-Branch: master Gerrit-Owner: Taewoo KimGerrit-Reviewer: Jenkins Gerrit-Reviewer: Jianfeng Jia Gerrit-Reviewer: Steven Jacobs Gerrit-Reviewer: Taewoo Kim Gerrit-HasComments: No
Change in asterixdb[master]: ASTERIXDB-1778: optimize the edit-distance-check function
Jenkins has posted comments on this change. Change subject: ASTERIXDB-1778: optimize the edit-distance-check function .. Patch Set 3: WARNING: THIS CHANGE CONTAINS CROSS-PRODUCT CHANGES IN: * asterixdb * hyracks-fullstack PLEASE REVIEW CAREFULLY AND LOOK FOR API CHANGES! -- To view, visit https://asterix-gerrit.ics.uci.edu/1481 To unsubscribe, visit https://asterix-gerrit.ics.uci.edu/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ibc8729c4514bb87c347dd7d50358fd897b769977 Gerrit-PatchSet: 3 Gerrit-Project: asterixdb Gerrit-Branch: master Gerrit-Owner: Taewoo KimGerrit-Reviewer: Jenkins Gerrit-Reviewer: Jianfeng Jia Gerrit-Reviewer: Steven Jacobs Gerrit-Reviewer: Taewoo Kim Gerrit-HasComments: No
Change in asterixdb[master]: ASTERIXDB-1778: optimize the edit-distance-check function
Jenkins has posted comments on this change. Change subject: ASTERIXDB-1778: optimize the edit-distance-check function .. Patch Set 3: Build Started https://asterix-jenkins.ics.uci.edu/job/asterix-gerrit-notopic/4144/ -- To view, visit https://asterix-gerrit.ics.uci.edu/1481 To unsubscribe, visit https://asterix-gerrit.ics.uci.edu/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ibc8729c4514bb87c347dd7d50358fd897b769977 Gerrit-PatchSet: 3 Gerrit-Project: asterixdb Gerrit-Branch: master Gerrit-Owner: Taewoo KimGerrit-Reviewer: Jenkins Gerrit-Reviewer: Jianfeng Jia Gerrit-Reviewer: Steven Jacobs Gerrit-Reviewer: Taewoo Kim Gerrit-HasComments: No
Change in asterixdb[master]: ASTERIXDB-1778: optimize the edit-distance-check function
Hello Jenkins,

I'd like you to reexamine a change. Please visit https://asterix-gerrit.ics.uci.edu/1481 to look at the new patch set (#3).

Change subject: ASTERIXDB-1778: optimize the edit-distance-check function

ASTERIXDB-1778: optimize the edit-distance-check function

- Only calculate 2 * (threshold + 1) cells, rather than all cells per row.
- Terminate the calculation steps early when it becomes obvious that the possible edit-distance value is greater than the given threshold. There is no reason to compute all cells in the 2-dimensional array.
- Move the location of IListIterator to Hyracks since we now have a CharacterIterator in a String.

Change-Id: Ibc8729c4514bb87c347dd7d50358fd897b769977
---
M asterixdb/asterix-doc/src/main/markdown/builtins/5_similarity.md
M asterixdb/asterix-fuzzyjoin/pom.xml
M asterixdb/asterix-fuzzyjoin/src/main/java/org/apache/asterix/fuzzyjoin/similarity/IGenericSimilarityMetric.java
M asterixdb/asterix-fuzzyjoin/src/main/java/org/apache/asterix/fuzzyjoin/similarity/SimilarityMetric.java
M asterixdb/asterix-fuzzyjoin/src/main/java/org/apache/asterix/fuzzyjoin/similarity/SimilarityMetricEditDistance.java
M asterixdb/asterix-fuzzyjoin/src/main/java/org/apache/asterix/fuzzyjoin/similarity/SimilarityMetricJaccard.java
M asterixdb/asterix-runtime/src/main/java/org/apache/asterix/runtime/evaluators/common/AbstractAsterixListIterator.java
M asterixdb/asterix-runtime/src/main/java/org/apache/asterix/runtime/evaluators/common/EditDistanceCheckEvaluator.java
M asterixdb/asterix-runtime/src/main/java/org/apache/asterix/runtime/evaluators/common/EditDistanceEvaluator.java
M asterixdb/asterix-runtime/src/main/java/org/apache/asterix/runtime/evaluators/common/SimilarityJaccardSortedEvaluator.java
R hyracks-fullstack/hyracks/hyracks-data/hyracks-data-std/src/main/java/org/apache/hyracks/data/std/util/ISequenceIterator.java
A hyracks-fullstack/hyracks/hyracks-data/hyracks-data-std/src/main/java/org/apache/hyracks/data/std/util/UTF8StringCharByCharIterator.java
12 files changed, 250 insertions(+), 170 deletions(-)

git pull ssh://asterix-gerrit.ics.uci.edu:29418/asterixdb refs/changes/81/1481/3

--
To view, visit https://asterix-gerrit.ics.uci.edu/1481
To unsubscribe, visit https://asterix-gerrit.ics.uci.edu/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ibc8729c4514bb87c347dd7d50358fd897b769977
Gerrit-PatchSet: 3
Gerrit-Project: asterixdb
Gerrit-Branch: master
Gerrit-Owner: Taewoo Kim
Gerrit-Reviewer: Jenkins
Gerrit-Reviewer: Jianfeng Jia
Gerrit-Reviewer: Steven Jacobs
Gerrit-Reviewer: Taewoo Kim
Change in asterixdb[master]: ASTERIXDB-1778: optimize the edit-distance-check function
Jenkins has posted comments on this change. Change subject: ASTERIXDB-1778: optimize the edit-distance-check function .. Patch Set 2: WARNING: THIS CHANGE CONTAINS CROSS-PRODUCT CHANGES IN: * asterixdb * hyracks-fullstack PLEASE REVIEW CAREFULLY AND LOOK FOR API CHANGES! -- To view, visit https://asterix-gerrit.ics.uci.edu/1481 To unsubscribe, visit https://asterix-gerrit.ics.uci.edu/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ibc8729c4514bb87c347dd7d50358fd897b769977 Gerrit-PatchSet: 2 Gerrit-Project: asterixdb Gerrit-Branch: master Gerrit-Owner: Taewoo KimGerrit-Reviewer: Jenkins Gerrit-Reviewer: Jianfeng Jia Gerrit-Reviewer: Steven Jacobs Gerrit-Reviewer: Taewoo Kim Gerrit-HasComments: No
Change in asterixdb[master]: ASTERIXDB-1778: optimize the edit-distance-check function
Taewoo Kim has posted comments on this change. Change subject: ASTERIXDB-1778: optimize the edit-distance-check function .. Patch Set 2: Added Steven because of BAD failure. -- To view, visit https://asterix-gerrit.ics.uci.edu/1481 To unsubscribe, visit https://asterix-gerrit.ics.uci.edu/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ibc8729c4514bb87c347dd7d50358fd897b769977 Gerrit-PatchSet: 2 Gerrit-Project: asterixdb Gerrit-Branch: master Gerrit-Owner: Taewoo KimGerrit-Reviewer: Jenkins Gerrit-Reviewer: Jianfeng Jia Gerrit-Reviewer: Steven Jacobs Gerrit-Reviewer: Taewoo Kim Gerrit-HasComments: No
Change in asterixdb[master]: ASTERIXDB-1778: optimize the edit-distance-check function
Jenkins has posted comments on this change. Change subject: ASTERIXDB-1778: optimize the edit-distance-check function .. Patch Set 2: Build Started https://asterix-jenkins.ics.uci.edu/job/asterix-gerrit-notopic/4143/ -- To view, visit https://asterix-gerrit.ics.uci.edu/1481 To unsubscribe, visit https://asterix-gerrit.ics.uci.edu/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ibc8729c4514bb87c347dd7d50358fd897b769977 Gerrit-PatchSet: 2 Gerrit-Project: asterixdb Gerrit-Branch: master Gerrit-Owner: Taewoo KimGerrit-Reviewer: Jenkins Gerrit-HasComments: No
Change in asterixdb[master]: ASTERIXDB-1778: optimize the edit-distance-check function
Hello Jenkins,

I'd like you to reexamine a change. Please visit https://asterix-gerrit.ics.uci.edu/1481 to look at the new patch set (#2).

Change subject: ASTERIXDB-1778: optimize the edit-distance-check function

ASTERIXDB-1778: optimize the edit-distance-check function

- Only calculate 2 * (threshold + 1) cells, rather than all cells per row.
- Terminate the calculation steps early when it becomes obvious that the possible edit-distance value is greater than the given threshold. There is no reason to compute all cells in the 2-dimensional array.

Change-Id: Ibc8729c4514bb87c347dd7d50358fd897b769977
---
M asterixdb/asterix-doc/src/main/markdown/builtins/5_similarity.md
M asterixdb/asterix-fuzzyjoin/pom.xml
M asterixdb/asterix-fuzzyjoin/src/main/java/org/apache/asterix/fuzzyjoin/similarity/IGenericSimilarityMetric.java
M asterixdb/asterix-fuzzyjoin/src/main/java/org/apache/asterix/fuzzyjoin/similarity/SimilarityMetric.java
M asterixdb/asterix-fuzzyjoin/src/main/java/org/apache/asterix/fuzzyjoin/similarity/SimilarityMetricEditDistance.java
M asterixdb/asterix-fuzzyjoin/src/main/java/org/apache/asterix/fuzzyjoin/similarity/SimilarityMetricJaccard.java
M asterixdb/asterix-runtime/src/main/java/org/apache/asterix/runtime/evaluators/common/AbstractAsterixListIterator.java
M asterixdb/asterix-runtime/src/main/java/org/apache/asterix/runtime/evaluators/common/EditDistanceCheckEvaluator.java
M asterixdb/asterix-runtime/src/main/java/org/apache/asterix/runtime/evaluators/common/EditDistanceEvaluator.java
M asterixdb/asterix-runtime/src/main/java/org/apache/asterix/runtime/evaluators/common/SimilarityJaccardSortedEvaluator.java
R hyracks-fullstack/hyracks/hyracks-data/hyracks-data-std/src/main/java/org/apache/hyracks/data/std/util/ISequenceIterator.java
A hyracks-fullstack/hyracks/hyracks-data/hyracks-data-std/src/main/java/org/apache/hyracks/data/std/util/UTF8StringCharByCharIterator.java
12 files changed, 249 insertions(+), 170 deletions(-)

git pull ssh://asterix-gerrit.ics.uci.edu:29418/asterixdb refs/changes/81/1481/2

--
To view, visit https://asterix-gerrit.ics.uci.edu/1481
To unsubscribe, visit https://asterix-gerrit.ics.uci.edu/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ibc8729c4514bb87c347dd7d50358fd897b769977
Gerrit-PatchSet: 2
Gerrit-Project: asterixdb
Gerrit-Branch: master
Gerrit-Owner: Taewoo Kim
Gerrit-Reviewer: Jenkins
Change in asterixdb[master]: ASTERIXDB-1778: optimize the edit-distance-check function
Jenkins has posted comments on this change. Change subject: ASTERIXDB-1778: optimize the edit-distance-check function .. Patch Set 1: Integration-Tests+1 Integration Tests Successful https://asterix-jenkins.ics.uci.edu/job/asterix-gerrit-integration-tests/1770/ : SUCCESS -- To view, visit https://asterix-gerrit.ics.uci.edu/1481 To unsubscribe, visit https://asterix-gerrit.ics.uci.edu/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ibc8729c4514bb87c347dd7d50358fd897b769977 Gerrit-PatchSet: 1 Gerrit-Project: asterixdb Gerrit-Branch: master Gerrit-Owner: Taewoo KimGerrit-Reviewer: Jenkins Gerrit-HasComments: No
Change in asterixdb[master]: ASTERIXDB-1778: optimize the edit-distance-check function
Jenkins has posted comments on this change. Change subject: ASTERIXDB-1778: optimize the edit-distance-check function .. Patch Set 1: BAD-1 BAD Compatibility Tests Failed https://asterix-jenkins.ics.uci.edu/job/asterixbad-compat/437/ : FAILURE -- To view, visit https://asterix-gerrit.ics.uci.edu/1481 To unsubscribe, visit https://asterix-gerrit.ics.uci.edu/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ibc8729c4514bb87c347dd7d50358fd897b769977 Gerrit-PatchSet: 1 Gerrit-Project: asterixdb Gerrit-Branch: master Gerrit-Owner: Taewoo KimGerrit-Reviewer: Jenkins Gerrit-HasComments: No
Change in asterixdb[master]: ASTERIXDB-1778: optimize the edit-distance-check function
Jenkins has posted comments on this change. Change subject: ASTERIXDB-1778: optimize the edit-distance-check function .. Patch Set 1: BAD Compatibility Tests Started https://asterix-jenkins.ics.uci.edu/job/asterixbad-compat/437/ -- To view, visit https://asterix-gerrit.ics.uci.edu/1481 To unsubscribe, visit https://asterix-gerrit.ics.uci.edu/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ibc8729c4514bb87c347dd7d50358fd897b769977 Gerrit-PatchSet: 1 Gerrit-Project: asterixdb Gerrit-Branch: master Gerrit-Owner: Taewoo KimGerrit-Reviewer: Jenkins Gerrit-HasComments: No
Change in asterixdb[master]: ASTERIXDB-1778: optimize the edit-distance-check function
Jenkins has posted comments on this change. Change subject: ASTERIXDB-1778: optimize the edit-distance-check function .. Patch Set 1: Integration Tests Started https://asterix-jenkins.ics.uci.edu/job/asterix-gerrit-integration-tests/1770/
Change in asterixdb[master]: ASTERIXDB-1778: optimize the edit-distance-check function
Jenkins has posted comments on this change. Change subject: ASTERIXDB-1778: optimize the edit-distance-check function .. Patch Set 1: Build Started https://asterix-jenkins.ics.uci.edu/job/asterix-gerrit-notopic/4137/
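For readers following this thread: the change description says the optimization computes only a band of cells per row of the edit-distance table and stops as soon as the threshold can no longer be met. The sketch below illustrates that general idea in Java. It is not the code from this patch: the class name `BandedEditDistance`, the method `check`, and the convention of returning -1 on failure are assumptions made for illustration, and the exact band width may differ from the 2 * (threshold + 1) cells mentioned in the change description.

```java
import java.util.Arrays;

// Hypothetical illustration only; not the SimilarityMetricEditDistance code from this change.
final class BandedEditDistance {
    private BandedEditDistance() {
    }

    /** Returns the edit distance of a and b if it is <= threshold, otherwise -1. */
    static int check(CharSequence a, CharSequence b, int threshold) {
        int n = a.length();
        int m = b.length();
        if (threshold < 0 || Math.abs(n - m) > threshold) {
            return -1; // the length difference alone already exceeds the threshold
        }
        // The distance can never exceed max(n, m), so clamping keeps "inf" small.
        int t = Math.min(threshold, Math.max(n, m));
        int inf = t + 1; // stands for "greater than the threshold"

        int[] prev = new int[m + 1];
        int[] curr = new int[m + 1];
        Arrays.fill(prev, inf);
        Arrays.fill(curr, inf);
        for (int j = 0; j <= Math.min(m, t); j++) {
            prev[j] = j; // row 0: j insertions
        }

        for (int i = 1; i <= n; i++) {
            // Only columns with |i - j| <= t can hold a value <= t.
            int lo = Math.max(1, i - t);
            int hi = Math.min(m, i + t);
            // Cell just left of the band: column 0 costs i deletions while that
            // is still within the threshold; any other cell there is out of reach.
            curr[lo - 1] = (lo == 1 && i <= t) ? i : inf;
            int rowMin = curr[lo - 1];
            for (int j = lo; j <= hi; j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                int best = prev[j - 1] + cost;          // match or substitution
                best = Math.min(best, prev[j] + 1);     // deletion
                best = Math.min(best, curr[j - 1] + 1); // insertion
                curr[j] = Math.min(best, inf);
                rowMin = Math.min(rowMin, curr[j]);
            }
            if (rowMin > t) {
                return -1; // early termination: no later row can get back under the threshold
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[m] <= t ? prev[m] : -1;
    }
}
```

Under these assumptions, `BandedEditDistance.check("happy", "hapr", 2)` yields 2, which lines up with the `[ true, 2 ]` result in the documentation example added by this change. Each row touches at most 2 * threshold + 2 cells, so the work is proportional to the string length times the threshold rather than to the product of both lengths.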
Change in asterixdb[master]: ASTERIXDB-1778: optimize the edit-distance-check function
Taewoo Kim has uploaded a new change for review.

https://asterix-gerrit.ics.uci.edu/1481

Change subject: ASTERIXDB-1778: optimize the edit-distance-check function
..

ASTERIXDB-1778: optimize the edit-distance-check function

- Only calculate 2 * (threshold + 1) cells, rather than all cells per row.
- Terminate the calculation steps early when it becomes obvious that the possible edit-distance value is greater than the given threshold. There is no reason to compute all cells in the two-dimensional array.

Change-Id: Ibc8729c4514bb87c347dd7d50358fd897b769977
---
M asterixdb/asterix-doc/src/main/markdown/builtins/5_similarity.md
M asterixdb/asterix-fuzzyjoin/src/main/java/org/apache/asterix/fuzzyjoin/similarity/IGenericSimilarityMetric.java
M asterixdb/asterix-fuzzyjoin/src/main/java/org/apache/asterix/fuzzyjoin/similarity/SimilarityMetric.java
M asterixdb/asterix-fuzzyjoin/src/main/java/org/apache/asterix/fuzzyjoin/similarity/SimilarityMetricEditDistance.java
M asterixdb/asterix-fuzzyjoin/src/main/java/org/apache/asterix/fuzzyjoin/similarity/SimilarityMetricJaccard.java
M asterixdb/asterix-runtime/src/main/java/org/apache/asterix/runtime/evaluators/common/EditDistanceCheckEvaluator.java
M asterixdb/asterix-runtime/src/main/java/org/apache/asterix/runtime/evaluators/common/EditDistanceEvaluator.java
M asterixdb/asterix-runtime/src/main/java/org/apache/asterix/runtime/evaluators/common/SimilarityJaccardSortedEvaluator.java
8 files changed, 173 insertions(+), 117 deletions(-)

git pull ssh://asterix-gerrit.ics.uci.edu:29418/asterixdb refs/changes/81/1481/1

diff --git a/asterixdb/asterix-doc/src/main/markdown/builtins/5_similarity.md b/asterixdb/asterix-doc/src/main/markdown/builtins/5_similarity.md
index 89ef0f7..cb3318f 100644
--- a/asterixdb/asterix-doc/src/main/markdown/builtins/5_similarity.md
+++ b/asterixdb/asterix-doc/src/main/markdown/builtins/5_similarity.md
@@ -47,6 +47,36 @@
 2
+### edit_distance_check ###
+* Syntax:
+
+edit_distance_check(expression1, expression2, threshold)
+
+* Checks whether the edit distance of `expression1` and `expression2` is within a given threshold.
+
+* Arguments:
+* `expression1` : a `string` or a homogeneous `array` of a comparable item type.
+* `expression2` : The same type as `expression1`.
+* `threshold` : a `bigint` that represents the distance threshold.
+* Return Value:
+* an `array` with two items:
+* The first item contains a `boolean` value representing whether the edit distance of `expression1` and `expression2` is within the given threshold.
+* The second item contains an `integer` that represents the edit distance of `expression1` and `expression2` if the first item is true.
+* If the first item is false, then the second item is set to 2147483647.
+* `missing` if any argument is a `missing` value,
+* `null` if any argument is a `null` value but no argument is a `missing` value,
+* a type error will be raised if:
+* the first or second argument is any other non-string value,
+* or, the third argument is any other non-bigint value.
+* Note: an [n_gram index](similarity.html#UsingIndexesToSupportSimilarityQueries) can be utilized for this function.
+* Example:
+
+edit_distance_check("happy","hapr",2);
+
+
+* The expected result is:
+
+[ true, 2 ]
 ### edit_distance_contains ###
 * Syntax:
diff --git a/asterixdb/asterix-fuzzyjoin/src/main/java/org/apache/asterix/fuzzyjoin/similarity/IGenericSimilarityMetric.java b/asterixdb/asterix-fuzzyjoin/src/main/java/org/apache/asterix/fuzzyjoin/similarity/IGenericSimilarityMetric.java
index ac4a3dd..751597d 100644
--- a/asterixdb/asterix-fuzzyjoin/src/main/java/org/apache/asterix/fuzzyjoin/similarity/IGenericSimilarityMetric.java
+++ b/asterixdb/asterix-fuzzyjoin/src/main/java/org/apache/asterix/fuzzyjoin/similarity/IGenericSimilarityMetric.java
@@ -22,8 +22,11 @@
 import org.apache.hyracks.api.exceptions.HyracksDataException;
 public interface IGenericSimilarityMetric {
-// returns similarity
-public float getSimilarity(IListIterator firstList, IListIterator secondList) throws HyracksDataException;
+// Returns -1 if this method supports early-termination and it becomes obvious that
+// the possible similarity value can't satisfy the given simThresh value.
+// Else returns the calculated similarity value.
+public float getActualSimilarityVal(IListIterator firstList, IListIterator secondList, float simThresh)
+throws HyracksDataException;
 // returns -1 if does not satisfy threshold
 // else returns similarity
diff --git a/asterixdb/asterix-fuzzyjoin/src/main/java/org/apache/asterix/fuzzyjoin/similarity/SimilarityMetric.java b/asterixdb/asterix-fuzzyjoin/src/main/java/org/apache/asterix/fuzzyjoin/similarity/SimilarityMetric.java
index d36d60d..70029a3 100644
---