Change in asterixdb[master]: ASTERIXDB-1778: Optimize the edit-distance-check function

2017-02-05 Thread Taewoo Kim (Code Review)
Taewoo Kim has submitted this change and it was merged.

Change subject: ASTERIXDB-1778: Optimize the edit-distance-check function
..


ASTERIXDB-1778: Optimize the edit-distance-check function

 - Only calculate 2 * (threshold + 1) cells per row, rather than all cells.
 - Terminate the calculation early when it becomes obvious that
   the edit-distance value must be greater than the given threshold.
   There is no reason to compute all cells in the two-dimensional array.
 - Move IListIterator to Hyracks, since we now have
   a CharacterIterator in a String. Rename it to ISequenceIterator.
 - Add a section for the function to the manual.
 - Remove the letter-counting filtering method, since it is only applicable to
   strings in the ASCII range (0 ~ 127).
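
The first two items above describe a banded edit-distance computation with
early termination. The following is a minimal sketch of that general
technique, not the code from this change: the class name, the method name,
and the EXCEEDS sentinel are hypothetical, and it works on Java chars via
charAt rather than on the UTF-8 iterators used in the actual code. The band
here covers at most 2 * threshold + 1 cells per row, which stays within the
2 * (threshold + 1) bound quoted above.

```java
import java.util.Arrays;

public class BandedEditDistanceSketch {
    // Hypothetical sentinel meaning "distance is greater than the threshold".
    static final int EXCEEDS = Integer.MAX_VALUE;

    // Returns the edit distance of a and b if it is <= threshold, else EXCEEDS.
    static int check(String a, String b, int threshold) {
        int n = a.length();
        int m = b.length();
        // The length difference is a lower bound on the edit distance.
        if (threshold < 0 || Math.abs(n - m) > threshold) {
            return EXCEEDS;
        }
        // The distance can never exceed max(n, m), so cap the band width.
        int t = Math.min(threshold, Math.max(n, m));
        int inf = t + 1; // any value above t only means "too far"
        int[] prev = new int[m + 1];
        int[] curr = new int[m + 1];
        // Row 0: building b[0..j) from the empty prefix costs j insertions.
        for (int j = 0; j <= m; j++) {
            prev[j] = Math.min(j, inf);
        }
        for (int i = 1; i <= n; i++) {
            Arrays.fill(curr, inf); // cells outside the band count as "too far"
            int lo = Math.max(0, i - t);
            int hi = Math.min(m, i + t);
            int rowMin = inf;
            for (int j = lo; j <= hi; j++) {
                int cost;
                if (j == 0) {
                    cost = i; // delete the first i characters of a
                } else {
                    int sub = prev[j - 1] + (a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1);
                    int del = prev[j] + 1;
                    int ins = curr[j - 1] + 1;
                    cost = Math.min(sub, Math.min(del, ins));
                }
                curr[j] = Math.min(cost, inf);
                rowMin = Math.min(rowMin, curr[j]);
            }
            // Early termination: no cell in this row can lead to a result <= t.
            if (rowMin > t) {
                return EXCEEDS;
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[m] <= t ? prev[m] : EXCEEDS;
    }

    public static void main(String[] args) {
        System.out.println(check("happy", "hapr", 2)); // within the threshold
        System.out.println(check("happy", "hapr", 1) == EXCEEDS);
    }
}
```

Only two rows are kept in memory, and cells outside the band are treated as
"too far", which is safe because any alignment that leaves the band already
costs more than the threshold.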

Change-Id: Ibc8729c4514bb87c347dd7d50358fd897b769977
Reviewed-on: https://asterix-gerrit.ics.uci.edu/1481
Sonar-Qube: Jenkins 
Tested-by: Jenkins 
BAD: Jenkins 
Integration-Tests: Jenkins 
Reviewed-by: Jianfeng Jia 
---
M asterixdb/asterix-doc/src/main/markdown/builtins/5_similarity.md
M asterixdb/asterix-fuzzyjoin/pom.xml
M asterixdb/asterix-fuzzyjoin/src/main/java/org/apache/asterix/fuzzyjoin/similarity/IGenericSimilarityMetric.java
M asterixdb/asterix-fuzzyjoin/src/main/java/org/apache/asterix/fuzzyjoin/similarity/SimilarityMetric.java
M asterixdb/asterix-fuzzyjoin/src/main/java/org/apache/asterix/fuzzyjoin/similarity/SimilarityMetricEditDistance.java
M asterixdb/asterix-fuzzyjoin/src/main/java/org/apache/asterix/fuzzyjoin/similarity/SimilarityMetricJaccard.java
A asterixdb/asterix-fuzzyjoin/src/test/java/org/apache/asterix/fuzzyjoin/similarity/SimilarityMetricEditDistanceTest.java
M asterixdb/asterix-runtime/src/main/java/org/apache/asterix/runtime/evaluators/common/AbstractAsterixListIterator.java
M asterixdb/asterix-runtime/src/main/java/org/apache/asterix/runtime/evaluators/common/EditDistanceCheckEvaluator.java
M asterixdb/asterix-runtime/src/main/java/org/apache/asterix/runtime/evaluators/common/EditDistanceEvaluator.java
M asterixdb/asterix-runtime/src/main/java/org/apache/asterix/runtime/evaluators/common/SimilarityJaccardSortedCheckEvaluator.java
M asterixdb/asterix-runtime/src/main/java/org/apache/asterix/runtime/evaluators/common/SimilarityJaccardSortedEvaluator.java
R hyracks-fullstack/hyracks/hyracks-data/hyracks-data-std/src/main/java/org/apache/hyracks/data/std/util/ISequenceIterator.java
A hyracks-fullstack/hyracks/hyracks-data/hyracks-data-std/src/main/java/org/apache/hyracks/data/std/util/UTF8StringCharByCharIterator.java
14 files changed, 355 insertions(+), 281 deletions(-)

Approvals:
  Jianfeng Jia: Looks good to me, approved
  Jenkins: Verified; No violations found; No violations found; Verified



diff --git a/asterixdb/asterix-doc/src/main/markdown/builtins/5_similarity.md b/asterixdb/asterix-doc/src/main/markdown/builtins/5_similarity.md
index 89ef0f7..cb3318f 100644
--- a/asterixdb/asterix-doc/src/main/markdown/builtins/5_similarity.md
+++ b/asterixdb/asterix-doc/src/main/markdown/builtins/5_similarity.md
@@ -47,6 +47,36 @@
 
 2
 
+### edit_distance_check ###
+* Syntax:
+
+edit_distance_check(expression1, expression2, threshold)
+
+* Checks whether the edit distance of `expression1` and `expression2` is within a given threshold.
+
+* Arguments:
+* `expression1` : a `string` or a homogeneous `array` of a comparable item type.
+* `expression2` : The same type as `expression1`.
+* `threshold` : a `bigint` that represents the distance threshold.
+* Return Value:
+* an `array` with two items:
+* The first item contains a `boolean` value representing whether the edit distance of `expression1` and `expression2` is within the given threshold.
+* The second item contains an `integer` that represents the edit distance of `expression1` and `expression2` if the first item is true.
+* If the first item is false, then the second item is set to 2147483647.
+* `missing` if any argument is a `missing` value,
+* `null` if any argument is a `null` value but no argument is a `missing` value,
+* a type error will be raised if:
+* the first or second argument is any other non-string value,
+* or, the third argument is any other non-bigint value.
+* Note: an [n_gram index](similarity.html#UsingIndexesToSupportSimilarityQueries) can be utilized for this function.
+* Example:
+
+edit_distance_check("happy","hapr",2);
+
+
+* The expected result is:
+
+[ true, 2 ]
 
 ### edit_distance_contains ###
 * Syntax:
diff --git a/asterixdb/asterix-fuzzyjoin/pom.xml b/asterixdb/asterix-fuzzyjoin/pom.xml
index 0539782..9485852 100644
--- a/asterixdb/asterix-fuzzyjoin/pom.xml
+++ 

Change in asterixdb[master]: ASTERIXDB-1778: Optimize the edit-distance-check function

2017-02-04 Thread Jenkins (Code Review)
Jenkins has posted comments on this change.

Change subject: ASTERIXDB-1778: Optimize the edit-distance-check function
..


Patch Set 8: Integration-Tests+1

Integration Tests Successful

https://asterix-jenkins.ics.uci.edu/job/asterix-gerrit-integration-tests/1803/ : SUCCESS

-- 
To view, visit https://asterix-gerrit.ics.uci.edu/1481
To unsubscribe, visit https://asterix-gerrit.ics.uci.edu/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ibc8729c4514bb87c347dd7d50358fd897b769977
Gerrit-PatchSet: 8
Gerrit-Project: asterixdb
Gerrit-Branch: master
Gerrit-Owner: Taewoo Kim 
Gerrit-Reviewer: Chen Li 
Gerrit-Reviewer: Jenkins 
Gerrit-Reviewer: Jianfeng Jia 
Gerrit-Reviewer: Steven Jacobs 
Gerrit-Reviewer: Taewoo Kim 
Gerrit-HasComments: No


Change in asterixdb[master]: ASTERIXDB-1778: Optimize the edit-distance-check function

2017-02-04 Thread Jenkins (Code Review)
Jenkins has posted comments on this change.

Change subject: ASTERIXDB-1778: Optimize the edit-distance-check function
..


Patch Set 8: BAD+1

BAD Compatibility Tests Successful

https://asterix-jenkins.ics.uci.edu/job/asterixbad-compat/472/ : SUCCESS


Gerrit-MessageType: comment
Gerrit-Change-Id: Ibc8729c4514bb87c347dd7d50358fd897b769977
Gerrit-PatchSet: 8
Gerrit-Project: asterixdb
Gerrit-Branch: master
Gerrit-Owner: Taewoo Kim 
Gerrit-Reviewer: Chen Li 
Gerrit-Reviewer: Jenkins 
Gerrit-Reviewer: Jianfeng Jia 
Gerrit-Reviewer: Steven Jacobs 
Gerrit-Reviewer: Taewoo Kim 
Gerrit-HasComments: No


Change in asterixdb[master]: ASTERIXDB-1778: Optimize the edit-distance-check function

2017-02-04 Thread Jenkins (Code Review)
Jenkins has posted comments on this change.

Change subject: ASTERIXDB-1778: Optimize the edit-distance-check function
..


Patch Set 8:

BAD Compatibility Tests Started 
https://asterix-jenkins.ics.uci.edu/job/asterixbad-compat/472/


Gerrit-MessageType: comment
Gerrit-Change-Id: Ibc8729c4514bb87c347dd7d50358fd897b769977
Gerrit-PatchSet: 8
Gerrit-Project: asterixdb
Gerrit-Branch: master
Gerrit-Owner: Taewoo Kim 
Gerrit-Reviewer: Chen Li 
Gerrit-Reviewer: Jenkins 
Gerrit-Reviewer: Jianfeng Jia 
Gerrit-Reviewer: Steven Jacobs 
Gerrit-Reviewer: Taewoo Kim 
Gerrit-HasComments: No


Change in asterixdb[master]: ASTERIXDB-1778: Optimize the edit-distance-check function

2017-02-04 Thread Jenkins (Code Review)
Jenkins has posted comments on this change.

Change subject: ASTERIXDB-1778: Optimize the edit-distance-check function
..


Patch Set 8:

Integration Tests Started 
https://asterix-jenkins.ics.uci.edu/job/asterix-gerrit-integration-tests/1803/


Gerrit-MessageType: comment
Gerrit-Change-Id: Ibc8729c4514bb87c347dd7d50358fd897b769977
Gerrit-PatchSet: 8
Gerrit-Project: asterixdb
Gerrit-Branch: master
Gerrit-Owner: Taewoo Kim 
Gerrit-Reviewer: Chen Li 
Gerrit-Reviewer: Jenkins 
Gerrit-Reviewer: Jianfeng Jia 
Gerrit-Reviewer: Steven Jacobs 
Gerrit-Reviewer: Taewoo Kim 
Gerrit-HasComments: No


Change in asterixdb[master]: ASTERIXDB-1778: Optimize the edit-distance-check function

2017-02-04 Thread Jenkins (Code Review)
Jenkins has posted comments on this change.

Change subject: ASTERIXDB-1778: Optimize the edit-distance-check function
..


Patch Set 8:

WARNING: THIS CHANGE CONTAINS CROSS-PRODUCT CHANGES IN:
* asterixdb
* hyracks-fullstack

PLEASE REVIEW CAREFULLY AND LOOK FOR API CHANGES!


Gerrit-MessageType: comment
Gerrit-Change-Id: Ibc8729c4514bb87c347dd7d50358fd897b769977
Gerrit-PatchSet: 8
Gerrit-Project: asterixdb
Gerrit-Branch: master
Gerrit-Owner: Taewoo Kim 
Gerrit-Reviewer: Chen Li 
Gerrit-Reviewer: Jenkins 
Gerrit-Reviewer: Jianfeng Jia 
Gerrit-Reviewer: Steven Jacobs 
Gerrit-Reviewer: Taewoo Kim 
Gerrit-HasComments: No


Change in asterixdb[master]: ASTERIXDB-1778: Optimize the edit-distance-check function

2017-02-04 Thread Jenkins (Code Review)
Jenkins has posted comments on this change.

Change subject: ASTERIXDB-1778: Optimize the edit-distance-check function
..


Patch Set 8:

Build Started 
https://asterix-jenkins.ics.uci.edu/job/asterix-gerrit-notopic/4180/


Gerrit-MessageType: comment
Gerrit-Change-Id: Ibc8729c4514bb87c347dd7d50358fd897b769977
Gerrit-PatchSet: 8
Gerrit-Project: asterixdb
Gerrit-Branch: master
Gerrit-Owner: Taewoo Kim 
Gerrit-Reviewer: Chen Li 
Gerrit-Reviewer: Jenkins 
Gerrit-Reviewer: Jianfeng Jia 
Gerrit-Reviewer: Steven Jacobs 
Gerrit-Reviewer: Taewoo Kim 
Gerrit-HasComments: No


Change in asterixdb[master]: ASTERIXDB-1778: Optimize the edit-distance-check function

2017-02-04 Thread Taewoo Kim (Code Review)
Hello Jenkins,

I'd like you to reexamine a change.  Please visit

https://asterix-gerrit.ics.uci.edu/1481

to look at the new patch set (#8).

Change subject: ASTERIXDB-1778: Optimize the edit-distance-check function
..

ASTERIXDB-1778: Optimize the edit-distance-check function

 - Only calculate 2 * (threshold + 1) cells per row, rather than all cells.
 - Terminate the calculation early when it becomes obvious that
   the edit-distance value must be greater than the given threshold.
   There is no reason to compute all cells in the two-dimensional array.
 - Move IListIterator to Hyracks, since we now have
   a CharacterIterator in a String. Rename it to ISequenceIterator.
 - Add a section for the function to the manual.
 - Remove the letter-counting filtering method, since it is only applicable to
   strings in the ASCII range (0 ~ 127).

Change-Id: Ibc8729c4514bb87c347dd7d50358fd897b769977
---
M asterixdb/asterix-doc/src/main/markdown/builtins/5_similarity.md
M asterixdb/asterix-fuzzyjoin/pom.xml
M asterixdb/asterix-fuzzyjoin/src/main/java/org/apache/asterix/fuzzyjoin/similarity/IGenericSimilarityMetric.java
M asterixdb/asterix-fuzzyjoin/src/main/java/org/apache/asterix/fuzzyjoin/similarity/SimilarityMetric.java
M asterixdb/asterix-fuzzyjoin/src/main/java/org/apache/asterix/fuzzyjoin/similarity/SimilarityMetricEditDistance.java
M asterixdb/asterix-fuzzyjoin/src/main/java/org/apache/asterix/fuzzyjoin/similarity/SimilarityMetricJaccard.java
A asterixdb/asterix-fuzzyjoin/src/test/java/org/apache/asterix/fuzzyjoin/similarity/SimilarityMetricEditDistanceTest.java
M asterixdb/asterix-runtime/src/main/java/org/apache/asterix/runtime/evaluators/common/AbstractAsterixListIterator.java
M asterixdb/asterix-runtime/src/main/java/org/apache/asterix/runtime/evaluators/common/EditDistanceCheckEvaluator.java
M asterixdb/asterix-runtime/src/main/java/org/apache/asterix/runtime/evaluators/common/EditDistanceEvaluator.java
M asterixdb/asterix-runtime/src/main/java/org/apache/asterix/runtime/evaluators/common/SimilarityJaccardSortedCheckEvaluator.java
M asterixdb/asterix-runtime/src/main/java/org/apache/asterix/runtime/evaluators/common/SimilarityJaccardSortedEvaluator.java
R hyracks-fullstack/hyracks/hyracks-data/hyracks-data-std/src/main/java/org/apache/hyracks/data/std/util/ISequenceIterator.java
A hyracks-fullstack/hyracks/hyracks-data/hyracks-data-std/src/main/java/org/apache/hyracks/data/std/util/UTF8StringCharByCharIterator.java
14 files changed, 355 insertions(+), 281 deletions(-)


  git pull ssh://asterix-gerrit.ics.uci.edu:29418/asterixdb refs/changes/81/1481/8

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ibc8729c4514bb87c347dd7d50358fd897b769977
Gerrit-PatchSet: 8
Gerrit-Project: asterixdb
Gerrit-Branch: master
Gerrit-Owner: Taewoo Kim 
Gerrit-Reviewer: Chen Li 
Gerrit-Reviewer: Jenkins 
Gerrit-Reviewer: Jianfeng Jia 
Gerrit-Reviewer: Steven Jacobs 
Gerrit-Reviewer: Taewoo Kim 


Change in asterixdb[master]: ASTERIXDB-1778: Optimize the edit-distance-check function

2017-02-04 Thread Jenkins (Code Review)
Jenkins has posted comments on this change.

Change subject: ASTERIXDB-1778: Optimize the edit-distance-check function
..


Patch Set 7:

Build Started 
https://asterix-jenkins.ics.uci.edu/job/asterix-gerrit-notopic/4179/


Gerrit-MessageType: comment
Gerrit-Change-Id: Ibc8729c4514bb87c347dd7d50358fd897b769977
Gerrit-PatchSet: 7
Gerrit-Project: asterixdb
Gerrit-Branch: master
Gerrit-Owner: Taewoo Kim 
Gerrit-Reviewer: Chen Li 
Gerrit-Reviewer: Jenkins 
Gerrit-Reviewer: Jianfeng Jia 
Gerrit-Reviewer: Steven Jacobs 
Gerrit-Reviewer: Taewoo Kim 
Gerrit-HasComments: No


Change in asterixdb[master]: ASTERIXDB-1778: Optimize the edit-distance-check function

2017-02-04 Thread Taewoo Kim (Code Review)
Hello Jenkins,

I'd like you to reexamine a change.  Please visit

https://asterix-gerrit.ics.uci.edu/1481

to look at the new patch set (#7).

Change subject: ASTERIXDB-1778: Optimize the edit-distance-check function
..

ASTERIXDB-1778: Optimize the edit-distance-check function

 - Only calculate 2 * (threshold + 1) cells per row, rather than all cells.
 - Terminate the calculation early when it becomes obvious that
   the edit-distance value must be greater than the given threshold.
   There is no reason to compute all cells in the two-dimensional array.
 - Move IListIterator to Hyracks, since we now have
   a CharacterIterator in a String. Rename it to ISequenceIterator.
 - Add a section for the function to the manual.
 - Remove the letter-counting filtering method, since it is only applicable to
   strings in the ASCII range (0 ~ 127).

Change-Id: Ibc8729c4514bb87c347dd7d50358fd897b769977
---
M asterixdb/asterix-doc/src/main/markdown/builtins/5_similarity.md
M asterixdb/asterix-fuzzyjoin/pom.xml
M asterixdb/asterix-fuzzyjoin/src/main/java/org/apache/asterix/fuzzyjoin/similarity/IGenericSimilarityMetric.java
M asterixdb/asterix-fuzzyjoin/src/main/java/org/apache/asterix/fuzzyjoin/similarity/SimilarityMetric.java
M asterixdb/asterix-fuzzyjoin/src/main/java/org/apache/asterix/fuzzyjoin/similarity/SimilarityMetricEditDistance.java
M asterixdb/asterix-fuzzyjoin/src/main/java/org/apache/asterix/fuzzyjoin/similarity/SimilarityMetricJaccard.java
A asterixdb/asterix-fuzzyjoin/src/test/java/org/apache/asterix/fuzzyjoin/similarity/SimilarityMetricEditDistanceTest.java
M asterixdb/asterix-runtime/src/main/java/org/apache/asterix/runtime/evaluators/common/AbstractAsterixListIterator.java
M asterixdb/asterix-runtime/src/main/java/org/apache/asterix/runtime/evaluators/common/EditDistanceCheckEvaluator.java
M asterixdb/asterix-runtime/src/main/java/org/apache/asterix/runtime/evaluators/common/EditDistanceEvaluator.java
M asterixdb/asterix-runtime/src/main/java/org/apache/asterix/runtime/evaluators/common/SimilarityJaccardSortedCheckEvaluator.java
M asterixdb/asterix-runtime/src/main/java/org/apache/asterix/runtime/evaluators/common/SimilarityJaccardSortedEvaluator.java
R hyracks-fullstack/hyracks/hyracks-data/hyracks-data-std/src/main/java/org/apache/hyracks/data/std/util/ISequenceIterator.java
A hyracks-fullstack/hyracks/hyracks-data/hyracks-data-std/src/main/java/org/apache/hyracks/data/std/util/UTF8StringCharByCharIterator.java
14 files changed, 360 insertions(+), 281 deletions(-)


  git pull ssh://asterix-gerrit.ics.uci.edu:29418/asterixdb refs/changes/81/1481/7

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ibc8729c4514bb87c347dd7d50358fd897b769977
Gerrit-PatchSet: 7
Gerrit-Project: asterixdb
Gerrit-Branch: master
Gerrit-Owner: Taewoo Kim 
Gerrit-Reviewer: Chen Li 
Gerrit-Reviewer: Jenkins 
Gerrit-Reviewer: Jianfeng Jia 
Gerrit-Reviewer: Steven Jacobs 
Gerrit-Reviewer: Taewoo Kim 


Change in asterixdb[master]: ASTERIXDB-1778: Optimize the edit-distance-check function

2017-02-04 Thread Taewoo Kim (Code Review)
Taewoo Kim has posted comments on this change.

Change subject: ASTERIXDB-1778: Optimize the edit-distance-check function
..


Patch Set 6:

@Jianfeng: I now see what you mean. Since the main function is private and 
is not exposed through the public interface, I will add a unit test case for 
it. Makes sense.


Gerrit-MessageType: comment
Gerrit-Change-Id: Ibc8729c4514bb87c347dd7d50358fd897b769977
Gerrit-PatchSet: 6
Gerrit-Project: asterixdb
Gerrit-Branch: master
Gerrit-Owner: Taewoo Kim 
Gerrit-Reviewer: Chen Li 
Gerrit-Reviewer: Jenkins 
Gerrit-Reviewer: Jianfeng Jia 
Gerrit-Reviewer: Steven Jacobs 
Gerrit-Reviewer: Taewoo Kim 
Gerrit-HasComments: No


Change in asterixdb[master]: ASTERIXDB-1778: Optimize the edit-distance-check function

2017-02-04 Thread Jenkins (Code Review)
Jenkins has posted comments on this change.

Change subject: ASTERIXDB-1778: Optimize the edit-distance-check function
..


Patch Set 6:

Integration Tests Started 
https://asterix-jenkins.ics.uci.edu/job/asterix-gerrit-integration-tests/1798/


Gerrit-MessageType: comment
Gerrit-Change-Id: Ibc8729c4514bb87c347dd7d50358fd897b769977
Gerrit-PatchSet: 6
Gerrit-Project: asterixdb
Gerrit-Branch: master
Gerrit-Owner: Taewoo Kim 
Gerrit-Reviewer: Chen Li 
Gerrit-Reviewer: Jenkins 
Gerrit-Reviewer: Jianfeng Jia 
Gerrit-Reviewer: Steven Jacobs 
Gerrit-Reviewer: Taewoo Kim 
Gerrit-HasComments: No


Change in asterixdb[master]: ASTERIXDB-1778: Optimize the edit-distance-check function

2017-02-03 Thread Taewoo Kim (Code Review)
Taewoo Kim has posted comments on this change.

Change subject: ASTERIXDB-1778: Optimize the edit-distance-check function
..


Patch Set 6:

Regarding your comment: the edit-distance-check_strings test case already 
contains that corner case. The latter two queries terminate early; I just 
verified this using "println".

let $a := "Nalini Venkatasubramanian"
let $b := "Nalini Wekatasupramanian"
let $results :=
[
  edit-distance-check($a, $b, 3),
  edit-distance-check($b, $a, 3),
  edit-distance-check($a, $b, 2),
  edit-distance-check($b, $a, 2)
]
for $i in $results
return $i
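
As a worked check of the four calls above, here is a small sketch that uses a
plain, unoptimized Levenshtein distance as a reference. The class and method
names are hypothetical, and the result strings only imitate the
[ boolean, integer ] format described in the manual section added by this
change. The two names are at edit distance 3 (V to W, drop one "n", b to p),
so the calls with threshold 3 succeed and the calls with threshold 2 do not.

```java
public class EditDistanceCheckExample {
    // Reference Levenshtein distance using two rows; no band, no early exit.
    static int distance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j;
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int sub = prev[j - 1] + (a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1);
                curr[j] = Math.min(sub, Math.min(prev[j] + 1, curr[j - 1] + 1));
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    // Formats the result the way the manual describes it: the second item is
    // 2147483647 (Integer.MAX_VALUE) when the distance exceeds the threshold.
    static String check(String a, String b, int threshold) {
        int d = distance(a, b);
        return d <= threshold
                ? "[ true, " + d + " ]"
                : "[ false, " + Integer.MAX_VALUE + " ]";
    }

    public static void main(String[] args) {
        String a = "Nalini Venkatasubramanian";
        String b = "Nalini Wekatasupramanian";
        System.out.println(check(a, b, 3)); // [ true, 3 ]
        System.out.println(check(b, a, 3)); // [ true, 3 ]
        System.out.println(check(a, b, 2)); // [ false, 2147483647 ]
        System.out.println(check(b, a, 2)); // [ false, 2147483647 ]
    }
}
```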


Gerrit-MessageType: comment
Gerrit-Change-Id: Ibc8729c4514bb87c347dd7d50358fd897b769977
Gerrit-PatchSet: 6
Gerrit-Project: asterixdb
Gerrit-Branch: master
Gerrit-Owner: Taewoo Kim 
Gerrit-Reviewer: Chen Li 
Gerrit-Reviewer: Jenkins 
Gerrit-Reviewer: Jianfeng Jia 
Gerrit-Reviewer: Steven Jacobs 
Gerrit-Reviewer: Taewoo Kim 
Gerrit-HasComments: No


Change in asterixdb[master]: ASTERIXDB-1778: Optimize the edit-distance-check function

2017-02-03 Thread Jianfeng Jia (Code Review)
Jianfeng Jia has posted comments on this change.

Change subject: ASTERIXDB-1778: Optimize the edit-distance-check function
..


Patch Set 6:

Looks good. But I still feel there should be some simple *JUnit* tests for 
the edit distance, not just the AQL ones. 
The AQL (or SQL++) tests are too far removed from this code, and it is 
usually very difficult to hit the corner cases with them.


Gerrit-MessageType: comment
Gerrit-Change-Id: Ibc8729c4514bb87c347dd7d50358fd897b769977
Gerrit-PatchSet: 6
Gerrit-Project: asterixdb
Gerrit-Branch: master
Gerrit-Owner: Taewoo Kim 
Gerrit-Reviewer: Chen Li 
Gerrit-Reviewer: Jenkins 
Gerrit-Reviewer: Jianfeng Jia 
Gerrit-Reviewer: Steven Jacobs 
Gerrit-Reviewer: Taewoo Kim 
Gerrit-HasComments: No


Change in asterixdb[master]: ASTERIXDB-1778: Optimize the edit-distance-check function

2017-02-03 Thread Jenkins (Code Review)
Jenkins has posted comments on this change.

Change subject: ASTERIXDB-1778: Optimize the edit-distance-check function
..


Patch Set 6:

Integration Tests Started 
https://asterix-jenkins.ics.uci.edu/job/asterix-gerrit-integration-tests/1791/


Gerrit-MessageType: comment
Gerrit-Change-Id: Ibc8729c4514bb87c347dd7d50358fd897b769977
Gerrit-PatchSet: 6
Gerrit-Project: asterixdb
Gerrit-Branch: master
Gerrit-Owner: Taewoo Kim 
Gerrit-Reviewer: Chen Li 
Gerrit-Reviewer: Jenkins 
Gerrit-Reviewer: Jianfeng Jia 
Gerrit-Reviewer: Steven Jacobs 
Gerrit-Reviewer: Taewoo Kim 
Gerrit-HasComments: No


Change in asterixdb[master]: ASTERIXDB-1778: Optimize the edit-distance-check function

2017-02-03 Thread Jenkins (Code Review)
Jenkins has posted comments on this change.

Change subject: ASTERIXDB-1778: Optimize the edit-distance-check function
..


Patch Set 6: Integration-Tests+1

Integration Tests Successful

https://asterix-jenkins.ics.uci.edu/job/asterix-gerrit-integration-tests/1788/ : SUCCESS


Gerrit-MessageType: comment
Gerrit-Change-Id: Ibc8729c4514bb87c347dd7d50358fd897b769977
Gerrit-PatchSet: 6
Gerrit-Project: asterixdb
Gerrit-Branch: master
Gerrit-Owner: Taewoo Kim 
Gerrit-Reviewer: Chen Li 
Gerrit-Reviewer: Jenkins 
Gerrit-Reviewer: Jianfeng Jia 
Gerrit-Reviewer: Steven Jacobs 
Gerrit-Reviewer: Taewoo Kim 
Gerrit-HasComments: No


Change in asterixdb[master]: ASTERIXDB-1778: Optimize the edit-distance-check function

2017-02-03 Thread Jenkins (Code Review)
Jenkins has posted comments on this change.

Change subject: ASTERIXDB-1778: Optimize the edit-distance-check function
..


Patch Set 6:

Integration Tests Started 
https://asterix-jenkins.ics.uci.edu/job/asterix-gerrit-integration-tests/1788/


Gerrit-MessageType: comment
Gerrit-Change-Id: Ibc8729c4514bb87c347dd7d50358fd897b769977
Gerrit-PatchSet: 6
Gerrit-Project: asterixdb
Gerrit-Branch: master
Gerrit-Owner: Taewoo Kim 
Gerrit-Reviewer: Chen Li 
Gerrit-Reviewer: Jenkins 
Gerrit-Reviewer: Jianfeng Jia 
Gerrit-Reviewer: Steven Jacobs 
Gerrit-Reviewer: Taewoo Kim 
Gerrit-HasComments: No


Change in asterixdb[master]: ASTERIXDB-1778: Optimize the edit-distance-check function

2017-02-03 Thread Jenkins (Code Review)
Jenkins has posted comments on this change.

Change subject: ASTERIXDB-1778: Optimize the edit-distance-check function
..


Patch Set 6: BAD+1

BAD Compatibility Tests Successful

https://asterix-jenkins.ics.uci.edu/job/asterixbad-compat/456/ : SUCCESS


Gerrit-MessageType: comment
Gerrit-Change-Id: Ibc8729c4514bb87c347dd7d50358fd897b769977
Gerrit-PatchSet: 6
Gerrit-Project: asterixdb
Gerrit-Branch: master
Gerrit-Owner: Taewoo Kim 
Gerrit-Reviewer: Chen Li 
Gerrit-Reviewer: Jenkins 
Gerrit-Reviewer: Jianfeng Jia 
Gerrit-Reviewer: Steven Jacobs 
Gerrit-Reviewer: Taewoo Kim 
Gerrit-HasComments: No


Change in asterixdb[master]: ASTERIXDB-1778: Optimize the edit-distance-check function

2017-02-03 Thread Jenkins (Code Review)
Jenkins has posted comments on this change.

Change subject: ASTERIXDB-1778: Optimize the edit-distance-check function
..


Patch Set 6: Integration-Tests-1

Integration Tests Failed

https://asterix-jenkins.ics.uci.edu/job/asterix-gerrit-integration-tests/1785/ : UNSTABLE


Gerrit-MessageType: comment
Gerrit-Change-Id: Ibc8729c4514bb87c347dd7d50358fd897b769977
Gerrit-PatchSet: 6
Gerrit-Project: asterixdb
Gerrit-Branch: master
Gerrit-Owner: Taewoo Kim 
Gerrit-Reviewer: Anon. E. Moose #1000151
Gerrit-Reviewer: Jenkins 
Gerrit-Reviewer: Jianfeng Jia 
Gerrit-Reviewer: Steven Jacobs 
Gerrit-Reviewer: Taewoo Kim 
Gerrit-HasComments: No


Change in asterixdb[master]: ASTERIXDB-1778: Optimize the edit-distance-check function

2017-02-03 Thread Jenkins (Code Review)
Jenkins has posted comments on this change.

Change subject: ASTERIXDB-1778: Optimize the edit-distance-check function
..


Patch Set 6:

BAD Compatibility Tests Started 
https://asterix-jenkins.ics.uci.edu/job/asterixbad-compat/456/


Gerrit-MessageType: comment
Gerrit-Change-Id: Ibc8729c4514bb87c347dd7d50358fd897b769977
Gerrit-PatchSet: 6
Gerrit-Project: asterixdb
Gerrit-Branch: master
Gerrit-Owner: Taewoo Kim 
Gerrit-Reviewer: Anon. E. Moose #1000151
Gerrit-Reviewer: Jenkins 
Gerrit-Reviewer: Jianfeng Jia 
Gerrit-Reviewer: Steven Jacobs 
Gerrit-Reviewer: Taewoo Kim 
Gerrit-HasComments: No


Change in asterixdb[master]: ASTERIXDB-1778: Optimize the edit-distance-check function

2017-02-03 Thread Jenkins (Code Review)
Jenkins has posted comments on this change.

Change subject: ASTERIXDB-1778: Optimize the edit-distance-check function
..


Patch Set 6: BAD-1

BAD Compatibility Tests Failed

https://asterix-jenkins.ics.uci.edu/job/asterixbad-compat/449/ : FAILURE


Gerrit-MessageType: comment
Gerrit-Change-Id: Ibc8729c4514bb87c347dd7d50358fd897b769977
Gerrit-PatchSet: 6
Gerrit-Project: asterixdb
Gerrit-Branch: master
Gerrit-Owner: Taewoo Kim 
Gerrit-Reviewer: Anon. E. Moose #1000151
Gerrit-Reviewer: Jenkins 
Gerrit-Reviewer: Jianfeng Jia 
Gerrit-Reviewer: Steven Jacobs 
Gerrit-Reviewer: Taewoo Kim 
Gerrit-HasComments: No


Change in asterixdb[master]: ASTERIXDB-1778: Optimize the edit-distance-check function

2017-02-03 Thread Jenkins (Code Review)
Jenkins has posted comments on this change.

Change subject: ASTERIXDB-1778: Optimize the edit-distance-check function
..


Patch Set 6:

BAD Compatibility Tests Started 
https://asterix-jenkins.ics.uci.edu/job/asterixbad-compat/449/


Gerrit-MessageType: comment
Gerrit-Change-Id: Ibc8729c4514bb87c347dd7d50358fd897b769977
Gerrit-PatchSet: 6
Gerrit-Project: asterixdb
Gerrit-Branch: master
Gerrit-Owner: Taewoo Kim 
Gerrit-Reviewer: Anon. E. Moose #1000151
Gerrit-Reviewer: Jenkins 
Gerrit-Reviewer: Jianfeng Jia 
Gerrit-Reviewer: Steven Jacobs 
Gerrit-Reviewer: Taewoo Kim 
Gerrit-HasComments: No


Change in asterixdb[master]: ASTERIXDB-1778: Optimize the edit-distance-check function

2017-02-03 Thread Jenkins (Code Review)
Jenkins has posted comments on this change.

Change subject: ASTERIXDB-1778: Optimize the edit-distance-check function
..


Patch Set 6:

Integration Tests Started 
https://asterix-jenkins.ics.uci.edu/job/asterix-gerrit-integration-tests/1785/


Gerrit-MessageType: comment
Gerrit-Change-Id: Ibc8729c4514bb87c347dd7d50358fd897b769977
Gerrit-PatchSet: 6
Gerrit-Project: asterixdb
Gerrit-Branch: master
Gerrit-Owner: Taewoo Kim 
Gerrit-Reviewer: Anon. E. Moose #1000151
Gerrit-Reviewer: Jenkins 
Gerrit-Reviewer: Jianfeng Jia 
Gerrit-Reviewer: Steven Jacobs 
Gerrit-Reviewer: Taewoo Kim 
Gerrit-HasComments: No


Change in asterixdb[master]: ASTERIXDB-1778: Optimize the edit-distance-check function

2017-02-03 Thread Jenkins (Code Review)
Jenkins has posted comments on this change.

Change subject: ASTERIXDB-1778: Optimize the edit-distance-check function
..


Patch Set 6:

WARNING: THIS CHANGE CONTAINS CROSS-PRODUCT CHANGES IN:
* asterixdb
* hyracks-fullstack

PLEASE REVIEW CAREFULLY AND LOOK FOR API CHANGES!


Gerrit-MessageType: comment
Gerrit-Change-Id: Ibc8729c4514bb87c347dd7d50358fd897b769977
Gerrit-PatchSet: 6
Gerrit-Project: asterixdb
Gerrit-Branch: master
Gerrit-Owner: Taewoo Kim 
Gerrit-Reviewer: Anon. E. Moose #1000151
Gerrit-Reviewer: Jenkins 
Gerrit-Reviewer: Jianfeng Jia 
Gerrit-Reviewer: Steven Jacobs 
Gerrit-Reviewer: Taewoo Kim 
Gerrit-HasComments: No


Change in asterixdb[master]: ASTERIXDB-1778: Optimize the edit-distance-check function

2017-02-03 Thread Jenkins (Code Review)
Jenkins has posted comments on this change.

Change subject: ASTERIXDB-1778: Optimize the edit-distance-check function
..


Patch Set 5: BAD-1

BAD Compatibility Tests Failed

https://asterix-jenkins.ics.uci.edu/job/asterixbad-compat/448/ : FAILURE


Gerrit-MessageType: comment
Gerrit-Change-Id: Ibc8729c4514bb87c347dd7d50358fd897b769977
Gerrit-PatchSet: 5
Gerrit-Project: asterixdb
Gerrit-Branch: master
Gerrit-Owner: Taewoo Kim 
Gerrit-Reviewer: Anon. E. Moose #1000151
Gerrit-Reviewer: Jenkins 
Gerrit-Reviewer: Jianfeng Jia 
Gerrit-Reviewer: Steven Jacobs 
Gerrit-Reviewer: Taewoo Kim 
Gerrit-HasComments: No


Change in asterixdb[master]: ASTERIXDB-1778: Optimize the edit-distance-check function

2017-02-03 Thread Taewoo Kim (Code Review)
Taewoo Kim has posted comments on this change.

Change subject: ASTERIXDB-1778: Optimize the edit-distance-check function
..


Patch Set 5:

@Jianfeng: the early-termination logic is in place, and the current test 
cases already cover it (e.g., edit-distance-check_strings).


Gerrit-MessageType: comment
Gerrit-Change-Id: Ibc8729c4514bb87c347dd7d50358fd897b769977
Gerrit-PatchSet: 5
Gerrit-Project: asterixdb
Gerrit-Branch: master
Gerrit-Owner: Taewoo Kim 
Gerrit-Reviewer: Anon. E. Moose #1000151
Gerrit-Reviewer: Jenkins 
Gerrit-Reviewer: Jianfeng Jia 
Gerrit-Reviewer: Steven Jacobs 
Gerrit-Reviewer: Taewoo Kim 
Gerrit-HasComments: No


Change in asterixdb[master]: ASTERIXDB-1778: Optimize the edit-distance-check function

2017-02-03 Thread Jenkins (Code Review)
Jenkins has posted comments on this change.

Change subject: ASTERIXDB-1778: Optimize the edit-distance-check function
..


Patch Set 6:

Build Started 
https://asterix-jenkins.ics.uci.edu/job/asterix-gerrit-notopic/4155/


Gerrit-MessageType: comment
Gerrit-Change-Id: Ibc8729c4514bb87c347dd7d50358fd897b769977
Gerrit-PatchSet: 6
Gerrit-Project: asterixdb
Gerrit-Branch: master
Gerrit-Owner: Taewoo Kim 
Gerrit-Reviewer: Anon. E. Moose #1000151
Gerrit-Reviewer: Jenkins 
Gerrit-Reviewer: Jianfeng Jia 
Gerrit-Reviewer: Steven Jacobs 
Gerrit-Reviewer: Taewoo Kim 
Gerrit-HasComments: No


Change in asterixdb[master]: ASTERIXDB-1778: Optimize the edit-distance-check function

2017-02-03 Thread Jenkins (Code Review)
Jenkins has posted comments on this change.

Change subject: ASTERIXDB-1778: Optimize the edit-distance-check function
..


Patch Set 5:

BAD Compatibility Tests Started 
https://asterix-jenkins.ics.uci.edu/job/asterixbad-compat/448/

-- 
To view, visit https://asterix-gerrit.ics.uci.edu/1481
To unsubscribe, visit https://asterix-gerrit.ics.uci.edu/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ibc8729c4514bb87c347dd7d50358fd897b769977
Gerrit-PatchSet: 5
Gerrit-Project: asterixdb
Gerrit-Branch: master
Gerrit-Owner: Taewoo Kim 
Gerrit-Reviewer: Anon. E. Moose #1000151
Gerrit-Reviewer: Jenkins 
Gerrit-Reviewer: Jianfeng Jia 
Gerrit-Reviewer: Steven Jacobs 
Gerrit-Reviewer: Taewoo Kim 
Gerrit-HasComments: No


Change in asterixdb[master]: ASTERIXDB-1778: Optimize the edit-distance-check function

2017-02-03 Thread Anon. E. Moose (Code Review)
Anon. E. Moose #1000151 has posted comments on this change.

Change subject: ASTERIXDB-1778: Optimize the edit-distance-check function
..


Patch Set 5:

(1 comment)

https://asterix-gerrit.ics.uci.edu/#/c/1481/5/asterixdb/asterix-runtime/src/main/java/org/apache/asterix/runtime/evaluators/common/EditDistanceEvaluator.java
File 
asterixdb/asterix-runtime/src/main/java/org/apache/asterix/runtime/evaluators/common/EditDistanceEvaluator.java:

PS5, Line 109: -1
Add a comment to the function to explain the purpose of "-1".


-- 
To view, visit https://asterix-gerrit.ics.uci.edu/1481
To unsubscribe, visit https://asterix-gerrit.ics.uci.edu/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ibc8729c4514bb87c347dd7d50358fd897b769977
Gerrit-PatchSet: 5
Gerrit-Project: asterixdb
Gerrit-Branch: master
Gerrit-Owner: Taewoo Kim 
Gerrit-Reviewer: Anon. E. Moose #1000151
Gerrit-Reviewer: Jenkins 
Gerrit-Reviewer: Jianfeng Jia 
Gerrit-Reviewer: Steven Jacobs 
Gerrit-Reviewer: Taewoo Kim 
Gerrit-HasComments: Yes


Change in asterixdb[master]: ASTERIXDB-1778: Optimize the edit-distance-check function

2017-02-03 Thread Jenkins (Code Review)
Jenkins has posted comments on this change.

Change subject: ASTERIXDB-1778: Optimize the edit-distance-check function
..


Patch Set 5:

WARNING: THIS CHANGE CONTAINS CROSS-PRODUCT CHANGES IN:
* asterixdb
* hyracks-fullstack

PLEASE REVIEW CAREFULLY AND LOOK FOR API CHANGES!

-- 
To view, visit https://asterix-gerrit.ics.uci.edu/1481
To unsubscribe, visit https://asterix-gerrit.ics.uci.edu/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ibc8729c4514bb87c347dd7d50358fd897b769977
Gerrit-PatchSet: 5
Gerrit-Project: asterixdb
Gerrit-Branch: master
Gerrit-Owner: Taewoo Kim 
Gerrit-Reviewer: Anon. E. Moose #1000151
Gerrit-Reviewer: Jenkins 
Gerrit-Reviewer: Jianfeng Jia 
Gerrit-Reviewer: Steven Jacobs 
Gerrit-Reviewer: Taewoo Kim 
Gerrit-HasComments: No


Change in asterixdb[master]: ASTERIXDB-1778: Optimize the edit-distance-check function

2017-02-03 Thread Anon. E. Moose (Code Review)
Anon. E. Moose #1000151 has posted comments on this change.

Change subject: ASTERIXDB-1778: Optimize the edit-distance-check function
..


Patch Set 4:

(5 comments)

First set of comments

https://asterix-gerrit.ics.uci.edu/#/c/1481/4/asterixdb/asterix-fuzzyjoin/src/main/java/org/apache/asterix/fuzzyjoin/similarity/IGenericSimilarityMetric.java
File 
asterixdb/asterix-fuzzyjoin/src/main/java/org/apache/asterix/fuzzyjoin/similarity/IGenericSimilarityMetric.java:

PS4, Line 29: float
> this function doesn't have to be exposed.
Is it better to rename "get" to "compute" since "get" seems to suggest it's a 
"getter"?


https://asterix-gerrit.ics.uci.edu/#/c/1481/4/asterixdb/asterix-fuzzyjoin/src/main/java/org/apache/asterix/fuzzyjoin/similarity/SimilarityMetricEditDistance.java
File 
asterixdb/asterix-fuzzyjoin/src/main/java/org/apache/asterix/fuzzyjoin/similarity/SimilarityMetricEditDistance.java:

PS4, Line 49: lists
"lists" -> "sequences" to be consistent with the parameter type?


PS4, Line 51: entire cells
"entire cells" -> "all the cells in the row"


PS4, Line 53: less than
"less than" -> "within"?


Line 99: if (canTerminateEarly) {
Where is this "canTerminateEarly" decided?  I couldn't find it.


-- 
To view, visit https://asterix-gerrit.ics.uci.edu/1481
To unsubscribe, visit https://asterix-gerrit.ics.uci.edu/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ibc8729c4514bb87c347dd7d50358fd897b769977
Gerrit-PatchSet: 4
Gerrit-Project: asterixdb
Gerrit-Branch: master
Gerrit-Owner: Taewoo Kim 
Gerrit-Reviewer: Anon. E. Moose #1000151
Gerrit-Reviewer: Jenkins 
Gerrit-Reviewer: Jianfeng Jia 
Gerrit-Reviewer: Steven Jacobs 
Gerrit-Reviewer: Taewoo Kim 
Gerrit-HasComments: Yes


Change in asterixdb[master]: ASTERIXDB-1778: Optimize the edit-distance-check function

2017-02-03 Thread Jenkins (Code Review)
Jenkins has posted comments on this change.

Change subject: ASTERIXDB-1778: Optimize the edit-distance-check function
..


Patch Set 5:

Build Started 
https://asterix-jenkins.ics.uci.edu/job/asterix-gerrit-notopic/4154/

-- 
To view, visit https://asterix-gerrit.ics.uci.edu/1481
To unsubscribe, visit https://asterix-gerrit.ics.uci.edu/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ibc8729c4514bb87c347dd7d50358fd897b769977
Gerrit-PatchSet: 5
Gerrit-Project: asterixdb
Gerrit-Branch: master
Gerrit-Owner: Taewoo Kim 
Gerrit-Reviewer: Anon. E. Moose #1000151
Gerrit-Reviewer: Jenkins 
Gerrit-Reviewer: Jianfeng Jia 
Gerrit-Reviewer: Steven Jacobs 
Gerrit-Reviewer: Taewoo Kim 
Gerrit-HasComments: No


Change in asterixdb[master]: ASTERIXDB-1778: Optimize the edit-distance-check function

2017-02-03 Thread Taewoo Kim (Code Review)
Hello Jenkins,

I'd like you to reexamine a change.  Please visit

https://asterix-gerrit.ics.uci.edu/1481

to look at the new patch set (#5).

Change subject: ASTERIXDB-1778: Optimize the edit-distance-check function
..

ASTERIXDB-1778: Optimize the edit-distance-check function

 - Only calculate 2 * (threshold + 1) cells, rather than all cells per row.
 - Terminate the calculation steps early when it becomes obvious that
   the possible edit-distance value is greater than the given threshold.
   There is no reason to compute all cells in the 2 dimensional array.
 - Move the location of IListIterator to Hyracks since we now have
   a CharacterIterator in a String. Change the name to ISequenceIterator.
 - Add the section for the function in the manual.
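
The first two items above (a band of cells per row, plus early termination) can be illustrated with a minimal sketch. This is not the code from the patch: the class name, the method name, and the `ABOVE_THRESHOLD` constant are hypothetical, the band here is the `2 * threshold + 1` cells around the diagonal plus one boundary cell on each side, and it compares Java `char` values rather than using the iterator interfaces mentioned in the change.

```java
/** Illustrative only: a banded edit-distance check with early termination. */
final class BandedEditDistance {
    /** Hypothetical sentinel meaning "the distance exceeds the threshold". */
    static final int ABOVE_THRESHOLD = -1;

    private BandedEditDistance() {
    }

    /** Returns the edit distance if it is within the threshold, else ABOVE_THRESHOLD. */
    static int check(String first, String second, int threshold) {
        if (threshold < 0) {
            throw new IllegalArgumentException("threshold must be non-negative");
        }
        int n = first.length();
        int m = second.length();
        // The length difference is a lower bound on the edit distance.
        if (Math.abs(n - m) > threshold) {
            return ABOVE_THRESHOLD;
        }
        // Every value above the threshold is capped to this single "too far" value.
        int tooFar = threshold + 1;
        int[] prev = new int[m + 1];
        int[] curr = new int[m + 1];
        for (int j = 0; j <= m; j++) {
            prev[j] = Math.min(j, tooFar);
        }
        for (int i = 1; i <= n; i++) {
            // Only columns with |i - j| <= threshold can hold a value within the threshold.
            int lo = Math.max(1, i - threshold);
            int hi = Math.min(m, i + threshold);
            // Left boundary: column 0 holds i deletions; any other boundary is outside the band.
            curr[lo - 1] = (lo == 1) ? Math.min(i, tooFar) : tooFar;
            int rowMin = curr[lo - 1];
            for (int j = lo; j <= hi; j++) {
                int cost = first.charAt(i - 1) == second.charAt(j - 1) ? 0 : 1;
                int best = Math.min(prev[j - 1] + cost, Math.min(prev[j], curr[j - 1]) + 1);
                curr[j] = Math.min(best, tooFar);
                rowMin = Math.min(rowMin, curr[j]);
            }
            // Right boundary, read by the next row.
            if (hi < m) {
                curr[hi + 1] = tooFar;
            }
            // Values never decrease along a path, so the final cell cannot recover.
            if (rowMin > threshold) {
                return ABOVE_THRESHOLD;
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[m] <= threshold ? prev[m] : ABOVE_THRESHOLD;
    }
}
```

For example, `BandedEditDistance.check("kitten", "sitting", 3)` yields 3, while the same call with a threshold of 2 yields the sentinel.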

Change-Id: Ibc8729c4514bb87c347dd7d50358fd897b769977
---
M asterixdb/asterix-doc/src/main/markdown/builtins/5_similarity.md
M asterixdb/asterix-fuzzyjoin/pom.xml
M 
asterixdb/asterix-fuzzyjoin/src/main/java/org/apache/asterix/fuzzyjoin/similarity/IGenericSimilarityMetric.java
M 
asterixdb/asterix-fuzzyjoin/src/main/java/org/apache/asterix/fuzzyjoin/similarity/SimilarityMetric.java
M 
asterixdb/asterix-fuzzyjoin/src/main/java/org/apache/asterix/fuzzyjoin/similarity/SimilarityMetricEditDistance.java
M 
asterixdb/asterix-fuzzyjoin/src/main/java/org/apache/asterix/fuzzyjoin/similarity/SimilarityMetricJaccard.java
M 
asterixdb/asterix-runtime/src/main/java/org/apache/asterix/runtime/evaluators/common/AbstractAsterixListIterator.java
M 
asterixdb/asterix-runtime/src/main/java/org/apache/asterix/runtime/evaluators/common/EditDistanceCheckEvaluator.java
M 
asterixdb/asterix-runtime/src/main/java/org/apache/asterix/runtime/evaluators/common/EditDistanceEvaluator.java
R 
hyracks-fullstack/hyracks/hyracks-data/hyracks-data-std/src/main/java/org/apache/hyracks/data/std/util/ISequenceIterator.java
A 
hyracks-fullstack/hyracks/hyracks-data/hyracks-data-std/src/main/java/org/apache/hyracks/data/std/util/UTF8StringCharByCharIterator.java
11 files changed, 291 insertions(+), 239 deletions(-)


  git pull ssh://asterix-gerrit.ics.uci.edu:29418/asterixdb 
refs/changes/81/1481/5
-- 
To view, visit https://asterix-gerrit.ics.uci.edu/1481
To unsubscribe, visit https://asterix-gerrit.ics.uci.edu/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ibc8729c4514bb87c347dd7d50358fd897b769977
Gerrit-PatchSet: 5
Gerrit-Project: asterixdb
Gerrit-Branch: master
Gerrit-Owner: Taewoo Kim 
Gerrit-Reviewer: Anon. E. Moose #1000151
Gerrit-Reviewer: Jenkins 
Gerrit-Reviewer: Jianfeng Jia 
Gerrit-Reviewer: Steven Jacobs 
Gerrit-Reviewer: Taewoo Kim 


Change in asterixdb[master]: ASTERIXDB-1778: Optimize the edit-distance-check function

2017-02-03 Thread Taewoo Kim (Code Review)
Taewoo Kim has posted comments on this change.

Change subject: ASTERIXDB-1778: Optimize the edit-distance-check function
..


Patch Set 3:

(11 comments)

@Jianfeng: Thanks!

https://asterix-gerrit.ics.uci.edu/#/c/1481/4/asterixdb/asterix-fuzzyjoin/src/main/java/org/apache/asterix/fuzzyjoin/similarity/IGenericSimilarityMetric.java
File 
asterixdb/asterix-fuzzyjoin/src/main/java/org/apache/asterix/fuzzyjoin/similarity/IGenericSimilarityMetric.java:

PS4, Line 26:  
> use javadoc syntax?
Done


PS4, Line 29: float
> this function doesn't have to be exposed.
Done


PS4, Line 32: returns
> use javadoc?
Done


https://asterix-gerrit.ics.uci.edu/#/c/1481/2/asterixdb/asterix-fuzzyjoin/src/main/java/org/apache/asterix/fuzzyjoin/similarity/SimilarityMetric.java
File 
asterixdb/asterix-fuzzyjoin/src/main/java/org/apache/asterix/fuzzyjoin/similarity/SimilarityMetric.java:

Line 28: public static int getIntersectSize(ISequenceIterator tokensX, 
ISequenceIterator tokensY)
> MAJOR SonarQube violation:
Done


https://asterix-gerrit.ics.uci.edu/#/c/1481/3/asterixdb/asterix-fuzzyjoin/src/main/java/org/apache/asterix/fuzzyjoin/similarity/SimilarityMetricEditDistance.java
File 
asterixdb/asterix-fuzzyjoin/src/main/java/org/apache/asterix/fuzzyjoin/similarity/SimilarityMetricEditDistance.java:

PS3, Line 63: public
> is it necessary to have a `public interface`?
Agreed and done.


Line 70: boolean canTerminateEarly = edThresh >= 0 ? true : false;
> *boolean canTerminateEarly = edThresh >= 0* is enough.
The caller of this function already checks your if condition. Since 
we changed this to a private function, I think it's OK not to add the if 
condition.


PS3, Line 131: 1
> can you define a static variable and give `-1` a good name?
Done


PS3, Line 144: Gets
> do we really need this comment? :-)
Done


PS3, Line 157: -
> it is worth explaining the meaning of -1
Done


PS3, Line 168: public
> is it necessary to be a public method?
Yes. It is being called from the outside of this class.


PS3, Line 219: public
> public -> private?
It is being called from the outside.


-- 
To view, visit https://asterix-gerrit.ics.uci.edu/1481
To unsubscribe, visit https://asterix-gerrit.ics.uci.edu/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ibc8729c4514bb87c347dd7d50358fd897b769977
Gerrit-PatchSet: 3
Gerrit-Project: asterixdb
Gerrit-Branch: master
Gerrit-Owner: Taewoo Kim 
Gerrit-Reviewer: Anon. E. Moose #1000151
Gerrit-Reviewer: Jenkins 
Gerrit-Reviewer: Jianfeng Jia 
Gerrit-Reviewer: Steven Jacobs 
Gerrit-Reviewer: Taewoo Kim 
Gerrit-HasComments: Yes


Change in asterixdb[master]: ASTERIXDB-1778: Optimize the edit-distance-check function

2017-02-02 Thread Jenkins (Code Review)
Jenkins has posted comments on this change.

Change subject: ASTERIXDB-1778: Optimize the edit-distance-check function
..


Patch Set 4: Integration-Tests-1

Integration Tests Failed

https://asterix-jenkins.ics.uci.edu/job/asterix-gerrit-integration-tests/1775/ 
: UNSTABLE

-- 
To view, visit https://asterix-gerrit.ics.uci.edu/1481
To unsubscribe, visit https://asterix-gerrit.ics.uci.edu/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ibc8729c4514bb87c347dd7d50358fd897b769977
Gerrit-PatchSet: 4
Gerrit-Project: asterixdb
Gerrit-Branch: master
Gerrit-Owner: Taewoo Kim 
Gerrit-Reviewer: Anon. E. Moose #1000151
Gerrit-Reviewer: Jenkins 
Gerrit-Reviewer: Jianfeng Jia 
Gerrit-Reviewer: Steven Jacobs 
Gerrit-Reviewer: Taewoo Kim 
Gerrit-HasComments: No


Change in asterixdb[master]: ASTERIXDB-1778: Optimize the edit-distance-check function

2017-02-02 Thread Jenkins (Code Review)
Jenkins has posted comments on this change.

Change subject: ASTERIXDB-1778: Optimize the edit-distance-check function
..


Patch Set 4: BAD-1

BAD Compatibility Tests Failed

https://asterix-jenkins.ics.uci.edu/job/asterixbad-compat/442/ : FAILURE

-- 
To view, visit https://asterix-gerrit.ics.uci.edu/1481
To unsubscribe, visit https://asterix-gerrit.ics.uci.edu/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ibc8729c4514bb87c347dd7d50358fd897b769977
Gerrit-PatchSet: 4
Gerrit-Project: asterixdb
Gerrit-Branch: master
Gerrit-Owner: Taewoo Kim 
Gerrit-Reviewer: Jenkins 
Gerrit-Reviewer: Jianfeng Jia 
Gerrit-Reviewer: Steven Jacobs 
Gerrit-Reviewer: Taewoo Kim 
Gerrit-HasComments: No


Change in asterixdb[master]: ASTERIXDB-1778: Optimize the edit-distance-check function

2017-02-02 Thread Jenkins (Code Review)
Jenkins has posted comments on this change.

Change subject: ASTERIXDB-1778: Optimize the edit-distance-check function
..


Patch Set 4:

BAD Compatibility Tests Started 
https://asterix-jenkins.ics.uci.edu/job/asterixbad-compat/442/

-- 
To view, visit https://asterix-gerrit.ics.uci.edu/1481
To unsubscribe, visit https://asterix-gerrit.ics.uci.edu/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ibc8729c4514bb87c347dd7d50358fd897b769977
Gerrit-PatchSet: 4
Gerrit-Project: asterixdb
Gerrit-Branch: master
Gerrit-Owner: Taewoo Kim 
Gerrit-Reviewer: Jenkins 
Gerrit-Reviewer: Jianfeng Jia 
Gerrit-Reviewer: Steven Jacobs 
Gerrit-Reviewer: Taewoo Kim 
Gerrit-HasComments: No


Change in asterixdb[master]: ASTERIXDB-1778: Optimize the edit-distance-check function

2017-02-02 Thread Jenkins (Code Review)
Jenkins has posted comments on this change.

Change subject: ASTERIXDB-1778: Optimize the edit-distance-check function
..


Patch Set 4:

Integration Tests Started 
https://asterix-jenkins.ics.uci.edu/job/asterix-gerrit-integration-tests/1775/

-- 
To view, visit https://asterix-gerrit.ics.uci.edu/1481
To unsubscribe, visit https://asterix-gerrit.ics.uci.edu/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ibc8729c4514bb87c347dd7d50358fd897b769977
Gerrit-PatchSet: 4
Gerrit-Project: asterixdb
Gerrit-Branch: master
Gerrit-Owner: Taewoo Kim 
Gerrit-Reviewer: Jenkins 
Gerrit-Reviewer: Jianfeng Jia 
Gerrit-Reviewer: Steven Jacobs 
Gerrit-Reviewer: Taewoo Kim 
Gerrit-HasComments: No


Change in asterixdb[master]: ASTERIXDB-1778: Optimize the edit-distance-check function

2017-02-02 Thread Jenkins (Code Review)
Jenkins has posted comments on this change.

Change subject: ASTERIXDB-1778: Optimize the edit-distance-check function
..


Patch Set 3: Integration-Tests+1

Integration Tests Successful

https://asterix-jenkins.ics.uci.edu/job/asterix-gerrit-integration-tests/1773/ 
: SUCCESS

-- 
To view, visit https://asterix-gerrit.ics.uci.edu/1481
To unsubscribe, visit https://asterix-gerrit.ics.uci.edu/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ibc8729c4514bb87c347dd7d50358fd897b769977
Gerrit-PatchSet: 3
Gerrit-Project: asterixdb
Gerrit-Branch: master
Gerrit-Owner: Taewoo Kim 
Gerrit-Reviewer: Jenkins 
Gerrit-Reviewer: Jianfeng Jia 
Gerrit-Reviewer: Steven Jacobs 
Gerrit-Reviewer: Taewoo Kim 
Gerrit-HasComments: No


Change in asterixdb[master]: ASTERIXDB-1778: Optimize the edit-distance-check function

2017-02-02 Thread Jianfeng Jia (Code Review)
Jianfeng Jia has posted comments on this change.

Change subject: ASTERIXDB-1778: Optimize the edit-distance-check function
..


Patch Set 4:

(10 comments)

Just some minor comments.

https://asterix-gerrit.ics.uci.edu/#/c/1481/4/asterixdb/asterix-fuzzyjoin/src/main/java/org/apache/asterix/fuzzyjoin/similarity/IGenericSimilarityMetric.java
File 
asterixdb/asterix-fuzzyjoin/src/main/java/org/apache/asterix/fuzzyjoin/similarity/IGenericSimilarityMetric.java:

PS4, Line 26:  
use javadoc syntax?


PS4, Line 29: float
this function doesn't have to be exposed.


PS4, Line 32: returns
use javadoc?


https://asterix-gerrit.ics.uci.edu/#/c/1481/3/asterixdb/asterix-fuzzyjoin/src/main/java/org/apache/asterix/fuzzyjoin/similarity/SimilarityMetricEditDistance.java
File 
asterixdb/asterix-fuzzyjoin/src/main/java/org/apache/asterix/fuzzyjoin/similarity/SimilarityMetricEditDistance.java:

PS3, Line 63: public
is it necessary to have a `public interface`?
I think it can just be a private function of this class.


Line 70: boolean canTerminateEarly = edThresh >= 0 ? true : false;
*boolean canTerminateEarly = edThresh >= 0* is enough.

And if edThresh > min(flLen, slLen), should it also be false?
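
The first point can be shown with a small sketch. The class and method names are hypothetical and only `edThresh` comes from the reviewed line. A comparison already evaluates to a boolean, so the conditional operator adds nothing.

```java
/** Illustrative only: both methods return the same value for every input. */
final class EarlyTerminationFlag {
    private EarlyTerminationFlag() {
    }

    /** The form under review: a redundant conditional around a boolean expression. */
    static boolean verbose(int edThresh) {
        return edThresh >= 0 ? true : false;
    }

    /** The suggested form: use the comparison result directly. */
    static boolean concise(int edThresh) {
        return edThresh >= 0;
    }
}
```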


PS3, Line 131: 1
can you define a static variable and give `-1` a good name?


PS3, Line 144: Gets
do we really need this comment? :-)


PS3, Line 157: -
it is worth explaining the meaning of -1


PS3, Line 168: public
is it necessary to be a public method?


PS3, Line 219: public
public -> private?


-- 
To view, visit https://asterix-gerrit.ics.uci.edu/1481
To unsubscribe, visit https://asterix-gerrit.ics.uci.edu/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ibc8729c4514bb87c347dd7d50358fd897b769977
Gerrit-PatchSet: 4
Gerrit-Project: asterixdb
Gerrit-Branch: master
Gerrit-Owner: Taewoo Kim 
Gerrit-Reviewer: Jenkins 
Gerrit-Reviewer: Jianfeng Jia 
Gerrit-Reviewer: Steven Jacobs 
Gerrit-Reviewer: Taewoo Kim 
Gerrit-HasComments: Yes


Change in asterixdb[master]: ASTERIXDB-1778: Optimize the edit-distance-check function

2017-02-02 Thread Jenkins (Code Review)
Jenkins has posted comments on this change.

Change subject: ASTERIXDB-1778: Optimize the edit-distance-check function
..


Patch Set 4:

WARNING: THIS CHANGE CONTAINS CROSS-PRODUCT CHANGES IN:
* asterixdb
* hyracks-fullstack

PLEASE REVIEW CAREFULLY AND LOOK FOR API CHANGES!

-- 
To view, visit https://asterix-gerrit.ics.uci.edu/1481
To unsubscribe, visit https://asterix-gerrit.ics.uci.edu/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ibc8729c4514bb87c347dd7d50358fd897b769977
Gerrit-PatchSet: 4
Gerrit-Project: asterixdb
Gerrit-Branch: master
Gerrit-Owner: Taewoo Kim 
Gerrit-Reviewer: Jenkins 
Gerrit-Reviewer: Jianfeng Jia 
Gerrit-Reviewer: Steven Jacobs 
Gerrit-Reviewer: Taewoo Kim 
Gerrit-HasComments: No


Change in asterixdb[master]: ASTERIXDB-1778: Optimize the edit-distance-check function

2017-02-02 Thread Jenkins (Code Review)
Jenkins has posted comments on this change.

Change subject: ASTERIXDB-1778: Optimize the edit-distance-check function
..


Patch Set 4:

Build Started 
https://asterix-jenkins.ics.uci.edu/job/asterix-gerrit-notopic/4147/

-- 
To view, visit https://asterix-gerrit.ics.uci.edu/1481
To unsubscribe, visit https://asterix-gerrit.ics.uci.edu/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ibc8729c4514bb87c347dd7d50358fd897b769977
Gerrit-PatchSet: 4
Gerrit-Project: asterixdb
Gerrit-Branch: master
Gerrit-Owner: Taewoo Kim 
Gerrit-Reviewer: Jenkins 
Gerrit-Reviewer: Jianfeng Jia 
Gerrit-Reviewer: Steven Jacobs 
Gerrit-Reviewer: Taewoo Kim 
Gerrit-HasComments: No


Change in asterixdb[master]: ASTERIXDB-1778: Optimize the edit-distance-check function

2017-02-02 Thread Taewoo Kim (Code Review)
Hello Jenkins,

I'd like you to reexamine a change.  Please visit

https://asterix-gerrit.ics.uci.edu/1481

to look at the new patch set (#4).

Change subject: ASTERIXDB-1778: Optimize the edit-distance-check function
..

ASTERIXDB-1778: Optimize the edit-distance-check function

 - Only calculate 2 * (threshold + 1) cells, rather than all cells per row.
 - Terminate the calculation steps early when it becomes obvious that
   the possible edit-distance value is greater than the given threshold.
   There is no reason to compute all cells in the 2 dimensional array.
 - Move the location of IListIterator to Hyracks since we now have
   a CharacterIterator in a String. Change the name to ISequenceIterator.
 - Add the section for the function in the manual.

Change-Id: Ibc8729c4514bb87c347dd7d50358fd897b769977
---
M asterixdb/asterix-doc/src/main/markdown/builtins/5_similarity.md
M asterixdb/asterix-fuzzyjoin/pom.xml
M 
asterixdb/asterix-fuzzyjoin/src/main/java/org/apache/asterix/fuzzyjoin/similarity/IGenericSimilarityMetric.java
M 
asterixdb/asterix-fuzzyjoin/src/main/java/org/apache/asterix/fuzzyjoin/similarity/SimilarityMetric.java
M 
asterixdb/asterix-fuzzyjoin/src/main/java/org/apache/asterix/fuzzyjoin/similarity/SimilarityMetricEditDistance.java
M 
asterixdb/asterix-fuzzyjoin/src/main/java/org/apache/asterix/fuzzyjoin/similarity/SimilarityMetricJaccard.java
M 
asterixdb/asterix-runtime/src/main/java/org/apache/asterix/runtime/evaluators/common/AbstractAsterixListIterator.java
M 
asterixdb/asterix-runtime/src/main/java/org/apache/asterix/runtime/evaluators/common/EditDistanceCheckEvaluator.java
M 
asterixdb/asterix-runtime/src/main/java/org/apache/asterix/runtime/evaluators/common/EditDistanceEvaluator.java
M 
asterixdb/asterix-runtime/src/main/java/org/apache/asterix/runtime/evaluators/common/SimilarityJaccardSortedEvaluator.java
R 
hyracks-fullstack/hyracks/hyracks-data/hyracks-data-std/src/main/java/org/apache/hyracks/data/std/util/ISequenceIterator.java
A 
hyracks-fullstack/hyracks/hyracks-data/hyracks-data-std/src/main/java/org/apache/hyracks/data/std/util/UTF8StringCharByCharIterator.java
12 files changed, 255 insertions(+), 226 deletions(-)


  git pull ssh://asterix-gerrit.ics.uci.edu:29418/asterixdb 
refs/changes/81/1481/4
-- 
To view, visit https://asterix-gerrit.ics.uci.edu/1481
To unsubscribe, visit https://asterix-gerrit.ics.uci.edu/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ibc8729c4514bb87c347dd7d50358fd897b769977
Gerrit-PatchSet: 4
Gerrit-Project: asterixdb
Gerrit-Branch: master
Gerrit-Owner: Taewoo Kim 
Gerrit-Reviewer: Jenkins 
Gerrit-Reviewer: Jianfeng Jia 
Gerrit-Reviewer: Steven Jacobs 
Gerrit-Reviewer: Taewoo Kim 


Change in asterixdb[master]: ASTERIXDB-1778: optimize the edit-distance-check function

2017-02-02 Thread Jenkins (Code Review)
Jenkins has posted comments on this change.

Change subject: ASTERIXDB-1778: optimize the edit-distance-check function
..


Patch Set 3: BAD-1

BAD Compatibility Tests Failed

https://asterix-jenkins.ics.uci.edu/job/asterixbad-compat/440/ : FAILURE

-- 
To view, visit https://asterix-gerrit.ics.uci.edu/1481
To unsubscribe, visit https://asterix-gerrit.ics.uci.edu/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ibc8729c4514bb87c347dd7d50358fd897b769977
Gerrit-PatchSet: 3
Gerrit-Project: asterixdb
Gerrit-Branch: master
Gerrit-Owner: Taewoo Kim 
Gerrit-Reviewer: Jenkins 
Gerrit-Reviewer: Jianfeng Jia 
Gerrit-Reviewer: Steven Jacobs 
Gerrit-Reviewer: Taewoo Kim 
Gerrit-HasComments: No


Change in asterixdb[master]: ASTERIXDB-1778: optimize the edit-distance-check function

2017-02-02 Thread Jenkins (Code Review)
Jenkins has posted comments on this change.

Change subject: ASTERIXDB-1778: optimize the edit-distance-check function
..


Patch Set 3:

BAD Compatibility Tests Started 
https://asterix-jenkins.ics.uci.edu/job/asterixbad-compat/440/

-- 
To view, visit https://asterix-gerrit.ics.uci.edu/1481
To unsubscribe, visit https://asterix-gerrit.ics.uci.edu/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ibc8729c4514bb87c347dd7d50358fd897b769977
Gerrit-PatchSet: 3
Gerrit-Project: asterixdb
Gerrit-Branch: master
Gerrit-Owner: Taewoo Kim 
Gerrit-Reviewer: Jenkins 
Gerrit-Reviewer: Jianfeng Jia 
Gerrit-Reviewer: Steven Jacobs 
Gerrit-Reviewer: Taewoo Kim 
Gerrit-HasComments: No


Change in asterixdb[master]: ASTERIXDB-1778: optimize the edit-distance-check function

2017-02-02 Thread Jenkins (Code Review)
Jenkins has posted comments on this change.

Change subject: ASTERIXDB-1778: optimize the edit-distance-check function
..


Patch Set 3:

Integration Tests Started 
https://asterix-jenkins.ics.uci.edu/job/asterix-gerrit-integration-tests/1773/

-- 
To view, visit https://asterix-gerrit.ics.uci.edu/1481
To unsubscribe, visit https://asterix-gerrit.ics.uci.edu/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ibc8729c4514bb87c347dd7d50358fd897b769977
Gerrit-PatchSet: 3
Gerrit-Project: asterixdb
Gerrit-Branch: master
Gerrit-Owner: Taewoo Kim 
Gerrit-Reviewer: Jenkins 
Gerrit-Reviewer: Jianfeng Jia 
Gerrit-Reviewer: Steven Jacobs 
Gerrit-Reviewer: Taewoo Kim 
Gerrit-HasComments: No


Change in asterixdb[master]: ASTERIXDB-1778: optimize the edit-distance-check function

2017-02-02 Thread Jenkins (Code Review)
Jenkins has posted comments on this change.

Change subject: ASTERIXDB-1778: optimize the edit-distance-check function
..


Patch Set 3:

WARNING: THIS CHANGE CONTAINS CROSS-PRODUCT CHANGES IN:
* asterixdb
* hyracks-fullstack

PLEASE REVIEW CAREFULLY AND LOOK FOR API CHANGES!

-- 
To view, visit https://asterix-gerrit.ics.uci.edu/1481
To unsubscribe, visit https://asterix-gerrit.ics.uci.edu/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ibc8729c4514bb87c347dd7d50358fd897b769977
Gerrit-PatchSet: 3
Gerrit-Project: asterixdb
Gerrit-Branch: master
Gerrit-Owner: Taewoo Kim 
Gerrit-Reviewer: Jenkins 
Gerrit-Reviewer: Jianfeng Jia 
Gerrit-Reviewer: Steven Jacobs 
Gerrit-Reviewer: Taewoo Kim 
Gerrit-HasComments: No


Change in asterixdb[master]: ASTERIXDB-1778: optimize the edit-distance-check function

2017-02-02 Thread Jenkins (Code Review)
Jenkins has posted comments on this change.

Change subject: ASTERIXDB-1778: optimize the edit-distance-check function
..


Patch Set 3:

Build Started 
https://asterix-jenkins.ics.uci.edu/job/asterix-gerrit-notopic/4144/

-- 
To view, visit https://asterix-gerrit.ics.uci.edu/1481
To unsubscribe, visit https://asterix-gerrit.ics.uci.edu/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ibc8729c4514bb87c347dd7d50358fd897b769977
Gerrit-PatchSet: 3
Gerrit-Project: asterixdb
Gerrit-Branch: master
Gerrit-Owner: Taewoo Kim 
Gerrit-Reviewer: Jenkins 
Gerrit-Reviewer: Jianfeng Jia 
Gerrit-Reviewer: Steven Jacobs 
Gerrit-Reviewer: Taewoo Kim 
Gerrit-HasComments: No


Change in asterixdb[master]: ASTERIXDB-1778: optimize the edit-distance-check function

2017-02-02 Thread Taewoo Kim (Code Review)
Hello Jenkins,

I'd like you to reexamine a change.  Please visit

https://asterix-gerrit.ics.uci.edu/1481

to look at the new patch set (#3).

Change subject: ASTERIXDB-1778: optimize the edit-distance-check function
..

ASTERIXDB-1778: optimize the edit-distance-check function

 - Only calculate 2 * (threshold + 1) cells, rather than all cells per row.
 - Terminate the calculation steps early when it becomes obvious that
   the possible edit-distance value is greater than the given threshold.
   There is no reason to compute all cells in the 2 dimensional array.
 - Move the location of IListIterator to Hyracks since we now have
   a CharacterIterator in a String.

Change-Id: Ibc8729c4514bb87c347dd7d50358fd897b769977
---
M asterixdb/asterix-doc/src/main/markdown/builtins/5_similarity.md
M asterixdb/asterix-fuzzyjoin/pom.xml
M 
asterixdb/asterix-fuzzyjoin/src/main/java/org/apache/asterix/fuzzyjoin/similarity/IGenericSimilarityMetric.java
M 
asterixdb/asterix-fuzzyjoin/src/main/java/org/apache/asterix/fuzzyjoin/similarity/SimilarityMetric.java
M 
asterixdb/asterix-fuzzyjoin/src/main/java/org/apache/asterix/fuzzyjoin/similarity/SimilarityMetricEditDistance.java
M 
asterixdb/asterix-fuzzyjoin/src/main/java/org/apache/asterix/fuzzyjoin/similarity/SimilarityMetricJaccard.java
M 
asterixdb/asterix-runtime/src/main/java/org/apache/asterix/runtime/evaluators/common/AbstractAsterixListIterator.java
M 
asterixdb/asterix-runtime/src/main/java/org/apache/asterix/runtime/evaluators/common/EditDistanceCheckEvaluator.java
M 
asterixdb/asterix-runtime/src/main/java/org/apache/asterix/runtime/evaluators/common/EditDistanceEvaluator.java
M 
asterixdb/asterix-runtime/src/main/java/org/apache/asterix/runtime/evaluators/common/SimilarityJaccardSortedEvaluator.java
R 
hyracks-fullstack/hyracks/hyracks-data/hyracks-data-std/src/main/java/org/apache/hyracks/data/std/util/ISequenceIterator.java
A 
hyracks-fullstack/hyracks/hyracks-data/hyracks-data-std/src/main/java/org/apache/hyracks/data/std/util/UTF8StringCharByCharIterator.java
12 files changed, 250 insertions(+), 170 deletions(-)


  git pull ssh://asterix-gerrit.ics.uci.edu:29418/asterixdb 
refs/changes/81/1481/3
-- 
To view, visit https://asterix-gerrit.ics.uci.edu/1481
To unsubscribe, visit https://asterix-gerrit.ics.uci.edu/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ibc8729c4514bb87c347dd7d50358fd897b769977
Gerrit-PatchSet: 3
Gerrit-Project: asterixdb
Gerrit-Branch: master
Gerrit-Owner: Taewoo Kim 
Gerrit-Reviewer: Jenkins 
Gerrit-Reviewer: Jianfeng Jia 
Gerrit-Reviewer: Steven Jacobs 
Gerrit-Reviewer: Taewoo Kim 


Change in asterixdb[master]: ASTERIXDB-1778: optimize the edit-distance-check function

2017-02-02 Thread Jenkins (Code Review)
Jenkins has posted comments on this change.

Change subject: ASTERIXDB-1778: optimize the edit-distance-check function
..


Patch Set 2:

WARNING: THIS CHANGE CONTAINS CROSS-PRODUCT CHANGES IN:
* asterixdb
* hyracks-fullstack

PLEASE REVIEW CAREFULLY AND LOOK FOR API CHANGES!

-- 
To view, visit https://asterix-gerrit.ics.uci.edu/1481
To unsubscribe, visit https://asterix-gerrit.ics.uci.edu/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ibc8729c4514bb87c347dd7d50358fd897b769977
Gerrit-PatchSet: 2
Gerrit-Project: asterixdb
Gerrit-Branch: master
Gerrit-Owner: Taewoo Kim 
Gerrit-Reviewer: Jenkins 
Gerrit-Reviewer: Jianfeng Jia 
Gerrit-Reviewer: Steven Jacobs 
Gerrit-Reviewer: Taewoo Kim 
Gerrit-HasComments: No


Change in asterixdb[master]: ASTERIXDB-1778: optimize the edit-distance-check function

2017-02-02 Thread Taewoo Kim (Code Review)
Taewoo Kim has posted comments on this change.

Change subject: ASTERIXDB-1778: optimize the edit-distance-check function
..


Patch Set 2:

Added Steven because of BAD failure.

-- 
To view, visit https://asterix-gerrit.ics.uci.edu/1481
To unsubscribe, visit https://asterix-gerrit.ics.uci.edu/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ibc8729c4514bb87c347dd7d50358fd897b769977
Gerrit-PatchSet: 2
Gerrit-Project: asterixdb
Gerrit-Branch: master
Gerrit-Owner: Taewoo Kim 
Gerrit-Reviewer: Jenkins 
Gerrit-Reviewer: Jianfeng Jia 
Gerrit-Reviewer: Steven Jacobs 
Gerrit-Reviewer: Taewoo Kim 
Gerrit-HasComments: No


Change in asterixdb[master]: ASTERIXDB-1778: optimize the edit-distance-check function

2017-02-02 Thread Jenkins (Code Review)
Jenkins has posted comments on this change.

Change subject: ASTERIXDB-1778: optimize the edit-distance-check function
..


Patch Set 2:

Build Started 
https://asterix-jenkins.ics.uci.edu/job/asterix-gerrit-notopic/4143/

-- 
To view, visit https://asterix-gerrit.ics.uci.edu/1481
To unsubscribe, visit https://asterix-gerrit.ics.uci.edu/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ibc8729c4514bb87c347dd7d50358fd897b769977
Gerrit-PatchSet: 2
Gerrit-Project: asterixdb
Gerrit-Branch: master
Gerrit-Owner: Taewoo Kim 
Gerrit-Reviewer: Jenkins 
Gerrit-HasComments: No


Change in asterixdb[master]: ASTERIXDB-1778: optimize the edit-distance-check function

2017-02-02 Thread Taewoo Kim (Code Review)
Hello Jenkins,

I'd like you to reexamine a change.  Please visit

https://asterix-gerrit.ics.uci.edu/1481

to look at the new patch set (#2).

Change subject: ASTERIXDB-1778: optimize the edit-distance-check function
..

ASTERIXDB-1778: optimize the edit-distance-check function

 - Only calculate 2 * (threshold + 1) cells, rather than all cells per row.
 - Terminate the calculation steps early when it becomes obvious that
   the possible edit-distance value is greater than the given threshold.
   There is no reason to compute all cells in the 2-dimensional array.
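
The two optimizations listed above can be sketched as follows. This is a
minimal illustration only, not the code from
SimilarityMetricEditDistance.java: the class name BandedEditDistance and the
method check are hypothetical, and the sketch works on plain Java Strings
instead of the iterators used in the patch. It fills only the cells within
`threshold` of the diagonal in each row (2 * threshold + 1 cells, the same
order as the 2 * (threshold + 1) mentioned above) and stops as soon as every
cell in a row exceeds the threshold.

```java
import java.util.Arrays;

// Hypothetical sketch; names are assumptions, not AsterixDB APIs.
final class BandedEditDistance {
    private BandedEditDistance() {
    }

    // Returns the edit distance of a and b if it is <= threshold, else -1.
    static int check(String a, String b, int threshold) {
        int n = a.length();
        int m = b.length();
        // The distance is at least the length difference.
        if (threshold < 0 || Math.abs(n - m) > threshold) {
            return -1;
        }
        final int inf = threshold + 1; // stands for "greater than threshold"
        int[] prev = new int[m + 1];
        int[] curr = new int[m + 1];
        Arrays.fill(prev, inf);
        Arrays.fill(curr, inf);
        // Row 0: distance from the empty prefix of a to each prefix of b.
        for (int j = 0; j <= Math.min(m, threshold); j++) {
            prev[j] = j;
        }
        for (int i = 1; i <= n; i++) {
            // Only columns within `threshold` of the diagonal can matter.
            int lo = Math.max(1, i - threshold);
            int hi = Math.min(m, i + threshold);
            // Cell just left of the band: real value in column 0, else inf.
            curr[lo - 1] = (lo == 1 && i <= threshold) ? i : inf;
            int rowMin = curr[lo - 1];
            for (int j = lo; j <= hi; j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                int best = Math.min(prev[j - 1] + cost,
                        Math.min(prev[j] + 1, curr[j - 1] + 1));
                curr[j] = Math.min(best, inf); // cap so values stay bounded
                rowMin = Math.min(rowMin, curr[j]);
            }
            // Early termination: no cell in this row can lead to a result
            // within the threshold, so later rows need not be computed.
            if (rowMin > threshold) {
                return -1;
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[m] <= threshold ? prev[m] : -1;
    }
}
```

For instance, BandedEditDistance.check("happy", "hapr", 2) returns 2, which
corresponds to the [ true, 2 ] result shown for edit_distance_check in the
manual section added by this change.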

Change-Id: Ibc8729c4514bb87c347dd7d50358fd897b769977
---
M asterixdb/asterix-doc/src/main/markdown/builtins/5_similarity.md
M asterixdb/asterix-fuzzyjoin/pom.xml
M asterixdb/asterix-fuzzyjoin/src/main/java/org/apache/asterix/fuzzyjoin/similarity/IGenericSimilarityMetric.java
M asterixdb/asterix-fuzzyjoin/src/main/java/org/apache/asterix/fuzzyjoin/similarity/SimilarityMetric.java
M asterixdb/asterix-fuzzyjoin/src/main/java/org/apache/asterix/fuzzyjoin/similarity/SimilarityMetricEditDistance.java
M asterixdb/asterix-fuzzyjoin/src/main/java/org/apache/asterix/fuzzyjoin/similarity/SimilarityMetricJaccard.java
M asterixdb/asterix-runtime/src/main/java/org/apache/asterix/runtime/evaluators/common/AbstractAsterixListIterator.java
M asterixdb/asterix-runtime/src/main/java/org/apache/asterix/runtime/evaluators/common/EditDistanceCheckEvaluator.java
M asterixdb/asterix-runtime/src/main/java/org/apache/asterix/runtime/evaluators/common/EditDistanceEvaluator.java
M asterixdb/asterix-runtime/src/main/java/org/apache/asterix/runtime/evaluators/common/SimilarityJaccardSortedEvaluator.java
R hyracks-fullstack/hyracks/hyracks-data/hyracks-data-std/src/main/java/org/apache/hyracks/data/std/util/ISequenceIterator.java
A hyracks-fullstack/hyracks/hyracks-data/hyracks-data-std/src/main/java/org/apache/hyracks/data/std/util/UTF8StringCharByCharIterator.java
12 files changed, 249 insertions(+), 170 deletions(-)


  git pull ssh://asterix-gerrit.ics.uci.edu:29418/asterixdb refs/changes/81/1481/2

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ibc8729c4514bb87c347dd7d50358fd897b769977
Gerrit-PatchSet: 2
Gerrit-Project: asterixdb
Gerrit-Branch: master
Gerrit-Owner: Taewoo Kim 
Gerrit-Reviewer: Jenkins 


Change in asterixdb[master]: ASTERIXDB-1778: optimize the edit-distance-check function

2017-02-02 Thread Jenkins (Code Review)
Jenkins has posted comments on this change.

Change subject: ASTERIXDB-1778: optimize the edit-distance-check function
..


Patch Set 1: Integration-Tests+1

Integration Tests Successful

https://asterix-jenkins.ics.uci.edu/job/asterix-gerrit-integration-tests/1770/ : SUCCESS


Gerrit-MessageType: comment
Gerrit-Change-Id: Ibc8729c4514bb87c347dd7d50358fd897b769977
Gerrit-PatchSet: 1
Gerrit-Project: asterixdb
Gerrit-Branch: master
Gerrit-Owner: Taewoo Kim 
Gerrit-Reviewer: Jenkins 
Gerrit-HasComments: No


Change in asterixdb[master]: ASTERIXDB-1778: optimize the edit-distance-check function

2017-02-02 Thread Jenkins (Code Review)
Jenkins has posted comments on this change.

Change subject: ASTERIXDB-1778: optimize the edit-distance-check function
..


Patch Set 1: BAD-1

BAD Compatibility Tests Failed

https://asterix-jenkins.ics.uci.edu/job/asterixbad-compat/437/ : FAILURE


Gerrit-MessageType: comment
Gerrit-Change-Id: Ibc8729c4514bb87c347dd7d50358fd897b769977
Gerrit-PatchSet: 1
Gerrit-Project: asterixdb
Gerrit-Branch: master
Gerrit-Owner: Taewoo Kim 
Gerrit-Reviewer: Jenkins 
Gerrit-HasComments: No


Change in asterixdb[master]: ASTERIXDB-1778: optimize the edit-distance-check function

2017-02-02 Thread Jenkins (Code Review)
Jenkins has posted comments on this change.

Change subject: ASTERIXDB-1778: optimize the edit-distance-check function
..


Patch Set 1:

BAD Compatibility Tests Started https://asterix-jenkins.ics.uci.edu/job/asterixbad-compat/437/


Gerrit-MessageType: comment
Gerrit-Change-Id: Ibc8729c4514bb87c347dd7d50358fd897b769977
Gerrit-PatchSet: 1
Gerrit-Project: asterixdb
Gerrit-Branch: master
Gerrit-Owner: Taewoo Kim 
Gerrit-Reviewer: Jenkins 
Gerrit-HasComments: No


Change in asterixdb[master]: ASTERIXDB-1778: optimize the edit-distance-check function

2017-02-02 Thread Jenkins (Code Review)
Jenkins has posted comments on this change.

Change subject: ASTERIXDB-1778: optimize the edit-distance-check function
..


Patch Set 1:

Integration Tests Started https://asterix-jenkins.ics.uci.edu/job/asterix-gerrit-integration-tests/1770/


Gerrit-MessageType: comment
Gerrit-Change-Id: Ibc8729c4514bb87c347dd7d50358fd897b769977
Gerrit-PatchSet: 1
Gerrit-Project: asterixdb
Gerrit-Branch: master
Gerrit-Owner: Taewoo Kim 
Gerrit-Reviewer: Jenkins 
Gerrit-HasComments: No


Change in asterixdb[master]: ASTERIXDB-1778: optimize the edit-distance-check function

2017-02-02 Thread Jenkins (Code Review)
Jenkins has posted comments on this change.

Change subject: ASTERIXDB-1778: optimize the edit-distance-check function
..


Patch Set 1:

Build Started https://asterix-jenkins.ics.uci.edu/job/asterix-gerrit-notopic/4137/

-- 
To view, visit https://asterix-gerrit.ics.uci.edu/1481
To unsubscribe, visit https://asterix-gerrit.ics.uci.edu/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ibc8729c4514bb87c347dd7d50358fd897b769977
Gerrit-PatchSet: 1
Gerrit-Project: asterixdb
Gerrit-Branch: master
Gerrit-Owner: Taewoo Kim 
Gerrit-Reviewer: Jenkins 
Gerrit-HasComments: No


Change in asterixdb[master]: ASTERIXDB-1778: optimize the edit-distance-check function

2017-02-02 Thread Taewoo Kim (Code Review)
Taewoo Kim has uploaded a new change for review.

  https://asterix-gerrit.ics.uci.edu/1481

Change subject: ASTERIXDB-1778: optimize the edit-distance-check function
..

ASTERIXDB-1778: optimize the edit-distance-check function

 - Only calculate 2 * (threshold + 1) cells, rather than all cells per row.
 - Terminate the calculation steps early when it becomes obvious that
   the possible edit-distance value is greater than the given threshold.
   There is no reason to compute all cells in the 2-dimensional array.

Change-Id: Ibc8729c4514bb87c347dd7d50358fd897b769977
---
M asterixdb/asterix-doc/src/main/markdown/builtins/5_similarity.md
M asterixdb/asterix-fuzzyjoin/src/main/java/org/apache/asterix/fuzzyjoin/similarity/IGenericSimilarityMetric.java
M asterixdb/asterix-fuzzyjoin/src/main/java/org/apache/asterix/fuzzyjoin/similarity/SimilarityMetric.java
M asterixdb/asterix-fuzzyjoin/src/main/java/org/apache/asterix/fuzzyjoin/similarity/SimilarityMetricEditDistance.java
M asterixdb/asterix-fuzzyjoin/src/main/java/org/apache/asterix/fuzzyjoin/similarity/SimilarityMetricJaccard.java
M asterixdb/asterix-runtime/src/main/java/org/apache/asterix/runtime/evaluators/common/EditDistanceCheckEvaluator.java
M asterixdb/asterix-runtime/src/main/java/org/apache/asterix/runtime/evaluators/common/EditDistanceEvaluator.java
M asterixdb/asterix-runtime/src/main/java/org/apache/asterix/runtime/evaluators/common/SimilarityJaccardSortedEvaluator.java
8 files changed, 173 insertions(+), 117 deletions(-)


  git pull ssh://asterix-gerrit.ics.uci.edu:29418/asterixdb refs/changes/81/1481/1

diff --git a/asterixdb/asterix-doc/src/main/markdown/builtins/5_similarity.md b/asterixdb/asterix-doc/src/main/markdown/builtins/5_similarity.md
index 89ef0f7..cb3318f 100644
--- a/asterixdb/asterix-doc/src/main/markdown/builtins/5_similarity.md
+++ b/asterixdb/asterix-doc/src/main/markdown/builtins/5_similarity.md
@@ -47,6 +47,36 @@
 
 2
 
+### edit_distance_check ###
+* Syntax:
+
+edit_distance_check(expression1, expression2, threshold)
+
+* Checks whether the edit distance of `expression1` and `expression2` is within a given threshold.
+
+* Arguments:
+* `expression1` : a `string` or a homogeneous `array` of a comparable item type.
+* `expression2` : The same type as `expression1`.
+* `threshold` : a `bigint` that represents the distance threshold.
+* Return Value:
+* an `array` with two items:
+* The first item contains a `boolean` value representing whether the edit distance of `expression1` and `expression2` is within the given threshold.
+* The second item contains an `integer` that represents the edit distance of `expression1` and `expression2` if the first item is true.
+* If the first item is false, then the second item is set to 2147483647.
+* `missing` if any argument is a `missing` value,
+* `null` if any argument is a `null` value but no argument is a `missing` value,
+* a type error will be raised if:
+* the first or second argument is any other non-string value,
+* or, the third argument is any other non-bigint value.
+* Note: an [n_gram index](similarity.html#UsingIndexesToSupportSimilarityQueries) can be utilized for this function.
+* Example:
+
+edit_distance_check("happy","hapr",2);
+
+
+* The expected result is:
+
+[ true, 2 ]
 
 ### edit_distance_contains ###
 * Syntax:
diff --git a/asterixdb/asterix-fuzzyjoin/src/main/java/org/apache/asterix/fuzzyjoin/similarity/IGenericSimilarityMetric.java b/asterixdb/asterix-fuzzyjoin/src/main/java/org/apache/asterix/fuzzyjoin/similarity/IGenericSimilarityMetric.java
index ac4a3dd..751597d 100644
--- a/asterixdb/asterix-fuzzyjoin/src/main/java/org/apache/asterix/fuzzyjoin/similarity/IGenericSimilarityMetric.java
+++ b/asterixdb/asterix-fuzzyjoin/src/main/java/org/apache/asterix/fuzzyjoin/similarity/IGenericSimilarityMetric.java
@@ -22,8 +22,11 @@
 import org.apache.hyracks.api.exceptions.HyracksDataException;
 
 public interface IGenericSimilarityMetric {
-// returns similarity
-public float getSimilarity(IListIterator firstList, IListIterator secondList) throws HyracksDataException;
+// Returns -1 if this method supports early-termination and it becomes obvious that
+// the possible similarity value can't satisfy the given simThresh value.
+// Else returns the calculated similarity value.
+public float getActualSimilarityVal(IListIterator firstList, IListIterator secondList, float simThresh)
+throws HyracksDataException;
 
 // returns -1 if does not satisfy threshold
 // else returns similarity
diff --git a/asterixdb/asterix-fuzzyjoin/src/main/java/org/apache/asterix/fuzzyjoin/similarity/SimilarityMetric.java b/asterixdb/asterix-fuzzyjoin/src/main/java/org/apache/asterix/fuzzyjoin/similarity/SimilarityMetric.java
index d36d60d..70029a3 100644
---