nfsantos commented on code in PR #2180:
URL: https://github.com/apache/jackrabbit-oak/pull/2180#discussion_r1995029456


##########
oak-search-elastic/src/main/java/org/apache/jackrabbit/oak/plugins/index/elastic/query/ElasticRequestHandler.java:
##########
@@ -908,6 +914,66 @@ private static QueryStringQuery.Builder fullTextQuery(String text, String fieldN
         return qsqBuilder.fields(fieldName);
     }
 
+    private String rewriteQueryText(String text) {
+        String rewritten = FulltextIndex.rewriteQueryText(text);
+
+        // here we handle special cases where the syntax used in the lucene 4.x query parser is not supported by the current version
+        if (rewritten.contains("~")) {
+            rewritten = convertFuzzyQuery(rewritten);
+        }
+
+        return rewritten;
+    }
+
+    /**
+     * Converts Lucene fuzzy queries from the old syntax (float similarity) to the new syntax (edit distance).
+     * <p>
+     * In Lucene 4, fuzzy queries were specified using a floating-point similarity (e.g., "term~0.8"), where values
+     * closer to 1 required a higher similarity match. In later Lucene versions, this was replaced with a discrete
+     * edit distance (0, 1, or 2).
+     * <p>
+     * This method:
+     * <ul>
+     *   <li>Detects and converts old fuzzy queries (e.g., "roam~0.7" → "roam~1").</li>
+     *   <li>Preserves new fuzzy queries (e.g., "test~2" remains unchanged).</li>
+     *   <li>Avoids modifying proximity queries (e.g., "\"quick fox\"~5" remains unchanged).</li>
+     * </ul>
+     *
+     * @param text The input query string containing fuzzy or proximity queries.
+     * @return A query string where old fuzzy syntax is converted to the new format.
+     */
+    private String convertFuzzyQuery(String text) {
+        Matcher oldMatcher = OLD_FUZZY_PATTERN.matcher(text);
+        StringBuilder result = new StringBuilder();
+
+        while (oldMatcher.find()) {
+            String term = oldMatcher.group(1);
+            String fuzzyValue = oldMatcher.group(2);
+
+            // Skip if it's already using the new syntax (integer 0-2)
+            if (NEW_FUZZY_PATTERN.matcher(term + "~" + fuzzyValue).matches()) {
+                continue;
+            }
+
+            // Convert floating-point similarity to integer edit distance
+            int editDistance = 2; // Default to the most lenient setting
+            try {
+                float similarity = Float.parseFloat(fuzzyValue);
+                if (similarity >= 0.8f) {
+                    editDistance = 0;
+                } else if (similarity >= 0.5f) {
+                    editDistance = 1;
+                }
+            } catch (NumberFormatException e) {
+                LOG.warn("Invalid fuzzy value: {}, using default edit distance of 2", fuzzyValue);

Review Comment:
   Should we log the query as well? Otherwise we do not have the context to 
understand where this error occurred.
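
   A minimal sketch of what this could look like, not the PR's actual code. The class name `FuzzyLogSketch`, the helper `toEditDistance`, and its `queryText` parameter are hypothetical, and `java.util.logging` stands in for the project's logger only so the example is self-contained. The thresholds are copied from the diff above.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

public class FuzzyLogSketch {
    // Assumption: java.util.logging replaces the real LOG to keep the sketch standalone.
    private static final Logger LOG = Logger.getLogger(FuzzyLogSketch.class.getName());

    /**
     * Maps an old float similarity to an edit distance (thresholds taken from the diff).
     * queryText is passed in only so the warning can report where the bad value came from.
     */
    static int toEditDistance(String fuzzyValue, String queryText) {
        int editDistance = 2; // default to the most lenient setting
        try {
            float similarity = Float.parseFloat(fuzzyValue);
            if (similarity >= 0.8f) {
                editDistance = 0;
            } else if (similarity >= 0.5f) {
                editDistance = 1;
            }
        } catch (NumberFormatException e) {
            // Include the full query so the log entry can be traced back to its source.
            LOG.log(Level.WARNING,
                    "Invalid fuzzy value: {0} in query: {1}, using default edit distance of 2",
                    new Object[] {fuzzyValue, queryText});
        }
        return editDistance;
    }

    public static void main(String[] args) {
        System.out.println(toEditDistance("0.7", "roam~0.7"));
        System.out.println(toEditDistance("abc", "roam~abc"));
    }
}
```

   With the project's existing logger the equivalent change would presumably be an extra `{}` placeholder and the query text as a second argument.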



##########
oak-search-elastic/src/main/java/org/apache/jackrabbit/oak/plugins/index/elastic/query/ElasticRequestHandler.java:
##########
@@ -908,6 +914,66 @@ private static QueryStringQuery.Builder fullTextQuery(String text, String fieldN
         return qsqBuilder.fields(fieldName);
     }
 
+    private String rewriteQueryText(String text) {
+        String rewritten = FulltextIndex.rewriteQueryText(text);
+
+        // here we handle special cases where the syntax used in the lucene 4.x query parser is not supported by the current version
+        if (rewritten.contains("~")) {
+            rewritten = convertFuzzyQuery(rewritten);
+        }
+
+        return rewritten;
+    }
+
+    /**
+     * Converts Lucene fuzzy queries from the old syntax (float similarity) to the new syntax (edit distance).
+     * <p>
+     * In Lucene 4, fuzzy queries were specified using a floating-point similarity (e.g., "term~0.8"), where values
+     * closer to 1 required a higher similarity match. In later Lucene versions, this was replaced with a discrete
+     * edit distance (0, 1, or 2).
+     * <p>
+     * This method:
+     * <ul>
+     *   <li>Detects and converts old fuzzy queries (e.g., "roam~0.7" → "roam~1").</li>
+     *   <li>Preserves new fuzzy queries (e.g., "test~2" remains unchanged).</li>
+     *   <li>Avoids modifying proximity queries (e.g., "\"quick fox\"~5" remains unchanged).</li>
+     * </ul>
+     *
+     * @param text The input query string containing fuzzy or proximity queries.
+     * @return A query string where old fuzzy syntax is converted to the new format.
+     */
+    private String convertFuzzyQuery(String text) {
+        Matcher oldMatcher = OLD_FUZZY_PATTERN.matcher(text);
+        StringBuilder result = new StringBuilder();
+
+        while (oldMatcher.find()) {

Review Comment:
   If the query does not contain a fuzzy expression (maybe the `~` in the query 
was quoted and is not part of a fuzzy expression, or someone in the future 
removed the check for `~` done before calling this method), this method will 
always create a copy of the argument. We could instead return the argument 
unchanged and avoid creating a new copy when nothing in it matches a fuzzy query.
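
   A rough sketch of the idea, under stated assumptions: `FuzzyNoCopySketch` and `FLOAT_FUZZY` are hypothetical names, and the regex is a simplified stand-in because `OLD_FUZZY_PATTERN` is not shown in this diff. The point is only the early return on the first failed `find()`, so the `StringBuilder` is allocated lazily.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class FuzzyNoCopySketch {
    // Hypothetical stand-in for OLD_FUZZY_PATTERN: a word followed by ~ and a decimal number.
    private static final Pattern FLOAT_FUZZY = Pattern.compile("(\\w+)~(\\d*\\.\\d+)");

    static String convertFuzzyQuery(String text) {
        Matcher m = FLOAT_FUZZY.matcher(text);
        if (!m.find()) {
            return text; // nothing to convert: return the same instance, no copy
        }
        StringBuilder result = new StringBuilder(text.length());
        int last = 0;
        do {
            // Thresholds copied from the diff: >= 0.8 -> 0, >= 0.5 -> 1, otherwise 2.
            float similarity = Float.parseFloat(m.group(2));
            int editDistance = similarity >= 0.8f ? 0 : similarity >= 0.5f ? 1 : 2;
            result.append(text, last, m.start())
                  .append(m.group(1)).append('~').append(editDistance);
            last = m.end();
        } while (m.find());
        // Append whatever follows the last converted expression.
        return result.append(text, last, text.length()).toString();
    }

    public static void main(String[] args) {
        System.out.println(convertFuzzyQuery("roam~0.7 AND test~2"));
        System.out.println(convertFuzzyQuery("\"quick fox\"~5"));
    }
}
```

   This also makes the method safe to call even if the `contains("~")` guard in `rewriteQueryText` is removed later.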


