[GitHub] [calcite] tanclary commented on a diff in pull request #3408: [CALCITE-5978] Add REGEXP_INSTR function (enabled in BigQuery library)

2023-09-07 Thread via GitHub


tanclary commented on code in PR #3408:
URL: https://github.com/apache/calcite/pull/3408#discussion_r1318980163


##
core/src/main/java/org/apache/calcite/runtime/SqlFunctions.java:
##
@@ -404,16 +404,39 @@ private Pattern validateRegexPattern(String regex, String methodName) {
   }
 }
 
-/** Helper for multiple capturing group regex check in REGEXP_EXTRACT fns. */
+/** Helper for multiple capturing group regex check in REGEXP_* fns. */
 private void checkMultipleCapturingGroupsInRegex(Matcher matcher, String methodName) {
   if (matcher.groupCount() > 1) {
 throw RESOURCE.multipleCapturingGroupsForRegexpExtract(
 Integer.toString(matcher.groupCount()), methodName).ex();
   }
 }
 
+/** Helper for checking values of position and occurrence arguments in REGEXP_* fns.
+ *  Regex Fns not using occurrencePosition param pass a default value as 0.
+ *  Throws an exception or returns true in case of failed value checks. */
+private boolean checkPosOccurrenceParamValues(int position,
+int occurrence, int occurrencePosition, String value, String methodName) {
+  if (position <= 0) {
+throw RESOURCE.invalidIntegerInputForRegexpFunctions(Integer.toString(position),
+"position", methodName).ex();
+  }
+  if (occurrence <= 0) {
+throw RESOURCE.invalidIntegerInputForRegexpFunctions(Integer.toString(occurrence),
+"occurrence", methodName).ex();
+  }
+  if (occurrencePosition < 0 || occurrencePosition > 1) {

Review Comment:
   nit: would it be better to change this to `!= 0 && != 1`? I feel like that makes it clearer that the only options are 0 or 1. Maybe I am just bad at reading math comparisons, though. This isn't a big enough comment to worry about if I don't leave others.
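A minimal standalone sketch of the two checks discussed in this nit. The class and method names are hypothetical and not part of the PR. For any `int` argument both predicates reject exactly the values other than 0 and 1, so the choice between them is purely about readability.

```java
// Hypothetical helper class illustrating the review nit; not Calcite code.
class OccurrencePositionCheck {
  /** Range form used in the PR: invalid unless 0 <= p <= 1. */
  static boolean isInvalidRangeForm(int occurrencePosition) {
    return occurrencePosition < 0 || occurrencePosition > 1;
  }

  /** Explicit form suggested in the review: invalid unless p is 0 or 1. */
  static boolean isInvalidExplicitForm(int occurrencePosition) {
    return occurrencePosition != 0 && occurrencePosition != 1;
  }

  public static void main(String[] args) {
    // For integers the two forms are equivalent; spot-check a few values
    // around the valid range plus the extreme int values.
    int[] samples = {Integer.MIN_VALUE, -2, -1, 0, 1, 2, Integer.MAX_VALUE};
    for (int p : samples) {
      if (isInvalidRangeForm(p) != isInvalidExplicitForm(p)) {
        throw new AssertionError("Forms disagree for " + p);
      }
      System.out.println(p + " invalid? " + isInvalidExplicitForm(p));
    }
  }
}
```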



##
core/src/main/java/org/apache/calcite/runtime/SqlFunctions.java:
##
@@ -488,13 +502,74 @@ public List<String> regexpExtractAll(String value, String regex) {
   ImmutableList.Builder<String> matches = ImmutableList.builder();
   while (matcher.find()) {
 String match = matcher.group(matcher.groupCount());
-if (match != null && !match.isEmpty()) {
+if (match != null) {
   matches.add(match);
 }
   }
   return matches.build();
 }
 
+/** SQL {@code REGEXP_INSTR(value, regexp)} function.
+ *  Returns 0 if there is no match or regex is empty. Returns an exception if regex is invalid.

Review Comment:
   Based on other Javadocs in `SqlFunctions`, I think this should only be one space.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@calcite.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [calcite] tanclary commented on a diff in pull request #3408: [CALCITE-5978] Add REGEXP_INSTR function (enabled in BigQuery library)

2023-09-06 Thread via GitHub


tanclary commented on code in PR #3408:
URL: https://github.com/apache/calcite/pull/3408#discussion_r1317629669


##
core/src/main/java/org/apache/calcite/runtime/SqlFunctions.java:
##
@@ -488,13 +502,74 @@ public List<String> regexpExtractAll(String value, String regex) {
   ImmutableList.Builder<String> matches = ImmutableList.builder();
   while (matcher.find()) {
 String match = matcher.group(matcher.groupCount());
-if (match != null && !match.isEmpty()) {
+if (match != null) {
   matches.add(match);
 }
   }
   return matches.build();
 }
 
+/** SQL {@code REGEXP_INSTR(value, regexp)} function.
+ *  Returns 0 if there is no match or regex is empty. Returns an exception if regex is invalid.
+ *  Uses position=1, occurrence=1, occurrencePosition=0 as default values if not specified. */
+public Integer regexpInstr(String value, String regex) {

Review Comment:
   Should we use `int` instead of `Integer`? It seems that's what is used in most other cases.
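To illustrate the `int` vs `Integer` question: if, as the Javadoc in this hunk says, the two-argument form returns 0 rather than null when there is no match, every code path produces a number and a primitive `int` is sufficient; a boxed `Integer` is only needed when null is a possible result. The sketch below is a simplified, hypothetical version (the class name is invented, and position/occurrence handling is omitted). It assumes null arguments are handled by the caller and is not the PR's implementation. An invalid regex makes `Pattern.compile` throw `PatternSyntaxException`, which matches the "exception if regex is invalid" contract.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical, simplified sketch of the two-argument REGEXP_INSTR contract.
class RegexpInstrIntSketch {
  /** Returns the 1-based start of the first match, or 0 if there is no match
   * or the regex is empty. Every path returns a number, so int is enough.
   * Assumes value and regex are non-null (null handling done by the caller). */
  static int regexpInstr(String value, String regex) {
    if (regex.isEmpty()) {
      return 0; // an empty regex would otherwise match at index 0
    }
    Matcher matcher = Pattern.compile(regex).matcher(value);
    return matcher.find() ? matcher.start() + 1 : 0;
  }

  public static void main(String[] args) {
    System.out.println(regexpInstr("abcabc", "b")); // 2
    System.out.println(regexpInstr("abcabc", "z")); // 0
    System.out.println(regexpInstr("abcabc", ""));  // 0
  }
}
```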



##
core/src/main/java/org/apache/calcite/runtime/SqlFunctions.java:
##
@@ -488,13 +502,74 @@ public List<String> regexpExtractAll(String value, String regex) {
   ImmutableList.Builder<String> matches = ImmutableList.builder();
   while (matcher.find()) {
 String match = matcher.group(matcher.groupCount());
-if (match != null && !match.isEmpty()) {
+if (match != null) {
   matches.add(match);
 }
   }
   return matches.build();
 }
 
+/** SQL {@code REGEXP_INSTR(value, regexp)} function.
+ *  Returns 0 if there is no match or regex is empty. Returns an exception if regex is invalid.
+ *  Uses position=1, occurrence=1, occurrencePosition=0 as default values if not specified. */
+public Integer regexpInstr(String value, String regex) {
+  return regexpInstr(value, regex, 1, 1, 0);
+}
+
+/** SQL {@code REGEXP_INSTR(value, regexp, position)} function.
+ *  Returns 0 if there is no match, regex is empty, or if position is beyond range.
+ *  Returns an exception if regex or position is invalid.
+ *  Uses occurrence=1, occurrencePosition=0 as default value when not specified. */
+public Integer regexpInstr(String value, String regex, int position) {
+  return regexpInstr(value, regex, position, 1, 0);
+}
+
+/** SQL {@code REGEXP_INSTR(value, regexp, position, occurrence)} function.
+ *  Returns NULL if there is no match, regex is empty, or if position or occurrence

Review Comment:
   Is it correct that this one should return null if there is no match, while the others return 0? Just checking.
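For comparison, a hypothetical sketch of the four-argument overload that keeps the same no-match convention as the overloads above (return 0, never null). It assumes 1-based `position` and `occurrence`, returns 0 when `position` is past the end of `value` (mirroring the three-argument Javadoc), and throws `IllegalArgumentException` as a stand-in for Calcite's `RESOURCE.invalidIntegerInputForRegexpFunctions` error. Whether the target dialect returns 0 or NULL here is exactly what the reviewer is asking, so treat the 0 choice as an assumption, not the PR's behavior.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch; not the code from the PR.
class RegexpInstrOccurrenceSketch {
  /** Returns the 1-based start of the occurrence-th match found at or after
   * the 1-based position, or 0 if there is no such match. */
  static int regexpInstr(String value, String regex, int position, int occurrence) {
    // Stand-ins for the PR's RESOURCE.invalidIntegerInputForRegexpFunctions errors.
    if (position <= 0) {
      throw new IllegalArgumentException("Invalid position: " + position);
    }
    if (occurrence <= 0) {
      throw new IllegalArgumentException("Invalid occurrence: " + occurrence);
    }
    // Same "no result" convention as the other overloads: 0, not null.
    if (regex.isEmpty() || position > value.length()) {
      return 0;
    }
    Matcher matcher = Pattern.compile(regex).matcher(value);
    // find(int) resets the matcher and starts scanning at the given index;
    // each later find() continues after the previous match.
    if (!matcher.find(position - 1)) {
      return 0;
    }
    for (int i = 1; i < occurrence; i++) {
      if (!matcher.find()) {
        return 0;
      }
    }
    return matcher.start() + 1;
  }

  public static void main(String[] args) {
    System.out.println(regexpInstr("abcabc", "b", 1, 2));  // 5
    System.out.println(regexpInstr("abcabc", "b", 3, 1));  // 5
    System.out.println(regexpInstr("abcabc", "b", 1, 3));  // 0
    System.out.println(regexpInstr("abcabc", "b", 10, 1)); // 0
  }
}
```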



##
core/src/main/java/org/apache/calcite/runtime/SqlFunctions.java:
##
@@ -404,16 +404,39 @@ private Pattern validateRegexPattern(String regex, String methodName) {
   }
 }
 
-/** Helper for multiple capturing group regex check in REGEXP_EXTRACT fns. */
+/** Helper for multiple capturing group regex check in REGEXP_* fns. */
 private void checkMultipleCapturingGroupsInRegex(Matcher matcher, String methodName) {
   if (matcher.groupCount() > 1) {
 throw RESOURCE.multipleCapturingGroupsForRegexpExtract(
 Integer.toString(matcher.groupCount()), methodName).ex();
   }
 }
 
+/** Helper for checking values of position and occurrence arguments in REGEXP_* fns.
+ *  Regex Fns not using occurrencePosition param pass a default value as 0.
+ *  Throws an exception or returns true in case of failed value checks. */
+private boolean checkPosOccurrenceParamValues(int position,
+int occurrence, int occurrencePosition, String value, String methodName) {
+  if (position <= 0) {
+throw RESOURCE.invalidIntegerInputForRegexpFunctions(Integer.toString(position),
+"position", methodName).ex();
+  }
+  if (occurrence <= 0) {
+throw RESOURCE.invalidIntegerInputForRegexpFunctions(Integer.toString(occurrence),
+"occurrence", methodName).ex();
+  }
+  if (occurrencePosition < 0 || occurrencePosition > 1) {
+throw RESOURCE.invalidIntegerInputForRegexpFunctions(Integer.toString(occurrencePosition),
+"occurrence_position", methodName).ex();
+  }
+  if (position > value.length()) {
+return true;
+  }
+  return false;
+}
+
 /** SQL {@code REGEXP_CONTAINS(value, regexp)} function.
- * Throws a runtime exception for invalid regular expressions.*/
+ *  Throws a runtime exception for invalid regular expressions. */

Review Comment:
   Did you mean to add this?



##
core/src/main/java/org/apache/calcite/runtime/SqlFunctions.java:
##
@@ -404,16 +404,39 @@ private Pattern validateRegexPattern(String regex, String methodName) {
   }
 }
 
-/** Helper for multiple capturing group regex check in REGEXP_EXTRACT fns. */
+/** Helper for multiple capturing group regex