GitHub user rxin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22227#discussion_r214135400
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/regexpExpressions.scala ---
    @@ -229,33 +229,58 @@ case class RLike(left: Expression, right: Expression) extends StringRegexExpression
     
     
     /**
    - * Splits str around pat (pattern is a regular expression).
    + * Splits str around matches of the given regex.
      */
     @ExpressionDescription(
    -  usage = "_FUNC_(str, regex) - Splits `str` around occurrences that match `regex`.",
    +  usage = "_FUNC_(str, regex, limit) - Splits `str` around occurrences that match `regex`" +
    +    " and returns an array of at most `limit`",
    +  arguments = """
    +    Arguments:
    +      * str - a string expression to split.
    +      * regex - a string representing a regular expression. The regex string should be a
    +        Java regular expression.
    +      * limit - an integer expression which controls the number of times the regex is applied.
    +
    +        limit > 0: The resulting array's length will not be more than `limit`, and the resulting
    +                   array's last entry will contain all input beyond the last matched regex.
    +
    +        limit < 0: `regex` will be applied as many times as possible, and the resulting
    +                   array can be of any size.
    +
    +        limit = 0: `regex` will be applied as many times as possible, the resulting array can
    --- End diff --
    
    Yea, but I'd focus on what behavior we want to enable. Do other database 
    systems have this split=0 semantics? If not, I'd rewrite split=0 internally 
    to just -1.
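    For context on the three `limit` cases being discussed, `java.lang.String.split(regex, limit)` (whose semantics the doc text above mirrors) behaves as sketched below; the sample input string is made up purely for illustration:
    
    ```java
    // Sketch of the limit semantics from the doc text above, using
    // java.lang.String.split(regex, limit). The only difference between
    // limit = 0 and limit < 0 is that limit = 0 drops trailing empty strings.
    public class SplitLimitDemo {
        public static void main(String[] args) {
            String s = "a,b,c,,";  // hypothetical sample input

            // limit > 0: at most `limit` entries; the last entry keeps
            // everything beyond the last matched separator.
            String[] two = s.split(",", 2);    // ["a", "b,c,,"]

            // limit < 0: split as many times as possible; trailing empty
            // strings are kept, so the array can be of any size.
            String[] neg = s.split(",", -1);   // ["a", "b", "c", "", ""]

            // limit = 0: split as many times as possible, but trailing
            // empty strings are removed.
            String[] zero = s.split(",", 0);   // ["a", "b", "c"]

            System.out.println(two.length + " " + neg.length + " " + zero.length);
        }
    }
    ```
    
    So rewriting `limit = 0` to `-1` internally would change behavior only for inputs with trailing separator matches.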


