Github user phegstrom commented on a diff in the pull request:
https://github.com/apache/spark/pull/22227#discussion_r214165774
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/regexpExpressions.scala ---
@@ -229,33 +229,58 @@ case class RLike(left: Expression, right: Expression) extends StringRegexExpress
 /**
- * Splits str around pat (pattern is a regular expression).
+ * Splits str around matches of the given regex.
  */
 @ExpressionDescription(
-  usage = "_FUNC_(str, regex) - Splits `str` around occurrences that match `regex`.",
+  usage = "_FUNC_(str, regex, limit) - Splits `str` around occurrences that match `regex`" +
+    " and returns an array of at most `limit`",
+  arguments = """
+    Arguments:
+      * str - a string expression to split.
+      * regex - a string representing a regular expression. The regex string should be a
+        Java regular expression.
+      * limit - an integer expression which controls the number of times the regex is applied.
+
+          limit > 0: The resulting array's length will not be more than `limit`, and the resulting
+                     array's last entry will contain all input beyond the last matched regex.
+
+          limit < 0: `regex` will be applied as many times as possible, and the resulting
+                     array can be of any size.
+
+          limit = 0: `regex` will be applied as many times as possible, the resulting array can
--- End diff ---
I see this as a value-add: the user simply gets more tools for answering a given question about their data. Sure, the `limit = 0` case isn't strictly necessary, but if it lets a user get what they need with fewer lines of code, I'd say it's definitely worth exposing.
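
For a concrete illustration (a rough sketch, not the Spark implementation itself), Java's `java.util.regex.Pattern.split` has the same `limit` semantics as the doc text above describes, so the three cases look like this:

```scala
import java.util.regex.Pattern

object SplitLimitDemo {
  def main(args: Array[String]): Unit = {
    val pattern = Pattern.compile(",")
    val input = "a,b,c,,"

    // limit > 0: at most `limit` elements; the last element keeps the unsplit remainder
    println(pattern.split(input, 2).mkString("|"))   // a|b,c,,

    // limit < 0: split as many times as possible; trailing empty strings are kept
    println(pattern.split(input, -1).mkString("|"))  // a|b|c||

    // limit = 0: split as many times as possible; trailing empty strings are removed
    println(pattern.split(input, 0).mkString("|"))   // a|b|c
  }
}
```

The `limit = 0` case is exactly the "drop trailing empty strings" convenience that would otherwise take an extra filtering step on the caller's side.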