[GitHub] spark pull request #22227: [SPARK-25202] [SQL] Implements split with limit s...

HyukjinKwon Sun, 02 Sep 2018 18:48:54 -0700

GitHub user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22227#discussion_r214562340
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/functions.scala ---
    @@ -2546,15 +2546,39 @@ object functions {
       def soundex(e: Column): Column = withExpr { SoundEx(e.expr) }
     
       /**
    -   * Splits str around pattern (pattern is a regular expression).
    +   * Splits str around matches of the given regex.
        *
    -   * @note Pattern is a string representation of the regular expression.
    +   * @param str a string expression to split
    +   * @param regex a string representing a regular expression. The regex string should be
    +   *              a Java regular expression.
        *
        * @group string_funcs
        * @since 1.5.0
        */
    -  def split(str: Column, pattern: String): Column = withExpr {
    -    StringSplit(str.expr, lit(pattern).expr)
    +  def split(str: Column, regex: String): Column = withExpr {
    +    StringSplit(str.expr, Literal(regex), Literal(-1))
    +  }
    +
    +  /**
    +   * Splits str around matches of the given regex.
    +   *
    +   * @param str a string expression to split
    +   * @param regex a string representing a regular expression. The regex string should be
    +   *              a Java regular expression.
    +   * @param limit an integer expression which controls the number of times the regex is applied.
    +   *        <p><p>
    +   *        limit greater than 0: The resulting array's length will not be more than `limit`,
    +   *                              and the resulting array's last entry will contain all input beyond
    +   *                              the last matched regex.
    +   *        <p><p>
    +   *        limit less than or equal to 0: `regex` will be applied as many times as possible, and
    +   *                                       the resulting array can be of any size.
    --- End diff --
    
    I think you can refer to
https://github.com/apache/spark/blob/e754887182304ad0d622754e33192ebcdd515965/sql/core/src/main/scala/org/apache/spark/sql/DataFrameReader.scala#L338-L386
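
For readers following the `limit` description in the diff above, here is a minimal sketch of the two cases. It is an illustration only: it calls `java.lang.String.split(regex, limit)` directly rather than Spark's `split`, and it assumes the Spark function behaves the same way for these inputs. The object name `SplitLimitSketch`, the helper `splitWithLimit`, and the sample input are hypothetical and do not come from the pull request.

```scala
// Illustration only: plain JVM string splitting, not Spark's split function.
// SplitLimitSketch and splitWithLimit are hypothetical names for this sketch.
object SplitLimitSketch {

  // Thin wrapper around java.lang.String.split(regex, limit).
  def splitWithLimit(str: String, regex: String, limit: Int): Seq[String] =
    str.split(regex, limit).toSeq

  def main(args: Array[String]): Unit = {
    val input = "oneAtwoBthreeC"
    val regex = "[ABC]"

    // limit > 0: the array has at most `limit` entries, and the last entry
    // holds all input beyond the last matched regex.
    println(splitWithLimit(input, regex, 2).mkString("[", ", ", "]"))
    // prints [one, twoBthreeC]

    // Negative limit: the regex is applied as many times as possible, and
    // the trailing empty string is kept.
    println(splitWithLimit(input, regex, -1).mkString("[", ", ", "]"))
    // prints [one, two, three, ]
  }
}
```

One caveat: in plain Java a limit of exactly 0 also drops trailing empty strings, so this sketch uses -1 for the "as many times as possible" case instead of assuming Spark treats 0 the same way.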


