Github user HyukjinKwon commented on a diff in the pull request:
https://github.com/apache/spark/pull/22227#discussion_r214561410
--- Diff: python/pyspark/sql/functions.py ---
@@ -1669,20 +1669,33 @@ def repeat(col, n):
     return Column(sc._jvm.functions.repeat(_to_java_column(col), n))
 
 
-@since(1.5)
+@since(2.4)
 @ignore_unicode_prefix
-def split(str, pattern):
+def split(str, pattern, limit=-1):
     """
-    Splits str around pattern (pattern is a regular expression).
+    Splits str around matches of the given pattern.
+
+    :param str: a string expression to split
+    :param pattern: a string representing a regular expression. The regex string should be
+        a Java regular expression.
+    :param limit: an integer expression which controls the number of times the pattern is applied.
 
-    .. note:: pattern is a string represent the regular expression.
+        * ``limit > 0``: The resulting array's length will not be more than `limit`, and the
+          resulting array's last entry will contain all input beyond the last
+          matched pattern.
+        * ``limit <= 0``: `pattern` will be applied as many times as possible, and the resulting
+          array can be of any size.
 
-    >>> df = spark.createDataFrame([('ab12cd',)], ['s',])
-    >>> df.select(split(df.s, '[0-9]+').alias('s')).collect()
-    [Row(s=[u'ab', u'cd'])]
+    >>> df = spark.createDataFrame([('oneAtwoBthreeC',)], ['s',])
+    >>> df.select(split(df.s, '[ABC]', 2).alias('s')).collect()
+    [Row(s=[u'one', u'twoBthreeC'])]
+    >>> df.select(split(df.s, '[ABC]', -1).alias('s')).collect()
+    [Row(s=[u'one', u'two', u'three', u''])]
+    >>> df.select(split(df.s, '[ABC]', 0).alias('s')).collect()
--- End diff ---
I wouldn't include this test, since we no longer define a specific behaviour for 0.
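For reference, a minimal sketch of how the documented contract could be exercised outside of doctests, assuming this patch is applied (i.e. the three-argument `split` is available) and a running `SparkSession`. The expected results are taken from the doctest above; there is deliberately no assertion for `limit == 0`, since the contract only distinguishes positive from non-positive limits:

```python
from pyspark.sql import Row, SparkSession
from pyspark.sql.functions import split

spark = SparkSession.builder.getOrCreate()
df = spark.createDataFrame([('oneAtwoBthreeC',)], ['s'])

# limit > 0: at most `limit` entries; the last entry keeps the unsplit remainder.
assert df.select(split(df.s, '[ABC]', 2).alias('s')).collect() == \
    [Row(s=['one', 'twoBthreeC'])]

# limit <= 0: the pattern is applied as many times as possible; per the
# doctest above, a trailing empty string is kept when the input ends with a match.
assert df.select(split(df.s, '[ABC]', -1).alias('s')).collect() == \
    [Row(s=['one', 'two', 'three', ''])]
```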