Github user viirya commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20400#discussion_r165284238
  
    --- Diff: python/pyspark/sql/window.py ---
    @@ -120,20 +122,46 @@ def rangeBetween(start, end):
             and "5" means the five off after the current row.
     
             We recommend users use ``Window.unboundedPreceding``, 
``Window.unboundedFollowing``,
    -        and ``Window.currentRow`` to specify special boundary values, 
rather than using integral
    -        values directly.
    +        ``Window.currentRow``, 
``pyspark.sql.functions.unboundedPreceding``,
    +        ``pyspark.sql.functions.unboundedFollowing`` and 
``pyspark.sql.functions.currentRow``
    +        to specify special boundary values, rather than using integral 
values directly.
     
             :param start: boundary start, inclusive.
    -                      The frame is unbounded if this is 
``Window.unboundedPreceding``, or
    +                      The frame is unbounded if this is 
``Window.unboundedPreceding``,
    +                      a column returned by 
``pyspark.sql.functions.unboundedPreceding``, or
                           any value less than or equal to max(-sys.maxsize, 
-9223372036854775808).
             :param end: boundary end, inclusive.
    -                    The frame is unbounded if this is 
``Window.unboundedFollowing``, or
    +                    The frame is unbounded if this is 
``Window.unboundedFollowing``,
    +                    a column returned by 
``pyspark.sql.functions.unboundedFollowing``, or
                         any value greater than or equal to min(sys.maxsize, 
9223372036854775807).
    +
    +        >>> from pyspark.sql import functions as F, SparkSession, Window
    +        >>> spark = SparkSession.builder.getOrCreate()
    +        >>> df = spark.createDataFrame(
    +        ...     [(1, "a"), (1, "a"), (2, "a"), (1, "b"), (2, "b"), (3, 
"b")], ["id", "category"])
    +        >>> window = 
Window.orderBy("id").partitionBy("category").rangeBetween(
    +        ...     F.currentRow(), F.lit(1))
    +        >>> df.withColumn("sum", F.sum("id").over(window)).show()
    +        +---+--------+---+
    +        | id|category|sum|
    +        +---+--------+---+
    +        |  1|       b|  3|
    +        |  2|       b|  5|
    +        |  3|       b|  3|
    +        |  1|       a|  4|
    +        |  1|       a|  4|
    +        |  2|       a|  2|
    +        +---+--------+---+
    +        <BLANKLINE>
             """
    -        if start <= Window._PRECEDING_THRESHOLD:
    -            start = Window.unboundedPreceding
    -        if end >= Window._FOLLOWING_THRESHOLD:
    -            end = Window.unboundedFollowing
    +        if isinstance(start, (int, long)) and isinstance(end, (int, long)):
    --- End diff --
    
    Is it possible that we mix int and Column in the parameters?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to