viirya opened a new pull request #24864: [MINOR][PySpark][SQL][DOC] Fix rowsBetween doc in Window
URL: https://github.com/apache/spark/pull/24864

## What changes were proposed in this pull request?

I suspect the doc of the `rowsBetween` methods in Scala and PySpark is wrong, because:

```scala
scala> val df = Seq((1, "a"), (2, "a"), (3, "a"), (4, "a"), (5, "a"), (6, "a")).toDF("id", "category")
df: org.apache.spark.sql.DataFrame = [id: int, category: string]

scala> val byCategoryOrderedById = Window.partitionBy('category).orderBy('id).rowsBetween(-1, 2)
byCategoryOrderedById: org.apache.spark.sql.expressions.WindowSpec = org.apache.spark.sql.expressions.WindowSpec@7f04de97

scala> df.withColumn("sum", sum('id) over byCategoryOrderedById).show()
+---+--------+---+
| id|category|sum|
+---+--------+---+
|  1|       a|  6|  # sum from index 0 to (0 + 2): 1 + 2 + 3 = 6
|  2|       a| 10|  # sum from index (1 - 1) to (1 + 2): 1 + 2 + 3 + 4 = 10
|  3|       a| 14|
|  4|       a| 18|
|  5|       a| 15|
|  6|       a| 11|
+---+--------+---+
```

So the frame (-1, 2) for the row with index 5, as described in the doc's example, should range from index 4 to index 7.

## How was this patch tested?

N/A, just a doc change.
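For reference, a minimal PySpark sketch of the same scenario; it is not part of the patch, and the variable names (`df`, `by_category_ordered_by_id`) are illustrative:

```python
from pyspark.sql import SparkSession, functions as F
from pyspark.sql.window import Window

spark = SparkSession.builder.getOrCreate()

# Same data as the Scala snippet: ids 1..6 in a single category.
df = spark.createDataFrame(
    [(i, "a") for i in range(1, 7)], ["id", "category"]
)

# Same frame as the Scala example: one row before to two rows after
# the current row, within each category ordered by id.
by_category_ordered_by_id = (
    Window.partitionBy("category").orderBy("id").rowsBetween(-1, 2)
)

# Should print the same sums as above: 6, 10, 14, 18, 15, 11.
df.withColumn("sum", F.sum("id").over(by_category_ordered_by_id)).show()
```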
