[GitHub] [spark] viirya opened a new pull request #24864: [MINOR][PySpark][SQL][DOC] Fix rowsBetween doc in Window

GitBox Thu, 13 Jun 2019 08:31:36 -0700

viirya opened a new pull request #24864: [MINOR][PySpark][SQL][DOC] Fix rowsBetween doc in Window
URL: https://github.com/apache/spark/pull/24864
 
 
   ## What changes were proposed in this pull request?
   
   I suspect that the doc of the `rowsBetween` methods in Scala and PySpark is 
wrong, because of the following behavior:
   
   ```scala
   scala> val df = Seq((1, "a"), (2, "a"), (3, "a"), (4, "a"), (5, "a"), (6, "a")).toDF("id", "category")
   df: org.apache.spark.sql.DataFrame = [id: int, category: string]

   scala> val byCategoryOrderedById = Window.partitionBy('category).orderBy('id).rowsBetween(-1, 2)
   byCategoryOrderedById: org.apache.spark.sql.expressions.WindowSpec = org.apache.spark.sql.expressions.WindowSpec@7f04de97

   scala> df.withColumn("sum", sum('id) over byCategoryOrderedById).show()
   +---+--------+---+
   | id|category|sum|
   +---+--------+---+
   |  1|       a|  6|              # sum from index 0 to (0 + 2): 1 + 2 + 3 = 6
   |  2|       a| 10|              # sum from index (1 - 1) to (1 + 2): 1 + 2 + 3 + 4 = 10
   |  3|       a| 14|
   |  4|       a| 18|
   |  5|       a| 15|
   |  6|       a| 11|
   +---+--------+---+
   ```
   
   So the frame `(-1, 2)` for the row with index 5, as described in the doc, 
should range from index (5 - 1) = 4 to index (5 + 2) = 7.
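
The frame arithmetic above can be checked without a Spark session. The following is a minimal pure-Python sketch, not Spark's implementation: `frame_bounds` and `rows_between_sum` are hypothetical helper names introduced here for illustration, and the sketch assumes a single partition that is already ordered by `id`. It also assumes the frame is clipped to the partition boundaries, which matches the sums in the output above, and it does not model frames that end up empty.

```python
def frame_bounds(index, start, end):
    """Unclipped row-frame bounds for the row at `index`.

    Hypothetical helper: mirrors the doc's description that a frame
    (start, end) is relative to the current row's position.
    """
    return index + start, index + end


def rows_between_sum(values, start, end):
    """Sum over a (start, end) row frame for every row of one ordered partition.

    Assumption: frame bounds are clipped to the partition, as the
    Scala output in this description suggests.
    """
    last = len(values) - 1
    sums = []
    for i in range(len(values)):
        lo, hi = frame_bounds(i, start, end)
        lo = max(lo, 0)      # clip the frame start to the first row
        hi = min(hi, last)   # clip the frame end to the last row
        sums.append(sum(values[lo:hi + 1]))
    return sums


ids = [1, 2, 3, 4, 5, 6]
frame_sums = rows_between_sum(ids, -1, 2)
print(frame_sums)               # [6, 10, 14, 18, 15, 11]
print(frame_bounds(5, -1, 2))   # (4, 7)
```

The printed sums agree with the `sum` column in the Scala session, and the bounds for index 5 come out as 4 and 7, which is the range the corrected doc should state.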
   
   ## How was this patch tested?
   
   N/A, just doc change.


