[GitHub] spark pull request: [SPARK-12295] [SQL] external spilling for wind...

hvanhovell Tue, 05 Jan 2016 14:08:50 -0800

Github user hvanhovell commented on the pull request:

    https://github.com/apache/spark/pull/10605#issuecomment-169149338
  
    @davies this is pretty awesome! I have taken a long look at the window code 
and it looks solid. I am less of an expert on the memory management front, so 
maybe someone else should take a look at that.
    
    I do have one small concern: I am absolutely convinced that in the case of 
large partition sizes this will outperform the current implementation by a 
margin. However, I am wondering what happens if we consider smaller partition 
sizes (e.g. n = 2-32). We might take a small hit in these cases because of the 
added complexity. Have you done some benchmarking on this? If you haven't, this 
is a link to the benchmark I used for my initial window prototype: 
https://issues.apache.org/jira/secure/attachment/12745984/perf_test3.scala
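
    A minimal sketch of what such a small-partition run could look like. This is 
not the contents of perf_test3.scala; it assumes a Spark 2.x or later build on the 
classpath (for `SparkSession`), and the object name, `timeWindowMs`, and the row 
count are hypothetical choices for illustration. The timing includes planning and 
the sort below the window, so it is only meaningful when the same code is run 
against builds with and without the change.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions.{col, sum}

// Hypothetical helper object, not part of Spark or of the linked benchmark.
object SmallPartitionWindowBench {

  // Computes a running sum over window partitions of `partitionSize` rows
  // and returns the wall-clock time of the whole query in milliseconds.
  def timeWindowMs(spark: SparkSession, numRows: Long, partitionSize: Int): Double = {
    val w = Window.partitionBy(col("grp")).orderBy(col("id"))
    val df = spark.range(0L, numRows)
      // Consecutive ids share a group, so every partition has `partitionSize` rows
      // (the last one may be shorter).
      .withColumn("grp", (col("id") / partitionSize).cast("long"))
      .withColumn("running", sum(col("id")).over(w))

    val start = System.nanoTime()
    // Aggregating over the window output forces the window to be evaluated.
    df.agg(sum(col("running"))).collect()
    (System.nanoTime() - start) / 1e6
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("small-partition-window-bench")
      .getOrCreate()
    try {
      for (n <- Seq(2, 4, 8, 16, 32)) {
        timeWindowMs(spark, 1000000L, n) // warm-up run, result discarded
        val ms = timeWindowMs(spark, 1000000L, n)
        println(f"partition size $n%2d: $ms%.1f ms")
      }
    } finally {
      spark.stop()
    }
  }
}
```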
    
    I'd like to finish with something we should **not** address in this PR 
(thinking out loud, if you will). The child node of a ```Window``` operator is 
almost always an ```ExternalSort``` operator. Wouldn't it be cool if we could 
eliminate the row buffer of the ```Window``` by using the external sort's buffer?



