Github user hvanhovell commented on the pull request:
https://github.com/apache/spark/pull/10605#issuecomment-169149338
@davies this is pretty awesome! I have taken a long look at the window code
and it looks solid. I am less of an expert on the memory management front, so
maybe someone else should take a look at that.
I do have one small concern: I am absolutely convinced that in the case of
large partition sizes this will outperform the current implementation by a
wide margin. However, I am wondering what happens if we consider smaller
partition sizes (e.g. n = 2-32). We might take a small hit in these cases
because of the added complexity. Have you done some benchmarking on this? If
you haven't, this is a link to the benchmark I used for my initial window
prototype:
https://issues.apache.org/jira/secure/attachment/12745984/perf_test3.scala
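To make the small-partition question concrete, here is a rough sketch of the kind of comparison I have in mind. It is not the linked script. It assumes a Spark 2.x or later build with `SparkSession` on the classpath, and the object name, row count, window frame and partition sizes are all made up for illustration. It would need to be run once against each build to compare them.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions.{col, sum}

// Hypothetical micro-benchmark: times a sliding-window sum for several
// window partition sizes n, including the small sizes (2-32) in question.
object SmallPartitionWindowBench {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("small-partition-window-bench")
      .getOrCreate()

    val rows = 1L << 20 // total row count; an arbitrary choice

    for (n <- Seq(2, 8, 32, 1024)) {
      // Consecutive ids share a group, so each window partition has n rows.
      val df = spark.range(rows)
        .select((col("id") / n).cast("long").as("grp"), col("id"))
      val w = Window.partitionBy("grp").orderBy("id").rowsBetween(-1, 1)
      val query = df.select(sum("id").over(w).as("s")).agg(sum("s"))

      query.collect() // warm-up run, not timed
      val start = System.nanoTime()
      query.collect() // timed run; collect() forces evaluation
      val ms = (System.nanoTime() - start) / 1e6
      println(f"partition size $n%5d: $ms%.1f ms")
    }

    spark.stop()
  }
}
```

A single timed run per size is noisy, so treat the numbers as a rough indication only and repeat the runs before drawing conclusions.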
I'd like to finish with something we should **not** address in this PR
(thinking out loud if you will). The child node of a ```Window``` operator is
almost always an ```ExternalSort``` operator. Wouldn't it be cool if we could
eliminate the row buffer of the ```Window``` by using the external sort's buffer?