GitHub user squito commented on the issue:

https://github.com/apache/spark/pull/21346

> is this effectively dead code at this point?

Yes, that's right. This PR by itself is not useful; it's a step towards https://github.com/apache/spark/pull/21451. That's a good point to put in the PR summary. I'll add it there, along with your summary notes above, if you don't mind.

> what are the major risks of this change in terms of introducing performance or correctness issues? If we identify risks (e.g. "this is a historically tricky area of code?"), can we mitigate those risks through correctness testing / load testing?

I've tried to keep changes to all existing code paths minimal, to reduce the risk of introducing bugs in current functionality. My intention is to enable it by default initially only for cases we know would fail with the old code, i.e. when the data is > 2 GB ([SPARK-24297](https://issues.apache.org/jira/browse/SPARK-24297)).

I've added unit tests, and I've shared the test I'm running on a cluster to find holes in functionality (posted on the parent JIRA here: https://issues.apache.org/jira/browse/SPARK-6235?focusedCommentId=16484069&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-16484069). I have not done load testing yet, but I plan to. Extra testing would, of course, be welcome.