[GitHub] spark pull request: [SPARK-10250][CORE] External group by to handl...

JoshRosen Fri, 18 Sep 2015 15:54:07 -0700

Github user JoshRosen commented on the pull request:

    https://github.com/apache/spark/pull/8438#issuecomment-141588625
  
    Hey @mccheah,
    
    This is a really cool patch. Although we're trying to encourage users to 
migrate workloads to the DataFrames API, there are still many workloads for 
which this is a useful improvement. This patch has some high inherent 
complexity and risk, though, since the details of managing file cleanup are 
non-trivial and it needs to touch a number of stable code paths which haven't 
been modified in a long time.
    
    I'd like to review this patch but I'm a bit too busy with other 1.6 
development and design tasks to devote adequate review time right now. I may 
have some time to review this in a couple of weeks, though. If you'd like to 
try to make this easier to review, I have a couple of suggestions (perhaps 
off-base, since I haven't considered them in detail yet):
    
    1. Split the ContextCleaner changes into their own JIRA + PR.
    2. Guard the external groupBy behind a feature flag so that users have an 
escape hatch in case bugs are uncovered in the new external groupBy 
implementation.
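
As a rough illustration of suggestion 2 (a sketch only, not code from the patch or from Spark), a feature flag could gate the new path roughly as follows. The configuration key, the function names, and the "enabled by default" choice are all hypothetical, and the "external" function is just a stand-in that sorts in memory rather than spilling to disk.

```python
from collections import defaultdict

# Hypothetical configuration key for illustration; not an actual Spark setting.
EXTERNAL_GROUP_BY_FLAG = "example.groupBy.external.enabled"


def in_memory_group_by(pairs):
    """Existing, stable path: build every group in memory."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(groups)


def external_group_by(pairs):
    """Stand-in for the new implementation (spilling details omitted).

    It sorts by key first, roughly how a sort-based external grouping
    would see its input; sorted() is stable, so value order is kept.
    """
    groups = {}
    for key, value in sorted(pairs, key=lambda kv: kv[0]):
        groups.setdefault(key, []).append(value)
    return groups


def group_by_key(pairs, conf):
    """Dispatch on the flag; setting it to "false" is the escape hatch."""
    # Assumption: the new path is on by default and users opt out.
    enabled = conf.get(EXTERNAL_GROUP_BY_FLAG, "true").lower() == "true"
    if enabled:
        return external_group_by(pairs)
    return in_memory_group_by(pairs)
```

With this shape, a user who hits a bug in the new path could set the (hypothetical) key to "false" and fall back to the old behavior without a code change. Whether the flag should default to on or off is a separate judgment call.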




