[GitHub] spark pull request: Common interfaces between RDD, DStream, and Da...

srowen Fri, 17 Apr 2015 21:52:15 -0700

Github user srowen commented on the pull request:

    https://github.com/apache/spark/pull/5565#issuecomment-94126381
  
    This is such a significant change that it certainly should have a JIRA, and 
should probably start with a design discussion first. This PR can accompany it 
as a straw-man, sure. But before even that happens --
    
    My initial reaction is that this introduces a lot of change to the APIs 
that isn't backwards compatible. That's a non-starter in the short term, of 
course. It isn't necessarily one in the longer term, but a change to core 
methods would still have to pull its weight.
    
    Can these changes be made without changing the API though?
    
    Unifying RDD and DStream may not be important. They expose similar-ish APIs 
but are different things, and as I understand it, DStream's methods are partly 
redundant at this point anyway. I don't have as good a view into how 
interchangeable an RDD and a DataFrame are intended to be.
    
    How much does a change of this size buy -- what's the upside?



