GitHub user srowen commented on the pull request:
https://github.com/apache/spark/pull/5565#issuecomment-94126381
This is such a significant change that it certainly should have a JIRA, and
should probably start with a design discussion. This PR can accompany it as a
straw-man, sure. But before even that happens --
My initial reaction is that this introduces a lot of change to the
APIs, and the change isn't backwards compatible. That's a non-starter in the
short term, of course. It's not necessarily ruled out in the longer term, but a
change to core methods would still have to pull its weight.
Can these changes be made without changing the API though?
Unifying RDD and DStream may not be important. They expose similar-ish APIs
but are different things, and as I understand it, DStream's methods are partly
redundant at this point anyway. I don't have as good a view into how
interchangeable RDD and DataFrame are intended to be.
What does this much change buy -- what's the upside?