GitHub user nkronenfeld commented on the pull request:
https://github.com/apache/spark/pull/5565#issuecomment-94351472

No, that's what we do at the moment.

But doing it that way leads to some rather ugly constructs in the application
code, and they become painful as soon as one starts passing data structures
around. Once collections are passed between modules (for instance, between
stages in a pipeline), one immediately has to duplicate the entire pipeline for
the batch and streaming cases.

It isn't just one place where this replacement is needed; it's every little
pipeline operation, for every algorithm. About 90% of those operations use only
the most basic RDD and DStream functions, which should be easy to consolidate.
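
A minimal, hypothetical sketch of the consolidation argued for above: if batch
and streaming collections exposed the same `map` / `flat_map` / `reduce_by_key`
surface, a pipeline stage could be written once and reused for both. Everything
here is an illustrative assumption, not Spark's API: `Collection`,
`BatchCollection`, `StreamCollection` and `word_count` are made-up names, and
the two classes are in-memory stand-ins for RDD and DStream (the streaming one
modeled as a list of micro-batches, reduced per batch).

```python
from typing import Any, Callable, Iterable, List, Protocol, Tuple


class Collection(Protocol):
    """Hypothetical shared interface for batch and streaming collections."""

    def map(self, f: Callable[[Any], Any]) -> "Collection": ...
    def flat_map(self, f: Callable[[Any], Iterable[Any]]) -> "Collection": ...
    def reduce_by_key(self, f: Callable[[Any, Any], Any]) -> "Collection": ...


def _reduce_pairs(pairs: Iterable[Tuple[Any, Any]], f: Callable[[Any, Any], Any]) -> List[Tuple[Any, Any]]:
    """Combine values that share a key; keys keep first-seen order."""
    acc: dict = {}
    for k, v in pairs:
        acc[k] = f(acc[k], v) if k in acc else v
    return list(acc.items())


class BatchCollection:
    """Stand-in for an RDD: a single finite list of elements."""

    def __init__(self, items: Iterable[Any]):
        self.items = list(items)

    def map(self, f):
        return BatchCollection(f(x) for x in self.items)

    def flat_map(self, f):
        return BatchCollection(y for x in self.items for y in f(x))

    def reduce_by_key(self, f):
        return BatchCollection(_reduce_pairs(self.items, f))


class StreamCollection:
    """Stand-in for a DStream: a sequence of micro-batches, each a list."""

    def __init__(self, batches: Iterable[Iterable[Any]]):
        self.batches = [list(b) for b in batches]

    def map(self, f):
        return StreamCollection([f(x) for x in b] for b in self.batches)

    def flat_map(self, f):
        return StreamCollection([y for x in b for y in f(x)] for b in self.batches)

    def reduce_by_key(self, f):
        # Reduce within each micro-batch separately, not across the whole stream.
        return StreamCollection(_reduce_pairs(b, f) for b in self.batches)


def word_count(lines: Collection) -> Collection:
    """One pipeline stage, written once, usable for batch or streaming input."""
    return (lines.flat_map(str.split)
                 .map(lambda w: (w, 1))
                 .reduce_by_key(lambda a, b: a + b))


if __name__ == "__main__":
    print(word_count(BatchCollection(["a b", "b c"])).items)
    # [('a', 1), ('b', 2), ('c', 1)]
    print(word_count(StreamCollection([["a b"], ["b b"]])).batches)
    # [[('a', 1), ('b', 1)], [('b', 2)]]
```

The pipeline code in `word_count` is identical for both inputs; only the
collection type decides whether the result is one batch or one result per
micro-batch. That shared surface, not the particular stand-in classes, is the
point of the sketch.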

I'd also note that where there is an interface change, it is there because the
original methods in RDD and DStream were declared inconsistently. Unless there
is a good reason to keep them inconsistent (and so far I don't see one in any of
these three cases), I'd argue the inconsistency wasn't a good thing to begin
with. Purely in terms of the library's consistency and usability, where the two
can be the same, they should be. That reduces the learning curve and removes
some esoteric, hard-to-track-down gotchas that are bound to bite people
switching from one case to the other for the first time.

On a final note, if this is the intended use of DStream, why have the map,
flatMap, reduceByKey, etc. functions on it at all? It seems clear it was
intended to be used this way (hm, that reminds me of a fourth small interface
change I'll add above, but as you'll see, it's very, very minor), so why not
make sure it is used the same way in both cases?