Github user tdas commented on the pull request:
https://github.com/apache/spark/pull/5680#issuecomment-97587604
@jerryshao This is a decent patch for the assumed design, but as I mentioned
in the parent JIRA https://issues.apache.org/jira/browse/SPARK-7111, the
design is not great. The "direct API" is a name we came up with to differentiate
the new Kafka API from the old receiver-based one, and logically every subclass
of `InputDStream` that is not a `ReceiverInputDStream` is a direct stream. So
further separating InputDStreams into direct and non-direct streams
(beyond the receiver stream) is a bad idea. In the end, the goal is to enable
all streams, irrespective of their type, to report information about their
input to the infra. And there should be a single common way / code path for
doing it for all input streams, with some customization / override for receiver
input streams (since, unlike non-receiver streams, all receiver streams have a
common way of reporting block info).
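
To make the "single code path with a receiver override" idea concrete, here is
a minimal, self-contained sketch. It is only an illustration under loud
assumptions: `InputInfo`, `InputInfoTracker`, `SketchInputStream`,
`SketchReceiverStream` and `SketchOffsetStream` are all hypothetical names, not
existing Spark Streaming classes, and this is not the design that will be
posted on the JIRA.

```scala
// Hypothetical sketch; none of these names are real Spark Streaming APIs.

// What a stream reports about the input of one batch.
case class InputInfo(streamId: Int, numRecords: Long)

// Stand-in for the "infra" that collects reports from every input stream.
class InputInfoTracker {
  private val reports = scala.collection.mutable.ArrayBuffer.empty[InputInfo]
  def report(info: InputInfo): Unit = reports += info
  def totalRecords: Long = reports.map(_.numRecords).sum
}

// Single common code path: every input stream reports through reportInput.
abstract class SketchInputStream(val id: Int) {
  // Each stream type decides how to count the records of a batch.
  protected def countRecords(batchTime: Long): Long

  final def reportInput(batchTime: Long, tracker: InputInfoTracker): Unit =
    tracker.report(InputInfo(id, countRecords(batchTime)))
}

// Receiver streams share one override: sum the sizes of the received blocks.
class SketchReceiverStream(id: Int, blockSizes: Map[Long, Seq[Long]])
    extends SketchInputStream(id) {
  override protected def countRecords(batchTime: Long): Long =
    blockSizes.getOrElse(batchTime, Seq.empty).sum
}

// A non-receiver stream (here: offset-range based) supplies its own count.
class SketchOffsetStream(id: Int, offsets: Map[Long, (Long, Long)])
    extends SketchInputStream(id) {
  override protected def countRecords(batchTime: Long): Long =
    offsets.get(batchTime).map { case (from, until) => until - from }.getOrElse(0L)
}
```

In this sketch there is no separate "direct stream" type: a stream either
inherits the shared receiver behaviour or provides its own count, and the
tracker never needs to know which kind it is talking to.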
I have been thinking about this since last night and I will post a very
rough design doc on the JIRA shortly. Please follow up on the JIRA.