Hi Lars,

Since Structured Streaming doesn't support receivers at all, that source/sink can't be used.
Data source v2 is under development and is therefore a moving target, so I suggest implementing it with v1 (unless you need special features from v2). Additionally, since I've just migrated the Kafka batch source/sink, I can say it's doable to migrate from v1 to v2 when the time comes. (Please see https://github.com/apache/spark/pull/24738. Worth mentioning that this is batch and not streaming, but there is a similar PR.) Dropping v1 will not happen lightning fast in the near future, though...

BR, G

On Tue, Jun 25, 2019 at 10:02 PM Lars Francke <[email protected]> wrote:
> Hi,
>
> I'm a bit confused about the current state and the future plans of custom
> data sources in Structured Streaming.
>
> So for DStreams we could write a Receiver as documented. Can this be used
> with Structured Streaming?
>
> Then we had the DataSource API with DefaultSource et al. which was (in my
> opinion) never properly documented.
>
> With Spark 2.3 we got a new DataSourceV2 (which also was a marker
> interface), also not properly documented.
>
> Now with Spark 3 this seems to change again? (
> https://issues.apache.org/jira/browse/SPARK-25390), at least the
> DataSourceV2 interface is gone, still no documentation but still called v2
> somehow?
>
> Can anyone shed some light on the current state of data sources & sinks
> for batch & streaming in Spark 2.4 and 3.x?
>
> Thank you!
>
> Cheers,
> Lars
>
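P.S. For reference, a v1 streaming source boils down to implementing StreamSourceProvider plus Source. Below is a minimal sketch of the skeleton (class names like MySourceProvider/MySource and the "my-source" short name are my own placeholders, not anything in Spark); the method bodies are stubbed with ??? since the real logic depends on your system:

```scala
import org.apache.spark.sql.{DataFrame, SQLContext}
import org.apache.spark.sql.execution.streaming.{Offset, Source}
import org.apache.spark.sql.sources.{DataSourceRegister, StreamSourceProvider}
import org.apache.spark.sql.types.{StringType, StructField, StructType}

// Hypothetical provider; usable via .format("my-source") once registered
// through META-INF/services, or by its fully-qualified class name.
class MySourceProvider extends StreamSourceProvider with DataSourceRegister {

  private val mySchema = StructType(StructField("value", StringType) :: Nil)

  override def shortName(): String = "my-source"

  override def sourceSchema(
      sqlContext: SQLContext,
      schema: Option[StructType],
      providerName: String,
      parameters: Map[String, String]): (String, StructType) =
    (shortName(), mySchema)

  override def createSource(
      sqlContext: SQLContext,
      metadataPath: String,
      schema: Option[StructType],
      providerName: String,
      parameters: Map[String, String]): Source =
    new MySource(sqlContext, mySchema)
}

class MySource(sqlContext: SQLContext, override val schema: StructType)
    extends Source {

  // Report the latest offset available in the external system,
  // or None if no data has arrived yet.
  override def getOffset: Option[Offset] = ???

  // Build a streaming DataFrame containing the rows between `start`
  // (exclusive) and `end` (inclusive). Note v1 sources typically have to
  // reach for internal APIs here (e.g. the private[sql]
  // internalCreateDataFrame) to produce a DataFrame with isStreaming=true.
  override def getBatch(start: Option[Offset], end: Offset): DataFrame = ???

  override def stop(): Unit = ()
}
```

Note that Source and Offset live in org.apache.spark.sql.execution.streaming, which is technically an internal package; that's one of the rough edges of v1 that v2 is meant to fix.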
