Gabor, thank you. That is immensely helpful. DataSource v1 it is then. Does that mean DSV2 is not really for production use yet?
Any idea what the best documentation would be? I'd probably start by looking at existing code.

Cheers,
Lars

On Fri, Jun 28, 2019 at 1:06 PM Gabor Somogyi <[email protected]> wrote:

> Hi Lars,
>
> Since Structured Streaming doesn't support receivers at all, that
> source/sink can't be used.
>
> Data source v2 is under development and is therefore a moving target, so I
> suggest implementing with v1 (unless you need features that are specific
> to v2).
> Additionally, having just worked on the Kafka batch source/sink, I can say
> it's doable to migrate from v1 to v2 when the time comes.
> (Please see https://github.com/apache/spark/pull/24738. Worth mentioning
> that this is batch and not streaming, but there is a similar PR.)
> Dropping v1 will not happen lightning fast in the near future, though...
>
> BR,
> G
>
>
> On Tue, Jun 25, 2019 at 10:02 PM Lars Francke <[email protected]>
> wrote:
>
>> Hi,
>>
>> I'm a bit confused about the current state and future plans for custom
>> data sources in Structured Streaming.
>>
>> For DStreams we could write a Receiver, as documented. Can this be used
>> with Structured Streaming?
>>
>> Then we had the DataSource API with DefaultSource et al., which was (in
>> my opinion) never properly documented.
>>
>> With Spark 2.3 we got a new DataSourceV2 (which was also a marker
>> interface), likewise not properly documented.
>>
>> Now with Spark 3 this seems to change again (
>> https://issues.apache.org/jira/browse/SPARK-25390): at least the
>> DataSourceV2 interface is gone, there is still no documentation, but
>> it's still called v2 somehow?
>>
>> Can anyone shed some light on the current state of data sources & sinks
>> for batch & streaming in Spark 2.4 and 3.x?
>>
>> Thank you!
>>
>> Cheers,
>> Lars
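
For anyone reading along: since the advice above is to implement against DataSource v1, here is a minimal sketch of what a v1 streaming source skeleton looks like in Spark 2.x. `StreamSourceProvider`, `Source`, and `LongOffset` are the actual Spark 2.x interfaces; everything else (class names, the counter-based offset logic, the schema) is illustrative, and a real source would also need to handle `SerializedOffset` on recovery and query its external system instead of a counter:

```scala
// Sketch of a custom streaming source via the DataSource v1 API (Spark 2.x).
// Wire it up with: spark.readStream.format("fully.qualified.ExampleSourceProvider").load()
import org.apache.spark.sql.{DataFrame, SQLContext}
import org.apache.spark.sql.execution.streaming.{LongOffset, Offset, Source}
import org.apache.spark.sql.sources.StreamSourceProvider
import org.apache.spark.sql.types.{LongType, StructField, StructType}

class ExampleSourceProvider extends StreamSourceProvider {
  private val defaultSchema = StructType(StructField("value", LongType) :: Nil)

  // Called before the source is created, to report the schema to the engine.
  override def sourceSchema(
      sqlContext: SQLContext,
      schema: Option[StructType],
      providerName: String,
      parameters: Map[String, String]): (String, StructType) =
    ("example", schema.getOrElse(defaultSchema))

  override def createSource(
      sqlContext: SQLContext,
      metadataPath: String,
      schema: Option[StructType],
      providerName: String,
      parameters: Map[String, String]): Source =
    new ExampleSource(sqlContext, schema.getOrElse(defaultSchema))
}

class ExampleSource(sqlContext: SQLContext, override val schema: StructType)
    extends Source {
  // Highest offset with data available; a real source would ask the
  // external system here instead of keeping a local counter.
  @volatile private var currentOffset: Long = -1L

  // None means "no data yet"; otherwise the latest available offset.
  override def getOffset: Option[Offset] =
    if (currentOffset < 0) None else Some(LongOffset(currentOffset))

  // Return the rows in (start, end] as a DataFrame. Production code must
  // also accept SerializedOffset values replayed from the checkpoint log.
  override def getBatch(start: Option[Offset], end: Offset): DataFrame = {
    val from = start.map(_.asInstanceOf[LongOffset].offset + 1).getOrElse(0L)
    val to = end.asInstanceOf[LongOffset].offset
    import sqlContext.implicits._
    sqlContext.sparkContext.range(from, to + 1).toDF("value")
  }

  override def stop(): Unit = ()
}
```

The same provider class can additionally implement `StreamSinkProvider` if a custom sink is needed; the batch-side `DefaultSource`/`RelationProvider` interfaces mentioned earlier in the thread are separate from these streaming ones.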
