Hi Lars,

DSv2 is already used in production.
As for documentation: since Spark is evolving fast, I would take a look at how the built-in connectors are implemented.

BR,
G

On Fri, Jun 28, 2019 at 3:52 PM Lars Francke <[email protected]> wrote:

> Gabor,
>
> thank you. That is immensely helpful. DataSource v1 it is then. Does that
> mean DSV2 is not really for production use yet?
>
> Any idea what the best documentation would be? I'd probably start by
> looking at existing code.
>
> Cheers,
> Lars
>
> On Fri, Jun 28, 2019 at 1:06 PM Gabor Somogyi <[email protected]>
> wrote:
>
>> Hi Lars,
>>
>> Since Structured Streaming doesn't support receivers at all so that
>> source/sink can't be used.
>>
>> Data source v2 is under development and because of that it's a moving
>> target so I suggest to implement it with v1 (unless special features are
>> required from v2).
>> Additionally since I've just adopted Kafka batch source/sink I can say
>> it's doable to merge from v1 to v2 when time comes.
>> (Please see https://github.com/apache/spark/pull/24738. Worth to mention
>> this is batch and not streaming but there is a similar PR)
>> Dropping v1 will not happen lightning fast in the near future though...
>>
>> BR,
>> G
>>
>>
>> On Tue, Jun 25, 2019 at 10:02 PM Lars Francke <[email protected]>
>> wrote:
>>
>>> Hi,
>>>
>>> I'm a bit confused about the current state and the future plans of
>>> custom data sources in Structured Streaming.
>>>
>>> So for DStreams we could write a Receiver as documented. Can this be
>>> used with Structured Streaming?
>>>
>>> Then we had the DataSource API with DefaultSource et. al. which was (in
>>> my opinion) never properly documented.
>>>
>>> With Spark 2.3 we got a new DataSourceV2 (which also was a marker
>>> interface), also not properly documented.
>>>
>>> Now with Spark 3 this seems to change again? (
>>> https://issues.apache.org/jira/browse/SPARK-25390), at least the
>>> DataSourceV2 interface is gone, still no documentation but still called v2
>>> somehow?
>>>
>>> Can anyone shed some light on the current state of data sources & sinks
>>> for batch & streaming in Spark 2.4 and 3.x?
>>>
>>> Thank you!
>>>
>>> Cheers,
>>> Lars
>>>
>>
