Hi Gabor,

sure, but DSv2 seems to be undergoing backward-incompatible changes from Spark 2 -> 3, right? That, combined with the fact that the API is still pretty new, doesn't instill confidence in its stability (API-wise, I mean).
Cheers,
Lars

On Fri, Jun 28, 2019 at 4:10 PM Gabor Somogyi <[email protected]> wrote:

> Hi Lars,
>
> DSv2 is already used in production.
>
> Documentation-wise, since Spark is evolving fast I would take a look at how
> the built-in connectors are implemented.
>
> BR,
> G
>
>
> On Fri, Jun 28, 2019 at 3:52 PM Lars Francke <[email protected]> wrote:
>
>> Gabor,
>>
>> thank you. That is immensely helpful. DataSource v1 it is, then. Does that
>> mean DSv2 is not really ready for production use yet?
>>
>> Any idea what the best documentation would be? I'd probably start by
>> looking at existing code.
>>
>> Cheers,
>> Lars
>>
>> On Fri, Jun 28, 2019 at 1:06 PM Gabor Somogyi <[email protected]> wrote:
>>
>>> Hi Lars,
>>>
>>> Since Structured Streaming doesn't support receivers at all, that
>>> source/sink can't be used.
>>>
>>> Data Source v2 is under development, and because of that it's a moving
>>> target, so I suggest implementing it with v1 (unless special features are
>>> required from v2).
>>> Additionally, since I've just migrated the Kafka batch source/sink, I can
>>> say it's doable to move from v1 to v2 when the time comes.
>>> (Please see https://github.com/apache/spark/pull/24738. Worth mentioning
>>> that this is batch and not streaming, but there is a similar PR.)
>>> Dropping v1 will not happen lightning fast in the near future, though...
>>>
>>> BR,
>>> G
>>>
>>>
>>> On Tue, Jun 25, 2019 at 10:02 PM Lars Francke <[email protected]> wrote:
>>>
>>>> Hi,
>>>>
>>>> I'm a bit confused about the current state and the future plans for
>>>> custom data sources in Structured Streaming.
>>>>
>>>> For DStreams we could write a Receiver as documented. Can this be
>>>> used with Structured Streaming?
>>>>
>>>> Then we had the DataSource API with DefaultSource et al., which was (in
>>>> my opinion) never properly documented.
>>>>
>>>> With Spark 2.3 we got a new DataSourceV2 (which was also a marker
>>>> interface), also not properly documented.
>>>>
>>>> Now with Spark 3 this seems to change again (
>>>> https://issues.apache.org/jira/browse/SPARK-25390)? At least the
>>>> DataSourceV2 interface is gone, there is still no documentation, but
>>>> it's still somehow called v2?
>>>>
>>>> Can anyone shed some light on the current state of data sources & sinks
>>>> for batch & streaming in Spark 2.4 and 3.x?
>>>>
>>>> Thank you!
>>>>
>>>> Cheers,
>>>> Lars
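[For anyone finding this thread later: a minimal sketch of what "implement it with v1" looks like for a Structured Streaming source in Spark 2.x. The `StreamSourceProvider` and `Source` traits are the v1 streaming API; the package and the "mysource" short name are made-up placeholders, and the stubbed methods are where a real implementation tracks and serves data.]

```scala
// Sketch of a DataSource v1 streaming source for Spark 2.x.
// Assumes spark-sql on the classpath; package and "mysource" are placeholders.
package com.example.streaming

import org.apache.spark.sql.{DataFrame, SQLContext}
import org.apache.spark.sql.execution.streaming.{Offset, Source}
import org.apache.spark.sql.sources.{DataSourceRegister, StreamSourceProvider}
import org.apache.spark.sql.types.{StringType, StructField, StructType}

// Spark discovers this either by its fully-qualified class name or by the
// short name below (when registered in
// META-INF/services/org.apache.spark.sql.sources.DataSourceRegister).
class DefaultSource extends StreamSourceProvider with DataSourceRegister {

  private val fixedSchema = StructType(StructField("value", StringType) :: Nil)

  override def shortName(): String = "mysource"

  // Called before createSource to resolve the schema of the stream.
  override def sourceSchema(
      sqlContext: SQLContext,
      schema: Option[StructType],
      providerName: String,
      parameters: Map[String, String]): (String, StructType) =
    (shortName(), schema.getOrElse(fixedSchema))

  override def createSource(
      sqlContext: SQLContext,
      metadataPath: String,
      schema: Option[StructType],
      providerName: String,
      parameters: Map[String, String]): Source =
    new Source {
      override def schema: StructType = fixedSchema

      // Highest offset currently available; None means no data yet.
      override def getOffset: Option[Offset] = ???

      // Return the rows between the two offsets as a DataFrame.
      override def getBatch(start: Option[Offset], end: Offset): DataFrame = ???

      override def stop(): Unit = ()
    }
}
```

With the register entry in place it would be read as `spark.readStream.format("mysource").load()`. The built-in Kafka and rate sources in the Spark codebase are fuller examples of the same pattern, which matches Gabor's suggestion to read the built-in connectors.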
