Hi Gabor,

sure, but DSv2 seems to be undergoing backward-incompatible changes from Spark 2 -> 3, right? That, combined with the fact that the API is still pretty new, doesn't instill confidence in its stability (API-wise, I mean).
Cheers,
Lars

On Fri, Jun 28, 2019 at 4:10 PM Gabor Somogyi <[email protected]> wrote:

> Hi Lars,
>
> DSv2 is already used in production.
>
> Documentation-wise, since Spark is evolving fast I would take a look at how
> the built-in connectors are implemented.
>
> BR,
> G
>
>
> On Fri, Jun 28, 2019 at 3:52 PM Lars Francke <[email protected]> wrote:
>
>> Gabor,
>>
>> thank you. That is immensely helpful. DataSource v1 it is, then. Does that
>> mean DSv2 is not really ready for production use yet?
>>
>> Any idea what the best documentation would be? I'd probably start by
>> looking at existing code.
>>
>> Cheers,
>> Lars
>>
>> On Fri, Jun 28, 2019 at 1:06 PM Gabor Somogyi <[email protected]> wrote:
>>
>>> Hi Lars,
>>>
>>> Since Structured Streaming doesn't support receivers at all, that
>>> source/sink can't be used.
>>>
>>> Data Source v2 is under development, and because of that it's a moving
>>> target, so I suggest implementing it with v1 (unless special features are
>>> required from v2).
>>> Additionally, since I've just migrated the Kafka batch source/sink, I can
>>> say it's doable to move from v1 to v2 when the time comes.
>>> (Please see https://github.com/apache/spark/pull/24738. Worth mentioning
>>> that this is batch and not streaming, but there is a similar PR.)
>>> Dropping v1 will not happen lightning fast in the near future, though...
>>>
>>> BR,
>>> G
>>>
>>>
>>> On Tue, Jun 25, 2019 at 10:02 PM Lars Francke <[email protected]> wrote:
>>>
>>>> Hi,
>>>>
>>>> I'm a bit confused about the current state and the future plans for
>>>> custom data sources in Structured Streaming.
>>>>
>>>> For DStreams we could write a Receiver as documented. Can this be
>>>> used with Structured Streaming?
>>>>
>>>> Then we had the DataSource API with DefaultSource et al., which was (in
>>>> my opinion) never properly documented.
>>>>
>>>> With Spark 2.3 we got a new DataSourceV2 (which was also a marker
>>>> interface), also not properly documented.
>>>>
>>>> Now with Spark 3 this seems to change again (
>>>> https://issues.apache.org/jira/browse/SPARK-25390)? At least the
>>>> DataSourceV2 interface is gone, there is still no documentation, but
>>>> it's still somehow called v2?
>>>>
>>>> Can anyone shed some light on the current state of data sources & sinks
>>>> for batch & streaming in Spark 2.4 and 3.x?
>>>>
>>>> Thank you!
>>>>
>>>> Cheers,
>>>> Lars
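[For anyone finding this thread later: a minimal sketch of what "implement it with v1" looks like for a Structured Streaming source in Spark 2.x. The `StreamSourceProvider` and `Source` traits are the v1 streaming API; the package and the "mysource" short name are made-up placeholders, and the stubbed methods are where a real implementation tracks and serves data.]

```scala
// Sketch of a DataSource v1 streaming source for Spark 2.x.
// Assumes spark-sql on the classpath; package and "mysource" are placeholders.
package com.example.streaming

import org.apache.spark.sql.{DataFrame, SQLContext}
import org.apache.spark.sql.execution.streaming.{Offset, Source}
import org.apache.spark.sql.sources.{DataSourceRegister, StreamSourceProvider}
import org.apache.spark.sql.types.{StringType, StructField, StructType}

// Spark discovers this either by its fully-qualified class name or by the
// short name below (when registered in
// META-INF/services/org.apache.spark.sql.sources.DataSourceRegister).
class DefaultSource extends StreamSourceProvider with DataSourceRegister {

  private val fixedSchema = StructType(StructField("value", StringType) :: Nil)

  override def shortName(): String = "mysource"

  // Called before createSource to resolve the schema of the stream.
  override def sourceSchema(
      sqlContext: SQLContext,
      schema: Option[StructType],
      providerName: String,
      parameters: Map[String, String]): (String, StructType) =
    (shortName(), schema.getOrElse(fixedSchema))

  override def createSource(
      sqlContext: SQLContext,
      metadataPath: String,
      schema: Option[StructType],
      providerName: String,
      parameters: Map[String, String]): Source =
    new Source {
      override def schema: StructType = fixedSchema

      // Highest offset currently available; None means no data yet.
      override def getOffset: Option[Offset] = ???

      // Return the rows between the two offsets as a DataFrame.
      override def getBatch(start: Option[Offset], end: Offset): DataFrame = ???

      override def stop(): Unit = ()
    }
}
```

With the register entry in place it would be read as `spark.readStream.format("mysource").load()`. The built-in Kafka and rate sources in the Spark codebase are fuller examples of the same pattern, which matches Gabor's suggestion to read the built-in connectors.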
