Re: [DISCUSS] Deprecate DStream in 3.4

2023-01-15 Thread Jungtaek Lim
Given that I have received positive votes from more than 3 PMC members as
well as from several active contributors, I will proceed with the actual
work.
(It may take a couple more days, as folks in the US will help me and there's
a holiday in the US.)

Please let me know if we want to have an official vote thread before moving
forward.

Thanks all for providing your voices on this!

On Sat, Jan 14, 2023 at 3:56 AM Anish Shrigondekar <
anish.shrigonde...@databricks.com> wrote:

> +1 on the DStream deprecation proposal
>
> On Fri, Jan 13, 2023 at 10:47 AM Jerry Peng 
> wrote:
>
>> +1 in general for marking the DStreams API as deprecated
>>
>> Jungtaek, can you please provide / elaborate on the concrete actions you
>> intend to take for the deprecation process?
>>
>> Best,
>>
>> Jerry
>>
>> On Thu, Jan 12, 2023 at 11:16 PM L. C. Hsieh  wrote:
>>
>>> +1
>>>
>>> On Thu, Jan 12, 2023 at 10:39 PM Jungtaek Lim
>>>  wrote:
>>> >
>>> > Yes, exactly. Sorry for the confusion - I should have clarified the
>>> action items in the proposal.
>>> >
>>> > On Fri, Jan 13, 2023 at 3:31 PM Dongjoon Hyun 
>>> wrote:
>>> >>
>>> >> Then, could you elaborate `the proposed code change` specifically?
>>> >> Maybe, usual deprecation warning logs and annotation on the API?
>>> >>
>>> >>
>>> >> On Thu, Jan 12, 2023 at 10:05 PM Jungtaek Lim <
>>> kabhwan.opensou...@gmail.com> wrote:
>>> >>>
>>> >>> Maybe I need to clarify - my proposal is "explicitly" deprecating
>>> it, which incurs a code change for sure. Guidance on the Spark website is
>>> done already, as I mentioned - we updated the DStream doc page to mention
>>> that DStream is a "legacy" project and users should move to SS. I don't
>>> feel this is sufficient to discourage users from using it, hence this
>>> proposal.
>>> >>>
>>> >>> Sorry for the confusion. I just wanted to make sure the goal of the
>>> proposal is not "removing" the API. Discussions on removing an API
>>> don't tend to go well, so I wanted to make clear I don't mean that.
>>> >>>
>>> >>> On Fri, Jan 13, 2023 at 2:46 PM Dongjoon Hyun <
>>> dongjoon.h...@gmail.com> wrote:
>>> 
>>>  +1 for the proposal (guiding only without any code change).
>>> 
>>>  Thanks,
>>>  Dongjoon.
>>> 
>>>  On Thu, Jan 12, 2023 at 9:33 PM Shixiong Zhu 
>>> wrote:
>>> >
>>> > +1
>>> >
>>> >
>>> > On Thu, Jan 12, 2023 at 5:08 PM Tathagata Das <
>>> tathagata.das1...@gmail.com> wrote:
>>> >>
>>> >> +1
>>> >>
>>> >> On Thu, Jan 12, 2023 at 7:46 PM Hyukjin Kwon 
>>> wrote:
>>> >>>
>>> >>> +1
>>> >>>
>>> >>> On Fri, 13 Jan 2023 at 08:51, Jungtaek Lim <
>>> kabhwan.opensou...@gmail.com> wrote:
>>> 
>>>  bump for more visibility.
>>> 
>>>  On Wed, Jan 11, 2023 at 12:20 PM Jungtaek Lim <
>>> kabhwan.opensou...@gmail.com> wrote:
>>> >
>>> > Hi dev,
>>> >
>>> > I'd like to propose the deprecation of DStream in Spark 3.4,
>>> in favor of promoting Structured Streaming.
>>> > (Sorry for the late proposal, if we don't make the change in
>>> 3.4, we will have to wait for another 6 months.)
>>> >
>>> > We have been focusing on Structured Streaming for years
>>> (across multiple major and minor versions), and during the time we haven't
>>> made any improvements for DStream. Furthermore, recently we updated the
>>> DStream doc to explicitly say DStream is a legacy project.
>>> >
>>> https://spark.apache.org/docs/latest/streaming-programming-guide.html#note
>>> >
>>> > The baseline of deprecation is that we don't see a particular
>>> use case which only DStream solves. This is a different story with GraphX
>>> and MLLIB, as we don't have replacements for that.
>>> >
>>> > The proposal does not mean we will remove the API soon, as with other
>>> deprecations the Spark project has made against public APIs. I don't
>>> intend to propose a target version for removal. The goal is to guide
>>> users away from building new workloads with DStream. We might want to
>>> remove it in the future, but that would require a new discussion
>>> thread at that time.
>>> >
>>> > What do you think?
>>> >
>>> > Thanks,
>>> > Jungtaek Lim (HeartSaVioR)
>>>
>>> -
>>> To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
>>>
>>>
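
For readers following the thread: the "usual deprecation warning logs and
annotation on the API" mentioned above could look roughly like the sketch
below. It is a generic Python illustration, not the change that was made in
Spark. The decorator name `deprecated` and the class
`LegacyStreamingContext` are hypothetical and exist only for this example.
The point it shows is that the API keeps working, while each use emits a
warning and the notice is also attached to the docstring.

```python
import functools
import warnings


def deprecated(since: str, alternative: str):
    """Hypothetical decorator: mark a callable as deprecated without removing it."""
    def wrap(func):
        note = (
            f"{func.__qualname__} is deprecated as of {since}; "
            f"use {alternative} instead."
        )

        @functools.wraps(func)
        def inner(*args, **kwargs):
            # Emit the warning on each call; Python's warning filters decide
            # how often it is actually displayed. The wrapped function still
            # runs unchanged, so existing workloads keep working.
            warnings.warn(note, FutureWarning, stacklevel=2)
            return func(*args, **kwargs)

        # Attach the notice to the docstring so it shows up in API docs too.
        inner.__doc__ = f"{note}\n\n{func.__doc__ or ''}"
        return inner
    return wrap


class LegacyStreamingContext:
    """Stand-in for a legacy entry point; not a real Spark class."""

    @deprecated(since="3.4", alternative="Structured Streaming")
    def __init__(self, batch_seconds: int):
        self.batch_seconds = batch_seconds
```

FutureWarning is used here because Python hides DeprecationWarning by
default outside of `__main__`, which would defeat the goal of steering
users away from starting new workloads on the old API.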


Re: [DISCUSS] Deprecate DStream in 3.4

2023-01-15 Thread Jungtaek Lim
I described it in the thread, but I had to add it in a reply, so it's not
easy to find. Sorry for the inconvenience.

https://lists.apache.org/thread/d9yg7w9pnb9rw7c2yglp4qk6jt43y0kw


On Sat, Jan 14, 2023 at 3:46 AM Jerry Peng 
wrote:

> +1 in general for marking the DStreams API as deprecated
>
> Jungtaek, can you please provide / elaborate on the concrete actions you
> intend to take for the deprecation process?
>
> Best,
>
> Jerry
>
> On Thu, Jan 12, 2023 at 11:16 PM L. C. Hsieh  wrote:
>
>> +1
>>
>> On Thu, Jan 12, 2023 at 10:39 PM Jungtaek Lim
>>  wrote:
>> >
>> > Yes, exactly. Sorry for the confusion - I should have clarified the
>> action items in the proposal.
>> >
>> > On Fri, Jan 13, 2023 at 3:31 PM Dongjoon Hyun 
>> wrote:
>> >>
>> >> Then, could you elaborate `the proposed code change` specifically?
>> >> Maybe, usual deprecation warning logs and annotation on the API?
>> >>
>> >>
>> >> On Thu, Jan 12, 2023 at 10:05 PM Jungtaek Lim <
>> kabhwan.opensou...@gmail.com> wrote:
>> >>>
>> >>> Maybe I need to clarify - my proposal is "explicitly" deprecating it,
>> which incurs a code change for sure. Guidance on the Spark website is done
>> already, as I mentioned - we updated the DStream doc page to mention that
>> DStream is a "legacy" project and users should move to SS. I don't feel
>> this is sufficient to discourage users from using it, hence this
>> proposal.
>> >>>
>> >>> Sorry for the confusion. I just wanted to make sure the goal of the
>> proposal is not "removing" the API. Discussions on removing an API
>> don't tend to go well, so I wanted to make clear I don't mean that.
>> >>>
>> >>> On Fri, Jan 13, 2023 at 2:46 PM Dongjoon Hyun <
>> dongjoon.h...@gmail.com> wrote:
>> 
>>  +1 for the proposal (guiding only without any code change).
>> 
>>  Thanks,
>>  Dongjoon.
>> 
>>  On Thu, Jan 12, 2023 at 9:33 PM Shixiong Zhu 
>> wrote:
>> >
>> > +1
>> >
>> >
>> > On Thu, Jan 12, 2023 at 5:08 PM Tathagata Das <
>> tathagata.das1...@gmail.com> wrote:
>> >>
>> >> +1
>> >>
>> >> On Thu, Jan 12, 2023 at 7:46 PM Hyukjin Kwon 
>> wrote:
>> >>>
>> >>> +1
>> >>>
>> >>> On Fri, 13 Jan 2023 at 08:51, Jungtaek Lim <
>> kabhwan.opensou...@gmail.com> wrote:
>> 
>>  bump for more visibility.
>> 
>>  On Wed, Jan 11, 2023 at 12:20 PM Jungtaek Lim <
>> kabhwan.opensou...@gmail.com> wrote:
>> >
>> > Hi dev,
>> >
>> > I'd like to propose the deprecation of DStream in Spark 3.4, in
>> favor of promoting Structured Streaming.
>> > (Sorry for the late proposal, if we don't make the change in
>> 3.4, we will have to wait for another 6 months.)
>> >
>> > We have been focusing on Structured Streaming for years (across
>> multiple major and minor versions), and during the time we haven't made any
>> improvements for DStream. Furthermore, recently we updated the DStream doc
>> to explicitly say DStream is a legacy project.
>> >
>> https://spark.apache.org/docs/latest/streaming-programming-guide.html#note
>> >
>> > The baseline of deprecation is that we don't see a particular
>> use case which only DStream solves. This is a different story with GraphX
>> and MLLIB, as we don't have replacements for that.
>> >
>> > The proposal does not mean we will remove the API soon, as with other
>> deprecations the Spark project has made against public APIs. I don't
>> intend to propose a target version for removal. The goal is to guide
>> users away from building new workloads with DStream. We might want to
>> remove it in the future, but that would require a new discussion
>> thread at that time.
>> >
>> > What do you think?
>> >
>> > Thanks,
>> > Jungtaek Lim (HeartSaVioR)
>>
>>
>>