I would argue that the guarantees of @Public methods that became
ineffective were broken when they became ineffective (and were deprecated).

   - ExecutionConfig#disable/enableSysoutLogging (deprecated in 1.10)
   - ExecutionConfig#set/isFailTaskOnCheckpointError (deprecated in 1.9)

Removing these methods seems like the better of two evils to me as it shows
users that they have been using no-ops for some time.

On Thu, Aug 20, 2020 at 10:50 AM Stephan Ewen <se...@apache.org> wrote:

> We have removed some public methods in the past, after a careful
> deprecation period, if they were not well working any more.
>
> The sentiment I got from users is that careful cleanup is in fact
> appreciated, otherwise things get confusing over time (the deprecated
> methods cause noise in the API).
> Still, we need to be very careful here.
>
> I would suggest to
>   - start with the non-public breaking methods
>   - remove fold() (very long deprecated)
>   - remove split() buggy
>
> Removing the env.socketStream() and env.fileStream() methods would
> probably be good, too. They are very long deprecated and they don't work
> well (with checkpoints) and the sources are the first thing a user needs to
> understand when starting with Flink, so removing noise there is super
> helpful.
>
>
> On Thu, Aug 20, 2020 at 8:53 AM Dawid Wysakowicz <dwysakow...@apache.org>
> wrote:
>
>> Hey Till,
>>
>> You've got a good point here. Removing some of the methods would cause
>> breaking the stability guarantees. I do understand if we decide not to
>> remove them for that reason, let me explain though why I am thinking it
>> might make sense to remove them already. First of all I am a bit afraid it
>> might take a long time before we arrive at the 2.0 version. We have not
>> ever discussed that in the community. At the same time a lot of the methods
>> already don't work or are buggy, and we do not fix them any more.
>>
>> Methods which removing would not break the Public guarantees:
>>
>>    - ExecutionConfig#set/getCodeAnalysisMode (deprecated in 1.9)
>>    - RuntimeContext#getAllAccumulators (deprecated in 0.10)
>>    - ExecutionConfig#isLatencyTrackingEnabled (deprecated in 1.7)
>>    - 
>> StreamExecutionEnvironment#setNumberOfExecutionRetries/getNumberOfExecutionRetries
>>    (not the equivalent in the ExecutionConfig)
>>    - StreamExecutionEnvironment#setStateBackend(AbstractStateBackend)
>>    (deprecated in 1.5)
>>
>> Methods which removing would break the Public guarantees:
>>
>> which have no effect:
>>
>>    - ExecutionConfig#disable/enableSysoutLogging (deprecated in 1.10)
>>    - ExecutionConfig#set/isFailTaskOnCheckpointError (deprecated in 1.9)
>>
>> which are buggy or discouraged and thus we do not support fixing them:
>>
>>    - DataStream#split (deprecated in 1.8)
>>    - DataStream#fold and all related classes and methods such as
>>    FoldFunction, FoldingState, FoldingStateDescriptor ... (deprecated in
>>    1.3/1.4)
>>
>> The methods like:
>>
>>    - 
>> StreamExecutionEnvironment#readFile,readFileStream(...),socketTextStream(...),socketTextStream(...),
>>
>>    - methods in (Connected)DataStream that specify keys as either
>>    indices or field names
>>    -
>>    ExecutionConfig#setNumberOfExecutionRetries/getNumberOfExecutionRetries
>>
>> should be working just fine and I feel the least eager to remove those.
>>
>> I'd suggest I will open PRs for removing the methods that will not cause
>> breakage of the Public guarantees as the general feedback was rather
>> positive. For the rest I do understand the resentment to do so and will not
>> do it in the 1.x branch. Still I think it is valuable to have the
>> discussion.
>>
>> Best,
>>
>> Dawid
>>
>>
>> On 18/08/2020 09:26, Till Rohrmann wrote:
>>
>> Having looked at the proposed set of methods to remove I've noticed that
>> some of them are actually annotated with @Public. According to our
>> stability guarantees, only major releases (1.0, 2.0, etc.) can break APIs
>> with this annotation. Hence, I believe that we cannot simply remove them
>> unless the community decides to change the stability guarantees we give or
>> by making the next release a major release (Flink 2.0).
>>
>> Cheers,
>> Till
>>
>> On Tue, Aug 18, 2020 at 5:57 AM Yun Gao <yungao...@aliyun.com> wrote:
>>
>>> +1 for removing the methods that are deprecated for a while & have
>>> alternative methods.
>>>
>>> One specific thing is that if we remove the DataStream#split, do we
>>> consider enabling side-output in more operators in the future ? Currently
>>> it should be only available in ProcessFunctions, but not available to other
>>> commonly used UDF like Source or AsyncFunction[1].
>>>
>>> One temporary solution occurs to me is to add a ProcessFunction after
>>> the operators want to use side-output. But I think the solution is not very
>>> direct to come up with and if it really works we might add it to the
>>> document of side-output.
>>>
>>> [1] https://issues.apache.org/jira/browse/FLINK-7954
>>>
>>> Best,
>>>  Yun
>>>
>>> ------------------Original Mail ------------------
>>> *Sender:*Kostas Kloudas <kklou...@gmail.com>
>>> *Send Date:*Tue Aug 18 03:52:44 2020
>>> *Recipients:*Dawid Wysakowicz <dwysakow...@apache.org>
>>> *CC:*dev <d...@flink.apache.org>, user <user@flink.apache.org>
>>> *Subject:*Re: [DISCUSS] Removing deprecated methods from DataStream API
>>>
>>>> +1 for removing them.
>>>>
>>>>
>>>>
>>>> From a quick look, most of them (not all) have been deprecated a long
>>>> time ago.
>>>>
>>>>
>>>>
>>>> Cheers,
>>>>
>>>> Kostas
>>>>
>>>>
>>>>
>>>> On Mon, Aug 17, 2020 at 9:37 PM Dawid Wysakowicz wrote:
>>>>
>>>> >
>>>>
>>>> > @David Yes, my idea was to remove any use of fold method and all
>>>> related classes including WindowedStream#fold
>>>>
>>>> >
>>>>
>>>> > @Klou Good idea to also remove the deprecated enableCheckpointing() &
>>>> StreamExecutionEnvironment#readFile and alike. I did another pass over some
>>>> of the classes and thought we could also drop:
>>>>
>>>> >
>>>>
>>>> > ExecutionConfig#set/getCodeAnalysisMode
>>>>
>>>> > ExecutionConfig#disable/enableSysoutLogging
>>>>
>>>> > ExecutionConfig#set/isFailTaskOnCheckpointError
>>>>
>>>> > ExecutionConfig#isLatencyTrackingEnabled
>>>>
>>>> >
>>>>
>>>> > As for the `forceCheckpointing` I am not fully convinced to doing it.
>>>> As far as I know iterations still do not participate in checkpointing
>>>> correctly. Therefore it still might make sense to force it. In other words
>>>> there is no real alternative to that method. Unless we only remove the
>>>> methods from StreamExecutionEnvironment and redirect to the setter in
>>>> CheckpointConfig. WDYT?
>>>>
>>>> >
>>>>
>>>> > An updated list of methods I suggest to remove:
>>>>
>>>> >
>>>>
>>>> > ExecutionConfig#set/getCodeAnalysisMode (deprecated in 1.9)
>>>>
>>>> > ExecutionConfig#disable/enableSysoutLogging (deprecated in 1.10)
>>>>
>>>> > ExecutionConfig#set/isFailTaskOnCheckpointError (deprecated in 1.9)
>>>>
>>>> > ExecutionConfig#isLatencyTrackingEnabled (deprecated in 1.7)
>>>>
>>>> > ExecutionConfig#(get)/setNumberOfExecutionRetries() (deprecated in
>>>> 1.1?)
>>>>
>>>> >
>>>> StreamExecutionEnvironment#readFile,readFileStream(...),socketTextStream(...),socketTextStream(...)
>>>> (deprecated in 1.2)
>>>>
>>>> > RuntimeContext#getAllAccumulators (deprecated in 0.10)
>>>>
>>>> > DataStream#fold and all related classes and methods such as
>>>> FoldFunction, FoldingState, FoldingStateDescriptor ... (deprecated in
>>>> 1.3/1.4)
>>>>
>>>> > StreamExecutionEnvironment#setStateBackend(AbstractStateBackend)
>>>> (deprecated in 1.5)
>>>>
>>>> > DataStream#split (deprecated in 1.8)
>>>>
>>>> > Methods in (Connected)DataStream that specify keys as either indices
>>>> or field names such as DataStream#keyBy, DataStream#partitionCustom,
>>>> ConnectedStream#keyBy, .... (deprecated in 1.11)
>>>>
>>>> >
>>>>
>>>> > Bear in mind that majority of the options listed above in
>>>> ExecutionConfig take no effect. They were left there purely to satisfy the
>>>> binary compatibility. Personally I don't see any benefit of leaving a
>>>> method and silently dropping the underlying feature. The only configuration
>>>> that is respected is setting the number of execution retries.
>>>>
>>>> >
>>>>
>>>> > I also wanted to make it explicit that most of the changes above
>>>> would result in a binary incompatibility and require additional exclusions
>>>> in the japicmp. Those are:
>>>>
>>>> >
>>>>
>>>> > ExecutionConfig#set/getCodeAnalysisMode (deprecated in 1.9)
>>>>
>>>> > ExecutionConfig#disable/enableSysoutLogging (deprecated in 1.10)
>>>>
>>>> > ExecutionConfig#set/isFailTaskOnCheckpointError (deprecated in 1.9)
>>>>
>>>> > ExecutionConfig#isLatencyTrackingEnabled (deprecated in 1.7)
>>>>
>>>> > ExecutionConfig#(get)/setNumberOfExecutionRetries() (deprecated in
>>>> 1.1?)
>>>>
>>>> > DataStream#fold and all related classes and methods such as
>>>> FoldFunction, FoldingState, FoldingStateDescriptor ... (deprecated in
>>>> 1.3/1.4)
>>>>
>>>> > DataStream#split (deprecated in 1.8)
>>>>
>>>> > Methods in (Connected)DataStream that specify keys as either indices
>>>> or field names such as DataStream#keyBy, DataStream#partitionCustom,
>>>> ConnectedStream#keyBy, .... (deprecated in 1.11)
>>>>
>>>> >
>>>> StreamExecutionEnvironment#readFile,readFileStream(...),socketTextStream(...),socketTextStream(...)
>>>> (deprecated in 1.2)
>>>>
>>>> >
>>>>
>>>> > Looking forward to more opinions on the issue.
>>>>
>>>> >
>>>>
>>>> > Best,
>>>>
>>>> >
>>>>
>>>> > Dawid
>>>>
>>>> >
>>>>
>>>> >
>>>>
>>>> > On 17/08/2020 12:49, Kostas Kloudas wrote:
>>>>
>>>> >
>>>>
>>>> > Thanks a lot for starting this Dawid,
>>>>
>>>> >
>>>>
>>>> > Big +1 for the proposed clean-up, and I would also add the deprecated
>>>>
>>>> > methods of the StreamExecutionEnvironment like:
>>>>
>>>> >
>>>>
>>>> > enableCheckpointing(long interval, CheckpointingMode mode, boolean
>>>> force)
>>>>
>>>> > enableCheckpointing()
>>>>
>>>> > isForceCheckpointing()
>>>>
>>>> >
>>>>
>>>> > readFile(FileInputFormat inputFormat,String
>>>>
>>>> > filePath,FileProcessingMode watchType,long interval, FilePathFilter
>>>>
>>>> > filter)
>>>>
>>>> > readFileStream(...)
>>>>
>>>> >
>>>>
>>>> > socketTextStream(String hostname, int port, char delimiter, long
>>>> maxRetry)
>>>>
>>>> > socketTextStream(String hostname, int port, char delimiter)
>>>>
>>>> >
>>>>
>>>> > There are more, like the (get)/setNumberOfExecutionRetries() that were
>>>>
>>>> > deprecated long ago, but I have not investigated to see if they are
>>>>
>>>> > actually easy to remove.
>>>>
>>>> >
>>>>
>>>> > Cheers,
>>>>
>>>> > Kostas
>>>>
>>>> >
>>>>
>>>> > On Mon, Aug 17, 2020 at 10:53 AM Dawid Wysakowicz
>>>>
>>>> > wrote:
>>>>
>>>> >
>>>>
>>>> > Hi devs and users,
>>>>
>>>> >
>>>>
>>>> > I wanted to ask you what do you think about removing some of the
>>>> deprecated APIs around the DataStream API.
>>>>
>>>> >
>>>>
>>>> > The APIs I have in mind are:
>>>>
>>>> >
>>>>
>>>> > RuntimeContext#getAllAccumulators (deprecated in 0.10)
>>>>
>>>> > DataStream#fold and all related classes and methods such as
>>>> FoldFunction, FoldingState, FoldingStateDescriptor ... (deprecated in
>>>> 1.3/1.4)
>>>>
>>>> > StreamExecutionEnvironment#setStateBackend(AbstractStateBackend)
>>>> (deprecated in 1.5)
>>>>
>>>> > DataStream#split (deprecated in 1.8)
>>>>
>>>> > Methods in (Connected)DataStream that specify keys as either indices
>>>> or field names such as DataStream#keyBy, DataStream#partitionCustom,
>>>> ConnectedStream#keyBy, .... (deprecated in 1.11)
>>>>
>>>> >
>>>>
>>>> > I think the first three should be straightforward. They are long
>>>> deprecated. The getAccumulators method is not used very often in my
>>>> opinion. The same applies to the DataStream#fold which additionally is not
>>>> very performant. Lastly the setStateBackend has an alternative with a class
>>>> from the AbstractStateBackend hierarchy, therefore it will be still code
>>>> compatible. Moreover if we remove the
>>>> #setStateBackend(AbstractStateBackend) we will get rid off warnings users
>>>> have right now when setting a statebackend as the correct method cannot be
>>>> used without an explicit casting.
>>>>
>>>> >
>>>>
>>>> > As for the DataStream#split I know there were some objections against
>>>> removing the #split method in the past. I still believe the output tags can
>>>> replace the split method already.
>>>>
>>>> >
>>>>
>>>> > The only problem in the last set of methods I propose to remove is
>>>> that they were deprecated only in the last release and those method were
>>>> only partially deprecated. Moreover some of the methods were not deprecated
>>>> in ConnectedStreams. Nevertheless I'd still be inclined to remove the
>>>> methods in this release.
>>>>
>>>> >
>>>>
>>>> > Let me know what do you think about it.
>>>>
>>>> >
>>>>
>>>> > Best,
>>>>
>>>> >
>>>>
>>>> > Dawid
>>>
>>>

-- 

Konstantin Knauf

https://twitter.com/snntrable

https://github.com/knaufk

Reply via email to