+1 for reverting these changes in Flink 1.16, so I will cancel 1.16.0-rc1.
+1 for `numXXXSend` as the alias of `numXXXOut` in 1.15.3.

Best,
Xingbo

Chesnay Schepler <ches...@apache.org> 于2022年10月10日周一 19:13写道:

> > I’m with Xintong’s idea to treat numXXXSend as an alias of numXXXOut
>
> But that's not possible. If it were that simple there would have never
> been a need to introduce another metric in the first place.
>
> It's a rather fundamental issue with how the new sinks work, in that they
> emit data to the external system (usually considered as "numRecordsOut" of
> sinks) while _also_ sending data to a downstream operator (usually
> considered as "numRecordsOut" of tasks).
> The original issue was that the numRecordsOut of the sink counted both
> (which is completely wrong).
>
> A new metric was always required; otherwise you inevitably end up breaking
> *some* semantic.
> Adding a new metric for what the sink writes to the external system is,
> for better or worse, more consistent with how these metrics usually work in
> Flink.
>
> On 10/10/2022 12:45, Qingsheng Ren wrote:
>
> Thanks everyone for joining the discussion!
>
> > Do you have any idea what has happened in the process here?
>
> The discussion in this PR [1] shows some details and could be helpful to
> understand the original motivation of the renaming. We do have a test case
> for guarding metrics but unfortunaly the case was also modified so the
> defense was broken.
>
> I think the reason why both the developer and the reviewer forgot to
> trigger an discussion and gave a green pass on the change is that metrics
> are quite “trivial” to be noticed as public APIs. As mentioned by Martijn I
> couldn’t find a place noting that metrics are public APIs and should be
> treated carefully while contributing and reviewing.
>
> IMHO three actions could be made to prevent this kind of changes in the
> future:
>
> a. Add test case for metrics (which we already have in SinkMetricsITCase)
> b. We emphasize that any public-interface breaking changes should be
> proposed by a FLIP or discussed in mailing list, and should be listed in
> the release note.
> c. We remind contributors and reviewers about what should be considered as
> public API, and include metric names in it.
>
> For b and c these two pages [2][3] might be proper places.
>
> About the patch to revert this, it looks like we have a consensus on 1.16.
> As of 1.15 I think it’s worthy to trigger a minor version. I didn’t see
> complaints about this for now so it should be OK to save the situation
> asap. I’m with Xintong’s idea to treat numXXXSend as an alias of numXXXOut
> considering there could possibly some users have already adapted their
> system to the new naming, and have another internal metric for reflecting
> number of outgoing committable batches (actually the numRecordsIn of sink
> committer operator should be carrying this info already).
>
> [1] https://github.com/apache/flink/pull/18825
> [2] https://flink.apache.org/contributing/contribute-code.html
> [3] https://flink.apache.org/contributing/reviewing-prs.html
>
> Best,
> Qingsheng
> On Oct 10, 2022, 17:40 +0800, Xintong Song <tonysong...@gmail.com>
> <tonysong...@gmail.com>, wrote:
>
> +1 for reverting these changes in Flink 1.16.
>
> For 1.15.3, can we make these metrics available via both names (numXXXOut
> and numXXXSend)? In this way we don't break it for those who already
> migrated to 1.15 and numXXXSend. That means we still need to change
> SinkWriterOperator to use another metric name in 1.15.3, which IIUC is
> internal to Flink sink.
>
> I'm overall +1 to change numXXXOut back to its original semantics. AFAIK
> (from meetup / flink-forward questionaires), most users do not migrate to a
> new Flink release immediately, until the next 1-2 major releases are out.
>
> Best,
>
> Xintong
>
>
> On Mon, Oct 10, 2022 at 5:26 PM Martijn Visser <martijnvis...@apache.org>
> wrote:
>
>> Hi Qingsheng,
>>
>> Do you have any idea what has happened in the process here? Do we know why
>> they were changed? I was under the impression that these metric names were
>> newly introduced due to the new interfaces and because it still depends on
>> each connector implementing these.
>>
>> Sidenote: metric names are not mentioned in the FLIP process as a public
>> API. Might make sense to have a separate follow-up to add that to the list
>> (I do think we should list them there).
>>
>> +1 for reverting this and make this change in Flink 1.16
>>
>> I'm not in favour of releasing a Flink 1.15.3 with this change: I think
>> the
>> impact is too big for a patch version, especially given how long Flink
>> 1.15
>> is already out there.
>>
>> Best regards,
>>
>> Martijn
>>
>> On Mon, Oct 10, 2022 at 11:13 AM Leonard Xu <xbjt...@gmail.com> wrote:
>>
>> > Thanks Qingsheng for starting this thread.
>> >
>> > +1 on reverting sink metric name and releasing 1.15.3 to fix this
>> > inconsistent behavior.
>> >
>> >
>> > Best,
>> > Leonard
>> >
>> >
>> >
>> >
>> >
>> > 2022年10月10日 下午3:06,Jark Wu <imj...@gmail.com> 写道:
>> >
>> > Thanks for discovering this problem, Qingsheng!
>> >
>> > I'm also +1 for reverting the breaking changes.
>> >
>> > IIUC, currently, the behavior of "numXXXOut" metrics of the new and old
>> > sink is inconsistent.
>> > We have to break one of them to have consistent behavior. Sink V2 is an
>> > evolving API which is just introduced in 1.15.
>> > I think it makes sense to break the unstable API instead of the stable
>> API
>> > which many connectors and users depend on.
>> >
>> > Best,
>> > Jark
>> >
>> >
>> >
>> > On Mon, 10 Oct 2022 at 11:36, Jingsong Li <jingsongl...@gmail.com>
>> wrote:
>> >
>> >> Thanks for driving, Qingsheng.
>> >>
>> >> +1 for reverting sink metric name.
>> >>
>> >> We often forget that metric is also one of the important APIs.
>> >>
>> >> +1 for releasing 1.15.3 to fix this.
>> >>
>> >> Best,
>> >> Jingsong
>> >>
>> >> On Sun, Oct 9, 2022 at 11:35 PM Becket Qin <becket....@gmail.com>
>> wrote:
>> >> >
>> >> > Thanks for raising the discussion, Qingsheng,
>> >> >
>> >> > +1 on reverting the breaking changes.
>> >> >
>> >> > In addition, we might want to release a 1.15.3 to fix this and update
>> >> the previous release docs with this known issue, so that users can
>> upgrade
>> >> to 1.15.3 when they hit it. It would also be good to add some backwards
>> >> compatibility tests on metrics to avoid unintended breaking changes
>> like
>> >> this in the future.
>> >> >
>> >> > Thanks,
>> >> >
>> >> > Jiangjie (Becket) Qin
>> >> >
>> >> > On Sun, Oct 9, 2022 at 10:35 AM Qingsheng Ren <re...@apache.org>
>> wrote:
>> >> >>
>> >> >> Hi devs and users,
>> >> >>
>> >> >> I’d like to start a discussion about reverting a breaking change
>> about
>> >> sink metrics made in 1.15 by FLINK-26126 [1] and FLINK-26492 [2].
>> >> >>
>> >> >> TL;DR
>> >> >>
>> >> >> All sink metrics with name “numXXXOut” defined in FLIP-33 are
>> replace
>> >> by “numXXXSend” in FLINK-26126 and FLINK-26492. Considering metric
>> names
>> >> are public APIs, this is a breaking change to end users and not
>> backward
>> >> compatible. Also unfortunately this breaking change was not discussed
>> in
>> >> the mailing list before.
>> >> >>
>> >> >> Background
>> >> >>
>> >> >> As defined previously in FLIP-33 (the FLIP page has been changed so
>> >> please refer to the old version [3] ), metric “numRecordsOut” is used
>> for
>> >> reporting the total number of output records since the sink started
>> (number
>> >> of records written to the external system), and similarly for
>> >> “numRecordsOutPerSecond”, “numBytesOut”, “numBytesOutPerSecond” and
>> >> “numRecordsOutError”. Most sinks are following this naming and
>> definition.
>> >> However, these metrics are ambiguous in the new Sink API as “numXXXOut”
>> >> could be used by the output of SinkWriterOperator for reporting number
>> of
>> >> Committables delivered to SinkCommitterOperator. In order to resolve
>> the
>> >> conflict, FLINK-26126 and FLINK-26492 changed names of these metrics
>> with
>> >> “numXXXSend”.
>> >> >>
>> >> >> Necessity of reverting this change
>> >> >>
>> >> >> - Metric names are actually public API, as end users need to
>> configure
>> >> metric collecting and alerting system with metric names. Users have to
>> >> reset all configurations related to affected metrics.
>> >> >> - This could also affect custom and external sinks not maintained by
>> >> Flink, which might have implemented with numXXXOut metrics.
>> >> >> - The number of records sent to external system is way more
>> important
>> >> than the number of Committables sent to SinkCommitterOperator, as the
>> >> latter one is just an internal implementation of sink. We could have a
>> new
>> >> metric name for the latter one instead.
>> >> >> - We could avoid splitting the project by version (like “plz use
>> >> numXXXOut before 1.15 and use numXXXSend after”) if we revert it ASAP,
>> >> cosidering 1.16 is still not released for now.
>> >> >>
>> >> >> As a consequence, I’d like to hear from devs and users about your
>> >> opinion on changing these metrics back to “numXXXOut”.
>> >> >>
>> >> >> Looking forward to your reply!
>> >> >>
>> >> >> [1] https://issues.apache.org/jira/browse/FLINK-26126
>> >> >> [2] https://issues.apache.org/jira/browse/FLINK-26492
>> >> >> [1] FLIP-33, version 18:
>> >>
>> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=211883136
>> >> >>
>> >> >> Best,
>> >> >> Qingsheng
>> >>
>> >
>> >
>>
>
>

Reply via email to