The whole operator API is only for advanced users and is not marked
Public(Evolving). Users have to accept that things change and we have to
use that freedom that we don't have in many other parts of the system.
The change needs to be very clear in the change notes though. I also don't
expect
Hi all,
Very thanks for the warm discussions!
Regarding the change in the operator lifecycle, I also agree with adding
the flush/drain/stopAndFlush/finish method. For the duplication between this
method and `endInput` for one input operator, with some offline disucssion with
Dawid now I also
ould still tend to we support the
> ordered case, since the sinks' implementation seem to depend
> on this functionality.
>
> Best,
> Yun
>
> --
> From:Piotr Nowojski
> Send Time:2021 Mar. 4 (Thu.) 22:56
&g
From:Piotr Nowojski
Send Time:2021 Mar. 4 (Thu.) 22:56
To:Kezhu Wang
Cc:Till Rohrmann ; Guowei Ma ; dev
; Yun Gao ; jingsongl...@gmail.com
Subject:Re: Re: Re: [DISCUSS] FLIP-147: Support Checkpoints After Tasks Finished
Hi Yun and Kezhu,
> 1. We might introduce a new
hTask or network io stack.
>> > > * Or introducing stream task level `EndOfUserRecordsEvent`(from
>> PR#14831
>> > > @Yun @Piotr)
>> > > * Or special handling of `CheckpointType.SAVEPOINT_TERMINATE`.
>> >
>> > I also have similar concern
putGate/InputChannel would be released after the
> > downstream tasks have received
> > EndOfPartitionEvent from all the input channels, this would makes the
> > following checkpoint unable to
> > perform since we could not emit barriers to downstream tasks ?
> >
> &
i
Send Time:2021 Mar. 4 (Thu.) 17:16
To:Kezhu Wang
Cc:dev ; Yun Gao ;
jingsongl...@gmail.com ; Guowei Ma
; Till Rohrmann
Subject:Re: Re: Re: [DISCUSS] FLIP-147: Support Checkpoints After Tasks Finished
Hi Kezhu,
What do you mean by “end-flushing”? I was suggesting to just keep
`endOfInput()
titionEvent is currently emitted in Task instead of
> > StreamTask, we would need some
> > refactors here.
> > 2. Currently the InputGate/InputChannel would be released after the
> > downstream tasks have received
> > EndOfPartitionEvent from all the input channels, this would makes the
> > following checkpoint unable to
> > perform sin
------------
> From:Kezhu Wang
> Send Time:2021 Mar. 1 (Mon.) 01:26
> To:Till Rohrmann
> Cc:Piotr Nowojski ; Guowei Ma <
> guowei@gmail.com>; dev ; Yun Gao <
> yungao...@aliyun.com>; jingsongl...@gmail.com
> Subject:Re: Re: Re: [DISCUSS] FLIP-147: Supp
>
> Best,
> Yun
>
>
> --------------
> From:Kezhu Wang
> Send Time:2021 Mar. 1 (Mon.) 01:26
> To:Till Rohrmann
> Cc:Piotr Nowojski ; Guowei Ma <
> guowei@gmail.com>; dev ; Yun Gao <
> yungao...@aliy
he close()
> happens
> > at last, but it seems close() might also emit records (
> > so now the operator are closed with op1's close() -> op2's endOfInput()
> ->
> > op2's close() -> op3's endOfinput -> ...) ?
> >
> > And on the other side, as Kezhu has also
From:Kezhu Wang
Send Time:2021 Mar. 1 (Mon.) 01:26
To:Till Rohrmann
Cc:Piotr Nowojski ; Guowei Ma ;
dev ; Yun Gao ;
jingsongl...@gmail.com
Subject:Re: Re: Re: [DISCUSS] FLIP-147: Support Checkpoints After Tasks Finished
In “stop-with-savepoint —drain”, MAX_WATERMARK is not an issue.
might also emit records (
> > so now the operator are closed with op1's close() -> op2's endOfInput()
> ->
> > op2's close() -> op3's endOfinput -> ...) ?
> >
> > And on the other side, as Kezhu has also proposed, perhapse we might
> have
> > the stop-with-save
osal the close()
> happens
> > at last, but it seems close() might also emit records (
> > so now the operator are closed with op1's close() -> op2's endOfInput()
> ->
> > op2's close() -> op3's endOfinput -> ...) ?
> >
> > And on the other side, a
he operator would
not
> need to wait for other slow operators to exit.
>
> Best,
> Yun
>
>
>
> --Original Mail --
> *Sender:*Kezhu Wang
> *Send Date:*Thu Feb 25 15:11:53 2021
> *Recipients:*Flink Dev , Piotr Nowojski <
> piotr.nowoj
oint --drain could
> be done with one savepoint, and for the normal exit, the operator would not
> need to wait for other slow operators to exit.
>
> Best,
> Yun
>
>
>
> --Original Mail --
> *Sender:*Kezhu Wang
> *Send Date:*Thu Feb
low operators to exit.
Best,
Yun
--Original Mail --
*Sender:*Kezhu Wang
*Send Date:*Thu Feb 25 15:11:53 2021
*Recipients:*Flink Dev , Piotr Nowojski <
piotr.nowoj...@gmail.com>
*CC:*Guowei Ma , jingsongl...@gmail.com <
jingsongl...@gmail.com>
*Subject:*Re
Hi all,
Very thanks for the discussions!
A. Regarding how to avoid emitting records in notifyCheckpointComplete:
Currently the structure of a new sink is writer -> committer -> global
committer and the paralellism of global committer
must be one. By design it would be used in several cases:
Hi all, thanks for driving this and especially Piotr for re-active this
thread.
First, for `notifyCheckpointComplete`, I have strong preference towards
"shut down the dataflow
pipeline with one checkpoint in total", so I tend to option dropping "send
records" from
`notifyCheckpointComplete` for
Thanks for the reponses Guowei and Yun,
Could you elaborate more/remind me, what does it mean to replace emitting
results from the `notifyCheckpointComplete` with `OperatorCoordinator`
approach?
About the discussion in FLINK-21133 and how it relates to FLIP-147. You are
right Yun gao, that in
Hi Till, Guowei,
Very thanks for initiating the disucssion and the deep thoughts!
For the notifyCheckpointComplete, I also agree we could try to avoid
emitting new records in notifyCheckpointComplete via using OperatorCoordinator
for new sink API. Besides, the hive sink might also need some
Hi all,
We have some offline discussion together with @Arvid, @Roman and @Aljoscha and
I'd
like to post some points we discussed:
1) For the problem that the "new" root task coincidently finished before
getting triggered
successfully, we have listed two options in the FLIP-147[1], for the
Hi all,
I updated the FLIP[1] to reflect the major discussed points in the ML thread:
1) For the "new" root tasks finished before it received trigger message,
previously we proposed
to let JM re-compute and re-trigger the descendant tasks, but after the
discussion we realized that
it might
Hi Roman,
Very thanks for the feedbacks and suggestions!
> I think UC will be the common case with multiple sources each with
DoP > 1.
> IIUC, waiting for EoP will be needed on each subtask each time one of
it's source subtask finishes.
Yes, waiting for
Hi Yun,
> b) With unaligned checkpoint enabled, the slower cases might happen if
the downstream task processes very slowly.
I think UC will be the common case with multiple sources each with DoP > 1.
IIUC, waiting for EoP will be needed on each subtask each time one of it's
source subtask
Hi Roman,
Very thanks for the feedbacks !
> Probably it would be simpler to just decline the RPC-triggered
checkpoint
> if not all inputs of this task are finished (with
CHECKPOINT_DECLINED_TASK_NOT_READY).
> But I wonder how significantly this waiting
Thanks a lot for your answers Yun,
> In detail, support we have a job with the graph A -> B -> C, support in
one checkpoint A has reported FINISHED, CheckpointCoordinator would
> choose B as the new "source" to trigger checkpoint via RPC. For task B,
if it received checkpoint trigger, it would
Hi Roman,
Very thanks for the feedbacks! I'll try to answer the issues inline:
> 1. Option 1 is said to be not preferable because it wastes resources and adds
> complexity (new event).
> However, the resources would be wasted for a relatively short time until the
> job finishes completely.
Hi Arvid,
Very thanks for the deep thoughts !
> If this somehow works, we would not need to change much in the checkpoint
> coordinator. He would always inject into sources. We could also ignore the
> race conditions as long as the TM lives. Checkpointing times are also not
> worse as with the
Hi Yun,
thanks for the detailed example. It feels like Aljoscha and you are also
not fully aligned yet. For me, it sounded as if Aljoscha would like to
avoid sending RPC to non-source subtasks.
I think we are still on the same page that we would like to trigger
> checkpoint periodically until
Hi Arvid,
Very thanks for the feedbacks! I'll try to answer the questions inline:
> I'm also concerned about the notion of a final checkpoint. What happens
> when this final checkpoint times out (checkpoint timeout > async timeout)
> or fails for a different reason? I'm currently more inclined
31 matches
Mail list logo