Re: PartitionNotFoundException after deployment

2018-06-05 Thread Nico Kruber
Hi Gyula, as a follow-up, you may be interested in https://issues.apache.org/jira/browse/FLINK-9413 Nico On 04/05/18 15:36, Gyula Fóra wrote: > Looks pretty clear that one operator takes too long to start (even on > the UI it shows it in the created state for far too long). Any idea what > might

Re: PartitionNotFoundException after deployment

2018-05-04 Thread Gyula Fóra
Looks pretty clear that one operator takes too long to start (even on the UI it shows it in the created state for far too long). Any idea what might cause this delay? It actually often crashes on an Akka ask timeout while scheduling the node. Gyula Piotr Nowojski wrote (on 4 May 2018,

Re: PartitionNotFoundException after deployment

2018-05-04 Thread Piotr Nowojski
Ufuk: I don’t know why. +1 for your other suggestions. Piotrek > On 4 May 2018, at 14:52, Ufuk Celebi wrote: > > Hey Gyula! > > I'm including Piotr and Nico (cc'd) who have worked on the network > stack in the last releases. > > Registering the network structures including the intermediate r

Re: PartitionNotFoundException after deployment

2018-05-04 Thread Ufuk Celebi
Hey Gyula! I'm including Piotr and Nico (cc'd) who have worked on the network stack in the last releases. Registering the network structures including the intermediate results actually happens **before** any state is restored. I'm not sure why this reproducibly happens when you restore state. @Ni

PartitionNotFoundException after deployment

2018-05-04 Thread Gyula Fóra
Hi Ufuk, Do you have any quick idea what could cause this problem in Flink 1.4.2? Seems like one operator takes too long to deploy and downstream tasks error out on partition not found. This only seems to happen when the job is restored from state and in fact that operator has some keyed and oper