Sorry, I meant to write Kryo, but I'm on my mobile...
On 4 Jul 2016 12:34 p.m., "Flavio Pompermaier" <pomperma...@okkam.it> wrote:
> Because I don't see any good reason for that... Maybe all the keyo
> serialization errors that I get from time to time could also be a symptom
> of some other error in how Flink manages its internal buffers... but this
> is also just another personal guess.
>
> On 4 Jul 2016 12:29 p.m., "Ufuk Celebi" <u...@apache.org> wrote:
>
>> It's not possible to tell. You would have to look into the logs of the
>> job manager to check what happened. The task manager that was not killed
>> could have reconnected to the job manager if it was restarted quickly
>> after the failure. Why do you think that the task manager would
>> influence the job result, though?
>>
>> On Mon, Jul 4, 2016 at 12:23 PM, Flavio Pompermaier
>> <pomperma...@okkam.it> wrote:
>>
>>> No, I haven't.
>>> I fear that an unkilled taskmanager could have been the cause of this
>>> problem. The other day I ran the job and discovered that on some nodes
>>> there were zombie taskmanagers that weren't terminated by stop-cluster.
>>> What do you think? What happens in these situations? Are old
>>> taskmanagers still able to interfere with the new jobmanager?
>>> I didn't see them in the web dashboard, so I thought they weren't
>>> problematic at all and I just killed them.
>>>
>>> On 4 Jul 2016 12:07 p.m., "Ufuk Celebi" <u...@apache.org> wrote:
>>>
>>>> I guess Aljoscha was referring to whether you also have broadcast
>>>> input or something like it?
>>>>
>>>> On Fri, Jul 1, 2016 at 7:05 PM, Flavio Pompermaier <pomperma...@okkam.it>
>>>> wrote:
>>>>
>>>>> What do you mean exactly?
>>>>>
>>>>> On 1 Jul 2016 18:58, "Aljoscha Krettek" <aljos...@apache.org> wrote:
>>>>>
>>>>>> Hi,
>>>>>> do you have any data in the coGroup/groupBy operators that you use,
>>>>>> besides the input data?
>>>>>>
>>>>>> Cheers,
>>>>>> Aljoscha
>>>>>>
>>>>>> On Fri, 1 Jul 2016 at 14:17 Flavio Pompermaier <pomperma...@okkam.it>
>>>>>> wrote:
>>>>>>
>>>>>>> Hi to all,
>>>>>>> I have a Flink job that computes data correctly when launched locally
>>>>>>> from my IDE but not when launched on the cluster.
>>>>>>>
>>>>>>> Is there any suggestion/example for finding the problematic operators
>>>>>>> in this situation?
>>>>>>> I think the root cause is that some operator (e.g.
>>>>>>> coGroup/groupBy, etc.), which I assume has all the data for a key,
>>>>>>> may not actually have it (because the data is partitioned among nodes).
>>>>>>>
>>>>>>> Any help is appreciated,
>>>>>>> Flavio