Re: Taskmanagers are quarantined

2017-12-07 Thread T Obi
…e on each machine. I will also tune JVM memory parameters to reduce the frequency of "Full GC (Metadata GC Threshold)". Best, Tetsuya. On 2017-11-28 16:30 GMT+09:00, T Obi wrote: Hello Chesnay, …
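Here is a minimal sketch of the kind of JVM tuning mentioned above. It assumes the taskmanager JVM options are passed through `env.java.opts` in `flink-conf.yaml`. On HotSpot (Java 8), `-XX:MetaspaceSize` sets the metaspace size at which the first metadata-triggered collection happens, so raising it is one common way to make "Full GC (Metadata GC Threshold)" rarer. The size below is a placeholder assumption, not a value taken from this thread.

```yaml
# flink-conf.yaml (sketch; the size is an illustrative assumption)
# Raise the metaspace high-water mark that triggers "Metadata GC Threshold"
# collections, and print GC details so the frequency of these collections
# can be compared before and after the change.
env.java.opts: "-XX:MetaspaceSize=256m -XX:+PrintGCDetails -XX:+PrintGCDateStamps"
```

Leaving `-XX:MaxMetaspaceSize` unset avoids trading frequent metadata GCs for `OutOfMemoryError: Metaspace` if the cap were set too low.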

Re: Taskmanagers are quarantined

2017-11-29 Thread Stephan Ewen
…VM memory parameters to reduce the frequency of "Full GC (Metadata GC Threshold)". Best, Tetsuya. On 2017-11-28 16:30 GMT+09:00, T Obi wrote: Hello Chesnay, Thank you for answering my rough question. Not all of taskmanage…

Re: Taskmanagers are quarantined

2017-11-29 Thread Till Rohrmann
…agers run with divided memory size on each machine. I will also tune JVM memory parameters to reduce the frequency of "Full GC (Metadata GC Threshold)". Best, Tetsuya. On 2017-11-28 16:30 GMT+09:00, T Obi wrote: Hello Chesnay, …

Re: Taskmanagers are quarantined

2017-11-29 Thread T Obi
…er to my rough question. Not all of the taskmanagers are quarantined at the same time, but each taskmanager has been quarantined at least once. We are using CDH 5.8, which is based on Hadoop 2.6. We hadn't paid attention to the datanodes; we will check them. However, we are also using…

Re: Taskmanagers are quarantined

2017-11-27 Thread T Obi
Hello Chesnay, Thank you for answering my rough question. Not all of the taskmanagers are quarantined at the same time, but each taskmanager has been quarantined at least once. We are using CDH 5.8, which is based on Hadoop 2.6. We hadn't paid attention to the datanodes; we will check them. However, we are also…

Re: Taskmanagers are quarantined

2017-11-27 Thread Chesnay Schepler
Are only some taskmanagers quarantined, or all of them? Do the quarantined taskmanagers have anything in common? (Are the failing ones always on certain machines? Do the stack traces reference the same HDFS datanodes?) Which Hadoop version are you using? From the stack trace it appears that mul…

Taskmanagers are quarantined

2017-11-26 Thread T Obi
Hello all, We run jobs on a standalone cluster with Flink 1.3.2, and we're facing a problem. Suddenly, the connection between a taskmanager and the jobmanager times out and the taskmanager is "quarantined" by the jobmanager. Once a taskmanager is quarantined, the jobs are of course restarted, but the timeo…
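Long GC pauses might push the jobmanager's failure detection past its limit. If so, one setting to check in Flink 1.3 is the configuration of Akka's DeathWatch, which decides when a remote taskmanager is considered dead (and its actor system quarantined). The following is a hedged sketch. It assumes the default acceptable pause (60 s) is shorter than the pauses actually observed, and the values are illustrative only.

```yaml
# flink-conf.yaml (sketch; values are illustrative assumptions)
# Interval at which Akka's DeathWatch sends heartbeats to remote actors.
akka.watch.heartbeat.interval: 10 s
# Acceptable heartbeat pause before a remote TaskManager is considered dead.
akka.watch.heartbeat.pause: 120 s
```

Raising the pause tolerates longer stalls but also delays detection of taskmanagers that have genuinely failed, so it complements, rather than replaces, reducing the GC pauses themselves.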