I am currently suffering through similar issues. Had a job running happily, but when it the cluster tried to restarted it would not find the JSON serializer in it. The job kept trying to restart in a loop.
Just today I was running a job I built locally. The job ran fine. I added two commits and rebuilt the jar. Now the job dies when it tries to start claiming it can't find the time assigner class. I've unzipped the job jar, both locally and in the TM blob directory and have confirmed the class is in it. This is the backtrace: java.lang.ClassNotFoundException: com.foo.flink.common.util.TimeAssigner at java.net.URLClassLoader.findClass(Unknown Source) at java.lang.ClassLoader.loadClass(Unknown Source) at sun.misc.Launcher$AppClassLoader.loadClass(Unknown Source) at java.lang.ClassLoader.loadClass(Unknown Source) at java.lang.Class.forName0(Native Method) at java.lang.Class.forName(Unknown Source) at org.apache.flink.util.InstantiationUtil$ClassLoaderObjectInputStream.resolveClass(InstantiationUtil.java:73) at java.io.ObjectInputStream.readNonProxyDesc(Unknown Source) at java.io.ObjectInputStream.readClassDesc(Unknown Source) at java.io.ObjectInputStream.readOrdinaryObject(Unknown Source) at java.io.ObjectInputStream.readObject0(Unknown Source) at java.io.ObjectInputStream.readObject(Unknown Source) at org.apache.flink.util.InstantiationUtil.deserializeObject(InstantiationUtil.java:393) at org.apache.flink.util.InstantiationUtil.deserializeObject(InstantiationUtil.java:380) at org.apache.flink.util.InstantiationUtil.deserializeObject(InstantiationUtil.java:368) at org.apache.flink.util.SerializedValue.deserializeValue(SerializedValue.java:58) at org.apache.flink.streaming.connectors.kafka.internals.AbstractFetcher.createPartitionStateHolders(AbstractFetcher.java:542) at org.apache.flink.streaming.connectors.kafka.internals.AbstractFetcher.<init>(AbstractFetcher.java:167) at org.apache.flink.streaming.connectors.kafka.internal.Kafka09Fetcher.<init>(Kafka09Fetcher.java:89) at org.apache.flink.streaming.connectors.kafka.internal.Kafka010Fetcher.<init>(Kafka010Fetcher.java:62) at org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer010.createFetcher(FlinkKafkaConsumer010.java:203) at org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumerBase.run(FlinkKafkaConsumerBase.java:564) at org.apache.flink.streaming.api.operators.StreamSource.run(StreamSource.java:86) at org.apache.flink.streaming.api.operators.StreamSource.run(StreamSource.java:55) at org.apache.flink.streaming.runtime.tasks.SourceStreamTask.run(SourceStreamTask.java:94) at org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:264) at org.apache.flink.runtime.taskmanager.Task.run(Task.java:718) at java.lang.Thread.run(Unknown Source) On Tue, Jan 23, 2018 at 7:51 AM, Stephan Ewen <se...@apache.org> wrote: > Hi! > > We changed a few things between 1.3 and 1.4 concerning Avro. One of the > main things is that Avro is no longer part of the core Flink class library, > but needs to be packaged into your application jar file. > > The class loading / caching issues of 1.3 with respect to Avro should > disappear in Flink 1.4, because Avro classes and caches are scoped to the > job classloaders, so the caches do not go across different jobs, or even > different operators. > > > *Please check: Make sure you have Avro as a dependency in your jar file > (in scope "compile").* > > Hope that solves the issue. > > Stephan > > > On Mon, Jan 22, 2018 at 2:31 PM, Edward <egb...@hotmail.com> wrote: > >> Yes, we've seen this issue as well, though it usually takes many more >> resubmits before the error pops up. Interestingly, of the 7 jobs we run >> (all >> of which use different Avro schemas), we only see this issue on 1 of them. >> Once the NoClassDefFoundError crops up though, it is necessary to recreate >> the task managers. >> >> There's a whole page on the Flink documentation on debugging classloading, >> and Avro is mentioned several times on that page: >> https://ci.apache.org/projects/flink/flink-docs-release-1.3/ >> monitoring/debugging_classloading.html >> >> It seems that (in 1.3 at least) each submitted job has its own >> classloader, >> and its own instance of the Avro class definitions. However, the Avro >> class >> cache will keep references to the Avro classes from classloaders for the >> previous cancelled jobs. That said, we haven't been able to find a >> solution >> to this error yet. Flink 1.4 would be worth a try because the of the >> changes >> to the default classloading behaviour (child-first is the new default, not >> parent-first). >> >> >> >> >> >> -- >> Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4. >> nabble.com/ >> > >