I’m seeing sporadic issues where it appears that curator (or other) user 
threads are left running after a stream shutdown, and then the user class 
loader goes away and I get spammed with ClassNotFoundExceptions… I’m wondering 
if this might have something to do with perhaps the UserClassLoader being shut 
down before close is invoked on all operators?

Here’s a stack trace I see from an attempt at closing an elastic search sink:

java.lang.ClassNotFoundException: 
com.intellify.flink.shaded.curator.org.apache.curator.utils.CloseableUtils
    at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
    at 
org.apache.flink.runtime.execution.librarycache.FlinkUserCodeClassLoaders$ChildFirstClassLoader.loadClass(FlinkUserCodeClassLoaders.java:128)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
    at 
com.intellify.flink.shaded.curator.org.apache.curator.ConnectionState.close(ConnectionState.java:119)
    at 
com.intellify.flink.shaded.curator.org.apache.curator.CuratorZookeeperClient.close(CuratorZookeeperClient.java:227)
    at 
com.intellify.flink.shaded.curator.org.apache.curator.framework.imps.CuratorFrameworkImpl.close(CuratorFrameworkImpl.java:381)
    at 
com.intellify.config.ManagedCuratorFramework.reallyClose(ManagedCuratorFramework.java:48)
    at 
com.intellify.config.ArchaiusInitializer.close(ArchaiusInitializer.java:75)
    at org.apache.commons.io.IOUtils.closeQuietly(IOUtils.java:303)
    at 
com.intellify.flink.shared.config.SharedIntellifyConfigProvider.close(SharedIntellifyConfigProvider.java:56)
    at 
com.intellify.flink.shared.config.SerializableLiveProperty.close(SerializableLiveProperty.java:68)
    at 
com.intellify.flink.shared.elasticsearch.LiveResolvingEs1ApiCallBridge.cleanup(LiveResolvingEs1ApiCallBridge.java:105)
    at 
org.apache.flink.streaming.connectors.elasticsearch.ElasticsearchSinkBase.close(ElasticsearchSinkBase.java:323)
    at com.intellify.flink.shared.tracer.TracingSink.close(TracingSink.java:50)
    at 
org.apache.flink.api.common.functions.util.FunctionUtils.closeFunction(FunctionUtils.java:43)
    at 
org.apache.flink.streaming.api.operators.AbstractUdfStreamOperator.dispose(AbstractUdfStreamOperator.java:117)
    at 
org.apache.flink.streaming.runtime.tasks.StreamTask.disposeAllOperators(StreamTask.java:446)
    at 
org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:351)
    at org.apache.flink.runtime.taskmanager.Task.run(Task.java:718)
    at java.lang.Thread.run(Thread.java:748)
I’m using a curator connection for archaius, and closing it in the call 
bridge’s cleanup method. I’m ensuring that I’m not reaching up into the parent 
class loader by shading curator and zookeeper. 

I also see the following on repeat in my task manager log:

2018-01-11 14:53:13.313 [heartbeat-filter -> input-trace-filter -> 
filter-inactive-ds -> filter-duplicates 
(ip-10-80-53-99.us-west-2.compute.internal:2181)] WARN  
c.i.flink.shaded.zookeeper.org.apache.zookeeper.ClientCnxn  - Session 
0x3c00002d8a7603de for server 
ip-10-80-53-99.us-west-2.compute.internal/10.80.53.99:2181, unexpected error, 
closing socket connection and attempting reconnect
java.lang.NoClassDefFoundError: 
com/intellify/flink/shaded/zookeeper/org/apache/zookeeper/proto/SetWatches
        at 
com.intellify.flink.shaded.zookeeper.org.apache.zookeeper.ClientCnxn$SendThread.primeConnection(ClientCnxn.java:926)
        at 
com.intellify.flink.shaded.zookeeper.org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:363)
        at 
com.intellify.flink.shaded.zookeeper.org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1141)


Does anyone have any insight into what might be happening here? Does this seem 
like I’m not closing a thread properly, or something else entirely?


--
Jared Stehler
Chief Architect - Intellify Learning
o: 617.701.6330 x703



Reply via email to