Hi folks,
I'm testing Beam Python pipelines on Flink, but I consistently run into
Flink JVM metaspace OoM errors.
The symptom is that the Flink task manager's JVM metaspace usage increases
monotonically with each run of the Beam Python pipeline and never goes
down, eventually causing an OoM error.
From the stack trace of the OoM error (see below), it looks like a memory
leak from the class loader, but I'm not 100% sure.
I'll post my setup details below. Any help would be appreciated!
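In case it helps anyone reproduce the measurement, here is a minimal sketch of how the metaspace growth could be tracked between runs. It assumes the Flink REST API is reachable at the same address I pass to --flink_master, and that the task manager metric is named Status.JVM.Memory.Metaspace.Used; the helper names are made up for illustration.

```python
import json
import urllib.request
from typing import Dict, List

# Assumption: the REST endpoint is the address used for --flink_master.
FLINK_REST = "http://192.168.49.2:30081"
# Assumption: metric name under which Flink reports used metaspace (bytes).
METRIC = "Status.JVM.Memory.Metaspace.Used"


def fetch_json(url: str):
    """GET a URL and decode the JSON response body."""
    with urllib.request.urlopen(url, timeout=10) as resp:
        return json.loads(resp.read().decode("utf-8"))


def parse_metric(payload: List[Dict[str, str]], name: str = METRIC) -> int:
    """Pick one metric out of a list of {"id": ..., "value": ...} entries."""
    for entry in payload:
        if entry.get("id") == name:
            return int(float(entry["value"]))
    raise KeyError(name)


def metaspace_used_by_taskmanager(base_url: str = FLINK_REST) -> Dict[str, int]:
    """Return used metaspace in bytes for every registered task manager."""
    result = {}
    for tm in fetch_json(f"{base_url}/taskmanagers")["taskmanagers"]:
        url = f"{base_url}/taskmanagers/{tm['id']}/metrics?get={METRIC}"
        result[tm["id"]] = parse_metric(fetch_json(url))
    return result


def never_decreases(samples: List[int]) -> bool:
    """True if each sample is at least as large as the one before it."""
    return all(b >= a for a, b in zip(samples, samples[1:]))


# Usage (needs a running cluster): call metaspace_used_by_taskmanager()
# after each pipeline run and feed the collected values to never_decreases().
```

Taking one sample after each pipeline run and passing the list to never_decreases() would confirm the "only goes up" pattern described above.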
I'm running Flink 1.14.5 on my local minikube setup. Here's my Flink config:
flink-conf.yaml: |+
  jobmanager.rpc.address: flink-jobmanager
  taskmanager.numberOfTaskSlots: 2
  blob.server.port: 6124
  jobmanager.rpc.port: 6123
  taskmanager.rpc.port: 6122
  jobmanager.heap.size: 2048m
  taskmanager.heap.size: 4096m
I'm on Python Beam SDK 2.40.0. The Python Beam pipeline I'm running is the
wordcount example provided with the SDK. Here's my command to run it:
python3 -m apache_beam.examples.wordcount \
  --input '/etc/*.conf' \
  --output '/tmp/output/out' \
  --runner FlinkRunner \
  --flink_master 192.168.49.2:30081 \
  --flink_submit_uber_jar \
  --flink_version 1.14 \
  --environment_type EXTERNAL \
  --environment_config localhost:50000
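Since the problem only shows up after several submissions, here is a rough sketch of how the command above could be resubmitted in a loop to reproduce it. The flags are exactly the ones from my command; the function names and the run count are just illustrative.

```python
import shlex
import subprocess
from typing import List


def wordcount_argv(
    flink_master: str = "192.168.49.2:30081",
    environment_config: str = "localhost:50000",
) -> List[str]:
    """Build the wordcount submission command as an argument list."""
    return [
        "python3", "-m", "apache_beam.examples.wordcount",
        # No shell is involved, so the glob is passed through literally.
        "--input", "/etc/*.conf",
        "--output", "/tmp/output/out",
        "--runner", "FlinkRunner",
        "--flink_master", flink_master,
        "--flink_submit_uber_jar",
        "--flink_version", "1.14",
        "--environment_type", "EXTERNAL",
        "--environment_config", environment_config,
    ]


def resubmit(runs: int) -> None:
    """Submit the pipeline `runs` times in a row, stopping on the first failure."""
    argv = wordcount_argv()
    for i in range(runs):
        print(f"run {i + 1}/{runs}: {shlex.join(argv)}")
        subprocess.run(argv, check=True)


# Usage (needs Beam installed and the cluster running): resubmit(10)
```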
The stack trace of the OoM error:
java.lang.OutOfMemoryError: Metaspace. The metaspace out-of-memory error
has occurred. This can mean two things: either the job requires a larger
size of JVM metaspace to load classes or there is a class loading leak. In
the first case 'taskmanager.memory.jvm-metaspace.size' configuration option
should be increased. If the error persists (usually in cluster after
several job (re-)submissions) then there is probably a class loading leak
in user code or some of its dependencies which has to be investigated and
fixed. The task executor has to be shutdown...
at java.lang.ClassLoader.defineClass1(Native Method) ~[?:?]
at java.lang.ClassLoader.defineClass(Unknown Source) ~[?:?]
at java.security.SecureClassLoader.defineClass(Unknown Source) ~[?:?]
at java.net.URLClassLoader.defineClass(Unknown Source) ~[?:?]
at java.net.URLClassLoader$1.run(Unknown Source) ~[?:?]
at java.net.URLClassLoader$1.run(Unknown Source) ~[?:?]
at java.security.AccessController.doPrivileged(Native Method) ~[?:?]
at java.net.URLClassLoader.findClass(Unknown Source) ~[?:?]
at org.apache.flink.util.ChildFirstClassLoader.loadClassWithoutExceptionHandling(ChildFirstClassLoader.java:71) ~[flink-dist_2.12-1.14.5.jar:1.14.5]
at org.apache.flink.util.FlinkUserCodeClassLoader.loadClass(FlinkUserCodeClassLoader.java:48) ~[flink-dist_2.12-1.14.5.jar:1.14.5]
at java.lang.ClassLoader.loadClass(Unknown Source) ~[?:?]
at org.apache.beam.vendor.grpc.v1p43p2.com.google.protobuf.DescriptorProtos$EnumDescriptorProto.<init>(DescriptorProtos.java:15448) ~[blob_p-a27d743072f4873abe7ed62919a585d4ada8d410-d24838cc004fe03290f9982e48ff25b4:?]
at org.apache.beam.vendor.grpc.v1p43p2.com.google.protobuf.DescriptorProtos$EnumDescriptorProto.<init>(DescriptorProtos.java:15368) ~[blob_p-a27d743072f4873abe7ed62919a585d4ada8d410-d24838cc004fe03290f9982e48ff25b4:?]
at org.apache.beam.vendor.grpc.v1p43p2.com.google.protobuf.DescriptorProtos$EnumDescriptorProto$1.parsePartialFrom(DescriptorProtos.java:17941) ~[blob_p-a27d743072f4873abe7ed62919a585d4ada8d410-d24838cc004fe03290f9982e48ff25b4:?]
at org.apache.beam.vendor.grpc.v1p43p2.com.google.protobuf.DescriptorProtos$EnumDescriptorProto$1.parsePartialFrom(DescriptorProtos.java:17935) ~[blob_p-a27d743072f4873abe7ed62919a585d4ada8d410-d24838cc004fe03290f9982e48ff25b4:?]
at org.apache.beam.vendor.grpc.v1p43p2.com.google.protobuf.CodedInputStream$ArrayDecoder.readMessage(CodedInputStream.java:889) ~[blob_p-a27d743072f4873abe7ed62919a585d4ada8d410-d24838cc004fe03290f9982e48ff25b4:?]
at org.apache.beam.vendor.grpc.v1p43p2.com.google.protobuf.DescriptorProtos$DescriptorProto.<init>(DescriptorProtos.java:5246) ~[blob_p-a27d743072f4873abe7ed62919a585d4ada8d410-d24838cc004fe03290f9982e48ff25b4:?]
at org.apache.beam.vendor.grpc.v1p43p2.com.google.protobuf.DescriptorProtos$DescriptorProto.<init>(DescriptorProtos.java:5164) ~[blob_p-a27d743072f4873abe7ed62919a585d4ada8d410-d24838cc004fe03290f9982e48ff25b4:?]
at org.apache.beam.vendor.grpc.v1p43p2.com.google.protobuf.DescriptorProtos$DescriptorProto$1.parsePartialFrom(DescriptorProtos.java:10295) ~[blob_p-a27d743072f4873abe7ed62919a585d4ada8d410-d24838cc004fe03290f9982e48ff25b4:?]
at org.apache.beam.vendor.grpc.v1p43p2.com.google.protobuf.DescriptorProtos$DescriptorProto$1.parsePartialFrom(DescriptorProtos.java:10289) ~[blob_p-a27d743072f4873abe7ed62919a585d4ada8d410-d24838cc004fe03290f9982e48ff25b4:?]
at org.apache.beam.vendor.grpc.v1p43p2.com.google.protobuf.CodedInputStream$ArrayDecoder.readMessage(CodedInputStream.java:889) ~[blob_p-a27d743072f4873abe7ed62919a585d4ada8d410-d24838cc004fe03290f9982e48ff25b4:?]
at org.apache.beam.vendor.grpc.v1p43p2.com.google.protobuf.DescriptorProtos$FileDescriptorProto.<init>(DescriptorProtos.java:1281) ~[blob_p-a27d743072f4873abe7ed62919a585d4ada8d410-d24838cc004fe03290f9982e48ff25b4:?]
at org.apache.beam.vendor.grpc.v1p43p2.com.google.protobuf.DescriptorProtos$FileDescriptorProto.<init>(DescriptorProtos.java:1201) ~[blob_p-a27d743072f4873abe7ed62919a585d4ada8d410-d24838cc004fe03290f9982e48ff25b4:?]
at org.apache.beam.vendor.grpc.v1p43p2.com.google.protobuf.DescriptorProtos$FileDescriptorProto$1.parsePartialFrom(DescriptorProtos.java:4888) ~[blob_p-a27d743072f4873abe7ed62919a585d4ada8d410-d24838cc004fe03290f9982e48ff25b4:?]
at org.apache.beam.vendor.grpc.v1p43p2.com.google.protobuf.DescriptorProtos$FileDescriptorProto$1.parsePartialFrom(DescriptorProtos.java:4882) ~[blob_p-a27d743072f4873abe7ed62919a585d4ada8d410-d24838cc004fe03290f9982e48ff25b4:?]
at org.apache.beam.vendor.grpc.v1p43p2.com.google.protobuf.AbstractParser.parsePartialFrom(AbstractParser.java:158) ~[blob_p-a27d743072f4873abe7ed62919a585d4ada8d410-d24838cc004fe03290f9982e48ff25b4:?]
at org.apache.beam.vendor.grpc.v1p43p2.com.google.protobuf.AbstractParser.parseFrom(AbstractParser.java:191) ~[blob_p-a27d743072f4873abe7ed62919a585d4ada8d410-d24838cc004fe03290f9982e48ff25b4:?]
at org.apache.beam.vendor.grpc.v1p43p2.com.google.protobuf.AbstractParser.parseFrom(AbstractParser.java:203) ~[blob_p-a27d743072f4873abe7ed62919a585d4ada8d410-d24838cc004fe03290f9982e48ff25b4:?]
at org.apache.beam.vendor.grpc.v1p43p2.com.google.protobuf.AbstractParser.parseFrom(AbstractParser.java:208) ~[blob_p-a27d743072f4873abe7ed62919a585d4ada8d410-d24838cc004fe03290f9982e48ff25b4:?]
at org.apache.beam.vendor.grpc.v1p43p2.com.google.protobuf.AbstractParser.parseFrom(AbstractParser.java:48) ~[blob_p-a27d743072f4873abe7ed62919a585d4ada8d410-d24838cc004fe03290f9982e48ff25b4:?]
at org.apache.beam.vendor.grpc.v1p43p2.com.google.protobuf.DescriptorProtos$FileDescriptorProto.parseFrom(DescriptorProtos.java:2297) ~[blob_p-a27d743072f4873abe7ed62919a585d4ada8d410-d24838cc004fe03290f9982e48ff25b4:?]
at org.apache.beam.vendor.grpc.v1p43p2.com.google.protobuf.Descriptors$FileDescriptor.internalBuildGeneratedFileFrom(Descriptors.java:417) ~[blob_p-a27d743072f4873abe7ed62919a585d4ada8d410-d24838cc004fe03290f9982e48ff25b4:?]