Using default Dataflow workers, this is the set of options I passed: --dumpHeapOnOOM --saveHeapDumpsToGcsPath=$MYBUCKET/heapdump --diskSizeGb=100
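For context, a full invocation with those flags might look like the following. This is a sketch: the main class, project, and region are placeholders, and $MYBUCKET is assumed to hold a gs:// bucket path; only the three debug/disk flags come from my actual run.

```shell
# Hypothetical launch command; com.example.MyPipeline, my-project,
# and us-central1 are placeholders. $MYBUCKET should be a gs:// path.
mvn compile exec:java \
  -Dexec.mainClass=com.example.MyPipeline \
  -Dexec.args="--runner=DataflowRunner \
    --project=my-project \
    --region=us-central1 \
    --dumpHeapOnOOM \
    --saveHeapDumpsToGcsPath=$MYBUCKET/heapdump \
    --diskSizeGb=100"
```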
On Mon, Nov 18, 2019 at 11:57 AM Jeff Klukas <[email protected]> wrote:

> It sounds like you're generally doing the right thing. I've successfully
> used --saveHeapDumpsToGcsPath in a Java pipeline running on Dataflow and
> inspected the results in Eclipse MAT.
>
> I think that --saveHeapDumpsToGcsPath will automatically turn on
> --dumpHeapOnOOM, but it's worth setting that explicitly too.
>
> Are your boot disks large enough to store the heap dumps? The docs for
> getSaveHeapDumpsToGcsPath [0] mention "CAUTION: This option implies
> dumpHeapOnOOM, and has similar caveats. Specifically, heap dumps can be of
> comparable size to the default boot disk. Consider increasing the boot disk
> size before setting this flag to true."
>
> When I've done this in the past, I definitely had to increase the boot
> disk size (though I forget now what the relevant Dataflow option was).
>
> [0]
> https://beam.apache.org/releases/javadoc/2.16.0/org/apache/beam/runners/dataflow/options/DataflowPipelineDebugOptions.html
>
> On Mon, Nov 18, 2019 at 11:35 AM Reynaldo Baquerizo <[email protected]> wrote:
>
>> Hi all,
>>
>> We are running into OOM issues with one of our pipelines. They are not
>> reproducible with DirectRunner, only with Dataflow.
>> I tried --saveHeapDumpsToGcsPath, but it does not save any heap dump
>> (MyOptions extends DataflowPipelineDebugOptions).
>> I looked at the Java process inside the Docker container and it has
>> remote JMX enabled through port 5555, but outside traffic is firewalled.
>>
>> Beam SDK: 2.15.0
>>
>> Any ideas?
>>
>> Cheers,
>> --
>> Reynaldo
