You might want to reach out to Cloud Support for help with debugging this, or for guidance on how to debug it yourself.
On Mon, Nov 18, 2019 at 10:56 AM Jeff Klukas <[email protected]> wrote:

> On Mon, Nov 18, 2019 at 1:32 PM Reynaldo Baquerizo <[email protected]> wrote:
>
>> Does it tell anything that the GCP console does not show the options
>> --dumpHeapOnOOM --saveHeapDumpsToGcsPath of a running job under
>> PipelineOptions (it does for diskSizeGb)?
>
> That's normal; I also never saw those heap dump options display in the
> Dataflow UI. I think Dataflow doesn't show any options that originate from
> "Debug" options interfaces.
>
>> On Mon, Nov 18, 2019 at 11:59 AM Jeff Klukas <[email protected]> wrote:
>>
>>> Using default Dataflow workers, this is the set of options I passed:
>>>
>>> --dumpHeapOnOOM --saveHeapDumpsToGcsPath=$MYBUCKET/heapdump
>>> --diskSizeGb=100
>>>
>>> On Mon, Nov 18, 2019 at 11:57 AM Jeff Klukas <[email protected]> wrote:
>>>
>>>> It sounds like you're generally doing the right thing. I've
>>>> successfully used --saveHeapDumpsToGcsPath in a Java pipeline running on
>>>> Dataflow and inspected the results in Eclipse MAT.
>>>>
>>>> I think that --saveHeapDumpsToGcsPath will automatically turn on
>>>> --dumpHeapOnOOM, but it's worth setting that explicitly too.
>>>>
>>>> Are your boot disks large enough to store the heap dumps? The docs for
>>>> getSaveHeapDumpsToGcsPath [0] mention: "CAUTION: This option implies
>>>> dumpHeapOnOOM, and has similar caveats. Specifically, heap dumps can be of
>>>> comparable size to the default boot disk. Consider increasing the boot disk
>>>> size before setting this flag to true."
>>>>
>>>> When I've done this in the past, I definitely had to increase the boot
>>>> disk size (though I forget now what the relevant Dataflow option was).
>>>>
>>>> [0]
>>>> https://beam.apache.org/releases/javadoc/2.16.0/org/apache/beam/runners/dataflow/options/DataflowPipelineDebugOptions.html
>>>>
>>>> On Mon, Nov 18, 2019 at 11:35 AM Reynaldo Baquerizo <[email protected]> wrote:
>>>>
>>>>> Hi all,
>>>>>
>>>>> We are running into OOM issues with one of our pipelines. They are not
>>>>> reproducible with DirectRunner, only with Dataflow.
>>>>> I tried --saveHeapDumpsToGcsPath, but it does not save any heap dump
>>>>> (MyOptions extends DataflowPipelineDebugOptions).
>>>>> I looked at the java process inside the docker container and it has
>>>>> remote JMX enabled through port 5555, but outside traffic is firewalled.
>>>>>
>>>>> Beam SDK: 2.15.0
>>>>>
>>>>> Any ideas?
>>>>>
>>>>> Cheers,
>>>>> --
>>>>> Reynaldo
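Putting the advice from this thread together, the flags can be combined into a single launch invocation. This is only a sketch, not a definitive command: the main class, project, and bucket names below are placeholders, and the exact invocation depends on your build setup (Maven `exec:java` is shown here, following the Beam Java quickstart convention).

```shell
# Sketch of a Dataflow launch with heap-dump debugging enabled.
# com.example.MyPipeline, $PROJECT, and $MYBUCKET are placeholders.
mvn compile exec:java \
  -Dexec.mainClass=com.example.MyPipeline \
  -Dexec.args="--runner=DataflowRunner \
    --project=$PROJECT \
    --diskSizeGb=100 \
    --dumpHeapOnOOM \
    --saveHeapDumpsToGcsPath=gs://$MYBUCKET/heapdump"
```

As noted above, --saveHeapDumpsToGcsPath implies --dumpHeapOnOOM, but the thread suggests setting both explicitly, and --diskSizeGb=100 leaves room on the boot disk for dumps that can be comparable in size to the disk itself.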
