Hi,

Have you succeeded in saving a heap dump? I ran into this a while ago as well and was able neither to save a heap dump nor to increase the boot disk size. If you have any update on this, could you please share?

Thanks in advance,
Frantisek
On Wed, Nov 20, 2019 at 1:46 AM Luke Cwik <[email protected]> wrote:

> You might want to reach out to cloud support for help with debugging this
> and/or help with how to debug this.
>
> On Mon, Nov 18, 2019 at 10:56 AM Jeff Klukas <[email protected]> wrote:
>
>> On Mon, Nov 18, 2019 at 1:32 PM Reynaldo Baquerizo <[email protected]> wrote:
>>
>>> Does it tell anything that the GCP console does not show the options
>>> --dumpHeapOnOOM --saveHeapDumpsToGcsPath of a running job under
>>> PipelineOptions (it does for diskSizeGb)?
>>
>> That's normal; I also never saw those heap dump options displayed in the
>> Dataflow UI. I think Dataflow doesn't show any options that originate from
>> "Debug" options interfaces.
>>
>>> On Mon, Nov 18, 2019 at 11:59 AM Jeff Klukas <[email protected]> wrote:
>>>
>>>> Using default Dataflow workers, this is the set of options I passed:
>>>>
>>>> --dumpHeapOnOOM --saveHeapDumpsToGcsPath=$MYBUCKET/heapdump
>>>> --diskSizeGb=100
>>>>
>>>> On Mon, Nov 18, 2019 at 11:57 AM Jeff Klukas <[email protected]> wrote:
>>>>
>>>>> It sounds like you're generally doing the right thing. I've
>>>>> successfully used --saveHeapDumpsToGcsPath in a Java pipeline running on
>>>>> Dataflow and inspected the results in Eclipse MAT.
>>>>>
>>>>> I think that --saveHeapDumpsToGcsPath will automatically turn on
>>>>> --dumpHeapOnOOM, but it's worth setting that explicitly too.
>>>>>
>>>>> Are your boot disks large enough to store the heap dumps? The docs for
>>>>> getSaveHeapDumpsToGcsPath [0] mention: "CAUTION: This option implies
>>>>> dumpHeapOnOOM, and has similar caveats. Specifically, heap dumps can be of
>>>>> comparable size to the default boot disk. Consider increasing the boot
>>>>> disk size before setting this flag to true."
>>>>>
>>>>> When I've done this in the past, I definitely had to increase the boot
>>>>> disk size (though I forget now what the relevant Dataflow option was).
>>>>>
>>>>> [0] https://beam.apache.org/releases/javadoc/2.16.0/org/apache/beam/runners/dataflow/options/DataflowPipelineDebugOptions.html
>>>>>
>>>>> On Mon, Nov 18, 2019 at 11:35 AM Reynaldo Baquerizo <[email protected]> wrote:
>>>>>
>>>>>> Hi all,
>>>>>>
>>>>>> We are running into OOM issues with one of our pipelines. They are
>>>>>> not reproducible with DirectRunner, only with Dataflow.
>>>>>> I tried --saveHeapDumpsToGcsPath, but it does not save any heap dump
>>>>>> (MyOptions extends DataflowPipelineDebugOptions).
>>>>>> I looked at the java process inside the Docker container and it has
>>>>>> remote JMX enabled through port 5555, but outside traffic is firewalled.
>>>>>>
>>>>>> Beam SDK: 2.15.0
>>>>>>
>>>>>> Any ideas?
>>>>>>
>>>>>> Cheers,
>>>>>> --
>>>>>> Reynaldo
