What Jeff mentioned is the easiest way to get heap dumps on OOM.

If you want to connect over JMX, try opening an SSH tunnel to the worker VM
and forwarding the port.
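A minimal sketch of such a tunnel, assuming the worker is a Compute Engine VM you can SSH to; the instance name and zone below are placeholders (look up the real ones in the Compute Engine console), and 5555 is the JMX port Reynaldo observed on the worker:

```shell
# Forward local port 5555 to the JMX port on the Dataflow worker VM.
# Instance name and zone are placeholders -- substitute your own.
gcloud compute ssh dataflow-worker-1 \
  --zone=us-central1-a \
  -- -L 5555:localhost:5555

# In another terminal, attach a JMX client to the local end of the tunnel:
jconsole localhost:5555
```

One caveat: JMX sometimes negotiates a separate RMI port after the initial handshake, so if the connection hangs, the worker JVM may also need -Dcom.sun.management.jmxremote.rmi.port set to the same port so a single forwarded port suffices.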

On Mon, Nov 18, 2019 at 8:59 AM Jeff Klukas <[email protected]> wrote:

> Using default Dataflow workers, this is the set of options I passed:
>
> --dumpHeapOnOOM --saveHeapDumpsToGcsPath=$MYBUCKET/heapdump
> --diskSizeGb=100
>
>
> On Mon, Nov 18, 2019 at 11:57 AM Jeff Klukas <[email protected]> wrote:
>
>> It sounds like you're generally doing the right thing. I've successfully
>> used --saveHeapDumpsToGcsPath in a Java pipeline running on Dataflow and
>> inspected the results in Eclipse MAT.
>>
>> I think that --saveHeapDumpsToGcsPath will automatically turn on
>> --dumpHeapOnOOM, but it's worth setting that explicitly too.
>>
>> Are your boot disks large enough to store the heap dumps? The docs for
>> getSaveHeapDumpsToGcsPath [0] mention "CAUTION: This option implies
>> dumpHeapOnOOM, and has similar caveats. Specifically, heap dumps can be of
>> comparable size to the default boot disk. Consider increasing the boot disk
>> size before setting this flag to true."
>>
>> When I've done this in the past, I definitely had to increase boot disk
>> size (though I forget now what the relevant Dataflow option was).
>>
>> [0]
>> https://beam.apache.org/releases/javadoc/2.16.0/org/apache/beam/runners/dataflow/options/DataflowPipelineDebugOptions.html
>>
>> On Mon, Nov 18, 2019 at 11:35 AM Reynaldo Baquerizo <
>> [email protected]> wrote:
>>
>>> Hi all,
>>>
>>> We are running into OOM issues with one of our pipelines. They are not
>>> reproducible with DirectRunner, only with Dataflow.
>>> I tried --saveHeapDumpsToGcsPath, but it does not save any heap dump
>>> (MyOptions extends DataflowPipelineDebugOptions)
>>> I looked at the java process inside the docker container and it has
>>> remote jmx enabled through port 5555, but outside traffic is firewalled.
>>>
>>> Beam SDK: 2.15.0
>>>
>>> Any ideas?
>>>
>>> Cheers,
>>> --
>>> Reynaldo
>>>
>>