Using default Dataflow workers, this is the set of options I passed:

--dumpHeapOnOOM --saveHeapDumpsToGcsPath=$MYBUCKET/heapdump --diskSizeGb=100
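For context, here is a sketch of how those flags fit into a full launch command. The main class, bucket name, and region below are placeholders, not values from this thread:

```shell
# Hypothetical launch command; substitute your own main class, bucket, and region.
mvn compile exec:java \
  -Dexec.mainClass=com.example.MyPipeline \
  -Dexec.args="--runner=DataflowRunner \
    --region=us-central1 \
    --dumpHeapOnOOM=true \
    --saveHeapDumpsToGcsPath=gs://my-bucket/heapdumps \
    --diskSizeGb=100"
```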


On Mon, Nov 18, 2019 at 11:57 AM Jeff Klukas <[email protected]> wrote:

> It sounds like you're generally doing the right thing. I've successfully
> used --saveHeapDumpsToGcsPath in a Java pipeline running on Dataflow and
> inspected the results in Eclipse MAT.
>
> I think that --saveHeapDumpsToGcsPath will automatically turn on
> --dumpHeapOnOOM but worth setting that explicitly too.
>
> Are your boot disks large enough to store the heap dumps? The docs for
> getSaveHeapDumpsToGcsPath [0] mention "CAUTION: This option implies
> dumpHeapOnOOM, and has similar caveats. Specifically, heap dumps can be of
> comparable size to the default boot disk. Consider increasing the boot disk
> size before setting this flag to true."
>
> When I've done this in the past, I definitely had to increase boot disk
> size (though I forget now what the relevant Dataflow option was).
>
> [0]
> https://beam.apache.org/releases/javadoc/2.16.0/org/apache/beam/runners/dataflow/options/DataflowPipelineDebugOptions.html
>
> On Mon, Nov 18, 2019 at 11:35 AM Reynaldo Baquerizo <
> [email protected]> wrote:
>
>> Hi all,
>>
>> We are running into OOM issues with one of our pipelines. They are not
>> reproducible with DirectRunner, only with Dataflow.
>> I tried --saveHeapDumpsToGcsPath, but it does not save any heap dump
>> (MyOptions extends DataflowPipelineDebugOptions)
>> I looked at the java process inside the docker container and it has
>> remote jmx enabled through port 5555, but outside traffic is firewalled.
>>
>> Beam SDK: 2.15.0
>>
>> Any ideas?
>>
>> Cheers,
>> --
>> Reynaldo
>>
>