You might want to reach out to cloud support for help with debugging this.
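For reference, the launch discussed downthread can be sketched as follows. This is a hypothetical invocation: the project ID, bucket path, and main class are placeholders, and the command is only echoed here rather than executed.

```shell
# Hypothetical Dataflow launch flags (sketch); project, bucket, and main class
# are placeholders. --saveHeapDumpsToGcsPath implies --dumpHeapOnOOM, but
# setting both explicitly is harmless, and --diskSizeGb is raised so heap
# dumps fit on the worker boot disk.
PIPELINE_ARGS="--runner=DataflowRunner \
--project=my-project \
--dumpHeapOnOOM \
--saveHeapDumpsToGcsPath=gs://my-bucket/heapdumps \
--diskSizeGb=100"

# Echoed rather than executed in this sketch; in practice you would run e.g.:
#   mvn compile exec:java -Dexec.mainClass=com.example.MyPipeline \
#     -Dexec.args="$PIPELINE_ARGS"
echo "$PIPELINE_ARGS"
```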

On Mon, Nov 18, 2019 at 10:56 AM Jeff Klukas <[email protected]> wrote:

> On Mon, Nov 18, 2019 at 1:32 PM Reynaldo Baquerizo <
> [email protected]> wrote:
>
>>
>> Does it tell anything that the GCP console does not show the options
>> --dumpHeapOnOOM --saveHeapDumpsToGcsPath of a running job under
>> PipelineOptions (it does for diskSizeGb)?
>>
>
> That's normal; I also never saw those heap dump options display in the
> Dataflow UI. I think Dataflow doesn't show any options that originate from
> "Debug" options interfaces.
>
>
>
>> On Mon, Nov 18, 2019 at 11:59 AM Jeff Klukas <[email protected]> wrote:
>>
>>> Using default Dataflow workers, this is the set of options I passed:
>>>
>>> --dumpHeapOnOOM --saveHeapDumpsToGcsPath=$MYBUCKET/heapdump
>>> --diskSizeGb=100
>>>
>>>
>>> On Mon, Nov 18, 2019 at 11:57 AM Jeff Klukas <[email protected]>
>>> wrote:
>>>
>>>> It sounds like you're generally doing the right thing. I've
>>>> successfully used --saveHeapDumpsToGcsPath in a Java pipeline running on
>>>> Dataflow and inspected the results in Eclipse MAT.
>>>>
>>>> I think that --saveHeapDumpsToGcsPath will automatically turn on
>>>> --dumpHeapOnOOM, but it's worth setting that explicitly too.
>>>>
>>>> Are your boot disks large enough to store the heap dumps? The docs for
>>>> getSaveHeapDumpsToGcsPath [0] mention "CAUTION: This option implies
>>>> dumpHeapOnOOM, and has similar caveats. Specifically, heap dumps can be of
>>>> comparable size to the default boot disk. Consider increasing the boot disk
>>>> size before setting this flag to true."
>>>>
>>>> When I've done this in the past, I definitely had to increase boot disk
>>>> size (though I forget now what the relevant Dataflow option was).
>>>>
>>>> [0]
>>>> https://beam.apache.org/releases/javadoc/2.16.0/org/apache/beam/runners/dataflow/options/DataflowPipelineDebugOptions.html
>>>>
>>>> On Mon, Nov 18, 2019 at 11:35 AM Reynaldo Baquerizo <
>>>> [email protected]> wrote:
>>>>
>>>>> Hi all,
>>>>>
>>>>> We are running into OOM issues with one of our pipelines. They are not
>>>>> reproducible with DirectRunner, only with Dataflow.
>>>>> I tried --saveHeapDumpsToGcsPath, but it does not save any heap dump
>>>>> (MyOptions extends DataflowPipelineDebugOptions)
>>>>> I looked at the java process inside the docker container and it has
>>>>> remote JMX enabled on port 5555, but external traffic is firewalled.
>>>>>
>>>>> Beam SDK: 2.15.0
>>>>>
>>>>> Any ideas?
>>>>>
>>>>> Cheers,
>>>>> --
>>>>> Reynaldo
>>>>>
>>>>
