If you go the port forwarding route, you need to use a SOCKS proxy as well
as forwarding the JMX port, because JMX over RMI connects back on
additional, dynamically chosen ports that a single forwarded port won't
cover.

For example, I SSH into a worker with:
ssh <blablabla> -D 7777 -L 5555:127.0.0.1:5555

and then launch, e.g., jvisualvm with:
jvisualvm -J-DsocksProxyHost=localhost -J-DsocksProxyPort=7777

Then, set up a connection to the worker using its private IP address
(probably 10.something) on port 5555 (make sure to allow non-SSL
connections as well).
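The same tunnel works for other JMX clients too. A minimal sketch with
jconsole instead of VisualVM; the worker's private IP (10.128.0.2 here)
is a placeholder for whatever your worker actually has:

```shell
# Open the tunnel in one terminal; <worker-host> is a placeholder.
# -D 7777 starts a local SOCKS proxy; -L forwards the JMX registry port.
ssh -D 7777 -L 5555:127.0.0.1:5555 <worker-host>

# In another terminal, launch jconsole through the SOCKS proxy so the
# follow-up RMI connections are also routed over the tunnel. The IP is
# a placeholder for the worker's private address.
jconsole -J-DsocksProxyHost=localhost -J-DsocksProxyPort=7777 \
  service:jmx:rmi:///jndi/rmi://10.128.0.2:5555/jmxrmi
```

The SOCKS proxy matters because the JMX service URL above resolves to an
RMI server that may answer on a port other than 5555.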

On Mon, Nov 18, 2019 at 12:54 PM Luke Cwik <[email protected]> wrote:

> What Jeff mentioned is the easiest way to get heap dumps on OOM.
>
> If you want to connect to JMX, try using an SSH tunnel and forward the
> ports.
>
> On Mon, Nov 18, 2019 at 8:59 AM Jeff Klukas <[email protected]> wrote:
>
>> Using default Dataflow workers, this is the set of options I passed:
>>
>> --dumpHeapOnOOM --saveHeapDumpsToGcsPath=$MYBUCKET/heapdump
>> --diskSizeGb=100
>>
>>
>> On Mon, Nov 18, 2019 at 11:57 AM Jeff Klukas <[email protected]> wrote:
>>
>>> It sounds like you're generally doing the right thing. I've successfully
>>> used --saveHeapDumpsToGcsPath in a Java pipeline running on Dataflow and
>>> inspected the results in Eclipse MAT.
>>>
>>> I think that --saveHeapDumpsToGcsPath will automatically turn on
>>> --dumpHeapOnOOM, but it's worth setting that explicitly too.
>>>
>>> Are your boot disks large enough to store the heap dumps? The docs for
>>> getSaveHeapDumpsToGcsPath [0] mention "CAUTION: This option implies
>>> dumpHeapOnOOM, and has similar caveats. Specifically, heap dumps can be
>>> of comparable size to the default boot disk. Consider increasing the
>>> boot disk size before setting this flag to true."
>>>
>>> When I've done this in the past, I definitely had to increase boot disk
>>> size (though I forget now what the relevant Dataflow option was).
>>>
>>> [0]
>>> https://beam.apache.org/releases/javadoc/2.16.0/org/apache/beam/runners/dataflow/options/DataflowPipelineDebugOptions.html
>>>
>>> On Mon, Nov 18, 2019 at 11:35 AM Reynaldo Baquerizo <
>>> [email protected]> wrote:
>>>
>>>> Hi all,
>>>>
>>>> We are running into OOM issues with one of our pipelines. They are not
>>>> reproducible with DirectRunner, only with Dataflow.
>>>> I tried --saveHeapDumpsToGcsPath, but it does not save any heap dump
>>>> (MyOptions extends DataflowPipelineDebugOptions).
>>>> I looked at the Java process inside the Docker container, and it has
>>>> remote JMX enabled on port 5555, but outside traffic is firewalled.
>>>>
>>>> Beam SDK: 2.15.0
>>>>
>>>> Any ideas?
>>>>
>>>> Cheers,
>>>> --
>>>> Reynaldo
>>>>
>>>
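Putting the flags from the thread together, a hypothetical launch command;
the jar name, main class, project, and bucket are all placeholders, and the
flags assume MyOptions extends DataflowPipelineDebugOptions as above:

```shell
# Hypothetical invocation combining the heap-dump flags discussed in
# this thread. All names below are placeholders, not real values.
java -cp my-pipeline-bundled.jar com.example.MyPipeline \
  --runner=DataflowRunner \
  --project=my-project \
  --tempLocation=gs://my-bucket/tmp \
  --dumpHeapOnOOM=true \
  --saveHeapDumpsToGcsPath=gs://my-bucket/heapdump \
  --diskSizeGb=100
```

Per the javadoc caution quoted above, --diskSizeGb is increased so the
worker boot disk has room to write a heap dump before uploading it to GCS.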
