If you go the port forwarding route, you need to use a SOCKS proxy as well as forwarding the JMX port, because JMX's RMI transport opens additional, dynamically assigned ports that a single fixed forward won't cover.
For example, I SSH into a worker with:

    ssh <blablabla> -D 7777 -L 5555:127.0.0.1:5555

and then launch, e.g., jvisualvm with:

    jvisualvm -J-DsocksProxyHost=localhost -J-DsocksProxyPort=7777

Then set up a connection to the worker using its private IP address
(probably 10.something) on port 5555 (make sure to allow non-SSL
connections as well).

On Mon, Nov 18, 2019 at 12:54 PM Luke Cwik <[email protected]> wrote:

> What Jeff mentioned is the easiest way to get heap dumps on OOM.
>
> If you want to connect to JMX, try using an SSH tunnel and forward the
> ports.
>
> On Mon, Nov 18, 2019 at 8:59 AM Jeff Klukas <[email protected]> wrote:
>
>> Using default Dataflow workers, this is the set of options I passed:
>>
>> --dumpHeapOnOOM --saveHeapDumpsToGcsPath=$MYBUCKET/heapdump
>> --diskSizeGb=100
>>
>> On Mon, Nov 18, 2019 at 11:57 AM Jeff Klukas <[email protected]> wrote:
>>
>>> It sounds like you're generally doing the right thing. I've successfully
>>> used --saveHeapDumpsToGcsPath in a Java pipeline running on Dataflow and
>>> inspected the results in Eclipse MAT.
>>>
>>> I think that --saveHeapDumpsToGcsPath will automatically turn on
>>> --dumpHeapOnOOM, but it's worth setting that explicitly too.
>>>
>>> Are your boot disks large enough to store the heap dumps? The docs for
>>> getSaveHeapDumpsToGcsPath [0] mention: "CAUTION: This option implies
>>> dumpHeapOnOOM, and has similar caveats. Specifically, heap dumps can be of
>>> comparable size to the default boot disk. Consider increasing the boot disk
>>> size before setting this flag to true."
>>>
>>> When I've done this in the past, I definitely had to increase boot disk
>>> size (though I forget now what the relevant Dataflow option was).
>>>
>>> [0]
>>> https://beam.apache.org/releases/javadoc/2.16.0/org/apache/beam/runners/dataflow/options/DataflowPipelineDebugOptions.html
>>>
>>> On Mon, Nov 18, 2019 at 11:35 AM Reynaldo Baquerizo <
>>> [email protected]> wrote:
>>>
>>>> Hi all,
>>>>
>>>> We are running into OOM issues with one of our pipelines. They are not
>>>> reproducible with the DirectRunner, only with Dataflow.
>>>> I tried --saveHeapDumpsToGcsPath, but it does not save any heap dump
>>>> (MyOptions extends DataflowPipelineDebugOptions).
>>>> I looked at the java process inside the docker container and it has
>>>> remote JMX enabled through port 5555, but outside traffic is firewalled.
>>>>
>>>> Beam SDK: 2.15.0
>>>>
>>>> Any ideas?
>>>>
>>>> Cheers,
>>>> --
>>>> Reynaldo
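The tunnel setup described at the top of the thread can be sketched as a small shell script. The worker hostname and both port numbers are placeholders, not values from the thread; substitute your own:

```shell
# Hypothetical values -- replace with your own worker host and ports.
WORKER_HOST="worker-host"   # Dataflow worker (or bastion) reachable over SSH
SOCKS_PORT=7777             # local SOCKS proxy port (ssh -D)
JMX_PORT=5555               # JMX port the worker JVM listens on

# 1. Open a SOCKS proxy plus a direct forward of the fixed JMX port.
#    The SOCKS proxy is needed because JMX's RMI transport also uses
#    dynamically assigned ports that cannot be forwarded ahead of time.
SSH_CMD="ssh -D ${SOCKS_PORT} -L ${JMX_PORT}:127.0.0.1:${JMX_PORT} ${WORKER_HOST}"

# 2. Launch VisualVM pointed at the local SOCKS proxy so all JMX/RMI
#    traffic is routed through the tunnel.
VISUALVM_CMD="jvisualvm -J-DsocksProxyHost=localhost -J-DsocksProxyPort=${SOCKS_PORT}"

echo "${SSH_CMD}"
echo "${VISUALVM_CMD}"
```

With the tunnel up, VisualVM connects to the worker's private IP on the JMX port, as described above.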

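The heap-dump flags Jeff lists in the thread can be collected into a single set of pipeline arguments. This is a sketch only: the GCS bucket path is a placeholder, and the `--runner=DataflowRunner` flag is an assumption (the thread targets Dataflow but never shows the full launch command):

```shell
# Placeholder bucket -- substitute your own GCS path.
BUCKET="gs://my-bucket"

# Flags from the thread: dump the heap on OOM, upload it to GCS, and
# enlarge the boot disk, since a heap dump can be of comparable size
# to the default boot disk.
PIPELINE_ARGS="--runner=DataflowRunner \
--dumpHeapOnOOM \
--saveHeapDumpsToGcsPath=${BUCKET}/heapdump \
--diskSizeGb=100"

echo "${PIPELINE_ARGS}"
```

These arguments would be appended to however the pipeline's main class is normally launched (e.g. via `mvn exec:java` or a fat jar).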