Hi, I assume you're using the Dataflow runner. Have you tried using the OOM debugging flags at https://github.com/apache/beam/blob/master/runners/google-cloud-dataflow-java/src/main/java/org/apache/beam/runners/dataflow/options/DataflowPipelineDebugOptions.java#L169-L193 ?
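For example, something along these lines (or, if I remember the option names right, the equivalent --dumpHeapOnOOM / --saveHeapDumpsToGcsPath command-line flags) should give you a heap dump in GCS whenever a worker OOMs. The bucket path below is just a placeholder:

import org.apache.beam.runners.dataflow.options.DataflowPipelineDebugOptions;
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.options.PipelineOptionsFactory;

DataflowPipelineDebugOptions options =
    PipelineOptionsFactory.fromArgs(args).withValidation()
        .as(DataflowPipelineDebugOptions.class);
// Dump the JVM heap when a worker hits an OutOfMemoryError...
options.setDumpHeapOnOOM(true);
// ...and upload the dump to this (placeholder) bucket so you can inspect it.
options.setSaveHeapDumpsToGcsPath("gs://<your-bucket>/heap-dumps");
Pipeline pipeline = Pipeline.create(options);

Analysing one of those dumps should tell you which objects are actually filling the workers' memory.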
On Mon, Jan 29, 2018 at 2:05 AM Carlos Alonso <[email protected]> wrote:

> Hi everyone!!
>
> On our pipeline we're experiencing OOM errors. It's been a while since
> we've been trying to nail them down, but without any luck.
>
> Our pipeline:
> 1. Reads messages from PubSub of different types (about 50 different
> types).
> 2. Outputs KV elements, the key being the type and the value the
> element message itself (JSON).
> 3. Applies windowing (fixed windows of one hour, with early and late
> firings one minute after processing the first element in the pane).
> Two days of allowed lateness and discarding fired panes.
> 4. Buffers the elements (using stateful and timely processing). We
> buffer the elements for 15 minutes or until they reach a maximum size
> of 16 MB. This step's objective is to avoid the window's panes growing
> too big.
> 5. Writes the output elements to files in GCS using dynamic routing,
> the route being type/window/buffer_index-paneInfo.
>
> Our first step was without buffering, just windows with early and late
> firings at 15 minutes, but, guessing that the OOMs were because of
> panes growing too big, we built the buffering step to trigger on size
> as well.
>
> The full trace we're seeing can be found here:
> https://pastebin.com/0vfE6pUg
>
> An annoying thing we've noticed is that the smaller the buffer size,
> the more OOM errors we see, which was a bit disappointing...
>
> Can anyone please give us any hint?
>
> Thanks in advance!
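For reference, here is a rough, untested sketch of how I read steps 3 and 4 of your description (the `keyed` input collection, the class name and everything not in your message are my assumptions, not your actual code; the 1 hour / 2 days / 15 minutes / 16 MB values are taken from your description):

import java.nio.charset.StandardCharsets;
import org.apache.beam.sdk.coders.StringUtf8Coder;
import org.apache.beam.sdk.coders.VarLongCoder;
import org.apache.beam.sdk.state.*;
import org.apache.beam.sdk.transforms.DoFn;
import org.apache.beam.sdk.transforms.ParDo;
import org.apache.beam.sdk.transforms.windowing.AfterProcessingTime;
import org.apache.beam.sdk.transforms.windowing.AfterWatermark;
import org.apache.beam.sdk.transforms.windowing.FixedWindows;
import org.apache.beam.sdk.transforms.windowing.Window;
import org.apache.beam.sdk.values.KV;
import org.apache.beam.sdk.values.PCollection;
import org.joda.time.Duration;

// Step 3: hourly fixed windows, early/late firings one minute after the
// first element in the pane, two days allowed lateness, discarding panes.
// Assumes `keyed` is the PCollection<KV<String, String>> from step 2.
PCollection<KV<String, String>> windowed =
    keyed.apply(
        Window.<KV<String, String>>into(FixedWindows.of(Duration.standardHours(1)))
            .triggering(
                AfterWatermark.pastEndOfWindow()
                    .withEarlyFirings(AfterProcessingTime.pastFirstElementInPane()
                        .plusDelayOf(Duration.standardMinutes(1)))
                    .withLateFirings(AfterProcessingTime.pastFirstElementInPane()
                        .plusDelayOf(Duration.standardMinutes(1))))
            .withAllowedLateness(Duration.standardDays(2))
            .discardingFiredPanes());

// Step 4: per-key buffering via stateful/timely processing, flushing after
// 15 minutes or once roughly 16 MB has accumulated, whichever comes first.
PCollection<KV<String, String>> buffered =
    windowed.apply(ParDo.of(new BufferFn()));

static class BufferFn extends DoFn<KV<String, String>, KV<String, String>> {
  private static final long MAX_BUFFER_BYTES = 16L * 1024 * 1024;
  private static final Duration MAX_BUFFER_TIME = Duration.standardMinutes(15);

  @StateId("buffer")
  private final StateSpec<BagState<String>> bufferSpec = StateSpecs.bag(StringUtf8Coder.of());

  @StateId("bufferBytes")
  private final StateSpec<ValueState<Long>> bufferBytesSpec = StateSpecs.value(VarLongCoder.of());

  @StateId("key")
  private final StateSpec<ValueState<String>> keySpec = StateSpecs.value(StringUtf8Coder.of());

  @TimerId("flush")
  private final TimerSpec flushSpec = TimerSpecs.timer(TimeDomain.PROCESSING_TIME);

  @ProcessElement
  public void processElement(
      ProcessContext ctx,
      @StateId("buffer") BagState<String> buffer,
      @StateId("bufferBytes") ValueState<Long> bufferBytes,
      @StateId("key") ValueState<String> key,
      @TimerId("flush") Timer flush) {
    Long stored = bufferBytes.read();
    long bytes = stored == null ? 0L : stored;
    if (bytes == 0L) {
      // First element for this key/window: remember the key and arm the timer.
      key.write(ctx.element().getKey());
      flush.offset(MAX_BUFFER_TIME).setRelative();
    }
    String value = ctx.element().getValue();
    buffer.add(value);
    bytes += value.getBytes(StandardCharsets.UTF_8).length;
    bufferBytes.write(bytes);
    if (bytes >= MAX_BUFFER_BYTES) {
      // Size threshold reached: flush without waiting for the timer.
      for (String buffered : buffer.read()) {
        ctx.output(KV.of(ctx.element().getKey(), buffered));
      }
      buffer.clear();
      bufferBytes.clear();
    }
  }

  @OnTimer("flush")
  public void onFlush(
      OnTimerContext ctx,
      @StateId("buffer") BagState<String> buffer,
      @StateId("bufferBytes") ValueState<Long> bufferBytes,
      @StateId("key") ValueState<String> key) {
    // 15-minute timer fired: flush whatever has accumulated for this key.
    String k = key.read();
    if (k != null) {
      for (String buffered : buffer.read()) {
        ctx.output(KV.of(k, buffered));
      }
    }
    buffer.clear();
    bufferBytes.clear();
  }
}

If your code is roughly equivalent to that, the heap dumps from the debug options above should show whether it's the buffered state, the window panes, or the GCS write path that dominates memory.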
