Hi Eugene! Thanks for your comments. Yes, we've downloaded a couple of heap dumps but, TBH, we couldn't make sense of them in Eclipse MAT. I was wondering whether the trace, together with the fact that "the smaller the buffer, the more OOM errors", could give any of you a hint, as I think the problem may be in the writing part...
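To make the "smaller buffer, more OOM errors" observation a bit more concrete, here is a back-of-the-envelope sketch in plain Python (not our pipeline code). It only counts how many distinct type/window/buffer_index files one window would produce if every buffer flushed purely on size. The function name `files_per_window` and the 256 MiB per type per window volume are made up for illustration, and it ignores the 15-minute timer and the early/late firings, which should only push the count higher.

```python
import math

MIB = 1024 * 1024


def files_per_window(bytes_per_type: int, buffer_bytes: int, num_types: int = 50) -> int:
    """Count distinct output files for one window, assuming size-only flushes.

    Each flush of a buffer gets its own buffer_index, so it maps to its own
    destination path (type/window/buffer_index-paneInfo).
    """
    flushes_per_type = math.ceil(bytes_per_type / buffer_bytes)
    return flushes_per_type * num_types


# Hypothetical volume: 256 MiB per type per one-hour window (not a measured number).
for buffer_mib in (16, 4, 1):
    count = files_per_window(256 * MIB, buffer_mib * MIB)
    print(f"{buffer_mib} MiB buffer: {count} files per window")
```

If each file being written holds some memory on the worker (an assumption I haven't been able to confirm), then shrinking the buffer multiplies the number of destinations and could explain why smaller buffers make things worse rather than better.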
Do you know how the dynamic writes are distributed across workers? Is it based on the full path?

Regards

On Mon, Jan 29, 2018 at 7:38 PM Eugene Kirpichov <[email protected]> wrote:

> Hi, I assume you're using the Dataflow runner. Have you tried using the
> OOM debugging flags at
> https://github.com/apache/beam/blob/master/runners/google-cloud-dataflow-java/src/main/java/org/apache/beam/runners/dataflow/options/DataflowPipelineDebugOptions.java#L169-L193
> ?
>
> On Mon, Jan 29, 2018 at 2:05 AM Carlos Alonso <[email protected]> wrote:
>
>> Hi everyone!!
>>
>> On our pipeline we're experiencing OOM errors. We've been trying to
>> nail them down for a while, but without any luck.
>>
>> Our pipeline:
>> 1. Reads messages of different types from PubSub (about 50 different
>> types).
>> 2. Outputs KV elements, the key being the type and the value being the
>> message itself (JSON).
>> 3. Applies windowing (fixed windows of one hour, with early and late
>> firings one minute after the first element in the pane is processed),
>> with two days of allowed lateness and discarding fired panes.
>> 4. Buffers the elements (using stateful and timely processing). We buffer
>> the elements for 15 minutes or until the buffer reaches a maximum size
>> of 16 MB. This step's objective is to keep window panes from growing
>> too big.
>> 5. Writes the output elements to files in GCS using dynamic routing,
>> the route being: type/window/buffer_index-paneInfo.
>>
>> Our first attempt had no buffering, just windows with early and late
>> firings every 15 minutes, but, guessing that the OOMs were caused by
>> panes growing too big, we built the buffering step to trigger on size
>> as well.
>>
>> The full trace we're seeing can be found here:
>> https://pastebin.com/0vfE6pUg
>>
>> One annoying thing we've noticed is that the smaller the buffer size,
>> the more OOM errors we see, which was a bit disappointing...
>>
>> Can anyone please give us a hint?
>>
>> Thanks in advance!
