date:20210426

Re: Question on late data handling in Beam streaming mode

2021-04-26 Thread Tao Li

Thanks folks. This is really informative!

From: Kenneth Knowles
Reply-To: "user@beam.apache.org"
Date: Friday, April 23, 2021 at 9:34 AM
To: Reuven Lax
Cc: user, Kenneth Knowles, Kelly Smith, Lian Jiang
Subject: Re: Question on late data handling in Beam streaming mode

Reuven's answer

Re: Avoiding OutOfMemoryError for large batch-jobs

2021-04-26 Thread Alexey Romanenko

> On 26 Apr 2021, at 13:34, Thomas Fredriksen(External) wrote:
>
> The stack-trace for the OOM:
>
> 21/04/21 21:40:43 WARN TaskSetManager: Lost task 1.2 in stage 2.0 (TID 57,
> 10.139.64.6, executor 3): org.apache.beam.sdk.util.UserCodeException:
> java.lang.OutOfMemoryError: GC overhead

Re: Avoiding OutOfMemoryError for large batch-jobs

2021-04-26 Thread Thomas Fredriksen(External)

The stack-trace for the OOM:

21/04/21 21:40:43 WARN TaskSetManager: Lost task 1.2 in stage 2.0 (TID 57, 10.139.64.6, executor 3): org.apache.beam.sdk.util.UserCodeException: java.lang.OutOfMemoryError: GC overhead limit exceeded
at

Re: Avoiding OutOfMemoryError for large batch-jobs

2021-04-26 Thread Alexey Romanenko

Hi Thomas,

Could you share the stack trace of your OOM and, if possible, the code snippet of your pipeline? AFAIK, usually only “large” GroupByKey transforms, caused by “hot keys”, may lead to OOM with SparkRunner.

--
Alexey

> On 26 Apr 2021, at 08:23, Thomas Fredriksen(External) wrote:
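A "hot key" is a key that receives a disproportionate share of the values, so a single worker must hold or process all of them at once. One common mitigation for combine-style aggregations is key salting (a two-stage "fan-out" combine): each element's key is paired with a random shard id, values are pre-combined per (key, shard), and the partial results are then merged per key. The sketch below is plain Python, not Beam code; `sum_per_key` and `sum_per_key_with_fanout` are hypothetical names, and it only demonstrates the logic in a single process (the memory benefit comes when the shards are spread across workers). In Beam itself, `Combine.perKey(...).withHotKeyFanout(...)` (Java) and `beam.CombinePerKey(...).with_hot_key_fanout(...)` (Python) apply a similar intermediate-key idea, but only to combinable aggregations, not to a raw GroupByKey whose values must all be seen together.

```python
import random
from collections import Counter, defaultdict


def sum_per_key(pairs):
    """Plain per-key sum: every value of a key is aggregated in one group."""
    totals = defaultdict(int)
    for key, value in pairs:
        totals[key] += value
    return dict(totals)


def sum_per_key_with_fanout(pairs, fanout, seed=0):
    """Two-stage per-key sum using salted (key, shard) intermediate keys.

    Returns the per-key totals and the size of the largest stage-1 group.
    """
    if fanout < 1:
        raise ValueError("fanout must be >= 1")
    rng = random.Random(seed)
    # Stage 1: give each element a random shard in [0, fanout) and pre-combine
    # per (key, shard), so a hot key is split into up to `fanout` smaller groups.
    partial = defaultdict(int)
    group_sizes = Counter()
    for key, value in pairs:
        shard = rng.randrange(fanout)
        partial[(key, shard)] += value
        group_sizes[(key, shard)] += 1
    # Stage 2: drop the shard id and merge the (at most `fanout`) partials per key.
    totals = defaultdict(int)
    for (key, _shard), subtotal in partial.items():
        totals[key] += subtotal
    return dict(totals), max(group_sizes.values(), default=0)


# Hypothetical skewed input: one hot key and one cold key.
pairs = [("hot", 1)] * 10_000 + [("cold", 1)] * 10
print(sum_per_key(pairs))  # {'hot': 10000, 'cold': 10}
totals, largest_group = sum_per_key_with_fanout(pairs, fanout=16)
print(totals == sum_per_key(pairs))  # True
print(largest_group)  # far fewer than the 10000 values of the unsalted "hot" group
```

The final totals are identical to the plain version; only the size of the largest intermediate group changes, which is what limits per-worker memory pressure in a distributed runner.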

Avoiding OutOfMemoryError for large batch-jobs

2021-04-26 Thread Thomas Fredriksen(External)

Good morning,

We are ingesting a very large dataset into our database using Beam on Spark. The dataset is available through a REST-like API and is spliced in such a way that, to obtain the whole dataset, we must make around 24000 API calls. All in all, this results in 24000 CSV files
