No, we are not seeing anything specific about request timeouts in the logs.
Typically, the only thing we do see in the logs is the following:
21/04/21 21:44:52 ERROR ScalaDriverLocal: User Code Stack Trace:
java.lang.RuntimeException: org.apache.spark.SparkException: Job aborted due to stage
On 27 Apr 2021, at 08:39, Thomas Fredriksen (External) wrote:
Thank you, this is very informative.

We tried reducing the JdbcIO batch size, first to 1000 and then to 100. In
our runs we no longer see the explicit OOM error, but we are now seeing
executor heartbeat timeouts. From what we understand, these are typically
also caused by OOM errors. However, the
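For reference, the batch size in question is the withBatchSize setting on
JdbcIO.Write, which defaults to 1000. A minimal sketch of a write with a
reduced batch size, assuming a hypothetical record type, table, and
PostgreSQL connection (none of these names are from the thread):

import java.io.Serializable;
import java.sql.PreparedStatement;

import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.io.jdbc.JdbcIO;
import org.apache.beam.sdk.transforms.Create;

public class JdbcBatchSizeSketch {

  // Hypothetical record type; not taken from the thread.
  static class MyRecord implements Serializable {
    final long id;
    final double value;

    MyRecord(long id, double value) {
      this.id = id;
      this.value = value;
    }
  }

  public static void main(String[] args) {
    Pipeline p = Pipeline.create();

    p.apply(Create.of(new MyRecord(1L, 0.5), new MyRecord(2L, 1.5)))
        .apply(
            JdbcIO.<MyRecord>write()
                .withDataSourceConfiguration(
                    JdbcIO.DataSourceConfiguration.create(
                            "org.postgresql.Driver", "jdbc:postgresql://host/db")
                        .withUsername("user")
                        .withPassword("password"))
                .withStatement("INSERT INTO measurements (id, value) VALUES (?, ?)")
                // Smaller batches flush more often and buffer fewer rows per
                // bundle in executor memory; the default is 1000.
                .withBatchSize(100L)
                .withPreparedStatementSetter(
                    (MyRecord r, PreparedStatement st) -> {
                      st.setLong(1, r.id);
                      st.setDouble(2, r.value);
                    }));

    p.run().waitUntilFinish();
  }
}

On the heartbeat side: Spark marks an executor as lost when it misses
heartbeats (spark.executor.heartbeatInterval, 10s by default, checked against
spark.network.timeout, 120s by default), and long GC pauses on a near-OOM
executor are a common way to miss them, which matches the reading above.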
On 26 Apr 2021, at 13:34, Thomas Fredriksen (External) wrote:
The stack trace for the OOM:

21/04/21 21:40:43 WARN TaskSetManager: Lost task 1.2 in stage 2.0 (TID 57,
10.139.64.6, executor 3): org.apache.beam.sdk.util.UserCodeException:
java.lang.OutOfMemoryError: GC overhead limit exceeded
        at
Hi Thomas,

Could you share the stack trace of your OOM and, if possible, a code snippet
of your pipeline?

AFAIK, usually only "large" GroupByKey transforms, caused by "hot keys", may
lead to OOM with the SparkRunner.

—
Alexey
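For readers following along: when a hot key is the culprit, Beam can spread a
single key's inputs across shards before the final merge, provided the
aggregation is expressible as a CombineFn. A minimal sketch, assuming a
per-key sum over a hypothetical keyedValues collection (the fanout factor is
illustrative):

import org.apache.beam.sdk.transforms.Combine;
import org.apache.beam.sdk.transforms.Sum;
import org.apache.beam.sdk.values.KV;
import org.apache.beam.sdk.values.PCollection;

// Assumed input: a keyed collection of counts, PCollection<KV<String, Long>>.
static PCollection<KV<String, Long>> sumWithFanout(
    PCollection<KV<String, Long>> keyedValues) {
  return keyedValues.apply(
      // Pre-combine each key's values in 16 parallel shards and merge the
      // partial sums, instead of pulling all values for a key to one worker.
      Combine.<String, Long, Long>perKey(Sum.ofLongs())
          .withHotKeyFanout(16));
}

Note this does not help a raw GroupByKey, which still has to co-locate every
value for a key on one worker.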
On 26 Apr 2021, at 08:23, Thomas Fredriksen (External) wrote:
Good morning,

We are ingesting a very large dataset into our database using Beam on
Spark. The dataset is available through a REST-like API and is spliced in
such a way that, in order to obtain the whole dataset, we must make around
24000 API calls.

All in all, this results in 24000 CSV files
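The thread does not include the pipeline code, but the shape described above
is roughly the following. A sketch only, with every URL and name hypothetical,
and assuming the roughly 24000 slices can be enumerated up front:

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.transforms.Create;
import org.apache.beam.sdk.transforms.DoFn;
import org.apache.beam.sdk.transforms.ParDo;
import org.apache.beam.sdk.transforms.Reshuffle;

public class IngestSketch {

  // Downloads one slice of the dataset and emits its CSV lines.
  static class FetchCsv extends DoFn<String, String> {
    @ProcessElement
    public void process(@Element String url, OutputReceiver<String> out) throws Exception {
      HttpURLConnection conn = (HttpURLConnection) new URL(url).openConnection();
      try (BufferedReader reader =
          new BufferedReader(
              new InputStreamReader(conn.getInputStream(), StandardCharsets.UTF_8))) {
        String line;
        while ((line = reader.readLine()) != null) {
          out.output(line);
        }
      } finally {
        conn.disconnect();
      }
    }
  }

  public static void main(String[] args) {
    List<String> requests = new ArrayList<>();
    for (int i = 0; i < 24000; i++) {
      requests.add("https://api.example.com/dataset/slice/" + i);
    }

    Pipeline p = Pipeline.create();
    p.apply(Create.of(requests))
        // Break fusion so the fetches spread across workers rather than
        // running inside a handful of bundles.
        .apply(Reshuffle.viaRandomKey())
        .apply(ParDo.of(new FetchCsv()));
    // ... parse the CSV lines and write them out, e.g. with JdbcIO as above.
    p.run().waitUntilFinish();
  }
}

With ~24000 small elements the runner decides the parallelism; the reshuffle
only keeps the HTTP calls from being fused onto too few workers.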