Hi Kyle,

Thanks for the help. It seems I have no choice but to use Spark directly, since my job causes immense memory pressure if I can't decide what to cache.

Best regards,
Augusto

> On 14 May 2019, at 18:40, Kyle Weaver <[email protected]> wrote:
>
> Minor correction: Slack channel is actually #beam-spark
>
> Kyle Weaver | Software Engineer | github.com/ibzib | [email protected] | +16502035555
>
> From: Kyle Weaver <[email protected]>
> Date: Tue, May 14, 2019 at 9:38 AM
> To: <[email protected]>
>
> Hi Augusto,
>
> Right now the default behavior is to cache all intermediate RDDs that are
> consumed more than once by the pipeline. This can be disabled with
> `options.setCacheDisabled(true)` [1], but there is currently no way for the
> user to specify to the runner that it should cache certain RDDs, but not
> others.
>
> There has recently been some discussion on the Slack (#spark-beam) about
> implementing such a feature, but no concrete plans as of yet.
>
> [1] https://github.com/apache/beam/blob/81faf35c8a42493317eba9fa1e7b06fb42d54662/runners/spark/src/main/java/org/apache/beam/runners/spark/SparkPipelineOptions.java#L150
>
> Thanks
>
> Kyle Weaver | Software Engineer | github.com/ibzib | [email protected] | +16502035555
>
> From: [email protected] <[email protected]>
> Date: Tue, May 14, 2019 at 5:01 AM
> To: <[email protected]>
>
> Hi,
>
> I guess the title says it all, right now it seems like BEAM caches all the
> intermediate RDD results for my pipeline when using the Spark runner, this
> leads to a very inefficient usage of memory. Any way to control this?
>
> Best regards,
> Augusto
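
For anyone reading this thread later, here is a rough sketch of the rule Kyle describes: anything consumed more than once gets cached, and the only switch is all-or-nothing. This is a toy model in plain Java, not Beam or Spark code. The class `CachePlanSketch`, the method `datasetsToCache`, and the step and dataset names are all made up for illustration; the `cacheDisabled` flag only stands in for what `options.setCacheDisabled(true)` does according to the quoted message.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Toy model only: none of these names come from Beam or Spark.
class CachePlanSketch {

    // `edges` maps each consumer step to the intermediate datasets it reads.
    // Returns the datasets that would be cached under the rule
    // "cache whatever is consumed more than once".
    static List<String> datasetsToCache(Map<String, List<String>> edges,
                                        boolean cacheDisabled) {
        List<String> cached = new ArrayList<>();
        if (cacheDisabled) {
            return cached; // all-or-nothing switch: nothing is cached
        }
        // Count how many times each dataset appears as an input.
        Map<String, Integer> consumerCount = new LinkedHashMap<>();
        for (List<String> inputs : edges.values()) {
            for (String input : inputs) {
                consumerCount.merge(input, 1, Integer::sum);
            }
        }
        // Keep only the datasets with more than one consumer.
        for (Map.Entry<String, Integer> entry : consumerCount.entrySet()) {
            if (entry.getValue() > 1) {
                cached.add(entry.getKey());
            }
        }
        return cached;
    }

    public static void main(String[] args) {
        Map<String, List<String>> edges = new LinkedHashMap<>();
        edges.put("countWords", Arrays.asList("parsed"));
        edges.put("findErrors", Arrays.asList("parsed"));
        edges.put("writeCounts", Arrays.asList("counts"));

        System.out.println(datasetsToCache(edges, false)); // prints [parsed]
        System.out.println(datasetsToCache(edges, true));  // prints []
    }
}
```

In this model there is no way to say "cache `parsed` but not some other shared dataset", which is the missing piece discussed above.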
