Hi Kyle,

Thanks for the help. It seems I have no choice but to use Spark directly, since 
my job comes under immense memory pressure if I can't choose what to cache.

Best regards,
Augusto

> On 14 May 2019, at 18:40, Kyle Weaver <[email protected]> wrote:
> 
> Minor correction: Slack channel is actually #beam-spark
> 
> Kyle Weaver | Software Engineer | github.com/ibzib | [email protected] | +16502035555
> 
> 
> From: Kyle Weaver <[email protected]>
> Date: Tue, May 14, 2019 at 9:38 AM
> To: <[email protected]>
> 
> Hi Augusto,
> 
> Right now the default behavior is to cache all intermediate RDDs that are 
> consumed more than once by the pipeline. This can be disabled with 
> `options.setCacheDisabled(true)` [1], but there is currently no way for the 
> user to tell the runner to cache certain RDDs and not others.
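[Editor's note: for context, a minimal sketch of how that option would be passed on the command line rather than set programmatically. The jar name is hypothetical; `cacheDisabled` is the pipeline-option form of the `setCacheDisabled` setter mentioned above, assuming a standard Beam Java job launched with `PipelineOptionsFactory.fromArgs`.]

```shell
# Disable the Spark runner's automatic caching of reused RDDs.
# --cacheDisabled maps to SparkPipelineOptions.setCacheDisabled(true);
# the jar name below is a placeholder for your own pipeline artifact.
java -jar my-beam-pipeline.jar \
  --runner=SparkRunner \
  --cacheDisabled=true
```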
> 
> There has recently been some discussion on the Slack (#spark-beam) about 
> implementing such a feature, but no concrete plans as of yet.
> 
> [1] 
> https://github.com/apache/beam/blob/81faf35c8a42493317eba9fa1e7b06fb42d54662/runners/spark/src/main/java/org/apache/beam/runners/spark/SparkPipelineOptions.java#L150
> 
> Thanks
> 
> Kyle Weaver | Software Engineer | github.com/ibzib | [email protected] | +16502035555
> 
> 
> From: [email protected] <mailto:[email protected]> 
> <[email protected] <mailto:[email protected]>>
> Date: Tue, May 14, 2019 at 5:01 AM
> To: <[email protected] <mailto:[email protected]>>
> 
> Hi,
> 
> I guess the title says it all: right now it seems like Beam caches all the 
> intermediate RDD results for my pipeline when using the Spark runner, which 
> leads to very inefficient memory usage. Is there any way to control this?
> 
> Best regards,
> Augusto
