RDD Caching in SparkRunner

Ajit Dongre Wed, 26 Feb 2020 02:17:52 -0800

Hello,

I am running a simple Beam pipeline with the Spark runner.


I found in Beam's code that a particular RDD is cached if the corresponding 
DoFn uses a PCollectionTuple. This is in TransformTranslator.java 
<https://github.com/apache/beam/blob/master/runners/spark/src/main/java/org/apache/beam/runners/spark/translation/TransformTranslator.java>
(line 413).
What is the need for this kind of caching?

Also, the SparkRunner option --cacheDisabled is not honoured at this point in 
the code. Is there any specific reason for that?

Regards,
Ajit Dongre
