Hello, I am running a simple Beam pipeline with the Spark runner.
I found in Beam's code that a particular RDD is cached if the corresponding DoFn uses a PCollectionTuple, as seen in TransformTranslator.java <https://github.com/apache/beam/blob/master/runners/spark/src/main/java/org/apache/beam/runners/spark/translation/TransformTranslator.java> (line number 413).

Why is this caching needed? Also, the SparkRunner option --cacheDisabled is not honoured at this point in the code. Is there a specific reason for that?

Regards,
Ajit Dongre
