> Persisting is usually the right thing to do. +1, cacheDisabled should only be used if you're certain that *in aggregate* recomputation is faster than writing to and reading from the cache. Keep in mind that cacheDisabled applies to the whole pipeline, meaning you're out of luck if you want to recompute only certain PCollections. There was discussion about maybe annotating PCollections to achieve that, but I don't think anything ever came of it.
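To make that "in aggregate" tradeoff concrete, here is a minimal back-of-the-envelope sketch. The function name and all cost numbers are hypothetical, purely illustrative: caching pays off only when one cache write plus one read per downstream consumer is cheaper than recomputing the collection for every consumer.

```python
def cache_wins(write_cost, read_cost, recompute_cost, n_consumers):
    # Caching: pay the write once, then one read per consumer.
    # No caching: pay the full recompute once per consumer.
    return write_cost + n_consumers * read_cost < n_consumers * recompute_cost

# Expensive write, cheap reads: caching wins once enough consumers share it.
assert cache_wins(write_cost=10, read_cost=1, recompute_cost=5, n_consumers=3)

# With a single consumer, recomputation is cheaper than the cache round-trip.
assert not cache_wins(write_cost=10, read_cost=1, recompute_cost=5, n_consumers=1)
```

Because cacheDisabled is pipeline-wide, this comparison effectively has to hold for every cached collection at once, not just the one you care about.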
That being said, I agree that cacheDisabled being ignored there sounds like a bug. I filed https://jira.apache.org/jira/browse/BEAM-9387 to track the issue.

Thanks,
Kyle

On Wed, Feb 26, 2020 at 7:53 AM Ryan Skraba <[email protected]> wrote:
> Hello!
>
> If I understand correctly, in Spark the common pattern for multiple
> outputs is to collect them all into a single _persisted_ RDD
> internally, then filter into the separate RDDs (one per output) on
> demand.
>
> Persisting is usually the right thing to do. Otherwise, Spark could
> risk fusing the "all" RDD representing the PCollectionTuple with
> each filter used to get the PCollections (one per TupleTag), and
> recalculating "all" many times... or worse, recalculating the upstream
> RDDs continually if there's no fusion break or upstream persist.
>
> It looks like the cacheDisabled flag
> (https://issues.apache.org/jira/browse/BEAM-6053) is only considered
> for RDDs under PCollections that are reused in the job DAG. That does
> sound like a bug to me, since the description of the flag implies
> all-or-nothing.
>
> I hope this helps, Ryan
>
> On Wed, Feb 26, 2020 at 11:17 AM Ajit Dongre
> <[email protected]> wrote:
> >
> > Hello,
> >
> > I am running a simple Beam pipeline with the Spark runner.
> >
> > I found in Beam's code that a particular RDD is cached if the
> > corresponding DoFn uses a PCollectionTuple, as seen in
> > TransformTranslator.java (line number 413).
> >
> > I want to know: what is the need for this kind of caching?
> >
> > Also, the SparkRunner option --cacheDisabled is not honoured at this
> > code level. Any specific reason?
> >
> > Regards,
> > Ajit Dongre
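The recomputation risk Ryan describes can be simulated outside Spark. This is not Beam or Spark code, just a hedged sketch in plain Python: `compute_all` stands in for the expensive "all" RDD holding every tagged output, and each call to `output` plays the role of one per-TupleTag filter. Without a persist, every output recomputes the source; memoizing it (the analogue of `rdd.persist()`) computes it once.

```python
from functools import lru_cache

calls = {"count": 0}

def compute_all():
    # Stand-in for the expensive "all" RDD backing the PCollectionTuple.
    calls["count"] += 1
    return [("even", x) if x % 2 == 0 else ("odd", x) for x in range(10)]

def output(tag, source):
    # One filter per TupleTag, pulling from the shared source on demand.
    return [x for t, x in source() if t == tag]

# Without persisting, each tagged output recomputes "all" from scratch.
evens = output("even", compute_all)
odds = output("odd", compute_all)
assert calls["count"] == 2

# "Persisting" (here, memoizing) computes "all" once; both filters read the cache.
calls["count"] = 0
cached_all = lru_cache(maxsize=None)(compute_all)
evens = output("even", cached_all)
odds = output("odd", cached_all)
assert calls["count"] == 1
```

With many tags, or a long unpersisted upstream chain, the uncached call count grows with the number of outputs, which is exactly why the Spark runner caches the RDD behind a PCollectionTuple.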
