Dear Community,

I am new to Spark, and I am confused by the comment in the following method:
```scala
def union(other: Dataset[T]): Dataset[T] = withSetOperator {
  // This breaks caching, but it's usually ok because it addresses a very
  // specific use case: using union to union many files or partitions.
  CombineUnions(Union(logicalPlan, other.logicalPlan)).mapChildren(AnalysisBarrier)
}
```

Here is the corresponding PR comment (https://github.com/apache/spark/pull/10577#discussion_r48820132):

> Another option would just be to do this at construction time, that way we can avoid paying the cost in the analyzer. This would still limit the cases we could cache (i.e. we'd miss cached data unioned with other data), but that doesn't seem like a huge deal.

Could anyone please kindly explain what "This breaks caching" means? It would be awesome if an example were given.

Best regards,
Yi Huang
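P.S. To make my question concrete, here is a minimal sketch of the scenario I have in mind (assuming Spark 2.x, with `spark` being an already-created SparkSession):

```scala
// Hypothetical sketch: cache a Dataset, then union it with another one.
val cached = spark.range(100).toDF("id")
cached.cache()
cached.count() // materializes the cache

val unioned = cached.union(spark.range(100, 200).toDF("id"))
unioned.explain() // is the cached data (InMemoryTableScan) still used here, or is this the case that "breaks caching"?
```

Is this the situation the comment refers to, where the rewritten (flattened) union plan no longer matches the cached plan?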