Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21111#discussion_r183266741

    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/OptimizeMetadataOnlyQuery.scala ---
    @@ -114,11 +119,8 @@ case class OptimizeMetadataOnlyQuery(catalog: SessionCatalog) extends Rule[Logic
         relation match {
           case l @ LogicalRelation(fsRelation: HadoopFsRelation, _, _, isStreaming) =>
             val partAttrs = getPartitionAttrs(fsRelation.partitionSchema.map(_.name), l)
    -        val partitionData = fsRelation.location.listFiles(relFilters, Nil)
    -        // partition data may be a stream, which can cause serialization to hit stack level too
    -        // deep exceptions because it is a recursive structure in memory. converting to array
    -        // avoids the problem.
    --- End diff --

    > Would it be reasonable for a future commit to remove the @transient modifier and re-introduce the problem?

    That's very unlikely. SPARK-21884 guarantees Spark won't serialize the rows, and we have regression tests to protect us. BTW, it would be a lot of work to make sure all the places that create `LocalRelation` do not use a recursive structure. I'll add some comments to `LocalRelation` to emphasize it.
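For context, the "stack level too deep" issue in the removed comment comes from Java serialization recursing through each cons cell of a Scala `Stream`. The sketch below (a hypothetical illustration, not Spark's actual code; the object name and helper are made up) shows the pattern the old code guarded against: a deeply nested recursive structure can blow the stack when written with `ObjectOutputStream`, while the same data flattened to an `Array` serializes safely.

```scala
import java.io.{ByteArrayOutputStream, ObjectOutputStream}

// Hypothetical demo object; not part of Spark.
object StreamSerializationDemo {
  // Serialize any object with plain Java serialization and
  // return the number of bytes written.
  def serialize(obj: AnyRef): Int = {
    val bytes = new ByteArrayOutputStream()
    val out = new ObjectOutputStream(bytes)
    out.writeObject(obj)
    out.close()
    bytes.size()
  }

  def main(args: Array[String]): Unit = {
    // A Stream is a recursive in-memory structure: Cons(head, tail),
    // so default serialization descends one stack frame per element.
    val deep: Stream[Int] = Stream.range(0, 1000000)

    // serialize(deep) // may throw StackOverflowError on a long stream

    // Converting to an Array flattens the structure, so serialization
    // is a single non-recursive write.
    val safe: Array[Int] = deep.toArray
    println(serialize(safe) > 0)
  }
}
```

This is the same defensive conversion the deleted comment described; after SPARK-21884 the rows are marked `@transient` and never serialized, which is why the conversion (and the comment) could be removed.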