Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21111#discussion_r183266741
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/OptimizeMetadataOnlyQuery.scala ---
    @@ -114,11 +119,8 @@ case class OptimizeMetadataOnlyQuery(catalog: SessionCatalog) extends Rule[Logic
             relation match {
               case l @ LogicalRelation(fsRelation: HadoopFsRelation, _, _, isStreaming) =>
                 val partAttrs = getPartitionAttrs(fsRelation.partitionSchema.map(_.name), l)
    -            val partitionData = fsRelation.location.listFiles(relFilters, Nil)
    -            // partition data may be a stream, which can cause serialization to hit stack level too
    -            // deep exceptions because it is a recursive structure in memory. converting to array
    -            // avoids the problem.
    --- End diff --
    
    > Would it be reasonable for a future commit to remove the @transient modifier and re-introduce the problem?
    
    That's very unlikely. SPARK-21884 guarantees that Spark won't serialize the rows, and we have regression tests to protect us. BTW, it would be a lot of work to make sure that every place that creates a `LocalRelation` avoids recursive structures. I'll add some comments to `LocalRelation` to emphasize this.
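
    For context on the removed comment: a Scala `Stream` is a linked chain of cons cells, so default Java serialization can recurse once per element and overflow the stack for long streams, while an `Array` is a flat block that serializes without per-element recursion. A minimal standalone sketch of the "materialize before serializing" pattern (the names `partitionRows` and `roundTrip` are illustrative, not Spark APIs):

    ```scala
    import java.io.{ByteArrayInputStream, ByteArrayOutputStream, ObjectInputStream, ObjectOutputStream}

    object StreamSerializationSketch {
      // Stand-in for lazily-produced partition data: a Stream is a
      // recursive cons-cell structure in memory.
      def partitionRows: Stream[Int] = Stream.from(0).take(100000)

      // Serialize and deserialize a value with plain Java serialization.
      def roundTrip(value: AnyRef): AnyRef = {
        val buffer = new ByteArrayOutputStream()
        val out = new ObjectOutputStream(buffer)
        out.writeObject(value)
        out.close()
        val in = new ObjectInputStream(new ByteArrayInputStream(buffer.toByteArray))
        in.readObject()
      }

      def main(args: Array[String]): Unit = {
        // Converting to an Array flattens the recursive structure, so
        // serialization no longer nests one stack frame per element.
        val flat: Array[Int] = partitionRows.toArray
        val restored = roundTrip(flat).asInstanceOf[Array[Int]]
        println(restored.length)
      }
    }
    ```

    The same reasoning is why the fix keeps the materialized array (and why `@transient` alone is a weaker guarantee: it only helps if the field is never needed after deserialization).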


---
