GitHub user rdblue commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21111#discussion_r183101013
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/OptimizeMetadataOnlyQuery.scala ---
    @@ -114,11 +119,8 @@ case class OptimizeMetadataOnlyQuery(catalog: SessionCatalog) extends Rule[Logic
             relation match {
               case l @ LogicalRelation(fsRelation: HadoopFsRelation, _, _, isStreaming) =>
                 val partAttrs = getPartitionAttrs(fsRelation.partitionSchema.map(_.name), l)
    -            val partitionData = fsRelation.location.listFiles(relFilters, Nil)
    -            // partition data may be a stream, which can cause serialization to hit stack level too
    -            // deep exceptions because it is a recursive structure in memory. converting to array
    -            // avoids the problem.
    --- End diff ---
    
    Yes, that does fix it, but in a non-obvious way. What isn't clear is what guarantees that the rows used to construct the LocalRelation will never need to be serialized. Couldn't a future commit remove the `@transient` modifier and re-introduce the problem?
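    
    To make the concern concrete, here is a minimal sketch of the pattern (the `RelationLike` class and its field are hypothetical stand-ins, not the actual Spark code). Java serialization simply skips a `@transient` field, so nothing in the type system forces the rows to be serializable, and deleting that one modifier would silently bring the problem back:
    
    ```scala
    import java.io.{ByteArrayOutputStream, ObjectOutputStream}
    
    // Hypothetical stand-in for a plan node that holds the partition rows.
    // @transient tells Java serialization to skip this field, so the
    // (possibly recursive) Stream behind it is never written out.
    class RelationLike(@transient val rows: Seq[Int]) extends Serializable
    
    object TransientDemo {
      def main(args: Array[String]): Unit = {
        val relation = new RelationLike(Stream.range(0, 1000000))
        val out = new ObjectOutputStream(new ByteArrayOutputStream())
        out.writeObject(relation) // fine: the rows field is skipped entirely
        out.close()
        // Remove @transient above and writeObject would try to serialize the
        // Stream itself, re-introducing the problem the deleted comment described.
      }
    }
    ```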
    
    I would rather this return the data in a non-recursive structure, but it's a minor point.
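    
    For reference, this is the failure mode the deleted comment was guarding against. A minimal sketch, using a hand-rolled cons list as a stand-in for the recursive in-memory structure (none of this is Spark code): default Java serialization recurses once per cell, so a long enough chain overflows the stack, while the same data flattened to an array serializes without issue:
    
    ```scala
    import java.io.{ByteArrayOutputStream, ObjectOutputStream}
    
    object RecursiveSerializationDemo {
      // Stand-in for a recursive structure like scala.Stream: each cell
      // holds a direct reference to the next one. Case classes are
      // Serializable by default.
      final case class Cell(value: Int, next: Cell)
    
      def serialize(obj: AnyRef): Unit = {
        val out = new ObjectOutputStream(new ByteArrayOutputStream())
        out.writeObject(obj)
        out.close()
      }
    
      def main(args: Array[String]): Unit = {
        // Build a chain of one million cells.
        val chain = (1 to 1000000).foldLeft(null: Cell)((next, v) => Cell(v, next))
    
        // Flattening first avoids per-element recursion during serialization.
        val flat = Iterator.iterate(chain)(_.next).takeWhile(_ != null).map(_.value).toArray
        serialize(flat)
        println(s"array of ${flat.length} elements serialized fine")
    
        // Serializing the chain itself recurses once per cell inside
        // ObjectOutputStream and overflows the stack long before the end.
        try serialize(chain)
        catch { case _: StackOverflowError => println("recursive chain overflowed the stack") }
      }
    }
    ```
    
    A flat structure like an `Array` keeps the serialized form iterative regardless of what callers later do with it, which is why I'd prefer that over relying on `@transient`.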

