umehrot2 commented on issue #2180:
URL: https://github.com/apache/hudi/issues/2180#issuecomment-709480582


   This is strange. The exception appears to be happening at 
https://github.com/apache/hudi/blob/master/hudi-spark/src/main/scala/org/apache/hudi/MergeOnReadSnapshotRelation.scala#L138
 while doing `fileStatuses.toArray`.
   
   Now it is a `java.lang.ArrayStoreException`, which indicates that it is possibly trying to store the wrong type of object, `org.apache.spark.sql.execution.datasources.SerializableFileStatus`, in an array of `FileStatus`. That would mean Spark itself is returning `Seq[SerializableFileStatus]` instead of `Seq[FileStatus]`, which is not possible with open-source Spark.
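
   To make that failure mode concrete, here is a minimal, self-contained Scala sketch of the mechanism. `OtherStatus` is just a hypothetical stand-in for Spark's `SerializableFileStatus`; the point is that `toArray` on a `Seq[FileStatus]` allocates an `Array[FileStatus]` from the implicit `ClassTag`, and the JVM's array-store check rejects any element that is not actually a `FileStatus`:

   ```scala
   import org.apache.hadoop.fs.FileStatus

   // Hypothetical stand-in for a status class that does NOT extend FileStatus,
   // playing the role of Spark's SerializableFileStatus.
   class OtherStatus

   object ArrayStoreDemo extends App {
     // Type erasure lets a sequence of the wrong element type masquerade as Seq[FileStatus].
     val statuses: Seq[FileStatus] = Seq(new OtherStatus).asInstanceOf[Seq[FileStatus]]

     // toArray allocates an Array[FileStatus] (via the implicit ClassTag[FileStatus]) and
     // copies the elements into it; the JVM's array-store check fails because the runtime
     // element type is OtherStatus, not FileStatus.
     statuses.toArray // throws java.lang.ArrayStoreException
   }
   ```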
   
   In open-source Spark, `SerializableFileStatus` is always converted back to `FileStatus` before the listing is returned, and that's the contract: https://github.com/apache/spark/blob/v2.4.5/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/InMemoryFileIndex.scala#L246 . `SerializableFileStatus` is private to that class. So my guess is that the Databricks Spark implementation differs here and is possibly returning `Seq[SerializableFileStatus]`, which is why this is happening. Not sure whether this is something we want to consider fixing, and how. @bvaradar @garyli1019 
@vinothchandar 
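
   For reference, the sketch below is only a rough approximation of what the linked `InMemoryFileIndex` code does. `SerializableFileStatus`/`SerializableBlockLocation` are private to that class, so the field names here are assumptions; the idea is simply that the serializable wrappers are rebuilt into Hadoop `FileStatus` objects before the listing is handed back to callers:

   ```scala
   import org.apache.hadoop.fs.{BlockLocation, FileStatus, LocatedFileStatus, Path}

   object FileStatusConversion {
     // Hypothetical stand-ins for Spark's private wrapper classes; field names are assumed.
     case class SerializableBlockLocation(names: Array[String], hosts: Array[String],
                                          offset: Long, length: Long)
     case class SerializableFileStatus(path: String, length: Long, isDir: Boolean,
                                       blockReplication: Int, blockSize: Long,
                                       modificationTime: Long,
                                       blockLocations: Array[SerializableBlockLocation])

     // Rebuild a real Hadoop FileStatus from the serializable wrapper, which is the
     // conversion open-source Spark performs before returning the listing.
     def toFileStatus(f: SerializableFileStatus): FileStatus = {
       val locations = f.blockLocations.map(l => new BlockLocation(l.names, l.hosts, l.offset, l.length))
       val status = new FileStatus(f.length, f.isDir, f.blockReplication, f.blockSize,
                                   f.modificationTime, new Path(f.path))
       new LocatedFileStatus(status, locations)
     }
   }
   ```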
   

