dilip-k-m commented on pull request #4039: URL: https://github.com/apache/spark/pull/4039#issuecomment-635613051
I've hit the same issue in production and was able to reproduce it in our performance test environment. My conclusion: with the same cluster configuration, if a Spark job is fed input traffic at a growing rate, the Parquet files it writes after processing the feed can end up with corrupted footers. The probability of footer corruption increases when the input contains more unique values (i.e., the input file has more distinct field values and less redundancy). Increasing the number of write partitions also reduces this probability. I have found that Spark 2.x does not have this issue.
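For reference, a minimal sketch of the partition-count workaround described above. This is an illustration, not the reporter's actual job: the input/output paths and the partition count of 200 are placeholder assumptions, and it uses the Spark 2.x `SparkSession` API.

```scala
import org.apache.spark.sql.SparkSession

object RepartitionBeforeWrite {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("parquet-footer-workaround")
      .getOrCreate()

    // Placeholder input path; any DataFrame source applies.
    val df = spark.read.json("/path/to/input")

    // Raising the partition count spreads the write across more, smaller
    // Parquet files, which the report above suggests lowers the chance of
    // a corrupted footer. The count of 200 is a placeholder.
    df.repartition(200)
      .write
      .parquet("/path/to/output") // placeholder output path

    spark.stop()
  }
}
```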