GitHub user vamaral1 commented on the issue:
https://github.com/apache/spark/pull/21397

Thanks for the quick responses. I did try building everything from scratch and still get the error on large datasets. Runs on a few tens of GB complete without problems, but once the input reaches a couple hundred GB, the issue starts to appear. I will try to create a reproducible example and post it here shortly.