Github user srowen commented on the issue: https://github.com/apache/spark/pull/19266

Yeah, agree, it could be some global constant. I don't think it should be configurable. Ideally it would be determined from the JVM, but I don't know of a way to do that. In many cases, assuming Int.MaxValue is the max array size when it's really Int.MaxValue - 8 doesn't matter much. For example, arguably I should leave the ML changes alone here: in the very rare case that a matrix size falls somewhere between Int.MaxValue - 8 and Int.MaxValue, it will fail anyway, and that's not avoidable given the user input. It's also, maybe, more conservative not to assume that anything beyond Int.MaxValue - 8 will fail, and not to "proactively" fail at this cutoff.

However, I think there is a smallish number of identifiable cases where Spark can very much avoid the failure (like BufferHolder), and those are the instances where an array size keeps doubling. Maybe we can stick to those clear cases, especially any that seem to have triggered the original error? Those cases are few enough and related enough that I'm sure they're just one issue, not several.
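To illustrate the "doubling" cases being discussed: the risk is that repeatedly doubling a buffer size can jump past the JVM's practical array-size limit (roughly Int.MaxValue - 8 on most JVMs) and either overflow or trigger an OutOfMemoryError, even when clamping to the limit would have been enough. The sketch below is a minimal, hypothetical example of that clamping idea; the object and method names are made up for illustration and are not Spark's actual BufferHolder API.

```scala
object ArrayGrowth {
  // Most JVMs refuse to allocate arrays larger than roughly Int.MaxValue - 8
  // (a few words are reserved for the array header). This constant and the
  // helper below are a hypothetical sketch, not Spark's actual code.
  val MaxArrayLength: Int = Int.MaxValue - 8

  /** Double `currentSize` until it covers `required`, clamping at the JVM limit. */
  def grownSize(currentSize: Int, required: Int): Int = {
    require(required <= MaxArrayLength,
      s"Cannot allocate an array of $required elements: exceeds JVM limit $MaxArrayLength")
    var newSize = math.max(currentSize, 1)
    while (newSize < required) {
      // Doubling past the cap would overflow Int to a negative value, so clamp instead.
      newSize = if (newSize > MaxArrayLength / 2) MaxArrayLength else newSize * 2
    }
    newSize
  }
}
```

With a clamp like this, a grow-by-doubling buffer only fails when the required size itself exceeds the JVM limit, which matches the point above: fail where it's unavoidable, but don't fail early just because doubling overshot the cutoff.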