Github user srowen commented on the issue:
https://github.com/apache/spark/pull/19266
Yeah, agreed, it could be some global constant. I don't think it should be
configurable. Ideally it would be determined from the JVM, but I don't know a
way to do that.
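For reference, something like the sketch below is what I have in mind. The object and constant name are just illustrative, not anything already in Spark; Int.MaxValue - 8 is the commonly cited safe HotSpot bound (it's the same value java.util.ArrayList uses for its MAX_ARRAY_SIZE), since the JVM doesn't expose the real limit anywhere I know of.

```scala
// Hypothetical shared constant, a sketch only: one place to define the
// largest array length we consider safe to allocate on the JVM.
object ByteArrayLimits {
  // Int.MaxValue - 8 is the widely used conservative bound for HotSpot;
  // the actual VM limit isn't queryable at runtime.
  val MaxArrayLength: Int = Int.MaxValue - 8
}
```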
In many cases, assuming the max array size is Int.MaxValue when it's really
Int.MaxValue - 8 doesn't matter much. For example, arguably I should leave the
ML changes alone here, because in the very rare case that a matrix size falls
somewhere between Int.MaxValue - 8 and Int.MaxValue, it will fail anyway, and
that's not avoidable given the user input. It's also, maybe, more conservative
not to assume that everything beyond Int.MaxValue - 8 will fail, and so not to
"proactively" fail at this cutoff.
However, I think there are a smallish number of identifiable cases where
Spark can very much avoid the failure (like BufferHolder), and they're the
instances where an array size keeps doubling. Maybe we should stick to those
clear cases, especially any that seem to have triggered the original error?
Those cases are few enough and related enough that I'm sure they're just
one issue, not several. A rough sketch of the doubling-with-a-cap pattern I
mean is below.
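This is not the actual BufferHolder code, just an illustration of the growth path I'm describing: when doubling would overshoot, clamp to the safe limit instead of sailing past Int.MaxValue - 8 and failing in the allocator.

```scala
// Sketch of a doubling grow() that clamps at the safe JVM array limit.
// Names and signature are hypothetical, not Spark's actual BufferHolder API.
def growTo(current: Array[Byte], neededSize: Int): Array[Byte] = {
  val maxLength = Int.MaxValue - 8
  require(neededSize <= maxLength,
    s"Cannot allocate $neededSize bytes; max array length is $maxLength")
  if (neededSize <= current.length) {
    current
  } else {
    // Double the capacity, but never beyond the safe maximum.
    val doubled = math.min(maxLength.toLong, current.length.toLong * 2).toInt
    val grown = new Array[Byte](math.max(doubled, neededSize))
    System.arraycopy(current, 0, grown, 0, current.length)
    grown
  }
}
```

The only point of the sketch is that the growth path itself respects the limit, which is exactly what the doubling cases need and what the one-off ML allocations don't really benefit from.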