Github user henrify commented on the issue:
https://github.com/apache/spark/pull/19943
@dongjoon-hyun Thanks. I don't think it matters whether nextBatch() is inlined
or not. I think what matters is 1) how the putX() etc. method calls inside the
tight loops are inlined and 2) how complex the methods containing the tight
loops are.
For example, the toColumn argument is megamorphic and the putX()
implementation is bimorphic, and then you have about 10 of these in a single
method inside if-else 'instanceof' checks. That's quite complex for the JVM to
optimize.
If you split the loops so that each loop has its own method with the
toColumn defined as an exact type (BytesColumnVector etc.), then the argument is
monomorphic, putX() is 100% biased bimorphic, and there is only one of these
per method. That is a lot easier for the JVM to optimize.
Again, I'm not sure if it makes a difference, but it may, and it is easy to
try (e.g. extract the for loops of just one data type into a separate method
and benchmark).
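
To make the suggestion concrete, here is a rough sketch of the refactoring I have in mind. None of these classes are the real Spark or ORC ones: SourceVector, LongSource, DoubleSource, TargetVector, OnHeapTarget and SplitLoops are all hypothetical stand-ins, reduced to two data types instead of about 10. copyCombined() keeps every loop in one method behind 'instanceof' checks; copySplit() only dispatches, and each loop lives in a small method whose argument has an exact type.

```java
// Hypothetical stand-ins for illustration only; not the real Spark/ORC classes.
abstract class SourceVector { }

final class LongSource extends SourceVector {
    final long[] vector;
    LongSource(long[] vector) { this.vector = vector; }
}

final class DoubleSource extends SourceVector {
    final double[] vector;
    DoubleSource(double[] vector) { this.vector = vector; }
}

abstract class TargetVector {
    abstract void putLong(int rowId, long value);
    abstract void putDouble(int rowId, double value);
}

final class OnHeapTarget extends TargetVector {
    final long[] longs;
    final double[] doubles;
    OnHeapTarget(int capacity) {
        longs = new long[capacity];
        doubles = new double[capacity];
    }
    @Override void putLong(int rowId, long value) { longs[rowId] = value; }
    @Override void putDouble(int rowId, double value) { doubles[rowId] = value; }
}

final class SplitLoops {
    // Before: all tight loops sit in one method behind instanceof checks.
    static void copyCombined(SourceVector from, TargetVector to, int n) {
        if (from instanceof LongSource) {
            long[] data = ((LongSource) from).vector;
            for (int i = 0; i < n; i++) {
                to.putLong(i, data[i]);
            }
        } else if (from instanceof DoubleSource) {
            double[] data = ((DoubleSource) from).vector;
            for (int i = 0; i < n; i++) {
                to.putDouble(i, data[i]);
            }
        } else {
            throw new IllegalArgumentException("unsupported vector type");
        }
    }

    // After: this method only dispatches; each loop has its own small method.
    static void copySplit(SourceVector from, TargetVector to, int n) {
        if (from instanceof LongSource) {
            copyLongs((LongSource) from, to, n);
        } else if (from instanceof DoubleSource) {
            copyDoubles((DoubleSource) from, to, n);
        } else {
            throw new IllegalArgumentException("unsupported vector type");
        }
    }

    // The argument has an exact type and the method contains a single loop.
    private static void copyLongs(LongSource from, TargetVector to, int n) {
        long[] data = from.vector;
        for (int i = 0; i < n; i++) {
            to.putLong(i, data[i]);
        }
    }

    private static void copyDoubles(DoubleSource from, TargetVector to, int n) {
        double[] data = from.vector;
        for (int i = 0; i < n; i++) {
            to.putDouble(i, data[i]);
        }
    }
}
```

Both variants copy the same values, so the only thing to compare is throughput. Whether the split version is actually faster depends on what the JIT does with it, which is why it needs to be benchmarked rather than assumed.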