[GitHub] spark issue #19943: [SPARK-16060][SQL] Support Vectorized ORC Reader

henrify Mon, 08 Jan 2018 14:45:39 -0800

Github user henrify commented on the issue:

    https://github.com/apache/spark/pull/19943
  
    @dongjoon-hyun Thanks. I don't think it matters whether nextBatch() is inlined 
or not. I think what matters is 1) how the putX() etc. method calls inside the 
tight loops are inlined and 2) how complex the methods containing the tight 
loops are.
    
    For example, the toColumn argument is megamorphic and the putX() 
implementation is bimorphic, and then you have about 10 of these in a single 
method inside if-else 'instanceof' checks. That's quite complex for the JVM to 
optimize.
    
    If you split the loops so that each loop has its own method with the 
toColumn defined as an exact type (BytesColumnVector etc.), then the argument is 
monomorphic, putX() is 100% biased bimorphic, and there is only one of these 
per method. That's a lot easier for the JVM to optimize.
    
    Again, I'm not sure if it makes a difference, but it may, and it is easy to 
try (e.g. extract the for loops of just one data type into a separate method and 
benchmark).
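
    A rough sketch of the kind of split meant above, not the actual PR code. 
All class and method names here (Src, LongSrc, DoubleSrc, Dst, copyAllInOne, 
copy, copyLongs, copyDoubles) are hypothetical stand-ins, and the sketch does 
not model the on-heap/off-heap putX() variants. Whether the split is actually 
faster would still need to be benchmarked.

```java
import java.util.Arrays;

// Hypothetical stand-ins for illustration; NOT the real ORC or Spark classes.
final class SplitLoops {
    static abstract class Src {}

    static final class LongSrc extends Src {
        final long[] vector;
        LongSrc(long[] vector) { this.vector = vector; }
    }

    static final class DoubleSrc extends Src {
        final double[] vector;
        DoubleSrc(double[] vector) { this.vector = vector; }
    }

    static final class Dst {
        final long[] longs;
        final double[] doubles;
        Dst(int capacity) {
            longs = new long[capacity];
            doubles = new double[capacity];
        }
        void putLong(int rowId, long value) { longs[rowId] = value; }
        void putDouble(int rowId, double value) { doubles[rowId] = value; }
    }

    // Before: one large method holding every tight loop behind instanceof checks.
    static void copyAllInOne(Src from, Dst to, int n) {
        if (from instanceof LongSrc) {
            long[] data = ((LongSrc) from).vector;
            for (int i = 0; i < n; i++) to.putLong(i, data[i]);
        } else if (from instanceof DoubleSrc) {
            double[] data = ((DoubleSrc) from).vector;
            for (int i = 0; i < n; i++) to.putDouble(i, data[i]);
        } else {
            throw new UnsupportedOperationException(from.getClass().getName());
        }
    }

    // After: the dispatcher only checks the type; each loop lives in its own
    // small method whose parameter is declared with the exact type.
    static void copy(Src from, Dst to, int n) {
        if (from instanceof LongSrc) copyLongs((LongSrc) from, to, n);
        else if (from instanceof DoubleSrc) copyDoubles((DoubleSrc) from, to, n);
        else throw new UnsupportedOperationException(from.getClass().getName());
    }

    static void copyLongs(LongSrc from, Dst to, int n) {
        long[] data = from.vector;
        for (int i = 0; i < n; i++) to.putLong(i, data[i]);
    }

    static void copyDoubles(DoubleSrc from, Dst to, int n) {
        double[] data = from.vector;
        for (int i = 0; i < n; i++) to.putDouble(i, data[i]);
    }

    public static void main(String[] args) {
        long[] values = {1L, 2L, 3L};
        Dst a = new Dst(values.length);
        Dst b = new Dst(values.length);
        copyAllInOne(new LongSrc(values), a, values.length);
        copy(new LongSrc(values), b, values.length);
        // Both variants should fill the destination identically.
        System.out.println(Arrays.equals(a.longs, b.longs));
    }
}
```

    Both variants do the same work; the only difference is that each loop in 
the second one sits in a small method of its own, which is the part that would 
be worth benchmarking for a single data type first.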



---

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
