LuciferYang opened a new pull request, #55816:
URL: https://github.com/apache/spark/pull/55816

   ### What changes were proposed in this pull request?
   
   Extend the bulk read+widen pattern introduced in SPARK-56791 to 
`FloatToDoubleUpdater` (parquet FLOAT read into Spark `DoubleType`).
   
   A new `readFloatsAsDoubles` default method on `VectorizedValuesReader` 
provides the per-row fallback. `VectorizedPlainValuesReader` overrides it to 
fetch the source bytes once via `getBuffer(total * 4)` and run a tight 
in-method conversion loop. `FloatToDoubleUpdater.readValues` becomes a one-line 
delegation. The widening is Java's primitive float-to-double conversion: exact 
for every finite and infinite float; a NaN float widens to a double NaN (the 
JVM may canonicalize the payload).
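
   For illustration only, here is a minimal standalone sketch of the two 
read strategies. It is not the code in this PR: the class 
`FloatWidenSketch`, its method names, and the `encode` helper are 
hypothetical, and a plain little-endian `ByteBuffer` stands in for the 
reader's internal buffer (PLAIN-encoded parquet FLOAT values are 4-byte 
little-endian IEEE 754).

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

public class FloatWidenSketch {

    // Hypothetical helper: packs floats as 4-byte little-endian values.
    static ByteBuffer encode(float[] values) {
        ByteBuffer buf = ByteBuffer.allocate(values.length * 4).order(ByteOrder.LITTLE_ENDIAN);
        for (float v : values) {
            buf.putFloat(v);
        }
        buf.flip();
        return buf;
    }

    // Per-row strategy: takes one fresh slice per element, which is the
    // per-element allocation cost the PR description attributes to the legacy path.
    static void readFloatsAsDoublesPerRow(ByteBuffer src, int total, double[] out, int offset) {
        for (int i = 0; i < total; i++) {
            ByteBuffer one = src.slice().order(ByteOrder.LITTLE_ENDIAN);
            out[offset + i] = one.getFloat(0); // implicit float -> double widening
            src.position(src.position() + 4);
        }
    }

    // Bulk strategy: takes a single slice covering all values, then widens
    // in a tight loop using absolute reads.
    static void readFloatsAsDoublesBulk(ByteBuffer src, int total, double[] out, int offset) {
        ByteBuffer all = src.slice().order(ByteOrder.LITTLE_ENDIAN);
        for (int i = 0; i < total; i++) {
            out[offset + i] = all.getFloat(i * 4); // implicit float -> double widening
        }
        src.position(src.position() + total * 4);
    }

    public static void main(String[] args) {
        float[] values = {1.5f, -0.1f, Float.POSITIVE_INFINITY, Float.NaN, Float.MAX_VALUE};

        double[] perRow = new double[values.length];
        readFloatsAsDoublesPerRow(encode(values), values.length, perRow, 0);

        double[] bulk = new double[values.length];
        readFloatsAsDoublesBulk(encode(values), values.length, bulk, 0);

        for (int i = 0; i < values.length; i++) {
            // Double.compare treats NaN as equal to NaN, so it is safe for every element here.
            boolean same = Double.compare(perRow[i], bulk[i]) == 0;
            System.out.println(values[i] + " -> " + bulk[i] + " (strategies agree: " + same + ")");
        }
    }
}
```

   Both methods yield the same doubles; the bulk variant differs only in 
taking one slice instead of one per value.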
   
   ### Why are the changes needed?
   
   On the legacy path, `FloatToDoubleUpdater.readValues` calls `getBuffer(4)` 
for every element, and each call allocates a fresh `ByteBuffer` slice; that 
allocation dominates the loop. Collapsing N allocations into one is the same 
win SPARK-56791 delivered for the INT32 -> Long sibling.
   
   ### Does this PR introduce _any_ user-facing change?
   
   No.
   
   ### How was this patch tested?
   
   (To be updated after the GHA benchmark and test runs complete.)
   
   ### Was this patch authored or co-authored using generative AI tooling?
   
   Generated-by: Claude Code


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

