bersprockets opened a new pull request #35120: URL: https://github.com/apache/spark/pull/35120
### What changes were proposed in this pull request?

Change the Orc struct converter to index an array rather than a linked list when looking up field converters.

### Why are the changes needed?

Currently, the OrcSerializer's struct converter uses an index to look up each field converter in a linked list. Each indexed lookup walks n/2 list nodes on average and there are n lookups per row, resulting in an n*(n/2) average complexity per row (where n is the field count). Simply converting the linked list to an array brings performance gains, especially for wide structs.

| field count | row count | master | pr | improvement |
| ----------- | --------- | ------ | ----- | ----------- |
| 10 | 15728640 | 4729 | 4338 | none |
| 100 | 157286 | 5270 | 4064 | 22% |
| 600 | 26214 | 13548 | 4726 | 65% |

The above benchmarks were run on my local machine. Official benchmarks are forthcoming.

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

- Existing unit tests
- New benchmark (in a separate PR)
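
To illustrate the complexity argument, here is a minimal sketch in Python rather than Spark's Scala. It is not the OrcSerializer code: `Node`, `linked_get`, `hops_per_row_linked`, and `lookups_per_row_array` are hypothetical names, and the integers stand in for field converters. It counts how many list nodes are traversed when every field of one row is looked up by index in a singly linked list, and compares that with direct array indexing.

```python
class Node:
    """One cell of a singly linked list (hypothetical stand-in for a linked list of converters)."""
    __slots__ = ("value", "next")

    def __init__(self, value, next=None):
        self.value = value
        self.next = next


def build_linked_list(values):
    """Build a singly linked list that preserves the order of `values`."""
    head = None
    for v in reversed(values):
        head = Node(v, head)
    return head


def linked_get(head, index):
    """Return (value, hops): the element at `index` and the number of nodes walked to reach it."""
    node = head
    hops = 0
    for _ in range(index):
        node = node.next
        hops += 1
    return node.value, hops


def hops_per_row_linked(n):
    """Total nodes walked when looking up all n field converters by index for one row."""
    head = build_linked_list(list(range(n)))
    total = 0
    for i in range(n):
        value, hops = linked_get(head, i)
        assert value == i  # sanity check: the lookup found the right "converter"
        total += hops
    return total


def lookups_per_row_array(n):
    """Number of constant-time accesses when the converters live in an array."""
    converters = list(range(n))  # a Python list is array-backed
    count = 0
    for i in range(n):
        _ = converters[i]  # direct index, no traversal
        count += 1
    return count


if __name__ == "__main__":
    for n in (10, 100, 600):
        print(n, hops_per_row_linked(n), lookups_per_row_array(n))
```

With this counting, the linked list costs n*(n-1)/2 node hops per row, which is roughly the n*(n/2) stated above, while the array costs n direct accesses. This matches the benchmark trend that the gain grows with the field count.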
