awdavidson commented on PR #36150: URL: https://github.com/apache/spark/pull/36150#issuecomment-1107809212
Last week, out of interest, I ran a basic local benchmark of the alternatives. I thought I would share my results, but it is probably worth doing your own comparison too. Times are in `ms` and were measured by wrapping the execution in `spark.time(...)`.

**Number of Rows: 10000000**

| Method / Number of Columns | 10 | 30 | 50 | 70 | 90 | 110 |
|---|---|---|---|---|---|---|
| Stack | 37663 | 112470 | 209787 | 355580 | 467684 | 605764 |
| Explode | 34038 | 99185 | 174957 | 364611 | 455648 | 843931 |
| FlatMap | 54867 | 194701 | 356367 | 527655 | 721415 | DNF |

**Number of Rows: 1000000**

| Method / Number of Columns | 10 | 30 | 50 | 70 | 90 | 110 |
|---|---|---|---|---|---|---|
| Stack | 6093 | 11134 | 19239 | 33560 | 41202 | 50834 |
| Explode | 4529 | 10127 | 17087 | 24381 | 45797 | 60731 |
