sadikovi commented on pull request #34596:
URL: https://github.com/apache/spark/pull/34596#issuecomment-978913178
The benchmark results are roughly the same, with some variability between runs. I think we are good here; no separate option is required.

Without the PR changes (bb9e1d92d931a064c52cbc4cc84eaa32528809f0):

```
[info] OpenJDK 64-Bit Server VM 1.8.0_292-8u292-b10-0ubuntu1~18.04-b10 on Linux 5.4.0-1045-aws
[info] Intel(R) Xeon(R) Platinum 8259CL CPU @ 2.50GHz
[info] Write dates and timestamps:               Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
[info] ------------------------------------------------------------------------------------------------------------------------
[info] Create a dataset of timestamps                     1170           1233          60          8.5         117.0       1.0X
[info] to_csv(timestamp)                                  9771           9838          58          1.0         977.1       0.1X
[info] write timestamps to files                          8752           8790          34          1.1         875.2       0.1X
[info] Create a dataset of dates                          1330           1341           9          7.5         133.0       0.9X
[info] to_csv(date)                                       6502           6518          14          1.5         650.2       0.2X
[info] write dates to files                               5487           5503          14          1.8         548.7       0.2X

[info] OpenJDK 64-Bit Server VM 1.8.0_292-8u292-b10-0ubuntu1~18.04-b10 on Linux 5.4.0-1045-aws
[info] Intel(R) Xeon(R) Platinum 8259CL CPU @ 2.50GHz
[info] Read dates and timestamps:                Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
[info] ------------------------------------------------------------------------------------------------------------------------
[info] read timestamp text from files                     1508           1535          26          6.6         150.8       1.0X
[info] read timestamps from files                        24018          24608         531          0.4        2401.8       0.1X
[info] infer timestamps from files                       51043          51171         111          0.2        5104.3       0.0X
[info] read date text from files                          1437           1451          15          7.0         143.7       1.0X
[info] read date from files                               9391           9433          51          1.1         939.1       0.2X
[info] infer date from files                             21983          22029          77          0.5        2198.3       0.1X
[info] timestamp strings                                  2488           2519          46          4.0         248.8       0.6X
[info] parse timestamps from Dataset[String]             27073          27108          33          0.4        2707.3       0.1X
[info] infer timestamps from Dataset[String]             53325          53399         106          0.2        5332.5       0.0X
[info] date strings                                       2802           2809           6          3.6         280.2       0.5X
[info] parse dates from Dataset[String]                  11487          11577          96          0.9        1148.7       0.1X
[info] from_csv(timestamp)                               25019          25068          55          0.4        2501.9       0.1X
[info] from_csv(date)                                    10394          10431          39          1.0        1039.4       0.1X
```

With the PR changes:

```
[info] OpenJDK 64-Bit Server VM 1.8.0_292-8u292-b10-0ubuntu1~18.04-b10 on Linux 5.4.0-1045-aws
[info] Intel(R) Xeon(R) Platinum 8259CL CPU @ 2.50GHz
[info] Write dates and timestamps:               Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
[info] ------------------------------------------------------------------------------------------------------------------------
[info] Create a dataset of timestamps                     1164           1215          44          8.6         116.4       1.0X
[info] to_csv(timestamp)                                  9733           9831         125          1.0         973.3       0.1X
[info] write timestamps to files                          8810           8832          22          1.1         881.0       0.1X
[info] Create a dataset of dates                          1339           1348           9          7.5         133.9       0.9X
[info] to_csv(date)                                       6511           6519          12          1.5         651.1       0.2X
[info] write dates to files                               5488           5500          11          1.8         548.8       0.2X

[info] OpenJDK 64-Bit Server VM 1.8.0_292-8u292-b10-0ubuntu1~18.04-b10 on Linux 5.4.0-1045-aws
[info] Intel(R) Xeon(R) Platinum 8259CL CPU @ 2.50GHz
[info] Read dates and timestamps:                Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
[info] ------------------------------------------------------------------------------------------------------------------------
[info] read timestamp text from files                     1479           1488          10          6.8         147.9       1.0X
[info] read timestamps from files                        24271          24680         412          0.4        2427.1       0.1X
[info] infer timestamps from files                       50436          50497          54          0.2        5043.6       0.0X
[info] read date text from files                          1422           1441          25          7.0         142.2       1.0X
[info] read date from files                               9725           9795          63          1.0         972.5       0.2X
[info] infer date from files                             21550          21572          28          0.5        2155.0       0.1X
[info] timestamp strings                                  2483           2528          39          4.0         248.3       0.6X
[info] parse timestamps from Dataset[String]             27110          27199          82          0.4        2711.0       0.1X
[info] infer timestamps from Dataset[String]             53590          53720         147          0.2        5359.0       0.0X
[info] date strings                                       2635           2644          15          3.8         263.5       0.6X
[info] parse dates from Dataset[String]                  11662          11714          56          0.9        1166.2       0.1X
[info] from_csv(timestamp)                               25599          25715         139          0.4        2559.9       0.1X
[info] from_csv(date)                                    10838          10885          41          0.9        1083.8       0.1X
[success] Total time: 1164 s (19:24), completed Nov 25, 2021 7:00:03 AM
```
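To put "roughly the same" into numbers, here is a quick Python sketch that computes the relative change in Best Time(ms) for every case, using the values copied from the two runs above. It is not part of Spark's benchmark tooling; `BEST_TIME_MS`, `pct_change` and `summarize` are hypothetical names used only for this comparison.

```python
# Best Time(ms) per benchmark case, copied from the two runs above as
# (without the PR changes, with the PR changes).
BEST_TIME_MS = {
    # Write dates and timestamps
    "Create a dataset of timestamps": (1170, 1164),
    "to_csv(timestamp)": (9771, 9733),
    "write timestamps to files": (8752, 8810),
    "Create a dataset of dates": (1330, 1339),
    "to_csv(date)": (6502, 6511),
    "write dates to files": (5487, 5488),
    # Read dates and timestamps
    "read timestamp text from files": (1508, 1479),
    "read timestamps from files": (24018, 24271),
    "infer timestamps from files": (51043, 50436),
    "read date text from files": (1437, 1422),
    "read date from files": (9391, 9725),
    "infer date from files": (21983, 21550),
    "timestamp strings": (2488, 2483),
    "parse timestamps from Dataset[String]": (27073, 27110),
    "infer timestamps from Dataset[String]": (53325, 53590),
    "date strings": (2802, 2635),
    "parse dates from Dataset[String]": (11487, 11662),
    "from_csv(timestamp)": (25019, 25599),
    "from_csv(date)": (10394, 10838),
}


def pct_change(before: float, after: float) -> float:
    """Relative change in percent; positive means the PR run was slower."""
    return (after - before) / before * 100.0


def summarize(results: dict) -> list:
    """Return (case, change %) pairs, largest absolute change first."""
    changes = [(name, pct_change(b, a)) for name, (b, a) in results.items()]
    return sorted(changes, key=lambda item: abs(item[1]), reverse=True)


if __name__ == "__main__":
    for name, change in summarize(BEST_TIME_MS):
        print(f"{name:<40} {change:+6.2f}%")
```

Sorted by absolute change, the largest move is `date strings` (about 6% faster with the PR) and the largest slowdown is `from_csv(date)` (about 4.3% slower). Shifts of a few percent in both directions fit the "some variability" reading, although a single run on each side cannot separate a small real regression from noise.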
