[GitHub] [systemds] Baunsgaard commented on pull request #1127: [SYSTEMDS-2760] Transpose micro benchmark
Baunsgaard commented on pull request #1127:
URL: https://github.com/apache/systemds/pull/1127#issuecomment-748005848

When comparing before and after (I tested this by dropping the transpose commit from the history), it looks like I might have done something wrong in the initial tests. That said, the changes do not appear to have had any impact, but they did make me notice that the difference between executions of the wide transpose is large: sometimes it takes 5 seconds, sometimes 2.5. I'm guessing it has to do with the two NUMA nodes?

The full transpose micro benchmark, after the change, on Alpha:

```code
scripts/perftest/results/transpose-skinny-1.0.log
Total elapsed time: 5.177 sec.
1 r' 2.567 5
Total elapsed time: 5.592 sec.
1 r' 2.487 5
Total elapsed time: 5.394 sec.
2 r' 2.393 5
Total elapsed time: 5.607 sec.
1 r' 2.496 5
Total elapsed time: 5.361 sec.
1 r' 2.531 5
195735.81 msec task-clock # 31.188 CPUs utilized ( +- 3.50% )
595845281584 cycles # 3.044 GHz ( +- 3.34% ) (30.75%)
67405027834 instructions # 0.11 insn per cycle ( +- 2.26% ) (38.51%)

scripts/perftest/results/transpose-wide-1.0.log
Total elapsed time: 4.870 sec.
1 r' 2.439 5
Total elapsed time: 5.466 sec.
1 r' 2.418 5
Total elapsed time: 5.381 sec.
1 r' 2.393 5
Total elapsed time: 5.257 sec.
1 r' 2.343 5
Total elapsed time: 4.880 sec.
1 r' 2.453 5
197370.59 msec task-clock # 32.701 CPUs utilized ( +- 6.74% )
598434626116 cycles # 3.032 GHz ( +- 6.70% ) (30.76%)
70128163005 instructions # 0.12 insn per cycle ( +- 1.65% ) (38.51%)

scripts/perftest/results/transpose-full-1.0.log
Total elapsed time: 3.736 sec.
2 r' 1.343 5
Total elapsed time: 3.858 sec.
2 r' 1.326 5
Total elapsed time: 3.500 sec.
2 r' 1.299 5
Total elapsed time: 3.894 sec.
2 r' 1.305 5
Total elapsed time: 3.526 sec.
2 r' 1.304 5
104490.76 msec task-clock # 22.819 CPUs utilized ( +- 1.56% )
320478636150 cycles # 3.067 GHz ( +- 1.69% ) (30.80%)
62146562879 instructions # 0.19 insn per cycle ( +- 1.59% ) (38.55%)

scripts/perftest/results/transpose-skinny-0.1.log
Total elapsed time: 2.701 sec.
1 r' 1.437 5
Total elapsed time: 2.659 sec.
1 r' 1.141 5
Total elapsed time: 3.174 sec.
1 r' 1.761 5
Total elapsed time: 2.705 sec.
1 r' 1.103 5
Total elapsed time: 3.112 sec.
1 r' 1.472 5
152922.25 msec task-clock # 43.917 CPUs utilized ( +- 5.32% )
473697710114 cycles # 3.098 GHz ( +- 5.32% ) (31.11%)
75871932728 instructions # 0.16 insn per cycle ( +- 2.13% ) (38.92%)

scripts/perftest/results/transpose-wide-0.1.log
Total elapsed time: 7.215 sec.
1 r' 5.376 5
Total elapsed time: 6.703 sec.
1 r' 4.871 5
Total elapsed time: 4.625 sec.
1 r' 2.815 5
Total elapsed time: 4.400 sec.
1 r' 2.592 5
Total elapsed time: 5.506 sec.
1 r' 3.721 5
214645.79 msec task-clock # 33.943 CPUs utilized ( +- 18.68% )
658068071617 cycles # 3.066 GHz ( +- 18.75% ) (30.71%)
78768925872 instructions # 0.12 insn per cycle ( +- 21.76% ) (38.42%)

scripts/perftest/results/transpose-full-0.1.log
Total elapsed time: 1.368 sec.
1 r' 0.583 5
Total elapsed time: 1.365 sec.
1 r' 0.574 5
Total elapsed time: 1.724 sec.
1 r' 0.835 5
Total elapsed time: 1.564 sec.
1 r' 0.708 5
Total elapsed time: 1.404 sec.
1 r' 0.522 5
79268.38 msec task-clock#
```
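To put a number on the run-to-run variation mentioned in this comment, here is a minimal Python sketch that pulls the `r'` timings out of a log and summarizes their spread. It assumes the columns of each heavy-hitter line are rank, opcode, time in seconds, and call count; the helper names `transpose_times` and `summarize` are made up for illustration and are not part of SystemDS or the perftest scripts. The sample values are the five `r'` timings from `transpose-wide-0.1.log`.

```python
import re
import statistics

# Assumed line layout: "<rank> r' <seconds> <count>", e.g. "1 r' 5.376 5".
HEAVY_HITTER = re.compile(r"^\s*\d+\s+r'\s+([0-9.]+)\s+\d+\s*$")


def transpose_times(log_text):
    """Return the r' times (seconds) found in a benchmark log, in order."""
    times = []
    for line in log_text.splitlines():
        match = HEAVY_HITTER.match(line)
        if match:
            times.append(float(match.group(1)))
    return times


def summarize(times):
    """Summarize the spread of a list of timings (needs at least two values)."""
    mean = statistics.mean(times)
    stdev = statistics.stdev(times)  # sample standard deviation
    return {
        "min": min(times),
        "max": max(times),
        "mean": mean,
        "stdev": stdev,
        "cv": stdev / mean,  # coefficient of variation
    }


# The five runs from transpose-wide-0.1.log, one line per log entry.
WIDE_01 = """\
Total elapsed time: 7.215 sec.
1 r' 5.376 5
Total elapsed time: 6.703 sec.
1 r' 4.871 5
Total elapsed time: 4.625 sec.
1 r' 2.815 5
Total elapsed time: 4.400 sec.
1 r' 2.592 5
Total elapsed time: 5.506 sec.
1 r' 3.721 5
"""

times = transpose_times(WIDE_01)
stats = summarize(times)
print(times)
print({key: round(value, 3) for key, value in stats.items()})
```

Running the same summary over each log would make it easier to see which configurations are noisy and which are stable. It does not say anything about the cause; the NUMA guess would need a separate check.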
Baunsgaard commented on pull request #1127:
URL: https://github.com/apache/systemds/pull/1127#issuecomment-747426098

> I'll have a look tonight and see what we can do. Airline was dense, right?

Yes, Airline is dense, and I don't seem to be able to reproduce the bad performance when calling transpose in a script. The dimensions of Airline are: 14.5 mil rows, 29 cols, 2200 mil nnz.

This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Baunsgaard commented on pull request #1127:
URL: https://github.com/apache/systemds/pull/1127#issuecomment-747424955

The large 15 mil case shows little to no difference, but there is still a bug somewhere.

XPS:

```bash
scripts/perftest/results/transpose-large.log
Total elapsed time: 7.377 sec.
1 r' 4.352 1
Total elapsed time: 7.835 sec.
1 r' 4.649 1
Total elapsed time: 7.659 sec.
1 r' 4.398 1
Total elapsed time: 7.903 sec.
1 r' 4.677 1
Total elapsed time: 7.723 sec.
1 r' 4.445 1
36.435,71 msec task-clock # 4,264 CPUs utilized ( +- 1,27% )
134.881.449.707 cycles # 3,702 GHz ( +- 0,43% ) (30,65%)
119.303.817.112 instructions # 0,88 insn per cycle ( +- 0,37% ) (38,39%)
```

Alpha:

```bash
scripts/perftest/results/transpose-large.log
Total elapsed time: 8.531 sec.
1 r' 5.459 1
Total elapsed time: 8.366 sec.
1 r' 5.412 1
Total elapsed time: 10.413 sec.
1 r' 7.507 1
Total elapsed time: 8.373 sec.
1 r' 5.420 1
Total elapsed time: 8.254 sec.
1 r' 5.394 1
100414.75 msec task-clock # 10.271 CPUs utilized ( +- 5.07% )
314073685855 cycles # 3.128 GHz ( +- 4.82% ) (30.86%)
127951221368 instructions # 0.41 insn per cycle ( +- 3.10% ) (38.62%)
```
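The two machines print their `perf stat` counters with different number formatting: the XPS output uses `.` as a thousands separator and `,` as the decimal mark, while the Alpha output uses plain digits. The following is a small Python sketch, with hypothetical helper names `parse_count` and `ipc`, that normalizes the integer counters and recomputes instructions per cycle so the two runs can be compared directly. It only handles whole-number counters such as `cycles` and `instructions`, not decimal fields like `task-clock`.

```python
def parse_count(text):
    """Parse an integer counter that may contain '.' or ',' thousands separators."""
    return int(text.replace(".", "").replace(",", ""))


def ipc(instructions, cycles):
    """Instructions per cycle from two counter strings as printed in the logs."""
    return parse_count(instructions) / parse_count(cycles)


# Counter values copied from the transpose-large.log results in this comment.
xps_ipc = ipc("119.303.817.112", "134.881.449.707")
alpha_ipc = ipc("127951221368", "314073685855")

# Ratio of total cycles spent on Alpha versus XPS for the same benchmark.
cycle_ratio = parse_count("314073685855") / parse_count("134.881.449.707")

print(f"XPS   insn per cycle: {xps_ipc:.2f}")
print(f"Alpha insn per cycle: {alpha_ipc:.2f}")
print(f"Alpha/XPS cycles:     {cycle_ratio:.2f}")
```

The recomputed values round to the 0.88 and 0.41 insn per cycle that `perf` reports, which is a quick sanity check that the counters were read correctly. Both runs execute a similar number of instructions, so the gap shows up in cycles rather than in work done.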