Hello,

(5 nodes, RF=3, 32 GB RAM, 0.8.6, BOP, 6x2.33 GHz AMD cores)

I'm encountering a weird issue on my CPU-bound workload. Basically I have two processes:
- one sends counter increments as fast as it can for a couple of tens of seconds; when only a few were issued, it sleeps for 30 seconds
- the other reads thousands of keys per second, computes a PNG and writes the PNG out

I've got about a dozen threads doing this in parallel.

When the cluster is loaded, the first process starts timing out. When this happens, the nodes all freeze at the same time: CPU usage drops to 0 (see the vmstat output below). This happens every single time, and I can see that the second process is paused during the timeout.

Any ideas why this might be happening?

Thanks
Philippe

procs -----------memory---------- ---swap-- -----io---- --system--- ----cpu-----
 r  b  swpd   free  buff    cache   si   so    bi    bo    in    cs us sy  id wa
 0  0   128 165472  8304 14347412    0    0     0     0   289   281  0  0 100  0
 5  0   128 166452  8288 14337836    0    0     2    32   449   483  1  1  98  0
 0  1   128 170252  8288 14337872    0    0     6     0   551   597  1  1  98  0
 1  0   128 180968  8280 14329708    0    0     0    16  2276  2151 12  3  84  0
 2  0   128 166788  8288 14329900    0    0     0    58  2368  2316 13  4  83  0
 0  0   128 179676  8288 14329960    0    0    94    10  2438  2411 13  4  83  0
 4  1   128 173012  8296 14331244    0    0   660    26 22201 40483 50  8  40  1
21  0   128 172224  8296 14333276    0    0   926   146 29059 52564 70  9  19  1
20  0   128 164532  8296 14335092    0    0   814    36 31140 72887 75 10  14  1
10  0   128 160648  8304 14335316    0    0    46    52 31737 85292 78 11  11  0
38  0   128 167832  8304 14328224    0    0   298     0 26278 71371 79  8  12  0
 7  0   128 177600  8304 14328656    0    0   216     0 23951 66422 78  9  13  1
 4  0   128 177776  8312 14332372    0    0  2116   330 15475 24864 47  7  43  3
 1  0   128 177840  8312 14332208    0    0     0    15  2719  2646 13  4  83  0
 1  0   128 165456  8320 14332424    0    0     0    42  3013  2786 15  3  81  0
 3  0   128 178776  8320 14332332    0    0     0    10 21238 47980 44  8  47  0
 1  0   128 179048  8320 14332332    0    0     0     0  8365 11498 28  6  66  0
 0  0   128 178476  8328 14332396    0    0     0    38  7135  8871 26  3  71  0
 1  0   128 177004  8328 14332336    0    0     0    34  4220  4494 17  3  80  0
 1  0   128 178364  8336 14332368    0    0     0    40  6023  6676 20  4  76  0
 1  0   128 165420  8336 14332432    0    0     0    10  7590  8065 27  5  69  0
 0  0   128 179204  8336 14332368    0    0     0     6  8593 10051 29  5  66  0
 3  0   128 166092  8344 14332368    0    0     0    26  3416  3709 11  2  87  0
 1  0   128 174644  8344 14332332    0    0     0    14  2193  2178 13  4  83  0
 1  0   128 178216  8344 14332332    0    0     0     0  2019  1955 12  3  85  0
 8  0   128 177256  8352 14333704    0    0   558    74 17155 28102 50  6  44  0
13  0   128 176964  8352 14334388    0    0   262     4 25344 43567 56  9  34  1
20  0   128 176472  8360 14334120    0    0   116   276 21765 35447 46  9  44  0
 8  0   128 180508  8356 14329500    0    0  1494    18 30749 42590 53 10  36  2
 1  0   128 164908  8356 14331800    0    0  1006     0 23405 37877 59  9  33  0
10  0   128 178016  8364 14332748    0    0   424    44 24255 37797 55  9  36  0
 1  0   128 174984  8364 14333828    0    0   538     0 24787 41429 56  9  35  1
10  1   128 175436  8372 14335028    0    0   734   358 28829 47777 54  9  36  0
11  0   128 176576  8372 14335220    0    0    92    18 28132 50177 55  8  37  0
 3  0   128 176476  8372 14335288    0    0    34     0 23369 36918 36  6  58  0
 6  0   128 176104  8380 14335556    0    0   134    34 23846 38779 40  6  54  0
 0  0   128 176200  8380 14335784    0    0   114     4  6777  8368 16  1  83  0
 1  0   128 176456  8380 14335456    0    0     0   162  4968  5861 12  1  87  0
 0  0   128 176588  8388 14335520    0    0     0    44   288   282  0  0 100  0
 0  0   128 176752  8388 14335524    0    0     0     2   253   242  0  0 100  0
 0  0   128 176752  8388 14335524    0    0     0     0   286   275  0  0 100  0
 0  0   128 176924  8388 14335524    0    0     0     0   260   250  0  0 100  0
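
In case it helps anyone line up the freezes across nodes, here is a rough sketch of how the idle stretches could be picked out of vmstat output like the above. The function names (`parse_vmstat`, `idle_runs`) and the 99% idle threshold are my own assumptions for illustration, not part of any tool; the sample rows are taken from the end of the output above.

```python
# Column names as printed by vmstat in the output above.
VMSTAT_COLUMNS = ["r", "b", "swpd", "free", "buff", "cache", "si", "so",
                  "bi", "bo", "in", "cs", "us", "sy", "id", "wa"]


def parse_vmstat(text):
    """Parse vmstat sample lines into dicts, skipping the header lines."""
    rows = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != len(VMSTAT_COLUMNS):
            continue  # blank line or the "procs ---memory---" banner
        try:
            values = [int(field) for field in fields]
        except ValueError:
            continue  # the column-name header line
        rows.append(dict(zip(VMSTAT_COLUMNS, values)))
    return rows


def idle_runs(rows, idle_threshold=99):
    """Return (start_index, length) for each run of samples with id >= threshold.

    idle_threshold is an assumed cut-off for "the node is frozen".
    """
    runs = []
    start = None
    for i, row in enumerate(rows):
        if row["id"] >= idle_threshold:
            if start is None:
                start = i
        elif start is not None:
            runs.append((start, i - start))
            start = None
    if start is not None:
        runs.append((start, len(rows) - start))
    return runs


sample = """
procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
 r b swpd free buff cache si so bi bo in cs us sy id wa
 0 0 128 176200 8380 14335784 0 0 114 4 6777 8368 16 1 83 0
 1 0 128 176456 8380 14335456 0 0 0 162 4968 5861 12 1 87 0
 0 0 128 176588 8388 14335520 0 0 0 44 288 282 0 0 100 0
 0 0 128 176752 8388 14335524 0 0 0 2 253 242 0 0 100 0
"""

print(idle_runs(parse_vmstat(sample)))  # prints [(2, 2)]
```

Running the same thing on each node's vmstat log (with timestamps added separately) would show whether the idle runs really start at the same sample on all five nodes.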