Re: [gentoo-user] Re: Slow network transfers ... lost interrupts because of clocksource?
On 10.10.2013 16:38, Stefan G. Weichinger wrote:
> I don't plan to stay with 3.8.13, this is just an intermediate step to get a working config. For now I don't have any more lost hpet interrupts etc and the LAN speed is fine. Emerging packages as well ...
> From this config I will then try 3.10.7-r1 again.

Went back to 3.10.7-r1 yesterday ... performance fine so far.

Today I checked back and found the following in dmesg. Should I disable some option?

Thanks for any feedback, Stefan

[20788.258330] NMI backtrace for cpu 16
[20788.258334] CPU: 16 PID: 0 Comm: swapper/16 Not tainted 3.10.7-gentoo-r1 #1
[20788.258336] Hardware name: HP ProLiant DL385p Gen8, BIOS A28 12/10/2012
[20788.258338] task: 88042dcc9fe0 ti: 88042dcd6000 task.ti: 88042dcd6000
[20788.258340] RIP: 0010:[a0006e31] [a0006e31] acpi_idle_enter_simple+0x9e/0xde [processor]
[20788.258346] RSP: 0018:88042dcd7e78 EFLAGS: 0093
[20788.258348] RAX: 12e824917a00 RBX: 880c2d1c78a0 RCX: b6e8
[20788.258350] RDX: 0ff9 RSI: 88082fc8 RDI: b6e0
[20788.258352] RBP: 0002 R08: R09: 0192
[20788.258354] R10: 0001 R11: R12: 880c2d1c7800
[20788.258356] R13: 12e605db654b R14: a00087d8 R15:
[20788.258358] FS: 7fcbeb7de700() GS:88082fc8() knlGS:f77556c0
[20788.258360] CS: 0010 DS: ES: CR0: 8005003b
[20788.258362] CR2: 7fd4098d2000 CR3: 00042d3a4000 CR4: 000407a0
[20788.258364] DR0: 00a0 DR1: DR2: 0003
[20788.258365] DR3: 00b0 DR6: 0ff0 DR7: 0400
[20788.258367] Stack:
[20788.258368] 88102ccd4c00 a0008710 0002 8140284b
[20788.258371] 01f6 81403b11
[20788.258374] 88102ccd4c00 0002 a0008710 88042dcd7fd8
[20788.258378] Call Trace:
[20788.258382] [8140284b] ? cpuidle_enter_state+0x4b/0xe0
[20788.258386] [81403b11] ? ladder_select_state+0x31/0x1e0
[20788.258390] [8140297a] ? cpuidle_idle_call+0x9a/0x140
[20788.258394] [8103b919] ? arch_cpu_idle+0x9/0x30
[20788.258398] [81095439] ? cpu_startup_entry+0x59/0x130
[20788.258399] Code: 01 03 75 02 0f 09 e8 1f 50 08 e1 8a 43 08 3c 01 75 0a 48 89 df e8 c0 78 04 e1 eb 17 3c 02 75 07 e8 0f f9 ff ff eb 0c 8b 53 04 ec 48 8b 15 4c e0 81 e1 ed 31 ff e8 90 50 08 e1 80 7b 08 01 74 10
[20788.258440] NMI backtrace for cpu 20
[20788.258444] CPU: 20 PID: 0 Comm: swapper/20 Not tainted 3.10.7-gentoo-r1 #1
[20788.258445] Hardware name: HP ProLiant DL385p Gen8, BIOS A28 12/10/2012
[20788.258447] task: 88042dccb960 ti: 88042dcde000 task.ti: 88042dcde000
[20788.258449] RIP: 0010:[a0006e31] [a0006e31] acpi_idle_enter_simple+0x9e/0xde [processor]
[20788.258456] RSP: 0018:88042dcdfe78 EFLAGS: 0093
[20788.258457] RAX: 12e824919a00 RBX: 880c2d1c50a0 RCX: b6e8
[20788.258459] RDX: 0ff9 RSI: 88082fd0 RDI: b6e0
[20788.258461] RBP: 0002 R08: R09: 0193
[20788.258463] R10: 0001 R11: R12: 880c2d1c5000
[20788.258464] R13: 12e605db857d R14: a00087d8 R15:
[20788.258467] FS: 7fcbeb7de700() GS:88082fd0() knlGS:f778d700
[20788.258469] CS: 0010 DS: ES: CR0: 8005003b
[20788.258471] CR2: 7fd4098d2000 CR3: 00042d3a4000 CR4: 000407a0
[20788.258473] DR0: 00a0 DR1: DR2: 0003
[20788.258474] DR3: 00b0 DR6: 0ff0 DR7: 0400
[20788.258476] Stack:
[20788.258477] 880c2d9e0400 a0008710 0002 8140284b
[20788.258480] 01f7 81403b11
[20788.258484] 880c2d9e0400 0002 a0008710 88042dcdffd8
[20788.258487] Call Trace:
[20788.258491] [8140284b] ? cpuidle_enter_state+0x4b/0xe0
[20788.258495] [81403b11] ? ladder_select_state+0x31/0x1e0
[20788.258498] [8140297a] ? cpuidle_idle_call+0x9a/0x140
[20788.258502] [8103b919] ? arch_cpu_idle+0x9/0x30
[20788.258506] [81095439] ? cpu_startup_entry+0x59/0x130
[20788.258508] Code: 01 03 75 02 0f 09 e8 1f 50 08 e1 8a 43 08 3c 01 75 0a 48 89 df e8 c0 78 04 e1 eb 17 3c 02 75 07 e8 0f f9 ff ff eb 0c 8b 53 04 ec 48 8b 15 4c e0 81 e1 ed 31 ff e8 90 50 08 e1 80 7b 08 01 74 10
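For anyone following along: a quick way to summarise a trace like the one above is to list which CPUs logged an NMI backtrace and which function is named on each RIP line. This is only a minimal sketch. It assumes the kernel log is fed in on standard input and that grep supports -o (GNU and BSD grep do); the helper names nmi_cpus and rip_symbols are made up for illustration.

```shell
#!/bin/sh
# Hypothetical helpers (names are illustrative, not from the thread).
# Both read kernel log text on stdin, e.g.:  dmesg | nmi_cpus

# Print each CPU number that logged an "NMI backtrace", one per line.
nmi_cpus() {
    grep -o 'NMI backtrace for cpu [0-9]*' | awk '{ print $NF }' | sort -n -u
}

# Print the distinct function names that appear on RIP: lines.
rip_symbols() {
    grep 'RIP:' \
        | grep -o '[A-Za-z_][A-Za-z0-9_.]*+0x[0-9a-f]*/0x[0-9a-f]*' \
        | sed 's/+0x.*//' \
        | sort -u
}
```

In the trace above, both RIP lines name acpi_idle_enter_simple, so both CPUs were in the ACPI idle routine when the NMI arrived; that is worth stating explicitly in any report to lkml.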
Re: [gentoo-user] Re: Slow network transfers ... lost interrupts because of clocksource?
On 14.10.2013 08:23, Stefan G. Weichinger wrote:
> Went back to 3.10.7-r1 yesterday ... performance fine so far.
> Today I checked back and found the following in dmesg. Should I disable some option?
> [NMI backtraces for cpu 16 and cpu 20 snipped]

IMO, you should finally go to lkml. Something is very wrong with your setup.
Re: [gentoo-user] Re: Slow network transfers ... lost interrupts because of clocksource?
On Thu, Oct 10, 2013 at 12:18:06AM +0200, Stefan G. Weichinger wrote:
> This is 3.8.13 now ... with some changed options, sure.
> For now I am happy ... can't believe it yet ;-)

Why do you use a kernel that has been abandoned? https://www.kernel.org/

You should use a longterm kernel, preferably 3.10 series at this point. Why not use 3.8 series, or one marked EOL? Because no more patches will be applied against them, so no more bug/security fixes.

-- 
Happy Penguin Computers
126 Fenco Drive
Tupelo, MS 38801
supp...@happypenguincomputers.com
662-269-2706 662-205-6424
http://happypenguincomputers.com/

A: Because it messes up the order in which people normally read text.
Q: Why is top-posting such a bad thing?
A: Top-posting.
Q: What is the most annoying thing in e-mail?

Don't top-post: http://en.wikipedia.org/wiki/Top_post#Top-posting
Re: [gentoo-user] Re: Slow network transfers ... lost interrupts because of clocksource?
On 10.10.2013 14:20, Bruce Hill wrote:
> On Thu, Oct 10, 2013 at 12:18:06AM +0200, Stefan G. Weichinger wrote:
>> This is 3.8.13 now ... with some changed options, sure.
>> For now I am happy ... can't believe it yet ;-)
> Why do you use a kernel that has been abandoned? https://www.kernel.org/
> You should use a longterm kernel, preferably 3.10 series at this point. Why not use 3.8 series, or one marked EOL? Because no more patches will be applied against them, so no more bug/security fixes.

3.8.13 is 3.8 series, or ... ?

I don't plan to stay with 3.8.13, this is just an intermediate step to get a working config. For now I don't have any more lost hpet interrupts etc and the LAN speed is fine. Emerging packages as well ...

From this config I will then try 3.10.7-r1 again.
Re: [gentoo-user] Re: Slow network transfers ... lost interrupts because of clocksource?
Would someone please take a look at this dmesg: https://dl.dropboxusercontent.com/u/24516209/dmesg.txt and tell me what those ugly messages around the CPUs could mean? I still see suboptimal performance on this hardware ... Thanks!
Re: [gentoo-user] Re: Slow network transfers ... lost interrupts because of clocksource?
On 09.10.2013 20:20, Nicolas Sebrecht wrote:
> I would think about a kernel bug first and try with a much lower version.

Yep. A bit scary with a server which is hundreds of kilometers away. Got to get that HP iLO thingy going in my browser(s) ...

S
Re: [gentoo-user] Re: Slow network transfers ... lost interrupts because of clocksource?
On 09.10.2013 21:17, Stefan G. Weichinger wrote:
> On 09.10.2013 20:20, Nicolas Sebrecht wrote:
>> I would think about a kernel bug first and try with a much lower version.
> Yep. A bit scary with a server which is hundreds of kilometers away. Got to get that HP iLO thingy going in my browser(s) ...
> S

go with that dmesg, lspci, kernel config etc. to lkml. Seriously.
Re: [gentoo-user] Re: Slow network transfers ... lost interrupts because of clocksource?
On 09.10.2013 21:57, Volker Armin Hemmann wrote:
> go with that dmesg, lspci, kernel config etc. to lkml. Seriously.

Phew ... that sounds like making a fool of myself ... ;-) But maybe you are right, thanks.

I am considering testing 3.8.13 ... I got the iLO to display the console in a browser ... and will test choosing a kernel in the GRUB menu before I really switch over.

S
Re: [gentoo-user] Re: Slow network transfers ... lost interrupts because of clocksource?
On 09.10.2013 22:05, Stefan G. Weichinger wrote:
> On 09.10.2013 21:57, Volker Armin Hemmann wrote:
>> go with that dmesg, lspci, kernel config etc. to lkml. Seriously.
> Phew ... that sounds like making a fool of myself ... ;-) But maybe you are right, thanks.
> I am considering testing 3.8.13 ... I got the iLO to display the console in a browser ... and will test choosing a kernel in the GRUB menu before I really switch over.

Compiled 3.8.13 and rebooted successfully ... still laggy ... still these lines in dmesg:

[ 477.896055] hpet1: lost 1 rtc interrupts
[ 477.933359] hpet1: lost 1 rtc interrupts
[ 478.050117] hpet1: lost 1 rtc interrupts
[ 478.315080] hpet1: lost 1 rtc interrupts
[ 478.409115] hpet1: lost 1 rtc interrupts
[ 478.444083] hpet1: lost 1 rtc interrupts
[ 478.638091] hpet1: lost 1 rtc interrupts
[ 478.780332] hpet1: lost 1 rtc interrupts
[ 478.862724] hpet1: lost 1 rtc interrupts
[ 478.950488] hpet1: lost 1 rtc interrupts

even though I disabled most of that HPET stuff (settings taken from my own server here):

# zgrep -i hpet /proc/config.gz
CONFIG_HPET_TIMER=y
CONFIG_HPET_EMULATE_RTC=y
# CONFIG_HPET is not set

This is quite frustrating already ... but I repeat myself, sorry.
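To judge whether a config change really reduces these messages, it can help to count them instead of eyeballing dmesg. A rough sketch, not a definitive tool: count_lost_rtc is a made-up helper name, and the sysfs paths in the comments are the usual Linux clocksource locations, which I am assuming are present on this kernel.

```shell
#!/bin/sh
# Hypothetical helper: sum the N in "hpetX: lost N rtc interrupts" lines read from stdin.
count_lost_rtc() {
    awk '/hpet[0-9]*: lost [0-9]+ rtc interrupts/ {
             for (i = 1; i < NF; i++)
                 if ($i == "lost") total += $(i + 1)
         }
         END { print total + 0 }'
}

# Typical use on the affected machine (not executed by this sketch):
#   dmesg | count_lost_rtc
#   cat /sys/devices/system/clocksource/clocksource0/current_clocksource
#   cat /sys/devices/system/clocksource/clocksource0/available_clocksource
```

Running the count before and after a kernel or config change, over a comparable uptime, gives a number to compare rather than an impression.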
Re: [gentoo-user] Re: Slow network transfers ... lost interrupts because of clocksource?
On 09.10.2013 22:40, Stefan G. Weichinger wrote:
> even though I disabled most of that HPET stuff (settings taken from my own server here):
>
> # zgrep -i hpet /proc/config.gz
> CONFIG_HPET_TIMER=y
> CONFIG_HPET_EMULATE_RTC=y
> # CONFIG_HPET is not set
>
> This is quite frustrating already ... but I repeat myself, sorry.

I achieved *something* ... I still have to diff the kernel configs, but right now I am happily rsyncing that vmdk image at around 40 MB/s.

That virt-manager issue is still there, but at least the basic performance seems way better now. I could transfer the KVM raw image to an LVM LV within a few minutes ... the same dd command took hours before and never succeeded.

Some more tests now.

S
Re: [gentoo-user] Re: Slow network transfers ... lost interrupts because of clocksource?
On 10.10.2013 00:11, Stefan G. Weichinger wrote:
> On 09.10.2013 22:40, Stefan G. Weichinger wrote:
>> even though I disabled most of that HPET stuff (settings taken from my own server here):
>>
>> # zgrep -i hpet /proc/config.gz
>> CONFIG_HPET_TIMER=y
>> CONFIG_HPET_EMULATE_RTC=y
>> # CONFIG_HPET is not set
>>
>> This is quite frustrating already ... but I repeat myself, sorry.
> I achieved *something* ... I still have to diff the kernel configs, but right now I am happily rsyncing that vmdk image at around 40 MB/s.

Quite happy after dozens of hours spent on this:

# time rsync -av 192.168.1.200:/mnt/vm_apps/xy.vmdk . --progress
[...]
  8058765312 100%   53.58MB/s    0:02:23 (xfer#1, to-check=0/1)

sent 42 bytes  received 8059749181 bytes  55776811.23 bytes/sec
total size is 8058765312  speedup is 1.00

real    2m23.856s
user    2m28.640s
sys     0m26.820s

Nice! ;-)

This is 3.8.13 now ... with some changed options, sure.

For now I am happy ... can't believe it yet ;-)
Re: [gentoo-user] Re: Slow network transfers ... lost interrupts because of clocksource?
On 01.10.2013 14:32, Nicolas Sebrecht wrote:
> On 01/10/13, Stefan G. Weichinger wrote:
>> I used split and tar to split the image file into 100 MB parts and am rsyncing them over right now.
>> Maybe I have something wrong in my kernel ... the server shows a load of around 3 ... while only the rsync and my mosh session are running ...
>> This is a 24-core system ... it shouldn't even blink ...
> If you are sure the load doesn't come from userland (htop?), I would think about reporting a kernel bug. This might be an issue with the NIC driver.

lspci shows:

Broadcom Corporation NetXtreme BCM5719 Gigabit Ethernet PCIe (rev 01)

This is a 4-port NIC ... maybe I need some specific drivers and not only the tg3 kernel module?

Stefan
Re: [gentoo-user] Re: Slow network transfers ... lost interrupts because of clocksource?
On 01.10.2013 16:00, Stefan G. Weichinger wrote:
> Broadcom Corporation NetXtreme BCM5719 Gigabit Ethernet PCIe (rev 01)
> This is a 4-port NIC ... maybe I need some specific drivers and not only the tg3 kernel module?

For the record, and in case someone wants to join in: I recompiled the kernel, put in some ACPI stuff and chose generic X86_64 for the processor type.

This somehow helped ... the load is lower now and the system is more responsive.

I am considering changing the (auto-generated? by systemd) mount options for the ext4 fs: data=ordered ...

Right now I am re-cat-ing my split image file ... we'll see.

Stefan
Re: [gentoo-user] Re: Slow network transfers ... lost interrupts because of clocksource?
On 27.09.2013 15:02, Nicolas Sebrecht wrote:
> You should give details of the tests. It looks like a hard disk write speed bottleneck.

I will get access again on Monday.

That's a hardware RAID-10 on 6 SAS disks ... that should be fast enough ...
Re: [gentoo-user] Re: Slow network transfers ... lost interrupts because of clocksource?
On 27.09.2013 15:18, Nicolas Sebrecht wrote:
> Try to avoid writing to the disks. For network tests, use the devices /dev/null and /dev/zero, with both a protocol and command lines that don't optimize away the zeros.
> You could also use dedicated network performance tools if you have install rights.

Will do next week to focus on the main issues ... I did some bonnie tests yesterday, which were quite OK, so I assume it's not the disk performance.

As far as I could find by googling, rsync and ssh have their issues with big files ... but tar/netcat wasn't really fast either.

Stefan
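The suggestion quoted above (take the disks out of the picture by sending zeros from /dev/zero into /dev/null on the other side) could be sketched roughly as follows. Assumptions, none of which come from the thread: a netcat binary is installed on both hosts, port 5001 is free, RECEIVER is a placeholder for the other machine, and zero_stream_mib is a made-up helper name. Netcat's options differ between variants, so the nc lines are left as comments to adapt.

```shell
#!/bin/sh
# Hypothetical helper: write N MiB of zeros to stdout without touching a disk.
zero_stream_mib() {
    dd if=/dev/zero bs=1048576 count="$1" 2>/dev/null
}

# On the receiving host, discard everything that arrives
# (pick the form your nc variant accepts):
#   nc -l -p 5001 > /dev/null      # traditional netcat
#   nc -l 5001 > /dev/null         # OpenBSD netcat
#
# On the sending host, time a 1000 MiB transfer:
#   time zero_stream_mib 1000 | nc RECEIVER 5001
# Depending on the variant, the sender may need -q 0 (traditional)
# or -N (OpenBSD) to exit once its input ends.
```

For such a test, leave out compression such as ssh -C or rsync -z, since a stream of zeros compresses to almost nothing and the result would say little about the link.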