Re: Firefox, malloc(3) and threads
On Mon, Jan 25, 2016 at 09:57:37AM +0100, Landry Breuil wrote:
> On Mon, Jan 25, 2016 at 08:48:21AM +0100, Mark Kettenis wrote:
> > > From: "Peter N. M. Hansteen"
> > > Date: Sun, 24 Jan 2016 23:10:41 +0100
> > >
> > > On 01/22/16 22:46, Mark Kettenis wrote:
> > > > Firefox makes a lot of concurrent malloc(3) calls. The locking to
> > > > make malloc(3) thread-safe is a bit...suboptimal. This diff makes
> > > > things better by using a mutex instead of spinlock. If you're running
> > > > Firefox you want to try it; it makes video watchable on some machines.
> > > > If you're not running Firefox you want to try it; to make sure it
> > > > doesn't break things.
> > >
> > > Running this since early Saturday, Firefox is definitely more responsive
> > > than earlier.
> > >
> > > I haven't tried running other resource hogs such as LibreOffice with
> > > several large documents, but I guess I could try that too if it's a
> > > relevant scenario.
> >
> > Please do!
>
> Albeit small, x11/xfce4/thunar makes a heavy use of threads (in general,
> and even more when talking to gvfs mounts). It feels now 200% snappier.

Another successful test on i386, where Firefox had become totally unusable
(Atom N270, 1GB RAM): with the latest snapshot (including the diff) it's
sort of usable (Google Maps, Google News...). Yay!

Landry
Re: Firefox, malloc(3) and threads
FYI: This diff has been in the snapshots since Sunday.

On Mon, Jan 25, 2016 at 4:34 PM, Matthew Via wrote:
> I've had the patch applied for two days now and have not seen any ill
> efects. This is a Thinkpad T410 running snapshots.
>
> Before, youtube was unwatchable. Sound would continue normally while
> video would freeze for long stretches, often over 10 seconds. Its not
> perfect now, but its very nearly so when not fullscreen.
>
> It does seem that cpu usage of firefox is also significantly reduced,
> and is generally snappier.
>
> Thank you!
> -via
>
> On 22:46 Fri 22 Jan , Mark Kettenis wrote:
> > Firefox makes a lot of concurrent malloc(3) calls. The locking to
> > make malloc(3) thread-safe is a bit...suboptimal. This diff makes
> > things better by using a mutex instead of spinlock. If you're running
> > Firefox you want to try it; it makes video watchable on some machines.
> > If you're not running Firefox you want to try it; to make sure it
> > doesn't break things.
> >
> > Enjoy,
> >
> > Mark
Re: Firefox, malloc(3) and threads
I've had the patch applied for two days now and have not seen any ill
effects. This is a ThinkPad T410 running snapshots.

Before, YouTube was unwatchable. Sound would continue normally while
video would freeze for long stretches, often over 10 seconds. It's not
perfect now, but it's very nearly so when not fullscreen.

It does seem that Firefox's CPU usage is also significantly reduced,
and it is generally snappier.

Thank you!
-via

On 22:46 Fri 22 Jan , Mark Kettenis wrote:
> Firefox makes a lot of concurrent malloc(3) calls. The locking to
> make malloc(3) thread-safe is a bit...suboptimal. This diff makes
> things better by using a mutex instead of spinlock. If you're running
> Firefox you want to try it; it makes video watchable on some machines.
> If you're not running Firefox you want to try it; to make sure it
> doesn't break things.
>
> Enjoy,
>
> Mark
Re: Firefox, malloc(3) and threads
On Sat, Jan 23, 2016 at 03:53:32PM +0100, Martin Natano wrote:
> Yes! This absolutely makes Youtube videos watchable for me (on a
> Thinkpad T520). There still is occassional stuttering, but _far_ less
> disruptive than before. Another usecase where I see improvements is
> reloading a resource-heavy web page while switching tabs. Before
> applying the patch, this caused the browser to hang for several seconds.
> Now it doesn't.

The same here on a ThinkPad T420.

dmesg:
OpenBSD 5.9-beta (GENERIC.MP) #0: Mon Jan 25 19:14:50 BRST 2016 dbolgher...@iron.my.domain:/usr/src/sys/arch/amd64/compile/GENERIC.MP real mem = 8451125248 (8059MB) avail mem = 8190803968 (7811MB) mpath0 at root scsibus0 at mpath0: 256 targets mainbus0 at root bios0 at mainbus0: SMBIOS rev. 2.6 @ 0xdae9c000 (65 entries) bios0: vendor LENOVO version "83ET70WW (1.40 )" date 06/12/2012 bios0: LENOVO 4180DL4 acpi0 at bios0: rev 2 acpi0: sleep states S0 S3 S4 S5 acpi0: tables DSDT FACP SLIC SSDT SSDT SSDT HPET APIC MCFG ECDT ASF!
TCPA SSDT SSDT DMAR UEFI UEFI UEFI acpi0: wakeup devices LID_(S3) SLPB(S3) IGBE(S4) EXP4(S4) EHC1(S3) EHC2(S3) HDEF(S4) acpitimer0 at acpi0: 3579545 Hz, 24 bits acpihpet0 at acpi0: 14318179 Hz acpimadt0 at acpi0 addr 0xfee0: PC-AT compat cpu0 at mainbus0: apid 0 (boot processor) cpu0: Intel(R) Core(TM) i5-2520M CPU @ 2.50GHz, 2492.32 MHz cpu0: FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,SSE3,PCLMUL,DTES64,MWAIT,DS-CPL,VMX,SMX,EST,TM2,SSSE3,CX16,xTPR,PDCM,PCID,SSE4.1,SSE4.2,x2APIC,POPCNT,DEADLINE,AES,XSAVE,AVX,NXE,LONG,LAHF,PERF,ITSC,SENSOR,ARAT cpu0: 256KB 64b/line 8-way L2 cache cpu0: smt 0, core 0, package 0 mtrr: Pentium Pro MTRR support, 10 var ranges, 88 fixed ranges cpu0: apic clock running at 99MHz cpu0: mwait min=64, max=64, C-substates=0.2.1.1.2, IBE cpu1 at mainbus0: apid 1 (application processor) cpu1: Intel(R) Core(TM) i5-2520M CPU @ 2.50GHz, 2491.91 MHz cpu1: FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,SSE3,PCLMUL,DTES64,MWAIT,DS-CPL,VMX,SMX,EST,TM2,SSSE3,CX16,xTPR,PDCM,PCID,SSE4.1,SSE4.2,x2APIC,POPCNT,DEADLINE,AES,XSAVE,AVX,NXE,LONG,LAHF,PERF,ITSC,SENSOR,ARAT cpu1: 256KB 64b/line 8-way L2 cache cpu1: smt 1, core 0, package 0 cpu2 at mainbus0: apid 2 (application processor) cpu2: Intel(R) Core(TM) i5-2520M CPU @ 2.50GHz, 2491.91 MHz cpu2: FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,SSE3,PCLMUL,DTES64,MWAIT,DS-CPL,VMX,SMX,EST,TM2,SSSE3,CX16,xTPR,PDCM,PCID,SSE4.1,SSE4.2,x2APIC,POPCNT,DEADLINE,AES,XSAVE,AVX,NXE,LONG,LAHF,PERF,ITSC,SENSOR,ARAT cpu2: 256KB 64b/line 8-way L2 cache cpu2: smt 0, core 1, package 0 cpu3 at mainbus0: apid 3 (application processor) cpu3: Intel(R) Core(TM) i5-2520M CPU @ 2.50GHz, 2491.92 MHz cpu3: 
FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,SSE3,PCLMUL,DTES64,MWAIT,DS-CPL,VMX,SMX,EST,TM2,SSSE3,CX16,xTPR,PDCM,PCID,SSE4.1,SSE4.2,x2APIC,POPCNT,DEADLINE,AES,XSAVE,AVX,NXE,LONG,LAHF,PERF,ITSC,SENSOR,ARAT cpu3: 256KB 64b/line 8-way L2 cache cpu3: smt 1, core 1, package 0 ioapic0 at mainbus0: apid 2 pa 0xfec0, version 20, 24 pins acpimcfg0 at acpi0 addr 0xf800, bus 0-63 acpiec0 at acpi0 acpiprt0 at acpi0: bus 0 (PCI0) acpiprt1 at acpi0: bus -1 (PEG_) acpiprt2 at acpi0: bus 2 (EXP1) acpiprt3 at acpi0: bus 3 (EXP2) acpiprt4 at acpi0: bus 5 (EXP4) acpiprt5 at acpi0: bus 13 (EXP5) acpicpu0 at acpi0: C3(350@104 io@0x415), C1(1000@1 halt), PSS acpicpu1 at acpi0: C3(350@104 io@0x415), C1(1000@1 halt), PSS acpicpu2 at acpi0: C3(350@104 io@0x415), C1(1000@1 halt), PSS acpicpu3 at acpi0: C3(350@104 io@0x415), C1(1000@1 halt), PSS acpipwrres0 at acpi0: PUBS, resource for EHC1, EHC2 acpitz0 at acpi0: critical temperature is 98 degC acpibtn0 at acpi0: LID_ acpibtn1 at acpi0: SLPB acpibat0 at acpi0: BAT0 model "42T4710" serial 1694 type LION oem "SANYO" acpibat1 at acpi0: BAT1 not present acpiac0 at acpi0: AC unit online acpithinkpad0 at acpi0 cpu0: Enhanced SpeedStep 2492 MHz: speeds: 2501, 2500, 2200, 2000, 1800, 1600, 1400, 1200, 1000, 800 MHz pci0 at mainbus0 bus 0 pchb0 at pci0 dev 0 function 0 "Intel Core 2G Host" rev 0x09 inteldrm0 at pci0 dev 2 function 0 "Intel HD Graphics 3000" rev 0x09 drm0 at inteldrm0 inteldrm0: msi inteldrm0: 1600x900 wsdisplay0 at inteldrm0 mux 1: console (std, vt100 emulation) wsdisplay0: screen 1-5 added (std, vt100 emulation) "Intel 6 Series MEI" rev 0x04 at pci0 dev 22 function 0 not configured puc0 at pci0 dev 22 function 3 "Intel 6 Series KT" rev 0x04: ports: 1 com com4 at puc0 port 0 apic 2 int 19: ns16550a, 16 byte fifo com4: probed fifo depth: 0 bytes em0 at pci0 dev 25 function 0 "Intel 82579LM" rev 0x04: msi, address 00:21:cc:ba:e3:5d ehci0 at pci0 dev 26 function 
0 "Intel 6 Series USB" rev 0x04: apic 2 int 16 usb0 at ehci0: USB revision 2.0 uhub0
Re: Firefox, malloc(3) and threads
Hi Mark,

even with 16GB RAM I needed to install smtube to get a decent view of
videos prior to your patches. I patched last night, but only tonight was
I able to do some testing. At present I have open:

- LibreOffice Writer with one doc
- LibreOffice Calc with one doc
- GIMP with one picture
- Pidgin-OTR
- smplayer (nothing playing)
- Thunderbird (two mailboxes)
- Firefox with 10 tabs open, one of them being YT (Theo talking about
  pledge at Hackfest 2015)

Even though YT hangs every now and then, it is now perfectly possible to
watch / listen to / follow the presentation, although I have just a
modest line. CPU usage (noticed via 'top') peaked at around 160%, but the
average seems to be around 100%.

I didn't notice any drawbacks from your patches. Every program is
responsive; only Thunderbird had some delays while I was typing this
post (listening to Theo meanwhile).

While this is not a "serious" test in academic terms (it is not 100%
repeatable), I can only report that I didn't come across any failures.
Instead, the system "feels" highly responsive with every task I tried.

To summarize: THANK YOU!

Best,
STEFAN

OpenBSD 5.9-beta (GENERIC.MP) #1863: Sun Jan 24 21:35:42 MST 2016 dera...@amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC.MP real mem = 17082359808 (16291MB) avail mem = 16560455680 (15793MB) mpath0 at root scsibus0 at mpath0: 256 targets mainbus0 at root bios0 at mainbus0: SMBIOS rev. 2.7 @ 0xeb500 (35 entries) bios0: vendor American Megatrends Inc. version "1.05.01" date 08/05/2015 bios0: Notebook W65_67SZ acpi0 at bios0: rev 2 acpi0: sleep states S0 S3 S4 S5 acpi0: tables DSDT FACP APIC FPDT ASF! SSDT SSDT SSDT MCFG HPET SSDT SSDT SSDT DMAR acpi0: wakeup devices PXSX(S4) RP01(S4) PXSX(S4) RP02(S4) PXSX(S4) RP03(S4) PXSX(S4) RP04(S4) RLAN(S4) PXSX(S4) RP05(S4) PXSX(S4) RP06(S4) PXSX(S4) RP07(S4) PXSX(S4) [...]
acpitimer0 at acpi0: 3579545 Hz, 24 bits acpimadt0 at acpi0 addr 0xfee0: PC-AT compat cpu0 at mainbus0: apid 0 (boot processor) cpu0: Intel(R) Core(TM) i5-4210M CPU @ 2.60GHz, 3093.23 MHz cpu0: FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,SSE3,PCLMUL,DTES64,MWAIT,DS-CPL,VMX,EST,TM2,SSSE3,FMA3,CX16,xTPR,PDCM,PCID,SSE4.1,SSE4.2,MOVBE,POPCNT,DEADLINE,AES,XSAVE,AVX,F16C,RDRAND,NXE,PAGE1GB,LONG,LAHF,ABM,PERF,ITSC,FSGSBASE,BMI1,AVX2,SMEP,BMI2,ERMS,INVPCID,SENSOR,ARAT cpu0: 256KB 64b/line 8-way L2 cache cpu0: smt 0, core 0, package 0 mtrr: Pentium Pro MTRR support, 10 var ranges, 88 fixed ranges cpu0: apic clock running at 99MHz cpu0: mwait min=64, max=64, C-substates=0.2.1.2.4, IBE cpu1 at mainbus0: apid 2 (application processor) cpu1: Intel(R) Core(TM) i5-4210M CPU @ 2.60GHz, 3092.84 MHz cpu1: FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,SSE3,PCLMUL,DTES64,MWAIT,DS-CPL,VMX,EST,TM2,SSSE3,FMA3,CX16,xTPR,PDCM,PCID,SSE4.1,SSE4.2,MOVBE,POPCNT,DEADLINE,AES,XSAVE,AVX,F16C,RDRAND,NXE,PAGE1GB,LONG,LAHF,ABM,PERF,ITSC,FSGSBASE,BMI1,AVX2,SMEP,BMI2,ERMS,INVPCID,SENSOR,ARAT cpu1: 256KB 64b/line 8-way L2 cache cpu1: smt 0, core 1, package 0 cpu2 at mainbus0: apid 1 (application processor) cpu2: Intel(R) Core(TM) i5-4210M CPU @ 2.60GHz, 3092.84 MHz cpu2: FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,SSE3,PCLMUL,DTES64,MWAIT,DS-CPL,VMX,EST,TM2,SSSE3,FMA3,CX16,xTPR,PDCM,PCID,SSE4.1,SSE4.2,MOVBE,POPCNT,DEADLINE,AES,XSAVE,AVX,F16C,RDRAND,NXE,PAGE1GB,LONG,LAHF,ABM,PERF,ITSC,FSGSBASE,BMI1,AVX2,SMEP,BMI2,ERMS,INVPCID,SENSOR,ARAT cpu2: 256KB 64b/line 8-way L2 cache cpu2: smt 1, core 0, package 0 cpu3 at mainbus0: apid 3 (application processor) cpu3: Intel(R) Core(TM) i5-4210M CPU @ 2.60GHz, 3092.84 MHz cpu3: 
FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,SSE3,PCLMUL,DTES64,MWAIT,DS-CPL,VMX,EST,TM2,SSSE3,FMA3,CX16,xTPR,PDCM,PCID,SSE4.1,SSE4.2,MOVBE,POPCNT,DEADLINE,AES,XSAVE,AVX,F16C,RDRAND,NXE,PAGE1GB,LONG,LAHF,ABM,PERF,ITSC,FSGSBASE,BMI1,AVX2,SMEP,BMI2,ERMS,INVPCID,SENSOR,ARAT cpu3: 256KB 64b/line 8-way L2 cache cpu3: smt 1, core 1, package 0 ioapic0 at mainbus0: apid 2 pa 0xfec0, version 20, 24 pins acpimcfg0 at acpi0 addr 0xf800, bus 0-63 acpihpet0 at acpi0: 14318179 Hz acpiprt0 at acpi0: bus 0 (PCI0) acpiprt1 at acpi0: bus 2 (RP01) acpiprt2 at acpi0: bus 3 (RP03) acpiprt3 at acpi0: bus 4 (RP04) acpiprt4 at acpi0: bus 1 (P0P2) acpiprt5 at acpi0: bus -1 (P0PA) acpiprt6 at acpi0: bus -1 (P0PB) acpiprt7 at acpi0: bus 1 (PEG0) acpiec0 at acpi0 acpicpu0 at acpi0: C2(200@148 mwait.1@0x33), C1(1000@1 mwait.1), PSS acpicpu1 at acpi0: C2(200@148 mwait.1@0x33), C1(1000@1 mwait.1), PSS acpicpu2 at acpi0: C2(200@148 mwait.1@0x33), C1(1000@1 mwait.1), PSS acpicpu3 at acpi0: C2(200@148 mwait.1@0x33), C1(1000@1 mwait.1), PSS acpitz0 at acpi0: critical temperature is 120 degC acpibtn0 at acpi0: PWRB acpibtn1 at acpi0: S
Re: Firefox, malloc(3) and threads
On Mon, Jan 25, 2016 at 10:06:22AM +0100, David Coppa wrote:
> On Sun, Jan 24, 2016 at 7:47 PM, Adam Wolk wrote:
> > On Fri, 22 Jan 2016 22:46:39 +0100 (CET)
> > Mark Kettenis wrote:
> >
> >> Firefox makes a lot of concurrent malloc(3) calls. The locking to
> >> make malloc(3) thread-safe is a bit...suboptimal. This diff makes
> >> things better by using a mutex instead of spinlock. If you're running
> >> Firefox you want to try it; it makes video watchable on some machines.
> >> If you're not running Firefox you want to try it; to make sure it
> >> doesn't break things.
> >>
> >> Enjoy,
> >>
> >> Mark
> >
> > Applied to a Jan 15h snapshot sources. Youtube is not fully 'watchable'
> > on firefox but feels significantly better. I can also now watch full
> > screen youtube videos on chromium 1920x1080 with no stutter (lenovo
> > g50-70).
> >
> > Generally gnome 3 feels a bit snappier especially on first load,
> > bringing up the menu searching for 'terminal' leads to a faster
> > rendering of the results. This might be just 'imagined' by me.
> >
> > On a more measurable front. I ran the octane benchmark against firefox
> > post and before the patch. It resulted in a slight improvement from
> > 12486 to 12826 score [1].
>
> Besides performance related issues, the problem we saw in the past was
> firefox using a huge amount of CPU resources with no apparent
> reasons...

I've seen the same behavior on Linux. Probably not 100% related to the OS.

--
Juan Francisco Cantero Hurtado http://juanfra.info
Re: Firefox, malloc(3) and threads
Hi Mark,

On Fri, Jan 22, 2016 at 10:46:39PM +0100, Mark Kettenis wrote:
> Firefox makes a lot of concurrent malloc(3) calls. The locking to
> make malloc(3) thread-safe is a bit...suboptimal. This diff makes
> things better by using a mutex instead of spinlock. If you're running
> Firefox you want to try it; it makes video watchable on some machines.
> If you're not running Firefox you want to try it; to make sure it
> doesn't break things.

I tried your diff. Nothing bad happened.

I don't notice much difference in Firefox using a highly unscientific
"gut-feeling" before-and-after test. YouTube videos still stutter, too
much to watch. During this time Firefox uses ~170% CPU.

I also tried Iridium, my everyday browser, and didn't notice a
difference there either. YouTube video performance remains the same:
much better than Firefox, but still skipping frequently.

My system is a ThinkPad X240t tablet. Dmesg follows (sorry about the
suspend in there: I have to perform a zzz and wake before the HDMI2
output shows up in my docking station, so it's always the first thing I
do after booting fresh; I keep meaning to look into this):

OpenBSD 5.9-beta (GENERIC.MP) #17: Mon Jan 25 14:31:46 GMT 2016 e...@wilfred.dlink.com:/usr/src/sys/arch/amd64/compile/GENERIC.MP real mem = 16844521472 (16064MB) avail mem = 16329822208 (15573MB) mpath0 at root scsibus0 at mpath0: 256 targets mainbus0 at root bios0 at mainbus0: SMBIOS rev. 2.7 @ 0xdae9c000 (68 entries) bios0: vendor LENOVO version "GCETA2WW (2.62 )" date 04/09/2015 bios0: LENOVO 3437CTO acpi0 at bios0: rev 2 acpi0: sleep states S0 S3 S4 S5 acpi0: tables DSDT FACP SLIC TCPA SSDT SSDT SSDT HPET APIC MCFG ECDT FPDT ASF!
UEFI UEFI POAT SSDT SSDT DMAR UEFI DBG2 acpi0: wakeup devices LID_(S4) SLPB(S3) IGBE(S4) EXP3(S4) XHCI(S3) EHC1(S3) EHC2(S3) HDEF(S4) acpitimer0 at acpi0: 3579545 Hz, 24 bits acpihpet0 at acpi0: 14318179 Hz acpimadt0 at acpi0 addr 0xfee0: PC-AT compat cpu0 at mainbus0: apid 0 (boot processor) cpu0: Intel(R) Core(TM) i5-3320M CPU @ 2.60GHz, 2594.59 MHz cpu0: FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,SSE3,PCLMUL,DTES64,MWAIT,DS-CPL,VMX,SMX,EST,TM2,SSSE3,CX16,xTPR,PDCM,PCID,SSE4.1,SSE4.2,x2APIC,POPCNT,DEADLINE,AES,XSAVE,AVX,F16C,RDRAND,NXE,LONG,LAHF,PERF,ITSC,FSGSBASE,SMEP,ERMS,SENSOR,ARAT cpu0: 256KB 64b/line 8-way L2 cache cpu0: smt 0, core 0, package 0 mtrr: Pentium Pro MTRR support, 10 var ranges, 88 fixed ranges cpu0: apic clock running at 99MHz cpu0: mwait min=64, max=64, C-substates=0.2.1.1.2, IBE cpu1 at mainbus0: apid 2 (application processor) cpu1: Intel(R) Core(TM) i5-3320M CPU @ 2.60GHz, 2594.11 MHz cpu1: FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,SSE3,PCLMUL,DTES64,MWAIT,DS-CPL,VMX,SMX,EST,TM2,SSSE3,CX16,xTPR,PDCM,PCID,SSE4.1,SSE4.2,x2APIC,POPCNT,DEADLINE,AES,XSAVE,AVX,F16C,RDRAND,NXE,LONG,LAHF,PERF,ITSC,FSGSBASE,SMEP,ERMS,SENSOR,ARAT cpu1: 256KB 64b/line 8-way L2 cache cpu1: smt 0, core 1, package 0 ioapic0 at mainbus0: apid 2 pa 0xfec0, version 20, 24 pins acpimcfg0 at acpi0 addr 0xf800, bus 0-63 acpiec0 at acpi0 acpiprt0 at acpi0: bus 0 (PCI0) acpiprt1 at acpi0: bus -1 (PEG_) acpiprt2 at acpi0: bus 2 (EXP1) acpiprt3 at acpi0: bus 3 (EXP2) acpiprt4 at acpi0: bus 4 (EXP3) acpicpu0 at acpi0: C2(350@80 mwait.1@0x20), C1(1000@1 mwait.1), PSS acpicpu1 at acpi0: C2(350@80 mwait.1@0x20), C1(1000@1 mwait.1), PSS acpipwrres0 at acpi0: PUBS, resource for XHCI, EHC1, EHC2 acpitz0 at acpi0: critical temperature is 103 degC acpibtn0 at acpi0: LID_ acpibtn1 at acpi0: SLPB acpibat0 at acpi0: BAT0 model "45N1077" 
serial 14278 type LION oem "SANYO" acpibat1 at acpi0: BAT1 not present acpiac0 at acpi0: AC unit online acpithinkpad0 at acpi0 acpidock0 at acpi0: GDCK docked (15) cpu0: Enhanced SpeedStep 2594 MHz: speeds: 2601, 2600, 2500, 2400, 2300, 2200, 2100, 2000, 1900, 1800, 1700, 1600, 1500, 1400, 1300, 1200 MHz pci0 at mainbus0 bus 0 pchb0 at pci0 dev 0 function 0 "Intel Core 3G Host" rev 0x09 inteldrm0 at pci0 dev 2 function 0 "Intel HD Graphics 4000" rev 0x09 drm0 at inteldrm0 inteldrm0: msi inteldrm0: 1366x768 wsdisplay0 at inteldrm0 mux 1: console (std, vt100 emulation) wsdisplay0: screen 1-5 added (std, vt100 emulation) "Intel 7 Series MEI" rev 0x04 at pci0 dev 22 function 0 not configured em0 at pci0 dev 25 function 0 "Intel 82579LM" rev 0x04: msi, address 3c:97:0e:a5:02:69 ehci0 at pci0 dev 26 function 0 "Intel 7 Series USB" rev 0x04: apic 2 int 16 usb0 at ehci0: USB revision 2.0 uhub0 at usb0 "Intel EHCI root hub" rev 2.00/1.00 addr 1 azalia0 at pci0 dev 27 function 0 "Intel 7 Series HD Audio" rev 0x04: msi azalia0: codecs: Realtek ALC269, Intel/0x2806, using Realtek ALC269 audio0 at azalia0 ppb0 at pci0 dev 28 function 0 "Intel 7 Series PCIE" rev 0xc4: msi pci1 at ppb0 bus 2 sdhc0 at pci1 dev 0 function 0 "Ricoh 5U822 SD/MMC" rev 0x07: apic 2 int 16 sdmmc0 at sdhc0 ppb1
Re: Firefox, malloc(3) and threads
I haven't tried anything too scientific yet, but pages seem to load
quicker and Firefox seems more responsive under load for me. Before this
patch, loading a complex page tended to lock up the browser for a few
seconds. Nothing seems to have broken, so I'll try harder.
Re: Firefox, malloc(3) and threads
On Sun, Jan 24, 2016 at 7:47 PM, Adam Wolk wrote:
> On Fri, 22 Jan 2016 22:46:39 +0100 (CET)
> Mark Kettenis wrote:
>
>> Firefox makes a lot of concurrent malloc(3) calls. The locking to
>> make malloc(3) thread-safe is a bit...suboptimal. This diff makes
>> things better by using a mutex instead of spinlock. If you're running
>> Firefox you want to try it; it makes video watchable on some machines.
>> If you're not running Firefox you want to try it; to make sure it
>> doesn't break things.
>>
>> Enjoy,
>>
>> Mark
>
> Applied to a Jan 15h snapshot sources. Youtube is not fully 'watchable'
> on firefox but feels significantly better. I can also now watch full
> screen youtube videos on chromium 1920x1080 with no stutter (lenovo
> g50-70).
>
> Generally gnome 3 feels a bit snappier especially on first load,
> bringing up the menu searching for 'terminal' leads to a faster
> rendering of the results. This might be just 'imagined' by me.
>
> On a more measurable front. I ran the octane benchmark against firefox
> post and before the patch. It resulted in a slight improvement from
> 12486 to 12826 score [1].

Besides performance-related issues, the problem we saw in the past was
Firefox using a huge amount of CPU resources for no apparent reason...

So please also test whether you still see this erratic behavior with
Mark's patch applied.

ciao,
David
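David's question (does the erratic CPU usage go away?) is easier to answer with a repeatable number than with gut feeling. The following is a hypothetical stress program, not something posted in this thread: `malloc_stress()`, `struct stress_result` and the allocation-size pattern are made up for illustration and are only a crude stand-in for Firefox's workload. It runs several threads that call malloc(3)/free(3) in a loop and reports wall-clock time alongside the process's user+system CPU time for that run.

```c
/*
 * Hypothetical malloc stress test: N threads allocate and free in a
 * loop; the result pairs wall-clock time with CPU time for the run.
 */
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>
#include <sys/time.h>
#include <sys/resource.h>
#include <time.h>

#define MAX_THREADS 16

struct stress_result {
	double wall_sec;	/* elapsed real time of the run */
	double cpu_sec;		/* user + system CPU time used by the run */
	long ops;		/* malloc/free pairs completed */
};

static long iterations_per_thread;	/* set before threads start */

static void *stress_worker(void *arg)
{
	long *done = arg;

	for (long i = 0; i < iterations_per_thread; i++) {
		size_t sz = 16 + (size_t)(i % 64) * 32;	/* vary the size class */
		char *p = malloc(sz);

		if (p == NULL)
			break;
		((volatile char *)p)[0] = 1;	/* volatile store keeps the allocation from being elided */
		free(p);
		(*done)++;
	}
	return NULL;
}

/* Process-wide user + system CPU seconds consumed so far. */
static double cpu_seconds(void)
{
	struct rusage ru;

	getrusage(RUSAGE_SELF, &ru);
	return ru.ru_utime.tv_sec + ru.ru_utime.tv_usec / 1e6 +
	    ru.ru_stime.tv_sec + ru.ru_stime.tv_usec / 1e6;
}

static struct stress_result malloc_stress(int nthreads, long iterations)
{
	pthread_t tid[MAX_THREADS];
	long done[MAX_THREADS] = {0};
	struct timespec t0, t1;
	struct stress_result r = {0};
	double cpu0;
	int started = 0;

	if (nthreads > MAX_THREADS)
		nthreads = MAX_THREADS;
	iterations_per_thread = iterations;
	cpu0 = cpu_seconds();
	clock_gettime(CLOCK_MONOTONIC, &t0);
	for (int i = 0; i < nthreads; i++)
		if (pthread_create(&tid[started], NULL, stress_worker,
		    &done[started]) == 0)
			started++;
	for (int i = 0; i < started; i++) {
		pthread_join(tid[i], NULL);
		r.ops += done[i];
	}
	clock_gettime(CLOCK_MONOTONIC, &t1);
	r.wall_sec = (t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) / 1e9;
	r.cpu_sec = cpu_seconds() - cpu0;
	return r;
}
```

Running, say, `malloc_stress(4, 1000000)` from a small driver on the same machine before and after the diff, and comparing `cpu_sec` for the same `ops`, should hint at whether contended waiters are burning CPU; absolute numbers will depend heavily on the hardware.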
Re: Firefox, malloc(3) and threads
On Mon, Jan 25, 2016 at 08:48:21AM +0100, Mark Kettenis wrote:
> > From: "Peter N. M. Hansteen"
> > Date: Sun, 24 Jan 2016 23:10:41 +0100
> >
> > On 01/22/16 22:46, Mark Kettenis wrote:
> > > Firefox makes a lot of concurrent malloc(3) calls. The locking to
> > > make malloc(3) thread-safe is a bit...suboptimal. This diff makes
> > > things better by using a mutex instead of spinlock. If you're running
> > > Firefox you want to try it; it makes video watchable on some machines.
> > > If you're not running Firefox you want to try it; to make sure it
> > > doesn't break things.
> >
> > Running this since early Saturday, Firefox is definitely more responsive
> > than earlier.
> >
> > I haven't tried running other resource hogs such as LibreOffice with
> > several large documents, but I guess I could try that too if it's a
> > relevant scenario.
>
> Please do!

Albeit small, x11/xfce4/thunar makes heavy use of threads (in general,
and even more so when talking to gvfs mounts). It now feels 200% snappier.

Landry
Re: Firefox, malloc(3) and threads
> From: "Peter N. M. Hansteen"
> Date: Sun, 24 Jan 2016 23:10:41 +0100
>
> On 01/22/16 22:46, Mark Kettenis wrote:
> > Firefox makes a lot of concurrent malloc(3) calls. The locking to
> > make malloc(3) thread-safe is a bit...suboptimal. This diff makes
> > things better by using a mutex instead of spinlock. If you're running
> > Firefox you want to try it; it makes video watchable on some machines.
> > If you're not running Firefox you want to try it; to make sure it
> > doesn't break things.
>
> Running this since early Saturday, Firefox is definitely more responsive
> than earlier.
>
> I haven't tried running other resource hogs such as LibreOffice with
> several large documents, but I guess I could try that too if it's a
> relevant scenario.

Please do!
Re: Firefox, malloc(3) and threads
On 01/22/16 22:46, Mark Kettenis wrote:
> Firefox makes a lot of concurrent malloc(3) calls. The locking to
> make malloc(3) thread-safe is a bit...suboptimal. This diff makes
> things better by using a mutex instead of spinlock. If you're running
> Firefox you want to try it; it makes video watchable on some machines.
> If you're not running Firefox you want to try it; to make sure it
> doesn't break things.

Running this since early Saturday, Firefox is definitely more responsive
than earlier.

I haven't tried running other resource hogs such as LibreOffice with
several large documents, but I guess I could try that too if it's a
relevant scenario.

- P
--
Peter N. M. Hansteen, member of the first RFC 1149 implementation team
http://bsdly.blogspot.com/ http://www.bsdly.net/ http://www.nuug.no/
"Remember to set the evil bit on all malicious network traffic"
delilah spamd[29949]: 85.152.224.147: disconnected after 42673 seconds.
Re: Firefox, malloc(3) and threads
On 24 January 2016 at 20:47, Adam Wolk wrote:
> On Fri, 22 Jan 2016 22:46:39 +0100 (CET)
> Mark Kettenis wrote:
>
>> Firefox makes a lot of concurrent malloc(3) calls. The locking to
>> make malloc(3) thread-safe is a bit...suboptimal. This diff makes
>> things better by using a mutex instead of spinlock. If you're running
>> Firefox you want to try it; it makes video watchable on some machines.
>> If you're not running Firefox you want to try it; to make sure it
>> doesn't break things.
>>
>> Enjoy,
>>
>> Mark
>
> Applied to a Jan 15h snapshot sources. Youtube is not fully 'watchable'
> on firefox but feels significantly better. I can also now watch full
> screen youtube videos on chromium 1920x1080 with no stutter (lenovo
> g50-70).
>
> Generally gnome 3 feels a bit snappier especially on first load,
> bringing up the menu searching for 'terminal' leads to a faster
> rendering of the results. This might be just 'imagined' by me.
>
> On a more measurable front. I ran the octane benchmark against firefox
> post and before the patch. It resulted in a slight improvement from
> 12486 to 12826 score [1].
>
> cpu0: Intel(R) Core(TM) i7-4510U CPU @ 2.00GHz, 1895.93 MHz
> cpu1: Intel(R) Core(TM) i7-4510U CPU @ 2.00GHz, 1895.62 MHz
> cpu2: Intel(R) Core(TM) i7-4510U CPU @ 2.00GHz, 1895.62 MHz
> cpu3: Intel(R) Core(TM) i7-4510U CPU @ 2.00GHz, 1895.62 MHz
> inteldrm0 at pci0 dev 2 function 0 "Intel HD Graphics" rev 0x0b
>
> running Intel Haswell Mobile for the gfx card.
>
> Regards,
> Adam
>
> [1] - https://twitter.com/mulander/status/691327370985345024

Hi,

pretty much the same results here, though I'm running a Lenovo X250
with an i7-5600U.

Dankuwel (thank you), Mark, nice find.

--
Regards,
Ville Valkonen
Re: Firefox, malloc(3) and threads
On Fri, 22 Jan 2016 22:46:39 +0100 (CET)
Mark Kettenis wrote:

> Firefox makes a lot of concurrent malloc(3) calls. The locking to
> make malloc(3) thread-safe is a bit...suboptimal. This diff makes
> things better by using a mutex instead of spinlock. If you're running
> Firefox you want to try it; it makes video watchable on some machines.
> If you're not running Firefox you want to try it; to make sure it
> doesn't break things.
>
> Enjoy,
>
> Mark

Applied to Jan 15th snapshot sources. YouTube is not fully 'watchable'
on Firefox but feels significantly better. I can also now watch
full-screen YouTube videos on Chromium at 1920x1080 with no stutter
(Lenovo G50-70).

Generally GNOME 3 feels a bit snappier, especially on first load:
bringing up the menu and searching for 'terminal' renders the results
faster. This might be just 'imagined' by me.

On a more measurable front, I ran the Octane benchmark against Firefox
before and after the patch. It resulted in a slight improvement, from a
score of 12486 to 12826 [1].

cpu0: Intel(R) Core(TM) i7-4510U CPU @ 2.00GHz, 1895.93 MHz
cpu1: Intel(R) Core(TM) i7-4510U CPU @ 2.00GHz, 1895.62 MHz
cpu2: Intel(R) Core(TM) i7-4510U CPU @ 2.00GHz, 1895.62 MHz
cpu3: Intel(R) Core(TM) i7-4510U CPU @ 2.00GHz, 1895.62 MHz
inteldrm0 at pci0 dev 2 function 0 "Intel HD Graphics" rev 0x0b

running Intel Haswell Mobile for the gfx card.

Regards,
Adam

[1] - https://twitter.com/mulander/status/691327370985345024
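For scale, Adam's Octane result is an increase of (12826 - 12486) / 12486 = 340 / 12486, or roughly 2.7%. A trivial helper for that arithmetic (the name `percent_change` is hypothetical):

```c
#include <assert.h>

/* Relative change from `before` to `after`, in percent (hypothetical helper). */
static double percent_change(double before, double after)
{
	return 100.0 * (after - before) / before;
}
```

`percent_change(12486, 12826)` returns about 2.72, which fits Adam's "slight improvement".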
Re: Firefox, malloc(3) and threads
* On Fri Jan 22, 2016 at 10:46:39PM +0100 28706 , Mark Kettenis (mark.kette...@xs4all.nl) wrote:
>
> Firefox makes a lot of concurrent malloc(3) calls. The locking to
> make malloc(3) thread-safe is a bit...suboptimal. This diff makes
> things better by using a mutex instead of spinlock. If you're running
> Firefox you want to try it; it makes video watchable on some machines.
> If you're not running Firefox you want to try it; to make sure it
> doesn't break things.
>
> Enjoy,
>
> Mark
>
[snip]

Hi Mark,

I have applied your patch and noticed a big improvement with YouTube
videos, and if I am not mistaken, content-heavy websites like news sites
seem to load faster and more smoothly too. This machine is a 2009
MacBook Pro running -current.

I will patch my -current server as well and let you know if I notice
anything good or bad.

Awesome! Thanks!!
Re: Firefox, malloc(3) and threads
Yes! This absolutely makes YouTube videos watchable for me (on a
ThinkPad T520). There is still occasional stuttering, but it's _far_
less disruptive than before. Another use case where I see improvements
is reloading a resource-heavy web page while switching tabs. Before
applying the patch, this caused the browser to hang for several seconds.
Now it doesn't.

The patch reads fine too, although I'm not an rthread expert. It doesn't
seem to break anything on my system either.

Thanks,
natano

On Fri, Jan 22, 2016 at 10:46:39PM +0100, Mark Kettenis wrote:
> Firefox makes a lot of concurrent malloc(3) calls. The locking to
> make malloc(3) thread-safe is a bit...suboptimal. This diff makes
> things better by using a mutex instead of spinlock. If you're running
> Firefox you want to try it; it makes video watchable on some machines.
> If you're not running Firefox you want to try it; to make sure it
> doesn't break things.
>
> Enjoy,
>
> Mark
>
>
> Index: rthread.h
> ===
> RCS file: /cvs/src/lib/librthread/rthread.h,v
> retrieving revision 1.54
> diff -u -p -r1.54 rthread.h
> --- rthread.h 10 Nov 2015 04:30:59 - 1.54
> +++ rthread.h 22 Jan 2016 21:08:11 -
> @@ -223,6 +223,7 @@ void _rthread_debug_init(void);
>  #ifndef NO_PIC
>  void _rthread_dl_lock(int what);
>  #endif
> +void _thread_malloc_reinit(void);
>
>  /* rthread_cancel.c */
>  void _enter_cancel(pthread_t);
> Index: rthread_fork.c
> ===
> RCS file: /cvs/src/lib/librthread/rthread_fork.c,v
> retrieving revision 1.14
> diff -u -p -r1.14 rthread_fork.c
> --- rthread_fork.c 18 Oct 2015 08:02:58 - 1.14
> +++ rthread_fork.c 22 Jan 2016 21:08:11 -
> @@ -82,7 +82,10 @@ _dofork(int is_vfork)
>  	newid = sys_fork();
>
>  	_thread_arc4_unlock();
> -	_thread_malloc_unlock();
> +	if (newid == 0)
> +		_thread_malloc_reinit();
> +	else
> +		_thread_malloc_unlock();
>  	_thread_atexit_unlock();
>
>  	if (newid == 0) {
> Index: rthread_libc.c
> ===
> RCS file: /cvs/src/lib/librthread/rthread_libc.c,v
> retrieving revision 1.12
> diff -u -p -r1.12 rthread_libc.c
> --- rthread_libc.c 7 Apr 2015 01:27:07 - 1.12
> +++ rthread_libc.c 22 Jan 2016 21:08:11 -
> @@ -152,18 +152,35 @@ _thread_mutex_destroy(void **mutex)
>  /*
>   * the malloc lock
>   */
> -static struct _spinlock malloc_lock = _SPINLOCK_UNLOCKED;
> +static struct pthread_mutex malloc_lock = {
> +	_SPINLOCK_UNLOCKED,
> +	TAILQ_HEAD_INITIALIZER(malloc_lock.lockers),
> +	PTHREAD_MUTEX_DEFAULT,
> +	NULL,
> +	0,
> +	-1
> +};
> +static pthread_mutex_t malloc_mutex = &malloc_lock;
>
>  void
>  _thread_malloc_lock(void)
>  {
> -	_spinlock(&malloc_lock);
> +	pthread_mutex_lock(&malloc_mutex);
>  }
>
>  void
>  _thread_malloc_unlock(void)
>  {
> -	_spinunlock(&malloc_lock);
> +	pthread_mutex_unlock(&malloc_mutex);
> +}
> +
> +void
> +_thread_malloc_reinit(void)
> +{
> +	malloc_lock.lock = _SPINLOCK_UNLOCKED_ASSIGN;
> +	TAILQ_INIT(&malloc_lock.lockers);
> +	malloc_lock.owner = NULL;
> +	malloc_lock.count = 0;
>  }
>
>  /*
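A note on the rthread_fork.c hunk: presumably the child needs `_thread_malloc_reinit()` rather than a plain unlock because the new lock is a `struct pthread_mutex` with an owner, a count and a queue of waiting threads (`lockers`), and after fork(2) the child contains only the forking thread, so any queued waiters no longer exist there. Ordinary programs that guard their own data with a mutex can get a comparable guarantee from pthread_atfork(3). The sketch below assumes that approach and is not librthread's code; `alloc_lock` and the `alloc_*`/`fork_and_check_lock` functions are hypothetical. Its child handler simply unlocks, which works because the child's only thread is the copy of the thread that took the lock in the prepare handler.

```c
/*
 * Sketch: keep a (hypothetical) allocator lock consistent across fork()
 * with pthread_atfork(). Not librthread's implementation.
 */
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

static pthread_mutex_t alloc_lock = PTHREAD_MUTEX_INITIALIZER;
static int hooks_installed;

/* Before fork: hold the lock so no other thread is mid-allocation. */
static void alloc_prepare(void)
{
	pthread_mutex_lock(&alloc_lock);
}

/* In the parent after fork: release it again. */
static void alloc_parent(void)
{
	pthread_mutex_unlock(&alloc_lock);
}

/* In the child: the lone thread owns the lock, so it can release it. */
static void alloc_child(void)
{
	pthread_mutex_unlock(&alloc_lock);
}

/* Register the handlers once; returns 0 on success, -1 on failure. */
static int alloc_install_fork_hooks(void)
{
	if (hooks_installed)
		return 0;
	if (pthread_atfork(alloc_prepare, alloc_parent, alloc_child) != 0)
		return -1;
	hooks_installed = 1;
	return 0;
}

/* Fork; the child exits 0 if it can take the lock. Returns that status. */
static int fork_and_check_lock(void)
{
	int status;
	pid_t pid = fork();

	if (pid == -1)
		return -1;
	if (pid == 0) {
		int ok = pthread_mutex_trylock(&alloc_lock) == 0;
		_exit(ok ? 0 : 1);
	}
	if (waitpid(pid, &status, 0) == -1 || !WIFEXITED(status))
		return -1;
	return WEXITSTATUS(status);
}
```

This is the pattern described in the POSIX rationale for pthread_atfork(): take the lock before forking and release it in both parent and child. librthread can go further and reset the mutex's internal fields directly, as the diff does, because it owns the mutex implementation.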
Firefox, malloc(3) and threads
Firefox makes a lot of concurrent malloc(3) calls. The locking to
make malloc(3) thread-safe is a bit...suboptimal. This diff makes
things better by using a mutex instead of spinlock. If you're running
Firefox you want to try it; it makes video watchable on some machines.
If you're not running Firefox you want to try it; to make sure it
doesn't break things.

Enjoy,

Mark


Index: rthread.h
===
RCS file: /cvs/src/lib/librthread/rthread.h,v
retrieving revision 1.54
diff -u -p -r1.54 rthread.h
--- rthread.h 10 Nov 2015 04:30:59 - 1.54
+++ rthread.h 22 Jan 2016 21:08:11 -
@@ -223,6 +223,7 @@ void _rthread_debug_init(void);
 #ifndef NO_PIC
 void _rthread_dl_lock(int what);
 #endif
+void _thread_malloc_reinit(void);
 
 /* rthread_cancel.c */
 void _enter_cancel(pthread_t);
Index: rthread_fork.c
===
RCS file: /cvs/src/lib/librthread/rthread_fork.c,v
retrieving revision 1.14
diff -u -p -r1.14 rthread_fork.c
--- rthread_fork.c 18 Oct 2015 08:02:58 - 1.14
+++ rthread_fork.c 22 Jan 2016 21:08:11 -
@@ -82,7 +82,10 @@ _dofork(int is_vfork)
 	newid = sys_fork();
 
 	_thread_arc4_unlock();
-	_thread_malloc_unlock();
+	if (newid == 0)
+		_thread_malloc_reinit();
+	else
+		_thread_malloc_unlock();
 	_thread_atexit_unlock();
 
 	if (newid == 0) {
Index: rthread_libc.c
===
RCS file: /cvs/src/lib/librthread/rthread_libc.c,v
retrieving revision 1.12
diff -u -p -r1.12 rthread_libc.c
--- rthread_libc.c 7 Apr 2015 01:27:07 - 1.12
+++ rthread_libc.c 22 Jan 2016 21:08:11 -
@@ -152,18 +152,35 @@ _thread_mutex_destroy(void **mutex)
 /*
  * the malloc lock
  */
-static struct _spinlock malloc_lock = _SPINLOCK_UNLOCKED;
+static struct pthread_mutex malloc_lock = {
+	_SPINLOCK_UNLOCKED,
+	TAILQ_HEAD_INITIALIZER(malloc_lock.lockers),
+	PTHREAD_MUTEX_DEFAULT,
+	NULL,
+	0,
+	-1
+};
+static pthread_mutex_t malloc_mutex = &malloc_lock;
 
 void
 _thread_malloc_lock(void)
 {
-	_spinlock(&malloc_lock);
+	pthread_mutex_lock(&malloc_mutex);
 }
 
 void
 _thread_malloc_unlock(void)
 {
-	_spinunlock(&malloc_lock);
+	pthread_mutex_unlock(&malloc_mutex);
+}
+
+void
+_thread_malloc_reinit(void)
+{
+	malloc_lock.lock = _SPINLOCK_UNLOCKED_ASSIGN;
+	TAILQ_INIT(&malloc_lock.lockers);
+	malloc_lock.owner = NULL;
+	malloc_lock.count = 0;
 }
 
 /*
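The core of the diff is swapping `_spinlock()`/`_spinunlock()` for `pthread_mutex_lock()`/`pthread_mutex_unlock()` around malloc(3). To illustrate the general difference, here is a toy C11 sketch, assuming a simple test-and-set spinlock whose waiters call sched_yield() and retry; it is not OpenBSD's `_spinlock()` or librthread's mutex, and `run_contended()` and the other names are hypothetical. A spinning waiter stays runnable and keeps polling, so when many threads contend for one lock (as Firefox's threads do for the malloc lock) CPU time goes into waiting; a mutex implementation is free to put contended waiters to sleep until the lock is released.

```c
/*
 * Toy comparison: a test-and-set spinlock vs a pthread mutex, both
 * guarding the same counter. Hypothetical names; not OpenBSD's code.
 */
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>

#define MAX_THREADS 16
#define ITERATIONS 100000

static atomic_flag spin = ATOMIC_FLAG_INIT;
static pthread_mutex_t mtx = PTHREAD_MUTEX_INITIALIZER;
static long counter;	/* shared state both locks protect */
static int use_mutex;	/* set before threads start, read-only after */

/* Spinlock: a waiter stays runnable and keeps retrying the flag. */
static void spin_lock(void)
{
	while (atomic_flag_test_and_set_explicit(&spin, memory_order_acquire))
		sched_yield();	/* give up the CPU briefly, then poll again */
}

static void spin_unlock(void)
{
	atomic_flag_clear_explicit(&spin, memory_order_release);
}

static void *worker(void *arg)
{
	(void)arg;
	for (int i = 0; i < ITERATIONS; i++) {
		if (use_mutex)
			pthread_mutex_lock(&mtx);	/* may sleep if contended */
		else
			spin_lock();
		counter++;	/* stands in for malloc's critical section */
		if (use_mutex)
			pthread_mutex_unlock(&mtx);
		else
			spin_unlock();
	}
	return NULL;
}

/* Run nthreads workers (capped at MAX_THREADS); returns the final count. */
static long run_contended(int nthreads, int mutex)
{
	pthread_t tid[MAX_THREADS];
	int started = 0;

	if (nthreads > MAX_THREADS)
		nthreads = MAX_THREADS;
	counter = 0;
	use_mutex = mutex;
	for (int i = 0; i < nthreads; i++)
		if (pthread_create(&tid[started], NULL, worker, NULL) == 0)
			started++;
	for (int i = 0; i < started; i++)
		pthread_join(tid[i], NULL);
	return counter;
}
```

Calling `run_contended(8, 0)` and `run_contended(8, 1)` from a small driver and comparing their CPU time (for example with getrusage(2)) gives a rough feel for the trade-off on a given machine. Under light contention a spinlock can be the cheaper choice, since it avoids sleeping and waking, which is one reason both designs exist.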