Re: Firefox, malloc(3) and threads

2016-01-26 Thread Nayden Markatchev
FYI:  This diff is in the snapshots since Sunday.

On Mon, Jan 25, 2016 at 4:34 PM, Matthew Via  wrote:

> I've had the patch applied for two days now and have not seen any ill
> effects.  This is a Thinkpad T410 running snapshots.
>
> Before, youtube was unwatchable.  Sound would continue normally while
> video would freeze for long stretches, often over 10 seconds.  It's not
> perfect now, but it's very nearly so when not fullscreen.
>
> It does seem that cpu usage of firefox is also significantly reduced,
> and firefox is generally snappier.
>
> Thank you!
> -via
>
> On 22:46 Fri 22 Jan , Mark Kettenis wrote:
> > Firefox makes a lot of concurrent malloc(3) calls.  The locking to
> > make malloc(3) thread-safe is a bit...suboptimal.  This diff makes
> > things better by using a mutex instead of spinlock.  If you're running
> > Firefox you want to try it; it makes video watchable on some machines.
> > If you're not running Firefox you want to try it; to make sure it
> > doesn't break things.
> >
> > Enjoy,
> >
> > Mark
>
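
For readers who want the spinlock-versus-mutex distinction from the quoted message spelled out, here is a minimal sketch. It is not librthread code: `spin_lock`, `spin_unlock`, `run_counter` and the other names are hypothetical, and the yield-and-retry loop is only an assumption about how a userland spinlock typically waits. Both variants protect the same counter; the difference is that a spinning waiter keeps getting scheduled to retry, while a mutex waiter can sleep until the lock is handed over.

```c
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>
#include <stddef.h>

#define MAX_THREADS 64

/* Hypothetical demo state; none of these names come from librthread. */
static atomic_flag spin = ATOMIC_FLAG_INIT;
static pthread_mutex_t mtx = PTHREAD_MUTEX_INITIALIZER;
static long counter;

/* Spin-style lock: retry until the flag is free, yielding between tries. */
static void spin_lock(void)
{
	while (atomic_flag_test_and_set_explicit(&spin, memory_order_acquire))
		sched_yield();
}

static void spin_unlock(void)
{
	atomic_flag_clear_explicit(&spin, memory_order_release);
}

struct job {
	int use_mutex;	/* 0: spin-style lock, 1: pthread mutex */
	long iters;	/* increments performed by each thread */
};

static void *worker(void *arg)
{
	const struct job *j = arg;

	for (long i = 0; i < j->iters; i++) {
		if (j->use_mutex)
			pthread_mutex_lock(&mtx);
		else
			spin_lock();
		counter++;		/* the protected critical section */
		if (j->use_mutex)
			pthread_mutex_unlock(&mtx);
		else
			spin_unlock();
	}
	return NULL;
}

/* Runs nthreads workers and returns the final counter value. */
long run_counter(int use_mutex, int nthreads, long iters)
{
	pthread_t tids[MAX_THREADS];
	struct job j = { use_mutex, iters };

	assert(nthreads > 0 && nthreads <= MAX_THREADS);
	counter = 0;
	for (int i = 0; i < nthreads; i++) {
		int rc = pthread_create(&tids[i], NULL, worker, &j);
		assert(rc == 0);
		(void)rc;
	}
	for (int i = 0; i < nthreads; i++)
		pthread_join(tids[i], NULL);
	return counter;
}
```

Calling `run_counter(0, 4, 100000)` and `run_counter(1, 4, 100000)` should return the same total; any difference would show up in CPU time and latency under contention, not in the result.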


Re: Firefox, malloc(3) and threads

2016-01-26 Thread Landry Breuil
On Mon, Jan 25, 2016 at 09:57:37AM +0100, Landry Breuil wrote:
> On Mon, Jan 25, 2016 at 08:48:21AM +0100, Mark Kettenis wrote:
> > > From: "Peter N. M. Hansteen" 
> > > Date: Sun, 24 Jan 2016 23:10:41 +0100
> > > 
> > > On 01/22/16 22:46, Mark Kettenis wrote:
> > > > Firefox makes a lot of concurrent malloc(3) calls.  The locking to
> > > > make malloc(3) thread-safe is a bit...suboptimal.  This diff makes
> > > > things better by using a mutex instead of spinlock.  If you're running
> > > > Firefox you want to try it; it makes video watchable on some machines.
> > > > If you're not running Firefox you want to try it; to make sure it
> > > > doesn't break things.
> > > 
> > > Running this since early Saturday, Firefox is definitely more responsive
> > > than earlier.
> > > 
> > > I haven't tried running other resource hogs such as LibreOffice with
> > > several large documents, but I guess I could try that too if it's a
> > > relevant scenario.
> > 
> > Please do!
> 
> Albeit small, x11/xfce4/thunar makes heavy use of threads (in general,
> and even more when talking to gvfs mounts). It now feels 200% snappier.

Another successful test on i386, where firefox had become totally unusable
(Atom N270, 1GB RAM): with the latest snap (including the diff) it's
sort-of usable (gmaps, google news...). Yay!

Landry



Re: Firefox, malloc(3) and threads

2016-01-25 Thread lists
I haven't tried anything too scientific yet, but pages seem to load
quicker and firefox seems to be more responsive under load for me.
Before this patch, loading a page would have a tendency to lock the
browser for a few seconds on complex pages.

Nothing seems to have broken, so I'll try harder.
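
For anyone who wants something slightly more repeatable than browsing, a small stress program along the following lines could be used. This is only a sketch: `malloc_stress`, `stress_worker` and the size pattern are made up for illustration, and it says nothing about how Firefox actually allocates. It simply has several threads call malloc(3) and free(3) concurrently so that the malloc lock is contended.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <stdlib.h>

#define MAX_THREADS 64

/* Hypothetical per-thread bookkeeping for the stress run. */
struct stress_arg {
	long rounds;	/* malloc/free pairs to attempt */
	long done;	/* pairs actually completed */
};

static void *stress_worker(void *p)
{
	struct stress_arg *a = p;

	for (long i = 0; i < a->rounds; i++) {
		size_t sz = 16 + (size_t)(i % 1024);
		volatile unsigned char *buf = malloc(sz);

		if (buf == NULL)
			break;
		/* Touch both ends so the allocation cannot be optimized away. */
		buf[0] = 0xA5;
		buf[sz - 1] = 0x5A;
		free((void *)buf);
		a->done++;
	}
	return NULL;
}

/* Runs nthreads workers and returns the total number of completed pairs. */
long malloc_stress(int nthreads, long rounds)
{
	pthread_t tids[MAX_THREADS];
	struct stress_arg args[MAX_THREADS];
	long total = 0;

	assert(nthreads > 0 && nthreads <= MAX_THREADS);
	for (int i = 0; i < nthreads; i++) {
		int rc;

		args[i].rounds = rounds;
		args[i].done = 0;
		rc = pthread_create(&tids[i], NULL, stress_worker, &args[i]);
		assert(rc == 0);
		(void)rc;
	}
	for (int i = 0; i < nthreads; i++) {
		pthread_join(tids[i], NULL);
		total += args[i].done;
	}
	return total;
}
```

Wrapping a call such as `malloc_stress(8, 1000000)` in a small main() and running it under time(1) before and after applying the diff would give a rough comparison of user, system and wall-clock time.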



Re: Firefox, malloc(3) and threads

2016-01-25 Thread Stefan Wollny
Hi Mark,

even with 16GB RAM I needed to install smtube to get a decent view of
videos prior to your patches. Patched last night but only tonight I am
able to do some testing:

At present I have open:
- LibreOffice Writer with one doc
- LibreOffice Calc with one doc
- gimp with one picture
- Pidgin-OTR
- smplayer (nothing playing)
- Thunderbird (two mail boxes)
- Firefox with 10 tabs open, one of them being YT (Theo talking about
pledge at Hackfest 2015)

Even though YT is hanging every now and then, it is now perfectly possible
to watch / listen / follow the presentation although I have just a
modest line. CPU usage (noticed via 'top') peaked at around 160% but
average seems to be around 100%.

I didn't notice any drawbacks from your patches. Every program is
responsive, only thunderbird had some delays while typing this post
(listening to Theo meanwhile).

While this is not a "serious" test (by academic terms as it is not 100%
repeatable) I can only report that I didn't come across any failures.
Instead the system "feels" to be highly responsive with any task I tried.

To summarize: THANK YOU!

Best,
STEFAN



OpenBSD 5.9-beta (GENERIC.MP) #1863: Sun Jan 24 21:35:42 MST 2016
dera...@amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC.MP
real mem = 17082359808 (16291MB)
avail mem = 16560455680 (15793MB)
mpath0 at root
scsibus0 at mpath0: 256 targets
mainbus0 at root
bios0 at mainbus0: SMBIOS rev. 2.7 @ 0xeb500 (35 entries)
bios0: vendor American Megatrends Inc. version "1.05.01" date 08/05/2015
bios0: Notebook W65_67SZ
acpi0 at bios0: rev 2
acpi0: sleep states S0 S3 S4 S5
acpi0: tables DSDT FACP APIC FPDT ASF! SSDT SSDT SSDT MCFG HPET SSDT
SSDT SSDT DMAR
acpi0: wakeup devices PXSX(S4) RP01(S4) PXSX(S4) RP02(S4) PXSX(S4)
RP03(S4) PXSX(S4) RP04(S4) RLAN(S4) PXSX(S4) RP05(S4) PXSX(S4) RP06(S4)
PXSX(S4) RP07(S4) PXSX(S4) [...]
acpitimer0 at acpi0: 3579545 Hz, 24 bits
acpimadt0 at acpi0 addr 0xfee0: PC-AT compat
cpu0 at mainbus0: apid 0 (boot processor)
cpu0: Intel(R) Core(TM) i5-4210M CPU @ 2.60GHz, 3093.23 MHz
cpu0:
FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,SSE3,PCLMUL,DTES64,MWAIT,DS-CPL,VMX,EST,TM2,SSSE3,FMA3,CX16,xTPR,PDCM,PCID,SSE4.1,SSE4.2,MOVBE,POPCNT,DEADLINE,AES,XSAVE,AVX,F16C,RDRAND,NXE,PAGE1GB,LONG,LAHF,ABM,PERF,ITSC,FSGSBASE,BMI1,AVX2,SMEP,BMI2,ERMS,INVPCID,SENSOR,ARAT
cpu0: 256KB 64b/line 8-way L2 cache
cpu0: smt 0, core 0, package 0
mtrr: Pentium Pro MTRR support, 10 var ranges, 88 fixed ranges
cpu0: apic clock running at 99MHz
cpu0: mwait min=64, max=64, C-substates=0.2.1.2.4, IBE
cpu1 at mainbus0: apid 2 (application processor)
cpu1: Intel(R) Core(TM) i5-4210M CPU @ 2.60GHz, 3092.84 MHz
cpu1:
FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,SSE3,PCLMUL,DTES64,MWAIT,DS-CPL,VMX,EST,TM2,SSSE3,FMA3,CX16,xTPR,PDCM,PCID,SSE4.1,SSE4.2,MOVBE,POPCNT,DEADLINE,AES,XSAVE,AVX,F16C,RDRAND,NXE,PAGE1GB,LONG,LAHF,ABM,PERF,ITSC,FSGSBASE,BMI1,AVX2,SMEP,BMI2,ERMS,INVPCID,SENSOR,ARAT
cpu1: 256KB 64b/line 8-way L2 cache
cpu1: smt 0, core 1, package 0
cpu2 at mainbus0: apid 1 (application processor)
cpu2: Intel(R) Core(TM) i5-4210M CPU @ 2.60GHz, 3092.84 MHz
cpu2:
FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,SSE3,PCLMUL,DTES64,MWAIT,DS-CPL,VMX,EST,TM2,SSSE3,FMA3,CX16,xTPR,PDCM,PCID,SSE4.1,SSE4.2,MOVBE,POPCNT,DEADLINE,AES,XSAVE,AVX,F16C,RDRAND,NXE,PAGE1GB,LONG,LAHF,ABM,PERF,ITSC,FSGSBASE,BMI1,AVX2,SMEP,BMI2,ERMS,INVPCID,SENSOR,ARAT
cpu2: 256KB 64b/line 8-way L2 cache
cpu2: smt 1, core 0, package 0
cpu3 at mainbus0: apid 3 (application processor)
cpu3: Intel(R) Core(TM) i5-4210M CPU @ 2.60GHz, 3092.84 MHz
cpu3:
FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,SSE3,PCLMUL,DTES64,MWAIT,DS-CPL,VMX,EST,TM2,SSSE3,FMA3,CX16,xTPR,PDCM,PCID,SSE4.1,SSE4.2,MOVBE,POPCNT,DEADLINE,AES,XSAVE,AVX,F16C,RDRAND,NXE,PAGE1GB,LONG,LAHF,ABM,PERF,ITSC,FSGSBASE,BMI1,AVX2,SMEP,BMI2,ERMS,INVPCID,SENSOR,ARAT
cpu3: 256KB 64b/line 8-way L2 cache
cpu3: smt 1, core 1, package 0
ioapic0 at mainbus0: apid 2 pa 0xfec0, version 20, 24 pins
acpimcfg0 at acpi0 addr 0xf800, bus 0-63
acpihpet0 at acpi0: 14318179 Hz
acpiprt0 at acpi0: bus 0 (PCI0)
acpiprt1 at acpi0: bus 2 (RP01)
acpiprt2 at acpi0: bus 3 (RP03)
acpiprt3 at acpi0: bus 4 (RP04)
acpiprt4 at acpi0: bus 1 (P0P2)
acpiprt5 at acpi0: bus -1 (P0PA)
acpiprt6 at acpi0: bus -1 (P0PB)
acpiprt7 at acpi0: bus 1 (PEG0)
acpiec0 at acpi0
acpicpu0 at acpi0: C2(200@148 mwait.1@0x33), C1(1000@1 mwait.1), PSS
acpicpu1 at acpi0: C2(200@148 mwait.1@0x33), C1(1000@1 mwait.1), PSS
acpicpu2 at acpi0: C2(200@148 mwait.1@0x33), C1(1000@1 mwait.1), PSS
acpicpu3 at acpi0: C2(200@148 mwait.1@0x33), C1(1000@1 mwait.1), PSS
acpitz0 at acpi0: critical temperature is 120 degC
acpibtn0 at acpi0: PWRB
acpibtn1 at acpi0: 

Re: Firefox, malloc(3) and threads

2016-01-25 Thread Edd Barrett
Hi Mark,

On Fri, Jan 22, 2016 at 10:46:39PM +0100, Mark Kettenis wrote:
> Firefox makes a lot of concurrent malloc(3) calls.  The locking to
> make malloc(3) thread-safe is a bit...suboptimal.  This diff makes
> things better by using a mutex instead of spinlock.  If you're running
> Firefox you want to try it; it makes video watchable on some machines.
> If you're not running Firefox you want to try it; to make sure it
> doesn't break things.

I tried your diff. Nothing bad happened.

I don't notice much difference in firefox using a highly unscientific
"gut-feeling" before and after test. Youtube videos still stutter -- too
much to watch. During this time firefox uses ~170% CPU.

I also tried iridium, my everyday browser, and didn't notice a difference
here either. Youtube video performance remains the same: much better
than firefox, but still skipping frequently.

My system is a thinkpad x240t tablet. Dmesg follows (sorry about the
suspend in there: I have to perform a zzz and wake before the HDMI2
output shows up in my docking station, so it's always the first thing I
do after booting fresh -- keep meaning to look into this):

OpenBSD 5.9-beta (GENERIC.MP) #17: Mon Jan 25 14:31:46 GMT 2016
e...@wilfred.dlink.com:/usr/src/sys/arch/amd64/compile/GENERIC.MP
real mem = 16844521472 (16064MB)
avail mem = 16329822208 (15573MB)
mpath0 at root
scsibus0 at mpath0: 256 targets
mainbus0 at root
bios0 at mainbus0: SMBIOS rev. 2.7 @ 0xdae9c000 (68 entries)
bios0: vendor LENOVO version "GCETA2WW (2.62 )" date 04/09/2015
bios0: LENOVO 3437CTO
acpi0 at bios0: rev 2
acpi0: sleep states S0 S3 S4 S5
acpi0: tables DSDT FACP SLIC TCPA SSDT SSDT SSDT HPET APIC MCFG ECDT FPDT ASF! 
UEFI UEFI POAT SSDT SSDT DMAR UEFI DBG2
acpi0: wakeup devices LID_(S4) SLPB(S3) IGBE(S4) EXP3(S4) XHCI(S3) EHC1(S3) 
EHC2(S3) HDEF(S4)
acpitimer0 at acpi0: 3579545 Hz, 24 bits
acpihpet0 at acpi0: 14318179 Hz
acpimadt0 at acpi0 addr 0xfee0: PC-AT compat
cpu0 at mainbus0: apid 0 (boot processor)
cpu0: Intel(R) Core(TM) i5-3320M CPU @ 2.60GHz, 2594.59 MHz
cpu0: 
FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,SSE3,PCLMUL,DTES64,MWAIT,DS-CPL,VMX,SMX,EST,TM2,SSSE3,CX16,xTPR,PDCM,PCID,SSE4.1,SSE4.2,x2APIC,POPCNT,DEADLINE,AES,XSAVE,AVX,F16C,RDRAND,NXE,LONG,LAHF,PERF,ITSC,FSGSBASE,SMEP,ERMS,SENSOR,ARAT
cpu0: 256KB 64b/line 8-way L2 cache
cpu0: smt 0, core 0, package 0
mtrr: Pentium Pro MTRR support, 10 var ranges, 88 fixed ranges
cpu0: apic clock running at 99MHz
cpu0: mwait min=64, max=64, C-substates=0.2.1.1.2, IBE
cpu1 at mainbus0: apid 2 (application processor)
cpu1: Intel(R) Core(TM) i5-3320M CPU @ 2.60GHz, 2594.11 MHz
cpu1: 
FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,SSE3,PCLMUL,DTES64,MWAIT,DS-CPL,VMX,SMX,EST,TM2,SSSE3,CX16,xTPR,PDCM,PCID,SSE4.1,SSE4.2,x2APIC,POPCNT,DEADLINE,AES,XSAVE,AVX,F16C,RDRAND,NXE,LONG,LAHF,PERF,ITSC,FSGSBASE,SMEP,ERMS,SENSOR,ARAT
cpu1: 256KB 64b/line 8-way L2 cache
cpu1: smt 0, core 1, package 0
ioapic0 at mainbus0: apid 2 pa 0xfec0, version 20, 24 pins
acpimcfg0 at acpi0 addr 0xf800, bus 0-63
acpiec0 at acpi0
acpiprt0 at acpi0: bus 0 (PCI0)
acpiprt1 at acpi0: bus -1 (PEG_)
acpiprt2 at acpi0: bus 2 (EXP1)
acpiprt3 at acpi0: bus 3 (EXP2)
acpiprt4 at acpi0: bus 4 (EXP3)
acpicpu0 at acpi0: C2(350@80 mwait.1@0x20), C1(1000@1 mwait.1), PSS
acpicpu1 at acpi0: C2(350@80 mwait.1@0x20), C1(1000@1 mwait.1), PSS
acpipwrres0 at acpi0: PUBS, resource for XHCI, EHC1, EHC2
acpitz0 at acpi0: critical temperature is 103 degC
acpibtn0 at acpi0: LID_
acpibtn1 at acpi0: SLPB
acpibat0 at acpi0: BAT0 model "45N1077" serial 14278 type LION oem "SANYO"
acpibat1 at acpi0: BAT1 not present
acpiac0 at acpi0: AC unit online
acpithinkpad0 at acpi0
acpidock0 at acpi0: GDCK docked (15)
cpu0: Enhanced SpeedStep 2594 MHz: speeds: 2601, 2600, 2500, 2400, 2300, 2200, 
2100, 2000, 1900, 1800, 1700, 1600, 1500, 1400, 1300, 1200 MHz
pci0 at mainbus0 bus 0
pchb0 at pci0 dev 0 function 0 "Intel Core 3G Host" rev 0x09
inteldrm0 at pci0 dev 2 function 0 "Intel HD Graphics 4000" rev 0x09
drm0 at inteldrm0
inteldrm0: msi
inteldrm0: 1366x768
wsdisplay0 at inteldrm0 mux 1: console (std, vt100 emulation)
wsdisplay0: screen 1-5 added (std, vt100 emulation)
"Intel 7 Series MEI" rev 0x04 at pci0 dev 22 function 0 not configured
em0 at pci0 dev 25 function 0 "Intel 82579LM" rev 0x04: msi, address 
3c:97:0e:a5:02:69
ehci0 at pci0 dev 26 function 0 "Intel 7 Series USB" rev 0x04: apic 2 int 16
usb0 at ehci0: USB revision 2.0
uhub0 at usb0 "Intel EHCI root hub" rev 2.00/1.00 addr 1
azalia0 at pci0 dev 27 function 0 "Intel 7 Series HD Audio" rev 0x04: msi
azalia0: codecs: Realtek ALC269, Intel/0x2806, using Realtek ALC269
audio0 at azalia0
ppb0 at pci0 dev 28 function 0 "Intel 7 Series PCIE" rev 0xc4: msi
pci1 at ppb0 bus 2
sdhc0 at pci1 dev 0 function 0 "Ricoh 5U822 SD/MMC" rev 0x07: apic 2 int 16
sdmmc0 at sdhc0

Re: Firefox, malloc(3) and threads

2016-01-25 Thread Juan Francisco Cantero Hurtado
On Mon, Jan 25, 2016 at 10:06:22AM +0100, David Coppa wrote:
> On Sun, Jan 24, 2016 at 7:47 PM, Adam Wolk  wrote:
> > On Fri, 22 Jan 2016 22:46:39 +0100 (CET)
> > Mark Kettenis  wrote:
> >
> >> Firefox makes a lot of concurrent malloc(3) calls.  The locking to
> >> make malloc(3) thread-safe is a bit...suboptimal.  This diff makes
> >> things better by using a mutex instead of spinlock.  If you're running
> >> Firefox you want to try it; it makes video watchable on some machines.
> >> If you're not running Firefox you want to try it; to make sure it
> >> doesn't break things.
> >>
> >> Enjoy,
> >>
> >> Mark
> >>  '
> >
> > Applied to Jan 15th snapshot sources. Youtube is not fully 'watchable'
> > on firefox but feels significantly better. I can also now watch full
> > screen youtube videos on chromium 1920x1080 with no stutter (lenovo
> > g50-70).
> >
> > Generally gnome 3 feels a bit snappier especially on first load,
> > bringing up the menu searching for 'terminal' leads to a faster
> > rendering of the results. This might be just 'imagined' by me.
> >
> > On a more measurable front, I ran the octane benchmark against firefox
> > before and after the patch. It resulted in a slight improvement in score
> > from 12486 to 12826 [1].
> 
> Besides performance-related issues, the problem we saw in the past was
> firefox using a huge amount of CPU resources for no apparent
> reason...

I've seen the same behavior on Linux. Probably not 100% related to the
OS.

-- 
Juan Francisco Cantero Hurtado http://juanfra.info



Re: Firefox, malloc(3) and threads

2016-01-25 Thread Daniel Bolgheroni
On Sat, Jan 23, 2016 at 03:53:32PM +0100, Martin Natano wrote:
> Yes! This absolutely makes Youtube videos watchable for me (on a
> Thinkpad T520). There still is occasional stuttering, but _far_ less
> disruptive than before. Another use case where I see improvements is
> reloading a resource-heavy web page while switching tabs. Before
> applying the patch, this caused the browser to hang for several seconds.
> Now it doesn't.

The same here on a ThinkPad T420.

dmesg:
OpenBSD 5.9-beta (GENERIC.MP) #0: Mon Jan 25 19:14:50 BRST 2016
dbolgher...@iron.my.domain:/usr/src/sys/arch/amd64/compile/GENERIC.MP
real mem = 8451125248 (8059MB)
avail mem = 8190803968 (7811MB)
mpath0 at root
scsibus0 at mpath0: 256 targets
mainbus0 at root
bios0 at mainbus0: SMBIOS rev. 2.6 @ 0xdae9c000 (65 entries)
bios0: vendor LENOVO version "83ET70WW (1.40 )" date 06/12/2012
bios0: LENOVO 4180DL4
acpi0 at bios0: rev 2
acpi0: sleep states S0 S3 S4 S5
acpi0: tables DSDT FACP SLIC SSDT SSDT SSDT HPET APIC MCFG ECDT ASF! TCPA SSDT 
SSDT DMAR UEFI UEFI UEFI
acpi0: wakeup devices LID_(S3) SLPB(S3) IGBE(S4) EXP4(S4) EHC1(S3) EHC2(S3) 
HDEF(S4)
acpitimer0 at acpi0: 3579545 Hz, 24 bits
acpihpet0 at acpi0: 14318179 Hz
acpimadt0 at acpi0 addr 0xfee0: PC-AT compat
cpu0 at mainbus0: apid 0 (boot processor)
cpu0: Intel(R) Core(TM) i5-2520M CPU @ 2.50GHz, 2492.32 MHz
cpu0: 
FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,SSE3,PCLMUL,DTES64,MWAIT,DS-CPL,VMX,SMX,EST,TM2,SSSE3,CX16,xTPR,PDCM,PCID,SSE4.1,SSE4.2,x2APIC,POPCNT,DEADLINE,AES,XSAVE,AVX,NXE,LONG,LAHF,PERF,ITSC,SENSOR,ARAT
cpu0: 256KB 64b/line 8-way L2 cache
cpu0: smt 0, core 0, package 0
mtrr: Pentium Pro MTRR support, 10 var ranges, 88 fixed ranges
cpu0: apic clock running at 99MHz
cpu0: mwait min=64, max=64, C-substates=0.2.1.1.2, IBE
cpu1 at mainbus0: apid 1 (application processor)
cpu1: Intel(R) Core(TM) i5-2520M CPU @ 2.50GHz, 2491.91 MHz
cpu1: 
FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,SSE3,PCLMUL,DTES64,MWAIT,DS-CPL,VMX,SMX,EST,TM2,SSSE3,CX16,xTPR,PDCM,PCID,SSE4.1,SSE4.2,x2APIC,POPCNT,DEADLINE,AES,XSAVE,AVX,NXE,LONG,LAHF,PERF,ITSC,SENSOR,ARAT
cpu1: 256KB 64b/line 8-way L2 cache
cpu1: smt 1, core 0, package 0
cpu2 at mainbus0: apid 2 (application processor)
cpu2: Intel(R) Core(TM) i5-2520M CPU @ 2.50GHz, 2491.91 MHz
cpu2: 
FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,SSE3,PCLMUL,DTES64,MWAIT,DS-CPL,VMX,SMX,EST,TM2,SSSE3,CX16,xTPR,PDCM,PCID,SSE4.1,SSE4.2,x2APIC,POPCNT,DEADLINE,AES,XSAVE,AVX,NXE,LONG,LAHF,PERF,ITSC,SENSOR,ARAT
cpu2: 256KB 64b/line 8-way L2 cache
cpu2: smt 0, core 1, package 0
cpu3 at mainbus0: apid 3 (application processor)
cpu3: Intel(R) Core(TM) i5-2520M CPU @ 2.50GHz, 2491.92 MHz
cpu3: 
FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,SSE3,PCLMUL,DTES64,MWAIT,DS-CPL,VMX,SMX,EST,TM2,SSSE3,CX16,xTPR,PDCM,PCID,SSE4.1,SSE4.2,x2APIC,POPCNT,DEADLINE,AES,XSAVE,AVX,NXE,LONG,LAHF,PERF,ITSC,SENSOR,ARAT
cpu3: 256KB 64b/line 8-way L2 cache
cpu3: smt 1, core 1, package 0
ioapic0 at mainbus0: apid 2 pa 0xfec0, version 20, 24 pins
acpimcfg0 at acpi0 addr 0xf800, bus 0-63
acpiec0 at acpi0
acpiprt0 at acpi0: bus 0 (PCI0)
acpiprt1 at acpi0: bus -1 (PEG_)
acpiprt2 at acpi0: bus 2 (EXP1)
acpiprt3 at acpi0: bus 3 (EXP2)
acpiprt4 at acpi0: bus 5 (EXP4)
acpiprt5 at acpi0: bus 13 (EXP5)
acpicpu0 at acpi0: C3(350@104 io@0x415), C1(1000@1 halt), PSS
acpicpu1 at acpi0: C3(350@104 io@0x415), C1(1000@1 halt), PSS
acpicpu2 at acpi0: C3(350@104 io@0x415), C1(1000@1 halt), PSS
acpicpu3 at acpi0: C3(350@104 io@0x415), C1(1000@1 halt), PSS
acpipwrres0 at acpi0: PUBS, resource for EHC1, EHC2
acpitz0 at acpi0: critical temperature is 98 degC
acpibtn0 at acpi0: LID_
acpibtn1 at acpi0: SLPB
acpibat0 at acpi0: BAT0 model "42T4710" serial  1694 type LION oem "SANYO"
acpibat1 at acpi0: BAT1 not present
acpiac0 at acpi0: AC unit online
acpithinkpad0 at acpi0
cpu0: Enhanced SpeedStep 2492 MHz: speeds: 2501, 2500, 2200, 2000, 1800, 1600, 
1400, 1200, 1000, 800 MHz
pci0 at mainbus0 bus 0
pchb0 at pci0 dev 0 function 0 "Intel Core 2G Host" rev 0x09
inteldrm0 at pci0 dev 2 function 0 "Intel HD Graphics 3000" rev 0x09
drm0 at inteldrm0
inteldrm0: msi
inteldrm0: 1600x900
wsdisplay0 at inteldrm0 mux 1: console (std, vt100 emulation)
wsdisplay0: screen 1-5 added (std, vt100 emulation)
"Intel 6 Series MEI" rev 0x04 at pci0 dev 22 function 0 not configured
puc0 at pci0 dev 22 function 3 "Intel 6 Series KT" rev 0x04: ports: 1 com
com4 at puc0 port 0 apic 2 int 19: ns16550a, 16 byte fifo
com4: probed fifo depth: 0 bytes
em0 at pci0 dev 25 function 0 "Intel 82579LM" rev 0x04: msi, address 
00:21:cc:ba:e3:5d
ehci0 at pci0 dev 26 function 0 "Intel 6 Series USB" rev 0x04: apic 2 int 16
usb0 at ehci0: USB revision 2.0

Re: Firefox, malloc(3) and threads

2016-01-25 Thread Matthew Via
I've had the patch applied for two days now and have not seen any ill
effects.  This is a Thinkpad T410 running snapshots.

Before, youtube was unwatchable.  Sound would continue normally while
video would freeze for long stretches, often over 10 seconds.  It's not
perfect now, but it's very nearly so when not fullscreen.

It does seem that cpu usage of firefox is also significantly reduced,
and firefox is generally snappier.

Thank you!
-via

On 22:46 Fri 22 Jan , Mark Kettenis wrote:
> Firefox makes a lot of concurrent malloc(3) calls.  The locking to
> make malloc(3) thread-safe is a bit...suboptimal.  This diff makes
> things better by using a mutex instead of spinlock.  If you're running
> Firefox you want to try it; it makes video watchable on some machines.
> If you're not running Firefox you want to try it; to make sure it
> doesn't break things.
> 
> Enjoy,
> 
> Mark




Re: Firefox, malloc(3) and threads

2016-01-25 Thread Landry Breuil
On Mon, Jan 25, 2016 at 08:48:21AM +0100, Mark Kettenis wrote:
> > From: "Peter N. M. Hansteen" 
> > Date: Sun, 24 Jan 2016 23:10:41 +0100
> > 
> > On 01/22/16 22:46, Mark Kettenis wrote:
> > > Firefox makes a lot of concurrent malloc(3) calls.  The locking to
> > > make malloc(3) thread-safe is a bit...suboptimal.  This diff makes
> > > things better by using a mutex instead of spinlock.  If you're running
> > > Firefox you want to try it; it makes video watchable on some machines.
> > > If you're not running Firefox you want to try it; to make sure it
> > > doesn't break things.
> > 
> > Running this since early Saturday, Firefox is definitely more responsive
> > than earlier.
> > 
> > I haven't tried running other resource hogs such as LibreOffice with
> > several large documents, but I guess I could try that too if it's a
> > relevant scenario.
> 
> Please do!

Albeit small, x11/xfce4/thunar makes heavy use of threads (in general,
and even more when talking to gvfs mounts). It now feels 200% snappier.

Landry



Re: Firefox, malloc(3) and threads

2016-01-25 Thread David Coppa
On Sun, Jan 24, 2016 at 7:47 PM, Adam Wolk  wrote:
> On Fri, 22 Jan 2016 22:46:39 +0100 (CET)
> Mark Kettenis  wrote:
>
>> Firefox makes a lot of concurrent malloc(3) calls.  The locking to
>> make malloc(3) thread-safe is a bit...suboptimal.  This diff makes
>> things better by using a mutex instead of spinlock.  If you're running
>> Firefox you want to try it; it makes video watchable on some machines.
>> If you're not running Firefox you want to try it; to make sure it
>> doesn't break things.
>>
>> Enjoy,
>>
>> Mark
>>  '
>
> Applied to Jan 15th snapshot sources. Youtube is not fully 'watchable'
> on firefox but feels significantly better. I can also now watch full
> screen youtube videos on chromium 1920x1080 with no stutter (lenovo
> g50-70).
>
> Generally gnome 3 feels a bit snappier especially on first load,
> bringing up the menu searching for 'terminal' leads to a faster
> rendering of the results. This might be just 'imagined' by me.
>
> On a more measurable front, I ran the octane benchmark against firefox
> before and after the patch. It resulted in a slight improvement in score
> from 12486 to 12826 [1].

Besides performance-related issues, the problem we saw in the past was
firefox using a huge amount of CPU resources for no apparent
reason...
So please also try to test if you still see this erratic behavior with
Mark's patch applied.

ciao,
David



Re: Firefox, malloc(3) and threads

2016-01-24 Thread Mark Kettenis
> From: "Peter N. M. Hansteen" 
> Date: Sun, 24 Jan 2016 23:10:41 +0100
> 
> On 01/22/16 22:46, Mark Kettenis wrote:
> > Firefox makes a lot of concurrent malloc(3) calls.  The locking to
> > make malloc(3) thread-safe is a bit...suboptimal.  This diff makes
> > things better by using a mutex instead of spinlock.  If you're running
> > Firefox you want to try it; it makes video watchable on some machines.
> > If you're not running Firefox you want to try it; to make sure it
> > doesn't break things.
> 
> Running this since early Saturday, Firefox is definitely more responsive
> than earlier.
> 
> I haven't tried running other resource hogs such as LibreOffice with
> several large documents, but I guess I could try that too if it's a
> relevant scenario.

Please do!



Re: Firefox, malloc(3) and threads

2016-01-24 Thread Adam Wolk
On Fri, 22 Jan 2016 22:46:39 +0100 (CET)
Mark Kettenis  wrote:

> Firefox makes a lot of concurrent malloc(3) calls.  The locking to
> make malloc(3) thread-safe is a bit...suboptimal.  This diff makes
> things better by using a mutex instead of spinlock.  If you're running
> Firefox you want to try it; it makes video watchable on some machines.
> If you're not running Firefox you want to try it; to make sure it
> doesn't break things.
> 
> Enjoy,
> 
> Mark
>  '

Applied to Jan 15th snapshot sources. Youtube is not fully 'watchable'
on firefox but feels significantly better. I can also now watch full
screen youtube videos on chromium 1920x1080 with no stutter (lenovo
g50-70).

Generally gnome 3 feels a bit snappier especially on first load,
bringing up the menu searching for 'terminal' leads to a faster
rendering of the results. This might be just 'imagined' by me.

On a more measurable front, I ran the octane benchmark against firefox
before and after the patch. It resulted in a slight improvement in score
from 12486 to 12826 [1].

cpu0: Intel(R) Core(TM) i7-4510U CPU @ 2.00GHz, 1895.93 MHz
cpu1: Intel(R) Core(TM) i7-4510U CPU @ 2.00GHz, 1895.62 MHz
cpu2: Intel(R) Core(TM) i7-4510U CPU @ 2.00GHz, 1895.62 MHz
cpu3: Intel(R) Core(TM) i7-4510U CPU @ 2.00GHz, 1895.62 MHz
inteldrm0 at pci0 dev 2 function 0 "Intel HD Graphics" rev 0x0b
running Intel Haswell Mobile for the gfx card. 

Regards,
Adam

[1] - https://twitter.com/mulander/status/691327370985345024
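
To put the two Octane scores above in proportion, the relative change can be worked out directly. The helper below is a trivial sketch and `percent_change` is a made-up name; only the two scores come from the message.

```c
#include <assert.h>

/* Relative change from `before` to `after`, expressed in percent. */
double percent_change(double before, double after)
{
	assert(before != 0.0);	/* the change is undefined for a zero baseline */
	return (after - before) / before * 100.0;
}
```

`percent_change(12486, 12826)` comes out at roughly 2.7, so the benchmark score moved by a little under 3%, which fits the description of a slight improvement.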



Re: Firefox, malloc(3) and threads

2016-01-24 Thread Ville Valkonen
On 24 January 2016 at 20:47, Adam Wolk  wrote:
> On Fri, 22 Jan 2016 22:46:39 +0100 (CET)
> Mark Kettenis  wrote:
>
>> Firefox makes a lot of concurrent malloc(3) calls.  The locking to
>> make malloc(3) thread-safe is a bit...suboptimal.  This diff makes
>> things better by using a mutex instead of spinlock.  If you're running
>> Firefox you want to try it; it makes video watchable on some machines.
>> If you're not running Firefox you want to try it; to make sure it
>> doesn't break things.
>>
>> Enjoy,
>>
>> Mark
>>  '
>
> Applied to Jan 15th snapshot sources. Youtube is not fully 'watchable'
> on firefox but feels significantly better. I can also now watch full
> screen youtube videos on chromium 1920x1080 with no stutter (lenovo
> g50-70).
>
> Generally gnome 3 feels a bit snappier especially on first load,
> bringing up the menu searching for 'terminal' leads to a faster
> rendering of the results. This might be just 'imagined' by me.
>
> On a more measurable front, I ran the octane benchmark against firefox
> before and after the patch. It resulted in a slight improvement in score
> from 12486 to 12826 [1].
>
> cpu0: Intel(R) Core(TM) i7-4510U CPU @ 2.00GHz, 1895.93 MHz
> cpu1: Intel(R) Core(TM) i7-4510U CPU @ 2.00GHz, 1895.62 MHz
> cpu2: Intel(R) Core(TM) i7-4510U CPU @ 2.00GHz, 1895.62 MHz
> cpu3: Intel(R) Core(TM) i7-4510U CPU @ 2.00GHz, 1895.62 MHz
> inteldrm0 at pci0 dev 2 function 0 "Intel HD Graphics" rev 0x0b
> running Intel Haswell Mobile for the gfx card.
>
> Regards,
> Adam
>
> [1] - https://twitter.com/mulander/status/691327370985345024


Hi,

pretty much the same results here, though running a Lenovo X250 with an i7-5600U.

Dankuwel Mark, nice finding.

--
Regards,
Ville Valkonen



Re: Firefox, malloc(3) and threads

2016-01-23 Thread Martin Natano
Yes! This absolutely makes Youtube videos watchable for me (on a
Thinkpad T520). There still is occasional stuttering, but _far_ less
disruptive than before. Another use case where I see improvements is
reloading a resource-heavy web page while switching tabs. Before
applying the patch, this caused the browser to hang for several seconds.
Now it doesn't.

The patch reads fine to me, although I'm not an rthread expert. It doesn't
seem to break anything on my system either.

Thanks,
natano

On Fri, Jan 22, 2016 at 10:46:39PM +0100, Mark Kettenis wrote:
> Firefox makes a lot of concurrent malloc(3) calls.  The locking to
> make malloc(3) thread-safe is a bit...suboptimal.  This diff makes
> things better by using a mutex instead of spinlock.  If you're running
> Firefox you want to try it; it makes video watchable on some machines.
> If you're not running Firefox you want to try it; to make sure it
> doesn't break things.
> 
> Enjoy,
> 
> Mark
> 
> 
> Index: rthread.h
> ===
> RCS file: /cvs/src/lib/librthread/rthread.h,v
> retrieving revision 1.54
> diff -u -p -r1.54 rthread.h
> --- rthread.h 10 Nov 2015 04:30:59 -  1.54
> +++ rthread.h 22 Jan 2016 21:08:11 -
> @@ -223,6 +223,7 @@ void  _rthread_debug_init(void);
>  #ifndef NO_PIC
>  void _rthread_dl_lock(int what);
>  #endif
> +void _thread_malloc_reinit(void);
>  
>  /* rthread_cancel.c */
>  void _enter_cancel(pthread_t);
> Index: rthread_fork.c
> ===
> RCS file: /cvs/src/lib/librthread/rthread_fork.c,v
> retrieving revision 1.14
> diff -u -p -r1.14 rthread_fork.c
> --- rthread_fork.c18 Oct 2015 08:02:58 -  1.14
> +++ rthread_fork.c22 Jan 2016 21:08:11 -
> @@ -82,7 +82,10 @@ _dofork(int is_vfork)
>  	newid = sys_fork();
>  
>  	_thread_arc4_unlock();
> -	_thread_malloc_unlock();
> +	if (newid == 0)
> +		_thread_malloc_reinit();
> +	else
> +		_thread_malloc_unlock();
>  	_thread_atexit_unlock();
>  
>  	if (newid == 0) {
> Index: rthread_libc.c
> ===
> RCS file: /cvs/src/lib/librthread/rthread_libc.c,v
> retrieving revision 1.12
> diff -u -p -r1.12 rthread_libc.c
> --- rthread_libc.c7 Apr 2015 01:27:07 -   1.12
> +++ rthread_libc.c22 Jan 2016 21:08:11 -
> @@ -152,18 +152,35 @@ _thread_mutex_destroy(void **mutex)
>  /*
>   * the malloc lock
>   */
> -static struct _spinlock malloc_lock = _SPINLOCK_UNLOCKED;
> +static struct pthread_mutex malloc_lock = {
> +	_SPINLOCK_UNLOCKED,
> +	TAILQ_HEAD_INITIALIZER(malloc_lock.lockers),
> +	PTHREAD_MUTEX_DEFAULT,
> +	NULL,
> +	0,
> +	-1
> +};
> +static pthread_mutex_t malloc_mutex = &malloc_lock;
>  
>  void
>  _thread_malloc_lock(void)
>  {
> -	_spinlock(&malloc_lock);
> +	pthread_mutex_lock(&malloc_mutex);
>  }
>  
>  void
>  _thread_malloc_unlock(void)
>  {
> -	_spinunlock(&malloc_lock);
> +	pthread_mutex_unlock(&malloc_mutex);
> +}
> +
> +void
> +_thread_malloc_reinit(void)
> +{
> +	malloc_lock.lock = _SPINLOCK_UNLOCKED_ASSIGN;
> +	TAILQ_INIT(&malloc_lock.lockers);
> +	malloc_lock.owner = NULL;
> +	malloc_lock.count = 0;
>  }
>  
>  /*
> 
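
The rthread_fork.c hunk in the diff above unlocks the malloc lock in the parent but resets it in the child. One plausible reading is that the child is a single-threaded copy, so any waiters recorded on the mutex belong to threads that do not exist there. The sketch below shows the same unlock-in-parent, reset-in-child split using the portable pthread_atfork(3) hooks rather than librthread internals. All names (`demo_lock`, `fork_lock_demo`, the handlers) are hypothetical, and it assumes that re-initialising a held mutex in the child is acceptable on the target platform, which POSIX does not promise.

```c
#include <assert.h>
#include <pthread.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical demo lock; it stands in for a library-internal lock. */
static pthread_mutex_t demo_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_once_t handlers_once = PTHREAD_ONCE_INIT;

/* Taken before fork() so no other thread holds the lock mid-fork. */
static void before_fork(void)
{
	pthread_mutex_lock(&demo_lock);
}

/* Parent: other threads still exist and may be waiting, so unlock normally. */
static void after_fork_parent(void)
{
	pthread_mutex_unlock(&demo_lock);
}

/* Child: only the forking thread survives, so reset the lock state
 * instead of unlocking (assumption: re-init of a held mutex works here). */
static void after_fork_child(void)
{
	pthread_mutex_init(&demo_lock, NULL);
}

static void install_handlers(void)
{
	int rc = pthread_atfork(before_fork, after_fork_parent,
	    after_fork_child);
	assert(rc == 0);
	(void)rc;
}

/* Forks once; returns 0 if the child could lock and unlock again. */
int fork_lock_demo(void)
{
	int status = 0;
	pid_t pid;

	pthread_once(&handlers_once, install_handlers);
	pid = fork();
	if (pid == -1)
		return -1;
	if (pid == 0) {
		int ok = pthread_mutex_lock(&demo_lock) == 0 &&
		    pthread_mutex_unlock(&demo_lock) == 0;
		_exit(ok ? 0 : 1);
	}
	if (waitpid(pid, &status, 0) == -1)
		return -1;
	return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

A return value of 0 from `fork_lock_demo()` means the child found the lock usable after the fork. The diff itself resets the individual fields of the mutex structure directly, which is something only the library can do.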



Re: Firefox, malloc(3) and threads

2016-01-23 Thread Jaime Tarrant
* On Fri Jan 22, 2016 at 10:46:39PM +0100 28706 , Mark Kettenis 
(mark.kette...@xs4all.nl) wrote:
>
> Firefox makes a lot of concurrent malloc(3) calls.  The locking to
> make malloc(3) thread-safe is a bit...suboptimal.  This diff makes
> things better by using a mutex instead of spinlock.  If you're running
> Firefox you want to try it; it makes video watchable on some machines.
> If you're not running Firefox you want to try it; to make sure it
> doesn't break things.
>
> Enjoy,
>
> Mark
>
[snip]

Hi Mark,

I have applied your patch and noticed a big improvement with Youtube
videos and if I am not mistaken, content heavy websites like news
sites seem to load faster and more smoothly too.

This machine is a 2009 Macbook Pro running -Current. I will patch my
-Current server as well and let you know if I notice anything good or
bad.

Awesome! Thanks!!