Firefox performance regressions

2016-01-06 Thread Landry Breuil
Hi,

i've had multiple ppl coming to me privately about this - Yes,
performance with firefox has been steadily degrading in the past
releases, since around 39 or so - i'm aware of this, but nobody is
actually working on it - i won't/can't since this is way out my skills
and i have no time nor motivation for that.

Debugging this requires profiling / running the browser within ktrace
and figuring out why apparently it does way much more syscalls than
before, which might be a clue, or not. Of course since you cant really
use traditional tools like gdb, your toolbox is empty. Or that could be
graphical stack regressions. Or so many other things. The builtin
profiler needs specific code to work on OpenBSD.

Anybody is welcome to look into it - or 5.9 will ship with firefox 43
as is.

Landry



Re: Firefox performance regressions

2016-01-06 Thread Martin Pieuchot
On 06/01/16(Wed) 11:19, Landry Breuil wrote:
> [...] 
> i've had multiple ppl coming to me privately about this - Yes,
> performance with firefox has been steadily degrading in the past
> releases, since around 39 or so - i'm aware of this, but nobody is
> actually working on it - i won't/can't since this is way out my skills
> and i have no time nor motivation for that.
> 
> Debugging this requires profiling / running the browser within ktrace
> and figuring out why apparently it does way much more syscalls than
> before, which might be a clue, or not. Of course since you cant really
> use traditional tools like gdb, your toolbox is empty. Or that could be
> graphical stack regressions. Or so many other things. The builtin
> profiler needs specific code to work on OpenBSD.
> 
> Anybody is welcome to look into it - or 5.9 will ship with firefox 43
> as is.

I started looking at this but didn't go far.  It seems that the problem
is related to/exposed by the use of pthread_mutex_lock(3) & friends. I
tried to analyze ltrace(1) outputs, but I got lost in Firefox's sources.
I really don't know where to look at.

Here's what I wrote two months ago:

On 18/11/15(Wed) 20:44 +0100, Martin Pieuchot wrote:
> The actual firefox port (not the -esr one) might be exposing a bug in
> librthread resulting in a storm of sched_yield(2) calls when multiples
> threads are trying to access the same lock, see _spinlock().
> 
> I generated two ltrace(1) dumps for librthread hackers to have a look
> at.  They are both generate with a hacked version of ltrace(1) with:
> 
>   $ ltrace -p $pid -t cu -u libpthread ; sleep 2; ktrace -C
> 
> After having started firefox with:
> 
>   $ LD_TRACE_PLT="" LD_TRACE_PLTSPEC="libpthread" DISPLAY=:0 firefox
> 
> 
> The dumps are generate while firefox is sitting in its own custom home
> page.
> 
> 
> kump-esr.txt  a dump of the firefox-esr which work "not so bad"
> 
> kdump-nightly.txt a dump of firefox's trunk which expose the problem
> 
> 
> $ grep sched_yield kdump-esr.txt |wc -l
>4
> $ grep sched_yield kdump-nightly.txt  |wc -l   
> 89418
> 
> 
> The sequence below is an extract of what you can find there, when you
> have multiple threads fighting for the malloc lock...  this goes on
> forever and gets even worse when thread 1019109 returns from thrsleep...
> 
> Apparently they are all waiting for thread 1010468 to release the malloc
> lock or...  is it bad doctor?
> 
> 
>  13288/1032189 firefox-bin RET   sched_yield 0
>  13288/1010095 firefox-bin USER  .plt symbol: 7 bytes"__errno"
>  13288/1032189 firefox-bin USER  .plt symbol: 12 bytes"_atomic_lock"
>  13288/1010095 firefox-bin USER  .plt symbol: 19 bytes
> "_thread_malloc_lock"
>  13288/1032189 firefox-bin CALL  sched_yield()
>  13288/1027370 firefox-bin USER  .plt symbol: 11 bytes"_spinunlock"
>  13288/1010095 firefox-bin USER  .plt symbol: 9 bytes"_spinlock"
>  13288/1027370 firefox-bin USER  .plt symbol: 7 bytes"__errno"
>  13288/1010095 firefox-bin USER  .plt symbol: 12 bytes"_atomic_lock"
>  13288/1010095 firefox-bin CALL  sched_yield()
>  13288/1027370 firefox-bin USER  .plt symbol: 19 bytes
> "_thread_malloc_lock"
>  13288/1010095 firefox-bin RET   sched_yield 0
>  13288/1027370 firefox-bin USER  .plt symbol: 9 bytes"_spinlock"
>  13288/1010095 firefox-bin USER  .plt symbol: 12 bytes"_atomic_lock"
>  13288/1027370 firefox-bin USER  .plt symbol: 12 bytes"_atomic_lock"
>  13288/1010095 firefox-bin CALL  sched_yield()
>  13288/1027370 firefox-bin CALL  sched_yield()
>  13288/1032189 firefox-bin RET   sched_yield 0
>  13288/1032189 firefox-bin USER  .plt symbol: 12 bytes"_atomic_lock"
>  13288/1032189 firefox-bin CALL  sched_yield()
>  13288/1027370 firefox-bin RET   sched_yield 0
>  13288/1027370 firefox-bin USER  .plt symbol: 12 bytes"_atomic_lock"
>  13288/1027370 firefox-bin CALL  sched_yield()
>  13288/1032189 firefox-bin RET   sched_yield 0
>  13288/1032189 firefox-bin USER  .plt symbol: 12 bytes"_atomic_lock"
>  13288/1032189 firefox-bin CALL  sched_yield()
>  13288/1027370 firefox-bin RET   sched_yield 0
>  13288/1010095 firefox-bin RET   sched_yield 0
>  13288/1027370 firefox-bin USER  .plt symbol: 12 bytes"_atomic_lock"
>  13288/1010095 firefox-bin USER  .plt symbol: 12 bytes"_atomic_lock"
>  13288/1027370 firefox-bin CALL  sched_yield()
>  13288/1010095 firefox-bin CALL  sched_yield()
>  13288/1032189 firefox-bin RET   sched_yield 0
>  13288/1032189 firefox-bin USER  .plt symbol: 12 bytes"_atomic_lock"
>  13288/1032189 firefox-bin CALL  sched_yield()
>  13288/1027370 firefox-bin RET   sched_yield 0
>  13288/1010095 firefox-bin RET   sched_yield 0
>  13288/1027370 firefox-bin USER  .plt symbol: 12 bytes"_atomic_lock"
>  13288/1027370 firefox-bin CALL  sched_yield()
>  13288/1032189 firefox-bin RET   sched_yield 0
>  13288/1010095 

Re: Firefox performance regressions

2016-01-06 Thread Karel Gardas
If this is about malloc and it's locking, have anybody tried to use
some custom more C++ friendly malloc lib like google's tcmalloc?
Another possibility is to gdb attach to firefox and than attempt to
track down where those excessive locks happen or hack ltrace to
print/save full stack trace if this is possible. Seeing the
responsible code may help or not here. And constructing similar
testcase which would duplicate the same behaviour would be probably
something what rthread library authors would most welcome...

On Wed, Jan 6, 2016 at 12:06 PM, Martin Pieuchot  wrote:
> On 06/01/16(Wed) 11:19, Landry Breuil wrote:
>> [...]
>> i've had multiple ppl coming to me privately about this - Yes,
>> performance with firefox has been steadily degrading in the past
>> releases, since around 39 or so - i'm aware of this, but nobody is
>> actually working on it - i won't/can't since this is way out my skills
>> and i have no time nor motivation for that.
>>
>> Debugging this requires profiling / running the browser within ktrace
>> and figuring out why apparently it does way much more syscalls than
>> before, which might be a clue, or not. Of course since you cant really
>> use traditional tools like gdb, your toolbox is empty. Or that could be
>> graphical stack regressions. Or so many other things. The builtin
>> profiler needs specific code to work on OpenBSD.
>>
>> Anybody is welcome to look into it - or 5.9 will ship with firefox 43
>> as is.
>
> I started looking at this but didn't go far.  It seems that the problem
> is related to/exposed by the use of pthread_mutex_lock(3) & friends. I
> tried to analyze ltrace(1) outputs, but I got lost in Firefox's sources.
> I really don't know where to look at.
>
> Here's what I wrote two months ago:
>
> On 18/11/15(Wed) 20:44 +0100, Martin Pieuchot wrote:
>> The actual firefox port (not the -esr one) might be exposing a bug in
>> librthread resulting in a storm of sched_yield(2) calls when multiples
>> threads are trying to access the same lock, see _spinlock().
>>
>> I generated two ltrace(1) dumps for librthread hackers to have a look
>> at.  They are both generate with a hacked version of ltrace(1) with:
>>
>>   $ ltrace -p $pid -t cu -u libpthread ; sleep 2; ktrace -C
>>
>> After having started firefox with:
>>
>>   $ LD_TRACE_PLT="" LD_TRACE_PLTSPEC="libpthread" DISPLAY=:0 firefox
>>
>>
>> The dumps are generate while firefox is sitting in its own custom home
>> page.
>>
>>
>> kump-esr.txt  a dump of the firefox-esr which work "not so bad"
>>
>> kdump-nightly.txt a dump of firefox's trunk which expose the problem
>>
>>
>> $ grep sched_yield kdump-esr.txt |wc -l
>>4
>> $ grep sched_yield kdump-nightly.txt  |wc -l
>> 89418
>>
>>
>> The sequence below is an extract of what you can find there, when you
>> have multiple threads fighting for the malloc lock...  this goes on
>> forever and gets even worse when thread 1019109 returns from thrsleep...
>>
>> Apparently they are all waiting for thread 1010468 to release the malloc
>> lock or...  is it bad doctor?
>>
>>
>>  13288/1032189 firefox-bin RET   sched_yield 0
>>  13288/1010095 firefox-bin USER  .plt symbol: 7 bytes"__errno"
>>  13288/1032189 firefox-bin USER  .plt symbol: 12 bytes"_atomic_lock"
>>  13288/1010095 firefox-bin USER  .plt symbol: 19 bytes
>> "_thread_malloc_lock"
>>  13288/1032189 firefox-bin CALL  sched_yield()
>>  13288/1027370 firefox-bin USER  .plt symbol: 11 bytes"_spinunlock"
>>  13288/1010095 firefox-bin USER  .plt symbol: 9 bytes"_spinlock"
>>  13288/1027370 firefox-bin USER  .plt symbol: 7 bytes"__errno"
>>  13288/1010095 firefox-bin USER  .plt symbol: 12 bytes"_atomic_lock"
>>  13288/1010095 firefox-bin CALL  sched_yield()
>>  13288/1027370 firefox-bin USER  .plt symbol: 19 bytes
>> "_thread_malloc_lock"
>>  13288/1010095 firefox-bin RET   sched_yield 0
>>  13288/1027370 firefox-bin USER  .plt symbol: 9 bytes"_spinlock"
>>  13288/1010095 firefox-bin USER  .plt symbol: 12 bytes"_atomic_lock"
>>  13288/1027370 firefox-bin USER  .plt symbol: 12 bytes"_atomic_lock"
>>  13288/1010095 firefox-bin CALL  sched_yield()
>>  13288/1027370 firefox-bin CALL  sched_yield()
>>  13288/1032189 firefox-bin RET   sched_yield 0
>>  13288/1032189 firefox-bin USER  .plt symbol: 12 bytes"_atomic_lock"
>>  13288/1032189 firefox-bin CALL  sched_yield()
>>  13288/1027370 firefox-bin RET   sched_yield 0
>>  13288/1027370 firefox-bin USER  .plt symbol: 12 bytes"_atomic_lock"
>>  13288/1027370 firefox-bin CALL  sched_yield()
>>  13288/1032189 firefox-bin RET   sched_yield 0
>>  13288/1032189 firefox-bin USER  .plt symbol: 12 bytes"_atomic_lock"
>>  13288/1032189 firefox-bin CALL  sched_yield()
>>  13288/1027370 firefox-bin RET   sched_yield 0
>>  13288/1010095 firefox-bin RET   sched_yield 0
>>  13288/1027370 firefox-bin USER  .plt symbol: 12 bytes