Re: [webkit-dev] WTF::fastMalloc

2013-10-05 Thread Maciej Stachowiak

These days, pure JavaScript tests won't do a lot of malloc() calls, so it's 
more relevant to try a page load speed or DOM benchmark.

 - Maciej

On Oct 4, 2013, at 6:10 AM, Osztrogonác Csaba  wrote:

> Hi,
> 
> As Zoltan said, this feature was introduced for the Qt port. But now
> EFL, GTK and Nix use fastmalloc instead of the system malloc too.
> It was fine and useful for some use cases in those days.
> 
> To decide whether fastmalloc or the system malloc is better,
> we need some measurements. I made a quick test on EFL and Nix with
> SunSpider and with the Methanol test suite and haven't seen any
> significant performance differences between fastmalloc and system
> malloc on my desktop: Ubuntu 12.04 (x86_64). I haven't checked the
> memory consumption; it would need more preparation.
> 
> Keeping the old TCMalloc and the custom allocator framework isn't
> a blocker for us (University of Szeged), so we have no objection
> to removing it from trunk. If nobody is interested in maintaining
> the framework, it can be removed. If the final conclusion is to
> drop TCMalloc, we will willingly help with the clean-up.
> 
> Ossy
> 
> Zoltan Horvath wrote:
>> I used to work on memory-related topics while I was working at the
>> University of Szeged.
>> Based on a 2.5-year-old measurement
>> (http://webkit.sed.hu/blog/20100302/war-allocators-qtlaunchers-coast) on the
>> Qt port, page loading on the Methanol test suite was 5% faster (avg)
>> with TCMalloc than with the default system allocator on Linux. The
>> performance results of the SunSpider suite were similar for both allocators.
>> The memory consumption was always lower with the default OS allocator. I
>> guess the new allocator only has iOS support. I'm fine with removing
>> TCMalloc, although this direction might raise further questions, such as
>> whether to remove the custom allocation framework as well. Feel free to cc
>> me on bugs; I can help by contributing some patches.
> 
>> On Mon, Sep 30, 2013 at 2:48 PM, Geoffrey Garen wrote:
>>> I'm planning to remove our years-out-of-date port of TCMalloc, and
>>> replace it with something that takes maximum advantage of Mac and
>>> iOS virtual memory, threading, and security APIs.
>>> I've heard that TCMalloc has caused some problems for non-Mac,
>>> non-iOS ports in the past. So, if you maintain a port, this change
>>> might make things simpler for you.
>>> Are there any ports whose built-in malloc implementations are slow
>>> enough that they can't get by without TCMalloc?
> ___
> webkit-dev mailing list
> webkit-dev@lists.webkit.org
> https://lists.webkit.org/mailman/listinfo/webkit-dev

___
webkit-dev mailing list
webkit-dev@lists.webkit.org
https://lists.webkit.org/mailman/listinfo/webkit-dev


Re: [webkit-dev] WTF::fastMalloc

2013-10-04 Thread Osztrogonác Csaba

Hi,

As Zoltan said, this feature was introduced for the Qt port. But now
EFL, GTK and Nix use fastmalloc instead of the system malloc too.
It was fine and useful for some use cases in those days.

To decide whether fastmalloc or the system malloc is better,
we need some measurements. I made a quick test on EFL and Nix with
SunSpider and with the Methanol test suite and haven't seen any
significant performance differences between fastmalloc and system
malloc on my desktop: Ubuntu 12.04 (x86_64). I haven't checked the
memory consumption; it would need more preparation.

Keeping the old TCMalloc and the custom allocator framework isn't
a blocker for us (University of Szeged), so we have no objection
to removing it from trunk. If nobody is interested in maintaining
the framework, it can be removed. If the final conclusion is to
drop TCMalloc, we will willingly help with the clean-up.

Ossy

Zoltan Horvath wrote:
> I used to work on memory-related topics while I was working at the
> University of Szeged.
>
> Based on a 2.5-year-old measurement
> (http://webkit.sed.hu/blog/20100302/war-allocators-qtlaunchers-coast) on
> the Qt port, page loading on the Methanol test suite was 5% faster
> (avg) with TCMalloc than with the default system allocator on Linux. The
> performance results of the SunSpider suite were similar for both
> allocators. The memory consumption was always lower with the default OS
> allocator.
>
> I guess the new allocator only has iOS support. I'm fine with removing
> TCMalloc, although this direction might raise further questions, such as
> whether to remove the custom allocation framework as well. Feel free to
> cc me on bugs; I can help by contributing some patches.
>
> On Mon, Sep 30, 2013 at 2:48 PM, Geoffrey Garen wrote:
>> I'm planning to remove our years-out-of-date port of TCMalloc, and
>> replace it with something that takes maximum advantage of Mac and
>> iOS virtual memory, threading, and security APIs.
>>
>> I've heard that TCMalloc has caused some problems for non-Mac,
>> non-iOS ports in the past. So, if you maintain a port, this change
>> might make things simpler for you.
>>
>> Are there any ports whose built-in malloc implementations are slow
>> enough that they can't get by without TCMalloc?



Re: [webkit-dev] WTF::fastMalloc

2013-10-02 Thread Geoffrey Garen
> However, given the constraints, what's the problem with the mmap
> strategy?  Sure, you have more page tables on the kernel side, but
> mmap'd memory that is never touched is never resident in a process.  I
> verified this a few months back when troubleshooting some memory-related
> issues.

Okeedokee. Can you write up this patch for OSAllocator?

> 
>> (2) POSIX uses MADV_FREE, MADV_DONTNEED, and/or MADV_WILLNEED. I don’t
>> think anybody has ever verified that these APIs do what we want. In my
>> experience, they usually don’t. So, we need to find a variation on these
>> APIs that works and is fast.
> 
> I've looked into it.  The MADV_WILLNEED is useless -- it does nothing on
> anonymous pages, returns -EINVAL, but is harmless also.  The
> MADV_DONTNEED dance does work though, properly paging out memory and
> lazily providing fresh zeroed pages should the memory be paged in again.

The API we want shouldn’t zero the pages or require a page fault right away. It 
should only zero the pages if they end up being used by the rest of the system. 
In the normal case, it should return the pages to us intact. Otherwise, it 
will be too slow, and we’ll have to jump through hoops to avoid using the API 
very much, which confuses the design.

Geoff


Re: [webkit-dev] WTF::fastMalloc

2013-10-02 Thread Maciej Stachowiak

On Oct 2, 2013, at 2:41 AM, Andy Wingo  wrote:

> 
>> We need somebody to resolve these issues, otherwise our memory
>> footprint will be unacceptably high, and/or our VM operations will be
>> unacceptably slow.
> 
> There is no memory footprint problem caused by mmap here -- to my
> knowledge.  I don't know how to profile the VM overhead, though.

It's easy to fix the VM overhead by unmapping the extra at either end, if 
running out of address space is a real risk.

 - Maciej



Re: [webkit-dev] WTF::fastMalloc

2013-10-02 Thread Maciej Stachowiak

On Oct 2, 2013, at 1:17 AM, Konstantin Tokarev  wrote:

> 
> 02.10.2013, 03:18, "Zoltan Horvath" :
>> On Tue, Oct 1, 2013 at 3:52 PM, Geoffrey Garen  wrote:
>>>> So are you proposing to use the system allocator on Windows?
>>> 
>>> I’m proposing a two step process:
>>> 
>>> (1) Use the system allocator on Windows (and GTK).
>>> (2) If a port maintainer cares to optimize a given port, without too much 
>>> disruption to mainline code, they may do so.
>>> 
>>> FWIW, If I were conducting (2) for Windows, malloc would be pretty far down 
>>> the list of things I started porting.
>>> 
>>>> The current malloc logic has been the source of a number of mysterious 
>>>> crashes on Windows, so reverting to the system allocator might be a good 
>>>> thing for stability. I don’t know what the potential performance 
>>>> ramifications would be.
>>> 
>>> Yes, I’ve heard that on other platforms as well.
>> 
>> This usually happens because of allocation/free mismatches, for example when 
>> memory allocated by TCMalloc via the FastMalloc interface (fastMalloc, 
>> fastNewMalloc) is later freed by the system free.
> 
> Out of curiosity, what's wrong with linking the whole application using WebKit 
> against tcmalloc or some other malloc implementation? This way it's possible 
> to use an optimized allocator without any source changes, and a malloc/free 
> mismatch cannot happen. Why was the FastMalloc API needed at all?

We couldn't find a clean way to do this on Mac because some low-level 
frameworks make use of specific obscure features of the system allocator. But 
it may be viable on other platforms.

Regards,
Maciej



Re: [webkit-dev] WTF::fastMalloc

2013-10-02 Thread Darin Adler
On Oct 2, 2013, at 1:17 AM, Konstantin Tokarev  wrote:

> Out of curiosity, what's wrong with linking the whole application using WebKit 
> against tcmalloc or some other malloc implementation?

There are a lot of things wrong with that. Most of them depend on the platform.

On Mac, for example, WebKit is a framework. Linking apps using WebKit against a 
different malloc implementation would have no effect on WebKit’s memory 
allocation. Further, doing this would create allocator mismatch problems for 
any memory allocated by WebKit but freed by the application or vice versa. 
There are many other problems with this approach on Mac. Another one is that 
there are at least thousands of apps currently using WebKit on Mac, maybe tens 
of thousands (hundreds of thousands, at least, on iOS), and so if this is 
something the app developer has to do, there are a lot of people to reach.

-- Darin


Re: [webkit-dev] WTF::fastMalloc

2013-10-02 Thread Andy Wingo
Hi Geoffrey,

On Wed 02 Oct 2013 00:11, Geoffrey Garen  writes:

> There are two problems with the current OSAllocator POSIX implementation:
>
> (1) It uses mmap, which doesn’t support aligned allocation. To get
> aligned allocation, POSIX double-allocates all virtual memory. That is
> 2X too much. So, we need to find a variation on mmap that supports an
> alignment constraint.

This doesn't exist on POSIX, as you probably know.  posix_memalign
doesn't have the zeroing characteristics of anonymous mmap, and is
otherwise a terrible interface.  Darwin-like vm_map and friends would be
nicer.

However, given the constraints, what's the problem with the mmap
strategy?  Sure, you have more page tables on the kernel side, but
mmap'd memory that is never touched is never resident in a process.  I
verified this a few months back when troubleshooting some memory-related
issues.

> (2) POSIX uses MADV_FREE, MADV_DONTNEED, and/or MADV_WILLNEED. I don’t
> think anybody has ever verified that these APIs do what we want. In my
> experience, they usually don’t. So, we need to find a variation on these
> APIs that works and is fast.

I've looked into it.  The MADV_WILLNEED is useless -- it does nothing on
anonymous pages, returns -EINVAL, but is harmless also.  The
MADV_DONTNEED dance does work though, properly paging out memory and
lazily providing fresh zeroed pages should the memory be paged in again.
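
A minimal sketch of how the MADV_DONTNEED behaviour described above can be checked, assuming Linux and a private anonymous mapping. The function name firstByteAfterDontNeed is made up for this illustration and is not part of WebKit; other POSIX systems may treat the same advice differently.

```cpp
#include <sys/mman.h>
#include <unistd.h>
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical helper: dirties an anonymous private mapping, discards it with
// MADV_DONTNEED, and returns the first byte read back afterwards (-1 on error).
int firstByteAfterDontNeed()
{
    const size_t size = static_cast<size_t>(sysconf(_SC_PAGESIZE)) * 4;
    void* p = mmap(nullptr, size, PROT_READ | PROT_WRITE, MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return -1;

    unsigned char* bytes = static_cast<unsigned char*>(p);
    memset(bytes, 0xAB, size);          // Touch every page so it becomes resident.
    assert(bytes[0] == 0xAB);

    // Tell the kernel the contents are no longer needed. The mapping stays valid.
    if (madvise(p, size, MADV_DONTNEED) != 0) {
        munmap(p, size);
        return -1;
    }

    int value = bytes[0];               // On Linux this faults in a fresh zeroed page.
    munmap(p, size);
    return value;
}
```

On Linux the function is expected to return 0, which matches the "fresh zeroed pages" observation; it says nothing about how expensive the extra page fault is.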

> We need somebody to resolve these issues, otherwise our memory
> footprint will be unacceptably high, and/or our VM operations will be
> unacceptably slow.

There is no memory footprint problem caused by mmap here -- to my
knowledge.  I don't know how to profile the VM overhead, though.

I will agree that OSAllocatorPosix.cpp is exceptionally ugly ;), but it
does seem to do its job within reasonable performance constraints.

Regards,

Andy


Re: [webkit-dev] WTF::fastMalloc

2013-10-02 Thread Konstantin Tokarev

02.10.2013, 03:18, "Zoltan Horvath" :
> On Tue, Oct 1, 2013 at 3:52 PM, Geoffrey Garen  wrote:
>>> So are you proposing to use the system allocator on Windows?
>>
>> I’m proposing a two step process:
>>
>> (1) Use the system allocator on Windows (and GTK).
>> (2) If a port maintainer cares to optimize a given port, without too much 
>> disruption to mainline code, they may do so.
>>
>> FWIW, If I were conducting (2) for Windows, malloc would be pretty far down 
>> the list of things I started porting.
>>
>>> The current malloc logic has been the source of a number of mysterious 
>>> crashes on Windows, so reverting to the system allocator might be a good 
>>> thing for stability. I don’t know what the potential performance 
>>> ramifications would be.
>>
>> Yes, I’ve heard that on other platforms as well.
>
> This usually happens because of allocation/free mismatches, for example when 
> memory allocated by TCMalloc via the FastMalloc interface (fastMalloc, 
> fastNewMalloc) is later freed by the system free.

Out of curiosity, what's wrong with linking the whole application using WebKit 
against tcmalloc or some other malloc implementation? This way it's possible to 
use an optimized allocator without any source changes, and a malloc/free 
mismatch cannot happen. Why was the FastMalloc API needed at all?

-- 
Regards,
Konstantin


Re: [webkit-dev] WTF::fastMalloc

2013-10-01 Thread Zoltan Horvath
On Tue, Oct 1, 2013 at 3:52 PM, Geoffrey Garen  wrote:

> > So are you proposing to use the system allocator on Windows?
>
> I’m proposing a two step process:
>
> (1) Use the system allocator on Windows (and GTK).
>
> (2) If a port maintainer cares to optimize a given port, without too much
> disruption to mainline code, they may do so.
>
> FWIW, If I were conducting (2) for Windows, malloc would be pretty far
> down the list of things I started porting.
>
> > The current malloc logic has been the source of a number of mysterious
> > crashes on Windows, so reverting to the system allocator might be a good
> > thing for stability. I don’t know what the potential performance
> > ramifications would be.
>
> Yes, I’ve heard that on other platforms as well.
>

This usually happens because of allocation/free mismatches, for example when
memory allocated by TCMalloc via the FastMalloc interface (fastMalloc,
fastNewMalloc) is later freed by the system free.
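
One common way to rule out such mismatches is to give a class its own operator new and operator delete, so both ends always go through the same allocator. The sketch below only illustrates that idea: customMalloc, customFree and Node are hypothetical names, and the "custom allocator" simply wraps malloc/free with a counter so the pairing can be observed.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <new>

// Hypothetical stand-ins for a custom allocator pair.
static int liveCustomBlocks = 0;

void* customMalloc(std::size_t size)
{
    void* p = std::malloc(size);
    if (!p)
        throw std::bad_alloc();
    ++liveCustomBlocks;
    return p;
}

void customFree(void* p)
{
    if (!p)
        return;
    // A free with no outstanding custom allocation would be a mismatch.
    assert(liveCustomBlocks > 0);
    --liveCustomBlocks;
    std::free(p);
}

// A class that declares these operators is always allocated and freed by the
// same allocator, regardless of which library calls new or delete on it.
class Node {
public:
    void* operator new(std::size_t size) { return customMalloc(size); }
    void operator delete(void* p) { customFree(p); }

    int value { 0 };
};

int liveBlocksWhileNodeExists()
{
    Node* node = new Node;          // Routed to customMalloc.
    int live = liveCustomBlocks;
    delete node;                    // Routed to customFree.
    return live;
}
```

The mismatch can still occur for raw buffers that cross a library boundary, which this pattern does not cover.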

I support (1).

I think for (2), it would be better if the port maintainers would just try
to support the core (the new) allocator if it's possible. It would be
better to have only 1 allocator and optionally the system allocator for
special cases.

Cheers,



Re: [webkit-dev] WTF::fastMalloc

2013-10-01 Thread Maciej Stachowiak

On Oct 1, 2013, at 3:47 PM, Geoffrey Garen  wrote:

>>> To access thread-specific data using pthreads, you first need to take a 
>>> lock and call pthread_key_create(). Since the whole point of 
>>> thread-specific data is to avoid taking a lock, the API is useless.
>> 
>> The normal way to do it is to use pthread_once to create the key, which does 
>> not in general take a lock. (That or use an out-of-band prior initializer, 
>> but that wouldn't work for malloc).
> 
> Most implementations of pthread_once use a spinlock, or some moral 
> equivalent. Fundamentally, there’s no memory-safe way to implement concurrent 
> one-time execution of arbitrary side effects without a spinlock.

This implementation from the Linux C library will only ever take a lock in the 
rare case where initialization has not already been performed, as far as I can 
tell:
http://searchcode.com/codesearch/view/18325089

Assuming my reading is correct, it only ever hits the slow path if 
initialization has not been performed yet, and multiple threads attempt to do 
it at once, which happens at most once early in startup.
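
To make the fast-path/slow-path distinction concrete, here is a sketch of one-time initialization written with C++11 atomics. It is not the glibc code linked above, only an illustration of the same shape; initializeOnce, initCount and the other names are invented for the example.

```cpp
#include <atomic>
#include <cassert>
#include <mutex>

static std::atomic<bool> initialized { false };
static std::mutex initLock;
static int initCount = 0;       // Counts how many times the initializer body ran.

static void initializeOnce()
{
    // Fast path: once initialized, this is a single acquire load and no lock.
    if (initialized.load(std::memory_order_acquire))
        return;

    // Slow path: only reachable until the first initialization has completed.
    std::lock_guard<std::mutex> guard(initLock);
    if (!initialized.load(std::memory_order_relaxed)) {
        ++initCount;            // Arbitrary one-time side effects would go here.
        initialized.store(true, std::memory_order_release);
    }
    assert(initCount == 1);
}

int runInitializerFromManyCalls()
{
    for (int i = 0; i < 1000; ++i)
        initializeOnce();
    return initCount;
}
```

Only the first call (or callers racing with it) takes the lock; every later call pays for one load and a branch, plus the function call itself if it is not inlined.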

As far as I know, the only significant cost in practice to using pthread_once + 
pthread_getspecific instead of pthread_getspecific_direct is function call 
overhead. That is my recollection from when we switched on Mac.

> 
> That’s why requiring concurrent one-time execution of arbitrary side effects 
> in order to access thread-specific memory is broken API.

It's definitely lame, but we have existence proofs that you can still be a lot 
faster than popular system malloc implementations without solving this problem 
(namely FastMalloc on Linux platforms today, and FastMalloc as initially 
deployed on Mac before we adopted pthread_getspecific_direct). Does the new malloc 
implementation access thread-specific data much more frequently?

>> C++11 also introduces the thread_local keyword which is likely more readily 
>> optimizable than function-call-based APIs where supported.
> 
> thread_local might be a reasonable option, if a platform achieves all the 
> other requirements for fast malloc. It’s still too slow, but at least it 
> isn’t slow by definition, and it doesn’t pollute the rest of the code too 
> badly.

Maybe it would be easier to understand what the issue is looking at the code. 

From this and your other posts, it sounds like there might be an issue of code 
pollution/complexity and not just prospective performance.

Regards,
Maciej



Re: [webkit-dev] WTF::fastMalloc

2013-10-01 Thread Geoffrey Garen
> So are you proposing to use the system allocator on Windows?

I’m proposing a two step process:

(1) Use the system allocator on Windows (and GTK).

(2) If a port maintainer cares to optimize a given port, without too much 
disruption to mainline code, they may do so.

FWIW, If I were conducting (2) for Windows, malloc would be pretty far down the 
list of things I started porting.

> The current malloc logic has been the source of a number of mysterious 
> crashes on Windows, so reverting to the system allocator might be a good 
> thing for stability. I don’t know what the potential performance 
> ramifications would be.

Yes, I’ve heard that on other platforms as well.

Geoff



Re: [webkit-dev] WTF::fastMalloc

2013-10-01 Thread Geoffrey Garen
>> To access thread-specific data using pthreads, you first need to take a lock 
>> and call pthread_key_create(). Since the whole point of thread-specific data 
>> is to avoid taking a lock, the API is useless.
> 
> The normal way to do it is to use pthread_once to create the key, which does 
> not in general take a lock. (That or use an out-of-band prior initializer, 
> but that wouldn't work for malloc).

Most implementations of pthread_once use a spinlock, or some moral equivalent. 
Fundamentally, there’s no memory-safe way to implement concurrent one-time 
execution of arbitrary side effects without a spinlock. That’s why requiring 
concurrent one-time execution of arbitrary side effects in order to access 
thread-specific memory is broken API.

> C++11 also introduces the thread_local keyword which is likely more readily 
> optimizable than function-call-based APIs where supported.

thread_local might be a reasonable option, if a platform achieves all the other 
requirements for fast malloc. It’s still too slow, but at least it isn’t slow 
by definition, and it doesn’t pollute the rest of the code too badly.

Geoff


Re: [webkit-dev] WTF::fastMalloc

2013-10-01 Thread Maciej Stachowiak

On Oct 1, 2013, at 3:11 PM, Geoffrey Garen  wrote:

>>> (4) Find a fast API for aligned virtual memory allocation.
>>> (5) Find a fast API for committing / decommitting physical memory without 
>>> releasing virtual memory pages.
>> 
>> Hrm. Isn't this already available via OSAllocator or are you referring
>> to the fact that the Posix implementation has a few problems?
> 
> OSAllocator is the right model, yes. 
> 
> There are two problems with the current OSAllocator POSIX implementation:
> 
> (1) It uses mmap, which doesn’t support aligned allocation. To get aligned 
> allocation, POSIX double-allocates all virtual memory. That is 2X too much. 
> So, we need to find a variation on mmap that supports an alignment constraint.

PageAllocationAligned.cpp does this, but it would be more effective to unmap 
the unneeded extra at each end (or use mremap on systems that have it). That 
would be extra VM calls but would not require 2x the space. I'm not sure why it 
doesn't do that already - perhaps because it is building on top of OSAllocator 
and no one tried hard enough to optimize it.

(The current FastMalloc doesn't try to align its requests for system memory to 
more than a page boundary so it doesn't have this issue.)
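
A sketch of the "unmap the extra at each end" idea, assuming a POSIX system with MAP_ANONYMOUS. This is not the PageAllocationAligned.cpp code; mapAligned is a hypothetical name, and both arguments are assumed to be multiples of the page size.

```cpp
#include <sys/mman.h>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper: maps `size` bytes of anonymous memory aligned to
// `alignment` (a power of two). Returns nullptr on failure.
void* mapAligned(size_t size, size_t alignment)
{
    assert(alignment && !(alignment & (alignment - 1)));

    // Over-allocate so that an aligned block of `size` bytes must fit inside.
    size_t mappedSize = size + alignment;
    void* raw = mmap(nullptr, mappedSize, PROT_READ | PROT_WRITE, MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (raw == MAP_FAILED)
        return nullptr;

    uintptr_t start = reinterpret_cast<uintptr_t>(raw);
    uintptr_t aligned = (start + alignment - 1) & ~(static_cast<uintptr_t>(alignment) - 1);

    // Give back the unused head and tail instead of keeping them mapped.
    size_t head = aligned - start;
    size_t tail = mappedSize - head - size;
    if (head)
        munmap(raw, head);
    if (tail)
        munmap(reinterpret_cast<void*>(aligned + size), tail);

    return reinterpret_cast<void*>(aligned);
}
```

The cost is up to two extra munmap calls per allocation; in exchange, only `size` bytes of address space stay reserved once the function returns.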

> 
> (2) POSIX uses MADV_FREE, MADV_DONTNEED, and/or MADV_WILLNEED. I don’t think 
> anybody has ever verified that these APIs do what we want. In my experience, 
> they usually don’t. So, we need to find a variation on these APIs that works 
> and is fast.

I don't have the expertise to know what these do or whether it is what we want. 
But our current malloc uses these, so it would not be a regression for the new 
malloc to use them even if they are subtly wrong, unless there is something 
wildly different about its use of system memory.

> 
> We need somebody to resolve these issues, otherwise our memory footprint will 
> be unacceptably high, and/or our VM operations will be unacceptably slow.
> 
> Geoff


Re: [webkit-dev] WTF::fastMalloc

2013-10-01 Thread Maciej Stachowiak

On Oct 1, 2013, at 3:05 PM, Geoffrey Garen  wrote:

>>> (3) Find a fast thread-specific data API on the canonical GTK platform.
>> 
>> Threading for GTK+ on non-Mac/non-Windows platforms is essentially
>> pthreads.
> 
> To access thread-specific data using pthreads, you first need to take a lock 
> and call pthread_key_create(). Since the whole point of thread-specific data 
> is to avoid taking a lock, the API is useless.

The normal way to do it is to use pthread_once to create the key, which does 
not in general take a lock. (That or use an out-of-band prior initializer, but 
that wouldn't work for malloc).
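
For reference, the pthread_once pattern looks roughly like the sketch below. ThreadCache, cacheKey and currentThreadCache are hypothetical names, and the per-thread object is obtained with calloc purely to keep the example short; an allocator implementing malloc itself would have to get that memory some other way.

```cpp
#include <pthread.h>
#include <cassert>
#include <cstdlib>

struct ThreadCache {
    int allocationCount;        // Hypothetical per-thread state.
};

static pthread_key_t cacheKey;
static pthread_once_t cacheKeyOnce = PTHREAD_ONCE_INIT;

static void destroyCache(void* p)
{
    std::free(p);               // Runs at thread exit for threads that created a cache.
}

static void createCacheKey()
{
    int result = pthread_key_create(&cacheKey, destroyCache);
    assert(!result);
    (void)result;
}

// Returns the calling thread's cache, creating it on first use.
ThreadCache* currentThreadCache()
{
    pthread_once(&cacheKeyOnce, createCacheKey);    // Creates the key exactly once.

    void* p = pthread_getspecific(cacheKey);
    if (!p) {
        p = std::calloc(1, sizeof(ThreadCache));
        if (!p)
            return nullptr;
        pthread_setspecific(cacheKey, p);
    }
    return static_cast<ThreadCache*>(p);
}
```

Every call pays for pthread_once and pthread_getspecific as function calls, which is the overhead under discussion in this thread.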

> 
> You’ll need an alternative to the cross-platform pthread API for accessing 
> thread-specific data. Otherwise, the cost of that API will dominate any other 
> cost, and it won’t be worth our time to try to optimize other things.

FastMalloc uses vanilla pthread_getspecific() all the time (including at least 
on every malloc call) on platforms that don't have a faster form of 
thread-specific data (such as pthread_getspecific_direct on Mac or __thread on 
Windows). While it makes a difference, FastMalloc still tends to be faster 
overall than system malloc implementations. So I suspect it would work ok for 
the new malloc as well. Probably the easiest way to find out is to test.

C++11 also introduces the thread_local keyword which is likely more readily 
optimizable than function-call-based APIs where supported.
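
A small sketch of the thread_local alternative, assuming a C++11 compiler and standard library with thread support. The names are invented for the example and do not come from FastMalloc.

```cpp
#include <cassert>
#include <thread>

struct ThreadCache {
    int allocationCount = 0;    // Hypothetical per-thread state.
};

// Each thread gets its own instance; access is a plain variable reference
// rather than a call into the threading library.
static thread_local ThreadCache threadCache;

int recordAllocation()
{
    return ++threadCache.allocationCount;
}

// Runs recordAllocation() once on a new thread and returns what that thread saw.
int recordAllocationOnNewThread()
{
    int seen = 0;
    std::thread worker([&seen] { seen = recordAllocation(); });
    worker.join();
    return seen;
}
```

The new thread starts from its own zero-initialized counter, independent of whatever the calling thread has recorded. How cheap the access is depends on the platform's TLS model, so this shows the interface and not a performance result.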

Regards,
Maciej



Re: [webkit-dev] WTF::fastMalloc

2013-10-01 Thread Maciej Stachowiak

On Oct 1, 2013, at 3:35 PM, Brent Fulgham  wrote:

> So are you proposing to use the system allocator on Windows? Or would we keep 
> using the existing FastMalloc implementation?
> 
> The current malloc logic has been the source of a number of mysterious 
> crashes on Windows, so reverting to the system allocator might be a good 
> thing for stability. I don’t know what the potential performance 
> ramifications would be.

They would be bad. The default malloc on Windows is very slow.

 - Maciej

> 
> -Brent
> 
> On Oct 1, 2013, at 3:23 PM, Geoffrey Garen  wrote:
> 
>>> Apple's Windows port uses FastMalloc and the last measurements we took show 
>>> it to be a large performance gain over the default Windows malloc 
>>> implementation.
>> 
>> I believe those measurements were taken 5 Windows versions ago.
>> 
>>> While this port is only used by iTunes these days, we still would not want 
>>> to regress its performance. Can the new allocator be made to work with 
>>> Windows?
>> 
>> The set of porting tasks is the same set I outlined for GTK.
>> 
>> The Windows port is missing many performance features, including tiled 
>> scrolling, LLInt, parallel garbage collection, DFG, and FTL. Given those 
>> other major missing pieces, I don’t think this piece is worth the porting 
>> time.
>> 
>> Geoff


Re: [webkit-dev] WTF::fastMalloc

2013-10-01 Thread Brent Fulgham
So are you proposing to use the system allocator on Windows? Or would we keep 
using the existing FastMalloc implementation?

The current malloc logic has been the source of a number of mysterious crashes 
on Windows, so reverting to the system allocator might be a good thing for 
stability. I don’t know what the potential performance ramifications would be.

-Brent

On Oct 1, 2013, at 3:23 PM, Geoffrey Garen  wrote:

>> Apple's Windows port uses FastMalloc and the last measurements we took show 
>> it to be a large performance gain over the default Windows malloc 
>> implementation.
> 
> I believe those measurements were taken 5 Windows versions ago.
> 
>> While this port is only used by iTunes these days, we still would not want 
>> to regress its performance. Can the new allocator be made to work with 
>> Windows?
> 
> The set of porting tasks is the same set I outlined for GTK.
> 
> The Windows port is missing many performance features, including tiled 
> scrolling, LLInt, parallel garbage collection, DFG, and FTL. Given those 
> other major missing pieces, I don’t think this piece is worth the porting 
> time.
> 
> Geoff


Re: [webkit-dev] WTF::fastMalloc

2013-10-01 Thread Geoffrey Garen
> Apple's Windows port uses FastMalloc and the last measurements we took show 
> it to be a large performance gain over the default Windows malloc 
> implementation.

I believe those measurements were taken 5 Windows versions ago.

> While this port is only used by iTunes these days, we still would not want to 
> regress its performance. Can the new allocator be made to work with Windows?

The set of porting tasks is the same set I outlined for GTK.

The Windows port is missing many performance features, including tiled 
scrolling, LLInt, parallel garbage collection, DFG, and FTL. Given those other 
major missing pieces, I don’t think this piece is worth the porting time.

Geoff


Re: [webkit-dev] WTF::fastMalloc

2013-10-01 Thread Geoffrey Garen
>> (4) Find a fast API for aligned virtual memory allocation.
>> (5) Find a fast API for committing / decommitting physical memory without 
>> releasing virtual memory pages.
> 
> Hrm. Isn't this already available via OSAllocator or are you referring
> to the fact that the Posix implementation has a few problems?

OSAllocator is the right model, yes. 

There are two problems with the current OSAllocator POSIX implementation:

(1) It uses mmap, which doesn’t support aligned allocation. To get aligned 
allocation, POSIX double-allocates all virtual memory. That is 2X too much. So, 
we need to find a variation on mmap that supports an alignment constraint.

(2) POSIX uses MADV_FREE, MADV_DONTNEED, and/or MADV_WILLNEED. I don’t think 
anybody has ever verified that these APIs do what we want. In my experience, 
they usually don’t. So, we need to find a variation on these APIs that works 
and is fast.

We need somebody to resolve these issues, otherwise our memory footprint will 
be unacceptably high, and/or our VM operations will be unacceptably slow.
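
For readers unfamiliar with the model, reserve/commit/decommit can be sketched on POSIX as below. This is an illustration under stated assumptions, not the OSAllocatorPosix.cpp code: the function names are chosen for the example, and the decommit step relies on Linux's MADV_DONTNEED semantics, which is exactly the kind of behaviour that would need verifying per platform.

```cpp
#include <sys/mman.h>
#include <cassert>
#include <cstddef>

// Reserve address space only: no access permissions, nothing resident yet.
void* reserveUncommitted(size_t bytes)
{
    void* p = mmap(nullptr, bytes, PROT_NONE, MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    return p == MAP_FAILED ? nullptr : p;
}

// Make a page-aligned sub-range usable. Physical pages arrive lazily on first touch.
bool commit(void* address, size_t bytes)
{
    return !mprotect(address, bytes, PROT_READ | PROT_WRITE);
}

// Drop the physical pages but keep the address range reserved.
bool decommit(void* address, size_t bytes)
{
    if (madvise(address, bytes, MADV_DONTNEED))
        return false;
    return !mprotect(address, bytes, PROT_NONE);
}

// Return the whole reservation to the system.
void release(void* address, size_t bytes)
{
    int result = munmap(address, bytes);
    assert(!result);
    (void)result;
}
```

Each commit and decommit here is at least one system call, and a recommitted page is zero-filled on its next touch, so this sketch does nothing to address the speed concern raised above.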

Geoff


Re: [webkit-dev] WTF::fastMalloc

2013-10-01 Thread Geoffrey Garen
>> (3) Find a fast thread-specific data API on the canonical GTK platform.
> 
> Threading for GTK+ on non-Mac/non-Windows platforms is essentially
> pthreads.

To access thread-specific data using pthreads, you first need to take a lock 
and call pthread_key_create(). Since the whole point of thread-specific data is 
to avoid taking a lock, the API is useless.

You’ll need an alternative to the cross-platform pthread API for accessing 
thread-specific data. Otherwise, the cost of that API will dominate any other 
cost, and it won’t be worth our time to try to optimize other things.

Geoff


Re: [webkit-dev] WTF::fastMalloc

2013-10-01 Thread Zoltan Horvath
On Tue, Oct 1, 2013 at 11:56 AM, Martin Robinson wrote:

>
> > Here’s a rough task list:
> >
> > (1) Define a canonical GTK platform we’ll use for performance
> > measurement.
>
> Perhaps the University of Szeged team has some insight into what
> platforms they used for comparing allocator performance.


I measured the performance and memory for Qt on desktop and on some ARM-based
embedded devices (e.g. Nokia N9). The blog posts are still available on
the blog site, but I'm not sure we can consider the numbers valid after
that many years. Please also note that I've been working for Adobe for more
than a year now, so I don't know whether the University team has any recent
public results.

The goal of enabling TCMalloc on Qt/GTK was to match the implementation
of the Apple port, which used TCMalloc at the time. Please also note that only
a subset of QtWebKit platforms use TCMalloc (Linux, Mac); the rest of them
still use the default system allocator.

> > (1) Refactor GTK APIs so that API-level objects are not allocated/deleted
> > by global operator new/delete in WebCore+JavaScriptCore.
> > (1a) Either build the API layer as a separate library from
> > WebCore+JavaScriptCore,
> > (1b) or specifically annotate each object at the API library
> > with a per-class operator new / operator delete.
>
> I don't think this should be a problem. Currently all allocations of
> API-level objects happen with the GLib slab allocator (or system
> malloc/free, given the right environment arguments).
>
> > (2) Find a fast secure random number API on the canonical GTK platform.
>
> I can look into this.
>
> > (3) Find a fast thread-specific data API on the canonical GTK platform.
>
> Threading for GTK+ on non-Mac/non-Windows platforms is essentially
> pthreads. It probably wouldn't be a lot of work to defer to Windows
> and Mac implementations on those platforms.
>
> --Martin
>


Re: [webkit-dev] WTF::fastMalloc

2013-10-01 Thread Martin Robinson
On Tue, Oct 1, 2013 at 11:34 AM, Geoffrey Garen  wrote:

> (4) Find a fast API for aligned virtual memory allocation.
> (5) Find a fast API for committing / decommitting physical memory without 
> releasing virtual memory pages.

Hrm. Isn't this already available via OSAllocator, or are you referring
to the fact that the POSIX implementation has a few problems?

--Martin


Re: [webkit-dev] WTF::fastMalloc

2013-10-01 Thread Maciej Stachowiak

On Sep 30, 2013, at 2:48 PM, Geoffrey Garen  wrote:

> Hi folks.
> 
> I’m planning to remove our years-out-of-date port of TCMalloc, and replace it 
> with something that takes maximum advantage of Mac and iOS virtual memory, 
> threading, and security APIs.
> 
> I've heard that TCMalloc has caused some problems for non-Mac, non-iOS ports 
> in the past. So, if you maintain a port, this change might make things 
> simpler for you.
> 
> Are there any ports whose built-in malloc implementations are slow enough 
> that they can’t get by without TCMalloc?

Apple's Windows port uses FastMalloc and the last measurements we took show it 
to be a large performance gain over the default Windows malloc implementation. 
While this port is only used by iTunes these days, we still would not want to 
regress its performance. Can the new allocator be made to work with Windows?

Regards,
Maciej



Re: [webkit-dev] WTF::fastMalloc

2013-10-01 Thread Oliver Hunt

On Oct 1, 2013, at 11:56 AM, Martin Robinson  wrote:

> On Tue, Oct 1, 2013 at 11:33 AM, Geoffrey Garen  wrote:
>>> A 5% regression in page load performance seems pretty serious.
>> 
>> I’m assuming you’re considering the GTK port here, and not the end-of-life 
>> Qt port.
>> 
>> Are you up for some engineering work to adopt a better malloc for GTK?
> 
> I appreciate your offer!
> 
>> Here’s a rough task list:
>> 
>> (1) Define a canonical GTK platform we’ll use for performance measurement.
> 
> Perhaps the University of Szeged team has some insight into what
> platforms they used for comparing allocator performance.
> 
>> (1) Refactor GTK APIs so that API-level objects are not allocated/deleted by 
>> global operator new/delete in WebCore+JavaScriptCore.
>> (1a) Either build the API layer as a separate library from 
>> WebCore+JavaScriptCore,
>> (1b) or specifically annotate each object at the API library with a 
>> per-class operator new / operator delete.
> 
> I don't think this should be a problem. Currently all allocations of
> API-level objects happen with the GLib slab allocator (or system
> malloc/free, given the right environment arguments).
> 
>> (2) Find a fast secure random number API on the canonical GTK platform.
> 
> I can look into this.

WTF has a custom implementation of arc4random(). I suspect most current Gtk 
host environments have a native one as well (rand_s on Windows is terribly 
slow, but as I said, WTF has its own secure generator that will seed 
appropriately).
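
To make item (2) concrete, here is a minimal sketch of pulling secure random bytes from the kernel on a Linux-like host. It is not WTF's arc4random() implementation. The names fillSecureRandom and secureRandom32 are hypothetical, and the sketch assumes /dev/urandom exists. Opening the device on every call is slow, so a real allocator would more likely seed a userspace generator once from a source like this.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <cstring>

// Hypothetical helper: fills `buffer` with `length` bytes read from the
// kernel's random source. Assumes /dev/urandom is available on the host.
// Returns false if the device cannot be opened or the read comes up short.
bool fillSecureRandom(void* buffer, std::size_t length)
{
    assert(buffer || !length);
    std::FILE* device = std::fopen("/dev/urandom", "rb");
    if (!device)
        return false;
    std::size_t bytesRead = std::fread(buffer, 1, length, device);
    std::fclose(device);
    return bytesRead == length;
}

// Hypothetical convenience wrapper: one 32-bit random value.
// Stores the value in `result` and returns false on failure.
bool secureRandom32(std::uint32_t& result)
{
    unsigned char bytes[sizeof(std::uint32_t)];
    if (!fillSecureRandom(bytes, sizeof(bytes)))
        return false;
    std::memcpy(&result, bytes, sizeof(result));
    return true;
}
```

A caller would check the return value and fall back to another source on failure rather than use an unfilled buffer.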

> 
>> (3) Find a fast thread-specific data API on the canonical GTK platform.
> 
> Threading for GTK+ on non-Mac/non-Windows platforms is essentially
> pthreads. It probably wouldn't be a lot of work to defer to Windows
> and Mac implementations on those platforms.

I recall Linux having fast thread locals, as does Windows.
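
As an illustration of thread-specific data for item (3), here is a small sketch using the C++11 thread_local keyword, which compilers on Linux typically lower to the platform's native thread-local storage. The counter and function names are hypothetical stand-ins for a per-thread allocator cache. The sketch assumes a toolchain with std::thread support (on some systems that means building with -pthread).

```cpp
#include <cassert>
#include <thread>
#include <vector>

// Hypothetical per-thread state, standing in for a per-thread allocator cache.
// Each thread sees its own independent copy, initialized to zero.
thread_local int t_allocationCount = 0;

// Bumps the calling thread's private counter and returns the new value.
int recordAllocation()
{
    return ++t_allocationCount;
}

// Starts `threadCount` workers that each record `perThread` allocations,
// then returns the final counter value observed by each worker.
std::vector<int> countPerThread(int threadCount, int perThread)
{
    assert(threadCount >= 0 && perThread >= 0);
    std::vector<int> totals(threadCount, 0);
    std::vector<std::thread> workers;
    for (int i = 0; i < threadCount; ++i) {
        workers.emplace_back([&totals, i, perThread] {
            for (int j = 0; j < perThread; ++j)
                recordAllocation();
            // Each worker writes only its own slot, so no locking is needed.
            totals[i] = t_allocationCount;
        });
    }
    for (auto& worker : workers)
        worker.join();
    return totals;
}
```

Every worker should report exactly perThread, since no thread can see another thread's counter. The calling thread's own counter stays untouched.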

--Oliver



Re: [webkit-dev] WTF::fastMalloc

2013-10-01 Thread Martin Robinson
On Tue, Oct 1, 2013 at 11:33 AM, Geoffrey Garen  wrote:
>> A 5% regression in page load performance seems pretty serious.
>
> I’m assuming you’re considering the GTK port here, and not the end-of-life Qt 
> port.
>
> Are you up for some engineering work to adopt a better malloc for GTK?

I appreciate your offer!

> Here’s a rough task list:
>
> (1) Define a canonical GTK platform we’ll use for performance measurement.

Perhaps the University of Szeged team has some insight into what
platforms they used for comparing allocator performance.

> (1) Refactor GTK APIs so that API-level objects are not allocated/deleted by 
> global operator new/delete in WebCore+JavaScriptCore.
> (1a) Either build the API layer as a separate library from 
> WebCore+JavaScriptCore,
> (1b) or specifically annotate each object at the API library with a 
> per-class operator new / operator delete.

I don't think this should be a problem. Currently all allocations of
API-level objects happen with the GLib slab allocator (or system
malloc/free, given the right environment arguments).

> (2) Find a fast secure random number API on the canonical GTK platform.

I can look into this.

> (3) Find a fast thread-specific data API on the canonical GTK platform.

Threading for GTK+ on non-Mac/non-Windows platforms is essentially
pthreads. It probably wouldn't be a lot of work to defer to Windows
and Mac implementations on those platforms.

--Martin


Re: [webkit-dev] WTF::fastMalloc

2013-10-01 Thread Geoffrey Garen
> Here’s a rough task list:
> 
> (1) Define a canonical GTK platform we’ll use for performance measurement.
> 
> (2) Measure FastMalloc on/off on that platform.
> 
> Assuming FastMalloc is a significant improvement:
> 
> (1) Refactor GTK APIs so that API-level objects are not allocated/deleted by 
> global operator new/delete in WebCore+JavaScriptCore.
> 
>   (1a) Either build the API layer as a separate library from 
> WebCore+JavaScriptCore,
> 
>   (1b) or specifically annotate each object at the API library with a 
> per-class operator new / operator delete.
> 
> (2) Find a fast secure random number API on the canonical GTK platform.
> 
> (3) Find a fast thread-specific data API on the canonical GTK platform.

(4) Find a fast API for aligned virtual memory allocation.

(5) Find a fast API for committing / decommitting physical memory without 
releasing virtual memory pages.
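
For readers unfamiliar with items (4) and (5), here is a rough sketch of what they can look like on a POSIX system. It is an illustration only, not WebKit's OSAllocator and not the planned allocator. The function names are hypothetical. It assumes mmap with MAP_ANONYMOUS is available and that madvise(MADV_DONTNEED) drops the physical pages, as it does on Linux. Size and alignment are assumed to be multiples of the page size.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <sys/mman.h>
#include <unistd.h>

// Reserves `size` bytes of inaccessible address space aligned to `alignment`
// (a power of two). Over-reserves by `alignment`, then trims both ends.
void* reserveAligned(std::size_t size, std::size_t alignment)
{
    std::size_t pageSize = static_cast<std::size_t>(sysconf(_SC_PAGESIZE));
    assert(alignment && !(alignment & (alignment - 1)));
    assert(size && !(size % pageSize) && !(alignment % pageSize));

    std::size_t mappedSize = size + alignment;
    void* raw = mmap(nullptr, mappedSize, PROT_NONE, MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (raw == MAP_FAILED)
        return nullptr;

    std::uintptr_t start = reinterpret_cast<std::uintptr_t>(raw);
    std::uintptr_t mask = static_cast<std::uintptr_t>(alignment) - 1;
    std::uintptr_t aligned = (start + mask) & ~mask;
    std::size_t head = aligned - start;
    std::size_t tail = mappedSize - head - size;
    if (head)
        munmap(raw, head);
    if (tail)
        munmap(reinterpret_cast<void*>(aligned + size), tail);
    return reinterpret_cast<void*>(aligned);
}

// Makes the range readable and writable; physical pages arrive on first touch.
bool commitPages(void* address, std::size_t size)
{
    return mprotect(address, size, PROT_READ | PROT_WRITE) == 0;
}

// Gives the physical pages back but keeps the address range reserved.
bool decommitPages(void* address, std::size_t size)
{
    if (madvise(address, size, MADV_DONTNEED) != 0)
        return false;
    return mprotect(address, size, PROT_NONE) == 0;
}

// Releases the address range itself.
void releasePages(void* address, std::size_t size)
{
    munmap(address, size);
}
```

The trimming approach costs extra system calls per reservation, which is part of why a fast native API for aligned allocation matters here.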

Geoff


Re: [webkit-dev] WTF::fastMalloc

2013-10-01 Thread Geoffrey Garen
> A 5% regression in page load performance seems pretty serious.

I’m assuming you’re considering the GTK port here, and not the end-of-life Qt 
port.

Are you up for some engineering work to adopt a better malloc for GTK?

Here’s a rough task list:

(1) Define a canonical GTK platform we’ll use for performance measurement.

(2) Measure FastMalloc on/off on that platform.

Assuming FastMalloc is a significant improvement:

(1) Refactor GTK APIs so that API-level objects are not allocated/deleted by 
global operator new/delete in WebCore+JavaScriptCore.

(1a) Either build the API layer as a separate library from 
WebCore+JavaScriptCore,

(1b) or specifically annotate each object at the API library with a 
per-class operator new / operator delete.

(2) Find a fast secure random number API on the canonical GTK platform.

(3) Find a fast thread-specific data API on the canonical GTK platform.

If you take on these tasks, I’m happy to take on the larger task of providing a 
fast malloc for GTK WebKit.
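
To illustrate option (1b), here is a minimal sketch of a class that carries its own operator new / operator delete, so its storage does not depend on whatever the global operators do. ApiObject, the counters, and roundTripApiObject are hypothetical names, not GTK or WebKit API. The sketch routes to the system malloc/free purely as an assumption. A GTK port might prefer a GLib allocator instead.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <new>

// Hypothetical counters that make the routing observable in this sketch.
static std::size_t g_apiAllocations = 0;
static std::size_t g_apiDeallocations = 0;

// Hypothetical API-level object. Its storage always comes from the system
// malloc/free, regardless of how the global operator new is defined.
class ApiObject {
public:
    explicit ApiObject(int value) : m_value(value) { }
    int value() const { return m_value; }

    // Per-class allocation function: used by `new ApiObject(...)`.
    static void* operator new(std::size_t size)
    {
        void* storage = std::malloc(size);
        if (!storage)
            throw std::bad_alloc();
        ++g_apiAllocations;
        return storage;
    }

    // Per-class deallocation function: used by `delete pointerToApiObject`.
    static void operator delete(void* storage) noexcept
    {
        if (!storage)
            return;
        ++g_apiDeallocations;
        std::free(storage);
    }

private:
    int m_value;
};

// Allocates and frees one ApiObject, returning the value it held.
int roundTripApiObject(int value)
{
    ApiObject* object = new ApiObject(value);
    assert(object);
    int result = object->value();
    delete object;
    return result;
}
```

Annotating every API class this way is repetitive, which is presumably why (1a), a separate library, is offered as the alternative.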

Geoff


Re: [webkit-dev] WTF::fastMalloc

2013-10-01 Thread Martin Robinson
On Mon, Sep 30, 2013 at 7:41 PM, Zoltan Horvath  wrote:

> Based on a 2.5-year-old measurement
> (http://webkit.sed.hu/blog/20100302/war-allocators-qtlaunchers-coast) on the
> Qt-port, the page loading on the Methanol test suite was 5% faster (avg)
> with TCmalloc than the default system allocator on Linux. The performance
> results of the SunSpider suite were similar for both allocators. The memory
> consumption was always lower with the default OS allocator.

A 5% regression in page load performance seems pretty serious.

--Martin


Re: [webkit-dev] WTF::fastMalloc

2013-09-30 Thread Zoltan Horvath
Hi Geoffrey,

I used to work on memory-related topics while I was at the University of
Szeged.

Based on a 2.5-year-old measurement
(http://webkit.sed.hu/blog/20100302/war-allocators-qtlaunchers-coast) on the
Qt-port, the page loading on the Methanol test suite was 5% faster (avg)
with TCmalloc than the default system allocator on Linux. The performance
results of the SunSpider suite were similar for both allocators. The memory
consumption was always lower with the default OS allocator.

I guess the new allocator only has iOS support. I'm fine with removing
TCmalloc, although this direction might raise further questions, such as
whether to remove the custom allocation framework as well. Feel free to cc me
on bugs; I can help by contributing some patches.

Cheers,



On Mon, Sep 30, 2013 at 2:48 PM, Geoffrey Garen  wrote:

> Hi folks.
>
> I’m planning to remove our years-out-of-date port of TCMalloc, and replace
> it with something that takes maximum advantage of Mac and iOS virtual
> memory, threading, and security APIs.
>
> I've heard that TCMalloc has caused some problems for non-Mac, non-iOS
> ports in the past. So, if you maintain a port, this change might make
> things simpler for you.
>
> Are there any ports whose built-in malloc implementations are slow enough
> that they can’t get by without TCMalloc?
>
> Thanks,
> Geoff


[webkit-dev] WTF::fastMalloc

2013-09-30 Thread Geoffrey Garen
Hi folks.

I’m planning to remove our years-out-of-date port of TCMalloc, and replace it 
with something that takes maximum advantage of Mac and iOS virtual memory, 
threading, and security APIs.

I've heard that TCMalloc has caused some problems for non-Mac, non-iOS ports in 
the past. So, if you maintain a port, this change might make things simpler for 
you.

Are there any ports whose built-in malloc implementations are slow enough that 
they can’t get by without TCMalloc?

Thanks,
Geoff