(sorry Balint for the double-post, I don't know why my "reply" button
dropped the mailing list)

---------- Forwarded message ----------
From: Evan Huus <[email protected]>
Date: Fri, Jul 11, 2014 at 9:05 PM
Subject: Re: [Wireshark-dev] tvb allocator (was: Re: [Wireshark-commits]
master b6d20a2: Optimize reseting epan_dissect_t when filtering.)
To: Bálint Reczey <[email protected]>


On Fri, Jul 11, 2014 at 8:27 PM, Bálint Réczey <[email protected]>
wrote:

> Hi Evan,
>
> 2014-07-11 23:51 GMT+02:00 Evan Huus <[email protected]>:
> > On Fri, Jul 11, 2014 at 5:12 PM, Bálint Réczey <[email protected]>
> > wrote:
> >>
> >> Hi All,
> >>
> >> Please provide the input data for letting others reproduce the results
> >> or perform the performance tests on pcap files already available to
> >> the public.
> >>
> >> I'm not a fan of implementing custom memory management methods because
> >> partly because I highly doubt we can beat jemalloc easily on
> >> performance
> >
> >
> > The only place we reliably beat jemalloc (or even glib) is when we have a
> > large number of allocations that live together, and can be freed with
> > free_all. Anything else is basically noise. As Jakub's test noted, the
> main
> > block allocator is actually slightly slower than g_slice_* if the frees
> are
> > done manually.
> >
> >>
> >> and custom allocation methods can also have nasty bugs
> >> like the one observed in OpenSSL:
> >> http://www.tedunangst.com/flak/post/analysis-of-openssl-freelist-reuse
> >>
> >> I wrote a short post about making all programs in Debian resistant to
> >> malloc() related attacks using ASAN and wmem in its current form
> >> prevents implementing the protection:
> >>
> >>
> http://balintreczey.hu/blog/proposing-amd64-hardened-architecture-for-debian/
> >
> >
> > It's not clear to me from reading the blog post or the mail to debian
> what
> > the actual protections would be, or why wmem would prevent them from
> > working. Could you clarify please? Glib has its own allocator logic
> > internally for g_slice_*, does this also cause problems?
> I plan using ASAN for all programs which would catch (among others)
> use-after-free and reading below or over the malloc()-ed
> memory area. Those can't be caught if the program uses another layer
> of bulk memory allocations.
> g_malloc() and g_slice_* has the same problem, but they can be
> overrideb by passing G_SLICE=always-malloc .
>
> I know wmem has simple allocator


Which can be foreced everywhere by passing
WIRESHARK_DEBUG_WMEM_OVERRIDE=simple like we do in "valgrind-wireshark.sh".
I don't see how this is different from using G_SLICE=always-malloc?


> but wmem_free() is really
> inefficient


I grant that wmem_simple_free has a terrible worst-case behaviour, but its
practical efficiency is actually the best of anything I've tried. The only
alternative which doesn't involve adding a prefix to the memory block
(which would break ASAN/valgrind/etc) is a hash table, which is actually
substantially slower.


> and as a fix I would like to propose removing wmem_free()
> from the API.
> IMO Wireshark needs wmem_alloc() for allocating and freeing memory
> needed during whole scopes, but should not offer wmem_free() and
> wmem_realloc() to let us able to provide efficient per-scope
> allocations.


Hmm. We need free and realloc to efficiently implement certain data
structures, and I see you've addressed that below, so I will include my
answer there.


> Dissectors which should simply do g_malloc()/g_free() for
> allocations when they know when they need to free memory


(which they can't do safely if they also call proto_tree functions
inbetween)


> and using
> wmem_free() also does not keep the promise of having a
> wmem_alloc()-ated object available during the whole scope.
>

I don't think that promise was ever made, really. The promise is that such
memory won't be available after the scope, not that it will be available
right until the end.


> Wmem also have a lot of data structures reimplemented using
> wmem-backed memory, but I think it would be way easier to use GLists
> registered to be g_list_free()-d using wmem_register_callback() than
> using wmem_list_* functions for manipulating per-scope allocated lists
> for example.
>

This is a follow-up to the performance concern. If we use callbacks to free
everything, we lose the *substantial* speed-up we get from the free_all
operation. I grant that we should avoid optimizations unless
well-justified, but I do believe this is well-justified given the benefit.

Taking it to the extreme, wmem could be made *extraordinarily* simple if it
were nothing more than a linked list of callbacks (some to g_free, some to
g_list_free, etc) and we used glib memory for everything. It would just be
very, very, very slow. The simplicity has an obvious appeal, but I can't
justify a performance hit of that magnitude.

Cheers,
> Balint
>
> >
> >>
> >> Please don't sacrifice protection for 2% speedup. Please keep wmem
> >> usage for cases where it is used for garbage collecting (free() after
> >> end of frame/capture file) not when the allocation and deallocation
> >> are already done properly.
> >>
> >> Thanks,
> >> Balint
> >>
> >> 2014-07-11 8:58 GMT+02:00 Jakub Zawadzki <[email protected]>:
> >> > Hi,
> >> >
> >> > On Sat, Jun 21, 2014 at 10:12:48PM +0200, Jakub Zawadzki wrote:
> >> >> If we're in topic of optimizing 'slower' [de]allocations in common
> >> >> functions:
> >> >>
> >> >> - tvb allocation/deallocation (2.5%, or 3.4% when no filtering)
> >> >>
> >> >>    243,931,671  *  ???:tvb_new
> >> >> [/tmp/wireshark/epan/.libs/libwireshark.so.0.0.0]
> >> >>    202,052,290  >   ???:g_slice_alloc (2463493x)
> >> >> [/usr/lib64/libglib-2.0.so.0.3600.4]
> >> >>
> >> >>    291,765,126  *  ???:tvb_free_chain
> >> >> [/tmp/wireshark/epan/.libs/libwireshark.so.0.0.0]
> >> >>    256,390,635  >   ???:g_slice_free1 (2435843x)
> >> >> [/usr/lib64/libglib-2.0.so.0.3600.4]
> >> >
> >> >> This, or next week I'll try to do tvb.
> >> >
> >> > ... or maybe this week:
> >> >
> >> > ver0 | 18,055,719,820 (-----------) | Base version
> >> > 96f0585268f1cc4e820767c4038c10ed4915c12a
> >> > ver1 | 18,185,185,838 (0.6% slower) | Change tvb allocator g_slice_*
> to
> >> > wmem with file scope
> >> > ver2 | 17,809,433,204 (1.4% faster) | Change tvb allocator g_slice_*
> to
> >> > wmem with file/packet scope
> >> > ver3 | 17,812,128,887 (1.3% faster) | Change tvb allocator g_slice_*
> to
> >> > simple object allocator with epan scope
> >> > ver4 | 17,704,132,561 (2.0% faster) | Change tvb allocator g_slice_*
> to
> >> > simple object allocator with file scope
> >> >
> >> > I have uploaded patches & profiler outputs to:
> >> > http://www.wireshark.org/~darkjames/tvb-opt-allocator/
> >> >
> >> > Please review, and check what version is OK to be applied.
> >> >
> >> >
> >> > P.S: I'll might find some time to do ver5 with slab allocator, but
> it'll
> >> > look like object allocator API with epan scope.
> >> >
> >> > Cheers,
> >> > Jakub.
> >> >
> >> >
> ___________________________________________________________________________
> >> > Sent via:    Wireshark-dev mailing list <[email protected]>
> >> > Archives:    http://www.wireshark.org/lists/wireshark-dev
> >> > Unsubscribe: https://wireshark.org/mailman/options/wireshark-dev
> >> >
> >> > mailto:[email protected]?subject=unsubscribe
> >>
> >>
> ___________________________________________________________________________
> >> Sent via:    Wireshark-dev mailing list <[email protected]>
> >> Archives:    http://www.wireshark.org/lists/wireshark-dev
> >> Unsubscribe: https://wireshark.org/mailman/options/wireshark-dev
> >>
> >> mailto:[email protected]?subject=unsubscribe
> >
> >
>
___________________________________________________________________________
Sent via:    Wireshark-dev mailing list <[email protected]>
Archives:    http://www.wireshark.org/lists/wireshark-dev
Unsubscribe: https://wireshark.org/mailman/options/wireshark-dev
             mailto:[email protected]?subject=unsubscribe

Reply via email to