(sorry Balint for the double-post, I don't know why my "reply" button dropped the mailing list)
---------- Forwarded message ---------- From: Evan Huus <[email protected]> Date: Fri, Jul 11, 2014 at 9:05 PM Subject: Re: [Wireshark-dev] tvb allocator (was: Re: [Wireshark-commits] master b6d20a2: Optimize reseting epan_dissect_t when filtering.) To: Bálint Reczey <[email protected]> On Fri, Jul 11, 2014 at 8:27 PM, Bálint Réczey <[email protected]> wrote: > Hi Evan, > > 2014-07-11 23:51 GMT+02:00 Evan Huus <[email protected]>: > > On Fri, Jul 11, 2014 at 5:12 PM, Bálint Réczey <[email protected]> > > wrote: > >> > >> Hi All, > >> > >> Please provide the input data for letting others reproduce the results > >> or perform the performance tests on pcap files already available to > >> the public. > >> > >> I'm not a fan of implementing custom memory management methods because > >> partly because I highly doubt we can beat jemalloc easily on > >> performance > > > > > > The only place we reliably beat jemalloc (or even glib) is when we have a > > large number of allocations that live together, and can be freed with > > free_all. Anything else is basically noise. As Jakub's test noted, the > main > > block allocator is actually slightly slower than g_slice_* if the frees > are > > done manually. > > > >> > >> and custom allocation methods can also have nasty bugs > >> like the one observed in OpenSSL: > >> http://www.tedunangst.com/flak/post/analysis-of-openssl-freelist-reuse > >> > >> I wrote a short post about making all programs in Debian resistant to > >> malloc() related attacks using ASAN and wmem in its current form > >> prevents implementing the protection: > >> > >> > http://balintreczey.hu/blog/proposing-amd64-hardened-architecture-for-debian/ > > > > > > It's not clear to me from reading the blog post or the mail to debian > what > > the actual protections would be, or why wmem would prevent them from > > working. Could you clarify please? Glib has its own allocator logic > > internally for g_slice_*, does this also cause problems? > I plan using ASAN for all programs which would catch (among others) > use-after-free and reading below or over the malloc()-ed > memory area. Those can't be caught if the program uses another layer > of bulk memory allocations. > g_malloc() and g_slice_* has the same problem, but they can be > overrideb by passing G_SLICE=always-malloc . > > I know wmem has simple allocator Which can be foreced everywhere by passing WIRESHARK_DEBUG_WMEM_OVERRIDE=simple like we do in "valgrind-wireshark.sh". I don't see how this is different from using G_SLICE=always-malloc? > but wmem_free() is really > inefficient I grant that wmem_simple_free has a terrible worst-case behaviour, but its practical efficiency is actually the best of anything I've tried. The only alternative which doesn't involve adding a prefix to the memory block (which would break ASAN/valgrind/etc) is a hash table, which is actually substantially slower. > and as a fix I would like to propose removing wmem_free() > from the API. > IMO Wireshark needs wmem_alloc() for allocating and freeing memory > needed during whole scopes, but should not offer wmem_free() and > wmem_realloc() to let us able to provide efficient per-scope > allocations. Hmm. We need free and realloc to efficiently implement certain data structures, and I see you've addressed that below, so I will include my answer there. > Dissectors which should simply do g_malloc()/g_free() for > allocations when they know when they need to free memory (which they can't do safely if they also call proto_tree functions inbetween) > and using > wmem_free() also does not keep the promise of having a > wmem_alloc()-ated object available during the whole scope. > I don't think that promise was ever made, really. The promise is that such memory won't be available after the scope, not that it will be available right until the end. > Wmem also have a lot of data structures reimplemented using > wmem-backed memory, but I think it would be way easier to use GLists > registered to be g_list_free()-d using wmem_register_callback() than > using wmem_list_* functions for manipulating per-scope allocated lists > for example. > This is a follow-up to the performance concern. If we use callbacks to free everything, we lose the *substantial* speed-up we get from the free_all operation. I grant that we should avoid optimizations unless well-justified, but I do believe this is well-justified given the benefit. Taking it to the extreme, wmem could be made *extraordinarily* simple if it were nothing more than a linked list of callbacks (some to g_free, some to g_list_free, etc) and we used glib memory for everything. It would just be very, very, very slow. The simplicity has an obvious appeal, but I can't justify a performance hit of that magnitude. Cheers, > Balint > > > > >> > >> Please don't sacrifice protection for 2% speedup. Please keep wmem > >> usage for cases where it is used for garbage collecting (free() after > >> end of frame/capture file) not when the allocation and deallocation > >> are already done properly. > >> > >> Thanks, > >> Balint > >> > >> 2014-07-11 8:58 GMT+02:00 Jakub Zawadzki <[email protected]>: > >> > Hi, > >> > > >> > On Sat, Jun 21, 2014 at 10:12:48PM +0200, Jakub Zawadzki wrote: > >> >> If we're in topic of optimizing 'slower' [de]allocations in common > >> >> functions: > >> >> > >> >> - tvb allocation/deallocation (2.5%, or 3.4% when no filtering) > >> >> > >> >> 243,931,671 * ???:tvb_new > >> >> [/tmp/wireshark/epan/.libs/libwireshark.so.0.0.0] > >> >> 202,052,290 > ???:g_slice_alloc (2463493x) > >> >> [/usr/lib64/libglib-2.0.so.0.3600.4] > >> >> > >> >> 291,765,126 * ???:tvb_free_chain > >> >> [/tmp/wireshark/epan/.libs/libwireshark.so.0.0.0] > >> >> 256,390,635 > ???:g_slice_free1 (2435843x) > >> >> [/usr/lib64/libglib-2.0.so.0.3600.4] > >> > > >> >> This, or next week I'll try to do tvb. > >> > > >> > ... or maybe this week: > >> > > >> > ver0 | 18,055,719,820 (-----------) | Base version > >> > 96f0585268f1cc4e820767c4038c10ed4915c12a > >> > ver1 | 18,185,185,838 (0.6% slower) | Change tvb allocator g_slice_* > to > >> > wmem with file scope > >> > ver2 | 17,809,433,204 (1.4% faster) | Change tvb allocator g_slice_* > to > >> > wmem with file/packet scope > >> > ver3 | 17,812,128,887 (1.3% faster) | Change tvb allocator g_slice_* > to > >> > simple object allocator with epan scope > >> > ver4 | 17,704,132,561 (2.0% faster) | Change tvb allocator g_slice_* > to > >> > simple object allocator with file scope > >> > > >> > I have uploaded patches & profiler outputs to: > >> > http://www.wireshark.org/~darkjames/tvb-opt-allocator/ > >> > > >> > Please review, and check what version is OK to be applied. > >> > > >> > > >> > P.S: I'll might find some time to do ver5 with slab allocator, but > it'll > >> > look like object allocator API with epan scope. > >> > > >> > Cheers, > >> > Jakub. > >> > > >> > > ___________________________________________________________________________ > >> > Sent via: Wireshark-dev mailing list <[email protected]> > >> > Archives: http://www.wireshark.org/lists/wireshark-dev > >> > Unsubscribe: https://wireshark.org/mailman/options/wireshark-dev > >> > > >> > mailto:[email protected]?subject=unsubscribe > >> > >> > ___________________________________________________________________________ > >> Sent via: Wireshark-dev mailing list <[email protected]> > >> Archives: http://www.wireshark.org/lists/wireshark-dev > >> Unsubscribe: https://wireshark.org/mailman/options/wireshark-dev > >> > >> mailto:[email protected]?subject=unsubscribe > > > > >
___________________________________________________________________________ Sent via: Wireshark-dev mailing list <[email protected]> Archives: http://www.wireshark.org/lists/wireshark-dev Unsubscribe: https://wireshark.org/mailman/options/wireshark-dev mailto:[email protected]?subject=unsubscribe
