Re: [Wireshark-dev] tvb allocator (was: Re: [Wireshark-commits] master b6d20a2: Optimize reseting epan_dissect_t when filtering.)
Hi Jakub, 2014-07-22 0:52 GMT+02:00 darkjames...@darkjames.pl: Hi, On Sat, Jul 12, 2014 at 02:27:06AM +0200, B??lint R??czey wrote: I plan using ASAN for all programs which would catch (among others) use-after-free and reading below or over the malloc()-ed memory area. Those can't be caught if the program uses another layer of bulk memory allocations. g_malloc() and g_slice_* has the same problem, but they can be overrideb by passing G_SLICE=always-malloc . We have these environment variables also, ver{3, 4} introduce WIRESHARK_DEBUG_WMEM_OBJECT_POOL_SKIP, where you can set, and object pool will not maintain free list of objects. I see no problem with enabling it by default when building with ASAN. Also it should be quite easy to use ASAN manual poisoning[1] with object pool API. wmem_object_pool_alloc(wmem_allocator_t *allocator, wmem_object_pool_t *pool): if (pool-current_count 0) { ... ASAN_UNPOISON_MEMORY_REGION(pool-free_list, pool-object_size); pool-free_list = pool-free_list-next; wmem_object_pool_free(wmem_allocator_t *allocator, wmem_object_pool_t *pool, void *ptr) ... ASAN_POISON_MEMORY_REGION(ptr, pool-object_size); (not tested). Thank you, I was not aware of those macros. I think we should add those to the custom allocator code. Cheers, Balint ___ Sent via:Wireshark-dev mailing list wireshark-dev@wireshark.org Archives:http://www.wireshark.org/lists/wireshark-dev Unsubscribe: https://wireshark.org/mailman/options/wireshark-dev mailto:wireshark-dev-requ...@wireshark.org?subject=unsubscribe
Re: [Wireshark-dev] tvb allocator (was: Re: [Wireshark-commits] master b6d20a2: Optimize reseting epan_dissect_t when filtering.)
On Thu, Jul 24, 2014 at 2:42 PM, Bálint Réczey bal...@balintreczey.hu wrote: Hi Jakub, 2014-07-22 0:52 GMT+02:00 darkjames...@darkjames.pl: Hi, On Sat, Jul 12, 2014 at 02:27:06AM +0200, B??lint R??czey wrote: I plan using ASAN for all programs which would catch (among others) use-after-free and reading below or over the malloc()-ed memory area. Those can't be caught if the program uses another layer of bulk memory allocations. g_malloc() and g_slice_* has the same problem, but they can be overrideb by passing G_SLICE=always-malloc . We have these environment variables also, ver{3, 4} introduce WIRESHARK_DEBUG_WMEM_OBJECT_POOL_SKIP, where you can set, and object pool will not maintain free list of objects. I see no problem with enabling it by default when building with ASAN. Also it should be quite easy to use ASAN manual poisoning[1] with object pool API. wmem_object_pool_alloc(wmem_allocator_t *allocator, wmem_object_pool_t *pool): if (pool-current_count 0) { ... ASAN_UNPOISON_MEMORY_REGION(pool-free_list, pool-object_size); pool-free_list = pool-free_list-next; wmem_object_pool_free(wmem_allocator_t *allocator, wmem_object_pool_t *pool, void *ptr) ... ASAN_POISON_MEMORY_REGION(ptr, pool-object_size); (not tested). Thank you, I was not aware of those macros. I think we should add those to the custom allocator code. Just setting WIRESHARK_DEBUG_WMEM_OVERRIDE=simple is easier, is it not? Cheers, Balint ___ Sent via:Wireshark-dev mailing list wireshark-dev@wireshark.org Archives:http://www.wireshark.org/lists/wireshark-dev Unsubscribe: https://wireshark.org/mailman/options/wireshark-dev mailto:wireshark-dev-requ...@wireshark.org ?subject=unsubscribe ___ Sent via:Wireshark-dev mailing list wireshark-dev@wireshark.org Archives:http://www.wireshark.org/lists/wireshark-dev Unsubscribe: https://wireshark.org/mailman/options/wireshark-dev mailto:wireshark-dev-requ...@wireshark.org?subject=unsubscribe
Re: [Wireshark-dev] tvb allocator (was: Re: [Wireshark-commits] master b6d20a2: Optimize reseting epan_dissect_t when filtering.)
Hi, On Fri, Jul 11, 2014 at 11:12:37PM +0200, B??lint R??czey wrote: Please provide the input data for letting others reproduce the results or perform the performance tests on pcap files already available to the public. I have only fake one: http://www.wireshark.org/~darkjames/tvb-opt-allocator/udp-test/udp-tvb-benchmark.pcap.xz (eth:ethertype:vlan:ethertype:ip:udp:sip]) This generally always shows up. If you have small amount of frames just substract time spend in epan_init() (as it doesn't matter of number of capture files). Profiled target: /tmp/wireshark/.libs/tshark -nr /tmp/udp-tvb-benchmark.pcap (...) === git revision: 450f4916522b0099364e8978405c048362c9e745 === 3,241,026,732 PROGRAM TOTALS 545,660,479 ???:epan_init (1x) [/tmp/wireshark/epan/.libs/libwireshark.so.0.0.0] 65,641,038 * ???:tvb_new [/tmp/wireshark/epan/.libs/libwireshark.so.0.0.0] 78,702,144 * ???:tvb_free_chain [/tmp/wireshark/epan/.libs/libwireshark.so.0.0.0] tvb allocation: 144.3 / 3241.0 = 4.4% of total tvb allocation: 144.3 / 2695.3 = 5.3% of total - epan_init() === git revision: 450f4916522b0099364e8978405c048362c9e745 + ver2 of patch === 3,174,032,847 PROGRAM TOTALS 544,803,794 * ???:epan_init [/tmp/wireshark/epan/.libs/libwireshark.so.0.0.0] 43,514,953 * ???:tvb_new [/tmp/wireshark/epan/.libs/libwireshark.so.0.0.0] 27,263,712 * ???:tvb_free_chain [/tmp/wireshark/epan/.libs/libwireshark.so.0.0.0] tvb allocation: 70.7 / 3174.0 = 2.2% of total tvb allocation: 70.7 / 2629.2 = 2.7% of total (without initialization) == Cheers, Jakub. ___ Sent via:Wireshark-dev mailing list wireshark-dev@wireshark.org Archives:http://www.wireshark.org/lists/wireshark-dev Unsubscribe: https://wireshark.org/mailman/options/wireshark-dev mailto:wireshark-dev-requ...@wireshark.org?subject=unsubscribe
Re: [Wireshark-dev] tvb allocator (was: Re: [Wireshark-commits] master b6d20a2: Optimize reseting epan_dissect_t when filtering.)
Hi, On Sat, Jul 12, 2014 at 02:27:06AM +0200, B??lint R??czey wrote: I plan using ASAN for all programs which would catch (among others) use-after-free and reading below or over the malloc()-ed memory area. Those can't be caught if the program uses another layer of bulk memory allocations. g_malloc() and g_slice_* has the same problem, but they can be overrideb by passing G_SLICE=always-malloc . We have these environment variables also, ver{3, 4} introduce WIRESHARK_DEBUG_WMEM_OBJECT_POOL_SKIP, where you can set, and object pool will not maintain free list of objects. I see no problem with enabling it by default when building with ASAN. Also it should be quite easy to use ASAN manual poisoning[1] with object pool API. wmem_object_pool_alloc(wmem_allocator_t *allocator, wmem_object_pool_t *pool): if (pool-current_count 0) { ... ASAN_UNPOISON_MEMORY_REGION(pool-free_list, pool-object_size); pool-free_list = pool-free_list-next; wmem_object_pool_free(wmem_allocator_t *allocator, wmem_object_pool_t *pool, void *ptr) ... ASAN_POISON_MEMORY_REGION(ptr, pool-object_size); (not tested). Cheers, Jakub. [1] http://code.google.com/p/address-sanitizer/wiki/ManualPoisoning ___ Sent via:Wireshark-dev mailing list wireshark-dev@wireshark.org Archives:http://www.wireshark.org/lists/wireshark-dev Unsubscribe: https://wireshark.org/mailman/options/wireshark-dev mailto:wireshark-dev-requ...@wireshark.org?subject=unsubscribe
Re: [Wireshark-dev] tvb allocator (was: Re: [Wireshark-commits] master b6d20a2: Optimize reseting epan_dissect_t when filtering.)
The biggest win, I think, would be if we can avoid calling free_chain at all because tvbs are always allocated in the right scope and so get freed automatically. I think this would involve touching every place that creates new tvbs backed with glib memory though... I will try and think about this and review your patches sometime this weekend. On Fri, Jul 11, 2014 at 2:58 AM, Jakub Zawadzki darkjames...@darkjames.pl wrote: Hi, On Sat, Jun 21, 2014 at 10:12:48PM +0200, Jakub Zawadzki wrote: If we're in topic of optimizing 'slower' [de]allocations in common functions: - tvb allocation/deallocation (2.5%, or 3.4% when no filtering) 243,931,671 * ???:tvb_new [/tmp/wireshark/epan/.libs/libwireshark.so.0.0.0] 202,052,290 ???:g_slice_alloc (2463493x) [/usr/lib64/libglib-2.0.so.0.3600.4] 291,765,126 * ???:tvb_free_chain [/tmp/wireshark/epan/.libs/libwireshark.so.0.0.0] 256,390,635 ???:g_slice_free1 (2435843x) [/usr/lib64/libglib-2.0.so.0.3600.4] This, or next week I'll try to do tvb. ... or maybe this week: ver0 | 18,055,719,820 (---) | Base version 96f0585268f1cc4e820767c4038c10ed4915c12a ver1 | 18,185,185,838 (0.6% slower) | Change tvb allocator g_slice_* to wmem with file scope ver2 | 17,809,433,204 (1.4% faster) | Change tvb allocator g_slice_* to wmem with file/packet scope ver3 | 17,812,128,887 (1.3% faster) | Change tvb allocator g_slice_* to simple object allocator with epan scope ver4 | 17,704,132,561 (2.0% faster) | Change tvb allocator g_slice_* to simple object allocator with file scope I have uploaded patches profiler outputs to: http://www.wireshark.org/~darkjames/tvb-opt-allocator/ Please review, and check what version is OK to be applied. P.S: I'll might find some time to do ver5 with slab allocator, but it'll look like object allocator API with epan scope. Cheers, Jakub. ___ Sent via:Wireshark-dev mailing list wireshark-dev@wireshark.org Archives:http://www.wireshark.org/lists/wireshark-dev Unsubscribe: https://wireshark.org/mailman/options/wireshark-dev mailto:wireshark-dev-requ...@wireshark.org ?subject=unsubscribe ___ Sent via:Wireshark-dev mailing list wireshark-dev@wireshark.org Archives:http://www.wireshark.org/lists/wireshark-dev Unsubscribe: https://wireshark.org/mailman/options/wireshark-dev mailto:wireshark-dev-requ...@wireshark.org?subject=unsubscribe
Re: [Wireshark-dev] tvb allocator (was: Re: [Wireshark-commits] master b6d20a2: Optimize reseting epan_dissect_t when filtering.)
Hi All, Please provide the input data for letting others reproduce the results or perform the performance tests on pcap files already available to the public. I'm not a fan of implementing custom memory management methods because partly because I highly doubt we can beat jemalloc easily on performance and custom allocation methods can also have nasty bugs like the one observed in OpenSSL: http://www.tedunangst.com/flak/post/analysis-of-openssl-freelist-reuse I wrote a short post about making all programs in Debian resistant to malloc() related attacks using ASAN and wmem in its current form prevents implementing the protection: http://balintreczey.hu/blog/proposing-amd64-hardened-architecture-for-debian/ Please don't sacrifice protection for 2% speedup. Please keep wmem usage for cases where it is used for garbage collecting (free() after end of frame/capture file) not when the allocation and deallocation are already done properly. Thanks, Balint 2014-07-11 8:58 GMT+02:00 Jakub Zawadzki darkjames...@darkjames.pl: Hi, On Sat, Jun 21, 2014 at 10:12:48PM +0200, Jakub Zawadzki wrote: If we're in topic of optimizing 'slower' [de]allocations in common functions: - tvb allocation/deallocation (2.5%, or 3.4% when no filtering) 243,931,671 * ???:tvb_new [/tmp/wireshark/epan/.libs/libwireshark.so.0.0.0] 202,052,290 ???:g_slice_alloc (2463493x) [/usr/lib64/libglib-2.0.so.0.3600.4] 291,765,126 * ???:tvb_free_chain [/tmp/wireshark/epan/.libs/libwireshark.so.0.0.0] 256,390,635 ???:g_slice_free1 (2435843x) [/usr/lib64/libglib-2.0.so.0.3600.4] This, or next week I'll try to do tvb. ... or maybe this week: ver0 | 18,055,719,820 (---) | Base version 96f0585268f1cc4e820767c4038c10ed4915c12a ver1 | 18,185,185,838 (0.6% slower) | Change tvb allocator g_slice_* to wmem with file scope ver2 | 17,809,433,204 (1.4% faster) | Change tvb allocator g_slice_* to wmem with file/packet scope ver3 | 17,812,128,887 (1.3% faster) | Change tvb allocator g_slice_* to simple object allocator with epan scope ver4 | 17,704,132,561 (2.0% faster) | Change tvb allocator g_slice_* to simple object allocator with file scope I have uploaded patches profiler outputs to: http://www.wireshark.org/~darkjames/tvb-opt-allocator/ Please review, and check what version is OK to be applied. P.S: I'll might find some time to do ver5 with slab allocator, but it'll look like object allocator API with epan scope. Cheers, Jakub. ___ Sent via:Wireshark-dev mailing list wireshark-dev@wireshark.org Archives:http://www.wireshark.org/lists/wireshark-dev Unsubscribe: https://wireshark.org/mailman/options/wireshark-dev mailto:wireshark-dev-requ...@wireshark.org?subject=unsubscribe ___ Sent via:Wireshark-dev mailing list wireshark-dev@wireshark.org Archives:http://www.wireshark.org/lists/wireshark-dev Unsubscribe: https://wireshark.org/mailman/options/wireshark-dev mailto:wireshark-dev-requ...@wireshark.org?subject=unsubscribe
Re: [Wireshark-dev] tvb allocator (was: Re: [Wireshark-commits] master b6d20a2: Optimize reseting epan_dissect_t when filtering.)
On Fri, Jul 11, 2014 at 5:12 PM, Bálint Réczey bal...@balintreczey.hu wrote: Hi All, Please provide the input data for letting others reproduce the results or perform the performance tests on pcap files already available to the public. I'm not a fan of implementing custom memory management methods because partly because I highly doubt we can beat jemalloc easily on performance The only place we reliably beat jemalloc (or even glib) is when we have a large number of allocations that live together, and can be freed with free_all. Anything else is basically noise. As Jakub's test noted, the main block allocator is actually slightly slower than g_slice_* if the frees are done manually. and custom allocation methods can also have nasty bugs like the one observed in OpenSSL: http://www.tedunangst.com/flak/post/analysis-of-openssl-freelist-reuse I wrote a short post about making all programs in Debian resistant to malloc() related attacks using ASAN and wmem in its current form prevents implementing the protection: http://balintreczey.hu/blog/proposing-amd64-hardened-architecture-for-debian/ It's not clear to me from reading the blog post or the mail to debian what the actual protections would be, or why wmem would prevent them from working. Could you clarify please? Glib has its own allocator logic internally for g_slice_*, does this also cause problems? Please don't sacrifice protection for 2% speedup. Please keep wmem usage for cases where it is used for garbage collecting (free() after end of frame/capture file) not when the allocation and deallocation are already done properly. Thanks, Balint 2014-07-11 8:58 GMT+02:00 Jakub Zawadzki darkjames...@darkjames.pl: Hi, On Sat, Jun 21, 2014 at 10:12:48PM +0200, Jakub Zawadzki wrote: If we're in topic of optimizing 'slower' [de]allocations in common functions: - tvb allocation/deallocation (2.5%, or 3.4% when no filtering) 243,931,671 * ???:tvb_new [/tmp/wireshark/epan/.libs/libwireshark.so.0.0.0] 202,052,290 ???:g_slice_alloc (2463493x) [/usr/lib64/libglib-2.0.so.0.3600.4] 291,765,126 * ???:tvb_free_chain [/tmp/wireshark/epan/.libs/libwireshark.so.0.0.0] 256,390,635 ???:g_slice_free1 (2435843x) [/usr/lib64/libglib-2.0.so.0.3600.4] This, or next week I'll try to do tvb. ... or maybe this week: ver0 | 18,055,719,820 (---) | Base version 96f0585268f1cc4e820767c4038c10ed4915c12a ver1 | 18,185,185,838 (0.6% slower) | Change tvb allocator g_slice_* to wmem with file scope ver2 | 17,809,433,204 (1.4% faster) | Change tvb allocator g_slice_* to wmem with file/packet scope ver3 | 17,812,128,887 (1.3% faster) | Change tvb allocator g_slice_* to simple object allocator with epan scope ver4 | 17,704,132,561 (2.0% faster) | Change tvb allocator g_slice_* to simple object allocator with file scope I have uploaded patches profiler outputs to: http://www.wireshark.org/~darkjames/tvb-opt-allocator/ Please review, and check what version is OK to be applied. P.S: I'll might find some time to do ver5 with slab allocator, but it'll look like object allocator API with epan scope. Cheers, Jakub. ___ Sent via:Wireshark-dev mailing list wireshark-dev@wireshark.org Archives:http://www.wireshark.org/lists/wireshark-dev Unsubscribe: https://wireshark.org/mailman/options/wireshark-dev mailto:wireshark-dev-requ...@wireshark.org ?subject=unsubscribe ___ Sent via:Wireshark-dev mailing list wireshark-dev@wireshark.org Archives:http://www.wireshark.org/lists/wireshark-dev Unsubscribe: https://wireshark.org/mailman/options/wireshark-dev mailto:wireshark-dev-requ...@wireshark.org ?subject=unsubscribe ___ Sent via:Wireshark-dev mailing list wireshark-dev@wireshark.org Archives:http://www.wireshark.org/lists/wireshark-dev Unsubscribe: https://wireshark.org/mailman/options/wireshark-dev mailto:wireshark-dev-requ...@wireshark.org?subject=unsubscribe
Re: [Wireshark-dev] tvb allocator (was: Re: [Wireshark-commits] master b6d20a2: Optimize reseting epan_dissect_t when filtering.)
Den 11 jul 2014 23:13 skrev Bálint Réczey bal...@balintreczey.hu: Hi All, Please provide the input data for letting others reproduce the results or perform the performance tests on pcap files already available to the public Ok I'll see if we can use something from the wiki instead. I'm not a fan of implementing custom memory management methods because partly because I highly doubt we can beat jemalloc easily on performance and custom allocation methods can also have nasty bugs like the one observed in OpenSSL: http://www.tedunangst.com/flak/post/analysis-of-openssl-freelist-reuse We have gone through a set of memory allocation schemes already to try to improve performance (g_slice, emen and now wmem) are you saying that you are opposed to that? As wmem seems to be the faster scheme for packet scope memory allocation /free, why not use it in all possible places where the scope is packet? Please don't sacrifice protection for 2% speedup. Please keep wmem usage for cases where it is used for garbage collecting (free() after end of frame/capture file) not when the allocation and deallocation are already done properly. ? A slow scheme might be working well but that in it self does not warrant to not replace it with a faster one. With this reasoning we shouldn't have introduced ep memory in the first place. What percentage if improvement do you think makes a change worthwhile? The set of improvements Jacub and I have done lately has given a reduction of 40-50 percent compared to 1.10 measuring with the sample file. The problem is that each improvement only yeald a percent or 2. Agreed that we shouldn't complicate the code for a small speed gain. In your blog you say that people would accept double the execution time with increased security - I'm not so sure. If say the reformatting of a video takes one hour instead of 30 minutes. Just my 2 cents Anders Thanks, Balint 2014-07-11 8:58 GMT+02:00 Jakub Zawadzki darkjames...@darkjames.pl: Hi, On Sat, Jun 21, 2014 at 10:12:48PM +0200, Jakub Zawadzki wrote: If we're in topic of optimizing 'slower' [de]allocations in common functions: - tvb allocation/deallocation (2.5%, or 3.4% when no filtering) 243,931,671 * ???:tvb_new [/tmp/wireshark/epan/.libs/libwireshark.so.0.0.0] 202,052,290 ???:g_slice_alloc (2463493x) [/usr/lib64/libglib-2.0.so.0.3600.4] 291,765,126 * ???:tvb_free_chain [/tmp/wireshark/epan/.libs/libwireshark.so.0.0.0] 256,390,635 ???:g_slice_free1 (2435843x) [/usr/lib64/libglib-2.0.so.0.3600.4] This, or next week I'll try to do tvb. ... or maybe this week: ver0 | 18,055,719,820 (---) | Base version 96f0585268f1cc4e820767c4038c10ed4915c12a ver1 | 18,185,185,838 (0.6% slower) | Change tvb allocator g_slice_* to wmem with file scope ver2 | 17,809,433,204 (1.4% faster) | Change tvb allocator g_slice_* to wmem with file/packet scope ver3 | 17,812,128,887 (1.3% faster) | Change tvb allocator g_slice_* to simple object allocator with epan scope ver4 | 17,704,132,561 (2.0% faster) | Change tvb allocator g_slice_* to simple object allocator with file scope I have uploaded patches profiler outputs to: http://www.wireshark.org/~darkjames/tvb-opt-allocator/ Please review, and check what version is OK to be applied. P.S: I'll might find some time to do ver5 with slab allocator, but it'll look like object allocator API with epan scope. Cheers, Jakub. ___ Sent via:Wireshark-dev mailing list wireshark-dev@wireshark.org Archives:http://www.wireshark.org/lists/wireshark-dev Unsubscribe: https://wireshark.org/mailman/options/wireshark-dev mailto:wireshark-dev-requ...@wireshark.org ?subject=unsubscribe ___ Sent via:Wireshark-dev mailing list wireshark-dev@wireshark.org Archives:http://www.wireshark.org/lists/wireshark-dev Unsubscribe: https://wireshark.org/mailman/options/wireshark-dev mailto:wireshark-dev-requ...@wireshark.org ?subject=unsubscribe ___ Sent via:Wireshark-dev mailing list wireshark-dev@wireshark.org Archives:http://www.wireshark.org/lists/wireshark-dev Unsubscribe: https://wireshark.org/mailman/options/wireshark-dev mailto:wireshark-dev-requ...@wireshark.org?subject=unsubscribe
Re: [Wireshark-dev] tvb allocator (was: Re: [Wireshark-commits] master b6d20a2: Optimize reseting epan_dissect_t when filtering.)
On Fri, Jul 11, 2014 at 4:08 PM, Evan Huus eapa...@gmail.com wrote: The biggest win, I think, would be if we can avoid calling free_chain at all because tvbs are always allocated in the right scope and so get freed automatically. I think this would involve touching every place that creates new tvbs backed with glib memory though... I will try and think about this and review your patches sometime this weekend. Balint's concerns temporarily aside (this may end up being moot if we decide not to use wmem here at all, but I wanted to discuss the architecture anyways), my current understanding is the following: 1. Most current uses of TVBs fall naturally into one of the existing wmem scopes. The main tvb (edt-tvb) has the same effective scope as pinfo-pool. Any TVBs generated by subsets, decryptions, decompressions, etc. likewise have pinfo scope. Reassembled TVBs have file scope. 2. The exception to the above is the intermediate TVBs used by reassembly when not all fragments have been received. It isn't clear to me that these have a defined scope at all. 3. TVB chaining was a convenient way to track all the various subsets etc. created during dissection, so we can simply free the parent and all the others get cleaned up as well. Architecturally, the approach that seems simplest to me (and is I believe likely to be most efficient) would be to get rid of tvb chaining entirely. Allocate tvbuffs (and their backing data, if appropriate) in the correct wmem scope, and let wmem clean them up at the appropriate point. For intermediate reassembly buffers, use scope NULL and manually free them the way we are doing now. This approach avoids keeping any additional free-lists. It greatly simplifies the tvbuff code and API by removing all need for chaining and the various acrobatics that accompany it. It ends up using pinfo-scope for the vast majority of tvbs in the normal path, which Jakub's ver2 benchmark showed to be a measurable (if small) speed-up. It lets (most) TVBs be collected by wmem's free_all, which I expect will be another measurable (if small) speed-up over manually freeing each one. Thoughts? On Fri, Jul 11, 2014 at 2:58 AM, Jakub Zawadzki darkjames...@darkjames.pl wrote: Hi, On Sat, Jun 21, 2014 at 10:12:48PM +0200, Jakub Zawadzki wrote: If we're in topic of optimizing 'slower' [de]allocations in common functions: - tvb allocation/deallocation (2.5%, or 3.4% when no filtering) 243,931,671 * ???:tvb_new [/tmp/wireshark/epan/.libs/libwireshark.so.0.0.0] 202,052,290 ???:g_slice_alloc (2463493x) [/usr/lib64/libglib-2.0.so.0.3600.4] 291,765,126 * ???:tvb_free_chain [/tmp/wireshark/epan/.libs/libwireshark.so.0.0.0] 256,390,635 ???:g_slice_free1 (2435843x) [/usr/lib64/libglib-2.0.so.0.3600.4] This, or next week I'll try to do tvb. ... or maybe this week: ver0 | 18,055,719,820 (---) | Base version 96f0585268f1cc4e820767c4038c10ed4915c12a ver1 | 18,185,185,838 (0.6% slower) | Change tvb allocator g_slice_* to wmem with file scope ver2 | 17,809,433,204 (1.4% faster) | Change tvb allocator g_slice_* to wmem with file/packet scope ver3 | 17,812,128,887 (1.3% faster) | Change tvb allocator g_slice_* to simple object allocator with epan scope ver4 | 17,704,132,561 (2.0% faster) | Change tvb allocator g_slice_* to simple object allocator with file scope I have uploaded patches profiler outputs to: http://www.wireshark.org/~darkjames/tvb-opt-allocator/ Please review, and check what version is OK to be applied. P.S: I'll might find some time to do ver5 with slab allocator, but it'll look like object allocator API with epan scope. Cheers, Jakub. ___ Sent via:Wireshark-dev mailing list wireshark-dev@wireshark.org Archives:http://www.wireshark.org/lists/wireshark-dev Unsubscribe: https://wireshark.org/mailman/options/wireshark-dev mailto:wireshark-dev-requ...@wireshark.org ?subject=unsubscribe ___ Sent via:Wireshark-dev mailing list wireshark-dev@wireshark.org Archives:http://www.wireshark.org/lists/wireshark-dev Unsubscribe: https://wireshark.org/mailman/options/wireshark-dev mailto:wireshark-dev-requ...@wireshark.org?subject=unsubscribe
Re: [Wireshark-dev] tvb allocator (was: Re: [Wireshark-commits] master b6d20a2: Optimize reseting epan_dissect_t when filtering.)
On Fri, Jul 11, 2014 at 6:07 PM, Anders Broman a.broma...@gmail.com wrote: Den 11 jul 2014 23:13 skrev Bálint Réczey bal...@balintreczey.hu: Hi All, Please provide the input data for letting others reproduce the results or perform the performance tests on pcap files already available to the public Ok I'll see if we can use something from the wiki instead. I'm not a fan of implementing custom memory management methods because partly because I highly doubt we can beat jemalloc easily on performance and custom allocation methods can also have nasty bugs like the one observed in OpenSSL: http://www.tedunangst.com/flak/post/analysis-of-openssl-freelist-reuse We have gone through a set of memory allocation schemes already to try to improve performance (g_slice, emen and now wmem) Technically, emem was originally added to fix memory leaks caused by exceptions. Wmem (as just a better-architected replacement for emem) is similar. The block allocator was added later as an optimization because it provided a *significant* improvement in performance over a simple linked-list of OS-alloced blocks (5-10x faster in my benchmarks). are you saying that you are opposed to that? As wmem seems to be the faster scheme for packet scope memory allocation /free, why not use it in all possible places where the scope is packet? Please don't sacrifice protection for 2% speedup. Please keep wmem usage for cases where it is used for garbage collecting (free() after end of frame/capture file) not when the allocation and deallocation are already done properly. ? A slow scheme might be working well but that in it self does not warrant to not replace it with a faster one. With this reasoning we shouldn't have introduced ep memory in the first place. What percentage if improvement do you think makes a change worthwhile? The set of improvements Jacub and I have done lately has given a reduction of 40-50 percent compared to 1.10 measuring with the sample file. The problem is that each improvement only yeald a percent or 2. Agreed that we shouldn't complicate the code for a small speed gain. In your blog you say that people would accept double the execution time with increased security - I'm not so sure. If say the reformatting of a video takes one hour instead of 30 minutes. Just my 2 cents Anders Thanks, Balint 2014-07-11 8:58 GMT+02:00 Jakub Zawadzki darkjames...@darkjames.pl: Hi, On Sat, Jun 21, 2014 at 10:12:48PM +0200, Jakub Zawadzki wrote: If we're in topic of optimizing 'slower' [de]allocations in common functions: - tvb allocation/deallocation (2.5%, or 3.4% when no filtering) 243,931,671 * ???:tvb_new [/tmp/wireshark/epan/.libs/libwireshark.so.0.0.0] 202,052,290 ???:g_slice_alloc (2463493x) [/usr/lib64/libglib-2.0.so.0.3600.4] 291,765,126 * ???:tvb_free_chain [/tmp/wireshark/epan/.libs/libwireshark.so.0.0.0] 256,390,635 ???:g_slice_free1 (2435843x) [/usr/lib64/libglib-2.0.so.0.3600.4] This, or next week I'll try to do tvb. ... or maybe this week: ver0 | 18,055,719,820 (---) | Base version 96f0585268f1cc4e820767c4038c10ed4915c12a ver1 | 18,185,185,838 (0.6% slower) | Change tvb allocator g_slice_* to wmem with file scope ver2 | 17,809,433,204 (1.4% faster) | Change tvb allocator g_slice_* to wmem with file/packet scope ver3 | 17,812,128,887 (1.3% faster) | Change tvb allocator g_slice_* to simple object allocator with epan scope ver4 | 17,704,132,561 (2.0% faster) | Change tvb allocator g_slice_* to simple object allocator with file scope I have uploaded patches profiler outputs to: http://www.wireshark.org/~darkjames/tvb-opt-allocator/ Please review, and check what version is OK to be applied. P.S: I'll might find some time to do ver5 with slab allocator, but it'll look like object allocator API with epan scope. Cheers, Jakub. ___ Sent via:Wireshark-dev mailing list wireshark-dev@wireshark.org Archives:http://www.wireshark.org/lists/wireshark-dev Unsubscribe: https://wireshark.org/mailman/options/wireshark-dev mailto:wireshark-dev-requ...@wireshark.org ?subject=unsubscribe ___ Sent via:Wireshark-dev mailing list wireshark-dev@wireshark.org Archives:http://www.wireshark.org/lists/wireshark-dev Unsubscribe: https://wireshark.org/mailman/options/wireshark-dev mailto:wireshark-dev-requ...@wireshark.org ?subject=unsubscribe ___ Sent via:Wireshark-dev mailing list wireshark-dev@wireshark.org Archives:http://www.wireshark.org/lists/wireshark-dev Unsubscribe: https://wireshark.org/mailman/options/wireshark-dev
Re: [Wireshark-dev] tvb allocator (was: Re: [Wireshark-commits] master b6d20a2: Optimize reseting epan_dissect_t when filtering.)
Hi Evan, 2014-07-11 23:51 GMT+02:00 Evan Huus eapa...@gmail.com: On Fri, Jul 11, 2014 at 5:12 PM, Bálint Réczey bal...@balintreczey.hu wrote: Hi All, Please provide the input data for letting others reproduce the results or perform the performance tests on pcap files already available to the public. I'm not a fan of implementing custom memory management methods because partly because I highly doubt we can beat jemalloc easily on performance The only place we reliably beat jemalloc (or even glib) is when we have a large number of allocations that live together, and can be freed with free_all. Anything else is basically noise. As Jakub's test noted, the main block allocator is actually slightly slower than g_slice_* if the frees are done manually. and custom allocation methods can also have nasty bugs like the one observed in OpenSSL: http://www.tedunangst.com/flak/post/analysis-of-openssl-freelist-reuse I wrote a short post about making all programs in Debian resistant to malloc() related attacks using ASAN and wmem in its current form prevents implementing the protection: http://balintreczey.hu/blog/proposing-amd64-hardened-architecture-for-debian/ It's not clear to me from reading the blog post or the mail to debian what the actual protections would be, or why wmem would prevent them from working. Could you clarify please? Glib has its own allocator logic internally for g_slice_*, does this also cause problems? I plan using ASAN for all programs which would catch (among others) use-after-free and reading below or over the malloc()-ed memory area. Those can't be caught if the program uses another layer of bulk memory allocations. g_malloc() and g_slice_* has the same problem, but they can be overrideb by passing G_SLICE=always-malloc . I know wmem has simple allocator, but wmem_free() is really inefficient and as a fix I would like to propose removing wmem_free() from the API. IMO Wireshark needs wmem_alloc() for allocating and freeing memory needed during whole scopes, but should not offer wmem_free() and wmem_realloc() to let us able to provide efficient per-scope allocations. Dissectors which should simply do g_malloc()/g_free() for allocations when they know when they need to free memory and using wmem_free() also does not keep the promise of having a wmem_alloc()-ated object available during the whole scope. Wmem also have a lot of data structures reimplemented using wmem-backed memory, but I think it would be way easier to use GLists registered to be g_list_free()-d using wmem_register_callback() than using wmem_list_* functions for manipulating per-scope allocated lists for example. Cheers, Balint Please don't sacrifice protection for 2% speedup. Please keep wmem usage for cases where it is used for garbage collecting (free() after end of frame/capture file) not when the allocation and deallocation are already done properly. Thanks, Balint 2014-07-11 8:58 GMT+02:00 Jakub Zawadzki darkjames...@darkjames.pl: Hi, On Sat, Jun 21, 2014 at 10:12:48PM +0200, Jakub Zawadzki wrote: If we're in topic of optimizing 'slower' [de]allocations in common functions: - tvb allocation/deallocation (2.5%, or 3.4% when no filtering) 243,931,671 * ???:tvb_new [/tmp/wireshark/epan/.libs/libwireshark.so.0.0.0] 202,052,290 ???:g_slice_alloc (2463493x) [/usr/lib64/libglib-2.0.so.0.3600.4] 291,765,126 * ???:tvb_free_chain [/tmp/wireshark/epan/.libs/libwireshark.so.0.0.0] 256,390,635 ???:g_slice_free1 (2435843x) [/usr/lib64/libglib-2.0.so.0.3600.4] This, or next week I'll try to do tvb. ... or maybe this week: ver0 | 18,055,719,820 (---) | Base version 96f0585268f1cc4e820767c4038c10ed4915c12a ver1 | 18,185,185,838 (0.6% slower) | Change tvb allocator g_slice_* to wmem with file scope ver2 | 17,809,433,204 (1.4% faster) | Change tvb allocator g_slice_* to wmem with file/packet scope ver3 | 17,812,128,887 (1.3% faster) | Change tvb allocator g_slice_* to simple object allocator with epan scope ver4 | 17,704,132,561 (2.0% faster) | Change tvb allocator g_slice_* to simple object allocator with file scope I have uploaded patches profiler outputs to: http://www.wireshark.org/~darkjames/tvb-opt-allocator/ Please review, and check what version is OK to be applied. P.S: I'll might find some time to do ver5 with slab allocator, but it'll look like object allocator API with epan scope. Cheers, Jakub. ___ Sent via:Wireshark-dev mailing list wireshark-dev@wireshark.org Archives:http://www.wireshark.org/lists/wireshark-dev Unsubscribe: https://wireshark.org/mailman/options/wireshark-dev mailto:wireshark-dev-requ...@wireshark.org?subject=unsubscribe ___ Sent via:Wireshark-dev mailing list
Re: [Wireshark-dev] tvb allocator (was: Re: [Wireshark-commits] master b6d20a2: Optimize reseting epan_dissect_t when filtering.)
2014-07-12 0:07 GMT+02:00 Anders Broman a.broma...@gmail.com: Den 11 jul 2014 23:13 skrev Bálint Réczey bal...@balintreczey.hu: Hi All, Please provide the input data for letting others reproduce the results or perform the performance tests on pcap files already available to the public Ok I'll see if we can use something from the wiki instead. I'm not a fan of implementing custom memory management methods because partly because I highly doubt we can beat jemalloc easily on performance and custom allocation methods can also have nasty bugs like the one observed in OpenSSL: http://www.tedunangst.com/flak/post/analysis-of-openssl-freelist-reuse We have gone through a set of memory allocation schemes already to try to improve performance (g_slice, emen and now wmem) are you saying that you are opposed to that? No, IMO there is need for wmem for being able to keep memory allocated during scopes, but I would prefer seeing it only tracking and one-by-one freeing the memory areas instead of doing bulk alloc()-s for optimizing for speed. I tried to detail this in my answer to Evan's email. As wmem seems to be the faster scheme for packet scope memory allocation /free, why not use it in all possible places where the scope is packet? Please don't sacrifice protection for 2% speedup. Please keep wmem usage for cases where it is used for garbage collecting (free() after end of frame/capture file) not when the allocation and deallocation are already done properly. ? A slow scheme might be working well but that in it self does not warrant to not replace it with a faster one. With this reasoning we shouldn't have introduced ep memory in the first place. What percentage if improvement do you think makes a change worthwhile? The set of improvements Jacub and I have done lately has given a reduction of 40-50 percent compared to 1.10 measuring with the sample file. The problem is that each improvement only yeald a percent or 2. Agreed that we shouldn't complicate the code for a small speed gain. 40-50% reduction is great and congratulations for such a speedup! I hope memory allocation handling is responsible for only small fraction of it because I would like to keep the possibility of detecting memory allocation related errors and I would prefer using tools implemented outside of Wireshark. For example I would avoid bulk allocations to make us able to use ASAN and Valgrind, even if we would implement our canaries. In your blog you say that people would accept double the execution time with increased security - I'm not so sure. If say the reformatting of a video takes one hour instead of 30 minutes. In Debian you would be able to pick a slow and secure video player for streaming from the untrusted Internet and a fast and less secure video encoder. I expect people to install Wireshark from the hardened-amd64 repository if they want to monitor a hostile network, while others pick the fast Wireshark for using it in their safe labs. So it depends and there will be good options to choose from. Cheers, Balint Just my 2 cents Anders Thanks, Balint 2014-07-11 8:58 GMT+02:00 Jakub Zawadzki darkjames...@darkjames.pl: Hi, On Sat, Jun 21, 2014 at 10:12:48PM +0200, Jakub Zawadzki wrote: If we're in topic of optimizing 'slower' [de]allocations in common functions: - tvb allocation/deallocation (2.5%, or 3.4% when no filtering) 243,931,671 * ???:tvb_new [/tmp/wireshark/epan/.libs/libwireshark.so.0.0.0] 202,052,290 ???:g_slice_alloc (2463493x) [/usr/lib64/libglib-2.0.so.0.3600.4] 291,765,126 * ???:tvb_free_chain [/tmp/wireshark/epan/.libs/libwireshark.so.0.0.0] 256,390,635 ???:g_slice_free1 (2435843x) [/usr/lib64/libglib-2.0.so.0.3600.4] This, or next week I'll try to do tvb. ... or maybe this week: ver0 | 18,055,719,820 (---) | Base version 96f0585268f1cc4e820767c4038c10ed4915c12a ver1 | 18,185,185,838 (0.6% slower) | Change tvb allocator g_slice_* to wmem with file scope ver2 | 17,809,433,204 (1.4% faster) | Change tvb allocator g_slice_* to wmem with file/packet scope ver3 | 17,812,128,887 (1.3% faster) | Change tvb allocator g_slice_* to simple object allocator with epan scope ver4 | 17,704,132,561 (2.0% faster) | Change tvb allocator g_slice_* to simple object allocator with file scope I have uploaded patches profiler outputs to: http://www.wireshark.org/~darkjames/tvb-opt-allocator/ Please review, and check what version is OK to be applied. P.S: I'll might find some time to do ver5 with slab allocator, but it'll look like object allocator API with epan scope. Cheers, Jakub. ___ Sent via:Wireshark-dev mailing list wireshark-dev@wireshark.org Archives:http://www.wireshark.org/lists/wireshark-dev Unsubscribe: https://wireshark.org/mailman/options/wireshark-dev
Re: [Wireshark-dev] tvb allocator (was: Re: [Wireshark-commits] master b6d20a2: Optimize reseting epan_dissect_t when filtering.)
On Fri, Jul 11, 2014 at 9:08 PM, Evan Huus eapa...@gmail.com wrote: (sorry Balint for the double-post, I don't know why my reply button dropped the mailing list) -- Forwarded message -- From: Evan Huus eapa...@gmail.com Date: Fri, Jul 11, 2014 at 9:05 PM Subject: Re: [Wireshark-dev] tvb allocator (was: Re: [Wireshark-commits] master b6d20a2: Optimize reseting epan_dissect_t when filtering.) To: Bálint Reczey bal...@balintreczey.hu On Fri, Jul 11, 2014 at 8:27 PM, Bálint Réczey bal...@balintreczey.hu wrote: Hi Evan, 2014-07-11 23:51 GMT+02:00 Evan Huus eapa...@gmail.com: On Fri, Jul 11, 2014 at 5:12 PM, Bálint Réczey bal...@balintreczey.hu wrote: Hi All, Please provide the input data for letting others reproduce the results or perform the performance tests on pcap files already available to the public. I'm not a fan of implementing custom memory management methods because partly because I highly doubt we can beat jemalloc easily on performance The only place we reliably beat jemalloc (or even glib) is when we have a large number of allocations that live together, and can be freed with free_all. Anything else is basically noise. As Jakub's test noted, the main block allocator is actually slightly slower than g_slice_* if the frees are done manually. and custom allocation methods can also have nasty bugs like the one observed in OpenSSL: http://www.tedunangst.com/flak/post/analysis-of-openssl-freelist-reuse I wrote a short post about making all programs in Debian resistant to malloc() related attacks using ASAN and wmem in its current form prevents implementing the protection: http://balintreczey.hu/blog/proposing-amd64-hardened-architecture-for-debian/ It's not clear to me from reading the blog post or the mail to debian what the actual protections would be, or why wmem would prevent them from working. Could you clarify please? Glib has its own allocator logic internally for g_slice_*, does this also cause problems? I plan using ASAN for all programs which would catch (among others) use-after-free and reading below or over the malloc()-ed memory area. Those can't be caught if the program uses another layer of bulk memory allocations. g_malloc() and g_slice_* has the same problem, but they can be overrideb by passing G_SLICE=always-malloc . I know wmem has simple allocator Which can be foreced everywhere by passing WIRESHARK_DEBUG_WMEM_OVERRIDE=simple like we do in valgrind-wireshark.sh. I don't see how this is different from using G_SLICE=always-malloc? but wmem_free() is really inefficient I grant that wmem_simple_free has a terrible worst-case behaviour, but its practical efficiency is actually the best of anything I've tried. The only alternative which doesn't involve adding a prefix to the memory block (which would break ASAN/valgrind/etc) is a hash table, which is actually substantially slower. (we had part of this discussion already once when I first introduced that change: https://code.wireshark.org/review/1602) and as a fix I would like to propose removing wmem_free() from the API. IMO Wireshark needs wmem_alloc() for allocating and freeing memory needed during whole scopes, but should not offer wmem_free() and wmem_realloc() to let us able to provide efficient per-scope allocations. Hmm. We need free and realloc to efficiently implement certain data structures, and I see you've addressed that below, so I will include my answer there. Dissectors which should simply do g_malloc()/g_free() for allocations when they know when they need to free memory (which they can't do safely if they also call proto_tree functions inbetween) and using wmem_free() also does not keep the promise of having a wmem_alloc()-ated object available during the whole scope. I don't think that promise was ever made, really. The promise is that such memory won't be available after the scope, not that it will be available right until the end. Wmem also have a lot of data structures reimplemented using wmem-backed memory, but I think it would be way easier to use GLists registered to be g_list_free()-d using wmem_register_callback() than using wmem_list_* functions for manipulating per-scope allocated lists for example. This is a follow-up to the performance concern. If we use callbacks to free everything, we lose the *substantial* speed-up we get from the free_all operation. I grant that we should avoid optimizations unless well-justified, but I do believe this is well-justified given the benefit. Taking it to the extreme, wmem could be made *extraordinarily* simple if it were nothing more than a linked list of callbacks (some to g_free, some to g_list_free, etc) and we used glib memory for everything. It would just be very, very, very slow. The simplicity has an obvious appeal, but I can't justify a performance hit of that magnitude. Cheers