Re: [Wireshark-dev] tvb allocator (was: Re: [Wireshark-commits] master b6d20a2: Optimize reseting epan_dissect_t when filtering.)

2014-07-24 Thread Bálint Réczey
Hi Jakub,

2014-07-22 0:52 GMT+02:00  darkjames...@darkjames.pl:
 Hi,

 On Sat, Jul 12, 2014 at 02:27:06AM +0200, B??lint R??czey wrote:
 I plan using ASAN for all programs which would catch (among others)
 use-after-free and reading below or over the malloc()-ed
 memory area. Those can't be caught if the program uses another layer
 of bulk memory allocations.
 g_malloc() and g_slice_* has the same problem, but they can be
 overrideb by passing G_SLICE=always-malloc .

 We have these environment variables also, ver{3, 4} introduce 
 WIRESHARK_DEBUG_WMEM_OBJECT_POOL_SKIP,
 where you can set, and object pool will not maintain free list of objects.

 I see no problem with enabling it by default when building with ASAN.

 Also it should be quite easy to use ASAN manual poisoning[1] with object pool 
 API.

 wmem_object_pool_alloc(wmem_allocator_t *allocator, wmem_object_pool_t *pool):
 if (pool-current_count  0) {
 ...
 ASAN_UNPOISON_MEMORY_REGION(pool-free_list, pool-object_size);
 pool-free_list = pool-free_list-next;

 wmem_object_pool_free(wmem_allocator_t *allocator, wmem_object_pool_t *pool, 
 void *ptr)
 ...
 ASAN_POISON_MEMORY_REGION(ptr, pool-object_size);

 (not tested).
Thank you, I was not aware of those macros. I think we should add
those to the custom allocator code.

Cheers,
Balint
___
Sent via:Wireshark-dev mailing list wireshark-dev@wireshark.org
Archives:http://www.wireshark.org/lists/wireshark-dev
Unsubscribe: https://wireshark.org/mailman/options/wireshark-dev
 mailto:wireshark-dev-requ...@wireshark.org?subject=unsubscribe


Re: [Wireshark-dev] tvb allocator (was: Re: [Wireshark-commits] master b6d20a2: Optimize reseting epan_dissect_t when filtering.)

2014-07-24 Thread Evan Huus
On Thu, Jul 24, 2014 at 2:42 PM, Bálint Réczey bal...@balintreczey.hu
wrote:

 Hi Jakub,

 2014-07-22 0:52 GMT+02:00  darkjames...@darkjames.pl:
  Hi,
 
  On Sat, Jul 12, 2014 at 02:27:06AM +0200, B??lint R??czey wrote:
  I plan using ASAN for all programs which would catch (among others)
  use-after-free and reading below or over the malloc()-ed
  memory area. Those can't be caught if the program uses another layer
  of bulk memory allocations.
  g_malloc() and g_slice_* has the same problem, but they can be
  overrideb by passing G_SLICE=always-malloc .
 
  We have these environment variables also, ver{3, 4} introduce
 WIRESHARK_DEBUG_WMEM_OBJECT_POOL_SKIP,
  where you can set, and object pool will not maintain free list of
 objects.
 
  I see no problem with enabling it by default when building with ASAN.
 
  Also it should be quite easy to use ASAN manual poisoning[1] with object
 pool API.
 
  wmem_object_pool_alloc(wmem_allocator_t *allocator, wmem_object_pool_t
 *pool):
  if (pool-current_count  0) {
  ...
  ASAN_UNPOISON_MEMORY_REGION(pool-free_list, pool-object_size);
  pool-free_list = pool-free_list-next;
 
  wmem_object_pool_free(wmem_allocator_t *allocator, wmem_object_pool_t
 *pool, void *ptr)
  ...
  ASAN_POISON_MEMORY_REGION(ptr, pool-object_size);
 
  (not tested).
 Thank you, I was not aware of those macros. I think we should add
 those to the custom allocator code.


Just setting WIRESHARK_DEBUG_WMEM_OVERRIDE=simple is easier, is it not?


 Cheers,
 Balint
 ___
 Sent via:Wireshark-dev mailing list wireshark-dev@wireshark.org
 Archives:http://www.wireshark.org/lists/wireshark-dev
 Unsubscribe: https://wireshark.org/mailman/options/wireshark-dev
  mailto:wireshark-dev-requ...@wireshark.org
 ?subject=unsubscribe

___
Sent via:Wireshark-dev mailing list wireshark-dev@wireshark.org
Archives:http://www.wireshark.org/lists/wireshark-dev
Unsubscribe: https://wireshark.org/mailman/options/wireshark-dev
 mailto:wireshark-dev-requ...@wireshark.org?subject=unsubscribe

Re: [Wireshark-dev] tvb allocator (was: Re: [Wireshark-commits] master b6d20a2: Optimize reseting epan_dissect_t when filtering.)

2014-07-21 Thread darkjames-ws
Hi,

On Fri, Jul 11, 2014 at 11:12:37PM +0200, B??lint R??czey wrote:
 Please provide the input data for letting others reproduce the results
 or perform the performance tests on pcap files already available to
 the public.

I have only fake one:
http://www.wireshark.org/~darkjames/tvb-opt-allocator/udp-test/udp-tvb-benchmark.pcap.xz
 (eth:ethertype:vlan:ethertype:ip:udp:sip])

This generally always shows up.
If you have small amount of frames just substract time spend in epan_init()
(as it doesn't matter of number of capture files).

Profiled target:  /tmp/wireshark/.libs/tshark -nr /tmp/udp-tvb-benchmark.pcap 
(...)

===   git revision: 450f4916522b0099364e8978405c048362c9e745 ===

3,241,026,732  PROGRAM TOTALS
  545,660,479 ???:epan_init (1x) 
[/tmp/wireshark/epan/.libs/libwireshark.so.0.0.0]
 
   65,641,038  *  ???:tvb_new [/tmp/wireshark/epan/.libs/libwireshark.so.0.0.0]
   78,702,144  *  ???:tvb_free_chain 
[/tmp/wireshark/epan/.libs/libwireshark.so.0.0.0]
 
  tvb allocation: 144.3 / 3241.0 = 4.4% of total
  tvb allocation: 144.3 / 2695.3 = 5.3% of total - epan_init()

=== git revision: 450f4916522b0099364e8978405c048362c9e745 + ver2 of patch ===

3,174,032,847  PROGRAM TOTALS
  544,803,794  *  ???:epan_init 
[/tmp/wireshark/epan/.libs/libwireshark.so.0.0.0]
 
   43,514,953  *  ???:tvb_new [/tmp/wireshark/epan/.libs/libwireshark.so.0.0.0]
   27,263,712  *  ???:tvb_free_chain 
[/tmp/wireshark/epan/.libs/libwireshark.so.0.0.0]
 
  tvb allocation: 70.7 / 3174.0 = 2.2% of total
  tvb allocation: 70.7 / 2629.2 = 2.7% of total (without initialization)

==

Cheers,
Jakub.
___
Sent via:Wireshark-dev mailing list wireshark-dev@wireshark.org
Archives:http://www.wireshark.org/lists/wireshark-dev
Unsubscribe: https://wireshark.org/mailman/options/wireshark-dev
 mailto:wireshark-dev-requ...@wireshark.org?subject=unsubscribe


Re: [Wireshark-dev] tvb allocator (was: Re: [Wireshark-commits] master b6d20a2: Optimize reseting epan_dissect_t when filtering.)

2014-07-21 Thread darkjames-ws
Hi,

On Sat, Jul 12, 2014 at 02:27:06AM +0200, B??lint R??czey wrote:
 I plan using ASAN for all programs which would catch (among others)
 use-after-free and reading below or over the malloc()-ed
 memory area. Those can't be caught if the program uses another layer
 of bulk memory allocations.
 g_malloc() and g_slice_* has the same problem, but they can be
 overrideb by passing G_SLICE=always-malloc .

We have these environment variables also, ver{3, 4} introduce 
WIRESHARK_DEBUG_WMEM_OBJECT_POOL_SKIP, 
where you can set, and object pool will not maintain free list of objects.

I see no problem with enabling it by default when building with ASAN.

Also it should be quite easy to use ASAN manual poisoning[1] with object pool 
API.

wmem_object_pool_alloc(wmem_allocator_t *allocator, wmem_object_pool_t *pool):
if (pool-current_count  0) {
...
ASAN_UNPOISON_MEMORY_REGION(pool-free_list, pool-object_size);
pool-free_list = pool-free_list-next;

wmem_object_pool_free(wmem_allocator_t *allocator, wmem_object_pool_t *pool, 
void *ptr)
...
ASAN_POISON_MEMORY_REGION(ptr, pool-object_size);

(not tested).

Cheers,
Jakub.

[1] http://code.google.com/p/address-sanitizer/wiki/ManualPoisoning
___
Sent via:Wireshark-dev mailing list wireshark-dev@wireshark.org
Archives:http://www.wireshark.org/lists/wireshark-dev
Unsubscribe: https://wireshark.org/mailman/options/wireshark-dev
 mailto:wireshark-dev-requ...@wireshark.org?subject=unsubscribe


Re: [Wireshark-dev] tvb allocator (was: Re: [Wireshark-commits] master b6d20a2: Optimize reseting epan_dissect_t when filtering.)

2014-07-11 Thread Evan Huus
The biggest win, I think, would be if we can avoid calling free_chain at
all because tvbs are always allocated in the right scope and so get freed
automatically. I think this would involve touching every place that creates
new tvbs backed with glib memory though...

I will try and think about this and review your patches sometime this
weekend.


On Fri, Jul 11, 2014 at 2:58 AM, Jakub Zawadzki darkjames...@darkjames.pl
wrote:

 Hi,

 On Sat, Jun 21, 2014 at 10:12:48PM +0200, Jakub Zawadzki wrote:
  If we're in topic of optimizing 'slower' [de]allocations in common
 functions:
 
  - tvb allocation/deallocation (2.5%, or 3.4% when no filtering)
 
 243,931,671  *  ???:tvb_new
 [/tmp/wireshark/epan/.libs/libwireshark.so.0.0.0]
 202,052,290 ???:g_slice_alloc (2463493x)
 [/usr/lib64/libglib-2.0.so.0.3600.4]
 
 291,765,126  *  ???:tvb_free_chain
 [/tmp/wireshark/epan/.libs/libwireshark.so.0.0.0]
 256,390,635 ???:g_slice_free1 (2435843x)
 [/usr/lib64/libglib-2.0.so.0.3600.4]

  This, or next week I'll try to do tvb.

 ... or maybe this week:

 ver0 | 18,055,719,820 (---) | Base version
 96f0585268f1cc4e820767c4038c10ed4915c12a
 ver1 | 18,185,185,838 (0.6% slower) | Change tvb allocator g_slice_* to
 wmem with file scope
 ver2 | 17,809,433,204 (1.4% faster) | Change tvb allocator g_slice_* to
 wmem with file/packet scope
 ver3 | 17,812,128,887 (1.3% faster) | Change tvb allocator g_slice_* to
 simple object allocator with epan scope
 ver4 | 17,704,132,561 (2.0% faster) | Change tvb allocator g_slice_* to
 simple object allocator with file scope

 I have uploaded patches  profiler outputs to:
 http://www.wireshark.org/~darkjames/tvb-opt-allocator/

 Please review, and check what version is OK to be applied.


 P.S: I'll might find some time to do ver5 with slab allocator, but it'll
 look like object allocator API with epan scope.

 Cheers,
 Jakub.
 ___
 Sent via:Wireshark-dev mailing list wireshark-dev@wireshark.org
 Archives:http://www.wireshark.org/lists/wireshark-dev
 Unsubscribe: https://wireshark.org/mailman/options/wireshark-dev
  mailto:wireshark-dev-requ...@wireshark.org
 ?subject=unsubscribe

___
Sent via:Wireshark-dev mailing list wireshark-dev@wireshark.org
Archives:http://www.wireshark.org/lists/wireshark-dev
Unsubscribe: https://wireshark.org/mailman/options/wireshark-dev
 mailto:wireshark-dev-requ...@wireshark.org?subject=unsubscribe

Re: [Wireshark-dev] tvb allocator (was: Re: [Wireshark-commits] master b6d20a2: Optimize reseting epan_dissect_t when filtering.)

2014-07-11 Thread Bálint Réczey
Hi All,

Please provide the input data for letting others reproduce the results
or perform the performance tests on pcap files already available to
the public.

I'm not a fan of implementing custom memory management methods because
partly because I highly doubt we can beat jemalloc easily on
performance and custom allocation methods can also have nasty bugs
like the one observed in OpenSSL:
http://www.tedunangst.com/flak/post/analysis-of-openssl-freelist-reuse

I wrote a short post about making all programs in Debian resistant to
malloc() related attacks using ASAN and wmem in its current form
prevents implementing the protection:
http://balintreczey.hu/blog/proposing-amd64-hardened-architecture-for-debian/

Please don't sacrifice protection for 2% speedup. Please keep wmem
usage for cases where it is used for garbage collecting (free() after
end of frame/capture file) not when the allocation and deallocation
are already done properly.

Thanks,
Balint

2014-07-11 8:58 GMT+02:00 Jakub Zawadzki darkjames...@darkjames.pl:
 Hi,

 On Sat, Jun 21, 2014 at 10:12:48PM +0200, Jakub Zawadzki wrote:
 If we're in topic of optimizing 'slower' [de]allocations in common functions:

 - tvb allocation/deallocation (2.5%, or 3.4% when no filtering)

243,931,671  *  ???:tvb_new 
 [/tmp/wireshark/epan/.libs/libwireshark.so.0.0.0]
202,052,290 ???:g_slice_alloc (2463493x) 
 [/usr/lib64/libglib-2.0.so.0.3600.4]

291,765,126  *  ???:tvb_free_chain 
 [/tmp/wireshark/epan/.libs/libwireshark.so.0.0.0]
256,390,635 ???:g_slice_free1 (2435843x) 
 [/usr/lib64/libglib-2.0.so.0.3600.4]

 This, or next week I'll try to do tvb.

 ... or maybe this week:

 ver0 | 18,055,719,820 (---) | Base version 
 96f0585268f1cc4e820767c4038c10ed4915c12a
 ver1 | 18,185,185,838 (0.6% slower) | Change tvb allocator g_slice_* to wmem 
 with file scope
 ver2 | 17,809,433,204 (1.4% faster) | Change tvb allocator g_slice_* to wmem 
 with file/packet scope
 ver3 | 17,812,128,887 (1.3% faster) | Change tvb allocator g_slice_* to 
 simple object allocator with epan scope
 ver4 | 17,704,132,561 (2.0% faster) | Change tvb allocator g_slice_* to 
 simple object allocator with file scope

 I have uploaded patches  profiler outputs to: 
 http://www.wireshark.org/~darkjames/tvb-opt-allocator/

 Please review, and check what version is OK to be applied.


 P.S: I'll might find some time to do ver5 with slab allocator, but it'll look 
 like object allocator API with epan scope.

 Cheers,
 Jakub.
 ___
 Sent via:Wireshark-dev mailing list wireshark-dev@wireshark.org
 Archives:http://www.wireshark.org/lists/wireshark-dev
 Unsubscribe: https://wireshark.org/mailman/options/wireshark-dev
  mailto:wireshark-dev-requ...@wireshark.org?subject=unsubscribe
___
Sent via:Wireshark-dev mailing list wireshark-dev@wireshark.org
Archives:http://www.wireshark.org/lists/wireshark-dev
Unsubscribe: https://wireshark.org/mailman/options/wireshark-dev
 mailto:wireshark-dev-requ...@wireshark.org?subject=unsubscribe


Re: [Wireshark-dev] tvb allocator (was: Re: [Wireshark-commits] master b6d20a2: Optimize reseting epan_dissect_t when filtering.)

2014-07-11 Thread Evan Huus
On Fri, Jul 11, 2014 at 5:12 PM, Bálint Réczey bal...@balintreczey.hu
wrote:

 Hi All,

 Please provide the input data for letting others reproduce the results
 or perform the performance tests on pcap files already available to
 the public.

 I'm not a fan of implementing custom memory management methods because
 partly because I highly doubt we can beat jemalloc easily on
 performance


The only place we reliably beat jemalloc (or even glib) is when we have a
large number of allocations that live together, and can be freed with
free_all. Anything else is basically noise. As Jakub's test noted, the main
block allocator is actually slightly slower than g_slice_* if the frees are
done manually.


 and custom allocation methods can also have nasty bugs
 like the one observed in OpenSSL:
 http://www.tedunangst.com/flak/post/analysis-of-openssl-freelist-reuse

 I wrote a short post about making all programs in Debian resistant to
 malloc() related attacks using ASAN and wmem in its current form
 prevents implementing the protection:

 http://balintreczey.hu/blog/proposing-amd64-hardened-architecture-for-debian/


It's not clear to me from reading the blog post or the mail to debian what
the actual protections would be, or why wmem would prevent them from
working. Could you clarify please? Glib has its own allocator logic
internally for g_slice_*, does this also cause problems?


 Please don't sacrifice protection for 2% speedup. Please keep wmem
 usage for cases where it is used for garbage collecting (free() after
 end of frame/capture file) not when the allocation and deallocation
 are already done properly.

 Thanks,
 Balint

 2014-07-11 8:58 GMT+02:00 Jakub Zawadzki darkjames...@darkjames.pl:
  Hi,
 
  On Sat, Jun 21, 2014 at 10:12:48PM +0200, Jakub Zawadzki wrote:
  If we're in topic of optimizing 'slower' [de]allocations in common
 functions:
 
  - tvb allocation/deallocation (2.5%, or 3.4% when no filtering)
 
 243,931,671  *  ???:tvb_new
 [/tmp/wireshark/epan/.libs/libwireshark.so.0.0.0]
 202,052,290 ???:g_slice_alloc (2463493x)
 [/usr/lib64/libglib-2.0.so.0.3600.4]
 
 291,765,126  *  ???:tvb_free_chain
 [/tmp/wireshark/epan/.libs/libwireshark.so.0.0.0]
 256,390,635 ???:g_slice_free1 (2435843x)
 [/usr/lib64/libglib-2.0.so.0.3600.4]
 
  This, or next week I'll try to do tvb.
 
  ... or maybe this week:
 
  ver0 | 18,055,719,820 (---) | Base version
 96f0585268f1cc4e820767c4038c10ed4915c12a
  ver1 | 18,185,185,838 (0.6% slower) | Change tvb allocator g_slice_* to
 wmem with file scope
  ver2 | 17,809,433,204 (1.4% faster) | Change tvb allocator g_slice_* to
 wmem with file/packet scope
  ver3 | 17,812,128,887 (1.3% faster) | Change tvb allocator g_slice_* to
 simple object allocator with epan scope
  ver4 | 17,704,132,561 (2.0% faster) | Change tvb allocator g_slice_* to
 simple object allocator with file scope
 
  I have uploaded patches  profiler outputs to:
 http://www.wireshark.org/~darkjames/tvb-opt-allocator/
 
  Please review, and check what version is OK to be applied.
 
 
  P.S: I'll might find some time to do ver5 with slab allocator, but it'll
 look like object allocator API with epan scope.
 
  Cheers,
  Jakub.
 
 ___
  Sent via:Wireshark-dev mailing list wireshark-dev@wireshark.org
  Archives:http://www.wireshark.org/lists/wireshark-dev
  Unsubscribe: https://wireshark.org/mailman/options/wireshark-dev
   mailto:wireshark-dev-requ...@wireshark.org
 ?subject=unsubscribe
 ___
 Sent via:Wireshark-dev mailing list wireshark-dev@wireshark.org
 Archives:http://www.wireshark.org/lists/wireshark-dev
 Unsubscribe: https://wireshark.org/mailman/options/wireshark-dev
  mailto:wireshark-dev-requ...@wireshark.org
 ?subject=unsubscribe

___
Sent via:Wireshark-dev mailing list wireshark-dev@wireshark.org
Archives:http://www.wireshark.org/lists/wireshark-dev
Unsubscribe: https://wireshark.org/mailman/options/wireshark-dev
 mailto:wireshark-dev-requ...@wireshark.org?subject=unsubscribe

Re: [Wireshark-dev] tvb allocator (was: Re: [Wireshark-commits] master b6d20a2: Optimize reseting epan_dissect_t when filtering.)

2014-07-11 Thread Anders Broman
Den 11 jul 2014 23:13 skrev Bálint Réczey bal...@balintreczey.hu:

 Hi All,

 Please provide the input data for letting others reproduce the results
 or perform the performance tests on pcap files already available to
 the public

Ok I'll see if we can use something from the wiki instead.


 I'm not a fan of implementing custom memory management methods because
 partly because I highly doubt we can beat jemalloc easily on
 performance and custom allocation methods can also have nasty bugs
 like the one observed in OpenSSL:
 http://www.tedunangst.com/flak/post/analysis-of-openssl-freelist-reuse


We have gone through a set of memory allocation schemes already to try to
improve performance (g_slice, emen and now wmem) are you saying that you
are opposed to that?

As wmem seems to be the faster scheme for packet scope memory allocation
/free, why not use it in all possible places where  the scope is packet?

 Please don't sacrifice protection for 2% speedup. Please keep wmem
 usage for cases where it is used for garbage collecting (free() after
 end of frame/capture file) not when the allocation and deallocation
 are already done properly.

? A slow scheme might be working well but that in it self does not warrant
to not replace it with a faster one. With this reasoning we shouldn't have
introduced ep memory in the first place.

What percentage if improvement do you think makes a change worthwhile?

The set of improvements Jacub and I have done lately has given a reduction
of 40-50 percent compared to 1.10 measuring with the sample file. The
problem is that each improvement only yeald a percent or 2. Agreed that we
shouldn't complicate the code for a small speed gain.

In your blog you say that people would accept double the execution time
with increased security - I'm not so sure. If say the reformatting of a
video takes one hour instead of 30 minutes.

Just my 2 cents
Anders

 Thanks,
 Balint

 2014-07-11 8:58 GMT+02:00 Jakub Zawadzki darkjames...@darkjames.pl:
  Hi,
 
  On Sat, Jun 21, 2014 at 10:12:48PM +0200, Jakub Zawadzki wrote:
  If we're in topic of optimizing 'slower' [de]allocations in common
functions:
 
  - tvb allocation/deallocation (2.5%, or 3.4% when no filtering)
 
 243,931,671  *  ???:tvb_new
[/tmp/wireshark/epan/.libs/libwireshark.so.0.0.0]
 202,052,290 ???:g_slice_alloc (2463493x)
[/usr/lib64/libglib-2.0.so.0.3600.4]
 
 291,765,126  *  ???:tvb_free_chain
[/tmp/wireshark/epan/.libs/libwireshark.so.0.0.0]
 256,390,635 ???:g_slice_free1 (2435843x)
[/usr/lib64/libglib-2.0.so.0.3600.4]
 
  This, or next week I'll try to do tvb.
 
  ... or maybe this week:
 
  ver0 | 18,055,719,820 (---) | Base version
96f0585268f1cc4e820767c4038c10ed4915c12a
  ver1 | 18,185,185,838 (0.6% slower) | Change tvb allocator g_slice_* to
wmem with file scope
  ver2 | 17,809,433,204 (1.4% faster) | Change tvb allocator g_slice_* to
wmem with file/packet scope
  ver3 | 17,812,128,887 (1.3% faster) | Change tvb allocator g_slice_* to
simple object allocator with epan scope
  ver4 | 17,704,132,561 (2.0% faster) | Change tvb allocator g_slice_* to
simple object allocator with file scope
 
  I have uploaded patches  profiler outputs to:
http://www.wireshark.org/~darkjames/tvb-opt-allocator/
 
  Please review, and check what version is OK to be applied.
 
 
  P.S: I'll might find some time to do ver5 with slab allocator, but
it'll look like object allocator API with epan scope.
 
  Cheers,
  Jakub.
 
___
  Sent via:Wireshark-dev mailing list wireshark-dev@wireshark.org
  Archives:http://www.wireshark.org/lists/wireshark-dev
  Unsubscribe: https://wireshark.org/mailman/options/wireshark-dev
   mailto:wireshark-dev-requ...@wireshark.org
?subject=unsubscribe

___
 Sent via:Wireshark-dev mailing list wireshark-dev@wireshark.org
 Archives:http://www.wireshark.org/lists/wireshark-dev
 Unsubscribe: https://wireshark.org/mailman/options/wireshark-dev
  mailto:wireshark-dev-requ...@wireshark.org
?subject=unsubscribe
___
Sent via:Wireshark-dev mailing list wireshark-dev@wireshark.org
Archives:http://www.wireshark.org/lists/wireshark-dev
Unsubscribe: https://wireshark.org/mailman/options/wireshark-dev
 mailto:wireshark-dev-requ...@wireshark.org?subject=unsubscribe

Re: [Wireshark-dev] tvb allocator (was: Re: [Wireshark-commits] master b6d20a2: Optimize reseting epan_dissect_t when filtering.)

2014-07-11 Thread Evan Huus
On Fri, Jul 11, 2014 at 4:08 PM, Evan Huus eapa...@gmail.com wrote:

 The biggest win, I think, would be if we can avoid calling free_chain at
 all because tvbs are always allocated in the right scope and so get freed
 automatically. I think this would involve touching every place that creates
 new tvbs backed with glib memory though...

 I will try and think about this and review your patches sometime this
 weekend.


Balint's concerns temporarily aside (this may end up being moot if we
decide not to use wmem here at all, but I wanted to discuss the
architecture anyways), my current understanding is the following:

1. Most current uses of TVBs fall naturally into one of the existing wmem
scopes. The main tvb (edt-tvb) has the same effective scope as
pinfo-pool. Any TVBs generated by subsets, decryptions, decompressions,
etc. likewise have pinfo scope. Reassembled TVBs have file scope.

2. The exception to the above is the intermediate TVBs used by reassembly
when not all fragments have been received. It isn't clear to me that these
have a defined scope at all.

3. TVB chaining was a convenient way to track all the various subsets etc.
created during dissection, so we can simply free the parent and all the
others get cleaned up as well.

Architecturally, the approach that seems simplest to me (and is I believe
likely to be most efficient) would be to get rid of tvb chaining entirely.
Allocate tvbuffs (and their backing data, if appropriate) in the correct
wmem scope, and let wmem clean them up at the appropriate point. For
intermediate reassembly buffers, use scope NULL and manually free them the
way we are doing now.

This approach avoids keeping any additional free-lists. It greatly
simplifies the tvbuff code and API by removing all need for chaining and
the various acrobatics that accompany it. It ends up using pinfo-scope for
the vast majority of tvbs in the normal path, which Jakub's ver2 benchmark
showed to be a measurable (if small) speed-up. It lets (most) TVBs be
collected by wmem's free_all, which I expect will be another measurable (if
small) speed-up over manually freeing each one.

Thoughts?


 On Fri, Jul 11, 2014 at 2:58 AM, Jakub Zawadzki darkjames...@darkjames.pl
  wrote:

 Hi,

 On Sat, Jun 21, 2014 at 10:12:48PM +0200, Jakub Zawadzki wrote:
  If we're in topic of optimizing 'slower' [de]allocations in common
 functions:
 
  - tvb allocation/deallocation (2.5%, or 3.4% when no filtering)
 
 243,931,671  *  ???:tvb_new
 [/tmp/wireshark/epan/.libs/libwireshark.so.0.0.0]
 202,052,290 ???:g_slice_alloc (2463493x)
 [/usr/lib64/libglib-2.0.so.0.3600.4]
 
 291,765,126  *  ???:tvb_free_chain
 [/tmp/wireshark/epan/.libs/libwireshark.so.0.0.0]
 256,390,635 ???:g_slice_free1 (2435843x)
 [/usr/lib64/libglib-2.0.so.0.3600.4]

  This, or next week I'll try to do tvb.

 ... or maybe this week:

 ver0 | 18,055,719,820 (---) | Base version
 96f0585268f1cc4e820767c4038c10ed4915c12a
 ver1 | 18,185,185,838 (0.6% slower) | Change tvb allocator g_slice_* to
 wmem with file scope
 ver2 | 17,809,433,204 (1.4% faster) | Change tvb allocator g_slice_* to
 wmem with file/packet scope
 ver3 | 17,812,128,887 (1.3% faster) | Change tvb allocator g_slice_* to
 simple object allocator with epan scope
 ver4 | 17,704,132,561 (2.0% faster) | Change tvb allocator g_slice_* to
 simple object allocator with file scope

 I have uploaded patches  profiler outputs to:
 http://www.wireshark.org/~darkjames/tvb-opt-allocator/

 Please review, and check what version is OK to be applied.


 P.S: I'll might find some time to do ver5 with slab allocator, but it'll
 look like object allocator API with epan scope.

 Cheers,
 Jakub.

 ___
 Sent via:Wireshark-dev mailing list wireshark-dev@wireshark.org
 Archives:http://www.wireshark.org/lists/wireshark-dev
 Unsubscribe: https://wireshark.org/mailman/options/wireshark-dev
  mailto:wireshark-dev-requ...@wireshark.org
 ?subject=unsubscribe



___
Sent via:Wireshark-dev mailing list wireshark-dev@wireshark.org
Archives:http://www.wireshark.org/lists/wireshark-dev
Unsubscribe: https://wireshark.org/mailman/options/wireshark-dev
 mailto:wireshark-dev-requ...@wireshark.org?subject=unsubscribe

Re: [Wireshark-dev] tvb allocator (was: Re: [Wireshark-commits] master b6d20a2: Optimize reseting epan_dissect_t when filtering.)

2014-07-11 Thread Evan Huus
On Fri, Jul 11, 2014 at 6:07 PM, Anders Broman a.broma...@gmail.com wrote:


 Den 11 jul 2014 23:13 skrev Bálint Réczey bal...@balintreczey.hu:

 
  Hi All,
 
  Please provide the input data for letting others reproduce the results
  or perform the performance tests on pcap files already available to
  the public

 Ok I'll see if we can use something from the wiki instead.

 
  I'm not a fan of implementing custom memory management methods because
  partly because I highly doubt we can beat jemalloc easily on
  performance and custom allocation methods can also have nasty bugs
  like the one observed in OpenSSL:
  http://www.tedunangst.com/flak/post/analysis-of-openssl-freelist-reuse
 

 We have gone through a set of memory allocation schemes already to try to
 improve performance (g_slice, emen and now wmem)

Technically, emem was originally added to fix memory leaks caused by
exceptions. Wmem (as just a better-architected replacement for emem) is
similar.

The block allocator was added later as an optimization because it provided
a *significant* improvement in performance over a simple linked-list of
OS-alloced blocks (5-10x faster in my benchmarks).

 are you saying that you are opposed to that?

 As wmem seems to be the faster scheme for packet scope memory allocation
 /free, why not use it in all possible places where  the scope is packet?

  Please don't sacrifice protection for 2% speedup. Please keep wmem
  usage for cases where it is used for garbage collecting (free() after
  end of frame/capture file) not when the allocation and deallocation
  are already done properly.

 ? A slow scheme might be working well but that in it self does not warrant
 to not replace it with a faster one. With this reasoning we shouldn't have
 introduced ep memory in the first place.

 What percentage if improvement do you think makes a change worthwhile?

 The set of improvements Jacub and I have done lately has given a reduction
 of 40-50 percent compared to 1.10 measuring with the sample file. The
 problem is that each improvement only yeald a percent or 2. Agreed that we
 shouldn't complicate the code for a small speed gain.

 In your blog you say that people would accept double the execution time
 with increased security - I'm not so sure. If say the reformatting of a
 video takes one hour instead of 30 minutes.

 Just my 2 cents
 Anders
 
  Thanks,
  Balint
 
  2014-07-11 8:58 GMT+02:00 Jakub Zawadzki darkjames...@darkjames.pl:
   Hi,
  
   On Sat, Jun 21, 2014 at 10:12:48PM +0200, Jakub Zawadzki wrote:
   If we're in topic of optimizing 'slower' [de]allocations in common
 functions:
  
   - tvb allocation/deallocation (2.5%, or 3.4% when no filtering)
  
  243,931,671  *  ???:tvb_new
 [/tmp/wireshark/epan/.libs/libwireshark.so.0.0.0]
  202,052,290 ???:g_slice_alloc (2463493x)
 [/usr/lib64/libglib-2.0.so.0.3600.4]
  
  291,765,126  *  ???:tvb_free_chain
 [/tmp/wireshark/epan/.libs/libwireshark.so.0.0.0]
  256,390,635 ???:g_slice_free1 (2435843x)
 [/usr/lib64/libglib-2.0.so.0.3600.4]
  
   This, or next week I'll try to do tvb.
  
   ... or maybe this week:
  
   ver0 | 18,055,719,820 (---) | Base version
 96f0585268f1cc4e820767c4038c10ed4915c12a
   ver1 | 18,185,185,838 (0.6% slower) | Change tvb allocator g_slice_*
 to wmem with file scope
   ver2 | 17,809,433,204 (1.4% faster) | Change tvb allocator g_slice_*
 to wmem with file/packet scope
   ver3 | 17,812,128,887 (1.3% faster) | Change tvb allocator g_slice_*
 to simple object allocator with epan scope
   ver4 | 17,704,132,561 (2.0% faster) | Change tvb allocator g_slice_*
 to simple object allocator with file scope
  
   I have uploaded patches  profiler outputs to:
 http://www.wireshark.org/~darkjames/tvb-opt-allocator/
  
   Please review, and check what version is OK to be applied.
  
  
   P.S: I'll might find some time to do ver5 with slab allocator, but
 it'll look like object allocator API with epan scope.
  
   Cheers,
   Jakub.
  
 ___
   Sent via:Wireshark-dev mailing list wireshark-dev@wireshark.org
   Archives:http://www.wireshark.org/lists/wireshark-dev
   Unsubscribe: https://wireshark.org/mailman/options/wireshark-dev
mailto:wireshark-dev-requ...@wireshark.org
 ?subject=unsubscribe
 
 ___
  Sent via:Wireshark-dev mailing list wireshark-dev@wireshark.org
  Archives:http://www.wireshark.org/lists/wireshark-dev
  Unsubscribe: https://wireshark.org/mailman/options/wireshark-dev
   mailto:wireshark-dev-requ...@wireshark.org
 ?subject=unsubscribe


 ___
 Sent via:Wireshark-dev mailing list wireshark-dev@wireshark.org
 Archives:http://www.wireshark.org/lists/wireshark-dev
 Unsubscribe: https://wireshark.org/mailman/options/wireshark-dev
  

Re: [Wireshark-dev] tvb allocator (was: Re: [Wireshark-commits] master b6d20a2: Optimize reseting epan_dissect_t when filtering.)

2014-07-11 Thread Bálint Réczey
Hi Evan,

2014-07-11 23:51 GMT+02:00 Evan Huus eapa...@gmail.com:
 On Fri, Jul 11, 2014 at 5:12 PM, Bálint Réczey bal...@balintreczey.hu
 wrote:

 Hi All,

 Please provide the input data for letting others reproduce the results
 or perform the performance tests on pcap files already available to
 the public.

 I'm not a fan of implementing custom memory management methods because
 partly because I highly doubt we can beat jemalloc easily on
 performance


 The only place we reliably beat jemalloc (or even glib) is when we have a
 large number of allocations that live together, and can be freed with
 free_all. Anything else is basically noise. As Jakub's test noted, the main
 block allocator is actually slightly slower than g_slice_* if the frees are
 done manually.


 and custom allocation methods can also have nasty bugs
 like the one observed in OpenSSL:
 http://www.tedunangst.com/flak/post/analysis-of-openssl-freelist-reuse

 I wrote a short post about making all programs in Debian resistant to
 malloc() related attacks using ASAN and wmem in its current form
 prevents implementing the protection:

 http://balintreczey.hu/blog/proposing-amd64-hardened-architecture-for-debian/


 It's not clear to me from reading the blog post or the mail to debian what
 the actual protections would be, or why wmem would prevent them from
 working. Could you clarify please? Glib has its own allocator logic
 internally for g_slice_*, does this also cause problems?
I plan using ASAN for all programs which would catch (among others)
use-after-free and reading below or over the malloc()-ed
memory area. Those can't be caught if the program uses another layer
of bulk memory allocations.
g_malloc() and g_slice_* has the same problem, but they can be
overrideb by passing G_SLICE=always-malloc .

I know wmem has simple allocator, but wmem_free() is really
inefficient and as a fix I would like to propose removing wmem_free()
from the API.
IMO Wireshark needs wmem_alloc() for allocating and freeing memory
needed during whole scopes, but should not offer wmem_free() and
wmem_realloc() to let us able to provide efficient per-scope
allocations. Dissectors which should simply do g_malloc()/g_free() for
allocations when they know when they need to free memory and using
wmem_free() also does not keep the promise of having a
wmem_alloc()-ated object available during the whole scope.

Wmem also have a lot of data structures reimplemented using
wmem-backed memory, but I think it would be way easier to use GLists
registered to be g_list_free()-d using wmem_register_callback() than
using wmem_list_* functions for manipulating per-scope allocated lists
for example.

Cheers,
Balint



 Please don't sacrifice protection for 2% speedup. Please keep wmem
 usage for cases where it is used for garbage collecting (free() after
 end of frame/capture file) not when the allocation and deallocation
 are already done properly.

 Thanks,
 Balint

 2014-07-11 8:58 GMT+02:00 Jakub Zawadzki darkjames...@darkjames.pl:
  Hi,
 
  On Sat, Jun 21, 2014 at 10:12:48PM +0200, Jakub Zawadzki wrote:
  If we're in topic of optimizing 'slower' [de]allocations in common
  functions:
 
  - tvb allocation/deallocation (2.5%, or 3.4% when no filtering)
 
 243,931,671  *  ???:tvb_new
  [/tmp/wireshark/epan/.libs/libwireshark.so.0.0.0]
 202,052,290 ???:g_slice_alloc (2463493x)
  [/usr/lib64/libglib-2.0.so.0.3600.4]
 
 291,765,126  *  ???:tvb_free_chain
  [/tmp/wireshark/epan/.libs/libwireshark.so.0.0.0]
 256,390,635 ???:g_slice_free1 (2435843x)
  [/usr/lib64/libglib-2.0.so.0.3600.4]
 
  This, or next week I'll try to do tvb.
 
  ... or maybe this week:
 
  ver0 | 18,055,719,820 (---) | Base version
  96f0585268f1cc4e820767c4038c10ed4915c12a
  ver1 | 18,185,185,838 (0.6% slower) | Change tvb allocator g_slice_* to
  wmem with file scope
  ver2 | 17,809,433,204 (1.4% faster) | Change tvb allocator g_slice_* to
  wmem with file/packet scope
  ver3 | 17,812,128,887 (1.3% faster) | Change tvb allocator g_slice_* to
  simple object allocator with epan scope
  ver4 | 17,704,132,561 (2.0% faster) | Change tvb allocator g_slice_* to
  simple object allocator with file scope
 
  I have uploaded patches  profiler outputs to:
  http://www.wireshark.org/~darkjames/tvb-opt-allocator/
 
  Please review, and check what version is OK to be applied.
 
 
  P.S: I'll might find some time to do ver5 with slab allocator, but it'll
  look like object allocator API with epan scope.
 
  Cheers,
  Jakub.
 
  ___
  Sent via:Wireshark-dev mailing list wireshark-dev@wireshark.org
  Archives:http://www.wireshark.org/lists/wireshark-dev
  Unsubscribe: https://wireshark.org/mailman/options/wireshark-dev
 
  mailto:wireshark-dev-requ...@wireshark.org?subject=unsubscribe

 ___
 Sent via:Wireshark-dev mailing list 

Re: [Wireshark-dev] tvb allocator (was: Re: [Wireshark-commits] master b6d20a2: Optimize reseting epan_dissect_t when filtering.)

2014-07-11 Thread Bálint Réczey
2014-07-12 0:07 GMT+02:00 Anders Broman a.broma...@gmail.com:

 Den 11 jul 2014 23:13 skrev Bálint Réczey bal...@balintreczey.hu:



 Hi All,

 Please provide the input data for letting others reproduce the results
 or perform the performance tests on pcap files already available to
 the public

 Ok I'll see if we can use something from the wiki instead.


 I'm not a fan of implementing custom memory management methods because
 partly because I highly doubt we can beat jemalloc easily on
 performance and custom allocation methods can also have nasty bugs
 like the one observed in OpenSSL:
 http://www.tedunangst.com/flak/post/analysis-of-openssl-freelist-reuse


 We have gone through a set of memory allocation schemes already to try to
 improve performance (g_slice, emen and now wmem) are you saying that you are
 opposed to that?
No, IMO there is need for wmem for being able to keep memory allocated
during scopes,
but I would prefer seeing it only tracking and one-by-one freeing the
memory areas instead of doing bulk alloc()-s for optimizing for speed.

I tried to detail this in my answer to Evan's email.


 As wmem seems to be the faster scheme for packet scope memory allocation
 /free, why not use it in all possible places where  the scope is packet?

 Please don't sacrifice protection for 2% speedup. Please keep wmem
 usage for cases where it is used for garbage collecting (free() after
 end of frame/capture file) not when the allocation and deallocation
 are already done properly.

 ? A slow scheme might be working well but that in it self does not warrant
 to not replace it with a faster one. With this reasoning we shouldn't have
 introduced ep memory in the first place.

 What percentage if improvement do you think makes a change worthwhile?

 The set of improvements Jacub and I have done lately has given a reduction
 of 40-50 percent compared to 1.10 measuring with the sample file. The
 problem is that each improvement only yeald a percent or 2. Agreed that we
 shouldn't complicate the code for a small speed gain.
40-50% reduction is great and congratulations for such a speedup!
I hope memory allocation handling is responsible for only small
fraction of it because I would like to keep the possibility of
detecting memory allocation related errors and I would prefer using
tools implemented outside of Wireshark.
For example I would avoid bulk allocations to make us able to use ASAN
and Valgrind, even if we would implement our canaries.


 In your blog you say that people would accept double the execution time with
 increased security - I'm not so sure. If say the reformatting of a video
 takes one hour instead of 30 minutes.
In Debian you would be able to pick a slow and secure video player for
streaming from the untrusted Internet and a fast and less secure video
encoder.
I expect people to install Wireshark from the hardened-amd64
repository if they want to monitor a hostile network, while others
pick the fast Wireshark for using it in their safe labs.
So it depends and there will be good options to choose from.

Cheers,
Balint


 Just my 2 cents
 Anders


 Thanks,
 Balint

 2014-07-11 8:58 GMT+02:00 Jakub Zawadzki darkjames...@darkjames.pl:
  Hi,
 
  On Sat, Jun 21, 2014 at 10:12:48PM +0200, Jakub Zawadzki wrote:
  If we're in topic of optimizing 'slower' [de]allocations in common
  functions:
 
  - tvb allocation/deallocation (2.5%, or 3.4% when no filtering)
 
 243,931,671  *  ???:tvb_new
  [/tmp/wireshark/epan/.libs/libwireshark.so.0.0.0]
 202,052,290 ???:g_slice_alloc (2463493x)
  [/usr/lib64/libglib-2.0.so.0.3600.4]
 
 291,765,126  *  ???:tvb_free_chain
  [/tmp/wireshark/epan/.libs/libwireshark.so.0.0.0]
 256,390,635 ???:g_slice_free1 (2435843x)
  [/usr/lib64/libglib-2.0.so.0.3600.4]
 
  This, or next week I'll try to do tvb.
 
  ... or maybe this week:
 
  ver0 | 18,055,719,820 (---) | Base version
  96f0585268f1cc4e820767c4038c10ed4915c12a
  ver1 | 18,185,185,838 (0.6% slower) | Change tvb allocator g_slice_* to
  wmem with file scope
  ver2 | 17,809,433,204 (1.4% faster) | Change tvb allocator g_slice_* to
  wmem with file/packet scope
  ver3 | 17,812,128,887 (1.3% faster) | Change tvb allocator g_slice_* to
  simple object allocator with epan scope
  ver4 | 17,704,132,561 (2.0% faster) | Change tvb allocator g_slice_* to
  simple object allocator with file scope
 
  I have uploaded patches  profiler outputs to:
  http://www.wireshark.org/~darkjames/tvb-opt-allocator/
 
  Please review, and check what version is OK to be applied.
 
 
  P.S: I'll might find some time to do ver5 with slab allocator, but it'll
  look like object allocator API with epan scope.
 
  Cheers,
  Jakub.
 
  ___
  Sent via:Wireshark-dev mailing list wireshark-dev@wireshark.org
  Archives:http://www.wireshark.org/lists/wireshark-dev
  Unsubscribe: https://wireshark.org/mailman/options/wireshark-dev
 
  

Re: [Wireshark-dev] tvb allocator (was: Re: [Wireshark-commits] master b6d20a2: Optimize reseting epan_dissect_t when filtering.)

2014-07-11 Thread Evan Huus
On Fri, Jul 11, 2014 at 9:08 PM, Evan Huus eapa...@gmail.com wrote:

 (sorry Balint for the double-post, I don't know why my reply button
 dropped the mailing list)


 -- Forwarded message --
 From: Evan Huus eapa...@gmail.com
 Date: Fri, Jul 11, 2014 at 9:05 PM
 Subject: Re: [Wireshark-dev] tvb allocator (was: Re: [Wireshark-commits]
 master b6d20a2: Optimize reseting epan_dissect_t when filtering.)
 To: Bálint Reczey bal...@balintreczey.hu


 On Fri, Jul 11, 2014 at 8:27 PM, Bálint Réczey bal...@balintreczey.hu
 wrote:

 Hi Evan,

 2014-07-11 23:51 GMT+02:00 Evan Huus eapa...@gmail.com:
  On Fri, Jul 11, 2014 at 5:12 PM, Bálint Réczey bal...@balintreczey.hu
  wrote:
 
  Hi All,
 
  Please provide the input data for letting others reproduce the results
  or perform the performance tests on pcap files already available to
  the public.
 
  I'm not a fan of implementing custom memory management methods because
  partly because I highly doubt we can beat jemalloc easily on
  performance
 
 
  The only place we reliably beat jemalloc (or even glib) is when we have
 a
  large number of allocations that live together, and can be freed with
  free_all. Anything else is basically noise. As Jakub's test noted, the
 main
  block allocator is actually slightly slower than g_slice_* if the frees
 are
  done manually.
 
 
  and custom allocation methods can also have nasty bugs
  like the one observed in OpenSSL:
  http://www.tedunangst.com/flak/post/analysis-of-openssl-freelist-reuse
 
  I wrote a short post about making all programs in Debian resistant to
  malloc() related attacks using ASAN and wmem in its current form
  prevents implementing the protection:
 
 
 http://balintreczey.hu/blog/proposing-amd64-hardened-architecture-for-debian/
 
 
  It's not clear to me from reading the blog post or the mail to debian
 what
  the actual protections would be, or why wmem would prevent them from
  working. Could you clarify please? Glib has its own allocator logic
  internally for g_slice_*, does this also cause problems?
 I plan using ASAN for all programs which would catch (among others)
 use-after-free and reading below or over the malloc()-ed
 memory area. Those can't be caught if the program uses another layer
 of bulk memory allocations.
 g_malloc() and g_slice_* has the same problem, but they can be
 overrideb by passing G_SLICE=always-malloc .

 I know wmem has simple allocator


 Which can be foreced everywhere by passing
 WIRESHARK_DEBUG_WMEM_OVERRIDE=simple like we do in valgrind-wireshark.sh.
 I don't see how this is different from using G_SLICE=always-malloc?


 but wmem_free() is really
 inefficient


 I grant that wmem_simple_free has a terrible worst-case behaviour, but its
 practical efficiency is actually the best of anything I've tried. The only
 alternative which doesn't involve adding a prefix to the memory block
 (which would break ASAN/valgrind/etc) is a hash table, which is actually
 substantially slower.


(we had part of this discussion already once when I first introduced that
change: https://code.wireshark.org/review/1602)

 and as a fix I would like to propose removing wmem_free()
 from the API.
 IMO Wireshark needs wmem_alloc() for allocating and freeing memory
 needed during whole scopes, but should not offer wmem_free() and
 wmem_realloc() to let us able to provide efficient per-scope
 allocations.


 Hmm. We need free and realloc to efficiently implement certain data
 structures, and I see you've addressed that below, so I will include my
 answer there.


 Dissectors which should simply do g_malloc()/g_free() for
 allocations when they know when they need to free memory


 (which they can't do safely if they also call proto_tree functions
 inbetween)


 and using
 wmem_free() also does not keep the promise of having a
 wmem_alloc()-ated object available during the whole scope.


 I don't think that promise was ever made, really. The promise is that such
 memory won't be available after the scope, not that it will be available
 right until the end.


 Wmem also have a lot of data structures reimplemented using
 wmem-backed memory, but I think it would be way easier to use GLists
 registered to be g_list_free()-d using wmem_register_callback() than
 using wmem_list_* functions for manipulating per-scope allocated lists
 for example.


 This is a follow-up to the performance concern. If we use callbacks to
 free everything, we lose the *substantial* speed-up we get from the
 free_all operation. I grant that we should avoid optimizations unless
 well-justified, but I do believe this is well-justified given the benefit.

 Taking it to the extreme, wmem could be made *extraordinarily* simple if
 it were nothing more than a linked list of callbacks (some to g_free, some
 to g_list_free, etc) and we used glib memory for everything. It would just
 be very, very, very slow. The simplicity has an obvious appeal, but I can't
 justify a performance hit of that magnitude.

 Cheers