Re: [lttng-dev] High memory consumption issue on RCU side

2016-09-25 Thread Evgeniy Ivanov
Hi Mathieu,

On Sun, Sep 25, 2016 at 4:10 PM, Mathieu Desnoyers
 wrote:
> Hi,
>
> Did you enable the CDS_LFHT_ACCOUNTING flag for your hash tables at
> creation, or only CDS_LFHT_AUTO_RESIZE?

Only CDS_LFHT_AUTO_RESIZE. With CDS_LFHT_ACCOUNTING the memory
situation seems to become much better (same effect as setting a max
number of buckets limit).

> With only CDS_LFHT_AUTO_RESIZE, the algorithm used in check_resize()
> is to verify whether the current chain is longer than CHAIN_LEN_RESIZE_THRESHOLD
> (which is currently 3). It effectively detects bucket collisions and
> resizes the hash table accordingly.
>
> If you have both the CDS_LFHT_AUTO_RESIZE | CDS_LFHT_ACCOUNTING flags set,
> then it goes as follows: for a small table of size below
> (1UL << (COUNT_COMMIT_ORDER + split_count_order)), we use the
> bucket-chain-length algorithm. This is because the accounting uses
> split-counters and amortizes the cost of committing to the global
> counter, so it is not precise enough for small tables.
> When we are beyond that threshold, we use the overall number of
> nodes in the hash table to calculate how we should resize it.
>
> The "resize_target" field of struct cds_lfht (in rculfhash-internal.h)
> is a good way to see the number of buckets that were requested at the
> last resize. This is not exposed in the public API though. You can
> also try compiling rculfhash with -DDEBUG, which will enable debugging
> printouts that tell you how the tables are resized. You can deduce the
> number of buckets from that information.
>
> So if you expect to have many collisions in your hash table,
> I recommend you activate the CDS_LFHT_ACCOUNTING flag.
>
> Hoping this clarifies things,

Thank you very much for explaining and for your help!


> Thanks,
>
> Mathieu
>
> - On Sep 24, 2016, at 2:40 PM, Evgeniy Ivanov lolkaanti...@gmail.com 
> wrote:
>
>> All hash tables are created with 1024 initial buckets (no limit on the
>> max number of buckets). Only three tables can contain at most about
>> 5,000,000 nodes; the rest (I think about 5000 tables) contain at most
>> 1000-5000 nodes. Big tables have a UUID key and CityHash, small tables
>> have a complicated binary key with SuperFastHash. Binary keys are the
>> same between executions, but UUIDs are generated on the fly, and if
>> there are collisions it might explain why the memory footprint varies so
>> much.
>>
>> I've set both min and max bucket limits and now RSS looks constant
>> between executions. Thank you very much for pointing to this! Do I
>> understand it correctly that, besides the load factor, rculfhash also
>> resizes depending on the max number of nodes in any bucket? Is there any
>> way to get the number of buckets allocated by a table (sorry if I missed
>> it looking into the API)? This would help to further troubleshoot the issue.
>>
>>
>>
>> On Sat, Sep 24, 2016 at 6:34 PM, Mathieu Desnoyers
>>  wrote:
>>> - On Sep 24, 2016, at 11:22 AM, Paul E. McKenney 
>>> paul...@linux.vnet.ibm.com
>>> wrote:
>>>
 On Sat, Sep 24, 2016 at 10:42:24AM +0300, Evgeniy Ivanov wrote:
> Hi Mathieu,
>
> On Sat, Sep 24, 2016 at 12:59 AM, Mathieu Desnoyers
>  wrote:
> > - On Sep 22, 2016, at 3:14 PM, Evgeniy Ivanov 
> > lolkaanti...@gmail.com wrote:
> >
> >> Hi all,
> >>
> >> I'm investigating high memory usage of my program: RSS varies between
> >> executions in the range 20-50 GB, though it should be deterministic. I've
> >> found that all the memory is allocated in this stack:
> >>
> >> Allocated 17673781248 bytes in 556 allocations
> >>cds_lfht_alloc_bucket_table3 from liburcu-cds.so.2.0.0
> >>_do_cds_lfht_resize  from liburcu-cds.so.2.0.0
> >>do_resize_cb from liburcu-cds.so.2.0.0
> >>call_rcu_thread  from liburcu-qsbr.so.2.0.0
> >>start_thread from libpthread-2.12.so
> >>clone from libc-2.12.so
> >>
> >> According to pstack it should be in a quiescent state. The call_rcu thread
> >> waits on a syscall:
> >> syscall
> >> call_rcu_thread
> >> start_thread
> >> clone
> >>
> >> We use urcu-0.8.7, only rculfhash (QSBR). Is it some kind of leak in
> >> RCU, or is there a chance I misuse it? What would you recommend to
> >> troubleshoot the situation?
> >
> > urcu-qsbr is the fastest flavor of urcu, but it is rather tricky to use
> > well. Make sure that:
> >
> > - Each registered thread periodically reaches a quiescent state, by:
> >   - Invoking rcu_quiescent_state periodically, and
> >   - Making sure to surround any blocking for a relatively large amount of
> > time with rcu_thread_offline()/rcu_thread_online().
> >
> > In urcu-qsbr, the "default" state of threads is to be within an RCU
> > read-side. Therefore, if you omit either of the two pieces of advice
> > above, you end up in a situation where grace periods never complete, and
> > therefore no call_rcu() callbacks can be processed. This effectively acts
> > like a big memory leak.

Re: [lttng-dev] High memory consumption issue on RCU side

2016-09-25 Thread Mathieu Desnoyers
Hi,

Did you enable the CDS_LFHT_ACCOUNTING flag for your hash tables at
creation, or only CDS_LFHT_AUTO_RESIZE?

With only CDS_LFHT_AUTO_RESIZE, the algorithm used in check_resize()
is to verify whether the current chain is longer than CHAIN_LEN_RESIZE_THRESHOLD
(which is currently 3). It effectively detects bucket collisions and
resizes the hash table accordingly.

If you have both the CDS_LFHT_AUTO_RESIZE | CDS_LFHT_ACCOUNTING flags set,
then it goes as follows: for a small table of size below
(1UL << (COUNT_COMMIT_ORDER + split_count_order)), we use the
bucket-chain-length algorithm. This is because the accounting uses
split-counters and amortizes the cost of committing to the global
counter, so it is not precise enough for small tables.
When we are beyond that threshold, we use the overall number of
nodes in the hash table to calculate how we should resize it.

The "resize_target" field of struct cds_lfht (in rculfhash-internal.h)
is a good way to see the number of buckets that were requested at the
last resize. This is not exposed in the public API though. You can
also try compiling rculfhash with -DDEBUG, which will enable debugging
printouts that tell you how the tables are resized. You can deduce the
number of buckets from that information.

So if you expect to have many collisions in your hash table,
I recommend you activate the CDS_LFHT_ACCOUNTING flag.
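For reference, a sketch of enabling both flags at creation time, assuming the urcu 0.8 API where cds_lfht_new() takes the initial, minimum, and maximum bucket counts (0 meaning no maximum), the flags, and an optional pthread attribute:

```c
#include <urcu-qsbr.h>          /* QSBR flavor, as in the reported setup */
#include <urcu/rculfhash.h>

static struct cds_lfht *make_table(void)
{
    /* 1024 initial and minimum buckets, no maximum (0), with both
     * automatic resizing and split-counter node accounting enabled. */
    return cds_lfht_new(1024, 1024, 0,
                        CDS_LFHT_AUTO_RESIZE | CDS_LFHT_ACCOUNTING,
                        NULL);
}
```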

Hoping this clarifies things,

Thanks,

Mathieu

- On Sep 24, 2016, at 2:40 PM, Evgeniy Ivanov lolkaanti...@gmail.com wrote:

> All hash tables are created with 1024 initial buckets (no limit on the
> max number of buckets). Only three tables can contain at most about
> 5,000,000 nodes; the rest (I think about 5000 tables) contain at most
> 1000-5000 nodes. Big tables have a UUID key and CityHash, small tables
> have a complicated binary key with SuperFastHash. Binary keys are the
> same between executions, but UUIDs are generated on the fly, and if
> there are collisions it might explain why the memory footprint varies so
> much.
>
> I've set both min and max bucket limits and now RSS looks constant
> between executions. Thank you very much for pointing to this! Do I
> understand it correctly that, besides the load factor, rculfhash also
> resizes depending on the max number of nodes in any bucket? Is there any
> way to get the number of buckets allocated by a table (sorry if I missed
> it looking into the API)? This would help to further troubleshoot the issue.
> 
> 
> 
> On Sat, Sep 24, 2016 at 6:34 PM, Mathieu Desnoyers
>  wrote:
>> - On Sep 24, 2016, at 11:22 AM, Paul E. McKenney 
>> paul...@linux.vnet.ibm.com
>> wrote:
>>
>>> On Sat, Sep 24, 2016 at 10:42:24AM +0300, Evgeniy Ivanov wrote:
 Hi Mathieu,

 On Sat, Sep 24, 2016 at 12:59 AM, Mathieu Desnoyers
  wrote:
 > - On Sep 22, 2016, at 3:14 PM, Evgeniy Ivanov lolkaanti...@gmail.com 
 > wrote:
 >
 >> Hi all,
 >>
 >> I'm investigating high memory usage of my program: RSS varies between
 >> executions in the range 20-50 GB, though it should be deterministic. I've
 >> found that all the memory is allocated in this stack:
 >>
 >> Allocated 17673781248 bytes in 556 allocations
 >>cds_lfht_alloc_bucket_table3 from liburcu-cds.so.2.0.0
 >>_do_cds_lfht_resize  from liburcu-cds.so.2.0.0
 >>do_resize_cb from liburcu-cds.so.2.0.0
 >>call_rcu_thread  from liburcu-qsbr.so.2.0.0
 >>start_thread from libpthread-2.12.so
 >>clone from libc-2.12.so
 >>
 >> According to pstack it should be in a quiescent state. The call_rcu thread
 >> waits on a syscall:
 >> syscall
 >> call_rcu_thread
 >> start_thread
 >> clone
 >>
 >> We use urcu-0.8.7, only rculfhash (QSBR). Is it some kind of leak in
 >> RCU, or is there a chance I misuse it? What would you recommend to
 >> troubleshoot the situation?
 >
 > urcu-qsbr is the fastest flavor of urcu, but it is rather tricky to use
 > well. Make sure that:
 >
 > - Each registered thread periodically reaches a quiescent state, by:
 >   - Invoking rcu_quiescent_state periodically, and
 >   - Making sure to surround any blocking for a relatively large amount of
 > time with rcu_thread_offline()/rcu_thread_online().
 >
 > In urcu-qsbr, the "default" state of threads is to be within an RCU
 > read-side. Therefore, if you omit either of the two pieces of advice
 > above, you end up in a situation where grace periods never complete, and
 > therefore no call_rcu() callbacks can be processed. This effectively acts
 > like a big memory leak.

 It was the original assumption, but in the memory stacks I don't see such
 allocations for my data. Instead, huge allocations happen right in
 call_rcu_thread. The memory footprint of my app is about 20 GB, and erasing
 RCU data is a rare operation, so almost 20 GB in the rcu thread looks
 suspicious. I'll try not to erase any RCU-protected data and reproduce
 the issue (the complicated thing is that under a memory tracer it happens
 less often).

Re: [lttng-dev] High memory consumption issue on RCU side

2016-09-24 Thread Paul E. McKenney
On Sat, Sep 24, 2016 at 03:34:47PM +, Mathieu Desnoyers wrote:
> - On Sep 24, 2016, at 11:22 AM, Paul E. McKenney 
> paul...@linux.vnet.ibm.com wrote:
> 
> > On Sat, Sep 24, 2016 at 10:42:24AM +0300, Evgeniy Ivanov wrote:
> >> Hi Mathieu,
> >> 
> >> On Sat, Sep 24, 2016 at 12:59 AM, Mathieu Desnoyers
> >>  wrote:
> >> > - On Sep 22, 2016, at 3:14 PM, Evgeniy Ivanov lolkaanti...@gmail.com 
> >> > wrote:
> >> >
> >> >> Hi all,
> >> >>
> >> >> I'm investigating high memory usage of my program: RSS varies between
> >> >> executions in the range 20-50 GB, though it should be deterministic. I've
> >> >> found that all the memory is allocated in this stack:
> >> >>
> >> >> Allocated 17673781248 bytes in 556 allocations
> >> >>cds_lfht_alloc_bucket_table3 from liburcu-cds.so.2.0.0
> >> >>_do_cds_lfht_resize  from liburcu-cds.so.2.0.0
> >> >>do_resize_cb from liburcu-cds.so.2.0.0
> >> >>call_rcu_thread  from liburcu-qsbr.so.2.0.0
> >> >>start_thread from libpthread-2.12.so
> >> >>clone from libc-2.12.so
> >> >>
> >> >> According to pstack it should be in a quiescent state. The call_rcu thread
> >> >> waits on a syscall:
> >> >> syscall
> >> >> call_rcu_thread
> >> >> start_thread
> >> >> clone
> >> >>
> >> >> We use urcu-0.8.7, only rculfhash (QSBR). Is it some kind of leak in
> >> >> RCU, or is there a chance I misuse it? What would you recommend to
> >> >> troubleshoot the situation?
> >> >
> >> > urcu-qsbr is the fastest flavor of urcu, but it is rather tricky to use
> >> > well. Make sure that:
> >> >
> >> > - Each registered thread periodically reaches a quiescent state, by:
> >> >   - Invoking rcu_quiescent_state periodically, and
> >> >   - Making sure to surround any blocking for a relatively large amount of
> >> > time with rcu_thread_offline()/rcu_thread_online().
> >> >
> >> > In urcu-qsbr, the "default" state of threads is to be within an RCU
> >> > read-side. Therefore, if you omit either of the two pieces of advice
> >> > above, you end up in a situation where grace periods never complete, and
> >> > therefore no call_rcu() callbacks can be processed. This effectively acts
> >> > like a big memory leak.
> >> 
> >> It was the original assumption, but in the memory stacks I don't see such
> >> allocations for my data. Instead, huge allocations happen right in
> >> call_rcu_thread. The memory footprint of my app is about 20 GB, and erasing
> >> RCU data is a rare operation, so almost 20 GB in the rcu thread looks
> >> suspicious. I'll try not to erase any RCU-protected data and reproduce
> >> the issue (the complicated thing is that under a memory tracer it happens
> >> less often).
> > 
> > Interesting.  Trying to figure out why your call_rcu_thread() would
> > ever allocate memory.
> > 
> > Ah!  Do your RCU callbacks allocate memory?
> 
> In this case yes: urculfhash allocates memory within a call rcu worker
> thread when a hash table resize is performed.

Is this then expected behavior?

Though I must admit that 20GB sounds like some serious resizing...

Thanx, Paul

> Thanks,
> 
> Mathieu
> 
> > 
> > Thanx, Paul
> > 
> >> > Hoping this helps,
> >> >
> >> > Thanks,
> >> >
> >> > Mathieu
> >> >
> >> >
> >> > --
> >> > Mathieu Desnoyers
> >> > EfficiOS Inc.
> >> > http://www.efficios.com
> >> 
> >> 
> >> 
> >> --
> >> Cheers,
> >> Evgeniy
> 
> -- 
> Mathieu Desnoyers
> EfficiOS Inc.
> http://www.efficios.com
> 

___
lttng-dev mailing list
lttng-dev@lists.lttng.org
https://lists.lttng.org/cgi-bin/mailman/listinfo/lttng-dev


Re: [lttng-dev] High memory consumption issue on RCU side

2016-09-24 Thread Evgeniy Ivanov
All hash tables are created with 1024 initial buckets (no limit on the
max number of buckets). Only three tables can contain at most about
5,000,000 nodes; the rest (I think about 5000 tables) contain at most
1000-5000 nodes. Big tables have a UUID key and CityHash, small tables
have a complicated binary key with SuperFastHash. Binary keys are the
same between executions, but UUIDs are generated on the fly, and if
there are collisions it might explain why the memory footprint varies so
much.

I've set both min and max bucket limits and now RSS looks constant
between executions. Thank you very much for pointing to this! Do I
understand it correctly that, besides the load factor, rculfhash also
resizes depending on the max number of nodes in any bucket? Is there any
way to get the number of buckets allocated by a table (sorry if I missed
it looking into the API)? This would help to further troubleshoot the issue.
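A sketch of what bounding the bucket counts looks like (urcu 0.8 API assumed; the 1UL << 16 cap here is an arbitrary illustration, not a tuned value):

```c
#include <urcu-qsbr.h>
#include <urcu/rculfhash.h>

static struct cds_lfht *make_bounded_table(void)
{
    /* 1024 initial and minimum buckets, capped at 65536 buckets, so a
     * burst of hash collisions cannot balloon the bucket tables. */
    return cds_lfht_new(1024, 1024, 1UL << 16,
                        CDS_LFHT_AUTO_RESIZE, NULL);
}
```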



On Sat, Sep 24, 2016 at 6:34 PM, Mathieu Desnoyers
 wrote:
> - On Sep 24, 2016, at 11:22 AM, Paul E. McKenney 
> paul...@linux.vnet.ibm.com wrote:
>
>> On Sat, Sep 24, 2016 at 10:42:24AM +0300, Evgeniy Ivanov wrote:
>>> Hi Mathieu,
>>>
>>> On Sat, Sep 24, 2016 at 12:59 AM, Mathieu Desnoyers
>>>  wrote:
>>> > - On Sep 22, 2016, at 3:14 PM, Evgeniy Ivanov lolkaanti...@gmail.com 
>>> > wrote:
>>> >
>>> >> Hi all,
>>> >>
>>> >> I'm investigating high memory usage of my program: RSS varies between
>>> >> executions in the range 20-50 GB, though it should be deterministic. I've
>>> >> found that all the memory is allocated in this stack:
>>> >>
>>> >> Allocated 17673781248 bytes in 556 allocations
>>> >>cds_lfht_alloc_bucket_table3 from liburcu-cds.so.2.0.0
>>> >>_do_cds_lfht_resize  from liburcu-cds.so.2.0.0
>>> >>do_resize_cb from liburcu-cds.so.2.0.0
>>> >>call_rcu_thread  from liburcu-qsbr.so.2.0.0
>>> >>start_thread from libpthread-2.12.so
>>> >>clone from libc-2.12.so
>>> >>
>>> >> According to pstack it should be in a quiescent state. The call_rcu thread
>>> >> waits on a syscall:
>>> >> syscall
>>> >> call_rcu_thread
>>> >> start_thread
>>> >> clone
>>> >>
>>> >> We use urcu-0.8.7, only rculfhash (QSBR). Is it some kind of leak in
>>> >> RCU, or is there a chance I misuse it? What would you recommend to
>>> >> troubleshoot the situation?
>>> >
>>> > urcu-qsbr is the fastest flavor of urcu, but it is rather tricky to use
>>> > well. Make sure that:
>>> >
>>> > - Each registered thread periodically reaches a quiescent state, by:
>>> >   - Invoking rcu_quiescent_state periodically, and
>>> >   - Making sure to surround any blocking for a relatively large amount of
>>> > time with rcu_thread_offline()/rcu_thread_online().
>>> >
>>> > In urcu-qsbr, the "default" state of threads is to be within an RCU
>>> > read-side. Therefore, if you omit either of the two pieces of advice
>>> > above, you end up in a situation where grace periods never complete, and
>>> > therefore no call_rcu() callbacks can be processed. This effectively acts
>>> > like a big memory leak.
>>>
>>> It was the original assumption, but in the memory stacks I don't see such
>>> allocations for my data. Instead, huge allocations happen right in
>>> call_rcu_thread. The memory footprint of my app is about 20 GB, and erasing
>>> RCU data is a rare operation, so almost 20 GB in the rcu thread looks
>>> suspicious. I'll try not to erase any RCU-protected data and reproduce
>>> the issue (the complicated thing is that under a memory tracer it happens
>>> less often).
>>
>> Interesting.  Trying to figure out why your call_rcu_thread() would
>> ever allocate memory.
>>
>> Ah!  Do your RCU callbacks allocate memory?
>
> In this case yes: urculfhash allocates memory within a call rcu worker
> thread when a hash table resize is performed.
>
> Thanks,
>
> Mathieu
>
>>
>>   Thanx, Paul
>>
>>> > Hoping this helps,
>>> >
>>> > Thanks,
>>> >
>>> > Mathieu
>>> >
>>> >
>>> > --
>>> > Mathieu Desnoyers
>>> > EfficiOS Inc.
>>> > http://www.efficios.com
>>>
>>>
>>>
>>> --
>>> Cheers,
>>> Evgeniy
>
> --
> Mathieu Desnoyers
> EfficiOS Inc.
> http://www.efficios.com



-- 
Cheers,
Evgeniy


Re: [lttng-dev] High memory consumption issue on RCU side

2016-09-24 Thread Paul E. McKenney
On Sat, Sep 24, 2016 at 10:42:24AM +0300, Evgeniy Ivanov wrote:
> Hi Mathieu,
> 
> On Sat, Sep 24, 2016 at 12:59 AM, Mathieu Desnoyers
>  wrote:
> > - On Sep 22, 2016, at 3:14 PM, Evgeniy Ivanov lolkaanti...@gmail.com 
> > wrote:
> >
> >> Hi all,
> >>
> >> I'm investigating high memory usage of my program: RSS varies between
> >> executions in the range 20-50 GB, though it should be deterministic. I've
> >> found that all the memory is allocated in this stack:
> >>
> >> Allocated 17673781248 bytes in 556 allocations
> >>cds_lfht_alloc_bucket_table3 from liburcu-cds.so.2.0.0
> >>_do_cds_lfht_resize  from liburcu-cds.so.2.0.0
> >>do_resize_cb from liburcu-cds.so.2.0.0
> >>call_rcu_thread  from liburcu-qsbr.so.2.0.0
> >>start_thread from libpthread-2.12.so
> >>clone from libc-2.12.so
> >>
> >> According to pstack it should be in a quiescent state. The call_rcu thread
> >> waits on a syscall:
> >> syscall
> >> call_rcu_thread
> >> start_thread
> >> clone
> >>
> >> We use urcu-0.8.7, only rculfhash (QSBR). Is it some kind of leak in
> >> RCU, or is there a chance I misuse it? What would you recommend to
> >> troubleshoot the situation?
> >
> > urcu-qsbr is the fastest flavor of urcu, but it is rather tricky to use
> > well. Make sure that:
> >
> > - Each registered thread periodically reaches a quiescent state, by:
> >   - Invoking rcu_quiescent_state periodically, and
> >   - Making sure to surround any blocking for a relatively large amount of
> > time with rcu_thread_offline()/rcu_thread_online().
> >
> > In urcu-qsbr, the "default" state of threads is to be within an RCU
> > read-side. Therefore, if you omit either of the two pieces of advice
> > above, you end up in a situation where grace periods never complete, and
> > therefore no call_rcu() callbacks can be processed. This effectively acts
> > like a big memory leak.
> 
> It was the original assumption, but in the memory stacks I don't see such
> allocations for my data. Instead, huge allocations happen right in
> call_rcu_thread. The memory footprint of my app is about 20 GB, and erasing
> RCU data is a rare operation, so almost 20 GB in the rcu thread looks
> suspicious. I'll try not to erase any RCU-protected data and reproduce
> the issue (the complicated thing is that under a memory tracer it happens
> less often).

Interesting.  Trying to figure out why your call_rcu_thread() would
ever allocate memory.

Ah!  Do your RCU callbacks allocate memory?

Thanx, Paul

> > Hoping this helps,
> >
> > Thanks,
> >
> > Mathieu
> >
> >
> > --
> > Mathieu Desnoyers
> > EfficiOS Inc.
> > http://www.efficios.com
> 
> 
> 
> -- 
> Cheers,
> Evgeniy
> 



Re: [lttng-dev] High memory consumption issue on RCU side

2016-09-24 Thread Mathieu Desnoyers
- On Sep 24, 2016, at 11:22 AM, Paul E. McKenney paul...@linux.vnet.ibm.com 
wrote:

> On Sat, Sep 24, 2016 at 10:42:24AM +0300, Evgeniy Ivanov wrote:
>> Hi Mathieu,
>> 
>> On Sat, Sep 24, 2016 at 12:59 AM, Mathieu Desnoyers
>>  wrote:
>> > - On Sep 22, 2016, at 3:14 PM, Evgeniy Ivanov lolkaanti...@gmail.com 
>> > wrote:
>> >
>> >> Hi all,
>> >>
>> >> I'm investigating high memory usage of my program: RSS varies between
>> >> executions in the range 20-50 GB, though it should be deterministic. I've
>> >> found that all the memory is allocated in this stack:
>> >>
>> >> Allocated 17673781248 bytes in 556 allocations
>> >>cds_lfht_alloc_bucket_table3 from liburcu-cds.so.2.0.0
>> >>_do_cds_lfht_resize  from liburcu-cds.so.2.0.0
>> >>do_resize_cb from liburcu-cds.so.2.0.0
>> >>call_rcu_thread  from liburcu-qsbr.so.2.0.0
>> >>start_thread from libpthread-2.12.so
>> >>clone from libc-2.12.so
>> >>
>> >> According to pstack it should be in a quiescent state. The call_rcu thread
>> >> waits on a syscall:
>> >> syscall
>> >> call_rcu_thread
>> >> start_thread
>> >> clone
>> >>
>> >> We use urcu-0.8.7, only rculfhash (QSBR). Is it some kind of leak in
>> >> RCU, or is there a chance I misuse it? What would you recommend to
>> >> troubleshoot the situation?
>> >
>> > urcu-qsbr is the fastest flavor of urcu, but it is rather tricky to use
>> > well. Make sure that:
>> >
>> > - Each registered thread periodically reaches a quiescent state, by:
>> >   - Invoking rcu_quiescent_state periodically, and
>> >   - Making sure to surround any blocking for a relatively large amount of
>> > time with rcu_thread_offline()/rcu_thread_online().
>> >
>> > In urcu-qsbr, the "default" state of threads is to be within an RCU
>> > read-side. Therefore, if you omit either of the two pieces of advice
>> > above, you end up in a situation where grace periods never complete, and
>> > therefore no call_rcu() callbacks can be processed. This effectively acts
>> > like a big memory leak.
>> 
>> It was the original assumption, but in the memory stacks I don't see such
>> allocations for my data. Instead, huge allocations happen right in
>> call_rcu_thread. The memory footprint of my app is about 20 GB, and erasing
>> RCU data is a rare operation, so almost 20 GB in the rcu thread looks
>> suspicious. I'll try not to erase any RCU-protected data and reproduce
>> the issue (the complicated thing is that under a memory tracer it happens
>> less often).
> 
> Interesting.  Trying to figure out why your call_rcu_thread() would
> ever allocate memory.
> 
> Ah!  Do your RCU callbacks allocate memory?

In this case yes: urculfhash allocates memory within a call rcu worker
thread when a hash table resize is performed.

Thanks,

Mathieu

> 
>   Thanx, Paul
> 
>> > Hoping this helps,
>> >
>> > Thanks,
>> >
>> > Mathieu
>> >
>> >
>> > --
>> > Mathieu Desnoyers
>> > EfficiOS Inc.
>> > http://www.efficios.com
>> 
>> 
>> 
>> --
>> Cheers,
>> Evgeniy

-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com


Re: [lttng-dev] High memory consumption issue on RCU side

2016-09-24 Thread Evgeniy Ivanov
Hi Mathieu,

On Sat, Sep 24, 2016 at 12:59 AM, Mathieu Desnoyers
 wrote:
> - On Sep 22, 2016, at 3:14 PM, Evgeniy Ivanov lolkaanti...@gmail.com 
> wrote:
>
>> Hi all,
>>
>> I'm investigating high memory usage of my program: RSS varies between
>> executions in the range 20-50 GB, though it should be deterministic. I've
>> found that all the memory is allocated in this stack:
>>
>> Allocated 17673781248 bytes in 556 allocations
>>cds_lfht_alloc_bucket_table3 from liburcu-cds.so.2.0.0
>>_do_cds_lfht_resize  from liburcu-cds.so.2.0.0
>>do_resize_cb from liburcu-cds.so.2.0.0
>>call_rcu_thread  from liburcu-qsbr.so.2.0.0
>>start_thread from libpthread-2.12.so
>>clone from libc-2.12.so
>>
>> According to pstack it should be in a quiescent state. The call_rcu thread waits on a syscall:
>> syscall
>> call_rcu_thread
>> start_thread
>> clone
>>
>> We use urcu-0.8.7, only rculfhash (QSBR). Is it some kind of leak in
>> RCU, or is there a chance I misuse it? What would you recommend to
>> troubleshoot the situation?
>
> urcu-qsbr is the fastest flavor of urcu, but it is rather tricky to use well.
> Make sure that:
>
> - Each registered thread periodically reaches a quiescent state, by:
>   - Invoking rcu_quiescent_state periodically, and
>   - Making sure to surround any blocking for a relatively large amount of
> time with rcu_thread_offline()/rcu_thread_online().
>
> In urcu-qsbr, the "default" state of threads is to be within an RCU read-side.
> Therefore, if you omit either of the two pieces of advice above, you end up in
> a situation where grace periods never complete, and therefore no call_rcu()
> callbacks can be processed. This effectively acts like a big memory leak.

It was the original assumption, but in the memory stacks I don't see such
allocations for my data. Instead, huge allocations happen right in
call_rcu_thread. The memory footprint of my app is about 20 GB, and erasing
RCU data is a rare operation, so almost 20 GB in the rcu thread looks
suspicious. I'll try not to erase any RCU-protected data and reproduce
the issue (the complicated thing is that under a memory tracer it happens
less often).


> Hoping this helps,
>
> Thanks,
>
> Mathieu
>
>
> --
> Mathieu Desnoyers
> EfficiOS Inc.
> http://www.efficios.com



-- 
Cheers,
Evgeniy


Re: [lttng-dev] High memory consumption issue on RCU side

2016-09-23 Thread Mathieu Desnoyers
- On Sep 22, 2016, at 3:14 PM, Evgeniy Ivanov lolkaanti...@gmail.com wrote:

> Hi all,
> 
> I'm investigating high memory usage of my program: RSS varies between
> executions in the range 20-50 GB, though it should be deterministic. I've
> found that all the memory is allocated in this stack:
> 
> Allocated 17673781248 bytes in 556 allocations
>cds_lfht_alloc_bucket_table3 from liburcu-cds.so.2.0.0
>_do_cds_lfht_resize  from liburcu-cds.so.2.0.0
>do_resize_cb from liburcu-cds.so.2.0.0
>call_rcu_thread  from liburcu-qsbr.so.2.0.0
>start_thread from libpthread-2.12.so
>clone from libc-2.12.so
> 
> According to pstack it should be in a quiescent state. The call_rcu thread waits on a syscall:
> syscall
> call_rcu_thread
> start_thread
> clone
> 
> We use urcu-0.8.7, only rculfhash (QSBR). Is it some kind of leak in
> RCU, or is there a chance I misuse it? What would you recommend to
> troubleshoot the situation?

urcu-qsbr is the fastest flavor of urcu, but it is rather tricky to use well.
Make sure that:

- Each registered thread periodically reaches a quiescent state, by:
  - Invoking rcu_quiescent_state periodically, and
  - Making sure to surround any blocking for a relatively large amount of
time with rcu_thread_offline()/rcu_thread_online().

In urcu-qsbr, the "default" state of threads is to be within an RCU read-side.
Therefore, if you omit either of the two pieces of advice above, you end up in
a situation where grace periods never complete, and therefore no call_rcu()
callbacks can be processed. This effectively acts like a big memory leak.
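A minimal sketch of that per-thread discipline (QSBR flavor; the stop flag and the poll() call are placeholders standing in for the application's own shutdown logic and blocking waits):

```c
#include <poll.h>
#include <urcu-qsbr.h>

static volatile int stop;       /* placeholder: set elsewhere to shut down */

static void *worker(void *arg)
{
    (void) arg;
    rcu_register_thread();      /* every reader thread must register */

    while (!stop) {
        /* ... reads of RCU-protected data happen here; in QSBR the
         * thread is implicitly in a read-side critical section ... */

        /* Announce a quiescent state once per iteration so grace
         * periods can make progress. */
        rcu_quiescent_state();

        /* Mark the thread offline around any long blocking call. */
        rcu_thread_offline();
        poll(NULL, 0, 10);      /* stand-in for a blocking wait */
        rcu_thread_online();
    }

    rcu_unregister_thread();
    return NULL;
}
```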

Hoping this helps,

Thanks,

Mathieu


-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com