On 04/12/2018 05:44 PM, Steve Freyder wrote:
> On 4/12/2018 5:23 AM, Philippe Gerum wrote:
>> On 04/12/2018 11:31 AM, Philippe Gerum wrote:
>>> On 04/09/2018 01:01 AM, Steve Freyder wrote:
>>>> On 4/2/2018 11:51 AM, Philippe Gerum wrote:
>>>>> On 04/02/2018 06:11 PM, Steve Freyder wrote:
>>>>>> On 4/2/2018 10:20 AM, Philippe Gerum wrote:
>>>>>>> On 04/02/2018 04:54 PM, Steve Freyder wrote:
>>>>>>>> On 4/2/2018 8:41 AM, Philippe Gerum wrote:
>>>>>>>>> On 04/01/2018 07:28 PM, Steve Freyder wrote:
>>>>>>>>>> Greetings again.
>>>>>>>>>>
>>>>>>>>>> As I understand it, for each rt_queue there's supposed to be a
>>>>>>>>>> "status file" located in the fuse filesystem underneath the
>>>>>>>>>> "/run/xenomai/user/session/pid/alchemy/queues" directory, with
>>>>>>>>>> the file name being the queue name.  This used to contain very
>>>>>>>>>> useful info about queue status, message counts, etc.  I don't
>>>>>>>>>> know when it broke or whether it's something I'm doing wrong,
>>>>>>>>>> but I'm now getting a "memory exhausted" message on the console
>>>>>>>>>> when I attempt to do a "cat" on the status file.
>>>>>>>>>>
>>>>>>>>>> Here's a small C program that just creates a queue, and then does
>>>>>>>>>> a pause to hold the accessor count non-zero.
>>>>>>>>>>
>>>>>>>>> <snip>
>>>>>>>>>
>>>>>>>>>> The resulting output (logged in via the system console):
>>>>>>>>>>
>>>>>>>>>> # sh qtest.sh
>>>>>>>>>> + sleep 1
>>>>>>>>>> + ./qc --mem-pool-size=64M --session=mysession foo
>>>>>>>>>> + find /run/xenomai
>>>>>>>>>> /run/xenomai
>>>>>>>>>> /run/xenomai/root
>>>>>>>>>> /run/xenomai/root/mysession
>>>>>>>>>> /run/xenomai/root/mysession/821
>>>>>>>>>> /run/xenomai/root/mysession/821/alchemy
>>>>>>>>>> /run/xenomai/root/mysession/821/alchemy/tasks
>>>>>>>>>> /run/xenomai/root/mysession/821/alchemy/tasks/task@1[821]
>>>>>>>>>> /run/xenomai/root/mysession/821/alchemy/queues
>>>>>>>>>> /run/xenomai/root/mysession/821/alchemy/queues/foo
>>>>>>>>>> /run/xenomai/root/mysession/system
>>>>>>>>>> /run/xenomai/root/mysession/system/threads
>>>>>>>>>> /run/xenomai/root/mysession/system/heaps
>>>>>>>>>> /run/xenomai/root/mysession/system/version
>>>>>>>>>> + qfile='/run/xenomai/*/*/*/alchemy/queues/foo'
>>>>>>>>>> + cat /run/xenomai/root/mysession/821/alchemy/queues/foo
>>>>>>>>>> memory exhausted
>>>>>>>>>>
>>>>>>>>>> At this point, it hangs, although SIGINT usually terminates it.
>>>>>>>>>>
>>>>>>>>>> I've seen some cases where SIGINT won't terminate it, and a
>>>>>>>>>> reboot is required to clean things up.  I see this message
>>>>>>>>>> appears to be logged in the obstack error handler.  I don't
>>>>>>>>>> think I'm running out of memory, which makes me think "heap
>>>>>>>>>> corruption".  Not much of an analysis!  I did try varying
>>>>>>>>>> queue sizes and max message counts - no change.
>>>>>>>>>>
I can't reproduce this. I would suspect a rampant memory
corruption too, although running the test code over valgrind
(mercury build) did not reveal any issue.
>>>>>>>>>
>>>>>>>>> - which Xenomai version are you using?
>>>>>>>>> - cobalt / mercury ?
- do you enable the shared heap when configuring (--enable-pshared)?
>>>>>>>>>
>>>>>>>> I'm using Cobalt.  uname -a reports:
>>>>>>>>
Linux sdftest 4.1.18_C01571-15S00-00.000.zimg+83fdace666 #2 SMP
Fri Mar 9 11:07:52 CST 2018 armv7l GNU/Linux
>>>>>>>>
>>>>>>>> Here is the config dump:
>>>>>>>>
>>>>>>>> CONFIG_XENO_PSHARED=1
Any chance you could have some leftover files in /dev/shm from
aborted runs, which would steal RAM?
>>>>>>>
I've been rebooting before each test run, but I'll keep that in
mind for future testing.
>>>>>>
Sounds like I need to try rolling back to an older build; I have
a 3.0.5 and a 3.0.3 build handy.
The standalone test should work with the shared heap disabled.
Could you check it against a build configured with
--disable-pshared? Thanks,
>>>>>
>>>> Philippe,
>>>>
Sorry for the delay - our vendor had been doing all of our kernel
and SDK builds, so I had to do a lot of learning to get this all
going.
>>>>
>>>> With the --disable-pshared in effect:
>>>>
>>>> /.g3l # ./qc --dump-config | grep SHARED
>>>> based on Xenomai/cobalt v3.0.6 -- #6e34bb5 (2018-04-01 10:50:59 +0200)
>>>> CONFIG_XENO_PSHARED is OFF
>>>>
>>>> /.g3l # ./qc foo &
>>>> /.g3l # find /run/xenomai/
>>>> /run/xenomai/
>>>> /run/xenomai/root
>>>> /run/xenomai/root/opus
>>>> /run/xenomai/root/opus/3477
>>>> /run/xenomai/root/opus/3477/alchemy
>>>> /run/xenomai/root/opus/3477/alchemy/tasks
>>>> /run/xenomai/root/opus/3477/alchemy/tasks/qcreate3477
>>>> /run/xenomai/root/opus/3477/alchemy/queues
>>>> /run/xenomai/root/opus/3477/alchemy/queues/foo
>>>> /run/xenomai/root/opus/system
>>>> /run/xenomai/root/opus/system/threads
>>>> /run/xenomai/root/opus/system/heaps
>>>> /run/xenomai/root/opus/system/version
>>>> root@ICB-G3L:/.g3l # cat run/xenomai/root/opus/3477/alchemy/queues/foo
>>>> [TYPE]  [TOTALMEM]  [USEDMEM]  [QLIMIT]  [MCOUNT]
>>>>   FIFO        5344       3248        10         0
>>>>
>>>> Perfect!
>>>>
>>>> What's the next step?
>>>>
>>> I can reproduce this issue. I'm on it.
>>>
>> The patch below should solve the problem for the registry, however this
>> may have uncovered a bug in the "tlsf" allocator (once again), which
>> should not have failed allocating memory. Two separate issues then.
>>
diff --git a/include/copperplate/registry-obstack.h b/include/copperplate/registry-obstack.h
>> index fe192faf7..48e453bc3 100644
>> --- a/include/copperplate/registry-obstack.h
>> +++ b/include/copperplate/registry-obstack.h
>> @@ -29,11 +29,12 @@ struct threadobj;
>>   struct syncobj;
>>
>>   /*
>> - * Assume we may want fast allocation of private memory from real-time
>> - * mode when growing the obstack.
>> + * Obstacks are grown from handlers called by the fusefs server
>> + * thread, which has no real-time requirement: malloc/free is fine for
>> + * memory management.
>>    */
>> -#define obstack_chunk_alloc    pvmalloc
>> -#define obstack_chunk_free    pvfree
>> +#define obstack_chunk_alloc    malloc
>> +#define obstack_chunk_free    free
>>
>>   struct threadobj;
>>
>>
> Thanks Philippe,
> 
I shall add this to my build ASAP.  If I understand correctly,
this is switching the entire registry-obstack-related dynamic
storage allocation mechanism from the "pv" routines (TLSF
allocator?) to standard malloc/free.

Yes, because the context that runs those calls is a non-rt one in
essence, i.e. the fuse filesystem server thread.

> 
> I ask because my next issue report was going to be about a SEGV that
> I have been seeing occasionally in registry_add_file() after it calls
> pvstrdup() and gets a NULL return back.  The caller there apparently
> does not expect a NULL return, so when you said "should not have
> failed allocating memory" that brought my attention back to the SEGV
> issue.  This appears to be related to what I will call "heavy
> registry activity" when I am initializing - creating lots of RT
> tasks, queues, mutexes, etc., which I would expect causes hot
> activity in registry_add_file().

Your assumption is spot on. There is a flaw in the way the private
memory allocator (i.e. tlsf in that case) is sized in pshared mode.
Only 8k are reserved for the main private pool pvmalloc() refers to;
that size was picked to limit the amount of locked memory reserved by
an allocator which is usually under low pressure when pshared is
enabled. The consequence of random pvmalloc() calls implicitly pulling
memory from that heap was overlooked.

The problem is that heavy registry activity defeats exactly that
assumption, consuming much more memory as obstacks grow with printout data.

I've lately been converting pvmalloc() calls to plain malloc() where
applicable, but we may still need a way to let the user specify how
much private memory should be reserved for the remaining pvmalloc()
calls.

> 
> I'm thinking creation of a "registry exerciser" program may be in order...
> 
> 

Yeah, possibly. The fuse-based registry was assumed to be
straightforward to implement in theory, but it turned out to involve
plenty of nasty corner cases.
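
A rough starting point for such an exerciser could look like the
sketch below. This is hypothetical and untested: the object count,
names and pool sizes are arbitrary, and it assumes an Alchemy build
(e.g. compiled with `xeno-config --skin=alchemy --cflags --ldflags`):

```c
#include <stdio.h>
#include <unistd.h>
#include <alchemy/queue.h>
#include <alchemy/mutex.h>

#define NOBJS 128	/* arbitrary; raise it to stress the registry harder */

static RT_QUEUE queues[NOBJS];
static RT_MUTEX mutexes[NOBJS];

int main(void)
{
	char name[32];
	int i, ret;

	/* Create many named objects in a burst, so that
	 * registry_add_file() runs hot during initialization. */
	for (i = 0; i < NOBJS; i++) {
		snprintf(name, sizeof(name), "q%d", i);
		ret = rt_queue_create(queues + i, name, 16384, 10, Q_FIFO);
		if (ret) {
			fprintf(stderr, "rt_queue_create(%s): %d\n", name, ret);
			return 1;
		}
		snprintf(name, sizeof(name), "m%d", i);
		ret = rt_mutex_create(mutexes + i, name);
		if (ret) {
			fprintf(stderr, "rt_mutex_create(%s): %d\n", name, ret);
			return 1;
		}
	}

	pause();	/* keep the registry entries alive for inspection */

	return 0;
}
```

While it sleeps, a `find /run/xenomai` plus a `cat` over each status
file would exercise the obstack paths discussed above as well.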

-- 
Philippe.

_______________________________________________
Xenomai mailing list
Xenomai@xenomai.org
https://xenomai.org/mailman/listinfo/xenomai
