On 12/2/22 21:14, Mike Pattrick wrote:
> On Fri, Dec 2, 2022 at 1:40 PM Ilya Maximets <i.maxim...@ovn.org> wrote:
>>
>> On 12/2/22 18:59, Mike Pattrick wrote:
>>> On Fri, Dec 2, 2022 at 11:59 AM Ilya Maximets <i.maxim...@ovn.org> wrote:
>>>>
>>>> On 12/2/22 11:36, Maxime Coquelin wrote:
>>>>>
>>>>>
>>>>> On 12/2/22 11:09, David Marchand wrote:
>>>>>> On Wed, Nov 30, 2022 at 9:30 PM Ilya Maximets <i.maxim...@ovn.org> wrote:
>>>>>>>>>>> Shouldn't this be 0x7f instead?
>>>>>>>>>>> 0x3f doesn't enable bit #6, which is responsible for dumping
>>>>>>>>>>> shared huge pages.  Or am I missing something?
>>>>>>>>>>
>>>>>>>>>> That's a good point, the hugepage may or may not be private. I'll send
>>>>>>>>>> in a new one.
>>>>>>>>>
>>>>>>>>> OK.  One thing to think about though is that we'll grab
>>>>>>>>> VM memory, I guess, in case we have vhost-user ports.
>>>>>>>>> So, the core dump size can become insanely huge.
>>>>>>>>>
>>>>>>>>> The downside of not having them is the inability to inspect
>>>>>>>>> virtqueues and stuff in the dump.
>>>>>>>>
>>>>>>>> Did you consider madvise()?
>>>>>>>>
>>>>>>>>         MADV_DONTDUMP (since Linux 3.4)
>>>>>>>>                Exclude from a core dump those pages in the range
>>>>>>>>                specified by addr and length.  This is useful in
>>>>>>>>                applications that have large areas of memory that are
>>>>>>>>                known not to be useful in a core dump.  The effect of
>>>>>>>>                MADV_DONTDUMP takes precedence over the bit mask that
>>>>>>>>                is set via the /proc/[pid]/coredump_filter file (see
>>>>>>>>                core(5)).
>>>>>>>>
>>>>>>>>         MADV_DODUMP (since Linux 3.4)
>>>>>>>>                Undo the effect of an earlier MADV_DONTDUMP.
>>>>>>>
>>>>>>> I don't think OVS actually knows the location of the particular VM
>>>>>>> memory pages that we do not need.  And dumping virtqueues and stuff
>>>>>>> is, probably, the point of this patch (?).
>>>>>>>
>>>>>>> vhost-user library might have a better idea on which particular parts
>>>>>>> of the memory guest may use for virtqueues and buffers, but I'm not
>>>>>>> 100% sure.
>>>>>>
>>>>>> Yes, distinguishing hugepages of interest is a problem.
>>>>>>
>>>>>> Since v20.05, DPDK mem allocator takes care of excluding (unused)
>>>>>> hugepages from dump.
>>>>>> So with this OVS patch, if we catch private and shared hugepages,
>>>>>> "interesting" DPDK hugepages will get dumped, which is useful for
>>>>>> debugging post mortem.
>>>>>>
>>>>>> Adding Maxime, who will have a better idea of what is possible for the
>>>>>> guest mapping part.
>>>>>>
>>>>>>
>>>>>
>>>>> I wonder if we could do a MADV_DONTDUMP on all the guest memory at mmap
>>>>> time, then there are two cases:
>>>>>   a. vIOMMU = OFF. In this case we could do MADV_DODUMP on virtqueues
>>>>> memory. Doing so, we would have the rings memory, but not their buffers
>>>>> (except if they are located on same hugepages).
>>>>>   b. vIOMMU = ON. In this case we could do MADV_DODUMP on IOTLB_UPDATE
>>>>> new entries and MADV_DONTDUMP on invalidated entries. Doing so we will
>>>>> get both vrings and their buffers the backend is allowed to access.
>>>>
>>>> I guess, while DONTDUMP calls are mainly harmless, the explicit DODUMP
>>>> will override whatever the user had in their global configuration.
>>>> Meaning every DPDK application with vhost ports will start dumping some
>>>> of the guest pages with no actual ability to turn that off.
>>>
>>> I initially thought it would work that way, but the DODUMP flag just
>>> disables the DONTDUMP flag.
>>>
>>> https://github.com/torvalds/linux/blob/master/mm/madvise.c#L1055
>>> https://github.com/torvalds/linux/blob/master/fs/coredump.c#L1033
>>
>> Hmm, interesting.  Makes sense.
>>
>> Thanks for the pointers!
>>
>> So, it should still be 0x7f regardless in the coredump filter for OVS, right?
>> Do you plan to update the current patch or do you think we should omit
>> shared pages until support for MADV_DO/DONTDUMP is added to vhost library?
>>
>> Note that this will likely not be available in 22.11 as it's not a bug fix.
>> So, 23.11 at the earliest.
>>
>> Basically 2 options:
>>
>> 1. 0x3f and not having shared pages.  Flip to 0x7f with DPDK 23.11 next year.
>>    Pros: Smaller files
>>    Cons: Missing some of the virtqueue memory until [potentially] DPDK 23.11.
>>
>> 2. 0x7f today.
>>    Pros: All the memory is available.
>>    Cons: [Significantly] larger files until [potentially] DPDK 23.11.
>>
>> What do you think?  David, Maxime?
> 
> I'd prefer 7f today. It's disabled by default, has zero impact on end
> users, makes setting up debugging environments more convenient, and on
> distributions with systemd the larger coredumps are managed somewhat
> automatically. The news item already warns about large coredumps.
> 
> WDYT?

Sounds good to me.
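
For reference, a minimal sketch of what the two masks discussed above
select, assuming the bit layout documented in core(5).  The enum names and
both helper functions are made up for illustration and are not taken from
the OVS patch.

```c
#include <stdio.h>

/* Bit positions from core(5) for /proc/[pid]/coredump_filter.
 * The names are illustrative only. */
enum {
    CD_ANON_PRIVATE = 1 << 0,   /* anonymous private mappings */
    CD_ANON_SHARED  = 1 << 1,   /* anonymous shared mappings */
    CD_FILE_PRIVATE = 1 << 2,   /* file-backed private mappings */
    CD_FILE_SHARED  = 1 << 3,   /* file-backed shared mappings */
    CD_ELF_HEADERS  = 1 << 4,   /* ELF headers */
    CD_HUGE_PRIVATE = 1 << 5,   /* private huge pages */
    CD_HUGE_SHARED  = 1 << 6    /* shared huge pages */
};

/* Returns non-zero if the mask asks the kernel to dump shared huge pages. */
int dumps_shared_hugepages(unsigned int filter)
{
    return (filter & CD_HUGE_SHARED) != 0;
}

/* Hypothetical helper: write a mask for the calling process.
 * Returns 0 on success, -1 on failure. */
int set_coredump_filter(unsigned int filter)
{
    FILE *f = fopen("/proc/self/coredump_filter", "w");
    int ok;

    if (!f) {
        return -1;
    }
    ok = fprintf(f, "0x%x\n", filter) > 0;
    if (fclose(f) != 0) {
        ok = 0;
    }
    return ok ? 0 : -1;
}
```

With these definitions 0x3f covers bits 0-5 only, and 0x7f is
0x3f | CD_HUGE_SHARED, i.e. the only difference is bit #6.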

> 
> -M
> 
>>>
>>> Cheers,
>>> M
>>>
>>>>
>>>> Can the behavior be configurable?
>>>>
>>>>>
>>>>> I can prepare a PoC quickly if someone is willing to experiment.
>>>>>
>>>>> Regards,
>>>>> Maxime
>>>>>
>>>>>
>>>>
>>>
>>
> 
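
To make the MADV_DONTDUMP / MADV_DODUMP idea from the thread concrete,
here is a rough sketch under stated assumptions: an anonymous mapping
stands in for guest memory, and both function names are hypothetical.
This is not what the vhost library does today, only an illustration of
the two madvise() calls.

```c
#define _DEFAULT_SOURCE 1
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <unistd.h>

/* Map an anonymous region (a stand-in for guest memory) and exclude all
 * of it from core dumps.  Returns NULL on failure. */
void *map_region_excluded_from_dump(size_t len)
{
    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

    if (p == MAP_FAILED) {
        return NULL;
    }
    if (madvise(p, len, MADV_DONTDUMP) != 0) {
        munmap(p, len);
        return NULL;
    }
    return p;
}

/* Clear the "don't dump" marking on the pages covering [addr, addr + len),
 * e.g. the pages holding a virtqueue.  madvise() needs a page-aligned
 * start, so the range is widened down to a page boundary.
 * Returns 0 on success, -1 on failure. */
int include_range_in_dump(void *addr, size_t len)
{
    uintptr_t page = (uintptr_t) sysconf(_SC_PAGESIZE);
    uintptr_t start = (uintptr_t) addr & ~(page - 1);
    size_t span = len + ((uintptr_t) addr - start);

    return madvise((void *) start, span, MADV_DODUMP);
}
```

As noted earlier in the thread, MADV_DODUMP only undoes an earlier
MADV_DONTDUMP; whether the pages end up in the dump is still decided by
the coredump_filter mask.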

_______________________________________________
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev