On Fri, Dec 2, 2022 at 11:59 AM Ilya Maximets <i.maxim...@ovn.org> wrote:
>
> On 12/2/22 11:36, Maxime Coquelin wrote:
> >
> >
> > On 12/2/22 11:09, David Marchand wrote:
> >> On Wed, Nov 30, 2022 at 9:30 PM Ilya Maximets <i.maxim...@ovn.org> wrote:
> >>>>>>> Shouldn't this be 0x7f instead?
> >>>>>>> 0x3f doesn't enable bit #6, which is responsible for dumping
> >>>>>>> shared huge pages.  Or am I missing something?
> >>>>>>
> >>>>>> That's a good point; the hugepage may or may not be private. I'll
> >>>>>> send a new revision.
> >>>>>
> >>>>> OK.  One thing to think about, though, is that we'll also grab
> >>>>> VM memory, I guess, if we have vhost-user ports.
> >>>>> So the core dump size can become insanely huge.
> >>>>>
> >>>>> The downside of not having those pages is the inability to inspect
> >>>>> virtqueues and related structures in the dump.
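
For reference, a minimal sketch of what setting the filter could look like
from the application side; this is illustrative only and not the actual
patch.  Per core(5), bit 5 covers private huge pages and bit 6 covers shared
ones, so 0x7f enables both while 0x3f leaves shared huge pages out:

/*
 * Illustrative only -- not the actual OVS patch.  Enable core dumping of
 * private (bit 5) and shared (bit 6) huge pages by writing 0x7f to
 * /proc/self/coredump_filter; 0x3f would leave shared huge pages out.
 * See core(5) for the full bit layout.
 */
#include <stdio.h>

static int
set_coredump_filter(unsigned long mask)
{
    FILE *f = fopen("/proc/self/coredump_filter", "w");

    if (!f) {
        return -1;
    }
    fprintf(f, "0x%lx", mask);
    return fclose(f);
}

int
main(void)
{
    return set_coredump_filter(0x7f) ? 1 : 0;
}
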
> >>>>
> >>>> Did you consider madvise()?
> >>>>
> >>>>         MADV_DONTDUMP (since Linux 3.4)
> >>>>                Exclude from a core dump those pages in the range
> >>>>                specified by addr and length.  This is useful in
> >>>>                applications that have large areas of memory that are
> >>>>                known not to be useful in a core dump.  The effect of
> >>>>                MADV_DONTDUMP takes precedence over the bit mask that is
> >>>>                set via the /proc/[pid]/coredump_filter file (see
> >>>>                core(5)).
> >>>>
> >>>>         MADV_DODUMP (since Linux 3.4)
> >>>>                Undo the effect of an earlier MADV_DONTDUMP.
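
For reference, a minimal sketch of the two calls quoted above, using a plain
anonymous mapping rather than guest memory (illustrative only):

/*
 * Minimal sketch: exclude a large mapping from core dumps, then re-enable
 * dumping for one page of it.  Plain anonymous memory is used here for
 * illustration, not guest memory.
 */
#include <stdlib.h>
#include <sys/mman.h>
#include <unistd.h>

int
main(void)
{
    size_t len = 64 * 1024 * 1024;
    long page = sysconf(_SC_PAGESIZE);
    void *mem = mmap(NULL, len, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

    if (mem == MAP_FAILED) {
        return EXIT_FAILURE;
    }

    /* Exclude the whole region from core dumps... */
    madvise(mem, len, MADV_DONTDUMP);
    /* ...then undo that for a single page we still want to inspect. */
    madvise(mem, page, MADV_DODUMP);

    munmap(mem, len);
    return EXIT_SUCCESS;
}
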
> >>>
> >>> I don't think OVS actually knows the location of the particular VM
> >>> memory pages that we do not need.  And dumping virtqueues and related
> >>> data is, probably, the point of this patch (?).
> >>>
> >>> The vhost-user library might have a better idea of which particular
> >>> parts of the memory the guest may use for virtqueues and buffers, but
> >>> I'm not 100% sure.
> >>
> >> Yes, distinguishing the hugepages of interest is a problem.
> >>
> >> Since v20.05, the DPDK memory allocator takes care of excluding (unused)
> >> hugepages from dumps.
> >> So with this OVS patch, if we catch both private and shared hugepages,
> >> the "interesting" DPDK hugepages will get dumped, which is useful for
> >> post-mortem debugging.
> >>
> >> Adding Maxime, who will have a better idea of what is possible for the
> >> guest mapping part.
> >>
> >>
> >
> > I wonder if we could do a MADV_DONTDUMP on all the guest memory at mmap
> > time; then there are two cases:
> >   a. vIOMMU = OFF. In this case we could do MADV_DODUMP on the virtqueue
> > memory. Doing so, we would have the ring memory, but not the buffers
> > (except when they happen to sit on the same hugepages).
> >   b. vIOMMU = ON. In this case we could do MADV_DODUMP on new
> > IOTLB_UPDATE entries and MADV_DONTDUMP on invalidated entries. Doing so,
> > we would get both the vrings and the buffers the backend is allowed to
> > access.
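
For reference, a rough, hypothetical sketch of option (a); the helper names
below are made up and do not correspond to actual vhost library code, and a
real implementation would align to the backing (huge) page size and handle
the vIOMMU/IOTLB case as described in (b):

/*
 * Hypothetical sketch of option (a) -- names are made up and do not match
 * the vhost library.  Mark the whole guest region MADV_DONTDUMP when it is
 * mapped, then MADV_DODUMP the aligned span covering a vring once its host
 * address is known.
 */
#include <stdint.h>
#include <sys/mman.h>
#include <unistd.h>

static void
guest_region_exclude_from_dump(void *host_addr, size_t size)
{
    madvise(host_addr, size, MADV_DONTDUMP);
}

static void
vring_include_in_dump(void *ring_host_addr, size_t ring_size)
{
    uintptr_t mask = (uintptr_t) sysconf(_SC_PAGESIZE) - 1;
    uintptr_t start = (uintptr_t) ring_host_addr & ~mask;
    uintptr_t end = ((uintptr_t) ring_host_addr + ring_size + mask) & ~mask;

    madvise((void *) start, end - start, MADV_DODUMP);
}
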
>
> I guess that, while the DONTDUMP calls are mostly harmless, an explicit
> DODUMP will override whatever the user had in their global configuration.
> Meaning every DPDK application with vhost ports will start dumping some of
> the guest pages with no actual way to turn that off.

I initially thought it would work that way, but MADV_DODUMP just clears an
earlier DONTDUMP flag; it does not override the coredump_filter settings.

https://github.com/torvalds/linux/blob/master/mm/madvise.c#L1055
https://github.com/torvalds/linux/blob/master/fs/coredump.c#L1033
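
In case it is useful, a small check of that reading (assuming the "dd"
mnemonic in the VmFlags line of /proc/self/smaps corresponds to
VM_DONTDUMP); which page types actually end up in the dump remains governed
by /proc/self/coredump_filter:

/*
 * Quick check: after MADV_DONTDUMP the mapping's VmFlags line in
 * /proc/self/smaps should gain "dd", and MADV_DODUMP should remove it
 * again.
 */
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>

/* Print the VmFlags line of the smaps entry covering 'addr'. */
static void
print_vmflags(void *addr)
{
    FILE *f = fopen("/proc/self/smaps", "r");
    char line[512];
    int match = 0;

    if (!f) {
        return;
    }
    while (fgets(line, sizeof line, f)) {
        uintptr_t lo, hi;

        if (sscanf(line, "%" SCNxPTR "-%" SCNxPTR, &lo, &hi) == 2) {
            match = (uintptr_t) addr >= lo && (uintptr_t) addr < hi;
        } else if (match && !strncmp(line, "VmFlags:", 8)) {
            fputs(line, stdout);
        }
    }
    fclose(f);
}

int
main(void)
{
    size_t len = 2 * 1024 * 1024;
    void *mem = mmap(NULL, len, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

    if (mem == MAP_FAILED) {
        return 1;
    }
    madvise(mem, len, MADV_DONTDUMP);
    print_vmflags(mem);               /* VmFlags should include "dd". */
    madvise(mem, len, MADV_DODUMP);
    print_vmflags(mem);               /* "dd" should be gone again.   */
    return 0;
}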

Cheers,
M

>
> Can the behavior be configurable?
>
> >
> > I can prepare a PoC quickly if someone is willing to experiment.
> >
> > Regards,
> > Maxime
> >
> >
>
