On Fri, Dec 2, 2022 at 11:59 AM Ilya Maximets <i.maxim...@ovn.org> wrote:
>
> On 12/2/22 11:36, Maxime Coquelin wrote:
> >
> >
> > On 12/2/22 11:09, David Marchand wrote:
> >> On Wed, Nov 30, 2022 at 9:30 PM Ilya Maximets <i.maxim...@ovn.org> wrote:
> >>>>>>> Shouldn't this be 0x7f instead?
> >>>>>>> 0x3f doesn't enable bit #6, which is responsible for dumping
> >>>>>>> shared huge pages. Or am I missing something?
> >>>>>>
> >>>>>> That's a good point, the hugepage may or may not be private. I'll send
> >>>>>> in a new one.
> >>>>>
> >>>>> OK. One thing to think about though is that we'll grab
> >>>>> VM memory, I guess, in case we have vhost-user ports.
> >>>>> So, the core dump size can become insanely huge.
> >>>>>
> >>>>> The downside of not having them is inability to inspect
> >>>>> virtqueues and stuff in the dump.
> >>>>
> >>>> Did you consider madvise()?
> >>>>
> >>>> MADV_DONTDUMP (since Linux 3.4)
> >>>> Exclude from a core dump those pages in the range
> >>>> specified by addr and length. This is useful in applications that
> >>>> have large areas of memory that are known not to be useful in a core
> >>>> dump. The effect of MADV_DONTDUMP takes precedence over the
> >>>> bit mask that is set via
> >>>> the /proc/[pid]/coredump_filter file (see core(5)).
> >>>>
> >>>> MADV_DODUMP (since Linux 3.4)
> >>>> Undo the effect of an earlier MADV_DONTDUMP.
> >>>
> >>> I don't think OVS actually knows location of particular VM memory
> >>> pages that we do not need. And dumping virtqueues and stuff is,
> >>> probably, the point of this patch (?).
> >>>
> >>> vhost-user library might have a better idea on which particular parts
> >>> of the memory guest may use for virtqueues and buffers, but I'm not
> >>> 100% sure.
> >>
> >> Yes, distinguishing hugepages of interest is a problem.
> >>
> >> Since v20.05, DPDK mem allocator takes care of excluding (unused)
> >> hugepages from dump.
> >> So with this OVS patch, if we catch private and shared hugepages,
> >> "interesting" DPDK hugepages will get dumped, which is useful for
> >> debugging post mortem.
> >>
> >> Adding Maxime, who will have a better idea of what is possible for the
> >> guest mapping part.
> >>
> >>
> >
> > I wonder if we could do a MADV_DONTDUMP on all the guest memory at mmap
> > time, then there are two cases:
> > a. vIOMMU = OFF. In this case we could do MADV_DODUMP on virtqueues
> > memory. Doing so, we would have the rings memory, but not their buffers
> > (except if they are located on same hugepages).
> > b. vIOMMU = ON. In this case we could do MADV_DODUMP on IOTLB_UPDATE
> > new entries and MADV_DONTDUMP on invalidated entries. Doing so we will
> > get both vrings and their buffers the backend is allowed to access.
>
> I guess, while DONTDUMP calls are mainly harmless, the explicit DODUMP
> will override whatever user had in their global configuration. Meaning
> every DPDK application with vhost ports will start dumping some of the
> guest pages with no actual ability to turn that off.

I initially thought it would work that way, but MADV_DODUMP only clears
the flag set by MADV_DONTDUMP. It does not force the pages into the dump,
so the coredump_filter setting still applies.

https://github.com/torvalds/linux/blob/master/mm/madvise.c#L1055
https://github.com/torvalds/linux/blob/master/fs/coredump.c#L1033

Cheers,
M

>
> Can the behavior be configurable?
>
> >
> > I can prepare a PoC quickly if someone is willing to experiment.
> >
> > Regards,
> > Maxime
> >
>

_______________________________________________
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev
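
For anyone who wants to check the MADV_DONTDUMP / MADV_DODUMP behavior from
userspace, here is a minimal sketch. It is not taken from OVS or DPDK. The
helper names (region_has_dontdump, demo_dontdump_roundtrip) are made up for
illustration, and it assumes a Linux kernel that exposes the VmFlags line in
/proc/self/smaps, where "dd" marks a mapping excluded from core dumps.

```c
#define _GNU_SOURCE
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>

/* Hypothetical helper: returns 1 if the mapping containing 'addr' carries
 * the "dd" (do not dump) flag in /proc/self/smaps, 0 if it does not, and
 * -1 if the information is unavailable (no smaps, no VmFlags line). */
int region_has_dontdump(const void *addr)
{
    unsigned long a = (unsigned long)(uintptr_t)addr;
    FILE *f = fopen("/proc/self/smaps", "r");
    char line[1024];
    bool in_region = false;
    int result = -1;

    if (!f) {
        return -1;
    }
    while (fgets(line, sizeof line, f)) {
        unsigned long start, end;

        if (sscanf(line, "%lx-%lx ", &start, &end) == 2) {
            /* Header line of a mapping: "start-end perms offset ...". */
            in_region = (start <= a && a < end);
        } else if (in_region && strncmp(line, "VmFlags:", 8) == 0) {
            result = strstr(line, " dd") != NULL;
            break;
        }
    }
    fclose(f);
    return result;
}

/* Hypothetical demo: map an anonymous region, exclude it from core dumps,
 * then undo that. Reports the observed "dd" state after each step.
 * Returns 0 if both madvise() calls succeeded, -1 otherwise. */
int demo_dontdump_roundtrip(int *after_dontdump, int *after_dodump)
{
    size_t len = 2 * 1024 * 1024;
    int ret = -1;
    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

    if (p == MAP_FAILED) {
        return -1;
    }
    /* Sets VM_DONTDUMP: the region is skipped whatever coredump_filter says. */
    if (madvise(p, len, MADV_DONTDUMP) == 0) {
        *after_dontdump = region_has_dontdump(p);
        /* Clears VM_DONTDUMP only: coredump_filter decides again. */
        if (madvise(p, len, MADV_DODUMP) == 0) {
            *after_dodump = region_has_dontdump(p);
            ret = 0;
        }
    }
    munmap(p, len);
    return ret;
}
```

Calling demo_dontdump_roundtrip() should show the "dd" flag present after the
first madvise() and gone after the second. Once the flag is cleared, whether
the pages end up in a core dump is again up to /proc/[pid]/coredump_filter.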