On 12/2/22 21:14, Mike Pattrick wrote:
> On Fri, Dec 2, 2022 at 1:40 PM Ilya Maximets <i.maxim...@ovn.org> wrote:
>>
>> On 12/2/22 18:59, Mike Pattrick wrote:
>>> On Fri, Dec 2, 2022 at 11:59 AM Ilya Maximets <i.maxim...@ovn.org> wrote:
>>>>
>>>> On 12/2/22 11:36, Maxime Coquelin wrote:
>>>>>
>>>>>
>>>>> On 12/2/22 11:09, David Marchand wrote:
>>>>>> On Wed, Nov 30, 2022 at 9:30 PM Ilya Maximets <i.maxim...@ovn.org> wrote:
>>>>>>>>>>> Shouldn't this be 0x7f instead?
>>>>>>>>>>> 0x3f doesn't enable bit #6, which is responsible for dumping
>>>>>>>>>>> shared huge pages. Or am I missing something?
>>>>>>>>>>
>>>>>>>>>> That's a good point, the hugepage may or may not be private. I'll
>>>>>>>>>> send in a new one.
>>>>>>>>>
>>>>>>>>> OK. One thing to think about though is that we'll grab
>>>>>>>>> VM memory, I guess, in case we have vhost-user ports.
>>>>>>>>> So, the core dump size can become insanely huge.
>>>>>>>>>
>>>>>>>>> The downside of not having them is inability to inspect
>>>>>>>>> virtqueues and stuff in the dump.
>>>>>>>>
>>>>>>>> Did you consider madvise()?
>>>>>>>>
>>>>>>>> MADV_DONTDUMP (since Linux 3.4)
>>>>>>>>     Exclude from a core dump those pages in the range
>>>>>>>>     specified by addr and length. This is useful in applications that
>>>>>>>>     have large areas of memory that are known not to be useful in a core
>>>>>>>>     dump. The effect of MADV_DONTDUMP takes precedence over the bit
>>>>>>>>     mask that is set via the /proc/[pid]/coredump_filter file
>>>>>>>>     (see core(5)).
>>>>>>>>
>>>>>>>> MADV_DODUMP (since Linux 3.4)
>>>>>>>>     Undo the effect of an earlier MADV_DONTDUMP.
>>>>>>>
>>>>>>> I don't think OVS actually knows location of particular VM memory
>>>>>>> pages that we do not need. And dumping virtqueues and stuff is,
>>>>>>> probably, the point of this patch (?).
>>>>>>>
>>>>>>> vhost-user library might have a better idea on which particular parts
>>>>>>> of the memory guest may use for virtqueues and buffers, but I'm not
>>>>>>> 100% sure.
>>>>>>
>>>>>> Yes, distinguishing hugepages of interest is a problem.
>>>>>>
>>>>>> Since v20.05, DPDK mem allocator takes care of excluding (unused)
>>>>>> hugepages from dump.
>>>>>> So with this OVS patch, if we catch private and shared hugepages,
>>>>>> "interesting" DPDK hugepages will get dumped, which is useful for
>>>>>> debugging post mortem.
>>>>>>
>>>>>> Adding Maxime, who will have a better idea of what is possible for the
>>>>>> guest mapping part.
>>>>>>
>>>>>
>>>>> I wonder if we could do a MADV_DONTDUMP on all the guest memory at mmap
>>>>> time, then there are two cases:
>>>>> a. vIOMMU = OFF. In this case we could do MADV_DODUMP on virtqueues
>>>>> memory. Doing so, we would have the rings memory, but not their buffers
>>>>> (except if they are located on same hugepages).
>>>>> b. vIOMMU = ON. In this case we could do MADV_DODUMP on IOTLB_UPDATE
>>>>> new entries and MADV_DONTDUMP on invalidated entries. Doing so we will
>>>>> get both vrings and their buffers the backend is allowed to access.
>>>>
>>>> I guess, while DONTDUMP calls are mainly harmless, the explicit DODUMP
>>>> will override whatever user had in their global configuration. Meaning
>>>> every DPDK application with vhost ports will start dumping some of the
>>>> guest pages with no actual ability to turn that off.
>>>
>>> I initially thought it would work that way, but the DODUMP flag just
>>> disables the DONTDUMP flag.
>>>
>>> https://github.com/torvalds/linux/blob/master/mm/madvise.c#L1055
>>> https://github.com/torvalds/linux/blob/master/fs/coredump.c#L1033
>>
>> Hmm, interesting. Makes sense.
>>
>> Thanks for the pointers!
>>
>> So, it should still be 7f regardless in the coredump filter for OVS, right?
>> Do you plan to update the current patch or do you think we should omit
>> shared pages until support for MADV_DO/DONTDUMP is added to vhost library?
>>
>> Note that this will likely not be available in 22.11 as it's not a bug fix.
>> So, 23.11 at the earliest.
>>
>> Basically 2 options:
>>
>> 1. 0x3f and not having shared pages. Flip to 0x7f with DPDK 23.11 next year.
>>    Pros: Smaller files
>>    Cons: Missing some of the virtqueue memory until [potentially] DPDK 23.11.
>>
>> 2. 0x7f today.
>>    Pros: All the memory is available.
>>    Cons: [Significantly] larger files until [potentially] DPDK 23.11.
>>
>> What do you think? David, Maxime?
>
> I'd prefer 7f today. It's disabled by default, has zero impact on end
> users, makes setting up debugging environments more convenient, and on
> distributions with systemd the larger coredumps are managed somewhat
> automatically. The news item already warns about large coredumps.
>
> WDYT?
Sounds good to me.

>
> -M
>
>>>
>>> Cheers,
>>> M
>>>
>>>>
>>>> Can the behavior be configurable?
>>>>
>>>>>
>>>>> I can prepare a PoC quickly if someone is willing to experiment.
>>>>>
>>>>> Regards,
>>>>> Maxime
>>>>>
_______________________________________________
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev
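
For reference, the two mechanisms discussed in this thread can be sketched in C. This is an illustrative sketch only, not code from OVS, DPDK, or the vhost library: the enum names and both helper functions (filter_dumps_shared_hugepages, dump_only_subrange_demo) are hypothetical. The bit positions are the ones listed in core(5), and madvise() with MADV_DONTDUMP / MADV_DODUMP is the standard Linux interface quoted above. The demo uses a small anonymous mapping as a stand-in for guest memory.

```c
#define _DEFAULT_SOURCE
#include <stdbool.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/* Bits of /proc/[pid]/coredump_filter as documented in core(5).
 * The enum names are hypothetical, chosen for this sketch. */
enum {
    CORE_ANON_PRIVATE    = 1u << 0,
    CORE_ANON_SHARED     = 1u << 1,
    CORE_FILE_PRIVATE    = 1u << 2,
    CORE_FILE_SHARED     = 1u << 3,
    CORE_ELF_HEADERS     = 1u << 4,
    CORE_HUGETLB_PRIVATE = 1u << 5,
    CORE_HUGETLB_SHARED  = 1u << 6,
};

/* True if the given filter value asks the kernel to dump shared huge pages. */
bool filter_dumps_shared_hugepages(unsigned int filter)
{
    return (filter & CORE_HUGETLB_SHARED) != 0;
}

/* Exclude a whole region from core dumps, then re-include one page of it.
 * Mirrors the idea of "DONTDUMP on all guest memory, DODUMP on the parts
 * of interest". Returns 0 on success, -1 on failure. */
int dump_only_subrange_demo(void)
{
    long page = sysconf(_SC_PAGESIZE);
    if (page <= 0) {
        return -1;
    }

    size_t len = 4 * (size_t) page;
    char *region = mmap(NULL, len, PROT_READ | PROT_WRITE,
                        MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (region == MAP_FAILED) {
        return -1;
    }

    /* Step 1: mark the whole region as excluded from core dumps. */
    int rc = madvise(region, len, MADV_DONTDUMP);

    /* Step 2: undo the exclusion for the second page only. The address
     * passed to madvise() must be page-aligned. */
    if (rc == 0) {
        rc = madvise(region + page, (size_t) page, MADV_DODUMP);
    }

    munmap(region, len);
    return rc;
}
```

With these definitions, filter_dumps_shared_hugepages(0x3f) is false and filter_dumps_shared_hugepages(0x7f) is true, which is the difference raised at the start of the thread. As noted above, MADV_DODUMP only clears an earlier MADV_DONTDUMP; whether the re-included page ends up in a dump still depends on the coredump_filter mask.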