Hi Ben,

So I decided to try to find the leak with the help of Address Sanitizer, as you recommended. Would the following approach make sense?

- build OVS from source with CFLAGS="-g -O2 -fsanitize=leak -fno-omit-frame-pointer -fno-common"
- service openvswitch-switch stop
- replace the binary */usr/lib/openvswitch-switch/ovs-vswitchd*
- service openvswitch-switch start
- load the cluster to trigger the possible memory leaks
- look for Address Sanitizer logs in syslog
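Concretely, I was going to run roughly the sketch below. The build flags are the ones above (-fsanitize=leak links the standalone leak checker); the defaults file and the way LSAN_OPTIONS reaches the daemon are just my assumptions for our Ubuntu packaging, and since LeakSanitizer normally prints its report only when the process exits, I guess I may also need to point it at a log file and stop the daemon cleanly in case nothing shows up in syslog:

    # Build OVS from source with LeakSanitizer linked in (a sketch; the
    # configure options should match how the packaged build is done).
    ./boot.sh
    ./configure CFLAGS="-g -O2 -fsanitize=leak -fno-omit-frame-pointer -fno-common"
    make -j"$(nproc)"

    # Swap the packaged binary for the instrumented one.
    service openvswitch-switch stop
    cp vswitchd/ovs-vswitchd /usr/lib/openvswitch-switch/ovs-vswitchd

    # LeakSanitizer writes its report when the process exits; to send it to a
    # file, LSAN_OPTIONS has to end up in the daemon's environment.  Assuming
    # the init script sources /etc/default/openvswitch-switch (the "service"
    # wrapper strips the caller's environment):
    echo 'export LSAN_OPTIONS="log_path=/var/log/ovs-vswitchd-lsan"' >> /etc/default/openvswitch-switch
    service openvswitch-switch start

    # ...apply the load, then stop the daemon cleanly and read the report
    # (log_path gets the PID appended).
    service openvswitch-switch stop
    cat /var/log/ovs-vswitchd-lsan.*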
Thanks,
Oleg

On Thu, Mar 7, 2019 at 1:33 PM Oleg Bondarev <[email protected]> wrote:

> Hi Ben,
>
> attaching the full dump-heap script output for an 11G core dump; perhaps
> it can bring some more clarity.
>
> Thanks,
> Oleg
>
> On Thu, Mar 7, 2019 at 11:54 AM Oleg Bondarev <[email protected]> wrote:
>
>> On Wed, Mar 6, 2019 at 7:01 PM Oleg Bondarev <[email protected]> wrote:
>>
>>> I'm wondering whether this could be malloc() not returning memory to the
>>> system after peak loads:
>>> *"Occasionally, free can actually return memory to the operating system
>>> and make the process smaller. Usually, all it can do is allow a later call
>>> to malloc to reuse the space. In the meantime, the space remains in your
>>> program as part of a free-list used internally by malloc." [1]*
>>>
>>> Does it sound sane? If yes, what would be the best way to check that?
>>
>> Seems that's not the case. On one of the nodes, memory usage by
>> ovs-vswitchd grew from 84G to 87G over the past week, and on the other
>> nodes it grows gradually as well.
>>
>>> [1] http://www.gnu.org/software/libc/manual/pdf/libc.pdf
>>>
>>> Thanks,
>>> Oleg
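A fairly direct way to test the free-list theory on a live node is to ask glibc to release whatever free memory it is caching and watch whether the resident set size drops. A rough sketch, assuming gdb can attach to ovs-vswitchd on the node and that malloc_trim()/malloc_stats() are reachable in its glibc:

    # Glibc's own view of its arenas (output goes to ovs-vswitchd's stderr).
    gdb -p "$(pidof ovs-vswitchd)" -batch -ex 'call (void) malloc_stats()'

    # Ask glibc to hand cached free memory back to the kernel, then compare
    # the resident set size before and after.
    grep VmRSS "/proc/$(pidof ovs-vswitchd)/status"
    gdb -p "$(pidof ovs-vswitchd)" -batch -ex 'call (int) malloc_trim(0)'
    grep VmRSS "/proc/$(pidof ovs-vswitchd)/status"
    # If RSS barely moves, the memory is still in use from malloc's point of
    # view, i.e. a real leak rather than free-list retention.  (Attaching gdb
    # pauses the daemon briefly, so this is better done in a quiet window.)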
>>> On Wed, Mar 6, 2019 at 12:34 PM Oleg Bondarev <[email protected]> wrote:
>>>
>>>> Hi,
>>>>
>>>> On Wed, Mar 6, 2019 at 1:08 AM Ben Pfaff <[email protected]> wrote:
>>>>
>>>>> Starting from 0x30, this looks like a "minimatch" data structure, which
>>>>> is a kind of compressed bitwise match against a flow.
>>>>>
>>>>> 00000030: 0000 0000 0000 4014 0000 0000 0000 0000
>>>>> 00000040: 0000 0000 0000 0000 fa16 3e2b c5d5 0000 0000 0022 0000 0000
>>>>>
>>>>> 00000058: 0000 0000 0000 4014 0000 0000 0000 0000
>>>>> 00000068: 0000 0000 ffff ffff ffff ffff ffff 0000 0000 0fff 0000 0000
>>>>>
>>>>> I think this corresponds to a flow of this form:
>>>>>
>>>>> pkt_mark=0xc5d5/0xffff,skb_priority=0x3e2bfa16,reg13=0,mpls_label=2,mpls_tc=1,mpls_ttl=0,mpls_bos=0
>>>>>
>>>>> Is that at all meaningful? Does it match anything that appears in the
>>>>> OpenFlow flow table?
>>>>
>>>> Not sure; actually fa:16:3e:2b:c5:d5 is the MAC address of a Neutron port
>>>> (this is an OpenStack cluster) - the port is a VM port.
>>>> fa:16:3e/fa:16:3f are standard Neutron MAC prefixes. That makes me think
>>>> those might be some actual Ethernet packets (broadcasts?) that somehow
>>>> got stuck in memory.
>>>> So I didn't find anything similar in the flow tables. I'm attaching the
>>>> flows of all 5 OVS bridges on the node.
>>>>
>>>>> Are you using the kernel or DPDK datapath?
>>>>
>>>> It's the kernel datapath, no DPDK. Ubuntu with a 4.13.0-45 kernel.
>>>>
>>>>> On Tue, Mar 05, 2019 at 08:42:14PM +0400, Oleg Bondarev wrote:
>>>>> > Hi,
>>>>> >
>>>>> > thanks for your help!
>>>>> >
>>>>> > On Tue, Mar 5, 2019 at 7:26 PM Ben Pfaff <[email protected]> wrote:
>>>>> >
>>>>> > > You're talking about the email where you dumped out a repeating
>>>>> > > sequence from some blocks? That might be the root of the problem, if
>>>>> > > you can provide some more context. I didn't see from the message
>>>>> > > where you found the sequence (was it just at the beginning of each of
>>>>> > > the 4 MB blocks you reported separately, or somewhere else), how many
>>>>> > > copies of it, or if you were able to figure out how long each of the
>>>>> > > blocks was. If you can provide that information I might be able to
>>>>> > > learn some things.
>>>>> >
>>>>> > Yes, those were the beginnings of the 0x4000000-size blocks reported by
>>>>> > the script. I also checked the 0x8000000 blocks reported and the
>>>>> > content is the same. Examples of how those blocks end:
>>>>> > - https://pastebin.com/D9M6T2BA
>>>>> > - https://pastebin.com/gNT7XEGn
>>>>> > - https://pastebin.com/fqy4XDbN
>>>>> >
>>>>> > So basically the contents of the blocks are sequences of:
>>>>> >
>>>>> > *00000020: 0000 0000 0000 0000 6500 0000 0000 0000 ........e.......*
>>>>> > *00000030: 0000 0000 0000 4014 0000 0000 0000 0000 ......@.........*
>>>>> > *00000040: 0000 0000 0000 0000 fa16 3e2b c5d5 0000 ..........>+....*
>>>>> > *00000050: 0000 0022 0000 0000 0000 0000 0000 4014 ..."..........@.*
>>>>> > *00000060: 0000 0000 0000 0000 0000 0000 ffff ffff ................*
>>>>> > *00000070: ffff ffff ffff 0000 0000 0fff 0000 0000 ................*
>>>>> >
>>>>> > following each other and sometimes separated by sequences like this:
>>>>> >
>>>>> > *00001040: 6861 6e64 6c65 7232 3537 0000 0000 0000 handler257......*
>>>>> >
>>>>> > I ran the scripts against several core dumps from several compute nodes
>>>>> > with the issue and the picture is pretty much the same: 0x4000000
>>>>> > blocks and fewer 0x8000000 blocks.
>>>>> > I also checked a core dump from a compute node where OVS memory
>>>>> > consumption was OK: no such block sizes reported.
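One cheap way to estimate how many copies of the repeating unit a dump contains (and to compare an affected node against the healthy one) is to grep the hex dump of the core for the embedded MAC bytes. A small sketch, with an example core path:

    # Rough count of copies of the repeating unit, by searching the hex dump
    # for the fa:16:3e:2b:c5:d5 bytes it contains.  The path is only an
    # example, and this assumes the bytes stay 16-byte aligned the way they
    # appear above, so treat the number as approximate.
    CORE=/var/crash/ovs-vswitchd.core
    xxd "$CORE" | grep -c 'fa16 3e2b c5d5'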
>>>>> > > On Tue, Mar 05, 2019 at 09:07:55AM +0400, Oleg Bondarev wrote:
>>>>> > > > Hi Ben,
>>>>> > > >
>>>>> > > > I didn't have a chance to debug the scripts yet, but just in case
>>>>> > > > you missed my last email with examples of the repeated blocks and
>>>>> > > > sequences - do you think we still need to analyze further? Will the
>>>>> > > > scripts tell us more about the heap?
>>>>> > > >
>>>>> > > > Thanks,
>>>>> > > > Oleg
>>>>> > > >
>>>>> > > > On Thu, Feb 28, 2019 at 10:14 PM Ben Pfaff <[email protected]> wrote:
>>>>> > > >
>>>>> > > > > On Tue, Feb 26, 2019 at 01:41:45PM +0400, Oleg Bondarev wrote:
>>>>> > > > > > Hi,
>>>>> > > > > >
>>>>> > > > > > thanks for the scripts, so here's the output for a 24G core
>>>>> > > > > > dump: https://pastebin.com/hWa3R9Fx
>>>>> > > > > > There are 271 entries of 4MB - does that seem like something we
>>>>> > > > > > should take a closer look at?
>>>>> > > > >
>>>>> > > > > I think that this output really just indicates that the script
>>>>> > > > > failed. It analyzed a lot of regions but didn't output anything
>>>>> > > > > useful. If it had worked properly, it would have told us a lot
>>>>> > > > > about data blocks that had been allocated and freed.
>>>>> > > > >
>>>>> > > > > The next step would have to be to debug the script. It definitely
>>>>> > > > > worked for me before, because I have fixed at least 3 or 4 bugs
>>>>> > > > > based on it, but it also definitely is a quick hack and not
>>>>> > > > > something that I can stand behind. I'm not sure how to debug it
>>>>> > > > > at a distance. It has a large comment that describes what it's
>>>>> > > > > trying to do. Maybe that would help you, if you want to try to
>>>>> > > > > debug it yourself. I guess it's also possible that glibc has
>>>>> > > > > changed its malloc implementation; if so, then it would probably
>>>>> > > > > be necessary to start over and build a new script.
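Regarding the possibility that glibc's malloc has changed: it is probably worth recording exactly which glibc the affected nodes run, so the script's assumptions can be checked against that specific release. A quick check (Ubuntu commands assumed):

    # Version banner of the glibc actually in use on the node.
    ldd --version | head -n 1
    # The packaged version, for cross-checking against changelogs.
    dpkg -s libc6 | grep '^Version'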
