http://defect.opensolaris.org/bz/show_bug.cgi?id=10245


amaguire <alan.maguire at sun.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|ACCEPTED                    |CAUSEKNOWN




--- Comment #3 from amaguire <alan.maguire at sun.com>  2009-07-30 04:08:32 ---
I believe the memory leak root cause to be an architectural issue which was
noted by Seb and Sowmini during code review, and also by Anurag when looking at
the code.

The cause of this relates to how we allocate memory to return nwam config data
in the door server. There are three cases where data above and beyond status
needs to be returned:

- a read of a configuration object returns an nvlist of the object's properties
- a walk of the configuration containers - i.e. the /etc/nwam/foo.conf files
returns an nvlist of the configuration files for the NCPs, ems and locations.
- a walk of all objects of a particular type contained in a configuration
container (e.g. nwam_walk_locs()).

The difficulty lies in the third of these. At present, the walk is architected
to return an nvlist of nvlists, where each nvlist is itself an nvlist of the
properties of an object. Problem is, there can be a lot of objects, so we
either need to allocate a massive chunk of memory on the door client side, or
dynamically allocate memory via alloca() on the server side. I chose the latter
option, which I now believe to be an architectural mistake.

It's a mistake that was made for the following reason - I liked the fact that
the walk was taking a full snapshot of the config at a particular point in time
rather than simply returning a list of names of objects, each of which could be
read() individually during the walk function. I resisted this latter approach
on the grounds that configuration objects could appear and disappear during
this process, but I know think it's the better way to do things. Taking this
approach means that the maximum amount of space the door client needs to
allocate is reasonably bounded, and we can simply skip any objects that
disappeared since we read the list of objects for the walk.

This solution will eliminate the problematic alloca() on the door server side,
and reduce the code complexity there. The downside is a slight performance hit
whereby we need to read() each object in turn, but the overhead should not be
significant and will scale with the number of objects configured, which should
not be large in the general case.

So to summarize, we need to do all memory allocation on the door client side,
and to modify the walk function to retrieve a list of object names, and read()
each in turn before operating on it. This should address the stray mmap'ed
memory in netcfgd since it won't be allocating space anymore, simply filling in
data in the door argument passed in.

I'm afraid this is one of those examples where what appeared to be a clever
solution was simply a more sophisticated form of stupidity :-(

-- 
Configure bugmail: http://defect.opensolaris.org/bz/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the QA contact for the bug.
You are the assignee for the bug.

Reply via email to