> A level-zero question ... how does this bug relate to
> 
>     4391 "OSError: [Errno 12] Not enough space"

I would say that 4612 is a palliative fix for the underlying cause of
4391.  We're just going to make the symptoms occur less frequently,
without actually addressing the problem.

An ENOMEM may occur during fork when the kernel duplicates the address
space of the caller.  In the general case, an identical version of the
calling process must be created.  There's a portion of fork that deals
with generating an address space for the new process.  The common path
calls as_dup(), which in turn walks the associated data structures and
duplicates them, making modifications where necessary.

The address space structure contains a series of segments (struct seg),
each describing a region of memory mapped into the process.  As part of
the duplication procedure, SEGOP_DUP gets called on every segment,
invoking the duplication routine of that segment's driver.  The seg_vn
driver deals with mapping anonymous memory (and file-backed memory, too).
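To make the walk concrete, here is a toy Python model of it.  This is
only a sketch: Segment, AddressSpace and as_dup_model are hypothetical
stand-ins for struct seg, struct as, as_dup() and SEGOP_DUP, and the
real kernel code is C with far more bookkeeping.  It shows the one
property that matters here: a single failing segment dup aborts the
whole duplication, and its error becomes fork's error.

```python
import errno
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Segment:
    """Hypothetical stand-in for a segment: one mapped region."""
    base: int
    size: int
    driver: str  # e.g. "seg_vn"
    # Plays the role of SEGOP_DUP: returns 0 on success, else an errno.
    dup_op: Callable[["Segment"], int] = lambda seg: 0


@dataclass
class AddressSpace:
    """Hypothetical stand-in for an address space: a list of segments."""
    segments: List[Segment] = field(default_factory=list)


def as_dup_model(parent: AddressSpace) -> AddressSpace:
    """Duplicate each segment in turn, stopping at the first failure."""
    child = AddressSpace()
    for seg in parent.segments:
        err = seg.dup_op(seg)
        if err:
            # In the kernel, this error propagates back out of fork().
            raise OSError(err, "segment duplication failed")
        child.segments.append(
            Segment(seg.base, seg.size, seg.driver, seg.dup_op))
    return child


# A segment whose dup fails, as seg_vn's does when it can't reserve swap.
failing = AddressSpace([Segment(0x10000, 8192, "seg_vn",
                                lambda seg: errno.ENOMEM)])
```

Calling as_dup_model(failing) raises OSError with errno set to ENOMEM,
which is how the failure surfaces to a Python caller of os.fork().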

When pages that came from a file must be paged out, they're either
simply evicted, if they're clean, or written back to the file first, if
they were modified.  Anonymous memory, however, isn't backed by a file.
If those pages are modified and must be removed from RAM, they have to
be written to the swap device.
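The three cases can be summarized in a few lines of Python.  This is a
simplified model, not kernel code: Disposition and pageout_disposition
are hypothetical names, and anonymous pages are assumed to have been
modified, since that is the case the paragraph is about.

```python
from enum import Enum


class Disposition(Enum):
    DISCARD = "drop it; the file still holds the same data"
    WRITE_TO_FILE = "write it back to its backing file first"
    WRITE_TO_SWAP = "write it to the swap device first"


def pageout_disposition(file_backed: bool, modified: bool) -> Disposition:
    """What evicting a page requires, in this simplified model."""
    if file_backed:
        return Disposition.WRITE_TO_FILE if modified else Disposition.DISCARD
    # Anonymous memory has no backing file, so the swap device is the
    # only place its contents can go.  Simplification: the page is
    # assumed to be modified, which is the case described above.
    return Disposition.WRITE_TO_SWAP
```

The anonymous branch is the one that forces the swap reservation policy
described next.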

Solaris has a policy of reserving swap for anonymous memory allocations.
This means that the kernel won't let you allocate an anonymous page
unless it can guarantee that it can write that page to disk if the memory is
needed for another purpose.  I've made a number of technical
simplifications here, but the omissions don't pertain to the nature of
this problem.

Getting back to the explanation, when seg_vn duplicates a range of
anonymous pages, it needs to also reserve swap for those new pages,
should they be modified by the new process.  If it's unable to satisfy
that reservation, it returns ENOMEM, which is the error that's occurring
when our process tries to fork.
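A toy model of reserve-on-allocate accounting shows why a large parent
can fail to fork even when plenty of its own memory is idle.  The names
SwapReservation and fork_model are hypothetical; reserve() loosely plays
the role of anon_resvmem(), and, like the explanation above, the model
ignores how Solaris actually sizes the reservable pool.  Units are pages.

```python
import errno


class SwapReservation:
    """Toy model: every anonymous page needs swap reserved up front."""

    def __init__(self, total_pages: int) -> None:
        self.total = total_pages
        self.reserved = 0

    def reserve(self, npages: int) -> None:
        # Refuse outright rather than overcommit, so that every reserved
        # page is guaranteed somewhere to go if it must be paged out.
        if self.reserved + npages > self.total:
            raise OSError(errno.ENOMEM, "cannot reserve swap")
        self.reserved += npages

    def release(self, npages: int) -> None:
        self.reserved -= npages


def fork_model(swap: SwapReservation, parent_anon_pages: int) -> None:
    """A full fork must reserve swap for a copy of each anonymous page."""
    swap.reserve(parent_anon_pages)
```

For example, with a 1000-page pool and a parent already holding 600
reserved pages, fork_model(swap, 600) fails with ENOMEM, even though
the child would probably exec long before touching most of those pages.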

The posix_spawn() routines call vforkx().  This doesn't duplicate the
address space, since it's understood that the caller is going to exec
in the very near future.  Instead, the child borrows the parent's
address space exactly as it is; nothing is copied or modified, so no
new swap has to be reserved.
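From Python, the same path is reachable through os.posix_spawn().  The
sketch below is not the code from the 4612 fix; spawn_and_wait is a
hypothetical helper, and it assumes Python 3.9 or later (for
os.waitstatus_to_exitcode) on a POSIX system.

```python
import os
import sys


def spawn_and_wait(argv: list[str]) -> int:
    """Run argv via posix_spawn() and return the child's exit code."""
    # os.posix_spawn() hands off to the C library's posix_spawn(); per the
    # explanation above, on Solaris that uses vforkx(), so the parent's
    # address space is never duplicated.  argv[0] must be a full path.
    pid = os.posix_spawn(argv[0], argv, os.environ)
    _, status = os.waitpid(pid, 0)
    return os.waitstatus_to_exitcode(status)


if __name__ == "__main__":
    print(spawn_and_wait([sys.executable, "-c", "raise SystemExit(3)"]))
```

Run as a script, this prints 3, the exit code of the spawned
interpreter.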

The fix for 4612 causes subprocess to use posix_spawn instead of fork.
This means we won't fail an anon_resvmem() during fork, but we haven't
reduced the amount of memory that's being used by the system. 

> Should that bug be retargeted at providing a more graceful exit should we
> run into ENOSPC (possible even with 4612 fixed, albeit much less likely)?

Just to be clear, the problem we're encountering is an ENOMEM, not an
ENOSPC.  4612 is just a band-aid.  I would say that 4391 should probably
try to address whatever it is that has consumed so much of our memory.

That said, it wouldn't be a bad thing to have a top-level handler for
EnvironmentErrors.  Or do you think we ought to be catching them and
re-raising them as something else?
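A minimal sketch of such a handler, assuming a Python 3 entry point;
main and handle_errors are hypothetical names, not pkg's actual code,
and main here is only a placeholder that simulates the ENOMEM.  In
Python 3, EnvironmentError is simply an alias for OSError.

```python
import errno
import os
import sys


def main() -> int:
    # Hypothetical placeholder for the real program logic.
    raise OSError(errno.ENOMEM, os.strerror(errno.ENOMEM))


def handle_errors(func) -> int:
    """Run func, turning an uncaught EnvironmentError into a clean exit."""
    try:
        return func()
    except EnvironmentError as e:  # alias of OSError on Python 3
        # Name the errno symbolically (e.g. "ENOMEM") instead of
        # dumping a traceback on the user.
        name = errno.errorcode.get(e.errno, "unknown errno")
        print(f"error: {e.strerror or e} ({name})", file=sys.stderr)
        return 1
```

The program's entry point would then call sys.exit(handle_errors(main)),
so an ENOMEM from fork ends with a one-line message and exit status 1.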

> Right now it's marked as blocking the release, but it seems that 4612 is a
> better candidate for that, and that we should push 4391 off to the future.

4612 addresses the symptoms that show up in 4391, but it doesn't
actually reduce our memory consumption.  Are people still hitting this
problem now that your actions fix has been put back?  If they are,
perhaps we should figure out where all the memory has gone before
deciding whether to push 4391 off to the future.

Just my $.02

-j
_______________________________________________
pkg-discuss mailing list
http://mail.opensolaris.org/mailman/listinfo/pkg-discuss
