> A level-zero question ... how does this bug relate to
> 4391 "OSError: [Errno 12] Not enough space"

I would say that 4612 is a palliative fix for the underlying cause of 4391. We're just going to make the symptoms occur less frequently, without actually addressing the problem.

An ENOMEM may occur during fork when the kernel duplicates the address space of the caller. In the general case, an identical copy of the calling process must be created. There's a portion of fork that deals with generating an address space for the new process. The common path calls as_dup, which walks the parent's address-space data structures and duplicates them, making modifications where necessary. The address space structure contains a series of segment structures, each describing a region of memory mapped in the process. As part of the duplication procedure, SEGOP_DUP gets called, invoking the duplication routine specific to each segment.

seg_vn handles mappings of anonymous memory (and file-backed memory, too). When file-backed pages must be paged out, clean pages are simply evicted, and modified pages are written back to disk first. Anonymous memory, however, isn't backed by a file. If those pages are modified and must be removed from RAM, they have to be written to the swap device. Solaris has a policy of reserving swap for anonymous memory allocations: the kernel won't let you allocate an anonymous page unless it can guarantee that the page can be written to disk if the memory is needed for another purpose. I've made a number of technical simplifications here, but the omissions don't pertain to the nature of this problem.

Getting back to the explanation: when seg_vn duplicates a range of anonymous pages, it must also reserve swap for the new copies, in case the new process modifies them. If it's unable to satisfy that reservation, it returns ENOMEM, which is the error that's occurring when our process tries to fork.

The posix_spawn() routines call vforkx().
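To make the accounting concrete, here's a toy model in Python. It's only a sketch of the reservation policy described above, not kernel code: SwapPool, fork_reservation() and vfork_reservation() are hypothetical names, pages are plain integers, and the real machinery (as_dup, SEGOP_DUP, seg_vn's duplication routine, anon_resvmem()) is collapsed into a single reserve() call.

```python
import errno


class SwapPool:
    """Toy swap-reservation ledger (hypothetical model, not the kernel's API)."""

    def __init__(self, total_pages: int) -> None:
        self.total_pages = total_pages
        self.reserved_pages = 0

    def reserve(self, npages: int) -> None:
        # Refuse any reservation that swap could not back. This stands in for
        # the failed anonymous-memory reservation that makes fork return ENOMEM.
        if self.reserved_pages + npages > self.total_pages:
            raise OSError(errno.ENOMEM, "Not enough space")
        self.reserved_pages += npages


def fork_reservation(pool: SwapPool, parent_anon_pages: int) -> None:
    # fork(): the child gets its own copy of every anonymous segment, so
    # each copied page needs a fresh swap reservation up front.
    pool.reserve(parent_anon_pages)


def vfork_reservation(pool: SwapPool, parent_anon_pages: int) -> None:
    # vfork-style: the child borrows the parent's address space until it
    # execs, so nothing is duplicated and nothing new is reserved.
    # The arguments only mirror fork_reservation(); they are not used.
    return


if __name__ == "__main__":
    pool = SwapPool(total_pages=1000)
    pool.reserve(600)  # the parent's existing anonymous memory

    try:
        fork_reservation(pool, 600)  # needs 600 more pages, only 400 are left
    except OSError as exc:
        print("fork failed:", exc)  # fork failed: [Errno 12] Not enough space

    vfork_reservation(pool, 600)  # succeeds and reserves nothing
    print("still reserved:", pool.reserved_pages)  # still reserved: 600
```

With 1000 pages of swap and a parent holding 600 reserved anonymous pages, the fork-style path fails with ENOMEM because the child's copy would need 600 more, while the vfork-style path succeeds. Either way the parent's 600 pages stay reserved, so avoiding fork doesn't by itself lower memory use. The vfork-style path stands in for vforkx(), which is what posix_spawn() calls.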
vforkx() doesn't duplicate the address space, since it's understood that the caller is going to exec in the very near future. Instead, the child is given the exact same address space as the parent; no modifications are made.

The fix for 4612 causes subprocess to use posix_spawn instead of fork. This means we won't fail an anon_resvmem() during fork, but we haven't reduced the amount of memory that's being used by the system.

> Should that bug be retargeted at providing a more graceful exit should we
> run into ENOSPC (possible even with 4612 fixed, albeit much less likely)?

Just to be clear, the problem we're encountering is an ENOMEM, not an ENOSPC. 4612 is just a band-aid. I would say that 4391 should probably track down whatever it is that has consumed so much of our memory. That said, it wouldn't be a bad thing to have a top-level handler for EnvironmentErrors, unless you think we ought to be catching them and raising them as something else?

> Right now it's marked as blocking the release, but it seems that 4612 is a
> better candidate for that, and that we should push 4391 off to the future.

4612 addresses the symptoms that show up in 4391, but it doesn't actually reduce our memory consumption. Are people still hitting this problem after your actions fix went back? If they are, perhaps we should figure out where all the memory has gone before deciding whether to push 4391 off to the future.

Just my $.02

-j

_______________________________________________
pkg-discuss mailing list
http://mail.opensolaris.org/mailman/listinfo/pkg-discuss
