#715: Parents probably not reclaimed due to too much caching
-------------------------------------------------------------------+--------
Reporter: robertwb | Owner: somebody
Type: defect | Status: needs_review
Priority: major | Milestone: sage-5.4
Component: coercion | Resolution:
Keywords: weak cache coercion Cernay2012 | Work issues:
Report Upstream: N/A | Reviewers: Jean-Pierre Flori, Simon King, Nils Bruin
Authors: Simon King, Jean-Pierre Flori | Merged in:
Dependencies: #9138, #11900, #11599, to be merged with #11521 | Stopgaps:
-------------------------------------------------------------------+--------
Comment (by nbruin):
OK, I've taken out the `omStrDup` call in `sage/libs/singular/ring.pyx`
and just manually copy the strings over:
{{{
for i from 0 <= i < n:
    _name = names[i]
    sys.stderr.write("calling omStrDup for i=%s with name=%s\n"%(i,names[i]))
    # find the string length by scanning for the terminating 0
    j = 0
    while <bint> _name[j]:
        j+=1
    j+=1 #increment to include the 0
    sys.stderr.write("string length (including 0) seems to be %s\n"%j)
    # allocate `perturb` bytes more than strictly needed
    copiedname = <char*>omAlloc(sizeof(char)*(j+perturb))
    sys.stderr.write("Done reserving memory buffer; got address %x\n"%(<long>copiedname))
    for offset from 0 <= offset < j:
        sys.stderr.write("copying character nr %s\n"%offset)
        copiedname[offset] = _name[offset]
    _names[i] = copiedname
    sys.stderr.write("after omStrDup\n")
}}}
If I run this code with `perturb=7`, I don't get a segfault. With smaller
values I do, and the segfault happens on the `omAlloc` line. Given that
`j==2` for most of these calls, I guess that memory blocks are at least 8
bytes (this is OSX 64-bit).
If `omAlloc` itself segfaults, I guess one of the internal omAlloc data
structures has been corrupted (I think the idea is that memory is managed in
equal-sized blocks, with just a free list on a system malloc-ed page). If I
were to implement that, I'd store the pointers of the free-block linked list
in the blocks themselves (hence minimum 8-byte blocks), so if anyone omAllocs
an 8-byte block and then writes past it, they could ruin the linked list and
likely cause a subsequent omAlloc to segfault (because omAlloc would have to
access the location pointed to in order to check whether there is a next node
in the free list). Even more likely: some code decides to "zero out" a block
after it has already been `omFree`'d. It could also be a double deallocation.
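To make the failure mode above concrete, here is a toy model in Python of an allocator that keeps its free list inside the free blocks. It is only a sketch of the guess above, not how omalloc is actually implemented; `ToyBin`, `BLOCK` and `NIL` are made-up names, and byte offsets into a `bytearray` stand in for pointers.

```python
import struct

BLOCK = 8                   # assumed block size: room for one 64-bit "pointer"
NIL = 0xFFFFFFFFFFFFFFFF    # sentinel meaning "no next free block"

class ToyBin:
    """Toy fixed-size allocator; each free block stores the offset of the next."""

    def __init__(self, nblocks):
        self.page = bytearray(BLOCK * nblocks)
        # Thread all blocks into a free list: block k points to block k+1.
        for k in range(nblocks):
            nxt = (k + 1) * BLOCK if k + 1 < nblocks else NIL
            struct.pack_into("<Q", self.page, k * BLOCK, nxt)
        self.head = 0

    def alloc(self):
        if self.head == NIL:
            raise MemoryError("bin exhausted")
        addr = self.head
        # The next pointer is read out of the block itself, so a corrupted
        # list makes the allocator follow a bogus address right here.
        (self.head,) = struct.unpack_from("<Q", self.page, addr)
        return addr

    def free(self, addr):
        struct.pack_into("<Q", self.page, addr, self.head)
        self.head = addr

    def write(self, addr, data):
        # Deliberately no check against BLOCK, like a raw char* write.
        self.page[addr:addr + len(data)] = data

b = ToyBin(4)
a = b.alloc()            # block at offset 0
b.write(a, b"A" * 9)     # 9 bytes into an 8-byte block: clobbers the next block
b.alloc()                # still "works", but the head is now a garbage offset
try:
    b.alloc()            # follows the garbage offset
except struct.error as err:
    print("allocator crashed:", err)
```

Note that the crash shows up two allocations after the overrun, in the allocator rather than in the code that wrote past its block, which matches a segfault on the `omAlloc` line.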
There must be people with vast omAlloc debugging experience who have
wonderful tricks to track down this kind of error. A tiny bit of
instrumentation should do the trick (frequent verification of free lists,
checking that a block is not already in the free list when asked to
deallocate: these are things one could easily do without changing memory
layout).
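The two checks mentioned above could look roughly like this. This is a Python sketch on a toy free list of block indices, not omalloc code; `CheckedFreeList` and its methods are hypothetical names, and the `next` list plays the role of the pointers stored inside the free blocks.

```python
class CheckedFreeList:
    """Toy free list over block indices 0..nblocks-1, with debug checks."""

    def __init__(self, nblocks):
        assert nblocks >= 1
        self.nblocks = nblocks
        # next[k] is the free block following k, or None at the end of the list.
        self.next = list(range(1, nblocks)) + [None]
        self.head = 0

    def free_blocks(self):
        """Walk the list, checking each link is in range and there is no cycle."""
        seen = []
        k = self.head
        while k is not None:
            if not (0 <= k < self.nblocks):
                raise RuntimeError("free list points outside the page: %r" % (k,))
            if k in seen:
                raise RuntimeError("cycle in free list at block %r" % (k,))
            seen.append(k)
            k = self.next[k]
        return seen

    def alloc(self):
        self.free_blocks()          # verify before trusting the list
        if self.head is None:
            raise MemoryError("no free blocks")
        k = self.head
        self.head = self.next[k]
        return k

    def free(self, k):
        if k in self.free_blocks():  # block is already free: double deallocation
            raise RuntimeError("double free of block %r" % (k,))
        self.next[k] = self.head
        self.head = k
```

Walking the whole list on every call is slow, but it reports the corruption at the first allocator call after it happens instead of at some later, unrelated allocation.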
In the meantime, we can "fix" the segfault on BSD by allocating a little
extra space for variable names. At least 9 bytes seems to do the trick. By
now it's pretty clear that the real error is probably a refcounting error
in Sage's libsingular rings, which didn't become apparent until these things
actually started getting deallocated.
If we insist that libsingular behaves as specified, then part of its
specification is likely that rings should not be deallocated, so we
should put a strong-reference cache on these things (easy to do). Then
one can open another ticket, "make libsingular rings deallocatable".
I think exposing the rest of Sage to mortal parents is too important to
delay over a hard-to-track-down memory issue with deallocation in
libsingular.
--
Ticket URL: <http://trac.sagemath.org/sage_trac/ticket/715#comment:298>