#13447: Make libsingular multivariate polynomial rings collectable
---------------------------+------------------------------------------------
Reporter: nbruin | Owner: rlm
Type: defect | Status: new
Priority: major | Milestone: sage-5.4
Component: memleak | Resolution:
Keywords: | Work issues:
Report Upstream: N/A | Reviewers:
Authors: | Merged in:
Dependencies: | Stopgaps:
---------------------------+------------------------------------------------
Comment (by nbruin):
On 5.4-beta0 + #715 + #11521, there is a doctest failure on
`bsd.math.washington.edu`, an x86_64 machine running MacOSX 10.6:
{{{
bash-3.2$ ../../sage -t sage/misc/cachefunc.pyx
sage -t "devel/sage-main/sage/misc/cachefunc.pyx"
The doctested process was killed by signal 11
[12.7 s]
----------------------------------------------------------------------
The following tests failed:
sage -t "devel/sage-main/sage/misc/cachefunc.pyx" #
Killed/crashed
Total time for all tests: 12.7 seconds
}}}
The segmentation fault happens reliably, but is hard to study because
- running the same examples in an interactive session does not trigger the
problem
- running with `sage -t -gdb` (yes, that's possible!) or `sage -t --verbose`
does not trigger the problem
- the order of tests in the file seems fairly important. You can change and
delete some tests but not others. The likely explanation is that a garbage
collection has to be triggered under the right conditions, so that the
memory corruption (which likely happens upon deallocation somewhere) happens
in the right spot.
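The dependence on garbage-collection timing can be illustrated in plain Python. The following is only a sketch and is not taken from the Sage doctests; `Node` and `make_cycle` are hypothetical names. It shows that objects held in a reference cycle are deallocated when the cyclic collector runs, not when the last name for them goes away, so the point at which a deallocation happens depends on what ran before it.

```python
import gc
import weakref


class Node(object):
    """Hypothetical stand-in for an object kept alive only by a cycle."""
    pass


def make_cycle():
    # Two objects that reference each other. After this function returns,
    # no outside reference is left, so only the cyclic collector can free
    # them.
    a, b = Node(), Node()
    a.other, b.other = b, a
    return weakref.ref(a)


gc.disable()                        # rule out an automatic collection
ref = make_cycle()
alive_before = ref() is not None    # the cycle is still in memory
gc.collect()                        # the deallocation happens here
alive_after = ref() is not None
gc.enable()

print("alive before collect: %s" % alive_before)
print("alive after collect: %s" % alive_after)
```

Forcing `gc.collect()` at a fixed point in a failing test is one possible way to make the deallocation point reproducible; whether that helps with this particular segfault is untested.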
The segfault happens in the doctest for
`CachedMethodCaller._instance_call` (line 1038 in the sage source;
`example_27` in the file `~/.sage/tmp/cachefunc_*.py` left after
doctesting), in the line
{{{
sage: P.<a,b,c,d> = QQ[]
}}}
Further instrumentation showed that the segfault happens in
`sage/libs/singular/ring.pyx`, in `singular_ring_new`, in the part that
copies the strings over.
{{{#!diff
+    sys.stderr.write("before _names allocation\n")
     _names = <char**>omAlloc0(sizeof(char*)*(len(names)))
+    sys.stderr.write("after _names allocation\n")
     for i from 0 <= i < n:
         _name = names[i]
+        sys.stderr.write("calling omStrDup for i=%s with name=%s\n"%(i,names[i]))
         _names[i] = omStrDup(_name)
+        sys.stderr.write("after omStrDup\n")
}}}
The call `_omStrDup` segfaults for `i=1`. Unwinding the `_omStrDup` call:
{{{#!diff
     for i from 0 <= i < n:
         _name = names[i]
+        sys.stderr.write("calling omStrDup for i=%s with name=%s\n"%(i,names[i]))
-        _names[i] = omStrDup(_name)
+        j = 0
+        while <bint> _name[j]:
+            j+=1
+        j+=1 #increment to include the 0
+        sys.stderr.write("string length (including 0) seems to be %s\n"%j)
+        copiedname = <char*>omAlloc(sizeof(char)*(j+perturb))
+        sys.stderr.write("Done reserving memory buffer; got address %x\n"%(<long>copiedname))
+        for offset from 0 <= offset < j:
+            sys.stderr.write("copying character nr %s\n"%offset)
+            copiedname[offset] = _name[offset]
+        _names[i] = copiedname
+        sys.stderr.write("after omStrDup\n")
}}}
shows that it's actually the `omAlloc` call segfaulting. For `perturb=7`
or higher, the segfault does not happen; for lower values of `perturb` it
does. Given that the addresses returned by earlier `omAlloc` calls do not
seem close to a page boundary, the only way `omAlloc` can fail is
basically by a corrupted freelist for an 8-byte bin. Likely culprits:
- a double free (although I'd expect that would trigger problems on more
architectures)
- someone writing out-of-bounds in omAlloc-managed memory. Perhaps someone
claiming an `<int>,<int>` structure and storing a `<void *>` in the second
one?
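To make the freelist hypothesis concrete, here is a toy model in Python. It is a sketch under stated assumptions and not omalloc's actual implementation: `ToyBin`, `size_class` and `fake_ptr` are hypothetical, and the rule that requests are rounded up to a multiple of 8 bytes is an assumption that merely fits the observed `perturb=7` threshold (the name `b` plus its terminating 0 is 2 bytes, so 2+6 still fits an 8-byte block while 2+7 does not). The model stores the link to the next free block inside the free block itself, then replays the `<int>,<int>` scenario: an 8-byte pointer stored in the second `int` of an 8-byte block.

```python
import struct

BLOCK = 8                    # bytes per block in this toy bin
NBLOCKS = 4
END = 0xFFFFFFFFFFFFFFFF     # sentinel meaning "no next free block"


def size_class(nbytes, granule=8):
    # ASSUMPTION: requests are rounded up to the next multiple of 8 bytes.
    return -(-nbytes // granule) * granule


class ToyBin(object):
    """Toy allocator: each free block stores the offset of the next free
    block in its own first 8 bytes (an intrusive free list)."""

    def __init__(self):
        self.page = bytearray(BLOCK * NBLOCKS)
        # Chain the blocks: 0 -> 8 -> 16 -> 24 -> END
        for i in range(NBLOCKS):
            nxt = (i + 1) * BLOCK if i + 1 < NBLOCKS else END
            struct.pack_into("<Q", self.page, i * BLOCK, nxt)
        self.head = 0

    def alloc(self):
        if self.head == END:
            raise MemoryError("bin exhausted")
        if self.head % BLOCK or self.head >= len(self.page):
            # A real allocator would dereference this bogus address.
            raise RuntimeError("corrupted free list: head=%#x" % self.head)
        block = self.head
        (self.head,) = struct.unpack_from("<Q", self.page, block)
        return block

    def write(self, offset, data):
        # No check against the block size, like raw C memory.
        self.page[offset:offset + len(data)] = data


toy = ToyBin()
first = toy.alloc()                      # block at offset 0
fake_ptr = 0x00007FFF5FBFF8A0            # hypothetical 64-bit pointer value
# Store the pointer where the second <int> of an <int>,<int> pair would be.
# Half of it lands in the neighbouring free block and clobbers its link.
toy.write(first + 4, struct.pack("<Q", fake_ptr))
second = toy.alloc()                     # still succeeds
error = None
try:
    toy.alloc()                          # follows the clobbered link
except RuntimeError as exc:
    error = str(exc)

print("size classes: %d %d" % (size_class(2 + 6), size_class(2 + 7)))
print("blocks handed out: %d %d" % (first, second))
print("later allocation: %s" % error)
```

In this toy, the faulty write and the allocation right after it both succeed, and the failure only surfaces on a later allocation from the same bin. That would match a crash inside `omAlloc` far away from the code that did the damage.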
Note the `<char*>` to `<long>` cast in the print statement. With an `<int>`,
the compiler complains about loss of precision, but not with `<long>`. I
haven't checked whether `<long>` is really 64 bits on this machine, though.
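The width of `long` can be checked from Python on the machine in question. This is a minimal sketch using the standard `ctypes` module, which reports the sizes of C types for the platform the running Python was built for; it assumes that Sage's Python and the Cython extension are built for the same ABI.

```python
import ctypes

int_bytes = ctypes.sizeof(ctypes.c_int)
long_bytes = ctypes.sizeof(ctypes.c_long)
ptr_bytes = ctypes.sizeof(ctypes.c_void_p)

# The <long> cast of a pointer keeps all bits only if long is at least as
# wide as a pointer.
cast_is_lossless = long_bytes >= ptr_bytes

print("sizeof(int)   = %d" % int_bytes)
print("sizeof(long)  = %d" % long_bytes)
print("sizeof(void*) = %d" % ptr_bytes)
print("pointer fits in long: %s" % cast_is_lossless)
```

On LP64 platforms, which include 64-bit Linux and 64-bit Mac OS X, `long` and pointers are both 8 bytes, so the cast would be lossless there.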
I have tried and the problem seems to persist with the old Singular (5.4b0
has a recently upgraded Singular).
It would help a lot if someone could build Singular to use plain malloc
throughout and then use valgrind or a similar tool, which should be able to
immediately catch a double free or out-of-bounds error. If the root of the
problem is not OSX-specific, this would even show up on other architectures.
See also [http://trac.sagemath.org/sage_trac/ticket/715#comment:295 #715, comment 295]
and below for some more details on how the diagnosis above was obtained.
--
Ticket URL: <http://trac.sagemath.org/sage_trac/ticket/13447#comment:1>