#13447: Make libsingular multivariate polynomial rings collectable
-------------------------------------------------------+--------------------
       Reporter:  nbruin                               |         Owner:  rlm
           Type:  defect                               |        Status:  new
       Priority:  major                                |     Milestone:  sage-5.4
      Component:  memleak                              |    Resolution:
       Keywords:                                       |   Work issues:
Report Upstream:  Reported upstream. No feedback yet.  |     Reviewers:
        Authors:                                       |     Merged in:
   Dependencies:                                       |      Stopgaps:
-------------------------------------------------------+--------------------

Comment (by nbruin):

 OK! Good progress. By instrumenting `sagedoc.py` a little bit, we can
 indeed see the order in which the doctests are executed:
 {{{
 __main__
 __main__.change_warning_output
 __main__.check_with_tolerance
 __main__.example_0
 __main__.example_1
 __main__.example_10
 __main__.example_11
 __main__.example_12
 __main__.example_13
 __main__.example_14
 __main__.example_15
 __main__.example_16
 __main__.example_17
 __main__.example_18
 __main__.example_19
 __main__.example_2
 __main__.example_20
 __main__.example_21
 __main__.example_22
 __main__.example_23
 __main__.example_24
 __main__.example_25
 __main__.example_26
 __main__.example_27
 Unhandled SIGSEGV
 }}}
 So the doctests do indeed seem to run in alphabetical order of their names
 (note that `example_19` comes before `example_2`).
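
 As a quick check of the ordering claim, here is a minimal sketch showing
 that plain string sorting reproduces the sequence in the log above. The
 list of names is rebuilt by hand from that log; whether the doctest runner
 really sorts the names this way internally is an assumption, not something
 confirmed from its source.

```python
# Names rebuilt by hand from the log above (example_0 .. example_27 plus
# the three non-example entries). This is illustration only.
names = ["__main__.example_%d" % i for i in range(28)]
names += [
    "__main__",
    "__main__.change_warning_output",
    "__main__.check_with_tolerance",
]

# Plain lexicographic (string) sorting, no numeric awareness.
ordered = sorted(names)

for name in ordered:
    print(name)
```

 With string sorting, `example_10` through `example_19` land between
 `example_1` and `example_2`, which matches what the instrumented run shows.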

 Now let's run the doctests with Singular configured to use plain `malloc`.
 Result: no segfault. OSX comes with `gmalloc`, a guarded malloc for
 debugging purposes. It places every allocation on a separate page and
 unmaps that page upon freeing, so any access-after-free leads to a
 segfault. With `gmalloc` we do get a segfault, and it happens a lot sooner
 than `example_27`. In fact, the segfault now also shows up under `gdb`.
 The error happens when executing
 {{{
 G = I.groebner_basis()###line 921:_sage_    >>> G = I.groebner_basis()
 }}}
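
 The difference between the two allocator runs can be illustrated with a
 toy model. This is only a sketch: `ToyHeap`, `FreedMemoryError` and
 `use_after_free` are hypothetical names invented for the illustration, and
 the model says nothing about how `malloc` or `gmalloc` are implemented. It
 only mimics the observable behaviour described above: an ordinary
 allocator may leave freed data readable, while a guarded one makes every
 access-after-free fail immediately.

```python
class FreedMemoryError(Exception):
    """Stands in for the segfault a guarded allocator would trigger."""


class ToyHeap:
    """Hypothetical allocator model; guarded=True mimics unmap-on-free."""

    def __init__(self, guarded):
        self.guarded = guarded
        self.cells = {}       # address -> stored value
        self.next_addr = 0

    def alloc(self, value):
        addr = self.next_addr
        self.next_addr += 1
        self.cells[addr] = value
        return addr

    def free(self, addr):
        if self.guarded:
            # "Unmap the page": nothing is left to read at this address.
            del self.cells[addr]
        # Unguarded: the stale value stays behind until it is overwritten.

    def read(self, addr):
        if addr not in self.cells:
            raise FreedMemoryError("access after free at address %d" % addr)
        return self.cells[addr]


def use_after_free(heap):
    """Allocate, free, then read the freed address anyway."""
    addr = heap.alloc("ring data")
    heap.free(addr)
    return heap.read(addr)


# Unguarded heap: the bug goes unnoticed and stale data is returned.
print(use_after_free(ToyHeap(guarded=False)))

# Guarded heap: the same bug is caught at the first bad access.
try:
    use_after_free(ToyHeap(guarded=True))
except FreedMemoryError as exc:
    print("caught:", exc)
```

 In the unguarded case the bad access only causes trouble later, if at all,
 which fits with the crash showing up much later (or not at all) without
 `gmalloc`.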

 Here's a session with `gdb` once the segfault has happened. I think I have
 been able to extract enough data to point at the probable problem.
 {{{
 Program received signal EXC_BAD_ACCESS, Could not access memory.
 Reason: KERN_INVALID_ADDRESS at address: 0x00000001850dbf44
 __pyx_f_4sage_4libs_8singular_8function_call_function
 (__pyx_v_self=0x190ab8960, __pyx_v_args=0x190a8e810,
 __pyx_v_R=0x19c39be70, __pyx_optional_args=<value temporarily unavailable,
 due to optimizations>) at sage/libs/singular/function.cpp:13253
 13253       currRingHdl->data.uring->ref = (currRingHdl->data.uring->ref - 1);
 ####NB: This is line 1410 in sage/libs/singular/function.pyx
 (gdb) print currRingHdl
 $1 = (idhdl) 0x17c2b5fd0
 (gdb) print currRingHdl->data
 $2 = {
   i = -2062696816,
   uring = 0x1850dbe90,
   p = 0x1850dbe90,
   n = 0x1850dbe90,
   uideal = 0x1850dbe90,
   umap = 0x1850dbe90,
   umatrix = 0x1850dbe90,
   ustring = 0x1850dbe90 <Address 0x1850dbe90 out of bounds>,
   iv = 0x1850dbe90,
   bim = 0x1850dbe90,
   l = 0x1850dbe90,
   li = 0x1850dbe90,
   pack = 0x1850dbe90,
   pinf = 0x1850dbe90
 }
 (gdb) print currRingHdl->data.uring
 $3 = (ring) 0x1850dbe90
 (gdb) print currRingHdl->data.uring->ref
 Cannot access memory at address 0x1850dbf44
 (gdb) print  *__pyx_v_si_ring
 $10 = {
   idroot = 0x0,
   order = 0x19c3cbff0,
   block0 = 0x19c3cdff0,
   block1 = 0x19c3cfff0,
   parameter = 0x0,
   minpoly = 0x0,
   minideal = 0x0,
   wvhdl = 0x19c3c9fe0,
   names = 0x19c3bdfe0,
   ordsgn = 0x19c3ddfe0,
   typ = 0x19c3dffd0,
   NegWeightL_Offset = 0x0,
   VarOffset = 0x19c3d9ff0,
   qideal = 0x0,
   firstwv = 0x0,
   PolyBin = 0x104ee8440,
   ringtype = 0,
   ringflaga = 0x0,
   ringflagb = 0,
   nr2mModul = 0,
   nrnModul = 0x0,
   options = 100663424,
   ch = 0,
   ref = 0,
   float_len = 0,
   float_len2 = 0,
   N = 3,
   P = 0,
   OrdSgn = 1,
   firstBlockEnds = 3,
   real_var_start = 0,
   real_var_end = 0,
   isLPring = 0,
   VectorOut = 0,
   ShortOut = 0,
   CanShortOut = 1,
   LexOrder = 0,
   MixedOrder = 0,
   ComponentOrder = -1,
   ExpL_Size = 3,
   CmpL_Size = 3,
   VarL_Size = 1,
   BitsPerExp = 20,
   ExpPerLong = 3,
   pCompIndex = 2,
   pOrdIndex = 0,
   OrdSize = 1,
   VarL_LowIndex = 1,
   MinExpPerLong = 3,
   NegWeightL_Size = 0,
   VarL_Offset = 0x19c3e3ff0,
   bitmask = 1048575,
   divmask = 1152922604119523329,
   p_Procs = 0x19c3e7f80,
   pFDeg = 0x104a80150 <pDeg(spolyrec*, sip_sring*)>,
   pLDeg = 0x104a80920 <pLDegb(spolyrec*, int*, sip_sring*)>,
   pFDegOrig = 0x104a80150 <pDeg(spolyrec*, sip_sring*)>,
   pLDegOrig = 0x104a80920 <pLDegb(spolyrec*, int*, sip_sring*)>,
   p_Setm = 0x104a7ff40 <p_Setm_TotalDegree(spolyrec*, sip_sring*)>,
   cf = 0x11e487e70,
   algring = 0x0,
   _nc = 0x0
 }
 (gdb) print __pyx_v_si_ring
 $11 = (ip_sring *) 0x19c3c5e90
 (gdb) print ((struct __pyx_obj_4sage_5rings_10polynomial_28multi_polynomial_libsingular_MPolynomialRing_libsingular *)__pyx_v_R)->_ring
 $12 = (ip_sring *) 0x19c3c5e90
 (gdb) print ((struct __pyx_obj_4sage_5rings_10polynomial_6plural_NCPolynomialRing_plural *)__pyx_v_R)->_ring
 $13 = (ip_sring *) 0x10019ff30
 ####NB: so PY_TYPE_CHECK(R, MPolynomialRing_libsingular) is true
 (gdb) print (__pyx_v_si_ring != currRing)
 $15 = false
 ####NB: does this mean that rChangeCurrRing(si_ring) got executed or that
 ####NB: si_ring already equalled currRing?
 (gdb) print (currRingHdl->data.uring != currRing)
 $16 = true
 ####NB: of course, that's why we segfault on the statement that follows:
 ####NB:       currRingHdl.data.uring.ref -= 1
 (gdb) print *(currRingHdl->data.uring)
 Cannot access memory at address 0x1850dbe90
 ####NB: It looks like currRingHdl.data.uring has been unbound.
 ####NB: naturally, changing a field on that pointer will corrupt memory
 ####NB: (or in this case, because gmalloc has unmapped the page, cause a segfault)
 ####NB: Could it be that the code here should really test for uring being
 ####NB: still valid (if it can do that at all)?
 }}}
 So I think the issue is in `sage.libs.singular.function.call_function`:
 {{{
 ...
     if currRingHdl.data.uring!= currRing:
         currRingHdl.data.uring.ref -= 1
         currRingHdl.data.uring = currRing # ref counting?
         currRingHdl.data.uring.ref += 1
 ...
 }}}
 The evidence clearly points to `currRingHdl.data.uring` pointing to
 unallocated (probably freed) memory. Accessing it can then of course have
 all kinds of effects. At this point it is probably best to reason about the
 code and check whether this pointer is guaranteed to always refer to
 allocated memory.
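
 To make the suspected sequence concrete, here is a small Python model of
 the four quoted lines. It is a sketch under stated assumptions, not the
 Cython code: `ToyRing`, `ToyHandle`, `checked` and `switch_handle` are
 hypothetical names, and the `freed` flag only exists in the model. In C
 one cannot read such a flag from memory that has already been freed, so
 `checked` plays the role of the guarded allocator, not of a possible fix.

```python
class StaleRingError(Exception):
    """Stands in for the segfault on touching a freed ring."""


class ToyRing:
    def __init__(self, name):
        self.name = name
        self.ref = 0
        self.freed = False   # model-only flag; real freed memory has no such marker


def checked(ring):
    """Return the ring, or fail the way a guarded allocator would."""
    if ring.freed:
        raise StaleRingError("ring %r was already freed" % ring.name)
    return ring


class ToyHandle:
    """Plays the role of currRingHdl: it holds one reference to its ring."""

    def __init__(self, ring):
        self.uring = ring
        ring.ref += 1


def switch_handle(handle, curr_ring):
    """Model of the quoted lines from call_function."""
    if handle.uring is not curr_ring:
        checked(handle.uring).ref -= 1   # fails if the old ring is gone
        handle.uring = curr_ring
        checked(handle.uring).ref += 1


# Healthy case: both rings are alive, the reference moves from a to b.
a, b = ToyRing("a"), ToyRing("b")
healthy = ToyHandle(a)
switch_handle(healthy, b)
print(a.ref, b.ref)

# Suspected case: the old ring was freed while the handle still pointed at it.
old, new = ToyRing("old"), ToyRing("new")
handle = ToyHandle(old)
old.freed = True   # assumption: something freed the ring behind the handle's back
try:
    switch_handle(handle, new)
    outcome = "ok"
except StaleRingError:
    outcome = "stale"
print(outcome)
```

 In the model the failure happens on the very first line inside the `if`,
 which corresponds to the `ref -= 1` statement where `gdb` stopped.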

 Keep in mind that in principle, Singular code can get executed at rather
 awkward moments, possibly as part of clean-ups of circular garbage and
 callbacks on weakref cleanup, where objects that are themselves about to be
 deallocated might be compared for equality.
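
 The kind of timing meant here can be seen in plain CPython, without any
 Singular involved. The following is a minimal sketch; `Node` and
 `on_collect` are hypothetical names used only for the illustration. It
 shows a weakref callback running in the middle of a garbage collection
 pass over a reference cycle, at a point where the referent is already
 gone.

```python
import gc
import weakref


class Node:
    pass


events = []


def on_collect(ref):
    # Called by the garbage collector while it cleans up the cycle.
    # The weak reference has already been cleared at this point.
    events.append(ref() is None)


a, b = Node(), Node()
a.other, b.other = b, a              # reference cycle: a <-> b
watcher = weakref.ref(a, on_collect)

del a, b                             # only the cycle keeps the nodes alive now
gc.collect()                         # the callback fires during this call

print(events)
```

 Any code reached from such a callback runs while other objects in the same
 cycle may be half torn down, which is the situation the paragraph above
 warns about.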

 I think we might be getting close to a badge for debugging excellence
 here!

-- 
Ticket URL: <http://trac.sagemath.org/sage_trac/ticket/13447#comment:13>
Sage <http://www.sagemath.org>
Sage: Creating a Viable Open Source Alternative to Magma, Maple, Mathematica, 
and MATLAB